idnits 2.17.1

draft-xu-idr-performance-routing-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 16, 2014) is 3753 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC5226' is defined on line 296, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277)

  -- Obsolete informational reference (is this intentional?): RFC 2679
     (Obsoleted by RFC 7679)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-ospf-te-metric-extensions-02

  == Outdated reference: A later version (-03) exists of
     draft-previdi-isis-te-metric-extensions-02

  == Outdated reference: A later version (-15) exists of
     draft-ietf-idr-add-paths-09

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1    Network Working Group                                              X. Xu
2    Internet Draft                                                     H. Ni
3    Category: Standards Track                                         Huawei

5                                                                M. Boucadair
6                                                                C. Jacquenet
7                                                              France Telecom

9                                                                       N. So
10                                                        Tata Communications

12                                                                     Y. Fan
13                                                              China Telecom

15   Expires: July 2014                                      January 16, 2014

17                   Performance-based BGP Routing Mechanism

19                     draft-xu-idr-performance-routing-00

21   Abstract

23      The current BGP specification doesn't use network performance
24      metrics (e.g., network latency) in the route selection decision
25      process. This document describes a performance-based BGP routing
26      mechanism in which the network latency metric is taken as one of the
27      route selection criteria. This routing mechanism is useful for those
28      service providers with global reach to deliver low-latency network
29      connectivity services to their customers.

31   Status of this Memo

33      This Internet-Draft is submitted in full conformance with the
34      provisions of BCP 78 and BCP 79.

36      Internet-Drafts are working documents of the Internet Engineering
37      Task Force (IETF). Note that other groups may also distribute
38      working documents as Internet-Drafts. The list of current Internet-
39      Drafts is at http://datatracker.ietf.org/drafts/current/.

41      Internet-Drafts are draft documents valid for a maximum of six months
42      and may be updated, replaced, or obsoleted by other documents at any
43      time. It is inappropriate to use Internet-Drafts as reference
44      material or to cite them other than as "work in progress."
45      This Internet-Draft will expire on July 16, 2014.

47   Copyright Notice

49      Copyright (c) 2013 IETF Trust and the persons identified as the
50      document authors. All rights reserved.

52      This document is subject to BCP 78 and the IETF Trust's Legal
53      Provisions Relating to IETF Documents
54      (http://trustee.ietf.org/license-info) in effect on the date of
55      publication of this document. Please review these documents
56      carefully, as they describe your rights and restrictions with
57      respect to this document. Code Components extracted from this
58      document must include Simplified BSD License text as described in
59      Section 4.e of the Trust Legal Provisions and are provided without
60      warranty as described in the Simplified BSD License.

62   Conventions used in this document

64      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
65      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
66      document are to be interpreted as described in RFC 2119 [RFC2119].

68   Table of Contents

70      1. Introduction
71      2. Terminology
72      3. Performance Route Advertisement
73      4. Capability Advertisement
74      5. Performance Route Selection
75      6. Deployment Considerations
76      7. Security Considerations
77      8. IANA Considerations
78      9. Acknowledgements
79      10. References
80         10.1. Normative References
81         10.2. Informative References
82      Authors' Addresses

84   1. Introduction

86      Network performance, especially network latency, is widely recognized
87      as one of the major obstacles to migrating business applications to
88      the cloud, especially in the case where the network paths between cloud
89      users and cloud data centers traverse more than one Autonomous
90      System (AS), which stretches the forwarding path.
91      However, the current Border Gateway Protocol (BGP) specification
92      [RFC4271], which is used for path selection across Autonomous
93      Systems (ASes), doesn't use network performance metrics (e.g., network
94      latency) in the route selection process. As such, the best route
95      selected based upon the existing BGP route selection criteria may
96      not be the best from the customer experience perspective.

98      This document describes a performance-based BGP routing mechanism in
99      which network performance metrics are conveyed as additional path
100     attributes of the Network Layer Reachability Information (NLRI) and
101     used in the route selection decisions. So far, only the network
102     latency metric is used in the performance-based route
103     selection decisions. This mechanism is useful for those service
104     providers with global reach, which usually own more than one AS, to
105     deliver low-latency network connectivity services to their customers.

107     For the sake of simplicity, this document considers only one
108     performance metric, namely the network latency metric. The support of
109     multiple attributes is out of the scope of this document.

111     To make the performance routing paradigm and the vanilla routing
112     paradigm coexist, performance routes should be exchanged as labeled
113     routes as per [RFC3107] while using a specified Subsequent Address
114     Family Identifier (SAFI). As such, network providers deploying such a
115     mechanism in their networks may provide the performance routing
116     service as a value-added service to those customers with low-latency
117     needs, while continuing to offer the vanilla routing service to the
118     remaining customers as before.

120     A variant of this performance-based BGP routing has been implemented
121     [URL: http://www.ist-mescal.org/roadmap/qbgp-demo.avi].

123  2. Terminology

125     This memo makes use of the terms defined in [RFC4271].
127     Network latency indicates the amount of time it takes for a packet
128     to traverse a given network path [RFC2679]. If a packet is
129     forwarded along a path which contains multiple links and routers,
130     the network latency is the sum of the transmission latency of
131     each link (i.e., link latency), plus the sum of the internal delay
132     incurred within each router (i.e., router latency), which includes
133     queuing latency and processing latency. The sum of the link latencies
134     is also known as the cumulative link latency. In today's service
135     provider networks, which usually span a wide geographical area,
136     the cumulative link latency is the major part of the network
137     latency, since the total internal latency incurred within the
138     high-capacity routers seems trivial compared to the cumulative link
139     latency. In other words, the cumulative link latency can
140     approximately represent the network latency in such networks.

142     Furthermore, since the link latency is more stable than the router
143     latency, the approximate network latency represented by the
144     cumulative link latency is also more stable. Therefore, if there is a
145     way to calculate the cumulative link latency of a given network path,
146     it is strongly recommended to use that cumulative link latency to
147     approximately represent the network latency. Otherwise, the network
148     latency would have to be measured frequently by some means (e.g.,
149     PING or other measurement tools).

151  3. Performance Route Advertisement

153     Performance routes SHOULD be exchanged between BGP peers by using a
154     specific Subsequent Address Family Identifier (SAFI), value TBD (see
155     Section 8). In addition, these routes SHOULD be carried as labeled
156     routes as per [RFC3107].
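
As an informal, non-normative illustration of the Section 2 definitions
on which the latency carried with performance routes is based, the
following Python sketch sums per-hop values expressed in microseconds.
The Hop type, the function names, and the sample figures are
hypothetical and are not part of this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One link plus the router at its far end (hypothetical model)."""
    link_latency_us: int    # transmission latency of the link
    router_latency_us: int  # queuing + processing latency in the router


def cumulative_link_latency(path):
    """Sum of the link latencies only (the recommended approximation)."""
    return sum(hop.link_latency_us for hop in path)


def network_latency(path):
    """Sum of the link latencies plus the sum of the router latencies."""
    return sum(hop.link_latency_us + hop.router_latency_us for hop in path)


# Made-up wide-area path: link latency dominates router latency.
path = [Hop(12_000, 40), Hop(35_000, 25), Hop(8_000, 60)]

approx = cumulative_link_latency(path)
exact = network_latency(path)
print(approx, exact, exact - approx)
```

With these sample figures the two results differ only by the total
router latency, which is the basis for using the cumulative link
latency as an approximation.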
158     A BGP speaker SHOULD NOT advertise performance routes to a
159     particular BGP peer unless that peer indicates, through BGP
160     capability advertisement (see Section 4), that it can process update
161     messages with the specified SAFI field.

163     The network latency metric is attached to performance routes as an
164     additional path attribute, referred to as the NETWORK_LATENCY path
165     attribute, which is a well-known mandatory attribute. This attribute
166     indicates the network latency in microseconds from the BGP speaker
167     identified by the NEXT_HOP path attribute to the address identified by
168     the NLRI prefix. The type code of this attribute is TBD (see
169     Section 8), and the value field is 4 octets in length. In
170     abnormal cases where the cumulative link latency exceeds the maximum
171     value of 0xFFFFFFFF, the value field SHOULD be set to 0xFFFFFFFF.

173     A BGP speaker SHOULD be configurable to enable or disable the
174     origination/creation of performance routes. If enabled, a local
175     latency value for a given to-be-originated performance route MUST be
176     configured on the BGP speaker so that it can be placed in the
177     NETWORK_LATENCY attribute of that performance route.

179     When distributing a selected performance route learnt from one BGP
180     peer to another, unless this BGP speaker has set itself as the
181     NEXT_HOP of that route, the NETWORK_LATENCY path attribute of that
182     route MUST NOT be modified. Otherwise, when setting itself as the
183     NEXT_HOP of that route, this BGP speaker SHOULD increase the value
184     of the NETWORK_LATENCY path attribute by adding the network latency
185     value from itself to the previous NEXT_HOP of that route. It is
186     RECOMMENDED to use the cumulative link latency from this BGP speaker
187     to the NEXT_HOP to represent the network latency between them if
188     possible. Otherwise, the measured network latency between them can
189     be used instead.
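
The NETWORK_LATENCY handling described above can be sketched as
follows. This is a minimal, non-normative Python illustration: the
function names are hypothetical, network byte order for the 4-octet
value field is an assumption based on common BGP practice, and the
attribute type code is omitted because it is still TBD.

```python
import struct

# Largest value that fits in the 4-octet value field.
MAX_LATENCY_US = 0xFFFFFFFF


def readvertise_latency(received_us, sets_next_hop_self, to_prev_next_hop_us):
    """Return the NETWORK_LATENCY value to send when propagating a route.

    received_us: value carried by the selected route (microseconds).
    sets_next_hop_self: True if this speaker rewrites NEXT_HOP to itself.
    to_prev_next_hop_us: latency from this speaker to the previous NEXT_HOP.
    """
    if not sets_next_hop_self:
        # NEXT_HOP unchanged: the attribute MUST NOT be modified.
        return received_us
    # NEXT_HOP rewritten: add the local segment, saturating at the maximum.
    return min(received_us + to_prev_next_hop_us, MAX_LATENCY_US)


def encode_value(latency_us):
    """Encode the value field as 4 octets (byte order is an assumption)."""
    if not 0 <= latency_us <= MAX_LATENCY_US:
        raise ValueError("latency does not fit in 4 octets")
    return struct.pack("!I", latency_us)


value = readvertise_latency(1500, True, 700)
print(value, encode_value(value).hex())
```

The saturating addition mirrors the rule that a value exceeding
0xFFFFFFFF is sent as 0xFFFFFFFF rather than wrapping around.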
        It is RECOMMENDED that the type of network latency
190     be kept consistent across all these ASes (i.e., either
191     cumulative link latency or measured network latency, but not a mix).

193     How to obtain the network latency to a given BGP NEXT_HOP is
194     outside the scope of this document. However, note that the path
195     latency to the NEXT_HOP SHOULD approximately represent the network
196     latency of the exact forwarding path towards the NEXT_HOP. For
197     example, if a BGP speaker uses a Traffic Engineering (TE) Label
198     Switched Path (LSP) from itself to the NEXT_HOP, rather than the
199     shortest path calculated by the Interior Gateway Protocol (IGP), the
200     latency to the NEXT_HOP SHOULD reflect the network latency of that
201     TE LSP, rather than the IGP shortest path.

203     To keep performance routes stable enough, a BGP speaker SHOULD use a
204     configurable threshold of network latency fluctuation to suppress
205     any update which would otherwise be triggered just by a minor
206     network latency fluctuation below that threshold.

208  4. Capability Advertisement

210     A BGP speaker that uses multiprotocol extensions to advertise
211     performance routes SHOULD use the Capabilities Optional Parameter,
212     as defined in [RFC5492], to inform its peers about this capability.

214     The MP_EXT Capability Code, as defined in [RFC4760], is used to
215     advertise the (AFI, SAFI) pairs available on a particular connection.

217     A BGP speaker that implements the Performance Routing Capability
218     MUST support the BGP Labeled Route Capability, as defined in
219     [RFC3107]. A BGP speaker that advertises the Performance Routing
220     Capability to a peer using BGP Capabilities advertisement [RFC5492]
221     does not have to advertise the BGP Labeled Route Capability to that
222     peer.

224  5. Performance Route Selection

226     Performance route selection only requires the following modification
227     to the tie-breaking procedures of the BGP route selection decision
228     (phase 2) described in [RFC4271]: the network latency metric comparison
229     SHOULD be executed just ahead of the AS-Path Length comparison step.

231     Prior to executing the network latency metric comparison, the value
232     of the NETWORK_LATENCY path attribute SHOULD be increased by adding
233     the network latency from the BGP speaker to the NEXT_HOP of that
234     route. In the case where a route reflector is deployed without
235     next-hop-self enabled when reflecting routes received from one IBGP
236     peer to other IBGP peers, it is RECOMMENDED to enable such a route
237     reflector to reflect all received performance routes by using a
238     mechanism such as [ADD-PATH], rather than reflecting only the
239     performance route which is the best from its own perspective.
240     Otherwise, it may result in a non-optimal choice by its clients
241     and/or its IBGP peers.

243     The Loc-RIB of the performance routing paradigm is independent from
244     that of the vanilla routing paradigm. Accordingly, the routing table of
245     the performance routing paradigm is independent from that of the vanilla
246     routing paradigm. Whether the performance routing paradigm or the vanilla
247     routing paradigm is used for a given packet is a local policy
248     issue which is outside the scope of this document.

250  6. Deployment Considerations

252     It is RECOMMENDED to deploy this performance-based BGP routing
253     mechanism across multiple ASes which are within a single
254     administrative domain. Within each AS, it is RECOMMENDED to deliver
255     a packet from a BGP speaker to the BGP NEXT_HOP via tunnels,
256     especially TE LSP tunnels.
        Furthermore, it is RECOMMENDED to use the
257     latency metric carried in the Unidirectional Link Delay Sub-TLV
258     [OSPF-TE-EXT] [ISIS-TE-EXT] if possible, rather than the TE metric
259     [RFC3630] [RFC5305], to perform the C-SPF calculation, unless the TE
260     metric has already been set to the link latency metric. In this way,
261     the need for timely measurement of network latency
262     between IBGP peers can be avoided.

264  7. Security Considerations

266     In addition to the considerations discussed in [RFC4271], the
267     following items should be considered:

269        Tweaking the value of the NETWORK_LATENCY attribute by an
270        illegitimate party may influence the route selection process. Means
271        to check the integrity of BGP messages are RECOMMENDED.

273        Frequent updates of the NETWORK_LATENCY attribute may have a
274        severe impact on the stability of the routing system. Such
275        practice SHOULD be avoided.

277  8. IANA Considerations

279     A new BGP Capability Code for the Performance Routing Capability, a
280     new SAFI specific to performance routing, and a new path attribute
281     for NETWORK_LATENCY are required to be allocated by IANA.

283  9. Acknowledgements

285     Thanks to Joel Halpern, Alvaro Retana, Jim Uttaro, Robert Raszuk,
286     Eric Rosen, Qing Zeng, Jie Dong and Mach Chen for their valuable
287     comments on the initial idea of this document.

289  10. References

291  10.1. Normative References

293     [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
294               Requirement Levels", BCP 14, RFC 2119, March 1997.

296     [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
297               IANA Considerations Section in RFCs", BCP 26, RFC 5226,
298               May 2008.

300     [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
301               Protocol 4 (BGP-4)", RFC 4271, January 2006.

303     [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label
304               Information in BGP-4", RFC 3107, May 2001.

306  10.2. Informative References

308     [RFC5492] Chandra, R. and J. Scudder, "Capabilities Advertisement
309               with BGP-4", RFC 5492, February 2009.

311     [RFC4760] Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
312               "Multiprotocol Extensions for BGP-4", RFC 4760, January
313               2007.

315     [RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
316               Delay Metric for IPPM", RFC 2679, September 1999.

318     [OSPF-TE-EXT] Giacalone, S., Ward, D., Drake, J., Atlas, A., and S.
319               Previdi, "OSPF Traffic Engineering (TE) Metric
320               Extensions", draft-ietf-ospf-te-metric-extensions-02 (work
321               in progress), December 2012.

323     [ISIS-TE-EXT] Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas,
324               A., and C. Filsfils, "IS-IS Traffic Engineering (TE)
325               Metric Extensions",
326               draft-previdi-isis-te-metric-extensions-02 (work in
                  progress), October 2012.

328     [RFC3630] Katz, D., Kompella, K., and D. Yeung, "Traffic
329               Engineering (TE) Extensions to OSPF Version 2", RFC 3630,
330               September 2003.

332     [RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic
333               Engineering", RFC 5305, October 2008.

335     [ADD-PATH] Walton, D., Retana, A., Chen, E., and J. Scudder,
336               "Advertisement of Multiple Paths in BGP",
337               draft-ietf-idr-add-paths-09 (work in progress), October 2013.

339  Authors' Addresses

341     Xiaohu Xu
342     Huawei Technologies,
343     Beijing, China
344     Phone: +86-10-60610041
345     Email: xuxiaohu@huawei.com

347     Hui Ni
348     Huawei Technologies,
349     Beijing, China
350     Phone: +86-10-606100212
351     Email: nihui@huawei.com

353     Mohamed Boucadair
354     France Telecom
355     Rennes, France
356     Email: mohamed.boucadair@orange.com

358     Christian Jacquenet
359     Orange
360     Rennes, France
361     Email: christian.jacquenet@orange.com

363     Ning So
364     Tata Communications
365     Plano, TX 75082, USA
366     Email: ning.so@tatacommunications.com

368     Yongbing Fan
369     China Telecom
370     Guangzhou, China
371     Phone: +86 20 38639121
372     Email: fanyb@gsta.com