Internet Engineering Task Force                            I. Varlashkin
Internet-Draft                                                    Google
Intended status: Standards Track                               R. Raszuk
Expires: November 16, 2015                                 Mirantis Inc.
                                                                K. Patel
                                                             M. Bhardwaj
                                                            S. Bayraktar
                                                           Cisco Systems
                                                            May 15, 2015


               Carrying next-hop cost information in BGP
                     draft-ietf-idr-bgp-nh-cost-02

Abstract

   BGPLS provides a mechanism by which link-state and traffic
   engineering information can be collected from internal networks and
   shared with external network routers using BGP.  BGPLS defines a new
   Address Family to exchange this information using BGP.

   BGP Optimal Route Reflection (ORR) provides a mechanism for a
   centralized BGP route reflector to achieve the requirements of
   hot-potato routing as described in Section 11 of RFC 4456.  Optimal
   Route Reflection requires BGP ORR to override the default IGP
   location of the route reflector, which is used to determine the cost
   to the next-hop contained in a path.

   This draft augments BGPLS and defines new extensions to exchange
   cost information to next-hops for the purpose of calculating the
   best path from a peer's perspective rather than from the local BGP
   speaker's own perspective.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 16, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  NEXT-HOP INFORMATION BASE
   3.  BGP Bestpath Selection Modification
   4.  BGPLS Extensions
     4.1.  RIB Metrics Prefix Descriptor
     4.2.  RIB Protocol ID
     4.3.  Information Exchange
     4.4.  Termination of the session carrying next-hop cost
     4.5.  Graceful Restart and Route-Refresh
   5.  Security considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  USAGE SCENARIOS
     A.1.  Trivial case
     A.2.  Non-IGP based cost
     A.3.  Multiple route-reflectors
     A.4.  Inter-AS MPLS VPN
     A.5.  Corner case
   Authors' Addresses

1.  Introduction

   In certain situations, route-reflector clients may not get the
   optimal path to some destinations.  ADDPATH solves this problem by
   letting the route-reflector advertise multiple paths for a given
   prefix.  If the number of advertised paths is sufficiently large,
   route-reflector clients can choose the same route as they would in a
   full mesh.  This approach, however, places an additional burden on
   the control plane.  The solutions proposed in
   [I-D.ietf-idr-bgp-optimal-route-reflection] take a different
   approach: instead of calculating the best path from the local
   speaker's own perspective, the calculations are done using the cost
   from the client to the next-hops.  Although they eliminate the need
   to transmit redundant routing information between peers, there are
   scenarios where the cost to the next-hop cannot be obtained
   accurately using these methods.  For example, if the next-hop
   information itself has been learned via BGP, then a simple SPF run
   on the link-state database will not be sufficient to obtain the cost
   information.  There are also scenarios where a route reflector can
   reach its clients while client-to-client connectivity is down.

   BGPLS [I-D.ietf-idr-ls-distribution] provides a mechanism by which
   link-state and traffic engineering information can be collected from
   internal networks and shared with external network routers using
   BGP.  BGPLS defines a new Address Family to exchange this information
   using BGP.

   To address such scenarios, this draft defines extensions to BGPLS to
   carry cost information for next-hops.  In particular, this draft
   defines a new Protocol ID to announce a router's IGP routes, and a
   Prefix Descriptor to carry the cost information of the IGP routes
   used to resolve next-hops.

2.  NEXT-HOP INFORMATION BASE

   To facilitate the description of the proposed solution, we introduce
   a new table holding all known next-hops and the costs to reach them
   from various routers in the network.

   The Next-Hop Information Base (NHIB) stores the cost to reach a
   next-hop from an arbitrary router in the network.  This information
   is essential for choosing the best path from a peer's perspective
   rather than from the BGP speaker's own perspective.  In canonical
   form, an NHIB entry is a triplet (router, next-hop, cost); however,
   this specification does not impose any restriction on how BGP
   implementations store that information internally.  The cost in the
   NHIB does not have to be an IGP cost, but all costs in the NHIB MUST
   be comparable with each other.

   The NHIB can be populated from various sources, including static
   routing and dynamic routing.  However, this document focuses on
   populating the NHIB using BGP.

   An implementation of the BGP extension described in this draft MAY
   provide an operator-controlled configuration knob, significant to an
   individual BGP speaker, that treats next-hop cost information
   received from two or more clients as equivalent.  For example, a
   route-reflector could receive next-hop cost only from R1 but also use
   it when calculating the best path for R2, R3, ..., Rn, because it has
   been instructed to do so by locally significant configuration.
   Multiple sources can be used for redundancy purposes.

3.  BGP Bestpath Selection Modification

   This section applies regardless of the method used to populate the
   NHIB.

   When a BGP speaker conforming to this specification selects routes to
   be advertised to a peer, it SHOULD use the cost information from the
   NHIB rather than its own IGP cost to the next-hop when comparing
   interior costs after step (d) of Section 9.1.2.2 of [RFC4271] (i.e.,
   in step (e)).

4.  BGPLS Extensions

4.1.  RIB Metrics Prefix Descriptor

   This draft defines a new Prefix Descriptor, known as the Cost Prefix
   Descriptor, with a TLV code point value to be assigned by IANA.  The
   Cost Prefix Descriptor is defined as follows:

   +------------+-------------+----------+---------------------+
   | TLV Code   | Description | Length   | Value defined in:   |
   | Point      |             |          |                     |
   +------------+-------------+----------+---------------------+
   | TBD        | Cost        | 4 bytes  | Cost Value          |
   +------------+-------------+----------+---------------------+

   The Cost Value is a 4-byte metric value computed by a router's local
   RIB.  It is the cost that the router associates with a prefix, and
   it is typically computed by the routing protocol that owns the route.

4.2.  RIB Protocol ID

   This draft defines a new Protocol ID for the IPv4 and IPv6 Topology
   Prefix NLRI, known as the RIB Protocol ID.  The RIB Protocol ID value
   is to be assigned by IANA.  A Prefix NLRI with the RIB Protocol ID is
   used to announce all the local and IGP-computed routes that are
   installed in the RIB, along with their Cost values.

4.3.  Information Exchange

   Typically, BGPLS sessions will be established between route-reflectors
   and their internal peers (both clients and non-clients).  As soon as
   the BGPLS session is ESTABLISHED, clients MAY immediately send their
   route-reflector all the RIB routes used to resolve next-hops,
   together with the associated next-hop cost information.
   Implementations are advised to announce BGP updates for this SAFI
   before those of any other SAFI, to facilitate faster convergence of
   the other SAFIs on route reflectors.

   Each internal neighbor of a route-reflector announces its IGP RIB
   prefix information and its RIB metrics to the route reflector using a
   BGPLS session, the new RIB Protocol ID, and the RIB Metrics Prefix
   Descriptor.
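
   To illustrate what a client places on the wire, the following Python
   sketch encodes and decodes a Cost Prefix Descriptor (Section 4.1)
   using the generic BGPLS TLV layout: a 2-octet Type, a 2-octet Length,
   and the Value.  It is a sketch under stated assumptions, not a
   reference implementation.  This draft leaves the code point TBD, so
   HYPOTHETICAL_COST_TLV_TYPE is an arbitrary placeholder, and the
   function names are illustrative only.  A real speaker would carry the
   TLV inside a Prefix NLRI with the RIB Protocol ID rather than on its
   own.

```python
import struct
from typing import Tuple

# Placeholder only: this draft leaves the Cost TLV code point "TBD" for IANA.
# 0xFFFF is an arbitrary value chosen for illustration, not an assigned one.
HYPOTHETICAL_COST_TLV_TYPE = 0xFFFF
COST_VALUE_LENGTH = 4  # Section 4.1: the Cost Value is 4 bytes long
TLV_HEADER = struct.Struct("!HH")  # 2-octet Type, 2-octet Length, big-endian


def encode_cost_tlv(cost: int, tlv_type: int = HYPOTHETICAL_COST_TLV_TYPE) -> bytes:
    """Encode a Cost Prefix Descriptor as Type | Length | 4-byte Cost Value."""
    if not 0 <= cost <= 0xFFFFFFFF:
        raise ValueError("cost must fit in an unsigned 32-bit field")
    return TLV_HEADER.pack(tlv_type, COST_VALUE_LENGTH) + struct.pack("!I", cost)


def decode_cost_tlv(data: bytes) -> Tuple[int, int]:
    """Decode one Cost TLV and return (tlv_type, cost)."""
    if len(data) < TLV_HEADER.size:
        raise ValueError("truncated TLV header")
    tlv_type, length = TLV_HEADER.unpack_from(data)
    # Reject anything that is not exactly a 4-byte value after the header.
    if length != COST_VALUE_LENGTH or len(data) != TLV_HEADER.size + length:
        raise ValueError("Cost TLV must carry exactly a 4-byte value")
    (cost,) = struct.unpack_from("!I", data, TLV_HEADER.size)
    return tlv_type, cost


encoded = encode_cost_tlv(8)     # e.g. a client's cost of 8 towards a next-hop
print(encoded.hex())             # ffff000400000008
print(decode_cost_tlv(encoded))  # (65535, 8)
```

   Keeping decoding strict about the 4-byte length mirrors the fixed
   Length column in the table of Section 4.1.  A malformed descriptor is
   rejected rather than partially interpreted.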

   Each neighbor updates the route reflector with its IGP prefix cost
   every time the cost to an IGP route changes.

   Upon receipt of a BGPLS route and its associated cost, a route
   reflector stores the prefix, cost, and neighbor information in its
   local NHIB.  It then uses the received cost when calculating the best
   path from the respective client's perspective, as opposed to using
   its own IGP cost.

4.4.  Termination of the session carrying next-hop cost

   When the BGPLS session carrying next-hop cost terminates (for
   whatever reason), the BGP speaker SHOULD invalidate all the next-hop
   cost information (i.e., the same treatment applies to the next-hop
   cost as to any other BGP-learned information).

4.5.  Graceful Restart and Route-Refresh

   BGPLS sessions carrying next-hop cost could use the Graceful Restart
   [RFC4724] and Route Refresh [RFC7313] mechanisms in the same way as
   they are used for IPv4 and IPv6 unicast.

5.  Security considerations

   This document does not introduce new security considerations beyond
   those already specified in [RFC4271],
   [I-D.ietf-idr-bgp-optimal-route-reflection], and
   [I-D.ietf-idr-ls-distribution].

6.  IANA Considerations

   This draft defines a new Protocol ID value, the RIB Protocol ID, and
   requests IANA to allocate a value for it from the BGPLS Protocol ID
   registry.

   This draft defines a new RIB Metrics Prefix Descriptor and requests
   IANA to allocate a TLV code point for it from the Prefix Descriptor
   registry.

7.  Acknowledgements

   The authors would like to acknowledge David Ward, Anton Elita,
   Nagendra Kumar, and Burjiz Pithawala for their critical reviews and
   feedback.

8.  References

8.1.  Normative References

   [I-D.ietf-idr-bgp-optimal-route-reflection]
              Raszuk, R., Cassar, C., Aman, E., Decraene, B., and S.
              Litkowski, "BGP Optimal Route Reflection (BGP-ORR)",
              draft-ietf-idr-bgp-optimal-route-reflection-09 (work in
              progress), April 2015.

   [I-D.ietf-idr-ls-distribution]
              Gredler, H., Medved, J., Previdi, S., Farrel, A., and S.
              Ray, "North-Bound Distribution of Link-State and TE
              Information using BGP", draft-ietf-idr-ls-distribution-10
              (work in progress), January 2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              January 2007.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760, January
              2007.

   [RFC7313]  Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
              Route Refresh Capability for BGP-4", RFC 7313, July 2014.

8.2.  Informative References

   [RFC2918]  Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
              September 2000.

Appendix A.  USAGE SCENARIOS

A.1.  Trivial case

       --+---NetA---+--
         |          |
         r1         r2
         |          |
         R1--RR-----R2
         |    \     |
         |     +----R4
         R3

   In this scenario, r1 and r2, along with NetA, are part of AS1, and
   R1-R4, along with RR, are in AS2.

   If RR implements non-optimized route reflection, it will choose the
   path to NetA via R1 and advertise it to both R3 and R4.  Such a
   choice is good from R3's perspective, but it results in suboptimal
   traffic flow from R4 to NetA.
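
   The modified comparison of Section 3 can be sketched in Python for
   this scenario.  The NHIB is modelled as a dictionary keyed by
   (router, next-hop), which is only one of the representations the
   draft permits.  R4's costs (8 to R1, 1 to R2) are taken from the
   description of this scenario in this appendix.  The costs for RR and
   R3 are hypothetical values chosen only so that both prefer R1.  The
   helper best_exit is an illustrative name.  It models only the
   interior-cost step and assumes the earlier tie-breakers of [RFC4271]
   are equal.

```python
from typing import Dict, List, Tuple

# NHIB in canonical (router, next-hop) -> cost form.  The draft leaves the
# internal representation to implementations; a dict is one possible choice.
Nhib = Dict[Tuple[str, str], int]


def best_exit(viewpoint: str, candidates: List[str], nhib: Nhib) -> str:
    """Return the candidate next-hop with the lowest cost from `viewpoint`.

    Only the interior-cost comparison that follows step (d) of RFC 4271,
    Section 9.1.2.2 is modelled: earlier tie-breakers are assumed equal, and
    remaining ties are broken by next-hop name purely for determinism.
    A missing (viewpoint, next-hop) entry raises KeyError instead of silently
    falling back to another router's cost, since mixing cost sources can
    cause the loop described in the corner case of this appendix.
    """
    return min(candidates, key=lambda nh: (nhib[(viewpoint, nh)], nh))


nhib: Nhib = {
    # From the description of this scenario: R4's costs to the two exits.
    ("R4", "R1"): 8,
    ("R4", "R2"): 1,
    # Hypothetical values, chosen only so that RR and R3 both prefer R1.
    ("RR", "R1"): 1,
    ("RR", "R2"): 2,
    ("R3", "R1"): 1,
    ("R3", "R2"): 3,
}

exits = ["R1", "R2"]  # next-hops of the two NetA paths known to RR
for router in ("RR", "R3", "R4"):
    print(router, "->", best_exit(router, exits, nhib))
```

   Running the sketch prints "RR -> R1", "R3 -> R1", and "R4 -> R2".
   RR's own view yields R1 for everyone.  Evaluating from each client's
   viewpoint gives R4 the closer exit, R2.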

   Using the proposed BGPLS extensions, the route-reflector will learn
   that the cost from R4 to R1 is 8, whereas the cost to R2 is only 1.
   RR will announce NetA to R4 with the next-hop set to R2, while its
   announcement to R3 will still have R1 as the next-hop.  Both R3 and
   R4 will now send traffic to NetA via the closest exit, achieving the
   same behaviour as if a full iBGP mesh had been configured.

A.2.  Non-IGP based cost

   When it is desirable to direct traffic over an exit other than the
   one with the smallest IGP cost, the BGPLS extensions can be used to
   convey a cost that is not based on the IGP.  For example, a network
   operator may arrange exit points in order of administrative
   preference and configure routers to send this preference instead of
   the IGP cost.  The route reflector will then calculate the best path
   based on administrative preference rather than on IGP metrics.

   Network operators should exercise care to ensure that no router up to
   and including the exit point diverts packets onto a different path;
   otherwise, routing loops may occur.  One way to achieve this is to
   have a consistent administrative preference among all routers.
   Another option is to use a tunneling mechanism (e.g., an MPLS-TE
   tunnel) between the source and the exit point, provided that the
   router serving as the exit point sends packets out of the network
   rather than diverting them to another exit point.

A.3.  Multiple route-reflectors

   This example demonstrates that the BGPLS extensions are necessary
   only between routers that already exchange other AFI/SAFIs.

                       |
      R1----R3---------R5----R7--+
      |                |         |
     RR1               |        NetA
      |               RR2        |
      |                |         |
      R2----R4---------R6----R8--+
                       |

   In the above network, routers R1-R4 are clients of RR1, and R5-R8
   are clients of RR2.  RR1 and RR2 also peer with each other and use
   ADDPATH.

   RR2 learns about NetA from R7 and R8.  Because it sends RR1 all
   paths rather than only its best path, RR2 does not need to learn
   cost information from R1 and R2 towards R7 and R8.  On the other
   hand, RR1 does exchange cost information using BGPLS with R1 and R2,
   so that each of them can receive the routes that are best from its
   own perspective.

   In addition to ADDPATH, a mechanism could be devised that would allow
   RR2 to learn how many alternative routes it needs to send to RR1.
   For example, if NetA were also connected to R9 (not shown) but all
   clients of RR1 prefer R7 as the exit point and R9 as the next best,
   then there would be no need for RR2 to send NetA routes with next-hop
   R8 to RR1.

   Discussion: the authors would like to solicit discussion on whether
   there is sufficient interest in such a mechanism.

A.4.  Inter-AS MPLS VPN

   The previous example could be transposed to an Inter-AS MPLS VPN
   Option C scenario.  In this case, route reflectors RR1 and RR2 can be
   in different autonomous systems.  Essentially, the behaviour of the
   routers remains as already described.

A.5.  Corner case

          --+---NetA--+--
            |         |
       RR---R1        R2
             \       /
              R3---R4

   In the above network, the cost from R3 to R1 is 10; all other costs
   are 1.  If RR advertises NetA to R3 based on cost information
   received from R3, but uses its own cost when advertising NetA to R4,
   a loop will form: R3 selects R2 (reached via R4 at cost 2), while R4
   is given R1 (reached via R3), so packets for NetA bounce between R3
   and R4.  This is why Section 3 ("BGP Bestpath Selection
   Modification") requires RR to have next-hop cost information for
   every next-hop and every peer.

   Note that the problem is the same as if RR did not use the extensions
   described in this document and R3 peered directly with R1 and R2,
   while R4 peered only with RR.

Authors' Addresses

   Ilya Varlashkin
   Google

   Email: ilya@nobulus.com


   Robert Raszuk
   Mirantis Inc.
   615 National Ave. #100
   Mt View, CA  94043
   USA

   Email: robert@raszuk.net


   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95124 95134
   USA

   Email: keyupate@cisco.com


   Manish Bhardwaj
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95124 95134
   USA

   Email: manbhard@cisco.com


   Serpil Bayraktar
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95124 95134
   USA

   Email: serpil@cisco.com