idnits 2.17.1 draft-bhatia-ecmp-routes-in-bgp-02.txt: -(123): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 18. -- Found old boilerplate from RFC 3978, Section 5.5 on line 658. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 635. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 642. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 648. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 6 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([BGP4], [KEYWORDS], [M-HOP]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. 
== There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 7 instances of lines with private range IPv4 addresses in the document. If these are generic example addresses, they should be changed to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x, 198.51.100.x or 203.0.113.x. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? RFC 2119 keyword, line 92: '...n implementation MUST choose from one ...' RFC 2119 keyword, line 121: '...vidual BGP paths MAY only be advertise...' RFC 2119 keyword, line 259: '...exclusive and both of them MUST not be...' RFC 2119 keyword, line 307: '...nd an implementation MUST proceed with...' RFC 2119 keyword, line 323: '...forwarding table MUST follow an algori...' (6 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 157 has weird spacing: '...lancing invol...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: ECMP and AE flags are mutually exclusive and both of them MUST not be set together. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 10, 2006) is 6650 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'KEYWORDS' is mentioned on line 58, but not defined -- Looks like a reference, but probably isn't: '1' on line 195 -- Looks like a reference, but probably isn't: '2' on line 196 -- Looks like a reference, but probably isn't: '3' on line 197 -- Looks like a reference, but probably isn't: '4' on line 199 == Missing Reference: 'Scenario 1' is mentioned on line 514, but not defined == Missing Reference: 'Scenario 2' is mentioned on line 538, but not defined == Unused Reference: 'BGP-CAP' is defined on line 584, but no explicit reference was found in the text == Unused Reference: 'MED' is defined on line 587, but no explicit reference was found in the text == Unused Reference: 'AFI' is defined on line 598, but no explicit reference was found in the text == Unused Reference: 'SAFI' is defined on line 600, but no explicit reference was found in the text == Unused Reference: 'CONFED' is defined on line 602, but no explicit reference was found in the text == Unused Reference: 'MPBGP' is defined on line 607, but no explicit reference was found in the text -- No information found for draft-bhatia-idr-multiple-hops - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'M-HOP' ** Obsolete normative reference: RFC 3392 (ref. 'BGP-CAP') (Obsoleted by RFC 5492) ** Downref: Normative reference to an Informational RFC: RFC 3345 (ref. 'MED') -- Possible downref: Non-RFC (?) normative reference: ref. 'AFI' -- Possible downref: Non-RFC (?) normative reference: ref. 
'SAFI'
== Outdated reference: A later version (-06) exists of draft-ietf-idr-rfc3065bis-05
** Obsolete normative reference: RFC 2858 (ref. 'MPBGP') (Obsoleted by RFC 4760)

Summary: 9 errors (**), 0 flaws (~~), 17 warnings (==), 15 comments (--).

Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

1 Network Working Group                                    Joel M. Halpern
2 Internet Draft
3                                                             Manav Bhatia
4                                                      Riverstone Networks
5                                                               Paul Jakma
6                                                         Sun Microsystems
7 Expires: August 2006                                   February 10, 2006

9 Advertising Equal Cost Multipath routes in BGP

11 draft-bhatia-ecmp-routes-in-bgp-02.txt

13 Status of this Memo

15 By submitting this Internet-Draft, each author represents that any
16 applicable patent or other IPR claims of which he or she is aware
17 have been or will be disclosed, and any of which he or she becomes
18 aware will be disclosed, in accordance with Section 6 of BCP 79.

20 Internet-Drafts are working documents of the Internet Engineering
21 Task Force (IETF), its areas, and its working groups. Note that
22 other groups may also distribute working documents as
23 Internet-Drafts.

25 Internet-Drafts are draft documents valid for a maximum of six months
26 and may be updated, replaced, or obsoleted by other documents at any
27 time. It is inappropriate to use Internet-Drafts as reference
28 material or to cite them other than as "work in progress."

30 The list of current Internet-Drafts can be accessed at
31 http://www.ietf.org/ietf/1id-abstracts.txt

33 The list of Internet-Draft Shadow Directories can be accessed at
34 http://www.ietf.org/shadow.html.

36 This Internet draft will expire on August 2006

38 Copyright Notice

40 Copyright (C) The Internet Society (2006).

42 Abstract

44 This document describes an extensible mechanism that will allow a BGP
45 [BGP4] speaker to advertise ECMP routes for a destination to its
46 peers.
It uses an existing capability, the Multi Hop Capability [M-HOP], to
47 advertise multiple paths that it is using to its peers.

49 The mechanism described in this document is only applicable to
50 routers that have the ability to inject multiple routing entries in
51 their forwarding table.

53 Conventions used in this document

55 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
56 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
57 document are to be interpreted as described in RFC 2119 [KEYWORDS].

59 Table of Contents

61 1. Introduction...................................................2
62 2. Some BGP ECMP Scenarios........................................3
63 3. Requirements for the ECMP algorithm............................5
64 4. ECMP Capability................................................6
65 5. Operation when both peers are Multipath capable................6
66 6. Advertisement of Multipath BGP routes..........................6
67 7. Procedures for the Receiving Speaker...........................7
68 8. Working with Non Multipath capable/EBGP peers..................7
69 9. Configuring BGP ECMP support...................................9
70 10. Working with Multipath capable IBGP peers.....................9
71 11. Aggregation vis-à-vis Multi-Hop Solution.....................10
72 12. Security Considerations......................................11
73 13. Acknowledgements.............................................11
74 14. IANA Considerations..........................................11
75 15. Appendix A...................................................11
76    15.1 Constructing AS_PATHs....................................11
77    15.2 Advertising synthetic AS_PATHs...........................12
78 16. References...................................................12
79 17. Author's Addresses...........................................13
80 18. Intellectual Property Notice.................................14
81 19.
Disclaimer of Validity.......................................14
82 20. Copyright Statement..........................................14
83 21. Acknowledgment...............................................14

85 1. Introduction

87 Most current BGP implementations, upon receiving multiple equal
88 cost BGP routes from different peers, can insert all of them (or a
89 subset depending upon the local policies) in their forwarding table.
90 This can be done to locally split the traffic across several paths.
91 However, because BGP in its current state can only advertise one path
92 to its peers, an implementation MUST choose one of the best
93 paths that it is using for the advertisement.

95 This has implications for the BGP peers that receive such
96 advertisements from ECMP capable BGP speakers. In the worst case it
97 can lead to potential loops if the entire path information is not
98 advertised to the peers. The best that can be currently done is to
99 aggregate all the contributing paths and advertise the aggregated
100 route. Doing this loses the AS_PATH length of the contributing paths.

102 This imposes a severe limitation on the kind of load balancing that a
103 BGP peer can do.

105 The use of BGP ECMP routes is most prevalent inside an AS to identify
106 its local BGP routes that represent load balanced links. This is
107 useful for applications that want to use the BGP protocol as a
108 mechanism for propagating this information for load balancing across
109 multiple IBGP paths.

111 This document refers to all the candidate paths that remain after the
112 tie breaking procedure, described in Section 9.1.2.2 of [BGP4], reaches
113 step (f) as "ECMP BGP paths/routes". It should be noted that these paths shall
114 have the same AS_PATH length, though the individual AS_PATH segments
115 could differ.

117 Advertising individual BGP ECMP paths with different NEXT_HOPs holds
118 value in IBGP scenarios, as each IBGP peer can reach the NEXT_HOP on
119 its own.
121 However, in the case of EBGP, individual BGP paths MAY only be advertised
122 if the contributing routers are on the same subnet. If the peers
123 do not lie on the same subnet, then information about each individual
124 NEXT_HOP is not useful. This is because (i) the receiving EBGP peer
125 is not aware of the NEXT_HOP information inside some other ASes
126 and (ii) the EBGP speaker always resets the NEXT_HOP to itself when
127 advertising routes. Thus, individual BGP ECMP paths for a destination
128 are only advertised to IBGP peers or EBGP peers that lie on the same
129 subnet. In other cases, only one path is announced, which is an
130 "aggregate" of all the individual equal cost paths for that
131 destination.

133 However, care must be taken to ensure that the AS_PATH length in this
134 aggregated path is retained and that enough information is present in the
135 AS_PATH to enable the receiving peer (and the downstream peers) to
136 detect AS loops.

138 2. Some BGP ECMP Scenarios

140 o Load Splitting when receiving BGP routes
141      A (AS X)
142        \
143         \
144          \
145           C (AS Y)
146          /
147         /
148        /
149      B (AS Z)

151 A, B and C are BGP speakers in AS X, Y and Z respectively.

153 Assume that C is peered up with similar sized ISPs A and B and
154 accepts the entire Internet feed from each one of these. It is common in
155 such scenarios for the ISP C to receive multiple routes of equal cost
156 from both A and B. Ordinarily, C can use only one "best" route learnt
157 from either A or B. Configuring C for load balancing involves a
158 lot of prepending, modifying routes, splitting prefixes received from
159 A and B, etc. Even then, it is difficult to have a 50/50 split across
160 A and B as the load can only be statically split.
The best and the
161 most obvious solution is to let C install ECMP BGP routes in its FIB
162 and to advertise information about all these ECMP BGP routes to its
163 downstream peers, in a manner that lets each one of them run its loop
164 detection algorithm.

166 o Load Splitting when advertising BGP routes.

168          B(X)
169         /    \
170        /      \__
171       /          \
172   A(W)            D(Z)
173       \         _/
174        \       /
175         \     /
176          C(Y)

178 A, B, C and D are BGP speakers in AS W, X, Y and Z respectively.
179 The dashed lines show EBGP peering.

181 Scenario 1: Traditional BGP without ECMP capabilities

183 Assume that A has a block of 10.0.0.0/20 that it needs to advertise
184 to both B and C. For the purpose of load balancing it splits this
185 block into two smaller blocks of 10.0.0.0/21 and 10.0.8.0/21 and
186 advertises each of these to B and C. Router A somehow needs to ensure
187 that one block is more preferred in B and the other in C. This can be
188 done by configuring policies that can prepend AS numbers, MEDs, etc.
189 when advertising the BGP route associated with these prefixes. Say,
190 10.0.0.0/21 is prepended with AS numbers when advertised to B and
191 10.0.8.0/21 when advertised to C. This ensures that all the traffic
192 is split between B and C, as this is what is advertised to router D.
193 The problems with this approach are:

195 [1] The number of routes advertised increases.
196 [2] The number of entries in the FIB increases.
197 [3] Complex policies need to be implemented at A to ensure that
198     incoming traffic is balanced.
199 [4] Load balancing is static and cannot ensure a 50/50 split.

201 Scenario 2: BGP with ECMP capabilities support

203 Router A now does not need to split the original block of 10.0.0.0/20
204 into two smaller blocks. It can advertise 10.0.0.0/20 to both B and
205 C. With B and C equidistant from D, the latter will install a BGP
206 multipath route for 10.0.0.0/20 with NEXT_HOPs as B and C and will
207 advertise this to its downstream peers.
Whatever traffic comes to D
208 will be split across routers B and C. Load balancing done this way is
209 dynamic rather than static. Entries in the FIB remain under control and
210 Router A need not implement those complex policies. Router D needs to
211 advertise information about the paths it is using to its downstream
212 peers, so that they can run the loop detection algorithm.

214 o Suboptimal Routing in Route Reflector clients

216 Route Reflection can result in suboptimal routing due to the client
217 not having full visibility of all the BGP paths in the AS. This is
218 because the RR selects the best path and reflects only that best path
219 to its clients. If the RR has equal cost BGP routes, then it
220 shall select the one based on the lower Router ID. As a result, the
221 clients do not receive the full view of the available paths, or
222 at least the paths that are equidistant from the RR. This is undesirable, as
223 it can result in suboptimal routing from the client's perspective.
224 A client might have selected a different best path if more paths had
225 been made visible to it. With BGP ECMP, the RR can advertise all the
226 equal cost BGP routes that it has to its client, giving the client
227 more options to choose from.

229 The extensions proposed in this draft allow the RR to
230 reflect all of these routes to its clients.

232 3. Requirements for the ECMP algorithm

234 (a) An ECMP capable peer must be able to advertise each individual
235     BGP path to other ECMP capable IBGP peers.

237 (b) When advertising routes to non ECMP capable peers it must
238     preserve as much of the AS Path structure as possible. This
239     includes the individual AS elements and the path length.

241 This document proposes an algorithm to advertise BGP ECMP
242 routes, to both routers capable of ECMP and the ones without, so as
243 to honor each of the requirements mentioned above.

245 4.
ECMP Capability

247 A router that wishes to express its capability to advertise ECMP
248 routes uses the multi-hop capability [M-HOP].

250 We define a new bit, the ECMP bit, in the bitmask of flags. This is set
251 if the router intends to install ECMP routes.

253 +-------+---+---+---+---+---+---+----+----+
254 | Bit:  | 7 | 6 | 5 | 4 | 3 | 2 | 1  | 0  |
255 +-------+---+---+---+---+---+---+----+----+
256 | flag: | R | R | R | R | R | R |ECMP| AE |
257 +-------+---+---+---+---+---+---+----+----+

259 The ECMP and AE flags are mutually exclusive and MUST NOT both be
260 set together.

262 By advertising the Multi-hop capability to a peer with the ECMP bit
263 set (multipath capable), the BGP Speaker conveys to its peer that the
264 speaker is capable of receiving and properly handling the BGP ECMP
265 updates from that peer.

267 5. Operation when both peers are Multipath capable

269 In the following sections, "Local speaker" refers to a router which
270 is advertising the BGP multipath routes, and the "Receiving Speaker"
271 refers to a router that peers with the former to accept multiple BGP
272 routes for a destination.

274 Consider that the capability with the proper flags has been exchanged
275 between the Local speaker and the Receiving speaker, and a BGP
276 session between them is established. The following sections detail
277 the procedures that shall be followed by the Local speaker as well as
278 the Receiving speaker once the capability has been exchanged, and the
279 local speaker wants to advertise some BGP multipath routes.

281 6. Advertisement of Multipath BGP routes

283 To advertise BGP ECMP paths, the speaker uses the MULTIPLE_HOP path
284 attribute. Refer to [M-HOP] for details regarding how this path
285 attribute is used.

287 7. Procedures for the Receiving Speaker

289 The Receiving Speaker upon receiving the MULTIPLE_HOP attribute will
290 understand that the Local Speaker has advertised multipath BGP
291 routes.
In a single UPDATE message all the prefixes will have
292 identical attributes, except for the next-hops, which will be carried
293 in the MULTIPLE_HOP attribute.

295 It will run the modified decision process as explained in Section
296 1 and depending upon the result will either

298 - inject multiple routes into Loc-RIB and advertise multiple paths
299   to its peers
300 OR
301 - inject a single route which has better path attributes than the
302   other routes that it has just received.

304 If the Receiving Speaker receives some withdrawn routes along with the
305 other path attributes and the MULTIPLE_HOP attribute then it shall
306 understand that some of the previously advertised multipath BGP
307 routes have been removed and an implementation MUST proceed with
308 removing all such paths.

310 8. Working with Non Multipath capable/EBGP peers

312 This section discusses how BGP ECMP routes are advertised to non
313 multipath capable routers or EBGP peers lying on different subnets.
314 This is relevant only when the local speaker has installed BGP ECMP
315 routes in its FIB and wants to convey this information to other peers
316 that are not capable of understanding these capabilities.

318 At this point the local speaker needs to aggregate this information
319 and it can follow any algorithm that preserves as much of the
320 available AS path structure as possible.

322 A multipath capable speaker that has installed multiple BGP paths in
323 its forwarding table MUST follow an algorithm to advertise the
324 AS_PATH for each one of these routes in a manner that makes it
325 possible for the downstream peers to run the AS Path loop detection
326 algorithm. However, the algorithm MUST ensure that the AS_PATH length
327 remains unaltered.

329 There are multiple ways to do the above; we recommend one
330 such algorithm, which preserves most of the AS path structure and
331 information.
333 When crossing the multipath capable boundary to non-multipath
334 capable, the speaker MUST send out a synthetic AS_PATH to the latter,
335 where each element of the synthetic AS_PATH is an AS_SET, built with
336 the AS values corresponding to each segment in each contributing
337 AS_PATH. This is possible since all of the contributing AS_PATHs are
338 of the same length.

340 The above procedure is described below in pseudo-code. Note that the
341 pseudo-code shown was chosen for clarity, not efficiency. It is not
342 intended to specify any particular implementation. BGP
343 implementations MAY use any algorithm which produces the same results
344 as those described here.

346 Given two AS_PATHs X and Y, each of path length N, which we
347 wish to merge into a new combined AS_PATH Z of N segments:

349 Expand every AS_SEQUENCE segment in X and Y which contains multiple
350 AS values into single-valued segments, such that the number of
351 segments is equivalent to the path length, with order preserved.

353 for every segment, n, from 0 to N-1
354     create a segment Z(n) of type AS_SET
355     for every AS value in segments X(n) and Y(n)
356         add the AS value into Z(n)

358 The resulting AS_PATH Z will consist of N AS_SETs, each AS_SET segment
359 Z(n) having all AS values in segments X(n) and Y(n).
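
The merge procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a specified implementation: the representation of a segment as a (type, values) tuple, the use of strings for AS values, and the helper names expand, path_length, merge and contains_as are all hypothetical and do not come from this document or from [M-HOP]. Duplicate AS values are dropped while Z(n) is built.

```python
from typing import List, Tuple

AS_SEQUENCE = "AS_SEQUENCE"
AS_SET = "AS_SET"

# Illustrative model (an assumption): a segment is (type, AS values),
# and an AS_PATH is a list of such segments.
Segment = Tuple[str, Tuple[str, ...]]
AsPath = List[Segment]


def expand(path: AsPath) -> AsPath:
    """Split each AS_SEQUENCE into single-valued segments, order preserved."""
    out: AsPath = []
    for seg_type, values in path:
        if seg_type == AS_SEQUENCE:
            out.extend((AS_SEQUENCE, (asn,)) for asn in values)
        else:
            out.append((AS_SET, tuple(values)))
    return out


def path_length(path: AsPath) -> int:
    """Each AS in an AS_SEQUENCE counts as 1; a whole AS_SET counts as 1."""
    return sum(len(v) if t == AS_SEQUENCE else 1 for t, v in path)


def merge(x: AsPath, y: AsPath) -> AsPath:
    """Build the synthetic AS_PATH Z: one AS_SET per position n."""
    ex, ey = expand(x), expand(y)
    if len(ex) != len(ey):
        raise ValueError("contributing AS_PATHs must have equal path length")
    z: AsPath = []
    for (_, xv), (_, yv) in zip(ex, ey):
        values: List[str] = []
        for asn in xv + yv:
            if asn not in values:  # drop duplicates, keep first-seen order
                values.append(asn)
        z.append((AS_SET, tuple(values)))
    return z


def contains_as(path: AsPath, asn: str) -> bool:
    """Loop check a downstream peer could run against its own AS number."""
    return any(asn in values for _, values in path)


# Two equal-length contributing paths, "a b c" and "x y z".
path1: AsPath = [(AS_SEQUENCE, ("a", "b", "c"))]
path2: AsPath = [(AS_SEQUENCE, ("x", "y", "z"))]
z = merge(path1, path2)
```

In this sketch the merged path keeps the path length of its contributors, and a peer in any contributing AS can still find its own AS number in the result, which are the two properties the text requires of the synthetic AS_PATH.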
361 To cite an example, consider a BGP speaker (say in AS A1) having the
362 following paths for a destination D1:

364 Path 1:
365    AS_PATH "a b c", Origin IGP, MED 10, NEXT_HOP N1

367 Path 2:
368    AS_PATH "x y z", Origin IGP, MED 20, NEXT_HOP N2

370 It inserts these two paths in its forwarding table, and announces the
371 following to its non-ECMP capable peer:

373 AS_PATH:
374    AS_SET (a,x)
375    AS_SET (b,y)
376    AS_SET (c,z)
377 Origin IGP, MED 10, NEXT_HOP [N1 or N2]

379 The AS_PATH constructed for advertisement to an EBGP peer is:

381 AS_PATH:
382    AS_SEQUENCE "A1"
383    AS_SET (a,x)
384    AS_SET (b,y)
385    AS_SET (c,z)

387 Refer to Appendix A for more complex AS_PATH scenarios.

389 The beauty of this new AS_PATH structure is that it retains the
390 AS_PATH length of the original contributing paths and also retains
391 enough AS information for the receiving peer to do loop prevention.

393 For instance, if the router that has been used in the above example
394 is peered up with another router R2 in AS "c", then R2 will reject
395 all UPDATEs that have AS "c" in the AS_PATH.

397 The Multipath capable router when advertising routes to a
398 non-multipath capable IBGP peer can pick any one of the NEXT_HOPs from
399 the available list. This will not create any problems because this
400 NEXT_HOP will actually fall within the AS_PATH set-sequence that is
401 being advertised. For EBGP peers, the ECMP capable router will, as
402 usual, set itself as the NEXT_HOP.

404 9. Configuring BGP ECMP support

406 An implementation MUST provide a configuration option to set and
407 unset this feature.

409 The administrator should ensure that the maximum number of multipath
410 routes that all the routers install in their FIB remains consistent
411 inside an AS.

413 10. Working with Multipath capable IBGP peers

415 This section explains how the ECMP feature works in normal
416 scenarios.

418 Assume that the two IBGP speakers A and B exchange this capability.
419 Consider a case where A receives multiple updates for NLRI N' with
420 Nexthops N0, ..., Ni, ..., Nm. Say it runs its decision process and finds
421 that routes with the Nexthops Nj, Nk and Nl are equal and that it
422 needs to advertise all three of them to B. Also assume that Nj and Nk
423 share the same path attributes (Origin, AS Path, Local Pref, etc.).

425 A builds an UPDATE message and uses the MULTIPLE_HOP path attribute.
426 It fills in the AFI, sets the number of next-hops to 2, and adds the length
427 of the first next-hop (Nj), the network address of Nj, the length of Nk and
428 the network address of Nk.

430 When this UPDATE message is received by B, it looks at the
431 MULTIPLE_HOP path attribute and understands that there are multiple
432 routes to reach N'. It inserts two routes for N' with the next-hops
433 as Nj and Nk.

435 A also needs to announce N' with some other path attributes and the
436 next-hop Nl. It builds an UPDATE message, adds the path attributes,
437 and adds the MULTIPLE_HOP path attribute. It fills in the AFI, sets the number of
438 next-hops to 1, and adds the length of the first next-hop Nl and the network
439 address of Nl. This UPDATE message is sent to B.

441 When B receives this UPDATE message it knows that this is not an
442 implicit WITHDRAW for N' as it comes with the MULTIPLE_HOP path
443 attribute. It simply adds this new route to its BGP database, runs
444 the decision process, and proceeds as normal.

446 Assume that at some point later, A needs to withdraw the route
447 associated with the tuple [N', Nk]. It builds an UPDATE message, puts
448 N' in the unfeasible routes and inserts path attributes and the
449 MULTIPLE_HOP path attribute, keeping the next-hop inside as Nk.

451 When B receives this UPDATE message it understands that A now wants
452 to remove a route associated with N'. It looks at the MULTIPLE_HOP attribute and
453 finds the next-hop as Nk. It thus removes only the route associated
454 with Nk.

456 11.
Aggregation vis-à-vis Multi-Hop Solution

458 Another approach to ECMP is to advertise all the contributing
459 routes through aggregation. In this approach any router that installs
460 multiple BGP routes in its FIB can aggregate the contributing routes
461 and advertise the aggregated route to its peers. The problem with
462 this approach is that the AS_PATH length is not preserved, since an
    AS_SET counts as 1 no matter how many ASes it contains.

464 A clever implementation could prepend its own AS the required number
465 of times to preserve the path length when advertising routes to EBGP
466 peers, but this will not work when advertising routes to IBGP peers.

468 Moreover, lumping everything in one AS_SET hides information and
469 makes it harder for the downstream peers to work out what is
470 actually happening. Our method of constructing the synthetic AS_PATH
471 is somewhat complex but preserves maximum information and is regular
472 expression friendly.

474 Another problem with aggregating the information at the router that
475 is doing ECMP is that this router cannot advertise individual
476 contributing routes to its other ECMP capable peers. Advertising
477 individual routes is desirable in some cases as different BGP peers
478 make their path decisions based on their local policies and
479 advertising them just one aggregated route makes it impossible for
480 them to do so.

482 Moreover, by providing them with the NEXT_HOP information each peer
483 can choose its best IGP route to reach those NEXT_HOPs.

485 Last but not least, putting all the ASes inside one set can cause
486 the SET to overflow. In that case, the path length would increase by
487 1.

489 12. Security Considerations

491 This extension to BGP does not change the underlying security issues
492 inherent in the existing BGP.

494 13. Acknowledgements

496 The authors would like to thank Tony Li and Curtis Villamizar for
497 their valuable comments and suggestions.

499 14.
IANA Considerations

501 This document uses an attribute type to indicate additional next-hops
502 for the BGP paths. This must be assigned by IANA as per RFC 2842.

504 15. Appendix A

506 15.1 Constructing AS_PATHs

508 This section deals with some scenarios that could occur. Consider
509 that ECMP capable Router R1 has received multiple paths for a
510 destination D1 and it is connected to a non-ECMP capable/EBGP router
511 R2. R1 thus cannot use the MULTIPLE_HOP attribute to announce these
512 routes.

514 [Scenario 1]

516 Say, R1 has the following BGP paths that it installs in its FIB.

518 Path 1: AS_PATH: AS_SEQ "a b", AS_SET "p1 p2", AS_SET "p3 p4",
519         NEXT_HOP N1

521 Path 2: AS_PATH: AS_SEQ "x y", AS_SET "q1 q2 q3", AS_SET "q4",
522         NEXT_HOP N2

524 Note that in this case, when R1 runs its decision process, the AS
525 Path lengths are the same (4 for each path) because, when counting, an
526 AS_SET counts as 1, no matter how many ASes are in the SET.

528 The AS_PATH that R1 thus constructs when announcing the UPDATE to R2
529 (assuming R2 is an IBGP peer) is:

531 AS_PATH: AS_SET "a x",
532          AS_SET "b y",
533          AS_SET "p1 p2 q1 q2 q3",
534          AS_SET "p3 p4 q4"

536 It will create a new AS_SEQ segment if R2 is an EBGP peer.

538 [Scenario 2]

540 Say, R1 has the following BGP paths that it installs in its FIB.

542 Path 1: AS_PATH: AS_SEQ "a b c", AS_SET "p1 p2", NEXT_HOP N1

544 Path 2: AS_PATH: AS_SEQ "x y", AS_SET "q1 q2 q3", AS_SET "q4",
545         NEXT_HOP N2

547 The AS_PATH that R1 thus constructs when announcing the UPDATE to R2
548 (assuming R2 is an IBGP peer) is:

550 AS_PATH: AS_SET "a x",
551          AS_SET "b y",
552          AS_SET "c q1 q2 q3",
553          AS_SET "p1 p2 q4"

555 15.2 Advertising synthetic AS_PATHs

557 This section discusses some optimizations/cleanups that can be done
558 by a BGP speaker when constructing the synthetic AS_PATH, to
559 advertise to a non-ECMP capable or an EBGP peer.
561 X and Y are two AS_PATHs, each with N segments, which
562 will be merged into a new combined synthetic AS_PATH Z of N
563 segments; X(n), Y(n) and Z(n) denote their nth segments:

565 - If X(n) and Y(n) are both of type AS_SEQUENCE and contain the same
566   value, the type of Z(n) MUST be set to AS_SEQUENCE, the value being
567   the single AS value common to X(n) and Y(n).

569 - Duplicate AS values MUST be removed from Z(n) once all values have
570   been added to it, if the implementation has not already discarded
571   duplicate values while iterating through X(n) and Y(n) when
572   constructing segment Z(n).

574 16. References

576 [M-HOP]    Bhatia, M., Halpern, J. and Jakma, P., "Advertising
577            Multiple Nexthop Routes in BGP",
578            draft-bhatia-idr-multiple-hops-00.txt, February 2006,
579            Work in Progress

581 [BGP4]     Rekhter, Y., Li, T. and Hares, S., "A Border Gateway
582            Protocol 4 (BGP-4)", RFC 4271, January 2006

584 [BGP-CAP]  Chandra, R. and J. Scudder, "Capabilities Advertisement
585            with BGP-4", RFC 3392, November 2002

587 [MED]      D. McPherson, V. Gill, D. Walton, and A. Retana, "Border
588            Gateway Protocol (BGP) Persistent Route Oscillation
589            Condition", RFC 3345, August 2002.

591 [BGP-IPv6] Marques, P. and F. Dupont, "Use of BGP-4 Multiprotocol
592            Extensions for IPv6 Inter-Domain Routing", RFC 2545, March
593            1999

595 [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
596            Requirement Levels", BCP 14, RFC 2119, March 1997.

598 [AFI]      http://www.iana.org/assignments/address-family-numbers

600 [SAFI]     http://www.iana.org/assignments/safi-namespace

602 [CONFED]   McPherson, D., Scudder, J., and P. Traina, "Autonomous
603            System Confederations for BGP",
604            draft-ietf-idr-rfc3065bis-05 (work in progress),
605            October 2005.

607 [MPBGP]    Bates, T., R. Chandra, D. Katz, and Y. Rekhter,
608            "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

610 17. Author's Addresses

612 Joel M. Halpern

614 Email: joel@stevecrocker.com

616 Manav Bhatia
617 Riverstone Networks, Inc.
619 Email: manav@riverstonenet.com

621 Paul Jakma
622 Sun Microsystems

624 Email: paul.jakma@sun.com

626 18. Intellectual Property Notice

628 The IETF takes no position regarding the validity or scope of any
629 Intellectual Property Rights or other rights that might be claimed to
630 pertain to the implementation or use of the technology described in
631 this document or the extent to which any license under such rights
632 might or might not be available; nor does it represent that it has
633 made any independent effort to identify any such rights. Information
634 on the procedures with respect to rights in RFC documents can be
635 found in BCP 78 and BCP 79.

637 Copies of IPR disclosures made to the IETF Secretariat and any
638 assurances of licenses to be made available, or the result of an
639 attempt made to obtain a general license or permission for the use of
640 such proprietary rights by implementers or users of this
641 specification can be obtained from the IETF on-line IPR repository at
642 http://www.ietf.org/ipr.

644 The IETF invites any interested party to bring to its attention any
645 copyrights, patents or patent applications, or other proprietary
646 rights that may cover technology that may be required to implement
647 this standard. Please address the information to the IETF at
648 ietf-ipr@ietf.org.

650 19. Disclaimer of Validity

652 This document and the information contained herein are provided on an
653 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
654 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
655 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
656 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
657 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
658 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

660 20. Copyright Statement

662 Copyright (C) The Internet Society (2006).
This document is subject
663 to the rights, licenses and restrictions contained in BCP 78, and
664 except as set forth therein, the authors retain all their rights.

666 21. Acknowledgment

668 Funding for the RFC Editor function is currently provided by the
669 Internet Society.