idnits 2.17.1

draft-ietf-idr-rfc2796bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 480.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 451.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 458.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 464.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 11
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 12 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  -- The abstract seems to indicate that this document obsoletes RFC2796, but
     the header doesn't have an 'Obsoletes:' line to match this.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 1863 (ref.
     '2') (Obsoleted by RFC 4223)

  -- Obsolete informational reference (is this intentional?): RFC 1965 (ref.
     '3') (Obsoleted by RFC 3065)

  -- Obsolete informational reference (is this intentional?): RFC 1966 (ref.
     '4') (Obsoleted by RFC 4456)

  -- Obsolete informational reference (is this intentional?): RFC 2385 (ref.
     '5') (Obsoleted by RFC 5925)

  -- Obsolete informational reference (is this intentional?): RFC 2796 (ref.
     '6') (Obsoleted by RFC 4456)

     Summary: 4 errors (**), 0 flaws (~~), 5 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                           T. Bates (Cisco Systems)
3    Internet Draft                                R. Chandra (Sonoa Systems)
4    Expiration Date: April 2006                      E. Chen (Cisco Systems)

6                            BGP Route Reflection -
7                       An Alternative to Full Mesh IBGP

9                       draft-ietf-idr-rfc2796bis-02.txt

11   Status of this Memo

13      By submitting this Internet-Draft, each author represents that any
14      applicable patent or other IPR claims of which he or she is aware
15      have been or will be disclosed, and any of which he or she becomes
16      aware will be disclosed, in accordance with Section 6 of BCP 79.

18      Internet-Drafts are working documents of the Internet Engineering
19      Task Force (IETF), its areas, and its working groups. Note that
20      other groups may also distribute working documents as Internet-
21      Drafts.

23      Internet-Drafts are draft documents valid for a maximum of six months
24      and may be updated, replaced, or obsoleted by other documents at any
25      time. It is inappropriate to use Internet-Drafts as reference
26      material or to cite them other than as "work in progress."

28      The list of current Internet-Drafts can be accessed at
29      http://www.ietf.org/ietf/1id-abstracts.txt

31      The list of Internet-Draft Shadow Directories can be accessed at
32      http://www.ietf.org/shadow.html.

34   Abstract

36      The Border Gateway Protocol (BGP) is an inter-autonomous system
37      routing protocol designed for TCP/IP internets. Typically all BGP
38      speakers within a single AS must be fully meshed so that any external
39      routing information must be re-distributed to all other routers
40      within that AS. This represents a serious scaling problem that has
41      been well documented with several alternatives proposed.

43      This document describes the use and design of a method known as
44      "Route Reflection" to alleviate the need for "full mesh" IBGP.

46      This document obsoletes RFC 2796 and RFC 1966.

48   1. Introduction

50      Typically all BGP speakers within a single AS must be fully meshed
51      and any external routing information must be re-distributed to all
52      other routers within that AS. For n BGP speakers within an AS, this
53      requires maintaining n*(n-1)/2 unique IBGP sessions. This "full
54      mesh" requirement clearly does not scale when there are a large
55      number of IBGP speakers each exchanging a large volume of routing
56      information, as is common in many of today's networks.

58      This scaling problem has been well documented and a number of
59      proposals have been made to alleviate this [2,3]. This document
60      represents another alternative in alleviating the need for a "full
61      mesh" and is known as "Route Reflection". This approach allows a BGP
62      speaker (known as "Route Reflector") to advertise IBGP learned routes
63      to certain IBGP peers. It represents a change in the commonly
64      understood concept of IBGP, and the addition of two new optional non-
65      transitive BGP attributes to prevent loops in routing updates.

67      This document obsoletes RFC 2796 [6] and RFC 1966 [4].

69   2. Specification of Requirements

71      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
72      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
73      document are to be interpreted as described in RFC 2119 [7].

75   3. Design Criteria

77      Route Reflection was designed to satisfy the following criteria.

79         o Simplicity

81            Any alternative must be simple both to configure and to
82            understand.

84         o Easy Transition

86            It must be possible to transition from a full mesh
87            configuration without the need to change either topology or AS.
88            This is an unfortunate management overhead of the technique
89            proposed in [3].

91         o Compatibility

93            It must be possible for non-compliant IBGP peers to continue to
94            be part of the original AS or domain without any loss of BGP
95            routing information.
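As a rough illustration of the scaling argument made in the Introduction, the following Python sketch compares the n*(n-1)/2 sessions of a full IBGP mesh with a single-reflector layout in which every other speaker is a client of one RR. The function names and the single-RR layout are assumptions chosen for this example only; they are not part of the specification.

```python
def full_mesh_sessions(n: int) -> int:
    """Unique IBGP sessions needed to fully mesh n speakers: n*(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def single_rr_sessions(n: int) -> int:
    """Sessions when one of the n speakers is the RR and the rest are clients.

    Assumption for illustration only: one cluster, no redundant RR,
    and no sessions among the clients themselves.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(n - 1, 0)


if __name__ == "__main__":
    # Print speaker count, full-mesh sessions, single-RR sessions.
    for n in (3, 10, 100):
        print(n, full_mesh_sessions(n), single_rr_sessions(n))
```

For three speakers this gives three sessions in a full mesh and two with a single reflector; the gap widens quadratically as n grows.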

97      These criteria were motivated by operational experiences of a very
98      large and topology-rich network with many external connections.

100  4. Route Reflection

102     The basic idea of Route Reflection is very simple. Let us consider
103     the simple example depicted in Figure 1 below.

105            +-------+        +-------+
106            |       |  IBGP  |       |
107            | RTR-A |--------| RTR-B |
108            |       |        |       |
109            +-------+        +-------+
110                 \             /
111             IBGP \    ASX    / IBGP
112                   \         /
113                    +-------+
114                    |       |
115                    | RTR-C |
116                    |       |
117                    +-------+

119                    Figure 1: Full Mesh IBGP

121     In ASX there are three IBGP speakers (routers RTR-A, RTR-B and
122     RTR-C). With the existing BGP model, if RTR-A receives an external
123     route and it is selected as the best path it must advertise the external
124     route to both RTR-B and RTR-C. RTR-B and RTR-C (as IBGP speakers)
125     will not re-advertise these IBGP learned routes to other IBGP
126     speakers.

128     If this rule is relaxed and RTR-C is allowed to advertise IBGP
129     learned routes to IBGP peers, then it could re-advertise (or reflect)
130     the IBGP routes learned from RTR-A to RTR-B and vice versa. This
131     would eliminate the need for the IBGP session between RTR-A and RTR-B
132     as shown in Figure 2 below.

134            +-------+        +-------+
135            |       |        |       |
136            | RTR-A |        | RTR-B |
137            |       |        |       |
138            +-------+        +-------+
139                 \             /
140             IBGP \    ASX    / IBGP
141                   \         /
142                    +-------+
143                    |       |
144                    | RTR-C |
145                    |       |
146                    +-------+

148                  Figure 2: Route Reflection IBGP

150     The Route Reflection scheme is based upon this basic principle.

152  5. Terminology and Concepts

154     We use the term "Route Reflection" to describe the operation of a BGP
155     speaker advertising an IBGP learned route to another IBGP peer. Such
156     a BGP speaker is said to be a "Route Reflector" (RR), and such a
157     route is said to be a reflected route.

159     The internal peers of a RR are divided into two groups:

161        1) Client Peers

163        2) Non-Client Peers

165     A RR reflects routes between these groups, and may reflect routes
166     among client peers. A RR along with its client peers forms a Cluster.
167     The Non-Client peers must be fully meshed but the Client peers need
168     not be fully meshed. Figure 3 depicts a simple example outlining the
169     basic RR components using the terminology noted above.

171            / - - - - - - - - - - - - - -
172            |           Cluster           |
173              +-------+        +-------+
174            | |       |        |       |  |
175              | RTR-A |        | RTR-B |
176            | |Client |        |Client |  |
177              +-------+        +-------+
178            |      \             /        |
179               IBGP \           / IBGP
180            |        \         /          |
181                      +-------+
182            |         |       |           |
183                      | RTR-C |
184            |         |  RR   |           |
185                      +-------+
186            |           /   \             |
187             - - - - - /- - -\- - - - - - /
188                 IBGP /       \ IBGP
189             +-------+         +-------+
190             | RTR-D |  IBGP   | RTR-E |
191             |  Non- |---------|  Non- |
192             |Client |         |Client |
193             +-------+         +-------+

195                    Figure 3: RR Components

197  6. Operation

199     When a RR receives a route from an IBGP peer, it selects the best
200     path based on its path selection rule. After the best path is
201     selected, it must do the following depending on the type of the peer
202     it is receiving the best path from:

204        1) A Route from a Non-Client IBGP peer

206           Reflect to all the Clients.

208        2) A Route from a Client peer

210           Reflect to all the Non-Client peers and also to the Client
211           peers. (Hence the Client peers are not required to be fully
212           meshed.)

214     An Autonomous System could have many RRs. A RR treats other RRs just
215     like any other internal BGP speaker. A RR could be configured to
216     have other RRs in a Client group or Non-client group.

218     In a simple configuration the backbone could be divided into many
219     clusters. Each RR would be configured with other RRs as Non-Client
220     peers (thus all the RRs will be fully meshed). The Clients will be
221     configured to maintain IBGP sessions only with the RR in their
222     cluster. Due to route reflection, all the IBGP speakers will receive
223     reflected routing information.
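The two reflection rules of Section 6 can be sketched in Python as shown below, using the peer names of Figure 3. This is a minimal illustration, not an implementation: the `PeerType` enum and the `reflect_targets` function are hypothetical names, and skipping the peer from which the route was learned is an assumption of this sketch rather than something the text above requires.

```python
from enum import Enum
from typing import Dict, List


class PeerType(Enum):
    CLIENT = "client"
    NON_CLIENT = "non-client"


def reflect_targets(source: str, peers: Dict[str, PeerType]) -> List[str]:
    """Return the internal peers to which a best path learned from
    `source` would be reflected by the RR.

    `peers` maps each internal peer of the RR to its group.
    """
    source_type = peers[source]
    targets = []
    for name, peer_type in peers.items():
        if name == source:
            # Assumption of this sketch: do not send the route back
            # to the peer it was learned from.
            continue
        if source_type is PeerType.NON_CLIENT and peer_type is PeerType.NON_CLIENT:
            # Rule 1: a route from a Non-Client goes to Clients only.
            continue
        # Rule 2: a route from a Client goes to Non-Clients and Clients.
        targets.append(name)
    return sorted(targets)


# Internal peers of RTR-C (the RR) as drawn in Figure 3.
FIGURE_3_PEERS = {
    "RTR-A": PeerType.CLIENT,
    "RTR-B": PeerType.CLIENT,
    "RTR-D": PeerType.NON_CLIENT,
    "RTR-E": PeerType.NON_CLIENT,
}

if __name__ == "__main__":
    print(reflect_targets("RTR-A", FIGURE_3_PEERS))
    print(reflect_targets("RTR-D", FIGURE_3_PEERS))
```

With the Figure 3 peers, a route learned from client RTR-A is reflected to RTR-B, RTR-D and RTR-E, while a route learned from non-client RTR-D is reflected only to RTR-A and RTR-B.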

225     It is possible in an Autonomous System to have BGP speakers that do
226     not understand the concept of Route-Reflectors (let us call them
227     conventional BGP speakers). The Route-Reflector Scheme allows such
228     conventional BGP speakers to co-exist. Conventional BGP speakers
229     could be either members of a Non-Client group or a Client group. This
230     allows for an easy and gradual migration from the current IBGP model
231     to the Route Reflection model. One could start creating clusters by
232     configuring a single router as the designated RR and configuring
233     other RRs and their clients as normal IBGP peers. Additional clusters
234     can be created gradually.

236  7. Redundant RRs

238     Usually a cluster of clients will have a single RR. In that case, the
239     cluster will be identified by the BGP Identifier of the RR. However,
240     this represents a single point of failure, so to make it possible to
241     have multiple RRs in the same cluster, all RRs in the same cluster
242     can be configured with a 4-byte CLUSTER_ID so that an RR can discard
243     routes from other RRs in the same cluster.

245  8. Avoiding Routing Information Loops

247     When a route is reflected, it is possible through mis-configuration
248     to form route re-distribution loops. The Route Reflection method
249     defines the following attributes to detect and avoid routing
250     information loops:

252     ORIGINATOR_ID

254     ORIGINATOR_ID is a new optional, non-transitive BGP attribute of Type
255     code 9. This attribute is 4 bytes long and it will be created by a RR
256     in reflecting a route. This attribute will carry the BGP Identifier
257     of the originator of the route in the local AS. A BGP speaker SHOULD
258     NOT create an ORIGINATOR_ID attribute if one already exists. A
259     router which recognizes the ORIGINATOR_ID attribute SHOULD ignore a
260     route received with its BGP Identifier as the ORIGINATOR_ID.

262     CLUSTER_LIST
263     CLUSTER_LIST is a new optional, non-transitive BGP attribute of Type
264     code 10. It is a sequence of CLUSTER_ID values representing the
265     reflection path that the route has passed.

267     When a RR reflects a route, it MUST prepend the local CLUSTER_ID to
268     the CLUSTER_LIST. If the CLUSTER_LIST is empty, it MUST create a new
269     one. Using this attribute an RR can identify if the routing
270     information has looped back to the same cluster due to mis-
271     configuration. If the local CLUSTER_ID is found in the CLUSTER_LIST,
272     the advertisement received SHOULD be ignored.

274  9. Impact on Route Selection

276     The BGP Decision Process Tie Breaking rules (Sect. 9.1.2.2, [1]) are
277     modified as follows:

279        If a route carries the ORIGINATOR_ID attribute, then in Step f)
280        the ORIGINATOR_ID SHOULD be treated as the BGP Identifier of
281        the BGP speaker that has advertised the route.

283        In addition, the following rule SHOULD be inserted between Steps
284        f) and g): a BGP Speaker SHOULD prefer a route with the shorter
285        CLUSTER_LIST length. The CLUSTER_LIST length is zero if a route
286        does not carry the CLUSTER_LIST attribute.

288  10. Implementation Considerations

290     Care should be taken to make sure that none of the BGP path
291     attributes defined above can be modified through configuration when
292     exchanging internal routing information between RRs and Clients and
293     Non-Clients. Their modification could potentially result in routing
294     loops.

296     In addition, when a RR reflects a route, it SHOULD NOT modify the
297     following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.
298     Their modification could potentially result in routing loops.

300  11. Configuration and Deployment Considerations

302     The BGP protocol provides no way for a Client to identify itself
303     dynamically as a Client of an RR. The simplest way to achieve this
304     is by manual configuration.
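Looking back at the loop-avoidance rules of Section 8, the handling of ORIGINATOR_ID and CLUSTER_LIST can be sketched in Python roughly as follows. The `Route` class and the `reflect` and `accept` functions are hypothetical names introduced for this illustration. Identifiers are modeled as plain integers, and taking the BGP Identifier of the advertising peer as the originator when no ORIGINATOR_ID is present is this sketch's reading of the text, not a quotation from it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[int] = None                     # ORIGINATOR_ID (Type code 9)
    cluster_list: List[int] = field(default_factory=list)   # CLUSTER_LIST (Type code 10)


def reflect(route: Route, advertiser_bgp_id: int, local_cluster_id: int) -> Route:
    """Build the attributes an RR would attach when reflecting `route`."""
    # Do not create ORIGINATOR_ID if one already exists.
    originator = route.originator_id
    if originator is None:
        # Assumption: the peer the RR learned the route from is the originator.
        originator = advertiser_bgp_id
    # Prepend the local CLUSTER_ID, creating the list if it was empty.
    return Route(route.prefix, originator, [local_cluster_id] + route.cluster_list)


def accept(route: Route, local_bgp_id: int, local_cluster_id: int) -> bool:
    """Return False when an RR should ignore a received route as looped."""
    if route.originator_id == local_bgp_id:
        return False  # the route came back to its originator
    if local_cluster_id in route.cluster_list:
        return False  # the route looped back to the same cluster
    return True


if __name__ == "__main__":
    learned = Route("192.0.2.0/24")
    once = reflect(learned, advertiser_bgp_id=1, local_cluster_id=100)
    twice = reflect(once, advertiser_bgp_id=50, local_cluster_id=200)
    print(twice)
```

In this sketch a route reflected first by cluster 100 and then by cluster 200 carries the CLUSTER_LIST [200, 100], so a reflector in either cluster, or the originating speaker itself, would ignore it if it came back.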

306     One of the key components of the route reflection approach in
307     addressing the scaling issue is that the RR summarizes routing
308     information and only reflects its best path.

310     Both MEDs and IGP metrics may impact the BGP route selection.
311     Because MEDs are not always comparable and the IGP metric may differ
312     for each router, with certain route reflection topologies the route
313     reflection approach may not yield the same route selection result as
314     that of the full IBGP mesh approach. A way to make route selection
315     the same as it would be with the full IBGP mesh approach is to make
316     sure that route reflectors are never forced to perform the BGP route
317     selection based on IGP metrics which are significantly different from
318     the IGP metrics of their clients, or based on incomparable MEDs. The
319     former can be achieved by configuring the intra-cluster IGP metrics
320     to be better than the inter-cluster IGP metrics, and maintaining full
321     mesh within the cluster. The latter can be achieved by:

323        o setting the local preference of a route at the border router to
324          reflect the MED values.

326        o or by making sure the AS-path lengths from different ASs are
327          different when the AS-path length is used as a route selection
328          criterion.

330        o or by configuring community-based policies that the reflector
331          can use to decide on the best route.

333     One could argue though that the latter requirement is overly
334     restrictive, and perhaps impractical in some cases. One could
335     further argue that as long as there are no routing loops, there are
336     no compelling reasons to force route selection with route reflectors
337     to be the same as it would be with the full IBGP mesh approach.

339     To prevent routing loops and maintain a consistent routing view, it
340     is essential that the network topology be carefully considered in
341     designing a route reflection topology. In general, the route
342     reflection topology should be congruent with the network topology
343     when there exist multiple paths for a prefix. One commonly used approach
344     is the POP-based reflection, in which each POP maintains its own
345     route reflectors serving clients in the POP, and all route reflectors
346     are fully meshed. In addition, clients of the reflectors in each POP
347     are often fully meshed for the purpose of optimal intra-POP routing,
348     and the intra-POP IGP metrics are configured to be better than the
349     inter-POP IGP metrics.

351  12. Security Considerations

353     This extension to BGP does not change the underlying security issues
354     inherent in the existing IBGP [1, 5].

356  13. Acknowledgments

358     The authors would like to thank Dennis Ferguson, John Scudder, Paul
359     Traina and Tony Li for the many discussions resulting in this work.
360     This idea was developed from an earlier discussion between Tony Li
361     and Dimitri Haskin.

363     In addition, the authors would like to acknowledge valuable review
364     and suggestions from Yakov Rekhter on this document, and helpful
365     comments from Tony Li, Rohit Dube, John Scudder and Bruce Cole.

367  14. References

369  14.1. Normative References

371     [1] Rekhter, Y., T. Li and S. Hares, "A Border Gateway Protocol 4
372         (BGP-4)", draft-ietf-idr-bgp4-26.txt, October 2004.

374  14.2. Informative References

376     [2] Haskin, D., "A BGP/IDRP Route Server alternative to a full mesh
377         routing", RFC 1863, October 1995.

379     [3] Traina, P., "Limited Autonomous System Confederations for BGP",
380         RFC 1965, June 1996.

382     [4] Bates, T. and R. Chandra, "BGP Route Reflection An alternative
383         to full mesh IBGP", RFC 1966, June 1996.

385     [5] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
386         Signature Option", RFC 2385, August 1998.

388     [6] Bates, T., R. Chandra and E. Chen, "BGP Route Reflection - An
389         Alternative to Full Mesh IBGP", RFC 2796, April 2000.

391     [7] Bradner, S., "Key words for use in RFCs to Indicate Requirement
392         Levels", BCP 14, RFC 2119, March 1997.

394  15. Authors' Addresses

396     Tony Bates
397     Cisco Systems, Inc.
398     170 West Tasman Drive
399     San Jose, CA 95134

401     EMail: tbates@cisco.com

403     Ravi Chandra
404     Sonoa Systems, Inc.
405     3255-7 Scott Blvd.
406     Santa Clara, CA 95054

408     EMail: rchandra@sonoasystems.com

410     Enke Chen
411     Cisco Systems, Inc.
412     170 West Tasman Drive
413     San Jose, CA 95134

415     EMail: enkechen@cisco.com

417  16. Appendix A Comparison with RFC 2796

419     The impact on route selection is added.

421     The pictorial description of the encoding of the CLUSTER_LIST
422     attribute is removed as the description is redundant to the BGP
423     specification, and the attribute length field is inadvertently
424     described as one octet.

426  17. Appendix B Comparison with RFC 1966

428     All the changes listed in Appendix A, plus the following.

430     Several terms related to route reflection are clarified, and
431     the references to EBGP routes/peers are removed.

433     The handling of a routing information loop (due to route reflection)
434     by a receiver is clarified and made more consistent.

436     The addition of a CLUSTER_ID to the CLUSTER_LIST has been changed
437     from "append" to "prepend" to reflect the deployed code.

439     The section on "Configuration and Deployment Considerations" has been
440     expanded to address several operational issues.

442  18. Intellectual Property Considerations

444     The IETF takes no position regarding the validity or scope of any
445     Intellectual Property Rights or other rights that might be claimed to
446     pertain to the implementation or use of the technology described in
447     this document or the extent to which any license under such rights
448     might or might not be available; nor does it represent that it has
449     made any independent effort to identify any such rights. Information
450     on the procedures with respect to rights in RFC documents can be
451     found in BCP 78 and BCP 79.

453     Copies of IPR disclosures made to the IETF Secretariat and any
454     assurances of licenses to be made available, or the result of an
455     attempt made to obtain a general license or permission for the use of
456     such proprietary rights by implementers or users of this
457     specification can be obtained from the IETF on-line IPR repository at
458     http://www.ietf.org/ipr.

460     The IETF invites any interested party to bring to its attention any
461     copyrights, patents or patent applications, or other proprietary
462     rights that may cover technology that may be required to implement
463     this standard. Please address the information to the IETF at
464     ietf-ipr@ietf.org.

466  19. Full Copyright Notice

468     Copyright (C) The Internet Society (2005).

470     This document is subject to the rights, licenses and restrictions
471     contained in BCP 78, and except as set forth therein, the authors
472     retain all their rights.

474     This document and the information contained herein are provided on an
475     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
476     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
477     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
478     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
479     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
480     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.