idnits 2.17.1

draft-ietf-idr-route-reflect-v2-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 9
     longer pages, the longest (page 2) being 59 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([2,3], [1]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 38 has weird spacing: '...as been well...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.
     If you have contacted all the original authors and they are all willing
     to grant the BCP78 rights to the IETF Trust, then this is fine, and you
     can ignore this comment.  If not, you may need to add the pre-RFC5378
     disclaimer.  (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 1999) is 8892 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1771 (ref. '1') (Obsoleted by RFC 4271)

  ** Obsolete normative reference: RFC 1863 (ref. '2') (Obsoleted by RFC 4223)

  ** Obsolete normative reference: RFC 1965 (ref. '3') (Obsoleted by RFC 3065)

  ** Obsolete normative reference: RFC 1966 (ref. '4') (Obsoleted by RFC 4456)

  ** Obsolete normative reference: RFC 2385 (ref. '5') (Obsoleted by RFC 5925)

     Summary: 11 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

INTERNET-DRAFT                                                Tony Bates
                                                           Cisco Systems
                                                            Ravi Chandra
                                                               Enke Chen
                                                           Siara Systems
                                                           December 1999

                         BGP Route Reflection -
                    An Alternative to Full Mesh IBGP


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The Border Gateway Protocol [1] is an inter-autonomous system routing
   protocol designed for TCP/IP internets.  Currently in the Internet,
   BGP deployments are configured such that all BGP speakers within a
   single AS must be fully meshed so that any external routing
   information is re-distributed to all other routers within that AS.
   This represents a serious scaling problem that has been well
   documented with several alternatives proposed [2,3].

   This document describes the use and design of a method known as
   "Route Reflection" to alleviate the need for "full mesh" IBGP.

1. Introduction

   Currently in the Internet, BGP deployments are configured such that
   all BGP speakers within a single AS must be fully meshed and any
   external routing information must be re-distributed to all other
   routers within that AS.  For n BGP speakers within an AS, this
   requires maintaining n*(n-1)/2 unique IBGP sessions (for example,
   4950 sessions for 100 speakers).  This "full mesh" requirement
   clearly does not scale when there are a large number of IBGP
   speakers each exchanging a large volume of routing information, as
   is common in many of today's Internet networks.

   This scaling problem has been well documented and a number of
   proposals have been made to alleviate it [2,3].  This document
   represents another alternative in alleviating the need for a "full
   mesh" and is known as "Route Reflection".  This approach allows a BGP
   speaker (known as a "Route Reflector") to advertise IBGP learned
   routes to certain IBGP peers.
   It represents a change in the commonly
   understood concept of IBGP, and the addition of two new optional
   non-transitive BGP attributes to prevent loops in routing updates.

   This document is a revision of RFC1966 [4], and it includes editorial
   changes, clarifications and corrections based on the deployment
   experience with route reflection.  These revisions are summarized in
   the Appendix.

2. Design Criteria

   Route Reflection was designed to satisfy the following criteria.

      o  Simplicity

         Any alternative must be both simple to configure and easy
         to understand.

      o  Easy Transition

         It must be possible to transition from a full mesh
         configuration without the need to change either topology
         or AS.  This is an unfortunate management overhead of the
         technique proposed in [3].

      o  Compatibility

         It must be possible for non-compliant IBGP peers to
         continue to be part of the original AS or domain without
         any loss of BGP routing information.

   These criteria were motivated by operational experience with a very
   large, topology-rich network with many external connections.

3. Route Reflection

   The basic idea of Route Reflection is very simple.  Let us consider
   the simple example depicted in Figure 1 below.

             +-------+        +-------+
             |       |  IBGP  |       |
             | RTR-A |--------| RTR-B |
             |       |        |       |
             +-------+        +-------+
                    \          /
               IBGP  \   ASX  /  IBGP
                      \      /
                      +-------+
                      |       |
                      | RTR-C |
                      |       |
                      +-------+

                 Figure 1: Full Mesh IBGP

   In ASX there are three IBGP speakers (routers RTR-A, RTR-B and
   RTR-C).  With the existing BGP model, if RTR-A receives an external
   route and it is selected as the best path, it must advertise the
   external route to both RTR-B and RTR-C.  RTR-B and RTR-C (as IBGP
   speakers) will not re-advertise these IBGP learned routes to other
   IBGP speakers.
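   As a rough illustration of Figures 1 and 2, the following Python
   sketch counts full-mesh sessions with the n*(n-1)/2 formula from
   Section 1 and models which speakers learn an external route received
   at RTR-A.  It is a simplified model under stated assumptions, not a
   BGP implementation: the function names are hypothetical, best-path
   selection and the client/non-client distinction are ignored, and a
   speaker listed as a reflector simply re-advertises IBGP learned
   routes to all of its IBGP peers.

```python
from itertools import combinations


def full_mesh_sessions(n: int) -> int:
    """IBGP sessions needed to fully mesh n speakers: n*(n-1)/2."""
    return n * (n - 1) // 2


def single_rr_sessions(n: int) -> int:
    """Sessions when one RR serves the other n-1 speakers as clients
    that are not meshed with each other."""
    return max(n - 1, 0)


def propagate(sessions, origin, reflectors=frozenset()):
    """Return the set of speakers that learn a route received externally
    at `origin`.

    Hypothetical, simplified model for illustration only:
    - `origin` advertises the route to all of its IBGP peers.
    - A speaker that learned the route over IBGP does not re-advertise
      it, unless it is listed in `reflectors`; a reflector re-advertises
      it to all of its IBGP peers (no client/non-client distinction).
    """
    # Build an undirected adjacency map from the list of IBGP sessions.
    peers = {}
    for a, b in sessions:
        peers.setdefault(a, set()).add(b)
        peers.setdefault(b, set()).add(a)

    learned = {origin}
    pending = [origin]
    while pending:
        speaker = pending.pop()
        if speaker != origin and speaker not in reflectors:
            continue  # conventional IBGP: IBGP-learned routes are not re-advertised
        for peer in peers.get(speaker, set()):
            if peer not in learned:
                learned.add(peer)
                pending.append(peer)
    return learned


if __name__ == "__main__":
    routers = ["RTR-A", "RTR-B", "RTR-C"]
    figure1 = set(combinations(routers, 2))             # full mesh
    figure2 = {("RTR-A", "RTR-C"), ("RTR-B", "RTR-C")}  # no A-B session

    print(full_mesh_sessions(3), len(figure1))                # 3 3
    print(full_mesh_sessions(100), single_rr_sessions(100))   # 4950 99
    print(sorted(propagate(figure1, "RTR-A")))                # ['RTR-A', 'RTR-B', 'RTR-C']
    print(sorted(propagate(figure2, "RTR-A")))                # ['RTR-A', 'RTR-C']
    print(sorted(propagate(figure2, "RTR-A", {"RTR-C"})))     # ['RTR-A', 'RTR-B', 'RTR-C']
```

   With the RTR-A/RTR-B session removed, RTR-B learns the route only
   when RTR-C is permitted to reflect it, which is exactly the
   relaxation of the conventional IBGP rule that Route Reflection makes.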

   If this rule is relaxed and RTR-C is allowed to advertise IBGP
   learned routes to IBGP peers, then it could re-advertise (or reflect)
   the IBGP routes learned from RTR-A to RTR-B and vice versa.  This
   would eliminate the need for the IBGP session between RTR-A and RTR-B
   as shown in Figure 2 below.

             +-------+        +-------+
             |       |        |       |
             | RTR-A |        | RTR-B |
             |       |        |       |
             +-------+        +-------+
                    \          /
               IBGP  \   ASX  /  IBGP
                      \      /
                      +-------+
                      |       |
                      | RTR-C |
                      |       |
                      +-------+

              Figure 2: Route Reflection IBGP

   The Route Reflection scheme is based upon this basic principle.

4. Terminology and Concepts

   We use the term "Route Reflection" to describe the operation of a BGP
   speaker advertising an IBGP learned route to another IBGP peer.  Such
   a BGP speaker is said to be a "Route Reflector" (RR), and such a
   route is said to be a reflected route.

   The internal peers of an RR are divided into two groups:

      1) Client Peers

      2) Non-Client Peers

   An RR reflects routes between these groups, and may reflect routes
   among client peers.  An RR along with its client peers forms a
   Cluster.  The Non-Client peers must be fully meshed, but the Client
   peers need not be fully meshed.  Figure 3 depicts a simple example
   outlining the basic RR components using the terminology noted above.

             / - - - - - - - - - - - - - - - -
            |             Cluster             |
              +-------+         +-------+
            | |       |         |       |     |
              | RTR-A |         | RTR-B |
            | |Client |         |Client |     |
              +-------+         +-------+
            |        \           /            |
              IBGP    \         /  IBGP
            |          \       /              |
                       +-------+
            |          |       |              |
                       | RTR-C |
            |          |  RR   |              |
                       +-------+
            |          /       \              |
             - - - - -/- - - - -\- - - - - - /
              IBGP   /           \  IBGP
              +-------+         +-------+
              | RTR-D |  IBGP   | RTR-E |
              |  Non- |---------|  Non- |
              |Client |         |Client |
              +-------+         +-------+

                   Figure 3: RR Components

5. Operation

   When an RR receives a route from an IBGP peer, it selects the best
   path based on its path selection rule.
   After the best path is
   selected, it must do the following depending on the type of the peer
   it is receiving the best path from:

      1) A Route from a Non-Client IBGP peer

            Reflect to all the Clients.

      2) A Route from a Client peer

            Reflect to all the Non-Client peers and also to the
            Client peers.  (Hence the Client peers are not required
            to be fully meshed.)

   An Autonomous System could have many RRs.  An RR treats other RRs
   just like any other internal BGP speakers.  An RR could be configured
   to have other RRs in a Client group or Non-Client group.

   In a simple configuration the backbone could be divided into many
   clusters.  Each RR would be configured with other RRs as Non-Client
   peers (thus all the RRs will be fully meshed).  The Clients will be
   configured to maintain IBGP sessions only with the RR in their
   cluster.  Due to route reflection, all the IBGP speakers will receive
   reflected routing information.

   It is possible in an Autonomous System to have BGP speakers that do
   not understand the concept of Route Reflectors (let us call them
   conventional BGP speakers).  The Route Reflection scheme allows such
   conventional BGP speakers to co-exist.  Conventional BGP speakers
   could be members of either a Non-Client group or a Client group.
   This allows for an easy and gradual migration from the current IBGP
   model to the Route Reflection model.  One could start creating
   clusters by configuring a single router as the designated RR and
   configuring other RRs and their clients as normal IBGP peers.
   Additional clusters can be created gradually.

6. Redundant RRs

   Usually a cluster of clients will have a single RR.  In that case,
   the cluster will be identified by the ROUTER_ID of the RR.
   However, this
   represents a single point of failure, so to make it possible to have
   multiple RRs in the same cluster, all RRs in the same cluster can be
   configured with a 4-byte CLUSTER_ID so that an RR can discard routes
   from other RRs in the same cluster.

7. Avoiding Routing Information Loops

   When a route is reflected, it is possible through mis-configuration
   to form route re-distribution loops.  The Route Reflection method
   defines the following attributes to detect and avoid routing
   information loops:

   ORIGINATOR_ID

   ORIGINATOR_ID is a new optional, non-transitive BGP attribute of Type
   code 9.  This attribute is 4 bytes long and it will be created by an
   RR in reflecting a route.  This attribute will carry the ROUTER_ID of
   the originator of the route in the local AS.  A BGP speaker should
   not create an ORIGINATOR_ID attribute if one already exists.  A
   router which recognizes the ORIGINATOR_ID attribute should ignore a
   route received with its ROUTER_ID as the ORIGINATOR_ID.

   CLUSTER_LIST

   CLUSTER_LIST is a new optional, non-transitive BGP attribute of Type
   code 10.  It is a sequence of CLUSTER_ID values representing the
   reflection path that the route has passed through.  It is encoded as
   follows:

     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Attr. Flags  |Attr. Type Code|    Length     |  value ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where Length is the number of octets.

   When an RR reflects a route, it must prepend the local CLUSTER_ID to
   the CLUSTER_LIST.  If the CLUSTER_LIST is empty, it must create a new
   one.  Using this attribute an RR can identify if the routing
   information is looped back to the same cluster due to
   mis-configuration.  If the local CLUSTER_ID is found in the
   CLUSTER_LIST, the advertisement received should be ignored.

8. Implementation Considerations

   Care should be taken to make sure that none of the BGP path
   attributes defined above can be modified through configuration when
   exchanging internal routing information between RRs and Clients and
   Non-Clients.  Their modification could potentially result in routing
   loops.

   In addition, when an RR reflects a route, it should not modify the
   following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.
   Their modification could potentially result in routing loops.

9. Configuration and Deployment Considerations

   The BGP protocol provides no way for a Client to identify itself
   dynamically as a Client of an RR.  The simplest way to achieve this
   is by manual configuration.

   One of the key components of the route reflection approach in
   addressing the scaling issue is that the RR summarizes routing
   information and only reflects its best path.

   Both MEDs and IGP metrics may impact the BGP route selection.
   Because MEDs are not always comparable and the IGP metric may differ
   for each router, with certain route reflection topologies the route
   reflection approach may not yield the same route selection result as
   that of the full IBGP mesh approach.  A way to make route selection
   the same as it would be with the full IBGP mesh approach is to make
   sure that route reflectors are never forced to perform the BGP route
   selection based on IGP metrics which are significantly different from
   the IGP metrics of their clients, or based on incomparable MEDs.  The
   former can be achieved by configuring the intra-cluster IGP metrics
   to be better than the inter-cluster IGP metrics, and maintaining a
   full mesh within the cluster.  The latter can be achieved by:

      o  setting the local preference of a route at the border router
         to reflect the MED values.
      o  or by making sure the AS-path lengths from different ASs are
         different when the AS-path length is used as a route
         selection criterion.

      o  or by configuring community-based policies with which the
         reflector can decide on the best route.

   One could argue, though, that the latter requirement is overly
   restrictive, and perhaps impractical in some cases.  One could
   further argue that as long as there are no routing loops, there are
   no compelling reasons to force route selection with route reflectors
   to be the same as it would be with the full IBGP mesh approach.

   To prevent routing loops and maintain a consistent routing view, it
   is essential that the network topology be carefully considered in
   designing a route reflection topology.  In general, the route
   reflection topology should be congruent with the network topology
   when there exist multiple paths for a prefix.  One commonly used
   approach is POP-based reflection, in which each POP maintains its own
   route reflectors serving clients in the POP, and all route reflectors
   are fully meshed.  In addition, clients of the reflectors in each POP
   are often fully meshed for the purpose of optimal intra-POP routing,
   and the intra-POP IGP metrics are configured to be better than the
   inter-POP IGP metrics.

10. Security

   This extension to BGP does not change the underlying security issues
   inherent in the existing IBGP [5].

11. Acknowledgments

   The authors would like to thank Dennis Ferguson, John Scudder, Paul
   Traina and Tony Li for the many discussions resulting in this work.
   This idea was developed from an earlier discussion between Tony Li
   and Dimitri Haskin.

   In addition, the authors would like to acknowledge valuable review
   and suggestions from Yakov Rekhter on this document, and helpful
   comments from Tony Li, Rohit Dube, and John Scudder on Section 9, and
   from Bruce Cole.

12. Appendix: Comparison with RFC 1966

   Several terms related to route reflection are clarified, and
   references to EBGP routes/peers are removed.

   The handling of a routing information loop (due to route reflection)
   by a receiver is clarified and made more consistent.

   The addition of a CLUSTER_ID to the CLUSTER_LIST has been changed
   from "append" to "prepend" to reflect the deployed code.

   The section on "Configuration and Deployment Considerations" has been
   expanded to address several operational issues.

13. References

   [1] Rekhter, Y., and Li, T., "A Border Gateway Protocol 4 (BGP-4)",
       RFC1771, March 1995.

   [2] Haskin, D., "A BGP/IDRP Route Server alternative to a full mesh
       routing", RFC1863, October 1995.

   [3] Traina, P., "Limited Autonomous System Confederations for BGP",
       RFC1965, June 1996.

   [4] Bates, T., and Chandra, R., "BGP Route Reflection An alternative
       to full mesh IBGP", RFC1966, June 1996.

   [5] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
       Signature Option", RFC2385, August 1998.

14. Authors' Addresses

   Tony Bates
   Cisco Systems, Inc.
   170 West Tasman Drive
   San Jose, CA 95134

   email: tbates@cisco.com

   Ravi Chandra
   Siara Systems, Inc.
   1195 Borregas Ave.
   Sunnyvale, CA 94089

   email: rchandra@siara.com

   Enke Chen
   Siara Systems, Inc.
   1195 Borregas Ave.
   Sunnyvale, CA 94089

   email: enke@siara.com