IDR Working Group                                              R. Raszuk
Internet-Draft                                                   NTT MCL
Intended status: Standards Track                               C. Cassar
Expires: June 7, 2013                                      Cisco Systems
                                                                 E. Aman
                                                             TeliaSonera
                                                             B. Decraene
                                                          France Telecom
                                                            S. Litkowski
                                                                  Orange
                                                        December 4, 2012


                 BGP Optimal Route Reflection (BGP-ORR)
             draft-ietf-idr-bgp-optimal-route-reflection-04

Abstract

[RFC4456] asserts that, because the Interior Gateway Protocol (IGP)
cost to a given point in the network will vary across routers, "the
route reflection approach may not yield the same route selection
result as that of the full IBGP mesh approach."  One practical
implication of this assertion is that the deployment of route
reflection may thwart the ability to achieve hot potato routing.  Hot
potato routing attempts to direct traffic to the closest AS egress
point in cases where no higher priority policy dictates otherwise.
As a consequence of the route reflection method, the choice of exit
point for a route reflector and its clients will be the egress point
closest to the route reflector, and not necessarily closest to the
RR clients.

Section 11 of [RFC4456] describes a deployment approach and a set of
constraints which, if satisfied, would result in the deployment of
route reflection yielding the same results as the iBGP full mesh
approach.  Such a deployment approach would make route reflection
compatible with the application of hot potato routing policy.

As networks evolved to accommodate architectural requirements of new
services, tunneled (LSP/IP tunneling) networks with centralized route
reflectors became commonplace.  This is one type of common deployment
where it would be impractical to satisfy the constraints described in
Section 11 of [RFC4456].  Yet, in such an environment, hot potato
routing policy remains desirable.

This document proposes two new solutions which can be deployed to
facilitate the application of closest exit point policy in
centralized route reflection deployments.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current
Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on June 7, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Proposed solutions
   3.  Best path selection for BGP hot potato routing from
       customized IGP network position
     3.1.  Client's perspective best path selection algorithm
       3.1.1.  Flat IGP network
       3.1.2.  Hierarchical IGP network
     3.2.  Aside: Configuration-based flexible route reflector
           placement
     3.3.  Route reflector client grouping
       3.3.1.  Route Reflector Client Group ID
     3.4.  Discussion
     3.5.  Advantages
   4.  Angular distance approximation for BGP warm potato routing
     4.1.  Problem statement
     4.2.  Proposed solution
     4.3.  Centralized vs distributed route reflectors
   5.  Client's perspective policy based best path selection
     5.1.  Proposal
     5.2.  Example
     5.3.  Avoiding routing loops
   6.  Deployment considerations
   7.  Security considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

There are three types of BGP deployments within Autonomous Systems
today: full mesh, confederations and route reflection.

BGP route reflection is the most popular way to distribute BGP routes
between BGP speakers belonging to the same administrative domain.
Traditionally route reflectors have been deployed in the forwarding
path and carefully placed on the POP to core boundaries.  That model
of BGP route reflector placement has started to evolve.
The placement of route reflectors outside the forwarding path was
triggered by applications which required traffic to be tunneled from
AS ingress PE to egress PE, for example L3VPN.

This evolving model of intra-domain network design has enabled
deployments of centralized route reflectors.  Initially this model
was only employed for new address families, e.g. L3VPNs, L2VPNs etc.

With edge to edge MPLS or IP encapsulation also being used to carry
internet traffic, this model has been gradually extended to other BGP
address families including IPv4 and IPv6 Internet routing.  This is
also applicable to new services achieved with BGP as the control
plane, for example 6PE.

Such centralized route reflectors can be placed on the POP to core
boundaries, but they are often placed in arbitrary locations in the
core of large networks.

Such deployments suffer from a critical drawback in the context of
best path selection.  A route reflector with knowledge of multiple
paths for a given prefix will pick the best path and only advertise
that best path to the route reflector clients.  If the best path
for a prefix is selected on the basis of an IGP tie break, the best
path advertised from the route reflector to its clients will be the
exit point closest to the route reflector.  But route reflector
clients will be in a place in the network topology which is different
from the route reflector.  In networks with centralized route
reflectors, this difference will be even more acute.  It follows that
the best path chosen by the route reflector is not necessarily the
same as the path which would have been chosen by the client if the
client considered the same set of candidate paths as the route
reflector.  Furthermore, the path chosen by the client might have
been a better path than that chosen by the route reflector for
traffic entering the network at the client.
The path chosen by the client would have guaranteed the lowest cost
and delay trajectory through the network.

Route reflector clients switch packets using routing information
learnt from route reflectors which are not on the forwarding path of
the packet through the network, even in the absence of end-to-end
encapsulation.  In those cases the path chosen as best and propagated
to the clients will often not be the optimal path chosen by the
client given all available paths.

Eliminating the IGP distance to the BGP nexthop as a tie breaker on
centralized route reflectors does not address the issue.  Ignoring
IGP distance to the BGP nexthop results in the tie breaking
procedure selecting the best path by differentiating between paths
using attributes otherwise considered less important than IGP cost to
the BGP nexthop.

One possible valid solution or workaround to this problem requires
sending all domain external paths from the RR to all its clients.
This approach suffers the significant drawback of pushing a large
amount of BGP state to all the edge routers.  In many networks, the
number of EBGP peers over which full Internet routing information is
received would correlate directly to the number of paths present in
each ASBR.  This could easily result in tens of paths for each
prefix.

Notwithstanding this drawback, there are a number of reasons for
sending more than just the single best path to the clients.  Improved
path diversity at the edge is a requirement for fast connectivity
restoration, and a requirement for effective BGP level load
balancing.  Protocol extensions like add-paths
[I-D.ietf-idr-add-paths] or diverse-path [RFC6774] allow for such
improved path diversity and can be used to address the same problems
addressed by the mechanisms proposed in this draft.

In practical terms, add/diverse path deployments are expected to
result in the distribution of 2, 3 or n (where n is a small number)
'good' paths rather than all domain external paths.  While the route
reflector chooses one set of n paths and distributes those same n
paths to all its route reflector clients, those n paths may not be
the right n paths for all clients.  In the context of the problem
described above, those n paths will not necessarily include the
closest egress point out of the network for each route reflector
client.  The mechanisms proposed in this document are likely to be
complementary to mechanisms aimed at improving path diversity.

2.  Proposed solutions

This document proposes two simple solutions to the problem described
above.  Both of these solutions make it possible for route reflector
clients to direct traffic to their closest exit point in hot potato
routing deployments, without requiring further state to be pushed out
to the edge.  These solutions are primarily applicable in deployments
using centralized route reflectors, which are typically implemented
in devices without a capable forwarding plane.

The two alternatives are:

   "Best path selection for BGP hot potato routing from client's IGP
   network position"

   "Angular distance approximation for BGP warm potato routing"

Both solutions rely upon all route reflectors learning all paths
which are eligible for consideration for hot potato routing.  In
order to satisfy this requirement, path diversity enhancing
mechanisms such as add paths/diverse paths may need to be deployed
between route reflectors.

In both of these solutions the route reflector selects and
distributes a route to each client based on what would be optimal
from the client's perspective.
By optimal we refer in this document to the decision made during best
path selection at the IGP metric to BGP nexthop comparison step.
Clearly, the overall path selection preference may be decided at
another policy step, in which case the provisions defined in this
document would not apply.

In the respective solutions the choice is made either factoring in
IGP costs or the configured angular distance to the nexthop.  The
route reflector makes different decisions for different clients only
in the case where the tie breaker for path selection would have been
the IGP distance to the BGP nexthop (as in hot potato routing).

A significant advantage of this approach is that the RR clients do
not need to run new software or hardware.

Besides these solutions to manage hot potato routing, there are
deployment scenarios where service providers want to have more
control of traffic exiting the AS by assigning per client preference
to gateways.

This document proposes to introduce a solution to perform policy
based route reflection to address those scenarios.  This solution has
the same requirements (regarding path diversity) and advantages as
the two IGP metric based solutions.

3.  Best path selection for BGP hot potato routing from customized IGP
    network position

This section describes a method for calculating the order of
preference of BGP paths from the point of view of each separate route
reflector client.  More specifically, the route reflector will
compute the IGP metric to the BGP nexthop from the position of the
client to which the resulting path will be distributed, if the IGP
metric is the tie breaker applied to a set of possible paths.  In a
subsequent model the authors propose virtual route reflector
placement at an operator-selected IGP location.
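
The per-client tie break described above can be sketched as follows.
This is a minimal illustration, not an implementation from this
document: the decision process is reduced to two attributes, and the
names Path, best_path_for_client and igp_metric, as well as all metric
values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    nexthop: str
    local_pref: int = 100   # higher is preferred
    as_path_len: int = 1    # shorter is preferred


def best_path_for_client(paths, client, igp_metric):
    """Return the path a route reflector would advertise to `client`.

    igp_metric[(node, nexthop)] is the IGP cost from node to nexthop.
    Only two pre-IGP steps are modeled; a real decision process has more.
    """
    # Steps before the IGP comparison give the same result for every client.
    top = max(paths, key=lambda p: (p.local_pref, -p.as_path_len))
    tied = [p for p in paths
            if (p.local_pref, p.as_path_len) == (top.local_pref, top.as_path_len)]
    # The remaining tie is broken from the client's position in the IGP,
    # not from the route reflector's position.
    return min(tied, key=lambda p: igp_metric[(client, p.nexthop)])


# Hypothetical metrics: the reflector is closest to N1, the clients are not.
paths = [Path("X/Y", "N1"), Path("X/Y", "N4"), Path("X/Y", "N5")]
igp_metric = {
    ("RR", "N1"): 10, ("RR", "N4"): 30, ("RR", "N5"): 30,
    ("A", "N1"): 40, ("A", "N4"): 25, ("A", "N5"): 5,
    ("B", "N1"): 40, ("B", "N4"): 5, ("B", "N5"): 25,
}
```

With these assumed metrics, selecting from the reflector's own
position ("RR") yields N1 for every client, whereas selecting per
client yields N5 for client A and N4 for client B.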

In the case of a hierarchical IGP deployment where the client is in a
different level in the hierarchy to the route reflector, the route
reflector will compute IGP distance to the BGP nexthop from the Area
Border Routers (ABR) leading to the client in lieu of the route
reflector client itself, and use the shortest distance from these
ABRs to the nexthop.  This provides an approximation to the desired
functionality.  Rather than a client picking the closest path, the
client would be picking the exit point closest to the client region
as defined by area or level.  In cases where one or more nexthops are
in the same region as the client, one of those nexthops would be
preferred, with tie breaking within those nexthops performed from the
route reflector's position in the network.

It is assumed that reachability through a set of ABRs is always
advertised through identical prefixes from those ABRs.  If a nexthop
is reachable through multiple ABRs but the ABRs advertise
reachability through prefixes of different length, then only the ABR
advertising the longest prefix will be considered as a viable path to
the nexthop.

BGP best path selection and its distribution has a natural
consequence of limiting the amount of state in the network.  That is
not in itself a drawback.  BGP speakers will rarely need to receive
all available BGP paths.  In network deployments with multiple
upstream peerings or with very dense peering schemes, the number of
available BGP paths for a given BGP prefix can be high.  Real network
deployments with the number of paths for a prefix ranging from 10s to
100s have been observed.  It would be wasteful to propagate all of
those paths to all clients, such that each client can select paths
according to the position of the nexthop relative to the client.

Whenever a BGP route reflector would need to decide what path or
paths need to be selected for advertisement to one of its clients,
the route reflector would need to virtually position itself in its
client IGP network location in order to choose the right set of paths
based on the IGP metric to the nexthops from the client's
perspective.

This technique applies in deployments with or without diverse paths
or the various path selection modes contemplated in add-paths.

In network architectures consisting of more than a single pair of
route reflectors it is required that all reflectors are fully meshed
and have the ability to learn and maintain all external BGP paths.
In the event of constructing a hierarchy of reflectors to relax the
full RR mesh requirements, ORR should not be run between such route
reflectors.

3.1.  Client's perspective best path selection algorithm

For each centralized route reflector the proposal assumes that the
route reflector participates in a common IGP with its clients.  There
are two scenarios to consider: flat versus hierarchical IGP network.

3.1.1.  Flat IGP network

Reflectors run SPF from the client IGP node point of view such
that the cost of BGP nexthops from the client can be determined if
necessary.  For the purpose of BGP path selection the interesting
product of this calculation is the ability to determine the IGP
distance from a client to a BGP nexthop.  This distance to a
nexthop would be interesting in cases where that nexthop is for a
path which is contending with otherwise equally preferred paths.
This approach works in tunneled as well as conventional hop-by-hop
IP forwarding cores.
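
As an illustration of running SPF with the client as root, the
following sketch uses Dijkstra's algorithm over an adjacency map.  It
is a simplified model under stated assumptions: the topology, node
names and link costs are hypothetical, and spf_distances and
closest_nexthop are illustrative names rather than anything defined by
this document.

```python
import heapq


def spf_distances(graph, root):
    """Dijkstra SPF: IGP cost from `root` to every reachable node.

    graph[u] is a dict {neighbor: link_cost} of outgoing links from u.
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closest_nexthop(graph, root, nexthops):
    """Return the nexthop with the lowest IGP cost from `root`, or None."""
    dist = spf_distances(graph, root)
    reachable = [nh for nh in nexthops if nh in dist]
    return min(reachable, key=lambda nh: dist[nh], default=None)


# Hypothetical flat IGP topology with symmetric link costs.
topology = {
    "RR": {"P1": 1},
    "P1": {"RR": 1, "ASBR1": 2, "P2": 10},
    "P2": {"P1": 10, "ASBR2": 2, "C1": 1},
    "ASBR1": {"P1": 2},
    "ASBR2": {"P2": 2},
    "C1": {"P2": 1},
}
```

In this assumed topology the reflector's own SPF prefers ASBR1 (cost 3
versus 13), while an SPF rooted at client C1 prefers ASBR2 (cost 3
versus 13), which is the path the reflector would advertise to C1.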

When the path selection tie breaker for a prefix is the IGP metric
to the BGP nexthops of the contending paths, then the route
reflector will determine the order of preference of the contending
paths by considering the distance from the client to the path
nexthops in order to decide what path/s to advertise to a client
(or group of clients where feasible).  It should be noted that an
operator may wish to provide a distance tolerance value, such that
beyond a certain granularity, differences between IGP metrics are
invisible to the path selection algorithm.  This will allow a
route reflector some leeway in selecting between paths such that
rather than pick one path over another on the basis of a
difference in distance which is operationally irrelevant, the
route reflector can choose to optimize for update generation
grouping.  Furthermore, this tolerance will reduce the likelihood
of generation of BGP updates when the IGP topology changes in a
way which is not operationally relevant.  In the case that a path
is selected from a set for a given prefix while ignoring
differences in distance within the tolerance figure, then that
same path must always be preferred for all clients where the paths
are within the tolerance figure.

3.1.2.  Hierarchical IGP network

Hierarchy introduces two challenges:

The first challenge is that the RR IGP view may differ from a
client IGP view by virtue of one having a summarized view where the
other does not.  Summarization, by its nature, loses
information.  Consider the example where a client within a PoP
sees two prefixes with two metrics for two egress points within
the PoP, but where the RR only sees a single summary covering
reachability to both nexthops as injected by the ABR.  For
clarification, in the case of IS-IS, ABR refers to an L1/L2 node.
However, it needs to be observed that inter area networks running
LDP are required to disable summarisation of all FECs advertised in
LDP (typically all loopbacks) unless [RFC5283] is deployed.  Such
deployments are not likely to suffer summarization difficulties.

The second challenge is that in cases where the client is in a
different level of hierarchy from the RR, the RR cannot build a
Shortest Path First (SPF) tree with the client node as root,
simply because the topology derived by the IGP will not include
the client node.  It will instead only include reachability to the
client from one or more ABRs.  In order to overcome this problem,
the RR could compute an SPF tree from the ABRs in the area.  The
RR would then determine the shortest distance from a client which
lives behind the ABRs, to a nexthop, by adding the advertised
distances from an ABR to the client and the distance from the ABR
to a nexthop, for each ABR, and picking the minimum.  This assumes
that IGP metrics on links are symmetric; i.e. that the distance
from the ABR to the client or nexthop is equal to the distance
from the client or nexthop to the ABR.

There are cases where the above approach does not help.  If the RR
is trying to arbitrate amongst a set of paths for a client which is
in the same hierarchy as some of those paths, and in a different
hierarchy to the RR, the opaqueness of the region containing the
client at the RR defeats the selection process.  It is impossible
to determine the relative position of the RR client and the paths
within the client region.

The solution for hierarchical IGP networks also assumes that if
RRs are present and are responsible for calculation of BGP best
path to clients, they are either placed in each local area
coinciding with the area containing clients or they are placed in
the core (area 0/level 2) of the network.

3.2.  Aside: Configuration-based flexible route reflector placement

The ability to exploit topology information available in the IGP in
ways described above can also be used to virtually place the RR at
different points in the network for purposes other than hot potato
routing.

A route reflector can be globally configured to "pretend" its logical
location is any one of the other nodes within a given IGP area/level
flooding scope regardless of its physical connectivity.

Such flexibility provides a useful tool for reflector virtualization,
and supports moving or replacing physical route reflectors without
any effect on routing.  Such a change can be permanent or it could be
performed during network maintenance in order to minimize network
impact.

A possible variation would allow the virtual placement of the RR to
be effected on a per-AF or AF plus update/peer group granularity.  It
should be noted that this approach provides for splitting one
centralized route reflector such that it is virtually positioned at
various network locations, with the network location depending upon
the address family or address family plus update/peer group.

Virtual slicing of a centralized route reflector relaxes the need to
propagate all BGP paths between RRs in an alternative, conventional
distributed RR deployment.  It is expected that such RRs would be
deployed in redundant sets, and that those RRs would not need to be
physically collocated, while still benefiting from the possibility of
being logically collocated, and therefore not compromising any of the
best path selection symmetry.

3.3.  Route reflector client grouping

It may be appropriate to allow the operator, or the route reflector
itself, to group clients together using IGP distance between clients
to determine grouping.
All the operations discussed above which relied upon computing best
path for each client, and measuring distances from each client to
different nexthops, would instead be performed for each group of
clients.  Configurable thresholds can be used to determine which IGP
metric changes should be visible to BGP, and trigger best path
recomputation.  The latter would be beneficial in existing BGP RR
code too.

Alternatively, route reflector client grouping could be accomplished
statically by the operator by coloring clients belonging to a common
group (for example being part of the same POP).  In order to
accomplish such marking it is proposed that the BGP OPEN message be
augmented with an optional parameter indicating the Group ID a given
peer belongs to.

3.3.1.  Route Reflector Client Group ID

This is an Optional Parameter in the BGP OPEN message that is used by
a BGP speaker to convey to its route reflectors the Group ID value.
Such a value will allow automatic and predictable peer grouping on
the route reflectors as deemed necessary from the operator's network
architecture.

The parameter contains precisely one set of [Group_ID Code, Group_ID
Length, Group_ID Value] encoded as shown below:

   +----------------------------+
   | Group ID Code (1 octet)    |
   +----------------------------+
   | Group ID Length (1 octet)  |
   +----------------------------+
   | Group ID Value (4 octets)  |
   +----------------------------+

The use and meaning of these fields are as follows:

   Group ID Code:

      Group ID Code is a one octet field that identifies the Group ID
      optional parameter of the BGP OPEN message.  Value TBD by IANA.
      Recommended value: 3.

   Group ID Length:

      Group ID Length is a one octet field that contains the length
      of the Group ID Value field in octets.  It is fixed and equal
      to 4.

   Group ID Value:

      Group ID Value is a fixed length field of size equal to
      four octets that contains the numerical value of the group that
      the given BGP speaker should be part of on the route reflector.

      Two special values are reserved:

         0x00000000 - No grouping preference
         0xFFFFFFFF - Do not group this BGP speaker

      An implementation may allow automatic population of the
      GROUP_ID value using the IGP area identifier.

Route reflectors or EBGP speakers receiving such Group IDs from their
respective BGP peers as part of the BGP OPEN procedure MAY use them
when constructing update or peer groups in addition to any of the
existing grouping mechanisms already available.  An implementation
may allow the operator to explicitly allow or disallow honoring such
grouping or provide means for manual override via explicit
configuration.

3.4.  Discussion

This is not the first instance where a router participating in an IGP
is required to build the SPF tree using a root other than itself.
Determination of loop free alternate paths as described in [RFC5714]
is one such example.

Determining the shortest path and associated cost between any two
arbitrary points in a network based on the IGP topology learned by a
router is expected to add some extra cost in terms of CPU resource.
However, SPF tree generation code is now implemented efficiently in a
number of implementations, and therefore this is not expected to be a
major drawback.  The number of SPTs computed in the general
non-hierarchical case is expected to be of the order of the number of
clients of an RR whenever a topology change is detected.  Advanced
optimizations like partial and incremental SPF may also be exploited.
By the nature of route reflection, the number of clients can be split
arbitrarily by the deployment of more route reflectors for a given
number of clients.
While this is not expected to be necessary in existing networks with
best in class route reflectors available today, this avenue to
scaling up the route reflection infrastructure would be available.
If we consider the overall network wide cost/benefit factor, the only
alternative to achieve the same level of optimality would require
significantly increasing state on the edges of the network, which, in
turn, will consume CPU and memory resources on all BGP speakers in
the network.  Building this client perspective into the route
reflectors seems appropriate.

3.5.  Advantages

The solution described provides a model for integrating the client
perspective into the best path computation for RRs.  More
specifically, the choice of BGP path factors in the IGP metric
between the client and the nexthop, rather than the distance from the
RR to the nexthop.  The documented method does not require any BGP or
IGP protocol changes, as the required changes are contained within
the RR implementation.

This solution can be deployed in traditional hop-by-hop forwarding
networks as well as in end-to-end tunneled environments.  In
networks where there are multiple route reflectors and hop-by-hop
forwarding without encapsulation, such optimizations should be
enabled on all route reflectors.  Otherwise clients may receive an
inconsistent view of the network, which in turn may lead to
intra-domain forwarding loops.

With this approach, an ISP can effect a hot potato routing policy
even if route reflection has been moved from the forwarding plane to
the core and hop-by-hop switching has been replaced by end to end
MPLS or IP encapsulation.

As per above, the approach reduces the amount of state which needs to
be pushed to the edge in order to perform hot potato routing.
The memory and CPU resource required at the edge to provide hot
potato routing using this approach is lower than what would be
required in order to achieve the same level of optimality by pushing
and retaining all available paths (potentially tens) for each prefix
at the edge.

The proposal allows for a fast and safe transition to BGP control
plane route reflection without compromising an operator's closest
exit operational principle.  Hot potato routing is important to most
ISPs.  The inability to perform hot potato routing effectively stops
migrations to centralized route reflection and edge-to-edge LSP/IP
encapsulation for traffic to IPv4 and IPv6 prefixes.

4.  Angular distance approximation for BGP warm potato routing

This section describes an alternative solution to the use of IGP
topology information to virtually position the RR at the client
location in the network.  This solution involves modeling the network
topology as a set of elements (regions, PoPs or routers) arranged in
a circle.  Route reflector clients and inter-domain exit points would
then be statically assigned to those elements such that one can
compute the angular distance between route reflector clients and the
various exit points in order to infer the distance between any two
elements.  This measure of distance can be used as an effective
alternative to the IGP distance as a tie breaker in the path
selection algorithm if necessary.

4.1.  Problem statement

This solution addresses the problem described in earlier sections,
while attempting to minimize computational overhead.
   The aim of the proposed solution is to enable a route reflector to
   provide a route reflector client with an exit point for a prefix
   which is 'closest' to the client rather than to the route reflector,
   without having to distribute all paths to that client or derive each
   client's view of the network topology. The measure of closest is
   based on a simplistic description of network topology provided by
   the operator.

   Consider the following example of an ISP network topology drawn to
   reflect the location of the nodes and POPs:

              N4   POP4

           CLIENT B
             POP4                      POP1   N1

                        CORE
                        RR(s)          POP2   N2

         N5   POP3                     POP2   N3

           CLIENT A
             POP3

   N - represents the different exit points for a given prefix. POP2
   is a geographically large PoP with two paths: N2 and N3.

   In a deployment where the centralized RRs tie break on the basis of
   their IGP-based view of the network, N1 above would be advertised to
   all clients on the basis that it is closest to the RR. Path N4
   would be a more appropriate choice for client B. Similarly, N5
   would be more appropriate for client A, since path N5 is closer to
   client A than path N1.

4.2. Proposed solution

   The proposed solution revolves around the operator establishing the
   angular position of the route reflector clients and inter-domain
   exit points in the network. The route reflector then picks the path
   to advertise to a client based on the client's angular position
   versus the angular position of the inter-domain exit points
   originating the paths. The operator can choose the granularity of
   angular position appropriate to the desired goals. The granularity
   trades operator overhead against routing optimality: a coarser
   angular position is cheaper to maintain but yields less optimal
   routing. The finest granularity possible is the relative position
   of originating clients.
   Note that this solution is independent of the actual IGP link
   metrics and the resulting topology of the network.

   It can be shown that for each network topology, elements such as AS
   exit points can be mapped onto a circle. By putting POPs, regions
   or individual clients onto the hypothetical circle we can identify
   an angular location for each element relative to some fixed
   direction, for example by defining the angular north of the circle
   at 0 degrees.

   The angular position of elements in the network can be conveyed to a
   route reflector in a number of ways:

   o  Assignment of the angular position of each RR client through
      configuration on the route reflector itself (per-client
      configuration on the RR).

   o  Assignment of the angular position of an RR client at each
      client, then propagating it to the RRs.

   The proposed angular distance approximation is compatible with both
   flat and hierarchical IGP deployments.

   In the example illustrated above the route reflector might learn or
   be configured with the following set of paths and corresponding
   angular positions:

      Prefix X/Y      N1    N2    N3    N4    N5

      Location
      in degrees      60    85   120   290   260

   If the absolute angular positions of clients A and B were as
   follows:

      Client A: 260 degrees

      Client B: 290 degrees

   Then the corresponding angular distances for those clients versus
   the exit points can be calculated as follows:

      Prefix X/Y      N1    N2    N3    N4    N5

      Client A       200   175   140    30     0

      Client B       230   205   170     0    30

   With an RR running the BGP best path algorithm modified to use the
   angular distance from the client to the nexthops, rather than its
   IGP distance to the nexthops, as tie breaker, each client is
   provided with its closest path, with the measure of closeness
   reflecting the angular position as configured by the operator.
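
   The calculation in the example above can be sketched as follows.
   This is a minimal illustration, not a prescribed implementation:
   the function and variable names are hypothetical, and the tie-break
   by exit name exists only to keep the output deterministic. The
   default mode uses the plain difference of positions, which
   reproduces the table above; the optional "wrap" mode, which
   measures the shorter arc around the circle, is an assumption of
   this sketch.

```python
# Illustrative sketch only; all names here are hypothetical.

# Angular positions (degrees) of the exit points for prefix X/Y,
# taken from the example table.
EXIT_POSITIONS = {"N1": 60, "N2": 85, "N3": 120, "N4": 290, "N5": 260}

# Angular positions of the route reflector clients from the example.
CLIENT_POSITIONS = {"A": 260, "B": 290}


def angular_distance(a, b, wrap=False):
    """Distance in degrees between two angular positions.

    With wrap=False this is the plain absolute difference, which
    reproduces the example table.  wrap=True measures the shorter arc
    around the circle; that variant is an assumption of this sketch.
    """
    diff = abs(a - b) % 360
    return min(diff, 360 - diff) if wrap else diff


def closest_exit(client_pos, exits, wrap=False):
    """Return the exit point with the smallest angular distance.

    Ties are broken by exit name purely for determinism; a real
    implementation would continue with the next step of the BGP
    decision process instead.
    """
    return min(
        exits,
        key=lambda name: (angular_distance(client_pos, exits[name], wrap), name),
    )


for client, pos in CLIENT_POSITIONS.items():
    row = {n: angular_distance(pos, p) for n, p in EXIT_POSITIONS.items()}
    print(client, row, "->", closest_exit(pos, EXIT_POSITIONS))
```

   Under these assumptions client A is given the path via N5 and
   client B the path via N4, matching the zero entries in the table.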
   The model used by the operator to determine the angular position of
   a client or exit point might involve grouping elements together by
   region or PoP, or might involve no grouping at all.
   Implementations should allow the operator to pick the appropriate
   granularity.

4.3. Centralized vs distributed route reflectors

   In an environment where the RR clusters are distributed (yet
   centralized enough to make hot potato routing hard), and each RR
   cluster serves a subset of clients, it becomes necessary to
   propagate the angular position of the clients between route
   reflectors. This can be achieved as follows:

   o  Deploy add-paths between route reflectors in order to maximize
      path diversity within the cluster.

   o  A non-AS-transitive BGP community of type (TBA by IANA) can be
      used to encode and propagate the angular position of a client, a
      value between 0 and 359. This community is only relevant to the
      route reflectors of a given BGP domain and should be stripped
      either at the ASBR boundary or when propagating updates to BGP
      peers which are not route reflectors.

   o  The angular position marking could also be added by clients and
      advertised to the route reflector. This would require some
      configuration effort.

5. Client's perspective policy based best path selection

   There are some deployment scenarios where a service provider wants
   stronger control over traffic exiting the AS (for capacity
   planning) than hot potato routing based on IGP metric provides.

         |     |     |     |
         |     |     |     |
        GW1   GW2   GW3   GW4

            RR1         RR2

          R1     R2     R3

   Considering the figure above, all gateways have iBGP sessions to RR1
   and RR2, and R1, R2 and R3 have iBGP sessions to RR1 and RR2 as
   well. Gateway routers are meshed to an external network (for
   example, a transit service provider).
   We would like to achieve strong control over the gateway used
   (primary and backup) for each router (or each set of routers) in the
   network, taking into account that routers do not support add-paths.
   For example, R1 using GW1 as primary and GW2 as backup; R2 using GW2
   as primary and GW3 as backup; R3 using GW3 as primary and GW4 as
   backup.

   Today, a prefix P1 is received on each gateway from the external
   network. Each gateway sends the prefix to both route reflectors.
   Each route reflector receives four paths for P1 and chooses the best
   one based on its own decision process. Note that RR1 and RR2 may
   choose different paths as best. Each route reflector sends its best
   path towards R1, R2 and R3. Each router receives the same paths
   from the route reflectors for P1 (at most, only two gateways are
   visible from the Rx routers). The default behavior therefore does
   not fit our requirements in terms of traffic flows.

   Using currently available BGP mechanisms, we could meet our
   requirements with two solutions:

   o  Modify the BGP meshing: for example, R1 meshed directly to GW1
      and GW2 with inbound policies applied on R1; R2 meshed directly
      to GW2 and GW3 with inbound policies applied on R2; and so on.

   o  Add more route reflectors (one RR per gateway used as primary),
      apply inbound policies on the RRs to make each RR choose a
      different primary gateway, and apply policies on the routers so
      that each selects its own primary gateway.

   These solutions have significant drawbacks: the first is not
   flexible (re-meshing is needed whenever the gateway of a router is
   changed), and the second requires a lot of CAPEX.

   We would like to introduce a solution where a single currently
   deployed route reflector chassis may take a different best path
   decision for different sets of clients based on preferences.
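
   The desired behavior can be sketched as follows. This is only an
   illustration of the requirement stated above, not of any specified
   mechanism: the data structures, the function name and the assumed
   default best path are all hypothetical.

```python
# Illustrative sketch only; data structures and names are hypothetical.

# Gateways from which the route reflector currently holds a path for P1.
AVAILABLE_PATHS = {"GW1", "GW2", "GW3", "GW4"}

# Operator-defined gateway preference per router (primary first),
# matching the example in the text.
PREFERENCES = {
    "R1": ["GW1", "GW2"],
    "R2": ["GW2", "GW3"],
    "R3": ["GW3", "GW4"],
}

# Result of the reflector's ordinary decision process (assumed here).
DEFAULT_BEST = "GW1"


def best_gateway(router, available, preferences, default_best):
    """Pick the path to advertise to one router.

    The first preferred gateway that still has a path wins.  If the
    router has no preference, or none of its preferred gateways has a
    path, fall back to the reflector's ordinary best path.
    """
    for gateway in preferences.get(router, []):
        if gateway in available:
            return gateway
    return default_best


for router in ("R1", "R2", "R3"):
    choice = best_gateway(router, AVAILABLE_PATHS, PREFERENCES, DEFAULT_BEST)
    print(router, "->", choice)

# If GW1 withdraws P1, R1 moves to its backup gateway.
remaining = AVAILABLE_PATHS - {"GW1"}
print("R1 ->", best_gateway("R1", remaining, PREFERENCES, "GW2"))
```

   In this sketch a single reflector returns a different path per
   router, and a router falls back to its backup gateway only when the
   primary no longer offers a path.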
   It should be noted that in simple scenarios (for example, two RRs
   and two gateways), RFC6774 would be able to fulfill service provider
   needs. The solution proposed here handles more complex scenarios
   and allows a fine-grained gateway choice per client or group of
   clients.

5.1. Proposal

   Our proposal is to reuse the concept introduced in
   [I-D.ietf-idr-ix-bgp-route-server] in an iBGP context. To perform
   per-client best path selection, the router should maintain a
   per-client BGP Loc-RIB (or Adj-RIB-Out) associated with inbound
   policies implemented between the Adj-RIB-In and the client Loc-RIB.

   It would not be very scalable to use a per-client policy
   (considering hundreds of peers on a route reflector). Therefore,
   our proposal is to group clients sharing common policies into a
   client group to minimize computation and memory overhead. Client
   grouping could be done statically (by configuration) or dynamically
   using the solution described in section 3.3.1 of this document.
   Client grouping would be performed with per-AFI/SAFI granularity, as
   the gateway/client mapping may change in each AFI/SAFI context. A
   route reflector should be able to implement multiple client groups
   (with associated inbound policies) as well as a default client group
   for clients that do not require any specific policy decision; in
   this case, the overall BGP best path computation would be used.

5.2. Example

        GW1  GW2  GW3
          \   |   /
           \  |  /
             RR1
           /  |  \
         R1   R2   R3

   In the above figure GW1, GW2, GW3 and R3 are standard iBGP route
   reflector clients. R1 and R2 want to use a special gateway
   combination (primary GW3, backup GW2, last resort GW1). R1 and R2
   are configured in a specific client group CG1 on the route
   reflector, while other peers are in the default client group.
   CG1 is associated with a policy achieving the expected GW preference
   for R1 and R2, and leaving other paths unchanged.

   All routes received by RR1 (eBGP, iBGP, iBGP RR client, iBGP RR
   client routing context) must be evaluated both by the overall BGP
   best path computation and within each client group, where the client
   group policy determines whether the route is accepted for evaluation
   by that group's decision process.

   o  Paths from GW1, GW2, GW3 are compared within the default client
      group, leading to one GW (for example GW1) being selected as best
      and installed in the global Loc-RIB. The GW1 path will be
      advertised to GW2, GW3 and R3 as they are in the default CG. In
      CG1, the preference of GW paths has been modified, leading to GW3
      being the best path and installed in the client group Loc-RIB.
      The GW3 path will be advertised to R1 and R2, as R1 and R2 are
      part of CG1.

   o  Paths from R3 are compared within the default client group and
      advertised to GW1, GW2, GW3. Those paths are also compared
      within CG1 (as accepted by policy) and advertised to R1 and R2.

   o  Paths from R1 are compared within the default client group and
      advertised to GW1, GW2, GW3 and R3. Those paths are also
      compared within CG1 (as accepted by policy) and advertised to R2.

   o  Paths from R2 are compared within the default client group and
      advertised to GW1, GW2, GW3 and R3. Those paths are also
      compared within CG1 (as accepted by policy) and advertised to R1.

5.3. Avoiding routing loops

   Compared to the IGP approaches described in this document, policy
   based route reflection should be limited to end-to-end encapsulation
   environments to avoid intra-domain forwarding loops. Using
   end-to-end encapsulation permits edge routers to transport the
   traffic to the targeted/preferred ASBR without any loop in the core.
   To avoid the ASBR potentially rerouting traffic back into the core
   (and a possible loop between edge routers and the ASBR), forwarding
   at the ASBR towards the eBGP peer must be enforced. This could be
   done by:

   o  implementing policies on the ASBR to prefer the eBGP path and
      install it in the FIB.

   o  implementing tunneling of traffic up to the outside interface
      (ASBR action to switch to the outside interface).

   The exact choice of encapsulation and of techniques to prevent
   transport loops (including potential loops at gateways) is left to
   the operator, and its specification is outside the scope of this
   document.

6. Deployment considerations

   The solutions are primarily intended for end-to-end tunneled
   environments, i.e. where traffic is label switched or IP tunneled
   across the core. If unencapsulated hop-by-hop forwarding is used,
   either misconfigurations or conflicts between these optimizations
   and classical BGP path selection rules could lead to intra-domain
   forwarding loops. Under certain circumstances the solutions can
   also be deployed without end-to-end tunneling. In particular, best
   path selection based on the client's IGP best path is guaranteed
   not to cause any forwarding loops (other than micro loops
   associated with reconvergence) when deployed in a flat IGP area,
   provided that no distance tolerance value is used so that the path
   choice is truly made on a per-client basis.

   Potential intra-domain forwarding loops at the ASBR level could be
   avoided by enforcing external route preference or by performing a
   tunnel to external interface switching action on ASBRs.

   Regarding the client's IGP best path selection, it should be
   self-evident that this solution does not interfere with policies
   enforced above IGP tie breaking in the BGP best path algorithm.

   The solution applies to NLRIs of all address families which can be
   route reflected.
   It should be noted that customized best path selection per client
   or per group of clients is already in use today in the context of
   Internet Exchange Point (IXP) route servers. In an IXP route server
   the client best path is selected as a result of different policies
   rather than IGP metric distance to the BGP next hop.

   A possible scalability impact of optimizing path selection to take
   account of the RR client position or the operator's policy based
   preference is that different RR clients receive different paths, and
   therefore update/peer group efficiency diminishes. This cost is
   imposed by the requirement to optimize the egress path from the
   client's perspective. It is also likely that groups of clients will
   end up receiving the same best path(s), in which case the
   inefficiency of update generation will be minimized. It should be
   noted that in the cases described under flexible router placement,
   where placement is determined on a per update/peer group basis or
   per route reflector, the scale benefits of peer groupings are
   retained.

7. Security considerations

   No new security issues are introduced to the BGP protocol by this
   specification.

8. IANA Considerations

   IANA is requested to allocate a type code for the Standard BGP
   Community to be used for inter-cluster propagation of the angular
   position of the clients.

   IANA is requested to allocate a new type code from the BGP OPEN
   Optional Parameter Types registry to be used for Group_ID
   propagation.

9. Acknowledgments

   Authors would like to thank Eric Rosen, Clarence Filsfils, Uli
   Bornhauser, Russ White, Jakob Heitz and Mike Shand for their
   valuable input.

10. References

10.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S.
              Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, February 2006.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, February 2009.

10.2. Informative References

   [I-D.ietf-idr-add-paths]
              Walton, D., Chen, E., Retana, A., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-07 (work in progress), June 2012.

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC1998]  Chen, E. and T. Bates, "An Application of the BGP
              Community Attribute in Multi-home Routing", RFC 1998,
              August 1996.

   [RFC4384]  Meyer, D., "BGP Communities for Data Collection", BCP 114,
              RFC 4384, February 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC4893]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
              Number Space", RFC 4893, May 2007.

   [RFC5283]  Decraene, B., Le Roux, JL., and I. Minei, "LDP Extension
              for Inter-Area Label Switched Paths (LSPs)", RFC 5283,
              July 2008.

   [RFC5668]  Rekhter, Y., Sangli, S., and D. Tappan, "4-Octet AS
              Specific BGP Extended Community", RFC 5668, October 2009.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, January 2010.

   [RFC6774]  Raszuk, R., Fernando, R., Patel, K., McPherson, D., and K.
              Kumaki, "Distribution of Diverse BGP Paths", RFC 6774,
              November 2012.

Authors' Addresses

   Robert Raszuk
   NTT MCL
   101 S Ellsworth Avenue Suite 350
   San Mateo, CA 94401
   US

   Email: robert@raszuk.net

   Christian Cassar
   Cisco Systems
   10 New Square Park
   Bedfont Lakes, FELTHAM TW14 8HA
   UK

   Email: ccassar@cisco.com

   Erik Aman
   TeliaSonera
   Marbackagatan 11
   Farsta, SE-123 86
   Sweden

   Email: erik.aman@teliasonera.com

   Bruno Decraene
   France Telecom
   38-40 rue du General Leclerc
   Issy les Moulineaux cedex 9, 92794
   France

   Email: bruno.decraene@orange.com

   Stephane Litkowski
   Orange
   9 rue du chene germain
   Cesson Sevigne, 35512
   France

   Email: stephane.litkowski@orange.com