idnits 2.17.1 draft-ietf-idr-bgp-optimal-route-reflection-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC4456]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 82 has weird spacing: '...routing from...' == Line 93 has weird spacing: '... potato routi...' == Line 236 has weird spacing: '...routing from ...' == Line 552 has weird spacing: '... potato routi...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (November 6, 2012) is 4187 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2119' is defined on line 762, but no explicit reference was found in the text == Unused Reference: 'RFC4271' is defined on line 765, but no explicit reference was found in the text == Unused Reference: 'RFC4360' is defined on line 768, but no explicit reference was found in the text == Unused Reference: 'RFC5492' is defined on line 771, but no explicit reference was found in the text == Unused Reference: 'RFC1997' is defined on line 787, but no explicit reference was found in the text == Unused Reference: 'RFC1998' is defined on line 790, but no explicit reference was found in the text == Unused Reference: 'RFC4384' is defined on line 794, but no explicit reference was found in the text == Unused Reference: 'RFC4893' is defined on line 801, but no explicit reference was found in the text == Unused Reference: 'RFC5668' is defined on line 808, but no explicit reference was found in the text == Outdated reference: A later version (-15) exists of draft-ietf-idr-add-paths-07 -- Obsolete informational reference (is this intentional?): RFC 4893 (Obsoleted by RFC 6793) Summary: 1 error (**), 0 flaws (~~), 16 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 IDR Working Group R. Raszuk 3 Internet-Draft NTT MCL 4 Intended status: Standards Track C. Cassar 5 Expires: May 10, 2013 Cisco Systems 6 E. Aman 7 TeliaSonera 8 B. 
Decraene 9 France Telecom 10 November 6, 2012 12 BGP Optimal Route Reflection (BGP-ORR) 13 draft-ietf-idr-bgp-optimal-route-reflection-03 15 Abstract 17 [RFC4456] asserts that, because the Interior Gateway Protocol (IGP) 18 cost to a given point in the network will vary across routers, "the 19 route reflection approach may not yield the same route selection 20 result as that of the full IBGP mesh approach." One practical 21 implication of this assertion is that the deployment of route 22 reflection may thwart the ability to achieve hot potato routing. Hot 23 potato routing attempts to direct traffic to the closest AS egress 24 point in cases where no higher priority policy dictates otherwise. 25 As a consequence of the route reflection method, the choice of exit 26 point for a route reflector and its clients will be the egress point 27 closest to the route reflector - and not necessarily closest to the 28 RR clients. 30 Section 11 of [RFC4456] describes a deployment approach and a set of 31 constraints which, if satisfied, would result in the deployment of 32 route reflection yielding the same results as the iBGP full mesh 33 approach. Such a deployment approach would make route reflection 34 compatible with the application of hot potato routing policy. 36 As networks evolved to accommodate architectural requirements of new 37 services, tunneled (LSP/IP tunneling) networks with centralized route 38 reflectors became commonplace. This is one type of common deployment 39 where it would be impractical to satisfy the constraints described in 40 Section 11 of [RFC4456]. Yet, in such an environment, hot potato 41 routing policy remains desirable. 43 This document proposes two new solutions which can be deployed to 44 facilitate the application of closest exit point policy in centralized 45 route reflection deployments. 47 Status of this Memo 48 This Internet-Draft is submitted in full conformance with the 49 provisions of BCP 78 and BCP 79.
51 Internet-Drafts are working documents of the Internet Engineering 52 Task Force (IETF). Note that other groups may also distribute 53 working documents as Internet-Drafts. The list of current Internet- 54 Drafts is at http://datatracker.ietf.org/drafts/current/. 56 Internet-Drafts are draft documents valid for a maximum of six months 57 and may be updated, replaced, or obsoleted by other documents at any 58 time. It is inappropriate to use Internet-Drafts as reference 59 material or to cite them other than as "work in progress." 61 This Internet-Draft will expire on May 10, 2013. 63 Copyright Notice 65 Copyright (c) 2012 IETF Trust and the persons identified as the 66 document authors. All rights reserved. 68 This document is subject to BCP 78 and the IETF Trust's Legal 69 Provisions Relating to IETF Documents 70 (http://trustee.ietf.org/license-info) in effect on the date of 71 publication of this document. Please review these documents 72 carefully, as they describe your rights and restrictions with respect 73 to this document. Code Components extracted from this document must 74 include Simplified BSD License text as described in Section 4.e of 75 the Trust Legal Provisions and are provided without warranty as 76 described in the Simplified BSD License. 78 Table of Contents 80 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 81 2. Proposed solutions . . . . . . . . . . . . . . . . . . . . . . 5 82 3. Best path selection for BGP hot potato routing from 83 customized IGP network position . . . . . . . . . . . . . . . 6 84 3.1. Client's perspective best path selection algorithm . . . . 8 85 3.1.1. Flat IGP network . . . . . . . . . . . . . . . . . . . 8 86 3.1.2. Hierarchical IGP network . . . . . . . . . . . . . . . 8 87 3.2. Aside: Configuration-based flexible route reflector 88 placement . . . . . . . . . . . . . . . . . . . . . . . . 9 89 3.3. Route reflector client grouping . . . . . . . . . . . . . 10 90 3.3.1. 
Route Reflector Client Group ID . . . . . . . . . . . 10 91 3.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . . 12 92 3.5. Advantages . . . . . . . . . . . . . . . . . . . . . . . . 12 93 4. Angular distance approximation for BGP warm potato routing . 13 94 4.1. Problem statement . . . . . . . . . . . . . . . . . . . . 13 95 4.2. Proposed solution . . . . . . . . . . . . . . . . . . . . 14 96 4.3. Centralized vs distributed route reflectors . . . . . . . 16 97 5. Deployment considerations . . . . . . . . . . . . . . . . . . 16 98 6. Security considerations . . . . . . . . . . . . . . . . . . . 17 99 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 100 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 101 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 102 9.1. Normative References . . . . . . . . . . . . . . . . . . . 18 103 9.2. Informative References . . . . . . . . . . . . . . . . . . 18 104 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 106 1. Introduction 108 There are three types of BGP deployments within Autonomous Systems 109 today: full mesh, confederations and route reflection. 111 BGP route reflection is the most popular way to distribute BGP routes 112 between BGP speakers belonging to the same administrative domain. 113 Traditionally route reflectors have been deployed in the forwarding 114 path and carefully placed on the POP to core boundaries. That model 115 of BGP route reflector placement has started to evolve. The 116 placement of route reflectors outside the forwarding path was 117 triggered by applications which required traffic to be tunneled from 118 AS ingress PE to egress PE: for example L3VPN. 120 This evolving model of intra-domain network design has enabled 121 deployments of centralized route reflectors. Initially this model 122 was only employed for new address families e.g. 
L3VPNs, L2VPNs etc. 124 With edge to edge MPLS or IP encapsulation also being used to carry 125 internet traffic, this model has been gradually extended to other BGP 126 address families including IPv4 and IPv6 Internet routing. This is 127 also applicable to new services achieved with BGP as control plane, 128 for example 6PE. 130 Such centralized route reflectors can be placed on the POP to core 131 boundaries, but they are often placed in arbitrary locations in the 132 core of large networks. 134 Such deployments suffer from a critical drawback in the context of 135 best path selection. A route reflector with knowledge of multiple 136 paths for a given prefix will pick the best path and only advertise 137 that best path to the route reflector clients. If the best path 138 for a prefix is selected on the basis of an IGP tie break, the best 139 path advertised from the route reflector to its clients will be the 140 exit point closest to the route reflector. But route reflector 141 clients will be in a place in the network topology which is different 142 from the route reflector. In networks with centralized route 143 reflectors, this difference will be even more acute. It follows that 144 the best path chosen by the route reflector is not necessarily the 145 same as the path which would have been chosen by the client if the 146 client considered the same set of candidate paths as the route 147 reflector. Furthermore, the path chosen by the client might have 148 been a better path than that chosen by the route reflector for 149 traffic entering the network at the client. The path chosen by the 150 client would have guaranteed the lowest cost and delay trajectory 151 through the network. 153 Route reflector clients switch packets using routing information 154 learnt from route reflectors which are not on the forwarding path of 155 the packet through the network even in the absence of end-to-end 156 encapsulation.
In those cases the path chosen as best and propagated 157 to the clients will often not be the optimal path chosen by the 158 client given all available paths. 160 Eliminating the IGP distance to the BGP nexthop as a tie breaker on 161 centralized route reflectors does not address the issue. Ignoring 162 IGP distance to the BGP next hop results in the tie breaking 163 procedure contributing the best path by differentiating between paths 164 using attributes otherwise considered less important than IGP cost to 165 the BGP nexthop. 167 One possible valid solution or workaround to this problem requires 168 sending all domain external paths from the RR to all its clients. 169 This approach suffers the significant drawback of pushing a large 170 amount of BGP state to all the edge routers. In many networks, the 171 number of EBGP peers over which full Internet routing information is 172 received would correlate directly to the number of paths present in 173 each ASBR. This could easily result in tens of paths for each 174 prefix. 176 Notwithstanding this drawback, there are a number of reasons for 177 sending more than just the single best path to the clients. Improved 178 path diversity at the edge is a requirement for fast connectivity 179 restoration, and a requirement for effective BGP level load 180 balancing. Protocol extensions like add-paths 181 [I-D.ietf-idr-add-paths] or diverse-path 182 [I-D.ietf-grow-diverse-bgp-path-dist] allow for such improved path 183 diversity and can be used to address the same problems addressed by 184 the mechanisms proposed in this draft. In practical terms, add/ 185 diverse path deployments are expected to result in the distribution 186 of 2, 3 or n (where n is a small number) 'good' paths rather than all 187 domain external paths. While the route reflector chooses one set of 188 n paths and distributes those same n paths to all its route reflector 189 clients, those n paths may not be the right n paths for all clients. 
190 In the context of the problem described above, those n paths will not 191 necessarily include the closest egress point out of the network for 192 each route reflector client. The mechanisms proposed in this 193 document are likely to be complementary to mechanisms aimed at 194 improving path diversity. 196 2. Proposed solutions 198 This document proposes two simple solutions to the problem described 199 above. Both of these solutions make it possible for route reflector 200 clients to direct traffic to their closest exit point in hot potato 201 routing deployments, without requiring further state to be pushed out 202 to the edge. These solutions are primarily applicable in deployments 203 using centralized route reflectors, which are typically implemented 204 in devices without a capable forwarding plane. 206 The two alternatives are: 208 "Best path selection for BGP hot potato routing from client's IGP 209 network position" 211 "Angular distance approximation for BGP warm potato routing" 213 Both solutions rely upon all route reflectors learning all paths 214 which are eligible for consideration for hot potato routing. In 215 order to satisfy this requirement, path diversity enhancing 216 mechanisms such as add paths/diverse paths may need to be deployed 217 between route reflectors. 219 In both of these solutions the route reflector selects and 220 distributes a route to each client based on what would be optimal 221 from the client's perspective. By optimal we refer in this document 222 to the decision made during best path selection at the IGP metric to 223 BGP next hop comparison step. Clearly the overall path selection 224 preference may instead be decided by another policy step, in which case the provisions 225 defined in this document would not apply. 227 In the respective solutions the choice is made either factoring in 228 IGP costs or the configured angular distance to the next hop.
The 229 route reflector makes different decisions for different clients only 230 in the case where the tie breaker for path selection would have been 231 the IGP distance to the BGP nexthop (as in hot potato routing). 233 A significant advantage of this approach is that the RR clients do not 234 need to run new software or hardware. 236 3. Best path selection for BGP hot potato routing from customized IGP 237 network position 239 This section describes a method for calculating the order of 240 preference of BGP paths from the point of view of each separate route 241 reflector client. More specifically, the route reflector will 242 compute the IGP metric to the BGP nexthop from the position of the 243 client to which the resulting path will be distributed, if the IGP 244 metric is the tie breaker applied to a set of possible paths. In the 245 subsequent model the authors will propose virtual reflector placement at 246 an operator-selected IGP location. 248 In the case of a hierarchical IGP deployment where the client is in a 249 different level in the hierarchy to the route reflector, the route 250 reflector will compute IGP distance to the BGP nexthop from the Area 251 Border Routers (ABR) leading to the client in lieu of the route 252 reflector client itself, and use the shortest distance from these 253 ABRs to the nexthop. This provides an approximation to the desired 254 functionality. Rather than a client picking the closest path, the 255 client would be picking the exit point closest to the client region 256 as defined by area or level. In cases where one or more nexthops are 257 in the same region as the client, one of those nexthops would be 258 preferred, with tie breaking within those nexthops performed from the 259 route reflector's position in the network. 261 It is assumed that reachability through a set of ABRs is always 262 advertised through identical prefixes from those ABRs.
If a nexthop 263 is reachable through multiple ABRs but the ABRs advertise 264 reachability through prefixes of different length, then only the ABR 265 advertising the longest prefix will be considered as a viable path to 266 the nexthop. 268 BGP best path selection and its distribution has a natural 269 consequence of limiting the amount of state in the network. That is 270 not in itself a drawback. BGP speakers will rarely need to receive 271 all available BGP paths. In network deployments with multiple 272 upstream peerings or with very dense peering schemes, the number of 273 available BGP paths for a given BGP prefix can be high. Real network 274 deployments with the number of paths for a prefix ranging from 10s to 275 100s have been observed. It would be wasteful to propagate all of 276 those paths to all clients, such that each client can select paths 277 according to the position of the nexthop relative to the client. 279 Whenever a BGP route reflector would need to decide what path or 280 paths need to be selected for advertisement to one of its clients, 281 the route reflector would need to virtually position itself at its 282 client's IGP network location in order to choose the right set of paths 283 based on the IGP metric to the next hops from the client's 284 perspective. 286 This technique applies in deployments with or without diverse paths 287 or the various path selection modes contemplated in add-paths. 289 In network architectures consisting of more than a single pair of 290 route reflectors it is required that all reflectors are fully meshed 291 and have the ability to learn and maintain all external BGP paths. In 292 the event of constructing a hierarchy of reflectors to relax the full 293 RR mesh requirements, ORR should not be run between such route 294 reflectors. 296 3.1.
Client's perspective best path selection algorithm 298 For each centralized route reflector the proposal assumes that the 299 route reflector participates in a common IGP with its clients. There 300 are two scenarios to consider - flat versus hierarchical IGP network. 302 3.1.1. Flat IGP network 304 Reflectors run SPF from the client IGP node point of view such 305 that the cost of BGP nexthops from the client can be determined if 306 necessary. For the purpose of BGP path selection the interesting 307 product of this calculation is the ability to determine the IGP 308 distance from a client to a BGP next hop. This distance to a 309 nexthop would be interesting in cases where that next hop is for a 310 path which is contending with otherwise equally preferred paths. 311 This approach works in tunneled as well as conventional hop-by-hop 312 IP forwarding cores. 314 When the path selection tie breaker for a prefix is the IGP metric 315 to the BGP nexthops of the contending paths, then the route 316 reflector will determine the order of preference of the contending 317 paths by considering the distance from the client to the path 318 nexthops in order to decide what path/s to advertise to a client 319 (or group of clients where feasible). It should be noted that an 320 operator may wish to provide a distance tolerance value, such that 321 beyond a certain granularity, differences between IGP metric are 322 invisible to the path selection algorithm. This will allow a 323 route reflector some leeway in selecting between paths such that 324 rather than pick one path over another on the basis of a 325 difference in distance which is operationally irrelevant, the 326 route reflector can choose to optimize for update generation 327 grouping. Furthermore, this tolerance will reduce the likelihood 328 of generation of BGP updates when the IGP topology changes in a 329 way which is not operationally relevant. 
In the case that a path 330 is selected from a set for a given prefix while ignoring 331 differences in distance within the tolerance figure, then that 332 same path must always be preferred for all clients where the paths 333 are within the tolerance figure. 335 3.1.2. Hierarchical IGP network 337 Hierarchy introduces two challenges: 339 The first challenge is that the RR IGP view may differ from a 340 client IGP view by virtue of one having a summarised 341 view and the other not. Summarisation, by its nature, loses 342 information. Consider the example where a client within a PoP 343 sees two prefixes with two metrics for two egress points within 344 the PoP, but where the RR only sees a single summary covering 345 reachability to both nexthops as injected by the ABR. For 346 clarification, in the case of IS-IS, by ABR we refer to an 347 L1/L2 node. However it needs to be observed that inter area 348 networks running LDP are required to disable summarisation of all 349 FEC advertised in LDP (typically all loopbacks) unless [RFC5283] 350 is deployed. Such deployments are not likely to suffer 351 summarisation difficulties. 353 The second challenge is that in cases where the client is in a 354 different level of hierarchy from the RR, the RR cannot build a 355 Shortest Path First (SPF) tree with the client node as root, 356 simply because the topology derived by the IGP will not include 357 the client node. It will instead only include reachability to the 358 client from one or more ABRs. In order to overcome this problem, 359 the RR could compute an SPF tree from the ABRs in the area. The 360 RR would then determine the shortest distance from a client which 361 lives behind the ABRs, to a nexthop, by adding the advertised 362 distances from an ABR to the client and the distance from the ABR 363 to a nexthop, for each ABR, and picking the minimum. This assumes 364 that IGP metrics on links are symmetric; i.e.
that the distance 365 from the ABR to the client or nexthop is equal to the distance 366 from the client or nexthop to the ABR. 368 There are cases where the above approach does not help. If RR is 369 trying to arbitrate amongst a set of paths for a client which is 370 in the same hierarchy as some of those paths, and in a different 371 hierarchy to the RR, the opaqueness of the region containing the 372 client at the RR defeats the selection process. It is impossible 373 to determine the relative position of the RR client and the paths 374 within the client region. 376 The solution for hierarchical IGP networks also assumes that if 377 RRs are present and are responsible for calculation of BGP best 378 path to clients they are either placed in each local area 379 coinciding with area containing clients or they are placed in the 380 core (area 0/level 2) of the network. 382 3.2. Aside: Configuration-based flexible route reflector placement 384 The ability to exploit topology information available in the IGP in 385 ways described above can also be used to virtually place the RR at 386 different points in the network for purposes other than hot potato 387 routing. 389 A route reflector can be globally configured to "pretend" its logical 390 location is one of any of the other nodes within a given IGP area/ 391 level flooding scope regardless of its physical connectivity. 393 Such flexibility provides a useful tool for reflector virtualization, 394 and supports moving or replacing physical route reflectors without 395 any effect on routing. Such a change can be permanent or it could be 396 performed during network maintenance in order to minimize network 397 impact. 399 A possible variation would allow the virtual placement of RR to be 400 effected on a per-AF or AF plus update/peer group granularity. 
It 401 should be noted that this approach provides for splitting one 402 centralized route reflector such that it is virtually positioned at 403 various network locations, with the network location depending upon 404 the address family or address family plus update/peer group. 406 Virtual slicing of a centralized route reflector relaxes the need to 407 propagate all BGP paths between RRs in an alternative conventional 408 distributed RR deployment. It is expected that such RRs would be 409 deployed in redundant sets, and that those RRs would not need to be 410 physically colocated, while still benefiting from the possibility of 411 being logically colocated, and therefore not compromising any of the 412 best path selection symmetry. 414 3.3. Route reflector client grouping 416 It may be appropriate to allow the operator, or the route reflector 417 itself, to group clients together using IGP distance between clients 418 to determine grouping. All the operations discussed above which 419 relied upon computing best path for each client, and measuring 420 distances from each client to different nexthops, would instead be 421 performed for each group of clients. Configurable thresholds can be 422 used to determine which IGP metric changes should be visible to BGP, 423 and trigger best path recomputation. The latter would be beneficial 424 in existing BGP RR code too. 426 Alternatively route reflector client grouping could be accomplished 427 statically by the operator by coloring clients belonging to a common 428 group (for example being part of the same POP). In order to 429 accomplish such marking it is proposed that the BGP OPEN message be 430 augmented with an optional parameter indicating the Group ID a given 431 peer belongs to. 433 3.3.1. Route Reflector Client Group ID 435 This is an Optional Parameter in the BGP OPEN message that is used by a 436 BGP speaker to convey to its route reflectors the Group ID value.
437 Such a value will allow automatic and predictable peer grouping on the 438 route reflectors as deemed necessary from the operator's network 439 architecture. 441 The parameter contains precisely one set of [Group_ID Code, Group_ID 442 Length, Group_ID Value] encoded as shown below: 444 +----------------------------+ 445 | Group ID Code (1 octet) | 446 +----------------------------+ 447 | Group ID Length (1 octet) | 448 +----------------------------+ 449 | Group ID Value (4 octets) | 450 +----------------------------+ 452 The use and meaning of these fields are as follows: 454 Group ID Code: 456 Group ID Code is a one octet field that identifies the Group ID 457 optional parameter of the BGP OPEN message. Value TBD by IANA. 458 Recommended value: 3. 460 Group ID Length: 462 Group ID Length is a one octet field that contains the length 463 of the Group ID Value field in octets. It is fixed and equal 464 to 4. 466 Group ID Value: 468 Group ID Value is a fixed length field of size equal to 469 four octets that contains the numerical value of the group a given 470 BGP speaker should be part of on the route reflector. 472 Two special values are reserved: 474 0x00000000 - No grouping preference 475 0xFFFFFFFF - Do not group this BGP speaker 477 An implementation may allow automatic population of 478 GROUP_ID value using IGP area identifier. 480 Route reflectors or EBGP speakers receiving such Group IDs from their 481 respective BGP peers as part of the BGP OPEN procedure MAY use them 482 when constructing update or peer groups in addition to any of the 483 existing grouping mechanisms already available. An implementation may 484 allow the operator to explicitly allow or disallow honoring such grouping 485 or provide means for manual override via explicit configuration. 487 3.4. Discussion 489 This is not the first instance where a router participating in an IGP 490 is required to build the SPF tree using a root other than itself.
491 Determination of loop free alternate paths as described in [RFC5714] 492 is one such example. 494 Determining the shortest path and associated cost between any two 495 arbitrary points in a network based on the IGP topology learned by a 496 router is expected to add some extra cost in terms of CPU resource. 497 However SPF tree generation code is now implemented efficiently in a 498 number of implementations, and therefore this is not expected to be a 499 major drawback. The number of SPTs computed in the general non- 500 hierarchical case is expected to be of the order of the number of 501 clients of an RR whenever a topology change is detected. Advanced 502 optimizations like partial and incremental SPF may also be exploited. 503 By the nature of route reflection, the number of clients can be split 504 arbitrarily by the deployment of more route reflectors for a given 505 number of clients. While this is not expected to be necessary in 506 existing networks with best in class route reflectors available 507 today, this avenue to scaling up the route reflection infrastructure 508 would be available. If we consider the overall network wide cost/ 509 benefit factor, the only alternative to achieve the same level of 510 optimality would require significantly increasing state on the edges 511 of the network, which, in turn, will consume CPU and memory resources 512 on all BGP speakers in the network. Building this client perspective 513 into the route reflectors seems appropriate. 515 3.5. Advantages 517 The solution described provides a model for integrating the client 518 perspective into the best path computation for RRs. More 519 specifically, the choice of BGP path factors in the IGP metric 520 between the client and the nexthop, rather than the distance from the 521 RR to the nexthop. The documented method does not require any BGP or 522 IGP protocol changes as required changes are contained within the RR 523 implementation.
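As an informal illustration of the client-perspective computation summarised above, the following sketch runs SPF rooted at an arbitrary node of a flat IGP topology and then picks, among otherwise equally preferred paths, the next hop closest to that root. It is a minimal sketch under stated assumptions and not a description of any implementation: the topology, the node names and the function names spf and best_nexthop_for are hypothetical, link metrics are assumed symmetric, and ties are broken by node name purely to keep the example deterministic.

```python
import heapq


def spf(graph, root):
    """Dijkstra shortest-path distances from `root`.

    graph: dict mapping node -> dict of neighbor -> IGP link metric.
    Returns a dict of node -> total metric for every reachable node.
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nbr, metric in graph.get(node, {}).items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def best_nexthop_for(graph, root, nexthops):
    """Return the BGP next hop with the lowest IGP distance from `root`.

    Only meaningful when the candidate paths are otherwise equally
    preferred, i.e. when IGP metric is the tie breaker. Ties on distance
    are broken by name (an assumption made for determinism).
    """
    dist = spf(graph, root)
    reachable = [nh for nh in nexthops if nh in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda nh: (dist[nh], nh))


# Hypothetical flat IGP: the RR sits next to exit N1, client A next to N5.
IGP = {
    "RR": {"N1": 1, "A": 10},
    "N1": {"RR": 1},
    "A": {"RR": 10, "N5": 2},
    "N5": {"A": 2},
}

# Rooted at the RR the choice is N1; rooted at client A it is N5.
rr_choice = best_nexthop_for(IGP, "RR", ["N1", "N5"])
client_choice = best_nexthop_for(IGP, "A", ["N1", "N5"])
```

Rooting the same computation at the client rather than at the RR is the only difference between the two calls, which is why no change is needed on the client itself.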

   This solution can be deployed in traditional hop-by-hop forwarding
   networks as well as in end-to-end tunneled environments.  In
   networks where there are multiple route reflectors and
   unencapsulated hop-by-hop forwarding, such optimizations should be
   enabled on all route reflectors.  Otherwise clients may receive an
   inconsistent view of the network, which in turn may lead to
   intra-domain forwarding loops.

   With this approach, an ISP can effect a hot potato routing policy
   even if route reflection has been moved from the forwarding plane to
   the core and hop-by-hop switching has been replaced by end-to-end
   MPLS or IP encapsulation.

   As noted above, the approach reduces the amount of state which needs
   to be pushed to the edge in order to perform hot potato routing.
   The memory and CPU resource required at the edge to provide hot
   potato routing using this approach is lower than what would be
   required in order to achieve the same level of optimality by pushing
   and retaining all available paths (potentially tens) for each prefix
   at the edge.

   The proposal allows for a fast and safe transition to BGP control
   plane route reflection without compromising an operator's closest
   exit operational principle.  Hot potato routing is important to most
   ISPs.  The inability to perform hot potato routing effectively stops
   migrations to centralized route reflection and edge-to-edge LSP/IP
   encapsulation for traffic to IPv4 and IPv6 prefixes.

4.  Angular distance approximation for BGP warm potato routing

   This section describes an alternative solution to the use of IGP
   topology information to virtually position the RR at the client
   location in the network.  This solution involves modelling the
   network topology as a set of elements (regions, PoPs or routers)
   arranged in a circle.  Route reflector clients and inter-domain exit
   points would then be statically assigned to those elements such that
   one can compute the angular distance between route-reflector clients
   and the various exit points in order to infer the distance between
   any two elements.  This measure of distance can be used as an
   effective alternative to the IGP distance as a tie breaker in the
   path selection algorithm if necessary.

4.1.  Problem statement

   This solution addresses the problem described in earlier sections,
   while attempting to minimize computational overhead.  The aim of the
   proposed solution is to enable a route reflector to provide a route
   reflector client with an exit point for a prefix which is 'closest'
   to the client rather than the route-reflector, without having to
   distribute all paths to that client, or having to derive each
   client's view of the network topology.  The measure of closest is
   based on a simplistic description of network topology provided by
   the operator.

   Consider the following example of an ISP network topology drawn to
   reflect the location of the nodes and POPs:

      N4  POP4

      CLIENT B
      POP4                     POP1  N1

                  CORE
                  RR(s)        POP2  N2

      N5  POP3                 POP2  N3

      CLIENT A
      POP3

   N - represents the different exit points for a given prefix.  POP2
   is a geographically large PoP with two paths: N2 and N3.

   In a deployment where the centralized RRs tie break on the basis of
   their IGP-based view of the network, N1 above would be advertised to
   all clients on the basis that it is closest to the RR.  Path N4
   would be a more appropriate choice for client B.  Similarly, N5
   would be more appropriate for client A since path N5 is closer to
   client A than path N1.

4.2.  Proposed solution

   The proposed solution revolves around the operator establishing the
   angular position of the route-reflector clients and inter-domain
   exit points in the network.
   The route reflector then picks the path to advertise to a client
   based on the client's angular position versus the angular position
   of the inter-domain exit points originating the paths.  The operator
   can choose the granularity of angular position appropriate to the
   desired goals.  The coarseness of the angular position trades
   operator overhead on the one hand against optimality of routing on
   the other.  The finest granularity possible will be the relative
   position of originating clients.

   Note that this solution has nothing to do with actual IGP link
   metrics and resulting topology in the network.

   It can be shown that for each network topology, elements such as AS
   exit points can be mapped on to a circle.  By putting POPs, Regions
   or individual clients onto the hypothetical circle we can identify
   an angular location for each element relative to some fixed
   direction; for example, defining the angular north of the circle at
   0 degrees.

   The angular position of elements in the network can be conveyed to a
   route reflector in a number of ways:

      Assignment of angular position of each RR client through
      configuration on the route reflector itself; per client
      configuration on RR

      Assignment of angular position of an RR client at each client,
      then propagating it to RRs.

   The proposed angular distance approximation is compatible with both
   flat and hierarchical IGP deployments.
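
   The selection step described in this section can be sketched as
   follows.  This is a minimal illustration, not a definitive
   implementation: the function names and the wrap parameter are
   hypothetical, positions are assumed to be whole degrees between 0
   and 359, and ties are broken by exit point name only to keep the
   result deterministic.  The positions are those of the worked example
   in this section.  With wrap left at False the distance is the plain
   difference of the two positions, which is what the worked example
   tabulates; measuring along the shorter arc of the circle
   (wrap=True) is a variant the text does not specify.

```python
def angular_distance(a, b, wrap=False):
    """Distance in degrees between two angular positions in 0..359.

    wrap=False: plain absolute difference of the two positions.
    wrap=True:  shorter arc around the circle (an assumed variant).
    """
    diff = abs(a - b) % 360
    if wrap:
        return min(diff, 360 - diff)
    return diff

def closest_exit(client_pos, exits, wrap=False):
    """Return the exit point name with the smallest angular distance."""
    return min(
        exits,
        key=lambda name: (angular_distance(client_pos, exits[name], wrap), name),
    )

# Angular positions taken from the worked example in this section.
EXITS = {"N1": 60, "N2": 85, "N3": 120, "N4": 290, "N5": 260}
CLIENTS = {"A": 260, "B": 290}

for client, pos in CLIENTS.items():
    distances = {name: angular_distance(pos, p) for name, p in EXITS.items()}
    print(f"Client {client}: {distances} -> {closest_exit(pos, EXITS)}")
```

   For these positions both variants select the same exit point for
   each client; they would only differ for elements placed on either
   side of the 0 degree reference direction.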

   In the example illustrated above the route reflector might learn or
   be configured with the following set of paths and corresponding
   angular positions:

      Prefix X/Y     N1   N2   N3   N4   N5

      Location
      in degrees     60   85  120  290  260

   If the absolute angular position of clients A and B were as follows:

      Client A: 260 degrees

      Client B: 290 degrees

   Then the corresponding angular distances for those clients versus
   the exit points can be calculated as follows:

      Prefix X/Y     N1   N2   N3   N4   N5

      Client A      200  175  140   30    0

      Client B      230  205  170    0   30

   With an RR running the BGP best path algorithm modified to use the
   angular distance from the client to the nexthops, rather than its
   IGP distance to the nexthops, as tie breaker, each client is
   provided with its closest path, with the measure of closeness
   reflecting the angular position as configured by the operator.

   The model used by the operator in order to determine the angular
   position of a client or exit point might involve grouping elements
   together by region or PoP, or might involve no grouping at all.
   Implementations should allow the operator to pick the appropriate
   granularity.

4.3.  Centralized vs distributed route reflectors

   In an environment where the RR clusters are distributed (yet
   centralized enough to make hot potato routing hard), and each RR
   cluster serves a subset of clients, it becomes necessary to
   propagate the angular position of the clients between route
   reflectors.  This can be achieved as follows:

      Deploy add-paths between route reflectors in order to maximize
      path diversity within the cluster.

      A non-AS-transitive BGP community of type (TBA by IANA) can be
      used to encode and propagate the angular position, between 0 and
      359, of a client.  This community is only relevant to the route
      reflectors of a given BGP domain and should be stripped either at
      the ASBR boundary or when propagating updates to BGP peers which
      are not route reflectors.

      The angular position marking could also be added by clients and
      advertised to the route reflector.  This would require some
      configuration effort.

5.  Deployment considerations

   The solutions are primarily intended for end-to-end tunneled
   environments, i.e. where traffic is label switched or IP tunneled
   across the core.  If unencapsulated hop-by-hop forwarding is used,
   either misconfigurations or conflicts between these optimizations
   and classical BGP path selection rules could lead to intra-domain
   forwarding loops.  Under certain circumstances the solutions can
   also be deployed without end-to-end tunneling.  In particular, best
   path selection based on the client's IGP best-path selection is
   guaranteed not to cause any forwarding loops (other than micro loops
   associated with reconvergence) when deployed in a flat IGP area,
   provided that no distance tolerance value is used so that the path
   choice is truly made on a per-client basis.

   It should be self-evident that this solution does not interfere with
   policies enforced above IGP tie breaking in the BGP best path
   algorithm.

   The solution applies to NLRIs of all address families which can be
   route reflected and which can be tie broken by IGP distance to the
   nexthop.

   It should be noted that customized per-client or group of clients
   best path selection is already in use today in the context of
   Internet Exchange Point (IXP) route servers.  In an IXP route server
   the client best path is selected as a result of different policies
   rather than IGP metric distance to BGP next hop.

   A possible scalability impact of optimizing path selection to take
   account of the RR client position is that different RR clients
   receive different paths, and therefore update/peer group efficiency
   diminishes.  This cost is inherent in the requirement to optimize
   the egress path from the client's perspective.  It is also not
   unlikely that groups of clients will end up receiving the same best
   path/s, in which case the inefficiency of update generation will be
   minimized.  It should be noted that in the cases described under
   flexible router placement, where placement is determined on a per
   update/peer group basis or per route reflector, the scale benefits
   of peer groupings are retained.

6.  Security considerations

   No new security issues are introduced to the BGP protocol by this
   specification.

7.  IANA Considerations

   IANA is requested to allocate a type code for the Standard BGP
   Community to be used for inter cluster propagation of angular
   position of the clients.

   IANA is requested to allocate a new type code from BGP OPEN Optional
   Parameter Types registry to be used for Group_ID propagation.

8.  Acknowledgments

   Authors would like to thank Eric Rosen, Clarence Filsfils, Uli
   Bornhauser, Russ White, Jakob Heitz and Mike Shand for their
   valuable input.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, February 2006.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, February 2009.

9.2.  Informative References

   [I-D.ietf-grow-diverse-bgp-path-dist]
              Raszuk, R., Fernando, R., Patel, K., McPherson, D., and
              K. Kumaki, "Distribution of diverse BGP paths.",
              draft-ietf-grow-diverse-bgp-path-dist-08 (work in
              progress), July 2012.

   [I-D.ietf-idr-add-paths]
              Walton, D., Chen, E., Retana, A., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-07 (work in progress),
              June 2012.

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC1998]  Chen, E. and T. Bates, "An Application of the BGP
              Community Attribute in Multi-home Routing", RFC 1998,
              August 1996.

   [RFC4384]  Meyer, D., "BGP Communities for Data Collection",
              BCP 114, RFC 4384, February 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC4893]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
              Number Space", RFC 4893, May 2007.

   [RFC5283]  Decraene, B., Le Roux, JL., and I. Minei, "LDP Extension
              for Inter-Area Label Switched Paths (LSPs)", RFC 5283,
              July 2008.

   [RFC5668]  Rekhter, Y., Sangli, S., and D. Tappan, "4-Octet AS
              Specific BGP Extended Community", RFC 5668, October 2009.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, January 2010.

Authors' Addresses

   Robert Raszuk
   NTT MCL
   101 S Ellsworth Avenue Suite 350
   San Mateo, CA  94401
   US

   Email: robert@raszuk.net

   Christian Cassar
   Cisco Systems
   10 New Square Park
   Bedfont Lakes, FELTHAM  TW14 8HA
   UK

   Email: ccassar@cisco.com

   Erik Aman
   TeliaSonera
   Marbackagatan 11
   Farsta, SE-123 86
   Sweden

   Email: erik.aman@teliasonera.com

   Bruno Decraene
   France Telecom
   38-40 rue du General Leclerc
   Issi Moulineaux cedex 9,   92794
   France

   Email: bruno.decraene@orange-ftgroup.com