idnits 2.17.1 draft-ietf-grow-ix-bgp-route-server-operations-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (August 22, 2013) is 3900 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'RFC5226' is defined on line 522, but no explicit reference was found in the text == Outdated reference: A later version (-12) exists of draft-ietf-idr-ix-bgp-route-server-02 -- Obsolete informational reference (is this intentional?): RFC 4893 (Obsoleted by RFC 6793) -- Obsolete informational reference (is this intentional?): RFC 5226 (Obsoleted by RFC 8126) Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 GROW Working Group N. Hilliard 3 Internet-Draft INEX 4 Intended status: Informational E. Jasinska 5 Expires: February 23, 2014 Microsoft Corporation 6 R. Raszuk 7 NTT I3 8 N. Bakker 9 AMS-IX B.V. 
10 August 22, 2013 12 Internet Exchange Route Server Operations 13 draft-ietf-grow-ix-bgp-route-server-operations-00 15 Abstract 17 The popularity of Internet exchange points (IXPs) brings new 18 challenges to interconnecting networks. While bilateral eBGP 19 sessions between exchange participants were historically the most 20 common means of exchanging reachability information over an IXP, the 21 overhead associated with this interconnection method causes serious 22 operational and administrative scaling problems for IXP participants. 24 Multilateral interconnection using Internet route servers can 25 dramatically reduce the administrative and operational overhead of 26 IXP participation and these systems are used by many IXP participants as 27 a preferred means of exchanging routing information. 29 This document describes operational considerations for multilateral 30 interconnections at IXPs. 32 Status of This Memo 34 This Internet-Draft is submitted in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF). Note that other groups may also distribute 39 working documents as Internet-Drafts. The list of current Internet- 40 Drafts is at http://datatracker.ietf.org/drafts/current/. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 This Internet-Draft will expire on February 23, 2014. 49 Copyright Notice 51 Copyright (c) 2013 IETF Trust and the persons identified as the 52 document authors. All rights reserved. 54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents 56 (http://trustee.ietf.org/license-info) in effect on the date of 57 publication of this document. 
Please review these documents 58 carefully, as they describe your rights and restrictions with respect 59 to this document. Code Components extracted from this document must 60 include Simplified BSD License text as described in Section 4.e of 61 the Trust Legal Provisions and are provided without warranty as 62 described in the Simplified BSD License. 64 Table of Contents 66 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 67 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 68 2. Bilateral BGP Sessions . . . . . . . . . . . . . . . . . . . 3 69 3. Multilateral Interconnection . . . . . . . . . . . . . . . . 4 70 4. Operational Considerations for Route Server Installations . . 5 71 4.1. Path Hiding . . . . . . . . . . . . . . . . . . . . . . . 5 72 4.2. Route Server Scaling . . . . . . . . . . . . . . . . . . 6 73 4.2.1. Tackling Scaling Issues . . . . . . . . . . . . . . . 6 74 4.2.1.1. View Merging and Decomposition . . . . . . . . . 6 75 4.2.1.2. Destination Splitting . . . . . . . . . . . . . . 7 76 4.2.1.3. NEXT_HOP Resolution . . . . . . . . . . . . . . . 8 77 4.3. Prefix Leakage Mitigation . . . . . . . . . . . . . . . . 8 78 4.4. Route Server Redundancy . . . . . . . . . . . . . . . . . 8 79 4.5. AS_PATH Consistency Check . . . . . . . . . . . . . . . . 9 80 4.6. Export Routing Policies . . . . . . . . . . . . . . . . . 9 81 4.6.1. BGP Communities . . . . . . . . . . . . . . . . . . . 9 82 4.6.2. Internet Routing Registry . . . . . . . . . . . . . . 9 83 4.6.3. Client-accessible Databases . . . . . . . . . . . . . 10 84 4.7. Layer 2 Reachability Problems . . . . . . . . . . . . . . 10 85 5. Security Considerations . . . . . . . . . . . . . . . . . . . 10 86 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 87 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 11 88 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 89 8.1. Normative References . . . . . . . . . . . . . . . . . . 
11 90 8.2. Informative References . . . . . . . . . . . . . . . . . 11 91 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 93 1. Introduction 95 Internet exchange points (IXPs) provide IP data interconnection 96 facilities for their participants, typically using shared Layer-2 97 networking media such as Ethernet. The Border Gateway Protocol (BGP) 98 [RFC4271] is normally used to facilitate exchange of network 99 reachability information over these media. 101 As bilateral interconnection between IXP participants requires 102 operational and administrative overhead, BGP route servers 103 [I-D.ietf-idr-ix-bgp-route-server] are often deployed by IXP 104 operators to provide a simple and convenient means of interconnecting 105 IXP participants with each other. A route server redistributes 106 prefixes received from its BGP clients to other clients according to 107 a pre-specified policy, and it can be viewed as similar to an eBGP 108 equivalent of an iBGP [RFC4456] route reflector. 110 Route servers at IXPs require careful management and it is important 111 for route server operators to thoroughly understand both how they 112 work and what their limitations are. In this document, we discuss 113 several issues of operational relevance to route server operators and 114 provide recommendations to help route server operators provision a 115 reliable interconnection service. 117 1.1. Notational Conventions 119 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 120 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 121 "OPTIONAL" in this document are to be interpreted as described in 122 [RFC2119]. 124 2. Bilateral BGP Sessions 126 Bilateral interconnection is a method of interconnecting routers 127 using individual BGP sessions between each participant router on an 128 IXP, in order to exchange reachability information. If an IXP 129 participant wishes to implement an open interconnection policy - i.e. 
130 a policy of interconnecting with as many other IXP participants as 131 possible - it is necessary for the participant to liaise with each of 132 their intended interconnection partners. Interconnection can then be 133 implemented bilaterally by configuring a BGP session on both 134 participants' routers to exchange network reachability information. 135 If each exchange participant interconnects with each other 136 participant, a full mesh of BGP sessions is needed, as shown in 137 Figure 1. 139 ___ ___ 140 / \ / \ 141 ..| AS1 |..| AS2 |.. 142 : \___/____\___/ : 143 : | \ / | : 144 : | \ / | : 146 : IXP | \/ | : 147 : | /\ | : 148 : | / \ | : 149 : _|_/____\_|_ : 150 : / \ / \ : 151 ..| AS3 |..| AS4 |.. 152 \___/ \___/ 154 Figure 1: Full-Mesh Interconnection at an IXP 156 Figure 1 depicts an IXP platform with four connected routers, 157 administered by four separate exchange participants, each of them 158 with a locally unique autonomous system number: AS1, AS2, AS3 and 159 AS4. Each of these four participants wishes to exchange traffic with 160 all other participants; this is accomplished by configuring a full 161 mesh of BGP sessions on each router connected to the exchange, 162 resulting in 6 BGP sessions across the IXP fabric. 164 The number of BGP sessions at an exchange has an upper bound of 165 n*(n-1)/2, where n is the number of routers at the exchange. As many 166 exchanges have large numbers of participating networks, the amount of 167 administrative and operational overhead required to implement an open 168 interconnection policy scales quadratically. New participants to an IXP 169 require significant initial resourcing in order to gain value from 170 their IXP connection, while existing exchange participants need to 171 commit ongoing resources in order to benefit from interconnecting 172 with these new participants. 174 3. 
Multilateral Interconnection 176 Multilateral interconnection is implemented using a route server 177 configured to use BGP to distribute network layer reachability 178 information (NLRI) among all client routers. The route server 179 preserves the BGP NEXT_HOP attribute from all received NLRI UPDATE 180 messages, and passes these messages with unchanged NEXT_HOP to its 181 route server clients, according to its configured routing policy, as 182 described in [I-D.ietf-idr-ix-bgp-route-server]. Using this method 183 of exchanging NLRI messages, an IXP participant router can receive an 184 aggregated list of prefixes from all other route server clients using 185 a single BGP session to the route server instead of depending on BGP 186 sessions with each other router at the exchange. This reduces the 187 overall number of BGP sessions at an Internet exchange from n*(n-1)/2 188 to n, where n is the number of routers at the exchange. 190 Although a route server uses BGP to exchange reachability information 191 with each of its clients, it does not forward traffic itself and is 192 therefore not a router. 194 In practical terms, this allows dense interconnection between IXP 195 participants with low administrative overhead and significantly 196 simpler and smaller router configurations. In particular, new IXP 197 participants benefit from immediate and extensive interconnection, 198 while existing route server participants receive reachability 199 information from these new participants without necessarily having to 200 modify their configurations. 202 ___ ___ 203 / \ / \ 204 ..| AS1 |..| AS2 |.. 205 : \___/ \___/ : 206 : \ / : 207 : \ / : 208 : \__/ : 209 : IXP / \ : 210 : | RS | : 211 : \____/ : 212 : / \ : 213 : / \ : 214 : __/ \__ : 215 : / \ / \ : 216 ..| AS3 |..| AS4 |.. 
217 \___/ \___/ 219 Figure 2: IXP-based Interconnection with Route Server 221 As illustrated in Figure 2, each router on the IXP fabric requires 222 only a single BGP session to the route server, from which it can 223 receive reachability information for all other routers on the IXP 224 which also connect to the route server. 226 4. Operational Considerations for Route Server Installations 228 4.1. Path Hiding 230 "Path hiding" is a term used in [I-D.ietf-idr-ix-bgp-route-server] to 231 describe the process whereby a route server may mask individual paths 232 by applying conflicting routing policies to its Loc-RIB. When this 233 happens, route server clients receive incomplete information from the 234 route server about network reachability. 236 There are several approaches which may be used to mitigate 237 the effect of path hiding; these are described in 238 [I-D.ietf-idr-ix-bgp-route-server]. However, the only method which 239 does not require explicit support from the route server client is for 240 the route server itself to maintain an individual Loc-RIB for each 241 client which is the subject of conflicting routing policies. 243 4.2. Route Server Scaling 245 While deployment of multiple Loc-RIBs on the route server presents a 246 simple way to avoid the path hiding problem noted in Section 4.1, 247 this approach requires significantly more computing resources on the 248 route server than where a single Loc-RIB is deployed for all clients. 249 As the [RFC4271] BGP decision process must be applied to all Loc-RIBs 250 deployed on the route server, both CPU and memory requirements on the 251 host computer scale approximately according to O(P * N), where P is 252 the total number of unique paths received by the route server and N 253 is the number of route server clients which require a unique Loc-RIB. 
254 As this is a super-linear scaling relationship, large route servers 255 may derive benefit from deploying per-client Loc-RIBs only where they 256 are required. 258 Regardless of whether any Loc-RIB optimization technique is implemented, the 259 route server's control plane bandwidth requirements will scale 260 according to O(P * N), where P is the total number of unique paths 261 received by the route server and N is the total number of route 262 server clients. In the case where P_avg (the arithmetic mean number 263 of unique paths received per route server client) remains roughly 264 constant even as the number of connected clients increases, this 265 relationship can be rewritten as O((P_avg * N) * N) or O(N^2). This 266 quadratic upper bound on the network traffic requirements indicates 267 that the route server model will not scale to arbitrarily large 268 sizes. 270 This scaling analysis presents problems in three key areas: route 271 processor CPU overhead associated with BGP decision process 272 calculations, the memory requirements for handling many different BGP 273 path entries, and the network traffic bandwidth required to 274 distribute these prefixes from the route server to each route server 275 client. 277 4.2.1. Tackling Scaling Issues 279 The network traffic scaling issue presents significant difficulties 280 with no clear solution - ultimately, each client must receive an 281 UPDATE for each unique prefix received by the route server. However, 282 there are several potential methods for dealing with the CPU and 283 memory resource requirements of route servers. 285 4.2.1.1. View Merging and Decomposition 286 View merging and decomposition, outlined in [RS-ARCH], describes a 287 method of optimising memory and CPU requirements where multiple route 288 server clients are subject to exactly the same routing policies. In 289 this situation, the multiple Loc-RIB views required by each client 290 are merged into a single view. 
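As a rough illustration of the view merging idea described above, the following Python sketch groups route server clients whose routing policies are identical, so that each group can share a single Loc-RIB view. The policy model (a set of peers whose routes a client must not receive), the function name merge_views and the AS labels are assumptions made for this example only; they are not taken from [RS-ARCH] or from any route server implementation.

```python
from collections import defaultdict


def merge_views(client_policies):
    """Group clients that are subject to exactly the same policy.

    client_policies maps a client name to the set of peers whose routes
    that client must not receive (a hypothetical, simplified policy model).
    Each returned group of clients could share one Loc-RIB view.
    """
    views = defaultdict(list)
    for client, policy in client_policies.items():
        # frozenset makes the policy hashable so it can serve as a view key.
        views[frozenset(policy)].append(client)
    return sorted(sorted(members) for members in views.values())


policies = {
    "AS1": set(),        # open policy
    "AS2": set(),        # open policy, identical to AS1
    "AS3": {"AS4"},      # must not receive routes from AS4
    "AS4": {"AS3"},      # must not receive routes from AS3
}
print(merge_views(policies))  # [['AS1', 'AS2'], ['AS3'], ['AS4']]
```

In this example four clients require three views rather than four, because AS1 and AS2 share the same policy.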
292 There are several variations of this approach. If the route server 293 operator has prior knowledge of interconnection relationships between 294 route server clients, then the operator may configure separate Loc- 295 RIBs only for route server clients with unique outbound routing 296 policies. As this approach requires prior knowledge of 297 interconnection relationships, the route server operator must depend 298 on each client sharing their interconnection policies, either in an 299 internal provisioning database controlled by the operator, or else in 300 an external data store such as an Internet Routing Registry Database. 302 Conversely, the route server implementation itself may implement 303 internal view decomposition by creating virtual Loc-RIBs based on a 304 single in-memory master Loc-RIB, with delta differences for each 305 prefix subject to different routing policies. This allows a more 306 granular and flexible approach to the problem of Loc-RIB scaling, at 307 the expense of requiring a more complex in-memory Loc-RIB structure. 309 Whatever method of view merging and decomposition is chosen on a 310 route server, pathological edge cases can be created whereby they 311 will scale no better than fully non-optimised per-client Loc-RIBs. 312 However, as most route server clients connect to a route server for 313 the purposes of reducing overhead, rather than implementing complex 314 per-client routing policies, edge cases tend not to arise in 315 practice. 317 4.2.1.2. Destination Splitting 319 Destination splitting, also described in [RS-ARCH], describes a 320 method for route server clients to connect to multiple route servers 321 and to send non-overlapping sets of prefixes to each route server. 322 As each route server computes the best path for its own set of 323 prefixes, the quadratic scaling requirement operates on multiple 324 smaller sets of prefixes. 
This reduces the overall computational and 325 memory requirements for managing multiple Loc-RIBs and performing the 326 best-path calculation on each. In order for this method to perform 327 well, destination splitting would require significant co-ordination 328 between the route server operator and each route server client. In 329 practice, this level of close co-ordination between IXP operators and 330 their participants tends not to occur, suggesting that the approach 331 is unlikely to be of any real use on production IXPs. 333 4.2.1.3. NEXT_HOP Resolution 335 As route servers are usually deployed at IXPs which use flat layer 2 336 networks, recursive resolution of the NEXT_HOP attribute is generally 337 not required, and can be replaced by a simple check to ensure that 338 the NEXT_HOP value for each prefix is a network address on the IXP 339 LAN's IP address range. 341 4.3. Prefix Leakage Mitigation 343 Prefix leakage occurs when a BGP client unintentionally distributes 344 NLRI UPDATE messages to one or more neighboring BGP routers. Prefix 345 leakage of this form to a route server can cause serious connectivity 346 problems at an IXP if each route server client is configured to 347 accept all prefix UPDATE messages from the route server. It is 348 therefore RECOMMENDED when deploying route servers that, due to the 349 potential for collateral damage caused by NLRI leakage, route server 350 operators deploy prefix leakage mitigation measures in order to 351 prevent unintentional prefix announcements or else limit the scale of 352 any such leak. Although not foolproof, per-client inbound prefix 353 limits can restrict the damage caused by prefix leakage in many 354 cases. Per-client inbound prefix filtering on the route server is a 355 more deterministic and usually more reliable means of preventing 356 prefix leakage, but requires more administrative resources to 357 maintain properly. 
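The two mitigation measures described above can be sketched in Python as follows. This is a minimal illustration only: the function name check_announcements, the exact-match filtering rule and the example prefixes (taken from the documentation address ranges) are assumptions for this sketch, not the behaviour of any particular route server implementation.

```python
import ipaddress


def check_announcements(announced, allowed, max_prefixes):
    """Apply an exact-match inbound prefix filter and a prefix limit.

    announced:    prefixes received from one route server client
    allowed:      prefixes configured in that client's inbound filter
    max_prefixes: per-client inbound prefix limit
    Returns (accepted, rejected, limit_exceeded).
    """
    allowed_nets = {ipaddress.ip_network(p) for p in allowed}
    accepted, rejected = [], []
    for prefix in announced:
        if ipaddress.ip_network(prefix) in allowed_nets:
            accepted.append(prefix)
        else:
            rejected.append(prefix)
    # The limit counts every prefix announced, before filtering.
    limit_exceeded = len(announced) > max_prefixes
    return accepted, rejected, limit_exceeded


accepted, rejected, limit_exceeded = check_announcements(
    announced=["192.0.2.0/24", "203.0.113.0/24"],
    allowed=["192.0.2.0/24", "198.51.100.0/24"],
    max_prefixes=1,
)
print(accepted, rejected, limit_exceeded)
```

The prefix limit bounds the scale of a leak without any per-prefix knowledge, while the filter rejects individual unexpected prefixes but depends on the allowed list being kept up to date.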
359 If a route server operator implements per-client inbound prefix 360 filtering, then it is RECOMMENDED that the operator also builds in 361 mechanisms to automatically compare the Adj-RIB-In received from each 362 client with the inbound prefix lists configured for those clients. 363 Naturally, it is the responsibility of the route server client to 364 ensure that their stated prefix list is compatible with what they 365 announce to an IXP route server. However, many network operators do 366 not carefully manage their published routing policies and it is not 367 uncommon to see significant variation between the two sets of 368 prefixes. Route server operator visibility into this discrepancy can 369 provide significant advantages to both operator and client. 371 4.4. Route Server Redundancy 373 As the purpose of an IXP route server implementation is to provide a 374 reliable reachability brokerage service, it is RECOMMENDED that 375 exchange operators who implement route server systems provision 376 multiple route servers on each shared Layer-2 domain. There is no 377 requirement to use the same BGP implementation or operating system 378 for each route server on the IXP fabric; however, it is RECOMMENDED 379 that where an operator provisions more than a single server on the 380 same shared Layer-2 domain, each route server implementation be 381 configured equivalently and in such a manner that the path 382 reachability information from each system is identical. 384 4.5. AS_PATH Consistency Check 386 [RFC4271] requires that every BGP speaker which advertises a route to 387 another external BGP speaker prepends its own AS number as the last 388 element of the AS_PATH sequence. Therefore the leftmost AS in an 389 AS_PATH attribute should be equal to the autonomous system number of 390 the BGP speaker which sent the UPDATE message. 
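The consistency check described above can be sketched in Python as follows. The sketch treats an AS_PATH as a plain list of AS numbers with the leftmost AS first and ignores AS_SET segments; the function name first_as_matches and the AS numbers (from the range reserved for documentation) are assumptions made for this example.

```python
def first_as_matches(as_path, peer_asn):
    """Return True if the leftmost AS in as_path equals peer_asn."""
    return len(as_path) > 0 and as_path[0] == peer_asn


# A route received directly from AS 64501, which prepended its own AS.
print(first_as_matches([64501, 64510], 64501))  # True

# The same path received from a speaker in AS 64500 that did not
# prepend its own AS number.
print(first_as_matches([64501, 64510], 64500))  # False
```

The second call shows the situation in which the check fails: the sending speaker's AS number is absent from the leftmost position of the AS_PATH.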
392 As [I-D.ietf-idr-ix-bgp-route-server] suggests that route servers 393 should not modify the AS_PATH attribute, a consistency check on the 394 AS_PATH of an UPDATE received by a route server client would normally 395 fail. It is therefore RECOMMENDED that route server clients disable 396 the AS_PATH consistency check towards the route server. 398 4.6. Export Routing Policies 400 Policy filtering is commonly implemented on route servers to provide 401 prefix distribution control mechanisms for route server clients. A 402 route server "export" policy is a policy which affects prefixes sent 403 from the route server to a route server client. Several different 404 strategies are commonly used for implementing route server export 405 policies. 407 4.6.1. BGP Communities 409 Prefixes sent to the route server are tagged with specific [RFC1997] 410 or [RFC4360] BGP community attributes, based on pre-defined values 411 agreed between the operator and all clients. Based on these community 412 tags, prefixes may be propagated to all other clients, a subset of 413 clients, or none. This mechanism allows route server clients to 414 instruct the route server to implement per-client export routing 415 policies. 417 As both standard and extended BGP community values are restricted 418 to 6 octets or fewer, the route server operator should take care to ensure 419 that the predefined BGP community values mechanism used on their 420 route server is compatible with [RFC4893] 4-octet autonomous system 421 numbers. 423 4.6.2. Internet Routing Registry 425 Internet Routing Registry databases (IRRDBs) may be used by route 426 server operators to construct per-client routing policies. 427 [RFC2622] Routing Policy Specification Language (RPSL) provides a 428 comprehensive grammar for describing interconnection relationships, 429 and several toolsets exist which can be used to translate RPSL policy 430 descriptions into route server configurations. 432 4.6.3. 
Client-accessible Databases 434 Should the route server operator not wish to use either BGP community 435 tags or the public IRRDBs for implementing client export policies, 436 they may implement their own routing policy database system for 437 managing their clients' requirements. A database of this form SHOULD 438 allow a route server client operator to update their routing policy 439 and provide a mechanism for allowing the client to specify whether 440 they wish to exchange all their prefixes with any other route server 441 client. Optionally, the implementation may allow a client to specify 442 unique routing policies for individual prefixes over which they have 443 routing policy control. 445 4.7. Layer 2 Reachability Problems 447 Layer 2 reachability problems on an IXP can cause serious operational 448 problems for IXP participants which depend on route servers for 449 interconnection. Ethernet switch forwarding bugs have occasionally 450 been observed to cause non-transitive reachability. For example, 451 given a route server and two IXP participants, A and B, if the two 452 participants can reach the route server but cannot reach each other, 453 then traffic between the participants may be dropped until such time 454 as the layer 2 forwarding problem is resolved. This situation does 455 not tend to occur in bilateral interconnection arrangements, as the 456 routing control path between the two hosts is usually (but not 457 always, due to IXP inter-switch connectivity load balancing 458 algorithms) the same as the data path between them. 460 Problems of this form can be dealt with using [RFC5881] bidirectional 461 forwarding detection. However, as this is a bilateral protocol 462 configured between routers, and as there is currently no means for 463 automatic configuration of BFD between route server clients, BFD does 464 not currently provide an optimal means of handling the problem. 466 5. 
Security Considerations 468 On route server installations which do not employ path hiding 469 mitigation techniques, the path hiding problem outlined in 470 Section 4.1 can be used in certain circumstances to proactively block 471 third party prefix announcements from other route server clients. 473 6. IANA Considerations 475 There are no IANA considerations. 477 7. Acknowledgments 479 The authors would like to thank Chris Hall, Ryan Bickhart and Steven 480 Bakker for their valuable input. 482 In addition, the authors would like to acknowledge the developers of 483 BIRD, OpenBGPD and Quagga, whose open source BGP implementations 484 include route server capabilities which are compliant with this 485 document. 487 8. References 489 8.1. Normative References 491 [I-D.ietf-idr-ix-bgp-route-server] 492 Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker, 493 "Internet Exchange Route Server", draft-ietf-idr-ix-bgp- 494 route-server-02 (work in progress), February 2013. 496 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 497 Requirement Levels", BCP 14, RFC 2119, March 1997. 499 8.2. Informative References 501 [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP 502 Communities Attribute", RFC 1997, August 1996. 504 [RFC2622] Alaettinoglu, C., Villamizar, C., Gerich, E., Kessens, D., 505 Meyer, D., Bates, T., Karrenberg, D., and M. Terpstra, 506 "Routing Policy Specification Language (RPSL)", RFC 2622, 507 June 1999. 509 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 510 Protocol 4 (BGP-4)", RFC 4271, January 2006. 512 [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended 513 Communities Attribute", RFC 4360, February 2006. 515 [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route 516 Reflection: An Alternative to Full Mesh Internal BGP 517 (IBGP)", RFC 4456, April 2006. 519 [RFC4893] Vohra, Q. and E. Chen, "BGP Support for Four-octet AS 520 Number Space", RFC 4893, May 2007. 522 [RFC5226] Narten, T. and H. 
Alvestrand, "Guidelines for Writing an 523 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 524 May 2008. 526 [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 527 (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 528 2010. 530 [RS-ARCH] Govindan, R., Alaettinoglu, C., Varadhan, K., and D. 531 Estrin, "A Route Server Architecture for Inter-Domain 532 Routing", 1995, 533 . 535 Authors' Addresses 537 Nick Hilliard 538 INEX 539 4027 Kingswood Road 540 Dublin 24 541 IE 543 Email: nick@inex.ie 545 Elisa Jasinska 546 Microsoft Corporation 547 One Microsoft Way 548 Redmond, WA 98052 549 US 551 Email: ejas@microsoft.com 553 Robert Raszuk 554 NTT I3 555 101 S Ellsworth Avenue Suite 350 556 San Mateo, CA 94401 557 US 559 Email: robert@raszuk.net 561 Niels Bakker 562 AMS-IX B.V. 563 Westeinde 12 564 Amsterdam, NH 1017 ZN 565 NL 567 Email: niels.bakker@ams-ix.net