GROW Working Group                                           N. Hilliard
Internet-Draft                                                      INEX
Intended status: Informational                               E. Jasinska
Expires: April 27, 2012                               Limelight Networks
                                                               R. Raszuk
                                                            NTT MCL Inc.
                                                               N. Bakker
                                                             AMS-IX B.V.
                                                        October 25, 2011


               Internet Exchange Route Server Operations
            draft-hilliard-ix-bgp-route-server-operations-00

Abstract

   The growing popularity of Internet exchange points (IXPs) brings a
   new set of requirements to interconnect participating networks.
   While bilateral exterior BGP sessions between exchange participants
   were previously the most common means of exchanging reachability
   information, the overhead associated with dense interconnection has
   caused substantial operational scaling problems for Internet exchange
   point participants.

   Multilateral interconnection can dramatically reduce the
   administrative and operational overhead of IXP participation and is
   now used by many IXP participants as a means of exchanging routing
   information.

   This document outlines operational considerations for multilateral
   interconnections at IXPs.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Bilateral Interconnection
   3.  Multilateral Interconnection
   4.  Operational Considerations for Route Server Installations
     4.1.  Path Hiding
     4.2.  Route Server Scaling
       4.2.1.  Tackling Scaling Issues
         4.2.1.1.  View Merging and Decomposition
         4.2.1.2.  Destination Splitting
         4.2.1.3.  NEXT_HOP Resolution
     4.3.  NLRI Leakage Mitigation
     4.4.  Route Server Redundancy
     4.5.  AS_PATH Consistency Check
     4.6.  Export Routing Policies
       4.6.1.  BGP Communities
       4.6.2.  Internet Routing Registry
       4.6.3.  Client-accessible Databases
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Internet exchange points (IXPs) provide IP data interconnection
   facilities for their participants, typically using shared Layer-2
   networking media such as Ethernet.  The Border Gateway Protocol (BGP)
   [RFC4271], an inter-Autonomous System routing protocol, is commonly
   used to facilitate exchange of network reachability information over
   such media.

   BGP route servers [I-D.jasinska-ix-bgp-route-server] are commonly
   deployed by IXP operators to provide a simple and convenient means of
   interconnecting IXP participants with each other.  A route server
   redistributes prefixes received from its BGP clients to other clients
   according to a pre-specified policy, and it can be viewed as an eBGP
   equivalent of an iBGP route reflector [RFC4456].

   Route servers at IXPs require careful management, and it is important
   for route server operators to understand thoroughly both how they
   work and what their limitations are.  In this document, we discuss
   several issues of direct operational relevance to route server
   operators, and provide a number of recommendations to help route
   server operators provision a reliable multilateral interconnection
   service.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

2.  Bilateral Interconnection

   Bilateral interconnection is a method of interconnecting routers
   using individual BGP sessions between each participant router on an
   IXP, in order to exchange reachability information.
   While interconnection policies vary from participant to participant,
   most IXPs have significant numbers of participants who see value in
   interconnecting with as many other exchange participants as possible.
   In order for an IXP participant to implement a dense interconnection
   policy, it is necessary for the participant to liaise with each of
   its intended interconnection partners.  If a partner agrees to
   interconnect, then both participants' routers must be configured with
   a BGP session to exchange network reachability information.  If each
   exchange participant interconnects with each other participant, a
   full mesh of BGP sessions is needed, as shown in Figure 1.

                         ___      ___
                        /   \    /   \
                     ..| AS1 |..| AS2 |..
                    :   \___/____\___/   :
                    :     | \    / |     :
                    :     |  \  /  |     :
                    : IXP |   \/   |     :
                    :     |   /\   |     :
                    :     |  /  \  |     :
                    :    _|_/____\_|_    :
                    :   /   \    /   \   :
                     ..| AS3 |..| AS4 |..
                        \___/    \___/

             Figure 1: Full-Mesh Interconnection at an IXP

   Figure 1 depicts an IXP platform with four connected routers,
   administered by four separate exchange participants, each of them
   with a locally unique autonomous system number: AS1, AS2, AS3 and
   AS4.  Each of these four participants wishes to exchange traffic with
   all other participants; this is accomplished by configuring a full
   mesh of BGP sessions on each router connected to the exchange,
   resulting in 6 BGP sessions across the IXP fabric.

   The number of BGP sessions at an exchange has an upper bound of
   n*(n-1)/2, where n is the number of routers at the exchange.  As many
   exchanges have relatively large numbers of participating networks,
   the quadratic scaling requirements of dense interconnection tend to
   cause high operational and administrative overhead.
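
   As a rough illustration of the quadratic bound above, the following
   Python sketch counts the sessions in a full mesh.  The function name
   is an assumption made for this example and does not come from this
   document.

```python
def full_mesh_sessions(n: int) -> int:
    """Upper bound on bilateral BGP sessions among n routers: n*(n-1)/2."""
    if n < 0:
        raise ValueError("number of routers must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Four routers, as in Figure 1, need six sessions; the count grows
    # quadratically as more routers join the exchange.
    for routers in (4, 50, 500):
        print(routers, full_mesh_sessions(routers))
```

   For the four routers of Figure 1 this gives 6 sessions, matching the
   count stated in the text.
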
   Consequently, new participants to an IXP require significant initial
   resourcing in order to gain value from their IXP connection, while
   existing exchange participants need to commit ongoing resources in
   order to benefit from interconnecting with these new participants.

3.  Multilateral Interconnection

   Multilateral interconnection is implemented using a route server
   configured to use BGP to distribute network layer reachability
   information (NLRI) among all client routers.  The route server
   preserves the BGP NEXT_HOP attribute from all received NLRI UPDATE
   messages, and passes these messages with unchanged NEXT_HOP to its
   route server clients, according to its configured routing policy, as
   described in [I-D.jasinska-ix-bgp-route-server].  Using this method
   of exchanging NLRI messages, an IXP participant router can receive an
   aggregated list of prefixes from all other route server clients using
   a single BGP session to the route server instead of depending on BGP
   sessions with each other router at the exchange.  This reduces the
   overall number of BGP sessions at an Internet exchange from n*(n-1)/2
   to n, where n is the number of routers at the exchange.

   Although a route server uses BGP to exchange reachability information
   with each of its clients, it does not forward traffic itself and is
   therefore not a router.

   In practical terms, this allows dense interconnection between IXP
   participants with low administrative overhead and significantly
   simpler and smaller router configurations.  In particular, new IXP
   participants benefit from immediate and extensive interconnection,
   while existing route server participants receive reachability
   information from these new participants without necessarily having to
   modify their configurations.

                         ___      ___
                        /   \    /   \
                     ..| AS1 |..| AS2 |..
                    :   \___/    \___/   :
                    :      \      /      :
                    :       \    /       :
                    :        \__/        :
                    : IXP   /    \       :
                    :       | RS |       :
                    :       \____/       :
                    :        /  \        :
                    :       /    \       :
                    :    __/      \__    :
                    :   /   \    /   \   :
                     ..| AS3 |..| AS4 |..
                        \___/    \___/

         Figure 2: IXP-based Interconnection with Route Server

   As illustrated in Figure 2, each router on the IXP fabric requires
   only a single BGP session to the route server, from which it can
   receive reachability information for all other routers on the IXP
   which also connect to the route server.

4.  Operational Considerations for Route Server Installations

4.1.  Path Hiding

   "Path hiding" is a term used in [I-D.jasinska-ix-bgp-route-server] to
   describe the process whereby a route server may mask individual paths
   by applying conflicting routing policies to its Loc-RIB.  When this
   happens, route server clients receive incomplete information from the
   route server about network reachability.

   There are several approaches which may be used to mitigate the effect
   of path hiding; these are described in
   [I-D.jasinska-ix-bgp-route-server].  However, the only method which
   does not require explicit support from the route server client is for
   the route server itself to maintain an individual Loc-RIB for each
   client.

4.2.  Route Server Scaling

   While deployment of multiple Loc-RIBs on the route server presents a
   simple way to avoid the path hiding problem noted in Section 4.1,
   this approach requires significantly more computing resources on the
   route server than a single Loc-RIB shared by all clients.  As the
   [RFC4271] Decision Process must be applied to all Loc-RIBs deployed
   on the route server, both CPU and memory requirements on the host
   computer scale approximately according to O(P * N), where P is the
   total number of unique paths received by the route server and N is
   the number of route server clients which require a unique Loc-RIB.
   As this is a super-linear scaling relationship, large route servers
   may derive benefit from deploying per-client Loc-RIBs only where they
   are required.

   Regardless of any Loc-RIB optimization implemented, the route
   server's control plane bandwidth requirements will scale according to
   O(P * N), where P is the total number of unique paths received by the
   route server and N is the total number of route server clients.  In
   the case where P_avg (the arithmetic mean number of unique paths
   received per route server client) remains roughly constant even as
   the number of connected clients increases, this relationship can be
   rewritten as O((P_avg * N) * N) or O(N^2).  This quadratic upper
   bound on the network traffic requirements indicates that the route
   server model will not scale to arbitrarily large sizes.

4.2.1.  Tackling Scaling Issues

   The network traffic scaling issue presents significant difficulties
   with no clear solution: ultimately, each client must receive an
   UPDATE for each unique prefix received by the route server.  However,
   there are several potential methods for dealing with the CPU and
   memory resource requirements of route servers.

4.2.1.1.  View Merging and Decomposition

   View merging and decomposition, outlined in [RS-ARCH], describes a
   method of optimising memory and CPU requirements where multiple route
   server clients are subject to exactly the same routing policies.  In
   this situation, the multiple Loc-RIB views required by each client
   are merged into a single view.

   There are several variations of this approach.  If the route server
   operator has prior knowledge of interconnection relationships between
   route server clients, then the operator may configure separate
   Loc-RIBs only for route server clients with unique outbound routing
   policies.
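
   One way to picture the operator-driven variation is to group clients
   by routing policy, so that clients with identical policies share one
   Loc-RIB view.  The Python sketch below is illustrative only: the
   client names and the representation of a policy as the set of peers
   whose routes are withheld from a client are assumptions made for this
   example, not something defined in this document or in [RS-ARCH].

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def merge_views(policies: Dict[str, FrozenSet[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group clients whose policies are identical so they can share a view.

    `policies` maps a client name to a hypothetical policy: the set of
    peers whose routes must be withheld from that client.  An empty set
    means the client accepts routes from every other client.
    """
    views: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for client in sorted(policies):
        views[policies[client]].append(client)
    return dict(views)


if __name__ == "__main__":
    example = {
        "AS1": frozenset(),
        "AS2": frozenset(),
        "AS3": frozenset({"AS1"}),  # AS3 does not want routes from AS1
        "AS4": frozenset(),
    }
    for policy, clients in merge_views(example).items():
        print(sorted(policy), clients)
```

   In this example only two views are needed for four clients: one
   shared by the three clients with an open policy, and one for the
   client with a unique policy.
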
   As this approach requires prior knowledge of interconnection
   relationships, the route server operator must depend on each client
   sharing their interconnection policies, either in an internal
   provisioning database controlled by the operator, or else in an
   external data store such as an Internet Routing Registry Database.

   Conversely, the route server implementation itself may implement view
   decomposition by creating virtual Loc-RIBs based on a single in-
   memory master Loc-RIB, with delta differences for each prefix subject
   to different routing policies.  This allows a more granular and
   flexible approach to the problem of Loc-RIB scaling, at the expense
   of requiring a more complex in-memory Loc-RIB structure.

   Whatever method of view merging and decomposition is chosen on a
   route server, pathological edge cases can be created whereby the
   method will scale no better than fully non-optimised per-client
   Loc-RIBs.  However, as most route server clients connect to a route
   server for the purposes of reducing overhead, rather than
   implementing complex per-client routing policies, edge cases tend not
   to arise in practice.

4.2.1.2.  Destination Splitting

   Destination splitting, also described in [RS-ARCH], describes a
   method for route server clients to connect to multiple route servers
   and to send non-overlapping sets of prefixes to each route server.
   As each route server computes the best path for its own set of
   prefixes, the quadratic scaling requirement operates on multiple
   smaller sets of prefixes.  This reduces the overall computational and
   memory requirements for managing multiple Loc-RIBs and performing the
   best-path calculation on each.  In order for this method to perform
   well, destination splitting would require significant co-ordination
   between the route server operator and each route server client.
   In practice, such levels of co-ordination are unlikely to work
   successfully, thereby diminishing the value of this approach.

4.2.1.3.  NEXT_HOP Resolution

   As route servers are usually deployed at IXPs which use flat layer 2
   networks, recursive resolution of the NEXT_HOP attribute is generally
   not required, and can be replaced by a simple check to ensure that
   the NEXT_HOP value for each prefix is a network address within the
   IXP LAN's IP address range.

4.3.  NLRI Leakage Mitigation

   NLRI leakage occurs when a BGP client unintentionally distributes
   NLRI UPDATE messages to one or more neighboring BGP routers.  NLRI
   leakage of this form to a route server can cause connectivity
   problems at an IXP if each route server client is configured to
   accept all prefix UPDATE messages from the route server.  Due to the
   potential for collateral damage caused by NLRI leakage, it is
   therefore RECOMMENDED that route server operators deploy NLRI leakage
   mitigation measures in order to prevent unintentional prefix
   announcements or else limit the scale of any such leak.  Although not
   foolproof, per-client inbound prefix limits can restrict the damage
   caused by prefix leakage in many cases.  Per-client inbound prefix
   filtering on the route server is a more deterministic and usually
   more reliable means of preventing prefix leakage, but requires more
   administrative resources to maintain properly.

   If a route server operator implements per-client inbound prefix
   filtering, then it is RECOMMENDED that the operator also builds in
   mechanisms to automatically compare the Adj-RIB-In received from each
   client with the inbound prefix lists configured for those clients.
   Naturally, it is the responsibility of the route server client to
   ensure that their stated prefix list is compatible with what they
   announce to an IXP route server.
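
   The comparison recommended above can be sketched in a few lines of
   Python.  This is a minimal sketch under stated assumptions: the
   function name is hypothetical, prefixes are compared by exact match
   only (a real filter may also permit more-specific prefixes), and the
   way the Adj-RIB-In and the configured prefix list are obtained from a
   particular route server implementation is left out.

```python
import ipaddress
from typing import Iterable, List, Tuple


def compare_prefixes(received: Iterable[str],
                     configured: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (unexpected, missing) prefixes for one route server client.

    unexpected: received from the client but absent from its prefix list.
    missing:    on the prefix list but not currently received.
    """
    rx = {ipaddress.ip_network(p) for p in received}
    cfg = {ipaddress.ip_network(p) for p in configured}

    # Sort on (version, address, length) so IPv4 and IPv6 can be mixed.
    def key(net):
        return (net.version, int(net.network_address), net.prefixlen)

    unexpected = [str(n) for n in sorted(rx - cfg, key=key)]
    missing = [str(n) for n in sorted(cfg - rx, key=key)]
    return unexpected, missing


if __name__ == "__main__":
    # Documentation prefixes stand in for a client's real announcements.
    adj_rib_in = ["192.0.2.0/24", "203.0.113.0/24"]
    prefix_list = ["192.0.2.0/24", "198.51.100.0/24"]
    print(compare_prefixes(adj_rib_in, prefix_list))
```

   A non-empty "unexpected" list points at a possible leak or a stale
   prefix list, while a non-empty "missing" list may simply mean that
   the client is not announcing everything it has registered.
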
   However, many network operators do not carefully manage their
   published routing policies, and it is not uncommon to see significant
   variation between the two sets of prefixes.  Route server operator
   visibility into this discrepancy can provide significant advantages
   to both operator and client.

4.4.  Route Server Redundancy

   As the purpose of an IXP route server implementation is to provide a
   reliable reachability brokerage service, it is RECOMMENDED that
   exchange operators who implement route server systems provision
   multiple route servers on each shared Layer-2 domain.  There is no
   requirement to use the same BGP implementation or operating system
   for each route server on the IXP fabric; however, it is RECOMMENDED
   that where an operator provisions more than a single server on the
   same shared Layer-2 domain, each route server implementation be
   configured equivalently and in such a manner that the path
   reachability information from each system is identical.

4.5.  AS_PATH Consistency Check

   [RFC4271] requires that every BGP speaker which advertises a route to
   another external BGP speaker prepends its own AS number as the last
   element of the AS_PATH sequence (that is, in the leftmost position).
   Therefore the leftmost AS in an AS_PATH attribute should be equal to
   the autonomous system number of the BGP speaker which sent the UPDATE
   message.

   As [I-D.jasinska-ix-bgp-route-server] suggests that route servers
   should not modify the AS_PATH attribute, a consistency check on the
   AS_PATH of an UPDATE received by a route server client would normally
   fail.  It is therefore RECOMMENDED that route server clients disable
   the AS_PATH consistency check towards the route server.

4.6.  Export Routing Policies

   Policy filtering is commonly implemented on route servers to provide
   prefix distribution control mechanisms for route server clients.
   A route server "export" policy is a policy which affects prefixes
   sent from the route server to a route server client.  Several
   different strategies are commonly used for implementing route server
   export policies.

4.6.1.  BGP Communities

   Prefixes sent to the route server are tagged with specific [RFC1997]
   or [RFC4360] BGP community attributes, based on pre-defined values
   agreed between the operator and all clients.  Based on these
   community tags, prefixes may be propagated to all other clients, a
   subset of clients, or none.  This mechanism allows route server
   clients to instruct the route server to implement per-client export
   routing policies.

   As standard and extended BGP community values are restricted to 6
   octets or fewer, the route server operator should take care to ensure
   that the predefined BGP community values mechanism used on their
   route server is compatible with [RFC4893] 4-octet autonomous system
   numbers.

4.6.2.  Internet Routing Registry

   Internet Routing Registry databases (IRRDBs) may be used by route
   server operators to construct per-client routing policies.  [RFC2622]
   Routing Policy Specification Language (RPSL) provides a comprehensive
   grammar for describing interconnection relationships, and several
   toolsets exist which can be used to translate RPSL policy
   descriptions into route server configurations.

4.6.3.  Client-accessible Databases

   Should the route server operator not wish to use either BGP community
   tags or the public IRRDBs for implementing client export policies,
   they may implement their own routing policy database system for
   managing their clients' requirements.  A database of this form SHOULD
   allow a route server client operator to update their routing policy
   and provide a mechanism for allowing the client to specify whether
   they wish to exchange all their prefixes with any other route server
   client.
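
   A database of this kind could be as simple as a per-client list of
   peers with which the client declines to exchange prefixes.  The
   Python sketch below is a hypothetical in-memory model, not a
   description of any existing system: the class and method names are
   invented for this example, and treating a decline by either party as
   blocking the exchange in both directions is an assumption.

```python
from typing import Dict, Set


class PolicyDatabase:
    """Hypothetical store: for each client, the peers it declines."""

    def __init__(self) -> None:
        self._declined: Dict[str, Set[str]] = {}

    def set_declined(self, client: str, peers: Set[str]) -> None:
        """Record (or replace) the peers that `client` declines."""
        self._declined[client] = set(peers)

    def should_export(self, origin: str, target: str) -> bool:
        """True if prefixes from `origin` may be sent to `target`."""
        if origin == target:
            return False  # never reflect a client's prefixes back to it
        origin_declines = target in self._declined.get(origin, set())
        target_declines = origin in self._declined.get(target, set())
        return not (origin_declines or target_declines)


if __name__ == "__main__":
    db = PolicyDatabase()
    db.set_declined("AS1", {"AS3"})
    print(db.should_export("AS1", "AS2"))
    print(db.should_export("AS1", "AS3"))
```

   The route server configuration generator would then consult
   should_export() for each pair of clients when building export
   filters.
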
   Optionally, the implementation may allow a client to specify unique
   routing policies for individual prefixes over which they have routing
   policy control.

5.  Security Considerations

   On route server installations which do not employ path hiding
   mitigation techniques, the path hiding problem outlined in
   Section 4.1 can be used in certain circumstances to proactively block
   third party prefix announcements from other route server clients.

6.  IANA Considerations

   There are no IANA considerations.

7.  Acknowledgments

   The authors would like to thank Chris Hall, Ryan Bickhart and Steven
   Bakker for their valuable input.

   In addition, the authors would like to acknowledge the developers of
   BIRD, OpenBGPD and Quagga, whose open source BGP implementations
   include route server capabilities which are compliant with this
   document.

8.  References

8.1.  Normative References

   [I-D.jasinska-ix-bgp-route-server]
              Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
              "Internet Exchange Route Server",
              draft-jasinska-ix-bgp-route-server-03 (work in progress),
              October 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC2622]  Alaettinoglu, C., Villamizar, C., Gerich, E., Kessens, D.,
              Meyer, D., Bates, T., Karrenberg, D., and M. Terpstra,
              "Routing Policy Specification Language (RPSL)", RFC 2622,
              June 1999.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, February 2006.

   [RFC4456]  Bates, T., Chen, E., and R.
              Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC4893]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
              Number Space", RFC 4893, May 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RS-ARCH]  Govindan, R., Alaettinoglu, C., Varadhan, K., and D.
              Estrin, "A Route Server Architecture for Inter-Domain
              Routing", 1995.

Authors' Addresses

   Nick Hilliard
   INEX
   4027 Kingswood Road
   Dublin  24
   IE

   Email: nick@inex.ie


   Elisa Jasinska
   Limelight Networks
   222 South Mill Avenue
   Tempe, AZ  85281
   US

   Email: elisa@llnw.com


   Robert Raszuk
   NTT MCL Inc.
   101 S Ellsworth Avenue Suite 350
   San Mateo, CA  94401
   US

   Email: robert@raszuk.net


   Niels Bakker
   AMS-IX B.V.
   Westeinde 12
   Amsterdam, NH  1017 ZN
   NL

   Email: niels.bakker@ams-ix.net