idnits 2.17.1 draft-francis-idr-intra-va-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 18. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 1301. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1312. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1319. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1325. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year -- The exact meaning of the all-uppercase expression 'MAY NOT' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). Found 'MAY NOT' in this paragraph: 3. If a non-APR router does not have a route for a known VP, then it MAY or MAY NOT install sub-prefixes within that VP. 
Whether or not it does is up to the vendor and the network operator. One approach is to never install such sub-prefixes, on the assumption that the network operator will engineer his network so that this rarely if ever happens. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 15, 2008) is 5701 days in the past. Is this intentional? Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277) Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group P. Francis 3 Internet-Draft Cornell U. 4 Intended status: BCP X. Xu 5 Expires: March 19, 2009 Huawei 6 H. Ballani 7 Cornell U. 8 September 15, 2008 10 FIB Suppression with Virtual Aggregation and Default Routes 11 draft-francis-idr-intra-va-01.txt 13 Status of this Memo 15 By submitting this Internet-Draft, each author represents that any 16 applicable patent or other IPR claims of which he or she is aware 17 have been or will be disclosed, and any of which he or she becomes 18 aware will be disclosed, in accordance with Section 6 of BCP 79. 
20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as Internet- 23 Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six months 26 and may be updated, replaced, or obsoleted by other documents at any 27 time. It is inappropriate to use Internet-Drafts as reference 28 material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt. 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 This Internet-Draft will expire on March 19, 2009. 38 Abstract 40 The continued growth in the Default Free Routing Table (DFRT) 41 stresses the global routing system in a number of ways. One of the 42 most costly stresses is FIB size: ISPs often must upgrade router 43 hardware simply because the FIB has run out of space, and router 44 vendors must design routers that have adequate FIB. FIB suppression 45 is an approach to relieving stress on the FIB by NOT loading selected 46 RIB entries into the FIB. This document specifies two styles of FIB 47 suppression. Edge suppression (ES) allows ISPs that deploy a core- 48 edge topology to shrink the FIBs of their edge routers, including 49 those that interface to other ISPs and exchange the full DFRT. 50 Virtual Aggregation (VA) allows ISPs to shrink the FIBs of any and 51 all routers. Both styles may be deployed autonomously by an ISP 52 (cooperation between ISPs is not required), and can co-exist with 53 legacy routers in the ISP. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 58 1.1. Scope of this Document . . . . . . . . . . . . . . . . . . 5 59 1.2. Requirements notation . . . . . . . . . . . . . . . . . . 5 60 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 61 1.3.1. 
Terms common to both VA and ES . . . . . . . . . . . . 5 62 1.3.2. Terms unique to VA . . . . . . . . . . . . . . . . . . 6 63 1.3.3. Terms unique to ES . . . . . . . . . . . . . . . . . . 7 64 1.4. Temporary Sections . . . . . . . . . . . . . . . . . . . . 7 65 1.4.1. Status as of September 2008 . . . . . . . . . . . . . 7 66 1.4.2. Document revisions . . . . . . . . . . . . . . . . . . 8 67 1.4.3. Open Questions . . . . . . . . . . . . . . . . . . . . 8 68 2. Overview of Virtual Aggregation (VA) . . . . . . . . . . . . . 10 69 2.1. Mix of legacy and VA routers . . . . . . . . . . . . . . . 11 70 2.2. Summary of Tunnels and Paths . . . . . . . . . . . . . . . 12 71 3. Specification of Edge Suppression (ES) . . . . . . . . . . . . 14 72 4. Specification of VA . . . . . . . . . . . . . . . . . . . . . 16 73 4.1. Requirements for VA . . . . . . . . . . . . . . . . . . . 16 74 4.2. VA Operation . . . . . . . . . . . . . . . . . . . . . . . 16 75 4.2.1. Legacy Routers . . . . . . . . . . . . . . . . . . . . 16 76 4.2.2. Advertising and Handling Virtual Prefixes (VP) . . . . 17 77 4.2.3. Border VA Routers . . . . . . . . . . . . . . . . . . 21 78 4.2.4. Advertising and Handling Sub-Prefixes . . . . . . . . 22 79 4.2.5. Suppressing FIB Sub-prefix Routes . . . . . . . . . . 22 80 4.3. Requirements Discussion . . . . . . . . . . . . . . . . . 24 81 4.3.1. Response to router failure . . . . . . . . . . . . . . 24 82 4.3.2. Traffic Engineering . . . . . . . . . . . . . . . . . 25 83 4.3.3. Incremental and safe deploy and start-up . . . . . . . 25 84 4.3.4. VA security . . . . . . . . . . . . . . . . . . . . . 26 85 4.4. New Configuration . . . . . . . . . . . . . . . . . . . . 26 86 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 87 6. Security Considerations . . . . . . . . . . . . . . . . . . . 29 88 6.1. Properly Configured VA . . . . . . . . . . . . . . . . . . 29 89 6.2. Mis-configured VA . . . . . . . . . . . . . . . . . . . . 29 90 7. 
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 30 91 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 31 92 8.1. Normative References . . . . . . . . . . . . . . . . . . . 31 93 8.2. Informative References . . . . . . . . . . . . . . . . . . 31 94 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 32 95 Intellectual Property and Copyright Statements . . . . . . . . . . 33 97 1. Introduction 99 ISPs today manage constant DFRT growth in a number of ways. Most 100 commonly, ISPs will upgrade their router hardware before DFRT growth 101 outstrips the size of the FIB. In cases where an ISP wants to 102 continue to use routers whose FIBs are not large enough, it may 103 deploy them at edge locations where a full DFRT is not needed, for 104 instance at the customer interface. Packets for which there is no 105 route are defaulted to a "core" infrastructure that does contain the 106 full DFRT. While this helps, it cannot be used for all edge routers, 107 for instance those that interface with other ISPs. Alternatively, 108 some lower-tier ISPs may simply ignore some routes, for instance 109 /24's that fall within the aggregate of another route. 111 FIB Suppression is an approach to shrinking FIB size that requires no 112 changes to BGP, no changes to packet forwarding mechanisms in 113 routers, and relatively minor changes to control mechanisms in 114 routers and configuration of those mechanisms. The core idea behind 115 FIB suppression is to run BGP as normal, and in particular to not 116 shrink the RIB, but rather to not load certain RIB entries into the 117 FIB, for instance by not committing them to the Routing Table. This 118 approach minimizes changes to routers, and in particular is simpler 119 than more general routing architectures that try to shrink both RIB 120 and FIB. With FIB suppression, there are no changes to BGP per se. 121 The BGP decision process does not change. 
The selected AS-path does 122 not change, and except on rare occasion the exit router does not 123 change. ISPs can deploy FIB suppression autonomously and with no 124 coordination with neighbor ASes. 126 This document describes two styles of FIB suppression, "Edge 127 Suppression" (ES) and "Virtual Aggregation" (VA). ES can be used in 128 ISPs that deploy a "core-edge" topology, where edge routers can 129 default route to core routers. In fact, this basic approach is in 130 use today with edge routers whose external peers do not require the 131 full DFRT, for instance stub networks. ES extends this to edge 132 routers whose external peers do require the full DFRT, including 133 neighbor ISPs and many multi-homed stub networks. ES requires that 134 core routers load the full DFRT into FIBs (i.e. do no FIB 135 suppression). ES operates by tunneling MPLS packets from the core, 136 through edge routers, to external peers (although edge routers strip 137 the MPLS header before forwarding packets to external peers). ES 138 works with legacy core routers, although they must be capable of 139 using MPLS tunnels. ES also works with any mix of legacy and 140 upgraded edge routers. ES imposes minimal new configuration 141 requirements on network operators. 143 By contrast, Virtual Aggregation (VA) allows for FIB suppression in 144 any and all routers within an ISP. The savings can be dramatic, 145 easily 5x or 10x with only a slight path length and router load 146 increase [va-tech-report-08]. VA operates by organizing the IP (v4 147 or v6) address space into Virtual Prefixes (VP), and using tunnels to 148 aggregate the (regular) sub-prefixes within each VP. 150 1.1. Scope of this Document 152 The scope of this document is limited to Intra-domain ES and VA 153 operation. In other words, the case where a single ISP autonomously 154 operates ES or VA internally without any coordination with 155 neighboring ISPs. 
157 Note that this document assumes that the ES or VA "domain" (i.e. the 158 unit of autonomy) is the AS (that is, different ASes run VA 159 independently and without coordination). For the remainder of this 160 document, the terms ISP, AS, and domain are used interchangeably. 162 This document applies equally to IPv4 and IPv6. 164 ES or VA may operate with a mix of upgraded routers and legacy 165 routers. There are no topological restrictions placed on the mix of 166 routers. In order to avoid loops between upgraded and legacy 167 routers, however, any legacy routers that require a full FIB MUST 168 participate in tunnel formation (MPLS). 170 ES and VA use tunnels. While in principle a variety of tunnels may 171 be used---any tunnel that works for deploying a VPN---this document 172 limits itself to the use of MPLS tunnels, and indeed the terms 173 "tunnel" and "LSP" (Label Switched Path) are used somewhat 174 interchangeably. This document also generally assumes the use of the 175 Label Distribution Protocol (LDP) as the default method of 176 establishing LSPs [RFC5036]. Other methods of establishing LSPs may 177 be used. Future versions of this document may specify the use of 178 other tunnel types. 180 1.2. Requirements notation 182 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 183 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 184 document are to be interpreted as described in [RFC2119]. 186 1.3. Terminology 188 1.3.1. Terms common to both VA and ES 189 Install and Suppress: The terms "install" and "suppress" are used to 190 describe whether a RIB entry has been loaded or not loaded into 191 the FIB (or, equivalently, the Routing Table). In other words, 192 the phrase "install a route" means "install a route into the FIB", 193 and the phrase "suppress a route" means "do not install a route 194 into the FIB". 196 Legacy Router: A router that does not run VA or ES, and has no 197 knowledge of VA or ES. 
Legacy routers, however, must participate 198 in tunneling (with the exception of edge routers in ES that do not 199 carry the full DFRT). 201 Popular Prefix: A popular prefix is a sub-prefix that is installed 202 in a router in addition to the sub-prefixes it holds by virtue of 203 being an Aggregation Point Router (in the case of VA), or in 204 addition to the default route (in the case of ES). The popular 205 prefix allows packets to follow the shortest path. Note that 206 different routers do not need to have the same set of popular 207 prefixes. 209 Routing Table: The term Routing Table is defined here the same way 210 as in Section 3.2 of [RFC4271]: "Routing information that the BGP 211 speaker uses to forward packets (or to construct the forwarding 212 table used for packet forwarding) is maintained in the Routing 213 Table." As such, FIB Suppression can be achieved by not 214 installing a route into the Routing Table. 216 Routing Information Base (RIB): The term RIB is used loosely 217 in this document to refer either to the Loc-RIB (as used in 218 [RFC4271]), or to the combined Adj-RIBs-In, the Loc-RIB, and the 219 Adj-RIBs-Out. 221 1.3.2. Terms unique to VA 223 Aggregation Point Router (APR): An Aggregation Point Router (APR) is 224 a router that aggregates a Virtual Prefix (VP) by installing 225 routes (into the FIB) for all of the sub-prefixes within the VP. 226 APRs advertise the VP to other routers with BGP. For each sub- 227 prefix within the VP, APRs have a Label Switched Path (LSP) from 228 themselves to the external peer where packets for that prefix 229 should be delivered. 231 non-APR Router: In discussing VPs, it is often necessary to 232 distinguish between routers that are APRs for a given VP, and routers 233 that are not APRs for that VP (but of course may be APRs for other 234 VPs not under discussion). 
In these cases, the term "APR" will be 235 taken to mean "a VA router that is an APR for the given VP", and 236 the term "non-APR" will be taken to mean "a VA router that is not 237 an APR for the given VP". The term non-APR router will not be 238 used to refer to legacy routers. 240 Sub-Prefix: A regular (physically aggregatable) prefix. These are 241 equivalent to the prefixes that would normally comprise the DFRT 242 in the absence of VA. A VA router will contain a sub-prefix entry 243 either because the sub-prefix falls within a virtual prefix for 244 which the router is an APR, or because the sub-prefix is installed 245 as a popular prefix. Legacy routers hold the same sub-prefixes 246 they hold today. 248 VA router: A router that operates Virtual Aggregation according to 249 this document. 251 Virtual Prefix (VP): A Virtual Prefix (VP) is a prefix used to 252 aggregate its contained regular prefixes (sub-prefixes). A VP is 253 not physically aggregatable, and so it is aggregated at APRs 254 through the use of tunnels. 256 VP-List: A list of all VPs that must be statically configured into 257 every VA router. 259 1.3.3. Terms unique to ES 261 Core router: A router deployed in the core of a core-edge topology. 262 Core routers may be legacy routers, but they MUST participate in 263 tunnel creation (i.e. they must run MPLS), and they MUST NOT do 264 FIB suppression. 266 ES router: An edge router that operates Edge Suppression according 267 to this document. 269 1.4. Temporary Sections 271 This section contains temporary information, and will be removed in 272 the final version. 274 1.4.1. Status as of September 2008 276 A "configuration-only" variant of VA (i.e. one that can be deployed 277 with today's legacy routers) has been configured and tested on a 278 small testbed of commercial routers, as described in 279 [va-tech-report-08]. 
While this serves as proof that the data-plane 280 portion of Virtual Aggregation works, this configuration is 281 relatively complex, and there are some control-plane performance 282 issues associated with the routers that we configured. The changes 283 specified by this document (i.e. Section 4) are currently under 284 development. 286 1.4.2. Document revisions 288 1.4.2.1. Revisions from 00 version 290 o Changed intended document type from STD to BCP, as per advice from 291 Dublin IDR meeting. 293 o Cleaned up the MPLS language, and specified that the full-address 294 routes to external peers must be imported into OSPF 295 (Section 4.2.3). As per Daniel Ginsburg's email 296 http://www.ietf.org/mail-archive/web/idr/current/msg02933.html. 298 o Clarified that legacy routers must run MPLS. As per Daniel 299 Ginsburg's email 300 http://www.ietf.org/mail-archive/web/idr/current/msg02935.html. 302 o Fixed LOCAL_PREF bug. As per Daniel Ginsburg's email 303 http://www.ietf.org/mail-archive/web/idr/current/msg02940.html. 305 o Removed the need for the extended communities attribute on VP 306 routes, and added the requirement that all VA routers be 307 statically configured with the complete list of VPs. As per 308 Daniel Ginsburg's emails 309 http://www.ietf.org/mail-archive/web/idr/current/msg02940.html and 310 http://www.ietf.org/mail-archive/web/idr/current/msg02958.html. 311 In addition, the procedure for adding, deleting, splitting, and 312 merging VPs was added. As part of this, the possibility of having 313 overlapping VPs was added. 315 o Added the special case of a core-edge topology with default routes 316 to the edge as suggested by Robert Raszuk in email 317 http://www.ietf.org/mail-archive/web/idr/current/msg02948.html. 318 Note that this altered the structure and even title of the 319 document. 
321 o Clarified that FIB suppression can be achieved by not loading 322 entries into the Routing Table, as suggested by Rajiv Asati in 323 email 324 http://www.ietf.org/mail-archive/web/idr/current/msg03019.html. 326 1.4.3. Open Questions 328 o Should we document IP-IP tunnels? Note that doing so may require 329 changes to BGP in order to distribute GRE Key values. 331 o Should we document stacked labels, where the outer label 332 terminates at the VA border router, and the inner label identifies 333 the external peer? Note that doing so may require changes to BGP 334 in order to distribute labels (similarly to what is done for BGP- 335 MPLS VPNs). 337 2. Overview of Virtual Aggregation (VA) 339 For descriptive simplicity, this section starts by describing VA 340 assuming that there are no legacy routers in the domain. Section 2.1 341 describes the additional functions required by VA routers to 342 accommodate legacy routers. 344 A key concept behind VA is to operate BGP as normal, and in 345 particular to populate the RIB with the full DFRT, but to suppress 346 many or most prefixes from being loaded into the FIB. By populating 347 the RIB as normal, we avoid any changes to BGP, and changes to router 348 operation are relatively minor. The basic idea behind VA is quite 349 simple. The address space is partitioned into large prefixes --- 350 larger than any aggregatable prefix in use today. These prefixes are 351 called virtual prefixes (VP). Different VPs do not need to be the 352 same size. They may be a mix of /6, /7, /8 (for IPv4), and so on. 353 Each ISP can independently select the size of its VPs. 355 VPs are not themselves physically aggregatable. VA makes the VPs 356 aggregatable through the use of tunnels, as follows. Associated with 357 each VP are one or more "Aggregation Point Routers" (APR). An APR 358 (for a given VP) is a router that installs routes for all sub- 359 prefixes (i.e. real physically aggregatable prefixes) within the VP. 
360 By "install routes" here, we mean: 362 1. The route for each of the sub-prefixes is loaded into the FIB, 363 and 365 2. there is a tunnel from the APR to the external peer that is the 366 BGP NEXT_HOP for the route (though note that the tunnel header is 367 stripped before the packet reaches the external peer). 369 The APR originates a BGP route to the VP. This route is distributed 370 within the domain, but not outside the domain. With this structure 371 in place, a packet transiting the ISP goes from the ingress router to 372 the APR via a tunnel, and then from the APR to the external peer 373 through another tunnel. 375 Note that the AS-path is not affected at all by VA. Furthermore, the 376 external peer selected by the ISP is the same whether or not VA is 377 operating. The packet's path, however, may not follow the shortest path within the ISP 378 (where shortest path is defined here as the path that would have been 379 taken if VA were not operating), because the APR may not be on the 380 shortest path between the ingress and egress routers. When this 381 happens, the packet experiences additional latency and creates extra 382 load (by virtue of taking more hops than it otherwise would have). 384 VA can avoid traversing the APR for selected routes by installing 385 these routes in ingress routers. In other words, even if an ingress 386 router is not an APR for a given sub-prefix, it may install that sub- 387 prefix into its FIB. Packets in this case are tunneled directly from 388 the ingress to the egress. These routes are called "Popular 389 Prefixes", and are typically installed for policy reasons (e.g. 390 customer routes are always installed), or for sub-prefixes that carry 391 a high volume of traffic (Section 4.2.5.1). Different routers may 392 have different popular prefixes. As such, an ISP may assign popular 393 prefixes per router, per POP, or uniformly across the ISP. 
A given 394 router may have zero popular prefixes, or the majority of its FIB may 395 consist of popular prefixes. The effectiveness of popular prefixes 396 in reducing traffic load relies on the fact that traffic volumes follow 397 something like a power-law distribution: e.g. that 90% of traffic is 398 destined to 10% of the destinations. Internet traffic measurement 399 studies over the years have consistently shown that traffic patterns 400 follow this distribution, though there is no guarantee that they 401 always will. 403 Note that for routing to work properly, every packet must sooner or 404 later reach a router that has installed a sub-prefix route that 405 matches the packet. This would obviously be the case for a given 406 sub-prefix if every router has installed a route for that sub-prefix 407 (which of course is the situation in the absence of VA). If this is 408 not the case, then there must be at least one Aggregation Point 409 Router (APR) for the sub-prefix's virtual prefix (VP). Ideally, 410 every POP contains at least two APRs for every virtual prefix. By 411 having APRs in every POP, the latency imposed by routing to the APR 412 is minimal (the extra hop is within the POP). By having more than 413 one APR, there is a redundant APR should one fail. In practice it is 414 often not possible to have an APR for every VP in every POP. This is 415 because some POPs may have only one or a few routers, and therefore 416 there may not be enough cumulative FIB space in the POP to hold 417 every sub-prefix. Note that any router ("edge", "core", etc.) may be 418 an APR. 420 2.1. Mix of legacy and VA routers 422 It is important that an ISP be able to operate with a mix of "VA 423 routers" (routers upgraded to operate VA as described in this 424 document) and "legacy routers". This allows ISPs to deploy VA in an 425 incremental fashion and to continue to use routers that for whatever 426 reason cannot be upgraded. 
This document allows such a mix, and 427 indeed places no topological restrictions on that mix. It does, 428 however, require that legacy routers establish and use LSPs, so that 429 APRs can forward packets to them. Specifically, when a legacy router 430 is a border router, it must initiate LSPs to itself, for instance 431 using LDP [RFC5036], and must use its own address as the BGP 432 NEXT_HOP in routes received from external peers. 434 VA prevents the routing loops that might otherwise occur when VA 435 routers and legacy routers are mixed, as follows. First of all, note 436 that once a packet reaches a VA router (either because the ingress 437 router is a VA router, or because a legacy router forwards the packet 438 to a VA router), it will follow tunnels all the way to the egress 439 router (Section 2). If the egress router is a VA router, then the 440 packet is forwarded via the LSP mapping. If the egress router is a 441 legacy router, then it will forward the packet to the appropriate 442 external peer using its FIB entry. 444 If the ingress router is a legacy router, then it will forward the 445 packet to the BGP NEXT_HOP via the associated tunnel. 447 Note that even in the unexpected case that some ingress legacy router 448 actually does not use the tunnel but rather forwards the packet to 449 the IGP-resolved next hop, the packet will still work its way 450 towards the egress router: it will either progress through a series 451 of legacy routers (in which case the IGP prevents loops), or it will 452 eventually reach a VA router (after which it will exit the AS via 453 tunnels as described above). 455 2.2. Summary of Tunnels and Paths 457 To summarize, the following tunnels are created: 459 1. From all routers to all APRs (noting that most VA routers are 460 likely to be APRs). 462 2. From all routers to all legacy border routers. 464 3. From all routers to all external peers that are neighbors of VA 465 border routers. 
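As an illustrative sketch only (it is not part of this specification), the three tunnel rules above can be written as a small computation of LSP endpoints. The Router record, its field names, the example addresses, and the choice to skip a tunnel from a router to itself are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    """Hypothetical description of one router inside the AS."""
    name: str
    is_va: bool                  # False means a legacy router
    is_apr: bool = False         # APR for at least one VP (VA routers only)
    is_border: bool = False
    external_peers: tuple = ()   # addresses of directly attached external peers


def tunnel_targets(routers):
    """Collect every LSP endpoint named by rules 1 to 3."""
    targets = set()
    for r in routers:
        if r.is_va and r.is_apr:
            targets.add(r.name)                 # rule 1: all APRs
        if not r.is_va and r.is_border:
            targets.add(r.name)                 # rule 2: legacy border routers
        if r.is_va and r.is_border:
            targets.update(r.external_peers)    # rule 3: peers of VA border routers
    return targets


def required_tunnels(routers):
    """Return (source, target) pairs: every router tunnels to every target."""
    targets = tunnel_targets(routers)
    return {(src.name, dst) for src in routers for dst in targets if src.name != dst}


routers = [
    Router("va1", is_va=True, is_apr=True),
    Router("va2", is_va=True, is_border=True, external_peers=("192.0.2.1",)),
    Router("lr1", is_va=False, is_border=True, external_peers=("198.51.100.1",)),
]
for src, dst in sorted(required_tunnels(routers)):
    print(f"{src} ==> {dst}")
```

Note that the external peer of the legacy border router lr1 is not a tunnel target in this sketch: a legacy border router terminates the LSP itself and forwards to its peer using its own FIB.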
467 There are a number of possible paths that packets may take through an 468 ISP, summarized in the following diagram. Here, "VA" is a VA router, 469 "LR" is a legacy router, the symbol "==>" represents a tunneled 470 packet (through zero or more LSRs), "-->" represents an untunneled 471 packet, and "(pop)" represents stripping the MPLS header. (Note that 472 the external peer may actually be a legacy router or a VA router---it 473 doesn't matter (and isn't known) to the ISP.)
474    Ingress   Some        APR         Egress      External
475    Router    Router      Router      Router      Peer
476    -------   ------      ------      ------      --------
477 1. VA===================>VA=========>VA(pop)====>LR
479 2. VA===================>VA=========>LR--------->LR
481 3. VA===============================>VA(pop)====>LR
483 4. VA===============================>LR--------->LR
485 5. LR===============================>VA(pop)====>LR
487 6. LR===============================>LR--------->LR
489    (the following two are not expected, but may exist with
490    some legacy router)
492 7. LR------->VA (remaining paths as in 1 to 4 above)
494 8. LR------->LR--------------------->LR--------->LR
496 The first and second paths represent the case where the ingress 497 router does not have a popular prefix for the destination, and must 498 tunnel the packet to an APR. The third and fourth paths represent 499 the case where the ingress router does have a popular prefix for the 500 destination, and so tunnels the packet directly to the egress. The 501 fifth and sixth paths are similar, but where the ingress is a legacy 502 router, and effectively has the popular prefix by virtue of holding 503 the entire DFRT. (Note that some ISPs have only partial RIBs in 504 their customer-facing edge routers, and default route to a router 505 that holds the full DFRT. This case is not shown here.) Finally, 506 paths 7 and 8 represent the unexpected case where legacy routers 507 use an IGP-resolved next hop rather than a tunnel. 509 3. 
Specification of Edge Suppression (ES) 511 Edge Suppression can be thought of as VA with only a single VP (i.e. 512 the /0). Its operation, however, is much simpler. The topology for 513 ES consists of core routers and ES routers. Core routers MUST 514 install (into the FIB) the full DFRT, and MUST participate in tunnels 515 as described below. Any legacy router with tunneling capability and 516 a large enough FIB can be a core router. 518 ES routers are deployed at the edge. They MUST have a default route 519 to (or towards) a core router, which MUST be installed. This style 520 of configuration is common today, and so it is not necessary to 521 specify here how the default route is configured and managed. The 522 default route is the only route that ES routers must install, 523 although they may (and typically will) install additional routes. 524 Note that core routers or route reflectors that iBGP peer with an ES 525 router may choose to filter routes they send to the ES router, with 526 the obvious result that the ES router RIB will not contain the full 527 DFRT. This can only be done if the ES router's external peers do not 528 require the full DFRT. Whether or not an ISP chooses to do this is 529 orthogonal to the operation of ES per se, and is not mentioned again. 531 ES routers initiate MPLS Label Switched Paths (LSP, or tunnel) that 532 terminate at each of their external peers, which are then used by 533 other routers to forward packets to their external peers. 534 Specifically, ES routers MUST do the following: 536 1. They MUST initiate LSPs terminated at their external peers. 537 Specifically, they initiate Downstream Unsolicited tunnels to all 538 IGP neighbors for instance using LDP [RFC5036], with the full 539 address of their external peers (/32 for IPv4, /128 for IPv6) as 540 the FEC. The effect of this is that the ES border routers use 541 the received label to know to which external peer to forward an 542 outgoing packet (i.e. 
without having to do a FIB lookup), but 543 will strip the MPLS header before forwarding to the external 544 peer. 546 2. They MUST import the full address of the external peer into the 547 IGP (i.e. OSPF [RFC2328]). This is of course necessary for LDP 548 to establish the tunnels targeted to the external peers. 550 3. When forwarding externally-received routes over iBGP, the BGP 551 NEXT_HOP attribute MUST be set to the external peer (i.e. the FEC 552 of the corresponding LSP). 554 It is important that any router that has a tunnel to the BGP NEXT_HOP 555 of a route use that tunnel. This should be normal behavior 556 for any router, but ISPs must take care to ensure that this is the 557 case. 559 Sometimes an ES router may receive a packet from one external peer 560 that needs to be forwarded to another of its external peers. If the 561 only route in the FIB is the default route, then the packet will be 562 routed to a core router, which will forward the packet back to the ES 563 router via a tunnel. The extra hops can be avoided if the ES router 564 installs additional prefixes into the FIB, subject to certain 565 constraints that prevent loops. Specifically, the router SHOULD 566 install any routes where the IGP next hop router is not the same 567 router as that of the default route, but only under the following 568 conditions: 570 o If the IGP next hop router is NOT an external peer, then the 571 router MUST use the tunnel to the BGP NEXT_HOP to forward the 572 packet. If the router does not have such a tunnel, then it MUST 573 NOT install the route. 575 o If the IGP next hop router IS an external peer, then the route is 576 installed without using a tunnel. 578 These conditions prevent the loop that would form whereby 1) ES 579 router R1 uses ES router R2 as the next hop of its default route towards a core router, 580 2) ES router R2 installs a route where the IGP next hop is ES router 581 R1, and 3) ES router R1 does not install that route. 
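The install conditions above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name, the Action values, and the representation of next hops and tunnel endpoints as plain strings are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    SUPPRESS = "do not install; the default route covers it"
    INSTALL_VIA_TUNNEL = "install; forward over the tunnel to the BGP NEXT_HOP"
    INSTALL_DIRECT = "install; forward to the external peer without a tunnel"


def es_route_action(igp_next_hop, bgp_next_hop, default_next_hop,
                    external_peers, tunnel_endpoints):
    """Decide how an ES router treats one additional (non-default) route.

    igp_next_hop      next-hop router the IGP resolves for this route
    bgp_next_hop      BGP NEXT_HOP attribute of the route
    default_next_hop  IGP next-hop router of the default route
    external_peers    set of this router's directly attached external peers
    tunnel_endpoints  set of addresses this router has a tunnel (LSP) to
    """
    # Same next hop as the default route: installing the route saves nothing.
    if igp_next_hop == default_next_hop:
        return Action.SUPPRESS
    # Next hop IS an external peer: install without a tunnel.
    if igp_next_hop in external_peers:
        return Action.INSTALL_DIRECT
    # Next hop is NOT an external peer: a tunnel to the BGP NEXT_HOP is
    # required, and without one the route MUST NOT be installed.
    if bgp_next_hop in tunnel_endpoints:
        return Action.INSTALL_VIA_TUNNEL
    return Action.SUPPRESS
```

In the R1/R2 scenario described above, R2's route through R1 would fall into the third case, so R2 either tunnels the packet to the BGP NEXT_HOP (and R1 label-switches it rather than consulting its default route) or does not install the route at all.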
583 New configuration requirements for Edge Suppression (i.e. in addition 584 to the configuration required today to deploy a core-edge topology 585 with default routes at the edge) are minimal. The administrator must 586 tell the ES router that it is an ES router, and must indicate the 587 default route (including backup defaults). Given this, the ES router 588 can automatically establish the appropriate tunnels, install the 589 default route and the additional routes, and suppress all other 590 routes. 592 4. Specification of VA 594 This section describes how to operate VA. It starts with a brief 595 discussion of requirements, followed by a specification of router 596 support for VA. 598 4.1. Requirements for VA 600 While the core requirement is of course to be able to manage FIB 601 size, this must be done in a way that: 603 o is robust to router failure, 605 o allows for traffic engineering, 607 o allows for existing inter-domain routing policies, 609 o operates in a predictable manner and is therefore possible to 610 test, debug, and reason about performance (e.g. establish SLAs), 612 o can be safely installed, tested, and started up, 614 o can be configured and reconfigured without service interruption, 616 o can be incrementally deployed, and in particular can be operated 617 in an AS with a mix of VA-capable and legacy routers, 619 o accommodates existing security mechanisms such as ingress 620 filtering and DoS defense, 622 o does not introduce significant new security vulnerabilities. 624 In short, operation of VA must not significantly affect the way ISPs 625 operate their networks today. Section 4.3 discusses the extent to 626 which these requirements are met by the design presented in 627 Section 4.2. 629 4.2. VA Operation 631 In this section, the detailed operation of VA is specified. 633 4.2.1. Legacy Routers 635 VA can operate with a mix of VA and legacy routers.
Although legacy 636 routers have no notion of VA, they nevertheless MUST satisfy the 637 following requirements: 639 1. Each legacy router MUST initiate LSPs to itself. Specifically, 640 it initiates Downstream Unsolicited tunnels to all IGP neighbors 641 for instance using LDP [RFC5036], with its own full address (/32 642 if IPv4, /128 if IPv6) as the Forwarding Equivalence Class (FEC). 644 2. When forwarding externally-received routes over iBGP, the BGP 645 NEXT_HOP attribute MUST be set to the legacy router itself (the 646 FEC of the corresponding LSP). 648 3. Legacy routers MUST participate fully in LDP. In other words, 649 they MUST have all tunnels listed in Section 2.2. 651 4. Every legacy router MUST hold its complete FIB. 653 As long as legacy routers install LSPs as described here, there are 654 no topological restrictions on the legacy routers. They may be 655 freely mixed with VA routers without the possibility of forming 656 sustained loops (Section 2.1). 658 4.2.2. Advertising and Handling Virtual Prefixes (VP) 660 4.2.2.1. Distinguishing VP's from Sub-prefixes 662 VA routers must be able to distinguish VP's from sub-prefixes. This 663 is primarily in order to know which routes to install. In 664 particular, non-APR routers must know which prefixes are VPs before 665 they receive routes for those VPs, for instance when they first boot 666 up. This is in order to avoid the situation where they unnecessarily 667 start filling their FIB with routes that they ultimately don't need 668 to install (Section 4.2.5). 670 It MUST be possible to statically configure the complete list of VP's 671 into all VA routers. This list is known as the VP-List. 673 4.2.2.2. Limitations on Virtual Prefixes 675 From the point of view of best-match routing semantics, VPs are 676 treated identically to any other prefix. In other words, if the 677 longest matching prefix is a VP, then the packet is routed towards 678 the VP. 
If a packet matching a VP reaches an Aggregation Point 679 Router (APR) for that VP, and the APR does not have a better matching 680 route, then the packet is discarded by the APR (just as a router that 681 originates any prefix will discard a packet that does not have a 682 better match). 684 The overall semantics of VPs, however, are different from 685 those of real prefixes. Without VA, when 686 a router originates a route for a (real) prefix, the expectation is 687 that the addresses within the prefix are within the originating AS 688 (or a customer of the AS). For VPs, this is not the case. APRs 689 originate VPs whose sub-prefixes exist in different ASes. Because of 690 this, it is important that VPs not be advertised across AS 691 boundaries. 693 It is up to individual domains to define their own VPs. VPs MUST be 694 "larger" (span a larger address space) than any real sub-prefix. If 695 a VP is smaller than a real prefix, then packets that match the real 696 prefix will nevertheless be routed to an APR owning the VP, at which 697 point the packet will be dropped if it does not match a sub-prefix 698 within the VP (Section 6). 700 (Note that, in principle, there are cases where a VP could be smaller 701 than a real prefix. This is where the egress router to the real 702 prefix is a VA router. In this case, the APR could theoretically 703 tunnel the packet to the appropriate external peer, which would then 704 forward the packet correctly. On the other hand, if the egress 705 router is a legacy router, then the APR could not tunnel matching 706 packets to the egress. This is because the egress would view the VP 707 as a better match, and would loop the packet back to the APR. For 708 this reason we require that VPs be larger than any real prefixes, and 709 that APRs never install prefixes larger than a VP in their FIBs.) 711 It is valid for a VP to be a subset of another VP. For example, 20/7 712 and 20/8 can both be VPs.
In fact, this capability is necessary for 713 "splitting" a VP without increasing the FIB size in any router 714 (Section 4.2.2.5). 716 4.2.2.3. Aggregation Point Routers (APR) 718 Any router may be configured as an Aggregation Point Router (APR) for 719 one or more Virtual Prefixes (VP). For each VP for which a router is 720 an APR, the router does the following: 722 1. The APR MUST originate a BGP route to the VP [RFC4271]. In this 723 route, the NLRI are all of the VPs for which the router is an 724 APR. This is true even for VPs that are a subset of another VP. 725 The ORIGIN is set to INCOMPLETE (value 2), the AS number of the 726 APR's AS is used in the AS_PATH, and the BGP NEXT_HOP is set to 727 the address of the APR. The ATOMIC_AGGREGATE and AGGREGATOR 728 attributes are not included. 730 2. The APR MUST attach a NO_EXPORT Communities Attribute [RFC1997] 731 to the route. 733 3. The APR MUST initiate LSPs terminating at itself. Specifically, 734 it initiates Downstream Unsolicited tunnels to all IGP neighbors, 735 for instance using LDP [RFC5036], with the address that it used 736 in the BGP NEXT_HOP attribute of the VP route as the FEC. Note 737 that VA routers and legacy routers alike MUST have tunnels to the 738 APR. 740 4. If a packet is received at the APR whose best match is the VP 741 (i.e. it matches the VP but not any sub-prefixes within the VP), 742 then the packet MUST be discarded (see Section 4.2.2.2). This 743 can be accomplished by never installing a prefix larger than the 744 VP into the FIB, or by installing the VP as a route to /dev/null. 746 4.2.2.3.1. Selecting APRs 748 An ISP is free to select APRs however it chooses. The details of 749 this are outside the scope of this document. Nevertheless, a few 750 comments are made here. In general, APRs should be selected such 751 that the distance to the nearest APR for any VP is small---ideally 752 within the same POP.
Depending on the number of routers in a POP, 753 and the sizes of the FIBs in the routers relative to the DFRT size, 754 it may not be possible for all VPs to be represented in a given POP. 755 In addition, there should be multiple APRs for each VP, again ideally 756 in each POP, so that the failure of one does not unduly disrupt 757 traffic. 759 APRs may be (and probably should be) statically assigned. They may 760 also, however, be dynamically assigned, for instance in response to 761 APR failure. For instance, each router may be assigned as a backup 762 APR for some other APR. If the other APR crashes (as indicated by 763 the withdrawal of its routes to its VPs), the backup APR can install 764 the appropriate sub-prefixes and advertise the VP as specified above. 765 Note that doing so may require it to first remove some popular 766 prefixes from its FIB to make room. 768 Note that, although VPs MUST be larger than real prefixes, there is 769 intentionally no mechanism designed to automatically ensure that this 770 is the case. Such a mechanism would be dangerous. For instance, if 771 an ISP somewhere advertised a very large prefix (a /4, say), then 772 this would cause APRs to throw out all VPs that are smaller than 773 this. For this reason, VPs must be set through static configuration 774 only. 776 4.2.2.4. Non-APR Routers 778 A non-APR router MUST install at least the following routes: 780 1. Routes to VPs (identifiable using the VP-List). 782 2. Routes to the largest of any prefixes that contain a given VP. 783 (Note that although this is not supposed to happen, if it does 784 the non-APR should install it, with the effect that any addresses 785 in the prefix not covered by VPs will be routed outside the 786 domain.) 788 3. Routes to all prefixes that contain an address that is in part of 789 the address space for which no VP is defined (i.e. as is done 790 today without VA).
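The three mandatory-install rules above can be checked mechanically against the VP-List. The sketch below is one possible reading, not a normative algorithm: the function names, the integer rule codes, and the restriction to a single address family (IPv4 in the usage shown) are assumptions for illustration. It uses only Python's standard ipaddress module.

```python
import ipaddress
from typing import Iterable, List, Optional


def _covered_by_vps(p, vps: List) -> bool:
    """True if every address in prefix p falls inside some VP."""
    if any(p.subnet_of(v) for v in vps):
        return True
    # p is not inside a single VP; check whether the VPs inside p tile it.
    inside = [v for v in vps if v.subnet_of(p)]
    return list(ipaddress.collapse_addresses(inside)) == [p]


def mandatory_install_rule(prefix: str,
                           vp_list: Iterable[str],
                           rib_prefixes: Iterable[str]) -> Optional[int]:
    """Return 1, 2 or 3 for the rule that forces a non-APR router to
    install `prefix`, or None if the route is a suppression candidate."""
    p = ipaddress.ip_network(prefix)
    vps = [ipaddress.ip_network(v) for v in vp_list]
    rib = [ipaddress.ip_network(r) for r in rib_prefixes]

    # Rule 1: the prefix is itself a VP.
    if p in vps:
        return 1
    # Rule 2: the prefix is the largest real prefix containing some VP.
    for v in vps:
        supers = [r for r in rib if r not in vps and v.subnet_of(r)]
        if supers and p == min(supers, key=lambda r: r.prefixlen):
            return 2
    # Rule 3: the prefix includes address space that no VP covers.
    if not _covered_by_vps(p, vps):
        return 3
    return None
```

For instance, with a VP-List of 20.0.0.0/8 and 24.0.0.0/8, a RIB entry for 20.1.0.0/16 falls under none of the three rules and so is a candidate for suppression, subject to the further rules in Section 4.2.5.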
792 If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it 793 MUST use the tunnel to forward packets to the BGP NEXT_HOP. 795 When an APR fails, routers MUST select another APR to send packets to 796 (if there is one). This happens, however, through normal internal 797 BGP convergence mechanisms. Note that it is strongly recommended 798 that routers keep at least two VP routes in their RIB at all times. 799 The main reason is that if the currently used VP route is withdrawn, 800 the second VP route can be immediately installed, and the issue of 801 whether to temporarily install sub-prefixes in the FIB is avoided 802 (Section 4.2.5). Another reason is that the IGP can be used to even 803 more quickly detect that the APR has crashed, again allowing the 804 second VP route to be immediately installed. 806 4.2.2.5. Adding and deleting VPs 808 An ISP may from time to time wish to reconfigure its VP-List. There 809 are a number of reasons. For instance, early in its deployment an 810 ISP may configure one or a small number of VPs in order to test VA. 811 As the ISP gets more confident with VA, it may increase the number of 812 VPs. Or, an ISP may start with a small number of large VPs (e.g. 813 /4s), and over time move to a larger number of smaller VPs in order to save even 814 more FIB. In this case, the ISP will need to "split" a VP. Finally, 815 since the address space is not uniformly populated with prefixes, the 816 ISP may want to change the size of VPs in order to balance FIB size 817 across routers. This can involve both splitting and merging VPs. Of 818 course, an ISP MUST be able to modify its VP-List without 1) 819 interrupting service to any destinations, or 2) temporarily 820 increasing the size of any FIB (i.e. the FIB size during the 821 change must be no bigger than its size either before or after the change). 823 Adding a VP is straightforward. The first step is to configure the 824 APRs for the VP.
This causes the APRs to originate routes for the 825 VP. Non-APR routers will install this route according to the rules 826 in Section 4.2.2.4, even though they do not yet recognize that the 827 prefix is a VP. Subsequently the VP is added to the VP-List of non- 828 APR routers. The non-APR routers can then start suppressing the sub- 829 prefixes with no loss of service. 831 To delete a VP, the process is reversed. First, the VP is removed 832 from the VP-Lists of non-APRs. This causes the non-APRs to install 833 the sub-prefixes. After all sub-prefixes have been installed, the VP 834 may be removed from the APRs. 836 In many cases, it is desirable to split a VP. For instance, consider 837 the case where two routers, R1 and R2, are APRs for the same VP. 838 It would be possible to shrink the FIB in both routers by splitting 839 the VP into two VPs (e.g. splitting one /6 into two /7s), and assigning 840 each router to one of the VPs. While this could in theory be done by 841 first deleting the larger VP, and then adding the smaller VPs, doing 842 so would temporarily increase the FIB size in non-APRs, which may not 843 have adequate space for such an increase. For this reason, we allow 844 overlapping VPs. 846 To split a VP, first the two smaller VPs are added to the VP-Lists of 847 all non-APR routers (in addition to the larger superset VP). Next, 848 the smaller VPs are added to the selected APRs (which may or may not 849 be APRs for the larger VP). Because the smaller VPs are a better 850 match than the larger VP, this will cause the non-APR routers to 851 forward packets to the APRs for the smaller VPs. Next, the larger VP 852 can be removed from the VP-Lists of all non-APR routers. Finally, 853 the larger VP can be removed from its APRs. 855 Finally, to merge two VPs, the new larger VP is configured in all 856 non-APRs. This has no effect on FIB size or APR selection, since the 857 smaller VPs are better matches.
Next the larger VP is configured in 858 its selected APRs. Next the smaller VPs are deleted from all non- 859 APRs. Finally, the smaller VPs are deleted from their corresponding 860 APRs. 862 4.2.3. Border VA Routers 864 VA routers that are border routers MUST do the following: 866 1. They MUST initiate LSPs to their external peers. Specifically, 867 they initiate Downstream Unsolicited tunnels to all IGP neighbors, 868 for instance using LDP [RFC5036], with the full address of their 869 external peers (/32 for IPv4, /128 for IPv6) as the FEC. The 870 effect of this is that the VA borders use the received label to 871 know to which external peer to forward an outgoing packet (i.e. 872 without having to do a FIB lookup), but will strip the MPLS 873 header before forwarding to the external peer. 875 2. They MUST import the full address of the external peer into the 876 IGP (e.g. OSPF [RFC2328]). This is of course necessary for LDP 877 to establish the tunnels targeted to the external peers. 879 3. When forwarding externally-received routes over iBGP, the BGP 880 NEXT_HOP attribute MUST be set to the external peer (i.e. the FEC 881 of the corresponding LSP). 883 (Note that an alternative approach would be to use stacked labels, 884 with the outer label terminating at the border router, and the inner 885 label identifying the external peer and distributed in BGP as 886 described in [RFC3107]. This approach requires that fewer tunnels be 887 installed by LDP. The need for this approach is for further study.) 889 4.2.4. Advertising and Handling Sub-Prefixes 891 Sub-prefixes are advertised and handled by BGP as normal. VA does 892 not affect this behavior. The only difference in the handling of 893 sub-prefixes is that they might not be installed in the FIB, as 894 described in Section 4.2.5. 896 In those cases where the route is installed, packets forwarded to 897 prefixes external to the AS MUST be transmitted via the LSP 898 established as described in Section 4.2.3.
900 4.2.5. Suppressing FIB Sub-prefix Routes 902 Any route not for a known VP (i.e. not in the VP-List) is taken to be 903 a sub-prefix. The following rules are used to determine if a sub- 904 prefix route can be suppressed. 906 1. If the router is an APR, a route for every sub-prefix within the 907 VP MUST be installed. 909 2. If a non-APR router has a sub-prefix route that does not fall 910 within any VP (as determined by the VP-List), then the route MUST 911 be installed. This may occur because the ISP hasn't defined a VP 912 covering that prefix, for instance during an incremental 913 deployment buildup. 915 3. If a non-APR router does not have a route for a known VP, then it 916 MAY install sub-prefixes within that VP, but is not required to. Whether or 917 not it does is up to the vendor and the network operator. One 918 approach is to never install such sub-prefixes, on the assumption 919 that the network operator will engineer the network so that this 920 rarely if ever happens. 922 4. Another approach is to have routers install such sub-prefixes, 923 but taking care not to do so if the missing VP route is a 924 transient condition. For instance, if the router is booting up, 925 and simply has not yet received all of its routes, then it can 926 reasonably expect to receive a VP route soon and so SHOULD NOT 927 install the sub-prefixes. On the other hand, if a continuously 928 operating router has only a single remaining route for the VP, 929 and that route is withdrawn, then the router might not expect to 930 receive a replacement VP route soon and so SHOULD install the 931 sub-prefixes. Obviously a router can't predict the future with 932 certainty, so the following algorithm might be a useful way to 933 manage whether or not to install sub-prefixes for a non-existing 934 VP route: 936 * Define a timer MISSING_VP_TIMER, set for a relatively short 937 time (say 10 seconds or so).
939 * Start the timer when either: 1) the last VP route is 940 withdrawn, or 2) there are initially neither VP routes nor 941 sub-prefix routes, and the first sub-prefix route is received. 943 * When the timer expires, install sub-prefix routes. Note, 944 however, that optional routes may first need to be removed 945 from the FIB to make room for the new sub-prefix routes. If 946 even after removing optional routes there is no room in the 947 FIB for sub-prefix routes, then they should remain suppressed. 948 In other words, sub-prefix entries required by virtue of being 949 an APR take priority over sub-prefix entries required by 950 virtue of not having a VP route. 952 5. All other sub-prefix routes MAY be suppressed. Such "optional" 953 sub-prefixes that are nevertheless installed are referred to as 954 popular prefixes. 956 4.2.5.1. Selecting Popular Prefixes 958 Individual routers may independently choose which sub-prefixes are 959 popular prefixes. There is no need for different routers to install 960 the same sub-prefixes. There is therefore significant leeway as to 961 how routers select popular prefixes. As a general rule, routers 962 should fill the FIB as much as possible, because the cost of doing so 963 is relatively small, and more FIB entries leads to fewer packets 964 taking a longer path. Broadly speaking, an ISP may choose to fill 965 the FIB by making routers APR's for as many VP's as possible, or by 966 assigning relatively few APR's and rather filling the FIB with 967 popular prefixes. Several basic approaches to selecting popular 968 prefixes are outlined here. Router vendors are free to implement 969 whatever approaches they want. 971 1. Policy-based: The simplest approach for network administrators is 972 to have broad policies that routers use to determine which sub- 973 prefixes are designated as popular. 
An obvious policy would be a 974 "customer routes" policy, whereby all customer routes are 975 installed (as identified for instance by community attribute 976 tags). Another policy would be for a router to install prefixes 977 originated by specific ASes. For instance, two ISPs could 978 mutually agree to install each other's originated prefixes. A 979 third policy might be to install prefixes with the shortest AS- 980 path. 982 2. Static list: Another approach would be to configure static lists 983 of specific prefixes to install. For instance, prefixes 984 associated with an SLA might be configured. Or, a list of 985 prefixes for the most popular websites might be installed. 987 3. High-volume prefixes: By installing high-volume prefixes as 988 popular prefixes, the latency and load associated with the longer 989 path required by VA is minimized. One approach would be for an 990 ISP to measure its traffic volume over time (days or a few 991 weeks), and statically configure high-volume prefixes as popular 992 prefixes. There is strong evidence that prefixes that are high- 993 volume tend to remain high-volume over multi-day or multi-week 994 timeframes (though not necessarily at short timeframes like 995 minutes or seconds). High-volume prefixes may also be installed 996 dynamically. In other words, a router measures its own traffic 997 volumes, and installs and removes popular prefixes in response to 998 short term traffic load. The downside of this approach is that 999 it complicates debugging network problems. If packets are being 1000 dropped somewhere in the network, it is more difficult to find 1001 out where if the selected path can change dynamically. 1003 4.3. Requirements Discussion 1005 This section describes the extent to which VA satisfies the list of 1006 requirements given in Section 4.1. 1008 4.3.1. Response to router failure 1010 VA introduces a new failure mode in the form of Aggregation Point 1011 Router (APR) failure. 
There are two basic approaches to protecting 1012 against APR failure, static APR redundancy, and dynamic APR 1013 assignment (see Section 4.2.2.3.1). In static APR redundancy, enough 1014 APRs are assigned for each Virtual Prefix (VP) so that if one goes 1015 down, there are others to absorb its load. Failover to a static 1016 redundant APR is automatic with existing BGP mechanisms. If an APR 1017 crashes, BGP will cause packets to be routed to the next nearest APR. 1018 Nevertheless, there are three concerns here: convergence time, load 1019 increase at the redundant APR, and latency increase for diverted 1020 flows. 1022 Regarding convergence time, note that, while fast-reroute mechanisms 1023 apply to the rerouting of packets to a given APR or egress router, 1024 they don't apply to APR failure. Convergence time was discussed in 1025 Section 4.2.2.4, which suggested that it is likely that BGP 1026 convergence times will be adequate, and if not the IGP mechanisms may 1027 be used. 1029 Regarding load increase, in general this is relatively small. This 1030 is because substantial reductions in FIB size can be achieved with 1031 almost negligible increase in load. For instance, 1032 [va-tech-report-08] shows that a 5x reduction in FIB size yields a 1033 less than one percent increase in load overall. Given this, 1034 depending on the configuration of redundant APRs, failure of one APR 1035 increases the load of its backups by only a few percent. This is 1036 well within the variation seen in normal traffic loads. 1038 Regarding latency increase, some flows may see a significant increase 1039 in delay (and, specifically, an increase that puts it outside of its 1040 SLA boundary). Normally a redundant APR would be placed within the 1041 same POP, and so increased latency would be minimal (assuming that 1042 load is also quite small, and so there is no significant queuing 1043 delay). 
It is not always possible, however, to have an APR for every 1044 VP within every POP, much less a redundant APR within every POP, and 1045 so sometimes failure of an APR will result in significant latency 1046 increases for a small fraction of traffic. 1048 4.3.2. Traffic Engineering 1050 VA complicates traffic engineering because the placement of APRs and 1051 selection of popular prefixes influences how packets flow. (Though 1052 to repeat, increased load is in any event likely to be minimal, and 1053 so the effect on traffic engineering should not be great in any 1054 event.) Since the majority of packets may be forwarded by popular 1055 prefixes (and therefore follow the shortest path), it is particularly 1056 important that popular prefixes be selected appropriately. As 1057 discussed in Section 4.2.5.1, there are static and dynamic approaches 1058 to this. [va-tech-report-08] shows that high-volume prefixes tend to 1059 stay high-volume for many days, and so a static strategy is probably 1060 adequate. VA can operate correctly using either RSVP-TE [RFC3209] or 1061 LDP to establish tunnels. 1063 4.3.3. Incremental and safe deploy and start-up 1065 It must be possible to install and configure VA in a safe and 1066 incremental fashion, as well as start it up when routers reboot. 1067 This document allows for a mixture of VA and legacy routers, allows a 1068 fraction or all of the address space to fall within virtual prefixes, 1069 and allows different routers to suppress different FIB entries 1070 (including none at all). As a result, it is generally possible to 1071 deploy and test VA in an incremental fashion. Although MPLS and LDP 1072 must be operational everywhere, once done, an ISP can incrementally 1073 increase the number of VA routers, the number of VPs, and the number 1074 of suppressed FIB entries over time. 
1076 Likewise, routers can bootstrap VA by first bringing up the IGP, then 1077 establishing LSPs, then establishing routes to all required sub-prefixes, 1078 and then finally advertising VPs. 1080 4.3.4. VA security 1082 Regarding ingress filtering, because in VA the RIB is effectively 1083 unchanged, routers contain the same information they have today for 1084 installing ingress filters [RFC2827]. Presumably, installing an 1085 ingress filter in the FIB takes up some memory space. Since ingress 1086 filtering is most effective at the "edge" of the network (i.e. at the 1087 customer interface), the number of FIB entries for ingress filtering 1088 should remain relatively small---equal to the number of prefixes 1089 owned by the customer. Whether this is true in all cases remains for 1090 further study. 1092 Regarding DoS attacks, there are two issues that need to be 1093 considered. First, does VA result in new types of DoS attacks? 1094 Second, does VA make it more difficult to deploy DoS defense systems? 1095 Regarding the first issue, one possibility is that an attacker 1096 targets a given router by flooding the network with traffic to 1097 prefixes that are not popular, and for which that router is an APR. 1098 This would cause a disproportionate amount of traffic to be forwarded 1099 to the APR(s). While it is up to individual ISPs to decide if this 1100 attack is a concern, it does not strike the authors that this attack 1101 is likely to significantly worsen the DoS problem. 1103 Regarding DoS defense system deployment, more input about specific 1104 systems is needed. It is the authors' understanding, however, that 1105 at least some of these systems use dynamically established Routing 1106 Table entries to divert victims' traffic into LSPs that carry the 1107 traffic to scrubbers.
The expectation is that this mechanism simply 1108 overrides whatever route is in place (with or without VA), and so 1109 the operation of VA should not limit the deployment of these types of 1110 DoS defense systems. Nevertheless, more study is needed here. 1112 4.4. New Configuration 1114 VA places new configuration requirements on ISP administrators. 1115 Namely, the administrator must: 1117 1. Select VPs, and configure the VP-List into all VA routers. As a 1118 general rule, having a larger number of relatively small VPs 1119 gives administrators the most flexibility in terms of filling 1120 available FIB with sub-prefixes, and in terms of balancing load 1121 across routers. Once an administrator has selected a VP-List, it 1122 is just as easy to configure routers with a large list as a small 1123 list. We can expect network operator groups like NANOG to 1124 compile good VP-Lists that ISPs can then adopt. A good list 1125 would be one where the number of VPs is relatively large, say 100 1126 or so (noting again that each VP must be larger than any real 1127 prefix, see Section 4.2.2.2), and the number of sub-prefixes within each VP is roughly 1128 the same. 1130 2. Select and configure APRs. There are three primary 1131 considerations here. First, there must be enough APRs to handle 1132 reasonable APR failure scenarios. Second, APR assignment should 1133 not result in router overload. Third, particularly long paths 1134 should be avoided. Ideally there should be two APRs for each VP 1135 within each PoP, but this may not be possible for small PoPs. 1136 Failing this, there should be at least two APRs in each 1137 geographical region, so as to minimize path length increase. 1138 Routers should have the appropriate counters to allow 1139 administrators to know the volume of APR traffic each router is 1140 handling so as to adjust load by adding or removing APR 1141 assignments. 1143 3. Select and configure Popular Prefixes or Popular Prefix policies.
1144 There are two general goals here. The first is to minimize load 1145 overall by minimizing the number of packets that take longer 1146 paths. The second is to ensure that specific selected prefixes 1147 don't have overly long paths. These goals must be weighed 1148 against the administrative overhead of configuring potentially 1149 thousands of popular prefixes. As one example, a small ISP may 1150 wish to keep it simple by doing nothing more than indicating that 1151 customer routes should be installed. In this case, the 1152 administrator could otherwise assign as many APRs as possible 1153 while leaving enough FIB space for customer routes. As another 1154 example, a large ISP could build a management system that takes 1155 into consideration the traffic matrix, customer SLAs, robustness 1156 requirements, FIB sizes, topology, and router capacity, and 1157 periodically and automatically computes APR and popular prefix 1158 assignments. 1160 5. IANA Considerations 1162 There are no IANA considerations. 1164 6. Security Considerations 1166 We consider the security implications of VA under two scenarios, one 1167 where VA is configured and operated correctly, and one where it is 1168 mis-configured. A cornerstone of VA operation is that the basic 1169 behavior of BGP doesn't change, especially inter-domain. Among other 1170 things, this makes it easier to reason about security. 1172 6.1. Properly Configured VA 1174 If VA is configured and operated properly, then the external behavior 1175 of an AS does not change. The same upstream ASes are selected, and 1176 the same prefixes and AS-paths are advertised. Therefore, a properly 1177 configured VA domain has no security impact on other domains. 1179 This document discusses intra-domain security concerns in 1180 Section 4.3.4, which argues that any new security concerns appear to 1181 be relatively minor.
1183 If another ISP starts advertising a prefix that is larger than a 1184 given VP, this prefix will be ignored by APRs that have a VP that 1185 falls within the larger prefix (Section 4.2.2.3). As a result, 1186 packets that might otherwise have been routed to the new larger 1187 prefix will be dropped at the APRs. Note that the trend in the 1188 Internet is towards large prefixes being broken up into smaller ones, 1189 not the reverse. Therefore, such a larger prefix is likely to be 1190 invalid. If it is determined without a doubt that the larger prefix 1191 is valid, then the ISP will have to reconfigure its VPs. 1193 6.2. Mis-configured VA 1195 VA introduces the possibility that a VP is advertised outside of an 1196 AS. This in fact should be a low-probability event, but it is 1197 considered here nonetheless. 1199 If an AS leaks a large VP (i.e. larger than any real prefixes), then 1200 the impact is minimal. Smaller prefixes will be preferred because of 1201 best-match semantics, and so the only impact is that packets that 1202 otherwise have no matching routes will be sent to the misbehaving AS 1203 and dropped there. If an AS leaks a small VP (i.e. smaller than a 1204 real prefix), then packets matching that VP will be hijacked by the 1205 misbehaving AS and dropped. This can happen with or without VA, and 1206 so doesn't represent a new security problem per se. 1208 7. Acknowledgements 1210 The authors would like to acknowledge the efforts of Xinyang Zhang 1211 and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an 1212 early inter-domain variant of FIB suppression, and the efforts of 1213 Hitesh Ballani and Tuan Cao, who worked on the configuration-only 1214 variant of VA that works with legacy routers. We would also like to 1215 thank Hitesh and Tuan, as well as Scott Brim, Daniel Ginsburg, Robert 1216 Raszuk, and Rajiv Asati for their helpful comments.
In particular, 1217 Daniel's comments significantly simplified the spec (eliminating the 1218 need for a new External Communities Attribute), and Robert suggested 1219 Edge Suppression. 1221 8. References 1223 8.1. Normative References 1225 [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP 1226 Communities Attribute", RFC 1997, August 1996. 1228 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1229 Requirement Levels", BCP 14, RFC 2119, March 1997. 1231 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. 1233 [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: 1234 Defeating Denial of Service Attacks which employ IP Source 1235 Address Spoofing", BCP 38, RFC 2827, May 2000. 1237 [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in 1238 BGP-4", RFC 3107, May 2001. 1240 [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., 1241 and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP 1242 Tunnels", RFC 3209, December 2001. 1244 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 1245 Protocol 4 (BGP-4)", RFC 4271, January 2006. 1247 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP 1248 Specification", RFC 5036, October 2007. 1250 8.2. Informative References 1252 [va-tech-report-08] 1253 Francis, P., Ballani, H., and T. Cao, "Virtual 1254 Aggregation: A Configuration-only Approach to Reducing 1255 FIB Size", Cornell Technical Report, 1256 http://hdl.handle.net/1813/11058, July 2008.
1258 Authors' Addresses 1260 Paul Francis 1261 Cornell University 1262 4108 Upson Hall 1263 Ithaca, NY 14853 1264 US 1266 Phone: +1 607 255 9223 1267 Email: francis@cs.cornell.edu 1269 Xiaohu Xu 1270 Huawei Technologies 1271 No.3 Xinxi Rd., Shang-Di Information Industry Base, Hai-Dian District 1272 Beijing, Beijing 100085 1273 P.R.China 1275 Phone: +86 10 82836073 1276 Email: xuxh@huawei.com 1278 Hitesh Ballani 1279 Cornell University 1280 4130 Upson Hall 1281 Ithaca, NY 14853 1282 US 1284 Phone: +1 607 279 6780 1285 Email: hitesh@cs.cornell.edu 1287 Full Copyright Statement 1289 Copyright (C) The IETF Trust (2008). 1291 This document is subject to the rights, licenses and restrictions 1292 contained in BCP 78, and except as set forth therein, the authors 1293 retain all their rights. 1295 This document and the information contained herein are provided on an 1296 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1297 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1298 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1299 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1300 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1301 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1303 Intellectual Property 1305 The IETF takes no position regarding the validity or scope of any 1306 Intellectual Property Rights or other rights that might be claimed to 1307 pertain to the implementation or use of the technology described in 1308 this document or the extent to which any license under such rights 1309 might or might not be available; nor does it represent that it has 1310 made any independent effort to identify any such rights. Information 1311 on the procedures with respect to rights in RFC documents can be 1312 found in BCP 78 and BCP 79. 
1314 Copies of IPR disclosures made to the IETF Secretariat and any 1315 assurances of licenses to be made available, or the result of an 1316 attempt made to obtain a general license or permission for the use of 1317 such proprietary rights by implementers or users of this 1318 specification can be obtained from the IETF on-line IPR repository at 1319 http://www.ietf.org/ipr. 1321 The IETF invites any interested party to bring to its attention any 1322 copyrights, patents or patent applications, or other proprietary 1323 rights that may cover technology that may be required to implement 1324 this standard. Please address the information to the IETF at 1325 ietf-ipr@ietf.org.