GROW Working Group                                            H. Ballani
Internet-Draft                                                Cornell U.
Intended status: Informational                                P. Francis
Expires: January 5, 2010                                         MPI-SWS
                                                                  D. Jen
                                                                    UCLA
                                                                   X. Xu
                                                                  Huawei
                                                                L. Zhang
                                                                    UCLA
                                                            July 4, 2009


                   Performance of Virtual Aggregation
                     draft-ietf-grow-va-perf-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 5, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   The document "FIB Suppression with Virtual Aggregation"
   [I-D.francis-intra-va] describes how router FIB size may be reduced.
   This approach entails a trade-off between path-length and load versus
   FIB size. It also has the potential to reduce convergence time.
   This document describes the results of several studies that examine
   these characteristics. The results of a study for a Tier-1 ISP with
   a relatively sophisticated deployment of VA show that FIB size could
   be reduced by a factor of ten or more with a worst-case latency
   penalty of 4ms and a worst-case load increase of <1.5%. Another
   study, examining a much simpler style of VA deployment, also for a
   Tier-1 ISP, shows that FIB size can be reduced by a factor of four in
   routers serving as APRs, and by a factor of more than ten in other
   routers. Here, the worst-case latency increase was 16ms, though this
   is almost certainly an over-estimate, both because traceroute was
   used to make the measurement, and because popular prefixes were not
   considered.

Table of Contents

   1. Introduction
     1.1. Terminology
     1.2. Temporary Sections
       1.2.1. Document revisions
   2. The NSDI 2009 Study on FIB size versus load and stretch
   3. The IETF74 Study
     3.1. Evaluation Setup
     3.2. Evaluation Results
   4. Convergence Time
     4.1. Topology Convergence Time
     4.2. Boot Convergence Time
   5. Acknowledgements
   6. References
     6.1. Normative References
     6.2. Informative References
   Authors' Addresses

1. Introduction

   The document "FIB Suppression with Virtual Aggregation"
   [I-D.francis-intra-va] describes how router FIB size may be reduced.
   This approach entails a trade-off between path-length and load versus
   FIB size. This document describes the results of two independent
   studies that examine these trade-offs.

   One of the studies is [nsdi09], published in NSDI 2009. In this
   study, the router topology and traffic matrix of a Tier-1 ISP were
   used to model the expected performance of relatively sophisticated VA
   deployments on that Tier-1 network. The primary results of this
   study are that FIB size could be reduced to 10% of DFZ FIB size or
   better, with a latency increase of no more than 4ms and a maximum
   load increase of <1.5%. Further, the load increase was relatively
   uniform across the ISP's routers. With these savings, it is
   estimated that the lifetime of routers with FIB space for only a
   quarter of a million IPv4 routes could easily be extended five to
   ten years.

   The other study [ietf74] evaluates a more straightforward deployment
   style for VA. The results of this evaluation were presented at the
   74th IETF and are summarized in this document. In a nutshell, the
   IETF evaluation measures the benefits that an ISP might receive under
   a relatively simple deployment of VA. While the NSDI evaluation makes
   assumptions about packet delivery speed and topological deployments
   that help optimize the benefits of VA, the IETF evaluation looks at a
   very intuitive and easy-to-manage deployment of VA, one that may be
   more realistic for an ISP early on. We find that even with a very
   simple, intuitive, low-maintenance deployment of VA, the ISP we
   studied would still be able to reduce FIB sizes by 75% or more on all
   of its routers, at a cost of no more than 16ms additional delivery
   latency in the worst case.
   With the use of popular prefixes, this worst-case latency increase
   could almost certainly be reduced significantly, though this was not
   studied. In particular, routers in a POP could FIB-install those
   prefixes that are reachable via that POP, thus avoiding the round
   trip to another POP and back.

   Both studies have their limitations. Neither study actually deployed
   VA per se; rather, they modeled the effect that VA would have on an
   ISP given certain data from that ISP. The NSDI09 study estimated
   latency by the speed-of-light distance between routers, and so is
   almost certainly under-estimating latency increase by a small amount.
   The IETF74 study, on the other hand, used traceroute to estimate
   latency, and so is probably over-estimating latency increase.
   Overall, the NSDI09 study reflects the best that VA could do, but
   that performance might only be achievable after considerable
   deployment experience. The IETF74 study, on the other hand, better
   reflects what an ISP might see in its initial, simple deployment.

1.1. Terminology

   This document adopts the terminology from [I-D.francis-intra-va].
   The following terminology is additionally defined:

   Latency Increase: In VA, routers don't maintain the entire DFZ FIB
      and hence, traffic must be routed through APRs. This, in turn,
      increases the length of the path traversed by packets. "Latency
      increase" is defined as the increase in latency for packets
      traversing an ISP due to the adoption of VA as compared to the
      status quo. Note that traversal latency can have different
      meanings: the NSDI09 study uses propagation latency while the
      IETF74 study uses delay measured by traceroute.
   Stretch: Same as latency increase.
   Worst-case Latency Increase: The maximum increase in latency for
      packets destined to any routable prefix and ingressing at any ISP
      router.
   Load Increase: VA requires APRs to forward traffic which otherwise
      need not be routed through them. The increase in the amount of
      traffic forwarded by a router, as a fraction of the traffic it
      originally forwarded, is termed load increase.
   Worst-case Load Increase: The maximum increase in load across all
      the ISP's routers.

1.2. Temporary Sections

   This section contains temporary information, and will be removed in
   the final version.

1.2.1. Document revisions

   This is the first document revision.

2. The NSDI 2009 Study on FIB size versus load and stretch

   The NSDI09 paper describes and analyzes a "config-only" approach to
   deploying VA. Specifically, it shows how VA can be deployed with
   today's legacy routers without software or protocol changes. It has
   two types of analysis. The first studies the trade-off between FIB
   size on one hand and stretch and load on the other. That analysis is
   presented in this section. The second looks, somewhat superficially,
   at router convergence time (specifically, the time it takes for a
   booting router to fully initialize BGP). That analysis is presented
   in Section 4.

   The NSDI09 paper describes two approaches to VA, one where
   control-plane Route Reflectors (RR) are used to filter out prefixes
   that need to be suppressed, and another where each data-plane router
   does the filtering at the boundary between the RIB and the routing
   table. The latter deployment is closer to the intent of the VA
   Internet-Drafts, but for the purpose of FIB size/load-latency
   performance evaluation, either approach applies equally.

   The core performance issue is to understand the trade-off between FIB
   size on one hand, and latency and load penalty on the other. To
   understand this trade-off, the NSDI09 study closely modeled how VA
   would perform on a Tier-1 ISP.
   The model was based on knowledge of the Tier-1 ISP's router-level
   topology including router geographic location, routing tables, and
   traffic matrix. This information was obtained directly from the
   Tier-1 ISP. Some additional information required for the study that
   was not directly available was inferred. This includes the latency
   between routers, which was approximated as the speed-of-light time
   for geographic distances between routers. It also includes the IGP
   link-weights, which were also approximated using (the inverse of) the
   router distance. Finally, queuing delay was not considered, though
   given that load increase is quite low, the queuing delay should not
   increase significantly. Switching time at routers was also not
   considered, though again this is a relatively minor component of
   delay. Overall, however, this study slightly under-estimates latency
   increase.

   In the study, VA was deployed in such a way that all routers can be
   APRs. In other words, the different router roles (edge, core,
   aggregation between edge and core) were not differentiated. As a
   result, ALL routers in the ISP see significant FIB savings.

   When deploying VA, two important questions to answer are: how are VPs
   structured, and how are APRs selected (i.e., what is the assignment
   of VP to APR)? With regard to the first question, the distribution
   of prefixes across the VPs directly impacts the FIB size reduction
   that can be achieved. If the VPs are structured such that they all
   have a similar number of prefixes, the ISP can ensure that all its
   routers see substantial savings. In contrast, if some VPs have a
   lot of prefixes, one (or a few) of the ISP's routers would need to
   maintain them which, in turn, limits the reduction in worst-case FIB
   size.

   The NSDI09 study considered two approaches. In one, every VP is a
   /7.
   This leads to 128 VPs: 0/7, 2/7, ..., 254/7, with the largest VP
   containing 22,722 prefixes or 8.9% of today's routing table. In the
   second, the set of VPs was selected such that the number of prefixes
   in each VP is relatively uniform and no VP is longer than any real
   prefix it covers. This latter approach yielded 1024 VPs with the
   largest containing 4,551 prefixes or 1.78% of the BGP routing table.
   In the rest of this section, we summarize results using the latter
   approach.

   Results using the /7 VPs can be found in the NSDI paper. The next
   section also presents results assuming /8 VPs and other simplifying
   assumptions.

   With regard to the second question, APRs were assigned using an
   automated greedy algorithm that ensures that the maximum latency
   increase for traffic to any prefix from any ISP router is below a
   specified constraint while trying to minimize the FIB size across the
   ISP's routers.

   The analysis was split into two parts, one that looks only at FIB
   size versus worst-case latency increase, and another that adds
   analysis of load. This is appropriate, because load is decreased
   simply by adding popular prefixes, which increases FIB size without
   changing worst-case latency. Note that the use of popular prefixes
   is necessary to achieve acceptable performance in this study, so the
   analysis that ignores load should not be interpreted as the numbers
   one would see in practice.

   A VA deployment with an APR for every VP in each of the ISP's POPs
   is able to reduce the worst-case FIB size to 25% of the DFZ FIB
   size. In this case, all traffic redirection is within a POP and
   hence, the stretch imposed on traffic is virtually 0. The ISP can
   also choose to incur traffic stretch to further reduce router FIB
   requirements.
   The resulting FIB versus latency analysis produced a range of results
   but there is a performance sweet-spot at the point where worst-case
   stretch is capped at 4ms. In this case, the worst-case FIB size is
   5% of the DFZ FIB size (in other words, a 20-fold reduction).
   Capping the latency at higher values did not significantly shrink FIB
   size.

   Besides looking at percentage of FIB reduction, the paper analyzed
   how long the lifetime of current routers could extend into the future
   before running out of FIB space. Using DFZ growth predictions based
   on growth history [atnac06], Cat6500 routers that can hold 249K IPv4
   FIB entries (and therefore could not hold the full DFZ as of 2007)
   would last a decade with virtually no increased latency, and over two
   decades at 4ms increased latency.

   The above analysis, however, ignores load. In VA, load is reduced by
   installing so-called "popular prefixes" into the FIB. These are the
   prefixes that have the heaviest traffic volumes. Without installing
   popular prefixes, load increases by about 40%, which is clearly
   unacceptable.

   In the Tier-1 ISP studied, the most popular 1.5% of prefixes carry
   75.5% of the traffic, while the most popular 5% of prefixes carry
   90.2% of the traffic (as of late 2007). This "power-law"
   distribution has been the norm over many studies spanning many years.
   Because of this distribution, installing a relatively small number of
   popular prefixes improves load tremendously. Note too that prefixes
   that are popular at one time tend to stay popular over time. The 5%
   of popular prefixes that produce 90% of the traffic on one day will
   still produce roughly 85% of the traffic one month later. This means
   that an ISP could measure its popular prefixes and reconfigure its
   routers relatively infrequently: once per week or even less
   frequently.
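
   As an illustration only, the concentration of traffic on popular
   prefixes can be measured with a few lines of Python. The sketch
   below is not taken from the NSDI09 study: the function names, the
   prefix labels, and the synthetic Zipf-like volumes are all
   assumptions, so the shares it prints will not match the 75.5% and
   90.2% figures reported for the Tier-1 ISP.

```python
def popular_prefixes(traffic, fraction):
    """Return the most popular `fraction` of prefixes, ranked by volume."""
    ranked = sorted(traffic, key=traffic.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])


def traffic_share(traffic, prefixes):
    """Fraction of the total volume in `traffic` destined to `prefixes`."""
    total = sum(traffic.values())
    return sum(v for p, v in traffic.items() if p in prefixes) / total


# Synthetic per-prefix volumes (hypothetical data, not the ISP's):
# day 1 follows a Zipf-like curve, day 2 perturbs every volume.
day1 = {f"prefix-{i}": 1.0 / (i + 1) for i in range(1000)}
day2 = {p: v * (1.5 if i % 2 == 0 else 0.5)
        for i, (p, v) in enumerate(day1.items())}

popular = popular_prefixes(day1, 0.05)       # measured once, on day 1
share_day1 = traffic_share(day1, popular)    # share when measured
share_day2 = traffic_share(day2, popular)    # share of the stale set later

print(f"top 5% of prefixes: {share_day1:.1%} of day-1 traffic")
print(f"same prefixes:      {share_day2:.1%} of day-2 traffic")
```

   Applying traffic_share to a later day's traffic with a prefix set
   measured earlier, as done for share_day2, is one way an operator
   might check how quickly a popular-prefix list goes stale.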

   In the analysis, when the most popular 5% of prefixes are installed,
   the worst-case load increase drops to 1.38%. These 5% popular
   prefixes add to the FIB size and so with a 4ms cap on the worst-case
   latency increase, the actual FIB size is 10% of the DFZ FIB size. In
   other words, overall this Tier-1 ISP could reduce FIB size by a
   factor of ten while increasing worst-case latency by <4ms, and
   worst-case load by <1.5%.

   Besides studying the Tier-1 ISP in detail, the NSDI09 paper also used
   Rocketfuel to make rough estimates of FIB size and latency for 9
   other ISPs (load could not be estimated because Rocketfuel does not
   have the traffic matrix of these ISPs). Because Rocketfuel tends to
   underestimate the number of routers in an ISP, the analysis is
   conservative (more routers means more aggregate FIB over which to
   spread the routing table). In this analysis, assuming a worst-case
   latency increase of 5ms, the FIB could be reduced to 5-15% of DFZ.

3. The IETF74 Study

   While the NSDI09 results might suggest that Virtual Aggregation
   offers a great trade-off for ISPs, the relative management
   complexity that would result from the style of deployment used in
   that study suggests that ISPs would not see that level of performance
   in initial deployments. The IETF74 study described here uses a much
   simpler style of VA deployment, something that better reflects what
   an ISP would use early on. In particular, the IETF74 study considers
   ease of management when assigning APRs and choosing virtual prefixes.
   As such, it better reflects the stretch/savings trade-off ISPs would
   actually experience if they deployed VA in their networks today.

3.1. Evaluation Setup

   The results of a VA evaluation will depend heavily on which VA
   deployment is evaluated.
   VA deployments can differ in the number of VPs used, which VPs are
   used, how often they change, the number of APRs used, the VP-APR
   mappings, and the placement of APRs. These variables must be given
   values to define the particular flavor of VA that is being
   evaluated. The NSDI study selected these variable values in order to
   maximize FIB savings and minimize additional packet latency. This
   allows us to evaluate just how good the VA stretch/savings trade-off
   could potentially be, but the chosen values probably describe a VA
   deployment that would not realistically occur. For this study,
   selections of these variable values are based on what we consider
   most simple and intuitive. In this subsection, we present the values
   we selected for these variables, as well as justifications for our
   selections.

   We start by describing where we decided to place the APRs for our
   study. Again, our aim is for simplicity and ease of management. The
   topology of the tier-1 ISP we used for this evaluation revealed that
   the ISP consists of many small PoPs with a few routers, and a few
   large PoPs consisting of many routers. While exact numbers cannot be
   revealed due to confidentiality agreements, the discrepancy between
   the number of routers in small and large PoPs is significant. The
   discrepancy between the number of large and small PoPs was also
   significant. Based on these observations, we decided that it would
   be intuitive to have an APR for every VP at each large PoP.
   Furthermore, only large PoPs should contain APRs, which makes them
   easy to locate. When forwarding packets, routers in large PoPs
   should obviously use their local APRs, while routers in small PoPs
   should use the APRs from their nearest major PoP. Unlike the VA
   deployment used for the NSDI evaluation, APR placement does not
   change with the routing table.
   This significantly reduces troubleshooting efforts compared to the
   architecture proposed by the NSDI evaluation. We also had to decide
   how many APRs each major PoP should have. This question is
   independent of the number of VPs an ISP decides to use. For example,
   an ISP could choose to use 8 different VPs. The ISP could then have
   1 APR be responsible for all 8 VPs, or assign 2 VPs to each of 4
   different APRs, or assign 5 VPs to one APR and the remaining 3 to
   another APR, etc. We decided that each large PoP would evenly
   distribute the FIB storage requirements amongst 8 different APRs. We
   chose 8 APRs in particular simply because each large PoP in our
   studied tier-1 ISP had at least 8 routers storing complete global
   routing tables, and these machines would have the storage capacities
   to be APRs.

   The next decision involved which virtual prefixes to use. The more
   virtual prefixes you have, the finer the granularity for assigning
   virtual prefixes to APRs. This increases an ISP's flexibility to
   divide FIB storage evenly amongst its different APRs, and thus keep
   worst-case FIB size at a minimum. Of course, in order for VA to work
   properly, a VP must not be longer than any real prefix it covers.
   For these reasons, the NSDI study maximized the number of VPs used by
   looking at the global routing table on a certain date, and carefully
   adding virtual prefixes that were never longer than any of the real
   prefixes they covered, which yielded 1024 virtual prefixes. However,
   note that as the global routing table changes, the VPs used may have
   to change as well. A static VP list would be much easier to manage.
   Therefore, we decided to use VPs of length 8, which allows us to
   cover the IPv4 address space using 256 VPs.
   While this number is smaller than the 1024 used in the NSDI study,
   this allows us to maintain a static VP list where each VP is still
   never longer than any of the real prefixes it covers.

   We have now decided on a VP list and a placement of APRs that we
   consider simple and easy to manage. Our final task to complete the
   description of our VA deployment was to decide which VPs to assign to
   which APRs. In accordance with the goal of manageability, we used a
   simple and straightforward greedy algorithm for assigning virtual
   prefixes to APRs. The aim was for this algorithm to be
   computationally cheap, so that it can be run quickly at regular
   intervals, while still distributing FIB storage responsibilities
   evenly amongst the APRs. The algorithm basically assigns VPs to an
   APR one by one until the APR has VPs covering more than 1/8 of the
   global routing table (GRT). It then starts assigning the remaining
   VPs to another APR. The algorithm is as follows:

      Select one of the eight APRs
      For VPs 0/8 thru 255/8:
         Assign VP to selected APR
         If number of entries in APR > 1/8 of GRT:
            Select a previously unselected APR

   We have now assigned values to all of the variables required to
   describe a particular deployment of VA. We can now evaluate the
   costs and benefits of this VA deployment for the ISP.

3.2. Evaluation Results

   FIB savings for VA routers are calculated by counting the number of
   FIB entries in the VA router and comparing this to the size of the
   global routing table.

   To distribute FIB storage requirements evenly amongst the 8 different
   APRs, we ran the algorithm described above on global routing table
   contents for the first day of each month between 06/08 and 02/09.
   The algorithm, while simple, works quite well. In the worst case, an
   APR was assigned to cover 14% of the global routing table, as opposed
   to the ideal amount of 12.5% (1/8).
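
   The greedy assignment from Section 3.1 can be sketched in Python as
   follows. This is a minimal sketch rather than the code used in the
   study: the function name, the input format (a list giving the number
   of routing table entries covered by each /8 VP), and the uniform
   example data are assumptions. The guard that stops advancing once
   the last APR is reached is also an addition; the pseudocode in
   Section 3.1 leaves that case implicit.

```python
def assign_vps(entries_per_vp, num_aprs=8):
    """Greedily assign VPs 0/8..255/8 to APRs in numeric order.

    entries_per_vp[i] is the number of routing table entries covered
    by VP i/8 (hypothetical input format).
    Returns (vps, load): the VP indices and entry count for each APR.
    """
    threshold = sum(entries_per_vp) / num_aprs   # 1/8 of the GRT
    vps = [[] for _ in range(num_aprs)]
    load = [0] * num_aprs
    apr = 0                                      # currently selected APR
    for vp, entries in enumerate(entries_per_vp):
        vps[apr].append(vp)
        load[apr] += entries
        # Move on once this APR holds more than its 1/8 share,
        # unless it is already the last APR.
        if load[apr] > threshold and apr < num_aprs - 1:
            apr += 1
    return vps, load


# Example with made-up, uniform data: 10 entries under every /8.
vps, load = assign_vps([10] * 256)
worst_share = max(load) / sum(load)
print([len(v) for v in vps], f"worst-case share {worst_share:.1%}")
```

   Because an APR is released only after it exceeds the 1/8 threshold,
   the APRs filled first tend to end up slightly above 12.5% while the
   last one absorbs the remainder. This may be part of the reason the
   worst case reported above is 14% rather than the ideal 12.5%.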

   Under VA, routers also have to store a peer-to-label mapping for each
   external router that peers with the ISP. An operator for the ISP we
   studied estimated the number of external peerings at around 20,000.

   It turns out that even with this simple deployment of VA, the APR
   with the largest storage requirements still reduces its FIB storage
   requirements by 75%. For non-APR routers, FIB storage requirements
   were reduced by 93%.

   For each PoP, we calculated the worst-case stretch that a packet
   ingressing from the PoP could experience. Stretch is defined as the
   additional time the packet takes to exit the ISP due to suboptimal
   paths introduced by the VA architecture. Worst-case stretch occurs
   when the nearest APR for the packet is in the opposite direction of
   the egress router out of the ISP, and thus the packet is stretched a
   round-trip distance from the ingress PoP to the APR and back.

   When calculating the worst-case stretch for each PoP, we did not
   simply assume speed of light and shortest distance between PoPs as
   the NSDI evaluation did. We wanted to capture the actual delay that
   a user would experience for the packet, which would include
   processing time as well as propagation time. Thus we used traceroute
   to capture the true stretch experienced by packets due to VA. To
   determine the worst-case stretch for a given PoP, we ran traceroute
   from this PoP to all of the major PoPs. This allowed us to map the
   PoP to its 'nearest' major PoP. We then used traceroute to determine
   the time it takes to go to the nearest major PoP and back, thus
   giving us the worst-case stretch for that PoP. We did this for every
   PoP, and found that all PoPs can send a packet to a large PoP in 8ms
   or less, and thus the worst-case stretch for any PoP in our studied
   ISP is 16ms.
   Furthermore, 70% of PoPs experience a worst-case stretch of 8ms or
   less, and over 30% of all PoPs experience no stretch at all. This is
   because they were either a major PoP, or they naturally defaulted to
   a major PoP for all of their traffic anyway, so VA did not change
   their natural delivery path whatsoever.

4. Convergence Time

   Regarding convergence time, in general we expect convergence with VA
   to be faster than convergence without VA. The basic argument is
   simple: updating FIB entries takes time. If there are fewer FIB
   entries to update, then convergence goes faster. There are two forms
   of convergence discussed here: convergence time for a given router
   when a link or some other router goes down or comes up, and time to
   fully initialize BGP when a router boots up. We call the first
   topology-convergence, and the second boot-convergence. Each is
   discussed in turn.

4.1. Topology Convergence Time

   A simple experiment was run to demonstrate the improved
   topology-convergence aspect of VA. Before discussing the experiment,
   it is important to note that the benefit demonstrated by this
   experiment could also be had using the "Prefix Independent
   Convergence" mechanism. In other words, VA isn't the only way, or
   even the simplest way, to achieve this kind of fast convergence.

   The following topology was used in the experiment.

                      _.--------.
                 ,--''           `---.             ,-------.
               ,'                     `.        ,-'   AS2   `-.
             ,'   +----+      +----+    `.     /     +----+    \
            /     | R1 |......| R3 |......\...(......| R2 |     )
           /      +----+      +----+       \   \     +----+    /
          ;         \            /          :   `-.         ,-'
          |          \          /           |      `-------'
          :           \        /            ;
           \            +----+/    AS1     /
            \           | R4 |            /
             `.         +----+          ,'
               `.                     ,'
                 `---.           _.--'
                      `--------''

   Router R2 advertised a number of routes (ranging from 5000 to 220K)
   to R3, which in turn distributed them in AS1 with full-mesh iBGP.
   Packets were transmitted to R1 (via a test-box not shown) for
   destinations within the advertised routes. At time T0, the link
   R1--R3 was taken down. This resulted in a period of time during
   which some fraction of transmitted packets were dropped at R1. The
   elapsed time until all transmitted packets were successfully received
   at R2 was then measured.

   The following table shows the elapsed time for a varying number of
   advertised routes, both with and without VA.

        Number of  |   With   | Without  |
         Routes    |    VA    |    VA    |
      =============|==========|==========|
           5000    |   2 sec  |   2 sec  |
      -------------|----------|----------|
          50000    |   2 sec  |  10 sec  |
      -------------|----------|----------|
         100000    |   2 sec  |  17 sec  |
      -------------|----------|----------|
         150000    |   3 sec  |  24 sec  |
      -------------|----------|----------|
         200000    |   3 sec  |  30 sec  |
      -------------|----------|----------|
         220000    |   3 sec  |  35 sec  |
      -------------|----------|----------|

   As can be seen from the table, convergence time for the VA scenario
   is largely independent of routing table size. This is because the
   only change in the FIB is that of the best IGP next-hop to the APR.
   In the non-VA case, however, the IGP next-hop needs to be changed for
   all prefixes before routes to all prefixes converge.

   It should be noted that this experiment intentionally puts VA
   topology-convergence in the best light. Router R1 only has a single
   FIB entry that it needs to update: that of the VP. In practice, any
   given router may have both VP sub-prefixes (those needed by virtue of
   being an APR) and popular prefixes in the FIB. Certainly at least
   the new next-hops to the VP sub-prefixes and VPs would have to be
   FIB-installed before convergence. Realistically, however, the
   popular prefixes should be installed too, since load is increased
   until they are.
   Therefore, in practice the improvement in convergence time will be
   less than shown here.

4.2. Boot Convergence Time

   The NSDI09 paper [nsdi09] experimented with the time it takes for a
   router to boot and fully populate its routing tables and advertise
   all updates (i.e., initialize BGP). It is important to note that the
   NSDI09 paper did not implement the VA draft [I-D.francis-intra-va]
   per se. Rather it deployed VA using legacy routers configured to
   implement VA. In fact, the NSDI09 paper experimented with two
   distinct configurations of VA. In one, control-plane Route
   Reflectors (RR) were used to filter out the prefixes not needed by
   data-plane routers (i.e., prefixes that could be FIB-suppressed
   according to the rules of VA). In this setting, both the RIBs and
   FIBs of data-plane routers were shrunk.

   The second configuration approach better reflects the spirit of the
   VA draft. Here, data-plane routers ran BGP as normal (i.e., the RIBs
   were populated more-or-less as they would be in the absence of VA),
   and did FIB-suppression by setting the administrative distance for
   suppressible prefixes to 255. In the Cisco routers used in the
   experiment, such prefixes were then FIB-suppressed, but otherwise
   treated normally.

   In the test setup, the VA routers' FIBs held roughly 1/2 of the full
   DFZ. VA as deployed with control-plane RRs was able to initialize
   BGP about twice as fast as regular (full DFZ) routers (for example,
   124 seconds versus 273 seconds). With the "admin-distance = 255"
   approach, it took roughly twice as long for the VA routers to
   initialize compared to the regular routers (for example, 487 seconds
   versus 273 seconds). The reason for this is that these Cisco routers
   were not designed to deal with large admin-distance access lists
   efficiently.
   We believe that this inefficiency can be overcome, but at this time
   there are no hard numbers to back up this supposition.

5. Acknowledgements

   Tuan Cao, a PhD student at Cornell, did much of the heavy lifting on
   the NSDI09 measurement study.

6. References

6.1. Normative References

   [I-D.francis-intra-va]
              Francis, P., Xu, X., Ballani, H., Jen, D., Raszuk, R., and
              L. Zhang, "FIB Suppression with Virtual Aggregation",
              draft-francis-intra-va-01 (work in progress), April 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2. Informative References

   [atnac06]  Huston, G. and G. Armitage, "Projecting Future IPv4 Router
              Requirements from Trends in Dynamic BGP Behaviour", ATNAC
              2006, caia.swin.edu.au/pubs/ATNAC06/Huston_1m.pdf,
              December 2006.

   [ietf74]   Jen, D., "Scaling FIBs with Virtual Aggregation: How Much
              Stretch? How Much FIB savings?", IRTF Routing Research
              Group Meeting,
              http://www.ietf.org/proceedings/09mar/slides/RRG-2.pdf,
              March 2009.

   [nsdi09]   Ballani, H., Francis, P., Cao, T., and J. Wang, "Making
              Routers Last Longer with ViAggre", ACM Usenix NSDI 2009,
   http://www.usenix.org/events/nsdi09/tech/full_papers/ballani/ballani.pdf,
              April 2009.

Authors' Addresses

   Hitesh Ballani
   Cornell University
   4130 Upson Hall
   Ithaca, NY 14853
   US

   Phone: +1 607 279 6780
   Email: hitesh@cs.cornell.edu


   Paul Francis
   Max Planck Institute for Software Systems
   Gottlieb-Daimler-Strasse
   Kaiserslautern 67633
   Germany

   Phone: +49 631 930 39600
   Email: francis@mpi-sws.org


   Dan Jen
   UCLA
   4805 Boelter Hall
   Los Angeles, CA 90095
   US

   Phone:
   Email: jenster@cs.ucla.edu


   Xiaohu Xu
   Huawei Technologies
   No.3 Xinxi Rd., Shang-Di Information Industry Base, Hai-Dian District
   Beijing, Beijing 100085
   P.R.China

   Phone: +86 10 82836073
   Email: xuxh@huawei.com


   Lixia Zhang
   UCLA
   3713 Boelter Hall
   Los Angeles, CA 90095
   US

   Phone:
   Email: lixia@cs.ucla.edu