idnits 2.17.1 draft-alaettinoglu-isis-convergence-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 435 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 7 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** There are 5 instances of lines with control characters in the document. ** The abstract seems to contain references ([ISIS-IP], [ISIS-ISO]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'ISIS-ISO' on line 37 looks like a reference -- Missing reference section? 'ISIS-IP' on line 37 looks like a reference Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force C. Alaettinoglu 2 Internet Draft V. Jacobson 3 Expires in May, 2001 H. Yu 4 draft-alaettinoglu-isis-convergence-00.txt Packet Design 5 November, 2000 7 Towards Milli-Second IGP Convergence 9 11 Status of this Memo 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC2026. Internet-Drafts are working 15 documents of the Internet Engineering Task Force (IETF), its areas, 16 and its working groups. Note that other groups may also distribute 17 working documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at 21 any time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow 26 Directories can be accessed at www.ietf.org/shadow.html. Distribution of 27 this memo is unlimited. 29 Copyright Notice 31 Copyright (C) The Internet Society (2000). All Rights Reserved. 33 Abstract 35 It is possible for link-state routing protocols to converge in link 36 propagation time scales, that is, in tens of milliseconds. However, 37 current deployments of ISIS[ISIS-ISO][ISIS-IP], a link-state routing 38 protocol, are not anywhere near this point. In this paper, we present 39 some analyses of ISIS convergence by showing its behavior upon 40 link/router failures and repairs, and its scaling properties to 41 large networks, both in terms of number of nodes and links. We 42 then explore changes needed in the ISIS specification and implementations 43 to reach IGP convergence in milliseconds. Our results are based 44 on experimentation done with ISIS, but some of the findings may 45 apply to OSPF as well. 47 This paper contains color figures and graphs which are missing from 48 the ASCII version of the paper. The postscript and pdf versions 49 of this paper can be found in the Internet-Drafts repository. 51 1 Motivation 53 The theoretical limit for link-state routing protocols to re-route 55 Alaettinoglu, et. al. Expires: May, 2000 [page 1 ] 56 ^L 57 is in link propagation time scales, that is in tens of milliseconds. 58 However, the current ISIS re-route times are in tens of seconds. 59 During the re-route period, a subset of destinations are either 60 not-reachable or are reached through non-optimal routes. 62 It is important to close this gap between theory and implementation 63 for several reasons. First, millisecond re-route times will lead 64 to increased network reliability since the periods at which routes 65 are not available/non-optimal will be shorter (one to three orders 66 of magnitude shorter). Second, it enables multi-service traffic 67 such as voice over IP since it decreases the number of packets 68 dropped to tolerable levels for these services. And finally, it 69 eliminates the need for more expensive and complex layer 2 protections 70 schemes, such as SONET. 72 There are three steps to ISIS re-routing: detection, propagation 73 and shortest path calculation. In the detection step, the topology 74 change is locally detected. That is, a link failure/repair is detected 75 by the routers the link is adjacent to, and a router failure/repair 76 is detected by the router's neighbors. In the propagation step, 77 a new link-state packet (LSP) reflecting the local topology after 78 the change is flooded to all the other routers in the network. 79 Finally, in the shortest path calculation step, each router having 80 received the new LSP computes the new routes using a shortest path 81 tree algorithm. Typically, Dijkstra's Shortest Path First (SPF) 82 algorithm is used in this calculation. Only after these three steps, 83 the re-route has completed, or equivalently we say that ISIS has 84 converged. 86 It is prudent to understand where the tens of seconds of re-route 87 times go before trying to fix ISIS re-routing. Hence, we have done 88 extensive experiments to characterize the time spent at each of 89 these steps. In our experiments, we tested ISIS running on Cisco 90 7200s running IOS 12.0S9 and IOS 12.1P and on Juniper M40s running 91 JunOS 4.1. In Sections [detection], [propagation], and [spf], we 92 present our findings on the detection, propagation, and shortest 93 path calculation steps respectively. In Section [summary], we summarize 94 our findings and suggest changes to ISIS protocol specification 95 and on its implementations. 97 ISTalk Software 99 In many of our experiments, we emulated very large random networks. 100 To facilitate this, we developed a software tool called ISTalk. 101 ISTalk can establish adjacencies with ISIS routers and inject them 102 LSPs emulating any topology allowing us to vary both the number 103 of nodes and the number of edges in the network. This is graphically 104 illustrated in Figure [istalk].([fig] ISTalk Software emulates 105 virtual topologies.) 107 2 Detection 109 The topology change, or the link-state change, is either detected 111 Alaettinoglu, et. al. Expires: May, 2000 [page 2 ] 112 ^L 113 and communicated to ISIS by the lower level protocols (i.e. link 114 level protocols), or it is detected using the peer to peer ISIS 115 hello protocol. The link level detection is fastest but seems to 116 be inconsistently implemented by vendors. In our experiments, with 117 certain medium we noticed that the device driver detected the failure 118 but did not communicate it to the router's OS. The link level detection 119 is also not always possible, for example in a switched environment 120 where a router fails behind a switch. In this case, the switch 121 gets the link layer notification, but has no mechanism to notify 122 other routers in the network. 124 The ISIS hello protocol is more robust, but slower. In this case, 125 the adjacent routers send periodic hello packets to each other 126 (the adjacency is also established through the use of hello packets, 127 but not described here). If a router misses a fixed number of hello 128 packets from an adjacent router, it declares the adjacency to that 129 router down. By default, the routers send each other hello packets 130 every 10 seconds, and declare the adjacency down after missing 131 3 hello packets. Hence, with default parameters, the detection 132 can take up to 30 seconds. The specification limits the granularity 133 of hello interval to seconds, hence making 3 seconds the fastest 134 possible detection based on the hello protocol. Several ISPs use 135 2 second hello intervals, but declare an adjacency down after missing 136 5 hello packets, resulting in a detection time of 10 seconds.([fig] Hello 137 Experiment Setup) 139 To study the effect of hello interval, we configured 3 routers in 140 a triangle topology and connected a workstation running ISTalk 141 to router 1 as shown in Figure [hello-setup]. The routers are connected 142 to each other using switched ethernet interfaces. Hence, when we 143 take down one of the routers, the other two have to rely on the 144 hello protocol to detect the change. The ISTalk establishes an 145 ISIS adjacency with router 1 so that it receives the LSPs flooded 146 in the network. The workstation pings the router 3 using router 147 3's interface address towards router 2 so that the pings follow 148 the path shown on the left hand side of Figure [hello-setup]. We 149 then bring down router 2 and after re-route the pings follow the 150 path shown on the right hand side of Figure [hello-setup]. The 151 pings are sent once every 100 milliseconds. ([fig] Packets received 152 by the workstation during the Hello Experiment) 154 Figure [hello-received-packets] shows the relevant packets received 155 by the workstation. The short spikes are the ICMP ECHO REPLY packets 156 received; they are only received if there is a valid route to the 157 ping's destination and back. As seen in the figure, there is a 158 33 second period where there are no ICMP ECHO REPLY packets received. 159 This is the re-route time, during which router 1 does not have 160 a route to the destination. The two tall spikes are LSPs reflecting 161 the topology change as detected by routers 1 and 3. Note that the 162 two LSPs are separated by 1 second. This is because the two routers 163 detect the change at different times due to their 3 hello packet 164 intervals ending at different times. This separation can be anywhere 165 from 0 second to one full hello interval, in this case 10 seconds. 167 Alaettinoglu, et. al. Expires: May, 2000 [page 3 ] 168 ^L 169 Why then after receipt of LSPs there is still 5 seconds of re-route 170 time? This is because the routers delay the SPF calculation by 171 a number of seconds (default is 5 seconds) in the hope that they 172 may catch more LSPs, particularly this second LSP in the figure, 173 and do one SPF calculation instead of many (because the costly 174 SPF calculation overwhelmed routers in the past). We refer to this 175 delay as the SPF delay. For millisecond convergence, the SPF delay 176 should be in milliseconds as well, and should ideally be zero. 178 The argument against a small hello interval is the bandwidth consumed 179 by the hello packets and the hello packets getting queued behind 180 data packets and the routers not receiving them in time. The most 181 desirable solution to the latter problem is to treat hello packets 182 preferentially over the data packets. In case this solution is 183 not available, we studied how the hello protocol is affected by 184 different hello intervals by simulating a link and varied both 185 the data load (using heavy tail traffic) and the hello interval 186 (as a fraction of the link's bandwidth). Figure [effect of hello interval] 187 shows the results. As seen from the figure, the hello interval 188 does not play a significant role, i.e. the hello packets do not 189 miss their deadlines, until the hello packets become a dominant 190 bandwidth consumer of the link's capacity.([fig] Effect of hello 191 interval and load on the Hello Protocol.) 193 Since the protocol's ultimate limit on the hello interval is set 194 by the bandwidth used by hello packets, extending the specification 195 to allow sub-second intervals would allow sub-second detection 196 on almost all links. This does not mean that the detection time 197 can be made arbitrarily small, only that detection should be limited 198 by the physical constraints of the link (its bandwidth, propagation 199 delay, etc), not by arbitrary clock granularities set by the protocol 200 designers. 202 Detection: stability and damping 204 For either event triggered (via the link layer notification) or 205 hello driven detection, there are network-wide stability issues 206 if routing tries to follow rapid link transients (i.e., a link 207 that goes down and up several times a second). Making hello interval 208 smaller may seem like allowing more instability into the network. 209 The usual way of dealing with this is to treat "bad news" differently 210 from the "good news" so routing is quick to find an alternate path 211 on any failure but slow to switch back when the link comes up. 212 The current ISIS specification treats bad news and good news the 213 same way but it should be trivial to change the detection specification 214 to allow different filtering constants for "down" and "up" state 215 changes. 217 3 Propagation 219 After a topological change is detected, a new link state packet 220 (LSP) is generated at the point of detection and then flooded unmodified 221 (except for the "remaining life" field) through the network. Flooding 223 Alaettinoglu, et. al. Expires: May, 2000 [page 4 ] 224 ^L 225 should propagate the LSP across the network at near the speed of 226 light plus one store-and-forward delay per hop. Thus, in theory 227 the LSP propagation should make a negligible contribution to the 228 re-route time. Unfortunately, this is not what we observed. 230 In our experiment, we connected 3 routers in a line topology using 231 switched ethernet as shown in Figure [LSP Experiment Setup]. The 232 ISTalk established ISIS adjacencies with router 1 on its interface 233 0 and with router 3 on its interface 3. It also passively listened 234 on its interfaces 1 and 2. Our experiments simulated networks of 235 varying sizes. In the steady state, that is after three routers 236 think they are part of a large network, we injected an LSP at interface 237 0 and observed the delay until it appeared at other interfaces 238 of the workstation. 240 Since the SPF calculation can take a significant amount of time 241 (see Section [spf]), commercial router implementations impose a 242 limit on how frequently the SPF calculation can be done using the 243 SPF delay parameter. In some implementations this is 5 seconds 244 fixed. In some implementations, this is changeable but the granularity 245 is in seconds. This limit essentially adds to the propagation time. 246 In this experiment, to remove its affect we set the SPF delay to 247 zero seconds.([fig] LSP Propagation Experiment Setup.) 249 Figure [LSP Propagation Results] shows the results for 500, 800 250 and 1000 node topologies (the average node degree is fixet at 5). 251 The x-axis is the interface number and the y-axis is the time in 252 seconds at which the LSP is seen. As seen from the figure, propagating 253 an LSP across 3 routers takes close to one second and propagation 254 time is related to the topology size. The reason for this delay 255 is related to the topology size is that the routers are performing 256 the SPF calculation before propagating the LSPs. ([fig] LSP Propagation 257 Experiment Results.) 259 In our experiment, we set SPF delay to zero seconds resulting in 260 a race condition at the routers between performing an SPF calculation 261 or propagating the LSP. The routers chose to do the SPF calculation 262 first. This degrades the propagation time from near the speed of 263 light to O(diameter\times SPFtime). To prevent this, the specification 264 might be amended to explicitly state that the LSP flooding is higher 265 priority than the SPF calculation. 267 4 Shortest Path Calculation 269 The final step in re-route is to compute new routes using a shortest 270 path tree algorithm, typically Dijkstra's Shortest Path First algorithm. 271 For sparse topologies, a binary heap implementation of Dijkstra's 272 algorithm has O(n\log n) complexity, and a naive implementation 273 has O(n^{2}) complexity. Figures [vary nodes] shows the SPF calculation 274 time in milliseconds versus number of nodes in the network. The 275 Figure [vary nodes log] shows the same but the y-axis is in log 276 scale. In this experiment each node had a fixed number of neighbors, 277 5.([fig] SPF calculation time versus number of nodes.) 279 Alaettinoglu, et. al. Expires: May, 2000 [page 5 ] 280 ^L 281 ([fig] SPF 282 calculation time versus number of nodes (log scale).) The 283 top 3 curves are the commercial router vendor implementations (collected 284 at the Ciscos using "show isis spf log" and using "show isis spf-log" 285 in Junipers). The bottom curve is an in-house implementation of 286 Dijkstra's shortest path algorithm. All the time values are wall-clock 287 times, and there is no other load on the routers. The SPF calculation 288 is indeed CPU intensive, taking 100s of milliseconds. As can be 289 seen from the log-scale graph, the bottom 3 curves have the same 290 shape. As a matter of fact an n\log n curve fits perfectly to them, 291 unfortunately the top curve, one of the commercial router implementations 292 fits an n^{2} curve perfectly.([fig] SPF calculation time versus 293 number of edges per node.) In Figure [vary degree], 294 we kept the number of nodes at 300 but varied the average number 295 of edges per node. Today, some ISPs have full-mesh topologies (using 296 ATM/MPLS technologies) and run ISIS over it. With full-mesh, the 297 SPF calculation is even more intense, almost about one second. 299 As illustrated by these experiments, even in high-end platforms, 300 the SPF calculation can take a long time (seconds) and has poor 301 scaling properties (n\log n to n^{2}). This has a serious impact 302 both on re-route times since the SPF calculation is in series with 303 the LSP propagation and on overall network stability since the 304 CPU on the routers will be saturated. To fix this, an algorithm 305 that scales much better than Dijkstra's is needed. Dijkstra's algorithm 306 re-computes all routes after a topology change regardless of whether 307 they were affected or not, each time starting from scratch. More 308 recent algorithms[Franciosa97:dijk][Frigioni94:dijk][Narvaez99:dijk] 309 store the data structures from earlier calculations and only re-compute 310 the affected routes. Their average case complexity is O(\log n). 312 We implemented one of these algorithms and results for 300 node 313 topology with degree varied are shown in Figure [dynamic SPT]. 314 Note that the y-axis, SPF time in milliseconds, is in log scale, 315 and the new dynamic shortest path tree algorithm computes new routes 316 in microseconds, that is 10,000 times faster than the commercial 317 implementations of Dijkstra's algorithm. Note that with the dynamic 318 shortest path tree algorithm, the calculation time first increases 319 and then decreases. This is because as the average degree increases, 320 the number of affected nodes by a topology change first increases 321 and then decreases. In the case of a full-mesh topology, the number 322 of affected nodes is often only one node.([fig] Dynamic SPT algorithm 323 is 10,000 times faster.) 325 5 Summary of Findings and Reccomendations 327 Stable, robust IP re-routing that works at the network's propagation 328 time (the theoretical maximum for any re-routing scheme) is both 329 possible and achievable. To get there, we have to make some minor 330 changes to the ISIS specification in the following priority order: 332 * switch to a modern algorithm for the SPF calculation 334 Alaettinoglu, et. al. Expires: May, 2000 [page 6 ] 335 ^L 336 * make the granularity of the Hello Interval in millisecond, rather 337 than seconds 339 * allow different detection and damping filter constants for the 340 link up and down events, i.e. differentiate good and bad news 342 * give higher priority to LSP propagation than SPF computation 344 * queue hello packets in front of data packets 346 What we did not see: During our experiments, we injected topologies 347 with 90,000 edges, thousands of nodes, drove all the routers to 348 100% CPU utilization, randomly unplugged links and powered down 349 routers. Although we were looking for it, we saw no evidence of 350 routing instability and we observed several subtle things done 351 to avoid getting into an unstable operating regime. For example, 352 even under 100% CPU utilization, the routers have never missed 353 sending out an hello packet. If they did, they could have dropped 354 adjacencies, causing more link-state changes, which in turn causing 355 more routing load, and which in turn causing more hello packet 356 being missed and so on. It appears that the two vendors we looked 357 at have learned a lot from a decade's worth or routing disasters 358 and meltdowns and are currently shipping robust routing code. 360 What we did not look for: In full-mesh topologies, each newly generated 361 LSP is received by a router n times, one from each of its neighboring 362 routers. Even though the router only performs one SPF for this 363 LSP, it needs to recognize that the other n-1 LSPs are duplicates. 364 When a router goes down in a full-mesh topology, each router loses 365 an adjacency and generates a new LSP. Hence, routers now need to 366 recognize O(n^{2}) duplicate LSPs and perform up to O(n) SPF calculations 367 (in practice LSPs received while performing an SPF calculation 368 are bundled together for the next round of SPF calculation). We 369 did not design experiments to study the time it takes to recognize 370 O(n^{2}) duplicate LSPs since there are already proposals[mpls-unnum] 371 addressing this issue. 373 6 Acknowledgments 375 Authors are grateful to the Global Crossing Production Testing Lab, 376 Cisco Systems, and Juniper Networks for allowing us to use their 377 facilities to perform some of these tests. Authors would also like 378 to thank Kathie Nichols, Judy Estrin, Chia-Chee Kuan, and Doug 379 Klein for their comments on this document. 381 7 References 383 8 Authors' Addresses 385 Cengiz Alaettinoglu 386 Packet Design 388 Alaettinoglu, et. al. Expires: May, 2000 [page 7 ] 389 ^L 390 66 Willow Place 391 Menlo Park, CA 94025 392 USA 393 email: cengiz@packetdesign.com 395 Van Jacobson 396 Packet Design 397 66 Willow Place 398 Menlo Park, CA 94025 399 USA 400 email: van@packetdesign.com 402 Haobo Yu 403 Packet Design 404 66 Willow Place 405 Menlo Park, CA 94025 406 USA 407 email: haoboy@packetdesign.com