Internet Engineering Task Force                   Gagan L. Choudhury
Internet Draft                                  Vera D. Sapozhnikova
Expires in October, 2002                                        AT&T
draft-ietf-ospf-scalability-01.txt
                                                   Anurag S. Maunder
                                                      Sanera Systems

                                                      Vishwas Manral
                                                    Netplane Systems

                                                         April, 2002


  Explicit Marking and Prioritized Treatment of Specific IGP Packets
   for Faster IGP Convergence and Improved Network Scalability and
                              Stability

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Abstract

   In this draft we propose the following mechanisms to allow fast IGP
   convergence while maintaining the scalability and stability of a
   network:

   (1) Explicitly mark Hello packets, to differentiate them from other
       IGP packets, so that efficient implementations can detect and
       process the Hello packets in a prioritized fashion.

   (2) In the absence of special marking, or in addition to it, use
       other mechanisms in order not to miss Hello packets.  One
       example is to treat any packet received over a link as a
       surrogate for a Hello packet for the purpose of keeping the
       link alive.

   (3) The same type of explicit marking and prioritized treatment may
       be beneficial to other IGP packets as well.  Some examples
       include (a) LSA acknowledgment packets, (b) Database
       Description (DBD) packets from a slave that are used as
       acknowledgments, and (c) LSAs carrying intra-area topology
       change information.

   It is possible that some implementations already use one or more of
   the above mechanisms in order not to miss the processing of
   critical packets during periods of congestion.  However, we suggest
   that the above mechanisms be included as part of the standard so
   that all implementations can benefit from them.

Table of Contents

   1. Motivation
   2. Simulation Study
   3. Analytic Model for Delay Experienced by a Hello Packet During an
      Initial LSA Storm
   4. Need for Special Marking and Prioritized Treatment of Specific
      IGP Packets
   5. Summary
   6. Acknowledgments
   7. References
   8. Authors' Addresses

1. Motivation

   The motivation of this draft is to address the following two key
   objectives of any data network: (a) fast restoration under failure
   conditions, and (b) improved network scalability and stability.
   Using analytic and simulation models we show that in general the
   two objectives are in conflict, i.e., improvement in one usually
   results in the degradation of the other.
   However, special marking
   and prioritized processing of certain key messages can allow us to
   achieve both objectives.

   The first item we address is fast restoration.  The theoretical
   limit for link-state routing protocols to re-route is on link-
   propagation time scales, i.e., tens of milliseconds.  However, as
   pointed out in [Ref1], in practice it may take from seconds to tens
   of seconds to detect a link failure and disseminate this
   information to the network, followed by convergence on the new set
   of paths.  This is an inordinately long transient period for
   mission-critical traffic destined to the non-reachable nodes of the
   network.

   One component of the long re-route time is the link failure
   detection time of between 20 and 30 seconds through typically three
   missed Hello packets with the typical hello interval of 10 seconds
   (between 30 and 40 seconds if the missed-hello threshold is 4).
   This component would be much shorter in the presence of link-level
   detection, but as pointed out in [Ref1] link-level detection does
   not work in some cases.  For example, a device driver may detect a
   link-level failure but fail to notify the IGP level.  Also, if a
   router fails behind a switch in a switched environment, then even
   though the switch gets the link-level notification it cannot
   communicate that to other routers.  Therefore, for faster reliable
   detection at the IGP level, one has to reduce the hello interval.
   [Ref1] suggests that this be reduced to below a second, perhaps
   even to tens of milliseconds.  A second component of the long
   re-route time is the delayed SPF (shortest-path-first) computation.
   The typical delay value is between 1 and 5 seconds, but in order to
   have sub-second rerouting it needs to be reduced significantly.

   The second item we address is the ability of a network to withstand
   the simultaneous or near-simultaneous update of a large number of
   link-state-advertisement messages, or LSAs.  We call this event an
   LSA storm.  An LSA storm may be initiated for many reasons.  Here
   are some examples:

   (a) one or more link failures due to fiber cuts,

   (b) one or more node failures for some reason, e.g., a software
       crash or some type of disaster in an office complex hosting
       many nodes,

   (c) the need to take down and later bring back many nodes during a
       software/hardware upgrade,

   (d) near-synchronization of the once-in-30-minutes refresh instants
       of some types of LSAs,

   (e) refresh of all LSAs in the system during a change in software
       version.

   In addition to the LSAs generated as a direct result of link/node
   failures, there may be other indirect LSAs as well.  One example in
   ATM/MPLS networks is LSAs generated at other links as a result of a
   significant change in bandwidth caused by the rerouting of virtual
   circuits that went down during the link/node failure.  An LSA storm
   tends to drive the node CPU utilization to 100% for a period of
   time, and the duration of this period increases with the size of
   the storm and the node adjacency, i.e., the number of links
   connected to the node.  During this period the Hello packets
   received at the node would see high delays, and if this delay
   exceeds the Router-Dead Interval (typically 30-40 seconds, or three
   to four hello intervals) then the associated link would be declared
   down.
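   As a back-of-envelope check of the detection times quoted above
   (purely our illustration, not part of any proposed mechanism), the
   bounds follow directly from the hello interval and the missed-hello
   threshold:

      def detection_time_bounds(hello_interval, missed_hellos):
          """Failure-detection time via missed Hellos.  Detection
          takes the full dead interval (threshold * hello interval)
          if the failure occurs just after a Hello was received, and
          one hello interval less if it occurs just before the next
          Hello was due."""
          dead_interval = missed_hellos * hello_interval
          return (dead_interval - hello_interval, dead_interval)

      print(detection_time_bounds(10, 3))   # -> (20, 30) seconds
      print(detection_time_bounds(10, 4))   # -> (30, 40) seconds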
   In this draft we address only the issue of links being declared
   down due to the delayed processing of Hello messages, but in
   general, depending on the implementation, there may be other
   impacts of a long CPU-busy period.  For example, in a reliable node
   architecture with an active and a standby processor, a processor
   switch-over may result from an extended CPU-busy period, which may
   mean that all the adjacencies would be lost and would need to be
   re-established.  A processor switch-over may also result from
   memory exhaustion caused by an extended CPU-busy period.  Both of
   the above events would cause more database synchronization with
   neighbors and network-wide LSA flooding, which in turn might cause
   extended CPU-busy periods at other nodes.  This may cause unstable
   behavior in the network for an extended period of time and
   potentially a meltdown in the extreme case.

   Due to world-wide increases in traffic demand, data networks are
   ever increasing in size.  As the network size grows, a bigger LSA
   storm and a higher adjacency at certain nodes become more likely,
   which increases the probability of unstable behavior.  One way to
   address the scalability issue is to divide the network
   hierarchically into different areas so that the flooding of LSAs
   remains localized within areas.  However, this approach increases
   the network management and design complexity and may result in
   less optimal routing between areas.  Also, unless addresses are
   aggregated, a large number of summary LSAs may need to be flooded.
   Thus it is important to allow the network to grow to as large a
   size as possible within a single area.

   The undesirable impact of large LSA storms is understood in the
   networking community, and it is well known that large-scale
   flooding of control messages (either naturally or due to a bug)
   has been responsible for several network events in the past causing
   a meltdown or a near-meltdown.  For some recent examples see
   [Ref2-Ref5].  Recently, proposals have been submitted to reduce
   flooding overhead in case more than one interface goes to the same
   neighbor [Ref6,Ref7].  Also, [Ref8-Ref9] consider a wide range of
   congestion control and failure recovery mechanisms.

   Section 2 uses a simulation model to illustrate the onset of
   instability in the network as the result of a large LSA storm.
   Section 3 uses a simple, approximate but easy-to-understand
   analytic model to make the point that reducing hello intervals and
   more frequent SPF computation would in fact reduce network
   scalability and stability.  Section 4 makes the point that many of
   the underlying causes of network instability can be avoided if
   certain IGP messages are specially marked and given prioritized
   treatment.  [Ref10] also provides simulation and analytic models to
   show the onset of instability in large networks due to LSA storms
   and proposes the prioritization of Hello and other special packets
   to improve scalability and stability.

2. Simulation Study

   We have developed a network-wide event simulation model to study
   the impact of an LSA storm.  It captures the actual congestion seen
   at various nodes and accounts for propagation delay between nodes,
   retransmissions in case an LSA is not acknowledged, failure of
   links for LSAs delayed beyond the Router-Dead interval, and link
   recovery following database synchronization and LSA flooding once
   the LSA is processed.  It approximates a real network
   implementation and uses processing times that are roughly of the
   same order of magnitude as those measured in a real network (on the
   order of milliseconds).  There are two categories of IGP messages
   processed at each node in the simulation.  Category 1 messages are
   triggered by a timer and include Hello refresh, LSA refresh and
   retransmission packets.  Category 2 messages are not triggered by a
   timer and include received Hellos, received LSAs and received
   acknowledgments.  Timer-triggered messages are given non-preemptive
   priority over the other type.  As a result, the received Hello
   packets and the received acknowledgment packets may see long
   queueing delays under intense CPU overload.
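   This two-class, non-preemptive discipline can be summarized in a
   few lines of Python (a sketch of the assumed queueing only, not the
   simulator used for this study; all names are ours):

      from collections import deque

      category1 = deque()   # timer-triggered: Hello refresh, LSA
                            # refresh, retransmissions
      category2 = deque()   # received: Hellos, LSAs, acknowledgments

      def enqueue(job, timer_triggered):
          (category1 if timer_triggered else category2).append(job)

      def next_job():
          """Non-preemptive priority: Category 1 is always served
          first; Category 2 is served FIFO, so a received Hello
          waits behind every received LSA that arrived before it."""
          if category1:
              return category1.popleft()
          if category2:
              return category2.popleft()
          return None

   Because received Hellos share the Category 2 FIFO with received
   LSAs, a storm of LSAs queued ahead of a Hello translates directly
   into Hello queueing delay.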
   Table 1 below shows sample results of the simulation study when
   applied to a network with about 300 nodes and 800 links.  The node
   adjacency varies from node to node and the maximum node adjacency
   is 30.  The Hello interval is assumed to be 5 seconds, the minimum
   interval between successive SPF (Shortest-Path-First) calculations
   is 1 second, and the Router-Dead Interval is 15 seconds, i.e., a
   link is declared down if no Hello packet is received for three
   successive hello intervals.  During the study, an LSA storm of size
   X is created at time instant 100 seconds, where storm size is
   defined as the number of LSAs generated during a storm.  Three
   cases are considered, with X = 300, 600 and 900 respectively.
   Besides the storm, there are also the normal once-in-thirty-minutes
   LSA refreshes.  At any given point of time we define a quantity,
   "dispersion", as the number of LSU packets already generated in the
   network but not yet received and processed by at least one node
   (each LSU packet is assumed to carry three LSAs).

   Table 1 shows dispersion as a function of time and thereby
   identifies the impact of an LSA storm on network stability.

   ======|==========================================================
         |   Table 1: DISPERSION as a FUNCTION of TIME (in sec)
   LSA   |             for different LSA Storm Sizes
   STORM |==========================================================
   SIZE  | 100s  106s  110s  115s  140s  170s  230s  330s  370s
   ======|==========================================================
   300   |   0    39     3     1     0     1     0     0     0
   ------|----------------------------------------------------------
   600   |   0   133   120   100    12     1     0     0     0
   ------|----------------------------------------------------------
   900   |   0   230   215   196   101   119   224   428   488
   ======|==========================================================

   Before the LSA storm, the dispersion due to normal LSA refreshes
   remains small.  We expect the dispersion to jump to a high value
   right after the storm and then come down to the pre-storm level
   after some period of time (this happens with X=300 and X=600 but
   not with X=900).  In Table 1, with an LSA storm of size 300 the
   "heavy dispersion period" lasted about 11 seconds and no link
   losses were observed.  With an LSA storm of size 600, the "heavy
   dispersion period" lasted about 40 seconds.
   Some link losses were
   observed a little after 15 seconds within the "heavy dispersion
   period", but eventually all links recovered and the dispersion came
   down to the pre-storm level.  With an LSA storm of size 900, the
   "heavy dispersion period" lasted throughout the simulation period
   (6 minutes).

   The generic observations are as follows:

   (1) If the initial LSA storm size (e.g., X=300) is such that the
       delays experienced by Hello packets are not large enough to
       cause any link failures anywhere in the network, the network
       remains stable and quickly gets back to a period of "low
       dispersion".  LSA storms of this type are observed quite
       frequently in operational networks, and the network easily
       recovers from them.

   (2) If the initial LSA storm size (e.g., X=600) is such that the
       delays experienced by a few Hello packets in a few nodes cause
       link failures, then some secondary LSA storms are generated.
       However, the secondary storms do not keep growing indefinitely,
       and the network remains stable and eventually gets back to a
       period of "low dispersion".  This type of LSA storm was
       observed in an operational network, triggered by a network
       upgrade, from which the network recovered but with some
       difficulty.

   (3) If the initial LSA storm size (e.g., X=900) is such that the
       delays experienced by many Hello packets in many nodes cause
       link failures, then a wave of secondary LSA storms is
       generated.  The network enters an unstable state and the
       secondary storms are sustained indefinitely or for a very long
       period of time.  This type of LSA storm was observed in an
       operational network, triggered by a network failure [Ref2],
       from which the network recovered only after taking some
       corrective steps (manual procedures based on reducing
       adjacencies at heavily congested nodes were used to reduce LSA
       flooding and stabilize the network).

   The results show that there is an LSA storm threshold above which
   the network shows unstable behavior.  It was also observed that if
   Hello packets (both received and sent) are given higher priority
   than other IGP packets, then the LSA storm threshold above which
   the network shows unstable behavior is significantly increased.  In
   this draft we only look at the failure of links due to missed
   Hellos, but in general there may be many other types of failures
   once a network enters an unstable state.  Examples include memory
   exhaustion and failure of the node processor due to its inability
   to perform certain critical jobs.

3. Analytic Model for Delay Experienced by a Hello Packet During an
   Initial LSA Storm

   From the simulation results of the previous section it is clear
   that it is important to identify the delay experienced by a Hello
   packet during an initial LSA storm and compare that against the
   maximum allowed delay beyond which the link is declared down.  We
   develop a simple, approximate analytic model for this purpose and
   use it to study the impact of the Hello and SPF intervals on
   network stability.  As explained in Section 2, for every link
   interface a node has to send and receive a Hello packet once every
   hello interval.  The sending of a Hello packet is triggered by a
   timer.  We assume that higher priority is given to timer-triggered
   jobs and therefore no significant delay is experienced in the
   sending of Hello packets.
   However, a received Hello packet cannot easily be distinguished
   from other IGP packets, and therefore we assume that it is served
   in a first-come-first-served fashion.  Let us assume:

   S  = Size of the LSA storm, i.e., the number of LSAs in it.  It is
        assumed that each LSA is carried in one LSU packet.

   L  = Link adjacency of the node under consideration.

   t1 = Time to send or receive one IGP packet over an interface.
        (The same time is assumed for Hello, LSA, duplicate LSA and
        LSA acknowledgment packets, even though in general there may
        be some differences.  This is a good approximation if the
        majority of the time is spent in the act of receiving or
        sending, and a relatively small part on packet-type-specific
        work.)  In the numerical examples we assume t1 = 1 ms.

   t2 = Time to do one SPF calculation.  For large networks this time
        is usually in hundreds of ms, and in the numerical examples we
        assume t2 = 200 ms.

   Hi = Hello interval (the gap between successive Hello messages on
        the same link).

   Si = Minimum interval between successive SPF calculations.

   ro = Rate at which non-IGP work (e.g., forwarding of data packets)
        arrives at the node.  For the numerical examples we assume
        ro = 0.2.

   T  = Total work brought to the node during the LSA storm.  For each
        LSA update generated elsewhere, the node will receive one new
        LSA packet over one interface, send an acknowledgment packet
        over that interface, and send copies of the LSA packet over
        the remaining L-1 interfaces.  Also, assuming that the
        implicit acknowledgment mechanism is in use, the node will
        subsequently receive either an acknowledgment or a duplicate
        LSA over the remaining L-1 interfaces.  So over each interface
        one packet is sent and one is received.  It can be seen that
        the same would be true for self-generated LSAs (see Table 1
        for an example).  So the total work per LSA update is 2*L*t1.
        Since there are S LSAs in the storm, we get

        T = 2*S*L*t1                                              (1)

        In Equation (1) we ignore retransmissions of LSAs in case
        acknowledgments are not received or processed within 5
        seconds.  From the simulation study we see that this is a
        reasonable assumption, since usually only a few
        retransmissions result during the processing of the initial
        LSA storm (retransmissions usually happen at a higher rate
        during the secondary storms).

   T2 = Time period over which the work arrives.  Due to differences
        in propagation times and congestion at other nodes, it is
        possible for the work arrival to be spread out over a long
        interval.  However, since we are primarily interested in the
        few nodes that are bottlenecks or near-bottlenecks, it is
        reasonable to assume that most of the work comes in one chunk.
        We verified this to be usually true using simulations.  One
        part of T2 is of the order of the link propagation delay, and
        we assume that there is a second part proportional to T.
        Therefore we get

        T2 = A + B*T                                              (2)

        where A and B are constants.  For the numerical examples we
        assume A = 10 ms and B = 0.1.

   D  = Maximum delay experienced by a Hello packet during the LSA
        storm.  We assume first-come-first-served service, and hence
        the delay seen by the Hello packet would be the total
        outstanding work at the node at the arrival instant plus its
        own processing time.
   We assume that the outstanding work steadily increases over the
   interval T2, and so the maximum delay is seen by a Hello packet
   that arrives near the end of this interval.  We write down an
   approximate expression for D and then explain the various terms on
   the right-hand side:

      D = T - T2 + max(1,2*T2/Hi)*t1 + max(1,T2/Si)*t2 + ro*T2    (3)

   The first term is the total work brought in due to the LSA storm.
   The second term is the work the node was able to finish, since we
   assume that it was continuously busy during the period T2.  The
   third term is the total work due to the sending and receiving of
   Hello packets during the period T2; note that at least one Hello
   packet is assumed to be processed, namely the one under study.  The
   fourth term is due to SPF processing during the period T2, and we
   assume that at least one SPF computation is done.  The last term is
   the total non-IGP work coming to the node over the interval T2.

   Dmax = Maximum allowed value of D, i.e., if D exceeds this value
   then the associated link is declared down.  In the numerical
   examples below we assume

      Dmax = 3*Hi                                                 (4)

   If we assume that the previous Hello packet was minimally delayed,
   then exceeding Dmax really means four missed hellos, since the
   Hello packet under study itself came after a period Hi.  In the
   numerical examples below, both D and Dmax change with the choice of
   system parameters, and we are mainly interested in identifying
   whether D exceeds Dmax.  For this purpose we define the ratio
   variable

      Delay Ratio = D/Dmax                                        (5)

   and identify whether the Delay Ratio exceeds 1.

   In Tables 2-4 we show the Delay Ratio as a function of the LSA
   storm size for node adjacencies of 10, 20 and 50.  All parameters
   except the ones noted explicitly in the tables are as stated
   earlier.  Table 2 assumes Hello packets every 10 seconds and an SPF
   calculation every 5 seconds, which are typical default values
   today.  With a node adjacency of 10, the Delay Ratio is below 1
   even with an LSA storm of size 900.  However, with a node adjacency
   of 20 the Delay Ratio exceeds 1 at around a storm of size 800, and
   with a node adjacency of 50 it exceeds 1 at around a storm of size
   325.

   ==========|========================================================
             |Table 2: Ratio of Hello Packet Delay to Maximum Allowed
             |Hello Packet Delay as a function of LSA Storm Size (LSS)
             |    (Hello Every 10 Seconds, SPF Every 5 Seconds,
             |     Dmax = 30 seconds)
   NODE      |========================================================
   ADJACENCY | LSS=100   LSS=300   LSS=500   LSS=700   LSS=900
   ==========|========================================================
   10        |  0.0677    0.1904    0.3131    0.4358    0.5584
   ----------|--------------------------------------------------------
   20        |  0.1291    0.3744    0.6198    0.8651    1.1104
   ----------|--------------------------------------------------------
   50        |  0.3131    0.9264    1.5398    2.1558    2.7718
   ==========|========================================================

   In a large network it is not unusual to have LSA storms of size
   several hundred, since the LSA database size may be several
   thousand.  This is particularly true if there are many Autonomous-
   System-External (ASE) LSAs, and if there are special LSAs carrying
   information about available bandwidth at links, as is common in ATM
   networks and might be the case in MPLS-based networks as well.
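   For readers who wish to experiment with the model, the following
   short Python sketch (our illustration, not part of the proposal)
   evaluates Equations (1)-(5) with the parameter values assumed
   above; it reproduces the adjacency-20 row of Table 2:

      def delay_ratio(S, L, Hi, Si,
                      t1=0.001, t2=0.2, ro=0.2, A=0.01, B=0.1):
          """Delay Ratio of Eq. (5); all times in seconds (t1 = 1 ms,
          t2 = 200 ms, A = 10 ms, B = 0.1, ro = 0.2 as in the text).
          """
          T = 2 * S * L * t1               # Eq. (1): storm work
          T2 = A + B * T                   # Eq. (2): arrival interval
          D = (T - T2                      # Eq. (3): backlog after T2
               + max(1, 2 * T2 / Hi) * t1  #  Hello send/receive work
               + max(1, T2 / Si) * t2      #  SPF work during T2
               + ro * T2)                  #  non-IGP work during T2
          Dmax = 3 * Hi                    # Eq. (4): three hellos
          return D / Dmax

      # Adjacency-20 row of Table 2 (Hi = 10 s, Si = 5 s):
      for S in (100, 300, 500, 700, 900):
          print(S, round(delay_ratio(S, L=20, Hi=10, Si=5), 4))
      # -> 0.1291  0.3744  0.6198  0.8651  1.1104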
   In Table 3 the hello interval is decreased to 2 seconds and the SPF
   calculation is done once a second.  The LSA storm thresholds are
   significantly reduced.  Specifically, with a node adjacency of 10
   the Delay Ratio exceeds 1 at around a storm of size 310; with a
   node adjacency of 20 it exceeds 1 at around a storm of size 160;
   and with a node adjacency of 50 it exceeds 1 at around a storm of
   size only 65.

   ==========|========================================================
             |Table 3: Ratio of Hello Packet Delay to Maximum Allowed
             |Hello Packet Delay as a function of LSA Storm Size (LSS)
             |     (Hello Every 2 Seconds, SPF Every 1 Second,
             |      Dmax = 6 seconds)
   NODE      |========================================================
   ADJACENCY | LSS=30    LSS=90    LSS=150   LSS=210   LSS=270
   ==========|========================================================
   10        |  0.124     0.308     0.492     0.676     0.86
   ----------|--------------------------------------------------------
   20        |  0.216     0.584     0.952     1.32      1.691
   ----------|--------------------------------------------------------
   50        |  0.492     1.412     2.349     3.289     4.229
   ==========|========================================================

   In Table 4 the hello interval is decreased even further, to 300 ms,
   and the SPF calculation is done once every 500 ms.  The LSA storm
   thresholds are now very small.  Specifically, with a node adjacency
   of 10 the Delay Ratio exceeds 1 at around a storm of size 40; with
   a node adjacency of 20 it exceeds 1 at around a storm of size 20;
   and with a node adjacency of 50 it is already over 1 even with a
   storm of size 10.

   ==========|========================================================
             |Table 4: Ratio of Hello Packet Delay to Maximum Allowed
             |Hello Packet Delay as a function of LSA Storm Size (LSS)
             |  (Hello Every 300 ms, SPF Every 500 ms, Dmax = 900 ms)
   NODE      |========================================================
   ADJACENCY | LSS=10    LSS=30    LSS=50    LSS=70    LSS=90
   ==========|========================================================
   10        |  0.419     0.828     1.237     1.646     2.055
   ----------|--------------------------------------------------------
   20        |  0.623     1.441     2.259     3.078     3.896
   ----------|--------------------------------------------------------
   50        |  1.237     3.282     5.333     7.467     9.602
   ==========|========================================================

   Based on the simulation observations, we understand that if the
   Delay Ratio is less than 1 for all Hello packets then the system is
   stable, and if it exceeds 1 at many nodes then the system tends to
   enter an unstable region.  Therefore, the LSA storm threshold at
   which the Delay Ratio exceeds 1 may also roughly be considered the
   network stability threshold.  Tables 2-4 show that the stability
   threshold rapidly decreases as the hello interval and the SPF
   computation interval decrease.  One reason for this is the
   increased CPU work due to more frequent hello and SPF computations,
   but the dominant reason is that Dmax itself decreases, so that a
   smaller CPU-busy interval is enough to exceed it.  Specifically,
   Dmax is 30 seconds in Table 2, 6 seconds in Table 3 and only 900 ms
   in Table 4.
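   Using the delay_ratio() sketch above, the stability threshold for a
   given parameter set can be located with a simple search over the
   storm size (again purely illustrative):

      def storm_threshold(L, Hi, Si, step=5, limit=100000):
          """Smallest storm size S (in increments of `step`) at which
          the Delay Ratio of Eq. (5) reaches 1, i.e., the rough
          stability threshold; None if the searched range is stable.
          """
          S = step
          while delay_ratio(S, L, Hi, Si) < 1.0:
              S += step
              if S > limit:
                  return None
          return S

      # Hi = 2 s, Si = 1 s as in Table 3: adjacencies 10, 20 and 50
      # give thresholds of roughly 320, 160 and 65, close to the
      # values quoted in the text.
      for L in (10, 20, 50):
          print(L, storm_threshold(L, Hi=2, Si=1))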
   It is clear from the above examples that in order to maintain
   network stability as the hello interval decreases, it is necessary
   to provide faster, prioritized treatment to received Hello packets,
   which of course can be done only if those packets can be
   distinguished from other IGP packets.

4. Need for Special Marking and Prioritized Treatment of Specific IGP
   Packets

   The analytic and simulation models show that a major cause of
   unstable behavior in networks is that Hello packets received at a
   node get queued behind other work brought to the node during an LSA
   storm and miss the deadline of typically three or four hello
   intervals.  Clearly, if Hello packets can be specially marked to
   distinguish them from other IGP packets, then they can be given
   prioritized treatment and would not miss the deadline even during a
   large LSA storm.  However, the key is that the detection mechanism
   should be significantly faster than the complete processing of an
   IGP packet, and it should be possible to do the detection and
   separate queueing at the line rate.

   Usually a special Diffserv codepoint is used to differentiate all
   IGP packets from other packets.  We propose a separate Diffserv
   codepoint for Hello packets that allows them to be queued
   separately from other IGP packets and given prioritized treatment.

   We also suggest the use of additional mechanisms in order not to
   miss Hello packets during periods of congestion and thereby avoid
   declaring links to be down.  One such mechanism is to treat any
   packet received over a link as an implicit Hello packet for the
   purpose of keeping the link alive.  Under this mechanism a link
   will be declared down only if no packets are received over the link
   for the duration of the Router-Dead interval.  So, during a period
   of congestion, if Hello packets are queued behind LSAs or some
   other packets, but at least one such packet is received over the
   link at least once every Router-Dead interval, the link will stay
   up.

   Besides the Hello packets, there may be other IGP packets that
   could also benefit from special marking and prioritized treatment.
   We give some examples below, but clearly others are possible.

   (1) One example is the LSA acknowledgment packet.  This packet
       disables retransmission, and if a large queueing delay of this
       packet lets the retransmission timer (typical default value 5
       seconds) expire, then a needless retransmission will happen,
       causing extra traffic load.  Special marking and prioritization
       of the LSA acknowledgment packet would eliminate many needless
       retransmissions.  During the database exchange process between
       neighbors following a link coming up, Database Description
       packets are exchanged, and the successful receipt of such a
       packet is acknowledged by sending a properly sequenced Database
       Description packet back to the sender.  Since these packets are
       used as acknowledgments, it makes sense to properly mark and
       prioritize them as well.

   (2) Another example is an LSA carrying change information.  It is
       preferable to transmit this information faster than other LSAs
       in the network that are just once-in-30-minutes refreshes.

       Among "change" LSAs we can distinguish further and give
       preferential treatment to only those "change" LSAs that carry
       intra-area topology change information, as opposed to other
       "change" LSAs that are summary LSAs or Opaque LSAs.  We can
       also distinguish between "change" LSAs carrying "bad"
       information (node/link failure) and those carrying "good"
       information (node/link coming up) and give higher priority to
       LSAs carrying "bad" information.  There may be multiple levels
       of priority depending on the relative importance of the various
       IGP packets.

   The explicit identification can also be used for preferentially
   triggering the SPF calculation.  We can normally have a longer gap
   between successive SPF calculations, but revert to a shorter gap
   after receiving an LSA that carries intra-area-topology-change
   information.  This will speed up restoration time following a
   failure but would not unduly increase the SPF processing overhead.
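   To illustrate how these mechanisms fit together, here is a minimal
   Python sketch.  It is only our illustration of the proposals in
   this section, not a router implementation; the codepoint value and
   all class and field names are assumptions of the sketch:

      import time
      from collections import deque, namedtuple

      Packet = namedtuple("Packet", "dscp payload")       # assumed
      Lsa = namedtuple("Lsa", "intra_area_topo_change")   # assumed

      HELLO_DSCP = 0b101110   # placeholder; the draft proposes a
                              # separate codepoint for Hellos but
                              # does not assign a value

      hello_queue = deque()   # served first, classified at line rate
      other_queue = deque()   # other IGP packets: LSAs, acks, DBDs

      class Interface:
          """Implicit-Hello mechanism: ANY packet received over the
          link refreshes liveness, so a Hello stuck in a queue behind
          LSAs cannot by itself bring the link down."""
          def __init__(self, router_dead_interval):
              self.dead_interval = router_dead_interval
              self.last_heard = time.monotonic()

          def packet_received(self):
              self.last_heard = time.monotonic()

          def is_alive(self):
              return (time.monotonic() - self.last_heard
                      < self.dead_interval)

      def classify(packet, iface):
          """Fast-path classification on the Diffserv codepoint."""
          iface.packet_received()          # implicit Hello
          if packet.dscp == HELLO_DSCP:
              hello_queue.append(packet)   # prioritized treatment
          else:
              other_queue.append(packet)

      # Preferential SPF triggering: a long hold-down normally, a
      # short one once an intra-area topology-change LSA is seen.
      NORMAL_SPF_GAP, URGENT_SPF_GAP = 5.0, 0.5   # seconds, assumed
      spf_gap = NORMAL_SPF_GAP

      def on_lsa(lsa):
          global spf_gap
          if lsa.intra_area_topo_change:
              spf_gap = URGENT_SPF_GAP    # re-converge quickly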
5. Summary

   In this draft we point out that if a large LSA storm is generated
   as a result of some type of failure/recovery of nodes/links, or of
   synchronization among refreshes, then the Hello packets received at
   a node may see large queueing delays and miss the deadline of
   typically three or four hello intervals.  This causes the
   associated link to be declared down, starts a secondary storm, and
   is potentially the beginning of unstable behavior in the network.
   This is already a concern in today's networks, but it would be a
   bigger concern if the hello interval and the minimum interval
   between SPF calculations are substantially reduced (below or
   perhaps well below a second) in order to allow faster rerouting.
   To avoid the above, we propose the following:

   (1) Explicitly mark Hello packets to differentiate them from other
       IGP packets so that efficient implementations can detect and
       act upon these packets in a prioritized fashion.  This may be
       done by using a special Diffserv codepoint for Hello packets
       (separate from that used for other IGP packets).

   (2) In the absence of special marking, or in addition to it, other
       mechanisms should be used in order not to miss Hello packets.
       One example is to treat any packet received over a link as a
       surrogate for a Hello packet for the purpose of keeping the
       link alive.

   (3) The same type of explicit marking and prioritized treatment
       would also help other IGP packets and should be considered.
       Some examples include LSA acknowledgment packets, Database
       Description packets from the slave during database exchange,
       and LSAs carrying intra-area topology change information.  LSAs
       carrying bad news (node/link failures) may also be given
       priority over LSAs carrying good news (node/link coming back
       up).

   It is possible that some implementations already use one or more of
   the above mechanisms in order not to miss the processing of
   critical packets during periods of congestion.  However, we suggest
   that the above mechanisms be included as part of the standard so
   that all implementations can benefit from them.

6. Acknowledgments

   The authors would like to acknowledge several people for their
   helpful comments.  In AT&T we recognize Tushar Amin, Jerry Ash,
   Margaret Chiosi, Elie Francis, Jeff Han, Tom Helstern, Shih-Yue
   Hou, S. Kandaswamy, Beth Munson, Aswatnarayan Raghuram, Moshe
   Segal, John Tinacci, Mike Wardlow and Pat Wirth.
   In Lucent
   Technologies we recognize Nabil Biter and Roshan Rao.

7. References

   [Ref1]  Alaettinoglu, C., Jacobson, V. and H. Yu, "Towards
           Milli-second IGP Convergence," Work in Progress.

   [Ref2]  Pappalardo, D., "AT&T, customers grapple with ATM net
           outage," Network World, February 26, 2001.

   [Ref3]  "AT&T announces cause of frame-relay network outage," AT&T
           Press Release, April 22, 1998.

   [Ref4]  Cholewka, K., "MCI Outage Has Domino Effect," Inter@ctive
           Week, August 20, 1999.

   [Ref5]  Jander, M., "In Qwest Outage, ATM Takes Some Heat," Light
           Reading, April 6, 2001.

   [Ref6]  Zinin, A. and M. Shand, "Flooding Optimizations in
           Link-State Routing Protocols," Work in Progress.

   [Ref7]  Moy, J., "Flooding over Parallel Point-to-Point Links,"
           Work in Progress.

   [Ref8]  Ash, J., Choudhury, G., Han, J., Sapozhnikova, V., Sherif,
           M., Noorchashm, M., Mcallister, S., Maunder, A. and V.
           Manral, "Proposed Mechanisms for Congestion Control /
           Failure Recovery in OSPF & ISIS Networks," Work in
           Progress.

   [Ref9]  Ash, J., Choudhury, G., Sapozhnikova, V., Sherif, M.,
           Maunder, A. and V. Manral, "Congestion Avoidance & Control
           for OSPF Networks," Work in Progress.

   [Ref10] Choudhury, G., Maunder, A. and V. Sapozhnikova, "Faster
           Link-State IGP Convergence and Improved Network Scalability
           and Stability," Presentation at LCN 2001, Tampa, Florida,
           November 14-16, 2001.

8. Authors' Addresses

   Gagan L. Choudhury
   AT&T
   Room D5-3C21
   200 Laurel Avenue
   Middletown, NJ, 07748
   USA
   Phone: (732) 420-3721
   Email: gchoudhury@att.com

   Vera D. Sapozhnikova
   AT&T
   Room C5-2C29
   200 Laurel Avenue
   Middletown, NJ, 07748
   USA
   Phone: (732) 420-2653
   Email: sapozhnikova@att.com

   Anurag S. Maunder
   Sanera Systems
   370 San Aleso Ave.
   Second Floor
   Sunnyvale, CA 94085
   Phone: (408) 734-6123
   Email: amaunder@sanera.net

   Vishwas Manral
   NetPlane
   189, Prashasan Nagar
   Road Number 72
   Jubilee Hills, Hyderabad
   India
   Email: Vishwasm@netplane.com