Internet Engineering Task Force                      Gagan L. Choudhury
Internet Draft                                     Vera D. Sapozhnikova
Expires in September, 2003                                         AT&T
Category: Best Current Practice
draft-ietf-ospf-scalability-03.txt                    Anurag S. Maunder
                                                         Sanera Systems

                                                         Vishwas Manral
                                                       Netplane Systems

                                                             March, 2003

                Prioritized Treatment of Specific OSPF
                  Packets and Congestion Avoidance

Status of this Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Abstract

   This document proposes methods that are intended to improve the
   scalability and stability of large networks using the OSPF protocol.
   The methods include processing OSPF Hellos and LSA Acknowledgements
   at a higher priority than other OSPF packets, together with other
   congestion avoidance procedures.  Simulation results in support of
   some of the proposals are given in the appendix sections.

Table of Contents

   1. Motivation
   2. The Proposals
   3. Security Considerations
   4. Acknowledgments
   5. References
   6. Authors' Addresses
   Appendix A. LSA Storm: Causes and Impact
   Appendix B. Simulation Study
   Appendix B.1. The Network Under Simulation
   Appendix B.2. Simulation Results
   Appendix B.3. Observations on Simulation Results
   Appendix C. Other Proposals

1. Motivation

   A large network running the OSPF [Ref1] or OSPF-TE [Ref2] protocol
   may occasionally experience the simultaneous or near-simultaneous
   update of a large number of link-state-advertisement messages, or
   LSAs.  We call this event an LSA storm; it may be initiated by an
   unscheduled failure or by a scheduled maintenance or upgrade event.
   The failure may be hardware, software, or procedural in nature.

   An LSA storm causes high CPU and memory utilization at the node
   processors, causing incoming packets to be delayed or dropped.
   Delayed acknowledgements (beyond the retransmission timer value)
   result in retransmissions, and delayed Hello packets (beyond the
   router-dead interval) result in links being declared down.
   The retransmissions and additional LSA generations result in further
   CPU and memory usage, essentially creating a positive feedback loop
   which, in the extreme case, may drive the network to an unstable
   state.

   The default value of the retransmission timer is 5 seconds, and that
   of the router-dead interval is 40 seconds.  Recently, however, there
   has been a lot of interest in significantly reducing OSPF
   convergence time, and as part of that effort much shorter
   (subsecond) Hello and router-dead intervals have been proposed
   [Ref3].  In such a scenario it becomes more likely that Hello
   packets will be delayed beyond the router-dead interval during a
   network congestion event caused by an LSA storm.

   Appendix A explains LSA storm generation scenarios and their impact
   in more detail, and points out a few real-life examples of
   control-message storm generation.  Appendix B presents a simulation
   study of this phenomenon.

   In order to improve the scalability and stability of networks, we
   propose steps for prioritizing critical OSPF packets and avoiding
   congestion.  The details of the proposals are given in Section 2.
   We also present a simulation study of a subset of the proposals in
   Appendix B and show that they indeed improve the scalability and
   stability of networks using the OSPF protocol.

   Appendix C provides some further proposals with similar goals.

2. The Proposals

   The proposals below are intended to improve the scalability and
   stability of large networks using the OSPF protocol.  During periods
   of network congestion they would reduce retransmissions, avoid an
   interface being declared down due to Hello packets being delayed
   beyond the RouterDeadInterval, and take other congestion avoidance
   steps.

   All or a subset of the proposals may be implemented by a router.
   It is also possible for some routers to implement them fully or
   partially, and for others not to implement them at all.

   (1) Classify all OSPF packets into two classes: a "high priority"
       class comprising OSPF Hello packets and Link State
       Acknowledgement packets, and a "low priority" class comprising
       all other packets.  The classification is accomplished by
       examining the OSPF packet header.  While receiving a packet
       from a neighbor and while transmitting a packet to a neighbor,
       try to process a "high priority" packet ahead of a "low
       priority" packet.

   (2) Reset the Inactivity Timer for an interface whenever any OSPF
       packet is received over that interface (currently this is done
       only for the Hello packet).  OSPF would then declare the
       interface down only if no OSPF packet at all is received over
       that interface for a period equaling or exceeding the
       RouterDeadInterval.

   (3) Use an exponential backoff algorithm for determining the value
       of the LSA retransmission interval (RxmtInterval).  Let R(i)
       represent the RxmtInterval value used during the i-th
       retransmission of an LSA.  Use the following algorithm to
       compute R(i):

          R(1)   = Rmin
          R(i+1) = Min(K*R(i), Rmax) for i >= 1

       where K, Rmin, and Rmax are constants and the function Min(.,.)
       represents the minimum value of its two arguments.  Example
       values for K, Rmin, and Rmax may be 2, 5 seconds, and 40
       seconds respectively.
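       As a non-normative illustration, the following Python sketch
       computes the retransmission schedule of proposal (3); the
       constants are the example values given above.

          def rxmt_interval(i, k=2, r_min=5.0, r_max=40.0):
              """Return R(i), the RxmtInterval (in seconds) for the
              i-th retransmission of an LSA (i >= 1).  The recursion
              R(1) = Rmin, R(i+1) = Min(K*R(i), Rmax) closes to
              R(i) = Min(K**(i-1) * Rmin, Rmax)."""
              return min(k ** (i - 1) * r_min, r_max)

          # With the example constants, successive retransmissions of
          # the same LSA are spaced 5, 10, 20, 40, 40, ... seconds:
          print([rxmt_interval(i) for i in range(1, 7)])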
   (4) Implicit Congestion Detection and Action Based on That: If
       there is control-message congestion at a node, its neighbors do
       not know about it explicitly.  However, they can detect it
       implicitly from the number of unacknowledged LSAs to this node.
       If this number exceeds a certain "high water mark", the rate at
       which LSAs are sent to this node should be reduced.  At a
       future time, if the number of unacknowledged LSAs to this node
       falls below a certain "low water mark", the normal rate of
       sending LSAs to this node should be resumed.  An example value
       for the "high water mark" may be 20 unacknowledged LSAs and
       that for the "low water mark" may be 10 unacknowledged LSAs.
       An example value for the rate on exceeding the "high water
       mark" may be 50% of the normal rate.
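       A minimal sketch of the hysteresis in proposal (4), assuming a
       hypothetical per-neighbor flooding-rate hook; the water marks
       and rate factor are the example values above.

          class NeighborFloodControl:
              """Track unacknowledged LSAs to one neighbor and slow
              the flooding rate between the two water marks."""

              def __init__(self, high_water=20, low_water=10,
                           slow_factor=0.5):
                  self.high_water = high_water
                  self.low_water = low_water
                  self.slow_factor = slow_factor  # fraction of normal
                  self.unacked = 0     # LSAs sent but not yet acked
                  self.throttled = False

              def on_lsa_sent(self):
                  self.unacked += 1
                  if self.unacked > self.high_water:
                      self.throttled = True   # implicitly congested

              def on_ack_received(self):
                  self.unacked = max(0, self.unacked - 1)
                  if self.unacked < self.low_water:
                      self.throttled = False  # resume normal rate

              def current_rate(self, normal_rate):
                  """LSAs/second at which to flood to this neighbor."""
                  if self.throttled:
                      return normal_rate * self.slow_factor
                  return normal_rate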
   (5) Throttling Adjacencies to be Brought Up Simultaneously: If a
       node tries to bring up a large number of adjacencies to its
       neighbors simultaneously, that may cause severe congestion due
       to database synchronization and LSA flooding activities.  It is
       recommended that in such a situation no more than "n"
       adjacencies be brought up simultaneously.  Once a subset of the
       adjacencies has been brought up successfully, newer adjacencies
       may be brought up, as long as the number of adjacencies being
       brought up simultaneously does not exceed "n".  An example
       value for "n" may be 4.
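       The throttling in proposal (5) behaves like a counting
       semaphore over pending adjacencies.  A minimal sketch, with the
       FIFO queueing discipline and the bring-up hook being
       assumptions:

          from collections import deque

          class AdjacencyThrottle:
              """Bring up at most `n` adjacencies simultaneously;
              further neighbors wait in FIFO order."""

              def __init__(self, n=4):
                  self.n = n
                  self.in_progress = set()
                  self.pending = deque()

              def request_bringup(self, neighbor):
                  self.pending.append(neighbor)
                  self._start_next()

              def on_adjacency_full(self, neighbor):
                  """Call when database synchronization with
                  `neighbor` completes (or is abandoned)."""
                  self.in_progress.discard(neighbor)
                  self._start_next()

              def _start_next(self):
                  while self.pending and len(self.in_progress) < self.n:
                      neighbor = self.pending.popleft()
                      self.in_progress.add(neighbor)
                      neighbor.start_database_exchange()  # hypothetical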
3. Security Considerations

   This memo does not create any new security issues for the OSPF
   protocol.  Security considerations for the base OSPF protocol are
   covered in [Ref1].

4. Acknowledgments

   We would like to acknowledge the support of OSPF WG chairs Rohit
   Dube, Acee Lindem, and John Moy.  We also acknowledge Jerry Ash,
   Margaret Chiosi, Elie Francis, Jeff Han, Beth Munson, Roshan Rao,
   Moshe Segal, Mike Wardlow, and Pat Wirth for collaboration and
   encouragement in our scalability improvement efforts for
   Link-State-Protocol based networks.

5. References

   [Ref1]  J. Moy, "OSPF Version 2", RFC 2328, April 1998.

   [Ref2]  D. Katz, D. Yeung, and K. Kompella, "Traffic Engineering
           Extension to OSPF Version 2", Work in Progress.

   [Ref3]  C. Alaettinoglu, V. Jacobson, and H. Yu, "Towards
           Millisecond IGP Convergence", Work in Progress.

   [Ref4]  D. Pappalardo, "AT&T, customers grapple with ATM net
           outage", Network World, February 26, 2001.

   [Ref5]  "AT&T announces cause of frame-relay network outage", AT&T
           Press Release, April 22, 1998.

   [Ref6]  K. Cholewka, "MCI Outage Has Domino Effect", Inter@ctive
           Week, August 20, 1999.

   [Ref7]  M. Jander, "In Qwest Outage, ATM Takes Some Heat", Light
           Reading, April 6, 2001.

   [Ref8]  A. Zinin and M. Shand, "Flooding Optimizations in
           Link-State Routing Protocols", Work in Progress.

   [Ref9]  J. Moy, "Flooding over Parallel Point-to-Point Links",
           Work in Progress.

   [Ref10] P. Pillay-Esnault, "OSPF Refresh and Flooding Reduction in
           Stable Topologies", Work in Progress.

   [Ref11] J. Ash, G. Choudhury, V. Sapozhnikova, M. Sherif, A.
           Maunder, and V. Manral, "Congestion Avoidance & Control
           for OSPF Networks", Work in Progress.

   [Ref12] B. M. Waxman, "Routing of Multipoint Connections", IEEE
           Journal on Selected Areas in Communications,
           6(9):1617-1622, 1988.

6. Authors' Addresses

   Gagan L. Choudhury
   AT&T
   Room D5-3C21
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Phone: (732) 420-3721
   EMail: gchoudhury@att.com

   Vera D. Sapozhnikova
   AT&T
   Room C5-2C29
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Phone: (732) 420-2653
   EMail: sapozhnikova@att.com

   Anurag S. Maunder
   Sanera Systems
   370 San Aleso Ave., Second Floor
   Sunnyvale, CA 94085
   Phone: (408) 734-6123
   EMail: amaunder@sanera.net

   Vishwas Manral
   NetPlane
   189 Prashasan Nagar, Road Number 72
   Jubilee Hills, Hyderabad
   India
   EMail: Vishwasm@netplane.com

Appendix A. LSA Storm: Causes and Impact

   An LSA storm may be initiated for many reasons.  Here are some
   examples:

   (a) one or more link failures due to fiber cuts,

   (b) one or more node failures for some reason, e.g., a software
       crash or some type of disaster (including a power outage) in
       an office complex hosting many nodes,

   (c) link/node flapping,

   (d) the need to take down and later bring back many nodes during a
       software/hardware upgrade,

   (e) near-synchronization of the once-in-30-minutes refresh
       instants of a subset of LSAs,

   (f) refresh of all LSAs in the system during a change in software
       version,

   (g) injection of a large number of external routes into OSPF due
       to a procedural error.

   In addition to the LSAs generated as a direct result of link/node
   failures, there may be other, indirect LSAs as well.  One example
   in MPLS networks is traffic engineering LSAs generated at other
   links as a result of a significant change in reserved bandwidth,
   caused by the rerouting of Label Switched Paths (LSPs) that went
   down during the link/node failure.

   The LSA storm causes high CPU and memory utilization at the node
   processors, causing incoming packets to be delayed or dropped.
   Delayed acknowledgements (beyond the retransmission timer value)
   result in retransmissions, and delayed Hello packets (beyond the
   Router-Dead interval) result in links being declared down.  A
   trunk-down event causes Router LSA generation by its end-point
   nodes.  If traffic engineering LSAs are used for each link, that
   type of LSA would also be generated by the end-point nodes, and
   potentially elsewhere as well, due to significant changes in
   reserved bandwidths at other links caused by the failure and
   reroute of LSPs originally using the failed trunk.  Eventually,
   when the link recovers, that too triggers additional Router and
   traffic engineering LSAs.

   The retransmissions and additional LSA generations result in
   further CPU and memory usage, essentially causing a positive
   feedback loop.  We define the LSA storm size as the number of LSAs
   in the original storm, not counting any additional LSAs resulting
   from the feedback loop described above.  If the LSA storm is too
   large, the positive feedback loop may be strong enough to sustain
   high CPU and memory utilization at many network nodes
   indefinitely, thereby driving the network to an unstable state.
   In the past, network outage events have been reported in IP and
   ATM networks using link-state protocols such as OSPF, IS-IS, PNNI,
   or some proprietary variants; see, for example, [Ref4]-[Ref7].  In
   many of these examples, large-scale flooding of LSAs or other
   similar control messages (either naturally or triggered by some
   bug or inappropriate procedure) has been partly or fully
   responsible for network instability and outage.

   In Appendix B, we use a simulation model to show that there is a
   certain LSA storm size threshold above which the network may show
   unstable behavior caused by a large number of retransmissions,
   link failures due to missed Hello packets, and subsequent link
   recoveries.  We also show that the LSA storm size causing
   instability may be substantially increased by giving prioritized
   treatment to Hello and LSA Acknowledgment packets and by using an
   exponential backoff algorithm for determining the LSA
   retransmission interval.  Furthermore, if we prioritize Hello
   packets, then even when the network operates somewhat above the
   stability threshold, links are not declared down due to missed
   Hellos.  This implies that even though there is control-plane
   congestion due to many retransmissions, the data plane stays up
   and no new LSAs are generated (besides the ones in the original
   storm and the refreshes).  These observations are the basis of the
   first three proposals in Section 2.

   One might argue that the scalability issue of large networks
   should be solved solely by dividing the network hierarchically
   into multiple areas so that flooding of LSAs remains localized
   within areas.  However, this approach increases network management
   and design complexity and may result in less optimal routing
   between areas.  Also, ASE LSAs are flooded throughout the AS,
   which may be a problem if there are large numbers of them.
   Furthermore, a large number of summary LSAs may need to be flooded
   across areas, and their numbers would increase significantly if
   multiple Area Border Routers are employed for reliability.  Thus
   it is important to allow the network to grow to as large a size as
   possible within a single area.

   Our proposal here is synergistic with a broader set of scalability
   and stability improvement proposals.  [Ref8] and [Ref9] propose
   flooding overhead reductions for the case where more than one
   interface goes to the same neighbor.  [Ref10] proposes a mechanism
   for greatly reducing LSA refreshes in stable topologies.  [Ref11]
   proposes a wide range of congestion control and failure recovery
   mechanisms.

Appendix B. Simulation Study

   The main motivation of this study is to show the network
   congestion and instability caused by large LSA storms, and the
   improvement in stability and scalability that can be achieved by
   following the proposals in this memo.

Appendix B.1. The Network Under Simulation

   We generate a random network over a rectangular grid using a
   modified version of Waxman's algorithm [Ref12] that ensures that
   the network is connected and has a pre-specified number of nodes,
   number of links, maximum number of neighbors per node, and maximum
   number of adjacencies per node.  The rectangular grid resembles
   the continental U.S.A., with a maximum one-way propagation delay
   of 30 ms in the East-West direction and 15 ms in the North-South
   direction.  We consider two different network sizes, as explained
   in Section B.2.
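   For readers unfamiliar with [Ref12], the Python sketch below
   generates a classic Waxman random graph.  The degree cap and the
   repair pass for isolated nodes are assumptions standing in for the
   unspecified modifications mentioned above, not the exact algorithm
   used in this study.

      import math
      import random

      def waxman_graph(n, target_links, alpha=0.4, beta=0.4,
                       max_degree=30):
          """Place n nodes uniformly at random on a 2:1 rectangle and
          add edge (u, v) with probability
          beta * exp(-d(u, v) / (alpha * d_max)) until target_links
          edges exist.  A final pass attaches isolated nodes (full
          connectivity would additionally need a component merge)."""
          pts = [(2.0 * random.random(), random.random())
                 for _ in range(n)]
          d = lambda u, v: math.dist(pts[u], pts[v])
          d_max = max(d(u, v) for u in range(n)
                      for v in range(u + 1, n))
          pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
          random.shuffle(pairs)
          edges, deg = set(), [0] * n
          for u, v in pairs:
              if len(edges) >= target_links:
                  break
              p = beta * math.exp(-d(u, v) / (alpha * d_max))
              if (deg[u] < max_degree and deg[v] < max_degree
                      and random.random() < p):
                  edges.add((u, v))
                  deg[u] += 1
                  deg[v] += 1
          for u in range(n):        # repair pass for isolated nodes
              if deg[u] == 0:
                  v = min((w for w in range(n) if w != u),
                          key=lambda w: d(u, w))
                  edges.add((min(u, v), max(u, v)))
                  deg[u] += 1
                  deg[v] += 1
          return pts, edges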
   The network has a flat, single-area topology.

   Each node is a router and each link is a point-to-point link
   connecting two routers.

   We assume that node CPU and memory (not the link bandwidth) is the
   main bottleneck in the LSA flooding process.  This will typically
   be true for high-speed links (e.g., OC3 or above) and/or links
   where OSPF traffic gets an adequate Quality of Service (QoS)
   compared to other traffic.

   Different Timers:

      LSA refresh interval = 1800 seconds.

      Hello refresh interval = 10 seconds.

      Router-Dead interval = 40 seconds.

      LSA retransmission interval: two values are considered, 10
      seconds and 5 seconds (note that a retransmission is disabled
      on the receipt of either an explicit acknowledgment or a
      duplicate LSA over the same interface, which acts as an
      implicit acknowledgment).

      Minimum time between successive generations of the same LSA =
      5 seconds.

      Minimum time between successive Dijkstra SPF calculations = 1
      second.

   Packing of LSAs: It is assumed that, for any given node, the LSAs
   generated over a 1-second period are packed together to form an
   LSU, but no more than 3 LSAs are packed into one LSU.

   LSU/Ack/Hello Processing Times: All processing times are expressed
   in terms of the parameter T.  Two values of T are considered, 1 ms
   and 0.5 ms.

   In the case of a dedicated processor for processing OSPF packets,
   the processing time reported represents the true processing time.
   If the processor does other work and only a fraction of its
   capacity can be dedicated to OSPF processing, then the processing
   time has to be inflated appropriately to get the effective
   processing time; in that case it is assumed that the inflation
   factor is already taken into account in the reported processing
   time.

   The fixed time to send or receive any LSU, Ack, or Hello packet is
   T.  In addition, a variable processing time is used for LSUs and
   Acks, depending on the number and types of LSAs packed.  No
   variable processing time is used for Hellos.  The variable
   processing time per Router LSA is (0.5 + 0.17L)T, where L is the
   number of adjacencies advertised by the Router LSA.  For other LSA
   types (e.g., an ASE LSA or a "Link" LSA carrying traffic
   engineering information about a link), the variable processing
   time per LSA is 0.5T.

   The variable processing time for an Ack is 25% of that of the
   corresponding LSA.

   It is to be noted that if multiple LSAs are packed into a single
   LSU packet, the fixed processing time is incurred only once, but
   the variable processing time is incurred for every component of
   the packet.

   The processing time values we use are roughly in the same range as
   those observed in an operational network.

   LSU/Ack/Hello Priority: Two non-preemptive priority levels and
   three priority scenarios are considered.  Within each priority
   level, processing is FIFO, with newly arriving lower-priority
   packets being dropped when the lower-priority queue is full.
   Higher-priority packets are never dropped.

      In Priority scenario 1, all LSUs/Acks/Hellos received at a node
      are queued at the lower priority.

      In Priority scenario 2, Hellos received at a node are queued at
      the higher priority, but LSUs/Acks are queued at the lower
      priority.

      In Priority scenario 3, Hellos and Acks received at a node are
      queued at the higher priority, but LSUs are queued at the lower
      priority.

   All packets generated internally at a node (usually triggered by a
   timer) are processed at the higher priority.  This includes the
   initial LSA storm, LSA refreshes, Hello refreshes, LSA
   retransmissions, and new LSA generation after the detection of a
   failure or recovery.
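   A minimal sketch of this two-level, non-preemptive discipline in
   Python, assuming hypothetical packet objects; the low-priority
   buffer size of 2000 packets is the one given below.

      from collections import deque

      def is_high_priority(ospf_type, scenario):
          """OSPF packet type codes: 1 = Hello, 2 = Database
          Description, 3 = LS Request, 4 = LS Update, 5 = LS Ack."""
          if scenario == 2:
              return ospf_type == 1          # Hellos only
          if scenario == 3:
              return ospf_type in (1, 5)     # Hellos and Acks
          return False                       # scenario 1: nothing

      class TwoLevelScheduler:
          def __init__(self, low_buffer=2000):
              self.high = deque()
              self.low = deque()
              self.low_buffer = low_buffer
              self.dropped = 0

          def enqueue(self, packet, ospf_type, scenario):
              if is_high_priority(ospf_type, scenario):
                  self.high.append(packet)   # never dropped
              elif len(self.low) < self.low_buffer:
                  self.low.append(packet)
              else:
                  self.dropped += 1          # tail-drop at low priority

          def next_packet(self):
              """Consulted only between packet service times, so the
              discipline is non-preemptive."""
              if self.high:
                  return self.high.popleft()
              return self.low.popleft() if self.low else None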
   Buffer Size for Incoming LSUs/Acks/Hellos (lower priority): The
   buffer size is assumed to be 2000 packets, where a packet is
   either an Ack, an LSU, or a Hello.

   LSA Refresh: Each LSA is refreshed once every 1800 seconds, and
   the refresh instants of the various LSAs in the LSDB are assumed
   to be uniformly distributed over the 1800-second period, i.e.,
   completely unsynchronized.  If, however, an LSA is generated as
   part of the initial LSA storm, it goes on a new refresh schedule
   of once every 1800 seconds starting from its generation time.

   LSA Storm Generation: As defined earlier, an "LSA storm" is the
   simultaneous or near-simultaneous generation of a large number of
   LSAs.  In the case of only Router and ASE LSAs, we normally assume
   that the number of ASE LSAs in the storm is about 4 times that of
   the Router LSAs, but the ratio is allowed to change if either the
   Router or the ASE LSAs have reached their maximum possible value.
   In the case of only Router and Link LSAs (carrying traffic
   engineering information), we normally assume that the number of
   Link LSAs in the storm is about 4 times that of the Router LSAs,
   but the ratio is allowed to change if either the Router or the
   Link LSAs have reached their maximum possible value.  For any
   given LSA storm, we generate LSAs starting from node index 1 and
   moving upwards, stopping when the correct number of LSAs of each
   type has been generated.  The LSA generation at any given node is
   assumed to start at an instant uniformly distributed between 20
   and 30 seconds from the start of the simulation.  Successive LSA
   generations at a node are assumed to be spaced 400 ms apart.  It
   is to be noted that during the period of observation there are
   other LSAs generated besides the ones in the storm.  These include
   refreshes of LSAs that are not part of the storm, and LSAs
   generated due to possible link failures and subsequent possible
   link recoveries.

   Failure/Recovery of Links: If no Hello is received over a link
   (due to CPU/memory congestion) for longer than the Router-Dead
   interval, the link is declared down.  At a later time, if Hellos
   are again received, the link is declared up.  Whenever a link is
   declared up or down, one Router LSA is generated by each of the
   routers on the two sides of the point-to-point link.  If "Link
   LSAs" carrying traffic engineering information are used, it is
   assumed that each router would also generate a Link LSA.  In this
   case it is also assumed that, due to rerouting of LSPs, three
   other links in the network (selected randomly in the simulation)
   would see a significant change in reserved bandwidth, resulting in
   one Link LSA being generated by the routers on the two ends of
   each such link.

Appendix B.2. Simulation Results

   In this section we study the relative performance of the three
   priority scenarios defined earlier (no priority to Hellos or Acks,
   priority to Hellos only, and priority to both Hellos and Acks)
   over a range of network sizes, LSA retransmission timer values,
   LSA types, processing time values, and Hello/Router-Dead-Interval
   values:

   Network size: Two networks are considered.  Network 1 has 100
   nodes and 1200 links; the maximum number of neighbors per node is
   30 and the maximum number of adjacencies per node is 50 (the same
   neighbor may have more than one adjacency).
   Network 2 has 50 nodes and 600 links; the maximum number of
   neighbors per node is 25 and the maximum number of adjacencies per
   node is 48.  The Dijkstra SPF calculation time is assumed to be
   100 ms for Network 1 and 70 ms for Network 2.

   LSA Type: Each node has 1 Router LSA (a total of 100 for Network 1
   and 50 for Network 2).  There are no Network LSAs, since all links
   are point-to-point links, and no Summary LSAs, since the network
   has only one area.  Regarding other LSA types, we consider two
   situations.  In Situation 1 we assume that there are no ASE LSAs
   and that each link has one "Link" LSA carrying traffic engineering
   information (a total of 2400 for Network 1 and 1200 for Network
   2).  In Situation 2 we assume that there are no "Link" LSAs, that
   half of the nodes are AS-Border nodes, and that each border node
   has 10 ASE LSAs (a total of 500 for Network 1 and 250 for Network
   2).  We identify Situation 1 as "Link LSAs" and Situation 2 as
   "ASE LSAs".

   LSA retransmission timer value: Two values are considered, 10
   seconds and 5 seconds (the default value).

   Processing time values: Processing times for LSU, Ack, and Hello
   packets have been expressed above in terms of a common parameter
   T.  Two values are considered for T: 1 ms and 0.5 ms.

   Hello/Router-Dead-Interval: It is assumed that the Router-Dead
   interval is four times the Hello interval.  In one case the Hello
   interval is 10 seconds and the Router-Dead interval 40 seconds
   (the default values); in the other case the Hello interval is 2
   seconds and the Router-Dead interval 8 seconds.

   Based on the network size, LSA type, timer, and processing time
   values, we define six test cases as follows:

   Case 1: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 2: Network 1, ASE LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 3: Network 1, Link LSAs, retransmission timer = 5 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 4: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 0.5 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 5: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 2/8 sec.

   Case 6: Network 2, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   For each case and for each priority scenario, we study the network
   stability as a function of the size of the LSA storm.  Stability
   is determined by looking at the number of non-converged LSUs as a
   function of time.  An example is shown in Table 1 for Case 1 and
   Priority scenario 1 (no priority to Hellos or Acks).
   ==========|====================================================
             |     Number of Non-Converged LSUs in the Network
   LSA       |                 at Time (in seconds)
   STORM     |====|====|====|====|====|====|====|====|=====
   SIZE      | 10 | 20 | 30 | 35 | 40 | 50 | 60 | 80 | 100
   ==========|====|====|====|====|====|====|====|====|=====
   100       |  0 |  0 | 24 | 29 | 24 |  1 |  0 |  1 |   1
   (Stable)  |    |    |    |    |    |    |    |    |
   ----------|----|----|----|----|----|----|----|----|-----
   140       |  0 |  0 | 35 | 48 | 46 | 27 | 14 |  1 |   1
   (Stable)  |    |    |    |    |    |    |    |    |
   ----------|----|----|----|----|----|----|----|----|-----
   160       |  0 |  0 | 38 | 57 | 55 | 40 | 26 | 65 | 203
   (Unstable)|    |    |    |    |    |    |    |    |
   ==========|====|====|====|====|====|====|====|====|=====

            Table 1: Network Stability vs. LSA Storm Size
                (Case 1, no priority to Hellos/Acks)

   The LSA storm starts a little after 20 seconds, so for some period
   after that the number of non-converged LSUs should stay high and
   then, for a stable network, come back down.  This happens for LSA
   storms of sizes 100 and 140.  With an LSA storm of size 160, the
   number of non-converged LSUs stays high indefinitely, due to
   repeated retransmissions, link failures caused by Hellos being
   missed for more than the Router-Dead interval (which generates
   additional LSAs), and subsequent link recoveries (which again
   generate additional LSAs).  We define the network stability
   threshold as the maximum allowable LSA storm size for which the
   number of non-converged LSUs comes back down to a low level after
   some time.  It turns out that for this example the stability
   threshold is 150.

   The network behavior as a function of the LSA storm size can be
   categorized as follows:

   (1) If the LSA storm is well below the stability threshold, the
       CPU/memory congestion lasts only for a short period; during
       this period there are very few retransmissions, very few
       dropped OSPF packets, and no link failures due to missed
       Hellos.  This type of LSA storm is observed routinely in
       operational networks, and networks recover from it easily.

   (2) If the LSA storm is just below the stability threshold, the
       CPU/memory congestion lasts for a longer period, and during
       this period there may be a considerable number of
       retransmissions and dropped OSPF packets.  If Hello packets
       are not given priority, there may also be some link failures
       due to missed Hellos.  However, the network does eventually
       return to a stable state.  This type of LSA storm may happen
       rarely in operational networks, and networks recover from it
       with some difficulty.

   (3) If the LSA storm is above the stability threshold, the
       CPU/memory congestion may last indefinitely unless some
       special procedure for relieving congestion is followed.
       During this period there are a considerable number of
       retransmissions and dropped OSPF packets.  If Hello packets
       are not given priority, there would also be link failures due
       to missed Hellos.  This type of LSA storm may happen very
       rarely in operational networks, and usually some manual
       procedure, such as taking down adjacencies at heavily
       congested nodes, is needed.

   (4) If Hello packets are given priority, the network stability
       threshold increases, i.e., the network can withstand a larger
       LSA storm.
       Furthermore, even if the network operates at or somewhat above
       this higher stability threshold, Hellos are still not missed,
       and so there are no link failures.  So even if there is
       congestion in the control plane due to increased
       retransmissions, requiring some special procedures for
       congestion reduction, the data plane remains unaffected.

   (5) If both Hello and Acknowledgement packets are given priority,
       the stability threshold increases even further.

   In Table 2 we show the network stability threshold for the six
   different cases and for the three priority scenarios defined
   earlier.

   |========|================|==================|===================|
   |        |          Maximum Allowable LSA Storm Size For         |
   |  Case  |================|==================|===================|
   | Number | No Priority to | Priority to      | Priority to Hello |
   |        | Hello or Ack   | Hello Only       | and Ack           |
   |========|================|==================|===================|
   | Case 1 |      150       |       190        |        250        |
   |--------|----------------|------------------|-------------------|
   | Case 2 |      185       |       215        |        285        |
   |--------|----------------|------------------|-------------------|
   | Case 3 |      115       |       127        |        170        |
   |--------|----------------|------------------|-------------------|
   | Case 4 |      320       |       375        |        580        |
   |--------|----------------|------------------|-------------------|
   | Case 5 |      120       |       175        |        225        |
   |--------|----------------|------------------|-------------------|
   | Case 6 |      185       |       224        |        285        |
   |========|================|==================|===================|

       Table 2: Maximum Allowable LSA Storm for a Stable Network

   We also considered one more scenario, with priority to Hello and
   Ack packets and with a truncated binary exponential backoff of the
   retransmission interval with an upper limit of 40 seconds (for the
   same LSA, each successive retransmission interval is doubled, but
   not beyond 40 seconds).  The maximum allowable LSA storm size for
   this scenario significantly exceeded the numbers given in the
   third column (priority to Hello and Ack).

Appendix B.3. Observations on Simulation Results

   Table 2 shows that in all cases prioritizing Hello packets
   increases the network stability threshold, and that, in addition,
   prioritizing LSA Acknowledgment packets increases the stability
   threshold even further.  The reasons for these observations are as
   follows.  The main sources of sustained CPU/memory congestion (the
   positive feedback loop) following an LSA storm are (1) LSA
   retransmissions and (2) links being declared down due to missed
   Hellos, which in turn causes further LSA generation, with the
   later recovery of those links causing even more LSA generation.
   Prioritizing Hello packets practically eliminates the second
   source of congestion.  Prioritizing Acknowledgements significantly
   reduces the first source of congestion, i.e., LSA retransmissions.
   It is to be noted that retransmissions cannot be completely
   eliminated, for the following reasons.  Firstly, only explicit
   Acknowledgments are prioritized; duplicate LSAs carrying implicit
   Acknowledgments are still served at the lower priority.  Secondly,
   LSAs may get greatly delayed or dropped at the input queue of
   receivers, and then Acknowledgments may never be generated, in
   which case prioritizing Acks would not help.
   Another factor to keep in mind is that, since Hellos and Acks are
   prioritized, LSAs see larger delays and a greater potential for
   being dropped.  However, the simulation results show that, on the
   whole, prioritizing Hellos and LSA Acks is always beneficial and
   significantly improves the network stability threshold.

   As stated in Section B.2, exponential backoff of the LSA
   retransmission interval further increases the network stability
   threshold.

   Our simulation study also showed that in each of the cases, if,
   instead of prioritizing Hello packets, we treat any packet
   received over a link as a surrogate for a Hello packet (an
   implicit Hello), we get about the same stability threshold as with
   prioritizing Hello packets.

Appendix C. Other Proposals

   (1) Explicit Marking: In Section 2 we proposed that OSPF packets
       be classified into "high" and "low" priority classes by
       examining the OSPF packet header.  In some cases (particularly
       at the receiver) this examination may be computationally
       costly.  An alternative would be the use of different TOS
       (DSCP) markings for high- and low-priority OSPF packets
       respectively.  The exact specification of this marking is for
       further study.  (A sketch of this idea is given after this
       list.)

   (2) Other High-Priority OSPF Packets: Besides the packets
       designated as high priority in Section 2, there may be other
       packets that need a high-priority designation.  One example is
       the Database Description (DBD) packet from a slave (during the
       database synchronization process) that is used as an
       acknowledgement.  A second example is an LSA carrying
       intra-area topology change information (this may trigger an
       SPF calculation and rerouting of Label Switched Paths, so fast
       processing of this packet may improve OSPF/LDP convergence
       times).
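   As a non-normative illustration of the explicit marking idea in
   item (1), the Python sketch below sets the TOS/DSCP byte on a raw
   OSPF socket.  The CS6/CS7 code points are example values only,
   since the exact marking is left for further study.

      import socket

      OSPF_PROTO = 89  # OSPF runs directly over IP, protocol 89

      # Example (non-normative) code points; DSCP occupies the upper
      # six bits of the TOS byte.  CS6 (DSCP 48) is commonly used for
      # network control; CS7 (DSCP 56) is shown here merely to
      # distinguish the "high priority" OSPF class.
      TOS_LOW = 48 << 2    # CS6
      TOS_HIGH = 56 << 2   # CS7, hypothetical "high priority" marking

      def open_ospf_socket(high_priority):
          """Open a raw OSPF socket (requires root privileges) whose
          outgoing packets carry the chosen TOS/DSCP marking, so that
          receivers can queue Hellos/Acks ahead of other OSPF packets
          without parsing the OSPF header."""
          s = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                            OSPF_PROTO)
          tos = TOS_HIGH if high_priority else TOS_LOW
          s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
          return s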