Internet Engineering Task Force                     Gagan L. Choudhury
Internet Draft                                    Vera D. Sapozhnikova
Expires in November, 2003                                Gerald R. Ash
Category: Best Current Practice                                   AT&T
draft-ietf-ospf-scalability-05.txt
                                                      Anurag S. Maunder
                                                      Erlang Technology

                                                        Vishwas Manral
                                                          Motorola Inc.

                                                              May, 2003

             Prioritized Treatment of Specific OSPF Packets
                        and Congestion Avoidance

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Distribution of this memo is unlimited.

IPR Notices

(A) The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights.  Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11.  Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.

(B) The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard.  Please address the information to the IETF Executive
Director.

Copyright Notice

(C) Copyright (C) The Internet Society (date).  All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.  However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Abstract

This document recommends methods that are intended to improve the
scalability and stability of large networks using the OSPF (Open
Shortest Path First) protocol.
The methods include processing OSPF Hellos and LSA (Link State
Advertisement) Acknowledgments at a higher priority than other OSPF
packets, as well as other congestion avoidance procedures.
Simulation results in support of some of the recommendations are
given in the appendix sections.

Table of Contents

1. Introduction
2. Recommendations
3. Security Considerations
4. Acknowledgments
5. Normative References
6. Informative References
7. Authors' Addresses
Appendix A. LSA Storm: Causes and Impact
Appendix B. Simulation Study
Appendix B.1. The Network Under Simulation
Appendix B.2. Simulation Results
Appendix B.3. Observations on Simulation Results
Appendix C. Other Recommendations

1. Introduction

A large network running the OSPF [Ref1] or OSPF-TE [Ref2] protocol
may occasionally experience the simultaneous or near-simultaneous
update of a large number of link-state advertisements (LSAs).  We
call this event an LSA storm; it may be initiated by an unscheduled
failure or a scheduled maintenance event.  The failure may be
hardware, software, or procedural in nature.

An LSA storm causes high CPU and memory utilization at the router,
causing incoming packets to be delayed or dropped.  Delayed
acknowledgments (beyond the retransmission timer value) result in
retransmissions, and delayed Hello packets (beyond the router-dead
interval) result in neighbor adjacencies being declared down.  The
retransmissions and additional LSA originations result in further CPU
and memory usage, essentially causing a positive feedback loop,
which, in the extreme case, may drive the network to an unstable
state.

The default value of the retransmission timer is 5 seconds, and that
of the router-dead interval is 40 seconds.  However, there has
recently been considerable interest in significantly reducing OSPF
convergence time.  As part of that effort, much shorter (subsecond)
Hello and router-dead intervals have been proposed [Ref3].  In such a
scenario, Hello packets are more likely to be delayed beyond the
router-dead interval during network congestion caused by an LSA
storm.

Appendix A explains LSA storm scenarios and their impact in more
detail, and points out a few real-life examples of control-message
storms.  Appendix B presents a simulation study of this phenomenon.

In order to improve the scalability and stability of networks, we
recommend steps for prioritizing critical OSPF packets and avoiding
congestion.  The details of the recommendations are given in
Section 2.  We also present a simulation study of a subset of the
recommendations in Appendix B and show that they indeed improve the
scalability and stability of networks using the OSPF protocol.

Appendix C provides some further recommendations with similar goals.

2. Recommendations

The recommendations below are intended to improve the scalability and
stability of large networks using the OSPF protocol.  During periods
of network congestion they reduce retransmissions, avoid adjacencies
being declared down because Hello packets are delayed beyond the
RouterDeadInterval, and take other congestion avoidance steps.  The
recommendations are unordered, except that Recommendation 2 is to be
implemented only if Recommendation 1 is not implemented.

(1) Classify all OSPF packets into two classes: a "high priority"
    class comprising OSPF Hello packets and Link State Acknowledgment
    packets, and a "low priority" class comprising all other packets.
    The classification is accomplished by examining the OSPF packet
    header.  While receiving a packet from a neighbor and while
    transmitting a packet to a neighbor, try to process a "high
    priority" packet ahead of a "low priority" packet.

(2) If Recommendation 1 cannot be implemented, then reset the
    inactivity timer for an adjacency whenever any OSPF unicast
    packet, or any OSPF packet sent to 224.0.0.5 over a
    point-to-point link, is received over that adjacency, instead of
    resetting the inactivity timer only on receipt of a Hello packet.
    OSPF would then declare the adjacency to be down only if no OSPF
    unicast packets and no OSPF packets sent to 224.0.0.5 over a
    point-to-point link are received over that adjacency for a period
    equaling or exceeding the RouterDeadInterval.  The reason for not
    recommending this proposal in conjunction with Recommendation 1
    is to avoid potential undesirable side effects.  One such effect
    is the delay in discovering the down status of an adjacency in a
    case where no high priority Hello packets are being received but
    the inactivity timer is being reset by other stale packets in the
    low priority queue.

(3) Use an exponential backoff algorithm for determining the value of
    the LSA retransmission interval (RxmtInterval).  Let R(i)
    represent the RxmtInterval value used during the i-th
    retransmission of an LSA.  Use the following algorithm to compute
    R(i):

       R(1)   = Rmin
       R(i+1) = Min(K*R(i), Rmax)   for i >= 1

    where K, Rmin and Rmax are constants and the function Min(.,.)
    represents the minimum value of its two arguments.  Example
    values for K, Rmin and Rmax may be 2, 5 seconds and 40 seconds,
    respectively.  Note that the example value for Rmin, the initial
    retransmission interval, is the same as the sample value of
    RxmtInterval in [Ref1].

    This recommendation is motivated by the observation that during a
    network congestion event caused by control messages, a major
    source for sustaining the congestion is the repeated
    retransmission of LSAs.  The use of an exponential backoff
    algorithm for the LSA retransmission interval reduces the rate of
    LSA retransmissions while the network experiences congestion
    (during which multiple retransmissions of the same LSA are more
    likely).  This in turn helps the network get out of the congested
    state.
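
    For illustration only, here is a minimal sketch of the backoff
    computation in Recommendation 3, written in Python and using the
    example constants above (the function name and defaults are
    hypothetical, not part of this document):

       def rxmt_interval(i, rmin=5.0, rmax=40.0, k=2.0):
           """Return R(i), the retransmission interval in seconds
           used for the i-th retransmission of an LSA."""
           assert i >= 1
           # R(1) = Rmin; each later interval is K times the previous
           # one, capped at Rmax.
           return min(rmin * (k ** (i - 1)), rmax)

       # With the example constants, successive intervals are
       # 5, 10, 20, 40, 40, ... seconds.
       intervals = [rxmt_interval(i) for i in range(1, 6)]
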

(4) Implicit Congestion Detection and Action Based on That: If there
    is control message congestion at a router, its neighbors do not
    know about it explicitly.  However, they can detect it implicitly
    from the number of unacknowledged LSAs to this router.  If this
    number exceeds a certain "high water mark", then the rate at
    which LSAs are sent to this router should be reduced.  At a
    future time, if the number of unacknowledged LSAs to this router
    falls below a certain "low water mark", then the normal rate of
    sending LSAs to this router should be resumed.  An example value
    for the "high water mark" may be 20 unacknowledged LSAs and that
    for the "low water mark" may be 10 unacknowledged LSAs.  An
    example value for the rate on exceeding the "high water mark" may
    be 50% of the normal rate.  This recommendation is to be
    implemented only for unicast LSAs sent to a neighbor or LSAs sent
    to 224.0.0.5 over a point-to-point link.  (An illustrative sketch
    of this mechanism is given below, after Recommendation 5.)

    Recommendations 3 and 4 both slow down LSAs to congested
    neighbors based on implicitly detecting the congestion, but they
    have important differences.  Recommendation 3 progressively slows
    down successive retransmissions of the same LSA, whereas
    Recommendation 4 slows down all LSAs (new or retransmitted) to a
    congested neighbor.

(5) Throttling Adjacencies to be Brought Up Simultaneously: If a
    router tries to bring up a large number of adjacencies to its
    neighbors simultaneously, then that may cause severe congestion
    due to database synchronization and LSA flooding activities.  It
    is recommended that in such a situation no more than "n"
    adjacencies be brought up simultaneously.  Once a subset of
    adjacencies has been brought up successfully, newer adjacencies
    may be brought up as long as the number of adjacencies being
    brought up simultaneously does not exceed "n".  The appropriate
    value of "n" depends on the router processing power, link
    bandwidth and propagation delay.  An example value may be 4.

    In the presence of throttling, an important issue is the order in
    which adjacencies are to be formed.  We recommend a First Come
    First Served (FCFS) policy based on the order in which the
    requests for adjacency formation arrive.  Requests may be either
    from neighbors or self-generated.  Among the self-generated
    requests, a priority list may be used to decide the order in
    which the requests are to be made.  However, once an adjacency
    formation process starts, it is not to be preempted except under
    unusual circumstances such as errors or time-outs.
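
For illustration only, the following minimal Python sketch shows one
way to track the per-neighbor congestion state of Recommendation 4.
The class and method names are hypothetical, and the defaults are the
example values quoted above, not requirements of this document:

   class NeighborFloodState:
       """Per-neighbor state for the high/low water mark scheme."""

       def __init__(self, high_water=20, low_water=10,
                    reduced_rate_factor=0.5):
           self.high_water = high_water
           self.low_water = low_water
           self.reduced_rate_factor = reduced_rate_factor
           self.unacked = 0        # LSAs sent but not yet acknowledged
           self.congested = False

       def on_lsa_sent(self):
           self.unacked += 1
           if self.unacked > self.high_water:
               self.congested = True    # slow down LSAs to this neighbor

       def on_lsa_acked(self):
           self.unacked = max(0, self.unacked - 1)
           if self.unacked < self.low_water:
               self.congested = False   # resume the normal rate

       def allowed_rate(self, normal_rate):
           # e.g., LSAs per second permitted toward this neighbor
           if self.congested:
               return normal_rate * self.reduced_rate_factor
           return normal_rate
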

3. Security Considerations

This memo does not create any new security issues for the OSPF
protocol.  Security considerations for the base OSPF protocol are
covered in [Ref1].

4. Acknowledgments

We would like to acknowledge the support of the OSPF WG chairs Rohit
Dube, Acee Lindem, and John Moy.  We acknowledge Mitchell Erblich,
Mike Fox, Tony Przygienda, and Krishna Rao for comments on previous
versions of the draft.  We also acknowledge Margaret Chiosi, Elie
Francis, Jeff Han, Beth Munson, Roshan Rao, Moshe Segal, Mike
Wardlow, and Pat Wirth for collaboration and encouragement in our
scalability improvement efforts for link-state-protocol based
networks.

5. Normative References

[Ref1]  J. Moy, "OSPF Version 2", RFC 2328, April 1998.

6. Informative References

[Ref2]  D. Katz, D. Yeung, K. Kompella, "Traffic Engineering
        Extension to OSPF Version 2", Work in Progress.

[Ref3]  C. Alaettinoglu, V. Jacobson and H. Yu, "Towards Millisecond
        IGP Convergence", Work in Progress.

[Ref4]  Pappalardo, D., "AT&T, customers grapple with ATM net
        outage", Network World, February 26, 2001.

[Ref5]  "AT&T announces cause of frame-relay network outage", AT&T
        Press Release, April 22, 1998.

[Ref6]  Cholewka, K., "MCI Outage Has Domino Effect", Inter@ctive
        Week, August 20, 1999.

[Ref7]  Jander, M., "In Qwest Outage, ATM Takes Some Heat", Light
        Reading, April 6, 2001.

[Ref8]  A. Zinin and M. Shand, "Flooding Optimizations in Link-State
        Routing Protocols", Work in Progress.

[Ref9]  P. Pillay-Esnault, "OSPF Refresh and Flooding Reduction in
        Stable Topologies", Work in Progress.

[Ref10] G. Ash, G. Choudhury, V. Sapozhnikova, M. Sherif, A. Maunder,
        V. Manral, "Congestion Avoidance & Control for OSPF
        Networks", Work in Progress.

[Ref11] B. M. Waxman, "Routing of Multipoint Connections", IEEE
        Journal on Selected Areas in Communications, 6(9):1617-1622,
        1988.

7. Authors' Addresses

Gagan L. Choudhury
AT&T
Room D5-3C21
200 Laurel Avenue
Middletown, NJ 07748
USA
Phone: (732) 420-3721
Email: gchoudhury@att.com

Vera D. Sapozhnikova
AT&T
Room C5-2C29
200 Laurel Avenue
Middletown, NJ 07748
USA
Phone: (732) 420-2653
Email: sapozhnikova@att.com

Gerald R. Ash
AT&T
Room D5-2A01
200 Laurel Avenue
Middletown, NJ 07748
USA
Phone: (732) 420-4578
Email: gash@att.com

Anurag S. Maunder
Erlang Technology
2880 Scott Boulevard
Santa Clara, CA 95052
Phone: (408) 420-7617
Email: anuragm@erlangtech.com

Vishwas Manral
Motorola Inc.
189, Prashasan Nagar
Road Number 72
Jubilee Hills, Hyderabad
India
Email: vishwas@motorola.com

Appendix A. LSA Storm: Causes and Impact

An LSA storm may be initiated for many reasons.  Here are some
examples:

(a) one or more link failures due to fiber cuts,

(b) one or more router failures for some reason, e.g., a software
    crash or some type of disaster (including a power outage) in an
    office complex hosting many routers,

(c) link/router flapping,

(d) the need to take down and later bring back many routers during a
    software/hardware upgrade,

(e) near-synchronization of the periodic 1800-second refreshes of a
    subset of LSAs,

(f) refresh of all LSAs in the system during a change in software
    version,

(g) injection of a large number of external routes into OSPF due to a
    procedural error,

(h) Router ID changes causing a large number of LSA re-originations
    (possibly LSA purges as well, depending on the implementation).

In addition to the LSAs originated as a direct result of link/router
failures, there may be other, indirect LSAs as well.  One example in
MPLS networks is traffic engineering LSAs originated at other links
as a result of a significant change in reserved bandwidth resulting
from the rerouting of Label Switched Paths (LSPs) that went down
during the link/router failure.

The LSA storm causes high CPU and memory utilization at the router
processor, causing incoming packets to be delayed or dropped.
Delayed acknowledgments (beyond the retransmission timer value)
result in retransmissions, and delayed Hello packets (beyond the
Router-Dead interval) result in links being declared down.  A
trunk-down event causes Router LSA origination by its end-point
routers.
If traffic engineering LSAs are used for each link, then LSAs of that
type are also originated by the end-point routers, and potentially
elsewhere as well, due to significant changes in reserved bandwidth
at other links caused by the failure and the rerouting of LSPs
originally using the failed trunk.  Eventually, when the link
recovers, additional Router LSAs and traffic engineering LSAs are
triggered as well.

The retransmissions and additional LSA originations result in further
CPU and memory usage, essentially causing a positive feedback loop.
We define the LSA storm size as the number of LSAs in the original
storm, not counting any additional LSAs resulting from the feedback
loop described above.  If the LSA storm is too large, then the
positive feedback loop may be strong enough to sustain high CPU and
memory utilization at many routers in the network indefinitely,
thereby driving the network to an unstable state.  In the past,
network outage events have been reported in IP and ATM networks using
link-state protocols such as OSPF, IS-IS, PNNI or some proprietary
variants; see, for example, [Ref4]-[Ref7].  In many of these
examples, large-scale flooding of LSAs or other similar control
messages (either naturally or triggered by some bug or inappropriate
procedure) has been partly or fully responsible for network
instability and outage.

In Appendix B, we use a simulation model to show that there is a
certain LSA storm size threshold above which the network may show
unstable behavior caused by a large number of retransmissions, link
failures due to missed Hello packets, and subsequent link recoveries.
We also show that the LSA storm size causing instability may be
substantially increased by giving prioritized treatment to Hello and
LSA Acknowledgment packets and by using an exponential backoff
algorithm for determining the LSA retransmission interval.
Furthermore, if we prioritize Hello packets, then even when the
network operates somewhat above the stability threshold, links are
not declared down due to missed Hellos.  This implies that even
though there is control plane congestion due to many retransmissions,
the data plane stays up and no new LSAs are originated (besides the
ones in the original storm and the refreshes).  These observations
are the basis of the first three recommendations in Section 2.

One might argue that the scalability issue of large networks should
be solved solely by dividing the network hierarchically into multiple
areas so that flooding of LSAs remains localized within areas.
However, this approach increases network management and design
complexity and may result in less optimal routing between areas.
Also, ASE LSAs are flooded throughout the AS, which may be a problem
if there are a large number of them.  Furthermore, a large number of
Summary LSAs may need to be flooded across areas, and their number
would increase significantly if multiple Area Border Routers are
employed for reliability.  Thus it is important to allow the network
to grow toward as large a size as possible under a single area.

Our recommendations are synergistic with a broader set of scalability
and stability improvement proposals.  [Ref8] proposes flooding
overhead reduction in case more than one interface goes to the same
neighbor.
[Ref9] proposes a mechanism for greatly reducing LSA refreshes in
stable topologies.

[Ref10] proposes a wide range of congestion control and failure
recovery mechanisms (some of those ideas are covered in this draft,
but [Ref10] has other ideas not covered here).

Appendix B. Simulation Study

The main motivation of this study is to show the network congestion
and instability caused by large LSA storms and the improvement in
stability and scalability that can be achieved by following the
recommendations in this memo.

Appendix B.1. The Network Under Simulation

We generate a random network over a rectangular grid using a modified
version of Waxman's algorithm [Ref11] that ensures that the network
is connected and has a pre-specified number of routers, number of
links, maximum number of neighbors per router, and maximum number of
adjacencies per router.  The rectangular grid resembles the
continental U.S.A., with a maximum one-way propagation delay of 30 ms
in the East-West direction and 15 ms in the North-South direction.
We consider two different network sizes, as explained in Appendix
B.2.

The network has a flat, single-area topology.

Each link is a point-to-point link connecting two routers.

We assume that router CPU and memory (not link bandwidth) are the
main bottleneck in the LSA flooding process.  This will typically be
true for high-speed links (e.g., OC3 or above) and/or links where
OSPF traffic gets an adequate Quality of Service (QoS) compared to
other traffic.

Different Timers:
   LSA refresh interval = 1800 seconds,
   Hello refresh interval = 10 seconds,
   Router-Dead interval = 40 seconds,
   LSA retransmission interval: two values are considered, 10 seconds
      and 5 seconds (note that a retransmission is disabled on the
      receipt of either an explicit acknowledgment or a duplicate LSA
      over the same interface, which acts as an implicit
      acknowledgment),
   Minimum time between successive originations of the same LSA = 5
      seconds,
   Minimum time between successive Dijkstra SPF calculations = 1
      second.

Packing of LSAs: It is assumed that, for any given router, the LSAs
originated over a 1-second period are packed together to form an LSU,
but no more than 3 LSAs are packed into one LSU.

LSU/Ack/Hello Processing Times: All processing times are expressed in
terms of the parameter T.  Two values of T are considered, 1 ms and
0.5 ms.

In the case of a dedicated processor for processing OSPF packets, the
processing time reported represents the true processing time.  If the
processor does other work and only a fraction of its capacity can be
dedicated to OSPF processing, then the processing time has to be
inflated appropriately to get the effective processing time; in that
case it is assumed that the inflation factor is already taken into
account as part of the reported processing time.

The fixed time to send or receive any LSU, Ack or Hello packet is T.
In addition, a variable processing time is used for LSUs and Acks,
depending on the number and types of LSAs packed.  No variable
processing time is used for Hellos.  The variable processing time per
Router LSA is (0.5 + 0.17L)T, where L is the number of adjacencies
advertised by the Router LSA.  For other LSA types (e.g., an ASE LSA
or a "Link" LSA carrying traffic engineering information about a
link), the variable processing time per LSA is 0.5T.

The variable processing time for an Ack is 25% of that of the
corresponding LSA.

Note that if multiple LSAs are packed into a single LSU packet, the
fixed processing time is incurred only once, but the variable
processing time is incurred for every component of the packet.

The processing time values we use are roughly in the same range as
those observed in an operational network.
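
For illustration only, the following small Python sketch works out
the processing-time model above for one LSU and its Ack.  The
function names are hypothetical, and T is set to 1 ms, one of the two
values considered:

   T = 0.001  # seconds (1 ms)

   def lsu_processing_time(router_lsa_adjacencies, other_lsa_count):
       """Processing time of one LSU.  router_lsa_adjacencies has one
       entry per packed Router LSA, giving the number of adjacencies
       L that the LSA advertises; other_lsa_count is the number of
       packed ASE or "Link" LSAs."""
       fixed = T  # incurred once per LSU
       variable = sum((0.5 + 0.17 * L) * T
                      for L in router_lsa_adjacencies)
       variable += 0.5 * T * other_lsa_count
       return fixed + variable

   def ack_processing_time(lsa_variable_time):
       # fixed time T plus 25% of the variable time of the
       # acknowledged LSA
       return T + 0.25 * lsa_variable_time

   # Example: an LSU packing one Router LSA with L = 10 adjacencies
   # and two "Link" LSAs costs (1 + (0.5 + 1.7) + 2*0.5)T = 4.2T,
   # i.e., about 4.2 ms for T = 1 ms.
   print(lsu_processing_time([10], 2))
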

LSU/Ack/Hello Priority: Two non-preemptive priority levels and three
priority scenarios are considered.  Within each priority level,
processing is FIFO, with new packets of the lower priority being
dropped when the lower priority queue is full.  Higher priority
packets are never dropped.  In Priority Scenario 1, all
LSUs/Acks/Hellos received at a router are queued at the lower
priority.  In Priority Scenario 2, Hellos received at a router are
queued at the higher priority, but LSUs/Acks are queued at the lower
priority.  In Priority Scenario 3, Hellos and Acks received at a
router are queued at the higher priority, but LSUs are queued at the
lower priority.  All packets generated internally at a router
(usually triggered by a timer) are processed at the higher priority.
This includes the initial LSA storm, LSA refreshes, Hello refreshes,
LSA retransmissions and new LSA originations after detection of a
failure or recovery.

Buffer Size for Incoming LSUs/Acks/Hellos (lower priority): The
buffer size is assumed to be 2000 packets, where a packet is either
an Ack, an LSU, or a Hello.

LSA Refresh: Each LSA is refreshed once in 1800 seconds, and the
refresh instants of the various LSAs in the LSDB are assumed to be
uniformly distributed over the 1800-second period, i.e., they are
completely unsynchronized.  If, however, an LSA is originated as part
of the initial LSA storm, then it goes on a new refresh schedule of
once in 1800 seconds starting from its origination time.

LSA Storm Origination: As defined earlier, an "LSA storm" is the
simultaneous or near-simultaneous origination of a large number of
LSAs.  In the case of only Router and ASE LSAs, we normally assume
that the number of ASE LSAs in the storm is about 4 times that of the
Router LSAs, but the ratio is allowed to change if either the Router
or the ASE LSAs have reached their maximum possible value.  In the
case of only Router and Link LSAs (carrying traffic engineering
information), we normally assume that the number of Link LSAs in the
storm is about 4 times that of the Router LSAs, but the ratio is
allowed to change if either the Router or the Link LSAs have reached
their maximum possible value.  For any given LSA storm we keep
originating LSAs, starting from router index 1 and moving upwards,
and stop when the required number of LSAs of each type has been
originated.  The LSAs originated at any given router are assumed to
start at an instant uniformly distributed between 20 and 30 seconds
from the start of the simulation.  Successive LSA originations at a
router are assumed to be spaced 400 ms apart.  Note that during the
period of observation there are other LSAs originated besides the
ones in the storm.  These include refreshes of LSAs that are not part
of the storm and LSAs originated due to possible link failures and
subsequent link recoveries.
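
For illustration only, the two-level, non-preemptive priority
queueing described above (FIFO within each level, a finite buffer for
the lower priority, no drops at the higher priority) can be sketched
in Python as follows; the class is a hypothetical illustration of the
simulated discipline, not code from the simulator:

   from collections import deque

   class OspfInputQueue:
       def __init__(self, low_capacity=2000):
           self.high = deque()          # Hellos/Acks, per the scenario
           self.low = deque()           # everything else
           self.low_capacity = low_capacity
           self.dropped = 0

       def enqueue(self, packet, high_priority):
           if high_priority:
               self.high.append(packet)     # never dropped
           elif len(self.low) < self.low_capacity:
               self.low.append(packet)
           else:
               self.dropped += 1            # lower priority buffer full

       def next_packet(self):
           # Non-preemptive: a packet already being processed is
           # finished first; this only picks the next one to process.
           if self.high:
               return self.high.popleft()
           if self.low:
               return self.low.popleft()
           return None
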

Failure/Recovery of Links: If no Hello is received over a link (due
to CPU/memory congestion) for longer than the Router-Dead interval,
then the link is declared down.  At a later time, if Hellos are
received again, the link is declared up.  Whenever a link is declared
up or down, one Router LSA is originated by each router on the two
sides of the point-to-point link.  If "Link LSAs" carrying traffic
engineering information are used, then it is assumed that each such
router also originates a Link LSA.  In this case it is also assumed
that, due to rerouting of LSPs, three other links in the network
(selected randomly in the simulation) experience a significant change
in reserved bandwidth, which results in one Link LSA being originated
by the routers on the two ends of each such link.

Appendix B.2. Simulation Results

In this section we study the relative performance of the three
priority scenarios defined earlier (no priority to Hello or Ack,
priority to Hello only, and priority to both Hello and Ack) over a
range of network sizes, LSA retransmission timer values, LSA types,
processing time values and Hello/Router-Dead-Interval values:

Network size: Two networks are considered.  Network 1 has 100 routers
and 1200 links; the maximum number of neighbors per router is 30 and
the maximum number of adjacencies per router is 50 (the same neighbor
may have more than one adjacency).  Network 2 has 50 routers and 600
links; the maximum number of neighbors per router is 25 and the
maximum number of adjacencies per router is 48.  The Dijkstra SPF
calculation time is assumed to be 100 ms for Network 1 and 70 ms for
Network 2.

LSA Type: Each router has 1 Router LSA (a total of 100 for Network 1
and 50 for Network 2).  There are no Network LSAs since all links are
point-to-point links, and no Summary LSAs since the network has only
one area.  Regarding other LSA types, we consider two situations.  In
Situation 1 we assume that there are no ASE LSAs and that each link
has one "Link" LSA carrying traffic engineering information (a total
of 2400 for Network 1 and 1200 for Network 2).  In Situation 2 we
assume that there are no "Link" LSAs, that half of the routers are AS
border routers, and that each border router has 10 ASE LSAs (a total
of 500 for Network 1 and 250 for Network 2).  We identify Situation 1
as "Link LSAs" and Situation 2 as "ASE LSAs".

LSA retransmission timer value: Two values are considered, 10 seconds
and 5 seconds (the default value).

Processing time values: Processing times for LSU, Ack and Hello
packets have been expressed above in terms of a common parameter T.
Two values are considered for T, 1 ms and 0.5 ms.

Hello/Router-Dead-Interval: It is assumed that the Router-Dead
interval is four times the Hello interval.  In one case the Hello
interval is 10 seconds and the Router-Dead interval is 40 seconds
(the default values); in the other case the Hello interval is 2
seconds and the Router-Dead interval is 8 seconds.

Based on the network size, LSA type, retransmission timer, processing
time and Hello/Router-Dead-Interval values, we develop 6 test cases
as follows:

Case 1: Network 1, Link LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 2: Network 1, ASE LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 3: Network 1, Link LSAs, retransmission timer = 5 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 4: Network 1, Link LSAs, retransmission timer = 10 sec.,
        T = 0.5 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 5: Network 1, Link LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 2/8 sec.

Case 6: Network 2, Link LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

For each case and for each priority scenario we study the network
stability as a function of the size of the LSA storm.  The stability
is determined by looking at the number of non-converged LSUs as a
function of time.  An example is shown in Table 1 for Case 1 and
Priority Scenario 1 (no priority to Hellos or Acks).

 ==========|============================================================
           | Number of Non-Converged LSUs in the Network at Time (sec)
 LSA STORM |======|======|======|======|======|======|======|======|====
   SIZE    |  10s |  20s |  30s |  35s |  40s |  50s |  60s |  80s |100s
 ==========|======|======|======|======|======|======|======|======|====
    100    |   0  |   0  |  24  |  29  |  24  |   1  |   0  |   1  |  1
  (Stable) |      |      |      |      |      |      |      |      |
 ----------|------|------|------|------|------|------|------|------|----
    140    |   0  |   0  |  35  |  48  |  46  |  27  |  14  |   1  |  1
  (Stable) |      |      |      |      |      |      |      |      |
 ----------|------|------|------|------|------|------|------|------|----
    160    |   0  |   0  |  38  |  57  |  55  |  40  |  26  |  65  | 203
 (Unstable)|      |      |      |      |      |      |      |      |
 ==========|======|======|======|======|======|======|======|======|====

              Table 1: Network Stability vs. LSA Storm Size
                    (Case 1, no priority to Hello/Ack)

The LSA storm starts a little after 20 seconds, so for some period of
time after that the number of non-converged LSUs should stay high and
then come down for a stable network.  This happens for LSA storms of
sizes 100 and 140.  With an LSA storm of size 160, the number of
non-converged LSUs stays high indefinitely due to repeated
retransmissions, link failures caused by Hellos being missed for more
than the Router-Dead interval (which originates additional LSAs), and
subsequent link recoveries (which again originate additional LSAs).
We define the network stability threshold as the maximum allowable
LSA storm size for which the number of non-converged LSUs comes down
to a low level after some time.  It turns out that for this example
the stability threshold is 150.

The network behavior as a function of the LSA storm size can be
categorized as follows:

(1) If the LSA storm is well below the stability threshold, then the
    CPU/memory congestion lasts only for a short period, and during
    this period there are very few retransmissions, very few dropped
    OSPF packets and no link failures due to missed Hellos.  This
    type of LSA storm is observed routinely in operational networks,
    and networks recover from it easily.

(2) If the LSA storm is just below the stability threshold, then the
    CPU/memory congestion lasts for a longer period, and during this
    period there may be a considerable number of retransmissions and
    dropped OSPF packets.  If Hello packets are not given priority,
    then there may also be some link failures due to missed Hellos.
    However, the network does eventually go back to a stable state.
    This type of LSA storm may happen rarely in operational networks,
    and they recover from it with some difficulty.

(3) If the LSA storm is above the stability threshold, then the
    CPU/memory congestion may last indefinitely unless some special
    procedure for relieving congestion is followed.  During this
    period there are a considerable number of retransmissions and
    dropped OSPF packets.  If Hello packets are not given priority,
    then there are also link failures due to missed Hellos.  This
    type of LSA storm may happen very rarely in operational networks,
    and usually some manual procedure, such as taking down
    adjacencies at heavily congested routers, is needed.

(4) If Hello packets are given priority, then the network stability
    threshold increases, i.e., the network can withstand a larger LSA
    storm.  Furthermore, even if the network operates at or somewhat
    above this higher stability threshold, Hellos are still not
    missed and so there are no link failures.  So even if there is
    congestion in the control plane due to increased retransmissions,
    requiring some special procedures for congestion reduction, the
    data plane remains unaffected.

(5) If both Hello and Acknowledgment packets are given priority, then
    the stability threshold increases even further.

Table 2 shows the network stability threshold for the six cases and
the three priority scenarios defined earlier.

 |========|===========================================================|
 |        |           Maximum Allowable LSA Storm Size For            |
 |  Case  |===================|===================|===================|
 | Number |  No Priority to   | Priority to Hello | Priority to Hello |
 |        |   Hello or Ack    |       Only        |      and Ack      |
 |========|===================|===================|===================|
 | Case 1 |        150        |        190        |        250        |
 |--------|-------------------|-------------------|-------------------|
 | Case 2 |        185        |        215        |        285        |
 |--------|-------------------|-------------------|-------------------|
 | Case 3 |        115        |        127        |        170        |
 |--------|-------------------|-------------------|-------------------|
 | Case 4 |        320        |        375        |        580        |
 |--------|-------------------|-------------------|-------------------|
 | Case 5 |        120        |        175        |        225        |
 |--------|-------------------|-------------------|-------------------|
 | Case 6 |        185        |        224        |        285        |
 |========|===================|===================|===================|

      Table 2: Maximum Allowable LSA Storm for a Stable Network

We also considered one more scenario, with priority to Hello and Ack
and with a truncated binary exponential backoff of the retransmission
interval with an upper limit of 40 seconds (for the same LSA, each
successive retransmission interval is doubled, but not beyond 40
seconds).  The maximum allowable LSA storm size for this scenario
significantly exceeded the corresponding numbers in the last column
of Table 2 (priority to Hello and Ack).

Appendix B.3. Observations on Simulation Results

Table 2 shows that in all cases prioritizing Hello packets increases
the network stability threshold and that, in addition, prioritizing
LSA Acknowledgment packets increases the stability threshold even
further.  The reasons for these observations are as follows.  The
main sources of sustained CPU/memory congestion (i.e., of the
positive feedback loop) following an LSA storm are (1) LSA
retransmissions and (2) links being declared down due to missed
Hellos, which in turn causes further LSA originations, with the
eventual recovery of the link causing even more LSA originations.
Prioritizing Hello packets practically eliminates the second source
of congestion.  Prioritizing Acknowledgments significantly reduces
the first source of congestion, i.e., LSA retransmissions.  Note that
retransmissions cannot be completely eliminated, for the following
reasons.  First, only explicit Acknowledgments are prioritized;
duplicate LSAs carrying implicit Acknowledgments are still served at
the lower priority.  Second, LSAs may be greatly delayed or dropped
at the input queue of receivers, and therefore Acknowledgments may
not even be generated, in which case prioritizing Acks would not
help.  Another factor to keep in mind is that, since Hellos and Acks
are prioritized, LSAs see a bigger delay and a greater potential for
being dropped.  However, the simulation results show that, on the
whole, prioritizing Hellos and LSA Acks is always beneficial and
significantly improves the network stability threshold.

As stated in Appendix B.2, exponential backoff of the LSA
retransmission interval further increases the network stability
threshold.

Our simulation study also showed that, in each of the cases, if
instead of prioritizing Hello packets we treat any packet received
over a link as a surrogate for a Hello packet (an implicit Hello),
then we get about the same stability threshold as obtained with
prioritizing Hello packets.

Appendix C. Other Recommendations

(1) Explicit Marking: In Section 2 we recommended that OSPF packets
    be classified into "high" and "low" priority classes based on
    examining the OSPF packet header.  In some cases (particularly at
    the receiver) this examination may be computationally costly.  An
    alternative would be the use of different TOS/Precedence field
    settings for the two priority classes.  [Ref1] recommends setting
    the TOS field to 0 and the Precedence field to 6 for all OSPF
    packets.  We recommend this same setting for the "low" priority
    OSPF packets and a different setting for the "high" priority OSPF
    packets, in order to be able to classify them separately without
    having to examine the OSPF packet header.  Two examples are given
    below:

    Example 1: For "low" priority packets set the TOS field to 0 and
               the Precedence field to 6, and for "high" priority
               packets set the TOS field to 4 and the Precedence
               field to 6.

    Example 2: For "low" priority packets set the TOS field to 0 and
               the Precedence field to 6, and for "high" priority
               packets set the TOS field to 0 and the Precedence
               field to 7.

    This recommendation need not be followed if it is easy to examine
    the OSPF packet header and thereby classify "high" and "low"
    priority packets separately.
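
    For illustration only, a minimal Python sketch of the two
    classification approaches above follows.  It assumes the OSPF
    packet is available as raw bytes beginning at the OSPF packet
    header; the names and the choice of Example 2 for the marking
    values are illustrative, not part of this document:

       # OSPF packet types carried in the second octet of the OSPF
       # packet header (RFC 2328): 1 = Hello, 2 = Database
       # Description, 3 = LS Request, 4 = LS Update, 5 = LS Ack.
       HIGH_PRIORITY_TYPES = {1, 5}   # Hello and LS Acknowledgment

       def classify_by_ospf_header(ospf_packet: bytes) -> str:
           """Classification by examining the OSPF packet header."""
           if ospf_packet[1] in HIGH_PRIORITY_TYPES:
               return "high"
           return "low"

       # Alternative explicit marking per Example 2: Precedence 6
       # (TOS byte 0xC0) for "low" priority packets, Precedence 7
       # (TOS byte 0xE0) for "high" priority packets, so that a
       # receiver can classify on the IP header alone.
       TOS_LOW_PRIORITY = 0xC0
       TOS_HIGH_PRIORITY = 0xE0
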

(2) Further Prioritization of OSPF Packets: Besides the packets
    designated as "high" priority in Recommendation 1 of Section 2,
    there may be a need for further priority separation among the
    "low" priority OSPF packets.  We recommend the use of three
    priority classes: "high", "medium" and "low".  While receiving a
    packet from a neighbor and while transmitting a packet to a
    neighbor, try to process a "high" priority packet ahead of
    "medium" and "low" priority packets, and a "medium" priority
    packet ahead of "low" priority packets.  The "high" priority
    packets are as designated in Recommendation 1 of Section 2.  We
    provide below two candidate examples of "medium" priority
    packets.  All OSPF packets not designated as "high" or "medium"
    priority are "low" priority.

    One example of a "medium" priority packet is the Database
    Description (DBD) packet from a slave (during the database
    synchronization process) that is used as an acknowledgment.

    A second example is an LSA carrying intra-area topology change
    information (this may trigger SPF calculation and rerouting of
    Label Switched Paths, so fast processing of this packet may
    improve OSPF/LDP convergence times).  However, if the processing
    cost of identifying and separately queueing the LSA in this
    example is deemed too high, then the implementor may decide not
    to do so.