Internet Engineering Task Force                     Gagan L. Choudhury
Internet Draft                                    Vera D. Sapozhnikova
Expires in May, 2003                                              AT&T
draft-ietf-ospf-scalability-02.txt                   Anurag S. Maunder
                                                        Sanera Systems
                                                        Vishwas Manral
                                                      Netplane Systems

                                                         November, 2002

      Explicit Marking and Prioritized Treatment of Specific OSPF
       Packets for Faster Convergence and Improved Network
                     Scalability and Stability

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Abstract

   In this draft we propose the following mechanisms to improve the
   scalability and stability of OSPF-based networks:

   (1) Process Hello packets at a higher priority than other OSPF
       packets.  To facilitate this, explicitly mark Hello packets to
       differentiate them from other OSPF packets.  One way of special
       marking is to use a different Diffserv codepoint for Hello
       packets than for other OSPF packets.

   (2) In the absence of special marking, or in addition to it, use
       other mechanisms to avoid missing Hello packets.  One example
       is to treat any packet received over a link as a surrogate for
       a Hello packet (an implicit Hello) for the purpose of keeping
       the link alive.

   (3) The same type of explicit marking and prioritized treatment may
       be beneficial to other OSPF packets as well.  One important
       example is the LSA Acknowledgment packet, whose prioritized
       treatment can reduce retransmissions during periods of
       congestion.  Other examples include (a) the Database
       Description (DBD) packet from a slave that is used as an
       acknowledgment, and (b) LSAs carrying intra-area topology
       change information.

   It is possible that some implementations already use one or more of
   the above mechanisms so as not to miss the processing of critical
   packets during periods of congestion.  However, we suggest that the
   above mechanisms be included in the standard so that all
   implementations can benefit from them.

Table of Contents

   1. Introduction
   2. The Network Under Simulation
   3. Simulation Results
   4. Observations on Simulation Results
   5. Need for Prioritized Treatment of Critical OSPF Packets and
      Special Marking to Facilitate That
   6. Summary
   7. Acknowledgments
   8. References
   9. Authors' Addresses
1. Introduction

   Due to world-wide increases in traffic demand, data networks keep
   growing in terms of number of nodes, number of links, adjacencies
   per node and Link State Database size.  Our motivation is to
   improve the ability of large networks to withstand the simultaneous
   or near-simultaneous update of a large number of link-state-
   advertisement messages, or LSAs.  We call this event an LSA storm.
   An LSA storm may be initiated for many reasons.  Here are some
   examples:

   (a) one or more link failures due to fiber cuts,

   (b) one or more node failures for some reason, e.g., a software
       crash or some type of disaster in an office complex hosting
       many nodes,

   (c) the need to take down and later bring back many nodes during a
       software/hardware upgrade,

   (d) near-synchronization of the once-in-30-minutes refresh instants
       of some types of LSAs,

   (e) refresh of all LSAs in the system during a change in software
       version.

   In addition to the LSAs generated as a direct result of link/node
   failures, there may be other, indirect LSAs as well.  One example
   in MPLS networks is traffic engineering LSAs generated at other
   links as a result of a significant change in reserved bandwidth
   caused by the rerouting of Label Switched Paths (LSPs) that went
   down during the link/node failure.

   The LSA storm causes high CPU and memory utilization at the node
   processors, causing incoming packets to be delayed or dropped.
   Delayed Acknowledgments (beyond the retransmission timer value)
   result in retransmissions, and delayed Hello packets (beyond the
   Router-Dead interval) result in links being declared down.  A
   trunk-down event causes Router LSA generation by its end-point
   nodes.  If traffic engineering LSAs are used for each link, then
   LSAs of that type would also be generated by the end-point nodes,
   and potentially elsewhere as well, due to significant changes in
   reserved bandwidth at other links caused by the failure and the
   rerouting of LSPs originally using the failed trunk.  Eventually,
   when the link recovers, that too triggers additional Router and
   traffic engineering LSAs.

   The retransmissions and additional LSA generations result in
   further CPU and memory usage, essentially creating a positive
   feedback loop.  We define the LSA storm size as the number of LSAs
   in the original storm, not counting any additional LSAs resulting
   from the feedback loop described above.  If the LSA storm is too
   large, the positive feedback loop may be strong enough to sustain
   high CPU and memory utilization at many network nodes indefinitely,
   thereby driving the network to an unstable state.

   In the past, network outage events have been reported in IP and ATM
   networks using link-state protocols such as OSPF, IS-IS, PNNI or
   proprietary variants.  See, for example, [Ref1]-[Ref4].  In many of
   these examples, large-scale flooding of LSAs or other similar
   control messages (either arising naturally or triggered by some bug
   or inappropriate procedure) was partly or fully responsible for the
   network instability and outage.

   It has been suggested [Ref5] that the Hello interval and Router-
   Dead interval be reduced significantly in order for OSPF to detect
   link failures and recoveries faster.  Reducing the Router-Dead
   interval would make it even more likely for links to be declared
   down due to missed Hellos.
   We use a simulation model to show that there is a certain LSA storm
   size threshold above which the network may show unstable behavior
   caused by a large number of retransmissions, link failures due to
   missed Hello packets, and subsequent link recoveries.  We also show
   that the LSA storm size causing instability may be substantially
   increased by providing prioritized treatment to Hello and LSA
   Acknowledgment packets.  Furthermore, if we prioritize Hello
   packets, then even when the network operates somewhat above the
   stability threshold, links are not declared down due to missed
   Hellos.  This implies that even though there is control plane
   congestion due to many retransmissions, the data plane stays up and
   no new LSAs are generated (besides the ones in the original storm
   and the refreshes).  Based on these observations we propose
   prioritized treatment of Hello, LSA Acknowledgment and other
   critical OSPF packets, and a special marking to facilitate that.

   One might argue that the scalability issue of large networks should
   be solved solely by dividing the network hierarchically into
   multiple areas so that flooding of LSAs remains localized within
   areas.  However, this approach increases network management and
   design complexity and may result in less optimal routing between
   areas.  Also, ASE LSAs are flooded throughout the AS, which may be
   a problem if there are large numbers of them.  Furthermore, a large
   number of Summary LSAs may need to be flooded across areas, and
   their number would increase significantly if multiple Area Border
   Routers are employed for reliability.  Thus it is important to
   allow the network to grow toward as large a size as possible within
   a single area.

   Our proposal here is synergistic with a broader set of scalability
   and stability improvement proposals.  [Ref6] and [Ref7] propose
   flooding overhead reduction in case more than one interface goes to
   the same neighbor.  [Ref8] proposes a mechanism for greatly
   reducing LSA refreshes in stable topologies.  [Ref9] compares
   several restricted flooding algorithms in terms of their ability to
   withstand large LSA storms and their robustness to failure
   conditions.  [Ref10] proposes a wide range of congestion control
   and failure recovery mechanisms.

   Section 2 describes the network under simulation and Section 3
   provides the simulation results.  Section 4 gives the basic
   observations based on the simulation results.  Section 5 explains
   the need for prioritized treatment of certain critical OSPF packets
   and special marking to facilitate that.  Section 6 gives the
   summary.

2. The Network Under Simulation

   We generate a random network over a rectangular grid using a
   modified version of Waxman's algorithm [Ref11] that ensures that
   the network is connected and has a pre-specified number of nodes,
   number of links, maximum number of neighbors per node, and maximum
   number of adjacencies per node.  The rectangular grid resembles the
   continental U.S.A., with a maximum one-way propagation delay of 30
   ms in the East-West direction and 15 ms in the North-South
   direction.  We consider two different network sizes, as explained
   in Section 3.
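   For concreteness, the following Python sketch shows the flavor of
   Waxman-style topology generation.  It is not the authors' exact
   modified algorithm; the parameter values (a, b, the grid
   dimensions) and the connect-then-densify strategy are illustrative
   assumptions only.

   import math, random

   def waxman_graph(n, m, max_deg, width=6000.0, height=3000.0,
                    a=0.4, b=0.2):
       # Scatter n nodes on a rectangular grid (km, roughly
       # continental-U.S.A. scale) and accept edge (u, v) with Waxman
       # probability a * exp(-d(u, v) / (b * L)), where d is Euclidean
       # distance and L is the largest possible distance on the grid.
       pts = [(random.uniform(0, width), random.uniform(0, height))
              for _ in range(n)]
       L = math.hypot(width, height)
       deg = [0] * n
       edges = set()

       def try_add(u, v):
           if u == v or (u, v) in edges or (v, u) in edges:
               return
           if deg[u] < max_deg and deg[v] < max_deg:
               edges.add((u, v))
               deg[u] += 1
               deg[v] += 1

       # Build a random spanning tree first, so the graph is connected.
       order = list(range(n))
       random.shuffle(order)
       for i in range(1, n):
           partners = [u for u in order[:i] if deg[u] < max_deg]
           try_add(order[i], random.choice(partners))

       # Densify to m links using the Waxman acceptance probability.
       while len(edges) < m:
           u, v = random.randrange(n), random.randrange(n)
           d = math.hypot(pts[u][0] - pts[v][0], pts[u][1] - pts[v][1])
           if random.random() < a * math.exp(-d / (b * L)):
               try_add(u, v)
       return pts, edges

   The single degree cap above stands in for the separate neighbor and
   adjacency limits used in the actual study.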
   The network has a flat, single-area topology.

   Each node is a Router, and each link is a point-to-point link
   connecting two Routers.

   We assume that node CPU and memory (not the link bandwidth) are the
   main bottleneck in the LSA flooding process.  This will typically
   be true for high-speed links (e.g., OC3 or above) and/or links
   where OSPF traffic gets an adequate Quality of Service (QoS)
   compared to other traffic.

   Different Timers:

      LSA refresh interval = 1800 seconds.

      Hello refresh interval = 10 seconds.

      Router-Dead interval = 40 seconds.

      LSA retransmission interval: two values are considered, 10
      seconds and 5 seconds (note that a retransmission is disabled on
      receipt of either an explicit Acknowledgment or a duplicate LSA
      over the same interface, which acts as an implicit
      Acknowledgment).

      Minimum time between successive generations of the same LSA = 5
      seconds.

      Minimum time between successive Dijkstra SPF calculations = 1
      second.

   Packing of LSAs: It is assumed that, for any given node, the LSAs
   generated over a 1-second period are packed together to form an
   LSU, but no more than 3 LSAs are packed into one LSU.

   LSU/Ack/Hello Processing Times: All processing times are expressed
   in terms of the parameter T.  Two values of T are considered, 1 ms
   and 0.5 ms.

   In the case of a dedicated processor for processing OSPF packets,
   the processing time reported represents the true processing time.
   If the processor does other work and only a fraction of its
   capacity can be dedicated to OSPF processing, then the processing
   time has to be inflated appropriately to get the effective
   processing time; in that case it is assumed that the inflation
   factor is already taken into account in the reported processing
   time.

   The fixed time to send or receive any LSU, Ack or Hello packet is
   T.  In addition, a variable processing time is used for LSUs and
   Acks, depending on the number and types of LSAs packed.  No
   variable processing time is used for Hellos.  The variable
   processing time per Router LSA is (0.5 + 0.17L)T, where L is the
   number of adjacencies advertised by the Router LSA.  For other LSA
   types (e.g., an ASE LSA or a "Link" LSA carrying traffic
   engineering information about a link), the variable processing time
   per LSA is 0.5T.

   The variable processing time for an Ack is 25% of that of the
   corresponding LSA.

   It is to be noted that if multiple LSAs are packed in a single LSU
   packet, the fixed processing time is incurred only once, but the
   variable processing time is incurred for every component of the
   packet.

   The processing time values we use are roughly in the same range as
   those observed in an operational network.

   LSU/Ack/Hello Priority: Two non-preemptive priority levels and
   three priority scenarios are considered.  Within each priority
   level, processing is FIFO, with newly arriving lower-priority
   packets being dropped when the lower-priority queue is full.
   Higher-priority packets are never dropped.

      In Priority scenario 1, all LSUs/Acks/Hellos received at a node
      are queued at the lower priority.

      In Priority scenario 2, Hellos received at a node are queued at
      the higher priority, but LSUs/Acks are queued at the lower
      priority.

      In Priority scenario 3, Hellos and Acks received at a node are
      queued at the higher priority, but LSUs are queued at the lower
      priority.

   All packets generated internally at a node (usually triggered by a
   timer) are processed at the higher priority.  This includes the
   initial LSA storm, LSA refreshes, Hello refreshes, LSA
   retransmissions, and new LSA generation after detection of a
   failure or recovery.
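   As a minimal illustration, the Python sketch below captures the
   service-time model and queueing discipline just described.  The
   packet representation (dictionaries with "kind" and "lsas" keys)
   and the class/function names are hypothetical conveniences for the
   sketch, not part of the model specification.

   import collections

   T = 0.001  # base processing-time parameter; 1 ms and 0.5 ms studied

   def processing_time(pkt):
       # Fixed cost of T to send or receive any LSU, Ack or Hello,
       # plus a variable cost per packed LSA: (0.5 + 0.17*L)*T for a
       # Router LSA advertising L adjacencies, 0.5*T for other LSA
       # types, and 25% of the corresponding LSA cost when the packet
       # is an Ack.  Hellos carry no LSAs, so only the fixed cost.
       t = T
       scale = 0.25 if pkt["kind"] == "ack" else 1.0
       for lsa in pkt.get("lsas", []):       # at most 3 LSAs per LSU
           if lsa["type"] == "router":
               t += scale * (0.5 + 0.17 * lsa["adjacencies"]) * T
           else:                             # ASE or TE "Link" LSA
               t += scale * 0.5 * T
       return t

   class OspfInputQueues:
       # Two non-preemptive priority levels, FIFO within each level.
       # New lower-priority arrivals are dropped when that queue is
       # full (2000 packets in the simulation); higher-priority
       # packets are never dropped.
       def __init__(self, low_capacity=2000):
           self.high = collections.deque()
           self.low = collections.deque()
           self.low_capacity = low_capacity

       def enqueue(self, pkt, high_priority):
           if high_priority:
               self.high.append(pkt)
           elif len(self.low) < self.low_capacity:
               self.low.append(pkt)
           # else: the packet is silently dropped

       def dequeue(self):
           # Non-preemptive: the priority level is consulted only
           # between services, never during one.
           if self.high:
               return self.high.popleft()
           return self.low.popleft() if self.low else None

   Which received packets are enqueued with high_priority=True is
   exactly what distinguishes the three priority scenarios.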
   Buffer Size for Incoming LSUs/Acks/Hellos (lower priority): The
   buffer size is assumed to be 2000 packets, where a packet is either
   an Ack, an LSU, or a Hello.

   LSA Refresh: Each LSA is refreshed once in 1800 seconds, and the
   refresh instants of the various LSAs in the LSDB are assumed to be
   uniformly distributed over the 1800-second period, i.e., completely
   unsynchronized.  If, however, an LSA is generated as part of the
   initial LSA storm, it goes on a new refresh schedule of once in
   1800 seconds, starting from its generation time.

   LSA Storm Generation: As defined earlier, an "LSA storm" is the
   simultaneous or near-simultaneous generation of a large number of
   LSAs.  In the case of only Router and ASE LSAs, we normally assume
   that the number of ASE LSAs in the storm is about 4 times that of
   the Router LSAs, but the ratio is allowed to change if either the
   Router or the ASE LSAs have reached their maximum possible value.
   In the case of only Router and Link LSAs (carrying traffic
   engineering information), we normally assume that the number of
   Link LSAs in the storm is about 4 times that of the Router LSAs,
   but the ratio is allowed to change if either the Router or the Link
   LSAs have reached their maximum possible value.  For any given LSA
   storm we keep generating LSAs, starting from Node index 1 and
   moving upwards, and stop when the required number of LSAs of each
   type has been generated.  The LSAs generated at any given node are
   assumed to start at an instant uniformly distributed between 20 and
   30 seconds from the start of the simulation.  Successive LSA
   generations at a node are assumed to be spaced 400 ms apart.  It is
   to be noted that during the period of observation there are other
   LSAs generated besides the ones in the storm.  These include
   refreshes of LSAs that are not part of the storm, as well as LSAs
   generated due to possible link failures and subsequent possible
   link recoveries.

   Failure/Recovery of Links: If no Hello is received over a link (due
   to CPU/memory congestion) for longer than the Router-Dead interval,
   the link is declared down.  If Hellos are received again at a later
   time, the link is declared up.  Whenever a link is declared up or
   down, one Router LSA is generated by each Router on the two sides
   of the point-to-point link.  If "Link LSAs" carrying traffic
   engineering information are used, it is assumed that each Router
   also generates a Link LSA.  In this case it is further assumed
   that, due to rerouting of LSPs, three other links in the network
   (selected randomly in the simulation) experience a significant
   change in reserved bandwidth, which results in one Link LSA being
   generated by the Routers on the two ends of each such link.
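   The failure/recovery rule can be summarized in a few lines of
   Python; in the sketch below (class and method names are ours,
   purely for illustration) one instance models one link end-point.

   ROUTER_DEAD_INTERVAL = 40.0   # seconds; 4 x the 10-second Hello

   class LinkMonitor:
       # A link is declared down when no Hello has been seen for the
       # Router-Dead interval, and up again once Hellos resume.  Each
       # up/down transition makes both end-points originate a new
       # Router LSA (and, if TE "Link" LSAs are in use, Link LSAs).
       def __init__(self, now):
           self.last_hello = now
           self.up = True

       def on_hello(self, now):
           self.last_hello = now
           if not self.up:
               self.up = True        # recovery: originate new LSAs

       def poll(self, now):
           if self.up and now - self.last_hello > ROUTER_DEAD_INTERVAL:
               self.up = False       # failure: originate new LSAs
           return self.up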
3. Simulation Results

   In this section we study the relative performance of the three
   Priority scenarios defined earlier (no priority to Hello or Ack,
   priority to Hello only, and priority to both Hello and Ack) over a
   range of network sizes, LSA retransmission timer values, LSA types,
   processing time values and Hello/Router-Dead-Interval values:

   Network size: Two networks are considered.  Network 1 has 100 nodes
   and 1200 links; the maximum number of neighbors per node is 30, and
   the maximum number of adjacencies per node is 50 (the same neighbor
   may have more than one adjacency).  Network 2 has 50 nodes and 600
   links; the maximum number of neighbors per node is 25, and the
   maximum number of adjacencies per node is 48.  The Dijkstra SPF
   calculation time is assumed to be 100 ms for Network 1 and 70 ms
   for Network 2.

   LSA Type: Each node has 1 Router LSA (a total of 100 for Network 1
   and 50 for Network 2).  There are no Network LSAs, since all links
   are point-to-point links, and no Summary LSAs, since the network
   has only one area.  Regarding other LSA types, we consider two
   situations.  In Situation 1 we assume that there are no ASE LSAs
   and that each link has one "Link" LSA carrying traffic engineering
   information (a total of 2400 for Network 1 and 1200 for Network 2).
   In Situation 2 we assume that there are no "Link" LSAs, that half
   of the nodes are AS-Border nodes, and that each border node has 10
   ASE LSAs (a total of 500 for Network 1 and 250 for Network 2).  We
   identify Situation 1 as "Link LSAs" and Situation 2 as "ASE LSAs".

   LSA retransmission timer value: Two values are considered, 10
   seconds and 5 seconds (the default value).

   Processing time values: Processing times for LSUs, Acks and Hello
   packets have been expressed in Section 2 in terms of a common
   parameter T.  Two values are considered for T, 1 ms and 0.5 ms.

   Hello/Router-Dead-Interval: It is assumed that the Router-Dead
   interval is four times the Hello interval.  In one case the Hello
   interval is 10 seconds and the Router-Dead interval 40 seconds (the
   default values); in the other case the Hello interval is 2 seconds
   and the Router-Dead interval 8 seconds.

   Based on the network size, LSA type, timer values and processing
   time values, we develop 6 test cases as follows:

   Case 1: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 2: Network 1, ASE LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 3: Network 1, Link LSAs, retransmission timer = 5 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 4: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 0.5 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 5: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 2/8 sec.

   Case 6: Network 2, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.
   For each case and for each Priority scenario we study the network
   stability as a function of the size of the LSA storm.  The
   stability is determined by looking at the number of non-converged
   LSUs as a function of time.  An example is shown in Table 1 for
   Case 1 and Priority scenario 1 (no priority to Hellos or Acks).

   ==========|========================================================
             |   Number of Non-Converged LSUs in the Network at Time
    LSA      |                      (in seconds)
    STORM    |=====|=====|=====|=====|=====|=====|=====|=====|======|
    SIZE     | 10s | 20s | 30s | 35s | 40s | 50s | 60s | 80s | 100s |
   ==========|=====|=====|=====|=====|=====|=====|=====|=====|======|
    100      |  0  |  0  |  24 |  29 |  24 |  1  |  0  |  1  |   1  |
    (Stable) |     |     |     |     |     |     |     |     |      |
   ----------|-----|-----|-----|-----|-----|-----|-----|-----|------|
    140      |  0  |  0  |  35 |  48 |  46 |  27 |  14 |  1  |   1  |
    (Stable) |     |     |     |     |     |     |     |     |      |
   ----------|-----|-----|-----|-----|-----|-----|-----|-----|------|
    160      |  0  |  0  |  38 |  57 |  55 |  40 |  26 |  65 |  203 |
   (Unstable)|     |     |     |     |     |     |     |     |      |
   ==========|========================================================

             Table 1: Network Stability vs. LSA Storm Size
                  (Case 1, No priority to Hello/Ack)

   The LSA storm starts a little after 20 seconds, so in a stable
   network the number of non-converged LSUs should stay high for some
   period after that and then come down.  This happens for LSA storms
   of sizes 100 and 140.  With an LSA storm of size 160, the number of
   non-converged LSUs stays high indefinitely, due to repeated
   retransmissions, link failures caused by Hellos being missed for
   longer than the Router-Dead interval (which generates additional
   LSAs), and subsequent link recoveries (which again generate
   additional LSAs).  We define the network stability threshold as the
   maximum LSA storm size for which the number of non-converged LSUs
   comes down to a low level after some time.  It turns out that for
   this example the stability threshold is 150.
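   In code form, the stability test and the threshold search might
   look as follows.  This is only a sketch of how such a threshold
   could be located; the settle time and "low level" cut-off are
   illustrative values rather than parameters taken from the
   simulation, and run_simulation is a hypothetical callable returning
   a list of (time, non-converged-LSU-count) samples.

   def is_stable(samples, settle_after=100.0, low_level=5):
       # Stable if the non-converged LSU count has returned to a low
       # level by `settle_after` seconds and stays there.
       return all(count <= low_level
                  for t, count in samples if t >= settle_after)

   def stability_threshold(run_simulation, lo=0, hi=1000):
       # Binary search for the largest storm size whose run is stable,
       # assuming stability is monotone in storm size (as Table 1
       # suggests: sizes 100 and 140 are stable, 160 is not).
       while lo < hi:
           mid = (lo + hi + 1) // 2
           if is_stable(run_simulation(mid)):
               lo = mid
           else:
               hi = mid - 1
       return lo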
   The network behavior as a function of the LSA storm size can be
   categorized as follows:

   (1) If the LSA storm is well below the stability threshold, the
       CPU/memory congestion lasts only for a short period, during
       which there are very few retransmissions, very few dropped OSPF
       packets, and no link failures due to missed Hellos.  LSA storms
       of this type are observed routinely in operational networks,
       and networks recover from them easily.

   (2) If the LSA storm is just below the stability threshold, the
       CPU/memory congestion lasts for a longer period, during which
       there may be a considerable number of retransmissions and
       dropped OSPF packets.  If Hello packets are not given priority,
       there may also be some link failures due to missed Hellos.
       However, the network does eventually return to a stable state.
       LSA storms of this type may happen rarely in operational
       networks, which recover from them with some difficulty.

   (3) If the LSA storm is above the stability threshold, the
       CPU/memory congestion may last indefinitely unless some special
       procedure for relieving congestion is followed.  During this
       period there are a considerable number of retransmissions and
       dropped OSPF packets.  If Hello packets are not given priority,
       there are also link failures due to missed Hellos.  LSA storms
       of this type may happen very rarely in operational networks,
       and usually some manual procedure, such as taking down
       adjacencies at heavily congested nodes, is needed.

   (4) If Hello packets are given priority, the network stability
       threshold increases, i.e., the network can withstand a larger
       LSA storm.  Furthermore, even if the network operates at or
       somewhat above this higher stability threshold, Hellos are
       still not missed, and so there are no link failures.  So even
       if there is congestion in the control plane due to increased
       retransmissions, requiring some special procedures for
       congestion reduction, the data plane remains unaffected.

   (5) If both Hello and Acknowledgment packets are given priority,
       the stability threshold increases even further.

   In Table 2 we show the network stability threshold for the six
   different cases and the three different priority scenarios defined
   earlier.

   |========|======================================================|
   |        |        Maximum Allowable LSA Storm Size For          |
   |  Case  |================|==================|==================|
   | Number | No Priority to |Priority to Hello |Priority to Hello |
   |        |  Hello or Ack  |       Only       |     and Ack      |
   |========|================|==================|==================|
   | Case 1 |      150       |       190        |       250        |
   |--------|----------------|------------------|------------------|
   | Case 2 |      185       |       215        |       285        |
   |--------|----------------|------------------|------------------|
   | Case 3 |      115       |       127        |       170        |
   |--------|----------------|------------------|------------------|
   | Case 4 |      320       |       375        |       580        |
   |--------|----------------|------------------|------------------|
   | Case 5 |      120       |       175        |       225        |
   |--------|----------------|------------------|------------------|
   | Case 6 |      185       |       224        |       285        |
   |========|================|==================|==================|

      Table 2: Maximum Allowable LSA Storm for a Stable Network

4. Observations on Simulation Results

   Table 2 shows that in all cases prioritizing Hello packets
   increases the network stability threshold and that, in addition,
   prioritizing LSA Acknowledgment packets increases the stability
   threshold even further.  The reasons for these observations are as
   follows.  The main sources of sustained CPU/memory congestion (the
   positive feedback loop) following an LSA storm are (1) LSA
   retransmissions, and (2) links being declared down due to missed
   Hellos, which in turn causes further LSA generation, with the
   eventual recovery of those links causing even more LSA generation.
   Prioritizing Hello packets practically eliminates the second source
   of congestion.  Prioritizing Acknowledgments significantly reduces
   the first source of congestion, i.e., LSA retransmissions.  It is
   to be noted that retransmissions cannot be completely eliminated,
   for two reasons.  Firstly, only explicit Acknowledgments are
   prioritized; duplicate LSAs carrying implicit Acknowledgments are
   still served at the lower priority.  Secondly, LSAs may be greatly
   delayed or dropped at the input queue of receivers, in which case
   Acknowledgments are never generated and prioritizing Acks cannot
   help.  Another factor to keep in mind is that, since Hellos and
   Acks are prioritized, LSAs see a bigger delay and a greater
   potential for being dropped.  However, the simulation results show
   that, on the whole, prioritizing Hellos and LSA Acks is always
   beneficial and significantly improves the network stability
   threshold.

   Our simulation study also showed that in each of the cases, if
   instead of prioritizing Hello packets we treat any packet received
   over a link as a surrogate for a Hello packet (an implicit Hello),
   then we obtain about the same stability threshold as with
   prioritized Hello packets.
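   The implicit-Hello rule is simple to state in code.  In the sketch
   below (names are ours, for illustration only), any received OSPF
   packet, not just a Hello, refreshes the neighbor's inactivity
   timer, so a congested neighbor that is still emitting protocol
   traffic is not declared down merely because its Hellos are stuck
   behind other packets.

   class ImplicitHelloMonitor:
       def __init__(self, router_dead_interval=40.0):
           self.dead_interval = router_dead_interval
           self.deadline = None

       def on_ospf_packet(self, now):
           # Any OSPF packet (LSU, Ack, DBD, ...) received over the
           # link acts as a surrogate Hello and rearms the timer.
           self.deadline = now + self.dead_interval

       def neighbor_alive(self, now):
           return self.deadline is not None and now <= self.deadline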
   If we prioritize Hello packets, then even when the network operates
   somewhat above the stability threshold, links are not declared down
   due to missed Hellos.  This implies that even though there is
   control plane congestion due to many retransmissions, the data
   plane stays up and no new LSAs are generated (besides the ones in
   the original storm and the refreshes).

5. Need for Prioritized Treatment of Critical OSPF Packets and
   Special Marking to Facilitate That

   The observations in the previous section clearly show that
   prioritizing Hello and LSA Acknowledgment packets is greatly
   beneficial in improving the scalability and stability of large
   networks.  In addition to these packets, it may be beneficial to
   treat certain other OSPF packets at the higher priority as well.
   One example (during the database exchange process between neighbors
   following a link recovery) is the Database Description packet from
   a slave that is used as an acknowledgment for the previous Database
   Description packet sent by the master.  Another example is an LSA
   carrying change information, which may trigger an SPF calculation
   and the rerouting of Label Switched Paths.  It is preferable to
   deliver this information faster than other LSAs in the network that
   are just once-in-30-minutes refreshes and typically would not
   trigger any route computation or route change.

   Given that there is a need to provide prioritized treatment to
   certain OSPF packets, the next natural question is how to
   facilitate this prioritization.

   If it is possible to examine the packet header (for the purpose of
   prioritization) much faster than processing the whole packet, then
   prioritized treatment is possible without any protocol changes.

   However, we also propose that a special marking be used to
   categorize all OSPF packets into one of two priority classes.  It
   is also important to mark OSPF packets separately from other IP
   packets.  One way to do this is to reserve two Diffserv codepoints,
   one for higher-priority OSPF packets and another for lower-priority
   OSPF packets.  With this special marking it would be easy for OSPF
   implementers to treat Hello, LSA Acknowledgment and other critical
   OSPF packets at a higher priority and thereby significantly improve
   the scalability and stability of networks using OSPF.
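   Both approaches are inexpensive to implement.  The Python sketch
   below illustrates them; since this draft does not assign codepoint
   values, the CS6/CS5 values shown (and the function names) are
   illustrative assumptions only.

   import socket

   TOS_OSPF_HIGH = 48 << 2  # DSCP CS6 in the upper 6 bits of TOS byte
   TOS_OSPF_LOW = 40 << 2   # DSCP CS5

   OSPF_PROTO = 89          # OSPF runs directly over IP as protocol 89

   # OSPF packet types (RFC 2328, Section A.3.1)
   HELLO, DBD, LS_REQ, LS_UPD, LS_ACK = 1, 2, 3, 4, 5
   HIGH_PRIORITY_TYPES = {HELLO, LS_ACK}

   def open_marked_socket(high_priority):
       # Mark outgoing OSPF packets by setting the IP TOS byte, so
       # both transit routers and the receiver can classify them
       # without parsing the OSPF payload.
       s = socket.socket(socket.AF_INET, socket.SOCK_RAW, OSPF_PROTO)
       tos = TOS_OSPF_HIGH if high_priority else TOS_OSPF_LOW
       s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
       return s

   def classify_unmarked(ospf_payload):
       # Fallback when no marking is present: peek at the type field
       # in the second byte of the OSPF header, which is far cheaper
       # than processing the whole packet.
       return "high" if ospf_payload[1] in HIGH_PRIORITY_TYPES \
           else "low"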
6. Summary

   In this draft we point out that the node processors of a large
   network may be subjected to sustained CPU/memory congestion as a
   result of a large LSA storm caused by some type of failure/recovery
   of nodes/links or by synchronization among refreshes.  There is a
   certain LSA storm size threshold above which the network may show
   unstable behavior caused by a large number of retransmissions, link
   failures due to missed Hello packets, and subsequent link
   recoveries.  Using a simulation study we show that the LSA storm
   size causing instability may be substantially increased by
   providing prioritized treatment to Hello and LSA Acknowledgment
   packets.  Furthermore, if we prioritize Hello packets, then even
   when the network operates somewhat above the stability threshold,
   links are not declared down due to missed Hellos.  This implies
   that even though there is control plane congestion due to many
   retransmissions, the data plane stays up and no new LSAs are
   generated (besides the ones in the original storm and the
   refreshes).

   Based on the above observations we propose the following:

   (1) Process Hello packets at a higher priority than other OSPF
       packets.  To facilitate this, explicitly mark Hello packets to
       differentiate them from other OSPF packets.  One way of special
       marking is to use a different Diffserv codepoint for Hello
       packets than for other OSPF packets.

   (2) In the absence of special marking, or in addition to it, use
       other mechanisms to avoid missing Hello packets.  One example
       is to treat any packet received over a link as a surrogate for
       a Hello packet (an implicit Hello) for the purpose of keeping
       the link alive.  Our simulation study shows that this mechanism
       is just as effective as explicitly prioritizing Hello packets.

   (3) The same type of explicit marking and prioritized treatment may
       be beneficial to other OSPF packets as well.  One important
       example is the LSA Acknowledgment packet, whose prioritized
       treatment can reduce retransmissions during periods of
       congestion.  Our simulation study shows that prioritizing both
       Hello and LSA Acknowledgment packets is considerably more
       effective than prioritizing Hello packets alone.  Other
       examples include (a) the Database Description (DBD) packet from
       a slave that is used as an acknowledgment, and (b) LSAs
       carrying intra-area topology change information.

   It is possible that some implementations already use one or more of
   the above mechanisms so as not to miss the processing of critical
   packets during periods of congestion.  However, we suggest that the
   above mechanisms be included in the standard so that all
   implementations can benefit from them.

7. Acknowledgments

   We would like to acknowledge Jerry Ash, Margaret Chiosi, Elie
   Francis, Jeff Han, Beth Munson, Roshan Rao, Moshe Segal, Mike
   Wardlow, and Pat Wirth for collaboration and encouragement in our
   scalability improvement efforts for Link-State-Protocol based
   networks.

8. References

   [Ref1]  D. Pappalardo, "AT&T, customers grapple with ATM net
           outage," Network World, February 26, 2001.

   [Ref2]  "AT&T announces cause of frame-relay network outage," AT&T
           Press Release, April 22, 1998.

   [Ref3]  K. Cholewka, "MCI Outage Has Domino Effect," Inter@ctive
           Week, August 20, 1999.

   [Ref4]  M. Jander, "In Qwest Outage, ATM Takes Some Heat," Light
           Reading, April 6, 2001.

   [Ref5]  C. Alaettinoglu, V. Jacobson and H. Yu, "Towards
           Millisecond IGP Convergence," Work in Progress.

   [Ref6]  A. Zinin and M. Shand, "Flooding Optimizations in Link-
           State Routing Protocols," Work in Progress.

   [Ref7]  J. Moy, "Flooding over Parallel Point-to-Point Links,"
           Work in Progress.

   [Ref8]  P. Pillay-Esnault, "OSPF Refresh and Flooding Reduction in
           Stable Topologies," Work in Progress.

   [Ref9]  G. Choudhury and V. Manral, "LSA Flooding Optimization
           Algorithms and Their Simulation Study," Work in Progress.

   [Ref10] J. Ash, G. Choudhury, V. Sapozhnikova, M. Sherif, A.
           Maunder and V. Manral, "Congestion Avoidance & Control for
           OSPF Networks," Work in Progress.
   [Ref11] B. M. Waxman, "Routing of Multipoint Connections," IEEE
           Journal on Selected Areas in Communications,
           6(9):1617-1622, 1988.

9. Authors' Addresses

   Gagan L. Choudhury
   AT&T
   Room D5-3C21
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Phone: (732) 420-3721
   Email: gchoudhury@att.com

   Vera D. Sapozhnikova
   AT&T
   Room C5-2C29
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Phone: (732) 420-2653
   Email: sapozhnikova@att.com

   Anurag S. Maunder
   Sanera Systems
   370 San Aleso Ave., Second Floor
   Sunnyvale, CA 94085
   USA
   Phone: (408) 734-6123
   Email: amaunder@sanera.net

   Vishwas Manral
   Netplane Systems
   189 Prashasan Nagar, Road Number 72
   Jubilee Hills, Hyderabad
   India
   Email: Vishwasm@netplane.com