Internet Engineering Task Force                     Gagan L. Choudhury
Internet Draft                                    Vera D. Sapozhnikova
Expires in May, 2003                                              AT&T
draft-ietf-ospf-scalability-02.txt                   Anurag S. Maunder
                                                        Sanera Systems
                                                        Vishwas Manral
                                                      Netplane Systems

                                                         November, 2002

      Explicit Marking and Prioritized Treatment of Specific OSPF
       Packets for Faster Convergence and Improved Network
                     Scalability and Stability

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Abstract

   In this draft we propose the following mechanisms to improve the
   scalability and stability of OSPF-based networks:

   (1) Process Hello packets at a higher priority than other OSPF
       packets.  To facilitate this, explicitly mark Hello packets to
       differentiate them from other OSPF packets.  One way of special
       marking is to use a different Diffserv codepoint for Hello
       packets than for other OSPF packets.

   (2) In the absence of special marking, or in addition to it, use
       other mechanisms to avoid missing Hello packets.  One example
       is to treat any packet received over a link as a surrogate for
       a Hello packet (an implicit Hello) for the purpose of keeping
       the link alive.

   (3) The same type of explicit marking and prioritized treatment may
       be beneficial to other OSPF packets as well.  One important
       example is the LSA Acknowledgment packet, whose prioritized
       treatment can reduce retransmissions during periods of
       congestion.  Other examples include (a) the Database
       Description (DBD) packet from a slave that is used as an
       acknowledgment, and (b) LSAs carrying intra-area topology
       change information.

   It is possible that some implementations already use one or more of
   the above mechanisms so as not to miss the processing of critical
   packets during periods of congestion.  However, we suggest that the
   above mechanisms be included in the standard so that all
   implementations can benefit from them.

Table of Contents

   1. Introduction
   2. The Network Under Simulation
   3. Simulation Results
   4. Observations on Simulation Results
   5. Need for Prioritized Treatment of Critical OSPF Packets and
      Special Marking to Facilitate That
   6. Summary
   7. Acknowledgments
   8. References
   9. Authors' Addresses
1. Introduction

   Due to world-wide increases in traffic demand, data networks keep
   growing in terms of number of nodes, number of links, adjacencies
   per node and Link State Database size.  Our motivation is to
   improve the ability of large networks to withstand the simultaneous
   or near-simultaneous update of a large number of link-state-
   advertisement messages, or LSAs.  We call this event an LSA storm.
   An LSA storm may be initiated for many reasons.  Here are some
   examples:

   (a) one or more link failures due to fiber cuts,

   (b) one or more node failures for some reason, e.g., a software
       crash or some type of disaster in an office complex hosting
       many nodes,

   (c) the need to take down and later bring back many nodes during a
       software/hardware upgrade,

   (d) near-synchronization of the once-in-30-minutes refresh instants
       of some types of LSAs,

   (e) refresh of all LSAs in the system during a change in software
       version.

   In addition to the LSAs generated as a direct result of link/node
   failures, there may be other, indirect LSAs as well.  One example
   in MPLS networks is traffic engineering LSAs generated at other
   links as a result of a significant change in reserved bandwidth
   caused by the rerouting of Label Switched Paths (LSPs) that went
   down during the link/node failure.

   The LSA storm causes high CPU and memory utilization at the node
   processors, causing incoming packets to be delayed or dropped.
   Delayed Acknowledgments (beyond the retransmission timer value)
   result in retransmissions, and delayed Hello packets (beyond the
   Router-Dead interval) result in links being declared down.  A
   trunk-down event causes Router LSA generation by its end-point
   nodes.  If traffic engineering LSAs are used for each link, then
   LSAs of that type would also be generated by the end-point nodes,
   and potentially elsewhere as well, due to significant changes in
   reserved bandwidth at other links caused by the failure and the
   rerouting of LSPs originally using the failed trunk.  Eventually,
   when the link recovers, that too triggers additional Router and
   traffic engineering LSAs.

   The retransmissions and additional LSA generations result in
   further CPU and memory usage, essentially creating a positive
   feedback loop.  We define the LSA storm size as the number of LSAs
   in the original storm, not counting any additional LSAs resulting
   from the feedback loop described above.  If the LSA storm is too
   large, the positive feedback loop may be strong enough to sustain
   high CPU and memory utilization at many network nodes indefinitely,
   thereby driving the network to an unstable state.

   In the past, network outage events have been reported in IP and ATM
   networks using link-state protocols such as OSPF, IS-IS, PNNI or
   proprietary variants.  See, for example, [Ref1]-[Ref4].  In many of
   these examples, large-scale flooding of LSAs or other similar
   control messages (either arising naturally or triggered by some bug
   or inappropriate procedure) was partly or fully responsible for the
   network instability and outage.

   It has been suggested [Ref5] that the Hello interval and Router-
   Dead interval be reduced significantly in order for OSPF to detect
   link failures and recoveries faster.  Reducing the Router-Dead
   interval would make it even more likely for links to be declared
   down due to missed Hellos.
   We use a simulation model to show that there is a certain LSA storm
   size threshold above which the network may show unstable behavior
   caused by a large number of retransmissions, link failures due to
   missed Hello packets, and subsequent link recoveries.  We also show
   that the LSA storm size causing instability may be substantially
   increased by providing prioritized treatment to Hello and LSA
   Acknowledgment packets.  Furthermore, if we prioritize Hello
   packets, then even when the network operates somewhat above the
   stability threshold, links are not declared down due to missed
   Hellos.  This implies that even though there is control plane
   congestion due to many retransmissions, the data plane stays up and
   no new LSAs are generated (besides the ones in the original storm
   and the refreshes).  Based on these observations we propose
   prioritized treatment of Hello, LSA Acknowledgment and other
   critical OSPF packets, and a special marking to facilitate that.

   One might argue that the scalability issue of large networks should
   be solved solely by dividing the network hierarchically into
   multiple areas so that flooding of LSAs remains localized within
   areas.  However, this approach increases network management and
   design complexity and may result in less optimal routing between
   areas.  Also, ASE LSAs are flooded throughout the AS, which may be
   a problem if there are large numbers of them.  Furthermore, a large
   number of Summary LSAs may need to be flooded across areas, and
   their number would increase significantly if multiple Area Border
   Routers are employed for reliability.  Thus it is important to
   allow the network to grow toward as large a size as possible within
   a single area.

   Our proposal here is synergistic with a broader set of scalability
   and stability improvement proposals.  [Ref6] and [Ref7] propose
   flooding overhead reduction in case more than one interface goes to
   the same neighbor.  [Ref8] proposes a mechanism for greatly
   reducing LSA refreshes in stable topologies.  [Ref9] compares
   several restricted flooding algorithms in terms of their ability to
   withstand large LSA storms and their robustness to failure
   conditions.  [Ref10] proposes a wide range of congestion control
   and failure recovery mechanisms.

   Section 2 describes the network under simulation and Section 3
   provides the simulation results.  Section 4 gives the basic
   observations based on the simulation results.  Section 5 explains
   the need for prioritized treatment of certain critical OSPF packets
   and special marking to facilitate that.  Section 6 gives the
   summary.

2. The Network Under Simulation

   We generate a random network over a rectangular grid using a
   modified version of Waxman's algorithm [Ref11] that ensures that
   the network is connected and has a pre-specified number of nodes,
   number of links, maximum number of neighbors per node, and maximum
   number of adjacencies per node.  The rectangular grid resembles the
   continental U.S.A., with a maximum one-way propagation delay of 30
   ms in the East-West direction and 15 ms in the North-South
   direction.  We consider two different network sizes, as explained
   in Section 3.
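   For concreteness, the following Python sketch shows the flavor of
   Waxman-style topology generation.  It is not the authors' exact
   modified algorithm; the parameter values (a, b, the grid
   dimensions) and the connect-then-densify strategy are illustrative
   assumptions only.

   import math, random

   def waxman_graph(n, m, max_deg, width=6000.0, height=3000.0,
                    a=0.4, b=0.2):
       # Scatter n nodes on a rectangular grid (km, roughly
       # continental-U.S.A. scale) and accept edge (u, v) with Waxman
       # probability a * exp(-d(u, v) / (b * L)), where d is Euclidean
       # distance and L is the largest possible distance on the grid.
       pts = [(random.uniform(0, width), random.uniform(0, height))
              for _ in range(n)]
       L = math.hypot(width, height)
       deg = [0] * n
       edges = set()

       def try_add(u, v):
           if u == v or (u, v) in edges or (v, u) in edges:
               return
           if deg[u] < max_deg and deg[v] < max_deg:
               edges.add((u, v))
               deg[u] += 1
               deg[v] += 1

       # Build a random spanning tree first, so the graph is connected.
       order = list(range(n))
       random.shuffle(order)
       for i in range(1, n):
           partners = [u for u in order[:i] if deg[u] < max_deg]
           try_add(order[i], random.choice(partners))

       # Densify to m links using the Waxman acceptance probability.
       while len(edges) < m:
           u, v = random.randrange(n), random.randrange(n)
           d = math.hypot(pts[u][0] - pts[v][0], pts[u][1] - pts[v][1])
           if random.random() < a * math.exp(-d / (b * L)):
               try_add(u, v)
       return pts, edges

   The single degree cap above stands in for the separate neighbor and
   adjacency limits used in the actual study.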
   The network has a flat, single-area topology.

   Each node is a Router, and each link is a point-to-point link
   connecting two Routers.

   We assume that node CPU and memory (not the link bandwidth) are the
   main bottleneck in the LSA flooding process.  This will typically
   be true for high-speed links (e.g., OC3 or above) and/or links
   where OSPF traffic gets an adequate Quality of Service (QoS)
   compared to other traffic.

   Different Timers:

      LSA refresh interval = 1800 seconds.

      Hello refresh interval = 10 seconds.

      Router-Dead interval = 40 seconds.

      LSA retransmission interval: two values are considered, 10
      seconds and 5 seconds (note that a retransmission is disabled on
      receipt of either an explicit Acknowledgment or a duplicate LSA
      over the same interface, which acts as an implicit
      Acknowledgment).

      Minimum time between successive generations of the same LSA = 5
      seconds.

      Minimum time between successive Dijkstra SPF calculations = 1
      second.

   Packing of LSAs: It is assumed that, for any given node, the LSAs
   generated over a 1-second period are packed together to form an
   LSU, but no more than 3 LSAs are packed into one LSU.

   LSU/Ack/Hello Processing Times: All processing times are expressed
   in terms of the parameter T.  Two values of T are considered, 1 ms
   and 0.5 ms.

   In the case of a dedicated processor for processing OSPF packets,
   the processing time reported represents the true processing time.
   If the processor does other work and only a fraction of its
   capacity can be dedicated to OSPF processing, then the processing
   time has to be inflated appropriately to get the effective
   processing time; in that case it is assumed that the inflation
   factor is already taken into account in the reported processing
   time.

   The fixed time to send or receive any LSU, Ack or Hello packet is
   T.  In addition, a variable processing time is used for LSUs and
   Acks, depending on the number and types of LSAs packed.  No
   variable processing time is used for Hellos.  The variable
   processing time per Router LSA is (0.5 + 0.17L)T, where L is the
   number of adjacencies advertised by the Router LSA.  For other LSA
   types (e.g., an ASE LSA or a "Link" LSA carrying traffic
   engineering information about a link), the variable processing time
   per LSA is 0.5T.

   The variable processing time for an Ack is 25% of that of the
   corresponding LSA.

   It is to be noted that if multiple LSAs are packed in a single LSU
   packet, the fixed processing time is incurred only once, but the
   variable processing time is incurred for every component of the
   packet.

   The processing time values we use are roughly in the same range as
   those observed in an operational network.

   LSU/Ack/Hello Priority: Two non-preemptive priority levels and
   three priority scenarios are considered.  Within each priority
   level, processing is FIFO, with newly arriving lower-priority
   packets being dropped when the lower-priority queue is full.
   Higher-priority packets are never dropped.

      In Priority scenario 1, all LSUs/Acks/Hellos received at a node
      are queued at the lower priority.

      In Priority scenario 2, Hellos received at a node are queued at
      the higher priority, but LSUs/Acks are queued at the lower
      priority.

      In Priority scenario 3, Hellos and Acks received at a node are
      queued at the higher priority, but LSUs are queued at the lower
      priority.

   All packets generated internally at a node (usually triggered by a
   timer) are processed at the higher priority.  This includes the
   initial LSA storm, LSA refreshes, Hello refreshes, LSA
   retransmissions, and new LSA generation after detection of a
   failure or recovery.
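   As a minimal illustration, the Python sketch below captures the
   service-time model and queueing discipline just described.  The
   packet representation (dictionaries with "kind" and "lsas" keys)
   and the class/function names are hypothetical conveniences for the
   sketch, not part of the model specification.

   import collections

   T = 0.001  # base processing-time parameter; 1 ms and 0.5 ms studied

   def processing_time(pkt):
       # Fixed cost of T to send or receive any LSU, Ack or Hello,
       # plus a variable cost per packed LSA: (0.5 + 0.17*L)*T for a
       # Router LSA advertising L adjacencies, 0.5*T for other LSA
       # types, and 25% of the corresponding LSA cost when the packet
       # is an Ack.  Hellos carry no LSAs, so only the fixed cost.
       t = T
       scale = 0.25 if pkt["kind"] == "ack" else 1.0
       for lsa in pkt.get("lsas", []):       # at most 3 LSAs per LSU
           if lsa["type"] == "router":
               t += scale * (0.5 + 0.17 * lsa["adjacencies"]) * T
           else:                             # ASE or TE "Link" LSA
               t += scale * 0.5 * T
       return t

   class OspfInputQueues:
       # Two non-preemptive priority levels, FIFO within each level.
       # New lower-priority arrivals are dropped when that queue is
       # full (2000 packets in the simulation); higher-priority
       # packets are never dropped.
       def __init__(self, low_capacity=2000):
           self.high = collections.deque()
           self.low = collections.deque()
           self.low_capacity = low_capacity

       def enqueue(self, pkt, high_priority):
           if high_priority:
               self.high.append(pkt)
           elif len(self.low) < self.low_capacity:
               self.low.append(pkt)
           # else: the packet is silently dropped

       def dequeue(self):
           # Non-preemptive: the priority level is consulted only
           # between services, never during one.
           if self.high:
               return self.high.popleft()
           return self.low.popleft() if self.low else None

   Which received packets are enqueued with high_priority=True is
   exactly what distinguishes the three priority scenarios.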
   Buffer Size for Incoming LSUs/Acks/Hellos (lower priority): The
   buffer size is assumed to be 2000 packets, where a packet is either
   an Ack, an LSU, or a Hello.

   LSA Refresh: Each LSA is refreshed once in 1800 seconds, and the
   refresh instants of the various LSAs in the LSDB are assumed to be
   uniformly distributed over the 1800-second period, i.e., completely
   unsynchronized.  If, however, an LSA is generated as part of the
   initial LSA storm, it goes on a new refresh schedule of once in
   1800 seconds, starting from its generation time.

   LSA Storm Generation: As defined earlier, an "LSA storm" is the
   simultaneous or near-simultaneous generation of a large number of
   LSAs.  In the case of only Router and ASE LSAs, we normally assume
   that the number of ASE LSAs in the storm is about 4 times that of
   the Router LSAs, but the ratio is allowed to change if either the
   Router or the ASE LSAs have reached their maximum possible value.
   In the case of only Router and Link LSAs (carrying traffic
   engineering information), we normally assume that the number of
   Link LSAs in the storm is about 4 times that of the Router LSAs,
   but the ratio is allowed to change if either the Router or the Link
   LSAs have reached their maximum possible value.  For any given LSA
   storm we keep generating LSAs, starting from Node index 1 and
   moving upwards, and stop when the required number of LSAs of each
   type has been generated.  The LSAs generated at any given node are
   assumed to start at an instant uniformly distributed between 20 and
   30 seconds from the start of the simulation.  Successive LSA
   generations at a node are assumed to be spaced 400 ms apart.  It is
   to be noted that during the period of observation there are other
   LSAs generated besides the ones in the storm.  These include
   refreshes of LSAs that are not part of the storm, as well as LSAs
   generated due to possible link failures and subsequent possible
   link recoveries.

   Failure/Recovery of Links: If no Hello is received over a link (due
   to CPU/memory congestion) for longer than the Router-Dead interval,
   the link is declared down.  If Hellos are received again at a later
   time, the link is declared up.  Whenever a link is declared up or
   down, one Router LSA is generated by each Router on the two sides
   of the point-to-point link.  If "Link LSAs" carrying traffic
   engineering information are used, it is assumed that each Router
   also generates a Link LSA.  In this case it is further assumed
   that, due to rerouting of LSPs, three other links in the network
   (selected randomly in the simulation) experience a significant
   change in reserved bandwidth, which results in one Link LSA being
   generated by the Routers on the two ends of each such link.
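   The failure/recovery rule can be summarized in a few lines of
   Python; in the sketch below (class and method names are ours,
   purely for illustration) one instance models one link end-point.

   ROUTER_DEAD_INTERVAL = 40.0   # seconds; 4 x the 10-second Hello

   class LinkMonitor:
       # A link is declared down when no Hello has been seen for the
       # Router-Dead interval, and up again once Hellos resume.  Each
       # up/down transition makes both end-points originate a new
       # Router LSA (and, if TE "Link" LSAs are in use, Link LSAs).
       def __init__(self, now):
           self.last_hello = now
           self.up = True

       def on_hello(self, now):
           self.last_hello = now
           if not self.up:
               self.up = True        # recovery: originate new LSAs

       def poll(self, now):
           if self.up and now - self.last_hello > ROUTER_DEAD_INTERVAL:
               self.up = False       # failure: originate new LSAs
           return self.up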
3. Simulation Results

   In this section we study the relative performance of the three
   Priority scenarios defined earlier (no priority to Hello or Ack,
   priority to Hello only, and priority to both Hello and Ack) over a
   range of network sizes, LSA retransmission timer values, LSA types,
   processing time values and Hello/Router-Dead-Interval values:

   Network size: Two networks are considered.  Network 1 has 100 nodes
   and 1200 links; the maximum number of neighbors per node is 30, and
   the maximum number of adjacencies per node is 50 (the same neighbor
   may have more than one adjacency).  Network 2 has 50 nodes and 600
   links; the maximum number of neighbors per node is 25, and the
   maximum number of adjacencies per node is 48.  The Dijkstra SPF
   calculation time is assumed to be 100 ms for Network 1 and 70 ms
   for Network 2.

   LSA Type: Each node has 1 Router LSA (a total of 100 for Network 1
   and 50 for Network 2).  There are no Network LSAs, since all links
   are point-to-point links, and no Summary LSAs, since the network
   has only one area.  Regarding other LSA types, we consider two
   situations.  In Situation 1 we assume that there are no ASE LSAs
   and that each link has one "Link" LSA carrying traffic engineering
   information (a total of 2400 for Network 1 and 1200 for Network 2).
   In Situation 2 we assume that there are no "Link" LSAs, that half
   of the nodes are AS-Border nodes, and that each border node has 10
   ASE LSAs (a total of 500 for Network 1 and 250 for Network 2).  We
   identify Situation 1 as "Link LSAs" and Situation 2 as "ASE LSAs".

   LSA retransmission timer value: Two values are considered, 10
   seconds and 5 seconds (the default value).

   Processing time values: Processing times for LSUs, Acks and Hello
   packets have been expressed in Section 2 in terms of a common
   parameter T.  Two values are considered for T, 1 ms and 0.5 ms.

   Hello/Router-Dead-Interval: It is assumed that the Router-Dead
   interval is four times the Hello interval.  In one case the Hello
   interval is 10 seconds and the Router-Dead interval 40 seconds (the
   default values); in the other case the Hello interval is 2 seconds
   and the Router-Dead interval 8 seconds.

   Based on the network size, LSA type, timer values and processing
   time values, we develop 6 test cases as follows:

   Case 1: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 2: Network 1, ASE LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 3: Network 1, Link LSAs, retransmission timer = 5 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 4: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 0.5 ms, Hello/Router-Dead-Interval = 10/40 sec.

   Case 5: Network 1, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 2/8 sec.

   Case 6: Network 2, Link LSAs, retransmission timer = 10 sec.,
           T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.
   For each case and for each Priority scenario we study the network
   stability as a function of the size of the LSA storm.  The
   stability is determined by looking at the number of non-converged
   LSUs as a function of time.  An example is shown in Table 1 for
   Case 1 and Priority scenario 1 (no priority to Hellos or Acks).

   ==========|========================================================
             |   Number of Non-Converged LSUs in the Network at Time
    LSA      |                      (in seconds)
    STORM    |=====|=====|=====|=====|=====|=====|=====|=====|======|
    SIZE     | 10s | 20s | 30s | 35s | 40s | 50s | 60s | 80s | 100s |
   ==========|=====|=====|=====|=====|=====|=====|=====|=====|======|
    100      |  0  |  0  |  24 |  29 |  24 |  1  |  0  |  1  |   1  |
    (Stable) |     |     |     |     |     |     |     |     |      |
   ----------|-----|-----|-----|-----|-----|-----|-----|-----|------|
    140      |  0  |  0  |  35 |  48 |  46 |  27 |  14 |  1  |   1  |
    (Stable) |     |     |     |     |     |     |     |     |      |
   ----------|-----|-----|-----|-----|-----|-----|-----|-----|------|
    160      |  0  |  0  |  38 |  57 |  55 |  40 |  26 |  65 |  203 |
   (Unstable)|     |     |     |     |     |     |     |     |      |
   ==========|========================================================

             Table 1: Network Stability vs. LSA Storm Size
                  (Case 1, No priority to Hello/Ack)

   The LSA storm starts a little after 20 seconds, so in a stable
   network the number of non-converged LSUs should stay high for some
   period after that and then come down.  This happens for LSA storms
   of sizes 100 and 140.  With an LSA storm of size 160, the number of
   non-converged LSUs stays high indefinitely, due to repeated
   retransmissions, link failures caused by Hellos being missed for
   longer than the Router-Dead interval (which generates additional
   LSAs), and subsequent link recoveries (which again generate
   additional LSAs).  We define the network stability threshold as the
   maximum LSA storm size for which the number of non-converged LSUs
   comes down to a low level after some time.  It turns out that for
   this example the stability threshold is 150.
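   In code form, the stability test and the threshold search might
   look as follows.  This is only a sketch of how such a threshold
   could be located; the settle time and "low level" cut-off are
   illustrative values rather than parameters taken from the
   simulation, and run_simulation is a hypothetical callable returning
   a list of (time, non-converged-LSU-count) samples.

   def is_stable(samples, settle_after=100.0, low_level=5):
       # Stable if the non-converged LSU count has returned to a low
       # level by `settle_after` seconds and stays there.
       return all(count <= low_level
                  for t, count in samples if t >= settle_after)

   def stability_threshold(run_simulation, lo=0, hi=1000):
       # Binary search for the largest storm size whose run is stable,
       # assuming stability is monotone in storm size (as Table 1
       # suggests: sizes 100 and 140 are stable, 160 is not).
       while lo < hi:
           mid = (lo + hi + 1) // 2
           if is_stable(run_simulation(mid)):
               lo = mid
           else:
               hi = mid - 1
       return lo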
   The network behavior as a function of the LSA storm size can be
   categorized as follows:

   (1) If the LSA storm is well below the stability threshold, the
       CPU/memory congestion lasts only for a short period, during
       which there are very few retransmissions, very few dropped OSPF
       packets, and no link failures due to missed Hellos.  LSA storms
       of this type are observed routinely in operational networks,
       and networks recover from them easily.

   (2) If the LSA storm is just below the stability threshold, the
       CPU/memory congestion lasts for a longer period, during which
       there may be a considerable number of retransmissions and
       dropped OSPF packets.  If Hello packets are not given priority,
       there may also be some link failures due to missed Hellos.
       However, the network does eventually return to a stable state.
       LSA storms of this type may happen rarely in operational
       networks, which recover from them with some difficulty.

   (3) If the LSA storm is above the stability threshold, the
       CPU/memory congestion may last indefinitely unless some special
       procedure for relieving congestion is followed.  During this
       period there are a considerable number of retransmissions and
       dropped OSPF packets.  If Hello packets are not given priority,
       there are also link failures due to missed Hellos.  LSA storms
       of this type may happen very rarely in operational networks,
       and usually some manual procedure, such as taking down
       adjacencies at heavily congested nodes, is needed.

   (4) If Hello packets are given priority, the network stability
       threshold increases, i.e., the network can withstand a larger
       LSA storm.  Furthermore, even if the network operates at or
       somewhat above this higher stability threshold, Hellos are
       still not missed, and so there are no link failures.  So even
       if there is congestion in the control plane due to increased
       retransmissions, requiring some special procedures for
       congestion reduction, the data plane remains unaffected.

   (5) If both Hello and Acknowledgment packets are given priority,
       the stability threshold increases even further.

   In Table 2 we show the network stability threshold for the six
   different cases and the three different priority scenarios defined
   earlier.

   |========|======================================================|
   |        |        Maximum Allowable LSA Storm Size For          |
   |  Case  |================|==================|==================|
   | Number | No Priority to |Priority to Hello |Priority to Hello |
   |        |  Hello or Ack  |       Only       |     and Ack      |
   |========|================|==================|==================|
   | Case 1 |      150       |       190        |       250        |
   |--------|----------------|------------------|------------------|
   | Case 2 |      185       |       215        |       285        |
   |--------|----------------|------------------|------------------|
   | Case 3 |      115       |       127        |       170        |
   |--------|----------------|------------------|------------------|
   | Case 4 |      320       |       375        |       580        |
   |--------|----------------|------------------|------------------|
   | Case 5 |      120       |       175        |       225        |
   |--------|----------------|------------------|------------------|
   | Case 6 |      185       |       224        |       285        |
   |========|================|==================|==================|

      Table 2: Maximum Allowable LSA Storm for a Stable Network

4. Observations on Simulation Results

   Table 2 shows that in all cases prioritizing Hello packets
   increases the network stability threshold and that, in addition,
   prioritizing LSA Acknowledgment packets increases the stability
   threshold even further.  The reasons for these observations are as
   follows.  The main sources of sustained CPU/memory congestion (the
   positive feedback loop) following an LSA storm are (1) LSA
   retransmissions, and (2) links being declared down due to missed
   Hellos, which in turn causes further LSA generation, with the
   eventual recovery of those links causing even more LSA generation.
   Prioritizing Hello packets practically eliminates the second source
   of congestion.  Prioritizing Acknowledgments significantly reduces
   the first source of congestion, i.e., LSA retransmissions.  It is
   to be noted that retransmissions cannot be completely eliminated,
   for two reasons.  Firstly, only explicit Acknowledgments are
   prioritized; duplicate LSAs carrying implicit Acknowledgments are
   still served at the lower priority.  Secondly, LSAs may be greatly
   delayed or dropped at the input queue of receivers, in which case
   Acknowledgments are never generated and prioritizing Acks cannot
   help.  Another factor to keep in mind is that, since Hellos and
   Acks are prioritized, LSAs see a bigger delay and a greater
   potential for being dropped.  However, the simulation results show
   that, on the whole, prioritizing Hellos and LSA Acks is always
   beneficial and significantly improves the network stability
   threshold.

   Our simulation study also showed that in each of the cases, if
   instead of prioritizing Hello packets we treat any packet received
   over a link as a surrogate for a Hello packet (an implicit Hello),
   then we obtain about the same stability threshold as with
   prioritized Hello packets.
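   The implicit-Hello rule is simple to state in code.  In the sketch
   below (names are ours, for illustration only), any received OSPF
   packet, not just a Hello, refreshes the neighbor's inactivity
   timer, so a congested neighbor that is still emitting protocol
   traffic is not declared down merely because its Hellos are stuck
   behind other packets.

   class ImplicitHelloMonitor:
       def __init__(self, router_dead_interval=40.0):
           self.dead_interval = router_dead_interval
           self.deadline = None

       def on_ospf_packet(self, now):
           # Any OSPF packet (LSU, Ack, DBD, ...) received over the
           # link acts as a surrogate Hello and rearms the timer.
           self.deadline = now + self.dead_interval

       def neighbor_alive(self, now):
           return self.deadline is not None and now <= self.deadline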
   If we prioritize Hello packets, then even when the network operates
   somewhat above the stability threshold, links are not declared down
   due to missed Hellos.  This implies that even though there is
   control plane congestion due to many retransmissions, the data
   plane stays up and no new LSAs are generated (besides the ones in
   the original storm and the refreshes).

5. Need for Prioritized Treatment of Critical OSPF Packets and
   Special Marking to Facilitate That

   The observations in the previous section clearly show that
   prioritizing Hello and LSA Acknowledgment packets is greatly
   beneficial in improving the scalability and stability of large
   networks.  In addition to these packets, it may be beneficial to
   treat certain other OSPF packets at the higher priority as well.
   One example (during the database exchange process between neighbors
   following a link recovery) is the Database Description packet from
   a slave that is used as an acknowledgment for the previous Database
   Description packet sent by the master.  Another example is an LSA
   carrying change information, which may trigger an SPF calculation
   and the rerouting of Label Switched Paths.  It is preferable to
   deliver this information faster than other LSAs in the network that
   are just once-in-30-minutes refreshes and typically would not
   trigger any route computation or route change.

   Given that there is a need to provide prioritized treatment to
   certain OSPF packets, the next natural question is how to
   facilitate this prioritization.

   If it is possible to examine the packet header (for the purpose of
   prioritization) much faster than processing the whole packet, then
   prioritized treatment is possible without any protocol changes.

   However, we also propose that a special marking be used to
   categorize all OSPF packets into one of two priority classes.  It
   is also important to mark OSPF packets separately from other IP
   packets.  One way to do this is to reserve two Diffserv codepoints,
   one for higher-priority OSPF packets and another for lower-priority
   OSPF packets.  With this special marking it would be easy for OSPF
   implementers to treat Hello, LSA Acknowledgment and other critical
   OSPF packets at a higher priority and thereby significantly improve
   the scalability and stability of networks using OSPF.
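   Both approaches are inexpensive to implement.  The Python sketch
   below illustrates them; since this draft does not assign codepoint
   values, the CS6/CS5 values shown (and the function names) are
   illustrative assumptions only.

   import socket

   TOS_OSPF_HIGH = 48 << 2  # DSCP CS6 in the upper 6 bits of TOS byte
   TOS_OSPF_LOW = 40 << 2   # DSCP CS5

   OSPF_PROTO = 89          # OSPF runs directly over IP as protocol 89

   # OSPF packet types (RFC 2328, Section A.3.1)
   HELLO, DBD, LS_REQ, LS_UPD, LS_ACK = 1, 2, 3, 4, 5
   HIGH_PRIORITY_TYPES = {HELLO, LS_ACK}

   def open_marked_socket(high_priority):
       # Mark outgoing OSPF packets by setting the IP TOS byte, so
       # both transit routers and the receiver can classify them
       # without parsing the OSPF payload.
       s = socket.socket(socket.AF_INET, socket.SOCK_RAW, OSPF_PROTO)
       tos = TOS_OSPF_HIGH if high_priority else TOS_OSPF_LOW
       s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
       return s

   def classify_unmarked(ospf_payload):
       # Fallback when no marking is present: peek at the type field
       # in the second byte of the OSPF header, which is far cheaper
       # than processing the whole packet.
       return "high" if ospf_payload[1] in HIGH_PRIORITY_TYPES \
           else "low"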
6. Summary

   In this draft we point out that the node processors of a large
   network may be subjected to sustained CPU/memory congestion as a
   result of a large LSA storm caused by some type of failure/recovery
   of nodes/links or by synchronization among refreshes.  There is a
   certain LSA storm size threshold above which the network may show
   unstable behavior caused by a large number of retransmissions, link
   failures due to missed Hello packets, and subsequent link
   recoveries.  Using a simulation study we show that the LSA storm
   size causing instability may be substantially increased by
   providing prioritized treatment to Hello and LSA Acknowledgment
   packets.  Furthermore, if we prioritize Hello packets, then even
   when the network operates somewhat above the stability threshold,
   links are not declared down due to missed Hellos.  This implies
   that even though there is control plane congestion due to many
   retransmissions, the data plane stays up and no new LSAs are
   generated (besides the ones in the original storm and the
   refreshes).

   Based on the above observations we propose the following:

   (1) Process Hello packets at a higher priority than other OSPF
       packets.  To facilitate this, explicitly mark Hello packets to
       differentiate them from other OSPF packets.  One way of special
       marking is to use a different Diffserv codepoint for Hello
       packets than for other OSPF packets.

   (2) In the absence of special marking, or in addition to it, use
       other mechanisms to avoid missing Hello packets.  One example
       is to treat any packet received over a link as a surrogate for
       a Hello packet (an implicit Hello) for the purpose of keeping
       the link alive.  Our simulation study shows that this mechanism
       is just as effective as explicitly prioritizing Hello packets.

   (3) The same type of explicit marking and prioritized treatment may
       be beneficial to other OSPF packets as well.  One important
       example is the LSA Acknowledgment packet, whose prioritized
       treatment can reduce retransmissions during periods of
       congestion.  Our simulation study shows that prioritizing both
       Hello and LSA Acknowledgment packets is considerably more
       effective than prioritizing Hello packets alone.  Other
       examples include (a) the Database Description (DBD) packet from
       a slave that is used as an acknowledgment, and (b) LSAs
       carrying intra-area topology change information.

   It is possible that some implementations already use one or more of
   the above mechanisms so as not to miss the processing of critical
   packets during periods of congestion.  However, we suggest that the
   above mechanisms be included in the standard so that all
   implementations can benefit from them.

7. Acknowledgments

   We would like to acknowledge Jerry Ash, Margaret Chiosi, Elie
   Francis, Jeff Han, Beth Munson, Roshan Rao, Moshe Segal, Mike
   Wardlow, and Pat Wirth for collaboration and encouragement in our
   scalability improvement efforts for Link-State-Protocol based
   networks.

8. References

   [Ref1]  D. Pappalardo, "AT&T, customers grapple with ATM net
           outage," Network World, February 26, 2001.

   [Ref2]  "AT&T announces cause of frame-relay network outage," AT&T
           Press Release, April 22, 1998.

   [Ref3]  K. Cholewka, "MCI Outage Has Domino Effect," Inter@ctive
           Week, August 20, 1999.

   [Ref4]  M. Jander, "In Qwest Outage, ATM Takes Some Heat," Light
           Reading, April 6, 2001.

   [Ref5]  C. Alaettinoglu, V. Jacobson and H. Yu, "Towards
           Millisecond IGP Convergence," Work in Progress.

   [Ref6]  A. Zinin and M. Shand, "Flooding Optimizations in Link-
           State Routing Protocols," Work in Progress.

   [Ref7]  J. Moy, "Flooding over Parallel Point-to-Point Links,"
           Work in Progress.

   [Ref8]  P. Pillay-Esnault, "OSPF Refresh and Flooding Reduction in
           Stable Topologies," Work in Progress.

   [Ref9]  G. Choudhury and V. Manral, "LSA Flooding Optimization
           Algorithms and Their Simulation Study," Work in Progress.

   [Ref10] J. Ash, G. Choudhury, V. Sapozhnikova, M. Sherif, A.
           Maunder and V. Manral, "Congestion Avoidance & Control for
           OSPF Networks," Work in Progress.
   [Ref11] B. M. Waxman, "Routing of Multipoint Connections," IEEE
           Journal on Selected Areas in Communications,
           6(9):1617-1622, 1988.

9. Authors' Addresses

   Gagan L. Choudhury
   AT&T
   Room D5-3C21
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Phone: (732) 420-3721
   Email: gchoudhury@att.com

   Vera D. Sapozhnikova
   AT&T
   Room C5-2C29
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   Phone: (732) 420-2653
   Email: sapozhnikova@att.com

   Anurag S. Maunder
   Sanera Systems
   370 San Aleso Ave., Second Floor
   Sunnyvale, CA 94085
   USA
   Phone: (408) 734-6123
   Email: amaunder@sanera.net

   Vishwas Manral
   Netplane Systems
   189 Prashasan Nagar, Road Number 72
   Jubilee Hills, Hyderabad
   India
   Email: Vishwasm@netplane.com