Internet Engineering Task Force                     Gagan L. Choudhury
Internet Draft                                    Vera D. Sapozhnikova
Expires in November, 2003                                Gerald R. Ash
Category: Best Current Practice                                   AT&T
draft-ietf-ospf-scalability-05.txt
                                                      Anurag S. Maunder
                                                      Erlang Technology

                                                        Vishwas Manral
                                                          Motorola Inc.

                                                              May, 2003

             Prioritized Treatment of Specific OSPF Packets
                        and Congestion Avoidance

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Distribution of this memo is unlimited.

IPR Notices

(A) The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights.  Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11.  Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.

(B) The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard.  Please address the information to the IETF Executive
Director.

Copyright Notice

(C) Copyright (C) The Internet Society (date).  All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.  However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Abstract

This document recommends methods that are intended to improve the
scalability and stability of large networks using the OSPF (Open
Shortest Path First) protocol.
The methods include processing OSPF Hellos and LSA (Link State
Advertisement) Acknowledgments at a higher priority than other OSPF
packets, as well as other congestion avoidance procedures.
Simulation results in support of some of the recommendations are
given in the appendix sections.

Table of Contents

1. Introduction
2. Recommendations
3. Security Considerations
4. Acknowledgments
5. Normative References
6. Informative References
7. Authors' Addresses
Appendix A. LSA Storm: Causes and Impact
Appendix B. Simulation Study
Appendix B.1. The Network Under Simulation
Appendix B.2. Simulation Results
Appendix B.3. Observations on Simulation Results
Appendix C. Other Recommendations

1. Introduction

A large network running the OSPF [Ref1] or OSPF-TE [Ref2] protocol
may occasionally experience the simultaneous or near-simultaneous
update of a large number of link-state advertisements (LSAs).  We
call this event an LSA storm; it may be initiated by an unscheduled
failure or a scheduled maintenance event.  The failure may be
hardware, software, or procedural in nature.

An LSA storm causes high CPU and memory utilization at the router,
causing incoming packets to be delayed or dropped.  Delayed
acknowledgments (beyond the retransmission timer value) result in
retransmissions, and delayed Hello packets (beyond the router-dead
interval) result in neighbor adjacencies being declared down.  The
retransmissions and additional LSA originations result in further CPU
and memory usage, essentially causing a positive feedback loop,
which, in the extreme case, may drive the network to an unstable
state.

The default value of the retransmission timer is 5 seconds, and that
of the router-dead interval is 40 seconds.  However, there has
recently been considerable interest in significantly reducing OSPF
convergence time.  As part of that effort, much shorter (subsecond)
Hello and router-dead intervals have been proposed [Ref3].  In such a
scenario, Hello packets are more likely to be delayed beyond the
router-dead interval during network congestion caused by an LSA
storm.

Appendix A explains LSA storm scenarios and their impact in more
detail, and points out a few real-life examples of control-message
storms.  Appendix B presents a simulation study of this phenomenon.

In order to improve the scalability and stability of networks, we
recommend steps for prioritizing critical OSPF packets and avoiding
congestion.  The details of the recommendations are given in
Section 2.  We also present a simulation study of a subset of the
recommendations in Appendix B and show that they indeed improve the
scalability and stability of networks using the OSPF protocol.

Appendix C provides some further recommendations with similar goals.

2. Recommendations

The recommendations below are intended to improve the scalability and
stability of large networks using the OSPF protocol.  During periods
of network congestion they reduce retransmissions, avoid adjacencies
being declared down because Hello packets are delayed beyond the
RouterDeadInterval, and take other congestion avoidance steps.  The
recommendations are unordered, except that Recommendation 2 is to be
implemented only if Recommendation 1 is not implemented.

(1) Classify all OSPF packets into two classes: a "high priority"
    class comprising OSPF Hello packets and Link State Acknowledgment
    packets, and a "low priority" class comprising all other packets.
    The classification is accomplished by examining the OSPF packet
    header.  While receiving a packet from a neighbor and while
    transmitting a packet to a neighbor, try to process a "high
    priority" packet ahead of a "low priority" packet.

(2) If Recommendation 1 cannot be implemented, then reset the
    inactivity timer for an adjacency whenever any OSPF unicast
    packet, or any OSPF packet sent to 224.0.0.5 over a
    point-to-point link, is received over that adjacency, instead of
    resetting the inactivity timer only on receipt of a Hello packet.
    OSPF would then declare the adjacency to be down only if no OSPF
    unicast packets and no OSPF packets sent to 224.0.0.5 over a
    point-to-point link are received over that adjacency for a period
    equaling or exceeding the RouterDeadInterval.  The reason for not
    recommending this proposal in conjunction with Recommendation 1
    is to avoid potential undesirable side effects.  One such effect
    is the delay in discovering the down status of an adjacency in a
    case where no high priority Hello packets are being received but
    the inactivity timer is being reset by other stale packets in the
    low priority queue.

(3) Use an exponential backoff algorithm for determining the value of
    the LSA retransmission interval (RxmtInterval).  Let R(i)
    represent the RxmtInterval value used during the i-th
    retransmission of an LSA.  Use the following algorithm to compute
    R(i):

       R(1)   = Rmin
       R(i+1) = Min(K*R(i), Rmax)   for i >= 1

    where K, Rmin and Rmax are constants and the function Min(.,.)
    represents the minimum value of its two arguments.  Example
    values for K, Rmin and Rmax may be 2, 5 seconds and 40 seconds,
    respectively.  Note that the example value for Rmin, the initial
    retransmission interval, is the same as the sample value of
    RxmtInterval in [Ref1].

    This recommendation is motivated by the observation that during a
    network congestion event caused by control messages, a major
    source for sustaining the congestion is the repeated
    retransmission of LSAs.  The use of an exponential backoff
    algorithm for the LSA retransmission interval reduces the rate of
    LSA retransmissions while the network experiences congestion
    (during which multiple retransmissions of the same LSA are more
    likely).  This in turn helps the network get out of the congested
    state.
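
    For illustration only, here is a minimal sketch of the backoff
    computation in Recommendation 3, written in Python and using the
    example constants above (the function name and defaults are
    hypothetical, not part of this document):

       def rxmt_interval(i, rmin=5.0, rmax=40.0, k=2.0):
           """Return R(i), the retransmission interval in seconds
           used for the i-th retransmission of an LSA."""
           assert i >= 1
           # R(1) = Rmin; each later interval is K times the previous
           # one, capped at Rmax.
           return min(rmin * (k ** (i - 1)), rmax)

       # With the example constants, successive intervals are
       # 5, 10, 20, 40, 40, ... seconds.
       intervals = [rxmt_interval(i) for i in range(1, 6)]
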

(4) Implicit Congestion Detection and Action Based on That: If there
    is control message congestion at a router, its neighbors do not
    know about it explicitly.  However, they can detect it implicitly
    from the number of unacknowledged LSAs to this router.  If this
    number exceeds a certain "high water mark", then the rate at
    which LSAs are sent to this router should be reduced.  At a
    future time, if the number of unacknowledged LSAs to this router
    falls below a certain "low water mark", then the normal rate of
    sending LSAs to this router should be resumed.  An example value
    for the "high water mark" may be 20 unacknowledged LSAs and that
    for the "low water mark" may be 10 unacknowledged LSAs.  An
    example value for the rate on exceeding the "high water mark" may
    be 50% of the normal rate.  This recommendation is to be
    implemented only for unicast LSAs sent to a neighbor or LSAs sent
    to 224.0.0.5 over a point-to-point link.  (An illustrative sketch
    of this mechanism is given below, after Recommendation 5.)

    Recommendations 3 and 4 both slow down LSAs to congested
    neighbors based on implicitly detecting the congestion, but they
    have important differences.  Recommendation 3 progressively slows
    down successive retransmissions of the same LSA, whereas
    Recommendation 4 slows down all LSAs (new or retransmitted) to a
    congested neighbor.

(5) Throttling Adjacencies to be Brought Up Simultaneously: If a
    router tries to bring up a large number of adjacencies to its
    neighbors simultaneously, then that may cause severe congestion
    due to database synchronization and LSA flooding activities.  It
    is recommended that in such a situation no more than "n"
    adjacencies be brought up simultaneously.  Once a subset of
    adjacencies has been brought up successfully, newer adjacencies
    may be brought up as long as the number of adjacencies being
    brought up simultaneously does not exceed "n".  The appropriate
    value of "n" depends on the router processing power, link
    bandwidth and propagation delay.  An example value may be 4.

    In the presence of throttling, an important issue is the order in
    which adjacencies are to be formed.  We recommend a First Come
    First Served (FCFS) policy based on the order in which the
    requests for adjacency formation arrive.  Requests may be either
    from neighbors or self-generated.  Among the self-generated
    requests, a priority list may be used to decide the order in
    which the requests are to be made.  However, once an adjacency
    formation process starts, it is not to be preempted except under
    unusual circumstances such as errors or time-outs.
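
For illustration only, the following minimal Python sketch shows one
way to track the per-neighbor congestion state of Recommendation 4.
The class and method names are hypothetical, and the defaults are the
example values quoted above, not requirements of this document:

   class NeighborFloodState:
       """Per-neighbor state for the high/low water mark scheme."""

       def __init__(self, high_water=20, low_water=10,
                    reduced_rate_factor=0.5):
           self.high_water = high_water
           self.low_water = low_water
           self.reduced_rate_factor = reduced_rate_factor
           self.unacked = 0        # LSAs sent but not yet acknowledged
           self.congested = False

       def on_lsa_sent(self):
           self.unacked += 1
           if self.unacked > self.high_water:
               self.congested = True    # slow down LSAs to this neighbor

       def on_lsa_acked(self):
           self.unacked = max(0, self.unacked - 1)
           if self.unacked < self.low_water:
               self.congested = False   # resume the normal rate

       def allowed_rate(self, normal_rate):
           # e.g., LSAs per second permitted toward this neighbor
           if self.congested:
               return normal_rate * self.reduced_rate_factor
           return normal_rate
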

3. Security Considerations

This memo does not create any new security issues for the OSPF
protocol.  Security considerations for the base OSPF protocol are
covered in [Ref1].

4. Acknowledgments

We would like to acknowledge the support of the OSPF WG chairs Rohit
Dube, Acee Lindem, and John Moy.  We acknowledge Mitchell Erblich,
Mike Fox, Tony Przygienda, and Krishna Rao for comments on previous
versions of the draft.  We also acknowledge Margaret Chiosi, Elie
Francis, Jeff Han, Beth Munson, Roshan Rao, Moshe Segal, Mike
Wardlow, and Pat Wirth for collaboration and encouragement in our
scalability improvement efforts for link-state-protocol based
networks.

5. Normative References

[Ref1]  J. Moy, "OSPF Version 2", RFC 2328, April 1998.

6. Informative References

[Ref2]  D. Katz, D. Yeung, K. Kompella, "Traffic Engineering
        Extension to OSPF Version 2", Work in Progress.

[Ref3]  C. Alaettinoglu, V. Jacobson and H. Yu, "Towards Millisecond
        IGP Convergence", Work in Progress.

[Ref4]  Pappalardo, D., "AT&T, customers grapple with ATM net
        outage", Network World, February 26, 2001.

[Ref5]  "AT&T announces cause of frame-relay network outage", AT&T
        Press Release, April 22, 1998.

[Ref6]  Cholewka, K., "MCI Outage Has Domino Effect", Inter@ctive
        Week, August 20, 1999.

[Ref7]  Jander, M., "In Qwest Outage, ATM Takes Some Heat", Light
        Reading, April 6, 2001.

[Ref8]  A. Zinin and M. Shand, "Flooding Optimizations in Link-State
        Routing Protocols", Work in Progress.

[Ref9]  P. Pillay-Esnault, "OSPF Refresh and Flooding Reduction in
        Stable Topologies", Work in Progress.

[Ref10] G. Ash, G. Choudhury, V. Sapozhnikova, M. Sherif, A. Maunder,
        V. Manral, "Congestion Avoidance & Control for OSPF
        Networks", Work in Progress.

[Ref11] B. M. Waxman, "Routing of Multipoint Connections", IEEE
        Journal on Selected Areas in Communications, 6(9):1617-1622,
        1988.

7. Authors' Addresses

Gagan L. Choudhury
AT&T
Room D5-3C21
200 Laurel Avenue
Middletown, NJ 07748
USA
Phone: (732) 420-3721
Email: gchoudhury@att.com

Vera D. Sapozhnikova
AT&T
Room C5-2C29
200 Laurel Avenue
Middletown, NJ 07748
USA
Phone: (732) 420-2653
Email: sapozhnikova@att.com

Gerald R. Ash
AT&T
Room D5-2A01
200 Laurel Avenue
Middletown, NJ 07748
USA
Phone: (732) 420-4578
Email: gash@att.com

Anurag S. Maunder
Erlang Technology
2880 Scott Boulevard
Santa Clara, CA 95052
Phone: (408) 420-7617
Email: anuragm@erlangtech.com

Vishwas Manral
Motorola Inc.
189, Prashasan Nagar
Road Number 72
Jubilee Hills, Hyderabad
India
Email: vishwas@motorola.com

Appendix A. LSA Storm: Causes and Impact

An LSA storm may be initiated for many reasons.  Here are some
examples:

(a) one or more link failures due to fiber cuts,

(b) one or more router failures for some reason, e.g., a software
    crash or some type of disaster (including a power outage) in an
    office complex hosting many routers,

(c) link/router flapping,

(d) the need to take down and later bring back many routers during a
    software/hardware upgrade,

(e) near-synchronization of the periodic 1800-second refreshes of a
    subset of LSAs,

(f) refresh of all LSAs in the system during a change in software
    version,

(g) injection of a large number of external routes into OSPF due to a
    procedural error,

(h) Router ID changes causing a large number of LSA re-originations
    (possibly LSA purges as well, depending on the implementation).

In addition to the LSAs originated as a direct result of link/router
failures, there may be other, indirect LSAs as well.  One example in
MPLS networks is traffic engineering LSAs originated at other links
as a result of a significant change in reserved bandwidth resulting
from the rerouting of Label Switched Paths (LSPs) that went down
during the link/router failure.

The LSA storm causes high CPU and memory utilization at the router
processor, causing incoming packets to be delayed or dropped.
Delayed acknowledgments (beyond the retransmission timer value)
result in retransmissions, and delayed Hello packets (beyond the
Router-Dead interval) result in links being declared down.  A
trunk-down event causes Router LSA origination by its end-point
routers.
If traffic engineering LSAs are used for each link, then LSAs of that
type are also originated by the end-point routers, and potentially
elsewhere as well, due to significant changes in reserved bandwidth
at other links caused by the failure and the rerouting of LSPs
originally using the failed trunk.  Eventually, when the link
recovers, additional Router LSAs and traffic engineering LSAs are
triggered as well.

The retransmissions and additional LSA originations result in further
CPU and memory usage, essentially causing a positive feedback loop.
We define the LSA storm size as the number of LSAs in the original
storm, not counting any additional LSAs resulting from the feedback
loop described above.  If the LSA storm is too large, then the
positive feedback loop may be strong enough to sustain high CPU and
memory utilization at many routers in the network indefinitely,
thereby driving the network to an unstable state.  In the past,
network outage events have been reported in IP and ATM networks using
link-state protocols such as OSPF, IS-IS, PNNI or some proprietary
variants; see, for example, [Ref4]-[Ref7].  In many of these
examples, large-scale flooding of LSAs or other similar control
messages (either naturally or triggered by some bug or inappropriate
procedure) has been partly or fully responsible for network
instability and outage.

In Appendix B, we use a simulation model to show that there is a
certain LSA storm size threshold above which the network may show
unstable behavior caused by a large number of retransmissions, link
failures due to missed Hello packets, and subsequent link recoveries.
We also show that the LSA storm size causing instability may be
substantially increased by giving prioritized treatment to Hello and
LSA Acknowledgment packets and by using an exponential backoff
algorithm for determining the LSA retransmission interval.
Furthermore, if we prioritize Hello packets, then even when the
network operates somewhat above the stability threshold, links are
not declared down due to missed Hellos.  This implies that even
though there is control plane congestion due to many retransmissions,
the data plane stays up and no new LSAs are originated (besides the
ones in the original storm and the refreshes).  These observations
are the basis of the first three recommendations in Section 2.

One might argue that the scalability issue of large networks should
be solved solely by dividing the network hierarchically into multiple
areas so that flooding of LSAs remains localized within areas.
However, this approach increases network management and design
complexity and may result in less optimal routing between areas.
Also, ASE LSAs are flooded throughout the AS, which may be a problem
if there are a large number of them.  Furthermore, a large number of
Summary LSAs may need to be flooded across areas, and their number
would increase significantly if multiple Area Border Routers are
employed for reliability.  Thus it is important to allow the network
to grow toward as large a size as possible under a single area.

Our recommendations are synergistic with a broader set of scalability
and stability improvement proposals.  [Ref8] proposes flooding
overhead reduction in case more than one interface goes to the same
neighbor.
[Ref9] proposes a mechanism for greatly reducing LSA refreshes in
stable topologies.

[Ref10] proposes a wide range of congestion control and failure
recovery mechanisms (some of those ideas are covered in this draft,
but [Ref10] has other ideas not covered here).

Appendix B. Simulation Study

The main motivation of this study is to show the network congestion
and instability caused by large LSA storms and the improvement in
stability and scalability that can be achieved by following the
recommendations in this memo.

Appendix B.1. The Network Under Simulation

We generate a random network over a rectangular grid using a modified
version of Waxman's algorithm [Ref11] that ensures that the network
is connected and has a pre-specified number of routers, number of
links, maximum number of neighbors per router, and maximum number of
adjacencies per router.  The rectangular grid resembles the
continental U.S.A., with a maximum one-way propagation delay of 30 ms
in the East-West direction and 15 ms in the North-South direction.
We consider two different network sizes, as explained in Appendix
B.2.

The network has a flat, single-area topology.

Each link is a point-to-point link connecting two routers.

We assume that router CPU and memory (not link bandwidth) are the
main bottleneck in the LSA flooding process.  This will typically be
true for high-speed links (e.g., OC3 or above) and/or links where
OSPF traffic gets an adequate Quality of Service (QoS) compared to
other traffic.

Different Timers:
   LSA refresh interval = 1800 seconds,
   Hello refresh interval = 10 seconds,
   Router-Dead interval = 40 seconds,
   LSA retransmission interval: two values are considered, 10 seconds
      and 5 seconds (note that a retransmission is disabled on the
      receipt of either an explicit acknowledgment or a duplicate LSA
      over the same interface, which acts as an implicit
      acknowledgment),
   Minimum time between successive originations of the same LSA = 5
      seconds,
   Minimum time between successive Dijkstra SPF calculations = 1
      second.

Packing of LSAs: It is assumed that, for any given router, the LSAs
originated over a 1-second period are packed together to form an LSU,
but no more than 3 LSAs are packed into one LSU.

LSU/Ack/Hello Processing Times: All processing times are expressed in
terms of the parameter T.  Two values of T are considered, 1 ms and
0.5 ms.

In the case of a dedicated processor for processing OSPF packets, the
processing time reported represents the true processing time.  If the
processor does other work and only a fraction of its capacity can be
dedicated to OSPF processing, then the processing time has to be
inflated appropriately to get the effective processing time; in that
case it is assumed that the inflation factor is already taken into
account as part of the reported processing time.

The fixed time to send or receive any LSU, Ack or Hello packet is T.
In addition, a variable processing time is used for LSUs and Acks,
depending on the number and types of LSAs packed.  No variable
processing time is used for Hellos.  The variable processing time per
Router LSA is (0.5 + 0.17L)T, where L is the number of adjacencies
advertised by the Router LSA.  For other LSA types (e.g., an ASE LSA
or a "Link" LSA carrying traffic engineering information about a
link), the variable processing time per LSA is 0.5T.

The variable processing time for an Ack is 25% of that of the
corresponding LSA.

Note that if multiple LSAs are packed into a single LSU packet, the
fixed processing time is incurred only once, but the variable
processing time is incurred for every component of the packet.

The processing time values we use are roughly in the same range as
those observed in an operational network.
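
For illustration only, the following small Python sketch works out
the processing-time model above for one LSU and its Ack.  The
function names are hypothetical, and T is set to 1 ms, one of the two
values considered:

   T = 0.001  # seconds (1 ms)

   def lsu_processing_time(router_lsa_adjacencies, other_lsa_count):
       """Processing time of one LSU.  router_lsa_adjacencies has one
       entry per packed Router LSA, giving the number of adjacencies
       L that the LSA advertises; other_lsa_count is the number of
       packed ASE or "Link" LSAs."""
       fixed = T  # incurred once per LSU
       variable = sum((0.5 + 0.17 * L) * T
                      for L in router_lsa_adjacencies)
       variable += 0.5 * T * other_lsa_count
       return fixed + variable

   def ack_processing_time(lsa_variable_time):
       # fixed time T plus 25% of the variable time of the
       # acknowledged LSA
       return T + 0.25 * lsa_variable_time

   # Example: an LSU packing one Router LSA with L = 10 adjacencies
   # and two "Link" LSAs costs (1 + (0.5 + 1.7) + 2*0.5)T = 4.2T,
   # i.e., about 4.2 ms for T = 1 ms.
   print(lsu_processing_time([10], 2))
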

LSU/Ack/Hello Priority: Two non-preemptive priority levels and three
priority scenarios are considered.  Within each priority level,
processing is FIFO, with new packets of the lower priority being
dropped when the lower priority queue is full.  Higher priority
packets are never dropped.  In Priority Scenario 1, all
LSUs/Acks/Hellos received at a router are queued at the lower
priority.  In Priority Scenario 2, Hellos received at a router are
queued at the higher priority, but LSUs/Acks are queued at the lower
priority.  In Priority Scenario 3, Hellos and Acks received at a
router are queued at the higher priority, but LSUs are queued at the
lower priority.  All packets generated internally at a router
(usually triggered by a timer) are processed at the higher priority.
This includes the initial LSA storm, LSA refreshes, Hello refreshes,
LSA retransmissions and new LSA originations after detection of a
failure or recovery.

Buffer Size for Incoming LSUs/Acks/Hellos (lower priority): The
buffer size is assumed to be 2000 packets, where a packet is either
an Ack, an LSU, or a Hello.

LSA Refresh: Each LSA is refreshed once in 1800 seconds, and the
refresh instants of the various LSAs in the LSDB are assumed to be
uniformly distributed over the 1800-second period, i.e., they are
completely unsynchronized.  If, however, an LSA is originated as part
of the initial LSA storm, then it goes on a new refresh schedule of
once in 1800 seconds starting from its origination time.

LSA Storm Origination: As defined earlier, an "LSA storm" is the
simultaneous or near-simultaneous origination of a large number of
LSAs.  In the case of only Router and ASE LSAs, we normally assume
that the number of ASE LSAs in the storm is about 4 times that of the
Router LSAs, but the ratio is allowed to change if either the Router
or the ASE LSAs have reached their maximum possible value.  In the
case of only Router and Link LSAs (carrying traffic engineering
information), we normally assume that the number of Link LSAs in the
storm is about 4 times that of the Router LSAs, but the ratio is
allowed to change if either the Router or the Link LSAs have reached
their maximum possible value.  For any given LSA storm we keep
originating LSAs, starting from router index 1 and moving upwards,
and stop when the required number of LSAs of each type has been
originated.  The LSAs originated at any given router are assumed to
start at an instant uniformly distributed between 20 and 30 seconds
from the start of the simulation.  Successive LSA originations at a
router are assumed to be spaced 400 ms apart.  Note that during the
period of observation there are other LSAs originated besides the
ones in the storm.  These include refreshes of LSAs that are not part
of the storm and LSAs originated due to possible link failures and
subsequent link recoveries.
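
For illustration only, the two-level, non-preemptive priority
queueing described above (FIFO within each level, a finite buffer for
the lower priority, no drops at the higher priority) can be sketched
in Python as follows; the class is a hypothetical illustration of the
simulated discipline, not code from the simulator:

   from collections import deque

   class OspfInputQueue:
       def __init__(self, low_capacity=2000):
           self.high = deque()          # Hellos/Acks, per the scenario
           self.low = deque()           # everything else
           self.low_capacity = low_capacity
           self.dropped = 0

       def enqueue(self, packet, high_priority):
           if high_priority:
               self.high.append(packet)     # never dropped
           elif len(self.low) < self.low_capacity:
               self.low.append(packet)
           else:
               self.dropped += 1            # lower priority buffer full

       def next_packet(self):
           # Non-preemptive: a packet already being processed is
           # finished first; this only picks the next one to process.
           if self.high:
               return self.high.popleft()
           if self.low:
               return self.low.popleft()
           return None
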

Failure/Recovery of Links: If no Hello is received over a link (due
to CPU/memory congestion) for longer than the Router-Dead interval,
then the link is declared down.  At a later time, if Hellos are
received again, the link is declared up.  Whenever a link is declared
up or down, one Router LSA is originated by each router on the two
sides of the point-to-point link.  If "Link LSAs" carrying traffic
engineering information are used, then it is assumed that each such
router also originates a Link LSA.  In this case it is also assumed
that, due to rerouting of LSPs, three other links in the network
(selected randomly in the simulation) experience a significant change
in reserved bandwidth, which results in one Link LSA being originated
by the routers on the two ends of each such link.

Appendix B.2. Simulation Results

In this section we study the relative performance of the three
priority scenarios defined earlier (no priority to Hello or Ack,
priority to Hello only, and priority to both Hello and Ack) over a
range of network sizes, LSA retransmission timer values, LSA types,
processing time values and Hello/Router-Dead-Interval values:

Network size: Two networks are considered.  Network 1 has 100 routers
and 1200 links; the maximum number of neighbors per router is 30 and
the maximum number of adjacencies per router is 50 (the same neighbor
may have more than one adjacency).  Network 2 has 50 routers and 600
links; the maximum number of neighbors per router is 25 and the
maximum number of adjacencies per router is 48.  The Dijkstra SPF
calculation time is assumed to be 100 ms for Network 1 and 70 ms for
Network 2.

LSA Type: Each router has 1 Router LSA (a total of 100 for Network 1
and 50 for Network 2).  There are no Network LSAs since all links are
point-to-point links, and no Summary LSAs since the network has only
one area.  Regarding other LSA types, we consider two situations.  In
Situation 1 we assume that there are no ASE LSAs and that each link
has one "Link" LSA carrying traffic engineering information (a total
of 2400 for Network 1 and 1200 for Network 2).  In Situation 2 we
assume that there are no "Link" LSAs, that half of the routers are AS
border routers, and that each border router has 10 ASE LSAs (a total
of 500 for Network 1 and 250 for Network 2).  We identify Situation 1
as "Link LSAs" and Situation 2 as "ASE LSAs".

LSA retransmission timer value: Two values are considered, 10 seconds
and 5 seconds (the default value).

Processing time values: Processing times for LSU, Ack and Hello
packets have been expressed above in terms of a common parameter T.
Two values are considered for T, 1 ms and 0.5 ms.

Hello/Router-Dead-Interval: It is assumed that the Router-Dead
interval is four times the Hello interval.  In one case the Hello
interval is 10 seconds and the Router-Dead interval is 40 seconds
(the default values); in the other case the Hello interval is 2
seconds and the Router-Dead interval is 8 seconds.

Based on the network size, LSA type, retransmission timer, processing
time and Hello/Router-Dead-Interval values, we develop 6 test cases
as follows:

Case 1: Network 1, Link LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 2: Network 1, ASE LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 3: Network 1, Link LSAs, retransmission timer = 5 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 4: Network 1, Link LSAs, retransmission timer = 10 sec.,
        T = 0.5 ms, Hello/Router-Dead-Interval = 10/40 sec.

Case 5: Network 1, Link LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 2/8 sec.

Case 6: Network 2, Link LSAs, retransmission timer = 10 sec.,
        T = 1 ms, Hello/Router-Dead-Interval = 10/40 sec.

For each case and for each priority scenario we study the network
stability as a function of the size of the LSA storm.  The stability
is determined by looking at the number of non-converged LSUs as a
function of time.  An example is shown in Table 1 for Case 1 and
Priority Scenario 1 (no priority to Hellos or Acks).

 ==========|============================================================
           | Number of Non-Converged LSUs in the Network at Time (sec)
 LSA STORM |======|======|======|======|======|======|======|======|====
   SIZE    |  10s |  20s |  30s |  35s |  40s |  50s |  60s |  80s |100s
 ==========|======|======|======|======|======|======|======|======|====
    100    |   0  |   0  |  24  |  29  |  24  |   1  |   0  |   1  |  1
  (Stable) |      |      |      |      |      |      |      |      |
 ----------|------|------|------|------|------|------|------|------|----
    140    |   0  |   0  |  35  |  48  |  46  |  27  |  14  |   1  |  1
  (Stable) |      |      |      |      |      |      |      |      |
 ----------|------|------|------|------|------|------|------|------|----
    160    |   0  |   0  |  38  |  57  |  55  |  40  |  26  |  65  | 203
 (Unstable)|      |      |      |      |      |      |      |      |
 ==========|======|======|======|======|======|======|======|======|====

              Table 1: Network Stability vs. LSA Storm Size
                    (Case 1, no priority to Hello/Ack)

The LSA storm starts a little after 20 seconds, so for some period of
time after that the number of non-converged LSUs should stay high and
then come down for a stable network.  This happens for LSA storms of
sizes 100 and 140.  With an LSA storm of size 160, the number of
non-converged LSUs stays high indefinitely due to repeated
retransmissions, link failures caused by Hellos being missed for more
than the Router-Dead interval (which originates additional LSAs), and
subsequent link recoveries (which again originate additional LSAs).
We define the network stability threshold as the maximum allowable
LSA storm size for which the number of non-converged LSUs comes down
to a low level after some time.  It turns out that for this example
the stability threshold is 150.

The network behavior as a function of the LSA storm size can be
categorized as follows:

(1) If the LSA storm is well below the stability threshold, then the
    CPU/memory congestion lasts only for a short period, and during
    this period there are very few retransmissions, very few dropped
    OSPF packets and no link failures due to missed Hellos.  This
    type of LSA storm is observed routinely in operational networks,
    and networks recover from it easily.

(2) If the LSA storm is just below the stability threshold, then the
    CPU/memory congestion lasts for a longer period, and during this
    period there may be a considerable number of retransmissions and
    dropped OSPF packets.  If Hello packets are not given priority,
    then there may also be some link failures due to missed Hellos.
    However, the network does eventually go back to a stable state.
    This type of LSA storm may happen rarely in operational networks,
    and they recover from it with some difficulty.

(3) If the LSA storm is above the stability threshold, then the
    CPU/memory congestion may last indefinitely unless some special
    procedure for relieving congestion is followed.  During this
    period there are a considerable number of retransmissions and
    dropped OSPF packets.  If Hello packets are not given priority,
    then there are also link failures due to missed Hellos.  This
    type of LSA storm may happen very rarely in operational networks,
    and usually some manual procedure, such as taking down
    adjacencies at heavily congested routers, is needed.

(4) If Hello packets are given priority, then the network stability
    threshold increases, i.e., the network can withstand a larger LSA
    storm.  Furthermore, even if the network operates at or somewhat
    above this higher stability threshold, Hellos are still not
    missed and so there are no link failures.  So even if there is
    congestion in the control plane due to increased retransmissions,
    requiring some special procedures for congestion reduction, the
    data plane remains unaffected.

(5) If both Hello and Acknowledgment packets are given priority, then
    the stability threshold increases even further.

Table 2 shows the network stability threshold for the six cases and
the three priority scenarios defined earlier.

 |========|===========================================================|
 |        |           Maximum Allowable LSA Storm Size For            |
 |  Case  |===================|===================|===================|
 | Number |  No Priority to   | Priority to Hello | Priority to Hello |
 |        |   Hello or Ack    |       Only        |      and Ack      |
 |========|===================|===================|===================|
 | Case 1 |        150        |        190        |        250        |
 |--------|-------------------|-------------------|-------------------|
 | Case 2 |        185        |        215        |        285        |
 |--------|-------------------|-------------------|-------------------|
 | Case 3 |        115        |        127        |        170        |
 |--------|-------------------|-------------------|-------------------|
 | Case 4 |        320        |        375        |        580        |
 |--------|-------------------|-------------------|-------------------|
 | Case 5 |        120        |        175        |        225        |
 |--------|-------------------|-------------------|-------------------|
 | Case 6 |        185        |        224        |        285        |
 |========|===================|===================|===================|

      Table 2: Maximum Allowable LSA Storm for a Stable Network

We also considered one more scenario, with priority to Hello and Ack
and with a truncated binary exponential backoff of the retransmission
interval with an upper limit of 40 seconds (for the same LSA, each
successive retransmission interval is doubled, but not beyond 40
seconds).  The maximum allowable LSA storm size for this scenario
significantly exceeded the corresponding numbers in the last column
of Table 2 (priority to Hello and Ack).

Appendix B.3. Observations on Simulation Results

Table 2 shows that in all cases prioritizing Hello packets increases
the network stability threshold and that, in addition, prioritizing
LSA Acknowledgment packets increases the stability threshold even
further.  The reasons for these observations are as follows.  The
main sources of sustained CPU/memory congestion (i.e., of the
positive feedback loop) following an LSA storm are (1) LSA
retransmissions and (2) links being declared down due to missed
Hellos, which in turn causes further LSA originations, with the
eventual recovery of the link causing even more LSA originations.
Prioritizing Hello packets practically eliminates the second source
of congestion.  Prioritizing Acknowledgments significantly reduces
the first source of congestion, i.e., LSA retransmissions.  Note that
retransmissions cannot be completely eliminated, for the following
reasons.  First, only explicit Acknowledgments are prioritized;
duplicate LSAs carrying implicit Acknowledgments are still served at
the lower priority.  Second, LSAs may be greatly delayed or dropped
at the input queue of receivers, and therefore Acknowledgments may
not even be generated, in which case prioritizing Acks would not
help.  Another factor to keep in mind is that, since Hellos and Acks
are prioritized, LSAs see a bigger delay and a greater potential for
being dropped.  However, the simulation results show that, on the
whole, prioritizing Hellos and LSA Acks is always beneficial and
significantly improves the network stability threshold.

As stated in Appendix B.2, exponential backoff of the LSA
retransmission interval further increases the network stability
threshold.

Our simulation study also showed that, in each of the cases, if
instead of prioritizing Hello packets we treat any packet received
over a link as a surrogate for a Hello packet (an implicit Hello),
then we get about the same stability threshold as obtained with
prioritizing Hello packets.

Appendix C. Other Recommendations

(1) Explicit Marking: In Section 2 we recommended that OSPF packets
    be classified into "high" and "low" priority classes based on
    examining the OSPF packet header.  In some cases (particularly at
    the receiver) this examination may be computationally costly.  An
    alternative would be the use of different TOS/Precedence field
    settings for the two priority classes.  [Ref1] recommends setting
    the TOS field to 0 and the Precedence field to 6 for all OSPF
    packets.  We recommend this same setting for the "low" priority
    OSPF packets and a different setting for the "high" priority OSPF
    packets, in order to be able to classify them separately without
    having to examine the OSPF packet header.  Two examples are given
    below:

    Example 1: For "low" priority packets set the TOS field to 0 and
               the Precedence field to 6, and for "high" priority
               packets set the TOS field to 4 and the Precedence
               field to 6.

    Example 2: For "low" priority packets set the TOS field to 0 and
               the Precedence field to 6, and for "high" priority
               packets set the TOS field to 0 and the Precedence
               field to 7.

    This recommendation need not be followed if it is easy to examine
    the OSPF packet header and thereby classify "high" and "low"
    priority packets separately.
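
    For illustration only, a minimal Python sketch of the two
    classification approaches above follows.  It assumes the OSPF
    packet is available as raw bytes beginning at the OSPF packet
    header; the names and the choice of Example 2 for the marking
    values are illustrative, not part of this document:

       # OSPF packet types carried in the second octet of the OSPF
       # packet header (RFC 2328): 1 = Hello, 2 = Database
       # Description, 3 = LS Request, 4 = LS Update, 5 = LS Ack.
       HIGH_PRIORITY_TYPES = {1, 5}   # Hello and LS Acknowledgment

       def classify_by_ospf_header(ospf_packet: bytes) -> str:
           """Classification by examining the OSPF packet header."""
           if ospf_packet[1] in HIGH_PRIORITY_TYPES:
               return "high"
           return "low"

       # Alternative explicit marking per Example 2: Precedence 6
       # (TOS byte 0xC0) for "low" priority packets, Precedence 7
       # (TOS byte 0xE0) for "high" priority packets, so that a
       # receiver can classify on the IP header alone.
       TOS_LOW_PRIORITY = 0xC0
       TOS_HIGH_PRIORITY = 0xE0
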

(2) Further Prioritization of OSPF Packets: Besides the packets
    designated as "high" priority in Recommendation 1 of Section 2,
    there may be a need for further priority separation among the
    "low" priority OSPF packets.  We recommend the use of three
    priority classes: "high", "medium" and "low".  While receiving a
    packet from a neighbor and while transmitting a packet to a
    neighbor, try to process a "high" priority packet ahead of
    "medium" and "low" priority packets, and a "medium" priority
    packet ahead of "low" priority packets.  The "high" priority
    packets are as designated in Recommendation 1 of Section 2.  We
    provide below two candidate examples of "medium" priority
    packets.  All OSPF packets not designated as "high" or "medium"
    priority are "low" priority.

    One example of a "medium" priority packet is the Database
    Description (DBD) packet from a slave (during the database
    synchronization process) that is used as an acknowledgment.

    A second example is an LSA carrying intra-area topology change
    information (this may trigger SPF calculation and rerouting of
    Label Switched Paths, so fast processing of this packet may
    improve OSPF/LDP convergence times).  However, if the processing
    cost of identifying and separately queueing the LSA in this
    example is deemed too high, then the implementor may decide not
    to do so.