Internet Engineering Task Force                   Gagan L. Choudhury
Internet Draft                                  Vera D. Sapozhnikova
Expires in October, 2002                                        AT&T
draft-ietf-ospf-scalability-01.txt
                                                   Anurag S. Maunder
                                                      Sanera Systems

                                                      Vishwas Manral
                                                    Netplane Systems

                                                         April, 2002


  Explicit Marking and Prioritized Treatment of Specific IGP Packets
   for Faster IGP Convergence and Improved Network Scalability and
                              Stability

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Abstract

   In this draft we propose the following mechanisms to allow fast IGP
   convergence while maintaining the scalability and stability of a
   network:

   (1) Explicitly mark Hello packets, to differentiate them from other
       IGP packets, so that efficient implementations can detect and
       process the Hello packets in a prioritized fashion.

   (2) In the absence of special marking, or in addition to it, use
       other mechanisms in order not to miss Hello packets.  One
       example is to treat any packet received over a link as a
       surrogate for a Hello packet for the purpose of keeping the
       link alive.

   (3) The same type of explicit marking and prioritized treatment may
       be beneficial to other IGP packets as well.  Some examples
       include (a) LSA acknowledgment packets, (b) Database
       Description (DBD) packets from a slave that are used as
       acknowledgments, and (c) LSAs carrying intra-area topology
       change information.

   It is possible that some implementations already use one or more of
   the above mechanisms in order not to miss the processing of
   critical packets during periods of congestion.  However, we suggest
   that the above mechanisms be included as part of the standard so
   that all implementations can benefit from them.

Table of Contents

   1. Motivation
   2. Simulation Study
   3. Analytic Model for Delay Experienced by a Hello Packet During an
      Initial LSA Storm
   4. Need for Special Marking and Prioritized Treatment of Specific
      IGP Packets
   5. Summary
   6. Acknowledgments
   7. References
   8. Authors' Addresses

1. Motivation

   The motivation of this draft is to address the following two key
   objectives of any data network: (a) fast restoration under failure
   conditions, and (b) improved network scalability and stability.
   Using analytic and simulation models we show that in general the
   two objectives are in conflict, i.e., improvement in one usually
   results in the degradation of the other.
   However, special marking
   and prioritized processing of certain key messages can allow us to
   achieve both objectives.

   The first item we address is fast restoration.  The theoretical
   limit for link-state routing protocols to re-route is on link-
   propagation time scales, i.e., tens of milliseconds.  However, as
   pointed out in [Ref1], in practice it may take from seconds to tens
   of seconds to detect a link failure and disseminate this
   information to the network, followed by convergence on the new set
   of paths.  This is an inordinately long transient period for
   mission-critical traffic destined to the non-reachable nodes of the
   network.

   One component of the long re-route time is the link failure
   detection time of between 20 and 30 seconds through typically three
   missed Hello packets with the typical hello interval of 10 seconds
   (between 30 and 40 seconds if the missed-hello threshold is 4).
   This component would be much shorter in the presence of link-level
   detection, but as pointed out in [Ref1] link-level detection does
   not work in some cases.  For example, a device driver may detect a
   link-level failure but fail to notify the IGP level.  Also, if a
   router fails behind a switch in a switched environment, then even
   though the switch gets the link-level notification it cannot
   communicate that to other routers.  Therefore, for faster reliable
   detection at the IGP level, one has to reduce the hello interval.
   [Ref1] suggests that this be reduced to below a second, perhaps
   even to tens of milliseconds.  A second component of the long
   re-route time is the delayed SPF (shortest-path-first) computation.
   The typical delay value is between 1 and 5 seconds, but in order to
   have sub-second rerouting it needs to be reduced significantly.

   The second item we address is the ability of a network to withstand
   the simultaneous or near-simultaneous update of a large number of
   link-state-advertisement messages, or LSAs.  We call this event an
   LSA storm.  An LSA storm may be initiated for many reasons.  Here
   are some examples:

   (a) one or more link failures due to fiber cuts,

   (b) one or more node failures for some reason, e.g., a software
       crash or some type of disaster in an office complex hosting
       many nodes,

   (c) the need to take down and later bring back many nodes during a
       software/hardware upgrade,

   (d) near-synchronization of the once-in-30-minutes refresh instants
       of some types of LSAs,

   (e) refresh of all LSAs in the system during a change in software
       version.

   In addition to the LSAs generated as a direct result of link/node
   failures, there may be other indirect LSAs as well.  One example in
   ATM/MPLS networks is LSAs generated at other links as a result of a
   significant change in bandwidth caused by the rerouting of virtual
   circuits that went down during the link/node failure.  An LSA storm
   tends to drive the node CPU utilization to 100% for a period of
   time, and the duration of this period increases with the size of
   the storm and the node adjacency, i.e., the number of links
   connected to the node.  During this period the Hello packets
   received at the node would see high delays, and if this delay
   exceeds the Router-Dead Interval (typically 30-40 seconds, or three
   to four hello intervals) then the associated link would be declared
   down.
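   As a back-of-envelope check of the detection times quoted above
   (purely our illustration, not part of any proposed mechanism), the
   bounds follow directly from the hello interval and the missed-hello
   threshold:

      def detection_time_bounds(hello_interval, missed_hellos):
          """Failure-detection time via missed Hellos.  Detection
          takes the full dead interval (threshold * hello interval)
          if the failure occurs just after a Hello was received, and
          one hello interval less if it occurs just before the next
          Hello was due."""
          dead_interval = missed_hellos * hello_interval
          return (dead_interval - hello_interval, dead_interval)

      print(detection_time_bounds(10, 3))   # -> (20, 30) seconds
      print(detection_time_bounds(10, 4))   # -> (30, 40) seconds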
   In this draft we address only the issue of links being declared
   down due to the delayed processing of Hello messages, but in
   general, depending on the implementation, there may be other
   impacts of a long CPU-busy period.  For example, in a reliable node
   architecture with an active and a standby processor, a processor
   switch-over may result from an extended CPU-busy period, which may
   mean that all the adjacencies would be lost and would need to be
   re-established.  A processor switch-over may also result from
   memory exhaustion caused by an extended CPU-busy period.  Both of
   the above events would cause more database synchronization with
   neighbors and network-wide LSA flooding, which in turn might cause
   extended CPU-busy periods at other nodes.  This may cause unstable
   behavior in the network for an extended period of time and
   potentially a meltdown in the extreme case.

   Due to world-wide increases in traffic demand, data networks are
   ever increasing in size.  As the network size grows, a bigger LSA
   storm and a higher adjacency at certain nodes become more likely,
   which increases the probability of unstable behavior.  One way to
   address the scalability issue is to divide the network
   hierarchically into different areas so that the flooding of LSAs
   remains localized within areas.  However, this approach increases
   the network management and design complexity and may result in
   less optimal routing between areas.  Also, unless addresses are
   aggregated, a large number of summary LSAs may need to be flooded.
   Thus it is important to allow the network to grow to as large a
   size as possible within a single area.

   The undesirable impact of large LSA storms is understood in the
   networking community, and it is well known that large-scale
   flooding of control messages (either naturally or due to a bug)
   has been responsible for several network events in the past causing
   a meltdown or a near-meltdown.  For some recent examples see
   [Ref2-Ref5].  Recently, proposals have been submitted to reduce
   flooding overhead in case more than one interface goes to the same
   neighbor [Ref6,Ref7].  Also, [Ref8-Ref9] consider a wide range of
   congestion control and failure recovery mechanisms.

   Section 2 uses a simulation model to illustrate the onset of
   instability in the network as the result of a large LSA storm.
   Section 3 uses a simple, approximate but easy-to-understand
   analytic model to make the point that reducing hello intervals and
   more frequent SPF computation would in fact reduce network
   scalability and stability.  Section 4 makes the point that many of
   the underlying causes of network instability can be avoided if
   certain IGP messages are specially marked and given prioritized
   treatment.  [Ref10] also provides simulation and analytic models to
   show the onset of instability in large networks due to LSA storms
   and proposes the prioritization of Hello and other special packets
   to improve scalability and stability.

2. Simulation Study

   We have developed a network-wide event simulation model to study
   the impact of an LSA storm.  It captures the actual congestion seen
   at various nodes and accounts for propagation delay between nodes,
   retransmissions in case an LSA is not acknowledged, failure of
   links for LSAs delayed beyond the Router-Dead interval, and link
   recovery following database synchronization and LSA flooding once
   the LSA is processed.  It approximates a real network
   implementation and uses processing times that are roughly of the
   same order of magnitude as those measured in a real network (on the
   order of milliseconds).  There are two categories of IGP messages
   processed at each node in the simulation.  Category 1 messages are
   triggered by a timer and include Hello refresh, LSA refresh and
   retransmission packets.  Category 2 messages are not triggered by a
   timer and include received Hellos, received LSAs and received
   acknowledgments.  Timer-triggered messages are given non-preemptive
   priority over the other type.  As a result, the received Hello
   packets and the received acknowledgment packets may see long
   queueing delays under intense CPU overload.
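   This two-class, non-preemptive discipline can be summarized in a
   few lines of Python (a sketch of the assumed queueing only, not the
   simulator used for this study; all names are ours):

      from collections import deque

      category1 = deque()   # timer-triggered: Hello refresh, LSA
                            # refresh, retransmissions
      category2 = deque()   # received: Hellos, LSAs, acknowledgments

      def enqueue(job, timer_triggered):
          (category1 if timer_triggered else category2).append(job)

      def next_job():
          """Non-preemptive priority: Category 1 is always served
          first; Category 2 is served FIFO, so a received Hello
          waits behind every received LSA that arrived before it."""
          if category1:
              return category1.popleft()
          if category2:
              return category2.popleft()
          return None

   Because received Hellos share the Category 2 FIFO with received
   LSAs, a storm of LSAs queued ahead of a Hello translates directly
   into Hello queueing delay.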
   Table 1 below shows sample results of the simulation study when
   applied to a network with about 300 nodes and 800 links.  The node
   adjacency varies from node to node and the maximum node adjacency
   is 30.  The Hello interval is assumed to be 5 seconds, the minimum
   interval between successive SPF (Shortest-Path-First) calculations
   is 1 second, and the Router-Dead Interval is 15 seconds, i.e., a
   link is declared down if no Hello packet is received for three
   successive hello intervals.  During the study, an LSA storm of size
   X is created at time instant 100 seconds, where storm size is
   defined as the number of LSAs generated during a storm.  Three
   cases are considered, with X = 300, 600 and 900 respectively.
   Besides the storm, there are also the normal once-in-thirty-minutes
   LSA refreshes.  At any given point of time we define a quantity,
   "dispersion", as the number of LSU packets already generated in the
   network but not yet received and processed by at least one node
   (each LSU packet is assumed to carry three LSAs).

   Table 1 shows dispersion as a function of time and thereby
   identifies the impact of an LSA storm on network stability.

   ======|==========================================================
         |   Table 1: DISPERSION as a FUNCTION of TIME (in sec)
   LSA   |             for different LSA Storm Sizes
   STORM |==========================================================
   SIZE  | 100s  106s  110s  115s  140s  170s  230s  330s  370s
   ======|==========================================================
   300   |   0    39     3     1     0     1     0     0     0
   ------|----------------------------------------------------------
   600   |   0   133   120   100    12     1     0     0     0
   ------|----------------------------------------------------------
   900   |   0   230   215   196   101   119   224   428   488
   ======|==========================================================

   Before the LSA storm, the dispersion due to normal LSA refreshes
   remains small.  We expect the dispersion to jump to a high value
   right after the storm and then come down to the pre-storm level
   after some period of time (this happens with X=300 and X=600 but
   not with X=900).  In Table 1, with an LSA storm of size 300 the
   "heavy dispersion period" lasted about 11 seconds and no link
   losses were observed.  With an LSA storm of size 600, the "heavy
   dispersion period" lasted about 40 seconds.
   Some link losses were
   observed a little after 15 seconds within the "heavy dispersion
   period", but eventually all links recovered and the dispersion came
   down to the pre-storm level.  With an LSA storm of size 900, the
   "heavy dispersion period" lasted throughout the simulation period
   (6 minutes).

   The generic observations are as follows:

   (1) If the initial LSA storm size (e.g., X=300) is such that the
       delays experienced by Hello packets are not large enough to
       cause any link failures anywhere in the network, the network
       remains stable and quickly gets back to a period of "low
       dispersion".  LSA storms of this type are observed quite
       frequently in operational networks, and the network easily
       recovers from them.

   (2) If the initial LSA storm size (e.g., X=600) is such that the
       delays experienced by a few Hello packets in a few nodes cause
       link failures, then some secondary LSA storms are generated.
       However, the secondary storms do not keep growing indefinitely,
       and the network remains stable and eventually gets back to a
       period of "low dispersion".  This type of LSA storm was
       observed in an operational network, triggered by a network
       upgrade, from which the network recovered but with some
       difficulty.

   (3) If the initial LSA storm size (e.g., X=900) is such that the
       delays experienced by many Hello packets in many nodes cause
       link failures, then a wave of secondary LSA storms is
       generated.  The network enters an unstable state and the
       secondary storms are sustained indefinitely or for a very long
       period of time.  This type of LSA storm was observed in an
       operational network, triggered by a network failure [Ref2],
       from which the network recovered only after taking some
       corrective steps (manual procedures based on reducing
       adjacencies at heavily congested nodes were used to reduce LSA
       flooding and stabilize the network).

   The results show that there is an LSA storm threshold above which
   the network shows unstable behavior.  It was also observed that if
   Hello packets (both received and sent) are given higher priority
   than other IGP packets, then the LSA storm threshold above which
   the network shows unstable behavior is significantly increased.  In
   this draft we only look at the failure of links due to missed
   Hellos, but in general there may be many other types of failures
   once a network enters an unstable state.  Examples include memory
   exhaustion and failure of the node processor due to its inability
   to perform certain critical jobs.

3. Analytic Model for Delay Experienced by a Hello Packet During an
   Initial LSA Storm

   From the simulation results of the previous section it is clear
   that it is important to identify the delay experienced by a Hello
   packet during an initial LSA storm and compare that against the
   maximum allowed delay beyond which the link is declared down.  We
   develop a simple, approximate analytic model for this purpose and
   use it to study the impact of the Hello and SPF intervals on
   network stability.  As explained in Section 2, for every link
   interface a node has to send and receive a Hello packet once every
   hello interval.  The sending of a Hello packet is triggered by a
   timer.  We assume that higher priority is given to timer-triggered
   jobs and therefore no significant delay is experienced in the
   sending of Hello packets.
   However, a received Hello packet cannot easily be distinguished
   from other IGP packets, and therefore we assume that it is served
   in a first-come-first-served fashion.  Let us assume:

   S  = Size of the LSA storm, i.e., the number of LSAs in it.  It is
        assumed that each LSA is carried in one LSU packet.

   L  = Link adjacency of the node under consideration.

   t1 = Time to send or receive one IGP packet over an interface.
        (The same time is assumed for Hello, LSA, duplicate LSA and
        LSA acknowledgment packets, even though in general there may
        be some differences.  This is a good approximation if the
        majority of the time is spent in the act of receiving or
        sending, and a relatively small part on packet-type-specific
        work.)  In the numerical examples we assume t1 = 1 ms.

   t2 = Time to do one SPF calculation.  For large networks this time
        is usually in hundreds of ms, and in the numerical examples we
        assume t2 = 200 ms.

   Hi = Hello interval (the gap between successive Hello messages on
        the same link).

   Si = Minimum interval between successive SPF calculations.

   ro = Rate at which non-IGP work (e.g., forwarding of data packets)
        arrives at the node.  For the numerical examples we assume
        ro = 0.2.

   T  = Total work brought to the node during the LSA storm.  For each
        LSA update generated elsewhere, the node will receive one new
        LSA packet over one interface, send an acknowledgment packet
        over that interface, and send copies of the LSA packet over
        the remaining L-1 interfaces.  Also, assuming that the
        implicit acknowledgment mechanism is in use, the node will
        subsequently receive either an acknowledgment or a duplicate
        LSA over the remaining L-1 interfaces.  So over each interface
        one packet is sent and one is received.  It can be seen that
        the same would be true for self-generated LSAs (see Table 1
        for an example).  So the total work per LSA update is 2*L*t1.
        Since there are S LSAs in the storm, we get

        T = 2*S*L*t1                                              (1)

        In Equation (1) we ignore retransmissions of LSAs in case
        acknowledgments are not received or processed within 5
        seconds.  From the simulation study we see that this is a
        reasonable assumption, since usually only a few
        retransmissions result during the processing of the initial
        LSA storm (retransmissions usually happen at a higher rate
        during the secondary storms).

   T2 = Time period over which the work arrives.  Due to differences
        in propagation times and congestion at other nodes, it is
        possible for the work arrival to be spread out over a long
        interval.  However, since we are primarily interested in the
        few nodes that are bottlenecks or near-bottlenecks, it is
        reasonable to assume that most of the work comes in one chunk.
        We verified this to be usually true using simulations.  One
        part of T2 is of the order of the link propagation delay, and
        we assume that there is a second part proportional to T.
        Therefore we get

        T2 = A + B*T                                              (2)

        where A and B are constants.  For the numerical examples we
        assume A = 10 ms and B = 0.1.

   D  = Maximum delay experienced by a Hello packet during the LSA
        storm.  We assume first-come-first-served service, and hence
        the delay seen by the Hello packet would be the total
        outstanding work at the node at the arrival instant plus its
        own processing time.
   We assume that the outstanding work steadily increases over the
   interval T2, and so the maximum delay is seen by a Hello packet
   that arrives near the end of this interval.  We write down an
   approximate expression for D and then explain the various terms on
   the right-hand side:

      D = T - T2 + max(1,2*T2/Hi)*t1 + max(1,T2/Si)*t2 + ro*T2    (3)

   The first term is the total work brought in due to the LSA storm.
   The second term is the work the node was able to finish, since we
   assume that it was continuously busy during the period T2.  The
   third term is the total work due to the sending and receiving of
   Hello packets during the period T2; note that at least one Hello
   packet is assumed to be processed, namely the one under study.  The
   fourth term is due to SPF processing during the period T2, and we
   assume that at least one SPF computation is done.  The last term is
   the total non-IGP work coming to the node over the interval T2.

   Dmax = Maximum allowed value of D, i.e., if D exceeds this value
   then the associated link is declared down.  In the numerical
   examples below we assume

      Dmax = 3*Hi                                                 (4)

   If we assume that the previous Hello packet was minimally delayed,
   then exceeding Dmax really means four missed hellos, since the
   Hello packet under study itself came after a period Hi.  In the
   numerical examples below, both D and Dmax change with the choice of
   system parameters, and we are mainly interested in identifying
   whether D exceeds Dmax.  For this purpose we define the ratio
   variable

      Delay Ratio = D/Dmax                                        (5)

   and identify whether the Delay Ratio exceeds 1.

   In Tables 2-4 we show the Delay Ratio as a function of the LSA
   storm size for node adjacencies of 10, 20 and 50.  All parameters
   except the ones noted explicitly in the tables are as stated
   earlier.  Table 2 assumes Hello packets every 10 seconds and an SPF
   calculation every 5 seconds, which are typical default values
   today.  With a node adjacency of 10, the Delay Ratio is below 1
   even with an LSA storm of size 900.  However, with a node adjacency
   of 20 the Delay Ratio exceeds 1 at around a storm of size 800, and
   with a node adjacency of 50 it exceeds 1 at around a storm of size
   325.

   ==========|========================================================
             |Table 2: Ratio of Hello Packet Delay to Maximum Allowed
             |Hello Packet Delay as a function of LSA Storm Size (LSS)
             |    (Hello Every 10 Seconds, SPF Every 5 Seconds,
             |     Dmax = 30 seconds)
   NODE      |========================================================
   ADJACENCY | LSS=100   LSS=300   LSS=500   LSS=700   LSS=900
   ==========|========================================================
   10        |  0.0677    0.1904    0.3131    0.4358    0.5584
   ----------|--------------------------------------------------------
   20        |  0.1291    0.3744    0.6198    0.8651    1.1104
   ----------|--------------------------------------------------------
   50        |  0.3131    0.9264    1.5398    2.1558    2.7718
   ==========|========================================================

   In a large network it is not unusual to have LSA storms of size
   several hundred, since the LSA database size may be several
   thousand.  This is particularly true if there are many Autonomous-
   System-External (ASE) LSAs, and if there are special LSAs carrying
   information about available bandwidth at links, as is common in ATM
   networks and might be the case in MPLS-based networks as well.
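   For readers who wish to experiment with the model, the following
   short Python sketch (our illustration, not part of the proposal)
   evaluates Equations (1)-(5) with the parameter values assumed
   above; it reproduces the adjacency-20 row of Table 2:

      def delay_ratio(S, L, Hi, Si,
                      t1=0.001, t2=0.2, ro=0.2, A=0.01, B=0.1):
          """Delay Ratio of Eq. (5); all times in seconds (t1 = 1 ms,
          t2 = 200 ms, A = 10 ms, B = 0.1, ro = 0.2 as in the text).
          """
          T = 2 * S * L * t1               # Eq. (1): storm work
          T2 = A + B * T                   # Eq. (2): arrival interval
          D = (T - T2                      # Eq. (3): backlog after T2
               + max(1, 2 * T2 / Hi) * t1  #  Hello send/receive work
               + max(1, T2 / Si) * t2      #  SPF work during T2
               + ro * T2)                  #  non-IGP work during T2
          Dmax = 3 * Hi                    # Eq. (4): three hellos
          return D / Dmax

      # Adjacency-20 row of Table 2 (Hi = 10 s, Si = 5 s):
      for S in (100, 300, 500, 700, 900):
          print(S, round(delay_ratio(S, L=20, Hi=10, Si=5), 4))
      # -> 0.1291  0.3744  0.6198  0.8651  1.1104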
   In Table 3 the hello interval is decreased to 2 seconds and the SPF
   calculation is done once a second.  The LSA storm thresholds are
   significantly reduced.  Specifically, with a node adjacency of 10
   the Delay Ratio exceeds 1 at around a storm of size 310; with a
   node adjacency of 20 it exceeds 1 at around a storm of size 160;
   and with a node adjacency of 50 it exceeds 1 at around a storm of
   size only 65.

   ==========|========================================================
             |Table 3: Ratio of Hello Packet Delay to Maximum Allowed
             |Hello Packet Delay as a function of LSA Storm Size (LSS)
             |     (Hello Every 2 Seconds, SPF Every 1 Second,
             |      Dmax = 6 seconds)
   NODE      |========================================================
   ADJACENCY | LSS=30    LSS=90    LSS=150   LSS=210   LSS=270
   ==========|========================================================
   10        |  0.124     0.308     0.492     0.676     0.86
   ----------|--------------------------------------------------------
   20        |  0.216     0.584     0.952     1.32      1.691
   ----------|--------------------------------------------------------
   50        |  0.492     1.412     2.349     3.289     4.229
   ==========|========================================================

   In Table 4 the hello interval is decreased even further, to 300 ms,
   and the SPF calculation is done once every 500 ms.  The LSA storm
   thresholds are now very small.  Specifically, with a node adjacency
   of 10 the Delay Ratio exceeds 1 at around a storm of size 40; with
   a node adjacency of 20 it exceeds 1 at around a storm of size 20;
   and with a node adjacency of 50 it is already over 1 even with a
   storm of size 10.

   ==========|========================================================
             |Table 4: Ratio of Hello Packet Delay to Maximum Allowed
             |Hello Packet Delay as a function of LSA Storm Size (LSS)
             |  (Hello Every 300 ms, SPF Every 500 ms, Dmax = 900 ms)
   NODE      |========================================================
   ADJACENCY | LSS=10    LSS=30    LSS=50    LSS=70    LSS=90
   ==========|========================================================
   10        |  0.419     0.828     1.237     1.646     2.055
   ----------|--------------------------------------------------------
   20        |  0.623     1.441     2.259     3.078     3.896
   ----------|--------------------------------------------------------
   50        |  1.237     3.282     5.333     7.467     9.602
   ==========|========================================================

   Based on the simulation observations, we understand that if the
   Delay Ratio is less than 1 for all Hello packets then the system is
   stable, and if it exceeds 1 at many nodes then the system tends to
   enter an unstable region.  Therefore, the LSA storm threshold at
   which the Delay Ratio exceeds 1 may also roughly be considered the
   network stability threshold.  Tables 2-4 show that the stability
   threshold rapidly decreases as the hello interval and the SPF
   computation interval decrease.  One reason for this is the
   increased CPU work due to more frequent hello and SPF computations,
   but the dominant reason is that Dmax itself decreases, so that a
   smaller CPU-busy interval is enough to exceed it.  Specifically,
   Dmax is 30 seconds in Table 2, 6 seconds in Table 3 and only 900 ms
   in Table 4.
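   Using the delay_ratio() sketch above, the stability threshold for a
   given parameter set can be located with a simple search over the
   storm size (again purely illustrative):

      def storm_threshold(L, Hi, Si, step=5, limit=100000):
          """Smallest storm size S (in increments of `step`) at which
          the Delay Ratio of Eq. (5) reaches 1, i.e., the rough
          stability threshold; None if the searched range is stable.
          """
          S = step
          while delay_ratio(S, L, Hi, Si) < 1.0:
              S += step
              if S > limit:
                  return None
          return S

      # Hi = 2 s, Si = 1 s as in Table 3: adjacencies 10, 20 and 50
      # give thresholds of roughly 320, 160 and 65, close to the
      # values quoted in the text.
      for L in (10, 20, 50):
          print(L, storm_threshold(L, Hi=2, Si=1))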
   It is clear from the above examples that in order to maintain
   network stability as the hello interval decreases, it is necessary
   to provide faster, prioritized treatment to received Hello packets,
   which of course can be done only if those packets can be
   distinguished from other IGP packets.

4. Need for Special Marking and Prioritized Treatment of Specific IGP
   Packets

   The analytic and simulation models show that a major cause of
   unstable behavior in networks is that Hello packets received at a
   node get queued behind other work brought to the node during an LSA
   storm and miss the deadline of typically three or four hello
   intervals.  Clearly, if Hello packets can be specially marked to
   distinguish them from other IGP packets, then they can be given
   prioritized treatment and would not miss the deadline even during a
   large LSA storm.  However, the key is that the detection mechanism
   should be significantly faster than the complete processing of an
   IGP packet, and it should be possible to do the detection and
   separate queueing at the line rate.

   Usually a special Diffserv codepoint is used to differentiate all
   IGP packets from other packets.  We propose a separate Diffserv
   codepoint for Hello packets that allows them to be queued
   separately from other IGP packets and given prioritized treatment.

   We also suggest the use of additional mechanisms in order not to
   miss Hello packets during periods of congestion and thereby avoid
   declaring links to be down.  One such mechanism is to treat any
   packet received over a link as an implicit Hello packet for the
   purpose of keeping the link alive.  Under this mechanism a link
   will be declared down only if no packets are received over the link
   for the duration of the Router-Dead interval.  So, during a period
   of congestion, if Hello packets are queued behind LSAs or some
   other packets, but at least one such packet is received over the
   link at least once every Router-Dead interval, the link will stay
   up.

   Besides the Hello packets, there may be other IGP packets that
   could also benefit from special marking and prioritized treatment.
   We give some examples below, but clearly others are possible.

   (1) One example is the LSA acknowledgment packet.  This packet
       disables retransmission, and if a large queueing delay of this
       packet lets the retransmission timer (typical default value 5
       seconds) expire, then a needless retransmission will happen,
       causing extra traffic load.  Special marking and prioritization
       of the LSA acknowledgment packet would eliminate many needless
       retransmissions.  During the database exchange process between
       neighbors following a link coming up, Database Description
       packets are exchanged, and the successful receipt of such a
       packet is acknowledged by sending a properly sequenced Database
       Description packet back to the sender.  Since these packets are
       used as acknowledgments, it makes sense to properly mark and
       prioritize them as well.

   (2) Another example is an LSA carrying change information.  It is
       preferable to transmit this information faster than other LSAs
       in the network that are just once-in-30-minutes refreshes.

       Among "change" LSAs we can distinguish further and give
       preferential treatment to only those "change" LSAs that carry
       intra-area topology change information, as opposed to other
       "change" LSAs that are summary LSAs or Opaque LSAs.  We can
       also distinguish between "change" LSAs carrying "bad"
       information (node/link failure) and those carrying "good"
       information (node/link coming up) and give higher priority to
       LSAs carrying "bad" information.  There may be multiple levels
       of priority depending on the relative importance of the various
       IGP packets.

   The explicit identification can also be used for preferentially
   triggering the SPF calculation.  We can normally have a longer gap
   between successive SPF calculations, but revert to a shorter gap
   after receiving an LSA that carries intra-area-topology-change
   information.  This will speed up restoration time following a
   failure but would not unduly increase the SPF processing overhead.
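   To illustrate how these mechanisms fit together, here is a minimal
   Python sketch.  It is only our illustration of the proposals in
   this section, not a router implementation; the codepoint value and
   all class and field names are assumptions of the sketch:

      import time
      from collections import deque, namedtuple

      Packet = namedtuple("Packet", "dscp payload")       # assumed
      Lsa = namedtuple("Lsa", "intra_area_topo_change")   # assumed

      HELLO_DSCP = 0b101110   # placeholder; the draft proposes a
                              # separate codepoint for Hellos but
                              # does not assign a value

      hello_queue = deque()   # served first, classified at line rate
      other_queue = deque()   # other IGP packets: LSAs, acks, DBDs

      class Interface:
          """Implicit-Hello mechanism: ANY packet received over the
          link refreshes liveness, so a Hello stuck in a queue behind
          LSAs cannot by itself bring the link down."""
          def __init__(self, router_dead_interval):
              self.dead_interval = router_dead_interval
              self.last_heard = time.monotonic()

          def packet_received(self):
              self.last_heard = time.monotonic()

          def is_alive(self):
              return (time.monotonic() - self.last_heard
                      < self.dead_interval)

      def classify(packet, iface):
          """Fast-path classification on the Diffserv codepoint."""
          iface.packet_received()          # implicit Hello
          if packet.dscp == HELLO_DSCP:
              hello_queue.append(packet)   # prioritized treatment
          else:
              other_queue.append(packet)

      # Preferential SPF triggering: a long hold-down normally, a
      # short one once an intra-area topology-change LSA is seen.
      NORMAL_SPF_GAP, URGENT_SPF_GAP = 5.0, 0.5   # seconds, assumed
      spf_gap = NORMAL_SPF_GAP

      def on_lsa(lsa):
          global spf_gap
          if lsa.intra_area_topo_change:
              spf_gap = URGENT_SPF_GAP    # re-converge quickly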
5. Summary

   In this draft we point out that if a large LSA storm is generated
   as a result of some type of failure/recovery of nodes/links, or of
   synchronization among refreshes, then the Hello packets received at
   a node may see large queueing delays and miss the deadline of
   typically three or four hello intervals.  This causes the
   associated link to be declared down, starts a secondary storm, and
   is potentially the beginning of unstable behavior in the network.
   This is already a concern in today's networks, but it would be a
   bigger concern if the hello interval and the minimum interval
   between SPF calculations are substantially reduced (below or
   perhaps well below a second) in order to allow faster rerouting.
   To avoid the above, we propose the following:

   (1) Explicitly mark Hello packets to differentiate them from other
       IGP packets so that efficient implementations can detect and
       act upon these packets in a prioritized fashion.  This may be
       done by using a special Diffserv codepoint for Hello packets
       (separate from that used for other IGP packets).

   (2) In the absence of special marking, or in addition to it, other
       mechanisms should be used in order not to miss Hello packets.
       One example is to treat any packet received over a link as a
       surrogate for a Hello packet for the purpose of keeping the
       link alive.

   (3) The same type of explicit marking and prioritized treatment
       would also help other IGP packets and should be considered.
       Some examples include LSA acknowledgment packets, Database
       Description packets from the slave during database exchange,
       and LSAs carrying intra-area topology change information.  LSAs
       carrying bad news (node/link failures) may also be given
       priority over LSAs carrying good news (node/link coming back
       up).

   It is possible that some implementations already use one or more of
   the above mechanisms in order not to miss the processing of
   critical packets during periods of congestion.  However, we suggest
   that the above mechanisms be included as part of the standard so
   that all implementations can benefit from them.

6. Acknowledgments

   The authors would like to acknowledge several people for their
   helpful comments.  In AT&T we recognize Tushar Amin, Jerry Ash,
   Margaret Chiosi, Elie Francis, Jeff Han, Tom Helstern, Shih-Yue
   Hou, S. Kandaswamy, Beth Munson, Aswatnarayan Raghuram, Moshe
   Segal, John Tinacci, Mike Wardlow and Pat Wirth.
   In Lucent
   Technologies we recognize Nabil Biter and Roshan Rao.

7. References

   [Ref1]  Alaettinoglu, C., Jacobson, V. and H. Yu, "Towards
           Milli-second IGP Convergence," Work in Progress.

   [Ref2]  Pappalardo, D., "AT&T, customers grapple with ATM net
           outage," Network World, February 26, 2001.

   [Ref3]  "AT&T announces cause of frame-relay network outage," AT&T
           Press Release, April 22, 1998.

   [Ref4]  Cholewka, K., "MCI Outage Has Domino Effect," Inter@ctive
           Week, August 20, 1999.

   [Ref5]  Jander, M., "In Qwest Outage, ATM Takes Some Heat," Light
           Reading, April 6, 2001.

   [Ref6]  Zinin, A. and M. Shand, "Flooding Optimizations in
           Link-State Routing Protocols," Work in Progress.

   [Ref7]  Moy, J., "Flooding over Parallel Point-to-Point Links,"
           Work in Progress.

   [Ref8]  Ash, J., Choudhury, G., Han, J., Sapozhnikova, V., Sherif,
           M., Noorchashm, M., Mcallister, S., Maunder, A. and V.
           Manral, "Proposed Mechanisms for Congestion Control /
           Failure Recovery in OSPF & ISIS Networks," Work in
           Progress.

   [Ref9]  Ash, J., Choudhury, G., Sapozhnikova, V., Sherif, M.,
           Maunder, A. and V. Manral, "Congestion Avoidance & Control
           for OSPF Networks," Work in Progress.

   [Ref10] Choudhury, G., Maunder, A. and V. Sapozhnikova, "Faster
           Link-State IGP Convergence and Improved Network Scalability
           and Stability," Presentation at LCN 2001, Tampa, Florida,
           November 14-16, 2001.

8. Authors' Addresses

   Gagan L. Choudhury
   AT&T
   Room D5-3C21
   200 Laurel Avenue
   Middletown, NJ, 07748
   USA
   Phone: (732) 420-3721
   Email: gchoudhury@att.com

   Vera D. Sapozhnikova
   AT&T
   Room C5-2C29
   200 Laurel Avenue
   Middletown, NJ, 07748
   USA
   Phone: (732) 420-2653
   Email: sapozhnikova@att.com

   Anurag S. Maunder
   Sanera Systems
   370 San Aleso Ave.
   Second Floor
   Sunnyvale, CA 94085
   Phone: (408) 734-6123
   Email: amaunder@sanera.net

   Vishwas Manral
   NetPlane
   189, Prashasan Nagar
   Road Number 72
   Jubilee Hills, Hyderabad
   India
   Email: Vishwasm@netplane.com