Network Working Group                                    Vishwas Manral
Internet Draft                                         Netplane Systems
                                                             Russ White
                                                          Cisco Systems
                                                            Aman Shaikh
Expiration Date: November 2004                 University of California
File Name: draft-ietf-bmwg-ospfconv-applicability-05.txt       May 2004

          Benchmarking Applicability for Basic OSPF Convergence
             draft-ietf-bmwg-ospfconv-applicability-05.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups.  Note that
   other groups may also distribute working documents as Internet
   Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   This document discusses the applicability of various tests for
   measuring single router control plane convergence, specifically in
   regard to the Open Shortest Path First (OSPF) protocol.  There are
   two general sections in this document, the first discussing the
   advantages and limitations of specific OSPF convergence tests, and
   the second discussing more general pitfalls to be considered when
   performing routing protocol convergence testing.

1. Introduction

   There is a growing interest in testing single router control plane
   convergence for routing protocols, with many people looking at
   testing methodologies which can provide information on how long it
   takes for a network to converge after various network events
   occur.  It is important to consider the framework within which any
   given convergence test is executed when attempting to apply the
   results of the testing, since the framework can have a major impact
   on the results.  For instance, the determination of when a network
   is converged, which parts of the router's operation are considered
   within the testing, and other such factors will have a major impact
   on the apparent performance of a routing protocol.

   This document describes in detail the various benefits and pitfalls
   of the tests described in [BENCHMARK].  It also explains how such
   measurements can be useful for providers and the research
   community.

   NOTE: The word convergence within this document refers to single
   router control plane convergence [TERM].

2. Advantages of Such Measurement

   o  To be able to compare the iterations of a protocol
      implementation.  It is often useful to be able to compare the
      performance of two iterations of a given implementation of a
      protocol to determine where improvements have been made and
      where further improvements can be made.

   o  To understand, given a set of parameters (network conditions),
      how a particular implementation on a particular device is going
      to perform.  For instance, if you were trying to decide the
      processing power (size of device) required at a certain location
      within a network, you can emulate the conditions which are going
      to exist at that point in the network and use the tests
      described to measure the performance of several different
      routers.  The results of these tests can provide one possible
      data point for an intelligent decision.

      If the device being tested is to be deployed in a running
      network, using routes taken from the network where the equipment
      is to be deployed in these tests, rather than some generated
      topology, will give results which are closer to the real
      performance of the device.
      Care should be taken to emulate or take routes from the actual
      location in the network where the device will be (or would be)
      deployed.  For instance, one set of routes may be taken from an
      ABR, one set from an area 0 only router, various sets from stub
      areas, another set from various normal areas, etc.

   o  To measure the performance of an OSPF implementation in a wide
      variety of scenarios.

   o  To be used as parameters in OSPF simulations by researchers.  It
      may sometimes be required for certain kinds of research to
      measure the individual delays of each parameter within an OSPF
      implementation.  These delays can be measured using the methods
      defined in [BENCHMARK].

   o  To help optimize certain configurable parameters.  It may
      sometimes be helpful for operators to know the delay required
      for individual tasks so as to optimize the resource usage in the
      network; for example, if it is found that the processing time is
      x seconds on a router, it would be helpful to determine the rate
      at which to flood LSAs to that router so as not to overload the
      network.

3. Assumptions Made and Limitations of Such Measurements

   o  The interaction between convergence and forwarding is not
      considered; testing is restricted to events occurring within the
      control plane.  Forwarding performance is the primary focus in
      [INTERCONNECT], and it is expected to be dealt with in work that
      ensues from [FIB-TERM].

   o  Duplicate LSAs are acknowledged immediately.  A few tests rely
      on the property that acknowledgements of duplicate LSAs are not
      delayed but are sent immediately.  However, if an implementation
      does not acknowledge duplicate LSAs immediately on receipt, the
      testing methods presented in [BENCHMARK] could give inaccurate
      measurements.

   o  It is assumed that SPF is non-preemptive.  If SPF is implemented
      so that it can (and will) be preempted, the SPF measurements
      taken in [BENCHMARK] would include the times that the SPF
      process is not running ([BENCHMARK] measures the total time
      taken for SPF to run, not the amount of time that SPF actually
      spends on the device's processor), thus giving inaccurate
      measurements.

   o  Some implementations may be multithreaded or use a
      multiprocess/multirouter model of OSPF.  If any of the
      assumptions made in the measurements are violated by such a
      model, the measurements could be inaccurate.

   o  The measurements resulting from the tests in [BENCHMARK] may not
      provide the information required to deploy a device in a large
      scale network.  The tests described focus on individual
      components of an OSPF implementation's performance, and it may
      be difficult to combine the measurements in a way which
      accurately depicts a device's performance in a large scale
      network.  Further research is required in this area.

   o  The measurements described in [BENCHMARK] should be used with
      great care when comparing two different implementations of OSPF
      from two different vendors.  For instance, there are many
      factors other than convergence speed that need to be taken into
      consideration when comparing different vendors' products, and it
      is difficult to align the resources available on one device with
      the resources available on another.

4. Observations on the Tests Described in [BENCHMARK]

   Some observations taken while implementing the tests described in
   [BENCHMARK] are noted in this section.

4.1. Measuring the SPF Processing Time Externally

   The most difficult test to perform is the external measurement of
   the time required to perform an SPF calculation, since the amount
   of time between the first LSA which indicates a topology change and
   the duplicate LSA is critical.  If the duplicate LSA is sent too
   quickly, it may be received before the device under test actually
   begins running SPF on the network change information.  If the delay
   between the two LSAs is too long, the device under test may finish
   SPF processing before receiving the duplicate LSA.  It is important
   to closely investigate any delays between the receipt of an LSA and
   the beginning of an SPF calculation in the device under test;
   multiple tests with various delays might be required to determine
   what delay needs to be used to accurately measure the SPF
   calculation time.

   Some implementations may force two intervals, the SPF hold time and
   the SPF delay, between successive SPF calculations.  If an SPF hold
   time exists, it should be subtracted from the total SPF execution
   time.  If an SPF delay exists, it should be noted in the test
   results.

4.2. Noise in the Measurement Device

   The device on which measurements are taken (not the device under
   test) also adds noise to the test results, primarily in the form of
   delay in packet processing and measurement output.  The largest
   source of noise is generally the delay between the receipt of
   packets by the measuring device and the information about the
   packet reaching the device's output, where the event can be
   measured.  The following steps may be taken to reduce this sampling
   noise:

   o  Increasing the number of samples taken will generally improve
      the tester's ability to determine what is noise and to remove it
      from the results.

   o  Try to take the time-stamp for a packet as early as possible.
      Depending on the operating system being used on the box, one can
      instrument the kernel to take the time-stamp when the interrupt
      is processed.  This does not eliminate the noise completely, but
      it at least reduces it.

   o  Keep the measurement box as lightly loaded as possible.

   o  Having an estimate of the noise can also be useful.

   The DUT also adds noise to the measurement.  The first and third
   points above apply to the DUT as well.

4.3. Gaining an Understanding of the Implementation Improves
     Measurements

   While the tester will (generally) not have access to internal
   information about the OSPF implementation being tested using
   [BENCHMARK], the more thorough the tester's knowledge of the
   implementation is, the more accurate the results of the tests will
   be.  For instance, in some implementations, the installation of
   routes in the local routing table may occur while the SPF is being
   calculated, dramatically impacting the time required to calculate
   the SPF.

4.4. Gaining an Understanding of the Tests Improves Measurements

   One method which can be used to become familiar with the tests
   described in [BENCHMARK] is to perform the tests on an OSPF
   implementation for which all the internal details are available,
   such as [GATED].  While there is no assurance that any two
   implementations will be similar, this will provide a better
   understanding of the tests themselves.
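
   As a rough illustration of the external SPF measurement described
   in Section 4.1, the sketch below shows one way the timestamps
   collected on the measurement device might be combined into an SPF
   time estimate.  It is not taken from [BENCHMARK]; the function
   name, its parameters, and the idea of a separately measured
   baseline acknowledgement delay are assumptions made purely for
   illustration.

      # Sketch: estimate the externally observed SPF time on the DUT
      # from timestamps (in seconds) taken on the measurement device.
      def estimate_spf_time(t_dup_sent,    # duplicate LSA sent to DUT
                            t_dup_acked,   # ack for the duplicate seen
                            ack_baseline,  # ack delay measured when no
                                           # SPF run is in progress
                            spf_hold=0.0): # configured SPF hold time,
                                           # subtracted per Section 4.1
          raw = t_dup_acked - t_dup_sent
          # Remove the DUT's normal acknowledgement handling time and
          # any hold time, leaving an estimate of the SPF calculation.
          return max(0.0, raw - ack_baseline - spf_hold)

      # Example: ack seen 180 ms after the duplicate was sent, 5 ms
      # baseline ack delay, no hold time -> roughly 175 ms of SPF.
      print(estimate_spf_time(10.000, 10.180, 0.005))

   Repeating the test with several different delays between the
   topology-change LSA and the duplicate, as discussed in Section 4.1,
   and averaging over many samples (Section 4.2) helps separate the
   SPF time from measurement noise.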

5. LSA and Destination Mix

   In many OSPF benchmark tests, a generator injecting a number of
   LSAs is called for.  There are several areas in which the injected
   LSAs can be varied in testing:

   o  The number of destinations represented by the injected LSAs

      Each destination represents a single reachable IP network; these
      will be leaf nodes on the shortest path tree.  The primary
      impact on performance should be the time required to insert
      destinations into the local routing table and to handle the
      memory required to store the data.

   o  The types of LSAs injected

      There are several types of LSAs which would be acceptable under
      different situations; within an area, for instance, LSAs of
      types 1, 2, 3, 4, and 5 are likely to be received by a router.
      Within a not-so-stubby area, however, type-7 LSAs would replace
      the type-5 LSAs received.  These sorts of characterizations are
      important to note in any test results.

   o  The number of LSAs injected

      Within any injected set of information, the number of each type
      of LSA injected is also important.  This will impact the
      shortest path algorithm's ability to handle large numbers of
      nodes, large shortest path first trees, etc.

   o  The order of LSA injection

      The order in which LSAs are injected should not favor any given
      data structure used for storing the LSA database on the device
      under test.  For instance, AS-external LSAs have AS-wide
      flooding scope; any type-5 LSA originated is immediately flooded
      to all neighbors.  However, the type-4 LSA which announces the
      ASBR as a border router is originated into an area at SPF time
      (by the ABRs on the edge of the area in which the ASBR resides).
      If SPF is not scheduled immediately on the ABRs originating the
      type-4 LSA, the type-4 LSA is sent after the type-5 LSAs reach a
      router in the adjacent area, so routes to the external
      destinations are not immediately added on the routers in the
      other areas.  When the routers which already have the type-5
      LSAs receive the type-4 LSA, all the external routes are added
      to the tree at the same time.  This timing could produce
      different results than a router receiving a type-4 LSA
      indicating the presence of a border router, followed by the
      type-5 LSAs originated by that border router.

      The ordering can be changed in various tests to provide insight
      into the efficiency of storage within the DUT.  Any such changes
      in ordering should be noted in test results.

6. Tree Shape and the SPF Algorithm

   The complexity of Dijkstra's algorithm depends on the data
   structure used for storing vertices with their current minimum
   distances from the source, the simplest structure being a list of
   vertices currently reachable from the source.  With a simple list
   of vertices, finding the minimum-cost vertex takes O(size of the
   list) time.  There will be O(n) such operations if we assume that
   all the vertices are ultimately reachable from the source.
   Moreover, after the vertex with the minimum cost is found, the
   algorithm iterates through all the edges of that vertex and updates
   the costs of the other vertices.  With an adjacency list
   representation, this step, summed over all the vertices, takes O(E)
   time, where E is the number of edges in the graph.  Thus, the
   overall running time is:

      O(sum(i = 1 to n) size(list at level i) + E)

   So, everything boils down to size(list at level i).
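
   The following sketch (an illustration only, not code from any
   particular OSPF implementation) shows the simple list-based form of
   Dijkstra's algorithm described above; the scan for the minimum-cost
   vertex on each iteration is where the size(list at level i) term
   comes from, and the neighbor updates contribute the O(E) term.

      import math

      def dijkstra_list(graph, source):
          # graph: dict mapping each vertex to a list of
          # (neighbor, cost) pairs.
          dist = {v: math.inf for v in graph}
          dist[source] = 0
          candidates = {source}   # the "list" of reachable vertices
                                  # not yet finalized
          done = set()
          while candidates:
              # O(size of the list): scan for the minimum-cost vertex.
              u = min(candidates, key=lambda v: dist[v])
              candidates.remove(u)
              done.add(u)
              # Update the costs of u's neighbors; summed over the
              # whole run, this is the O(E) term.
              for v, cost in graph[u]:
                  if v not in done and dist[u] + cost < dist[v]:
                      dist[v] = dist[u] + cost
                      candidates.add(v)
          return dist

   How large the candidate list becomes at each step depends on the
   shape of the graph, as the examples below show.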

   If the graph is linear:

         root
          |
          1
          |
          2
          |
          3
          |
          4
          |
          5
          |
          6

   and the source is a vertex at one end, then size(list at level i)
   = 1 for all i.  Moreover, E = n - 1.  Therefore, the running time
   is O(n).

   If the graph is a balanced binary tree:

                root
               /    \
              1      2
             / \    / \
            3   4  5   6

   size(list at level i) is a little more complicated.  First it
   increases by 1 at each level up to a certain number, and then it
   goes down by 1.  If we assume that the tree is a complete tree
   (like the one in the draft) with k levels (1 to k), then size(list)
   goes like this: 1, 2, 3, ...

   The number of edges E is still n - 1.  It then turns out that the
   run-time is O(n^2) for such a tree.

   If the graph is a complete graph (fully-connected mesh), then
   size(list at level i) = n - i.  The number of edges E = O(n^2).
   Therefore, the run-time is O(n^2).

   So, the performance of the shortest path first algorithm used to
   compute the best paths through the network is dependent on the
   construction of the tree.  The best practice would be to try to
   make any emulated network look as much like a real network as
   possible, especially in the areas of tree depth, the meshiness of
   the network, the number of stub links versus transit links, and the
   number of connections and nodes to process at each level within the
   original tree.

7. Topology Generation

   As the size of networks grows, it becomes more and more difficult
   to actually create a large scale network on which to test the
   properties of routing protocols and their implementations.  In
   general, network emulators are used to provide emulated topologies
   which can be advertised to a device under varying conditions.
   Route generators tend to be either a specialized device, a piece of
   software which runs on a router, or a process that runs on another
   operating system, such as Linux or another variant of Unix.

   Some of the characteristics of this device should be:

   o  The ability to connect to several devices using both point-to-
      point and broadcast high speed media.  Point-to-point links can
      be emulated with high speed Ethernet as long as there is no hub
      or other device in between the DUT and the route generator, and
      the link is configured as a point-to-point link within OSPF
      [BROADCAST-P2P].

   o  The ability to create a set of LSAs which appear to be a
      logical, realistic topology.  For instance, the generator should
      be able to mix the number of point-to-point and broadcast links
      within the emulated topology, and should be able to inject
      varying numbers of externally reachable destinations.

   o  The ability to add routing information to, and withdraw it from,
      the emulated topology in order to emulate flapping links.

   o  The ability to randomly order the LSAs representing the emulated
      topology as they are advertised.

   o  The ability to log or otherwise measure the time between packets
      transmitted and received.

   o  The ability to change the rate at which OSPF LSAs are
      transmitted.

   o  The generator and the collector should be fast enough so that
      they are not bottlenecks.  The devices should also have a
      measurement granularity at least as fine as that desired in the
      test results.
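
   As a rough illustration of several of the characteristics listed
   above (randomized ordering, rate control, and logging of transmit
   times), the sketch below shows how a generator might pace a
   randomized LSA injection.  The send() callable and the pre-built
   LSA payloads are assumptions supplied by the actual generator, not
   anything defined in this document.

      import random
      import time

      def inject_lsas(lsas, send, rate_per_second, seed=None):
          # lsas: list of pre-built LSA payloads (encoded elsewhere).
          # send: callable that transmits one LSA toward the DUT.
          # rate_per_second: target LSA flooding rate.
          rng = random.Random(seed)    # a seed makes the ordering
                                       # repeatable across test runs
          order = list(lsas)
          rng.shuffle(order)           # avoid favoring any particular
                                       # data structure in the DUT
          interval = 1.0 / rate_per_second
          log = []
          for lsa in order:
              log.append((time.time(), lsa))   # record transmit time
              send(lsa)
              time.sleep(interval)             # pace the flood
          return log

   Recording the shuffled order and the transmit times along with the
   test results keeps a run repeatable and, as noted in Section 5, the
   ordering used should be reported.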

8. Security Considerations

   This document does not modify the underlying security
   considerations in [OSPF].

9. Acknowledgements

   Thanks to Howard Berkowitz (hcb@clark.net) and the rest of the BGP
   benchmarking team for their support, and to Kevin Dubray
   (kdubray@juniper.net), who realized the need for this draft.

10. Normative References

   [BENCHMARK]
        Manral, V., "Benchmarking Basic OSPF Single Router Control
        Plane Convergence", draft-ietf-bmwg-ospfconv-intraarea-08,
        May 2004.

   [TERM]
        Manral, V., "OSPF Convergence Testing Terminology and
        Concepts", draft-ietf-bmwg-ospfconv-term-08, May 2004.

   [OSPF]
        Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [RFC2119]
        Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

11. Informative References

   [INTERCONNECT]
        Bradner, S. and J. McQuaid, "Benchmarking Methodology for
        Network Interconnect Devices", RFC 2544, March 1999.

   [FIB-TERM]
        Trotter, G., "Terminology for Forwarding Information Base
        (FIB) based Router Performance", RFC 3222, October 2001.

   [BROADCAST-P2P]
        Shen, N., et al., "Point-to-point operation over LAN in
        link-state routing protocols",
        draft-ietf-isis-igp-p2p-over-lan-03.txt, August 2003.

   [GATED]
        http://www.gated.org

12. Authors' Addresses

   Vishwas Manral
   Netplane Systems
   189 Prashasan Nagar
   Road number 72
   Jubilee Hills
   Hyderabad, India

   vmanral@netplane.com

   Russ White
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709

   riw@cisco.com

   Aman Shaikh
   University of California
   School of Engineering
   1156 High Street
   Santa Cruz, CA 95064

   aman@soe.ucsc.edu