Network Working Group                                     Vishwas Manral
Internet Draft                                           Netplane Systems
                                                               Russ White
                                                            Cisco Systems
                                                              Aman Shaikh
Expiration Date: August 2004                    University of California
File Name: draft-ietf-bmwg-ospfconv-applicability-04.txt    February 2004

          Benchmarking Applicability for Basic OSPF Convergence
              draft-ietf-bmwg-ospfconv-applicability-04.txt

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups.  Note that
   other groups may also distribute working documents as Internet
   Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2. Abstract

   This document describes the applicability of [BENCHMARK] and of
   similar work that may be done in the future.  It describes the
   advantages and the limitations of the methods defined in
   [BENCHMARK], as well as the pitfalls to avoid during measurement.
   Refer to [TERM] for the terminology used in this document and in
   [BENCHMARK].

3. Motivation

   There is a growing interest in testing single router control plane
   convergence for routing protocols, with many people looking at
   testing methodologies which can provide information on how long it
   takes for a network to converge after various network events occur.
   It is important to consider the framework within which any given
   convergence test is executed when attempting to apply the results of
   the testing, since the framework can have a major impact on the
   results.  For instance, determining when a network is converged and
   deciding which parts of the router's operation are considered within
   the testing will have a major impact on the apparent performance
   that routing protocols provide.

   This document describes in detail the various benefits and pitfalls
   of the tests described in [BENCHMARK].  It also explains how such
   measurements can be useful for providers and the research community.

   NOTE: The word convergence within this document refers to single
   router control plane convergence [TERM].

4. Advantages of Such Measurement

   o  To be able to compare the iterations of a protocol
      implementation.  It is often useful to be able to compare the
      performance of two iterations of a given implementation of a
      protocol to determine where improvements have been made and where
      further improvements can be made.

   o  To understand, given a set of parameters (network conditions),
      how a particular implementation on a particular device is going
      to perform.  For instance, if you were trying to decide the
      processing power (size of device) required at a certain location
      within a network, you could emulate the conditions which are
      going to exist at that point in the network and use the tests
      described to measure the performance of several different
      routers.  The results of these tests can provide one possible
      data point for an intelligent decision.

      If the device being tested is to be deployed in a running
      network, using routes taken from the network where the equipment
      is to be deployed, rather than some generated topology, will give
      results which are closer to the real performance of the device.
      Care should be taken to emulate or take routes from the actual
      location in the network where the device will be (or would be)
      deployed.  For instance, one set of routes may be taken from an
      ABR, one set from an area 0 only router, various sets from stub
      areas, another set from various normal areas, etc.

   o  To measure the performance of an OSPF implementation in a wide
      variety of scenarios.

   o  To be used as parameters in OSPF simulations by researchers.  It
      may sometimes be necessary for certain kinds of research to
      measure the individual delays of each parameter within an OSPF
      implementation.  These delays can be measured using the methods
      defined in [BENCHMARK].

   o  To help optimize certain configurable parameters.  It may
      sometimes be helpful for operators to know the delay required for
      individual tasks so as to optimize resource usage in the network;
      for example, if it is found that the processing time is x seconds
      on a router, it would be helpful to determine the rate at which
      to flood LSAs to that router so as to not overload the network (a
      rough calculation of this sort is sketched below).
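
   As an illustration of the last point, the following minimal sketch
   turns an assumed per-LSA processing time into a conservative
   flooding rate.  The numbers and the safety margin are hypothetical
   and are not taken from [BENCHMARK] or from any real measurement.

      # Illustrative only: derive a conservative LSA flooding rate from
      # a measured per-LSA processing time.  Both values are assumed.
      per_lsa_processing_time = 0.002   # seconds per LSA on the DUT
      safety_margin = 0.5               # use half the theoretical capacity

      max_rate = 1.0 / per_lsa_processing_time   # LSAs/second the DUT could absorb
      safe_rate = max_rate * safety_margin       # rate to configure on the flooder
      print("flood at no more than %.0f LSAs/second" % safe_rate)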
5. Assumptions Made and Limitations of Such Measurements

   o  The interaction of convergence and forwarding is not considered;
      testing is restricted to events occurring within the control
      plane.  Forwarding performance is the primary focus of
      [INTERCONNECT], and it is expected to be dealt with in work that
      ensues from [FIB-TERM].

   o  Duplicate LSAs are acknowledged immediately.  A few tests rely on
      the property that duplicate LSA acknowledgements are not delayed
      but are sent immediately.  However, if an implementation does not
      acknowledge duplicate LSAs immediately on receipt, the testing
      methods presented in [BENCHMARK] could give inaccurate
      measurements.

   o  It is assumed that SPF is non-preemptive.  If SPF is implemented
      so that it can (and will be) preempted, the SPF measurements
      taken in [BENCHMARK] would include the time that the SPF process
      is not running ([BENCHMARK] measures the total time taken for SPF
      to run, not the amount of time that SPF actually spends on the
      device's processor), thus giving inaccurate measurements.

   o  Some implementations may be multithreaded or may use a
      multiprocess/multirouter model of OSPF.  If any of the
      assumptions made in the measurements are violated by such a
      model, the result could be inaccurate measurements.

   o  The measurements resulting from the tests in [BENCHMARK] may not
      provide the information required to deploy a device in a large
      scale network.  The tests described focus on individual
      components of an OSPF implementation's performance, and it may be
      difficult to combine the measurements in a way which accurately
      depicts a device's performance in a large scale network.  Further
      research is required in this area.

   o  The measurements described in [BENCHMARK] should be used with
      great care when comparing two different implementations of OSPF
      from two different vendors.  For instance, there are many factors
      other than convergence speed which must be taken into
      consideration when comparing different vendors' products, and it
      is difficult to align the resources available on one device with
      the resources available on another device.

6. Observations on the Tests Described in [BENCHMARK]

   Some observations taken while implementing the tests described in
   [BENCHMARK] are noted in this section.

6.1. Measuring the SPF Processing Time Externally

   The most difficult test to perform is the external measurement of
   the time required to perform an SPF calculation, since the amount of
   time between the first LSA which indicates a topology change and the
   duplicate LSA is critical.  If the duplicate LSA is sent too
   quickly, it may be received before the device under test actually
   begins running SPF on the network change information.  If the delay
   between the two LSAs is too long, the device under test may finish
   SPF processing before receiving the duplicate LSA.  It is important
   to closely investigate any delays between the receipt of an LSA and
   the beginning of an SPF calculation in the device under test;
   multiple tests with various delays might be required to determine
   what delay needs to be used to accurately measure the SPF
   calculation time.

   Some implementations may enforce two intervals, the SPF hold time
   and the SPF delay, between successive SPF calculations.  If an SPF
   hold time exists, it should be subtracted from the total SPF
   execution time.  If an SPF delay exists, it should be noted in the
   test results.
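
   The following minimal sketch shows one way of doing the bookkeeping
   described above; it is not the [BENCHMARK] procedure itself, and the
   helper name, the inter-LSA delays, and the raw times are all
   hypothetical.

      # Sketch only: combine raw externally measured times, taken with
      # several delays between the triggering LSA and the duplicate
      # LSA, into SPF time estimates.  Subtract any configured SPF hold
      # time; note (but do not subtract) any configured SPF delay.
      def summarize(raw_results, spf_hold_time, spf_delay):
          """raw_results maps inter-LSA delay (s) to raw measured time (s)."""
          for inter_lsa_delay, raw_time in sorted(raw_results.items()):
              spf_time = raw_time - spf_hold_time
              print("delay %.3f s -> SPF time %.3f s (SPF delay %.3f s noted)"
                    % (inter_lsa_delay, spf_time, spf_delay))

      # Hypothetical raw measurements from runs with different delays.
      summarize({0.010: 0.245, 0.050: 0.248, 0.200: 0.212},
                spf_hold_time=0.100, spf_delay=0.0)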
6.2. Noise in the Measurement Device

   The device on which measurements are taken (not the device under
   test) also adds noise to the test results, primarily in the form of
   delay in packet processing and measurement output.  The largest
   source of noise is generally the delay between the receipt of a
   packet by the measuring device and the moment the information about
   that packet reaches the device's output, where the event can be
   measured.  The following steps may be taken to reduce this sampling
   noise:

   o  Increase the number of samples taken; this will generally improve
      the tester's ability to determine what is noise and to remove it
      from the results.

   o  Take the time-stamp for a packet as early as possible.  Depending
      on the operating system being used on the measurement device, the
      kernel can be instrumented to take the time-stamp when the
      interrupt is processed.  This does not eliminate the noise
      completely, but it at least reduces it.

   o  Keep the measurement device as lightly loaded as possible.

   o  Obtain an estimate of the noise, so that it can be accounted for
      in the results (a simple way of doing so is sketched at the end
      of this section).

   The DUT also adds noise to the measurement.  The first and third
   points above apply to the DUT as well.
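
   The following minimal sketch estimates the measurement device's own
   noise from repeated calibration samples; the idea of measuring an
   event of (near) zero true duration and the sample values themselves
   are assumptions for the example.

      # Sketch only: estimate measurement noise from calibration runs in
      # which the true duration of the measured event is (near) zero.
      from statistics import mean, median, pstdev

      calibration_samples = [0.0021, 0.0019, 0.0025, 0.0020, 0.0031, 0.0022]

      noise = median(calibration_samples)
      print("noise: median %.4f s, mean %.4f s, stddev %.4f s"
            % (noise, mean(calibration_samples), pstdev(calibration_samples)))

      # A raw measurement can then be reported alongside the estimate,
      # or corrected by it (hypothetical value below).
      raw_measurement = 0.1500
      print("corrected measurement: %.4f s" % (raw_measurement - noise))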
6.3. Gaining an Understanding of the Implementation Improves
     Measurements

   While the tester will (generally) not have access to internal
   information about the OSPF implementation being tested using
   [BENCHMARK], the more thorough the tester's knowledge of the
   implementation is, the more accurate the results of the tests will
   be.  For instance, in some implementations, the installation of
   routes in the local routing table may occur while the SPF is being
   calculated, dramatically impacting the time required to calculate
   the SPF.

6.4. Gaining an Understanding of the Tests Improves Measurements

   One method which can be used to become familiar with the tests
   described in [BENCHMARK] is to perform the tests on an OSPF
   implementation for which all the internal details are available,
   such as [GATED].  While there is no assurance that any two
   implementations will be similar, this will provide a better
   understanding of the tests themselves.

7. LSA and Destination Mix

   Many OSPF benchmark tests call for a generator injecting a number of
   LSAs.  There are several areas in which the injected LSAs can be
   varied in testing:

   o  The number of destinations represented by the injected LSAs.

      Each destination represents a single reachable IP network; these
      will be leaf nodes on the shortest path tree.  The primary impact
      on performance should be the time required to insert destinations
      into the local routing table and to manage the memory required to
      store the data.

   o  The types of LSAs injected.

      There are several types of LSAs which would be acceptable under
      different situations; within an area, for instance, type 1, 2, 3,
      4, and 5 LSAs are likely to be received by a router.  Within a
      not-so-stubby area, however, type 7 LSAs would replace the type 5
      LSAs received.  These sorts of characterizations are important to
      note in any test results.

   o  The number of LSAs injected.

      Within any injected set of information, the number of each type
      of LSA injected is also important.  This will impact the shortest
      path algorithm's ability to handle large numbers of nodes, large
      shortest path first trees, etc.

   o  The order of LSA injection.

      The order in which LSAs are injected should not favor any given
      data structure used for storing the LSA database on the device
      under test.  For instance, AS-external LSAs have AS-wide flooding
      scope; any type 5 LSA originated is immediately flooded to all
      neighbors.  However, the type 4 LSA which announces the ASBR as a
      border router is originated in an area at SPF time (by the ABRs
      on the edge of the area in which the ASBR is).  If SPF is not
      scheduled immediately on the ABRs originating the type 4 LSA, the
      type 4 LSA is sent after the type 5 LSAs reach a router in the
      adjacent area, so routes to the external destinations are not
      immediately added on the routers in the other areas.  When the
      routers which already have the type 5 LSAs receive the type 4
      LSA, all the external routes are added to the tree at the same
      time.  This timing could produce different results than a router
      receiving a type 4 LSA indicating the presence of a border
      router, followed by the type 5 LSAs originated by that border
      router.

      The ordering can be changed in various tests to provide insight
      into the efficiency of storage within the DUT.  Any such changes
      in ordering should be noted in the test results (one simple way
      of producing and recording such an ordering is sketched below).
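
   The following minimal sketch randomizes the injection order while
   keeping it reproducible, so that the ordering can be noted with the
   results; the list of LSAs and its contents are placeholders.

      # Sketch only: shuffle the LSA injection order with a recorded
      # seed so the same ordering can be replayed and reported.
      import random

      lsas = [("router", 1), ("router", 2), ("network", 3),
              ("summary", 4), ("external", 5)]   # (type, id) placeholders

      seed = 42                       # record the seed with the results
      random.Random(seed).shuffle(lsas)
      print("injection order (seed %d): %s" % (seed, lsas))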
8. Tree Shape and the SPF Algorithm

   The complexity of Dijkstra's algorithm depends on the data structure
   used for storing vertices with their current minimum distances from
   the source, the simplest structure being a list of the vertices
   currently reachable from the source.  With a simple list of
   vertices, finding the minimum cost vertex takes O(size of the list).
   There will be O(n) such operations if we assume that all the
   vertices are ultimately reachable from the source.  Moreover, after
   the vertex with minimum cost is found, the algorithm iterates
   through all the edges of that vertex and updates the costs of the
   other vertices.  With an adjacency list representation, this step,
   iterated over all the vertices, takes O(E) time, with E being the
   number of edges in the graph.  Thus, the overall running time is:

      O( sum(i=1..n) size(list at level i)  +  E )

   So, everything boils down to size(list at level i).

   If the graph is linear:

      root
       |
       1
       |
       2
       |
       3
       |
       4
       |
       5
       |
       6

   and the source is a vertex on the end, then size(list at level i) =
   1 for all i.  Moreover, E = n - 1.  Therefore, the running time is
   O(n).

   If the graph is a balanced binary tree:

            root
           /    \
          1      2
         / \    / \
        3   4  5   6

   size(list at level i) is a little more complicated.  First it
   increases by 1 at each step up to a certain number, and then it
   decreases by 1 at each step.  If we assume that the tree is a
   complete tree (like the one above) with k levels (1 to k), then
   size(list) grows as 1, 2, 3, ... and then comes back down in the
   same way.  The number of edges E is still n - 1.  It then turns out
   that the run-time is O(n^2) for such a tree.

   If the graph is a complete graph (fully-connected mesh), then
   size(list at level i) = n - i.  The number of edges E = O(n^2).
   Therefore, the run-time is O(n^2).

   So, the performance of the shortest path first algorithm used to
   compute the best paths through the network is dependent on the
   construction of the tree.  The best practice would be to try to make
   any emulated network look as much like a real network as possible,
   especially in the area of the tree depth, the meshiness of the
   network, the number of stub links versus transit links, and the
   number of connections and nodes to process at each level within the
   original tree.
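
   To make the role of size(list at level i) concrete, the following
   minimal sketch implements the list-based variant of the algorithm
   discussed above on a hypothetical graph; it is an illustration only
   and is not part of [BENCHMARK].

      # Dijkstra's algorithm with a plain list as the candidate
      # structure: extracting the minimum costs O(size of the list),
      # which is why the running time depends on the shape of the tree.
      def spf(graph, source):
          dist = {source: 0}
          candidates = [source]          # the "list at level i"
          visited = set()
          while candidates:
              v = min(candidates, key=lambda x: dist[x])  # O(size of list)
              candidates.remove(v)
              visited.add(v)
              for neighbor, cost in graph.get(v, []):     # O(E) over the whole run
                  if neighbor in visited:
                      continue
                  new_cost = dist[v] + cost
                  if neighbor not in dist or new_cost < dist[neighbor]:
                      dist[neighbor] = new_cost
                      if neighbor not in candidates:
                          candidates.append(neighbor)
          return dist

      # The balanced binary tree from the figure above, with unit costs.
      tree = {
          "root": [("1", 1), ("2", 1)],
          "1": [("3", 1), ("4", 1)],
          "2": [("5", 1), ("6", 1)],
      }
      print(spf(tree, "root"))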
9. Topology Generation

   As the size of networks grows, it becomes more and more difficult to
   actually create a large scale network on which to test the
   properties of routing protocols and their implementations.  In
   general, network emulators are used to provide emulated topologies
   which can be advertised to a device under varying conditions.  Route
   generators tend to be either specialized devices, software which
   runs on a router, or processes that run on another operating system,
   such as Linux or another variant of Unix.

   Some of the characteristics of this device should be:

   o  The ability to connect to several devices using both point-to-
      point and broadcast high-speed media.  Point-to-point links can
      be emulated with high speed Ethernet as long as there is no hub
      or other device between the DUT and the route generator, and the
      link is configured as a point-to-point link within OSPF
      [BROADCAST-P2P].

   o  The ability to create a set of LSAs which appear to be a logical,
      realistic topology.  For instance, the generator should be able
      to mix the number of point-to-point and broadcast links within
      the emulated topology, and should be able to inject varying
      numbers of externally reachable destinations.

   o  The ability to add routing information to, and withdraw it from,
      the emulated topology, to emulate flapping links.

   o  The ability to randomly order the LSAs representing the emulated
      topology as they are advertised.

   o  The ability to log or otherwise measure the time between packets
      transmitted and received.

   o  The ability to change the rate at which OSPF LSAs are
      transmitted.

   o  The generator and the collector should be fast enough that they
      are not bottlenecks.  The devices should also provide a
      measurement granularity at least as fine as that desired in the
      test results.

10. Acknowledgements

   Thanks to Howard Berkowitz (hcb@clark.net) and the rest of the BGP
   benchmarking team for their support, and to Kevin Dubray
   (kdubray@juniper.net), who realized the need for this draft.

11. Normative References

   [BENCHMARK]
        Manral, V., "Benchmarking Basic OSPF Single Router Control
        Plane Convergence", draft-bmwg-ospfconv-intraarea-07, November
        2003.

   [TERM]
        Manral, V., "OSPF Convergence Testing Terminology and
        Concepts", draft-bmwg-ospfconv-term-07, January 2004.

12. Informative References

   [INTERCONNECT]
        Bradner, S. and McQuaid, J., "Benchmarking Methodology for
        Network Interconnect Devices", RFC 2544, March 1999.

   [FIB-TERM]
        Trotter, G., "Terminology for Forwarding Information Base (FIB)
        based Router Performance", RFC 3222, October 2001.

   [BROADCAST-P2P]
        Shen, N., et al., "Point-to-point operation over LAN in link-
        state routing protocols",
        draft-ietf-isis-igp-p2p-over-lan-03.txt, August 2003.

   [GATED]
        http://www.gated.org

13. Authors' Addresses

   Vishwas Manral
   Netplane Systems
   189 Prashasan Nagar
   Road number 72
   Jubilee Hills
   Hyderabad, India

   vmanral@netplane.com

   Russ White
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709

   riw@cisco.com

   Aman Shaikh
   University of California
   School of Engineering
   1156 High Street
   Santa Cruz, CA 95064

   aman@soe.ucsc.edu