Network Working Group                                 Gabor Feher, BUTE
INTERNET-DRAFT                                   Istvan Cselenyi, TRAB
Expiration Date: January 2002                        Andras Korn, BUTE

                                                              July 2001

  Benchmarking Methodology for Routers Supporting Resource Reservation

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft shadow directories can be accessed at
   http://www.ietf.org/shadow.html

   This memo provides information for the Internet community. This
   memo does not specify an Internet standard of any kind.
   Distribution of this memo is unlimited.

2. Table of contents

   1. Status of this Memo
   2. Table of contents
   3. Abstract
   4. Introduction
   5. Existing definitions
   6. Methodology
      6.1 Evaluating the Results
      6.2 Test Setup
         6.2.1 Testing Unicast Resource Reservation Sessions
         6.2.2 Testing Multicast Resource Reservation Sessions
         6.2.3 Signaling Flow
         6.2.4 Signaling Message Verification
      6.3 Scalability Tests
         6.3.1 Maximum Signaling Message Burst Size
         6.3.2 Maximum Signaling Load
         6.3.3 Maximum Session Load
      6.4 Benchmarking Tests
         6.4.1 Performing the Benchmarking Measurements
   7. Acknowledgement
   8. References
   9. Authors' Addresses

3. Abstract

   The purpose of this document is to define a benchmarking
   methodology for measuring performance metrics of IP routers that
   support resource reservation signaling. Apart from defining and
   discussing these tests, this document also specifies formats for
   reporting the benchmarking results.

4. Introduction

   The IntServ over DiffServ framework [1] outlines a heterogeneous
   Quality of Service (QoS) architecture for multi-domain Internet
   services. Signaling-based resource reservation (e.g. via RSVP [2])
   is an integral part of that model. While this significantly
   lightens the load on most of the core routers, the performance of
   border routers that handle the QoS signaling is still crucial.
   Therefore, network operators who plan to deploy this model should
   scrutinize the scalability limitations of reservation-capable
   routers and the impact of signaling on the forwarding performance
   of those routers.

   An objective way to quantify the scalability constraints of QoS
   signaling is to perform measurements on routers that are capable of
   resource reservation.
   This document defines a specific set of tests that vendors or
   network operators can use to measure and report the signaling
   performance characteristics of router devices that support resource
   reservation protocols. The results of these tests provide
   comparable data on different products, supporting the decision
   process before purchase. Moreover, these measurements provide input
   characteristics for the dimensioning of a network in which
   resources are provisioned dynamically by signaling. Finally, these
   tests are applicable to characterizing the impact of control plane
   signaling on the forwarding performance of routers.

   This benchmarking methodology document is based on the knowledge
   gained by examination of (and experimentation with) several very
   different resource reservation protocols: RSVP [2], Boomerang [3],
   YESSIR [4], ST2+ [5], SDP [6], Ticket [7] and Load Control [8].
   Nevertheless, this document aspires to use terms that are valid in
   general and not restricted to these protocols.

5. Existing definitions

   A previous document, "Benchmarking Terminology for Routers
   Supporting Resource Reservation" [9], defines performance metrics
   and other terms that are used in this document. To understand the
   test methodologies defined here, that terminology document must be
   consulted first.

6. Methodology

6.1 Evaluating the Results

   RFC 2544 [10] describes considerations regarding the implementation
   and evaluation of benchmarking tests, and these are certainly valid
   for this test suite as well. In particular, the authors intended
   the described tests to be easy to implement with commercially
   available measurement instruments and devices. Simple test scripts
   and benchmarking utilities for Linux are publicly available from
   the Boomerang homepage [11].

   During the benchmarking tests, care should be taken in selecting
   the proper set of tests for a specific router device, since not all
   of the tests are applicable to a particular Device Under Test
   (DUT).

   Finally, the selection of the relevant measurement results and
   their evaluation require experience and must be done with an
   understanding of generally accepted testing practices regarding
   repeatability, variance and statistical significance of small
   numbers of trials.

6.2 Test Setup

   The ideal way to perform the measurements is to connect a passive
   tester device (or, in short, passive tester) to all network
   interfaces of the DUT, enabling the tester to capture all signaling
   and data traffic that enters or leaves the DUT. Based on the
   captured data packets and signaling messages, along with the proper
   time stamps, the investigated performance metrics can be computed.
   In addition to the passive tester, there are signaling and data
   traffic end-points that are responsible for generating and
   terminating the required signaling and data flows going through the
   DUT. These flows are used to generate router load in the DUT, and
   the measurements are also performed on them. This scenario is
   illustrated in Figure 1.

   The best solution is probably to connect the tester to the network
   interfaces of the DUT via network traffic repeater devices (e.g.
   hubs). These repeaters introduce only a very small delay to passing
   packets, and therefore their effect on the measurements is
   insignificant.

                                +------------+
                                |  Passive   |
                           +--->|   tester   |<---+
                           |    +------------+    |
                           |                      |
      +---------------+    |    +------------+    |    +---------------+
      | Signaling and |    |    |            |    |    | Signaling and |
      | data traffic  |----+--->|    DUT     |----+--->| data traffic  |
      |   end-point   |         |            |         |   end-point   |
      +---------------+         +------------+         +---------------+

                                   Figure 1
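   As an illustration, a performance metric such as per-message
   processing delay can be derived by pairing each signaling message
   captured on the DUT's input interfaces with the corresponding
   message captured on its output interfaces. The following minimal
   Python sketch computes such delays; the (timestamp, message id)
   record format is an assumption made for illustration only and is
   not part of this methodology.

      # Sketch: derive per-message processing delays from the ingress
      # and egress captures of a passive tester.
      def processing_delays(ingress, egress):
          """Match messages by id and return the per-message delays."""
          sent = {msg_id: ts for ts, msg_id in ingress}
          return [ts - sent[msg_id]
                  for ts, msg_id in egress if msg_id in sent]

      # Example captures: (capture time in seconds, message id).
      ingress = [(0.000, 1), (0.010, 2), (0.020, 3)]
      egress  = [(0.004, 1), (0.016, 2), (0.031, 3)]
      print(processing_delays(ingress, egress))
      # approximately [0.004, 0.006, 0.011]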
   Tester devices do not have to be passive during the measurement;
   they can also generate the signaling and data flows themselves. In
   this way, the signaling and data traffic end-point and the traffic
   capturing device can be combined into a single tester device,
   called an active tester. In this case, the first device, the
   signaling and traffic initiator tester, drives the input network
   interfaces of the DUT, while the second one, the signaling and
   traffic terminator tester, is connected to the output network
   interfaces of the tested device and captures the signaling messages
   and data packets leaving the DUT. Figure 2 shows this scenario.

      +---------------+      +-----------+      +---------------+
      |               |      |           |      |               |
      | Active tester |----->|    DUT    |----->| Active tester |
      |               |      |           |      |               |
      +---------------+      +-----------+      +---------------+

                                Figure 2

   In this scenario, the performance metrics are calculated from the
   log of initiated packets and their initiation times in the first
   active tester and the log of captured packets and their capture
   times in the second active tester. Obviously, the measurements are
   worthless if the two testers are not clock-synchronized, since the
   difference between the packet initiation times and the packet
   capture times is biased by the clock skew of the testers. For this
   reason, the clocks of the testers must be synchronized before the
   measurements are performed. Scalability tests, however, do not
   depend on clock synchronization and can therefore be performed
   without any preparation of the testers.

   It is also possible to use only one active tester, which acts as
   the initiator and the terminator of the signaling and traffic flows
   at the same time. Although this avoids the clock synchronization
   problem, the tester must be powerful enough to generate and capture
   all the test flows required by the measurements.

   Provided that the clocks are properly synchronized where necessary,
   each test configuration is suitable for the measurements. For this
   reason, we do not define different test methodologies for each test
   scenario. Instead, we use the terms "initiator tester" and
   "terminator tester", which have their equivalent appliances in each
   test configuration.

   The initiator tester is the device that generates the signaling and
   data flows, while the terminator tester is the device that
   terminates them. In addition, the performance metric measurements
   are also performed by the tester(s). Evidently, in the
   configuration where there is only one active tester, the initiator
   tester and the terminator tester are the same appliance.

6.2.1 Testing Unicast Resource Reservation Sessions

   Testing unicast resource reservation sessions requires that the
   initiator tester is connected to one of the network interfaces of
   the DUT and the terminator tester is connected to a different
   network interface of the tested device.
   During the benchmarking tests, the initiator tester must use
   unicast addresses for the data traffic flows, and the resource
   reservation requests must refer to unicast resource reservation
   sessions. In order to be able to compute the performance metrics,
   all data packets and signaling messages transmitted by the DUT must
   be perceivable by the tester.

6.2.2 Testing Multicast Resource Reservation Sessions

   Testing multicast resource reservation sessions requires the
   initiator tester to be connected to more than one network interface
   of the DUT, while the terminator tester is connected to more than
   one network interface of the tested device, different from the
   previous ones.

   Furthermore, during the measurements, the data traffic flows
   originating from the initiator tester must be sent to multicast
   addresses, and the reservation sessions must refer to one or more
   of the multicast flows. Of course, just as in the case of unicast
   resource reservation sessions, all data packets and signaling
   messages transmitted by the DUT must be perceivable by the tester.

   Since there are protocols supporting more than one resource
   reservation scheme for multicast reservations (e.g. RSVP SE/FF/WF),
   and in view of the fact that the number of incoming and outgoing
   network interface combinations of the DUT might be almost
   countless, the benchmarking tests described here do not require
   measuring every imaginable setup. Still, routers supporting
   multicast resource reservations must be tested against the
   performance metrics and scalability limits in at least one
   multicast scenario. The suggested multicast test configuration
   consists of a multicast group with four signaling end-points,
   including one traffic originator and three traffic destinations
   residing on different network interfaces of the DUT.

   Benchmarking test reports taken on DUTs supporting multicast
   resource reservation sessions always have to contain the proper
   multicast scenario description.

6.2.3 Signaling Flow

   This document often refers to signaling flows. A signaling flow is
   a sequence of signaling messages.

   For the measurements defined in this document there are two types
   of signaling flows. First, there is a signaling flow that is
   constructed from signaling primitives of the same type. Second,
   there is a signaling flow that is constructed from signaling
   primitive pairs. Signaling primitive pairs are needed in situations
   where one of the signaling primitives alters the states of the DUT,
   but the test demands constant DUT conditions during the test. In
   this case, to avoid the effect of the state modification, the
   second signaling primitive should restore the modified states in
   the DUT. A typical example of the second type of signaling flow is
   a flow of alternating reservation set-up and tear-down messages.

   Moreover, the signaling messages should be equally spaced in time
   when they form a signaling flow. This is mandatory in order to
   obtain measurements that can be repeated later. Since modern
   resource reservation protocols are designed to avoid message
   synchronization, equally spaced signaling messages are not
   unrealistic in real life.
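   Equal spacing can be achieved by scheduling each message against an
   absolute deadline, so that per-message jitter does not accumulate.
   A minimal Python sketch follows; the send_signaling_message()
   callback is a hypothetical helper standing in for whatever emits
   one signaling primitive (or one primitive pair) towards the DUT.

      import time

      def send_flow(period_s, count, send_signaling_message):
          """Send 'count' messages spaced 'period_s' seconds apart."""
          deadline = time.monotonic()
          for seq in range(count):
              send_signaling_message(seq)
              # Absolute deadlines keep the flow equally spaced even
              # if an individual send is slightly delayed.
              deadline += period_s
              time.sleep(max(0.0, deadline - time.monotonic()))

      # Example: a flow of 50 messages at 10 messages per second.
      send_flow(0.1, 50, lambda seq: print("message", seq))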
   A signaling flow is characterized by the type of the signaling
   primitive or signaling primitive pair, along with the period of the
   signaling messages.

6.2.4 Signaling Message Verification

   Although conformance testing of resource reservation protocols is
   beyond the scope of this document, defective signaling message
   processing can be expected in an overloaded router. Therefore,
   during the benchmarking tests, when signaling messages are
   processed in the DUT, the terminator device must validate whether
   the messages fully conform to the message format of the resource
   reservation protocol specification and whether they are the
   expected signaling messages in the given situation. If any of the
   messages violate the protocol specification, then the benchmarking
   test report must describe the failure.

   Verifying data traffic packets is not required, since the signaling
   performance benchmarking of reservation-capable routers should not
   deal with data traffic. For that purpose there are other
   benchmarking methodologies that verify data traffic during the
   measurements, such as the one described in RFC 2544.

6.3 Scalability Tests

   Scalability tests are defined to explore the scalability limits of
   a reservation-capable router. This investigation focuses on the
   scalability limits related only to signaling message handling;
   examination of the data forwarding engine is therefore out of the
   scope of this document.

6.3.1 Maximum Signaling Message Burst Size

   Objective:
   Determine the maximum signaling burst size, which is the number of
   signaling messages in a signaling burst that the DUT is able to
   handle without signaling loss.

   Procedure:
   1. Select a signaling primitive or a signaling primitive pair and
   construct a signaling flow. The signaling messages should follow
   each other back-to-back in the flow, and after "n" messages the
   flow should be terminated. In the first test sequence the number
   "n" should be set to one.

   Additionally, all the signaling messages in the signaling flow must
   conform to the resource reservation protocol definition and must be
   parameterized in a way that avoids signaling message processing
   errors in the DUT.

   2. Send the signaling flow to the DUT and count the signaling
   messages received by the terminator tester.

   3. When the number of sent signaling messages ("n") equals the
   number of received messages, the number of messages forming the
   signaling flow ("n") should be increased by one and the test
   sequence repeated. However, if the receiver receives fewer
   signaling messages than were sent, the DUT is beyond its
   scalability limit. The measured scalability limit for the maximum
   signaling message burst size is the length of the signaling flow in
   the previous test sequence ("n"-1).

   In order to avoid transient test failures, the whole test must be
   repeated at least 30 times, and the report should indicate the
   median of the measured maximum signaling message burst size values
   as the result of the test. Between the test runs, the DUT should be
   reset to its initial state.
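   The search in steps 1-3 can be sketched as follows in Python, with
   the 30-fold repetition and the median taken over the individual
   results. The helpers send_burst(), count_received() and reset_dut()
   are assumptions standing in for the tester's actual burst
   generation, message counting and DUT reset facilities.

      import statistics

      def max_burst_size(send_burst, count_received):
          n = 1
          while True:
              send_burst(n)                  # n back-to-back messages
              if count_received() < n:       # loss: limit exceeded
                  return n - 1               # previous sequence's length
              n += 1

      def burst_size_result(send_burst, count_received, reset_dut):
          results = []
          for _ in range(30):                # at least 30 repetitions
              results.append(max_burst_size(send_burst, count_received))
              reset_dut()                    # restore the initial state
          return statistics.median(results)  # the reported value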
   There are signaling primitives, such as signaling messages
   indicating errors, that are not suitable for this kind of
   scalability test. However, each signaling primitive that is
   suitable for the test should be investigated.

   Reporting format:
   The report should indicate the type of the signaling primitive or
   signaling primitive pair and the determined maximum signaling
   message burst size.

   Note:
   In the case of routers supporting multicast resource reservation
   sessions, the signaling burst can also be constructed by sending
   signaling messages to multiple network interfaces of the DUT at the
   same time.

6.3.2 Maximum Signaling Load

   Objective:
   Determine the maximum signaling load, which is the maximum number
   of signaling messages within a time unit that the DUT is able to
   handle without signaling loss.

   Procedure:
   1. Select a signaling primitive or a signaling primitive pair and
   construct a signaling flow. The period of the signaling flow should
   be adjusted so that exactly "s" signaling messages arrive within
   one second. In the first test sequence the number "s" should be set
   to one (i.e. 1 message per second).

   Additionally, all the signaling messages in the signaling flow must
   conform to the resource reservation protocol definition and must be
   parameterized in a way that avoids signaling message processing
   errors in the DUT.

   2. Send the signaling flow to the DUT for at least one minute, and
   count the signaling messages received by the terminator tester.

   3. When the number of sent signaling messages ("s" times the
   duration of the signaling flow) equals the number of received
   messages, the signaling flow period should be decreased so that one
   more signaling message fits into a one-second interval of the
   signaling flow ("s" should be increased by one). However, if the
   receiver receives fewer signaling messages than were sent, the DUT
   is beyond its scalability limit. The measured scalability limit for
   the maximum signaling load is the number of signaling messages
   fitting into one second of the signaling flow in the previous test
   sequence ("s"-1).

   In order to avoid transient test failures, the whole test must be
   repeated at least 30 times, and the report should indicate the
   median of the measured maximum signaling load values as the result
   of the test. Between the test runs, the DUT should be reset to its
   initial state.

   In this test, too, there are signaling primitives that are not
   suitable for this kind of scalability test. However, each signaling
   primitive that is suitable for the test should be investigated,
   just as in the case of the maximum signaling burst size test.

   Reporting format:
   The report should indicate the type of the signaling primitive or
   signaling primitive pair and the determined maximum signaling load
   value.
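   The ramp in steps 1-3 differs from the burst-size search only in
   that it adjusts a rate rather than a burst length. A minimal Python
   sketch, assuming a hypothetical send_paced_flow(rate, duration_s)
   that emits an equally spaced flow towards the DUT and a
   count_received() that returns the terminator's message count:

      def max_signaling_load(send_paced_flow, count_received,
                             duration_s=60):
          s = 1                               # messages per second
          while True:
              send_paced_flow(s, duration_s)  # at least one minute
              if count_received() < s * duration_s:
                  return s - 1                # previous sequence's rate
              s += 1                          # one more message/second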
6.3.3 Maximum Session Load

   Objective:
   Determine the maximum session load, which is the maximum number of
   resource reservation sessions that can be maintained simultaneously
   in a reservation-capable router. The maximum number of sessions
   depends on two architectural components of the DUT. First, the DUT
   must have enough memory space to store the attributes of the
   different resource reservation sessions. Second, the DUT has to be
   powerful enough to maintain all the reservation sessions if they
   require actions during their lifetime.

   In the case of hard-state protocols we cannot speak of reservation
   session maintenance; in this situation the available memory space
   is the only limit on the session number. Moreover, there are also
   resource reservation protocols that handle only aggregates of
   reservation sessions (e.g. Load Control [8]) and do not distinguish
   the separate traffic flows referring to reserved resources. Of
   course, in this situation there is no session maintenance either,
   since there are no reservation sessions, and the memory allocation
   for the aggregates is limited. In this latter case, the maximum
   session load is defined to be unlimited and the test can be
   skipped.

   Corresponding to the dual limits of the measurement, the
   benchmarking procedure is separated into two tests. The first test
   investigates the session number limit due to the memory space,
   while the second test explores the reservation session maintenance
   capability of the DUT.

   The first test applies to every resource reservation protocol that
   stores reservation sessions separately and not only as an aggregate
   of them. Resource reservation protocols that are capable of session
   aggregation, but still have the capability to handle separate
   sessions (e.g. Boomerang [3]), are also subject to this test.

   Procedure:
   1. Set up a reservation session in the reservation-capable router
   by sending the appropriate signaling messages to the DUT.

   2. Establish one more reservation session in the DUT using the
   appropriate signaling messages. In the case of soft-state
   protocols, all the reservation sessions existing in the DUT must be
   maintained using refresh messages.

   3. Repeat step 2 until the router signals that there is not enough
   memory space to establish the new reservation session. At this
   point the test is finished, and the maximum memory capacity
   available to store the sessions has been reached.

   Note:
   Not all resource reservation protocols can signal the overrun of
   the maximum memory capacity limit directly. However, certain
   behavior of the router may also indicate the memory overrun.

   The second test applies only to those reservation-capable routers
   that run reservation session maintenance mechanisms to refresh
   internal states belonging to reservation sessions. Here, we
   investigate whether the DUT is able to cope with the refresh
   signaling message handling, which also shows its capability to
   refresh the internally stored reservation sessions.

   Procedure:
   1. Set up "n" reservation sessions in the reservation-capable
   router by sending the appropriate signaling messages to the DUT. In
   the first test sequence the number "n" should be set to one.
   Besides generating the reservation sessions, the initiator tester
   must also take care of the reservation session refreshes.

   2. Capture the refresh signaling messages leaving the DUT for a
   specified amount of time ("T") while still maintaining the
   established reservations with refresh signaling messages. Time "T"
   must be at least as long as the reservation time-out specified by
   the protocol.

   3. Check whether each reservation session was refreshed during the
   refresh period examined in step 2. The proof of a session refresh
   is a departing refresh signaling message referring to the
   corresponding reservation session. If all sessions that were set up
   in step 1 are refreshed during step 2, then repeat the test
   sequence, increasing the number of reservations by one ("n"+1).
   However, if any of the reservations was dropped by the DUT, then
   the test sequence should be stopped, and the determined maximum
   session load is the number of resource reservation sessions
   maintained successfully in the previous test sequence ("n"-1).
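   A minimal Python sketch of steps 1-3 of this second test, assuming
   hypothetical helpers: establish_session() sets up one new
   reservation (while the tester keeps refreshing all established
   ones) and returns its session id, and capture_refreshes(T) returns
   the set of session ids for which a refresh message left the DUT
   within "T" seconds.

      def max_session_load(establish_session, capture_refreshes, T):
          sessions = set()
          while True:
              sessions.add(establish_session())  # one more reservation
              refreshed = capture_refreshes(T)   # observe for time T
              if not sessions <= refreshed:      # a session was dropped
                  return len(sessions) - 1       # previous sequence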
   In order to avoid transient test failures, the whole test must be
   repeated at least 30 times, and the report should indicate the
   median of the measured maximum session load values as the result of
   the test. Between the test runs, the DUT should be reset to its
   initial state.

   Reporting format:
   The report should indicate the determined maximum session load
   value, which is the lower of the two test results.

   Note:
   When the number of reserved sessions grows beyond a number that
   counts as very high under the given technological conditions, the
   test can be canceled and the report can state that the resource
   reservation protocol implementation supports a number of
   reservation sessions over that limit (e.g. "over 100,000
   sessions").

   Also note that testing the DUT in multicast and unicast scenarios
   may result in different maximum session load values.

6.4 Benchmarking Tests

   Benchmarking tests are defined to measure the QoS signaling related
   performance metrics of a reservation-capable router device.

   Since the objective of the benchmarking is to characterize routers
   performing resource reservation in real-life situations, during the
   tests the DUT must not run into the scalability limits determined
   by the previous tests.

   Each performance metric is measured with the DUT under different
   router load conditions. The router load is generated and
   characterized using combinations of independent load types:

   a. Signaling load
   b. Session load
   c. Premium traffic load
   d. Best-effort traffic load

   The initiator tester device generates the signaling load on the DUT
   by sending a signaling flow to the terminator tester. This
   signaling flow is constructed from a specific signaling primitive
   or signaling primitive pair and has the appropriate period
   parameter.

   The session load is generated by the signaling end-points setting
   up resource reservation sessions in the DUT via signaling. In the
   case of soft-state protocols, the initiator tester device must also
   maintain the reservation sessions with periodic refresh signaling
   messages.

   The initiator tester device generates the premium traffic load by
   sending a data traffic flow to the terminator tester across the
   DUT. This traffic flow should have dedicated resources in the DUT,
   set up previously using signaling messages. The traffic must
   consist of equally spaced and equally sized data packets. Although
   any transfer protocol is suitable for traffic generation, it is
   highly recommended to use UDP packets, since such a data flow is
   totally controllable, unlike TCP, which uses a congestion avoidance
   mechanism. The premium traffic must be characterized by its traffic
   parameters: the data packet size in octets, the calculated
   bandwidth of the stream in kbps, and the transfer protocol type.
   The data packet size should include both the payload and the header
   of the IP packet.
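   A premium traffic flow of this kind can be generated, for example,
   with equally spaced UDP datagrams. In the following Python sketch
   the destination address, the port and the 1024-octet IP packet size
   are illustrative assumptions; note that since the packet size
   counts the IP header, the UDP payload is 28 octets smaller (20
   octets of IPv4 header plus 8 octets of UDP header).

      import socket
      import time

      def send_traffic_flow(dst=("192.0.2.2", 5001), pkt_size=1024,
                            rate_pps=1000, duration_s=10):
          sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          payload = b"\x00" * (pkt_size - 28)  # IP+UDP headers: 28
          period = 1.0 / rate_pps
          deadline = time.monotonic()
          for _ in range(rate_pps * duration_s):
              sock.sendto(payload, dst)        # equally sized packets
              deadline += period               # equally spaced packets
              time.sleep(max(0.0, deadline - time.monotonic()))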
   The initiator tester device generates the best-effort traffic load
   by sending a data traffic flow that refers to no resource
   reservation session to the terminator tester across the DUT. All
   other attributes of this traffic flow must meet the conditions
   described previously for the premium traffic load.

   Note that these four load types by their nature influence each
   other, which may spoil the measurements. Therefore, in order to
   obtain accurate results, these cross-effects must be minimized
   during the benchmarking tests. The signaling load can interfere
   with the session load when certain signaling messages alter the
   number of reservation sessions in the DUT. To cancel this
   influence, the signaling flow should contain signaling message
   pairs in which the two messages have opposite effects, restoring
   the changes caused in the DUT. On the other hand, in the case of
   soft-state protocols, sessions must be refreshed by periodically
   sent signaling messages. Although refresh messages are used to
   maintain the reservation sessions, they still count as signaling
   messages. Furthermore, signaling messages are realized as data
   packets; as such, signaling messages must be taken into account in
   the traffic flow calculation as well.

6.4.1 Performing the Benchmarking Measurements

   Objective:
   The goal is to take measurements on the DUT running a resource
   reservation protocol implementation under different load
   conditions. The load on the DUT is always a combination of the four
   load components described before.

   Procedure:
   The procedure is to load the router with each load component at a
   desired level and measure the investigated performance metrics. The
   load condition on the DUT should not change during the test. Once
   the measurement is complete, repeat the test with different load
   distributions.

   During the test sequences, in order to prevent transient flow
   behavior from influencing the measurements, the measurements should
   begin only after a delay of at least time "T" following the setup
   of the common load on the DUT. The value of "T" depends on the
   parameters of the load components and on the resource reservation
   protocol implementation but, as a rule of thumb, it should be long
   enough for at least 10 packets from the traffic flows and 10
   signaling messages from the signaling flow to pass through the DUT,
   and, in the case of soft-state protocols, for at least one refresh
   period to expire.

   When measuring the performance metrics in a practical load setup,
   not just one but 100 measurement samples should be collected.
   Normally, the empirical distribution function of the tests is
   similar to the curve of a Gaussian distribution, and therefore the
   mode and the median are in the same location. In such a case, the
   result of the test sequence is the median of the samples. In the
   case of differently shaped empirical distribution functions, the
   curve must be analyzed further and the result should describe the
   curve well enough.

   In order to avoid transient test run failures that may invalidate
   the results of the entire test, the whole test must be repeated at
   least 10 times, and the report should indicate the median of the
   measured values, filtering out the extreme results. Moreover, after
   each test run the DUT should be reset to its initial state.
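   A minimal Python sketch of this sample evaluation, where
   measure_metric() is an assumed helper returning one sample of the
   investigated performance metric: it reports the median of 100
   samples and flags sample sets whose mean and median diverge, which
   hints at a non-Gaussian shape requiring further analysis (the 10%
   threshold is an illustrative assumption, not a requirement).

      import statistics

      def evaluate_samples(measure_metric, count=100):
          data = [measure_metric() for _ in range(count)]
          med = statistics.median(data)
          mean = statistics.fmean(data)
          spread = (max(data) - min(data)) or 1.0
          # For a roughly Gaussian sample the mean and the median
          # coincide; a large gap suggests a skewed distribution.
          needs_analysis = abs(mean - med) > 0.1 * spread
          return med, needs_analysis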
   In order to perform a complete benchmarking test, every performance
   metric must be measured using signaling flows made of every
   applicable signaling primitive or primitive pair.

   Since the test methodology is the same for all the different
   performance metric benchmarking procedures, it is also recommended
   to perform the measurements for all performance metrics at the same
   time, in one test cycle.

   At first sight, this procedure may look easy to carry out, but in
   fact there are many difficulties to overcome. The following
   guidelines may help in reducing the complexity of creating a
   conforming measurement setup.

   1. It is reasonable to define different amounts for each load
   component (load levels) before benchmarking and then measure the
   performance metrics with all possible combinations of these
   individual load levels (a sketch of this enumeration follows at the
   end of this section).

   2. The number of different load combinations depends on the number
   of load levels defined for each load component. Working with too
   many load levels is very time-consuming and therefore not
   suggested. Instead, there are proposed levels and parameters for
   each load component.

   The data traffic parameters for the traffic load components have to
   be selected from commonly used traffic parameters. It is
   recommended to choose a packet size of 54, 64, 128, 256, 1024,
   1518, 2048 or 4472 bytes (these are the same values that are used
   in RFC 2544, which introduces a methodology for benchmarking
   network interconnect devices). Additionally, the size of the
   packets should always remain below the MTU of the network segment.
   The packet rate is recommended to be one of 0, 10, 500, 1000 or
   5000 packets/s. Since the number of combinations of these traffic
   parameters is still large, the highly recommended values are 64,
   128 and 1024 bytes for the packet size and 10 and 1000 packets/s
   for the packet rate. These values adequately represent a wide range
   of traffic types common in today's Internet.

   The number of session load levels should be at least 4, and it is
   recommended to distribute them equally between 0 and the maximum
   session load value.

   The number of signaling load levels should be at least 4 as well,
   and the actual values of the signaling load are also recommended to
   be distributed equally between 0 and the maximum signaling load
   value.

   A zero load level means that the given load component is not
   involved in the router load.

   Reporting format:
   Since a full report would require a four-dimensional table (four
   load components plus the results), which is hard for a human being
   to visualize, the results are extracted into ordinary two-
   dimensional tables. Each table has two fixed load component
   quantities, and the levels of the other two load components are the
   rows and columns of the table. In this way, one set of such tables
   describes the benchmarking results for one certain type of
   signaling flow used in the generation of the signaling load.
   Naturally, each different signaling flow requires separate tables.

   Note:
   Of course, in the case of multicast resource reservation sessions,
   the number of different multicast scenario combinations multiplies
   the number of benchmarking tests as well.
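   Guideline 1 above amounts to iterating over the Cartesian product
   of the per-component load levels. A minimal Python sketch, where
   the level values follow the recommendations above and
   run_measurement() is an assumed helper that applies one load
   combination to the DUT and returns the measured performance
   metrics:

      import itertools

      signaling_levels   = [0, 25, 50, 75, 100]  # % of max sig. load
      session_levels     = [0, 25, 50, 75, 100]  # % of max session load
      premium_levels     = [0, 10, 1000]         # packets/s
      best_effort_levels = [0, 10, 1000]         # packets/s

      def run_matrix(run_measurement):
          results = {}
          for combo in itertools.product(signaling_levels,
                                         session_levels,
                                         premium_levels,
                                         best_effort_levels):
              results[combo] = run_measurement(*combo)
          return results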
7. Acknowledgement

   The authors would like to thank the following individuals for their
   help in forming this document: Norbert Vegh and Anders Bergsten
   from Telia Research AB, Sweden, and Krisztian Nemeth, Peter Vary,
   Balazs Szabo and Gabor Kovacs from the High Speed Networks
   Laboratory of the Budapest University of Technology and Economics.

8. References

   [1]  Y. Bernet, et al., "A Framework For Integrated Services
        Operation Over Diffserv Networks", Internet Draft, work in
        progress, May 2000.

   [2]  R. Braden, Ed., et al., "Resource ReSerVation Protocol (RSVP)
        - Version 1 Functional Specification", RFC 2205, September
        1997.

   [3]  J. Bergkvist, I. Cselenyi, D. Ahlard, "Boomerang - A Simple
        Resource Reservation Framework for IP", Internet Draft, work
        in progress, November 2000.

   [4]  P. Pan, H. Schulzrinne, "YESSIR: A Simple Reservation
        Mechanism for the Internet", Computer Communication Review,
        on-line version, volume 29, number 2, April 1999.

   [5]  L. Delgrossi, L. Berger, "Internet Stream Protocol Version 2
        (ST2) Protocol Specification - Version ST2+", RFC 1819,
        August 1995.

   [6]  P. White, J. Crowcroft, "A Case for Dynamic Sender-Initiated
        Reservation in the Internet", Journal on High Speed Networks,
        Special Issue on QoS Routing and Signaling, Vol 7 No 2, 1998.

   [7]  A. Eriksson, C. Gehrmann, "Robust and Secure Light-weight
        Resource Reservation for Unicast IP Traffic", International
        Workshop on Quality of Service, IWQoS'98, May 18-20, 1998.

   [8]  L. Westberg, Z. R. Turanyi, D. Partain, "Load Control of
        Real-Time Traffic, A Two-bit Resource Allocation Scheme",
        Internet Draft, work in progress, April 2000.

   [9]  G. Feher, I. Cselenyi, A. Korn, "Benchmarking Terminology for
        Routers Supporting Resource Reservation", Internet Draft,
        work in progress, July 2001.

   [10] S. Bradner, J. McQuaid, "Benchmarking Methodology for Network
        Interconnect Devices", RFC 2544, March 1999.

   [11] Boomerang Team, "Boomerang homepage - Benchmarking Tools",
        http://boomerang.ttt.bme.hu

9. Authors' Addresses

   Gabor Feher
   Budapest University of Technology and Economics (BUTE)
   Department of Telecommunications and Telematics
   Pazmany Peter Setany 1/D, H-1117, Budapest, Hungary
   Phone: +36 1 463-3110
   Email: feher@ttt-atm.ttt.bme.hu

   Istvan Cselenyi
   Telia Research AB
   Vitsandsgatan 9B
   SE 12386, Farsta, Sweden
   Phone: +46 8 713-8173
   Email: istvan.i.cselenyi@telia.se

   Andras Korn
   Budapest University of Technology and Economics (BUTE)
   Institute of Mathematics, Department of Analysis
   Egry Jozsef u. 2, H-1111 Budapest, Hungary
   Phone: +36 1 463-2475
   Email: korn@math.bme.hu