Benchmarking Methodology WG                                  Rajiv Asati
Internet Draft                                                     Cisco
Updates: 2544 (if approved)                             Carlos Pignataro
Intended status: Informational                                     Cisco
Expires: August 2010                                   Fernando Calabria
                                                                   Cisco
                                                            Cesar Olvera
                                                             Consulintel

                                                       February 19, 2010

                     Device Reset Characterization
                       draft-asati-bmwg-reset-03

Abstract

   An operational forwarding device may need to be re-started
   (automatically or manually) for a variety of reasons, an event that
   we call a "reset" in this document. Since there may be an
   interruption in the forwarding operation during a reset, it is
   useful to know how long a device takes to begin forwarding packets
   again.

   This document specifies a methodology for characterizing reset
   during benchmarking of forwarding devices, and provides clarity and
   consistency in reset test procedures beyond what's specified in
   RFC2544. It therefore updates RFC2544.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 19, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Scope
   2. Key Words to Reflect Requirements
   3. Reset Test
      3.1. Hardware Reset
         3.1.1. Routing Processor (RP) / Routing Engine reset
            3.1.1.1. RP Failure for a single-RP device (mandatory)
            3.1.1.2. RP Failure for a multiple-RP device (optional)
         3.1.2. Line Card (LC) Removal and Insertion (mandatory)
      3.2. Software Reset
         3.2.1. Operating System (OS) reset (mandatory)
         3.2.2. Process reset (optional)
      3.3. Power interruption
         3.3.1. Power Interruption (mandatory)
   4. Security Considerations
   5. IANA Considerations
   6. Acknowledgments
   7. References
      7.1. Normative References
      7.2. Informative References
   Authors' Addresses

1. Introduction

   An operational forwarding device (or one of its components) may need
   to be re-started for a variety of reasons, an event that we call a
   "reset" in this draft. Since there may be an interruption in the
   forwarding operation during a reset, it is useful to know how long a
   device takes to begin forwarding packets again.

   However, the answer to this question is no longer simple, because
   modern forwarding devices employ many hardware advancements
   (distributed forwarding, etc.) and software advancements (graceful
   restart, etc.) that influence the recovery time after the reset.

   Additionally, there are other factors that influence the recovery
   time after the reset:

   1. Type of reset - Hardware (line-card crash, etc.) vs. Software
      (protocol reset, process crash, etc.) or even complete power
      failures

   2. Manual vs. Automatic reset

   3. Local vs. Remote reset

   4. Scale - Number of line cards present vs. in-use

   5. Scale - Number of physical and logical interfaces

   6. Scale - Number of routing protocol instances

   This document specifies a methodology for characterizing reset
   during benchmarking of forwarding devices, and provides clarity and
   consistency in reset procedures beyond what's specified in
   [RFC2544]. These procedures may be used by other benchmarking
   documents such as [RFC2544], [RFC5180], [RFC5695], etc.

   This document updates Section 26.6 of [RFC2544].

1.1. Scope

   This document focuses only on the reset criterion of benchmarking,
   and presumes that it would be beneficial to [RFC2544], [RFC5180],
   [RFC5695], and other BMWG benchmarking efforts.

2. Key Words to Reflect Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119]. RFC 2119 defines the use of these key words to help make
   the intent of standards track documents as clear as possible. While
   this document uses these keywords, this document is not a standards
   track document.

3. Reset Test

   This section describes the tests that characterize the speed at
   which a DUT (Device Under Test) or SUT (System Under Test) recovers
   from a reset. There are three types of reset considered in this
   document:

   1. Hardware resets

   2. Software resets

   3. Power interruption

   Section 3.1 describes various hardware resets, whereas Section 3.2
   describes various software resets. Additionally, Section 3.3
   describes power interruption tests. These sections define and
   characterize these resets.

   Additionally, since device-specific implementations may vary for
   hardware and software type resets, it is desirable to classify each
   test case as "mandatory" or "optional".

3.1. Hardware Reset

   This test is designed to characterize the time it takes a DUT to
   recover from a hardware reset.

   A "hardware reset" generally involves the re-initialization of one
   or more physical components in the DUT, but not the entire DUT.

   A hardware reset is executed by the operator, for example by
   physically removing a component or by pressing a "reset" button for
   the component; it could even be triggered from the command line
   interface.
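
   As an illustration only, the test cases listed in this document and
   their "mandatory" or "optional" classification can be written down
   as data, which a test plan could then filter. The following Python
   sketch is not part of the methodology; the names ResetType,
   ResetTest, RESET_TESTS, and mandatory_tests are hypothetical and
   chosen for this example. The section numbers and classifications
   are taken from the section titles of this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class ResetType(Enum):
    """The three reset types considered in Section 3."""
    HARDWARE = "hardware"
    SOFTWARE = "software"
    POWER = "power interruption"


@dataclass(frozen=True)
class ResetTest:
    """One test case, identified by its section number (hypothetical type)."""
    section: str
    name: str
    reset_type: ResetType
    mandatory: bool


# Classification copied from the section titles of this document.
RESET_TESTS = (
    ResetTest("3.1.1.1", "RP failure, single-RP device", ResetType.HARDWARE, True),
    ResetTest("3.1.1.2", "RP failure, multiple-RP device", ResetType.HARDWARE, False),
    ResetTest("3.1.2", "Line Card removal and insertion", ResetType.HARDWARE, True),
    ResetTest("3.2.1", "Operating System reset", ResetType.SOFTWARE, True),
    ResetTest("3.2.2", "Process reset", ResetType.SOFTWARE, False),
    ResetTest("3.3.1", "Power interruption", ResetType.POWER, True),
)


def mandatory_tests(tests: Sequence[ResetTest] = RESET_TESTS) -> List[ResetTest]:
    """Return only the test cases classified as mandatory."""
    return [t for t in tests if t.mandatory]


if __name__ == "__main__":
    for test in mandatory_tests():
        print(test.section, test.name)
```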

   For routers that do not contain separate Routing Processor and Line
   Card modules, the hardware reset tests are not performed since they
   are not relevant; instead, the power interruption tests (see
   Section 3.3) are mandatory in these cases.

3.1.1. Routing Processor (RP) / Routing Engine reset

   The Routing Processor (RP) is the DUT module that is primarily
   concerned with Control Plane functions.

3.1.1.1. RP Failure for a single-RP device (mandatory)

   Objective

      To characterize the speed at which a DUT recovers from a Routing
      Processor hardware reset in a single-RP environment.

   Procedure

      First, ensure that the RP is in a permanent state to which it will
      return after the reset, by performing some or all of the
      following operational tasks: save the current DUT configuration,
      specify boot parameters, ensure the appropriate software files are
      available, or perform additional Operating System or
      hardware-related tasks.

      Second, ensure that the DUT is able to forward the traffic for at
      least 15 seconds before any test activities are performed. The
      traffic should use the minimum frame size possible on the media
      used in the testing, and the rate should be sufficient for the DUT
      to attain the maximum forwarding throughput. This enables a finer
      granularity in the recovery time measurement.

      Third, perform the Routing Processor (RP) hardware reset at this
      point. This entails, for example, physically removing the RP and
      later re-inserting it, or triggering a hardware reset by other
      means (e.g., command line interface, physical switch, etc.).

      Finally, the characterization is completed by measuring the frame
      loss and recovery time from the moment the RP is re-initialized or
      reinserted.

   Reporting format

      The reset results are reported in a simple statement including the
      frame loss and recovery times.
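
      The reason the minimum frame size yields a finer granularity can
      be seen with a little arithmetic: the smaller the frame, the
      higher the frame rate, and the shorter the interval that each lost
      frame represents. The Python sketch below is illustrative only.
      It assumes Ethernet media with 20 octets of per-frame overhead on
      the wire (preamble, start-of-frame delimiter, and inter-frame gap)
      and a constant offered rate. Estimating the recovery time from the
      lost-frame count is an assumption of this example, not a procedure
      mandated by this document, and all function names are
      hypothetical.

```python
# Assumption: Ethernet adds 20 octets per frame on the wire
# (7-octet preamble, 1-octet start-of-frame delimiter, 12-octet gap).
ETHERNET_OVERHEAD_OCTETS = 20


def max_frame_rate(link_bps: float, frame_octets: int) -> float:
    """Theoretical maximum frame rate (frames per second) on Ethernet."""
    bits_per_frame = (frame_octets + ETHERNET_OVERHEAD_OCTETS) * 8
    return link_bps / bits_per_frame


def time_per_frame(link_bps: float, frame_octets: int) -> float:
    """Seconds represented by one frame at the maximum frame rate."""
    return 1.0 / max_frame_rate(link_bps, frame_octets)


def recovery_time_from_loss(frames_lost: int, offered_fps: float) -> float:
    """Estimate the outage in seconds from the number of lost frames.

    Assumes a constant offered rate and that every lost frame is
    caused by the reset under test.
    """
    if offered_fps <= 0:
        raise ValueError("offered_fps must be positive")
    return frames_lost / offered_fps


if __name__ == "__main__":
    # Compare the smallest and largest untagged Ethernet frame sizes at 1 Gbps.
    for size in (64, 1518):
        rate = max_frame_rate(1e9, size)
        micros = time_per_frame(1e9, size) * 1e6
        print(f"{size} octets: {rate:.0f} frames/s, {micros:.2f} us per frame")
```

      Under these assumptions, each lost 64-octet frame at 1 Gbps
      accounts for roughly 0.67 microseconds of outage, whereas each
      lost 1518-octet frame accounts for roughly 12.3 microseconds.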

      For each test case, it is RECOMMENDED that the following
      parameters be reported in these units:

         Parameter           Units or Examples

         Throughput          Frames per second and bits per second
         Loss                Frames
         Time                Seconds, with sufficient resolution
                             to convey meaningful info
         Protocol            IPv4, IPv6, MPLS, etc.
         Frame Size          Octets
         Port Media          Ethernet, GigE (Gigabit Ethernet),
                             POS (Packet over SONET), etc.
         Port Speed          10 Gbps, 1 Gbps, 100 Mbps, etc.
         Interface Encap.    Ethernet, Ethernet VLAN, PPP, HDLC, etc.

      The reporting of results MUST regard repeatability considerations
      from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple
      trials and report average results.

3.1.1.2. RP Failure for a multiple-RP device (optional)

   Objective

      To characterize the speed at which a "secondary" Routing Processor
      (sometimes referred to as "backup" RP) of a DUT becomes active
      after a "primary" (or "active") Routing Processor hardware reset.
      This process is often referred to as "RP Switchover". The
      characterization in this test should be done for the default DUT
      behavior as well as for a non-default DUT configuration that
      minimizes frame loss.

   Procedure

      This test characterizes "RP Switchover". Many implementations
      allow for optimized switchover capabilities that minimize the
      downtime during the actual switchover. This test consists of two
      sub-cases from a switchover characteristics standpoint: first, a
      default behavior (with no switchover-specific configurations); and
      second, a non-default behavior with switchover configuration to
      minimize frame loss. Therefore, the procedures described here are
      executed twice and reported separately.
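
      The "executed twice, reported separately" structure, combined with
      the recommendation to average multiple trials, can be sketched as
      a small harness. This Python sketch is illustrative only: the
      run_trial callback, the configuration labels, and the default of
      three trials are assumptions of this example, and how a
      configuration is applied or a switchover is triggered is left to
      the tester.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, NamedTuple


class TrialResult(NamedTuple):
    """Frame loss and recovery time for one trial (or an average)."""
    frames_lost: float
    recovery_seconds: float


def characterize_switchover(
    run_trial: Callable[[str], TrialResult],
    configurations: Iterable[str] = ("default", "non-default"),
    trials: int = 3,
) -> Dict[str, TrialResult]:
    """Run the procedure per configuration and average the trials.

    run_trial is a hypothetical callback that applies the named
    configuration, performs one RP switchover, and returns the
    measured frame loss and recovery time.
    """
    if trials < 1:
        raise ValueError("at least one trial is required")
    report: Dict[str, TrialResult] = {}
    for config in configurations:
        results = [run_trial(config) for _ in range(trials)]
        # Each configuration gets its own averaged entry in the report.
        report[config] = TrialResult(
            frames_lost=mean(r.frames_lost for r in results),
            recovery_seconds=mean(r.recovery_seconds for r in results),
        )
    return report
```

      Keeping one report entry per configuration mirrors the requirement
      that the default and non-default results are never merged.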

      First, ensure that the RPs are in a permanent state such that the
      secondary will be activated in the same state as the active RP, by
      performing some or all of the following operational tasks: save
      the current DUT configuration, specify boot parameters, ensure the
      appropriate software files are available, or perform additional
      Operating System or hardware-related tasks.

      Second, ensure that the DUT is able to forward the traffic for at
      least 15 seconds before any test activities are performed. The
      traffic should use the minimum frame size possible on the media
      used in the testing, and the rate should be sufficient for the DUT
      to attain the maximum forwarding throughput. This enables a finer
      granularity in the recovery time measurement.

      Third, perform the primary Routing Processor (RP) hardware reset
      at this point. This entails, for example, physically removing the
      RP, or triggering a hardware reset by other means (e.g., command
      line interface, physical switch, etc.). It is up to the operator
      to decide whether the active RP needs to be re-inserted after a
      grace period.

      Finally, the characterization is completed by measuring the
      complete frame loss and recovery time from the moment the active
      RP is hardware-reset.

   Reporting format

      The reset results are reported twice, once for the default
      switchover behavior and once for the non-default one. For
      each, the report consists of a simple statement including the
      frame loss and recovery times, as well as any specific redundancy
      scheme in place.

      For each test case, it is RECOMMENDED that the following
      parameters be reported in these units:

         Parameter           Units or Examples

         Throughput          Frames per second and bits per second
         Loss                Frames
         Time                Seconds, with sufficient resolution
                             to convey meaningful info
         Protocol            IPv4, IPv6, MPLS, etc.
         Frame Size          Octets
         Port Media          Ethernet, GigE (Gigabit Ethernet),
                             POS (Packet over SONET), etc.
         Port Speed          10 Gbps, 1 Gbps, 100 Mbps, etc.
         Interface Encap.    Ethernet, Ethernet VLAN, PPP, HDLC, etc.

      The reporting of results MUST regard repeatability considerations
      from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple
      trials and report average results.

3.1.2. Line Card (LC) Removal and Insertion (mandatory)

   The Line Card (LC) is the DUT component that is responsible for
   packet forwarding.

   Objective

      To characterize the speed at which a DUT recovers from a Line Card
      removal and insertion event.

   Procedure

      For this test, the Line Card that is being hardware-reset MUST be
      on the forwarding path and all destinations MUST be directly
      connected.

      First, complete some or all of the following operational tasks:
      save the current DUT configuration, specify boot parameters,
      ensure the appropriate software files are available, or perform
      additional Operating System or hardware-related tasks.

      Second, ensure that the DUT is able to forward the traffic for at
      least 15 seconds before any test activities are performed. The
      traffic should use the minimum frame size possible on the media
      used in the testing, and the rate should be sufficient for the DUT
      to attain the maximum forwarding throughput. This enables a finer
      granularity in the recovery time measurement.

      Third, perform the Line Card (LC) hardware reset at this point.
      This entails, for example, physically removing the LC and later
      re-inserting it, or triggering a hardware reset by other means
      (e.g., command line interface, physical switch, etc.).

      Finally, the characterization is completed by measuring the frame
      loss and recovery time from the moment the LC is reinitialized or
      reinserted.
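
      One way to obtain the recovery time in the final step is from
      receive timestamps recorded by the test equipment. The Python
      sketch below is illustrative only. It assumes that the tester
      records the moment of reinsertion and the arrival time of every
      received frame on the same clock, and that the first frame
      received at or after the reinsertion marks the resumption of
      forwarding. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def recovery_time(
    reset_time: float, rx_timestamps: Sequence[float]
) -> Optional[float]:
    """Seconds from reset_time to the first frame received at or after it.

    rx_timestamps must be sorted in ascending order and use the same
    clock as reset_time. Returns None if no frame was received at or
    after reset_time, i.e. forwarding never resumed in the capture.
    """
    index = bisect_left(rx_timestamps, reset_time)
    if index == len(rx_timestamps):
        return None
    return rx_timestamps[index] - reset_time


def frames_lost(frames_sent: int, frames_received: int) -> int:
    """Frame loss as the difference between offered and received frames."""
    if frames_received > frames_sent:
        raise ValueError("received more frames than were sent")
    return frames_sent - frames_received


if __name__ == "__main__":
    # Frames stop arriving after 0.2 s; the LC is reinserted at 1.0 s.
    arrivals = [0.0, 0.1, 0.2, 5.3, 5.4]
    print(recovery_time(1.0, arrivals))
    print(frames_lost(1000, 950))
```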

   Reporting format

      The reset results are reported in a simple statement including the
      frame loss and recovery times.

      For each test case, it is RECOMMENDED that the following
      parameters be reported in these units:

         Parameter           Units or Examples

         Throughput          Frames per second and bits per second
         Loss                Frames
         Time                Seconds, with sufficient resolution
                             to convey meaningful info
         Protocol            IPv4, IPv6, MPLS, etc.
         Frame Size          Octets
         Port Media          Ethernet, GigE (Gigabit Ethernet),
                             POS (Packet over SONET), etc.
         Port Speed          10 Gbps, 1 Gbps, 100 Mbps, etc.
         Interface Encap.    Ethernet, Ethernet VLAN, PPP, HDLC, etc.

      The reporting of results MUST regard repeatability considerations
      from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple
      trials and report average results.

3.2. Software Reset

   This test is designed to characterize the speed at which a DUT
   recovers from a software reset.

   In contrast to a "hardware reset", a "software reset" involves only
   the re-initialization of the execution, data structures, and partial
   state within the software running on the DUT module(s).

   A software reset is initiated, for example, from the DUT's Command
   Line Interface (CLI).

3.2.1. Operating System (OS) reset (mandatory)

   Objective

      To characterize the speed at which a DUT recovers from an
      Operating System (OS) software reset.

   Procedure

      First, complete some or all of the following operational tasks:
      save the current DUT configuration, specify software boot
      parameters, ensure the appropriate software files are available,
      or perform additional Operating System tasks.

      Second, ensure that the DUT is able to forward the traffic for at
      least 15 seconds before any test activities are performed.
      The traffic should use the minimum frame size possible on the
      media used in the testing, and the rate should be sufficient for
      the DUT to attain the maximum forwarding throughput. This enables
      a finer granularity in the recovery time measurement.

      Third, trigger an Operating System re-initialization in the DUT,
      by operational means such as use of the DUT's Command Line
      Interface (CLI) or other management interface.

      Finally, the characterization is completed by measuring the
      complete frame loss and recovery time from the moment the reset
      instruction was given until the Operating System finished the
      reload and re-initialization (inferred by the re-establishing of
      traffic).

   Reporting format

      The reset results are reported in a simple statement including the
      frame loss and recovery times.

      For each test case, it is RECOMMENDED that the following
      parameters be reported in these units:

         Parameter           Units or Examples

         Throughput          Frames per second and bits per second
         Loss                Frames
         Time                Seconds, with sufficient resolution
                             to convey meaningful info
         Protocol            IPv4, IPv6, MPLS, etc.
         Frame Size          Octets
         Port Media          Ethernet, GigE (Gigabit Ethernet),
                             POS (Packet over SONET), etc.
         Port Speed          10 Gbps, 1 Gbps, 100 Mbps, etc.
         Interface Encap.    Ethernet, Ethernet VLAN, PPP, HDLC, etc.

      The reporting of results MUST regard repeatability considerations
      from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple
      trials and report average results.

3.2.2. Process reset (optional)

   Objective

      To characterize the speed at which a DUT recovers from a software
      process reset.

      Such speed may depend upon the number and types of processes
      running in the DUT and which ones are tested. Different
      implementations of forwarding devices include various common
      processes. A process
A process 514 reset should be performed only in the processes most relevant to 515 the tester. 517 Procedure 519 First, complete some or all of the following operational tasks: 520 save the current DUT configuration, specify software parameters or 521 environmental variables, or perform additional Operating System 522 task. 524 Second, ensure that the DUT is able to forward the traffic for at 525 least 15 seconds before any test activities are performed. The 526 traffic should use the minimum frame size possible on the media 527 used in the testing and rate should be sufficient for the DUT to 528 attain the maximum forwarding throughput. This enables a finer 529 granularity in the recovery time measurement. 531 Third, trigger a process reset for each process running in the DUT 532 and considered for testing from a management interface (e.g., by 533 means of the Command Line Interface (CLI), etc.) 535 Finally, the characterization for each individual process is 536 completed by measuring the complete frame loss and recovery time 537 from the moment the reset instruction was given until the 538 Operating System finished the reload and re-initialization 539 (inferred by the re-establishing of traffic). 541 Reporting format 543 The reset results are reported in a simple statement including the 544 frame loss and recovery times for each process running in the DUT 545 and tested. Given the implementation nature of this test, details 546 of the actual process tested should be included along with the 547 statement. 549 For each test case, it is RECOMMENDED that the following 550 parameters be reported in these units: 552 Parameter Units or Examples 554 Throughput Frames per second and bits per 556 second 558 Loss Frames 560 Time Seconds, with sufficient resolution 562 to convey meaningful info 564 Protocol IPv4, IPv6, MPLS, etc. 566 Frame Size Octets 568 Port Media Ethernet, GigE (Gigabit Ethernet), 570 POS (Packet over SONET), etc. 
572 Port Speed 10 Gbps, 1 Gbps, 100 Mbps, etc. 574 Interface Encap. Ethernet, Ethernet VLAN, 576 PPP, HDLC, etc. 578 The reporting of results MUST regard repeatability considerations 579 from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple 580 trials and report average results. 582 3.3. Power interruption 584 "Power interruption" refers to the complete loss of power on the 585 DUT. It can be viewed as a special case of a hardware reset, 586 triggered by the loss of the power supply to the DUT or its 587 components, and is characterized by the re-initialization of all 588 hardware and software in the DUT. 590 3.3.1. Power Interruption (mandatory) 592 Objective 594 To characterize the speed at which a DUT recovers from a complete 595 loss of electric power or complete power interruption. This test 596 simulates a complete power failure or outage, and should be 597 indicative of the DUT/SUT's behavior during such event. 599 Procedure 601 First, ensure that the entire DUT is at a permanent state to which 602 it will return to after the power interruption, by performing some 603 or all of the following operational tasks: save the current DUT 604 configuration, specify boot parameters, ensure the appropriate 605 software files are available, or perform additional Operating 606 System or hardware related task. 608 Second, ensure that the DUT is able to forward the traffic for at 609 least 15 seconds before any test activities are performed. The 610 traffic should use the minimum frame size possible on the media 611 used in the testing and rate should be sufficient for the DUT to 612 attain the maximum forwarding throughput. This enables a finer 613 granularity in the recovery time measurement. 615 Third, interrupt the power (AC or DC) that feeds the corresponding 616 DUT's power supplies at this point. 
      This entails, for example, physically removing the power supplies
      in the DUT and later re-inserting them, or simply disconnecting or
      switching off their power feeds (AC or DC as applicable). The
      actual power interruption should last at least 15 seconds.

      Finally, the characterization is completed by measuring the frame
      loss and recovery time from the moment the power is restored or
      the power supplies are reinserted in the DUT.

   Reporting format

      The reset results are reported in a simple statement including the
      frame loss and recovery times.

      For each test case, it is RECOMMENDED that the following
      parameters be reported in these units:

         Parameter           Units or Examples

         Throughput          Frames per second and bits per second
         Loss                Frames
         Time                Seconds, with sufficient resolution
                             to convey meaningful info
         Protocol            IPv4, IPv6, MPLS, etc.
         Frame Size          Octets
         Port Media          Ethernet, GigE (Gigabit Ethernet),
                             POS (Packet over SONET), etc.
         Port Speed          10 Gbps, 1 Gbps, 100 Mbps, etc.
         Interface Encap.    Ethernet, Ethernet VLAN, PPP, HDLC, etc.

      The reporting of results MUST regard repeatability considerations
      from Section 4 of [RFC2544]. It is RECOMMENDED to perform multiple
      trials and report average results.

4. Security Considerations

   Benchmarking activities, as described in this memo, are limited to
   technology characterization using controlled stimuli in a laboratory
   environment, with dedicated address space and the constraints
   specified in the sections above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network or misroute traffic to the test
   management network.

   Furthermore, benchmarking is performed on a "black-box" basis,
   relying solely on measurements observable external to the DUT/SUT.

   Special capabilities SHOULD NOT exist in the DUT/SUT specifically
   for benchmarking purposes. Any implications for network security
   arising from the DUT/SUT SHOULD be identical in the lab and in
   production networks.

   There are no specific security considerations within the scope of
   this document.

5. IANA Considerations

   This document has no IANA considerations.

6. Acknowledgments

   The authors would like to thank Ron Bonica, who motivated us to
   write this document. The authors would also like to thank Al Morton
   and Andrew Yourtchenko for providing review, suggestions, and
   valuable input.

   This document was prepared using 2-Word-v2.0.template.dot.

7. References

7.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
             Network Interconnect Devices", RFC 2544, March 1999.

7.2. Informative References

   [RFC5180] Popoviciu, C., et al, "IPv6 Benchmarking Methodology for
             Network Interconnect Devices", RFC 5180, May 2008.

   [RFC5695] Akhter, A., Asati, R., and C. Pignataro, "MPLS Forwarding
             Benchmarking Methodology for IP Flows", RFC 5695, November
             2009.

Authors' Addresses

   Rajiv Asati
   Cisco Systems
   7025-6 Kit Creek Road
   RTP, NC 27709
   USA

   Email: rajiva@cisco.com

   Carlos Pignataro
   Cisco Systems
   7200-12 Kit Creek Road
   RTP, NC 27709
   USA

   Email: cpignata@cisco.com

   Fernando Calabria
   Cisco Systems
   7200-12 Kit Creek Road
   RTP, NC 27709
   USA

   Email: fcalabri@cisco.com

   Cesar Olvera
   Consulintel
   Joaquin Turina, 2
   Pozuelo de Alarcon, Madrid, E-28224
   Spain

   Phone: +34 91 151 81 99
   Email: cesar.olvera@consulintel.es