Network Working Group                                        R. Papneja
Internet Draft                                                  Isocore
Intended Status: Informational
Expires: April 2010                                         S. Vapiwala
                                                             J. Karthik
                                                          Cisco Systems

                                                            S. Poretsky
                                                    Allot Communications

                                                                 S. Rao
                                                    Qwest Communications

                                                           J.L. Le Roux
                                                          France Telecom

                                                            October 2009

        Methodology for Benchmarking MPLS Protection Mechanisms
                 draft-ietf-bmwg-protection-meth-06.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 15, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Abstract

   This document describes a methodology for benchmarking MPLS
   protection mechanisms for link and node protection as defined in
   RFC 4090, "Fast Reroute Extensions to RSVP-TE for LSP Tunnels".
   It provides test methodologies and testbed setups for measuring
   failover times while considering all dependencies that might impact
   faster recovery of real-time applications bound to MPLS-based
   traffic-engineered tunnels.  The benchmarking terms used in this
   document are defined in the companion terminology document,
   "Benchmarking Terminology for Protection Performance"
   (draft-ietf-bmwg-protection-term).

Table of Contents

   1. Introduction
   2. Document Scope
   3. Existing Definitions
   4. General Reference Topology
   5. Test Considerations
      5.1. Failover Events
      5.2. Failure Detection
      5.3. Use of Data Traffic for MPLS Protection Benchmarking
      5.4. LSP and Route Scaling
      5.5. Selection of IGP
      5.6. Restoration and Reversion
      5.7. Offered Load
      5.8. Tester Capabilities
   6. Reference Test Setups
      6.1. Link Protection
      6.2. Node Protection
   7. Test Methodologies
      7.1. MPLS FRR Forwarding Performance
      7.2. Headend PLR with Link Failure
      7.3. Mid-Point PLR with Link Failure
      7.4. Headend PLR with Node Failure
      7.5. Mid-Point PLR with Node Failure
   8. Reporting Format
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Acknowledgments
   Authors' Addresses
   Appendix A: Fast Reroute Scalability Table
   Appendix B: Abbreviations

1. Introduction

   This document describes a methodology for benchmarking MPLS-based
   protection mechanisms.  The new terminology that this document
   introduces is defined in [TERM-ID].

   MPLS-based protection mechanisms provide fast recovery of real-time
   services from planned or unplanned link or node failures.  MPLS
   protection mechanisms are generally deployed in a network
   infrastructure where MPLS is used for provisioning of point-to-
   point traffic-engineered tunnels (tunnels).  MPLS-based protection
   mechanisms shorten the service disruption period by minimizing the
   recovery time from the most common failures.

   Network elements from different manufacturers react differently to
   network failures, which affects the network's ability to recover
   from failures and the speed of that recovery.  It is therefore
   imperative for service providers to have a common benchmark with
   which to understand the failure-recovery behavior of network
   elements.

   Two factors impact service availability: the frequency of failures
   and the duration for which the failures persist.  Failures can be
   classified further into two types: correlated and uncorrelated.
   Correlated and uncorrelated failures may be planned or unplanned.

   Planned failures are predictable.  Network implementations should
   be able to handle both planned and unplanned failures and recover
   gracefully within a time frame that maintains service assurance.
   Hence, failover recovery time is one of the most important
   benchmarks that a service provider considers in choosing the
   building blocks for its network infrastructure.

   A correlated failure is the simultaneous occurrence of two or more
   failures.  A typical example is the failure of a logical resource
   (e.g., layer-2 links) due to a dependency on a common physical
   resource (e.g., a common conduit) that fails.  Within the context
   of MPLS protection mechanisms, failures that arise due to Shared
   Risk Link Groups (SRLGs) [MPLS-FRR-EXT] can be considered
   correlated failures.  Not all correlated failures are predictable
   in advance; consider, for example, those caused by natural
   disasters.

2. Document Scope

   This document provides detailed test cases along with different
   topologies and scenarios that should be considered to effectively
   benchmark MPLS protection mechanisms and failover times on the
   data plane.  Different Failover Events and scaling considerations
   are also provided.

   All benchmarking test cases defined in this document apply to both
   facility backup and local protection enabled in detour mode.  The
   test cases cover the failure scenarios described in Section 5.1,
   and the associated procedures benchmark the ability of the Device
   Under Test (DUT) to recover from failures.  Data plane traffic is
   used to benchmark failover times.

   Benchmarking of correlated failures is out of scope of this
   document.  Protection based on Bidirectional Forwarding Detection
   (BFD) is outside the scope of this document.
3. Existing Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in BCP 14, RFC
   2119 [Br97].  RFC 2119 defines the use of these key words to help
   make the intent of standards track documents as clear as possible.
   While this document uses these keywords, this document is not a
   standards track document.

   The reader is assumed to be familiar with the commonly used MPLS
   terminology, some of which is defined in [MPLS-FRR-EXT].

   This document uses much of the terminology defined in [TERM-ID].
   It also uses existing terminology defined in other BMWG work.
   Examples include, but are not limited to:

      Throughput                [Br91], Section 3.17
      Device Under Test (DUT)   [Ma98], Section 3.1.1
      System Under Test (SUT)   [Ma98], Section 3.1.2
      Out-of-order Packet       [Po06], Section 3.3.2
      Duplicate Packet          [Po06], Section 3.3.3

4. General Reference Topology

   Figure 1 illustrates the basic reference testbed and is applicable
   to all the test cases defined in this document.  The Tester
   consists of a Traffic Generator (TG) and a Test Analyzer (TA).  The
   Tester is directly connected to the DUT.  It sends and receives IP
   traffic to and from the tunnel ingress and performs signaling
   protocol emulation to simulate real network scenarios in a lab
   environment.  The Tester may also support MPLS-TE signaling so that
   it can act as the ingress node of an MPLS tunnel.

             ------------------------------
            |         --------------------|----------
            |        |                    |          |
            |        |                    |          |
         --------  --------   --------   --------   --------
    TG--|  R1  |--|  R2  |---|  R3  |   |  R4  |   |  R5  |
        |      |--|      |---|      |---|      |---|      |
         --------  --------   --------   --------   --------
           |          |           |          |          |
           |          |           |          |          TA
           |         --------     |          |
            ---------|  R6  |-----           |
                     |      |-----------------
                      --------

                  Figure 1: Fast Reroute Topology

   The Tester MUST record the number of lost, duplicate, and reordered
   packets.  It should further record arrival and departure times so
   that Failover Time, Additive Latency, and Reversion Time can be
   measured.  The Tester may be a single device or a test system
   emulating all the different roles along a primary or backup path.

   The label stack depends on the following three entities:

      - the type of protection (link versus node)
      - the number of remaining hops of the primary tunnel from the
        PLR
      - the number of remaining hops of the backup tunnel from the PLR

   Due to this dependency, it is RECOMMENDED that the benchmarking of
   failover times be performed on all the topologies provided in
   Section 6.
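   The packet accounting required above can be illustrated with a
   short sketch.  The following is illustrative only; it assumes the
   Tester inserts a unique sequence number into each transmitted
   packet (per the capabilities listed in Section 5.8) and exposes
   the received sequence numbers in arrival order:

      # Sketch: derive lost / out-of-order / duplicate packet counts
      # from the sequence numbers observed by the Traffic Analyzer.
      # tx_count is the number of packets the Traffic Generator sent.

      def classify_packets(tx_count, rx_seqs):
          seen = set()
          duplicates = 0
          out_of_order = 0
          highest = -1
          for seq in rx_seqs:
              if seq in seen:
                  duplicates += 1      # already received once
                  continue
              seen.add(seq)
              if seq < highest:
                  out_of_order += 1    # arrived after a later packet
              highest = max(highest, seq)
          lost = tx_count - len(seen)
          return lost, out_of_order, duplicates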
5. Test Considerations

   This section discusses the fundamentals of MPLS protection testing:

      - the types of network events that cause failover
      - indications of failover
      - the use of data traffic
      - traffic generation
      - LSP scaling
      - reversion of the LSP
      - IGP selection

5.1. Failover Events [TERM-ID]

   The failover to the backup tunnel is primarily triggered by either
   link or node failures observed downstream of the Point of Local
   Repair (PLR).  Some of these failure events are listed below.

   Link failure events:

      - Interface shutdown on the PLR side with POS alarm
      - Interface shutdown on the remote side with POS alarm
      - Interface shutdown on the PLR side with RSVP hello enabled
      - Interface shutdown on the remote side with RSVP hello enabled
      - Interface shutdown on the PLR side with BFD
      - Interface shutdown on the remote side with BFD
      - Fiber pull on the PLR side (both TX & RX, or just the TX)
      - Fiber pull on the remote side (both TX & RX, or just the RX)
      - Online insertion and removal (OIR) on the PLR side
      - OIR on the remote side
      - Sub-interface failure (e.g., shutting down a VLAN)
      - Parent interface shutdown (an interface bearing multiple
        sub-interfaces)

   Node failure events:

      - A system reload initiated either by a graceful shutdown or by
        a power failure
      - A system crash due to a software failure or an assert

5.2. Failure Detection [TERM-ID]

   Link failure detection time depends on the link type and the
   failure detection protocols running.  For SONET/SDH, the alarm
   type (such as LOS, AIS, or RDI) can be used.  Other link types
   have layer-2 alarms, but they may not provide a short enough
   failure detection time.  Ethernet-based links do not have layer-2
   failure indicators and therefore rely on layer-3 signaling for
   failure detection.  However, for directly connected devices, the
   remote fault indication in the Ethernet auto-negotiation scheme
   could be considered a type of layer-2 link failure indicator.

   MPLS has different failure detection techniques, such as BFD or
   the use of RSVP hellos.  These methods can be used for the layer-3
   failure indicators required by Ethernet-based links, or for some
   other non-Ethernet-based links, to help improve failure detection
   time.

   The test procedures in this document can be used for local or
   remote failure scenarios for comprehensive benchmarking and to
   evaluate failover performance independent of the failure detection
   technique.

5.3. Use of Data Traffic for MPLS Protection Benchmarking

   Currently, end customers use packet loss as a key metric for
   Failover Time [TERM-ID].  Failover Packet Loss [TERM-ID] is an
   externally observable event and has a direct impact on application
   performance.  MPLS protection is expected to minimize packet loss
   in the event of a failure.  For this reason, it is important to
   develop a standard router benchmarking methodology for measuring
   MPLS protection that uses packet loss as a metric.  At a known
   rate of forwarding, packet loss can be measured and the failover
   time can be determined.  Measurement of control plane signaling to
   establish backup paths is not enough to verify failover.  Failover
   is best determined when packets are actually traversing the backup
   path.

   An additional benefit of using packet loss for the calculation of
   failover time is that it allows use of a black-box test
   environment.  Data traffic is offered at line rate to the DUT, an
   emulated network failure event is forced to occur, and packet loss
   is externally measured to calculate the convergence time.  This
   setup is independent of the DUT architecture.

   In addition, this methodology considers errored packets and
   duplicate packets that could have been generated during the
   failover process.  The methodologies consider lost, out-of-order,
   and duplicate packets to be impaired packets that contribute to
   the Failover Time.
5.4. LSP and Route Scaling

   Failover time performance may vary with the number of established
   primary and backup tunnel label switched paths (LSPs) and
   installed routes.  However, the procedures outlined here should be
   used for any number of LSPs (L) and any number of routes protected
   by the PLR (R).  The values of L and R must be recorded.

5.5. Selection of IGP

   The underlying IGP could be ISIS-TE or OSPF-TE for the methodology
   proposed here.  See [IGP-METH] for IGP options to consider and
   report.

5.6. Restoration and Reversion [TERM-ID]

   Fast Reroute provides a method to return or restore the original
   primary LSP upon recovery from the failure (Restoration) and to
   switch traffic from the Backup Path to the restored Primary Path
   (Reversion).  In MPLS-FRR, Reversion can be implemented as Global
   Reversion or Local Reversion.  It is important to include
   Restoration and Reversion as a step in each test case, in order to
   measure the amount of packet loss, out-of-order packets, or
   duplicate packets that is produced.

5.7. Offered Load

   It is suggested that there be one or more traffic streams, as long
   as there is a steady and constant rate of flow for all the
   streams.  In order to monitor the DUT performance for recovery
   times, a set of route prefixes should be advertised before traffic
   is sent.  The traffic should be configured towards these routes.

   A typical example would be configuring the traffic generator to
   send traffic to the first, middle, and last of the advertised
   routes (first, middle, and last being the numerically smallest,
   median, and largest of the advertised prefixes, respectively), as
   illustrated in the sketch below.  Generating traffic to all of the
   prefixes reachable by the protected tunnel in a round-robin
   fashion, where the traffic is destined to all the prefixes but to
   one prefix at a time in a cyclic manner, is not recommended.  If
   there are many prefixes reachable through the LSP, the time
   interval between two successive packets destined to one prefix may
   be significant and may be comparable to the failover time being
   measured, which does not aid in getting an accurate failover
   measurement.
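   As a sketch of this selection (illustrative only; comparison of
   the advertised prefixes via the Python ipaddress module stands in
   for whatever ordering the Tester applies):

      # Sketch: choose the first, middle, and last destinations from
      # the advertised prefixes, i.e., the numerically smallest,
      # median, and largest prefixes (Section 5.7).
      import ipaddress

      def pick_destinations(prefixes):
          nets = sorted(ipaddress.ip_network(p) for p in prefixes)
          return nets[0], nets[len(nets) // 2], nets[-1]

      # Example: pick_destinations(["10.0.%d.0/24" % i
      #                             for i in range(100)])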
5.8. Tester Capabilities

   It is RECOMMENDED that the Tester used to execute each test case
   have the following capabilities:

      1. Ability to establish MPLS-TE tunnels and push/pop labels.
      2. Ability to produce a Failover Event [TERM-ID].
      3. Ability to insert a timestamp in each data packet's IP
         payload.
      4. An internal time clock to control timestamping, time
         measurements, and time calculations.
      5. Ability to disable or tune specific layer-2 and layer-3
         protocol functions on any interface(s).

   The Tester MAY be capable of making non-data-plane convergence
   observations and of using those observations for measurements.

6. Reference Test Setups

   In addition to the general reference topology shown in Figure 1,
   this section provides detailed insight into the various proposed
   test setups that should be considered for comprehensively
   benchmarking the failover time with the DUT in different roles
   along the primary tunnel.

   This section proposes a set of topologies that covers the
   scenarios for local protection.  All of these topologies can be
   mapped to the reference topology shown in Figure 1.  The
   topologies provided in this section refer to the testbed required
   to benchmark failover time when the DUT is configured as a PLR in
   either the headend or midpoint role.  Provided with each topology
   below is the label stack at the PLR.  Penultimate Hop Popping
   (PHP) MAY be used and must be reported when used.

   Figures 2 through 9 use the following convention:

      a) HE is Headend
      b) TE is Tail-End
      c) MID is Mid point
      d) MP is Merge Point
      e) PLR is Point of Local Repair
      f) PRI is Primary Path
      g) BKP denotes Backup Path and Nodes

6.1. Link Protection

6.1.1. Link Protection - 1 hop primary (from PLR) and 1 hop backup
       TE tunnels

        --------    --------  PRI   --------
       |   R1   |  |   R2   |      |   R3   |
    TG-|   HE   |--|  MID   |------|   TE   |-TA
       |        |  |  PLR   |------|        |
        --------    --------  BKP   --------

                      Figure 2

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           0                 0
      Layer3 VPN (PE-PE)         1                 1
      Layer3 VPN (PE-P)          2                 2
      Layer2 VC (PE-PE)          1                 1
      Layer2 VC (PE-P)           2                 2
      Mid-point LSPs             0                 0

6.1.2. Link Protection - 1 hop primary (from PLR) and 2 hop backup
       TE tunnels

        --------    --------        --------
       |   R1   |  |   R2   |  PRI |   R3   |
    TG-|   HE   |--|  MID   |------|   TE   |-TA
       |        |  |  PLR   |      |        |
        --------    --------        --------
                     BKP |             |
                         |  --------   |
                         | |   R6   |  |
                          -|  BKP   |--
                           |  MID   |
                            --------

                      Figure 3

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           0                 1
      Layer3 VPN (PE-PE)         1                 2
      Layer3 VPN (PE-P)          2                 3
      Layer2 VC (PE-PE)          1                 2
      Layer2 VC (PE-P)           2                 3
      Mid-point LSPs             0                 1

6.1.3. Link Protection - 2+ hop (from PLR) primary and 1 hop backup
       TE tunnels

        --------    --------  PRI   --------  PRI   --------
       |   R1   |  |   R2   |      |   R3   |      |   R4   |
    TG-|   HE   |--|  MID   |------|  MID   |------|   TE   |-TA
       |        |  |  PLR   |------|        |      |        |
        --------    --------  BKP   --------        --------

                      Figure 4

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           1                 1
      Layer3 VPN (PE-PE)         2                 2
      Layer3 VPN (PE-P)          3                 3
      Layer2 VC (PE-PE)          2                 2
      Layer2 VC (PE-P)           3                 3
      Mid-point LSPs             1                 1

6.1.4. Link Protection - 2+ hop (from PLR) primary and 2 hop backup
       TE tunnels

        --------    --------  PRI   --------  PRI   --------
       |   R1   |  |   R2   |      |   R3   |      |   R4   |
    TG-|   HE   |--|  MID   |------|  MID   |------|   TE   |-TA
       |        |  |  PLR   |      |        |      |        |
        --------    --------        --------        --------
                     BKP |              |
                         |   --------   |
                         |  |   R6   |  |
                          --|  BKP   |--
                            |  MID   |
                             --------

                      Figure 5

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           1                 2
      Layer3 VPN (PE-PE)         2                 3
      Layer3 VPN (PE-P)          3                 4
      Layer2 VC (PE-PE)          2                 3
      Layer2 VC (PE-P)           3                 4
      Mid-point LSPs             1                 2
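   The label-depth tables above lend themselves to automated
   checking.  The sketch below is illustrative only: the table keyed
   by figure number and traffic type, and check_label_depth(), are
   hypothetical test-script constructs for asserting that the label
   stack observed at the PLR matches the expected depth before and
   after the Failover Event:

      # Sketch: expected label stack depths (before, after failure)
      # at the PLR for the link-protection topologies of Figures 2-5.
      EXPECTED_LABELS = {
          (2, "IP traffic (P-P)"):   (0, 0),
          (2, "Layer3 VPN (PE-PE)"): (1, 1),
          (3, "IP traffic (P-P)"):   (0, 1),
          (3, "Layer3 VPN (PE-PE)"): (1, 2),
          (4, "IP traffic (P-P)"):   (1, 1),
          (5, "IP traffic (P-P)"):   (1, 2),
          # ... remaining rows follow the tables above
      }

      def check_label_depth(figure, traffic, observed):
          """observed is a (before, after) tuple of label counts."""
          return EXPECTED_LABELS[(figure, traffic)] == observed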
6.2. Node Protection

6.2.1. Node Protection - 2 hop primary (from PLR) and 1 hop backup
       TE tunnels

        --------    --------  PRI   --------  PRI   --------
       |   R1   |  |   R2   |      |   R3   |      |   R4   |
    TG-|   HE   |--|  MID   |------|  MID   |------|   TE   |-TA
       |        |  |  PLR   |      |        |      |        |
        --------    --------        --------        --------
                     BKP |                             |
                          -----------------------------

                      Figure 6

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           1                 0
      Layer3 VPN (PE-PE)         2                 1
      Layer3 VPN (PE-P)          3                 2
      Layer2 VC (PE-PE)          2                 1
      Layer2 VC (PE-P)           3                 2
      Mid-point LSPs             1                 0

6.2.2. Node Protection - 2 hop primary (from PLR) and 2 hop backup
       TE tunnels

        --------    --------  PRI   --------  PRI   --------
       |   R1   |  |   R2   |      |   R3   |      |   R4   |
    TG-|   HE   |--|  MID   |------|  MID   |------|   TE   |-TA
       |        |  |  PLR   |      |        |      |        |
        --------    --------        --------        --------
                     BKP |                             |
                         |          --------           |
                         |         |   R6   |          |
                          ---------|  BKP   |----------
                                   |  MID   |
                                    --------

                      Figure 7

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           1                 1
      Layer3 VPN (PE-PE)         2                 2
      Layer3 VPN (PE-P)          3                 3
      Layer2 VC (PE-PE)          2                 2
      Layer2 VC (PE-P)           3                 3
      Mid-point LSPs             1                 1

6.2.3. Node Protection - 3+ hop primary (from PLR) and 1 hop backup
       TE tunnels

     --------   --------  PRI  --------  PRI  --------  PRI  --------
    |   R1   | |   R2   |     |   R3   |     |   R4   |     |   R5   |
 TG-|   HE   |-|  MID   |-----|  MID   |-----|   MP   |-----|   TE   |-TA
    |        | |  PLR   |     |        |     |        |     |        |
     --------   --------       --------       --------       --------
                 BKP |                           |
                      ---------------------------

                      Figure 8

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           1                 1
      Layer3 VPN (PE-PE)         2                 2
      Layer3 VPN (PE-P)          3                 3
      Layer2 VC (PE-PE)          2                 2
      Layer2 VC (PE-P)           3                 3
      Mid-point LSPs             1                 1

6.2.4. Node Protection - 3+ hop primary (from PLR) and 2 hop backup
       TE tunnels

     --------   --------  PRI  --------  PRI  --------  PRI  --------
    |   R1   | |   R2   |     |   R3   |     |   R4   |     |   R5   |
 TG-|   HE   |-|  MID   |-----|  MID   |-----|   MP   |-----|   TE   |-TA
    |        | |  PLR   |     |        |     |        |     |        |
     --------   --------       --------       --------       --------
                 BKP |                           |
                     |        --------           |
                     |       |   R6   |          |
                      -------|  BKP   |----------
                             |  MID   |
                              --------

                      Figure 9

      Traffic              Num of labels     Num of labels
                           before failure    after failure
      IP traffic (P-P)           1                 2
      Layer3 VPN (PE-PE)         2                 3
      Layer3 VPN (PE-P)          3                 4
      Layer2 VC (PE-PE)          2                 3
      Layer2 VC (PE-P)           3                 4
      Mid-point LSPs             1                 2

7. Test Methodologies

   The procedure described in this section can be applied to all
   eight base test cases and the associated topologies.  The backup
   and primary tunnels are configured to be alike in terms of
   bandwidth usage.  In order to benchmark failover with all label
   stack depths applicable to current deployments, it is RECOMMENDED
   that all of the test cases provided in this section be performed.
   The forwarding performance test cases in Section 7.1 MUST be
   performed prior to the failover test cases.
7.1. MPLS FRR Forwarding Performance

   Benchmarking Failover Time [TERM-ID] for MPLS protection first
   requires a baseline measurement of the forwarding performance of
   the test topology, including the DUT.  Forwarding performance is
   benchmarked by the Throughput metric as defined in [Br91] and
   measured in packets per second (pps).  This section provides three
   test cases to benchmark forwarding performance: one each with the
   DUT configured as a Headend PLR, a Mid-Point PLR, and an Egress
   PLR.

7.1.1. Headend PLR Forwarding Performance

   Objective

      To benchmark the maximum rate (pps) on the PLR (as headend)
      over the primary LSP and the backup LSP.

   Test Setup

      - Select any one topology out of the eight from Section 6.
      - Select overlay technologies (e.g., IGP, VPN, or VC) with the
        DUT as the Headend PLR.
      - The DUT will also have two interfaces connected to the
        traffic generator/analyzer.  (If the node downstream of the
        PLR is not a simulated node, then the ingress of the tunnel
        should have one link connected to the traffic generator, and
        the node downstream of the PLR or the egress of the tunnel
        should have a link connected to the traffic analyzer.)

   Procedure

      1.  Establish the primary LSP on R2 as required by the selected
          topology.
      2.  Establish the backup LSP on R2 as required by the selected
          topology.
      3.  Verify that the primary and backup LSPs are up and that the
          primary is protected.
      4.  Verify that Fast Reroute protection is enabled and ready.
      5.  Set up traffic streams as described in Section 5.7.
      6.  Send MPLS traffic over the primary LSP at the Throughput
          supported by the DUT.
      7.  Record the Throughput over the primary LSP.
      8.  Trigger a link failure as described in Section 5.1.
      9.  Verify that the offered load gets mapped to the backup
          tunnel and measure the Additive Backup Delay.
      10. 30 seconds after Failover, stop the offered load and
          measure the Throughput, Packet Loss, Out-of-Order Packets,
          and Duplicate Packets over the backup LSP.
      11. Adjust the offered load and repeat steps 6 through 10 until
          the Throughput values for the primary and backup LSPs are
          equal.
      12. Record the Throughput.  This is the offered load that will
          be used for the Headend PLR failover test cases.

7.1.2. Mid-Point PLR Forwarding Performance

   Objective

      To benchmark the maximum rate (pps) on the PLR (as mid-point)
      over the primary LSP and the backup LSP.

   Test Setup

      - Select any one topology out of the eight from Section 6.
      - Select overlay technologies (e.g., IGP, VPN, or VC) with the
        DUT as the Mid-Point PLR.
      - The DUT will also have two interfaces connected to the
        traffic generator.

   Procedure

      1.  Establish the primary LSP on R1 as required by the selected
          topology.
      2.  Establish the backup LSP on R2 as required by the selected
          topology.
      3.  Verify that the primary and backup LSPs are up and that the
          primary is protected.
      4.  Verify that Fast Reroute protection is enabled and ready.
      5.  Set up traffic streams as described in Section 5.7.
      6.  Send MPLS traffic over the primary LSP at the Throughput
          supported by the DUT.
      7.  Record the Throughput over the primary LSP.
      8.  Trigger a link failure as described in Section 5.1.
      9.  Verify that the offered load gets mapped to the backup
          tunnel and measure the Additive Backup Delay.
      10. 30 seconds after Failover, stop the offered load and
          measure the Throughput, Packet Loss, Out-of-Order Packets,
          and Duplicate Packets over the backup LSP.
      11. Adjust the offered load and repeat steps 6 through 10 until
          the Throughput values for the primary and backup LSPs are
          equal.
      12. Record the Throughput.  This is the offered load that will
          be used for the Mid-Point PLR failover test cases.
7.1.3. Egress PLR Forwarding Performance

   Objective

      To benchmark the maximum rate (pps) on the PLR (as egress) over
      the primary LSP and the backup LSP.

   Test Setup

      - Select any one topology out of the eight from Section 6.
      - Select overlay technologies (e.g., IGP, VPN, or VC) with the
        DUT as the Egress PLR.
      - The DUT will also have two interfaces connected to the
        traffic generator.

   Procedure

      1.  Establish the primary LSP on R1 as required by the selected
          topology.
      2.  Establish the backup LSP on R2 as required by the selected
          topology.
      3.  Verify that the primary and backup LSPs are up and that the
          primary is protected.
      4.  Verify that Fast Reroute protection is enabled and ready.
      5.  Set up traffic streams as described in Section 5.7.
      6.  Send MPLS traffic over the primary LSP at the Throughput
          supported by the DUT.
      7.  Record the Throughput over the primary LSP.
      8.  Trigger a link failure as described in Section 5.1.
      9.  Verify that the offered load gets mapped to the backup
          tunnel and measure the Additive Backup Delay.
      10. 30 seconds after Failover, stop the offered load and
          measure the Throughput, Packet Loss, Out-of-Order Packets,
          and Duplicate Packets over the backup LSP.
      11. Adjust the offered load and repeat steps 6 through 10 until
          the Throughput values for the primary and backup LSPs are
          equal.
      12. Record the Throughput.  This is the offered load that will
          be used for the Egress PLR failover test cases.
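   Steps 6 through 12 of the three procedures above form an iterative
   search for a common Throughput.  The following sketch is
   illustrative only; send_and_measure() is a placeholder for the
   Tester-specific call that offers a load, triggers the failure, and
   returns the rates measured over the primary and backup LSPs:

      # Sketch: adjust the offered load until the Throughput measured
      # over the primary and backup LSPs is equal (steps 6-12).

      def equalize_throughput(start_pps, step_pps, send_and_measure):
          rate = start_pps
          while rate > 0:
              primary_pps, backup_pps = send_and_measure(rate)
              if primary_pps == backup_pps:
                  return rate      # offered load for failover tests
              rate -= step_pps     # reduce the offered load and retry
          raise RuntimeError("no common Throughput found")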
7.2. Headend PLR with Link Failure

   Objective

      To benchmark the MPLS failover time due to the link failure
      events described in Section 5.1, experienced by the DUT, which
      is the Headend PLR.

   Test Setup

      - Select any one topology out of the eight from Section 6.
      - Select an overlay technology for the FRR test (e.g., IGP,
        VPN, or VC).
      - The DUT will also have two interfaces connected to the
        traffic generator/analyzer.  (If the node downstream of the
        PLR is not a simulated node, then the ingress of the tunnel
        should have one link connected to the traffic generator, and
        the node downstream of the PLR or the egress of the tunnel
        should have a link connected to the traffic analyzer.)

   Test Configuration

      1. Configure the number of primaries on R2 and the backups on
         R2 as required by the selected topology.
      2. Configure the test setup to support Reversion.
      3. Advertise prefixes (as per the FRR Scalability Table
         described in Appendix A) from the tail-end.

   Procedure

      Test case 7.1.1, Headend PLR Forwarding Performance, MUST be
      completed first to obtain the Throughput to use as the offered
      load.

      1.  Establish the primary LSP on R2 as required by the selected
          topology.
      2.  Establish the backup LSP on R2 as required by the selected
          topology.
      3.  Verify that the primary and backup LSPs are up and that the
          primary is protected.
      4.  Verify that Fast Reroute protection is enabled and ready.
      5.  Set up traffic streams for the offered load as described in
          Section 5.7.
      6.  Provide the offered load from the tester at the Throughput
          [Br91] level obtained from test case 7.1.1.
      7.  Verify that traffic is switched over the primary LSP
          without packet loss.
      8.  Trigger a link failure as described in Section 5.1.
      9.  Verify that the offered load gets mapped to the backup
          tunnel and measure the Additive Backup Delay.
      10. 30 seconds after Failover [TERM-ID], stop the offered load
          and measure the total Failover Packet Loss [TERM-ID].
      11. Calculate the Failover Time [TERM-ID] benchmark using the
          selected Failover Time Calculation Method (TBLM, PBLM, or
          TBM) [TERM-ID].
      12. Restart the offered load and restore the primary LSP to
          verify that Reversion [TERM-ID] occurs, and measure the
          Reversion Packet Loss [TERM-ID].
      13. Calculate the Reversion Time [TERM-ID] benchmark using the
          selected Failover Time Calculation Method (TBLM, PBLM, or
          TBM) [TERM-ID].
      14. Verify that the headend signals a new LSP and that
          protection is in place again.

   It is RECOMMENDED that this procedure be repeated for each of the
   link failure triggers defined in Section 5.1.

7.3. Mid-Point PLR with Link Failure

   Objective

      To benchmark the MPLS failover time due to the link failure
      events described in Section 5.1, experienced by the DUT, which
      is the Mid-Point PLR.

   Test Setup

      - Select any one topology out of the eight from Section 6.
      - Select the overlay technology for the FRR test as Mid-Point
        LSPs.
      - The DUT will also have two interfaces connected to the
        traffic generator.

   Test Configuration

      1. Configure the number of primaries on R1 and the backups on
         R2 as required by the selected topology.
      2. Configure the test setup to support Reversion.
      3. Advertise prefixes (as per the FRR Scalability Table
         described in Appendix A) from the tail-end.

   Procedure

      Test case 7.1.2, Mid-Point PLR Forwarding Performance, MUST be
      completed first to obtain the Throughput to use as the offered
      load.

      1. Establish the primary LSP on R1 as required by the selected
         topology.
      2. Establish the backup LSP on R2 as required by the selected
         topology.
      3. Perform steps 3 through 14 of Section 7.2, Headend PLR with
         Link Failure.

   It is RECOMMENDED that this procedure be repeated for each of the
   link failure triggers defined in Section 5.1.

7.4. Headend PLR with Node Failure

   Objective

      To benchmark the MPLS failover time due to the node failure
      events described in Section 5.1, experienced by the DUT, which
      is the Headend PLR.

   Test Setup

      - Select any one topology from Sections 6.2.1 to 6.2.4.
      - Select an overlay technology for the FRR test (e.g., IGP,
        VPN, or VC).
      - The DUT will also have two interfaces connected to the
        traffic generator.

   Test Configuration

      1. Configure the number of primaries on R2 and the backups on
         R2 as required by the selected topology.
      2. Configure the test setup to support Reversion.
      3. Advertise prefixes (as per the FRR Scalability Table
         described in Appendix A) from the tail-end.

   Procedure

      Test case 7.1.1, Headend PLR Forwarding Performance, MUST be
      completed first to obtain the Throughput to use as the offered
      load.

      1. Establish the primary LSP on R2 as required by the selected
         topology.
      2. Establish the backup LSP on R2 as required by the selected
         topology.
      3. Verify that the primary and backup LSPs are up and that the
         primary is protected.
      4. Verify Fast Reroute protection.
      5. Set up traffic streams for the offered load as described in
         Section 5.7.
      6. Provide the offered load from the tester at the Throughput
         [Br91] level obtained from test case 7.1.1.
      7. Verify that traffic is switched over the primary LSP without
         packet loss.
      8. Trigger a node failure as described in Section 5.1.
      9. Perform steps 9 through 14 of Section 7.2, Headend PLR with
         Link Failure.
   It is RECOMMENDED that this procedure be repeated for each of the
   node failure triggers defined in Section 5.1.

7.5. Mid-Point PLR with Node Failure

   Objective

      To benchmark the MPLS failover time due to the node failure
      events described in Section 5.1, experienced by the DUT, which
      is the Mid-Point PLR.

   Test Setup

      - Select any one topology from Sections 6.2.1 to 6.2.4.
      - Select the overlay technology for the FRR test as Mid-Point
        LSPs.
      - The DUT will also have two interfaces connected to the
        traffic generator.

   Test Configuration

      1. Configure the number of primaries on R1 and the backups on
         R2 as required by the selected topology.
      2. Configure the test setup to support Reversion.
      3. Advertise prefixes (as per the FRR Scalability Table
         described in Appendix A) from the tail-end.

   Procedure

      Test case 7.1.2, Mid-Point PLR Forwarding Performance, MUST be
      completed first to obtain the Throughput to use as the offered
      load.

      1. Establish the primary LSP on R1 as required by the selected
         topology.
      2. Establish the backup LSP on R2 as required by the selected
         topology.
      3. Verify that the primary and backup LSPs are up and that the
         primary is protected.
      4. Verify Fast Reroute protection.
      5. Set up traffic streams for the offered load as described in
         Section 5.7.
      6. Provide the offered load from the tester at the Throughput
         [Br91] level obtained from test case 7.1.2.
      7. Verify that traffic is switched over the primary LSP without
         packet loss.
      8. Trigger a node failure as described in Section 5.1.
      9. Perform steps 9 through 14 of Section 7.2, Headend PLR with
         Link Failure.

   It is RECOMMENDED that this procedure be repeated for each of the
   node failure triggers defined in Section 5.1.

8. Reporting Format

   For each test, it is recommended that the results be reported in
   the following format.

      Parameter                         Units

      IGP used for the test             ISIS-TE / OSPF-TE
      Interface types                   GigE, POS, ATM, VLAN, etc.
      Packet sizes offered to the DUT   Bytes
      Forwarding rate                   Packets per second
      IGP routes advertised             Number of IGP routes
      Penultimate Hop Popping           Used / Not Used
      RSVP hello timers                 Milliseconds
      Number of FRR tunnels             Number of tunnels
      Number of VPN routes installed    Number of VPN routes
        on the Headend
      Number of VC tunnels              Number of VC tunnels
      Number of BGP routes              Number of BGP routes installed
      Number of mid-point tunnels       Number of tunnels
      Number of prefixes protected      Number of LSPs
        by the primary
      Topology being used               Section number and figure
                                          reference
      Failover Event                    Event type

   Benchmarks (to be recorded for each test case):

      Failover:
         Failover Time                  seconds
         Failover Packet Loss           packets
         Additive Backup Delay          seconds
         Out-of-Order Packets           packets
         Duplicate Packets              packets

      Reversion:
         Reversion Time                 seconds
         Reversion Packet Loss          packets
         Additive Backup Delay          seconds
         Out-of-Order Packets           packets
         Duplicate Packets              packets

   The Failover Time suggested above is calculated using one of the
   following three methods:

      1. Packet-Based Loss Method (PBLM): (number of packets dropped
         / packets per second) * 1000 milliseconds.  This method
         could also be referred to as the Rate-Derived Method.

      2. Time-Based Loss Method (TBLM): This method relies on the
         ability of the traffic generators to provide statistics that
         reveal the duration of the failure in milliseconds, based on
         when the packet loss occurred (the interval between non-zero
         packet loss and the return to zero loss).

      3. Timestamp-Based Method (TBM): This method of failover
         calculation is based on the timestamp that is transmitted as
         payload in the packets originated by the generator.  The
         Traffic Analyzer records the timestamp of the last packet
         received before the failover event and of the first packet
         received after the failover, and derives the failover time
         from the difference between these two timestamps.  Note: the
         payload could also carry sequence numbers for calculating
         out-of-order packets and duplicate packets.
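   As a worked illustration (not normative; input units are assumed
   to be packets, packets per second, and seconds, respectively), the
   three methods can be expressed as:

      # Sketch: the three Failover Time calculations, each returning
      # milliseconds.

      def pblm(packets_dropped, offered_pps):
          """Packet-Based Loss Method (Rate-Derived Method)."""
          return packets_dropped / offered_pps * 1000.0

      def tblm(loss_start_s, loss_end_s):
          """Time-Based Loss Method: interval between the onset of
          non-zero packet loss and the return to zero loss."""
          return (loss_end_s - loss_start_s) * 1000.0

      def tbm(last_ts_before_s, first_ts_after_s):
          """Timestamp-Based Method: difference between the payload
          timestamps of the last packet received before the Failover
          Event and the first packet received after it."""
          return (first_ts_after_s - last_ts_before_s) * 1000.0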
9. Security Considerations

   Documents of this type do not directly affect the security of the
   Internet or of corporate networks as long as benchmarking is not
   performed on devices or systems connected to production networks.
   This document attempts to formalize a set of common methodologies
   for benchmarking the performance of failover mechanisms in a lab
   environment.

10. IANA Considerations

   This document requires no IANA considerations.

11. References

11.1. Normative References

   [TERM-ID]      Poretsky, S., Papneja, R., Karthik, J., and S.
                  Vapiwala, "Benchmarking Terminology for Protection
                  Performance", draft-ietf-bmwg-protection-term-06
                  (work in progress).

   [MPLS-FRR-EXT] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
                  Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
                  May 2005.

   [IGP-METH]     Poretsky, S. and B. Imhoff, "Benchmarking
                  Methodology for IGP Data Plane Route Convergence",
                  draft-ietf-bmwg-igp-dataplane-conv-meth-17 (work in
                  progress).

   [Br91]         Bradner, S., Ed., "Benchmarking Terminology for
                  Network Interconnection Devices", RFC 1242, July
                  1991.

   [Br97]         Bradner, S., "Key words for use in RFCs to Indicate
                  Requirement Levels", BCP 14, RFC 2119, March 1997.

   [Ma98]         Mandeville, R., "Benchmarking Terminology for LAN
                  Switching Devices", RFC 2285, February 1998.

   [Po06]         Poretsky, S., et al., "Terminology for Benchmarking
                  Network-layer Traffic Control Mechanisms", RFC
                  4689, October 2006.

11.2. Informative References

   None.

12. Acknowledgments

   We would like to thank Jean-Philippe Vasseur for his invaluable
   input to the document, and Curtis Villamizar for his contribution
   in suggesting text on the definition of, and the need for
   benchmarking, correlated failures.  Additionally, we would like to
   thank Al Morton, Arun Gandhi, Amrit Hanspal, Karu Ratnam, Raveesh
   Janardan, Andrey Kiselev, and Mohan Nanduri for their formal
   reviews of this document.
Authors' Addresses

   Rajiv Papneja
   Isocore
   12359 Sunrise Valley Drive, STE 100
   Reston, VA 20190
   USA
   Phone: +1 703 860 9273
   Email: rpapneja@isocore.com

   Samir Vapiwala
   Cisco Systems
   300 Beaver Brook Road
   Boxborough, MA 01719
   USA
   Phone: +1 978 936 1484
   Email: svapiwal@cisco.com

   Jay Karthik
   Cisco Systems
   300 Beaver Brook Road
   Boxborough, MA 01719
   USA
   Phone: +1 978 936 0533
   Email: jkarthik@cisco.com

   Scott Poretsky
   Allot Communications
   USA
   Phone: +1 508 309 2179
   Email: sporetsky@allot.com

   Shankar Rao
   Qwest Communications
   950 17th Street, Suite 1900
   Denver, CO 80210
   USA
   Phone: +1 303 437 6643
   Email: shankar.rao@qwest.com

   Jean-Louis Le Roux
   France Telecom
   2 av Pierre Marzin
   22300 Lannion
   France
   Phone: +33 2 96 05 30 20
   Email: jeanlouis.leroux@orange-ft.com

Appendix A: Fast Reroute Scalability Table

   This section provides the recommended numbers for evaluating the
   scalability of fast reroute implementations.  It also recommends
   the typical numbers for IGP/VPNv4 prefixes, LSP tunnels, and VC
   entries.  Based on the features supported by the device under
   test (DUT), appropriate scaling limits can be used for the
   testbed.

   A.1. FRR IGP Table

      No. of Headend TE Tunnels   IGP Prefixes

      1                           100
      1                           500
      1                           1000
      1                           2000
      1                           5000
      2 (Load Balance)            100
      2 (Load Balance)            500
      2 (Load Balance)            1000
      2 (Load Balance)            2000
      2 (Load Balance)            5000
      100                         100
      500                         500
      1000                        1000
      2000                        2000

   A.2. FRR VPN Table

      No. of Headend TE Tunnels   VPNv4 Prefixes

      1                           100
      1                           500
      1                           1000
      1                           2000
      1                           5000
      1                           10000
      1                           20000
      1                           Max
      2 (Load Balance)            100
      2 (Load Balance)            500
      2 (Load Balance)            1000
      2 (Load Balance)            2000
      2 (Load Balance)            5000
      2 (Load Balance)            10000
      2 (Load Balance)            20000
      2 (Load Balance)            Max

   A.3. FRR Mid-Point LSP Table

      The number of mid-point TE LSPs could be configured at the
      recommended levels: 100, 500, 1000, 2000, or the maximum
      supported number.

   A.4. FRR VC Table

      No. of Headend TE Tunnels   VC Entries

      1                           100
      1                           500
      1                           1000
      1                           2000
      1                           Max
      100                         100
      500                         500
      1000                        1000
      2000                        2000

Appendix B: Abbreviations

   BFD    - Bidirectional Forwarding Detection
   BGP    - Border Gateway Protocol
   CE     - Customer Edge
   DUT    - Device Under Test
   FRR    - Fast Reroute
   IGP    - Interior Gateway Protocol
   IP     - Internet Protocol
   LSP    - Label Switched Path
   MP     - Merge Point
   MPLS   - Multi-Protocol Label Switching
   N-Nhop - Next-Next Hop
   Nhop   - Next Hop
   OIR    - Online Insertion and Removal
   P      - Provider
   PE     - Provider Edge
   PHP    - Penultimate Hop Popping
   PLR    - Point of Local Repair
   RSVP   - Resource reSerVation Protocol
   SRLG   - Shared Risk Link Group
   TA     - Traffic Analyzer
   TE     - Traffic Engineering
   TG     - Traffic Generator
   VC     - Virtual Circuit
   VPN    - Virtual Private Network