Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires December 23, 2017                                        J. Rapp
June 21, 2017                                                     VMware

                  Data Center Benchmarking Methodology
                 draft-ietf-bmwg-dcbench-methodology-18

Abstract

   The purpose of this informational document is to establish test and
   evaluation methodology and measurement techniques for physical
   network equipment in the data center. A prerequisite to this
   publication is the terminology document [draft-ietf-bmwg-dcbench-
   terminology]. Many of these terms and methods may be applicable
   beyond this publication's scope, as the technologies originally
   applied in the data center are deployed elsewhere.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
      1.2. Methodology format and repeatability recommendation
   2. Line Rate Testing
      2.1. Objective
      2.2. Methodology
      2.3. Reporting Format
   3. Buffering Testing
      3.1. Objective
      3.2. Methodology
      3.3. Reporting Format
   4. Microburst Testing
      4.1. Objective
      4.2. Methodology
      4.3. Reporting Format
   5. Head of Line Blocking
      5.1. Objective
      5.2. Methodology
      5.3. Reporting Format
   6. Incast Stateful and Stateless Traffic
      6.1. Objective
      6.2. Methodology
      6.3. Reporting Format
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
      9.3. Acknowledgements
   Authors' Addresses
1. Introduction

   Traffic patterns in the data center are not uniform and are
   constantly changing. They are dictated by the nature and variety of
   applications utilized in the data center. Traffic can be largely
   east-west (server to server inside the data center) in one data
   center and north-south (from outside of the data center to the
   server) in another, while other data centers may combine both.
   Traffic patterns can be bursty in nature and contain many-to-one,
   many-to-many, or one-to-many flows. Each flow may also be small and
   latency sensitive or large and throughput sensitive while containing
   a mix of UDP and TCP traffic. All of these can coexist in a single
   cluster and flow through a single network device simultaneously.
   Benchmarking of network devices has long used [RFC1242], [RFC2432],
   [RFC2544], [RFC2889] and [RFC3918], which have largely been focused
   on various latency attributes and Throughput [RFC2889] of the Device
   Under Test (DUT) being benchmarked. These standards are good at
   measuring theoretical Throughput, forwarding rates and latency under
   testing conditions; however, they do not represent real traffic
   patterns that may affect these networking devices.
   Currently, typical data center networking devices are characterized
   by:

   -High port density (48 ports or more)

   -High speed (up to 100 Gb/s currently per port)

   -High throughput (line rate on all ports for Layer 2 and/or Layer 3)

   -Low latency (in the microsecond or nanosecond range)

   -Low amount of buffer (in the MB range per networking device)

   -Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

   This document provides a methodology for benchmarking Data Center
   physical network equipment DUTs, including congestion scenarios,
   switch buffer analysis, microburst and head of line blocking, while
   also using a wide mix of traffic conditions. The terminology
   document [draft-ietf-bmwg-dcbench-terminology] is a prerequisite.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Methodology format and repeatability recommendation

   The format used for each section of this document is the following:

   -Objective

   -Methodology

   -Reporting Format

   For each test methodology described, it is critical to obtain
   repeatable results. The recommendation is to perform enough
   iterations of the given test to make sure the results are
   consistent. This is especially important for section 3, as the
   buffering testing has historically been the least reliable. The
   number of iterations SHOULD be explicitly reported. The relative
   standard deviation SHOULD be below 10%.

2. Line Rate Testing

2.1. Objective

   Provide a maximum-rate test for the performance values for
   Throughput, latency and jitter. It is meant to provide the tests to
   perform and the methodology to verify that a DUT is capable of
   forwarding packets at line rate under non-congested conditions.

2.2. Methodology

   A traffic generator SHOULD be connected to all ports on the DUT. Two
   tests MUST be conducted: a port-pair test ([RFC2544]/[RFC3918]
   section 15 compliant) and a full-mesh test ([RFC2889]/[RFC3918]
   section 16 compliant).

   For all tests, the sending rate of the test traffic generator MUST
   be less than or equal to 99.98% of the nominal value of Line Rate
   (with no further PPM adjustment to account for interface clock
   tolerances), to ensure stressing of the DUT in reasonable worst-case
   conditions (see [draft-ietf-bmwg-dcbench-terminology] section 5 for
   more details -- note to RFC Editor: please replace all [draft-ietf-
   bmwg-dcbench-terminology] references in this document with the
   future RFC number of that draft). Test results at a lower rate MAY
   be provided for a better understanding of the performance increase
   in terms of latency and jitter when the rate is lower than 99.98%.
   The receiving rate of the traffic SHOULD be captured during this
   test as a % of line rate.

   The test MUST provide the statistics of minimum, average and maximum
   of the latency distribution, for the exact same iteration of the
   test.

   The test MUST provide the statistics of minimum, average and maximum
   of the jitter distribution, for the exact same iteration of the
   test.

   Alternatively, when a traffic generator cannot be connected to all
   ports on the DUT, a snake test MUST be used for line rate testing,
   excluding latency and jitter, as those then become irrelevant. The
   snake test consists of the following steps:

   -connect the first and last port of the DUT to a traffic generator

   -connect all the ports in between back to back sequentially: port 2
   to port 3, port 4 to port 5, and so on up to port n-2 to port n-1,
   where n is the total number of ports of the DUT

   -configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the
   same VLAN Y, etc., and ports n-1 and n in the same VLAN Z.

   This snake test provides the capability to test line rate for Layer
   2 and Layer 3 [RFC2544]/[RFC3918] in instances where a traffic
   generator with only two ports is available. Latency and jitter are
   not to be considered with this test.
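   As an informal illustration (not part of the formal methodology),
   the following Python sketch derives the back-to-back cabling plan
   and the VLAN pairing of the snake test from the DUT port count n;
   the VLAN identifiers are arbitrary example values.

      # Sketch: derive the snake-test cabling and VLAN plan for an
      # n-port DUT (n assumed even). Ports 1 and n attach to the
      # traffic generator; the ports in between are cabled back to
      # back, and each consecutive port pair shares a VLAN.
      def snake_plan(n):
          # back-to-back cables: port 2 to 3, 4 to 5, ..., n-2 to n-1
          cabling = [(p, p + 1) for p in range(2, n - 1, 2)]
          # VLANs: ports 1 and 2, ports 3 and 4, ..., n-1 and n
          vlans = {}
          vlan_id = 100  # arbitrary starting VLAN ID for the example
          for p in range(1, n, 2):
              vlans[vlan_id] = (p, p + 1)
              vlan_id += 1
          return cabling, vlans

      cabling, vlans = snake_plan(8)
      print(cabling)  # [(2, 3), (4, 5), (6, 7)]
      print(vlans)    # {100: (1, 2), 101: (3, 4), ...}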
2.3. Reporting Format

   The report MUST include:

   -physical layer calibration information, as defined in [draft-ietf-
   bmwg-dcbench-terminology] section 4.

   -number of ports used

   -reading for "Throughput received in percentage of bandwidth", while
   sending 99.98% of the nominal value of Line Rate on each port, for
   each packet size from 64 bytes to 9216 bytes. As guidance, an
   increment of 64 bytes in packet size between each iteration is
   ideal; increments of 256 bytes and 512 bytes are also often used.
   The most common packet sizes, in order, for the report are: 64 B,
   128 B, 256 B, 512 B, 1024 B, 1518 B, 4096 B, 8000 B and 9216 B.

   The pattern for testing can be expressed using [RFC6985].

   -Throughput needs to be expressed as a % of total transmitted frames

   -Packet drops MUST be expressed as a count of packets and SHOULD be
   expressed as a % of line rate

   -For latency and jitter, values expressed in units of time (usually
   microseconds or nanoseconds), read across packet sizes from 64 bytes
   to 9216 bytes

   -For latency and jitter, provide minimum, average and maximum
   values. If different iterations are done to gather the minimum,
   average and maximum, it SHOULD be specified in the report, along
   with a justification of why the information could not have been
   gathered in the same test iteration

   -For jitter, a histogram describing the population of packets
   measured per latency or latency buckets is RECOMMENDED

   -The tests for Throughput, latency and jitter MAY be conducted as
   individual independent trials, with proper documentation in the
   report, but SHOULD be conducted at the same time

   -The methodology assumes that the DUT has at least nine ports, as
   certain methodologies require that number of ports or more

3. Buffering Testing

3.1. Objective

   To measure the size of the buffer of a DUT under multiple
   conditions. Buffer architectures between multiple DUTs can differ
   and include egress buffering, shared egress buffering SoC (Switch-
   on-Chip), ingress buffering or a combination. The test methodology
   covers the buffer measurement regardless of the buffer architecture
   used in the DUT.

3.2. Methodology

   A traffic generator MUST be connected to all ports on the DUT.

   The methodology for measuring buffering for a data-center switch is
   based on using known congestion of a known, fixed packet size, along
   with maximum latency value measurements. The maximum latency will
   increase until the first packet drop occurs. At this point, the
   maximum latency value will remain constant. This is the point of
   inflection, where the maximum latency stops increasing and becomes
   constant. There MUST be multiple ingress ports receiving a known
   amount of frames at a known fixed size, destined for the same egress
   port, in order to create a known congestion condition. The total
   amount of packets sent from the oversubscribed port, minus one,
   multiplied by the packet size represents the maximum port buffer
   size at the measured inflection point.
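   As a sketch of the computation just described (illustrative only),
   the Python fragment below locates the latency inflection point in a
   series of trials and derives the buffer size from it; the trial
   data and helper name are assumptions made for the example.

      # Sketch: estimate the port buffer size from the latency
      # inflection point. 'trials' stands in for data collected from
      # the traffic generator: the number of frames sent from the
      # oversubscribing port and the maximum latency observed for
      # each trial, sorted by frame count.
      def buffer_size_bytes(trials, packet_size):
          for prev, cur in zip(trials, trials[1:]):
              if cur[1] <= prev[1]:  # max latency stopped increasing
                  # inflection reached: drops have started
                  return (prev[0] - 1) * packet_size
          return None  # inflection not reached; send larger bursts

      trials = [(1000, 10.0), (2000, 20.0), (3000, 30.0), (4000, 30.0)]
      print(buffer_size_bytes(trials, 64))  # (3000 - 1) * 64 = 191936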
   1) Measure the highest buffer efficiency

   The tests described in this section have iterations called "first
   iteration", "second iteration" and "last iteration". The idea is to
   show the first two iterations so the reader understands the logic of
   how to keep incrementing the iterations. The last iteration shows
   the end state of the variables.

   First iteration: ingress port 1 sends line rate to egress port 2,
   while port 3 sends a known low amount of oversubscription traffic
   (1% recommended) with a packet size of 64 bytes to egress port 2.
   Measure the buffer size value: the number of frames sent from the
   port sending the oversubscribed traffic up to the inflection point,
   multiplied by the frame size.

   Second iteration: ingress port 1 sends line rate to egress port 2,
   while port 3 sends a known low amount of oversubscription traffic
   (1% recommended) with a packet size of 65 bytes to egress port 2.
   Measure the buffer size value: the number of frames sent from the
   port sending the oversubscribed traffic up to the inflection point,
   multiplied by the frame size.

   Last iteration: ingress port 1 sends line rate to egress port 2,
   while port 3 sends a known low amount of oversubscription traffic
   (1% recommended) with a packet size of B bytes to egress port 2.
   Measure the buffer size value: the number of frames sent from the
   port sending the oversubscribed traffic up to the inflection point,
   multiplied by the frame size.

   The packet size B that is found to provide the largest buffer size
   is the size that allows the highest buffer efficiency.
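   A minimal sketch of the packet-size sweep in procedure 1), assuming
   a hypothetical measurement helper that returns the buffer size
   observed at a given oversubscription packet size:

      # Sketch of procedure 1): sweep the oversubscription packet
      # size from 64 bytes upward, in 1-byte steps as in the
      # iterations above, and keep the size B exposing the largest
      # buffer. 'measure_buffer_at' is a placeholder for the
      # inflection-point measurement described in this section.
      def find_most_efficient_size(measure_buffer_at, max_size=9216):
          best_size, best_buffer = None, -1
          for size in range(64, max_size + 1):  # 64, 65, ..., B, ...
              buf = measure_buffer_at(size)  # buffer in bytes at size
              if buf > best_buffer:
                  best_size, best_buffer = size, buf
          return best_size, best_buffer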
   2) Measure maximum port buffer size

   The tests described in this section have iterations called "first
   iteration", "second iteration" and "last iteration". The idea is to
   show the first two iterations so the reader understands the logic of
   how to keep incrementing the iterations. The last iteration shows
   the end state of the variables.

   At the fixed packet size B determined in procedure 1), for a fixed
   default Differentiated Services Code Point (DSCP)/Class of Service
   (COS) value of 0 and for unicast traffic, proceed with the
   following:

   First iteration: ingress port 1 sends line rate to egress port 2,
   while port 3 sends a known low amount of oversubscription traffic
   (1% recommended) with the same packet size to egress port 2. Measure
   the buffer size value by multiplying the number of extra frames sent
   by the frame size.

   Second iteration: ingress port 2 sends line rate to egress port 3,
   while port 4 sends a known low amount of oversubscription traffic
   (1% recommended) with the same packet size to egress port 3. Measure
   the buffer size value by multiplying the number of extra frames sent
   by the frame size.

   Last iteration: ingress port N-2 sends line rate traffic to egress
   port N-1, while port N sends a known low amount of oversubscription
   traffic (1% recommended) with the same packet size to egress port
   N-1. Measure the buffer size value by multiplying the number of
   extra frames sent by the frame size.

   This test series MAY be repeated using all different DSCP/COS values
   of traffic, and then using Multicast type of traffic, in order to
   find out if there is any DSCP/COS impact on the buffer size.

   3) Measure maximum port pair buffer sizes

   The tests described in this section have iterations called "first
   iteration", "second iteration" and "last iteration". The idea is to
   show the first two iterations so the reader understands the logic of
   how to keep incrementing the iterations. The last iteration shows
   the end state of the variables.

   First iteration: ingress port 1 sends line rate to egress port 2;
   ingress port 3 sends line rate to egress port 4, and so on. Ingress
   ports N-1 and N oversubscribe, at 1% of line rate, egress port 2 and
   port 3 respectively. Measure the buffer size value by multiplying
   the number of extra frames sent by the frame size for each egress
   port.

   Second iteration: ingress port 1 sends line rate to egress port 2;
   ingress port 3 sends line rate to egress port 4, and so on. Ingress
   ports N-1 and N oversubscribe, at 1% of line rate, egress port 4 and
   port 5 respectively. Measure the buffer size value by multiplying
   the number of extra frames sent by the frame size for each egress
   port.

   Last iteration: ingress port 1 sends line rate to egress port 2;
   ingress port 3 sends line rate to egress port 4, and so on. Ingress
   ports N-1 and N oversubscribe, at 1% of line rate, egress port N-3
   and port N-2 respectively. Measure the buffer size value by
   multiplying the number of extra frames sent by the frame size for
   each egress port.

   This test series MAY be repeated using all different DSCP/COS values
   of traffic, and then using Multicast type of traffic.

   4) Measure maximum DUT buffer size with many-to-one ports

   The tests described in this section have iterations called "first
   iteration", "second iteration" and "last iteration". The idea is to
   show the first two iterations so the reader understands the logic of
   how to keep incrementing the iterations. The last iteration shows
   the end state of the variables.

   First iteration: ingress ports 1,2,... N-1 each send
   [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port
   N.

   Second iteration: ingress ports 2,... N each send
   [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port
   1.

   Last iteration: ingress ports N,1,2...N-2 each send
   [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port
   N-1.

   This test series MAY be repeated using all different COS values of
   traffic, and then using Multicast type of traffic.
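   The per-port rate formula of procedure 4) can be illustrated with a
   short Python sketch; across the N-1 ingress ports it sums to 99.98%
   of line rate plus 1% of oversubscription toward the egress port.
   The 48-port figure below is an example only.

      # Sketch: per-port send rate for procedure 4). Each of the N-1
      # ingress ports sends [(1/[N-1])*99.98]+[1/[N-1]] percent of
      # line rate, so the egress port receives 99.98% of line rate
      # plus 1% of oversubscription in aggregate.
      def many_to_one_rate(n_ports):
          n = n_ports - 1  # number of ingress ports
          return (1.0 / n) * 99.98 + (1.0 / n)  # % of line rate

      # 48-port DUT: 47 ingress ports at ~2.149% of line rate each,
      # an aggregate of 100.98% toward the single egress port.
      print(many_to_one_rate(48))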
   Unicast traffic, and then Multicast traffic, SHOULD be used in order
   to determine the proportion of buffer for the documented selection
   of tests. Also, the COS value of the packets SHOULD be provided for
   each test iteration, as the buffer allocation size MAY differ per
   COS value. It is RECOMMENDED that the ingress and egress ports are
   varied in a random, but documented, fashion in multiple tests to
   measure the buffer size for each port of the DUT.

3.3. Reporting Format

   The report MUST include:

   - The packet size used for the most efficient buffer used, along
     with the DSCP/COS value

   - The maximum port buffer size for each port

   - The maximum DUT buffer size

   - The packet size used in the test

   - The amount of oversubscription, if different than 1%

   - The number of ingress and egress ports, along with their location
     on the DUT

   - The repeatability of the test needs to be indicated: the number of
     iterations of the same test and the percentage of variation
     between results for each of the tests (min, max, avg)

   The percentage of variation is a metric providing a sense of how
   large the difference is between the measured value and the previous
   ones.

   For example, for a latency test where the minimum latency is
   measured, the percentage of variation of the minimum latency will
   indicate by how much this value has varied between the current test
   executed and the previous one.

   PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the
   current test and x1 is the minimum latency value obtained in the
   previous test.

   The same formula is used for the max and avg variations measured.
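   For illustration, a minimal Python sketch of the PV formula applied
   to a series of minimum latency measurements; the latency values are
   examples only.

      # Sketch: percentage of variation (PV) between successive test
      # iterations, per the formula above.
      def percentage_of_variation(x1, x2):
          return ((x2 - x1) / x1) * 100.0

      min_latency = [10.0, 10.2, 9.9]  # us, one value per iteration
      for x1, x2 in zip(min_latency, min_latency[1:]):
          print(round(percentage_of_variation(x1, x2), 2))
      # prints 2.0, then -2.94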
4. Microburst Testing

4.1. Objective

   To find the maximum amount of packet bursts that a DUT can sustain
   under various configurations.

   This test provides additional methodology to the other RFC tests:

   -All bursts should be sent with 100% intensity. Note: intensity is
   defined in [draft-ietf-bmwg-dcbench-terminology] section 6.1.1

   -All ports of the DUT must be used for this test

   -All ports are recommended to be tested simultaneously

4.2. Methodology

   A traffic generator MUST be connected to all ports on the DUT. In
   order to cause congestion, two or more ingress ports MUST send
   bursts of packets destined for the same egress port. The simplest of
   the setups would be two ingress ports and one egress port (2-to-1).

   The burst MUST be sent with an intensity of 100% (intensity is
   defined in [draft-ietf-bmwg-dcbench-terminology] section 6.1.1),
   meaning the burst of packets will be sent with a minimum inter-
   packet gap. The amount of packets contained in the burst will be the
   trial variable and will increase until there is a non-zero packet
   loss measured. The aggregate amount of packets from all the senders
   will be used to calculate the maximum amount of microburst the DUT
   can sustain.

   It is RECOMMENDED that the ingress and egress ports are varied in
   multiple tests to measure the maximum microburst capacity.

   The intensity of a microburst MAY be varied in order to obtain the
   microburst capacity at various ingress rates. Intensity of a
   microburst is defined in [draft-ietf-bmwg-dcbench-terminology].

   It is RECOMMENDED that all ports on the DUT be tested
   simultaneously, and in various configurations, in order to
   understand all the combinations of ingress ports, egress ports and
   intensities.

   An example would be:

   First Iteration: N-1 Ingress ports sending to 1 Egress Port

   Second Iteration: N-2 Ingress ports sending to 2 Egress Ports

   Last Iteration: 2 Ingress ports sending to N-2 Egress Ports
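   The iteration pattern of this example can be expressed as a short
   Python sketch enumerating the ingress/egress splits for an N-port
   DUT (illustrative only):

      # Sketch: enumerate the ingress/egress splits of the example
      # above, from N-1 ingress ports and 1 egress port down to 2
      # ingress ports and N-2 egress ports.
      def microburst_iterations(n_ports):
          for egress in range(1, n_ports - 1):  # 1, 2, ..., N-2
              ingress = n_ports - egress  # the remaining ports send
              if ingress < 2:
                  break  # congestion needs at least two senders
              yield ingress, egress

      for ingress, egress in microburst_iterations(8):
          print(ingress, "ingress ->", egress, "egress")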
4.3. Reporting Format

   The report MUST include:

   - The maximum number of packets received per ingress port with the
     maximum burst size obtained with zero packet loss

   - The packet size used in the test

   - The number of ingress and egress ports, along with their location
     on the DUT

   - The repeatability of the test needs to be indicated: the number of
     iterations of the same test and the percentage of variation
     between results (min, max, avg)

5. Head of Line Blocking

5.1. Objective

   Head-of-line blocking (HOLB) is a performance-limiting phenomenon
   that occurs when packets are held up by the first packet ahead of
   them waiting to be transmitted to a different output port. This is
   defined in RFC 2889 section 5.5, Congestion Control. This section
   expands on RFC 2889 in the context of Data Center Benchmarking.

   The objective of this test is to understand the DUT behavior under a
   head of line blocking scenario and to measure the packet loss.

   Here are the differences between this HOLB test and RFC 2889:

   -This HOLB test starts with 8 ports in two groups of 4, instead of
   the 4 ports of RFC 2889

   -This HOLB test shifts all the port numbers by one in a second
   iteration of the test; this is new compared to RFC 2889. The
   shifting of port numbers continues until all ports have been the
   first in a group. The purpose is to make sure all permutations are
   tested, in order to cover differences of behavior in the SoC of the
   DUT

   -Another test in this HOLB expands the group of ports, such that
   traffic is divided among 4 ports instead of two (25% instead of 50%
   per port)

   -Section 5.3 adds additional reporting requirements beyond
   Congestion Control in RFC 2889

5.2. Methodology

   In order to cause congestion in the form of head of line blocking,
   groups of four ports are used. A group has 2 ingress and 2 egress
   ports. The first ingress port MUST have two flows configured, each
   going to a different egress port. The second ingress port will
   congest the second egress port by sending line rate. The goal is to
   measure whether there is loss on the flow to the first egress port,
   which is not oversubscribed.

   A traffic generator MUST be connected to at least eight ports on the
   DUT and SHOULD be connected using all the DUT ports.

   1) Measure two groups with eight DUT ports

   The tests described in this section have iterations called "first
   iteration", "second iteration" and "last iteration". The idea is to
   show the first two iterations so the reader understands the logic of
   how to keep incrementing the iterations. The last iteration shows
   the end state of the variables.

   First iteration: measure the packet loss for two groups with
   consecutive ports.

   The first group is composed of: ingress port 1 sending 50% of
   traffic to egress port 3 and ingress port 1 sending 50% of traffic
   to egress port 4. Ingress port 2 is sending line rate to egress port
   4. Measure the amount of traffic loss for the traffic from ingress
   port 1 to egress port 3.

   The second group is composed of: ingress port 5 sending 50% of
   traffic to egress port 7 and ingress port 5 sending 50% of traffic
   to egress port 8. Ingress port 6 is sending line rate to egress port
   8. Measure the amount of traffic loss for the traffic from ingress
   port 5 to egress port 7.

   Second iteration: repeat the first iteration by shifting all the
   ports from N to N+1.

   The first group is composed of: ingress port 2 sending 50% of
   traffic to egress port 4 and ingress port 2 sending 50% of traffic
   to egress port 5. Ingress port 3 is sending line rate to egress port
   5. Measure the amount of traffic loss for the traffic from ingress
   port 2 to egress port 4.

   The second group is composed of: ingress port 6 sending 50% of
   traffic to egress port 8 and ingress port 6 sending 50% of traffic
   to egress port 9. Ingress port 7 is sending line rate to egress port
   9. Measure the amount of traffic loss for the traffic from ingress
   port 6 to egress port 8.

   Last iteration: when the first port of the first group is connected
   to the last DUT port and the last port of the second group is
   connected to the seventh port of the DUT.

   Measure the amount of traffic loss for the traffic from ingress port
   N to egress port 2 and from ingress port 4 to egress port 6.

   2) Measure with N/4 groups with N DUT ports

   The tests described in this section have iterations called "first
   iteration", "second iteration" and "last iteration". The idea is to
   show the first two iterations so the reader understands the logic of
   how to keep incrementing the iterations. The last iteration shows
   the end state of the variables.

   The traffic from each ingress port is split across 4 egress ports
   instead of two (100/4 = 25% per egress port).

   First iteration: Expand to fully utilize all the DUT ports in
   increments of four. Repeat the methodology of 1) with all the groups
   of ports possible to achieve on the device and measure, for each
   port group, the amount of traffic loss.

   Second iteration: Shift the start of each consecutive group of ports
   by +1.

   Last iteration: Shift the start of each consecutive group of ports
   by N-1 and measure the traffic loss for each port group.
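   As an informal illustration, the Python sketch below builds the
   groups of four ports and applies the per-iteration shift of one
   port; the wrap-around of port numbers is an assumption drawn from
   the last iteration described in 1).

      # Sketch: build the HOLB groups of four ports for an N-port DUT
      # and shift them by one port per iteration; port numbers wrap
      # around. In each group, the first two ports are ingress and
      # the last two are egress.
      def holb_groups(n_ports, shift):
          ports = [((p + shift - 1) % n_ports) + 1
                   for p in range(1, n_ports + 1)]
          return [ports[i:i + 4] for i in range(0, n_ports - 3, 4)]

      for shift in range(3):  # 3 of the N iterations, for brevity
          print(shift, holb_groups(8, shift))
      # shift 0: [[1, 2, 3, 4], [5, 6, 7, 8]]
      # shift 1: [[2, 3, 4, 5], [6, 7, 8, 1]], and so on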
5.3. Reporting Format

   For each test, the report MUST include:

   - The port configuration, including the number and location of the
     ingress and egress ports located on the DUT

   - Whether HOLB was observed, in accordance with the HOLB test in
     section 5

   - The percent of traffic loss

   - The repeatability of the test needs to be indicated: the number of
     iterations of the same test and the percentage of variation
     between results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1. Objective

   The objective of this test is to measure the values for TCP Goodput
   [1] and latency with a mix of large and small flows. The test is
   designed to simulate a mixed environment of stateful flows that
   require high rates of goodput and stateless flows that require low
   latency. Stateful flows are created by generating TCP traffic, and
   stateless flows are created using UDP type of traffic.

6.2. Methodology

   In order to simulate the effects of stateless and stateful traffic
   on the DUT, there MUST be multiple ingress ports receiving traffic
   destined for the same egress port. There also MAY be a mix of
   stateful and stateless traffic arriving on a single ingress port.
   The simplest setup would be 2 ingress ports receiving traffic
   destined to the same egress port.

   One ingress port MUST maintain a TCP connection through the ingress
   port to a receiver connected to an egress port. Traffic in the TCP
   stream MUST be sent at the maximum rate allowed by the traffic
   generator. At the same time as the TCP traffic is flowing through
   the DUT, the stateless traffic is sent destined to a receiver on the
   same egress port. The stateless traffic MUST be a microburst of 100%
   intensity.

   It is RECOMMENDED that the ingress and egress ports are varied in
   multiple tests to measure the maximum microburst capacity.

   The intensity of a microburst MAY be varied in order to obtain the
   microburst capacity at various ingress rates.

   It is RECOMMENDED that all ports on the DUT be used in the test.

   The tests described below have iterations called "first iteration",
   "second iteration" and "last iteration". The idea is to show the
   first two iterations so the reader understands the logic of how to
   keep incrementing the iterations. The last iteration shows the end
   state of the variables.

   For example:

   Stateful Traffic port variation (TCP traffic):

   TCP traffic needs to be generated in this section. During the
   iterations, the number of Egress ports MAY vary as well.

   First Iteration: 1 Ingress port receiving stateful TCP traffic and 1
   Ingress port receiving stateless traffic destined to 1 Egress Port

   Second Iteration: 2 Ingress ports receiving stateful TCP traffic and
   1 Ingress port receiving stateless traffic destined to 1 Egress Port

   Last Iteration: N-2 Ingress ports receiving stateful TCP traffic and
   1 Ingress port receiving stateless traffic destined to 1 Egress Port

   Stateless Traffic port variation (UDP traffic):

   UDP traffic needs to be generated for this test. During the
   iterations, the number of Egress ports MAY vary as well.

   First Iteration: 1 Ingress port receiving stateful TCP traffic and 1
   Ingress port receiving stateless traffic destined to 1 Egress Port

   Second Iteration: 1 Ingress port receiving stateful TCP traffic and
   2 Ingress ports receiving stateless traffic destined to 1 Egress
   Port

   Last Iteration: 1 Ingress port receiving stateful TCP traffic and
   N-2 Ingress ports receiving stateless traffic destined to 1 Egress
   Port
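   The two variation series above can be summarized with a short
   Python sketch enumerating the port assignments per iteration
   (illustrative only; a single egress port is assumed throughout):

      # Sketch: enumerate the port assignments of the two variation
      # series above for an N-port DUT.
      def incast_iterations(n_ports, vary="stateful"):
          for k in range(1, n_ports - 1):  # 1, 2, ..., N-2
              if vary == "stateful":
                  yield {"tcp_in": k, "udp_in": 1, "egress": 1}
              else:
                  yield {"tcp_in": 1, "udp_in": k, "egress": 1}

      for step in incast_iterations(8, vary="stateless"):
          print(step)  # udp_in grows from 1 up to N-2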
6.3. Reporting Format

   The report MUST include the following:

   - The number of ingress and egress ports, along with the designation
     of stateful or stateless flow assignment

   - The stateful flow goodput

   - The stateless flow latency

   - The repeatability of the test needs to be indicated: the number of
     iterations of the same test and the percentage of variation
     between results (min, max, avg)

7. Security Considerations

   Benchmarking activities as described in this memo are limited to
   technology characterization using controlled stimuli in a laboratory
   environment, with dedicated address space and the constraints
   specified in the sections above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network or misroute traffic to the test
   management network.

   Further, benchmarking is performed on a "black-box" basis, relying
   solely on measurements observable external to the DUT.

   Special capabilities SHOULD NOT exist in the DUT specifically for
   benchmarking purposes. Any implications for network security arising
   from the DUT SHOULD be identical in the lab and in production
   networks.

8. IANA Considerations

   No IANA action is requested at this time.

9. References

9.1. Normative References

   [RFC1242]  Bradner, S., "Benchmarking Terminology for Network
              Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242,
              July 1991, <https://www.rfc-editor.org/info/rfc1242>.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, DOI
              10.17487/RFC2544, March 1999,
              <https://www.rfc-editor.org/info/rfc2544>.

9.2. Informative References

   [draft-ietf-bmwg-dcbench-terminology]
              Avramov, L. and J. Rapp, "Data Center Benchmarking
              Terminology", draft-ietf-bmwg-dcbench-terminology (work
              in progress), April 2017. [Note to RFC Editor: please
              replace this reference with the RFC number of that draft
              when it is published.]

   [RFC2889]  Mandeville, R. and J. Perser, "Benchmarking Methodology
              for LAN Switching Devices", RFC 2889, DOI
              10.17487/RFC2889, August 2000,
              <https://www.rfc-editor.org/info/rfc2889>.

   [RFC3918]  Stopp, D. and B. Hickman, "Methodology for IP Multicast
              Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October
              2004, <https://www.rfc-editor.org/info/rfc3918>.

   [RFC6985]  Morton, A., "IMIX Genome: Specification of Variable
              Packet Sizes for Additional Testing", RFC 6985, DOI
              10.17487/RFC6985, July 2013,
              <https://www.rfc-editor.org/info/rfc6985>.

   [1]        Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D.
              Joseph, "Understanding TCP Incast Throughput Collapse in
              Datacenter Networks",
              <http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, DOI
              10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2432]  Dubray, K., "Terminology for IP Multicast Benchmarking",
              RFC 2432, DOI 10.17487/RFC2432, October 1998,
              <https://www.rfc-editor.org/info/rfc2432>.

9.3. Acknowledgements

   The authors would like to thank Alfred Morton and Scott Bradner for
   their reviews and feedback.

Authors' Addresses

   Lucien Avramov
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   United States
   Phone: +1 408 774 9077
   Email: lucien.avramov@gmail.com

   Jacob Rapp
   VMware
   3401 Hillview Ave
   Palo Alto, CA
   United States
   Phone: +1 650 857 3367
   Email: jrapp@vmware.com