Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires: December 22, 2017                                       J. Rapp
June 20, 2017                                                     VMware

                   Data Center Benchmarking Methodology
                  draft-ietf-bmwg-dcbench-methodology-13

Abstract

The purpose of this informational document is to establish a test and evaluation methodology and measurement techniques for physical network equipment in the data center. Many of these terms and methods may be applicable beyond this publication's scope, as the technologies originally applied in the data center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology format and repeatability recommendation
2. Line Rate Testing
   2.1 Objective
   2.2 Methodology
   2.3 Reporting Format
3. Buffering Testing
   3.1 Objective
   3.2 Methodology
   3.3 Reporting format
4. Microburst Testing
   4.1 Objective
   4.2 Methodology
   4.3 Reporting Format
5. Head of Line Blocking
   5.1 Objective
   5.2 Methodology
   5.3 Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1 Objective
   6.2 Methodology
   6.3 Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
   9.3. Acknowledgements
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. Traffic can be largely east-west in one data center and north-south in another, while other data centers may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive, while containing a mix of UDP and TCP traffic. All of these can coexist in a single cluster and flow through a single network device simultaneously. Benchmarking of network devices has long relied on [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918], which have largely focused on various latency attributes and the Throughput [RFC2889] of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical Throughput, forwarding rates, and latency under testing conditions; however, they do not represent the real traffic patterns that may affect these networking devices.

This document provides a methodology for benchmarking data center physical network equipment (the DUT), covering congestion scenarios, switch buffer analysis, microbursts, and head-of-line blocking, while also using a wide mix of traffic conditions. The terminology document [1] is a prerequisite.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
1.2. Methodology format and repeatability recommendation

The format used for each test section of this document is the following:

- Objective

- Methodology

- Reporting Format

Additional interpretation of RFC 2119 terms:

MUST: required metric or benchmark for the scenario described (minimum)

SHOULD or RECOMMENDED: strongly suggested metric for the scenario described

MAY: optional metric for the scenario described

For each test methodology described, it is critical to obtain repeatable results. The recommendation is to perform enough iterations of the given test to make sure the results are consistent. This is especially important for Section 3, as buffer measurement has historically been the least reliable. The number of iterations SHOULD be explicitly reported. The relative standard deviation SHOULD be below 10%.

2. Line Rate Testing

2.1 Objective

Provide a maximum-rate test for the performance values of Throughput, latency, and jitter. It is meant to provide the tests to perform and the methodology to verify that a DUT is capable of forwarding packets at line rate under non-congested conditions.

2.2 Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two tests MUST be conducted: a port-pair test (compliant with RFC 2544/3918, Section 15) and a full-mesh test of the DUT (compliant with RFC 2889/3918, Section 16).

For all tests, the test traffic generator's sending rate MUST be less than or equal to 99.98% of the nominal value of the line rate (with no further PPM adjustment to account for interface clock tolerances), to ensure the DUT is stressed under reasonable worst-case conditions (see [1], Section 5, for more details; note to RFC Editor: please replace all [1] references in this document with the future RFC number of that draft). Test results at a lower rate MAY be provided for a better understanding of the performance increase, in terms of latency and jitter, when the rate is lower than 99.98%. The receiving rate of the traffic SHOULD be captured during this test as a percentage of line rate.

The test MUST provide the statistics of minimum, average, and maximum of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average, and maximum of the jitter distribution, for the exact same iteration of the test.

Alternatively, when a traffic generator cannot be connected to all ports on the DUT, a snake test MUST be used for line rate testing, excluding latency and jitter, as those measurements then become irrelevant. The snake test consists of the following steps:

- connect the first and last port of the DUT to a traffic generator

- connect all the ports in between back to back sequentially: port 2 to port 3, port 4 to port 5, and so on, up to port n-2 to port n-1, where n is the total number of ports of the DUT

- configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the same VLAN Y, and so on, with ports n-1 and n in the same VLAN Z

This snake test provides the capability to test line rate for Layer 2 and Layer 3 (RFC 2544/3918) in instances where only a two-port traffic generator is available. Latency and jitter are not to be considered with this test.
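As an illustration of the 99.98% rate target above, the following sketch (Python) computes the frames per second a generator would offer per port for a given frame size. It is not part of the methodology; it assumes standard Ethernet per-frame overhead of 20 bytes (preamble, start-of-frame delimiter, and minimum inter-frame gap), and the 10GE port speed in the example is purely illustrative.

   # Sketch: offered frame rate at 99.98% of nominal line rate.
   # Assumes standard Ethernet framing overhead; values are illustrative.

   ETH_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap

   def offered_fps(line_rate_bps, frame_size_bytes, fraction=0.9998):
       """Frames per second sent per port at 'fraction' of nominal line rate."""
       bits_per_frame = (frame_size_bytes + ETH_OVERHEAD_BYTES) * 8
       return (line_rate_bps * fraction) / bits_per_frame

   # Example: a 10GE port and the packet sizes listed in Section 2.3.
   for size in (64, 128, 256, 512, 1024, 1518, 4096, 8000, 9216):
       print(size, round(offered_fps(10e9, size)))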
2.3 Reporting Format

The report MUST include:

- physical-layer calibration information, as defined in [1], Section 4

- number of ports used

- reading for "Throughput received in percentage of bandwidth", while sending 99.98% of the nominal value of the line rate on each port, for each packet size from 64 bytes to 9216 bytes. As guidance, an increment of 64 bytes between each iteration is ideal; 256-byte and 512-byte increments are also often used. The most common packet sizes ordered for the report are: 64 B, 128 B, 256 B, 512 B, 1024 B, 1518 B, 4096 B, 8000 B, and 9216 B. The pattern for testing can be expressed using [RFC6985].

- Throughput needs to be expressed as a percentage of total transmitted frames

- packet drops MUST be expressed as a count of packets and SHOULD be expressed as a percentage of line rate

- for latency and jitter, values expressed in units of time (usually microseconds or nanoseconds), read across packet sizes from 64 bytes to 9216 bytes

- for latency and jitter, provide minimum, average, and maximum values. If different iterations are done to gather the minimum, average, and maximum, this SHOULD be specified in the report, along with a justification of why the information could not have been gathered in the same test iteration

- for jitter, a histogram describing the population of packets measured per latency or latency buckets is RECOMMENDED

- the tests for Throughput, latency, and jitter MAY be conducted as individual independent trials, with proper documentation in the report, but SHOULD be conducted at the same time

- the methodology assumes that the DUT has at least nine ports, as certain procedures require that many ports or more

3. Buffering Testing

3.1 Objective

To measure the size of the buffer of a DUT under various conditions. Buffer architectures between multiple DUTs can differ and include egress buffering, shared egress buffering SoC (Switch-on-Chip), ingress buffering, or a combination. The test methodology covers the buffer measurement regardless of the buffer architecture used in the DUT.

3.2 Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data-center switch is based on using known congestion of a known, fixed packet size, along with maximum latency value measurements. The maximum latency will increase until the first packet drop occurs. At this point, the maximum latency value will remain constant. This is the point of inflection of this maximum latency change to a constant value. There MUST be multiple ingress ports receiving a known amount of frames at a known fixed size, destined for the same egress port, in order to create a known congestion condition. The total amount of packets sent from the oversubscribed port, minus one, multiplied by the packet size, represents the maximum port buffer size at the measured inflection point.
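The inflection-point arithmetic above can be sketched as follows (Python). This is a minimal illustration, not part of the methodology: the trial data would come from the traffic generator, the latency series is assumed to be clean and monotone up to the plateau, and the variable names are hypothetical.

   # Sketch: estimate the maximum port buffer size from the point where
   # maximum latency stops increasing (i.e., the first drop has occurred).
   # 'trials' is assumed to be a list of (frames_sent, max_latency_us)
   # pairs from the oversubscribing port, ordered by increasing frames_sent.

   def port_buffer_bytes(trials, frame_size_bytes):
       prev_latency = None
       for frames_sent, max_latency in trials:
           # Inflection point: maximum latency no longer increases.
           if prev_latency is not None and max_latency <= prev_latency:
               return (frames_sent - 1) * frame_size_bytes
           prev_latency = max_latency
       return None  # no inflection observed; extend the trial range

   # Example with illustrative numbers: latency stops increasing at 40000
   # frames, so the estimated buffer is (40000 - 1) * 64 bytes.
   samples = [(10000, 50.0), (20000, 90.0), (30000, 130.0),
              (40000, 130.0), (50000, 130.0)]
   print(port_buffer_bytes(samples, 64))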
1) Measure the highest buffer efficiency

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with a packet size of 64 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Second iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with the same pattern but a packet size of 65 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Last iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with a packet size of B bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

The packet size B found to provide the largest measured buffer size is the size that allows the highest buffer efficiency.

2) Measure maximum port buffer size

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

At the fixed packet size B determined in procedure 1), with a fixed default Differentiated Services Code Point (DSCP)/Class of Service (COS) value of 0 and with unicast traffic, proceed with the following:

First iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 2. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Second iteration: ingress port 2 sends line rate to egress port 3, while port 4 sends a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 3. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Last iteration: ingress port N-2 sends line rate traffic to egress port N-1, while port N sends a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port N-1. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

This test series MAY be repeated using all the different DSCP/COS values of traffic, and then using multicast traffic, in order to determine whether the DSCP/COS value has any impact on the buffer size.

3) Measure maximum port pair buffer sizes

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sends line rate to egress port 2; ingress port 3 sends line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 2 and egress port 3, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.
Second iteration: ingress port 1 sends line rate to egress port 2; ingress port 3 sends line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 4 and egress port 5, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

Last iteration: ingress port 1 sends line rate to egress port 2; ingress port 3 sends line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port N-3 and egress port N-2, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

This test series MAY be repeated using all the different DSCP/COS values of traffic and then using multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress ports 1, 2, ..., N-1 each send [(1/[N-1])*99.98]+[1/[N-1]] % of line rate to egress port N (that is, an even share of 99.98% of line rate plus an extra 1/[N-1] % per port, so that the egress port is oversubscribed by approximately 1%).

Second iteration: ingress ports 2, ..., N each send [(1/[N-1])*99.98]+[1/[N-1]] % of line rate to egress port 1.

Last iteration: ingress ports N, 1, 2, ..., N-2 each send [(1/[N-1])*99.98]+[1/[N-1]] % of line rate to egress port N-1.

This test series MAY be repeated using all the different COS values of traffic and then using multicast traffic.

Unicast traffic and then multicast traffic SHOULD be used in order to determine the proportion of buffer used for the documented selection of tests. Also, the COS value for the packets SHOULD be provided for each test iteration, as the buffer allocation size MAY differ per COS value. It is RECOMMENDED that the ingress and egress ports are varied in a random, but documented, fashion in multiple tests to measure the buffer size for each port of the DUT.

3.3 Reporting format

The report MUST include:

- the packet size that makes the most efficient use of the buffer, along with the DSCP/COS value

- the maximum port buffer size for each port

- the maximum DUT buffer size

- the packet size used in the test

- the amount of oversubscription, if different than 1%

- the number of ingress and egress ports, along with their location on the DUT

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results for each of the tests (min, max, avg)

The percentage of variation is a metric providing a sense of how large the difference is between the measured value and the previous one.

For example, for a latency test where the minimum latency is measured, the percentage of variation of the minimum latency will indicate by how much this value has varied between the current test executed and the previous one.

PV = ((x2 - x1) / x1) * 100, where x2 is the minimum latency value in the current test and x1 is the minimum latency value obtained in the previous test.

The same formula is used for the maximum and average variations measured.
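As a minimal illustration of the formula above (Python; the latency values in the example are purely illustrative):

   def percentage_of_variation(x1, x2):
       """Percentage of variation (PV) between two iterations of the same
       measurement: PV = ((x2 - x1) / x1) * 100."""
       return ((x2 - x1) / x1) * 100.0

   # Example: a minimum latency of 1.00 us in the previous iteration and
   # 1.04 us in the current one gives a PV of about 4%.
   print(percentage_of_variation(1.00, 1.04))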
4. Microburst Testing

4.1 Objective

To find the maximum amount of packet bursts that a DUT can sustain under various configurations.

This test provides additional methodology that supplements the other RFC tests:

- all bursts should be sent with 100% intensity (intensity is defined in [1], Section 6.1.1)

- all ports of the DUT must be used for this test

- all ports are recommended to be tested simultaneously

4.2 Methodology

A traffic generator MUST be connected to all ports on the DUT. In order to cause congestion, two or more ingress ports MUST send bursts of packets destined for the same egress port. The simplest of the setups would be two ingress ports and one egress port (2-to-1).

The burst MUST be sent with an intensity of 100% (intensity is defined in [1], Section 6.1.1), meaning that the burst of packets will be sent with a minimum inter-packet gap. The number of packets contained in the burst is the trial variable and is increased until a non-zero packet loss is measured. The aggregate number of packets from all the senders is used to calculate the maximum microburst the DUT can sustain (a search procedure for this value is sketched at the end of Section 4).

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates. Intensity of a microburst is defined in [1].

It is RECOMMENDED that all ports on the DUT be tested simultaneously, and in various configurations, in order to understand all the combinations of ingress ports, egress ports, and intensities.

An example would be:

First Iteration: N-1 ingress ports sending to 1 egress port

Second Iteration: N-2 ingress ports sending to 2 egress ports

Last Iteration: 2 ingress ports sending to N-2 egress ports

4.3 Reporting Format

The report MUST include:

- the maximum number of packets received per ingress port with the maximum burst size obtained with zero packet loss

- the packet size used in the test

- the number of ingress and egress ports, along with their location on the DUT

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)
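The search for the maximum burst size described in Section 4.2 can be sketched as follows (Python). This is a minimal illustration under stated assumptions: send_burst() and lost_packets() are hypothetical hooks into the traffic generator, loss is assumed to be monotone in the burst size, and a binary search is used in place of the linear increase described above (it converges on the same value under that assumption).

   # Sketch: find the largest aggregate burst (in packets) forwarded with
   # zero loss. The two callbacks are hypothetical traffic-generator hooks.

   def max_microburst(send_burst, lost_packets, upper_bound):
       """send_burst(n): transmit an n-packet burst at 100% intensity from
       the configured ingress ports toward the congested egress port.
       lost_packets(): number of packets of that burst not received."""
       low, high = 0, upper_bound
       while low < high:
           trial = (low + high + 1) // 2
           send_burst(trial)
           if lost_packets() == 0:
               low = trial        # burst absorbed by the buffer; try larger
           else:
               high = trial - 1   # loss observed; try smaller
       return low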
5. Head of Line Blocking

5.1 Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon that occurs when packets are held up by the first packet ahead of them, which is waiting to be transmitted to a different output port. This is defined in RFC 2889, Section 5.5, "Congestion Control". This section expands on RFC 2889 in the context of Data Center Benchmarking.

The objective of this test is to understand the DUT behavior under a head-of-line blocking scenario and to measure the packet loss.

Here are the differences between this HOLB test and RFC 2889:

- this HOLB test starts with eight ports in two groups of four, instead of the four ports used in RFC 2889

- this HOLB test shifts all the port numbers by one in a second iteration of the test; this is new compared to RFC 2889. The shifting of port numbers continues until all ports have been the first port of a group. The purpose is to make sure all permutations have been tested, in order to cover differences of behavior in the SoC of the DUT

- another test in this HOLB section expands the group of ports, such that traffic is divided among four egress ports instead of two (25% instead of 50% per port)

- Section 5.3 adds reporting requirements beyond those of the Congestion Control test in RFC 2889

5.2 Methodology

In order to cause congestion in the form of head-of-line blocking, groups of four ports are used. A group has two ingress ports and two egress ports. The first ingress port MUST have two flows configured, each going to a different egress port. The second ingress port will congest the second egress port by sending line rate. The goal is to measure whether there is loss on the flow to the first egress port, which is not oversubscribed.

A traffic generator MUST be connected to at least eight ports on the DUT and SHOULD be connected using all the DUT ports.

1) Measure two groups with eight DUT ports

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: measure the packet loss for two groups with consecutive ports.

The first group is composed as follows: ingress port 1 sends 50% of its traffic to egress port 3 and 50% of its traffic to egress port 4. Ingress port 2 sends line rate to egress port 4. Measure the amount of traffic loss for the traffic from ingress port 1 to egress port 3.

The second group is composed as follows: ingress port 5 sends 50% of its traffic to egress port 7 and 50% of its traffic to egress port 8. Ingress port 6 sends line rate to egress port 8. Measure the amount of traffic loss for the traffic from ingress port 5 to egress port 7.

Second iteration: repeat the first iteration by shifting all the ports from N to N+1.

The first group is composed as follows: ingress port 2 sends 50% of its traffic to egress port 4 and 50% of its traffic to egress port 5. Ingress port 3 sends line rate to egress port 5. Measure the amount of traffic loss for the traffic from ingress port 2 to egress port 4.

The second group is composed as follows: ingress port 6 sends 50% of its traffic to egress port 8 and 50% of its traffic to egress port 9. Ingress port 7 sends line rate to egress port 9. Measure the amount of traffic loss for the traffic from ingress port 6 to egress port 8.

Last iteration: when the first port of the first group is connected to the last DUT port and the last port of the second group is connected to the seventh port of the DUT.

Measure the amount of traffic loss for the traffic from ingress port N to egress port 2 and from ingress port 4 to egress port 6.
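The port grouping and per-iteration shift described in procedure 1) can be sketched as follows (Python). This is only an illustration of the port arithmetic: ports are assumed to be numbered 1..N, and the role order within each tuple follows the description above.

   # Sketch: enumerate the two four-port groups for each iteration of the
   # HOLB test, shifting every port number by one per iteration and
   # wrapping around the DUT ports.

   def holb_groups(num_ports, shift):
       """Return two tuples (ingress_a, ingress_b, egress_a, egress_b).
       ingress_a splits its traffic 50/50 between egress_a and egress_b;
       ingress_b congests egress_b at line rate; loss is measured on the
       ingress_a -> egress_a flow."""
       def port(p):
           return (p - 1 + shift) % num_ports + 1
       return ((port(1), port(2), port(3), port(4)),
               (port(5), port(6), port(7), port(8)))

   # Iteration 0 uses ports 1-8; each following iteration shifts by one,
   # until every port has been the first port of a group.
   for shift in range(3):
       print(holb_groups(32, shift))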
2) Measure with N/4 groups with N DUT ports

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

In this procedure, the traffic from the first ingress port of each group is split across four egress ports (100/4 = 25% per egress port).

First iteration: expand to fully utilize all the DUT ports in increments of four. Repeat the methodology of procedure 1) with all the groups of ports possible to achieve on the device, and measure the amount of traffic loss for each port group.

Second iteration: shift the start of each consecutive group of ports by +1.

Last iteration: shift the start of each consecutive group of ports by N-1, and measure the traffic loss for each port group.

5.3 Reporting Format

For each test, the report MUST include:

- the port configuration, including the number and location of ingress and egress ports located on the DUT

- whether HOLB was observed, in accordance with the HOLB test in Section 5

- the percentage of traffic loss

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1 Objective

The objective of this test is to measure the values for TCP Goodput [4] and latency with a mix of large and small flows. The test is designed to simulate a mixed environment of stateful flows that require high rates of goodput and stateless flows that require low latency. Stateful flows are created by generating TCP traffic, and stateless flows are created using UDP traffic.

6.2 Methodology

In order to simulate the effects of stateless and stateful traffic on the DUT, there MUST be multiple ingress ports receiving traffic destined for the same egress port. There also MAY be a mix of stateful and stateless traffic arriving on a single ingress port. The simplest setup would be two ingress ports receiving traffic destined to the same egress port.

One ingress port MUST be maintaining a TCP connection through the ingress port to a receiver connected to an egress port. Traffic in the TCP stream MUST be sent at the maximum rate allowed by the traffic generator. At the same time as the TCP traffic is flowing through the DUT, the stateless traffic is sent, destined to a receiver on the same egress port. The stateless traffic MUST be a microburst of 100% intensity.

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

For example:

Stateful Traffic port variation (TCP traffic):

TCP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.
First Iteration: 1 ingress port receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Second Iteration: 2 ingress ports receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Last Iteration: N-2 ingress ports receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Stateless Traffic port variation (UDP traffic):

UDP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.

First Iteration: 1 ingress port receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Second Iteration: 1 ingress port receiving stateful TCP traffic and 2 ingress ports receiving stateless traffic, destined to 1 egress port

Last Iteration: 1 ingress port receiving stateful TCP traffic and N-2 ingress ports receiving stateless traffic, destined to 1 egress port

6.3 Reporting Format

The report MUST include the following:

- the number of ingress and egress ports, along with the designation of stateful or stateless flow assignment

- the stateful flow goodput

- the stateless flow latency

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)
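As a minimal illustration of the stateful-flow metric reported above (Python): goodput is the application-layer throughput, counting only payload bytes successfully delivered and excluding protocol headers and retransmissions (see [1] and [4]); the byte and time values in the example are purely illustrative.

   def goodput_bps(payload_bytes_delivered, elapsed_seconds):
       """Goodput of a stateful (TCP) flow in bits per second: application
       payload delivered to the receiver per unit of time, excluding
       headers and retransmitted data."""
       return (payload_bytes_delivered * 8) / elapsed_seconds

   # Example: 1.25e9 bytes of application data delivered in 1.2 seconds
   # corresponds to roughly 8.3 Gbit/s of goodput.
   print(goodput_bps(1.25e9, 1.2))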
7. Security Considerations

Benchmarking activities, as described in this memo, are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.

8. IANA Considerations

No IANA action is requested at this time.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999.

9.2. Informative References

[1] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology", April 2017.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, August 2000.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, October 2004.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, July 2013.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph, "Understanding TCP Incast Throughput Collapse in Datacenter Networks", http://yanpeichen.com/professional/usenixLoginIncastReady.pdf

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998.

9.3. Acknowledgements

The authors would like to thank Alfred Morton and Scott Bradner for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com