Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires: December 22, 2017                                       J. Rapp
June 20, 2017                                                     VMware

                  Data Center Benchmarking Methodology
                 draft-ietf-bmwg-dcbench-methodology-14

Abstract

The purpose of this informational document is to establish test and
evaluation methodology and measurement techniques for physical network
equipment in the data center. A pre-requisite to this publication is
the terminology document [1]. Many of these terms and methods may be
applicable beyond this publication's scope as the technologies
originally applied in the data center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology format and repeatability recommendation
2. Line Rate Testing
   2.1 Objective
   2.2 Methodology
   2.3 Reporting Format
3. Buffering Testing
   3.1 Objective
   3.2 Methodology
   3.3 Reporting format
4. Microburst Testing
   4.1 Objective
   4.2 Methodology
   4.3 Reporting Format
5. Head of Line Blocking
   5.1 Objective
   5.2 Methodology
   5.3 Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1 Objective
   6.2 Methodology
   6.3 Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
   9.3. Acknowledgements
Authors' Addresses
1. Introduction

Traffic patterns in the data center are not uniform and are constantly
changing. They are dictated by the nature and variety of applications
utilized in the data center. Traffic can be largely east-west in one
data center and north-south in another, while others may combine both.
Traffic patterns can be bursty in nature and contain many-to-one,
many-to-many, or one-to-many flows. Each flow may also be small and
latency sensitive or large and throughput sensitive while containing a
mix of UDP and TCP traffic. All of these can coexist in a single
cluster and flow through a single network device simultaneously.
Benchmarking of network devices has long used [RFC1242], [RFC2432],
[RFC2544], [RFC2889] and [RFC3918], which have largely been focused on
various latency attributes and the Throughput [RFC2889] of the Device
Under Test (DUT) being benchmarked. These standards are good at
measuring theoretical Throughput, forwarding rates and latency under
testing conditions; however, they do not represent real traffic
patterns that may affect these networking devices.
Currently, typical data center networking devices are characterized
by:

- High port density (48 ports or more)

- High speed (currently up to 100 Gb/s per port)

- High throughput (line rate on all ports for Layer 2 and/or Layer 3)

- Low latency (in the microsecond or nanosecond range)

- Low amount of buffer (in the MB range)

- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

This document provides a methodology for benchmarking data center
physical network equipment DUTs, including congestion scenarios,
switch buffer analysis, microburst and head-of-line blocking, while
also using a wide mix of traffic conditions. The terminology document
[1] is a pre-requisite.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Methodology format and repeatability recommendation

The format used for each section of this document is the following:

- Objective

- Methodology

- Reporting Format

The Reporting Format sections rely on the following additional
interpretation of [RFC2119] terms:

MUST: required metric or benchmark for the scenario described
(minimum)

SHOULD or RECOMMENDED: strongly suggested metric for the scenario
described

MAY: optional metric for the scenario described

For each test methodology described, it is critical to obtain
repeatable results. The recommendation is to perform enough iterations
of the given test to make sure the results are consistent. This is
especially important for section 3, as buffering testing has
historically been the least reliable. The number of iterations SHOULD
be explicitly reported. The relative standard deviation SHOULD be
below 10%.

2. Line Rate Testing

2.1 Objective

Provide a maximum-rate test for the performance values of Throughput,
latency and jitter. It is meant to provide the tests to perform and
the methodology to verify that a DUT is capable of forwarding packets
at line rate under non-congested conditions.

2.2 Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two
tests MUST be conducted: a port-pair test (compliant with RFC
2544/3918, section 15) and a full-mesh test (compliant with RFC
2889/3918, section 16).

For all tests, the test traffic generator sending rate MUST be less
than or equal to 99.98% of the nominal value of Line Rate (with no
further PPM adjustment to account for interface clock tolerances), to
ensure stressing of the DUT in reasonable worst-case conditions (see
[1] section 5 for more details -- note to RFC Editor: please replace
all [1] references in this document with the future RFC number of that
draft). Test results at a lower rate MAY be provided for better
understanding of the performance increase in terms of latency and
jitter when the rate is lower than 99.98%. The receiving rate of the
traffic SHOULD be captured during this test as a percentage of line
rate.

The test MUST provide the statistics of minimum, average and maximum
of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average and maximum
of the jitter distribution, for the exact same iteration of the test.
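As a worked illustration of the sending-rate requirement above, the
following minimal Python sketch (not part of the methodology; the
function name and structure are illustrative assumptions) converts a
nominal line rate and frame size into the corresponding sending rate
in frames per second, assuming standard Ethernet per-frame overhead:

   # Illustrative sketch only: frames per second at a fraction of
   # nominal Line Rate, assuming standard Ethernet overhead of
   # 20 bytes per frame (8-byte preamble + 12-byte inter-frame gap).

   ETH_OVERHEAD_BYTES = 20

   def sending_rate_fps(line_rate_bps, frame_size_bytes,
                        fraction=0.9998):
       """Frames per second at the given fraction of line rate."""
       bits_per_frame = (frame_size_bytes + ETH_OVERHEAD_BYTES) * 8
       return (line_rate_bps * fraction) / bits_per_frame

   # Example: 64-byte frames on a 10 Gb/s port.
   print(round(sending_rate_fps(10e9, 64)))  # ~14877976 frames/s,
                                             # vs. 14880952 at 100%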
Alternatively, when a traffic generator cannot be connected to all
ports on the DUT, a snake test MUST be used for line rate testing,
excluding latency and jitter, as those become irrelevant. The snake
test consists of the following method:

- connect the first and last port of the DUT to a traffic generator

- connect back to back sequentially all the ports in between: port 2
to port 3, port 4 to port 5, etc., up to port n-2 to port n-1, where n
is the total number of ports of the DUT

- configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the
same VLAN Y, etc., and ports n-1 and n in the same VLAN Z.

This snake test provides the capability to test line rate for Layer 2
and Layer 3 (RFC 2544/3918) in instances where a traffic generator
with only two ports is available. Latency and jitter are not to be
considered with this test.

2.3 Reporting Format

The report MUST include:

- physical-layer calibration information, as defined in [1] section 4

- number of ports used

- reading for "Throughput received as a percentage of bandwidth",
while sending 99.98% of the nominal value of Line Rate on each port,
for each packet size from 64 bytes to 9216 bytes. As guidance, an
increment of 64 bytes between each iteration is ideal; 256-byte and
512-byte increments are also often used. The most common packet sizes
for the report are: 64 B, 128 B, 256 B, 512 B, 1024 B, 1518 B,
4096 B, 8000 B and 9216 B.

The pattern for testing can be expressed using [RFC6985].

- Throughput needs to be expressed as a percentage of total
transmitted frames

- Packet drops MUST be expressed as a count of packets and SHOULD be
expressed as a percentage of line rate

- For latency and jitter, values expressed in units of time (usually
microseconds or nanoseconds), read across packet sizes from 64 bytes
to 9216 bytes

- For latency and jitter, provide minimum, average and maximum values.
If different iterations are done to gather the minimum, average and
maximum, this SHOULD be specified in the report, along with a
justification of why the information could not have been gathered in
the same test iteration

- For jitter, a histogram describing the population of packets
measured per latency or latency buckets is RECOMMENDED

- The tests for Throughput, latency and jitter MAY be conducted as
individual independent trials, with proper documentation in the
report, but SHOULD be conducted at the same time.

- The methodology makes the assumption that the DUT has at least nine
ports, as certain methodologies require that number of ports or more.

3. Buffering Testing

3.1 Objective

To measure the size of the buffer of a DUT under
typical/many/multiple conditions. Buffer architectures between
multiple DUTs can differ and include egress buffering, shared egress
buffering SoC (Switch-on-Chip), ingress buffering, or a combination.
The test methodology covers the buffer measurement regardless of the
buffer architecture used in the DUT.

3.2 Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data-center switch is
based on using known congestion of known fixed packet size, along
with maximum latency value measurements. The maximum latency will
increase until the first packet drop occurs. At this point, the
maximum latency value will remain constant. This is the point of
inflection of this maximum latency change to a constant value. There
MUST be multiple ingress ports receiving a known amount of frames at
a known fixed size, destined for the same egress port, in order to
create a known congestion condition. The total amount of packets sent
from the oversubscribed port, minus one, multiplied by the packet
size represents the maximum port buffer size at the measured
inflection point.
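The buffer arithmetic described in the preceding paragraph can be
sketched as follows (illustrative Python only; the helper name is an
assumption, and the frame count at the inflection point comes from
the measurement itself):

   # Illustrative sketch only: maximum port buffer size implied by
   # the latency inflection point, per the formula above.

   def port_buffer_bytes(frames_at_inflection, frame_size_bytes):
       """(frames sent up to the inflection point - 1) * frame size."""
       return (frames_at_inflection - 1) * frame_size_bytes

   # Example: 24576 64-byte frames absorbed before the first drop.
   print(port_buffer_bytes(24576, 64))  # 1572800 bytes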
1) Measure the highest buffer efficiency

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with a packet size of 64 bytes to egress port 2. Measure
the buffer size value as the number of frames sent from the port
sending the oversubscribed traffic up to the inflection point,
multiplied by the frame size.

Second iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with a packet size of 65 bytes to egress port 2. Measure
the buffer size value as the number of frames sent from the port
sending the oversubscribed traffic up to the inflection point,
multiplied by the frame size.

Last iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with a packet size of B bytes to egress port 2. Measure
the buffer size value as the number of frames sent from the port
sending the oversubscribed traffic up to the inflection point,
multiplied by the frame size.

When the B value is found to provide the largest buffer size, size B
allows the highest buffer efficiency.
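The sweep in procedure 1) can be sketched as follows (illustrative
Python only; run_buffer_test() is an assumed stand-in for driving the
traffic generator and returning the frame count at the inflection
point for a given frame size):

   # Illustrative sketch only: find the frame size B that yields the
   # largest measured buffer, i.e. the highest buffer efficiency.

   def find_most_efficient_size(run_buffer_test,
                                sizes=range(64, 9217)):
       best_size, best_buffer = None, -1
       for size in sizes:
           frames = run_buffer_test(frame_size=size)
           buffer_bytes = (frames - 1) * size
           if buffer_bytes > best_buffer:
               best_size, best_buffer = size, buffer_bytes
       return best_size, best_buffer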
2) Measure maximum port buffer size

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

At the fixed packet size B determined in procedure 1), for a fixed
default Differentiated Services Code Point (DSCP)/Class of Service
(COS) value of 0, and for unicast traffic, proceed with the
following:

First iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with the same packet size to egress port 2. Measure the
buffer size value by multiplying the number of extra frames sent by
the frame size.

Second iteration: ingress port 2 sending line rate to egress port 3,
while port 4 sends a known low amount of oversubscription traffic (1%
recommended) with the same packet size to egress port 3. Measure the
buffer size value by multiplying the number of extra frames sent by
the frame size.

Last iteration: ingress port N-2 sending line rate traffic to egress
port N-1, while port N sends a known low amount of oversubscription
traffic (1% recommended) with the same packet size to egress port
N-1. Measure the buffer size value by multiplying the number of extra
frames sent by the frame size.

This test series MAY be repeated using all different DSCP/COS values
of traffic, and then using multicast traffic, in order to determine
whether there is any DSCP/COS impact on the buffer size.

3) Measure maximum port pair buffer sizes

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2;
ingress port 3 sending line rate to egress port 4; etc. Ingress ports
N-1 and N will respectively oversubscribe, at 1% of line rate, egress
port 2 and port 3. Measure the buffer size value by multiplying the
number of extra frames sent by the frame size for each egress port.

Second iteration: ingress port 1 sending line rate to egress port 2;
ingress port 3 sending line rate to egress port 4; etc. Ingress ports
N-1 and N will respectively oversubscribe, at 1% of line rate, egress
port 4 and port 5. Measure the buffer size value by multiplying the
number of extra frames sent by the frame size for each egress port.

Last iteration: ingress port 1 sending line rate to egress port 2;
ingress port 3 sending line rate to egress port 4; etc. Ingress ports
N-1 and N will respectively oversubscribe, at 1% of line rate, egress
port N-3 and port N-2. Measure the buffer size value by multiplying
the number of extra frames sent by the frame size for each egress
port.

This test series MAY be repeated using all different DSCP/COS values
of traffic, and then using multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: ingress ports 1,2,...,N-1 each sending
[(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N
(a worked example of this rate follows this section).

Second iteration: ingress ports 2,...,N each sending
[(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port 1.

Last iteration: ingress ports N,1,2,...,N-2 each sending
[(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port
N-1.

This test series MAY be repeated using all different COS values of
traffic, and then using multicast traffic.

Unicast traffic and then multicast traffic SHOULD be used in order to
determine the proportion of buffer for the documented selection of
tests. Also, the COS value for the packets SHOULD be provided for
each test iteration, as the buffer allocation size MAY differ per COS
value. It is RECOMMENDED that the ingress and egress ports are varied
in a random but documented fashion in multiple tests to measure the
buffer size for each port of the DUT.
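As a worked example of the many-to-one rate used in procedure 4), the
following Python sketch (illustrative only; the function name is an
assumption) computes the per-port rate and the resulting aggregate
offered load, which comes to 100.98% of the egress port, i.e. roughly
1% oversubscription:

   # Illustrative sketch only: per-ingress-port rate for the
   # many-to-one buffer test, [(1/(N-1))*99.98]+[1/(N-1)] percent of
   # line rate, and the aggregate load on the single egress port.

   def per_port_rate_percent(n_ports):
       n_ingress = n_ports - 1
       return (1.0 / n_ingress) * 99.98 + (1.0 / n_ingress)

   n = 48  # example port count
   rate = per_port_rate_percent(n)
   print(f"{rate:.4f}% per port, {rate * (n - 1):.2f}% aggregate")
   # 2.1485% per port, 100.98% aggregate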
3.3 Reporting format

The report MUST include:

- The packet size used for the most efficient buffer used, along with
the DSCP/COS value

- The maximum port buffer size for each port

- The maximum DUT buffer size

- The packet size used in the test

- The amount of oversubscription, if different than 1%

- The number of ingress and egress ports, along with their location
on the DUT

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results for each of the tests (min, max, avg)

The percentage of variation is a metric providing a sense of how big
the difference is between the measured value and the previous ones.

For example, for a latency test where the minimum latency is
measured, the percentage of variation of the minimum latency will
indicate by how much this value has varied between the current test
executed and the previous one.

PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the
current test and x1 is the minimum latency value obtained in the
previous test. For example, if the minimum latency was 1.00
microsecond in the previous test and 1.05 microseconds in the current
test, then PV = ((1.05-1.00)/1.00)*100 = 5%.

The same formula is used for the maximum and average variations
measured.

4. Microburst Testing

4.1 Objective

To find the maximum amount of packet bursts that a DUT can sustain
under various configurations.

This test provides additional methodology supplementing the other RFC
tests:

- All bursts should be sent with 100% intensity. Note: intensity is
defined in [1] section 6.1.1

- All ports of the DUT must be used for this test

- All ports are recommended to be tested simultaneously

4.2 Methodology

A traffic generator MUST be connected to all ports on the DUT. In
order to cause congestion, two or more ingress ports MUST send bursts
of packets destined for the same egress port. The simplest of the
setups would be two ingress ports and one egress port (2-to-1).

The burst MUST be sent with an intensity of 100% (intensity is
defined in [1] section 6.1.1), meaning that the burst of packets will
be sent with a minimum inter-packet gap. The amount of packets
contained in the burst will be the trial variable and will be
increased until a non-zero packet loss is measured. The aggregate
amount of packets from all the senders will be used to calculate the
maximum microburst amount that the DUT can sustain.

It is RECOMMENDED that the ingress and egress ports are varied in
multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the
microburst capacity at various ingress rates. Intensity of microburst
is defined in [1].

It is RECOMMENDED that all ports on the DUT be tested simultaneously,
and in various configurations, in order to understand all the
combinations of ingress ports, egress ports and intensities.
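The trial loop implied by this methodology can be sketched as follows
(illustrative Python only; run_burst_trial() is an assumed stand-in
for sending one burst per ingress port at 100% intensity and
returning the measured packet loss):

   # Illustrative sketch only: grow the per-port burst size until the
   # first trial with non-zero loss; the largest loss-free aggregate
   # burst is the measured microburst capacity.

   def max_microburst_packets(run_burst_trial, n_senders):
       burst, largest_loss_free = 1, 0
       while True:
           lost = run_burst_trial(burst_size=burst, senders=n_senders)
           if lost > 0:
               return largest_loss_free
           largest_loss_free = burst * n_senders  # aggregate packets
           burst += 1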
An example of varying the port distribution would be:

First iteration: N-1 ingress ports sending to 1 egress port

Second iteration: N-2 ingress ports sending to 2 egress ports

Last iteration: 2 ingress ports sending to N-2 egress ports

4.3 Reporting Format

The report MUST include:

- The maximum number of packets received per ingress port with the
maximum burst size obtained with zero packet loss

- The packet size used in the test

- The number of ingress and egress ports, along with their location
on the DUT

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results (min, max, avg)

5. Head of Line Blocking

5.1 Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon
that occurs when packets are held up by the first packet ahead
waiting to be transmitted to a different output port. This is defined
in RFC 2889 section 5.5, Congestion Control. This section expands on
RFC 2889 in the context of data center benchmarking.

The objective of this test is to understand the DUT behavior in a
head-of-line blocking scenario and to measure the packet loss.

The differences between this HOLB test and RFC 2889 are:

- This HOLB test starts with 8 ports in two groups of 4, instead of
the 4 ports of RFC 2889

- This HOLB test shifts all the port numbers by one in a second
iteration of the test; this is new compared to RFC 2889. The shifting
of port numbers continues until all ports have been the first in a
group. The purpose is to make sure all permutations have been tested,
in order to cover differences of behavior in the SoC of the DUT

- Another test in this HOLB test expands the group of ports, such
that traffic is divided among 4 ports instead of 2 (25% instead of
50% per port)

- Section 5.3 adds additional reporting requirements beyond
Congestion Control in RFC 2889

5.2 Methodology

In order to cause congestion in the form of head-of-line blocking,
groups of four ports are used. A group has 2 ingress ports and 2
egress ports. The first ingress port MUST have two flows configured,
each going to a different egress port. The second ingress port will
congest the second egress port by sending line rate. The goal is to
measure whether there is loss on the flow for the first egress port,
which is not oversubscribed.

A traffic generator MUST be connected to at least eight ports on the
DUT and SHOULD be connected using all the DUT ports.

1) Measure two groups with eight DUT ports

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: measure the packet loss for two groups with
consecutive ports.

The first group is composed of: ingress port 1 sending 50% of traffic
to egress port 3 and ingress port 1 sending 50% of traffic to egress
port 4. Ingress port 2 is sending line rate to egress port 4. Measure
the amount of traffic loss for the traffic from ingress port 1 to
egress port 3.

The second group is composed of: ingress port 5 sending 50% of
traffic to egress port 7 and ingress port 5 sending 50% of traffic to
egress port 8. Ingress port 6 is sending line rate to egress port 8.
Measure the amount of traffic loss for the traffic from ingress port
5 to egress port 7.
Second iteration: repeat the first iteration by shifting all the
ports from N to N+1.

The first group is composed of: ingress port 2 sending 50% of traffic
to egress port 4 and ingress port 2 sending 50% of traffic to egress
port 5. Ingress port 3 is sending line rate to egress port 5. Measure
the amount of traffic loss for the traffic from ingress port 2 to
egress port 4.

The second group is composed of: ingress port 6 sending 50% of
traffic to egress port 8 and ingress port 6 sending 50% of traffic to
egress port 9. Ingress port 7 is sending line rate to egress port 9.
Measure the amount of traffic loss for the traffic from ingress port
6 to egress port 8.

Last iteration: when the first port of the first group is connected
to the last DUT port and the last port of the second group is
connected to the seventh port of the DUT.

Measure the amount of traffic loss for the traffic from ingress port
N to egress port 2 and from ingress port 4 to egress port 6.

2) Measure with N/4 groups with N DUT ports

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

The traffic from each ingress port is split across 4 egress ports
(100/4 = 25%).

First iteration: expand to fully utilize all the DUT ports in
increments of four. Repeat the methodology of 1) with all the groups
of ports possible to achieve on the device, and measure the amount of
traffic loss for each port group.

Second iteration: shift the start of each consecutive group of ports
by +1.

Last iteration: shift the start of each consecutive group of ports by
N-1, and measure the traffic loss for each port group. (The
port-shifting pattern is illustrated in the sketch below.)
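The port-grouping and shifting arithmetic used throughout this
section can be sketched as follows (illustrative Python only; the
function name is an assumption, and the sketch only emits the port
groups that a test harness would then configure):

   # Illustrative sketch only: build groups of four consecutive ports
   # over N DUT ports, shifting the starting offset by one per
   # iteration (wrapping around) until every port has been first in a
   # group.

   def holb_groups(n_ports, shift):
       ports = [(p + shift - 1) % n_ports + 1
                for p in range(1, n_ports + 1)]
       return [ports[i:i + 4] for i in range(0, 4 * (n_ports // 4), 4)]

   for shift in range(8):           # e.g. 8 ports -> 8 iterations
       print(holb_groups(8, shift))
   # [[1, 2, 3, 4], [5, 6, 7, 8]], [[2, 3, 4, 5], [6, 7, 8, 1]], ...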
5.3 Reporting Format

For each test, the report MUST include:

- The port configuration, including the number and location of
ingress and egress ports located on the DUT

- Whether HOLB was observed in accordance with the HOLB test in
section 5

- The percentage of traffic loss

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1 Objective

The objective of this test is to measure the values for TCP Goodput
[4] and latency with a mix of large and small flows. The test is
designed to simulate a mixed environment of stateful flows that
require high rates of goodput and stateless flows that require low
latency. Stateful flows are created by generating TCP traffic, and
stateless flows are created using UDP traffic.

6.2 Methodology

In order to simulate the effects of stateless and stateful traffic on
the DUT, there MUST be multiple ingress ports receiving traffic
destined for the same egress port. There MAY also be a mix of
stateful and stateless traffic arriving on a single ingress port. The
simplest setup would be 2 ingress ports receiving traffic destined to
the same egress port.

One ingress port MUST maintain a TCP connection through the ingress
port to a receiver connected to an egress port. Traffic in the TCP
stream MUST be sent at the maximum rate allowed by the traffic
generator. At the same time as the TCP traffic is flowing through the
DUT, the stateless traffic is sent destined to a receiver on the same
egress port. The stateless traffic MUST be a microburst of 100%
intensity.

It is RECOMMENDED that the ingress and egress ports are varied in
multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the
microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration",
"second iteration" and "last iteration". The idea is to show the
first two iterations so the reader understands the logic of how to
keep incrementing the iterations. The last iteration shows the end
state of the variables.

For example:

Stateful traffic port variation (TCP traffic):

TCP traffic needs to be generated in this section. During iterations,
the number of egress ports MAY vary as well.

First iteration: 1 ingress port receiving stateful TCP traffic and 1
ingress port receiving stateless traffic, destined to 1 egress port

Second iteration: 2 ingress ports receiving stateful TCP traffic and
1 ingress port receiving stateless traffic, destined to 1 egress port

Last iteration: N-2 ingress ports receiving stateful TCP traffic and
1 ingress port receiving stateless traffic, destined to 1 egress port

Stateless traffic port variation (UDP traffic):

UDP traffic needs to be generated for this test. During iterations,
the number of egress ports MAY vary as well.

First iteration: 1 ingress port receiving stateful TCP traffic and 1
ingress port receiving stateless traffic, destined to 1 egress port

Second iteration: 1 ingress port receiving stateful TCP traffic and 2
ingress ports receiving stateless traffic, destined to 1 egress port

Last iteration: 1 ingress port receiving stateful TCP traffic and N-2
ingress ports receiving stateless traffic, destined to 1 egress port
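The two iteration ladders above can be enumerated as follows
(illustrative Python only; the function name is an assumption, and
the sketch simply emits the trial matrix that a test harness would
walk through for an N-port DUT):

   # Illustrative sketch only: enumerate (stateful, stateless)
   # ingress-port counts for the two variations, with 1 egress port.

   def incast_trials(n_ports):
       # Stateful variation: grow TCP ingress ports up to N-2.
       for tcp in range(1, n_ports - 1):
           yield {"tcp_ingress": tcp, "udp_ingress": 1, "egress": 1}
       # Stateless variation: grow UDP ingress ports up to N-2,
       # starting at 2 since the (1, 1) case is covered above.
       for udp in range(2, n_ports - 1):
           yield {"tcp_ingress": 1, "udp_ingress": udp, "egress": 1}

   for trial in incast_trials(8):
       print(trial)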
6.3 Reporting Format

The report MUST include the following:

- The number of ingress and egress ports, along with the designation
of stateful or stateless flow assignment

- The stateful flow goodput

- The stateless flow latency

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results (min, max, avg)

7. Security Considerations

Benchmarking activities as described in this memo are limited to
technology characterization using controlled stimuli in a laboratory
environment, with dedicated address space and the constraints
specified in the sections above.

The benchmarking network topology will be an independent test setup
and MUST NOT be connected to devices that may forward the test
traffic into a production network or misroute traffic to the test
management network.

Further, benchmarking is performed on a "black-box" basis, relying
solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT/SUT SHOULD be identical in the lab and in production
networks.

8. IANA Considerations

No IANA action is requested at this time.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network
Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991,
<https://www.rfc-editor.org/info/rfc1242>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March
1999, <https://www.rfc-editor.org/info/rfc2544>.

9.2. Informative References

[1] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology",
April 2017.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for
LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000,
<https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast
Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004,
<https://www.rfc-editor.org/info/rfc3918>.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet
Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, July
2013, <https://www.rfc-editor.org/info/rfc6985>.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph,
"Understanding TCP Incast Throughput Collapse in Datacenter
Networks",
<http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March
1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking",
RFC 2432, DOI 10.17487/RFC2432, October 1998,
<https://www.rfc-editor.org/info/rfc2432>.

9.3. Acknowledgements

The authors would like to thank Alfred Morton and Scott Bradner for
their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com