Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires: December 23, 2017                                       J. Rapp
June 21, 2017                                                     VMware

                  Data Center Benchmarking Methodology
                draft-ietf-bmwg-dcbench-methodology-17

Abstract

The purpose of this informational document is to establish test and evaluation methodology and measurement techniques for physical network equipment in the data center. A prerequisite to this publication is the terminology document [draft-ietf-bmwg-dcbench-terminology]. Many of these terms and methods may be applicable beyond this publication's scope, as the technologies originally applied in the data center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology format and repeatability recommendation
2. Line Rate Testing
   2.1 Objective
   2.2 Methodology
   2.3 Reporting Format
3. Buffering Testing
   3.1 Objective
   3.2 Methodology
   3.3 Reporting format
4 Microburst Testing
   4.1 Objective
   4.2 Methodology
   4.3 Reporting Format
5. Head of Line Blocking
   5.1 Objective
   5.2 Methodology
   5.3 Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1 Objective
   6.2 Methodology
   6.3 Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
   9.3. Acknowledgements
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. Traffic can be largely east-west (server to server inside the data center) in one data center and north-south (outside of the data center to server) in another, while others may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive while containing a mix of UDP and TCP traffic. All of these can coexist in a single cluster and flow through a single network device simultaneously. Benchmarking of network devices has long used [RFC1242], [RFC2432], [RFC2544], [RFC2889] and [RFC3918], which have largely focused on various latency attributes and the Throughput [RFC2889] of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical Throughput, forwarding rates and latency under testing conditions; however, they do not represent real traffic patterns that may affect these networking devices.
Currently, typical data center networking devices are characterized by:

-High port density (48 ports or more)

-High speed (currently up to 100 Gb/s per port)

-High throughput (line rate on all ports for Layer 2 and/or Layer 3)

-Low latency (in the microsecond or nanosecond range)

-Low amount of buffer (in the MB range per networking device)

-Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

This document provides a methodology for benchmarking Data Center physical network equipment DUTs, including congestion scenarios, switch buffer analysis, microburst and head of line blocking, while also using a wide mix of traffic conditions. The terminology document [draft-ietf-bmwg-dcbench-terminology] is a prerequisite.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Methodology format and repeatability recommendation

The format used for each section of this document is the following:

-Objective

-Methodology

-Reporting Format

Additional interpretation of RFC 2119 terms: for each test methodology described, it is critical to obtain repeatability in the results. The recommendation is to perform enough iterations of the given test to make sure the result is consistent. This is especially important for section 3, as the buffering testing has historically been the least reliable. The number of iterations SHOULD be explicitly reported. The relative standard deviation SHOULD be below 10%.

2. Line Rate Testing

2.1 Objective

Provide a maximum rate test for the performance values for Throughput, latency and jitter. It is meant to provide the tests to perform, and the methodology to verify that a DUT is capable of forwarding packets at line rate under non-congested conditions.

2.2 Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two tests MUST be conducted: a port-pair test ([RFC2544]/[RFC3918] section 15 compliant) and a full-mesh test ([RFC2889]/[RFC3918] section 16 compliant).

For all tests, the test traffic generator sending rate MUST be less than or equal to 99.98% of the nominal value of Line Rate (with no further PPM adjustment to account for interface clock tolerances), to ensure stressing the DUT in reasonable worst case conditions (see RFC [draft-ietf-bmwg-dcbench-terminology] section 5 for more details -- note to RFC Editor, please replace all [draft-ietf-bmwg-dcbench-terminology] references in this document with the future RFC number of that draft). Test results at a lower rate MAY be provided for better understanding of the performance increase in terms of latency and jitter when the rate is lower than 99.98%. The receiving rate of the traffic SHOULD be captured during this test as a percentage of line rate.

The test MUST provide the statistics of minimum, average and maximum of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average and maximum of the jitter distribution, for the exact same iteration of the test.
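As an illustration only (not part of the methodology), the following minimal sketch shows how the 99.98% sending rate translates into a frames-per-second target, assuming standard Ethernet per-frame overhead of 20 bytes (7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap):

   # Sketch: frames-per-second target at 99.98% of nominal line rate.
   # Assumes 20 bytes of Ethernet overhead per frame (preamble + SFD +
   # inter-frame gap); generator and DUT specifics are out of scope.

   ETH_OVERHEAD_BYTES = 20

   def target_fps(line_rate_bps: float, frame_size_bytes: int,
                  fraction: float = 0.9998) -> float:
       """Frames per second at `fraction` of the nominal line rate."""
       bits_per_frame = (frame_size_bytes + ETH_OVERHEAD_BYTES) * 8
       return line_rate_bps * fraction / bits_per_frame

   # Example: a 10 Gb/s port across the common report frame sizes.
   for size in (64, 128, 256, 512, 1024, 1518, 4096, 8000, 9216):
       print(f"{size:>5} B: {target_fps(10e9, size):,.0f} fps")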
Alternatively, when a traffic generator cannot be connected to all ports on the DUT, a snake test MUST be used for line rate testing, excluding latency and jitter, as those then become irrelevant. The snake test consists of the following method:

-connect the first and last port of the DUT to a traffic generator

-connect back to back sequentially all the ports in between: port 2 to port 3, port 4 to port 5, and so on, until port n-2 is connected to port n-1, where n is the total number of ports of the DUT

-configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the same VLAN Y, and so on, with ports n-1 and n in the same VLAN Z.

This snake test provides the capability to test line rate for Layer 2 and Layer 3 ([RFC2544]/[RFC3918]) in instances where a traffic generator with only two ports is available. The latency and jitter are not to be considered with this test.

2.3 Reporting Format

The report MUST include:

-physical layer calibration information as defined in [draft-ietf-bmwg-dcbench-terminology] section 4.

-number of ports used

-reading for "Throughput received in percentage of bandwidth", while sending 99.98% of the nominal value of Line Rate on each port, for each packet size from 64 bytes to 9216 bytes. As guidance, an increment of 64 bytes between each iteration is ideal; increments of 256 bytes and 512 bytes are also often used. The most common packet sizes ordered for the report are: 64b, 128b, 256b, 512b, 1024b, 1518b, 4096b, 8000b, 9216b.

The pattern for testing can be expressed using [RFC6985].

-Throughput needs to be expressed as a percentage of total transmitted frames

-Packet drops MUST be expressed as a count of packets and SHOULD be expressed as a percentage of line rate

-For latency and jitter, values expressed in units of time (usually microseconds or nanoseconds), read across packet sizes from 64 bytes to 9216 bytes

-For latency and jitter, provide minimum, average and maximum values. If different iterations are done to gather the minimum, average and maximum, this SHOULD be specified in the report, along with a justification of why the information could not be gathered in the same test iteration

-For jitter, a histogram describing the population of packets measured per latency or latency buckets is RECOMMENDED

-The tests for Throughput, latency and jitter MAY be conducted as individual independent trials, with proper documentation in the report, but SHOULD be conducted at the same time.

-The methodology assumes that the DUT has at least nine ports, as certain tests require that number of ports or more.

3. Buffering Testing

3.1 Objective

To measure the size of the buffer of a DUT under typical/many/multiple conditions. Buffer architectures between multiple DUTs can differ and include egress buffering, shared egress buffering SoC (Switch-on-Chip), ingress buffering or a combination. The test methodology covers the buffer measurement regardless of the buffer architecture used in the DUT.

3.2 Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data-center switch is based on using known congestion of a known fixed packet size, along with maximum latency value measurements. The maximum latency will increase until the first packet drop occurs. At this point, the maximum latency value will remain constant. This is the point of inflection, where the maximum latency changes to a constant value. There MUST be multiple ingress ports receiving a known amount of frames at a known fixed size, destined for the same egress port, in order to create a known congestion condition. The total amount of packets sent from the oversubscribed port, minus one, multiplied by the packet size represents the maximum port buffer size at the measured inflection point.
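For illustration only, a minimal sketch of deriving the buffer size from the inflection point, assuming the tester records the maximum observed latency after each increase in the number of frames sent from the oversubscribed port (the trial data below are hypothetical):

   # Sketch: maximum port buffer size from the latency inflection point.
   # `trials` maps frames sent from the oversubscribed port to the
   # maximum latency observed; the inflection point is where maximum
   # latency stops growing, i.e. where the first packet drop occurred.

   def buffer_size_bytes(trials: dict, frame_size: int,
                         tolerance: float = 0.0):
       """Return (frames at inflection - 1) * frame_size, or None."""
       prev_latency = None
       for frames in sorted(trials):
           latency = trials[frames]
           if prev_latency is not None and \
              latency - prev_latency <= tolerance:
               # Latency plateaued: the buffer limit was reached.
               return (frames - 1) * frame_size
           prev_latency = latency
       return None  # no inflection observed within this sweep

   # Hypothetical sweep: maximum latency grows, then plateaus near
   # 4000 frames of 64 bytes each.
   sweep = {1000: 12.0, 2000: 24.0, 3000: 36.0, 4000: 42.0, 5000: 42.0}
   print(buffer_size_bytes(sweep, frame_size=64))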
1) Measure the highest buffer efficiency

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with a packet size of 64 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Second iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with a packet size of 65 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Last iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with a packet size of B bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

When the B value is found to provide the largest buffer size, then size B allows the highest buffer efficiency.

2) Measure maximum port buffer size

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

At the fixed packet size B determined in procedure 1), for a fixed default Differentiated Services Code Point (DSCP)/Class of Service (COS) value of 0 and for unicast traffic, proceed with the following:

First iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with the same packet size to egress port 2. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Second iteration: ingress port 2 sending line rate to egress port 3, while port 4 is sending a known low amount of over-subscription traffic (1% recommended) with the same packet size to egress port 3. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Last iteration: ingress port N-2 sending line rate traffic to egress port N-1, while port N is sending a known low amount of over-subscription traffic (1% recommended) with the same packet size to egress port N-1. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

This test series MAY be repeated using all the different DSCP/COS values of traffic, and then using Multicast traffic, in order to find out if there is any DSCP/COS impact on the buffer size.
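A small sketch of the procedure 1) sweep follows, for illustration only; run_buffer_trial is a hypothetical hook into the traffic generator that runs one congestion trial and returns the number of frames the oversubscribing port delivered up to the inflection point:

   # Sketch: find the packet size B with the highest buffer efficiency.
   # For each candidate size, run the congestion trial of procedure 1)
   # and record the measured buffer size; the size yielding the largest
   # buffer is B. `run_buffer_trial` is a hypothetical generator hook.

   def find_most_efficient_size(sizes, run_buffer_trial):
       results = {}
       for b in sizes:  # e.g. 64, 65, 66, ... bytes
           frames_to_inflection = run_buffer_trial(packet_size=b)
           results[b] = frames_to_inflection * b  # buffer size in bytes
       best = max(results, key=results.get)
       return best, results[best]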
3) Measure maximum port pair buffer sizes

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2; ingress port 3 sending line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 2 and port 3 respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

Second iteration: ingress port 1 sending line rate to egress port 2; ingress port 3 sending line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 4 and port 5 respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

Last iteration: ingress port 1 sending line rate to egress port 2; ingress port 3 sending line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port N-3 and port N-2 respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

This test series MAY be repeated using all the different DSCP/COS values of traffic, and then using Multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress ports 1,2,... N-1 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N.

Second iteration: ingress ports 2,... N each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port 1.

Last iteration: ingress ports N,1,2...N-2 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N-1.

This test series MAY be repeated using all the different COS values of traffic, and then using Multicast traffic.

Unicast traffic and then Multicast traffic SHOULD be used in order to determine the proportion of buffer used, for the documented selection of tests. Also, the COS value for the packets SHOULD be provided for each test iteration, as the buffer allocation size MAY differ per COS value. It is RECOMMENDED that the ingress and egress ports be varied in a random, but documented, fashion in multiple tests to measure the buffer size for each port of the DUT.
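For illustration only, the per-ingress-port rate in procedure 4) reduces to 100.98/(N-1) percent of line rate: each of the N-1 ingress ports sends an equal share of 99.98% of the egress line rate plus an equal share of 1% of over-subscription. A minimal sketch (N is the DUT port count):

   # Sketch: per-port rate for procedure 4), where N-1 ingress ports
   # together oversubscribe one egress port by about 1%:
   # (1/(N-1))*99.98 + 1/(N-1) = 100.98/(N-1) percent of line rate.

   def per_port_rate_pct(n_ports: int) -> float:
       """Percent of line rate each of the N-1 ingress ports sends."""
       share = 1 / (n_ports - 1)
       return share * 99.98 + share

   for n in (8, 32, 48):
       rate = per_port_rate_pct(n)
       print(f"N={n}: {rate:.3f}% per port, "
             f"{rate * (n - 1):.2f}% aggregate at the egress port")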
3.3 Reporting format

The report MUST include:

- The packet size used for the most efficient buffer use, along with the DSCP/COS value

- The maximum port buffer size for each port

- The maximum DUT buffer size

- The packet size used in the test

- The amount of over-subscription, if different from 1%

- The number of ingress and egress ports, along with their location on the DUT

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results for each of the tests (min, max, avg)

The percentage of variation is a metric providing a sense of how big the difference is between the measured value and the previous ones.

For example, for a latency test where the minimum latency is measured, the percentage of variation of the minimum latency will indicate by how much this value has varied between the current test executed and the previous one.

PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the current test and x1 is the minimum latency value obtained in the previous test.

The same formula is used for max and avg variations measured.
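For illustration only, a minimal sketch of the PV computation applied to the min, avg and max statistics of two consecutive test iterations (the sample latency values are hypothetical):

   # Sketch: percentage of variation (PV) between two test iterations,
   # PV = ((x2 - x1) / x1) * 100, applied to min, avg and max values.

   def percentage_of_variation(x1: float, x2: float) -> float:
       """x1: value from the previous test; x2: from the current test."""
       return (x2 - x1) / x1 * 100

   previous = {"min": 1.20, "avg": 1.55, "max": 2.10}  # e.g. latency, us
   current = {"min": 1.22, "avg": 1.57, "max": 2.30}
   for stat in ("min", "avg", "max"):
       pv = percentage_of_variation(previous[stat], current[stat])
       print(f"{stat}: PV = {pv:+.2f}%")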
4 Microburst Testing

4.1 Objective

To find the maximum amount of packet bursts a DUT can sustain under various configurations.

This test provides additional methodology to the other RFC tests:

-All bursts should be sent with 100% intensity. Note: intensity is defined in [draft-ietf-bmwg-dcbench-terminology] section 6.1.1

-All ports of the DUT must be used for this test

-All ports are recommended to be tested simultaneously

4.2 Methodology

A traffic generator MUST be connected to all ports on the DUT. In order to cause congestion, two or more ingress ports MUST send bursts of packets destined for the same egress port. The simplest of the setups would be two ingress ports and one egress port (2-to-1).

The burst MUST be sent with an intensity of 100% (intensity is defined in [draft-ietf-bmwg-dcbench-terminology] section 6.1.1), meaning the burst of packets will be sent with a minimum inter-packet gap. The number of packets contained in the burst is the trial variable and increases until a non-zero packet loss is measured. The aggregate number of packets from all the senders will be used to calculate the maximum amount of microburst the DUT can sustain.

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates. Intensity of microburst is defined in [draft-ietf-bmwg-dcbench-terminology].

It is RECOMMENDED that all ports on the DUT be tested simultaneously, and in various configurations, in order to understand all the combinations of ingress ports, egress ports and intensities.

An example would be:

First Iteration: N-1 Ingress ports sending to 1 Egress Port

Second Iteration: N-2 Ingress ports sending to 2 Egress Ports

Last Iteration: 2 Ingress ports sending to N-2 Egress Ports
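A small sketch of this port-combination sweep, for illustration only (N is the DUT port count):

   # Sketch: ingress/egress port-count combinations from the example in
   # section 4.2. Iteration k uses N-k ingress ports sending microbursts
   # to k egress ports, for k = 1 .. N-2.

   def microburst_combinations(n_ports: int):
       for egress in range(1, n_ports - 1):  # 1 .. N-2 egress ports
           ingress = n_ports - egress        # remaining ports as ingress
           yield ingress, egress

   for ingress, egress in microburst_combinations(8):
       print(f"{ingress} ingress ports -> {egress} egress ports")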
4.3 Reporting Format

The report MUST include:

- The maximum number of packets received per ingress port with the maximum burst size obtained with zero packet loss

- The packet size used in the test

- The number of ingress and egress ports, along with their location on the DUT

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

5. Head of Line Blocking

5.1 Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon that occurs when packets are held up by the first packet ahead waiting to be transmitted to a different output port. This is defined in RFC 2889 section 5.5, Congestion Control. This section expands on RFC 2889 in the context of Data Center Benchmarking.

The objective of this test is to understand the DUT behavior under a head of line blocking scenario and to measure the packet loss.

Here are the differences between this HOLB test and RFC 2889:

-This HOLB starts with 8 ports in two groups of 4, instead of the 4 ports in RFC 2889

-This HOLB shifts all the port numbers by one in a second iteration of the test; this is new compared to RFC 2889. The shifting of port numbers continues until all ports have been the first in the group. The purpose is to make sure all permutations have been tested, to cover differences of behavior in the SoC of the DUT

-Another test in this HOLB expands the group of ports, such that traffic is divided among 4 ports instead of two (25% instead of 50% per port)

-Section 5.3 adds additional reporting requirements beyond Congestion Control in RFC 2889

5.2 Methodology

In order to cause congestion in the form of head of line blocking, groups of four ports are used. A group has 2 ingress and 2 egress ports. The first ingress port MUST have two flows configured, each going to a different egress port. The second ingress port will congest the second egress port by sending line rate. The goal is to measure if there is loss on the flow for the first egress port, which is not over-subscribed.

A traffic generator MUST be connected to at least eight ports on the DUT and SHOULD be connected using all the DUT ports.

1) Measure two groups with eight DUT ports

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: measure the packet loss for two groups with consecutive ports.

The first group is composed as follows: ingress port 1 is sending 50% of traffic to egress port 3, and ingress port 1 is sending 50% of traffic to egress port 4. Ingress port 2 is sending line rate to egress port 4. Measure the amount of traffic loss for the traffic from ingress port 1 to egress port 3.

The second group is composed as follows: ingress port 5 is sending 50% of traffic to egress port 7, and ingress port 5 is sending 50% of traffic to egress port 8. Ingress port 6 is sending line rate to egress port 8. Measure the amount of traffic loss for the traffic from ingress port 5 to egress port 7.

Second iteration: repeat the first iteration by shifting all the ports from N to N+1.

The first group is composed as follows: ingress port 2 is sending 50% of traffic to egress port 4, and ingress port 2 is sending 50% of traffic to egress port 5. Ingress port 3 is sending line rate to egress port 5. Measure the amount of traffic loss for the traffic from ingress port 2 to egress port 4.

The second group is composed as follows: ingress port 6 is sending 50% of traffic to egress port 8, and ingress port 6 is sending 50% of traffic to egress port 9. Ingress port 7 is sending line rate to egress port 9. Measure the amount of traffic loss for the traffic from ingress port 6 to egress port 8.

Last iteration: when the first port of the first group is connected to the last DUT port and the last port of the second group is connected to the seventh port of the DUT.

Measure the amount of traffic loss for the traffic from ingress port N to egress port 2 and from ingress port 4 to egress port 6.
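For illustration only, a minimal sketch enumerating the shifted port groups of part 1), with ports numbered 1..N and two groups of four consecutive ports wrapping around the DUT:

   # Sketch: enumerate the two four-port groups for each shift iteration
   # of the HOLB test, part 1). Ports are numbered 1..N and groups wrap
   # around, so the last iteration has group 1 = (N, 1, 2, 3) and
   # group 2 = (4, 5, 6, 7), matching the text above.

   def holb_groups(n_ports: int):
       """Yield (group1, group2) port tuples, one pair per iteration."""
       for shift in range(n_ports):
           ports = [((p + shift - 1) % n_ports) + 1 for p in range(1, 9)]
           yield tuple(ports[0:4]), tuple(ports[4:8])

   for g1, g2 in holb_groups(12):
       print(f"group 1: {g1}, group 2: {g2}")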
2) Measure with N/4 groups with N DUT ports

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

The traffic from each ingress port is split across 4 egress ports (100/4 = 25%).

First iteration: Expand to fully utilize all the DUT ports in increments of four. Repeat the methodology of 1) with all the groups of ports possible to achieve on the device, and measure the amount of traffic loss for each port group.

Second iteration: Shift the start of each consecutive group of ports by +1.

Last iteration: Shift the start of each consecutive group of ports by N-1 and measure the traffic loss for each port group.

5.3 Reporting Format

For each test, the report MUST include:

- The port configuration, including the number and location of ingress and egress ports located on the DUT

- If HOLB was observed in accordance with the HOLB test in section 5

- Percent of traffic loss

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1 Objective

The objective of this test is to measure the values for TCP Goodput [1] and latency with a mix of large and small flows. The test is designed to simulate a mixed environment of stateful flows that require high rates of goodput and stateless flows that require low latency. Stateful flows are created by generating TCP traffic, and stateless flows are created using UDP traffic.

6.2 Methodology

In order to simulate the effects of stateless and stateful traffic on the DUT, there MUST be multiple ingress ports receiving traffic destined for the same egress port. There also MAY be a mix of stateful and stateless traffic arriving on a single ingress port. The simplest setup would be 2 ingress ports receiving traffic destined to the same egress port.

One ingress port MUST maintain a TCP connection through the ingress port to a receiver connected to an egress port. Traffic in the TCP stream MUST be sent at the maximum rate allowed by the traffic generator. At the same time as the TCP traffic is flowing through the DUT, the stateless traffic is sent destined to a receiver on the same egress port. The stateless traffic MUST be a microburst of 100% intensity.

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

For example:

Stateful Traffic port variation (TCP traffic):

TCP traffic needs to be generated in this section. During the iterations, the number of Egress ports MAY vary as well.

First Iteration: 1 Ingress port receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Second Iteration: 2 Ingress ports receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Last Iteration: N-2 Ingress ports receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Stateless Traffic port variation (UDP traffic):

UDP traffic needs to be generated for this test. During the iterations, the number of Egress ports MAY vary as well.

First Iteration: 1 Ingress port receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Second Iteration: 1 Ingress port receiving stateful TCP traffic and 2 Ingress ports receiving stateless traffic destined to 1 Egress Port

Last Iteration: 1 Ingress port receiving stateful TCP traffic and N-2 Ingress ports receiving stateless traffic destined to 1 Egress Port
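A minimal sketch of the two port-variation plans above, for illustration only (N is the DUT port count; one egress port is assumed):

   # Sketch: iteration plans for the incast test. One variation grows
   # the number of stateful (TCP) ingress ports with one stateless
   # (UDP) ingress port fixed; the other grows the stateless ports
   # with one stateful port fixed. One egress port is assumed.

   def incast_iterations(n_ports: int, vary: str):
       """Yield (tcp_ingress, udp_ingress) port counts per iteration."""
       for k in range(1, n_ports - 1):  # 1 .. N-2 ports of varied type
           if vary == "stateful":
               yield k, 1
           else:  # "stateless"
               yield 1, k

   for tcp, udp in incast_iterations(8, "stateful"):
       print(f"{tcp} TCP ingress + {udp} UDP ingress -> 1 egress port")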
6.3 Reporting Format

The report MUST include the following:

- Number of ingress and egress ports, along with the designation of stateful or stateless flow assignment.

- Stateful flow goodput

- Stateless flow latency

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

7. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network, or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for benchmarking purposes. Any implications for network security arising from the DUT SHOULD be identical in the lab and in production networks.

8. IANA Considerations

No IANA action is requested at this time.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991, <https://www.rfc-editor.org/info/rfc1242>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999, <https://www.rfc-editor.org/info/rfc2544>.

9.2. Informative References

[draft-ietf-bmwg-dcbench-terminology] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology", draft-ietf-bmwg-dcbench-terminology (work in progress), April 2017. [Note to RFC Editor: please replace the draft name and date with the RFC number when that document is published.]

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000, <https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004, <https://www.rfc-editor.org/info/rfc3918>.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, July 2013, <https://www.rfc-editor.org/info/rfc6985>.

[1] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph, "Understanding TCP Incast Throughput Collapse in Datacenter Networks", <http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998, <https://www.rfc-editor.org/info/rfc2432>.

9.3. Acknowledgements

The authors would like to thank Alfred Morton and Scott Bradner for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com