Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires: December 22, 2017                                       J. Rapp
June 20, 2017                                                     VMware

                   Data Center Benchmarking Methodology
                  draft-ietf-bmwg-dcbench-methodology-13

Abstract

The purpose of this informational document is to establish a test and evaluation methodology and measurement techniques for physical network equipment in the data center. Many of these terms and methods may be applicable beyond this publication's scope, as the technologies originally applied in the data center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology format and repeatability recommendation
2. Line Rate Testing
   2.1 Objective
   2.2 Methodology
   2.3 Reporting Format
3. Buffering Testing
   3.1 Objective
   3.2 Methodology
   3.3 Reporting format
4. Microburst Testing
   4.1 Objective
   4.2 Methodology
   4.3 Reporting Format
5. Head of Line Blocking
   5.1 Objective
   5.2 Methodology
   5.3 Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1 Objective
   6.2 Methodology
   6.3 Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
   9.3. Acknowledgements
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. Traffic can be largely east-west in one data center and north-south in another, while other data centers may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive, while containing a mix of UDP and TCP traffic. All of these can coexist in a single cluster and flow through a single network device simultaneously. Benchmarking of network devices has long relied on [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918], which have largely focused on various latency attributes and the Throughput [RFC2889] of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical Throughput, forwarding rates, and latency under testing conditions; however, they do not represent the real traffic patterns that may affect these networking devices.

This document provides a methodology for benchmarking data center physical network equipment (the DUT), covering congestion scenarios, switch buffer analysis, microbursts, and head-of-line blocking, while also using a wide mix of traffic conditions. The terminology document [1] is a prerequisite.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
1.2. Methodology format and repeatability recommendation

The format used for each test section of this document is the following:

- Objective

- Methodology

- Reporting Format

Additional interpretation of RFC 2119 terms:

MUST: required metric or benchmark for the scenario described (minimum)

SHOULD or RECOMMENDED: strongly suggested metric for the scenario described

MAY: optional metric for the scenario described

For each test methodology described, it is critical to obtain repeatable results. The recommendation is to perform enough iterations of the given test to make sure the results are consistent. This is especially important for Section 3, as buffer measurement has historically been the least reliable. The number of iterations SHOULD be explicitly reported. The relative standard deviation SHOULD be below 10%.

2. Line Rate Testing

2.1 Objective

Provide a maximum-rate test for the performance values of Throughput, latency, and jitter. It is meant to provide the tests to perform and the methodology to verify that a DUT is capable of forwarding packets at line rate under non-congested conditions.

2.2 Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two tests MUST be conducted: a port-pair test (compliant with RFC 2544/3918, Section 15) and a full-mesh test of the DUT (compliant with RFC 2889/3918, Section 16).

For all tests, the test traffic generator's sending rate MUST be less than or equal to 99.98% of the nominal value of the line rate (with no further PPM adjustment to account for interface clock tolerances), to ensure the DUT is stressed under reasonable worst-case conditions (see [1], Section 5, for more details; note to RFC Editor: please replace all [1] references in this document with the future RFC number of that draft). Test results at a lower rate MAY be provided for a better understanding of the performance increase, in terms of latency and jitter, when the rate is lower than 99.98%. The receiving rate of the traffic SHOULD be captured during this test as a percentage of line rate.

The test MUST provide the statistics of minimum, average, and maximum of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average, and maximum of the jitter distribution, for the exact same iteration of the test.

Alternatively, when a traffic generator cannot be connected to all ports on the DUT, a snake test MUST be used for line rate testing, excluding latency and jitter, as those measurements then become irrelevant. The snake test consists of the following steps:

- connect the first and last port of the DUT to a traffic generator

- connect all the ports in between back to back sequentially: port 2 to port 3, port 4 to port 5, and so on, up to port n-2 to port n-1, where n is the total number of ports of the DUT

- configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the same VLAN Y, and so on, with ports n-1 and n in the same VLAN Z

This snake test provides the capability to test line rate for Layer 2 and Layer 3 (RFC 2544/3918) in instances where only a two-port traffic generator is available. Latency and jitter are not to be considered with this test.
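As an illustration of the 99.98% rate target above, the following sketch (Python) computes the frames per second a generator would offer per port for a given frame size. It is not part of the methodology; it assumes standard Ethernet per-frame overhead of 20 bytes (preamble, start-of-frame delimiter, and minimum inter-frame gap), and the 10GE port speed in the example is purely illustrative.

   # Sketch: offered frame rate at 99.98% of nominal line rate.
   # Assumes standard Ethernet framing overhead; values are illustrative.

   ETH_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap

   def offered_fps(line_rate_bps, frame_size_bytes, fraction=0.9998):
       """Frames per second sent per port at 'fraction' of nominal line rate."""
       bits_per_frame = (frame_size_bytes + ETH_OVERHEAD_BYTES) * 8
       return (line_rate_bps * fraction) / bits_per_frame

   # Example: a 10GE port and the packet sizes listed in Section 2.3.
   for size in (64, 128, 256, 512, 1024, 1518, 4096, 8000, 9216):
       print(size, round(offered_fps(10e9, size)))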
2.3 Reporting Format

The report MUST include:

- physical-layer calibration information, as defined in [1], Section 4

- number of ports used

- reading for "Throughput received in percentage of bandwidth", while sending 99.98% of the nominal value of the line rate on each port, for each packet size from 64 bytes to 9216 bytes. As guidance, an increment of 64 bytes between each iteration is ideal; 256-byte and 512-byte increments are also often used. The most common packet sizes ordered for the report are: 64 B, 128 B, 256 B, 512 B, 1024 B, 1518 B, 4096 B, 8000 B, and 9216 B. The pattern for testing can be expressed using [RFC6985].

- Throughput needs to be expressed as a percentage of total transmitted frames

- packet drops MUST be expressed as a count of packets and SHOULD be expressed as a percentage of line rate

- for latency and jitter, values expressed in units of time (usually microseconds or nanoseconds), read across packet sizes from 64 bytes to 9216 bytes

- for latency and jitter, provide minimum, average, and maximum values. If different iterations are done to gather the minimum, average, and maximum, this SHOULD be specified in the report, along with a justification of why the information could not have been gathered in the same test iteration

- for jitter, a histogram describing the population of packets measured per latency or latency buckets is RECOMMENDED

- the tests for Throughput, latency, and jitter MAY be conducted as individual independent trials, with proper documentation in the report, but SHOULD be conducted at the same time

- the methodology assumes that the DUT has at least nine ports, as certain procedures require that many ports or more

3. Buffering Testing

3.1 Objective

To measure the size of the buffer of a DUT under various conditions. Buffer architectures between multiple DUTs can differ and include egress buffering, shared egress buffering SoC (Switch-on-Chip), ingress buffering, or a combination. The test methodology covers the buffer measurement regardless of the buffer architecture used in the DUT.

3.2 Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data-center switch is based on using known congestion of a known, fixed packet size, along with maximum latency value measurements. The maximum latency will increase until the first packet drop occurs. At this point, the maximum latency value will remain constant. This is the point of inflection of this maximum latency change to a constant value. There MUST be multiple ingress ports receiving a known amount of frames at a known fixed size, destined for the same egress port, in order to create a known congestion condition. The total amount of packets sent from the oversubscribed port, minus one, multiplied by the packet size, represents the maximum port buffer size at the measured inflection point.
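The inflection-point arithmetic above can be sketched as follows (Python). This is a minimal illustration, not part of the methodology: the trial data would come from the traffic generator, the latency series is assumed to be clean and monotone up to the plateau, and the variable names are hypothetical.

   # Sketch: estimate the maximum port buffer size from the point where
   # maximum latency stops increasing (i.e., the first drop has occurred).
   # 'trials' is assumed to be a list of (frames_sent, max_latency_us)
   # pairs from the oversubscribing port, ordered by increasing frames_sent.

   def port_buffer_bytes(trials, frame_size_bytes):
       prev_latency = None
       for frames_sent, max_latency in trials:
           # Inflection point: maximum latency no longer increases.
           if prev_latency is not None and max_latency <= prev_latency:
               return (frames_sent - 1) * frame_size_bytes
           prev_latency = max_latency
       return None  # no inflection observed; extend the trial range

   # Example with illustrative numbers: latency stops increasing at 40000
   # frames, so the estimated buffer is (40000 - 1) * 64 bytes.
   samples = [(10000, 50.0), (20000, 90.0), (30000, 130.0),
              (40000, 130.0), (50000, 130.0)]
   print(port_buffer_bytes(samples, 64))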
1) Measure the highest buffer efficiency

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with a packet size of 64 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Second iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with the same pattern but a packet size of 65 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Last iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with a packet size of B bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

The packet size B found to provide the largest measured buffer size is the size that allows the highest buffer efficiency.

2) Measure maximum port buffer size

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

At the fixed packet size B determined in procedure 1), with a fixed default Differentiated Services Code Point (DSCP)/Class of Service (COS) value of 0 and with unicast traffic, proceed with the following:

First iteration: ingress port 1 sends line rate to egress port 2, while port 3 sends a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 2. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Second iteration: ingress port 2 sends line rate to egress port 3, while port 4 sends a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 3. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Last iteration: ingress port N-2 sends line rate traffic to egress port N-1, while port N sends a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port N-1. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

This test series MAY be repeated using all the different DSCP/COS values of traffic, and then using multicast traffic, in order to determine whether the DSCP/COS value has any impact on the buffer size.

3) Measure maximum port pair buffer sizes

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sends line rate to egress port 2; ingress port 3 sends line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 2 and egress port 3, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.
Second iteration: ingress port 1 sends line rate to egress port 2; ingress port 3 sends line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 4 and egress port 5, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

Last iteration: ingress port 1 sends line rate to egress port 2; ingress port 3 sends line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port N-3 and egress port N-2, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

This test series MAY be repeated using all the different DSCP/COS values of traffic and then using multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress ports 1, 2, ..., N-1 each send [(1/[N-1])*99.98]+[1/[N-1]] % of line rate to egress port N (that is, an even share of 99.98% of line rate plus an extra 1/[N-1] % per port, so that the egress port is oversubscribed by approximately 1%).

Second iteration: ingress ports 2, ..., N each send [(1/[N-1])*99.98]+[1/[N-1]] % of line rate to egress port 1.

Last iteration: ingress ports N, 1, 2, ..., N-2 each send [(1/[N-1])*99.98]+[1/[N-1]] % of line rate to egress port N-1.

This test series MAY be repeated using all the different COS values of traffic and then using multicast traffic.

Unicast traffic and then multicast traffic SHOULD be used in order to determine the proportion of buffer used for the documented selection of tests. Also, the COS value for the packets SHOULD be provided for each test iteration, as the buffer allocation size MAY differ per COS value. It is RECOMMENDED that the ingress and egress ports are varied in a random, but documented, fashion in multiple tests to measure the buffer size for each port of the DUT.

3.3 Reporting format

The report MUST include:

- the packet size that makes the most efficient use of the buffer, along with the DSCP/COS value

- the maximum port buffer size for each port

- the maximum DUT buffer size

- the packet size used in the test

- the amount of oversubscription, if different than 1%

- the number of ingress and egress ports, along with their location on the DUT

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results for each of the tests (min, max, avg)

The percentage of variation is a metric providing a sense of how large the difference is between the measured value and the previous one.

For example, for a latency test where the minimum latency is measured, the percentage of variation of the minimum latency will indicate by how much this value has varied between the current test executed and the previous one.

PV = ((x2 - x1) / x1) * 100, where x2 is the minimum latency value in the current test and x1 is the minimum latency value obtained in the previous test.

The same formula is used for the maximum and average variations measured.
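As a minimal illustration of the formula above (Python; the latency values in the example are purely illustrative):

   def percentage_of_variation(x1, x2):
       """Percentage of variation (PV) between two iterations of the same
       measurement: PV = ((x2 - x1) / x1) * 100."""
       return ((x2 - x1) / x1) * 100.0

   # Example: a minimum latency of 1.00 us in the previous iteration and
   # 1.04 us in the current one gives a PV of about 4%.
   print(percentage_of_variation(1.00, 1.04))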
4. Microburst Testing

4.1 Objective

To find the maximum amount of packet bursts that a DUT can sustain under various configurations.

This test provides additional methodology that supplements the other RFC tests:

- all bursts should be sent with 100% intensity (intensity is defined in [1], Section 6.1.1)

- all ports of the DUT must be used for this test

- all ports are recommended to be tested simultaneously

4.2 Methodology

A traffic generator MUST be connected to all ports on the DUT. In order to cause congestion, two or more ingress ports MUST send bursts of packets destined for the same egress port. The simplest of the setups would be two ingress ports and one egress port (2-to-1).

The burst MUST be sent with an intensity of 100% (intensity is defined in [1], Section 6.1.1), meaning that the burst of packets will be sent with a minimum inter-packet gap. The number of packets contained in the burst is the trial variable and is increased until a non-zero packet loss is measured. The aggregate number of packets from all the senders is used to calculate the maximum microburst the DUT can sustain (a search procedure for this value is sketched at the end of Section 4).

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates. Intensity of a microburst is defined in [1].

It is RECOMMENDED that all ports on the DUT be tested simultaneously, and in various configurations, in order to understand all the combinations of ingress ports, egress ports, and intensities.

An example would be:

First Iteration: N-1 ingress ports sending to 1 egress port

Second Iteration: N-2 ingress ports sending to 2 egress ports

Last Iteration: 2 ingress ports sending to N-2 egress ports

4.3 Reporting Format

The report MUST include:

- the maximum number of packets received per ingress port with the maximum burst size obtained with zero packet loss

- the packet size used in the test

- the number of ingress and egress ports, along with their location on the DUT

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)
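The search for the maximum burst size described in Section 4.2 can be sketched as follows (Python). This is a minimal illustration under stated assumptions: send_burst() and lost_packets() are hypothetical hooks into the traffic generator, loss is assumed to be monotone in the burst size, and a binary search is used in place of the linear increase described above (it converges on the same value under that assumption).

   # Sketch: find the largest aggregate burst (in packets) forwarded with
   # zero loss. The two callbacks are hypothetical traffic-generator hooks.

   def max_microburst(send_burst, lost_packets, upper_bound):
       """send_burst(n): transmit an n-packet burst at 100% intensity from
       the configured ingress ports toward the congested egress port.
       lost_packets(): number of packets of that burst not received."""
       low, high = 0, upper_bound
       while low < high:
           trial = (low + high + 1) // 2
           send_burst(trial)
           if lost_packets() == 0:
               low = trial        # burst absorbed by the buffer; try larger
           else:
               high = trial - 1   # loss observed; try smaller
       return low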
5. Head of Line Blocking

5.1 Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon that occurs when packets are held up by the first packet ahead of them, which is waiting to be transmitted to a different output port. This is defined in RFC 2889, Section 5.5, "Congestion Control". This section expands on RFC 2889 in the context of Data Center Benchmarking.

The objective of this test is to understand the DUT behavior under a head-of-line blocking scenario and to measure the packet loss.

Here are the differences between this HOLB test and RFC 2889:

- this HOLB test starts with eight ports in two groups of four, instead of the four ports used in RFC 2889

- this HOLB test shifts all the port numbers by one in a second iteration of the test; this is new compared to RFC 2889. The shifting of port numbers continues until all ports have been the first port of a group. The purpose is to make sure all permutations have been tested, in order to cover differences of behavior in the SoC of the DUT

- another test in this HOLB section expands the group of ports, such that traffic is divided among four egress ports instead of two (25% instead of 50% per port)

- Section 5.3 adds reporting requirements beyond those of the Congestion Control test in RFC 2889

5.2 Methodology

In order to cause congestion in the form of head-of-line blocking, groups of four ports are used. A group has two ingress ports and two egress ports. The first ingress port MUST have two flows configured, each going to a different egress port. The second ingress port will congest the second egress port by sending line rate. The goal is to measure whether there is loss on the flow to the first egress port, which is not oversubscribed.

A traffic generator MUST be connected to at least eight ports on the DUT and SHOULD be connected using all the DUT ports.

1) Measure two groups with eight DUT ports

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: measure the packet loss for two groups with consecutive ports.

The first group is composed as follows: ingress port 1 sends 50% of its traffic to egress port 3 and 50% of its traffic to egress port 4. Ingress port 2 sends line rate to egress port 4. Measure the amount of traffic loss for the traffic from ingress port 1 to egress port 3.

The second group is composed as follows: ingress port 5 sends 50% of its traffic to egress port 7 and 50% of its traffic to egress port 8. Ingress port 6 sends line rate to egress port 8. Measure the amount of traffic loss for the traffic from ingress port 5 to egress port 7.

Second iteration: repeat the first iteration by shifting all the ports from N to N+1.

The first group is composed as follows: ingress port 2 sends 50% of its traffic to egress port 4 and 50% of its traffic to egress port 5. Ingress port 3 sends line rate to egress port 5. Measure the amount of traffic loss for the traffic from ingress port 2 to egress port 4.

The second group is composed as follows: ingress port 6 sends 50% of its traffic to egress port 8 and 50% of its traffic to egress port 9. Ingress port 7 sends line rate to egress port 9. Measure the amount of traffic loss for the traffic from ingress port 6 to egress port 8.

Last iteration: when the first port of the first group is connected to the last DUT port and the last port of the second group is connected to the seventh port of the DUT.

Measure the amount of traffic loss for the traffic from ingress port N to egress port 2 and from ingress port 4 to egress port 6.
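The port grouping and per-iteration shift described in procedure 1) can be sketched as follows (Python). This is only an illustration of the port arithmetic: ports are assumed to be numbered 1..N, and the role order within each tuple follows the description above.

   # Sketch: enumerate the two four-port groups for each iteration of the
   # HOLB test, shifting every port number by one per iteration and
   # wrapping around the DUT ports.

   def holb_groups(num_ports, shift):
       """Return two tuples (ingress_a, ingress_b, egress_a, egress_b).
       ingress_a splits its traffic 50/50 between egress_a and egress_b;
       ingress_b congests egress_b at line rate; loss is measured on the
       ingress_a -> egress_a flow."""
       def port(p):
           return (p - 1 + shift) % num_ports + 1
       return ((port(1), port(2), port(3), port(4)),
               (port(5), port(6), port(7), port(8)))

   # Iteration 0 uses ports 1-8; each following iteration shifts by one,
   # until every port has been the first port of a group.
   for shift in range(3):
       print(holb_groups(32, shift))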
2) Measure with N/4 groups with N DUT ports

The tests described in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

In this procedure, the traffic from the first ingress port of each group is split across four egress ports (100/4 = 25% per egress port).

First iteration: expand to fully utilize all the DUT ports in increments of four. Repeat the methodology of procedure 1) with all the groups of ports possible to achieve on the device, and measure the amount of traffic loss for each port group.

Second iteration: shift the start of each consecutive group of ports by +1.

Last iteration: shift the start of each consecutive group of ports by N-1, and measure the traffic loss for each port group.

5.3 Reporting Format

For each test, the report MUST include:

- the port configuration, including the number and location of ingress and egress ports located on the DUT

- whether HOLB was observed, in accordance with the HOLB test in Section 5

- the percentage of traffic loss

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1 Objective

The objective of this test is to measure the values for TCP Goodput [4] and latency with a mix of large and small flows. The test is designed to simulate a mixed environment of stateful flows that require high rates of goodput and stateless flows that require low latency. Stateful flows are created by generating TCP traffic, and stateless flows are created using UDP traffic.

6.2 Methodology

In order to simulate the effects of stateless and stateful traffic on the DUT, there MUST be multiple ingress ports receiving traffic destined for the same egress port. There also MAY be a mix of stateful and stateless traffic arriving on a single ingress port. The simplest setup would be two ingress ports receiving traffic destined to the same egress port.

One ingress port MUST be maintaining a TCP connection through the ingress port to a receiver connected to an egress port. Traffic in the TCP stream MUST be sent at the maximum rate allowed by the traffic generator. At the same time as the TCP traffic is flowing through the DUT, the stateless traffic is sent, destined to a receiver on the same egress port. The stateless traffic MUST be a microburst of 100% intensity.

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

For example:

Stateful Traffic port variation (TCP traffic):

TCP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.
First Iteration: 1 ingress port receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Second Iteration: 2 ingress ports receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Last Iteration: N-2 ingress ports receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Stateless Traffic port variation (UDP traffic):

UDP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.

First Iteration: 1 ingress port receiving stateful TCP traffic and 1 ingress port receiving stateless traffic, destined to 1 egress port

Second Iteration: 1 ingress port receiving stateful TCP traffic and 2 ingress ports receiving stateless traffic, destined to 1 egress port

Last Iteration: 1 ingress port receiving stateful TCP traffic and N-2 ingress ports receiving stateless traffic, destined to 1 egress port

6.3 Reporting Format

The report MUST include the following:

- the number of ingress and egress ports, along with the designation of stateful or stateless flow assignment

- the stateful flow goodput

- the stateless flow latency

- the repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)
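As a minimal illustration of the stateful-flow metric reported above (Python): goodput is the application-layer throughput, counting only payload bytes successfully delivered and excluding protocol headers and retransmissions (see [1] and [4]); the byte and time values in the example are purely illustrative.

   def goodput_bps(payload_bytes_delivered, elapsed_seconds):
       """Goodput of a stateful (TCP) flow in bits per second: application
       payload delivered to the receiver per unit of time, excluding
       headers and retransmitted data."""
       return (payload_bytes_delivered * 8) / elapsed_seconds

   # Example: 1.25e9 bytes of application data delivered in 1.2 seconds
   # corresponds to roughly 8.3 Gbit/s of goodput.
   print(goodput_bps(1.25e9, 1.2))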
7. Security Considerations

Benchmarking activities, as described in this memo, are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.

8. IANA Considerations

No IANA action is requested at this time.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999.

9.2. Informative References

[1] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology", April 2017.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, August 2000.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, October 2004.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, July 2013.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph, "Understanding TCP Incast Throughput Collapse in Datacenter Networks", http://yanpeichen.com/professional/usenixLoginIncastReady.pdf

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998.

9.3. Acknowledgements

The authors would like to thank Alfred Morton and Scott Bradner for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com