Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires: December 23, 2017                                       J. Rapp
June 21, 2017                                                     VMware

                  Data Center Benchmarking Methodology
                draft-ietf-bmwg-dcbench-methodology-17

Abstract

The purpose of this informational document is to establish test and evaluation methodology and measurement techniques for physical network equipment in the data center. A prerequisite to this publication is the terminology document [draft-ietf-bmwg-dcbench-terminology]. Many of these terms and methods may be applicable beyond this publication's scope, as the technologies originally applied in the data center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology format and repeatability recommendation
2. Line Rate Testing
   2.1 Objective
   2.2 Methodology
   2.3 Reporting Format
3. Buffering Testing
   3.1 Objective
   3.2 Methodology
   3.3 Reporting format
4 Microburst Testing
   4.1 Objective
   4.2 Methodology
   4.3 Reporting Format
5. Head of Line Blocking
   5.1 Objective
   5.2 Methodology
   5.3 Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1 Objective
   6.2 Methodology
   6.3 Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
   9.3. Acknowledgements
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. Traffic can be largely east-west (server to server inside the data center) in one data center and north-south (outside of the data center to server) in another, while others may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive while containing a mix of UDP and TCP traffic. All of these can coexist in a single cluster and flow through a single network device simultaneously. Benchmarking of network devices has long used [RFC1242], [RFC2432], [RFC2544], [RFC2889] and [RFC3918], which have largely focused on various latency attributes and the Throughput [RFC2889] of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical Throughput, forwarding rates and latency under testing conditions; however, they do not represent real traffic patterns that may affect these networking devices.
Currently, typical data center networking devices are characterized by:

-High port density (48 ports or more)

-High speed (currently up to 100 Gb/s per port)

-High throughput (line rate on all ports for Layer 2 and/or Layer 3)

-Low latency (in the microsecond or nanosecond range)

-Low amount of buffer (in the MB range per networking device)

-Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

This document provides a methodology for benchmarking Data Center physical network equipment DUTs, including congestion scenarios, switch buffer analysis, microburst and head of line blocking, while also using a wide mix of traffic conditions. The terminology document [draft-ietf-bmwg-dcbench-terminology] is a prerequisite.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Methodology format and repeatability recommendation

The format used for each section of this document is the following:

-Objective

-Methodology

-Reporting Format

Additional interpretation of RFC 2119 terms: for each test methodology described, it is critical to obtain repeatability in the results. The recommendation is to perform enough iterations of the given test to make sure the result is consistent. This is especially important for section 3, as the buffering testing has historically been the least reliable. The number of iterations SHOULD be explicitly reported. The relative standard deviation SHOULD be below 10%.

2. Line Rate Testing

2.1 Objective

Provide a maximum rate test for the performance values for Throughput, latency and jitter. It is meant to provide the tests to perform, and the methodology to verify that a DUT is capable of forwarding packets at line rate under non-congested conditions.

2.2 Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two tests MUST be conducted: a port-pair test ([RFC2544]/[RFC3918] section 15 compliant) and a full-mesh test ([RFC2889]/[RFC3918] section 16 compliant).

For all tests, the test traffic generator sending rate MUST be less than or equal to 99.98% of the nominal value of Line Rate (with no further PPM adjustment to account for interface clock tolerances), to ensure stressing the DUT in reasonable worst case conditions (see RFC [draft-ietf-bmwg-dcbench-terminology] section 5 for more details -- note to RFC Editor, please replace all [draft-ietf-bmwg-dcbench-terminology] references in this document with the future RFC number of that draft). Test results at a lower rate MAY be provided for better understanding of the performance increase in terms of latency and jitter when the rate is lower than 99.98%. The receiving rate of the traffic SHOULD be captured during this test as a percentage of line rate.

The test MUST provide the statistics of minimum, average and maximum of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average and maximum of the jitter distribution, for the exact same iteration of the test.
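As an illustration only (not part of the methodology), the following minimal sketch shows how the 99.98% sending rate translates into a frames-per-second target, assuming standard Ethernet per-frame overhead of 20 bytes (7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap):

   # Sketch: frames-per-second target at 99.98% of nominal line rate.
   # Assumes 20 bytes of Ethernet overhead per frame (preamble + SFD +
   # inter-frame gap); generator and DUT specifics are out of scope.

   ETH_OVERHEAD_BYTES = 20

   def target_fps(line_rate_bps: float, frame_size_bytes: int,
                  fraction: float = 0.9998) -> float:
       """Frames per second at `fraction` of the nominal line rate."""
       bits_per_frame = (frame_size_bytes + ETH_OVERHEAD_BYTES) * 8
       return line_rate_bps * fraction / bits_per_frame

   # Example: a 10 Gb/s port across the common report frame sizes.
   for size in (64, 128, 256, 512, 1024, 1518, 4096, 8000, 9216):
       print(f"{size:>5} B: {target_fps(10e9, size):,.0f} fps")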
Alternatively, when a traffic generator cannot be connected to all ports on the DUT, a snake test MUST be used for line rate testing, excluding latency and jitter, as those then become irrelevant. The snake test consists of the following method:

-connect the first and last port of the DUT to a traffic generator

-connect back to back sequentially all the ports in between: port 2 to port 3, port 4 to port 5, and so on, until port n-2 is connected to port n-1, where n is the total number of ports of the DUT

-configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the same VLAN Y, and so on, with ports n-1 and n in the same VLAN Z.

This snake test provides the capability to test line rate for Layer 2 and Layer 3 ([RFC2544]/[RFC3918]) in instances where a traffic generator with only two ports is available. The latency and jitter are not to be considered with this test.

2.3 Reporting Format

The report MUST include:

-physical layer calibration information as defined in [draft-ietf-bmwg-dcbench-terminology] section 4.

-number of ports used

-reading for "Throughput received in percentage of bandwidth", while sending 99.98% of the nominal value of Line Rate on each port, for each packet size from 64 bytes to 9216 bytes. As guidance, an increment of 64 bytes between each iteration is ideal; increments of 256 bytes and 512 bytes are also often used. The most common packet sizes ordered for the report are: 64b, 128b, 256b, 512b, 1024b, 1518b, 4096b, 8000b, 9216b.

The pattern for testing can be expressed using [RFC6985].

-Throughput needs to be expressed as a percentage of total transmitted frames

-Packet drops MUST be expressed as a count of packets and SHOULD be expressed as a percentage of line rate

-For latency and jitter, values expressed in units of time (usually microseconds or nanoseconds), read across packet sizes from 64 bytes to 9216 bytes

-For latency and jitter, provide minimum, average and maximum values. If different iterations are done to gather the minimum, average and maximum, this SHOULD be specified in the report, along with a justification of why the information could not be gathered in the same test iteration

-For jitter, a histogram describing the population of packets measured per latency or latency buckets is RECOMMENDED

-The tests for Throughput, latency and jitter MAY be conducted as individual independent trials, with proper documentation in the report, but SHOULD be conducted at the same time.

-The methodology assumes that the DUT has at least nine ports, as certain tests require that number of ports or more.

3. Buffering Testing

3.1 Objective

To measure the size of the buffer of a DUT under typical/many/multiple conditions. Buffer architectures between multiple DUTs can differ and include egress buffering, shared egress buffering SoC (Switch-on-Chip), ingress buffering or a combination. The test methodology covers the buffer measurement regardless of the buffer architecture used in the DUT.

3.2 Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data-center switch is based on using known congestion of a known fixed packet size, along with maximum latency value measurements. The maximum latency will increase until the first packet drop occurs. At this point, the maximum latency value will remain constant. This is the point of inflection, where the maximum latency changes to a constant value. There MUST be multiple ingress ports receiving a known amount of frames at a known fixed size, destined for the same egress port, in order to create a known congestion condition. The total amount of packets sent from the oversubscribed port, minus one, multiplied by the packet size represents the maximum port buffer size at the measured inflection point.
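For illustration only, a minimal sketch of deriving the buffer size from the inflection point, assuming the tester records the maximum observed latency after each increase in the number of frames sent from the oversubscribed port (the trial data below are hypothetical):

   # Sketch: maximum port buffer size from the latency inflection point.
   # `trials` maps frames sent from the oversubscribed port to the
   # maximum latency observed; the inflection point is where maximum
   # latency stops growing, i.e. where the first packet drop occurred.

   def buffer_size_bytes(trials: dict, frame_size: int,
                         tolerance: float = 0.0):
       """Return (frames at inflection - 1) * frame_size, or None."""
       prev_latency = None
       for frames in sorted(trials):
           latency = trials[frames]
           if prev_latency is not None and \
              latency - prev_latency <= tolerance:
               # Latency plateaued: the buffer limit was reached.
               return (frames - 1) * frame_size
           prev_latency = latency
       return None  # no inflection observed within this sweep

   # Hypothetical sweep: maximum latency grows, then plateaus near
   # 4000 frames of 64 bytes each.
   sweep = {1000: 12.0, 2000: 24.0, 3000: 36.0, 4000: 42.0, 5000: 42.0}
   print(buffer_size_bytes(sweep, frame_size=64))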
1) Measure the highest buffer efficiency

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with a packet size of 64 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Second iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with a packet size of 65 bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

Last iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with a packet size of B bytes to egress port 2. Measure the buffer size value as the number of frames sent from the port sending the oversubscribed traffic up to the inflection point, multiplied by the frame size.

When the B value is found to provide the largest buffer size, then size B allows the highest buffer efficiency.

2) Measure maximum port buffer size

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

At the fixed packet size B determined in procedure 1), for a fixed default Differentiated Services Code Point (DSCP)/Class of Service (COS) value of 0 and for unicast traffic, proceed with the following:

First iteration: ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of over-subscription traffic (1% recommended) with the same packet size to egress port 2. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Second iteration: ingress port 2 sending line rate to egress port 3, while port 4 is sending a known low amount of over-subscription traffic (1% recommended) with the same packet size to egress port 3. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

Last iteration: ingress port N-2 sending line rate traffic to egress port N-1, while port N is sending a known low amount of over-subscription traffic (1% recommended) with the same packet size to egress port N-1. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

This test series MAY be repeated using all the different DSCP/COS values of traffic, and then using Multicast traffic, in order to find out if there is any DSCP/COS impact on the buffer size.
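A small sketch of the procedure 1) sweep follows, for illustration only; run_buffer_trial is a hypothetical hook into the traffic generator that runs one congestion trial and returns the number of frames the oversubscribing port delivered up to the inflection point:

   # Sketch: find the packet size B with the highest buffer efficiency.
   # For each candidate size, run the congestion trial of procedure 1)
   # and record the measured buffer size; the size yielding the largest
   # buffer is B. `run_buffer_trial` is a hypothetical generator hook.

   def find_most_efficient_size(sizes, run_buffer_trial):
       results = {}
       for b in sizes:  # e.g. 64, 65, 66, ... bytes
           frames_to_inflection = run_buffer_trial(packet_size=b)
           results[b] = frames_to_inflection * b  # buffer size in bytes
       best = max(results, key=results.get)
       return best, results[best]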
3) Measure maximum port pair buffer sizes

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2; ingress port 3 sending line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 2 and port 3 respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

Second iteration: ingress port 1 sending line rate to egress port 2; ingress port 3 sending line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port 4 and port 5 respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

Last iteration: ingress port 1 sending line rate to egress port 2; ingress port 3 sending line rate to egress port 4, and so on. Ingress ports N-1 and N will oversubscribe, at 1% of line rate, egress port N-3 and port N-2 respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

This test series MAY be repeated using all the different DSCP/COS values of traffic, and then using Multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: ingress ports 1,2,... N-1 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N.

Second iteration: ingress ports 2,... N each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port 1.

Last iteration: ingress ports N,1,2...N-2 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N-1.

This test series MAY be repeated using all the different COS values of traffic, and then using Multicast traffic.

Unicast traffic and then Multicast traffic SHOULD be used in order to determine the proportion of buffer used, for the documented selection of tests. Also, the COS value for the packets SHOULD be provided for each test iteration, as the buffer allocation size MAY differ per COS value. It is RECOMMENDED that the ingress and egress ports be varied in a random, but documented, fashion in multiple tests to measure the buffer size for each port of the DUT.
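For illustration only, the per-ingress-port rate in procedure 4) reduces to 100.98/(N-1) percent of line rate: each of the N-1 ingress ports sends an equal share of 99.98% of the egress line rate plus an equal share of 1% of over-subscription. A minimal sketch (N is the DUT port count):

   # Sketch: per-port rate for procedure 4), where N-1 ingress ports
   # together oversubscribe one egress port by about 1%:
   # (1/(N-1))*99.98 + 1/(N-1) = 100.98/(N-1) percent of line rate.

   def per_port_rate_pct(n_ports: int) -> float:
       """Percent of line rate each of the N-1 ingress ports sends."""
       share = 1 / (n_ports - 1)
       return share * 99.98 + share

   for n in (8, 32, 48):
       rate = per_port_rate_pct(n)
       print(f"N={n}: {rate:.3f}% per port, "
             f"{rate * (n - 1):.2f}% aggregate at the egress port")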
3.3 Reporting format

The report MUST include:

- The packet size used for the most efficient buffer use, along with the DSCP/COS value

- The maximum port buffer size for each port

- The maximum DUT buffer size

- The packet size used in the test

- The amount of over-subscription, if different from 1%

- The number of ingress and egress ports, along with their location on the DUT

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results for each of the tests (min, max, avg)

The percentage of variation is a metric providing a sense of how big the difference is between the measured value and the previous ones.

For example, for a latency test where the minimum latency is measured, the percentage of variation of the minimum latency will indicate by how much this value has varied between the current test executed and the previous one.

PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the current test and x1 is the minimum latency value obtained in the previous test.

The same formula is used for max and avg variations measured.
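For illustration only, a minimal sketch of the PV computation applied to the min, avg and max statistics of two consecutive test iterations (the sample latency values are hypothetical):

   # Sketch: percentage of variation (PV) between two test iterations,
   # PV = ((x2 - x1) / x1) * 100, applied to min, avg and max values.

   def percentage_of_variation(x1: float, x2: float) -> float:
       """x1: value from the previous test; x2: from the current test."""
       return (x2 - x1) / x1 * 100

   previous = {"min": 1.20, "avg": 1.55, "max": 2.10}  # e.g. latency, us
   current = {"min": 1.22, "avg": 1.57, "max": 2.30}
   for stat in ("min", "avg", "max"):
       pv = percentage_of_variation(previous[stat], current[stat])
       print(f"{stat}: PV = {pv:+.2f}%")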
4 Microburst Testing

4.1 Objective

To find the maximum amount of packet bursts a DUT can sustain under various configurations.

This test provides additional methodology to the other RFC tests:

-All bursts should be sent with 100% intensity. Note: intensity is defined in [draft-ietf-bmwg-dcbench-terminology] section 6.1.1

-All ports of the DUT must be used for this test

-All ports are recommended to be tested simultaneously

4.2 Methodology

A traffic generator MUST be connected to all ports on the DUT. In order to cause congestion, two or more ingress ports MUST send bursts of packets destined for the same egress port. The simplest of the setups would be two ingress ports and one egress port (2-to-1).

The burst MUST be sent with an intensity of 100% (intensity is defined in [draft-ietf-bmwg-dcbench-terminology] section 6.1.1), meaning the burst of packets will be sent with a minimum inter-packet gap. The number of packets contained in the burst is the trial variable and increases until a non-zero packet loss is measured. The aggregate number of packets from all the senders will be used to calculate the maximum amount of microburst the DUT can sustain.

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates. Intensity of microburst is defined in [draft-ietf-bmwg-dcbench-terminology].

It is RECOMMENDED that all ports on the DUT be tested simultaneously, and in various configurations, in order to understand all the combinations of ingress ports, egress ports and intensities.

An example would be:

First Iteration: N-1 Ingress ports sending to 1 Egress Port

Second Iteration: N-2 Ingress ports sending to 2 Egress Ports

Last Iteration: 2 Ingress ports sending to N-2 Egress Ports
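A small sketch of this port-combination sweep, for illustration only (N is the DUT port count):

   # Sketch: ingress/egress port-count combinations from the example in
   # section 4.2. Iteration k uses N-k ingress ports sending microbursts
   # to k egress ports, for k = 1 .. N-2.

   def microburst_combinations(n_ports: int):
       for egress in range(1, n_ports - 1):  # 1 .. N-2 egress ports
           ingress = n_ports - egress        # remaining ports as ingress
           yield ingress, egress

   for ingress, egress in microburst_combinations(8):
       print(f"{ingress} ingress ports -> {egress} egress ports")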
4.3 Reporting Format

The report MUST include:

- The maximum number of packets received per ingress port with the maximum burst size obtained with zero packet loss

- The packet size used in the test

- The number of ingress and egress ports, along with their location on the DUT

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

5. Head of Line Blocking

5.1 Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon that occurs when packets are held up by the first packet ahead waiting to be transmitted to a different output port. This is defined in RFC 2889 section 5.5, Congestion Control. This section expands on RFC 2889 in the context of Data Center Benchmarking.

The objective of this test is to understand the DUT behavior under a head of line blocking scenario and to measure the packet loss.

Here are the differences between this HOLB test and RFC 2889:

-This HOLB starts with 8 ports in two groups of 4, instead of the 4 ports in RFC 2889

-This HOLB shifts all the port numbers by one in a second iteration of the test; this is new compared to RFC 2889. The shifting of port numbers continues until all ports have been the first in the group. The purpose is to make sure all permutations have been tested, to cover differences of behavior in the SoC of the DUT

-Another test in this HOLB expands the group of ports, such that traffic is divided among 4 ports instead of two (25% instead of 50% per port)

-Section 5.3 adds additional reporting requirements beyond Congestion Control in RFC 2889

5.2 Methodology

In order to cause congestion in the form of head of line blocking, groups of four ports are used. A group has 2 ingress and 2 egress ports. The first ingress port MUST have two flows configured, each going to a different egress port. The second ingress port will congest the second egress port by sending line rate. The goal is to measure if there is loss on the flow for the first egress port, which is not over-subscribed.

A traffic generator MUST be connected to at least eight ports on the DUT and SHOULD be connected using all the DUT ports.

1) Measure two groups with eight DUT ports

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

First iteration: measure the packet loss for two groups with consecutive ports.

The first group is composed as follows: ingress port 1 is sending 50% of traffic to egress port 3, and ingress port 1 is sending 50% of traffic to egress port 4. Ingress port 2 is sending line rate to egress port 4. Measure the amount of traffic loss for the traffic from ingress port 1 to egress port 3.

The second group is composed as follows: ingress port 5 is sending 50% of traffic to egress port 7, and ingress port 5 is sending 50% of traffic to egress port 8. Ingress port 6 is sending line rate to egress port 8. Measure the amount of traffic loss for the traffic from ingress port 5 to egress port 7.

Second iteration: repeat the first iteration by shifting all the ports from N to N+1.

The first group is composed as follows: ingress port 2 is sending 50% of traffic to egress port 4, and ingress port 2 is sending 50% of traffic to egress port 5. Ingress port 3 is sending line rate to egress port 5. Measure the amount of traffic loss for the traffic from ingress port 2 to egress port 4.

The second group is composed as follows: ingress port 6 is sending 50% of traffic to egress port 8, and ingress port 6 is sending 50% of traffic to egress port 9. Ingress port 7 is sending line rate to egress port 9. Measure the amount of traffic loss for the traffic from ingress port 6 to egress port 8.

Last iteration: when the first port of the first group is connected to the last DUT port and the last port of the second group is connected to the seventh port of the DUT.

Measure the amount of traffic loss for the traffic from ingress port N to egress port 2 and from ingress port 4 to egress port 6.
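For illustration only, a minimal sketch enumerating the shifted port groups of part 1), with ports numbered 1..N and two groups of four consecutive ports wrapping around the DUT:

   # Sketch: enumerate the two four-port groups for each shift iteration
   # of the HOLB test, part 1). Ports are numbered 1..N and groups wrap
   # around, so the last iteration has group 1 = (N, 1, 2, 3) and
   # group 2 = (4, 5, 6, 7), matching the text above.

   def holb_groups(n_ports: int):
       """Yield (group1, group2) port tuples, one pair per iteration."""
       for shift in range(n_ports):
           ports = [((p + shift - 1) % n_ports) + 1 for p in range(1, 9)]
           yield tuple(ports[0:4]), tuple(ports[4:8])

   for g1, g2 in holb_groups(12):
       print(f"group 1: {g1}, group 2: {g2}")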
2) Measure with N/4 groups with N DUT ports

The tests described in this section have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

The traffic from each ingress port is split across 4 egress ports (100/4 = 25%).

First iteration: Expand to fully utilize all the DUT ports in increments of four. Repeat the methodology of 1) with all the groups of ports possible to achieve on the device, and measure the amount of traffic loss for each port group.

Second iteration: Shift the start of each consecutive group of ports by +1.

Last iteration: Shift the start of each consecutive group of ports by N-1 and measure the traffic loss for each port group.

5.3 Reporting Format

For each test, the report MUST include:

- The port configuration, including the number and location of ingress and egress ports located on the DUT

- If HOLB was observed in accordance with the HOLB test in section 5

- Percent of traffic loss

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1 Objective

The objective of this test is to measure the values for TCP Goodput [1] and latency with a mix of large and small flows. The test is designed to simulate a mixed environment of stateful flows that require high rates of goodput and stateless flows that require low latency. Stateful flows are created by generating TCP traffic, and stateless flows are created using UDP traffic.

6.2 Methodology

In order to simulate the effects of stateless and stateful traffic on the DUT, there MUST be multiple ingress ports receiving traffic destined for the same egress port. There also MAY be a mix of stateful and stateless traffic arriving on a single ingress port. The simplest setup would be 2 ingress ports receiving traffic destined to the same egress port.

One ingress port MUST maintain a TCP connection through the ingress port to a receiver connected to an egress port. Traffic in the TCP stream MUST be sent at the maximum rate allowed by the traffic generator. At the same time as the TCP traffic is flowing through the DUT, the stateless traffic is sent destined to a receiver on the same egress port. The stateless traffic MUST be a microburst of 100% intensity.

It is RECOMMENDED that the ingress and egress ports are varied in multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration", "second iteration" and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

For example:

Stateful Traffic port variation (TCP traffic):

TCP traffic needs to be generated in this section. During the iterations, the number of Egress ports MAY vary as well.

First Iteration: 1 Ingress port receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Second Iteration: 2 Ingress ports receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Last Iteration: N-2 Ingress ports receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Stateless Traffic port variation (UDP traffic):

UDP traffic needs to be generated for this test. During the iterations, the number of Egress ports MAY vary as well.

First Iteration: 1 Ingress port receiving stateful TCP traffic and 1 Ingress port receiving stateless traffic destined to 1 Egress Port

Second Iteration: 1 Ingress port receiving stateful TCP traffic and 2 Ingress ports receiving stateless traffic destined to 1 Egress Port

Last Iteration: 1 Ingress port receiving stateful TCP traffic and N-2 Ingress ports receiving stateless traffic destined to 1 Egress Port
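A minimal sketch of the two port-variation plans above, for illustration only (N is the DUT port count; one egress port is assumed):

   # Sketch: iteration plans for the incast test. One variation grows
   # the number of stateful (TCP) ingress ports with one stateless
   # (UDP) ingress port fixed; the other grows the stateless ports
   # with one stateful port fixed. One egress port is assumed.

   def incast_iterations(n_ports: int, vary: str):
       """Yield (tcp_ingress, udp_ingress) port counts per iteration."""
       for k in range(1, n_ports - 1):  # 1 .. N-2 ports of varied type
           if vary == "stateful":
               yield k, 1
           else:  # "stateless"
               yield 1, k

   for tcp, udp in incast_iterations(8, "stateful"):
       print(f"{tcp} TCP ingress + {udp} UDP ingress -> 1 egress port")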
6.3 Reporting Format

The report MUST include the following:

- Number of ingress and egress ports, along with the designation of stateful or stateless flow assignment.

- Stateful flow goodput

- Stateless flow latency

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg)

7. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network, or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for benchmarking purposes. Any implications for network security arising from the DUT SHOULD be identical in the lab and in production networks.

8. IANA Considerations

No IANA action is requested at this time.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991, <https://www.rfc-editor.org/info/rfc1242>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999, <https://www.rfc-editor.org/info/rfc2544>.

9.2. Informative References

[draft-ietf-bmwg-dcbench-terminology] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology", draft-ietf-bmwg-dcbench-terminology (work in progress), April 2017. [Note to RFC Editor: please replace the draft name and date with the RFC number when that document is published.]

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000, <https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004, <https://www.rfc-editor.org/info/rfc3918>.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, July 2013, <https://www.rfc-editor.org/info/rfc6985>.

[1] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph, "Understanding TCP Incast Throughput Collapse in Datacenter Networks", <http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998, <https://www.rfc-editor.org/info/rfc2432>.

9.3. Acknowledgements

The authors would like to thank Alfred Morton and Scott Bradner for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com