Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended Status: Informational                    Google
Expires: December 22, 2017                                       J. Rapp
June 20, 2017                                                     VMware

                  Data Center Benchmarking Methodology
                 draft-ietf-bmwg-dcbench-methodology-14

Abstract

The purpose of this informational document is to establish test and
evaluation methodology and measurement techniques for physical network
equipment in the data center. A pre-requisite to this publication is
the terminology document [1]. Many of these terms and methods may be
applicable beyond this publication's scope as the technologies
originally applied in the data center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology format and repeatability recommendation
2. Line Rate Testing
   2.1 Objective
   2.2 Methodology
   2.3 Reporting Format
3. Buffering Testing
   3.1 Objective
   3.2 Methodology
   3.3 Reporting format
4. Microburst Testing
   4.1 Objective
   4.2 Methodology
   4.3 Reporting Format
5. Head of Line Blocking
   5.1 Objective
   5.2 Methodology
   5.3 Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1 Objective
   6.2 Methodology
   6.3 Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
   9.3. Acknowledgements
Authors' Addresses
1. Introduction

Traffic patterns in the data center are not uniform and are constantly
changing. They are dictated by the nature and variety of applications
utilized in the data center. Traffic can be largely east-west in one
data center and north-south in another, while others may combine both.
Traffic patterns can be bursty in nature and contain many-to-one,
many-to-many, or one-to-many flows. Each flow may also be small and
latency sensitive or large and throughput sensitive while containing a
mix of UDP and TCP traffic. All of these can coexist in a single
cluster and flow through a single network device simultaneously.
Benchmarking of network devices has long used [RFC1242], [RFC2432],
[RFC2544], [RFC2889] and [RFC3918], which have largely been focused on
various latency attributes and the Throughput [RFC2889] of the Device
Under Test (DUT) being benchmarked. These standards are good at
measuring theoretical Throughput, forwarding rates and latency under
testing conditions; however, they do not represent real traffic
patterns that may affect these networking devices.
Currently, typical data center networking devices are characterized
by:

- High port density (48 ports or more)

- High speed (currently up to 100 Gb/s per port)

- High throughput (line rate on all ports for Layer 2 and/or Layer 3)

- Low latency (in the microsecond or nanosecond range)

- Low amount of buffer (in the MB range)

- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

This document provides a methodology for benchmarking data center
physical network equipment DUTs, including congestion scenarios,
switch buffer analysis, microburst and head-of-line blocking, while
also using a wide mix of traffic conditions. The terminology document
[1] is a pre-requisite.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Methodology format and repeatability recommendation

The format used for each section of this document is the following:

- Objective

- Methodology

- Reporting Format

The Reporting Format sections rely on the following additional
interpretation of [RFC2119] terms:

MUST: required metric or benchmark for the scenario described
(minimum)

SHOULD or RECOMMENDED: strongly suggested metric for the scenario
described

MAY: optional metric for the scenario described

For each test methodology described, it is critical to obtain
repeatable results. The recommendation is to perform enough iterations
of the given test to make sure the results are consistent. This is
especially important for section 3, as buffering testing has
historically been the least reliable. The number of iterations SHOULD
be explicitly reported. The relative standard deviation SHOULD be
below 10%.

2. Line Rate Testing

2.1 Objective

Provide a maximum-rate test for the performance values of Throughput,
latency and jitter. It is meant to provide the tests to perform and
the methodology to verify that a DUT is capable of forwarding packets
at line rate under non-congested conditions.

2.2 Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two
tests MUST be conducted: a port-pair test (compliant with RFC
2544/3918, section 15) and a full-mesh test (compliant with RFC
2889/3918, section 16).

For all tests, the test traffic generator sending rate MUST be less
than or equal to 99.98% of the nominal value of Line Rate (with no
further PPM adjustment to account for interface clock tolerances), to
ensure stressing of the DUT in reasonable worst-case conditions (see
[1] section 5 for more details -- note to RFC Editor: please replace
all [1] references in this document with the future RFC number of that
draft). Test results at a lower rate MAY be provided for better
understanding of the performance increase in terms of latency and
jitter when the rate is lower than 99.98%. The receiving rate of the
traffic SHOULD be captured during this test as a percentage of line
rate.

The test MUST provide the statistics of minimum, average and maximum
of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average and maximum
of the jitter distribution, for the exact same iteration of the test.
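As a worked illustration of the sending-rate requirement above, the
following minimal Python sketch (not part of the methodology; the
function name and structure are illustrative assumptions) converts a
nominal line rate and frame size into the corresponding sending rate
in frames per second, assuming standard Ethernet per-frame overhead:

   # Illustrative sketch only: frames per second at a fraction of
   # nominal Line Rate, assuming standard Ethernet overhead of
   # 20 bytes per frame (8-byte preamble + 12-byte inter-frame gap).

   ETH_OVERHEAD_BYTES = 20

   def sending_rate_fps(line_rate_bps, frame_size_bytes,
                        fraction=0.9998):
       """Frames per second at the given fraction of line rate."""
       bits_per_frame = (frame_size_bytes + ETH_OVERHEAD_BYTES) * 8
       return (line_rate_bps * fraction) / bits_per_frame

   # Example: 64-byte frames on a 10 Gb/s port.
   print(round(sending_rate_fps(10e9, 64)))  # ~14877976 frames/s,
                                             # vs. 14880952 at 100%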
Alternatively, when a traffic generator cannot be connected to all
ports on the DUT, a snake test MUST be used for line rate testing,
excluding latency and jitter, as those become irrelevant. The snake
test consists of the following method:

- connect the first and last port of the DUT to a traffic generator

- connect back to back sequentially all the ports in between: port 2
to port 3, port 4 to port 5, etc., up to port n-2 to port n-1, where n
is the total number of ports of the DUT

- configure ports 1 and 2 in the same VLAN X, ports 3 and 4 in the
same VLAN Y, etc., and ports n-1 and n in the same VLAN Z.

This snake test provides the capability to test line rate for Layer 2
and Layer 3 (RFC 2544/3918) in instances where a traffic generator
with only two ports is available. Latency and jitter are not to be
considered with this test.

2.3 Reporting Format

The report MUST include:

- physical-layer calibration information, as defined in [1] section 4

- number of ports used

- reading for "Throughput received as a percentage of bandwidth",
while sending 99.98% of the nominal value of Line Rate on each port,
for each packet size from 64 bytes to 9216 bytes. As guidance, an
increment of 64 bytes between each iteration is ideal; 256-byte and
512-byte increments are also often used. The most common packet sizes
for the report are: 64 B, 128 B, 256 B, 512 B, 1024 B, 1518 B,
4096 B, 8000 B and 9216 B.

The pattern for testing can be expressed using [RFC6985].

- Throughput needs to be expressed as a percentage of total
transmitted frames

- Packet drops MUST be expressed as a count of packets and SHOULD be
expressed as a percentage of line rate

- For latency and jitter, values expressed in units of time (usually
microseconds or nanoseconds), read across packet sizes from 64 bytes
to 9216 bytes

- For latency and jitter, provide minimum, average and maximum values.
If different iterations are done to gather the minimum, average and
maximum, this SHOULD be specified in the report, along with a
justification of why the information could not have been gathered in
the same test iteration

- For jitter, a histogram describing the population of packets
measured per latency or latency buckets is RECOMMENDED

- The tests for Throughput, latency and jitter MAY be conducted as
individual independent trials, with proper documentation in the
report, but SHOULD be conducted at the same time.

- The methodology makes the assumption that the DUT has at least nine
ports, as certain methodologies require that number of ports or more.

3. Buffering Testing

3.1 Objective

To measure the size of the buffer of a DUT under
typical/many/multiple conditions. Buffer architectures between
multiple DUTs can differ and include egress buffering, shared egress
buffering SoC (Switch-on-Chip), ingress buffering, or a combination.
The test methodology covers the buffer measurement regardless of the
buffer architecture used in the DUT.

3.2 Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data-center switch is
based on using known congestion of known fixed packet size, along
with maximum latency value measurements. The maximum latency will
increase until the first packet drop occurs. At this point, the
maximum latency value will remain constant. This is the point of
inflection of this maximum latency change to a constant value. There
MUST be multiple ingress ports receiving a known amount of frames at
a known fixed size, destined for the same egress port, in order to
create a known congestion condition. The total amount of packets sent
from the oversubscribed port, minus one, multiplied by the packet
size represents the maximum port buffer size at the measured
inflection point.
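The buffer arithmetic described in the preceding paragraph can be
sketched as follows (illustrative Python only; the helper name is an
assumption, and the frame count at the inflection point comes from
the measurement itself):

   # Illustrative sketch only: maximum port buffer size implied by
   # the latency inflection point, per the formula above.

   def port_buffer_bytes(frames_at_inflection, frame_size_bytes):
       """(frames sent up to the inflection point - 1) * frame size."""
       return (frames_at_inflection - 1) * frame_size_bytes

   # Example: 24576 64-byte frames absorbed before the first drop.
   print(port_buffer_bytes(24576, 64))  # 1572800 bytes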
1) Measure the highest buffer efficiency

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with a packet size of 64 bytes to egress port 2. Measure
the buffer size value as the number of frames sent from the port
sending the oversubscribed traffic up to the inflection point,
multiplied by the frame size.

Second iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with a packet size of 65 bytes to egress port 2. Measure
the buffer size value as the number of frames sent from the port
sending the oversubscribed traffic up to the inflection point,
multiplied by the frame size.

Last iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with a packet size of B bytes to egress port 2. Measure
the buffer size value as the number of frames sent from the port
sending the oversubscribed traffic up to the inflection point,
multiplied by the frame size.

When the B value is found to provide the largest buffer size, size B
allows the highest buffer efficiency.
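The sweep in procedure 1) can be sketched as follows (illustrative
Python only; run_buffer_test() is an assumed stand-in for driving the
traffic generator and returning the frame count at the inflection
point for a given frame size):

   # Illustrative sketch only: find the frame size B that yields the
   # largest measured buffer, i.e. the highest buffer efficiency.

   def find_most_efficient_size(run_buffer_test,
                                sizes=range(64, 9217)):
       best_size, best_buffer = None, -1
       for size in sizes:
           frames = run_buffer_test(frame_size=size)
           buffer_bytes = (frames - 1) * size
           if buffer_bytes > best_buffer:
               best_size, best_buffer = size, buffer_bytes
       return best_size, best_buffer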
2) Measure maximum port buffer size

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

At the fixed packet size B determined in procedure 1), for a fixed
default Differentiated Services Code Point (DSCP)/Class of Service
(COS) value of 0, and for unicast traffic, proceed with the
following:

First iteration: ingress port 1 sending line rate to egress port 2,
while port 3 sends a known low amount of oversubscription traffic (1%
recommended) with the same packet size to egress port 2. Measure the
buffer size value by multiplying the number of extra frames sent by
the frame size.

Second iteration: ingress port 2 sending line rate to egress port 3,
while port 4 sends a known low amount of oversubscription traffic (1%
recommended) with the same packet size to egress port 3. Measure the
buffer size value by multiplying the number of extra frames sent by
the frame size.

Last iteration: ingress port N-2 sending line rate traffic to egress
port N-1, while port N sends a known low amount of oversubscription
traffic (1% recommended) with the same packet size to egress port
N-1. Measure the buffer size value by multiplying the number of extra
frames sent by the frame size.

This test series MAY be repeated using all different DSCP/COS values
of traffic, and then using multicast traffic, in order to determine
whether there is any DSCP/COS impact on the buffer size.

3) Measure maximum port pair buffer sizes

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: ingress port 1 sending line rate to egress port 2;
ingress port 3 sending line rate to egress port 4; etc. Ingress ports
N-1 and N will respectively oversubscribe, at 1% of line rate, egress
port 2 and port 3. Measure the buffer size value by multiplying the
number of extra frames sent by the frame size for each egress port.

Second iteration: ingress port 1 sending line rate to egress port 2;
ingress port 3 sending line rate to egress port 4; etc. Ingress ports
N-1 and N will respectively oversubscribe, at 1% of line rate, egress
port 4 and port 5. Measure the buffer size value by multiplying the
number of extra frames sent by the frame size for each egress port.

Last iteration: ingress port 1 sending line rate to egress port 2;
ingress port 3 sending line rate to egress port 4; etc. Ingress ports
N-1 and N will respectively oversubscribe, at 1% of line rate, egress
port N-3 and port N-2. Measure the buffer size value by multiplying
the number of extra frames sent by the frame size for each egress
port.

This test series MAY be repeated using all different DSCP/COS values
of traffic, and then using multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: ingress ports 1,2,...,N-1 each sending
[(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N
(a worked example of this rate follows this section).

Second iteration: ingress ports 2,...,N each sending
[(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port 1.

Last iteration: ingress ports N,1,2,...,N-2 each sending
[(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port
N-1.

This test series MAY be repeated using all different COS values of
traffic, and then using multicast traffic.

Unicast traffic and then multicast traffic SHOULD be used in order to
determine the proportion of buffer for the documented selection of
tests. Also, the COS value for the packets SHOULD be provided for
each test iteration, as the buffer allocation size MAY differ per COS
value. It is RECOMMENDED that the ingress and egress ports are varied
in a random but documented fashion in multiple tests to measure the
buffer size for each port of the DUT.
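As a worked example of the many-to-one rate used in procedure 4), the
following Python sketch (illustrative only; the function name is an
assumption) computes the per-port rate and the resulting aggregate
offered load, which comes to 100.98% of the egress port, i.e. roughly
1% oversubscription:

   # Illustrative sketch only: per-ingress-port rate for the
   # many-to-one buffer test, [(1/(N-1))*99.98]+[1/(N-1)] percent of
   # line rate, and the aggregate load on the single egress port.

   def per_port_rate_percent(n_ports):
       n_ingress = n_ports - 1
       return (1.0 / n_ingress) * 99.98 + (1.0 / n_ingress)

   n = 48  # example port count
   rate = per_port_rate_percent(n)
   print(f"{rate:.4f}% per port, {rate * (n - 1):.2f}% aggregate")
   # 2.1485% per port, 100.98% aggregate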
3.3 Reporting format

The report MUST include:

- The packet size used for the most efficient buffer used, along with
the DSCP/COS value

- The maximum port buffer size for each port

- The maximum DUT buffer size

- The packet size used in the test

- The amount of oversubscription, if different than 1%

- The number of ingress and egress ports, along with their location
on the DUT

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results for each of the tests (min, max, avg)

The percentage of variation is a metric providing a sense of how big
the difference is between the measured value and the previous ones.

For example, for a latency test where the minimum latency is
measured, the percentage of variation of the minimum latency will
indicate by how much this value has varied between the current test
executed and the previous one.

PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the
current test and x1 is the minimum latency value obtained in the
previous test. For example, if the minimum latency was 1.00
microsecond in the previous test and 1.05 microseconds in the current
test, then PV = ((1.05-1.00)/1.00)*100 = 5%.

The same formula is used for the maximum and average variations
measured.

4. Microburst Testing

4.1 Objective

To find the maximum amount of packet bursts that a DUT can sustain
under various configurations.

This test provides additional methodology supplementing the other RFC
tests:

- All bursts should be sent with 100% intensity. Note: intensity is
defined in [1] section 6.1.1

- All ports of the DUT must be used for this test

- All ports are recommended to be tested simultaneously

4.2 Methodology

A traffic generator MUST be connected to all ports on the DUT. In
order to cause congestion, two or more ingress ports MUST send bursts
of packets destined for the same egress port. The simplest of the
setups would be two ingress ports and one egress port (2-to-1).

The burst MUST be sent with an intensity of 100% (intensity is
defined in [1] section 6.1.1), meaning that the burst of packets will
be sent with a minimum inter-packet gap. The amount of packets
contained in the burst will be the trial variable and will be
increased until a non-zero packet loss is measured. The aggregate
amount of packets from all the senders will be used to calculate the
maximum microburst amount that the DUT can sustain.

It is RECOMMENDED that the ingress and egress ports are varied in
multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the
microburst capacity at various ingress rates. Intensity of microburst
is defined in [1].

It is RECOMMENDED that all ports on the DUT be tested simultaneously,
and in various configurations, in order to understand all the
combinations of ingress ports, egress ports and intensities.
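The trial loop implied by this methodology can be sketched as follows
(illustrative Python only; run_burst_trial() is an assumed stand-in
for sending one burst per ingress port at 100% intensity and
returning the measured packet loss):

   # Illustrative sketch only: grow the per-port burst size until the
   # first trial with non-zero loss; the largest loss-free aggregate
   # burst is the measured microburst capacity.

   def max_microburst_packets(run_burst_trial, n_senders):
       burst, largest_loss_free = 1, 0
       while True:
           lost = run_burst_trial(burst_size=burst, senders=n_senders)
           if lost > 0:
               return largest_loss_free
           largest_loss_free = burst * n_senders  # aggregate packets
           burst += 1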
An example of varying the port distribution would be:

First iteration: N-1 ingress ports sending to 1 egress port

Second iteration: N-2 ingress ports sending to 2 egress ports

Last iteration: 2 ingress ports sending to N-2 egress ports

4.3 Reporting Format

The report MUST include:

- The maximum number of packets received per ingress port with the
maximum burst size obtained with zero packet loss

- The packet size used in the test

- The number of ingress and egress ports, along with their location
on the DUT

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results (min, max, avg)

5. Head of Line Blocking

5.1 Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon
that occurs when packets are held up by the first packet ahead
waiting to be transmitted to a different output port. This is defined
in RFC 2889 section 5.5, Congestion Control. This section expands on
RFC 2889 in the context of data center benchmarking.

The objective of this test is to understand the DUT behavior in a
head-of-line blocking scenario and to measure the packet loss.

The differences between this HOLB test and RFC 2889 are:

- This HOLB test starts with 8 ports in two groups of 4, instead of
the 4 ports of RFC 2889

- This HOLB test shifts all the port numbers by one in a second
iteration of the test; this is new compared to RFC 2889. The shifting
of port numbers continues until all ports have been the first in a
group. The purpose is to make sure all permutations have been tested,
in order to cover differences of behavior in the SoC of the DUT

- Another test in this HOLB test expands the group of ports, such
that traffic is divided among 4 ports instead of 2 (25% instead of
50% per port)

- Section 5.3 adds additional reporting requirements beyond
Congestion Control in RFC 2889

5.2 Methodology

In order to cause congestion in the form of head-of-line blocking,
groups of four ports are used. A group has 2 ingress ports and 2
egress ports. The first ingress port MUST have two flows configured,
each going to a different egress port. The second ingress port will
congest the second egress port by sending line rate. The goal is to
measure whether there is loss on the flow for the first egress port,
which is not oversubscribed.

A traffic generator MUST be connected to at least eight ports on the
DUT and SHOULD be connected using all the DUT ports.

1) Measure two groups with eight DUT ports

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

First iteration: measure the packet loss for two groups with
consecutive ports.

The first group is composed of: ingress port 1 sending 50% of traffic
to egress port 3 and ingress port 1 sending 50% of traffic to egress
port 4. Ingress port 2 is sending line rate to egress port 4. Measure
the amount of traffic loss for the traffic from ingress port 1 to
egress port 3.

The second group is composed of: ingress port 5 sending 50% of
traffic to egress port 7 and ingress port 5 sending 50% of traffic to
egress port 8. Ingress port 6 is sending line rate to egress port 8.
Measure the amount of traffic loss for the traffic from ingress port
5 to egress port 7.
Second iteration: repeat the first iteration by shifting all the
ports from N to N+1.

The first group is composed of: ingress port 2 sending 50% of traffic
to egress port 4 and ingress port 2 sending 50% of traffic to egress
port 5. Ingress port 3 is sending line rate to egress port 5. Measure
the amount of traffic loss for the traffic from ingress port 2 to
egress port 4.

The second group is composed of: ingress port 6 sending 50% of
traffic to egress port 8 and ingress port 6 sending 50% of traffic to
egress port 9. Ingress port 7 is sending line rate to egress port 9.
Measure the amount of traffic loss for the traffic from ingress port
6 to egress port 8.

Last iteration: when the first port of the first group is connected
to the last DUT port and the last port of the second group is
connected to the seventh port of the DUT.

Measure the amount of traffic loss for the traffic from ingress port
N to egress port 2 and from ingress port 4 to egress port 6.

2) Measure with N/4 groups with N DUT ports

The tests described in this section have iterations called "first
iteration", "second iteration" and "last iteration". The idea is to
show the first two iterations so the reader understands the logic of
how to keep incrementing the iterations. The last iteration shows the
end state of the variables.

The traffic from each ingress port is split across 4 egress ports
(100/4 = 25%).

First iteration: expand to fully utilize all the DUT ports in
increments of four. Repeat the methodology of 1) with all the groups
of ports possible to achieve on the device, and measure the amount of
traffic loss for each port group.

Second iteration: shift the start of each consecutive group of ports
by +1.

Last iteration: shift the start of each consecutive group of ports by
N-1, and measure the traffic loss for each port group. (The
port-shifting pattern is illustrated in the sketch below.)
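The port-grouping and shifting arithmetic used throughout this
section can be sketched as follows (illustrative Python only; the
function name is an assumption, and the sketch only emits the port
groups that a test harness would then configure):

   # Illustrative sketch only: build groups of four consecutive ports
   # over N DUT ports, shifting the starting offset by one per
   # iteration (wrapping around) until every port has been first in a
   # group.

   def holb_groups(n_ports, shift):
       ports = [(p + shift - 1) % n_ports + 1
                for p in range(1, n_ports + 1)]
       return [ports[i:i + 4] for i in range(0, 4 * (n_ports // 4), 4)]

   for shift in range(8):           # e.g. 8 ports -> 8 iterations
       print(holb_groups(8, shift))
   # [[1, 2, 3, 4], [5, 6, 7, 8]], [[2, 3, 4, 5], [6, 7, 8, 1]], ...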
5.3 Reporting Format

For each test, the report MUST include:

- The port configuration, including the number and location of
ingress and egress ports located on the DUT

- Whether HOLB was observed in accordance with the HOLB test in
section 5

- The percentage of traffic loss

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results (min, max, avg)

6. Incast Stateful and Stateless Traffic

6.1 Objective

The objective of this test is to measure the values for TCP Goodput
[4] and latency with a mix of large and small flows. The test is
designed to simulate a mixed environment of stateful flows that
require high rates of goodput and stateless flows that require low
latency. Stateful flows are created by generating TCP traffic, and
stateless flows are created using UDP traffic.

6.2 Methodology

In order to simulate the effects of stateless and stateful traffic on
the DUT, there MUST be multiple ingress ports receiving traffic
destined for the same egress port. There MAY also be a mix of
stateful and stateless traffic arriving on a single ingress port. The
simplest setup would be 2 ingress ports receiving traffic destined to
the same egress port.

One ingress port MUST maintain a TCP connection through the ingress
port to a receiver connected to an egress port. Traffic in the TCP
stream MUST be sent at the maximum rate allowed by the traffic
generator. At the same time as the TCP traffic is flowing through the
DUT, the stateless traffic is sent destined to a receiver on the same
egress port. The stateless traffic MUST be a microburst of 100%
intensity.

It is RECOMMENDED that the ingress and egress ports are varied in
multiple tests to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the
microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration",
"second iteration" and "last iteration". The idea is to show the
first two iterations so the reader understands the logic of how to
keep incrementing the iterations. The last iteration shows the end
state of the variables.

For example:

Stateful traffic port variation (TCP traffic):

TCP traffic needs to be generated in this section. During iterations,
the number of egress ports MAY vary as well.

First iteration: 1 ingress port receiving stateful TCP traffic and 1
ingress port receiving stateless traffic, destined to 1 egress port

Second iteration: 2 ingress ports receiving stateful TCP traffic and
1 ingress port receiving stateless traffic, destined to 1 egress port

Last iteration: N-2 ingress ports receiving stateful TCP traffic and
1 ingress port receiving stateless traffic, destined to 1 egress port

Stateless traffic port variation (UDP traffic):

UDP traffic needs to be generated for this test. During iterations,
the number of egress ports MAY vary as well.

First iteration: 1 ingress port receiving stateful TCP traffic and 1
ingress port receiving stateless traffic, destined to 1 egress port

Second iteration: 1 ingress port receiving stateful TCP traffic and 2
ingress ports receiving stateless traffic, destined to 1 egress port

Last iteration: 1 ingress port receiving stateful TCP traffic and N-2
ingress ports receiving stateless traffic, destined to 1 egress port
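The two iteration ladders above can be enumerated as follows
(illustrative Python only; the function name is an assumption, and
the sketch simply emits the trial matrix that a test harness would
walk through for an N-port DUT):

   # Illustrative sketch only: enumerate (stateful, stateless)
   # ingress-port counts for the two variations, with 1 egress port.

   def incast_trials(n_ports):
       # Stateful variation: grow TCP ingress ports up to N-2.
       for tcp in range(1, n_ports - 1):
           yield {"tcp_ingress": tcp, "udp_ingress": 1, "egress": 1}
       # Stateless variation: grow UDP ingress ports up to N-2,
       # starting at 2 since the (1, 1) case is covered above.
       for udp in range(2, n_ports - 1):
           yield {"tcp_ingress": 1, "udp_ingress": udp, "egress": 1}

   for trial in incast_trials(8):
       print(trial)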
6.3 Reporting Format

The report MUST include the following:

- The number of ingress and egress ports, along with the designation
of stateful or stateless flow assignment

- The stateful flow goodput

- The stateless flow latency

- The repeatability of the test needs to be indicated: the number of
iterations of the same test and the percentage of variation between
results (min, max, avg)

7. Security Considerations

Benchmarking activities as described in this memo are limited to
technology characterization using controlled stimuli in a laboratory
environment, with dedicated address space and the constraints
specified in the sections above.

The benchmarking network topology will be an independent test setup
and MUST NOT be connected to devices that may forward the test
traffic into a production network or misroute traffic to the test
management network.

Further, benchmarking is performed on a "black-box" basis, relying
solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT/SUT SHOULD be identical in the lab and in production
networks.

8. IANA Considerations

No IANA action is requested at this time.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network
Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991,
<https://www.rfc-editor.org/info/rfc1242>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March
1999, <https://www.rfc-editor.org/info/rfc2544>.

9.2. Informative References

[1] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology",
April 2017.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for
LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000,
<https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast
Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004,
<https://www.rfc-editor.org/info/rfc3918>.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet
Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, July
2013, <https://www.rfc-editor.org/info/rfc6985>.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph,
"Understanding TCP Incast Throughput Collapse in Datacenter
Networks",
<http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March
1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking",
RFC 2432, DOI 10.17487/RFC2432, October 1998,
<https://www.rfc-editor.org/info/rfc2432>.

9.3. Acknowledgements

The authors would like to thank Alfred Morton and Scott Bradner for
their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com