Internet Engineering Task Force                               L. Avramov
Internet-Draft, Intended status: Informational                    Google
Expires: October 29, 2017                                        J. Rapp
April 27, 2017                                                    VMware

                  Data Center Benchmarking Terminology
                draft-ietf-bmwg-dcbench-terminology-07

Abstract

The purpose of this informational document is to establish definitions, discussion, and measurement techniques for data center benchmarking, and to introduce new terminology applicable to data center performance evaluations. The purpose of this document is not to define the test methodology, but rather to establish the important concepts needed when benchmarking network switches and routers in the data center.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts.
The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Definition Format
2. Latency
   2.1. Definition
   2.2. Discussion
   2.3. Measurement Units
3. Jitter
   3.1. Definition
   3.2. Discussion
   3.3. Measurement Units
4. Physical Layer Calibration
   4.1. Definition
   4.2. Discussion
   4.3. Measurement Units
5. Line Rate
   5.1. Definition
   5.2. Discussion
   5.3. Measurement Units
6. Buffering
   6.1. Buffer
      6.1.1. Definition
      6.1.2. Discussion
      6.1.3. Measurement Units
   6.2. Incast
      6.2.1. Definition
      6.2.2. Discussion
      6.2.3. Measurement Units
7. Application Throughput: Data Center Goodput
   7.1. Definition
   7.2. Discussion
   7.3. Measurement Units
8. Security Considerations
9. IANA Considerations
10. References
   10.1. Normative References
   10.2. Informative References
   10.3. Acknowledgments
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center.
Traffic can be largely east-west in one data center and north-south in another, while some data centers combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may be small and latency sensitive, or large and throughput sensitive, with a mix of UDP and TCP traffic; all of these can coexist in a single cluster and flow through a single network device at the same time. Benchmarking of network devices has long used RFC 1242, RFC 2432, RFC 2544, RFC 2889, and RFC 3918. These benchmarks have largely focused on various latency attributes and the maximum throughput of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical maximum throughput, forwarding rates, and latency under test conditions, but they do not represent the real traffic patterns that may affect these networking devices. The data center networking devices covered here are switches and routers.

This document provides a set of definitions, metrics, and terminology, including congestion scenarios and switch buffer analysis, and redefines basic definitions in order to represent a wide mix of traffic conditions.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Definition Format

Term to be defined. (e.g., Latency)

Definition: The specific definition for the term.

Discussion: A brief discussion about the term, its application, and any restrictions on measurement procedures.

Measurement Units: Methodology for the measurement and the units used to report measurements of this term, if applicable.

2. Latency

2.1. Definition

Latency is the amount of time it takes a frame to transit the DUT.
Latency is measured in units of time (seconds, milliseconds, microseconds, and so on). The purpose of measuring latency is to understand the impact of adding a device in the communication path.

The latency interval can be assessed between different combinations of events, irrespective of the type of switching device (bit forwarding, a.k.a. cut-through, or store-and-forward).

Traditionally, the latency measurement definitions are:

FILO (First In Last Out): The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the last bit of the output frame is seen on the output port.

FIFO (First In First Out): The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the start of the first bit of the output frame is seen on the output port.

LILO (Last In Last Out): The time interval starting when the last bit of the input frame reaches the input port and ending when the last bit of the output frame is seen on the output port.

LIFO (Last In First Out): The time interval starting when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port.

Another way to summarize the four definitions above is to refer to the bit positions as they normally occur, input to output: FILO is FL (First bit in, Last bit out); FIFO is FF (First bit in, First bit out); LILO is LL (Last bit in, Last bit out); LIFO is LF (Last bit in, First bit out).

The definition explained in this section, in the context of data center switching benchmarking, is in lieu of the previous definition of latency in Section 3.8 of RFC 1242, which is quoted here:

For store and forward devices: The time interval starting when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port.

For bit forwarding devices: The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the start of the first bit of the output frame is seen on the output port.

2.2. Discussion

FILO is the most important measurement definition. Switches of any type MUST be measured with the FILO mechanism: FILO includes the latency of the switch and the latency of the frame, as well as the serialization delay. It is a picture of the "whole" latency going through the DUT. For applications that are latency sensitive and can function with the initial bytes of the frame, FIFO MAY be used as an additional measurement to supplement FILO.

Not all DUTs are exclusively cut-through or store-and-forward. Data center DUTs are frequently store-and-forward for smaller packet sizes and adopt cut-through behavior for larger ones. FILO covers all scenarios.

The LIFO mechanism can be used with store-and-forward switches but not with cut-through switches, as it will produce negative latency values for larger packet sizes because LIFO removes the serialization delay. Therefore, this mechanism MUST NOT be used when comparing latencies of two different DUTs.
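The four definitions can be illustrated with a short sketch (not part of the benchmarking methodology; the timestamps and the 10 Gb/s link speed are hypothetical). It also shows why LIFO goes negative on a cut-through DUT:

```python
# Sketch (illustrative only): the four latency definitions computed from
# bit-level timestamps at the input and output ports, in seconds.

def latencies(in_first_bit, in_last_bit, out_first_bit, out_last_bit):
    """Return the four latency definitions from Section 2.1."""
    return {
        "FILO": out_last_bit - in_first_bit,   # First bit in, Last bit out
        "FIFO": out_first_bit - in_first_bit,  # First bit in, First bit out
        "LILO": out_last_bit - in_last_bit,    # Last bit in, Last bit out
        "LIFO": out_first_bit - in_last_bit,   # Last bit in, First bit out
    }

# Hypothetical cut-through DUT: the first bit appears on the output port
# 500 ns after it arrives; a 1518-byte frame takes ~1.214 us to serialize
# at 10 Gb/s, so the last input bit arrives well after the first output bit.
ser = 1518 * 8 / 10e9                       # serialization delay in seconds
l = latencies(0.0, ser, 500e-9, 500e-9 + ser)
print(l)
# LIFO = 500 ns - ~1.214 us, a negative value for large frames on a
# cut-through DUT, which is why LIFO MUST NOT be used to compare DUTs.
```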
2.3. Measurement Units

The measuring methods to use for benchmarking purposes are as follows:

1) FILO MUST be used as the measuring method, as it includes the serialization latency of the packet; moreover, applications today commonly need to read the whole packet to process the information and take an action.

2) FIFO MAY be used for certain applications able to process data as the first bits arrive (an FPGA, for example).

3) LIFO MUST NOT be used, because it subtracts the serialization latency of the packet, unlike all the other methods.

3. Jitter

3.1. Definition

Jitter in the data center context is synonymous with the common term delay variation. It is derived from multiple measurements of one-way delay, as described in RFC 3393. The mandatory definition of delay variation is the PDV form from Section 4.2 of RFC 5481. When considering a stream of packets, the delays of all packets are subtracted from the minimum delay over all packets in the stream. This facilitates assessment of the range of delay variation (Max - Min) or a high percentile of PDV (99th percentile, for robustness against outliers).

If first-bit to last-bit timestamps are used for delay measurement, then delay variation MUST be measured using packets or frames of the same size, since the definition of latency includes the serialization time for each packet. If first-bit to first-bit timestamps are used, the size restriction does not apply.

3.2. Discussion

In addition to the PDV range and/or a high percentile of PDV, Inter-Packet Delay Variation (IPDV), as defined in Section 4.1 of RFC 5481 (differences between two consecutive packets), MAY be used to determine how packet spacing has changed during transfer, for example to see if a packet stream has become closely spaced or "bursty". However, the absolute value of IPDV SHOULD NOT be used, as this collapses the "bursty" and "dispersed" sides of the IPDV distribution together.

3.3. Measurement Units

The measurement of delay variation is expressed in units of seconds. A PDV histogram MAY be provided for the population of packets measured.

4. Physical Layer Calibration

4.1. Definition

The calibration of the physical layer consists of defining and measuring the latency of the physical devices used to perform tests on the DUT.

It includes the list of all physical layer components, listed hereafter:

- type of device used to generate traffic / measure traffic

- type of line cards used on the traffic generator

- type of transceivers on the traffic generator

- type of transceivers on the DUT

- type of cables

- length of cables

- software name and version of the traffic generator and DUT

- the list of enabled features on the DUT MAY be provided and is recommended (especially the control plane protocols, such as LLDP, Spanning Tree, etc.). A comprehensive configuration file MAY be provided to this effect.

4.2. Discussion

Physical layer calibration is part of the end-to-end latency, which should be taken into account while evaluating the DUT. Small variations in the physical components of the test may impact the latency being measured, so they MUST be described when presenting results.

4.3. Measurement Units

It is RECOMMENDED to use cables of the same type and the same length and, when possible, from the same vendor. The cable specifications MUST be documented in Section 4.1 along with the test results. The test report MUST specify whether or not the cable latency has been removed from the test measurements. The accuracy of the traffic generator measurement MUST be provided (this is usually a value in the 20 ns range for current test equipment).
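As an illustration of the calibration bookkeeping described above, the following sketch subtracts cable latency from a raw measurement and carries the generator accuracy alongside the result. The numbers, the 5 ns/m fiber propagation figure, and the helper function are illustrative assumptions, not values from this document:

```python
# Sketch (hypothetical values): removing physical-layer fixture latency
# from a raw latency measurement, per the Section 4 discussion.

NS_PER_METER = 5.0  # assumed fiber propagation delay, roughly 5 ns/m

def dut_latency_ns(measured_ns, cable_len_m, generator_accuracy_ns=20.0):
    """Return (calibrated DUT latency, measurement uncertainty), in ns.

    cable_len_m is the combined length of the test cables whose latency
    is being removed; the report MUST state that this removal was done.
    """
    cable_ns = cable_len_m * NS_PER_METER
    return measured_ns - cable_ns, generator_accuracy_ns

# 1050 ns measured through 10 m of cabling, 20 ns generator accuracy.
lat, err = dut_latency_ns(measured_ns=1050.0, cable_len_m=10.0)
print(f"DUT latency: {lat} ns +/- {err} ns")  # prints 1000.0 ns +/- 20.0 ns
```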
5. Line Rate

5.1. Definition

The transmit timing, or maximum transmitted data rate, is controlled by the "transmit clock" in the DUT. The receive timing (maximum ingress data rate) is derived from the transmit clock of the connected interface.

The line rate, or physical layer frame rate, is the maximum capacity to send frames of a specific size at the transmit clock frequency of the DUT.

The term port capacity defines the maximum speed capability for the given port; for example 1 GE, 10 GE, 40 GE, or 100 GE.

The frequency ("clock rate") of the transmit clock in any two connected interfaces will never be precisely the same; therefore, a tolerance is needed. This is expressed as a Parts Per Million (PPM) value. The IEEE standards allow a specific +/- variance in the transmit clock rate, and Ethernet is designed to allow for small, normal variations between the two clock rates. This results in a tolerance of the line rate value when traffic is generated from test equipment to a DUT.

Line rate SHOULD be measured in frames per second.

5.2. Discussion

For a transmit clock source, most Ethernet switches use "clock modules" (also called "oscillator modules") that are sealed, internally temperature-compensated, and very accurate. The output frequency of these modules is not adjustable because it is not necessary. Many test sets, however, offer a software-controlled adjustment of the transmit clock rate, which should be used to ensure that the test equipment does not send more than the line rate of the DUT.

To allow for the minor variations typically found in the clock rate of commercially available clock modules and other crystal-based oscillators, Ethernet standards specify the maximum transmit clock rate variation to be not more than +/- 100 PPM (parts per million) from a calculated center frequency. Therefore, a DUT must be able to accept frames at a rate within +/- 100 PPM to comply with the standards.

Very few clock circuits are precisely +/- 0.0 PPM because:

1. The Ethernet standards allow a maximum of +/- 100 PPM variance over time. Therefore, it is normal for the frequency of the oscillator circuits to experience variation over time and over a wide temperature range, among other external factors.

2. The crystals, or clock modules, usually have a specific +/- PPM variance that is significantly better than +/- 100 PPM. Oftentimes this is +/- 30 PPM or better in order for the device to be considered a "certification instrument".

When testing an Ethernet switch throughput at "line rate", any specific switch will have a clock rate variance. If a test set is running +1 PPM faster than a switch under test, and a sustained line rate test is performed, a gradual increase in latency, and eventually packet drops as buffers fill and overflow in the switch, can be observed. Depending on how much clock variance there is between the two connected systems, the effect may be seen after the traffic stream has been running for a few hundred microseconds, a few milliseconds, or a few seconds. Low latency and no packet loss can be demonstrated by setting the test set link occupancy to slightly less than 100 percent. Typically, 99 percent link occupancy produces excellent low latency and no packet loss. No Ethernet switch or router will have a transmit clock rate of exactly +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is precisely +/- 0.0 PPM.

Test set equipment manufacturers are well aware of the standards and allow a software-controlled +/- 100 PPM "offset" (clock-rate adjustment) to compensate for normal variations in the clock speed of "devices under test".
This offset adjustment allows engineers to determine the approximate speed at which the connected device is operating and to verify that it is within the parameters allowed by the standards.

5.3. Measurement Units

"Line Rate" can be measured in terms of "Frame Rate":

Frame Rate = Transmit-Clock-Frequency / (Frame-Length * 8 + Minimum_Gap + Preamble + Start-Frame-Delimiter)

Minimum_Gap represents the inter-frame gap. This formula "scales up" or "scales down" to represent 1 Gb/s Ethernet, 10 Gb/s Ethernet, and so on.

Example for 1 Gb/s Ethernet speed with 64-byte frames:

Frame Rate = 1,000,000,000 / (64 * 8 + 96 + 56 + 8)
           = 1,000,000,000 / 672
           = 1,488,095.2 frames per second

Considering the allowance of +/- 100 PPM, a switch may "legally" transmit traffic at a frame rate between 1,487,946.4 FPS and 1,488,244.0 FPS. Each 1 PPM variation in clock rate translates to a 1.488 frame-per-second increase or decrease in frame rate.

In a production network, it is very unlikely to see precise line rate over even a very brief period. There is no observable difference between dropping packets at 99% of line rate and at 100% of line rate.

- Line rate can be measured at 100% of line rate with a -100 PPM adjustment.

- Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

- The PPM adjustment SHOULD only be used for a line rate type of measurement.

6. Buffering

6.1. Buffer

6.1.1. Definition

Buffer Size: The term buffer size represents the total amount of frame buffering memory available on a DUT. This size is expressed in B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). When the buffer size is expressed, it SHOULD be defined by one of the size metrics above.
When the buffer size is expressed, an indication of the frame MTU used for that measurement is also necessary, as well as the cos or dscp value set, as oftentimes the buffers are carved by the quality of service implementation. (Please refer to the buffer efficiency section for further details.)

Example: The Buffer Size of the DUT when sending 1518-byte frames is 18 MB.

Port Buffer Size: The port buffer size is the amount of buffer available to a single ingress port, a single egress port, or a combination of ingress and egress buffering locations for a single port. The reason for mentioning the three locations for the port buffer is that the DUT buffering scheme can be unknown or untested, and therefore the indication of where the buffer is located helps in understanding the buffer architecture and, in turn, the total buffer size. The Port Buffer Size is an informational value that MAY be provided by the DUT vendor. It is not a value that is tested by benchmarking. Benchmarking will be done using the Maximum Port Buffer Size or Maximum Buffer Size methodology.

Maximum Port Buffer Size: This is, in most cases, the same as the Port Buffer Size. In certain switch architectures, called SoC (switch on chip), there is a concept of a dedicated port buffer and a shared buffer pool available for all ports. The Maximum Port Buffer Size, in the scenario of a SoC buffer, represents the sum of the dedicated port buffer and the maximum amount of the shared buffer that this given port can take, expressed in B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). The Maximum Port Buffer Size needs to be expressed along with the frame MTU used for the measurement and the cos or dscp bit value set for the test.

Example: A DUT has been measured to have 3 KB of port buffer for 1518-byte frames and a total of 4.7 MB of maximum port buffer for 1518-byte frames and a cos of 0.
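The SoC relationship just described (dedicated port buffer plus the largest share of the shared pool a port may take) can be sketched as follows. The `max_pool_fraction` admission parameter and the numbers are illustrative assumptions, not values defined by this document:

```python
# Sketch (illustrative): Maximum Port Buffer Size on a SoC-style
# architecture, as the dedicated port buffer plus the largest share of
# the shared pool that a single port may consume.

def max_port_buffer_bytes(port_buffer, shared_pool, max_pool_fraction):
    """max_pool_fraction is an assumed admission-control parameter:
    the largest fraction of the shared pool one port may take."""
    return port_buffer + int(shared_pool * max_pool_fraction)

port_buffer = 3 * 1024                   # 3 KB dedicated, at 1518-byte MTU
shared_pool = 4_700_000 - port_buffer    # pool sized so the maximum is 4.7 MB
print(max_port_buffer_bytes(port_buffer, shared_pool, 1.0))  # prints 4700000
```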
Maximum DUT Buffer Size: This is the total buffer size a DUT can be measured to have. It is most likely different from the Maximum Port Buffer Size. It can also be different from the sum of the Maximum Port Buffer Sizes. The Maximum Buffer Size needs to be expressed along with the frame MTU used for the measurement and the cos or dscp value set during the test.

Example: A DUT has been measured to have 3 KB of port buffer for 1518-byte frames and a total of 4.7 MB of maximum port buffer for 1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at 1500 bytes and a cos of 0.

Burst: A burst is a fixed number of packets sent over a percentage of line rate for a defined port speed. The frames sent are evenly distributed across an interval T. A constant C can be defined as the average time between two consecutive evenly spaced packets.

Microburst: A microburst is a burst for which packet drops occur without sustained or noticeable congestion upon a link or device. A microburst is characterized by a burst that is not evenly distributed over T, with packet gaps less than the constant C (the average time between two consecutive evenly spaced packets).

Intensity of Microburst: This is a percentage representing the level of microburst, between 1% and 100%. The higher the number, the more intense the microburst:

I = [1 - ((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))) / Sum(packets)] * 100

where TpN is the arrival time of packet N.

The above definitions are not meant to comment on the ideal sizing of a buffer but rather on how to measure it. A larger buffer is not necessarily better and can cause issues with buffer bloat.

6.1.2. Discussion

When measuring buffering on a DUT, it is important to understand the behavior for each port, and also for all ports, as this provides evidence of the total amount of buffering available on the switch. The term buffer efficiency here helps one understand the optimum packet size for the buffer to be used, or the real volume of buffer available for a specific packet size. This section does not discuss how to conduct the test methodology; rather, it explains the buffer definitions and the metrics that should be provided for comprehensive data center device buffering benchmarking.

6.1.3. Measurement Units

When the buffer is measured:

- the buffer size MUST be measured

- the port buffer size MAY be provided for each port

- the maximum port buffer size MUST be measured

- the maximum DUT buffer size MUST be measured

- the intensity of microburst MAY be mentioned when a microburst test is performed

- the cos or dscp value set during the test SHOULD be provided

6.2. Incast

6.2.1. Definition

The term incast, very commonly utilized in the data center, refers to the traffic pattern of many-to-one or many-to-many conversations. Typically in the data center it refers to many different ingress server ports (many) sending traffic to a common uplink (one) or to multiple uplinks (many). This pattern is generalized for any network as many incoming ports sending traffic to one or a few uplinks. It can also be found in many-to-many traffic patterns.

Synchronous arrival time: When two or more frames of respective sizes L1 and L2 arrive at their respective one or multiple ingress ports, and there is an overlap of the arrival time for any of the bits on the DUT, then the frames L1 and L2 have synchronous arrival times. This is called incast.

Asynchronous arrival time: Any condition not defined by synchronous arrival time.

Percentage of synchronization: This defines the level of overlap (amount of bits) between the frames L1, L2, ..., Ln.

Example: Two 64-byte frames, of length L1 and L2, arrive at ingress port 1 and port 2 of the DUT.
There is an overlap of 6.4 bytes between the two, during which L1 and L2 were present at the same time on their respective ingress ports. The percentage of synchronization is therefore 10%.

Stateful type traffic defines packets exchanged with a stateful protocol, such as TCP.

Stateless type traffic defines packets exchanged with a stateless protocol, such as UDP.

6.2.2. Discussion

In this scenario, buffers are solicited on the DUT. In an ingress buffering mechanism, the ingress port buffers are solicited along with Virtual Output Queues, when available; whereas in an egress buffering mechanism, the egress buffer of the one outgoing port is used.

In either case, regardless of where the buffer memory is located in the switch architecture, the incast creates buffer utilization.

When two or more frames have synchronous arrival times at the DUT, they are considered to form an incast.

6.2.3. Measurement Units

The number of ingress and egress ports MUST be measured. The percentage of synchronization MUST be non-null and MUST be specified.

7. Application Throughput: Data Center Goodput

7.1. Definition

In data center networking, a balanced network is a function of maximal throughput and minimal loss at any given time. This is defined by the goodput. Goodput is the application-level throughput. The definition used here is a variant of the definition in RFC 2647.

Goodput is the number of bits per unit of time forwarded to the correct destination interface of the DUT/SUT, minus any bits retransmitted.

7.2. Discussion

In data center benchmarking, the goodput is a value that SHOULD be measured. It provides a realistic idea of the usage of the available bandwidth. A goal in data center environments is to maximize the goodput while minimizing the loss.

7.3. Measurement Units

When S is the total number of bytes received from all senders (not including packet headers or TCP headers; only the payload) and Ft is the finishing time of the last sender, the goodput G is measured by the following formula:

G = S / Ft bytes per second

Example: a TCP file transfer over HTTP on a 10 Gb/s medium. The file cannot be transferred over Ethernet as a single continuous stream. It must be broken down into individual frames of 1500 bytes when the standard MTU (Maximum Transmission Unit) is used. Each packet requires 20 bytes of IP header information and 20 bytes of TCP header information; therefore, 1460 bytes are available per packet for the file transfer. Linux-based systems are further limited to 1448 bytes, as they also carry a 12-byte timestamp. Finally, the data is transmitted in this example over Ethernet, which adds a 26-byte overhead per packet.

G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s or 1.196 gigabytes per second.

Please note: this example does not take into consideration additional Ethernet overhead, such as the inter-frame gap (a minimum of 96 bit times), nor collisions (which have a variable impact, depending on the network load).

When conducting goodput measurements, please document, in addition to the items in Section 4.1:

- the TCP stack used

- OS versions

- NIC firmware version and model

For example, Windows TCP stacks and different Linux versions can influence TCP-based test results.

8. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.
The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.

9. IANA Considerations

No IANA action is requested at this time.

10. References

10.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, July 1991.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, March 1999.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

10.2. Informative References

[1] Avramov, L. and J. Rapp, "Data Center Benchmarking Methodology", April 2017.

[2] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, August 2000.

[3] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, October 2004.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph, "Understanding TCP Incast Throughput Collapse in Datacenter Networks",
    http://www.eecs.berkeley.edu/~ychen2/professional/TCPIncastWREN2009.pdf

10.3. Acknowledgments

The authors would like to thank Alfred Morton, Scott Bradner, Ian Cox, and Tim Stevenson for their reviews and feedback.
Authors' Addresses

Lucien Avramov
Google
170 West Tasman drive
Mountain View, CA 94043
United States
Email: lucienav@google.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA 94304
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com