Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended status: Informational                    Google
Expires: December 24, 2017                                       J. Rapp
June 22, 2017                                                     VMware

                  Data Center Benchmarking Terminology
                  draft-ietf-bmwg-dcbench-terminology-19

Abstract

The purpose of this informational document is to establish definitions
and describe measurement techniques for data center benchmarking, as
well as to introduce new terminology applicable to performance
evaluations of data center network equipment. This document establishes
the important concepts for benchmarking network switches and routers in
the data center and is a prerequisite for the test methodology
publication [draft-ietf-bmwg-dcbench-methodology]. Many of these terms
and methods may be applicable to network equipment beyond this
publication's scope, as the technologies originally applied in the data
center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is at
http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document.
Code Components extracted from this document must include Simplified
BSD License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1. Introduction ................................................. 3
   1.1. Requirements Language ................................... 4
   1.2. Definition Format ....................................... 4
2. Latency ...................................................... 4
   2.1. Definition .............................................. 4
   2.2. Discussion .............................................. 6
   2.3. Measurement Units ....................................... 6
3. Jitter ....................................................... 6
   3.1. Definition .............................................. 6
   3.2. Discussion .............................................. 7
   3.3. Measurement Units ....................................... 7
4. Physical Layer Calibration ................................... 7
   4.1. Definition .............................................. 7
   4.2. Discussion .............................................. 8
   4.3. Measurement Units ....................................... 8
5. Line Rate .................................................... 8
   5.1. Definition .............................................. 8
   5.2. Discussion .............................................. 9
   5.3. Measurement Units ...................................... 10
6. Buffering ................................................... 11
   6.1. Buffer ................................................. 11
        6.1.1. Definition ...................................... 11
        6.1.2. Discussion ...................................... 12
        6.1.3. Measurement Units ............................... 12
   6.2. Incast ................................................. 13
        6.2.1. Definition ...................................... 13
        6.2.2. Discussion ...................................... 14
        6.2.3. Measurement Units ............................... 14
7. Application Throughput: Data Center Goodput ................. 14
   7.1. Definition ............................................. 14
   7.2. Discussion ............................................. 14
   7.3. Measurement Units ...................................... 15
8. Security Considerations ..................................... 16
9. IANA Considerations ......................................... 16
10. References ................................................. 16
   10.1. Normative References .................................. 16
   10.2. Informative References ................................ 17
   10.3. Acknowledgments ....................................... 17
Authors' Addresses ............................................. 17

1. Introduction

Traffic patterns in the data center are not uniform and are constantly
changing. They are dictated by the nature and variety of applications
utilized in the data center. Traffic can be largely east-west (server
to server inside the data center) in one data center and north-south
(outside of the data center to server) in another, while some data
centers combine both. Traffic patterns can be bursty in nature and
contain many-to-one, many-to-many, or one-to-many flows.
Each flow may also be small and latency sensitive or large and
throughput sensitive while containing a mix of UDP and TCP traffic. One
or more of these may coexist in a single cluster and flow through a
single network device simultaneously. Benchmarking of network devices
has long used [RFC1242], [RFC2432], [RFC2544], [RFC2889], and
[RFC3918]. These benchmarks have largely focused on various latency
attributes and the maximum throughput of the Device Under Test (DUT)
being benchmarked. These standards are good at measuring theoretical
maximum throughput, forwarding rates, and latency under testing
conditions, but they do not represent real traffic patterns that may
affect these networking devices. The data center networking devices
covered in this document are switches and routers.

Currently, typical data center networking devices are characterized by:

- High port density (48 ports or more)

- High speed (currently, up to 100 Gb/s per port)

- High throughput (line rate on all ports for Layer 2 and/or Layer 3)

- Low latency (in the microsecond or nanosecond range)

- Low amount of buffer (in the MB range per networking device)

- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

This document defines a set of definitions, metrics, and terminologies,
including congestion scenarios and switch buffer analysis, and
redefines basic definitions in order to represent a wide mix of traffic
conditions. The test methodologies are defined in
[draft-ietf-bmwg-dcbench-methodology].

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Definition Format

Term to be defined. (e.g., Latency)

Definition: The specific definition for the term.

Discussion: A brief discussion about the term, its application, and any
restrictions on measurement procedures.

Measurement Units: Methodology for the measurement and units used to
report measurements of this term, if applicable.

2. Latency

2.1. Definition

Latency is the amount of time it takes a frame to transit the Device
Under Test (DUT). Latency is measured in units of time (seconds,
milliseconds, microseconds, and so on). The purpose of measuring
latency is to understand the impact of adding a device in the
communication path.

The Latency interval can be assessed between different combinations of
events, regardless of the type of switching device (bit forwarding,
a.k.a. cut-through, or store-and-forward). [RFC1242] defined Latency
differently for each of these types of devices.

Traditionally, the latency measurement definitions are:

FILO (First In Last Out):

The time interval starting when the end of the first bit of the input
frame reaches the input port and ending when the last bit of the output
frame is seen on the output port.

FIFO (First In First Out):

The time interval starting when the end of the first bit of the input
frame reaches the input port and ending when the start of the first bit
of the output frame is seen on the output port. [RFC1242] Latency for
bit forwarding devices uses these events.
LILO (Last In Last Out):

The time interval starting when the last bit of the input frame reaches
the input port and ending when the last bit of the output frame is seen
on the output port.

LIFO (Last In First Out):

The time interval starting when the last bit of the input frame reaches
the input port and ending when the first bit of the output frame is
seen on the output port. [RFC1242] Latency for store-and-forward
devices uses these events.

Another way to summarize the four definitions above is to refer to the
bit positions as they normally occur: input to output.

FILO is FL (First bit in, Last bit out). FIFO is FF (First bit in,
First bit out). LILO is LL (Last bit in, Last bit out). LIFO is LF
(Last bit in, First bit out).

The definition explained in this section, in the context of data center
switching benchmarking, is used in lieu of the previous definition of
Latency in Section 3.8 of [RFC1242], which is quoted here:

   For store and forward devices: The time interval starting when the
   last bit of the input frame reaches the input port and ending when
   the first bit of the output frame is seen on the output port.

   For bit forwarding devices: The time interval starting when the end
   of the first bit of the input frame reaches the input port and
   ending when the start of the first bit of the output frame is seen
   on the output port.

To accommodate both types of network devices, as well as the hybrids of
the two types that have emerged, switch Latency measurements made
according to this document MUST be measured with the FILO events. FILO
will include the latency of the switch and the latency of the frame, as
well as the serialization delay. It is a picture of the "whole" latency
going through the DUT. For applications that are latency sensitive and
can function with the initial bytes of the frame, FIFO (or RFC 1242
Latency for bit forwarding devices) MAY be used. In all cases, the
event combination used in the Latency measurement MUST be reported.

2.2. Discussion

As mentioned in Section 2.1, FILO is the most important measurement
definition.

Not all DUTs are exclusively cut-through or store-and-forward. Data
center DUTs are frequently store-and-forward for smaller packet sizes
and then adopt a cut-through behavior at specific larger packet sizes.
The packet size at which the behavior changes MAY be configurable,
depending on the DUT manufacturer. FILO covers both scenarios,
store-and-forward and cut-through, so the threshold of the behavior
change does not matter for benchmarking.

The LIFO mechanism can be used with store-and-forward switches but not
with cut-through switches, as it will produce negative latency values
for larger packet sizes because LIFO removes the serialization delay.
Therefore, this mechanism MUST NOT be used when comparing latencies of
two different DUTs.
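As an illustration only (not part of the methodology), the following
Python sketch computes the four event combinations from hypothetical
first-bit/last-bit timestamps and shows how LIFO yields a negative
value for a cut-through DUT; the timestamp values are assumptions
chosen for the example.

   # Illustrative sketch: the four latency event combinations from
   # first-bit/last-bit timestamps (seconds). The timestamp names are
   # hypothetical, not from any test-equipment API.

   def latencies(in_first, in_last, out_first, out_last):
       return {
           "FILO": out_last - in_first,   # MUST be used (Section 2.3)
           "FIFO": out_first - in_first,  # MAY be used
           "LILO": out_last - in_last,
           "LIFO": out_first - in_last,   # MUST NOT be used
       }

   # Example: a 64-byte frame at 10 Gb/s (serialization ~51.2 ns)
   # through a cut-through DUT that starts transmitting 30 ns after
   # the first bit arrives:
   print(latencies(in_first=0.0, in_last=51.2e-9,
                   out_first=30.0e-9, out_last=81.2e-9))
   # LIFO = 30.0 ns - 51.2 ns < 0: the serialization delay is removed.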
2.3. Measurement Units

The measuring methods to use for benchmarking purposes are as follows:

1) FILO MUST be used as a measuring method, as this will include the
latency of the packet; today, applications commonly need to read the
whole packet to process the information and take action.

2) FIFO MAY be used for certain applications able to process the data
as the first bits arrive, as is the case, for example, for a
Field-Programmable Gate Array (FPGA).

3) LIFO MUST NOT be used, because, unlike all the other methods, it
subtracts the serialization delay of the packet.

3. Jitter

3.1. Definition

Jitter in the data center context is synonymous with the common term
"delay variation". It is derived from multiple measurements of one-way
delay, as described in RFC 3393. The mandatory definition of delay
variation is the Packet Delay Variation (PDV) from Section 4.2 of
[RFC5481]. When considering a stream of packets, the minimum delay over
all packets in the stream is subtracted from the delay of each packet.
This facilitates the assessment of the range of delay variation
(Max - Min) or a high percentile of PDV (the 99th percentile, for
robustness against outliers).

When First-bit to Last-bit timestamps are used for delay measurement,
Delay Variation MUST be measured using packets or frames of the same
size, since the definition of latency includes the serialization time
for each packet. Otherwise, if First-bit to First-bit timestamps are
used, the size restriction does not apply.

3.2. Discussion

In addition to the PDV range and/or a high percentile of PDV, the
Inter-Packet Delay Variation (IPDV) as defined in Section 4.1 of
[RFC5481] (differences between two consecutive packets) MAY be used to
determine how packet spacing has changed during transfer, for example,
to see if the packet stream has become closely spaced or "bursty".
However, the absolute value of IPDV SHOULD NOT be used, as this
collapses the "bursty" and "dispersed" sides of the IPDV distribution
together.

3.3. Measurement Units

The measurement of delay variation is expressed in units of seconds. A
PDV histogram MAY be provided for the population of packets measured.
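For illustration, a minimal Python sketch of the two computations named
above, assuming a list of one-way delay measurements is already
available (the delay values are hypothetical):

   # Illustrative sketch: PDV (Section 4.2 of RFC 5481) and IPDV
   # (Section 4.1 of RFC 5481) from one-way delays in seconds.
   delays = [10.2e-6, 10.1e-6, 10.9e-6, 10.1e-6, 12.4e-6]

   # PDV: subtract the minimum delay over the stream from each delay.
   min_d = min(delays)
   pdv = [d - min_d for d in delays]
   pdv_range = max(pdv) - min(pdv)          # Max - Min

   # IPDV: difference between consecutive packets. Do NOT take the
   # absolute value, or the "bursty" and "dispersed" sides of the
   # distribution collapse together.
   ipdv = [b - a for a, b in zip(delays, delays[1:])]

   print(pdv, pdv_range, ipdv)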
4. Physical Layer Calibration

4.1. Definition

The calibration of the physical layer consists of defining and
measuring the latency of the physical devices used to perform tests on
the DUT.

It includes the list of all physical layer components used, as listed
hereafter:

- Type of device used to generate traffic / measure traffic

- Type of line cards used on the traffic generator

- Type of transceivers on the traffic generator

- Type of transceivers on the DUT

- Type of cables

- Length of cables

- Software name and version of the traffic generator and DUT

- A list of enabled features on the DUT MAY be provided and is
recommended (especially the control-plane protocols, such as the Link
Layer Discovery Protocol, Spanning Tree, etc.). A comprehensive
configuration file MAY be provided to this effect.

4.2. Discussion

Physical layer calibration is part of the end-to-end latency, which
should be taken into account while evaluating the DUT. Small variations
of the physical components of the test may impact the latency being
measured; therefore, they MUST be described when presenting results.

4.3. Measurement Units

It is RECOMMENDED to use all cables of the same type and the same
length and, when possible, from the same vendor. The cable
specifications listed in Section 4.1 MUST be documented along with the
test results. The test report MUST specify whether the cable latency
has been removed from the test measurements or not. The accuracy of the
traffic generator's measurements MUST be provided (for current test
equipment, this is usually a value within the 20 ns range).

5. Line Rate

5.1. Definition

The transmit timing, or maximum transmitted data rate, is controlled by
the "transmit clock" in the DUT. The receive timing (maximum ingress
data rate) is derived from the transmit clock of the connected
interface.

The line rate or physical layer frame rate is the maximum capacity to
send frames of a specific size at the transmit clock frequency of the
DUT.

The term "nominal value of Line Rate" defines the maximum speed
capability for the given port; for example, 1 GE, 10 GE, 40 GE, or
100 GE.

The frequency ("clock rate") of the transmit clock in any two connected
interfaces will never be precisely the same; therefore, a tolerance is
needed. This tolerance is expressed as a Parts Per Million (PPM) value.
The IEEE standards allow a specific +/- variance in the transmit clock
rate, and Ethernet is designed to allow for small, normal variations
between the two clock rates. This results in a tolerance of the line
rate value when traffic is generated from test equipment to a DUT.

Line rate SHOULD be measured in frames per second.

5.2. Discussion

For a transmit clock source, most Ethernet switches use "clock modules"
(also called "oscillator modules") that are sealed, internally
temperature-compensated, and very accurate. The output frequency of
these modules is not adjustable because it is not necessary. Many test
sets, however, offer a software-controlled adjustment of the transmit
clock rate. These adjustments SHOULD be used to compensate the test
equipment so that it does not send more than the line rate of the DUT.

To allow for the minor variations typically found in the clock rate of
commercially available clock modules and other crystal-based
oscillators, Ethernet standards specify the maximum transmit clock rate
variation to be not more than +/- 100 PPM (parts per million) from a
calculated center frequency. Therefore, a DUT must be able to accept
frames at a rate within +/- 100 PPM to comply with the standards.

Very few clock circuits are precisely +/- 0.0 PPM because:

1. The Ethernet standards allow a maximum of +/- 100 PPM (parts per
million) variance over time. Therefore, it is normal for the frequency
of the oscillator circuits to experience variation over time and over a
wide temperature range, among other external factors.

2. The crystals, or clock modules, usually have a specific +/- PPM
variance that is significantly better than +/- 100 PPM. Oftentimes,
this is +/- 30 PPM or better in order to be considered a "certification
instrument".

When testing an Ethernet switch's throughput at "line rate", any
specific switch will have a clock rate variance. If a test set is
running 1 PPM faster than a switch under test and a sustained line rate
test is performed, a gradual increase in latency, and eventually packet
drops as buffers fill and overflow in the switch, can be observed.
Depending on how much clock variance there is between the two connected
systems, the effect may be seen after the traffic stream has been
running for a few hundred microseconds, a few milliseconds, or seconds.
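As a rough illustration of these time scales, the following sketch
estimates how long a sustained line-rate test running a given number of
PPM above the DUT's clock rate would take to fill a given amount of
buffering. The line rate, PPM offset, and buffer sizes below are
assumptions, and a real switch will show rising latency well before the
buffer actually overflows.

   # Hypothetical estimate: time for a test set running 'ppm_offset'
   # PPM faster than the DUT to fill 'buffer_bytes' of buffering
   # during a sustained line-rate test. All values are assumptions.

   def time_to_overflow(line_rate_bps, ppm_offset, buffer_bytes):
       excess_bps = line_rate_bps * ppm_offset / 1_000_000
       return buffer_bytes * 8 / excess_bps          # seconds

   # 10 Gb/s port, test set +1 PPM fast, 128 KB of buffering:
   print(time_to_overflow(10e9, 1, 128 * 1024))      # ~105 s
   # Same buffer with a +100 PPM offset:
   print(time_to_overflow(10e9, 100, 128 * 1024))    # ~1.05 s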
Low latency and no packet loss can be demonstrated by setting the test
set's link occupancy to slightly less than 100 percent. Typically, a
link occupancy of 99 percent produces excellent low latency and no
packet loss. No Ethernet switch or router will have a transmit clock
rate of exactly +/- 0.0 PPM. Very few (if any) test sets have a clock
rate that is precisely +/- 0.0 PPM.

Test set equipment manufacturers are well aware of the standards and
allow a software-controlled +/- 100 PPM "offset" (clock-rate
adjustment) to compensate for normal variations in the clock speed of
DUTs. This offset adjustment allows engineers to determine the
approximate speed at which the connected device is operating and verify
that it is within the parameters allowed by the standards.

5.3. Measurement Units

"Line Rate" can be measured in terms of "Frame Rate":

   Frame Rate = Transmit-Clock-Frequency /
                (Frame-Length * 8 + Minimum_Gap + Preamble +
                 Start-Frame-Delimiter)

Minimum_Gap represents the interframe gap. This formula "scales up" or
"scales down" to represent 1 Gb Ethernet, 10 Gb Ethernet, and so on.

Example for 1 Gb Ethernet speed with 64-byte frames:

   Frame Rate = 1,000,000,000 / (64 * 8 + 96 + 56 + 8)
              = 1,000,000,000 / 672
              = 1,488,095.2 frames per second

Considering the allowance of +/- 100 PPM, a switch may "legally"
transmit traffic at a frame rate between 1,487,946.4 FPS and
1,488,244 FPS. Each 1 PPM variation in clock rate will translate to a
1.488 frame-per-second frame rate increase or decrease.

In a production network, it is very unlikely to see precise line rate
over a very brief period. There is no observable difference between
dropping packets at 99% of line rate and at 100% of line rate.

Line rate can be measured at 100% of line rate with a -100 PPM
adjustment.

Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

The PPM adjustment SHOULD only be used for a line rate type of
measurement.
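A short sketch (for verification only) reproducing the formula and the
example above, including the +/- 100 PPM bounds:

   # Illustrative check of the Frame Rate formula (Section 5.3).
   # Overhead values are in bit times: interframe gap 96, preamble 56,
   # start-frame delimiter 8.

   def frame_rate(clock_hz, frame_len_bytes, gap=96, preamble=56,
                  sfd=8):
       return clock_hz / (frame_len_bytes * 8 + gap + preamble + sfd)

   fps = frame_rate(1_000_000_000, 64)   # 1,488,095.2 FPS
   low = fps * (1 - 100e-6)              # -100 PPM: ~1,487,946.4 FPS
   high = fps * (1 + 100e-6)             # +100 PPM: ~1,488,244.0 FPS
   print(fps, low, high)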
6. Buffering

6.1. Buffer

6.1.1. Definition

Buffer Size: The term "buffer size" represents the total amount of
frame buffering memory available on a DUT. This size is expressed in B
(bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). When the
buffer size is expressed, it SHOULD be defined by one of the size
metrics stated above. An indication of the frame MTU used for that
measurement is also necessary, as well as the CoS (Class of Service) or
DSCP (Differentiated Services Code Point) value set, as oftentimes the
buffers are carved by a quality-of-service implementation. Please refer
to the discussion of buffer efficiency in Section 6.1.2 for further
details.

Example: The Buffer Size of the DUT when sending 1518-byte frames is
18 MB.

Port Buffer Size: The port buffer size is the amount of buffer for a
single ingress port, a single egress port, or a combination of ingress
and egress buffering locations for a single port. The reason for
mentioning the three locations for the port buffer is that the DUT's
buffering scheme can be unknown or untested, and knowing the buffer
location helps clarify the buffer architecture and, consequently, the
total buffer size. The Port Buffer Size is an informational value that
MAY be provided by the DUT vendor. It is not a value that is tested by
benchmarking. Benchmarking will be done using the Maximum Port Buffer
Size or Maximum Buffer Size methodology.

Maximum Port Buffer Size: In most cases, this is the same as the Port
Buffer Size. In a certain type of switch architecture called SoC
(switch on chip), there is a port buffer and a shared buffer pool
available for all ports. The Maximum Port Buffer Size, in terms of an
SoC buffer, represents the sum of the port buffer and the maximum value
of the shared buffer allowed for this port, defined in terms of B
(bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). The Maximum
Port Buffer Size needs to be expressed along with the frame MTU used
for the measurement and the CoS or DSCP value set for the test.

Example: A DUT has been measured to have 3 KB of port buffer for
1518-byte frames and a total of 4.7 MB of maximum port buffer for
1518-byte frames and a CoS of 0.

Maximum DUT Buffer Size: This is the total size of buffer a DUT can be
measured to have. It is, most likely, different from the Maximum Port
Buffer Size. It can also be different from the sum of the per-port
Maximum Port Buffer Sizes. The Maximum Buffer Size needs to be
expressed along with the frame MTU used for the measurement and along
with the CoS or DSCP value set during the test.

Example: A DUT has been measured to have 3 KB of port buffer for
1518-byte frames and a total of 4.7 MB of maximum port buffer for
1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at a
1500 B MTU and a CoS of 0.

Burst: A burst is a fixed number of packets sent over a percentage of
line rate for a defined port speed. The frames sent are evenly
distributed across the interval T. A constant C can be defined as the
average time between two consecutive evenly spaced packets.

Microburst: A microburst is a burst in which packet drops occur without
sustained or noticeable congestion upon a link or device. One
characterization of a microburst is when the burst is not evenly
distributed over T and the spacing between packets is less than the
constant C (the average time between two consecutive evenly spaced
packets).

Intensity of Microburst: This is a percentage, representing the level
of microburst between 1 and 100%. The higher the number, the more
intense the microburst. It can be computed as follows, where Tp1...TpN
are the arrival times of the N packets in the burst and (N-1) * C is
the time the same N packets would span if they were evenly spaced:

   I = [1 - ((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1)))
            / ((N-1) * C)] * 100

The above definitions are not meant to comment on the ideal sizing of a
buffer but rather on how to measure it. A larger buffer is not
necessarily better and can cause issues with buffer bloat.

6.1.2. Discussion

When measuring buffering on a DUT, it is important to understand the
behavior of each and every port. This provides data for the total
amount of buffering available on the switch. The concept of buffer
efficiency helps one understand the optimum packet size for the buffer,
or the real volume of the buffer available for a specific packet size.
This section does not discuss how to conduct the test methodology;
instead, it explains the buffer definitions and what metrics should be
provided for comprehensive benchmarking of data center device
buffering.
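A minimal sketch of the Intensity of Microburst formula from
Section 6.1.1, under the reading given there in which (N-1) * C is the
evenly spaced duration; the timestamps and the value of C below are
hypothetical:

   # Sketch: Intensity of Microburst from packet arrival timestamps
   # (seconds), assuming the denominator is the duration the same N
   # packets would span if evenly spaced at constant C.

   def microburst_intensity(timestamps, c):
       n = len(timestamps)
       # Sum of consecutive gaps; telescopes to last - first arrival.
       actual = sum(b - a for a, b in zip(timestamps, timestamps[1:]))
       even = (n - 1) * c
       return (1 - actual / even) * 100

   # 5 packets expected every 10 us (C = 10e-6) but arriving clumped
   # 1 us apart:
   print(microburst_intensity([0, 1e-6, 2e-6, 3e-6, 4e-6], 10e-6))
   # -> 90.0 (a highly intense microburst)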
6.1.3. Measurement Units

When the buffer is measured:

- The buffer size MUST be measured.

- The port buffer size MAY be provided for each port.

- The maximum port buffer size MUST be measured.

- The maximum DUT buffer size MUST be measured.

- The intensity of microburst MAY be mentioned when a microburst test
is performed.

- The CoS or DSCP value set during the test SHOULD be provided.

6.2. Incast

6.2.1. Definition

The term "Incast", very commonly utilized in the data center, refers to
the many-to-one or many-to-many traffic patterns. It is measured by the
number of ingress and egress ports and the level of synchronization
attributed, as defined in this section. Typically, in the data center,
it would refer to many different ingress server ports (many) sending
traffic to a common uplink (many-to-one) or multiple uplinks
(many-to-many). This pattern is generalized for any network as many
incoming ports sending traffic to one or a few uplinks.

Synchronous arrival time: When two or more frames of respective sizes
L1 and L2 arrive at their respective one or multiple ingress ports, and
there is an overlap of the arrival times for any of the bits on the
Device Under Test (DUT), then the frames L1 and L2 have synchronous
arrival times. This is called Incast, whether it takes the many-to-one
(simpler) form or the many-to-many form.

Asynchronous arrival time: Any condition not defined by synchronous
arrival time.

Percentage of synchronization: This defines the level of overlap
(amount of bits) between the frames L1, L2, ..., Ln.

Example: Two 64-byte frames, of lengths L1 and L2, arrive at ingress
ports 1 and 2 of the DUT. There is an overlap of 6.4 bytes during which
L1 and L2 are present at the same time on their respective ingress
ports. Therefore, the percentage of synchronization is 10%.

Stateful type traffic refers to packets exchanged with a stateful
protocol, such as TCP.

Stateless type traffic refers to packets exchanged with a stateless
protocol, such as UDP.

6.2.2. Discussion

In this scenario, buffers are solicited on the DUT. In an ingress
buffering mechanism, the ingress port buffers would be solicited along
with Virtual Output Queues, when available, whereas in an egress
buffering mechanism, the egress buffer of the one outgoing port would
be used.

In either case, regardless of where the buffer memory is located in the
switch architecture, the Incast creates buffer utilization.

When one or more frames have synchronous arrival times at the DUT, they
are considered to form an Incast.

6.2.3. Measurement Units

The number of ingress and egress ports MUST be measured. The percentage
of synchronization MUST be non-null and MUST be specified.
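For illustration, a sketch that reproduces the 10% figure from the
example in Section 6.2.1, assuming the first-bit and last-bit arrival
times of each frame are known; times are expressed in byte-times for
readability, and the values are hypothetical:

   # Sketch: percentage of synchronization for two equal-size frames,
   # given the (first-bit, last-bit) arrival interval of each frame on
   # its ingress port.

   def sync_percentage(frame1, frame2, frame_len):
       # Overlap of the two arrival intervals, clamped at zero.
       overlap = min(frame1[1], frame2[1]) - max(frame1[0], frame2[0])
       return max(0.0, overlap) / frame_len * 100

   # Two 64-byte frames; the second starts 57.6 byte-times after the
   # first, leaving an overlap of 6.4 byte-times:
   print(sync_percentage((0.0, 64.0), (57.6, 121.6), 64))   # 10.0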
7. Application Throughput: Data Center Goodput

7.1. Definition

In data center networking, a balanced network is a function of maximal
throughput and minimal loss at any given time. This is captured by the
Goodput [4]. Goodput is the application-level throughput. For standard
TCP applications, a very small loss can have a dramatic effect on
application throughput. [RFC2647] has a definition of Goodput; the
definition in this publication is a variant of that definition.

Goodput is the number of bits per unit of time forwarded to the correct
destination interface of the DUT, minus any bits retransmitted.

7.2. Discussion

In data center benchmarking, the goodput is a value that SHOULD be
measured. It provides a realistic idea of the usage of the available
bandwidth. A goal in data center environments is to maximize the
goodput while minimizing the loss.

7.3. Measurement Units

The Goodput, G, is measured by the following formula:

   G = (S/F) x V bytes per second

where:

- S represents the payload bytes, which do not include packet or TCP
headers

- F is the frame size

- V is the speed of the media in bytes per second

Example: A TCP file transfer over HTTP on a 10 Gb/s medium.

The file cannot be transferred over Ethernet as a single continuous
stream. It must be broken down into individual frames of 1500 B when
the standard MTU (Maximum Transmission Unit) is used. Each packet
requires 20 B of IP header information and 20 B of TCP header
information; therefore, 1460 B are available per packet for the file
transfer. Linux-based systems are further limited to 1448 B, as they
also carry a 12 B timestamp. Finally, the data is transmitted in this
example over Ethernet, which adds 26 B of overhead per packet.

   G = 1460 / 1526 x 10 Gb/s, which is 9.567 Gb/s, or 1.196 GB per
   second.

Please note: This example does not take into consideration the
additional Ethernet overhead, such as the interframe gap (a minimum of
96 bit times), nor collisions (which have a variable impact, depending
on the network load).

When conducting Goodput measurements, please document, in addition to
the items listed in Section 4.1, the following information:

- The TCP stack used

- OS versions

- NIC firmware version and model

For example, Windows TCP stacks and different Linux versions can
influence TCP-based test results.
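A short sketch reproducing the worked example above, using the header
and overhead figures given in the text:

   # Sketch: Goodput G = (S/F) x V for the TCP-over-Ethernet example
   # above (no interframe gap or collisions accounted, per the note).

   def goodput_bps(payload_bytes, frame_bytes, media_bps):
       return payload_bytes / frame_bytes * media_bps

   S = 1500 - 20 - 20   # 1460 B of payload per 1500 B MTU packet
   F = 1500 + 26        # 1526 B on the wire with Ethernet overhead
   g = goodput_bps(S, F, 10e9)
   print(g, g / 8)      # ~9.567e9 bit/s, ~1.196e9 B/s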
8. Security Considerations

Benchmarking activities as described in this memo are limited to
technology characterization using controlled stimuli in a laboratory
environment, with dedicated address space and the constraints specified
in the sections above.

The benchmarking network topology will be an independent test setup and
MUST NOT be connected to devices that may forward the test traffic into
a production network or misroute traffic to the test management
network.

Further, benchmarking is performed on a "black-box" basis, relying
solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT SHOULD be identical in the lab and in production networks.

9. IANA Considerations

No IANA action is requested at this time.

10. References

10.1. Normative References

[draft-ietf-bmwg-dcbench-methodology] Avramov, L. and J. Rapp, "Data
Center Benchmarking Methodology", draft-ietf-bmwg-dcbench-methodology
(work in progress).

[RFC1242] Bradner, S., "Benchmarking Terminology for Network
Interconnection Devices", RFC 1242, July 1991.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
Network Interconnect Devices", RFC 2544, March 1999.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
March 1997.

[RFC5481] Morton, A. and B. Claise, "Packet Delay Variation
Applicability Statement", RFC 5481, March 2009.

10.2. Informative References

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for
LAN Switching Devices", RFC 2889, August 2000.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast
Benchmarking", RFC 3918, October 2004.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph,
"Understanding TCP Incast Throughput Collapse in Datacenter Networks",
http://yanpeichen.com/professional/usenixLoginIncastReady.pdf

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking",
RFC 2432, DOI 10.17487/RFC2432, October 1998.

[RFC2647] Newman, D., "Benchmarking Terminology for Firewall
Performance", RFC 2647, August 1999.

10.3. Acknowledgments

The authors would like to thank Alfred Morton, Scott Bradner, Ian Cox,
and Tim Stevenson for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA 94304
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com