Internet Engineering Task Force                               L. Avramov
Internet-Draft, Intended status: Informational             Cisco Systems
Expires: April 20, 2015                                          J. Rapp
October 17, 2014                                         Hewlett-Packard

           Data Center Benchmarking Definitions and Metrics
                        draft-dcbench-def-02

Abstract

   The purpose of this informational document is to establish
   definitions, discussion and measurement techniques for data center
   benchmarking, and to introduce new terminology applicable to data
   center performance evaluations. The purpose of this document is not
   to define test methodology, but rather to establish the important
   concepts for benchmarking network equipment in the data center.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
      1.2. Definition Format
   2. Latency
      2.1. Definition
      2.2. Discussion
      2.3. Measurement Units
   3. Jitter
      3.1. Definition
      3.2. Discussion
      3.3. Measurement Units
   4. Physical Layer Calibration
      4.1. Definition
      4.2. Discussion
      4.3. Measurement Units
   5. Line Rate
      5.1. Definition
      5.2. Discussion
      5.3. Measurement Units
   6. Buffering
      6.1. Buffer
         6.1.1. Definition
         6.1.2. Discussion
         6.1.3. Measurement Units
      6.2. Incast
         6.2.1. Definition
         6.2.2. Discussion
         6.2.3. Measurement Units
   7. Application Throughput: Data Center Goodput
      7.1. Definition
      7.2. Discussion
      7.3. Measurement Units
   8. References
      8.1. Normative References
      8.2. Informative References
      8.3. URL References
      8.4. Acknowledgments
   Authors' Addresses

1. Introduction

   Traffic patterns in the data center are not uniform and are
   constantly changing. They are dictated by the nature and variety of
   applications utilized in the data center. Traffic can be largely
   east-west in one data center and north-south in another, while some
   data centers combine both. Traffic patterns can be bursty in nature
   and contain many-to-one, many-to-many, or one-to-many flows. Each
   flow may also be small and latency sensitive, or large and
   throughput sensitive, while containing a mix of UDP and TCP traffic.
   All of these can coexist in a single cluster and flow through a
   single network device at the same time. Benchmarking of network
   devices has long used RFC 1242 [1], RFC 2432, RFC 2544 [2], RFC 2889
   [3] and RFC 3918 [4]. These benchmarks have largely focused on
   various latency attributes and the maximum throughput of the Device
   Under Test (DUT) being benchmarked. These standards are good at
   measuring theoretical maximum throughput, forwarding rates and
   latency under testing conditions, but they do not represent the real
   traffic patterns that may affect these networking devices.

   This document provides a set of definitions, metrics and
   terminologies, including congestion scenarios and switch buffer
   analysis, and redefines basic definitions in order to represent a
   wide mix of traffic conditions.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [6].

1.2. Definition Format

   Term to be defined. (e.g., Latency)

   Definition: The specific definition for the term.

   Discussion: A brief discussion of the term, its application and any
   restrictions on measurement procedures.

   Measurement Units: The methodology for the measurement and the units
   used to report measurements of this term, if applicable.

2. Latency

2.1. Definition

   Latency is the amount of time it takes a frame to transit the DUT.
   The latency interval can be assessed between different combinations
   of events, irrespective of the type of switching device (bit
   forwarding, aka cut-through, or store-and-forward).

   Traditionally, the latency measurement definitions are:

   FILO (First In Last Out): The time interval starting when the end of
   the first bit of the input frame reaches the input port and ending
   when the last bit of the output frame is seen on the output port.

   FIFO (First In First Out): The time interval starting when the end
   of the first bit of the input frame reaches the input port and
   ending when the start of the first bit of the output frame is seen
   on the output port.

   LILO (Last In Last Out): The time interval starting when the last
   bit of the input frame reaches the input port and ending when the
   last bit of the output frame is seen on the output port.

   LIFO (Last In First Out): The time interval starting when the last
   bit of the input frame reaches the input port and ending when the
   first bit of the output frame is seen on the output port.

   Another way to summarize the four definitions above is to refer to
   the bit positions as they normally occur, input to output:

   FILO is FL (First bit in, Last bit out)
   FIFO is FF (First bit in, First bit out)
   LILO is LL (Last bit in, Last bit out)
   LIFO is LF (Last bit in, First bit out)

   The definition explained in this section, in the context of data
   center switching benchmarking, is in lieu of the previous definition
   of latency in RFC 1242, Section 3.8, which is quoted here:

      For store and forward devices: The time interval starting when
      the last bit of the input frame reaches the input port and ending
      when the first bit of the output frame is seen on the output
      port.

      For bit forwarding devices: The time interval starting when the
      end of the first bit of the input frame reaches the input port
      and ending when the start of the first bit of the output frame is
      seen on the output port.

2.2. Discussion

   FILO is the most important measurement definition. Any type of
   switch MUST be measured with the FILO mechanism: FILO includes the
   latency of the switch and the latency of the frame as well as the
   serialization delay. It is a picture of the "whole" latency through
   the DUT. For applications that are latency sensitive and can
   function with the initial bytes of the frame, FIFO MAY be used as an
   additional measurement to supplement FILO.

   The LIFO mechanism can be used with store-and-forward switches but
   not with cut-through switches, as it will produce negative latency
   values for larger packet sizes. Therefore this mechanism MUST NOT be
   used when comparing the latencies of two different DUTs.

2.3. Measurement Units

   The measurement methods to use for benchmarking purposes are as
   follows:

   1) FILO MUST be used as the measurement method, as this includes the
   latency of the packet; today, applications commonly need to read the
   whole packet to process the information and take an action.

   2) FIFO MAY be used for certain applications able to process data as
   the first bits arrive (an FPGA, for example).

   3) LIFO MUST NOT be used, because it subtracts the latency of the
   packet, unlike all the other methods.
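   To make the four definitions concrete, here is a minimal sketch in
   Python that derives each latency variant from the four bit-level
   event timestamps of a single frame. The timestamp names are
   illustrative assumptions, not part of any test-equipment API:

      # Sketch: the four latency variants from bit-level event times.
      def latency_metrics(first_bit_in, last_bit_in,
                          first_bit_out, last_bit_out):
          """All arguments are event times in seconds for one frame."""
          return {
              "FILO": last_bit_out - first_bit_in,   # First in, Last out
              "FIFO": first_bit_out - first_bit_in,  # First in, First out
              "LILO": last_bit_out - last_bit_in,    # Last in, Last out
              "LIFO": first_bit_out - last_bit_in,   # Last in, First out
          }

      # Example: a 64-byte frame on 10GE serializes in 51.2 ns. A
      # cut-through DUT may emit the first output bit before the last
      # input bit has arrived, so LIFO can go negative:
      m = latency_metrics(0.0, 51.2e-9, 40.0e-9, 91.2e-9)
      assert m["LIFO"] < 0  # why LIFO MUST NOT be used for comparison

   The negative LIFO value in the example illustrates why Section 2.2
   restricts LIFO to store-and-forward devices.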
3. Jitter

3.1. Definition

   The definition of jitter is covered extensively in RFC 3393. This
   document is not meant to replace that definition, but to provide
   guidance on its use for data center network devices.

   Jitter is used here in accordance with the delay variation
   definition from RFC 3393:

      The second meaning has to do with the variation of a metric
      (e.g., delay) with respect to some reference metric (e.g.,
      average delay or minimum delay). This meaning is frequently used
      by computer scientists and frequently (but not always) refers to
      variation in delay.

   Even with the reference to RFC 3393, many definitions of "jitter"
   are possible. The one selected for data center benchmarking is the
   one closest to RFC 3393.

3.2. Discussion

   Jitter can be measured in different scenarios:

   -packet-to-packet delay variation

   -delta between the minimum and maximum packet delay variation for
   all packets sent

3.3. Measurement Units

   Jitter MUST be measured when sending packets of the same size. It
   MUST be measured as the packet-to-packet delay variation and as the
   delta between the minimum and maximum packet delay variation of all
   packets sent. A histogram MAY be provided as a population of packets
   measured per latency or latency bucket.
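   As a minimal sketch of the two required metrics, assuming a list of
   per-packet one-way delays has already been collected for same-size
   packets (variable names are illustrative):

      # Sketch: the two jitter metrics of Section 3.3.
      def jitter_metrics(delays):
          """delays: one-way delay of each packet, in seconds."""
          # Packet-to-packet delay variation between consecutive packets.
          p2p = [abs(b - a) for a, b in zip(delays, delays[1:])]
          return {
              "max_packet_to_packet_variation": max(p2p),
              # Delta between the min and max delay of all packets sent.
              "min_max_delta": max(delays) - min(delays),
          }

      # Four sample delays in microseconds: 10.2, 10.5, 10.1, 11.0.
      print(jitter_metrics([10.2e-6, 10.5e-6, 10.1e-6, 11.0e-6]))
      # Both metrics are ~0.9 microseconds for this sample.

   A histogram, when provided, would simply bucket the same delay list.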
4. Physical Layer Calibration

4.1. Definition

   The calibration of the physical layer consists of defining and
   measuring the latency of the physical devices used to perform tests
   on the DUT.

   It includes the list of all physical layer components used, as
   listed hereafter:

   -type of device used to generate traffic / measure traffic

   -type of line cards used on the traffic generator

   -type of transceivers on the traffic generator

   -type of transceivers on the DUT

   -type of cables

   -length of cables

   -software name and version of the traffic generator and DUT

   -list of enabled features on the DUT [this MAY be provided and is
   recommended, especially for the control plane protocols such as
   LLDP, Spanning Tree, etc.]. A comprehensive configuration file MAY
   be provided to this effect.

4.2. Discussion

   Physical layer calibration is part of the end-to-end latency, which
   should be taken into account while evaluating the DUT. Small
   variations in the physical components of the test may impact the
   latency being measured, so they MUST be described when presenting
   results.

4.3. Measurement Units

   It is RECOMMENDED to use cables that are all of the same type and
   the same length, and when possible from the same vendor. It is a
   MUST to document the cable specifications in Section 4.1 along with
   the test results. The test report MUST specify whether the cable
   latency has been removed from the test measurements or not. The
   accuracy of the traffic generator measurement MUST be provided [this
   is usually a value in the 20 ns range for current test equipment].

5. Line Rate

5.1. Definition

   The transmit timing, or maximum transmitted data rate, is controlled
   by the "transmit clock" in the DUT. The receive timing (maximum
   ingress data rate) is derived from the transmit clock of the
   connected interface.

   The line rate, or physical layer frame rate, is the maximum capacity
   to send frames of a specific size at the transmit clock frequency of
   the DUT.

   The term "port capacity" defines the maximum speed capability of the
   given port; for example 1GE, 10GE, 40GE, 100GE, etc.

   The frequency ("clock rate") of the transmit clock in any two
   connected interfaces will never be precisely the same; therefore, a
   tolerance is needed, expressed as a Parts Per Million (PPM) value.
   The IEEE standards allow a specific +/- variance in the transmit
   clock rate, and Ethernet is designed to allow for small, normal
   variations between the two clock rates. This results in a tolerance
   of the line rate value when traffic is generated from test equipment
   to a DUT.

5.2. Discussion

   For a transmit clock source, most Ethernet switches use "clock
   modules" (also called "oscillator modules") that are sealed,
   internally temperature-compensated, and very accurate. The output
   frequency of these modules is not adjustable because it is not
   necessary. Many test sets, however, offer a software-controlled
   adjustment of the transmit clock rate, which should be used to
   compensate the test equipment so that it does not send more than the
   line rate of the DUT.

   To allow for the minor variations typically found in the clock rate
   of commercially available clock modules and other crystal-based
   oscillators, Ethernet standards specify the maximum transmit clock
   rate variation to be not more than +/- 100 PPM (parts per million)
   from a calculated center frequency. Therefore a DUT must be able to
   accept frames at a rate within +/- 100 PPM to comply with the
   standards.

   Very few clock circuits are precisely +/- 0.0 PPM because:

   1. The Ethernet standards allow a maximum of +/- 100 PPM variance
   over time. Therefore it is normal for the frequency of the
   oscillator circuits to experience variation over time and over a
   wide temperature range, among other external factors.

   2. The crystals, or clock modules, usually have a specific +/- PPM
   variance that is significantly better than +/- 100 PPM. Often this
   is +/- 30 PPM or better in order to be considered a "certification
   instrument".

   When testing an Ethernet switch throughput at "line rate", any
   specific switch will have a clock rate variance. If a test set is
   running 1 PPM faster than a switch under test and a sustained line
   rate test is performed, a gradual increase in latency, and
   eventually packet drops as buffers fill and overflow in the switch,
   can be observed. Depending on how much clock variance there is
   between the two connected systems, the effect may be seen after the
   traffic stream has been running for a few hundred microseconds, a
   few milliseconds, or seconds. The same low latency and lack of
   packet loss can be demonstrated by setting the test set link
   occupancy to slightly less than 100 percent. Typically, 99 percent
   link occupancy produces excellent low latency and no packet loss. No
   Ethernet switch or router will have a transmit clock rate of exactly
   +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is
   precisely +/- 0.0 PPM.

   Test set equipment manufacturers are well aware of the standards and
   allow a software-controlled +/- 100 PPM "offset" (clock-rate
   adjustment) to compensate for normal variations in the clock speed
   of devices under test. This offset adjustment allows engineers to
   determine the approximate speed at which the connected device is
   operating, and to verify that it is within the parameters allowed by
   the standards.
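   To illustrate the buffer-fill effect described above, here is a
   small sketch, using assumed (not measured) buffer sizes, of how long
   a sustained line rate test can run before a clock mismatch overflows
   the DUT's buffer:

      # Sketch: time until buffer overflow caused by a PPM mismatch.
      def seconds_to_overflow(line_rate_bps, ppm_offset, buffer_bytes):
          # Excess arrival rate created by the clock mismatch, in bits/s.
          excess_bps = line_rate_bps * (ppm_offset / 1e6)
          return buffer_bytes / (excess_bps / 8)

      # A test set running +1 PPM fast against a 10GE DUT with an
      # assumed 1 MB buffer overflows it in about 800 seconds:
      print(seconds_to_overflow(10e9, 1, 1e6))    # 800.0
      # At +100 PPM with an assumed 100 KB buffer: 0.8 seconds.
      print(seconds_to_overflow(10e9, 100, 1e5))  # 0.8

   This is why reducing link occupancy slightly below 100 percent, or
   applying the PPM offset, avoids the overflow altogether.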
5.3. Measurement Units

   "Line rate" CAN be measured in terms of "frame rate":

      Frame Rate = Transmit-Clock-Frequency / (Frame-Length*8 +
      Minimum_Gap + Preamble + Start-Frame-Delimiter)

   Example for 1 Gigabit Ethernet speed with 64-byte frames:

      Frame Rate = 1,000,000,000 / (64*8 + 96 + 56 + 8)
                 = 1,000,000,000 / 672
                 = 1,488,095.2 frames per second

   Considering the allowance of +/- 100 PPM, a switch may "legally"
   transmit traffic at a frame rate between 1,487,946.4 FPS and
   1,488,244 FPS. Each 1 PPM variation in clock rate will translate
   into a 1.488 frame-per-second frame rate increase or decrease.

   In a production network, it is very unlikely to see precise line
   rate over a very brief period. There is no observable difference
   between dropping packets at 99% of line rate and at 100% of line
   rate.

   -Line rate CAN be measured at 100% of line rate with a -100 PPM
   adjustment.

   -Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

   -The PPM adjustment SHOULD only be used for a line rate type of
   measurement.
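   The formula and its tolerance bounds translate directly into a short
   calculation; the following sketch reproduces the numbers above:

      # Sketch: frame rate at line rate and its +/- 100 PPM bounds.
      def frame_rate(clock_hz, frame_len_bytes,
                     min_gap=96, preamble=56, sfd=8):
          # Overheads in bit times: 12-byte minimum gap, 7-byte
          # preamble, 1-byte start-frame delimiter.
          return clock_hz / (frame_len_bytes * 8 + min_gap + preamble + sfd)

      nominal = frame_rate(1_000_000_000, 64)  # 1,488,095.2 FPS
      tolerance = nominal * 100 / 1e6          # 100 PPM = 148.8 FPS
      print(nominal - tolerance, nominal + tolerance)
      # ~1,487,946.4 to ~1,488,244.0 FPS, matching the bounds above.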
6. Buffering

6.1. Buffer

6.1.1. Definition

   Buffer Size: the term buffer size represents the total amount of
   frame buffering memory available on a DUT. This size is expressed in
   B (bytes), KB (kilobytes), MB (megabytes) or GB (gigabytes). When
   the buffer size is expressed, it SHOULD be defined by one of the
   size metrics above. When the buffer size is expressed, an indication
   of the frame MTU used for that measurement is also necessary, as
   well as the cos or dscp value set, as the buffers are often carved
   by the quality of service implementation (please refer to the buffer
   efficiency section for further details).

   Example: the Buffer Size of the DUT when sending 1518-byte frames is
   18 MB.

   Port Buffer Size: the port buffer size is the amount of buffer for a
   single ingress port, a single egress port, or a combination of
   ingress and egress buffering locations for a single port. The reason
   for mentioning the three locations for the port buffer is that the
   DUT buffering scheme can be unknown or untested, and therefore the
   indication of where the buffer is located helps in understanding the
   buffer architecture and therefore the total buffer size. The Port
   Buffer Size is an informational value that MAY be provided by the
   DUT vendor. It is not a value that is tested by benchmarking;
   benchmarking will be done using the Maximum Port Buffer Size or
   Maximum Buffer Size methodology.

   Maximum Port Buffer Size: this is in most cases the same as the Port
   Buffer Size. In certain switch architectures, called SoC (switch on
   chip), there is a concept of a port buffer plus a shared buffer pool
   available to all ports. The Maximum Port Buffer Size covers the SoC
   buffer scenario, where the amount in B (bytes), KB (kilobytes), MB
   (megabytes) or GB (gigabytes) represents the sum of the port buffer
   and the maximum amount of shared buffer this given port can take.
   The Maximum Port Buffer Size needs to be expressed along with the
   frame MTU used for the measurement and the cos or dscp value set for
   the test.

   Example: a DUT has been measured to have 3 KB of port buffer for
   1518-byte frames and a total of 4.7 MB of maximum port buffer for
   1518-byte frames and a cos of 0.

   Maximum DUT Buffer Size: this is the total amount of buffer a DUT
   can be measured to have. It is most likely different from the
   Maximum Port Buffer Size. It can also be different from the sum of
   the Maximum Port Buffer Sizes. The Maximum Buffer Size needs to be
   expressed along with the frame MTU used for the measurement and the
   cos or dscp value set during the test.

   Example: a DUT has been measured to have 3 KB of port buffer for
   1518-byte frames and a total of 4.7 MB of maximum port buffer for
   1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at 1500
   bytes and a cos of 0.

   Burst: a burst is a fixed number of packets sent over a percentage
   of line rate for a defined port speed. The frames sent are evenly
   distributed across the interval T. A constant C can be defined as
   the average time between two consecutive packets evenly spaced.

   Microburst: a microburst is a burst in which packet drops occur
   without sustained or noticeable congestion on a link or device. A
   microburst is characterized by a Burst that is not evenly
   distributed over T, with inter-packet gaps less than the constant C
   [C = average time between two consecutive packets evenly spaced
   out].

   Intensity of Microburst: this is a percentage, between 1 and 100%,
   representing the level of microburst. The higher the number, the
   more intense the microburst is.

      I = [1 - ((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))) /
      Sum(packets)] * 100

6.1.2. Discussion

   When measuring buffering on a DUT, it is important to understand the
   behavior for each port as well as for all ports, as this provides
   evidence of the total amount of buffering available on the switch.
   The term buffer efficiency here helps one understand the optimum
   packet size for the buffer, or the real volume of buffer available
   for a specific packet size. This section does not discuss how to
   conduct the test methodology; rather, it explains the buffer
   definitions and what metrics should be provided for comprehensive
   data center device buffering benchmarking.

6.1.3. Measurement Units

   When buffer is measured:

   -the buffer size MUST be measured

   -the port buffer size MAY be provided for each port

   -the maximum port buffer size MUST be measured

   -the maximum DUT buffer size MUST be measured

   -the intensity of microburst MAY be mentioned when a microburst test
   is performed

   -the cos or dscp value set during the test SHOULD be provided
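   As an illustration, here is one reading of the intensity formula as
   a sketch. The numerator's telescoping sum is the actual time span of
   the burst (TpN - Tp1); the denominator "Sum(packets)" is interpreted
   here, as an assumption, as the span the same packets would occupy if
   evenly spaced at the constant C:

      # Sketch: microburst intensity under the stated interpretation.
      def microburst_intensity(arrival_times, c):
          """arrival_times: per-packet timestamps in seconds;
          c: average inter-packet gap for even spacing, in seconds."""
          gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
          even_span = c * len(gaps)  # (N-1) * C, the evenly spaced span
          return (1 - sum(gaps) / even_span) * 100

      # Four packets expected every 10 us (C = 10 us) but arriving
      # only 2 us apart: a heavily compressed burst.
      print(microburst_intensity([0, 2e-6, 4e-6, 6e-6], 10e-6))  # 80.0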
6.2. Incast

6.2.1. Definition

   The term Incast, very commonly utilized in the data center, refers
   to the traffic pattern of many-to-one or many-to-many conversations.
   Typically in the data center it refers to many different ingress
   server ports (many) sending traffic to a common uplink (one) or to
   multiple uplinks (many). This pattern is generalized for any network
   as many incoming ports sending traffic to one or a few uplinks. It
   can also be found in many-to-many traffic patterns.

   Synchronous arrival time: when two or more frames of respective
   sizes L1 and L2 arrive at their respective one or multiple ingress
   ports, and there is an overlap of the arrival time for any of the
   bits on the DUT, then the frames L1 and L2 have synchronous arrival
   times. This is called incast.

   Asynchronous arrival time: any condition not defined by synchronous
   arrival time.

   Percentage of synchronization: this defines the level of overlap
   [amount of bits] between the frames L1, L2...Ln.

   Example: two 64-byte frames, of length L1 and L2, arrive at ingress
   port 1 and port 2 of the DUT. There is an overlap of 6.4 bytes
   during which L1 and L2 were present at the same time on their
   respective ingress ports. The percentage of synchronization is
   therefore 10%.

   Stateful type traffic defines packets exchanged with a stateful
   protocol, such as TCP.

   Stateless type traffic defines packets exchanged with a stateless
   protocol, such as UDP.

6.2.2. Discussion

   In this scenario, buffers are solicited on the DUT. In an ingress
   buffering mechanism, the ingress port buffers are solicited along
   with Virtual Output Queues, when available; whereas in an egress
   buffering mechanism, the egress buffer of the one outgoing port is
   used.

   In either case, regardless of where the buffer memory is located in
   the switch architecture, the incast creates buffer utilization.

   When two or more frames have synchronous arrival times at the DUT,
   they are considered to form an incast.

6.2.3. Measurement Units

   It is a MUST to measure the number of ingress and egress ports. It
   is a MUST to have a non-null percentage of synchronization, which
   MUST be specified.

7. Application Throughput: Data Center Goodput

7.1. Definition

   In data center networking, a balanced network is a function of
   maximal throughput "and" minimal loss at any given time. This is
   defined by the goodput. Goodput is the application-level throughput,
   measured in bytes per second. It is the measurement of the actual
   payload of the packets being sent.

7.2. Discussion

   In data center benchmarking, the goodput is a value that SHOULD be
   measured. It provides a realistic idea of the usage of the available
   bandwidth. A goal in data center environments is to maximize the
   goodput while minimizing the loss.

7.3. Measurement Units

   When S is the total number of bytes received from all senders [not
   inclusive of packet headers or TCP headers - only the payload] and
   Ft is the finishing time of the last sender, the goodput G is
   measured by the following formula:

      G = S / Ft bytes per second

   Example: a TCP file transfer over the HTTP protocol on a 10 Gb/s
   medium. The file cannot be transferred over Ethernet as a single
   continuous stream. It must be broken down into individual frames of
   1500 bytes when the standard MTU [Maximum Transmission Unit] is
   used. Each packet requires 20 bytes of IP header information and 20
   bytes of TCP header information, so 1460 bytes are available per
   packet for the file transfer. Linux-based systems are further
   limited to 1448 bytes, as they also carry a 12-byte timestamp.
   Finally, the data in this example is transmitted over Ethernet,
   which adds 26 bytes of overhead per packet.

      G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s, or 1.196
      gigabytes per second.

   Please note: this example does not take into consideration
   additional Ethernet overhead such as the interframe gap (a minimum
   of 96 bit times), nor collisions (which have a variable impact,
   depending on the network load).

   When conducting goodput measurements, please document, in addition
   to the items in Section 4.1:

   -the TCP stack used

   -OS versions

   -NIC firmware version and model

   For example, Windows TCP stacks and different Linux versions can
   influence TCP-based test results.
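   The formula and the worked example above reduce to a few lines of
   arithmetic; this sketch (with illustrative names) reproduces both:

      # Sketch: goodput G = S / Ft, plus the 10GE efficiency example.
      def goodput(total_payload_bytes, finishing_time_s):
          # S = payload bytes from all senders (no IP/TCP headers);
          # Ft = finishing time of the last sender.
          return total_payload_bytes / finishing_time_s

      # Standard-MTU TCP on 10 Gb/s Ethernet: 1500 - 20 (IP) - 20 (TCP)
      # = 1460 payload bytes per 1500 + 26 bytes on the wire.
      payload, wire = 1460, 1500 + 26
      print(10 * payload / wire)      # ~9.567 Gbit/s of goodput
      print(10 * payload / wire / 8)  # ~1.196 gigabytes per second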
8. References

8.1. Normative References

   [1] Bradner, S., "Benchmarking Terminology for Network
       Interconnection Devices", RFC 1242, July 1991.

   [2] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
       Network Interconnect Devices", RFC 2544, March 1999.

   [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

   [3] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN
       Switching Devices", RFC 2889, August 2000.

   [4] Stopp, D. and B. Hickman, "Methodology for IP Multicast
       Benchmarking", RFC 3918, October 2004.

8.3. URL References

   [5] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph,
       "Understanding TCP Incast Throughput Collapse in Datacenter
       Networks",
       http://www.eecs.berkeley.edu/~ychen2/professional/TCPIncastWREN2009.pdf

8.4. Acknowledgments

   The authors would like to thank Ian Cox and Tim Stevenson for their
   reviews and feedback.

Authors' Addresses

   Lucien Avramov
   Cisco Systems
   170 West Tasman drive
   San Jose, CA 95134
   United States
   Phone: +1 408 526 7686
   Email: lavramov@cisco.com

   Jacob Rapp
   Hewlett-Packard Company
   3000 Hanover Street
   Palo Alto, CA 94304
   United States
   Phone: +1 650 857 3367
   Email: jacob.h.rapp@hp.com