2 Internet Engineering Task Force L. Avramov 3 Internet-Draft, Intended status: Informational Google 4 Expires: July 3, 2017 J. Rapp 5 December 30, 2016 VMware 7 Data Center Benchmarking Terminology 8 draft-ietf-bmwg-dcbench-terminology-06 10 Abstract 12 The purpose of this informational document is to establish definitions, 13 discussion and measurement techniques for data center benchmarking. 14 It also introduces new terminologies applicable to data center 15 performance evaluations. The purpose of this document is not to define 16 the test methodology, but rather to establish the important concepts when 17 one is interested in benchmarking network switches and routers in the 18 data center. 20 Status of this Memo 22 This Internet-Draft is submitted in full conformance with the provisions 23 of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering Task 26 Force (IETF), its areas, and its working groups. Note that other groups 27 may also distribute working documents as Internet-Drafts. 
29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference material 32 or to cite them other than as "work in progress." 34 The list of current Internet-Drafts can be accessed at 35 http://www.ietf.org/1id-abstracts.html 37 The list of Internet-Draft Shadow Directories can be accessed at 38 http://www.ietf.org/shadow.html 40 Copyright Notice 42 Copyright (c) 2016 IETF Trust and the persons identified as the document 43 authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal Provisions 46 Relating to IETF Documents (http://trustee.ietf.org/license-info) in 47 effect on the date of publication of this document. Please review these 48 documents carefully, as they describe your rights and restrictions with 49 respect to this document. Code Components extracted from this document 50 must include Simplified BSD License text as described in Section 4.e of 51 the Trust Legal Provisions and are provided without warranty as 52 described in the Simplified BSD License. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 58 1.2. Definition format . . . . . . . . . . . . . . . . . . . . . 4 59 2. Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . 4 61 2.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 5 62 2.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 6 63 3 Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 64 3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 6 65 3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 6 66 3.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 6 67 4 Physical Layer Calibration . . . . . . . . 
. . . . . . . . . . . 7 68 4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 7 69 4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 7 70 4.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 7 71 5 Line rate . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 72 5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 8 73 5.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 8 74 5.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 9 75 6 Buffering . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 76 6.1 Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 77 6.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 10 78 6.1.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . 12 79 6.1.3 Measurement Units . . . . . . . . . . . . . . . . . . . 12 80 6.2 Incast . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 81 6.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 12 82 6.2.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . 13 83 6.2.3 Measurement Units . . . . . . . . . . . . . . . . . . . 13 84 7 Application Throughput: Data Center Goodput . . . . . . . . . . 13 85 7.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . 13 86 7.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . . 14 87 7.3. Measurement Units . . . . . . . . . . . . . . . . . . . . . 14 88 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15 89 8.1. Normative References . . . . . . . . . . . . . . . . . . . 15 90 8.2. Informative References . . . . . . . . . . . . . . . . . . 15 91 8.3. Acknowledgments . . . . . . . . . . . . . . . . . . . . . 15 92 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 94 1. Introduction 96 Traffic patterns in the data center are not uniform and are constantly 97 changing. They are dictated by the nature and variety of applications 98 utilized in the data center. 
It can be largely east-west traffic 99 flows in one data center and north-south in another, while some may 100 combine both. Traffic patterns can be bursty in nature and contain 101 many-to-one, many-to-many, or one-to-many flows. Each flow may also 102 be small and latency sensitive or large and throughput sensitive 103 while containing a mix of UDP and TCP traffic. All of these can 104 coexist in a single cluster and flow through a single network device 105 all at the same time. Benchmarking of network devices has long used 106 RFC1242, RFC2432, RFC2544, RFC2889 and RFC3918. These benchmarks have 107 largely been focused on various latency attributes and the maximum 108 throughput of the Device Under Test being benchmarked. These 109 standards are good at measuring theoretical maximum throughput, 110 forwarding rates and latency under testing conditions, but do not 111 represent real traffic patterns that may affect these networking 112 devices. The data center networking devices covered are switches and 113 routers. 115 This document defines a set of definitions, metrics and terminologies, 116 including congestion scenarios and switch buffer analysis, and redefines 117 basic definitions in order to represent a wide mix of traffic 118 conditions. 120 1.1. Requirements Language 122 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 123 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 124 document are to be interpreted as described in RFC 2119 [6]. 126 1.2. Definition format 128 Term to be defined. (e.g., Latency) 130 Definition: The specific definition for the term. 132 Discussion: A brief discussion about the term, its application and 133 any restrictions on measurement procedures. 135 Measurement Units: Methodology for the measure and units used to 136 report measurements of this term, if applicable. 138 2. Latency 140 2.1. Definition 142 Latency is the amount of time it takes a frame to transit the DUT. 
143 Latency is measured in units of time (seconds, milliseconds, 144 microseconds and so on). The purpose of measuring latency is to 145 understand the impact of adding a device in the communication 146 path. 148 The Latency interval can be assessed between different combinations 149 of events, irrespective of the type of switching device (bit 150 forwarding, aka cut-through, or store-and-forward type of device). 152 Traditionally the latency measurement definitions are: 154 FILO (First In Last Out): The time interval starting when the end of 155 the first bit of the input frame reaches the input port and ending 156 when the last bit of the output frame is seen on the output port. 158 FIFO (First In First Out): The time interval starting when the end of 159 the first bit of the input frame reaches the input port and ending 160 when the start of the first bit of the output frame is seen on the 161 output port. 163 LILO (Last In Last Out): The time interval starting when the last bit 164 of the input frame reaches the input port and ending when the last bit of the 165 output frame is seen on the output port. 167 LIFO (Last In First Out): The time interval starting when the last 168 bit of the input frame reaches the input port and ending when the 169 first bit of the output frame is seen on the output port. 171 Another way to summarize the four definitions above 172 is to refer to the bit positions as they normally occur: input to 173 output. 
175 FILO is FL (First bit Last bit); FIFO is FF (First bit First 176 bit); LILO is LL (Last bit Last bit); LIFO is LF (Last bit First bit). 178 The definition explained in this section, in the context of data center 179 switching benchmarking, is in lieu of the previous definition of 180 Latency defined in RFC 1242, Section 3.8, which is quoted here: 182 For store and forward devices: The time interval starting when the 183 last bit of the input frame reaches the input port and ending when 184 the first bit of the output frame is seen on the output port. 186 For bit forwarding devices: The time interval starting when the end 187 of the first bit of the input frame reaches the input port and ending 188 when the start of the first bit of the output frame is seen on the 189 output port. 191 2.2 Discussion 193 FILO is the most important measuring definition. Any type of switch 194 MUST be measured with the FILO mechanism: FILO will include the 195 latency of the switch and the latency of the frame as well as the 196 serialization delay. It is a picture of the 'whole' latency going 197 through the DUT. For applications that are latency-sensitive and 198 can function with the initial bytes of the frame, FIFO MAY be an 199 additional type of measurement to supplement FILO. 201 Not all DUTs are exclusively cut-through or store-and-forward. Data 202 Center DUTs are frequently store-and-forward for smaller packet sizes 203 and then adopt a cut-through behavior. FILO covers all scenarios. 205 The LIFO mechanism can be used with store-and-forward switches but 206 not with cut-through switches, as it will provide negative 207 latency values for larger packet sizes because LIFO removes the 208 serialization delay. Therefore, this mechanism MUST NOT be used when 209 comparing latencies of two different DUTs. 
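As an illustration only (not part of the draft's methodology), the four measuring definitions can be computed from the four bit-level timestamps. The function and variable names, and the cut-through timing figures, are assumptions made for this sketch:

```python
# Sketch: computing the four latency definitions of Section 2.1 from four
# timestamps (in seconds). Names and example values are illustrative.

def latencies(in_first, in_last, out_first, out_last):
    """Return the four latency definitions as a dict."""
    return {
        "FILO": out_last - in_first,   # First bit in, Last bit out
        "FIFO": out_first - in_first,  # First bit in, First bit out
        "LILO": out_last - in_last,    # Last bit in, Last bit out
        "LIFO": out_first - in_last,   # Last bit in, First bit out
    }

# Assumed cut-through DUT, 1518-byte frame at 10 Gb/s: serialization is
# ~1.21 us, and forwarding starts ~0.5 us after the first bit arrives.
ser = 1518 * 8 / 10e9
lat = latencies(0.0, ser, 0.5e-6, 0.5e-6 + ser)
assert lat["LIFO"] < 0  # LIFO goes negative: serialization delay removed
```

The negative LIFO value reproduces the effect described in Section 2.2: for a cut-through device, the first output bit appears before the last input bit has arrived, which is why LIFO MUST NOT be used to compare DUTs.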
211 2.3 Measurement Units 213 The measuring methods to use for benchmarking purposes are as follows: 215 1) FILO MUST be used as a measuring method, as this will include the 216 latency of the packet; today, applications commonly need to 217 read the whole packet to process the information and take an action. 219 2) FIFO MAY be used for certain applications able to process data as 220 the first bits arrive (an FPGA, for example). 222 3) LIFO MUST NOT be used, because it subtracts the latency of the 223 packet, unlike all the other methods. 225 3 Jitter 227 3.1 Definition 229 Jitter in the data center context is synonymous with the common term 230 Delay Variation. It is derived from multiple measurements of one-way 231 delay, as described in RFC 3393. The mandatory definition of Delay 232 Variation is the PDV form from Section 4.2 of RFC 5481. When 233 considering a stream of packets, the delays of all packets are 234 subtracted from the minimum delay over all packets in the stream. 235 This facilitates assessment of the range of delay variation (Max - 236 Min), or a high percentile of PDV (99th percentile, for robustness 237 against outliers). 239 If First-bit to Last-bit timestamps are used for Delay measurement, 240 then Delay Variation MUST be measured using packets or frames of the 241 same size, since the definition of latency includes the serialization 242 time for each packet. Otherwise, if using First-bit to First-bit, the 243 size restriction does not apply. 245 3.2 Discussion 247 In addition to the PDV Range and/or a high percentile of PDV, Inter- 248 Packet Delay Variation (IPDV) as defined in Section 4.1 of RFC 5481 249 (differences between two consecutive packets) MAY be used for the 250 purpose of determining how packet spacing has changed during 251 transfer, for example to see if the packet stream has become closely- 252 spaced or "bursty". 
However, the Absolute Value of IPDV SHOULD NOT be 253 used, as this collapses the "bursty" and "dispersed" sides of the 254 IPDV distribution together. 256 3.3 Measurement Units 257 The measurement of delay variation is expressed in units of seconds. 258 A PDV histogram MAY be provided for the population of packets 259 measured. 261 4 Physical Layer Calibration 263 4.1 Definition 265 The calibration of the physical layer consists of defining and 266 measuring the latency of the physical devices used to perform tests on 267 the DUT. 269 It includes the list of all physical layer components used, as listed 270 hereafter: 272 -type of device used to generate traffic / measure traffic 274 -type of line cards used on the traffic generator 276 -type of transceivers on traffic generator 278 -type of transceivers on DUT 280 -type of cables 282 -length of cables 284 -software name, and version of traffic generator and DUT 286 -list of enabled features on DUT MAY be provided and is recommended 287 [especially the control plane protocols such as LLDP, Spanning-Tree 288 etc.]. A comprehensive configuration file MAY be provided to this 289 effect. 291 4.2 Discussion 293 Physical layer calibration is part of the end-to-end latency, which 294 should be taken into account while evaluating the DUT. Small 295 variations of the physical components of the test may impact the 296 latency being measured, so they MUST be described when presenting 297 results. 299 4.3 Measurement Units 301 It is RECOMMENDED to use cables of the same type and the same 302 length and, when possible, from the same vendor. It is a MUST to document 303 the cable specifications in Section 4.1 along with the test 304 results. The test report MUST specify if the cable latency has been 305 removed from the test measures or not. The accuracy of the traffic 306 generator measurement MUST be provided [this is usually a value in the 307 20ns range for current test equipment]. 
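To illustrate what "removing the cable latency from the test measures" could look like, here is an assumption-laden sketch (not the draft's methodology): the ~5 ns/m fiber propagation figure is a common approximation, and the function names are hypothetical:

```python
# Sketch: subtracting physical-layer (cable) latency from a DUT measurement,
# as Section 4.3 requires the report to state whether this was done.
# The ~5 ns/m propagation figure is an assumed fiber approximation.

NS_PER_METER = 5.0  # assumed propagation delay per meter of fiber

def dut_latency_ns(measured_ns, cable_len_m_tx, cable_len_m_rx,
                   generator_accuracy_ns=20.0):
    """Subtract tx- and rx-side cable propagation from a measured latency.

    Returns (corrected latency in ns, +/- accuracy bound of the generator,
    per the ~20 ns figure quoted in Section 4.3).
    """
    cable_ns = (cable_len_m_tx + cable_len_m_rx) * NS_PER_METER
    return measured_ns - cable_ns, generator_accuracy_ns

# 3 m of cable on each side -> 30 ns removed from an 850 ns measurement.
corrected, accuracy = dut_latency_ns(850.0, 3.0, 3.0)
assert corrected == 820.0
```

Whichever convention is chosen, the test report MUST state whether such a correction was applied.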
309 5 Line rate 311 5.1 Definition 313 The transmit timing, or maximum transmitted data rate, is controlled 314 by the "transmit clock" in the DUT. The receive timing (maximum 315 ingress data rate) is derived from the transmit clock of the 316 connected interface. 318 The line rate or physical layer frame rate is the maximum capacity to 319 send frames of a specific size at the transmit clock frequency of the 320 DUT. 322 The term port capacity defines the maximum speed capability for 323 the given port; for example 1GE, 10GE, 40GE, 100GE etc. 325 The frequency ("clock rate") of the transmit clock in any two 326 connected interfaces will never be precisely the same; therefore, a 327 tolerance is needed. This will be expressed as a Parts Per Million 328 (PPM) value. The IEEE standards allow a specific +/- variance in the 329 transmit clock rate, and Ethernet is designed to allow for small, 330 normal variations between the two clock rates. This results in a 331 tolerance of the line rate value when traffic is generated from 332 testing equipment to a DUT. 334 Line rate SHOULD be measured in frames per second. 336 5.2 Discussion 338 For a transmit clock source, most Ethernet switches use "clock 339 modules" (also called "oscillator modules") that are sealed, 340 internally temperature-compensated, and very accurate. The output 341 frequency of these modules is not adjustable because it is not 342 necessary. Many test sets, however, offer a software-controlled 343 adjustment of the transmit clock rate, which should be used to 344 compensate the test equipment so that it does not send more than the line rate of the 345 DUT. 347 To allow for the minor variations typically found in the clock rate 348 of commercially-available clock modules and other crystal-based 349 oscillators, Ethernet standards specify the maximum transmit clock 350 rate variation to be not more than +/- 100 PPM (parts per million) 351 from a calculated center frequency. 
Therefore, a DUT must be able to 352 accept frames at a rate within +/- 100 PPM to comply with the 353 standards. 355 Very few clock circuits are precisely +/- 0.0 PPM because: 357 1. The Ethernet standards allow a maximum of +/- 100 PPM (parts per 358 million) variance over time. Therefore, it is normal for the frequency 359 of the oscillator circuits to experience variation over time and over 360 a wide temperature range, among other external factors. 362 2. The crystals, or clock modules, usually have a specific +/- PPM 363 variance that is significantly better than +/- 100 PPM. Oftentimes 364 this is +/- 30 PPM or better in order to be considered a 365 "certification instrument". 367 When testing an Ethernet switch throughput at "line rate", any 368 specific switch will have a clock rate variance. If a test set is 369 running +1 PPM faster than a switch under test, and a sustained line 370 rate test is performed, a gradual increase in latency, and eventually 371 packet drops as buffers fill and overflow in the switch, can be 372 observed. Depending on how much clock variance there is between the 373 two connected systems, the effect may be seen after the traffic 374 stream has been running for a few hundred microseconds, a few 375 milliseconds, or a few seconds. Low latency and no packet loss can 376 instead be demonstrated by setting the test set link occupancy to slightly 377 less than 100 percent link occupancy. Typically, 99 percent link 378 occupancy produces excellent low latency and no packet loss. No 379 Ethernet switch or router will have a transmit clock rate of exactly 380 +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is 381 precisely +/- 0.0 PPM. 383 Test set equipment manufacturers are well aware of the standards and 384 allow a software-controlled +/- 100 PPM "offset" (clock-rate 385 adjustment) to compensate for normal variations in the clock speed of 386 "devices under test". 
This offset adjustment allows engineers to 387 determine the approximate speed at which the connected device is operating 388 and verify that it is within the parameters allowed by the standards. 390 5.3 Measurement Units 392 "Line Rate" CAN be measured in terms of "Frame Rate": 394 Frame Rate = Transmit-Clock-Frequency / (Frame-Length*8 + Minimum_Gap 395 + Preamble + Start-Frame Delimiter) 397 Minimum_Gap represents the inter-frame gap. This formula "scales up" 398 or "scales down" to represent 1 Gb Ethernet, or 10 Gb Ethernet and so 399 on. 401 Example for 1 Gb Ethernet speed with 64-byte frames: Frame Rate = 402 1,000,000,000 / (64*8 + 96 + 56 + 8) Frame Rate = 1,000,000,000 / 672 403 Frame Rate = 1,488,095.2 frames per second. 405 Considering the allowance of +/- 100 PPM, a switch may "legally" 406 transmit traffic at a frame rate between 1,487,946.4 FPS and 407 1,488,244 FPS. Each 1 PPM variation in clock rate will translate to 408 a 1.488 frame-per-second frame rate increase or decrease. 410 In a production network, it is very unlikely to see precise line rate 411 over a very brief period. There is no observable difference between 412 dropping packets at 99% of line rate and 100% of line rate. 413 -Line rate CAN be measured at 100% of line rate with a -100 PPM adjustment. 414 -Line rate SHOULD be measured at 99.98% with 0 PPM adjustment. 415 -The PPM adjustment SHOULD only be used for a line rate type of measurement. 417 6 Buffering 419 6.1 Buffer 421 6.1.1 Definition 423 Buffer Size: the term buffer size represents the total amount of 424 frame buffering memory available on a DUT. This size is expressed in 425 B (bytes), KB (kilobytes), MB (megabytes) or GB (gigabytes). When the 426 buffer size is expressed it SHOULD be defined by a size metric 427 stated above.
When the buffer size is expressed, an indication of 428 the frame MTU used for that measurement is also necessary as well as 429 the cos or dscp value set, as oftentimes the buffers are carved by 430 the quality of service implementation. (Please refer to the buffer 431 efficiency section for further details.) 433 Example: the Buffer Size of the DUT when sending 1518-byte frames is 18 MB. 435 Port Buffer Size: the port buffer size is the amount of buffer available to a 436 single ingress port, egress port, or combination of ingress and egress 437 buffering locations for a single port. The reason for mentioning the 438 three locations for the port buffer is that the DUT buffering scheme 439 can be unknown or untested, and therefore the indication of where the 440 buffer is located helps one understand the buffer architecture and 441 therefore the total buffer size. The Port Buffer Size is an 442 informational value that MAY be provided by the DUT vendor. It is 443 not a value that is tested by benchmarking. Benchmarking will be done 444 using the Maximum Port Buffer Size or Maximum Buffer Size 445 methodology. 447 Maximum Port Buffer Size: this is in most cases the same as the Port 448 Buffer Size. In a certain switch architecture called SoC (switch on 449 chip), there is a concept of a port buffer and a shared buffer pool 450 available for all ports. Maximum Port Buffer defines the scenario of 451 a SoC buffer, where this amount in B (bytes), KB (kilobytes), MB 452 (megabytes) or GB (gigabytes) would represent the sum of the port 453 buffer along with the maximum value of shared buffer this given port 454 can take. The Maximum Port Buffer Size needs to be expressed along 455 with the frame MTU used for the measurement and the cos or dscp bit 456 value set for the test. 458 Example: a DUT has been measured to have 3 KB of port buffer for 1518- 459 byte frames and a total of 4.7 MB of maximum port buffer for 460 1518-byte frames and a cos of 0. 
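The SoC case above can be sketched numerically. This is an illustration only: the function name and the 50% shared-pool share are made-up values, chosen so the result lands near the draft's 4.7 MB example:

```python
# Sketch: Maximum Port Buffer Size for an SoC architecture (Section 6.1.1):
# the dedicated port buffer plus the maximum share of the common pool the
# port can take. Names and figures are illustrative assumptions.

def max_port_buffer_bytes(port_buffer, shared_pool, max_share_of_pool):
    """Dedicated port buffer plus the largest claim on the shared pool."""
    return port_buffer + shared_pool * max_share_of_pool

# Assume 3 KB dedicated per port, a 9.4 MB shared pool, and that one port
# may claim at most 50% of the pool.
mpb = max_port_buffer_bytes(3 * 1024, 9.4e6, 0.5)
assert round(mpb / 1e6, 2) == 4.7  # ~4.7 MB, comparable to the example
```

As the draft notes, any such figure is only meaningful when reported together with the frame MTU and the cos or dscp value used for the measurement.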
462 Maximum DUT Buffer Size: this is the total size of buffer a DUT can 463 be measured to have. It is most likely different from the Maximum 464 Port Buffer Size. It CAN also be different from the sum of the Maximum 465 Port Buffer Sizes. The Maximum Buffer Size needs to be expressed along 466 with the frame MTU used for the measurement and along with the cos or 467 dscp value set during the test. 469 Example: a DUT has been measured to have 3 KB of port buffer for 1518- 470 byte frames and a total of 4.7 MB of maximum port buffer for 471 1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB 472 at 1500 bytes and a cos of 0. 474 Burst: a burst is a fixed number of packets sent over a percentage 475 of line rate at a defined port speed. The frames sent are 476 evenly distributed across the interval T. A constant C can be 477 defined to provide the average time between two consecutive packets 478 evenly spaced. 480 Microburst: a microburst is a burst in which packet drops occur 481 without sustained or noticeable congestion upon a link or 482 device. A characterization of a microburst is when the burst is not 483 evenly distributed over T, and the packet spacing is less than the constant C [C = 484 average time between two consecutive packets evenly spaced out]. 486 Intensity of Microburst: this is a percentage representing the level 487 of microburst, between 1 and 100%. The higher the number, the higher 488 the microburst is. I = [1 - [(Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))] / 489 Sum(packets)] * 100 490 The above definitions are not meant to comment on the ideal sizing of 491 a buffer, rather on how to measure it. A larger buffer is not 492 necessarily better and CAN cause issues with buffer bloat. 494 6.1.2 Discussion 496 When measuring buffering on a DUT, it is important to understand what 497 the behavior is for each port, and also for all ports, as this will 498 provide evidence of the total amount of buffering available on the 499 switch. 
The term buffer efficiency here helps one understand 500 the optimum packet size for the buffer to be used, and 501 the real volume of buffer available for a specific packet size. This 502 section does not discuss how to conduct the test methodology; rather, 503 it explains the buffer definitions and what metrics should be 504 provided for a comprehensive data center device buffering 505 benchmarking. 507 6.1.3 Measurement Units 509 When a buffer is measured: -the buffer size MUST be measured; -the port 510 buffer size MAY be provided for each port; -the maximum port buffer 511 size MUST be measured; -the maximum DUT buffer size MUST be measured; 512 -the intensity of microburst MAY be mentioned when a microburst test 513 is performed; -the cos or dscp value set during the test SHOULD be 514 provided. 516 6.2 Incast 517 6.2.1 Definition 519 The term Incast, very commonly utilized in the data center, refers to 520 the traffic pattern of many-to-one or many-to-many conversations. 521 Typically in the data center it would refer to many different ingress 522 server ports (many) sending traffic to a common uplink (one), or 523 multiple uplinks (many). This pattern is generalized for any network 524 as many incoming ports sending traffic to one or a few uplinks. It can 525 also be found in many-to-many traffic patterns. 527 Synchronous arrival time: when two or more frames of respective 528 sizes L1 and L2 arrive at their respective one or multiple ingress 529 ports, and there is an overlap of the arrival time for any of the 530 bits on the DUT, then the frames L1 and L2 have synchronous arrival 531 times. This is called incast. 533 Asynchronous arrival time: any condition not defined by synchronous arrival time. 535 Percentage of synchronization: this defines the level of overlap 537 [amount of bits] between the frames L1, L2..Ln. 539 Example: two 64-byte frames, of length L1 and L2, arrive at ingress 540 port 1 and port 2 of the DUT. 
There is an overlap of 6.4 bytes 541 between the two where L1 and L2 were at the same time on their 542 respective ingress ports. Therefore, the percentage of synchronization 543 is 10%. 545 Stateful type traffic defines packets exchanged with a stateful 546 protocol such as TCP. 548 Stateless type traffic defines packets exchanged with a stateless 549 protocol such as UDP. 551 6.2.2 Discussion 553 In this scenario, buffers are solicited on the DUT. In an ingress 554 buffering mechanism, the ingress port buffers would be solicited 555 along with Virtual Output Queues, when available; whereas in an 556 egress buffering mechanism, the egress buffer of the one outgoing port 557 would be used. 559 In either case, regardless of where the buffer memory is located in 560 the switch architecture, the Incast creates buffer utilization. 562 When two or more frames have synchronous arrival times at the DUT, 563 they are considered to form an incast. 565 6.2.3 Measurement Units 567 It is a MUST to measure the number of ingress and egress ports. It is 568 a MUST to have a non-null percentage of synchronization, which MUST 569 be specified. 571 7 Application Throughput: Data Center Goodput 573 7.1. Definition 575 In Data Center Networking, a balanced network is a function of 576 maximal throughput 'and' minimal loss at any given time. This is 577 defined by the Goodput. Goodput is the application-level throughput. 578 The definition used is a variant of the definition in RFC 2647. 580 Goodput is the number of bits per unit of time forwarded to the 581 correct destination interface of the DUT/SUT, minus any bits 582 retransmitted. 584 7.2. Discussion 586 In data center benchmarking, the goodput is a value that SHOULD be 587 measured. It provides a realistic idea of the usage of the available 588 bandwidth. A goal in data center environments is to maximize the 589 goodput while minimizing the loss. 591 7.3. 
Measurement Units 593 When S is the total number of bytes received from all senders [not inclusive of 594 packet headers or TCP headers - it is the payload] and Ft is the 595 Finishing Time of the last sender, the Goodput G is then given by 596 the following formula: G = S / Ft bytes per second 598 Example: a TCP file transfer over HTTP on 10 Gb/s media. 599 The file cannot be transferred over Ethernet as a single continuous 600 stream. It must be broken down into individual frames of 1500 bytes 601 when the standard MTU [Maximum Transmission Unit] is used. Each 602 packet requires 20 bytes of IP header information and 20 bytes of TCP 603 header information; therefore, 1460 bytes are available per packet for 604 the file transfer. Linux-based systems are further limited to 1448 605 bytes as they also carry a 12-byte timestamp. Finally, the data is 606 transmitted in this example over Ethernet, which adds a 26-byte 607 overhead per packet. 609 G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s or 1.196 gigabytes per 610 second. 612 Please note: this example does not take into consideration additional 613 Ethernet overhead, such as the interframe gap (a minimum of 96 bit 614 times), nor collisions (which have a variable impact, depending on 615 the network load). 617 When conducting Goodput measurements, please document, in addition to 618 the Section 4.1 items: 620 -the TCP stack used 622 -OS versions 624 -NIC firmware version and model 626 For example, Windows TCP stacks and different Linux versions can 627 influence TCP-based test results. 629 8. References 631 8.1. Normative References 633 [1] Bradner, S., "Benchmarking Terminology for Network 634 Interconnection Devices", RFC 1242, July 1991. 636 [2] Bradner, S. and J. McQuaid, "Benchmarking Methodology for 637 Network Interconnect Devices", RFC 2544, March 1999. [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 639 8.2. Informative References 641 [3] Mandeville R. and Perser J., "Benchmarking Methodology for LAN 642 Switching Devices", RFC 2889, August 2000. 
644 [4] Stopp D. and Hickman B., "Methodology for IP Multicast 645 Benchmarking", RFC 3918, October 2004. 647 [5] Yanpei Chen, Rean Griffith, Junda Liu, Randy H. Katz, Anthony D. 648 Joseph, "Understanding TCP Incast Throughput Collapse in 649 Datacenter Networks", 650 http://www.eecs.berkeley.edu/~ychen2/professional/TCPIncastWREN2009.pdf 652 8.3. Acknowledgments 654 The authors would like to thank Alfred Morton, Scott Bradner, 655 Ian Cox, and Tim Stevenson for their reviews and feedback. 657 Authors' Addresses 659 Lucien Avramov 660 Google 661 170 West Tasman Drive 662 Mountain View, CA 94043 663 United States 664 Email: lucienav@google.com 666 Jacob Rapp 667 VMware 668 3401 Hillview Ave 669 Palo Alto, CA 94304 670 United States 671 Phone: +1 650 857 3367 672 Email: jrapp@vmware.com