Internet Engineering Task Force                               L. Avramov
Internet-Draft, Intended status: Informational             Cisco Systems
Expires: April 20, 2015                                          J. Rapp
October 17, 2014                                         Hewlett-Packard

           Data Center Benchmarking Definitions and Metrics
                        draft-dcbench-def-02

Abstract

   The purpose of this informational document is to establish
   definitions, discussion and measurement techniques for data center
   benchmarking, and to introduce new terminology applicable to data
   center performance evaluations. The purpose of this document is not
   to define test methodology, but rather to establish the important
   concepts for benchmarking network equipment in the data center.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
      1.2. Definition Format
   2. Latency
      2.1. Definition
      2.2. Discussion
      2.3. Measurement Units
   3. Jitter
      3.1. Definition
      3.2. Discussion
      3.3. Measurement Units
   4. Physical Layer Calibration
      4.1. Definition
      4.2. Discussion
      4.3. Measurement Units
   5. Line Rate
      5.1. Definition
      5.2. Discussion
      5.3. Measurement Units
   6. Buffering
      6.1. Buffer
         6.1.1. Definition
         6.1.2. Discussion
         6.1.3. Measurement Units
      6.2. Incast
         6.2.1. Definition
         6.2.2. Discussion
         6.2.3. Measurement Units
   7. Application Throughput: Data Center Goodput
      7.1. Definition
      7.2. Discussion
      7.3. Measurement Units
   8. References
      8.1. Normative References
      8.2. Informative References
      8.3. URL References
      8.4. Acknowledgments
   Authors' Addresses

1. Introduction

   Traffic patterns in the data center are not uniform and are
   constantly changing. They are dictated by the nature and variety of
   applications utilized in the data center. Traffic can be largely
   east-west in one data center and north-south in another, while some
   data centers combine both. Traffic patterns can be bursty in nature
   and contain many-to-one, many-to-many, or one-to-many flows. Each
   flow may also be small and latency sensitive, or large and
   throughput sensitive, while containing a mix of UDP and TCP traffic.
   All of these can coexist in a single cluster and flow through a
   single network device at the same time. Benchmarking of network
   devices has long used RFC 1242 [1], RFC 2432, RFC 2544 [2], RFC 2889
   [3] and RFC 3918 [4]. These benchmarks have largely focused on
   various latency attributes and the maximum throughput of the Device
   Under Test (DUT) being benchmarked. These standards are good at
   measuring theoretical maximum throughput, forwarding rates and
   latency under testing conditions, but they do not represent the real
   traffic patterns that may affect these networking devices.

   This document provides a set of definitions, metrics and
   terminologies, including congestion scenarios and switch buffer
   analysis, and redefines basic definitions in order to represent a
   wide mix of traffic conditions.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [6].

1.2. Definition Format

   Term to be defined. (e.g., Latency)

   Definition: The specific definition for the term.

   Discussion: A brief discussion of the term, its application and any
   restrictions on measurement procedures.

   Measurement Units: The methodology for the measurement and the units
   used to report measurements of this term, if applicable.

2. Latency

2.1. Definition

   Latency is the amount of time it takes a frame to transit the DUT.
   The latency interval can be assessed between different combinations
   of events, irrespective of the type of switching device (bit
   forwarding, aka cut-through, or store-and-forward).

   Traditionally, the latency measurement definitions are:

   FILO (First In Last Out): The time interval starting when the end of
   the first bit of the input frame reaches the input port and ending
   when the last bit of the output frame is seen on the output port.

   FIFO (First In First Out): The time interval starting when the end
   of the first bit of the input frame reaches the input port and
   ending when the start of the first bit of the output frame is seen
   on the output port.

   LILO (Last In Last Out): The time interval starting when the last
   bit of the input frame reaches the input port and ending when the
   last bit of the output frame is seen on the output port.

   LIFO (Last In First Out): The time interval starting when the last
   bit of the input frame reaches the input port and ending when the
   first bit of the output frame is seen on the output port.

   Another way to summarize the four definitions above is to refer to
   the bit positions as they normally occur, input to output:

   FILO is FL (First bit in, Last bit out)
   FIFO is FF (First bit in, First bit out)
   LILO is LL (Last bit in, Last bit out)
   LIFO is LF (Last bit in, First bit out)

   The definition explained in this section, in the context of data
   center switching benchmarking, is in lieu of the previous definition
   of latency in RFC 1242, Section 3.8, which is quoted here:

      For store and forward devices: The time interval starting when
      the last bit of the input frame reaches the input port and ending
      when the first bit of the output frame is seen on the output
      port.

      For bit forwarding devices: The time interval starting when the
      end of the first bit of the input frame reaches the input port
      and ending when the start of the first bit of the output frame is
      seen on the output port.

2.2. Discussion

   FILO is the most important measurement definition. Any type of
   switch MUST be measured with the FILO mechanism: FILO includes the
   latency of the switch and the latency of the frame as well as the
   serialization delay. It is a picture of the "whole" latency through
   the DUT. For applications that are latency sensitive and can
   function with the initial bytes of the frame, FIFO MAY be used as an
   additional measurement to supplement FILO.

   The LIFO mechanism can be used with store-and-forward switches but
   not with cut-through switches, as it will produce negative latency
   values for larger packet sizes. Therefore this mechanism MUST NOT be
   used when comparing the latencies of two different DUTs.

2.3. Measurement Units

   The measurement methods to use for benchmarking purposes are as
   follows:

   1) FILO MUST be used as the measurement method, as this includes the
   latency of the packet; today, applications commonly need to read the
   whole packet to process the information and take an action.

   2) FIFO MAY be used for certain applications able to process data as
   the first bits arrive (an FPGA, for example).

   3) LIFO MUST NOT be used, because it subtracts the latency of the
   packet, unlike all the other methods.
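   To make the four definitions concrete, here is a minimal sketch in
   Python that derives each latency variant from the four bit-level
   event timestamps of a single frame. The timestamp names are
   illustrative assumptions, not part of any test-equipment API:

      # Sketch: the four latency variants from bit-level event times.
      def latency_metrics(first_bit_in, last_bit_in,
                          first_bit_out, last_bit_out):
          """All arguments are event times in seconds for one frame."""
          return {
              "FILO": last_bit_out - first_bit_in,   # First in, Last out
              "FIFO": first_bit_out - first_bit_in,  # First in, First out
              "LILO": last_bit_out - last_bit_in,    # Last in, Last out
              "LIFO": first_bit_out - last_bit_in,   # Last in, First out
          }

      # Example: a 64-byte frame on 10GE serializes in 51.2 ns. A
      # cut-through DUT may emit the first output bit before the last
      # input bit has arrived, so LIFO can go negative:
      m = latency_metrics(0.0, 51.2e-9, 40.0e-9, 91.2e-9)
      assert m["LIFO"] < 0  # why LIFO MUST NOT be used for comparison

   The negative LIFO value in the example illustrates why Section 2.2
   restricts LIFO to store-and-forward devices.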
3. Jitter

3.1. Definition

   The definition of jitter is covered extensively in RFC 3393. This
   document is not meant to replace that definition, but to provide
   guidance on its use for data center network devices.

   Jitter is used here in accordance with the delay variation
   definition from RFC 3393:

      The second meaning has to do with the variation of a metric
      (e.g., delay) with respect to some reference metric (e.g.,
      average delay or minimum delay). This meaning is frequently used
      by computer scientists and frequently (but not always) refers to
      variation in delay.

   Even with the reference to RFC 3393, many definitions of "jitter"
   are possible. The one selected for data center benchmarking is the
   one closest to RFC 3393.

3.2. Discussion

   Jitter can be measured in different scenarios:

   -packet-to-packet delay variation

   -delta between the minimum and maximum packet delay variation for
   all packets sent

3.3. Measurement Units

   Jitter MUST be measured when sending packets of the same size. It
   MUST be measured as the packet-to-packet delay variation and as the
   delta between the minimum and maximum packet delay variation of all
   packets sent. A histogram MAY be provided as a population of packets
   measured per latency or latency bucket.
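   As a minimal sketch of the two required metrics, assuming a list of
   per-packet one-way delays has already been collected for same-size
   packets (variable names are illustrative):

      # Sketch: the two jitter metrics of Section 3.3.
      def jitter_metrics(delays):
          """delays: one-way delay of each packet, in seconds."""
          # Packet-to-packet delay variation between consecutive packets.
          p2p = [abs(b - a) for a, b in zip(delays, delays[1:])]
          return {
              "max_packet_to_packet_variation": max(p2p),
              # Delta between the min and max delay of all packets sent.
              "min_max_delta": max(delays) - min(delays),
          }

      # Four sample delays in microseconds: 10.2, 10.5, 10.1, 11.0.
      print(jitter_metrics([10.2e-6, 10.5e-6, 10.1e-6, 11.0e-6]))
      # Both metrics are ~0.9 microseconds for this sample.

   A histogram, when provided, would simply bucket the same delay list.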
4. Physical Layer Calibration

4.1. Definition

   The calibration of the physical layer consists of defining and
   measuring the latency of the physical devices used to perform tests
   on the DUT.

   It includes the list of all physical layer components used, as
   listed hereafter:

   -type of device used to generate traffic / measure traffic

   -type of line cards used on the traffic generator

   -type of transceivers on the traffic generator

   -type of transceivers on the DUT

   -type of cables

   -length of cables

   -software name and version of the traffic generator and DUT

   -list of enabled features on the DUT [this MAY be provided and is
   recommended, especially for the control plane protocols such as
   LLDP, Spanning Tree, etc.]. A comprehensive configuration file MAY
   be provided to this effect.

4.2. Discussion

   Physical layer calibration is part of the end-to-end latency, which
   should be taken into account while evaluating the DUT. Small
   variations in the physical components of the test may impact the
   latency being measured, so they MUST be described when presenting
   results.

4.3. Measurement Units

   It is RECOMMENDED to use cables that are all of the same type and
   the same length, and when possible from the same vendor. It is a
   MUST to document the cable specifications in Section 4.1 along with
   the test results. The test report MUST specify whether the cable
   latency has been removed from the test measurements or not. The
   accuracy of the traffic generator measurement MUST be provided [this
   is usually a value in the 20 ns range for current test equipment].

5. Line Rate

5.1. Definition

   The transmit timing, or maximum transmitted data rate, is controlled
   by the "transmit clock" in the DUT. The receive timing (maximum
   ingress data rate) is derived from the transmit clock of the
   connected interface.

   The line rate, or physical layer frame rate, is the maximum capacity
   to send frames of a specific size at the transmit clock frequency of
   the DUT.

   The term "port capacity" defines the maximum speed capability of the
   given port; for example 1GE, 10GE, 40GE, 100GE, etc.

   The frequency ("clock rate") of the transmit clock in any two
   connected interfaces will never be precisely the same; therefore, a
   tolerance is needed, expressed as a Parts Per Million (PPM) value.
   The IEEE standards allow a specific +/- variance in the transmit
   clock rate, and Ethernet is designed to allow for small, normal
   variations between the two clock rates. This results in a tolerance
   of the line rate value when traffic is generated from test equipment
   to a DUT.

5.2. Discussion

   For a transmit clock source, most Ethernet switches use "clock
   modules" (also called "oscillator modules") that are sealed,
   internally temperature-compensated, and very accurate. The output
   frequency of these modules is not adjustable because it is not
   necessary. Many test sets, however, offer a software-controlled
   adjustment of the transmit clock rate, which should be used to
   compensate the test equipment so that it does not send more than the
   line rate of the DUT.

   To allow for the minor variations typically found in the clock rate
   of commercially available clock modules and other crystal-based
   oscillators, Ethernet standards specify the maximum transmit clock
   rate variation to be not more than +/- 100 PPM (parts per million)
   from a calculated center frequency. Therefore a DUT must be able to
   accept frames at a rate within +/- 100 PPM to comply with the
   standards.

   Very few clock circuits are precisely +/- 0.0 PPM because:

   1. The Ethernet standards allow a maximum of +/- 100 PPM variance
   over time. Therefore it is normal for the frequency of the
   oscillator circuits to experience variation over time and over a
   wide temperature range, among other external factors.

   2. The crystals, or clock modules, usually have a specific +/- PPM
   variance that is significantly better than +/- 100 PPM. Often this
   is +/- 30 PPM or better in order to be considered a "certification
   instrument".

   When testing an Ethernet switch throughput at "line rate", any
   specific switch will have a clock rate variance. If a test set is
   running 1 PPM faster than a switch under test and a sustained line
   rate test is performed, a gradual increase in latency, and
   eventually packet drops as buffers fill and overflow in the switch,
   can be observed. Depending on how much clock variance there is
   between the two connected systems, the effect may be seen after the
   traffic stream has been running for a few hundred microseconds, a
   few milliseconds, or seconds. The same low latency and lack of
   packet loss can be demonstrated by setting the test set link
   occupancy to slightly less than 100 percent. Typically, 99 percent
   link occupancy produces excellent low latency and no packet loss. No
   Ethernet switch or router will have a transmit clock rate of exactly
   +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is
   precisely +/- 0.0 PPM.

   Test set equipment manufacturers are well aware of the standards and
   allow a software-controlled +/- 100 PPM "offset" (clock-rate
   adjustment) to compensate for normal variations in the clock speed
   of devices under test. This offset adjustment allows engineers to
   determine the approximate speed at which the connected device is
   operating, and to verify that it is within the parameters allowed by
   the standards.
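   To illustrate the buffer-fill effect described above, here is a
   small sketch, using assumed (not measured) buffer sizes, of how long
   a sustained line rate test can run before a clock mismatch overflows
   the DUT's buffer:

      # Sketch: time until buffer overflow caused by a PPM mismatch.
      def seconds_to_overflow(line_rate_bps, ppm_offset, buffer_bytes):
          # Excess arrival rate created by the clock mismatch, in bits/s.
          excess_bps = line_rate_bps * (ppm_offset / 1e6)
          return buffer_bytes / (excess_bps / 8)

      # A test set running +1 PPM fast against a 10GE DUT with an
      # assumed 1 MB buffer overflows it in about 800 seconds:
      print(seconds_to_overflow(10e9, 1, 1e6))    # 800.0
      # At +100 PPM with an assumed 100 KB buffer: 0.8 seconds.
      print(seconds_to_overflow(10e9, 100, 1e5))  # 0.8

   This is why reducing link occupancy slightly below 100 percent, or
   applying the PPM offset, avoids the overflow altogether.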
5.3. Measurement Units

   "Line rate" CAN be measured in terms of "frame rate":

      Frame Rate = Transmit-Clock-Frequency / (Frame-Length*8 +
      Minimum_Gap + Preamble + Start-Frame-Delimiter)

   Example for 1 Gigabit Ethernet speed with 64-byte frames:

      Frame Rate = 1,000,000,000 / (64*8 + 96 + 56 + 8)
                 = 1,000,000,000 / 672
                 = 1,488,095.2 frames per second

   Considering the allowance of +/- 100 PPM, a switch may "legally"
   transmit traffic at a frame rate between 1,487,946.4 FPS and
   1,488,244 FPS. Each 1 PPM variation in clock rate will translate
   into a 1.488 frame-per-second frame rate increase or decrease.

   In a production network, it is very unlikely to see precise line
   rate over a very brief period. There is no observable difference
   between dropping packets at 99% of line rate and at 100% of line
   rate.

   -Line rate CAN be measured at 100% of line rate with a -100 PPM
   adjustment.

   -Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

   -The PPM adjustment SHOULD only be used for a line rate type of
   measurement.
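   The formula and its tolerance bounds translate directly into a short
   calculation; the following sketch reproduces the numbers above:

      # Sketch: frame rate at line rate and its +/- 100 PPM bounds.
      def frame_rate(clock_hz, frame_len_bytes,
                     min_gap=96, preamble=56, sfd=8):
          # Overheads in bit times: 12-byte minimum gap, 7-byte
          # preamble, 1-byte start-frame delimiter.
          return clock_hz / (frame_len_bytes * 8 + min_gap + preamble + sfd)

      nominal = frame_rate(1_000_000_000, 64)  # 1,488,095.2 FPS
      tolerance = nominal * 100 / 1e6          # 100 PPM = 148.8 FPS
      print(nominal - tolerance, nominal + tolerance)
      # ~1,487,946.4 to ~1,488,244.0 FPS, matching the bounds above.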
6. Buffering

6.1. Buffer

6.1.1. Definition

   Buffer Size: the term buffer size represents the total amount of
   frame buffering memory available on a DUT. This size is expressed in
   B (bytes), KB (kilobytes), MB (megabytes) or GB (gigabytes). When
   the buffer size is expressed, it SHOULD be defined by one of the
   size metrics above. When the buffer size is expressed, an indication
   of the frame MTU used for that measurement is also necessary, as
   well as the cos or dscp value set, as the buffers are often carved
   by the quality of service implementation (please refer to the buffer
   efficiency section for further details).

   Example: the Buffer Size of the DUT when sending 1518-byte frames is
   18 MB.

   Port Buffer Size: the port buffer size is the amount of buffer for a
   single ingress port, a single egress port, or a combination of
   ingress and egress buffering locations for a single port. The reason
   for mentioning the three locations for the port buffer is that the
   DUT buffering scheme can be unknown or untested, and therefore the
   indication of where the buffer is located helps in understanding the
   buffer architecture and therefore the total buffer size. The Port
   Buffer Size is an informational value that MAY be provided by the
   DUT vendor. It is not a value that is tested by benchmarking;
   benchmarking will be done using the Maximum Port Buffer Size or
   Maximum Buffer Size methodology.

   Maximum Port Buffer Size: this is in most cases the same as the Port
   Buffer Size. In certain switch architectures, called SoC (switch on
   chip), there is a concept of a port buffer plus a shared buffer pool
   available to all ports. The Maximum Port Buffer Size covers the SoC
   buffer scenario, where the amount in B (bytes), KB (kilobytes), MB
   (megabytes) or GB (gigabytes) represents the sum of the port buffer
   and the maximum amount of shared buffer this given port can take.
   The Maximum Port Buffer Size needs to be expressed along with the
   frame MTU used for the measurement and the cos or dscp value set for
   the test.

   Example: a DUT has been measured to have 3 KB of port buffer for
   1518-byte frames and a total of 4.7 MB of maximum port buffer for
   1518-byte frames and a cos of 0.

   Maximum DUT Buffer Size: this is the total amount of buffer a DUT
   can be measured to have. It is most likely different from the
   Maximum Port Buffer Size. It can also be different from the sum of
   the Maximum Port Buffer Sizes. The Maximum Buffer Size needs to be
   expressed along with the frame MTU used for the measurement and the
   cos or dscp value set during the test.

   Example: a DUT has been measured to have 3 KB of port buffer for
   1518-byte frames and a total of 4.7 MB of maximum port buffer for
   1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at 1500
   bytes and a cos of 0.

   Burst: a burst is a fixed number of packets sent over a percentage
   of line rate for a defined port speed. The frames sent are evenly
   distributed across the interval T. A constant C can be defined as
   the average time between two consecutive packets evenly spaced.

   Microburst: a microburst is a burst in which packet drops occur
   without sustained or noticeable congestion on a link or device. A
   microburst is characterized by a Burst that is not evenly
   distributed over T, with inter-packet gaps less than the constant C
   [C = average time between two consecutive packets evenly spaced
   out].

   Intensity of Microburst: this is a percentage, between 1 and 100%,
   representing the level of microburst. The higher the number, the
   more intense the microburst is.

      I = [1 - ((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))) /
      Sum(packets)] * 100

6.1.2. Discussion

   When measuring buffering on a DUT, it is important to understand the
   behavior for each port as well as for all ports, as this provides
   evidence of the total amount of buffering available on the switch.
   The term buffer efficiency here helps one understand the optimum
   packet size for the buffer, or the real volume of buffer available
   for a specific packet size. This section does not discuss how to
   conduct the test methodology; rather, it explains the buffer
   definitions and what metrics should be provided for comprehensive
   data center device buffering benchmarking.

6.1.3. Measurement Units

   When buffer is measured:

   -the buffer size MUST be measured

   -the port buffer size MAY be provided for each port

   -the maximum port buffer size MUST be measured

   -the maximum DUT buffer size MUST be measured

   -the intensity of microburst MAY be mentioned when a microburst test
   is performed

   -the cos or dscp value set during the test SHOULD be provided
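   As an illustration, here is one reading of the intensity formula as
   a sketch. The numerator's telescoping sum is the actual time span of
   the burst (TpN - Tp1); the denominator "Sum(packets)" is interpreted
   here, as an assumption, as the span the same packets would occupy if
   evenly spaced at the constant C:

      # Sketch: microburst intensity under the stated interpretation.
      def microburst_intensity(arrival_times, c):
          """arrival_times: per-packet timestamps in seconds;
          c: average inter-packet gap for even spacing, in seconds."""
          gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
          even_span = c * len(gaps)  # (N-1) * C, the evenly spaced span
          return (1 - sum(gaps) / even_span) * 100

      # Four packets expected every 10 us (C = 10 us) but arriving
      # only 2 us apart: a heavily compressed burst.
      print(microburst_intensity([0, 2e-6, 4e-6, 6e-6], 10e-6))  # 80.0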
6.2. Incast

6.2.1. Definition

   The term Incast, very commonly utilized in the data center, refers
   to the traffic pattern of many-to-one or many-to-many conversations.
   Typically in the data center it refers to many different ingress
   server ports (many) sending traffic to a common uplink (one) or to
   multiple uplinks (many). This pattern is generalized for any network
   as many incoming ports sending traffic to one or a few uplinks. It
   can also be found in many-to-many traffic patterns.

   Synchronous arrival time: when two or more frames of respective
   sizes L1 and L2 arrive at their respective one or multiple ingress
   ports, and there is an overlap of the arrival time for any of the
   bits on the DUT, then the frames L1 and L2 have synchronous arrival
   times. This is called incast.

   Asynchronous arrival time: any condition not defined by synchronous
   arrival time.

   Percentage of synchronization: this defines the level of overlap
   [amount of bits] between the frames L1, L2...Ln.

   Example: two 64-byte frames, of length L1 and L2, arrive at ingress
   port 1 and port 2 of the DUT. There is an overlap of 6.4 bytes
   during which L1 and L2 were present at the same time on their
   respective ingress ports. The percentage of synchronization is
   therefore 10%.

   Stateful type traffic defines packets exchanged with a stateful
   protocol, such as TCP.

   Stateless type traffic defines packets exchanged with a stateless
   protocol, such as UDP.

6.2.2. Discussion

   In this scenario, buffers are solicited on the DUT. In an ingress
   buffering mechanism, the ingress port buffers are solicited along
   with Virtual Output Queues, when available; whereas in an egress
   buffering mechanism, the egress buffer of the one outgoing port is
   used.

   In either case, regardless of where the buffer memory is located in
   the switch architecture, the incast creates buffer utilization.

   When two or more frames have synchronous arrival times at the DUT,
   they are considered to form an incast.

6.2.3. Measurement Units

   It is a MUST to measure the number of ingress and egress ports. It
   is a MUST to have a non-null percentage of synchronization, which
   MUST be specified.

7. Application Throughput: Data Center Goodput

7.1. Definition

   In data center networking, a balanced network is a function of
   maximal throughput "and" minimal loss at any given time. This is
   defined by the goodput. Goodput is the application-level throughput,
   measured in bytes per second. It is the measurement of the actual
   payload of the packets being sent.

7.2. Discussion

   In data center benchmarking, the goodput is a value that SHOULD be
   measured. It provides a realistic idea of the usage of the available
   bandwidth. A goal in data center environments is to maximize the
   goodput while minimizing the loss.

7.3. Measurement Units

   When S is the total number of bytes received from all senders [not
   inclusive of packet headers or TCP headers - only the payload] and
   Ft is the finishing time of the last sender, the goodput G is
   measured by the following formula:

      G = S / Ft bytes per second

   Example: a TCP file transfer over the HTTP protocol on a 10 Gb/s
   medium. The file cannot be transferred over Ethernet as a single
   continuous stream. It must be broken down into individual frames of
   1500 bytes when the standard MTU [Maximum Transmission Unit] is
   used. Each packet requires 20 bytes of IP header information and 20
   bytes of TCP header information, so 1460 bytes are available per
   packet for the file transfer. Linux-based systems are further
   limited to 1448 bytes, as they also carry a 12-byte timestamp.
   Finally, the data in this example is transmitted over Ethernet,
   which adds 26 bytes of overhead per packet.

      G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s, or 1.196
      gigabytes per second.

   Please note: this example does not take into consideration
   additional Ethernet overhead such as the interframe gap (a minimum
   of 96 bit times), nor collisions (which have a variable impact,
   depending on the network load).

   When conducting goodput measurements, please document, in addition
   to the items in Section 4.1:

   -the TCP stack used

   -OS versions

   -NIC firmware version and model

   For example, Windows TCP stacks and different Linux versions can
   influence TCP-based test results.
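   The formula and the worked example above reduce to a few lines of
   arithmetic; this sketch (with illustrative names) reproduces both:

      # Sketch: goodput G = S / Ft, plus the 10GE efficiency example.
      def goodput(total_payload_bytes, finishing_time_s):
          # S = payload bytes from all senders (no IP/TCP headers);
          # Ft = finishing time of the last sender.
          return total_payload_bytes / finishing_time_s

      # Standard-MTU TCP on 10 Gb/s Ethernet: 1500 - 20 (IP) - 20 (TCP)
      # = 1460 payload bytes per 1500 + 26 bytes on the wire.
      payload, wire = 1460, 1500 + 26
      print(10 * payload / wire)      # ~9.567 Gbit/s of goodput
      print(10 * payload / wire / 8)  # ~1.196 gigabytes per second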
8. References

8.1. Normative References

   [1] Bradner, S., "Benchmarking Terminology for Network
       Interconnection Devices", RFC 1242, July 1991.

   [2] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
       Network Interconnect Devices", RFC 2544, March 1999.

   [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

   [3] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN
       Switching Devices", RFC 2889, August 2000.

   [4] Stopp, D. and B. Hickman, "Methodology for IP Multicast
       Benchmarking", RFC 3918, October 2004.

8.3. URL References

   [5] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph,
       "Understanding TCP Incast Throughput Collapse in Datacenter
       Networks",
       http://www.eecs.berkeley.edu/~ychen2/professional/TCPIncastWREN2009.pdf

8.4. Acknowledgments

   The authors would like to thank Ian Cox and Tim Stevenson for their
   reviews and feedback.

Authors' Addresses

   Lucien Avramov
   Cisco Systems
   170 West Tasman drive
   San Jose, CA 95134
   United States
   Phone: +1 408 526 7686
   Email: lavramov@cisco.com

   Jacob Rapp
   Hewlett-Packard Company
   3000 Hanover Street
   Palo Alto, CA 94304
   United States
   Phone: +1 650 857 3367
   Email: jacob.h.rapp@hp.com