Internet Engineering Task Force                               L. Avramov
Internet-Draft, Intended status: Informational                    Google
Expires: October 29, 2017                                        J. Rapp
April 27, 2017                                                    VMware

                  Data Center Benchmarking Terminology
                draft-ietf-bmwg-dcbench-terminology-07

Abstract

The purpose of this informational document is to establish definitions, discussion, and measurement techniques for data center benchmarking, and to introduce new terminology applicable to data center performance evaluations. The purpose of this document is not to define the test methodology, but rather to establish the important concepts needed when benchmarking network switches and routers in the data center.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts.
The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Definition Format
2. Latency
   2.1. Definition
   2.2. Discussion
   2.3. Measurement Units
3. Jitter
   3.1. Definition
   3.2. Discussion
   3.3. Measurement Units
4. Physical Layer Calibration
   4.1. Definition
   4.2. Discussion
   4.3. Measurement Units
5. Line Rate
   5.1. Definition
   5.2. Discussion
   5.3. Measurement Units
6. Buffering
   6.1. Buffer
      6.1.1. Definition
      6.1.2. Discussion
      6.1.3. Measurement Units
   6.2. Incast
      6.2.1. Definition
      6.2.2. Discussion
      6.2.3. Measurement Units
7. Application Throughput: Data Center Goodput
   7.1. Definition
   7.2. Discussion
   7.3. Measurement Units
8. Security Considerations
9. IANA Considerations
10. References
   10.1. Normative References
   10.2. Informative References
   10.3. Acknowledgments
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center.
Traffic can be largely east-west in one data center and north-south in another, while some data centers combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may be small and latency sensitive, or large and throughput sensitive, with a mix of UDP and TCP traffic; all of these can coexist in a single cluster and flow through a single network device at the same time. Benchmarking of network devices has long used RFC 1242, RFC 2432, RFC 2544, RFC 2889, and RFC 3918. These benchmarks have largely focused on various latency attributes and the maximum throughput of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical maximum throughput, forwarding rates, and latency under test conditions, but they do not represent the real traffic patterns that may affect these networking devices. The data center networking devices covered here are switches and routers.

This document provides a set of definitions, metrics, and terminology, including congestion scenarios and switch buffer analysis, and redefines basic definitions in order to represent a wide mix of traffic conditions.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Definition Format

Term to be defined. (e.g., Latency)

Definition: The specific definition for the term.

Discussion: A brief discussion about the term, its application, and any restrictions on measurement procedures.

Measurement Units: Methodology for the measurement and the units used to report measurements of this term, if applicable.

2. Latency

2.1. Definition

Latency is the amount of time it takes a frame to transit the DUT.
Latency is measured in units of time (seconds, milliseconds, microseconds, and so on). The purpose of measuring latency is to understand the impact of adding a device in the communication path.

The latency interval can be assessed between different combinations of events, irrespective of the type of switching device (bit forwarding, a.k.a. cut-through, or store-and-forward).

Traditionally, the latency measurement definitions are:

FILO (First In Last Out): The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the last bit of the output frame is seen on the output port.

FIFO (First In First Out): The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the start of the first bit of the output frame is seen on the output port.

LILO (Last In Last Out): The time interval starting when the last bit of the input frame reaches the input port and ending when the last bit of the output frame is seen on the output port.

LIFO (Last In First Out): The time interval starting when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port.

Another way to summarize the four definitions above is to refer to the bit positions as they normally occur, input to output: FILO is FL (First bit in, Last bit out); FIFO is FF (First bit in, First bit out); LILO is LL (Last bit in, Last bit out); LIFO is LF (Last bit in, First bit out).

The definition explained in this section, in the context of data center switching benchmarking, is in lieu of the previous definition of latency in Section 3.8 of RFC 1242, which is quoted here:

For store and forward devices: The time interval starting when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port.

For bit forwarding devices: The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the start of the first bit of the output frame is seen on the output port.

2.2. Discussion

FILO is the most important measurement definition. Switches of any type MUST be measured with the FILO mechanism: FILO includes the latency of the switch and the latency of the frame, as well as the serialization delay. It is a picture of the "whole" latency going through the DUT. For applications that are latency sensitive and can function with the initial bytes of the frame, FIFO MAY be used as an additional measurement to supplement FILO.

Not all DUTs are exclusively cut-through or store-and-forward. Data center DUTs are frequently store-and-forward for smaller packet sizes and adopt cut-through behavior for larger ones. FILO covers all scenarios.

The LIFO mechanism can be used with store-and-forward switches but not with cut-through switches, as it will produce negative latency values for larger packet sizes because LIFO removes the serialization delay. Therefore, this mechanism MUST NOT be used when comparing latencies of two different DUTs.
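The four definitions can be illustrated with a short sketch (not part of the benchmarking methodology; the timestamps and the 10 Gb/s link speed are hypothetical). It also shows why LIFO goes negative on a cut-through DUT:

```python
# Sketch (illustrative only): the four latency definitions computed from
# bit-level timestamps at the input and output ports, in seconds.

def latencies(in_first_bit, in_last_bit, out_first_bit, out_last_bit):
    """Return the four latency definitions from Section 2.1."""
    return {
        "FILO": out_last_bit - in_first_bit,   # First bit in, Last bit out
        "FIFO": out_first_bit - in_first_bit,  # First bit in, First bit out
        "LILO": out_last_bit - in_last_bit,    # Last bit in, Last bit out
        "LIFO": out_first_bit - in_last_bit,   # Last bit in, First bit out
    }

# Hypothetical cut-through DUT: the first bit appears on the output port
# 500 ns after it arrives; a 1518-byte frame takes ~1.214 us to serialize
# at 10 Gb/s, so the last input bit arrives well after the first output bit.
ser = 1518 * 8 / 10e9                       # serialization delay in seconds
l = latencies(0.0, ser, 500e-9, 500e-9 + ser)
print(l)
# LIFO = 500 ns - ~1.214 us, a negative value for large frames on a
# cut-through DUT, which is why LIFO MUST NOT be used to compare DUTs.
```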
2.3. Measurement Units

The measuring methods to use for benchmarking purposes are as follows:

1) FILO MUST be used as the measuring method, as it includes the serialization latency of the packet; moreover, applications today commonly need to read the whole packet to process the information and take an action.

2) FIFO MAY be used for certain applications able to process data as the first bits arrive (an FPGA, for example).

3) LIFO MUST NOT be used, because it subtracts the serialization latency of the packet, unlike all the other methods.

3. Jitter

3.1. Definition

Jitter in the data center context is synonymous with the common term delay variation. It is derived from multiple measurements of one-way delay, as described in RFC 3393. The mandatory definition of delay variation is the PDV form from Section 4.2 of RFC 5481. When considering a stream of packets, the delays of all packets are subtracted from the minimum delay over all packets in the stream. This facilitates assessment of the range of delay variation (Max - Min) or a high percentile of PDV (99th percentile, for robustness against outliers).

If first-bit to last-bit timestamps are used for delay measurement, then delay variation MUST be measured using packets or frames of the same size, since the definition of latency includes the serialization time for each packet. If first-bit to first-bit timestamps are used, the size restriction does not apply.

3.2. Discussion

In addition to the PDV range and/or a high percentile of PDV, Inter-Packet Delay Variation (IPDV), as defined in Section 4.1 of RFC 5481 (differences between two consecutive packets), MAY be used to determine how packet spacing has changed during transfer, for example to see if a packet stream has become closely spaced or "bursty". However, the absolute value of IPDV SHOULD NOT be used, as this collapses the "bursty" and "dispersed" sides of the IPDV distribution together.

3.3. Measurement Units

The measurement of delay variation is expressed in units of seconds. A PDV histogram MAY be provided for the population of packets measured.

4. Physical Layer Calibration

4.1. Definition

The calibration of the physical layer consists of defining and measuring the latency of the physical devices used to perform tests on the DUT.

It includes the list of all physical layer components, listed hereafter:

- type of device used to generate traffic / measure traffic

- type of line cards used on the traffic generator

- type of transceivers on the traffic generator

- type of transceivers on the DUT

- type of cables

- length of cables

- software name and version of the traffic generator and DUT

- the list of enabled features on the DUT MAY be provided and is recommended (especially the control plane protocols, such as LLDP, Spanning Tree, etc.). A comprehensive configuration file MAY be provided to this effect.

4.2. Discussion

Physical layer calibration is part of the end-to-end latency, which should be taken into account while evaluating the DUT. Small variations in the physical components of the test may impact the latency being measured, so they MUST be described when presenting results.

4.3. Measurement Units

It is RECOMMENDED to use cables of the same type and the same length and, when possible, from the same vendor. The cable specifications MUST be documented in Section 4.1 along with the test results. The test report MUST specify whether or not the cable latency has been removed from the test measurements. The accuracy of the traffic generator measurement MUST be provided (this is usually a value in the 20 ns range for current test equipment).
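As an illustration of the calibration bookkeeping described above, the following sketch subtracts cable latency from a raw measurement and carries the generator accuracy alongside the result. The numbers, the 5 ns/m fiber propagation figure, and the helper function are illustrative assumptions, not values from this document:

```python
# Sketch (hypothetical values): removing physical-layer fixture latency
# from a raw latency measurement, per the Section 4 discussion.

NS_PER_METER = 5.0  # assumed fiber propagation delay, roughly 5 ns/m

def dut_latency_ns(measured_ns, cable_len_m, generator_accuracy_ns=20.0):
    """Return (calibrated DUT latency, measurement uncertainty), in ns.

    cable_len_m is the combined length of the test cables whose latency
    is being removed; the report MUST state that this removal was done.
    """
    cable_ns = cable_len_m * NS_PER_METER
    return measured_ns - cable_ns, generator_accuracy_ns

# 1050 ns measured through 10 m of cabling, 20 ns generator accuracy.
lat, err = dut_latency_ns(measured_ns=1050.0, cable_len_m=10.0)
print(f"DUT latency: {lat} ns +/- {err} ns")  # prints 1000.0 ns +/- 20.0 ns
```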
5. Line Rate

5.1. Definition

The transmit timing, or maximum transmitted data rate, is controlled by the "transmit clock" in the DUT. The receive timing (maximum ingress data rate) is derived from the transmit clock of the connected interface.

The line rate, or physical layer frame rate, is the maximum capacity to send frames of a specific size at the transmit clock frequency of the DUT.

The term port capacity defines the maximum speed capability for the given port; for example 1 GE, 10 GE, 40 GE, or 100 GE.

The frequency ("clock rate") of the transmit clock in any two connected interfaces will never be precisely the same; therefore, a tolerance is needed. This is expressed as a Parts Per Million (PPM) value. The IEEE standards allow a specific +/- variance in the transmit clock rate, and Ethernet is designed to allow for small, normal variations between the two clock rates. This results in a tolerance of the line rate value when traffic is generated from test equipment to a DUT.

Line rate SHOULD be measured in frames per second.

5.2. Discussion

For a transmit clock source, most Ethernet switches use "clock modules" (also called "oscillator modules") that are sealed, internally temperature-compensated, and very accurate. The output frequency of these modules is not adjustable because it is not necessary. Many test sets, however, offer a software-controlled adjustment of the transmit clock rate, which should be used to ensure that the test equipment does not send more than the line rate of the DUT.

To allow for the minor variations typically found in the clock rate of commercially available clock modules and other crystal-based oscillators, Ethernet standards specify the maximum transmit clock rate variation to be not more than +/- 100 PPM (parts per million) from a calculated center frequency. Therefore, a DUT must be able to accept frames at a rate within +/- 100 PPM to comply with the standards.

Very few clock circuits are precisely +/- 0.0 PPM because:

1. The Ethernet standards allow a maximum of +/- 100 PPM variance over time. Therefore, it is normal for the frequency of the oscillator circuits to experience variation over time and over a wide temperature range, among other external factors.

2. The crystals, or clock modules, usually have a specific +/- PPM variance that is significantly better than +/- 100 PPM. Oftentimes this is +/- 30 PPM or better in order for the device to be considered a "certification instrument".

When testing an Ethernet switch throughput at "line rate", any specific switch will have a clock rate variance. If a test set is running +1 PPM faster than a switch under test, and a sustained line rate test is performed, a gradual increase in latency, and eventually packet drops as buffers fill and overflow in the switch, can be observed. Depending on how much clock variance there is between the two connected systems, the effect may be seen after the traffic stream has been running for a few hundred microseconds, a few milliseconds, or a few seconds. Low latency and no packet loss can be demonstrated by setting the test set link occupancy to slightly less than 100 percent. Typically, 99 percent link occupancy produces excellent low latency and no packet loss. No Ethernet switch or router will have a transmit clock rate of exactly +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is precisely +/- 0.0 PPM.

Test set equipment manufacturers are well aware of the standards and allow a software-controlled +/- 100 PPM "offset" (clock-rate adjustment) to compensate for normal variations in the clock speed of "devices under test".
This offset adjustment allows engineers to determine the approximate speed at which the connected device is operating and to verify that it is within the parameters allowed by the standards.

5.3. Measurement Units

"Line Rate" can be measured in terms of "Frame Rate":

Frame Rate = Transmit-Clock-Frequency / (Frame-Length * 8 + Minimum_Gap + Preamble + Start-Frame-Delimiter)

Minimum_Gap represents the inter-frame gap. This formula "scales up" or "scales down" to represent 1 Gb/s Ethernet, 10 Gb/s Ethernet, and so on.

Example for 1 Gb/s Ethernet speed with 64-byte frames:

Frame Rate = 1,000,000,000 / (64 * 8 + 96 + 56 + 8)
           = 1,000,000,000 / 672
           = 1,488,095.2 frames per second

Considering the allowance of +/- 100 PPM, a switch may "legally" transmit traffic at a frame rate between 1,487,946.4 FPS and 1,488,244.0 FPS. Each 1 PPM variation in clock rate translates to a 1.488 frame-per-second increase or decrease in frame rate.

In a production network, it is very unlikely to see precise line rate over even a very brief period. There is no observable difference between dropping packets at 99% of line rate and at 100% of line rate.

- Line rate can be measured at 100% of line rate with a -100 PPM adjustment.

- Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

- The PPM adjustment SHOULD only be used for a line rate type of measurement.

6. Buffering

6.1. Buffer

6.1.1. Definition

Buffer Size: The term buffer size represents the total amount of frame buffering memory available on a DUT. This size is expressed in B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). When the buffer size is expressed, it SHOULD be defined by one of the size metrics above.
When the buffer size is expressed, an indication of the frame MTU used for that measurement is also necessary, as well as the cos or dscp value set, as oftentimes the buffers are carved by the quality of service implementation. (Please refer to the buffer efficiency section for further details.)

Example: The Buffer Size of the DUT when sending 1518-byte frames is 18 MB.

Port Buffer Size: The port buffer size is the amount of buffer available to a single ingress port, a single egress port, or a combination of ingress and egress buffering locations for a single port. The reason for mentioning the three locations for the port buffer is that the DUT buffering scheme can be unknown or untested, and therefore the indication of where the buffer is located helps in understanding the buffer architecture and, in turn, the total buffer size. The Port Buffer Size is an informational value that MAY be provided by the DUT vendor. It is not a value that is tested by benchmarking. Benchmarking will be done using the Maximum Port Buffer Size or Maximum Buffer Size methodology.

Maximum Port Buffer Size: This is, in most cases, the same as the Port Buffer Size. In certain switch architectures, called SoC (switch on chip), there is a concept of a dedicated port buffer and a shared buffer pool available for all ports. The Maximum Port Buffer Size, in the scenario of a SoC buffer, represents the sum of the dedicated port buffer and the maximum amount of the shared buffer that this given port can take, expressed in B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). The Maximum Port Buffer Size needs to be expressed along with the frame MTU used for the measurement and the cos or dscp bit value set for the test.

Example: A DUT has been measured to have 3 KB of port buffer for 1518-byte frames and a total of 4.7 MB of maximum port buffer for 1518-byte frames and a cos of 0.
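The SoC relationship just described (dedicated port buffer plus the largest share of the shared pool a port may take) can be sketched as follows. The `max_pool_fraction` admission parameter and the numbers are illustrative assumptions, not values defined by this document:

```python
# Sketch (illustrative): Maximum Port Buffer Size on a SoC-style
# architecture, as the dedicated port buffer plus the largest share of
# the shared pool that a single port may consume.

def max_port_buffer_bytes(port_buffer, shared_pool, max_pool_fraction):
    """max_pool_fraction is an assumed admission-control parameter:
    the largest fraction of the shared pool one port may take."""
    return port_buffer + int(shared_pool * max_pool_fraction)

port_buffer = 3 * 1024                   # 3 KB dedicated, at 1518-byte MTU
shared_pool = 4_700_000 - port_buffer    # pool sized so the maximum is 4.7 MB
print(max_port_buffer_bytes(port_buffer, shared_pool, 1.0))  # prints 4700000
```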
Maximum DUT Buffer Size: This is the total buffer size a DUT can be measured to have. It is most likely different from the Maximum Port Buffer Size. It can also be different from the sum of the Maximum Port Buffer Sizes. The Maximum Buffer Size needs to be expressed along with the frame MTU used for the measurement and the cos or dscp value set during the test.

Example: A DUT has been measured to have 3 KB of port buffer for 1518-byte frames and a total of 4.7 MB of maximum port buffer for 1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at 1500 bytes and a cos of 0.

Burst: A burst is a fixed number of packets sent over a percentage of line rate for a defined port speed. The frames sent are evenly distributed across an interval T. A constant C can be defined as the average time between two consecutive evenly spaced packets.

Microburst: A microburst is a burst for which packet drops occur without sustained or noticeable congestion upon a link or device. A microburst is characterized by a burst that is not evenly distributed over T, with packet gaps less than the constant C (the average time between two consecutive evenly spaced packets).

Intensity of Microburst: This is a percentage representing the level of microburst, between 1% and 100%. The higher the number, the more intense the microburst:

I = [1 - ((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))) / Sum(packets)] * 100

where TpN is the arrival time of packet N.

The above definitions are not meant to comment on the ideal sizing of a buffer but rather on how to measure it. A larger buffer is not necessarily better and can cause issues with buffer bloat.

6.1.2. Discussion

When measuring buffering on a DUT, it is important to understand the behavior for each port, and also for all ports, as this provides evidence of the total amount of buffering available on the switch. The term buffer efficiency here helps one understand the optimum packet size for the buffer to be used, or the real volume of buffer available for a specific packet size. This section does not discuss how to conduct the test methodology; rather, it explains the buffer definitions and the metrics that should be provided for comprehensive data center device buffering benchmarking.

6.1.3. Measurement Units

When the buffer is measured:

- the buffer size MUST be measured

- the port buffer size MAY be provided for each port

- the maximum port buffer size MUST be measured

- the maximum DUT buffer size MUST be measured

- the intensity of microburst MAY be mentioned when a microburst test is performed

- the cos or dscp value set during the test SHOULD be provided

6.2. Incast

6.2.1. Definition

The term incast, very commonly utilized in the data center, refers to the traffic pattern of many-to-one or many-to-many conversations. Typically in the data center it refers to many different ingress server ports (many) sending traffic to a common uplink (one) or to multiple uplinks (many). This pattern is generalized for any network as many incoming ports sending traffic to one or a few uplinks. It can also be found in many-to-many traffic patterns.

Synchronous arrival time: When two or more frames of respective sizes L1 and L2 arrive at their respective one or multiple ingress ports, and there is an overlap of the arrival time for any of the bits on the DUT, then the frames L1 and L2 have synchronous arrival times. This is called incast.

Asynchronous arrival time: Any condition not defined by synchronous arrival time.

Percentage of synchronization: This defines the level of overlap (amount of bits) between the frames L1, L2, ..., Ln.

Example: Two 64-byte frames, of length L1 and L2, arrive at ingress port 1 and port 2 of the DUT.
There is an overlap of 6.4 bytes between the two, during which L1 and L2 were present at the same time on their respective ingress ports. The percentage of synchronization is therefore 10%.

Stateful type traffic defines packets exchanged with a stateful protocol, such as TCP.

Stateless type traffic defines packets exchanged with a stateless protocol, such as UDP.

6.2.2. Discussion

In this scenario, buffers are solicited on the DUT. In an ingress buffering mechanism, the ingress port buffers are solicited along with Virtual Output Queues, when available; whereas in an egress buffering mechanism, the egress buffer of the one outgoing port is used.

In either case, regardless of where the buffer memory is located in the switch architecture, the incast creates buffer utilization.

When two or more frames have synchronous arrival times at the DUT, they are considered to form an incast.

6.2.3. Measurement Units

The number of ingress and egress ports MUST be measured. The percentage of synchronization MUST be non-null and MUST be specified.

7. Application Throughput: Data Center Goodput

7.1. Definition

In data center networking, a balanced network is a function of maximal throughput and minimal loss at any given time. This is defined by the goodput. Goodput is the application-level throughput. The definition used here is a variant of the definition in RFC 2647.

Goodput is the number of bits per unit of time forwarded to the correct destination interface of the DUT/SUT, minus any bits retransmitted.

7.2. Discussion

In data center benchmarking, the goodput is a value that SHOULD be measured. It provides a realistic idea of the usage of the available bandwidth. A goal in data center environments is to maximize the goodput while minimizing the loss.

7.3. Measurement Units

When S is the total number of bytes received from all senders (not including packet headers or TCP headers; only the payload) and Ft is the finishing time of the last sender, the goodput G is measured by the following formula:

G = S / Ft bytes per second

Example: a TCP file transfer over HTTP on a 10 Gb/s medium. The file cannot be transferred over Ethernet as a single continuous stream. It must be broken down into individual frames of 1500 bytes when the standard MTU (Maximum Transmission Unit) is used. Each packet requires 20 bytes of IP header information and 20 bytes of TCP header information; therefore, 1460 bytes are available per packet for the file transfer. Linux-based systems are further limited to 1448 bytes, as they also carry a 12-byte timestamp. Finally, the data is transmitted in this example over Ethernet, which adds a 26-byte overhead per packet.

G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s or 1.196 gigabytes per second.

Please note: this example does not take into consideration additional Ethernet overhead, such as the inter-frame gap (a minimum of 96 bit times), nor collisions (which have a variable impact, depending on the network load).

When conducting goodput measurements, please document, in addition to the items in Section 4.1:

- the TCP stack used

- OS versions

- NIC firmware version and model

For example, Windows TCP stacks and different Linux versions can influence TCP-based test results.

8. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.
The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for benchmarking purposes. Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks.

9. IANA Considerations

No IANA action is requested at this time.

10. References

10.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, July 1991.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, March 1999.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

10.2. Informative References

[1] Avramov, L. and J. Rapp, "Data Center Benchmarking Methodology", April 2017.

[2] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, August 2000.

[3] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, October 2004.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph, "Understanding TCP Incast Throughput Collapse in Datacenter Networks",
    http://www.eecs.berkeley.edu/~ychen2/professional/TCPIncastWREN2009.pdf

10.3. Acknowledgments

The authors would like to thank Alfred Morton, Scott Bradner, Ian Cox, and Tim Stevenson for their reviews and feedback.
Authors' Addresses

Lucien Avramov
Google
170 West Tasman drive
Mountain View, CA 94043
United States
Email: lucienav@google.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA 94304
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com