Internet Engineering Task Force                               L. Avramov
INTERNET-DRAFT, Intended status: Informational                    Google
Expires: December 24, 2017                                       J. Rapp
June 22, 2017                                                     VMware

                  Data Center Benchmarking Terminology
                  draft-ietf-bmwg-dcbench-terminology-19

Abstract

The purpose of this informational document is to establish definitions
and describe measurement techniques for data center benchmarking, as
well as to introduce new terminology applicable to performance
evaluations of data center network equipment. This document establishes
the important concepts for benchmarking network switches and routers in
the data center and is a prerequisite for the test methodology
publication [draft-ietf-bmwg-dcbench-methodology]. Many of these terms
and methods may be applicable to network equipment beyond this
publication's scope, as the technologies originally applied in the data
center are deployed elsewhere.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is at
http://datatracker.ietf.org/drafts/current.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document.
Code Components extracted from this document must include Simplified
BSD License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1. Introduction ................................................. 3
   1.1. Requirements Language ................................... 4
   1.2. Definition Format ....................................... 4
2. Latency ...................................................... 4
   2.1. Definition .............................................. 4
   2.2. Discussion .............................................. 6
   2.3. Measurement Units ....................................... 6
3. Jitter ....................................................... 6
   3.1. Definition .............................................. 6
   3.2. Discussion .............................................. 7
   3.3. Measurement Units ....................................... 7
4. Physical Layer Calibration ................................... 7
   4.1. Definition .............................................. 7
   4.2. Discussion .............................................. 8
   4.3. Measurement Units ....................................... 8
5. Line Rate .................................................... 8
   5.1. Definition .............................................. 8
   5.2. Discussion .............................................. 9
   5.3. Measurement Units ...................................... 10
6. Buffering ................................................... 11
   6.1. Buffer ................................................. 11
        6.1.1. Definition ...................................... 11
        6.1.2. Discussion ...................................... 12
        6.1.3. Measurement Units ............................... 12
   6.2. Incast ................................................. 13
        6.2.1. Definition ...................................... 13
        6.2.2. Discussion ...................................... 14
        6.2.3. Measurement Units ............................... 14
7. Application Throughput: Data Center Goodput ................. 14
   7.1. Definition ............................................. 14
   7.2. Discussion ............................................. 14
   7.3. Measurement Units ...................................... 15
8. Security Considerations ..................................... 16
9. IANA Considerations ......................................... 16
10. References ................................................. 16
   10.1. Normative References .................................. 16
   10.2. Informative References ................................ 17
   10.3. Acknowledgments ....................................... 17
Authors' Addresses ............................................. 17

1. Introduction

Traffic patterns in the data center are not uniform and are constantly
changing. They are dictated by the nature and variety of applications
utilized in the data center. Traffic can be largely east-west (server
to server inside the data center) in one data center and north-south
(outside of the data center to server) in another, while some data
centers combine both. Traffic patterns can be bursty in nature and
contain many-to-one, many-to-many, or one-to-many flows.
Each flow may also be small and latency sensitive or large and
throughput sensitive while containing a mix of UDP and TCP traffic. One
or more of these may coexist in a single cluster and flow through a
single network device simultaneously. Benchmarking of network devices
has long used [RFC1242], [RFC2432], [RFC2544], [RFC2889], and
[RFC3918]. These benchmarks have largely focused on various latency
attributes and the maximum throughput of the Device Under Test (DUT)
being benchmarked. These standards are good at measuring theoretical
maximum throughput, forwarding rates, and latency under testing
conditions, but they do not represent real traffic patterns that may
affect these networking devices. The data center networking devices
covered in this document are switches and routers.

Currently, typical data center networking devices are characterized by:

- High port density (48 ports or more)

- High speed (currently, up to 100 Gb/s per port)

- High throughput (line rate on all ports for Layer 2 and/or Layer 3)

- Low latency (in the microsecond or nanosecond range)

- Low amount of buffer (in the MB range per networking device)

- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory)

This document defines a set of definitions, metrics, and terminologies,
including congestion scenarios and switch buffer analysis, and
redefines basic definitions in order to represent a wide mix of traffic
conditions. The test methodologies are defined in
[draft-ietf-bmwg-dcbench-methodology].

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Definition Format

Term to be defined. (e.g., Latency)

Definition: The specific definition for the term.

Discussion: A brief discussion about the term, its application, and any
restrictions on measurement procedures.

Measurement Units: Methodology for the measurement and units used to
report measurements of this term, if applicable.

2. Latency

2.1. Definition

Latency is the amount of time it takes a frame to transit the Device
Under Test (DUT). Latency is measured in units of time (seconds,
milliseconds, microseconds, and so on). The purpose of measuring
latency is to understand the impact of adding a device in the
communication path.

The Latency interval can be assessed between different combinations of
events, regardless of the type of switching device (bit forwarding,
a.k.a. cut-through, or store-and-forward). [RFC1242] defined Latency
differently for each of these types of devices.

Traditionally, the latency measurement definitions are:

FILO (First In Last Out):

The time interval starting when the end of the first bit of the input
frame reaches the input port and ending when the last bit of the output
frame is seen on the output port.

FIFO (First In First Out):

The time interval starting when the end of the first bit of the input
frame reaches the input port and ending when the start of the first bit
of the output frame is seen on the output port. [RFC1242] Latency for
bit forwarding devices uses these events.
LILO (Last In Last Out):

The time interval starting when the last bit of the input frame reaches
the input port and ending when the last bit of the output frame is seen
on the output port.

LIFO (Last In First Out):

The time interval starting when the last bit of the input frame reaches
the input port and ending when the first bit of the output frame is
seen on the output port. [RFC1242] Latency for store-and-forward
devices uses these events.

Another way to summarize the four definitions above is to refer to the
bit positions as they normally occur: input to output.

FILO is FL (First bit in, Last bit out). FIFO is FF (First bit in,
First bit out). LILO is LL (Last bit in, Last bit out). LIFO is LF
(Last bit in, First bit out).

The definition explained in this section, in the context of data center
switching benchmarking, is used in lieu of the previous definition of
Latency in Section 3.8 of [RFC1242], which is quoted here:

   For store and forward devices: The time interval starting when the
   last bit of the input frame reaches the input port and ending when
   the first bit of the output frame is seen on the output port.

   For bit forwarding devices: The time interval starting when the end
   of the first bit of the input frame reaches the input port and
   ending when the start of the first bit of the output frame is seen
   on the output port.

To accommodate both types of network devices, as well as the hybrids of
the two types that have emerged, switch Latency measurements made
according to this document MUST be measured with the FILO events. FILO
will include the latency of the switch and the latency of the frame, as
well as the serialization delay. It is a picture of the "whole" latency
going through the DUT. For applications that are latency sensitive and
can function with the initial bytes of the frame, FIFO (or RFC 1242
Latency for bit forwarding devices) MAY be used. In all cases, the
event combination used in the Latency measurement MUST be reported.

2.2. Discussion

As mentioned in Section 2.1, FILO is the most important measurement
definition.

Not all DUTs are exclusively cut-through or store-and-forward. Data
center DUTs are frequently store-and-forward for smaller packet sizes
and then adopt a cut-through behavior at specific larger packet sizes.
The packet size at which the behavior changes MAY be configurable,
depending on the DUT manufacturer. FILO covers both scenarios,
store-and-forward and cut-through, so the threshold of the behavior
change does not matter for benchmarking.

The LIFO mechanism can be used with store-and-forward switches but not
with cut-through switches, as it will produce negative latency values
for larger packet sizes because LIFO removes the serialization delay.
Therefore, this mechanism MUST NOT be used when comparing latencies of
two different DUTs.
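As an illustration only (not part of the methodology), the following
Python sketch computes the four event combinations from hypothetical
first-bit/last-bit timestamps and shows how LIFO yields a negative
value for a cut-through DUT; the timestamp values are assumptions
chosen for the example.

   # Illustrative sketch: the four latency event combinations from
   # first-bit/last-bit timestamps (seconds). The timestamp names are
   # hypothetical, not from any test-equipment API.

   def latencies(in_first, in_last, out_first, out_last):
       return {
           "FILO": out_last - in_first,   # MUST be used (Section 2.3)
           "FIFO": out_first - in_first,  # MAY be used
           "LILO": out_last - in_last,
           "LIFO": out_first - in_last,   # MUST NOT be used
       }

   # Example: a 64-byte frame at 10 Gb/s (serialization ~51.2 ns)
   # through a cut-through DUT that starts transmitting 30 ns after
   # the first bit arrives:
   print(latencies(in_first=0.0, in_last=51.2e-9,
                   out_first=30.0e-9, out_last=81.2e-9))
   # LIFO = 30.0 ns - 51.2 ns < 0: the serialization delay is removed.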
2.3. Measurement Units

The measuring methods to use for benchmarking purposes are as follows:

1) FILO MUST be used as a measuring method, as this will include the
latency of the packet; today, applications commonly need to read the
whole packet to process the information and take action.

2) FIFO MAY be used for certain applications able to process the data
as the first bits arrive, as is the case, for example, for a
Field-Programmable Gate Array (FPGA).

3) LIFO MUST NOT be used, because, unlike all the other methods, it
subtracts the serialization delay of the packet.

3. Jitter

3.1. Definition

Jitter in the data center context is synonymous with the common term
"delay variation". It is derived from multiple measurements of one-way
delay, as described in RFC 3393. The mandatory definition of delay
variation is the Packet Delay Variation (PDV) from Section 4.2 of
[RFC5481]. When considering a stream of packets, the minimum delay over
all packets in the stream is subtracted from the delay of each packet.
This facilitates the assessment of the range of delay variation
(Max - Min) or a high percentile of PDV (the 99th percentile, for
robustness against outliers).

When First-bit to Last-bit timestamps are used for delay measurement,
Delay Variation MUST be measured using packets or frames of the same
size, since the definition of latency includes the serialization time
for each packet. Otherwise, if First-bit to First-bit timestamps are
used, the size restriction does not apply.

3.2. Discussion

In addition to the PDV range and/or a high percentile of PDV, the
Inter-Packet Delay Variation (IPDV) as defined in Section 4.1 of
[RFC5481] (differences between two consecutive packets) MAY be used to
determine how packet spacing has changed during transfer, for example,
to see if the packet stream has become closely spaced or "bursty".
However, the absolute value of IPDV SHOULD NOT be used, as this
collapses the "bursty" and "dispersed" sides of the IPDV distribution
together.

3.3. Measurement Units

The measurement of delay variation is expressed in units of seconds. A
PDV histogram MAY be provided for the population of packets measured.
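For illustration, a minimal Python sketch of the two computations named
above, assuming a list of one-way delay measurements is already
available (the delay values are hypothetical):

   # Illustrative sketch: PDV (Section 4.2 of RFC 5481) and IPDV
   # (Section 4.1 of RFC 5481) from one-way delays in seconds.
   delays = [10.2e-6, 10.1e-6, 10.9e-6, 10.1e-6, 12.4e-6]

   # PDV: subtract the minimum delay over the stream from each delay.
   min_d = min(delays)
   pdv = [d - min_d for d in delays]
   pdv_range = max(pdv) - min(pdv)          # Max - Min

   # IPDV: difference between consecutive packets. Do NOT take the
   # absolute value, or the "bursty" and "dispersed" sides of the
   # distribution collapse together.
   ipdv = [b - a for a, b in zip(delays, delays[1:])]

   print(pdv, pdv_range, ipdv)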
4. Physical Layer Calibration

4.1. Definition

The calibration of the physical layer consists of defining and
measuring the latency of the physical devices used to perform tests on
the DUT.

It includes the list of all physical layer components used, as listed
hereafter:

- Type of device used to generate traffic / measure traffic

- Type of line cards used on the traffic generator

- Type of transceivers on the traffic generator

- Type of transceivers on the DUT

- Type of cables

- Length of cables

- Software name and version of the traffic generator and DUT

- A list of enabled features on the DUT MAY be provided and is
recommended (especially the control-plane protocols, such as the Link
Layer Discovery Protocol, Spanning Tree, etc.). A comprehensive
configuration file MAY be provided to this effect.

4.2. Discussion

Physical layer calibration is part of the end-to-end latency, which
should be taken into account while evaluating the DUT. Small variations
of the physical components of the test may impact the latency being
measured; therefore, they MUST be described when presenting results.

4.3. Measurement Units

It is RECOMMENDED to use all cables of the same type and the same
length and, when possible, from the same vendor. The cable
specifications listed in Section 4.1 MUST be documented along with the
test results. The test report MUST specify whether the cable latency
has been removed from the test measurements or not. The accuracy of the
traffic generator's measurements MUST be provided (for current test
equipment, this is usually a value within the 20 ns range).

5. Line Rate

5.1. Definition

The transmit timing, or maximum transmitted data rate, is controlled by
the "transmit clock" in the DUT. The receive timing (maximum ingress
data rate) is derived from the transmit clock of the connected
interface.

The line rate or physical layer frame rate is the maximum capacity to
send frames of a specific size at the transmit clock frequency of the
DUT.

The term "nominal value of Line Rate" defines the maximum speed
capability for the given port; for example, 1 GE, 10 GE, 40 GE, or
100 GE.

The frequency ("clock rate") of the transmit clock in any two connected
interfaces will never be precisely the same; therefore, a tolerance is
needed. This tolerance is expressed as a Parts Per Million (PPM) value.
The IEEE standards allow a specific +/- variance in the transmit clock
rate, and Ethernet is designed to allow for small, normal variations
between the two clock rates. This results in a tolerance of the line
rate value when traffic is generated from test equipment to a DUT.

Line rate SHOULD be measured in frames per second.

5.2. Discussion

For a transmit clock source, most Ethernet switches use "clock modules"
(also called "oscillator modules") that are sealed, internally
temperature-compensated, and very accurate. The output frequency of
these modules is not adjustable because it is not necessary. Many test
sets, however, offer a software-controlled adjustment of the transmit
clock rate. These adjustments SHOULD be used to compensate the test
equipment so that it does not send more than the line rate of the DUT.

To allow for the minor variations typically found in the clock rate of
commercially available clock modules and other crystal-based
oscillators, Ethernet standards specify the maximum transmit clock rate
variation to be not more than +/- 100 PPM (parts per million) from a
calculated center frequency. Therefore, a DUT must be able to accept
frames at a rate within +/- 100 PPM to comply with the standards.

Very few clock circuits are precisely +/- 0.0 PPM because:

1. The Ethernet standards allow a maximum of +/- 100 PPM (parts per
million) variance over time. Therefore, it is normal for the frequency
of the oscillator circuits to experience variation over time and over a
wide temperature range, among other external factors.

2. The crystals, or clock modules, usually have a specific +/- PPM
variance that is significantly better than +/- 100 PPM. Oftentimes,
this is +/- 30 PPM or better in order to be considered a "certification
instrument".

When testing an Ethernet switch's throughput at "line rate", any
specific switch will have a clock rate variance. If a test set is
running 1 PPM faster than a switch under test and a sustained line rate
test is performed, a gradual increase in latency, and eventually packet
drops as buffers fill and overflow in the switch, can be observed.
Depending on how much clock variance there is between the two connected
systems, the effect may be seen after the traffic stream has been
running for a few hundred microseconds, a few milliseconds, or seconds.
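As a rough illustration of these time scales, the following sketch
estimates how long a sustained line-rate test running a given number of
PPM above the DUT's clock rate would take to fill a given amount of
buffering. The line rate, PPM offset, and buffer sizes below are
assumptions, and a real switch will show rising latency well before the
buffer actually overflows.

   # Hypothetical estimate: time for a test set running 'ppm_offset'
   # PPM faster than the DUT to fill 'buffer_bytes' of buffering
   # during a sustained line-rate test. All values are assumptions.

   def time_to_overflow(line_rate_bps, ppm_offset, buffer_bytes):
       excess_bps = line_rate_bps * ppm_offset / 1_000_000
       return buffer_bytes * 8 / excess_bps          # seconds

   # 10 Gb/s port, test set +1 PPM fast, 128 KB of buffering:
   print(time_to_overflow(10e9, 1, 128 * 1024))      # ~105 s
   # Same buffer with a +100 PPM offset:
   print(time_to_overflow(10e9, 100, 128 * 1024))    # ~1.05 s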
Low latency and no packet loss can be demonstrated by setting the test
set's link occupancy to slightly less than 100 percent. Typically, a
link occupancy of 99 percent produces excellent low latency and no
packet loss. No Ethernet switch or router will have a transmit clock
rate of exactly +/- 0.0 PPM. Very few (if any) test sets have a clock
rate that is precisely +/- 0.0 PPM.

Test set equipment manufacturers are well aware of the standards and
allow a software-controlled +/- 100 PPM "offset" (clock-rate
adjustment) to compensate for normal variations in the clock speed of
DUTs. This offset adjustment allows engineers to determine the
approximate speed at which the connected device is operating and verify
that it is within the parameters allowed by the standards.

5.3. Measurement Units

"Line Rate" can be measured in terms of "Frame Rate":

   Frame Rate = Transmit-Clock-Frequency /
                (Frame-Length * 8 + Minimum_Gap + Preamble +
                 Start-Frame-Delimiter)

Minimum_Gap represents the interframe gap. This formula "scales up" or
"scales down" to represent 1 Gb Ethernet, 10 Gb Ethernet, and so on.

Example for 1 Gb Ethernet speed with 64-byte frames:

   Frame Rate = 1,000,000,000 / (64 * 8 + 96 + 56 + 8)
              = 1,000,000,000 / 672
              = 1,488,095.2 frames per second

Considering the allowance of +/- 100 PPM, a switch may "legally"
transmit traffic at a frame rate between 1,487,946.4 FPS and
1,488,244 FPS. Each 1 PPM variation in clock rate will translate to a
1.488 frame-per-second frame rate increase or decrease.

In a production network, it is very unlikely to see precise line rate
over a very brief period. There is no observable difference between
dropping packets at 99% of line rate and at 100% of line rate.

Line rate can be measured at 100% of line rate with a -100 PPM
adjustment.

Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

The PPM adjustment SHOULD only be used for a line rate type of
measurement.
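A short sketch (for verification only) reproducing the formula and the
example above, including the +/- 100 PPM bounds:

   # Illustrative check of the Frame Rate formula (Section 5.3).
   # Overhead values are in bit times: interframe gap 96, preamble 56,
   # start-frame delimiter 8.

   def frame_rate(clock_hz, frame_len_bytes, gap=96, preamble=56,
                  sfd=8):
       return clock_hz / (frame_len_bytes * 8 + gap + preamble + sfd)

   fps = frame_rate(1_000_000_000, 64)   # 1,488,095.2 FPS
   low = fps * (1 - 100e-6)              # -100 PPM: ~1,487,946.4 FPS
   high = fps * (1 + 100e-6)             # +100 PPM: ~1,488,244.0 FPS
   print(fps, low, high)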
6. Buffering

6.1. Buffer

6.1.1. Definition

Buffer Size: The term "buffer size" represents the total amount of
frame buffering memory available on a DUT. This size is expressed in B
(bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). When the
buffer size is expressed, it SHOULD be defined by one of the size
metrics stated above. An indication of the frame MTU used for that
measurement is also necessary, as well as the CoS (Class of Service) or
DSCP (Differentiated Services Code Point) value set, as oftentimes the
buffers are carved by a quality-of-service implementation. Please refer
to the discussion of buffer efficiency in Section 6.1.2 for further
details.

Example: The Buffer Size of the DUT when sending 1518-byte frames is
18 MB.

Port Buffer Size: The port buffer size is the amount of buffer for a
single ingress port, a single egress port, or a combination of ingress
and egress buffering locations for a single port. The reason for
mentioning the three locations for the port buffer is that the DUT's
buffering scheme can be unknown or untested, and knowing the buffer
location helps clarify the buffer architecture and, consequently, the
total buffer size. The Port Buffer Size is an informational value that
MAY be provided by the DUT vendor. It is not a value that is tested by
benchmarking. Benchmarking will be done using the Maximum Port Buffer
Size or Maximum Buffer Size methodology.

Maximum Port Buffer Size: In most cases, this is the same as the Port
Buffer Size. In a certain type of switch architecture called SoC
(switch on chip), there is a port buffer and a shared buffer pool
available for all ports. The Maximum Port Buffer Size, in terms of an
SoC buffer, represents the sum of the port buffer and the maximum value
of the shared buffer allowed for this port, defined in terms of B
(bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). The Maximum
Port Buffer Size needs to be expressed along with the frame MTU used
for the measurement and the CoS or DSCP value set for the test.

Example: A DUT has been measured to have 3 KB of port buffer for
1518-byte frames and a total of 4.7 MB of maximum port buffer for
1518-byte frames and a CoS of 0.

Maximum DUT Buffer Size: This is the total size of buffer a DUT can be
measured to have. It is, most likely, different from the Maximum Port
Buffer Size. It can also be different from the sum of the per-port
Maximum Port Buffer Sizes. The Maximum Buffer Size needs to be
expressed along with the frame MTU used for the measurement and along
with the CoS or DSCP value set during the test.

Example: A DUT has been measured to have 3 KB of port buffer for
1518-byte frames and a total of 4.7 MB of maximum port buffer for
1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at a
1500 B MTU and a CoS of 0.

Burst: A burst is a fixed number of packets sent over a percentage of
line rate for a defined port speed. The frames sent are evenly
distributed across the interval T. A constant C can be defined as the
average time between two consecutive evenly spaced packets.

Microburst: A microburst is a burst in which packet drops occur without
sustained or noticeable congestion upon a link or device. One
characterization of a microburst is when the burst is not evenly
distributed over T and the spacing between packets is less than the
constant C (the average time between two consecutive evenly spaced
packets).

Intensity of Microburst: This is a percentage, representing the level
of microburst between 1 and 100%. The higher the number, the more
intense the microburst. It can be computed as follows, where Tp1...TpN
are the arrival times of the N packets in the burst and (N-1) * C is
the time the same N packets would span if they were evenly spaced:

   I = [1 - ((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1)))
            / ((N-1) * C)] * 100

The above definitions are not meant to comment on the ideal sizing of a
buffer but rather on how to measure it. A larger buffer is not
necessarily better and can cause issues with buffer bloat.

6.1.2. Discussion

When measuring buffering on a DUT, it is important to understand the
behavior of each and every port. This provides data for the total
amount of buffering available on the switch. The concept of buffer
efficiency helps one understand the optimum packet size for the buffer,
or the real volume of the buffer available for a specific packet size.
This section does not discuss how to conduct the test methodology;
instead, it explains the buffer definitions and what metrics should be
provided for comprehensive benchmarking of data center device
buffering.
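A minimal sketch of the Intensity of Microburst formula from
Section 6.1.1, under the reading given there in which (N-1) * C is the
evenly spaced duration; the timestamps and the value of C below are
hypothetical:

   # Sketch: Intensity of Microburst from packet arrival timestamps
   # (seconds), assuming the denominator is the duration the same N
   # packets would span if evenly spaced at constant C.

   def microburst_intensity(timestamps, c):
       n = len(timestamps)
       # Sum of consecutive gaps; telescopes to last - first arrival.
       actual = sum(b - a for a, b in zip(timestamps, timestamps[1:]))
       even = (n - 1) * c
       return (1 - actual / even) * 100

   # 5 packets expected every 10 us (C = 10e-6) but arriving clumped
   # 1 us apart:
   print(microburst_intensity([0, 1e-6, 2e-6, 3e-6, 4e-6], 10e-6))
   # -> 90.0 (a highly intense microburst)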
6.1.3. Measurement Units

When the buffer is measured:

- The buffer size MUST be measured.

- The port buffer size MAY be provided for each port.

- The maximum port buffer size MUST be measured.

- The maximum DUT buffer size MUST be measured.

- The intensity of microburst MAY be mentioned when a microburst test
is performed.

- The CoS or DSCP value set during the test SHOULD be provided.

6.2. Incast

6.2.1. Definition

The term "Incast", very commonly utilized in the data center, refers to
the many-to-one or many-to-many traffic patterns. It is measured by the
number of ingress and egress ports and the level of synchronization
attributed, as defined in this section. Typically, in the data center,
it would refer to many different ingress server ports (many) sending
traffic to a common uplink (many-to-one) or multiple uplinks
(many-to-many). This pattern is generalized for any network as many
incoming ports sending traffic to one or a few uplinks.

Synchronous arrival time: When two or more frames of respective sizes
L1 and L2 arrive at their respective one or multiple ingress ports, and
there is an overlap of the arrival times for any of the bits on the
Device Under Test (DUT), then the frames L1 and L2 have synchronous
arrival times. This is called Incast, whether it takes the many-to-one
(simpler) form or the many-to-many form.

Asynchronous arrival time: Any condition not defined by synchronous
arrival time.

Percentage of synchronization: This defines the level of overlap
(amount of bits) between the frames L1, L2, ..., Ln.

Example: Two 64-byte frames, of lengths L1 and L2, arrive at ingress
ports 1 and 2 of the DUT. There is an overlap of 6.4 bytes during which
L1 and L2 are present at the same time on their respective ingress
ports. Therefore, the percentage of synchronization is 10%.

Stateful type traffic refers to packets exchanged with a stateful
protocol, such as TCP.

Stateless type traffic refers to packets exchanged with a stateless
protocol, such as UDP.

6.2.2. Discussion

In this scenario, buffers are solicited on the DUT. In an ingress
buffering mechanism, the ingress port buffers would be solicited along
with Virtual Output Queues, when available, whereas in an egress
buffering mechanism, the egress buffer of the one outgoing port would
be used.

In either case, regardless of where the buffer memory is located in the
switch architecture, the Incast creates buffer utilization.

When one or more frames have synchronous arrival times at the DUT, they
are considered to form an Incast.

6.2.3. Measurement Units

The number of ingress and egress ports MUST be measured. The percentage
of synchronization MUST be non-null and MUST be specified.
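For illustration, a sketch that reproduces the 10% figure from the
example in Section 6.2.1, assuming the first-bit and last-bit arrival
times of each frame are known; times are expressed in byte-times for
readability, and the values are hypothetical:

   # Sketch: percentage of synchronization for two equal-size frames,
   # given the (first-bit, last-bit) arrival interval of each frame on
   # its ingress port.

   def sync_percentage(frame1, frame2, frame_len):
       # Overlap of the two arrival intervals, clamped at zero.
       overlap = min(frame1[1], frame2[1]) - max(frame1[0], frame2[0])
       return max(0.0, overlap) / frame_len * 100

   # Two 64-byte frames; the second starts 57.6 byte-times after the
   # first, leaving an overlap of 6.4 byte-times:
   print(sync_percentage((0.0, 64.0), (57.6, 121.6), 64))   # 10.0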
7. Application Throughput: Data Center Goodput

7.1. Definition

In data center networking, a balanced network is a function of maximal
throughput and minimal loss at any given time. This is captured by the
Goodput [4]. Goodput is the application-level throughput. For standard
TCP applications, a very small loss can have a dramatic effect on
application throughput. [RFC2647] has a definition of Goodput; the
definition in this publication is a variant of that definition.

Goodput is the number of bits per unit of time forwarded to the correct
destination interface of the DUT, minus any bits retransmitted.

7.2. Discussion

In data center benchmarking, the goodput is a value that SHOULD be
measured. It provides a realistic idea of the usage of the available
bandwidth. A goal in data center environments is to maximize the
goodput while minimizing the loss.

7.3. Measurement Units

The Goodput, G, is measured by the following formula:

   G = (S/F) x V bytes per second

where:

- S represents the payload bytes, which do not include packet or TCP
headers

- F is the frame size

- V is the speed of the media in bytes per second

Example: A TCP file transfer over HTTP on a 10 Gb/s medium.

The file cannot be transferred over Ethernet as a single continuous
stream. It must be broken down into individual frames of 1500 B when
the standard MTU (Maximum Transmission Unit) is used. Each packet
requires 20 B of IP header information and 20 B of TCP header
information; therefore, 1460 B are available per packet for the file
transfer. Linux-based systems are further limited to 1448 B, as they
also carry a 12 B timestamp. Finally, the data is transmitted in this
example over Ethernet, which adds 26 B of overhead per packet.

   G = 1460 / 1526 x 10 Gb/s, which is 9.567 Gb/s, or 1.196 GB per
   second.

Please note: This example does not take into consideration the
additional Ethernet overhead, such as the interframe gap (a minimum of
96 bit times), nor collisions (which have a variable impact, depending
on the network load).

When conducting Goodput measurements, please document, in addition to
the items listed in Section 4.1, the following information:

- The TCP stack used

- OS versions

- NIC firmware version and model

For example, Windows TCP stacks and different Linux versions can
influence TCP-based test results.
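A short sketch reproducing the worked example above, using the header
and overhead figures given in the text:

   # Sketch: Goodput G = (S/F) x V for the TCP-over-Ethernet example
   # above (no interframe gap or collisions accounted, per the note).

   def goodput_bps(payload_bytes, frame_bytes, media_bps):
       return payload_bytes / frame_bytes * media_bps

   S = 1500 - 20 - 20   # 1460 B of payload per 1500 B MTU packet
   F = 1500 + 26        # 1526 B on the wire with Ethernet overhead
   g = goodput_bps(S, F, 10e9)
   print(g, g / 8)      # ~9.567e9 bit/s, ~1.196e9 B/s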
8. Security Considerations

Benchmarking activities as described in this memo are limited to
technology characterization using controlled stimuli in a laboratory
environment, with dedicated address space and the constraints specified
in the sections above.

The benchmarking network topology will be an independent test setup and
MUST NOT be connected to devices that may forward the test traffic into
a production network or misroute traffic to the test management
network.

Further, benchmarking is performed on a "black-box" basis, relying
solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT SHOULD be identical in the lab and in production networks.

9. IANA Considerations

No IANA action is requested at this time.

10. References

10.1. Normative References

[draft-ietf-bmwg-dcbench-methodology] Avramov, L. and J. Rapp, "Data
Center Benchmarking Methodology", draft-ietf-bmwg-dcbench-methodology
(work in progress).

[RFC1242] Bradner, S., "Benchmarking Terminology for Network
Interconnection Devices", RFC 1242, July 1991.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
Network Interconnect Devices", RFC 2544, March 1999.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
March 1997.

[RFC5481] Morton, A. and B. Claise, "Packet Delay Variation
Applicability Statement", RFC 5481, March 2009.

10.2. Informative References

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for
LAN Switching Devices", RFC 2889, August 2000.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast
Benchmarking", RFC 3918, October 2004.

[4] Chen, Y., Griffith, R., Liu, J., Katz, R. H., and A. D. Joseph,
"Understanding TCP Incast Throughput Collapse in Datacenter Networks",
http://yanpeichen.com/professional/usenixLoginIncastReady.pdf

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking",
RFC 2432, DOI 10.17487/RFC2432, October 1998.

[RFC2647] Newman, D., "Benchmarking Terminology for Firewall
Performance", RFC 2647, August 1999.

10.3. Acknowledgments

The authors would like to thank Alfred Morton, Scott Bradner, Ian Cox,
and Tim Stevenson for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Phone: +1 408 774 9077
Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave
Palo Alto, CA 94304
United States
Phone: +1 650 857 3367
Email: jrapp@vmware.com