INTERNET-DRAFT

BMWG                                                           S. Kommu
Internet-Draft                                                   VMware
Intended status: Informational                                  J. Rapp
Expires: Sep 2019                                                VMware
                                                           Mar 11, 2019

     Considerations for Benchmarking Network Virtualization Platforms
                       draft-skommu-bmwg-nvp-03.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 11, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   Current network benchmarking methodologies are focused on physical
   networking components and do not consider the actual application
   layer traffic patterns, and hence do not reflect the traffic that
   virtual networking components work with when using network
   virtualization overlays (NVO3).  The purpose of this document is to
   distinguish and highlight benchmarking considerations when testing
   and evaluating virtual networking components in the data center.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Definitions
      3.1. System Under Test (SUT)
      3.2. Network Virtualization Platform
      3.3. Microservices
   4. Scope
      4.1.1. Scenario 1
      4.1.2. Scenario 2
      4.1.3. Learning
      4.1.4. Flow Optimization
      4.1.5. Out of scope
      4.2. Virtual Networking for Datacenter Applications
      4.3. Interaction with Physical Devices
   5. NVP Benchmarking Considerations
      5.1. Learning
      5.2. Traffic Flow Optimizations
         5.2.1. Fast Path
         5.2.2. Dedicated cores / Co-processors
         5.2.3. Prioritizing and de-prioritizing active flows
      5.3. Server Architecture Considerations
         5.3.1. NVE Component considerations
         5.3.2. Frame format/sizes within the Hypervisor
         5.3.3. Baseline testing with Logical Switch
         5.3.4. Repeatability
         5.3.5. Tunnel encap/decap outside the Hypervisor
         5.3.6. SUT Hypervisor Profile
      5.4. Benchmarking Tools Considerations
         5.4.1. Considerations for NVE
         5.4.2. Considerations for Split-NVE
   6. Control Plane Scale Considerations
      6.1.1. VM Events
      6.1.2. Scale
      6.1.3. Control Plane Performance at Scale
   7. Security Considerations
   8. IANA Considerations
   9. Conclusions
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Acknowledgments
   Appendix A. Partial List of Parameters to Document
      A.1. CPU
      A.2. Memory
      A.3. NIC
      A.4. Hypervisor
      A.5. Guest VM
      A.6. Overlay Network Physical Fabric
      A.7. Gateway Network Physical Fabric
      A.8. Metrics

1. Introduction

   Datacenter virtualization that includes both compute and network
   virtualization is growing rapidly as the industry continues to look
   for ways to improve productivity and flexibility while at the same
   time cutting costs.  Network virtualization is comparatively new and
   is expected to grow as rapidly as compute virtualization has.
   Multiple vendors and solutions are already in the market.
   Each vendor often has its own recommendations on how to benchmark its
   solutions, making it difficult to perform an apples-to-apples
   comparison between different solutions.  Hence the need for a
   vendor-, product- and cloud-agnostic way to benchmark network
   virtualization solutions, to help with comparisons and with making
   informed decisions when selecting a network virtualization solution.

   Applications have traditionally been segmented using VLANs, with ACLs
   between the VLANs.  This model does not scale because of the 4K limit
   on the number of VLANs.  Overlays such as VXLAN were designed to
   address this limitation.

   With VXLAN, applications are segmented based on the VXLAN
   encapsulation (specifically the VNI field in the VXLAN header), which
   is similar to the VLAN ID in the 802.1Q VLAN tag, but without the 4K
   scale limitation of VLANs.  For a more detailed discussion of this
   subject, please refer to RFC 7364, "Problem Statement: Overlays for
   Network Virtualization".

   VXLAN is just one of several Network Virtualization Overlays (NVO).
   Others include STT, Geneve and NVGRE.  STT and Geneve have expanded
   on the capabilities of VXLAN.  Please refer to the IETF NVO3 working
   group <https://datatracker.ietf.org/wg/nvo3/documents/> for more
   information.

   Modern application architectures such as microservices, because of
   the IP-based connectivity within the application, place higher
   demands on networking and security than traditional three-tier
   application models such as web, app and db.  Benchmarks MUST
   consider whether the proposed solution is able to scale up to the
   demands of such applications, and not just of a three-tier
   architecture.

   The benchmarks will utilize the terminology and definitions of the
   NVO3 working group, including RFC 8014 and RFC 8394.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Definitions

3.1. System Under Test (SUT)

   Traditional hardware-based networking devices generally use the
   device under test (DUT) model of testing.  In this model, apart from
   any allowed configuration, the DUT is a black box from a testing
   perspective.  This method works for hardware-based networking
   devices since the device itself is not influenced by any components
   outside the DUT.

   Virtual networking components cannot leverage the DUT model of
   testing, as the DUT is not just the virtual device but also includes
   the hardware components used to host the virtual device.

   Hence, the System Under Test (SUT) model MUST be used instead of the
   traditional device under test model.

   With the SUT model, the virtual networking component, along with all
   software and hardware components that host the virtual networking
   component, MUST be considered part of the SUT.

   Virtual networking components, because of their dependency on the
   underlying hardware and other software components, may end up
   leveraging NIC offload benefits such as TCP Segmentation Offload
   (TSO), Large Receive Offload (LRO) and Rx/Tx filters.  Such
   underlying hardware- and software-level features, even though they
   may not be part of the virtual networking stack itself, MUST be
   considered and documented.  Note: physical switches and routers,
   including the ones that act as initiators for NVOs, work with L2/L3
   packets and may not be able to leverage TCP enhancements such as
   TSO.
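   As a hedged illustration of how such offload settings might be
   captured for SUT documentation, the sketch below parses the feature
   list produced by a command such as "ethtool -k <interface>" on
   Linux.  The sample output, interface name and field names are
   illustrative assumptions, not content defined by this draft:

```python
# Hypothetical sketch: record NIC offload settings for SUT documentation.
# Assumes Linux-style "ethtool -k" output; the sample below is illustrative.

SAMPLE_ETHTOOL_OUTPUT = """\
Features for eth0:
tcp-segmentation-offload: on
large-receive-offload: off
rx-checksumming: on [fixed]
tx-checksumming: on
"""

def parse_offloads(ethtool_output: str) -> dict:
    """Parse 'feature: on/off [fixed]' lines into a dict for test reports."""
    features = {}
    for line in ethtool_output.splitlines():
        if ":" not in line or line.endswith(":"):
            continue  # skip blank lines and the "Features for eth0:" header
        name, _, state = line.partition(":")
        features[name.strip()] = state.split()[0] == "on"
    return features

if __name__ == "__main__":
    offloads = parse_offloads(SAMPLE_ETHTOOL_OUTPUT)
    # e.g. attach the TSO/LRO state to the benchmark report
    print(offloads["tcp-segmentation-offload"])  # True
```

   In a real test run the output of the actual tool would be captured
   per interface and archived next to the results, so that a later run
   can confirm the offload state was identical.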
   Please refer to Section 5, Figure 2 for a visual representation of
   the System Under Test in the case of intra-host testing, and to
   Section 5, Figure 3 for the System Under Test in the case of
   inter-host testing.

3.2. Network Virtualization Platform

   This document focuses on the Network Virtualization Overlay platform
   as outlined in RFC 8014 and on use cases from RFC 8394.

   Network Virtualization Platforms function closer to the application
   layer and are able to work not only with L2/L3 packets but also with
   segments that leverage TCP optimizations such as Large Segment
   Offload (LSO).

   NVPs leverage TCP stack optimizations such as TCP Segmentation
   Offload (TSO) and Large Receive Offload (LRO), which enable NVPs to
   work with much larger payloads, of up to 64K, unlike their NFV
   counterparts.

   This difference in payload size translates into one operation per
   64K of payload in an NVP versus roughly 40 operations for the same
   amount of payload in an NFV, which has to divide it into MTU-sized
   packets, and results in a considerable difference in performance
   between NFV and NVP.

   Please refer to Figure 1 for a pictorial representation of this
   primary difference between NVP and NFV for a 64K payload
   segment/packet on a network with the MTU set to 1500 bytes.

   Note: payload sizes in Figure 1 are approximate.

   NVP (1 segment)              NFV (40 packets)

   Segment 1                    Packet 1
   +-------------------------+  +-------------------------+
   | Headers                 |  | Headers                 |
   | +---------------------+ |  | +---------------------+ |
   | | Payload - up to 64K | |  | | Payload < 1500      | |
   | +---------------------+ |  | +---------------------+ |
   +-------------------------+  +-------------------------+

                                Packet 2
                                +-------------------------+
                                | Headers                 |
                                | +---------------------+ |
                                | | Payload < 1500      | |
                                | +---------------------+ |
                                +-------------------------+

                                            .
                                            .
                                            .

                                Packet 40
                                +-------------------------+
                                | Headers                 |
                                | +---------------------+ |
                                | | Payload < 1500      | |
                                | +---------------------+ |
                                +-------------------------+

                  Figure 1: Payload, NVP vs NFV

   Hence, normal benchmarking methods are not relevant for NVPs.
   Instead, newer methods that leverage TCP optimizations MUST be used
   for testing Network Virtualization Platforms.

3.3. Microservices

   Moving from traditional monolithic application architectures, such
   as the three-tier web, app and db architecture, to a microservices
   model opens the networking and security stacks up to new scale and
   performance related challenges.  At a high level, in a microservices
   model, a traditional monolithic app that may use a few IPs is broken
   down into hundreds of individual one-responsibility-only
   applications, each with its own connectivity and security
   requirements.  These hundreds of small one-responsibility-only
   microservices need their own IPs and must also be secured into their
   own segments, pushing the scale boundaries of the overlay from both
   a simple segmentation perspective and a security perspective.

   For more details regarding microservices, please refer to the
   Wikipedia article on microservices:
   https://en.wikipedia.org/wiki/Microservices

4. Scope

   The focus of this document is the Network Virtualization Platform in
   two separate scenarios, as outlined in RFC 8014 section 4, Network
   Virtualization Edge (NVE), and RFC 8394 section 1.1, Split-NVE, and
   the associated learning phase:

4.1.1. Scenario 1

   RFC 8014 Section 4.1, "NVE Co-located with server hypervisor": the
   entire NVE functionality will typically be implemented as part of
   the hypervisor and/or virtual switch on the server.

4.1.2. Scenario 2

   RFC 8394 Section 1.1, "Split-NVE: A type of NVE (Network
   Virtualization Edge) where the functionalities are split across an
   end device supporting virtualization and an external network
   device."

4.1.3. Learning

   Address learning rate is a key contributor to the overall
   performance of the SUT, especially in microservices type use cases
   where a large number of end-points are created and destroyed on
   demand.

4.1.4. Flow Optimization

   There are several flow optimization algorithms designed to help
   improve latency or throughput.  These optimizations MUST be
   documented.

4.1.5. Out of scope

   This document does not address Network Function Virtualization
   (NFV), which has already been covered by previous IETF documents
   (https://datatracker.ietf.org/doc/draft-ietf-bmwg-virtual-
   net/?include_text=1).

   Network Function Virtualization focuses on being independent of
   networking hardware while providing the same functionality.  In the
   case of NFV, traditional benchmarking methodologies recommended by
   the IETF may be used.  The IETF document "Considerations for
   Benchmarking Virtual Network Functions and Their Infrastructure"
   addresses benchmarking NFVs.

   Typical NFV implementations emulate, in software, the
   characteristics and features of physical switches.  They are similar
   to any physical L2/L3 switch from the perspective of packet size,
   which is typically enforced based on the maximum transmission unit
   used.

4.2. Virtual Networking for Datacenter Applications

   This document focuses on virtual networking for east-west traffic
   within an on-prem datacenter and/or cloud.  For example, in a three-
   tier app such as web, app and db, this document focuses on the east-
   west traffic between web and app.
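   As a back-of-envelope illustration of why the microservices model
   described earlier stresses segment and address scale far more than a
   three-tier model, consider the sketch below.  All figures in it are
   hypothetical assumptions for illustration, not measurements or
   requirements from this document:

```python
# Hypothetical back-of-envelope estimate contrasting the endpoint and
# segment scale of a three-tier app with a microservices decomposition.
# Service and replica counts below are illustrative assumptions only.

def endpoint_scale(services: int, replicas_per_service: int) -> dict:
    """Each service gets its own segment; each replica needs its own IP."""
    return {
        "segments": services,
        "endpoints": services * replicas_per_service,
    }

three_tier = endpoint_scale(services=3, replicas_per_service=4)
microservices = endpoint_scale(services=300, replicas_per_service=4)

print(three_tier["endpoints"])     # 12
print(microservices["endpoints"])  # 1200
```

   Even with identical replica counts, the microservices decomposition
   multiplies both the number of overlay segments and the number of
   addresses the SUT must learn, which is why learning rate appears as
   a first-class benchmark consideration in this document.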
   This document addresses the scale requirements of modern application
   architectures such as microservices, considering whether the
   proposed solution is able to scale up to the demands of
   microservices application models, which typically have hundreds of
   small services communicating on standard ports such as http/https,
   using protocols such as REST.

4.3. Interaction with Physical Devices

   Virtual network components MUST NOT be tested independent of other
   components within the system.  For example, unlike with a physical
   router or a firewall, where the tests can be focused solely on the
   device, when testing a virtual router or firewall multiple other
   devices may become part of the SUT.  Hence the characteristics of
   these other traditional networking switches and routers, load
   balancers, firewalls etc. MUST be considered:

   o  Hashing method used

   o  Over-subscription rate

   o  Throughput available

   o  Latency characteristics

5. NVP Benchmarking Considerations

   In virtual environments, the SUT may often share resources and
   reside on the same physical hardware as other components involved in
   the tests.  Hence the SUT MUST be clearly documented.  In these
   tests, a single hypervisor may host multiple servers, switches,
   routers, firewalls etc.

   Intra-host testing: Intra-host testing helps in reducing the number
   of components involved in a test.  For example, intra-host testing
   would help focus on the System Under Test (the logical switch and
   the hardware running the hypervisor that hosts the logical switch)
   and eliminate other components.  Because of the nature of virtual
   infrastructures, with multiple elements hosted on the same physical
   infrastructure, influence from other components cannot be completely
   ruled out.  For example, unlike in physical infrastructures, logical
   routing or a distributed firewall MUST NOT be benchmarked
   independent of logical switching.
   The System Under Test definition MUST include all components
   involved in that particular test.

   +---------------------------------------------------+
   | System Under Test                                 |
   | +-----------------------------------------------+ |
   | | Hyper-Visor                                   | |
   | |                                               | |
   | |                +-------------+                | |
   | |                |     NVP     |                | |
   | | +-----+        |   Switch/   |        +-----+ | |
   | | | VM1 |<------>|   Router/   |<------>| VM2 | | |
   | | +-----+   VW   |  Firewall/  |   VW   +-----+ | |
   | |                |    etc.     |                | |
   | |                +-------------+                | |
   | |                                               | |
   | | Legend                                        | |
   | |   VM: Virtual Machine                         | |
   | |   VW: Virtual Wire                            | |
   | +-----------------------------------------------+ |
   +---------------------------------------------------+

              Figure 2: Intra-Host System Under Test

   In the above figure, we only address the NVE co-located with the
   hypervisor.

   Inter-host testing: Inter-host testing helps in profiling the
   performance of the underlying network interconnect.  For example,
   when testing logical switching, inter-host testing would test not
   only the logical switch component but also any other devices that
   are part of the physical data center fabric connecting the two
   hypervisors.  The System Under Test MUST be well defined to help
   with repeatability of tests.  In the case of inter-host testing, the
   System Under Test definition MUST include all components, including
   the underlying network fabric.

   Figure 3 is a visual representation of the System Under Test for
   inter-host testing.
   +---------------------------------------------------+
   | System Under Test                                 |
   | +-----------------------------------------------+ |
   | | Hyper-Visor                                   | |
   | |                +-------------+                | |
   | |                |     NVP     |                | |
   | | +-----+        |   Switch/   |        +-----+ | |
   | | | VM1 |<------>|   Router/   |<------>| VM2 | | |
   | | +-----+   VW   |  Firewall/  |   VW   +-----+ | |
   | |                |    etc.     |                | |
   | |                +-------------+                | |
   | +-----------------------------------------------+ |
   |                        ^                          |
   |                        | Network Cabling          |
   |                        v                          |
   | +-----------------------------------------------+ |
   | | Physical Networking Components                | |
   | | (switches, routers, firewalls etc.)           | |
   | +-----------------------------------------------+ |
   |                        ^                          |
   |                        | Network Cabling          |
   |                        v                          |
   | +-----------------------------------------------+ |
   | | Hyper-Visor                                   | |
   | |                +-------------+                | |
   | |                |     NVP     |                | |
   | | +-----+        |   Switch/   |        +-----+ | |
   | | | VM1 |<------>|   Router/   |<------>| VM2 | | |
   | | +-----+   VW   |  Firewall/  |   VW   +-----+ | |
   | |                |    etc.     |                | |
   | |                +-------------+                | |
   | +-----------------------------------------------+ |
   +---------------------------------------------------+
     Legend
       VM: Virtual Machine
       VW: Virtual Wire

              Figure 3: Inter-Host System Under Test

   Virtual components have a direct dependency on the physical
   infrastructure that hosts them.  Hardware characteristics of the
   physical host impact the performance of the virtual components.  The
   components being tested, and the impact of the other hardware
   components within the hypervisor on the performance of the SUT, MUST
   be documented.  Virtual component performance is influenced by the
   physical hardware components within the hypervisor.  Access to
   various offloads, such as TCP segmentation offload, may have a
   significant impact on performance.
   Firmware and driver differences may also significantly impact
   results, depending on whether the specific driver leverages any
   hardware-level offloads offered.  Packet processing could be
   executed on shared or dedicated cores on the main processor, or via
   a dedicated co-processor or embedded processor on the NIC.

   Hence, all physical components of the physical server running the
   hypervisor that hosts the virtual components MUST be documented,
   along with the firmware and driver versions of all components used,
   to help ensure repeatability of test results.  For example, the BIOS
   configuration of the server MUST be documented, as some of those
   settings are designed to improve performance.  Please refer to
   Appendix A for a partial list of parameters to document.

5.1. Learning

   The SUT needs to learn all the addresses before running any tests.
   The address learning rate MUST be considered in the overall
   performance metrics, because it has a high impact in microservices-
   based use cases where there is a huge churn of end points as they
   are created and destroyed on demand.  In these cases, both the
   throughput at steady state and the time taken to reach steady state
   MUST be tested and documented.

5.2. Traffic Flow Optimizations

   Several mechanisms are employed to optimize traffic flows.  The
   following are some examples:

5.2.1. Fast Path

   A single flow may go through various switching, routing and
   firewalling decisions.  While in the standard model every single
   packet has to go through the entire process/pipeline, some
   optimizations make this decision for the first packet of a flow,
   store the resulting state, and leverage it to skip the process for
   the rest of the packets of the same flow.

5.2.2. Dedicated cores / Co-processors

   Packet processing is a CPU-intensive workload.
   Some NVEs may use dedicated cores or a co-processor primarily for
   packet processing instead of sharing the cores used for the actual
   workloads.  Such cases MUST be documented.  Tests MUST be performed
   with both shared and dedicated cores, and the results and the
   differences between them MUST be documented.

5.2.3. Prioritizing and de-prioritizing active flows

   Certain algorithms may prioritize or de-prioritize traffic flows
   based purely on their network characteristics, such as the length of
   the flow, for example de-prioritizing a long-lived flow.  This could
   change the performance of a flow over a period of time.  Such
   optimizations MUST be documented, and tests MUST include long-lived
   flows to help capture the change in performance for such flows.
   Tests MUST note the point at which performance changes.

5.3. Server Architecture Considerations

   When testing physical networking components, the approach taken is
   to consider the device as a black box.  With virtual infrastructure,
   this approach no longer helps, as the virtual networking components
   are an intrinsic part of the hypervisor they are running on and are
   directly impacted by the server architecture used.  Server hardware
   components define the capabilities of the virtual networking
   components.  Hence, the server architecture MUST be documented in
   detail to help with repeatability of tests, and the entire set of
   hardware and software components becomes the SUT.

5.3.1. NVE Component considerations

5.3.1.1. NVE co-located

   The components of a co-located NVE may be hypervisor based,
   offloaded entirely to the NIC card, or a hybrid model.  In the case
   of the hypervisor-based model, they may run in user space or kernel
   space.  Further, they may use dedicated cores, shared cores or, in
   some cases, dedicated co-processors.  All the components and the
   processes used MUST be documented.

5.3.1.2. NVE split

   The NVE split scenario generally has three primary components, as
   documented in RFC 8394:

   "tNVE: Terminal-side NVE.  The portion of Split-NVE functionalities
   located on the end device supporting virtualization.  The tNVE
   interacts with a Tenant System through an internal interface in the
   end device."  The tNVE may be made up of either hypervisor-
   controlled components, such as hypervisor-provided switches, or NVE-
   controlled components, where the network functionality is not
   provided by the hypervisor.  In either case, the components used
   MUST be documented.

   "nNVE: Network-side NVE.  The portion of Split-NVE functionalities
   located on the network device that is directly or indirectly
   connected to the end device that contains the corresponding NVE.
   The nNVE normally performs encapsulation to and decapsulation from
   the overlay network."  All the functionality provided by the nNVE
   MUST be documented.

   "External NVE: The physical network device that contains the nNVE."
   The networking device hardware specifications MUST be documented.
   Please use Appendix A for an example of the specifications that MUST
   be documented.

   In either case, NVE co-located or NVE split, all the components MUST
   be documented.  Where possible, individual components MUST be tested
   independent of the entire system.  For example, where possible,
   hypervisor-provided switching functionality MUST be tested
   independent of the NVE.

   Per RFC 8014, "for the split-NVE case, protocols will be needed that
   allow the hypervisor and NVE to negotiate and set up the necessary
   state so that traffic sent across the access link between a server
   and the NVE can be associated with the correct virtual network
   instance."  Supported VM lifecycle events, from RFC 8394 section 2,
   MUST be documented as part of the benchmark process.
   This process MUST also include how the hypervisor and the external
   NVE have signaled each other to reach an agreement; for an example,
   see Section 2.1 ("VM Creation Event") of RFC 8394.  The process used
   to update the agreement status MUST also be documented.

   +---------------------------------------------------+
   |                 System Under Test                 |
   | +-----------------------------------------------+ |
   | |                  Hyper-Visor                  | |
   | |  +-----+        +-------------+               | |
   | |  | VM1 |<------>|    tNVE     |               | |
   | |  +-----+   VW   +-------------+               | |
   | |                        ^                      | |
   | |                        | TSI                  | |
   | |                        v         Switch       | |
   | |          +--------------------------+         | |
   | |          |       External NVE       |         | |
   | |          |  Router/Firewall/etc.,   |         | |
   | |          +--------------------------+         | |
   | |                        ^                      | |
   | |                        | TSI                  | |
   | |                        v                      | |
   | |  +-------------+        +-----+               | |
   | |  |    tNVE     |<------>| VM2 |               | |
   | |  +-------------+   VW   +-----+               | |
   | +-----------------------------------------------+ |
   +---------------------------------------------------+

   Legend
   VM:   Virtual Machine
   VW:   Virtual Wire
   TSI:  Tenant System Interface
   tNVE: Terminal-side NVE

   Figure 4: NVE Split co-located - System Under Test

   +---------------------------------------------------+
   |                 System Under Test                 |
   | +-----------------------------------------------+ |
   | |                  Hyper-Visor                  | |
   | |                 +-------------+               | |
   | |  +-----+        |     NVP     |               | |
   | |  | VM1 |<------>|  Interface  |               | |
   | |  +-----+   VW   +-------------+               | |
   | +-----------------------------------------------+ |
   |                         ^                         |
   |                         | Network Cabling         |
   |                         v                         |
   | +-----------------------------------------------+ |
   | |  Physical switches, routers, firewalls etc.,  | |
   | +-----------------------------------------------+ |
   |                         ^                         |
   |                         | Network Cabling         |
   |                         v                         |
   | +-----------------------------------------------+ |
   | |  Hyper-Visor/  +--------------------------+   | |
   | |  ToR Switch/   |        NVP Split         |   | |
   | |  NIC etc.,     |  Router/Firewall/etc.,   |   | |
   | |                +--------------------------+   | |
   | +-----------------------------------------------+ |
   |                         ^                         |
   |                         | Network Cabling         |
   |                         v                         |
   | +-----------------------------------------------+ |
   | |  Physical switches, routers, firewalls etc.,  | |
   | +-----------------------------------------------+ |
   |                         ^                         |
   |                         | Network Cabling         |
   |                         v                         |
   | +-----------------------------------------------+ |
   | |  Hyper-Visor   +-------------+                | |
   | |                |     NVP     |        +-----+ | |
   | |                |  Interface  |<------>| VM2 | | |
   | |                +-------------+   VW   +-----+ | |
   | +-----------------------------------------------+ |
   +---------------------------------------------------+

   Legend
   VM: Virtual Machine
   VW: Virtual Wire

   Figure 5: NVE Split not co-located - System Under Test

5.3.2. Frame format/sizes within the Hypervisor

   The Maximum Transmission Unit (MTU) limits the frame sizes of
   physical network components.  The most commonly supported maximum
   MTU on physical devices is 9000 bytes, while 1500 bytes is the
   standard MTU.  Physical network testing and NFV testing use these
   MTU sizes.  However, the virtual networking components that live
   inside a hypervisor may work with much larger segments because of
   the availability of hardware- and software-based offloads.  Hence,
   testing based on the usual smaller packet sizes is not relevant for
   performance testing of virtual networking components.  All
   TCP-related configuration, such as the TSO size and the number of
   RSS queues, MUST be documented along with any other physical NIC
   related configuration.

   NVE co-located may have a different performance profile when
   compared with NVE split, because the co-located NVE may have access
   to offloads that are not available when the packet has to traverse
   the physical link.  Such differences MUST be documented.

5.3.3. Baseline testing with Logical Switch

   The Logical Switch is often an intrinsic component of the test
   system, along with any other hardware and software components used
   for testing.  Other logical components cannot be tested
   independently of the Logical Switch.

5.3.4. Repeatability

   To ensure repeatability of the results in physical network component
   testing, much care is taken to ensure that tests are conducted with
   exactly the same parameters, for example the MAC addresses used.

   When testing NVP components with an application-layer test tool,
   there may be a number of components within the system that cannot be
   tuned or held in a desired state, for example the housekeeping
   functions of the underlying operating system.

   Hence, tests MUST be repeated a number of times, and each test case
   MUST be run for at least two minutes if the test tool provides such
   an option.  Results SHOULD be derived from multiple test runs.  The
   variance between runs SHOULD be documented.

5.3.5. Tunnel encap/decap outside the Hypervisor

   Logical network components may also have a performance impact
   depending on the functionality available within the physical fabric.
   A physical fabric that supports NVO encap/decap is one such case
   that may have a different performance profile.  Any such
   functionality that exists in the physical fabric MUST be part of the
   test result documentation to ensure repeatability of tests.  In this
   case, the SUT MUST include the physical fabric if it is being used
   for encap/decap operations.

5.3.6. SUT Hypervisor Profile

   Physical networking equipment has well-defined physical resource
   characteristics, such as the type and number of ASICs/SoCs used, the
   amount of memory, and the type and number of processors.  The
   performance of virtual networking components depends on the physical
   hardware that hosts the hypervisor.
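   Parts of this hardware profile can be captured mechanically.  The
   sketch below is an illustrative helper, not part of this memo's
   requirements: it assumes a Linux hypervisor host where the
   'ethtool' utility is available, and parses 'ethtool -k' output so
   that NIC offload availability and status can be recorded alongside
   the test results.

```python
import subprocess


def parse_offloads(ethtool_output: str) -> dict:
    """Parse 'ethtool -k <iface>' output.

    Returns {feature-name: (enabled, fixed)} where 'fixed' means the
    driver does not allow the setting to be changed.
    """
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # Feature lines look like 'tcp-segmentation-offload: on' or
        # 'large-receive-offload: off [fixed]'; skip header lines.
        if not (value.startswith("on") or value.startswith("off")):
            continue
        features[name.strip()] = (value.startswith("on"), "[fixed]" in value)
    return features


def document_offloads(iface: str) -> dict:
    """Capture offload status for the SUT report (Linux + ethtool assumed).

    The interface name is an assumption about the test environment.
    """
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_offloads(out)
```

   The resulting map can be attached verbatim to the documentation
   described in Appendix A.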
   Hence, the physical hardware usage, which is part of the SUT, MUST
   be documented for a given test; for example, the CPU usage when
   running a logical router.

   CPU usage changes based on the type of hardware available within the
   physical server.  For example, TCP Segmentation Offload greatly
   reduces CPU usage by offloading the segmentation process to the NIC
   on the sender side.  Receive Side Scaling offers a similar benefit
   on the receive side.  Hence, the availability and status of such
   hardware MUST be documented, along with the actual CPU/memory usage
   when the virtual networking components have access to such
   offload-capable hardware.

   The following is a partial list of components that MUST be
   documented, both in terms of what is available and what is used by
   the SUT:

   o  CPU - type, speed, available instruction sets (e.g., AES-NI)

   o  Memory - type, amount

   o  Storage - type, amount

   o  NIC cards -

      *  Type

      *  Number of ports

      *  Offloads available/used - the following is a partial list of
         possible features:

         o  TCP Segmentation Offload

         o  Large Receive Offload

         o  Checksum Offloads

         o  Receive Side Scaling

         o  Other queuing mechanisms

      *  Drivers, firmware (if applicable)

      *  HW revision

   o  Libraries such as DPDK, if available and used

   o  Number and type of VMs used for testing, and:

      *  vCPUs

      *  RAM

      *  Storage

      *  Network driver

      *  Any prioritization of VM resources

      *  Operating system type, version, and kernel, if applicable

      *  TCP configuration changes, if any

      *  MTU

   o  Test tool

      *  Workload type

      *  Protocol being tested

      *  Number of threads

      *  Version of tool

   o  For inter-hypervisor tests,

      *  Physical network devices that are part of the test

         o  Note: For inter-hypervisor tests, the system under test is
            no longer only the virtual component being tested; the
            entire fabric that connects the virtual components becomes
            part of the system under test.

5.4. Benchmarking Tools Considerations

5.4.1. Considerations for NVE

   Virtual network components in the NVE work closer to the application
   layer than physical networking components do, which enables the
   virtual network components to take advantage of TCP optimizations
   such as TCP Segmentation Offload (TSO) and Large Receive Offload
   (LRO).  Because of these optimizations, virtual network components
   work with types and sizes of segments that are often not the same as
   those the physical network works with.  Hence, virtual network
   components MUST be tested with application-layer segments instead of
   physical network layer packets.  Testing MUST be done with
   application-layer testing tools such as iperf and netperf.

5.4.2. Considerations for Split-NVE

   In the Split-NVE case, since the components may not leverage any
   TCP-related optimizations, typical network test tools focused on
   packet processing MUST be used.  However, the tools used MUST be
   able to leverage Receive Side Scaling (RSS).

6. Control Plane Scale Considerations

   For a holistic approach to performance testing, control plane
   performance must also be considered.  While the previous sections
   focused on performance tests run after the SUT has reached a steady
   state, the following section focuses on tests that measure the time
   taken to bring the SUT to a steady state.

   In a physical network infrastructure, this could involve various
   stages, such as boot-up time, the time taken to apply configuration,
   BGP convergence time, etc.  In a virtual infrastructure, this
   involves many more components, which may also be distributed across
   multiple hosts.
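   Such time-to-steady-state measurements reduce to polling a readiness
   condition and recording the elapsed time.  The sketch below is a
   minimal illustration; 'check_online' is a hypothetical probe of the
   SUT-provided network (for example, a ping to the VM's address or a
   query to the NVP management API) and is an assumption, not an
   interface defined by this memo.

```python
import time


def time_until_ready(check_online, timeout_s=300.0, poll_interval_s=0.5):
    """Return seconds until check_online() is True, or None on timeout.

    check_online: hypothetical zero-argument probe of the SUT-provided
    network; the timeout and poll interval are illustrative defaults.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while time.monotonic() < deadline:
        if check_online():
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None
```

   Repeating such a measurement across many VMs, and at increasing
   scale, yields the control plane benchmarks discussed below.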
   Some of the components are:

   o  VM Creation Event

   o  VM Migration Event

   o  How many total VMs can the SUT support?

   o  At what rate does the SUT allow creation of VMs?

   Please refer to Section 2 of RFC 8394 for various VM events and
   their definitions.  In the following section, we further clarify
   some of the terms used in that RFC.

   VM Creation

   For the purposes of NVP control plane testing, a VM Creation event
   occurs when a VM starts participating for the first time on an
   NVP-provided network.  This involves various actions on the tNVE and
   the NVP.  Please refer to Section 2.1 ("VM Creation Event") of
   RFC 8394 for more details.

   In order to rule out any hypervisor-imposed limitations, the System
   Under Test must first be profiled and baselined without the use of
   NVP components.  For the purposes of baselining the control plane,
   the VM used may have a very small footprint, such as DSL Linux,
   which runs in 16 MB of RAM.

   Once a baseline has been established for a single hypervisor (HV), a
   similar exercise MUST be done on multiple HVs to establish a
   baseline for the entire hypervisor domain.  However, it may not be
   practical to have that many physical hosts, and hence nested hosts
   may be used for this purpose.

6.1.1. VM Events

   The performance of various control plane activities associated with
   the System Under Test MUST be documented.
   o  VM Creation: Time taken to join the VMs to the SUT-provided
      network

   o  Policy Realization: Time taken for policy realization on the VM

   o  VM Migration: Time taken to migrate a VM from one SUT-provided
      network to another SUT-provided network

   For the test itself, the following process could be used:

   1.  Call the API to join the VM to the SUT-provided network.

   2.  Loop while incrementing a timer until the VM comes online on the
       SUT-provided network.

   Similarly, policy realization and VM migration may also be tested
   with a check on whether the VM is available or not, based on the
   type of policy that is applied.

6.1.2. Scale

   The SUT must also be tested to determine the maximum scale
   supported.  Scale can be multi-faceted, such as the following:

   o  Total # of VMs per host

   o  Total # of VMs per SUT domain

   o  Total # of hosts per SUT domain

   o  Total # of Logical Switches per SUT domain

      *  Total # of VMs per SUT-provided Logical Switch

         o  Per host

         o  Per SUT domain

   o  Total # of Logical Routers per SUT domain

      *  Total # of Logical Switches per Logical Router

      *  Total # of VMs on a single Logical Router

   o  Total # of firewall sections

   o  Total # of firewall rules per section

   o  Total # of firewall rules applied per VM

   o  Total # of firewall rules applied per host

   o  Total # of firewall rules per SUT

6.1.3. Control Plane Performance at Scale

   Benchmarking MUST also test and document the control plane
   performance at scale.  That is:

   o  Total # of VMs that can be created in parallel

      *  How long does the action take?

   o  Total # of VMs that can be migrated in parallel

      *  How long does the action take?

   o  Total amount of time taken to apply one firewall rule across all
      the VMs under a SUT

   o  Time taken to apply thousands of rules on a SUT

7. Security Considerations

   Benchmarking activities as described in this memo are limited to
   technology characterization of a Device Under Test/System Under Test
   (DUT/SUT) using controlled stimuli in a laboratory environment, with
   dedicated address space and the constraints specified in the
   sections above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network or misroute traffic to the test
   management network.

   Further, benchmarking is performed on a "black-box" basis, relying
   solely on measurements observable external to the DUT/SUT.

   Special capabilities SHOULD NOT exist in the DUT/SUT specifically
   for benchmarking purposes.  Any implications for network security
   arising from the DUT/SUT SHOULD be identical in the lab and in
   production networks.

8. IANA Considerations

   No IANA action is requested at this time.

9. Conclusions

   Network Virtualization Platforms, because of their proximity to the
   application layer and because they can take advantage of TCP stack
   optimizations, do not function on a packets-per-second basis.
   Hence, traditional benchmarking methods, while still relevant for
   Network Function Virtualization, are not designed to test Network
   Virtualization Platforms.  Also, advances in application
   architectures, such as micro-services, bring new challenges and
   require benchmarking not just of throughput and latency but also of
   scale.  New benchmarking methods that are designed to take advantage
   of TCP optimizations are needed to accurately benchmark the
   performance of Network Virtualization Platforms.

10. References

10.1. Normative References

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997,
             https://tools.ietf.org/html/rfc2119

   [RFC8174] B. Leiba, "Ambiguity of Uppercase vs Lowercase in RFC 2119
             Key Words", BCP 14, RFC 8174, May 2017,
             https://tools.ietf.org/html/rfc8174

   [RFC7364] T. Narten, E. Gray, D. Black, L. Fang, L. Kreeger, M.
             Napierala, "Problem Statement: Overlays for Network
             Virtualization", RFC 7364, October 2014,
             https://datatracker.ietf.org/doc/rfc7364/

   [RFC8014] D. Black, J. Hudson, L. Kreeger, M. Lasserre, T. Narten,
             "An Architecture for Data-Center Network Virtualization
             over Layer 3 (NVO3)", RFC 8014, December 2016,
             https://tools.ietf.org/html/rfc8014

   [RFC8394] Y. Li, D. Eastlake 3rd, L. Kreeger, T. Narten, D. Black,
             "Split Network Virtualization Edge (Split-NVE) Control-
             Plane Requirements", RFC 8394, May 2018,
             https://tools.ietf.org/html/rfc8394

   [nv03]    IETF Network Virtualization Overlays (nvo3) Working Group

10.2. Informative References

   [RFC8172] A. Morton, "Considerations for Benchmarking Virtual
             Network Functions and Their Infrastructure", RFC 8172,
             July 2017, https://tools.ietf.org/html/rfc8172

11. Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.

Appendix A.  Partial List of Parameters to Document

A.1. CPU

   CPU Vendor

   CPU Number

   CPU Architecture

   # of Sockets (CPUs)

   # of Cores

   Clock Speed (GHz)

   Max Turbo Freq. (GHz)

   Cache per CPU (MB)

   # of Memory Channels

   Chipset

   Hyperthreading (BIOS Setting)

   Power Management (BIOS Setting)

   VT-d

   Shared vs. Dedicated packet processing

   User space vs. Kernel space packet processing

A.2. Memory

   Memory Speed (MHz)

   DIMM Capacity (GB)

   # of DIMMs

   DIMM configuration

   Total DRAM (GB)

A.3. NIC

   Vendor

   Model

   Port Speed (Gbps)

   Ports

   PCIe Version

   PCIe Lanes

   Bonded

   Bonding Driver

   Kernel Module Name

   Driver Version

   VXLAN TSO Capable

   VXLAN RSS Capable

   Ring Buffer Size RX

   Ring Buffer Size TX

A.4. Hypervisor

   Hypervisor Name

   Version/Build

   Based on

   Hotfixes/Patches

   OVS Version/Build

   IRQ balancing

   vCPUs per VM

   Modifications to HV

   Modifications to HV TCP stack

   Number of VMs

   IP MTU

   Flow control TX (send pause)

   Flow control RX (honor pause)

   Encapsulation Type

A.5. Guest VM

   Guest OS & Version

   Modifications to VM

   IP MTU Guest VM (Bytes)

   Test tool used

   Number of NetPerf Instances

   Total Number of Streams

   Guest RAM (GB)

A.6. Overlay Network Physical Fabric

   Vendor

   Model

   # and Type of Ports

   Software Release

   Interface Configuration

   Interface/Ethernet MTU (Bytes)

   Flow control TX (send pause)

   Flow control RX (honor pause)

A.7. Gateway Network Physical Fabric

   Vendor

   Model

   # and Type of Ports

   Software Release

   Interface Configuration

   Interface/Ethernet MTU (Bytes)

   Flow control TX (send pause)

   Flow control RX (honor pause)

A.8. Metrics

   Drops on the virtual infrastructure

   Drops on the physical underlay infrastructure

Authors' Addresses

   Samuel Kommu
   VMware
   3401 Hillview Ave
   Palo Alto, CA 94304

   Email: skommu@vmware.com

   Jacob Rapp
   VMware
   3401 Hillview Ave
   Palo Alto, CA 94304

   Email: jrapp@vmware.com