NVO3 Workgroup                                           S. Boutros, Ed.
Internet-Draft                                                     Ciena
Intended Status: Informational                          D. Eastlake, Ed.
                                                               Futurewei
Expires: October 29, 2022                                 April 30, 2022

                 Network Virtualization Overlays (NVO3)
                      Encapsulation Considerations
                        draft-ietf-nvo3-encap-08

Abstract

   The IETF Network Virtualization Overlays (NVO3) Working Group Chairs
   and Routing Area Director chartered a design team to take forward the
   encapsulation discussion and see if there was potential to design a
   common encapsulation that addresses the various technical concerns.

   There are implications of having different encapsulations in real
   environments consisting of both software and hardware implementations
   and within and spanning multiple data centers. For example, OAM
   functions such as path MTU discovery become challenging with multiple
   encapsulations along the data path.

   The design team recommended Geneve with a few modifications as the
   common encapsulation. This document provides more details,
   particularly in Section 7.

Status of This Document

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors or the IDR Working Group mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   https://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   https://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1. Introduction
   2. Design Team Goals
   3. Terminology
   4. Abbreviations and Acronyms

   5. Encapsulation Issues and Background
   5.1 Geneve
   5.2 Generic UDP Encapsulation (GUE)
   5.3 Generic Protocol Extension (GPE) for VXLAN

   6. Common Encapsulation Considerations
   6.1. Current Encapsulations
   6.2. Useful Extensions Use Cases
   6.2.1. Telemetry Extensions
   6.2.2. Security/Integrity Extensions
   6.2.3. Group Based Policy
   6.3. Hardware Considerations
   6.4. Extension Size
   6.5. Ordering of Extension Headers
   6.6. TLV versus Bit Fields
   6.7. Control Plane Considerations
   6.8. Split NVE
   6.9. Larger VNI Considerations

   7. Design Team Recommendations

   8. Acknowledgements
   9. Security Considerations
   10. IANA Considerations

   11.
   References
   11.1 Normative References
   11.2 Informative References

   Appendix A: Encapsulations Comparison
   A.1. Overview
   A.2. Extensibility
   A.2.1. Native Extensibility Support
   A.2.2. Extension Parsing
   A.2.3. Critical Extensions
   A.2.4. Maximal Header Length
   A.3. Encapsulation Header
   A.3.1. Virtual Network Identifier (VNI)
   A.3.2. Next Protocol
   A.3.3. Other Header Fields
   A.4. Comparison Summary

   Contributors

1. Introduction

   The NVO3 Working Group is chartered to gather requirements and
   develop solutions for network virtualization data planes based on
   encapsulation of virtual network traffic over an IP-based underlay
   data plane. Requirements include due consideration for OAM and
   security. Based on these requirements the WG was to select, extend,
   and/or develop one or more data plane encapsulation format(s).

   This led to WG drafts and an RFC describing three encapsulations as
   follows:

   - [RFC8926] Geneve: Generic Network Virtualization Encapsulation
   - [I-D.ietf-intarea-gue] Generic UDP Encapsulation
   - [I-D.ietf-nvo3-vxlan-gpe] Generic Protocol Extension for VXLAN
     (VXLAN-GPE)

   Discussion on the list and in face-to-face meetings identified a
   number of technical problems with each of these encapsulations.
   Furthermore, there was clear consensus at the 96th IETF meeting in
   Berlin that, to maximize interoperability, the working group should
   progress only one data plane encapsulation. In order to overcome a
   deadlock on the encapsulation decision, the WG consensus was to form
   a Design Team [RFC2418] to resolve this issue.

2. Design Team Goals

   The Design Team (DT) formed as described above was to take one of the
   proposed encapsulations and enhance it to address the technical
   concerns. Goals include simple evolution of deployed networks and
   applicability to all locations in the NVO3 architecture. The DT was
   specifically to avoid a design that is burdensome on hardware
   implementations, while still allowing future extensibility. The
   chosen design also needs to operate well with ICMP and in Equal Cost
   Multi-Path (ECMP) environments. If further extensibility is
   required, then it should be done in such a manner that it does not
   require the consent of an entity outside of the IETF.

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

4.
Abbreviations and Acronyms

   The following abbreviations and acronyms are used in this document:

   ACL - Access Control List

   DT - NVO3 encapsulation Design Team

   ECMP - Equal Cost Multi-Path

   EVPN - Ethernet VPN [RFC8365]

   Geneve - Generic Network Virtualization Encapsulation [RFC8926]

   GPE - Generic Protocol Extension

   GUE - Generic UDP Encapsulation [I-D.ietf-intarea-gue]

   HMAC - Hash based keyed Message Authentication Code [RFC2104]

   IEEE - Institute of Electrical and Electronics Engineers
      (www.ieee.org)

   NIC - Network Interface Card (refers to network interface
      hardware which is not necessarily a discrete "card")

   NSH - Network Service Header [RFC8300]

   NVA - Network Virtualization Authority

   NVE - Network Virtualization Edge (device)

   NVO3 - Network Virtualization Overlays over Layer 3

   OAM - Operations, Administration, and Maintenance [RFC6291]

   PWE3 - Pseudowire Emulation Edge to Edge

   TCAM - Ternary Content-Addressable Memory

   TLV - Type, Length, and Value

   Transit device - Underlay network device between NVEs

   UUID - Universally Unique Identifier

   VNI - Virtual Network Identifier

   VXLAN - Virtual eXtensible LAN [RFC7348]

5. Encapsulation Issues and Background

   The following subsections describe issues with current encapsulations
   as summarized by the WG Chairs. Numerous extensions and options have
   been designed for GUE and Geneve, but these have not yet been
   validated by the WG.

   Also included are diagrams and information on the candidate
   encapsulations. These are mostly copied from other documents. Since
   each protocol is assumed to be sent over UDP, an initial UDP header
   is shown, which would be preceded by an IPv4 or IPv6 header.

5.1 Geneve

   The Geneve packet format, taken from [RFC8926], is shown in Figure 1
   below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |    Dest Port = 6081 Geneve    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          UDP Length           |         UDP Checksum          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                    Variable-Length Options                    ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 1: Geneve Header

   The type of payload being carried is indicated by an Ethertype
   [RFC7042] in the Protocol Type field in the Geneve Header; Ethernet
   itself is represented by Ethertype 0x6558. See [RFC8926] for details
   concerning UDP header fields. The O bit indicates an OAM packet. The
   C bit is the "Critical" bit, which indicates that critical options
   are present; such options must be processed or the packet discarded.

   Issues with Geneve [RFC8926] are as follows:

   - It cannot be implemented cost-effectively in all use cases, because
     the variable-length header and the ordering of the TLVs make it
     costly (in terms of number of gates) to implement in hardware.

   - The header doesn't fit into the largest commonly available parse
     buffer (256 bytes in a NIC). Doubling the buffer size cannot be
     justified unless it is mandatory for hardware to process the
     additional option fields.

   Selection of Geneve despite these issues may be the result of the
   Geneve design effort assuming that the Geneve header would typically
   be delivered to a server and parsed in software.
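   To make the parse-buffer issue concrete, the following Python sketch
   decodes the fixed 8-byte Geneve header of Figure 1 and uses Opt Len
   to skip straight to the inner payload without parsing any option. It
   assumes the field widths shown in Figure 1 and the [RFC8926]
   convention that Opt Len counts 4-byte words of options, excluding the
   fixed header; the names parse_geneve and inner_payload are
   hypothetical. With the 6-bit Opt Len at its maximum of 63, the header
   is 8 + 63 * 4 = 260 bytes, which is just over the 256-byte parse
   buffer mentioned above.

```python
import struct
from dataclasses import dataclass

GENEVE_FIXED_LEN = 8  # Ver/Opt Len/O/C/Rsvd/Protocol Type + VNI/Reserved


@dataclass
class GeneveHeader:
    version: int
    opt_len_words: int  # Opt Len: options length in 4-byte multiples
    oam: bool           # O bit
    critical: bool      # C bit: critical options present
    protocol_type: int  # Ethertype of the payload, e.g. 0x6558 for Ethernet
    vni: int            # 24-bit Virtual Network Identifier

    @property
    def header_len(self) -> int:
        """Total Geneve header length in bytes (fixed part plus options)."""
        return GENEVE_FIXED_LEN + 4 * self.opt_len_words


def parse_geneve(buf: bytes) -> GeneveHeader:
    """Decode the fixed Geneve header (hypothetical helper)."""
    if len(buf) < GENEVE_FIXED_LEN:
        raise ValueError("buffer shorter than fixed Geneve header")
    b0, b1, proto, vni_res = struct.unpack("!BBHI", buf[:GENEVE_FIXED_LEN])
    hdr = GeneveHeader(
        version=b0 >> 6,            # top 2 bits of byte 0
        opt_len_words=b0 & 0x3F,    # low 6 bits of byte 0
        oam=bool(b1 & 0x80),        # first bit of byte 1
        critical=bool(b1 & 0x40),   # second bit of byte 1
        protocol_type=proto,
        vni=vni_res >> 8,           # drop the trailing Reserved byte
    )
    if len(buf) < hdr.header_len:
        raise ValueError("options extend past end of buffer")
    return hdr


def inner_payload(buf: bytes) -> bytes:
    """Skip the whole Geneve header using Opt Len, without parsing options."""
    return buf[parse_geneve(buf).header_len:]


# Example: version 0, Opt Len = 1 word, C bit set, Ethernet payload, VNI 0x123456.
pkt = (bytes([0x01, 0x40]) + struct.pack("!HI", 0x6558, 0x123456 << 8)
       + b"\x00" * 4 + b"inner")
hdr = parse_geneve(pkt)
print(hdr.vni == 0x123456, hdr.critical, hdr.header_len)  # True True 12
print(inner_payload(pkt))                                  # b'inner'
# Largest possible header: the 6-bit Opt Len caps options at 63 words.
print(GENEVE_FIXED_LEN + 4 * 0x3F)                         # 260
```

   Note that skipping the header needs only the first byte; it is the
   individual option TLVs, not the skip itself, that are costly for
   hardware.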

5.2 Generic UDP Encapsulation (GUE)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source port          |      Dest port = 6080 GUE     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          UDP Length           |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   GUE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0 |C|   Hlen  |  Proto/ctype  |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                  Extensions Fields (optional)                 ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 2: GUE Header

   The type of payload being carried is indicated by an IANA Internet
   protocol number in the Proto/ctype field. The C bit indicates a
   Control packet.

   Issues with GUE [I-D.ietf-intarea-gue] are as follows:

   - There were a significant number of objections to GUE related to the
     complexity of implementation in hardware, similar to those noted for
     Geneve above.
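   For comparison with Geneve, the following Python sketch decodes the
   first 32-bit word of the GUE header in Figure 2 and locates the
   extensions signalled by the Flags field. The field widths (2-bit
   version, C bit, 5-bit Hlen, 8-bit Proto/ctype, 16-bit Flags) come
   from the figure. Three things are assumptions not stated in this
   document: that Hlen counts 32-bit words following the first word (as
   we understand [I-D.ietf-intarea-gue]), that present extensions follow
   flag-bit order, and the flag-to-length table, whose entries are
   purely hypothetical placeholders. The table illustrates the point in
   Section 6.6 that a bit-field scheme needs a separate lookup to learn
   each extension's length.

```python
import struct
from typing import Dict, Tuple

# HYPOTHETICAL flag-to-length table: GUE defines its own flag
# assignments; these two entries are placeholders for illustration only.
EXT_LEN_BY_FLAG: Dict[int, int] = {0x8000: 4, 0x4000: 8}


def parse_gue(buf: bytes) -> Tuple[int, bool, int, int, int]:
    """Return (version, control, hlen_words, proto_ctype, flags)."""
    if len(buf) < 4:
        raise ValueError("buffer shorter than GUE base header")
    b0, proto_ctype, flags = struct.unpack("!BBH", buf[:4])
    # byte 0: 2-bit version, C bit, 5-bit Hlen
    return b0 >> 6, bool(b0 & 0x20), b0 & 0x1F, proto_ctype, flags


def extension_offsets(flags: int) -> Dict[int, int]:
    """Byte offset of each present extension, relative to the end of the
    first word, assuming fields follow flag-bit order (MSB first)."""
    offsets, pos = {}, 0
    for bit in sorted(EXT_LEN_BY_FLAG, reverse=True):
        if flags & bit:
            offsets[bit] = pos
            pos += EXT_LEN_BY_FLAG[bit]  # length comes from the side table
    return offsets


def gue_payload_offset(buf: bytes) -> int:
    """Start of the payload, assuming Hlen counts 32-bit words after word 0."""
    _, _, hlen, _, _ = parse_gue(buf)
    return 4 + 4 * hlen


# Hlen = 3 words (4 + 8 bytes for the two hypothetical extensions),
# Proto/ctype = 4 (IPv4), both hypothetical flags set.
hdr = bytes([0x03, 4]) + struct.pack("!H", 0xC000)
print(parse_gue(hdr))             # (0, False, 3, 4, 49152)
print(extension_offsets(0xC000))  # {32768: 0, 16384: 4}
print(gue_payload_offset(hdr))    # 16
```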

5.3 Generic Protocol Extension (GPE) for VXLAN

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |      Dest Port = 4790 GPE     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          UDP Length           |         UDP Checksum          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   VXLAN-GPE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|Ver|I|P|B|O|           Reserved            | Next Protocol |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         VXLAN Network Identifier (VNI)        |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 3: GPE Header

   The type of payload being carried is indicated by the Next Protocol
   field using a VXLAN-GPE-specific registry. The I bit indicates that
   the VNI is valid. The P bit indicates that the Next Protocol field is
   valid. The B bit indicates that the packet is an ingress-replicated
   Broadcast, Unknown Unicast, or Multicast packet. The O bit indicates
   an OAM packet.

   Issues with VXLAN-GPE [I-D.ietf-nvo3-vxlan-gpe] are as follows:

   - GPE is not day-1 backwards compatible with VXLAN [RFC7348].
     Although the frame format is similar, it uses a different UDP port,
     so it would require changes to existing implementations even if the
     rest of the GPE frame were the same.

   - GPE is insufficiently extensible. It adds a Next Protocol field and
     some flag bits to the VXLAN header but is not otherwise extensible.

   - Security, e.g., of the VNI, has not been addressed by GPE.
     Although a shim header could be used for security and other
     extensions, this has not been defined yet and its implications on
     offloading in NICs are not understood.

6. Common Encapsulation Considerations

6.1.
Current Encapsulations

   Appendix A includes a detailed comparison between the three proposed
   encapsulations. The comparison indicates several common properties
   but also three major differences among the encapsulations:

   - Extensibility: Geneve and GUE were defined with built-in
     extensibility, while VXLAN-GPE is not inherently extensible. Note
     that any of the three encapsulations can be extended using the
     Network Service Header (NSH [RFC8300]).

   - Extension method: Geneve is extensible using Type/Length/Value
     (TLV) fields, while GUE uses a small set of possible extensions and
     a set of flags that indicate which extensions are present.

   - Length field: Geneve and GUE include a Length field indicating the
     length of the encapsulation header, while VXLAN-GPE does not include
     such a field.

6.2. Useful Extensions Use Cases

   Non-vendor-specific TLVs MUST follow the standardization process.
   The following use cases for extensions show that there is a strong
   requirement to support variable-length extensions with possibly
   different subtypes.

6.2.1. Telemetry Extensions

   In several scenarios, it is beneficial to make information about the
   path a packet took through the network or through a network device,
   as well as associated telemetry information, available to the
   operator.

   This includes not only tasks like debugging, troubleshooting, and
   network planning and optimization but also policy or service level
   agreement compliance checks.

   Packet scheduling algorithms, especially for balancing traffic across
   equal cost paths or links, often leverage information contained
   within the packet, such as protocol number, IP address, or MAC
   address. Probe packets would thus either need to be sent between the
   exact same endpoints with the exact same parameters, or need to be
   artificially constructed as "fake" packets and inserted along the
   path.
   Often, neither approach is feasible from an operational
   perspective, because access to the end system is not available or
   the diversity of parameters and associated probe packets to be
   created is simply too large. An extension providing an in-band
   telemetry mechanism [I-D.ietf-ippm-ioam-data] is an alternative in
   those cases.

6.2.2. Security/Integrity Extensions

   Since the currently proposed NVO3 encapsulations do not protect their
   headers, a single-bit corruption in the VNI field could deliver a
   packet to the wrong tenant. Extension headers are needed to use any
   sophisticated security.

   The possibility of VNI spoofing with an NVO3 protocol is exacerbated
   by using UDP. Systems typically have no restrictions on applications
   being able to send to any UDP port, so an unprivileged application
   can trivially spoof VXLAN [RFC7348] packets, for instance, including
   using arbitrary VNIs.

   One can envision support of an HMAC-like Message Authentication Code
   (MAC) [RFC2104] in an NVO3 extension to authenticate the header and
   the outer IP addresses, thereby preventing attackers from injecting
   packets with spoofed VNIs.

   Another aspect of security is payload security. Essentially, this
   results in packets that look like the following:

      IP|UDP|NVO3 Encap|DTLS/IPSEC-ESP Extension|payload

   This is desirable since we still have the UDP header for ECMP, the
   NVO3 header is in plain text so it can be read by network elements,
   and different security or other payload transforms can be supported
   on a single UDP port (we don't need a separate UDP port for
   DTLS/IPSEC [RFC9147]/[RFC6071]).

6.2.3. Group Based Policy

   Another use case would be to carry the Group Based Policy (GBP)
   source group information within an NVO3 header extension in a
   similar manner as has been implemented for VXLAN
   [I-D.smith-vxlan-group-policy].
   This allows various forms of policy, such as access control and QoS,
   to be applied between abstract groups rather than being coupled to
   specific endpoint addresses.

6.3. Hardware Considerations

   Hardware restrictions should be taken into consideration along with
   future hardware enhancements that may provide more flexible metadata
   processing. However, the set of options that need to be, and will
   be, implemented in hardware will be a subset of what is implemented
   in software, since software NVEs are likely to grow features, and
   hence option support, at a more rapid rate.

   It is hard to predict which options will be implemented in which
   piece of hardware and when. That depends on whether the hardware
   will be in the form of a NIC providing increasing offload
   capabilities to software NVEs, a switch chip being used as an NVE
   gateway towards non-NVO3 parts of the network, or even a transit
   device that participates in the NVO3 data plane, e.g., for OAM
   purposes.

   As a result, it does not seem useful to prescribe an order of the
   options such that the ones likely to be implemented in hardware come
   first; such an order cannot be decided when the options are defined.
   However, a control plane can enforce such an order for some hardware
   implementations.

   We do know that hardware needs to initially be able to efficiently
   skip over the NVO3 header to find the inner payload. That is needed
   both for NICs implementing various TCP offload mechanisms and for
   transit devices and NVEs applying policy or ACLs to the inner
   payload.

6.4. Extension Size

   Extension header length has a significant impact on hardware and
   software implementations. A maximum total header length that is too
   small will unnecessarily constrain software flexibility. A maximum
   total header length that is too large will place a nontrivial cost on
   hardware implementations.
   Thus, the DT recommends that both a minimum and a maximum total
   available extension header length be specified. The maximum total
   header length is determined by the size of the bit field allocated
   for the total extension header length field. The risk with this
   approach is that it may be difficult to extend the total header size
   in the future. The minimum total header length is determined by a
   requirement in the specifications that all implementations must
   meet. The risk with this approach is that all implementations will
   only implement support for the minimum total header length, which
   would then become the de facto maximum total header length.

   The recommended minimum total available header length is 64 bytes.

   The size of an extension header should always be a multiple of 4
   bytes (4-byte aligned).

   The maximum length of a single option should be large enough to meet
   the different extension use case requirements, e.g., in-band
   telemetry and future use.

6.5. Ordering of Extension Headers

   To support hardware nodes at the target NVE or at a transit device
   that can process one or a few extension headers in TCAM, a control
   plane in such a deployment can signal a capability to ensure that a
   specific extension header will always appear in a specific position,
   for example, first in the packet.

   The order of the extension headers should be hardware friendly for
   both the sender and the receiver, and possibly for some transit
   devices as well.

   Transit devices don't participate in control plane communication
   between the end points and are not required to process the extension
   headers; however, if they do, they may need to process only a small
   subset of the extension headers that will be consumed by target NVEs.

6.6.
TLV versus Bit Fields

   If there is a well-known initial set of options that are likely to be
   implemented in software and in hardware, it can be efficient to use
   the bit fields approach to indicate the presence of extensions, as in
   GUE. However, as described in Section 6.3, if options are added over
   time and different subsets of options are likely to be implemented in
   different pieces of hardware, then it would be hard for the IETF to
   specify which options should get the early bit fields. TLVs are a
   lot more flexible, which avoids the need to determine the relative
   importance of different options. However, general TLVs of arbitrary
   order, size, and repetition are difficult to implement in hardware.
   A middle ground is to use TLVs with restrictions on their size and
   alignment, observing that individual TLVs can have a fixed length,
   and to support, via the control plane, a method such that an NVE will
   only receive options that it needs and implements. The control plane
   approach can potentially be used to control the order of the TLVs
   sent to a particular NVE. Note that transit devices are not likely
   to participate in the control plane; hence, to the extent that they
   need to participate in option processing, some other method must be
   used. Transit devices would also have issues with GUE bit fields
   being defined for future options.

   A benefit of TLVs from a hardware perspective is that they are
   self-describing, i.e., all the information is in the TLV. In a bit
   field approach, the hardware needs to look up the bit in some
   separate table to determine the length of the data associated with
   it, which adds hardware complexity.

   There are use cases where multiple modules of software are running on
   an NVE.
   These can be modules such as a diagnostic module from one vendor
   that does packet sampling and another module from a different vendor
   that implements a firewall. Using a TLV format, it is easier to have
   different software modules process different TLVs, which could be
   standard extensions or vendor-specific extensions defined by the
   different vendors, without conflicting with each other. This can
   help with hardware modularity as well. There are some
   implementations with options that allow different software modules,
   like MAC learning and security, to process different options.

6.7. Control Plane Considerations

   Given that we want to allow considerable flexibility and
   extensibility, e.g., for software NVEs, yet be able to support
   important extensions in less flexible contexts such as hardware NVEs,
   it is useful to consider the control plane. By control plane in this
   section, we mean both protocols, such as EVPN [RFC8365] and others,
   and deployment-specific configuration.

   If each NVE can express in the control plane that it only supports
   certain extensions (which could be a single extension, or a few), and
   the source NVEs only include supported extensions in the NVO3
   packets, then the target NVE can use a simpler parser (e.g., a TCAM
   might be usable to look for a single NVO3 extension), and the depth
   of the inner payload in the NVO3 packet will be minimized.
   Furthermore, if the target NVE cares about a few extensions and can
   express in the control plane the desired order of those extensions in
   the NVO3 packets, then the deployment can provide useful
   functionality with simplified hardware requirements for the target
   NVE.

   Transit devices that are not aware of the NVO3 extensions somewhat
   benefit from such an approach, since the inner payload is less deep
   in the packet if no extraneous extension headers are included in the
   packet.
   In general, a transit device is not likely to participate in the
   NVO3 control plane. However, configuration mechanisms can take into
   account limitations of the transit devices used in particular
   deployments.

   Note that with this approach, different NVEs could desire different
   extensions or sets of extensions, which means that the source NVE
   needs to be able to place different sets of extensions in different
   NVO3 packets, and perhaps in different orders. It also assumes that
   underlay multicast or replication servers are not used together with
   NVO3 extension headers.

   There is a need to consider mandatory extensions versus optional
   extensions. Mandatory extensions require the receiver to drop the
   packet if the extension is unknown. A control plane mechanism can
   prevent the need for dropping unknown extensions, since they would
   not be sent to target NVEs that do not support them.

   The control planes defined today need to add the ability to describe
   the different encapsulations. Thus, perhaps EVPN [RFC8365] and any
   other control plane protocol that the IETF defines should have a way
   to indicate the supported NVO3 extensions and their order, for each
   of the encapsulations supported.

   The WG should consider developing a separate draft on guidance for
   option processing and control plane participation. This should
   provide examples and guidance on a range of usage models and
   deployment scenarios for specific options and ordering that are
   relevant for that specific deployment. This includes end points and
   middle boxes using the options. Having the control plane negotiate
   the constraints is the most appropriate and flexible way to address
   these requirements.

6.8.
Split NVE

   If the working group sees a need for having the hosts send and
   receive options in a split NVE case [RFC8394], this is possible using
   any of the existing extensible encapsulations (Geneve, GUE, GPE+NSH)
   by defining a way to carry those over other transports. NSH can
   already be used over different transports.

   If this needs to be done with other encapsulations, it can be done by
   defining an Ethertype so that they can be carried over Ethernet and
   [802.1Q].

   If other encapsulations need to be carried over MPLS, an EVPN control
   plane would be required to signal that the other encapsulation header
   plus options will be present in front of the L2 packet. The VNI can
   be ignored in that header, and the MPLS label will be the one used to
   identify the EVPN L2 instance.

6.9. Larger VNI Considerations

   The DT discussed whether to make the VNI 32 bits or larger. The
   benefit of a 24-bit VNI is that it avoids unnecessary changes to
   existing proposals and implementations, almost all of which, if not
   all, use a 24-bit VNI. If a larger VNI is needed, an extension can
   be used to support it.

7. Design Team Recommendations

   The Design Team (DT) concluded that Geneve is most suitable as a
   starting point for a proposed standard for network virtualization,
   for the following reasons:

   1. The DT studied whether VNI should be in the base header or in an
      extension header and whether it should be a 24-bit or 32-bit
      field. The Design Team agreed that VNI is critical information for
      network virtualization and MUST be present in all packets. The DT
      also agreed that a 24-bit VNI, which is supported by Geneve,
      matches the existing widely used encapsulation formats, i.e., VXLAN
      [RFC7348] and NVGRE [RFC7637], and hence is more suitable to use
      going forward.

   2.
      The Geneve header has the total options length, which allows
      skipping over the options for NIC offload operations and will
      allow transit devices to view flow information in the inner
      payload.

   3. The DT considered the option of using NSH [RFC8300] with VXLAN-GPE,
      but given that NSH is targeted at service chaining and contains
      service chaining information, it is less suitable for the network
      virtualization use case. The other downside of VXLAN-GPE was the
      lack of a header length field, which makes skipping over the
      headers to process the inner payload more difficult. Total Option
      Length is present in Geneve. With VXLAN-GPE, it is not possible to
      skip any options in the middle. In principle, a split between a
      base header and a header with options is interesting (whether that
      options header is NSH or some new header without ties to a service
      path). We explored whether it would make sense to either use NSH
      for this, or define a new NVO3 options header. However, we observed
      that this makes it slightly harder to find the inner payload since
      the length field is not in the NVO3 header itself. Thus, one more
      field would have to be extracted to compute the start of the inner
      payload. Also, if the experience with IPv6 extension headers is a
      guide, there would be a risk that key pieces of hardware might not
      implement the options header, resulting in future calls to
      deprecate its use. Making the options part of the base NVO3 header
      has fewer of those issues. Even though the implementation of any
      particular option cannot be predicted ahead of time, the option
      mechanism and the ability to skip the options are likely to be
      broadly implemented.

   4. The DT compared the TLV and bit field styles of extension.
It was deemed that parsing both TLVs and bit fields is expensive and that, while bit fields may be simpler to parse, they are also more restrictive and require guessing which extensions will be widely implemented so that they can get early bit assignments. Given that half the bits are already assigned in GUE, a widely deployed extension may end up in a flag extension, which will require extra processing to dig the flag out of the flag extension and then look for the extension itself. Also, bit fields are not flexible enough to address the requirements of OAM, telemetry, and security extensions for variable-length options and for different subtypes of the same option. While TLVs are more flexible, a control plane can restrict the number of option TLVs as well as the order and size of the TLVs to make it simpler for a data plane implementation to handle.

5. The DT briefly discussed the multi-vendor NVE case and the need to allow vendors to put their own extensions in the NVE header. This is possible with TLVs.

6. The DT also agreed that the C (Critical) bit in Geneve is helpful. It indicates that the header includes options which must be parsed or the packet discarded. It allows a receiving NVE to easily decide whether it must process the options. An optional extension, for example a UUID-based packet trace, can be ignored by a receiving NVE when the C bit is clear, making it easy for the NVE to skip over the options. Thus, the C bit remains as defined in Geneve.

7. Several extensions of varying sizes are already being discussed (see Section 6.2). By using Geneve options, it is possible to obtain in-band parameters such as switch ID, ingress port, egress port, internal delay, and queue from switches in a telemetry extension TLV.
It is also possible to add security extension TLVs such as HMAC [RFC2104] and DTLS/IPsec [RFC9147]/[RFC6071] to authenticate the Geneve packet header and secure the Geneve packet payload at software or hardware tunnel endpoints. A Group Based Policy extension TLV can be carried as well.

8. There are already implementations of Geneve options deployed in production networks as of this writing. There is also new hardware that supports Geneve TLV parsing. In addition, an In-band Network Telemetry [INT] specification is being developed by P4.org that illustrates the option of carrying INT metadata over Geneve. OVN/OVS [OVN] have also defined some option TLVs for Geneve.

9. The DT has addressed the usage models while considering the requirements and implementations in general, including both software and hardware.

There seems to be interest in standardizing some well-known security option TLVs to secure the header and payload, in order to guarantee encapsulation header integrity and tenant data privacy. The Design Team recommends that the working group consider standardizing such option(s).

The DT recommends the following enhancements to Geneve to make it more suitable for hardware while still providing flexibility for software:

The DT proposes text such as: "While TLVs are more flexible, a control plane can restrict the number of option TLVs as well as the order and size of the TLVs to make it simpler for a data plane implementation in software or hardware to handle." For example, there may be some critical information, such as a secure hash, that must be processed in a certain order at the lowest latency.

A control plane can negotiate a subset of option TLVs and a certain TLV ordering, as well as limit the total number of option TLVs present in the packet, for example, to accommodate hardware capable of processing fewer options.
Hence, the control plane needs to have the ability to describe the supported subset of TLVs and their order.

The Geneve documents should specify that the subset and order of option TLVs SHOULD be configurable for each remote NVE in the absence of a protocol control plane.

The DT recommends that Geneve follow the fragmentation recommendations for overlay services such as PWE3, and the L2/L3 VPN recommendations, to guarantee an MTU large enough to accommodate the tunnel overhead ([RFC3985] Section 5.3).

The DT requests that Geneve provide a recommendation for critical bit processing; for example, text could specify how critical bits can be used with a control plane specifying the critical options.

Given that there is a telemetry option use case for a length of 256 bytes, we recommend that Geneve increase the maximum length of a single option TLV to 256 bytes.

The DT requests that Geneve address the OAM requirements for alternate marking and for performance measurements that need a 2-bit field in the header, and clarify the need for the current OAM bit in the Geneve header.

The DT recommends that the WG work on security options for Geneve.

8. Acknowledgements

The authors would like to thank Tom Herbert for providing the motivation for the Security/Integrity extension and for his valuable comments, T. Sridhar for his valuable comments and feedback, and Anoop Ghanwani for his extensive comments.

9. Security Considerations

This document does not introduce any additional security constraints.

10. IANA Considerations

This document requires no IANA actions.

11. References

11.1 Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.
11.2 Informative References

[I-D.herbert-gue-extensions] Herbert, T., Yong, L., and F. Templin, "Extensions for Generic UDP Encapsulation", draft-herbert-gue-extensions-01 (work in progress), October 2016.

[I-D.ietf-intarea-gue] Herbert, T., Yong, L., and O. Zia, "Generic UDP Encapsulation", draft-ietf-intarea-gue (work in progress), October 2019.

[I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields for In-situ OAM", draft-ietf-ippm-ioam-data (work in progress), June 2021.

[I-D.ietf-nvo3-vxlan-gpe] Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe (work in progress), March 2021.

[I-D.smith-vxlan-group-policy] Smith, M. and L. Kreeger, "VXLAN Group Policy Option", draft-smith-vxlan-group-policy-05 (work in progress), October 2018.

[802.1Q] IEEE, "IEEE Standard for Local and metropolitan area networks--Bridges and Bridged Networks", IEEE Std 802.1Q-2014, DOI 10.1109/IEEESTD.2014.6991462, December 2014.

[INT] P4.org, "In-band Network Telemetry (INT) Dataplane Specification", November 2020, <https://p4.org/p4-spec/docs/INT_v2_1.pdf>.

[OVN] Open Virtual Network, <https://www.openvswitch.org/>.

[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, DOI 10.17487/RFC2104, February 1997.

[RFC2418] Bradner, S., "IETF Working Group Guidelines and Procedures", BCP 25, RFC 2418, DOI 10.17487/RFC2418, September 1998.

[RFC3985] Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture", RFC 3985, DOI 10.17487/RFC3985, March 2005.

[RFC6071] Frankel, S. and S. Krishnan, "IP Security (IPsec) and Internet Key Exchange (IKE) Document Roadmap", RFC 6071, DOI 10.17487/RFC6071, February 2011.
[RFC6291] Andersson, L., van Helvoort, H., Bonica, R., Romascanu, D., and S. Mansfield, "Guidelines for the Use of the "OAM" Acronym in the IETF", BCP 161, RFC 6291, DOI 10.17487/RFC6291, June 2011.

[RFC7042] Eastlake 3rd, D. and J. Abley, "IANA Considerations and IETF Protocol and Documentation Usage for IEEE 802 Parameters", BCP 141, RFC 7042, DOI 10.17487/RFC7042, October 2013.

[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

[RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network Virtualization Using Generic Routing Encapsulation", RFC 7637, DOI 10.17487/RFC7637, September 2015.

[RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed., "Network Service Header (NSH)", RFC 8300, DOI 10.17487/RFC8300, January 2018.

[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., Uttaro, J., and W. Henderickx, "A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, DOI 10.17487/RFC8365, March 2018.

[RFC8394] Li, Y., Eastlake 3rd, D., Kreeger, L., Narten, T., and D. Black, "Split Network Virtualization Edge (Split-NVE) Control-Plane Requirements", RFC 8394, DOI 10.17487/RFC8394, May 2018.

[RFC8926] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed., "Geneve: Generic Network Virtualization Encapsulation", RFC 8926, DOI 10.17487/RFC8926, November 2020.

[RFC9147] Rescorla, E., Tschofenig, H., and N. Modadugu, "The Datagram Transport Layer Security (DTLS) Protocol Version 1.3", RFC 9147, DOI 10.17487/RFC9147, April 2022.

Appendix A: Encapsulations Comparison

A.1. Overview

This section presents a comparison of the three NVO3 encapsulation proposals: Geneve [RFC8926], GUE [I-D.ietf-intarea-gue], and VXLAN-GPE [I-D.ietf-nvo3-vxlan-gpe]. All three encapsulations use an outer UDP/IP transport. Geneve and VXLAN-GPE use an 8-octet base header, while GUE uses a 4-octet base header. In addition to the base header, optional extensions may be included in the encapsulation, as discussed in Section A.2 below.

A.2. Extensibility

A.2.1. Native Extensibility Support

The Geneve and GUE encapsulations both enable optional headers to be incorporated at the end of the base encapsulation header.

VXLAN-GPE does not provide native support for header extensions. However, as discussed in [I-D.ietf-nvo3-vxlan-gpe], extensibility can be attained to some extent if the Network Service Header (NSH) [RFC8300] is used immediately following the VXLAN-GPE header. NSH supports either a fixed-size extension (MD Type 1) or a variable-size TLV-based extension (MD Type 2). Note that NSH-over-VXLAN-GPE implies the additional overhead of the 8-octet NSH header, on top of the VXLAN-GPE header.

A.2.2. Extension Parsing

The Geneve Variable-Length Options are defined as Type/Length/Value (TLV) extensions. Similarly, VXLAN-GPE, when using NSH, can include NSH TLV-based extensions. In contrast, GUE defines a small set of possible extension fields (proposed in [I-D.herbert-gue-extensions]) and a set of flags in the GUE header that indicate, for each extension type, whether it is present or not.

TLV-based extensions, as defined in Geneve, provide the flexibility for a large number of possible extension types. Similar behavior can be supported in NSH-over-VXLAN-GPE when using MD Type 2. The flag-based approach taken in GUE strives to simplify implementations by defining a small number of possible extensions used in a fixed order.
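As a minimal sketch of TLV-based parsing, the Python below walks the Variable-Length Options area of a Geneve header. It assumes the option layout of [RFC8926]: a 16-bit Option Class, an 8-bit Type whose high-order bit is the critical bit, three reserved bits, and a 5-bit Length counted in 4-octet words that excludes the 4-octet option header. The names GeneveOption, parse_geneve_options, and matches_profile, and the notion of a "profile" as an ordered list of (class, type) pairs, are hypothetical and used only for illustration. Under those assumptions, matches_profile shows how a receiver could check options against a subset and order restricted by a control plane (as recommended in Section 7), approximating the fixed order of a flag-based scheme while keeping the TLV encoding.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GeneveOption:
    option_class: int  # 16-bit Option Class
    option_type: int   # 8-bit Type; its high-order bit is the critical bit
    data: bytes        # option data; always a multiple of 4 octets

    @property
    def critical(self) -> bool:
        return bool(self.option_type & 0x80)


def parse_geneve_options(opts: bytes) -> List[GeneveOption]:
    """Walk the Variable-Length Options area as a sequence of TLVs."""
    options = []
    offset = 0
    while offset < len(opts):
        if len(opts) - offset < 4:
            raise ValueError("truncated option header")
        opt_class, opt_type, rsvd_len = struct.unpack_from("!HBB", opts, offset)
        # Low 5 bits: data length in 4-octet words, excluding the 4-octet
        # option header itself; the top 3 bits are reserved and ignored here.
        data_len = 4 * (rsvd_len & 0x1F)
        start, end = offset + 4, offset + 4 + data_len
        if end > len(opts):
            raise ValueError("option data runs past the options area")
        options.append(GeneveOption(opt_class, opt_type, opts[start:end]))
        offset = end  # the per-option Length lets us jump to the next TLV
    return options


def matches_profile(options: Sequence[GeneveOption],
                    profile: Sequence[Tuple[int, int]]) -> bool:
    """Hypothetical check against a control-plane profile: an ordered list of
    permitted (class, type) pairs. Options must follow the profile order and
    each entry may be used at most once."""
    allowed = list(profile)
    pos = 0
    for opt in options:
        try:
            pos = allowed.index((opt.option_class, opt.option_type), pos) + 1
        except ValueError:
            return False  # unexpected option, or one that appears out of order
    return True
```

Because each TLV carries its own length, options that a receiver does not understand can be stepped over without interpreting them; the cost, compared with fixed flag positions, is a sequential walk whose length depends on the packet. The option class values used to exercise such code (for example 0xFF01) would be arbitrary illustrative values.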
The Geneve and GUE headers both include a length field defining the total length of the encapsulation, including the optional extensions. This length field simplifies parsing by transit devices that skip the encapsulation header without parsing its extensions.

A.2.3. Critical Extensions

The Geneve encapsulation header includes the 'C' bit, which indicates whether the current Geneve header includes critical options, that is to say, options which must be parsed by the target NVE. If the endpoint is not able to process a critical option, the packet is discarded.

A.2.4. Maximal Header Length

The maximal header length in Geneve, including options, is 260 octets. GUE defines the maximal header length to be 128 octets. VXLAN-GPE uses a fixed-length header of 8 octets, unless NSH-over-VXLAN-GPE is used, yielding an encapsulation header of up to 264 octets.

A.3. Encapsulation Header

A.3.1. Virtual Network Identifier (VNI)

The Geneve and VXLAN-GPE headers both include a 24-bit VNI field. GUE, on the other hand, enables the use of a 32-bit field called VNID; this field is not included in the GUE base header but was defined as an optional extension in [I-D.herbert-gue-extensions].

The VXLAN-GPE header includes the 'I' bit, indicating that the VNI field is valid in the current header. A similar indicator is defined as a flag in the GUE header [I-D.herbert-gue-extensions].

A.3.2. Next Protocol

All three encapsulation headers include a field that specifies the type of the next protocol header, which resides after the NVO3 encapsulation header. The Geneve header includes a 16-bit field that uses the IEEE Ethertype convention. GUE uses an 8-bit field, which uses the IANA Internet protocol numbering. The VXLAN-GPE header incorporates an 8-bit Next Protocol field, using a VXLAN-GPE-specific registry defined in [I-D.ietf-nvo3-vxlan-gpe].
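The following Python sketch shows how a receiving tunnel endpoint might decode the Geneve base header fields compared in this appendix: the 2-bit Version, the 6-bit Opt Len, the 'O' and 'C' bits, the 16-bit Ethertype-style Protocol Type, and the 24-bit VNI. It assumes the base header layout of [RFC8926]; the names GeneveHeader, parse_geneve_header, and inner_payload are hypothetical. The drop logic models only the version check and the 'C' bit rule (an endpoint that does not parse options discards packets whose 'C' bit is set); it is not a complete implementation.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

GENEVE_BASE_LEN = 8  # fixed-size base header, in octets


@dataclass
class GeneveHeader:
    version: int        # 2-bit Ver
    opt_len_words: int  # 6-bit Opt Len, in 4-octet words
    oam: bool           # O bit
    critical: bool      # C bit: one or more critical options present
    protocol_type: int  # 16-bit Protocol Type (Ethertype of the inner payload)
    vni: int            # 24-bit Virtual Network Identifier

    @property
    def header_len(self) -> int:
        # Opt Len is 6 bits, so this is at most 8 + 63 * 4 = 260 octets.
        return GENEVE_BASE_LEN + 4 * self.opt_len_words


def parse_geneve_header(pkt: bytes) -> GeneveHeader:
    if len(pkt) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve base header")
    b0, b1, proto, vni_word = struct.unpack_from("!BBHI", pkt, 0)
    hdr = GeneveHeader(
        version=b0 >> 6,
        opt_len_words=b0 & 0x3F,
        oam=bool(b1 & 0x80),
        critical=bool(b1 & 0x40),
        protocol_type=proto,
        vni=vni_word >> 8,  # the low 8 bits of this word are reserved
    )
    if len(pkt) < hdr.header_len:
        raise ValueError("truncated Geneve options")
    return hdr


def inner_payload(pkt: bytes,
                  parses_options: bool) -> Optional[Tuple[int, int, bytes]]:
    """Return (protocol_type, vni, inner payload), or None if a receiving
    tunnel endpoint would drop the packet under this sketch's rules."""
    hdr = parse_geneve_header(pkt)
    if hdr.version != 0:
        return None  # unknown version
    if hdr.critical and not parses_options:
        return None  # critical options present, but this endpoint cannot parse them
    # Options are skipped using Opt Len alone; an endpoint that parses
    # options would process pkt[GENEVE_BASE_LEN:hdr.header_len] separately.
    return hdr.protocol_type, hdr.vni, pkt[hdr.header_len:]
```

Using Opt Len alone, the endpoint locates the inner payload without examining individual options, which is the property attributed to the length field in Section A.2.2. For an Ethernet inner payload, the Protocol Type would be 0x6558 (Transparent Ethernet Bridging).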
The VXLAN-GPE header also includes the 'P' bit, which explicitly indicates whether the Next Protocol field is present in the current header.

A.3.3. Other Header Fields

The OAM bit, which is defined in Geneve and in VXLAN-GPE, indicates whether the current packet is an OAM packet. The GUE header includes a similar field but uses different terminology; the GUE 'C-bit' specifies whether the current packet is a control packet. Note that the GUE control bit can potentially be used in a large set of protocols that are not OAM protocols. However, the control packet examples discussed in [I-D.ietf-intarea-gue] are OAM-related.

Each of the three NVO3 encapsulation headers includes a 2-bit Version field, which is currently defined to be zero.

The Geneve and VXLAN-GPE headers include reserved fields: 14 bits in the Geneve header and 27 bits in the VXLAN-GPE header are reserved.

A.4. Comparison Summary

The following table summarizes the comparison between the three NVO3 encapsulations. In some cases, a plus sign ("+") or a minus sign ("-") is used to indicate that the header is stronger or weaker, respectively, in an area.

+----------------+----------------+----------------+----------------+
|                | Geneve         | GUE            | VXLAN-GPE      |
+----------------+----------------+----------------+----------------+
| Outer transport| UDP/IP         | UDP/IP         | UDP/IP         |
| UDP Port Number| 6081           | 6080           | 4790           |
+----------------+----------------+----------------+----------------+
| Base header    | 8 octets       | 4 octets       | 8 octets       |
| length         |                |                | (16 octets     |
|                |                |                | using NSH)     |
+----------------+----------------+----------------+----------------+
| Extensibility  | Variable length|Extension fields| No native      |
|                | options        |                | extensibility. |
|                |                |                | Might use NSH. |
+----------------+----------------+----------------+----------------+
| Extension      | TLV-based      | Flag-based     | TLV-based      |
| parsing method |                |                | (using NSH     |
|                |                |                | with MD Type 2)|
+----------------+----------------+----------------+----------------+
| Extension      | Variable       | Fixed          | Variable       |
| order          |                |                | (using NSH)    |
+----------------+----------------+----------------+----------------+
| Length field   | +              | +              | -              |
+----------------+----------------+----------------+----------------+
| Max Header     | 260 octets     | 128 octets     | 8 octets       |
| Length         |                |                | (264 using NSH)|
+----------------+----------------+----------------+----------------+
| Critical       | +              | -              | -              |
| extension bit  |                |                |                |
+----------------+----------------+----------------+----------------+
| VNI field size | 24 bits        | 32 bits        | 24 bits        |
|                |                | (extension)    |                |
+----------------+----------------+----------------+----------------+
| Next protocol  | 16 bits        | 8 bits         | 8 bits         |
| field          | Ethertype      | Internet       | New registry   |
|                | registry       | protocol       |                |
|                |                | registry       |                |
+----------------+----------------+----------------+----------------+
| Next protocol  | -              | -              | +              |
| indicator      |                |                |                |
+----------------+----------------+----------------+----------------+
| OAM / control  | OAM bit        | Control bit    | OAM bit        |
| field          |                |                |                |
+----------------+----------------+----------------+----------------+
| Version field  | 2 bits         | 2 bits         | 2 bits         |
+----------------+----------------+----------------+----------------+
| Reserved bits  | 14 bits        | none           | 27 bits        |
+----------------+----------------+----------------+----------------+

Figure 4: NVO3 Encapsulations Comparison

Contributors

The following co-authors have contributed to this document:

Ilango Ganga
Intel
Email: ilango.s.ganga@intel.com

Pankaj Garg
Microsoft
Email: pankajg@microsoft.com

Rajeev Manur
Broadcom
Email: rajeev.manur@broadcom.com

Tal Mizrahi
Marvell
Email: talmi@marvell.com

David Mozes
Email: mosesster@gmail.com

Erik Nordmark
ZEDEDA
Email: nordmark@sonic.net

Michael Smith
Cisco
Email: michsmit@cisco.com

Sam Aldrin
Google
Email: aldrin.ietf@gmail.com

Ignas Bagdonas
Equinix
Email: ibagdona.ietf@gmail.com

Authors' Addresses

Sami Boutros
Ciena
USA

Email: sboutros@ciena.com

Donald E. Eastlake
Futurewei Technologies
2386 Panoramic Circle
Apopka, FL 32703
USA

Tel: +1-508-333-2270
Email: d3e3e3@gmail.com