INTERNET-DRAFT                                     Sami Boutros(Ed.)
Intended Status: Informational                                VMware

                         Ignas Bagdonas                   Sam Aldrin
                         Equinix                              Google

                         Matthew Bocci                     Uri Elzur
                         Nokia                          Ilango Ganga
                                                               Intel

                         Pankaj Garg                    Rajeev Manur
                         Microsoft                          Broadcom

                         Tal Mizrahi                     David Mozes
                         Marvell                            Mellanox

                         Erik Nordmark                 Michael Smith
                         Arista Networks                       Cisco

Expires: September 13, 2017                           March 12, 2017

                  NVO3 Encapsulation Considerations
                        draft-dt-nvo3-encap-01

Abstract

   As communicated by the WG Chairs, the IETF NVO3 chairs and Routing
   Area director have chartered a design team to take forward the
   encapsulation discussion and see if there is potential to design a
   common encapsulation that addresses the various technical concerns.

   There are implications of different encapsulations in real
   environments consisting of both software and hardware
   implementations and spanning multiple data centers. For example,
   OAM functions such as path MTU discovery become challenging with
   multiple encapsulations along the data path.

   The design team recommends Geneve, with a few modifications, as the
   common encapsulation. More details are given in Section 7.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Problem Statement
   2. Design Team Goals
   3. Terminology
   4. Abbreviations
   5. Issues with current Encapsulations
     5.1 Geneve
     5.2 GUE
     5.3 VXLAN-GPE
   6. Common Encapsulation Considerations
     6.1 Current Encapsulations
     6.2 Useful Extensions Use cases
       6.2.1. Telemetry extensions
       6.2.2. Security/Integrity extensions
       6.2.3. Group Based Policy
     6.3 Hardware Considerations
     6.4 Extension Size
     6.5 Extension Ordering
     6.6 TLV vs Bit Fields
     6.7 Control Plane Considerations
     6.8 Split NVE
     6.9 Larger VNI Considerations
   7. Design team recommendations
   8. Acknowledgements
   9. Security Considerations
   10. References
     10.1 Normative References
     10.2 Informative References
   11. Appendix A
     11.1. Overview
     11.2. Extensibility
       11.2.1. Native Extensibility Support
       11.2.2. Extension Parsing
       11.2.3. Critical Extensions
       11.2.4. Maximal Header Length
     11.3. Encapsulation Header
       11.3.1. Virtual Network Identifier (VNI)
       11.3.2. Next Protocol
       11.3.3. Other Header Fields
     11.4. Comparison Summary
   Authors' Addresses (In alphabetical order)

1. Problem Statement

   As communicated by the WG Chairs, the NVO3 WG charter states that
   it may produce requirements for network virtualization data planes
   based on encapsulation of virtual network traffic over an IP-based
   underlay data plane. Such requirements should consider OAM and
   security. Based on these requirements the WG will select, extend,
   and/or develop one or more data plane encapsulation format(s).

   This has led to drafts describing three encapsulations being
   adopted by the working group:

   - draft-ietf-nvo3-geneve-03

   - draft-ietf-nvo3-gue-04

   - draft-ietf-nvo3-vxlan-gpe-02

   Discussion on the list and in face-to-face meetings has identified
   a number of technical problems with each of these encapsulations.
   Furthermore, there was clear consensus at the IETF meeting in
   Berlin that it is undesirable for the working group to progress
   more than one data plane encapsulation. Although consensus could
   not be reached on the list, the overall consensus was for a single
   encapsulation (RFC2418, Section 3.3). Nonetheless there has been
   resistance to converging on a single encapsulation format.

2. Design Team Goals

   As communicated by the WG Chairs, the design team should take one
   of the proposed encapsulations and enhance it to address the
   technical concerns. Backwards compatibility with the chosen
   encapsulation, simple evolution of deployed networks, and
   applicability to all locations in the NVO3 architecture are goals.
   The DT should specifically avoid a design that is burdensome on
   hardware implementations, but should allow future extensibility.
   The chosen design should also operate well with ICMP and in ECMP
   environments. If further extensibility is required, then it should
   be done in such a manner that it does not require the consent of an
   entity outside of the IETF.

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [KEYWORDS].

4. Abbreviations

   NVO3   Network Virtualization Overlays over Layer 3

   OAM    Operations, Administration, and Maintenance

   TLV    Type, Length, and Value

   VNI    Virtual Network Identifier

   NVE    Network Virtualization Edge

   NVA    Network Virtualization Authority

   NIC    Network interface card

   Transit device   Underlay network devices between NVE(s).

5. Issues with current Encapsulations

   As summarized by the WG Chairs.

5.1 Geneve

   - Can't be implemented cost-effectively in all use cases, because
   the variable-length header and the order of the TLVs make it costly
   (in terms of number of gates) to implement in hardware.

   - Fork-lift upgrade from widely deployed VXLAN (no backwards
   compatibility mechanisms).

   - Header doesn't fit into the largest commonly available parse
   buffer (256 bytes in a NIC). Doubling the buffer size cannot be
   justified unless it is mandatory for hardware to process additional
   option fields.

5.2 GUE

   - There were a significant number of objections related to the
   complexity of implementation in hardware, similar to those noted
   for Geneve above.

   - In addition, there were concerns raised that GUE does not support
   a sufficient number of extensions due to its reliance on a limited
   flags field, which is already almost 45% allocated.

5.3 VXLAN-GPE

   - GPE is not day-1 backwards compatible with VXLAN. Although the
   frame format is similar, it uses a different UDP port, so it would
   require changes to existing implementations even if the rest of the
   GPE frame is the same.

   - GPE is insufficiently extensible. Numerous extensions and options
   have been designed for GUE and Geneve. Note that these have not yet
   been validated by the WG.

   - Security, e.g., of the VNI, has not been addressed by GPE.
   Although a shim header could be used for security and other
   extensions, this has not been defined yet and its implications for
   offloading in NICs are not understood.

6. Common Encapsulation Considerations

6.1 Current Encapsulations

   Appendix A includes a detailed comparison between the three
   proposed encapsulations. The comparison indicates several common
   properties, but also three major differences among the
   encapsulations:

   - Extensibility: Geneve and GUE were defined with built-in
   extensibility, while VXLAN-GPE is not inherently extensible. Note
   that any of the three encapsulations can be extended using the
   Network Service Header (NSH).

   - Extension method: Geneve is extensible using Type/Length/Value
   (TLV) fields, while GUE uses a small set of possible extensions,
   and a set of flags that indicate which extensions are present.

   - Length field: Geneve and GUE include a Length field, indicating
   the length of the encapsulation header, while VXLAN-GPE does not
   include such a field.

6.2 Useful Extensions Use cases

   Non-vendor-specific TLVs MUST follow the standardization process.
   The following use cases for extensions show that there is a strong
   requirement to support variable-length extensions, possibly with
   different subtypes.

6.2.1. Telemetry extensions

   In several scenarios it is beneficial to make information about the
   path a packet took through the network or through a network device,
   as well as associated telemetry information, available to the
   operator.

   This covers not only tasks like debugging, troubleshooting, network
   planning, and network optimization, but also policy or service
   level agreement compliance checks.

   Packet scheduling algorithms, especially for balancing traffic
   across equal cost paths or links, often leverage information
   contained within the packet, such as protocol number, IP address,
   or MAC address. Probe packets would thus either need to be sent
   from the exact same endpoints with the exact same parameters, or
   probe packets would need to be artificially constructed as "fake"
   packets and inserted along the path. Both approaches are often not
   feasible from an operational perspective, either because access to
   the end-system is not feasible, or because the diversity of
   parameters and associated probe packets to be created is simply too
   large. An in-band telemetry mechanism in extensions is an
   alternative in those cases.

6.2.2. Security/Integrity extensions

   Since the currently proposed NVO3 encapsulations do not protect
   their headers, a single bit corruption in the VNI field could
   deliver a packet to the wrong tenant. Extensions are needed to use
   any sophisticated security.

   The possibility of VNI spoofing with an NVO3 protocol is
   exacerbated by the use of UDP. Systems typically have no
   restrictions on applications being able to send to any UDP port, so
   an unprivileged application can trivially spoof, for instance,
   VXLAN packets, including using arbitrary VNIs.

   One can envision HMAC-like support in some NVO3 extension to
   authenticate the header and the outer IP addresses, thereby
   preventing attackers from injecting packets with spoofed VNIs.

   Another aspect of security is payload security. Essentially this
   means making packets that look like IP|UDP|NVO3 Encap|DTLS
   Extension|payload. This is attractive because the UDP header is
   still available for ECMP, the NVO3 header is in plain text so it
   can be read by network elements, and different security or other
   payload transforms can be supported on a single UDP port (a
   separate UDP port for DTLS is not needed).

6.2.3. Group Based Policy

   Another use case would be to carry the Group Based Policy (GBP)
   source group information within an NVO3 header extension, in a
   similar manner as has been implemented for VXLAN [VXLAN-GBP]. This
   allows various forms of policy, such as access control and QoS, to
   be applied between abstract groups rather than coupled to specific
   endpoint addresses.

6.3 Hardware Considerations

   Hardware restrictions should be taken into consideration along with
   future hardware enhancements that may provide more flexible
   metadata processing. However, the set of options that need to and
   will be implemented in hardware will be a subset of what is
   implemented in software, since software NVEs are likely to grow
   features, and hence option support, at a more rapid rate.

   We note that it is hard to predict which options will be
   implemented in which piece of hardware and when. That depends on
   whether the hardware will be in the form of a NIC providing
   increasing offload capabilities to software NVEs, or a switch chip
   being used as an NVE gateway towards non-NVO3 parts of the network,
   or even a transit device that participates in the NVO3 dataplane,
   e.g., for OAM purposes.

   As a result, it doesn't look useful to prescribe an order of the
   options so that the ones that are likely to be implemented in
   hardware come first; we can't decide such an order when we define
   the options. However, a control plane can enforce such an order for
   some hardware implementations.

   We do know that hardware needs to initially be able to efficiently
   skip over the NVO3 header to find the inner payload. That is needed
   both for NICs doing, e.g., TCP offload and for transit devices and
   NVEs applying policy/ACLs to the inner payload.

6.4 Extension Size

   Extension header length has a significant impact on hardware and
   software implementations.
   A total header length that is too small will unnecessarily
   constrain software flexibility. A total header length that is too
   large will place a nontrivial cost on hardware implementations.
   Thus, the design team recommends that there be a minimum and
   maximum total extension header length selected. The maximum total
   header length is determined by the bits allocated for the total
   extension header length field. The risk with this approach is that
   it may be difficult to extend the total header size in the future.
   The minimum total header length is determined by a requirement in
   the specifications that all implementations must meet. The risk
   with this approach is that all implementations will only implement
   the minimum total header length, which would then become the de
   facto maximum total header length. The recommended minimum total
   header length is 64 bytes.

   The size of a single extension should always be 4-byte aligned.

   The maximum length of a single option should be large enough to
   meet the different extension use case requirements, e.g., in-band
   telemetry and future use.

6.5 Extension Ordering

   To support hardware nodes at the tunnel endpoint or in transit that
   can process only one or a few extension TLVs in TCAM, a control
   plane in such a deployment can signal a capability to ensure that a
   specific TLV always appears in a specific position, for example as
   the first one in the packet.

   The order of the TLVs should be HW friendly for both the sender and
   the receiver, and possibly the transit node too.

   A transit node may need to process some extensions, such as
   telemetry and/or in-band OAM extensions.

6.6 TLV vs Bit Fields

   If there is a well-known initial set of options that are likely to
   be implemented in software and in hardware, it can be efficient to
   use the bit-field approach as in GUE.
   However, as described in Section 6.3, if options are added over
   time and different subsets of options are likely to be implemented
   in different pieces of hardware, then it would be hard for the IETF
   to specify which options should get the early bit fields. TLVs are
   a lot more flexible, which avoids the need to determine the
   relative importance of different options. However, general TLVs of
   arbitrary order and size, with possible repetition of the same
   option, are difficult to implement in hardware. A middle ground is
   to use TLVs with restrictions on size and alignment, observing that
   individual TLVs can have a fixed length, together with support in
   the control plane such that an NVE will only receive the options
   that it needs and implements. The control plane approach can
   potentially be used to control the order of the TLVs sent to a
   particular NVE. Note that transit devices are not likely to
   participate in the control plane; hence, to the extent that they
   need to participate in option processing, they need more effort.
   However, transit devices would also have issues with future GUE
   bits being defined for future options.

   A benefit of TLVs from a HW perspective is that they are
   self-describing, i.e., all the information is in the TLV. In a
   bit-field approach the hardware needs to look up the bit in some
   separate table to determine the length of the data associated with
   the bit, which would add hardware complexity.

   There are use cases where multiple software modules are running on
   an NVE. These can be modules such as a diagnostic module from one
   vendor that does packet sampling and another module from a
   different vendor that implements a firewall. Using a TLV format, it
   is easier to have different software modules process different
   TLVs, which could be standard extensions or vendor-specific
   extensions defined by the different vendors, without conflicting
   with each other.
   This can help with hardware modularity as well.

6.7 Control Plane Considerations

   Given that we want to allow large flexibility and extensibility
   for, e.g., software NVEs, yet be able to support key extensions in
   less flexible NVEs, e.g., hardware NVEs, it is useful to consider
   the control plane. By control plane in this context we mean both
   protocols such as EVPN and others, and also deployment-specific
   configuration.

   If each NVE can express in the control plane that it only cares
   about particular extensions (which could be a single extension, or
   a few), and the source NVEs only include requested extensions in
   the NVO3 packets, then the target NVE can use a simpler parser
   (e.g., a TCAM might be usable to look for a single NVO3 extension)
   and the depth of the inner payload in the NVO3 packet will be
   minimized. Furthermore, if the target NVE cares about a few
   extensions and can express in the control plane the desired order
   of those extensions in the NVO3 packets, then it can provide useful
   functionality with minimal hardware requirements.

   Note that transit devices that are not aware of the NVO3 extensions
   benefit somewhat from such an approach, since the inner payload is
   less deep in the packet if no extraneous extensions are included in
   the packet. However, in general a transit device is not likely to
   participate in the NVO3 control plane. (Configuration mechanisms
   can nevertheless take into account limitations of the transit
   devices used in particular deployments.)

   Note that in this approach different NVEs could desire different
   (sets of) extensions, which means that the source NVE needs to be
   able to place different sets of extensions in different NVO3
   packets, and perhaps in a different order. It also assumes that
   underlay multicast or replication servers are not used together
   with NVO3 extensions.

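   The per-target selection described above can be illustrated with a
   short sketch. It is not part of any NVO3 specification: the
   Extension type, the select_extensions function, and the idea that
   each target advertises an ordered list of numeric extension types
   are all hypothetical names and assumptions made only for this
   example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Extension:
    """Hypothetical in-memory form of one NVO3 extension."""
    ext_type: int   # assumed numeric identifier of the extension
    value: bytes    # opaque extension data


def select_extensions(candidates: Iterable[Extension],
                      requested_order: Sequence[int]) -> List[Extension]:
    """Keep only the extensions the target NVE asked for, in its order.

    `requested_order` stands in for whatever the control plane (or
    static configuration) advertised for this target. Extensions the
    target did not request are left out, so a simple parser never sees
    them. If a type occurs more than once, the first one is kept.
    """
    by_type = {}
    for ext in candidates:
        by_type.setdefault(ext.ext_type, ext)
    return [by_type[t] for t in requested_order if t in by_type]


# Hypothetical usage: a hardware NVE that only handles type 3, and a
# software NVE that wants three extensions in a specific order.
available = [Extension(1, b"telemetry"),
             Extension(2, b"policy"),
             Extension(3, b"integrity")]
for_hw_nve = select_extensions(available, [3])
for_sw_nve = select_extensions(available, [2, 1, 3])
```

   Because the selection is made per target, the same source NVE ends
   up building different option lists for different destinations,
   which is the cost noted in the paragraph above.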
   There is a need to consider mandatory extensions versus optional
   extensions. Mandatory extensions require the receiver to drop the
   packet if the extension is unknown. A control plane mechanism can
   prevent the need for dropping packets with unknown extensions,
   since such extensions would not be sent to targets that do not
   support them.

   The control planes defined today need to add the ability to
   describe the different encapsulations. Thus perhaps EVPN, and any
   other control plane protocol that the IETF defines, should have a
   way to enumerate the supported NVO3 extensions and their order.

6.8 Split NVE

   If the working group sees a need for having the hosts send and
   receive options in a split NVE case, this is possible using any of
   the existing extensible encapsulations (Geneve, GUE, GPE+NSH) by
   defining a way to carry those over other transports. NSH can
   already be used over different transports.

   If we need to do this with other encapsulations, it can be done by
   defining an Ethertype for those encapsulations so that they can be
   carried over Ethernet and 802.1Q.

   If we need to carry other encapsulations over MPLS, it would
   require an EVPN control plane to signal that the other
   encapsulation header plus options will be present in front of the
   L2 packet. The VNI in the header can be ignored, and the MPLS label
   will be the one used to identify the EVPN L2 instance.

6.9 Larger VNI Considerations

   We discussed whether we should make the VNI 32 bits or larger. The
   benefit of a 24-bit VNI would be to avoid unnecessary changes to
   existing proposals and implementations, almost all of which, if not
   all, use a 24-bit VNI. If we need a larger VNI, an extension can be
   used to support that.

7. Design team recommendations

   We concluded that Geneve is most suitable as a starting point for a
   proposed standard for network virtualization, for the following
   reasons:

   1. We studied whether the VNI should be in the base header or in
   extensions, and whether it should be 24-bit or 32-bit. The design
   team agreed that the VNI is critical information for network
   virtualization and MUST be present in all packets. The design team
   also agreed that a 24-bit VNI matches the existing widely used
   encapsulation formats, i.e., VXLAN and NVGRE, and is hence more
   suitable to use going forward.

   2. Geneve has a total options length field that allows skipping
   over the options for NIC offload operations, and that will allow
   transit devices to view flow information in the inner payload.

   3. We considered the option of using NSH with VXLAN-GPE, but given
   that NSH is targeted at service chaining and contains service
   chaining information, it is less suitable for the network
   virtualization use case. The other downside of VXLAN-GPE was the
   lack of a header length field, which makes skipping over the
   headers to process the inner payload more difficult. A Total Option
   Length is present in Geneve. It is not possible to skip any options
   in the middle with VXLAN-GPE. In principle a split between a base
   header and a header with options is interesting (whether that
   options header is NSH or some new header without ties to a service
   path). We explored whether it would make sense to either use NSH
   for this, or define a new NVO3 options header. However, we observed
   that this makes it slightly harder to find the inner payload, since
   the length field is not in the NVO3 header itself. Thus one more
   field would have to be extracted to compute the start of the inner
   payload. Also, if the experience with IPv6 extension headers is any
   guide, there would be a risk that key pieces of hardware might not
   implement the options header, resulting in future calls to
   deprecate its use. Making the options part of the base NVO3 header
   has fewer of those issues.
   Even though the implementation of any particular option cannot be
   predicted ahead of time, the option mechanism and the ability to
   skip the options are likely to be broadly implemented.

   4. We compared the TLV and bit-field styles of extension. Parsing
   either TLVs or bit-fields was deemed expensive. While bit-fields
   may be simpler to parse, they are also more restrictive and require
   guessing which extensions will be widely implemented so that those
   can get early bit assignments for efficiency. In addition,
   bit-fields are not flexible enough to address the requirement for
   variable length and different subtypes of the same option. While
   TLVs are more flexible, a control plane can restrict the number of
   option TLVs as well as the order and size of the TLVs to make them
   simpler for a dataplane implementation to handle.

   5. We briefly discussed the multi-vendor NVE case, and the need to
   allow vendors to put their own extensions in the NVE header. This
   is possible with TLVs.

   6. We also agreed that the C bit in Geneve is helpful because it
   allows a receiver NVE to easily decide whether or not to process
   options. For example, an optional extension such as a UUID-based
   packet trace can be ignored by a receiver NVE, which makes it easy
   for the NVE to skip over the options. Thus the C bit remains as
   defined in Geneve.

   7. There are already some extensions of varying sizes being
   discussed (see Section 6.2). By using Geneve options it is possible
   to collect in-band parameters such as switch ID, ingress port,
   egress port, internal delay, and queue from switches in a
   telemetry extension TLV. It is also possible to add security
   extension TLVs such as HMAC and DTLS, to authenticate the Geneve
   packet header and secure the Geneve packet payload, in software or
   hardware tunnel endpoints. A Group Based Policy extension TLV can
   be carried as well.

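   The value of the total options length (item 2) and of the C bit
   (item 6) can be seen in a minimal parsing sketch. The bit positions
   used here follow the base header and option layout of the Geneve
   draft and are not spelled out in this document, so they should be
   read as an assumption of the example; the function names are
   hypothetical.

```python
import struct

GENEVE_BASE_LEN = 8  # octets in the fixed Geneve header


def parse_geneve_base(packet: bytes) -> dict:
    """Decode the fixed header and locate the inner payload.

    Assumed layout: Ver(2) OptLen(6) | O(1) C(1) Rsvd(6) |
    Protocol Type(16) | VNI(24) Reserved(8), with OptLen counted in
    4-octet units.
    """
    if len(packet) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    first, flags, proto, vni_rsvd = struct.unpack("!BBHI", packet[:8])
    opt_len = (first & 0x3F) * 4
    header = {
        "version": first >> 6,
        "oam": bool(flags & 0x80),
        "critical": bool(flags & 0x40),   # the C bit
        "protocol": proto,
        "vni": vni_rsvd >> 8,
        # One addition finds the payload; no option is inspected.
        "payload_offset": GENEVE_BASE_LEN + opt_len,
    }
    if len(packet) < header["payload_offset"]:
        raise ValueError("options extend past end of packet")
    return header


def iter_options(packet: bytes, header: dict):
    """Yield (class, type, data) for each option TLV.

    Assumed option layout: Class(16) Type(8) Rsvd(3) Length(5), with
    Length counting the option data in 4-octet units.
    """
    offset, end = GENEVE_BASE_LEN, header["payload_offset"]
    while offset < end:
        if end - offset < 4:
            raise ValueError("truncated option header")
        opt_class, opt_type, rlen = struct.unpack(
            "!HBB", packet[offset:offset + 4])
        data_end = offset + 4 + (rlen & 0x1F) * 4
        if data_end > end:
            raise ValueError("option overruns options length")
        yield opt_class, opt_type, packet[offset + 4:data_end]
        offset = data_end
```

   A receiver that does not implement any options, and sees the C bit
   clear, only needs parse_geneve_base; iter_options is needed only
   when options are actually processed. With a 6-bit length in 4-octet
   units, the largest header under these assumptions is
   8 + 63 * 4 = 260 octets, which agrees with the figure given in
   Section 11.2.4.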
   There seems to be interest in standardizing some well-known
   security option TLVs that protect the header and payload, to
   guarantee encapsulation header integrity and tenant data privacy.
   The design team recommends that the working group consider
   standardizing such option(s).

   We recommend the following enhancements to Geneve to make it more
   suitable for hardware while still providing flexibility for
   software:

   We would propose text such as the following: while TLVs are more
   flexible, a control plane can restrict the number of option TLVs as
   well as the order and size of the TLVs, to make them simpler for a
   data plane implementation in software or hardware to handle. For
   example, there may be some critical information, such as a secure
   hash, that must be processed in a certain order at the lowest
   latency.

   A control plane can negotiate a subset of option TLVs and a certain
   TLV ordering, and can also limit the total number of option TLVs
   present in the packet, for example to accommodate hardware capable
   of processing only a few options. Hence, the control planes need to
   have the ability to describe the supported subset of TLVs and their
   order.

   The Geneve draft could specify that the subset and order of option
   TLVs should be configurable for each remote NVE in the absence of a
   protocol control plane.

8. Acknowledgements

   Tom Herbert provided the motivation for the Security/Integrity
   extension.

9. Security Considerations

   This document does not introduce any additional security
   constraints.

10. References

10.1 Normative References

   [KEYWORDS]   Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2 Informative References

   [Geneve]     Generic Network Virtualization Encapsulation
                [I-D.ietf-nvo3-geneve]

   [GUE]        Generic UDP Encapsulation [I-D.ietf-nvo3-gue]

   [NSH]        Network Service Header [I-D.ietf-sfc-nsh]

   [VXLAN-GPE]  Virtual eXtensible Local Area Network - Generic
                Protocol Extension [I-D.ietf-nvo3-vxlan-gpe]

   [VXLAN-GBP]  VXLAN Group Policy Option
                [I-D.draft-smith-vxlan-group-policy-03]

11. Appendix A

11.1. Overview

   This section presents a comparison of the three NVO3 encapsulation
   proposals: Geneve, GUE, and VXLAN-GPE. The three encapsulations use
   an outer UDP/IP transport. Geneve and VXLAN-GPE use an 8-octet
   header, while GUE uses a 4-octet header. In addition to the base
   header, optional extensions may be included in the encapsulation,
   as discussed in Section 11.2 below.

11.2. Extensibility

11.2.1. Native Extensibility Support

   The Geneve and GUE encapsulations both enable optional headers to
   be incorporated at the end of the base encapsulation header.

   VXLAN-GPE does not provide native support for header extensions.
   However, as discussed in [I-D.ietf-nvo3-vxlan-gpe], extensibility
   can be attained to some extent if the Network Service Header (NSH)
   [I-D.ietf-sfc-nsh] is used immediately following the VXLAN-GPE
   header. NSH supports either a fixed-size extension (MD Type 1), or
   a variable-size TLV-based extension (MD Type 2). It should be noted
   that NSH-over-VXLAN-GPE implies the additional overhead of the
   8-octet NSH header, in addition to the VXLAN-GPE header.

11.2.2. Extension Parsing

   The Geneve Variable Length Options are defined as
   Type/Length/Value (TLV) extensions. Similarly, VXLAN-GPE, when
   using NSH, can include NSH TLV-based extensions.
   In contrast, GUE defines a small set of possible extension fields
   (proposed in [I-D.herbert-gue-extensions]), and a set of flags in
   the GUE header that indicate for each extension type whether it is
   present or not.

   TLV-based extensions, as defined in Geneve, provide the flexibility
   for a large number of possible extension types. Similar behavior can
   be supported in NSH-over-VXLAN-GPE when using MD Type 2. The
   flag-based approach taken in GUE strives to simplify implementations
   by defining a small number of possible extensions, used in a fixed
   order.

   The Geneve and GUE headers both include a length field, defining the
   total length of the encapsulation, including the optional
   extensions.

   The length field simplifies parsing for transit devices that skip
   the encapsulation header without parsing its extensions.

11.2.3. Critical Extensions

   The Geneve encapsulation header includes the 'C' field, which
   indicates whether the current Geneve header includes critical
   options, which must be parsed by the tunnel endpoint. If the
   endpoint is not able to process a critical option, the packet is
   discarded.

11.2.4. Maximal Header Length

   The maximal header length in Geneve, including options, is 260
   octets. GUE defines the maximal header to be 128 octets. VXLAN-GPE
   uses a fixed-length header of 8 octets, unless NSH-over-VXLAN-GPE is
   used, yielding an encapsulation header of up to 264 octets.

11.3. Encapsulation Header

11.3.1. Virtual Network Identifier (VNI)

   The Geneve and VXLAN-GPE headers both include a 24-bit VNI field.
   GUE, on the other hand, enables the use of a 32-bit field called
   VNID; this field is not included in the GUE header, but was defined
   as an optional extension in [I-D.herbert-gue-extensions].

   The VXLAN-GPE header includes the 'I' bit, indicating that the VNI
   field is valid in the current header.
   A similar indicator is defined as a flag in the GUE header
   [I-D.herbert-gue-extensions].

11.3.2. Next Protocol

   The three encapsulation headers include a field that specifies the
   type of the next protocol header, which resides after the NVO3
   encapsulation header. The Geneve header includes a 16-bit field that
   uses the IEEE Ethertype convention. GUE uses an 8-bit field, which
   uses the IANA Internet protocol numbering. The VXLAN-GPE header
   incorporates an 8-bit Next Protocol field, using a VXLAN-GPE-specific
   registry, defined in [I-D.ietf-nvo3-vxlan-gpe].

   The VXLAN-GPE header also includes the 'P' bit, which explicitly
   indicates whether the Next Protocol field is present in the current
   header.

11.3.3. Other Header Fields

   The OAM bit, which is defined in Geneve and in VXLAN-GPE, indicates
   whether the current packet is an OAM packet. The GUE header includes
   a similar field, but uses different terminology; the GUE 'C-bit'
   specifies whether the current packet is a control packet. Note that
   the GUE control bit can potentially be used in a large set of
   protocols that are not OAM protocols. However, the control packet
   examples discussed in [I-D.ietf-nvo3-gue] are OAM-related.

   Each of the three NVO3 encapsulation headers includes a 2-bit Version
   field, which is currently defined to be zero.

   The Geneve and VXLAN-GPE headers include reserved fields; 14 bits in
   the Geneve header, and 27 bits in the VXLAN-GPE header are reserved.

11.4. Comparison Summary

   The following table summarizes the comparison between the three NVO3
   encapsulations.

   +----------------+----------------+----------------+----------------+
   |                |     Geneve     |      GUE       |   VXLAN-GPE    |
   +----------------+----------------+----------------+----------------+
   | Outer transport|     UDP/IP     |     UDP/IP     |     UDP/IP     |
   +----------------+----------------+----------------+----------------+
   | Base header    |    8 octets    |    4 octets    |    8 octets    |
   | length         |                |                |   (16 octets   |
   |                |                |                |   using NSH)   |
   +----------------+----------------+----------------+----------------+
   | Extensibility  |Variable length |Extension fields| No native ext- |
   |                |    options     |                |   ensibility.  |
   |                |                |                |   Extensible   |
   |                |                |                |   using NSH.   |
   +----------------+----------------+----------------+----------------+
   | Extension      |   TLV-based    |   Flag-based   |   TLV-based    |
   | parsing method |                |                |(using NSH with |
   |                |                |                |   MD Type 2)   |
   +----------------+----------------+----------------+----------------+
   | Extension      |    Variable    |     Fixed      |    Variable    |
   | order          |                |                |  (using NSH)   |
   +----------------+----------------+----------------+----------------+
   | Length field   |       +        |       +        |       -        |
   +----------------+----------------+----------------+----------------+
   | Max Header     |   260 octets   |   128 octets   |    8 octets    |
   | Length         |                |                |(264 using NSH) |
   +----------------+----------------+----------------+----------------+
   | Critical exte- |       +        |       -        |       -        |
   | nsion bit      |                |                |                |
   +----------------+----------------+----------------+----------------+
   | VNI field size |    24 bits     |    32 bits     |    24 bits     |
   |                |                |  (extension)   |                |
   +----------------+----------------+----------------+----------------+
   | Next protocol  |    16 bits     |     8 bits     |     8 bits     |
   | field          |   Ethertype    | Internet prot- |  New registry  |
   |                |    registry    | ocol registry  |                |
   +----------------+----------------+----------------+----------------+
   | Next protocol  |       -        |       -        |       +        |
   | indicator      |                |                |                |
   +----------------+----------------+----------------+----------------+
   | OAM / control  |    OAM bit     |  Control bit   |    OAM bit     |
   | field          |                |                |                |
   +----------------+----------------+----------------+----------------+
   | Version field  |     2 bits     |     2 bits     |     2 bits     |
   +----------------+----------------+----------------+----------------+
   | Reserved bits  |    14 bits     |       -        |    27 bits     |
   +----------------+----------------+----------------+----------------+

               Figure 1: NVO3 Encapsulation Comparison

Authors' Addresses (In alphabetical order)

   Sam Aldrin
   Google
   Email: aldrin.ietf@gmail.com

   Ignas Bagdonas
   Equinix
   Email: ibagdona.ietf@gmail.com

   Matthew Bocci
   Nokia
   Email: matthew.bocci@nokia.com

   Sami Boutros
   VMware
   Email: sboutros@vmware.com

   Uri Elzur
   Intel
   Email: uri.elzur@intel.com

   Ilango Ganga
   Intel
   Email: ilango.s.ganga@intel.com

   Pankaj Garg
   Microsoft
   Email: pankajg@microsoft.com

   Rajeev Manur
   Broadcom
   Email: rajeev.manur@broadcom.com

   Tal Mizrahi
   Marvell
   Email: talmi@marvell.com

   David Mozes
   Mellanox
   Email: davidm@mellanox.com

   Erik Nordmark
   Arista Networks
   Email: nordmark@sonic.net

   Michael Smith
   Cisco
   Email: michsmit@cisco.com