idnits 2.17.1

draft-zhou-li-vxlan-soe-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 13, 2014) is 3695 days in the past.  Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC5226' is mentioned on line 260, but not defined

  ** Obsolete undefined reference: RFC 5226 (Obsoleted by RFC 8126)

  == Unused Reference: 'I-D.mahalingam-dutt-dcops-vxlan' is defined on line
     271, but no explicit reference was found in the text

  == Unused Reference: 'I-D.davie-stt' is defined on line 278, but no
     explicit reference was found in the text

  == Outdated reference: A later version (-09) exists of
     draft-mahalingam-dutt-dcops-vxlan-08

  == Outdated reference: A later version (-08) exists of draft-davie-stt-05

  == Outdated reference: A later version (-04) exists of
     draft-quinn-vxlan-gpe-02

     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                            H. Zhou
3    Internet-Draft                                                     C. Li
4    Intended Status: Experimental                                  eBay Inc.
5    Expires: September 14, 2014                               March 13, 2014

7                 Segmentation Offloading Extension for VxLAN
8                         draft-zhou-li-vxlan-soe-00

10   Abstract

12      Segmentation offloading is nowadays common in network stack
13      implementations and well supported by para-virtualized network device
14      drivers for virtual machines (VMs). This draft describes an extension
15      to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
16      can be decoupled from physical/underlay networks and offloaded
17      further to the remote end-point, thus improving data-plane performance
18      for VMs running on top of overlay networks.

20   Status of this Memo

22      This Internet-Draft is submitted to IETF in full conformance with the
23      provisions of BCP 78 and BCP 79.

25      Internet-Drafts are working documents of the Internet Engineering
26      Task Force (IETF), its areas, and its working groups. Note that
27      other groups may also distribute working documents as
28      Internet-Drafts.

30      Internet-Drafts are draft documents valid for a maximum of six months
31      and may be updated, replaced, or obsoleted by other documents at any
32      time. It is inappropriate to use Internet-Drafts as reference
33      material or to cite them other than as "work in progress."

35      The list of current Internet-Drafts can be accessed at
36      http://www.ietf.org/1id-abstracts.html

38      The list of Internet-Draft Shadow Directories can be accessed at
39      http://www.ietf.org/shadow.html

41   Copyright and License Notice

43      Copyright (c) 2014 IETF Trust and the persons identified as the
44      document authors. All rights reserved.

46      This document is subject to BCP 78 and the IETF Trust's Legal
47      Provisions Relating to IETF Documents
48      (http://trustee.ietf.org/license-info) in effect on the date of
49      publication of this document. Please review these documents
50      carefully, as they describe your rights and restrictions with respect
51      to this document. Code Components extracted from this document must
52      include Simplified BSD License text as described in Section 4.e of
53      the Trust Legal Provisions and are provided without warranty as
54      described in the Simplified BSD License.

56   Table of Contents

58      1  Introduction
59        1.1  Requirements Notation
60        1.2  Definition of Terms
61      2. Approach
62        2.1  VXLAN Header Extension
63        2.2  TX VTEP
64        2.3  RX VTEP - Hypervisors
65        2.4  RX VTEP - Gateways
66      3  Interoperability
67      4  Security Considerations
68      5  IANA Considerations
69      6  References
70        6.1  Normative References
71        6.2  Informative References
72      Authors' Addresses

74   1  Introduction

76      Network virtualization over L3 transport has evolved along with server
77      virtualization in data centers, and data plane performance is one of
78      the keys to the success of this combination. One of the most critical
79      improvements in OS kernel TCP/IP stacks in recent years is
80      segmentation offloading, and hypervisor providers now support the same
81      mechanism in para-virtualized Ethernet drivers so that virtual
82      servers can benefit from it in the virtualized world by
83      offloading segmentation tasks to the lowest layer on hypervisors or
84      NICs (if TSO is supported by the NICs equipped in the hypervisor).

86      Essentially, an overlay network has an advantage over
87      physical underlay networks in that it does not have a hard MTU
88      limitation. Therefore, segmentation offloading can be pushed to the
89      remote end-point of the transport tunnel, where segmentation can be
90      completely omitted if this remote end-point is on a hypervisor.
91      However, this advantage is not utilized when the transport of the
92      overlay is based on the Virtual eXtensible Local Area Network
93      [I-D.mahalingam-dutt-dcops-vxlan], which provides a transport mechanism
94      for logically isolated L2 overlay networks between hypervisors.
95      Lacking segmentation information in the VXLAN header, hypervisor
96      implementations have to make the pessimistic decision to always segment
97      packets to the size specified by VMs before delivering them to the
98      hypervisor's IP stack, because they do not know whether the remote
99      end-point is bridged to a physical network with hard MTU limitations.
100     It is worth noting that the segmentation here is not the IP
101     fragmentation in terms of the physical network MTU, which may still
102     follow if the segment size resulting from the process above plus the
103     tunnel outer header is bigger than the physical network MTU.

105     To fulfill the potential of segmentation offloading on overlays, this
106     draft introduces segmentation metadata in the VXLAN header. With the
107     capability of carrying segmentation metadata in packets, hypervisors
108     can offload the segmentation decision further to the remote tunnel
109     end-point, thus decoupling the segmentation for the overlay from physical
110     limitations of the underlay, providing higher flexibility to hypervisor
111     implementations to achieve significant performance gains in a major
112     part of VXLAN deployment scenarios.
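
The distinction drawn above between overlay segmentation and IP fragmentation can be illustrated with a little arithmetic. The following Python sketch is not part of the draft. It assumes an IPv4 underlay without IP options, no VLAN tags, and inner IPv4 and TCP headers of 20 bytes each; all function and constant names are hypothetical.

```python
import math

# Assumed header sizes in bytes (IPv4 underlay, no IP options, no VLAN tags).
INNER_ETH = 14
VXLAN_HDR = 8
OUTER_UDP = 8
OUTER_IPV4 = 20


def outer_ip_size(inner_l3_len):
    """Size of the outer IP packet that carries one overlay packet."""
    return inner_l3_len + INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IPV4


def fragmentation_follows(inner_l3_len, physical_mtu):
    """True if the encapsulated packet exceeds the physical network MTU."""
    return outer_ip_size(inner_l3_len) > physical_mtu


def segments_needed(l4_payload_len, overlay_mtu, inner_hdrs=40):
    """Overlay segments produced when the TX hypervisor segments locally.

    inner_hdrs is the assumed size of the inner IPv4 + TCP headers.
    """
    return math.ceil(l4_payload_len / (overlay_mtu - inner_hdrs))


# A 64000-byte TCP payload sent by a VM whose overlay MTU is 1500.
print(segments_needed(64000, 1500))       # overlay segments without offloading
print(fragmentation_follows(1500, 1500))  # full-size segment, 1500-byte underlay
print(fragmentation_follows(1500, 9000))  # same segment, 9000-byte underlay
```

Under these assumptions, segmenting at the TX hypervisor turns one large send into many overlay packets, each of which may still be IP-fragmented on a 1500-byte underlay. Offloading the segmentation keeps it as a single overlay packet, leaving only IP fragmentation to the underlay.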

114     Although the achievable performance gain is affected by the
115     physical network MTU, there is inherently no mandatory requirement on
116     the physical layer:

118     1) When the physical network MTU is far bigger than the overlay MTU, the
119     offloading reduces the number of packets being transmitted by TX
120     hypervisors and received in RX hypervisors and RX VMs.

122     2) When the physical network MTU is close to the overlay MTU, the number of
123     packets being transmitted in the physical network (as a result of IP
124     fragmentation) may not be reduced significantly, but on the RX side, after
125     IP reassembly, the number of packets being delivered from the
126     hypervisor to the receiving VM is largely reduced, thus saving the
127     cost of hypervisor <-> VM interaction and of the protocol stack of the
128     receiving VM. Furthermore, a minor cost saving is that the number of bytes
129     transmitted over the physical network is slightly reduced because
130     only one copy of the headers (inner L2-L4 header, VXLAN header and outer
131     UDP header) is transmitted for a large overlay packet.

133     In addition, offloading support in NIC hardware is NOT a
134     requirement for the performance gains discussed above.

136  1.1  Requirements Notation

138     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
139     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
140     document are to be interpreted as described in RFC 2119 [RFC2119].

142  1.2  Definition of Terms

144     GSO: Generic Segmentation Offload.

146     TSO: TCP Segmentation Offload.

148     NIC: Network Interface Card.

150     VM: Virtual Machine.

152     TX: Sending side.

154     RX: Receiving side.

156     VTEP: Virtual Tunnel End Point.

158  2.  Approach

160  2.1  VXLAN Header Extension

162     The new VXLAN Segmentation Offloading Extension (VXLAN-soe) header is
163     defined as:

165      0                   1                   2                   3
166      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
167     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
168     |S|C|R|R|I|R|R|R|   Reserved    |          Overlay MTU          |
169     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
170     |                VXLAN Network Identifier (VNI) |   Reserved    |
171     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

173     The changes to VXLAN are:

175     S Bit: Flag bit 0 is defined as the S (Segmentation Offloading
176     Extension) bit.

178        S = 1 indicates that VXLAN-soe is applied to the encapsulated
179        overlay packet, and the C Bit and Overlay MTU field (see below)
180        are valid.

182        S = 0 indicates that VXLAN-soe is NOT applied, and the C Bit and
183        Overlay MTU field MUST be set to 0 in accordance with VXLAN.

185     C Bit: Flag bit 1 is defined as the C (Checksum) bit. This bit is
186     valid only if the S bit is set to 1.

188        C = 1 indicates that the checksum of the encapsulated packet is
189        required, and SHALL be re-calculated when the segmentation is
190        being performed.

192        C = 0 indicates that the checksum of the encapsulated packet is
193        NOT needed.

195     Overlay MTU: Bits 16 - 31 are defined as the MTU desired by the TX VM for
196     the segmentation being offloaded.

198        Its value indicates the max size of an overlay segment including
199        its L3 header, but NOT including the Ethernet header. This field is
200        valid only if the S bit is set.

202  2.2  TX VTEP

204     The VTEP at the TX side MUST set the S bit to 1 if the packet to be
205     encapsulated is NOT segmented and it decides to offload the
206     segmentation to the remote end-point. In such a case the C bit and
207     Overlay MTU field MUST be set accordingly. This is the typical use
208     case when the TX VTEP is a hypervisor transmitting TCP streams of
209     VMs with large sliding windows.
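
The header layout of Section 2.1 and the TX rule above can be sketched in Python as follows. This is an illustrative sketch rather than part of the draft. The function and constant names are hypothetical, and flag bit 0 is assumed to be the most significant bit of the first octet, following the bit numbering of the diagram.

```python
import struct

FLAG_S = 0x80  # flag bit 0: VXLAN-soe applied to the encapsulated packet
FLAG_C = 0x40  # flag bit 1: checksum must be re-calculated on segmentation
FLAG_I = 0x08  # flag bit 4: I bit, shown in the header diagram


def pack_vxlan_soe(vni, overlay_mtu=0, offload=False, checksum=False):
    """Build the 8-byte header; with offload=False it is a plain VXLAN header."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    flags = FLAG_I
    if offload:
        if not 0 < overlay_mtu <= 0xFFFF:
            raise ValueError("Overlay MTU must fit in 16 bits and be non-zero")
        flags |= FLAG_S
        if checksum:
            flags |= FLAG_C
    else:
        overlay_mtu = 0  # S = 0: C bit and Overlay MTU field stay zero
    # Word 1: flags (8) | reserved (8) | Overlay MTU (16)
    # Word 2: VNI (24) | reserved (8)
    return struct.pack("!BBHI", flags, 0, overlay_mtu, vni << 8)


def unpack_vxlan_soe(header):
    """Parse the first 8 bytes; C and Overlay MTU are ignored unless S is set."""
    flags, _reserved, mtu, word2 = struct.unpack("!BBHI", header[:8])
    s_bit = bool(flags & FLAG_S)
    return {
        "vni": word2 >> 8,
        "offload": s_bit,
        "checksum": s_bit and bool(flags & FLAG_C),
        "overlay_mtu": mtu if s_bit else None,
    }


hdr = pack_vxlan_soe(vni=5000, overlay_mtu=1500, offload=True, checksum=True)
print(hdr.hex())
print(unpack_vxlan_soe(hdr))
```

Treating the C bit and Overlay MTU as absent whenever S is 0 mirrors the validity rules in Section 2.1, so a parser written this way reads an unmodified VXLAN header the same way as before.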

211     The VTEP at the TX side MUST clear the S bit if the packet to be
212     encapsulated is segmented already or does NOT need to be segmented
213     in terms of the overlay MTU. In such a case, the encapsulation is in
214     the same format as specified in VXLAN. This is the typical use
215     case when the TX VTEP is a hypervisor transmitting small size
216     overlay packets, or a gateway forwarding overlay packets without
217     offloading requirements.

219  2.3  RX VTEP - Hypervisors

221     When a VTEP at the RX side is on a hypervisor, checking of the S bit
222     is OPTIONAL.

224  2.4  RX VTEP - Gateways

226     When a VTEP at the RX side is on a gateway node that connects overlay
227     networks and physical networks, the S bit MUST be checked and the
228     VTEP MUST ensure the segmentation specified by the header fields
229     is performed by the VTEP itself or offloaded further - it MAY
230     offload the segmentation again to the subsequent transmission
231     mechanisms, such as GSO and TSO, or, if the link to the next hop
232     is also an overlay based on VXLAN-soe (or other tunneling
233     protocols that support segmentation offloading), pass the
234     segmentation metadata to the next hop.

236  3  Interoperability

238     In addition to offloading segmentation requests from VMs, a VXLAN-soe
239     enabled VTEP is able to offload segmentation requests from an STT
240     [I-D.davie-stt] overlay, because the metadata required in the VXLAN-soe
241     header is a subset of STT metadata. The additional segmentation
242     offloading information carried in STT metadata, such as the L4 offset,
243     can be obtained by examining the inner headers of the packets.

245     VXLAN-soe defines Overlay MTU at the same position as the Protocol
246     Type field in VXLAN-gpe [I-D.quinn-vxlan-gpe], another extension
247     of VXLAN. This is not a problem because VXLAN-soe is introduced
248     for segmentation offloading use cases where an Ethernet header is
249     always encapsulated, and it uses different flag bits to be
250     distinguished from VXLAN-gpe.
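
The VTEP behaviour described in Sections 2.2 and 2.4 can be sketched in Python as follows. This is an illustrative sketch, not part of the draft. The function names and return labels are hypothetical, the TX policy of offloading whenever an unsegmented packet exceeds the overlay MTU is one possible choice, the order in which a gateway prefers its options is an assumption, and the inner headers are assumed to be 20-byte IPv4 and TCP headers.

```python
def tx_should_offload(inner_l3_len, overlay_mtu, already_segmented):
    """TX VTEP: set the S bit only for an unsegmented, oversized packet.

    Assumed policy: always offload when offloading is allowed.
    """
    return (not already_segmented) and inner_l3_len > overlay_mtu


def split_for_overlay(l4_payload, overlay_mtu, l3_hdr_len=20, l4_hdr_len=20):
    """Split an L4 payload so each segment, with its L3 header, fits Overlay MTU.

    The Ethernet header is not counted, as specified for the Overlay MTU field.
    """
    max_chunk = overlay_mtu - l3_hdr_len - l4_hdr_len
    if max_chunk <= 0:
        raise ValueError("Overlay MTU too small for the assumed headers")
    return [l4_payload[i:i + max_chunk]
            for i in range(0, len(l4_payload), max_chunk)]


def rx_gateway_action(s_bit, next_hop_is_soe_overlay, egress_has_gso):
    """RX gateway: the S bit is always checked; the preference order is assumed."""
    if not s_bit:
        return "forward"          # nothing was offloaded
    if next_hop_is_soe_overlay:
        return "pass-metadata"    # hand the metadata to the next tunnel
    if egress_has_gso:
        return "offload-to-gso"   # let GSO/TSO segment on transmission
    return "segment-locally"      # the VTEP performs the segmentation itself


chunks = split_for_overlay(b"x" * 4000, overlay_mtu=1500)
print([len(c) for c in chunks])
print(rx_gateway_action(True, False, False))
```

When the C bit is set, a gateway that segments locally would also have to re-calculate the checksum of each resulting segment; that step is omitted here.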

252  4  Security Considerations

254     There are no special security issues introduced by this extension
255     to VXLAN.

257  5  IANA Considerations

259     This document creates no new requirements on IANA namespaces
260     [RFC5226].

262  6  References

264  6.1  Normative References

266     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
267                Requirement Levels", BCP 14, RFC 2119, March 1997.

269  6.2  Informative References

271     [I-D.mahalingam-dutt-dcops-vxlan]
272                Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
273                L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A
274                Framework for Overlaying Virtualized Layer 2 Networks over
275                Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-08
276                (work in progress), February 2014.

278     [I-D.davie-stt]
279                Davie, B. and J. Gross, "A Stateless Transport Tunneling
280                Protocol for Network Virtualization (STT)",
281                draft-davie-stt-05 (work in progress), March 2014.

283     [I-D.quinn-vxlan-gpe]
284                Agarwal, P., Fernando, R., Kreeger, L., Lewis, D., Maino,
285                F., Quinn, P., Yong, L., Xu, X., Smith, M., Yadav, N., and
286                U. Elzur, "Generic Protocol Extension for VXLAN",
287                draft-quinn-vxlan-gpe-02 (work in progress), December 2013.

289  Authors' Addresses

291     Han Zhou
292     eBay, Inc.

294     EMail: hzhou8@ebay.com

296     Chengyuan Li
297     eBay, Inc.

299     EMail: chengyli@ebay.com