Network Working Group                                       P. Garg, Ed.
Internet Draft                                              Y. Wang, Ed.
Intended Category: Informational                               Microsoft
Expires: October 12, 2015                                 April 13, 2015

    NVGRE: Network Virtualization using Generic Routing Encapsulation
               draft-sridharan-virtualization-nvgre-08.txt
Status of this Memo

   This memo provides information for the Internet Community. It does
   not specify an Internet standard of any kind; instead it relies on a
   proposed standard. Distribution of this memo is unlimited.
Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the

   [skipping to page 1, line 47]
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

   This Internet-Draft will expire on October 12, 2015.
Abstract

   This document describes the usage of the Generic Routing
   Encapsulation (GRE) header for Network Virtualization (NVGRE) in
   multi-tenant datacenters. Network Virtualization decouples virtual
   networks and addresses from the physical network infrastructure,
   providing isolation and concurrency between multiple virtual
   networks on the same physical network infrastructure. This document
   also introduces a Network Virtualization framework to illustrate
   the use cases, but

   [skipping to page 2, line 41]
      4.5. Address/Policy Management & Routing......................11
      4.6. Cross-subnet, Cross-premise Communication................11
      4.7. Internet Connectivity....................................13
      4.8. Management and Control Planes............................13
      4.9. NVGRE-Aware Devices......................................13
      4.10. Network Scalability with NVGRE..........................14
   5. Security Considerations.......................................15
   6. IANA Considerations...........................................15
   7. References....................................................15
      7.1. Normative References.....................................15
      7.2. Informative References...................................16
   8. Authors and Contributors......................................16
   9. Acknowledgments...............................................17
1. Introduction

   Conventional data center network designs cater to largely static
   workloads and cause fragmentation of network and server capacity
   [6][7]. Several issues limit dynamic allocation and consolidation of
   capacity. Layer 2 networks use the Rapid Spanning Tree Protocol
   (RSTP), which is designed to eliminate loops by blocking redundant
   paths. These eliminated paths translate to wasted capacity and a
   highly oversubscribed network. There are alternative approaches,
   such as TRILL, that address this problem [13].

   The network utilization inefficiencies are exacerbated by network
   fragmentation due to the use of VLANs for broadcast isolation. VLANs
   are used for traffic management and also as the mechanism for
   providing security and performance isolation among services
   belonging to different tenants. The Layer 2 network is carved into
   smaller subnets, typically one subnet per VLAN, with VLAN tags
   configured on all the Layer 2 switches connected to server racks
   that host a given tenant's services. The current VLAN limits
   theoretically allow for 4K such subnets to be created. The size of
   the broadcast domain is typically restricted due to the overhead of
   broadcast traffic. The 4K VLAN limit is no longer sufficient in a
   shared infrastructure servicing multiple tenants.
   Data center operators must be able to achieve high utilization of
   server and network capacity. In order to achieve efficiency, it
   should be possible to assign workloads that operate in a single
   Layer 2 network to any server in any rack in the network. It should
   also be possible to migrate workloads to any server anywhere in the
   network while retaining the workloads' addresses. This can be
   achieved today by stretching VLANs; however, when workloads migrate,
   the network needs to be reconfigured, which is typically error
   prone. By decoupling the workload's location on the LAN from its
   network address, the network administrator configures the network
   once, not every time a service migrates. This decoupling enables any
   server to become part of any server resource pool.
   The following are key design objectives for next generation data
   centers:

   a) location independent addressing

   b) the ability to scale the number of logical Layer 2/Layer 3
      networks irrespective of the underlying physical topology or
      the number of VLANs

   c) preserving Layer 2 semantics for services and allowing them to
      retain their addresses as they move within and across data
      centers

   d) providing broadcast isolation as workloads move around without
      burdening the network control plane
   This document describes use of the Generic Routing Encapsulation
   (GRE, [3][4]) header for network virtualization. Network
   virtualization decouples a virtual network from the underlying
   physical network infrastructure by virtualizing network addresses.
   Combined with a management and control plane for the virtual-to-
   physical mapping, network virtualization can enable flexible
   virtual machine placement and movement, and provide network
   isolation for a multi-tenant datacenter.

   Network virtualization enables customers to bring their own address
   spaces into a multi-tenant datacenter, while the datacenter
   administrators can place the customer virtual machines anywhere in
   the datacenter without reconfiguring their network switches or
   routers, irrespective of the customer address spaces.
1.1. Terminology

   Please refer to [9][11] for a more formal definition of terminology.
   The following terms are used in this document.

   Customer Address (CA): The virtual IP address assigned and
   configured on the virtual NIC within each VM. These are the only
   addresses visible to VMs and applications running within VMs.

   Network Virtualization Edge (NVE): An entity that performs the
   network virtualization encapsulation and decapsulation.

   Provider Address (PA): An IP address used in the physical network.
   PAs are associated with VM CAs through the network virtualization
   mapping policy.

   Virtual Machine (VM): An instance of an OS running on top of a
   hypervisor over a physical machine or server. Multiple VMs can
   share the same physical server via the hypervisor, yet are
   completely isolated from each other in terms of compute, storage,
   and other OS resources.

   Virtual Subnet Identifier (VSID): A 24-bit ID that uniquely
   identifies a virtual subnet or virtual Layer 2 broadcast domain.
2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119]. In this document, these words will appear with that
   interpretation only when in ALL CAPS. Lower case uses of these
   words are not to be interpreted as carrying RFC 2119 significance.
3. Network Virtualization using GRE (NVGRE)

   This section describes Network Virtualization using GRE (NVGRE).
   Network virtualization involves creating virtual Layer 2 topologies
   on top of a physical Layer 3 network. Connectivity in the virtual
   topology is provided by tunneling Ethernet frames in GRE over IP
   over the physical network.

   In NVGRE, every virtual Layer 2 network is associated with a 24-bit
   identifier, called a Virtual Subnet Identifier (VSID). A VSID is
   carried in an outer header, as defined in Section 3.2, allowing
   unique identification of a tenant's virtual subnet to various
   devices in the network. A 24-bit VSID supports up to 16 million
   virtual subnets in the same management domain, in contrast to only
   4K achievable with VLANs. Each VSID represents a virtual Layer 2
   broadcast domain, which can be used to identify a virtual subnet of
   a given tenant. To support a multi-subnet virtual topology,
   datacenter administrators can configure routes to facilitate
   communication between virtual subnets of the same tenant.
   GRE is a proposed IETF standard [3][4] that provides a way to
   encapsulate an arbitrary protocol over IP. NVGRE leverages the GRE
   header to carry VSID information in each packet. The VSID
   information in each packet can be used to build multi-tenant-aware
   tools for traffic analysis, traffic inspection, and monitoring.

   The following sections detail the packet format for NVGRE, describe
   the functions of an NVGRE endpoint, illustrate typical traffic flow
   both within and across data centers, and discuss address and policy
   management and deployment considerations.
3.1. NVGRE Endpoint

   NVGRE endpoints are the ingress/egress points between the virtual
   and the physical networks. The NVGRE endpoints are the NVEs as
   defined in the NVO Framework document [9]. Any physical server or
   network device can be an NVGRE endpoint. One common deployment is
   for the endpoint to be part of a hypervisor. The primary function
   of this endpoint is to encapsulate/decapsulate Ethernet data frames
   to and from the GRE tunnel, ensure Layer 2 semantics, and apply
   isolation policy scoped on VSID. The endpoint can optionally
   participate in routing and function as a gateway in the virtual
   topology. To encapsulate an Ethernet frame, the endpoint needs to
   know the location information for the destination address in the
   frame. This information can be provisioned via a management plane,
   or obtained via a combination of control plane distribution and
   data plane learning approaches. This document assumes that the
   location information, including the VSID, is available to the NVGRE
   endpoint.
3.2. NVGRE Frame Format

   The GRE header format as specified in RFC 2784 and RFC 2890 [3][4]
   is used for communication between NVGRE endpoints. NVGRE leverages
   the Key extension specified in RFC 2890 [4] to carry the VSID. The
   packet format for Layer 2 encapsulation in GRE is shown in Figure 1.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               (Outer) Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Outer)Destination MAC Address |  (Outer)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 (Outer) Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   [skipping to page 9, line 10]
   The GRE header:

   o The C (Checksum Present) and S (Sequence Number Present) bits in
     the GRE header MUST be zero.

   o The K bit (Key Present) in the GRE header MUST be set to one. The
     32-bit Key field in the GRE header is used to carry the Virtual
     Subnet ID (VSID) and the FlowID:

     - Virtual Subnet ID (VSID): a 24-bit value that is used to
       identify the NVGRE-based virtual Layer 2 network.

     - FlowID: an 8-bit value that is used to provide per-flow entropy
       for flows in the same VSID. The FlowID MUST NOT be modified by
       transit devices. The encapsulating NVE SHOULD provide as much
       entropy as possible in the FlowID. If a FlowID is not
       generated, it MUST be set to all zeros.

   o The Protocol Type field in the GRE header is set to 0x6558
     (Transparent Ethernet Bridging) [2].
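   The field layout above can be expressed compactly in code. The
   following is an illustrative, non-normative Python sketch (function
   names are invented for this example) that packs and parses the
   8-byte GRE header NVGRE uses, with the K bit set, C and S clear,
   and the Key field split into a 24-bit VSID and an 8-bit FlowID:

```python
import struct

GRE_PROTO_TEB = 0x6558   # Transparent Ethernet Bridging [2]
GRE_FLAG_K = 0x2000      # K (Key Present) bit; C and S MUST be zero

def build_nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Build the 8-byte GRE header carried by every NVGRE packet."""
    if not 0 <= vsid <= 0xFFFFFF:
        raise ValueError("VSID is a 24-bit value")
    if not 0 <= flow_id <= 0xFF:
        raise ValueError("FlowID is an 8-bit value")
    # Key field: upper 24 bits carry the VSID, lower 8 bits the FlowID.
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", GRE_FLAG_K, GRE_PROTO_TEB, key)

def parse_nvgre_header(hdr: bytes):
    """Return (VSID, FlowID) from an NVGRE GRE header, validating flags."""
    flags, proto, key = struct.unpack("!HHI", hdr[:8])
    if not (flags & GRE_FLAG_K) or proto != GRE_PROTO_TEB:
        raise ValueError("not an NVGRE packet")
    return key >> 8, key & 0xFF
```

   The inner Ethernet frame follows this header as the GRE payload.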
   The inner headers (headers of the GRE payload):

   [skipping to page 10, line 5]
3.4. Reserved VSID

   The VSID range 0-0xFFF is reserved for future use.

   The VSID 0xFFFFFF is reserved for vendor-specific NVE-NVE
   communication. The sender NVE SHOULD verify the receiver NVE's
   vendor before sending a packet using this VSID; however, such a
   verification mechanism is out of scope of this document.
   Implementations SHOULD choose a mechanism that meets their
   requirements.
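   As a small illustration of the reserved ranges above, a hypothetical
   endpoint might validate tenant-assignable VSIDs like this (the
   helper name is an invention of this sketch):

```python
VENDOR_VSID = 0xFFFFFF  # reserved for vendor-specific NVE-NVE traffic

def vsid_usable_for_tenant(vsid: int) -> bool:
    """True if a 24-bit VSID may be assigned to a tenant virtual subnet.

    0x000000-0x000FFF is reserved for future use, and 0xFFFFFF is
    reserved for vendor-specific NVE-NVE communication.
    """
    return 0x001000 <= vsid < VENDOR_VSID
```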
4. NVGRE Deployment Considerations

4.1. ECMP Support

   Equal-cost multipath (ECMP) may be used to provide load balancing.
   If ECMP is used, it is RECOMMENDED that the ECMP hash be calculated
   either using the outer IP frame fields and the entire 32-bit Key
   field, or using the inner IP and transport frame fields.
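   To illustrate the first option, here is a toy, non-normative hash
   over the outer IP address pair plus the full 32-bit Key field
   (VSID + FlowID). Real devices use hardware hash functions; `crc32`
   is just a stand-in, and the function name is invented:

```python
import zlib

def ecmp_next_hop(outer_src_ip: bytes, outer_dst_ip: bytes,
                  gre_key: int, num_paths: int) -> int:
    """Pick an equal-cost path index from the outer IP pair and the
    32-bit GRE Key, so flows between one NVE pair can still spread
    across links via the FlowID entropy."""
    h = zlib.crc32(outer_src_ip + outer_dst_ip +
                   gre_key.to_bytes(4, "big"))
    return h % num_paths
```

   Because the FlowID occupies the low byte of the Key, two flows in
   the same VSID with different FlowIDs can hash to different paths.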
4.2. Broadcast and Multicast Traffic

   To support broadcast and multicast traffic inside a virtual subnet,
   one or more administratively scoped multicast addresses [8][10] can
   be assigned to the VSID. All multicast or broadcast traffic
   originating from within a VSID is encapsulated and sent to the
   assigned multicast address. From an administrative standpoint, it
   is possible for network operators to configure a PA multicast
   address for each multicast address that is used inside a VSID, to
   facilitate optimal multicast handling. Depending on the hardware
   capabilities of the physical network devices and the physical
   network architecture, multiple virtual subnets may re-use the same
   physical IP multicast address.
   Alternatively, based upon the configuration at the NVE, broadcast
   and multicast in the virtual subnet can be supported using N-way
   unicast. In N-way unicast, the sender NVE sends one encapsulated
   packet to every NVE in the virtual subnet. The sender NVE can
   encapsulate and send the packet as described in Section 4.3
   (Unicast Traffic). This alleviates the need for multicast support
   in the physical network.
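   The N-way unicast replication described above reduces to a loop
   over the subnet's peer NVEs. A minimal sketch, in which the peer
   table and the `encapsulate`/`send` callables are assumptions of
   this example rather than anything specified by NVGRE:

```python
def replicate_nway(frame: bytes, vsid: int, peers_by_vsid,
                   encapsulate, send) -> int:
    """Replicate a broadcast/multicast frame as one unicast NVGRE
    packet per peer NVE in the virtual subnet; no underlay multicast
    support is required. Returns the number of copies sent."""
    count = 0
    for peer_pa in peers_by_vsid.get(vsid, ()):
        send(peer_pa, encapsulate(frame, vsid))
        count += 1
    return count
```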
4.3. Unicast Traffic

   The NVGRE endpoint encapsulates a Layer 2 packet in GRE using the
   source PA associated with the endpoint, with the destination PA
   corresponding to the location of the destination endpoint. As
   outlined earlier, there can be one or more PAs associated with an
   endpoint, and policy controls which ones get used for
   communication. The encapsulated GRE packet is bridged and routed
   normally by the physical network to the destination PA. Bridging
   uses the outer Ethernet encapsulation for scope on the LAN. The
   only requirement is bi-directional IP connectivity from the
   underlying physical network. At the destination, the NVGRE endpoint
   decapsulates the GRE packet to recover the original Layer 2 frame.
   Traffic flows similarly on the reverse path.
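   The encapsulation step above hinges on a mapping policy from the
   inner destination to a provider address. A hypothetical lookup
   table (the names, MACs, and addresses here are invented for
   illustration; in practice the table is populated by the
   management/control plane) might look like:

```python
# Hypothetical mapping policy: (VSID, inner destination MAC) -> PA.
mapping_policy = {
    (0x005001, "00:1d:aa:bb:cc:01"): "192.0.2.10",
    (0x005001, "00:1d:aa:bb:cc:02"): "192.0.2.20",
}

def lookup_destination_pa(vsid: int, inner_dst_mac: str):
    """Return the PA of the NVE hosting the destination, or None on a
    policy miss (drop the frame, or query the control plane)."""
    return mapping_policy.get((vsid, inner_dst_mac))
```

   Keying the lookup on the VSID as well as the MAC is what keeps
   overlapping tenant address spaces isolated from one another.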
4.4. IP Fragmentation

RFC 2003 [12] Section 5.1 specifies mechanisms for handling
fragmentation when encapsulating IP within IP. The subset of
mechanisms NVGRE selects is intended to ensure that NVGRE-encapsulated
frames are not fragmented after encapsulation en route to the
destination NVGRE endpoint, and that traffic sources can leverage
Path MTU Discovery.

A sender NVE MUST NOT fragment NVGRE packets. A receiver NVE MAY
discard fragmented NVGRE packets. It is RECOMMENDED that the MTU of
the physical network accommodate the larger frame size due to
encapsulation. Path MTU Discovery or configuration via the control
plane can be used to meet this requirement.
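As a non-normative illustration of why the physical MTU must
accommodate the larger encapsulated frame, the sketch below adds up
the outer headers, assuming an outer IPv4 header with no options and
a GRE header carrying the 4-byte Key field; the constants and
function name are illustrative, not values defined by this draft:

```python
# Illustrative per-frame overhead added by NVGRE encapsulation,
# assuming an outer IPv4 header (no options) and a GRE header that
# carries the 4-byte Key field. These are a sketch, not normative.
INNER_ETHERNET = 14  # inner MAC header carried inside the GRE payload
OUTER_IPV4 = 20      # outer IPv4 header
GRE_WITH_KEY = 8     # 4-byte GRE base header + 4-byte Key field

def required_physical_mtu(tenant_mtu: int) -> int:
    """Smallest physical-network IP MTU that lets a tenant IP packet
    of tenant_mtu bytes be encapsulated without fragmentation (the
    sender NVE MUST NOT fragment NVGRE packets)."""
    return tenant_mtu + INNER_ETHERNET + OUTER_IPV4 + GRE_WITH_KEY

# A standard 1500-byte tenant MTU needs a physical IP MTU of >= 1542.
print(required_physical_mtu(1500))  # 1542
```

In practice this is why operators either raise the physical-network
MTU (e.g. via jumbo frames) or lower the tenant MTU via the control
plane, as the text above recommends.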
4.5. Address/Policy Management & Routing

Address acquisition is beyond the scope of this document; addresses
can be obtained statically, dynamically, or using stateless address
autoconfiguration. CA and PA space can be either IPv4 or IPv6. In
fact, the address families don't have to match; for example, a CA can
be IPv4 while the PA is IPv6, and vice versa.
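The independence of the CA and PA address families can be illustrated
with a toy lookup table keyed on (VSID, CA); the table contents,
addresses, and function name below are hypothetical, not part of this
draft:

```python
from ipaddress import ip_address

# Hypothetical mapping table: (VSID, customer address) -> provider
# address. The CAs here are IPv4 while the PAs are IPv6, which NVGRE
# permits since the address families need not match.
mapping = {
    (6000, ip_address("192.0.2.10")): ip_address("2001:db8::a"),
    (6000, ip_address("192.0.2.11")): ip_address("2001:db8::b"),
}

def lookup_pa(vsid, ca):
    """Return the PA for a (VSID, CA) pair, or None if unknown."""
    return mapping.get((vsid, ip_address(ca)))

print(lookup_pa(6000, "192.0.2.10"))  # 2001:db8::a
print(lookup_pa(6000, "192.0.2.99"))  # None
```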
4.6. Cross-subnet, Cross-premise Communication

[...]
4.8. Management and Control Planes

There are several protocols that can manage and distribute policy;
however, they are out of scope of this document. Implementations
SHOULD choose a mechanism that meets their scale requirements.
4.9. NVGRE-Aware Devices

One example of a typical deployment consists of virtualized servers
deployed across multiple racks connected by one or more layers of
Layer 2 switches, which in turn may be connected to a Layer 3 routing
domain. Even though routing in the physical infrastructure will work
without any modification with NVGRE, devices that perform specialized
processing in the network need to be able to parse GRE to get access
to tenant-specific information. Devices that understand and parse the
VSID can provide rich multi-tenancy-aware services inside the data
center. As outlined earlier, it is imperative to exploit multiple
paths inside the network through techniques such as Equal-Cost
Multipath (ECMP). The Key field (a 32-bit field including both the
VSID and the optional FlowID) can provide additional entropy to the
switches to exploit path diversity inside
[...]
and in turn MAC address table scalability that can be achieved. An
NVGRE endpoint can use one PA to represent multiple CAs. This lowers
the burden on the MAC address table sizes at the Top-of-Rack
switches. One obvious benefit is in the context of server
virtualization, which has increased the demands on the network
infrastructure. By embedding an NVGRE endpoint in a hypervisor, it is
possible to scale significantly. This framework allows location
information to be preconfigured inside an NVGRE endpoint, allowing
broadcast ARP traffic to be proxied locally. This approach can scale
to large virtual subnets. These virtual subnets can be spread across
multiple Layer 3 physical subnets. It allows workloads to be moved
around without imposing a huge burden on the network control plane.
By eliminating most broadcast traffic and converting the rest to
multicast, the routers and switches can function more efficiently by
building efficient multicast trees. By using server and network
capacity efficiently, it is possible to drive down the cost of
building and managing data centers.
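As a non-normative sketch of how the 32-bit Key field discussed in
Section 4.9 carries both identifiers, assuming the layout of a 24-bit
VSID in the upper three bytes and the optional 8-bit FlowID in the
low byte:

```python
def pack_key(vsid: int, flow_id: int = 0) -> int:
    """Pack the 32-bit GRE Key: 24-bit VSID in the upper bits and an
    optional 8-bit FlowID in the lowest byte."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID must fit in 8 bits")
    return (vsid << 8) | flow_id

def unpack_key(key: int):
    """Recover (VSID, FlowID) from the 32-bit Key field."""
    return key >> 8, key & 0xFF

# Two flows in the same virtual subnet differ only in FlowID, giving
# ECMP hash functions extra entropy without changing the VSID.
print(hex(pack_key(0x123456, 0x01)))  # 0x12345601
print(unpack_key(0x12345601))         # (1193046, 1)
```

A switch that hashes over the whole Key field thus spreads flows of
one tenant across multiple equal-cost paths while keeping the VSID
intact for devices that parse it.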
5. Security Considerations

This proposal extends the Layer 2 subnet across the data center and
increases the scope for spoofing attacks. Such attacks can be
mitigated with authentication/encryption using IPsec or any other
IP-based mechanism. The control plane for policy distribution is
expected to be secured by using any of the existing security
protocols. Further, management traffic can be isolated in a separate
subnet/VLAN.

The checksum in the GRE header is not supported. The mitigation for
this is to deploy an NVGRE-based solution in a network that provides
error detection along the NVGRE packet path, for example, using
[...]
[2] Ethertypes, ftp://ftp.isi.edu/in-notes/iana/assignments/ethernet-numbers

[3] D. Farinacci et al., "Generic Routing Encapsulation (GRE)",
    RFC 2784, March 2000.

[4] G. Dommety, "Key and Sequence Number Extensions to GRE",
    RFC 2890, September 2000.

[5] Institute of Electrical and Electronics Engineers, "Virtual
    Bridged Local Area Networks", IEEE Standard 802.1Q, 2005
    Edition, May 2006.
7.2. Informative References

[6] A. Greenberg et al., "VL2: A Scalable and Flexible Data Center
    Network", Proc. SIGCOMM 2009.

[7] A. Greenberg et al., "The Cost of a Cloud: Research Problems in
    the Data Center", ACM SIGCOMM Computer Communication Review.

[8] R. Hinden, S. Deering, "IP Version 6 Addressing Architecture",
    RFC 4291, February 2006.

[9] M. Lasserre et al., "Framework for DC Network Virtualization",
    RFC 7365, October 2014.

[10] D. Meyer, "Administratively Scoped IP Multicast", BCP 23,
     RFC 2365, July 1998.

[11] T. Narten et al., "Problem Statement: Overlays for Network
     Virtualization", RFC 7364, October 2014.

[12] C. Perkins, "IP Encapsulation within IP", RFC 2003, October
     1996.

[13] J. Touch, R. Perlman, "Transparent Interconnection of Lots of
     Links (TRILL): Problem and Applicability Statement", RFC 5556,
     May 2009.
8. Authors and Contributors

M. Sridharan
A. Greenberg
Y. Wang
P. Garg
N. Venkataramiah