Internet Engineering Task Force                              Nabil Bitar
Internet Draft                                                   Verizon
Intended status: Informational
Expires: October 2014                                      Marc Lasserre
                                                            Florin Balus
                                                          Alcatel-Lucent

                                                            Thomas Morin
                                                   France Telecom Orange

                                                             Lizhong Jin

                                                       Bhumip Khasnabish
                                                                     ZTE

                                                          April 15, 2014

                      NVO3 Data Plane Requirements
              draft-ietf-nvo3-dataplane-requirements-03.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 15, 2014.
Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   Several IETF drafts relate to the use of overlay networks to support
   large-scale virtual data centers. This draft provides a list of data
   plane requirements for Network Virtualization over L3 (NVO3) that
   have to be addressed in solution documents.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. General terminology
   2. Data Path Overview
   3. Data Plane Requirements
      3.1. Virtual Access Points (VAPs)
      3.2. Virtual Network Instance (VNI)
         3.2.1. L2 VNI
         3.2.2. L3 VNI
      3.3. Overlay Module
         3.3.1. NVO3 overlay header
            3.3.1.1. Virtual Network Context Identification
            3.3.1.2. Quality of Service (QoS) identifier
         3.3.2. Tunneling function
            3.3.2.1. LAG and ECMP
            3.3.2.2. DiffServ and ECN marking
            3.3.2.3. Handling of BUM traffic
      3.4. External NVO3 connectivity
         3.4.1. Gateway (GW) Types
            3.4.1.1. VPN and Internet GWs
            3.4.1.2. Inter-DC GW
            3.4.1.3. Intra-DC gateways
         3.4.2. Path optimality between NVEs and Gateways
            3.4.2.1. Load-balancing
            3.4.2.2. Triangular Routing Issues
      3.5. Path MTU
      3.6. Hierarchical NVE dataplane requirements
      3.7. Other considerations
         3.7.1. Data Plane Optimizations
         3.7.2. NVE location trade-offs
   4. Security Considerations
   5. IANA Considerations
   6. References
      6.1. Normative References
      6.2. Informative References
   7. Acknowledgments

1. Introduction
1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC 2119 significance.

1.2. General terminology

   The terminology defined in [NVO3-framework] is used throughout this
   document. Terminology specific to this memo is defined here and is
   introduced as needed in later sections.

   BUM: Broadcast, Unknown Unicast, Multicast traffic

   TS: Tenant System

2. Data Path Overview

   The NVO3 framework [NVO3-framework] defines the generic NVE model
   depicted in Figure 1:

                     +------- L3 Network ------+
                     |                         |
                     |       Tunnel Overlay    |
        +------------+---------+     +---------+------------+
        | +----------+-------+ |     | +---------+--------+ |
        | |  Overlay Module  | |     | |  Overlay Module  | |
        | +---------+--------+ |     | +---------+--------+ |
        |           |VN context|     | VN context|          |
        |           |          |     |           |          |
        |  +--------+-------+  |     |  +--------+-------+  |
        |  | |VNI| ... |VNI| | |     |  | |VNI| ... |VNI| | |
   NVE1 |  +-+------------+-+  |     |  +-+-----------+--+  | NVE2
        |    |    VAPs    |    |     |    |   VAPs    |     |
        +----+------------+----+     +----+-----------+-----+
             |            |               |           |
      -------+------------+---------------+-----------+-------
             |            |    Tenant     |           |
             |            |  Service IF   |           |
            Tenant Systems               Tenant Systems

          Figure 1 : Generic reference model for NV Edge

   When a frame is received by an ingress NVE from a Tenant System over
   a local VAP, it needs to be parsed in order to identify which
   virtual network instance it belongs to. The parsing function can
   examine various fields in the data frame (e.g., VLAN ID) and/or the
   associated interface/port the frame came from.

   Once the corresponding VNI is identified, a lookup is performed to
   determine where the frame needs to be sent. This lookup can be based
   on any combination of fields in the data frame (e.g., destination
   MAC address and/or destination IP address). Note that additional
   criteria such as Ethernet 802.1p priorities and/or DSCP markings
   might be used to select an appropriate tunnel or local VAP
   destination.

   Lookup tables can be populated using different techniques: data
   plane learning, management plane configuration, or a distributed
   control plane. Management and control planes are not in the scope of
   this document. The data plane based solution is described in this
   document as it has implications on the data plane processing
   function.

   The result of this lookup yields the corresponding information
   needed to build the overlay header, as described in section 3.3.
   This information includes the destination L3 address of the egress
   NVE. Note that this lookup might yield a list of tunnels, such as
   when ingress replication is used for BUM traffic.

   The overlay header MUST include a context identifier which the
   egress NVE will use to identify which VNI this frame belongs to.

   The egress NVE checks the context identifier, removes the
   encapsulation header, and then forwards the original frame towards
   the appropriate recipient, usually a local VAP.
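   As a non-normative illustration of the ingress processing described
   above, the following Python sketch walks through VAP identification,
   VNI selection, forwarding table lookup and encapsulation. All table
   layouts, field names and addresses are illustrative assumptions and
   do not define any NVO3 encapsulation or API.

   # Non-normative sketch of the ingress NVE data path described above.
   # All structures and field names are illustrative assumptions.

   # VAP identification: (local port, VLAN ID) -> VNI
   VAP_TABLE = {("port1", 100): "vni-blue", ("port2", None): "vni-red"}

   # Per-VNI forwarding table: destination MAC -> egress NVE IP or local VAP
   FIB = {
       "vni-blue": {"00:aa:bb:cc:dd:01": ("tunnel", "192.0.2.10"),
                    "00:aa:bb:cc:dd:02": ("local-vap", ("port1", 100))},
   }

   def ingress_process(port, vlan_id, dst_mac, payload):
       vni = VAP_TABLE.get((port, vlan_id))        # 1. identify VAP -> VNI
       if vni is None:
           return None                             # unknown VAP: drop
       kind, where = FIB.get(vni, {}).get(dst_mac, ("flood", None))
       if kind == "tunnel":
           # 2. Build the overlay header (VN context) plus outer underlay
           #    header towards the egress NVE.
           return {"outer_dst": where, "vn_context": vni, "inner": payload}
       if kind == "local-vap":
           return {"deliver_to_vap": where, "inner": payload}
       # 3. Unknown destination: replicate to all remote NVEs of this VNI.
       return {"flood_vni": vni, "inner": payload}

   print(ingress_process("port1", 100, "00:aa:bb:cc:dd:01", b"frame"))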
3. Data Plane Requirements

3.1. Virtual Access Points (VAPs)

   The NVE forwarding plane MUST support VAP identification through the
   following mechanisms:

   - Using the local interface on which the frames are received, where
     the local interface may be an internal, virtual port in a virtual
     switch or a physical port on a ToR switch

   - Using the local interface and some fields in the frame header,
     e.g., one or multiple VLAN tags or the source MAC address

3.2. Virtual Network Instance (VNI)

   VAPs are associated with a specific VNI at service instantiation
   time.

   A VNI identifies a per-tenant private context, i.e., per-tenant
   policies and a FIB table that allows overlapping address spaces
   between tenants.

   There are different VNI types, differentiated by the virtual network
   service they provide to Tenant Systems. Network virtualization can
   be provided by L2 and/or L3 VNIs.

3.2.1. L2 VNI

   An L2 VNI MUST provide an emulated Ethernet multipoint service, as
   if Tenant Systems were interconnected by a bridge (but instead using
   a set of NVO3 tunnels). The emulated bridge could be 802.1Q enabled
   (allowing the use of VLAN tags as a VAP). An L2 VNI provides a per-
   tenant virtual switching instance with MAC address isolation and L3
   tunneling. Loop avoidance capability MUST be provided.

   Forwarding table entries provide mapping information between Tenant
   System MAC addresses and VAPs on directly connected VNIs and L3
   tunnel destination addresses over the overlay. Such entries could be
   populated by a control or management plane, or via data plane
   learning.

   Unless a control plane is used to disseminate address mappings, data
   plane learning MUST be used to populate forwarding tables. As frames
   arrive from VAPs or from overlay tunnels, standard MAC learning
   procedures are used: the Tenant System source MAC address is learned
   against the VAP or the NVO3 tunneling encapsulation source address
   on which the frame arrived. Data plane learning implies that unknown
   unicast traffic will be flooded (i.e., broadcast). A sketch of this
   behavior is given at the end of this section.

   When flooding is required, whether to deliver unknown unicast,
   broadcast or multicast traffic, the NVE MUST support either ingress
   replication or underlay multicast.

   When using underlay multicast, the NVE MUST have one or more
   underlay multicast trees that can be used by local VNIs for flooding
   to NVEs belonging to the same VN. For each VNI, there is at least
   one underlay flooding tree used for Broadcast, Unknown Unicast and
   Multicast forwarding. This tree MAY be shared across VNIs. The
   flooding tree is equivalent to a multicast (*,G) construct where all
   the NVEs for which the corresponding VNI is instantiated are
   members.

   When tenant multicast is supported, it SHOULD also be possible to
   select whether the NVE provides optimized underlay multicast trees
   inside the VNI for individual tenant multicast groups or whether the
   default VNI flooding tree is used. If the former option is selected,
   the VNI SHOULD be able to snoop IGMP/MLD messages in order to
   efficiently join/prune Tenant Systems to/from multicast trees.
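   The following non-normative Python sketch illustrates the data plane
   learning behavior described above: the tenant source MAC address is
   learned against the VAP or the NVO3 tunnel source address on which
   the frame arrived, and unknown destinations are flooded. The
   structures and names are illustrative assumptions only.

   # Non-normative sketch of per-VNI MAC learning and forwarding.
   forwarding_table = {}   # vni -> {source MAC: ("vap", id) or ("nve", IP)}

   def learn(vni, src_mac, arrived_on):
       """arrived_on is ("vap", vap_id) for frames from a local VAP, or
       ("nve", outer_source_ip) for frames received over an NVO3 tunnel."""
       forwarding_table.setdefault(vni, {})[src_mac] = arrived_on

   def forward(vni, dst_mac):
       entry = forwarding_table.get(vni, {}).get(dst_mac)
       if entry is None:
           # Unknown unicast: flood via ingress replication or the
           # per-VNI underlay multicast tree.
           return ("flood", vni)
       return entry

   learn("vni-blue", "00:aa:bb:cc:dd:01", ("vap", ("port1", 100)))
   learn("vni-blue", "00:aa:bb:cc:dd:02", ("nve", "192.0.2.20"))
   print(forward("vni-blue", "00:aa:bb:cc:dd:02"))  # ("nve", "192.0.2.20")
   print(forward("vni-blue", "00:aa:bb:cc:dd:99"))  # ("flood", "vni-blue")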
3.2.2. L3 VNI

   L3 VNIs MUST provide virtualized IP routing and forwarding. L3 VNIs
   MUST support a per-tenant forwarding instance with IP address
   isolation and L3 tunneling for interconnecting instances of the same
   VNI on NVEs.

   In the case of an L3 VNI, the inner TTL field MUST be decremented by
   (at least) 1, as if the NVO3 egress NVE were one (or more) hop(s)
   away. The TTL field in the outer IP header MUST be set to a value
   appropriate for delivery of the encapsulated frame to the tunnel
   exit point. Thus, the default behavior MUST be the TTL pipe model,
   where the overlay network looks like one hop to the sending NVE.
   Configuration of a "uniform" TTL model, where the outer tunnel TTL
   is set equal to the inner TTL on the ingress NVE and the inner TTL
   is set to the outer TTL value on egress, MAY be supported. [RFC2983]
   provides additional details on the uniform and pipe models.

   L2 and L3 VNIs can be deployed in isolation or in combination to
   optimize traffic flows per tenant across the overlay network. For
   example, an L2 VNI may be configured across a number of NVEs to
   offer L2 multi-point service connectivity while an L3 VNI can be co-
   located to offer local routing capabilities and gateway
   functionality. In addition, integrated routing and bridging per
   tenant MAY be supported on an NVE. An instantiation of such a
   service may be realized by interconnecting an L2 VNI as access to an
   L3 VNI on the NVE.

   When underlay multicast is supported, it MAY be possible to select
   whether the NVE provides optimized underlay multicast trees inside
   the VNI for individual tenant multicast groups or whether a default
   underlay VNI multicast tree, where all the NVEs of the corresponding
   VNI are members, is used.

3.3. Overlay Module

   The overlay module performs a number of functions related to NVO3
   header and tunnel processing.

   The following figure shows a generic NVO3 encapsulated frame:

        +--------------------------+
        |       Tenant Frame       |
        +--------------------------+
        |   NVO3 Overlay Header    |
        +--------------------------+
        |  Outer Underlay Header   |
        +--------------------------+
        | Outer Link Layer Header  |
        +--------------------------+

            Figure 2 : NVO3 encapsulated frame

   where:

   o Tenant frame: Ethernet or IP, based upon the VNI type

   o NVO3 overlay header: header containing VN context information and
     other optional fields that can be used for processing this packet

   o Outer underlay header: can be either IP or MPLS

   o Outer link layer header: header specific to the physical
     transmission link used

3.3.1. NVO3 overlay header

   An NVO3 overlay header MUST be included after the underlay tunnel
   header when forwarding tenant traffic.

   Note that this information can be carried within existing protocol
   headers (when overloading of specific fields is possible) or within
   a separate header.

3.3.1.1. Virtual Network Context Identification

   The overlay encapsulation header MUST contain a field which allows
   the encapsulated frame to be delivered to the appropriate virtual
   network endpoint by the egress NVE.

   The egress NVE uses this field to determine the appropriate virtual
   network context in which to process the packet. This field MAY be an
   explicit, unique (to the administrative domain) virtual network
   identifier (VNID) or MAY express the necessary context information
   in other ways (e.g., a locally significant identifier).

   In the case of a global identifier, this field MUST be large enough
   to scale to hundreds of thousands of virtual networks. Note that
   there is typically no such constraint when using a local identifier.
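   This document does not define an overlay header format. Purely as a
   non-normative illustration of carrying a virtual network context,
   the Python sketch below assumes a hypothetical 8-byte header with 8
   bits of flags, a 24-bit VNID and a 32-bit reserved field; a 24-bit
   identifier already covers roughly 16 million virtual networks,
   beyond the "hundreds of thousands" target above.

   # Non-normative, hypothetical overlay header layout: flags (1 byte),
   # VNID (3 bytes), reserved (4 bytes).  Not a format defined by this
   # document.
   import struct

   def build_overlay_header(vnid, flags=0):
       if not 0 <= vnid < 2**24:
           raise ValueError("VNID must fit in 24 bits")
       # Big-endian: flags, 3-byte VNID, 32-bit reserved field set to 0.
       return struct.pack("!B3sI", flags, vnid.to_bytes(3, "big"), 0)

   def parse_overlay_header(header):
       flags, vnid_bytes, _reserved = struct.unpack("!B3sI", header)
       return flags, int.from_bytes(vnid_bytes, "big")

   hdr = build_overlay_header(vnid=123456)
   print(parse_overlay_header(hdr))   # -> (0, 123456)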
3.3.1.2. Quality of Service (QoS) identifier

   Traffic flows originating from different applications could rely on
   differentiated forwarding treatment to meet end-to-end availability
   and performance objectives. Such applications may span one or more
   overlay networks. To enable such treatment, support for multiple
   Classes of Service (CoS) across or between overlay networks MAY be
   required.

   To effectively enforce CoS across or between overlay networks
   without repeated Deep Packet Inspection (DPI), NVEs MAY be able to
   map CoS markings between networking layers, e.g., Tenant Systems,
   overlays, and/or underlay, enabling each networking layer to
   independently enforce its own CoS policies. For example:

   - TS (e.g., VM) CoS

     o Tenant CoS policies MAY be defined by tenant administrators

     o QoS fields (e.g., IP DSCP and/or Ethernet 802.1p) in the tenant
       frame are used to indicate application-level CoS requirements

   - NVE CoS: support for NVE service CoS MAY be provided through a QoS
     field inside the NVO3 overlay header

     o The NVE MAY classify packets based on tenant CoS markings or
       other mechanisms (e.g., DPI) to identify the proper service CoS
       to be applied across the overlay network

     o NVE service CoS levels are normalized to a common set (for
       example, 8 levels) across multiple tenants; the NVE uses per-
       tenant policies to map tenant CoS to the normalized service CoS
       fields in the NVO3 header

   - Underlay CoS

     o The underlay/core network MAY use a different CoS set (for
       example, 4 levels) than the NVE CoS, as the core devices MAY
       have different QoS capabilities compared with NVEs.

     o The underlay CoS MAY also change as the NVO3 tunnels pass
       between different domains.

3.3.2. Tunneling function

   This section describes the underlay tunneling requirements. From an
   encapsulation perspective, IPv4 or IPv6 MUST be supported, both IPv4
   and IPv6 SHOULD be supported, and MPLS MAY be supported.

3.3.2.1. LAG and ECMP

   For performance reasons, multipath over LAG and ECMP paths MAY be
   supported.

   LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal
   Cost Multi Path) are commonly used techniques to perform load-
   balancing of microflows over a set of parallel links, either at
   Layer 2 (LAG) or Layer 3 (ECMP). Existing deployed hardware
   implementations of LAG and ECMP use a hash of various fields in the
   encapsulation (outermost) header(s) (e.g., source and destination
   MAC addresses for non-IP traffic, source and destination IP
   addresses, L4 protocol, L4 source and destination port numbers,
   etc.). Furthermore, hardware deployed in the underlay network(s)
   will most often be unaware of the carried, innermost L2 frames or L3
   packets transmitted by the TS.

   Thus, in order to perform fine-grained load-balancing over LAG and
   ECMP paths in the underlying network, the encapsulation needs to
   present sufficient entropy to exercise all paths through several
   LAG/ECMP hops. A sketch of one common approach follows.
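   The following non-normative Python sketch shows one common way to
   expose such entropy: hashing inner flow fields into an outer field
   (here, the outer UDP source port) that existing LAG/ECMP hash
   implementations already examine. The hash function, port range and
   field selection are illustrative assumptions, not part of any
   defined NVO3 encapsulation.

   # Non-normative sketch: derive a stable outer UDP source port from
   # inner flow fields so that outer-header LAG/ECMP hashing spreads
   # tenant flows across parallel paths while keeping each flow on one
   # path (no re-ordering).
   import zlib

   def outer_source_port(inner_src_mac, inner_dst_mac, inner_5tuple=b""):
       flow_key = inner_src_mac + inner_dst_mac + inner_5tuple
       entropy = zlib.crc32(flow_key) & 0xFFFF
       # Keep the port in the dynamic/ephemeral range (assumption).
       return 49152 + (entropy % 16384)

   print(outer_source_port(b"\x00\xaa\xbb\xcc\xdd\x01",
                           b"\x00\xaa\xbb\xcc\xdd\x02"))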
   The entropy information can be inferred from the NVO3 overlay header
   or the underlay header. If the overlay protocol does not support the
   necessary entropy information, or the switches/routers in the
   underlay do not support parsing of the additional entropy
   information in the overlay header, underlay switches and routers
   should be programmable, i.e., able to select the appropriate fields
   in the underlay header for hash calculation based on the type of
   overlay header.

   All packets that belong to a specific flow MUST follow the same path
   in order to prevent packet re-ordering. This is typically achieved
   by ensuring that the fields used for hashing are identical for a
   given flow.

   The goal is for all paths available to the overlay network to be
   used efficiently. Different flows should be distributed as evenly as
   possible across multiple underlay network paths. For instance, this
   can be achieved by ensuring that some fields used for hashing are
   randomly generated.

3.3.2.2. DiffServ and ECN marking

   When traffic is encapsulated in a tunnel header, there are numerous
   options as to how the Differentiated Services Code Point (DSCP) and
   Explicit Congestion Notification (ECN) markings are set in the outer
   header and propagated to the inner header on decapsulation.

   [RFC2983] defines two modes for mapping the DSCP markings from inner
   to outer headers and vice versa. The Uniform model copies the inner
   DSCP marking to the outer header on tunnel ingress, and copies that
   outer header value back to the inner header at tunnel egress. The
   Pipe model sets the DSCP value to some value based on local policy
   at ingress and does not modify the inner header on egress. Both
   models SHOULD be supported.

   [RFC6040] defines ECN marking and processing for IP tunnels.

3.3.2.3. Handling of BUM traffic

   NVO3 data plane support for either ingress replication or point-to-
   multipoint tunnels is required to send traffic destined to multiple
   locations on a per-VNI basis (e.g., L2/L3 multicast traffic, L2
   broadcast and unknown unicast traffic). It is possible for both
   methods to be used simultaneously.

   There is a bandwidth vs. state trade-off between the two approaches.
   User-configurable settings MUST be provided to select which
   method(s) get used, based upon the amount of replication required
   (i.e., the number of hosts per group), the amount of multicast state
   to maintain, the duration of multicast flows and the scalability of
   multicast protocols.

   When ingress replication is used, NVEs MUST maintain for each VNI
   the list of related tunnel endpoints to which it needs to replicate
   the frame (see the sketch at the end of this section).

   For point-to-multipoint tunnels, the bandwidth efficiency is
   increased at the cost of more state in the core nodes. The ability
   to auto-discover or pre-provision the mapping between VNI multicast
   trees and related tunnel endpoints at the NVE and/or throughout the
   core SHOULD be supported.
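   The following non-normative Python sketch illustrates the ingress
   replication behavior described above: for each VNI, the NVE keeps a
   list of remote tunnel endpoints and sends one encapsulated copy of a
   BUM frame per endpoint. Names, structures and addresses are
   illustrative assumptions.

   # Non-normative sketch of ingress replication for BUM traffic.
   # Per-VNI list of remote NVE underlay addresses (e.g., learned via a
   # control plane or provisioned by a management plane).
   REPLICATION_LIST = {
       "vni-blue": ["192.0.2.10", "192.0.2.11", "192.0.2.12"],
   }

   def send_bum(vni, frame, encapsulate, underlay_send):
       """Replicate a broadcast/unknown-unicast/multicast frame to every
       remote tunnel endpoint registered for this VNI."""
       for remote_nve in REPLICATION_LIST.get(vni, []):
           underlay_send(encapsulate(frame, vn_context=vni,
                                     outer_dst=remote_nve))

   # Example usage with trivial stand-ins for encapsulation/transmission:
   send_bum("vni-blue", b"arp-request",
            encapsulate=lambda f, vn_context, outer_dst:
                (outer_dst, vn_context, f),
            underlay_send=print)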
3.4. External NVO3 connectivity

   It is important that NVO3 services interoperate with current VPN and
   Internet services. This may happen inside one DC during a migration
   phase or as NVO3 services are delivered to the outside world via
   Internet or VPN gateways (GWs).

   Moreover, the compute and storage services delivered by an NVO3
   domain may span multiple DCs, requiring Inter-DC connectivity. From
   a DC perspective, a set of GW devices is required in all of these
   cases, albeit with different functionalities influenced by the
   overlay type across the WAN, the service type and the DC network
   technologies used at each DC site.

   A GW handling the connectivity between NVO3 and external domains
   represents a single point of failure that may affect multiple tenant
   services. Redundancy between NVO3 and external domains MUST be
   supported.

3.4.1. Gateway (GW) Types

3.4.1.1. VPN and Internet GWs

   Tenant sites may already be interconnected using one of the existing
   VPN services and technologies (VPLS or IP VPN). If a new NVO3
   encapsulation is used, a VPN GW is required to forward traffic
   between the NVO3 and VPN domains. Internet-connected tenants require
   translation from the NVO3 encapsulation to IP in the NVO3 gateway.
   The translation function SHOULD minimize provisioning touches.

3.4.1.2. Inter-DC GW

   Inter-DC connectivity MAY be required to provide support for
   features like disaster prevention or compute load re-distribution.
   This MAY be provided via a set of gateways interconnected through a
   WAN. This type of connectivity MAY be provided either through
   extension of the NVO3 tunneling domain or via VPN GWs.

3.4.1.3. Intra-DC gateways

   Even within one DC there may be End Devices that do not support NVO3
   encapsulation, for example bare metal servers, hardware appliances
   and storage. A gateway device, e.g., a ToR switch, is required to
   translate between the NVO3 encapsulation and the Ethernet VLAN
   encapsulation.

3.4.2. Path optimality between NVEs and Gateways

   Within an NVO3 overlay, a default assumption is that NVO3 traffic
   will be equally load-balanced across the underlying network
   consisting of LAG and/or ECMP paths. This assumption is valid only
   as long as: a) all traffic is load-balanced equally among each of
   the component links and paths; and b) each of the component
   links/paths is of identical capacity. During the course of normal
   operation of the underlying network, it is possible that one or more
   of the component links/paths of a LAG may be taken out of service in
   order to be repaired, e.g., due to hardware failure of cabling,
   optics, etc. In such cases, the administrator may configure the
   underlying network such that an entire LAG bundle in the underlying
   network will be reported as operationally down if there is a failure
   of any single component-link member of the LAG bundle (e.g., an
   N = M configuration of the LAG bundle), and thus they know that
   traffic will be carried sufficiently by alternate, available
   (potentially ECMP) paths in the underlying network. This is likely
   an adequate assumption for Intra-DC traffic, where presumably the
   cost of additional protection capacity along alternate paths is not
   prohibitive. In this case, there are no additional requirements on
   NVO3 solutions to accommodate this type of underlying network
   configuration and administration.

   There is a similar case with ECMP, used Intra-DC, where failure of a
   single component path of an ECMP group would result in traffic
   shifting onto the surviving members of the ECMP group.
   Unfortunately, there are no automatic recovery methods in IP routing
   protocols to detect a simultaneous failure of more than one
   component path in an ECMP group, operationally disable the entire
   ECMP group and allow traffic to shift onto alternative paths. This
   problem is attributable to the underlying network and, thus, out of
   scope of any NVO3 solutions.
   On the other hand, for Inter-DC and DC-to-External-Network cases
   that use a WAN, the cost of the underlying network and/or service
   (e.g., an IP VPN service) is higher; therefore, there is a
   requirement on administrators to both: a) ensure high availability
   (active-backup failover or active-active load-balancing); and b)
   maintain substantial utilization of the WAN transport capacity at
   nearly all times, particularly in the case of active-active load-
   balancing. With respect to the data plane requirements of NVO3
   solutions, in the case of active-backup fail-over, all of the
   ingress NVEs need to dynamically adapt when the backup NVE GW
   announces itself into the NVO3 overlay immediately following a
   failure of the previously active NVE GW, and update their forwarding
   tables accordingly (e.g., perhaps through data plane learning and/or
   translation of a gratuitous ARP or IPv6 Router Advertisement). Note
   that active-backup fail-over could be used to accomplish a crude
   form of load-balancing by, for example, manually configuring each
   tenant to use a different NVE GW, in a round-robin fashion.

3.4.2.1. Load-balancing

   When using active-active load-balancing across physically separate
   NVE GWs (e.g., two separate chassis), an NVO3 solution SHOULD
   support forwarding tables that can simultaneously map a single
   egress NVE to more than one NVO3 tunnel. The granularity of such
   mappings, in both active-backup and active-active, MUST be specific
   to each tenant.

3.4.2.2. Triangular Routing Issues

   An L2/ELAN over NVO3 service may span multiple racks distributed
   across different DC regions. Multiple ELANs belonging to one tenant
   may be interconnected or connected to the outside world through
   multiple Router/VRF gateways distributed throughout the DC regions.
   In this scenario, without aid from an NVO3 or other type of
   solution, traffic from an ingress NVE destined to external gateways
   will take a non-optimal path that will result in higher latency and
   cost (since it is using more expensive resources of a WAN). In the
   case of traffic from an IP/MPLS network destined toward the entrance
   to an NVO3 overlay, well-known IP routing techniques MAY be used to
   optimize traffic into the NVO3 overlay (at the expense of additional
   routes in the IP/MPLS network). In summary, these issues are well
   known as triangular routing (a.k.a. traffic tromboning).

   Procedures for gateway selection to avoid triangular routing issues
   SHOULD be provided.

   The details of such procedures are, most likely, part of the NVO3
   Management and/or Control Plane requirements and, thus, out of scope
   of this document. However, a key requirement on the data plane of
   any NVO3 solution to avoid triangular routing is stated above, in
   Section 3.4.2.1, with respect to active-active load-balancing. More
   specifically, an NVO3 solution SHOULD support forwarding tables that
   can simultaneously map a single egress NVE to more than one NVO3
   tunnel.

   The expectation is that, through the Control and/or Management
   Planes, this mapping information may be dynamically manipulated to,
   for example, provide the closest geographic and/or topological exit
   point (egress NVE) for each ingress NVE.
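   The following non-normative Python sketch illustrates the forwarding
   table capability stated above: a per-tenant table in which a single
   logical egress (e.g., an external gateway) maps to more than one
   NVO3 tunnel for active-active load-balancing. The per-flow selection
   policy shown (a hash modulo the number of tunnels) is an
   illustrative assumption; the actual mapping would be manipulated by
   the control and/or management planes.

   # Non-normative sketch of a per-tenant mapping of a logical egress
   # to one or more NVO3 tunnel endpoints.
   import zlib

   GW_TABLE = {
       "tenant-a": {"internet-gw": ["192.0.2.10", "192.0.2.11"]},  # active-active
       "tenant-b": {"internet-gw": ["192.0.2.12"]},                # single active GW
   }

   def select_tunnel(tenant, egress, flow_key):
       tunnels = GW_TABLE[tenant][egress]
       # Keep all packets of one flow on the same tunnel to avoid
       # re-ordering; distribute different flows across the tunnels.
       return tunnels[zlib.crc32(flow_key) % len(tunnels)]

   print(select_tunnel("tenant-a", "internet-gw",
                       b"198.51.100.7->203.0.113.5"))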
3.5. Path MTU

   The tunnel overlay header can cause the MTU of the path to the
   egress tunnel endpoint to be exceeded.

   IP fragmentation SHOULD be avoided for performance reasons.

   The interface MTU as seen by a Tenant System SHOULD be adjusted such
   that no fragmentation is needed. This can be achieved by
   configuration or discovered dynamically.

   At least one of the following options MUST be supported:

   o Classical ICMP-based Path MTU Discovery [RFC1191] [RFC1981] or
     extended Path MTU Discovery techniques such as those defined in
     [RFC4821]

   o Segmentation and reassembly support at the overlay layer, without
     relying on the Tenant Systems to know about the end-to-end MTU

   o The underlay network MAY be designed in such a way that the MTU
     can accommodate the extra tunnel overhead.

3.6. Hierarchical NVE dataplane requirements

   It might be desirable to support the concept of hierarchical NVEs,
   such as spoke NVEs and hub NVEs, in order to address possible NVE
   performance limitations and service connectivity optimizations.

   For instance, spoke NVE functionality may be used when processing
   capabilities are limited. In this case, a hub NVE MUST provide
   additional data processing capabilities such as packet replication.

3.7. Other considerations

3.7.1. Data Plane Optimizations

   Data plane forwarding and encapsulation choices SHOULD consider the
   limitations of possible NVE implementations, specifically software-
   based implementations (e.g., servers running virtual switches).

   NVEs SHOULD provide efficient processing of traffic. For instance,
   packet alignment, the use of offsets to minimize header parsing, and
   padding techniques SHOULD be considered when designing NVO3
   encapsulation types.

   The NVO3 encapsulation/decapsulation processing in software-based
   NVEs SHOULD make use of hardware assist provided by NICs in order to
   speed up packet processing.

3.7.2. NVE location trade-offs

   In the case of DC traffic, traffic originating from a VM is native
   Ethernet traffic. This traffic can be switched by a local VM switch
   or ToR switch and then by a DC gateway. The NVE function can be
   embedded within any of these elements.

   The NVE function can be supported in various DC network elements
   such as a VM, VM switch, ToR switch or DC GW.

   The following criteria SHOULD be considered when deciding where the
   NVE processing boundary is located:

   o Processing and memory requirements

   o Datapath (e.g., lookups, filtering,
     encapsulation/decapsulation)

   o Control plane processing (e.g., routing, signaling, OAM)

   o FIB/RIB size

   o Multicast support

   o Routing protocols

   o Packet replication capability

   o Fragmentation support

   o QoS transparency

   o Resiliency

4. Security Considerations

   This requirements document does not raise in itself any specific
   security issues.

5. IANA Considerations

   IANA does not need to take any action for this draft.

6. References

6.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2. Informative References

   [NVOPS]   Narten, T., et al., "Problem Statement: Overlays for
             Network Virtualization", draft-narten-nvo3-overlay-
             problem-statement (work in progress).

   [NVO3-framework] Lasserre, M., et al., "Framework for DC Network
             Virtualization", draft-lasserre-nvo3-framework (work in
             progress).
et al, "Network Virtualization Overlay Control 726 Protocol Requirements", draft-kreeger-nvo3-overlay-cp 727 (work in progress) 729 [FLOYD] Sally Floyd, Allyn Romanow, "Dynamics of TCP Traffic over 730 ATM Networks", IEEE JSAC, V. 13 N. 4, May 1995 732 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 733 Networks (VPNs)", RFC 4364, February 2006. 735 [RFC1191] Mogul, J. "Path MTU Discovery", RFC1191, November 1990 737 [RFC1981] McCann, J. et al, "Path MTU Discovery for IPv6", RFC1981, 738 August 1996 740 [RFC4821] Mathis, M. et al, "Packetization Layer Path MTU 741 Discovery", RFC4821, March 2007 743 [RFC2983] Black, D. "Diffserv and tunnels", RFC2983, Cotober 2000 745 [RFC6040] Briscoe, B. "Tunnelling of Explicit Congestion 746 Notification", RFC6040, November 2010 748 [RFC6438] Carpenter, B. et al, "Using the IPv6 Flow Label for Equal 749 Cost Multipath Routing and Link Aggregation in Tunnels", 750 RFC6438, November 2011 752 [RFC6391] Bryant, S. et al, "Flow-Aware Transport of Pseudowires 753 over an MPLS Packet Switched Network", RFC6391, November 754 2011 756 7. Acknowledgments 758 In addition to the authors the following people have contributed to 759 this document: 761 Shane Amante, David Black, Dimitrios Stiliadis, Rotem Salomonovitch, 762 Larry Kreeger, Eric Gray and Erik Nordmark. 764 This document was prepared using 2-Word-v2.0.template.dot. 766 Authors' Addresses 768 Nabil Bitar 769 Verizon 770 40 Sylvan Road 771 Waltham, MA 02145 772 Email: nabil.bitar@verizon.com 774 Marc Lasserre 775 Alcatel-Lucent 776 Email: marc.lasserre@alcatel-lucent.com 778 Florin Balus 779 Alcatel-Lucent 780 777 E. Middlefield Road 781 Mountain View, CA, USA 94043 782 Email: florin.balus@alcatel-lucent.com 784 Thomas Morin 785 France Telecom Orange 786 Email: thomas.morin@orange.com 788 Lizhong Jin 789 Email : lizho.jin@gmail.com 791 Bhumip Khasnabish 792 ZTE 793 Email : Bhumip.khasnabish@zteusa.com