idnits 2.17.1

draft-bitar-lasserre-nvo3-dp-reqs-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  -- The document date (May 15, 2012) is 4364 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'NVOPS' is defined on line 729, but no explicit
     reference was found in the text

  == Unused Reference: 'OVCPREQ' is defined on line 737, but no explicit
     reference was found in the text

  == Unused Reference: 'FLOYD' is defined on line 741, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4364' is defined on line 744, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)

  Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

  Run idnits with the --verbose option for more detailed information about
  the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                              Nabil Bitar
Internet Draft                                                   Verizon
Intended status: Informational
Expires: November 2012                                     Marc Lasserre
                                                            Florin Balus
                                                          Alcatel-Lucent

                                                            May 15, 2012

                      NVO3 Data Plane Requirements
                 draft-bitar-lasserre-nvo3-dp-reqs-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on November 15, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   Several IETF drafts relate to the use of overlay networks to support
   large scale virtual data centers.
   This draft provides a list of data plane requirements for Network
   Virtualization over L3 (NVO3) that have to be addressed in solutions
   documents.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. General terminology
   2. Data Path Overview
   3. Data Plane Requirements
      3.1. Virtual Access Points (VAPs)
      3.2. Virtual Network Instance (VNI)
         3.2.1. L2 VNI
         3.2.2. L3 VNI
      3.3. Overlay Module
         3.3.1. NVO3 overlay header
            3.3.1.1. Virtual Network Identifier (VNID)
            3.3.1.2. Service QoS identifier
         3.3.2. NVE Tunneling function
            3.3.2.1. LAG and ECMP
            3.3.2.2. DiffServ and ECN marking
            3.3.2.3. Handling of BUM traffic
      3.4. External NVO3 connectivity
         3.4.1. GW Types
            3.4.1.1. VPN and Internet GWs
            3.4.1.2. Inter-DC GW
            3.4.1.3. Intra-DC gateways
         3.4.2. Path optimality between NVEs and Gateways
            3.4.2.1. Triangular Routing Issues, a.k.a. Traffic Tromboning
      3.5. Path MTU
      3.6. Hierarchical NVE
      3.7. NVE Multi-Homing Requirements
      3.8. OAM
      3.9. Other considerations
         3.9.1. Data Plane Optimizations
         3.9.2. NVE location trade-offs
   4. Security Considerations
   5. IANA Considerations
   6. References
      6.1. Normative References
      6.2. Informative References
   7. Acknowledgments

1. Introduction

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS.  Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

1.2. General terminology

   The terminology defined in [NVO3-framework] is used throughout this
   document.  Terminology specific to this memo is defined here and is
   introduced as needed in later sections.
   DC: Data Center

   BUM: Broadcast, Unknown Unicast, Multicast traffic

   TES: Tenant End System

   VAP: Virtual Access Point

   VNI: Virtual Network Instance

   VNID: VNI ID

2. Data Path Overview

   The NVO3 framework [NVO3-framework] defines the generic NVE model
   depicted in Figure 1:

                       +------- L3 Network ------+
                       |                         |
                       |       Tunnel Overlay    |
          +------------+--------+       +--------+------------+
          | +----------+------+ |       | +------+----------+ |
          | | Overlay Module  | |       | | Overlay Module  | |
          | +--------+--------+ |       | +--------+--------+ |
          |          |VNID      |       |          |VNID      |
          |          |          |       |          |          |
          |  +-------+-------+  |       |  +-------+-------+  |
          |  |      VNI      |  |       |  |      VNI      |  |
     NVE1 |  +-+-----------+-+  |       |  +-+-----------+-+  | NVE2
          |    |   VAPs    |    |       |    |   VAPs    |    |
          +----+-----------+----+       +----+-----------+----+
               |           |                 |           |
        -------+-----------+-----------------+-----------+-------
               |           |     Tenant      |           |
               |           |   Service IF    |           |
              Tenant End Systems            Tenant End Systems

            Figure 1 : Generic reference model for NV Edge

   When a frame is received by an ingress NVE from a Tenant End System
   over a local VAP, it needs to be parsed in order to identify which
   virtual network instance it belongs to.  The parsing function can
   examine various fields in the data frame (e.g., VLAN ID) and/or the
   associated interface/port on which the frame was received.

   Once a corresponding VNI is identified, a lookup is performed to
   determine where the frame needs to be sent.  This lookup can be
   based on any combination of fields in the data frame (e.g.,
   source/destination MAC addresses and/or source/destination IP
   addresses).  Note that additional criteria such as 802.1p and/or
   DSCP markings can be used to select an appropriate tunnel or local
   VAP destination.

   Lookup tables can be populated using different techniques: data
   plane learning, management plane configuration, or a distributed
   control plane.  Management and control planes are not in the scope
   of this document.  Data plane learning is described in this document
   because it has implications on the data plane processing function.

   The result of this lookup yields the corresponding tunnel
   information needed to build the overlay encapsulation header.  This
   information includes the destination L3 address of the egress NVE.
   Note that this lookup might yield a list of tunnels, for example
   when ingress replication is used for BUM traffic.

   The overlay tunnel encapsulation header MUST include a context
   identifier which the egress NVE will use to identify which VNI this
   frame belongs to.

   The egress NVE checks the context identifier, removes the
   encapsulation header, and forwards the original frame towards the
   appropriate recipient, usually a local VAP.
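   The following non-normative sketch (in Python, with hypothetical
   class and field names) illustrates this ingress processing sequence.
   It does not imply any particular encapsulation format or
   implementation; the dictionary-based "packets" and the flood list
   are placeholders for whatever a solution defines.

      # Hypothetical sketch of the ingress NVE data path described above.

      class Vni:
          """Per-tenant virtual network instance: a VNID plus a private FIB."""
          def __init__(self, vnid):
              self.vnid = vnid
              self.fib = {}          # tenant destination address -> egress NVE address
              self.flood_list = []   # egress NVEs used for BUM traffic (ingress replication)

      class IngressNve:
          def __init__(self):
              self.vap_to_vni = {}   # VAP -> VNI binding, set at service instantiation

          def receive(self, vap, frame):
              vni = self.vap_to_vni[vap]              # 1. identify the VNI from the VAP
              egress = vni.fib.get(frame["dst"])      # 2. lookup within the VNI
              targets = [egress] if egress else list(vni.flood_list)
              # 3. encapsulate: tunnel header towards each egress NVE, carrying the VNID
              return [{"outer_dst": t, "vnid": vni.vnid, "inner": frame} for t in targets]

      # Example: one L2 VNI bound to the VAP (port "eth0", VLAN 10)
      nve, vni = IngressNve(), Vni(vnid=1001)
      vni.fib["00:aa:bb:cc:dd:ee"] = "192.0.2.2"      # learned or provisioned entry
      vni.flood_list = ["192.0.2.2", "192.0.2.3"]
      nve.vap_to_vni[("eth0", 10)] = vni
      print(nve.receive(("eth0", 10), {"dst": "00:aa:bb:cc:dd:ee", "payload": b"..."}))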
3. Data Plane Requirements

3.1. Virtual Access Points (VAPs)

   The NVE forwarding plane MUST support VAP identification through the
   following mechanisms:

   - Using the local interface on which the frames are received, where
     the local interface may be an internal, virtual port in a VSwitch
     or a physical port on the ToR

   - Using the local interface and some fields in the frame header,
     e.g. one or more VLAN IDs or the source MAC address

3.2. Virtual Network Instance (VNI)

   VAPs are associated with a specific VNI at service instantiation
   time.

   A VNI identifies a per-tenant private context, i.e. per-tenant
   policies and a FIB table, allowing overlapping address spaces
   between tenants.

   There are different VNI types, differentiated by the virtual network
   service they provide to Tenant End Systems.  Network virtualization
   can be provided by L2 and/or L3 VNIs.

3.2.1. L2 VNI

   An L2 VNI MUST provide an emulated Ethernet multipoint service, as
   if Tenant End Systems were interconnected by an 802.1Q LAN, over a
   set of NVO3 tunnels.  An L2 VNI provides a per-tenant virtual
   switching instance with MAC addressing isolation and L3 tunneling.
   Loop avoidance capability MUST be provided.

   In the absence of a management or control plane, data plane learning
   MUST be used to populate forwarding tables.  Forwarding table
   entries provide mapping information between MAC addresses and L3
   tunnel destination addresses.  As frames arrive from VAPs or from
   overlay tunnels, the MAC learning procedures described in IEEE
   802.1Q are used: the source MAC address is learned against the VAP
   or the NVO3 tunnel on which the frame arrived.

   Broadcast, Unknown Unicast and Multicast (BUM) traffic handling MUST
   be supported.  To achieve this, the NVE must be able to build at
   least a default flooding tree per VNI.  The flooding tree is
   equivalent to a multicast (*,G) construct in which all the NVEs
   where the corresponding VNI is instantiated are members.

   The ability to establish or pre-provision multicast trees at the NVE
   SHOULD be supported.

   It MUST also be possible to select whether the NVE provides
   optimized multicast trees inside the VNI for individual tenant
   multicast groups or whether the default VNI flooding tree is used.
   If the former option is selected, the VNI MUST be able to snoop
   IGMP/MLD messages in order to efficiently join/prune End Systems
   from multicast trees.

3.2.2. L3 VNI

   L3 VNIs MUST provide virtualized IP routing and forwarding.  L3 VNIs
   MUST support a per-tenant routing instance with IP addressing
   isolation and L3 tunneling for interconnecting instances of the same
   VNI on NVEs.

   In the case of an L3 VNI, the inner TTL field MUST be decremented by
   1 as if the NVO3 egress NVE were one hop away.  The TTL field in the
   outer IP header must be set to a value appropriate for delivery of
   the encapsulated frame to the tunnel exit point.  Thus, the default
   behavior must be the TTL pipe model, where the overlay network looks
   like one hop to the sending NVE.  Configuration of the uniform TTL
   model, where the outer tunnel TTL is set equal to the inner TTL on
   the ingress NVE and the inner TTL is set to the outer TTL value on
   egress, should be supported.

   L2 and L3 VNIs can be deployed in isolation or in combination to
   optimize traffic flows per tenant across the overlay network.  For
   example, an L2 VNI may be configured across a number of NVEs to
   offer L2 multi-point service connectivity while an L3 VNI can be
   co-located to offer local routing capabilities and gateway
   functionality.  In addition, integrated routing and bridging per
   tenant must be supported on an NVE.  An instantiation of such a
   service may be realized by interconnecting an L2 VNI as access to an
   L3 VNI on the NVE.

   The L3 VNI does not require support for Broadcast and Unknown
   Unicast traffic.  The L3 VNI MAY provide support for customer
   multicast groups.  This paragraph will be expanded in a future
   version of the draft.
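   As a non-normative illustration of the TTL handling described above
   for an L3 VNI, the following Python sketch shows the pipe model
   (default) and the uniform model.  The default outer TTL of 64, the
   function names, and the exact point at which the inner TTL is
   decremented are assumptions made for the example only.

      DEFAULT_OUTER_TTL = 64   # assumed locally configured value for the pipe model

      def encapsulate_ttl(inner_ttl, uniform_model=False):
          """Ingress NVE: decrement the tenant TTL by 1 (the overlay is one
          hop), then choose the outer TTL according to the configured model."""
          inner_ttl -= 1
          outer_ttl = inner_ttl if uniform_model else DEFAULT_OUTER_TTL
          return inner_ttl, outer_ttl

      def decapsulate_ttl(inner_ttl, outer_ttl, uniform_model=False):
          """Egress NVE: the pipe model leaves the inner TTL untouched; the
          uniform model copies the received outer TTL back into the inner header."""
          return outer_ttl if uniform_model else inner_ttl

      # Pipe model (default): the tenant packet enters with TTL 20
      inner, outer = encapsulate_ttl(20)                      # inner 19, outer 64
      print(decapsulate_ttl(inner, outer - 3))                # 19: underlay hops are invisible

      # Uniform model: the tenant sees the underlay hops
      inner, outer = encapsulate_ttl(20, uniform_model=True)  # inner 19, outer 19
      print(decapsulate_ttl(inner, outer - 3, uniform_model=True))  # 16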
3.3. Overlay Module

   The overlay module performs a number of functions related to NVO3
   header and tunnel processing.  Specifically, for an L2 VNI it
   provides the capability to encapsulate and send Ethernet traffic
   over NVO3 tunnels.  For an L3 VNI it provides the capability to
   encapsulate and carry IP traffic (both IPv4 and IPv6) over NVO3
   tunnels.

3.3.1. NVO3 overlay header

   An NVO3 overlay header MUST be included after the tunnel
   encapsulation header when forwarding tenant traffic.  This section
   describes the fields that need to be included as part of the NVO3
   overlay header.  In this version the focus is on the VN instance and
   service QoS fields.  Future versions may include additional fields.

3.3.1.1. Virtual Network Identifier (VNID)

   A VNID MUST be included in the overlay encapsulation header on the
   ingress NVE when encapsulating a tenant Ethernet frame or IP packet
   on the overlay tunnel.  The egress NVE uses the VNID to identify the
   VN context in which the encapsulated frame should be processed.  The
   VNID can be either a globally unique identifier (within an
   administrative domain) or a locally significant identifier.  It MUST
   be easily parsed and processed by the data path and MAY be
   distributable by a control plane or configured via a management
   plane.

   The VNID MUST be large enough to scale to hundreds of thousands of
   virtual networks.

3.3.1.2. Service QoS identifier

   Traffic flows originating from different applications could rely on
   differentiated forwarding treatment to meet end-to-end availability
   and performance objectives.  Such applications may span across one
   or more overlay networks.  To enable such treatment, support for
   multiple Classes of Service across or between overlay networks is
   required.

   To effectively enforce CoS across or between overlay networks, NVEs
   should be able to map CoS markings between networking layers, e.g.,
   Tenant End Systems, Overlays, and/or Underlay, enabling each
   networking layer to independently enforce its own CoS policies.  For
   example:

   - TES (e.g. VM) CoS

     o Tenant CoS policies MAY be defined by Tenant administrators

     o QoS fields (e.g. IP DSCP and/or Ethernet 802.1p) in the tenant
       frame are used to indicate application level CoS requirements

   - NVE CoS

     o The NVE MAY classify packets based on Tenant CoS markings or
       other mechanisms (e.g. DPI) to identify the proper service CoS
       to be applied across the overlay network

     o NVE service CoS levels are normalized to a common set (for
       example 8 levels) across multiple tenants; the NVE uses per-
       tenant policies to map Tenant CoS to the normalized service CoS
       fields in the NVO3 header

   - Underlay CoS

     o The underlay/core network may use a different CoS set (for
       example 4 levels) than the NVE CoS, as the core devices may have
       different QoS capabilities compared with NVEs.

     o The Underlay CoS may also change as the NVO3 tunnels pass
       between different domains.

   Support for NVE Service CoS SHOULD be provided through a QoS field
   inside the NVO3 overlay header.  Examples of service CoS carried as
   part of a service tag are the 802.1p and DE bits in VLAN and PBB
   I-SID tags, and the MPLS EXP bits in VPN labels.

3.3.2. NVE Tunneling function

   This section describes NVE tunneling requirements.  From an
   encapsulation perspective, IPv4 and IPv6 encapsulations MUST be
   supported; MPLS tunneling MAY be supported.
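   The following non-normative sketch shows one hypothetical way the
   overlay header fields discussed in Section 3.3.1 could be laid out:
   an 8-octet header carrying a 24-bit VNID and a 3-bit normalized
   service CoS.  The field sizes, their placement, and the per-tenant
   CoS mapping table are assumptions for illustration only; this
   document does not define an encapsulation format.

      import struct

      def map_tenant_cos_to_service_cos(tenant_pcp, per_tenant_policy):
          """Normalize a tenant 802.1p/DSCP marking to the common service
          CoS set using a per-tenant policy table (Section 3.3.1.2)."""
          return per_tenant_policy.get(tenant_pcp, 0)

      def build_overlay_header(vnid, service_cos):
          assert 0 <= vnid < 2**24      # large enough for hundreds of thousands of VNs
          assert 0 <= service_cos < 8   # normalized service CoS (e.g. 8 levels)
          first_octet = service_cos << 5            # carry the CoS in the first octet
          return struct.pack("!B3x3sB", first_octet, vnid.to_bytes(3, "big"), 0)

      hdr = build_overlay_header(vnid=1001,
                                 service_cos=map_tenant_cos_to_service_cos(5, {5: 3}))
      print(hdr.hex())   # 8 octets, inserted after the outer IPv4/IPv6 tunnel header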
3.3.2.1. LAG and ECMP

   For performance reasons, multipath over LAG and ECMP paths SHOULD be
   supported.

   LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal
   Cost Multi Path) are commonly used techniques to perform load-
   balancing of microflows over a set of parallel links, either at
   Layer-2 (LAG) or Layer-3 (ECMP).  Existing deployed hardware
   implementations of LAG and ECMP use a hash of various fields in the
   encapsulation (outermost) header(s) (e.g. source and destination MAC
   addresses for non-IP traffic, source and destination IP addresses,
   L4 protocol, L4 source and destination port numbers, etc.).
   Furthermore, hardware deployed for the underlay network(s) will most
   often be unaware of the carried, innermost L2 frames or L3 packets
   transmitted by the TES.  Thus, in order to perform fine-grained
   load-balancing over LAG and ECMP paths in the underlying network,
   the NVO3 encapsulation headers and/or tunneling methods MUST contain
   an "entropy field" or "entropy label" so that the underlying network
   can perform fine-grained load-balancing of the NVO3 encapsulated
   traffic (e.g. [RFC6391], [RFC6438], [draft-kompella-mpls-entropy-
   label-02]).  It is recommended that this entropy label/field be
   applied at the ingress VNI, likely using information gleaned from
   the ingress VAP.  If necessary, the entropy label/field will be
   discarded at the egress VNI.

   All packets that belong to a specific flow MUST follow the same path
   in order to prevent packet re-ordering.  This is typically achieved
   by ensuring that the fields used for hashing are identical for a
   given flow.

   All paths available to the overlay network SHOULD be used
   efficiently.  Different flows SHOULD be distributed as evenly as
   possible across multiple underlay network paths.  This can be
   achieved by ensuring that some fields used for hashing are randomly
   generated.

3.3.2.2. DiffServ and ECN marking

   When traffic is encapsulated in a tunnel header, there are numerous
   options as to how the Diffserv Code-Point (DSCP) and Explicit
   Congestion Notification (ECN) markings are set in the outer header
   and propagated to the inner header on decapsulation.

   [RFC2983] defines two modes for mapping the DSCP markings from inner
   to outer headers and vice versa.  The Uniform model copies the inner
   DSCP marking to the outer header on tunnel ingress, and copies that
   outer header value back to the inner header at tunnel egress.  The
   Pipe model sets the DSCP value to some value based on local policy
   at ingress and does not modify the inner header on egress.  Both
   models SHOULD be supported.

   ECN marking MUST be performed according to [RFC6040], which
   describes the correct ECN behavior for IP tunnels.

3.3.2.3. Handling of BUM traffic

   NVO3 data plane support for either ingress replication or point-to-
   multipoint tunnels is required to send traffic destined to multiple
   locations on a per-VNI basis (e.g. L2/L3 multicast traffic, L2
   broadcast and unknown unicast traffic).  It is possible for both
   methods to be used simultaneously.

   L2 NVEs MUST support ingress replication and SHOULD support point-
   to-multipoint tunnels.  L3 VNIs MAY support either one of the two
   methods.
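   The following non-normative sketch illustrates the ingress
   replication approach: the NVE keeps, for each VNI, the list of
   remote tunnel endpoints where that VNI is instantiated, and unicasts
   one copy of a BUM frame to each of them.  The class and field names
   are hypothetical, and the split-horizon check shown is one possible
   way to satisfy the loop avoidance requirement of Section 3.2.1.

      class BumReplicator:
          def __init__(self):
              self.flood_lists = {}    # vnid -> set of remote NVE tunnel endpoints

          def add_endpoint(self, vnid, nve_address):
              self.flood_lists.setdefault(vnid, set()).add(nve_address)

          def replicate(self, vnid, frame, arrival_endpoint=None):
              # Split horizon: never send the copy back towards the overlay
              # tunnel it arrived on.
              targets = self.flood_lists.get(vnid, set()) - {arrival_endpoint}
              return [{"outer_dst": t, "vnid": vnid, "inner": frame}
                      for t in sorted(targets)]

      rep = BumReplicator()
      for address in ("192.0.2.2", "192.0.2.3", "192.0.2.4"):
          rep.add_endpoint(1001, address)
      # A broadcast frame received over a local VAP is replicated to all
      # three remote NVEs where VNI 1001 is instantiated:
      print(len(rep.replicate(1001, b"broadcast-frame")))    # 3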
   There is a bandwidth vs. state trade-off between the two approaches.
   User-definable knobs MUST be provided to select which method(s) is
   used based upon the amount of replication required (i.e. the number
   of hosts per group), the amount of multicast state to maintain, the
   duration of multicast flows and the scalability of multicast
   protocols.

   When ingress replication is used, NVEs must track for each VNI the
   related tunnel endpoints to which they need to replicate the frame.

   For point-to-multipoint tunnels, the bandwidth efficiency is
   increased at the cost of more state in the Core nodes.  The ability
   to auto-discover or pre-provision the mapping between VNI multicast
   trees and related tunnel endpoints at the NVE and/or throughout the
   core SHOULD be supported.

3.4. External NVO3 connectivity

   NVO3 services MUST interoperate with current VPN and Internet
   services.  This may happen inside one DC during a migration phase or
   as NVO3 services are delivered to the outside world via Internet or
   VPN gateways.

   Moreover, the compute and storage services delivered by an NVO3
   domain may span multiple DCs, requiring Inter-DC connectivity.  From
   a DC perspective, a set of gateway devices is required in all of
   these cases, albeit with different functionalities influenced by the
   overlay type used across the WAN, the service type, and the DC
   network technologies used at each DC site.

   A GW handling the connectivity between NVO3 and external domains
   represents a single point of failure that may affect multiple tenant
   services.  Redundancy between NVO3 and external domains MUST be
   supported.

3.4.1. GW Types

3.4.1.1. VPN and Internet GWs

   Tenant sites may already be interconnected using one of the existing
   VPN services and technologies (VPLS or IP VPN).  A VPN GW is
   required to translate between NVO3 and VPN encapsulation and
   forwarding procedures.  Internet-connected tenants require
   translation from the NVO3 encapsulation to IP in the NVO3 gateway.
   The translation function SHOULD NOT require provisioning touches and
   SHOULD NOT use intermediate hand-offs, for example VLANs.

3.4.1.2. Inter-DC GW

   Inter-DC connectivity may be required to provide support for
   features like disaster prevention or compute load re-distribution.
   This may be provided through a set of gateways interconnected
   through a WAN.  This type of connectivity may be provided either
   through extension of the NVO3 tunneling domain or via VPN GWs.

3.4.1.3. Intra-DC gateways

   Even within one DC there may be End Devices that do not support NVO3
   encapsulation, for example bare metal servers, hardware appliances
   and storage.  A gateway device, e.g. a ToR, is required to translate
   between the NVO3 and Ethernet VLAN encapsulations.
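   The following non-normative sketch illustrates the kind of
   translation such an intra-DC gateway performs: NVO3-encapsulated
   traffic towards a non-NVO3 End Device is decapsulated and re-tagged
   with a locally provisioned VLAN, and vice versa.  The VNID-to-VLAN
   table and the dictionary-based packet representation are
   hypothetical.

      class IntraDcGateway:
          def __init__(self):
              self.vnid_to_vlan = {}      # provisioned mapping on this gateway

          def from_overlay(self, nvo3_packet):
              vlan = self.vnid_to_vlan[nvo3_packet["vnid"]]    # VN context -> local VLAN
              return {"vlan": vlan, "frame": nvo3_packet["inner"]}

          def to_overlay(self, vlan, frame, egress_nve):
              vnid = next(v for v, vl in self.vnid_to_vlan.items() if vl == vlan)
              return {"outer_dst": egress_nve, "vnid": vnid, "inner": frame}

      gw = IntraDcGateway()
      gw.vnid_to_vlan[1001] = 300         # VNI 1001 appears as VLAN 300 behind the GW
      print(gw.from_overlay({"outer_dst": "gw", "vnid": 1001, "inner": b"tenant-frame"}))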
3.4.2. Path optimality between NVEs and Gateways

   Within the NVO3 overlay, a default assumption is that NVO3 traffic
   will be equally load-balanced across the underlying network
   consisting of LAG and/or ECMP paths.  This assumption is valid only
   as long as: a) all traffic is load-balanced equally among each of
   the component-links and paths; and b) each of the component-
   links/paths is of identical capacity.  During the course of normal
   operation of the underlying network, it is possible that one or more
   of the component-links/paths of a LAG may be taken out of service in
   order to be repaired, e.g. due to hardware failure of cabling,
   optics, etc.  In such cases, the administrator should configure the
   underlying network such that an entire LAG bundle will be reported
   as operationally down if there is a failure of any single component-
   link member of the LAG bundle (e.g. an N = M configuration of the
   LAG bundle), and thus they know that traffic will be carried
   sufficiently by alternate, available (potentially ECMP) paths in the
   underlying network.  This is likely an adequate assumption for
   Intra-DC traffic, where presumably the cost of additional protection
   capacity along alternate paths is not prohibitive.  Thus, there are
   likely no additional requirements on NVO3 solutions to accommodate
   this type of underlying network configuration and administration.

   There is a similar case with ECMP, used Intra-DC, where failure of a
   single component-path of an ECMP group would result in traffic
   shifting onto the surviving members of the ECMP group.
   Unfortunately, there are no automatic recovery methods in IP routing
   protocols to detect a simultaneous failure of more than one
   component-path in an ECMP group, operationally disable the entire
   ECMP group and allow traffic to shift onto alternative paths.  This
   problem is attributable to the underlying network and, thus, out of
   scope of any NVO3 solutions.

   On the other hand, for Inter-DC and DC to External Network cases
   that use a WAN, the costs of the underlying network and/or service
   (e.g. an IP VPN service) are higher; therefore, there is a
   requirement on administrators to both: a) ensure high availability
   (active-backup failover or active-active load-balancing); and b)
   maintain substantial utilization of the WAN transport capacity at
   nearly all times, particularly in the case of active-active load-
   balancing.  With respect to the data plane requirements of NVO3
   solutions, in the case of active-backup fail-over, all of the
   ingress NVEs MUST dynamically learn of the failure of an active NVE
   GW when the backup NVE GW announces itself into the NVO3 overlay
   immediately following a failure of the previously active NVE GW, and
   update their forwarding tables accordingly (e.g. through data plane
   learning and/or translation of a gratuitous ARP, IPv6 Router
   Advertisement, etc.).  Note that active-backup fail-over could be
   used to accomplish a crude form of load-balancing by, for example,
   manually configuring each tenant to use a different NVE GW, in a
   round-robin fashion.  On the other hand, with respect to active-
   active load-balancing across physically separate NVE GWs (e.g. two
   separate chassis), an NVO3 solution SHOULD support forwarding tables
   that can simultaneously map a single egress NVE to more than one
   NVO3 tunnel.  The granularity of such mappings, in both active-
   backup and active-active, MUST be unique to each tenant.
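   A non-normative sketch of the forwarding-table behaviour described
   above is given below: a single (tenant, egress) entry maps to more
   than one NVO3 tunnel, so that traffic towards active-active NVE GWs
   can be load-balanced on a per-tenant basis while keeping each flow
   on a single path.  The names and the CRC-based tunnel selection are
   assumptions made for the example.

      import zlib

      class EgressMap:
          def __init__(self):
              self.table = {}      # (tenant, egress NVE) -> list of candidate NVO3 tunnels

          def bind(self, tenant, egress, tunnels):
              self.table[(tenant, egress)] = list(tunnels)

          def select_tunnel(self, tenant, egress, flow_key):
              tunnels = self.table[(tenant, egress)]
              # Keep a given flow on one path to avoid re-ordering.
              return tunnels[zlib.crc32(flow_key) % len(tunnels)]

      fwd = EgressMap()
      fwd.bind("tenant-A", "wan-exit", ["tunnel-to-gw1", "tunnel-to-gw2"])  # active-active GWs
      print(fwd.select_tunnel("tenant-A", "wan-exit", b"10.0.0.1->203.0.113.9"))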
3.4.2.1. Triangular Routing Issues, a.k.a. Traffic Tromboning

   An L2/ELAN over NVO3 service may span multiple racks distributed
   across different DC regions.  Multiple ELANs belonging to one tenant
   may be interconnected or connected to the outside world through
   multiple Router/VRF gateways distributed throughout the DC regions.
   In this scenario, without aid from an NVO3 or other type of
   solution, traffic from an ingress NVE destined to external gateways
   will take a non-optimal path that will result in higher latency and
   costs (since it is using more expensive resources of a WAN).  In the
   case of traffic from an IP/MPLS network destined toward the entrance
   to an NVO3 overlay, well-known IP routing techniques may be used to
   optimize traffic into the NVO3 overlay (at the expense of additional
   routes in the IP/MPLS network).  In summary, these issues are well
   known as triangular routing.

   Procedures for gateway selection to avoid triangular routing issues
   SHOULD be provided.  The details of such procedures are, most
   likely, part of the NVO3 Management and/or Control Plane
   requirements and, thus, out of scope of this document.  However, a
   key requirement on the data plane of any NVO3 solution to avoid
   triangular routing is stated above, in Section 3.4.2, with respect
   to active-active load-balancing.  More specifically, an NVO3
   solution SHOULD support forwarding tables that can simultaneously
   map a single egress NVE to more than one NVO3 tunnel.  The
   expectation is that, through the Control and/or Management Planes,
   this mapping information may be dynamically manipulated to, for
   example, provide the closest geographic and/or topological exit
   point (egress NVE) for each ingress NVE.

3.5. Path MTU

   The tunnel overlay header can cause the MTU of the path to the
   egress tunnel endpoint to be exceeded.

   IP fragmentation should be avoided for performance reasons.

   The interface MTU as seen by a Tenant End System SHOULD be adjusted
   such that no fragmentation is needed.  This can be achieved by
   configuration or be discovered dynamically; a worked example is
   given after the list below.

   At least one of the following options MUST be supported:

   o Classical ICMP-based Path MTU Discovery [RFC1191] [RFC1981] or
     Extended Path MTU Discovery techniques such as defined in
     [RFC4821]

   o Segmentation and reassembly support from the overlay layer
     operations without relying on the Tenant End Systems to know about
     the end-to-end MTU

   o The underlay network may be designed in such a way that the MTU
     can accommodate the extra tunnel overhead.
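   As a worked example of the MTU adjustment mentioned above, assume an
   Ethernet underlay with a 1500-byte MTU and an outer IPv4 + UDP
   transport carrying the hypothetical 8-octet overlay header of
   Section 3.3.1; the actual overhead depends on the encapsulation a
   solution defines.

      UNDERLAY_MTU   = 1500   # bytes, typical Ethernet underlay
      OUTER_IPV4     = 20     # an outer IPv6 header would add 40 instead
      OUTER_UDP      = 8
      OVERLAY_HEADER = 8      # assumed NVO3 overlay header size
      INNER_ETHERNET = 14     # tenant Ethernet header carried inside the tunnel (L2 VNI)

      tenant_mtu = UNDERLAY_MTU - (OUTER_IPV4 + OUTER_UDP + OVERLAY_HEADER + INNER_ETHERNET)
      print(tenant_mtu)   # 1450: the interface MTU to present to the Tenant End System,
                          # unless the underlay MTU is raised to absorb the overhead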
3.6. Hierarchical NVE

   It might be desirable to support the concept of hierarchical NVEs,
   such as spoke NVEs and hub NVEs, in order to address possible NVE
   performance limitations and service connectivity optimizations.

   For instance, the amount of tunneling state to handle within a
   single NVE may be too large.  This can happen with both IP-based
   tunneling and, more specifically, with MPLS-based tunneling.

   NVEs can be connected in either an any-to-any or a hub-and-spoke
   topology on a per-VNI basis.

3.7. NVE Multi-Homing Requirements

   Multi-homing to a set of NVEs may be required in certain scenarios:

   . End Device dual-homed to two ToR switches acting as NVEs

   . Multi-homing into NVE-GWs providing connectivity between domains
     using different technologies

   . Hierarchical NVEs: Spoke NVE multi-homed to Hub NVEs

   This section will be extended in the next revision.

3.8. OAM

   An NVE may be able to originate/terminate OAM messages for
   connectivity verification, performance monitoring, statistics
   gathering and fault isolation.  Depending on configuration, NVEs
   SHOULD be able to process or transparently tunnel OAM messages, as
   well as support alarm propagation capabilities.

   Given the critical requirement to load-balance NVO3 encapsulated
   packets over LAG and ECMP paths, it will be equally critical to
   ensure existing and/or new OAM tools allow NVE administrators to
   proactively and/or reactively monitor the health of the various
   component-links that comprise both LAG and ECMP paths carrying NVO3
   encapsulated packets.  For example, it will be important that such
   OAM tools allow NVE administrators to reveal the set of underlying
   network hops (topology) so that the underlying network
   administrators can use this information to quickly perform fault
   isolation and restore the underlying network.

   The NVE MUST provide the ability to reveal the set of ECMP and/or
   LAG paths used by NVO3 encapsulated packets in the underlying
   network from an ingress NVE to an egress NVE.  The NVE MUST provide
   a "ping"-like functionality that may be used to determine the health
   (liveness) of remote NVEs or their VNIs.  The NVE SHOULD provide a
   "ping"-like functionality to more expeditiously aid in
   troubleshooting performance problems, e.g. blackholing or other
   types of congestion occurring in the underlying network, for NVO3
   encapsulated packets carried over LAG and/or ECMP paths.

3.9. Other considerations

3.9.1. Data Plane Optimizations

   Data plane forwarding and encapsulation choices SHOULD consider the
   limitations of possible NVE implementations, specifically software-
   based implementations (e.g. servers running VSwitches).

   NVEs should provide efficient processing of traffic.  For instance,
   packet alignment, the use of offsets to minimize header parsing, and
   padding techniques SHOULD be considered when designing NVO3
   encapsulation types.

   Software-based NVEs SHOULD make use of hardware assist provided by
   NICs in order to speed up packet processing.

3.9.2. NVE location trade-offs

   In the case of DC traffic, traffic originating from a VM is native
   Ethernet traffic.  This traffic can be switched by a local VM switch
   or ToR switch and then by a DC gateway.  The NVE function can be
   embedded within any of these elements.

   The NVE function can be supported in various DC network elements
   such as a VM, VM switch, ToR switch or DC GW.

   The following criteria SHOULD be considered when deciding where the
   NVE processing boundary is located:

   o Processing and memory requirements

   o Datapath (e.g. lookups, filtering, encapsulation/decapsulation)

   o Control plane processing (e.g. routing, signaling, OAM)

   o FIB/RIB size

   o Multicast support

   o Routing protocols

   o Packet replication capability

   o Fragmentation support

   o QoS transparency

   o Resiliency

4. Security Considerations

   The tenant to overlay mapping function can introduce significant
   security risks if appropriate protocols that can support
   authentication are not used.

   No other new security issues are introduced beyond those described
   already in the related L2VPN and L3VPN RFCs.

5. IANA Considerations

   IANA does not need to take any action for this draft.

6. References

6.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.
6.2. Informative References

   [NVOPS]   Narten, T. et al, "Problem Statement: Overlays for Network
             Virtualization", draft-narten-nvo3-overlay-problem-
             statement (work in progress)

   [NVO3-framework] Lasserre, M. et al, "Framework for DC Network
             Virtualization", draft-lasserre-nvo3-framework (work in
             progress)

   [OVCPREQ] Kreeger, L. et al, "Network Virtualization Overlay Control
             Protocol Requirements", draft-kreeger-nvo3-overlay-cp
             (work in progress)

   [FLOYD]   Sally Floyd, Allyn Romanow, "Dynamics of TCP Traffic over
             ATM Networks", IEEE JSAC, V. 13 N. 4, May 1995

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC1191] Mogul, J., "Path MTU Discovery", RFC 1191, November 1990.

   [RFC1981] McCann, J. et al, "Path MTU Discovery for IPv6", RFC 1981,
             August 1996.

   [RFC4821] Mathis, M. et al, "Packetization Layer Path MTU
             Discovery", RFC 4821, March 2007.

   [RFC2983] Black, D., "Differentiated Services and Tunnels",
             RFC 2983, October 2000.

   [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
             Notification", RFC 6040, November 2010.

   [RFC6438] Carpenter, B. et al, "Using the IPv6 Flow Label for Equal
             Cost Multipath Routing and Link Aggregation in Tunnels",
             RFC 6438, November 2011.

   [RFC6391] Bryant, S. et al, "Flow-Aware Transport of Pseudowires
             over an MPLS Packet Switched Network", RFC 6391, November
             2011.

7. Acknowledgments

   In addition to the authors, the following people have contributed to
   this document:

   Shane Amante, Level3

   Dimitrios Stiliadis, Rotem Salomonovitch, Alcatel-Lucent

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Nabil Bitar
   Verizon
   40 Sylvan Road
   Waltham, MA 02145
   Email: nabil.bitar@verizon.com

   Marc Lasserre
   Alcatel-Lucent
   Email: marc.lasserre@alcatel-lucent.com

   Florin Balus
   Alcatel-Lucent
   777 E. Middlefield Road
   Mountain View, CA, USA 94043
   Email: florin.balus@alcatel-lucent.com