NVO3                                                          S. Dikshit
Internet-Draft                                           A. Sujeet Nayak
Intended status: Standards Track                           Cisco Systems
Expires: December 17, 2017                                 June 15, 2017


                            PMTUD Over Vxlan
                  draft-saum-nvo3-pmtud-over-vxlan-05

Abstract

   Path MTU Discovery between hosts/VMs/servers/end-points connected
   over a Data-Center/Service-Provider Overlay Network is still an
   unaddressed problem. It needs a converged solution to ensure optimal
   usage of network and computational resources for all attached
   end-point devices.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 17, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements
     2.1.  Requirements Language
     2.2.  Solution Requirements
   3.  Problem Description
     3.1.  IPv6 PMTUD Issues
       3.1.1.  Inaccurate MTU relayed to end hosts
       3.1.2.  Packet_Too_Big not-relayed to host
   4.  Solution(s)
     4.1.  Discovery of end-to-end Path MTU
       4.1.1.  ICMP extensions, PMTUD on Vxlan
       4.1.2.  Packet Path Processing
       4.1.3.  ICMP(v6) Error Translation
   5.  Multicast and Anycast Considerations
   6.  Ecmp Considerations
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   There is an operational disconnect between the underlay network
   provisioned as the core network and the overlay network which
   intends to connect islands of customer deployments. The deployments
   can range from cloud based services to storage applications or web
   (over the top) servers hosted on virtual machines or any other end
   devices such as blade servers. Overlay networks are provisioned as
   tunnels leveraging Vxlan (and associated encapsulations such as gpe,
   geneve and gue), NVGRE, MPLS and other overlay encapsulations.

   The end hosts in a typical datacenter deployment are connected to
   devices termed ToR (top of rack) devices. These are the networking
   devices which encapsulate the packet in an overlay construct and
   relay it over the data center core network. A ToR device, however,
   is not always a gateway for an overlay.

   IPv6/IPv4 enabled hosts/end-points triggering PMTUD may not get the
   right (or any) information from (over) the core network. This
   document validates the solution for a Vxlan core network (overlay)
   in a data center deployment. The solution is equally applicable to
   any other tunnel specific core network deployment.

   The proposal in this document formulates an integrated approach
   which falls in line with the OAM modelling discussed in NVO3.

2.  Requirements

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   When used in lowercase, these words convey their typical use in
   common language, and they are not to be interpreted as described in
   [RFC2119].

2.2.  Solution Requirements

   This section describes the advantages of the proposed solution,
   considering deployment in a typical data center core network:

   (a)  Optimal use of bandwidth in the core and client side networks
        of a typical data center deployment.

   (b)  If Vxlan Gateway nodes comply with this solution, black holing
        can be avoided.

   (c)  All end host applications (like web servers) can tailor the MSS
        accordingly against their respective transports.

   (d)  Facilitates seamless integration of IPv6 or dual stack
        applications over IPv4 based overlays and vice versa.

   (e)  The proposed solution is applicable to all encapsulations
        [RFC7348], [I-D.draft-ietf-nvo3-vxlan-gpe],
        [I-D.draft-ietf-nvo3-gue], [I-D.draft-gross-geneve] and
        [RFC7637], although this document uses VXLAN [RFC7348] as the
        use case for describing the problem and the solution.

3.  Problem Description

   Current vendor implementations of Vxlan gateways, ToR devices and
   other network devices that form part of the core data center network
   and are configured with an overlay (tunnel) mechanism to transport
   packets from one customer end point to another are incapable of
   relaying errors encountered in the routing/switching path of their
   own (underlay) network to the customer end points
   (hosts/VMs/blade servers).
   This is appropriate, as the core network should be transparent and
   water-tight with respect to leaking any public (core) network
   information to customer devices (and vice versa), thus ensuring
   isolation between different customers whose traffic is tunneled over
   the same core network.

   For example, the information carried in the IP header of a Vxlan
   encapsulated packet is transparent to the payload (end-point
   generated packet). Hence, any network-specific information related
   to IPv6/IPv4 native functionality is carried to the end-point
   devices, as is the case with an end-to-end private network. The
   information generated in the core network devices while processing
   packets destined to or sourced from end-point devices needs to be
   propagated from the underlay encapsulation to the end customer
   specific payload. This is not directed by any standard, and is also
   not implemented by current deployments of routers and switches.

   Considering that the future holds IPv6-only datacenter deployments,
   IPv6 PMTUD is one of the major casualties, and the problem can
   linger on indefinitely if it is not dealt with now. This document
   nevertheless intends to resolve the PMTUD problem as a generic one
   across all underlay encapsulations.

   Note that the terms "ICMP(V6)" or "icmp(v4)" are used in the document
   to refer to both icmp and icmpv6, in case the same context applies
   to both.

3.1.  IPv6 PMTUD Issues

   As mentioned in [RFC1981], IPv6 PMTUD is based on the "Packet too
   big" icmpv6 error, generated by a networking device that is capable
   of generating such messages when a packet's path goes over a link
   whose MTU is smaller than the packet size.

   There are problems getting this to work when an end-point device
   initiates "Path MTU Discovery" towards a remote end-point device. It
   may lead to black-holing with the current implementations.
   The following bullets point to potential causes of black holing of
   PMTUD packets:

   (1)  The Vxlan Gateway (or ToR) may not set the DF bit in the outer
        IP header encapsulation.

   (2)  The Vxlan Gateway (or ToR) is incapable of relaying the icmp
        error "Fragmentation Needed and Don't Fragment was Set",
        generated by an IPv4 enabled core network device (underlay
        network), to the IPv6 enabled end-point host/vm/server (source
        of the original packet).

   The problems are discussed in detail in the following sub-sections.

3.1.1.  Inaccurate MTU relayed to end hosts

   Figure 1 depicts the topology referenced in the document for
   explaining the problem statement and the solution.

       +----------+                    +----------+
       |    H1    |                    |    H2    |
       |          |                    |          |
       |(H1_IPv6) |                    |(H2_IPv6) |
       +----------+                    +----------+
            |                               |
            |                               |
      +------------+   +----------+   +------------+
      |(VtepA_IPv6)|   |          |   |(VtepB_IPv6)|
      |   VtepA    |   |    R1    |   |   VtepB    |
      |(VtepA_IPv4)|---| (R1_IPv4)|---|(VtepB_IPv4)|
      +------------+   +----------+   +------------+

                        Figure 1. L3 Overlay

   LEGEND:
   MAC address : <node>_MAC
   IP address  : <node>_IPv4
   IPv6 address: <node>_IPv6
   <node>      : node names in the above topology are
                 H1, VtepA, R1, VtepB, H2.
   VtepA, VtepB: Vxlan gateways to core network
   R1          : Intermediate router in underlay network
   H1, H2      : End-point devices communicating with each other

   H1 and H2 are end-point hosts in different subnets, connected over
   Vxlan overlays in the core network. The Vtep tunnel end-points,
   which may be ToR devices, are named VtepA and VtepB and are
   reachable over an underlay IPv4 network. VtepA and VtepB are dual
   stack enabled and act as Vxlan gateways to the connected hosts in
   this specific example. The link MTU between VtepA, R1 and VtepB is
   1300 bytes, whereas the MTU of the links between H1 and VtepA and
   between H2 and VtepB is 1500 bytes.

   H1 sends out a packet that respects the 1500-byte MTU of the link
   between H1 and VtepA.
   VtepA encapsulates the packet with a (Vxlan + UDP) header and an
   outer IP header corresponding to the underlay reachability of the
   destination tunnel end-point, that is VtepB, in order to reach H2.

   If the size of the encapsulated packet to be sent over the link
   VtepA-R1 exceeds the MTU (1300 bytes), the IPv4 packet (IP header +
   UDP header + Vxlan header + original L2 packet from H1 containing
   the IPv6 payload) SHOULD be fragmented. In case the Vxlan gateway,
   VtepA, does not set the DF-bit in the outer IP header, the packet
   gets fragmented, with the reassembly done at the egress gateway
   (VtepB).

   The re-assembled packet is routed by VtepB to H2. This can
   potentially lead to an inaccurate Path MTU calculation at H1. H1
   assumes the Path MTU to be 1500 bytes, as no icmp error is received.
   This opens the door to fragmentation/reassembly and more cpu cycles
   spent on networking devices in the core network.

3.1.2.  Packet_Too_Big not-relayed to host

   In figure 1, assume that the MTU of the link between VtepA and R1 is
   1500 bytes, as the only change from the figure 1 topology. The
   packet sent by H1 is encapsulated by VtepA, which sets the DF-bit in
   the outer IP header (as part of Vxlan encapsulation). When R1
   receives the packet, its routing table lookup points to an outgoing
   link with an MTU of R1_VtepB_MTU bytes, which is less than the
   packet size (1500 bytes). As the DF-bit is set, R1 generates an
   ICMPv4 error directed towards the src-ip (VtepA_IPv4). The error
   message encapsulates the inner PDU of the original packet. However,
   VtepA drops the icmp error packet and fails to relay it to H1. This
   leads to black-holing.

   The above two sub-sections lay down potential problems for the IPv6
   Path MTU Discovery mechanism in an Overlay network.
   Although these problems are generic to any combination of underlay
   and overlay network types (IPv4 or IPv6), the use-case topology in
   this document is specific to IPv6 end-point devices connected over a
   Vxlan network whose underlay is an IPv4 network, unless mentioned
   specifically.

4.  Solution(s)

4.1.  Discovery of end-to-end Path MTU

   Since the Vxlan Gateway (which can be a ToR device) both
   encapsulates packets entering the overlay network with the Vxlan (or
   any other overlay) header and decapsulates that header from packets
   leaving the overlay towards the end devices, the solution is best
   installed on devices playing this role.

   Firstly, Vxlan gateways (VtepA, VtepB or ToR device) MUST set the
   DF-bit in the outer header encapsulation for client packets that are
   wrapped with Vxlan or a related encapsulation, for Path MTU
   Discovery. This ensures that an ICMP error packet is generated when
   the packet size exceeds a link MTU in the underlay network.

   Secondly, Vxlan gateway devices MUST translate the ICMP error
   "Destination Unreachable" with code 'Fragmentation Needed and Don't
   Fragment was Set' into an ICMPv6 'Packet too big' error packet. This
   mandates that the original packet carried in the icmp error message
   MUST carry information about the inner payload (original packet),
   and that the inner payload is an IPv6 packet originated from the
   end-point device (H1 for VtepA in figure 1) connected to the Vxlan
   gateway over an L3/L2 network.

   Thirdly, Vxlan gateway devices MUST translate the ICMPv6 error
   'Packet too big' into an ICMP 'Destination Unreachable' error packet
   with code 'Fragmentation Needed and Don't Fragment was Set'.
   Successful translation mandates that the original packet carried in
   the icmp error message gives information about the inner payload
   (original packet), and that it is an IPv4 packet which originated
   from the end-point device connected to the gateway over an L3/L2
   network.

   Fourthly, in case the client side network connected to the Vxlan
   Gateway and the underlay network are of the same type, that is,
   either both are IPv4 or both are IPv6, icmp error translation is
   NOT required. The rest of the process to retrieve the original
   packet is identical.

4.1.1.  ICMP extensions, PMTUD on Vxlan

   This solution leverages the extensions to the ICMP and ICMPv6
   standards in [RFC4884] for the maximum size of the original packet
   that can be encapsulated in an ICMP error message with code
   "Fragmentation Required" (icmp) or "Packet too big" (icmpv6)
   respectively. As the host information is encapsulated in the inner
   payload, this requires additional bytes of data in the icmp packet:
   (Outer IP Header + UDP Header + Vxlan + Inner L2 Header + Inner IPv6
   SRC/DST IPs).

   In case the Vxlan core network is provisioned over an IPv6 underlay,
   similar extensions are applicable to icmpv6.

   The processing of an ICMP(V6) packet is extended from the current
   behaviour of 'non-delivery of ICMP(v6) packets to upper-layers on
   Vxlan gateways' to 'relaying it to the end-point devices'.

4.1.2.  Packet Path Processing

   Packet path handling and processing are explained in this section.
   The assumptions are made with respect to the network topology
   described in Section 3.1.1. The packet format shown for each flow
   captures the packet fields that are significant to this solution. To
   explain the solution, the packet flow that leads to the generation
   of an ICMP or ICMPv6 error by an intermediate node in the underlay
   network is described.
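
   The byte count given in Section 4.1.1 can be checked with a short
   calculation. The following Python sketch is illustrative only and
   is not part of the proposal. It assumes an IPv4 underlay without IP
   options, an inner Ethernet header of 14 bytes (18 with a VLAN tag)
   and the fixed 40-byte IPv6 header; all constant and function names
   are hypothetical. For comparison it also computes the classic ICMP
   quote of RFC 792 (original IP header plus 64 bits of data), which
   would stop inside the outer UDP header.

```python
# Illustrative only: sizes assume an IPv4 underlay without IP options,
# matching the header sizes shown in the figures of this document.
OUTER_IPV4_HDR = 20   # outer IPv4 header, IHL=5
UDP_HDR = 8           # outer UDP header
VXLAN_HDR = 8         # Vxlan header
INNER_L2_HDR = 14     # inner Ethernet header without VLAN tag
VLAN_TAG = 4          # optional 802.1Q tag on the inner frame
INNER_IPV6_HDR = 40   # fixed IPv6 header, carries src/dst addresses


def min_quoted_bytes(inner_vlan: bool = False) -> int:
    """Bytes of the original datagram that the ICMP error must quote
    so that the gateway can read the inner IPv6 source address."""
    inner_l2 = INNER_L2_HDR + (VLAN_TAG if inner_vlan else 0)
    return (OUTER_IPV4_HDR + UDP_HDR + VXLAN_HDR
            + inner_l2 + INNER_IPV6_HDR)


def classic_quote_bytes() -> int:
    """Classic RFC 792 quote: IP header plus 64 bits of data."""
    return OUTER_IPV4_HDR + 8


if __name__ == "__main__":
    print("needed :", min_quoted_bytes())
    print("classic:", classic_quote_bytes())
```

   Under these assumptions the error message has to quote 90 bytes of
   the original datagram (94 with an inner VLAN tag), against the 28
   bytes of the classic quote.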
   An IPv6 packet is sent by host H1 destined to host H2; the two hosts
   are in different IPv6 subnets. This packet is referred to as P1 in
   the document.

       +----------------------------------------------------+
   H1--|L2_Hdr(14 bytes): src-mac:H1_MAC, dest-mac:VtepA_MAC|-->VtepA
       +----------------------------------------------------+
       |IPv6_Hdr(40 bytes): src-ip:H1_IPv6, dest-ip:H2_IPv6 |
       +----------------------------------------------------+
       |Host/App specific Payload                           |
       +----------------------------------------------------+
          Figure 2a. Packet P1 sent by host H1 to host H2

   VtepA re-writes the mac addresses in 'P1' as part of Vxlan
   encapsulation. The re-written packet is referred to as 'P2' in the
   document.

       +------------------------------------------------------+
   H1--|L2_Hdr(14 bytes):src-mac:VtepA_MAC, dest-mac:VtepB_MAC|-->VtepA
       +------------------------------------------------------+
       |IPv6_Hdr(40 bytes): src-ip:H1_IPv6, dest-ip:H2_IPv6   |
       +------------------------------------------------------+
       |Host/App specific Payload                             |
       +------------------------------------------------------+
               Figure 2b. Packet P1 re-written by VtepA

4.1.2.1.  Packet Processing at Vxlan Gateway

   Processing at VtepA, in the packet path from H1 to H2:

   (1)  VtepA (Vxlan gateway) performs the Vxlan encapsulation of the
        packet received from H1, based on route lookup. The details of
        the encap are described in [RFC7348].

   (2)  VtepA MUST set the DF-bit in the outer IP header.

   (3)  Since the MTU of the outgoing link is larger than the packet
        size, the packet is sent out towards the underlay next hop, R1.

   (4)  The encapsulation of packet P3 is shown in figure 3. P3 may
        also be referred to without its outer header encapsulation.
        [RFC7348] provides details of the Vxlan encapsulation.
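
   The gateway steps above can be sketched in Python as follows. This
   is a minimal illustration and not a definitive implementation: it
   assumes an IPv4 underlay, an outer IPv4 header without options, a
   zero outer UDP checksum and a TTL of 64, and it omits the outer
   Ethernet header. The function names, the default source port and
   the TTL are hypothetical choices made for the example.

```python
import socket
import struct

VXLAN_PORT = 4789   # destination UDP port used for Vxlan
IP_DF = 0x4000      # Don't Fragment flag in the IPv4 flags field


def checksum(data: bytes) -> int:
    """Internet checksum: one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def vxlan_encap(inner_frame: bytes, vni: int, src_ip: str, dst_ip: str,
                src_port: int = 49152) -> bytes:
    """Wrap an inner Ethernet frame (P2) in Vxlan/UDP/IPv4 with DF set.

    Returns P3 without its outer L2 header."""
    # Vxlan header: I flag set, 24-bit VNI, remaining bits reserved.
    vxlan = struct.pack("!II", 0x08 << 24, vni << 8)
    udp_len = 8 + len(vxlan) + len(inner_frame)
    # Outer UDP checksum is left as zero in this sketch.
    udp = struct.pack("!HHHH", src_port, VXLAN_PORT, udp_len, 0)
    # Outer IPv4 header: version 4, IHL 5, DF set, protocol 17 (UDP).
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, IP_DF,
                      64, 17, 0,
                      socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    hdr = hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]
    return hdr + udp + vxlan + inner_frame


def fits_link(ip_packet: bytes, link_mtu: int) -> bool:
    """Step (3): the packet is forwarded only if it fits the link MTU."""
    return len(ip_packet) <= link_mtu
```

   With DF set by vxlan_encap(), a router in the underlay for which
   fits_link() is false cannot fragment the packet and has to return
   an ICMP error instead, which is the behaviour the rest of this
   section relies on.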
         +----------------------------------------------------------+
   VtepA-|L2_Hdr(14bytes):src-mac:VtepA_Mac, dest-mac:R1_MAC        |-->R1
         +----------------------------------------------------------+
         |IPv4_Hdr(20 bytes):src-ip:VtepA_IPv4,dest-ip:VtepB_IPv4,DF|
         +----------------------------------------------------------+
         |UDP(8 bytes): src-port: ephemeral-port, dest-port: 4789   |
         +----------------------------------------------------------+
         |Vxlan(8 bytes): Vxlan network identifier                  |
         +----------------------------------------------------------+
         |P2 packet (refer to H1 to VtepA flow for details of P1)   |
         +----------------------------------------------------------+
         Figure 3. Vxlan Encap packet sent by Vxlan Gateway to core

4.1.2.2.  Underlay Generates ICMP error

   In case the underlay is IPv6 and not IPv4, an icmpv6 error is
   generated.

   Processing at R1:

   (1)  The packet size (1500 bytes) is more than the outgoing link's
        mtu (1300 bytes) and the DF-bit is set in the outer IPv4 header
        added as part of Vxlan encapsulation at VtepA.

   (2)  R1 MUST generate an icmp error message (Destination
        Unreachable) with error code (Fragmentation Needed and Don't
        Fragment was Set). For ease of solution description, the mtu is
        assumed to be symmetric over the reverse path; hence the
        reverse path mtu from R1 to VtepA is 1500 bytes. The ICMP(v6)
        error message MUST include the MTU of the link between R1 and
        VtepB.

   (3)  In a nutshell, the ICMP PDU encapsulation SHOULD be performed
        as described in [RFC4884] and [RFC4443]. These standards ensure
        that the original packet carried in the icmp error PDU captures
        at least enough bytes to include the inner packet's IPv6
        header. The capture of application specific details depends on
        the size of the optional headers in the original packet
        (generated by H1 as in Figure 2b) and the subsequent transport
        header.
        This helps the Vxlan Gateway to at least trace (L3
        reachability) the original packet generator (end-point device),
        translate the icmp error generated by the underlay into an
        icmpv6 one, and relay it to the end-point device. The length
        field in the ICMP PDU includes the maximum possible length
        permissible in the reverse path MTU.

   For simplicity, the original packet header is not included in the
   flow diagram in figure 4. The ICMP PDU details are depicted in the
   follow-up figure 5.

      +-----------------------------------------------------------+
   R1-|L2_Hdr(14 bytes): src-mac:R1_MAC, dest-mac:VtepA_MAC       |-->VtepA
      +-----------------------------------------------------------+
      |IPv4_Hdr(20 bytes): src-ip:R1_IPv4, dest-ip:VtepA_IPv4     |
      +-----------------------------------------------------------+
      |ICMP PDU,type:3,code:4,R1_VtepB_MTU, P3(No outer L2 Header)|
      +-----------------------------------------------------------+
                 Figure 4. Flow diagram from R1 to VtepA

   The details of the ICMP PDU are in the following figure. Type '3' is
   "Destination Unreachable". Code '4' is "Fragmentation Needed and
   Don't Fragment was Set".
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     Type=3    |     Code=4    |            Checksum           | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|     unused    |     Length    |  Next Hop Mtu = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |          Total length         |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|               Id              |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |  Protocol=UDP |        Header Checksum        | (Outer)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Max 40
|                      src-ip : VtepA_IPv4                      |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                      dest-ip : VtepB_IPv4                     |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|  Source UDP Port (ephemeral)  | Dest UDP Port = 4789 (Vxlan)  |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8 bytes
|             Length            |            Checksum           |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| | | | | | | | |                    Reserved                   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8 bytes
|         Vxlan Network identifier (VNI)        |    Reserved   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|               Inner Packet Dest-Mac = VtepB_MAC               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |    Inner Packet Src-Mac =     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Inner
|                           VtepA_MAC                           | (14 byte)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x86dd (IPv6)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |               Flow Label              |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|         payload length        |  Next Header  |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                       src-ipv6 = H1_IPv6                      | IPv6
|                                                               | Header
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      dest-ipv6 = H2_IPv6                      |   |
|                                                               |   |
|                                                               |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|       ~ Optional Headers and transport header/Payload ~       | Varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
        Figure 5. ICMP PDU Original Packet Capture in Detail

4.1.2.3.  Relay ICMP(v6) Error to End Devices

   This sub-section can also be generalized as: "handling of icmp
   errors, which are generated by the underlay network in response to
   end-device packets, by the Vxlan Gateway".

   Processing at VtepA of an icmp error message with code
   (Fragmentation Needed and Don't Fragment was Set):

   (1)  The icmp error is processed by Vxlan gateways as per the
        standards defined in [RFC1981], [RFC4884] and [RFC4443].

   (2)  If the error code is (Fragmentation Needed and Don't Fragment
        was Set), the gateway SHOULD perform further inspection of the
        original packet, P3 (ethernet payload without its header),
        carried as data in the ICMP PDU, in extension to the standards
        referred to in the previous bullet. The extension processing
        MUST be done prior to taking a decision to either drop the
        packet or deliver it to upper-layer protocols.

   (3)  In extension to the above, the Vxlan gateway device SHOULD
        perform the Vxlan decap as defined in [RFC7348], to arrive at
        the inner packet (P2, the original packet with the VtepA
        rewrite). The underlay encap does not carry the layer-2 header
        in the icmp error packet.
        Once this processing is done, P2 is the packet that needs
        attention, as it carries the credentials of the actual host
        which should receive the relayed icmp packet.

   (4)  The layer-3 payload type SHOULD be verified using the ethernet
        type field in the ethernet header. In case it points to IPv6,
        the src-ipv6 field should be picked up to check for
        reachability, as the icmp packet MUST be sent to the original
        sender, that is, H1. In case H1 is reachable, the ICMP packet
        SHOULD be constructed as described in the following bullets.

   (5)  Now that P2 is exposed, its L2 header is decapsulated, and the
        remainder, shown in figure 6, is run through the icmpv6
        processing described in [RFC4443].

   (6)  The gateway SHOULD generate an ICMPv6 error message with type
        (Packet too big) destined to H1_IPv6, that is, the inner IPv6
        packet's source IPv6 address. The mtu 'R1_VtepB_MTU' is copied
        from the icmp error packet received from the underlay.

   (7)  The IPv6 header is constructed from the original payload as
        shown in figure 5. The source IPv6 address is set to the local
        IPv6 address "VtepA_IPv6". The destination IPv6 address is set
        to the "src-ipv6" in the original payload, H1_IPv6. The Next
        Header is set to "58", which denotes ICMPv6. The ethernet
        header is derived from the next hop to mac address mapping, as
        is performed in any L3 lookup. Figure 9 shows the icmpv6 error
        packet sent out to node H1. H1 is the original IPv6 packet
        generator as mentioned in Figure 2b.
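
   The relay steps above can be sketched in Python as follows. This is
   a minimal illustration under stated assumptions and not a
   definitive implementation: it handles only the IPv4 underlay / IPv6
   end-point case, takes the ICMP message without its IPv4 and L2
   headers as input, omits the reachability check of step (4) and the
   ethernet header of step (7), and uses a hop limit of 64. All
   function names are hypothetical. The quoted data is truncated so
   that the relayed packet stays within the 1280-byte minimum IPv6
   MTU, as [RFC4443] requires for ICMPv6 error messages.

```python
import ipaddress
import struct

ETH_P_IPV6 = 0x86DD
ETH_P_8021Q = 0x8100
VXLAN_PORT = 4789
IPV6_MIN_MTU = 1280


def checksum(data: bytes) -> int:
    """Internet checksum: one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def inner_ipv6_from_icmp4(icmp: bytes):
    """Steps (2)-(5): return (mtu, inner IPv6 packet) or None."""
    if len(icmp) < 8 + 20:
        return None
    icmp_type, code, _csum, _unused, _length, mtu = struct.unpack(
        "!BBHBBH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        return None
    quoted = icmp[8:]                      # P3 without its L2 header
    ihl = (quoted[0] & 0x0F) * 4           # outer IPv4 header length
    if quoted[9] != 17 or len(quoted) < ihl + 16 + 14:
        return None                        # not UDP, or quote too short
    if struct.unpack("!H", quoted[ihl + 2:ihl + 4])[0] != VXLAN_PORT:
        return None
    off = ihl + 8 + 8                      # skip UDP and Vxlan headers
    ethertype = struct.unpack("!H", quoted[off + 12:off + 14])[0]
    off += 14
    if ethertype == ETH_P_8021Q:           # inner VLAN tag present
        if len(quoted) < off + 4:
            return None
        ethertype = struct.unpack("!H", quoted[off + 2:off + 4])[0]
        off += 4
    if ethertype != ETH_P_IPV6 or len(quoted) < off + 40:
        return None
    return mtu, quoted[off:]


def build_packet_too_big(gw_ipv6: str, mtu: int, inner: bytes) -> bytes:
    """Steps (6)-(7): IPv6 header plus ICMPv6 type 2, code 0."""
    src = ipaddress.IPv6Address(gw_ipv6).packed   # e.g. VtepA_IPv6
    dst = inner[8:24]                      # src-ipv6 of original packet
    inner = inner[:IPV6_MIN_MTU - 40 - 8]  # keep within minimum MTU
    body = struct.pack("!BBHI", 2, 0, 0, mtu) + inner
    pseudo = src + dst + struct.pack("!I3xB", len(body), 58)
    body = body[:2] + struct.pack("!H", checksum(pseudo + body)) + body[4:]
    ip6 = struct.pack("!IHBB", 6 << 28, len(body), 58, 64) + src + dst
    return ip6 + body
```

   The MTU value is copied unchanged from the underlay error, as
   stated in step (6). A message that is not type 3, code 4, or whose
   quoted data does not reach the inner IPv6 header, yields None and
   would be left to the existing ICMP processing of the gateway.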
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |               Flow Label              |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|         payload length        |  Next Header  |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                       src-ipv6 = H1_IPv6                      | Inner
|                                                               | IPv6
|                                                               | (40 byte)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      dest-ipv6 = H2_IPv6                      |   |
|                                                               |   |
|                                                               |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     ~ Optional Headers and Transport/Application Payload ~    | Varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
       Figure 6. Original IPv6 Packet sent from H1 directed to H2

   Figure 6 shows the typical format of the IPv6 packet sent by end
   host H1 towards H2 and encapsulated by the Vxlan gateway. The
   gateway uses it to translate the icmp error generated by the
   underlay hop, R1, into one that H1 understands in the right context.
573 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 574 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------ 575 | Type=2 | Code=0 | CheckSum | | 576 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2 577 | Mtu = R1_VtepB_MTU | | 578 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------ 579 |Ver=6 |Traffic Class | Flow Label | ^ 580 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 581 |payload length |Next Header | Hop Limit | | 582 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 583 | | | 584 | src-ipv6 = H1_IPv6 | Orig 585 | | Packet 586 | |40 byte) 587 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 588 | | | 589 | dest-ipv6 = H2_IPv6 | | 590 | | | 591 | | v 592 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ----- 593 | ~ Optional/Transport Headers and Application Payload ~ |varies 594 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------ 595 Figure 7. 
ICMPv6 "Packet Too Big" PDU relayed 596 to H1 by Vxlan Gateway (VtepA) 598 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------ 599 | Dest-Mac = H1_MAC | ^ 600 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 601 | | Inner Packet Src-Mac = | | 602 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+eth hdr 603 | VtepA_MAC |14 byte) 604 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 605 | Inner Vlan if present |Ethtype = 0X86dd (IPv6) | v 606 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------ 607 |Ver=6 |Traffic Class | Flow Label | ^ 608 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 609 |payload length |Next Hdr = 58 | Hop Limit | | 610 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 611 | | | 612 | src-ipv6 = VtepA_IPv6 | IPv6 613 | |header 614 | | | 615 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 616 | | | 617 | dest-ipv6 = H1_IPv6 | | 618 | | | 619 | | v 620 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------ 621 Figure 8. Ethernet and IPv6 encap for ICMPv6 PDU mentioned in 622 figure 7 624 The translated icmp packet encapsulation looks similar to, figure 7 625 and figure 8 put together in reverse order. The flow diagram in 626 figure 9 gives a concise form of "packet too big" icmpv6 error 627 relayed by VtepA (Vxlan Gateway) towards H1 (end point device). 629 +--------------------------------------------------------+ 630 VtepA--|L2_Hdr(14): src-mac:VtepA_MAC and Dest_Mac: H1_MAC |-->H1 631 +--------------------------------------------------------+ 632 |IPv6_Hdr(40 bytes): src-ip:Vtep_IPv6, dest-ip:H1_IPv6 | 633 +--------------------------------------------------------+ 634 |ICMPv6: Packet_Too_Big, mtu, data: first 128 bytes of P3| 635 +--------------------------------------------------------+ 636 Figure 9. 
                      Flow diagram: VtepA to H1

   There are a few more potential flows worth mentioning in this
   section.  In these cases the icmp error is generated by the ingress
   Vxlan gateway (VtepA) or by the egress Vxlan gateway (VtepB) for a
   packet sent from H1 to H2.  For the ingress Vxlan gateway (VtepA)
   case, the legacy IPv6 PMTUD rules from [RFC4443] SHOULD be applied,
   as no Vxlan encap is involved.

   The egress Vxlan gateway (VtepB), in contrast, SHOULD send packet P3
   (without L2 header) in the icmp data, even though the mtu
   calculation MAY be done after Vxlan decapsulation, that is, once the
   outgoing link is identified as the one from VtepB to H2.  It MAY
   buffer packet P3 prior to the lookup based on the inner packet (P2)
   credentials, so that P3 can be encapsulated in the icmp packet.
   This also keeps the packet format consistent when it is parsed at
   VtepA for translation before being relayed to H1.

4.1.3.  ICMP(v6) Error Translation

   This section describes how an ICMP or ICMPv6 error generated in the
   underlay network is translated into one that the end-point device
   understands.  The encapsulation of the translated error follows the
   network type (IPv4 or IPv6) with which the end-point device and the
   underlay are provisioned.  The last-leg processing described in the
   previous subsection is specific to the topology in Section 3.1.1.
   This subsection covers all combinations of IPv4 and IPv6 for the
   underlay and end-device networks.  For each combination, figures
   show the error generated by the underlay and the translated error
   relayed to the end-point device by the Vxlan gateway.

   (a) End-Point is IPv6 connected and Underlay is IPv4 provisioned.

   (b) End-Point is IPv4 connected and Underlay is IPv6 provisioned.

   (c) Both End-Point and Underlay are provisioned with IPv6.

   (d) Both End-Point and Underlay are provisioned with IPv4.

4.1.3.1.  End-Point is IPv6 connected and Underlay is IPv4 provisioned

   This case is similar to the last-leg processing described in
   Section 4.1.2 and does not need any further description.

4.1.3.2.  End-Point is IPv4 connected and Underlay is IPv6 provisioned

   Figure 10 shows the topology, and figure 11 shows the icmpv6 PDU
   generated by R1.  H1_IPv4 and H2_IPv4 are in distinct ipv4 subnets.
   R1_IPv6 represents the IPv6 addresses of R1 in both subnets, the one
   connecting to VtepA and the one connecting to VtepB.

   Another difference between an IPv4 and an IPv6 underlay is that an
   IPv6 underlay has no concept of a DF-bit.  Fragmentation can only be
   done at ingress; every other underlay node generates a "Packet Too
   Big" icmpv6 error instead.  The Vxlan Gateway SHOULD ensure that it
   does not fragment the packet and that an icmp error is sent back to
   H1.  This procedure is applicable if and only if the original packet
   has the DF-bit set in its IP header.

    +----------+                    +----------+
    |    H1    |                    |    H2    |
    |          |                    |          |
    |(H1_IPv4) |                    |(H2_IPv4) |
    +----------+                    +----------+
          |                               |
          |                               |
   +------------+   +----------+   +------------+
   |(VtepA_IPv4)|   |          |   |(VtepB_IPv4)|
   |   VtepA    |   |    R1    |   |   VtepB    |
   |(VtepA_IPv6)|---| (R1_IPv6)|---|(VtepB_IPv6)|
   +------------+   +----------+   +------------+

                        Figure 10.  L3 Overlay

   LEGEND:
   MAC address : node name with suffix _MAC
   IPv4 address: node name with suffix _IPv4
   IPv6 address: node name with suffix _IPv6
   Node names in the above topology are
   H1, VtepA, R1, VtepB, H2.
   VtepA, VtepB: Vxlan gateways to core network

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Type=2     |    Code=0     |           Checksum            | ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
|                  Next Hop Mtu = R1_VtepB_MTU                  | Code=0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  |Traffic Class  |              Flow Label               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|        payload length         |   Next Hdr    |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      src-ipv6 = R1_IPv6                       | IPv6
|                                                               |40 byte
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                    dest-ipv6 = VtepA_IPv6                     |   |
|                                                               |   |
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|          ~ Extension Headers ~ (payload type is UDP)          |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|  Source UDP Port (ephemeral)  | Dest UDP Port = 4789 (Vxlan)  |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8 byte
|            Length             |           Checksum            |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| | | | | | | | |                   Reserved                    |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8 byte
|        Vxlan Network identifier (VNI)         |   Reserved    |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|               Inner Packet Dest-Mac = VtepA_MAC               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |    Inner Packet Src-Mac =     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+eth hdr
|                           VtepB_MAC                           |14 byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x0800 (IPv4)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |   Protocol    |        Header Checksum        | Orig
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Hdr
|                       src-ip : H1_IPv4                        |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H2_IPv4                       |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     ~ transport-header and Application specific Payload ~     | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

              Figure 11.  ICMPv6 PDU Sent by R1 to VtepA

   R1 sends an icmpv6 "Packet Too Big" error towards VtepA; the icmpv6
   PDU is shown in Figure 11.  VtepA receives this icmpv6 PDU and
   translates it into an icmp PDU with type "Destination Unreachable"
   and code "Fragmentation Needed" before relaying it to H1 over the
   ipv4 network.  Figure 12 shows the relayed packet sent by VtepA to
   H1.  All other details SHOULD be taken as they are from
   Section 4.1.2.
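
   The translation from the icmpv6 "Packet Too Big" PDU of figure 11
   to an icmp "Fragmentation Needed" PDU can be sketched as below.
   This is a minimal Python illustration and not part of the
   specification: the names translate_ptb_to_frag_needed and
   inet_checksum and the fixed header sizes are assumptions made for
   the example.  It assumes no IPv6 extension headers and an untagged
   inner Ethernet frame, carries the mtu value across unchanged as the
   figures do, and leaves the RFC 4884 length field at zero.

```python
import struct

# Assumed sizes for the layout in figure 11: no IPv6 extension headers and
# an untagged inner Ethernet frame. Hypothetical constants for this sketch.
ICMP_HDR_LEN = 8
OUTER_IPV6_LEN = 40
UDP_LEN = 8
VXLAN_LEN = 8
INNER_ETH_LEN = 14


def inet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def translate_ptb_to_frag_needed(icmpv6: bytes) -> bytes:
    """Turn an underlay ICMPv6 Packet Too Big into ICMPv4 type 3, code 4."""
    if len(icmpv6) < ICMP_HDR_LEN:
        raise ValueError("truncated ICMPv6 message")
    icmp_type, code, _cksum, mtu = struct.unpack("!BBHI", icmpv6[:8])
    if (icmp_type, code) != (2, 0):
        raise ValueError("not an ICMPv6 Packet Too Big message")
    # Skip the outer encapsulation to reach the host's original IPv4 packet.
    skip = ICMP_HDR_LEN + OUTER_IPV6_LEN + UDP_LEN + VXLAN_LEN + INNER_ETH_LEN
    original = icmpv6[skip:]
    # ICMPv4 has a 16-bit Next-Hop MTU field; ICMPv6 has a 32-bit MTU field.
    next_hop_mtu = min(mtu, 0xFFFF)
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    cksum = inet_checksum(header + original)
    return struct.pack("!BBHHH", 3, 4, cksum, 0, next_hop_mtu) + original
```

   The returned bytes correspond to the part of figure 12 that starts
   at the ICMP header; the Ethernet and IPv4 headers towards H1 would
   be prepended by the gateway's normal forwarding path.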
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                       Dest-Mac = H1_MAC                       |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |    Inner Packet Src-Mac =     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+eth hdr
|                           VtepA_MAC                           |14 byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x0800 (IPv4)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |  Protocol=1   |        Header Checksum        | IPv4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                      src-ip : VtepA_IPv4                      |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H1_IPv4                       |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                        Optional Header                        |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Type=3     |    Code=4     |           Checksum            | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|    unused     |    Length     |  Next Hop Mtu = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |   Protocol    |        Header Checksum        | Orig
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                       src-ip : H1_IPv4                        |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H2_IPv4                       |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|      Optional and Transport Header and Application data       | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

     Figure 12.  ICMPv4 error Packet relayed to end point Host, H1

4.1.3.3.  Both End-Point and Underlay are provisioned with IPv6

   Figure 13 shows the topology, with minor changes, along with the
   legend.  Figure 14 outlines the icmpv6 PDU encapsulation generated
   by R1.  H1_IPv6 and H2_IPv6 are in different ipv6 subnets.  R1_IPv6
   represents the addresses of R1 in both subnets, the one connecting
   to VtepA and the one connecting to VtepB.

    +----------+                    +----------+
    |    H1    |                    |    H2    |
    |          |                    |          |
    |(H1_IPv6) |                    |(H2_IPv6) |
    +----------+                    +----------+
          |                               |
          |                               |
   +------------+   +----------+   +------------+
   |(VtepA_IPv6)|   |          |   |(VtepB_IPv6)|
   |   VtepA    |   |    R1    |   |   VtepB    |
   |(VtepA_IPv6)|---| (R1_IPv6)|---|(VtepB_IPv6)|
   +------------+   +----------+   +------------+

                        Figure 13.  L3 Overlay

   LEGEND:
   MAC address : node name with suffix _MAC
   IPv6 address: node name with suffix _IPv6
   Node names in the above topology are
   H1, VtepA, R1, VtepB, H2.
   VtepA, VtepB: Vxlan gateways to core network

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Type=2     |    Code=0     |           Checksum            | ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
|                  Next Hop Mtu = R1_VtepB_MTU                  | Code=0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  |Traffic Class  |              Flow Label               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|        payload length         |   Next Hdr    |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      src-ipv6 = R1_IPv6                       | IPv6
|                                                               | Header
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                    dest-ipv6 = VtepA_IPv6                     |   |
|                                                               |   |
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|          ~ Extension Headers ~ (payload type is UDP)          |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|  Source UDP Port (ephemeral)  | Dest UDP Port = 4789 (Vxlan)  |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+8 bytes
|            Length             |           Checksum            |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| | | | | | | | |                   Reserved                    |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+8 bytes
|        Vxlan Network identifier (VNI)         |   Reserved    |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|               Inner Packet Dest-Mac = VtepB_MAC               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |    Inner Packet Src-Mac =     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+eth hdr
|                           VtepA_MAC                           |14 byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x86dd (IPv6)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  |Traffic Class  |              Flow Label               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|        payload length         |   Next Hdr    |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                     src-ipv6 = VtepA_IPv6                     | Inner
|                                                               | IPv6
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      dest-ipv6 = H1_IPv6                      |   |
|                                                               |   |
|                                                               |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    ~ Extension and Transport Headers, Application Data ~     | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

   Figure 14.  ICMPv6 PDU generated by Intermediate Hop, R1 in Vxlan
                               Network

   R1 sends an icmpv6 "Packet Too Big" error towards VtepA; the icmpv6
   PDU is shown in Figure 14.  VtepA receives this icmpv6 PDU and
   relays it to H1 without any translation, as H1 is connected to VtepA
   over an ipv6 network.  Figure 15 shows the complete packet sent to
   H1 by VtepA.  All other details about the original packet to be
   included in the icmpv6 PDU can be taken as they are from
   Section 4.1.2.
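
   For the IPv6-over-IPv6 case the error type does not change, but the
   relayed PDU still has to carry the host's original packet and a
   checksum valid for the VtepA to H1 addresses.  The sketch below
   illustrates one way to do that in Python.  It is not part of the
   specification: the function name relay_packet_too_big, the helper
   inet_checksum, the fixed header sizes and the example addresses are
   assumptions, and it assumes no IPv6 extension headers and an
   untagged inner Ethernet frame as drawn in figure 14.

```python
import ipaddress
import struct

# Assumed bytes between the ICMPv6 header and the original packet in
# figure 14: outer IPv6 (40) + UDP (8) + Vxlan (8) + inner Ethernet (14).
ENCAP_LEN = 40 + 8 + 8 + 14
ICMPV6_NEXT_HEADER = 58


def inet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def relay_packet_too_big(underlay_icmpv6: bytes, vtep: str, host: str) -> bytes:
    """Rebuild the Packet Too Big PDU that the gateway relays to the host."""
    if len(underlay_icmpv6) < 8:
        raise ValueError("truncated ICMPv6 message")
    icmp_type, code, _cksum, mtu = struct.unpack("!BBHI", underlay_icmpv6[:8])
    if (icmp_type, code) != (2, 0):
        raise ValueError("not an ICMPv6 Packet Too Big message")
    # Drop the outer encapsulation so the host sees its own packet.
    original = underlay_icmpv6[8 + ENCAP_LEN:]
    body = struct.pack("!BBHI", 2, 0, 0, mtu) + original
    # The ICMPv6 checksum covers an IPv6 pseudo-header (addresses, length,
    # three zero bytes, next header) followed by the ICMPv6 message.
    src = ipaddress.IPv6Address(vtep).packed
    dst = ipaddress.IPv6Address(host).packed
    pseudo = src + dst + struct.pack("!I3xB", len(body), ICMPV6_NEXT_HEADER)
    cksum = inet_checksum(pseudo + body)
    return body[:2] + struct.pack("!H", cksum) + body[4:]
```

   A call such as relay_packet_too_big(pdu, "2001:db8::a",
   "2001:db8::1") returns the ICMPv6 portion of figure 15; the
   addresses here are documentation-prefix placeholders, not values
   from this document.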
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                       Dest-Mac = H1_MAC                       |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |    Inner Packet Src-Mac =     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+eth hdr
|                           VtepA_MAC                           |14 byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x86dd (IPv6)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  |Traffic Class  |              Flow Label               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|        payload length         |   Next Hdr    |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                     src-ipv6 = VtepA_IPv6                     | IPv6
|                                                               | Header
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      dest-ipv6 = H1_IPv6                      |   |
|                                                               |   |
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|        ~ Extension Headers ~ (payload type is ICMPv6)         |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Type=2     |    Code=0     |           Checksum            | ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
|                  Next Hop Mtu = R1_VtepB_MTU                  | Code=0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  |Traffic Class  |              Flow Label               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|        payload length         |   Next Hdr    |   Hop Limit   |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      src-ipv6 = H1_IPv6                       | Orig
|                                                               | IPv6
|                                                               |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                                                               |   |
|                      dest-ipv6 = H2_IPv6                      |   |
|                                                               |   |
|                                                               |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   ~ Extension and Transport Headers and Application data ~    | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

     Figure 15.  ICMPv6 error Complete Packet sent to H1 by VtepA

4.1.3.4.  Both End-Point and Underlay are provisioned with IPv4

   Figure 16 shows the topology, with minor changes, along with the
   legend.  Figure 17 provides the icmp PDU encap generated by R1.
   H1_IPv4 and H2_IPv4 are in different ipv4 subnets.

    +----------+                    +----------+
    |    H1    |                    |    H2    |
    |          |                    |          |
    |(H1_IPv4) |                    |(H2_IPv4) |
    +----------+                    +----------+
          |                               |
          |                               |
   +------------+   +----------+   +------------+
   |(VtepA_IPv4)|   |          |   |(VtepB_IPv4)|
   |   VtepA    |   |    R1    |   |   VtepB    |
   |(VtepA_IPv4)|---| (R1_IPv4)|---|(VtepB_IPv4)|
   +------------+   +----------+   +------------+

                        Figure 16.  L3 Overlay

   LEGEND:
   MAC address : node name with suffix _MAC
   IPv4 address: node name with suffix _IPv4
   Node names in the above topology are
   H1, VtepA, R1, VtepB, H2.
   VtepA, VtepB: Vxlan gateways to core network

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Type=3     |    Code=4     |           Checksum            | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|    unused     |    Length     |  Next Hop Mtu = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      | Protocol=UDP  |        Header Checksum        | IPv4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Header
|                      src-ip : VtepA_IPv4                      |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H1_IPv4                       |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|  Source UDP Port (ephemeral)  | Dest UDP Port = 4789 (Vxlan)  |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+8 bytes
|            Length             |           Checksum            |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| | | | | | | | |                   Reserved                    |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+8 bytes
|        Vxlan Network identifier (VNI)         |   Reserved    |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|               Inner Packet Dest-Mac = VtepB_MAC               |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |    Inner Packet Src-Mac =     | inner
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ packet
|                           VtepA_MAC                           |eth hdr
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x0800 (IPv4)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |   Protocol    |        Header Checksum        | IPv4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ hdr
|                       src-ip : H1_IPv4                        |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H2_IPv4                       |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|  ~ Optional and Transport Header and Application Payload ~   | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

          Figure 17.  ICMP PDU generated by R1 towards VtepA

   R1 sends an icmp error directed towards VtepA.  The icmp PDU is
   shown in figure 17.  VtepA receives the packet with this icmp PDU
   and relays it to H1 over the ipv4 network.  Figure 18 shows the
   packet sent by VtepA to H1.
   All other details can be taken as they are from Section 4.1.2.

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                       Dest-Mac = H1_MAC                       |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                               |           Src-Mac =           |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ eth
|                           VtepA_MAC                           | header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|     Inner Vlan if present     |    Ethtype = 0x0800 (IPv4)    |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |  Protocol=1   |        Header Checksum        | IPv4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Header
|                      src-ip : VtepA_IPv4                      |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H1_IPv4                       |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                        Optional Header                        |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Type=3     |    Code=4     |           Checksum            | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|    unused     |    Length     |  Next Hop Mtu = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Ver=4 | IHL=5 |      TOS      |         Total length          |   ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|              Id               |Flags|     Fragment Offset     |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|      TTL      |   Protocol    |        Header Checksum        | Orig
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                       src-ip : H1_IPv4                        |   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   |
|                       dest-ip : H2_IPv4                       |   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|  ~ Optional and Transport Header and Application Payload ~   | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

      Figure 18.  Complete ICMP error Packet sent to H1 by VtepA

5.  Multicast and Anycast Considerations

   The multicast solution is similar to the one proposed in [RFC1981].
   It SHOULD be applied at the Vtep for unknown unicast destinations.

   There are no anycast considerations in this document, as the
   solution is based on nodes deriving mtu values from the underlay
   network, and those nodes should have either unicast or multicast
   reachability between them.

6.  Ecmp Considerations

   Ecmp considerations are driven by the packet sent by the end-host
   application and by how its fields are used for path selection.

   To keep PMTUD agnostic to ecmp paths in a Vxlan network, a few more
   points need consideration.  In a Vxlan Gateway (which can be a ToR
   device), the route lookup is based on attributes carried in the
   packet generated by the end-point host.  That packet may, for
   example, come from a tcp-based end-host application, although this
   should not be generalized.

   In contrast, an intermediate node in the core network (say, a Spine
   node in a Clos topology) bases its lookups on the outer encap (Vtep
   ip addresses and UDP header).

   For the L2 gateway case, in which the Vxlan gateway (Vtep node)
   bridges rather than routes host packets destined to the same subnet,
   the MTU calculation SHOULD come into play only in the Spine devices.

7.  Security Considerations

   This document inherits all the security considerations discussed in
   [RFC1981] and [RFC1191].

8.  IANA Considerations

   TBD

9.  Acknowledgements

   Thanks to Vengada Prasad Govindan, Deepak Kumar, Matthew Bocci and
   Rohit Mendiratta for providing the inputs.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [I-D.draft-gross-geneve]
              Gross, J., Sridhar, T., Garg, P., Wright, C., Ganga, G.,
              Agarwal, P., Duda, C., Dutt, D., and J. Hudson, "Geneve:
              Generic Network Virtualization Encapsulation", draft-
              gross-geneve-02 (work in progress), Oct 2015.

   [I-D.draft-ietf-nvo3-gue]
              Herbert, T., Yong, L., and O. Zia, "Generic UDP
              Encapsulation", draft-ietf-nvo3-gue-03 (work in
              progress), Mar 2015.

   [I-D.draft-ietf-nvo3-vxlan-gpe]
              Quinn, P., Manur, R., Kreeger, L., Lewis, D., Maino, F.,
              Smith, M., Agarwal, P., Yong, L., Xu, X., Elzur, U., and
              D. Melman, "Generic Protocol Extension for VXLAN", draft-
              ietf-nvo3-vxlan-gpe-02 (work in progress), May 2015.

   [I-D.nordmark-nvo3-transcending-traceroute]
              Nordmark, E., Appanna, C., and A. Lo, "Layer-Transcending
              Traceroute for Overlay Networks like VXLAN", draft-
              nordmark-nvo3-transcending-traceroute-02 (work in
              progress), March 2015.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
              April 2007.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, September 2015.

Authors' Addresses

   Saumya Dikshit
   Cisco Systems
   Cessna Business Park
   Bangalore, Karnataka 560 087
   India

   Email: sadikshi@cisco.com

   A Sujeet Nayak
   Cisco Systems
   Cessna Business Park
   Bangalore, Karnataka 560 087
   India

   Email: sua@cisco.com