idnits 2.17.1

draft-nvo3-mtu-propagation-over-evpn-overlays-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 7 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  ** The abstract seems to contain references ([RFC4884], [RFC4459]),
     which it shouldn't.  Please replace those with straight textual
     mentions of the documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  == Line 515 has weird spacing: '...ers and trans...'

  -- The document date (3 August 2021) is 989 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'I-D.nordmark-nvo3-transcending-traceroute' is
     defined on line 1276, but no explicit reference was found in the text

  == Unused Reference: 'RFC4821' is defined on line 1301, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-05) exists of
     draft-ietf-nvo3-gue-03

  == Outdated reference: A later version (-13) exists of
     draft-ietf-nvo3-vxlan-gpe-02

  == Outdated reference: A later version (-03) exists of
     draft-nordmark-nvo3-transcending-traceroute-02

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)

  -- Duplicate reference: RFC9014, mentioned in 'RFC9014', was also
     mentioned in 'RFC4459'.


     Summary: 2 errors (**), 0 flaws (~~), 7 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


NVO3                                                          S. Dikshit
Internet-Draft                                                  V. Joshi
Intended status: Informational                                Aruba, HPE
Expires: 4 February 2022                                 A. Sujeet Nayak
                                                                   Cisco
                                                           3 August 2021


                   MTU propagation over EVPN Overlays
            draft-nvo3-mtu-propagation-over-evpn-overlays-01

Abstract

   Path MTU Discovery between end hosts (virtual machines, servers,
   workloads) connected over an EVPN overlay network in a data center,
   campus, or enterprise deployment is a problem that has yet to be
   resolved in the standards forums.  It needs a converged solution to
   ensure optimal use of the network and computational resources of the
   underlay routers and switches that form the basis of the overlay
   network.  This document takes its lead from the guidelines presented
   in RFC 4459.

   The overlay connectivity can span multiple sites (geographically
   separated or collocated) to realize a Data Center Interconnect (DCI)
   or inter-site VPNs between campus sites (buildings, branch offices,
   etc.).

   This document addresses the problem of propagating ICMP errors from
   an underlay routing/switching device to an end host attached to an
   EVPN overlay, thereby allowing the host to learn an accurate path
   MTU.

   This document also leverages the ICMP multi-part message extension
   defined in RFC 4884 to carry the original packet in the ICMP PDU.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 4 February 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements
     2.1.  Requirements Language
     2.2.  Solution Requirements
   3.  Problem Description
     3.1.  Issues in MTU propagation in an underlay
       3.1.1.  Inaccurate MTU relayed to end hosts
       3.1.2.  Packet_Too_Big not-relayed to host
   4.  Solution(s)
     4.1.  Discovery of end-to-end Path MTU
       4.1.1.  ICMP extensions leveraged for MTU propagation
       4.1.2.  Packet Path Processing
       4.1.3.  ICMP(v6) Error Translation
   5.  Inter-site MTU Propagation
   6.  Same subnet Considerations
   7.  Ecmp Considerations
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   There is an operational disconnect between the underlay network and
   the overlay network that connects islands of customer deployments.
   These deployments range from cloud-based services to storage
   applications or web (over-the-top) servers hosted on virtual machines
   or on other end devices such as blade servers.
   Overlay networks are provisioned as tunnels using VXLAN (and related
   encapsulations such as VXLAN-GPE, Geneve, and GUE), NVGRE, MPLS, and
   other overlay encapsulations.

   The end hosts (VMs, workloads, user devices) in a data center or
   campus deployment are connected to gateways.  When the core network
   is built with EVPN overlays, the gateways are VTEPs (VXLAN fabric
   gateways): the network devices that encapsulate packets in an overlay
   header and relay them over the underlay network.

   For campus deployments, branch offices can be provisioned with EVPN
   overlays, and the VPN connectivity between them can itself be
   realized with EVPN overlays (VXLAN, MPLS, or NVGRE fabrics).  This
   involves interworking or stitching of the same or different overlays
   at DCI/VPN transit routing and switching devices.  The transit
   devices can be on-premises WAN gateways (SD-WAN or otherwise) or
   service provider network entry points.

   IPv6- or IPv4-enabled hosts that trigger Path MTU Discovery (PMTUD)
   may receive incorrect, inconsistent, or no information from the
   underlay network when MTU errors are encountered on the
   overlay-encapsulated packet path.  This document describes the
   solution in detail for a VXLAN fabric built over an underlay network
   running any routing protocol (BGP, OSPF, IS-IS, EIGRP, etc.).  The
   solution is equally applicable to other tunnel or overlay
   specifications in the EVPN-overlay category, such as Geneve, GUE,
   VXLAN-GPE, and NVGRE.

   The proposal in this document formulates an integrated approach that
   is in line with the OAM modeling discussed in NVO3.

2.  Requirements

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   When used in lowercase, these words convey their typical use in
   common language, and they are not to be interpreted as described in
   [RFC2119].

2.2.  Solution Requirements

   This section describes the advantages of the proposed solution when
   deployed in a typical EVPN overlay and its underlay network:

   (a)  Optimal use of bandwidth in the underlay and end-host networks.

   (b)  VXLAN gateway nodes that comply with this solution can avoid
        black-holing ICMP errors generated by underlay network devices.

   (c)  End-host applications (such as web servers) can tailor the MSS
        of their respective transports accordingly.

   (d)  It facilitates seamless integration of IPv6 or dual-stack
        applications over IPv4-based overlays, and vice versa.

   (e)  The proposed solution is applicable to all of the encapsulations
        [RFC7348], [I-D.draft-ietf-nvo3-vxlan-gpe],
        [I-D.draft-ietf-nvo3-gue], [I-D.draft-gross-geneve], and
        [RFC7637], although this document uses VXLAN [RFC7348] as the
        use case for describing both the problem and the solution.

3.  Problem Description

   In current vendor implementations, VXLAN gateways and other network
   devices that form part of the underlay network and are configured
   with an overlay (tunnel) mechanism to carry packets from one customer
   end point to another are incapable of relaying errors encountered on
   the routing/switching path in the underlay network to the customer
   end points (hosts, VMs, blade servers).  This is deemed correct,
   because the underlay network should be transparent and watertight
   with respect to leaking any public (underlay) network information to
   customer devices (and vice versa), which keeps the different
   customers tunneled over the common underlay network isolated from one
   another.

   For example, the information carried in the outer IP header of a
   VXLAN-encapsulated packet is transparent to the payload (the packet
   generated by the end point).
   Hence, network-specific information related to native IPv6/IPv4
   functionality is not carried to the end-point devices as it would be
   in an end-to-end private network.  Information generated by underlay
   network devices while processing packets sent to or from end-point
   devices needs to be percolated from the underlay encapsulation to the
   customer-specific payload.  This is neither specified by any standard
   nor implemented by currently deployed routers and switches.

   Thus, an end host sending packets may never learn about a persistent
   problem affecting its traffic in the underlay network.

   Note that the term "ICMP(v6)" is used in this document to refer to
   both ICMPv4 and ICMPv6 when the same context applies to both.

3.1.  Issues in MTU propagation in an underlay

   As described in [RFC1981], IPv6 PMTUD relies on the ICMPv6 "Packet
   Too Big" error message, which a network device generates when it has
   to forward a packet over a link whose MTU is smaller than the packet
   size.

   Problems arise when an end-point device initiates Path MTU Discovery
   towards a remote end-point device across an overlay: with current
   implementations, PMTUD may be black-holed.

   The following cases can lead to black-holing of PMTUD packets:

   (1)  A VXLAN gateway may not set the DF bit in the outer IP header of
        the encapsulation.

   (2)  A VXLAN gateway is incapable of relaying the ICMP error
        "Fragmentation Needed and Don't Fragment was Set", generated by
        an IPv4-enabled underlay network device, to the IPv6-enabled
        end-point host, VM, or server (the source of the original
        packet).

   These problems are discussed in detail in the following subsections.

3.1.1.  Inaccurate MTU relayed to end hosts

   Figure 1 depicts the topology used throughout this document to
   explain the problem and the solution.

      +----------+                    +----------+
      |    H1    |                    |    H2    |
      |          |                    |          |
      |(H1_IPv6) |                    |(H2_IPv6) |
      +----------+                    +----------+
           |                               |
           |                               |
     +------------+   +----------+   +------------+
     |(VtepA_IPv6)|   |          |   |(VtepB_IPv6)|
     |   VtepA    |   |    R1    |   |   VtepB    |
     |(VtepA_IPv4)|---| (R1_IPv4)|---|(VtepB_IPv4)|
     +------------+   +----------+   +------------+

                      Figure 1. L3 Overlay

   LEGEND:
      MAC address : <node>_MAC
      IP address  : <node>_IPv4
      IPv6 address: <node>_IPv6
      <node>      : node names in the above topology are
                    H1, VtepA, R1, VtepB, H2.
      VtepA, VtepB: VXLAN gateways
      R1          : Intermediate router in the underlay network
      H1, H2      : End-point devices communicating with each other

   H1 and H2 are end-point hosts in different subnets, connected over a
   VXLAN overlay across the underlay network.  The VTEP tunnel end
   points, named VtepA and VtepB, are reachable over an IPv4 underlay
   network.  In this example, VtepA and VtepB are dual-stack enabled and
   act as VXLAN gateways for their connected hosts.  The MTU of the
   links VtepA-R1 and R1-VtepB is configured as 1300 bytes, whereas the
   links H1-VtepA and H2-VtepB are configured with an MTU of 1500 bytes.

   H1 sends a packet sized to the 1500-byte MTU of the H1-VtepA link.
   To reach H2, VtepA encapsulates the packet with VXLAN and UDP headers
   and an outer IP header addressed to the destination tunnel end point,
   VtepB, which is reachable over the underlay.

   If the size of the encapsulated packet to be sent over the VtepA-R1
   link exceeds that link's MTU (1300 bytes), the IPv4 packet (outer IP
   header + UDP header + VXLAN header + original L2 frame from H1
   carrying the IPv6 payload) has to be fragmented.
   If the VXLAN gateway VtepA does not set the DF bit in the outer IP
   header, the packet is fragmented and is reassembled at the egress
   gateway (VtepB).

   VtepB routes the reassembled packet to H2.  This can lead to an
   inaccurate Path MTU calculation at H1: because no ICMP error is
   received, H1 assumes the path MTU to be 1500 bytes.  This opens the
   door to ongoing fragmentation and reassembly, which costs additional
   CPU cycles on the networking devices of the underlay network.

3.1.2.  Packet_Too_Big not-relayed to host

   In Figure 1, assume that the only change to the topology is that the
   link between VtepA and R1 has an MTU of 1500 bytes, so the packet
   sent by H1 is forwarded by VtepA, which sets the DF bit in the outer
   IP header as part of VXLAN encapsulation.  When R1 receives the
   packet, the routing table lookup points to an outgoing link whose
   MTU, R1_VtepB_MTU bytes, is less than the packet size (1500 bytes).
   Because the DF bit is set, R1 generates an ICMPv4 error directed
   towards the outer source IP address (VtepA_IPv4), carrying the inner
   PDU of the original packet.  However, VtepA drops the ICMP error and
   fails to relay it to H1.  This leads to black-holing.

   The two subsections above describe potential problems for the IPv6
   Path MTU Discovery mechanism in an overlay network.  Although these
   problems are generic to any combination of underlay and overlay
   network types (IPv4 or IPv6), unless stated otherwise, the use-case
   topology in this document consists of IPv6 end-point devices
   connected over a VXLAN network whose underlay is an IPv4 network.

4.  Solution(s)

4.1.  Discovery of end-to-end Path MTU

   Because the VXLAN gateway both adds the VXLAN (or other overlay)
   header to packets entering the overlay network and removes it from
   packets leaving the overlay towards the end devices, it is the most
   appropriate device on which to implement the solution.

   First, VXLAN gateways (VtepA and VtepB) MUST set the DF bit in the
   outer header of client packets encapsulated with VXLAN (or a related
   encapsulation), to support Path MTU Discovery.  This ensures that an
   ICMP error is generated when the packet size exceeds a link MTU in
   the underlay network.

   Second, VXLAN gateway devices MUST translate the ICMP error
   "Destination Unreachable" with code "Fragmentation Needed and Don't
   Fragment was Set" into an ICMPv6 "Packet Too Big" error.  This
   requires that the original packet carried in the ICMP error message
   include information about the inner payload (the original packet),
   and that this payload be an IPv6 packet originated by an end-point
   device (H1, in the case of VtepA in Figure 1) connected to the VXLAN
   gateway over an L3/L2 network.

   Third, VXLAN gateway devices MUST translate the ICMPv6 error "Packet
   Too Big" into an ICMP "Destination Unreachable" error with code
   "Fragmentation Needed and Don't Fragment was Set".  Successful
   translation requires that the original packet carried in the ICMPv6
   error message include information about the inner payload (the
   original packet), and that this payload be an IPv4 packet originated
   by an end-point device connected to the gateway over an L3/L2
   network.

   Fourth, if the client-side network connected to the VXLAN gateway and
   the underlay network use the same IP version (both IPv4 or both
   IPv6), translation of the ICMP error type and code is not required.
   The rest of the process for retrieving the original packet is
   identical.

4.1.1.  ICMP extensions leveraged for MTU propagation

   This solution leverages the ICMP and ICMPv6 extensions in [RFC4884]
   for the maximum size of the original packet that can be carried in
   an ICMP error message with code "Fragmentation Needed" (ICMP) or type
   "Packet Too Big" (ICMPv6), respectively.  Because the host
   information is inside the inner payload, the ICMP message needs to
   carry additional bytes of the original packet: outer IP header + UDP
   header + VXLAN header + inner L2 header + inner IPv6 header up to and
   including the source and destination addresses.

   If the VXLAN overlay is provisioned over an IPv6 underlay, similar
   extensions apply to ICMPv6.

   The processing of ICMPv6 packets on VXLAN gateways is extended from
   the current standard behavior (not delivering them to upper layers)
   to relaying them to the end-point devices.

4.1.2.  Packet Path Processing

   This section explains packet path handling and processing, assuming
   the network topology described in Section 3.1.1.  The packet format
   in each flow shows the fields that are significant to this solution.
   To explain the solution, the section walks through the packet flow
   that leads an intermediate node in the underlay network to generate
   an ICMP or ICMPv6 error.

   Host H1 sends an IPv6 packet to host H2; the two hosts are in
   different IPv6 subnets.  This packet is referred to as P1 in this
   document.

         +----------------------------------------------------+
     H1--|L2_Hdr(14 bytes): src-mac:H1_MAC, dest-mac:VtepA_MAC|-->VtepA
         +----------------------------------------------------+
         |IPv6_Hdr(40 bytes): src-ip:H1_IPv6, dest-ip:H2_IPv6 |
         +----------------------------------------------------+
         |Host/App specific Payload                           |
         +----------------------------------------------------+
              Figure 2a. Packet P1 sent by host H1 to host H2

   As part of VXLAN encapsulation, VtepA rewrites the MAC addresses in
   P1.  The rewritten packet is referred to as P2 in this document.

         +-------------------------------------------------------+
         |L2_Hdr(14 bytes): src-mac:VtepA_MAC, dest-mac:VtepB_MAC|
         +-------------------------------------------------------+
         |IPv6_Hdr(40 bytes): src-ip:H1_IPv6, dest-ip:H2_IPv6    |
         +-------------------------------------------------------+
         |Host/App specific Payload                              |
         +-------------------------------------------------------+
                  Figure 2b. Packet P1 re-written by VtepA

4.1.2.1.  Packet Processing at VXLAN Gateway

   Processing at VtepA on the packet path from H1 to H2:

   (1)  VtepA (the VXLAN gateway) performs VXLAN encapsulation of the
        packet received from H1, based on a route lookup.  The
        encapsulation details are specified in [RFC7348].

   (2)  VtepA MUST set the DF bit in the outer IP header.

   (3)  Since the MTU of the outgoing link is larger than the packet,
        the packet is sent towards the underlay next hop, R1.

   (4)  The encapsulated packet, referred to as P3, is shown in
        Figure 3.  P3 may also be referred to without its outer L2
        header.  [RFC7348] provides details of the VXLAN encapsulation.

          +----------------------------------------------------------+
    VtepA-|L2_Hdr(14 bytes): src-mac:VtepA_MAC, dest-mac:R1_MAC      |-->R1
          +----------------------------------------------------------+
          |IPv4_Hdr(20 bytes):src-ip:VtepA_IPv4,dest-ip:VtepB_IPv4,DF|
          +----------------------------------------------------------+
          |UDP(8 bytes): src-port: ephemeral-port, dest-port: 4789   |
          +----------------------------------------------------------+
          |VXLAN(8 bytes): VXLAN Network Identifier (VNI)            |
          +----------------------------------------------------------+
          |P2 (see Figure 2b for details of the rewritten P1)        |
          +----------------------------------------------------------+
       Figure 3. VXLAN encapsulated packet (P3) sent by VXLAN gateway
                                to underlay

4.1.2.2.  Underlay Generates ICMP error

   If the underlay is IPv6 rather than IPv4, an ICMPv6 error is
   generated instead.

   Processing at R1:

   (1)  The packet size (1500 bytes) is larger than the outgoing link's
        MTU (1300 bytes), and the DF bit is set in the outer IPv4 header
        added by VtepA as part of VXLAN encapsulation.

   (2)  R1 MUST generate an ICMP error message (Destination Unreachable)
        with error code "Fragmentation Needed and Don't Fragment was
        Set".  For ease of description, the MTU is assumed to be
        symmetric over the reverse path; hence, the reverse-path MTU
        from R1 to VtepA is 1500 bytes.  The ICMP error message MUST
        include the MTU of the link between R1 and VtepB.

   (3)  In short, the ICMP PDU SHOULD be constructed as specified in
        [RFC4884] and [RFC4443].  These standards ensure that the
        original packet carried in the ICMP error PDU includes enough
        bytes to cover at least the inner packet's IPv6 header.  Whether
        application-specific details are also captured depends on the
        size of the optional headers in the original packet (generated
        by H1, as in Figure 2b) and of the subsequent transport header.
        This allows the VXLAN gateway at least to trace (via L3
        reachability) the original packet's sender (the end-point
        device), translate the ICMP error generated by the underlay into
        an ICMPv6 error, and relay it to the end-point device.  The
        original datagram carried in the ICMP PDU, whose length is given
        by the Length field, includes as many bytes as the reverse-path
        MTU permits.

   For simplicity, the flow diagram in Figure 4 omits the details of the
   original packet; the ICMP PDU details are depicted in Figure 5.
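
   Step (3) above requires the quoted original datagram to reach at
   least the inner IPv6 header.  As a rough illustration only, the
   Python sketch below computes that minimum for the layout of Figures 3
   and 5 (20 + 8 + 8 + 14 + 40 = 90 bytes, or 94 with one inner 802.1Q
   tag) and walks a received "Fragmentation Needed" message down to the
   inner IPv6 addresses, roughly as a VXLAN gateway such as VtepA might.
   The function names, the placeholder MAC values and documentation
   IPv4 addresses, the handling of at most one inner VLAN tag, and the
   decision to skip checksum validation are all assumptions of this
   sketch, not part of the specification.

```python
import ipaddress
import struct

# Header sizes from Figures 3 and 5 (IPv4 options are handled via IHL).
OUTER_IPV4, UDP_LEN, VXLAN_LEN = 20, 8, 8
ETH_LEN, DOT1Q_LEN, IPV6_LEN = 14, 4, 40
VXLAN_PORT = 4789                 # UDP destination port for VXLAN [RFC7348]
ETH_IPV6, ETH_VLAN = 0x86DD, 0x8100


def min_quote_len(vlan_tagged=False):
    """Bytes of the original datagram the ICMP error must quote so the
    gateway can reach the inner IPv6 source and destination addresses."""
    eth = ETH_LEN + (DOT1Q_LEN if vlan_tagged else 0)
    return OUTER_IPV4 + UDP_LEN + VXLAN_LEN + eth + IPV6_LEN


def parse_frag_needed(icmp):
    """Walk an ICMPv4 type 3 / code 4 message laid out as in Figure 5.

    Returns (next_hop_mtu, vni, inner_src, inner_dst), or None if the
    quoted datagram is not VXLAN-encapsulated IPv6 or is too short.
    """
    if len(icmp) < 8 or icmp[0] != 3 or icmp[1] != 4:
        return None
    next_hop_mtu = struct.unpack("!H", icmp[6:8])[0]  # RFC 1191 position
    q = icmp[8:]                                 # quoted original datagram
    if len(q) < OUTER_IPV4 or q[0] >> 4 != 4 or q[9] != 17:
        return None                              # not IPv4 carrying UDP
    ihl = (q[0] & 0x0F) * 4
    if ihl < OUTER_IPV4:
        return None
    udp = q[ihl:ihl + UDP_LEN]
    if len(udp) < UDP_LEN or struct.unpack("!H", udp[2:4])[0] != VXLAN_PORT:
        return None
    vx = q[ihl + UDP_LEN:ihl + UDP_LEN + VXLAN_LEN]
    if len(vx) < VXLAN_LEN or not vx[0] & 0x08:  # I flag: VNI is valid
        return None
    vni = int.from_bytes(vx[4:7], "big")
    l3 = ihl + UDP_LEN + VXLAN_LEN + ETH_LEN     # end of untagged inner L2
    if len(q) < l3:
        return None
    ethertype = struct.unpack("!H", q[l3 - 2:l3])[0]
    if ethertype == ETH_VLAN:                    # skip one 802.1Q tag
        l3 += DOT1Q_LEN
        if len(q) < l3:
            return None
        ethertype = struct.unpack("!H", q[l3 - 2:l3])[0]
    if ethertype != ETH_IPV6 or len(q) < l3 + IPV6_LEN:
        return None
    ip6 = q[l3:l3 + IPV6_LEN]
    return (next_hop_mtu, vni,
            ipaddress.IPv6Address(ip6[8:24]),    # inner source (H1)
            ipaddress.IPv6Address(ip6[24:40]))   # inner destination (H2)


def build_frag_needed(mtu, vni, h1, h2, vlan=None):
    """Hypothetical test helper: build a message shaped like Figures 4/5.
    Checksums are left at zero; IPv4/MAC addresses are placeholders."""
    ip6 = (struct.pack("!IHBB", 6 << 28, 0, 17, 64)
           + ipaddress.IPv6Address(h1).packed
           + ipaddress.IPv6Address(h2).packed)
    eth = bytes(6) + bytes.fromhex("020000000001")    # dst MAC, src MAC
    if vlan is not None:
        eth += struct.pack("!HH", ETH_VLAN, vlan)
    eth += struct.pack("!H", ETH_IPV6)
    vxlan = struct.pack("!II", 0x08 << 24, vni << 8)  # I flag, 24-bit VNI
    inner = eth + ip6
    udp = struct.pack("!HHHH", 49152, VXLAN_PORT,
                      UDP_LEN + VXLAN_LEN + len(inner), 0)
    total = OUTER_IPV4 + len(udp) + len(vxlan) + len(inner)
    outer = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total, 0, 0x4000, 64, 17, 0,
                        ipaddress.IPv4Address("192.0.2.1").packed,     # VtepA
                        ipaddress.IPv4Address("198.51.100.2").packed)  # VtepB
    # Type 3, code 4, checksum, unused, RFC 4884 Length (0 = unused), MTU
    return struct.pack("!BBHBBH", 3, 4, 0, 0, 0, mtu) + outer + udp + vxlan + inner


if __name__ == "__main__":
    msg = build_frag_needed(1300, 10010, "2001:db8:1::10", "2001:db8:2::20")
    print(min_quote_len(), parse_frag_needed(msg))
```

   The returned VNI and inner source address correspond to the {inner
   packet source IP, VNI} tuple used for the route lookup back to H1 in
   Section 4.1.2.3.  The returned MTU is the value that step (7) of that
   section copies into the ICMPv6 "Packet Too Big" message.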

       +-----------------------------------------------------------+
    R1-|L2_Hdr(14 bytes): src-mac:R1_MAC, dest-mac:VtepA_MAC       |-->VtepA
       +-----------------------------------------------------------+
       |IPv4_Hdr(20 bytes): src-ip:R1_IPv4, dest-ip:VtepA_IPv4     |
       +-----------------------------------------------------------+
       |ICMP PDU,type:3,code:4,R1_VtepB_MTU, P3(No outer L2 Header)|
       +-----------------------------------------------------------+
               Figure 4. Flow diagram from R1 to VtepA

   Figure 5 details the ICMP PDU.  Type 3 is "Destination Unreachable";
   Code 4 is "Fragmentation Needed and Don't Fragment was Set".

    0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |    Type=3     |    Code=4     |           Checksum            | ICMP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
   |    unused     |    Length     |  Next-Hop MTU = R1_VtepB_MTU  | Code=4
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |Ver=4  |IHL=5  |      TOS      |         Total Length          |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |        Identification         |Flags|     Fragment Offset     |  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Outer
   |      TTL      | Protocol=UDP  |        Header Checksum        | IPv4
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (20
   |                     src-ip = VtepA_IPv4                       | bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                     dest-ip = VtepB_IPv4                      |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |  Source UDP Port (ephemeral)  | Dest UDP Port = 4789 (VXLAN)  | UDP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8
   |            Length             |           Checksum            | bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |R|R|R|R|I|R|R|R|                   Reserved                    | VXLAN
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8
   |        VXLAN Network Identifier (VNI)         |   Reserved    | bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |              Inner Packet Dest-MAC = VtepB_MAC                |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                               |  Inner Packet Src-MAC =       | Inner
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Ether
   |                          VtepA_MAC                            | (14
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ bytes)
   |     Inner VLAN (if present)   |  EtherType = 0x86DD (IPv6)    |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |Ver=6  | Traffic Class |              Flow Label               |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |        Payload Length         |  Next Header  |   Hop Limit   |  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                                                               |  |
   |                      src-ipv6 = H1_IPv6                       | Inner
   |                                                               | IPv6
   |                                                               | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (40
   |                                                               | bytes)
   |                      dest-ipv6 = H2_IPv6                      |  |
   |                                                               |  |
   |                                                               |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |       ~ Optional Headers and Transport Header/Payload ~       | Varies
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----

         Figure 5. ICMP PDU Original Packet Capture in Detail

4.1.2.3.  Relay ICMP(v6) Error to End Devices

   More generally, this subsection describes how a VXLAN gateway handles
   ICMP errors that the underlay network generates in response to
   end-device packets.

   Processing at VtepA of an ICMP error message with code "Fragmentation
   Needed and Don't Fragment was Set":

   (1)  The VXLAN gateway processes the ICMP error as specified in
        [RFC1981], [RFC4884], and [RFC4443].

   (2)  If the error code is "Fragmentation Needed and Don't Fragment
        was Set", the gateway SHOULD, in addition to the standard
        processing referred to in step (1), further inspect the original
        packet P3 (the Ethernet payload without its L2 header) carried
        as data in the ICMP PDU.  This additional processing MUST be
        done before deciding whether to drop the packet or deliver it to
        upper-layer protocols.

   (3)  Next, the VXLAN gateway SHOULD perform VXLAN decapsulation, as
        defined in [RFC7348], on the quoted packet to arrive at the
        inner packet P2 (the original packet with the VtepA rewrite).
        Note that the underlay encapsulation quoted in the ICMP error
        does not carry the outer layer-2 header.  Once this is done, P2
        is the packet of interest, because it carries the credentials of
        the actual host that should receive the relayed ICMP packet.

   (4)  After decapsulation, the VNI should be cached, and the gateway
        should check whether it is a Layer-2 VNI (L2VNI) or a Layer-3
        VNI (L3VNI).  If it is an L2VNI, proceed to the next step to
        check the payload (Ethernet) type, and continue processing only
        if the payload is IPv6 or IPv4; otherwise, terminate the
        processing.  If it is an L3VNI, the VRF mapped to the VNI should
        be found in order to perform the route lookup for the inner
        packet's source IP address.

   (5)  The layer-3 payload type SHOULD be verified using the EtherType
        field in the Ethernet header.  If it indicates IPv6, the
        src-ipv6 field should be used to check reachability, because the
        ICMP packet MUST be sent to the original sender, H1.  If H1 is
        reachable, the ICMP packet SHOULD be constructed as described in
        the following steps.

   (6)  With P2 exposed, its L2 header is removed, and the remainder
        (shown in Figure 6) is run through ICMPv6 processing as
        specified in [RFC4443].

   (7)  The gateway SHOULD generate an ICMPv6 error message of type
        "Packet Too Big" destined to H1_IPv6, i.e., the source IPv6
        address of the inner IPv6 packet.  The MTU value R1_VtepB_MTU is
        copied from the ICMP error packet received from the underlay.

   (8)  The IPv6 header is constructed from the original payload shown
        in Figure 5.  The source IPv6 address is the local IPv6 address
        VtepA_IPv6, and the destination IPv6 address is the src-ipv6 of
        the original payload, H1_IPv6.  The Next Header field is set to
        58, which denotes ICMPv6.  The Ethernet header is derived from
        the next-hop-to-MAC-address mapping, as is done for any L3
        lookup.  Figure 9 shows the ICMPv6 error packet sent to H1, the
        originator of the original IPv6 packet (see Figure 2a).

   (9)  The route lookup for H1_IPv6 is performed in the VRF mapped to
        the VNI, as mentioned in step (4).  Thus, the tuple {inner packet
        source IP, VNI} is required to resolve the path back to the
        inner packet source.

    0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |Ver=6  | Traffic Class |              Flow Label               |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |        Payload Length         |  Next Header  |   Hop Limit   |  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                                                               |  |
   |                      src-ipv6 = H1_IPv6                       | Inner
   |                                                               | IPv6
   |                                                               | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (40
   |                                                               | bytes)
   |                      dest-ipv6 = H2_IPv6                      |  |
   |                                                               |  |
   |                                                               |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |    ~ Optional Headers and Transport/Application Payload ~     | Varies
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----

        Figure 6. Original IPv6 Packet sent from H1 directed to H2

   Figure 6 shows a typical IPv6 packet sent by end host H1 towards H2
   and encapsulated by the VXLAN gateway.  The gateway uses it to
   translate the ICMP error generated by the underlay hop R1 into one
   that H1 understands in the right context.

    0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |    Type=2     |    Code=0     |           Checksum            | ICMPv6
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
   |                       MTU = R1_VtepB_MTU                      | Code=0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |Ver=6  | Traffic Class |              Flow Label               |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |        Payload Length         |  Next Header  |   Hop Limit   |  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                                                               |  |
   |                      src-ipv6 = H1_IPv6                       | Orig.
   |                                                               | Packet
   |                                                               | IPv6
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Header
   |                                                               | (40
   |                      dest-ipv6 = H2_IPv6                      | bytes)
   |                                                               |  |
   |                                                               |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |    ~ Optional/Transport Headers and Application Payload ~     | Varies
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----

             Figure 7. ICMPv6 "Packet Too Big" PDU relayed
                     to H1 by VXLAN Gateway (VtepA)

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |                      Dest-MAC = H1_MAC                        |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                               |      Src-MAC =                | Eth.
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Header
   |                          VtepA_MAC                            | (14
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ bytes)
   |     VLAN (if present)         |  EtherType = 0x86DD (IPv6)    |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----
   |Ver=6  | Traffic Class |              Flow Label               |  ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |        Payload Length         |Next Header=58 |   Hop Limit   |  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
   |                                                               |  |
   |                    src-ipv6 = VtepA_IPv6                      | IPv6
   |                                                               | Header
   |                                                               | (40
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ bytes)
   |                                                               |  |
   |                      dest-ipv6 = H1_IPv6                      |  |
   |                                                               |  |
   |                                                               |  v
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -----

        Figure 8. Ethernet and IPv6 encapsulation for the ICMPv6 PDU
                  shown in Figure 7

   The translated ICMP packet consists of the headers in Figure 8
   followed by the PDU in Figure 7, i.e., the two figures put together
   in reverse order.  The flow diagram in Figure 9 gives a concise view
   of the ICMPv6 "Packet Too Big" error relayed by VtepA (the VXLAN
   gateway) towards H1 (the end-point device).

          +------------------------------------------------------+
   VtepA--|L2_Hdr(14 bytes): src-mac:VtepA_MAC, dest-mac:H1_MAC  |-->H1
          +------------------------------------------------------+
          |IPv6_Hdr(40 bytes): src-ip:VtepA_IPv6, dest-ip:H1_IPv6|
          +------------------------------------------------------+
          |ICMPv6: Packet_Too_Big, MTU, data: inner IPv6 packet  |
          +------------------------------------------------------+
                  Figure 9. Flow diagram: VtepA to H1

   A few more potential flows are worth mentioning in this section:
   those in which the ICMP error is generated by the ingress VXLAN
   gateway (VtepA) or by the egress VXLAN gateway (VtepB) for a packet
   sent from H1 to H2.  In the ingress gateway (VtepA) case, the legacy
   IPv6 PMTUD rules from [RFC4443] SHOULD be applied, as no VXLAN
   encapsulation is involved.

   The egress VXLAN gateway (VtepB), by contrast, SHOULD include packet
   P3 (without its L2 header) in the ICMP data, even though the MTU
   check may be done after VXLAN decapsulation, i.e., when the outgoing
   link is identified as the VtepB-H2 link.  VtepB MAY buffer P3 before
   the lookup based on the inner packet (P2) credentials, so that P3 can
   be encapsulated in the ICMP packet.  This also keeps the packet
   format consistent when VtepA accesses it for translation before
   relaying it to H1.

4.1.3.  ICMP(v6) Error Translation

   This section describes the translation of ICMP and ICMPv6 errors
   generated in the underlay network into errors understood by the
   end-point device, with encapsulation matching the network type (IPv4
   or IPv6) with which the end-point device and the underlay are
   provisioned.  The last-leg processing described in the previous
   subsection is specific to the topology in Section 3.1.1; this
   subsection covers all combinations of IPv4 and IPv6 for the underlay
   and end-device networks.  For each combination, figures show the
   error generated by the underlay and the translated error relayed to
   the end-point device by the VXLAN gateway.

   (a)  The end point is IPv6-connected and the underlay is provisioned
        with IPv4.

   (b)  The end point is IPv4-connected and the underlay is provisioned
        with IPv6.

   (c)  Both the end point and the underlay are provisioned with IPv6.

   (d)  Both the end point and the underlay are provisioned with IPv4.

4.1.3.1.
End-Point is IPv6 connected and Underlay is IPv4 provisioned 716 This case is similar to the last leg processing described in 717 Section 4.1.2 and does not needs any more description. 719 4.1.3.2. End-Point is IPv4 connected and Underlay is IPv6 provisioned 721 Topology drawn in figure 10, provides for the icmpv6 PDU encap 722 generated by R1. H1_IPv4 and H2_IPv4 are in distinct ipv4 subnets. 723 R1_IPv6 represents IPv6 addresses falling in both subnets connecting 724 to VtepA and VtepB. 726 Another difference between an IPv4 and IPv6 underlay is that for IPv6 727 underlay there is no concept of DF-bit. The fragmentation can only 728 be done at ingress. At all other underlay nodes "Packet too big" 729 icmpv6 error is generated. Vxlan Gateway SHOULD ensure that 730 fragmentation is avoided at Vxlan Gateway and icmp error is sent back 731 to H1. This procedure is applicable if and only if, original packet 732 contains DF-bit set in it's IP header. 734 +----------+ +----------+ 735 | H1 | | H2 | 736 | | | | 737 |(H1_IPv4) | |(H2_IPv4) | 738 +----------+ +----------+ 739 | | 740 | | 741 +------------+ +----------+ +------------+ 742 |(VtepA_IPv4)| | | |(VtepB_IPv4)| 743 | VtepA | | R1 | | VtepB | 744 |(VtepA_IPv6)|---| (R1_IPv6)|---|(VtepB_IPv6)| 745 +------------+ +----------+ +------------+ 747 Figure 10. L3 Overlay 749 LEGEND: 750 MAC address : _MAC 751 IPv4 address: _IPv4 752 IPv6 address: _IPv6 753 : node names in the above topology are 754 H1, VtepA, R1, VtepB, H2. 
      VtepA, VtepB: Vxlan gateways to underlay network

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   Type = 2    |   Code = 0    |           Checksum            | ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
|                      MTU = R1_VtepB_MTU                       | Code=0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |              Flow Label               | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|         Payload Length        |  Next Header  |   Hop Limit   | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                     src-ipv6 = VtepA_IPv6                     | Outer
|                                                               | IPv6
|                                                               | (40 B)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                    dest-ipv6 = VtepB_IPv6                     | |
|                                                               | |
|                                                               | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      ~ Extension Headers (if any); next header is UDP ~      | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Source Port (ephemeral)    |   Dest Port = 4789 (Vxlan)    | UDP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8 B)
|            Length             |           Checksum            | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|R|R|R|R|I|R|R|R|                    Reserved                   | Vxlan
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8 B)
|        Vxlan Network Identifier (VNI)         |   Reserved    | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                     Dest-MAC = VtepB_MAC                      | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                               |           Src-MAC =           | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Inner
|                           VtepA_MAC                           | Eth
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (14 B)
|     VLAN tag (if present)     |   EtherType = 0x0800 (IPv4)   | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      |   Protocol    |        Header Checksum        | Orig
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                       src-ip = H1_IPv4                        | hdr
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       dest-ip = H2_IPv4                       | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     ~ Options, Transport Header and Application Payload ~     | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

            Figure 11. ICMPv6 PDU sent by R1 to VtepA

   R1 sends an ICMPv6 "Packet Too Big" error towards VtepA; the ICMPv6
   PDU is shown in Figure 11. VtepA translates this ICMPv6 PDU into an
   ICMP PDU with type "Destination Unreachable" (3) and code
   "Fragmentation Needed and DF Set" (4) before relaying it to H1 over
   the IPv4 network. Figure 12 shows the packet relayed by VtepA to H1.
   All other details are as described in Section 4.1.2.
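   The translation performed by VtepA can be sketched as follows. This
   is a minimal, non-normative Python sketch under stated assumptions:
   the function names are hypothetical; the MTU value is copied
   unchanged into the 16-bit Next-Hop MTU field, as Figure 12 shows,
   and only clamped to that field's range; the Length field is left at
   0 (no extension structure, see [RFC4884]); and the quoted original
   datagram is truncated so that the ICMP message stays within 576
   octets, following the usual IPv4 router guidance of RFC 1812. The
   DF check reflects the rule above that the procedure applies only
   when the original packet has DF set. Building the Ethernet and IPv4
   headers towards H1 is left out.

```python
import struct
from typing import Optional

ICMPV6_PACKET_TOO_BIG = 2
ICMP_DEST_UNREACHABLE = 3
ICMP_FRAG_NEEDED = 4
IPV4_FLAG_DF = 0x4000  # DF bit within the 16-bit flags/fragment-offset field


def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's complement of the one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def ptb_to_frag_needed(icmpv6_ptb: bytes, inner_ipv4: bytes,
                       max_quote: int = 576 - 20 - 8) -> Optional[bytes]:
    """Translate an ICMPv6 Packet Too Big into an ICMPv4 Fragmentation Needed.

    icmpv6_ptb -- ICMPv6 message from R1, starting at its Type field.
    inner_ipv4 -- H1's original IPv4 packet, recovered by the gateway from
                  the Vxlan payload quoted in the ICMPv6 message.
    Returns the ICMPv4 message (Type field onwards), or None when the
    original packet does not have DF set (the procedure does not apply).
    max_quote  -- assumption: 548 keeps IPv4 hdr + ICMP hdr + quote <= 576.
    """
    icmp_type, code, _checksum, mtu = struct.unpack("!BBHI", icmpv6_ptb[:8])
    if icmp_type != ICMPV6_PACKET_TOO_BIG or code != 0:
        raise ValueError("not an ICMPv6 Packet Too Big message")
    (flags_frag,) = struct.unpack("!H", inner_ipv4[6:8])
    if not flags_frag & IPV4_FLAG_DF:
        return None
    next_hop_mtu = min(mtu, 0xFFFF)  # IPv4 Next-Hop MTU field is 16 bits
    quote = inner_ipv4[:max_quote]
    # Type, Code, Checksum (0 for now), unused, Length, Next-Hop MTU
    header = struct.pack("!BBHBBH", ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED,
                         0, 0, 0, next_hop_mtu)
    checksum = inet_checksum(header + quote)
    return header[:2] + struct.pack("!H", checksum) + header[4:] + quote
```

   For example, an ICMPv6 Packet Too Big carrying an MTU of 1400 yields
   a Type 3, Code 4 message whose Next-Hop MTU field is 1400, followed
   by H1's original packet.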

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                       Dest-MAC = H1_MAC                       | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                               |           Src-MAC =           | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Eth
|                           VtepA_MAC                           | (14 B)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     VLAN tag (if present)     |   EtherType = 0x0800 (IPv4)   | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      |  Protocol=1   |        Header Checksum        | IPv4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                      src-ip = VtepA_IPv4                      | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       dest-ip = H1_IPv4                       | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       Options (if any)                        | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   Type = 3    |   Code = 4    |           Checksum            | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|    unused     |    Length     |  Next-Hop MTU = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      |   Protocol    |        Header Checksum        | Orig
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                       src-ip = H1_IPv4                        | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       dest-ip = H2_IPv4                       | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     ~ Options, Transport Header and Application Payload ~     | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

      Figure 12. ICMPv4 error packet relayed to end-point host H1

4.1.3.3.  Both End-Point and Underlay are provisioned with IPv6

   The topology, shown in Figure 13, is similar to the previous one,
   with the legend adjusted accordingly. Figure 14 shows the ICMPv6 PDU
   encapsulation generated by R1. H1_IPv6 and H2_IPv6 are in different
   IPv6 subnets. R1_IPv6 represents R1's addresses in both subnets
   connecting it to VtepA and VtepB.

    +----------+                    +----------+
    |    H1    |                    |    H2    |
    |          |                    |          |
    |(H1_IPv6) |                    |(H2_IPv6) |
    +----------+                    +----------+
         |                               |
         |                               |
   +------------+   +----------+   +------------+
   |(VtepA_IPv6)|   |          |   |(VtepB_IPv6)|
   |   VtepA    |   |    R1    |   |   VtepB    |
   |(VtepA_IPv6)|---| (R1_IPv6)|---|(VtepB_IPv6)|
   +------------+   +----------+   +------------+

                       Figure 13. L3 Overlay

   LEGEND:
      MAC address : <node>_MAC
      IPv6 address: <node>_IPv6
      <node>      : node names in the above topology are
                    H1, VtepA, R1, VtepB, H2.
      VtepA, VtepB: Vxlan gateways to underlay network

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   Type = 2    |   Code = 0    |           Checksum            | ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
|                      MTU = R1_VtepB_MTU                       | Code=0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |              Flow Label               | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|         Payload Length        |  Next Header  |   Hop Limit   | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                     src-ipv6 = VtepA_IPv6                     | Outer
|                                                               | IPv6
|                                                               | (40 B)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                    dest-ipv6 = VtepB_IPv6                     | |
|                                                               | |
|                                                               | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      ~ Extension Headers (if any); next header is UDP ~      | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Source Port (ephemeral)    |   Dest Port = 4789 (Vxlan)    | UDP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8 B)
|            Length             |           Checksum            | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|R|R|R|R|I|R|R|R|                    Reserved                   | Vxlan
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8 B)
|        Vxlan Network Identifier (VNI)         |   Reserved    | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                     Dest-MAC = VtepB_MAC                      | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                               |           Src-MAC =           | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Inner
|                           VtepA_MAC                           | Eth
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (14 B)
|     VLAN tag (if present)     |   EtherType = 0x86DD (IPv6)   | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |              Flow Label               | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|         Payload Length        |  Next Header  |   Hop Limit   | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                      src-ipv6 = H1_IPv6                       | Inner
|                                                               | IPv6
|                                                               | (40 B)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                      dest-ipv6 = H2_IPv6                      | |
|                                                               | |
|                                                               | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    ~ Extension Headers and Transport/Application Payload ~    | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

        Figure 14. ICMPv6 PDU generated by intermediate hop R1
                   in the Vxlan network

   R1 sends an ICMPv6 "Packet Too Big" error towards VtepA; the ICMPv6
   PDU is shown in Figure 14. Because H1 is connected to VtepA over an
   IPv6 network, VtepA relays the error to H1 without translating it.
   All other details about the original packet to be included in the
   ICMPv6 PDU are as described in Section 4.1.2.
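   Relaying without translation still changes the IPv6 header: in
   Figure 15 the error is re-sent from VtepA_IPv6 to H1_IPv6, and
   because the ICMPv6 checksum covers a pseudo-header containing the
   source and destination addresses ([RFC4443], Section 2.3), VtepA has
   to recompute it. The minimal Python sketch below rebuilds the Packet
   Too Big message for H1. The helper names and example addresses are
   hypothetical, and the default quote limit follows the [RFC4443] rule
   that the error must not exceed the IPv6 minimum MTU of 1280 octets;
   an implementation quoting less, as in the 128-byte example of
   Figure 9, would pass a smaller max_quote.

```python
import ipaddress
import struct

ICMPV6_NEXT_HEADER = 58
IPV6_MIN_MTU = 1280


def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over data, padding odd lengths with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_ptb_for_host(src: str, dst: str, mtu: int, original: bytes,
                       max_quote: int = IPV6_MIN_MTU - 40 - 8) -> bytes:
    """Build an ICMPv6 Packet Too Big (Type 2, Code 0) sent from src to dst.

    original -- H1's original IPv6 packet (the inner packet in Figure 14).
    The checksum covers the IPv6 pseudo-header (source, destination,
    upper-layer length, next header 58) followed by the ICMPv6 message.
    """
    quote = original[:max_quote]  # keep IPv6 hdr + ICMPv6 msg <= 1280 octets
    message = struct.pack("!BBHI", 2, 0, 0, mtu) + quote
    pseudo_header = (ipaddress.IPv6Address(src).packed
                     + ipaddress.IPv6Address(dst).packed
                     + struct.pack("!I3xB", len(message), ICMPV6_NEXT_HEADER))
    checksum = inet_checksum(pseudo_header + message)
    return message[:2] + struct.pack("!H", checksum) + message[4:]
```

   Reusing the checksum from R1's message would be wrong here, since
   R1's pseudo-header contained R1_IPv6 and VtepA_IPv6 rather than
   VtepA_IPv6 and H1_IPv6.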

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                       Dest-MAC = H1_MAC                       | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                               |           Src-MAC =           | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Eth
|                           VtepA_MAC                           | (14 B)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     VLAN tag (if present)     |   EtherType = 0x86DD (IPv6)   | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |              Flow Label               | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|         Payload Length        |  Next Header  |   Hop Limit   | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                     src-ipv6 = VtepA_IPv6                     | IPv6
|                                                               | header
|                                                               | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                      dest-ipv6 = H1_IPv6                      | |
|                                                               | |
|                                                               | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|    ~ Extension Headers (if any); next header is ICMPv6 ~     | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   Type = 2    |   Code = 0    |           Checksum            | ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=2
|                      MTU = R1_VtepB_MTU                       | Code=0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=6  | Traffic Class |              Flow Label               | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|         Payload Length        |  Next Header  |   Hop Limit   | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                      src-ipv6 = H1_IPv6                       | Orig
|                                                               | IPv6
|                                                               | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                                                               | |
|                      dest-ipv6 = H2_IPv6                      | |
|                                                               | |
|                                                               | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    ~ Extension Headers and Transport/Application Payload ~    | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

      Figure 15. Complete ICMPv6 error packet sent to H1 by VtepA

4.1.3.4.  Both End-Point and Underlay are provisioned with IPv4

   The topology, shown in Figure 16, is similar to the previous ones,
   with the legend adjusted accordingly. Figure 17 shows the ICMP PDU
   encapsulation generated by R1. H1_IPv4 and H2_IPv4 are in different
   IPv4 subnets.

    +----------+                    +----------+
    |    H1    |                    |    H2    |
    |          |                    |          |
    |(H1_IPv4) |                    |(H2_IPv4) |
    +----------+                    +----------+
         |                               |
         |                               |
   +------------+   +----------+   +------------+
   |(VtepA_IPv4)|   |          |   |(VtepB_IPv4)|
   |   VtepA    |   |    R1    |   |   VtepB    |
   |(VtepA_IPv4)|---| (R1_IPv4)|---|(VtepB_IPv4)|
   +------------+   +----------+   +------------+

                       Figure 16. L3 Overlay

   LEGEND:
      MAC address : <node>_MAC
      IPv4 address: <node>_IPv4
      <node>      : node names in the above topology are
                    H1, VtepA, R1, VtepB, H2.
      VtepA, VtepB: Vxlan gateways to underlay network

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   Type = 3    |   Code = 4    |           Checksum            | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|    unused     |    Length     |  Next-Hop MTU = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      | Protocol=UDP  |        Header Checksum        | Outer
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                      src-ip = VtepA_IPv4                      | hdr
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                     dest-ip = VtepB_IPv4                      | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|    Source Port (ephemeral)    |   Dest Port = 4789 (Vxlan)    | UDP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8 B)
|            Length             |           Checksum            | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|R|R|R|R|I|R|R|R|                    Reserved                   | Vxlan
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (8 B)
|        Vxlan Network Identifier (VNI)         |   Reserved    | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                     Dest-MAC = VtepB_MAC                      | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                               |           Src-MAC =           | Inner
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Eth
|                           VtepA_MAC                           | (14 B)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     VLAN tag (if present)     |   EtherType = 0x0800 (IPv4)   | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      |   Protocol    |        Header Checksum        | Inner
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                       src-ip = H1_IPv4                        | hdr
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       dest-ip = H2_IPv4                       | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     ~ Options, Transport Header and Application Payload ~     | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

           Figure 17. ICMP PDU generated by R1 towards VtepA

   R1 sends an ICMP error towards VtepA; the ICMP PDU is shown in
   Figure 17. VtepA receives the packet carrying this ICMP PDU and
   relays it to H1 over the IPv4 network.
   Figure 18 shows the packet sent by VtepA to H1. All other details
   are as described in Section 4.1.2.

 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                       Dest-MAC = H1_MAC                       | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                               |           Src-MAC =           | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Eth
|                           VtepA_MAC                           | header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     VLAN tag (if present)     |   EtherType = 0x0800 (IPv4)   | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      |  Protocol=1   |        Header Checksum        | IPv4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ header
|                      src-ip = VtepA_IPv4                      | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       dest-ip = H1_IPv4                       | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       Options (if any)                        | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|   Type = 3    |   Code = 4    |           Checksum            | ICMP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type=3
|    unused     |    Length     |  Next-Hop MTU = R1_VtepB_MTU  | Code=4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|Ver=4  | IHL=5 |      TOS      |         Total Length          | ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|        Identification         |Flags|     Fragment Offset     | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|      TTL      |   Protocol    |        Header Checksum        | Orig
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IPv4
|                       src-ip = H1_IPv4                        | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|                       dest-ip = H2_IPv4                       | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|     ~ Options, Transport Header and Application Payload ~     | varies
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------

        Figure 18. Complete ICMP error packet sent to H1 by VtepA

5.  Inter-site MTU Propagation

                              +--+
                              |CE|
                              +--+
                               |
                             +----+
                          +--| PE |--+
          +---------+     |  +----+  |     +---------+
  +----+  |        +---+  |          |  +---+        |  +----+
  |NVE1|--|        |   |  |          |  |   |        |--|NVE3|
  +----+  |---U1---|GW1|-------P--------|GW3|---U2---|  +----+
          |        +---+                +---+        |
          |  NVO-1   |        WAN         |   NVO-2  |
          |        +---+                +---+        |
          |        |   |                |   |        |
  +----+  |        |GW2|                |GW4|        |  +----+
  |NVE2|--|        +---+                +---+        |--|NVE4|
  +----+  +---------+ |                  | +---------+  +----+
                      +------------------+

    Figure 19. Datacenter/Site Interconnect Between Remote EVPN Fabrics

   This section describes the relay of ICMP errors generated by the
   underlay when sites or fabrics are interconnected across EVPN
   overlays. The reference diagram above is based on [RFC9014].

   The topology above shows two separate NVO fabrics connected across a
   WAN using an EVPN-provisioned overlay. Consider the interconnect to
   be an EVPN-Overlay over the WAN (between GW1/2 and GW3/4). Hosts in
   the fabrics behind the edges NVE1/2 and NVE3/4 are therefore
   reachable over a multi-hop overlay (tunnel) path: there is an
   EVPN-Overlay tunnel between each NVE and its gateway (between NVE1/2
   and GW1/2, and between NVE3/4 and GW3/4), and the inter-site
   EVPN-Overlay connection between the gateways completes end-to-end
   connectivity between the NVEs across the WAN. The data plane of each
   fabric can be Vxlan, MPLS, NVGRE, GENEVE, GUE, GPE, etc.

   A packet sent from a network behind NVE1 to one behind NVE3 transits
   three EVPN-Overlay tunnels: the first between NVE1 and GW1, the
   second between the WAN gateways GW1 and GW3, and the third between
   GW3 and NVE3. There is an EVPN-Overlay handoff at the intermediate
   EVPN tunnel end-points in the packet path, i.e., at GW1 and GW3.

   The overlay-encapsulated packet may exceed the MTU at one of the
   underlay routers, say P3. P3 generates an ICMP error towards GW3 as
   the tunnel end-point. GW3 should check the fields of the original
   PDU carried in the ICMP error and perform a route lookup. The path
   to the packet source (behind NVE1) is very likely via the
   EVPN-Overlay tunnel from GW3 to GW1, so the ICMP error is relayed
   back over that EVPN-Overlay tunnel towards GW1. In the same flow,
   GW1 should inspect the original PDU fields to find the route to the
   inner packet source, which is typically reachable over the
   EVPN-Overlay tunnel from GW1 to NVE1; the error goes through the
   same decapsulation and re-encapsulation as described in earlier
   sections. The EVIs at each stitching point may differ, provided
   that routes are exported between the VNIs. The first-hop Vtep
   towards the source, i.e., NVE1, should perform the procedures in
   Section 4.1.2 to relay the ICMP packet to the original source of
   the packet.

6.  Same Subnet Considerations

   This section covers how the source Vtep propagates an MTU-related
   ICMP or ICMPv6 error, generated by an underlay device, to the inner
   packet source when the inner packet's source and destination IPv4
   (or IPv6) addresses are in the same subnet.
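   Section 4.1.2.3 requires the relaying Vtep, for a PDU carried on an
   L2VNI, to check whether the quoted inner frame carries IPv4 or IPv6
   before relaying the error. A minimal Python sketch of that check
   follows. The helper name is hypothetical; it assumes the gateway has
   already located the inner Ethernet frame after the Vxlan header, and
   it skips IEEE 802.1Q (0x8100) and 802.1ad (0x88A8) tags, which the
   figures in this document show only as an inner VLAN "if present".

```python
import struct
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
VLAN_TPIDS = (0x8100, 0x88A8)  # 802.1Q C-tag and 802.1ad S-tag


def inner_ip_version(frame: bytes) -> Optional[int]:
    """Return 4 or 6 if the inner Ethernet frame carries IPv4/IPv6, else None.

    A None result means the ICMP error is not relayed, since the inner
    packet is not a layer-3 PDU (for example ARP) or is truncated.
    """
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    while ethertype in VLAN_TPIDS:  # skip one or more VLAN tags
        if len(frame) < offset + 4:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    return {ETHERTYPE_IPV4: 4, ETHERTYPE_IPV6: 6}.get(ethertype)
```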
   The steps in Section 4.1.2.3 describe the check to be performed to
   determine whether the ICMP error carries an original PDU
   encapsulated with an L2VNI or an L3VNI. With an L2VNI, the inner
   traffic is very likely to be "same subnet" traffic. Hence Section
   4.1.2.3 also requires checking the EtherType of the inner packet:
   the ICMP error is relayed back to the inner packet's source IPv4 (or
   IPv6) address if and only if the EtherType is IPv4 or IPv6.
   Otherwise, the relay is not processed further, because relaying an
   ICMP error for a non-layer-3 inner PDU makes no sense.

7.  ECMP Considerations

   ECMP considerations are driven by the packets sent by the end-host
   application and by how those packets are spread across the ECMP
   paths.

   A few more considerations apply to make MTU propagation via ICMPv6
   errors independent of the ECMP paths in a Vxlan network. In the
   Vxlan gateway, the route lookup is based on attributes carried in
   the packet generated by the end-point host; that packet may, for
   example, come from a TCP-based end-host application (although this
   should not be generalized).

   In contrast, an intermediate node in the underlay network (say, a
   spine node in a Clos topology) performs lookups based on the outer
   encapsulation (the Vtep IP addresses and the UDP header).

   For an L2 gateway, where the Vxlan gateway (Vtep node) bridges
   (rather than routes) host packets destined to the same subnet, the
   MTU calculation SHOULD come into play only in the spine devices.
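   Flows between the same pair of Vteps can see different MTUs because
   the spine hashes on the outer header, whose UDP source port the
   ingress Vtep derives from the inner flow ([RFC7348] recommends a
   hash of the inner headers, with the port in the 49152-65535 range).
   The Python sketch below illustrates this, together with the per-path
   MTU cache proposed at the end of this section. CRC-32 as the hash,
   the path-selection function, and keying the cache by the address of
   the router that reported the MTU are illustrative assumptions, not
   behavior defined by this document or by any particular switch.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int

    def key(self) -> bytes:
        return (f"{self.src_ip}|{self.dst_ip}|{self.proto}|"
                f"{self.src_port}|{self.dst_port}").encode()


def outer_udp_src_port(inner: FiveTuple) -> int:
    """Ingress Vtep: derive the outer UDP source port from the inner flow."""
    return 49152 + zlib.crc32(inner.key()) % 16384  # 49152..65535


def spine_next_hop(outer: FiveTuple, n_paths: int) -> int:
    """Underlay node: pick an ECMP member using only the outer header."""
    return zlib.crc32(outer.key()) % n_paths


class EcmpMtuCache:
    """Hypothetical ingress-Vtep cache of MTUs reported over ECMP paths."""

    def __init__(self) -> None:
        self._mtu_by_reporter: Dict[str, int] = {}

    def learn(self, reporter: str, mtu: int) -> None:
        self._mtu_by_reporter[reporter] = mtu  # latest report per path wins

    def mtu_to_relay(self) -> Optional[int]:
        """Lowest MTU known across the ECMP paths, or None if none learned."""
        return min(self._mtu_by_reporter.values(), default=None)
```

   Because the outer source port changes with the inner flow, two
   connections from H1 to H2 may be hashed onto different spine links;
   relaying the lowest cached value keeps the MTU reported to the host
   conservative for every path.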

   As a potential solution, the MTU values received over ECMP underlay
   paths can be cached at the ingress Vteps. The Vtep MAY relay the
   lowest of all the MTUs received across the ECMP underlay paths to
   the end host.

8.  Security Considerations

   This document inherits all the security considerations discussed in
   [RFC1981] and [RFC1191].

9.  IANA Considerations

   TBD

10.  Acknowledgements

   Thanks to Vengada Prasad Govindan, Deepak Kumar, Matthew Bocci and
   Rohit Mendiratta for their input.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2.  Informative References

   [I-D.draft-gross-geneve]
              Gross, J., Sridhar, T., Garg, P., Wright, C., Ganga, I.,
              Agarwal, P., Duda, K., Dutt, D., and J. Hudson, "Geneve:
              Generic Network Virtualization Encapsulation", Work in
              Progress, Internet-Draft, draft-gross-geneve-02, 25
              October 2015.

   [I-D.draft-ietf-nvo3-gue]
              Herbert, T., Yong, L., and O. Zia, "Generic UDP
              Encapsulation", Work in Progress, Internet-Draft,
              draft-ietf-nvo3-gue-03, 6 March 2015.

   [I-D.draft-ietf-nvo3-vxlan-gpe]
              Quinn, P., Manur, R., Kreeger, L., Lewis, D., Maino, F.,
              Smith, M., Agarwal, P., Yong, L., Xu, X., Elzur, U., and
              D. Melman, "Generic Protocol Extension for VXLAN", Work in
              Progress, Internet-Draft, draft-ietf-nvo3-vxlan-gpe-02, 1
              May 2015.

   [I-D.nordmark-nvo3-transcending-traceroute]
              Nordmark, E., Appanna, C., and A. Lo, "Layer-Transcending
              Traceroute for Overlay Networks like VXLAN", Work in
              Progress, Internet-Draft, draft-nordmark-nvo3-
              transcending-traceroute-02, 4 March 2015.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
              April 2007.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, September 2015.

   [RFC9014]  Rabadan, J., Sathappan, S., Henderickx, W., Sajassi, A.,
              and W. Drake, "Interconnect Solution for Ethernet VPN
              (EVPN) Overlay Networks", RFC 9014, May 2021.

Authors' Addresses

   Saumya Dikshit
   Aruba Networks, HPE
   Mahadevpura
   Bangalore 560 048
   Karnataka
   India

   Email: saumya.dikshit@hpe.com


   Vinayak Joshi
   Aruba Networks, HPE
   Mahadevpura
   Bangalore 560 048
   Karnataka
   India

   Email: vinayak.joshi@hpe.com


   A. Sujeet Nayak
   Cisco
   Cessna Business Park
   Bangalore 560 087
   Karnataka
   India

   Email: sua@cisco.com