Internet Area WG                                                J. Touch
Internet Draft                                                   USC/ISI
Intended status: Informational                               M. Townsley
Updates: 4459                                                      Cisco
Expires: January 2017                                       July 6, 2016


                IP Tunnels in the Internet Architecture
                   draft-ietf-intarea-tunnels-03.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 6, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document discusses the role of IP tunnels in the Internet
   architecture, in which IP datagrams are carried as payloads in
   non-link layer protocols.
   It explains their relationship to existing
   protocol layers and the challenges in supporting IP tunneling based
   on the equivalence of tunnels to links.

Table of Contents

   1. Introduction
   2. Conventions used in this document
      2.1. Key Words
      2.2. Terminology
   3. The Tunnel Model
      3.1. What is a tunnel?
      3.2. View from the Outside
      3.3. View from the Inside
      3.4. Location of the Ingress and Egress
      3.5. Implications of This Model
      3.6. Fragmentation
         3.6.1. Outer Fragmentation
         3.6.2. Inner Fragmentation
         3.6.3. The necessity of Outer Fragmentation
   4. IP Tunnel Requirements
      4.1. Minimum MTU Considerations
      4.2. Fragmentation
      4.3. MTU discovery
      4.4. IP ID exhaustion
      4.5. Hop Count
      4.6. Signaling
      4.7. Relationship of Header Fields
      4.8. Congestion
      4.9. Checksums
      4.10. Numbering
      4.11. Multicast
      4.12. Multipoint
      4.13. NAT / Load Balancing
      4.14. Recursive tunnels
   5. Observations (implications)
      5.1. Tunnel protocol designers
      5.2. Tunnel implementers
      5.3. Tunnel operators
      5.4. Diagnostics
      5.5. For existing standards
         5.5.1. Generic UDP Encapsulation (GUE - IP in UDP in IP)
         5.5.2. Generic Packet Tunneling in IPv6
         5.5.3. Geneve (NVO3)
         5.5.4. GRE (IP in GRE in IP)
         5.5.5. IP in IP / mobile IP
         5.5.6. IPsec tunnel mode (IP in IPsec in IP)
         5.5.7. L2TP
         5.5.8. L2VPN
         5.5.9. L3VPN
         5.5.10. LISP
         5.5.11. MPLS
         5.5.12. PWE
         5.5.13. SEAL/AERO
         5.5.14. TRILL
         5.5.15. RTG DT encapsulations
      5.6. For future standards
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments
   APPENDIX A: Fragmentation efficiency
      A.1. Selecting fragment sizes
      A.2. Packing

1. Introduction

   The Internet is loosely based on the ISO seven layer stack, in which
   data units traverse the stack by being wrapped inside data units one
   layer down. A tunnel is a mechanism for transmitting data units
   between endpoints by wrapping them as data units of the same or
   higher layers, e.g., IP in IP (Figure 1) or IP in UDP (Figure 2).

                  +----+----+--------------+
                  | IP'| IP |     Data     |
                  +----+----+--------------+

                     Figure 1 IP inside IP

               +----+-----+----+--------------+
               | IP'| UDP | IP |     Data     |
               +----+-----+----+--------------+

              Figure 2 IP in UDP in IP in Ethernet

   This document focuses on tunnels that transit IP packets, i.e., in
   which an IP packet is the payload of another protocol. Tunnels
   provide a virtual link that can help decouple the network topology
   seen by transiting packets from the underlying physical network
   [To98][RFC2473]. Tunnels were critical in the development of
   multicast because not all routers were capable of processing
   multicast packets [Er94]. Tunnels allowed multicast packets to
   transit between multicast-capable routers over paths that did not
   support multicast. Similar techniques have been used to support other
   protocols, such as IPv6 [RFC2460].

   Use of tunnels is common in the Internet.
   The word "tunnel" occurs in
   over 100 RFCs, and tunnels are supported within numerous protocols,
   including:

   o IP in IP / mobile IP - IPv4 in IPv4 tunnels
     [RFC2003][RFC2473][RFC5944]

   o IP in IPv6 - IPv6 or IPv4 in IPv6 [RFC2473]

   o IPsec - includes a tunnel mode to enable encryption or
     authentication of an entire IP datagram [RFC4301]

   o Generic Routing Encapsulation (GRE) - a shim layer for tunneling
     any network layer in any other network layer, e.g., IP in GRE in
     IP [RFC2784][RFC7588][RFC7676]

   o Generic UDP Encapsulation (GUE) - IP in UDP (in IP) [He15]

   o Automatic Multicast Tunneling (AMT) - IP in UDP for multicast
     [RFC7450]

   o L2TP - PPP over IP, to extend a subscriber's DSL/FTTH connection
     from an access line provider to an ISP [RFC3931]

   o L2VPNs - provide a link topology different from that provided by
     physical links [RFC4664]

   o L3VPNs - provide a network topology different from that provided
     by ISPs [RFC4176]

   o LISP - reduces routing table load within an enclave of routers at
     the expense of more complex ingress encapsulation tables [RFC6830]

   o MPLS - IP over a circuit-like path in which identifiers are
     rewritten on each hop, often used for traffic provisioning
     [RFC3031]

   o NVO3 - data center network sharing (which includes use of GUE,
     above) [RFC7364]

   o PWE3 - emulates wire-like services over packet-switched services
     [RFC3985]

   o SEAL/AERO - IP in IP tunneling with an additional shim header
     designed to overcome the limitations of RFC2003 [RFC5320][Te16]

   o TRILL - enables L3 routing (typically IS-IS) in an enclave of
     Ethernet bridges [RFC5556][RFC6325]

   The variety of tunnel mechanisms raises the question of the role of
   tunnels in the Internet architecture and the potential need for these
   mechanisms to have similar and predictable behavior.
   In particular,
   the ways in which packet size (i.e., Maximum Transmission Unit or
   MTU) mismatches and error signals (e.g., ICMP) are handled may
   benefit from a coordinated approach.

   Regardless of the layer in which encapsulation occurs, tunnels
   emulate a link. The only difference is that a link operates over a
   physical communication channel, whereas a tunnel operates over
   software protocol layers. Because tunnels are links, they are subject
   to the same issues as any link, e.g., MTU discovery, signaling, and
   the potential utility of native support for broadcast and multicast
   [RFC2460][RFC3819]. They have advantages over native links, being
   potentially easier to reconfigure and control.

   The first attempt to use large-scale tunnels to transit multicast
   across the Internet, in 1988, led to tunnel collapse. At the time,
   tunnels were not implemented as encapsulation-based virtual links,
   but rather as loose source routes on un-encapsulated IP datagrams
   [RFC1075]. Using encapsulation tunnels instead avoided that collapse
   [Er94] and eventually led to AMT [RFC7450].

   The remainder of this document describes the general principles of IP
   tunneling and discusses the key considerations in the design of a
   protocol that tunnels IP datagrams. It derives its conclusions from
   the equivalence of tunnels and links. Note that all considerations
   are in the context of existing standards and requirements.

2. Conventions used in this document

2.1. Key Words

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2. Terminology

   This document uses the following terminology. These definitions are
   given in the most general terms, but will be used primarily to
   discuss IP tunnels in this document.
   They are presented in order from
   the most fundamental to those derived from earlier definitions:

   o Message: variable-length data labeled with globally-unique
     endpoint IDs, also known as a datagram for IP messages [RFC791].

   o Network node (node): a device that can act as an endpoint or
     forwarder. For datagrams (IP messages), these are hosts or
     gateways/routers, respectively.

   o Endpoint or host: a node that sources or sinks messages labeled
     from/to its IDs, typically known as a host for both IP and
     higher-layer protocol messages [RFC1122].

   o Forwarder: a node that relays messages using destination IDs and
     local context, also known as a gateway or router for IP messages
     [RFC1812]. Note that most forwarders also act as endpoints when
     they source or sink messages.

   o Source (sender): the node that generates a message.

   o Destination (receiver): the node that consumes a message.

   o Link: a device (or medium) that transfers messages between nodes,
     i.e., by which a message can traverse between nodes without being
     processed by a forwarder. Note that the notion of forwarder is
     relative to the layer at which message processing is considered
     [To16].

   o Link interface (sometimes known as a network interface): a
     location on a link co-located with a node where messages depart
     onto that link or arrive from that link.

   o Path: a sequence of one or more links or tunnels over which a
     message can traverse between nodes (hosts or forwarders), which
     may or may not involve being processed by a forwarder.

   o Tunnel: a protocol mechanism that transits messages using
     encapsulation to allow a path to appear as a single link. Note
     that a protocol can be used to tunnel itself (IP over IP) and that
     this includes the conventional layering of the ISO stack (i.e., by
     this definition, Ethernet is a tunnel for IP). A tunnel can be
     considered a virtual link.

   o Ingress: the virtual link interface of a tunnel that receives
     messages within a node, encapsulates them according to the tunnel
     protocol, and transmits them into the tunnel. This is the tunnel
     equivalent of the outgoing (departing) network interface of a
     link. Note that the ingress virtual link interface and traffic
     source node can be co-located.

   o Egress: the virtual link interface of a tunnel that receives
     messages that have finished transiting the tunnel and presents
     them to a node. This is the tunnel equivalent of the incoming
     (arriving) network interface of a link. The egress decapsulates
     messages for further transit to the destination. Note that the
     egress virtual link interface and traffic destination node can be
     co-located.

   o Tunnel transit packet (TTP): the packet arriving at a node
     connected to a tunnel that enters the ingress and exits the
     egress, i.e., the packet carried over the tunnel. This is
     sometimes known as the "tunneled packet". This is the tunnel
     equivalent of a network layer packet as it would traverse a link.

   o Tunnel link packet (TLP): packets that traverse from ingress to
     egress, in which resides all or part of a tunnel transit packet.
     This is sometimes known as the "tunnel packet", i.e., the packet
     of the tunnel itself. This is the tunnel equivalent of a link
     layer packet as it would traverse a link.

   o Link MTU (LMTU): the largest message that can transit a link. It
     typically does not include link-layer information, e.g., link
     layer headers or trailers, i.e., it refers to the message that the
     link can carry rather than the message as it appears on the link.
     This is thus the largest network layer packet (including network
     layer headers, e.g., IP datagram) that can transit a link.
     Note
     that this need not be the native size of messages on the link,
     i.e., the link may internally fragment and reassemble messages.
     For IPv4, the smallest LMTU is 68 bytes [RFC791], and for IPv6 the
     smallest LMTU is 1280 bytes [RFC2460].

   o Path MTU (PMTU): the largest message that can transit a path.
     Typically, this is the minimum of the link MTUs of the links of
     the path, and represents the largest network layer message
     (including network layer headers) that can transit a path. Note
     that this is not the largest network packet that can be sent
     between a source and destination; this is the largest network
     packet that can be sent without requiring reassembly at the
     network layer of the destination.

   o Reassembly MTU (RMTU): the largest message that can be reassembled
     by a destination, which is not directly related to the link or
     path MTU. Sometimes also referred to as "receiver MTU". For IPv4,
     this is 576 bytes [RFC791] and for IPv6 it is 1500 bytes
     [RFC2460]; note that in both cases, the size refers to the message
     transferred at the network layer, which includes the network layer
     headers.

   o Tunnel MTU (TMTU): the largest message that can transit a tunnel,
     i.e., this is the tunnel equivalent of a link MTU. Typically, this
     is limited by the egress reassembly MTU. Note that this value may
     have no relation to the path MTU between the tunnel ingress and
     egress.

   o Tunnel internal MTU (TIMTU): the largest message that a tunnel
     ingress can emit into a tunnel without requiring further
     fragmentation to reach the tunnel egress. This is the path MTU
     between the ingress and egress.

   o Egress reassembly MTU (ERMTU): the largest message that can be
     reassembled by an egress. This is the size of the RMTU of a tunnel
     minus the encapsulation overhead of that tunnel. Sometimes also
     referred to as the "egress MTU".

3. The Tunnel Model

   A network architecture is an abstract description of a distributed
   communications system, its components and their relationships, the
   requisite properties of those components and the emergent properties
   of the system that result [To03]. Such descriptions can help explain
   behavior, as when the OSI seven-layer model is used as a teaching
   example [Zi80]. Architectures describe capabilities - and, just as
   importantly, constraints.

   A network can be defined as a system of endpoints and relays
   interconnected by communication paths, abstracting away issues of
   naming in order to focus on message forwarding. To the extent that
   the Internet has a single, coherent interpretation, its architecture
   is defined by its core protocols (IP [RFC791], TCP [RFC793], UDP
   [RFC768]) and messages, hosts, routers, and links [Cl88][To03], as
   shown in Figure 3:

            +------+    ------      ------    +------+
            |      |   /      \    /      \   |      |
            | HOST |--+ ROUTER +--+ ROUTER +--| HOST |
            |      |   \      /    \      /   |      |
            +------+    ------      ------    +------+

                 Figure 3 Basic Internet architecture

   As a network architecture, the Internet is a system of hosts and
   routers interconnected by links that exchange messages when possible.
   "When possible" defines the Internet's "best effort" principle. The
   limited role of routers and links represents the End-to-End Principle
   [Sa84] and longest-prefix match enables hierarchical forwarding.

   Although the definitions of host, router, and link seem absolute,
   they are often relative as viewed within the context of one OSI
   layer, each of which can be considered a distinct network
   architecture. An Internet gateway is a Layer 3 router when it
   transits IP datagrams but it acts as a Layer 2 host as it sources or
   sinks Layer 2 messages on attached links to accomplish this transit
   capability.
   In this way, a single node (Internet gateway) behaves as
   different components (router, host) at different layers.

   Even though a single node may have multiple roles - even concurrently
   - at a given layer, each role is typically static and determined by
   context. An Internet gateway always acts as a Layer 2 host and that
   behavior does not depend on where the gateway is viewed from within
   Layer 2. In the context of a single layer, a node's behavior is
   modeled as a single component from all viewpoints in that layer.

3.1. What is a tunnel?

   A tunnel can be modeled as a link in another network
   [To98][To01][To03]. In Figure 4, a source host (Hsrc) and destination
   host (Hdst) communicate over a network M in which two routers (Ra
   and Rd) are connected by a tunnel. Keep in mind that network N and
   network M can both be components of the Internet, i.e., there may be
   regular traffic as well as tunneled traffic over any of the routers
   shown.

               --                          --
   +------+   /  \                        /  \   +------+
   | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
   +------+   \  //\    /  \    /  \    /\\  /   +------+
               --/I \--+ Rb +--+ Rc +--/E \--
                 \  /   \  /    \  /   \  /
                  \/     --      --     \/
                 <------ Network N ------->
   <-------------------- Network M --------------------->

                      Figure 4 The big picture

   The tunnel consists of two elements (ingress I, egress E) that lie
   along a path connected by a (possibly different) network N.
   Regardless of how the ingress and egress are connected, the tunnel
   serves as a link to the nodes it connects (here, Ra and Rd).

   IP packets arriving at the ingress are encapsulated to traverse
   network N. We call these packets "tunnel transit packets" (TTPs)
   because they will now transit the tunnel inside one or more "tunnel
   link packets" (TLPs).
   TLPs use the source address of the ingress and
   the destination address of the egress - using whatever address is
   appropriate to the layer at which the ingress and egress operate
   (Layer 2, Layer 3, Layer 4, etc.). The egress decapsulates those
   messages, which then continue on network M as if emerging from a
   link. To tunnel transit packets, and to the routers the tunnel
   connects (Ra and Rd), the tunnel acts as a link and the ingress and
   egress act as network interfaces to that link.

   The model of each component (ingress, egress) and the entire system
   (tunnel) depends on the layer from which you view the tunnel. From
   the perspective of the outermost hosts (Hsrc and Hdst), the tunnel
   appears as a link between two routers (Ra and Rd). For routers along
   the tunnel (e.g., Rb and Rc), the ingress and egress appear as the
   endpoint hosts and Hsrc and Hdst are invisible.

   When the tunnel network (N) is implemented using the same protocol as
   the endpoint network (M), the picture looks flatter (Figure 5), as if
   it were running over a single network. However, note that this
   appearance is incorrect - nothing has changed. From the perspective
   of the endpoints, Rb and Rc and network N don't exist and aren't
   visible, and from the perspective of the tunnel, network M doesn't
   exist. The fact that networks N and M use the same protocol, and may
   traverse the same links, is irrelevant.

               --          --      --          --
   +------+   /  \  /\    /  \    /  \    /\  /  \   +------+
   | Hsrc |--+ Ra +/I \--+ Rb +--+ Rc +--/E \+ Rd +--| Hdst |
   +------+   \  / \  /   \  /    \  /   \  / \  /   +------+
               --   \/     --      --     \/   --
                   <------ Network N ------->
   <---------------------- Network M ----------------------->

                  Figure 5 IP in IP network picture

3.2. View from the Outside

   From outside the tunnel, to network M, the entire tunnel acts as a
   link (Figure 6).
   It may be numbered or unnumbered and the addresses
   associated with the ingress and egress are irrelevant from outside.

               --                              --
   +------+   /  \                            /  \   +------+
   | Hsrc |--+ Ra +--------------------------+ Rd +--| Hdst |
   +------+   \  /                            \  /   +------+
               --                              --

             Figure 6 Tunnels as viewed from the outside

   A tunnel is effectively invisible to the network in which it resides,
   except that it behaves exactly as a link. Consequently, [RFC3819]
   requirements for links supporting IP also apply to tunnels.

   For example, the IP datagram hop count (IPv4 Time-to-Live [RFC791]
   and IPv6 Hop Limit [RFC2460]) is decremented when traversing a
   router, not when traversing a link (or thus a tunnel). Tunnels have
   a tunnel MTU - the largest datagram that can transit, just as links
   have a corresponding link MTU. A link MTU may not reflect the native
   link message sizes (ATM AAL5 48-byte messages support a 9KB MTU) and
   the same is true for a tunnel.

3.3. View from the Inside

   Within network N, i.e., from inside the tunnel itself, the ingress is
   a source of tunnel link packets and the egress is a sink - both are
   hosts on network N (Figure 7). Consequently, [RFC1122] Internet host
   requirements apply to ingress and egress nodes when network N uses IP
   (and thus the ingress/egress use IP encapsulation).

                           --      --
                    /\    /  \    /  \    /\
                   /I \--+ Rb +--+ Rc +--/E \
                   \  /   \  /    \  /   \  /
                    \/     --      --     \/
                   <------ Network N ------->

          Figure 7 Tunnels, as viewed from within the tunnel

   Viewed from within the tunnel, the outer network (M) doesn't exist.
   Tunnel link packets can be fragmented by the source (ingress) and
   reassembled at the destination (egress), just as at any endpoint. The
   path between ingress and egress may have a path MTU but the endpoints
   can exchange messages as large as can be reassembled at the
   destination (egress), i.e., an egress MTU.
   Information about the
   network - i.e., regarding MTU sizes, network reachability, etc. - is
   relayed from the destination (egress) and intermediate routers back
   to the source (ingress), without regard for the external network (M).

3.4. Location of the Ingress and Egress

   The ingress and egress are endpoints of the tunnel and the tunnel is
   a link. The ingress and egress are thus link endpoints at the network
   nodes the tunnel interconnects. Such link endpoints are typically
   described as "network interfaces".

   Tunnel interfaces may be physical or virtual. The interface may be
   implemented inside the node where the tunnel attaches, e.g., inside a
   host or router. The interface may also be implemented as a "bump in
   the wire" (BITW), somewhere along a link between the two nodes the
   link interconnects. IP in IP tunnels are often implemented as
   interfaces, whereas IPsec tunnels are sometimes implemented as BITW.
   These implementation variations determine only whether information
   available at the link endpoints (ingress/egress) can be easily shared
   with the connected network nodes.

3.5. Implications of This Model

   This approach highlights a few key features of a tunnel as a network
   architecture construct:

   o To the tunnel transit packets (TTPs), tunnels turn a network
     (Layer 3) path into a (Layer 2) link.

   o To nodes the tunnel traverses, the tunnel ingress and egress act
     as hosts that source and sink tunnel link packets (TLPs).

   The consequences of these features are as follows:

   o Like a link, a tunnel has an MTU defined by the reassembly MTU of
     the receiving interface (egress).

   o As with any other link, the MTUs inside a tunnel are not relevant
     to the transited traffic. There is no mechanism or protocol by
     which they are measured or confirmed.
569 o Path MTU discovery in the network layer (i.e., outer network M) 570 has no direct relation to the MTU of the hops within the link 571 layer of the links (or thus tunnels) that connect its components. 573 o Hops remain defined as the number of routers encountered on a path 574 or the time spent at a router [RFC1812]. Hops are not decremented 575 solely by the transit of a link, e.g., a packet with a hop count 576 of zero should successfully transit a link (and thus a tunnel) 577 that connects two hosts. Routers, not links, alter hopcounts. 579 o The addresses of a tunnel ingress and egress correspond to link 580 layer addresses to the tunnel transit packet and outer network M. 581 Like point-to-point links, point-to-point tunnels can be 582 unnumbered in the network in which they reside (even though they 583 must have addresses in the network they transit). 585 o Like network interfaces, the ingress and egress are never a direct 586 source of ICMP messages but may provide information to their 587 attached host or router to generate those ICMP messages. 589 o Like network interfaces and links, two nodes may be connected by 590 any combination of tunnels and links, including multiple tunnels. 591 As with multiple links, existing routing determines which traffic 592 uses each link or tunnel. 594 These observations make it much easier to determine what a tunnel 595 must do to transit IP packets, notably it must satisfy all 596 requirements expected of a link [RFC1122][RFC3819]. The consequence 597 of these observations are that tunnels are no different from links, 598 except only that a link has a physical instantiation. 600 3.6. Fragmentation 602 There are two places where fragmentation can occur in a tunnel, 603 called Outer Fragmentation and Inner Fragmentation. This document 604 assumes that only Outer Fragmentation is viable because it is the 605 only approach that works for IPv4 datagrams with DF=1 and for IPv6. 607 3.6.1. 
Outer Fragmentation 609 The simplest case is Outer Fragmentation, as shown in Figure 8. The 610 bottom of the figure shows the network topology, where packets start 611 at the source, enter the tunnel at the encapsulator, exit the tunnel 612 at the decapsulator, and arrive finally at the destination. The 613 packet traffic is shown above the topology, where the end-to-end 614 packets are shown at the top. The packets are composed of an inner 615 header (iH) and inner data (iD); the term "inner" is relative to the 616 tunnel, as will become apparent. When the packet (iH,iD) arrives at 617 the encapsulator, it is placed inside the tunnel packet structure, 618 here shown as adding just an outer header, oH, in step (a). 620 +----+----+ +----+----+ 621 | iH | iD |------+ - - - - - - - - - - +------>| iH | iD | 622 +----+----+ | | +----+----+ 623 v | 624 +----+----+----+ +----+----+----+ 625 (a) | oH | iH | iD | | oH | iH | iD | (c) 626 +----+----+----+ +----+----+----+ 627 | ^ 628 | +----+----+-----+ | 629 (b1) +----- >| oH'| iH | iD1 |-------+ 630 | +----+----+-----+ | 631 | | 632 | +----+-----+ | 633 (b2) +----- >| oH"| iD2 |------------+ 634 +----+-----+ 635 +-----+ +---+ +---+ +-----+ 636 | | / \ ======================= / \ | | 637 | Src |=======| Enc |=======================| Dec |=======| Dst | 638 | | \ / ======================= \ / | | 639 +-----+ +---+ +---+ +-----+ 641 Figure 8 Fragmentation of the outer packet 643 When the encapsulated packet exceeds the tunnel MTU, the packet needs 644 to be fragmented. In this case we fragment the packet at the outer 645 header, with the fragments shown as (b1) and (b2). Note that the 646 outer header indicates fragmentation (as ' and "), the inner header 647 occurs only in the first fragment, and the inner data is broken 648 across the two packets. These fragments are reassembled at the 649 decapsulator in step (c), and the resulting packet is decapsulated 650 and sent on to the destination.
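The steps of Figure 8 can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the tunnel model itself: the names OuterFragment, outer_fragment, egress_reassemble, and the max_payload parameter are hypothetical, packets are opaque byte strings, and the outer header is not modeled beyond an offset and a "more fragments" flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OuterFragment:
    offset: int     # byte offset of this chunk within the encapsulated packet
    more: bool      # True if further fragments follow (the ' and " marks in Figure 8)
    payload: bytes  # slice of the original (iH + iD) packet


def outer_fragment(inner_packet: bytes, max_payload: int) -> List[OuterFragment]:
    """Split an encapsulated packet at the outer header (steps b1, b2).

    max_payload is an assumed parameter: the largest outer payload that
    fits in one tunnel link packet along the tunnel path.
    """
    # IP fragment offsets count 8-byte units, so every non-final chunk
    # must be a multiple of 8 bytes.
    chunk = max_payload - (max_payload % 8)
    if chunk <= 0:
        raise ValueError("max_payload too small to carry any fragment")
    if len(inner_packet) <= max_payload:
        return [OuterFragment(0, False, inner_packet)]
    return [
        OuterFragment(off, off + chunk < len(inner_packet),
                      inner_packet[off:off + chunk])
        for off in range(0, len(inner_packet), chunk)
    ]


def egress_reassemble(fragments: List[OuterFragment]) -> bytes:
    """Rebuild the encapsulated packet at the decapsulator (step c)."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    return b"".join(f.payload for f in ordered)
```

In this sketch only the first fragment carries the inner header, which matches the observation that mechanisms inspecting inner headers along the tunnel see them only in the first fragment.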
652 Outer fragmentation isolates Source and Destination from tunnel 653 encapsulation duties. This can be considered a benefit in clean, 654 layered network design, but it may also complicate decapsulator 655 design, especially where tunnels aggregate large amounts of traffic 656 and thus risk problems such as IP ID overload (see Sec. 4.4). Outer fragmentation is valid 657 for any tunnel encapsulation protocol that supports fragmentation 658 (e.g., IPv4 or IPv6), where the tunnel endpoints act as the host 659 endpoints of that protocol. 661 Along the tunnel, the inner header is contained only in the first 662 fragment, which can interfere with mechanisms that 'peek' into lower 663 layer headers, e.g., as for ICMP, as discussed in Sec. 4.6. 665 3.6.2. Inner Fragmentation 667 Inner Fragmentation distributes the impact of tunneling across both 668 the decapsulator and destination, and is shown in Figure 9; this can 669 be especially important when the tunnel aggregates large amounts of 670 traffic. However, this mechanism is valid only when the original 671 source packets can be fragmented on-path, e.g., as in IPv4 datagrams 672 with DF=0. 674 Again, the network topology is shown at the bottom of the figure, and 675 the original packets are shown at the top. Packets arrive at the 676 encapsulator, and are fragmented there based on the inner header into 677 (a1) and (a2). Each fragment is then encapsulated separately, shown as (b1) and (b2). The fragments arrive at the decapsulator, which 678 removes the outer header and forwards the resulting fragments on to 679 the destination. The destination is then responsible for reassembling 680 the fragments into the original packet. 682 Along the tunnel, the inner headers are copied into each fragment, 683 and so are available to mechanisms that 'peek' into headers (e.g., 684 ICMP, as discussed in Sec. 4.6). Because fragmentation happens on the 685 inner header, the impact of IP ID is reduced.
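For comparison with outer fragmentation, the inner fragmentation steps described above can be sketched in Python. This is a simplified illustration under stated assumptions: the function names inner_fragment and encapsulate and the max_inner parameter are hypothetical, headers are opaque byte strings, and the inner header is copied verbatim into each fragment (a real implementation would also update the offset and flag fields of each copy).

```python
from typing import List


def inner_fragment(inner_header: bytes, inner_data: bytes,
                   df: bool, max_inner: int) -> List[bytes]:
    """Fragment at the inner header (steps a1, a2).

    max_inner is an assumed parameter: the largest inner packet that
    still fits in one tunnel link packet after encapsulation.
    """
    if len(inner_header) + len(inner_data) <= max_inner:
        return [inner_header + inner_data]
    if df:
        # Inner fragmentation is valid only for packets that may be
        # fragmented on-path, e.g., IPv4 datagrams with DF=0.
        raise ValueError("DF=1: inner fragmentation not permitted")
    room = max_inner - len(inner_header)
    room -= room % 8  # non-final fragments carry a multiple of 8 data bytes
    if room <= 0:
        raise ValueError("max_inner too small to carry any data")
    return [inner_header + inner_data[i:i + room]
            for i in range(0, len(inner_data), room)]


def encapsulate(outer_header: bytes, inner_fragment_pkt: bytes) -> bytes:
    """Prepend the outer header to one inner fragment (steps b1, b2)."""
    return outer_header + inner_fragment_pkt
```

Here every tunnel link packet carries a copy of the inner header, and the decapsulator only strips the outer header; reassembly is left to the destination.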
687 +----+----+ +----+----+ 688 | iH | iD |-------+- - - - - - - - - - - - - >| iH | iD | 689 +----+----+ | +----+----+ 690 v ^ 691 +----+-----+ +----+-----+ | 692 (a1) | iH'| iD1 | | iH'| iD1 |------+ 693 +----+-----+ +----+-----+ | 694 | 695 +----+-----+ +----+-----+ | 696 (a2) | iH"| iD2 | | iH"| iD2 |------+ 697 +----+-----+ +----+-----+ 698 | ^ 699 | +----+----+-----+ | 700 (b1) +----- >| oH | iH'| iD1 |-------+ 701 | +----+----+-----+ | 702 | | 703 | +----+----+-----+ | 704 (b2) +----- >| oH | iH"| iD2 |-------+ 705 +----+----+-----+ 706 +-----+ +---+ +---+ +-----+ 707 | | / \ ======================= / \ | | 708 | Src |=======| Enc |=======================| Dec |=======| Dst | 709 | | \ / ======================= \ / | | 710 +-----+ +---+ +---+ +-----+ 712 Figure 9 Fragmentation of the inner packet 714 3.6.3. The necessity of Outer Fragmentation 716 Fragmentation is critical for tunnels that support TTPs for 717 protocols with minimum MTU requirements, while operating over tunnel 718 paths using protocols with minimum MTU requirements. Depending on the 719 amount of space used by encapsulation, these two minimums will 720 ultimately interfere, and the TTP will need to be fragmented to 721 support a TTP minimum MTU while traversing tunnels with their own TLP 722 minimum MTUs. 724 Outer Fragmentation is the only solution that supports all IPv4 and 725 IPv6 traffic, because inner fragmentation is allowed only for IPv4 726 datagrams with DF=0. As a result, the remainder of this document 727 assumes Outer Fragmentation. 729 4. IP Tunnel Requirements 731 The requirements of an IP tunnel are defined by the requirements of 732 an IP link because both transit IP packets. A tunnel thus must 733 transit the IP minimum MTU, i.e., 68 bytes for IPv4 [RFC791] and 1280 734 bytes for IPv6 [RFC2460], and a tunnel must support address resolution 735 when there is more than one egress.
737 The requirements of the tunnel ingress and egress are defined by the 738 network over which they exchange messages (tunnel link packets). For 739 IP-over-IP, this means that the ingress MUST NOT exceed the IPv4 740 Identification (fragment) field uniqueness requirements [RFC6864]. 742 These requirements remain even though tunnels have some unique 743 issues, including the need for additional space for encapsulation 744 headers and the potential for tunnel MTU variation. 746 4.1. Minimum MTU Considerations 748 There are a variety of values of minimum MTU to consider, both in a 749 conventional network and in a tunnel as a link in that network. These 750 are indicated in Figure 10, an annotated variant of Figure 4. 752 (a) LMTU <-> 753 (b) PMTU <------------------------------------> 754 (c) <-RMTU-----------------------------------------------> 755 (d) TMTU <------------------------> 756 (e) TIMTU <----------------> 757 (f) ERMTU <------------------------> 758 --_ -- 759 +------+ / \ / \ +------+ 760 | Hsrc |--+ Ra + -- -- + Rd +--| Hdst | 761 +------+ \ //\ / \ / \ /\\ / +------+ 762 --/I \--+ Rb +--+ Rc +--/E \-- 763 \ / \ / \ / \ / 764 \/ -- -- \/ 765 <------ Network N -------> 766 <-------------------- Network M ---------------------> 768 Figure 10 The variety of MTU values 770 Consider the following example values. For IPv6, the minimum LMTU (a) 771 is 1280 bytes, which is also the minimum PMTU (b). The minimum RMTU 772 (c) is 1500 bytes, which is also the minimum MTU for endpoint-to- 773 endpoint communication. This means that IPv6 already assumes that 774 endpoint-to-endpoint communication may require source fragmentation 775 to transit IPv6-compatible links, even without considering tunnels. 777 The TMTU (d) is the tunnel equivalent of a LMTU, and thus also needs 778 to be 1280 bytes for IPv6. 
Assuming the links of a tunnel traverse 779 IPv6 hops (e.g., I to Rb, Rb to Rc, and Rc to E), the TIMTU (e) is 780 equivalent to the PMTU between I and E, which is 1280 - encaps (where 781 "encaps" is the tunnel encapsulation overhead). This value is 782 insufficient to satisfy the requirement of an IPv6 link (which must 783 transit at least 1280 bytes unfragmented), but this is not a problem. 784 The TMTU (d) is not limited by TIMTU (e), but by ERMTU (f), the 785 tunnel equivalent of RMTU (c). For a tunnel using IPv6 over IPv6, the 786 ERMTU is the RMTU of the underlying network N minus space for 787 encapsulation, i.e., 1500 - encaps bytes, and the tunnel is viable as 788 long as ERMTU >= 1280. Even though the tunnel will ultimately transit 789 ERMTU - encaps byte messages between the ingress and egress, each hop 790 within the tunnel transits only TIMTU - encaps byte messages. The 791 difference between TIMTU and ERMTU is the reason why tunnel 792 ingresses need to support fragmentation and tunnel egresses need to 793 support reassembly. The high cost of fragmentation and reassembly is 794 why it is useful for applications to avoid sending messages too close 795 to the PMTU, even the PMTU at their own layer. 797 4.2. Fragmentation 799 A tunnel interacts with fragmentation in two different ways. As a 800 link in network M, its messages might be fragmented before they reach 801 the tunnel - i.e., at the TTP layer either during source 802 fragmentation (if generated at the same node as the ingress 803 interface) or forwarding fragmentation (for IPv4 DF=0 datagrams). In 804 addition, messages traversing the tunnel may require fragmentation by 805 the ingress - i.e., source fragmentation at the TLP layer by the 806 ingress. These two fragmentation operations are no more related than 807 are conventional IP fragmentation and ATM segmentation and 808 reassembly; one occurs at the network layer, the other at the 809 (virtual) link layer.
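The IPv6-over-IPv6 arithmetic of Section 4.1 can be checked with a short Python sketch. It assumes only the minimum values stated in that section (1280 bytes for the link and path MTU, 1500 bytes for the reassembly MTU); the function name tunnel_mtus and the dictionary keys are hypothetical and used here purely for illustration.

```python
IPV6_MIN_LINK_MTU = 1280  # minimum LMTU and PMTU for IPv6, per Section 4.1
IPV6_MIN_RMTU = 1500      # minimum reassembly MTU for IPv6, per Section 4.1


def tunnel_mtus(encaps: int) -> dict:
    """Derive the minimum tunnel MTU values for a given encapsulation overhead."""
    timtu = IPV6_MIN_LINK_MTU - encaps  # what each hop inside the tunnel transits
    ermtu = IPV6_MIN_RMTU - encaps      # what the egress can reassemble
    return {
        "TIMTU": timtu,
        "ERMTU": ermtu,
        # The tunnel is viable as an IPv6 link as long as ERMTU >= 1280.
        "viable": ermtu >= IPV6_MIN_LINK_MTU,
        # Ingress fragmentation is needed whenever a minimum-size IPv6
        # packet no longer fits within TIMTU.
        "needs_ingress_fragmentation": timtu < IPV6_MIN_LINK_MTU,
    }
```

With a plain 40-byte IPv6 outer header, this gives TIMTU = 1240 and ERMTU = 1460: the tunnel is viable, but the ingress must be prepared to fragment and the egress to reassemble.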
811 As with any link layer, a tunnel MTU (TMTU) is defined as the largest 812 message that can transit the tunnel. For a tunnel, this is the egress 813 reassembly MTU (ERMTU), which is the reassembly MTU (RMTU) of the 814 egress interface minus the space needed for the tunnel encapsulation 815 headers. This value must also satisfy the requirements of the IP 816 packets that the tunnel transits. 818 Note that many of the issues with tunnel fragmentation and MTU 819 handling were discussed in [RFC4459], but that document described a 820 variety of alternatives as if they were independent. This document 821 explains the combined approach that is necessary. 823 Like any other link, an IPv4 tunnel must transit 68 byte packets 824 without requiring source fragmentation [RFC791][RFC1122] and an IPv6 825 tunnel must transit 1280 byte packets without requiring source 826 fragmentation [RFC2460]. The tunnel MTU interacts with routers or 827 hosts it connects the same way as would a link MTU. In the following 828 pseudocode, TTPsize is the size of the tunnel transit packet (TTP), 829 and ERMTU is the reassembly MTU of the egress. As with any link, the 830 link MTU (LMTU) is defined not by the native path of the link (or, 831 for a tunnel, the path MTU of encapsulated packets inside the tunnel) 832 but by the egress reassembly capability. This is because the ICMP 833 "packet too big" message indicates failure of a link to transit a 834 packet, not a preference for a size that matches that inside the 835 mechanism of the link. There is no ICMP message for "larger than I'd 836 like, but I can still transit it". 838 These rules apply at the host/router where the tunnel is attached, 839 i.e., at the network layer of the TTP (we assume that all tunnels, 840 including multipoint tunnels, have a single, uniform TMTU). 
These are 841 basic source fragmentation rules (or transit refragmentation for IPv4 842 DF=0 datagrams), and have no relation to the tunnel itself other than 843 to consider the TMTU as the effective LMTU of the next hop: 845 if (TTP > TMTU) then 846 if (TTP can be fragmented, e.g., IPv4 DF=0) then 847 split TTP into fragments of TMTU size 848 and send each fragment to the tunnel ingress 849 else 850 drop TTP and send ICMP "too big" to TTP source 851 endif 852 else 853 send TTP to the tunnel ingress 854 endif 856 These rules apply at the tunnel ingress, in its role as host on the 857 tunnel path, i.e., as source fragmentation of TLP messages (we assume 858 that all tunnels, even multipoint tunnels, have a single, uniform 859 TIMTU), where "encaps" is the encapsulation overhead: 861 if (TTP <= (TIMTU + encaps)) then 862 encapsulate the TTP and process as if arriving at the node 863 else 864 if ((TIMTU + encaps) < TTP <= (ERMTU - encaps)) then 865 fragment TTP into TIMTU chunks 866 encapsulate each chunk and process as if arriving at the node 867 else 868 {never happens; host/router already dropped by now} 869 endif 870 endif 872 There is one path above that never occurs - i.e., a network interface 873 should never receive a message larger than its MTU, and a tunnel 874 should thus never receive a message larger than its (ERMTU - encaps) 875 limit. A router attempting to process such a message would generate 876 an ICMP error (packet too big, fragmentation needed) and the packet 877 would already have been dropped before entering into this algorithm. 879 As an example, consider IPv4 over IPv6 or IPv6 over IPv6 tunneling, 880 where IPv6 encapsulation adds a 40 byte fixed header plus IPv6 881 options (i.e., IPv6 header extensions) of total size TOptSz. From 882 [RFC2460] it follows that the TMTU must be at least 1280 bytes and 883 the ERMTU must be at least 1500 - (40 + TOptSz) bytes. The TIMTU must 884 be a minimum of 1280 - (40 + TOptSz) bytes.
Considering these minimum 885 values, the previous algorithm becomes: 887 if (TTP <= (1240 - TOptSz)) then 888 encapsulate the TTP and process as if arriving at the node 889 else 890 if ((1240 - TOptSz) < TTP <= (1460 - TOptSz)) then 891 fragment TTP into (1240 - TOptSz) chunks 892 encapsulate each chunk and process as if arriving at the node 893 else 894 {never happens; host/router already dropped by now} 895 endif 896 endif 898 This tunnel supports IPv6 transit only if TOptSz is smaller than 899 180 bytes, and supports IPv4 transit if TOptSz is smaller than 884 900 bytes. IPv6 TTPs of 1280 bytes may be guaranteed to transit the outer 901 network (M) without needing fragmentation there but they may require 902 ongoing fragmentation and reassembly if the TMTU is not at least 1320 903 bytes. 905 When using IP directly over IP, the minimum ERMTU for IPv4 is 576 906 bytes and for IPv6 is 1500 bytes. This means that tunnels of IPv4- 907 over-IPv4, IPv4-over-IPv6, and IPv6-over-IPv6 are possible without 908 additional requirements, but this may involve ingress fragmentation 909 and egress reassembly. IPv6 cannot be tunneled directly over IPv4 910 without additional requirements, notably that the ERMTU is at least 911 1280 bytes. Fragmentation and reassembly cannot be avoided for IPv6- 912 over-IPv6 without similar requirements. 914 When ongoing ingress fragmentation and egress reassembly would be 915 prohibitive or costly, larger MTUs can be supported by design and 916 confirmed either out-of-band (by design) or in-band (e.g., using 917 PLPMTUD [RFC4821], as done in SEAL [RFC5320] and AERO [Te16]). 919 Alternatively, an ingress can encapsulate packets that fit and shut 920 down once fragmentation is needed, but it must not continue to 921 forward smaller packets while dropping larger packets that are still 922 within required limits. 924 4.3.
MTU discovery 926 MTU discovery enables a network path to support a larger PMTU than it 927 can assume from the minimum requirements of the protocol over which it 928 operates. A tunnel has two different LMTU-like values: the TMTU and the 929 TIMTU. 931 There is a temptation to optimize tunnel traversal so that packets are 932 not fragmented between ingress and egress, i.e., to attempt to tune the 933 network PMTU to the TIMTU rather than the TMTU, to avoid ingress 934 fragmentation. This is hazardous for many reasons: 936 o The tunnel is capable of transiting packets as large as the ERMTU, 937 which is always at least as large as the TIMTU and typically is 938 larger. 940 o ICMP has only one type of error message regarding large packets - 941 "too big", i.e., too large to transit. There is no optimization 942 message of "bigger than I'd like, but I can deal with it if needed". 944 o IP tunnels often involve some level of recursion, i.e., 945 encapsulation over itself [RFC4459]. 947 Recursive tunneling occurs whenever a protocol ends up encapsulated 948 in itself. This happens directly, as when IPv4 is encapsulated in 949 IPv4, or indirectly, as when IP is encapsulated in UDP which then is 950 a payload inside IP. It can involve many layers of encapsulation 951 because a tunnel provider isn't always aware of whether the packets 952 it transits are already tunneled. 954 Recursion is impossible when the tunnel transit packets are limited 955 to the native size of the TIMTU. Arriving tunnel transit 956 packets have a minimum supported size (1280 for IPv6) and the tunnel 957 PMTU has the same requirement; there would be no room for the 958 additional encapsulation headers. The result would be an IPv6 tunnel 959 that cannot satisfy IPv6 transit requirements. 961 It is more appropriate to require the tunnel to satisfy IP transit 962 requirements and enforce that requirement at design time or during 963 operation (the latter using PLPMTUD [RFC4821]).
Conventional path MTU 964 discovery (PMTUD) relies on existing endpoint ICMP processing of 965 explicit negative feedback from routers along the path via "message 966 too big" ICMP packets in the reverse direction of the tunnel 967 [RFC1191]. This technique is susceptible to the "black hole" 968 phenomenon, in which the ICMP messages never return to the source due 969 to policy-based filtering [RFC2923]. PLPMTUD requires a separate, 970 direct control channel from the egress to the ingress that provides 971 positive feedback; the direct channel is not blocked by policy 972 filters and the positive feedback ensures fail-safe operation if 973 feedback messages are lost [RFC4821]. 975 4.4. IP ID exhaustion 977 In IPv4, the IP Identification (ID) field is a 16-bit value that is 978 unique for every packet for a given source address, destination 979 address, and protocol, such that it does not repeat within the 980 Maximum Segment Lifetime (MSL) [RFC791][RFC1122]. Although the ID 981 field was originally intended for fragmentation and reassembly, it 982 can also be used to detect and discard duplicate packets, e.g., at 983 congested routers (see Sec. 3.2.1.5 of [RFC1122]). For this reason, 984 and because IPv4 packets can be fragmented anywhere along a path, all 985 packets between a source and destination of a given protocol must 986 have unique ID values over a period of an MSL, which is typically 987 interpreted as two minutes (120 seconds). These requirements have 988 recently been somewhat relaxed in recognition of the primary use of 989 this field for reassembly and the need to handle only fragment 990 misordering at the receiver [RFC6864]. 992 The uniqueness of the IP ID is a known problem for high speed nodes, 993 because it limits the speed of a single protocol between two 994 endpoints [RFC4963]. Although this suggests that the uniqueness of 995 the IP ID is moot, tunnels exacerbate this condition.
A tunnel often 996 aggregates traffic from a number of different source and destination 997 addresses, of different protocols, and encapsulates them in a header 998 with the same ingress and egress addresses, all using a single 999 encapsulation protocol. The result is one of the following: 1001 1. The IP ID rules are enforced, and the tunnel throughput is 1002 severely limited. 1004 2. The IP ID rules are enforced, and the tunnel consumes large 1005 numbers of ingress/egress IP addresses solely to ensure ID 1006 uniqueness. 1008 3. The IP ID rules are ignored. 1010 The last case is the most obvious solution, because it corresponds to 1011 how endpoints currently behave. Fortunately, fragmentation is 1012 somewhat rare in the current Internet at large, but it can be common 1013 along a tunnel. Fragments that repeat the IP ID risk being 1014 reassembled incorrectly, especially when fragments are reordered or 1015 lost. Reassembly errors are not always detected by other protocol 1016 layers (see Sec. 4.9), and even when detected they can result in 1017 excessive overall packet loss and can waste bandwidth between the 1018 egress and ultimate packet destination. 1020 4.5. Hop Count 1022 This section considers the selection of the value of the hop count of 1023 the tunnel link header, as well as the potential impact on the tunnel 1024 transit header. The former is affected by the number of hops within 1025 the tunnel. The latter determines whether the tunnel has visible 1026 effect on the transit packet. 1028 In general, the Internet hop count field is used to detect and avoid 1029 forwarding loops that cannot be corrected without a synchronized 1030 reboot. The IPv4 Time-to-Live (TTL) and IPv6 Hop Limit field each 1031 serve this purpose [RFC791][RFC2460]. 1033 The IPv4 TTL field was originally intended to indicate packet 1034 expiration time, measured in seconds. 
A router is required to 1035 decrement the TTL by at least one or the number of seconds the packet 1036 is delayed, whichever is larger [RFC1812]. Packets are rarely held 1037 that long, and so the field has come to represent the count of the 1038 number of routers traversed. IPv6 makes this meaning more explicit. 1040 These hop count fields represent the number of network forwarding 1041 elements traversed by an IP datagram. An IP datagram with a hop count 1042 of zero can traverse a link between two hosts because it never visits 1043 a router (where it would need to be decremented and would have been 1044 dropped). 1046 An IP datagram traversing a tunnel thus need not have its hopcount 1047 modified, i.e., the tunnel transit header need not be affected. A 1048 zero hop count datagram should be able to traverse a tunnel as easily 1049 as it traverses a link. A router MAY be configured to decrement 1050 packets traversing a particular link (and thus a tunnel), which may 1051 be useful in emulating a path as if it had traversed one or more 1052 routers, but this is strictly optional. The ability of the outer 1053 network and tunnel network to avoid indefinitely looping packets does 1054 not rely on the hop counts of the tunnel traversal packet and tunnel 1055 link packet being related in any way at all. 1057 The hop count field is also used by several protocols to determine 1058 whether endpoints are "local", i.e., connected to the same subnet 1059 (link-local discovery and related protocols [RFC4861]). A tunnel is a 1060 way to make a remote address appear directly-connected, so it makes 1061 sense that the other ends of the tunnel appear local and that such 1062 link-local protocols operate over tunnels unless configured 1063 explicitly otherwise. When the interfaces of a tunnel are numbered, 1064 these can be interpreted the same way as if they were on the same 1065 link subnet. 1067 4.6. 
Signaling 1069 In the current Internet architecture, signaling goes upstream, either 1070 from routers along a path or from the destination, back toward the 1071 source. Such signals are typically contained in ICMP messages, but 1072 can involve other protocols such as RSVP, transport protocol signals 1073 (e.g., TCP RSTs), or multicast control or transport protocols. 1075 A tunnel behaves like a link and acts like a link interface at the 1076 nodes where it is attached. As such, it can provide information that 1077 enhances IP signaling (e.g., ICMP), but itself does not directly 1078 generate ICMP messages. 1080 For tunnels, this means that there are two separate signaling paths. 1081 The outer network M nodes can each signal the source of the tunnel 1082 transit packets, Hsrc (Figure 11). Inside the tunnel, the inner 1083 network N nodes can signal the source of the tunnel link packets, the 1084 ingress I (Figure 12). 1086 +--------+---------------------------+--------+ 1087 | | | | 1088 v --_ -- v 1089 +------+ / \ / \ +------+ 1090 | Hsrc |--+ Ra + -- -- + Rd +--| Hdst | 1091 +------+ \ //\ / \ / \ /\\ / +------+ 1092 --/I \--+ Rb +--+ Rc +--/E \-- 1093 \ / \ / \ / \ / 1094 \/ -- -- \/ 1095 <---- Network N -----> 1096 <-------------------- Network M ---------------------> 1098 Figure 11 Signals outside the tunnel 1099 +-----+-------+------+ 1100 --_ | | | | -- 1101 +------+ / \ v | | | / \ +------+ 1102 | Hsrc |--+ Ra + -- -- + Rd +--| Hdst | 1103 +------+ \ //\ / \ / \ /\\ / +------+ 1104 --/I \--+ Rb +--+ Rc +--/E \-- 1105 \ / \ / \ / \ / 1106 \/ -- -- \/ 1107 <----- Network N ----> 1108 <--------------------- Network M --------------------> 1110 Figure 12 Signals inside the tunnel 1112 These two signal paths are inherently distinct except where 1113 information is exchanged between the network interface of the tunnel 1114 (the ingress) and its attached node (Ra, in both figures). 
1116 It is always possible for a network interface to provide hints to its 1117 attached node (host or router), which can be used for optimization. 1118 In this case, when signals inside the tunnel indicate a change to the 1119 tunnel, the ingress (i.e., the tunnel network interface) can provide 1120 information to the router (Ra, in both figures), so that Ra can 1121 generate the appropriate signal in return to Hsrc. This relaying may 1122 be difficult, because signals inside the tunnel may not return enough 1123 information to the ingress to support direct relaying to Hsrc. 1125 In all cases, the tunnel ingress needs to determine how to relay the 1126 signals from inside the tunnel into signals back to the source. For 1127 some protocols this is either simple or impossible (such as for 1128 ICMP), for others, it can even be undefined (e.g., multicast). In 1129 some cases, the individual signals relayed from inside the tunnel may 1130 result in corresponding signals in the outside network, and in other 1131 cases they may just change state of the tunnel interface. In the 1132 latter case, the result may cause the router Ra to generate new ICMP 1133 errors when later messages arrive from Hsrc or other sources in the 1134 outer network. 1136 The meaning of the relayed information must be carefully translated. 1137 In the case of soft or hard ICMP errors, the translation may be 1138 obvious. ICMP "packet too big" messages from inside the tunnel might 1139 update TIMTU at the ingress, but may have no effect on the tunnel as 1140 visible to the router where it is attached (Ra). 1142 In addition to ICMP, messages typically considered for translation 1143 include Explicit Congestion Notification (ECN [RFC6040]) and 1144 multicast (IGMP, e.g.). 1146 4.7. 
Relationship of Header Fields 1148 Some tunnel specifications attempt to relate the fields of the tunnel 1149 transit packet and tunnel link packet, i.e., the packet arriving at 1150 the ingress and the encapsulation header. These two headers are 1151 effectively independent and there is no utility in requiring their 1152 contents to be related. 1154 In specific, the encapsulation header source and destination 1155 addresses are network endpoints in the tunnel network N, but have no 1156 meaning in the outer network M, even when the tunneled packet 1157 traverses the same network. The addresses are effectively 1158 independent, and the tunnel endpoint addresses are link addresses to 1159 the tunnel transit packet. 1161 Because the tunneled packet uses source and destination addresses 1162 with a separate meaning, it is inappropriate to copy or reuse the 1163 IPv4 Identification or IPv6 Fragment ID fields of the tunnel transit 1164 packet. These fields need to be generated based on the context of the 1165 encapsulation header, not the tunnel transit header. 1167 Similarly, the DF field need not be copied from the tunnel transit 1168 packet to the encapsulation header of the tunnel link packet 1169 (presuming both are IPv4). Path MTU discovery inside the tunnel does 1170 not directly correspond to path MTU discovery outside the tunnel, 1171 i.e., inside the tunnel it would update the TIMTU used for outer 1172 fragmentation at the ingress, but has no effect on the TMTU reported 1173 to the device where the ingress is attached as a network interface. 1175 The same is true for most other fields. When a field value is 1176 generated in the encapsulation header, its meaning should be derived 1177 from what is desired in the context of the tunnel as a link. When 1178 feedback is received from these fields, they should be presented to 1179 the tunnel ingress and egress as if they were network interfaces. 
The 1180 behavior of the node where these interfaces attach should be 1181 identical to that of a conventional link. 1183 There are exceptions to this rule that are explicitly intended to 1184 relay signals from inside the tunnel to outside the tunnel. The 1185 primary example is ECN [RFC6040], which copies the ECN bits from the 1186 tunnel transit header to the tunnel link header during encapsulation 1187 at the ingress and modifies the tunnel transit header at egress based 1188 on a combination of the bits of the two headers. This is intended to 1189 allow congestion notification within the tunnel to be interpreted as 1190 if it were on the direct path. Other examples may involve the DSCP 1191 flags. In both cases, it is assumed that the intent of copying values 1192 on encapsulation and merging values on decapsulation has the effect 1193 of allowing the tunnel to act as if it participates in the same type 1194 of network as outside the tunnel (network M). 1196 4.8. Congestion 1198 In general, tunnels carrying IP traffic need not react directly to 1199 congestion any more than would any other link layer [RFC5405]. IP 1200 traffic is not generally expected to be congestion reactive. 1202 [text from David Black on ECN relaying?] 1204 4.9. Checksums 1206 IP traffic transiting a tunnel needs to expect a similar level of 1207 error detection and correction as it would expect from any other 1208 link. In the case of IPv4, there are no such expectations, which is 1209 partly why it includes a header checksum [RFC791]. 1211 IPv6 omitted the header checksum because it already expects most link 1212 errors to be detected and dropped by the link layer and because it 1213 also assumes transport protection [RFC2460]. When transiting IPv6 1214 over IPv6, the tunnel fails to provide the expected error detection. 1215 This is why IPv6 is often tunneled over layers that include separate 1216 protection, such as GRE [RFC2784]. 
1218 The fragmentation created by the tunnel ingress can increase the need 1219 for stronger error detection and correction, especially at the tunnel 1220 egress to avoid reassembly errors. The Internet checksum is known to 1221 be susceptible to reassembly errors that could be common [RFC4963], 1222 and should not be relied upon for this purpose. This is why SEAL and 1223 AERO include a separate checksum [RFC5320][Te16]. This requirement 1224 can be undermined when using UDP as a tunnel with no UDP checksum (as 1225 per [RFC6935][RFC6936]) when fragmentation occurs because the egress 1226 has no checksum with which to validate reassembly. For this reason, 1227 it is safe to use UDP with a zero checksum for atomic (non- 1228 fragmented, non-fragmentable) tunnel link packets only; when used on 1229 fragments, whether generated at the ingress or en-route inside the 1230 tunnel, omission of such a checksum can result in reassembly errors 1231 that can cause additional work (capacity, forwarding processing, 1232 receiver processing) downstream of the egress. 1234 4.10. Numbering 1236 Tunnel ingresses and egresses have addresses associated with the 1237 encapsulation protocol. These addresses are the source and 1238 destination (respectively) of the encapsulated packet while 1239 traversing the tunnel network. 1241 Tunnels may or may not have addresses in the network whose traffic 1242 they transit (e.g., network M in Figure 4). In some cases, the tunnel 1243 is an unnumbered interface to a point-to-point virtual link. When the 1244 tunnel has multiple egresses, tunnel interfaces require separate 1245 addresses in network M. 1247 To see the effect of tunnel interface addresses, consider traffic 1248 sourced at router Ra in Figure 4. Even before being encapsulated by 1249 the ingress, that traffic needs a source IP network address that 1250 belongs to the router. One option is to use an address associated 1251 with one of the other interfaces of the router [RFC1122]. 
   Another option is to assign a number to the tunnel interface itself.
   Regardless of which address is used, the resulting IP packet is then
   encapsulated by the tunnel ingress using the ingress address as a
   separate operation.

4.11. Multicast

   [To be addressed]

   Note that PMTU for multicast is difficult. PIM carries an option that
   may help in the Population Count Extensions to PIM [RFC6807].

   IMO, again, this is no different than any other multicast link.

4.12. Multipoint

   Multipoint tunnels are tunnels with more than two ingress/egress
   endpoints. Just as tunnels emulate links, multipoint tunnels emulate
   multipoint links.

   Multipoint tunnels require support for egress determination, just as
   multipoint links do. This function is typically supported by ARP
   [RFC826] or ARP emulation (e.g., LAN Emulation, known as LANE
   [RFC2225]) for multipoint links. For multipoint tunnels, a similar
   mechanism is required for the same purpose - to determine the egress
   address for proper ingress encapsulation.

   All multipoint systems - tunnels and links - might support different
   MTUs between each ingress/egress (or link entrance/exit) pair. In
   most cases, it is simpler to assume a uniform MTU throughout the
   multipoint system, e.g., the minimum MTU supported across all
   ingress/egress pairs. This applies to both the ERMTU and TIMETU (the
   latter as used only by the ingress).

   A multipoint tunnel MUST have support for broadcast and multicast, in
   exactly the same way as this is already required for multipoint links
   [RFC3819].
   Both modes can be supported either by a native mechanism
   inside the tunnel or by emulation using serial replication at the
   tunnel ingress, in the same way that links may provide the same
   support either natively (e.g., via promiscuous or automatic
   replication in the link itself) or by network interface emulation
   (e.g., as for non-broadcast multiaccess networks, i.e., NBMAs).

4.13. NAT / Load Balancing

   [To be addressed]

   Talk about ECMP / LAG here

4.14. Recursive tunnels

   [IS THIS REDUNDANT?]

   The rules described in this document already support tunnels over
   tunnels, sometimes known as "recursive" tunnels, in which IP is
   transited over IP either directly or via intermediate encapsulation
   (IP-UDP-IP).

   There are known hazards to recursive tunneling, notably that the
   independence of the tunnel transit header and tunnel link header hop
   counts can result in a tunneling loop. Such looping can be avoided
   when using direct encapsulation (IP in IP) by use of a header option
   to track the encapsulation count and to limit that count [RFC2473].
   This looping cannot be avoided when other protocols are used for
   tunneling, e.g., IP in UDP in IP, because the encapsulation count may
   not be visible where the recursion occurs.

5. Observations (implications)

   [Leave this as a shopping list for now]

5.1. Tunnel protocol designers

   Recursive tunneling + minimum MTU = frag/reassembly is inevitable, at
   least to be able to split/join two fragments

   Account for egress MTU/path MTU differences.

   Include a stronger checksum.

   Ensure the egress MTU is always larger than the path MTU.

   Ensure that the egress reassembly can keep up with line rate OR
   design PLPMTUD into the tunneling protocol.

5.2. Tunnel implementers

   Detect when the egress MTU is exceeded.

   Detect when the egress MTU drops below the required minimum and shut
   down the tunnel if that happens - configuring the tunnel down and
   issuing a hard error may be the only way to detect this anomaly, and
   it's sufficiently important that the tunnel SHOULD be disabled. This
   is always better than blindly assuming the tunnel has been deployed
   correctly, i.e., that the solution has been engineered.

   Do NOT decrement the TTL as part of being a tunnel. It is always OK
   for a router to decrement the TTL based on different next-hop
   routers, but TTL is a property of a router, not a link.

5.3. Tunnel operators

   Keep the difference between "enforced by operators" vs. "enforced by
   active protocol mechanism" in mind. It's fine to assume something the
   tunnel cannot or does not test, as long as you KNOW you can assume
   it. When the assumption is wrong, it will NOT be signaled by the
   tunnel.

   Do NOT decrement the TTL as part of being a tunnel. It is always OK
   for a router to decrement the TTL based on different next-hop
   routers, but TTL is a property of a router, not a link.

   >>>> PLPMTUD can give incorrect information during ECMP or LAG

5.4. Diagnostics

   Some current implementations include diagnostics to support
   monitoring the impact of tunneling, especially the impact on
   fragmentation and reassembly resources, the status of path MTU
   discovery, etc.

   >> Because a tunnel ingress/egress is a network interface, it SHOULD
   have similar resources as any other network interface. This includes
   resources for packet processing as well as monitoring.

5.5. For existing standards

5.5.1. Generic UDP Encapsulation (GUE - IP in UDP in IP)

   [He15]

   Consistent with this doc:

   Inconsistent with this doc:

      Imports RFC4459

      Appears to allow both pre- and post-encapsulation fragmentation

   Recommendations:

      Should not encourage pre-encaps fragmentation

      See recommendations for RFC4459

5.5.2. Generic Packet Tunneling in IPv6

   [RFC2473]

   Consistent with this doc:

      Considers the endpoints of the tunnel as virtual interfaces.

      Considers the tunnel a virtual link.

      Requires source fragmentation at the ingress and reassembly at the
      egress.

      Includes a recursion limit to prevent unlimited re-encapsulation.

      Sets tunnel transit header hop limit independently.

      Sends ICMPs back at the ingress based on the arriving tunnel
      transit packet and its relation to the tunnel MTU (though it uses
      the incorrect value of the tunnel MTU; see below).

      Allows for ingress relaying of internal tunnel errors (but see
      below; it does not discuss retaining state about these).

   Inconsistent with this doc:

      Decrements the tunnel transit header hop limit by 1, i.e.,
      incorrectly assuming that tunnel endpoints occur at routers only
      and that the tunnel, rather than the router, is responsible for
      this decrement.

      RFC2473 goes to pains to describe the decapsulation process as if
      it were distinct from conventional protocol processing by the
      receiver (when it should not be).

      Copies traffic class from tunnel link to tunnel transit header (as
      one variant).

      Treats the tunnel MTU as the tunnel path MTU, rather than the
      tunnel egress MTU.

      Incorrectly fragments IPv4 DF=0 tunnel transit packets that arrive
      larger than the tunnel MTU at the IPv6 layer; the relationship
      between IPv4 and the tunnel is more complex (as noted in this
      doc).

      Fails to retain state from the tunnel based on ingress receiving
      ICMP messages from inside the tunnel, e.g., state that might cause
      future tunnel transit packets arriving at the ingress to be
      discarded with an ICMP error response rather than allowing them to
      proceed into the tunnel.

   Recommendation:

      This doc should update 2473 for TTL decrement, tunnel MTU, and
      fragmentation. Other issues are less critical.

5.5.3. Geneve (NVO3)

   [RFC7364] info, [Gr16] stds - ISSUE US AS BCP; Gr16 should follow

   Consistent with this doc:

      Generation of the link header fields is not discussed and presumed
      independent of transit packet.

      Reportedly treats an ingress/egress as applying to multiple
      tunnels, rather than considering them logically independent for
      each tunnel. This appears to confuse implementation aggregation
      with architecture.

      Reportedly treats tunnels as supporting traffic for multiple
      virtual networks, rather than considering them logically
      independent. This appears to confuse implementation aggregation
      with architecture.

   Inconsistent with this doc:

      Tries to match transit to tunnel path MTU rather than egress MTU.

   Recommendation:

      Gr16 should be updated to follow us

5.5.4. GRE (IP in GRE in IP)

   IPv4 [RFC2784] stds, [RFC7588] info, [RFC7676] stds - NO CHANGES

   Consistent with this doc:

      Does not address link header generation.

      Non-default behavior allows fragmentation of link packet to match
      tunnel path MTU up to the limit of the egress MTU.

      Default behavior sets link DF independently.

      Shuts the tunnel down if the tunnel path MTU isn't >= 1280.

   Inconsistent with this doc:

      Based on tunnel path MTU, not egress MTU.

      Claims that the tunnel (GRE) mechanism is responsible for
      generating ICMP error messages.

      Default behavior fragments transit packet (where possible) based
      on tunnel path MTU (it should fragment based on egress MTU).

      Default behavior does not support the minimum MTU of IPv6 when run
      over IPv6.

      Non-default behavior allows copying DF for IPv4 in IPv4.

   Recommendations:

      No changes - existing docs largely describe legacy deployment.

5.5.5. IP in IP / mobile IP

   IPv4 [RFC2003] stds, [RFC4459] info:

   Consistent with this doc:

      Generate link ID independently

      Generate link DF independently when transit DF=0

      Generate ECN/update ECN based on sharing info [RFC6040]

      Set link TTL to transit to egress only (independently)

      Do not decrement TTL on entry except when part of forwarding

      Do not decrement TTL on exit except when part of forwarding

      Options not copied, but used as a hint to desired services.

      Generally treat tunnel as a link, e.g., for link-local.

   Inconsistent with this doc:

      Set link DF when transit DF=1 (won't work unless I-E runs PLPMTUD)

      Drop at egress if transit TTL=0 (wrong TTL for host-host tunnels)

      Drop when transit source is router's IP (prevents tun from router)

      Drop when transit source matches egress (prevents tun to router)

      Use tunnel ICMPs to generate upper ICMPs, copying context (ICMPs
      are now coming from inside a link!); these should be handled by
      setting errors as a "network interface" and letting the attached
      host/router figure out what to send.

      Using tunnel MTU discovery to tune the transit packet to the
      tunnel path MTU rather than egress MTU.

   Recommendations:

      IMO, ought to update 2003! (no "update" to informational), esp.
      regarding TTL issues, transit source drop issues, and tunnel MTU.

   IPv6 [RFC2473] std:

   Consistent with this doc:

      Doesn't discuss lots of header fields, but implies they're set
      independently.

      Sets link TTL independently.

   Inconsistent with this doc:

      Tunnel issues ICMP PTBs.

      ICMP PTB issued if larger than 1280 - header, rather than egress
      reassembly MTU.

      Fragments IPv6 over IPv6 fragments only if transit is <= 1280
      (i.e., forces all tunnels to have a max MTU of 1280).

      Fragments IPv4 over IPv6 fragments only if IPv4 DF=0
      (misinterpreting the "can fragment the IPv4 packet" as permission
      to fragment at the IPv6 link header)

      Considers encapsulation a forwarding operation and decrements the
      transit TTL.

   Recommendation:

      Should UPDATE 2473; tunnel should not issue PTBs (router should),
      issue them correctly, fragment correctly, and not TTL decrement.

5.5.6. IPsec tunnel mode (IP in IPsec in IP)

   [RFC4301] std

   Consistent with this doc:

      Most of the rules, except as noted below.

   Inconsistent with this doc:

      Writes its own header copying rules (Sec 5.1.2), rather than
      referring to existing standards, but that makes sense for security
      reasons.

      Uses policy to set, clear, or copy DF (policy isn't the issue)

      Intertwines tunneling with forwarding rather than presenting the
      tunnel as a network interface; this can be corrected by using
      IPsec transport mode with an IP-in-IP tunnel [RFC3884].

   Recommendations:

      None.

5.5.7. L2TP

   [RFC3931] std

   Consistent with this doc:

      Does not address most link headers, which are thus independent.

   Inconsistent with this doc:

      Manages tunnel access based on tunnel path MTU, instead of egress
      MTU.

      Refers to RFC2473 (IPv6 in IPv6), which is inconsistent with this
      doc as noted above.

   Recommendations:

      Should update to use correct tunnel MTU.

5.5.8. L2VPN

   [RFC4664]

   Consistent with this doc:

   Inconsistent with this doc:

   Recommendations:

5.5.9. L3VPN

   [RFC4176]

   Consistent with this doc:

   Inconsistent with this doc:

   Recommendations:

5.5.10. LISP

   [RFC6830]

   Consistent with this doc:

   Inconsistent with this doc:

   Recommendations:

5.5.11. MPLS

   [RFC3031]

   Consistent with this doc:

   Inconsistent with this doc:

   Recommendations:

5.5.12. PWE

   [RFC3985]

   Consistent with this doc:

   Inconsistent with this doc:

   Recommendations:

5.5.13. SEAL/AERO

   [RFC5320][Te16]

   Consistent with this doc:

   Inconsistent with this doc:

   Recommendations:

5.5.14. TRILL

   [RFC5556][RFC6325]

   Consistent with this doc:

      Puts IP in Ethernet, so most of the issues don't come up.

      Ethernet doesn't have TTL or fragment.

      Rbridge (trill) TTL header is independent of transit packet.

   Inconsistent with this doc:

      None.

   Recommendations:

      None.

5.5.15. RTG DT encapsulations

   [No16], refers to NVO3 and other encapsulations

   Includes info on tables for multipoint tunnels, additional info for
   headers, etc.

   Consistent with this doc:

   Inconsistent with this doc:

      Assumes MTU can be managed to avoid fragmentation. This is
      impossible as long as any one layer is used recursively and that
      layer includes a mandatory minimum MTU. A "trust but verify"
      policy is better than assuming engineered MTU deployment is
      sufficient.

      Relies on ICMP PTB to correct for tunnel path MTU issues.

      Allows encaps protocols to not support fragmentation.

   Recommendations:

      That doc should refer to this regarding general tunneling issues,
      including fragmentation, tunnel MTU, and TTL, including the "trust
      but verify" issue for engineered MTU deployment.

      All encaps protocols for IP over IP (eventually) MUST support
      fragm.

5.6. For future standards

   Larger IPv4 MTU (2K? or just 2x path MTU?) for reassembly

   Always include frag support for at least two frags; do NOT try to
   deprecate fragmentation.

   Limit encapsulation option use/space.

   Augment ICMP to have two separate messages: PTB vs
   P-bigger-than-optimal

   Include MTU as part of BGP as a hint - SB
   Hazards of multi-MTU draft-van-beijnum-multi-mtu-04

6. Security Considerations

   Tunnels may introduce vulnerabilities or add to the potential for
   receiver overload and thus DoS attacks. These issues are primarily
   related to the fact that a tunnel is a link that traverses a network
   path and to fragmentation and reassembly. ICMP signal translation
   introduces a new security issue and must be done with care. ICMP
   generation at the router or host attached to a tunnel is already
   covered by existing requirements (e.g., should be throttled).

   Tunnels traverse multiple hops of a network path from ingress to
   egress. Traffic along such tunnels may be susceptible to on-path and
   off-path attacks, including fragment injection, reassembly buffer
   overload, and ICMP attacks. Some of these attacks may not be as
   visible to the endpoints of the architecture into which tunnels are
   deployed and these attacks may thus be more difficult to detect.

   Fragmentation at routers or hosts attached to tunnels may place an
   undue burden on receivers where traffic is not sufficiently diffuse,
   because tunnels may induce source fragmentation at hosts and path
   fragmentation (for IPv4 DF=0) more for tunnels than for other links.
   Care should be taken to avoid this situation, notably by ensuring
   that tunnel MTUs are not significantly different from other link
   MTUs.

   Tunnel ingresses emitting IP datagrams MUST obey all existing IP
   requirements, such as the uniqueness of the IP ID field. Failure to
   either limit encapsulation traffic, or use additional ingress/egress
   IP addresses, can result in high speed traffic fragments being
   incorrectly reassembled.

   Tunnels are susceptible to attacks at both the inner and outer
   network layers.
   The tunnel ingress/egress endpoints appear as network
   interfaces in the outer network, and are as susceptible as any other
   network interface. This includes vulnerability to fragmentation
   reassembly overload, traffic overload, and spoofed ICMPs that
   misreport the state of those interfaces. Similarly, the
   ingress/egress appear as hosts to the path traversed by the tunnel,
   and thus are as susceptible as any other host to attacks as well.

   [management?]

   [Access control?]
   describe relationship to [RFC6169] - JT (as per INTAREA meeting
   notes, don't cover Teredo-specific issues in RFC6169, but include
   generic issues here)

7. IANA Considerations

   This document has no IANA considerations.

   The RFC Editor should remove this section prior to publication.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

   [Cl88]    Clark, D., "The design philosophy of the DARPA internet
             protocols," Proc. Sigcomm 1988, p.106-114, 1988.

   [Er94]    Eriksson, H., "MBone: The Multicast Backbone,"
             Communications of the ACM, Aug. 1994, pp.54-60.

   [Gr16]    Gross, J., et al., "Geneve: Generic Network Virtualization
             Encapsulation," draft-ietf-nvo3-geneve-01, Jan. 2016.

   [He15]    Herbert, T., L. Yong, O. Zia, "Generic UDP Encapsulation,"
             draft-ietf-nvo3-gue-04, Jul. 2016.

   [No16]    Nordmark, E. (Ed.), A. Tian, J. Gross, J. Hudson, L.
             Kreeger, P. Garg, P. Thaler, T. Herbert, "Encapsulation
             Considerations," draft-ietf-rtgwg-dt-encap-01, Mar. 2016.

   [RFC768]  Postel, J., "User Datagram Protocol," RFC 768, Aug. 1980.

   [RFC791]  Postel, J., "Internet Protocol," RFC 791 / STD 5, September
             1981.

   [RFC793]  Postel, J., "Transmission Control Protocol," RFC 793, Sept.
             1981.

   [RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol -- or
             -- Converting Network Protocol Addresses to 48.bit Ethernet
             Address for Transmission on Ethernet Hardware," RFC 826,
             Nov. 1982.

   [RFC1075] Waitzman, D., C. Partridge, S. Deering, "Distance Vector
             Multicast Routing Protocol," RFC 1075, Nov. 1988.

   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
             Communication Layers," RFC 1122 / STD 3, October 1989.

   [RFC1191] Mogul, J., S. Deering, "Path MTU discovery," RFC 1191,
             November 1990.

   [RFC1812] Baker, F., "Requirements for IP Version 4 Routers," RFC
             1812, June 1995.

   [RFC2003] Perkins, C., "IP Encapsulation within IP," RFC 2003,
             October 1996.

   [RFC2225] Laubach, M., J. Halpern, "Classical IP and ARP over ATM,"
             RFC 2225, Apr. 1998.

   [RFC2460] Deering, S., R. Hinden, "Internet Protocol, Version 6
             (IPv6) Specification," RFC 2460, Dec. 1998.

   [RFC2473] Conta, A., S. Deering, "Generic Packet Tunneling in IPv6
             Specification," RFC 2473, Dec. 1998.

   [RFC2784] Farinacci, D., T. Li, S. Hanks, D. Meyer, P. Traina,
             "Generic Routing Encapsulation (GRE)", RFC 2784, March
             2000.

   [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery," RFC
             2923, September 2000.

   [RFC3031] Rosen, E., A. Viswanathan, R. Callon, "Multiprotocol Label
             Switching Architecture", RFC 3031, January 2001.

   [RFC3819] Karn, P., Ed., C. Bormann, G. Fairhurst, D. Grossman, R.
             Ludwig, J. Mahdavi, G. Montenegro, J. Touch, L. Wood,
             "Advice for Internet Subnetwork Designers," RFC 3819 / BCP
             89, July 2004.

   [RFC3884] Touch, J., L. Eggert, Y. Wang, "Use of IPsec Transport Mode
             for Dynamic Routing," RFC 3884, September 2004.

   [RFC3931] Lau, J., Ed., M. Townsley, Ed., I.
             Goyret, Ed., "Layer Two
             Tunneling Protocol - Version 3 (L2TPv3)," RFC 3931, March
             2005.

   [RFC3985] Bryant, S., P. Pate (Eds.), "Pseudo Wire Emulation Edge-to-
             Edge (PWE3) Architecture", RFC 3985, March 2005.

   [RFC4176] El Mghazli, Y., Ed., T. Nadeau, M. Boucadair, K. Chan, A.
             Gonguet, "Framework for Layer 3 Virtual Private Networks
             (L3VPN) Operations and Management," RFC 4176, October 2005.

   [RFC4301] Kent, S., and K. Seo, "Security Architecture for the
             Internet Protocol," RFC 4301, December 2005.

   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
             Network Tunneling," RFC 4459, April 2006.

   [RFC4664] Andersson, L., Ed., E. Rosen, Ed., "Framework for Layer 2
             Virtual Private Networks (L2VPNs)," RFC 4664, September
             2006.

   [RFC4821] Mathis, M., J. Heffner, "Packetization Layer Path MTU
             Discovery," RFC 4821, March 2007.

   [RFC4861] Narten, T., E. Nordmark, W. Simpson, H. Soliman, "Neighbor
             Discovery for IP version 6 (IPv6)," RFC 4861, Sept. 2007.

   [RFC4963] Heffner, J., M. Mathis, B. Chandler, "IPv4 Reassembly
             Errors at High Data Rates," RFC 4963, July 2007.

   [RFC5320] Templin, F., Ed., "The Subnetwork Encapsulation and
             Adaptation Layer (SEAL)," RFC 5320, Feb. 2010.

   [RFC5405] Eggert, L., G. Fairhurst, "Unicast UDP Usage Guidelines for
             Application Designers," RFC 5405, Nov. 2008.

   [RFC5556] Touch, J., R. Perlman, "Transparently Interconnecting Lots
             of Links (TRILL): Problem and Applicability Statement," RFC
             5556, May 2009.

   [RFC5944] Perkins, C., Ed., "IP Mobility Support for IPv4, Revised,"
             RFC 5944, Nov. 2010.

   [RFC6040] Briscoe, B., "Tunneling of Explicit Congestion
             Notification," RFC 6040, Nov. 2010.

   [RFC6169] Krishnan, S., D. Thaler, J. Hoagland, "Security Concerns
             With IP Tunneling," RFC 6169, Apr. 2011.

   [RFC6325] Perlman, R., D. Eastlake, D. Dutt, S. Gai, A.
             Ghanwani,
             "Routing Bridges (RBridges): Base Protocol Specification,"
             RFC 6325, July 2011.

   [RFC6807] Farinacci, D., G. Shepherd, S. Venaas, Y. Cai, "Population
             Count Extensions to Protocol Independent Multicast (PIM),"
             RFC 6807, Dec. 2012.

   [RFC6830] Farinacci, D., V. Fuller, D. Meyer, D. Lewis, "The
             Locator/ID Separation Protocol," RFC 6830, Jan. 2013.

   [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field,"
             Proposed Standard, RFC 6864, Feb. 2013.

   [RFC6935] Eubanks, M., P. Chimento, M. Westerlund, "IPv6 and UDP
             Checksums for Tunneled Packets," RFC 6935, Apr. 2013.

   [RFC6936] Fairhurst, G., M. Westerlund, "Applicability Statement for
             the Use of IPv6 UDP Datagrams with Zero Checksums," RFC
             6936, Apr. 2013.

   [RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., M.
             Napierala, "Problem Statement: Overlays for Network
             Virtualization", RFC 7364, Oct. 2014.

   [RFC7450] Bumgardner, G., "Automatic Multicast Tunneling," RFC 7450,
             Feb. 2015.

   [RFC7588] Bonica, R., C. Pignataro, J. Touch, "A Widely-Deployed
             Solution to the Generic Routing Encapsulation Fragmentation
             Problem," RFC 7588, July 2015.

   [RFC7676] Pignataro, C., R. Bonica, S. Krishnan, "IPv6 Support for
             Generic Routing Encapsulation (GRE)," RFC 7676, Oct 2015.

   [Sa84]    Saltzer, J., D. Reed, D. Clark, "End-to-end arguments in
             system design," ACM Trans. on Computing Systems, Nov. 1984.

   [Te16]    Templin, F., "Asymmetric Extended Route Optimization,"
             draft-templin-aerolink-67, Jun. 2016.

   [To01]    Touch, J., "Dynamic Internet Overlay Deployment and
             Management Using the X-Bone," Computer Networks, July 2001,
             pp. 117-135.

   [To03]    Touch, J., Y. Wang, L. Eggert, G. Finn, "Virtual Internet
             Architecture," USC/ISI Tech. Report 570, Aug. 2003.

   [To16]    Touch, J., "Middleboxes Models Compatible with the
             Internet," USC/ISI Tech. Report , July 2016.

   [To98]    Touch, J., S. Hotz, "The X-Bone," Proc. Globecom Third
             Global Internet Mini-Conference, Nov. 1998.

   [Zi80]    Zimmermann, H., "OSI Reference Model - The ISO Model of
             Architecture for Open Systems Interconnection," IEEE Trans.
             on Comm., Apr. 1980.

9. Acknowledgments

   This document originated as the result of numerous discussions among
   the authors, Jari Arkko, Stuart Bryant, Lars Eggert, Ted Faber, Gorry
   Fairhurst, Dino Farinacci, Matt Mathis, and Fred Templin. It
   benefitted substantially from detailed feedback from Toerless Eckert,
   Vincent Roca, and Lucy Yong, as well as other members of the Internet
   Area Working Group.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Joe Touch
   USC/ISI
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   U.S.A.

   Phone: +1 (310) 448-9151
   Email: touch@isi.edu

   W. Mark Townsley
   Cisco
   L'Atlantis, 11, Rue Camille Desmoulins
   Issy Les Moulineaux, ILE DE FRANCE 92782

   Email: townsley@cisco.com

APPENDIX A: Fragmentation efficiency

A.1. Selecting fragment sizes

   There are different ways to fragment a packet. Consider a network
   with an MTU as shown in Figure 13, where packets are encapsulated
   over the same network layer as they arrive on (e.g., IP in IP). If a
   packet as large as the MTU arrives, it must be fragmented to
   accommodate the additional header.
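
   The two strategies compared in this appendix, maximum fit (Figure 13)
   and even splitting (Figure 14), can be sketched numerically as
   follows. This is a simplified model under stated assumptions, not a
   description of any implementation: sizes are in bytes, every
   encapsulation adds one fixed-size header, the 8-byte alignment that
   IP fragment offsets require is ignored, and the function names and
   the 1500/20 values are chosen for this example only.

```python
def fragment_max_fit(payload: int, header: int, mtu: int) -> list[int]:
    """Fill each fragment up to the MTU; return on-the-wire packet sizes."""
    room = mtu - header  # payload bytes that fit beside one added header
    sizes = []
    while payload > 0:
        chunk = min(room, payload)
        sizes.append(header + chunk)
        payload -= chunk
    return sizes


def fragment_even(payload: int, header: int, mtu: int) -> list[int]:
    """Use the fewest fragments possible, but make them roughly equal."""
    room = mtu - header
    count = max(1, -(-payload // room))  # ceiling division
    base, extra = divmod(payload, count)
    return [header + base + (1 if i < extra else 0) for i in range(count)]


def tunnel(packets: list[int], header: int, mtu: int, strategy) -> list[int]:
    """Encapsulate each arriving packet, fragmenting it with strategy."""
    return [size for pkt in packets for size in strategy(pkt, header, mtu)]


MTU, HDR = 1500, 20  # assumed values for illustration

# An MTU-sized packet crosses two nested tunnels (as in Figures 13 and 14).
max_fit = tunnel(tunnel([MTU], HDR, MTU, fragment_max_fit),
                 HDR, MTU, fragment_max_fit)
even = tunnel(tunnel([MTU], HDR, MTU, fragment_even),
              HDR, MTU, fragment_even)
```

   Under these assumptions, max_fit holds three packet sizes (the first
   tunnel's MTU-sized fragment has to be split again by the second
   tunnel), while even holds two, because both halves leave room for the
   second header.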

     X===========================X (MTU)
     +----+----------------------+
     | iH | DDDDDDDDDDDDDDDDDDDD |
     +----+----------------------+
       |
       |  X===========================X (MTU)
       |  +---+----+------------------+
   (a) +->| H'| iH | DDDDDDDDDDDDDDDD |
       |  +---+----+------------------+
       |      |
       |      |  X===========================X (MTU)
       |      |  +----+---+----+-------------+
       | (a1) +->| nH'| H | iH | DDDDDDDDDDD |
       |      |  +----+---+----+-------------+
       |      |
       |      |  +----+-------+
       | (a2) +->| nH"| DDDDD |
       |         +----+-------+
       |
       |  +---+------+
   (b) +->| H"| DDDD |
          +---+------+
              |
              |  +----+---+------+
         (b1) +->| nH'| H"| DDDD |
                 +----+---+------+

                 Figure 13 Fragmenting via maximum fit

   Figure 13 shows this process, using Outer Fragmentation as an example
   (the situation is the same for Inner Fragmentation, but the headers
   that are affected differ). The arriving packet is first split into
   (a) and (b), where (a) is of the MTU of the network. However, this
   tunnel then traverses over another tunnel, whose impact the first
   tunnel ingress has not accommodated. The packet (a) arrives at the
   second tunnel ingress, and needs to be encapsulated again, but
   because it is already at the MTU, it needs to be fragmented as well,
   into (a1) and (a2). In this case, packet (b) arrives at the second
   tunnel ingress and is encapsulated into (b1) without fragmentation,
   because it is already below the MTU size.

   In Figure 14, the fragmentation is done evenly, i.e., by splitting
   the original packet into two roughly equal-sized components, (c) and
   (d). Note that (d) contains more packet data because, in this example
   of Outer Fragmentation, (c) also carries the original packet header.
   The packets (c) and (d) arrive at the second tunnel
   encapsulator, and are encapsulated again; this time, neither packet
   exceeds the MTU, and neither requires further fragmentation.

     X===========================X (MTU)
     +----+----------------------+
     | iH | DDDDDDDDDDDDDDDDDDDD |
     +----+----------------------+
       |
       |  X===========================X (MTU)
       |  +---+----+----------+
   (c) +->| H'| iH | DDDDDDDD |
       |  +---+----+----------+
       |      |
       |      |  X===========================X (MTU)
       |      |  +----+---+----+----------+
       | (c1) +->| nH | H'| iH | DDDDDDDD |
       |         +----+---+----+----------+
       |
       |  +---+--------------+
   (d) +->| H"| DDDDDDDDDDDD |
          +---+--------------+
              |
              |  +----+---+--------------+
         (d1) +->| nH | H"| DDDDDDDDDDDD |
                 +----+---+--------------+

                    Figure 14 Fragmenting evenly

A.2. Packing

   Encapsulating individual packets to traverse a tunnel can be
   inefficient, especially where headers are large relative to the
   packets being carried. In that case, it can be more efficient to
   encapsulate many small packets in a single, larger tunnel payload.
   This technique, similar to the effect of packet bursting in Gigabit
   Ethernet (regardless of whether they're encoded using L2 symbols as
   delineators), reduces the overhead of the encapsulation headers
   (Figure 15). It reduces the work of header addition and removal at
   the tunnel endpoints, but increases other work involving the packing
   and unpacking of the component packets carried.

        +-----+-----+
        | iHa | iDa |
        +-----+-----+
              |
              |     +-----+-----+
              |     | iHb | iDb |
              |     +-----+-----+
              |           |
              |           |     +-----+-----+
              |           |     | iHc | iDc |
              |           |     +-----+-----+
              |           |           |
              v           v           v
   +----+-----+-----+-----+-----+-----+-----+
   | oH | iHa | iDa | iHb | iDb | iHc | iDc |
   +----+-----+-----+-----+-----+-----+-----+

              Figure 15 Packing packets into a tunnel

   [NOTE: PPP chopping and coalescing?]
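
   One way to picture the packing and unpacking work described in this
   section is the sketch below. It is illustrative only. This document
   does not define how packed packets are delimited, so the 2-byte
   length prefix, the function names pack and unpack, and the
   max_payload parameter are all assumptions introduced for this
   example.

```python
import struct

LEN_PREFIX = struct.Struct("!H")  # assumed 2-byte big-endian length delimiter


def pack(packets: list[bytes], max_payload: int) -> list[bytes]:
    """Greedily pack whole packets into tunnel payloads of bounded size."""
    payloads, current = [], b""
    for pkt in packets:
        if LEN_PREFIX.size + len(pkt) > max_payload or len(pkt) > 0xFFFF:
            raise ValueError("packet does not fit in one tunnel payload")
        record = LEN_PREFIX.pack(len(pkt)) + pkt
        if len(current) + len(record) > max_payload:
            payloads.append(current)  # current payload is full; start another
            current = b""
        current += record
    if current:
        payloads.append(current)
    return payloads


def unpack(payload: bytes) -> list[bytes]:
    """Recover the component packets from one packed tunnel payload."""
    packets, offset = [], 0
    while offset < len(payload):
        (length,) = LEN_PREFIX.unpack_from(payload, offset)
        offset += LEN_PREFIX.size
        packets.append(payload[offset:offset + length])
        offset += length
    return packets
```

   In this sketch each tunnel payload would carry one outer header (oH
   in Figure 15) regardless of how many inner packets it holds, which is
   where the header savings come from; the per-packet length prefix is
   the added cost of delimiting them.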