idnits 2.17.1 

draft-ietf-intarea-tunnels-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  -- The draft header indicates that this document updates RFC4459, but the
     abstract doesn't seem to directly say this.  It does mention RFC4459
     though, so this could be OK.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC4459, updated by this document, for
     RFC5378 checks: 2004-06-14)

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 8, 2017) is 2513 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-04

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 6434
     (Obsoleted by RFC 8504)

  -- Obsolete informational reference (is this intentional?): RFC 6830
     (Obsoleted by RFC 9300, RFC 9301)

  -- Obsolete informational reference (is this intentional?): RFC 6833
     (Obsoleted by RFC 9301)

  == Outdated reference: A later version (-82) exists of
     draft-templin-aerolink-75


     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1	Internet Area WG                                               J. Touch
2	Internet Draft                                                  USC/ISI
3	Intended status: Best Current Practice                      M. Townsley
4	Updates: 4459                                                     Cisco
5	Expires: December 2017                                     June 8, 2017

7	                  IP Tunnels in the Internet Architecture
8	                     draft-ietf-intarea-tunnels-07.txt

10	Status of this Memo

12	   This Internet-Draft is submitted in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   This document may contain material from IETF Documents or IETF
16	   Contributions published or made publicly available before November
17	   10, 2008. The person(s) controlling the copyright in some of this
18	   material may not have granted the IETF Trust the right to allow
19	   modifications of such material outside the IETF Standards Process.
20	   Without obtaining an adequate license from the person(s) controlling
21	   the copyright in such materials, this document may not be modified
22	   outside the IETF Standards Process, and derivative works of it may
23	   not be created outside the IETF Standards Process, except to format
24	   it for publication as an RFC or to translate it into languages other
25	   than English.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF), its areas, and its working groups.  Note that
29	   other groups may also distribute working documents as Internet-
30	   Drafts.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   The list of current Internet-Drafts can be accessed at
38	   http://www.ietf.org/ietf/1id-abstracts.txt

40	   The list of Internet-Draft Shadow Directories can be accessed at
41	   http://www.ietf.org/shadow.html

43	   This Internet-Draft will expire on December 8, 2017.

45	Copyright Notice

47	   Copyright (c) 2017 IETF Trust and the persons identified as the
48	   document authors. All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document. Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document. Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Abstract

62	   This document discusses the role of IP tunnels in the Internet
63	   architecture. An IP tunnel transits IP datagrams as payloads in non-
64	   link layer protocols. This document explains the relationship of IP
65	   tunnels to existing protocol layers and the challenges in supporting
66	   IP tunneling, based on the equivalence of tunnels to links. The
67	   implications of this document are used to derive recommendations that
68	   update the MTU and fragmentation guidance in RFC 4459.

70	Table of Contents

72	   1. Introduction...................................................3
73	   2. Conventions used in this document..............................6
74	      2.1. Key Words.................................................6
75	      2.2. Terminology...............................................6
76	   3. The Tunnel Model..............................................10
77	      3.1. What is a Tunnel?........................................11
78	      3.2. View from the Outside....................................13
79	      3.3. View from the Inside.....................................14
80	      3.4. Location of the Ingress and Egress.......................15
81	      3.5. Implications of This Model...............................15
82	      3.6. Fragmentation............................................16
83	         3.6.1. Outer Fragmentation.................................16
84	         3.6.2. Inner Fragmentation.................................18
85	         3.6.3. The Necessity of Outer Fragmentation................19
86	   4. IP Tunnel Requirements........................................20
87	      4.1. Encapsulation Header Issues..............................20
88	         4.1.1. General Principles of Header Fields Relationships...20
89	         4.1.2. Addressing Fields...................................21
90	         4.1.3. Hop Count Fields....................................21
91	         4.1.4. IP Fragment Identification Fields...................22
92	         4.1.5. Checksums...........................................23
93	      4.2. MTU Issues...............................................24
94	         4.2.1. Minimum MTU Considerations..........................24
95	         4.2.2. Fragmentation.......................................27
96	         4.2.3. Path MTU Discovery..................................30
97	      4.3. Coordination Issues......................................32
98	         4.3.1. Signaling...........................................32
99	         4.3.2. Congestion..........................................34
100	         4.3.3. Multipoint Tunnels and Multicast....................34
101	         4.3.4. Load Balancing......................................35
102	         4.3.5. Recursive Tunnels...................................36
103	   5. Observations..................................................37
104	      5.1. Summary of Recommendations...............................37
105	      5.2. Impact on Existing Encapsulation Protocols...............37
106	      5.3. Tunnel Protocol Designers................................40
107	         5.3.1. For Future Standards................................40
108	         5.3.2. Diagnostics.........................................40
109	      5.4. Tunnel Implementers......................................41
110	      5.5. Tunnel Operators.........................................41
111	   6. Security Considerations.......................................42
112	   7. IANA Considerations...........................................43
113	   8. References....................................................43
114	      8.1. Normative References.....................................43
115	      8.2. Informative References...................................43
116	   9. Acknowledgments...............................................48
117	   APPENDIX A: Fragmentation efficiency.............................50
118	      A.1. Selecting fragment sizes.................................50
119	      A.2. Packing..................................................51

121	1. Introduction

123	   The Internet layering architecture is loosely based on the ISO seven
124	   layer stack, in which data units traverse the stack by being wrapped
125	   inside data units of the next layer down [Cl88][Zi80]. A tunnel is a
126	   mechanism for transmitting data units between endpoints by wrapping
127	   them as data units of the same or higher layers, e.g., IP in IP
128	   (Figure 1) or IP in UDP (Figure 2).

130	                        +----+----+--------------+
131	                        | IP'| IP |     Data     |
132	                        +----+----+--------------+

134	                           Figure 1 IP inside IP

136	                     +----+-----+----+--------------+
137	                     | IP'| UDP | IP |     Data     |
138	                     +----+-----+----+--------------+

140	                   Figure 2 IP in UDP in IP
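   As a minimal sketch of Figures 1 and 2, the Python code below wraps an existing IPv4 datagram in one of two ways. Figure 1 uses an outer IPv4 header (protocol 4, IP in IP). Figure 2 uses an outer IPv4 header plus a UDP header (protocol 17). The sketch assumes IPv4 headers without options, ID=0 and DF=0, a zero (unused) UDP checksum, RFC 5737 documentation addresses, and a hypothetical UDP destination port 50000. It illustrates the layering only; it is not the encoding of any particular tunnel protocol.

```python
import socket
import struct


def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int, ttl: int = 64) -> bytes:
    """20-byte IPv4 header without options; ID=0, DF=0, MF=0, offset=0."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,          # version 4, header length 5 x 32-bit words
        0,                     # DSCP/ECN
        20 + payload_len,      # total length
        0,                     # identification
        0,                     # flags and fragment offset
        ttl,
        proto,
        0,                     # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    csum = checksum16(hdr)
    return hdr[:10] + struct.pack("!H", csum) + hdr[12:]


def udp_header(sport: int, dport: int, payload_len: int) -> bytes:
    """8-byte UDP header; a zero checksum means 'not computed' over IPv4."""
    return struct.pack("!HHHH", sport, dport, 8 + payload_len, 0)


def ip_in_ip(inner: bytes, ingress: str, egress: str) -> bytes:
    """Figure 1: IP' | IP | Data (outer protocol 4)."""
    return ipv4_header(ingress, egress, 4, len(inner)) + inner


def ip_in_udp(inner: bytes, ingress: str, egress: str, dport: int) -> bytes:
    """Figure 2: IP' | UDP | IP | Data (outer protocol 17)."""
    udp = udp_header(49152, dport, len(inner))  # source port chosen arbitrarily
    return ipv4_header(ingress, egress, 17, len(udp) + len(inner)) + udp + inner


# Inner datagram: protocol 253 is reserved for experimentation (RFC 3692).
data = b"hello"
inner = ipv4_header("192.0.2.1", "198.51.100.7", 253, len(data)) + data
fig1 = ip_in_ip(inner, "203.0.113.1", "203.0.113.2")
fig2 = ip_in_udp(inner, "203.0.113.1", "203.0.113.2", dport=50000)  # hypothetical port
print(len(inner), len(fig1), len(fig2))  # 25 45 53
```

   The outer headers add 20 bytes in Figure 1 and 28 bytes in Figure 2. In both cases the inner datagram is carried unchanged as payload.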

142	   This document focuses on tunnels that transit IP packets, i.e., in
143	   which an IP packet is the payload of another protocol, other than a
144	   typical link layer. A tunnel is a virtual link that can help decouple
145	   the network topology seen by transiting packets from the underlying
146	   physical network [To98][RFC2473]. Tunnels were critical in the
147	   development of multicast because not all routers were capable of
148	   processing multicast packets [Er94]. Tunnels allowed multicast
149	   packets to transit efficiently between multicast-capable routers over
150	   paths that did not support native link-layer multicast. Similar
151	   techniques have been used to support incremental deployment of other
152	   protocols over legacy substrates, such as IPv6 [RFC2546].

154	   Use of tunnels is common in the Internet. The word "tunnel" occurs in
155	   nearly 1,500 RFCs (of nearly 8,000 current RFCs, close to 20%), and
156	   is supported within numerous protocols, including:

158	   o  IP in IP / mobile IP - IPv4 in IPv4 tunnels using protocol 4
159	      [RFC2003][RFC2473][RFC5944] and its precursor called "IPIP" using
160	      protocol 94 [RFC1853]

162	   o  IP in IPv6 - IPv6 or IPv4 in IPv6 [RFC2473]

164	   o  IPsec - includes a tunnel mode to enable encryption or
165	      authentication of an entire IP datagram inside another IP
166	      datagram [RFC4301]

168	   o  Generic Routing Encapsulation (GRE) - a shim layer for tunneling
169	      any network layer in any other network layer, as in IP in GRE in
170	      IP [RFC2784][RFC7588][RFC7676], or inside UDP in IP [RFC8086]

172	   o  MPLS - a shim layer for tunneling IP over a circuit-like path over
173	      a link layer [RFC3031] or inside UDP in IP [RFC7510], in which
174	      identifiers are rewritten on each hop, often used for traffic
175	      provisioning

177	   o  LISP - a mechanism that uses multipoint IP tunnels to reduce
178	      routing table load within an enclave of routers at the expense of
179	      more complex tunnel ingress encapsulation tables [RFC6830]

181	   o  TRILL - a mechanism that uses multipoint L2 tunnels to enable use
182	      of L3 routing (typically IS-IS) in an enclave of Ethernet bridges
183	      [RFC5556][RFC6325]

185	   o  Generic UDP Encapsulation (GUE) - IP in UDP in IP [He16]

187	   o  Automatic Multicast Tunneling (AMT) - IP in UDP in IP for
188	      multicast [RFC7450]

190	   o  L2TP - PPP over IP, to extend a subscriber's DSL/FTTH connection
191	      from an access line provider to an ISP [RFC3931]

193	   o  L2VPNs - provide a link topology different from that provided by
194	      physical links [RFC4664]; many of these are not classical tunnels,
195	      using only tags (Ethernet VLAN tags) rather than encapsulation

197	   o  L3VPNs - provide a network topology different from that provided
198	      by ISPs [RFC4176]

200	   o  NVO3 - data center network sharing (to be determined, which may
201	      include use of GUE or other tunnels) [RFC7364]

203	   o  PWE3 - emulates wire-like services over packet-switched networks
204	      [RFC3985]

206	   o  SEAL/AERO - IP in IP tunneling with an additional shim header
207	      designed to overcome the limitations of RFC2003 [RFC5320][Te17]

209	   o  A number of legacy variants, including swIPe (an IPsec precursor),
210	      a GRE precursor, and the Internet Encapsulation Protocol, all of
211	      which included a shim layer [RFC1853]

213	   The variety of tunnel mechanisms raises the question of the role of
214	   tunnels in the Internet architecture and the potential need for these
215	   mechanisms to have similar and predictable behavior. In particular,
216	   the ways in which packet size (i.e., Maximum Transmission Unit or
217	   MTU) mismatches and error signals (e.g., ICMP) are handled may
218	   benefit from a coordinated approach.

220	   Regardless of the layer in which encapsulation occurs, tunnels
221	   emulate a link. The only difference is that a link operates over a
222	   physical communication channel, whereas a tunnel operates over other
223	   software protocol layers. Because tunnels are links, they are subject
224	   to the same issues as any link, e.g., MTU discovery, signaling, and
225	   the potential utility of native support for broadcast and multicast
226	   [RFC3819]. Tunnels have some advantages over native links, being
227	   potentially easier to reconfigure and control because they can
228	   generally rely on existing out-of-band communication between their
229	   endpoints.

231	   The first attempt to use large-scale tunnels was to transit multicast
232	   traffic across the Internet in 1988, and this resulted in 'tunnel
233	   collapse'. At the time, tunnels were not implemented as
234	   encapsulation-based virtual links, but rather as loose source routes
235	   on un-encapsulated IP datagrams [RFC1075]. Then, as now, routers did
236	   not support use of the loose source route IP option at line rate, and
237	   the multicast traffic caused overload of the so-called "slow path"
238	   processing of IP datagrams in software. Using encapsulation tunnels
239	   avoided that collapse by allowing the forwarding of encapsulated
240	   packets to use the "fast path" hardware processing [Er94].

242	   The remainder of this document describes the general principles of IP
243	   tunneling and discusses the key considerations in the design of any
244	   protocol that tunnels IP datagrams. It derives its conclusions from
245	   the equivalence of tunnels and links and from requirements of
246	   existing standards for supporting IPv4 and IPv6 as payloads.

248	2. Conventions used in this document

250	2.1. Key Words

252	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
253	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
254	   document are to be interpreted as described in RFC-2119 [RFC2119].

256	   In this document, these key words will appear with that
257	   interpretation only when in ALL CAPS. Lower case uses of these words
258	   are not to be interpreted as carrying RFC-2119 significance.

260	2.2. Terminology

262	   This document uses the following terminology. Optional words in the
263	   term are indicated in parentheses, e.g., "(link or network)
264	   interface" or "egress (interface)".

266	   Terms from existing RFCs:

268	   o  Messages: variable length data labeled with globally-unique
269	      endpoint IDs, also known as a datagram for IP messages [RFC791].

271	   o  Node: a physical or logical network device that participates as
272	      either a host [RFC1122][RFC6434] or router [RFC1812]. This term
273	      originally referred to gateways in some very early RFCs [RFC5],
274	      but is currently the common way to describe a point in a network
275	      at which messages are processed.

277	   o  Host or endpoint: a node that sources or sinks messages labeled
278	      from/to its IDs, typically known as a host for both IP and higher-
279	      layer protocol messages [RFC1122].

281	   o  Source or sender: the node that generates a message [RFC1122].

283	   o  Destination or receiver: the node that consumes a message
284	      [RFC1122].

286	   o  Router or gateway: a node that relays IP messages using
287	      destination IDs and local context [RFC1812]. Routers also act as
288	      hosts when they source or sink messages. Also known as a forwarder
289	      for IP messages. Note that the notion of router is relative to the
290	      layer at which message processing is considered [To16].

292	   o  Link: a communications medium (or emulation thereof) that
293	      transfers IP messages between nodes without traversing a router
294	      (as would require decrementing the hop count) [RFC1122][RFC1812].

296	   o  Link packet: a link layer message, which can carry an IP datagram
297	      as a payload

299	   o  (Link or network) Interface: a location on a link co-located with
300	      a node where messages depart onto that link or arrive from that
301	      link. On physical links, this interface formats the message for
302	      transmission and interprets the received signals.

304	   o  Path: a sequence of one or more links over which an IP message
305	      travels between source and destination nodes (hosts or routers).

307	   o  (Link) MTU: the largest message that can transit a link [RFC791],
308	      also often referred to simply as "MTU". It does not include the
309	      size of link-layer information, e.g., link layer headers or
310	      trailers, i.e., it refers to the message that the link can carry
311	      as a payload rather than the message as it appears on the link.
312	      This is thus the largest network layer packet (including network
313	      layer headers, e.g., IP datagram) that can transit a link. Note
314	      that this need not be the native size of messages on the link,
315	      i.e., the link may internally fragment and reassemble messages.
316	      For IPv4, the smallest MTU must be at least 68 bytes [RFC791], and
317	      for IPv6 the smallest MTU must be at least 1280 bytes [RFC2460].

319	   o  EMTU_S (effective MTU for sending): the largest message that can
320	      transit a link, possibly also accounting for fragmentation that
321	      happens before the fragments are emitted onto the link [RFC1122].
322	      When source fragmentation is possible, EMTU_S = EMTU_R. When
323	      source fragmentation is not possible, EMTU_S = (link) MTU. For
324	      IPv4, this MUST be at least 68 bytes [RFC791] and for IPv6 this
325	      MUST be at least 1280 bytes [RFC2460].

327	   o  EMTU_R (effective MTU to receive): the largest payload message
328	      that a receiver must be able to accept. This thus also represents
329	      the largest message that can traverse a link, taking into account
330	      reassembly at the receiver that happens after the fragments are
331	      received [RFC1122]. For IPv4, this MUST be at least 576 bytes
332	      [RFC791] and for IPv6 this MUST be at least 1500 bytes [RFC2460].

334	   o  Path MTU (PMTU): the largest message that can transit a path of
335	      links [RFC1191][RFC1981]. Typically, this is the minimum of the
336	      link MTUs of the links of the path, and represents the largest
337	      network layer message (including network layer headers) that can
338	      transit a path without requiring fragmentation while in transit.
339	      Note that this is not the largest network packet that can be sent
340	      between a source and destination, because that network packet
341	      might have been fragmented at the network layer of the source and
342	      reassembled at the network layer of the destination.

344	   o  Tunnel: a protocol mechanism that transits messages between an
345	      ingress interface and egress interface using encapsulation to
346	      allow an existing network path to appear as a single link
347	      [RFC1853]. Note that a protocol can be used to tunnel itself (IP
348	      over IP). There is essentially no difference between a tunnel and
349	      the conventional layering of the ISO stack (i.e., by this
350	      definition, Ethernet can be considered a tunnel for IP). A tunnel
351	      is also known as a virtual link.

353	   o  Ingress (interface): the virtual link interface of a tunnel that
354	      receives messages within a node, encapsulates them according to
355	      the tunnel protocol, and transmits them into the tunnel [RFC2983].
356	      An ingress is the tunnel equivalent of the outgoing (departing)
357	      network interface of a link, and its encapsulation processing is
358	      the tunnel equivalent of encoding a message for transmission over
359	      a physical link. The ingress virtual link interface can be co-
360	      located with the traffic source.

362	      The term 'ingress' in other RFCs also refers to 'network ingress',
363	      which is the entry point of traffic to a transit network. Because
364	      this document focuses on tunnels, the term "ingress" used in the
365	      remainder of this document implies "tunnel ingress".

367	   o  Egress (interface): a virtual link interface of a tunnel that
368	      receives messages that have finished transiting a tunnel and
369	      presents them to a node [RFC2983]. For reasons similar to ingress,
370	      the term 'egress' will refer to 'tunnel egress' throughout the
371	      remainder of this document. An egress is the tunnel equivalent of
372	      the incoming (arriving) network interface of a link and its
373	      decapsulation processing is the tunnel equivalent of interpreting
374	      a signal received from a physical link. The egress decapsulates
375	      messages for further transit to the destination. The egress
376	      virtual link interface can be co-located with the traffic
377	      destination.

379	   o  Ingress node: network device on which an ingress is attached as a
380	      virtual link interface [RFC2983]. Note that a node can act as both
381	      an ingress node and an egress node at the same time, but typically
382	      only for different tunnels.

384	   o  Egress node: device where an egress is attached as a virtual link
385	      interface [RFC2983]. Note that a device can act as both an ingress
386	      node and an egress node at the same time, but typically only for
387	      different tunnels.

389	   o  Inner header: the header of the message as it arrives at the
390	      ingress [RFC2003].

392	   o  Outer header(s): one or more headers added to the message by the
393	      ingress, as part of the encapsulation for tunnel transit
394	      [RFC2003].

396	   o  Mid-tunnel fragmentation: Fragmentation of the message during the
397	      tunnel transit, as could occur for IPv4 datagrams with DF=0
398	      [RFC2983].

400	   o  Atomic packet, datagram, or fragment: an IP packet that has not
401	      been fragmented and which cannot be fragmented further [RFC6864]
402	      [RFC6946].
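   The MTU-related terms above can be illustrated with a small Python sketch. It encodes the IPv4 and IPv6 minimums quoted in those definitions. It computes a PMTU as the minimum of the link MTUs, which is the typical case noted above and ignores links that fragment internally. It also selects EMTU_S following the definition given here. The function names and link MTU values are illustrative assumptions, not part of any specification.

```python
from typing import Sequence

# Minimums quoted in the terminology above (bytes), keyed by IP version.
MIN_LINK_MTU = {4: 68, 6: 1280}   # smallest link MTU: RFC 791, RFC 2460
MIN_EMTU_R = {4: 576, 6: 1500}    # smallest reassembly size: RFC 791, RFC 2460


def path_mtu(link_mtus: Sequence[int]) -> int:
    """PMTU: typically the minimum of the link MTUs along the path."""
    return min(link_mtus)


def emtu_s(link_mtu: int, emtu_r: int, source_fragmentation: bool) -> int:
    """EMTU_S as defined above: EMTU_R if the source may fragment, else the link MTU."""
    return emtu_r if source_fragmentation else link_mtu


def check_minimums(version: int, link_mtu: int, emtu_r: int) -> None:
    """Raise ValueError if a link MTU or EMTU_R is below the IPv4/IPv6 minimum."""
    if link_mtu < MIN_LINK_MTU[version]:
        raise ValueError(f"IPv{version} link MTU {link_mtu} < {MIN_LINK_MTU[version]}")
    if emtu_r < MIN_EMTU_R[version]:
        raise ValueError(f"IPv{version} EMTU_R {emtu_r} < {MIN_EMTU_R[version]}")


# Hypothetical path of three links with different MTUs.
print(path_mtu([1500, 1480, 9000]))                         # 1480
# Hypothetical link with a 1480-byte MTU and a 1500-byte EMTU_R.
print(emtu_s(1480, 1500, source_fragmentation=False))       # 1480
print(emtu_s(1480, 1500, source_fragmentation=True))        # 1500
check_minimums(6, 1280, 1500)   # passes: exactly the IPv6 minimums
```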

404	   The following terms are introduced by this document:

406	   o  (Tunnel) transit packet: the packet arriving at a node connected
407	      to a tunnel that enters the ingress interface and exits the egress
408	      interface, i.e., the packet carried over the tunnel. This is
409	      sometimes known as the 'tunneled packet'. This is the tunnel
410	      equivalent of a network layer
411	      packet as it would traverse a link. This document focuses on IPv4
412	      and IPv6 transit packets.

414	   o  (Tunnel) link packet (TLP): packets that travel between two
415	      interfaces, e.g., from ingress interface to egress interface, and
416	      that carry all or part of a transit packet. A tunnel link
417	      packet is the tunnel equivalent of a link (layer) packet as it
418	      would traverse a link, which is why we use the same terminology.

420	   o  Tunnel MTU: the largest transit packet that can traverse a tunnel,
421	      i.e., the tunnel equivalent of a link MTU, which is why we use the
422	      same terminology. This is the largest transit packet which can be
423	      reassembled at the egress interface.

425	   o  Tunnel maximum atomic packet (MAP): the largest transit packet
426	      that can traverse a tunnel as an atomic packet, i.e., without
427	      requiring tunnel link packet fragmentation either at the ingress
428	      or on-path between the ingress and egress.

430	   o  Inner fragmentation: fragmentation of the transit packet that
431	      arrives at the ingress interface before any additional headers are
432	      added. This can only correctly occur for IPv4 DF=0 datagrams.

434	   o  Outer fragmentation: source fragmentation of the tunnel link
435	      packet after encapsulation; this can involve fragmenting the
436	      outermost header or any of the other (if any) protocol layers
437	      involved in encapsulation.

439	   o  Maximum frame size (MFS): the link-layer equivalent of the MTU,
440	      using the OSI term 'frame'. For Ethernet, the MTU (network packet
441	      size) is 1500 bytes but the MFS (link frame size) is 1518 bytes
442	      originally, and 1522 bytes assuming VLAN (802.1Q) tagging support.

444	   o  EMFS_S: the link layer equivalent of EMTU_S.

446	   o  EMFS_R: the link layer equivalent of EMTU_R.

448	   o  Path MFS: the link layer equivalent of PMTU.
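   Under simple assumptions, the tunnel MAP and MFS definitions reduce to arithmetic. The Python sketch below assumes fixed-size headers with no IPv4 options, IPv6 extension headers, or tunnel shim headers. It also assumes a 1500-byte path MTU between ingress and egress. Under those assumptions the MAP is that path MTU minus the encapsulation overhead. The sketch also reproduces the Ethernet MFS figures given above: a 14-byte header plus a 4-byte FCS, plus 4 bytes for an 802.1Q tag. The tunnel MTU can exceed the MAP when tunnel link packets can be fragmented and then reassembled at the egress.

```python
# Header sizes in bytes (fixed headers only; no options or extension headers).
IPV4_HDR = 20
IPV6_HDR = 40
UDP_HDR = 8

ETH_HDR = 14     # destination MAC, source MAC, EtherType
ETH_FCS = 4      # frame check sequence trailer
VLAN_TAG = 4     # IEEE 802.1Q tag


def tunnel_map(outer_path_mtu: int, encap_overhead: int) -> int:
    """Largest transit packet that fits in one tunnel link packet without fragmentation."""
    return outer_path_mtu - encap_overhead


def ethernet_mfs(mtu: int = 1500, vlan_tagged: bool = False) -> int:
    """Frame size = MTU plus Ethernet header and FCS (and an optional 802.1Q tag)."""
    return mtu + ETH_HDR + ETH_FCS + (VLAN_TAG if vlan_tagged else 0)


print(ethernet_mfs())                          # 1518
print(ethernet_mfs(vlan_tagged=True))          # 1522
print(tunnel_map(1500, IPV4_HDR))              # 1480: IPv4 in IPv4 (Figure 1)
print(tunnel_map(1500, IPV4_HDR + UDP_HDR))    # 1472: IP in UDP in IPv4 (Figure 2)
print(tunnel_map(1500, IPV6_HDR))              # 1460: IP in IPv6
```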

450	3. The Tunnel Model

452	   A network architecture is an abstract description of a distributed
453	   communications system, its components and their relationships, the
454	   requisite properties of those components and the emergent properties
455	   of the system that result [To03]. Such descriptions can help explain
456	   behavior, as when the OSI seven-layer model is used as a teaching
457	   example [Zi80]. Architectures describe capabilities - and, just as
458	   importantly, constraints.

460	   A network can be defined as a system of endpoints and relays
461	   interconnected by communication paths, abstracting away issues of
462	   naming in order to focus on message forwarding. To the extent that
463	   the Internet has a single, coherent interpretation, its architecture
464	   is defined by its core protocols (IP [RFC791], TCP [RFC793], UDP
465	   [RFC768]) whose messages are handled by hosts, routers, and links
466	   [Cl88][To03], as shown in Figure 3:

468	               +------+    ------      ------    +------+
469	               |      |   /      \    /      \   |      |
470	               | HOST |--+ ROUTER +--+ ROUTER +--| HOST |
471	               |      |   \      /    \      /   |      |
472	               +------+    ------      ------    +------+

474	                   Figure 3 Basic Internet architecture

476	   As a network architecture, the Internet is a system of hosts
477	   (endpoints) and routers (relays) interconnected by links that
478	   exchange messages when possible. "When possible" defines the
479	   Internet's "best effort" principle. The limited role of routers and
480	   links represents the End-to-End Principle [Sa84], and longest-prefix
481	   match enables hierarchical forwarding using compact tables.
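   Longest-prefix match can be illustrated with a short Python sketch using the standard ipaddress module. A forwarding table holds prefixes rather than individual host addresses, and each destination is forwarded using the most specific matching prefix. The table contents and next-hop names are hypothetical. A linear scan stands in for the trie or TCAM lookups that production routers typically use.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next hop (documentation addresses).
TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gw",
    ipaddress.ip_network("198.51.100.0/24"): "router-A",
    ipaddress.ip_network("198.51.100.128/25"): "router-B",
}


def lookup(dst: str) -> str:
    """Return the next hop for the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in TABLE if addr in net]  # default route always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return TABLE[best]


print(lookup("198.51.100.200"))  # router-B
print(lookup("198.51.100.5"))    # router-A
print(lookup("192.0.2.9"))       # default-gw
```

   Three entries cover every IPv4 destination, which is the compactness that hierarchical forwarding relies on.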

483	   Although the definitions of host, router, and link seem absolute,
484	   they are often relative as viewed within the context of one protocol
485	   layer, each of which can be considered a distinct network
486	   architecture. An Internet gateway is an OSI Layer 3 router when it
487	   transits IP datagrams but it acts as an OSI Layer 2 host as it
488	   sources or sinks Layer 2 messages on attached links to accomplish
489	   this transit capability. In this way, one device (Internet gateway)
490	   behaves as different components (router, host) at different layers.

492	   Even though a single device may have multiple roles - even
493	   concurrently - at a given layer, each role is typically static and
494	   determined by context. An Internet gateway always acts as a Layer 2
495	   host and that behavior does not depend on where the gateway is viewed
496	   from within Layer 2. In the context of a single layer, a device's
497	   behavior is typically modeled as a single component from all
498	   viewpoints in that layer (with some notable exceptions, e.g., Network
499	   Address Translators, which appear as hosts and routers, depending on
500	   the direction of the viewpoint [To16]).

502	3.1. What is a Tunnel?

504	   A tunnel can be modeled as a link in another network
505	   [To98][To01][To03]. In Figure 4, a source host (Hsrc) and destination
506	   host (Hdst) communicate over a network M in which two routers (Ra
507	   and Rd) are connected by a tunnel. Keep in mind that it is possible
508	   that network N and network M are both components of the
509	   Internet, i.e., there may be regular traffic as well as tunneled
510	   traffic over any of the routers shown.

512	                     --                          --
513	         +------+   /  \                        /  \   +------+
514	         | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
515	         +------+   \  //\    /  \    /  \    /\\  /   +------+
516	                     --/I \--+ Rb +--+ Rc +--/E \--
517	                       \  /   \  /    \  /   \  /
518	                        \/     --      --     \/
519	                       <------ Network N ------->
520	         <-------------------- Network M --------------------->

522	                         Figure 4 The big picture

   The tunnel consists of two interfaces - an ingress (I) and an egress
   (E) that lie along a path connected by network N. Regardless of how
   the ingress and egress interfaces are connected, the tunnel serves as
   a link between the nodes it connects (here, Ra and Rd).

   IP packets arriving at the ingress interface are encapsulated to
   traverse network N. We call these packets 'tunnel transit packets'
   (or just 'transit packets') because they will transit the tunnel
   inside one or more of what we call 'tunnel link packets' (or just
   'link packets'). Transit packets correspond to network (IP) packets
   traversing a conventional link, and tunnel link packets correspond
   to the packets of a conventional link layer.

   Link packets use the source address of the ingress interface and the
   destination address of the egress interface - using whatever address
   is appropriate to the layer at which the ingress and egress
   interfaces operate (Layer 2, Layer 3, Layer 4, etc.). The egress
   interface decapsulates those link packets, and the resulting transit
   packets continue on network M as if emerging from a link. To transit
   packets and to the routers the tunnel connects (Ra and Rd), the
   tunnel acts as a link and the ingress and egress interfaces act as
   network interfaces to that link.
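   As an illustration only, the following Python sketch models this
   relationship. It uses a simplified packet representation, not a
   wire format, and the class names, function names, and symbolic
   addresses (taken from Figure 4) are hypothetical. Encapsulation
   wraps a transit packet in a link packet addressed from ingress to
   egress. Decapsulation returns the transit packet unchanged, hop
   count included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitPacket:
    """An IP packet of network M (only the fields this sketch needs)."""
    src: str        # network M source address
    dst: str        # network M destination address
    hop_count: int  # IPv4 TTL or IPv6 Hop Limit
    payload: bytes


@dataclass(frozen=True)
class LinkPacket:
    """A tunnel link packet of network N carrying one transit packet."""
    src: str              # address of the ingress interface in network N
    dst: str              # address of the egress interface in network N
    inner: TransitPacket  # the encapsulated transit packet


def encapsulate(pkt: TransitPacket, ingress: str, egress: str) -> LinkPacket:
    # Outer addresses come from the tunnel endpoints, never from pkt;
    # pkt itself, including its hop count, is carried unchanged.
    return LinkPacket(src=ingress, dst=egress, inner=pkt)


def decapsulate(link_pkt: LinkPacket) -> TransitPacket:
    # The transit packet re-enters network M exactly as it left,
    # as if it had crossed a conventional link.
    return link_pkt.inner


if __name__ == "__main__":
    p = TransitPacket(src="Hsrc", dst="Hdst", hop_count=0, payload=b"hi")
    lp = encapsulate(p, ingress="I", egress="E")
    print(lp.src, lp.dst, decapsulate(lp) == p)  # prints: I E True
```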

   The model of each component (ingress and egress interfaces) and the
   entire system (tunnel) depends on the layer from which they are
   viewed. From the perspective of the outermost hosts (Hsrc and Hdst),
   the tunnel appears as a link between two routers (Ra and Rd). For
   routers along the tunnel (e.g., Rb and Rc), the ingress and egress
   interfaces appear as the endpoint hosts on network N.

   When the tunnel network (N) is implemented using the same protocol as
   the endpoint network (M), the picture looks flatter (Figure 5), as if
   it were running over a single network. However, this appearance is
   incorrect - nothing has changed from the previous case. From the
   perspective of the endpoints, Rb and Rc and network N don't exist and
   aren't visible, and from the perspective of the tunnel, network M
   doesn't exist. The fact that networks N and M use the same protocol,
   and may traverse the same links, is irrelevant.

                   --          --      --          --
       +------+   /  \  /\    /  \    /  \    /\  /  \   +------+
       | Hsrc |--+ Ra +/I \--+ Rb +--+ Rc +--/E \+ Rd +--| Hdst |
       +------+   \  / \  /   \  /    \  /   \  / \  /   +------+
                   --   \/     --      --     \/   --
                         <---- Network N ----->
           <------------------ Network M ------------------->

                     Figure 5 IP in IP network picture

3.2. View from the Outside

   As already observed, from outside the tunnel (i.e., to network M),
   the entire tunnel acts as a link (Figure 6). Consequently, all
   requirements for links supporting IP also apply to tunnels
   [RFC3819].

                   --                              --
       +------+   /  \                            /  \   +------+
       | Hsrc |--+ Ra +--------------------------+ Rd +--| Hdst |
       +------+   \  /                            \  /   +------+
                   --                              --
           <------------------ Network M ------------------->

                Figure 6 Tunnels as viewed from the outside

   For example, the IP datagram hop counts (IPv4 Time-to-Live [RFC791]
   and IPv6 Hop Limit [RFC2460]) are decremented when traversing a
   router, but not when traversing a link - or thus a tunnel. Similarly,
   because the ingress and egress are interfaces on this outer network,
   they should never issue ICMP messages. A router or host would issue
   the appropriate ICMP, e.g., "packet too big" (IPv4 fragmentation
   needed and DF set [RFC792] or IPv6 packet too big [RFC4443]), when
   trying to send a packet to the egress, as it would for any interface.

   Tunnels have a tunnel MTU - the largest message that can transit that
   tunnel, just as links have a link MTU. Just as a link MTU may not
   reflect the native message size of the hops within a multihop link,
   a tunnel MTU may not reflect the native message size of the hops
   within the tunnel. In both cases, the MTU is defined by the link's
   (or tunnel's) effective MTU to receive (EMTU_R).

3.3. View from the Inside

   Within network N, i.e., from inside the tunnel itself, the ingress
   interface is a source of tunnel link packets and the egress interface
   is a sink - so both are viewed as hosts on network N (Figure 7).
   Consequently, the Internet host requirements [RFC1122] apply to
   ingress and egress interfaces when network N uses IP (and thus the
   ingress/egress interfaces use IP encapsulation).

                               --      --
                        /\    /  \    /  \    /\
                       /I \--+ Rb +--+ Rc +--/E \
                       \  /   \  /    \  /   \  /
                        \/     --      --     \/
                         <---- Network N ----->

            Figure 7 Tunnels, as viewed from within the tunnel

   Viewed from within the tunnel, the outer network (M) doesn't exist.
   Tunnel link packets can be fragmented by the source (ingress
   interface) and reassembled at the destination (egress interface),
   just as at conventional hosts. The path between ingress and egress
   interfaces has a path MTU, but the endpoints can exchange messages as
   large as can be reassembled at the destination (egress interface),
   i.e., the EMTU_R of the egress interface. However, in both cases,
   these MTUs refer to the size of messages that can transit the links
   of network N and travel between its hosts; network N serves as a
   link layer for network M. That is, the MTUs of network N represent
   the maximum frame sizes (MFSs) of the tunnel as a link in network M.

   Information about the network - i.e., regarding network N MTU sizes,
   network reachability, etc. - is relayed from the destination (egress
   interface) and intermediate routers back to the source (ingress
   interface), without regard for the external network (M). When such
   messages arrive at the ingress interface, they may affect the
   properties of that interface (e.g., its reported MTU to network M),
   but they should never directly cause new ICMPs in the outer network
   M. Again, events at interfaces don't generate ICMP messages; it is
   the host or router to which that interface is attached that would
   generate ICMPs, e.g., upon attempting to use that interface.
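   The following sketch illustrates this division of labor under stated
   assumptions. The TunnelIngress class and router_forward() function
   are hypothetical. Network N feedback is reduced to a single reported
   MTU value, and the encapsulation overhead is a fixed byte count.
   Feedback updates only the ingress's view of the tunnel path MTU. The
   decision to emit an ICMP message into network M belongs to the
   attached router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelIngress:
    """Hypothetical ingress interface state (not a real API)."""
    tunnel_mtu: int      # EMTU_R of the egress, reported to network M
    path_mtu: int        # current network N path MTU toward the egress
    encap_overhead: int  # bytes added by encapsulation, e.g., 20 for IPv4

    def on_network_n_feedback(self, reported_mtu: int) -> None:
        # Feedback from inside the tunnel (e.g., an ICMP error sent by Rb
        # or Rc to the ingress) only updates interface state. The
        # interface itself never emits an ICMP message into network M.
        self.path_mtu = min(self.path_mtu, reported_mtu)

    def needs_outer_fragmentation(self, transit_len: int) -> bool:
        # Link packets larger than the network N path MTU are fragmented
        # at the outer header and reassembled by the egress.
        return transit_len + self.encap_overhead > self.path_mtu


def router_forward(ingress: TunnelIngress, transit_len: int) -> Optional[str]:
    """The attached router, not the interface, decides whether to send an
    ICMP error when a transit packet cannot use the tunnel."""
    if transit_len > ingress.tunnel_mtu:
        return "ICMP packet too big"
    return None
```

   In this sketch, a transit packet larger than the tunnel path MTU but
   no larger than the tunnel MTU is still accepted and carried using
   outer fragmentation.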

3.4. Location of the Ingress and Egress

   The ingress and egress interfaces are endpoints of the tunnel. Tunnel
   interfaces may be physical or virtual. The interface may be
   implemented inside the node where the tunnel attaches, e.g., inside a
   host or router. The interface may also be implemented as a "bump in
   the wire" (BITW), somewhere along a link between the two nodes the
   link interconnects. IP in IP tunnels are often implemented as
   interfaces on nodes, whereas IPsec tunnels are sometimes implemented
   as BITW. These implementation variations determine only whether
   information available at the link endpoints (ingress/egress
   interfaces) can be easily shared with the connected network nodes.

   An ingress or egress can be implemented as an integrated component,
   appearing equivalent to any other network interface, or can be more
   complex. In the simple variant, each is tightly coupled to another
   network interface, e.g., where the ingress emits encapsulated packets
   directly into another network interface, or where the egress receives
   packets to decapsulate directly from another network interface.

   The other implementation variant is more modular, but more complex to
   explain. The ingress acts like a network interface by receiving IP
   packets to transmit from an upper layer protocol (or relay mechanism
   of a router), but then acts like an upper layer protocol (or relay
   mechanism of a router) when it emits encapsulated packets back into
   the same node. The egress acts like an upper layer protocol (or
   relay mechanism of a router) by receiving packets from a network
   interface, but then acts like a network interface when it emits
   decapsulated packets back into the same node. To the existing
   network interfaces, the ingress/egress act like upper layer
   protocols (i.e., sending or receiving application stacks), while to
   the interior of the node, the ingress/egress act like network
   interfaces. This dual nature inside the node reflects the duality of
   the tunnel as transit link and host-host channel.

3.5. Implications of This Model

   This approach highlights a few key features of a tunnel as a network
   architecture construct:

   o  To the transit packets, tunnels turn a network (Layer 3) path into
      a (Layer 2) link.

   o  To nodes the tunnel traverses, the tunnel ingress and egress
      interfaces act as hosts that source and sink tunnel link packets.

   The consequences of these features are as follows:

   o  Like a link MTU, a tunnel MTU is defined by the effective MTU of
      the receiver (i.e., EMTU_R of the egress).

   o  The messages inside the tunnel are treated like any other link
      layer, i.e., the MTU is determined by the largest (transit)
      payload that traverses the link.

   o  The tunnel path MFS is not relevant to the transited traffic.
      There is no mechanism or protocol by which it can be determined.

   o  Because routers, not links, alter hop counts [RFC1812], hop counts
      are not decremented solely by the transit of a tunnel. A packet
      with a hop count of zero should successfully transit a link (and
      thus a tunnel) that connects two hosts.

   o  To the transit packet, the addresses of the tunnel ingress and
      egress interfaces correspond to link layer addresses. Like links,
      some tunnels may not have their own addresses. Like network
      interfaces, ingress and egress interfaces typically require
      network layer addresses.

   o  Like network interfaces, the ingress and egress interfaces are
      never a direct source of ICMP messages but may provide information
      to their attached host or router to generate those ICMP messages
      during the processing of transit packets.

   o  Like network interfaces and links, two nodes may be connected by
      any combination of tunnels and links, including multiple tunnels.
      As with multiple links, existing network layer forwarding
      determines which IP traffic uses each link or tunnel.

   These observations make it much easier to determine what a tunnel
   must do to transit IP packets; notably, it must satisfy all
   requirements expected of a link [RFC1122][RFC3819]. The remainder of
   this document explores these implications in greater detail.

3.6. Fragmentation

   There are two places where fragmentation can occur in a tunnel,
   called 'outer fragmentation' and 'inner fragmentation'. This document
   assumes that only outer fragmentation is viable because it is the
   only approach that works for both IPv4 datagrams with DF=1 and IPv6.

3.6.1. Outer Fragmentation

   Outer fragmentation is shown in Figure 8. The bottom of the figure
   shows the network topology, where transit packets originate at the
   source, enter the tunnel at the ingress interface for encapsulation,
   exit the tunnel at the egress interface where they are decapsulated,
   and arrive at the destination. The packet traffic is shown above the
   topology, where the transit packets are shown at the top. In this
   diagram, the ingress interface is located on router 'Ra' and the
   egress interface is located on router 'Rd'.

   When the link packet - which is the encapsulated transit packet -
   would exceed the tunnel path MTU, the packet needs to be fragmented.
   In this case the packet is fragmented at the outer (link) header,
   with the fragments shown as (b1) and (b2). The outer header indicates
   fragmentation (as ' and "), the inner (transit) header occurs only in
   the first fragment, and the inner (transit) data is broken across the
   two packets. The egress interface reassembles these fragments in step
   (c) and decapsulates the resulting link packet (d), so that the
   transit packet can continue on its way to the destination.

    Transit packet
    +----+----+                                              +----+----+
    | iH | iD |------+ -  -  -  -  -  -  -  -  -  -  +------>| iH | iD |
    +----+----+      |                               |       +----+----+
                     v Link packet                   |
              +----+----+----+               +----+----+----+
          (a) | oH | iH | iD |               | oH | iH | iD | (d)
              +----+----+----+               +----+----+----+
                     |                               ^
                     |    Link packet fragment #1    |
                     |       +----+----+-----+       |
                (b1) +------>| oH'| iH | iD1 |-------+ (c)
                     |       +----+----+-----+       |
                     |                               |
                     |    Link packet fragment #2    |
                     |       +----+-----+            |
                (b2) +------>| oH"| iD2 |------------+
                             +----+-----+
   +-----+    +--+ +---+                           +---+ +--+    +-----+
   |     |    |  |/     \                         /     \|  |    |     |
   | Src |----|Ra|Ingress|=======================|Egress |Rd|----| Dst |
   |     |    |  |\     /                         \     /|  |    |     |
   +-----+    +--+ +---+                           +---+ +--+    +-----+

             Figure 8 Fragmentation of the (outer) link packet

   Outer fragmentation confines the tunnel encapsulation duties to the
   ingress and egress interfaces. This can be considered a benefit in
   clean, layered network design, but it may also require complex
   egress interface decapsulation, especially where tunnels aggregate
   large amounts of traffic, which can result in IP ID overload (see
   Sec. 4.1.4). Outer fragmentation is valid for any tunnel link
   protocol that supports fragmentation (e.g., IPv4 or IPv6), in which
   the tunnel endpoints act as the host endpoints of that protocol.

   Along the tunnel, the inner (transit) header is contained only in the
   first fragment, which can interfere with mechanisms that 'peek' into
   lower layer headers, e.g., as for relayed ICMP (see Sec. 4.3).
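   A minimal sketch of outer fragmentation and egress reassembly
   follows. It assumes a 20-byte IPv4 outer header without options,
   records offsets in bytes rather than the 8-byte units carried on the
   wire, and omits reassembly timers. The names are hypothetical. Only
   the first fragment carries the inner (transit) header, which is what
   hinders 'peeking' along the tunnel.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OuterFragment:
    ident: int    # outer (link) header ID chosen by the ingress
    offset: int   # byte offset of this piece within the link payload
    more: bool    # "more fragments" flag in the outer header
    data: bytes   # slice of (inner header + inner data)


def outer_fragment(link_payload: bytes, ident: int, path_mtu: int,
                   outer_hdr_len: int = 20) -> List[OuterFragment]:
    """Fragment an encapsulated transit packet at the outer header.
    Only the first fragment carries the inner (transit) header."""
    # Non-final fragment payloads must be multiples of 8 bytes.
    max_data = (path_mtu - outer_hdr_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("path MTU too small for the outer header")
    frags = []
    for off in range(0, len(link_payload), max_data):
        chunk = link_payload[off:off + max_data]
        more = off + max_data < len(link_payload)
        frags.append(OuterFragment(ident, off, more, chunk))
    return frags


def egress_reassemble(frags: List[OuterFragment]) -> bytes:
    """Reassemble one link packet at the egress, before decapsulation."""
    ordered = sorted(frags, key=lambda f: f.offset)
    if len({f.ident for f in ordered}) != 1 or ordered[-1].more:
        raise ValueError("mixed or incomplete fragment set")
    buf = bytearray()
    for f in ordered:
        if f.offset != len(buf):
            raise ValueError("gap or overlap between fragments")
        buf += f.data
    return bytes(buf)
```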

3.6.2. Inner Fragmentation

   Inner fragmentation distributes the impact of tunnel fragmentation
   across both the egress interface and the transit packet destination,
   as shown in Figure 9; this can be especially important when the
   tunnel would otherwise need to source (outer) fragment large amounts
   of traffic. However, this mechanism is valid only when the transit
   packets can be fragmented on-path, e.g., when the transit packets
   are IPv4 datagrams with DF=0.

   Again, the network topology is shown at the bottom of the figure, and
   the original packets are shown at the top. Packets arrive at the
   ingress node (router Ra) and are fragmented there into transit packet
   fragments #1 (a1) and #2 (a2). These fragments are encapsulated at
   the ingress interface in steps (b1) and (b2), and each resulting link
   packet traverses the tunnel. When these link packets arrive at the
   egress interface, they are decapsulated in steps (c1) and (c2), and
   the egress node (router Rd) forwards the transit packet fragments to
   their destination. This destination is then responsible for
   reassembling the transit packet fragments into the original transit
   packet (d).

   Along the tunnel, the inner headers are copied into each fragment,
   and so can be 'peeked at' inside the tunnel (see Sec. 4.3).
   Fragmentation shifts from the ingress interface to the ingress router
   and reassembly shifts from the egress interface to the destination.

    Transit packet
   +----+----+                                               +----+----+
   | iH | iD |-+ - - - - -  -  -  -  -  -  -  -  -  -  -  - >| iH | iD |
   +----+----+ |                                             +----+----+
               v Transit packet fragment #1                         ^
            +----+-----+                           +----+-----+     |
       (a1) | iH'| iD1 |                           | iH'| iD1 |-----+(d)
            +----+-----+                           +----+-----+     ^
               |     |        Link packet #1         ^              |
               |     |       +----+----+-----+       |              |
               | (b1)+------>| oH | iH'| iD1 |-------+(c1)          |
               |             +----+----+-----+                      |
               |                                                    |
               v Transit packet fragment #2                         |
            +----+-----+                           +----+-----+     |
       (a2) | iH"| iD2 |                           | iH"| iD2 |-----+
            +----+-----+                           +----+-----+
                     |        Link packet #2         |
                     |       +----+----+-----+       |
                 (b2)+------>| oH | iH"| iD2 |-------+(c2)
                             +----+----+-----+
   +-----+    +--+ +---+                           +---+ +--+    +-----+
   |     |    |  |/     \                         /     \|  |    |     |
   | Src |----|Ra|Ingress|=======================|Egress |Rd|----| Dst |
   |     |    |  |\     /                         \     /|  |    |     |
   +-----+    +--+ +---+                           +---+ +--+    +-----+

           Figure 9 Fragmentation of the inner (transit) packet
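   For comparison, the sketch below fragments the transit packet at the
   ingress router instead. It assumes an unfragmented IPv4 transit
   packet with 20-byte inner and outer headers, models header fields as
   a dictionary, and uses hypothetical names throughout. Each fragment
   carries its own copy of the inner header, and reassembly happens at
   the transit destination rather than at the egress.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TransitFragment:
    header: Dict[str, object]  # copy of the inner (transit) header fields
    offset: int                # byte offset of this piece in the transit data
    more: bool                 # inner "more fragments" flag
    data: bytes


def inner_fragment(header: Dict[str, object], data: bytes,
                   tunnel_path_mtu: int, inner_hdr_len: int = 20,
                   outer_hdr_len: int = 20) -> List[TransitFragment]:
    """Fragment an unfragmented IPv4 transit packet at the ingress router
    so that each encapsulated fragment fits the tunnel path MTU."""
    if header.get("df"):
        # DF=1 (and all of IPv6) forbids on-path fragmentation.
        raise ValueError("inner fragmentation not permitted")
    max_data = (tunnel_path_mtu - outer_hdr_len - inner_hdr_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("tunnel path MTU too small")
    return [TransitFragment(dict(header), off, off + max_data < len(data),
                            data[off:off + max_data])
            for off in range(0, len(data), max_data)]


def destination_reassemble(frags: List[TransitFragment]) -> bytes:
    """The transit destination, not the egress, rebuilds the data."""
    return b"".join(f.data for f in sorted(frags, key=lambda f: f.offset))
```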

3.6.3. The Necessity of Outer Fragmentation

   Fragmentation is critical for tunnels that support transit packets
   for protocols with minimum MTU requirements, while operating over
   tunnel paths using protocols that have their own MTU requirements.
   Depending on the amount of space used by encapsulation, these two
   minimums will ultimately interfere (especially when a protocol
   transits itself either directly, as with IP-in-IP, or indirectly, as
   in IP-in-GRE-in-IP), and the transit packet will need to be
   fragmented in order to support the tunnel MTU while traversing
   tunnel paths that have their own, possibly smaller, path MTUs.
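   A short sketch shows the arithmetic behind this interference. It
   assumes IPv6 in IPv6 with a fixed 40-byte header and no extension
   headers, and a base GRE header of 4 bytes [RFC2784] with no optional
   fields. The function names are hypothetical. A minimum-size
   (1280-byte) IPv6 transit packet grows beyond the 1280 bytes that the
   tunnel path is guaranteed to support, so the link packet must be
   outer fragmented.

```python
IPV6_MIN_MTU = 1280  # minimum link MTU required by IPv6
IPV6_HEADER = 40     # fixed IPv6 header length
GRE_HEADER = 4       # base GRE header without optional fields


def link_packet_size(transit_len: int, *overheads: int) -> int:
    """Size of the tunnel link packet after all encapsulation layers."""
    return transit_len + sum(overheads)


def needs_fragmentation(transit_len: int, tunnel_path_mtu: int,
                        *overheads: int) -> bool:
    return link_packet_size(transit_len, *overheads) > tunnel_path_mtu


# IPv6-in-IPv6: a minimum-size transit packet over a minimum-MTU path.
print(link_packet_size(IPV6_MIN_MTU, IPV6_HEADER))              # 1320
# IPv6-in-GRE-in-IPv6 adds the GRE header as well.
print(link_packet_size(IPV6_MIN_MTU, GRE_HEADER, IPV6_HEADER))  # 1324
```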

   Outer fragmentation is the only solution that supports all IPv4 and
   IPv6 traffic, because inner fragmentation is allowed only for IPv4
   datagrams with DF=0.

4. IP Tunnel Requirements

   The requirements of an IP tunnel are defined by the requirements of
   an IP link because both transit IP packets. A tunnel thus must
   transit the IP minimum MTU, i.e., 68 bytes for IPv4 [RFC791] and
   1280 bytes for IPv6 [RFC2460], and a tunnel must support address
   resolution when there is more than one egress interface for that
   tunnel.

   The requirements of the tunnel ingress and egress interfaces are
   defined by the network over which they exchange messages (link
   packets). For IP-over-IP, this means that the ingress interface MUST
   NOT violate the IP fragment identification field uniqueness
   requirements [RFC6864]. Uniqueness is more difficult to maintain at
   high packet rates for IPv4, whose fragment ID field is only 16 bits.

   These requirements remain even though tunnels have some unique
   issues, including the need for additional space for encapsulation
   headers and the potential for tunnel MTU variation.

4.1. Encapsulation Header Issues

   Tunnel encapsulation uses a non-link protocol as a link layer. The
   encapsulation layer thus has the same requirements and expectations
   as any other IP link layer when used to transit IP packets. These
   relationships are addressed in the following subsections.

4.1.1. General Principles of Header Field Relationships

   Some tunnel specifications attempt to relate the header fields of the
   transit packet and tunnel link packet. In some cases, this
   relationship is warranted, whereas in other cases the two protocol
   layers need to be isolated from each other. For example, the tunnel
   link header source and destination addresses are network endpoints in
   the tunnel network N, but have no meaning in the outer network M. The
   two sets of addresses are effectively independent, just as are other
   network and link addresses.

   Because the tunneled packet uses source and destination addresses
   with a separate meaning, it is inappropriate to copy or reuse the
   IPv4 Identification (ID) or IPv6 Fragment ID fields of the tunnel
   transit packet (see Section 4.1.4). Similarly, the DF bit of the
   transit packet is not related to the DF bit in the tunnel link
   packet header (presuming both are IPv4) (see Section 4.2). Most other
   fields are similarly independent between the transit packet and
   tunnel link packet. When a field value is generated in the
   encapsulation header, its meaning should be derived from what is
   desired in the context of the tunnel as a link. When feedback is
   received from these fields, it should be presented to the tunnel
   ingress and egress as if they were network interfaces. The behavior
   of the node where these interfaces attach should be identical to that
   of a conventional link.

   There are exceptions to this rule that are explicitly intended to
   relay signals from inside the tunnel to the network outside the
   tunnel, typically relevant only when the tunnel network N and the
   outer network M use the same protocol. These apply only when that
   coordination is defined, as with explicit congestion notification
   (ECN) [RFC6040] (see Section 4.3.2) and differentiated services code
   points (DSCPs) [RFC2983]. Equal-cost multipath routing may also
   affect how some encapsulation fields are set, including IPv6 flow
   labels [RFC6438] and source ports for transport protocols when used
   for tunnel encapsulation [RFC8085] (see Section 4.3.4).
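   To illustrate deriving outer header fields from the tunnel's own
   context, the following sketch builds an IPv4-in-IPv4 (protocol 4)
   outer header. It assumes an IPv4 transit packet, a 20-byte outer
   header without options, and an outer TTL of 64. The copy_dscp switch
   and all function names are hypothetical. copy_dscp selects whether
   to copy the DSCP, one of the exceptions above; otherwise DSCP 0 is
   used, which is also an assumption. The ECN field is copied, as in
   the 'normal mode' ingress behavior of [RFC6040]. The ID, DF bit, and
   TTL come from the ingress, never from the transit packet. The header
   checksum uses the RFC 1071 algorithm.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum over an even-length IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_outer_ipv4(inner: bytes, ingress: str, egress: str, ident: int,
                     outer_df: bool, outer_ttl: int = 64,
                     copy_dscp: bool = False) -> bytes:
    """Prepend a 20-byte IPv4 outer header (protocol 4) to an IPv4
    transit packet. Only DSCP (optionally) and ECN come from the inner
    header; every other field is the tunnel's own choice."""
    inner_tos = inner[1]
    dscp = inner_tos >> 2 if copy_dscp else 0   # hypothetical policy knob
    ecn = inner_tos & 0x03                      # copied (RFC 6040 normal mode)
    flags_frag = 0x4000 if outer_df else 0      # outer DF: tunnel policy
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                     # version 4, IHL 5 (no options)
        (dscp << 2) | ecn,        # DSCP + ECN
        20 + len(inner),          # total length
        ident & 0xFFFF,           # ID from the ingress's own counter
        flags_frag,               # flags, fragment offset 0
        outer_ttl,                # hop count for network N only
        4,                        # protocol 4: IPv4 encapsulated in IPv4
        0,                        # checksum placeholder
        socket.inet_aton(ingress),
        socket.inet_aton(egress),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:] + inner
```

   For example, build_outer_ipv4(inner, "192.0.2.1", "198.51.100.2",
   ident=1, outer_df=False) returns a 20-byte outer header followed by
   the unmodified transit packet.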

4.1.2. Addressing Fields

   Tunnel ingresses and egresses have addresses associated with the
   encapsulation protocol. These addresses are the source and
   destination (respectively) of the encapsulated packet while
   traversing the tunnel network.

   Tunnels may or may not have addresses in the network whose traffic
   they transit (e.g., network M in Figure 4). In some cases, the tunnel
   is an unnumbered interface to a point-to-point virtual link. When the
   tunnel has multiple egresses, tunnel interfaces require separate
   addresses in network M.

   To see the effect of tunnel interface addresses, consider traffic
   sourced at router Ra in Figure 4. Even before being encapsulated by
   the ingress, traffic needs a source IP network address that belongs
   to the router. One option is to use an address associated with one of
   the other interfaces of the router [RFC1122]. Another option is to
   assign a number to the tunnel interface itself. Regardless of which
   address is used, the resulting IP packet is then encapsulated by the
   tunnel ingress using the ingress address as a separate operation.

4.1.3. Hop Count Fields

   The Internet hop count field is used to detect and avoid forwarding
   loops that cannot be corrected without a synchronized reboot. The
   IPv4 Time-to-Live (TTL) and IPv6 Hop Limit fields each serve this
   purpose [RFC791][RFC2460]. The IPv4 TTL field was originally intended
   to indicate packet expiration time, measured in seconds. A router is
   required to decrement the TTL by at least one or the number of
   seconds the packet is delayed, whichever is larger [RFC1812]. Packets
   are rarely held that long, and so the field has come to represent the
   count of the number of routers traversed. IPv6 makes this meaning
   more explicit.

   These hop count fields represent the number of network forwarding
   elements (routers) traversed by an IP datagram. An IP datagram with a
   hop count of zero can traverse a link between two hosts because it
   never visits a router (where it would need to be decremented and
   would have been dropped).

   An IP datagram traversing a tunnel thus need not have its hop count
   modified, i.e., the tunnel transit header need not be affected. A
   zero hop count datagram should be able to traverse a tunnel as easily
   as it traverses a link. A router MAY be configured to decrement the
   hop count of packets traversing a particular link (and thus a
   tunnel), which may be useful in emulating a tunnel path as if it were
   a network path that traversed one or more routers, but this is
   strictly optional. The ability of the outer network M and tunnel
   network N to avoid indefinitely looping packets does not rely on the
   hop counts of the transit packet and tunnel link packet being
   related.

   The hop count field is also used by several protocols to determine
   whether endpoints are 'local', i.e., connected to the same subnet
   (e.g., Neighbor Discovery and related link-local protocols
   [RFC4861]). A tunnel is a way to make a remote network address
   appear directly connected, so it makes sense that the other ends of
   the tunnel appear local and that such link-local protocols operate
   over tunnels unless explicitly configured otherwise. When the
   interfaces of a tunnel are numbered, these can be interpreted the
   same way as if they were on the same link subnet.
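   A small sketch of the consequence for link-local protocols follows.
   It assumes the IPv6 Neighbor Discovery rule that received messages
   carry a Hop Limit of 255 [RFC4861]. The function names, and the
   decrement option that models the optional (MAY) behavior above, are
   hypothetical. An unmodified tunnel preserves the hop limit, so
   Neighbor Discovery works across it. A tunnel configured to decrement
   the hop limit does not.

```python
ND_HOP_LIMIT = 255  # received Neighbor Discovery messages must carry 255


def cross_link(hop_limit: int, decrement: bool = False) -> int:
    """Hop limit after crossing a link or tunnel. Links, and thus tunnels,
    leave it unchanged unless a router is configured to decrement it."""
    if not decrement:
        return hop_limit
    if hop_limit == 0:
        raise ValueError("cannot decrement a zero hop count")
    return hop_limit - 1


def nd_accepts(received_hop_limit: int) -> bool:
    # Any value below 255 means some node decremented it along the way.
    return received_hop_limit == ND_HOP_LIMIT
```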

4.1.4. IP Fragment Identification Fields

   Both IPv4 and IPv6 include an IP Identification (ID) field to support
   IP datagram fragmentation and reassembly [RFC791][RFC1122][RFC2460].
   When used, the ID field is intended to be unique for every packet for
   a given source address, destination address, and protocol, such that
   it does not repeat within the Maximum Segment Lifetime (MSL).

   For IPv4, this field is in the default header and is meaningful only
   when either source fragmented or DF=0 ("non-atomic packets")
   [RFC6864]. For IPv6, this field is contained in the optional Fragment
   Header [RFC2460]. Although IPv6 supports only source fragmentation,
   the field may occur in atomic fragments [RFC6946].

   Although the ID field was originally intended for fragmentation and
   reassembly, it can also be used to detect and discard duplicate
   packets, e.g., at congested routers (see Sec. 3.2.1.5 of [RFC1122]).

   For this reason, and because IPv4 packets can be fragmented anywhere
   along a path, all non-atomic IPv4 packets and all IPv6 packets
   between a source and destination of a given protocol must have unique
   ID values over the potential fragment reordering period
   [RFC2460][RFC6864].

   The uniqueness of the IP ID is a known problem for high speed nodes,
   because it limits the speed of a single protocol between two
   endpoints [RFC4963]. Although [RFC4963] suggests that the uniqueness
   of the IP ID is moot, tunnels exacerbate this condition. A tunnel
   often aggregates traffic from a number of different source and
   destination addresses, of different protocols, and encapsulates them
   in a header with the same ingress and egress addresses, all using a
   single encapsulation protocol. If the ingress enforces IP ID
   uniqueness, this can either severely limit tunnel throughput or
   require substantial resources; the alternative is to ignore IP ID
   uniqueness and risk reassembly errors. Although fragmentation is
   somewhat rare in the current Internet at large, it can be common
   along a tunnel. Reassembly errors are not always detected by other
   protocol layers (see Sec. 4.3.3), and even when detected they can
   result in excessive overall packet loss and can waste bandwidth
   between the egress and ultimate packet destination.
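   Simple arithmetic estimates the throughput limit. A single ID
   counter can emit at most (ID space) / MSL packets per second without
   repeating an ID. The sketch below assumes an MSL of 120 seconds and
   1500-byte packets, purely for illustration. The names are
   hypothetical.

```python
IPV4_ID_SPACE = 2 ** 16  # 16-bit IPv4 Identification field
IPV6_ID_SPACE = 2 ** 32  # 32-bit IPv6 Fragment Header Identification


def max_packet_rate(id_space: int, msl_seconds: float) -> float:
    """Highest packet rate (packets/s) a single ID counter can sustain
    without reusing an ID within msl_seconds."""
    return id_space / msl_seconds


def max_throughput_bps(id_space: int, msl_seconds: float,
                       packet_bytes: int) -> float:
    """max_packet_rate() * packet_bytes * 8, with the multiplication done
    first so that integer inputs give an exact result."""
    return id_space * packet_bytes * 8 / msl_seconds


# Assumed MSL of 120 s and 1500-byte packets, for illustration only.
print(max_throughput_bps(IPV4_ID_SPACE, 120, 1500))  # 6553600.0 (~6.6 Mb/s)
print(max_throughput_bps(IPV6_ID_SPACE, 120, 1500))  # 429496729600.0 (~429 Gb/s)
```

   Under these assumptions, a single IPv4 ID counter supports only about
   546 packets per second. The IPv6 counter supports 65536 times as
   many.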

   The 32-bit IPv6 ID field in the Fragment Header is typically used
   only during source fragmentation. The size of the ID field is
   typically sufficient that a single counter can be used at the tunnel
   ingress, regardless of the endpoint addresses or next-header
   protocol, allowing efficient support for very high throughput
   tunnels.

   The smaller 16-bit IPv4 ID is more difficult to correctly support. A
   recent update to IPv4 allows the ID to be repeated for atomic packets
   [RFC6864]. When either source fragmentation or on-path fragmentation
   is supported, the tunnel ingress may need to keep independent ID
   counters for each tunnel source/destination/protocol tuple.
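   A minimal sketch of such counters follows, with hypothetical class
   names. All link packets of one tunnel share the same ingress
   address, egress address, and encapsulation protocol. Per-tuple
   counting therefore typically reduces to a single 16-bit counter per
   tunnel, which is the aggregation problem described above. Returning
   0 for atomic packets is one choice permitted by [RFC6864]. The
   32-bit IPv6 counter is shared by all traffic.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

FlowKey = Tuple[str, str, int]  # (source, destination, protocol)


class IPv4IdAllocator:
    """Hypothetical ingress helper: one 16-bit ID counter per
    source/destination/protocol tuple of the tunnel link packets."""

    def __init__(self) -> None:
        self._next: DefaultDict[FlowKey, int] = defaultdict(int)

    def allocate(self, src: str, dst: str, proto: int, atomic: bool) -> int:
        if atomic:
            # Atomic datagrams may repeat IDs; no counter value is used up.
            return 0
        key = (src, dst, proto)
        ident = self._next[key]
        self._next[key] = (ident + 1) & 0xFFFF  # wraps after 65536 IDs
        return ident


class IPv6IdAllocator:
    """A single 32-bit counter shared by all traffic from the ingress."""

    def __init__(self) -> None:
        self._next = 0

    def allocate(self) -> int:
        ident = self._next
        self._next = (ident + 1) & 0xFFFFFFFF
        return ident
```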

4.1.5. Checksums

   IP traffic transiting a tunnel should be able to expect a similar
   level of error detection and correction as it would expect from any
   other link. In the case of IPv4, there are no such expectations,
   which is partly why it includes a header checksum [RFC791].

   IPv6 omitted the header checksum because it already expects most link
   errors to be detected and dropped by the link layer and because it
   also assumes transport protection [RFC2460]. When transiting IPv6
   over IPv6, the tunnel fails to provide the expected error detection.

   This is why IPv6 is often tunneled over layers that include separate
   protection, such as GRE [RFC2784].

   The fragmentation created by the tunnel ingress can increase the need
   for stronger error detection and correction, especially at the tunnel
   egress to avoid reassembly errors. The Internet checksum is known to
   be too weak to reliably detect reassembly errors, which can be common
   at high data rates [RFC4963], and it should not be relied upon for
   this purpose. This is why some tunnel protocols, e.g., SEAL and AERO
   [RFC5320][Te17] and GRE [RFC2784], as well as the legacy protocols
   swIPe and the Internet Encapsulation Protocol [RFC1853], include a
   separate checksum. This requirement can be undermined when UDP is
   used as a tunnel with no UDP checksum (as per [RFC6935][RFC6936]) and
   fragmentation occurs, because the egress then has no checksum with
   which to validate reassembly. For this reason, it is safe to use UDP
   with a zero checksum only for atomic tunnel link packets; when used
   on fragments, whether generated at the ingress or en-route inside the
   tunnel, omitting such a checksum can result in reassembly errors that
   can cause additional work (capacity, forwarding processing, receiver
   processing) downstream of the egress.
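
   The weakness is easy to see in a sketch of the Internet checksum (the
   one's-complement sum described in RFC 1071), written in Python for
   illustration. Because the sum is commutative, reassembling two
   even-length fragments in the wrong order leaves the checksum
   unchanged, so the error goes undetected. The fragment contents below
   are arbitrary.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


# Two even-length "fragments" with arbitrary contents.
frag1 = bytes(range(0, 16))
frag2 = bytes(range(16, 32))

right = internet_checksum(frag1 + frag2)
wrong = internet_checksum(frag2 + frag1)  # misordered reassembly
print(right == wrong)  # True: the checksum cannot tell them apart
```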

4.2. MTU Issues

   Link MTUs, IP datagram limits, and transport protocol segment sizes
   are already related by several requirements
   [RFC768][RFC791][RFC1122][RFC1812][RFC2460] and by a variety of
   protocol mechanisms that attempt to establish relationships between
   them, including path MTU discovery (PMTUD) [RFC1191][RFC1981],
   packetization layer path MTU discovery (PLPMTUD) [RFC4821], as well
   as mechanisms inside transport protocols [RFC793][RFC4340][RFC4960].
   The following subsections summarize the interactions between tunnels
   and MTU issues, including minimum tunnel MTUs, tunnel fragmentation
   and reassembly, and MTU discovery.

4.2.1. Minimum MTU Considerations

   There are a variety of minimum MTU values to consider, both in a
   conventional network and in a tunnel as a link in that network. These
   are indicated in Figure 10, an annotated variant of Figure 4. Note
   that a (link) MTU (a) corresponds to a tunnel MTU (b) and that a path
   MTU (c) corresponds to a tunnel path MTU (f). The tunnel MTU is the
   EMTU_R of the egress interface less the encapsulation headers,
   because that defines the largest transit packet message that can
   traverse the tunnel as a link in network M. The ability to traverse
   the hops of the tunnel - in network N - is not related, and only the
   ingress need be concerned with that value.

                    --_                            --
        +------+   /  \                           /  \   +------+
        | Hsrc |--+ Ra +       --       --       + Rd +--| Hdst |
        +------+   \  //\     /  \     /  \     /\\  /   +------+
                    --/I \---+ Rb +---+ Rc +---/E \--
                      \  /    \  /     \  /    \  /
                       \/      --       --      \/
                        <----- Network N ------->
         <-------------------- Network M --------------------->

   Communication in network M viewed at that layer:
    (a)         <->          Link MTU
    (b)                <---- Tunnel MTU --------->
    (c)         <----------- Path MTU ----------------->
    (d) <------------------- EMTU_R --------------------------->

   Communication in network N viewed at that layer:
    (e)                   <--> Link MTU
    (f)                   <--- Path MTU ------>
    (g)                 <----- EMTU_R --------->

   Communication in network N viewed from network M:
    (h)                   <--> MFS
    (i)                   <--- Path MFS ------>
    (j)                 <----- EMFS_R --------->

                    Figure 10 The variety of MTU values

   Consider the following example values. For IPv6 transit packets, the
   minimum (link) MTU (a) is 1280 bytes, which similarly applies to
   tunnels as the tunnel MTU (b). The path MTU (c) is the minimum of the
   MTUs of the links (including tunnels as links) along a path, and
   indicates the largest IP message (packet or fragment) that can
   traverse a path between a source and destination without on-path
   fragmentation (e.g., as supported in IPv4 with DF=0). Path MTU
   discovery, either at the network layer (PMTUD [RFC1191][RFC1981]) or
   at the packetization layer (PLPMTUD [RFC4821]), attempts to tune the
   source IP packets and fragments (i.e., EMTU_S) to fit within this
   path MTU size to avoid fragmentation and reassembly [Ke95]. The
   minimum EMTU_R (d) is 1500 bytes, i.e., the minimum MTU for
   endpoint-to-endpoint communication.

   The tunnel is a source-destination communication in network N.
   Messages between the tunnel source (the ingress interface) and tunnel
   destination (egress interface) similarly experience a variety of
   network N MTU values, including a link MTU (e), a path MTU (f), and
   an EMTU_R (g). The network N message maximum is limited by the path
   MTU, and the source-destination message maximum (EMTU_S) is limited
   by the path MTU when source fragmentation is disabled and by EMTU_R
   otherwise, just as for the corresponding MTUs in network M. For an
   IPv6 network N, its link and path MTUs must be at least 1280 bytes
   and its EMTU_R must be at least 1500 bytes.

   However, viewed from the context of network M, these network N MTUs
   are link layer properties, i.e., maximum frame sizes (MFS (h)). The
   network N EMTU_R determines the largest message that can transit
   between the source (ingress) and destination (egress), but viewed
   from network M this is a link layer property, i.e., EMFS_R (j). The
   tunnel EMTU_R is EMFS_R minus the link (encapsulation) headers,
   because EMFS_R includes the encapsulation headers of the link layer.
   Just as the path MTU has no bearing on EMTU_R, the path MFS (i) in
   network N has no bearing on the MTU of the tunnel.

   For IPv6 networks M and N, these relationships are summarized as
   follows:

   o  Network M MTU = 1280, the largest transit packet (i.e., payload)
      over a single IPv6 link in the base network without source
      fragmentation

   o  Network M path MTU = 1280, the largest transit packet (i.e.,
      payload) that can traverse a path of links in the base network
      without source fragmentation

   o  Network M EMTU_R = 1500, the largest transit packet (i.e.,
      payload) that can traverse a path in the base network with source
      fragmentation

   o  Network N MTU = 1280 (for the same reasons as for network M)

   o  Network N path MTU = 1280 (for the same reasons as for network M)

   o  Network N EMTU_R = 1500 (for the same reasons as for network M)

   o  Tunnel MTU = 1500-encapsulation (typically 1460), the network N
      EMTU_R payload

   o  Tunnel MAP (maximum atomic packet) = largest network M message
      that transits a tunnel as an atomic packet using network N as a
      link layer: 1280-encapsulation, i.e., the network N path MTU
      payload (which is itself limited by the tunnel path MFS)
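
   The arithmetic behind the last two items can be sketched as follows,
   assuming IPv6 minima for network N and a single 40-byte IPv6
   encapsulation header with no extension headers. The function name is
   illustrative only.

```python
def tunnel_limits(encaps: int, emtu_r: int = 1500, path_mtu: int = 1280) -> dict:
    """Return the tunnel MTU and tunnel MAP for a given encapsulation size."""
    return {
        # Tunnel MTU: the network N EMTU_R payload after encapsulation.
        "tunnel_mtu": emtu_r - encaps,
        # Tunnel MAP: the network N path MTU payload after encapsulation.
        "tunnel_map": path_mtu - encaps,
    }


print(tunnel_limits(40))  # {'tunnel_mtu': 1460, 'tunnel_map': 1240}
```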

   The difference between the network N MTU and its treatment as a link
   layer in network M is why tunnel ingress interfaces need to support
   fragmentation and tunnel egress interfaces need to support reassembly
   in the encapsulation layer(s). The high cost of fragmentation and
   reassembly is why it is useful for applications to avoid sending
   messages too close to the size of the tunnel path MTU [Ke95],
   although there is no signaling mechanism that can achieve this (see
   Section 4.2.3).

4.2.2. Fragmentation

   A tunnel interacts with fragmentation in two different ways. Because
   the tunnel is a link in network M, transit packets might be
   fragmented before they reach the tunnel - i.e., in network M, either
   during source fragmentation (if generated at the same node as the
   ingress interface) or during forwarding fragmentation (for IPv4 DF=0
   datagrams). In addition, link packets traversing inside the tunnel
   may require fragmentation by the ingress interface - i.e., source
   fragmentation by the ingress as a host in network N. These two
   fragmentation operations are no more related than are conventional
   IP fragmentation and ATM segmentation and reassembly; one occurs at
   the (transit) network layer, the other at the (virtual) link layer.

   Although many of these issues with tunnel fragmentation and MTU
   handling were discussed in [RFC4459], that document described a
   variety of alternatives as if they were independent. This document
   explains the combined approach that is necessary.

   Like any other link, an IPv4 tunnel must transit 68-byte packets
   without requiring source fragmentation [RFC791][RFC1122] and an IPv6
   tunnel must transit 1280-byte packets without requiring source
   fragmentation [RFC2460]. The tunnel MTU interacts with the routers or
   hosts that the tunnel connects in the same way as would any other
   link MTU. The pseudocode examples in this section use the following
   values:

   o  TP: transit packet

   o  TLP: tunnel link packet

   o  TPsize: size of the transit packet (including its headers)

   o  encaps: ingress encapsulation overhead (tunnel link headers)

   o  tunMTU: tunnel MTU, i.e., network N egress EMTU_R - encaps

   o  tunMAP: tunnel maximum atomic packet as limited by the tunnel path
      MFS

   These rules apply at the host/router where the tunnel is attached,
   i.e., at the network layer of the transit packet (we assume that all
   tunnels, including multipoint tunnels, have a single, uniform MTU).
   These are basic source fragmentation rules (or transit
   refragmentation for IPv4 DF=0 datagrams), and have no relation to the
   tunnel itself other than to consider the tunnel MTU as the effective
   link MTU of the next hop.

   Inside the source during transit packet generation, or inside a
   router during transit packet forwarding, the tunnel is treated as if
   it were any other link (i.e., this is not tunnel processing, but
   rather typical source or router processing), as indicated in the
   pseudocode in Figure 11.

      if (TPsize > tunMTU) then
         if (TP can be on-path fragmented, e.g., IPv4 DF=0) then
            split TP into TP fragments of at most tunMTU size
            and send each TP fragment to the tunnel ingress interface
         else
            drop the TP and send ICMP "too big" to the TP source
         endif
      else
         send TP to the tunnel ingress (i.e., as an outbound interface)
      endif

         Figure 11 Router / host packet size processing algorithm
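
   A Python sketch of the Figure 11 logic, assuming an IPv4 transit
   packet with a 20-byte header and no options whenever on-path
   fragmentation is allowed. The function names are hypothetical.
   Non-final IPv4 fragment data must be a multiple of 8 bytes, so
   fragments can come out slightly smaller than tunMTU.

```python
from typing import List, Tuple

IPV4_HEADER = 20  # assumption: IPv4 header without options


def ipv4_fragment_sizes(tp_size: int, mtu: int, hdr: int = IPV4_HEADER) -> List[int]:
    """Split a transit packet into IPv4 fragments that each fit within mtu."""
    payload = tp_size - hdr
    per_frag = (mtu - hdr) // 8 * 8  # non-final fragment data: multiple of 8
    sizes = []
    while payload > per_frag:
        sizes.append(hdr + per_frag)
        payload -= per_frag
    sizes.append(hdr + payload)  # final fragment carries the remainder
    return sizes


def host_router_process(tp_size: int, tun_mtu: int, fragmentable: bool) -> Tuple[str, List[int]]:
    """Figure 11: treat the tunnel like any other outbound link."""
    if tp_size > tun_mtu:
        if fragmentable:  # e.g., IPv4 with DF=0
            return ("fragment", ipv4_fragment_sizes(tp_size, tun_mtu))
        return ("drop, send ICMP too big", [])
    return ("send", [tp_size])


print(host_router_process(1500, 1000, True))   # ('fragment', [996, 524])
print(host_router_process(1500, 1000, False))  # ('drop, send ICMP too big', [])
print(host_router_process(900, 1000, False))   # ('send', [900])
```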

   The tunnel ingress acts as a host on the tunnel path, i.e., it
   performs source fragmentation of tunnel link packets (again assuming
   that all tunnels, even multipoint tunnels, have a single, uniform
   tunnel MTU), using the pseudocode shown in Figure 12. Note that
   ingress source fragmentation occurs in the encapsulation process,
   which may involve more than one protocol layer. In those cases,
   fragmentation can occur at any of the layers of encapsulation in
   which it is supported, based on the configuration of the ingress.

      if (TPsize <= tunMAP) then
         encapsulate the TP and emit
      else
         encapsulate the TP, creating the TLP
         fragment the TLP into tunMAP chunks
         emit the TLP fragments
      endif

                  Figure 12 Ingress processing algorithm
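
   A literal Python transcription of Figure 12 that returns the sizes of
   the emitted pieces. Like the figure, it compares the transit packet
   size against tunMAP and splits the TLP into chunks of at most tunMAP
   bytes, ignoring the per-fragment network N headers and alignment that
   a real encapsulation layer would add. The function name is
   hypothetical.

```python
from typing import List


def ingress_process(tp_size: int, encaps: int, tun_map: int) -> List[int]:
    """Figure 12: encapsulate, then source-fragment if needed."""
    tlp_size = tp_size + encaps  # encapsulate the TP, creating the TLP
    if tp_size <= tun_map:
        return [tlp_size]  # fits as an atomic packet: emit unfragmented
    # Otherwise fragment the TLP into chunks of at most tun_map bytes.
    chunks = []
    remaining = tlp_size
    while remaining > 0:
        chunk = min(tun_map, remaining)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


print(ingress_process(1000, 40, 1240))  # [1040]
print(ingress_process(1400, 40, 1240))  # [1240, 200]
```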

   Note that Figure 11 and Figure 12 indicate that a node might both
   "fragment then encapsulate" and "encapsulate then fragment", i.e.,
   the effect is "on-path fragment, then encapsulate, then source
   fragment". The first (on-path) fragmentation occurs only for IPv4
   DF=0 packets, based on the tunnel MTU. The second (source)
   fragmentation occurs for all packets, based on the tunnel maximum
   atomic packet (MAP) size. The first fragmentation is a convenience
   for a subset of IPv4 packets; it is the second (source) fragmentation
   that ensures that messages traverse the tunnel.

   Just as a network interface should never receive a message larger
   than its MTU, a tunnel should never receive a message larger than its
   tunnel MTU limit (see the host/router processing above). A router
   attempting to process such a message would already have generated an
   ICMP "packet too big" and the transit packet would have been dropped
   before entering into this algorithm. Similarly, a host would have
   generated an error internally and aborted the attempted transmission.

   As an example, consider IPv4 over IPv6 or IPv6 over IPv6 tunneling,
   where IPv6 encapsulation adds a 40-byte fixed header plus IPv6
   options (i.e., IPv6 extension headers) of total size 'EHsize'. The
   tunnel MTU will be at least 1500 - (40 + EHsize) bytes. The tunnel
   path MTU will be at least 1280 - (40 + EHsize) bytes, which then also
   represents the tunnel maximum atomic packet size (MAP). Transit
   packets larger than the tunnel MTU will be dropped by a node before
   ingress processing, and so do not need to be addressed as part of
   ingress processing. Considering these minimum values, the previous
   algorithm uses the actual values shown in the pseudocode in
   Figure 13.

      if (TPsize <= (1240 - EHsize)) then
         encapsulate TP and emit
      else
         encapsulate the TP, creating the TLP
         fragment the TLP into (1240 - EHsize) chunks
         emit the TLP fragments
      endif

           Figure 13 Ingress processing for a tunnel over IPv6

   IPv6 cannot necessarily support all tunnel encapsulations. When the
   egress EMTU_R is the default of 1500 bytes, an IPv6 tunnel supports
   IPv6 transit only if EHsize is 180 bytes or less; otherwise the
   incoming transit packet would have been dropped as being too large by
   the host/router. Under the same EMTU_R assumption, an IPv6 tunnel
   supports IPv4 transit only if EHsize is 884 bytes or less. In this
   example, transit packets of up to (1240 - EHsize) bytes can traverse
   the tunnel without ingress source fragmentation and egress
   reassembly.
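
   The two EHsize limits follow from requiring the tunnel MTU (the
   egress EMTU_R minus the 40-byte IPv6 header and EHsize) to be at
   least the transit minimum; the 884-byte figure corresponds to a
   576-byte IPv4 transit requirement. A small Python check, with a
   hypothetical helper name:

```python
IPV6_HEADER = 40


def max_ehsize(transit_min: int, egress_emtu_r: int = 1500) -> int:
    """Largest EHsize for which egress_emtu_r - (40 + EHsize) >= transit_min."""
    return egress_emtu_r - IPV6_HEADER - transit_min


print(max_ehsize(1280))  # 180 (IPv6 transit)
print(max_ehsize(576))   # 884 (IPv4 transit, 576-byte requirement)
```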

   When using IP directly over IP, the minimum transit packet EMTU_R for
   IPv4 is 576 bytes and for IPv6 is 1500 bytes. This means that tunnels
   of IPv4-over-IPv4, IPv4-over-IPv6, and IPv6-over-IPv6 are possible
   without additional requirements, but this may involve ingress
   fragmentation and egress reassembly. IPv6 cannot be tunneled directly
   over IPv4 without additional requirements, notably that the egress
   EMTU_R is at least 1280 bytes plus the encapsulation headers.
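
   Which direct IP-in-IP combinations work with only the default minima
   can be checked mechanically. This sketch assumes minimal 20-byte IPv4
   and 40-byte IPv6 encapsulation headers and uses the link MTU minima
   (68 bytes for IPv4, 1280 bytes for IPv6) as the transit requirement;
   the tables and function are illustrative.

```python
MIN_EMTU_R = {"IPv4": 576, "IPv6": 1500}   # minimum egress EMTU_R
MIN_LINK_MTU = {"IPv4": 68, "IPv6": 1280}  # what a link must transit
HEADER = {"IPv4": 20, "IPv6": 40}          # minimal encapsulation header


def direct_tunnel_ok(transit: str, outer: str) -> bool:
    """True if the minimum tunnel MTU meets the transit link MTU minimum."""
    tunnel_mtu = MIN_EMTU_R[outer] - HEADER[outer]
    return tunnel_mtu >= MIN_LINK_MTU[transit]


for transit in ("IPv4", "IPv6"):
    for outer in ("IPv4", "IPv6"):
        print(f"{transit}-over-{outer}: {direct_tunnel_ok(transit, outer)}")
```

   For IPv6 over IPv4 this gives 576 - 20 = 556 bytes, well short of
   1280, so under these assumptions the egress EMTU_R would need to be at
   least 1300 bytes.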

   When ongoing ingress fragmentation and egress reassembly would be
   prohibitive or costly, larger MTUs can be supported by design and
   confirmed either out-of-band (by design) or in-band (e.g., using
   PLPMTUD [RFC4821], as done in SEAL [RFC5320] and AERO [Te17]). In
   particular, many tunnel specifications are often able to avoid
   persistent fragmentation because they operationally assume larger
   EMTU_R and tunnel MAP sizes than are guaranteed for IPv4 [RFC1122] or
   IPv6 [RFC2460].

4.2.3. Path MTU Discovery

   Path MTU discovery (PMTUD) enables a network path to support a larger
   PMTU than it can assume from the minimum requirements of the protocol
   over which it operates. Note, however, that PMTUD never discovers an
   EMTU_R that is larger than the required minimum; that information is
   available to some upper layer protocols, such as TCP [RFC1122], but
   cannot be determined at the IP layer.

   There is a temptation to optimize tunnel traversal so that packets
   are not fragmented between ingress and egress, i.e., to attempt to
   tune the network M PMTU to the tunnel MAP size rather than to the
   tunnel MTU, to avoid ingress fragmentation. This is often impossible
   because the ICMP "packet too big" message (IPv4 fragmentation needed
   [RFC792] or IPv6 packet too big [RFC4443]) indicates the complete
   failure of a link to transit a packet, not a preference for a size
   that matches the internal mechanism of the link. ICMP messages are
   intended to indicate whether a tunnel MTU is insufficient; there is
   no ICMP message that can indicate when a transit packet is "too big
   for the tunnel path MTU, but not larger than the tunnel MTU". If
   there were, endpoints might receive that message for IP packets
   larger than 40 bytes (the payload of a single ATM cell, allowing for
   the 8-byte AAL5 trailer), but smaller than 9K (the ATM EMTU_R
   payload).

   In addition, attempting to tune the network transit size to natively
   match that of the link internal transit can be hazardous for many
   reasons:

   o  The tunnel is capable of transiting packets as large as the
      network N EMTU_R - encapsulation, which is always at least as
      large as the tunnel MAP and typically is larger.

   o  ICMP has only one type of error message regarding large packets -
      "too big", i.e., too large to transit. There is no optimization
      message of "bigger than I'd like, but I can deal with it if
      needed".

   o  IP tunnels often involve some level of recursion, i.e.,
      encapsulation over itself [RFC4459].

   Tunnels that use IPv4 as the encapsulation layer SHOULD set DF=0, but
   this requires generating unique fragmentation ID values, which may
   limit throughput [RFC6864]. These tunnels might have difficulty
   assuming ingress EMTU_S values over 64 bytes, so it may not be
   feasible to assume that larger packets with DF=1 are safe.
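
   The throughput limit from unique IDs can be estimated with simple
   arithmetic: if each 16-bit ID may be used only once per tunnel
   source/destination/protocol tuple within some ID lifetime, at most
   65536 packets can be sent per lifetime. The 30-second lifetime and
   1500-byte packets below are assumptions for illustration, not values
   taken from [RFC6864].

```python
def max_rate_bps(packet_bytes: int, id_lifetime_s: float, id_bits: int = 16) -> float:
    """Upper bound on per-tuple throughput when IDs must stay unique."""
    ids_per_lifetime = 2 ** id_bits
    return ids_per_lifetime * packet_bytes * 8 / id_lifetime_s


print(max_rate_bps(1500, 30))  # 26214400.0, i.e., about 26 Mb/s
```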

   Recursive tunneling occurs whenever a protocol ends up encapsulated
   in itself. This happens directly, as when IPv4 is encapsulated in
   IPv4, or indirectly, as when IP is encapsulated in UDP which then is
   a payload inside IP. It can involve many layers of encapsulation
   because a tunnel provider isn't always aware of whether the packets
   it transits are already tunneled.

   Recursion is impossible when the tunnel transit packets are limited
   to the native size of the ingress payload. Arriving tunnel transit
   packets have a minimum supported size (1280 for IPv6) and the tunnel
   PMFS has the same requirement; there would be no room for the
   tunnel's "link layer" headers, i.e., the encapsulation layer. The
   result would be an IPv6 tunnel that cannot satisfy IPv6 transit
   requirements.
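
   The arithmetic behind this can be sketched as follows, assuming the
   tunnel PMFS sits at the 1280-byte IPv6 minimum and a hypothetical
   40-byte encapsulation is added at each level of recursion. Every
   level pushes the largest atomic transit packet further below the 1280
   bytes that IPv6 transit requires.

```python
IPV6_MIN_MTU = 1280


def atomic_transit_limit(pmfs: int, encaps: int, levels: int) -> int:
    """Largest transit packet that fits atomically after `levels` encapsulations."""
    return pmfs - encaps * levels


for levels in range(1, 4):
    limit = atomic_transit_limit(IPV6_MIN_MTU, 40, levels)
    print(levels, limit, limit >= IPV6_MIN_MTU)  # e.g., "1 1240 False"
```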

   It is more appropriate to require the tunnel to satisfy IP transit
   requirements and enforce that requirement at design time or during
   operation (the latter using PLPMTUD [RFC4821]). Conventional path MTU
   discovery (PMTUD) relies on existing endpoint ICMP processing of
   explicit negative feedback from routers along the path via "packet
   too big" ICMP packets in the reverse direction of the tunnel
   [RFC1191][RFC1981]. This technique is susceptible to the "black hole"
   phenomenon, in which the ICMP messages never return to the source due
   to policy-based filtering [RFC2923]. PLPMTUD requires a separate,
   direct control channel from the egress to the ingress that provides
   positive feedback; the direct channel is not blocked by policy
   filters and the positive feedback ensures fail-safe operation if
   feedback messages are lost [RFC4821].
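
   One way an ingress might search for a usable size with PLPMTUD is a
   binary search driven by positive acknowledgements, sketched below in
   Python. The probe callable is a hypothetical stand-in for sending a
   padded probe through the tunnel and waiting for the egress to confirm
   it; a lost probe or missing confirmation counts as failure, which
   keeps the search fail-safe. The sketch assumes a single path whose
   limit does not change during the search, and [RFC4821] does not
   mandate this particular search strategy.

```python
from typing import Callable


def plpmtud_search(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Largest size in [low, high] confirmed by the egress.

    `low` is assumed to be known to work (e.g., the protocol minimum);
    `probe(size)` returns True only when the egress acknowledges a probe
    of that size, so a lost probe or lost acknowledgement reads as False.
    """
    best = low
    lo, hi = low + 1, high
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):  # positive feedback: this size works
            best = mid
            lo = mid + 1
        else:  # no confirmation: assume too big and search lower
            hi = mid - 1
    return best


# Simulated tunnel path that silently drops anything above 1400 bytes.
print(plpmtud_search(lambda size: size <= 1400, 1280, 9000))  # 1400
```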

   PLPMTUD might require that the ingress consider the potential impact
   of multipath forwarding (see Section 4.3.4). In such cases, probes
   generated by the ingress might need to track different flows, e.g.,
   flows that might traverse different tunnel paths. Additionally,
   encapsulation might need to consider mechanisms to ensure that probes
   traverse the same path as their corresponding traffic, even when
   labeled as the same flow (e.g., using the IPv6 flow label). In such
   cases, the transit packet and probe may need to be encrypted or
   encapsulated in an additional flow-based transport header, to avoid
   differential path traversal based on deep-packet inspection within
   the tunnel.

4.3. Coordination Issues

   IP tunnels interact with link layer signals and capabilities in a
   variety of ways. The following subsections address some key issues of
   these interactions. In general, they are again informed by treating a
   tunnel as any other link layer and considering the interactions
   between the IP layer and link layers [RFC3819].

4.3.1. Signaling

   In the current Internet architecture, signaling goes upstream, either
   from routers along a path or from the destination, back toward the
   source. Such signals are typically contained in ICMP messages, but
   can involve other protocols such as RSVP, transport protocol signals
   (e.g., TCP RSTs), or multicast control or transport protocols.

   A tunnel behaves like a link and acts like a link interface at the
   nodes where it is attached. As such, it can provide information that
   enhances IP signaling (e.g., ICMP), but does not itself directly
   generate ICMP messages.

   For tunnels, this means that there are two separate signaling paths.
   The outer network M nodes can each signal the source of the tunnel
   transit packets, Hsrc (Figure 14). Inside the tunnel, the inner
   network N nodes can signal the source of the tunnel link packets, the
   ingress I (Figure 15).

           +--------+---------------------------+--------+
           |        |                           |        |
           v        --_                         --       v
        +------+   /  \                        /  \   +------+
        | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
        +------+   \  //\    /  \    /  \    /\\  /   +------+
                    --/I \--+ Rb +--+ Rc +--/E \--
                      \  /   \  /    \  /   \  /
                       \/     --      --     \/
                        <---- Network N ----->
        <-------------------- Network M --------------------->

                   Figure 14 Signals outside the tunnel

                        +-----+-------+------+
                    --_ |     |       |      |  --
        +------+   /  \ v     |       |      | /  \   +------+
        | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
        +------+   \  //\    /  \    /  \    /\\  /   +------+
                    --/I \--+ Rb +--+ Rc +--/E \--
                      \  /   \  /    \  /   \  /
                       \/     --      --     \/
                        <----- Network N ---->
        <--------------------- Network M -------------------->

                    Figure 15 Signals inside the tunnel

   These two signal paths are inherently distinct except where
   information is exchanged between the network interface of the tunnel
   (the ingress) and its attached node (Ra, in both figures).

   It is always possible for a network interface to provide hints to its
   attached node (host or router), which can be used for optimization.
   In this case, when signals inside the tunnel indicate a change to the
   tunnel, the ingress (i.e., the tunnel network interface) can provide
   information to the router (Ra, in both figures), so that Ra can
   generate the appropriate signal in return to Hsrc. This relaying may
   be difficult, because signals inside the tunnel may not return enough
   information to the ingress to support direct relaying to Hsrc.

   In all cases, the tunnel ingress needs to determine how to relay the
   signals from inside the tunnel into signals back to the source. For
   some protocols this is either simple or impossible (such as for
   ICMP); for others, it can even be undefined (e.g., multicast). In
   some cases, the individual signals relayed from inside the tunnel may
   result in corresponding signals in the outside network, and in other
   cases they may just change the state of the tunnel interface. In the
   latter case, the result may cause the router Ra to generate new ICMP
   errors when later messages arrive from Hsrc or other sources in the
   outer network.

   The meaning of the relayed information must be carefully translated.
   An ICMP error within a tunnel indicates a failure of the path inside
   the tunnel to support an egress atomic packet or packet fragment
   size. It can be very difficult to convert that ICMP error into a
   corresponding ICMP message from the ingress node back to the transit
   packet source. The ICMP message may not contain enough of a packet
   prefix to extract the transit packet header sufficient to generate
   the appropriate ICMP message. The relationship between the egress
   EMTU_R and the transit packet may be indirect, e.g., the ingress node
   may be performing source fragmentation that should be adjusted
   instead of propagating the ICMP upstream.
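
   A rough check of whether an ICMP error from inside the tunnel quotes
   enough of the tunnel link packet to recover the transit header, in
   Python. It relies on the quoting rules of [RFC792] (the IP header
   plus 64 bits of the original datagram) and [RFC4443] (as much of the
   invoking packet as fits without exceeding the 1280-byte minimum MTU);
   routers following [RFC1812] typically quote more than the RFC 792
   minimum. The function name and header sizes (no options or extension
   headers) are assumptions.

```python
def transit_header_recoverable(quoted: int, outer_hdr: int, transit_hdr: int) -> bool:
    """True if the quoted bytes extend past the outer header far enough."""
    return quoted - outer_hdr >= transit_hdr


# ICMPv4 (RFC 792 minimum): 20-byte outer IPv4 header + 8 bytes after it.
icmpv4_quote = 20 + 8
# ICMPv6 Packet Too Big (RFC 4443): 1280 - 40 (IPv6) - 8 (ICMPv6 header).
icmpv6_quote = 1280 - 40 - 8

print(transit_header_recoverable(icmpv4_quote, 20, 20))  # False
print(transit_header_recoverable(icmpv6_quote, 40, 40))  # True
```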

   Some messages have detailed specifications for relaying between the
   tunnel link packet and transit packet, including Explicit Congestion
   Notification (ECN [RFC6040]) and multicast (e.g., IGMP).

4.3.2. Congestion

   Tunnels carrying IP traffic (i.e., the focus of this document) need
   not react directly to congestion any more than would any other link
   layer [RFC8085]. IP transit packet traffic is already expected to be
   congestion controlled.

   It is useful to relay network congestion notification between the
   tunnel link and the tunnel transit packets. Explicit congestion
   notification requires that ECN bits are copied from the tunnel
   transit packet to the tunnel link packet on encapsulation, as well as
   copied back at the egress based on a combination of the bits of the
   two headers [RFC6040]. This allows congestion notification within the
   tunnel to be interpreted as if it were on the direct path.
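
   A Python sketch of the [RFC6040] normal-mode rules: the transit
   (inner) ECN field is copied to the outer header on encapsulation, and
   on decapsulation the outgoing value is chosen from both headers. The
   sketch omits the compatibility mode and the alarm/logging that
   [RFC6040] calls for on unexpected combinations; constant and function
   names are illustrative.

```python
from typing import Optional

# ECN field codepoints (two bits in the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encap_ecn(inner: int) -> int:
    """Normal mode: copy the transit packet's ECN field to the outer header."""
    return inner


def decap_ecn(inner: int, outer: int) -> Optional[int]:
    """Return the ECN value for the forwarded transit packet, or None to drop."""
    if outer == CE:
        # Congestion inside the tunnel: mark CE, or drop if the inner
        # packet is not ECN-capable and so cannot carry the mark.
        return None if inner == NOT_ECT else CE
    if outer == ECT1 and inner == ECT0:
        return ECT1
    return inner  # all other combinations keep the inner value
```

   For example, decap_ecn(ECT0, CE) yields CE, so congestion marked
   inside the tunnel reaches the transit destination as if it had
   occurred on the direct path.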

4.3.3. Multipoint Tunnels and Multicast

   Multipoint tunnels are tunnels with more than two ingress/egress
   endpoints [RFC2529][RFC5214][Te17]. Just as tunnels emulate links,
   multipoint tunnels emulate multipoint links, and can support
   multicast as a tunnel capability. Multipoint tunnels can be useful on
   their own, or may be used as part of more complex systems, e.g., LISP
   and TRILL configurations [RFC6830][RFC6325].

   Multipoint tunnels require support for egress determination, just as
   multipoint links do. This function is typically supported by ARP
   [RFC826] or ARP emulation (e.g., LAN Emulation, known as LANE
   [RFC2225]) for multipoint links. For multipoint tunnels, a similar
   mechanism is required for the same purpose - to determine the egress
   address for proper ingress encapsulation (e.g., LISP Map-Service
   [RFC6833]).

   All multipoint systems - tunnels and links - might support different
   MTUs between each ingress/egress (or link entrance/exit) pair. In
   most cases, it is simpler to assume a uniform MTU throughout the
   multipoint system, e.g., the minimum MTU supported across all
   ingress/egress pairs. This applies to both the ingress EMTU_S and
   egress EMTU_R (the latter determining the tunnel MTU). Values valid
   across all receivers need to be confirmed in advance (e.g., via IPv6
   ND announcements or out-of-band configuration information) before a
   multipoint tunnel or link can use values other than the default,
   otherwise packets may reach some receivers but be "black-holed" to
   others (e.g., if PMTUD fails [RFC2923]).
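
   Choosing the uniform value amounts to taking the minimum over all
   ingress/egress pairs, as in this Python sketch; the pair names and
   MTU values are made up for illustration.

```python
from typing import Dict, Tuple


def uniform_mtu(pair_mtus: Dict[Tuple[str, str], int]) -> int:
    """Smallest MTU across all ingress/egress pairs of a multipoint tunnel."""
    return min(pair_mtus.values())


pairs = {("A", "B"): 1460, ("A", "C"): 1400, ("B", "C"): 1452}
print(uniform_mtu(pairs))  # 1400
```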

   A multipoint tunnel MUST have support for broadcast and multicast (or
   their equivalent), in exactly the same way as this is already
   required for multipoint links [RFC3819]. Both modes can be supported
   either by a native mechanism inside the tunnel or by emulation using
   serial replication at the tunnel ingress (e.g., AMT [RFC7450]), in
   the same way that links may provide the same support either natively
   (e.g., via promiscuous or automatic replication in the link itself)
   or via network interface emulation (e.g., as for non-broadcast
   multiaccess networks, i.e., NBMAs).

   IGMP snooping enables IP multicast to be coupled with native link
   layer multicast support [RFC4541]. A similar technique may be
   relevant to couple transit packet multicast to tunnel link packet
   multicast, but the coupling of the protocols may be more complex
   because many tunnel link protocols rely on their own network N
   multicast control protocol, e.g., via PIM-SM [RFC6807][RFC7761].

4.3.4. Load Balancing

   Load balancing can impact the way in which a tunnel operates. In
   particular, multipath routing inside the tunnel can cause some of the
   tunnel parameters to vary, both over time and for different transit
   packets. The use of multiple paths can be the result of MPLS link
   aggregation groups (LAGs), equal-cost multipath routing (ECMP
   [RFC2991]), or other load balancing mechanisms. In some cases, the
   tunnel exists as the mechanism to support ECMP, as for GRE in UDP
   [RFC8086].

   A tunnel may have multiple paths between the ingress and egress with
   different tunnel path MTU or tunnel MAP values, causing the ingress
   EMTU_S to vary [RFC7690]. When individual values cannot be correlated
   to transit traffic, the EMTU_S can be set to the minimum of these
   different path MTU and MAP values.

   In some cases, these values can be correlated to paths, e.g., IPv6
   packets include a flow label to enable multipath routing to keep
   packets of a single flow following the same path, as well as to help
   differentiate path properties (e.g., for path MTU discovery
   [RFC4821]). It is important to preserve the semantics of that flow
   label as an aggregate identifier of the encapsulated link packets of
   a tunnel. This is achieved by hashing the transit IP addresses and
   flow label to generate a new flow label for use between the ingress
   and egress addresses [RFC6438]. It is not appropriate to simply copy
   the flow label from the transit packet into the link packet because
   of collisions that might arise if a label is used for flows between
   different transit packet addresses that traverse the same tunnel.
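
   A Python sketch of deriving the outer flow label by hashing the
   transit addresses and transit flow label, in the spirit of
   [RFC6438]. The use of SHA-256 and the choice to take the low 20 bits
   of its first three bytes are assumptions for illustration; any
   well-distributed hash would serve. The value 0 is avoided because it
   means "no flow label".

```python
import hashlib
import ipaddress


def outer_flow_label(src: str, dst: str, inner_label: int) -> int:
    """Derive a 20-bit outer flow label from transit addresses and label."""
    data = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + inner_label.to_bytes(3, "big")
    )
    digest = hashlib.sha256(data).digest()
    label = int.from_bytes(digest[:3], "big") & 0xFFFFF  # keep 20 bits
    return label or 1  # 0 means "unlabeled", so avoid it


a = outer_flow_label("2001:db8::1", "2001:db8::2", 0x12345)
b = outer_flow_label("2001:db8::1", "2001:db8::2", 0x12345)
print(a == b, 0 < a <= 0xFFFFF)  # True True
```

   Because the transit addresses are part of the hash input, two transit
   flows that happen to reuse the same flow label between different
   address pairs generally map to different outer labels.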

   When the transit packet is visible to forwarding nodes inside the
   tunnel (e.g., when it is not encrypted), those nodes may use deep
   packet inspection (DPI) context to send a single flow over different
   paths. This sort of "DPI override" of the IP flow information can
   interfere with both PMTUD and PLPMTUD mechanisms. The only way to
   ensure that intermediate nodes do not interfere with PLPMTUD is to
   encrypt the transit packet when it is encapsulated for tunnel
   traversal, or to provide some other signal (e.g., an additional layer
   of encapsulation header including transport ports) that preserves
   the flow semantics.

4.3.5. Recursive Tunnels

1642	   The rules described in this document already support tunnels over
1643	   tunnels, sometimes known as "recursive" tunnels, in which IP is
1644	   transited over IP either directly or via intermediate encapsulation
1645	   (IP-UDP-IP, as in GUE [He16]).

1647	   There are known hazards to recursive tunneling, notably that the
1648	   independence of the tunnel transit header and tunnel link header hop
1649	   counts can result in a tunneling loop. Such looping can be avoided
1650	   when using direct encapsulation (IP in IP) by use of a header option
1651	   to track the encapsulation count and to limit that count [RFC2473].
1652	   This looping cannot be avoided when other protocols are used for
1653	   tunneling, e.g., IP in UDP in IP, because the encapsulation count may
1654	   not be visible where the recursion occurs.
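   The encapsulation-count mechanism of [RFC2473] can be sketched as a simplified Python model rather than an implementation. The model assumes that every ingress adds a Tunnel Encapsulation Limit whenever the arriving packet lacks one, using the default of 4 recommended by [RFC2473]. It omits the ICMPv6 Parameter Problem message that [RFC2473] has the ingress send when it discards a packet. The Packet class and its depth field are hypothetical bookkeeping for the example. The loop models a misconfiguration in which a packet keeps re-entering the same tunnel.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_ENCAP_LIMIT = 4  # default recommended by RFC 2473; configurable


@dataclass
class Packet:
    encap_limit: Optional[int] = None  # None: no Tunnel Encapsulation Limit option
    depth: int = 0                     # hypothetical: levels of encapsulation so far


def encapsulate(pkt: Packet) -> Optional[Packet]:
    """Tunnel ingress check; returns the new outer packet or None if discarded."""
    if pkt.encap_limit is None:
        limit = DEFAULT_ENCAP_LIMIT   # assumed: ingress always adds the option
    elif pkt.encap_limit == 0:
        return None                   # no further encapsulation permitted
    else:
        limit = pkt.encap_limit - 1   # one fewer level remains
    return Packet(encap_limit=limit, depth=pkt.depth + 1)


def loop_depth(max_iterations: int = 100) -> int:
    """Model a packet that keeps re-entering the same tunnel."""
    pkt = Packet()
    for _ in range(max_iterations):
        outer = encapsulate(pkt)
        if outer is None:
            break
        pkt = outer
    return pkt.depth


print(loop_depth())  # 5
```

   Under these assumptions the loop stops after five levels of encapsulation. Without the option, each encapsulation would start a fresh link-header hop count, and nothing in the headers themselves would bound the nesting.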

1656	5. Observations

1658	   The following subsections summarize the observations of this document
1659	   and the issues with existing tunnel protocol specifications. They
1660	   also include advice for tunnel protocol designers, implementers, and
1661	   operators.

1663	5.1. Summary of Recommendations

1665	   o  Tunnel endpoints are network interfaces; tunnels are virtual links

1667	       o ICMP messages MUST NOT be generated by the tunnel (as a link)

1669	       o ICMP messages received by the ingress from inside the tunnel
1670	          link change the link properties (they do not generate
1671	          transit-layer ICMP messages)

1673	       o Link headers (hop, ID, options) are largely independent of the
1674	          arriving transit header (with few exceptions based on
1675	          translation, not direct copying, e.g., ECN and IPv6 flow labels)

1677	   o  MTU values should treat the tunnel as any other link

1679	       o Require ingress source fragmentation and egress reassembly
1680	          at the tunnel link packet layer

1682	       o The tunnel MTU is the tunnel egress EMTU_R less headers, and
1683	          not related at all to the ingress-egress MFS

1685	   o  Tunnels must obey core IP requirements

1687	       o Obey IPv4 DF=1 on arrival at a node (nodes MUST NOT fragment
1688	          IPv4 packets where DF=1 and routers MUST NOT clear the DF bit)

1690	       o Shut down an IP tunnel if the tunnel MTU falls below the
1691	          required minimum
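   The MTU rules summarized above can be illustrated with a short Python sketch. The function names are hypothetical. The minimums used are the 68-byte IPv4 forwarding minimum of [RFC791] and the 1280-byte IPv6 link MTU minimum of [RFC2460]. The tunnel path MTU does not appear anywhere in the calculation, because the ingress fragments link packets to fit the path and the egress reassembles them.

```python
IPV4_MIN_MTU = 68    # RFC 791: every IPv4 module must forward 68-byte datagrams
IPV6_MIN_MTU = 1280  # RFC 2460: minimum IPv6 link MTU


def tunnel_mtu(egress_emtu_r: int, link_header_len: int) -> int:
    """Tunnel MTU is the egress EMTU_R less the link (encapsulation) headers."""
    return egress_emtu_r - link_header_len


def tunnel_should_be_up(egress_emtu_r: int, link_header_len: int,
                        transit_is_ipv6: bool) -> bool:
    """Keep the tunnel up only if its MTU meets the transit-layer minimum."""
    required = IPV6_MIN_MTU if transit_is_ipv6 else IPV4_MIN_MTU
    return tunnel_mtu(egress_emtu_r, link_header_len) >= required


print(tunnel_mtu(1500, 40), tunnel_should_be_up(1500, 40, True))  # 1460 True
print(tunnel_mtu(1300, 40), tunnel_should_be_up(1300, 40, True))  # 1260 False
```

   For example, consider an IPv6-in-IPv6 tunnel whose egress can reassemble 1500-byte link packets, which is the minimum [RFC2460] requires of IPv6 nodes. If the tunnel adds a 40-byte link header, its tunnel MTU is 1460. That exceeds 1280, so the tunnel stays up.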

1693	5.2. Impact on Existing Encapsulation Protocols

1695	   Many existing and proposed encapsulation protocols are inconsistent
1696	   with the guidelines of this document. The following list summarizes
1697	   those inconsistencies, omitting places where a protocol is
1698	   inconsistent solely by reference to another protocol.

1700	   [should this be inverted as a table of issues and a list of which
1701	   RFCs have problems?]
1702	   o  IP in IP / mobile IP [RFC2003][RFC4459] - IPv4 in IPv4

1704	       o Sets link DF when transit DF=1 (fails without PLPMTUD)

1706	       o Drops at egress if hopcount = 0 (host-host tunnels fail)

1708	       o Drops based on transit source (same as router IP, matches
1709	          egress), i.e., performs routing functions it should not

1711	       o Ingress generates ICMP messages (based on relayed context),
1712	          rather than using inner ICMP messages to set interface
1713	          properties only

1715	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1717	   o  IPv6 tunnels [RFC2473] -- IPv6 or IPv4 in IPv6

1719	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1721	       o Decrements transiting packet hopcount (by 1)

1723	       o Copies traffic class from tunnel link to tunnel transit header

1725	       o Ignores IPv4 DF=0 and fragments at that layer upon arrival

1727	       o Fails to retain soft ingress state based on inner ICMP messages
1728	          affecting tunnel MTU

1730	       o Tunnel ingress issues ICMPs

1732	       o Fragments IPv4 over IPv6 fragments only if IPv4 DF=0
1733	          (misinterpreting the "can fragment the IPv4 packet" as
1734	          permission to fragment at the IPv6 link header)

1736	   o  IPsec tunnel mode (IP in IPsec in IP) [RFC4301] -- IP in IPsec

1738	       o Uses security policy to set, clear, or copy DF (rather than
1739	          generating it independently, which would also be more secure)

1741	       o Intertwines tunnel selection with security selection, rather
1742	          than presenting tunnel as an interface and using existing
1743	          forwarding (as with transport mode over IP-in-IP [RFC3884])

1745	   o  GRE (IP in GRE in IP or IP in GRE in UDP in IP)
1746	      [RFC2784][RFC7588][RFC7676][RFC8086]

1748	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1749	       o Requires ingress to generate ICMP errors

1751	       o Copies IPv4 DF to outer IPv4 DF

1753	       o Violates IPv6 MTU requirements when using IPv6 encapsulation

1755	   o  LISP [RFC6830]

1757	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1759	       o Requires ingress to generate ICMP errors

1761	       o Copies inner hop limit to outer

1763	   o  L2TP [RFC3931]

1765	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1767	       o Requires ingress to generate ICMP errors

1769	   o  PWE [RFC3985]

1771	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1773	       o Requires ingress to generate ICMP errors

1775	   o  GUE (Generic UDP encapsulation) [He16] - IP (et. al) in UDP in IP

1777	       o Allows inner encapsulation fragmentation

1779	   o  Geneve [RFC7364][Gr17] - IP (et al.) in Geneve in UDP in IP

1781	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1783	   o  SEAL/AERO [RFC5320][Te17] - IP in SEAL/AERO in IP

1785	       o Some issues with SEAL (MTU, ICMP), corrected in AERO

1787	   o  RTG DT encapsulations [No16]

1789	       o Assumes fragmentation can be avoided completely

1791	       o Allows encapsulation protocols that lack fragmentation

1793	       o Relies on ICMP PTB to correct for tunnel path MTU

1795	   o  No known issues
1796	       o L2VPN (framework for L2 virtualization) [RFC4664]

1798	       o L3VPN (framework for L3 virtualization) [RFC4176]

1800	       o MPLS (IP in MPLS) [RFC3031]

1802	       o TRILL (Ethernet in Ethernet) [RFC5556][RFC6325]

1804	5.3. Tunnel Protocol Designers

1806	   [To be completed]

1808	   Recursive tunneling plus a minimum MTU makes fragmentation and
1809	   reassembly inevitable, at least to split and rejoin two fragments.

1811	   Account for egress MTU/path MTU differences.

1813	   Include a stronger checksum.

1815	   Ensure the egress MTU is always larger than the path MTU.

1817	   Ensure that the egress reassembly can keep up with line rate OR
1818	   design PLPMTUD into the tunneling protocol.

1820	5.3.1. For Future Standards

1822	   [To be completed]

1824	   Larger IPv4 MTU (2K? or just 2x path MTU?) for reassembly

1826	   Always include frag support for at least two frags; do NOT try to
1827	   deprecate fragmentation.

1829	   Limit encapsulation option use/space.

1831	   Augment ICMP to have two separate messages: PTB vs P-bigger-than-
1832	   optimal

1834	   Include MTU as part of BGP as a hint - SB

1836	   Hazards of multi-MTU draft-van-beijnum-multi-mtu-04

1838	5.3.2. Diagnostics

1840	   [To be completed]

1841	   Some current implementations include diagnostics to support
1842	   monitoring the impact of tunneling, especially the impact on
1843	   fragmentation and reassembly resources, the status of path MTU
1844	   discovery, etc.

1846	   >> Because a tunnel ingress/egress is a network interface, it SHOULD
1847	   have resources similar to those of any other network interface. This
1848	   includes resources for packet processing as well as monitoring.

1850	5.4. Tunnel Implementers

1852	   [To be completed]

1854	   Detect when the egress MTU is exceeded.

1856	   Detect when the egress MTU drops below the required minimum and shut
1857	   down the tunnel if that happens - configuring the tunnel down and
1858	   issuing a hard error may be the only way to detect this anomaly, and
1859	   it's sufficiently important that the tunnel SHOULD be disabled. This
1860	   is always better than blindly assuming the tunnel has been deployed
1861	   correctly, i.e., that the solution has been engineered.

1863	   Do NOT decrement the TTL as part of being a tunnel. It's always
1864	   already OK for a router to decrement the TTL based on different next-
1865	   hop routers, but TTL is a property of a router not a link.

1867	5.5. Tunnel Operators

1869	   [To be completed]

1871	   Keep the difference between "enforced by operators" vs. "enforced by
1872	   active protocol mechanism" in mind. It's fine to assume something the
1873	   tunnel cannot or does not test, as long as you KNOW you can assume
1874	   it. When the assumption is wrong, it will NOT be signaled by the
1875	   tunnel.

1880	   Consider the circuit breakers doc to provide diagnostics and last-
1881	   resort control to avoid overload for non-reactive traffic (see
1882	   Gorry's RFC-to-be)

1884	   Do NOT decrement the TTL as part of being a tunnel. It's always
1885	   already OK for a router to decrement the TTL based on different next-
1886	   hop routers, but TTL is a property of a router not a link.

1888	   >>>> PLPMTUD can give multiple conflicting PMTU values during ECMP or
1889	   LAG if PMTU is cached per endpoint pair rather than per flow -- but
1890	   so can PMTUD! This is another reason why ICMP should never drive up
1891	   the effective MTU (if aggregate, treat as the minimum of received
1892	   messages over an interval).
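   The aggregation rule in the preceding note can be sketched in Python. The class PtbAggregator, its interval parameter, and the injectable clock are hypothetical. The sketch shows only two properties: within an interval, the minimum reported value wins, and reports never raise the effective MTU. Raising it again is left to an active probing mechanism such as PLPMTUD [RFC4821].

```python
import time
from typing import Callable, List, Tuple


class PtbAggregator:
    """Hypothetical helper: min of reported MTUs over a sliding interval."""

    def __init__(self, interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self.reports: List[Tuple[float, int]] = []

    def report(self, mtu: int) -> None:
        """Record an MTU value carried in a received ICMP PTB message."""
        self.reports.append((self.clock(), mtu))

    def effective_mtu(self, current: int) -> int:
        """Return the new effective MTU; reports can lower it, never raise it."""
        now = self.clock()
        self.reports = [(t, m) for t, m in self.reports
                        if now - t <= self.interval_s]
        if not self.reports:
            return current
        return min(current, min(m for _, m in self.reports))


now = [0.0]
agg = PtbAggregator(interval_s=10.0, clock=lambda: now[0])
agg.report(1400)
agg.report(1450)
mtu = agg.effective_mtu(1500)  # 1400: minimum over the interval
now[0] = 20.0
agg.report(1480)
mtu = agg.effective_mtu(mtu)   # still 1400: a larger report does not raise it
```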

1894	6. Security Considerations

1896	   Tunnels may introduce vulnerabilities or add to the potential for
1897	   receiver overload and thus DoS attacks. These issues arise primarily
1898	   because a tunnel is a link that traverses a network path, and from
1899	   fragmentation and reassembly. ICMP signal translation introduces a
1900	   new security issue and must be done with care. ICMP generation at
1901	   the router or host attached to a tunnel is already covered by
1902	   existing requirements (e.g., it should be throttled).

1904	   Tunnels traverse multiple hops of a network path from ingress to
1905	   egress. Traffic along such tunnels may be susceptible to on-path and
1906	   off-path attacks, including fragment injection, reassembly buffer
1907	   overload, and ICMP attacks. Some of these attacks may be less
1908	   visible to the endpoints of the architecture into which tunnels are
1909	   deployed, and may thus be more difficult to detect.

1911	   Fragmentation at routers or hosts attached to tunnels may place an
1912	   undue burden on receivers where traffic is not sufficiently diffuse,
1913	   because tunnels may induce source fragmentation at hosts, and path
1914	   fragmentation (for IPv4 DF=0), more often than other links do.
1915	   Care should be taken to avoid this situation, notably by ensuring
1916	   that tunnel MTUs are not significantly different from other link
1917	   MTUs.

1919	   Tunnel ingresses emitting IP datagrams MUST obey all existing IP
1920	   requirements, such as the uniqueness of the IP ID field [RFC6864].
1921	   Failure to either limit encapsulation traffic or use additional
1922	   ingress/egress IP addresses can result in fragments of high-speed
1923	   traffic being incorrectly reassembled [RFC4963].
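   The following Python arithmetic illustrates why high-speed tunnel traffic strains the 16-bit IPv4 ID field. The function name is hypothetical. It assumes that traffic is spread evenly across the given number of ingress/egress address pairs, each with its own ID space, and that every packet consumes a new ID. At 1 Gb/s with 1500-byte packets, a single address pair exhausts the ID space in about 0.79 seconds. That is far less than the 60 to 120 second reassembly timeout recommended in [RFC1122], so stale fragments can be matched with new ones.

```python
IPV4_ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits


def id_wrap_time_s(rate_bps: float, packet_bytes: int,
                   address_pairs: int = 1) -> float:
    """Seconds until each ingress/egress address pair cycles through all IDs.

    Assumes traffic is split evenly across address pairs and that every
    packet consumes a new ID.
    """
    packets_per_s_per_pair = rate_bps / (packet_bytes * 8) / address_pairs
    return IPV4_ID_SPACE / packets_per_s_per_pair


print(round(id_wrap_time_s(1e9, 1500), 3))                   # 0.786
print(round(id_wrap_time_s(1e9, 1500, address_pairs=4), 3))  # 3.146
```

   Using additional address pairs stretches the wrap time proportionally, but it does not by itself approach the reassembly timeout at these rates.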

1925	   Tunnels are susceptible to attacks at both the inner and outer
1926	   network layers. The tunnel ingress/egress endpoints appear as network
1927	   interfaces in the outer network, and are as susceptible as any other
1928	   network interface. This includes vulnerability to fragmentation
1929	   reassembly overload, traffic overload, and spoofed ICMP messages that
1930	   misreport the state of those interfaces. Similarly, the
1931	   ingress/egress appear as hosts to the path traversed by the tunnel,
1932	   and thus are as susceptible to attacks as any other host.

1934	   [management?]

1936	   [Access control?]

1938	   describe relationship to [RFC6169] - JT (as per INTAREA meeting
1939	   notes, don't cover Teredo-specific issues in RFC6169, but include
1940	   generic issues here)

1942	7. IANA Considerations

1944	   This document has no IANA considerations.

1946	   The RFC Editor should remove this section prior to publication.

1948	8. References

1950	8.1. Normative References

1952	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1953	             Requirement Levels", BCP 14, RFC 2119, March 1997.

1955	   [are there others? 3819? ECN? Flow label issues?]

1957	8.2. Informative References

1959	   [Cl88]    Clark, D., "The design philosophy of the DARPA internet
1960	             protocols," Proc. Sigcomm 1988, pp. 106-114, 1988.

1962	   [Er94]    Eriksson, H., "MBone: The Multicast Backbone,"
1963	             Communications of the ACM, Aug. 1994, pp. 54-60.

1965	   [Gr17]    Gross, J. (Ed.), I. Ganga (Ed.), T. Sridhar (Ed.), "Geneve:
1966	             Generic Network Virtualization Encapsulation," draft-ietf-
1967	             nvo3-geneve-04, Mar. 2017.

1969	   [He16]    Herbert, T., L. Yong, O. Zia, "Generic UDP Encapsulation,"
1970	             draft-ietf-nvo3-gue-05, Oct. 2016.

1972	   [Ke95]    Kent, S., J. Mogul, "Fragmentation considered harmful," ACM
1973	             Sigcomm Computer Communication Review (CCR), V25 N1, Jan.
1974	             1995, pp. 75-87.

1976	   [No16]    Nordmark, E. (Ed.), A. Tian, J. Gross, J. Hudson, L.
1977	             Kreeger, P. Garg, P. Thaler, T. Herbert, "Encapsulation
1978	             Considerations," draft-ietf-rtgwg-dt-encap-02, Oct. 2016.

1980	   [RFC5]    Rulifson, J., "Decode Encode Language (DEL)," RFC 5, June
1981	             1969.

1983	   [RFC768]  Postel, J., "User Datagram Protocol," RFC 768, Aug. 1980.

1985	   [RFC791]  Postel, J., "Internet Protocol," RFC 791 / STD 5, September
1986	             1981.

1988	   [RFC792]  Postel, J., "Internet Control Message Protocol," RFC 792,
1989	             Sep. 1981.

1991	   [RFC793]  Postel, J., "Transmission Control Protocol," RFC 793, Sept.
1992	             1981.

1994	   [RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol -- or
1995	             -- Converting Network Protocol Addresses to 48-bit Ethernet
1996	             Address for Transmission on Ethernet Hardware," RFC 826,
1997	             Nov. 1982.

1999	   [RFC1075] Waitzman, D., C. Partridge, S. Deering, "Distance Vector
2000	             Multicast Routing Protocol," RFC 1075, Nov. 1988.

2002	   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
2003	             Communication Layers," RFC 1122 / STD 3, October 1989.

2005	   [RFC1191] Mogul, J., S. Deering, "Path MTU discovery," RFC 1191,
2006	             November 1990.

2008	   [RFC1812] Baker, F., "Requirements for IP Version 4 Routers," RFC
2009	             1812, June 1995.

2011	   [RFC1853] Simpson, W., "IP in IP Tunneling," RFC 1853, Oct. 1995.

2013	   [RFC1981] McCann, J., S. Deering, J. Mogul, "Path MTU Discovery for
2014	             IP version 6," RFC 1981, Aug. 1996.

2016	   [RFC2003] Perkins, C., "IP Encapsulation within IP," RFC 2003, Oct.
2017	             1996.

2019	   [RFC2225] Laubach, M., J. Halpern, "Classical IP and ARP over ATM,"
2020	             RFC 2225, Apr. 1998.

2022	   [RFC2460] Deering, S., R. Hinden, "Internet Protocol, Version 6
2023	             (IPv6) Specification," RFC 2460, Dec. 1998.

2025	   [RFC2473] Conta, A., S. Deering, "Generic Packet Tunneling in IPv6
2026	             Specification," RFC 2473, Dec. 1998.

2028	   [RFC2529] Carpenter, B., C. Jung, "Transmission of IPv6 over IPv4
2029	             Domains without Explicit Tunnels," RFC 2529, Mar. 1999.

2031	   [RFC2784] Farinacci, D., T. Li, S. Hanks, D. Meyer, P. Traina,
2032	             "Generic Routing Encapsulation (GRE)", RFC 2784, March
2033	             2000.

2035	   [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery," RFC
2036	             2923, September 2000.

2038	   [RFC2983] Black, D., "Differentiated Services and Tunnels," RFC 2983,
2039	             Oct. 2000.

2041	   [RFC2991] Thaler, D., C. Hopps, "Multipath Issues in Unicast and
2042	             Multicast Next-Hop Selection," RFC 2991, Nov. 2000.

2047	   [RFC2546] Durand, A., B. Buclin, "6bone Routing Practice," RFC 2546,
2048	             Mar. 1999.

2050	   [RFC3031] Rosen, E., A. Viswanathan, R. Callon, "Multiprotocol Label
2051	             Switching Architecture", RFC 3031, January 2001.

2053	   [RFC3819] Karn, P., Ed., C. Bormann, G. Fairhurst, D. Grossman, R.
2054	             Ludwig, J. Mahdavi, G. Montenegro, J. Touch, L. Wood,
2055	             "Advice for Internet Subnetwork Designers," RFC 3819 / BCP
2056	             89, July 2004.

2058	   [RFC3884] Touch, J., L. Eggert, Y. Wang, "Use of IPsec Transport Mode
2059	             for Dynamic Routing," RFC 3884, September 2004.

2061	   [RFC3931] Lau, J., Ed., M. Townsley, Ed., I. Goyret, Ed., "Layer Two
2062	             Tunneling Protocol - Version 3 (L2TPv3)," RFC 3931, March
2063	             2005.

2065	   [RFC3985] Bryant, S., P. Pate (Eds.), "Pseudo Wire Emulation Edge-to-
2066	             Edge (PWE3) Architecture", RFC 3985, March 2005.

2068	   [RFC4176] El Mghazli, Y., Ed., T. Nadeau, M. Boucadair, K. Chan, A.
2069	             Gonguet, "Framework for Layer 3 Virtual Private Networks
2070	             (L3VPN) Operations and Management," RFC 4176, October 2005.

2072	   [RFC4301] Kent, S., and K. Seo, "Security Architecture for the
2073	             Internet Protocol," RFC 4301, December 2005.

2075	   [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion
2076	             Control Protocol (DCCP)," RFC 4340, Mar. 2006.

2078	   [RFC4443] Conta, A., S. Deering, M. Gupta (Ed.), "Internet Control
2079	             Message Protocol (ICMPv6) for the Internet Protocol Version
2080	             6 (IPv6) Specification," RFC 4443, Mar. 2006.

2082	   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
2083	             Network Tunneling," RFC 4459, April 2006.

2085	   [RFC4541] Christensen, M., K. Kimball, F. Solensky, "Considerations
2086	             for Internet Group Management Protocol (IGMP) and Multicast
2087	             Listener Discovery (MLD) Snooping Switches," RFC 4541, May
2088	             2006.

2090	   [RFC4664] Andersson, L., Ed., E. Rosen, Ed., "Framework for Layer 2
2091	             Virtual Private Networks (L2VPNs)," RFC 4664, September
2092	             2006.

2094	   [RFC4821] Mathis, M., J. Heffner, "Packetization Layer Path MTU
2095	             Discovery," RFC 4821, March 2007.

2097	   [RFC4861] Narten, T., E. Nordmark, W. Simpson, H. Soliman, "Neighbor
2098	             Discovery for IP version 6 (IPv6)," RFC 4861, Sept. 2007.

2100	   [RFC4960] Stewart, R. (Ed.), "Stream Control Transmission Protocol,"
2101	             RFC 4960, Sep. 2007.

2103	   [RFC4963] Heffner, J., M. Mathis, B. Chandler, "IPv4 Reassembly
2104	             Errors at High Data Rates," RFC 4963, July 2007.

2106	   [RFC5214] Templin, F., T. Gleeson, D. Thaler, "Intra-Site Automatic
2107	             Tunnel Addressing Protocol (ISATAP)," RFC 5214, Mar. 2008.

2109	   [RFC5320] Templin, F., Ed., "The Subnetwork Encapsulation and
2110	             Adaptation Layer (SEAL)," RFC 5320, Feb. 2010.

2112	   [RFC5556] Touch, J., R. Perlman, "Transparently Interconnecting Lots
2113	             of Links (TRILL): Problem and Applicability Statement," RFC
2114	             5556, May 2009.

2116	   [RFC5944] Perkins, C., Ed., "IP Mobility Support for IPv4, Revised"
2117	             RFC 5944, Nov. 2010.

2119	   [RFC6040] Briscoe, B., "Tunneling of Explicit Congestion
2120	             Notification," RFC 6040, Nov. 2010.

2122	   [RFC6169] Krishnan, S., D. Thaler, J. Hoagland, "Security Concerns
2123	             With IP Tunneling," RFC 6169, Apr. 2011.

2125	   [RFC6325] Perlman, R., D. Eastlake, D. Dutt, S. Gai, A. Ghanwani,
2126	             "Routing Bridges (RBridges): Base Protocol Specification,"
2127	             RFC 6325, July 2011.

2129	   [RFC6434] Jankiewicz, E., J. Loughney, T. Narten, "IPv6 Node
2130	             Requirements," RFC 6434, Dec. 2011.

2132	   [RFC6438] Carpenter, B., S. Amante, "Using the IPv6 Flow Label for
2133	             Equal Cost Multipath Routing and Link Aggregation in
2134	             Tunnels," RFC 6438, Nov. 2011.

2136	   [RFC6807] Farinacci, D., G. Shepherd, S. Venaas, Y. Cai, "Population
2137	             Count Extensions to Protocol Independent Multicast (PIM),"
2138	             RFC 6807, Dec. 2012.

2140	   [RFC6830] Farinacci, D., V. Fuller, D. Meyer, D. Lewis, "The
2141	             Locator/ID Separation Protocol," RFC 6830, Jan. 2013.

2143	   [RFC6833] Fuller, V., D. Farinacci, "Locator/ID Separation Protocol
2144	             (LISP) Map-Server Interface," RFC 6833, Jan. 2013.

2146	   [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field,"
2147	             Proposed Standard, RFC 6864, Feb. 2013.

2149	   [RFC6935] Eubanks, M., P. Chimento, M. Westerlund, "IPv6 and UDP
2150	             Checksums for Tunneled Packets," RFC 6935, Apr. 2013.

2152	   [RFC6936] Fairhurst, G., M. Westerlund, "Applicability Statement for
2153	             the Use of IPv6 UDP Datagrams with Zero Checksums," RFC
2154	             6936, Apr. 2013.

2156	   [RFC6946] Gont, F., "Processing of IPv6 "Atomic" Fragments," RFC
2157	             6946, May 2013.

2159	   [RFC7364] Narten, T., E. Gray, D. Black, L. Fang, L. Kreeger, M.
2160	             Napierala, "Problem Statement: Overlays for Network
2161	             Virtualization", RFC 7364, Oct. 2014.

2163	   [RFC7450] Bumgardner, G., "Automatic Multicast Tunneling," RFC 7450,
2164	             Feb. 2015.

2166	   [RFC7510] Xu, X., N. Sheth, L. Yong, R. Callon, D. Black,
2167	             "Encapsulating MPLS in UDP," RFC 7510, April 2015.

2169	   [RFC7588] Bonica, R., C. Pignataro, J. Touch, "A Widely-Deployed
2170	             Solution to the Generic Routing Encapsulation Fragmentation
2171	             Problem," RFC 7588, July 2015.

2173	   [RFC7676] Pignataro, C., R. Bonica, S. Krishnan, "IPv6 Support for
2174	             Generic Routing Encapsulation (GRE)," RFC 7676, Oct 2015.

2176	   [RFC7690] Byerly, M., M. Hite, J. Jaeggli, "Close Encounters of the
2177	             ICMP Type 2 Kind (Near Misses with ICMPv6 Packet Too Big
2178	             (PTB))," RFC 7690, Jan. 2016.

2180	   [RFC7761] Fenner, B., M. Handley, H. Holbrook, I. Kouvelas, R.
2181	             Parekh, Z. Zhang, L. Zheng, "Protocol Independent Multicast
2182	             - Sparse Mode (PIM-SM): Protocol Specification (Revised),"
2183	             RFC 7761, Mar. 2016.

2185	   [RFC8085] Eggert, L., G. Fairhurst, G. Shepherd, "Unicast UDP Usage
2186	             Guidelines," RFC 8085, Oct. 2015.

2188	   [RFC8086] Yong, L. (Ed.), E. Crabbe, X. Xu, T. Herbert, "GRE-in-UDP
2189	             Encapsulation," RFC 8086, Feb. 2017.

2191	   [Sa84]    Saltzer, J., D. Reed, D. Clark, "End-to-end arguments in
2192	             system design," ACM Trans. on Computing Systems, Nov. 1984.

2194	   [Te17]    Templin, F., "Asymmetric Extended Route Optimization,"
2195	             draft-templin-aerolink-75, May 2017.

2197	   [To01]    Touch, J., "Dynamic Internet Overlay Deployment and
2198	             Management Using the X-Bone," Computer Networks, July 2001,
2199	             pp. 117-135.

2201	   [To03]    Touch, J., Y. Wang, L. Eggert, G. Finn, "Virtual Internet
2202	             Architecture," USC/ISI Tech. Report ISI-TR-570, Aug. 2003.

2204	   [To16]    Touch, J., "Middleboxes Models Compatible with the
2205	             Internet," USC/ISI Tech. Report ISI-TR-711, Oct. 2016.

2207	   [To98]    Touch, J., S. Hotz, "The X-Bone," Proc. Globecom Third
2208	             Global Internet Mini-Conference, Nov. 1998.

2210	   [Zi80]    Zimmermann, H., "OSI Reference Model - The ISO Model of
2211	             Architecture for Open Systems Interconnection," IEEE Trans.
2212	             on Comm., Apr. 1980.

2214	9. Acknowledgments

2216	   This document originated as the result of numerous discussions among
2217	   the authors, Jari Arkko, Stuart Bryant, Lars Eggert, Ted Faber, Gorry
2218	   Fairhurst, Dino Farinacci, Matt Mathis, and Fred Templin. It
2219	   benefitted substantially from detailed feedback from Toerless Eckert,
2220	   Vincent Roca, and Lucy Yong, as well as other members of the Internet
2221	   Area Working Group.

2223	   This work is partly supported by USC/ISI's Postel Center.

2225	   This document was prepared using 2-Word-v2.0.template.dot.

2227	Authors' Addresses

2229	   Joe Touch
2230	   USC/ISI
2231	   4676 Admiralty Way
2232	   Marina del Rey, CA 90292-6695
2233	   U.S.A.

2235	   Phone: +1 (310) 448-9151
2236	   Email: touch@isi.edu

2238	   W. Mark Townsley
2239	   Cisco
2240	   L'Atlantis, 11, Rue Camille Desmoulins
2241	   Issy Les Moulineaux, ILE DE FRANCE 92782

2243	   Email: townsley@cisco.com

2245	APPENDIX A: Fragmentation efficiency

2247	A.1. Selecting fragment sizes

2249	   There are different ways to fragment a packet. Consider a network
2250	   with a PMTU as shown in Figure 16, where packets are encapsulated
2251	   over the same network layer as they arrive on (e.g., IP in IP). If a
2252	   packet as large as the PMTU arrives, it must be fragmented to
2253	   accommodate the additional header.

2255	         X===========================X (transit PMTU)
2256	         +----+----------------------+
2257	         | iH | DDDDDDDDDDDDDDDDDDDD |
2258	         +----+----------------------+
2259	           |
2260	           |  X===========================X (tunnel 1 MTU)
2261	           |  +---+----+------------------+
2262	       (a) +->| H'| iH | DDDDDDDDDDDDDDDD |
2263	           |  +---+----+------------------+
2264	           |      |
2265	           |      |  X===========================X (tunnel 2 MTU)
2266	           |      |  +----+---+----+-------------+
2267	           | (a1) +->| nH'| H'| iH | DDDDDDDDDDD |
2268	           |      |  +----+---+----+-------------+
2269	           |      |
2270	           |      |  +----+-------+
2271	           | (a2) +->| nH"| DDDDD |
2272	           |         +----+-------+
2273	           |
2274	           |  +---+------+
2275	       (b) +->| H"| DDDD |
2276	              +---+------+
2277	                  |
2278	                  |  +----+---+------+
2279	             (b1) +->| nH'| H"| DDDD |
2280	                     +----+---+------+

2282	                   Figure 16 Fragmenting via maximum fit

2284	   Figure 16 shows this process using "maximum fit", assuming outer
2285	   fragmentation as an example (the situation is the same for inner
2286	   fragmentation, but the headers that are affected differ). In maximum
2287	   fit, the arriving packet is split into (a) and (b), where (a) is as
2288	   large as the tunnel 1 MTU allows (the maximum that fits over the
2289	   first tunnel). However, the first tunnel then traverses another
2290	   tunnel (tunnel 2), whose additional header the tunnel 1 ingress has
2291	   not accounted for. Packet (a) arrives at the tunnel 2 ingress, where
2292	   it must be encapsulated again and also fragmented, into (a1) and
2293	   (a2), to fit within the tunnel 2 MTU. Packet (b), in contrast,
2294	   arrives at the tunnel 2 ingress and is encapsulated into (b1)
2295	   without fragmentation, because it is small enough to fit within the
2296	   tunnel 2 MTU even with the added header.

2298	   In Figure 17, the fragmentation is done using "even split", i.e., by
2299	   splitting the original packet into two roughly equal-sized
2300	   components, (c) and (d). Note that (d) contains more packet data
2301	   because, in this example of outer fragmentation, (c) also carries the
2302	   original packet header. Packets (c) and (d) arrive at the tunnel 2
2303	   ingress and are encapsulated again; this time, neither packet
2304	   exceeds the tunnel 2 MTU, and neither requires further
2305	   fragmentation.

2307	         X===========================X (transit PMTU)
2308	         +----+----------------------+
2309	         | iH | DDDDDDDDDDDDDDDDDDDD |
2310	         +----+----------------------+
2311	           |
2312	           |  X===========================X (tunnel 1 MTU)
2313	           |  +---+----+----------+
2314	       (c) +->| H'| iH | DDDDDDDD |
2315	           |  +---+----+----------+
2316	           |      |
2317	           |      |  X===========================X (tunnel 2 MTU)
2318	           |      |  +----+---+----+----------+
2319	           | (c1) +->| nH | H'| iH | DDDDDDDD |
2320	           |         +----+---+----+----------+
2321	           |
2322	           |  +---+--------------+
2323	       (d) +->| H"| DDDDDDDDDDDD |
2324	              +---+--------------+
2325	                  |
2326	                  |  +----+---+--------------+
2327	             (d1) +->| nH | H"| DDDDDDDDDDDD |
2328	                     +----+---+--------------+

2330	                  Figure 17 Fragmenting via "even split"
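   The two strategies in Figures 16 and 17 can be compared with a small Python model. It is a sketch under simplifying assumptions:
   - Sizes are in bytes.
   - Each fragment repeats the link header, as in IPv4; the 8-byte IPv6 Fragment header is ignored.
   - All but the last fragment carry a multiple of 8 bytes.
   - Both tunnels add 20-byte headers.
   - The same 1500-byte limit applies beneath tunnel 2, and the tunnel 1 ingress does not account for it.
   - The tunnel 2 ingress uses maximum fit in both cases.
   The function names are hypothetical.

```python
def align_down8(n: int) -> int:
    return n - n % 8


def align_up8(n: int) -> int:
    return -(-n // 8) * 8


def fragment(payload: int, header: int, mtu: int, strategy: str) -> list[int]:
    """Sizes of the link packets produced by outer fragmentation of `payload`.

    Each fragment repeats the `header`-byte link header; all but the last
    fragment carry a multiple of 8 payload bytes.
    """
    if header + payload <= mtu:
        return [header + payload]
    max_chunk = align_down8(mtu - header)
    if strategy == "max_fit":
        chunk = max_chunk                          # fill each fragment fully
    elif strategy == "even_split":
        count = -(-payload // max_chunk)           # fewest fragments that work
        chunk = align_up8(-(-payload // count))    # roughly equal shares
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    sizes = []
    while payload > 0:
        take = min(chunk, payload)
        sizes.append(header + take)
        payload -= take
    return sizes


def through_two_tunnels(transit_len: int, h1: int, h2: int,
                        path_mtu: int, strategy: str) -> list[int]:
    # Tunnel 1's ingress sizes fragments for path_mtu, unaware of tunnel 2.
    after_tunnel1 = fragment(transit_len, h1, path_mtu, strategy)
    result = []
    for pkt in after_tunnel1:
        # Tunnel 2's ingress encapsulates each piece, fragmenting if needed.
        result.extend(fragment(pkt, h2, path_mtu, "max_fit"))
    return result


print(through_two_tunnels(1500, 20, 20, 1500, "max_fit"))     # [1500, 40, 60]
print(through_two_tunnels(1500, 20, 20, 1500, "even_split"))  # [792, 788]
```

   Under these assumptions, maximum fit yields three link packets after tunnel 2, one of them a nearly empty 60-byte packet. Even split yields two.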

2332	A.2. Packing

2334	   Encapsulating individual packets to traverse a tunnel can be
2335	   inefficient, especially where headers are large relative to the
2336	   packets being carried. In that case, it can be more efficient to
2337	   encapsulate many small packets in a single, larger tunnel payload.

2339	   This technique, similar in effect to packet bursting in Gigabit
2340	   Ethernet (regardless of whether the burst is delimited using L2
2341	   symbols), reduces the overhead of the encapsulation headers
2342	   (Figure 18). It reduces the work of header addition and removal at
2343	   the tunnel endpoints, but increases other work involving the packing
2344	   and unpacking of the component packets carried.

2346	                     +-----+-----+
2347	                     | iHa | iDa |
2348	                     +-----+-----+
2349	                           |
2350	                           |     +-----+-----+
2351	                           |     | iHb | iDb |
2352	                           |     +-----+-----+
2353	                           |           |
2354	                           |           |     +-----+-----+
2355	                           |           |     | iHc | iDc |
2356	                           |           |     +-----+-----+
2357	                           |           |           |
2358	                           v           v           v
2359	                +----+-----+-----+-----+-----+-----+-----+
2360	                | oH | iHa | iDa | iHb | iDb | iHc | iDc |
2361	                +----+-----+-----+-----+-----+-----+-----+

2363	                  Figure 18 Packing packets into a tunnel
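   A greedy version of the packing shown in Figure 18 can be sketched in Python. The function name, the greedy in-order policy, and the example sizes are illustrative assumptions. The sketch also assumes that each inner header carries its own length, as IP headers do, so the egress can find packet boundaries without extra framing. It ignores the delay an ingress would incur while waiting to fill a tunnel packet.

```python
def pack(inner_sizes: list[int], outer_header: int, mtu: int) -> list[list[int]]:
    """Greedily pack whole inner packets, in arrival order, into tunnel packets."""
    bundles: list[list[int]] = []
    current: list[int] = []
    used = outer_header
    for size in inner_sizes:
        if outer_header + size > mtu:
            raise ValueError(f"{size}-byte packet cannot fit even on its own")
        if used + size > mtu:
            # Current tunnel packet is full: emit it and start a new one.
            bundles.append(current)
            current, used = [], outer_header
        current.append(size)
        used += size
    if current:
        bundles.append(current)
    return bundles


bundles = pack([100] * 30, outer_header=40, mtu=1500)
print([len(b) for b in bundles])  # [14, 14, 2]
```

   Consider 30 inner packets of 100 bytes each, with a 40-byte outer header and a 1500-byte limit on tunnel link packets. They fit into three tunnel packets carrying 14, 14, and 2 inner packets. The outer headers therefore cost 120 bytes instead of the 1200 bytes of per-packet encapsulation.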