idnits 2.17.1 

draft-ietf-intarea-tunnels-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  -- The draft header indicates that this document updates RFC4459, but the
     abstract doesn't seem to directly say this.  It does mention RFC4459
     though, so this could be OK.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC4459, updated by this document, for
     RFC5378 checks: 2004-06-14)

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 13, 2017) is 2601 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-03

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 6434
     (Obsoleted by RFC 8504)

  -- Obsolete informational reference (is this intentional?): RFC 6830
     (Obsoleted by RFC 9300, RFC 9301)

  -- Obsolete informational reference (is this intentional?): RFC 6833
     (Obsoleted by RFC 9301)

  == Outdated reference: A later version (-82) exists of
     draft-templin-aerolink-74


     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1	Internet Area WG                                               J. Touch
2	Internet Draft                                                  USC/ISI
3	Intended status: Informational                              M. Townsley
4	Updates: 4459                                                     Cisco
5	Expires: September 2017                                  March 13, 2017

7	                  IP Tunnels in the Internet Architecture
8	                     draft-ietf-intarea-tunnels-04.txt

10	Status of this Memo

12	   This Internet-Draft is submitted in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   This document may contain material from IETF Documents or IETF
16	   Contributions published or made publicly available before November
17	   10, 2008. The person(s) controlling the copyright in some of this
18	   material may not have granted the IETF Trust the right to allow
19	   modifications of such material outside the IETF Standards Process.
20	   Without obtaining an adequate license from the person(s) controlling
21	   the copyright in such materials, this document may not be modified
22	   outside the IETF Standards Process, and derivative works of it may
23	   not be created outside the IETF Standards Process, except to format
24	   it for publication as an RFC or to translate it into languages other
25	   than English.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF), its areas, and its working groups.  Note that
29	   other groups may also distribute working documents as Internet-
30	   Drafts.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   The list of current Internet-Drafts can be accessed at
38	   http://www.ietf.org/ietf/1id-abstracts.txt

40	   The list of Internet-Draft Shadow Directories can be accessed at
41	   http://www.ietf.org/shadow.html

43	   This Internet-Draft will expire on September 13, 2017.

45	Copyright Notice

47	   Copyright (c) 2017 IETF Trust and the persons identified as the
48	   document authors. All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document. Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document. Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Abstract

62	   This document discusses the role of IP tunnels in the Internet
63	   architecture. An IP tunnel transits IP datagrams as payloads in non-
64	   link layer protocols. This document explains the relationship of IP
65	   tunnels to existing protocol layers and the challenges in supporting
66	   IP tunneling, based on the equivalence of tunnels to links. The
67	   implications of this document are used to derive recommendations that
68	   update MTU and fragment issues in RFC 4459.

70	Table of Contents

72	   1. Introduction
73	   2. Conventions used in this document
74	      2.1. Key Words
75	      2.2. Terminology
76	   3. The Tunnel Model
77	      3.1. What is a Tunnel?
78	      3.2. View from the Outside
79	      3.3. View from the Inside
80	      3.4. Location of the Ingress and Egress
81	      3.5. Implications of This Model
82	      3.6. Fragmentation
83	         3.6.1. Outer Fragmentation
84	         3.6.2. Inner Fragmentation
85	         3.6.3. The Necessity of Outer Fragmentation
86	   4. IP Tunnel Requirements
87	      4.1. Encapsulation Header Issues
88	         4.1.1. General Principles of Header Fields Relationships
89	         4.1.2. Addressing Fields
90	         4.1.3. Hop Count Fields
91	         4.1.4. IP Fragment Identification Fields
92	         4.1.5. Checksums
93	      4.2. MTU Issues
94	         4.2.1. Minimum MTU Considerations
95	         4.2.2. Fragmentation
96	         4.2.3. Path MTU Discovery
97	      4.3. Coordination Issues
98	         4.3.1. Signaling
99	         4.3.2. Congestion
100	         4.3.3. Multipoint Tunnels and Multicast
101	         4.3.4. Load Balancing
102	         4.3.5. Recursive Tunnels
103	   5. Observations
104	      5.1. Summary of Recommendations
105	      5.2. Impact on Existing Encapsulation Protocols
106	      5.3. Tunnel Protocol Designers
107	         5.3.1. For Future Standards
108	         5.3.2. Diagnostics
109	      5.4. Tunnel Implementers
110	      5.5. Tunnel Operators
111	   6. Security Considerations
112	   7. IANA Considerations
113	   8. References
114	      8.1. Normative References
115	      8.2. Informative References
116	   9. Acknowledgments
117	   APPENDIX A: Fragmentation efficiency
118	      A.1. Selecting fragment sizes
119	      A.2. Packing

121	1. Introduction

123	   The Internet layering architecture is loosely based on the ISO seven
124	   layer stack, in which data units traverse the stack by being wrapped
125	   inside data units of the next layer down [Cl88][Zi80]. A tunnel is a
126	   mechanism for transmitting data units between endpoints by wrapping
127	   them as data units of the same or higher layers, e.g., IP in IP
128	   (Figure 1) or IP in UDP (Figure 2).

130	                        +----+----+--------------+
131	                        | IP'| IP |     Data     |
132	                        +----+----+--------------+

134	                           Figure 1 IP inside IP

136	                     +----+-----+----+--------------+
137	                     | IP'| UDP | IP |     Data     |
138	                     +----+-----+----+--------------+

140	                        Figure 2 IP in UDP in IP
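
   As an illustration of the wrapping shown in Figure 1, the following
   Python sketch prepends an outer IPv4 header (IP') to an existing
   IPv4 packet. It is not taken from any tunnel specification: the
   function names, the fixed TTL of 64, and the choice to leave the
   Identification field at zero are assumptions made for this example,
   and a real ingress would also need to handle options, the DF bit,
   and fragmentation.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4 encapsulated in IPv4


def ipv4_checksum(header: bytes) -> int:
    """Return the 16-bit one's-complement checksum of an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int,
                ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header with no options (illustrative only)."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, ttl, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    # First pass with a zero checksum, second pass with the real value.
    fields[7] = ipv4_checksum(struct.pack("!BBHHHBBH4s4s", *fields))
    return struct.pack("!BBHHHBBH4s4s", *fields)


def encapsulate(inner_packet: bytes, ingress: str, egress: str) -> bytes:
    """Prepend the outer header IP' to a complete inner IP packet."""
    outer = ipv4_header(ingress, egress, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet
```

   In this sketch the inner packet is carried unmodified as the payload
   of the outer packet, so each level of encapsulation grows the packet
   by exactly one outer header (20 bytes here).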

142	   This document focuses on tunnels that transit IP packets, i.e., in
143	   which an IP packet is the payload of another protocol, other than a
144	   typical link layer. A tunnel is a virtual link that can help decouple
145	   the network topology seen by transiting packets from the underlying
146	   physical network [To98][RFC2473]. Tunnels were critical in the
147	   development of multicast because not all routers were capable of
148	   processing multicast packets [Er94]. Tunnels allowed multicast
149	   packets to transit efficiently between multicast-capable routers over
150	   paths that did not support native link-layer multicast. Similar
151	   techniques have been used to support incremental deployment of other
152	   protocols over legacy substrates, such as IPv6 [RFC2546].

154	   Use of tunnels is common in the Internet. The word "tunnel" occurs in
155	   nearly 1,500 RFCs (of nearly 8,000 current RFCs, close to 20%), and
156	   tunneling is supported within numerous protocols, including:

158	   o  IP in IP / mobile IP - IPv4 in IPv4 tunnels
159	      [RFC2003][RFC2473][RFC5944]

161	   o  IP in IPv6 - IPv6 or IPv4 in IPv6 [RFC2473]

163	   o  IPsec - includes a tunnel mode to enable encryption or
164	      authentication of an entire IP datagram inside another IP
165	      datagram [RFC4301]

167	   o  Generic Routing Encapsulation (GRE) - a shim layer for tunneling
168	      any network layer in any other network layer, as in IP in GRE in
169	      IP [RFC2784][RFC7588][RFC7676], or inside UDP in IP [RFC8086]

171	   o  MPLS - a shim layer for tunneling IP over a circuit-like path over
172	      a link layer [RFC3031] or inside UDP in IP [RFC7510], in which
173	      identifiers are rewritten on each hop, often used for traffic
174	      provisioning

176	   o  LISP - a mechanism that uses multipoint IP tunnels to reduce
177	      routing table load within an enclave of routers at the expense of
178	      more complex tunnel ingress encapsulation tables [RFC6830]

180	   o  TRILL - a mechanism that uses multipoint L2 tunnels to enable use
181	      of L3 routing (typically IS-IS) in an enclave of Ethernet bridges
182	      [RFC5556][RFC6325]

184	   o  Generic UDP Encapsulation (GUE) - IP in UDP in IP [He16]

186	   o  Automatic Multicast Tunneling (AMT) - IP in UDP in IP for
187	      multicast [RFC7450]

189	   o  L2TP - PPP over IP, to extend a subscriber's DSL/FTTH connection
190	      from an access line provider to an ISP [RFC3931]

192	   o  L2VPNs - provides a link topology different from that provided by
193	      physical links [RFC4664]; many of these are not classical tunnels,
194	      using only tags (Ethernet VLAN tags) rather than encapsulation

196	   o  L3VPNs - provides a network topology different from that provided
197	      by ISPs [RFC4176]

199	   o  NVO3 - data center network sharing (to be determined, which may
200	      include use of GUE or other tunnels) [RFC7364]

202	   o  PWE3 - emulates wire-like services over packet-switched services
203	      [RFC3985]

205	   o  SEAL/AERO - IP in IP tunneling with an additional shim header
206	      designed to overcome the limitations of RFC2003 [RFC5320][Te16]

208	   The variety of tunnel mechanisms raises the question of the role of
209	   tunnels in the Internet architecture and the potential need for these
210	   mechanisms to have similar and predictable behavior. In particular,
211	   the ways in which packet sizes (i.e., Maximum Transmission Unit or
212	   MTU) mismatch and error signals (e.g., ICMP) are handled may benefit
213	   from a coordinated approach.

215	   Regardless of the layer in which encapsulation occurs, tunnels
216	   emulate a link. The only difference is that a link operates over a
217	   physical communication channel, whereas a tunnel operates over other
218	   software protocol layers. Because tunnels are links, they are subject
219	   to the same issues as any link, e.g., MTU discovery, signaling, and
220	   the potential utility of native support for broadcast and multicast
221	   [RFC3819]. Tunnels have some advantages over native links, being
222	   potentially easier to reconfigure and control because they can
223	   generally rely on existing out-of-band communication between their
224	   endpoints.

226	   The first attempt to use large-scale tunnels was to transit multicast
227	   traffic across the Internet in 1988, and this resulted in 'tunnel
228	   collapse'. At the time, tunnels were not implemented as
229	   encapsulation-based virtual links, but rather as loose source routes
230	   on un-encapsulated IP datagrams [RFC1075]. Then, as now, routers did
231	   not support use of the loose source route IP option at line rate, and
232	   the multicast traffic caused overload of the so-called "slow path"
233	   processing of IP datagrams in software. Using encapsulation tunnels
234	   avoided that collapse by allowing the forwarding of encapsulated
235	   packets to use the "fast path" hardware processing [Er94].

237	   The remainder of this document describes the general principles of IP
238	   tunneling and discusses the key considerations in the design of any
239	   protocol that tunnels IP datagrams. It derives its conclusions from
240	   the equivalence of tunnels and links and from requirements of
241	   existing standards for supporting IPv4 and IPv6 as payloads.

243	2. Conventions used in this document

245	2.1. Key Words

247	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
248	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
249	   document are to be interpreted as described in RFC-2119 [RFC2119].

251	   In this document, these key words will appear with that
252	   interpretation only when in ALL CAPS. Lower case uses of these words
253	   are not to be interpreted as carrying RFC-2119 significance.

255	2.2. Terminology

257	   This document uses the following terminology. Optional words in the
258	   term are indicated in parentheses, e.g., "(link or network)
259	   interface" or "egress (interface)".

261	   Terms from existing RFCs:

263	   o  Messages: variable length data labeled with globally-unique
264	      endpoint IDs, also known as a datagram for IP messages [RFC791].

266	   o  Node: a physical or logical network device that participates as
267	      either a host [RFC1122][RFC6434] or router [RFC1812]. This term
268	      originally referred to gateways since some very early RFCs [RFC5],
269	      but is currently the common way to describe a point in a network
270	      at which messages are processed.

272	   o  Host or endpoint: a node that sources or sinks messages labeled
273	      from/to its IDs, typically known as a host for both IP and higher-
274	      layer protocol messages [RFC1122].

276	   o  Source or sender: the node that generates a message [RFC1122].

278	   o  Destination or receiver: the node that consumes a message
279	      [RFC1122].

281	   o  Router or gateway: a node that relays IP messages using
282	      destination IDs and local context [RFC1812]. Routers also act as
283	      hosts when they source or sink messages. Also known as a forwarder
284	      for IP messages. Note that the notion of router is relative to the
285	      layer at which message processing is considered [To16].

287	   o  Link: a communications medium (or emulation thereof) that
288	      transfers IP messages between nodes without traversing a router
289	      (as would require decrementing the hop count) [RFC1122][RFC1812].

291	   o  (Link or network) Interface: a location on a link co-located with
292	      a node where messages depart onto that link or arrive from that
293	      link. On physical links, this interface formats the message for
294	      transmission and interprets the received signals.

296	   o  Path: a sequence of one or more links over which an IP message
297	      traverses between source and destination nodes (hosts or routers).

299	   o  (Link) MTU: the largest message that can transit a link [RFC791],
300	      also often referred to simply as "MTU". It does not include the
301	      size of link-layer information, e.g., link layer headers or
302	      trailers, i.e., it refers to the message that the link can carry
303	      as a payload rather than the message as it appears on the link.
304	      This is thus the largest network layer packet (including network
305	      layer headers, e.g., IP datagram) that can transit a link. Note
306	      that this need not be the native size of messages on the link,
307	      i.e., the link may internally fragment and reassemble messages.
308	      For IPv4, the smallest MTU must be at least 68 bytes [RFC791], and
309	      for IPv6 the smallest MTU must be at least 1280 bytes [RFC2460].

311	   o  EMTU_S (effective MTU for sending): the largest message that can
312	      transit a link, possibly also accounting for fragmentation that
313	      happens before the fragments are emitted onto the link [RFC1122].
314	      When source fragmentation is not possible, EMTU_S = (link) MTU.
315	      For IPv4, this MUST be at least 68 bytes [RFC791] and for IPv6
316	      this MUST be at least 1280 bytes [RFC2460].

318	   o  EMTU_R (effective MTU to receive): the largest payload message
319	      that a receiver must be able to accept. This thus also represents
320	      the largest message that can traverse a link, taking into account
321	      reassembly at the receiver that happens after the fragments are
322	      received [RFC1122]. For IPv4, this MUST be at least 576 bytes
323	      [RFC791] and for IPv6 this MUST be at least 1500 bytes [RFC2460].

325	   o  Path MTU (PMTU): the largest message that can transit a path of
326	      links [RFC1191][RFC1981]. Typically, this is the minimum of the
327	      link MTUs of the links of the path, and represents the largest
328	      network layer message (including network layer headers) that can
329	      transit a path without requiring fragmentation while in transit.
330	      Note that this is not the largest network packet that can be sent
331	      between a source and destination, because that network packet
332	      might have been fragmented at the network layer of the source and
333	      reassembled at the network layer of the destination (if
334	      supported).

336	   o  Tunnel: a protocol mechanism that transits messages between an
337	      ingress interface and egress interface using encapsulation to
338	      allow an existing network path to appear as a single link
339	      [RFC1853]. Note that a protocol can be used to tunnel itself (IP
340	      over IP). There is essentially no difference between a tunnel and
341	      the conventional layering of the ISO stack (i.e., by this
342	      definition, Ethernet can be considered a tunnel for IP). A tunnel
343	      is also known as a virtual link.

345	   o  Ingress (interface): the virtual link interface of a tunnel that
346	      receives messages within a node, encapsulates them according to
347	      the tunnel protocol, and transmits them into the tunnel [RFC2983].
348	      An ingress is the tunnel equivalent of the outgoing (departing)
349	      network interface of a link, and its encapsulation processing is
350	      the tunnel equivalent of encoding a message for transmission over
351	      a physical link. The ingress virtual link interface can be co-
352	      located with the traffic source.

354	      The term 'ingress' in other RFCs also refers to 'network ingress',
355	      which is the entry point of traffic to a transit network. Because
356	      this document focuses on tunnels, the term "ingress" used in the
357	      remainder of this document implies "tunnel ingress".

359	   o  Egress (interface): a virtual link interface of a tunnel that
360	      receives messages that have finished transiting a tunnel and
361	      presents them to a node [RFC2983]. For reasons similar to ingress,
362	      the term 'egress' will refer to 'tunnel egress' throughout the
363	      remainder of this document. An egress is the tunnel equivalent of
364	      the incoming (arriving) network interface of a link and its
365	      decapsulation processing is the tunnel equivalent of interpreting
366	      a signal received from a physical link. The egress decapsulates
367	      messages for further transit to the destination. The egress
368	      virtual link interface can be co-located with the traffic
369	      destination.

371	   o  Ingress node: network device on which an ingress is attached as a
372	      virtual link interface [RFC2983]. Note that a node can act as both
373	      an ingress node and an egress node at the same time, but typically
374	      only for different tunnels.

376	   o  Egress node: device where an egress is attached as a virtual link
377	      interface [RFC2983]. Note that a device can act as both an ingress
378	      node and an egress node at the same time, but typically only for
379	      different tunnels.

381	   o  Inner header: the header of the message as it arrives to the
382	      ingress [RFC2003].

384	   o  Outer header(s): the headers added to the message by the ingress,
385	      as part of the encapsulation for tunnel transit [RFC2003].

387	   o  Mid-tunnel fragmentation: Fragmentation of the message during the
388	      tunnel transit, as could occur for IPv4 datagrams with DF=0
389	      [RFC2983].

391	   o  Atomic packet or datagram: an IP packet that has not been
392	      fragmented and which cannot be fragmented further [RFC6864].
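
   The relationship between link MTU, Path MTU, and EMTU_R defined
   above can be sketched as follows. This is a simplified illustration,
   not a normative algorithm: the function names are hypothetical, the
   minimum values are those quoted in the definitions above, and the
   delivery check ignores per-fragment header overhead.

```python
# Minimum values quoted in the terminology above.
IPV4_MIN_LINK_MTU = 68     # [RFC791]
IPV6_MIN_LINK_MTU = 1280   # [RFC2460]
IPV4_MIN_EMTU_R = 576      # [RFC791]
IPV6_MIN_EMTU_R = 1500     # [RFC2460]


def path_mtu(link_mtus):
    """Path MTU is typically the minimum of the link MTUs on the path."""
    if not link_mtus:
        raise ValueError("a path consists of one or more links")
    return min(link_mtus)


def needs_fragmentation(packet_size, link_mtus):
    """True if the packet cannot transit the path as a single unit."""
    return packet_size > path_mtu(link_mtus)


def can_deliver(packet_size, link_mtus, emtu_r, source_can_fragment):
    """Simplified check: a packet larger than the PMTU can still be
    delivered if the source fragments it and the receiver is able to
    reassemble a message of that size (EMTU_R)."""
    if not needs_fragmentation(packet_size, link_mtus):
        return True
    return source_can_fragment and packet_size <= emtu_r
```

   For example, with link MTUs of 1500, 1400, and 9000 bytes the Path
   MTU is 1400 bytes, yet a 1450-byte packet can still reach a receiver
   whose EMTU_R is 1500 bytes if the source fragments it.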

394	   The following terms are introduced by this document:

396	   o  (Tunnel) transit packet: the packet arriving at a node connected
397	      to a tunnel that enters the ingress interface and exits the egress
398	      interface, i.e., the packet carried over the tunnel. This is
399	      sometimes known as the 'tunneled packet'. It is the tunnel
400	      equivalent of a network layer packet as it would traverse a
401	      link. This document focuses on IPv4 and IPv6 transit packets.

404	   o  (Tunnel) link packet: packets that traverse from ingress interface
405	      to egress interface, in which resides all or part of a transit
406	      packet. This is the tunnel equivalent of a link layer packet as it
407	      would traverse a link, which is why we use the same terminology.

409	   o  Tunnel MTU: the largest transit packet that can traverse a tunnel,
410	      i.e., the tunnel equivalent of a link MTU, which is why we use the
411	      same terminology. This is the largest transit packet which can be
412	      reassembled at the egress interface.

414	   o  Tunnel atom: the largest transit packet that can traverse a tunnel
415	      as an atomic packet, i.e., without requiring tunnel link packet
416	      fragmentation either at the ingress or on-path between the ingress
417	      and egress.

419	   o  Inner fragmentation: fragmentation of the transit packet that
420	      arrives at the ingress interface before any additional headers are
421	      added. This can only correctly occur for IPv4 DF=0 datagrams.

423	   o  Outer fragmentation: source fragmentation of the tunnel link
424	      packet after encapsulation; this can involve fragmenting the
425	      outermost header or any of the other (if any) protocol layers
426	      involved in encapsulation.

428	   o  Maximum frame size (MFS): the link-layer equivalent of the MTU,
429	      using the OSI term 'frame'. For Ethernet, the MTU (network packet
430	      size) is 1500 bytes but the MFS (link frame size) is 1518 bytes
431	      originally, and 1522 bytes assuming VLAN (802.1Q) tagging support.

433	   o  EMFS_S: the link layer equivalent of EMTU_S.

435	   o  EMFS_R: the link layer equivalent of EMTU_R.

437	   o  Path MFS: the link layer equivalent of PMTU.
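
   The arithmetic behind these terms can be sketched as follows. The
   Ethernet figures reproduce the values given above (a 14-byte header
   plus a 4-byte frame check sequence, and 4 more bytes for an 802.1Q
   tag). The tunnel_atom() and tunnel_mtu() helpers are hypothetical
   and rest on an assumption of this example: that the encapsulation
   adds a fixed overhead, so that the tunnel atom is bounded by the
   Path MTU between ingress and egress and the tunnel MTU by what the
   egress can reassemble.

```python
ETHERNET_HEADER = 14   # destination, source, EtherType
ETHERNET_FCS = 4       # frame check sequence
VLAN_TAG = 4           # 802.1Q tag


def ethernet_mfs(mtu=1500, vlan=False):
    """Maximum frame size for a given Ethernet MTU."""
    return mtu + ETHERNET_HEADER + ETHERNET_FCS + (VLAN_TAG if vlan else 0)


def tunnel_atom(ingress_to_egress_pmtu, encapsulation_overhead):
    """Largest transit packet that fits in one unfragmented link packet
    (assumes a fixed per-packet encapsulation overhead)."""
    return ingress_to_egress_pmtu - encapsulation_overhead


def tunnel_mtu(egress_reassembly_limit, encapsulation_overhead):
    """Largest transit packet the egress can reassemble
    (assumes a fixed per-packet encapsulation overhead)."""
    return egress_reassembly_limit - encapsulation_overhead
```

   With IPv4-in-IPv4 encapsulation (20 bytes of overhead) over a path
   whose PMTU is 1500 bytes, this sketch gives a tunnel atom of 1480
   bytes; the tunnel MTU can be larger whenever the egress is able to
   reassemble link packets larger than that PMTU.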

439	3. The Tunnel Model

441	   A network architecture is an abstract description of a distributed
442	   communications system, its components and their relationships, the
443	   requisite properties of those components and the emergent properties
444	   of the system that result [To03]. Such descriptions can help explain
445	   behavior, as when the OSI seven-layer model is used as a teaching
446	   example [Zi80]. Architectures describe capabilities - and, just as
447	   importantly, constraints.

449	   A network can be defined as a system of endpoints and relays
450	   interconnected by communication paths, abstracting away issues of
451	   naming in order to focus on message forwarding. To the extent that
452	   the Internet has a single, coherent interpretation, its architecture
453	   is defined by its core protocols (IP [RFC791], TCP [RFC793], UDP
454	   [RFC768]) whose messages are handled by hosts, routers, and links
455	   [Cl88][To03], as shown in Figure 3:

457	               +------+    ------      ------    +------+
458	               |      |   /      \    /      \   |      |
459	               | HOST |--+ ROUTER +--+ ROUTER +--| HOST |
460	               |      |   \      /    \      /   |      |
461	               +------+    ------      ------    +------+

463	                   Figure 3 Basic Internet architecture

465	   As a network architecture, the Internet is a system of hosts
466	   (endpoints) and routers (relays) interconnected by links that
467	   exchange messages when possible. "When possible" defines the
468	   Internet's "best effort" principle. The limited role of routers and
469	   links represents the End-to-End Principle [Sa84] and longest-prefix
470	   match enables hierarchical forwarding using compact tables.
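
   As a brief illustration of longest-prefix match, the following
   Python sketch selects the most specific matching entry from a small
   forwarding table. The table contents and next-hop names are
   assumptions made for the example, and the linear scan is chosen for
   clarity; production routers use specialized data structures.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the next hop of the most specific prefix that contains
    the destination, or None if no prefix matches."""
    addr = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None
                            or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical forwarding table: one default route and two nested
# prefixes drawn from the documentation address range.
FORWARDING_TABLE = {
    "0.0.0.0/0": "default",
    "192.0.2.0/24": "Ra",
    "192.0.2.128/25": "Rb",
}
```

   A single short prefix can stand in for every destination beneath it,
   which is what keeps the table compact; more specific entries are
   needed only where forwarding differs.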

472	   Although the definitions of host, router, and link seem absolute,
473	   they are often relative as viewed within the context of one protocol
474	   layer, each of which can be considered a distinct network
475	   architecture. An Internet gateway is an OSI Layer 3 router when it
476	   transits IP datagrams but it acts as an OSI Layer 2 host as it
477	   sources or sinks Layer 2 messages on attached links to accomplish
478	   this transit capability. In this way, one device (Internet gateway)
479	   behaves as different components (router, host) at different layers.

481	   Even though a single device may have multiple roles - even
482	   concurrently - at a given layer, each role is typically static and
483	   determined by context. An Internet gateway always acts as a Layer 2
484	   host and that behavior does not depend on where the gateway is viewed
485	   from within Layer 2. In the context of a single layer, a device's
486	   behavior is typically modeled as a single component from all
487	   viewpoints in that layer (with some notable exceptions, e.g., Network
488	   Address Translators, which appear as hosts and routers, depending on
489	   the direction of the viewpoint [To16]).

491	3.1. What is a Tunnel?

493	   A tunnel can be modeled as a link in another network
494	   [To98][To01][To03]. In Figure 4, a source host (Hsrc) and destination
495	   host (Hdst) communicate over a network M in which two routers (Ra
496	   and Rd) are connected by a tunnel. Keep in mind that network N and
497	   network M can both be components of the Internet, i.e., there may
498	   be regular traffic as well as tunneled traffic over any of the
499	   routers shown.

501	                     --                          --
502	         +------+   /  \                        /  \   +------+
503	         | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
504	         +------+   \  //\    /  \    /  \    /\\  /   +------+
505	                     --/I \--+ Rb +--+ Rc +--/E \--
506	                       \  /   \  /    \  /   \  /
507	                        \/     --      --     \/
508	                       <------ Network N ------->
509	         <-------------------- Network M --------------------->

511	                         Figure 4 The big picture

513	   The tunnel consists of two interfaces - an ingress (I) and an egress
514	   (E) that lie along a path connected by network N. Regardless of how
515	   the ingress and egress interfaces are connected, the tunnel serves as
516	   a link between the nodes it connects (here, Ra and Rd).

518	   IP packets arriving at the ingress interface are encapsulated to
519	   traverse network N. We call these packets 'tunnel transit packets'
520	   (or just 'transit packets') because they will transit the tunnel
521	   inside one or more of what we call 'tunnel link packets'. Transit
522	   packets correspond to network (IP) packets traversing a conventional
523	   link and tunnel link packets correspond to the packets of a
524	   conventional link layer (which can be called just 'link packets').

526	   Link packets use the source address of the ingress interface and the
527	   destination address of the egress interface - using whatever address
528	   is appropriate to the Layer at which the ingress and egress
529	   interfaces operate (Layer 2, Layer 3, Layer 4, etc.). The egress
530	   interface decapsulates those messages, which then continue on network
531	   M as if emerging from a link. To transit packets and to the routers
532	   the tunnel connects (Ra and Rd), the tunnel acts as a link and the
533	   ingress and egress interfaces act as network interfaces to that link.

535	   The model of each component (ingress and egress interfaces) and the
536	   entire system (tunnel) depends on the layer from which they are
537	   viewed. From the perspective of the outermost hosts (Hsrc and Hdst),
538	   the tunnel appears as a link between two routers (Ra and Rd). For
539	   routers along the tunnel (e.g., Rb and Rc), the ingress and egress
540	   interfaces appear as the endpoint hosts on network N.

542	   When the tunnel network (N) is implemented using the same protocol as
543	   the endpoint network (M), the picture looks flatter (Figure 5), as if
544	   it were running over a single network. However, this appearance is
545	   incorrect - nothing has changed from the previous case. From the
546	   perspective of the endpoints, Rb and Rc and network N don't exist and
547	   aren't visible, and from the perspective of the tunnel, network M
548	   doesn't exist. The fact that network N and M use the same protocol,
549	   and may traverse the same links is irrelevant.

551	                   --          --      --          --
552	       +------+   /  \  /\    /  \    /  \    /\  /  \   +------+
553	       | Hsrc |--+ Ra +/I \--+ Rb +--+ Rc +--/E \+ Rd +--| Hdst |
554	       +------+   \  / \  /   \  /    \  /   \  / \  /   +------+
555	                   --   \/     --      --     \/   --
556	                         <---- Network N ----->
557	           <------------------ Network M ------------------->

559	                     Figure 5 IP in IP network picture

561	3.2. View from the Outside

563	   As already observed, from outside the tunnel, to network M, the
564	   entire tunnel acts as a link (Figure 6). Consequently all
565	   requirements for links supporting IP also apply to tunnels [RFC3819].

567	                   --                              --
568	       +------+   /  \                            /  \   +------+
569	       | Hsrc |--+ Ra +--------------------------+ Rd +--| Hdst |
570	       +------+   \  /                            \  /   +------+
571	                   --                              --
572	           <------------------ Network M ------------------->

574	                Figure 6 Tunnels as viewed from the outside

576	   For example, the IP datagram hop counts (IPv4 Time-to-Live [RFC791]
577	   and IPv6 Hop Limit [RFC2460]) are decremented when traversing a
578	   router, but not when traversing a link - or thus a tunnel. Similarly,
579	   because the ingress and egress are interfaces on this outer network,
580	   they should never issue ICMP messages. A router or host would issue
581	   the appropriate ICMP, e.g., "packet too big" (IPv4 fragmentation
582	   needed and DF set [RFC792] or IPv6 packet too big [RFC4443]), when
583	   trying to send a packet to the egress, as it would for any interface.

585	   Tunnels have a tunnel MTU - the largest message that can transit that
586	   tunnel - just as links have a link MTU. A link MTU may not reflect
587	   the native message size of hops within a multihop link, and the same
588	   is true for a tunnel. In both cases, the MTU is defined by the
589	   link's (or tunnel's) effective MTU to receive (EMTU_R).

591	3.3. View from the Inside

593	   Within network N, i.e., from inside the tunnel itself, the ingress
594	   interface is a source of tunnel link packets and the egress interface
595	   is a sink - so both are viewed as hosts on network N (Figure 7).
596	   Consequently [RFC1122] Internet host requirements apply to ingress
597	   and egress interfaces when Network N uses IP (and thus the
598	   ingress/egress interfaces use IP encapsulation).

600	                               --      --
601	                        /\    /  \    /  \    /\
602	                       /I \--+ Rb +--+ Rc +--/E \
603	                       \  /   \  /    \  /   \  /
604	                        \/     --      --     \/
605	                         <---- Network N ----->

607	            Figure 7 Tunnels, as viewed from within the tunnel

609	   Viewed from within the tunnel, the outer network (M) doesn't exist.
610	   Tunnel link packets can be fragmented by the source (ingress
611	   interface) and reassembled at the destination (egress interface),
612	   just as at conventional hosts. The path between ingress and egress
613	   interfaces has a path MTU, but the endpoints can exchange messages as
614	   large as can be reassembled at the destination (egress interface),
615	   i.e., the EMTU_R of the egress interface. However, in both cases,
616	   these MTUs refer to the size of the message that can transit the
617	   links and between the hosts of network N, which represents a link
618	   layer to network M. I.e., the MTUs of network N represent the maximum
619	   frame sizes (MFSs) of the tunnel as a link in network M.

621	   Information about the network - i.e., regarding network N MTU sizes,
622	   network reachability, etc. - is relayed from the destination (egress
623	   interface) and intermediate routers back to the source (ingress
624	   interface), without regard for the external network (M). When such
625	   messages arrive at the ingress interface, they may affect the
626	   properties of that interface (e.g., its reported MTU to network M),
627	   but they should never directly cause new ICMPs in the outer network
628	   M. Again, events at interfaces don't generate ICMP messages; it would
629	   be the host or router at which that interface is attached that would
630	   generate ICMPs, e.g., upon attempting to use that interface.

632	3.4. Location of the Ingress and Egress

634	   The ingress and egress interfaces are endpoints of the tunnel. Tunnel
635	   interfaces may be physical or virtual. The interface may be
636	   implemented inside the node where the tunnel attaches, e.g., inside a
637	   host or router. The interface may also be implemented as a "bump in
638	   the wire" (BITW), somewhere along a link between the two nodes the
639	   link interconnects. IP in IP tunnels are often implemented as
640	   interfaces on nodes, whereas IPsec tunnels are sometimes implemented
641	   as BITW. These implementation variations determine only whether
642	   information available at the link endpoints (ingress/egress
643	   interfaces) can be easily shared with the connected network nodes.

645	3.5. Implications of This Model

647	   This approach highlights a few key features of a tunnel as a network
648	   architecture construct:

650	   o  To the transit packets, tunnels turn a network (Layer 3) path into
651	      a (Layer 2) link

653	   o  To nodes the tunnel traverses, the tunnel ingress and egress
654	      interfaces act as hosts that source and sink tunnel link packets

656	   The consequences of these features are as follows:

658	   o  Like a link MTU, a tunnel MTU is defined by the effective MTU of
659	      the receiver (i.e., EMTU_R of the egress).

661	   o  The messages inside the tunnel are treated like any other link
662	      layer, i.e., the MTU is determined by the largest (transit)
663	      payload that traverses the link.

665	   o  The tunnel path MFS is not relevant to the transited traffic.
666	      There is no mechanism or protocol by which it can be determined.

668	   o  Because routers, not links, alter hop counts [RFC1812], hop counts
669	      are not decremented solely by the transit of a tunnel. A packet
670	      with a hop count of zero should successfully transit a link (and
671	      thus a tunnel) that connects two hosts.

673	   o  The addresses of a tunnel ingress and egress interface correspond
674	      to link layer addresses to the transit packet. Like links, some
675	      tunnels may not have their own addresses. Like network interfaces,
676	      ingress and egress interfaces typically require network layer
677	      addresses.

679	   o  Like network interfaces, the ingress and egress interfaces are
680	      never a direct source of ICMP messages but may provide information
681	      to their attached host or router to generate those ICMP messages
682	      during the processing of transit packets.

684	   o  Like network interfaces and links, two nodes may be connected by
685	      any combination of tunnels and links, including multiple tunnels.
686	      As with multiple links, existing network layer forwarding
687	      determines which IP traffic uses each link or tunnel.

689	   These observations make it much easier to determine what a tunnel
690	   must do to transit IP packets; notably, it must satisfy all
691	   requirements expected of a link [RFC1122][RFC3819]. The remainder of
692	   this document explores these implications in greater detail.

694	3.6. Fragmentation

696	   There are two places where fragmentation can occur in a tunnel,
697	   called 'outer fragmentation' and 'inner fragmentation'. This document
698	   assumes that only outer fragmentation is viable because it is the
699	   only approach that works for both IPv4 datagrams with DF=1 and IPv6.

701	3.6.1. Outer Fragmentation

703	   Outer fragmentation is shown in Figure 8. The bottom of the figure
704	   shows the network topology, where transit packets originate at the
705	   source, enter the tunnel at the ingress interface for encapsulation,
706	   exit the tunnel at the egress interface where they are decapsulated,
707	   and arrive at the destination. The packet traffic is shown above the
708	   topology, where the transit packets are shown at the top. In this
709	   diagram, the ingress interface is located on router 'Ra' and the
710	   egress interface is located on router 'Rd'.

712	   When the link packet - which is the encapsulated transit packet -
713	   would exceed the tunnel MTU, the packet needs to be fragmented. In
714	   this case the packet is fragmented at the outer (link) header, with
715	   the fragments shown as (b1) and (b2). The outer header indicates
716	   fragmentation (as ' and "), the inner (transit) header occurs only in
717	   the first fragment, and the inner (transit) data is broken across the
718	   two packets. These fragments are reassembled at the egress interface
719	   during decapsulation in step (c), where the resulting link packet is
720	   reassembled and decapsulated so that the transit packet can continue
721	   on its way to the destination.

723	    Transit packet
724	    +----+----+                                              +----+----+
725	    | iH | iD |------+ -  -  -  -  -  -  -  -  -  -  +------>| iH | iD |
726	    +----+----+      |                               |       +----+----+
727	                     v Link packet                   |
728	              +----+----+----+               +----+----+----+
729	          (a) | oH | iH | iD |               | oH | iH | iD | (d)
730	              +----+----+----+               +----+----+----+
731	                     |                               ^
732	                     |    Link packet fragment #1    |
733	                     |       +----+----+-----+       |
734	                (b1) +----- >| oH'| iH | iD1 |-------+ (c)
735	                     |       +----+----+-----+       |
736	                     |                               |
737	                     |    Link packet fragment #2    |
738	                     |       +----+-----+            |
739	                (b2) +----- >| oH"| iD2 |------------+
740	                             +----+-----+
741	   +-----+    +--+ +---+                           +---+ +--+    +-----+
742	   |     |    |  |/     \                         /     \|  |    |     |
743	   | Src |----|Ra|Ingress|=======================|Egress |Rd|----| Dst |
744	   |     |    |  |\     /                         \     /|  |    |     |
745	   +-----+    +--+ +---+                           +---+ +--+    +-----+

747	             Figure 8 Fragmentation of the (outer) link packet

749	   Outer fragmentation isolates the tunnel encapsulation duties to the
750	   ingress and egress interfaces. This can be considered a benefit in
751	   clean, layered network design, but also may require complex egress
752	   interface decapsulation, especially where tunnels aggregate large
753	   amounts of traffic, such as may result in IP ID overload (see Sec.
754	   4.1.4). Outer fragmentation is valid for any tunnel link protocol
755	   that supports fragmentation (e.g., IPv4 or IPv6), in which the tunnel
756	   endpoints act as the host endpoints of that protocol.

758	   Along the tunnel, the inner (transit) header is contained only in the
759	   first fragment, which can interfere with mechanisms that 'peek' into
760	   lower layer headers, e.g., as for relayed ICMP (see Sec. 4.3).
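
	   As an informal illustration of Figure 8 (not part of this
	   document's requirements), the outer fragmentation steps can be
	   sketched in Python. The function names, the dict-based fragment
	   layout, and the sizes are hypothetical; real IP fragmentation also
	   involves ID fields and 8-byte offset units, which are omitted here.

```python
# Hypothetical sketch of outer fragmentation (Figure 8); not a real IP stack.

def outer_fragment(outer_hdr, transit_pkt, max_payload):
    """Ingress interface: split one link packet into link packet fragments.

    The transit packet (iH + iD) is the payload of the link packet, so the
    inner header iH lands only in the first fragment. Every fragment gets
    its own copy of the outer header (oH', oH").
    """
    frags = []
    for off in range(0, len(transit_pkt), max_payload):
        frags.append({
            "oH": outer_hdr,
            "offset": off,
            "more": off + max_payload < len(transit_pkt),
            "data": transit_pkt[off:off + max_payload],
        })
    return frags


def egress_reassemble(frags):
    """Egress interface: reassemble by offset, yielding the transit packet."""
    ordered = sorted(frags, key=lambda f: f["offset"])
    return b"".join(f["data"] for f in ordered)


inner_header = b"iH" * 10    # stand-in for a 20-byte transit header
inner_data = b"D" * 100      # stand-in for the transit payload
transit = inner_header + inner_data

fragments = outer_fragment(b"oH", transit, max_payload=64)
# Fragments may arrive out of order; the egress sorts them by offset.
restored = egress_reassemble(list(reversed(fragments)))
```

	   In this sketch only the first fragment carries the inner header,
	   which is why devices along the tunnel cannot 'peek' at it in
	   later fragments.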

762	3.6.2. Inner Fragmentation

764	   Inner fragmentation distributes the impact of tunnel fragmentation
765	   across both egress interface decapsulation and transit packet
766	   destination, as shown in Figure 9; this can be especially important
767	   when the tunnel would otherwise need to source (outer) fragment large
768	   amounts of traffic. However, this mechanism is valid only when the
769	   transit packets can be fragmented on-path, e.g., as when the transit
770	   packets are IPv4 datagrams with DF=0.

772	   Again, the network topology is shown at the bottom of the figure, and
773	   the original packets are shown at the top. Packets arrive at the
774	   ingress node (router Ra) and are fragmented there into transit packet
775	   fragments #1 (a1) and #2 (a2). These fragments are encapsulated at
776	   the ingress interface in steps (b1) and (b2) and each resulting link
777	   packet traverses the tunnel. When these link packets arrive at the
778	   egress interface they are decapsulated in steps (c1) and (c2) and the
779	   egress node (router) forwards the transit packet fragments to their
780	   destination. This destination is then responsible for reassembling
781	   the transit packet fragments into the original transit packet (d).

783	   Along the tunnel, the inner headers are copied into each fragment,
784	   and so can be 'peeked at' inside the tunnel (see Sec. 4.3).
785	   Fragmentation shifts from the ingress interface to the ingress router
786	   and reassembly shifts from the egress interface to the destination.

788	    Transit packet
789	   +----+----+                                               +----+----+
790	   | iH | iD |-+ - - - - -  -  -  -  -  -  -  -  -  -  -  - >| iH | iD |
791	   +----+----+ |                                             +----+----+
792	               v Transit packet fragment #1                         ^
793	            +----+-----+                           +----+-----+     |
794	       (a1) | iH'| iD1 |                           | iH'| iD1 |-----+(d)
795	            +----+-----+                           +----+-----+     ^
796	               |     |        Link packet #1         ^              |
797	               |     |       +----+----+-----+       |              |
798	               | (b1)+----- >| oH | iH'| iD1 |-------+(c1)          |
799	               |             +----+----+-----+                      |
800	               |                                                    |
801	               v Transit packet fragment #2                         |
802	            +----+-----+                           +----+-----+     |
803	       (a2) | iH"| iD2 |                           | iH"| iD2 |-----+
804	            +----+-----+                           +----+-----+
805	                     |        Link packet #2         |
806	                     |       +----+----+-----+       |
807	                 (b2)+----- >| oH | iH"| iD2 |-------+(c2)
808	                             +----+----+-----+
809	   +-----+    +--+ +---+                           +---+ +--+    +-----+
810	   |     |    |  |/     \                         /     \|  |    |     |
811	   | Src |----|Ra|Ingress|=======================|Egress |Rd|----| Dst |
812	   |     |    |  |\     /                         \     /|  |    |     |
813	   +-----+    +--+ +---+                           +---+ +--+    +-----+

815	           Figure 9 Fragmentation of the inner (transit) packet
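
	   The steps of Figure 9 can be sketched in the same informal way.
	   This is an illustration under stated assumptions, not a
	   definitive implementation: all names are hypothetical, and the
	   'df' flag stands in for any condition (IPv4 DF=1, or IPv6) under
	   which on-path fragmentation of the transit packet is not allowed.

```python
# Hypothetical sketch of inner fragmentation (Figure 9); not a real IP stack.

def inner_fragment(inner_hdr, inner_data, max_data, df):
    """Ingress node (router): fragment the transit packet itself.

    Only valid when on-path fragmentation is permitted, e.g., IPv4 with
    DF=0. Each fragment gets its own copy of the inner header (iH', iH").
    """
    if df:
        raise ValueError("inner fragmentation not allowed for this packet")
    return [{"iH": inner_hdr, "offset": off,
             "data": inner_data[off:off + max_data]}
            for off in range(0, len(inner_data), max_data)]


def encapsulate(outer_hdr, transit_frag):
    """Ingress interface: wrap one transit fragment in one link packet."""
    return {"oH": outer_hdr, "payload": transit_frag}


def decapsulate(link_pkt):
    """Egress interface: strip the outer header; no reassembly happens here."""
    return link_pkt["payload"]


def destination_reassemble(frags):
    """Destination host: reassemble the original transit payload."""
    ordered = sorted(frags, key=lambda f: f["offset"])
    return b"".join(f["data"] for f in ordered)


data = b"D" * 100
transit_frags = inner_fragment(b"iH", data, max_data=64, df=False)  # (a1),(a2)
link_pkts = [encapsulate(b"oH", f) for f in transit_frags]          # (b1),(b2)
delivered = [decapsulate(p) for p in link_pkts]                     # (c1),(c2)
original = destination_reassemble(delivered)                        # (d)
```

	   Here every link packet contains a copy of the inner header, and
	   reassembly is performed by the destination rather than by the
	   egress interface.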

817	3.6.3. The Necessity of Outer Fragmentation

819	   Fragmentation is critical for tunnels that support transit packets
820	   for protocols with minimum MTU requirements, while operating over
821	   tunnel paths using protocols that have their own MTU requirements.
822	   Depending on the amount of space used by encapsulation, these two
823	   minimums will ultimately interfere (especially when a protocol
824	   transits itself either directly, as with IP-in-IP, or indirectly, as
825	   in IP-in-GRE-in-IP), and the transit packet will need to be
826	   fragmented both to support a tunnel MTU and to traverse tunnels with
827	   their own tunnel path MTUs.

829	   Outer fragmentation is the only solution that supports all IPv4 and
830	   IPv6 traffic, because inner fragmentation is allowed only for IPv4
831	   datagrams with DF=0.
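
	   A small worked example shows how the two minimums collide. It
	   assumes IPv6-in-IPv6 encapsulation with a 40-byte outer IPv6
	   header and no extension headers, and a tunnel path MTU equal to
	   the IPv6 minimum of 1280 bytes; the helper name is hypothetical.

```python
IPV6_MIN_MTU = 1280   # bytes; minimum MTU a tunnel must transit for IPv6
IPV6_HEADER = 40      # bytes; fixed IPv6 header, no extension headers


def needs_outer_fragmentation(transit_size, encap_overhead, tunnel_path_mtu):
    """True when the link packet would not fit the tunnel path MTU."""
    return transit_size + encap_overhead > tunnel_path_mtu


# A minimum-size IPv6 transit packet over a minimum-MTU IPv6 tunnel path.
link_packet_size = IPV6_MIN_MTU + IPV6_HEADER
must_fragment = needs_outer_fragmentation(
    IPV6_MIN_MTU, IPV6_HEADER, IPV6_MIN_MTU)
```

	   Under these assumptions the link packet is 1320 bytes, which
	   exceeds the 1280-byte tunnel path MTU, so the ingress has to
	   fragment the link packet.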

833	4. IP Tunnel Requirements

835	   The requirements of an IP tunnel are defined by the requirements of
836	   an IP link because both transit IP packets. A tunnel thus must
837	   transit the IP minimum MTU, i.e., 68 bytes for IPv4 [RFC791] and 1280
838	   bytes for IPv6 [RFC2460] and a tunnel must support address resolution
839	   when there is more than one egress interface for that tunnel.

841	   The requirements of the tunnel ingress and egress interfaces are
842	   defined by the network over which they exchange messages (link
843	   packets). For IP-over-IP, this means that the ingress interface MUST
844	   NOT exceed the IP fragment identification field uniqueness
845	   requirements [RFC6864]. Uniqueness is more difficult to maintain at
846	   high packet rates for IPv4, whose fragment ID field is only 16 bits.

848	   These requirements remain even though tunnels have some unique
849	   issues, including the need for additional space for encapsulation
850	   headers and the potential for tunnel MTU variation.

852	4.1. Encapsulation Header Issues

854	   Tunnel encapsulation uses a non-link protocol as a link
855	   layer. The encapsulation layer thus has the same requirements and
856	   expectations as any other IP link layer when used to transit IP
857	   packets. These relationships are addressed in the following
858	   subsections.

860	4.1.1. General Principles of Header Fields Relationships

862	   Some tunnel specifications attempt to relate the header fields of the
863	   transit packet and tunnel link packet. In some cases, this
864	   relationship is warranted, whereas in other cases the two protocol
865	   layers need to be isolated from each other. For example, the tunnel
866	   link header source and destination addresses are network endpoints in
867	   the tunnel network N, but have no meaning in the outer network M. The
868	   two sets of addresses are effectively independent, just as are other
869	   network and link addresses.

871	   Because the tunneled packet uses source and destination addresses
872	   with a separate meaning, it is inappropriate to copy or reuse the
873	   IPv4 Identification (ID) or IPv6 Fragment ID fields of the tunnel
874	   transit packet (see Section 4.1.4). Similarly, the DF field of the
875	   transit packet is not related to that field in the tunnel link packet
876	   header (presuming both are IPv4) (see Section 4.2). Most other fields
877	   are similarly independent between the transit packet and tunnel link
878	   packet. When a field value is generated in the encapsulation header,
879	   its meaning should be derived from what is desired in the context of
880	   the tunnel as a link. When feedback is received from these fields,
881	   they should be presented to the tunnel ingress and egress as if they
882	   were network interfaces. The behavior of the node where these
883	   interfaces attach should be identical to that of a conventional link.

885	   There are exceptions to this rule that are explicitly intended to
886	   relay signals from inside the tunnel to the network outside the
887	   tunnel, typically relevant only when the tunnel network N and the
888	   outer network M use the same protocol. These apply only when that
889	   coordination is defined, as with explicit congestion notification
890	   (ECN) [RFC6040] (see Section 4.3.2), and differentiated services code
891	   points (DSCPs) [RFC2983]. Equal-cost multipath routing may also
892	   affect how some encapsulation fields are set, including IPv6 flow
893	   labels [RFC6438] and source ports for transport protocols when used
894	   for tunnel encapsulation [RFC8085] (see Section 4.3.4).

896	4.1.2. Addressing Fields

898	   Tunnel ingresses and egresses have addresses associated with the
899	   encapsulation protocol. These addresses are the source and
900	   destination (respectively) of the encapsulated packet while
901	   traversing the tunnel network.

903	   Tunnels may or may not have addresses in the network whose traffic
904	   they transit (e.g., network M in Figure 4). In some cases, the tunnel
905	   is an unnumbered interface to a point-to-point virtual link. When the
906	   tunnel has multiple egresses, tunnel interfaces require separate
907	   addresses in network M.

909	   To see the effect of tunnel interface addresses, consider traffic
910	   sourced at router Ra in Figure 4. Even before being encapsulated by
911	   the ingress, traffic needs a source IP network address that belongs
912	   to the router. One option is to use an address associated with one of
913	   the other interfaces of the router [RFC1122]. Another option is to
914	   assign a number to the tunnel interface itself. Regardless of which
915	   address is used, the resulting IP packet is then encapsulated by the
916	   tunnel ingress using the ingress address as a separate operation.

918	4.1.3. Hop Count Fields

920	   The Internet hop count field is used to detect and avoid forwarding
921	   loops that cannot be corrected without a synchronized reboot. The
922	   IPv4 Time-to-Live (TTL) and IPv6 Hop Limit fields each serve this
923	   purpose [RFC791][RFC2460]. The IPv4 TTL field was originally intended
924	   to indicate packet expiration time, measured in seconds. A router is
925	   required to decrement the TTL by at least one or the number of
926	   seconds the packet is delayed, whichever is larger [RFC1812]. Packets
927	   are rarely held that long, and so the field has come to represent the
928	   count of the number of routers traversed. IPv6 makes this meaning
929	   more explicit.

931	   These hop count fields represent the number of network forwarding
932	   elements (routers) traversed by an IP datagram. An IP datagram with a
933	   hop count of zero can traverse a link between two hosts because it
934	   never visits a router (where it would need to be decremented and
935	   would have been dropped).

937	   An IP datagram traversing a tunnel thus need not have its hop count
938	   modified, i.e., the tunnel transit header need not be affected. A
939	   zero hop count datagram should be able to traverse a tunnel as easily
940	   as it traverses a link. A router MAY be configured to decrement
941	   packets traversing a particular link (and thus a tunnel), which may
942	   be useful in emulating a tunnel path as if it were a network path
943	   that traversed one or more routers, but this is strictly optional.
944	   The ability of the outer network M and tunnel network N to avoid
945	   indefinitely looping packets does not rely on the hop counts of the
946	   transit packet and tunnel link packet being related.
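
	   The independence of the two hop counts can be sketched as
	   follows. This is a simplified model with hypothetical names; the
	   initial outer hop count of 64 and the two routers inside the
	   tunnel (Rb and Rc in Figure 4) are assumptions for illustration.

```python
# Hypothetical sketch: hop counts change at routers, not on links or tunnels.

def forward_at_router(hop_count):
    """Router: decrement by one; drop (None) if the result would be zero."""
    if hop_count <= 1:
        return None
    return hop_count - 1


def transit_tunnel(inner_hops, outer_hops=64, routers_inside=2):
    """Carry a transit packet across a tunnel with routers inside network N.

    Routers inside the tunnel see and decrement only the outer (link
    packet) hop count. The inner (transit) hop count is returned
    unchanged, or None if the link packet expired inside network N.
    """
    for _ in range(routers_inside):
        outer_hops = forward_at_router(outer_hops)
        if outer_hops is None:
            return None
    return inner_hops


zero_hop_result = transit_tunnel(0)              # crosses like any link
expired_result = transit_tunnel(5, outer_hops=2) # outer count runs out
```

	   The transit packet with a hop count of zero crosses the tunnel
	   unchanged, whereas a link packet whose outer hop count is too
	   small is dropped inside network N regardless of the inner value.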

948	   The hop count field is also used by several protocols to determine
949	   whether endpoints are 'local', i.e., connected to the same subnet
950	   (link-local discovery and related protocols [RFC4861]). A tunnel is a
951	   way to make a remote network address appear directly-connected, so it
952	   makes sense that the other ends of the tunnel appear local and that
953	   such link-local protocols operate over tunnels unless configured
954	   explicitly otherwise. When the interfaces of a tunnel are numbered,
955	   these can be interpreted the same way as if they were on the same
956	   link subnet.

958	4.1.4. IP Fragment Identification Fields

960	   Both IPv4 and IPv6 include an IP Identification (ID) field to support
961	   IP datagram fragmentation and reassembly [RFC791][RFC1122][RFC2460].
962	   When used, the ID field is intended to be unique for every packet for
963	   a given source address, destination address, and protocol, such that
964	   it does not repeat within the Maximum Segment Lifetime (MSL).

966	   For IPv4, this field is in the default header and is meaningful only
967	   when either source fragmented or DF=0 ("non-atomic packets")
968	   [RFC6864]. For IPv6, this field is contained in the optional Fragment
969	   Header [RFC2460]. Although IPv6 supports only source fragmentation,
970	   the field may occur in atomic fragments [RFC6946].

972	   Although the ID field was originally intended for fragmentation and
973	   reassembly, it can also be used to detect and discard duplicate
974	   packets, e.g., at congested routers (see Sec. 3.2.1.5 of [RFC1122]).
975	   For this reason, and because IPv4 packets can be fragmented anywhere
976	   along a path, all non-atomic IPv4 packets and all IPv6 packets
977	   between a source and destination of a given protocol must have unique
978	   ID values over the potential fragment reordering period
979	   [RFC2460][RFC6864].

981	   The uniqueness of the IP ID is a known problem for high speed nodes,
982	   because it limits the speed of a single protocol between two
983	   endpoints [RFC4963]. Although this RFC suggests that the uniqueness
984	   of the IP ID is moot, tunnels exacerbate this condition. A tunnel
985	   often aggregates traffic from a number of different source and
986	   destination addresses, of different protocols, and encapsulates them
987	   in a header with the same ingress and egress addresses, all using a
988	   single encapsulation protocol. If the ingress enforces IP ID
989	   uniqueness, this can either severely limit tunnel throughput or can
990	   require substantial resources; the alternative is to ignore IP ID
991	   uniqueness and risk reassembly errors. Although fragmentation is
992	   somewhat rare in the current Internet at large, it can be common
993	   along a tunnel. Reassembly errors are not always detected by other
994	   protocol layers (see Sec. 4.3.3), and even when detected they can
995	   result in excessive overall packet loss and can waste bandwidth
996	   between the egress and ultimate packet destination.

998	   The 32-bit IPv6 ID field in the Fragment Header is typically used
999	   only during source fragmentation. The size of the ID field is
1000	   typically sufficient that a single counter can be used at the tunnel
1001	   ingress, regardless of the endpoint addresses or next-header
1002	   protocol, allowing efficient support for very high throughput
1003	   tunnels.

1005	   The smaller 16-bit IPv4 ID is more difficult to correctly support. A
1006	   recent update to IPv4 allows the ID to be repeated for atomic
1007	   packets. When either source fragmentation or on-path fragmentation is
1008	   supported, the tunnel ingress may need to keep independent ID
1009	   counters for each tunnel source/destination/protocol tuple.
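
	   The difference between the two ID sizes can be illustrated with
	   a short sketch. The class and function names are hypothetical,
	   the addresses are documentation examples, and the 1500-byte
	   packet size and 120-second lifetime are assumed values chosen
	   only to make the arithmetic concrete.

```python
from collections import defaultdict


class IngressIdAllocator:
    """Hypothetical ID allocator for a tunnel ingress (illustration only)."""

    def __init__(self):
        self._v6_next = 0                  # one 32-bit counter for all traffic
        self._v4_next = defaultdict(int)   # one 16-bit counter per tuple

    def next_ipv6_id(self):
        value = self._v6_next
        self._v6_next = (value + 1) % 2**32
        return value

    def next_ipv4_id(self, src, dst, proto):
        key = (src, dst, proto)
        value = self._v4_next[key]
        self._v4_next[key] = (value + 1) % 2**16
        return value


def max_unique_rate_bps(id_bits, packet_bytes, lifetime_seconds):
    """Highest rate at which IDs do not repeat within the given lifetime."""
    return (2**id_bits) * packet_bytes * 8 / lifetime_seconds


alloc = IngressIdAllocator()
a = alloc.next_ipv4_id("192.0.2.1", "198.51.100.1", 4)    # IP-in-IP
b = alloc.next_ipv4_id("192.0.2.1", "198.51.100.1", 47)   # GRE
c = alloc.next_ipv4_id("192.0.2.1", "198.51.100.1", 4)    # same tuple as a

# Assumed values: 1500-byte packets and a 120-second lifetime.
ipv4_limit = max_unique_rate_bps(16, 1500, 120)
ipv6_limit = max_unique_rate_bps(32, 1500, 120)
```

	   With these assumed values, a single 16-bit counter supports only
	   about 6.5 Mbps per tuple before IDs repeat, whereas a 32-bit
	   counter supports roughly 429 Gbps.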

1011	4.1.5. Checksums

1013	   IP traffic transiting a tunnel needs to expect a similar level of
1014	   error detection and correction as it would expect from any other
1015	   link. In the case of IPv4, there are no such expectations, which is
1016	   partly why it includes a header checksum [RFC791].

1018	   IPv6 omitted the header checksum because it already expects most link
1019	   errors to be detected and dropped by the link layer and because it
1020	   also assumes transport protection [RFC2460]. When transiting IPv6
1021	   over IPv6, the tunnel fails to provide the expected error detection.
1022	   This is why IPv6 is often tunneled over layers that include separate
1023	   protection, such as GRE [RFC2784].

1025	   The fragmentation created by the tunnel ingress can increase the need
1026	   for stronger error detection and correction, especially at the tunnel
1027	   egress to avoid reassembly errors. The Internet checksum is known to
1028	   be susceptible to reassembly errors that could be common [RFC4963],
1029	   and should not be relied upon for this purpose. This is why some
1030	   tunnel protocols, e.g., SEAL and AERO [RFC5320][Te16], include a
1031	   separate checksum. This requirement can be undermined when using UDP
1032	   as a tunnel with no UDP checksum (as per [RFC6935][RFC6936]) when
1033	   fragmentation occurs because the egress has no checksum with which to
1034	   validate reassembly. For this reason, it is safe to use UDP with a
1035	   zero checksum for atomic tunnel link packets only; when used on
1036	   fragments, whether generated at the ingress or en-route inside the
1037	   tunnel, omission of such a checksum can result in reassembly errors
1038	   that can cause additional work (capacity, forwarding processing,
1039	   receiver processing) downstream of the egress.
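
	   One reason the Internet checksum is a weak guard against
	   reassembly errors is that it is a ones-complement sum of 16-bit
	   words, which does not depend on word order. The sketch below
	   (hypothetical variable names, toy payloads) shows two pieces
	   reassembled in the wrong order yielding the same checksum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum over data (padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


piece_a = b"AAAAAAAA"
piece_b = b"BBBBBBBB"
correct = piece_a + piece_b       # intended reassembly
misordered = piece_b + piece_a    # pieces placed at the wrong offsets

same_checksum = internet_checksum(correct) == internet_checksum(misordered)
```

	   Because the two byte strings differ but checksum identically, a
	   receiver relying only on this checksum would not notice the
	   error in this case.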

1041	4.2. MTU Issues

1043	   Link MTUs, IP datagram limits, and transport protocol segment sizes
1044	   are already related by several requirements
1045	   [RFC768][RFC791][RFC1122][RFC1812][RFC2460] and by a variety of
1046	   protocol mechanisms that attempt to establish relationships between
1047	   them, including path MTU discovery (PMTUD) [RFC1191][RFC1981],
1048	   packetization layer path MTU discovery (PLPMTUD) [RFC4821], as well
1049	   as mechanisms inside transport protocols [RFC793][RFC4340][RFC4960].
1050	   The following subsections summarize the interactions between tunnels
1051	   and MTU issues, including minimum tunnel MTUs, tunnel fragmentation
1052	   and reassembly, and MTU discovery.

1054	4.2.1. Minimum MTU Considerations

1056	   There are a variety of minimum MTU values to consider, both
1057	   in a conventional network and in a tunnel as a link in that network.
1058	   These are indicated in Figure 10, an annotated variant of Figure 4.
1059	   Note that a (link) MTU (a) corresponds to a tunnel MTU (b) and that a
1060	   path MTU (c) corresponds to a tunnel path MTU (f). The tunnel MTU is
1061	   the EMTU_R of the egress interface, because that defines the largest
1062	   transit packet message that can traverse the tunnel as a link in
1063	   network M. The ability to traverse the hops of the tunnel - in
1064	   network N - is not related, and only the ingress need be concerned
1065	   with that value.

1067	                    --                             --
1068	        +------+   /  \                           /  \   +------+
1069	        | Hsrc |--+ Ra +       --       --       + Rd +--| Hdst |
1070	        +------+   \  //\     /  \     /  \     /\\  /   +------+
1071	                    --/I \---+ Rb +---+ Rc +---/E \--
1072	                      \  /    \  /     \  /    \  /
1073	                       \/      --       --      \/
1074	                        <----- Network N ------->
1075	         <-------------------- Network M --------------------->

1077	   Communication in network M viewed at that layer:
1078	    (a)         <->          Link MTU
1079	    (b)                <---- Tunnel MTU --------->
1080	    (c)         <----------- Path MTU ----------------->
1081	    (d) <------------------- EMTU_R --------------------------->

1083	   Communication in network N viewed at that layer:
1084	    (e)                   <--> Link MTU
1085	    (f)                   <--- Path MTU ------>
1086	    (g)                 <----- EMTU_R --------->

1088	   Communication in network N viewed from network M:
1089	    (h)                   <--> MFS
1090	    (i)                   <--- Path MFS ------>
1091	    (j)                 <----- EMFS_R --------->

1093	                    Figure 10 The variety of MTU values

1095	   Consider the following example values. For IPv6 transit packets, the
1096	   minimum (link) MTU (a) is 1280 bytes, which similarly applies to
1097	   tunnels as the tunnel MTU (b). The path MTU (c) is the minimum of the
1098	   links (including tunnels as links) along a path, and indicates the
1099	   largest IP message (packet or fragment) that can traverse a path
1100	   between a source and destination without on-path fragmentation (e.g.,
1101	   supported in IPv4 with DF=0). Path MTU discovery, either at the
1102	   network layer (PMTUD [RFC1191][RFC1981]) or packetization layer
1103	   (PLPMTUD [RFC4821]) attempts to tune the source IP packets and
1104	   fragments (i.e., EMTU_S) to fit within this path MTU size to avoid
1105	   fragmentation and reassembly [Ke95]. The minimum EMTU_R (d) is 1500
1106	   bytes, i.e., the minimum MTU for endpoint-to-endpoint communication.

1108	   The tunnel is a source-destination communication in network N.
1109	   Messages between the tunnel source (the ingress interface) and tunnel
1110	   destination (egress interface) similarly experience a variety of
1111	   network N MTU values, including a link MTU (e), a path MTU (f), and
1112	   an EMTU_R (g). The network N EMTU_S is limited by the path MTU, and
1113	   the source-destination message maximum is limited by EMTU_R, just as
1114	   it was for those types of MTUs in network M. For an IPv6 network
1115	   N, its link and path MTUs must be at least 1280 and its EMTU_R must
1116	   be at least 1500.

1118	   However, viewed from the context of network M, these network N MTUs
1119	   are link layer properties, i.e., maximum frame sizes (MFS). The
1120	   network N EMTU_R determines the largest message that can transit
1121	   between the source (ingress) and destination (egress), but viewed
1122	   from network M this is a link layer property, i.e., EMFS_R. The
1123	   tunnel EMTU_R is EMFS_R minus the link (encapsulation) headers. Just
1124	   as the path MTU has no bearing on EMTU_R, the path MFS in network N
1125	   has no bearing on the MTU of the tunnel.

1128	   For IPv6 networks M and N, these relationships are summarized as
1129	   follows:

1131	   o  Network M MTU = 1280, the largest transit packet (i.e., payload)
1132	      over a single IPv6 link in the base network without source
1133	      fragmentation

1135	   o  Network M path MTU = 1280, the largest transit packet (i.e.,
1136	      payload) that can traverse a path of links in the base network
1137	      without source fragmentation

1139	   o  Network M EMTU_R = 1500, the largest transit packet (i.e.,
1140	      payload) that can traverse a path in the base network with source
1141	      fragmentation

1143	   o  Network N MTU = 1280 (for the same reasons as for network M)

1145	   o  Network N path MTU = 1280 (for the same reasons as for network M)

1147	   o  Network N EMTU_R = 1500 (for the same reasons as for network M)

1149	   o  Tunnel MTU = 1500-encapsulation (typically 1460), the network N
1150	      EMTU_R payload

1152	   o  Tunnel atom = largest network M message that transits a tunnel
1153	      using network N as a link layer without fragmentation: 1280-
1154	      encapsulation, i.e., the network N EMTU_S payload, treating EMTU_S
1155	      as a network M EMFS_S.

1157	   The difference between the network N MTU and its treatment as a link
1158	   layer in network M is the reason why the tunnel ingress interfaces
1159	   need to support fragmentation and tunnel egress interfaces need to
1160	   support reassembly in the encapsulation layer(s). The high cost of
1161	   fragmentation and reassembly is why it is useful for applications to
1162	   avoid sending messages too close to the size of the tunnel path MTU
1163	   [Ke95], although there is no signaling mechanism that can achieve
1164	   this (see Section 4.2.3).

1166	4.2.2. Fragmentation

1168	   A tunnel interacts with fragmentation in two different ways. As a
1169	   link in network M, transit packets might be fragmented before they
1170	   reach the tunnel - i.e., in network M either during source
1171	   fragmentation (if generated at the same node as the ingress
1172	   interface) or forwarding fragmentation (for IPv4 DF=0 datagrams). In
1173	   addition, link packets traversing inside the tunnel may require
1174	   fragmentation by the ingress interface - i.e., source fragmentation
1175	   by the ingress as a host in network N. These two fragmentation
1176	   operations are no more related than are conventional IP fragmentation
1177	   and ATM segmentation and reassembly; one occurs at the (transit)
1178	   network layer, the other at the (virtual) link layer.

1180	   Although many of these issues with tunnel fragmentation and MTU
1181	   handling were discussed in [RFC4459], that document described a
1182	   variety of alternatives as if they were independent. This document
1183	   explains the combined approach that is necessary.

1185	   Like any other link, an IPv4 tunnel must transit 68 byte packets
1186	   without requiring source fragmentation [RFC791][RFC1122] and an IPv6
1187	   tunnel must transit 1280 byte packets without requiring source
1188	   fragmentation [RFC2460]. The tunnel MTU interacts with routers or
1189	   hosts it connects the same way as would any other link MTU. The
1190	   pseudocode examples in this section use the following values:

1192	   o  TP: transit packet

1194	   o  TPsize: size of the transit packet (including its headers)

1196	   o  encaps: ingress encapsulation overhead (tunnel link headers)

1198	   o  tunMTU: tunnel MTU, i.e., network N egress EMTU_R - encaps.

1200	   o  tunAtom: tunnel atom size, equal to the ingress host-level EMTU_S
1201	      - encaps.

1203	   These rules apply at the host/router where the tunnel is attached,
1204	   i.e., at the network layer of the transit packet (we assume that all
1205	   tunnels, including multipoint tunnels, have a single, uniform MTU).
1206	   These are basic source fragmentation rules (or transit
1207	   refragmentation for IPv4 DF=0 datagrams), and have no relation to the
1208	   tunnel itself other than to consider the tunnel MTU as the effective
1209	   link MTU of the next hop.

1211	   Inside the source during transit packet generation or a router during
1212	   transit packet forwarding, the tunnel is treated as if it were any
1213	   other link (i.e., this is not tunnel processing, but rather typical
1214	   source or router processing), as indicated in the pseudocode in
1215	   Figure 11.

1217	      if (TPsize > tunMTU) then
1218	         if (TP can be on-path fragmented, e.g., IPv4 DF=0) then
1219	            split TP into fragments of tunMTU size
1220	            and send each fragment to the tunnel ingress interface
1221	         else
1222	            drop the TP and send ICMP "too big" to TP source
1223	         endif
1224	      else
1225	         send TP to the tunnel ingress
1226	      endif

1228	         Figure 11 Router / host packet size processing algorithm
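
	   The decision in Figure 11 can also be sketched in Python. This is
	   an illustration under simplifying assumptions, not a forwarding
	   implementation: sizes are opaque byte counts, per-fragment IP
	   headers and 8-byte fragment alignment are ignored, and the name
	   host_router_process and its return values are hypothetical.

```python
from typing import List, Tuple


def host_router_process(tp_size: int, tun_mtu: int,
                        can_fragment: bool) -> Tuple[str, List[int]]:
    """Return an action and the sizes handed to the tunnel ingress."""
    if tp_size <= tun_mtu:
        return ("send", [tp_size])
    if can_fragment:  # e.g., IPv4 with DF=0
        full, rest = divmod(tp_size, tun_mtu)
        return ("fragment", [tun_mtu] * full + ([rest] if rest else []))
    # Not fragmentable on path: drop and signal the transit packet source.
    return ("drop_and_send_icmp_too_big", [])
```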

1230	   The tunnel ingress acts as a host on the tunnel path, i.e., it
1231	   performs source fragmentation of tunnel link packets (we assume that
1232	   all tunnels, even multipoint tunnels, have a single, uniform tunnel
1233	   MTU), using the pseudocode shown in Figure 12. Note that ingress
1234	   source fragmentation occurs in the encapsulation process, which may
1235	   involve more than one protocol layer. In those cases, fragmentation
1236	   can occur at any of the layers of encapsulation in which it is
1237	   supported, based on the configuration of the ingress.

1239	      if (TPsize <= tunAtom) then
1240	         encapsulate the TP and emit
1241	      else
1242	         if (tunAtom < TPsize) then
1243	            fragment TP into tunAtom chunks
1244	            encapsulate each chunk and emit
1245	         endif
1246	      endif

1248	                  Figure 12 Ingress processing algorithm
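
	   A hedged Python sketch of the ingress step in Figure 12 follows.
	   It treats the transit packet as an opaque byte string and uses a
	   fixed, hypothetical link_header; a real encapsulation protocol
	   would also carry the offset and identification fields that the
	   egress needs for reassembly.

```python
from typing import List


def ingress_process(tp: bytes, tun_atom: int,
                    link_header: bytes) -> List[bytes]:
    """Split the transit packet into tun_atom chunks and encapsulate each."""
    if tun_atom <= 0:
        raise ValueError("tunnel atom must be positive")
    # A packet no larger than tun_atom yields exactly one chunk.
    chunks = [tp[i:i + tun_atom] for i in range(0, len(tp), tun_atom)]
    return [link_header + chunk for chunk in chunks]
```

	   With tun_atom = 1240 and a 40-byte header, every emitted link
	   packet is at most 1280 bytes, i.e., it fits the minimum IPv6 path
	   MTU of network N.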

1250	   Just as a network interface should never receive a message larger
1251	   than its MTU, a tunnel should never receive a message larger than its
1252	   tunnel MTU limit (see the host/router processing above). A router
1253	   attempting to process such a message would already have generated an
1254	   ICMP "packet too big" and the transit packet would have been dropped
1255	   before entering into this algorithm. Similarly, a host would have
1256	   generated an error internally and aborted the attempted transmission.

1258	   As an example, consider IPv4 over IPv6 or IPv6 over IPv6 tunneling,
1259	   where IPv6 encapsulation adds a 40 byte fixed header plus IPv6
1260	   options (i.e., IPv6 header extensions) of total size 'EHsize'. The
1261	   tunnel MTU will be at least 1500 - (40 + EHsize) bytes. The tunnel
1262	   path MTU will be at least 1280 - (40 + EHsize) bytes. Transit packets
1263	   larger than 1460-EHsize will be dropped by a node before ingress
1264	   processing. Considering these minimum values, the previous algorithm
1265	   uses actual values shown in the pseudocode in Figure 13.

1267	      if (TPsize <= (1240 - EHsize)) then
1268	         encapsulate TP and emit
1269	      else
1270	         if ((1240 - EHsize) < TPsize) then
1271	            fragment TP into (1240 - EHsize) chunks
1272	            encapsulate each chunk and emit
1273	         endif
1274	      endif

1276	           Figure 13 Ingress processing for a tunnel over IPv6

1278	   An IPv6 tunnel supports IPv6 transit only if EHsize is 180 bytes or
1279	   less; otherwise the incoming transit packet would have been dropped
1280	   as being too large by the host/router. Similarly, an IPv6 tunnel
1281	   supports IPv4 transit only if EHsize is 884 bytes or less. In this
1282	   example, transit packets of up to (1240 - EHsize) can traverse the
1283	   tunnel without ingress source fragmentation and egress reassembly.
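
	   The EHsize bounds above follow from requiring that the tunnel MTU,
	   1500 - (40 + EHsize), be no smaller than the minimum size the
	   transit protocol needs. A small worked sketch, in which the name
	   max_eh_size is illustrative:

```python
IPV6_FIXED_HEADER = 40
IPV6_MIN_EMTU_R = 1500


def max_eh_size(min_transit_size: int) -> int:
    """Largest EHsize for which 1500 - (40 + EHsize) >= min_transit_size."""
    return IPV6_MIN_EMTU_R - IPV6_FIXED_HEADER - min_transit_size


print(max_eh_size(1280))  # IPv6 transit: 180
print(max_eh_size(576))   # IPv4 transit: 884
```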

1285	   When using IP directly over IP, the minimum transit packet EMTU_R for
1286	   IPv4 is 576 bytes and for IPv6 is 1500 bytes. This means that tunnels
1287	   of IPv4-over-IPv4, IPv4-over-IPv6, and IPv6-over-IPv6 are possible
1288	   without additional requirements, but this may involve ingress
1289	   fragmentation and egress reassembly. IPv6 cannot be tunneled directly
1290	   over IPv4 without additional requirements, notably that the egress
1291	   EMTU_R is at least 1280 bytes.
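
	   One way to check these four combinations is to compare the
	   smallest packet that a link of the transit protocol must carry,
	   plus the encapsulating header, against the minimum EMTU_R of the
	   encapsulating protocol. The sketch below assumes option-free
	   headers (20 bytes for IPv4, 40 bytes for IPv6); the table and
	   function names are illustrative.

```python
MIN_LINK_MTU = {"IPv4": 68, "IPv6": 1280}   # what a link must carry
MIN_EMTU_R = {"IPv4": 576, "IPv6": 1500}    # what a receiver must reassemble
BASE_HEADER = {"IPv4": 20, "IPv6": 40}      # assumed: header without options


def feasible(transit: str, carrier: str) -> bool:
    """True if the minimum requirements alone allow transit-over-carrier."""
    needed = MIN_LINK_MTU[transit] + BASE_HEADER[carrier]
    return needed <= MIN_EMTU_R[carrier]


for transit in ("IPv4", "IPv6"):
    for carrier in ("IPv4", "IPv6"):
        print(transit, "over", carrier, feasible(transit, carrier))
```

	   Only IPv6 over IPv4 fails the check (1280 + 20 exceeds 576),
	   which is why it needs a larger egress EMTU_R than IPv4 requires.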

1293	   When ongoing ingress fragmentation and egress reassembly would be
1294	   prohibitive or costly, larger MTUs can be supported by design and
1295	   confirmed either out-of-band (by design) or in-band (e.g., using
1296	   PLPMTUD [RFC4821], as done in SEAL [RFC5320] and AERO [Te16]).

1298	4.2.3. Path MTU Discovery

1300	   Path MTU discovery (PMTUD) enables a network path to support a larger
1301	   PMTU than it can assume from the minimum requirements of the protocol
1302	   over which it operates. Note, however, that PMTUD never discovers an
1303	   EMTU_R that is larger than the required minimum; that information is
1304	   available to some upper layer protocols, such as TCP [RFC1122], but
1305	   cannot be determined at the IP layer.

1307	   There is temptation to optimize tunnel traversal so that packets are
1308	   not fragmented between ingress and egress, i.e., to attempt to tune
1309	   the network M PMTU to the tunnel atom size (i.e., the ingress EMTU_S
1310	   minus encapsulation overhead) rather than the tunnel MTU, to avoid
1311	   ingress fragmentation.

1313	   This is often impossible because the ICMP "packet too big" message
1314	   (IPv4 fragmentation needed [RFC792] or IPv6 packet too big [RFC4443])
1315	   indicates the complete failure of a link to transit a packet, not a
1316	   preference for a size that matches the internal mechanism of the
1317	   link. ICMP messages are intended to indicate whether a tunnel MTU is
1318	   insufficient; there is no ICMP message that can indicate when a
1319	   transit packet is "too big for the tunnel path MTU, but not larger
1320	   than the tunnel MTU". If there were, endpoints might receive that
1321	   message for IP packets larger than 40 bytes (the payload of a single
1322	   ATM cell, allowing for the 8-byte AAL5 trailer), but smaller than 9K
1323	   (the ATM EMTU_R payload).

1325	   In addition, attempting to tune the network transit size to natively
1326	   match that of the link internal transit can be hazardous for many
1327	   reasons:

1329	   o  The tunnel is capable of transiting packets as large as the
1330	      network N EMTU_R - encapsulation, which is always at least as
1331	      large as the tunnel MTU and typically is larger.

1333	   o  ICMP has only one type of error message regarding large packets -
1334	      "too big", i.e., too large to transit. There is no optimization
1335	      message of "bigger than I'd like, but I can handle it if needed".

1337	   o  IP tunnels often involve some level of recursion, i.e.,
1338	      encapsulation over itself [RFC4459].

1340	   Tunnels that use IPv4 as the encapsulation layer SHOULD set DF=0, but
1341	   this requires generating unique fragmentation ID values, which may
1342	   limit throughput [RFC6864]. These tunnels might have difficulty
1343	   assuming ingress EMTU_S values over 64 bytes, so it may not be
1344	   feasible to assume that larger packets with DF=1 are safe.

1346	   Recursive tunneling occurs whenever a protocol ends up encapsulated
1347	   in itself. This happens directly, as when IPv4 is encapsulated in
1348	   IPv4, or indirectly, as when IP is encapsulated in UDP which then is
1349	   a payload inside IP. It can involve many layers of encapsulation
1350	   because a tunnel provider isn't always aware of whether the packets
1351	   it transits are already tunneled.

1353	   Recursion is impossible when the tunnel transit packets are limited
1354	   to the native size of the ingress payload. Arriving tunnel
1355	   transit packets have a minimum supported size (1280 for IPv6) and the
1356	   tunnel PMFS has the same requirement; there would be no room for the
1357	   tunnel's "link layer" headers, i.e., the encapsulation layer. The
1358	   result would be an IPv6 tunnel that cannot satisfy IPv6 transit
1359	   requirements.
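
	   A short worked sketch makes the shrinkage concrete. It assumes
	   40 bytes of encapsulation per level and that no tunnel fragments,
	   so each level can offer only the path MTU beneath it minus its
	   own header. The function name is illustrative.

```python
from typing import List


def atom_limited_mtus(path_mtu: int, encaps: int, depth: int) -> List[int]:
    """MTU offered at each nesting depth when no tunnel fragments."""
    mtus = []
    for _ in range(depth):
        path_mtu -= encaps  # each level loses room to its own header
        mtus.append(path_mtu)
    return mtus


print(atom_limited_mtus(1280, 40, 3))  # [1240, 1200, 1160]
```

	   Every value is below the 1280 bytes that an IPv6 link must
	   support. With ingress fragmentation and egress reassembly, each
	   level instead offers its EMTU_R minus the encapsulation overhead
	   (1500 - 40 = 1460 under the same assumptions).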

1361	   It is more appropriate to require the tunnel to satisfy IP transit
1362	   requirements and enforce that requirement at design time or during
1363	   operation (the latter using PLPMTUD [RFC4821]). Conventional path MTU
1364	   discovery (PMTUD) relies on existing endpoint ICMP processing of
1365	   explicit negative feedback from routers along the path via "packet
1366	   too big" ICMP messages in the reverse direction of the tunnel
1367	   [RFC1191][RFC1981]. This technique is susceptible to the "black hole"
1368	   phenomenon, in which the ICMP messages never return to the source due
1369	   to policy-based filtering [RFC2923]. PLPMTUD requires a separate,
1370	   direct control channel from the egress to the ingress that provides
1371	   positive feedback; the direct channel is not blocked by policy
1372	   filters and the positive feedback ensures fail-safe operation if
1373	   feedback messages are lost [RFC4821].
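
	   The fail-safe property of positive feedback can be illustrated
	   with a simple probe search. This sketch is not the procedure of
	   [RFC4821]; it assumes a hypothetical probe(size) callback that
	   returns True only when the egress acknowledges a probe of that
	   size, and it uses a binary search chosen here for brevity.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Largest size in [low, high] that was positively acknowledged.

    `low` must already be known to work (e.g., the protocol minimum).
    A lost probe or lost acknowledgment reads as failure, so the result
    can only err toward a smaller, still usable, size.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # acknowledged: sizes up to mid are usable
        else:
            high = mid - 1  # no acknowledgment: treat mid as too large
    return low
```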

1375	4.3. Coordination Issues

1377	   IP tunnels interact with link layer signals and capabilities in a
1378	   variety of ways. The following subsections address some key issues of
1379	   these interactions. In general, they are again informed by treating a
1380	   tunnel as any other link layer and considering the interactions
1381	   between the IP layer and link layers [RFC3819].

1383	4.3.1. Signaling

1385	   In the current Internet architecture, signaling goes upstream, either
1386	   from routers along a path or from the destination, back toward the
1387	   source. Such signals are typically contained in ICMP messages, but
1388	   can involve other protocols such as RSVP, transport protocol signals
1389	   (e.g., TCP RSTs), or multicast control or transport protocols.

1391	   A tunnel behaves like a link and acts like a link interface at the
1392	   nodes where it is attached. As such, it can provide information that
1393	   enhances IP signaling (e.g., ICMP), but itself does not directly
1394	   generate ICMP messages.

1396	   For tunnels, this means that there are two separate signaling paths.
1397	   The outer network M nodes can each signal the source of the tunnel
1398	   transit packets, Hsrc (Figure 14). Inside the tunnel, the inner
1399	   network N nodes can signal the source of the tunnel link packets, the
1400	   ingress I (Figure 15).

1402	           +--------+---------------------------+--------+
1403	           |        |                           |        |
1404	           v        --_                         --       v
1405	        +------+   /  \                        /  \   +------+
1406	        | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
1407	        +------+   \  //\    /  \    /  \    /\\  /   +------+
1408	                    --/I \--+ Rb +--+ Rc +--/E \--
1409	                      \  /   \  /    \  /   \  /
1410	                       \/     --      --     \/
1411	                        <---- Network N ----->
1412	        <-------------------- Network M --------------------->

1414	                   Figure 14 Signals outside the tunnel

1416	                        +-----+-------+------+
1417	                    --_ |     |       |      |  --
1418	        +------+   /  \ v     |       |      | /  \   +------+
1419	        | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
1420	        +------+   \  //\    /  \    /  \    /\\  /   +------+
1421	                    --/I \--+ Rb +--+ Rc +--/E \--
1422	                      \  /   \  /    \  /   \  /
1423	                       \/     --      --     \/
1424	                        <----- Network N ---->
1425	        <--------------------- Network M -------------------->

1427	                    Figure 15 Signals inside the tunnel

1429	   These two signal paths are inherently distinct except where
1430	   information is exchanged between the network interface of the tunnel
1431	   (the ingress) and its attached node (Ra, in both figures).

1433	   It is always possible for a network interface to provide hints to its
1434	   attached node (host or router), which can be used for optimization.
1435	   In this case, when signals inside the tunnel indicate a change to the
1436	   tunnel, the ingress (i.e., the tunnel network interface) can provide
1437	   information to the router (Ra, in both figures), so that Ra can
1438	   generate the appropriate signal in return to Hsrc. This relaying may
1439	   be difficult, because signals inside the tunnel may not return enough
1440	   information to the ingress to support direct relaying to Hsrc.

1442	   In all cases, the tunnel ingress needs to determine how to relay the
1443	   signals from inside the tunnel into signals back to the source. For
1444	   some protocols this is either simple or impossible (such as for
1445	   ICMP); for others, it can even be undefined (e.g., multicast). In
1446	   some cases, the individual signals relayed from inside the tunnel may
1447	   result in corresponding signals in the outside network, and in other
1448	   cases they may just change state of the tunnel interface. In the
1449	   latter case, the result may cause the router Ra to generate new ICMP
1450	   errors when later messages arrive from Hsrc or other sources in the
1451	   outer network.

1453	   The meaning of the relayed information must be carefully translated.
1454	   An ICMP error within a tunnel indicates a failure of the path inside
1455	   the tunnel to support an egress EMTU_S. It can be very difficult to
1456	   convert that ICMP error into a corresponding ICMP message from the
1457	   ingress node back to the transit packet source. The ICMP message may
1458	   not contain enough of a packet prefix to extract the transit packet
1459	   header sufficient to generate the appropriate ICMP message. The
1460	   relationship between the egress EMTU_S and the transit packet may be
1461	   indirect, e.g., the ingress node may be performing source
1462	   fragmentation that should be adjusted instead of propagating the ICMP
1463	   upstream.
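
	   The distinction between adjusting link state and relaying a
	   signal can be sketched as follows. The class and method names are
	   hypothetical, and the model is deliberately minimal: an inner
	   "packet too big" lowers only the size at which the ingress
	   fragments, while the tunnel MTU seen by Ra is unchanged.

```python
class TunnelInterface:
    """Toy model of the ingress state affected by inner ICMP messages."""

    def __init__(self, encaps: int, egress_emtu_r: int, ingress_emtu_s: int):
        self.encaps = encaps
        self.tun_mtu = egress_emtu_r - encaps    # set by egress reassembly
        self.tun_atom = ingress_emtu_s - encaps  # tracks the inner path

    def on_inner_packet_too_big(self, reported_mtu: int) -> None:
        """Adjust ingress fragmentation; nothing is relayed to Hsrc."""
        self.tun_atom = min(self.tun_atom, reported_mtu - self.encaps)

    def accepts(self, tp_size: int) -> bool:
        """Ra signals "too big" to Hsrc only when this returns False."""
        return tp_size <= self.tun_mtu
```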

1465	   Some messages have detailed specifications for relaying between the
1466	   tunnel link packet and transit packet, including Explicit Congestion
1467	   Notification (ECN [RFC6040]) and multicast (e.g., IGMP).

1469	4.3.2. Congestion

1471	   Tunnels carrying IP traffic (i.e., the focus of this document) need
1472	   not react directly to congestion any more than would any other link
1473	   layer [RFC8085]. IP transit packet traffic is already expected to be
1474	   congestion controlled.

1476	   It is useful to relay network congestion notification between the
1477	   tunnel link and the tunnel transit packets. Explicit congestion
1478	   notification requires that ECN bits are copied from the tunnel
1479	   transit packet to the tunnel link packet on encapsulation, as well as
1480	   copied back at the egress based on a combination of the bits of the
1481	   two headers [RFC6040]. This allows congestion notification within the
1482	   tunnel to be interpreted as if it were on the direct path.
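
	   The copy-and-combine behavior can be sketched in Python. This
	   reflects one reading of the normal-mode rules and is offered only
	   as an illustration; [RFC6040] remains the authoritative source
	   for the combinations, including the cases it flags as anomalous.
	   The function names are assumptions.

```python
from typing import Optional

# ECN codepoints (two-bit field values).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def ecn_encapsulate(inner: int) -> int:
    """Ingress: copy the transit packet's ECN field to the link header."""
    return inner


def ecn_decapsulate(inner: int, outer: int) -> Optional[int]:
    """Egress: combine both fields; None means the packet is dropped."""
    if inner == NOT_ECT:
        # The transit flow is not ECN-capable, so a CE mark cannot be
        # delivered as a mark; the packet is dropped instead.
        return None if outer == CE else NOT_ECT
    if CE in (inner, outer):
        return CE
    if ECT1 in (inner, outer):
        return ECT1
    return ECT0
```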

1484	4.3.3. Multipoint Tunnels and Multicast

1486	   Multipoint tunnels are tunnels with more than two ingress/egress
1487	   endpoints. Just as tunnels emulate links, multipoint tunnels emulate
1488	   multipoint links, and can support multicast as a tunnel capability.
1489	   Multipoint tunnels can be useful on their own, or may be used as part
1490	   of more complex systems, e.g., LISP and TRILL configurations
1491	   [RFC6830][RFC6325].

1493	   Multipoint tunnels require support for egress determination, just
1494	   as multipoint links do. This function is typically supported by ARP
1495	   [RFC826] or ARP emulation (e.g., LAN Emulation, known as LANE
1496	   [RFC2225]) for multipoint links. For multipoint tunnels, a similar
1497	   mechanism is required for the same purpose - to determine the egress
1498	   address for proper ingress encapsulation (e.g., LISP Map-Service
1499	   [RFC6833]).

1501	   All multipoint systems - tunnels and links - might support different
1502	   MTUs between each ingress/egress (or link entrance/exit) pair. In
1503	   most cases, it is simpler to assume a uniform MTU throughout the
1504	   multipoint system, e.g., the minimum MTU supported across all
1505	   ingress/egress pairs. This applies to both the ingress EMTU_S and
1506	   egress EMTU_R (the latter determining the tunnel MTU).
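
	   Choosing a uniform value amounts to taking a minimum over all
	   pairs. In the sketch below, the endpoint names and the dictionary
	   layout are hypothetical.

```python
from typing import Dict, Tuple


def uniform_mtu(pair_mtus: Dict[Tuple[str, str], int]) -> int:
    """Smallest MTU over all (ingress, egress) pairs of the tunnel."""
    if not pair_mtus:
        raise ValueError("a multipoint tunnel needs at least one pair")
    return min(pair_mtus.values())


pairs = {("I1", "E1"): 1460, ("I1", "E2"): 1400, ("I2", "E1"): 1440}
print(uniform_mtu(pairs))  # 1400
```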

1508	   A multipoint tunnel MUST have support for broadcast and multicast, in
1509	   exactly the same way as this is already required for multipoint links
1510	   [RFC3819]. Both modes can be supported either by a native mechanism
1511	   inside the tunnel or by emulation using serial replication at the
1512	   tunnel ingress (e.g., AMT [RFC7450]), in the same way that links may
1513	   provide the same support either natively (e.g., via promiscuous or
1514	   automatic replication in the link itself) or by network interface
1515	   emulation (e.g., as for non-broadcast multiaccess networks, i.e.,
1516	   NBMAs).
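
	   Emulation by serial replication can be sketched in a few lines.
	   The encapsulate callback and the egress identifiers are
	   hypothetical placeholders for whatever the tunnel protocol
	   provides.

```python
from typing import Callable, Iterable, List


def replicate_at_ingress(transit_packet: bytes,
                         egresses: Iterable[str],
                         encapsulate: Callable[[bytes, str], bytes]
                         ) -> List[bytes]:
    """Emit one encapsulated copy of the transit packet per egress."""
    return [encapsulate(transit_packet, egress) for egress in egresses]
```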

1518	   IGMP snooping enables IP multicast to be coupled with native link
1519	   layer multicast support [RFC4541]. A similar technique may be
1520	   relevant to couple transit packet multicast to tunnel link packet
1521	   multicast, but the coupling of the protocols may be more complex
1522	   because many tunnel link protocols rely on their own network N
1523	   multicast control protocol, e.g., via PIM-SM [RFC6807][RFC7761].

1525	4.3.4. Load Balancing

1527	   Load balancing can impact the way in which a tunnel operates. In
1528	   particular, multipath routing inside the tunnel can cause some of
1529	   the tunnel parameters to vary, both over time and for different
1530	   transit packets. The use of multiple paths can be the result of MPLS
1531	   link aggregation groups (LAGs), equal-cost multipath routing (ECMP
1532	   [RFC2991]), or other load balancing mechanisms. In some cases, the
1533	   tunnel exists as the mechanism to support ECMP, as for GRE in UDP
1534	   [RFC8086].

1536	   A tunnel may have multiple paths between the ingress and egress with
1537	   different path MTU values, causing the ingress EMTU_S to vary
1538	   [RFC7690]. Rather than track individual values, the EMTU_S can be set
1539	   to the minimum of these different path MTU values.

1541	   IPv6 packets include a flow label to enable multipath routing to keep
1542	   packets of a single flow following the same path. It is helpful to
1543	   preserve the semantics of that flow label as an aggregate identifier
1544	   inside the encapsulated link packets of a tunnel. This is achieved by
1545	   hashing the transit IP addresses and flow label to generate a new
1546	   flow label for use between the ingress and egress addresses
1547	   [RFC6438]. It is not useful to simply copy the flow label from the
1548	   transit packet into the link packet because of collisions that might
1549	   arise if a label is used for flows between different transit packet
1550	   addresses that traverse the same tunnel.
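
	   A minimal sketch of deriving the link flow label follows. The
	   choice of SHA-256 and the function name are assumptions made for
	   illustration; [RFC6438] does not mandate a specific hash.

```python
import hashlib
import ipaddress


def link_flow_label(src: str, dst: str, transit_flow_label: int) -> int:
    """Derive a 20-bit flow label for the tunnel link header."""
    material = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + transit_flow_label.to_bytes(3, "big")  # 20-bit label fits in 3 bytes
    )
    digest = hashlib.sha256(material).digest()
    # Keep the low 20 bits of the first three digest bytes.
    return int.from_bytes(digest[:3], "big") & 0xFFFFF
```

	   Because the transit addresses are part of the hash input, two
	   flows that happen to share a transit flow label are unlikely to
	   map to the same link flow label.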

1552	4.3.5. Recursive Tunnels

1554	   The rules described in this document already support tunnels over
1555	   tunnels, sometimes known as "recursive" tunnels, in which IP is
1556	   transited over IP either directly or via intermediate encapsulation
1557	   (IP-UDP-IP, as in GUE [He16]).

1559	   There are known hazards to recursive tunneling, notably that the
1560	   independence of the tunnel transit header and tunnel link header hop
1561	   counts can result in a tunneling loop. Such looping can be avoided
1562	   when using direct encapsulation (IP in IP) by use of a header option
1563	   to track the encapsulation count and to limit that count [RFC2473].
1564	   This looping cannot be avoided when other protocols are used for
1565	   tunneling, e.g., IP in UDP in IP, because the encapsulation count may
1566	   not be visible where the recursion occurs.
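
	   The encapsulation count mechanism can be sketched as a counter
	   that each ingress decrements. The sketch is loosely modeled on
	   the Tunnel Encapsulation Limit of [RFC2473]; the default value of
	   4 and the function name are assumptions, and the option encoding
	   and ICMP error reporting are omitted.

```python
from typing import Optional

DEFAULT_ENCAP_LIMIT = 4  # assumed configuration value


def next_encap_limit(arriving_limit: Optional[int]) -> Optional[int]:
    """Limit for the new link header, or None if the packet is discarded.

    `arriving_limit` is None when the arriving packet carries no limit,
    i.e., it has not yet been tunneled by a limit-aware ingress.
    """
    if arriving_limit is None:
        return DEFAULT_ENCAP_LIMIT
    if arriving_limit == 0:
        return None  # no further nesting permitted
    return arriving_limit - 1
```

	   A packet looping through such ingresses is discarded after a
	   bounded number of encapsulations rather than nesting forever.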

1568	5. Observations

1570	   The following subsections summarize the observations of this document
1571	   and the issues with existing tunnel protocol specifications. They
1572	   also include advice for tunnel protocol designers, implementers, and
1573	   operators.

1575	5.1. Summary of Recommendations

1577	   o  Tunnel endpoints are network interfaces, tunnels are virtual links

1578	       o ICMP messages MUST NOT be generated by the tunnel (as a link)

1580	       o ICMP messages received by the ingress inside the link change
1581	          the link properties (they do not generate transit-layer ICMP
1582	          messages)

1584	       o Link headers (hop, ID, options) are largely independent of
1585	          arriving ID (with few exceptions based on translation, not
1586	          direct copying, e.g., ECN and IPv6 flow IDs)

1588	   o  MTU values should treat the tunnel as any other link

1590	       o Require ingress source fragmentation and egress
1591	          reassembly at the tunnel link packet layer

1593	       o The tunnel MTU is the tunnel egress EMTU_R less headers, and
1594	          not related at all to the ingress-egress MFS

1596	   o  Tunnels must obey core IP requirements

1598	       o Obey IPv4 DF=1 on arrival at a node (nodes MUST NOT fragment
1599	          IPv4 packets where DF=1)

1601	       o Shut down an IP tunnel if the tunnel MTU falls below the
1602	          required minimum
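
	   The last recommendation reduces to a comparison against the
	   minimum packet size each IP version requires a link to carry (68
	   bytes for IPv4, 1280 bytes for IPv6, as noted in Section 4.2.2).
	   In this sketch the table and function names are illustrative.

```python
MIN_TRANSIT_MTU = {4: 68, 6: 1280}  # minimum link MTU per IP version


def tunnel_may_stay_up(transit_ip_version: int, tun_mtu: int) -> bool:
    """False when the tunnel MTU is below the transit protocol minimum."""
    return tun_mtu >= MIN_TRANSIT_MTU[transit_ip_version]
```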

1604	5.2. Impact on Existing Encapsulation Protocols

1606	   Many existing and proposed encapsulation protocols are inconsistent
1607	   with the guidelines of this document. The following list summarizes
1608	   only those inconsistencies, but omits places where a protocol is
1609	   inconsistent solely by reference to another protocol.

1611	   [should this be inverted as a table of issues and a list of which
1612	   RFCs have problems?]

1614	   o  IP in IP / mobile IP [RFC2003][RFC4459] - IPv4 in IPv4

1616	       o Sets link DF when transit DF=1 (fails without PLPMTUD)

1618	       o Drops at egress if hopcount = 0 (host-host tunnels fail)

1620	       o Drops based on transit source (same as router IP, matches
1621	          egress), i.e., performs routing functions it should not

1623	       o Ingress generates ICMP messages (based on relayed context),
1624	          rather than using inner ICMP messages to set interface
1625	          properties only

1627	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1629	   o  IPv6 tunnels [RFC2473] -- IPv6 or IPv4 in IPv6

1631	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1633	       o Decrements transiting packet hopcount (by 1)

1635	       o Copies traffic class from tunnel link to tunnel transit header

1637	       o Ignores IPv4 DF=0 and fragments at that layer upon arrival

1639	       o Fails to retain soft ingress state based on inner ICMP messages
1640	          affecting tunnel MTU

1642	       o Tunnel ingress issues ICMPs

1644	       o Fragments IPv4 over IPv6 fragments only if IPv4 DF=0
1645	          (misinterpreting the "can fragment the IPv4 packet" as
1646	          permission to fragment at the IPv6 link header)

1648	   o  IPsec tunnel mode (IP in IPsec in IP) [RFC4301] -- IP in IPsec

1650	       o Uses security policy to set, clear, or copy DF (rather than
1651	          generating it independently, which would also be more secure)

1653	       o Intertwines tunnel selection with security selection, rather
1654	          than presenting tunnel as an interface and using existing
1655	          forwarding (as with transport mode over IP-in-IP [RFC3884])

1657	   o  GRE (IP in GRE in IP or IP in GRE in UDP in IP)
1658	      [RFC2784][RFC7588][RFC7676][RFC8086]

1660	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1662	       o Requires ingress to generate ICMP errors

1664	       o Copies IPv4 DF to outer IPv4 DF

1666	       o Violates IPv6 MTU requirements when using IPv6 encapsulation

1668	   o  LISP [RFC6830]

1669	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1671	       o Requires ingress to generate ICMP errors

1673	       o Copies inner hop limit to outer

1675	   o  L2TP [RFC3931]

1677	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1679	       o Requires ingress to generate ICMP errors

1681	   o  PWE [RFC3985]

1683	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1685	       o Requires ingress to generate ICMP errors

1687	   o  GUE (Generic UDP encapsulation) [He16] - IP (et al.) in UDP in IP

1689	       o Allows inner encapsulation fragmentation

1691	   o  Geneve [RFC7364][Gr16] - IP (et al.) in Geneve in UDP in IP

1693	       o Treats tunnel MTU as tunnel path MTU, not tunnel egress MTU

1695	   o  SEAL/AERO [RFC5320][Te16] - IP in SEAL/AERO in IP

1697	       o Some issues with SEAL (MTU, ICMP), corrected in AERO

1699	   o  RTG DT encapsulations [No16]

1701	       o Assumes fragmentation can be avoided completely

1703	       o Allows encapsulation protocols that lack fragmentation

1705	       o Relies on ICMP PTB to correct for tunnel path MTU

1707	   o  No known issues

1709	       o L2VPN (framework for L2 virtualization) [RFC4664]

1711	       o L3VPN (framework for L3 virtualization) [RFC4176]

1713	       o MPLS (IP in MPLS) [RFC3031]

1715	       o TRILL (Ethernet in Ethernet) [RFC5556][RFC6325]

1717	5.3. Tunnel Protocol Designers

1719	   [To be completed]

1721	   Recursive tunneling + minimum MTU = frag/reassembly is inevitable, at
1722	   least to be able to split/join two fragments

1724	   Account for egress MTU/path MTU differences.

1726	   Include a stronger checksum.

1728	   Ensure the egress MTU is always larger than the path MTU.

1730	   Ensure that the egress reassembly can keep up with line rate OR
1731	   design PLPMTUD into the tunneling protocol.

1733	5.3.1. For Future Standards

1735	   [To be completed]

1737	   Larger IPv4 MTU (2K? or just 2x path MTU?) for reassembly

1739	   Always include frag support for at least two frags; do NOT try to
1740	   deprecate fragmentation.

1742	   Limit encapsulation option use/space.

1744	   Augment ICMP to have two separate messages: PTB vs P-bigger-than-
1745	   optimal

1747	   Include MTU as part of BGP as a hint - SB

1749	   Hazards of multi-MTU draft-van-beijnum-multi-mtu-04

1751	5.3.2. Diagnostics

1753	   [To be completed]

1755	   Some current implementations include diagnostics to support
1756	   monitoring the impact of tunneling, especially the impact on
1757	   fragmentation and reassembly resources, the status of path MTU
1758	   discovery, etc.

1760	   >> Because a tunnel ingress/egress is a network interface, it SHOULD
1761	   have similar resources as any other network interface. This includes
1762	   resources for packet processing as well as monitoring.

1764	5.4. Tunnel Implementers

1766	   [To be completed]

1768	   Detect when the egress MTU is exceeded.

1770	   Detect when the egress MTU drops below the required minimum and shut
1771	   down the tunnel if that happens - configuring the tunnel down and
1772	   issuing a hard error may be the only way to detect this anomaly, and
1773	   it's sufficiently important that the tunnel SHOULD be disabled. This
1774	   is always better than blindly assuming the tunnel has been deployed
1775	   correctly, i.e., that the solution has been engineered.
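
   The shutdown behavior described above can be sketched as follows.
   This is only an illustration: the class name, the field names, and
   the idea of passing the required minimum as a parameter are
   assumptions made for this sketch, not part of any tunnel
   specification.

```python
class TunnelInterface:
    """Hypothetical tunnel interface that disables itself on a hard MTU error."""

    def __init__(self, required_min_mtu):
        # required_min_mtu is an assumed, operator-supplied value.
        self.required_min_mtu = required_min_mtu
        self.up = True
        self.last_error = None

    def update_egress_mtu(self, egress_mtu):
        """Record a newly learned egress MTU; return whether the tunnel is up."""
        if egress_mtu < self.required_min_mtu:
            # Hard error: configure the tunnel down rather than assume
            # the deployment was engineered correctly.
            self.up = False
            self.last_error = (
                f"egress MTU {egress_mtu} below required minimum "
                f"{self.required_min_mtu}"
            )
        # Once down, the tunnel stays down until an operator intervenes.
        return self.up
```

   In this sketch the tunnel does not come back up by itself when a
   later, larger egress MTU is reported; clearing the error is left to
   the operator.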

1777	   Do NOT decrement the TTL as part of being a tunnel. It is already
1778	   acceptable for a router to decrement the TTL based on different
1779	   next-hop routers, but TTL is a property of a router, not a link.

1781	5.5. Tunnel Operators

1783	   [To be completed]

1785	   Keep the difference between "enforced by operators" vs. "enforced by
1786	   active protocol mechanism" in mind. It's fine to assume something the
1787	   tunnel cannot or does not test, as long as you KNOW you can assume
1788	   it. When the assumption is wrong, it will NOT be signaled by the
1789	   tunnel.

1794	   Consider the circuit breakers doc to provide diagnostics and last-
1795	   resort control to avoid overload for non-reactive traffic (see
1796	   Gorry's RFC-to-be)

1798	   Do NOT decrement the TTL as part of being a tunnel. It is already
1799	   acceptable for a router to decrement the TTL based on different
1800	   next-hop routers, but TTL is a property of a router, not a link.

1802	   >>>> PLPMTUD can give multiple conflicting PMTU values during ECMP or
1803	   LAG if PMTU is cached per endpoint pair rather than per flow -- but
1804	   so can PMTUD! This is another reason why ICMP should never drive up
1805	   the effective MTU (if aggregate, treat as the minimum of received
1806	   messages over an interval).
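
   One way to read the note above is sketched below: PTB reports
   collected over an interval are reduced to their minimum, and a
   report can only lower the effective MTU, never raise it. The
   function name and the "floor" parameter (a lower bound used to
   ignore implausibly small reports) are assumptions made for this
   sketch.

```python
def apply_ptb_reports(current_mtu, reports, floor):
    """Return the effective MTU after one interval of PTB reports.

    reports: MTU values carried in PTB messages seen during the interval.
    floor:   assumed lower bound; smaller reports are ignored.
    """
    # Keep only reports that would lower the MTU and are not below the floor.
    usable = [m for m in reports if floor <= m < current_mtu]
    # Aggregate as the minimum; with no usable report the MTU is unchanged.
    return min(usable) if usable else current_mtu
```

   For example, with a current MTU of 1500 and reports of 1400, 1480,
   and 9000 in one interval, the sketch settles on 1400 and ignores
   the report that would have raised the MTU.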

1808	6. Security Considerations

1810	   Tunnels may introduce vulnerabilities or add to the potential for
1811	   receiver overload and thus DoS attacks. These issues are primarily
1812	   related to the fact that a tunnel is a link that traverses a network
1813	   path and to fragmentation and reassembly. ICMP signal translation
1814	   introduces a new security issue and must be done with care. ICMP
1815	   generation at the router or host attached to a tunnel is already
1816	   covered by existing requirements (e.g., should be throttled).

1818	   Tunnels traverse multiple hops of a network path from ingress to
1819	   egress. Traffic along such tunnels may be susceptible to on-path and
1820	   off-path attacks, including fragment injection, reassembly buffer
1821	   overload, and ICMP attacks. Some of these attacks may not be as
1822	   visible to the endpoints of the architecture into which tunnels are
1823	   deployed and these attacks may thus be more difficult to detect.

1825	   Fragmentation at routers or hosts attached to tunnels may place an
1826	   undue burden on receivers where traffic is not sufficiently diffuse,
1827	   because tunnels may induce source fragmentation at hosts and path
1828	   fragmentation (for IPv4 DF=0) more often than other links do.
1829	   Care should be taken to avoid this situation, notably by ensuring
1830	   that tunnel MTUs are not significantly different from other link
1831	   MTUs.

1833	   Tunnel ingresses emitting IP datagrams MUST obey all existing IP
1834	   requirements, such as the uniqueness of the IP ID field. Failure to
1835	   either limit encapsulation traffic, or use additional ingress/egress
1836	   IP addresses, can result in high-speed traffic fragments being
1837	   incorrectly reassembled.
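
   The arithmetic behind this limit can be sketched as follows. The
   sketch assumes a 16-bit IPv4 ID that must not repeat for a given
   ingress/egress address pair within the reassembly timeout; the
   120-second timeout in the usage note and the function names are
   assumptions chosen for illustration.

```python
import math

def max_safe_rate_bps(packet_bytes, reassembly_timeout_s, id_bits=16):
    """Highest rate (bits/s) at which IDs do not repeat within the timeout."""
    unique_ids = 2 ** id_bits
    return unique_ids * packet_bytes * 8 / reassembly_timeout_s

def address_pairs_needed(target_rate_bps, packet_bytes, reassembly_timeout_s):
    """Ingress/egress address pairs needed to carry target_rate_bps safely."""
    per_pair = max_safe_rate_bps(packet_bytes, reassembly_timeout_s)
    return math.ceil(target_rate_bps / per_pair)
```

   With 1500-byte packets and an assumed 120-second timeout, one
   address pair supports about 6.5 Mbps under this model, so a
   10 Gbps tunnel would need on the order of 1500 address pairs or
   would have to limit its fragmentable traffic.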

1839	   Tunnels are susceptible to attacks at both the inner and outer
1840	   network layers. The tunnel ingress/egress endpoints appear as network
1841	   interfaces in the outer network, and are as susceptible as any other
1842	   network interface. This includes vulnerability to fragmentation
1843	   reassembly overload, traffic overload, and spoofed ICMP messages that
1844	   misreport the state of those interfaces. Similarly, the
1845	   ingress/egress appear as hosts to the path traversed by the tunnel,
1846	   and thus are as susceptible as any other host to attacks as well.

1848	   [management?]

1850	   [Access control?]

1852	   describe relationship to [RFC6169] - JT (as per INTAREA meeting
1853	   notes, don't cover Teredo-specific issues in RFC6169, but include
1854	   generic issues here)

1856	7. IANA Considerations

1858	   This document has no IANA considerations.

1860	   The RFC Editor should remove this section prior to publication.

1862	8. References

1864	8.1. Normative References

1866	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1867	             Requirement Levels", BCP 14, RFC 2119, March 1997.

1869	   [are there others? 3819? ECN? Flow label issues?]

1871	8.2. Informative References

1873	   [Cl88]    Clark, D., "The design philosophy of the DARPA internet
1874	             protocols," Proc. Sigcomm 1988, p.106-114, 1988.

1876	   [Er94]    Eriksson, H., "MBone: The Multicast Backbone,"
1877	             Communications of the ACM, Aug. 1994, pp.54-60.

1879	   [Gr16]    Gross, J. (Ed.), I. Ganga (Ed.), T. Sridhar (Ed.), "Geneve:
1880	             Generic Network Virtualization Encapsulation," draft-ietf-
1881	             nvo3-geneve-03, Sep. 2016.

1883	   [He16]    Herbert, T., L. Yong, O. Zia, "Generic UDP Encapsulation,"
1884	             draft-ietf-nvo3-gue-05, Oct. 2016.

1886	   [Ke95]    Kent, S., J. Mogul, "Fragmentation considered harmful," ACM
1887	             Sigcomm Computer Communication Review (CCR), V25 N1, Jan.
1888	             1995, pp. 75-87.

1890	   [No16]    Nordmark, E. (Ed.), A. Tian, J. Gross, J. Hudson, L.
1891	             Kreeger, P. Garg, P. Thaler, T. Herbert, "Encapsulation
1892	             Considerations," draft-ietf-rtgwg-dt-encap-02, Oct. 2016.

1894	   [RFC5]    Rulifson, J., "Decode Encode Language (DEL)," RFC 5, June
1895	             1969.

1897	   [RFC768]  Postel, J., "User Datagram Protocol," RFC 768, Aug. 1980.

1899	   [RFC791]  Postel, J., "Internet Protocol," RFC 791 / STD 5, September
1900	             1981.

1902	   [RFC792]  Postel, J., "Internet Control Message Protocol," RFC 792,
1903	             Sep. 1981.

1905	   [RFC793]  Postel, J., "Transmission Control Protocol," RFC 793, Sept.
1906	             1981.

1908	   [RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol -- or
1909	             -- Converting Network Protocol Addresses to 48.bit Ethernet
1910	             Address for Transmission on Ethernet Hardware," RFC 826,
1911	             Nov. 1982.

1913	   [RFC1075] Waitzman, D., C. Partridge, S. Deering, "Distance Vector
1914	             Multicast Routing Protocol," RFC 1075, Nov. 1988.

1916	   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
1917	             Communication Layers," RFC 1122 / STD 3, October 1989.

1919	   [RFC1191] Mogul, J., S. Deering, "Path MTU discovery," RFC 1191,
1920	             November 1990.

1922	   [RFC1812] Baker, F., "Requirements for IP Version 4 Routers," RFC
1923	             1812, June 1995.

1925	   [RFC1853] Simpson, W., "IP in IP Tunneling," RFC 1853, Oct. 1995.

1927	   [RFC1981] McCann, J., S. Deering, J. Mogul, "Path MTU Discovery for
1928	             IP version 6," RFC 1981, Aug. 1996.

1930	   [RFC2003] Perkins, C., "IP Encapsulation within IP," RFC 2003, Oct.
1931	             1996.

1933	   [RFC2225] Laubach, M., J. Halpern, "Classical IP and ARP over ATM,"
1934	             RFC 2225, Apr. 1998.

1936	   [RFC2460] Deering, S., R. Hinden, "Internet Protocol, Version 6
1937	             (IPv6) Specification," RFC 2460, Dec. 1998.

1939	   [RFC2473] Conta, A., S. Deering, "Generic Packet Tunneling in IPv6
1940	             Specification," RFC 2473, Dec. 1998.

1942	   [RFC2784] Farinacci, D., T. Li, S. Hanks, D. Meyer, P. Traina,
1943	             "Generic Routing Encapsulation (GRE)", RFC 2784, March
1944	             2000.

1946	   [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery," RFC
1947	             2923, September 2000.

1949	   [RFC2983] Black, D., "Differentiated Services and Tunnels," RFC 2983,
1950	             Oct. 2000.

1952	   [RFC2991] Thaler, D., C. Hopps, "Multipath Issues in Unicast and
1953	             Multicast Next-Hop Selection," RFC 2991, Nov. 2000.

1958	   [RFC2546] Durand, A., B. Buclin, "6bone Routing Practice," RFC 2546,
1959	             Mar. 1999.

1961	   [RFC3031] Rosen, E., A. Viswanathan, R. Callon, "Multiprotocol Label
1962	             Switching Architecture", RFC 3031, January 2001.

1964	   [RFC3819] Karn, P., Ed., C. Bormann, G. Fairhurst, D. Grossman, R.
1965	             Ludwig, J. Mahdavi, G. Montenegro, J. Touch, L. Wood,
1966	             "Advice for Internet Subnetwork Designers," RFC 3819 / BCP
1967	             89, July 2004.

1969	   [RFC3884] Touch, J., L. Eggert, Y. Wang, "Use of IPsec Transport Mode
1970	             for Dynamic Routing," RFC 3884, September 2004.

1972	   [RFC3931] Lau, J., Ed., M. Townsley, Ed., I. Goyret, Ed., "Layer Two
1973	             Tunneling Protocol - Version 3 (L2TPv3)," RFC 3931, March
1974	             2005.

1976	   [RFC3985] Bryant, S., P. Pate (Eds.), "Pseudo Wire Emulation Edge-to-
1977	             Edge (PWE3) Architecture", RFC 3985, March 2005.

1979	   [RFC4176] El Mghazli, Y., Ed., T. Nadeau, M. Boucadair, K. Chan, A.
1980	             Gonguet, "Framework for Layer 3 Virtual Private Networks
1981	             (L3VPN) Operations and Management," RFC 4176, October 2005.

1983	   [RFC4301] Kent, S., and K. Seo, "Security Architecture for the
1984	             Internet Protocol," RFC 4301, December 2005.

1986	   [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion
1987	             Control Protocol (DCCP)," RFC 4340, Mar. 2006.

1989	   [RFC4443] Conta, A., S. Deering, M. Gupta (Ed.), "Internet Control
1990	             Message Protocol (ICMPv6) for the Internet Protocol Version
1991	             6 (IPv6) Specification," RFC 4443, Mar. 2006.

1993	   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
1994	             Network Tunneling," RFC 4459, April 2006.

1996	   [RFC4541] Christensen, M., K. Kimball, F. Solensky, "Considerations
1997	             for Internet Group Management Protocol (IGMP) and Multicast
1998	             Listener Discovery (MLD) Snooping Switches," RFC 4541, May
1999	             2006.

2001	   [RFC4664] Andersson, L., Ed., E. Rosen, Ed., "Framework for Layer 2
2002	             Virtual Private Networks (L2VPNs)," RFC 4664, September
2003	             2006.

2005	   [RFC4821] Mathis, M., J. Heffner, "Packetization Layer Path MTU
2006	             Discovery," RFC 4821, March 2007.

2008	   [RFC4861] Narten, T., E. Nordmark, W. Simpson, H. Soliman, "Neighbor
2009	             Discovery for IP version 6 (IPv6)," RFC 4861, Sept. 2007.

2011	   [RFC4960] Stewart, R. (Ed.), "Stream Control Transmission Protocol,"
2012	             RFC 4960, Sep. 2007.

2014	   [RFC4963] Heffner, J., M. Mathis, B. Chandler, "IPv4 Reassembly
2015	             Errors at High Data Rates," RFC 4963, July 2007.

2017	   [RFC5320] Templin, F., Ed., "The Subnetwork Encapsulation and
2018	             Adaptation Layer (SEAL)," RFC 5320, Feb. 2010.

2020	   [RFC5556] Touch, J., R. Perlman, "Transparently Interconnecting Lots
2021	             of Links (TRILL): Problem and Applicability Statement," RFC
2022	             5556, May 2009.

2024	   [RFC5944] Perkins, C., Ed., "IP Mobility Support for IPv4, Revised,"
2025	             RFC 5944, Nov. 2010.

2027	   [RFC6040] Briscoe, B., "Tunneling of Explicit Congestion
2028	             Notification," RFC 6040, Nov. 2010.

2030	   [RFC6169] Krishnan, S., D. Thaler, J. Hoagland, "Security Concerns
2031	             With IP Tunneling," RFC 6169, Apr. 2011.

2033	   [RFC6325] Perlman, R., D. Eastlake, D. Dutt, S. Gai, A. Ghanwani,
2034	             "Routing Bridges (RBridges): Base Protocol Specification,"
2035	             RFC 6325, July 2011.

2037	   [RFC6434] Jankiewicz, E., J. Loughney, T. Narten, "IPv6 Node
2038	             Requirements," RFC 6434, Dec. 2011.

2040	   [RFC6438] Carpenter, B., S. Amante, "Using the IPv6 Flow Label for
2041	             Equal Cost Multipath Routing and Link Aggregation in
2042	             Tunnels," RFC 6438, Nov. 2011.

2044	   [RFC6807] Farinacci, D., G. Shepherd, S. Venaas, Y. Cai, "Population
2045	             Count Extensions to Protocol Independent Multicast (PIM),"
2046	             RFC 6807, Dec. 2012.

2048	   [RFC6830] Farinacci, D., V. Fuller, D. Meyer, D. Lewis, "The
2049	             Locator/ID Separation Protocol," RFC 6830, Jan. 2013.

2051	   [RFC6833] Fuller, V., D. Farinacci, "Locator/ID Separation Protocol
2052	             (LISP) Map-Server Interface," RFC 6833, Jan. 2013.

2054	   [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field,"
2055	             Proposed Standard, RFC 6864, Feb. 2013.

2057	   [RFC6935] Eubanks, M., P. Chimento, M. Westerlund, "IPv6 and UDP
2058	             Checksums for Tunneled Packets," RFC 6935, Apr. 2013.

2060	   [RFC6936] Fairhurst, G., M. Westerlund, "Applicability Statement for
2061	             the Use of IPv6 UDP Datagrams with Zero Checksums," RFC
2062	             6936, Apr. 2013.

2064	   [RFC6946] Gont, F., "Processing of IPv6 "Atomic" Fragments," RFC
2065	             6946, May 2013.

2067	   [RFC7364] Narten, T. (Ed.), E. Gray (Ed.), D. Black, L. Fang, L.
2068	             Kreeger, M. Napierala, "Problem Statement: Overlays for
2069	             Network Virtualization," RFC 7364, Oct. 2014.

2071	   [RFC7450] Bumgardner, G., "Automatic Multicast Tunneling," RFC 7450,
2072	             Feb. 2015.

2074	   [RFC7510] Xu, X., N. Sheth, L. Yong, R. Callon, D. Black,
2075	             "Encapsulating MPLS in UDP," RFC 7510, April 2015.

2077	   [RFC7588] Bonica, R., C. Pignataro, J. Touch, "A Widely-Deployed
2078	             Solution to the Generic Routing Encapsulation Fragmentation
2079	             Problem," RFC 7588, July 2015.

2081	   [RFC7676] Pignataro, C., R. Bonica, S. Krishnan, "IPv6 Support for
2082	             Generic Routing Encapsulation (GRE)," RFC 7676, Oct 2015.

2084	   [RFC7690] Byerly, M., M. Hite, J. Jaeggli, "Close Encounters of the
2085	             ICMP Type 2 Kind (Near Misses with ICMPv6 Packet Too Big
2086	             (PTB))," RFC 7690, Jan. 2016.

2088	   [RFC7761] Fenner, B., M. Handley, H. Holbrook, I. Kouvelas, R.
2089	             Parekh, Z. Zhang, L. Zheng, "Protocol Independent Multicast
2090	             - Sparse Mode (PIM-SM): Protocol Specification (Revised),"
2091	             RFC 7761, Mar. 2016.

2093	   [RFC8085] Eggert, L., G. Fairhurst, G. Shepherd, "UDP Usage
2094	             Guidelines," RFC 8085, Mar. 2017.

2096	   [RFC8086] Yong, L. (Ed.), E. Crabbe, X. Xu, T. Herbert, "GRE-in-UDP
2097	             Encapsulation," RFC 8086, Feb. 2017.

2099	   [Sa84]    Saltzer, J., D. Reed, D. Clark, "End-to-end arguments in
2100	             system design," ACM Trans. on Computer Systems, Nov. 1984.

2102	   [Te16]    Templin, F., "Asymmetric Extended Route Optimization,"
2103	             draft-templin-aerolink-74, Nov. 2016.

2105	   [To01]    Touch, J., "Dynamic Internet Overlay Deployment and
2106	             Management Using the X-Bone," Computer Networks, July 2001,
2107	             pp. 117-135.

2109	   [To03]    Touch, J., Y. Wang, L. Eggert, G. Finn, "Virtual Internet
2110	             Architecture," USC/ISI Tech. Report ISI-TR-570, Aug. 2003.

2112	   [To16]    Touch, J., "Middlebox Models Compatible with the
2113	             Internet," USC/ISI Tech. Report ISI-TR-711, Oct. 2016.

2115	   [To98]    Touch, J., S. Hotz, "The X-Bone," Proc. Globecom Third
2116	             Global Internet Mini-Conference, Nov. 1998.

2118	   [Zi80]    Zimmermann, H., "OSI Reference Model - The ISO Model of
2119	             Architecture for Open Systems Interconnection," IEEE Trans.
2120	             on Comm., Apr. 1980.

2122	9. Acknowledgments

2124	   This document originated as the result of numerous discussions among
2125	   the authors, Jari Arkko, Stuart Bryant, Lars Eggert, Ted Faber, Gorry
2126	   Fairhurst, Dino Farinacci, Matt Mathis, and Fred Templin. It
2127	   benefitted substantially from detailed feedback from Toerless Eckert,
2128	   Vincent Roca, and Lucy Yong, as well as other members of the Internet
2129	   Area Working Group.

2131	   This work is partly supported by USC/ISI's Postel Center.

2133	   This document was prepared using 2-Word-v2.0.template.dot.

2135	Authors' Addresses

2137	   Joe Touch
2138	   USC/ISI
2139	   4676 Admiralty Way
2140	   Marina del Rey, CA 90292-6695
2141	   U.S.A.

2143	   Phone: +1 (310) 448-9151
2144	   Email: touch@isi.edu

2146	   W. Mark Townsley
2147	   Cisco
2148	   L'Atlantis, 11, Rue Camille Desmoulins
2149	   Issy Les Moulineaux, ILE DE FRANCE 92782

2151	   Email: townsley@cisco.com

2153	APPENDIX A: Fragmentation efficiency

2155	A.1. Selecting fragment sizes

2157	   There are different ways to fragment a packet. Consider a network
2158	   with a PMTU as shown in Figure 16, where packets are encapsulated
2159	   over the same network layer as they arrive on (e.g., IP in IP). If a
2160	   packet as large as the PMTU arrives, it must be fragmented to
2161	   accommodate the additional header.

2163	         X===========================X (transit PMTU)
2164	         +----+----------------------+
2165	         | iH | DDDDDDDDDDDDDDDDDDDD |
2166	         +----+----------------------+
2167	           |
2168	           |  X===========================X (tunnel 1 MTU)
2169	           |  +---+----+------------------+
2170	       (a) +->| H'| iH | DDDDDDDDDDDDDDDD |
2171	           |  +---+----+------------------+
2172	           |      |
2173	           |      |  X===========================X (tunnel 2 MTU)
2174	           |      |  +----+---+----+-------------+
2175	           | (a1) +->| nH'| H'| iH | DDDDDDDDDDD |
2176	           |      |  +----+---+----+-------------+
2177	           |      |
2178	           |      |  +----+-------+
2179	           | (a2) +->| nH"| DDDDD |
2180	           |         +----+-------+
2181	           |
2182	           |  +---+------+
2183	       (b) +->| H"| DDDD |
2184	              +---+------+
2185	                  |
2186	                  |  +----+---+------+
2187	             (b1) +->| nH'| H"| DDDD |
2188	                     +----+---+------+

2190	                   Figure 16 Fragmenting via maximum fit

2192	   Figure 16 shows this process using "maximum fit", assuming outer
2193	   fragmentation as an example (the situation is the same for inner
2194	   fragmentation, but the headers that are affected differ). In maximum
2195	   fit, the arriving packet is split into (a) and (b), where (a) is
2196	   sized to the tunnel 1 MTU (the maximum that fits over the first
2197	   tunnel). However, this tunnel in turn traverses another tunnel
2198	   (number 2), whose impact the first tunnel ingress has
2199	   not accommodated. The packet (a) arrives at the second tunnel
2200	   ingress, and needs to be encapsulated again, but it needs to be
2201	   fragmented as well to fit into the tunnel 2 MTU, into (a1) and (a2).
2202	   In this case, packet (b) arrives at the second tunnel ingress and is
2203	   encapsulated into (b1) without fragmentation, because it is already
2204	   below the tunnel 2 MTU size.

2206	   In Figure 17, the fragmentation is done using "even split", i.e., by
2207	   splitting the original packet into two roughly equal-sized
2208	   components, (c) and (d). Note that (d) contains more packet data,
2209	   because (c) also carries the original packet header (iH); this is an
2210	   example of outer fragmentation. The packets (c) and (d) arrive at the
2211	   second tunnel encapsulator, and are encapsulated again; this time,
2212	   neither packet exceeds the tunnel 2 MTU, and neither requires further
2213	   fragmentation.

2215	         X===========================X (transit PMTU)
2216	         +----+----------------------+
2217	         | iH | DDDDDDDDDDDDDDDDDDDD |
2218	         +----+----------------------+
2219	           |
2220	           |  X===========================X (tunnel 1 MTU)
2221	           |  +---+----+----------+
2222	       (c) +->| H'| iH | DDDDDDDD |
2223	           |  +---+----+----------+
2224	           |      |
2225	           |      |  X===========================X (tunnel 2 MTU)
2226	           |      |  +----+---+----+----------+
2227	           | (c1) +->| nH | H'| iH | DDDDDDDD |
2228	           |         +----+---+----+----------+
2229	           |
2230	           |  +---+--------------+
2231	       (d) +->| H"| DDDDDDDDDDDD |
2232	              +---+--------------+
2233	                  |
2234	                  |  +----+---+--------------+
2235	             (d1) +->| nH | H"| DDDDDDDDDDDD |
2236	                     +----+---+--------------+

2238	                  Figure 17 Fragmenting via "even split"
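
   The difference between Figures 16 and 17 can be reproduced with a
   small size model. The sketch below is a simplification under
   stated assumptions: every tunnel adds a header of the same size
   and has the same MTU, fragmentation is outer fragmentation, and
   the 8-byte alignment of IPv4 fragment offsets is ignored. The
   function names are hypothetical.

```python
def split(payload, max_payload, strategy):
    """Split a payload into fragment payload sizes using the given strategy."""
    if payload <= max_payload:
        return [payload]
    count = -(-payload // max_payload)  # ceiling division: fewest fragments
    if strategy == "max_fit":
        # Fill every fragment except the last to the maximum.
        return [max_payload] * (count - 1) + [payload - max_payload * (count - 1)]
    # "even_split": spread the payload as evenly as possible.
    base, extra = divmod(payload, count)
    return [base + 1] * extra + [base] * (count - extra)

def through_tunnels(packet, header, mtu, depth, strategy):
    """Sizes of the packets emitted after `depth` nested encapsulations."""
    packets = [packet]
    for _ in range(depth):
        # Each fragment carries its own copy of the new outer header.
        packets = [frag + header
                   for p in packets
                   for frag in split(p, mtu - header, strategy)]
    return packets
```

   Under these assumptions, a 1500-byte packet crossing two nested
   tunnels with 20-byte headers and a 1500-byte MTU leaves as three
   packets with maximum fit, corresponding to (a1), (a2), and (b1),
   but as two packets with even split, corresponding to (c1) and
   (d1).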

2240	A.2. Packing

2242	   Encapsulating individual packets to traverse a tunnel can be
2243	   inefficient, especially where headers are large relative to the
2244	   packets being carried. In that case, it can be more efficient to
2245	   encapsulate many small packets in a single, larger tunnel payload.

2247	   This technique, similar to the effect of packet bursting in Gigabit
2248	   Ethernet (regardless of whether they're encoded using L2 symbols as
2249	   delineators), reduces the overhead of the encapsulation headers
2250	   (Figure 18). It reduces the work of header addition and removal at
2251	   the tunnel endpoints, but increases other work involving the packing
2252	   and unpacking of the component packets carried.

2254	                     +-----+-----+
2255	                     | iHa | iDa |
2256	                     +-----+-----+
2257	                           |
2258	                           |     +-----+-----+
2259	                           |     | iHb | iDb |
2260	                           |     +-----+-----+
2261	                           |           |
2262	                           |           |     +-----+-----+
2263	                           |           |     | iHc | iDc |
2264	                           |           |     +-----+-----+
2265	                           |           |           |
2266	                           v           v           v
2267	                +----+-----+-----+-----+-----+-----+-----+
2268	                | oH | iHa | iDa | iHb | iDb | iHc | iDc |
2269	                +----+-----+-----+-----+-----+-----+-----+

2271	                  Figure 18 Packing packets into a tunnel
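
   The header savings from packing can be estimated with the sketch
   below. It only counts bytes: it groups inner packets in arrival
   order into tunnel payloads no larger than an assumed maximum and
   compares outer-header overhead with and without packing. How the
   inner packets are delineated inside a payload is not modeled, and
   the function names are hypothetical.

```python
def pack(inner_sizes, max_payload):
    """Greedily group inner packet sizes, in order, into tunnel payloads."""
    groups, current, used = [], [], 0
    for size in inner_sizes:
        # Start a new payload when the next packet would not fit.
        if current and used + size > max_payload:
            groups.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        groups.append(current)
    return groups

def header_overhead(inner_sizes, outer_header, max_payload=None):
    """Total outer-header bytes; packing is used when max_payload is given."""
    if max_payload is None:
        return outer_header * len(inner_sizes)
    return outer_header * len(pack(inner_sizes, max_payload))
```

   For three 60-byte packets followed by a 1400-byte packet, a
   40-byte outer header, and a 1460-byte maximum payload, the sketch
   sends two tunnel packets instead of four and halves the header
   overhead from 160 to 80 bytes.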