idnits 2.17.1 

draft-ietf-intarea-tunnels-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  -- The draft header indicates that this document updates RFC4459, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC4459, updated by this document, for
     RFC5378 checks: 2004-06-14)

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 6, 2016) is 2851 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-01

  == Outdated reference: A later version (-05) exists of
     draft-ietf-nvo3-gue-04

  == Outdated reference: A later version (-02) exists of
     draft-ietf-rtgwg-dt-encap-01

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 5405
     (Obsoleted by RFC 8085)

  -- Obsolete informational reference (is this intentional?): RFC 6830
     (Obsoleted by RFC 9300, RFC 9301)

  == Outdated reference: A later version (-82) exists of
     draft-templin-aerolink-67


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1	Internet Area WG                                               J. Touch
2	Internet Draft                                                  USC/ISI
3	Intended status: Informational                              M. Townsley
4	Updates: 4459                                                     Cisco
5	Expires: January 2017                                      July 6, 2016

7	                  IP Tunnels in the Internet Architecture
8	                     draft-ietf-intarea-tunnels-03.txt

10	Status of this Memo

12	   This Internet-Draft is submitted in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   This document may contain material from IETF Documents or IETF
16	   Contributions published or made publicly available before November
17	   10, 2008. The person(s) controlling the copyright in some of this
18	   material may not have granted the IETF Trust the right to allow
19	   modifications of such material outside the IETF Standards Process.
20	   Without obtaining an adequate license from the person(s) controlling
21	   the copyright in such materials, this document may not be modified
22	   outside the IETF Standards Process, and derivative works of it may
23	   not be created outside the IETF Standards Process, except to format
24	   it for publication as an RFC or to translate it into languages other
25	   than English.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF), its areas, and its working groups.  Note that
29	   other groups may also distribute working documents as Internet-
30	   Drafts.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   The list of current Internet-Drafts can be accessed at
38	   http://www.ietf.org/ietf/1id-abstracts.txt

40	   The list of Internet-Draft Shadow Directories can be accessed at
41	   http://www.ietf.org/shadow.html

43	   This Internet-Draft will expire on January 6, 2017.

45	Copyright Notice

47	   Copyright (c) 2016 IETF Trust and the persons identified as the
48	   document authors. All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document. Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document. Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Abstract

62	   This document discusses the role of IP tunnels in the Internet
63	   architecture, in which IP datagrams are carried as payloads in non-
64	   link layer protocols. It explains their relationship to existing
65	   protocol layers and the challenges in supporting IP tunneling based
66	   on the equivalence of tunnels to links.

68	Table of Contents

70	   1. Introduction
71	   2. Conventions used in this document
72	      2.1. Key Words
73	      2.2. Terminology
74	   3. The Tunnel Model
75	      3.1. What is a tunnel?
76	      3.2. View from the Outside
77	      3.3. View from the Inside
78	      3.4. Location of the Ingress and Egress
79	      3.5. Implications of This Model
80	      3.6. Fragmentation
81	         3.6.1. Outer Fragmentation
82	         3.6.2. Inner Fragmentation
83	         3.6.3. The necessity of Outer Fragmentation
84	   4. IP Tunnel Requirements
85	      4.1. Minimum MTU Considerations
86	      4.2. Fragmentation
87	      4.3. MTU discovery
88	      4.4. IP ID exhaustion
89	      4.5. Hop Count
90	      4.6. Signaling
91	      4.7. Relationship of Header Fields
92	      4.8. Congestion
93	      4.9. Checksums
94	      4.10. Numbering
95	      4.11. Multicast
96	      4.12. Multipoint
97	      4.13. NAT / Load Balancing
98	      4.14. Recursive tunnels
99	   5. Observations (implications)
100	      5.1. Tunnel protocol designers
101	      5.2. Tunnel implementers
102	      5.3. Tunnel operators
103	      5.4. Diagnostics
104	      5.5. For existing standards
105	         5.5.1. Generic UDP Encapsulation (GUE - IP in UDP in IP)
106	         5.5.2. Generic Packet Tunneling in IPv6
107	         5.5.3. Geneve (NVO3)
108	         5.5.4. GRE (IP in GRE in IP)
109	         5.5.5. IP in IP / mobile IP
110	         5.5.6. IPsec tunnel mode (IP in IPsec in IP)
111	         5.5.7. L2TP
112	         5.5.8. L2VPN
113	         5.5.9. L3VPN
114	         5.5.10. LISP
115	         5.5.11. MPLS
116	         5.5.12. PWE
117	         5.5.13. SEAL/AERO
118	         5.5.14. TRILL
119	         5.5.15. RTG DT encapsulations
120	      5.6. For future standards
121	   6. Security Considerations
122	   7. IANA Considerations
123	   8. References
124	      8.1. Normative References
125	      8.2. Informative References
126	   9. Acknowledgments
127	   APPENDIX A: Fragmentation efficiency
128	      A.1. Selecting fragment sizes
129	      A.2. Packing

131	1. Introduction

133	   The Internet is loosely based on the ISO seven layer stack, in which
134	   data units traverse the stack by being wrapped inside data units one
135	   layer down. A tunnel is a mechanism for transmitting data units
136	   between endpoints by wrapping them as data units of the same or
137	   higher layers, e.g., IP in IP (Figure 1) or IP in UDP (Figure 2).

139	                        +----+----+--------------+
140	                        | IP'| IP |     Data     |
141	                        +----+----+--------------+

143	                           Figure 1 IP inside IP

145	                     +----+-----+----+--------------+
146	                     | IP'| UDP | IP |     Data     |
147	                     +----+-----+----+--------------+

149	                        Figure 2 IP in UDP in IP
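
The two layouts in Figures 1 and 2 can be illustrated with a minimal
Python sketch. It is not taken from any tunnel specification: the
helper names, the addresses (documentation ranges), the UDP ports, and
the use of protocol number 253 (reserved for experimentation) for the
inner payload are all assumptions made for illustration. Only the
IPv4 header layout and the protocol numbers 4 (IPv4 in IP) and 17
(UDP) are standard.

```python
import socket
import struct

IPPROTO_IPIP = 4    # IANA protocol number: IPv4 encapsulated in IP
IPPROTO_UDP = 17    # IANA protocol number: UDP
IPPROTO_TEST = 253  # reserved for experimentation; stands in for "Data"


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum over the header's 16-bit words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int,
                ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header (no options, ID 0, not fragmented)."""
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + payload_len, 0, 0, ttl, proto, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def ip_in_ip(inner: bytes, ingress: str, egress: str) -> bytes:
    """Figure 1 layout: IP' | IP | Data."""
    return ipv4_header(ingress, egress, IPPROTO_IPIP, len(inner)) + inner


def ip_in_udp(inner: bytes, ingress: str, egress: str,
              sport: int, dport: int) -> bytes:
    """Figure 2 layout: IP' | UDP | IP | Data (UDP checksum left as 0)."""
    udp = struct.pack("!HHHH", sport, dport, 8 + len(inner), 0) + inner
    return ipv4_header(ingress, egress, IPPROTO_UDP, len(udp)) + udp


# Hypothetical inner packet and tunnel endpoints.
data = b"hello"
inner = ipv4_header("192.0.2.1", "198.51.100.1", IPPROTO_TEST,
                    len(data)) + data
tlp_ipip = ip_in_ip(inner, "203.0.113.1", "203.0.113.2")
tlp_udp = ip_in_udp(inner, "203.0.113.1", "203.0.113.2", 49152, 49153)
```

In both cases the inner packet is carried unmodified; only the outer
headers differ, which is why the encapsulation overhead is 20 bytes in
the first case and 28 bytes in the second.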

151	   This document focuses on tunnels that transit IP packets, i.e., in
152	   which an IP packet is the payload of another protocol. Tunnels
153	   provide a virtual link that can help decouple the network topology
154	   seen by transiting packets from the underlying physical network
155	   [To98][RFC2473]. Tunnels were critical in the development of
156	   multicast because not all routers were capable of processing
157	   multicast packets [Er94]. Tunnels allowed multicast packets to
158	   transit between multicast-capable routers over paths that did not
159	   support multicast. Similar techniques have been used to support other
160	   protocols, such as IPv6 [RFC2460].

162	   Use of tunnels is common in the Internet. The word "tunnel" occurs in
163	   over 100 RFCs, and many protocols support tunneling, including:

165	   o  IP in IP / mobile IP - IPv4 in IPv4 tunnels
166	      [RFC2003][RFC2473][RFC5944]

168	   o  IP in IPv6 - IPv6 or IPv4 in IPv6 [RFC2473]

170	   o  IPsec - includes a tunnel mode to enable encryption or
171	      authentication of an entire IP datagram [RFC4301]

173	   o  Generic Routing Encapsulation (GRE) - a shim layer for tunneling
174	      any network layer in any other network layer, IP in GRE in IP
175	      [RFC2784][RFC7588][RFC7676]

177	   o  Generic UDP Encapsulation (GUE) - IP in UDP (in IP)[He15]

179	   o  Automatic Multicast Tunneling (AMT) - IP in UDP for multicast
180	      [RFC7450]

182	   o  L2TP - PPP over IP, to extend a subscriber's DSL/FTTH connection
183	      from an access line provider to an ISP [RFC3931]

185	   o  L2VPNs - provides a link topology different from that provided by
186	      physical links [RFC4664]

188	   o  L3VPNs - provides a network topology different from that provided
189	      by ISPs [RFC4176]

191	   o  LISP - reduces routing table load within an enclave of routers at
192	      the expense of more complex ingress encapsulation tables [RFC6830]

194	   o  MPLS - IP over a circuit-like path in which identifiers are
195	      rewritten on each hop, often used for traffic provisioning
196	      [RFC3031]

198	   o  NVO3 - data center network sharing (which includes use of GUE,
199	      above) [RFC7364]

201	   o  PWE3 - emulates wire-like services over packet-switched services
202	      [RFC3985]

204	   o  SEAL/AERO - IP in IP tunneling with an additional shim header
205	      designed to overcome the limitations of RFC2003 [RFC5320][Te16]

207	   o  TRILL - enables L3 routing (typically IS-IS) in an enclave of
208	      Ethernet bridges [RFC5556][RFC6325]

210	   The variety of tunnel mechanisms raises the question of the role of
211	   tunnels in the Internet architecture and the potential need for these
212	   mechanisms to have similar and predictable behavior. In particular,
213	   the ways in which packet sizes (i.e., Maximum Transmission Unit or
214	   MTU) mismatch and error signals (e.g., ICMP) are handled may benefit
215	   from a coordinated approach.

217	   Regardless of the layer in which encapsulation occurs, tunnels
218	   emulate a link. The only difference is that a link operates over a
219	   physical communication channel, whereas a tunnel operates over
220	   software protocol layers. Because tunnels are links, they are subject
221	   to the same issues as any link, e.g., MTU discovery, signaling, and
222	   the potential utility of native support for broadcast and multicast
223	   [RFC2460][RFC3819]. They have advantages over native links, being
224	   potentially easier to reconfigure and control.

226	   The first attempt to use large-scale tunnels to transit multicast
227	   across the Internet in 1988 led to tunnel collapse. At the time,
228	   tunnels were not implemented as encapsulation-based virtual links,
229	   but rather as loose source routes on un-encapsulated IP datagrams
230	   [RFC1075]. Using encapsulation tunnels instead avoided that collapse
231	   [Er94] and eventually led to AMT [RFC7450].

233	   The remainder of this document describes the general principles of IP
234	   tunneling and discusses the key considerations in the design of a
235	   protocol that tunnels IP datagrams. It derives its conclusions from
236	   the equivalence of tunnels and links. Note that all considerations
237	   are in the context of existing standards and requirements.

239	2. Conventions used in this document

241	2.1. Key Words

243	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
244	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
245	   document are to be interpreted as described in RFC-2119 [RFC2119].

247	2.2. Terminology

249	   This document uses the following terminology. These definitions are
250	   given in the most general terms, but will be used primarily to
251	   discuss IP tunnels in this document. They are presented in order from
252	   most fundamental to those derived from earlier definitions:

254	   o  Message: variable-length data labeled with globally unique
255	      endpoint IDs, also known as a datagram for IP messages [RFC791].

257	   o  Network node (node): a device that can act as an endpoint or
258	      forwarder. For datagrams (IP messages), these are hosts or
259	      gateways/routers, respectively.

261	   o  Endpoint or host: a node that sources or sinks messages labeled
262	      from/to its IDs, typically known as a host for both IP and higher-
263	      layer protocol messages [RFC1122].

265	   o  Forwarder: a node that relays messages using destination IDs and
266	      local context, also known as a gateway or router for IP messages
267	      [RFC1812]. Note that most forwarders also act as endpoints when
268	      they source or sink messages.

270	   o  Source (sender): the node that generates a message.

272	   o  Destination (receiver): the node that consumes a message.

274	   o  Link: a device (or medium) that transfers messages between nodes,
275	      i.e., by which a message can traverse between nodes without being
276	      processed by a forwarder. Note that the notion of forwarder is
277	      relative to the layer at which message processing is considered
278	      [To16].

280	   o  Link interface (sometimes known as a network interface): a
281	      location on a link co-located with a node where messages depart
282	      onto that link or arrive from that link.

284	   o  Path: a sequence of one or more links or tunnels over which a
285	      message can traverse between nodes (hosts or forwarders), which
286	      may or may not involve being processed by a forwarder.

288	   o  Tunnel: a protocol mechanism that transits messages using
289	      encapsulation to allow a path to appear as a single link. Note
290	      that a protocol can be used to tunnel itself (IP over IP) and that
291	      this includes the conventional layering of the ISO stack (i.e., by
292	      this definition, Ethernet is a tunnel for IP). A tunnel can be
293	      considered a virtual link.

295	   o  Ingress: the virtual link interface of a tunnel which receives
296	      messages within a node, encapsulates them according to the tunnel
297	      protocol, and transmits them into the tunnel. This is the tunnel
298	      equivalent of the outgoing (departing) network interface of a
299	      link. Note that the ingress virtual link interface and traffic
300	      source node can be co-located.

302	   o  Egress: a virtual link interface that receives messages that have
303	      finished transiting a tunnel and presents them to a node. This is
304	      the tunnel equivalent of the incoming (arriving) network interface
305	      of a link. The egress decapsulates messages for further transit to
306	      the destination. Note that the egress virtual link interface and
307	      traffic destination node can be co-located.

309	   o  Tunnel transit packet (TTP): the packet arriving at a node
310	      connected to a tunnel that enters the ingress and exits the
311	      egress, i.e., the packet carried over the tunnel. This is
312	      sometimes known as the "tunneled packet". It is the tunnel
313	      equivalent of a network layer packet as it would traverse a
314	      link.

316	   o  Tunnel link packet (TLP): packets that traverse from ingress to
317	      egress, in which resides all or part of a tunnel transit packet.
318	      This is sometimes known as the "tunnel packet", i.e., the packet
319	      of the tunnel itself. This is the tunnel equivalent of a link
320	      layer packet as it would traverse a link.

322	   o  Link MTU (LMTU): the largest message that can transit a link. It
323	      typically does not include link-layer information, e.g., link
324	      layer headers or trailers, i.e., it refers to the message that the
325	      link can carry rather than the message as it appears on the link.
326	      This is thus the largest network layer packet (including network
327	      layer headers, e.g., IP datagram) that can transit a link. Note
328	      that this need not be the native size of messages on the link,
329	      i.e., the link may internally fragment and reassemble messages.
330	      For IPv4, the smallest LMTU is 68 bytes [RFC791], and for IPv6 the
331	      smallest LMTU is 1280 bytes [RFC2460].

333	   o  Path MTU (PMTU): the largest message that can transit a path.
334	      Typically, this is the minimum of the link MTUs of the links of
335	      the path, and represents the largest network layer message
336	      (including network layer headers) that can transit a path. Note
337	      that this is not the largest network packet that can be sent
338	      between a source and destination; this is the largest network
339	      packet that can be sent without requiring reassembly at
340	      the network layer of the destination.

342	   o  Reassembly MTU (RMTU): the largest message that can be reassembled
343	      by a destination, which is not directly related to the link or
344	      path MTU. Sometimes also referred to as "receiver MTU". For IPv4,
345	      the smallest RMTU is 576 bytes [RFC791] and for IPv6 it is 1500
346	      bytes [RFC2460]; note that in both cases, the size refers to the
347	      message transferred at the network layer, which includes the
348	      network layer headers.

350	   o  Tunnel MTU (TMTU): the largest message that can transit a tunnel,
351	      i.e., this is the tunnel equivalent of a link MTU. Typically, this
352	      is limited by the egress reassembly MTU. Note that this value may
353	      have no relation to the path MTU between the tunnel ingress and
354	      egress.

356	   o  Tunnel internal MTU (TIMTU): the largest message that a tunnel
357	      ingress can emit into a tunnel without requiring further
358	      fragmentation to reach the tunnel egress. This is the path MTU
359	      between the ingress and egress.

361	   o  Egress reassembly MTU (ERMTU): the largest message that can be
362	      reassembled by an egress. This is the size of the RMTU of a tunnel
363	      minus the encapsulation overhead of that tunnel. Sometimes also
364	      referred to as the "egress MTU".
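
The relationships among the MTU terms above can be sketched in Python.
This is only an illustration under stated assumptions: the class and
method names are hypothetical, the numeric values are made up, and the
TMTU is modeled as equal to the ERMTU (the "typical" case described
above) rather than as a general rule.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

IPV4_MIN_LMTU = 68     # smallest IPv4 link MTU [RFC791]
IPV6_MIN_LMTU = 1280   # smallest IPv6 link MTU [RFC2460]


def path_mtu(link_mtus: Sequence[int]) -> int:
    """PMTU: typically the minimum of the link MTUs along the path."""
    if not link_mtus:
        raise ValueError("a path consists of at least one link")
    return min(link_mtus)


@dataclass(frozen=True)
class Tunnel:
    inner_link_mtus: Tuple[int, ...]  # LMTUs between ingress and egress
    egress_rmtu: int                  # largest TLP the egress reassembles
    overhead: int                     # encapsulation bytes added per TTP

    def timtu(self) -> int:
        """Tunnel internal MTU: the path MTU from ingress to egress."""
        return path_mtu(self.inner_link_mtus)

    def ermtu(self) -> int:
        """Egress reassembly MTU: RMTU minus encapsulation overhead."""
        return self.egress_rmtu - self.overhead

    def tmtu(self) -> int:
        """Tunnel MTU, assumed here to be limited only by the ERMTU."""
        return self.ermtu()

    def fits(self, ttp_len: int) -> bool:
        """Can a TTP of this size transit the tunnel at all?"""
        return ttp_len <= self.tmtu()

    def needs_outer_fragmentation(self, ttp_len: int) -> bool:
        """Does the resulting TLP exceed the TIMTU?"""
        return ttp_len + self.overhead > self.timtu()


# Hypothetical tunnel: three inner links, a 2000-byte egress RMTU, and
# a 20-byte (IPv4) encapsulation header.
tunnel = Tunnel(inner_link_mtus=(1500, 1400, 1500),
                egress_rmtu=2000, overhead=20)
```

With these example values, a 1500-byte TTP fits the tunnel (the TMTU
is 1980 bytes) even though the resulting 1520-byte TLP exceeds the
1400-byte TIMTU and has to be fragmented by the ingress. This is the
sense in which the tunnel MTU may have no relation to the path MTU
between ingress and egress.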

366	3. The Tunnel Model

368	   A network architecture is an abstract description of a distributed
369	   communications system, its components and their relationships, the
370	   requisite properties of those components and the emergent properties
371	   of the system that result [To03]. Such descriptions can help explain
372	   behavior, as when the OSI seven-layer model is used as a teaching
373	   example [Zi80]. Architectures describe capabilities - and, just as
374	   importantly, constraints.

376	   A network can be defined as a system of endpoints and relays
377	   interconnected by communication paths, abstracting away issues of
378	   naming in order to focus on message forwarding. To the extent that
379	   the Internet has a single, coherent interpretation, its architecture
380	   is defined by its core protocols (IP [RFC791], TCP [RFC793], UDP
381	   [RFC768]) and messages, hosts, routers, and links [Cl88][To03], as
382	   shown in Figure 3:

384	               +------+    ------      ------    +------+
385	               |      |   /      \    /      \   |      |
386	               | HOST |--+ ROUTER +--+ ROUTER +--| HOST |
387	               |      |   \      /    \      /   |      |
388	               +------+    ------      ------    +------+

390	                   Figure 3 Basic Internet architecture

392	   As a network architecture, the Internet is a system of hosts and
393	   routers interconnected by links that exchange messages when possible.
394	   "When possible" defines the Internet's "best effort" principle. The
395	   limited role of routers and links represents the End-to-End Principle
396	   [Sa84] and longest-prefix match enables hierarchical forwarding.

398	   Although the definitions of host, router, and link seem absolute,
399	   they are often relative as viewed within the context of one OSI
400	   layer, each of which can be considered a distinct network
401	   architecture. An Internet gateway is a Layer 3 router when it
402	   transits IP datagrams but it acts as a Layer 2 host as it sources or
403	   sinks Layer 2 messages on attached links to accomplish this transit
404	   capability. In this way, a single node (Internet gateway) behaves as
405	   different components (router, host) at different layers.

407	   Even though a single node may have multiple roles - even concurrently
408	   - at a given layer, each role is typically static and determined by
409	   context. An Internet gateway always acts as a Layer 2 host and that
410	   behavior does not depend on where the gateway is viewed from within
411	   Layer 2. In the context of a single layer, a node's behavior is
412	   modeled as a single component from all viewpoints in that layer.

414	3.1. What is a tunnel?

416	   A tunnel can be modeled as a link in another network
417	   [To98][To01][To03]. In Figure 4, a source host (Hsrc) and destination
418	   host (Hdst) communicate over a network M in which two routers (Ra
419	   and Rd) are connected by a tunnel. Keep in mind that network N and
420	   network M can both be components of the Internet, i.e., there may be
421	   regular traffic as well as tunneled traffic over any of the routers
422	   shown.

424	                     --                          --
425	         +------+   /  \                        /  \   +------+
426	         | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
427	         +------+   \  //\    /  \    /  \    /\\  /   +------+
428	                     --/I \--+ Rb +--+ Rc +--/E \--
429	                       \  /   \  /    \  /   \  /
430	                        \/     --      --     \/
431	                       <------ Network N ------->
432	         <-------------------- Network M --------------------->

434	                         Figure 4 The big picture

436	   The tunnel consists of two elements (ingress I, egress E), that lie
437	   along a path connected by a (possibly different) network N.
438	   Regardless of how the ingress and egress are connected, the tunnel
439	   serves as a link to the nodes it connects (here, Ra and Rd).

441	   IP packets arriving at the ingress are encapsulated to traverse
442	   network N. We call these packets "tunnel transit packets" (TTPs)
443	   because they will now transit the tunnel inside one or more "tunnel
444	   link packets" (TLPs). TLPs use the source address of the ingress and
445	   the destination address of the egress - using whatever address is
446	   appropriate to the Layer at which the ingress and egress operate
447	   (Layer 2, Layer 3, Layer 4, etc.). The egress decapsulates those
448	   messages, which then continue on network M as if emerging from a
449	   link. To tunnel transit packets, and to the routers the tunnel
450	   connects (Ra and Rd), the tunnel acts as a link and the ingress and
451	   egress act as network interfaces to that link.

453	   The model of each component (ingress, egress) and the entire system
454	   (tunnel) depends on the layer from which you view the tunnel. From
455	   the perspective of the outermost hosts (Hsrc and Hdst), the tunnel
456	   appears as a link between two routers (Ra and Rd). For routers along
457	   the tunnel (e.g., Rb and Rc), the ingress and egress appear as the
458	   endpoint hosts and Hsrc and Hdst are invisible.

460	   When the tunnel network (N) is implemented using the same protocol as
461	   the endpoint network (M), the picture looks flatter (Figure 5), as if
462	   it were running over a single network. However, note that this
463	   appearance is incorrect - nothing has changed. From the perspective
464	   of the endpoints, Rb and Rc and network N don't exist and aren't
465	   visible, and from the perspective of the tunnel, network M doesn't
466	   exist. The fact that network N and M use the same protocol, and may
467	   traverse the same links is irrelevant.

469	                   --          --      --          --
470	       +------+   /  \  /\    /  \    /  \    /\  /  \   +------+
471	       | Hsrc |--+ Ra +/I \--+ Rb +--+ Rc +--/E \+ Rd +--| Hdst |
472	       +------+   \  / \  /   \  /    \  /   \  / \  /   +------+
473	                   --   \/     --      --     \/   --
474	                       <------ Network N ------->
475	       <---------------------- Network M ----------------------->

477	                     Figure 5 IP in IP network picture

479	3.2. View from the Outside

481	   From outside the tunnel, to network M, the entire tunnel acts as a
482	   link (Figure 6). It may be numbered or unnumbered and the addresses
483	   associated with the ingress and egress are irrelevant from outside.

485	                   --                              --
486	       +------+   /  \                            /  \   +------+
487	       | Hsrc |--+ Ra +--------------------------+ Rd +--| Hdst |
488	       +------+   \  /                            \  /   +------+
489	                   --                              --

491	                Figure 6 Tunnels as viewed from the outside

493	   A tunnel is effectively invisible to the network in which it resides,
494	   except that it behaves exactly as a link. Consequently [RFC3819]
495	   requirements for links supporting IP also apply to tunnels.

497	   For example, the IP datagram hop count (IPv4 Time-to-Live [RFC791]
498	   or IPv6 Hop Limit [RFC2460]) is decremented when traversing a
499	   router, not when traversing a link - or thus a tunnel. Tunnels have
500	   a tunnel MTU - the largest datagram that can transit - just as links
501	   have a corresponding link MTU. A link MTU may not reflect the native
502	   link message sizes (e.g., ATM AAL5 48-byte messages support a 9KB
503	   MTU) and the same is true for a tunnel.
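
The hop count behavior can be illustrated with a small Python sketch
of the topology in Figure 4. The Packet class and helper functions are
hypothetical and heavily simplified (for instance, a router here
simply refuses to forward a packet whose hop count would reach zero);
the sketch only shows which hop count each node touches.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    hops_left: int    # IPv4 Time-to-Live or IPv6 Hop Limit
    payload: object   # upper-layer data, or an encapsulated Packet


def forward(pkt: Packet) -> Packet:
    """A router decrements the hop count; a link (or tunnel) does not."""
    if pkt.hops_left <= 1:
        raise RuntimeError("hop count exceeded")
    return replace(pkt, hops_left=pkt.hops_left - 1)


def tunnel_transit(ttp: Packet, ingress: str, egress: str,
                   inner_routers: int, outer_hops: int = 64):
    """Carry a TTP across a tunnel; return it with the outer hop count."""
    tlp = Packet(ingress, egress, outer_hops, ttp)  # ingress encapsulates
    for _ in range(inner_routers):                  # e.g., Rb and Rc
        tlp = forward(tlp)                          # only the TLP changes
    return tlp.payload, tlp.hops_left               # egress decapsulates


ttp = Packet("Hsrc", "Hdst", 64, b"data")
ttp = forward(ttp)                                  # router Ra
ttp, outer_hops_left = tunnel_transit(ttp, "I", "E", inner_routers=2)
ttp = forward(ttp)                                  # router Rd
```

In this sketch the transit packet's hop count drops by two (Ra and
Rd), regardless of how many routers the tunnel link packet crosses in
network N; those routers decrement only the outer hop count.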

505	3.3. View from the Inside

507	   Within network N, i.e., from inside the tunnel itself, the ingress is
508	   a source of tunnel link packets and the egress is a sink - both are
509	   hosts on network N (Figure 7). Consequently [RFC1122] Internet host
510	   requirements apply to ingress and egress nodes when Network N uses IP
511	   (and thus the ingress/egress use IP encapsulation).

513	                               --      --
514	                        /\    /  \    /  \    /\
515	                       /I \--+ Rb +--+ Rc +--/E \
516	                       \  /   \  /    \  /   \  /
517	                        \/     --      --     \/
518	                       <------ Network N ------->

520	            Figure 7 Tunnels, as viewed from within the tunnel

522	   Viewed from within the tunnel, the outer network (M) doesn't exist.
523	   Tunnel link packets can be fragmented by the source (ingress) and
524	   reassembled at the destination (egress), just as at any endpoint. The
525	   path between ingress and egress may have a path MTU but the endpoints
526	   can exchange messages as large as can be reassembled at the
527	   destination (egress), i.e., an egress MTU. Information about the
528	   network - e.g., MTU sizes and network reachability - is relayed
529	   from the destination (egress) and intermediate routers back to the
530	   source (ingress), without regard for the external network (M).

532	3.4. Location of the Ingress and Egress

534	   The ingress and egress are endpoints of the tunnel and the tunnel is
535	   a link. The ingress and egress are thus link endpoints at the network
536	   nodes the tunnel interconnects. Such link endpoints are typically
537	   described as "network interfaces".

539	   Tunnel interfaces may be physical or virtual. The interface may be
540	   implemented inside the node where the tunnel attaches, e.g., inside a
541	   host or router. The interface may also be implemented as a "bump in
542	   the wire" (BITW), somewhere along a link between the two nodes the
543	   link interconnects. IP in IP tunnels are often implemented as
544	   interfaces, whereas IPsec tunnels are sometimes implemented as BITW.
545	   These implementation variations determine only whether information
546	   available at the link endpoints (ingress/egress) can be easily shared
547	   with the connected network nodes.

549	3.5. Implications of This Model

551	   This approach highlights a few key features of a tunnel as a network
552	   architecture construct:

554	   o  To the tunnel transit packets (TTPs), tunnels turn a network
555	      (Layer 3) path into a (Layer 2) link

557	   o  To nodes the tunnel traverses, the tunnel ingress and egress act
558	      as hosts that source and sink tunnel link packets (TLPs)

560	   The consequences of these features are as follows:

562	   o  Like a link, a tunnel has an MTU defined by the reassembly MTU of
563	      the receiving interface (egress).

565	   o  Like any other link, the MTUs inside a tunnel are not relevant to
566	      the transited traffic. There is no mechanism or protocol by which
567	      they are measured or confirmed.

569	   o  Path MTU discovery in the network layer (i.e., outer network M)
570	      has no direct relation to the MTU of the hops within the link
571	      layer of the links (or thus tunnels) that connect its components.

573	   o  Hops remain defined as the number of routers encountered on a path
574	      or the time spent at a router [RFC1812]. Hops are not decremented
575	      solely by the transit of a link, e.g., a packet with a hop count
576	      of zero should successfully transit a link (and thus a tunnel)
577	      that connects two hosts. Routers, not links, alter hopcounts.

579	   o  The addresses of a tunnel ingress and egress correspond to link
580	      layer addresses to the tunnel transit packet and outer network M.
581	      Like point-to-point links, point-to-point tunnels can be
582	      unnumbered in the network in which they reside (even though they
583	      must have addresses in the network they transit).

585	   o  Like network interfaces, the ingress and egress are never a direct
586	      source of ICMP messages but may provide information to their
587	      attached host or router to generate those ICMP messages.

589	   o  Like network interfaces and links, two nodes may be connected by
590	      any combination of tunnels and links, including multiple tunnels.
591	      As with multiple links, existing routing determines which traffic
592	      uses each link or tunnel.

594	   These observations make it much easier to determine what a tunnel
595	   must do to transit IP packets, notably it must satisfy all
596	   requirements expected of a link [RFC1122][RFC3819]. The consequence
597	   of these observations is that tunnels are no different from links,
598	   except only that a link has a physical instantiation.

600	3.6. Fragmentation

602	   There are two places where fragmentation can occur in a tunnel,
603	   called Outer Fragmentation and Inner Fragmentation. This document
604	   assumes that only Outer Fragmentation is viable because it is the
605	   only approach that works for IPv4 datagrams with DF=1 and for IPv6.

607	3.6.1. Outer Fragmentation

609	   The simplest case is Outer Fragmentation, as shown in Figure 8. The
610	   bottom of the figure shows the network topology, where packets start
611	   at the source, enter the tunnel at the encapsulator, exit the tunnel
612	   at the decapsulator, and arrive finally at the destination. The
613	   packet traffic is shown above the topology, where the end-to-end
614	   packets are shown at the top. The packets are composed of an inner
615	   header (iH) and inner data (iD); the term "inner" is relative to the
616	   tunnel, as will become apparent. When the packet (iH,iD) arrives at
617	   the encapsulator, it is placed inside the tunnel packet structure,
618	   here shown as adding just an outer header, oH, in step (a).

620	    +----+----+                                              +----+----+
621	    | iH | iD |------+ -  -  -  -  -  -  -  -  -  -  +------>| iH | iD |
622	    +----+----+      |                               |       +----+----+
623	                     v                               |
624	              +----+----+----+               +----+----+----+
625	          (a) | oH | iH | iD |               | oH | iH | iD | (c)
626	              +----+----+----+               +----+----+----+
627	                     |                               ^
628	                     |       +----+----+-----+       |
629	                (b1) +----- >| oH'| iH | iD1 |-------+
630	                     |       +----+----+-----+       |
631	                     |                               |
632	                     |       +----+-----+            |
633	                (b2) +----- >| oH"| iD2 |------------+
634	                             +----+-----+
635	   +-----+         +---+                           +---+         +-----+
636	   |     |        /     \ ======================= /     \        |     |
637	   | Src |=======|  Enc  |=======================|  Dec  |=======| Dst |
638	   |     |        \     / ======================= \     /        |     |
639	   +-----+         +---+                           +---+         +-----+

641	                Figure 8 Fragmentation of the outer packet

643	   When the encapsulated packet exceeds the tunnel MTU, the packet needs
644	   to be fragmented. In this case we fragment the packet at the outer
645	   header, with the fragments shown as (b1) and (b2). Note that the
646	   outer header indicates fragmentation (as ' and "), the inner header
647	   occurs only in the first fragment, and the inner data is broken
648	   across the two packets. These fragments are reassembled at the
649	   decapsulator in step (c), and the resulting packet is decapsulated
650	   and sent on to the destination.

652	   Outer fragmentation isolates Source and Destination from tunnel
653	   encapsulation duties. This can be considered a benefit in clean,
654	   layered network design, but it may also complicate decapsulator
655	   design, especially where tunnels aggregate large amounts of traffic
656	   and risk IP ID overload (see Sec. 4.4). Outer fragmentation is valid
657	   for any tunnel encapsulation protocol that supports fragmentation
658	   (e.g., IPv4 or IPv6), where the tunnel endpoints act as the host
659	   endpoints of that protocol.

661	   Along the tunnel, the inner header is contained only in the first
662	   fragment, which can interfere with mechanisms that 'peek' into lower
663	   layer headers, e.g., as for ICMP, as discussed in Sec. 4.6.

665	3.6.2. Inner Fragmentation

667	   Inner Fragmentation distributes the impact of tunneling across both
668	   the decapsulator and destination, and is shown in Figure 9; this can
669	   be especially important when the tunnel aggregates large amounts of
670	   traffic. However, this mechanism is valid only when the original
671	   source packets can be fragmented on-path, e.g., as in IPv4 datagrams
672	   with DF=0.

674	   Again, the network topology is shown at the bottom of the figure, and
675	   the original packets are shown at the top. Packets arrive at the
676	   encapsulator, and are fragmented there based on the inner header into
677	   (a1) and (a2). The fragments arrive at the decapsulator, which
678	   removes the outer header and forwards the resulting fragments on to
679	   the destination. The destination is then responsible for reassembling
680	   the fragments into the original packet.

682	   Along the tunnel, the inner headers are copied into each fragment,
683	   and so are available to mechanisms that 'peek' into headers (e.g.,
684	   ICMP, as discussed in Sec. 4.6). Because fragmentation happens on the
685	   inner header, the impact of IP ID is reduced.

687	   +----+----+                                               +----+----+
688	   | iH | iD |-------+-  -  -  -  -  -  -  -  -  -  -  -  - >| iH | iD |
689	   +----+----+       |                                       +----+----+
690	                     v                                            ^
691	                +----+-----+                    +----+-----+      |
692	           (a1) | iH'| iD1 |                    | iH'| iD1 |------+
693	                +----+-----+                    +----+-----+      |
694	                                                                  |
695	                +----+-----+                    +----+-----+      |
696	           (a2) | iH"| iD2 |                    | iH"| iD2 |------+
697	                +----+-----+                    +----+-----+
698	                     |                               ^
699	                     |       +----+----+-----+       |
700	                (b1) +----- >| oH | iH'| iD1 |-------+
701	                     |       +----+----+-----+       |
702	                     |                               |
703	                     |       +----+----+-----+       |
704	                (b2) +----- >| oH | iH"| iD2 |-------+
705	                             +----+----+-----+
706	   +-----+         +---+                           +---+         +-----+
707	   |     |        /     \ ======================= /     \        |     |
708	   | Src |=======|  Enc  |=======================|  Dec  |=======| Dst |
709	   |     |        \     / ======================= \     /        |     |
710	   +-----+         +---+                           +---+         +-----+

712	                Figure 9 Fragmentation of the inner packet

714	3.6.3. The necessity of Outer Fragmentation

716	   Fragmentation is critical for tunnels that support TTP packets for
717	   protocols with minimum MTU requirements, while operating over tunnel
718	   paths using protocols with minimum MTU requirements. Depending on the
719	   amount of space used by encapsulation, these two minimums will
720	   ultimately interfere, and the TTP will need to be fragmented to both
721	   support a TTP minimum MTU while traversing tunnels with their own TLP
722	   minimum MTUs.

724	   Outer Fragmentation is the only solution that supports all IPv4 and
725	   IPv6 traffic, because inner fragmentation is allowed only for IPv4
726	   datagrams with DF=0. As a result, the remainder of this document
727	   assumes Outer Fragmentation.
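
   The difference between the two approaches can be sketched in Python.
   This is an illustrative model only: the header sizes, the function
   names, and the dictionary representation of a fragment are
   assumptions made for this sketch, and real IP fragmentation also
   aligns fragment payloads to 8-byte boundaries, which is ignored here.

```python
OUTER_HDR = 40  # assumed size of oH (e.g., a fixed IPv6 header)
INNER_HDR = 20  # assumed size of iH (e.g., a minimal IPv4 header)


def split(length, max_chunk):
    """Split a byte count into chunk sizes no larger than max_chunk."""
    if max_chunk <= 0:
        raise ValueError("no room left for payload")
    chunks = []
    while length > 0:
        take = min(length, max_chunk)
        chunks.append(take)
        length -= take
    return chunks


def outer_fragment(inner_data_len, path_mtu):
    """Encapsulate, then fragment on the outer header (Figure 8)."""
    body = INNER_HDR + inner_data_len  # iH + iD form the outer payload
    return [
        # Only the first outer fragment contains the inner header.
        {"size": OUTER_HDR + chunk, "has_inner_header": index == 0}
        for index, chunk in enumerate(split(body, path_mtu - OUTER_HDR))
    ]


def inner_fragment(inner_data_len, path_mtu, df):
    """Fragment on the inner header, then encapsulate (Figure 9)."""
    if df:
        raise ValueError("inner fragmentation requires IPv4 with DF=0")
    room = path_mtu - OUTER_HDR - INNER_HDR
    return [
        # Every fragment carries its own copy of the inner header.
        {"size": OUTER_HDR + INNER_HDR + chunk, "has_inner_header": True}
        for chunk in split(inner_data_len, room)
    ]
```

   With these assumed sizes, a 1460-byte inner payload over a 1280-byte
   tunnel path yields two fragments in either case; only the outer
   variant leaves the second fragment without an inner header, and only
   the outer variant is usable when the inner packet has DF=1.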

729	4. IP Tunnel Requirements

731	   The requirements of an IP tunnel are defined by the requirements of
732	   an IP link because both transit IP packets. A tunnel thus must
733	   transit the IP minimum MTU, i.e., 68 bytes for IPv4 [RFC791] and 1280
734	   bytes for IPv6 [RFC2460] and a tunnel must support address resolution
735	   when there is more than one egress.

737	   The requirements of the tunnel ingress and egress are defined by the
738	   network over which they exchange messages (tunnel link packets). For
739	   IP-over-IP, this means that the ingress MUST NOT exceed the IPv4
740	   Identification (fragment) field uniqueness requirements [RFC6864].

742	   These requirements remain even though tunnels have some unique
743	   issues, including the need for additional space for encapsulation
744	   headers and the potential for tunnel MTU variation.

746	4.1. Minimum MTU Considerations

748	   There are a variety of values of minimum MTU to consider, both in a
749	   conventional network and in a tunnel as a link in that network. These
750	   are indicated in Figure 10, an annotated variant of Figure 4.

752	     (a) LMTU    <->
753	     (b) PMTU    <------------------------------------>
754	     (c) <-RMTU----------------------------------------------->
755	     (d) TMTU          <------------------------>
756	     (e) TIMTU             <---------------->
757	     (f) ERMTU         <------------------------>
758	                     --_                         --
759	         +------+   /  \                        /  \   +------+
760	         | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
761	         +------+   \  //\    /  \    /  \    /\\  /   +------+
762	                     --/I \--+ Rb +--+ Rc +--/E \--
763	                       \  /   \  /    \  /   \  /
764	                        \/     --      --     \/
765	                       <------ Network N ------->
766	         <-------------------- Network M --------------------->

768	                    Figure 10 The variety of MTU values

770	   Consider the following example values. For IPv6, the minimum LMTU (a)
771	   is 1280 bytes, which is also the minimum PMTU (b). The minimum RMTU
772	   (c) is 1500 bytes, which is also the minimum MTU for endpoint-to-
773	   endpoint communication. This means that IPv6 already assumes that
774	   endpoint-to-endpoint communication may require source fragmentation
775	   to transit IPv6-compatible links, even without considering tunnels.

777	   The TMTU (d) is the tunnel equivalent of a LMTU, and thus also needs
778	   to be 1280 bytes for IPv6. Assuming the links of a tunnel traverse
779	   IPv6 hops (e.g., I to Rb, Rb to Rc, and Rc to E), the TIMTU (e) is
780	   equivalent to the PMTU between I and E, which is 1280 - encaps (where
781	   "encaps" is the tunnel encapsulation overhead). This value is
782	   insufficient to satisfy the requirement of an IPv6 link (which must
783	   transit at least 1280 bytes unfragmented), but this is not a problem.
784	   The TMTU (d) is not limited by TIMTU (e), but by ERMTU (f), the
785	   tunnel equivalent of RMTU (c). For a tunnel using IPv6 over IPv6, the
786	   ERMTU is the RMTU of the underlying network N minus space for
787	   encapsulation, i.e., 1500 - encaps bytes, and the tunnel is viable as
788	   long as ERMTU >= 1280. Even though the tunnel will ultimately transit
789	   ERMTU - encaps byte messages between the ingress and egress, each hop
790	   within the tunnel transits only TIMTU - encaps byte messages. The
791	   difference between TIMTU and ERMTU is the reason why the tunnel
792	   ingresses need to support fragmentation and tunnel egresses need to
793	   support reassembly. The high cost of fragmentation and reassembly is
794	   why it is useful for applications to avoid sending messages too close
795	   to the PMTU, even the PMTU at their own layer.
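
   The example values above can be reproduced with a few lines of
   Python. This is a sketch under the assumptions stated in the text
   (IPv6 over IPv6, with all links in network N at the IPv6 minimums);
   the function and constant names are illustrative and do not appear in
   any specification.

```python
IPV6_MIN_LINK_MTU = 1280  # minimum LMTU and PMTU for IPv6
IPV6_MIN_RMTU = 1500      # minimum reassembly MTU for IPv6


def tunnel_mtus(encaps):
    """Return (TIMTU, ERMTU, viable, needs_fragmentation) for a tunnel
    whose underlying network N offers only the IPv6 minimums."""
    timtu = IPV6_MIN_LINK_MTU - encaps  # largest TTP sent unfragmented
    ermtu = IPV6_MIN_RMTU - encaps      # largest TTP the egress can rebuild
    viable = ermtu >= IPV6_MIN_LINK_MTU
    needs_fragmentation = timtu < IPV6_MIN_LINK_MTU
    return timtu, ermtu, viable, needs_fragmentation
```

   For a bare 40-byte IPv6 outer header, tunnel_mtus(40) gives a TIMTU
   of 1240 and an ERMTU of 1460: the tunnel is viable as an IPv6 link,
   but only because the ingress fragments and the egress reassembles.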

797	4.2. Fragmentation

799	   A tunnel interacts with fragmentation in two different ways. As a
800	   link in network M, its messages might be fragmented before they reach
801	   the tunnel - i.e., at the TTP layer either during source
802	   fragmentation (if generated at the same node as the ingress
803	   interface) or forwarding fragmentation (for IPv4 DF=0 datagrams). In
804	   addition, messages traversing the tunnel may require fragmentation by
805	   the ingress - i.e., source fragmentation at the TLP layer by the
806	   ingress. These two fragmentation operations are no more related than
807	   are conventional IP fragmentation and ATM segmentation and
808	   reassembly; one occurs at the network layer, the other at the
809	   (virtual) link layer.

811	   As with any link layer, a tunnel MTU (TMTU) is defined as the largest
812	   message that can transit the tunnel. For a tunnel, this is the egress
813	   reassembly MTU (ERMTU), which is the reassembly MTU (RMTU) of the
814	   egress interface minus the space needed for the tunnel encapsulation
815	   headers. This value must also satisfy the requirements of the IP
816	   packets that the tunnel transits.

818	   Note that many of the issues with tunnel fragmentation and MTU
819	   handling were discussed in [RFC4459], but that document described a
820	   variety of alternatives as if they were independent. This document
821	   explains the combined approach that is necessary.

823	   Like any other link, an IPv4 tunnel must transit 68 byte packets
824	   without requiring source fragmentation [RFC791][RFC1122] and an IPv6
825	   tunnel must transit 1280 byte packets without requiring source
826	   fragmentation [RFC2460]. The tunnel MTU interacts with routers or
827	   hosts it connects the same way as would a link MTU. In the following
828	   pseudocode, TTPsize is the size of the tunnel transit packet (TTP),
829	   and ERMTU is the reassembly MTU of the egress. As with any link, the
830	   link MTU (LMTU) is defined not by the native path of the link (or,
831	   for a tunnel, the path MTU of encapsulated packets inside the tunnel)
832	   but by the egress reassembly capability. This is because the ICMP
833	   "packet too big" message indicates failure of a link to transit a
834	   packet, not a preference for a size that matches that inside the
835	   mechanism of the link. There is no ICMP message for "larger than I'd
836	   like, but I can still transit it".

838	   These rules apply at the host/router where the tunnel is attached,
839	   i.e., at the network layer of the TTP (we assume that all tunnels,
840	   including multipoint tunnels, have a single, uniform TMTU). These are
841	   basic source fragmentation rules (or transit refragmentation for IPv4
842	   DF=0 datagrams), and have no relation to the tunnel itself other than
843	   to consider the TMTU as the effective LMTU of the next hop:

845	      if (TTP > TMTU) then
846	         if (TTP can be fragmented, e.g., IPv4 DF=0) then
847	            split TTP into fragments of TMTU size
848	            and send each fragment to the tunnel ingress
849	         else
850	            drop TTP and send ICMP "too big" to TTP source
851	         endif
852	      else
853	         send TTP to the tunnel ingress
854	      endif

856	   These rules apply at the tunnel ingress, in its role as host on the
857	   tunnel path, i.e., as source fragmentation of TLP messages (we assume
858	   that all tunnels, even multipoint tunnels, have a single, uniform
859	   TIMTU), where "encaps" is the encapsulation overhead:

861	      if (TTP <= (TIMTU + encaps)) then
862	         encapsulate the TTP and process as if arriving at the node
863	      else
864	         if ((TIMTU + encaps) < TTP <= (ERMTU - encaps)) then
865	            fragment TTP into TIMTU chunks
866	            encapsulate each chunk and process as if arriving at the node
867	         else
868	            {never happens; host/router already dropped by now}
869	         endif
870	      endif

872	   There is one path above that never occurs - i.e., a network interface
873	   should never receive a message larger than its MTU, and a tunnel
874	   should thus never receive a message larger than its (ERMTU - encaps)
875	   limit. A router attempting to process such a message would generate
876	   an ICMP error (packet too big, fragmentation needed) and the packet
877	   would already have been dropped before entering into this algorithm.
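
   The two rule sets can also be expressed as a small Python sketch. The
   function names and the returned action strings are hypothetical, and
   the two size limits at the ingress are passed in by the caller rather
   than derived here, so the sketch makes no claim about how a given
   implementation accounts for encapsulation overhead.

```python
def node_action(ttp_size, tmtu, can_fragment):
    """Rules at the host/router where the tunnel attaches (TTP layer)."""
    if ttp_size > tmtu:
        if can_fragment:  # e.g., IPv4 with DF=0
            return "fragment-to-TMTU-and-send-to-ingress"
        return "drop-and-send-ICMP-too-big"
    return "send-to-ingress"


def ingress_action(ttp_size, unfragmented_limit, reassembly_limit):
    """Rules at the tunnel ingress (TLP layer).

    unfragmented_limit: largest TTP that fits in one tunnel link packet.
    reassembly_limit:   largest TTP the egress is able to reassemble.
    """
    if ttp_size <= unfragmented_limit:
        return "encapsulate"
    if ttp_size <= reassembly_limit:
        return "fragment-and-encapsulate"
    # The attached host/router should have dropped the packet already.
    raise ValueError("TTP larger than the tunnel MTU reached the ingress")
```

   Raising an error in the final branch mirrors the "never happens"
   path: reaching it indicates a bug in the attached node, not a
   condition the ingress is expected to handle.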

879	   As an example, consider IPv4 over IPv6 or IPv6 over IPv6 tunneling,
880	   where IPv6 encapsulation adds a 40 byte fixed header plus IPv6
881	   options (i.e., IPv6 header extensions) of total size TOptSz. From
882	   [RFC2460] it follows that the TMTU must be at least 1280 bytes and
883	   the ERMTU must be at least 1500 - (40 + TOptSz) bytes. The TIMTU must
884	   be a minimum of 1280 - (40 + TOptSz) bytes. Considering these minimum
885	   values, the previous algorithm becomes:

887	      if (TTP <= (1240 - TOptSz)) then
888	         encapsulate the TTP and process as if arriving at the node
889	      else
890	         if ((1240 - TOptSz) < TTP <= (1460 - TOptSz))   then
891	            fragment TTP into (1240 - TOptSz) chunks
892	            encapsulate each chunk and process as if arriving at the node
893	         else
894	            {never happens; host/router already dropped by now}
895	         endif
896	      endif

898	   This tunnel supports IPv6 transit only if TOptSz is smaller than
899	   180 bytes, and supports IPv4 transit if TOptSz is smaller than 884
900	   bytes. IPv6 TTPs of 1280 bytes may be guaranteed to transit the outer
901	   network (M) without needing fragmentation there but they may require
902	   ongoing fragmentation and reassembly if the TMTU is not at least 1320
903	   bytes.
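
   The 180-byte and 884-byte bounds follow from the minimum values used
   above, as the following Python sketch shows. It assumes, as the
   figures in the text imply, that the tunnel must be able to carry a
   1280-byte IPv6 packet or a 576-byte IPv4 packet; the names are
   illustrative only.

```python
IPV6_FIXED_HDR = 40   # fixed IPv6 header added by encapsulation
IPV6_MIN_RMTU = 1500  # minimum IPv6 reassembly MTU at the egress


def max_topt_sz(required_ttp_size):
    """Largest extension-header total (TOptSz) for which a TTP of the
    required size still fits within the minimum egress reassembly MTU."""
    return IPV6_MIN_RMTU - IPV6_FIXED_HDR - required_ttp_size


ipv6_bound = max_topt_sz(1280)  # 1460 - 1280
ipv4_bound = max_topt_sz(576)   # 1460 - 576
```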

905	   When using IP directly over IP, the minimum ERMTU for IPv4 is 576
906	   bytes and for IPv6 is 1500 bytes. This means that tunnels of IPv4-
907	   over-IPv4, IPv4-over-IPv6, and IPv6-over-IPv6 are possible without
908	   additional requirements, but this may involve ingress fragmentation
909	   and egress reassembly. IPv6 cannot be tunneled directly over IPv4
910	   without additional requirements, notably that the ERMTU is at least
911	   1280 bytes. Fragmentation and reassembly cannot be avoided for IPv6-
912	   over-IPv6 without similar requirements.

914	   When ongoing ingress fragmentation and egress reassembly would be
915	   prohibitive or costly, larger MTUs can be supported by design and
916	   confirmed either out-of-band (by design) or in-band (e.g., using
917	   PLPMTUD [RFC4821], as done in SEAL [RFC5320] and AERO [Te16]).

919	   Alternately, an ingress can encapsulate packets that fit and shut
920	   down once fragmentation is needed, but it must not continue to
921	   forward smaller packets while dropping larger packets that are still
922	   within required limits.

924	4.3. MTU discovery

926	   MTU discovery enables a network path to support a larger PMTU than it
927	   can assume from the minimum requirements of the protocol over which
928	   it operates. A tunnel has two different LMTU-like values: the TMTU
929	   and the TIMTU.

931	   There is temptation to optimize tunnel traversal so that packets are
932	   not fragmented between ingress and egress, i.e., to attempt to tune
933	   the network PMTU to the TIMTU rather than the TMTU, to avoid ingress
934	   fragmentation. This is hazardous for many reasons:

936	   o  The tunnel is capable of transiting packets as large as the ERMTU,
937	      which is always at least as large as the TIMTU and typically is
938	      larger.

940	   o  ICMP has only one type of error message regarding large packets -
941	      "too big", i.e., too large to transit. There is no optimization
942	      message of "bigger than I'd like, but I can deal with if needed".

944	   o  IP tunnels often involve some level of recursion, i.e.,
945	      encapsulation over itself [RFC4459].

947	   Recursive tunneling occurs whenever a protocol ends up encapsulated
948	   in itself. This happens directly, as when IPv4 is encapsulated in
949	   IPv4, or indirectly, as when IP is encapsulated in UDP which then is
950	   a payload inside IP. It can involve many layers of encapsulation
951	   because a tunnel provider isn't always aware of whether the packets
952	   it transits are already tunneled.

954	   Recursion is impossible when the tunnel transit packets are limited
955	   to the native size of the TIMTU. Arriving tunnel transit
956	   packets have a minimum supported size (1280 for IPv6) and the tunnel
957	   PMTU has the same requirement; there would be no room for the
958	   additional encapsulation headers. The result would be an IPv6 tunnel
959	   that cannot satisfy IPv6 transit requirements.

961	   It is more appropriate to require the tunnel to satisfy IP transit
962	   requirements and enforce that requirement at design time or during
963	   operation (the latter using PLPMTUD [RFC4821]). Conventional path MTU
964	   discovery (PMTUD) relies on existing endpoint ICMP processing of
965	   explicit negative feedback from routers along the path via "message
966	   too big" ICMP packets in the reverse direction of the tunnel
967	   [RFC1191]. This technique is susceptible to the "black hole"
968	   phenomenon, in which the ICMP messages never return to the source due
969	   to policy-based filtering [RFC2923]. PLPMTUD requires a separate,
970	   direct control channel from the egress to the ingress that provides
971	   positive feedback; the direct channel is not blocked by policy
972	   filters and the positive feedback ensures fail-safe operation if
973	   feedback messages are lost [RFC4821].

975	4.4. IP ID exhaustion

977	   In IPv4, the IP Identification (ID) field is a 16-bit value that is
978	   unique for every packet for a given source address, destination
979	   address, and protocol, such that it does not repeat within the
980	   Maximum Segment Lifetime (MSL) [RFC791][RFC1122]. Although the ID
981	   field was originally intended for fragmentation and reassembly, it
982	   can also be used to detect and discard duplicate packets, e.g., at
983	   congested routers (see Sec. 3.2.1.5 of [RFC1122]). For this reason,
984	   and because IPv4 packets can be fragmented anywhere along a path, all
985	   packets between a source and destination of a given protocol must
986	   have unique ID values over a period of an MSL, which is typically
987	   interpreted as two minutes (120 seconds). These requirements have
988	   recently been somewhat relaxed in recognition of the primary use of
989	   this field for reassembly and the need to handle only fragment
990	   misordering at the receiver [RFC6864].

992	   The uniqueness of the IP ID is a known problem for high speed nodes,
993	   because it limits the speed of a single protocol between two
994	   endpoints [RFC4963]. Although this suggests that the uniqueness of
995	   the IP ID is moot, tunnels exacerbate this condition. A tunnel often
996	   aggregates traffic from a number of different source and destination
997	   addresses, of different protocols, and encapsulates them in a header
998	   with the same ingress and egress addresses, all using a single
999	   encapsulation protocol. The result is one of the following:

1001	   1. The IP ID rules are enforced, and the tunnel throughput is
1002	      severely limited.

1004	   2. The IP ID rules are enforced, and the tunnel consumes large
1005	      numbers of ingress/egress IP addresses solely to ensure ID
1006	      uniqueness.

1008	   3. The IP ID rules are ignored.

1010	   The last case is the most obvious solution, because it corresponds to
1011	   how endpoints currently behave. Fortunately, fragmentation is
1012	   somewhat rare in the current Internet at large, but it can be common
1013	   along a tunnel. Fragments that repeat the IP ID risk being
1014	   reassembled incorrectly, especially when fragments are reordered or
1015	   lost. Reassembly errors are not always detected by other protocol
1016	   layers (see Sec. 4.9), and even when detected they can result in
1017	   excessive overall packet loss and can waste bandwidth between the
1018	   egress and ultimate packet destination.
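
   The throughput limit implied by strict ID uniqueness is simple
   arithmetic, sketched here in Python. The sketch assumes a 120-second
   MSL, as above, and a single (source, destination, protocol) triple,
   which is what a tunnel presents to the network it traverses; the
   names are illustrative.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field
MSL_SECONDS = 120   # typical interpretation of the MSL


def max_unique_id_rate(packet_bytes):
    """Return (packets/sec, bits/sec) sustainable without reusing an ID
    within one MSL, for packets of the given size."""
    packets_per_second = ID_SPACE / MSL_SECONDS
    return packets_per_second, packets_per_second * packet_bytes * 8


pps, bps = max_unique_id_rate(1500)
```

   For 1500-byte packets this is roughly 546 packets per second, or
   about 6.5 Mb/s, for all traffic aggregated between one ingress and
   one egress.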

1020	4.5. Hop Count

1022	   This section considers the selection of the value of the hop count of
1023	   the tunnel link header, as well as the potential impact on the tunnel
1024	   transit header. The former is affected by the number of hops within
1025	   the tunnel. The latter determines whether the tunnel has visible
1026	   effect on the transit packet.

1028	   In general, the Internet hop count field is used to detect and avoid
1029	   forwarding loops that cannot be corrected without a synchronized
1030	   reboot. The IPv4 Time-to-Live (TTL) and IPv6 Hop Limit fields each
1031	   serve this purpose [RFC791][RFC2460].

1033	   The IPv4 TTL field was originally intended to indicate packet
1034	   expiration time, measured in seconds. A router is required to
1035	   decrement the TTL by at least one or the number of seconds the packet
1036	   is delayed, whichever is larger [RFC1812]. Packets are rarely held
1037	   that long, and so the field has come to represent the count of the
1038	   number of routers traversed. IPv6 makes this meaning more explicit.

1040	   These hop count fields represent the number of network forwarding
1041	   elements traversed by an IP datagram. An IP datagram with a hop count
1042	   of zero can traverse a link between two hosts because it never visits
1043	   a router (where it would need to be decremented and would have been
1044	   dropped).

1046	   An IP datagram traversing a tunnel thus need not have its hopcount
1047	   modified, i.e., the tunnel transit header need not be affected. A
1048	   zero hop count datagram should be able to traverse a tunnel as easily
1049	   as it traverses a link. A router MAY be configured to decrement
1050	   packets traversing a particular link (and thus a tunnel), which may
1051	   be useful in emulating a path as if it had traversed one or more
1052	   routers, but this is strictly optional. The ability of the outer
1053	   network and tunnel network to avoid indefinitely looping packets does
1054	   not rely on the hop counts of the tunnel traversal packet and tunnel
1055	   link packet being related in any way at all.
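
   The rule that routers, not links or tunnels, alter the hop count can
   be illustrated with a short Python sketch. The path representation
   and function name are hypothetical, and router behavior is simplified
   to "drop if the count would reach zero, otherwise decrement by one".

```python
def traverse(hop_count, path):
    """Walk a path of 'router', 'link', and 'tunnel' elements.

    Returns the hop count on arrival, or None if a router dropped the
    packet. Links and tunnels leave the hop count untouched.
    """
    for element in path:
        if element == "router":
            if hop_count <= 1:
                return None  # would expire at this router
            hop_count -= 1
        elif element not in ("link", "tunnel"):
            raise ValueError("unknown path element: " + element)
    return hop_count
```

   Under this model, traverse(0, ["tunnel"]) returns 0: a zero hop count
   packet crosses a tunnel between two hosts just as it would cross a
   link, whereas any router on the path would drop it.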

1057	   The hop count field is also used by several protocols to determine
1058	   whether endpoints are "local", i.e., connected to the same subnet
1059	   (link-local discovery and related protocols [RFC4861]). A tunnel is a
1060	   way to make a remote address appear directly-connected, so it makes
1061	   sense that the other ends of the tunnel appear local and that such
1062	   link-local protocols operate over tunnels unless configured
1063	   explicitly otherwise. When the interfaces of a tunnel are numbered,
1064	   these can be interpreted the same way as if they were on the same
1065	   link subnet.

1067	4.6. Signaling

1069	   In the current Internet architecture, signaling goes upstream, either
1070	   from routers along a path or from the destination, back toward the
1071	   source. Such signals are typically contained in ICMP messages, but
1072	   can involve other protocols such as RSVP, transport protocol signals
1073	   (e.g., TCP RSTs), or multicast control or transport protocols.

1075	   A tunnel behaves like a link and acts like a link interface at the
1076	   nodes where it is attached. As such, it can provide information that
1077	   enhances IP signaling (e.g., ICMP), but itself does not directly
1078	   generate ICMP messages.

1080	   For tunnels, this means that there are two separate signaling paths.
1081	   The outer network M nodes can each signal the source of the tunnel
1082	   transit packets, Hsrc (Figure 11). Inside the tunnel, the inner
1083	   network N nodes can signal the source of the tunnel link packets, the
1084	   ingress I (Figure 12).

1086	           +--------+---------------------------+--------+
1087	           |        |                           |        |
1088	           v        --_                         --       v
1089	        +------+   /  \                        /  \   +------+
1090	        | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
1091	        +------+   \  //\    /  \    /  \    /\\  /   +------+
1092	                    --/I \--+ Rb +--+ Rc +--/E \--
1093	                      \  /   \  /    \  /   \  /
1094	                       \/     --      --     \/
1095	                        <---- Network N ----->
1096	        <-------------------- Network M --------------------->

1098	                   Figure 11 Signals outside the tunnel
1099	                        +-----+-------+------+
1100	                    --_ |     |       |      |  --
1101	        +------+   /  \ v     |       |      | /  \   +------+
1102	        | Hsrc |--+ Ra +      --      --      + Rd +--| Hdst |
1103	        +------+   \  //\    /  \    /  \    /\\  /   +------+
1104	                    --/I \--+ Rb +--+ Rc +--/E \--
1105	                      \  /   \  /    \  /   \  /
1106	                       \/     --      --     \/
1107	                        <----- Network N ---->
1108	        <--------------------- Network M -------------------->

1110	                    Figure 12 Signals inside the tunnel

1112	   These two signal paths are inherently distinct except where
1113	   information is exchanged between the network interface of the tunnel
1114	   (the ingress) and its attached node (Ra, in both figures).

1116	   It is always possible for a network interface to provide hints to its
1117	   attached node (host or router), which can be used for optimization.
1118	   In this case, when signals inside the tunnel indicate a change to the
1119	   tunnel, the ingress (i.e., the tunnel network interface) can provide
1120	   information to the router (Ra, in both figures), so that Ra can
1121	   generate the appropriate signal in return to Hsrc. This relaying may
1122	   be difficult, because signals inside the tunnel may not return enough
1123	   information to the ingress to support direct relaying to Hsrc.

1125	   In all cases, the tunnel ingress needs to determine how to relay the
1126	   signals from inside the tunnel into signals back to the source. For
1127	   some protocols this is either simple or impossible (such as for
1128	   ICMP); for others, it can even be undefined (e.g., multicast). In
1129	   some cases, the individual signals relayed from inside the tunnel may
1130	   result in corresponding signals in the outside network, and in other
1131	   cases they may just change state of the tunnel interface. In the
1132	   latter case, the result may cause the router Ra to generate new ICMP
1133	   errors when later messages arrive from Hsrc or other sources in the
1134	   outer network.

1136	   The meaning of the relayed information must be carefully translated.
1137	   In the case of soft or hard ICMP errors, the translation may be
1138	   obvious. ICMP "packet too big" messages from inside the tunnel might
1139	   update TIMTU at the ingress, but may have no effect on the tunnel as
1140	   visible to the router where it is attached (Ra).
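
   As a minimal sketch of the distinction above, the following models a
   tunnel ingress as a network interface that keeps two separate MTU
   values. The class and method names are hypothetical and are not
   defined by this document; per-fragment header overhead is ignored
   for brevity.

```python
from dataclasses import dataclass


@dataclass
class TunnelIngress:
    """Hypothetical model of a tunnel ingress acting as a network interface."""
    tmtu: int    # tunnel MTU as visible to the attached node (e.g., Ra)
    timtu: int   # interior MTU used for outer fragmentation at the ingress

    def on_inner_packet_too_big(self, reported_mtu: int) -> None:
        # A "packet too big" signal from inside the tunnel only lowers
        # the interior MTU; the MTU visible to Ra is left untouched.
        self.timtu = min(self.timtu, reported_mtu)

    def outer_fragments_needed(self, link_packet_len: int) -> int:
        # Ceiling division (assumption: fragment headers are ignored).
        return -(-link_packet_len // self.timtu)


ingress = TunnelIngress(tmtu=4000, timtu=1500)
ingress.on_inner_packet_too_big(1400)
```

   In this sketch the signal changes only how the ingress fragments
   tunnel link packets; Ra still sees the same tunnel MTU.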

1142	   In addition to ICMP, messages typically considered for translation
1143	   include Explicit Congestion Notification (ECN [RFC6040]) and
1144	   multicast (e.g., IGMP).

1146	4.7. Relationship of Header Fields

1148	   Some tunnel specifications attempt to relate the fields of the tunnel
1149	   transit packet and tunnel link packet, i.e., the packet arriving at
1150	   the ingress and the encapsulation header. These two headers are
1151	   effectively independent and there is no utility in requiring their
1152	   contents to be related.

1154	   Specifically, the encapsulation header source and destination
1155	   addresses are network endpoints in the tunnel network N, but have no
1156	   meaning in the outer network M, even when the tunneled packet
1157	   traverses the same network. The addresses are effectively
1158	   independent, and the tunnel endpoint addresses are link addresses to
1159	   the tunnel transit packet.

1161	   Because the tunneled packet uses source and destination addresses
1162	   with a separate meaning, it is inappropriate to copy or reuse the
1163	   IPv4 Identification or IPv6 Fragment ID fields of the tunnel transit
1164	   packet. These fields need to be generated based on the context of the
1165	   encapsulation header, not the tunnel transit header.

1167	   Similarly, the DF field need not be copied from the tunnel transit
1168	   packet to the encapsulation header of the tunnel link packet
1169	   (presuming both are IPv4). Path MTU discovery inside the tunnel does
1170	   not directly correspond to path MTU discovery outside the tunnel,
1171	   i.e., inside the tunnel it would update the TIMTU used for outer
1172	   fragmentation at the ingress, but has no effect on the TMTU reported
1173	   to the device where the ingress is attached as a network interface.
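
   The independence of the two headers can be illustrated with a short
   sketch. It assumes an IPv4-in-IPv4 tunnel; the types, field names,
   and default values are hypothetical, and the addresses are taken
   from documentation ranges. The Identification and DF values come
   from the ingress's own context, never from the arriving packet.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class IPv4Header:
    src: str
    dst: str
    ident: int
    df: bool
    ttl: int


class Ingress:
    """Hypothetical ingress that builds encapsulation headers."""

    def __init__(self, ingress_addr, egress_addr, link_df=False, link_ttl=64):
        self.ingress_addr = ingress_addr
        self.egress_addr = egress_addr
        self.link_df = link_df        # tunnel policy, not copied from transit
        self.link_ttl = link_ttl      # enough to reach the egress
        self._ids = itertools.count(1)

    def encapsulation_header(self, transit: IPv4Header) -> IPv4Header:
        # Nothing is copied from the transit header: the ID is drawn from
        # the ingress's own counter and DF/TTL come from tunnel policy.
        return IPv4Header(
            src=self.ingress_addr,
            dst=self.egress_addr,
            ident=next(self._ids) % 65536,
            df=self.link_df,
            ttl=self.link_ttl,
        )


transit = IPv4Header("192.0.2.10", "203.0.113.20", ident=4242, df=True, ttl=17)
ingress = Ingress("198.51.100.1", "198.51.100.2")
link = ingress.encapsulation_header(transit)
```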

1175	   The same is true for most other fields. When a field value is
1176	   generated in the encapsulation header, its meaning should be derived
1177	   from what is desired in the context of the tunnel as a link. When
1178	   feedback is received from these fields, they should be presented to
1179	   the tunnel ingress and egress as if they were network interfaces. The
1180	   behavior of the node where these interfaces attach should be
1181	   identical to that of a conventional link.

1183	   There are exceptions to this rule that are explicitly intended to
1184	   relay signals from inside the tunnel to outside the tunnel. The
1185	   primary example is ECN [RFC6040], which copies the ECN bits from the
1186	   tunnel transit header to the tunnel link header during encapsulation
1187	   at the ingress and modifies the tunnel transit header at egress based
1188	   on a combination of the bits of the two headers. This is intended to
1189	   allow congestion notification within the tunnel to be interpreted as
1190	   if it were on the direct path. Other examples may involve the DSCP
1191	   flags. In both cases, it is assumed that the intent of copying values
1192	   on encapsulation and merging values on decapsulation has the effect
1193	   of allowing the tunnel to act as if it participates in the same type
1194	   of network as outside the tunnel (network M).
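
   The ECN exception can be sketched as follows. This is an
   illustrative reading of the normal-mode behavior in [RFC6040], not a
   normative restatement; the function names are hypothetical, and the
   RFC remains the authority for the full decapsulation table
   (including which combinations should raise alarms).

```python
# ECN codepoints (two-bit field in the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate_ecn(transit_ecn: int) -> int:
    """Normal mode at the ingress: copy the transit ECN field to the link header."""
    return transit_ecn


def decapsulate_ecn(transit_ecn: int, link_ecn: int):
    """Combine both headers at the egress.

    Returns the ECN value for the forwarded transit packet, or None when
    the packet should be dropped.
    """
    if transit_ecn == NOT_ECT:
        # A non-ECN-capable packet marked CE inside the tunnel is dropped.
        return None if link_ecn == CE else NOT_ECT
    if link_ecn == CE:
        # Congestion experienced inside the tunnel is propagated.
        return CE
    if transit_ecn == ECT0 and link_ecn == ECT1:
        return ECT1
    return transit_ecn
```

   For example, a transit packet carrying ECT(0) that is marked CE
   while inside the tunnel leaves the egress marked CE, as if the
   congested node had been on the direct path.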

1196	4.8. Congestion

1198	   In general, tunnels carrying IP traffic need not react directly to
1199	   congestion any more than would any other link layer [RFC5405]. IP
1200	   traffic is not generally expected to be congestion reactive.

1202	   [text from David Black on ECN relaying?]

1204	4.9. Checksums

1206	   IP traffic transiting a tunnel needs to expect a similar level of
1207	   error detection and correction as it would expect from any other
1208	   link. In the case of IPv4, there are no such expectations, which is
1209	   partly why it includes a header checksum [RFC791].

1211	   IPv6 omitted the header checksum because it already expects most link
1212	   errors to be detected and dropped by the link layer and because it
1213	   also assumes transport protection [RFC2460]. When transiting IPv6
1214	   over IPv6, the tunnel fails to provide the expected error detection.
1215	   This is why IPv6 is often tunneled over layers that include separate
1216	   protection, such as GRE [RFC2784].

1218	   The fragmentation created by the tunnel ingress can increase the need
1219	   for stronger error detection and correction, especially at the tunnel
1220	   egress to avoid reassembly errors. The Internet checksum is known to
1221	   be susceptible to reassembly errors that could be common [RFC4963],
1222	   and should not be relied upon for this purpose. This is why SEAL and
1223	   AERO include a separate checksum [RFC5320][Te16]. This requirement
1224	   can be undermined when using UDP as a tunnel with no UDP checksum (as
1225	   per [RFC6935][RFC6936]) when fragmentation occurs because the egress
1226	   has no checksum with which to validate reassembly. For this reason,
1227	   it is safe to use UDP with a zero checksum for atomic (non-
1228	   fragmented, non-fragmentable) tunnel link packets only; when used on
1229	   fragments, whether generated at the ingress or en-route inside the
1230	   tunnel, omission of such a checksum can result in reassembly errors
1231	   that can cause additional work (capacity, forwarding processing,
1232	   receiver processing) downstream of the egress.
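
   As a small illustration of why the Internet checksum is a weak
   check on reassembly, the sketch below shows one structural
   limitation: because it is a ones'-complement sum of 16-bit words, it
   is insensitive to the order of those words. This is not the specific
   failure mode analyzed in [RFC4963], only an example of a corruption
   the checksum cannot see; the CRC-32 comparison stands in for "a
   stronger check" and is an assumption of this sketch, not a
   recommendation of this document.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34" + b"\xab\xcd"
swapped = b"\xab\xcd" + b"\x12\x34"   # same words, wrong order

same_internet_checksum = internet_checksum(original) == internet_checksum(swapped)
same_crc32 = zlib.crc32(original) == zlib.crc32(swapped)
```

   Here the two payloads differ, yet the Internet checksum is
   identical for both, while the CRC-32 values differ.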

1234	4.10. Numbering

1236	   Tunnel ingresses and egresses have addresses associated with the
1237	   encapsulation protocol. These addresses are the source and
1238	   destination (respectively) of the encapsulated packet while
1239	   traversing the tunnel network.

1241	   Tunnels may or may not have addresses in the network whose traffic
1242	   they transit (e.g., network M in Figure 4). In some cases, the tunnel
1243	   is an unnumbered interface to a point-to-point virtual link. When the
1244	   tunnel has multiple egresses, tunnel interfaces require separate
1245	   addresses in network M.

1247	   To see the effect of tunnel interface addresses, consider traffic
1248	   sourced at router Ra in Figure 4. Even before being encapsulated by
1249	   the ingress, that traffic needs a source IP network address that
1250	   belongs to the router. One option is to use an address associated
1251	   with one of the other interfaces of the router [RFC1122]. Another
1252	   option is to assign a number to the tunnel interface itself.
1253	   Regardless of which address is used, the resulting IP packet is then
1254	   encapsulated by the tunnel ingress using the ingress address as a
1255	   separate operation.

1257	4.11. Multicast

1259	   [To be addressed]

1261	   Note that PMTU for multicast is difficult. PIM carries an option that
1262	   may help in the Population Count Extensions to PIM [RFC6807].

1264	   IMO, again, this is no different than any other multicast link.

1266	4.12. Multipoint

1268	   Multipoint tunnels are tunnels with more than two ingress/egress
1269	   endpoints. Just as tunnels emulate links, multipoint tunnels emulate
1270	   multipoint links.

1272	   Multipoint tunnels require support for egress determination, just as
1273	   multipoint links do. This function is typically supported by ARP
1274	   [RFC826] or ARP emulation (e.g., LAN Emulation, known as LANE
1275	   [RFC2225]) for multipoint links. For multipoint tunnels, a similar
1276	   mechanism is required for the same purpose - to determine the egress
1277	   address for proper ingress encapsulation.

1279	   All multipoint systems - tunnels and links - might support different
1280	   MTUs between each ingress/egress (or link entrance/exit) pair. In
1281	   most cases, it is simpler to assume a uniform MTU throughout the
1282	   multipoint system, e.g., the minimum MTU supported across all
1283	   ingress/egress pairs. This applies to both the ERMTU and TIMTU (the
1284	   latter as used only by the ingress).
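
   The two functions a multipoint tunnel ingress needs, as described
   above, can be sketched as a lookup table plus a minimum. The table
   contents, addresses, and function names are hypothetical; how the
   table is populated (e.g., by an ARP-like mechanism) is outside this
   sketch.

```python
from typing import Dict, Tuple


def egress_for(next_hop: str, egress_table: Dict[str, str]) -> str:
    """Map a next hop in the transited network to a tunnel egress address."""
    return egress_table[next_hop]


def uniform_mtu(pair_mtus: Dict[Tuple[str, str], int]) -> int:
    """Use the minimum MTU across all ingress/egress pairs for the whole tunnel."""
    return min(pair_mtus.values())


# Hypothetical state at one ingress of a three-egress multipoint tunnel.
egress_table = {
    "192.0.2.1": "198.51.100.11",
    "192.0.2.2": "198.51.100.12",
    "192.0.2.3": "198.51.100.13",
}
pair_mtus = {
    ("198.51.100.10", "198.51.100.11"): 1500,
    ("198.51.100.10", "198.51.100.12"): 1400,
    ("198.51.100.10", "198.51.100.13"): 9000,
}
tunnel_mtu = uniform_mtu(pair_mtus)
```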

1286	   A multipoint tunnel MUST have support for broadcast and multicast, in
1287	   exactly the same way as this is already required for multipoint links
1289	   [RFC3819]. Both modes can be supported either by a native mechanism
1290	   inside the tunnel or by emulation using serial replication at the
1291	   tunnel ingress, in the same way that links may provide the same
1292	   support either natively (e.g., via promiscuous or automatic
1293	   replication in the link itself) or network interface emulation (e.g.,
1294	   as for non-broadcast multiaccess networks, i.e., NBMAs).

1296	4.13. NAT / Load Balancing

1298	   [To be addressed]

1300	   Talk about ECMP / LAG here

1302	4.14. Recursive tunnels

1304	   [IS THIS REDUNDANT?]

1306	   The rules described in this document already support tunnels over
1307	   tunnels, sometimes known as "recursive" tunnels, in which IP is
1308	   transited over IP either directly or via intermediate encapsulation
1309	   (IP-UDP-IP).

1311	   There are known hazards to recursive tunneling, notably that the
1312	   independence of the tunnel transit header and tunnel link header hop
1313	   counts can result in a tunneling loop. Such looping can be avoided
1314	   when using direct encapsulation (IP in IP) by use of a header option
1315	   to track the encapsulation count and to limit that count [RFC2473].
1316	   This looping cannot be avoided when other protocols are used for
1317	   tunneling, e.g., IP in UDP in IP, because the encapsulation count may
1318	   not be visible where the recursion occurs.
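
   The encapsulation-count approach can be sketched as below. This is
   an illustrative reading of the Tunnel Encapsulation Limit option of
   [RFC2473], with hypothetical names; the default value of 4 and the
   exact error handling are assumptions here and should be checked
   against that RFC.

```python
from typing import Optional

DEFAULT_ENCAP_LIMIT = 4  # assumed default; see RFC 2473 for the actual value


class EncapsulationLimitExceeded(Exception):
    """The arriving packet may not be encapsulated again."""


def next_encapsulation_limit(
    arriving_limit: Optional[int],
    configured_limit: Optional[int] = DEFAULT_ENCAP_LIMIT,
) -> Optional[int]:
    """Return the limit to carry in the new encapsulation header.

    arriving_limit is the value found in the packet entering the tunnel,
    or None if the packet carries no such option.
    """
    if arriving_limit is None:
        return configured_limit          # first encapsulation
    if arriving_limit == 0:
        raise EncapsulationLimitExceeded("discard and signal an error")
    return arriving_limit - 1            # one fewer nesting level remains


# Follow one packet through repeated re-encapsulation.
limits = []
limit = None
for _ in range(5):
    limit = next_encapsulation_limit(limit)
    limits.append(limit)
```

   With an IP-in-UDP-in-IP tunnel, the ingress would typically see only
   the UDP payload, so arriving_limit would always appear to be None
   and the count would restart at each level.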

1320	5. Observations (implications)

1322	   [Leave this as a shopping list for now]

1324	5.1. Tunnel protocol designers

1326	   Recursive tunneling + minimum MTU = frag/reassembly is inevitable, at
1327	   least to be able to split/join two fragments

1329	   Account for egress MTU/path MTU differences.

1331	   Include a stronger checksum.

1333	   Ensure the egress MTU is always larger than the path MTU.

1335	   Ensure that the egress reassembly can keep up with line rate OR
1336	   design PLPMTUD into the tunneling protocol.

1338	5.2. Tunnel implementers

1340	   Detect when the egress MTU is exceeded.

1342	   Detect when the egress MTU drops below the required minimum and shut
1343	   down the tunnel if that happens - configuring the tunnel down and
1344	   issuing a hard error may be the only way to detect this anomaly, and
1345	   it's sufficiently important that the tunnel SHOULD be disabled. This
1346	   is always better than blindly assuming the tunnel has been deployed
1347	   correctly, i.e., that the solution has been engineered.

1349	   Do NOT decrement the TTL as part of being a tunnel. It's always
1350	   already OK for a router to decrement the TTL based on different next-
1351	   hop routers, but TTL is a property of a router not a link.

1353	5.3. Tunnel operators

1355	   Keep the difference between "enforced by operators" vs. "enforced by
1356	   active protocol mechanism" in mind. It's fine to assume something the
1357	   tunnel cannot or does not test, as long as you KNOW you can assume
1358	   it. When the assumption is wrong, it will NOT be signaled by the
1359	   tunnel.

1364	   Do NOT decrement the TTL as part of being a tunnel. It's always
1365	   already OK for a router to decrement the TTL based on different next-
1366	   hop routers, but TTL is a property of a router not a link.

1368	   >>>> PLPMTUD can give incorrect information during ECMP or LAG

1370	5.4. Diagnostics

1372	   Some current implementations include diagnostics to support
1373	   monitoring the impact of tunneling, especially the impact on
1374	   fragmentation and reassembly resources, the status of path MTU
1375	   discovery, etc.

1377	   >> Because a tunnel ingress/egress is a network interface, it SHOULD
1378	   have similar resources as any other network interface. This includes
1379	   resources for packet processing as well as monitoring.

1381	5.5. For existing standards

1383	5.5.1. Generic UDP Encapsulation (GUE - IP in UDP in IP)

1385	   [He15]

1387	   Consistent with this doc:

1389	   Inconsistent with this doc:

1391	      Imports RFC4459

1393	      Appears to allow both pre and post-encapsulation fragmentation

1395	   Recommendations:

1397	      Should not encourage pre-encaps fragmentation

1399	      See recommendations for RFC4459

1401	5.5.2. Generic Packet Tunneling in IPv6

1403	   [RFC2473]

1405	   Consistent with this doc:

1407	      Considers the endpoints of the tunnel as virtual interfaces.

1409	      Considers the tunnel a virtual link.

1411	      Requires source fragmentation at the ingress and reassembly at the
1412	   egress.

1414	      Includes a recursion limit to prevent unlimited re-encapsulation.

1416	      Sets tunnel transit header hop limit independently.

1418	      Sends ICMPs back at the ingress based on the arriving tunnel
1419	   transit packet and its relation to the tunnel MTU (though it uses the
1420	   incorrect value of the tunnel MTU; see below).

1422	      Allows for ingress relaying of internal tunnel errors (but see
1423	   below; it does not discuss retaining state about these).

1425	   Inconsistent with this doc:

1427	      Decrements the tunnel transit header hop limit by 1, incorrectly
1428	   assuming that tunnel endpoints occur at routers only and that the
1429	   tunnel, rather than the router, is responsible for this decrement.

1431	      RFC2473 goes to pains to describe the decapsulation process as if
1432	   it were distinct from conventional protocol processing by the
1433	   receiver (when it should not be).

1435	      Copies traffic class from tunnel link to tunnel transit header (as
1436	   one variant).

1438	      Treats the tunnel MTU as the tunnel path MTU, rather than the
1439	   tunnel egress MTU.

1441	      Incorrectly fragments IPv4 DF=0 tunnel transit packets that arrive
1442	   larger than the tunnel MTU at the IPv6 layer; the relationship
1443	   between IPv4 and the tunnel is more complex (as noted in this doc).

1445	      Fails to retain state from the tunnel based on ingress receiving
1446	   ICMP messages from inside the tunnel, e.g., such as might cause
1447	   future tunnel transit packets arriving at the ingress to be discarded
1448	   with an ICMP error response rather than allowing them to proceed into
1449	   the tunnel.

1451	   Recommendation:

1453	      This doc should update 2473 for TTL decrement, tunnel MTU, and
1454	   fragmentation. Other issues are less critical.

1456	5.5.3. Geneve (NVO3)

1458	   [RFC7364] info, [Gr16] stds - ISSUE US AS BCP; Gr16 should follow

1460	   Consistent with this doc:

1462	      Generation of the link header fields is not discussed and presumed
1463	   independent of transit packet.

1465	      Reportedly treats an ingress/egress as applying to multiple
1466	   tunnels, rather than considering them logically independent for each
1467	   tunnel. This appears to confuse implementation aggregation with
1468	   architecture.

1470	      Reportedly treats tunnels as supporting traffic for multiple
1471	   virtual networks, rather than considering them logically independent.
1472	   This appears to confuse implementation aggregation with architecture.

1474	   Inconsistent with this doc:

1476	      Tries to match transit to tunnel path MTU rather than egress MTU.

1478	   Recommendation:

1480	      Gr16 should be updated to follow us

1482	5.5.4. GRE (IP in GRE in IP)

1484	   IPv4 [RFC2784] stds, [RFC7588] info, [RFC7676] stds - NO CHANGES

1486	   Consistent with this doc:

1488	      Does not address link header generation.

1490	      Non-default behavior allows fragmentation of link packet to match
1491	   tunnel path MTU up to the limit of the egress MTU.

1493	      Default behavior sets link DF independently.

1495	      Shuts the tunnel down if the tunnel path MTU isn't >= 1280.

1497	   Inconsistent with this doc:

1499	      Based on tunnel path MTU, not egress MTU.

1501	      Claims that the tunnel (GRE) mechanism is responsible for
1502	   generating ICMP error messages.

1504	      Default behavior fragments transit packet (where possible) based
1505	   on tunnel path MTU (it should fragment based on egress MTU).

1507	      Default behavior does not support the minimum MTU of IPv6 when run
1508	   over IPv6.

1510	      Non-default behavior allows copying DF for IPv4 in IPv4.

1512	   Recommendations:

1514	      No changes - existing docs largely describe legacy deployment.

1516	5.5.5. IP in IP / mobile IP

1518	   IPv4 [RFC2003] stds, [RFC4459] info:

1520	   Consistent with this doc:

1522	      Generate link ID independently

1524	      Generate link DF independently when transit DF=0

1526	      Generate ECN/update ECN based on sharing info [RFC6040]

1528	      Set link TTL to transit to egress only (independently)

1530	      Do not decrement TTL on entry except when part of forwarding

1532	      Do not decrement TTL on exit except when part of forwarding

1534	      Options not copied, but used as a hint to desired services.

1536	      Generally treat tunnel as a link, e.g., for link-local.

1538	   Inconsistent with this doc:

1540	      Set link DF when transit DF=1 (won't work unless I-E runs PLPMTUD)

1542	      Drop at egress if transit TTL=0 (wrong TTL for host-host tunnels)

1544	      Drop when transit source is router's IP (prevents tun from router)

1546	      Drop when transit source matches egress (prevents tun to router)

1548	      Use tunnel ICMPs to generate upper ICMPs, copying context (ICMPs
1549	   are now coming from inside a link!); these should be handled by
1550	   setting errors as a "network interface" and letting the attached
1551	   host/router figure out what to send.

1553	      Using tunnel MTU discovery to tune the transit packet to the
1554	   tunnel path MTU rather than egress MTU.

1556	   Recommendations:

1558	      IMO, ought to update 2003! (no "update" to informational), esp.
1559	   regarding TTL issues, transit source drop issues, and tunnel MTU.

1561	   IPv6 [RFC2473] std:

1563	   Consistent with this doc:

1565	      Doesn't discuss lots of header fields, but implies they're set
1566	   independently.

1568	      Sets link TTL independently.

1570	   Inconsistent with this doc:

1572	      Tunnel issues ICMP PTBs.

1574	      ICMP PTB issued if larger than 1280 - header, rather than egress
1575	   reassembly MTU.

1577	      Fragments IPv6 over IPv6 fragments only if transit is <= 1280
1578	   (i.e., forces all tunnels to have a max MTU of 1280).

1580	      Fragments IPv4 over IPv6 fragments only if IPv4 DF=0
1581	   (misinterpreting the "can fragment the IPv4 packet" as permission to
1582	   fragment at the IPv6 link header)

1584	      Considers encapsulation a forwarding operation and decrements the
1585	   transit TTL.

1587	   Recommendation:

1589	      Should UPDATE 2473; tunnel should not issue PTBs (router should),
1590	   issue them correctly, fragment correctly, and not TTL decrement.

1592	5.5.6. IPsec tunnel mode (IP in IPsec in IP)

1594	   [RFC4301] std

1596	   Consistent with this doc:

1598	      Most of the rules, except as noted below.

1600	   Inconsistent with this doc:

1602	      Writes its own header copying rules (Sec 5.1.2), rather than
1603	   referring to existing standards, but that makes sense for security
1604	   reasons.

1606	      Uses policy to set, clear, or copy DF (policy isn't the issue)

1608	      Intertwines tunneling with forwarding rather than presenting the
1609	   tunnel as a network interface; this can be corrected by using IPsec
1610	   transport mode with an IP-in-IP tunnel [RFC3884].

1612	   Recommendations:

1614	      None.

1616	5.5.7. L2TP

1618	   [RFC3931] std

1620	   Consistent with this doc:

1622	      Does not address most link headers, which are thus independent.

1624	   Inconsistent with this doc:

1626	      Manages tunnel access based on tunnel path MTU, instead of egress
1627	   MTU.

1629	      Refers to RFC2473 (IPv6 in IPv6), which is inconsistent with this
1630	   doc as noted above.

1632	   Recommendations:

1634	      Should update to use correct tunnel MTU.

1636	5.5.8. L2VPN

1638	   [RFC4664]

1640	   Consistent with this doc:

1642	   Inconsistent with this doc:

1644	   Recommendations:

1646	5.5.9. L3VPN

1648	   [RFC4176]

1650	   Consistent with this doc:

1652	   Inconsistent with this doc:

1654	   Recommendations:

1656	5.5.10. LISP

1658	   [RFC6830]

1660	   Consistent with this doc:

1662	   Inconsistent with this doc:

1664	   Recommendations:

1666	5.5.11. MPLS

1668	   [RFC3031]

1670	   Consistent with this doc:

1672	   Inconsistent with this doc:

1674	   Recommendations:

1676	5.5.12. PWE

1678	   [RFC3985]

1680	   Consistent with this doc:

1682	   Inconsistent with this doc:

1684	   Recommendations:

1686	5.5.13. SEAL/AERO

1688	   [RFC5320][Te16]

1690	   Consistent with this doc:

1692	   Inconsistent with this doc:

1694	   Recommendations:

1696	5.5.14. TRILL

1698	   [RFC5556][RFC6325]

1700	   Consistent with this doc:

1702	      Puts IP in Ethernet, so most of the issues don't come up.

1704	      Ethernet doesn't have TTL or fragment.

1706	      Rbridge (trill) TTL header is independent of transit packet.

1708	   Inconsistent with this doc:

1710	      None.

1712	   Recommendations:

1714	      None.

1716	5.5.15. RTG DT encapsulations

1718	   [No16], refers to NVO3 and other encapsulations

1720	   Includes info on tables for multipoint tunnels, additional info for
1721	   headers, etc.

1723	   Consistent with this doc:

1725	   Inconsistent with this doc:

1727	      Assumes MTU can be managed to avoid fragmentation. This is
1728	   impossible as long as any one layer is used recursively and that
1729	   layer includes a mandatory minimum MTU. A "trust but verify" policy
1730	   is better than assuming engineered MTU deployment is sufficient.

1732	      Relies on ICMP PTB to correct for tunnel path MTU issues.

1734	      Allows encaps protocols to not support fragmentation.

1736	   Recommendations:

1738	      That doc should refer to this regarding general tunneling issues,
1739	   including fragmentation, tunnel MTU, and TTL, including the "trust
1740	   but verify" issue for engineered MTU deployment.

1742	      All encaps protocols for IP over IP (eventually) MUST support
1743	   fragm.

1745	5.6. For future standards

1747	   Larger IPv4 MTU (2K? or just 2x path MTU?) for reassembly

1749	   Always include frag support for at least two frags; do NOT try to
1750	   deprecate fragmentation.

1752	   Limit encapsulation option use/space.

1754	   Augment ICMP to have two separate messages: PTB vs P-bigger-than-
1755	   optimal

1757	   Include MTU as part of BGP as a hint - SB

1758	   Hazards of multi-MTU draft-van-beijnum-multi-mtu-04

1760	6. Security Considerations

1762	   Tunnels may introduce vulnerabilities or add to the potential for
1763	   receiver overload and thus DoS attacks. These issues are primarily
1764	   related to the fact that a tunnel is a link that traverses a network
1765	   path and to fragmentation and reassembly. ICMP signal translation
1766	   introduces a new security issue and must be done with care. ICMP
1767	   generation at the router or host attached to a tunnel is already
1768	   covered by existing requirements (e.g., should be throttled).

1770	   Tunnels traverse multiple hops of a network path from ingress to
1771	   egress. Traffic along such tunnels may be susceptible to on-path and
1772	   off-path attacks, including fragment injection, reassembly buffer
1773	   overload, and ICMP attacks. Some of these attacks may not be as
1774	   visible to the endpoints of the architecture into which tunnels are
1775	   deployed and these attacks may thus be more difficult to detect.

1777	   Fragmentation at routers or hosts attached to tunnels may place an
1778	   undue burden on receivers where traffic is not sufficiently diffuse,
1779	   because tunnels may induce source fragmentation at hosts and path
1780	   fragmentation (for IPv4 DF=0) more for tunnels than for other links.
1781	   Care should be taken to avoid this situation, notably by ensuring
1782	   that tunnel MTUs are not significantly different from other link
1783	   MTUs.

1785	   Tunnel ingresses emitting IP datagrams MUST obey all existing IP
1786	   requirements, such as the uniqueness of the IP ID field. Failure to
1787	   either limit encapsulation traffic, or use additional ingress/egress
1788	   IP addresses, can result in high speed traffic fragments being
1789	   incorrectly reassembled.
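
   The risk can be estimated with simple arithmetic, in the spirit of
   [RFC4963]. The sketch below assumes a 16-bit IPv4 ID shared by all
   traffic between one ingress/egress address pair and a fixed packet
   size; the function name and parameters are hypothetical.

```python
def ip_id_wrap_seconds(rate_bps: float, packet_bytes: int, id_bits: int = 16) -> float:
    """Time for the ID space to wrap when every packet consumes one ID."""
    return (2 ** id_bits) * packet_bytes * 8 / rate_bps


# 1500-byte tunnel link packets at 1 Gb/s between one address pair.
wrap = ip_id_wrap_seconds(rate_bps=1e9, packet_bytes=1500)
```

   Under these assumptions the ID space wraps in under one second,
   which is far shorter than typical reassembly timeouts. Using
   additional ingress/egress address pairs divides the traffic across
   separate ID spaces, and limiting the encapsulation rate lengthens
   the wrap time.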

1791	   Tunnels are susceptible to attacks at both the inner and outer
1792	   network layers. The tunnel ingress/egress endpoints appear as network
1793	   interfaces in the outer network, and are as susceptible as any other
1794	   network interface. This includes vulnerability to fragmentation
1795	   reassembly overload, traffic overload, and spoofed ICMPs that
1796	   misreport the state of those interfaces. Similarly, the
1797	   ingress/egress appear as hosts to the path traversed by the tunnel,
1798	   and thus are as susceptible as any other host to attacks as well.

1800	   [management?]

1802	   [Access control?]
1803	   describe relationship to [RFC6169] - JT (as per INTAREA meeting
1804	   notes, don't cover Teredo-specific issues in RFC6169, but include
1805	   generic issues here)

1807	7. IANA Considerations

1809	   This document has no IANA considerations.

1811	   The RFC Editor should remove this section prior to publication.

1813	8. References

1815	8.1. Normative References

1817	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1818	             Requirement Levels", BCP 14, RFC 2119, March 1997.

1820	8.2. Informative References

1822	   [Cl88]    Clark, D., "The design philosophy of the DARPA internet
1823	             protocols," Proc. Sigcomm 1988, p.106-114, 1988.

1825	   [Er94]    Eriksson, H., "MBone: The Multicast Backbone,"
1826	             Communications of the ACM, Aug. 1994, pp.54-60.

1828	   [Gr16]    Gross, J., et al., "Geneve: Generic Network Virtualization
1829	             Encapsulation," draft-ietf-nvo3-geneve-01, Jan. 2016.

1831	   [He15]    Herbert, T., L. Yong, O. Zia, "Generic UDP Encapsulation,"
1832	             draft-ietf-nvo3-gue-04, Jul. 2016.

1834	   [No16]    Nordmark, E. (Ed.), A. Tian, J. Gross, J. Hudson, L.
1835	             Kreeger, P. Garg, P. Thaler, T. Herbert, "Encapsulation
1836	             Considerations," draft-ietf-rtgwg-dt-encap-01, Mar. 2016.

1838	   [RFC768]  Postel, J., "User Datagram Protocol," RFC 768, Aug. 1980.

1840	   [RFC791]  Postel, J., "Internet Protocol," RFC 791 / STD 5, September
1841	             1981.

1843	   [RFC793]  Postel, J., "Transmission Control Protocol," RFC 793, Sept.
1844	             1981.

1846	   [RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol -- or
1847	             -- Converting Network Protocol Addresses to 48.bit Ethernet
1848	             Address for Transmission on Ethernet Hardware," RFC 826,
1849	             Nov. 1982.

1851	   [RFC1075] Waitzman, D., C. Partridge, S. Deering, "Distance Vector
1852	             Multicast Routing Protocol," RFC 1075, Nov. 1988.

1854	   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
1855	             Communication Layers," RFC 1122 / STD 3, October 1989.

1857	   [RFC1191] Mogul, J., S. Deering, "Path MTU discovery," RFC 1191,
1858	             November 1990.

1860	   [RFC1812] Baker, F., "Requirements for IP Version 4 Routers," RFC
1861	             1812, June 1995.

1863	   [RFC2003] Perkins, C., "IP Encapsulation within IP," RFC 2003,
1864	             October 1996.

1866	   [RFC2225] Laubach, M., J. Halpern, "Classical IP and ARP over ATM,"
1867	             RFC 2225, Apr. 1998.

1869	   [RFC2460] Deering, S., R. Hinden, "Internet Protocol, Version 6
1870	             (IPv6) Specification," RFC 2460, Dec. 1998.

1875	   [RFC2784] Farinacci, D., T. Li, S. Hanks, D. Meyer, P. Traina,
1876	             "Generic Routing Encapsulation (GRE)", RFC 2784, March
1877	             2000.

1879	   [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery," RFC
1880	             2923, September 2000.

1882	   [RFC2473] Conta, A., S. Deering, "Generic Packet Tunneling in IPv6
1883	             Specification," RFC 2473, Dec. 1998.

1885	   [RFC3031] Rosen, E., A. Viswanathan, R. Callon, "Multiprotocol Label
1886	             Switching Architecture", RFC 3031, January 2001.

1888	   [RFC3819] Karn, P., Ed., C. Bormann, G. Fairhurst, D. Grossman, R.
1889	             Ludwig, J. Mahdavi, G. Montenegro, J. Touch, L. Wood,
1890	             "Advice for Internet Subnetwork Designers," RFC 3819 / BCP
1891	             89, July 2004.

1893	   [RFC3884] Touch, J., L. Eggert, Y. Wang, "Use of IPsec Transport Mode
1894	             for Dynamic Routing," RFC 3884, September 2004.

1896	   [RFC3931] Lau, J., Ed., M. Townsley, Ed., I. Goyret, Ed., "Layer Two
1897	             Tunneling Protocol - Version 3 (L2TPv3)," RFC 3931, March
1898	             2005.

1900	   [RFC3985] Bryant, S., P. Pate (Eds.), "Pseudo Wire Emulation Edge-to-
1901	             Edge (PWE3) Architecture", RFC 3985, March 2005.

1903	   [RFC4176] El Mghazli, Y., Ed., T. Nadeau, M. Boucadair, K. Chan, A.
1904	             Gonguet, "Framework for Layer 3 Virtual Private Networks
1905	             (L3VPN) Operations and Management," RFC 4176, October 2005.

1907	   [RFC4301] Kent, S., K. Seo, "Security Architecture for the
1908	             Internet Protocol," RFC 4301, December 2005.

1910	   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
1911	             Network Tunneling," RFC 4459, April 2006.

1913	   [RFC4664] Andersson, L., Ed., E. Rosen, Ed., "Framework for Layer 2
1914	             Virtual Private Networks (L2VPNs)," RFC 4664, September
1915	             2006.

1917	   [RFC4821] Mathis, M., J. Heffner, "Packetization Layer Path MTU
1918	             Discovery," RFC 4821, March 2007.

1920	   [RFC4861] Narten, T., E. Nordmark, W. Simpson, H. Soliman, "Neighbor
1921	             Discovery for IP version 6 (IPv6)," RFC 4861, Sept. 2007.

1923	   [RFC4963] Heffner, J., M. Mathis, B. Chandler, "IPv4 Reassembly
1924	             Errors at High Data Rates," RFC 4963, July 2007.

1926	   [RFC5320] Templin, F., Ed., "The Subnetwork Encapsulation and
1927	             Adaptation Layer (SEAL)," RFC 5320, Feb. 2010.

1929	   [RFC5405] Eggert, L., G. Fairhurst, "Unicast UDP Usage Guidelines for
1930	             Application Designers," RFC 5405, Nov. 2008.

1932	   [RFC5556] Touch, J., R. Perlman, "Transparently Interconnecting Lots
1933	             of Links (TRILL): Problem and Applicability Statement," RFC
1934	             5556, May 2009.

1936	   [RFC5944] Perkins, C., Ed., "IP Mobility Support for IPv4, Revised,"
1937	             RFC 5944, Nov. 2010.

1939	   [RFC6040] Briscoe, B., "Tunneling of Explicit Congestion
1940	             Notification," RFC 6040, Nov. 2010.

1942	   [RFC6169] Krishnan, S., D. Thaler, J. Hoagland, "Security Concerns
1943	             With IP Tunneling," RFC 6169, Apr. 2011.

1945	   [RFC6325] Perlman, R., D. Eastlake, D. Dutt, S. Gai, A. Ghanwani,
1946	             "Routing Bridges (RBridges): Base Protocol Specification,"
1947	             RFC 6325, July 2011.

1949	   [RFC6807] Farinacci, D., G. Shepherd, S. Venaas, Y. Cai, "Population
1950	             Count Extensions to Protocol Independent Multicast (PIM),"
1951	             RFC 6807, Dec. 2012.

1953	   [RFC6830] Farinacci, D., V. Fuller, D. Meyer, D. Lewis, "The
1954	             Locator/ID Separation Protocol," RFC 6830, Jan. 2013.

1956	   [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field,"
1957	             Proposed Standard, RFC 6864, Feb. 2013.

1959	   [RFC6935] Eubanks, M., P. Chimento, M. Westerlund, "IPv6 and UDP
1960	             Checksums for Tunneled Packets," RFC 6935, Apr. 2013.

1962	   [RFC6936] Fairhurst, G., M. Westerlund, "Applicability Statement for
1963	             the Use of IPv6 UDP Datagrams with Zero Checksums," RFC
1964	             6936, Apr. 2013.

1966	   [RFC7364] Narten, T., E. Gray, D. Black, L. Fang, L. Kreeger, M.
1967	             Napierala, "Problem Statement: Overlays for Network
1968	             Virtualization," RFC 7364, Oct. 2014.

1970	   [RFC7450] Bumgardner, G., "Automatic Multicast Tunneling," RFC 7450,
1971	             Feb. 2015.

1973	   [RFC7588] Bonica, R., C. Pignataro, J. Touch, "A Widely-Deployed
1974	             Solution to the Generic Routing Encapsulation Fragmentation
1975	             Problem," RFC 7588, July 2015.

1977	   [RFC7676] Pignataro, C., R. Bonica, S. Krishnan, "IPv6 Support for
1978	             Generic Routing Encapsulation (GRE)," RFC 7676, Oct. 2015.

1980	   [Sa84]    Saltzer, J., D. Reed, D. Clark, "End-to-end arguments in
1981	             system design," ACM Trans. on Computer Systems, Nov. 1984.

1983	   [Te16]    Templin, F., "Asymmetric Extended Route Optimization,"
1984	             draft-templin-aerolink-67, Jun. 2016.

1986	   [To01]    Touch, J., "Dynamic Internet Overlay Deployment and
1987	             Management Using the X-Bone," Computer Networks, July 2001,
1988	             pp. 117-135.

1990	   [To03]    Touch, J., Y. Wang, L. Eggert, G. Finn, "Virtual Internet
1991	             Architecture," USC/ISI Tech. Report 570, Aug. 2003.

1993	   [To16]    Touch, J., "Middlebox Models Compatible with the
1994	             Internet," USC/ISI Tech. Report <TBD>, July 2016.

1996	   [To98]    Touch, J., S. Hotz, "The X-Bone," Proc. Globecom Third
1997	             Global Internet Mini-Conference, Nov. 1998.

1999	   [Zi80]    Zimmermann, H., "OSI Reference Model - The ISO Model of
2000	             Architecture for Open Systems Interconnection," IEEE Trans.
2001	             on Comm., Apr. 1980.

2003	9. Acknowledgments

2005	   This document originated as the result of numerous discussions among
2006	   the authors, Jari Arkko, Stuart Bryant, Lars Eggert, Ted Faber, Gorry
2007	   Fairhurst, Dino Farinacci, Matt Mathis, and Fred Templin. It
2008	   benefitted substantially from detailed feedback from Toerless Eckert,
2009	   Vincent Roca, and Lucy Yong, as well as other members of the Internet
2010	   Area Working Group.

2012	   This document was prepared using 2-Word-v2.0.template.dot.

2014	Authors' Addresses

2016	   Joe Touch
2017	   USC/ISI
2018	   4676 Admiralty Way
2019	   Marina del Rey, CA 90292-6695
2020	   U.S.A.

2022	   Phone: +1 (310) 448-9151
2023	   Email: touch@isi.edu

2025	   W. Mark Townsley
2026	   Cisco
2027	   L'Atlantis, 11, Rue Camille Desmoulins
2028	   Issy Les Moulineaux, ILE DE FRANCE 92782

2030	   Email: townsley@cisco.com

2032	APPENDIX A: Fragmentation efficiency

2034	A.1. Selecting fragment sizes

2036	   There are different ways to fragment a packet. Consider a network
2037	   with an MTU as shown in Figure 13, where packets are encapsulated
2038	   over the same network layer as they arrive on (e.g., IP in IP). If a
2039	   packet as large as the MTU arrives, it must be fragmented to
2040	   accommodate the additional header.

2042	                 X===========================X (MTU)
2043	                 +----+----------------------+
2044	                 | iH | DDDDDDDDDDDDDDDDDDDD |
2045	                 +----+----------------------+
2046	                   |
2047	                   |  X===========================X (MTU)
2048	                   |  +---+----+------------------+
2049	               (a) +->| H'| iH | DDDDDDDDDDDDDDDD |
2050	                   |  +---+----+------------------+
2051	                   |      |
2052	                   |      |  X===========================X (MTU)
2053	                   |      |  +----+---+----+-------------+
2054	                   | (a1) +->| nH'| H'| iH | DDDDDDDDDDD |
2055	                   |      |  +----+---+----+-------------+
2056	                   |      |
2057	                   |      |  +----+-------+
2058	                   | (a2) +->| nH"| DDDDD |
2059	                   |         +----+-------+
2060	                   |
2061	                   |  +---+------+
2062	               (b) +->| H"| DDDD |
2063	                      +---+------+
2064	                          |
2065	                          |  +----+---+------+
2066	                     (b1) +->| nH'| H"| DDDD |
2067	                             +----+---+------+

2069	                   Figure 13 Fragmenting via maximum fit

2071	   Figure 13 shows this process, using Outer Fragmentation as an example
2072	   (the situation is the same for Inner Fragmentation, but the headers
2073	   that are affected differ). The arriving packet is first split into
2074	   (a) and (b), where (a) is as large as the MTU of the network.
2075	   However, this tunnel then traverses another tunnel, whose impact the
2076	   first tunnel ingress has not accommodated. The packet (a) arrives at
2077	   the second tunnel ingress and needs to be encapsulated again, but
2078	   because it is already at the MTU, it must also be fragmented, into
2079	   (a1) and (a2). Packet (b), in contrast, arrives at the second tunnel
2080	   ingress and is encapsulated into (b1) without fragmentation, because
2081	   it is already below the MTU size.

2083	   In Figure 14, the fragmentation is done evenly, i.e., by splitting
2084	   the original packet into two roughly equal-sized components, (c) and
2085	   (d). Note that (d) contains more packet data, because (c) also
2086	   carries the original packet header; this is an example of Outer
2087	   Fragmentation. The packets (c) and (d) arrive at the second tunnel
2088	   ingress and are encapsulated again; this time, neither packet
2089	   exceeds the MTU, and neither requires further fragmentation.

2091	                 X===========================X (MTU)
2092	                 +----+----------------------+
2093	                 | iH | DDDDDDDDDDDDDDDDDDDD |
2094	                 +----+----------------------+
2095	                   |
2096	                   |  X===========================X (MTU)
2097	                   |  +---+----+----------+
2098	               (c) +->| H'| iH | DDDDDDDD |
2099	                   |  +---+----+----------+
2100	                   |      |
2101	                   |      |  X===========================X (MTU)
2102	                   |      |  +----+---+----+----------+
2103	                   | (c1) +->| nH | H'| iH | DDDDDDDD |
2104	                   |         +----+---+----+----------+
2105	                   |
2106	                   |  +---+--------------+
2107	               (d) +->| H"| DDDDDDDDDDDD |
2108	                      +---+--------------+
2109	                          |
2110	                          |  +----+---+--------------+
2111	                     (d1) +->| nH | H"| DDDDDDDDDDDD |
2112	                             +----+---+--------------+

2114	                       Figure 14 Fragmenting evenly
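
   The difference between Figures 13 and 14 can be checked with a short
   calculation. The following Python sketch is illustrative only and is
   not taken from any tunnel specification. It assumes a 1500-byte MTU,
   a 20-byte header at every layer (as for IPv4 in IPv4 without
   options), Outer Fragmentation at both tunnel ingresses, and the IPv4
   rule that every fragment except the last carries a multiple of 8
   bytes. The names max_fit, even, and tunnel are hypothetical.

```python
FRAG_UNIT = 8  # IPv4 fragment offsets are expressed in 8-byte units


def _room(mtu, hdr):
    """Largest fragment payload that fits under the MTU, 8-byte aligned."""
    return (mtu - hdr) // FRAG_UNIT * FRAG_UNIT


def max_fit(payload, mtu, hdr):
    """Fill every fragment except the last up to the MTU (Figure 13)."""
    room = _room(mtu, hdr)
    sizes = []
    while payload > room:
        sizes.append(room)
        payload -= room
    sizes.append(payload)
    return sizes


def even(payload, mtu, hdr):
    """Split into the same number of fragments, roughly equal (Figure 14)."""
    count = -(-payload // _room(mtu, hdr))   # ceiling division
    per_fragment = -(-payload // count)      # ceiling division
    share = -(-per_fragment // FRAG_UNIT) * FRAG_UNIT
    sizes = []
    while payload > share:
        sizes.append(share)
        payload -= share
    sizes.append(payload)
    return sizes


def tunnel(packets, mtu, hdr, split):
    """Encapsulate each packet; fragment the outer packet when needed."""
    out = []
    for size in packets:
        if size + hdr <= mtu:
            out.append(size + hdr)
        else:
            # Every fragment carries its own copy of the outer header.
            out.extend(hdr + part for part in split(size, mtu, hdr))
    return out


MTU, HDR = 1500, 20  # assumed values, for illustration only
arriving = [MTU]     # one packet that is already as large as the MTU

after_max_fit = tunnel(tunnel(arriving, MTU, HDR, max_fit), MTU, HDR, max_fit)
after_even = tunnel(tunnel(arriving, MTU, HDR, even), MTU, HDR, even)
print(after_max_fit)  # [1500, 40, 60]
print(after_even)     # [792, 788]
```

   Under these assumptions, maximum fit turns one arriving packet into
   three packets after the second tunnel, while even splitting yields
   two, matching the packet counts in Figures 13 and 14.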

2116	A.2. Packing

2118	   Encapsulating individual packets to traverse a tunnel can be
2119	   inefficient, especially where headers are large relative to the
2120	   packets being carried. In that case, it can be more efficient to
2121	   encapsulate many small packets in a single, larger tunnel payload.
2122	   This technique, similar to the effect of packet bursting in Gigabit
2123	   Ethernet (regardless of whether the packets are delimited using L2
2124	   symbols), reduces the overhead of the encapsulation headers
2125	   (Figure 15). It reduces the work of header addition and removal at
2126	   the tunnel endpoints, but increases other work involving the packing
2127	   and unpacking of the component packets carried.

2129	                     +-----+-----+
2130	                     | iHa | iDa |
2131	                     +-----+-----+
2132	                           |
2133	                           |     +-----+-----+
2134	                           |     | iHb | iDb |
2135	                           |     +-----+-----+
2136	                           |           |
2137	                           |           |     +-----+-----+
2138	                           |           |     | iHc | iDc |
2139	                           |           |     +-----+-----+
2140	                           |           |           |
2141	                           v           v           v
2142	                +----+-----+-----+-----+-----+-----+-----+
2143	                | oH | iHa | iDa | iHb | iDb | iHc | iDc |
2144	                +----+-----+-----+-----+-----+-----+-----+

2146	                  Figure 15 Packing packets into a tunnel
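
   A rough calculation shows the header savings that Figure 15
   illustrates. The following Python sketch is illustrative only. It
   assumes a 20-byte outer header and a 1500-byte MTU, assumes that
   inner packets are self-delimiting (e.g., through a length field in
   their own headers) so that no per-packet framing is added, and packs
   packets in arrival order without reordering or fragmenting them. The
   names pack and wire_bytes are hypothetical.

```python
def pack(packets, mtu, outer_hdr):
    """Group whole inner packets, in arrival order, into tunnel payloads."""
    room = mtu - outer_hdr
    batches, current, used = [], [], 0
    for size in packets:
        if size > room:
            raise ValueError("inner packet needs fragmentation, not packing")
        if used + size > room:
            # The current payload is full; start a new tunnel packet.
            batches.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


def wire_bytes(batches, outer_hdr):
    """Total bytes sent: one outer header per tunnel packet plus contents."""
    return sum(outer_hdr + sum(batch) for batch in batches)


MTU, OUTER_HDR = 1500, 20  # assumed values, for illustration only
small = [60] * 10          # ten 60-byte inner packets (header + data)

unpacked = wire_bytes([[size] for size in small], OUTER_HDR)
packed = wire_bytes(pack(small, MTU, OUTER_HDR), OUTER_HDR)
print(unpacked, packed)    # 800 620
```

   In this example, packing replaces ten outer headers with one, at the
   cost of the grouping work at the ingress and the corresponding
   unpacking work at the egress.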

2148	   [NOTE: PPP chopping and coalescing?]