idnits 2.17.1 

draft-ietf-intarea-tunnels-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 19, 2016) is 3020 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-01

  == Outdated reference: A later version (-05) exists of
     draft-ietf-nvo3-gue-02

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 5405
     (Obsoleted by RFC 8085)

  -- Obsolete informational reference (is this intentional?): RFC 6830
     (Obsoleted by RFC 9300, RFC 9301)

  == Outdated reference: A later version (-82) exists of
     draft-templin-aerolink-64


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1	Internet Area WG                                               J. Touch
2	Internet Draft                                                  USC/ISI
3	Intended status: Informational                              M. Townsley
4	Expires: July 2016                                                Cisco
5	                                                       January 19, 2016

7	                  IP Tunnels in the Internet Architecture
8	                     draft-ietf-intarea-tunnels-02.txt

10	Status of this Memo

12	   This Internet-Draft is submitted in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   This document may contain material from IETF Documents or IETF
16	   Contributions published or made publicly available before November
17	   10, 2008. The person(s) controlling the copyright in some of this
18	   material may not have granted the IETF Trust the right to allow
19	   modifications of such material outside the IETF Standards Process.
20	   Without obtaining an adequate license from the person(s) controlling
21	   the copyright in such materials, this document may not be modified
22	   outside the IETF Standards Process, and derivative works of it may
23	   not be created outside the IETF Standards Process, except to format
24	   it for publication as an RFC or to translate it into languages other
25	   than English.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF), its areas, and its working groups.  Note that
29	   other groups may also distribute working documents as Internet-
30	   Drafts.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   The list of current Internet-Drafts can be accessed at
38	   http://www.ietf.org/ietf/1id-abstracts.txt

40	   The list of Internet-Draft Shadow Directories can be accessed at
41	   http://www.ietf.org/shadow.html

43	   This Internet-Draft will expire on July 19, 2016.

45	Copyright Notice

47	   Copyright (c) 2016 IETF Trust and the persons identified as the
48	   document authors. All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document. Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document. Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Abstract

62	   This document discusses the role of IP tunnels in the Internet
63	   architecture. It explains their relationship to existing protocol
64	   layers and the challenges in supporting IP tunneling based on the
65	   equivalence of tunnels to links.

67	Table of Contents

69	   1. Introduction
70	   2. Conventions used in this document
71	      2.1. Key Words
72	      2.2. Terminology
73	   3. The Tunnel Model
74	      3.1. What is a tunnel?
75	      3.2. View from the Outside
76	      3.3. View from the Inside
77	      3.4. Location of the Ingress and Egress
78	      3.5. Implications of This Model
79	   4. IP Tunnel Requirements
80	      4.1. Fragmentation
81	      4.2. MTU discovery
82	      4.3. IP ID exhaustion
83	      4.4. Hop Count
84	      4.5. Signaling
85	      4.6. Relationship of Header Fields
86	      4.7. Congestion
87	      4.8. Checksums
88	      4.9. Numbering
89	      4.10. Multicast
90	      4.11. NAT / Load Balancing
91	      4.12. Recursive tunnels
92	   5. Observations (implications)
93	      5.1. Tunnel protocol designers
94	      5.2. Tunnel implementers
95	      5.3. Tunnel operators
96	      5.4. For existing standards
97	         5.4.1. Generic UDP Encapsulation (GUE - IP in UDP in IP)
98	         5.4.2. Generic Packet Tunneling in IPv6
99	         5.4.3. Geneve (NVO3)
100	         5.4.4. GRE (IP in GRE in IP)
101	         5.4.5. IP in IP / mobile IP
102	         5.4.6. IPsec tunnel mode (IP in IPsec in IP)
103	         5.4.7. L2TP
104	         5.4.8. L2VPN
105	         5.4.9. L3VPN
106	         5.4.10. LISP
107	         5.4.11. MPLS
108	         5.4.12. PWE
109	         5.4.13. SEAL/AERO
110	         5.4.14. TRILL
111	      5.5. For future standards
112	   6. Security Considerations
113	   7. IANA Considerations
114	   8. References
115	      8.1. Normative References
116	      8.2. Informative References
117	   9. Acknowledgments
118	   Appendix A. Fragmentation
119	      A.1. Outer Fragmentation
120	      A.2. Inner Fragmentation
121	   APPENDIX B: Fragmentation efficiency
122	      B.1. Selecting fragment sizes
123	      B.2. Packing

125	1. Introduction

127	   The Internet is loosely based on the ISO seven layer stack, in which
128	   data units traverse the stack by being wrapped inside data units one
129	   layer down. A tunnel is a mechanism for transmitting data units
130	   between endpoints by wrapping them as data units of the same or
131	   higher layers, e.g., IP in IP (Figure 1) or IP in UDP (Figure 2).

133	                        +----+----+--------------+
134	                        | IP'| IP |     Data     |
135	                        +----+----+--------------+

137	                           Figure 1 IP inside IP

139	                     +----+-----+----+--------------+
140	                     | IP'| UDP | IP |     Data     |
141	                     +----+-----+----+--------------+

143	                         Figure 2 IP in UDP in IP

145	   This document focuses on tunnels that transit IP packets, i.e., in
146	   which an IP packet is the payload of another protocol. Tunnels
147	   provide a virtual link that can help decouple the network topology
148	   seen by transiting packets from the underlying physical network
149	   [To98][RFC2473]. For example, tunnels were critical in the
150	   development of multicast because not all routers were capable of
151	   processing multicast packets [Er94]. Tunnels allowed multicast
152	   packets to transit between multicast-capable routers over paths that
153	   did not support multicast. Similar techniques have been used to
154	   support other protocols, such as IPv6 [RFC2460].

156	   Use of tunnels is common in the Internet. The word "tunnel" occurs in
157	   over 100 RFCs, and tunnels are used by numerous protocols, including:

159	   o  Generic UDP Encapsulation (GUE) - IP in UDP (in IP)[He15a][He15b]

161	   o  Generic IPv6 tunneling [RFC2473]

163	   o  Generic Routing Encapsulation (GRE) - an encapsulation framework
164	      allowing different messages to tunnel over a variety of tunnels,
165	      e.g., IP in GRE in IP [RFC2473][RFC2784][RFC7588][RFC7676]

167	   o  IP in IP / mobile IP [RFC2003][RFC2473][RFC5944]

169	   o  IPsec - hides the original traffic destination [RFC4301]

171	   o  L2TP - Tunnels PPP over IP, used largely in DSL/FTTH access
172	      networks to extend a subscriber's connection from an access line
173	      provider to an ISP [RFC3931]

175	   o  L2VPNs - provides a link topology different from that provided by
176	      physical links [RFC4664]

178	   o  L3VPNs - provides a network topology different from that provided
179	      by ISPs [RFC4176]

181	   o  LISP - reduces routing table load within an enclave of routers
182	      [RFC6830]

184	   o  MPLS - tunnels IP over a circuit-like path in which identifiers
185	      are rewritten on each hop, often used for traffic provisioning
186	      [RFC3031]

188	   o  NVO3 - tunnels for data center network sharing (which includes use
189	      of GUE, above) [RFC7364]

191	   o  PWE3 - tunnels to emulate wire-like services over packet-switched
192	      services [RFC3985]

194	   o  SEAL/AERO - a generic mechanism for IP in IP tunneling designed to
195	      overcome the limitations of RFC2003 [RFC5320][Te16]

197	   o  TRILL - enables L3 routing (typically IS-IS) in an enclave of
198	      Ethernet bridges [RFC5556][RFC6325]

200	   The variety of tunnel mechanisms raises the question of the role of
201	   tunnels in the Internet architecture and the potential need for these
202	   mechanisms to have similar and predictable behavior. In particular,
203	   the ways in which packet sizes (i.e., Maximum Transmission Unit or
204	   MTU) mismatch and error signals (e.g., ICMP) are handled may benefit
205	   from a coordinated approach.

207	   It is useful to note that, regardless of the layer in which
208	   encapsulation occurs, tunnels emulate a link. As links, they are
209	   subject to link issues, e.g., MTU discovery, signaling, and the
210	   potential utility of native support for broadcast and multicast
211	   [RFC2460][RFC3819]. They have advantages over native links, being
212	   potentially easier to reconfigure and control.

214	   The remainder of this document describes the general principles of IP
215	   tunneling and discusses the key considerations in the design of a
216	   protocol that tunnels IP datagrams. It derives its conclusions from
217	   the equivalence of tunnels and links. Note that all considerations
218	   are in the context of existing standards and requirements.

220	2. Conventions used in this document

222	2.1. Key Words

224	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
225	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
226	   document are to be interpreted as described in RFC 2119 [RFC2119].

228	2.2. Terminology

230	   This document uses the following terminology. These definitions are
231	   given in the most general terms, but will be used primarily to
232	   discuss IP tunnels in this document. They are presented in order from
233	   most fundamental to those derived from earlier definitions:

235	   o  Messages: variable length data labeled with globally-unique
236	      endpoint IDs [RFC791]

238	   o  Endpoint: a network device that sources or sinks messages labeled
239	      from/to its IDs, also known as a host [RFC1122].

241	   o  Forwarder: a network device that relays IP messages using longest-
242	      prefix match of destination IDs and local context, when possible,
243	      also known as a gateway or router [RFC1812].

245	   o  Network node (node): an endpoint or forwarder. For Internet
246	      messages (IP datagrams), these are hosts or gateways/routers,
247	      respectively.

249	   o  Source: the origin host of a message.

251	   o  Destination: the receiving host of a message.

253	   o  Link: a communication device that transfers messages between
254	      network devices, i.e., by which a message can traverse between
255	      devices without being processed by a forwarder. Note that the
256	      notion of forwarder is relative to the layer at which message
257	      processing is considered [RFC1122][RFC1812].

259	   o  Path: a communications path by which a message can traverse
260	      between network nodes, which may or may not involve being
261	      processed by a forwarding node.

263	   o  Tunnel: a protocol mechanism that transits messages using
264	      encapsulation to allow a path to appear as a link. Note that a
265	      protocol can be used to tunnel itself (IP over IP) and that this
266	      includes the conventional layering of the ISO stack (i.e., by this
267	      definition, Ethernet is a tunnel for IP).

269	   o  Ingress: a network node that receives messages, encapsulates them
270	      according to the tunnel protocol, and transmits them into the
271	      tunnel. Note that the ingress and source can be co-located.

273	   o  Egress: a network node that receives messages that have finished
274	      transiting a tunnel. The egress decapsulates datagrams for further
275	      transit to the destination. Note that the egress and destination
276	      can be co-located.

278	   o  Tunnel transit packet: the packet arriving at a node connected to
279	      a tunnel that enters the ingress and exits the egress, i.e., the
280	      packet carried over the tunnel. This is sometimes known as the
281	      "tunneled packet".

283	   o  Tunnel link packet: a packet that traverses from ingress to egress
284	      and contains all or part of a tunnel transit packet. This is
285	      sometimes known as the "tunnel packet", i.e., the packet of the
286	      tunnel itself.

288	   o  Link MTU (LMTU): the largest message that can transit a link. Note
289	      that this need not be the native size of messages on the link.

291	   o  Reassembly MTU (RMTU): the largest message that can be reassembled
292	      by a receiver; it is not directly related to the link or path MTU.
293	      Sometimes also referred to as "receiver MTU".

295	   o  Path MTU (PMTU): the largest message that can transit a path.
296	      Typically, this is the minimum of the link MTUs of the links of
297	      the path.

299	   o  Tunnel MTU (TMTU): the largest message that can transit a tunnel.
300	      Typically, this is limited by the egress reassembly MTU.
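
   The path MTU definition above can be illustrated with a short
   Python sketch. The function name path_mtu and the sample link MTU
   values are assumptions made for this example only; they are not
   part of this document's terminology.

```python
def path_mtu(link_mtus):
    """Path MTU: the minimum of the link MTUs of the links of the path."""
    if not link_mtus:
        raise ValueError("a path needs at least one link")
    return min(link_mtus)


# Hypothetical three-link path; the smallest link MTU limits the path.
print(path_mtu([1500, 1280, 9000]))
```

   The tunnel MTU is deliberately not computed this way: as defined
   above, it is typically limited by the egress reassembly MTU rather
   than by the links inside the tunnel.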

302	3. The Tunnel Model

304	   A network architecture is an abstract description of a distributed
305	   communications system, its components and their relationships, the
306	   requisite properties of those components and the emergent properties
307	   of the system that result [To03]. Such descriptions can help explain
308	   behavior, as when the OSI seven-layer model is used as a teaching
309	   example [Zi80]. Architectures describe capabilities - and, just as
310	   importantly, constraints.

312	   A network can be defined as a system of endpoints and relays
313	   interconnected by communication paths, abstracting away issues of
314	   naming in order to focus on message forwarding. To the extent that
315	   the Internet has a single, coherent interpretation, its architecture
316	   is defined by its core protocols (IP [RFC791], TCP [RFC793], UDP
317	   [RFC768]) and messages, hosts, routers, and links [Cl88][To03], as
318	   shown in Figure 3:

320	               +------+    ------      ------    +------+
321	               |      |   /      \    /      \   |      |
322	               | HOST |--+ ROUTER +--+ ROUTER +--| HOST |
323	               |      |   \      /    \      /   |      |
324	               +------+    ------      ------    +------+

326	                   Figure 3 Basic Internet architecture

328	   As a network architecture, the Internet is a system of hosts and
329	   routers interconnected by links that exchange messages when possible.
330	   "When possible" defines the Internet's "best effort" principle. The
331	   limited role of routers and links represents the End-to-End Principle
332	   [Sa84] and longest-prefix match enables hierarchical forwarding.

334	   Although the definitions of host, router, and link seem absolute,
335	   they are often relative as viewed within the context of one OSI
336	   layer, each of which can be considered a distinct network
337	   architecture. An Internet gateway is a Layer 3 router when it
338	   transits IP datagrams but it acts as a Layer 2 host as it sources or
339	   sinks Layer 2 messages on attached links to accomplish this transit
340	   capability. In this way, a single device (Internet gateway) behaves
341	   as different components (router, host) at different layers.

343	   Even though a single device may have multiple roles - even
344	   concurrently - at a given layer, each role is typically static and
345	   location-independent. An Internet gateway always acts as a Layer 2
346	   host and that behavior does not depend on where the gateway is viewed
347	   from within Layer 2. In the context of a single layer, a device's
348	   behavior is modeled as a single component from all viewpoints in that
349	   layer.

351	3.1. What is a tunnel?

353	   A tunnel can be modeled as a link in another network
354	   [To98][To01][To03]. In Figure 4, a source host (Hsrc) and destination
355	   host (Hdst) communicate over a network M in which two routers (Ra
356	   and Rd) are connected by a tunnel.

358	                 --                                  --
359	     +------+   /  \                                /  \   +------+
360	     | Hsrc |--+ Ra +----      --      --      ----+ Rd +--| Hdst |
361	     +------+   \  /    /\    /  \    /  \    /\    \  /   +------+
362	                 --    /I \--+ Rb +--+ Rc +--/E \    --
363	                       \  /   \  /    \  /   \  /
364	                        \/     --      --     \/
365	                       <------ Network N ------->
366	     <------------------------ Network M ------------------------->

368	                         Figure 4 The big picture

370	   The tunnel consists of two elements (ingress I, egress E) that lie
371	   along a path connected by a (possibly different) network N.
372	   Regardless of how the ingress and egress are connected, the tunnel
373	   serves as a link to the devices it connects (here, Ra and Rd).

375	   IP packets arriving at the ingress are encapsulated to traverse
376	   network N. We call these packets "tunnel transit packets" because
377	   they will now transit the tunnel inside one or more "tunnel link
378	   packets". Tunnel link packets use the source address of the ingress
379	   and the destination address of the egress - using whatever address is
380	   appropriate to the Layer at which the ingress and egress operate
381	   (Layer 2, Layer 3, Layer 4, etc.). The egress decapsulates those
382	   messages, which then continue on network M as if emerging from a
383	   link. To tunnel transit packets, and to the routers the tunnel
384	   connects (Ra and Rd), the tunnel acts as a link.

386	   The model of each component (ingress, egress) and the entire system
387	   (tunnel) depends on the layer from which you view the tunnel. From
388	   the perspective of the outermost hosts (Hsrc and Hdst), the tunnel
389	   appears as a link between two routers (Ra and Rd). For routers along
390	   the tunnel (e.g., Rb and Rc), the ingress and egress appear as the
391	   endpoint hosts and Hsrc and Hdst are invisible.

393	   When the tunnel network (N) is implemented using the same protocol as
394	   the endpoint network (M), the picture looks flatter (Figure 5), as if
395	   it were running over a single network. However, note that this
396	   appearance is incorrect - nothing has changed. From the perspective
397	   of the endpoints, Rb and Rc and network N don't exist and aren't
398	   visible, and from the perspective of the tunnel, network M doesn't
399	   exist. The fact that network N and M use the same protocol, and may
400	   traverse the same links is irrelevant.

402	                 --            --      --            --
403	     +------+   /  \    /\    /  \    /  \    /\    /  \   +------+
404	     | Hsrc |--+ Ra +--/I \--+ Rb +--+ Rc +--/E \--+ Rd +--| Hdst |
405	     +------+   \  /   \  /   \  /    \  /   \  /   \  /   +------+
406	                 --     \/     --      --     \/     --
407	                       <------ Network N ------->
408	     <------------------------ Network M ------------------------->

410	                     Figure 5 IP in IP network picture

412	3.2. View from the Outside

414	   From outside the tunnel, to network M, the entire tunnel acts as a
415	   link (Figure 6). It may be numbered or unnumbered and the addresses
416	   associated with the ingress and egress are irrelevant from outside.

418	                 --                                  --
419	     +------+   /  \                                /  \   +------+
420	     | Hsrc |--+ Ra +------------------------------+ Rd +--| Hdst |
421	     +------+   \  /                                \  /   +------+
422	                 --                                  --
423	     <------------------------ Network M ------------------------->

425	                Figure 6 Tunnels as viewed from the outside

427	   A tunnel is effectively invisible to the network in which it resides,
428	   except that it behaves exactly as a link. Consequently, the
429	   requirements for links supporting IP [RFC3819] also apply to tunnels.

431	   For example, the IP datagram hop counts (IPv4 Time-to-Live [RFC791]
432	   and IPv6 Hop Limit [RFC2460]) are decremented when traversing a
433	   router, not when traversing a link, and thus not when traversing a
434	   tunnel. Tunnels have a tunnel MTU, the largest datagram that can
435	   transit, just as links have a corresponding link MTU. A link MTU may
436	   not reflect the native link message sizes (e.g., ATM AAL5 48 byte
437	   messages support a 9KB MTU), and the same is true for a tunnel.
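
   The hop count behavior can be sketched in Python as follows. This
   is a toy model under stated assumptions: the Packet class, the
   route and tunnel functions, and the outer hop limit of 64 are
   hypothetical names and values chosen for illustration, and real
   routers perform additional checks when a hop count reaches zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    hop_limit: int
    payload: Optional["Packet"] = None


def route(packet: Packet) -> Packet:
    """A router decrements the hop count of the packet it forwards."""
    if packet.hop_limit < 1:
        raise ValueError("hop count exhausted")
    packet.hop_limit -= 1
    return packet


def tunnel(packet: Packet, routers_inside: int, outer_hop_limit: int = 64) -> Packet:
    """Carry a tunnel transit packet across a tunnel.

    Only the tunnel link packet (outer) is seen by routers inside the
    tunnel, so only its hop count changes; the transit packet emerges
    from the egress with its hop count untouched, as over any link.
    """
    outer = Packet(hop_limit=outer_hop_limit, payload=packet)  # ingress
    for _ in range(routers_inside):
        route(outer)
    return outer.payload  # egress decapsulates


# Path of Figure 4: Hsrc -> Ra -> (tunnel via Rb, Rc) -> Rd -> Hdst
p = Packet(hop_limit=5)
route(p)          # Ra
p = tunnel(p, 2)  # Rb and Rc only touch the outer packet
route(p)          # Rd
print(p.hop_limit)
```

   In this sketch the transit packet's hop count drops by two (once
   each at Ra and Rd), regardless of how many routers lie inside the
   tunnel.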

439	3.3. View from the Inside

441	   Within network N, i.e., from inside the tunnel itself, the ingress is
442	   a source of tunnel link packets and the egress is a sink - both are
443	   hosts on network N (Figure 7). Consequently, Internet host
444	   requirements [RFC1122] apply to ingress and egress nodes when
445	   network N uses IP (and thus the ingress/egress use IP encapsulation).

447	                               --      --
448	                        /\    /  \    /  \    /\
449	                       /I \--+ Rb +--+ Rc +--/E \
450	                       \  /   \  /    \  /   \  /
451	                        \/     --      --     \/
452	                       <------ Network N ------->

454	            Figure 7 Tunnels, as viewed from within the tunnel

456	   Viewed from within the tunnel, the outer network (M) doesn't exist.
457	   Tunnel link packets can be fragmented by the source (ingress) and
458	   reassembled at the destination (egress), just as at any endpoint. The
459	   path between ingress and egress may have a path MTU but the endpoints
460	   can exchange messages as large as can be reassembled at the
461	   destination (egress), i.e., an egress MTU. Information about the
462	   network, e.g., regarding MTU sizes and network reachability, is
463	   relayed from the destination (egress) and intermediate routers back
464	   to the source (ingress), without regard for the external network (M).

466	3.4. Location of the Ingress and Egress

468	   The ingress and egress are endpoints of the tunnel and the tunnel is
469	   a link. The ingress and egress are thus link endpoints at the network
470	   nodes the tunnel interconnects. Such link endpoints are typically
471	   described as "network interfaces".

473	   Tunnel interfaces may be physical or virtual. The interface may be
474	   implemented inside the node where the tunnel attaches, e.g., inside a
475	   host or router. The interface may also be implemented as a "bump in
476	   the wire" (BITW), somewhere along a link between the two nodes the
477	   link interconnects. IP in IP tunnels are often implemented as
478	   interfaces, whereas IPsec tunnels are sometimes implemented as BITW.
479	   These implementation variations determine only whether information
480	   available at the link endpoints (ingress/egress) can be easily shared
481	   with the connected network nodes.

483	3.5. Implications of This Model

485	   This approach highlights a few key features of a tunnel as a network
486	   architecture construct:

488	   o  To the tunnel transit packets, tunnels turn a network (Layer 3)
489	      path into a (Layer 2) link

491	   o  To devices the tunnel traverses, the tunnel ingress and egress act
492	      as hosts that source and sink tunnel link packets

494	   The consequences of these features are as follows:

496	   o  Like a link, a tunnel has an MTU defined by the reassembly MTU of
497	      the receiving interface (egress).

499	   o  Path MTU discovery in the network layer (i.e., outer network M)
500	      has no direct relation to the MTU of the hops within the link
501	      layer of the links (or thus tunnels) that connect its components.

503	   o  Hops remain defined as the number of routers encountered on a path
504	      or the time spent at a router [RFC1812]. Hop counts are not
505	      decremented solely by the transit of a link, e.g., a packet with a
506	      hop count of zero should successfully transit a link (and thus a
507	      tunnel) that connects two hosts.

509	   o  The addresses of a tunnel ingress and egress correspond to link
510	      layer addresses to the tunnel transit packet and outer network M.
511	      Many point-to-point tunnels are unnumbered in the network in which
512	      they reside (even though they must have addresses in the network
513	      they transit).

515	   o  Like network interfaces, the ingress and egress are never a direct
516	      source of ICMP messages but may provide information to their
517	      attached host or router to generate those ICMP messages.

519	   These observations make it much easier to determine what a tunnel
520	   must do to transit IP packets, notably it must satisfy all
521	   requirements expected of a link.

523	4. IP Tunnel Requirements

525	   The requirements of an IP tunnel are defined by the requirements of
526	   an IP link because both transit IP packets. A tunnel must transit the
527	   minimum IP MTU, i.e., 68B for IPv4 and 1280B for IPv6, and a tunnel
528	   must support address resolution when there is more than one egress.

530	   The requirements of the tunnel ingress and egress are defined by the
531	   network over which they exchange messages (tunnel link packets). For
532	   IP-over-IP, this means that the ingress MUST NOT violate the
533	   (fragment) Identification field uniqueness requirements [RFC6864].

535	   These requirements remain even though tunnels have some unique
536	   issues, including the need for additional space for encapsulation
537	   headers and the potential for tunnel path MTU variation.

539	4.1. Fragmentation

541	   As with any link layer, the MTU of a tunnel is defined as the
542	   receiving interface reassembly MTU, and must satisfy the requirements
543	   of the IP packets the tunnel transits.

545	   Note that many of the issues with tunnel fragmentation and MTU
546	   handling were discussed in [RFC4459], but that document described a
547	   variety of alternatives as if they were independent. This document
548	   explains the combined approach that is necessary.

550	   An IPv4 tunnel must transit 68 byte packets without further
551	   fragmentation [RFC791][RFC1122] and an IPv6 tunnel must transit 1280
552	   byte packets without further fragmentation [RFC2460]. The tunnel MTU
553	   interacts with routers or hosts it connects the same way as would a
554	   link MTU. In the following pseudocode, sizeof(TTP) is the size of the
555	   tunnel transit packet (TTP), TunMTU is the tunnel MTU, and egressRMTU
556	   is the reassembly MTU of the egress. As with any link, the link MTU
557	   is defined not by the path MTU inside the tunnel but by the egress
558	   reassembly MTU (egressRMTU). This is because the ICMP "packet too
559	   big" message indicates failure, not preference. There is no ICMP
560	   message for "larger than I'd like, but I can still transit it".

562	   These rules apply at the host/router where the tunnel is attached:

564	      if (sizeof(TTP) > linkMTU) then
565	         if (TTP can be fragmented, e.g., IPv4 DF=0) then
566	            split TTP into fragments of linkMTU size
567	            and send each fragment into the tunnel ingress
568	         else
569	            drop TTP and send ICMP "too big" to TTP source
570	         endif
571	      else
572	         send TTP into the tunnel "interface" (the ingress)
573	      endif

575	   These rules apply at the tunnel ingress:

577	      if (sizeof(TTP) <= TunnelPathMTU) then
578	         encapsulate TTP as received and emit
579	      else
580	         if (TunnelPathMTU < sizeof(TTP) <= egressRMTU) then
581	            fragment TTP into TunnelPathMTU chunks
582	            encapsulate and emit each chunk
583	         else
584	            {never happens; host/router already dropped by now}
585	         endif
586	      endif
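
   The two sets of rules above can be sketched in Python as follows.
   This is an illustrative sketch only, not part of any specification:
   the names Verdict, split, host_router, and ingress are hypothetical,
   sizes are payload bytes, and the split helper ignores per-fragment
   header overhead and the 8-byte fragment alignment that real IP
   fragmentation requires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    action: str       # "send", "fragment", or "drop_icmp_too_big"
    sizes: List[int]  # sizes handed to the next stage


def split(size: int, chunk: int) -> List[int]:
    """Split size bytes into pieces of at most chunk bytes (simplified)."""
    full, rest = divmod(size, chunk)
    return [chunk] * full + ([rest] if rest else [])


def host_router(ttp_size: int, link_mtu: int, can_fragment: bool) -> Verdict:
    """Rules at the host/router where the tunnel is attached."""
    if ttp_size > link_mtu:
        if can_fragment:  # e.g., IPv4 with DF=0
            return Verdict("fragment", split(ttp_size, link_mtu))
        return Verdict("drop_icmp_too_big", [])
    return Verdict("send", [ttp_size])


def ingress(ttp_size: int, tunnel_path_mtu: int, egress_rmtu: int) -> Verdict:
    """Rules at the tunnel ingress, before encapsulation headers are added."""
    if ttp_size <= tunnel_path_mtu:
        return Verdict("send", [ttp_size])
    if ttp_size <= egress_rmtu:
        return Verdict("fragment", split(ttp_size, tunnel_path_mtu))
    # The attached host/router should already have dropped this packet.
    raise ValueError("packet exceeds egress reassembly MTU")
```

   For example, with a tunnel path MTU of 1240 and an egress reassembly
   MTU of 1460, ingress(1400, 1240, 1460) yields two chunks of 1240 and
   160 bytes, each of which is then encapsulated separately.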

588	   For IPv4 or IPv6 over IPv6, the tunnel path MTU is a minimum of 1280
589	   minus the encapsulation header (40 bytes) with its options (TOptSz)
590	   and the egress reassembly MTU is 1500 minus the same amount:

592	      if (sizeof(TTP) <= (1240 - TOptSz)) then
593	         encapsulate TTP as received and emit
594	      else
595	         if ((1240 - TOptSz) < sizeof(TTP) <= (1460 - TOptSz)) then
596	            fragment TTP into (1240 - TOptSz) chunks
597	            encapsulate and emit each chunk
598	         else
599	            {never happens; host/router already dropped by now}
600	         endif
601	      endif

603	   This tunnel supports IPv6 transit only if TOptSz is no larger than
604	   180 bytes, and supports IPv4 transit if TOptSz is no larger than 884
605	   bytes. IPv6 tunnel transit packets of 1280 bytes may be guaranteed to
606	   transit the outer network (M) without needing fragmentation there but
607	   they may require ongoing fragmentation and reassembly if the tunnel
608	   MTU is not at least 1320 bytes.
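
   The option-size limits can be checked with a short calculation. The
   sketch below is illustrative; the constant and function names are
   hypothetical. It assumes a 40-byte IPv6 encapsulation header, the
   1280-byte minimum IPv6 path MTU, and the 1500-byte minimum IPv6
   reassembly size used above. The 884-byte figure is consistent with
   requiring 576-byte IPv4 transit packets; that reading is an
   inference from the arithmetic, not a statement in the text.

```python
IPV6_HEADER = 40             # base encapsulation header, bytes
MIN_IPV6_PATH_MTU = 1280     # smallest path MTU an IPv6 path supports
MIN_IPV6_EGRESS_RMTU = 1500  # smallest IPv6 reassembly size at the egress


def tunnel_limits(topt_sz: int):
    """Return (tunnel path MTU, egress reassembly MTU) seen by transit packets."""
    path = MIN_IPV6_PATH_MTU - IPV6_HEADER - topt_sz
    egress = MIN_IPV6_EGRESS_RMTU - IPV6_HEADER - topt_sz
    return path, egress


def max_option_size(required_transit: int) -> int:
    """Largest TOptSz for which packets of required_transit bytes still fit."""
    return MIN_IPV6_EGRESS_RMTU - IPV6_HEADER - required_transit
```

   Here tunnel_limits(0) gives (1240, 1460), max_option_size(1280) gives
   180, and max_option_size(576) gives 884.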

610	   When using IP directly over IP, the minimum egress reassembly MTU for
611	   IPv4 is 576 bytes and for IPv6 is 1500 bytes. This means that tunnels
612	   of IPv4-over-IPv4, IPv4-over-IPv6, and IPv6-over-IPv6 are possible
613	   without additional requirements, but this may involve ingress
614	   fragmentation and egress reassembly. IPv6 cannot be tunneled directly
615	   over IPv4 without additional requirements, notably that the egress
616	   reassembly MTU or the link path MTU are at least 1280 bytes.
617	   Fragmentation and reassembly cannot be avoided for IPv6-over-IPv6
618	   without similar requirements.

620	   When ongoing ingress fragmentation and egress reassembly would be
621	   prohibitive or costly, larger MTUs can be supported by design and
622	   confirmed either out-of-band (by design) or in-band (e.g., using
623	   PLMTUD [RFC4821], as done in SEAL [RFC5320] and AERO [Te16]).
624	   Alternately, an ingress can encapsulate packets that fit and shut
625	   down once fragmentation is needed, but it must not continue to
626	   forward smaller packets while dropping larger packets that are still
627	   within required limits.

629	4.2. MTU discovery

631	   MTU discovery enables a network path to support a larger path MTU and
632	   egress MTU than it can assume from the protocol over which it
633	   operates. There are two ways in which MTU discovery interacts with
634	   tunnels: the MTU of the path over the tunnel and the MTU of the
635	   tunnel itself.

637	   A tunnel has two different MTU values: the largest payload that can
638	   traverse from ingress to egress without further fragmentation (the
639	   tunnel path MTU) and the largest payload that can traverse from
640	   ingress to egress. The latter is defined by the egress reassembly
641	   MTU, not the tunnel path MTU, and is the tunnel MTU.

643	   The path MTU over the tunnel is limited by the tunnel MTU (the egress
644	   reassembly MTU) but not the tunnel path MTU. There is temptation to
645	   optimize tunnel traversal so that packets are not fragmented between
646	   ingress and egress, i.e., to tune the network path MTU to the tunnel
647	   link MTU. This is hazardous for many reasons:

649	   o  The tunnel is capable of transiting packets as large as the egress
650	      reassembly MTU, which is always at least as large as the tunnel
651	      path MTU and typically is larger.

653	   o  ICMP has only one type of error message regarding large packets -
654	      "too big", i.e., too large to transit. There is no optimization
655	      message of "bigger than I'd like, but I can deal with if needed".

657	   o  IP tunnels often involve some level of recursion, i.e.,
658	      encapsulation over itself [RFC4459].

660	   Recursive tunneling occurs whenever a protocol ends up encapsulated
661	   in itself. This happens directly, as when IPv4 is encapsulated in
662	   IPv4, or indirectly, as when IP is encapsulated in UDP which then is
663	   a payload inside IP. It can involve many layers of encapsulation
664	   because a tunnel provider isn't always aware of whether the packets
665	   it transits are already tunneled.

667	   Recursion is impossible when the tunnel transit packets are limited
668	   to the native size of the tunnel path MTU. Arriving tunnel
669	   transit packets have a minimum supported size (1280 for IPv6) and the
670	   tunnel path MTU has the same size; there would be no room for the
671	   additional encapsulation headers. The result would be an IPv6 tunnel
672	   that cannot satisfy IPv6 transit requirements.

674	   It is more appropriate to require the tunnel to satisfy IP transit
675	   requirements and enforce that requirement at design time or during
676	   operation (the latter using PLMTUD [RFC4821]). Conventional path MTU
677	   discovery (PMTUD) relies on existing endpoint ICMP processing of
678	   explicit negative feedback from routers along the path via "packet
679	   too big" ICMP packets in the reverse direction of the tunnel
680	   [RFC1191]. This technique is susceptible to the "black hole"
681	   phenomenon, in which the ICMP messages never return to the source due
682	   to policy-based filtering [RFC2923]. PLMTUD requires a separate,
683	   direct control channel from the egress to the ingress that provides
684	   positive feedback; the direct channel is not blocked by policy
685	   filters and the positive feedback ensures fail-safe operation if
686	   feedback messages are lost [RFC4821].

688	4.3. IP ID exhaustion

690	   In IPv4, the IP Identification (ID) field is a 16-bit value that is
691	   unique for every packet for a given source address, destination
692	   address, and protocol, such that it does not repeat within the
693	   Maximum Segment Lifetime (MSL) [RFC791][RFC1122]. Although the ID
694	   field was originally intended for fragmentation and reassembly, it
695	   can also be used to detect and discard duplicate packets, e.g., at
696	   congested routers (see Sec. 3.2.1.5 of [RFC1122]). For this reason,
697	   and because IPv4 packets can be fragmented anywhere along a path, all
698	   packets between a source and destination of a given protocol must
699	   have unique ID values over a period of an MSL, which is typically
700	   interpreted as two minutes (120 seconds). These requirements have
701	   recently been somewhat relaxed in recognition of the primary use of
702	   this field for reassembly and the need to handle only fragment
703	   misordering at the receiver [RFC6864].

705	   The uniqueness of the IP ID is a known problem for high speed
706	   devices, because it limits the speed of a single protocol between two
707	   endpoints [RFC4963]. Although this suggests that the uniqueness of
708	   the IP ID is moot, tunnels exacerbate this condition. A tunnel often
709	   aggregates traffic from a number of different source and destination
710	   addresses, of different protocols, and encapsulates them in a header
711	   with the same ingress and egress addresses, all using a single
712	   encapsulation protocol. The result is one of the following:

714	   1. The IP ID rules are enforced, and the tunnel throughput is
715	      severely limited.

717	   2. The IP ID rules are enforced, and the tunnel consumes large
718	      numbers of ingress/egress IP addresses solely to ensure ID
719	      uniqueness.

721	   3. The IP ID rules are ignored.

723	   The last case is the most obvious solution, because it corresponds to
724	   how endpoints currently behave. Fortunately, fragmentation is
725	   somewhat rare in the current Internet at large, but it can be common
726	   along a tunnel. Fragments that repeat the IP ID risk being
727	   reassembled incorrectly, especially when fragments are reordered or
728	   lost. Reassembly errors are not always detected by other protocol
729	   layers (see Sec. 4.8), and even when detected they can result in
730	   excessive overall packet loss and can waste bandwidth between the
731	   egress and ultimate packet destination.
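
   The throughput limit that results from enforcing ID uniqueness can
   be estimated with simple arithmetic. The sketch below is
   illustrative and its names are hypothetical. It assumes that all
   65536 ID values are consumed at most once per MSL for a single
   (source, destination, protocol) triple, which is the situation of a
   tunnel with one ingress address, one egress address, and one
   encapsulation protocol.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field


def max_packet_rate(msl_seconds: float = 120.0) -> float:
    """Packets per second before an ID value must repeat within one MSL."""
    return ID_SPACE / msl_seconds


def max_throughput_bps(packet_bytes: int, msl_seconds: float = 120.0) -> float:
    """Bits per second at that packet rate for a fixed packet size."""
    return ID_SPACE * packet_bytes * 8 / msl_seconds
```

   With a 120-second MSL this gives roughly 546 packets per second, or
   about 6.55 Mb/s for 1500-byte packets, shared by all traffic the
   tunnel aggregates.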

733	4.4. Hop Count

735	   This section considers the selection of the value of the hop count of
736	   the tunnel link header, as well as the potential impact on the tunnel
737	   transit header. The former is affected by the number of hops within
738	   the tunnel. The latter determines whether the tunnel has visible
739	   effect on the transit packet.

741	   In general, the Internet hop count field is used to detect and avoid
742	   forwarding loops that cannot be corrected without a synchronized
743	   reboot. The IPv4 Time-to-Live (TTL) and IPv6 Hop Limit fields each
744	   serve this purpose [RFC791][RFC2460].

746	   The IPv4 TTL field was originally intended to indicate packet
747	   expiration time, measured in seconds. A router is required to
748	   decrement the TTL by at least one or the number of seconds the packet
749	   is delayed, whichever is larger [RFC1812]. Packets are rarely held
750	   that long, and so the field has come to represent the count of the
751	   number of routers traversed. IPv6 makes this meaning more explicit.

753	   These hop count fields represent the number of network forwarding
754	   elements traversed by an IP datagram. An IP datagram with a hop count
755	   of zero can traverse a link between two hosts because it never visits
756	   a router (where it would need to be decremented and would have been
757	   dropped).

759	   An IP datagram traversing a tunnel thus need not have its hop count
760	   modified, i.e., the tunnel transit header need not be affected. A
761	   zero hop count datagram should be able to traverse a tunnel as easily
762	   as it traverses a link. A router MAY be configured to decrement hop
763	   counts of packets traversing a particular link (and thus a tunnel),
764	   which may be useful in emulating a path as if it had traversed one or
765	   more routers, but this is strictly optional. The ability of the outer
766	   network and tunnel network to avoid indefinitely looping packets does
767	   not rely on the hop counts of the tunnel traversal packet and tunnel
768	   link packet being related in any way at all.

770	   The hop count field is also used by several protocols to determine
771	   whether endpoints are "local", i.e., connected to the same subnet
772	   (link-local discovery and related protocols [RFC4861]). A tunnel is a
773	   way to make a remote address appear directly-connected, so it makes
774	   sense that the other ends of the tunnel appear local and that such
775	   link-local protocols operate over tunnels unless configured
776	   explicitly otherwise. When the interfaces of a tunnel are numbered,
777	   these can be interpreted the same way as if they were on the same
778	   link subnet.

780	4.5. Signaling

782	   In the current Internet architecture, signaling goes upstream, either
783	   from routers along a path or from the destination, back toward the
784	   source. Such signals are typically contained in ICMP messages, but
785	   can involve other protocols such as RSVP, transport protocol signals
786	   (e.g., TCP RSTs), or multicast control or transport protocols.

788	   A tunnel behaves like a link and acts like a link interface at the
789	   nodes where it is attached. As such, it can provide information that
790	   enhances IP signaling (e.g., ICMP), but itself does not directly
791	   generate ICMP messages.

793	   For tunnels, this means that there are two separate signaling paths.
794	   The outer network M devices can each signal the source of the tunnel
795	   transit packets, Hsrc (Figure 8). Inside the tunnel, the inner
796	   network N devices can signal the source of the tunnel link packets,
797	   the ingress I (Figure 9).

799	        +--------+-----------------------------------+--------+
800	        |        |                                   |        |
801	        v        --_                                 --       v
802	     +------+   /  \                                /  \   +------+
803	     | Hsrc |--+ Ra +----      --      --      ----+ Rd +--| Hdst |
804	     +------+   \  /    /\    /  \    /  \    /\    \  /   +------+
805	                 --    /I \--+ Rb +--+ Rc +--/E \    --
806	                       \  /   \  /    \  /   \  /
807	                        \/     --      --     \/
808	                       <------ Network N ------->
809	     <------------------------ Network M ------------------------->

811	                    Figure 8 Signals outside the tunnel

813	                         +-----+-------+------+
814	                 --_     |     |       |      |      --
815	     +------+   /  \     v     |       |      |     /  \   +------+
816	     | Hsrc |--+ Ra +----      --      --      ----+ Rd +--| Hdst |
817	     +------+   \  /    /\    /  \    /  \    /\    \  /   +------+
818	                 --    /I \--+ Rb +--+ Rc +--/E \    --
819	                       \  /   \  /    \  /   \  /
820	                        \/     --      --     \/
821	                       <------ Network N ------->
822	     <------------------------ Network M ------------------------->

824	                    Figure 9 Signals inside the tunnel

826	   These two signal paths are inherently distinct except where
827	   information is exchanged between the network interface of the tunnel
828	   (the ingress) and its attached device (Ra, in both figures).

830	   It is always possible for a network interface to provide hints to its
831	   attached device (host or router), which can be used for optimization.
832	   In this case, when signals inside the tunnel indicate a change to the
833	   tunnel, the ingress (i.e., the tunnel network interface) can provide
834	   information to the router (Ra, in both figures), so that Ra can
835	   generate the appropriate signal in return to Hsrc. This relaying may
836	   be difficult, because signals inside the tunnel may not return enough
837	   information to the ingress to support direct relaying to Hsrc.

839	   In all cases, the tunnel ingress needs to determine how to relay the
840	   signals from inside the tunnel into signals back to the source. For
841	   some protocols this is either simple or impossible (such as for
842	   ICMP); for others, it can even be undefined (e.g., multicast). In
843	   some cases, the individual signals relayed from inside the tunnel may
844	   result in corresponding signals in the outside network, and in other
845	   cases they may just change state of the tunnel interface. In the
846	   latter case, the result may cause the router Ra to generate new ICMP
847	   errors when later messages arrive from Hsrc or other sources in the
848	   outer network.

850	   The meaning of the relayed information must be carefully translated.
851	   In the case of soft or hard ICMP errors, the translation may be
852	   obvious. ICMP "packet too big" messages from inside the tunnel do not
853	   necessarily have a direct impact on Ra unless they arrive from the
854	   egress (where they would update egressRMTU). Inside the tunnel, these
855	   messages could be used to adjust the ingress fragmentation.

857	   In addition to ICMP, messages typically considered for translation
858	   include Explicit Congestion Notification (ECN [RFC6040]) and
859	   multicast (e.g., IGMP).

861	4.6. Relationship of Header Fields

863	   Some tunnel specifications attempt to relate the fields of the tunnel
864	   transit packet and tunnel link packet, i.e., the packet arriving at
865	   the ingress and the encapsulation header. These two headers are
866	   effectively independent and there is no utility in requiring their
867	   contents to be related.

869	   Specifically, the encapsulation header source and destination
870	   addresses are network endpoints in the tunnel network N, but have no
871	   meaning in the outer network M, even when the tunneled packet
872	   traverses the same network. The addresses are effectively
873	   independent, and the tunnel endpoint addresses are link addresses to
874	   the tunnel transit packet.

876	   Because the tunneled packet uses source and destination addresses
877	   with a separate meaning, it is inappropriate to copy or reuse the
878	   IPv4 Identification or IPv6 Fragment ID fields of the tunnel transit
879	   packet. These fields need to be generated based on the context of the
880	   encapsulation header, not the tunnel transit header.

882	   Similarly, the DF field need not be copied from the tunnel transit
883	   packet to the encapsulation header of the tunnel link packet
884	   (presuming both are IPv4). Path MTU discovery inside the tunnel does
885	   not directly correspond to path MTU discovery outside the tunnel.

887	   The same is true for most other fields. When a field value is
888	   generated in the encapsulation header, its meaning should be derived
889	   from what is desired in the context of the tunnel as a link. When
890	   feedback is received from these fields, they should be presented to
891	   the tunnel ingress and egress as if they were network interfaces. The
892	   behavior of the node where these interfaces attach should be
893	   identical to that of a conventional link.

895	   There are exceptions to this rule that are explicitly intended to
896	   relay signals from inside the tunnel to outside the tunnel. The
897	   primary example is ECN [RFC6040], which copies the ECN bits from the
898	   tunnel transit header to the tunnel link header during encapsulation
899	   at the ingress and modifies the tunnel transit header at egress based
900	   on a combination of the bits of the two headers. This is intended to
901	   allow congestion notification within the tunnel to be interpreted as
902	   if it were on the direct path. Other examples may involve the DSCP
903	   flags. In both cases, it is assumed that the intent of copying values
904	   on encapsulation and merging values on decapsulation has the effect
905	   of allowing the tunnel to act as if it participates in the same type
906	   of network as outside the tunnel (network M).
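
   The independence of the two headers, with ECN as the stated
   exception, can be sketched as follows for IPv4-in-IPv4. This is an
   illustrative sketch only: the IPv4Header and Ingress names, the
   default link TTL of 64, and the simple wrapping ID counter are all
   assumptions made for the example, and only the encapsulation side of
   the ECN handling is shown.

```python
from dataclasses import dataclass


@dataclass
class IPv4Header:
    src: str
    dst: str
    proto: int
    ident: int
    df: bool
    ttl: int
    ecn: int  # two-bit ECN field


class Ingress:
    """Builds tunnel link headers from tunnel context, not from transit headers."""

    def __init__(self, src: str, dst: str, link_ttl: int = 64, link_df: bool = False):
        self.src, self.dst = src, dst
        self.link_ttl, self.link_df = link_ttl, link_df
        self._next_id = 0  # ID sequence owned by the ingress/egress pair

    def encapsulate(self, transit: IPv4Header) -> IPv4Header:
        ident = self._next_id
        self._next_id = (self._next_id + 1) % 65536
        return IPv4Header(
            src=self.src, dst=self.dst,
            proto=4,               # IPv4 encapsulated in IPv4
            ident=ident,           # generated here, never copied from transit
            df=self.link_df,       # tunnel policy, independent of transit DF
            ttl=self.link_ttl,     # chosen for the path inside the tunnel
            ecn=transit.ecn,       # the exception: ECN bits are copied
        )
```

   Note that encapsulate does not modify the transit header; in
   particular, its TTL is left untouched.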

908	4.7. Congestion

910	   In general, tunnels carrying IP traffic need not react directly to
911	   congestion any more than would any other link layer [RFC5405]. IP
912	   traffic is not generally expected to be congestion reactive.

914	   [text from David Black on ECN relaying?]

916	4.8. Checksums

918	   IP traffic transiting a tunnel needs to expect a similar level of
919	   error detection and correction as it would expect from any other
920	   link. In the case of IPv4, there are no such expectations, which is
921	   partly why it includes a header checksum [RFC791].

923	   IPv6 omitted the header checksum because it already expects most link
924	   errors to be detected and dropped by the link layer and because it
925	   also assumes transport protection [RFC2460]. When transiting IPv6
926	   over IPv6, the tunnel fails to provide the expected error detection.
927	   This is why IPv6 is often tunneled over layers that include separate
928	   protection, such as GRE [RFC2784].

930	   The fragmentation created by the tunnel ingress can increase the need
931	   for stronger error detection and correction, especially at the tunnel
932	   egress to avoid reassembly errors. The Internet checksum is known to
933	   be susceptible to reassembly errors that could be common [RFC4963],
934	   and should not be relied upon for this purpose. This is why SEAL and
935	   AERO include a separate checksum [RFC5320][Te16]. This requirement
936	   can be undermined when using UDP as a tunnel with no UDP checksum (as
937	   per [RFC6935][RFC6936]) when fragmentation occurs because the egress
938	   has no checksum with which to validate reassembly. For this reason,
939	   it is safe to use UDP with a zero checksum for atomic (non-
940	   fragmented, non-fragmentable) tunnel link packets only; when used on
941	   fragments, whether generated at the ingress or en-route inside the
942	   tunnel, omission of such a checksum can result in reassembly errors
943	   that can cause additional work (capacity, forwarding processing,
944	   receiver processing) downstream of the egress.
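
   The condition under which a zero UDP checksum is safe can be written
   as a simple predicate on an IPv4 tunnel link packet. This is a
   minimal sketch; the function names are hypothetical, and "atomic" is
   taken here to mean DF set, MF clear, and a fragment offset of zero.

```python
def is_atomic(df: bool, mf: bool, frag_offset: int) -> bool:
    """True if the packet is not a fragment and cannot be fragmented en route."""
    return df and not mf and frag_offset == 0


def may_omit_udp_checksum(df: bool, mf: bool, frag_offset: int) -> bool:
    """A zero UDP checksum is acceptable only for atomic tunnel link packets."""
    return is_atomic(df, mf, frag_offset)
```

   A tunnel link packet sent with DF=0 fails this test even when it
   leaves the ingress unfragmented, because it may still be fragmented
   en route inside the tunnel.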

946	4.9. Numbering

948	   Tunnel ingresses and egresses have addresses associated with the
949	   encapsulation protocol. These addresses are the source and
950	   destination (respectively) of the encapsulated packet while
951	   traversing the tunnel network.

953	   Tunnels may or may not have addresses in the network whose traffic
954	   they transit (e.g., network M in Figure 4). In some cases, the tunnel
955	   is an unnumbered interface to a point-to-point virtual link. When the
956	   tunnel has multiple egresses, tunnel interfaces require separate
957	   addresses in network M.

959	   To see the effect of tunnel interface addresses, consider traffic
960	   sourced at router Ra in Figure 4. Even before being encapsulated by
961	   the ingress, that traffic needs a source IP network address that
962	   belongs to the router. One option is to use an address associated
963	   with one of the other interfaces of the router [RFC1122]. Another
964	   option is to assign a number to the tunnel interface itself.
965	   Regardless of which address is used, the resulting IP packet is then
966	   encapsulated by the tunnel ingress using the ingress address as a
967	   separate operation.

969	4.10. Multicast

971	   [To be addressed]

973	   Note that PMTU for multicast is difficult. PIM carries an option that
974	   may help in the Population Count Extensions to PIM [RFC6807].

976	   IMO, again, this is no different than any other multicast link.

978	4.11. NAT / Load Balancing

980	   [To be addressed]

982	4.12. Recursive tunnels

984	   The rules described in this document already support tunnels over
985	   tunnels, sometimes known as "recursive" tunnels, in which IP is
986	   transited over IP either directly or via intermediate encapsulation
987	   (IP-UDP-IP).

989	   There are known hazards to recursive tunneling, notably that the
990	   independence of the tunnel transit header and tunnel link header hop
991	   counts can result in a tunneling loop. Such looping can be avoided
992	   when using direct encapsulation (IP in IP) by use of a header option
993	   to track the encapsulation count and to limit that count [RFC2473].
994	   This looping cannot be avoided when other protocols are used for
995	   tunneling, e.g., IP in UDP in IP, because the encapsulation count may
996	   not be visible where the recursion occurs.
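
   The encapsulation-count mechanism for direct IP in IP can be
   sketched as below. This is an illustrative reading of the limit
   tracking described in [RFC2473], not a substitute for it; the
   function and exception names are hypothetical. The ingress looks for
   a limit carried by the arriving packet, refuses to encapsulate when
   that limit is zero, and otherwise writes a limit one smaller into
   the header it adds.

```python
from typing import Optional


class EncapsulationLimitExceeded(Exception):
    """Raised when a packet may not be encapsulated again."""


def next_encap_limit(found: Optional[int],
                     configured: Optional[int]) -> Optional[int]:
    """Return the limit to place in the new encapsulation header.

    found      -- limit carried by the arriving packet, or None if absent
    configured -- limit configured at this ingress, or None if unset
    """
    if found is not None:
        if found == 0:
            # Discard; the attached node reports the error to the source.
            raise EncapsulationLimitExceeded("no further encapsulation allowed")
        return found - 1
    return configured  # None means no limit option is added
```

   A packet that arrives without a limit and passes through an
   intermediate UDP layer gives found=None at every level, which is why
   this check does not bound IP in UDP in IP recursion.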

998	5. Observations (implications)

1000	   [Leave this as a shopping list for now]

1002	5.1. Tunnel protocol designers

1004	   Account for egress MTU/path MTU differences.

1006	   Include a stronger checksum.

1008	   Ensure the egress MTU is always larger than the path MTU.

1010	   Ensure that the egress reassembly can keep up with line rate OR
1011	   design PLMTUD into the tunneling protocol.

1013	5.2. Tunnel implementers

1015	   Detect when the egress MTU is exceeded.

1017	   Detect when the egress MTU drops below the required minimum and shut
1018	   down the tunnel if that happens - configuring the tunnel down and
1019	   issuing a hard error may be the only way to detect this anomaly, and
1020	   it's sufficiently important that the tunnel SHOULD be disabled.

1022	   Do NOT decrement the TTL as part of being a tunnel. It's always
1023	   already OK for a router to decrement the TTL based on different next-
1024	   hop routers, but TTL is a property of a router not a link.

1026	5.3. Tunnel operators

1028	   Keep the difference between "enforced by operators" vs. "enforced by
1029	   active protocol mechanism" in mind. It's fine to assume something the
1030	   tunnel cannot or does not test, as long as you KNOW you can assume
1031	   it. When the assumption is wrong, it will NOT be signaled by the
1032	   tunnel. Do NOT decrement the TTL as part of being a tunnel. It's
1033	   always already OK for a router to decrement the TTL based on
1034	   different next-hop routers, but TTL is a property of a router not a
1035	   link.

1041	5.4. For existing standards

1043	5.4.1. Generic UDP Encapsulation (GUE - IP in UDP in IP)

1045	   [He15a][He15b]

1047	5.4.2. Generic Packet Tunneling in IPv6

1049	   [RFC2473]

1051	   Consistent with this doc:

1053	      Considers the endpoints of the tunnel as virtual interfaces.

1055	      Considers the tunnel a virtual link.

1057	      Requires source fragmentation at the ingress and reassembly at the
1058	   egress.

1060	      Includes a recursion limit to prevent unlimited re-encapsulation.

1062	      Sets tunnel transit header hop limit independently.

1064	      Sends ICMPs back at the ingress based on the arriving tunnel
1065	   transit packet and its relation to the tunnel MTU (though it uses the
1066	   incorrect value of the tunnel MTU; see below).

1068	      Allows for ingress relaying of internal tunnel errors (but see
1069	   below; it does not discuss retaining state about these).

1071	   Inconsistent with this doc:

1073	      Decrements the tunnel transit header hop limit by 1, i.e.,
1074	   incorrectly assuming that tunnel endpoints occur at routers only and
1075	   that the tunnel, not the router, is responsible for this decrement.

1077	      Goes to pains to describe the decapsulation process as if it
1078	   were distinct from conventional protocol processing by the
1079	   receiver (when it should not be).

1081	      Copies traffic class from tunnel link to tunnel transit header (as
1082	   one variant).

1084	      Treats the tunnel MTU as the tunnel path MTU, rather than the
1085	   tunnel egress MTU.

1087	      Incorrectly fragments IPv4 DF=0 tunnel transit packets that arrive
1088	   larger than the tunnel MTU at the IPv6 layer; the relationship
1089	   between IPv4 and the tunnel is more complex (as noted in this doc).

1091	      Fails to retain state from the tunnel based on ingress receiving
1092	   ICMP messages from inside the tunnel, e.g., such as might cause
1093	   future tunnel transit packets arriving at the ingress to be discarded
1094	   with an ICMP error response rather than allowing them to proceed into
1095	   the tunnel.

1097	5.4.3. Geneve (NVO3)

1099	   [RFC7364][Gr16]

1101	   Consistent with this doc:

1103	      Generation of the link header fields is not discussed and presumed
1104	   independent of transit packet.

1106	   Inconsistent with this doc:

1108	      Tries to match transit to tunnel path MTU rather than egress MTU.

1110	5.4.4. GRE (IP in GRE in IP)

1112	   IPv4 [RFC2784][RFC7588][RFC7676]:

1114	   Consistent with this doc:

1116	      Does not address link header generation.

1118	      Non-default behavior allows fragmentation of link packet to match
1119	   tunnel path MTU up to the limit of the egress MTU.

1121	      Default behavior sets link DF independently.

1123	      Shuts the tunnel down if the tunnel path MTU isn't >= 1280.

1125	   Inconsistent with this doc:

1127	      Based on tunnel path MTU, not egress MTU.

1129	      Claims that the tunnel (GRE) mechanism is responsible for
1130	   generating ICMP error messages.

1132	      Default behavior fragments transit packet (where possible) based
1133	   on tunnel path MTU (it should fragment based on egress MTU).

1135	      Default behavior does not support the minimum MTU of IPv6 when run
1136	   over IPv6.

1138	      Non-default behavior allows copying DF for IPv4 in IPv4.

1140	5.4.5. IP in IP / mobile IP

1142	   IPv4 [RFC2003][RFC5944]:

1144	   Consistent with this doc:

1146	      Generate link ID independently

1148	      Generate link DF independently when transit DF=0

1150	      Generate ECN/update ECN based on sharing info [RFC6040]

1152	      Set link TTL to transit to egress only (independently)

1154	      Do not decrement TTL on entry except when part of forwarding

1156	      Do not decrement TTL on exit except when part of forwarding

1158	      Options not copied, but used as a hint to desired services.

1160	      Generally treat tunnel as a link, e.g., for link-local.

1162	   Inconsistent with this doc:

1164	      Set link DF when transit DF=1 (won't work unless I-E runs PLMTUD)

1166	      Drop at egress if transit TTL=0 (wrong TTL for host-host tunnels)

1168	      Drop when transit source is router's IP (prevents tun from router)

1170	      Drop when transit source matches egress (prevents tun to router)

1172	      Use tunnel ICMPs to generate upper ICMPs, copying context (ICMPs
1173	   are now coming from inside a link!); these should be handled by
1174	   setting errors as a "network interface" and letting the attached
1175	   host/router figure out what to send.

1177	      Using tunnel MTU discovery to tune the transit packet to the
1178	   tunnel path MTU rather than egress MTU.

1180	   IPv6 [RFC2473]:

1182	   Consistent with this doc:

1184	      Doesn't discuss lots of header fields, but implies they're set
1185	   independently.

1187	      Sets link TTL independently.

1189	   Inconsistent with this doc:

1191	      Tunnel issues ICMP PTBs.

1193	      ICMP PTB issued if larger than 1280 - header, rather than egress
1194	   reassembly MTU.

1196	      Fragments IPv6 over IPv6 only if transit is <= 1280
1197	   (i.e., forces all tunnels to have a max MTU of 1280).

1199	      Fragments IPv4 over IPv6 only if IPv4 DF=0
1200	   (misinterpreting the "can fragment the IPv4 packet" as permission to
1201	   fragment at the IPv6 link header)

1203	      Considers encapsulation a forwarding operation and decrements the
1204	   transit TTL.

1206	5.4.6. IPsec tunnel mode (IP in IPsec in IP)

1208	   [RFC4301]

1210	   Consistent with this doc:

1212	      Most of the rules, except as noted below.

1214	   Inconsistent with this doc:

1216	      Writes its own header copying rules (Sec 5.1.2), rather than
1217	   referring to existing standards.

1219	      Uses policy to set, clear, or copy DF (policy isn't the issue)

1221	      Intertwines tunneling with forwarding rather than presenting the
1222	   tunnel as a network interface; this can be corrected by using IPsec
1223	   transport mode with an IP-in-IP tunnel [RFC3884].

1225	5.4.7. L2TP

1227	   [RFC3931]

1229	   Consistent with this doc:

1231	      Does not address most link headers, which are thus independent.

1233	   Inconsistent with this doc:

1235	      Manages tunnel access based on tunnel path MTU, instead of egress
1236	   MTU.

1238	      Refers to RFC2473 (IPv6 in IPv6), which is inconsistent with this
1239	   doc as noted above.

1241	5.4.8. L2VPN

1243	   [RFC4664]

1245	5.4.9. L3VPN

1247	   [RFC4176]

1249	5.4.10. LISP

1251	   [RFC6830]

1253	5.4.11. MPLS

1255	   [RFC3031]

1257	5.4.12. PWE

1259	   [RFC3985]

1261	5.4.13. SEAL/AERO

1263	   [RFC5320][Te16]

1265	5.4.14. TRILL

1267	   [RFC5556][RFC6325]

1269	   Consistent with this doc:

1271	      Puts IP in Ethernet, so most of the issues don't come up.

1273	      Ethernet doesn't have TTL or fragment.

1275	      Rbridge (trill) TTL header is independent of transit packet.

1277	5.5. For future standards

1279	   Larger IPv4 MTU (2K? or just 2x path MTU?) for reassembly

1281	   Always include frag support for at least two frags; do NOT try to
1282	   deprecate fragmentation.

1284	   Limit encapsulation option use/space.

1286	   Augment ICMP to have two separate messages: PTB vs P-bigger-than-
1287	   optimal

1289	   Include MTU as part of BGP as a hint - SB

1291	   Hazards of multi-MTU draft-van-beijnum-multi-mtu-04

1293	6. Security Considerations

1295	   Tunnels may introduce vulnerabilities or add to the potential for
1296	   receiver overload and thus DoS attacks. These issues are primarily
1297	   related to the fact that a tunnel is a link that traverses a network
1298	   path and to fragmentation and reassembly. ICMP signal translation
1299	   introduces a new security issue and must be done with care. ICMP
1300	   generation at the router or host attached to a tunnel is already
1301	   covered by existing requirements (e.g., should be throttled).

1303	   Tunnels traverse multiple hops of a network path from ingress to
1304	   egress. Traffic along such tunnels may be susceptible to on-path and
1305	   off-path attacks, including fragment injection, reassembly buffer
1306	   overload, and ICMP attacks. Some of these attacks may not be as
1307	   visible to the endpoints of the architecture into which tunnels are
1308	   deployed and these attacks may thus be more difficult to detect.

1310	   Fragmentation at routers or hosts attached to tunnels may place an
1311	   undue burden on receivers where traffic is not sufficiently diffuse,
1312	   because tunnels may induce source fragmentation at hosts and path
1313	   fragmentation (for IPv4 DF=0) more often than other links do.
1314	   Care should be taken to avoid this situation, notably by ensuring
1315	   that tunnel MTUs are not significantly different from other link
1316	   MTUs.

1318	   Tunnel ingresses emitting IP datagrams MUST obey all existing IP
1319	   requirements, such as the uniqueness of the IP ID field. Failure to
1320	   either limit encapsulation traffic, or use additional ingress/egress
1321	   IP addresses, can result in high speed traffic fragments being
1322	   incorrectly reassembled.
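
   As a rough illustration of why the ID space matters at high speed,
   the sketch below estimates how quickly a single ingress/egress
   address pair cycles through the 16-bit IPv4 ID field. The function
   name, the packet size, and the example rates are illustrative
   assumptions, not values taken from this document.

```python
IPV4_ID_VALUES = 2 ** 16  # the IPv4 Identification field is 16 bits wide


def id_wrap_seconds(link_bps: float, packet_bytes: int) -> float:
    """Time to emit 65536 packets of packet_bytes each at link_bps.

    Assumes one (source, destination, protocol) tuple and a fresh ID
    for every packet, i.e., the worst case for ID reuse.
    """
    packets_per_second = link_bps / (packet_bytes * 8)
    return IPV4_ID_VALUES / packets_per_second


if __name__ == "__main__":
    for rate in (100e6, 1e9, 10e9):
        print(f"{rate / 1e9:5.1f} Gb/s: ID space wraps in "
              f"{id_wrap_seconds(rate, 1500):.3f} s")
```

   If fragments can wait in a reassembly buffer for longer than this
   interval, an ID may be reused while older fragments are still
   pending, which is the mis-reassembly hazard discussed in [RFC4963].
   Spreading traffic over additional ingress/egress addresses
   multiplies the available ID space accordingly.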

1324	   [management?]

1326	   [Access control?]

1328	   describe relationship to [RFC6169] - JT (as per INTAREA meeting
1329	   notes, don't cover Teredo-specific issues in RFC6169, but include
1330	   generic issues here)

1332	7. IANA Considerations

1334	   This document has no IANA considerations.

1336	   The RFC Editor should remove this section prior to publication.

1338	8. References

1340	8.1. Normative References

1342	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1343	             Requirement Levels", BCP 14, RFC 2119, March 1997.

1345	8.2. Informative References

1347	   [Cl88]    Clark, D., "The design philosophy of the DARPA internet
1348	             protocols," Proc. Sigcomm 1988, p.106-114, 1988.

1350	   [Er94]    Eriksson, H., "MBone: The Multicast Backbone,"
1351	             Communications of the ACM, Aug. 1994, pp.54-60.

1353	   [Gr16]    Gross, J., et al., "Geneve: Generic Network Virtualization
1354	             Encapsulation," draft-ietf-nvo3-geneve-01, Jan. 2016.

1356	   [He15a]   Herbert, T., L. Yong, O. Zia, "Generic UDP Encapsulation,"
1357	             draft-ietf-nvo3-gue-02, Dec. 2015.

1359	   [He15b]   Herbert, T., F. Templin, "Fragmentation option for Generic
1360	             UDP Encapsulation," draft-herbert-gue-fragmentation-02,
1361	             Oct. 2015.

1363	   [RFC768]  Postel, J., "User Datagram Protocol," RFC 768, Aug. 1980.

1365	   [RFC791]  Postel, J., "Internet Protocol," RFC 791 / STD 5, September
1366	             1981.

1368	   [RFC793]  Postel, J., "Transmission Control Protocol," RFC 793, Sept.
1369	             1981.

1371	   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
1372	             Communication Layers," RFC 1122 / STD 3, October 1989.

1374	   [RFC1191] Mogul, J., S. Deering, "Path MTU discovery," RFC 1191,
1375	             November 1990.

1377	   [RFC1812] Baker, F., "Requirements for IP Version 4 Routers," RFC
1378	             1812, June 1995.

1380	   [RFC2003] Perkins, C., "IP Encapsulation within IP," RFC 2003,
1381	             October 1996.

1383	   [RFC2460] Deering, S., R. Hinden, "Internet Protocol, Version 6
1384	             (IPv6) Specification," RFC 2460, Dec. 1998.

1389	   [RFC2784] Farinacci, D., T. Li, S. Hanks, D. Meyer, P. Traina,
1390	             "Generic Routing Encapsulation (GRE)", RFC 2784, March
1391	             2000.

1393	   [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery," RFC
1394	             2923, September 2000.

1396	   [RFC2473] Conta, A., S. Deering, "Generic Packet Tunneling in IPv6
1397	             Specification," RFC 2473, Dec. 1998.

1399	   [RFC3031] Rosen, E., A. Viswanathan, R. Callon, "Multiprotocol Label
1400	             Switching Architecture", RFC 3031, January 2001.

1402	   [RFC5944] Perkins, C., Ed., "IP Mobility Support for IPv4, Revised,"
1403	             RFC 5944, Nov. 2010.

1405	   [RFC3819] Karn, P., Ed., C. Bormann, G. Fairhurst, D. Grossman, R.
1406	             Ludwig, J. Mahdavi, G. Montenegro, J. Touch, L. Wood,
1407	             "Advice for Internet Subnetwork Designers," RFC 3819 / BCP
1408	             89, July 2004.

1410	   [RFC3884] Touch, J., L. Eggert, Y. Wang, "Use of IPsec Transport Mode
1411	             for Dynamic Routing," RFC 3884, September 2004.

1413	   [RFC3931] Lau, J., Ed., M. Townsley, Ed., I. Goyret, Ed., "Layer Two
1414	             Tunneling Protocol - Version 3 (L2TPv3)," RFC 3931, March
1415	             2005.

1417	   [RFC3985] Bryant, S., P. Pate (Eds.), "Pseudo Wire Emulation Edge-to-
1418	             Edge (PWE3) Architecture", RFC 3985, March 2005.

1420	   [RFC4176] El Mghazli, Y., Ed., T. Nadeau, M. Boucadair, K. Chan, A.
1421	             Gonguet, "Framework for Layer 3 Virtual Private Networks
1422	             (L3VPN) Operations and Management," RFC 4176, October 2005.

1424	   [RFC4301] Kent, S., and K. Seo, "Security Architecture for the
1425	             Internet Protocol," RFC 4301, December 2005.

1427	   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
1428	             Network Tunneling," RFC 4459, April 2006.

1430	   [RFC4664] Andersson, L., Ed., E. Rosen, Ed., "Framework for Layer 2
1431	             Virtual Private Networks (L2VPNs)," RFC 4664, September
1432	             2006.

1434	   [RFC4821] Mathis, M., J. Heffner, "Packetization Layer Path MTU
1435	             Discovery," RFC 4821, March 2007.

1437	   [RFC4861] Narten, T., E. Nordmark, W. Simpson, H. Soliman, "Neighbor
1438	             Discovery for IP version 6 (IPv6)," RFC 4861, Sept. 2007.

1440	   [RFC4963] Heffner, J., M. Mathis, B. Chandler, "IPv4 Reassembly
1441	             Errors at High Data Rates," RFC 4963, July 2007.

1443	   [RFC5320] Templin, F., Ed., "The Subnetwork Encapsulation and
1444	             Adaptation Layer (SEAL)," RFC 5320, Feb. 2010.

1446	   [RFC5405] Eggert, L., G. Fairhurst, "Unicast UDP Usage Guidelines for
1447	             Application Designers," RFC 5405, Nov. 2008.

1449	   [RFC5556] Touch, J., R. Perlman, "Transparently Interconnecting Lots
1450	             of Links (TRILL): Problem and Applicability Statement," RFC
1451	             5556, May 2009.

1453	   [RFC6040] Briscoe, B., "Tunneling of Explicit Congestion
1454	             Notification," RFC 6040, Nov. 2010.

1456	   [RFC6169] Krishnan, S., D. Thaler, J. Hoagland, "Security Concerns
1457	             With IP Tunneling," RFC 6169, Apr. 2011.

1459	   [RFC6325] Perlman, R., D. Eastlake, D. Dutt, S. Gai, A. Ghanwani,
1460	             "Routing Bridges (RBridges): Base Protocol Specification,"
1461	             RFC 6325, July 2011.

1463	   [RFC6807] Farinacci, D., G. Shepherd, S. Venaas, Y. Cai, "Population
1464	             Count Extensions to Protocol Independent Multicast (PIM),"
1465	             RFC 6807, Dec. 2012.

1467	   [RFC6830] Farinacci, D., V. Fuller, D. Meyer, D. Lewis, "The
1468	             Locator/ID Separation Protocol," RFC 6830, Jan. 2013.

1470	   [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field,"
1471	             Proposed Standard, RFC 6864, Feb. 2013.

1473	   [RFC6935] Eubanks, M., P. Chimento, M. Westerlund, "IPv6 and UDP
1474	             Checksums for Tunneled Packets," RFC 6935, Apr. 2013.

1476	   [RFC6936] Fairhurst, G., M. Westerlund, "Applicability Statement for
1477	             the Use of IPv6 UDP Datagrams with Zero Checksums," RFC
1478	             6936, Apr. 2013.

1480	   [RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., M.
1481	             Napierala, "Problem Statement: Overlays for Network
1482	             Virtualization", RFC 7364, October 2014.

1484	   [RFC7588] Bonica, R., C. Pignataro, J. Touch, "A Widely-Deployed
1485	             Solution to the Generic Routing Encapsulation Fragmentation
1486	             Problem," RFC 7588, July 2015.

1488	   [RFC7676] Pignataro, C., R. Bonica, S. Krishnan, "IPv6 Support for
1489	             Generic Routing Encapsulation (GRE)," RFC 7676, Oct 2015.

1491	   [Sa84]    Saltzer, J., D. Reed, D. Clark, "End-to-end arguments in
1492	             system design," ACM Trans. on Computing Systems, Nov. 1984.

1494	   [Te16]    Templin, F., "Asymmetric Extended Route Optimization,"
1495	             draft-templin-aerolink-64, Jan. 2016.

1497	   [To01]    Touch, J., "Dynamic Internet Overlay Deployment and
1498	             Management Using the X-Bone," Computer Networks, July 2001,
1499	             pp. 117-135.

1501	   [To03]    Touch, J., Y. Wang, L. Eggert, G. Finn, "Virtual Internet
1502	             Architecture," USC/ISI Tech. Report 570, Aug. 2003.

1504	   [To98]    Touch, J., S. Hotz, "The X-Bone," Proc. Globecom Third
1505	             Global Internet Mini-Conference, Nov. 1998.

1507	   [Zi80]    Zimmermann, H., "OSI Reference Model - The ISO Model of
1508	             Architecture for Open Systems Interconnection," IEEE Trans.
1509	             on Comm., Apr. 1980.

1511	9. Acknowledgments

1513	   This document originated as the result of numerous discussions among
1514	   the authors, Jari Arkko, Stuart Bryant, Lars Eggert, Ted Faber, Gorry
1515	   Fairhurst, Dino Farinacci, Matt Mathis, and Fred Templin, as well as
1516	   members participating in the Internet Area Working Group.

1518	   This document was prepared using 2-Word-v2.0.template.dot.

1520	Authors' Addresses

1522	   Joe Touch
1523	   USC/ISI
1524	   4676 Admiralty Way
1525	   Marina del Rey, CA 90292-6695
1526	   U.S.A.

1528	   Phone: +1 (310) 448-9151
1529	   Email: touch@isi.edu

1531	   W. Mark Townsley
1532	   Cisco
1533	   L'Atlantis, 11, Rue Camille Desmoulins
1534	   Issy Les Moulineaux, ILE DE FRANCE 92782

1536	   Email: townsley@cisco.com

1538	Appendix A. Fragmentation

1540	   There are two places where fragmentation can occur in a tunnel,
1541	   called Outer Fragmentation and Inner Fragmentation.

1543	A.1. Outer Fragmentation

1545	   The simplest case is Outer Fragmentation, as shown in Figure 10. The
1546	   bottom of the figure shows the network topology, where packets start
1547	   at the source, enter the tunnel at the encapsulator, exit the tunnel
1548	   at the decapsulator, and arrive finally at the destination. The
1549	   packet traffic is shown above the topology, where the end-to-end
1550	   packets are shown at the top. The packets are composed of an inner
1551	   header (iH) and inner data (iD); the term "inner" is relative to the
1552	   tunnel, as will become apparent. When the packet (iH,iD) arrives at
1553	   the encapsulator, it is placed inside the tunnel packet structure,
1554	   here shown as adding just an outer header, oH, in step (a).

1556	   When the encapsulated packet exceeds the MTU of the tunnel, the
1557	   packet needs to be fragmented. In this case we fragment the packet at
1558	   the outer header, with the fragments shown as (b1) and (b2). Note
1559	   that the outer header indicates fragmentation (as ' and "), the
1560	   inner header occurs only in the first fragment, and the inner data
1561	   is broken across the two packets. These fragments are reassembled at
1562	   the decapsulator in step (c), and the resulting packet is
1563	   decapsulated and sent on to the destination.

1565	    +----+----+                                              +----+----+
1566	    | iH | iD |------+ -  -  -  -  -  -  -  -  -  -  +------>| iH | iD |
1567	    +----+----+      |                               |       +----+----+
1568	                     v                               |
1569	              +----+----+----+               +----+----+----+
1570	          (a) | oH | iH | iD |               | oH | iH | iD | (c)
1571	              +----+----+----+               +----+----+----+
1572	                     |                               ^
1573	                     |       +----+----+-----+       |
1574	                (b1) +----- >| oH'| iH | iD1 |-------+
1575	                     |       +----+----+-----+       |
1576	                     |                               |
1577	                     |       +----+-----+            |
1578	                (b2) +----- >| oH"| iD2 |------------+
1579	                             +----+-----+
1580	   +-----+         +---+                           +---+         +-----+
1581	   |     |        /     \ ======================= /     \        |     |
1582	   | Src |=======|  Enc  |=======================|  Dec  |=======| Dst |
1583	   |     |        \     / ======================= \     /        |     |
1584	   +-----+         +---+                           +---+         +-----+

1586	                Figure 10 Fragmentation of the outer packet

1588	   Outer fragmentation isolates Source and Destination from tunnel
1589	   encapsulation duties. This can be considered a benefit in clean,
1590	   layered network design, but it may also complicate decapsulator
1591	   design, especially where tunnels aggregate large amounts of traffic
1592	   and issues such as IP ID overload arise (see Sec. 4.3). Outer
1593	   fragmentation is valid for any tunnel encapsulation protocol that
1594	   supports fragmentation (e.g., IPv4 or IPv6), where the tunnel
1595	   endpoints act as the host endpoints of that protocol.

1597	   Along the tunnel, the inner header is contained only in the first
1598	   fragment, which can interfere with mechanisms that 'peek' into lower
1599	   layer headers, e.g., as for ICMP, as discussed in Sec. 4.5.
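
   The steps of Figure 10 can be sketched as follows. This is a minimal
   model under stated assumptions, not an implementation of any
   particular tunnel protocol: the outer header is assumed to be 20
   bytes, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

OUTER_HDR_LEN = 20  # assumed outer header size in bytes (hypothetical)


@dataclass
class OuterFragment:
    ident: int      # shared by all fragments of one tunnel packet
    offset: int     # byte offset of this piece within (iH + iD)
    more: bool      # True unless this is the last fragment
    payload: bytes  # slice of the inner packet (iH + iD)


def encapsulate_and_fragment(inner: bytes, tunnel_mtu: int, ident: int):
    """Steps (a) and (b): add oH, then fragment at the outer header."""
    room = tunnel_mtu - OUTER_HDR_LEN
    room -= room % 8  # IP fragment offsets are counted in 8-byte units
    if room <= 0:
        raise ValueError("tunnel MTU too small for the outer header")
    starts = range(0, max(len(inner), 1), room)
    pieces = [inner[s:s + room] for s in starts]
    return [OuterFragment(ident, i * room, i < len(pieces) - 1, p)
            for i, p in enumerate(pieces)]


def reassemble_and_decapsulate(fragments) -> bytes:
    """Step (c): the egress reassembles first, then strips oH."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    if not ordered or ordered[-1].more:
        raise ValueError("last fragment missing")
    data = b""
    for frag in ordered:
        if frag.offset != len(data):
            raise ValueError("gap or overlap between fragments")
        data += frag.payload
    return data


if __name__ == "__main__":
    inner = b"H" * 20 + b"D" * 1480  # iH followed by iD
    frags = encapsulate_and_fragment(inner, tunnel_mtu=1500, ident=1)
    # Only the first fragment begins with the inner header.
    print([(f.offset, len(f.payload), f.more) for f in frags])
    assert reassemble_and_decapsulate(frags) == inner
```

   In this model the reassembly state is held entirely at the egress,
   and the inner header bytes appear only in the fragment at offset 0.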

1601	A.2. Inner Fragmentation

1603	   Inner Fragmentation distributes the impact of tunneling across both
1604	   the decapsulator and destination, and is shown in Figure 11. Again,
1605	   the network topology is shown at the bottom of the figure, and the
1606	   original packets shown at the top. Packets arrive at the
1607	   encapsulator, and are fragmented there based on the inner header
1608	   into (a1) and (a2). The fragments arrive at the decapsulator, which
1609	   removes the outer header and forwards the resulting fragments on to
1610	   the destination. The destination is then responsible for
1611	   reassembling the fragments into the original packet.

1613	   +----+----+                                               +----+----+
1614	   | iH | iD |-------+-  -  -  -  -  -  -  -  -  -  -  -  - >| iH | iD |
1615	   +----+----+       |                                       +----+----+
1616	                     v                                            ^
1617	                +----+-----+                    +----+-----+      |
1618	           (a1) | iH'| iD1 |                    | iH'| iD1 |------+
1619	                +----+-----+                    +----+-----+      |
1620	                                                                  |
1621	                +----+-----+                    +----+-----+      |
1622	           (a2) | iH"| iD2 |                    | iH"| iD2 |------+
1623	                +----+-----+                    +----+-----+
1624	                     |                               ^
1625	                     |       +----+----+-----+       |
1626	                (b1) +----- >| oH | iH'| iD1 |-------+
1627	                     |       +----+----+-----+       |
1628	                     |                               |
1629	                     |       +----+----+-----+       |
1630	                (b2) +----- >| oH | iH"| iD2 |-------+
1631	                             +----+----+-----+
1632	   +-----+         +---+                           +---+         +-----+
1633	   |     |        /     \ ======================= /     \        |     |
1634	   | Src |=======|  Enc  |=======================|  Dec  |=======| Dst |
1635	   |     |        \     / ======================= \     /        |     |
1636	   +-----+         +---+                           +---+         +-----+

1638	                Figure 11 Fragmentation of the inner packet

1640	   As noted, inner fragmentation distributes the effort of tunneling
1641	   across the decapsulator and destinations; this can be especially
1642	   important when the tunnel aggregates large amounts of traffic. Note
1643	   that this mechanism is thus valid only when the original source
1644	   packets can be fragmented on-path, e.g., as in IPv4.

1646	   Along the tunnel, the inner headers are copied into each fragment,
1647	   and so are available to mechanisms that 'peek' into headers (e.g.,
1648	   ICMP, as discussed in Sec. 4.5). Because fragmentation happens on the
1649	   inner header, the impact of IP ID is reduced.
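
   The flow of Figure 11 can be sketched in the same style. The header
   sizes, the tuple used to stand in for the outer header, and all
   names below are illustrative assumptions; the sketch only shows
   where fragmentation and reassembly take place.

```python
from dataclasses import dataclass, replace

INNER_HDR_LEN = 20  # assumed inner header size in bytes (hypothetical)
OUTER_HDR_LEN = 20  # assumed outer header size in bytes (hypothetical)


@dataclass(frozen=True)
class InnerPacket:
    ident: int           # inner ID, copied into every fragment
    offset: int          # byte offset of data within the original iD
    more: bool           # True unless this is the last fragment
    dont_fragment: bool  # IPv4 DF bit of the inner packet
    data: bytes          # iD (or a slice of it)


def inner_fragment(pkt: InnerPacket, tunnel_mtu: int):
    """Step (a): split iD and copy iH into every fragment."""
    if pkt.dont_fragment:
        raise ValueError("DF set: inner packet cannot be fragmented")
    room = tunnel_mtu - OUTER_HDR_LEN - INNER_HDR_LEN
    room -= room % 8  # IP fragment offsets are counted in 8-byte units
    if room <= 0:
        raise ValueError("tunnel MTU too small for the headers")
    starts = range(0, max(len(pkt.data), 1), room)
    chunks = [pkt.data[s:s + room] for s in starts]
    return [replace(pkt, offset=pkt.offset + i * room,
                    more=pkt.more or i < len(chunks) - 1, data=c)
            for i, c in enumerate(chunks)]


def encapsulate(frag: InnerPacket):
    """Step (b): each inner fragment gets an unfragmented outer header."""
    return ("oH", frag)


def decapsulate(tunnel_packet) -> InnerPacket:
    """The egress strips oH and forwards the fragment; no reassembly."""
    return tunnel_packet[1]


def destination_reassemble(frags) -> bytes:
    """Reassembly is left to the final destination."""
    ordered = sorted(frags, key=lambda f: f.offset)
    return b"".join(f.data for f in ordered)


if __name__ == "__main__":
    original = InnerPacket(7, 0, False, False, b"D" * 1480)
    in_tunnel = [encapsulate(f) for f in inner_fragment(original, 1500)]
    delivered = [decapsulate(p) for p in in_tunnel]
    assert destination_reassemble(delivered) == original.data
```

   Every packet inside the tunnel carries a copy of the inner header,
   and the outer header is never fragmented, so no outer IP ID space is
   consumed by fragmentation in this model.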

1651	Appendix B. Fragmentation efficiency

1653	B.1. Selecting fragment sizes

1655	   There are different ways to fragment a packet. Consider a network
1656	   with an MTU as shown in Figure 12, where packets are encapsulated
1657	   over the same network layer as they arrive on (e.g., IP in IP). If a
1658	   packet as large as the MTU arrives, it must be fragmented to
1659	   accommodate the additional header.

1661	                 X===========================X (MTU)
1662	                 +----+----------------------+
1663	                 | iH | DDDDDDDDDDDDDDDDDDDD |
1664	                 +----+----------------------+
1665	                   |
1666	                   |  X===========================X (MTU)
1667	                   |  +---+----+------------------+
1668	               (a) +->| H'| iH | DDDDDDDDDDDDDDDD |
1669	                   |  +---+----+------------------+
1670	                   |      |
1671	                   |      |  X===========================X (MTU)
1672	                   |      |  +----+---+----+-------------+
1673	                   | (a1) +->| nH'| H | iH | DDDDDDDDDDD |
1674	                   |      |  +----+---+----+-------------+
1675	                   |      |
1676	                   |      |  +----+-------+
1677	                   | (a2) +->| nH"| DDDDD |
1678	                   |         +----+-------+
1679	                   |
1680	                   |  +---+------+
1681	               (b) +->| H"| DDDD |
1682	                      +---+------+
1683	                          |
1684	                          |  +----+---+------+
1685	                     (b1) +->| nH'| H"| DDDD |
1686	                             +----+---+------+

1688	                   Figure 12 Fragmenting via maximum fit

1690	   Figure 12 shows this process, using Outer Fragmentation as an example
1691	   (the situation is the same for Inner Fragmentation, but the headers
1692	   that are affected differ). The arriving packet is first split into
1693	   (a) and (b), where (a) is of the MTU of the network. However, this
1694	   tunnel then traverses over another tunnel, whose impact the first
1695	   tunnel ingress has not accommodated. The packet (a) arrives at the
1696	   second tunnel ingress, and needs to be encapsulated again, but
1697	   because it is already at the MTU, it needs to be fragmented as well,
1698	   into (a1) and (a2). In this case, packet (b) arrives at the second
1699	   tunnel ingress and is encapsulated into (b1) without fragmentation,
1700	   because it is already below the MTU size.

1702	   In Figure 13, the fragmentation is done evenly, i.e., by splitting
1703	   the original packet into two roughly equal-sized components, (c) and
1704	   (d). Note that (d) contains more packet data: because this is an
1705	   example of Outer Fragmentation, only (c) includes the original
1706	   packet header. The packets (c) and (d) arrive at the second tunnel
1707	   encapsulator, and are encapsulated again; this time, neither packet
1708	   exceeds the MTU, and neither requires further fragmentation.

1710	                 X===========================X (MTU)
1711	                 +----+----------------------+
1712	                 | iH | DDDDDDDDDDDDDDDDDDDD |
1713	                 +----+----------------------+
1714	                   |
1715	                   |  X===========================X (MTU)
1716	                   |  +---+----+----------+
1717	               (c) +->| H'| iH | DDDDDDDD |
1718	                   |  +---+----+----------+
1719	                   |      |
1720	                   |      |  X===========================X (MTU)
1721	                   |      |  +----+---+----+----------+
1722	                   | (c1) +->| nH | H'| iH | DDDDDDDD |
1723	                   |         +----+---+----+----------+
1724	                   |
1725	                   |  +---+--------------+
1726	               (d) +->| H"| DDDDDDDDDDDD |
1727	                      +---+--------------+
1728	                          |
1729	                          |  +----+---+--------------+
1730	                     (d1) +->| nH | H"| DDDDDDDDDDDD |
1731	                             +----+---+--------------+

1733	                       Figure 13 Fragmenting evenly
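
   The difference between Figures 12 and 13 can be checked with a small
   size calculation. The sketch below assumes Outer Fragmentation at
   both tunnels, the same MTU and header size at each level, and it
   ignores the 8-byte alignment of real IP fragments; the function
   names and the example numbers are hypothetical.

```python
import math


def encap_fragments(size: int, mtu: int, hdr: int, even: bool):
    """Sizes of the packets sent when `size` bytes are encapsulated.

    Each emitted packet carries one header of `hdr` bytes and must fit
    in `mtu`. With even=False the first fragments are filled to the MTU
    (maximum fit); with even=True the payload is split evenly.
    """
    room = mtu - hdr
    if room <= 0:
        raise ValueError("MTU too small for the header")
    count = max(1, math.ceil(size / room))
    if even:
        base, extra = divmod(size, count)
        pieces = [base + (1 if i < extra else 0) for i in range(count)]
    else:
        pieces = [room] * (count - 1) + [size - room * (count - 1)]
    return [p + hdr for p in pieces]


def through_two_tunnels(size: int, mtu: int, hdr: int, even: bool):
    """Packets on the wire after a tunnel nested inside another tunnel."""
    first = encap_fragments(size, mtu, hdr, even)
    return [s for pkt in first
            for s in encap_fragments(pkt, mtu, hdr, even)]


if __name__ == "__main__":
    # An MTU-sized packet arrives at the first ingress.
    print("maximum fit:", through_two_tunnels(1500, 1500, 20, even=False))
    print("even split: ", through_two_tunnels(1500, 1500, 20, even=True))
```

   With these example numbers, maximum fit yields three packets in the
   second tunnel (corresponding to (a1), (a2), and (b1)), whereas the
   even split yields two (corresponding to (c1) and (d1)).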

1735	B.2. Packing

1737	   Encapsulating individual packets to traverse a tunnel can be
1738	   inefficient, especially where headers are large relative to the
1739	   packets being carried. In that case, it can be more efficient to
1740	   encapsulate many small packets in a single, larger tunnel payload.
1741	   This technique, similar to the effect of packet bursting in Gigabit
1742	   Ethernet (regardless of whether they're encoded using L2 symbols as
1743	   delineators), reduces the overhead of the encapsulation headers
1744	   (Figure 14). It reduces the work of header addition and removal at
1745	   the tunnel endpoints, but increases other work involving the packing
1746	   and unpacking of the component packets carried.

1748	                     +-----+-----+
1749	                     | iHa | iDa |
1750	                     +-----+-----+
1751	                           |
1752	                           |     +-----+-----+
1753	                           |     | iHb | iDb |
1754	                           |     +-----+-----+
1755	                           |           |
1756	                           |           |     +-----+-----+
1757	                           |           |     | iHc | iDc |
1758	                           |           |     +-----+-----+
1759	                           |           |           |
1760	                           v           v           v
1761	                +----+-----+-----+-----+-----+-----+-----+
1762	                | oH | iHa | iDa | iHb | iDb | iHc | iDc |
1763	                +----+-----+-----+-----+-----+-----+-----+

1765	                  Figure 14 Packing packets into a tunnel
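
   A minimal sketch of packing follows. This document does not specify
   how packed packets are delineated, so the 2-byte length prefix used
   here is purely an assumed framing, as are the 20-byte outer header
   and all function names.

```python
import struct

OUTER_HDR_LEN = 20  # assumed outer header size in bytes (hypothetical)
LEN_PREFIX = 2      # assumed per-packet length field (hypothetical)


def pack(inner_packets) -> bytes:
    """Build one tunnel payload: a length prefix before each packet."""
    return b"".join(struct.pack("!H", len(p)) + p for p in inner_packets)


def unpack(payload: bytes):
    """Recover the component packets from a packed tunnel payload."""
    packets, pos = [], 0
    while pos < len(payload):
        (length,) = struct.unpack_from("!H", payload, pos)
        pos += LEN_PREFIX
        if pos + length > len(payload):
            raise ValueError("truncated packed payload")
        packets.append(payload[pos:pos + length])
        pos += length
    return packets


def header_overhead(count: int, packed: bool) -> int:
    """Encapsulation bytes added for `count` inner packets."""
    if packed:
        return OUTER_HDR_LEN + LEN_PREFIX * count
    return OUTER_HDR_LEN * count


if __name__ == "__main__":
    small = [b"a" * 40, b"b" * 64, b"c" * 52]
    assert unpack(pack(small)) == small
    print("separate:", header_overhead(len(small), packed=False), "bytes")
    print("packed:  ", header_overhead(len(small), packed=True), "bytes")
```

   The saving in header bytes is paid for by the extra packing and
   unpacking work at the tunnel endpoints noted above.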

1767	   [NOTE: PPP chopping and coalescing?]