idnits 2.17.1 

draft-templin-intarea-seal-16.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 9, 2010) is 5038 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  == Outdated reference: A later version (-07) exists of
     draft-ietf-intarea-ipv4-id-update-00

  == Outdated reference: A later version (-04) exists of
     draft-ietf-v6ops-tunnel-security-concerns-02

  == Outdated reference: A later version (-40) exists of
     draft-templin-intarea-vet-15

  == Outdated reference: A later version (-17) exists of draft-templin-iron-08

  -- Obsolete informational reference (is this intentional?): RFC 1063
     (Obsoleted by RFC 1191)

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    F. Templin, Ed.
3	Internet-Draft                              Boeing Research & Technology
4	Intended status: Standards Track                            July 9, 2010
5	Expires: January 10, 2011

7	        The Subnetwork Encapsulation and Adaptation Layer (SEAL)
8	                   draft-templin-intarea-seal-16.txt

10	Abstract

12	   For the purpose of this document, a subnetwork is defined as a
13	   virtual topology configured over a connected IP network routing
14	   region and bounded by encapsulating border nodes.  These virtual
15	   topologies may span multiple IP and/or sub-IP layer forwarding hops,
16	   and can introduce failure modes due to packet duplication and/or
17	   links with diverse Maximum Transmission Units (MTUs).  This document
18	   specifies a Subnetwork Encapsulation and Adaptation Layer (SEAL) that
19	   accommodates such virtual topologies over diverse underlying link
20	   technologies.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on January 10, 2011.

39	Copyright Notice

41	   Copyright (c) 2010 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	     1.1.  Motivation
58	     1.2.  Approach
59	   2.  Terminology and Requirements
60	   3.  Applicability Statement
61	   4.  SEAL Protocol Specification
62	     4.1.  Model of Operation
63	     4.2.  SEAL Header Format
64	     4.3.  ITE Specification
65	       4.3.1.  Tunnel Interface MTU
66	       4.3.2.  Tunnel Interface Soft State
67	       4.3.3.  Admitting Packets into the Tunnel
68	       4.3.4.  Mid-Layer Encapsulation
69	       4.3.5.  SEAL Segmentation
70	       4.3.6.  Outer Encapsulation
71	       4.3.7.  Probing Strategy
72	       4.3.8.  Identification
73	       4.3.9.  Sending SEAL Protocol Packets
74	       4.3.10. Processing Raw ICMP Messages
75	     4.4.  ETE Specification
76	       4.4.1.  Reassembly Buffer Requirements
77	       4.4.2.  IP-Layer Reassembly
78	       4.4.3.  SEAL-Layer Reassembly
79	       4.4.4.  Decapsulation and Delivery to Upper Layers
80	     4.5.  The SEAL Control Message Protocol (SCMP)
81	       4.5.1.  Generating SCMP Messages
82	       4.5.2.  Processing SCMP Messages
83	     4.6.  Tunnel Endpoint Synchronization
84	   5.  Link Requirements
85	   6.  End System Requirements
86	   7.  Router Requirements
87	   8.  IANA Considerations
88	   9.  Security Considerations
89	   10. Related Work
90	   11. SEAL Advantages over Classical Methods
91	   12. Acknowledgments
92	   13. References
93	     13.1. Normative References
94	     13.2. Informative References
95	   Appendix A.  Reliability
96	   Appendix B.  Integrity
97	   Appendix C.  Transport Mode
98	   Appendix D.  Historic Evolution of PMTUD
99	   Author's Address

101	1.  Introduction

103	   As Internet technology and communication have grown and matured, many
104	   techniques have developed that use virtual topologies (including
105	   tunnels of one form or another) over an actual network that supports
106	   the Internet Protocol (IP) [RFC0791][RFC2460].  Those virtual
107	   topologies have elements that appear as one hop in the virtual
108	   topology, but are actually multiple IP or sub-IP layer hops.  These
109	   multiple hops often have quite diverse properties that are often not
110	   even visible to the endpoints of the virtual hop.  This introduces
111	   failure modes that are not dealt with well in current approaches.

113	   The use of IP encapsulation has long been considered as the means for
114	   creating such virtual topologies.  However, the insertion of an outer
115	   IP header reduces the effective path MTU visible to the inner network
116	   layer.  When IPv4 is used, this reduced MTU can be accommodated
117	   through the use of IPv4 fragmentation, but unmitigated in-the-network
118	   fragmentation has been found to be harmful through operational
119	   experience and studies conducted over the course of many years
120	   [FRAG][FOLK][RFC4963].  Additionally, classical path MTU discovery
121	   [RFC1191] has known operational issues that are exacerbated by in-
122	   the-network tunnels [RFC2923][RFC4459].  The following subsections
123	   present further details on the motivation and approach for addressing
124	   these issues.

126	1.1.  Motivation

128	   Before discussing the approach, it is necessary to first understand
129	   the problems.  In both the Internet and private-use networks today,
130	   IPv4 is ubiquitously deployed as the Layer 3 protocol.  The two
131	   primary functions of IPv4 are to provide for 1) addressing, and 2) a
132	   fragmentation and reassembly capability used to accommodate links
133	   with diverse MTUs.  While it is well known that the IPv4 address
134	   space is rapidly becoming depleted, there is a lesser-known but
135	   growing consensus that other IPv4 protocol limitations have already
136	   or may soon become problematic.

138	   First, the IPv4 header Identification field is only 16 bits in
139	   length, meaning that at most 2^16 unique packets with the same
140	   (source, destination, protocol)-tuple may be active in the Internet
141	   at a given time [I-D.ietf-intarea-ipv4-id-update].  Due to the
142	   escalating deployment of high-speed links (e.g., 1Gbps Ethernet),
143	   however, this number may soon become too small by several orders of
144	   magnitude for high data rate packet sources such as tunnel endpoints
145	   [RFC4963].  Furthermore, there are many well-known limitations
146	   pertaining to IPv4 fragmentation and reassembly - even to the point
147	   that it has been deemed "harmful" in both classic and modern-day
148	   studies (cited above).  In particular, IPv4 fragmentation raises
149	   issues ranging from minor annoyances (e.g., in-the-network router
150	   fragmentation) to the potential for major integrity issues (e.g.,
151	   mis-association of the fragments of multiple IP packets during
152	   reassembly [RFC4963]).
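
   As a rough illustration of the Identification wrap concern described
   above, the following Python sketch estimates how long a sender takes
   to cycle through all 2^16 Identification values.  It is only a
   back-of-the-envelope model: it assumes back-to-back, fixed-size
   packets for a single (source, destination, protocol)-tuple and
   ignores link-layer framing.  The function name is illustrative and
   does not come from this document.

```python
ID_SPACE = 2 ** 16  # number of distinct 16-bit IPv4 Identification values


def id_wrap_seconds(link_bps: float, packet_bytes: int) -> float:
    """Seconds until a sender at full line rate must reuse an ID value.

    Assumption: every packet is packet_bytes long and the link is kept
    fully busy by one (source, destination, protocol)-tuple.
    """
    if link_bps <= 0 or packet_bytes <= 0:
        raise ValueError("link_bps and packet_bytes must be positive")
    packets_per_second = link_bps / (packet_bytes * 8)
    return ID_SPACE / packets_per_second


# Example: a 1 Gbps link carrying 1500-byte and 64-byte packets.
for size in (1500, 64):
    print(size, "bytes:", id_wrap_seconds(1e9, size), "seconds")
```

   Under these assumptions, a 1 Gbps source sending 1500-byte packets
   wraps the Identification space in roughly 0.79 seconds, and smaller
   packets wrap it sooner.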

154	   As a result of these perceived limitations, a fragmentation-avoiding
155	   technique for discovering the MTU of the forward path from a source
156	   to a destination node was devised through the deliberations of the
157	   Path MTU Discovery Working Group (PMTUDWG) during the late 1980's
158	   through early 1990's (see Appendix D).  In this method, the source
159	   node provides explicit instructions to routers in the path to discard
160	   the packet and return an ICMP error message if an MTU restriction is
161	   encountered.  However, this approach has several serious shortcomings
162	   that lead to an overall "brittleness" [RFC2923].

164	   In particular, site border routers in the Internet are being
165	   configured more and more to discard ICMP error messages coming from
166	   the outside world.  This is due in large part to the fact that
167	   malicious spoofing of error messages in the Internet is made simple
168	   since there is no way to authenticate the source of the messages
169	   [I-D.ietf-tcpm-icmp-attacks].  Furthermore, when a source node
170	   relies on ICMP error messages to learn that a packet was dropped due
171	   to an MTU restriction but never receives them, a path MTU-related
172	   black hole occurs.  This means that the source will continue to send
173	   packets that are too large and never receive an indication from the
174	   network that they are being discarded.  This behavior has been
175	   confirmed through documented studies showing clear evidence of path
176	   MTU discovery failures in the Internet today [TBIT][WAND].

178	   The issues with both IPv4 fragmentation and this "classical" method
179	   of path MTU discovery are exacerbated further when IP tunneling is
180	   used [RFC4459].  For example, ingress tunnel endpoints (ITEs) may be
181	   required to forward encapsulated packets into the subnetwork on
182	   behalf of hundreds, thousands, or even more original sources in the
183	   end site.  If the ITE allows IPv4 fragmentation on the encapsulated
184	   packets, persistent fragmentation could lead to undetected data
185	   corruption due to Identification field wrapping.  If the ITE instead
186	   uses classical IPv4 path MTU discovery, it may be inconvenienced by
187	   excessive ICMP error messages coming from the subnetwork that may be
188	   either suspect or contain insufficient information for translation
189	   into error messages to be returned to the original sources.

191	   Although recent works have led to the development of a robust end-to-
192	   end MTU determination scheme [RFC4821], this approach requires
193	   tunnels to present a consistent MTU the same as for ordinary links on
194	   the end-to-end path.  Moreover, in current practice existing
195	   tunneling protocols mask the MTU issues by selecting a "lowest common
196	   denominator" MTU that may be much smaller than necessary for most
197	   paths and difficult to change at a later date.  Due to these many
198	   considerations, a new approach to accommodate tunnels over links with
199	   diverse MTUs is necessary.

201	1.2.  Approach

203	   For the purpose of this document, a subnetwork is defined as a
204	   virtual topology configured over a connected network routing region
205	   and bounded by encapsulating border nodes.  Examples include the
206	   global Internet interdomain routing core, Mobile Ad hoc Networks
207	   (MANETs) and enterprise networks.  Subnetwork border nodes forward
208	   unicast and multicast packets over the virtual topology across
209	   multiple IP and/or sub-IP layer forwarding hops that may introduce
210	   packet duplication and/or traverse links with diverse Maximum
211	   Transmission Units (MTUs).

213	   This document introduces a Subnetwork Encapsulation and Adaptation
214	   Layer (SEAL) for tunneling network layer protocols (e.g., IP, OSI,
215	   etc.) over IP subnetworks that connect Ingress and Egress Tunnel
216	   Endpoints (ITEs/ETEs) of border nodes.  It provides a modular
217	   specification designed to be tailored to specific associated
218	   tunneling protocols.  A transport-mode of operation is also possible,
219	   and described in Appendix C.  SEAL accommodates links with diverse
220	   MTUs, protects against off-path denial-of-service attacks, and
221	   supports efficient duplicate packet detection through the use of a
222	   minimal mid-layer encapsulation.

224	   SEAL specifically treats tunnels that traverse the subnetwork as
225	   ordinary links that must support network layer services.  As for any
226	   link, tunnels that use SEAL must provide suitable networking services
227	   including best-effort datagram delivery, integrity and consistent
228	   handling of packets of various sizes.  As for any link whose media
229	   cannot provide suitable services natively, tunnels that use SEAL
230	   employ link-level adaptation functions to meet the legitimate
231	   expectations of the network layer service.  As this is essentially a
232	   link-level adaptation, SEAL is permitted to alter packets
233	   within the subnetwork as long as it restores them to their original
234	   form when they exit the subnetwork.  The mechanisms described within
235	   this document are designed precisely for this purpose.

237	   SEAL encapsulation introduces an extended Identification field for
238	   packet identification and a mid-layer segmentation and reassembly
239	   capability that allows simplified cutting and pasting of packets.
240	   Moreover, SEAL senses in-the-network fragmentation as a "noise"
241	   indication that packet sizing parameters are "out of tune" with
242	   respect to the network path.  As a result, SEAL can naturally tune
243	   its packet sizing parameters to eliminate the in-the-network
244	   fragmentation.  This approach is in contrast to existing tunneling
245	   protocol practices which seek to avoid MTU issues by selecting a
246	   "lowest common denominator" MTU that may be overly conservative for
247	   many tunnels and difficult to change even when larger MTUs become
248	   available.

250	   The following sections provide the SEAL normative specifications,
251	   while the appendices present non-normative additional considerations.

253	2.  Terminology and Requirements

255	   The following terms are defined within the scope of this document:

257	   subnetwork
258	      a virtual topology configured over a connected network routing
259	      region and bounded by encapsulating border nodes.

261	   Ingress Tunnel Endpoint
262	      a virtual interface over which an encapsulating border node (host
263	      or router) sends encapsulated packets into the subnetwork.

265	   Egress Tunnel Endpoint
266	      a virtual interface over which an encapsulating border node (host
267	      or router) receives encapsulated packets from the subnetwork.

269	   inner packet
270	      an unencapsulated network layer protocol packet (e.g., IPv6
271	      [RFC2460], IPv4 [RFC0791], OSI/CLNP [RFC1070], etc.) before any
272	      mid-layer or outer encapsulations are added.  Internet protocol
273	      numbers that identify inner packets are found in the IANA Internet
274	      Protocol registry [RFC3232].

276	   mid-layer packet
277	      a packet resulting from adding mid-layer encapsulating headers to
278	      an inner packet.

280	   outer IP packet
281	      a packet resulting from adding an outer IP header to a mid-layer
282	      packet.

284	   packet-in-error
285	      the leading portion of an invoking data packet encapsulated in the
286	      body of an error control message (e.g., an ICMPv4 [RFC0792] error
287	      message, an ICMPv6 [RFC4443] error message, etc.).

289	   IP, IPvX, IPvY
290	      used to generically refer to either IP protocol version, i.e.,
291	      IPv4 or IPv6.

293	   The following abbreviations correspond to terms used within this
294	   document and elsewhere in common Internetworking nomenclature:

296	      DF - the IPv4 header "Don't Fragment" flag [RFC0791]

298	      ETE - Egress Tunnel Endpoint

300	      HLEN - the sum of MHLEN and OHLEN

302	      ITE - Ingress Tunnel Endpoint

304	      MHLEN - the length of any mid-layer headers and trailers

306	      OHLEN - the length of the outer encapsulating headers and
307	      trailers, including the outer IP header, the SEAL header and any
308	      other outer headers and trailers.

310	      PTB - a Packet Too Big message recognized by the inner network
311	      layer, e.g., an ICMPv6 "Packet Too Big" message [RFC4443], an
312	      ICMPv4 "Fragmentation Needed" message [RFC0792], etc.

314	      S_MRU - the SEAL Maximum Reassembly Unit

316	      S_MSS - the SEAL Maximum Segment Size

318	      SCMP - the SEAL Control Message Protocol

320	      SEAL_ID - an Identification value, randomly initialized and
321	      monotonically incremented for each SEAL protocol packet

323	      SEAL_PORT - a TCP/UDP service port number used for SEAL

325	      SEAL_PROTO - an IPv4 protocol number used for SEAL

327	      TE - Tunnel Endpoint (i.e., either ingress or egress)

329	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
330	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
331	   document are to be interpreted as described in [RFC2119].  When used
332	   in lower case (e.g., must, must not, etc.), these words MUST NOT be
333	   interpreted as described in [RFC2119], but are rather interpreted as
334	   they would be in common English.

336	3.  Applicability Statement

338	   SEAL was originally motivated by the specific case of subnetwork
339	   abstraction for Mobile Ad hoc Networks (MANETs); however, it soon
340	   became apparent that the domain of applicability also extends to
341	   subnetwork abstractions of enterprise networks, ISP networks, SOHO
342	   networks, the interdomain routing core, and any other networking
343	   scenario involving IP encapsulation.  SEAL and its associated
344	   technologies (including Virtual Enterprise Traversal (VET)
345	   [I-D.templin-intarea-vet]) are functional building blocks for a new
346	   Internetworking architecture based on Routing and Addressing in
347	   Networks with Global Enterprise Recursion (RANGER)
348	   [RFC5720][I-D.russert-rangers] and the Internet Routing Overlay
349	   Network (IRON) [I-D.templin-iron].

351	   SEAL provides a network sublayer for encapsulation of an inner
352	   network layer packet within outer encapsulating headers.  For
353	   example, for IPvX in IPvY encapsulation (e.g., as IPv4/SEAL/IPv6),
354	   the SEAL header appears as a subnetwork encapsulation as seen by the
355	   inner IP layer.  SEAL can also be used as a sublayer within a UDP
356	   data payload (e.g., as IPv4/UDP/SEAL/IPv6 similar to Teredo
357	   [RFC4380]), where UDP encapsulation is typically used for NAT
358	   traversal as well as operation over subnetworks that give
359	   preferential treatment to the "core" Internet protocols (i.e., TCP
360	   and UDP).  The SEAL header is processed the same as for IPv6
361	   extension headers, i.e., it is not part of the outer IP header but
362	   rather allows for the creation of an arbitrarily extensible chain of
363	   headers in the same way that IPv6 does.

365	   SEAL supports a segmentation and reassembly capability for adapting
366	   the network layer to the underlying subnetwork characteristics, where
367	   the Egress Tunnel Endpoint (ETE) determines how much or how little
368	   reassembly it is willing to support.  In the limiting case, the ETE
369	   acts as a passive observer that simply informs the Ingress Tunnel
370	   Endpoint (ITE) of any MTU limitations and otherwise discards all
371	   packets that arrive as multiple fragments.  This mode is useful for
372	   determining an appropriate MTU for tunnels between performance-
373	   critical routers connected to high data rate subnetworks such as the
374	   Internet DFZ, as well as for other uses in which reassembly would
375	   present too great a burden for the routers or end systems.

377	   When the ETE supports reassembly, the tunnel can be used to transport
378	   packets that are too large to traverse the path without
379	   fragmentation.  In this mode, the ITE determines the tunnel MTU based
380	   on the largest packet the ETE is capable of reassembling rather than
381	   on the MTU of the smallest link in the path.  Therefore, tunnel
382	   endpoints that use SEAL can transport packets that are much larger
383	   than the underlying subnetwork links themselves can carry in a single
384	   piece.

386	   SEAL tunnels may be configured over paths that include not only
387	   ordinary physical links, but also virtual links that may include
388	   other tunnels.  An example application would be linking two
389	   geographically remote supercomputer centers with large MTU links by
390	   configuring a SEAL tunnel across the Internet.  A second example
391	   would be support for sub-IP segmentation over low-end links,
392	   especially over wireless transmission media such as IEEE 802.15.4,
393	   broadcast radio links in Mobile Ad-hoc Networks (MANETs), Very High
394	   Frequency (VHF) civil aviation data links, etc.

396	   Many other use case examples are anticipated, and will be identified
397	   as further experience is gained.

399	4.  SEAL Protocol Specification

401	   The following sections specify the operation of the SEAL protocol.

403	4.1.  Model of Operation

405	   SEAL is an encapsulation sublayer that supports a multi-level
406	   segmentation and reassembly capability for the transmission of
407	   unicast and multicast packets across an underlying IP subnetwork with
408	   heterogeneous links.  First, the ITE can use IPv4 fragmentation to
409	   fragment inner IPv4 packets before SEAL encapsulation if necessary.
410	   Secondly, the SEAL layer itself provides a simple cutting-and-pasting
411	   capability for mid-layer packets that can be used to avoid IP
412	   fragmentation on the outer packet.  Finally, ordinary IP
413	   fragmentation is permitted on the outer packet after SEAL
414	   encapsulation and is used to detect and tune out any in-the-network
415	   fragmentation.

417	   SEAL-enabled ITEs encapsulate each inner packet in any mid-layer
418	   headers and trailers, segment the resulting mid-layer packet into
419	   multiple segments if necessary, then append a SEAL header and any
420	   outer encapsulations to each segment.  As an example, for IPv6-in-
421	   IPv4 encapsulation a single-segment inner IPv6 packet encapsulated in
422	   any mid-layer headers and trailers, followed by the SEAL header,
423	   followed by any outer headers and trailers, followed by an outer IPv4
424	   header would appear as shown in Figure 1:

426	                                       +--------------------+
427	                                       ~  outer IPv4 header ~
428	                                       +--------------------+
429	   I                                   ~  other outer hdrs  ~
430	   n                                   +--------------------+
431	   n                                   ~    SEAL Header     ~
432	   e      +--------------------+       +--------------------+
433	   r      ~  mid-layer headers ~       ~  mid-layer headers ~
434	          +--------------------+       +--------------------+
435	   I -->  |                    |  -->  |                    |
436	   P -->  ~     inner IPv6     ~  -->  ~     inner IPv6     ~
437	   v -->  ~       Packet       ~  -->  ~       Packet       ~
438	   6 -->  |                    |  -->  |                    |
439	          +--------------------+       +--------------------+
440	   P      ~ mid-layer trailers ~       ~ mid-layer trailers ~
441	   a      +--------------------+       +--------------------+
442	   c                                   ~   outer trailers   ~
443	   k         Mid-layer packet          +--------------------+
444	   e      after mid-layer encaps.
445	   t                                      Outer IPv4 packet
446	                                     after SEAL and outer encaps.

448	               Figure 1: SEAL Encapsulation - Single Segment

450	   As a second example, for IPv4-in-IPv6 encapsulation an inner IPv4
451	   packet requiring three SEAL segments would appear as three separate
452	   outer IPv6 packets, where the mid-layer headers are carried only in
453	   segment 0 and the mid-layer trailers are carried in segment 2 as
454	   shown in Figure 2:

456	   +------------------+                          +------------------+
457	   ~  outer IPv6 hdr  ~                          ~  outer IPv6 hdr  ~
458	   +------------------+   +------------------+   +------------------+
459	   ~ other outer hdrs ~   ~  outer IPv6 hdr  ~   ~ other outer hdrs ~
460	   +------------------+   +------------------+   +------------------+
461	   ~ SEAL hdr (SEG=0) ~   ~ other outer hdrs ~   ~ SEAL hdr (SEG=2) ~
462	   +------------------+   +------------------+   +------------------+
463	   ~  mid-layer hdrs  ~   ~ SEAL hdr (SEG=1) ~   |    inner IPv4    |
464	   +------------------+   +------------------+   ~      Packet      ~
465	   |    inner IPv4    |   |    inner IPv4    |   |    (Segment 2)   |
466	   ~      Packet      ~   ~      Packet      ~   +------------------+
467	   |    (Segment 0)   |   |    (Segment 1)   |   ~ mid-layer trails ~
468	   +------------------+   +------------------+   +------------------+
469	   ~  outer trailers  ~   ~  outer trailers  ~   ~  outer trailers  ~
470	   +------------------+   +------------------+   +------------------+

472	   Segment 0 (includes    Segment 1 (no mid-     Segment 2 (includes
473	     mid-layer hdrs)        layer encaps)         mid-layer trails)

475	             Figure 2: SEAL Encapsulation - Multiple Segments
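
   The cutting-and-pasting behavior pictured in Figure 2 can be sketched
   in Python as follows.  This is an illustration under stated
   assumptions rather than the normative procedure: the names
   seal_segment, seal_reassemble and the s_mss parameter are
   hypothetical, segments are cut at fixed s_mss boundaries, and the
   limit of 256 segments is inferred from the 8-bit segment number.
   Because the mid-layer packet is cut in order, any mid-layer headers
   fall in the first segment and any mid-layer trailers in the last.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    f: bool      # "First Segment" bit
    m: bool      # "More Segments" bit
    seg: int     # segment index (0 for the first segment)
    data: bytes  # this segment's slice of the mid-layer packet


def seal_segment(mid_layer_packet: bytes, s_mss: int) -> List[Segment]:
    """Cut a mid-layer packet into segments of at most s_mss bytes."""
    if s_mss <= 0:
        raise ValueError("s_mss must be positive")
    chunks = [mid_layer_packet[i:i + s_mss]
              for i in range(0, len(mid_layer_packet), s_mss)] or [b""]
    if len(chunks) > 256:
        raise ValueError("segment number must fit in 8 bits")
    last = len(chunks) - 1
    return [Segment(f=(n == 0), m=(n < last), seg=n, data=chunk)
            for n, chunk in enumerate(chunks)]


def seal_reassemble(segments: List[Segment]) -> bytes:
    """Paste segments back together, tolerating out-of-order arrival."""
    ordered = sorted(segments, key=lambda s: s.seg)
    if [s.seg for s in ordered] != list(range(len(ordered))):
        raise ValueError("missing or duplicate segment")
    if not ordered or not ordered[0].f or ordered[-1].m:
        raise ValueError("first or final segment not present")
    return b"".join(s.data for s in ordered)
```

   For example, seal_segment(packet, 4) applied to a 10-byte packet
   yields three segments of 4, 4 and 2 bytes, and seal_reassemble()
   restores the original bytes regardless of arrival order.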

477	   The SEAL header itself is inserted according to the specific
478	   tunneling protocol.  Examples include the following:

480	   o  For simple encapsulation of an inner network layer packet within
481	      an outer IPvX header (e.g., [RFC1070][RFC2003][RFC2473][RFC4213],
482	      etc.), the SEAL header is inserted between the inner packet and
483	      outer IPvX headers as: IPvX/SEAL/{inner packet}.

485	   o  For encapsulations over transports such as UDP (e.g., [RFC4380]),
486	      the SEAL header is inserted between the outer transport layer
487	      header and the mid-layer packet, e.g., as IPvX/UDP/SEAL/{mid-layer
488	      packet}.  Here, the UDP header is seen as an "other outer header".

490	   SEAL-encapsulated packets include a SEAL_ID that the TEs maintain as
491	   either a monotonically-incrementing packet identification number or
492	   as a static nonce to identify the tunnel.  When the SEAL_ID is
493	   maintained as a packet identifier, routers within the subnetwork can
494	   use it for duplicate packet detection and the TEs can use it for SEAL
495	   segmentation/reassembly.  TEs can also use the SEAL_ID to detect off-
496	   path attacks whether it is maintained as a packet identifier or a
497	   nonce.
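
   A minimal sketch of the packet-identifier form of the SEAL_ID is
   shown below.  It follows the definition given in Section 2 (randomly
   initialized, monotonically incremented) and the 48-bit field width
   described in Section 4.2.  Wrapping modulo 2^48 and the use of
   Python's secrets module for the initial value are assumptions of
   this sketch; the class name is hypothetical.

```python
import secrets
from typing import Optional

SEAL_ID_BITS = 48
SEAL_ID_MASK = (1 << SEAL_ID_BITS) - 1


class SealIdCounter:
    """Per-tunnel SEAL_ID source used as a packet identifier."""

    def __init__(self, initial: Optional[int] = None) -> None:
        # Random initialization; a fixed value may be passed for testing.
        if initial is None:
            initial = secrets.randbits(SEAL_ID_BITS)
        self._next = initial & SEAL_ID_MASK

    def next_id(self) -> int:
        """Return the current SEAL_ID and advance by one (mod 2^48)."""
        value = self._next
        self._next = (self._next + 1) & SEAL_ID_MASK
        return value
```

   A TE that instead uses the SEAL_ID as a static nonce would simply
   draw one random 48-bit value per tunnel and reuse it unchanged.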

499	   The following sections specify the SEAL header format and SEAL-
500	   related operations of the ITE and ETE, respectively.

502	4.2.  SEAL Header Format

504	   The SEAL header is formatted as follows:

506	       0                   1                   2                   3
507	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
508	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
509	      |VER|A|I|F|M|RSV|  NEXTHDR/SEG  |    SEAL_ID (bits 47 - 32)     |
510	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
511	      |                   SEAL_ID (bits 31 - 0)                       |
512	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

514	                       Figure 3: SEAL Header Format

516	   where the header fields are defined as:

518	   VER (2)
519	      a 2-bit version field.  This document specifies Version 0 of the
520	      SEAL protocol, i.e., the VER field encodes the value 0.

522	   A (1)
523	      the "Acknowledgement Requested" bit.  Set to 1 by the ITE in data
524	      packets if it wishes to receive an explicit acknowledgement from
525	      the ETE.

527	   I (1)
528	      the "Identifier" bit.  Set to 1 if the SEAL_ID contains a
529	      monotonically-incrementing packet identifier; set to 0 if the
530	      SEAL_ID contains a static nonce.

532	   F (1)
533	      the "First Segment" bit.  Set to 1 if this SEAL protocol packet
534	      contains the first segment (i.e., Segment #0) of a mid-layer
535	      packet.

537	   M (1)
538	      the "More Segments" bit.  Set to 1 if this SEAL protocol packet
539	      contains a non-final segment of a multi-segment mid-layer packet.

541	   RSV (2)
542	      a 2-bit "Reserved" field.  Set to zero by the ITE and ignored by
543	      the ETE.

545	   NEXTHDR/SEG (8)
546	      an 8-bit field.  When 'F'=1, encodes the next header Internet
547	      Protocol number the same as for the IPv4 protocol and IPv6 next
548	      header fields.  When 'F'=0, encodes a segment number of a multi-
549	      segment mid-layer packet.  (The segment number 0 is reserved.)

551	   SEAL_ID (48)
552	      a 48-bit Identification or nonce field.

554	   Setting of the various bits and fields of the SEAL header is
555	   specified in the following sections.
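
   As an informal illustration of the layout in Figure 3, the following
   Python sketch packs and unpacks the 8-byte SEAL header.  The function
   names and the dictionary returned are hypothetical and are not part
   of this specification; the sketch assumes network byte order and
   always sends the RSV bits as zero.

```python
import struct

def pack_seal_header(ver, a, i, f, m, nexthdr_seg, seal_id):
    """Pack the 8-byte SEAL header of Figure 3 (hypothetical helper)."""
    if not 0 <= ver <= 3:
        raise ValueError("VER is a 2-bit field")
    if not 0 <= nexthdr_seg <= 0xFF:
        raise ValueError("NEXTHDR/SEG is an 8-bit field")
    if not 0 <= seal_id < (1 << 48):
        raise ValueError("SEAL_ID is a 48-bit field")
    # First octet, most significant bit first: VER(2) A I F M RSV(2).
    # The two RSV bits are left as zero.
    flags = ((ver << 6) | (int(bool(a)) << 5) | (int(bool(i)) << 4)
             | (int(bool(f)) << 3) | (int(bool(m)) << 2))
    # SEAL_ID is split into its upper 16 bits and lower 32 bits.
    return struct.pack("!BBHI", flags, nexthdr_seg,
                       seal_id >> 32, seal_id & 0xFFFFFFFF)

def unpack_seal_header(data):
    """Parse the first 8 bytes of data; the RSV bits are ignored."""
    flags, nexthdr_seg, id_hi, id_lo = struct.unpack("!BBHI", data[:8])
    return {
        "ver": flags >> 6,
        "a": (flags >> 5) & 1,
        "i": (flags >> 4) & 1,
        "f": (flags >> 3) & 1,
        "m": (flags >> 2) & 1,
        "nexthdr_seg": nexthdr_seg,
        "seal_id": (id_hi << 32) | id_lo,
    }
```

   For example, a first segment with A=1, I=1, F=1, M=0 and NEXTHDR=58
   packs into a header whose first two octets are 0x38 and 0x3a.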

557	4.3.  ITE Specification

559	4.3.1.  Tunnel Interface MTU

561	   The ITE configures a tunnel virtual interface over one or more
562	   underlying links that connect the border node to the subnetwork.  The
563	   tunnel interface must present a fixed MTU to Layer 3 as the size for
564	   admission of inner packets into the tunnel.  Since the tunnel
565	   interface may support a large set of ETEs that accept widely varying
566	   maximum packet sizes, however, a number of factors should be taken
567	   into consideration when selecting a tunnel interface MTU.

569	   Due to the ubiquitous deployment of standard Ethernet and similar
570	   networking gear, the nominal Internet cell size has become 1500
571	   bytes; this is the de facto size that end systems have come to expect
572	   will either be delivered by the network without loss due to an MTU
573	   restriction on the path or a suitable ICMP Packet Too Big (PTB)
574	   message returned.  When the 1500 byte packets sent by end systems
575	   incur additional encapsulation at an ITE, however, they may be
576	   dropped silently since the network may not always deliver the
577	   necessary PTBs [RFC2923].

579	   The ITE should therefore set a tunnel virtual interface MTU of at
580	   least 1500 bytes plus extra room to accommodate any additional
581	   encapsulations that may occur on the path from the original source.
582	   The ITE can set larger MTU values still, but should select a value
583	   that is not so large as to cause excessive PTBs coming from within
584	   the tunnel interface.  The ITE can also set smaller MTU values;
585	   however, care must be taken not to set so small a value that original
586	   sources would experience an MTU underflow.  In particular, IPv6
587	   sources must see a minimum path MTU of 1280 bytes, and IPv4 sources
588	   should see a minimum path MTU of 576 bytes.

590	   The ITE can alternatively set an indefinite MTU on the tunnel virtual
591	   interface such that all inner packets are admitted into the interface
592	   without regard to size.  For ITEs that host applications, this option
593	   must be carefully coordinated with protocol stack upper layers, since
594	   some upper layer protocols (e.g., TCP) derive their packet sizing
595	   parameters from the MTU of the outgoing interface and as such may
596	   select too large an initial size.  This is not a problem for upper
597	   layers that use conservative initial maximum segment size estimates
598	   and/or when the tunnel interface can reduce the upper layer's maximum
599	   segment size (e.g., the size advertised in the TCP MSS option) based
600	   on the per-neighbor MTU.

602	   The inner network layer protocol consults the tunnel interface MTU
603	   when admitting a packet into the interface.  For inner IPv4 packets
604	   with the IPv4 Don't Fragment (DF) bit set to 0, if the packet is
605	   larger than the tunnel interface MTU the inner IPv4 layer uses IPv4
606	   fragmentation to break the packet into fragments no larger than the
607	   tunnel interface MTU.  The ITE then admits each fragment into the
608	   tunnel as an independent packet.

610	   For all other inner packets, the ITE admits the packet if it is no
611	   larger than the tunnel interface MTU; otherwise, it drops the packet
612	   and sends a PTB error message to the source with the MTU value set to
613	   the tunnel interface MTU.  The message must contain as much of the
614	   invoking packet as possible without the entire message exceeding the
615	   network layer minimum MTU (e.g., 576 bytes for IPv4, 1280 bytes for
616	   IPv6, etc.).

618	   Note that when the tunnel interface sets an indefinite MTU the ITE
619	   unconditionally admits all packets into the interface without
620	   fragmentation.  In light of the above considerations, it is
621	   RECOMMENDED that the ITE configure an indefinite MTU on the tunnel
622	   virtual interface and adapt to any per-neighbor MTU limitations
623	   within the tunnel virtual interface as described in the following
624	   sections.
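
   The admission rules of this section can be summarized by the
   following non-normative Python sketch.  The function name, the
   returned tuples, and the use of None to model an indefinite MTU are
   assumptions made for illustration only.

```python
def interface_admission(pkt_len, ip_version, df, if_mtu=None):
    """Decide how the inner network layer treats a packet of pkt_len bytes.

    if_mtu=None models an indefinite tunnel interface MTU (an assumption
    of this sketch).  Returns (action, value) where action is one of
    "admit", "fragment" or "ptb".
    """
    # Indefinite MTU: every packet is admitted without fragmentation.
    if if_mtu is None or pkt_len <= if_mtu:
        return ("admit", None)
    # Inner IPv4 with DF=0: fragment to no more than the interface MTU,
    # then admit each fragment as an independent packet.
    if ip_version == 4 and not df:
        return ("fragment", if_mtu)
    # All other inner packets: drop and report the interface MTU.
    return ("ptb", if_mtu)
```

   With a fixed interface MTU of 1500, for instance, a 2000-byte IPv4
   packet with DF=0 is fragmented, while a 2000-byte IPv6 packet elicits
   a PTB message reporting 1500.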

626	4.3.2.  Tunnel Interface Soft State

628	   For each ETE, the ITE maintains soft state within the tunnel
629	   interface (e.g., in a neighbor cache) used to support inner
630	   fragmentation and SEAL segmentation for packets admitted into the
631	   tunnel interface.  The soft state includes the following:

633	   o  a Mid-layer Header Length (MHLEN); set to the length of any mid-
634	      layer encapsulation headers and trailers that must be added before
635	      SEAL segmentation.

637	   o  an Outer Header Length (OHLEN); set to the length of the outer IP,
638	      SEAL and other outer encapsulation headers and trailers.

640	   o  a total Header Length (HLEN); set to MHLEN plus OHLEN.

642	   o  a SEAL Maximum Segment Size (S_MSS).  The ITE initializes S_MSS to
643	      the underlying interface MTU if the underlying interface MTU can
644	      be determined (otherwise, the ITE initializes S_MSS to
645	      "infinity").  The ITE decreases or increases S_MSS based on any
646	      SCMP "MTU Report" messages received (see Section 4.5).

648	   o  a SEAL Maximum Reassembly Unit (S_MRU).  If the ITE is not
649	      configured to use SEAL segmentation, it initializes S_MRU to the
650	      static value 0.  Otherwise, it initializes S_MRU to "infinity" and
651	      decreases or increases S_MRU based on any SCMP MTU Report messages
652	      received (see Section 4.5).  When (S_MRU>(S_MSS*256)), the ITE
653	      uses (S_MSS*256) as the effective S_MRU value.

655	   Note that S_MSS and S_MRU include the length of the outer and mid-
656	   layer encapsulating headers and trailers (i.e., HLEN), since the ETE
657	   must retain the headers and trailers during reassembly.  Note also
658	   that the ITE maintains S_MSS and S_MRU as 32-bit values such that
659	   inner packets larger than 64KB (e.g., IPv6 jumbograms [RFC2675]) can
660	   be accommodated when appropriate for a given subnetwork.
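
   One possible in-memory representation of this per-ETE soft state is
   sketched below in Python.  The class and attribute names are
   hypothetical, and "infinity" is modeled with a floating-point
   infinity rather than the 32-bit values described above; this is an
   assumption of the sketch, not a requirement.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf  # stands in for the "infinity" initial value

@dataclass
class EteSoftState:
    """Per-ETE soft state kept by the ITE (hypothetical layout)."""
    mhlen: int                 # mid-layer headers and trailers
    ohlen: int                 # outer IP, SEAL and other outer headers
    s_mss: float = INFINITY    # SEAL Maximum Segment Size
    s_mru: float = INFINITY    # SEAL Maximum Reassembly Unit

    @property
    def hlen(self):
        # HLEN is defined as MHLEN plus OHLEN.
        return self.mhlen + self.ohlen

    @property
    def effective_s_mru(self):
        # When S_MRU > S_MSS*256, S_MSS*256 is used instead.
        return min(self.s_mru, self.s_mss * 256)
```

   An ITE that is not configured for SEAL segmentation would create the
   state with s_mru=0, in which case the effective S_MRU is also 0.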

662	4.3.3.  Admitting Packets into the Tunnel

664	   After the ITE admits an inner packet/fragment into the tunnel
665	   interface, it uses the following algorithm to determine whether the
666	   packet can be accommodated and (if so) whether (further) inner IP
667	   fragmentation is needed:

669	   o  if the inner packet is unfragmentable (e.g., an IPv6 packet, an
670	      IPv4 packet with DF=1, etc.), and the packet is larger than
671	      (MAX(S_MRU, S_MSS) - HLEN), the ITE drops the packet and sends a
672	      PTB message to the original source with an MTU value of
673	      (MAX(S_MRU, S_MSS) - HLEN); else,

675	   o  if the inner packet is fragmentable (e.g., an IPv4 packet with
676	      DF=0), and the packet is larger than (foo) bytes, the ITE uses
677	      inner fragmentation to break the packet into fragments no larger
678	      than (foo) bytes; else,

680	   o  the ITE processes the packet without inner fragmentation.

682	   In the above, the ITE must track whether the tunnel interface is
683	   using header compression.  If so, the ITE must include the length of
684	   the uncompressed headers and trailers when calculating HLEN.  Note
685	   also in the above that the ITE is permitted to admit inner packets
686	   into the tunnel that can be accommodated in a single SEAL segment
687	   (i.e., no larger than S_MSS) even if they are larger than the ETE
688	   would be willing to reassemble if fragmented (i.e., larger than
689	   S_MRU); see Section 4.4.1.

691	   When the ITE uses inner fragmentation, it should use a "safe"
692	   fragment size of (foo) bytes that would be highly unlikely to incur
693	   an outer IP MTU restriction within the tunnel.  If the ITE can
694	   determine a larger fragment size (e.g., via probing), it should use
695	   the larger size for inner fragmentation.  In the absence of
696	   deterministic information, it is RECOMMENDED that the ITE set (foo)
697	   to 1280.
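
   The decision procedure of this section is sketched below in Python
   for illustration.  The function name and return values are
   hypothetical; the safe_frag parameter plays the role of (foo) and
   defaults to the RECOMMENDED value of 1280.  HLEN is assumed to have
   been computed from the uncompressed headers and trailers.

```python
def tunnel_admission(pkt_len, fragmentable, s_mss, s_mru, hlen,
                     safe_frag=1280):
    """Classify an inner packet already admitted into the interface.

    Returns ("ptb", mtu) when the packet must be dropped,
    ("fragment", size) when inner fragmentation is needed, and
    ("send", None) when the packet is processed as-is.
    """
    if not fragmentable:
        # Unfragmentable packets are bounded by MAX(S_MRU, S_MSS) - HLEN.
        limit = max(s_mru, s_mss) - hlen
        if pkt_len > limit:
            return ("ptb", limit)
        return ("send", None)
    # Fragmentable packets larger than the safe size are fragmented.
    if pkt_len > safe_frag:
        return ("fragment", safe_frag)
    return ("send", None)
```

   For example, with S_MSS=1500, S_MRU=0 and HLEN=40, an unfragmentable
   1500-byte packet yields a PTB message with an MTU value of 1460.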

699	4.3.4.  Mid-Layer Encapsulation

701	   After inner IP fragmentation (if necessary), the ITE next
702	   encapsulates each inner packet/fragment in the MHLEN bytes of mid-
703	   layer headers and trailers.  The ITE then presents the mid-layer
704	   packet for SEAL segmentation and outer encapsulation.

706	4.3.5.  SEAL Segmentation

708	   If the ITE is configured to use SEAL segmentation, it checks the
709	   length of the resulting packet after mid-layer encapsulation to
710	   determine whether SEAL segmentation is needed.  If the length of the
711	   resulting mid-layer packet plus OHLEN is larger than S_MSS but no
712	   larger than S_MRU the ITE performs SEAL segmentation by breaking the
713	   mid-layer packet into N segments (N <= 256) that are no larger than
714	   (S_MSS - OHLEN) bytes each.  Each segment, except the final one, MUST
715	   be of equal length.  The first byte of each segment MUST begin
716	   immediately after the final byte of the previous segment, i.e., the
717	   segments MUST NOT overlap.  The ITE SHOULD generate the smallest
718	   number of segments possible, e.g., it SHOULD NOT generate 6 smaller
719	   segments when the packet could be accommodated with 4 larger
720	   segments.

722	   Note that this SEAL segmentation ignores the fact that the mid-layer
723	   packet may be unfragmentable outside of the subnetwork.  This
724	   segmentation process is a mid-layer (not an IP layer) operation
725	   employed by the ITE to adapt the mid-layer packet to the subnetwork
726	   path characteristics, and the ETE will restore the packet to its
727	   original form during reassembly.  Therefore, the fact that the packet
728	   may have been segmented within the subnetwork is not observable
729	   outside of the subnetwork.
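
   A minimal Python sketch of the segmentation rule in this section
   follows.  The function name is hypothetical, and raising an exception
   for a packet that exceeds S_MRU is an assumption of the sketch (such
   packets would normally have been handled when they were admitted into
   the tunnel).  Cutting every segment except the last at exactly
   (S_MSS - OHLEN) bytes is one simple way to obtain equal-length, non-
   overlapping segments with the smallest possible segment count.

```python
def seal_segment(mid_pkt, s_mss, s_mru, ohlen):
    """Split a mid-layer packet (bytes) into SEAL segments."""
    max_seg = s_mss - ohlen
    if max_seg <= 0:
        raise ValueError("S_MSS leaves no room for payload")
    total = len(mid_pkt) + ohlen
    if total <= s_mss:
        # Fits in a single SEAL segment; no segmentation needed.
        return [mid_pkt]
    if total > s_mru:
        raise ValueError("packet exceeds S_MRU")
    # Contiguous slices of max_seg bytes; only the final one may be shorter.
    segments = [mid_pkt[off:off + max_seg]
                for off in range(0, len(mid_pkt), max_seg)]
    if len(segments) > 256:
        raise ValueError("more than 256 segments required")
    return segments
```

   With S_MSS=1500 and OHLEN=28, a 3000-byte mid-layer packet is split
   into segments of 1472, 1472 and 56 bytes.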

731	4.3.6.  Outer Encapsulation

733	   Following SEAL segmentation, the ITE next encapsulates each segment
734	   in a SEAL header formatted as specified in Section 4.2.  For the
735	   first segment, the ITE sets F=1, then sets NEXTHDR to the Internet
736	   Protocol number of the encapsulated inner packet, and finally sets
737	   M=1 if there are more segments or sets M=0 otherwise.  For each non-
738	   initial segment of an N-segment mid-layer packet (N <= 256), the ITE
739	   sets (F=0; M=1; SEG=1) in the SEAL header of the first non-initial
740	   segment, sets (F=0; M=1; SEG=2) in the next non-initial segment,
741	   etc., and sets (F=0; M=0; SEG=N-1) in the final segment.  (Note that
742	   the value SEG=0 is not used, since the initial segment encodes a
743	   NEXTHDR value and not a SEG value.)

744	   The ITE next encapsulates each segment in the requisite outer headers
745	   and trailers according to the specific encapsulation format (e.g.,
746	   [RFC1070], [RFC2003], [RFC2473], [RFC4213], etc.), except that it
747	   writes 'SEAL_PROTO' in the protocol field of the outer IP header
748	   (when simple IP encapsulation is used) or writes 'SEAL_PORT' in the
749	   outer destination service port field (e.g., when IP/UDP encapsulation
750	   is used).  The ITE finally sets A=1 if probing is necessary as
751	   specified in Section 4.3.7, sets the packet identification values as
752	   specified in Section 4.3.8 and sends the packets as specified in
753	   Section 4.3.9.
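
   The F, M and NEXTHDR/SEG settings described above can be tabulated
   with the following illustrative Python sketch.  The function name and
   the dictionary keys are hypothetical.

```python
def seal_segment_fields(n, nexthdr):
    """Return the per-segment SEAL header settings for an N-segment packet."""
    if not 1 <= n <= 256:
        raise ValueError("N must be between 1 and 256")
    # Initial segment: F=1, NEXTHDR set, M=1 only if more segments follow.
    fields = [{"F": 1, "M": 1 if n > 1 else 0, "NEXTHDR": nexthdr}]
    # Non-initial segments: F=0, SEG counts 1..N-1, M=0 only in the last.
    for seg in range(1, n):
        fields.append({"F": 0, "M": 1 if seg < n - 1 else 0, "SEG": seg})
    return fields
```

   For N=3, this yields (F=1; M=1; NEXTHDR), (F=0; M=1; SEG=1) and
   (F=0; M=0; SEG=2).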

755	4.3.7.  Probing Strategy

757	   All SEAL encapsulated packets sent by the ITE are considered implicit
758	   probes.  SEAL encapsulated packets that use IPv4 as the outer layer
759	   of encapsulation will elicit SCMP PTB messages from the ETE (see:
760	   Section 4.5) if any IPv4 fragmentation occurs in the path.  SEAL
761	   encapsulated packets that use IPv6 as the outer layer of
762	   encapsulation may be dropped by an IPv6 router on the path to the ETE
763	   which will also return an ICMPv6 PTB message to the ITE.  The ITE can
764	   then use the SEAL_ID within the packet-in-error to determine whether
765	   the PTB message corresponds to one of its recent packet
766	   transmissions.

768	   The ITE should also send explicit probes, periodically, to verify
769	   that the ETE is still reachable.  The ITE sets A=1 in the SEAL header
770	   of a segment to be used as an explicit probe, where the probe can be
771	   either an ordinary data packet or a NULL packet created by setting
772	   the NEXTHDR field to a value of "No Next Header" (see Section 4.7 of
773	   [RFC2460]).  The probe will elicit a solicited SCMP Neighbor
774	   Advertisement (NA) message from the ETE as an acknowledgement (see
775	   Section 4.5.1).

777	   Finally, the ITE MAY send "expendable" outer IP probe packets (see
778	   Section 4.3.9) as explicit probes in order to detect increases in the
779	   path MTU to the ETE.  One possible strategy is to send expendable
780	   packets with A=1 in the SEAL header and DF=1 in the IP header.  In
781	   all cases, the ITE MUST be conservative in its use of the A bit in
782	   order to limit the resultant control message overhead.

784	4.3.8.  Identification

786	   The ITE maintains a randomly-initialized SEAL_ID value as per-ETE
787	   soft state (e.g., in the neighbor cache).  If the SEAL_ID is to be
788	   used as a packet identifier, the ITE monotonically increments the
789	   value for each successive SEAL protocol packet it sends to the ETE.
790	   If the SEAL_ID is to be used as a tunnel identifier, the ITE instead
791	   maintains SEAL_ID as a static value.

793	   For each successive SEAL protocol packet, the ITE writes the current
794	   SEAL_ID value into the header field of the same name in the SEAL
795	   header.  It then sets I=1 if the SEAL_ID represents a packet
796	   identifier and I=0 if the SEAL_ID represents a tunnel identifier.

798	   Note that the ITE must be consistent in its setting of the I bit.
799	   For example, it must not set I=1 in some packets and I=0 in others
800	   since this may result in unpredictable behavior.
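
   The following Python sketch illustrates one way an ITE might keep the
   per-ETE SEAL_ID state.  The class name and its interface are
   hypothetical.  The sketch assumes that a packet identifier wraps
   modulo 2^48 (the width of the SEAL_ID field) and fixes the I bit when
   the state is created so that it cannot vary between packets.

```python
import secrets

SEAL_ID_MODULUS = 1 << 48  # SEAL_ID is a 48-bit field

class SealIdState:
    """Per-ETE SEAL_ID soft state (hypothetical helper)."""

    def __init__(self, packet_identifier, initial=None):
        # The I bit is chosen once and never changes afterwards.
        self.i_bit = 1 if packet_identifier else 0
        if initial is None:
            # Randomly initialized, as required by this section.
            initial = secrets.randbelow(SEAL_ID_MODULUS)
        self.seal_id = initial % SEAL_ID_MODULUS

    def next_id(self):
        """Return (I, SEAL_ID) for the next SEAL protocol packet."""
        current = self.seal_id
        if self.i_bit:
            # Packet identifier: increment for each successive packet.
            self.seal_id = (self.seal_id + 1) % SEAL_ID_MODULUS
        # Tunnel identifier (I=0): the value stays static.
        return self.i_bit, current
```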

802	4.3.9.  Sending SEAL Protocol Packets

804	   Following SEAL segmentation and encapsulation, the ITE sets DF=0 in
805	   the header of each outer IPv4 packet to ensure that they will be
806	   delivered to the ETE even if they are fragmented within the
807	   subnetwork.  (The ITE can instead set DF=1 for "expendable" outer
808	   IPv4 packets (e.g., for NULL packets used as probes -- see Section
809	   4.3.7), but these may be lost due to an MTU restriction).  For outer
810	   IPv6 packets, the "DF" bit is always implicitly set to 1; hence, they
811	   will not be fragmented within the subnetwork.

813	   The ITE sends each outer packet that encapsulates a segment of the
814	   same mid-layer packet into the tunnel in canonical order, i.e.,
815	   segment 0 first, followed by segment 1, etc., and finally segment
816	   N-1.

818	4.3.10.  Processing Raw ICMP Messages

820	   The ITE may receive "raw" ICMP error messages [RFC0792][RFC4443] from
821	   either the ETE or routers within the subnetwork that comprise an
822	   outer IP header, followed by an ICMP header, followed by a portion of
823	   the SEAL packet that generated the error (also known as the "packet-
824	   in-error").  The ITE can use the SEAL_ID encoded in the packet-in-
825	   error as a nonce to confirm that the ICMP message came from either
826	   the ETE or an on-path router, and can use any additional information
827	   to determine whether to accept or discard the message.

829	   The ITE should specifically process raw ICMPv4 Protocol Unreachable
830	   messages and ICMPv6 Parameter Problem messages with Code
831	   "Unrecognized Next Header type encountered" as a hint that the ETE
832	   does not implement the SEAL protocol; specific actions that the ITE
833	   may take in this case are out of scope.

835	4.4.  ETE Specification

837	4.4.1.  Reassembly Buffer Requirements

839	   The ETE SHOULD support IP-layer and SEAL-layer reassembly for inner
840	   packets of at least 1280 bytes in length and MAY support reassembly
841	   for larger inner packets.  (The ETE may instead support only a
842	   minimum-length reassembly buffer or even a zero-length buffer, but
843	   this may cause MTU underruns in some environments.)  The ETE must
844	   retain the outer IP, SEAL and other outer headers and trailers during
845	   both IP-layer and SEAL-layer reassembly for the purpose of
846	   associating the fragments/segments of the same packet, and must also
847	   configure a SEAL-layer reassembly buffer that is no smaller than the
848	   IP-layer reassembly buffer.  Hence, the ETE:

850	   o  SHOULD configure an outer IP-layer reassembly buffer size of at
851	      least (1280 + HLEN) bytes, and

853	   o  MUST be capable of discarding inner packets that require IP-layer
854	      or SEAL-layer reassembly and that are larger than (S_MRU - HLEN).

856	   The ETE can maintain S_MRU either as a single value to be applied for
857	   all ITEs, or as a per-ITE value.  In the latter case, the ETE can
858	   manage each per-ITE S_MRU value separately (e.g., to reduce
859	   congestion caused by excessive segmentation from specific ITEs) but
860	   should seek to maintain as stable a value as possible for each ITE.

862	   Note that the ETE is permitted to accept inner packets that did not
863	   undergo IP-layer and/or SEAL-layer reassembly even if they are larger
864	   than (S_MRU - HLEN) bytes.  Hence, S_MRU is a maximum *reassembly*
865	   size, and may be less than the ETE is able to receive without
866	   reassembly.

868	4.4.2.  IP-Layer Reassembly

870	   The ETE submits unfragmented SEAL protocol IP packets for SEAL-layer
871	   reassembly as specified in Section 4.4.3.  The ETE instead performs
872	   standard IP-layer reassembly for multi-fragment SEAL protocol IP
873	   packets as follows.

875	   The ETE should maintain conservative IP-layer reassembly cache high-
876	   and low-water marks.  When the size of the reassembly cache exceeds
877	   this high-water mark, the ETE should actively discard incomplete
878	   reassemblies (e.g., using an Active Queue Management (AQM) strategy)
879	   until the size falls below the low-water mark.  The ETE should also
880	   actively discard any pending reassemblies that clearly have no
881	   opportunity for completion, e.g., when a considerable number of new
882	   fragments have been received before a fragment that completes a
883	   pending reassembly has arrived.  Following successful IP-layer
884	   reassembly, the ETE submits the reassembled packet for SEAL-layer
885	   reassembly as specified in Section 4.4.3.

887	   When the ETE processes the IP first fragment (i.e., one with MF=1 and
888	   Offset=0 in the IP header) of a fragmented SEAL packet, it sends an
889	   SCMP PTB message back to the ITE (see Section 4.5.1).  When the ETE
890	   processes an IP fragment that would cause the reassembled outer
891	   packet to be larger than the IP-layer reassembly buffer following
892	   reassembly, it discontinues the reassembly and discards any further
893	   fragments of the same packet.

895	4.4.3.  SEAL-Layer Reassembly

897	   Following IP reassembly (if necessary), if the mid-layer packet has
898	   an incorrect value in the SEAL header the ETE discards the packet and
899	   returns an SCMP "Parameter Problem" message (see Section 4.5.1).
900	   Next, if the SEAL header has A=1, the ETE sends a solicited SCMP
901	   Neighbor Advertisement (NA) message back to the ITE (see Section
902	   4.5.1).  The ETE next submits single-segment mid-layer packets for
903	   decapsulation and delivery to upper layers as specified in Section
904	   4.4.4.  The ETE instead performs SEAL-layer reassembly for multi-
905	   segment mid-layer packets with I=1 in the SEAL header as follows.

907	   The ETE adds each segment of a multi-segment mid-layer packet with
908	   I=1 in the SEAL header to a SEAL-layer pending-reassembly queue
909	   according to the (Source, Destination, SEAL_ID)-tuple found in the
910	   outer IP and SEAL headers.  The ETE performs SEAL-layer reassembly
911	   through simple in-order concatenation of the encapsulated segments of
912	   the same mid-layer packet from N consecutive SEAL segments.  SEAL-
913	   layer reassembly requires the ETE to maintain a cache of recently
914	   received segments for a hold time that would allow for nominal inter-
915	   segment delays.  When a SEAL reassembly times out, the ETE discards
916	   the incomplete reassembly and returns an SCMP "Time Exceeded" message
917	   to the ITE (see Section 4.5.1).  As for IP-layer reassembly, the ETE
918	   should also maintain a conservative reassembly cache high- and low-
919	   water mark and should actively discard any pending reassemblies that
920	   clearly have no opportunity for completion, e.g., when a considerable
921	   number of new SEAL packets have been received before a packet that
922	   completes a pending reassembly has arrived.

924	   If the ETE receives a SEAL packet for which a segment with the same
925	   (Source, Destination, SEAL_ID)-tuple is already in the queue, it must
926	   determine whether to accept the new segment and release the old, or
927	   drop the new segment.  If accepting the new segment would cause an
928	   inconsistency with other segments already in the queue (e.g.,
929	   differing segment lengths), the ETE drops the segment that is least
930	   likely to complete the reassembly.  If the ETE accepts a new SEAL
931	   segment that would cause the reassembled outer packet to be larger
932	   than S_MRU following reassembly, it schedules the reassembly
933	   resources for garbage collection and sends an SCMP PTB message back
934	   to the ITE (see Section 4.5.1).

936	   After all segments are gathered, the ETE reassembles the packet by
937	   concatenating the segments encapsulated in the N consecutive SEAL
938	   packets beginning with the initial segment (i.e., F=1) and followed
939	   by any non-initial segments 1 through N-1.  That is, for an N-segment
940	   mid-layer packet, reassembly entails the concatenation of the SEAL-
941	   encapsulated packet segments with (F=1, M=1, SEAL_ID=j) in the first
942	   SEAL header, followed by (F=0, M=1, SEG=1, SEAL_ID=(j+1)) in the next
943	   SEAL header, followed by (F=0, M=1, SEG=2, SEAL_ID=(j+2)), etc., up
944	   to (F=0, M=0, SEG=(N-1), SEAL_ID=(j + N-1)) in the final SEAL header.
945	   (Note that modulo arithmetic based on the length of the SEAL_ID field
946	   is used).  Following successful SEAL-layer reassembly, the ETE
947	   submits the reassembled mid-layer packet for decapsulation and
948	   delivery to upper layers as specified in Section 4.4.4.
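
   The final concatenation step can be sketched in Python as follows.
   This is an illustration only: the function name and the dictionary
   representation of a received segment are hypothetical, and the sketch
   covers neither the pending-reassembly queue nor its timers.  It
   assumes that all segments passed in belong to the same (Source,
   Destination) pair and checks that their SEAL_IDs are consecutive
   modulo 2^48.

```python
SEAL_ID_MODULUS = 1 << 48  # SEAL_ID is a 48-bit field

def seal_reassemble(segments):
    """Concatenate SEAL segments; return None if incomplete or inconsistent.

    Each segment is a dict with keys "F", "M", "SEAL_ID", "data" and,
    when F=0, "SEG".
    """
    by_index = {}
    for s in segments:
        if not s["F"] and s["SEG"] == 0:
            return None  # SEG=0 is reserved for non-initial segments
        by_index[0 if s["F"] else s["SEG"]] = s
    if 0 not in by_index:
        return None  # the initial (F=1) segment has not arrived
    n = len(by_index)
    base = by_index[0]["SEAL_ID"]
    parts = []
    for k in range(n):
        s = by_index.get(k)
        if s is None:
            return None  # gap in the segment sequence
        if s["SEAL_ID"] != (base + k) % SEAL_ID_MODULUS:
            return None  # SEAL_IDs must be consecutive
        if s["M"] != (1 if k < n - 1 else 0):
            return None  # M=0 must appear in the final segment only
        parts.append(s["data"])
    return b"".join(parts)
```

   Segments may be supplied in any order; a result of None indicates
   that reassembly cannot (yet) be completed.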

950	4.4.4.  Decapsulation and Delivery to Upper Layers

952	   Following any necessary IP- and SEAL-layer reassembly, the ETE
953	   discards the outer headers and trailers and performs any mid-layer
954	   transformations on the mid-layer packet.  The ETE next discards the
955	   mid-layer headers and trailers, and delivers the inner packet to the
956	   upper-layer protocol indicated either in the SEAL NEXTHDR field or
957	   the next header field of the mid-layer packet (i.e., if the packet
958	   included mid-layer encapsulations).  The ETE instead silently
959	   discards the inner packet if it was a NULL packet (see Section
960	   4.3.7).

962	4.5.  The SEAL Control Message Protocol (SCMP)

964	   SEAL uses a companion SEAL Control Message Protocol (SCMP) based on
965	   the same message format as the Internet Control Message Protocol for
966	   IPv6 (ICMPv6) [RFC4443].  SCMP messages are further identified by the
967	   NEXTHDR value '58', the same as for ICMPv6 messages; however, the
968	   SCMP message is *not* immediately preceded by an inner IPv6 header.
969	   Instead, SCMP messages appear immediately following the SEAL header
970	   which allows TEs to differentiate them from ordinary ICMPv6 messages.
971	   Unlike ICMPv6 messages, SCMP messages are used only for the purpose
972	   of conveying information between TEs, i.e., they are used only for
973	   sharing control information within the tunnel and not beyond the
974	   tunnel.

976	   The following sections specify the generation and processing of SCMP
977	   messages:

979	4.5.1.  Generating SCMP Messages

981	   SCMP messages may be generated by either ITEs or ETEs (i.e., by any
982	   TE) using the same message Type and Code values specified for
983	   ordinary ICMPv6 messages in [RFC4443].  SCMP can also be used to
984	   carry other message types and their associated options as specified
985	   in other documents (e.g., [RFC4191][RFC4861]).  The general format
986	   for SCMP messages is shown in Figure 4:

988	       0                   1                   2                   3
989	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
990	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
991	      |     Type      |     Code      |          Checksum             |
992	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
993	      |                                                               |
994	      ~                         Message Body                          ~
995	      |                                                               |
996	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
997	      |                  As much of invoking SEAL data                |
998	      ~                packet as possible without the SCMP            ~
999	      |                  packet exceeding 576 bytes (*)               |
1000	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1002	      (*) also known as the "packet-in-error"

1004	                       Figure 4: SCMP Message Format

1006	   As for ordinary ICMPv6 messages, the SCMP message begins with a 4
1007	   byte header that includes 8-bit Type and Code fields followed by a
1008	   16-bit Checksum field followed by a variable-length Message Body.
1009	   The Message Body is followed by the leading portion of the invoking
1010	   SEAL data packet (i.e., the "packet-in-error") IFF the packet-in-
1011	   error would also be included in the corresponding ICMPv6 message.
1012	   The TE sets the Type and Code fields to the same values that would
1013	   appear in the corresponding ICMPv6 message and also formats the
1014	   message body the same as for the corresponding ICMPv6 message except
1015	   as otherwise specified.

1017	   If the SCMP message will include a packet-in-error, the TE then
1018	   includes as much of the leading portion of the invoking SEAL data
1019	   packet as possible beginning with the outer IP header and extending
1020	   to a length that would not cause the entire SCMP message following
1021	   encapsulation to exceed 576 bytes.  The TE finally calculates the
1022	   Checksum the same as specified for ICMPv4 messages [RFC0792] and does
1023	   not include a pseudo-header of the outer IP header since the SEAL_ID
1024	   gives sufficient assurance against mis-delivery.  The TE then
1025	   encapsulates the SCMP message in the outer headers as shown in
1026	   Figure 5:

1028	                                       +--------------------+
1029	                                       ~  outer IPv4 header ~
1030	                                       +--------------------+
1031	                                       ~  other outer hdrs  ~
1032	                                       +--------------------+
1033	                                       ~    SEAL Header     ~
1034	          +--------------------+       +--------------------+
1035	          ~ SCMP message header~  -->  ~ SCMP message header~
1036	          +--------------------+  -->  +--------------------+
1037	          ~  SCMP message body ~  -->  ~  SCMP message body ~
1038	          +--------------------+  -->  +--------------------+
1039	          ~   packet-in-error  ~  -->  ~  packet-in-error   ~
1040	          +--------------------+       +--------------------+
1041	                                       ~   outer trailers   ~
1042	               SCMP Message            +--------------------+
1043	           before encapsulation
1044	                                            SCMP Message
1045	                                         after encapsulation

1047	                   Figure 5: SCMP Message Encapsulation
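
   The message construction described above can be illustrated with the
   following Python sketch.  The function names and the encaps_overhead
   parameter (the combined length of the outer headers, SEAL header and
   outer trailers) are hypothetical.  The checksum is the ones'-
   complement Internet checksum used by ICMPv4, computed over the SCMP
   message alone with no pseudo-header.

```python
import struct

def internet_checksum(data):
    """16-bit ones'-complement checksum, as used by ICMPv4."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries
    return ~total & 0xFFFF

def build_scmp(msg_type, code, body, packet_in_error=b"",
               encaps_overhead=0, max_len=576):
    """Build an SCMP message, truncating the packet-in-error as needed."""
    # Room left for the packet-in-error after the 4-byte SCMP header,
    # the message body and the encapsulation overhead.
    room = max_len - encaps_overhead - 4 - len(body)
    quoted = packet_in_error[:max(room, 0)]
    # Checksum is computed with the Checksum field set to zero.
    unsummed = struct.pack("!BBH", msg_type, code, 0) + body + quoted
    header = struct.pack("!BBH", msg_type, code,
                         internet_checksum(unsummed))
    return header + body + quoted
```

   A receiver can validate such a message by running the same checksum
   function over the whole message and confirming that the result is 0.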

1049	   When a TE generates an SCMP message in response to a packet-in-error,
1050	   it sets the outer IP destination and source addresses of the SCMP
1051	   packet to the packet-in-error's source and destination addresses
1052	   (respectively).  (If the destination address in the packet-in-error
1053	   was multicast, the TE instead sets the outer IP source address of the
1054	   SCMP packet to an address assigned to the underlying IP interface.)
1055	   When a TE generates an SCMP message that is not due to a packet-in-
1056	   error, it sets the outer IP destination and source addresses of the
1057	   SCMP packet the same as for ordinary data packets.  The TE finally
1058	   sets the NEXTHDR field in the SEAL header to the value '58' (i.e.,
1059	   the official IANA protocol number for the ICMPv6 protocol) and sends
1060	   the SCMP message to the tunnel far end.

1062	4.5.1.1.  Generating SCMP Packet Too Big (PTB) Messages

1064	   An ETE generates an SCMP Packet Too Big (PTB) message when it
1065	   receives the IP first fragment (i.e., one with MF=1 and Offset=0 in
1066	   the outer IP header) of a SEAL protocol packet that arrived as
1067	   multiple IP fragments, or when it discontinues reassembly of a SEAL
1068	   protocol packet that arrived as multiple IP fragments and/or multiple
1069	   SEAL segments and would exceed S_MRU following reassembly.

1071	   The ETE prepares an SCMP PTB message the same as for the
1072	   corresponding ICMPv6 PTB message, except that it writes the value 0
1073	   in the MTU field of the message if the PTB is generated as a result
1074	   of receiving an IP first fragment and writes the S_MRU value for this
1075	   ITE in the MTU field otherwise.

1077	4.5.1.2.  Generating SCMP Neighbor Discovery Messages

1079	   An ITE generates an SCMP "Neighbor Solicitation" (NS) or "Router
1080	   Solicitation" (RS) message when it needs to solicit a response from
1081	   an ETE.  An ETE generates a solicited SCMP "Neighbor Advertisement"
1082	   (NA) or "Router Advertisement" (RA) message when it receives an NS/RS
1083	   message, and also generates a solicited NA message when it receives a
1084	   SEAL protocol packet with A=1 in the SEAL header.  Any TE may also
1085	   generate unsolicited NA/RA messages that are not triggered by a
1086	   specific solicitation event.

1088	   The TE generates NS/RS and NA/RA messages the same as described for
1089	   the corresponding IPv6 Neighbor Discovery (ND) messages (see:
1090	   [RFC4861]), except that for solicited NA/RA messages it also includes
1091	   a Redirected Header option formatted the same as for an IPv6 ND
1092	   Redirect message.  The messages may also be used in conjunction with
1093	   the tunnel endpoint synchronization procedure specified in Section
1094	   4.6.

1096	4.5.1.3.  Generating Other SCMP Messages

1098	   An ETE generates an SCMP "Destination Unreachable - Communication
1099	   with Destination Administratively Prohibited" message when it is
1100	   operating in synchronized mode and receives a SEAL packet with a
1101	   SEAL_ID that is outside of the current window for this ITE (see:
1102	   Section 4.6).  An ETE also generates an SCMP "Destination
1103	   Unreachable" message with an appropriate code under the same
1104	   circumstances that an IPv6 system would generate an ICMPv6
1105	   Destination Unreachable message using the same code.  The SCMP
1106	   Destination Unreachable message is formatted the same as for ICMPv6
1107	   Destination Unreachable messages.

1109	   An ETE generates an SCMP "Parameter Problem" message when it receives
1110	   a SEAL packet with an incorrect value in the SEAL header, and
1111	   generates an SCMP "Time Exceeded" message when it garbage collects an
1112	   incomplete SEAL data packet reassembly.  The message formats used are
1113	   the same as for the corresponding ICMPv6 messages.

1115	   Generation of all other SCMP message types is outside the scope of
1116	   this document.

1118	4.5.2.  Processing SCMP Messages

1120	   An ITE processes any SCMP messages it receives as long as it can
1121	   verify that the message was sent from an on-path ETE.  The ITE can
1122	   verify that the SCMP message came from an on-path ETE by checking
1123	   that the SEAL_ID in the encapsulated packet-in-error corresponds to
1124	   one of its recently-sent SEAL data packets.

1126	   An ITE maintains a window of SEAL_IDs of packets that it has recently
1127	   sent to each ETE.  For each SCMP message it receives, the ITE first
1128	   verifies that the SEAL_ID encoded in the packet-in-error is within
1129	   the window of packets that it has recently sent to the ETE.  The ITE
1130	   then verifies that the Checksum in the SCMP message header is
1131	   correct.  If the SEAL_ID is outside of the window and/or the checksum
1132	   is incorrect, the ITE discards the message; otherwise, it processes
1133	   the message the same as for ordinary ICMPv6 messages.
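
   The two checks above can be sketched as follows.  This is only an
   illustration under stated assumptions: the window is modeled as a
   bounded queue of recently-sent SEAL_IDs, the class and function
   names are hypothetical, and the checksum result is passed in as a
   boolean rather than computed.

```python
from collections import deque


class SentWindow:
    """Bounded record of SEAL_IDs recently sent to one ETE (assumed form)."""

    def __init__(self, size: int = 64):
        self._ids = deque(maxlen=size)  # oldest IDs fall out automatically

    def record(self, seal_id: int) -> None:
        self._ids.append(seal_id)

    def __contains__(self, seal_id: int) -> bool:
        return seal_id in self._ids


def accept_scmp(window: SentWindow, seal_id_in_error: int,
                checksum_ok: bool) -> bool:
    """Return True if the ITE should go on to process the SCMP message."""
    # First check: the packet-in-error must be one we recently sent.
    if seal_id_in_error not in window:
        return False
    # Second check: the SCMP checksum must be correct.
    return checksum_ok
```

   A message that fails either check is discarded; one that passes
   both is handled the same as an ordinary ICMPv6 message.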

1135	   Any TE may also receive unsolicited SCMP messages from the tunnel far
1136	   end.  When the TEs are synchronized, they can also check that the
1137	   SEAL_ID in the SEAL header of an SCMP message is within the window of
1138	   recently received packets from this tunnel far end (see Section 4.6).

1140	   Finally, TEs process SCMP messages as an indication that the tunnel
1141	   far end is responsive, i.e., in the same manner implied for IPv6
1142	   Neighbor Unreachability Detection "hints of forward progress" (see:
1143	   [RFC4861]).

1145	4.5.2.1.  Processing SCMP PTB Messages

1147	   An ITE may receive an SCMP PTB message after it sends a SEAL data
1148	   packet (see: Section 4.5.1).  When the ITE receives an SCMP PTB
1149	   message, it examines the MTU field in the message.  If the MTU field
1150	   is non-zero, the PTB was the result of a reassembly buffer
1151	   limitation; in that case, the ITE records the value in the MTU field
1152	   as the new S_MRU value for this ETE then (optionally) sends a
1153	   translated PTB message of the inner network layer protocol to the
1154	   original source with MTU set to (MAX(S_MRU, S_MSS) - HLEN).  If the
1155	   MTU field is zero, however, the PTB was the result of an IP
1156	   fragmentation event; in that case, the ITE does not send back a
1157	   translated PTB message but determines a new S_MSS value according to
1158	   the length recorded in the IP header of the packet-in-error as
1159	   follows:

1161	   o  If the length is no less than 1280, the ITE records the length as
1162	      the new S_MSS value.

1164	   o  If the length is less than the current S_MSS value and also less
1165	      than 1280, the ITE can discern that IP fragmentation is occurring
1166	      but it cannot determine the true MTU of the restricting link due
1167	      to the possibility that a router on the path is generating runt
1168	      first fragments.

1170	   In this latter case, the ITE must search for a reduced S_MSS value
1171	   through an iterative searching strategy that parallels Section 5 of
1172	   [RFC1191].  This searching strategy may require multiple iterations
1173	   in which the ITE sends SEAL data packets using a reduced S_MSS and
1174	   receives additional SCMP PTB messages.  During this process, it is
1175	   essential that the ITE reduce S_MSS based on the first SCMP PTB
1176	   message received under the current S_MSS size, and refrain from
1177	   further reducing S_MSS until SCMP PTB messages pertaining to
1178	   packets sent under the new S_MSS are received.
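
   The PTB processing rules in this section can be sketched as
   follows.  This is a non-normative illustration under several
   assumptions: the "EteState" structure, the function names, and the
   "sent_under_current_mss" flag are hypothetical, and the step-down
   table is only one possible searching strategy, with illustrative
   values.  How the ITE decides whether a packet-in-error was sent
   under the current S_MSS is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional

MIN_TRUSTED_LEN = 1280  # threshold used in the text above

# Illustrative table of common MTU values to step down through
# (an assumption of this sketch, not something this document defines).
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)


@dataclass
class EteState:
    """Per-ETE sizing state kept by the ITE (hypothetical structure)."""
    s_mru: int
    s_mss: int


def next_lower_plateau(size: int) -> int:
    """Return the largest table entry strictly below size."""
    for plateau in PLATEAUS:
        if plateau < size:
            return plateau
    return PLATEAUS[-1]


def process_ptb(state: EteState, mtu_field: int, err_len: int, hlen: int,
                sent_under_current_mss: bool = True) -> Optional[int]:
    """Update state; return the MTU for a translated PTB, or None."""
    if mtu_field != 0:
        # Reassembly buffer limitation: record the new S_MRU.
        state.s_mru = mtu_field
        return max(state.s_mru, state.s_mss) - hlen
    # IP fragmentation event: no translated PTB is sent.
    if err_len >= MIN_TRUSTED_LEN:
        state.s_mss = err_len
    elif err_len < state.s_mss and sent_under_current_mss:
        # Possible runt first fragment: search downward, one step per
        # S_MSS value, ignoring PTBs for packets sent under an older S_MSS.
        state.s_mss = next_lower_plateau(state.s_mss)
    return None
```

   Stepping down only once per S_MSS value keeps a burst of PTB
   messages for packets already in flight from driving S_MSS lower
   than necessary.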

1180	4.5.2.2.  Processing SCMP Neighbor Discovery Messages

1182	   An ETE may receive NS/RS messages from an ITE as the initial leg
1183	   in a neighbor discovery exchange.  An ITE may receive both solicited
1184	   and unsolicited NA/RA messages from an ETE, where solicited NA/RA
1185	   messages are distinguished by their inclusion of a Redirected Header
1186	   option (see: Section 4.5.1).

1188	   The TE processes NS/RS and NA/RA messages the same as described for
1189	   the corresponding IPv6 Neighbor Discovery (ND) messages (see:
1190	   [RFC4861]).  The messages may also be used in conjunction with the
1191	   tunnel endpoint synchronization procedure specified in Section 4.6.

1193	4.5.2.3.  Processing Other SCMP Messages

1195	   An ITE may receive an SCMP "Destination Unreachable - Communication
1196	   with Destination Administratively Prohibited" message after it sends
1197	   a SEAL data packet.  The ITE processes this message as an indication
1198	   that it needs to (re)synchronize with the ETE (see: Section 4.6).  An
1199	   ITE may also receive an SCMP "Destination Unreachable" message with
1200	   an appropriate code under the same circumstances that an IPv6 host
1201	   would receive an ICMPv6 Destination Unreachable message.

1203	   An ITE may receive an SCMP "Parameter Problem" message when the ETE
1204	   receives a SEAL packet with an incorrect value in the SEAL header.
1205	   The ITE should examine the incorrect SEAL header field setting to
1206	   determine whether a different setting should be used in subsequent
1207	   packets.

1209	   An ITE may receive an SCMP "Time Exceeded" message when the ETE
1210	   garbage collects an incomplete SEAL data packet reassembly.  The ITE
1211	   should consider the message as an indication of congestion.

1213	   Processing of all other SCMP message types is outside the scope of
1214	   this document.

1216	4.6.  Tunnel Endpoint Synchronization

1218	   The SEAL ITE maintains a per-ETE window of SEAL_IDs of its recently-
1219	   sent packets, but by default the SEAL ETE does not retain inter-
1220	   packet state.  When closer synchronization is required, SEAL Tunnel
1221	   Endpoints (TEs) can exchange initial sequence numbers in a procedure
1222	   that parallels IPv6 neighbor discovery and the TCP 3-way handshake.
1223	   When the TEs are synchronized, the ETE can also maintain a per-ITE
1224	   window of SEAL_IDs of its recently-received packets.

1226	   When an initiating TE ("TE(A)") needs to synchronize with a new
1227	   tunnel far end ("TE(B)"), it first chooses a randomly-initialized 48-
1228	   bit SEAL_ID value that it would like TE(B) to use (i.e.,
1229	   "SEAL_ID(B)").  TE(A) then creates a neighbor cache entry for TE(B)
1230	   and records SEAL_ID(B) in the neighbor cache entry.  Next, TE(A)
1231	   creates an SCMP NS or RS message that includes a Nonce option (see:
1232	   [RFC3971], Section 5.3).  TE(A) then writes the value SEAL_ID(B) in
1233	   the Nonce option, writes the value 0 in the SEAL_ID field of the SEAL
1234	   header and sends the NS/RS message to TE(B).

1236	   When TE(B) receives an NS/RS message with a Nonce option and with the
1237	   value 0 in the SEAL_ID of the SEAL header, it considers the message
1238	   as a potential synchronization request.  TE(B) first extracts the
1239	   value SEAL_ID(B) from the Nonce option then chooses a randomly-
1240	   initialized 48-bit SEAL_ID value that it would like TE(A) to use
1241	   (i.e., "SEAL_ID(A)").  TE(B) then stores the tuple (ip_src,
1242	   SEAL_ID(A), SEAL_ID(B)) in a minimal temporary fast path data
1243	   structure, where "ip_src" is the outer IP source address of the SCMP
1244	   message.  (For efficiency and security purposes, the data structure
1245	   should be indexed, e.g., by a secret hash of the tuple.)  TE(B) then
1246	   creates a solicited SCMP NA or RA message that includes a Nonce
1247	   option.  It then writes the value SEAL_ID(A) in the Nonce option,
1248	   writes the value SEAL_ID(B) in the SEAL_ID field of the SEAL header
1249	   and sends the NA/RA message back to TE(A).

1251	   When TE(A) receives the NA/RA, it considers the message as a
1252	   potential synchronization acknowledgement.  TE(A) first verifies that
1253	   the value encoded in the SEAL_ID of the SEAL header matches the
1254	   SEAL_ID(B) in the neighbor cache entry.  If the values match, TE(A)
1255	   extracts SEAL_ID(A) from the Nonce option and records it in the
1256	   neighbor cache entry; otherwise, it drops the packet.  If instead
1257	   TE(A) does not receive a timely NA/RA response, it retransmits the
1258	   initial NS/RS message for a total of 3 tries before giving up, the
1259	   same as for ordinary IPv6 neighbor discovery.
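
   The first two legs of the exchange described so far can be
   sketched as follows.  This is a non-normative illustration:
   messages are modeled as plain dictionaries, the class and function
   names are hypothetical, the "exchange" callback stands in for
   sending a message and waiting for a reply, and the fast path
   structure is indexed by ip_src alone rather than by a secret hash.

```python
import secrets

SEAL_ID_BITS = 48


def new_seal_id() -> int:
    """Choose a randomly-initialized 48-bit SEAL_ID."""
    return secrets.randbits(SEAL_ID_BITS)


class Initiator:
    """TE(A): sends the NS/RS and validates the NA/RA."""

    def __init__(self):
        self.neighbor_cache = {}  # peer -> {"seal_id_b", "seal_id_a"}

    def make_solicitation(self, peer):
        seal_id_b = new_seal_id()
        self.neighbor_cache[peer] = {"seal_id_b": seal_id_b,
                                     "seal_id_a": None}
        # SEAL_ID(B) rides in the Nonce option; the SEAL header carries 0.
        return {"type": "NS", "seal_id": 0, "nonce": seal_id_b}

    def on_advertisement(self, peer, msg) -> bool:
        entry = self.neighbor_cache.get(peer)
        if entry is None or msg["seal_id"] != entry["seal_id_b"]:
            return False  # drop: SEAL header does not echo SEAL_ID(B)
        entry["seal_id_a"] = msg["nonce"]
        return True


class Responder:
    """TE(B): answers a potential synchronization request."""

    def __init__(self):
        self.fast_path = {}  # ip_src -> (SEAL_ID(A), SEAL_ID(B))

    def on_solicitation(self, ip_src, msg):
        if msg["seal_id"] != 0 or "nonce" not in msg:
            return None  # not a synchronization request
        seal_id_b = msg["nonce"]
        seal_id_a = new_seal_id()
        self.fast_path[ip_src] = (seal_id_a, seal_id_b)
        return {"type": "NA", "seal_id": seal_id_b, "nonce": seal_id_a}


def synchronize(te_a, peer, exchange, max_tries: int = 3) -> bool:
    """Send the same NS up to max_tries times; True once acknowledged."""
    ns = te_a.make_solicitation(peer)
    for _ in range(max_tries):
        na = exchange(ns)  # returns a reply dict, or None on timeout
        if na is not None and te_a.on_advertisement(peer, na):
            return True
    return False
```

   For example, synchronize(a, "te-b", lambda ns:
   b.on_solicitation("te-a", ns)) leaves both TEs holding the same
   SEAL_ID(A) and SEAL_ID(B) values.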

1261	   After TE(A) receives the synchronization acknowledgement, it begins
1262	   sending either unsolicited NA/RA messages or ordinary data packets
1263	   back to TE(B) using SEAL_ID(A) as the initial sequence number.  When
1264	   TE(B) receives these packets, it first checks its neighbor cache to
1265	   see if there is a matching neighbor cache entry.  If there is a
1266	   neighbor cache entry, and the SEAL_ID in the header of the packet is
1267	   within the window of the SEAL_ID recorded in the neighbor cache
1268	   entry, TE(B) accepts the packet.  If the SEAL_ID in the packet is
1269	   newer than the SEAL_ID in the neighbor cache entry, TE(B) also
1270	   updates the neighbor cache value.  If there is no neighbor cache
1271	   entry, TE(B) instead checks the fast path cache to see if the packet
1272	   is a match for an in-progress synchronization event.  If there is a
1273	   fast path cache entry with a SEAL_ID(A) that is within the window of
1274	   the SEAL_ID in the packet header, TE(B) accepts the packet and also
1275	   creates a new neighbor cache entry with the tuple (ip_src,
1276	   SEAL_ID(A), SEAL_ID(B)).  If there is no matching fast path cache
1277	   entry, TE(B) instead simply discards the packet.
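
   The acceptance logic at TE(B) can be sketched as follows.  This is
   a non-normative illustration under stated assumptions: the window
   is treated as extending WINDOW values on either side of the
   recorded SEAL_ID, modulo 2^48; the class, function, and constant
   names are hypothetical; and both caches are keyed by ip_src alone.

```python
SEAL_ID_MOD = 1 << 48
WINDOW = 32768  # example coarse-grained window size


def forward_distance(ref: int, seal_id: int) -> int:
    """How far seal_id is ahead of ref, modulo 2^48."""
    return (seal_id - ref) % SEAL_ID_MOD


def in_window(ref: int, seal_id: int, window: int = WINDOW) -> bool:
    """True if seal_id is within window of ref, ahead of or behind it."""
    d = forward_distance(ref, seal_id)
    return d < window or d > SEAL_ID_MOD - window


class Responder:
    """TE(B) state: a neighbor cache plus a fast path cache."""

    def __init__(self):
        self.neighbor_cache = {}  # ip_src -> [SEAL_ID(A), SEAL_ID(B)]
        self.fast_path = {}       # ip_src -> (SEAL_ID(A), SEAL_ID(B))

    def on_packet(self, ip_src, seal_id: int) -> bool:
        """Return True to accept the packet, False to discard it."""
        entry = self.neighbor_cache.get(ip_src)
        if entry is not None:
            if not in_window(entry[0], seal_id):
                return False
            if forward_distance(entry[0], seal_id) < WINDOW:
                entry[0] = seal_id  # newer than the recorded value
            return True
        pending = self.fast_path.get(ip_src)
        if pending is not None and in_window(pending[0], seal_id):
            # Third leg seen: promote the tuple to the neighbor cache.
            # (The fast path entry is left to age out in this sketch.)
            self.neighbor_cache[ip_src] = [pending[0], pending[1]]
            return True
        return False
```

   Only the small fast path entry exists until an in-window packet
   arrives, so a solicitation alone never creates a neighbor cache
   entry.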

1279	   By maintaining the fast path cache, each TE is able to mitigate
1280	   buffer exhaustion attacks that may be launched by off-path attackers
1281	   [RFC4987].  The TE will receive positive confirmation that the
1282	   synchronization request came from an on-path tunnel far end after it
1283	   receives a stream of in-window packets as the "third leg" of this
1284	   three-way handshake as described above.  The TEs should maintain
1285	   neighbor cache entries as long as they receive hints of forward
1286	   progress from the tunnel far end, but should delete the neighbor
1287	   cache entries after a nominal stale time (e.g., 30 seconds).  The TEs
1288	   should also purge fast-path cache entries for which no window
1289	   synchronization messages are received within a nominal stale time
1290	   (e.g., 5 seconds).
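
   The cache aging behavior can be sketched as follows.  This is a
   non-normative illustration: the class and its methods are
   hypothetical, and it reads the text above as "an entry is removed
   once the stale time passes without a refresh", where a refresh
   corresponds to a hint of forward progress or a window
   synchronization message.

```python
import time

NEIGHBOR_STALE = 30.0   # seconds, example value from the text
FAST_PATH_STALE = 5.0   # seconds, example value from the text


class AgingCache:
    """Key/value cache whose entries expire unless refreshed."""

    def __init__(self, stale_after: float, clock=time.monotonic):
        self._stale_after = stale_after
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # key -> (value, time of last refresh)

    def put(self, key, value) -> None:
        self._entries[key] = (value, self._clock())

    def refresh(self, key) -> None:
        """Restart the stale timer for key, if it is present."""
        if key in self._entries:
            value, _ = self._entries[key]
            self._entries[key] = (value, self._clock())

    def purge(self) -> None:
        """Delete every entry whose stale time has elapsed."""
        now = self._clock()
        stale = [k for k, (_, t) in self._entries.items()
                 if now - t >= self._stale_after]
        for key in stale:
            del self._entries[key]

    def get(self, key):
        """Return the value for key, or None if absent or expired."""
        self.purge()
        item = self._entries.get(key)
        return None if item is None else item[0]
```

   A TE could keep one such cache with NEIGHBOR_STALE for neighbor
   cache entries and another with FAST_PATH_STALE for in-progress
   synchronization events.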

1292	   After synchronization is complete, when a TE receives a SEAL packet
1293	   it checks in its neighbor cache to determine whether the SEAL_ID is
1294	   within the current window, and discards any packets that are outside
1295	   the window.  Since packets may be lost or reordered, and since SEAL
1296	   presents only a best effort (i.e., not reliable) link model, the
1297	   TE should set a coarse-grained window size (e.g., 32768) and accept
1298	   any packet with a SEAL_ID that is within the window.

1300	   Note that when the ITE sends SEAL packets with I=0, the window is
1301	   trivial and a constant SEAL_ID nonce value instead of an incrementing
1302	   sequence number is used.

1304	5.  Link Requirements

1306	   Subnetwork designers are expected to follow the recommendations in
1307	   Section 2 of [RFC3819] when configuring link MTUs.

1309	6.  End System Requirements

1311	   SEAL provides robust mechanisms for returning PTB messages; however,
1312	   end systems that send unfragmentable IP packets larger than 1500
1313	   bytes are strongly encouraged to implement their own end-to-end MTU
1314	   assurance, e.g., using Packetization Layer Path MTU Discovery per
1315	   [RFC4821].

1317	7.  Router Requirements

1319	   IPv4 routers within the subnetwork are strongly encouraged to
1320	   implement IPv4 fragmentation such that the first fragment is the
1321	   largest and approximately the size of the underlying link MTU, i.e.,
1322	   they should avoid generating runt first fragments.

1324	   IPv6 routers within the subnetwork are required to generate the
1325	   necessary PTB messages when they drop outer IPv6 packets due to an
1326	   MTU restriction.

1328	8.  IANA Considerations

1330	   The IANA is instructed to allocate an IP protocol number for
1331	   'SEAL_PROTO' in the 'protocol-numbers' registry.

1333	   The IANA is instructed to allocate a Well-Known Port number for
1334	   'SEAL_PORT' in the 'port-numbers' registry.

1336	   The IANA is instructed to establish a "SEAL Protocol" registry to
1337	   record SEAL Version values.  This registry should be initialized to
1338	   include the initial SEAL Version number, i.e., Version 0.

1340	9.  Security Considerations

1342	   Unlike with IPv4 fragmentation, overlapping fragment attacks are not
1343	   possible due to the requirement that SEAL segments be non-
1344	   overlapping.  This condition is naturally enforced due to the fact
1345	   that each consecutive SEAL segment begins at offset 0 with respect to
1346	   the previous SEAL segment.

1348	   An amplification/reflection attack is possible when an attacker sends
1349	   IP first fragments with spoofed source addresses to an ETE, resulting
1350	   in a stream of SCMP messages returned to a victim ITE.  The SEAL_ID
1351	   in the encapsulated segment of the spoofed IP first fragment provides
1352	   mitigation for the ITE to detect and discard spurious SCMP messages.

1354	   The SEAL header is sent in-the-clear (outside of any IPsec/ESP
1355	   encapsulations) the same as for the outer IP and other outer headers.
1356	   In this respect, the threat model is no different than for IPv6
1357	   extension headers.  As for IPv6 extension headers, the SEAL header is
1358	   protected only by L2 integrity checks and is not covered under any L3
1359	   integrity checks.

1361	   SCMP messages carry the SEAL_ID of the packet-in-error.  Therefore,
1362	   when an ITE receives an SCMP message it can unambiguously associate
1363	   it with the SEAL data packet that triggered the error.  When the TEs
1364	   are synchronized, the ETE can also detect off-path spoofing attacks.

1366	   Security issues that apply to tunneling in general are discussed in
1367	   [I-D.ietf-v6ops-tunnel-security-concerns].

1369	10.  Related Work

1371	   Section 3.1.7 of [RFC2764] provides a high-level sketch for
1372	   supporting large tunnel MTUs via a tunnel-level segmentation and
1373	   reassembly capability to avoid IP level fragmentation, which is in
1374	   part the same approach used by SEAL.  SEAL could therefore be
1375	   considered as a fully functional manifestation of the method
1376	   postulated by that informational reference.

1378	   Section 3 of [RFC4459] describes inner and outer fragmentation at the
1379	   tunnel endpoints as alternatives for accommodating the tunnel MTU;
1380	   however, the SEAL protocol specifies a mid-layer segmentation and
1381	   reassembly capability that is distinct from both inner and outer
1382	   fragmentation.

1384	   Section 4 of [RFC2460] specifies a method for inserting and
1385	   processing extension headers between the base IPv6 header and
1386	   transport layer protocol data.  The SEAL header is inserted and
1387	   processed in exactly the same manner.

1389	   The concepts of path MTU determination through the report of
1390	   fragmentation and extending the IP Identification field were first
1391	   proposed in deliberations of the TCP-IP mailing list and the Path MTU
1392	   Discovery Working Group (MTUDWG) during the late 1980's and early
1393	   1990's.  SEAL supports a report fragmentation capability using bits
1394	   in an extension header (the original proposal used a spare bit in the
1395	   IP header) and supports ID extension through a 16-bit field in an
1396	   extension header (the original proposal used a new IP option).  A
1397	   historical analysis of the evolution of these concepts, as well as
1398	   the development of the eventual path MTU discovery mechanism for IP,
1399	   appears in Appendix D of this document.

1401	11.  SEAL Advantages over Classical Methods

1403	   The SEAL approach offers a number of distinct advantages over the
1404	   classical path MTU discovery methods [RFC1191] [RFC1981]:

1406	   1.  Classical path MTU discovery always results in packet loss when
1407	       an MTU restriction is encountered.  Using SEAL, IP fragmentation
1408	       provides a short-term interim mechanism for ensuring that packets
1409	       are delivered while SEAL adjusts its packet sizing parameters.

1411	   2.  Classical path MTU discovery may require several iterations of
1412	       dropping packets and returning PTB messages until an acceptable
1413	       path MTU value is determined.  Under normal circumstances, SEAL
1414	       determines the correct packet sizing parameters in one iteration.

1416	   3.  Using SEAL, ordinary packets serve as implicit probes without
1417	       exposing data to unnecessary loss.  SEAL also provides an
1418	       explicit probing mode not available in the classic methods.

1420	   4.  Using SEAL, ETEs encapsulate SCMP error messages in outer and
1421	       mid-layer headers such that packet-filtering network middleboxes
1422	       will not filter them as they would "raw" ICMP messages that may
1423	       be generated by an attacker.

1425	   5.  The SEAL approach ensures that the tunnel either delivers or
1426	       deterministically drops packets according to their size, which is
1427	       a required characteristic of any IP link.

1429	   6.  Most importantly, all SEAL packets have an Identification field
1430	       that is sufficiently long to be used for duplicate packet
1431	       detection purposes and to associate ICMP error messages with
1432	       actual packets sent without requiring per-packet state; hence,
1433	       SEAL avoids certain denial-of-service attack vectors open to the
1434	       classical methods.

1436	12.  Acknowledgments

1438	   The following individuals are acknowledged for helpful comments and
1439	   suggestions: Jari Arkko, Fred Baker, Iljitsch van Beijnum, Olivier
1440	   Bonaventure, Teco Boot, Bob Braden, Brian Carpenter, Steve Casner,
1441	   Ian Chakeres, Noel Chiappa, Remi Denis-Courmont, Remi Despres, Ralph
1442	   Droms, Arnaud Ebalard, Gorry Fairhurst, Dino Farinacci, Joel
1443	   Halpern, Sam Hartman, John Heffner, Thomas Henderson, Bob Hinden,
1444	   Christian Huitema, Eliot Lear, Darrel Lewis, Joe Macker, Matt Mathis,
1445	   Erik Nordmark, Dan Romascanu, Dave Thaler, Joe Touch, Mark Townsley,
1446	   Ole Troan, Margaret Wasserman, Magnus Westerlund, Robin Whittle,
1447	   James Woodyatt, and members of the Boeing Research & Technology NST
1448	   DC&NT group.

1450	   Path MTU determination through the report of fragmentation was first
1451	   proposed by Charles Lynn on the TCP-IP mailing list in 1987.
1452	   Extending the IP identification field was first proposed by Steve
1453	   Deering on the MTUDWG mailing list in 1989.

1455	13.  References

1457	13.1.  Normative References

1459	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
1460	              September 1981.

1462	   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
1463	              RFC 792, September 1981.

1465	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1466	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1468	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1469	              (IPv6) Specification", RFC 2460, December 1998.

1471	   [RFC3971]  Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
1472	              Neighbor Discovery (SEND)", RFC 3971, March 2005.

1474	   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
1475	              Message Protocol (ICMPv6) for the Internet Protocol
1476	              Version 6 (IPv6) Specification", RFC 4443, March 2006.

1478	   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
1479	              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
1480	              September 2007.

1482	13.2.  Informative References

1484	   [FOLK]     Shannon, C., Moore, D., and k. claffy, "Beyond Folklore:
1485	              Observations on Fragmented Traffic", December 2002.

1487	   [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
1488	              October 1987.

1490	   [I-D.ietf-intarea-ipv4-id-update]
1491	              Touch, J., "Updated Specification of the IPv4 ID Field",
1492	              draft-ietf-intarea-ipv4-id-update-00 (work in progress),
1493	              March 2010.

1495	   [I-D.ietf-tcpm-icmp-attacks]
1496	              Gont, F., "ICMP attacks against TCP",
1497	              draft-ietf-tcpm-icmp-attacks-12 (work in progress),
1498	              March 2010.

1500	   [I-D.ietf-v6ops-tunnel-security-concerns]
1501	              Hoagland, J., Krishnan, S., and D. Thaler, "Security
1502	              Concerns With IP Tunneling",
1503	              draft-ietf-v6ops-tunnel-security-concerns-02 (work in
1504	              progress), March 2010.

1506	   [I-D.russert-rangers]
1507	              Russert, S., Fleischman, E., and F. Templin, "RANGER
1508	              Scenarios", draft-russert-rangers-05 (work in progress),
1509	              July 2010.

1511	   [I-D.templin-intarea-vet]
1512	              Templin, F., "Virtual Enterprise Traversal (VET)",
1513	              draft-templin-intarea-vet-15 (work in progress),
1514	              June 2010.

1516	   [I-D.templin-iron]
1517	              Templin, F., "The Internet Routing Overlay Network
1518	              (IRON)", draft-templin-iron-08 (work in progress),
1519	              July 2010.

1521	   [MTUDWG]   "IETF MTU Discovery Working Group mailing list",
1522	              gatekeeper.dec.com/pub/DEC/WRL/mogul/mtudwg-log, November
1523	              1989 - February 1995.

1525	   [RFC1063]  Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP
1526	              MTU discovery options", RFC 1063, July 1988.

1528	   [RFC1070]  Hagens, R., Hall, N., and M. Rose, "Use of the Internet as
1529	              a subnetwork for experimentation with the OSI network
1530	              layer", RFC 1070, February 1989.

1532	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1533	              November 1990.

1535	   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
1536	              for IP version 6", RFC 1981, August 1996.

1538	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
1539	              October 1996.

1541	   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
1542	              IPv6 Specification", RFC 2473, December 1998.

1544	   [RFC2675]  Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
1545	              RFC 2675, August 1999.

1547	   [RFC2764]  Gleeson, B., Heinanen, J., Lin, A., Armitage, G., and A.
1548	              Malis, "A Framework for IP Based Virtual Private
1549	              Networks", RFC 2764, February 2000.

1551	   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
1552	              RFC 2923, September 2000.

1554	   [RFC3232]  Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by
1555	              an On-line Database", RFC 3232, January 2002.

1557	   [RFC3366]  Fairhurst, G. and L. Wood, "Advice to link designers on
1558	              link Automatic Repeat reQuest (ARQ)", BCP 62, RFC 3366,
1559	              August 2002.

1561	   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
1562	              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
1563	              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
1564	              RFC 3819, July 2004.

1566	   [RFC4191]  Draves, R. and D. Thaler, "Default Router Preferences and
1567	              More-Specific Routes", RFC 4191, November 2005.

1569	   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
1570	              for IPv6 Hosts and Routers", RFC 4213, October 2005.

1572	   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
1573	              Network Address Translations (NATs)", RFC 4380,
1574	              February 2006.

1576	   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
1577	              Network Tunneling", RFC 4459, April 2006.

1579	   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
1580	              Discovery", RFC 4821, March 2007.

1582	   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
1583	              Errors at High Data Rates", RFC 4963, July 2007.

1585	   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
1586	              Mitigations", RFC 4987, August 2007.

1588	   [RFC5445]  Watson, M., "Basic Forward Error Correction (FEC)
1589	              Schemes", RFC 5445, March 2009.

1591	   [RFC5720]  Templin, F., "Routing and Addressing in Networks with
1592	              Global Enterprise Recursion (RANGER)", RFC 5720,
1593	              February 2010.

1595	   [TBIT]     Medina, A., Allman, M., and S. Floyd, "Measuring
1596	              Interactions Between Transport Protocols and Middleboxes",
1597	              October 2004.

1599	   [TCP-IP]   "Archive/Hypermail of Early TCP-IP Mail List",
1600	              http://www-mice.cs.ucl.ac.uk/multimedia/misc/tcp_ip/, May
1601	              1987 - May 1990.

1603	   [WAND]     Luckie, M., Cho, K., and B. Owens, "Inferring and
1604	              Debugging Path MTU Discovery Failures", October 2005.

1606	Appendix A.  Reliability

1608	   Although a SEAL tunnel may span an arbitrarily-large subnetwork
1609	   expanse, the IP layer sees the tunnel as a simple link that supports
1610	   the IP service model.  Since SEAL supports segmentation at a layer
1611	   below IP, SEAL therefore presents a case in which the link unit of
1612	   loss (i.e., a SEAL segment) is smaller than the end-to-end
1613	   retransmission unit (e.g., a TCP segment).

1615	   Links with high bit error rates (BERs) (e.g., IEEE 802.11) use
1616	   Automatic Repeat-ReQuest (ARQ) mechanisms [RFC3366] to increase
1617	   packet delivery ratios, while links with much lower BERs typically
1618	   omit such mechanisms.  Since SEAL tunnels may traverse arbitrarily-
1619	   long paths over links of various types that are already either
1620	   performing or omitting ARQ as appropriate, it would therefore often
1621	   be inefficient to also require the tunnel to perform ARQ.

1623	   When the SEAL ITE has knowledge that the tunnel will traverse a
1624	   subnetwork with non-negligible loss due to, e.g., interference, link
1625	   errors, congestion, etc., it can solicit Segment Reports from the ETE
1626	   periodically to discover missing segments for retransmission within a
1627	   single round-trip time.  However, retransmission of missing segments
1628	   may require the ITE to maintain considerable state and may also
1629	   result in considerable delay variance and packet reordering.

1631	   SEAL may also use alternate reliability mechanisms such as Forward
1632	   Error Correction (FEC).  A simple FEC mechanism may merely entail
1633	   gratuitous retransmissions of duplicate data; however, more
1634	   efficient alternatives are also possible.  Basic FEC schemes are
1635	   discussed in [RFC5445].

1637	   The use of ARQ and FEC mechanisms for improved reliability is for
1638	   further study.

1640	Appendix B.  Integrity

1642	   Each link in the path over which a SEAL tunnel is configured is
1643	   responsible for link layer integrity verification for packets that
1644	   traverse the link.  As such, when a multi-segment SEAL packet with N
1645	   segments is reassembled, its segments will have been inspected by N
1646	   independent link layer integrity check streams instead of a single
1647	   stream that a single segment SEAL packet of the same size would have
1648	   received.  Intuitively, a reassembled packet subjected to N
1649	   independent integrity check streams of shorter-length segments would
1650	   seem to have integrity assurance that is no worse than a single-
1651	   segment packet subjected to only a single integrity check stream,
1652	   since the integrity check strength diminishes in inverse proportion
1653	   to segment length.  In any case, the link-layer integrity assurance
1654	   for a multi-segment SEAL packet is no different than for a multi-
1655	   fragment IPv6 packet.

1657	   Fragmentation and reassembly schemes must also consider packet-
1658	   splicing errors, e.g., when two segments from the same packet are
1659	   concatenated incorrectly, when a segment from packet X is reassembled
1660	   with segments from packet Y, etc.  The primary sources of such errors
1661	   include implementation bugs and wrapping IP ID fields.  In terms of
1662	   implementation bugs, the SEAL segmentation and reassembly algorithm
1663	   is much simpler than IP fragmentation, resulting in simplified
1664	   implementations.  In terms of wrapping ID fields, when IPv4 is used
1665	   as the outer IP protocol, the 16-bit IP ID field can wrap with only
1666	   64K packets with the same (src, dst, protocol)-tuple alive in the
1667	   system at a given time [RFC4963], increasing the likelihood of
1668	   reassembly mis-associations.  However, SEAL ensures that any outer
1669	   IPv4 fragmentation and reassembly will be short-lived and tuned out
1670	   as soon as the ITE receives a Reassembly Report, and SEAL
1671	   segmentation and reassembly uses a much longer ID field.  Therefore,
1672	   reassembly mis-associations of IP fragments and of SEAL segments
1673	   should be exceedingly rare.

1675	Appendix C.  Transport Mode

1677	   SEAL can also be used in "transport-mode", e.g., when the inner layer
1678	   comprises upper-layer protocol data rather than an encapsulated IP
1679	   packet.  For instance, TCP peers can negotiate the use of SEAL for
1680	   the carriage of protocol data encapsulated as IPv4/SEAL/TCP.  In this
1681	   sense, the "subnetwork" becomes the entire end-to-end path between
1682	   the TCP peers and may potentially span the entire Internet.

1684	   Section 4 specifies the operation of SEAL in "tunnel mode", i.e.,
1685	   when there are both an inner and outer IP layer with a SEAL
1686	   encapsulation layer between.  However, the SEAL protocol can also be
1687	   used in a "transport mode" of operation within a subnetwork region in
1688	   which the inner-layer corresponds to a transport layer protocol
1689	   (e.g., UDP, TCP, etc.) instead of an inner IP layer.

1691	   For example, two TCP endpoints connected to the same subnetwork
1692	   region can negotiate the use of transport-mode SEAL for a connection
1693	   by inserting a 'SEAL_OPTION' TCP option during the connection
1694	   establishment phase.  If both TCPs agree on the use of SEAL, their
1695	   protocol messages will be carried as IPv4/SEAL/TCP and the connection
1696	   will be serviced by the SEAL protocol using TCP (instead of an
1697	   encapsulating tunnel endpoint) as the transport layer protocol.  The
1698	   SEAL protocol for transport mode otherwise observes the same
1699	   specifications as in Section 4.
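
   The agreement step described above might be sketched as follows.
   This is a hedged illustration only: the text does not assign an
   option number or define the option's layout, so SEAL_OPTION_KIND
   is a hypothetical value (253 is one of the TCP option kinds set
   aside for experiments), and the assumption that the option is
   carried in the SYN and SYN-ACK segments is ours.

```python
# Sketch only: SEAL_OPTION_KIND is hypothetical. The source names a
# 'SEAL_OPTION' TCP option but does not assign it a kind or a format.
SEAL_OPTION_KIND = 253  # hypothetical; 253 is an experimental TCP option kind


def offers_seal(tcp_option_kinds):
    """True if a segment's option kinds include the hypothetical SEAL option."""
    return SEAL_OPTION_KIND in tcp_option_kinds


def seal_agreed(syn_option_kinds, synack_option_kinds):
    """Transport-mode SEAL is used only if both peers offered the option."""
    return offers_seal(syn_option_kinds) and offers_seal(synack_option_kinds)


if __name__ == "__main__":
    syn = [2, 4, 8, SEAL_OPTION_KIND]   # MSS, SACK-permitted, timestamps, SEAL
    synack_yes = [2, 4, 8, SEAL_OPTION_KIND]
    synack_no = [2, 4, 8]               # peer does not support SEAL
    print(seal_agreed(syn, synack_yes))
    print(seal_agreed(syn, synack_no))
```

   If either peer omits the option, the connection simply proceeds
   without transport-mode SEAL in this sketch.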

1701	Appendix D.  Historic Evolution of PMTUD

1703	   The topic of Path MTU discovery (PMTUD) saw a flurry of discussion
1704	   and numerous proposals in the late 1980s through early 1990.  The
1705	   initial problem was posed by Art Berggreen on May 22, 1987 in a
1706	   message to the TCP-IP discussion group [TCP-IP].  The discussion that
1707	   followed provided significant reference material for [FRAG].  An IETF
1708	   Path MTU Discovery Working Group [MTUDWG] was formed in late 1989
1709	   with a charter to produce an RFC.  Several variations on a very few
1710	   basic proposals were entertained, including:

1712	   1.  Routers record the PMTU estimate in ICMP-like path probe
1713	       messages (proposed in [FRAG] and later [RFC1063])

1715	   2.  The destination reports any fragmentation that occurs for packets
1716	       received with the "RF" (Report Fragmentation) bit set (Steve
1717	       Deering's 1989 adaptation of Charles Lynn's Nov. 1987 proposal)

1719	   3.  A hybrid combination of 1) and Charles Lynn's Nov. 1987 proposal
1720	       (straw RFC draft by McCloghrie, Fox and Mogul on Jan 12, 1990)

1722	   4.  Combination of the Lynn proposal with TCP (Fred Bohle, Jan 30,
1723	       1990)

1725	   5.  Fragmentation avoidance by setting "IP_DF" flag on all packets
1726	       and retransmitting if ICMPv4 "fragmentation needed" messages
1727	       occur (Geof Cooper's 1987 proposal; later adapted into [RFC1191]
1728	       by Mogul and Deering).

1730	   Option 1) seemed attractive to the group at the time, since it was
1731	   believed that routers would migrate more quickly than hosts.  Option
1732	   2) was a strong contender, but repeated attempts to secure an "RF"
1733	   bit in the IPv4 header from the IESG failed and the proponents became
1734	   discouraged.  Option 3) was abandoned because it was perceived as too
1735	   complicated, and 4) apparently never received serious
1736	   consideration.  Proposal 5) was a late entry into the discussion from
1737	   Steve Deering on Feb. 24th, 1990.  The discussion group soon
1738	   thereafter seemingly lost track of all other proposals and adopted
1739	   5), which eventually evolved into [RFC1191] and later [RFC1981].

1741	   In retrospect, the "RF" bit postulated in 2) is not needed if a
1742	   "contract" is first established between the peers, as in proposal 4)
1743	   and in a message to the MTUDWG mailing list from jrd@PTT.LCS.MIT.EDU
1744	   on Feb 19, 1990.  These proposals saw little discussion or rebuttal,
1745	   and were dismissed based on the following assertions:

1747	   o  routers upgrade their software faster than hosts

1749	   o  PCs could not reassemble fragmented packets

1751	   o  Proteon and Wellfleet routers did not reproduce the "RF" bit
1752	      properly in fragmented packets

1754	   o  Ethernet-FDDI bridges would need to perform fragmentation (i.e.,
1755	      "translucent" not "transparent" bridging)

1757	   o  the 16-bit IP_ID field could wrap around and disrupt reassembly at
1758	      high packet arrival rates

1760	   The first four assertions, although perhaps valid at the time, have
1761	   been overcome by historical events.  The final assertion is addressed
1762	   by the mechanisms specified in SEAL.

1764	Author's Address

1766	   Fred L. Templin (editor)
1767	   Boeing Research & Technology
1768	   P.O. Box 3707
1769	   Seattle, WA  98124
1770	   USA

1772	   Email: fltemplin@acm.org