idnits 2.17.1 draft-templin-intarea-seal-68.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. == The 'Updates: ' line in the draft header should list only the _numbers_ of the RFCs which will be updated by this document (if approved); it should not include the word 'RFC' in the list. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (January 03, 2014) is 3758 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'I-D.templin-aerolink' is mentioned on line 503, but not defined == Unused Reference: 'RFC0768' is defined on line 1170, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) == Outdated reference: A later version (-02) exists of draft-taylor-v6ops-fragdrop-01 -- Obsolete informational reference (is this intentional?): RFC 1981 (Obsoleted by RFC 8201) -- Obsolete informational reference (is this intentional?): RFC 6434 (Obsoleted by RFC 8504) Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group F. Templin, Ed. 3 Internet-Draft Boeing Research & Technology 4 Obsoletes: rfc5320 (if approved) January 03, 2014 5 Updates: rfc2460 (if approved) 6 Intended status: Standards Track 7 Expires: July 7, 2014 9 The Subnetwork Encapsulation and Adaptation Layer (SEAL) 10 draft-templin-intarea-seal-68.txt 12 Abstract 14 This document specifies a Subnetwork Encapsulation and Adaptation 15 Layer (SEAL). SEAL operates over virtual topologies configured over 16 connected IP network routing regions bounded by encapsulating border 17 nodes. These virtual topologies are manifested by tunnels that may 18 span multiple IP and/or sub-IP layer forwarding hops, where they may 19 incur packet duplication, packet reordering, source address spoofing 20 and traversal of links with diverse Maximum Transmission Units 21 (MTUs). SEAL addresses these issues through the encapsulation and 22 messaging mechanisms specified in this document. 
24 Status of this Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on July 7, 2014. 41 Copyright Notice 43 Copyright (c) 2014 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 59 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 4 60 1.2. Approach . . . . . . . . . . . . . . . . . . . . . . . . . 6 61 1.3. Differences with RFC5320 . . . . . . . . . . . . . . . . . 7 62 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 63 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 9 64 4. Applicability Statement . . . . . . . . . . . . . . . . . . . 9 65 5. SEAL Specification . . . . . . . . . . . . . . . . . . . . . . 10 66 5.1. 
SEAL Tunnel Model . . . . . . . . . . . . . . . . . . . . 10 67 5.2. SEAL Model of Operation . . . . . . . . . . . . . . . . . 11 68 5.3. SEAL Encapsulation Format . . . . . . . . . . . . . . . . 12 69 5.4. ITE Specification . . . . . . . . . . . . . . . . . . . . 13 70 5.4.1. Tunnel MTU . . . . . . . . . . . . . . . . . . . . . . 13 71 5.4.2. Tunnel Neighbor Soft State . . . . . . . . . . . . . . 14 72 5.4.3. SEAL Layer Pre-Processing . . . . . . . . . . . . . . 15 73 5.4.4. SEAL Encapsulation and Fragmentation . . . . . . . . . 16 74 5.4.5. Outer Encapsulation . . . . . . . . . . . . . . . . . 16 75 5.4.6. Path MTU Probing and ETE Reachability Verification . . 17 76 5.4.7. Processing ICMP Messages . . . . . . . . . . . . . . . 18 77 5.4.8. Detecting Path MTU Changes . . . . . . . . . . . . . . 19 78 5.5. ETE Specification . . . . . . . . . . . . . . . . . . . . 19 79 5.5.1. Reassembly Buffer Requirements . . . . . . . . . . . . 19 80 5.5.2. Tunnel Neighbor Soft State . . . . . . . . . . . . . . 19 81 5.5.3. IPv4-Layer Reassembly . . . . . . . . . . . . . . . . 19 82 5.5.4. Decapsulation, SEAL-Layer Reassembly, and 83 Re-Encapsulation . . . . . . . . . . . . . . . . . . . 20 84 6. Link Requirements . . . . . . . . . . . . . . . . . . . . . . 21 85 7. End System Requirements . . . . . . . . . . . . . . . . . . . 21 86 8. Router Requirements . . . . . . . . . . . . . . . . . . . . . 21 87 9. Multicast/Anycast Considerations . . . . . . . . . . . . . . . 21 88 10. Compatibility Considerations . . . . . . . . . . . . . . . . . 22 89 11. Nested Encapsulation Considerations . . . . . . . . . . . . . 22 90 12. Reliability Considerations . . . . . . . . . . . . . . . . . . 23 91 13. Integrity Considerations . . . . . . . . . . . . . . . . . . . 23 92 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 93 15. Security Considerations . . . . . . . . . . . . . . . . . . . 24 94 16. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 24 95 17. 
Implementation Status . . . . . . . . . . . . . . . . . . . . 24 96 18. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 25 97 19. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 98 19.1. Normative References . . . . . . . . . . . . . . . . . . . 25 99 19.2. Informative References . . . . . . . . . . . . . . . . . . 26 100 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 29 102 1. Introduction 104 As Internet technology and communication has grown and matured, many 105 techniques have developed that use virtual topologies (manifested by 106 tunnels of one form or another) over an actual network that supports 107 the Internet Protocol (IP) [RFC0791][RFC2460]. Those virtual 108 topologies have elements that appear as one network layer hop, but 109 are actually multiple IP or sub-IP layer hops which comprise the 110 "subnetwork" over which the tunnel operates. 112 The use of IP encapsulation (also known as "tunneling") has long been 113 considered as the means for creating such virtual topologies (e.g., 114 see [RFC2003][RFC2473]). Tunnels serve a wide variety of purposes, 115 including mobility, security, routing control, traffic engineering, 116 multihoming, etc., and will remain an integral part of the 117 architecture moving forward. However, the encapsulation headers 118 often include insufficiently provisioned per-packet identification 119 values. IP encapsulation also allows an attacker to produce 120 encapsulated packets with spoofed source addresses even if the source 121 address in the encapsulating header cannot be spoofed. A denial-of- 122 service vector that is not possible in non-tunneled subnetworks is 123 therefore presented. 125 Additionally, the insertion of an outer IP header reduces the 126 effective Maximum Transmission Unit (MTU) visible to the inner 127 network layer. 
When IPv6 is used as the encapsulation protocol, 128 original sources expect to be informed of the MTU limitation through 129 IPv6 Path MTU discovery (PMTUD) [RFC1981]. When IPv4 is used, this 130 reduced MTU can be accommodated through the use of IPv4 131 fragmentation, but unmitigated in-the-network fragmentation has been 132 deemed harmful through operational experience and studies conducted 133 over the course of many years [FRAG][FOLK][RFC4963]. Additionally, 134 classical IPv4 PMTUD [RFC1191] has known operational issues that are 135 exacerbated by in-the-network tunnels [RFC2923][RFC4459]. 137 The following subsections present further details on the motivation 138 and approach for addressing these issues. 140 1.1. Motivation 142 Before discussing the approach, it is necessary to first understand 143 the problems. In both the Internet and private-use networks today, 144 IP is ubiquitously deployed as the Layer 3 protocol. The primary 145 functions of IP are to provide for routing, addressing, and a 146 fragmentation and reassembly capability used to accommodate links 147 with diverse MTUs. While it is well known that the IP address space 148 is rapidly becoming depleted, there is also a growing awareness that 149 other IP protocol limitations have already or may soon become 150 problematic. 152 First, the Internet historically provided no means for discerning 153 whether the source addresses of IP packets are authentic. This 154 shortcoming is being addressed more and more through the deployment 155 of site border router ingress filters [RFC2827], however the use of 156 encapsulation provides a vector for an attacker to circumvent 157 filtering for the encapsulated packet even if filtering is correctly 158 applied to the encapsulation header. Secondly, the IP header does 159 not include a well-behaved identification value unless the source has 160 included a fragment header for IPv6 or unless the source permits 161 fragmentation for IPv4. 
These limitations preclude an efficient 162 means for routers to detect duplicate packets and packets that have 163 been re-ordered within the subnetwork. Additionally, recent studies 164 have shown that the arrival of fragments at high data rates can cause 165 denial-of-service (DoS) attacks on performance-sensitive networking 166 gear, prompting some administrators to configure their equipment to 167 drop fragments unconditionally [I-D.taylor-v6ops-fragdrop]. 169 For IPv4 encapsulation, when fragmentation is permitted the header 170 includes a 16-bit Identification field, meaning that at most 2^16 171 unique packets with the same (source, destination, protocol)-tuple 172 can be active in the network at the same time [RFC6864]. (When 173 middleboxes such as Network Address Translators (NATs) re-write the 174 Identification field to random values, the number of unique packets 175 is even further reduced.) Due to the escalating deployment of high- 176 speed links, however, these numbers have become too small by several 177 orders of magnitude for high data rate packet sources such as tunnel 178 endpoints [RFC4963]. 180 Furthermore, there are many well-known limitations pertaining to IPv4 181 fragmentation and reassembly - even to the point that it has been 182 deemed "harmful" in both classic and modern-day studies (see above). 183 In particular, IPv4 fragmentation raises issues ranging from minor 184 annoyances (e.g., in-the-network router fragmentation [RFC1981]) to 185 the potential for major integrity issues (e.g., mis-association of 186 the fragments of multiple IP packets during reassembly [RFC4963]). 188 As a result of these perceived limitations, a fragmentation-avoiding 189 technique for discovering the MTU of the forward path from a source 190 to a destination node was devised through the deliberations of the 191 Path MTU Discovery Working Group (MTUDWG) during the late 1980's 192 through early 1990's which resulted in the publication of [RFC1191]. 
193 In this negative feedback-based method, the source node provides 194 explicit instructions to routers in the path to discard the packet 195 and return an ICMP error message if an MTU restriction is 196 encountered. However, this approach has several serious shortcomings 197 that lead to an overall "brittleness" [RFC2923]. 199 In particular, site border routers in the Internet have been known to 200 discard ICMP error messages coming from the outside world. This is 201 due in large part to the fact that malicious spoofing of error 202 messages in the Internet is trivial since there is no way to 203 authenticate the source of the messages [RFC5927]. Furthermore, when 204 a source node that requires ICMP error message feedback when a packet 205 is dropped due to an MTU restriction does not receive the messages, a 206 path MTU-related black hole occurs. This means that the source will 207 continue to send packets that are too large and never receive an 208 indication from the network that they are being discarded. This 209 behavior has been confirmed through documented studies showing clear 210 evidence of PMTUD failures for both IPv4 and IPv6 in the Internet 211 today [TBIT][WAND][SIGCOMM][RIPE]. 213 The issues with both IP fragmentation and this "classical" PMTUD 214 method are exacerbated further when IP tunneling is used [RFC4459]. 215 For example, a tunnel ingress may be required to forward encapsulated 216 packets into the subnetwork on behalf of hundreds, thousands, or even 217 more original sources. If the ITE allows IP fragmentation on the 218 encapsulated packets, persistent fragmentation could lead to 219 undetected data corruption due to Identification field wrapping 220 and/or reassembly congestion at the tunnel egress. 
If the ingress 221 instead uses classical IP PMTUD it must rely on ICMP error messages 222 coming from the subnetwork that may be suspect, subject to loss due 223 to filtering middleboxes, or insufficiently provisioned for 224 translation into error messages to be returned to the original 225 sources. 227 Although recent works have led to the development of a positive 228 feedback-based end-to-end MTU determination scheme [RFC4821], they do 229 not excuse tunnels from accounting for the encapsulation overhead 230 they add to packets. Moreover, in current practice existing 231 tunneling protocols mask the MTU issues by selecting a "lowest common 232 denominator" MTU that may be much smaller than necessary for most 233 paths and difficult to change at a later date. Therefore, a new 234 approach to accommodate tunnels over links with diverse MTUs is 235 necessary. 237 1.2. Approach 239 This document concerns subnetworks manifested through a virtual 240 topology configured over a connected network routing region and 241 bounded by encapsulating border nodes. Example connected network 242 routing regions include Mobile Ad hoc Networks (MANETs), enterprise 243 networks, aviation networks and the global public Internet itself. 244 Subnetwork border nodes forward unicast and multicast packets over 245 the virtual topology across multiple IP and/or sub-IP layer 246 forwarding hops that may introduce packet duplication and/or traverse 247 links with diverse Maximum Transmission Units (MTUs). 249 This document introduces a Subnetwork Encapsulation and Adaptation 250 Layer (SEAL) for tunneling inner network layer protocol packets over 251 IP subnetworks that connect Ingress and Egress Tunnel Endpoints 252 (ITEs/ETEs) of border nodes. It provides a modular specification 253 designed to be tailored to specific associated tunneling protocols. 254 (A transport-mode of operation is also possible but out of scope for 255 this document.) 
257 SEAL treats tunnels that traverse the subnetwork as ordinary links 258 that must support network layer services. Moreover, SEAL provides 259 dynamic mechanisms (including limited fragmentation and reassembly) 260 to ensure a maximal path MTU over the tunnel. This is in contrast to 261 static approaches which avoid MTU issues by selecting a lowest common 262 denominator MTU value that may be overly conservative for the vast 263 majority of tunnel paths and difficult to change even when larger 264 MTUs become available. 266 1.3. Differences with RFC5320 268 This specification of SEAL is descended from an experimental 269 independent RFC publication of the same name [RFC5320]. However, 270 this specification introduces a number of fundamental differences 271 from the earlier publication. This specification therefore obsoletes 272 (i.e., and does not update) [RFC5320]. 274 First, [RFC5320] forms a 32-bit Identification value by concatenating 275 the 16-bit IPv4 Identification field with a 16-bit Identification 276 "extension" field in the SEAL header. This means that [RFC5320] can 277 only operate over IPv4 networks (since IPv6 headers do not include a 278 16-bit Identification field) and that the SEAL Identification value can be 279 corrupted if the Identification in the outer IPv4 header is 280 rewritten. In contrast, this specification includes a 32-bit 281 Identification value that is independent of any identification fields 282 found in the inner or outer IP headers, and is therefore compatible 283 with any inner and outer IP protocol version combinations. 285 Additionally, the SEAL fragmentation and reassembly procedures 286 defined in [RFC5320] differ significantly from those found in this 287 specification. In particular, this specification defines a 13-bit 288 Offset field that allows for finer-grained fragment sizes when SEAL 289 fragmentation and reassembly are necessary.
In contrast, [RFC5320] 290 includes only a 3-bit Segment field and performs reassembly through 291 concatenation of consecutive segments. 293 Finally, SEAL no longer uses the IPv4 fragmentation sensing method 294 specified in [RFC5320] as well as in earlier versions of this 295 document. This departure is based on the fact that there is no way 296 for the ITE or ETE to control the way in which middleboxes perform 297 IPv4 fragmentation (e.g., largest fragment first, smallest fragment 298 first, all fragments the same size, etc.). Moreover, there may be 299 middleboxes in the path that reassemble IPv4 fragmented packets 300 before delivering them to the ETE as the final destination. Use of 301 IPv4 fragmentation sensing in the ETE also greatly complicated the 302 specification and proved difficult to implement. Therefore, although 303 the IPv4 fragmentation sensing method is conceptually elegant and 304 natural, it is no longer included. 306 2. Terminology 308 The following terms are defined within the scope of this document: 310 subnetwork 311 a virtual topology configured over a connected network routing 312 region and bounded by encapsulating border nodes. 314 IP 315 used to generically refer to either Internet Protocol (IP) 316 version, i.e., IPv4 or IPv6. 318 Ingress Tunnel Endpoint (ITE) 319 a portal over which an encapsulating border node (host or router) 320 sends encapsulated packets into the subnetwork. 322 Egress Tunnel Endpoint (ETE) 323 a portal over which an encapsulating border node (host or router) 324 receives encapsulated packets from the subnetwork. 326 inner packet 327 an unencapsulated network layer protocol packet (e.g., IPv4 328 [RFC0791], OSI/CLNP [RFC0994], IPv6 [RFC2460], etc.) before any 329 outer encapsulations are added. Internet protocol numbers that 330 identify inner packets are found in the IANA Internet Protocol 331 registry [RFC3232]. 
SEAL protocol packets that incur an 332 additional layer of SEAL encapsulation are also considered inner 333 packets. 335 outer IP packet 336 a packet resulting from adding an outer IP header (and possibly 337 other outer headers) to a SEAL-encapsulated inner packet. 339 packet-in-error 340 the leading portion of an invoking data packet encapsulated in the 341 body of an error control message (e.g., an ICMPv4 [RFC0792] error 342 message, an ICMPv6 [RFC4443] error message, etc.). 344 Packet Too Big (PTB) message 345 a control plane message indicating an MTU restriction (e.g., an 346 ICMPv6 "Packet Too Big" message [RFC4443], an ICMPv4 347 "Fragmentation Needed" message [RFC0792], etc.). 349 Don't Fragment (DF) bit 350 a bit that indicates whether the packet may be fragmented by the 351 network. The DF bit is explicitly included in the IPv4 header 352 [RFC0791] and may be set to '0' to allow fragmentation or '1' to 353 disallow further in-network fragmentation. The bit is absent from 354 the IPv6 header [RFC2460], but implicitly set to '1' because 355 fragmentation can occur only at IPv6 sources. 357 3. Requirements 359 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 360 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 361 document are to be interpreted as described in [RFC2119]. When used 362 in lower case (e.g., must, must not, etc.), these words MUST NOT be 363 interpreted as described in [RFC2119], but are rather interpreted as 364 they would be in common English. 366 4. Applicability Statement 368 SEAL was originally motivated by the specific case of subnetwork 369 abstraction for Mobile Ad hoc Networks (MANETs), however the domain 370 of applicability also extends to subnetwork abstractions over 371 enterprise networks, mobile networks, aviation networks, ISP 372 networks, SO/HO networks, the global public Internet itself, and any 373 other connected network routing region. 
375 SEAL provides a network sublayer used during encapsulation of an 376 inner network layer packet within outer encapsulating headers. SEAL 377 can also be used as a sublayer within a transport layer protocol data 378 payload, where transport layer encapsulation is typically used for 379 Network Address Translator (NAT) traversal as well as operation over 380 subnetworks that give preferential treatment to certain "core" 381 Internet protocols, e.g., TCP, UDP, etc. (However, note that TCP 382 encapsulation may not be appropriate for all use cases, particularly 383 those that require low delay and/or low delay variance.) The SEAL header 384 is processed in the same manner as for IPv6 extension headers, i.e., 385 it is not part of the outer IP header but rather allows for the 386 creation of an arbitrarily extensible chain of headers in the same 387 way that IPv6 does. 389 To accommodate MTU diversity, the Ingress Tunnel Endpoint (ITE) may 390 need to perform limited fragmentation which the Egress Tunnel 391 Endpoint (ETE) reassembles. The ITE and ETE further engage in 392 minimal path probing to determine when the path can be traversed 393 without fragmentation. This allows the ITE to send whole packets 394 instead of fragmented packets whenever possible. 396 In practice, SEAL is typically used as an encapsulation sublayer in 397 conjunction with existing tunnel types such as IPsec [RFC4301], 398 GRE [RFC1701], IP-in-IPv6 [RFC2473], IP-in-IPv4 [RFC4213][RFC2003], 399 etc. When used with existing tunnel types that insert mid-layer 400 headers between the inner and outer IP headers (e.g., IPsec, GRE, 401 etc.), the SEAL header is inserted between the mid-layer headers and 402 outer IP header. 404 5. SEAL Specification 406 The following sections specify the operation of SEAL: 408 5.1. SEAL Tunnel Model 410 SEAL is an encapsulation sublayer used within point-to-point, point- 411 to-multipoint, and non-broadcast, multiple access (NBMA) tunnels.
412 SEAL can also be used with multicast-capable tunnels, but the path 413 probing mechanisms specified in the following sections may not always 414 be sufficient to determine an optimal MTU for a multicast group. 416 Each tunnel is configured over one or more underlying interfaces 417 attached to subnetwork links, where each link represents a different 418 subnetwork path. The tunnel connects an ITE to one or more ETE 419 "neighbors" via encapsulation across an underlying subnetwork, where 420 each tunnel neighbor relationship is maintained over one or more 421 subnetwork paths. The tunnel neighbor relationship may be 422 bidirectional, partially unidirectional or fully unidirectional. 424 A bidirectional tunnel neighbor relationship is one over which both 425 tunnel endpoints can exchange both data and control messages. A 426 partially unidirectional tunnel neighbor relationship allows the near 427 end ITE to send data packets forward to the far end ETE, while the 428 far end only returns control messages when necessary. Finally, a 429 fully unidirectional mode of operation is one in which the near end 430 ITE can receive neither data nor control messages from the far end 431 ETE. 433 5.2. SEAL Model of Operation 435 SEAL-enabled ITEs encapsulate each inner packet in any ancillary 436 tunnel protocol headers and trailers, a SEAL header, and any outer 437 header encapsulations as shown in Figure 1:

439                              +--------------------+
440                              ~   outer IP header  ~
441                              +--------------------+
442                              ~  other outer hdrs  ~
443                              +--------------------+
444                              ~    SEAL header     ~
445                              +--------------------+
446                              ~   tunnel headers   ~
447  +--------------------+      +--------------------+
448  |                    |  --> |                    |
449  ~        Inner       ~  --> ~        Inner       ~
450  ~       Packet       ~  --> ~       Packet       ~
451  |                    |  --> |                    |
452  +--------------------+      +--------------------+
453                              ~  tunnel trailers   ~
454                              +--------------------+

456                       Figure 1: SEAL Encapsulation

458 The ITE inserts the SEAL header according to the specific tunneling 459 protocol.
For simple encapsulation of an inner network layer packet 460 within an outer IP header, the ITE inserts the SEAL header following 461 the outer IP header and before the inner packet as: IP/SEAL/{inner 462 packet}. 464 For encapsulations over transports such as UDP, the ITE inserts the 465 SEAL header following the outer transport layer header and before the 466 inner packet, e.g., as IP/UDP/SEAL/{inner packet}. In that case, the 467 UDP header is seen as an "other outer header" as depicted in Figure 1 468 and the outer IP and transport layer headers are together seen as the 469 outer encapsulation headers. (Note that outer transport layer 470 headers such as UDP must sometimes be included to ensure that SEAL 471 packets will traverse the path to the ETE without loss due to filtering 472 middleboxes. The ETE MUST accept both IP/SEAL and IP/UDP/SEAL as 473 equivalent packets so that the ITE can discontinue outer transport 474 layer encapsulation if the path supports raw IP/SEAL encapsulation.) 476 For SEAL encapsulations that involve tunnel types that include 477 ancillary tunnel headers (e.g., GRE, IPsec, etc.) the ITE inserts the 478 SEAL header as a leading extension to the tunnel headers, i.e., the 479 SEAL encapsulation appears as part of the same tunnel and not a 480 separate tunnel. For example, for GRE the ITE inserts the SEAL 481 header as IP/SEAL/GRE/{inner packet}, and for IPsec the ITE inserts 482 the SEAL header as IP/SEAL/IPsec-header/{inner packet}/IPsec-trailer. 483 In such cases, SEAL considers the length of the inner packet only 484 (i.e., and not the other tunnel headers and trailers) when performing 485 its packet size calculations. 487 SEAL supports both "nested" tunneling and "re-encapsulating" 488 tunneling. Nested tunneling occurs when a first tunnel is 489 encapsulated within a second tunnel, which may then further be 490 encapsulated within additional tunnels.
Nested tunneling can be 491 useful, and stands in contrast to "recursive" tunneling which is an 492 anomalous condition incurred due to misconfiguration or a routing 493 loop. Considerations for nested tunneling and avoiding recursive 494 tunneling are discussed in Section 4 of [RFC2473] as well as in 495 Section 11 of this document. 497 Re-encapsulating tunneling occurs when a packet arrives at a first 498 ETE, which then acts as an ITE to re-encapsulate and forward the 499 packet to a second ETE connected to the same subnetwork. In that 500 case each ITE/ETE transition represents a segment of a bridged path 501 between the ITE nearest the source and the ETE nearest the 502 destination. Uses for re-encapsulating tunneling are discussed in 503 [I-D.templin-aerolink]. Combinations of nested and re-encapsulating 504 tunneling are also naturally supported by SEAL. 506 The SEAL ITE considers each underlying interface as the ingress 507 attachment point to a separate subnetwork path to the ETE. The ITE 508 therefore may experience different path MTUs on different subnetwork 509 paths. 511 Finally, the SEAL ITE ensures that the inner network layer protocol 512 will see a minimum MTU of 1500 bytes over each subnetwork path 513 regardless of the outer network layer protocol version, i.e., even if 514 a small amount of fragmentation and reassembly are necessary. This 515 is to avoid path MTU "black holes" for the minimum MTU configured by 516 the vast majority of links in the Internet. 518 5.3. SEAL Encapsulation Format 520 The SEAL header shares the same format and IP protocol number ('44') 521 as the IPv6 Fragment Header specified in Section 4.5 of [RFC2460]. 522 The SEAL header is differentiated from the IPv6 Fragment Header by 523 defining bit number 30 as the "SEAL (S)" bit which is set to 1 when 524 SEAL encapsulation is used and set to 0 for ordinary IPv6 525 fragmentation.
SEAL therefore updates the IPv6 Fragment Header 526 specification as shown in Figure 2:

528      0                   1                   2                   3
529      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
530     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
531     |  Next Header  |   Reserved    |      Fragment Offset    |R|S|M|
532     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
533     |                         Identification                        |
534     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

536                   Figure 2: SEAL Encapsulation Format

538 5.4. ITE Specification 540 5.4.1. Tunnel MTU 542 The tunnel must present a stable MTU value to the inner network layer 543 as the size for admission of inner packets into the tunnel. Since 544 tunnels may support a large set of subnetwork paths that accept 545 widely varying maximum packet sizes, however, a number of factors 546 should be taken into consideration when selecting a tunnel MTU. 548 Due to the ubiquitous deployment of standard Ethernet and similar 549 networking gear, the nominal Internet cell size has become 1500 550 bytes; this is the de facto size that end systems have come to expect 551 will either be delivered by the network without loss due to an MTU 552 restriction on the path or a suitable ICMP Packet Too Big (PTB) 553 message returned. When large packets sent by end systems incur 554 additional encapsulation at an ITE, however, they may be dropped 555 silently within the tunnel since the network may not always deliver 556 the necessary PTBs [RFC2923]. The ITE SHOULD therefore set a tunnel 557 MTU of at least 1500 bytes and provide accommodations to ensure that 558 packets up to that size are successfully conveyed to the ETE. 560 The inner network layer protocol consults the tunnel MTU when 561 admitting a packet into the tunnel.
For non-SEAL inner IPv4 packets 562 with the IPv4 Don't Fragment (DF) bit cleared (i.e., DF==0), if the 563 packet is larger than the tunnel MTU the inner IPv4 layer uses IPv4 564 fragmentation to break the packet into fragments no larger than the 565 tunnel MTU. The ITE then admits each fragment into the tunnel as an 566 independent packet. 568 For all other inner packets, the inner network layer admits the 569 packet if it is no larger than the tunnel MTU; otherwise, it drops 570 the packet and sends a PTB error message to the source with the MTU 571 value set to the tunnel MTU. The message contains as much of the invoking 572 packet as possible without the entire message exceeding the network 573 layer minimum MTU size. 575 The ITE can alternatively set an indefinite tunnel MTU such that all 576 inner packets are admitted into the tunnel regardless of their size 577 (practical maximums are 64KB for IPv4 and 4GB for IPv6 [RFC2675]). 578 For ITEs that host applications that use the tunnel directly, this 579 option must be carefully coordinated with protocol stack upper layers 580 since some upper layer protocols (e.g., TCP) derive their packet 581 sizing parameters from the MTU of the outgoing interface and as such 582 may select too large an initial size. This is not a problem for 583 upper layers that use conservative initial maximum segment size 584 estimates and/or when the tunnel can reduce the upper layer's maximum 585 segment size, e.g., by reducing the size advertised in the MSS option 586 of outgoing TCP messages (sometimes known as "MSS clamping"). 588 In light of the above considerations, the ITE SHOULD configure an 589 indefinite MTU on *router* tunnels so that SEAL performs all 590 subnetwork adaptation from within the tunnel as specified in the 591 following sections.
The ITE MAY instead set a smaller MTU on *host* 592 tunnels; in that case, the RECOMMENDED MTU is the maximum of 1500 593 bytes and the smallest MTU among all of the underlying links minus 594 the size of the encapsulation headers. 596 5.4.2. Tunnel Neighbor Soft State 598 The ITE maintains a number of soft state variables and constants. 600 The ITE maintains a per-ETE window of Identification values for the 601 packets it sends to the ETE. The ITE increments the current 602 Identification value monotonically (modulo 2^32) for each packet it 603 sends. 605 For each subnetwork path, the ITE must also account for encapsulation 606 header lengths. The ITE therefore maintains the per subnetwork path 607 constant values "SHLEN" set to the length of the SEAL header, "THLEN" 608 set to the length of the outer encapsulating transport layer headers 609 (or 0 if outer transport layer encapsulation is not used), "IHLEN" 610 set to the length of the outer IP layer header, and "HLEN" set to 611 (SHLEN+THLEN+IHLEN). When calculating these lengths, the ITE must 612 include the length of the uncompressed headers even if header 613 compression is enabled. When SEAL is used in conjunction with tunnel 614 types that insert additional headers/trailers such as GRE or IPsec, 615 the length of the additional headers and trailers is also included in 616 the HLEN calculation. 618 The ITE also sets a global constant value "MINMTU" to 1500 bytes and 619 sets a per subnetwork path constant value 'FRAGMTU' to (1280-HLEN) 620 bytes (where 1280 is the minimum path MTU for IPv6 [RFC2460]). The 621 value 1280 is used regardless of the outer IP protocol version even 622 though the practical minimum MTU for IPv4 is only 576 bytes [RFC1122] 623 and the theoretical minimum MTU for IPv4 is only 68 bytes [RFC0791]. 
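
The header-length bookkeeping described above can be sketched as follows. This is a minimal illustration, not part of the specification: the class name PathHeaderLengths and the field name "extra" are hypothetical, the 8-byte SEAL header length is read off the two 32-bit words in Figure 2, and the example paths assume a 20-byte IPv4 header without options, an 8-byte UDP header, and a 40-byte IPv6 header.

```python
from dataclasses import dataclass

MINMTU = 1500          # global constant named in the text
IPV6_MIN_MTU = 1280    # IPv6 minimum path MTU [RFC2460]
SEAL_HDR_LEN = 8       # assumption: two 32-bit words, per Figure 2


@dataclass(frozen=True)
class PathHeaderLengths:
    """Per-subnetwork-path constants (hypothetical container)."""
    ihlen: int                 # IHLEN: outer IP header, uncompressed
    thlen: int = 0             # THLEN: outer transport header, 0 if unused
    shlen: int = SEAL_HDR_LEN  # SHLEN: SEAL header
    extra: int = 0             # additional headers/trailers (e.g., GRE, IPsec)

    @property
    def hlen(self) -> int:
        # HLEN = SHLEN + THLEN + IHLEN, plus any additional headers/trailers
        return self.shlen + self.thlen + self.ihlen + self.extra

    @property
    def fragmtu(self) -> int:
        # FRAGMTU = 1280 - HLEN, regardless of the outer IP version
        return IPV6_MIN_MTU - self.hlen


udp_over_ipv4 = PathHeaderLengths(ihlen=20, thlen=8)
plain_ipv6 = PathHeaderLengths(ihlen=40)
print(udp_over_ipv4.hlen, udp_over_ipv4.fragmtu)  # 36 1244
print(plain_ipv6.hlen, plain_ipv6.fragmtu)        # 48 1232
```

Keeping the lengths per path rather than per tunnel mirrors the text, since each underlying interface may use a different outer encapsulation.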
625 The value 1280 is applied also to IPv4 since IPv4 links with MTUs 626 smaller than 1280 are presumably performance-constrained such that 627 IPv4 fragmentation can be used to accommodate MTU underruns without 628 risk of high data rate reassembly misassociations. 630 The ITE also sets a per subnetwork path variable "MAXMTU" to the 631 maximum of MINMTU and the MTU of the underlying interface minus HLEN. 632 The ITE thereafter adjusts MAXMTU based on any PTB messages it 633 receives from the subnetwork, but does not reduce MAXMTU below 634 MINMTU. 636 The ITE finally maintains a per subnetwork path boolean variable 637 "DOFRAG", which is initially set to TRUE and may be reset to FALSE if 638 the ITE discovers that the MTU on the path to the ETE is sufficient 639 to accommodate packet sizes of MINMTU bytes or larger. 641 5.4.3. SEAL Layer Pre-Processing 643 The SEAL layer is logically positioned between the inner and outer 644 network protocol layers, where the inner layer is seen as the (true) 645 network layer and the outer layer is seen as the (virtual) data link 646 layer. Each packet to be processed by the SEAL layer is either 647 admitted into the tunnel by the inner network layer protocol as 648 described in Section 5.4.1 or is undergoing re-encapsulation from 649 within the tunnel. The SEAL layer sees the former class of packets 650 as inner packets that include inner network and transport layer 651 headers, and sees the latter class of packets as transitional SEAL 652 packets that include the outer and SEAL layer headers that were 653 inserted by the previous hop SEAL ITE. For these transitional 654 packets, the SEAL layer re-encapsulates the packet with new outer and 655 SEAL layer headers when it forwards the packet to the next hop SEAL 656 ITE. 658 We now discuss the SEAL layer pre-processing actions for these two 659 classes of packets. 661 5.4.3.1. 
Inner Packet Pre-Processing 663 For each IPv4 inner packet with DF==0 in the IP header, if the packet 664 is larger than MINMTU bytes the ITE first uses standard IPv4 665 fragmentation to fragment the packet into N pieces of at most MINMTU 666 bytes each. In this process, the ITE MUST additionally ensure that N 667 is minimized, the first fragment is the largest fragment and no 668 fragments are overlapping. The ITE then submits each fragment for 669 SEAL encapsulation as specified in Section 5.4.4. 671 For all other inner packets, if the packet is no larger than MAXMTU 672 the ITE submits it for SEAL encapsulation as specified in Section 673 5.4.4. Otherwise, the ITE discards the packet and sends a PTB 674 message appropriate to the inner protocol version (subject to rate 675 limiting) with the MTU field set to MAXMTU. 677 5.4.3.2. Transitional SEAL Packet Pre-Processing 679 For each transitional packet that is to be processed by the SEAL 680 layer from within the tunnel, if the packet is larger than MAXMTU for 681 the next hop subnetwork path the ITE discards the packet and sends a 682 PTB message appropriate to the inner protocol version (subject to 683 rate limiting) with the MTU field set to MAXMTU. Otherwise, the ITE 684 sets aside the encapsulating SEAL and outer headers for later 685 reference (see Section 5.4.5) and submits the inner packet for SEAL 686 re-encapsulation as discussed in the following sections. 688 5.4.4. SEAL Encapsulation and Fragmentation 690 For each inner packet/fragment submitted for SEAL encapsulation, the 691 ITE next encapsulates the packet in a SEAL header formatted as 692 specified in Section 5.3. The ITE next sets S=1 and sets the Next 693 Header field to the protocol number corresponding to the address 694 family of the encapsulated inner packet. 
For example, the ITE sets 695 the Next Header field to the value '4' for encapsulated IPv4 packets 696 [RFC2003], '41' for encapsulated IPv6 packets [RFC2473][RFC4213], 697 '47' for GRE [RFC1701], '80' for encapsulated OSI/CLNP packets 698 [RFC1070], etc. 700 Next, if the inner packet is no larger than FRAGMTU, or if the inner 701 packet is larger than MINMTU, or if the DOFRAG flag is FALSE, the ITE 702 sets (M=0; Offset=0) and considers the packet an "atomic fragment" 703 (see: [RFC6946]). Otherwise, the ITE fragments the inner packet 704 using the fragmentation procedures specified in Section 4.5 of 705 [RFC2460]. In this process, the ITE breaks the inner packet into two 706 non-overlapping fragments, where the encapsulated SEAL packet 707 containing the first fragment MUST be as large as possible without 708 exceeding 1280 bytes (i.e., the IPv6 minimum MTU) and the 709 encapsulated SEAL packet containing the second fragment MUST include 710 the remainder of the inner packet. This ensures that the entire IP 711 header (plus extensions) is likely to fit within the first fragment 712 and that the number of fragments is minimized. The ITE then adds the 713 outer encapsulating headers as specified in Section 5.4.5. 715 5.4.5. Outer Encapsulation 717 Following SEAL encapsulation and fragmentation, the ITE next 718 encapsulates each fragment in the requisite outer transport (when 719 necessary) and IP layer headers. When a transport layer header such 720 as UDP or TCP is included, the ITE writes the port number for SEAL in 721 the transport destination service port field. 723 When UDP encapsulation is used, the ITE sets the UDP checksum field 724 to zero for both IPv4 and IPv6 packets (see: [RFC6935][RFC6936]). 726 The ITE then sets the outer IP layer headers the same as specified 727 for ordinary IP encapsulation (e.g., [RFC1070][RFC2003], [RFC2473], 728 [RFC4213], etc.) 
except that for ordinary SEAL packets the ITE copies 729 the "TTL/Hop Limit", "Type of Service/Traffic Class" and "Congestion 730 Experienced" values in the inner network layer header into the 731 corresponding fields in the outer IP header. For transitional SEAL 732 packets undergoing re-encapsulation, the ITE instead copies the "TTL/ 733 Hop Limit", "Type of Service/Traffic Class" and "Congestion 734 Experienced" values in the original outer IP header of the 735 transitional packet into the corresponding fields in the new outer IP 736 header of the packet to be forwarded (i.e., the values are 737 transferred between outer headers and *not* copied from the inner 738 network layer header). 740 The ITE also sets the IP protocol number to the appropriate value for 741 the first protocol layer within the encapsulation (e.g., UDP, TCP, 742 IPv6 Fragment Header, etc.). When IPv6 is used as the outer IP 743 protocol, the ITE then sets the flow label value in the outer IPv6 744 header the same as described in [RFC6438]. When IPv4 is used as the 745 outer IP protocol, if the encapsulated SEAL packet is no larger than 746 1280 bytes the ITE sets DF=0 in the IPv4 header to allow the packet 747 to be fragmented if it encounters a restricting link; otherwise, the 748 ITE sets DF=1 (for IPv6 subnetwork paths, the DF bit is absent but 749 implicitly set to 1). The ITE finally sends each outer packet via 750 the corresponding underlying subnetwork path. 752 5.4.6. Path MTU Probing and ETE Reachability Verification 754 When the ITE is actively sending packets over a subnetwork path to an 755 ETE, it also sends explicit probes subject to rate limiting to test 756 the path MTU. To generate a probe, the ITE creates an ICMPv6 Echo 757 Request message [RFC4443] of length MINMTU bytes and encapsulates the 758 message in a SEAL header and any other outer headers, i.e., with the 759 length of the resulting SEAL packet being (MINMTU+HLEN) bytes. 
It 760 then sets (Offset=0; S=1; M=0) in the SEAL header, and also sets DF=1 761 in the outer IP header when IPv4 is used. It finally writes the 762 value '58' in the Next Header field of the SEAL header to indicate 763 that the message is a SEAL-encapsulated ICMPv6 message. 765 The ITE sends such MINMTU probes to determine whether SEAL 766 fragmentation is still necessary (see Section 5.4.4). In particular, 767 if the ITE sends a probe and receives a SEAL-encapsulated ICMPv6 Echo 768 Reply message as the probe reply (see: Section 5.5.4), it SHOULD set DOFRAG 769 for this subnetwork path to FALSE. Note that the nominal probe size 770 of MINMTU bytes is RECOMMENDED since probes slightly smaller than 771 this size may be fragmented by the ITE of a nested tunnel further 772 down the path. For example, a successful probe size of 1400 bytes 773 does not guarantee that fragmentation is not occurring at the ITE of 774 another tunnel nesting level. While this would not necessarily 775 result in communication failure, it could yield poor performance not 776 only for the other tunnel nesting levels but also for the ITE itself. 778 The ITE can also send smaller probes to determine whether the ETE is 779 still reachable over this subnetwork path. The ITE prepares the 780 probe as described above, then sends the message to the ETE. If the 781 ITE receives a probe reply, its upper layers can consider the message 782 as a reachability indication. The ITE can also send larger probes to 783 test for larger MTU sizes; however, SEAL considers probing for MTU 784 sizes larger than MINMTU as an end-to-end consideration to be 785 addressed by end systems (see: Section 7). 787 Finally, the ITE can also send probes to detect whether an outer 788 transport layer header is no longer necessary to reach this ETE.
For 789 example, if the ITE sends its initial packets as IP/UDP/SEAL/*, it 790 can send probes constructed as IP/SEAL/[probe] to determine whether 791 the ETE is reachable without the use of UDP encapsulation. If so, 792 the ITE should also send a new MINMTU probe since switching to a new 793 encapsulation format may result in a path change. 795 While probing, the ITE processes ICMP messages as specified in 796 Section 5.4.7. 798 5.4.7. Processing ICMP Messages 800 When the ITE sends SEAL packets, it may receive ICMP error messages 801 [RFC0792][RFC4443] from a router on the path to the ETE. Each ICMP 802 message includes an outer IP header, followed by an ICMP header, 803 followed by a portion of the SEAL packet that generated the error 804 (also known as the "packet-in-error"). Note that the ITE may receive 805 an ICMP message from either an ordinary router on the path or from 806 another ITE that is at the head end of a nested level of 807 encapsulation. The ITE has no security associations with this nested 808 ITE, hence it should consider the message the same as if it 809 originated from an ordinary router. 811 The ITE should process ICMP Protocol/Port Unreachable messages as a 812 hint that the ETE does not implement SEAL. The ITE can optionally 813 ignore other ICMP messages that do not include sufficient information 814 in the packet-in-error, or process them as a hint that the subnetwork 815 path to the ETE may be failing. The ITE then discards these types of 816 messages. 818 For other ICMP messages, the ITE SHOULD examine the SEAL data packet 819 within the packet-in-error field. If the IP source and/or 820 destination addresses are invalid, or if the value in the SEAL header 821 Identification field (if present) is not within the window of packets 822 the ITE has recently sent to this ETE, the ITE discards the message. 
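
The Identification check above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how the per-ETE window is represented or how large it is, so the class name IdentWindow, its methods, and the default window size of 1024 are all hypothetical. The only behavior taken from the text is that Identification values increase monotonically modulo 2^32 and that an ICMP message is discarded when the Identification in the packet-in-error is outside the window of recently sent packets.

```python
ID_MODULUS = 2 ** 32  # Identification values increase modulo 2^32


class IdentWindow:
    """Window of Identification values recently sent to one ETE.

    Hypothetical helper; the window size is an assumed parameter.
    """

    def __init__(self, first_id: int, size: int = 1024) -> None:
        self.next_id = first_id % ID_MODULUS
        self.size = size
        self.sent = 0  # number of values handed out so far

    def allocate(self) -> int:
        """Return the Identification for the next packet sent to the ETE."""
        ident = self.next_id
        self.next_id = (self.next_id + 1) % ID_MODULUS
        self.sent += 1
        return ident

    def contains(self, ident: int) -> bool:
        """True if ident is among the most recently allocated values."""
        # Distance back from the most recently sent value, modulo 2^32,
        # so the comparison remains valid across a wrap of the counter.
        age = (self.next_id - 1 - ident) % ID_MODULUS
        return age < min(self.size, self.sent)


window = IdentWindow(first_id=0xFFFFFFFE)
sent_ids = [window.allocate() for _ in range(4)]  # wraps past 2^32 - 1
print(sent_ids)
print(window.contains(0xFFFFFFFF), window.contains(2))
```

An ITE following this sketch would discard an ICMP message whenever contains() returns False for the Identification found in the packet-in-error.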
824 Next, if the received ICMP message is a PTB the ITE sets MAXMTU to 825 the maximum of MINMTU and the MTU value in the message minus HLEN. 826 If the MTU value in the message is smaller than (MINMTU+HLEN), the 827 ITE also resets DOFRAG to TRUE and discards the message. 829 If the ICMP message was not discarded, the ITE transcribes it into a 830 message appropriate for the inner protocol version (e.g., ICMPv4 for 831 IPv4, ICMPv6 for IPv6, etc.) and forwards the transcribed message to 832 the previous hop toward the inner source address. 834 5.4.8. Detecting Path MTU Changes 836 The ITE SHOULD periodically reset MAXMTU to the MTU of the underlying 837 subnetwork interface to determine whether the subnetwork path MTU has 838 increased. If the path still has a too-small MTU, the ITE will 839 receive a PTB message that reports a smaller size. 841 5.5. ETE Specification 843 5.5.1. Reassembly Buffer Requirements 845 The ETE MUST configure a minimum SEAL reassembly buffer size of 846 (MINMTU+HLEN) bytes for the reassembly of fragmented SEAL packets 847 (see: Section 5.5.4). Note that the value "HLEN" may be variable and 848 initially unknown to the ETE. It is therefore RECOMMENDED that the 849 ETE configure a slightly larger SEAL reassembly buffer size of 2048 850 bytes (2KB). 852 When IPv4 is used as the outer layer of encapsulation, the ETE MUST 853 also configure a minimum IPv4 reassembly buffer size of 1280 bytes. 855 5.5.2. Tunnel Neighbor Soft State 857 The ETE maintains a window of Identification values for the packets 858 it has recently received from this ITE as well as a window of 859 Identification values for the packets it has recently sent to this 860 ITE. 862 5.5.3. IPv4-Layer Reassembly 864 The ETE reassembles fragmented IPv4 packets that are explicitly 865 addressed to itself. For IPv4 fragments of SEAL packets, the ETE 866 SHOULD maintain conservative reassembly cache high- and low-water 867 marks. 
When the size of the reassembly cache exceeds the high-water 868 mark, the ETE SHOULD actively discard stale incomplete reassemblies 869 (e.g., using an Active Queue Management (AQM) strategy) until the 870 size falls below the low-water mark. The ETE SHOULD also actively 871 discard any pending reassemblies that clearly have no opportunity for 872 completion, e.g., when a considerable number of new fragments have 873 arrived before a fragment that completes a pending reassembly 874 arrives. 876 The ETE processes IPv4 fragments as specified in the normative 877 references, i.e., it performs any necessary IPv4 reassembly then 878 submits the packet to the appropriate upper layer protocol module. 879 For SEAL packets, the ETE then performs SEAL decapsulation as 880 specified in Section 5.5.4. 882 5.5.4. Decapsulation, SEAL-Layer Reassembly, and Re-Encapsulation 884 For each SEAL packet accepted for decapsulation, the ETE first 885 examines the Identification field. If the Identification is not 886 within the window of acceptable values for this ITE, the ETE silently 887 discards the packet. 889 Next, if the SEAL header has (Offset!=0 || M=1) the ETE submits the 890 packet for reassembly as specified for IPv6 reassembly in Section 4.5 891 of [RFC2460]. During the reassembly process, the ETE discards any 892 fragments that are overlapping with respect to fragments that have 893 already been received (see: [RFC5722]), and also discards any 894 fragments that have M=1 in the SEAL header but do not contain an 895 integer multiple of 8 bytes. The ETE further SHOULD manage the SEAL 896 reassembly cache the same as described for the IPv4-Layer Reassembly 897 cache in Section 5.5.3, i.e., it SHOULD perform an early discard for 898 any pending reassemblies that have low probability of completion. 900 Next, if the (reassembled) packet is an ICMPv6 Echo Request probe 901 message, the ETE prepares an ICMPv6 Echo Reply probe reply message to 902 send back to the ITE.
The ETE then encapsulates the probe reply as 903 specified in Section 5.4.4 and fragments the message if necessary 904 according to the DOFRAG flag (i.e., to ensure that the probe reply is 905 delivered to the ITE). The ETE then sends the probe reply to the ITE 906 and discards the probe. When the ITE receives the probe reply, it 907 reassembles the message if necessary and processes it as specified in 908 Section 5.4.6. 910 Finally, the ETE discards the outer headers of the (reassembled) 911 packet and processes the inner packet according to the header type 912 indicated in the SEAL Next Header field. If the next hop toward the 913 inner destination address is via a different interface than the SEAL 914 packet arrived on, the ETE discards the SEAL and outer headers and 915 delivers the inner packet either to the local host or to the next hop 916 if the packet is not destined to the local host. 918 If the next hop is on the same tunnel the SEAL packet arrived on, 919 however, the ETE submits the packet for SEAL re-encapsulation 920 beginning with the specification in Section 5.4.3 above and without 921 decrementing the value in the inner (TTL / Hop Limit) field. 923 6. Link Requirements 925 Subnetwork designers are expected to follow the recommendations in 926 Section 2 of [RFC3819] when configuring link MTUs. 928 7. End System Requirements 930 End systems are encouraged to implement end-to-end MTU assurance 931 (e.g., using Packetization Layer Path MTU Discovery (PLPMTUD) per 932 [RFC4821]) even if the subnetwork is using SEAL. 934 When end systems use PLPMTUD, SEAL will ensure that the tunnel 935 behaves as a link in the path that assures an MTU of at least 1500 936 bytes while still allowing end systems to discover larger MTUs. The 937 PLPMTUD mechanism will therefore be able to function as designed in 938 order to discover and utilize larger MTUs. 940 8. 
Router Requirements 942 Routers within the subnetwork are expected to observe the standard IP 943 router requirements, including the implementation of IP fragmentation 944 and reassembly as well as the generation of ICMP messages 945 [RFC0792][RFC1122][RFC1812][RFC2460][RFC4443][RFC6434]. 947 Note that, even when routers support existing requirements for the 948 generation of ICMP messages, these messages are often filtered and 949 discarded by middleboxes on the path to the original source of the 950 message that triggered the ICMP. It is therefore not possible to 951 assume delivery of ICMP messages even when routers are correctly 952 implemented. 954 9. Multicast/Anycast Considerations 956 On multicast-capable tunnels, encapsulated packets sent by an ITE may 957 be received by potentially many ETEs. In that case, the ITE can 958 still send unicast probe messages to receive probe replies from a 959 specific ETE, or it can send multicast probe messages to receive 960 replies from all ETEs in the multicast group that receive the probe. 961 If the ITE were to send a multicast MINMTU probe message as described 962 in Section 5.4.6, however, it would be unable to discern whether all 963 ETEs received the probe unless it had some way of tracking the full 964 constituency of the multicast group. For multicast ETE addresses, 965 the ITE would therefore ordinarily set MAXMTU=MINMTU and DOFRAG=TRUE. 966 But, the setting of these values may be situation-dependent and based 967 on whether the ITE can tolerate packet loss to ETEs that may be 968 reached by subnetwork paths having small MTUs. 970 For ETEs that configure an anycast address, if the ITE sends a MINMTU 971 probe message it may receive a probe reply from a first ETE but then 972 be re-routed to a second ETE. 
It is therefore necessary for the ITE 973 to continue to send periodic probes (subject to rate limiting) as 974 described in Section 5.4.6 so that any path oscillations between ETEs 975 that configure the same anycast address will not result in a 976 sustained path MTU black hole. 978 10. Compatibility Considerations 980 Since SEAL is based on the standard IPv6 fragment header, the ITE can 981 implement the scheme independently of any ETE implementations. 982 Therefore, if the ITE uses SEAL but the ETE does not, the ITE can 983 still send a MINMTU probe as specified in Section 5.4.6 but may 984 receive an ordinary (i.e., non-SEAL-encapsulated) probe reply. If 985 so, it SHOULD reset DOFRAG to FALSE the same as if the ETE returned a 986 SEAL-encapsulated probe reply. 988 In some cases, a non-SEAL ETE may not be able to reassemble 989 fragmented SEAL packets up to (MINMTU+HLEN) bytes, since [RFC2460] 990 only requires IPv6 nodes to reassemble packets up to 1500 bytes in 991 length. To test for this condition, the ITE can create a MINMTU 992 probe message, fragment the message into two pieces, then send both 993 fragments to the ETE. If the ETE returns a probe reply, the ITE has 994 assurance that the ETE is capable of reassembly. Otherwise, the ITE 995 SHOULD reset MAXMTU for this subnetwork path to (MINMTU-HLEN) or even 996 smaller if the ETE still cannot accept packets of this size. 998 11. Nested Encapsulation Considerations 1000 SEAL supports nested tunneling; an example would be a recursive 1001 nesting of mobile networks, where the first network receives service 1002 from an ISP, the second network receives service from the first 1003 network, the third network receives service from the second network, 1004 etc. Since it is imperative that such nesting not extend 1005 indefinitely, tunnels that use SEAL SHOULD honor the Encapsulation 1006 Limit option defined in [RFC2473]. 1008 12.
Reliability Considerations 1010 Although a tunnel may span an arbitrarily-large subnetwork expanse, 1011 the IP layer sees the tunnel as a simple link that supports the IP 1012 service model. Links with high bit error rates (BERs) (e.g., IEEE 1013 802.11) use Automatic Repeat-ReQuest (ARQ) mechanisms [RFC3366] to 1014 increase packet delivery ratios, while links with much lower BERs 1015 typically omit such mechanisms. Since tunnels may traverse 1016 arbitrarily-long paths over links of various types that are already 1017 either performing or omitting ARQ as appropriate, it would 1018 be inefficient to require the tunnel endpoints to also perform ARQ. 1020 13. Integrity Considerations 1022 Fragmentation and reassembly schemes must consider packet-splicing 1023 errors, e.g., when two fragments from the same packet are 1024 concatenated incorrectly, when a fragment from packet X is 1025 reassembled with fragments from packet Y, etc. The primary sources 1026 of such errors include implementation bugs and wrapping ID fields. 1028 In particular, the IPv4 16-bit ID field can wrap with only 64K 1029 packets with the same (src, dst, protocol)-tuple alive in the system 1030 at a given time [RFC4963]. When the IPv4 ID field is re-written by a 1031 middlebox such as a NAT or firewall, ID field wrapping can occur with 1032 even fewer packets alive in the system. 1034 Fortunately, SEAL includes a 32-bit ID field the same as for IPv6 1035 fragmentation and also only employs SEAL fragmentation for packets up 1036 to 1500 bytes in length. SEAL also only allows IPv4 network 1037 fragmentation for packets up to 1280 bytes in length, but this size 1038 is small enough to fit within the MTU of modern high-speed IPv4 links 1039 without fragmentation.
IPv4 links with smaller MTUs certainly exist, 1040 but typically support data rates that are slow enough to preclude 1041 high data rate reassembly misassociation errors; hence, a small 1042 amount of IPv4 fragmentation is deemed acceptable. 1044 14. IANA Considerations 1046 The IANA is requested to allocate a User Port number for "SEAL" in 1047 the 'port-numbers' registry. The Service Name is "SEAL", and the 1048 Transport Protocols are TCP and UDP. The Assignee is the IESG 1049 (iesg@ietf.org) and the Contact is the IETF Chair (chair@ietf.org). 1050 The Description is "Subnetwork Encapsulation and Adaptation Layer 1051 (SEAL)", and the Reference is the RFC-to-be currently known as 1052 'draft-templin-intarea-seal'. 1054 15. Security Considerations 1056 Neighbor relationships between the ITE and ETE should be secured in 1057 environments where authentication and/or confidentiality are a matter 1058 of concern. Securing mechanisms such as SEcure Neighbor Discovery 1059 (SEND) [RFC3971] and IPsec [RFC4301] can be used for this purpose; 1060 however, the tunnel neighbor relationship is managed by the tunnel 1061 protocols that ride over SEAL (as an encapsulation sublayer) rather 1062 than by SEAL itself. 1064 Security issues that apply to tunneling in general are discussed in 1065 [RFC6169]. 1067 16. Related Work 1069 Section 3.1.7 of [RFC2764] provides a high-level sketch for 1070 supporting large tunnel MTUs via a tunnel-layer fragmentation and 1071 reassembly capability to avoid IP layer fragmentation. 1073 Section 3 of [RFC4459] describes inner and outer fragmentation at the 1074 tunnel endpoints as alternatives for accommodating the tunnel MTU. 1076 Section 4 of [RFC2460] specifies a method for inserting and 1077 processing extension headers between the base IPv6 header and 1078 transport layer protocol data. The SEAL header is inserted and 1079 processed in exactly the same manner.
1081 The concepts of path MTU determination through the report of 1082 fragmentation and extending the IPv4 Identification field were first 1083 proposed in deliberations of the TCP-IP mailing list and the Path MTU 1084 Discovery Working Group (MTUDWG) during the late 1980's and early 1085 1990's. An historical analysis of the evolution of these concepts, 1086 as well as the development of the eventual PMTUD mechanism, appears 1087 in [RFC5320]. 1089 17. Implementation Status 1091 An early implementation of the first revision of SEAL [RFC5320] is 1092 available at: http://isatap.com/seal. 1094 An implementation of the current version of SEAL is available at: 1095 http://linkupnetworks.com/seal/sealv2-1.0.tgz. 1097 18. Acknowledgments 1099 The following individuals are acknowledged for helpful comments and 1100 suggestions: Jari Arkko, Fred Baker, Iljitsch van Beijnum, Oliver 1101 Bonaventure, Teco Boot, Bob Braden, Brian Carpenter, Steve Casner, 1102 Ian Chakeres, Noel Chiappa, Remi Denis-Courmont, Remi Despres, Ralph 1103 Droms, Aurnaud Ebalard, Gorry Fairhurst, Washam Fan, Dino Farinacci, 1104 Joel Halpern, Brian Haberman, Sam Hartman, John Heffner, Thomas 1105 Henderson, Bob Hinden, Christian Huitema, Eliot Lear, Darrel Lewis, 1106 Joe Macker, Matt Mathis, Erik Nordmark, Dan Romascanu, Dave Thaler, 1107 Joe Touch, Mark Townsley, Ole Troan, Margaret Wasserman, Magnus 1108 Westerlund, Robin Whittle, James Woodyatt, and members of the Boeing 1109 Research & Technology NST DC&NT group. 1111 Discussions with colleagues following the publication of [RFC5320] 1112 have provided useful insights that have resulted in significant 1113 improvements to this, the Second Edition of SEAL. In particular, 1114 this work has been encouraged and supported by Boeing colleagues 1115 including Balaguruna Chidambaram, Jeff Holland, Cam Brodie, Yueli 1116 Yang, Wen Fang, Ed King, Mike Slane, Kent Shuey, Gen MacLean, and 1117 other members of the BR&T and BIT mobile networking teams. 
1119 This document received substantial review input from the IESG and
1120 IETF area directorates in the February 2013 timeframe. IESG members
1121 and IETF area directorate representatives who contributed helpful
1122 comments and suggestions are gratefully acknowledged. Discussions on
1123 the IETF IPv6 and Intarea mailing lists in the summer 2013 timeframe
1124 also stimulated several useful ideas.

1126 Path MTU determination through the report of fragmentation was first
1127 proposed by Charles Lynn on the TCP-IP mailing list in 1987.
1128 Extending the IP identification field was first proposed by Steve
1129 Deering on the MTUDWG mailing list in 1989. Steve Deering also
1130 proposed the IPv6 minimum MTU of 1280 bytes on the IPng mailing list
1131 in 1997.

1133 19. References

1135 19.1. Normative References

1137 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791,
1138           September 1981.

1140 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5,
1141           RFC 792, September 1981.

1143 [RFC1122] Braden, R., "Requirements for Internet Hosts -
1144           Communication Layers", STD 3, RFC 1122, October 1989.

1146 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1147           Requirement Levels", BCP 14, RFC 2119, March 1997.

1149 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
1150           (IPv6) Specification", RFC 2460, December 1998.

1152 [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control
1153           Message Protocol (ICMPv6) for the Internet Protocol
1154           Version 6 (IPv6) Specification", RFC 4443, March 2006.

1156 19.2. Informative References

1158 [FOLK]    Shannon, C., Moore, D., and k. claffy, "Beyond Folklore:
1159           Observations on Fragmented Traffic", December 2002.

1161 [FRAG]    Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
1162           October 1987.

1164 [I-D.taylor-v6ops-fragdrop]
1165           Jaeggli, J., Colitti, L., Kumari, W., Vyncke, E., Kaeo,
1166           M., and T. Taylor, "Why Operators Filter Fragments and
1167           What It Implies", draft-taylor-v6ops-fragdrop-01 (work in
1168           progress), June 2013.

1170 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
1171           August 1980.

1173 [RFC0994] International Organization for Standardization (ISO) and
1174           American National Standards Institute (ANSI), "Final text
1175           of DIS 8473, Protocol for Providing the Connectionless-
1176           mode Network Service", RFC 994, March 1986.

1178 [RFC1070] Hagens, R., Hall, N., and M. Rose, "Use of the Internet as
1179           a subnetwork for experimentation with the OSI network
1180           layer", RFC 1070, February 1989.

1182 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1183           November 1990.

1185 [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic
1186           Routing Encapsulation (GRE)", RFC 1701, October 1994.

1188 [RFC1812] Baker, F., "Requirements for IP Version 4 Routers",
1189           RFC 1812, June 1995.

1191 [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
1192           for IP version 6", RFC 1981, August 1996.

1194 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
1195           October 1996.

1197 [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in
1198           IPv6 Specification", RFC 2473, December 1998.

1200 [RFC2675] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
1201           RFC 2675, August 1999.

1203 [RFC2764] Gleeson, B., Heinanen, J., Lin, A., Armitage, G., and A.
1204           Malis, "A Framework for IP Based Virtual Private
1205           Networks", RFC 2764, February 2000.

1207 [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering:
1208           Defeating Denial of Service Attacks which employ IP Source
1209           Address Spoofing", BCP 38, RFC 2827, May 2000.

1211 [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery",
1212           RFC 2923, September 2000.

1214 [RFC3232] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by
1215           an On-line Database", RFC 3232, January 2002.

1217 [RFC3366] Fairhurst, G. and L. Wood, "Advice to link designers on
1218           link Automatic Repeat reQuest (ARQ)", BCP 62, RFC 3366,
1219           August 2002.

1221 [RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
1222           Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
1223           Wood, "Advice for Internet Subnetwork Designers", BCP 89,
1224           RFC 3819, July 2004.

1226 [RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
1227           Neighbor Discovery (SEND)", RFC 3971, March 2005.

1229 [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
1230           for IPv6 Hosts and Routers", RFC 4213, October 2005.

1232 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
1233           Internet Protocol", RFC 4301, December 2005.

1235 [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
1236           Network Tunneling", RFC 4459, April 2006.

1238 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
1239           Discovery", RFC 4821, March 2007.

1241 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
1242           Errors at High Data Rates", RFC 4963, July 2007.

1244 [RFC5320] Templin, F., "The Subnetwork Encapsulation and Adaptation
1245           Layer (SEAL)", RFC 5320, February 2010.

1247 [RFC5722] Krishnan, S., "Handling of Overlapping IPv6 Fragments",
1248           RFC 5722, December 2009.

1250 [RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

1252 [RFC6169] Krishnan, S., Thaler, D., and J. Hoagland, "Security
1253           Concerns with IP Tunneling", RFC 6169, April 2011.

1255 [RFC6434] Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node
1256           Requirements", RFC 6434, December 2011.

1258 [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
1259           for Equal Cost Multipath Routing and Link Aggregation in
1260           Tunnels", RFC 6438, November 2011.

1262 [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field",
1263           RFC 6864, February 2013.

1265 [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1266           UDP Checksums for Tunneled Packets", RFC 6935, April 2013.

1268 [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement
1269           for the Use of IPv6 UDP Datagrams with Zero Checksums",
1270           RFC 6936, April 2013.

1272 [RFC6946] Gont, F., "Processing of IPv6 "Atomic" Fragments",
1273           RFC 6946, May 2013.

1275 [RIPE]    De Boer, M. and J. Bosma, "Discovering Path MTU Black
1276           Holes on the Internet using RIPE Atlas", July 2012.

1278 [SIGCOMM] Luckie, M. and B. Stasiewicz, "Measuring Path MTU
1279           Discovery Behavior", November 2010.

1281 [TBIT]    Medina, A., Allman, M., and S. Floyd, "Measuring
1282           Interactions Between Transport Protocols and Middleboxes",
1283           October 2004.

1285 [WAND]    Luckie, M., Cho, K., and B. Owens, "Inferring and
1286           Debugging Path MTU Discovery Failures", October 2005.

1288 Author's Address

1290 Fred L. Templin (editor)
1291 Boeing Research & Technology
1292 P.O. Box 3707
1293 Seattle, WA 98124
1294 USA

1296 Email: fltemplin@acm.org