idnits 2.17.1 draft-templin-intarea-seal-66.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. == The 'Updates: ' line in the draft header should list only the _numbers_ of the RFCs which will be updated by this document (if approved); it should not include the word 'RFC' in the list. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (December 19, 2013) is 3780 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'I-D.massar-v6ops-ayiya' is defined on line 1178, but no explicit reference was found in the text == Unused Reference: 'RFC0768' is defined on line 1193, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) == Outdated reference: A later version (-02) exists of draft-taylor-v6ops-fragdrop-01 == Outdated reference: A later version (-16) exists of draft-templin-ironbis-15 -- Obsolete informational reference (is this intentional?): RFC 1981 (Obsoleted by RFC 8201) -- Obsolete informational reference (is this intentional?): RFC 6434 (Obsoleted by RFC 8504) Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group F. Templin, Ed. 3 Internet-Draft Boeing Research & Technology 4 Obsoletes: rfc5320 (if approved) December 19, 2013 5 Updates: rfc2460 (if approved) 6 Intended status: Standards Track 7 Expires: June 22, 2014 9 The Subnetwork Encapsulation and Adaptation Layer (SEAL) 10 draft-templin-intarea-seal-66.txt 12 Abstract 14 This document specifies a Subnetwork Encapsulation and Adaptation 15 Layer (SEAL). SEAL operates over virtual topologies configured over 16 connected IP network routing regions bounded by encapsulating border 17 nodes. These virtual topologies are manifested by tunnels that may 18 span multiple IP and/or sub-IP layer forwarding hops, where they may 19 incur packet duplication, packet reordering, source address spoofing 20 and traversal of links with diverse Maximum Transmission Units 21 (MTUs). 
SEAL addresses these issues through the encapsulation and 22 messaging mechanisms specified in this document. 24 Status of this Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on June 22, 2014. 41 Copyright Notice 43 Copyright (c) 2013 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 59 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 4 60 1.2. Approach . . . . . . . . . . . . . . . . . . . . . . . . . 6 61 1.3. Differences with RFC5320 . . . . . . . . . . . . . . . . . 7 62 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 63 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 9 64 4. Applicability Statement . . . . . . . 
. . . . . . . . . . . . 9 65 5. SEAL Specification . . . . . . . . . . . . . . . . . . . . . . 10 66 5.1. SEAL Tunnel Model . . . . . . . . . . . . . . . . . . . . 10 67 5.2. SEAL Model of Operation . . . . . . . . . . . . . . . . . 11 68 5.3. SEAL Encapsulation Format . . . . . . . . . . . . . . . . 12 69 5.4. ITE Specification . . . . . . . . . . . . . . . . . . . . 13 70 5.4.1. Tunnel MTU . . . . . . . . . . . . . . . . . . . . . . 13 71 5.4.2. Tunnel Neighbor Soft State . . . . . . . . . . . . . . 14 72 5.4.3. SEAL Layer Pre-Processing . . . . . . . . . . . . . . 15 73 5.4.4. SEAL Encapsulation and Fragmentation . . . . . . . . . 16 74 5.4.5. Outer Encapsulation . . . . . . . . . . . . . . . . . 17 75 5.4.6. Path Probing and ETE Reachability Verification . . . . 17 76 5.4.7. Processing ICMP Messages . . . . . . . . . . . . . . . 18 77 5.4.8. Detecting Path MTU Changes . . . . . . . . . . . . . . 19 78 5.5. ETE Specification . . . . . . . . . . . . . . . . . . . . 19 79 5.5.1. Reassembly Buffer Requirements . . . . . . . . . . . . 19 80 5.5.2. Tunnel Neighbor Soft State . . . . . . . . . . . . . . 20 81 5.5.3. IPv4-Layer Reassembly . . . . . . . . . . . . . . . . 20 82 5.5.4. Decapsulation, SEAL-Layer Reassembly, and 83 Re-Encapsulation . . . . . . . . . . . . . . . . . . . 20 84 6. Link Requirements . . . . . . . . . . . . . . . . . . . . . . 21 85 7. End System Requirements . . . . . . . . . . . . . . . . . . . 21 86 8. Router Requirements . . . . . . . . . . . . . . . . . . . . . 21 87 9. Multicast/Anycast Considerations . . . . . . . . . . . . . . . 22 88 10. Compatibility Considerations . . . . . . . . . . . . . . . . . 22 89 11. Nested Encapsulation Considerations . . . . . . . . . . . . . 23 90 12. Reliability Considerations . . . . . . . . . . . . . . . . . . 23 91 13. Integrity Considerations . . . . . . . . . . . . . . . . . . . 23 92 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 93 15. Security Considerations . . . . . . . . . . 
. . . . . . . . . 24 94 16. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 24 95 17. Implementation Status . . . . . . . . . . . . . . . . . . . . 25 96 18. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 25 97 19. References . . . . . . . . . . . . . . . . . . . . . . . . . . 26 98 19.1. Normative References . . . . . . . . . . . . . . . . . . . 26 99 19.2. Informative References . . . . . . . . . . . . . . . . . . 26 100 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 29 102 1. Introduction 104 As Internet technology and communication has grown and matured, many 105 techniques have developed that use virtual topologies (manifested by 106 tunnels of one form or another) over an actual network that supports 107 the Internet Protocol (IP) [RFC0791][RFC2460]. Those virtual 108 topologies have elements that appear as one network layer hop, but 109 are actually multiple IP or sub-IP layer hops which comprise the 110 "subnetwork" over which the tunnel operates. 112 The use of IP encapsulation (also known as "tunneling") has long been 113 considered as the means for creating such virtual topologies (e.g., 114 see [RFC2003][RFC2473]). Tunnels serve a wide variety of purposes, 115 including mobility, security, routing control, traffic engineering, 116 multihoming, etc., and will remain an integral part of the 117 architecture moving forward. However, the encapsulation headers 118 often include insufficiently provisioned per-packet identification 119 values. IP encapsulation also allows an attacker to produce 120 encapsulated packets with spoofed source addresses even if the source 121 address in the encapsulating header cannot be spoofed. A denial-of- 122 service vector that is not possible in non-tunneled subnetworks is 123 therefore presented. 125 Additionally, the insertion of an outer IP header reduces the 126 effective Maximum Transmission Unit (MTU) visible to the inner 127 network layer. 
When IPv6 is used as the encapsulation protocol, 128 original sources expect to be informed of the MTU limitation through 129 IPv6 Path MTU discovery (PMTUD) [RFC1981]. When IPv4 is used, this 130 reduced MTU can be accommodated through the use of IPv4 131 fragmentation, but unmitigated in-the-network fragmentation has been 132 deemed harmful through operational experience and studies conducted 133 over the course of many years [FRAG][FOLK][RFC4963]. Additionally, 134 classical IPv4 PMTUD [RFC1191] has known operational issues that are 135 exacerbated by in-the-network tunnels [RFC2923][RFC4459]. 137 The following subsections present further details on the motivation 138 and approach for addressing these issues. 140 1.1. Motivation 142 Before discussing the approach, it is necessary to first understand 143 the problems. In both the Internet and private-use networks today, 144 IP is ubiquitously deployed as the Layer 3 protocol. The primary 145 functions of IP are to provide for routing, addressing, and a 146 fragmentation and reassembly capability used to accommodate links 147 with diverse MTUs. While it is well known that the IP address space 148 is rapidly becoming depleted, there is also a growing awareness that 149 other IP protocol limitations have already or may soon become 150 problematic. 152 First, the Internet historically provided no means for discerning 153 whether the source addresses of IP packets are authentic. This 154 shortcoming is being addressed more and more through the deployment 155 of site border router ingress filters [RFC2827], however the use of 156 encapsulation provides a vector for an attacker to circumvent 157 filtering for the encapsulated packet even if filtering is correctly 158 applied to the encapsulation header. Secondly, the IP header does 159 not include a well-behaved identification value unless the source has 160 included a fragment header for IPv6 or unless the source permits 161 fragmentation for IPv4. 
These limitations preclude an efficient 162 means for routers to detect duplicate packets and packets that have 163 been re-ordered within the subnetwork. Additionally, recent studies 164 have shown that the arrival of fragments at high data rates can cause 165 denial-of-service (DoS) attacks on performance-sensitive networking 166 gear, prompting some administrators to configure their equipment to 167 drop fragments unconditionally [I-D.taylor-v6ops-fragdrop]. 169 For IPv4 encapsulation, when fragmentation is permitted the header 170 includes a 16-bit Identification field, meaning that at most 2^16 171 unique packets with the same (source, destination, protocol)-tuple 172 can be active in the network at the same time [RFC6864]. (When 173 middleboxes such as Network Address Translators (NATs) re-write the 174 Identification field to random values, the number of unique packets 175 is even further reduced.) Due to the escalating deployment of high- 176 speed links, however, these numbers have become too small by several 177 orders of magnitude for high data rate packet sources such as tunnel 178 endpoints [RFC4963]. 180 Furthermore, there are many well-known limitations pertaining to IPv4 181 fragmentation and reassembly - even to the point that it has been 182 deemed "harmful" in both classic and modern-day studies (see above). 183 In particular, IPv4 fragmentation raises issues ranging from minor 184 annoyances (e.g., in-the-network router fragmentation [RFC1981]) to 185 the potential for major integrity issues (e.g., mis-association of 186 the fragments of multiple IP packets during reassembly [RFC4963]). 188 As a result of these perceived limitations, a fragmentation-avoiding 189 technique for discovering the MTU of the forward path from a source 190 to a destination node was devised through the deliberations of the 191 Path MTU Discovery Working Group (MTUDWG) during the late 1980's 192 through early 1990's which resulted in the publication of [RFC1191]. 
193 In this negative feedback-based method, the source node provides 194 explicit instructions to routers in the path to discard the packet 195 and return an ICMP error message if an MTU restriction is 196 encountered. However, this approach has several serious shortcomings 197 that lead to an overall "brittleness" [RFC2923]. 199 In particular, site border routers in the Internet have been known to 200 discard ICMP error messages coming from the outside world. This is 201 due in large part to the fact that malicious spoofing of error 202 messages in the Internet is trivial since there is no way to 203 authenticate the source of the messages [RFC5927]. Furthermore, when 204 a source node that requires ICMP error message feedback when a packet 205 is dropped due to an MTU restriction does not receive the messages, a 206 path MTU-related black hole occurs. This means that the source will 207 continue to send packets that are too large and never receive an 208 indication from the network that they are being discarded. This 209 behavior has been confirmed through documented studies showing clear 210 evidence of PMTUD failures for both IPv4 and IPv6 in the Internet 211 today [TBIT][WAND][SIGCOMM][RIPE]. 213 The issues with both IP fragmentation and this "classical" PMTUD 214 method are exacerbated further when IP tunneling is used [RFC4459]. 215 For example, a tunnel ingress may be required to forward encapsulated 216 packets into the subnetwork on behalf of hundreds, thousands, or even 217 more original sources. If the ITE allows IP fragmentation on the 218 encapsulated packets, persistent fragmentation could lead to 219 undetected data corruption due to Identification field wrapping 220 and/or reassembly congestion at the tunnel egress. 
If the ingress 221 instead uses classical IP PMTUD it must rely on ICMP error messages 222 coming from the subnetwork that may be suspect, subject to loss due 223 to filtering middleboxes, or insufficiently provisioned for 224 translation into error messages to be returned to the original 225 sources. 227 Although recent works have led to the development of a positive 228 feedback-based end-to-end MTU determination scheme [RFC4821], they do 229 not excuse tunnels from accounting for the encapsulation overhead 230 they add to packets. Moreover, in current practice existing 231 tunneling protocols mask the MTU issues by selecting a "lowest common 232 denominator" MTU that may be much smaller than necessary for most 233 paths and difficult to change at a later date. Therefore, a new 234 approach to accommodate tunnels over links with diverse MTUs is 235 necessary. 237 1.2. Approach 239 This document concerns subnetworks manifested through a virtual 240 topology configured over a connected network routing region and 241 bounded by encapsulating border nodes. Example connected network 242 routing regions include Mobile Ad hoc Networks (MANETs), enterprise 243 networks, aviation networks and the global public Internet itself. 244 Subnetwork border nodes forward unicast and multicast packets over 245 the virtual topology across multiple IP and/or sub-IP layer 246 forwarding hops that may introduce packet duplication and/or traverse 247 links with diverse Maximum Transmission Units (MTUs). 249 This document introduces a Subnetwork Encapsulation and Adaptation 250 Layer (SEAL) for tunneling inner network layer protocol packets over 251 IP subnetworks that connect Ingress and Egress Tunnel Endpoints 252 (ITEs/ETEs) of border nodes. It provides a modular specification 253 designed to be tailored to specific associated tunneling protocols. 254 (A transport-mode of operation is also possible but out of scope for 255 this document.) 
257 SEAL provides a mid-layer encapsulation that accommodates links with 258 diverse MTUs, and allows routers in the subnetwork to perform 259 efficient duplicate packet and packet reordering detection. The 260 encapsulation further ensures message origin authentication, packet 261 header integrity and anti-replay in environments in which these 262 functions are necessary. 264 SEAL treats tunnels that traverse the subnetwork as ordinary links 265 that must support network layer services. Moreover, SEAL provides 266 dynamic mechanisms (including limited fragmentation and reassembly) 267 to ensure a maximal path MTU over the tunnel. This is in contrast to 268 static approaches that avoid MTU issues by selecting a lowest common 269 denominator MTU value that may be overly conservative for the vast 270 majority of tunnel paths and difficult to change even when larger 271 MTUs become available. 273 1.3. Differences with RFC5320 275 This specification of SEAL is descended from an experimental 276 independent RFC publication of the same name [RFC5320]. However, 277 this specification introduces a number of fundamental differences 278 from the earlier publication. This specification therefore obsoletes 279 (rather than updates) [RFC5320]. 281 First, [RFC5320] forms a 32-bit Identification value by concatenating 282 the 16-bit IPv4 Identification field with a 16-bit Identification 283 "extension" field in the SEAL header. This means that [RFC5320] can 284 only operate over IPv4 networks (since IPv6 headers do not include a 285 16-bit Identification field) and that the SEAL Identification value can be 286 corrupted if the Identification in the outer IPv4 header is 287 rewritten. In contrast, this specification includes a 32-bit 288 Identification value that is independent of any identification fields 289 found in the inner or outer IP headers, and is therefore compatible 290 with any inner and outer IP protocol version combinations.
292 Additionally, the SEAL fragmentation and reassembly procedures 293 defined in [RFC5320] differ significantly from those found in this 294 specification. In particular, this specification defines a 13-bit 295 Offset field that allows for finer-grained fragment sizes when SEAL 296 fragmentation and reassembly are necessary. In contrast, [RFC5320] 297 includes only a 3-bit Segment field and performs reassembly through 298 concatenation of consecutive segments. 300 Finally, SEAL no longer uses the IPv4 fragmentation sensing method 301 specified in [RFC5320] and in earlier versions of this 302 document. This departure is based on the fact that there is no way 303 for the ITE or ETE to control the way in which middleboxes perform 304 IPv4 fragmentation (e.g., largest fragment first, smallest fragment 305 first, all fragments the same size, etc.). Moreover, there may be 306 middleboxes in the path that reassemble IPv4 fragmented packets 307 before delivering them to the ETE as the final destination. Use of 308 IPv4 fragmentation sensing in the ETE also greatly complicated the 309 specification and proved difficult to implement. Therefore, although 310 the IPv4 fragmentation sensing method is conceptually elegant and 311 natural, it is no longer included. 313 2. Terminology 315 The following terms are defined within the scope of this document: 317 subnetwork 318 a virtual topology configured over a connected network routing 319 region and bounded by encapsulating border nodes. 321 IP 322 used to generically refer to either Internet Protocol (IP) 323 version, i.e., IPv4 or IPv6. 325 Ingress Tunnel Endpoint (ITE) 326 a portal over which an encapsulating border node (host or router) 327 sends encapsulated packets into the subnetwork. 329 Egress Tunnel Endpoint (ETE) 330 a portal over which an encapsulating border node (host or router) 331 receives encapsulated packets from the subnetwork.
333 inner packet 334 an unencapsulated network layer protocol packet (e.g., IPv4 335 [RFC0791], OSI/CLNP [RFC0994], IPv6 [RFC2460], etc.) before any 336 outer encapsulations are added. Internet protocol numbers that 337 identify inner packets are found in the IANA Internet Protocol 338 registry [RFC3232]. SEAL protocol packets that incur an 339 additional layer of SEAL encapsulation are also considered inner 340 packets. 342 outer IP packet 343 a packet resulting from adding an outer IP header (and possibly 344 other outer headers) to a SEAL-encapsulated inner packet. 346 packet-in-error 347 the leading portion of an invoking data packet encapsulated in the 348 body of an error control message (e.g., an ICMPv4 [RFC0792] error 349 message, an ICMPv6 [RFC4443] error message, etc.). 351 Packet Too Big (PTB) message 352 a control plane message indicating an MTU restriction (e.g., an 353 ICMPv6 "Packet Too Big" message [RFC4443], an ICMPv4 354 "Fragmentation Needed" message [RFC0792], etc.). 356 Don't Fragment (DF) bit 357 a bit that indicates whether the packet may be fragmented by the 358 network. The DF bit is explicitly included in the IPv4 header 359 [RFC0791] and may be set to '0' to allow fragmentation or '1' to 360 disallow further in-network fragmentation. The bit is absent from 361 the IPv6 header [RFC2460], but implicitly set to '1' because 362 fragmentation can occur only at IPv6 sources. 364 3. Requirements 366 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 367 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 368 document are to be interpreted as described in [RFC2119]. When used 369 in lower case (e.g., must, must not, etc.), these words MUST NOT be 370 interpreted as described in [RFC2119], but are rather interpreted as 371 they would be in common English. 373 4. 
Applicability Statement 375 SEAL was originally motivated by the specific case of subnetwork 376 abstraction for Mobile Ad hoc Networks (MANETs); however, the domain 377 of applicability also extends to subnetwork abstractions over 378 enterprise networks, mobile networks, aviation networks, ISP 379 networks, SO/HO networks, the global public Internet itself, and any 380 other connected network routing region. 382 SEAL provides a network sublayer used during encapsulation of an 383 inner network layer packet within outer encapsulating headers. SEAL 384 can also be used as a sublayer within a transport layer protocol data 385 payload, where transport layer encapsulation is typically used for 386 Network Address Translator (NAT) traversal as well as operation over 387 subnetworks that give preferential treatment to certain "core" 388 Internet protocols, e.g., TCP and UDP. (However, note that TCP 389 encapsulation may not be appropriate for all use cases, particularly 390 those that require low delay and/or low delay variance.) The SEAL header 391 is processed in the same manner as IPv6 extension headers, i.e., 392 it is not part of the outer IP header but rather allows for the 393 creation of an arbitrarily extensible chain of headers in the same 394 way that IPv6 does. 396 To accommodate MTU diversity, the Ingress Tunnel Endpoint (ITE) may 397 need to perform limited fragmentation, which the Egress Tunnel 398 Endpoint (ETE) reassembles. The ITE and ETE further engage in 399 minimal path probing to determine when the path can be traversed 400 without fragmentation. This allows the ITE to send whole packets 401 instead of fragmented packets whenever possible. 403 In practice, SEAL is typically used as an encapsulation sublayer in 404 conjunction with existing tunnel types such as IPsec [RFC4301], 405 GRE [RFC1701], IP-in-IPv6 [RFC2473], IP-in-IPv4 [RFC4213][RFC2003], 406 etc.
When used with existing tunnel types that insert mid-layer 407 headers between the inner and outer IP headers (e.g., IPsec, GRE, 408 etc.), the SEAL header is inserted between the mid-layer headers and 409 outer IP header. 411 5. SEAL Specification 413 The following sections specify the operation of SEAL: 415 5.1. SEAL Tunnel Model 417 SEAL is an encapsulation sublayer used within point-to-point, point- 418 to-multipoint, and non-broadcast, multiple access (NBMA) tunnels. 419 SEAL can also be used with multicast-capable tunnels, but the path 420 probing mechanisms specified in the following sections apply only to 421 unicast tunnels. 423 Each tunnel is configured over one or more underlying interfaces 424 attached to subnetwork links, where each link represents a different 425 subnetwork path. The tunnel connects an ITE to one or more ETE 426 "neighbors" via encapsulation across an underlying subnetwork, where 427 each tunnel neighbor relationship is maintained over one or more 428 subnetwork paths. The tunnel neighbor relationship may be 429 bidirectional, partially unidirectional or fully unidirectional. 431 A bidirectional tunnel neighbor relationship is one over which both 432 tunnel endpoints can exchange both data and control messages. A 433 partially unidirectional tunnel neighbor relationship allows the near 434 end ITE to send data packets forward to the far end ETE, while the 435 far end only returns control messages when necessary. Finally, a 436 fully unidirectional mode of operation is one in which the near end 437 ITE can receive neither data nor control messages from the far end 438 ETE. 440 5.2. 
SEAL Model of Operation 442 SEAL-enabled ITEs encapsulate each inner packet in any ancillary 443 tunnel protocol headers and trailers, a SEAL header, and any outer 444 header encapsulations as shown in Figure 1:
446                                +--------------------+
447                                ~   outer IP header  ~
448                                +--------------------+
449                                ~  other outer hdrs  ~
450                                +--------------------+
451                                ~    SEAL header     ~
452                                +--------------------+
453                                ~   tunnel headers   ~
454   +--------------------+       +--------------------+
455   |                    |  -->  |                    |
456   ~        Inner       ~  -->  ~        Inner       ~
457   ~       Packet       ~  -->  ~       Packet       ~
458   |                    |  -->  |                    |
459   +--------------------+       +--------------------+
460                                ~  tunnel trailers   ~
461                                +--------------------+
463 Figure 1: SEAL Encapsulation 465 The ITE inserts the SEAL header according to the specific tunneling 466 protocol. For simple encapsulation of an inner network layer packet 467 within an outer IP header, the ITE inserts the SEAL header following 468 the outer IP header and before the inner packet as: IP/SEAL/{inner 469 packet}. 471 For encapsulations over transports such as UDP, the ITE inserts the 472 SEAL header following the outer transport layer header and before the 473 inner packet, e.g., as IP/UDP/SEAL/{inner packet}. In that case, the 474 UDP header is seen as an "other outer header" as depicted in Figure 1 475 and the outer IP and transport layer headers are together seen as the 476 outer encapsulation headers. (Note that outer transport layer 477 headers such as UDP must sometimes be included to ensure that SEAL 478 packets will traverse the path to the ETE without loss due to filtering 479 middleboxes. The ETE MUST accept both IP/SEAL and IP/UDP/SEAL as 480 equivalent packets so that the ITE can discontinue outer transport 481 layer encapsulation if the path supports raw IP/SEAL encapsulation.) 482 For SEAL encapsulations that involve tunnel types that include 483 ancillary tunnel headers (e.g., GRE, IPsec, etc.)
the ITE inserts the 484 SEAL header as a leading extension to the tunnel headers, i.e., the 485 SEAL encapsulation appears as part of the same tunnel and not a 486 separate tunnel. For example, for GRE the ITE inserts the SEAL 487 header as IP/SEAL/GRE/{inner packet}, and for IPsec the ITE inserts 488 the SEAL header as IP/SEAL/IPsec-header/{inner packet}/IPsec-trailer. 489 In such cases, SEAL considers the length of the inner packet only 490 (i.e., not the other tunnel headers and trailers) when performing 491 its packet size calculations. 493 SEAL supports both "nested" tunneling and "re-encapsulating" 494 tunneling. Nested tunneling occurs when a first tunnel is 495 encapsulated within a second tunnel, which may then further be 496 encapsulated within additional tunnels. Nested tunneling can be 497 useful, and stands in contrast to "recursive" tunneling, which is an 498 anomalous condition incurred due to misconfiguration or a routing 499 loop. Considerations for nested tunneling and avoiding recursive 500 tunneling are discussed in Section 4 of [RFC2473] as well as in 501 Section 11 of this document. 503 Re-encapsulating tunneling occurs when a packet arrives at a first 504 ETE, which then acts as an ITE to re-encapsulate and forward the 505 packet to a second ETE connected to the same subnetwork. In that 506 case each ITE/ETE transition represents a segment of a bridged path 507 between the ITE nearest the source and the ETE nearest the 508 destination. Considerations for re-encapsulating tunneling are 509 discussed in [I-D.templin-ironbis]. Combinations of nested and re- 510 encapsulating tunneling are also naturally supported by SEAL. 512 The SEAL ITE considers each underlying interface as the ingress 513 attachment point to a separate subnetwork path to the ETE. The ITE 514 therefore may experience different path MTUs on different subnetwork 515 paths.
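The header orderings described earlier in this section (IP/SEAL, IP/UDP/SEAL, IP/SEAL/GRE and IP/SEAL/IPsec-header/.../IPsec-trailer) can be summarized in a short sketch. The function name and its parameters are hypothetical and chosen only for illustration; they are not defined by this specification.

```python
def seal_header_stack(tunnel_headers=(), tunnel_trailers=(), outer_transport=None):
    """Return the outermost-to-innermost layer order of a SEAL packet.

    Hypothetical helper: the names used here are illustrative assumptions.
    """
    stack = ["IP"]                      # outer IP header always comes first
    if outer_transport:                 # e.g. "UDP", an "other outer header"
        stack.append(outer_transport)
    stack.append("SEAL")                # SEAL precedes any ancillary tunnel headers
    stack.extend(tunnel_headers)        # e.g. "GRE" or "IPsec-header"
    stack.append("{inner packet}")
    stack.extend(tunnel_trailers)       # e.g. "IPsec-trailer"
    return "/".join(stack)


print(seal_header_stack())
print(seal_header_stack(outer_transport="UDP"))
print(seal_header_stack(tunnel_headers=["GRE"]))
print(seal_header_stack(["IPsec-header"], ["IPsec-trailer"]))
```

The sketch only models ordering; it does not build packets or compute lengths.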
517 Finally, the SEAL ITE ensures that the inner network layer protocol 518 will see a minimum MTU of 1500 bytes over each subnetwork path 519 regardless of the outer network layer protocol version, i.e., even if 520 a small amount of fragmentation and reassembly is necessary. This 521 is to avoid path MTU "black holes" for the minimum MTU configured by 522 the vast majority of links in the Internet. 524 5.3. SEAL Encapsulation Format 526 The SEAL header shares the same format as the IPv6 Fragment Header 527 [RFC2460] and is identified by the same IP protocol number assigned 528 for the IPv6 Fragment Header (type '44') [RFC2460]. The SEAL header 529 is differentiated from the IPv6 Fragment Header by defining bit 530 number 30 as the "SEAL (S)" bit, which is set to 1 when SEAL 531 encapsulation is used and set to 0 for ordinary IPv6 fragmentation. 532 SEAL therefore updates the IPv6 Fragment Header specification found 533 in Section 4.5 of [RFC2460] as shown in Figure 2:
535    0                   1                   2                   3
536    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
537   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
538   |  Next Header  |   Reserved    |      Fragment Offset    |R|S|M|
539   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
540   |                         Identification                        |
541   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
543 Figure 2: SEAL Encapsulation Format 545 5.4. ITE Specification 547 5.4.1. Tunnel MTU 549 The tunnel must present a stable MTU value to the inner network layer 550 as the size for admission of inner packets into the tunnel. Since 551 tunnels may support a large set of subnetwork paths that accept 552 widely varying maximum packet sizes, however, a number of factors 553 should be taken into consideration when selecting a tunnel MTU.
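Returning briefly to the header layout of Figure 2 (Section 5.3), the following is a minimal sketch of how the two 32-bit words might be packed and parsed. The function names are hypothetical. The sketch assumes network byte order, leaves the Reserved field and the R bit as zero, and treats the M bit as a plain flag without assigning it semantics beyond what Figure 2 shows.

```python
import struct

# Next Header (8 bits), Reserved (8 bits), Offset+R/S/M (16 bits), Identification (32 bits)
SEAL_FMT = "!BBHI"


def pack_seal_header(next_header, offset, m_flag, ident, seal=True):
    """Pack a SEAL header; hypothetical helper, not taken from the specification."""
    if not 0 <= offset < (1 << 13):
        raise ValueError("Fragment Offset must fit in 13 bits")
    # Upper 13 bits: Fragment Offset; then R (left as 0), S (bit 30), M (bit 31).
    flags = (offset << 3) | (int(seal) << 1) | int(m_flag)
    return struct.pack(SEAL_FMT, next_header, 0, flags, ident)


def unpack_seal_header(data):
    """Parse the first 8 bytes of data as a SEAL header."""
    next_header, _reserved, flags, ident = struct.unpack(SEAL_FMT, data[:8])
    return {
        "next_header": next_header,
        "offset": flags >> 3,
        "seal": bool(flags & 0x2),   # S bit: 1 for SEAL, 0 for ordinary IPv6 fragmentation
        "m_flag": bool(flags & 0x1),
        "ident": ident,
    }


hdr = pack_seal_header(next_header=4, offset=5, m_flag=True, ident=0xDEADBEEF)
print(hdr.hex(), unpack_seal_header(hdr))
```

A receiver following this sketch would test the S bit first to decide whether the header is a SEAL header or an ordinary IPv6 Fragment Header.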
555 Due to the ubiquitous deployment of standard Ethernet and similar 556 networking gear, the nominal Internet cell size has become 1500 557 bytes; this is the de facto size that end systems have come to expect 558 will either be delivered by the network without loss due to an MTU 559 restriction on the path or trigger a suitable ICMP Packet Too Big (PTB) 560 message in return. When large packets sent by end systems incur 561 additional encapsulation at an ITE, however, they may be dropped 562 silently within the tunnel since the network may not always deliver 563 the necessary PTBs [RFC2923]. The ITE SHOULD therefore set a tunnel 564 MTU of at least 1500 bytes and provide accommodations to ensure that 565 packets up to that size are successfully conveyed to the ETE. 567 The inner network layer protocol consults the tunnel MTU when 568 admitting a packet into the tunnel. For non-SEAL inner IPv4 packets 569 with the IPv4 Don't Fragment (DF) bit cleared (i.e., DF==0), if the 570 packet is larger than the tunnel MTU the inner IPv4 layer uses IPv4 571 fragmentation to break the packet into fragments no larger than the 572 tunnel MTU. The ITE then admits each fragment into the tunnel as an 573 independent packet. 575 For all other inner packets, the inner network layer admits the 576 packet if it is no larger than the tunnel MTU; otherwise, it drops 577 the packet and sends a PTB error message to the source with the MTU 578 value set to the tunnel MTU. The message contains as much of the invoking 579 packet as possible without the entire message exceeding the network 580 layer minimum MTU size. 582 The ITE can alternatively set an indefinite tunnel MTU such that all 583 inner packets are admitted into the tunnel regardless of their size 584 (practical maximums are 64KB for IPv4 and 4GB for IPv6 [RFC2675]).
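The admission rules described so far in this section, including the indefinite tunnel MTU option, can be sketched as a small decision function. The function name, its parameters and the returned labels are hypothetical; representing an indefinite MTU as None is an assumption made for this illustration only.

```python
def admit(packet_len, tunnel_mtu=None, is_ipv4=False, df=True, is_seal=False):
    """Decide how the inner network layer handles a packet offered to the tunnel.

    Hypothetical sketch. Returns one of:
      ("admit", None)      packet enters the tunnel unchanged
      ("fragment", mtu)    inner IPv4 layer fragments to pieces no larger than mtu
      ("drop_ptb", mtu)    packet is dropped and a PTB reporting mtu is sent
    """
    # Indefinite MTU (None here, by assumption) admits every packet.
    if tunnel_mtu is None or packet_len <= tunnel_mtu:
        return ("admit", None)
    # Only non-SEAL inner IPv4 packets with DF==0 may be fragmented on admission.
    if is_ipv4 and not df and not is_seal:
        return ("fragment", tunnel_mtu)
    # All other oversized packets are dropped with a PTB carrying the tunnel MTU.
    return ("drop_ptb", tunnel_mtu)


print(admit(2000, 1500, is_ipv4=True, df=False))
print(admit(2000, 1500))
print(admit(9000))
```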
585 For ITEs that host applications that use the tunnel directly, this 586 option must be carefully coordinated with protocol stack upper layers 587 since some upper layer protocols (e.g., TCP) derive their packet 588 sizing parameters from the MTU of the outgoing interface and as such 589 may select too large an initial size. This is not a problem for 590 upper layers that use conservative initial maximum segment size 591 estimates and/or when the tunnel can reduce the upper layer's maximum 592 segment size, e.g., by reducing the size advertised in the MSS option 593 of outgoing TCP messages (sometimes known as "MSS clamping"). 595 In light of the above considerations, the ITE SHOULD configure an 596 indefinite MTU on *router* tunnels so that SEAL performs all 597 subnetwork adaptation from within the tunnel as specified in the 598 following sections. The ITE MAY instead set a smaller MTU on *host* 599 tunnels; in that case, the RECOMMENDED MTU is the maximum of 1500 600 bytes and the smallest MTU among all of the underlying links minus 601 the size of the encapsulation headers. 603 5.4.2. Tunnel Neighbor Soft State 605 The ITE maintains a number of soft state variables and constants. 607 The ITE maintains a per ETE window of Identification values for the 608 packets it has recently sent to this ETE as well as a per ETE window 609 of Identification values for the packets it has recently received 610 from this ETE. The ITE then includes an Identification in each 611 packet it sends to this ETE. 613 For each subnetwork path, the ITE must also account for encapsulation 614 header lengths. The ITE therefore maintains the per subnetwork path 615 constant values "SHLEN" set to the length of the SEAL header, "THLEN" 616 set to the length of the outer encapsulating transport layer headers 617 (or 0 if outer transport layer encapsulation is not used), "IHLEN" 618 set to the length of the outer IP layer header, and "HLEN" set to 619 (SHLEN+THLEN+IHLEN). 
When calculating these lengths, the ITE must 620 include the length of the uncompressed headers even if header 621 compression is enabled. When SEAL is used in conjunction with tunnel 622 types that insert additional headers/trailers such as GRE or IPsec, 623 the length of the additional headers is also included in the HLEN 624 calculation for the first fragment only and the length of the 625 trailers is included in the HLEN calculation for the final fragment 626 only. 628 The ITE also sets a global constant value "MINMTU" to 1500 bytes and 629 sets a per subnetwork path constant value "FRAGMTU" to (1280-HLEN) 630 bytes (where 1280 is the minimum path MTU for IPv6 [RFC2460]). The 631 value 1280 is used regardless of the outer IP protocol version even 632 though the practical minimum MTU for IPv4 is only 576 bytes [RFC1122] 633 and the theoretical minimum MTU for IPv4 is only 68 bytes [RFC0791]. 634 The value 1280 is applied also to IPv4 since IPv4 links with MTUs 635 smaller than 1280 are presumably performance-constrained such that 636 IPv4 fragmentation can be used to accommodate MTU underruns without 637 risk of high data rate reassembly misassociations. 639 The ITE also sets a per subnetwork path variable "MAXMTU" to the 640 maximum of MINMTU and the MTU of the underlying interface minus HLEN. 641 The ITE thereafter adjusts MAXMTU based on any PTB messages it 642 receives from the subnetwork, but does not reduce MAXMTU below 643 MINMTU. 645 The ITE finally maintains a per subnetwork path boolean variable 646 "DOFRAG", which is initially set to TRUE and may be reset to FALSE if 647 the ITE discovers that the MTU on the path to the ETE is sufficient 648 to accommodate packet sizes of MINMTU bytes or larger. 650 5.4.3.
SEAL Layer Pre-Processing 652 The SEAL layer is logically positioned between the inner and outer 653 network protocol layers, where the inner layer is seen as the (true) 654 network layer and the outer layer is seen as the (virtual) data link 655 layer. Each packet to be processed by the SEAL layer is either 656 admitted into the tunnel by the inner network layer protocol as 657 described in Section 5.4.1 or is undergoing re-encapsulation from 658 within the tunnel. The SEAL layer sees the former class of packets 659 as inner packets that include inner network and transport layer 660 headers, and sees the latter class of packets as transitional SEAL 661 packets that include the outer and SEAL layer headers that were 662 inserted by the previous hop SEAL ITE. For these transitional 663 packets, the SEAL layer re-encapsulates the packet with new outer and 664 SEAL layer headers when it forwards the packet to the next hop SEAL 665 ITE. 667 We now discuss the SEAL layer pre-processing actions for these two 668 classes of packets. 670 5.4.3.1. Inner Packet Pre-Processing 672 For each IPv4 inner packet with DF==0 in the IP header, if the packet 673 is larger than MINMTU the ITE first uses standard IPv4 fragmentation 674 to fragment the packet into N pieces of at most MINMTU bytes each. 675 In this process, the ITE MUST additionally ensure that N is 676 minimized, the first fragment is the largest fragment and no 677 fragments are overlapping. The ITE then submits each fragment for 678 SEAL encapsulation as specified in Section 5.4.4. 680 For all other inner packets, if the packet is no larger than MAXMTU 681 the ITE submits it for SEAL encapsulation as specified in Section 682 5.4.4. Otherwise, the ITE discards the packet and sends a PTB 683 message appropriate to the inner protocol version (subject to rate 684 limiting) with the MTU field set to MAXMTU. 686 5.4.3.2. 
Transitional SEAL Packet Pre-Processing 688 For each transitional packet that is to be processed by the SEAL 689 layer from within the tunnel, if the packet is larger than MAXMTU for 690 the next hop subnetwork path the ITE discards the packet and sends a 691 PTB message appropriate to the inner protocol version (subject to 692 rate limiting) with the MTU field set to MAXMTU. Otherwise, the ITE 693 sets aside the encapsulating SEAL and outer headers for later 694 reference (see Section 5.4.5) and submits the inner packet for SEAL 695 re-encapsulation as discussed in the following sections. 697 5.4.4. SEAL Encapsulation and Fragmentation 699 For each inner packet/fragment submitted for SEAL encapsulation, the 700 ITE next encapsulates the packet in a SEAL header formatted as 701 specified in Section 5.3. The ITE next sets S=1 and sets the Next 702 Header field to the protocol number corresponding to the address 703 family of the encapsulated inner packet. For example, the ITE sets 704 the Next Header field to the value '4' for encapsulated IPv4 packets 705 [RFC2003], '41' for encapsulated IPv6 packets [RFC2473][RFC4213], 706 '47' for GRE [RFC1701], '80' for encapsulated OSI/CLNP packets 707 [RFC1070], etc. 709 Next, if the inner packet is no larger than FRAGMTU, or if the inner 710 packet is larger than MINMTU, or if the DOFRAG flag is FALSE, the ITE 711 sets (M=0; Offset=0). Otherwise, the ITE fragments the inner packet 712 using the fragmentation procedures specified in Section 4.5 of 713 [RFC2460]. In this process, the ITE breaks the inner packet into two 714 non-overlapping fragments, where the encapsulated SEAL packet 715 containing the first fragment MUST be as large as possible without 716 exceeding 1280 bytes (i.e., the IPv6 minimum MTU) and the 717 encapsulated SEAL packet containing the second fragment MUST include 718 the remainder of the inner packet. 
This ensures that the entire IP 719 header (plus extensions) is likely to fit within the first fragment 720 and that the number of fragments is minimized. The ITE then adds the 721 outer encapsulating headers as specified in Section 5.4.5. 723 5.4.5. Outer Encapsulation 725 Following SEAL encapsulation and fragmentation, the ITE next 726 encapsulates each fragment in the requisite outer transport (when 727 necessary) and IP layer headers. When a transport layer header such 728 as UDP or TCP is included, the ITE writes the port number for SEAL in 729 the transport destination service port field. 731 When UDP encapsulation is used, the ITE sets the UDP checksum field 732 to zero for both IPv4 and IPv6 packets (see: [RFC6935][RFC6936]). 734 The ITE then sets the outer IP layer headers the same as specified 735 for ordinary IP encapsulation (e.g., [RFC1070][RFC2003], [RFC2473], 736 [RFC4213], etc.) except that for ordinary SEAL packets the ITE copies 737 the "TTL/Hop Limit", "Type of Service/Traffic Class" and "Congestion 738 Experienced" values in the inner network layer header into the 739 corresponding fields in the outer IP header. For transitional SEAL 740 packets undergoing re-encapsulation, the ITE instead copies the "TTL/ 741 Hop Limit", "Type of Service/Traffic Class" and "Congestion 742 Experienced" values in the original outer IP header of the 743 transitional packet into the corresponding fields in the new outer IP 744 header of the packet to be forwarded (i.e., the values are 745 transferred between outer headers and *not* copied from the inner 746 network layer header). 748 The ITE also sets the IP protocol number to the appropriate value for 749 the first protocol layer within the encapsulation (e.g., UDP, TCP, 750 IPv6 Fragment Header, etc.). When IPv6 is used as the outer IP 751 protocol, the ITE then sets the flow label value in the outer IPv6 752 header the same as described in [RFC6438]. 
When IPv4 is used as the 753 outer IP protocol, if the encapsulated SEAL packet is no larger than 754 1280 bytes the ITE sets DF=0 in the IPv4 header to allow the packet 755 to be fragmented if it encounters a restricting link; otherwise, the 756 ITE sets DF=1 (for IPv6 subnetwork paths, the DF bit is absent but 757 implicitly set to 1). The ITE finally sends each outer packet via 758 the corresponding underlying subnetwork path. 760 5.4.6. Path Probing and ETE Reachability Verification 762 When the ITE is actively sending packets over a subnetwork path to an 763 ETE, it also sends explicit MINMTU probes subject to rate limiting. 764 To generate a probe, the ITE creates a unicast ICMPv6 Neighbor 765 Solicitation (NS) message as specified in [RFC4861] and without 766 including a Source Link Layer Address Option. The ITE then pads the 767 NS message with null padding to a length of MINMTU bytes and 768 encapsulates the message in a SEAL header and any other outer headers 769 (i.e., with the length of the resulting SEAL packet being (MINMTU+ 770 HLEN) bytes). It then sets (Offset=0; S=1; M=0) in the SEAL header, 771 and also sets DF=1 in the outer IP header when IPv4 is used. It 772 finally writes the value '58' in the Next Header field of the SEAL 773 header to indicate that the message is a SEAL-encapsulated ICMPv6 774 message. 776 The ITE sends MINMTU probes to determine whether SEAL fragmentation 777 is still necessary (see Section 5.4.4). In particular, if the ITE 778 sends a probe and receives a SEAL-encapsulated ICMPv6 Neighbor 779 Advertisement (NA) message response with (Offset=0; S=1; M=0) in the 780 SEAL header (see: section 5.5.4), the ITE SHOULD set DOFRAG for this 781 subnetwork path to FALSE. This probing discipline can therefore be 782 considered as Packetization Layer Path MTU Discovery (PLPMTUD) 783 [RFC4821] applied to tunnels, which operates independently of any 784 application of PLPMTUD between end systems. 
Note that the probe size 785 of MINMTU bytes is chosen since probe packets smaller than this size 786 may be fragmented by a nested ITE further down the path. For 787 example, a successful probe for a packet size of 1400 bytes does not 788 guarantee that fragmentation is not occurring at the ITE of another 789 tunnel nesting level. 791 The ITE can also send ordinary NS messages to determine whether the 792 ETE is still reachable over this subnetwork path. The ITE prepares 793 the NS message as described above but without padding, sets 794 (Offset=0; S=1; M=0) in the SEAL header, then sends the NS message to 795 the ETE. If the ITE receives a solicited NA message response, its 796 upper layers MAY consider the message as a reachability hint. The 797 ITE can also send probes to detect whether an outer transport layer 798 header is no longer necessary to reach this ETE. For example, if the 799 ITE sends its initial packets as IP/UDP/SEAL/*, it can send probes 800 constructed as IP/SEAL/* to determine whether the ETE is reachable 801 without the use of transport layer encapsulation. If so, the ITE 802 should also send a new probe since switching to a new encapsulation 803 format may result in a path change. 805 While probing, the ITE processes ICMP messages as specified in 806 Section 5.4.7. 808 5.4.7. Processing ICMP Messages 810 When the ITE sends SEAL packets, it may receive ICMP error messages 811 [RFC0792][RFC4443] from a router on the path to the ETE. Each ICMP 812 message includes an outer IP header, followed by an ICMP header, 813 followed by a portion of the SEAL packet that generated the error 814 (also known as the "packet-in-error"). Note that the ITE may receive 815 an ICMP message from either an ordinary router on the path or from 816 another ITE that is at the head end of a nested level of 817 encapsulation. 
The ITE has no security associations with this nested 818 ITE, hence it should consider the message the same as if it 819 originated from an ordinary router. 821 The ITE should process ICMP Protocol/Port Unreachable messages as a 822 hint that the ETE does not implement SEAL. The ITE can optionally 823 ignore other ICMP messages that do not include sufficient information 824 in the packet-in-error, or process them as a hint that the subnetwork 825 path to the ETE may be failing. The ITE then discards these types of 826 messages. 828 For other ICMP messages, the ITE SHOULD examine the SEAL data packet 829 within the packet-in-error field. If the IP source and/or 830 destination addresses are invalid, or if the value in the SEAL header 831 Identification field (if present) is not within the window of packets 832 the ITE has recently sent to this ETE, the ITE discards the message. 834 Next, if the received ICMP message is a PTB the ITE sets MAXMTU to 835 the maximum of MINMTU and the MTU value in the message minus HLEN. 836 If the MTU value in the message is smaller than (MINMTU+HLEN), the 837 ITE also resets DOFRAG to TRUE and discards the message. 839 If the ICMP message was not discarded, the ITE transcribes it into a 840 message appropriate for the inner protocol version (e.g., ICMPv4 for 841 IPv4, ICMPv6 for IPv6, etc.) and forwards the transcribed message to 842 the previous hop toward the inner source address. 844 5.4.8. Detecting Path MTU Changes 846 The ITE SHOULD periodically reset MAXMTU to the MTU of the underlying 847 subnetwork interface to determine whether the subnetwork path MTU has 848 increased. If the path still has a too-small MTU, the ITE will 849 receive a PTB message that reports a smaller size. 851 5.5. ETE Specification 853 5.5.1. Reassembly Buffer Requirements 855 The ETE MUST configure a minimum SEAL reassembly buffer size of 856 (MINMTU+HLEN) bytes for the reassembly of fragmented SEAL packets 857 (see: Section 5.5.4). 
Note that the value "HLEN" may be variable and 858 initially unknown to the ETE. It is therefore RECOMMENDED that the 859 ETE configure a slightly larger SEAL reassembly buffer size of 2048 860 bytes (2KB). 862 When IPv4 is used as the outer layer of encapsulation, the ETE MUST 863 also configure a minimum IPv4 reassembly buffer size of 1280 bytes. 865 5.5.2. Tunnel Neighbor Soft State 867 The ETE maintains a window of Identification values for the packets 868 it has recently received from this ITE as well as a window of 869 Identification values for the packets it has recently sent to this 870 ITE. 872 5.5.3. IPv4-Layer Reassembly 874 The ETE reassembles fragmented IPv4 packets that are explicitly 875 addressed to itself. For IPv4 fragments of SEAL packets, the ETE 876 SHOULD maintain conservative reassembly cache high- and low-water 877 marks. When the size of the reassembly cache exceeds this high-water 878 mark, the ETE SHOULD actively discard stale incomplete reassemblies 879 (e.g., using an Active Queue Management (AQM) strategy) until the 880 size falls below the low-water mark. The ETE SHOULD also actively 881 discard any pending reassemblies that clearly have no opportunity for 882 completion, e.g., when a considerable number of new fragments have 883 arrived before a fragment that completes a pending reassembly 884 arrives. 886 The ETE processes IPv4 fragments as specified in the normative 887 references, i.e., it performs any necessary IPv4 reassembly then 888 submits the packet to the appropriate upper layer protocol module. 889 For SEAL packets, the ETE then performs SEAL decapsulation as 890 specified in Section 5.5.4. 892 5.5.4. Decapsulation, SEAL-Layer Reassembly, and Re-Encapsulation 894 For each SEAL packet accepted for decapsulation, the ETE first 895 examines the Identification field. If the Identification is not 896 within the window of acceptable values for this ITE, the ETE silently 897 discards the packet. 
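The Identification check above can be sketched as follows. The text does not define how the window of acceptable values is represented, so this sketch assumes (purely for illustration) a contiguous window of "size" values beginning at "base", with 32-bit wraparound as for the Identification field; the name in_window is hypothetical.

```python
ID_MODULUS = 1 << 32  # the Identification field is 32 bits wide


def in_window(ident, base, size):
    """Return True if ident lies in [base, base + size) modulo 2**32.

    Assumption: the window is a contiguous range; the draft itself does
    not mandate any particular window representation.
    """
    return (ident - base) % ID_MODULUS < size
```

Under this assumption, an ETE would silently discard any SEAL packet for which in_window() returns False for the sending ITE.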
899 Next, if the SEAL header has (Offset=0; S=1; M=0), and the 900 encapsulated packet is an NS message, the ETE prepares a SEAL- 901 encapsulated Neighbor Advertisement (NA) message response. The NA 902 message body is prepared exactly the same as for unicast NAs as 903 specified in [RFC4861] and does not include a Target Link Layer 904 Address Option. The ETE then sends the NA response back to the ITE 905 while copying the (Offset, P, S, M) bit values that were received in 906 the SEAL header of the NS message then discards the NS message. When 907 the ITE receives the NA message, it processes the message as 908 specified in Section 5.4.6. 910 Next, if the SEAL header has (Offset!=0 || M=1) the ETE submits the 911 packet for reassembly exactly as specified for IPv6 reassembly in 912 Section 4.5 of [RFC2460]. During the reassembly process, the ETE 913 discards any fragments that are overlapping with respect to fragments 914 that have already been received, and also discards any fragments that 915 have M=1 in the SEAL header but do not contain an integer multiple of 916 8 bytes. The ETE further SHOULD manage the SEAL reassembly cache the 917 same as described for the IPv4-Layer Reassembly cache in Section 918 5.5.3, i.e., it SHOULD perform an early discard for any pending 919 reassemblies that have low probability of completion. In all other 920 ways, 922 Finally, the ETE discards the outer headers of the reassembled packet 923 and processes the inner packet according to the header type indicated 924 in the SEAL Next Header field. If the next hop toward the inner 925 destination address is via a different interface than the SEAL packet 926 arrived on, the ETE discards the SEAL header and delivers the inner 927 packet either to the local host or to the next hop if the packet is 928 not destined to the local host. 
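The two fragment discard rules stated for SEAL-layer reassembly (no overlaps, and non-final fragments must carry a multiple of 8 bytes) can be sketched as below. This is a minimal illustration, not a full reassembly implementation: the function name accept_fragment and the use of a plain list of byte ranges are assumptions made for the example.

```python
def accept_fragment(received, offset_units, length, more):
    """Decide whether a fragment may join a pending reassembly.

    received     -- list of (start, end) byte ranges already held; it is
                    updated in place when the fragment is accepted
    offset_units -- Fragment Offset field, in 8-byte units
    length       -- number of payload bytes carried by the fragment
    more         -- value of the M bit
    """
    # Fragments with M=1 must contain an integer multiple of 8 bytes.
    if more and length % 8 != 0:
        return False
    start = offset_units * 8
    end = start + length
    # Discard anything that overlaps a fragment already received.
    for held_start, held_end in received:
        if start < held_end and held_start < end:
            return False
    received.append((start, end))
    return True
```

A real implementation would additionally bound the reassembly cache as described in Section 5.5.3; that is omitted here for brevity.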
930 If the next hop is on the same tunnel the SEAL packet arrived on, 931 however, the ETE submits the packet for SEAL re-encapsulation 932 beginning with the specification in Section 5.4.3 above and without 933 decrementing the value in the inner (TTL / Hop Limit) field. 935 6. Link Requirements 937 Subnetwork designers are expected to follow the recommendations in 938 Section 2 of [RFC3819] when configuring link MTUs. 940 7. End System Requirements 942 End systems are encouraged to implement end-to-end MTU assurance 943 (e.g., using Packetization Layer Path MTU Discovery (PLPMTUD) per 944 [RFC4821]) even if the subnetwork is using SEAL. 946 When end systems use PLPMTUD, SEAL will ensure that the tunnel 947 behaves as a link in the path that assures an MTU of at least 1500 948 bytes while still allowing end systems to discover larger MTUs. The 949 PLPMTUD mechanism will therefore be able to function as designed in 950 order to discover and utilize larger MTUs. 952 8. Router Requirements 954 Routers within the subnetwork are expected to observe the standard IP 955 router requirements, including the implementation of IP fragmentation 956 and reassembly as well as the generation of ICMP messages 957 [RFC0792][RFC1122][RFC1812][RFC2460][RFC4443][RFC6434]. 959 Note that, even when routers support existing requirements for the 960 generation of ICMP messages, these messages are often filtered and 961 discarded by middleboxes on the path to the original source of the 962 message that triggered the ICMP. It is therefore not possible to 963 assume delivery of ICMP messages even when routers are correctly 964 implemented. 966 9. Multicast/Anycast Considerations 968 On multicast-capable tunnels, encapsulated packets sent by an ITE may 969 be received by potentially many ETEs. 
In that case, the ITE can 970 still send unicast NS messages to receive NA messages from a specific 971 ETE, or it can send multicast NS messages to receive NA messages from 972 all ETEs in the multicast group that receive the NS. If the ITE were 973 to send a multicast NS message used for MINMTU probing as described 974 in Section 5.4.6, however, it would be unable to discern whether all 975 ETEs received the probe unless it had some way of tracking the full 976 constituency of the multicast group. For multicast ETE addresses, 977 the ITE would therefore ordinarily set MAXMTU=MINMTU and DOFRAG=TRUE. 978 However, the setting of these values may be situation-dependent and based 979 on whether the ITE can tolerate packet loss to ETEs that may be 980 reached by subnetwork paths having small MTUs. 982 For ETEs that configure an anycast address, if the ITE sends a MINMTU 983 probe message it may receive an NA probe reply from a first ETE but 984 then be re-routed to a second ETE. It is therefore necessary for the 985 ITE to continue to send periodic probes (subject to rate limiting) as 986 described in Section 5.4.6 so that any path oscillations between ETEs 987 that configure the same anycast address will not result in a 988 sustained path MTU black hole. 990 10. Compatibility Considerations 992 Since SEAL is based on the standard IPv6 fragment header, the ITE can 993 implement the scheme independently of any ETE implementations. 994 Therefore, if the ITE uses SEAL but the ETE does not, the ITE can still 995 send a MINMTU probe message as specified in Section 5.4.6 but may 996 receive an ordinary (i.e., non SEAL-encapsulated) solicited NA 997 message response. If the ITE can discern that the NA message was in 998 fact a response to a MINMTU probe and not a response to an ordinary 999 NS message, it SHOULD reset DOFRAG to FALSE the same as if the ETE 1000 returned a SEAL-encapsulated NA message.
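The DOFRAG handling referred to above, together with the per-path soft state of Section 5.4.2, the fragmentation decision of Section 5.4.4 and the PTB processing of Section 5.4.7, can be sketched as follows. This is an illustrative model only: the class name PathState and its method names are hypothetical, and the HLEN value of 48 bytes used in the usage note (a 40-byte outer IPv6 header plus the 8-byte SEAL header, with no outer transport header) is an assumed example.

```python
from dataclasses import dataclass, field

MINMTU = 1500        # global constant from Section 5.4.2
IPV6_MIN_MTU = 1280  # basis for FRAGMTU regardless of outer IP version


@dataclass
class PathState:
    link_mtu: int   # MTU of the underlying interface
    hlen: int       # HLEN = SHLEN + THLEN + IHLEN
    dofrag: bool = True
    maxmtu: int = field(init=False, default=0)

    def __post_init__(self):
        self.maxmtu = max(MINMTU, self.link_mtu - self.hlen)

    @property
    def fragmtu(self):
        return IPV6_MIN_MTU - self.hlen

    def on_probe_reply(self):
        # A MINMTU probe was answered: fragmentation is no longer needed.
        self.dofrag = False

    def on_ptb(self, reported_mtu):
        # Never reduce MAXMTU below MINMTU.
        self.maxmtu = max(MINMTU, reported_mtu - self.hlen)
        if reported_mtu < MINMTU + self.hlen:
            self.dofrag = True

    def must_fragment(self, inner_len):
        # Section 5.4.4: send unfragmented (M=0; Offset=0) if the inner
        # packet fits in FRAGMTU, exceeds MINMTU, or DOFRAG is FALSE.
        if inner_len <= self.fragmtu or inner_len > MINMTU or not self.dofrag:
            return False
        return True
```

With PathState(link_mtu=1500, hlen=48), FRAGMTU is 1232 bytes, so a 1400-byte inner packet is fragmented until a probe reply arrives, after which it is sent whole until a PTB reporting an MTU below (MINMTU+HLEN) sets DOFRAG back to TRUE.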
1002 In some cases, a non-SEAL ETE may not be able to reassemble 1003 fragmented SEAL packets up to (MINMTU+HLEN) bytes, since [RFC2460] 1004 only requires IPv6 nodes to reassemble packets up to 1500 bytes in 1005 length. To test for this condition, the ITE can create a MINMTU 1006 probe message, fragment the message into two pieces, then send the 1007 fragments to the ETE. If the ETE returns a solicited NA message, the 1008 ITE has assurance that the ETE is capable of reassembly. Otherwise, 1009 the ITE SHOULD reset MAXMTU for this subnetwork path to (MINMTU-HLEN) 1010 or even smaller if the ETE still cannot accept packets of this size. 1012 11. Nested Encapsulation Considerations 1014 SEAL supports nested tunneling; an example would be a recursive 1015 nesting of mobile networks, where the first network receives service 1016 from an ISP, the second network receives service from the first 1017 network, the third network receives service from the second network, 1018 etc. Since it is imperative that such nesting not extend 1019 indefinitely, tunnels that use SEAL SHOULD honor the Encapsulation 1020 Limit option defined in [RFC2473]. 1022 12. Reliability Considerations 1024 Although a tunnel may span an arbitrarily-large subnetwork expanse, 1025 the IP layer sees the tunnel as a simple link that supports the IP 1026 service model. Links with high bit error rates (BERs) (e.g., IEEE 1027 802.11) use Automatic Repeat-ReQuest (ARQ) mechanisms [RFC3366] to 1028 increase packet delivery ratios, while links with much lower BERs 1029 typically omit such mechanisms. Since tunnels may traverse 1030 arbitrarily-long paths over links of various types that are already 1031 either performing or omitting ARQ as appropriate, it would 1032 be inefficient to require the tunnel endpoints to also perform ARQ. 1034 13.
Integrity Considerations 1036 Fragmentation and reassembly schemes must consider packet-splicing 1037 errors, e.g., when two fragments from the same packet are 1038 concatenated incorrectly, when a fragment from packet X is 1039 reassembled with fragments from packet Y, etc. The primary sources 1040 of such errors include implementation bugs and wrapping ID fields. 1042 In particular, the IPv4 16-bit ID field can wrap with only 64K 1043 packets with the same (src, dst, protocol)-tuple alive in the system 1044 at a given time [RFC4963]. When the IPv4 ID field is re-written by a 1045 middlebox such as a NAT or firewall, ID field wrapping can occur with 1046 even fewer packets alive in the system. 1048 Fortunately, SEAL includes a 32-bit ID field the same as for IPv6 1049 fragmentation and also only employs SEAL fragmentation for packets up 1050 to 1500 bytes in length. SEAL also only allows IPv4 network 1051 fragmentation for packets up to 1280 bytes in length, but this size 1052 is small enough to fit within the MTU of modern high-speed IPv4 links 1053 without fragmentation. IPv4 links with smaller MTUs certainly exist, 1054 but typically support data rates that are slow enough to preclude 1055 high data rate reassembly misassociation errors; hence, a small 1056 amount of IPv4 fragmentation may occur in rare cases. 1058 14. IANA Considerations 1060 The IANA is requested to allocate a User Port number for "SEAL" in 1061 the 'port-numbers' registry. The Service Name is "SEAL", and the 1062 Transport Protocols are TCP and UDP. The Assignee is the IESG 1063 (iesg@ietf.org) and the Contact is the IETF Chair (chair@ietf.org). 1064 The Description is "Subnetwork Encapsulation and Adaptation Layer 1065 (SEAL)", and the Reference is the RFC-to-be currently known as 1066 'draft-templin-intarea-seal'. 1068 15.
Security Considerations 1070 Neighbor relationships between the ITE and ETE should be secured in 1071 environments where authentication and/or confidentiality are a matter 1072 of concern. Securing mechanisms such as Secure Neighbor Discovery 1073 (SeND) [RFC3971] and IPsec [RFC4301] can be used for this purpose, 1074 however the tunnel neighbor relationship is managed by the tunnel 1075 protocols that ride over SEAL (as an encapsulation sublayer) rather 1076 than by SEAL itself. 1078 Security issues that apply to tunneling in general are discussed in 1079 [RFC6169]. 1081 16. Related Work 1083 Section 3.1.7 of [RFC2764] provides a high-level sketch for 1084 supporting large tunnel MTUs via a tunnel-layer fragmentation and 1085 reassembly capability to avoid IP layer fragmentation. 1087 Section 3 of [RFC4459] describes inner and outer fragmentation at the 1088 tunnel endpoints as alternatives for accommodating the tunnel MTU. 1090 Section 4 of [RFC2460] specifies a method for inserting and 1091 processing extension headers between the base IPv6 header and 1092 transport layer protocol data. The SEAL header is inserted and 1093 processed in exactly the same manner. 1095 The concepts of path MTU determination through the report of 1096 fragmentation and extending the IPv4 Identification field were first 1097 proposed in deliberations of the TCP-IP mailing list and the Path MTU 1098 Discovery Working Group (MTUDWG) during the late 1980's and early 1099 1990's. An historical analysis of the evolution of these concepts, 1100 as well as the development of the eventual PMTUD mechanism, appears 1101 in [RFC5320]. 1103 17. Implementation Status 1105 An early implementation of the first revision of SEAL [RFC5320] is 1106 available at: http://isatap.com/seal. 1108 An implementation of the current version of SEAL is available at: 1109 http://linkupnetworks.com/seal/sealv2-1.0.tgz. 1111 18. 
Acknowledgments 1113 The following individuals are acknowledged for helpful comments and 1114 suggestions: Jari Arkko, Fred Baker, Iljitsch van Beijnum, Oliver 1115 Bonaventure, Teco Boot, Bob Braden, Brian Carpenter, Steve Casner, 1116 Ian Chakeres, Noel Chiappa, Remi Denis-Courmont, Remi Despres, Ralph 1117 Droms, Aurnaud Ebalard, Gorry Fairhurst, Washam Fan, Dino Farinacci, 1118 Joel Halpern, Brian Haberman, Sam Hartman, John Heffner, Thomas 1119 Henderson, Bob Hinden, Christian Huitema, Eliot Lear, Darrel Lewis, 1120 Joe Macker, Matt Mathis, Erik Nordmark, Dan Romascanu, Dave Thaler, 1121 Joe Touch, Mark Townsley, Ole Troan, Margaret Wasserman, Magnus 1122 Westerlund, Robin Whittle, James Woodyatt, and members of the Boeing 1123 Research & Technology NST DC&NT group. 1125 Discussions with colleagues following the publication of [RFC5320] 1126 have provided useful insights that have resulted in significant 1127 improvements to this, the Second Edition of SEAL. 1129 This document received substantial review input from the IESG and 1130 IETF area directorates in the February 2013 timeframe. IESG members 1131 and IETF area directorate representatives who contributed helpful 1132 comments and suggestions are gratefully acknowledged. Discussions on 1133 the IETF IPv6 and Intarea mailing lists in the summer 2013 timeframe 1134 also stimulated several useful ideas. 1136 Path MTU determination through the report of fragmentation was first 1137 proposed by Charles Lynn on the TCP-IP mailing list in 1987. 1138 Extending the IP identification field was first proposed by Steve 1139 Deering on the MTUDWG mailing list in 1989. Steve Deering also 1140 proposed the IPv6 minimum MTU of 1280 bytes on the IPng mailing list 1141 in 1997. 1143 19. References 1145 19.1. Normative References 1147 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 1148 September 1981. 1150 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1151 RFC 792, September 1981. 
1153 [RFC1122] Braden, R., "Requirements for Internet Hosts - 1154 Communication Layers", STD 3, RFC 1122, October 1989. 1156 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1157 Requirement Levels", BCP 14, RFC 2119, March 1997. 1159 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1160 (IPv6) Specification", RFC 2460, December 1998. 1162 [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control 1163 Message Protocol (ICMPv6) for the Internet Protocol 1164 Version 6 (IPv6) Specification", RFC 4443, March 2006. 1166 [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, 1167 "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, 1168 September 2007. 1170 19.2. Informative References 1172 [FOLK] Shannon, C., Moore, D., and k. claffy, "Beyond Folklore: 1173 Observations on Fragmented Traffic", December 2002. 1175 [FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", 1176 October 1987. 1178 [I-D.massar-v6ops-ayiya] 1179 Massar, J., "AYIYA: Anything In Anything", 1180 draft-massar-v6ops-ayiya-02 (work in progress), July 2004. 1182 [I-D.taylor-v6ops-fragdrop] 1183 Jaeggli, J., Colitti, L., Kumari, W., Vyncke, E., Kaeo, 1184 M., and T. Taylor, "Why Operators Filter Fragments and 1185 What It Implies", draft-taylor-v6ops-fragdrop-01 (work in 1186 progress), June 2013. 1188 [I-D.templin-ironbis] 1189 Templin, F., "The Interior Routing Overlay Network 1190 (IRON)", draft-templin-ironbis-15 (work in progress), 1191 May 2013. 1193 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 1194 August 1980. 1196 [RFC0994] International Organization for Standardization (ISO) and 1197 American National Standards Institute (ANSI), "Final text 1198 of DIS 8473, Protocol for Providing the Connectionless- 1199 mode Network Service", RFC 994, March 1986. 1201 [RFC1070] Hagens, R., Hall, N., and M. 
Rose, "Use of the Internet as
1202               a subnetwork for experimentation with the OSI network
1203               layer", RFC 1070, February 1989.

1205    [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1206               November 1990.

1208    [RFC1701]  Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic
1209               Routing Encapsulation (GRE)", RFC 1701, October 1994.

1211    [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
1212               RFC 1812, June 1995.

1214    [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
1215               for IP version 6", RFC 1981, August 1996.

1217    [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
1218               October 1996.

1220    [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
1221               IPv6 Specification", RFC 2473, December 1998.

1223    [RFC2675]  Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
1224               RFC 2675, August 1999.

1226    [RFC2764]  Gleeson, B., Heinanen, J., Lin, A., Armitage, G., and A.
1227               Malis, "A Framework for IP Based Virtual Private
1228               Networks", RFC 2764, February 2000.

1230    [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
1231               Defeating Denial of Service Attacks which employ IP Source
1232               Address Spoofing", BCP 38, RFC 2827, May 2000.

1234    [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
1235               RFC 2923, September 2000.

1237    [RFC3232]  Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by
1238               an On-line Database", RFC 3232, January 2002.

1240    [RFC3366]  Fairhurst, G. and L. Wood, "Advice to link designers on
1241               link Automatic Repeat reQuest (ARQ)", BCP 62, RFC 3366,
1242               August 2002.

1244    [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
1245               Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
1246               Wood, "Advice for Internet Subnetwork Designers", BCP 89,
1247               RFC 3819, July 2004.

1249    [RFC3971]  Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
1250               Neighbor Discovery (SEND)", RFC 3971, March 2005.

1252    [RFC4213]  Nordmark, E. and R.
Gilligan, "Basic Transition Mechanisms
1253               for IPv6 Hosts and Routers", RFC 4213, October 2005.

1255    [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1256               Internet Protocol", RFC 4301, December 2005.

1258    [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
1259               Network Tunneling", RFC 4459, April 2006.

1261    [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
1262               Discovery", RFC 4821, March 2007.

1264    [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
1265               Errors at High Data Rates", RFC 4963, July 2007.

1267    [RFC5320]  Templin, F., "The Subnetwork Encapsulation and Adaptation
1268               Layer (SEAL)", RFC 5320, February 2010.

1270    [RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

1272    [RFC6169]  Krishnan, S., Thaler, D., and J. Hoagland, "Security
1273               Concerns with IP Tunneling", RFC 6169, April 2011.

1275    [RFC6434]  Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node
1276               Requirements", RFC 6434, December 2011.

1278    [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
1279               for Equal Cost Multipath Routing and Link Aggregation in
1280               Tunnels", RFC 6438, November 2011.

1282    [RFC6864]  Touch, J., "Updated Specification of the IPv4 ID Field",
1283               RFC 6864, February 2013.

1285    [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1286               UDP Checksums for Tunneled Packets", RFC 6935, April 2013.

1288    [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement
1289               for the Use of IPv6 UDP Datagrams with Zero Checksums",
1290               RFC 6936, April 2013.

1292    [RIPE]     De Boer, M. and J. Bosma, "Discovering Path MTU Black
1293               Holes on the Internet using RIPE Atlas", July 2012.

1295    [SIGCOMM]  Luckie, M. and B. Stasiewicz, "Measuring Path MTU
1296               Discovery Behavior", November 2010.

1298    [TBIT]     Medina, A., Allman, M., and S. Floyd, "Measuring
1299               Interactions Between Transport Protocols and Middleboxes",
1300               October 2004.

1302    [WAND]     Luckie, M., Cho, K., and B.
Owens, "Inferring and
1303               Debugging Path MTU Discovery Failures", October 2005.

1305 Author's Address

1307    Fred L. Templin (editor)
1308    Boeing Research & Technology
1309    P.O. Box 3707
1310    Seattle, WA  98124
1311    USA

1313    Email: fltemplin@acm.org