-------------------------------------------------------------------------------- 2 Network Working Group F. Templin, Ed. 3 Internet-Draft Boeing Research & Technology 4 Intended status: Standards Track June 7, 2010 5 Expires: December 9, 2010 7 The Subnetwork Encapsulation and Adaptation Layer (SEAL) 8 draft-templin-intarea-seal-15.txt 10 Abstract 12 For the purpose of this document, a subnetwork is defined as a 13 virtual topology configured over a connected IP network routing 14 region and bounded by encapsulating border nodes. These virtual 15 topologies may span multiple IP and/or sub-IP layer forwarding hops, 16 and can introduce failure modes due to packet duplication and/or 17 links with diverse Maximum Transmission Units (MTUs). This document 18 specifies a Subnetwork Encapsulation and Adaptation Layer (SEAL) that 19 accommodates such virtual topologies over diverse underlying link 20 technologies. 22 Status of this Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF). Note that other groups may also distribute 29 working documents as Internet-Drafts. The list of current Internet- 30 Drafts is at http://datatracker.ietf.org/drafts/current/. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 This Internet-Draft will expire on December 9, 2010. 39 Copyright Notice 41 Copyright (c) 2010 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents 46 (http://trustee.ietf.org/license-info) in effect on the date of 47 publication of this document. 
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Approach
   2.  Terminology and Requirements
   3.  Applicability Statement
   4.  SEAL Protocol Specification
     4.1.  Model of Operation
     4.2.  SEAL Header Format
     4.3.  ITE Specification
       4.3.1.  Tunnel Interface MTU
       4.3.2.  Tunnel Interface Soft State
       4.3.3.  Admitting Packets into the Tunnel
       4.3.4.  Mid-Layer Encapsulation
       4.3.5.  SEAL Segmentation
       4.3.6.  Outer Encapsulation
       4.3.7.  Probing Strategy
       4.3.8.  Packet Identification
       4.3.9.  Sending SEAL Protocol Packets
       4.3.10. Processing Raw ICMP Messages
     4.4.  ETE Specification
       4.4.1.  Reassembly Buffer Requirements
       4.4.2.  IP-Layer Reassembly
       4.4.3.  SEAL-Layer Reassembly
       4.4.4.  Decapsulation and Delivery to Upper Layers
     4.5.  The SEAL Control Message Protocol (SCMP)
       4.5.1.  Generating SCMP Messages
       4.5.2.  Processing SCMP Messages
     4.6.  Tunnel Endpoint Synchronization
   5.  Link Requirements
   6.  End System Requirements
   7.  Router Requirements
   8.  IANA Considerations
   9.  Security Considerations
   10. Related Work
   11. SEAL Advantages over Classical Methods
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Reliability
   Appendix B.  Integrity
   Appendix C.  Transport Mode
   Appendix D.  Historic Evolution of PMTUD
   Author's Address

1.  Introduction

   As Internet technology and communication have grown and matured, many
   techniques have developed that use virtual topologies (including
   tunnels of one form or another) over an actual network that supports
   the Internet Protocol (IP) [RFC0791][RFC2460].  Those virtual
   topologies have elements that appear as one hop in the virtual
   topology, but are actually multiple IP or sub-IP layer hops.
These 109 multiple hops often have quite diverse properties that are often not 110 even visible to the endpoints of the virtual hop. This introduces 111 failure modes that are not dealt with well in current approaches. 113 The use of IP encapsulation has long been considered as the means for 114 creating such virtual topologies. However, the insertion of an outer 115 IP header reduces the effective path MTU visible to the inner network 116 layer. When IPv4 is used, this reduced MTU can be accommodated 117 through the use of IPv4 fragmentation, but unmitigated in-the-network 118 fragmentation has been found to be harmful through operational 119 experience and studies conducted over the course of many years 120 [FRAG][FOLK][RFC4963]. Additionally, classical path MTU discovery 121 [RFC1191] has known operational issues that are exacerbated by in- 122 the-network tunnels [RFC2923][RFC4459]. The following subsections 123 present further details on the motivation and approach for addressing 124 these issues. 126 1.1. Motivation 128 Before discussing the approach, it is necessary to first understand 129 the problems. In both the Internet and private-use networks today, 130 IPv4 is ubiquitously deployed as the Layer 3 protocol. The two 131 primary functions of IPv4 are to provide for 1) addressing, and 2) a 132 fragmentation and reassembly capability used to accommodate links 133 with diverse MTUs. While it is well known that the IPv4 address 134 space is rapidly becoming depleted, there is a lesser-known but 135 growing consensus that other IPv4 protocol limitations have already 136 or may soon become problematic. 138 First, the IPv4 header Identification field is only 16 bits in 139 length, meaning that at most 2^16 unique packets with the same 140 (source, destination, protocol)-tuple may be active in the Internet 141 at a given time [I-D.ietf-intarea-ipv4-id-update]. 
Due to the 142 escalating deployment of high-speed links (e.g., 1Gbps Ethernet), 143 however, this number may soon become too small by several orders of 144 magnitude for high data rate packet sources such as tunnel endpoints 145 [RFC4963]. Furthermore, there are many well-known limitations 146 pertaining to IPv4 fragmentation and reassembly - even to the point 147 that it has been deemed "harmful" in both classic and modern-day 148 studies (cited above). In particular, IPv4 fragmentation raises 149 issues ranging from minor annoyances (e.g., in-the-network router 150 fragmentation) to the potential for major integrity issues (e.g., 151 mis-association of the fragments of multiple IP packets during 152 reassembly [RFC4963]). 154 As a result of these perceived limitations, a fragmentation-avoiding 155 technique for discovering the MTU of the forward path from a source 156 to a destination node was devised through the deliberations of the 157 Path MTU Discovery Working Group (PMTUDWG) during the late 1980's 158 through early 1990's (see Appendix D). In this method, the source 159 node provides explicit instructions to routers in the path to discard 160 the packet and return an ICMP error message if an MTU restriction is 161 encountered. However, this approach has several serious shortcomings 162 that lead to an overall "brittleness" [RFC2923]. 164 In particular, site border routers in the Internet are being 165 configured more and more to discard ICMP error messages coming from 166 the outside world. This is due in large part to the fact that 167 malicious spoofing of error messages in the Internet is made simple 168 since there is no way to authenticate the source of the messages 169 [I-D.ietf-tcpm-icmp-attacks]. Furthermore, when a source node that 170 requires ICMP error message feedback when a packet is dropped due to 171 an MTU restriction does not receive the messages, a path MTU-related 172 black hole occurs. 
   This means that the source will continue to send
   packets that are too large and never receive an indication from the
   network that they are being discarded.  This behavior has been
   confirmed through documented studies showing clear evidence of path
   MTU discovery failures in the Internet today [TBIT][WAND].

   The issues with both IPv4 fragmentation and this "classical" method
   of path MTU discovery are exacerbated further when IP tunneling is
   used [RFC4459].  For example, ingress tunnel endpoints (ITEs) may be
   required to forward encapsulated packets into the subnetwork on
   behalf of hundreds, thousands, or even more original sources in the
   end site.  If the ITE allows IPv4 fragmentation on the encapsulated
   packets, persistent fragmentation could lead to undetected data
   corruption due to Identification field wrapping.  If the ITE instead
   uses classical IPv4 path MTU discovery, it may be inconvenienced by
   excessive ICMP error messages coming from the subnetwork that may be
   either suspect or contain insufficient information for translation
   into error messages to be returned to the original sources.

   Although recent work has led to the development of a robust
   end-to-end MTU determination scheme [RFC4821], this approach requires
   tunnels to present a consistent MTU the same as for ordinary links on
   the end-to-end path.  Moreover, in current practice existing
   tunneling protocols mask the MTU issues by selecting a "lowest common
   denominator" MTU that may be much smaller than necessary for most
   paths and difficult to change at a later date.  Due to these many
   considerations, a new approach to accommodate tunnels over links with
   diverse MTUs is necessary.

1.2.  Approach

   For the purpose of this document, a subnetwork is defined as a
   virtual topology configured over a connected network routing region
   and bounded by encapsulating border nodes.
Examples include the 206 global Internet interdomain routing core, Mobile Ad hoc Networks 207 (MANETs) and enterprise networks. Subnetwork border nodes forward 208 unicast and multicast packets over the virtual topology across 209 multiple IP and/or sub-IP layer forwarding hops that may introduce 210 packet duplication and/or traverse links with diverse Maximum 211 Transmission Units (MTUs). 213 This document introduces a Subnetwork Encapsulation and Adaptation 214 Layer (SEAL) for tunneling network layer protocols (e.g., IP, OSI, 215 etc.) over IP subnetworks that connect Ingress and Egress Tunnel 216 Endpoints (ITEs/ETEs) of border nodes. It provides a modular 217 specification designed to be tailored to specific associated 218 tunneling protocols. A transport-mode of operation is also possible, 219 and described in Appendix C. SEAL accommodates links with diverse 220 MTUs, protects against off-path denial-of-service attacks, and 221 supports efficient duplicate packet detection through the use of a 222 minimal mid-layer encapsulation. 224 SEAL specifically treats tunnels that traverse the subnetwork as 225 unidirectional links that must support network layer services. As 226 for any link, tunnels that use SEAL must provide suitable networking 227 services including best-effort datagram delivery, integrity and 228 consistent handling of packets of various sizes. As for any link 229 whose media cannot provide suitable services natively, tunnels that 230 use SEAL employ link-level adaptation functions to meet the 231 legitimate expectations of the network layer service. As this is 232 essentially a link level adaptation, SEAL is therefore permitted to 233 alter packets within the subnetwork as long as it restores them to 234 their original form when they exit the subnetwork. The mechanisms 235 described within this document are designed precisely for this 236 purpose. 
238 SEAL encapsulation introduces an extended Identification field for 239 packet identification and a mid-layer segmentation and reassembly 240 capability that allows simplified cutting and pasting of packets. 241 Moreover, SEAL senses in-the-network fragmentation as a "noise" 242 indication that packet sizing parameters are "out of tune" with 243 respect to the network path. As a result, SEAL can naturally tune 244 its packet sizing parameters to eliminate the in-the-network 245 fragmentation. This approach is in contrast to existing tunneling 246 protocol practices which seek to avoid MTU issues by selecting a 247 "lowest common denominator" MTU that may be overly conservative for 248 many tunnels and difficult to change even when larger MTUs become 249 available. 251 The following sections provide the SEAL normative specifications, 252 while the appendices present non-normative additional considerations. 254 2. Terminology and Requirements 256 The following terms are defined within the scope of this document: 258 subnetwork 259 a virtual topology configured over a connected network routing 260 region and bounded by encapsulating border nodes. 262 Ingress Tunnel Endpoint 263 a virtual interface over which an encapsulating border node (host 264 or router) sends encapsulated packets into the subnetwork. 266 Egress Tunnel Endpoint 267 a virtual interface over which an encapsulating border node (host 268 or router) receives encapsulated packets from the subnetwork. 270 inner packet 271 an unencapsulated network layer protocol packet (e.g., IPv6 272 [RFC2460], IPv4 [RFC0791], OSI/CLNP [RFC1070], etc.) before any 273 mid-layer or outer encapsulations are added. Internet protocol 274 numbers that identify inner packets are found in the IANA Internet 275 Protocol registry [RFC3232]. 277 mid-layer packet 278 a packet resulting from adding mid-layer encapsulating headers to 279 an inner packet. 
   outer IP packet
      a packet resulting from adding an outer IP header to a mid-layer
      packet.

   packet-in-error
      the leading portion of an invoking data packet encapsulated in the
      body of an error control message (e.g., an ICMPv4 [RFC0792] error
      message, an ICMPv6 [RFC4443] error message, etc.).

   IP, IPvX, IPvY
      used to generically refer to either IP protocol version, i.e.,
      IPv4 or IPv6.

   The following abbreviations correspond to terms used within this
   document and elsewhere in common Internetworking nomenclature:

      DF - the IPv4 header "Don't Fragment" flag [RFC0791]

      ETE - Egress Tunnel Endpoint

      HLEN - the sum of MHLEN and OHLEN

      ITE - Ingress Tunnel Endpoint

      MHLEN - the length of any mid-layer headers and trailers

      OHLEN - the length of the outer encapsulating headers and
      trailers, including the outer IP header, the SEAL header and any
      other outer headers and trailers.

      PTB - a Packet Too Big message recognized by the inner network
      layer, e.g., an ICMPv6 "Packet Too Big" message [RFC4443], an
      ICMPv4 "Fragmentation Needed" message [RFC0792], etc.

      S_MRU - the SEAL Maximum Reassembly Unit

      S_MSS - the SEAL Maximum Segment Size

      SCMP - the SEAL Control Message Protocol

      SEAL_ID - an Identification value, randomly initialized and
      monotonically incremented for each SEAL protocol packet

      SEAL_PORT - a TCP/UDP service port number used for SEAL

      SEAL_PROTO - an IPv4 protocol number used for SEAL

      TE - Tunnel Endpoint (i.e., either ingress or egress)

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  When used
   in lower case (e.g., must, must not, etc.), these words MUST NOT be
   interpreted as described in [RFC2119], but are rather interpreted as
   they would be in common English.

3.  Applicability Statement

   SEAL was originally motivated by the specific case of subnetwork
   abstraction for Mobile Ad hoc Networks (MANETs); however, it soon
   became apparent that the domain of applicability also extends to
   subnetwork abstractions of enterprise networks, ISP networks, SOHO
   networks, the interdomain routing core, and any other networking
   scenario involving IP encapsulation.  SEAL and its associated
   technologies (including Virtual Enterprise Traversal (VET)
   [I-D.templin-intarea-vet]) are functional building blocks for a new
   Internetworking architecture based on Routing and Addressing in
   Networks with Global Enterprise Recursion (RANGER)
   [RFC5720][I-D.russert-rangers] and the Internet Routing Overlay
   Network (IRON) [I-D.templin-iron].

   SEAL provides a network sublayer for encapsulation of an inner
   network layer packet within outer encapsulating headers.  For
   example, for IPvX in IPvY encapsulation (e.g., as IPv4/SEAL/IPv6),
   the SEAL header appears as a subnetwork encapsulation as seen by the
   inner IP layer.  SEAL can also be used as a sublayer within a UDP
   data payload (e.g., as IPv4/UDP/SEAL/IPv6 similar to Teredo
   [RFC4380]), where UDP encapsulation is typically used for operation
   over subnetworks that give preferential treatment to the "core"
   Internet protocols (i.e., TCP and UDP).  The SEAL header is processed
   the same as for IPv6 extension headers, i.e., it is not part of the
   outer IP header but rather allows for the creation of an arbitrarily
   extensible chain of headers in the same way that IPv6 does.

   SEAL supports a segmentation and reassembly capability for adapting
   the network layer to the underlying subnetwork characteristics, where
   the Egress Tunnel Endpoint (ETE) determines how much or how little
   reassembly it is willing to support.
   In the limiting case, the ETE
   acts as a passive observer that simply informs the Ingress Tunnel
   Endpoint (ITE) of any MTU limitations and otherwise discards all
   packets that arrive as multiple fragments.  This mode is useful for
   determining an appropriate MTU for tunnels between
   performance-critical routers connected to high data rate subnetworks
   such as the Internet DFZ, as well as for other uses in which
   reassembly would present too great a burden for the routers or end
   systems.

   When the ETE supports reassembly, the tunnel can be used to transport
   packets that are too large to traverse the path without
   fragmentation.  In this mode, the ITE determines the tunnel MTU based
   on the largest packet the ETE is capable of reassembling rather than
   on the MTU of the smallest link in the path.  Therefore, tunnel
   endpoints that use SEAL can transport packets that are much larger
   than the underlying subnetwork links themselves can carry in a single
   piece.

   SEAL tunnels may be configured over paths that include not only
   ordinary physical links, but also virtual links that may include
   other SEAL tunnels.  An example application would be linking two
   geographically remote supercomputer centers with large MTU links by
   configuring a SEAL tunnel across the Internet.  A second example
   would be support for sub-IP segmentation over low-end links,
   especially over wireless transmission media such as IEEE 802.15.4,
   broadcast radio links in Mobile Ad-hoc Networks (MANETs), Very High
   Frequency (VHF) civil aviation data links, etc.

   Many other use case examples are anticipated, and will be identified
   as further experience is gained.

4.  SEAL Protocol Specification

   The following sections specify the operation of the SEAL protocol.

4.1.  Model of Operation

   SEAL is an encapsulation sublayer that supports a multi-level
   segmentation and reassembly capability for the transmission of
   unicast and multicast packets across an underlying IP subnetwork with
   heterogeneous links.  First, the ITE can use IPv4 fragmentation to
   fragment inner IPv4 packets before SEAL encapsulation if necessary.
   Second, the SEAL layer itself provides a simple cutting-and-pasting
   capability for mid-layer packets to avoid IP fragmentation on the
   outer packet.  Finally, ordinary IP fragmentation is permitted on the
   outer packet after SEAL encapsulation and is used to detect and tune
   out any in-the-network fragmentation.

   SEAL-enabled ITEs encapsulate each inner packet in any mid-layer
   headers and trailers, segment the resulting mid-layer packet into
   multiple segments if necessary, then append a SEAL header and any
   outer encapsulations to each segment.  As an example, for
   IPv6-in-IPv4 encapsulation a single-segment inner IPv6 packet
   encapsulated in any mid-layer headers and trailers, followed by the
   SEAL header, followed by any outer headers and trailers, followed by
   an outer IPv4 header would appear as shown in Figure 1:

                                      +--------------------+
                                      ~ outer IPv4 header  ~
                                      +--------------------+
   I                                  ~  other outer hdrs  ~
   n                                  +--------------------+
   n                                  ~    SEAL Header     ~
   e       +--------------------+     +--------------------+
   r       ~ mid-layer headers  ~     ~ mid-layer headers  ~
           +--------------------+     +--------------------+
   I -->   |                    | --> |                    |
   P -->   ~     inner IPv6     ~ --> ~     inner IPv6     ~
   v -->   ~       Packet       ~ --> ~       Packet       ~
   6 -->   |                    | --> |                    |
           +--------------------+     +--------------------+
   P       ~ mid-layer trailers ~     ~ mid-layer trailers ~
   a       +--------------------+     +--------------------+
   c                                  ~   outer trailers   ~
   k         Mid-layer packet         +--------------------+
   e       after mid-layer encaps.
   t                                    Outer IPv4 packet
                                   after SEAL and outer encaps.

               Figure 1: SEAL Encapsulation - Single Segment

   As a second example, for IPv4-in-IPv6 encapsulation an inner IPv4
   packet requiring three SEAL segments would appear as three separate
   outer IPv6 packets, where the mid-layer headers are carried only in
   segment 0 and the mid-layer trailers are carried in segment 2 as
   shown in Figure 2:

   +------------------+                        +------------------+
   ~  outer IPv6 hdr  ~                        ~  outer IPv6 hdr  ~
   +------------------+  +------------------+  +------------------+
   ~ other outer hdrs ~  ~  outer IPv6 hdr  ~  ~ other outer hdrs ~
   +------------------+  +------------------+  +------------------+
   ~ SEAL hdr (SEG=0) ~  ~ other outer hdrs ~  ~ SEAL hdr (SEG=2) ~
   +------------------+  +------------------+  +------------------+
   ~  mid-layer hdrs  ~  ~ SEAL hdr (SEG=1) ~  |    inner IPv4    |
   +------------------+  +------------------+  ~      Packet      ~
   |    inner IPv4    |  |    inner IPv4    |  |   (Segment 2)    |
   ~      Packet      ~  ~      Packet      ~  +------------------+
   |   (Segment 0)    |  |   (Segment 1)    |  ~ mid-layer trails ~
   +------------------+  +------------------+  +------------------+
   ~  outer trailers  ~  ~  outer trailers  ~  ~  outer trailers  ~
   +------------------+  +------------------+  +------------------+

   Segment 0 (includes   Segment 1 (no mid-    Segment 2 (includes
     mid-layer hdrs)        layer encaps)       mid-layer trails)

             Figure 2: SEAL Encapsulation - Multiple Segments

   The SEAL header itself is inserted according to the specific
   tunneling protocol.  Examples include the following:

   o  For simple encapsulation of an inner network layer packet within
      an outer IPvX header (e.g., [RFC1070][RFC2003][RFC2473][RFC4213],
      etc.), the SEAL header is inserted between the inner packet and
      outer IPvX headers as: IPvX/SEAL/{inner packet}.

   o  For encapsulations over transports such as UDP (e.g., [RFC4380]),
      the SEAL header is inserted between the outer transport layer
      header and the mid-layer packet, e.g., as IPvX/UDP/SEAL/{mid-layer
      packet}.  Here, the UDP header is seen as an "other outer header".

   SEAL-encapsulated packets include a SEAL_ID to uniquely identify each
   packet.  Routers within the subnetwork use the SEAL_ID for duplicate
   packet detection, and TEs use the SEAL_ID for SEAL segmentation/
   reassembly and protection against off-path attacks.  The following
   sections specify the SEAL header format and SEAL-related operations
   of the ITE and ETE, respectively.

4.2.  SEAL Header Format

   The SEAL header is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |VER|A|R|D|F|M|Z|  NEXTHDR/SEG  |    SEAL_ID (bits 47 - 32)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SEAL_ID (bits 31 - 0)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 3: SEAL Header Format

   where the header fields are defined as:

   VER (2)
      a 2-bit version field.  This document specifies Version 0 of the
      SEAL protocol, i.e., the VER field encodes the value 0.

   A (1)
      the "Acknowledgement Requested" bit.  Set to 1 by the ITE in data
      packets if it wishes to receive an explicit acknowledgement from
      the ETE.

   R (1)
      the "Redirect" bit.  Set to 0 unless otherwise specified in other
      documents.

   D (1)
      the "Discard" bit.  Set to 0 unless otherwise specified in other
      documents.

   F (1)
      the "First Segment" bit.  Set to 1 if this SEAL protocol packet
      contains the first segment (i.e., Segment #0) of a mid-layer
      packet.

   M (1)
      the "More Segments" bit.  Set to 1 if this SEAL protocol packet
      contains a non-final segment of a multi-segment mid-layer packet.
538 Z (1) 539 a 1-bit "Reserved" field. Set to zero by the ITE and ignored by 540 the ETE. 542 NEXTHDR/SEG (8) an 8-bit field. When 'F'=1, encodes the next header 543 Internet Protocol number the same as for the IPv4 protocol and 544 IPv6 next header fields. When 'F'=0, encodes a segment number of 545 a multi-segment mid-layer packet. (The segment number 0 is 546 reserved.) 548 SEAL_ID (48) 549 a 48-bit Identification field. 551 Setting of the various bits and fields of the SEAL header is 552 specified in the following sections. 554 4.3. ITE Specification 556 4.3.1. Tunnel Interface MTU 558 The ITE configures a tunnel virtual interface over one or more 559 underlying links that connect the border node to the subnetwork. The 560 tunnel interface must present a fixed MTU to Layer 3 as the size for 561 admission of inner packets into the tunnel. Since the tunnel 562 interface may support a large set of ETEs that accept widely varying 563 maximum packet sizes, however, a number of factors should be taken 564 into consideration when selecting a tunnel interface MTU. 566 Due to the ubiquitous deployment of standard Ethernet and similar 567 networking gear, the nominal Internet cell size has become 1500 568 bytes; this is the de facto size that end systems have come to expect 569 will either be delivered by the network without loss due to an MTU 570 restriction on the path or a suitable ICMP Packet Too Big (PTB) 571 message returned. When the 1500 byte packets sent by end systems 572 incur additional encapsulation at an ITE, however, they may be 573 dropped silently since the network may not always deliver the 574 necessary PTBs [RFC2923]. 576 The ITE should therefore set a tunnel virtual interface MTU of at 577 least 1500 bytes plus extra room to accommodate any additional 578 encapsulations that may occur on the path from the original source. 
579 The ITE can set larger MTU values still, but should select a value 580 that is not so large as to cause excessive PTBs coming from within 581 the tunnel interface. The ITE can also set smaller MTU values; 582 however, care must be taken not to set so small a value that original 583 sources would experience an MTU underflow. In particular, IPv6 584 sources must see a minimum path MTU of 1280 bytes, and IPv4 sources 585 should see a minimum path MTU of 576 bytes. 587 The ITE can alternatively set an indefinite MTU on the tunnel virtual 588 interface such that all inner packets are admitted into the interface 589 without regard to size. For ITEs that host applications, this option 590 must be carefully coordinated with protocol stack upper layers, since 591 some upper layer protocols (e.g., TCP) derive their packet sizing 592 parameters from the MTU of the outgoing interface and as such may 593 select too large an initial size. This is not a problem for upper 594 layers that use conservative initial maximum segment size estimates 595 and/or when the tunnel interface can reduce the upper layer's maximum 596 segment size (e.g., the size advertised in the TCP MSS option) based 597 on the per-neighbor MTU. 599 The inner network layer protocol consults the tunnel interface MTU 600 when admitting a packet into the interface. For inner IPv4 packets 601 with the IPv4 Don't Fragment (DF) bit set to 0, if the packet is 602 larger than the tunnel interface MTU the inner IPv4 layer uses IPv4 603 fragmentation to break the packet into fragments no larger than the 604 tunnel interface MTU. The ITE then admits each fragment into the 605 tunnel as an independent packet. 607 For all other inner packets, the ITE admits the packet if it is no 608 larger than the tunnel interface MTU; otherwise, it drops the packet 609 and sends a PTB error message to the source with the MTU value set to 610 the tunnel interface MTU. 
The message must contain as much of the 611 invoking packet as possible without the entire message exceeding the 612 network layer minimum MTU (e.g., 576 bytes for IPv4, 1280 bytes for 613 IPv6, etc.). 615 Note that when the tunnel interface sets an indefinite MTU the ITE 616 unconditionally admits all packets into the interface without 617 fragmentation. In light of the above considerations, it is 618 RECOMMENDED that the ITE configure an indefinite MTU on the tunnel 619 virtual interface and adapt to any per-neighbor MTU limitations 620 within the tunnel virtual interface as described in the following 621 sections. 623 4.3.2. Tunnel Interface Soft State 625 For each ETE, the ITE maintains soft state within the tunnel 626 interface (e.g., in a neighbor cache) used to support inner 627 fragmentation and SEAL segmentation for packets admitted into the 628 tunnel interface. The soft state includes the following: 630 o a Mid-layer Header Length (MHLEN); set to the length of any mid- 631 layer encapsulation headers and trailers that must be added before 632 SEAL segmentation. 634 o an Outer Header Length (OHLEN); set to the length of the outer IP, 635 SEAL and other outer encapsulation headers and trailers. 637 o a total Header Length (HLEN); set to MHLEN plus OHLEN. 639 o a SEAL Maximum Segment Size (S_MSS). The ITE initializes S_MSS to 640 the underlying interface MTU if the underlying interface MTU can 641 be determined (otherwise, the ITE initializes S_MSS to 642 "infinity"). The ITE decreases or increases S_MSS based on any 643 SCMP "MTU Report" messages received (see Section 4.5). 645 o a SEAL Maximum Reassembly Unit (S_MRU). The ITE initializes S_MRU 646 to "infinity" and decreases or increases S_MRU based on any SCMP 647 MTU Report messages received (see Section 4.5). When 648 (S_MRU>(S_MSS*256)), the ITE uses (S_MSS*256) as the effective 649 S_MRU value.
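The per-ETE soft state listed above can be pictured as a small record. The following is only an illustrative sketch: the class name EteSoftState, the field names, and the use of a floating-point infinity to stand in for "infinity" are assumptions made for this example and are not defined by this document.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf  # stand-in for the "infinity" initial value (assumption)


@dataclass
class EteSoftState:
    """Hypothetical per-ETE neighbor cache entry kept by the ITE."""
    mhlen: int                 # mid-layer header/trailer length
    ohlen: int                 # outer IP, SEAL and other outer header/trailer length
    s_mss: float = INFINITY    # SEAL Maximum Segment Size
    s_mru: float = INFINITY    # SEAL Maximum Reassembly Unit

    @property
    def hlen(self) -> int:
        # HLEN is defined as MHLEN plus OHLEN.
        return self.mhlen + self.ohlen

    @property
    def effective_s_mru(self) -> float:
        # When S_MRU > (S_MSS * 256), (S_MSS * 256) is used instead.
        return min(self.s_mru, self.s_mss * 256)
```

For example, an entry with OHLEN=28, no mid-layer headers and S_MSS=1500 has an effective S_MRU of 384000 until a smaller S_MRU is learned from the ETE.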
651 Note that S_MSS and S_MRU include the length of the outer and mid- 652 layer encapsulating headers and trailers (i.e., HLEN), since the ETE 653 must retain the headers and trailers during reassembly. Note also 654 that the ITE maintains S_MSS and S_MRU as 32-bit values such that 655 inner packets larger than 64KB (e.g., IPv6 jumbograms [RFC2675]) can 656 be accommodated when appropriate for a given subnetwork. 658 4.3.3. Admitting Packets into the Tunnel 660 After the ITE admits an inner packet/fragment into the tunnel 661 interface, it uses the following algorithm to determine whether the 662 packet can be accommodated and (if so) whether (further) inner IP 663 fragmentation is needed: 665 o if the inner packet is unfragmentable (e.g., an IPv6 packet, an 666 IPv4 packet with DF=1, etc.), and the packet is larger than 667 (MAX(S_MRU, S_MSS) - HLEN), the ITE drops the packet and sends a 668 PTB message to the original source with an MTU value of 669 (MAX(S_MRU, S_MSS) - HLEN); else, 671 o if the inner packet is fragmentable (e.g., an IPv4 packet with 672 DF=0), and the packet is larger than (foo) bytes, the ITE uses 673 inner fragmentation to break the packet into fragments no larger 674 than (foo) bytes; else, 676 o the ITE processes the packet without inner fragmentation. 678 In the above, the ITE must track whether the tunnel interface is 679 using header compression. If so, the ITE must include the length of 680 the uncompressed headers and trailers when calculating HLEN. Note 681 also in the above that the ITE is permitted to admit inner packets 682 into the tunnel that can be accommodated in a single SEAL segment 683 (i.e., no larger than S_MSS) even if they are larger than the ETE 684 would be willing to reassemble if fragmented (i.e., larger than 685 S_MRU) - see: Section 4.4.1. 
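The three-way decision above can be sketched as a single function. This is a minimal illustration, not a normative algorithm: the function name admit, its return values, and its parameter names are hypothetical, and the default of 1280 for (foo) is the value this section recommends when no better information is available.

```python
def admit(packet_len, fragmentable, s_mss, s_mru, hlen, foo=1280):
    """Return (action, size) for an inner packet admitted into the tunnel.

    action is one of:
      "drop_ptb"  - drop and send a PTB to the source with MTU = size
      "fragment"  - use inner fragmentation with fragments no larger than size
      "forward"   - process the packet without inner fragmentation
    """
    limit = max(s_mru, s_mss) - hlen
    if not fragmentable and packet_len > limit:
        # e.g., an IPv6 packet or an IPv4 packet with DF=1 that is too large
        return ("drop_ptb", limit)
    if fragmentable and packet_len > foo:
        # e.g., an IPv4 packet with DF=0 that is larger than (foo) bytes
        return ("fragment", foo)
    return ("forward", None)
```

With S_MSS=1500, S_MRU=2048 and HLEN=48, an unfragmentable 1500-byte packet is forwarded, while an unfragmentable 3000-byte packet is dropped with a PTB advertising 2000 bytes.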
687 When the ITE uses inner fragmentation, it should use a "safe" 688 fragment size of (foo) bytes that would be highly unlikely to incur 689 an outer IP MTU restriction within the tunnel. If the ITE can 690 determine a larger fragment size (e.g., via probing), it should use 691 the larger size for inner fragmentation. In the absence of 692 deterministic information, it is RECOMMENDED that the ITE set (foo) 693 to 1280. 695 4.3.4. Mid-Layer Encapsulation 697 After inner IP fragmentation (if necessary), the ITE next 698 encapsulates each inner packet/fragment in the MHLEN bytes of mid- 699 layer headers and trailers. The ITE then presents the mid-layer 700 packet for SEAL segmentation and outer encapsulation. 702 4.3.5. SEAL Segmentation 704 After mid-layer encapsulation, if the length of the resulting mid- 705 layer packet plus OHLEN is greater than S_MSS the ITE must 706 additionally perform SEAL segmentation. To do so, it breaks the mid- 707 layer packet into N segments (N <= 256) that are no larger than 708 (S_MSS - OHLEN) bytes each. Each segment, except the final one, MUST 709 be of equal length. The first byte of each segment MUST begin 710 immediately after the final byte of the previous segment, i.e., the 711 segments MUST NOT overlap. The ITE SHOULD generate the smallest 712 number of segments possible, e.g., it SHOULD NOT generate 6 smaller 713 segments when the packet could be accommodated with 4 larger 714 segments. 716 Note that this SEAL segmentation ignores the fact that the mid-layer 717 packet may be unfragmentable outside of the subnetwork. This 718 segmentation process is a mid-layer (not an IP layer) operation 719 employed by the ITE to adapt the mid-layer packet to the subnetwork 720 path characteristics, and the ETE will restore the packet to its 721 original form during reassembly. Therefore, the fact that the packet 722 may have been segmented within the subnetwork is not observable 723 outside of the subnetwork. 725 4.3.6. 
Outer Encapsulation 727 Following SEAL segmentation, the ITE next encapsulates each segment 728 in a SEAL header formatted as specified in Section 4.2. For the 729 first segment, the ITE sets F=1, then sets NEXTHDR to the Internet 730 Protocol number of the encapsulated inner packet, and finally sets 731 M=1 if there are more segments or sets M=0 otherwise. For each non- 732 initial segment of an N-segment mid-layer packet (N <= 256), the ITE 733 sets (F=0; M=1; SEG=1) in the SEAL header of the first non-initial 734 segment, sets (F=0; M=1; SEG=2) in the next non-initial segment, 735 etc., and sets (F=0; M=0; SEG=N-1) in the final segment. (Note that 736 the value SEG=0 is not used, since the initial segment encodes a 737 NEXTHDR value and not a SEG value.) 739 The ITE next encapsulates each segment in the requisite outer headers 740 and trailers according to the specific encapsulation format (e.g., 742 [RFC1070], [RFC2003], [RFC2473], [RFC4213], etc.), except that it 743 writes 'SEAL_PROTO' in the protocol field of the outer IP header 744 (when simple IP encapsulation is used) or writes 'SEAL_PORT' in the 745 outer destination service port field (e.g., when IP/UDP encapsulation 746 is used). The ITE finally sets A=1 if probing is necessary as 747 specified in Section 4.3.7, sets the packet identification values as 748 specified in Section 4.3.8 and sends the packets as specified in 749 Section 4.3.9. 751 4.3.7. Probing Strategy 753 All SEAL encapsulated packets sent by the ITE are considered implicit 754 probes. SEAL encapsulated packets that use IPv4 as the outer layer 755 of encapsulation will elicit SCMP PTB messages from the ETE (see: 756 Section 4.5) if any IPv4 fragmentation occurs in the path. SEAL 757 encapsulated packets that use IPv6 as the outer layer of 758 encapsulation may be dropped by an IPv6 router on the path to the ETE 759 which will also return an ICMPv6 PTB message to the ITE. 
The ITE can 760 then use the SEAL_ID within the packet-in-error to determine whether 761 the PTB message corresponds to one of its recent packet 762 transmissions. 764 The ITE should also send explicit probes, periodically, to verify 765 that the ETE is still reachable. The ITE sets A=1 in the SEAL header 766 of a segment to be used as an explicit probe, where the probe can be 767 either an ordinary data packet or a NULL packet created by setting 768 the NEXTHDR field to a value of "No Next Header" (see Section 4.7 of 769 [RFC2460]). The probe will elicit a solicited SCMP Neighbor 770 Advertisement (NA) message from the ETE as an acknowledgement (see 771 Section 4.5.1). 773 Finally, the ITE MAY send "expendable" outer IP probe packets (see 774 Section 4.3.9) as explicit probes in order to detect increases in the 775 path MTU to the ETE. One possible strategy is to send expendable 776 packets with A=1 in the SEAL header and DF=1 in the IP header. In 777 all cases, the ITE MUST be conservative in its use of the A bit in 778 order to limit the resultant control message overhead. 780 4.3.8. Packet Identification 782 The ITE maintains a randomly-initialized SEAL_ID value as per-ETE 783 soft state (e.g., in the neighbor cache) and monotonically increments 784 it for each successive SEAL protocol packet it sends to the ETE. For 785 each successive SEAL protocol packet, the ITE writes the current 786 SEAL_ID value into the header field of the same name in the SEAL 787 header. 789 4.3.9. Sending SEAL Protocol Packets 791 Following SEAL segmentation and encapsulation, the ITE sets DF=0 in 792 the header of each outer IPv4 packet to ensure that they will be 793 delivered to the ETE even if they are fragmented within the 794 subnetwork. (The ITE can instead set DF=1 for "expendable" outer 795 IPv4 packets (e.g., for NULL packets used as probes -- see Section 796 4.3.7), but these may be lost due to an MTU restriction). 
For outer 797 IPv6 packets, the "DF" bit is always implicitly set to 1; hence, they 798 will not be fragmented within the subnetwork. 800 The ITE sends each outer packet that encapsulates a segment of the 801 same mid-layer packet into the tunnel in canonical order, i.e., 802 segment 0 first, followed by segment 1, etc., and finally segment 803 N-1. 805 4.3.10. Processing Raw ICMP Messages 807 The ITE may receive "raw" ICMP error messages [RFC0792][RFC4443] from 808 either the ETE or routers within the subnetwork that comprise an 809 outer IP header, followed by an ICMP header, followed by a portion of 810 the SEAL packet that generated the error (also known as the "packet- 811 in-error"). The ITE can use the SEAL_ID encoded in the packet-in- 812 error as a nonce to confirm that the ICMP message came from either 813 the ETE or an on-path router, and can use any additional information 814 to determine whether to accept or discard the message. 816 The ITE should specifically process raw ICMPv4 Protocol Unreachable 817 messages and ICMPv6 Parameter Problem messages with Code 818 "Unrecognized Next Header type encountered" as a hint that the ETE 819 does not implement the SEAL protocol; specific actions that the ITE 820 may take in this case are out of scope. 822 4.4. ETE Specification 824 4.4.1. Reassembly Buffer Requirements 826 The ETE SHOULD support IP-layer and SEAL-layer reassembly for inner 827 packets of at least 1280 bytes in length and MAY support reassembly 828 for larger inner packets; the ETE may instead support only a minimum- 829 sized reassembly buffer, but this may cause MTU underruns in some 830 environments. The ETE must retain the outer IP, SEAL and other outer 831 headers and trailers during both IP-layer and SEAL-layer reassembly 832 for the purpose of associating the fragments/segments of the same 833 packet, and must also configure a SEAL-layer reassembly buffer that 834 is no smaller than the IP-layer reassembly buffer. 
Hence, the ETE: 836 o SHOULD configure an outer IP-layer reassembly buffer size of at 837 least (1280 + HLEN) bytes. 839 o MUST configure a SEAL-layer reassembly buffer size (i.e., S_MRU) 840 that is no smaller than the IP-layer reassembly buffer size. 842 o MUST be capable of discarding inner packets that require IP-layer 843 or SEAL-layer reassembly and that are larger than (S_MRU - HLEN). 845 The ETE can maintain S_MRU either as a single value to be applied for 846 all ITEs, or as a per-ITE value. In the latter case, the ETE can manage 847 each per-ITE S_MRU value separately (e.g., to reduce congestion 848 caused by excessive segmentation from specific ITEs) but should seek 849 to maintain as stable a value as possible for each ITE. 851 Note that the ETE is permitted to accept inner packets that did not 852 undergo IP-layer and/or SEAL-layer reassembly even if they are larger 853 than (S_MRU - HLEN) bytes. Hence, S_MRU is a maximum *reassembly* 854 size, and may be less than the size the ETE is able to receive without 855 reassembly. 857 4.4.2. IP-Layer Reassembly 859 The ETE submits unfragmented SEAL protocol IP packets for SEAL-layer 860 reassembly as specified in Section 4.4.3. The ETE instead performs 861 standard IP-layer reassembly for multi-fragment SEAL protocol IP 862 packets as follows. 864 The ETE should maintain conservative IP-layer reassembly cache high- 865 and low-water marks. When the size of the reassembly cache exceeds 866 the high-water mark, the ETE should actively discard incomplete 867 reassemblies (e.g., using an Active Queue Management (AQM) strategy) 868 until the size falls below the low-water mark. The ETE should also 869 actively discard any pending reassemblies that clearly have no 870 opportunity for completion, e.g., when a considerable number of new 871 fragments have been received before a fragment that completes a 872 pending reassembly has arrived.
Following successful IP-layer 873 reassembly, the ETE submits the reassembled packet for SEAL-layer 874 reassembly as specified in Section 4.4.3. 876 When the ETE processes the IP first fragment (i.e., one with MF=1 and 877 Offset=0 in the IP header) of a fragmented SEAL packet, it sends an 878 SCMP PTB message back to the ITE (see Section 4.5.1). When the ETE 879 processes an IP fragment that would cause the reassembled outer 880 packet to be larger than the IP-layer reassembly buffer following 881 reassembly, it discontinues the reassembly and discards any further 882 fragments of the same packet. 884 4.4.3. SEAL-Layer Reassembly 886 Following IP reassembly (if necessary), if the mid-layer packet has 887 an incorrect value in the SEAL header the ETE discards the packet and 888 returns an SCMP "Parameter Problem" message (see Section 4.5.1). 889 Next, if the SEAL header has A=1, the ETE sends a solicited SCMP 890 Neighbor Advertisement (NA) message back to the ITE (see Section 891 4.5.1). The ETE next submits single-segment mid-layer packets for 892 decapsulation and delivery to upper layers as specified in Section 893 4.4.4. The ETE instead performs SEAL-layer reassembly for multi- 894 segment mid-layer packets as follows. 896 The ETE adds each segment of a multi-segment mid-layer packet to a 897 SEAL-layer pending-reassembly queue according to the (Source, 898 Destination, SEAL_ID)-tuple found in the outer IP and SEAL headers. 899 The ETE performs SEAL-layer reassembly through simple in-order 900 concatenation of the encapsulated segments of the same mid-layer 901 packet from N consecutive SEAL segments. SEAL-layer reassembly 902 requires the ETE to maintain a cache of recently received segments 903 for a hold time that would allow for nominal inter-segment delays. 904 When a SEAL reassembly times out, the ETE discards the incomplete 905 reassembly and returns an SCMP "Time Exceeded" message to the ITE 906 (see Section 4.5.1). 
As for IP-layer reassembly, the ETE should also 907 maintain a conservative reassembly cache high- and low-water mark and 908 should actively discard any pending reassemblies that clearly have no 909 opportunity for completion, e.g., when a considerable number of new 910 SEAL packets have been received before a packet that completes a 911 pending reassembly has arrived. 913 If the ETE receives a SEAL packet for which a segment with the same 914 (Source, Destination, SEAL_ID)-tuple is already in the queue, it must 915 determine whether to accept the new segment and release the old, or 916 drop the new segment. If accepting the new segment would cause an 917 inconsistency with other segments already in the queue (e.g., 918 differing segment lengths), the ETE drops the segment that is least 919 likely to complete the reassembly. If the ETE accepts a new SEAL 920 segment that would cause the reassembled outer packet to be larger 921 than S_MRU following reassembly, it schedules the reassembly 922 resources for garbage collection and sends an SCMP PTB message back 923 to the ITE (see Section 4.5.1). 925 After all segments are gathered, the ETE reassembles the packet by 926 concatenating the segments encapsulated in the N consecutive SEAL 927 packets beginning with the initial segment (i.e., SEG=0) and followed 928 by any non-initial segments 1 through N-1. That is, for an N-segment 929 mid-layer packet, reassembly entails the concatenation of the SEAL- 930 encapsulated packet segments with (F=1, M=1, SEAL_ID=j) in the first 931 SEAL header, followed by (F=0, M=1, SEG=1, SEAL_ID=(j+1)) in the next 932 SEAL header, followed by (F=0, M=1, SEG=2, SEAL_ID=(j+2)), etc., up 933 to (F=0, M=0, SEG=(N-1), SEAL_ID=(j + N-1)) in the final SEAL header. 934 (Note that modulo arithmetic based on the length of the SEAL_ID field 935 is used). 
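The header sequence described above can be illustrated with a round trip through segmentation and reassembly. This is a simplified sketch under stated assumptions: SEAL headers are modeled as Python dictionaries rather than the wire format of Section 4.2, the function names seal_segment and seal_reassemble are hypothetical, and duplicate handling, timeouts and S_MRU checks are omitted.

```python
SEAL_ID_MOD = 1 << 48  # SEAL_ID is a 48-bit field


def seal_segment(mid_packet, s_mss, ohlen, nexthdr, first_id):
    """Split a mid-layer packet into (header, payload) SEAL segments."""
    max_seg = s_mss - ohlen
    if max_seg <= 0:
        raise ValueError("S_MSS leaves no room for payload")
    n = max(1, -(-len(mid_packet) // max_seg))  # ceiling division
    if n > 256:
        raise ValueError("packet would need more than 256 segments")
    segments = []
    for i in range(n):
        hdr = {
            "F": 1 if i == 0 else 0,            # first-segment flag
            "M": 1 if i < n - 1 else 0,         # more-segments flag
            "SEAL_ID": (first_id + i) % SEAL_ID_MOD,
        }
        if i == 0:
            hdr["NEXTHDR"] = nexthdr            # initial segment carries NEXTHDR
        else:
            hdr["SEG"] = i                      # non-initial segments carry SEG
        segments.append((hdr, mid_packet[i * max_seg:(i + 1) * max_seg]))
    return segments


def seal_reassemble(segments):
    """Concatenate segments in order, checking SEAL_ID continuity."""
    ordered = sorted(segments, key=lambda s: 0 if s[0]["F"] else s[0]["SEG"])
    first_id = ordered[0][0]["SEAL_ID"]
    for i, (hdr, _) in enumerate(ordered):
        if hdr["SEAL_ID"] != (first_id + i) % SEAL_ID_MOD:
            raise ValueError("non-consecutive SEAL_ID")
    if ordered[-1][0]["M"] != 0:
        raise ValueError("final segment missing")
    return b"".join(payload for _, payload in ordered)
```

Choosing first_id near the top of the 48-bit range exercises the modulo arithmetic: the SEAL_ID values of consecutive segments wrap from 2^48 - 1 to 0.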
Following successful SEAL-layer reassembly, the ETE 936 submits the reassembled mid-layer packet for decapsulation and 937 delivery to upper layers as specified in Section 4.4.4. 939 4.4.4. Decapsulation and Delivery to Upper Layers 941 Following any necessary IP- and SEAL-layer reassembly, the ETE 942 discards the outer headers and trailers and performs any mid-layer 943 transformations on the mid-layer packet. The ETE next discards the 944 mid-layer headers and trailers, and delivers the inner packet to the 945 upper-layer protocol indicated either in the SEAL NEXTHDR field or 946 the next header field of the mid-layer packet (i.e., if the packet 947 included mid-layer encapsulations). The ETE instead silently 948 discards the inner packet if it was a NULL packet (see Section 949 4.3.9). 951 4.5. The SEAL Control Message Protocol (SCMP) 953 SEAL uses a companion SEAL Control Message Protocol (SCMP) based on 954 the same message format as the Internet Control Message Protocol for 955 IPv6 (ICMPv6) [RFC4443]. SCMP messages are further identified by the 956 NEXTHDR value '58', the same as for ICMPv6 messages; however, the SCMP 957 message is *not* immediately preceded by an inner IPv6 header. 958 Instead, SCMP messages appear immediately following the SEAL header, 959 which allows TEs to differentiate them from ordinary ICMPv6 messages. 960 Unlike ICMPv6 messages, SCMP messages are used only for the purpose 961 of conveying information between TEs, i.e., they are used only for 962 sharing control information within the tunnel and not beyond the 963 tunnel. 965 The following sections specify the generation and processing of SCMP 966 messages: 968 4.5.1. Generating SCMP Messages 970 SCMP messages may be generated by either ITEs or ETEs (i.e., by any 971 TE) using the same message Type and Code values specified for 972 ordinary ICMPv6 messages in [RFC4443].
SCMP can also be used to 973 carry other message types and their associated options as specified 974 in other documents (e.g., [RFC4191][RFC4861]). The general format 975 for SCMP messages is shown in Figure 4: 977 0 1 2 3 978 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 979 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 980 | Type | Code | Checksum | 981 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 982 | | 983 ~ Message Body ~ 984 | | 985 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 986 | As much of invoking SEAL data | 987 ~ packet as possible without the SCMP ~ 988 | packet exceeding 576 bytes (*) | 989 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 991 (*) also known as the "packet-in-error" 993 Figure 4: SCMP Message Format 995 As for ordinary ICMPv6 messages, the SCMP message begins with a 4 996 byte header that includes 8-bit Type and Code fields followed by a 997 16-bit Checksum field followed by a variable-length Message Body. 998 The Message Body is followed by the leading portion of the invoking 999 SEAL data packet (i.e., the "packet-in-error") IFF the packet-in- 1000 error would also be included in the corresponding ICMPv6 message. 1001 The TE sets the Type and Code fields to the same values that would 1002 appear in the corresponding ICMPv6 message and also formats the 1003 message body the same as for the corresponding ICMPv6 message except 1004 as otherwise specified. 1006 If the SCMP message will include a packet-in-error, the TE then 1007 includes as much of the leading portion of the invoking SEAL data 1008 packet as possible beginning with the outer IP header and extending 1009 to a length that would not cause the entire SCMP message following 1010 encapsulation to exceed 576 bytes. 
The ITE finally calculates the 1011 Checksum the same as specified for ICMPv4 messages [RFC0792] and does 1012 not include a pseudo-header of the outer IP header since the SEAL_ID 1013 gives sufficient assurance against mis-delivery. The TE then 1014 encapsulates the SCMP message in the outer headers as shown in 1015 Figure 5: 1017 +--------------------+ 1018 ~ outer IPv4 header ~ 1019 +--------------------+ 1020 ~ other outer hdrs ~ 1021 +--------------------+ 1022 ~ SEAL Header ~ 1023 +--------------------+ +--------------------+ 1024 ~ SCMP message header~ --> ~ SCMP message header~ 1025 +--------------------+ --> +--------------------+ 1026 ~ SCMP message body ~ --> ~ SCMP message body ~ 1027 +--------------------+ --> +--------------------+ 1028 ~ packet-in-error ~ --> ~ packet-in-error ~ 1029 +--------------------+ +--------------------+ 1030 ~ outer trailers ~ 1031 SCMP Message +--------------------+ 1032 before encapsulation 1033 SCMP Message 1034 after encapsulation 1036 Figure 5: SCMP Message Encapsulation 1038 When a TE generates an SCMP message in response to a packet-in-error, 1039 it sets the outer IP destination and source addresses of the SCMP 1040 packet to the packet-in-error's source and destination addresses 1041 (respectively). (If the destination address in the packet-in-error 1042 was multicast, the TE instead sets the outer IP source address of the 1043 SCMP packet to an address assigned to the underlying IP interface.) 1044 When a TE generates an SCMP message that is not due to a packet-in- 1045 error, it sets the outer IP destination and source addresses of the 1046 SCMP packet the same as for ordinary data packets. The TE finally 1047 sets the NEXTHDR field in the SEAL header to the value '58' (i.e., 1048 the official IANA protocol number for the ICMPv6 protocol) and sends 1049 the SCMP message to the tunnel far end. 1051 4.5.1.1. 
Generating SCMP Packet Too Big (PTB) Messages 1053 An ETE generates an SCMP Packet Too Big (PTB) message when it 1054 receives the IP first fragment (i.e., one with MF=1 and Offset=0 in 1055 the outer IP header) of a SEAL protocol packet that arrived as 1056 multiple IP fragments, or when it discontinues reassembly of a SEAL 1057 protocol packet that arrived as multiple IP fragments and/or multiple 1058 SEAL segments and would exceed S_MRU following reassembly. 1060 The ETE prepares an SCMP PTB message the same as for the 1061 corresponding ICMPv6 PTB message, except that it writes the value 0 1062 in the MTU field of the message if the PTB is generated as a result 1063 of receiving an IP first fragment and writes the S_MRU value for this 1064 ITE in the MTU field otherwise. 1066 4.5.1.2. Generating SCMP Neighbor Discovery Messages 1068 An ITE generates an SCMP "Neighbor Solicitation" (NS) or "Router 1069 Solicitation" (RS) message when it needs to solicit a response from 1070 an ETE. An ETE generates a solicited SCMP "Neighbor Advertisement" 1071 (NA) or "Router Advertisement" (RA) message when it receives an NS/RS 1072 message, and also generates a solicited NA message when it receives a 1073 SEAL protocol packet with A=1 in the SEAL header. Any TE may also 1074 generate unsolicited NA/RA messages that are not triggered by a 1075 specific solicitation event. 1077 The TE generates NS/RS and NA/RA messages the same as described for 1078 the corresponding IPv6 Neighbor Discovery (ND) messages (see: 1079 [RFC4861]), except that for solicited NA/RA messages it also includes 1080 a Redirected Header option formatted the same as for an IPv6 ND 1081 Redirect message. The messages may also be used in conjunction with 1082 the tunnel endpoint synchronization procedure specified in Section 1083 4.6. 1085 4.5.1.3. 
Generating Other SCMP Messages 1087 An ETE generates an SCMP "Destination Unreachable - Communication 1088 with Destination Administratively Prohibited" message when it is 1089 operating in synchronized mode and receives a SEAL packet with a 1090 SEAL_ID that is outside of the current window for this ITE (see: 1091 Section 4.6). An ETE also generates an SCMP "Destination 1092 Unreachable" message with an appropriate code under the same 1093 circumstances that an IPv6 system would generate an ICMPv6 1094 Destination Unreachable message using the same code. The SCMP 1095 Destination Unreachable message is formatted the same as for ICMPv6 1096 Destination Unreachable messages. 1098 An ETE generates an SCMP "Parameter Problem" message when it receives 1099 a SEAL packet with an incorrect value in the SEAL header, and 1100 generates an SCMP "Time Exceeded" message when it garbage collects an 1101 incomplete SEAL data packet reassembly. The message formats used are 1102 the same as for the corresponding ICMPv6 messages. 1104 Generation of all other SCMP message types is outside the scope of 1105 this document. 1107 4.5.2. Processing SCMP Messages 1109 An ITE processes any SCMP messages it receives as long as it can 1110 verify that the message was sent from an on-path ETE. The ITE can 1111 verify that the SCMP message came from an on-path ETE by checking 1112 that the SEAL_ID in the encapsulated packet-in-error corresponds to 1113 one of its recently-sent SEAL data packets. 1115 An ITE maintains a window of SEAL_IDs of packets that it has recently 1116 sent to each ETE. For each SCMP message it receives, the ITE first 1117 verifies that the SEAL_ID encoded in the packet-in-error is within 1118 the window of packets that it has recently sent to the ETE. The ITE 1119 then verifies that the Checksum in the SCMP message header is 1120 correct. 
If the SEAL_ID is outside of the window and/or the checksum 1121 is incorrect, the ITE discards the message; otherwise, it processes 1122 the message the same as for ordinary ICMPv6 messages. 1124 Any TE may also receive unsolicited SCMP messages from the tunnel far 1125 end. When the TEs are synchronized, they can also check that the 1126 SEAL_ID in the SEAL header of an SCMP message is within the window of 1127 recently received packets from this tunnel far end (see Section 4.6). 1129 Finally, TEs process SCMP messages as an indication that the tunnel 1130 far end is responsive, i.e., in the same manner implied for IPv6 1131 Neighbor Unreachability Detection "hints of forward progress" (see: 1132 [RFC4861]). 1134 4.5.2.1. Processing SCMP PTB Messages 1136 An ITE may receive an SCMP PTB message after it sends a SEAL data 1137 packet (see: Section 4.5.1). When the ITE receives an SCMP PTB 1138 message, it examines the MTU field in the message. If the MTU field 1139 is non-zero, the PTB was the result of a reassembly buffer 1140 limitation; in that case, the ITE records the value in the MTU field 1141 as the new S_MRU value for this ETE then (optionally) sends a 1142 translated PTB message of the inner network layer protocol to the 1143 original source with MTU set to (MAX(S_MRU, S_MSS) - HLEN). If the 1144 MTU field is zero, however, the PTB was the result of an IP 1145 fragmentation event; in that case, the ITE does not send back a 1146 translated PTB message but determines a new S_MSS value according to 1147 the length recorded in the IP header of the packet-in-error as 1148 follows: 1150 o If the length is no less than 1280, the ITE records the length as 1151 the new S_MSS value. 
1153 o If the length is less than the current S_MSS value and also less 1154 than 1280, the ITE can discern that IP fragmentation is occurring 1155 but it cannot determine the true MTU of the restricting link due 1156 to the possibility that a router on the path is generating runt 1157 first fragments. 1159 In this latter case, the ITE must search for a reduced S_MSS value 1160 through an iterative searching strategy that parallels Section 5 of 1161 [RFC1191]. This searching strategy may require multiple iterations 1162 in which the ITE sends SEAL data packets using a reduced S_MSS and 1163 receives additional SCMP MTU Report messages. During this process, 1164 it is essential that the ITE reduce S_MSS based on the first SCMP MTU 1165 Report message received under the current S_MSS size, and refrain 1166 from further reducing S_MSS until SCMP MTU Report messages pertaining 1167 to packets sent under the new S_MSS are received. 1169 4.5.2.2. Processing SCMP Neighbor Discovery Messages 1171 An ETE may receive NS/RS messages from an ITE as the initial leg 1172 in a neighbor discovery exchange. An ITE may receive both solicited 1173 and unsolicited NA/RA messages from an ETE, where solicited NA/RA 1174 messages are distinguished by their inclusion of a Redirected Header 1175 option (see: Section 4.5.1). 1177 The TE processes NS/RS and NA/RA messages the same as described for 1178 the corresponding IPv6 Neighbor Discovery (ND) messages (see: 1179 [RFC4861]). The messages may also be used in conjunction with the 1180 tunnel endpoint synchronization procedure specified in Section 4.6. 1182 4.5.2.3. Processing Other SCMP Messages 1184 An ITE may receive an SCMP "Destination Unreachable - Communication 1185 with Destination Administratively Prohibited" message after it sends 1186 a SEAL data packet. The ITE processes this message as an indication 1187 that it needs to (re)synchronize with the ETE (see: Section 4.6).
An 1188 ITE may also receive an SCMP "Destination Unreachable" message with 1189 an appropriate code under the same circumstances that an IPv6 host 1190 would receive an ICMPv6 Destination Unreachable message. 1192 An ITE may receive an SCMP "Parameter Problem" message when the ETE 1193 receives a SEAL packet with an incorrect value in the SEAL header. 1194 The ITE should examine the incorrect SEAL header field setting to 1195 determine whether a different setting should be used in subsequent 1196 packets. 1198 An ITE may receive an SCMP "Time Exceeded" message when the ETE 1199 garbage collects an incomplete SEAL data packet reassembly. The ITE 1200 should consider the message as an indication of congestion. 1202 Processing of all other SCMP message types is outside the scope of 1203 this document. 1205 4.6. Tunnel Endpoint Synchronization 1207 The SEAL ITE maintains a per-ETE window of SEAL_IDs of its recently- 1208 sent packets, but by default the SEAL ETE does not retain inter- 1209 packet state. When closer synchronization is required, SEAL Tunnel 1210 Endpoints (TEs) can exchange initial sequence numbers in a procedure 1211 that parallels IPv6 neighbor discovery and the TCP 3-way handshake. 1212 When the TEs are synchronized, the ETE can also maintain a per-ITE 1213 window of SEAL_IDs of its recently-received packets. 1215 When an initiating TE ("TE(A)") needs to synchronize with a new 1216 tunnel far end ("TE(B)"), it first chooses a randomly-initialized 48- 1217 bit SEAL_ID value that it would like TE(B) to use (i.e., 1218 "SEAL_ID(B)"). TE(A) then creates a neighbor cache entry for TE(B) 1219 and records SEAL_ID(B) in the neighbor cache entry. Next, TE(A) 1220 creates an SCMP NS or RS message that includes a Nonce option (see: 1221 [RFC3971], Section 5.3). TE(A) then writes the value SEAL_ID(B) in 1222 the Nonce option, writes the value 0 in the SEAL_ID field of the SEAL 1223 header and sends the NS/RS message to TE(B).
1225 When TE(B) receives an NS/RS message with a Nonce option and with the 1226 value 0 in the SEAL_ID of the SEAL header, it considers the message 1227 as a potential synchronization request. TE(B) first extracts the 1228 value SEAL_ID(B) from the Nonce option, then chooses a randomly- 1229 initialized 48-bit SEAL_ID value that it would like TE(A) to use 1230 (i.e., "SEAL_ID(A)"). TE(B) then stores the tuple (ip_src, 1231 SEAL_ID(A), SEAL_ID(B)) in a minimal temporary fast path data 1232 structure, where "ip_src" is the outer IP source address of the SCMP 1233 message. (For efficiency and security purposes, the data structure 1234 should be indexed, e.g., by a secret hash of the tuple). TE(B) then 1235 creates a solicited SCMP NA or RA message that includes a Nonce 1236 option. It then writes the value SEAL_ID(A) in the Nonce option, 1237 writes the value SEAL_ID(B) in the SEAL_ID field of the SEAL header, 1238 and sends the NA/RA message back to TE(A). 1240 When TE(A) receives the NA/RA, it considers the message as a 1241 potential synchronization acknowledgement. TE(A) first verifies that 1242 the value encoded in the SEAL_ID of the SEAL header matches the 1243 SEAL_ID(B) in the neighbor cache entry. If the values match, TE(A) 1244 extracts SEAL_ID(A) from the Nonce option and records it in the 1245 neighbor cache entry; otherwise, it drops the packet. If instead 1246 TE(A) does not receive a timely NA/RA response, it retransmits the 1247 initial NS/RS message for a total of 3 tries before giving up, the 1248 same as for ordinary IPv6 neighbor discovery. 1250 After TE(A) receives the synchronization acknowledgement, it begins 1251 sending either unsolicited NA/RA messages or ordinary data packets 1252 back to TE(B) using SEAL_ID(A) as the initial sequence number. When 1253 TE(B) receives these packets, it first checks its neighbor cache to 1254 see if there is a matching neighbor cache entry. 
If there is a 1255 neighbor cache entry, and the SEAL_ID in the header of the packet is 1256 within the window of the SEAL_ID recorded in the neighbor cache 1257 entry, TE(B) accepts the packet. If the SEAL_ID in the packet is 1258 newer than the SEAL_ID in the neighbor cache entry, TE(B) also 1259 updates the neighbor cache value. If there is no neighbor cache 1260 entry, TE(B) instead checks the fast path cache to see if the packet 1261 is a match for an in-progress synchronization event. If there is a 1262 fast path cache entry with a SEAL_ID(A) that is within the window of 1263 the SEAL_ID in the packet header, TE(B) accepts the packet and also 1264 creates a new neighbor cache entry with the tuple (ip_src, 1265 SEAL_ID(A), SEAL_ID(B)). If there is no matching fast path cache 1266 entry, TE(B) instead simply discards the packet. 1268 By maintaining the fast path cache, each TE is able to mitigate 1269 buffer exhaustion attacks that may be launched by off-path attackers 1270 [RFC4987]. The TE will receive positive confirmation that the 1271 synchronization request came from an on-path tunnel far end after it 1272 receives a stream of in-window packets as the "third leg" of this 1273 three-way handshake as described above. The TEs should maintain 1274 neighbor cache entries as long as they receive hints of forward 1275 progress from the tunnel far end, but should delete the neighbor 1276 cache entries after a nominal stale time (e.g., 30 seconds). The TEs 1277 should also purge fast-path cache entries for which no window 1278 synchronization messages are received within a nominal stale time 1279 (e.g., 5 seconds). 1281 After synchronization is complete, when a TE receives a SEAL packet 1282 it checks in its neighbor cache to determine whether the SEAL_ID is 1283 within the current window, and discards any packets that are outside 1284 the window. 
Since packets may be lost or reordered, and since SEAL 1285 presents only a best-effort (i.e., not reliable) link model, the 1286 TE should set a coarse-grained window size (e.g., 32768) and accept 1287 any packet with a SEAL_ID that is within the window. 1289 5. Link Requirements 1291 Subnetwork designers are expected to follow the recommendations in 1292 Section 2 of [RFC3819] when configuring link MTUs. 1294 6. End System Requirements 1296 SEAL provides robust mechanisms for returning PTB messages; however, 1297 end systems that send unfragmentable IP packets larger than 1500 1298 bytes are strongly encouraged to implement their own end-to-end MTU 1299 assurance, e.g., using Packetization Layer Path MTU Discovery per 1300 [RFC4821]. 1302 7. Router Requirements 1304 IPv4 routers within the subnetwork are strongly encouraged to 1305 implement IPv4 fragmentation such that the first fragment is the 1306 largest and approximately the size of the underlying link MTU, i.e., 1307 they should avoid generating runt first fragments. 1309 IPv6 routers within the subnetwork are required to generate the 1310 necessary PTB messages when they drop outer IPv6 packets due to an 1311 MTU restriction. 1313 8. IANA Considerations 1315 The IANA is instructed to allocate an IP protocol number for 1316 'SEAL_PROTO' in the 'protocol-numbers' registry. 1318 The IANA is instructed to allocate a Well-Known Port number for 1319 'SEAL_PORT' in the 'port-numbers' registry. 1321 The IANA is instructed to establish a "SEAL Protocol" registry to 1322 record SEAL Version values. This registry should be initialized to 1323 include the initial SEAL Version number, i.e., Version 0. 1325 9. Security Considerations 1327 Unlike with IPv4 fragmentation, overlapping fragment attacks are not 1328 possible with SEAL due to the requirement that SEAL segments be non- 1329 overlapping. 
This condition is naturally enforced due to the fact 1330 that each consecutive SEAL segment begins at offset 0 with respect to 1331 the previous SEAL segment. 1333 An amplification/reflection attack is possible when an attacker sends 1334 IP first fragments with spoofed source addresses to an ETE, resulting 1335 in a stream of SCMP messages returned to a victim ITE. The SEAL_ID 1336 in the encapsulated segment of the spoofed IP first fragment provides 1337 mitigation for the ITE to detect and discard spurious SCMP messages. 1339 The SEAL header is sent in-the-clear (outside of any IPsec/ESP 1340 encapsulations) the same as for the outer IP and other outer headers. 1341 In this respect, the threat model is no different than for IPv6 1342 extension headers. As for IPv6 extension headers, the SEAL header is 1343 protected only by L2 integrity checks and is not covered under any L3 1344 integrity checks. 1346 SCMP messages carry the SEAL_ID of the packet-in-error. Therefore, 1347 when an ITE receives an SCMP message it can unambiguously associate 1348 it with the SEAL data packet that triggered the error. When the TEs 1349 are synchronized, the ETE can also detect off-path spoofing attacks. 1351 Security issues that apply to tunneling in general are discussed in 1352 [I-D.ietf-v6ops-tunnel-security-concerns]. 1354 10. Related Work 1356 Section 3.1.7 of [RFC2764] provides a high-level sketch for 1357 supporting large tunnel MTUs via a tunnel-level segmentation and 1358 reassembly capability to avoid IP level fragmentation, which is in 1359 part the same approach used by SEAL. SEAL could therefore be 1360 considered as a fully functioned manifestation of the method 1361 postulated by that informational reference. 
1363 Section 3 of [RFC4459] describes inner and outer fragmentation at the 1364 tunnel endpoints as alternatives for accommodating the tunnel MTU; 1365 however, the SEAL protocol specifies a mid-layer segmentation and 1366 reassembly capability that is distinct from both inner and outer 1367 fragmentation. 1369 Section 4 of [RFC2460] specifies a method for inserting and 1370 processing extension headers between the base IPv6 header and 1371 transport layer protocol data. The SEAL header is inserted and 1372 processed in exactly the same manner. 1374 The concepts of path MTU determination through the report of 1375 fragmentation and extending the IP Identification field were first 1376 proposed in deliberations of the TCP-IP mailing list and the Path MTU 1377 Discovery Working Group (MTUDWG) during the late 1980s and early 1378 1990s. SEAL supports a report fragmentation capability using bits 1379 in an extension header (the original proposal used a spare bit in the 1380 IP header) and supports ID extension through a 16-bit field in an 1381 extension header (the original proposal used a new IP option). A 1382 historical analysis of the evolution of these concepts, as well as 1383 the development of the eventual path MTU discovery mechanism for IP, 1384 appears in Appendix D of this document. 1386 11. SEAL Advantages over Classical Methods 1388 The SEAL approach offers a number of distinct advantages over the 1389 classical path MTU discovery methods [RFC1191] [RFC1981]: 1391 1. Classical path MTU discovery always results in packet loss when 1392 an MTU restriction is encountered. Using SEAL, IP fragmentation 1393 provides a short-term interim mechanism for ensuring that packets 1394 are delivered while SEAL adjusts its packet sizing parameters. 1396 2. Classical path MTU discovery may require several iterations of dropping 1397 packets and returning PTB messages until an acceptable path MTU 1398 value is determined. 
Under normal circumstances, SEAL determines 1399 the correct packet sizing parameters in a single iteration. 1401 3. Using SEAL, ordinary packets serve as implicit probes without 1402 exposing data to unnecessary loss. SEAL also provides an 1403 explicit probing mode not available in the classic methods. 1405 4. Using SEAL, ETEs encapsulate SCMP error messages in outer and 1406 mid-layer headers such that packet-filtering network middleboxes 1407 will not filter them the same as for "raw" ICMP messages that may 1408 be generated by an attacker. 1410 5. The SEAL approach ensures that the tunnel either delivers or 1411 deterministically drops packets according to their size, which is 1412 a required characteristic of any IP link. 1414 6. Most importantly, all SEAL packets have an Identification field 1415 that is sufficiently long to be used for duplicate packet 1416 detection purposes and to associate ICMP error messages with 1417 actual packets sent without requiring per-packet state; hence, 1418 SEAL avoids certain denial-of-service attack vectors open to the 1419 classical methods. 1421 12. Acknowledgments 1423 The following individuals are acknowledged for helpful comments and 1424 suggestions: Jari Arkko, Fred Baker, Iljitsch van Beijnum, Oliver 1425 Bonaventure, Teco Boot, Bob Braden, Brian Carpenter, Steve Casner, 1426 Ian Chakeres, Noel Chiappa, Remi Denis-Courmont, Remi Despres, Ralph 1427 Droms, Aurnaud Ebalard, Gorry Fairhurst, Dino Farinacci, Joel 1428 Halpern, Sam Hartman, John Heffner, Thomas Henderson, Bob Hinden, 1429 Christian Huitema, Eliot Lear, Darrel Lewis, Joe Macker, Matt Mathis, 1430 Erik Nordmark, Dan Romascanu, Dave Thaler, Joe Touch, Mark Townsley, 1431 Ole Troan, Margaret Wasserman, Magnus Westerlund, Robin Whittle, 1432 James Woodyatt, and members of the Boeing Research & Technology NST 1433 DC&NT group. 
1435 Path MTU determination through the report of fragmentation was first 1436 proposed by Charles Lynn on the TCP-IP mailing list in 1987. 1437 Extending the IP identification field was first proposed by Steve 1438 Deering on the MTUDWG mailing list in 1989. 1440 13. References 1442 13.1. Normative References 1444 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 1445 September 1981. 1447 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1448 RFC 792, September 1981. 1450 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1451 Requirement Levels", BCP 14, RFC 2119, March 1997. 1453 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1454 (IPv6) Specification", RFC 2460, December 1998. 1456 [RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure 1457 Neighbor Discovery (SEND)", RFC 3971, March 2005. 1459 [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control 1460 Message Protocol (ICMPv6) for the Internet Protocol 1461 Version 6 (IPv6) Specification", RFC 4443, March 2006. 1463 [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, 1464 "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, 1465 September 2007. 1467 13.2. Informative References 1469 [FOLK] Shannon, C., Moore, D., and k. claffy, "Beyond Folklore: 1470 Observations on Fragmented Traffic", December 2002. 1472 [FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", 1473 October 1987. 1475 [I-D.ietf-intarea-ipv4-id-update] 1476 Touch, J., "Updated Specification of the IPv4 ID Field", 1477 draft-ietf-intarea-ipv4-id-update-00 (work in progress), 1478 March 2010. 1480 [I-D.ietf-tcpm-icmp-attacks] 1481 Gont, F., "ICMP attacks against TCP", 1482 draft-ietf-tcpm-icmp-attacks-12 (work in progress), 1483 March 2010. 1485 [I-D.ietf-v6ops-tunnel-security-concerns] 1486 Hoagland, J., Krishnan, S., and D. 
Thaler, "Security 1487 Concerns With IP Tunneling", 1488 draft-ietf-v6ops-tunnel-security-concerns-02 (work in 1489 progress), March 2010. 1491 [I-D.russert-rangers] 1492 Russert, S., Fleischman, E., and F. Templin, "Operational 1493 Scenarios for IRON and RANGER", draft-russert-rangers-03 1494 (work in progress), June 2010. 1496 [I-D.templin-intarea-vet] 1497 Templin, F., "Virtual Enterprise Traversal (VET)", 1498 draft-templin-intarea-vet-13 (work in progress), 1499 June 2010. 1501 [I-D.templin-iron] 1502 Templin, F., "The Internet Routing Overlay Network 1503 (IRON)", draft-templin-iron-02 (work in progress), 1504 June 2010. 1506 [MTUDWG] "IETF MTU Discovery Working Group mailing list, 1507 gatekeeper.dec.com/pub/DEC/WRL/mogul/mtudwg-log, November 1508 1989 - February 1995.". 1510 [RFC1063] Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP 1511 MTU discovery options", RFC 1063, July 1988. 1513 [RFC1070] Hagens, R., Hall, N., and M. Rose, "Use of the Internet as 1514 a subnetwork for experimentation with the OSI network 1515 layer", RFC 1070, February 1989. 1517 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 1518 November 1990. 1520 [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery 1521 for IP version 6", RFC 1981, August 1996. 1523 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 1524 October 1996. 1526 [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in 1527 IPv6 Specification", RFC 2473, December 1998. 1529 [RFC2675] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", 1530 RFC 2675, August 1999. 1532 [RFC2764] Gleeson, B., Heinanen, J., Lin, A., Armitage, G., and A. 1533 Malis, "A Framework for IP Based Virtual Private 1534 Networks", RFC 2764, February 2000. 1536 [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", 1537 RFC 2923, September 2000. 1539 [RFC3232] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by 1540 an On-line Database", RFC 3232, January 2002. 
1542 [RFC3366] Fairhurst, G. and L. Wood, "Advice to link designers on 1543 link Automatic Repeat reQuest (ARQ)", BCP 62, RFC 3366, 1544 August 2002. 1546 [RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., 1547 Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. 1548 Wood, "Advice for Internet Subnetwork Designers", BCP 89, 1549 RFC 3819, July 2004. 1551 [RFC4191] Draves, R. and D. Thaler, "Default Router Preferences and 1552 More-Specific Routes", RFC 4191, November 2005. 1554 [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms 1555 for IPv6 Hosts and Routers", RFC 4213, October 2005. 1557 [RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through 1558 Network Address Translations (NATs)", RFC 4380, 1559 February 2006. 1561 [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- 1562 Network Tunneling", RFC 4459, April 2006. 1564 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 1565 Discovery", RFC 4821, March 2007. 1567 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 1568 Errors at High Data Rates", RFC 4963, July 2007. 1570 [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common 1571 Mitigations", RFC 4987, August 2007. 1573 [RFC5445] Watson, M., "Basic Forward Error Correction (FEC) 1574 Schemes", RFC 5445, March 2009. 1576 [RFC5720] Templin, F., "Routing and Addressing in Networks with 1577 Global Enterprise Recursion (RANGER)", RFC 5720, 1578 February 2010. 1580 [TBIT] Medina, A., Allman, M., and S. Floyd, "Measuring 1581 Interactions Between Transport Protocols and Middleboxes", 1582 October 2004. 1584 [TCP-IP] "Archive/Hypermail of Early TCP-IP Mail List, 1585 http://www-mice.cs.ucl.ac.uk/multimedia/misc/tcp_ip/, May 1586 1987 - May 1990.". 1588 [WAND] Luckie, M., Cho, K., and B. Owens, "Inferring and 1589 Debugging Path MTU Discovery Failures", October 2005. 1591 Appendix A. 
Reliability 1593 Although a SEAL tunnel may span an arbitrarily-large subnetwork 1594 expanse, the IP layer sees the tunnel as a simple link that supports 1595 the IP service model. Since SEAL supports segmentation at a layer 1596 below IP, SEAL presents a case in which the link unit of 1597 loss (i.e., a SEAL segment) is smaller than the end-to-end 1598 retransmission unit (e.g., a TCP segment). 1600 Links with high bit error rates (BERs) (e.g., IEEE 802.11) use 1601 Automatic Repeat-ReQuest (ARQ) mechanisms [RFC3366] to increase 1602 packet delivery ratios, while links with much lower BERs typically 1603 omit such mechanisms. Since SEAL tunnels may traverse arbitrarily- 1604 long paths over links of various types that are already either 1605 performing or omitting ARQ as appropriate, it would often 1606 be inefficient to also require the tunnel to perform ARQ. 1608 When the SEAL ITE has knowledge that the tunnel will traverse a 1609 subnetwork with non-negligible loss due to, e.g., interference, link 1610 errors, congestion, etc., it can solicit Segment Reports from the ETE 1611 periodically to discover missing segments for retransmission within a 1612 single round-trip time. However, retransmission of missing segments 1613 may require the ITE to maintain considerable state and may also 1614 result in considerable delay variance and packet reordering. 1616 SEAL may also use alternate reliability mechanisms such as Forward 1617 Error Correction (FEC). A simple FEC mechanism may merely entail 1618 gratuitous retransmissions of duplicate data; however, more efficient 1619 alternatives are also possible. Basic FEC schemes are discussed in 1620 [RFC5445]. 1622 The use of ARQ and FEC mechanisms for improved reliability is for 1623 further study. 1625 Appendix B. Integrity 1627 Each link in the path over which a SEAL tunnel is configured is 1628 responsible for link layer integrity verification for packets that 1629 traverse the link. 
As such, when a multi-segment SEAL packet with N 1630 segments is reassembled, its segments will have been inspected by N 1631 independent link layer integrity check streams instead of a single 1632 stream that a single segment SEAL packet of the same size would have 1633 received. Intuitively, a reassembled packet subjected to N 1634 independent integrity check streams of shorter-length segments would 1635 seem to have integrity assurance that is no worse than a single- 1636 segment packet subjected to only a single integrity check stream, 1637 since the integrity check strength diminishes in inverse proportion 1638 with segment length. In any case, the link-layer integrity assurance 1639 for a multi-segment SEAL packet is no different than for a multi- 1640 fragment IPv6 packet. 1642 Fragmentation and reassembly schemes must also consider packet- 1643 splicing errors, e.g., when two segments from the same packet are 1644 concatenated incorrectly, when a segment from packet X is reassembled 1645 with segments from packet Y, etc. The primary sources of such errors 1646 include implementation bugs and wrapping IP ID fields. In terms of 1647 implementation bugs, the SEAL segmentation and reassembly algorithm 1648 is much simpler than IP fragmentation, resulting in simplified 1649 implementations. In terms of wrapping ID fields, when IPv4 is used 1650 as the outer IP protocol, the 16-bit IP ID field can wrap with only 1651 64K packets with the same (src, dst, protocol)-tuple alive in the 1652 system at a given time [RFC4963], increasing the likelihood of 1653 reassembly mis-associations. However, SEAL ensures that any outer 1654 IPv4 fragmentation and reassembly will be short-lived and tuned out 1655 as soon as the ITE receives a Reassembly Report, and SEAL segmentation 1656 and reassembly uses a much longer ID field. Therefore, reassembly 1657 mis-associations of both IP fragments and SEAL segments should be 1658 exceedingly rare. 1660 Appendix C. 
Transport Mode 1662 SEAL can also be used in "transport mode", e.g., when the inner layer 1663 comprises upper-layer protocol data rather than an encapsulated IP 1664 packet. For instance, TCP peers can negotiate the use of SEAL for 1665 the carriage of protocol data encapsulated as IPv4/SEAL/TCP. In this 1666 sense, the "subnetwork" becomes the entire end-to-end path between 1667 the TCP peers and may potentially span the entire Internet. 1669 Section 4 specifies the operation of SEAL in "tunnel mode", i.e., when 1670 there are both an inner and outer IP layer with a SEAL encapsulation 1671 layer between. However, the SEAL protocol can also be used in a 1672 "transport mode" of operation within a subnetwork region in which the 1673 inner layer corresponds to a transport layer protocol (e.g., UDP, 1674 TCP, etc.) instead of an inner IP layer. 1676 For example, two TCP endpoints connected to the same subnetwork 1677 region can negotiate the use of transport-mode SEAL for a connection 1678 by inserting a 'SEAL_OPTION' TCP option during the connection 1679 establishment phase. If both TCPs agree on the use of SEAL, their 1680 protocol messages will be carried as TCP/SEAL/IPv4 and the connection 1681 will be serviced by the SEAL protocol using TCP (instead of an 1682 encapsulating tunnel endpoint) as the transport layer protocol. The 1683 SEAL protocol for transport mode otherwise observes the same 1684 specifications as in Section 4. 1686 Appendix D. Historic Evolution of PMTUD 1688 The topic of Path MTU discovery (PMTUD) saw a flurry of discussion 1689 and numerous proposals in the late 1980's through early 1990. The 1690 initial problem was posed by Art Berggreen on May 22, 1987 in a 1691 message to the TCP-IP discussion group [TCP-IP]. The discussion that 1692 followed provided significant reference material for [FRAG]. An IETF 1693 Path MTU Discovery Working Group [MTUDWG] was formed in late 1989 1694 with a charter to produce an RFC. 
Several variations on a very few 1695 basic proposals were entertained, including: 1697 1. Routers record the PMTUD estimate in ICMP-like path probe 1698 messages (proposed in [FRAG] and later [RFC1063]) 1700 2. The destination reports any fragmentation that occurs for packets 1701 received with the "RF" (Report Fragmentation) bit set (Steve 1702 Deering's 1989 adaptation of Charles Lynn's Nov. 1987 proposal) 1704 3. A hybrid combination of 1) and Charles Lynn's Nov. 1987 proposal (straw 1705 RFC draft by McCloghrie, Fox and Mogul on Jan 12, 1990) 1707 4. Combination of the Lynn proposal with TCP (Fred Bohle, Jan 30, 1708 1990) 1710 5. Fragmentation avoidance by setting the "IP_DF" flag on all packets 1711 and retransmitting if ICMPv4 "fragmentation needed" messages 1712 occur (Geof Cooper's 1987 proposal; later adapted into [RFC1191] 1713 by Mogul and Deering). 1715 Option 1) seemed attractive to the group at the time, since it was 1716 believed that routers would migrate more quickly than hosts. Option 1717 2) was a strong contender, but repeated attempts to secure an "RF" 1718 bit in the IPv4 header from the IESG failed and the proponents became 1719 discouraged. Option 3) was abandoned because it was perceived as too 1720 complicated, and 4) never received any apparent serious 1721 consideration. Proposal 5) was a late entry into the discussion from 1722 Steve Deering on Feb. 24th, 1990. The discussion group soon 1723 thereafter seemingly lost track of all other proposals and adopted 1724 5), which eventually evolved into [RFC1191] and later [RFC1981]. 1726 In retrospect, the "RF" bit postulated in 2) is not needed if a 1727 "contract" is first established between the peers, as in proposal 4) 1728 and a message to the MTUDWG mailing list from jrd@PTT.LCS.MIT.EDU on 1729 Feb 19, 1990. 
These proposals saw little discussion or rebuttal, and 1730 were dismissed based on the following assertions: 1732 o routers upgrade their software faster than hosts 1734 o PCs could not reassemble fragmented packets 1736 o Proteon and Wellfleet routers did not reproduce the "RF" bit 1737 properly in fragmented packets 1739 o Ethernet-FDDI bridges would need to perform fragmentation (i.e., 1740 "translucent" not "transparent" bridging) 1742 o the 16-bit IP_ID field could wrap around and disrupt reassembly at 1743 high packet arrival rates 1745 The first four assertions, although perhaps valid at the time, have 1746 been overcome by historical events. The final assertion is addressed 1747 by the mechanisms specified in SEAL. 1749 Author's Address 1751 Fred L. Templin (editor) 1752 Boeing Research & Technology 1753 P.O. Box 3707 1754 Seattle, WA 98124 1755 USA 1757 Email: fltemplin@acm.org