Network Working Group                                    F. Templin, Ed.
Internet-Draft                              Boeing Research & Technology
Obsoletes: 5320 (if approved)                              July 15, 2013
Intended status: Informational
Expires: January 16, 2014


        The Subnetwork Encapsulation and Adaptation Layer (SEAL)
                   draft-templin-intarea-seal-60.txt

Abstract

This document specifies a Subnetwork Encapsulation and Adaptation Layer (SEAL). SEAL operates over virtual topologies configured over connected IP network routing regions bounded by encapsulating border nodes. These virtual topologies are manifested by tunnels that may span multiple IP and/or sub-IP layer forwarding hops, where they may incur packet duplication, packet reordering, source address spoofing, and traversal of links with diverse Maximum Transmission Units (MTUs). SEAL addresses these issues through the encapsulation and messaging mechanisms specified in this document.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 16, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Approach
     1.3.  Differences with RFC5320
   2.  Terminology
   3.  Requirements
   4.  Applicability Statement
   5.  SEAL Specification
     5.1.  SEAL Tunnel Model
     5.2.  SEAL Model of Operation
     5.3.  SEAL Header and Trailer Format
     5.4.  ITE Specification
       5.4.1.  Tunnel Interface MTU
       5.4.2.  Tunnel Neighbor Soft State
       5.4.3.  SEAL Layer Pre-Processing
       5.4.4.  SEAL Encapsulation and Segmentation
       5.4.5.  Outer Encapsulation
       5.4.6.  Path Probing and ETE Reachability Verification
       5.4.7.  Processing ICMP Messages
       5.4.8.  IPv4 Middlebox Reassembly Testing
       5.4.9.  Stateful MTU Determination
       5.4.10. Detecting Path MTU Changes
     5.5.  ETE Specification
       5.5.1.  Reassembly Buffer Requirements
       5.5.2.  Tunnel Neighbor Soft State
       5.5.3.  IP-Layer Reassembly
       5.5.4.  Decapsulation, SEAL-Layer Reassembly, and Re-Encapsulation
     5.6.  The SEAL Control Message Protocol (SCMP)
       5.6.1.  Generating SCMP Error Messages
       5.6.2.  Processing SCMP Error Messages
   6.  Link Requirements
   7.  End System Requirements
   8.  Router Requirements
   9.  Nested Encapsulation Considerations
   10. Reliability Considerations
   11. Integrity Considerations
   12. IANA Considerations
   13. Security Considerations
   14. Related Work
   15. Implementation Status
   16. Acknowledgments
   17. References
     17.1.  Normative References
     17.2.  Informative References
   Author's Address

1. Introduction

As Internet technology and communication have grown and matured, many techniques have been developed that use virtual topologies (manifested by tunnels of one form or another) over an actual network that supports the Internet Protocol (IP) [RFC0791][RFC2460]. Those virtual topologies have elements that appear as one network layer hop but are actually multiple IP or sub-IP layer hops. These multiple hops often have quite diverse properties that are frequently not even visible to the endpoints of the virtual hop. This introduces failure modes that current approaches do not handle well.

The use of IP encapsulation (also known as "tunneling") has long been considered the means for creating such virtual topologies (e.g., see [RFC2003][RFC2473]). However, the encapsulation headers often include insufficiently provisioned per-packet identification values. IP encapsulation also allows an attacker to produce encapsulated packets with spoofed source addresses even if the source address in the encapsulating header cannot be spoofed. This presents a denial-of-service vector that is not possible in non-tunneled subnetworks.

Additionally, the insertion of an outer IP header reduces the effective path MTU visible to the inner network layer. When IPv6 is used as the encapsulation protocol, original sources expect to be informed of the MTU limitation through IPv6 Path MTU Discovery (PMTUD) [RFC1981]. When IPv4 is used, this reduced MTU can be accommodated through the use of IPv4 fragmentation, but operational experience and studies conducted over the course of many years have found unmitigated in-the-network fragmentation to be harmful [FRAG][FOLK][RFC4963]. Additionally, classical IPv4 PMTUD [RFC1191] has known operational issues that are exacerbated by in-the-network tunnels [RFC2923][RFC4459].
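
The MTU reduction caused by encapsulation is simple arithmetic. As a minimal sketch, the Python helper below computes HLEN (Section 2) and the inner MTU that remains on a path of a given MTU. It assumes option-free outer IPv4 (20 bytes) and IPv6 (40 bytes) headers, an optional 8-byte UDP header, the 4-byte fixed SEAL header word and the optional 4-byte Identification field laid out in Section 5.3, and an ICV length supplied by the caller; the function and constant names are hypothetical.

```python
# Hypothetical helper. Header sizes in bytes, assuming no IPv4 options
# and no IPv6 extension headers in the outer encapsulation.
OUTER_IP_LEN = {4: 20, 6: 40}
UDP_LEN = 8
SEAL_FIXED_LEN = 4  # first 32-bit word of the SEAL header (VER..LEVEL)
SEAL_ID_LEN = 4     # optional Identification field, present when I==1


def hlen(outer_ip: int = 6, udp: bool = False, with_id: bool = True,
         icv_len: int = 0) -> int:
    """Length of the SEAL header plus outer headers (HLEN)."""
    total = OUTER_IP_LEN[outer_ip] + SEAL_FIXED_LEN + icv_len
    if udp:
        total += UDP_LEN
    if with_id:
        total += SEAL_ID_LEN
    return total


def inner_mtu(path_mtu: int, **encap) -> int:
    """Largest inner packet that fits in one outer packet unsegmented."""
    return path_mtu - hlen(**encap)
```

For example, inner_mtu(1500, outer_ip=6) evaluates to 1452, so a full 1500-byte inner packet can no longer cross a 1500-byte path in a single outer IPv6 packet once the SEAL header is added.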

The following subsections present further details on the motivation and approach for addressing these issues.

1.1. Motivation

Before discussing the approach, it is necessary to first understand the problems. In both the Internet and private-use networks today, IP is ubiquitously deployed as the Layer 3 protocol. The primary functions of IP are to provide for routing, addressing, and a fragmentation and reassembly capability used to accommodate links with diverse MTUs. While it is well known that the IPv4 address space is rapidly becoming depleted, there is also a growing awareness that other IP protocol limitations have already become, or may soon become, problematic.

First, the Internet historically provided no means for discerning whether the source addresses of IP packets are authentic. This shortcoming is increasingly being addressed through the deployment of site border router ingress filters [RFC2827]; however, encapsulation provides a vector for an attacker to circumvent filtering for the encapsulated packet even if filtering is correctly applied to the encapsulation header. Second, the IP header does not include a well-behaved identification value unless the source has included a Fragment Header for IPv6 or unless the source permits fragmentation for IPv4. These limitations preclude an efficient means for routers to detect duplicate packets and packets that have been reordered within the subnetwork. Additionally, recent studies have shown that the arrival of fragments at high data rates can enable denial-of-service (DoS) attacks against performance-sensitive networking gear, prompting some administrators to configure their equipment to drop fragments unconditionally [I-D.taylor-v6ops-fragdrop].
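
To illustrate why a well-behaved identification value matters: once each packet carries a monotonically increasing identifier (such as the 32-bit SEAL Identification field of Section 5.3), a receiver can classify arrivals as new, reordered, or duplicate with a small sliding window. The Python sketch below borrows the bitmap-window technique familiar from IPsec anti-replay processing and uses 32-bit serial-number comparison so that the window survives wraparound. This document does not specify a receiver algorithm; the class name, window size, and result labels are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 0x80000000


class DuplicateDetector:
    """Hypothetical receiver-side window over 32-bit Identification values."""

    def __init__(self, window: int = 64):
        self.window = window
        self.window_mask = (1 << window) - 1
        self.highest = None  # highest Identification seen so far
        self.bitmap = 0      # bit i set => (highest - i) already received

    def classify(self, ident: int) -> str:
        """Return 'new', 'reordered', 'duplicate', or 'stale'."""
        ident &= MASK32
        if self.highest is None:
            self.highest, self.bitmap = ident, 1
            return "new"
        ahead = (ident - self.highest) & MASK32
        if ahead == 0:
            return "duplicate"
        if ahead < HALF32:
            # Serial-number comparison: ident is newer than highest, even
            # across a wrap from 2^32-1 to 0. Slide the window forward.
            if ahead >= self.window:
                self.bitmap = 1
            else:
                self.bitmap = ((self.bitmap << ahead) | 1) & self.window_mask
            self.highest = ident
            return "new"
        behind = (self.highest - ident) & MASK32
        if behind >= self.window:
            return "stale"  # outside the window; cannot be classified
        bit = 1 << behind
        if self.bitmap & bit:
            return "duplicate"
        self.bitmap |= bit
        return "reordered"
```

The window size trades memory for tolerance of reordering depth; with 16-bit identifiers the same window would wrap far sooner at high packet rates.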

For IPv4 encapsulation, when fragmentation is permitted the header includes a 16-bit Identification field, meaning that at most 2^16 unique packets with the same (source, destination, protocol) tuple can be active in the network at the same time [RFC6864]. (When middleboxes such as Network Address Translators (NATs) rewrite the Identification field to random values, the number of unique packets is reduced even further.) Due to the escalating deployment of high-speed links, however, this number has become too small by several orders of magnitude for high data rate packet sources such as tunnel endpoints [RFC4963].

Furthermore, there are many well-known limitations pertaining to IPv4 fragmentation and reassembly, even to the point that it has been deemed "harmful" in both classic and modern-day studies (see above). In particular, IPv4 fragmentation raises issues ranging from minor annoyances (e.g., in-the-network router fragmentation [RFC1981]) to the potential for major integrity issues (e.g., misassociation of the fragments of multiple IP packets during reassembly [RFC4963]).

As a result of these perceived limitations, a fragmentation-avoiding technique for discovering the MTU of the forward path from a source to a destination node was devised through the deliberations of the Path MTU Discovery Working Group (MTUDWG) during the late 1980s through the early 1990s, which resulted in the publication of [RFC1191]. In this negative feedback-based method, the source node provides explicit instructions to routers in the path to discard the packet and return an ICMP error message if an MTU restriction is encountered. However, this approach has several serious shortcomings that lead to an overall "brittleness" [RFC2923].

In particular, site border routers in the Internet have been known to discard ICMP error messages coming from the outside world.
This is due in large part to the fact that malicious spoofing of error messages in the Internet is trivial, since there is no way to authenticate the source of the messages [RFC5927]. Furthermore, when a source node that depends on ICMP error message feedback for packets dropped due to an MTU restriction does not receive those messages, a path MTU-related black hole occurs. This means that the source will continue to send packets that are too large and never receive an indication from the network that they are being discarded. This behavior has been confirmed through documented studies showing clear evidence of PMTUD failures for both IPv4 and IPv6 in the Internet today [TBIT][WAND][SIGCOMM][RIPE].

The issues with both IP fragmentation and this "classical" PMTUD method are exacerbated further when IP tunneling is used [RFC4459]. For example, an ingress tunnel endpoint (ITE) may be required to forward encapsulated packets into the subnetwork on behalf of hundreds, thousands, or even more original sources. If the ITE allows IP fragmentation on the encapsulated packets, persistent fragmentation could lead to undetected data corruption due to Identification field wrapping and/or reassembly congestion at the egress tunnel endpoint (ETE). If the ITE instead uses classical IP PMTUD, it must rely on ICMP error messages coming from the subnetwork that may be suspect, subject to loss due to filtering middleboxes, or insufficiently provisioned for translation into error messages to be returned to the original sources.

Although recent work has led to the development of a positive feedback-based end-to-end MTU determination scheme [RFC4821], it does not excuse tunnels from accounting for the encapsulation overhead they add to packets.
Moreover, in current practice, existing tunneling protocols mask the MTU issues by selecting a "lowest common denominator" MTU that may be much smaller than necessary for most paths and difficult to change at a later date. Therefore, a new approach to accommodate tunnels over links with diverse MTUs is necessary.

1.2. Approach

This document concerns subnetworks manifested through a virtual topology configured over a connected network routing region and bounded by encapsulating border nodes. Example connected network routing regions include Mobile Ad hoc Networks (MANETs), enterprise networks, and the global public Internet itself. Subnetwork border nodes forward unicast and multicast packets over the virtual topology across multiple IP and/or sub-IP layer forwarding hops that may introduce packet duplication and/or traverse links with diverse Maximum Transmission Units (MTUs).

This document introduces a Subnetwork Encapsulation and Adaptation Layer (SEAL) for tunneling inner network layer protocol packets over IP subnetworks that connect Ingress and Egress Tunnel Endpoints (ITEs/ETEs) of border nodes. It provides a modular specification designed to be tailored to specific associated tunneling protocols. (A transport mode of operation is also possible, but is out of scope for this document.)

SEAL provides a mid-layer encapsulation that accommodates links with diverse MTUs and allows routers in the subnetwork to perform efficient duplicate packet and packet reordering detection. The encapsulation further ensures message origin authentication, packet header integrity, and anti-replay in environments in which these functions are necessary.

SEAL treats tunnels that traverse the subnetwork as ordinary links that must support network layer services.
Moreover, SEAL provides dynamic mechanisms (including limited segmentation and reassembly) to ensure a maximal path MTU over the tunnel. This is in contrast to static approaches, which avoid MTU issues by selecting a lowest common denominator MTU value that may be overly conservative for the vast majority of tunnel paths and difficult to change even when larger MTUs become available.

1.3. Differences with RFC5320

This specification of SEAL is descended from an experimental independent RFC publication of the same name [RFC5320]. However, this specification introduces a number of important differences from the earlier publication.

First, this specification includes a protocol version field in the SEAL header, whereas [RFC5320] does not; [RFC5320] therefore cannot be updated by future revisions. This specification consequently obsoletes (rather than updates) [RFC5320].

Second, [RFC5320] forms a 32-bit Identification value by concatenating the 16-bit IPv4 Identification field with a 16-bit Identification "extension" field in the SEAL header. This means that [RFC5320] can only operate over IPv4 networks (since IPv6 headers do not include a 16-bit Identification field) and that the SEAL Identification value can be corrupted if the Identification in the outer IPv4 header is rewritten. In contrast, this specification includes a 32-bit Identification value that is independent of any identification fields found in the inner or outer IP headers, and is therefore compatible with any combination of inner and outer IP protocol versions.

Additionally, the SEAL segmentation and reassembly procedures defined in [RFC5320] differ significantly from those found in this specification. In particular, this specification defines an 8-bit Offset field that allows for smaller segment sizes when SEAL segmentation is necessary.
In contrast, [RFC5320] includes a 3-bit Segment field and performs reassembly through concatenation of consecutive segments.

The SEAL header in this specification also includes an optional Integrity Check Vector (ICV) that can be used to digitally sign the SEAL header and the leading portion of the encapsulated inner packet. This allows for a lightweight integrity check and a loose message origin authentication capability. The header further includes new control bits, as well as a link identification field and an encapsulation level field, for additional control capabilities.

Finally, this version of SEAL includes a new messaging protocol known as the SEAL Control Message Protocol (SCMP), whereas [RFC5320] performs signaling through the use of SEAL-encapsulated ICMP messages. The use of SCMP allows SEAL-specific departures from ICMP, as well as a control messaging capability that extends to other specifications, including Virtual Enterprise Traversal (VET) [I-D.templin-intarea-vet].

2. Terminology

The following terms are defined within the scope of this document:

subnetwork
   a virtual topology configured over a connected network routing region and bounded by encapsulating border nodes.

IP
   used to refer generically to either Internet Protocol (IP) version, i.e., IPv4 or IPv6.

Ingress Tunnel Endpoint (ITE)
   a virtual interface over which an encapsulating border node (host or router) sends encapsulated packets into the subnetwork.

Egress Tunnel Endpoint (ETE)
   a virtual interface over which an encapsulating border node (host or router) receives encapsulated packets from the subnetwork.

SEAL Path
   a subnetwork path from an ITE to an ETE beginning with an underlying link of the ITE as the first hop. Note that if the ITE's interface connection to the underlying link assigns multiple IP addresses, each address represents a separate SEAL path.

inner packet
   an unencapsulated network layer protocol packet (e.g., IPv4 [RFC0791], OSI/CLNP [RFC0994], IPv6 [RFC2460], etc.) before any outer encapsulations are added. Internet protocol numbers that identify inner packets are found in the IANA Internet Protocol registry [RFC3232]. SEAL protocol packets that incur an additional layer of SEAL encapsulation are also considered inner packets.

outer IP packet
   a packet resulting from adding an outer IP header (and possibly other outer headers) to a SEAL-encapsulated inner packet.

packet-in-error
   the leading portion of an invoking data packet encapsulated in the body of an error control message (e.g., an ICMPv4 [RFC0792] error message, an ICMPv6 [RFC4443] error message, etc.).

Packet Too Big (PTB) message
   a control plane message indicating an MTU restriction (e.g., an ICMPv6 "Packet Too Big" message [RFC4443], an ICMPv4 "Fragmentation Needed" message [RFC0792], etc.).

Don't Fragment (DF) bit
   a bit that indicates whether the packet may be fragmented by the network. The DF bit is explicitly included in the IPv4 header [RFC0791] and may be set to '0' to allow fragmentation or '1' to disallow further in-network fragmentation. The bit is absent from the IPv6 header [RFC2460], but implicitly set to '1' because fragmentation can occur only at IPv6 sources.
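
The DF bit definition above can be applied mechanically to a raw header. The Python sketch below, assuming the header is supplied as bytes, returns the effective DF value: for IPv4 it reads the flag from the 16-bit Flags/Fragment Offset word at bytes 6-7 defined in [RFC0791], and for IPv6 it returns 1, following the definition above. The function name is hypothetical.

```python
import struct


def df_bit(header: bytes) -> int:
    """Return the effective Don't Fragment bit of an IPv4 or IPv6 header."""
    version = header[0] >> 4
    if version == 6:
        # No DF field in IPv6; treated as 1 since only sources fragment.
        return 1
    if version != 4:
        raise ValueError(f"not an IP header (version {version})")
    # Bytes 6-7: 3 flag bits (reserved, DF, MF) + 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", header, 6)
    return (flags_and_offset >> 14) & 1
```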

The following abbreviations correspond to terms used within this document and/or elsewhere in common Internetworking nomenclature:

   HLEN - the length of the SEAL header plus outer headers
   ICV - Integrity Check Vector
   MAC - Message Authentication Code
   MTU - Maximum Transmission Unit
   SCMP - the SEAL Control Message Protocol
   SDU - SCMP Destination Unreachable message
   SPP - SCMP Parameter Problem message
   SPTB - SCMP Packet Too Big message
   SEAL - Subnetwork Encapsulation and Adaptation Layer
   TE - Tunnel Endpoint (i.e., either ingress or egress)
   VET - Virtual Enterprise Traversal

3. Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. When used in lower case (e.g., must, must not, etc.), these words MUST NOT be interpreted as described in [RFC2119], but are rather interpreted as they would be in common English.

4. Applicability Statement

SEAL was originally motivated by the specific case of subnetwork abstraction for Mobile Ad hoc Networks (MANETs); however, the domain of applicability also extends to subnetwork abstractions over enterprise networks, ISP networks, SO/HO networks, the global public Internet itself, and any other connected network routing region.

SEAL provides a network sublayer for encapsulation of an inner network layer packet within outer encapsulating headers. SEAL can also be used as a sublayer within a transport layer protocol data payload, where transport layer encapsulation is typically used for Network Address Translator (NAT) traversal as well as operation over subnetworks that give preferential treatment to certain "core" Internet protocols, e.g., TCP, UDP, etc.
(However, note that TCP encapsulation may not be appropriate for all use cases, particularly those that require low delay and/or low delay variance.) The SEAL header is processed in a manner similar to IPv6 extension headers, i.e., it is not part of the outer IP header but rather allows for the creation of an arbitrarily extensible chain of headers in the same way that IPv6 does.

To accommodate MTU diversity, the Ingress Tunnel Endpoint (ITE) may need to perform limited segmentation, which the Egress Tunnel Endpoint (ETE) reassembles. The ETE further acts as a passive observer that informs the ITE of any packet size limitations. This allows the ITE to return appropriate PMTUD feedback even if the network path between the ITE and ETE filters ICMP messages.

SEAL further provides mechanisms to ensure message origin authentication, packet header integrity, and anti-replay. The SEAL framework is therefore similar to the IP Security (IPsec) Authentication Header (AH) [RFC4301][RFC4302]; however, it provides only minimal hop-by-hop authenticating services while leaving full data integrity, authentication, and confidentiality services as an end-to-end consideration.

In many aspects, SEAL also closely resembles the Generic Routing Encapsulation (GRE) framework [RFC1701]. SEAL can therefore be applied in the same use cases that are traditionally addressed by GRE, but goes beyond GRE to provide additional capabilities (e.g., path MTU accommodation, message origin authentication, etc.) as described in this document.

5. SEAL Specification

The following sections specify the operation of SEAL.

5.1. SEAL Tunnel Model

SEAL is an encapsulation sublayer used within point-to-point, point-to-multipoint, and non-broadcast, multiple access (NBMA) tunnels. Each SEAL path is configured over one or more underlying interfaces attached to subnetwork links.
The SEAL tunnel connects an ITE to one or more ETE "neighbors" via encapsulation across an underlying subnetwork, where the tunnel neighbor relationship may be either unidirectional or bidirectional.

A unidirectional tunnel neighbor relationship allows the near-end ITE to send data packets forward to the far-end ETE, while the ETE only returns control messages when necessary. A bidirectional tunnel neighbor relationship is one over which both TEs can exchange both data and control messages.

Implications of the SEAL unidirectional and bidirectional models are the same as discussed in [I-D.templin-intarea-vet].

5.2. SEAL Model of Operation

SEAL-enabled ITEs encapsulate each inner packet in a SEAL header and any outer header encapsulations as shown in Figure 1:

                                      +--------------------+
                                      ~   outer IP header  ~
                                      +--------------------+
                                      ~  other outer hdrs  ~
                                      +--------------------+
                                      ~     SEAL Header    ~
      +--------------------+          +--------------------+
      |                    |   -->    |                    |
      ~        Inner       ~   -->    ~        Inner       ~
      ~       Packet       ~   -->    ~       Packet       ~
      |                    |   -->    |                    |
      +--------------------+          +----------+---------+

                      Figure 1: SEAL Encapsulation

The ITE inserts the SEAL header according to the specific tunneling protocol. For simple encapsulation of an inner network layer packet within an outer IP header, the ITE inserts the SEAL header following the outer IP header and before the inner packet as: IP/SEAL/{inner packet}.

For encapsulations over transports such as UDP, the ITE inserts the SEAL header following the outer transport layer header and before the inner packet, e.g., as IP/UDP/SEAL/{inner packet}. In that case, the UDP header is seen as an "other outer header" as depicted in Figure 1, and the outer IP and transport layer headers are together seen as the outer encapsulation headers.

For SEAL encapsulations that involve other tunnel types (e.g., GRE, IPsec, etc.),
the ITE inserts the SEAL header as a leading extension to the other tunnel headers, i.e., the SEAL encapsulation appears as part of the same tunnel and not as a separate tunnel. For example, for IPsec/ESP the ITE inserts the SEAL header as IP/SEAL/ESP/{inner packet}. In that case, SEAL considers the length of the inner packet only (i.e., not the other tunnel headers) when performing its packet size calculations. Also, the other tunnel headers appear only in the first segment of a segmented SEAL packet (i.e., they do not appear in non-initial segments), and the other tunnel trailers (if any) appear only in the final segment.

SEAL supports both "nested" tunneling and "re-encapsulating" tunneling. Nested tunneling occurs when a first tunnel is encapsulated within a second tunnel, which may then further be encapsulated within additional tunnels. Nested tunneling can be useful, and stands in contrast to "recursive" tunneling, which is an anomalous condition incurred due to misconfiguration or a routing loop. Considerations for nested tunneling and avoiding recursive tunneling are discussed in Section 4 of [RFC2473].

Re-encapsulating tunneling occurs when a packet arrives at a first ETE, which then acts as an ITE to re-encapsulate and forward the packet to a second ETE connected to the same subnetwork. In that case, each ITE/ETE transition represents a segment of a bridged path between the ITE nearest the source and the ETE nearest the destination. Considerations for re-encapsulating tunneling are discussed in [I-D.templin-ironbis]. Combinations of nested and re-encapsulating tunneling are also naturally supported by SEAL.

The SEAL ITE considers each underlying interface as the ingress attachment point to a SEAL path to the ETE. The ITE therefore may experience different path MTUs on different SEAL paths.
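
The header-placement rules given earlier in this section for other tunnel types can be summarized with a short Python sketch that lists, for each segment of one SEAL packet, the headers that segment carries. It assumes the string labels shown (e.g., "ESP" and "ESP-trailer") purely for illustration and does not model segment sizes or the Offset and M fields; the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def segment_layouts(n_segments: int,
                    outer: Sequence[str] = ("IP",),
                    tunnel_hdr: Optional[str] = None,
                    tunnel_trailer: Optional[str] = None) -> List[str]:
    """Describe the header stack carried by each segment of one SEAL packet.

    Every segment carries the outer header(s) and a SEAL header. Other
    tunnel headers (e.g., ESP) appear only in the first segment, and
    other tunnel trailers only in the final segment.
    """
    if n_segments < 1:
        raise ValueError("a SEAL packet has at least one segment")
    layouts = []
    for i in range(n_segments):
        layers = list(outer) + ["SEAL"]
        if tunnel_hdr and i == 0:
            layers.append(tunnel_hdr)
        layers.append(f"inner[{i}]")  # i-th slice of the inner packet
        if tunnel_trailer and i == n_segments - 1:
            layers.append(tunnel_trailer)
        layouts.append("/".join(layers))
    return layouts
```

For example, segment_layouts(1, outer=("IP", "UDP")) returns ["IP/UDP/SEAL/inner[0]"], matching the UDP case above, while a three-segment ESP packet places ESP only after the first SEAL header and the trailer only at the end of the last segment.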

Finally, the SEAL ITE ensures that the inner network layer protocol will see a minimum MTU of 1500 bytes over each SEAL path regardless of the outer network layer protocol version, i.e., even if a small amount of segmentation and reassembly is necessary. This is to avoid path MTU "black holes" for the minimum MTU configured by the vast majority of links in the Internet. Note that in some scenarios, however, reassembly may place a heavy burden on the ETE. In that case, the ITE should avoid invoking segmentation and instead report an MTU smaller than 1500 bytes to the original source.

5.3. SEAL Header and Trailer Format

The SEAL header is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |VER|C|P|I|V|R|M|    Offset     |    NEXTHDR    | LINK_ID |LEVEL|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Identification (optional)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Integrity Check Vector (optional)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ...

                     Figure 2: SEAL Header Format

VER (2)
   a 2-bit version field. This document specifies Version 0 of the SEAL protocol, i.e., the VER field encodes the value 0.

C (1)
   the "Control/Data" bit. Set to 1 by the ITE in SEAL Control Message Protocol (SCMP) control messages, and set to 0 in ordinary data packets.

P (1)
   the "Probe" bit. Set to 1 by the ITE in SEAL probe packets for which it wishes to receive an explicit acknowledgement from the ETE.

I (1)
   the "Identification Included" bit. Set to 1 when the optional Identification field is present.

V (1)
   the "Integrity Check Vector included" bit. Set to 1 when the optional Integrity Check Vector field is present.

R (1)
   the "Redirects Permitted" bit when used by VET (see [I-D.templin-intarea-vet]); reserved for future use in other contexts.

M (1)
   the "More Segments" bit. Set to 1 in a non-final segment and set to 0 in the final segment of the SEAL packet.

Offset (8)
   an 8-bit Offset field. Set to 0 in the first segment of a segmented SEAL packet. Set to an integral number of 256-byte blocks in subsequent segments (e.g., an Offset of 4 indicates a segment that begins at byte offset 1024, i.e., 4 * 256, within the packet).

NEXTHDR (8)
   an 8-bit field that encodes the Internet Protocol number of the next header, in the same way as the IPv4 Protocol and IPv6 Next Header fields.

LINK_ID (5)
   a 5-bit link identification value, set to a unique value by the ITE for each SEAL path over which it will send encapsulated packets to the ETE (up to 32 SEAL paths per ETE are therefore supported). Note that if the ITE's interface connection to the underlying link assigns multiple IP addresses, each address represents a separate SEAL path that must be assigned a separate LINK_ID.

LEVEL (3)
   a 3-bit nesting level, used to limit the number of tunnel nesting levels. Set to an integer value up to 7 in the innermost SEAL encapsulation, and decremented by 1 for each successive additional SEAL encapsulation nesting level. Up to 8 levels of nesting are therefore supported.

Identification (32)
   an optional 32-bit per-packet identification field; present when I==1. Set to a 32-bit value (beginning with 0) that is monotonically incremented for each SEAL packet transmitted to this ETE.

Integrity Check Vector (ICV) (variable)
   an optional variable-length integrity check vector field; present when V==1.

5.4. ITE Specification

5.4.1. Tunnel Interface MTU

The tunnel interface must present a constant MTU value to the inner network layer as the size for admission of inner packets into the interface.
   However, since NBMA tunnel virtual interfaces may support a large set of SEAL paths that accept widely varying maximum packet sizes, a number of factors should be taken into consideration when selecting a tunnel interface MTU.

   Due to the ubiquitous deployment of standard Ethernet and similar networking gear, the nominal Internet cell size has become 1500 bytes. End systems have come to expect that a packet of this size will either be delivered by the network without loss due to an MTU restriction on the path, or will cause a suitable ICMP Packet Too Big (PTB) message to be returned. When large packets sent by end systems incur additional encapsulation at an ITE, however, they may be dropped silently within the tunnel, since the network may not always deliver the necessary PTBs [RFC2923]. The ITE SHOULD therefore set a tunnel interface MTU of at least 1500 bytes.

   The inner network layer protocol consults the tunnel interface MTU when admitting a packet into the interface. For non-SEAL inner IPv4 packets with the IPv4 Don't Fragment (DF) bit cleared (i.e., DF==0), if the packet is larger than the tunnel interface MTU, the inner IPv4 layer uses IPv4 fragmentation to break the packet into fragments no larger than the tunnel interface MTU. The ITE then admits each fragment into the interface as an independent packet.

   For all other inner packets, the inner network layer admits the packet if it is no larger than the tunnel interface MTU; otherwise, it drops the packet and sends a PTB error message to the source with the MTU value set to the tunnel interface MTU. The message contains as much of the invoking packet as possible without the entire message exceeding the network layer minimum MTU size.

   The ITE can alternatively set an indefinite MTU on the tunnel interface such that all inner packets are admitted into the interface regardless of their size.
   For ITEs that host applications that use the tunnel interface directly, this option must be carefully coordinated with protocol stack upper layers, since some upper layer protocols (e.g., TCP) derive their packet sizing parameters from the MTU of the outgoing interface and as such may select too large an initial size. This is not a problem for upper layers that use conservative initial maximum segment size estimates, or when the tunnel interface can reduce the upper layer's maximum segment size, e.g., by reducing the size advertised in the MSS option of outgoing TCP messages (sometimes known as "MSS clamping").

   In light of the above considerations, the ITE SHOULD configure an indefinite MTU on tunnel *router* interfaces so that SEAL performs all subnetwork adaptation from within the interface as specified in Section 5.4.3. The ITE can instead set a smaller MTU on tunnel *host* interfaces, e.g., the maximum of 1500 bytes and the smallest MTU among all of the underlying links minus the size of the encapsulation headers.

5.4.2.  Tunnel Neighbor Soft State

   The tunnel virtual interface maintains a number of soft state variables for each ETE and for each SEAL path.

   When per-packet identification is required, the ITE maintains a per-ETE window of Identification values for the packets it has recently sent to this ETE. The ITE then sets a variable "USE_ID" to TRUE and includes an Identification in each packet it sends to this ETE; otherwise, it sets USE_ID to FALSE.

   When message origin authentication and integrity checking are required, the ITE also includes an ICV in the packets it sends to the ETE. The ICV format is shown in Figure 3:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|Key|Algorithm|       Message Authentication Code (MAC)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  ...
               Figure 3: Integrity Check Vector (ICV) Format

   As shown in the figure, the ICV begins with a 1-octet control field containing a 1-bit (F)lag, a 2-bit Key identifier and a 5-bit Algorithm identifier. The control octet is followed by a variable-length Message Authentication Code (MAC). The ITE maintains a per-ETE algorithm and secret key to calculate the MAC in each packet it will send to this ETE. (By default, the ITE sets the F bit and Algorithm fields to 0 to indicate use of the HMAC-SHA-1 algorithm with a 160-bit shared secret key to calculate an 80-bit MAC per [RFC2104] over the leading 128 bytes of the packet. Other values for F and Algorithm are out of scope.) The ITE then sets a variable "USE_ICV" to TRUE and includes an ICV in each packet it sends to this ETE; otherwise, it sets USE_ICV to FALSE.

   For each SEAL path, the ITE must also account for encapsulation header lengths. The ITE therefore maintains the per-SEAL-path constant values "SHLEN" set to the length of the SEAL header, "THLEN" set to the length of the outer encapsulating transport layer headers (or 0 if outer transport layer encapsulation is not used), "IHLEN" set to the length of the outer IP layer header, and "HLEN" set to (SHLEN+THLEN+IHLEN). (When calculating these lengths, the ITE must use the length of the uncompressed headers even if header compression is enabled.) When SEAL is used in conjunction with another tunnel type such as GRE or IPsec, the length of the headers associated with those tunnels is also included in the HLEN calculation for the first segment only, and the length of the associated trailers is included in the HLEN calculation for the final segment only.

   The ITE maintains a per-SEAL-path variable "MAXMTU" initialized to the maximum of 1500 bytes and the MTU of the underlying link minus HLEN.
   The ITE further sets a variable "MINMTU" to the minimum MTU for the SEAL path over which encapsulated packets will travel. For IPv6 paths, the ITE sets MINMTU=1280 per [RFC2460]. For IPv4 paths, the ITE sets MINMTU=576 based on a practical interpretation of [RFC1122], even though the theoretical MINMTU for IPv4 is only 68 bytes [RFC0791].

   The ITE can also set MINMTU to a larger value if there is reason to believe that the minimum path MTU is larger, or to a smaller value if there is reason to believe the MTU is smaller, e.g., if there may be additional encapsulations on the path. If this value proves too large, the ITE will receive PTB message feedback either from the ETE or from a router on the path, and will then be able to reduce its MINMTU to a smaller value.

   The ITE may instead maintain the packet sizing variables and constants as per-ETE (rather than per-SEAL-path) values. In that case, the values reflect the lowest-common-denominator size across all of the SEAL paths associated with this ETE.

5.4.3.  SEAL Layer Pre-Processing

   The SEAL layer is logically positioned between the inner and outer network protocol layers, where the inner layer is seen as the (true) network layer and the outer layer is seen as the (virtual) data link layer. Each packet to be processed by the SEAL layer is either admitted into the tunnel interface by the inner network layer protocol as described in Section 5.4.1 or is undergoing re-encapsulation from within the tunnel interface. The SEAL layer sees the former class of packets as inner packets that include inner network and transport layer headers, and sees the latter class of packets as transitional SEAL packets that include the outer and SEAL layer headers that were inserted by the previous hop SEAL ITE.
   For these transitional packets, the SEAL layer re-encapsulates the packet with new outer and SEAL layer headers when it forwards the packet to the next hop SEAL ITE.

   The following subsections discuss the SEAL layer pre-processing actions for these two classes of packets.

5.4.3.1.  Inner Packet Pre-Processing

   For each inner packet admitted into the tunnel interface, if the packet is itself a SEAL packet (i.e., one with the port number for SEAL in the transport layer header or one with the protocol number for SEAL in the IP layer header) and the LEVEL field of the SEAL header contains the value 0, the ITE silently discards the packet.

   Otherwise, for non-SEAL IPv4 inner packets with DF==0 in the IP header and IPv6 inner packets with a fragment header and with (MF=0; Offset=0), if the packet is larger than (MINMTU-HLEN), the ITE uses IP fragmentation to fragment the packet into N roughly equal-length pieces, where N is minimized and each fragment is significantly smaller than (MINMTU-HLEN) to allow for additional encapsulations in the path. The ITE then submits each fragment for SEAL encapsulation as specified in Section 5.4.4.

   For all other inner packets, if the packet is no larger than MAXMTU for the corresponding SEAL path, the ITE submits it for SEAL encapsulation as specified in Section 5.4.4. Otherwise, the ITE drops the packet and sends an ordinary PTB message appropriate to the inner protocol version (subject to rate limiting) with the MTU field set to MAXMTU. (For IPv4 SEAL packets with DF==0, the ITE should set DF=1 and re-calculate the IPv4 header checksum before generating the PTB message in order to avoid bogon filters.) After sending the PTB message, the ITE discards the inner packet.

5.4.3.2.  Transitional SEAL Packet Pre-Processing

   For each transitional packet that is to be processed by the SEAL layer from within the tunnel interface, the ITE sets aside the SEAL encapsulation headers that were received from the previous hop. Next, if the packet is no larger than MAXMTU for the next hop SEAL path, the ITE submits it for SEAL encapsulation as specified in Section 5.4.4. Otherwise, the ITE drops the packet and sends an SCMP Packet Too Big (SPTB) message to the previous hop, subject to rate limiting (see Section 5.6.1.1), with the MTU field set to MAXMTU.

   After sending the SPTB message, the ITE discards the packet.

5.4.4.  SEAL Encapsulation and Segmentation

   For each inner packet/fragment submitted for SEAL encapsulation, the ITE next encapsulates the packet in a SEAL header formatted as specified in Section 5.3. The SEAL header includes an Identification field when USE_ID is TRUE, followed by an ICV field when USE_ICV is TRUE.

   The ITE next sets C=0 in the SEAL header. The ITE also sets P=1 if the packet is a probe (see Sections 5.4.6 and 5.4.9); otherwise, the ITE sets P=0.

   The ITE then sets LINK_ID to the value assigned to the underlying SEAL path, and sets NEXTHDR to the protocol number corresponding to the address family of the encapsulated inner packet. For example, the ITE sets NEXTHDR to the value '4' for encapsulated IPv4 packets [RFC2003], '41' for encapsulated IPv6 packets [RFC2473][RFC4213], '80' for encapsulated OSI/CLNP packets [RFC1070], etc.

   Next, if the inner packet is not itself a SEAL packet, the ITE sets LEVEL to an integer value between 0 and 7 as a specification of the number of additional layers of nested SEAL encapsulations permitted. If the inner packet is a SEAL packet that is undergoing nested encapsulation, the ITE instead sets LEVEL to the value that appears in the inner packet's SEAL header minus 1.
   If the inner packet is undergoing SEAL re-encapsulation, the ITE instead copies the LEVEL value from the SEAL header of the packet to be re-encapsulated.

   Next, if the inner packet is no larger than (MINMTU-HLEN) or is larger than 1500 bytes, the ITE sets (M=0; Offset=0). Otherwise, if the path MTU is indeterminate or insufficient (see Section 5.4.6), the ITE breaks the inner packet into N non-overlapping segments (where N is minimized and each segment is significantly smaller than (MINMTU-HLEN) to allow for additional encapsulations in the path). In this process, the ITE MUST ensure that each segment except the final segment contains an integral number of 256-byte blocks, and that the inner packet's network and transport layer headers are included in the first segment. The ITE then prepends a clone of the SEAL header from the first segment onto the head of each additional segment. The ITE MUST also include an Identification field and set USE_ID=TRUE for each segment. The ITE then sets (M=1; Offset=0) in the first segment, sets (M=0/1; Offset=O(1)) in the second segment, sets (M=0/1; Offset=O(2)) in the third segment (if needed), etc., and finally sets (M=0; Offset=O(n)) in the final segment, where O(i) is the number of 256-byte blocks that precede segment i.

   When USE_ID is FALSE, the ITE next sets I=0. Otherwise, the ITE sets I=1 and writes a monotonically-incrementing integer value for this ETE in the Identification field, beginning with 0 in the first packet transmitted. (For SEAL packets that have been split into multiple pieces, the ITE writes the same Identification value in each piece.) The monotonically-incrementing requirement is to satisfy ETEs that use this value for anti-replay purposes. The value is incremented modulo 2^32, i.e., it wraps back to 0 when the previous value was (2^32 - 1).

   When USE_ICV is FALSE, the ITE next sets V=0.
   Otherwise, the ITE sets V=1, includes an ICV and calculates the MAC using HMAC-SHA-1 with a 160-bit secret key and an 80-bit MAC field. Beginning with the SEAL header, the ITE sets the ICV field to 0, calculates the MAC over the leading 128 bytes of the packet (or up to the end of the packet if there are fewer than 128 bytes) and places the result in the MAC field. (For SEAL packets that have been split into multiple pieces, the ITE calculates a separate MAC for each piece.) The ITE then writes the value 0 in the F flag and 0x00 in the Algorithm field of the ICV control octet. (Other values for these fields, and other MAC calculation disciplines, are outside the scope of this document and may be specified in future documents.)

   The ITE then adds the outer encapsulating headers as specified in Section 5.4.5.

5.4.5.  Outer Encapsulation

   Following SEAL encapsulation, the ITE next encapsulates each segment in the requisite outer transport (when necessary) and IP layer headers. When a transport layer header such as UDP or TCP is included, the ITE writes the port number for SEAL in the transport destination service port field.

   When UDP encapsulation is used, the ITE sets the UDP checksum field to zero for IPv4 packets, and also sets the UDP checksum field to zero for IPv6 packets even though IPv6 generally requires UDP checksums. Further considerations for setting the UDP checksum field for IPv6 packets are discussed in [RFC6935][RFC6936].

   The ITE then sets the outer IP layer headers the same as specified for ordinary IP encapsulation (e.g., [RFC1070], [RFC2003], [RFC2473], [RFC4213], etc.), except that for ordinary SEAL packets the ITE copies the "TTL/Hop Limit", "Type of Service/Traffic Class" and "Congestion Experienced" values in the inner network layer header into the corresponding fields in the outer IP header.
   For transitional SEAL packets undergoing re-encapsulation, the ITE instead copies the "TTL/Hop Limit", "Type of Service/Traffic Class" and "Congestion Experienced" values in the outer IP header of the received packet into the corresponding fields in the outer IP header of the packet to be forwarded (i.e., the values are transferred between outer headers and *not* copied from the inner network layer header).

   The ITE also sets the IP protocol number to the appropriate value for the first protocol layer within the encapsulation (e.g., UDP, TCP, SEAL, etc.). When IPv6 is used as the outer IP protocol, the ITE then sets the flow label value in the outer IPv6 header as described in [RFC6438]. When IPv4 is used as the outer IP protocol, the ITE instead sets DF=0 in the IPv4 header to allow the packet to be fragmented if it encounters a restricting link (for IPv6 SEAL paths, the DF bit is implicitly set to 1).

   The ITE finally sends each outer packet via the underlying link corresponding to LINK_ID.

5.4.6.  Path Probing and ETE Reachability Verification

   All SEAL data packets sent by the ITE are considered implicit probes to test for in-the-network fragmentation, and explicit probe packets can be constructed to probe the path MTU. These probes will elicit an SCMP message from the ETE if it needs to send an acknowledgement and/or report an error condition. The probe packets may also be dropped by either the ETE or a router on the path, which may or may not result in an ICMP message being returned to the ITE.

   To generate an explicit probe packet, the ITE creates a duplicate of an actual data packet and uses the duplicate as a probe. (Alternatively, the ITE can create a packet buffer beginning with the same outer headers, SEAL header and inner network layer header that would appear in an ordinary data packet, then pad the packet with random data.)
   The ITE then sets (P=1; C=0) in the SEAL header of the probe packet, and also sets DF=1 in the outer IP header when IPv4 is used.

   The ITE can use explicit probing, e.g., to determine whether SEAL segmentation is still necessary (see Section 5.4.4). If a probe size of (1500+HLEN) bytes succeeds, the ITE can then be assured that the path MTU is large enough that the segmentation/reassembly process can be suspended. This probing method parallels Packetization Layer Path MTU Discovery [RFC4821].

   The ITE processes ICMP messages as specified in Section 5.4.7.

   The ITE processes SCMP messages as specified in Section 5.6.2.

5.4.7.  Processing ICMP Messages

   When the ITE sends SEAL packets, it may receive ICMP error messages [RFC0792][RFC4443] from an ordinary router within the subnetwork. Each ICMP message includes an outer IP header, followed by an ICMP header, followed by a portion of the SEAL data packet that generated the error (also known as the "packet-in-error") beginning with the outer IP header.

   The ITE should process ICMPv4 Protocol Unreachable messages and ICMPv6 Parameter Problem messages with Code "Unrecognized Next Header type encountered" as a hint that the IP destination address does not implement SEAL. The ITE can optionally ignore ICMP messages that do not include sufficient information in the packet-in-error, or process them as a hint that the SEAL path may be failing.

   For other ICMP messages, the ITE should use any available outer header information as a first-pass authentication filter (e.g., to determine if the source of the message is within the same administrative domain as the ITE) and discard the message if first-pass filtering fails.

   Next, the ITE examines the packet-in-error beginning with the SEAL header.
   If the value in the Identification field (if present) is not within the window of packets the ITE has recently sent to this ETE, or if the MAC value in the SEAL header ICV field (if present) is incorrect, the ITE discards the message.

   Next, if the received ICMP message is a PTB, the ITE sets the temporary variable "PMTU" for this SEAL path to the MTU value in the PTB message. If PMTU==0, the ITE consults a plateau table (e.g., as described in [RFC1191]) to determine PMTU based on the length field in the outer IP header of the packet-in-error. For example, if the ITE receives a PTB message with MTU==0 and length 4KB, it can set PMTU=2KB. If the ITE subsequently receives a PTB message with MTU==0 and length 2KB, it can set PMTU=1792, etc., down to a minimum value of PMTU=(1500+HLEN). If the ITE is performing stateful MTU determination for this SEAL path (see Section 5.4.9), the ITE next sets MAXMTU=MAX((PMTU-HLEN), 1500).

   If the ICMP message was not discarded, the ITE then transcribes it into a message to return to the previous hop. If the inner packet was a SEAL data packet, the ITE transcribes the ICMP message into an SCMP message. Otherwise, the ITE transcribes the ICMP message into a message appropriate for the inner protocol version.

   To transcribe the message, the ITE extracts the inner packet from within the ICMP message packet-in-error field and uses it to generate a new message corresponding to the type of the received ICMP message. For SCMP messages, the ITE generates the message the same as described for ETE generation of SCMP messages in Section 5.6.1. For (S)PTB messages, the ITE writes (PMTU-HLEN) in the MTU field.

   The ITE finally forwards the transcribed message to the previous hop toward the inner source address.

5.4.8.  IPv4 Middlebox Reassembly Testing

   The ITE can perform a qualification exchange to ensure that the subnetwork correctly delivers fragments to the ETE. This procedure can be used, e.g., to determine whether there are middleboxes on the path that violate the requirement in Section 5.2.6 of [RFC1812] that "A router MUST NOT reassemble any datagram before forwarding it".

   The ITE should use knowledge of its topological arrangement as an aid in determining when middlebox reassembly testing is necessary. For example, if the ITE is aware that the ETE is located somewhere in the public Internet, middlebox reassembly testing should not be necessary. If the ITE is aware that the ETE is located behind a NAT or a firewall, however, then reassembly testing can be used to detect middleboxes that do not conform to specifications.

   The ITE can perform a middlebox reassembly test either by selecting an ordinary data packet to be used as a prototype for a probe or by creating an explicit probe packet filled with random data. The ITE should select only inner packets that are no larger than (1500-HLEN) bytes for testing purposes, since the most an ordinary node can be expected to reassemble is 1500 bytes.

   To generate a probe, the ITE either creates a clone of an ordinary data packet or creates a packet buffer beginning with the same outer headers, SEAL header and inner network layer header that would appear in an ordinary data packet. The ITE then pads the probe packet with random data to a length that is at least 128 bytes but no longer than (1500-HLEN) bytes.

   The ITE then sets (P=1; C=0) in the SEAL header of the probe packet and sets the NEXTHDR field to the inner network layer protocol type.
   Next, the ITE sets LINK_ID and LEVEL to the appropriate values for this SEAL path, sets Identification and I=1 (when USE_ID is TRUE), and finally calculates the ICV and sets V=1 (when USE_ICV is TRUE).

   The ITE then encapsulates the probe packet in the appropriate outer headers, splits it into two outer IP fragments, and sends both fragments over the same SEAL path.

   The ITE should send a series of probe packets (e.g., 3-5 probes with 1-second intervals between tests) instead of a single isolated probe, in case of packet loss. If the ETE returns an SCMP PTB message with MTU != 0, then the SEAL path correctly supports fragmentation; otherwise, the ITE enables stateful MTU determination for this SEAL path as specified in Section 5.4.9. (A nonzero MTU indicates that the ETE itself received the probe as multiple IP fragments; see Section 5.5.4.)

   (Examples of middleboxes that may perform reassembly include stateful NATs and firewalls. Such devices could still allow for stateless MTU determination if they gather the fragments of a fragmented IPv4 SEAL data packet for packet analysis purposes but then forward the fragments on to the final destination rather than forwarding the reassembled packet.)

5.4.9.  Stateful MTU Determination

   SEAL supports a stateless MTU determination capability; however, the ITE may in some instances wish to impose a stateful MTU limit on a particular SEAL path. For example, when the ETE is situated behind a middlebox that performs IPv4 reassembly (see Section 5.4.8), it is imperative that fragmentation be avoided. In other instances (e.g., when the SEAL path includes performance-constrained links), the ITE may deem it necessary to cache a conservative static MTU in order to avoid sending large packets that would only be dropped due to an MTU restriction somewhere on the path.
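   The middlebox reassembly test in Section 5.4.8 comes down to three decisions: how long to make each probe, how to split it into two outer IPv4 fragments, and how to interpret the SPTB replies. The following is a minimal, non-normative Python sketch that rests on a few assumptions. The helper names (probe_length, split_into_two_fragments, path_supports_fragmentation) are hypothetical. Fragment offsets use the 8-octet units of RFC 791. A single SPTB with a nonzero MTU anywhere in the probe series is taken as sufficient evidence that no middlebox reassembled the fragments.

```python
from typing import List, Optional, Sequence, Tuple

MIN_PROBE_LEN = 128  # lower bound on probe length given in Section 5.4.8


def probe_length(desired: int, hlen: int) -> int:
    """Clamp a desired probe length into [128, 1500 - HLEN] bytes."""
    return max(MIN_PROBE_LEN, min(desired, 1500 - hlen))


def split_into_two_fragments(payload: bytes) -> List[Tuple[int, bool, bytes]]:
    """Split an outer IPv4 payload into exactly two fragments.

    Returns (fragment_offset_in_8_octet_units, more_fragments, data) tuples.
    All fragments but the last must carry a multiple of 8 data octets
    (RFC 791), so the split point is rounded down to an 8-octet boundary.
    """
    split = (len(payload) // 2) // 8 * 8
    if split == 0:
        raise ValueError("payload too short to split into two fragments")
    return [(0, True, payload[:split]), (split // 8, False, payload[split:])]


def path_supports_fragmentation(sptb_mtus: Sequence[Optional[int]]) -> bool:
    """Interpret the replies to a series of fragmented probes.

    Each entry is the MTU field of the SPTB returned for one probe, or None
    if no reply arrived. A nonzero MTU means the ETE itself received the
    probe as fragments, i.e., no middlebox reassembled it on the way.
    (Assumption: one such reply in the series is treated as sufficient.)
    """
    return any(mtu is not None and mtu != 0 for mtu in sptb_mtus)
```

   When path_supports_fragmentation() returns False (every reply was either MTU==0 or missing), the ITE would fall back to the stateful MTU determination described in this section.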
   To determine a static MTU value, the ITE sends a series of probe packets of various sizes to the ETE with P=1 in the SEAL header and DF=1 in the outer IP header. The ITE then caches the size 'S' of the largest packet for which it receives a probe reply from the ETE by setting MAXMTU=MAX((S-HLEN), 1500) for this SEAL path.

   For example, the ITE could send probe packets of 4KB, followed by 2KB, followed by 1792 bytes, etc. While probing, the ITE processes any ICMP PTB message it receives as a potential indication of probe failure and then discards the message.

5.4.10.  Detecting Path MTU Changes

   When stateful MTU determination is used, the ITE SHOULD periodically reset MAXMTU and/or re-probe the path to determine whether MAXMTU has increased. If the path still has a too-small MTU, the ITE will receive a PTB message that reports a smaller size.

5.5.  ETE Specification

5.5.1.  Reassembly Buffer Requirements

   For IPv6, the ETE configures a minimum reassembly buffer size of (1500 + HLEN) bytes for the reassembly of outer IPv6 packets, even though the true minimum reassembly size for IPv6 is only 1500 bytes [RFC2460]. For IPv4, the ETE also configures a minimum reassembly buffer size of (1500 + HLEN) bytes for the reassembly of outer IPv4 packets, even though the true minimum reassembly size for IPv4 is only 576 bytes [RFC1122].

   In addition to this outer reassembly buffer requirement, the ETE further configures a minimum SEAL reassembly buffer size of (1500 + HLEN) bytes for the reassembly of segmented SEAL packets (see Section 5.5.4).

   Note that the value "HLEN" may be variable and initially unknown to the ETE, and would typically range from a few bytes to a few tens of bytes.
   It is therefore RECOMMENDED that the ETE configure slightly larger minimum IP and SEAL reassembly buffer sizes of 2048 bytes (2KB), which may also provide a more natural reassembly buffer allocation size.

5.5.2.  Tunnel Neighbor Soft State

   When message origin authentication and integrity checking are required, the ETE maintains a per-ITE MAC calculation algorithm and a symmetric secret key to verify the MAC. When per-packet identification is required, the ETE also maintains a window of Identification values for the packets it has recently received from this ITE.

   When the tunnel neighbor relationship is bidirectional, the ETE further maintains a per-SEAL-path mapping of outer IP and transport layer addresses to the LINK_ID that appears in packets received from the ITE.

5.5.3.  IP-Layer Reassembly

   The ETE reassembles fragmented IP packets that are explicitly addressed to itself. For IP fragments that are received via a SEAL tunnel, the ETE SHOULD maintain conservative reassembly cache high- and low-water marks. When the size of the reassembly cache exceeds this high-water mark, the ETE SHOULD actively discard stale incomplete reassemblies (e.g., using an Active Queue Management (AQM) strategy) until the size falls below the low-water mark. The ETE SHOULD also actively discard any pending reassemblies that clearly have no opportunity for completion, e.g., when a considerable number of new fragments have arrived before a fragment that completes a pending reassembly arrives.

   The ETE processes non-SEAL IP packets as specified in the normative references, i.e., it performs any necessary IP reassembly, then either discards the packet if it is larger than the reassembly buffer size or delivers the (fully reassembled) packet to the appropriate upper layer protocol module.
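   The high- and low-water mark discipline for the IP-layer reassembly cache can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not a prescribed design. The ReassemblyCache class and its methods are hypothetical. Sizes are tracked in bytes per pending reassembly. Simple oldest-first eviction stands in for whatever AQM strategy an implementation actually uses. The "no opportunity for completion" heuristic is not modeled.

```python
from collections import OrderedDict
from typing import Hashable


class ReassemblyCache:
    """Hypothetical reassembly cache bounded by high/low-water marks.

    When the buffered bytes exceed high_water, the oldest incomplete
    reassemblies are discarded until the total falls below low_water.
    Oldest-first eviction is an assumed stand-in for an AQM strategy.
    """

    def __init__(self, high_water: int, low_water: int) -> None:
        if low_water >= high_water:
            raise ValueError("low_water must be below high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.total = 0
        # reassembly key -> bytes buffered so far; insertion order == age
        self.pending: "OrderedDict[Hashable, int]" = OrderedDict()

    def add_fragment(self, key: Hashable, size: int) -> None:
        """Account for a newly buffered fragment, pruning if needed."""
        if key not in self.pending:
            self.pending[key] = 0
        self.pending[key] += size
        self.total += size
        if self.total > self.high_water:
            self._prune()

    def complete(self, key: Hashable) -> None:
        """Release a reassembly that finished (or was handed upward)."""
        self.total -= self.pending.pop(key, 0)

    def _prune(self) -> None:
        # Discard the oldest incomplete reassemblies until below low-water.
        while self.pending and self.total >= self.low_water:
            _, size = self.pending.popitem(last=False)
            self.total -= size
```

   Keeping the low-water mark well below the high-water mark gives hysteresis: one pruning pass frees enough room that the cache does not thrash on every new fragment.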
   For SEAL packets, the ETE performs any necessary IP reassembly, then submits the packet for SEAL decapsulation as specified in Section 5.5.4. (Note that if the packet is larger than the reassembly buffer size, the ETE still examines the leading portion of the (partially) reassembled packet during decapsulation.)

5.5.4.  Decapsulation, SEAL-Layer Reassembly, and Re-Encapsulation

   For each SEAL packet accepted for decapsulation, when I==1 the ETE first examines the Identification field. If the Identification is not within the window of acceptable values for this ITE, the ETE silently discards the packet.

   Next, if V==1, the ETE SHOULD verify the MAC value (with the MAC field itself reset to 0) and silently discard the packet if the value is incorrect.

   Next, if the packet arrived as multiple IP fragments, the ETE sends an SPTB message back to the ITE with MTU set to the size of the largest fragment received minus HLEN (see Section 5.6.1.1).

   Next, if the packet arrived as multiple IP fragments and the inner packet is larger than 1500 bytes, the ETE silently discards the packet; otherwise, it continues to process the packet.

   Next, if there is an incorrect value in a SEAL header field (e.g., an incorrect "VER" field value), the ETE discards the packet. If the SEAL header has C==0, the ETE also returns an SCMP "Parameter Problem" (SPP) message (see Section 5.6.1.2).

   Next, if the SEAL header has C==1, the ETE processes the packet as an SCMP packet as specified in Section 5.6.2. Otherwise, the ETE continues to process the packet as a SEAL data packet.
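   The decapsulation checks above depend on decoding the SEAL header of Figure 2 and, when V==1, recomputing the MAC. The following minimal Python sketch shows one way to do this for the default algorithm only: F=0 and Algorithm=0, meaning HMAC-SHA-1 with a 160-bit key, truncated to an 80-bit MAC over the leading 128 bytes starting at the SEAL header. The sketch makes several assumptions. The function names are hypothetical. The Key identifier is left at 0. The entire 11-octet ICV (control octet plus MAC) is zeroed while the MAC is computed, following the ITE procedure in Section 5.4.4. Identification window checking and the remaining ETE steps are omitted.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass
from typing import Optional

SEAL_FIXED_LEN = 4     # first 32-bit word of the SEAL header (Figure 2)
ID_LEN = 4             # optional Identification field, present when I==1
MAC_LEN = 10           # 80-bit MAC: HMAC-SHA-1 truncated to 10 octets
ICV_LEN = 1 + MAC_LEN  # control octet + MAC (default algorithm only)
ICV_COVERAGE = 128     # MAC covers the leading 128 bytes from the SEAL header


@dataclass
class SealHeader:
    ver: int
    c: bool
    p: bool
    i: bool
    v: bool
    r: bool
    m: bool
    offset: int
    nexthdr: int
    link_id: int
    level: int
    identification: Optional[int]
    icv_offset: Optional[int]  # byte offset of the ICV control octet, if any


def parse_seal_header(pkt: bytes) -> SealHeader:
    """Decode a packet that begins with the SEAL header."""
    if len(pkt) < SEAL_FIXED_LEN:
        raise ValueError("truncated SEAL header")
    b0, offset, nexthdr, b3 = pkt[0], pkt[1], pkt[2], pkt[3]
    i_bit, v_bit = bool(b0 & 0x08), bool(b0 & 0x04)
    pos = SEAL_FIXED_LEN
    ident = None
    if i_bit:
        if len(pkt) < pos + ID_LEN:
            raise ValueError("truncated Identification field")
        (ident,) = struct.unpack_from("!I", pkt, pos)
        pos += ID_LEN
    icv_offset = None
    if v_bit:
        if len(pkt) < pos + ICV_LEN:
            raise ValueError("truncated ICV field")
        icv_offset = pos
    return SealHeader(
        ver=b0 >> 6, c=bool(b0 & 0x20), p=bool(b0 & 0x10), i=i_bit, v=v_bit,
        r=bool(b0 & 0x02), m=bool(b0 & 0x01), offset=offset, nexthdr=nexthdr,
        link_id=b3 >> 3, level=b3 & 0x07, identification=ident,
        icv_offset=icv_offset)


def compute_mac(pkt: bytes, icv_offset: int, key: bytes) -> bytes:
    """HMAC-SHA-1-80 over the leading 128 bytes with the whole ICV zeroed."""
    scratch = bytearray(pkt[:ICV_COVERAGE])
    end = min(icv_offset + ICV_LEN, len(scratch))
    scratch[icv_offset:end] = bytes(end - icv_offset)
    return hmac.new(key, bytes(scratch), hashlib.sha1).digest()[:MAC_LEN]


def insert_icv(pkt: bytearray, key: bytes) -> None:
    """ITE side: fill in the ICV of a packet whose V bit is already set."""
    hdr = parse_seal_header(bytes(pkt))
    if hdr.icv_offset is None:
        raise ValueError("V bit is not set")
    off = hdr.icv_offset
    pkt[off] = 0  # F=0, Algorithm=0; Key identifier 0 is an assumption
    pkt[off + 1:off + ICV_LEN] = compute_mac(bytes(pkt), off, key)


def verify_icv(pkt: bytes, key: bytes) -> bool:
    """ETE side: return False if V==1 and the MAC does not match."""
    hdr = parse_seal_header(pkt)
    if hdr.icv_offset is None:
        return True  # V==0: no ICV to check (local policy decides if OK)
    off = hdr.icv_offset
    received = pkt[off + 1:off + ICV_LEN]
    return hmac.compare_digest(received, compute_mac(pkt, off, key))
```

   Because only the leading 128 bytes are covered, a modification beyond that point is not detected by the MAC. This is a property of the default discipline described above, not of the sketch.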
Next, if the SEAL header has (M==1 || Offset!=0), the ETE checks whether the other segments of this already-segmented SEAL packet have arrived, i.e., it looks for additional segments that have the same outer IP source address, destination address, source transport port number (if present), and SEAL Identification value. If the other segments have already arrived, the ETE discards the SEAL header and other outer headers from the non-initial segments and appends them onto the end of the first segment according to their offset values. Otherwise, the ETE caches the segment for at most 60 seconds while awaiting the arrival of its partners. During this process, the ETE discards any segments that overlap segments that have already been received. It also discards any segments that have M==1 in the SEAL header but do not contain an integral number of 256-byte blocks. The ETE SHOULD further manage the SEAL reassembly cache in the same way as described for the IP-layer reassembly cache in Section 5.5.3, i.e., it SHOULD perform an early discard of any pending reassemblies that have a low probability of completion.

Next, if the SEAL header in the (reassembled) packet has P==1, the ETE drops the packet unconditionally. If it has not already sent an SPTB message based on IP fragmentation, it also sends an SPTB message back to the ITE with MTU=0 (see Section 5.6.1.1). (Note that the ETE therefore sends only a single SPTB message for a probe packet that also experienced IP fragmentation.)

Finally, the ETE discards the outer headers and processes the inner packet according to the header type indicated in the SEAL NEXTHDR field.
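The SEAL-layer segment reassembly step described above can be sketched as a small buffer. It is keyed on (outer source, outer destination, source port or None, Identification) and applies the overlap, 256-byte-block, and 60-second rules. The sketch rests on these assumptions:

- The class name and method signatures are hypothetical.
- Offsets are taken as byte offsets; converting from the on-the-wire Offset field is left out.
- The new segment is the one discarded on overlap.
- The 60-second limit is measured from the first segment's arrival.

```python
import time
from typing import Dict, Optional, Tuple

BLOCK = 256      # non-final (M==1) segments must be a multiple of this
MAX_HOLD = 60.0  # seconds a partial SEAL reassembly may stay cached

# (outer src, outer dst, source port or None, SEAL Identification)
Key = Tuple[str, str, Optional[int], int]
Segs = Dict[int, Tuple[bytes, bool]]  # byte offset -> (payload, M flag)


class SegmentReassembler:
    """Hypothetical SEAL-layer reassembly buffer (names assumed)."""

    def __init__(self) -> None:
        # key -> (first-arrival time, segments received so far)
        self.pending: Dict[Key, Tuple[float, Segs]] = {}

    def add(self, key: Key, byte_offset: int, more: bool, payload: bytes,
            now: Optional[float] = None) -> Optional[bytes]:
        """Add one segment; return the reassembled payload once complete."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if not payload or (more and len(payload) % BLOCK):
            return None  # empty, or M==1 without whole 256-byte blocks
        _, segs = self.pending.setdefault(key, (now, {}))
        end = byte_offset + len(payload)
        for off, (data, _more) in segs.items():
            if byte_offset < off + len(data) and off < end:
                return None  # overlaps a segment already received
        segs[byte_offset] = (payload, more)
        return self._try_complete(key, segs)

    def _try_complete(self, key: Key, segs: Segs) -> Optional[bytes]:
        # Walk contiguous segments from offset 0 until one with M==0.
        out, pos = bytearray(), 0
        while pos in segs:
            data, more = segs[pos]
            out += data
            pos += len(data)
            if not more:
                del self.pending[key]
                return bytes(out)
        return None

    def _expire(self, now: float) -> None:
        stale = [k for k, (t, _) in self.pending.items() if now - t > MAX_HOLD]
        for k in stale:
            del self.pending[k]
```

The reassembled bytes returned here correspond to the inner data that the ETE appends onto the first segment. A real implementation would also apply the early-discard policy of Section 5.5.3 to this buffer.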
If the next hop toward the inner destination address is via a different interface than the one on which the SEAL packet arrived, the ETE discards the SEAL header and delivers the inner packet either to the local host or, if the packet is not destined to the local host, to the next-hop interface.

If the next hop is on the same interface on which the SEAL packet arrived, however, the ETE submits the packet for SEAL re-encapsulation beginning with the specification in Section 5.4.3 above, without decrementing the value in the inner (TTL / Hop Limit) field. In this process, the packet remains within the tunnel (i.e., it does not exit and then re-enter the tunnel); hence, the packet is not discarded if the LEVEL field in the SEAL header contains the value 0.

5.6. The SEAL Control Message Protocol (SCMP)

SEAL provides a companion SEAL Control Message Protocol (SCMP) that uses the same message types and formats as the Internet Control Message Protocol for IPv6 (ICMPv6) [RFC4443]. As for ICMPv6, each SCMP message includes a 32-bit header and a variable-length body. The sending TE encapsulates the SCMP message in a SEAL header and outer headers as shown in Figure 4:

                                +--------------------+
                                ~   outer IP header  ~
                                +--------------------+
                                ~  other outer hdrs  ~
                                +--------------------+
                                ~    SEAL Header     ~
     +--------------------+     +--------------------+
     | SCMP message header| --> | SCMP message header|
     +--------------------+     +--------------------+
     |                    | --> |                    |
     ~ SCMP message body  ~ --> ~ SCMP message body  ~
     |                    | --> |                    |
     +--------------------+     +--------------------+

          SCMP Message                SCMP Packet
      before encapsulation        after encapsulation

               Figure 4: SCMP Message Encapsulation

The following sections specify the generation, processing, and relaying of SCMP messages.

5.6.1. Generating SCMP Error Messages

ETEs generate SCMP error messages in response to receiving certain SEAL data packets, using the format shown in Figure 5:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Type-Specific Data                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     As much of the invoking SEAL data packet as possible      |
   ~       (beginning with the SEAL header) without the SCMP       ~
   |               packet exceeding MINMTU bytes (*)               |

   (*) also known as the "packet-in-error"

                  Figure 5: SCMP Error Message Format

The error message begins with the 32-bit SCMP message header. A 32-bit Type-Specific Data field follows it. Then comes the leading portion of the invoking SEAL data packet, beginning with the SEAL header, as the "packet-in-error". The packet-in-error includes as much of the invoking packet as possible, up to a length that would not cause the entire SCMP packet to exceed MINMTU bytes after outer encapsulation.

When the ETE processes a SEAL data packet for which the Identification and ICV values are correct but an error must be returned, it prepares an SCMP error message as shown in Figure 5. The ETE sets the Type and Code fields to the same values that would appear in the corresponding ICMPv6 message [RFC4443]. However, it calculates the Checksum beginning with the SCMP message header, using the algorithm specified for ICMPv4 in [RFC0792].
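The Checksum rule and packet-in-error sizing described above can be illustrated with a short sketch. The checksum is the standard ICMPv4 one from [RFC0792]: the 16-bit one's complement of the one's complement sum of the message, computed with the Checksum field set to zero. The sketch rests on these assumptions:

- The function names are hypothetical.
- `encap_len` is an assumed parameter for the combined length of the outer and SEAL headers.
- The Type value 2 in the usage example is borrowed from the ICMPv6 Packet Too Big message [RFC4443].

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (ICMPv4 style)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with zero
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_scmp_error(msg_type: int, code: int, type_specific: int,
                     invoking: bytes, minmtu: int, encap_len: int) -> bytes:
    """Build a Figure 5 message whose encapsulated size stays within MINMTU.

    encap_len (hypothetical) is the length of the outer + SEAL headers
    that will be prepended during encapsulation.
    """
    room = minmtu - encap_len - 8            # 8 = SCMP header + Type-Specific Data
    packet_in_error = invoking[:max(room, 0)]
    body = struct.pack("!BBHI", msg_type, code, 0, type_specific) + packet_in_error
    csum = internet_checksum(body)           # computed with Checksum field = 0
    return body[:2] + struct.pack("!H", csum) + body[4:]
```

A receiver can verify the message by running `internet_checksum` over the whole SCMP message, Checksum field included. A correct message yields 0.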
During encapsulation, the ETE sets the outer destination address/port numbers of the SCMP packet to the values associated with the ITE and sets the outer source address/port numbers to its own outer address/port numbers.

The ETE then sets (C=1; P=0; M=0; Offset=0) in the SEAL header and sets I, V, NEXTHDR, and LEVEL to the same values that appeared in the SEAL header of the data packet. If the neighbor relationship between the ITE and ETE is unidirectional, the ETE next sets the LINK_ID field to the same value that appeared in the SEAL header of the data packet. Otherwise, the ETE sets the LINK_ID field to the value it would use in sending a SEAL packet to this ITE.

When I==1, the ETE next sets the Identification field to an appropriate value for the ITE. If the neighbor relationship between the ITE and ETE is unidirectional, the ETE sets the Identification field to the same value that appeared in the SEAL header of the data packet. Otherwise, the ETE sets the Identification field to the value it would use in sending the next SEAL packet to this ITE.

When V==1, the ETE then prepares the ICV field as specified for SEAL data packet encapsulation in Section 5.4.4.

Finally, the ETE sends the resulting SCMP packet to the ITE as specified for SEAL data packets in Section 5.4.5.

The following sections describe additional considerations for various SCMP error messages.

5.6.1.1. Generating SCMP Packet Too Big (SPTB) Messages

An ETE generates an SPTB message when it receives a SEAL data packet that arrived as multiple outer IP fragments. The ETE prepares the SPTB message in the same way as the corresponding ICMPv6 PTB message and writes the length of the largest outer IP fragment received minus HLEN in the MTU field of the message.
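The SEAL header rules that Section 5.6.1 gives for an SCMP error message can be summarized in a sketch. It derives the error's header from the invoking data packet's header. The sketch rests on these assumptions:

- The `SealHeader` type and the parameter names are hypothetical.
- The header is modeled as an immutable record of only the fields named in the text.
- ICV preparation and wire encoding are omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SealHeader:
    """Hypothetical view of the SEAL header fields named in Section 5.6.1."""
    c: int
    p: int
    m: int
    offset: int
    i: int
    v: int
    nexthdr: int
    level: int
    link_id: int
    identification: Optional[int] = None


def scmp_error_header(data_hdr: SealHeader, bidirectional: bool,
                      my_link_id: int = 0, my_next_id: int = 0) -> SealHeader:
    """Derive the SEAL header of an SCMP error from the invoking data header.

    my_link_id / my_next_id are the values this ETE would use when sending
    its own SEAL packets to the ITE (bidirectional case only).
    """
    # C=1, P=0, M=0, Offset=0; I, V, NEXTHDR and LEVEL are copied unchanged.
    hdr = replace(data_hdr, c=1, p=0, m=0, offset=0)
    if bidirectional:
        hdr = replace(hdr, link_id=my_link_id)  # else keep the data LINK_ID
    if hdr.i == 1:
        ident = my_next_id if bidirectional else data_hdr.identification
        hdr = replace(hdr, identification=ident)
    return hdr
```

In the unidirectional case, the copied LINK_ID and Identification let the ITE match the error against the packet it sent. In the bidirectional case, the ETE uses its own sending state instead.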
The ETE also generates an SPTB message when it accepts a SEAL probe packet, i.e., one with P==1 in the SEAL header. The ETE prepares the SPTB message as above, except that it writes the value 0 in the MTU field.

5.6.1.2. Generating Other SCMP Error Messages

An ETE generates an SCMP "Destination Unreachable" (SDU) message under the same circumstances that an IPv6 system would generate an ICMPv6 Destination Unreachable message.

An ETE generates an SCMP "Parameter Problem" (SPP) message when it receives a SEAL packet with an incorrect value in the SEAL header.

TEs generate other SCMP message types using methods and procedures specified in other documents. For example, SCMP message types used for tunnel neighbor coordination are specified in VET [I-D.templin-intarea-vet].

5.6.2. Processing SCMP Error Messages

An ITE may receive SCMP messages with C==1 in the SEAL header after sending packets to an ETE. The ITE validates each such message as follows:

- It first verifies that the outer addresses of the SCMP packet are correct and (when I==1) that the Identification field contains an acceptable value.
- It next verifies that the SEAL header fields are set correctly as specified in Section 5.6.1.
- When V==1, it then verifies the ICV.
- It next verifies the Checksum value in the SCMP message header.

If any of these values are incorrect, the ITE silently discards the message; otherwise, it processes the message as described below.

5.6.2.1. Processing SCMP PTB Messages

After an ITE sends a SEAL packet to an ETE, it may receive an SPTB message with a packet-in-error containing the leading portion of the packet (see Section 5.6.1.1). For SPTB messages with MTU==0, the ITE processes the message as confirmation that the ETE received a SEAL probe packet with P==1 in the SEAL header. The ITE then discards the message.
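The two SPTB MTU rules of Section 5.6.1.1 can be sketched together with the ITE's MTU==0 test just described. Fragmentation takes precedence over the probe rule because, per Section 5.5.4, a probe that was also fragmented triggers only the fragmentation-based SPTB. The sketch rests on these assumptions:

- The function names and return strings are hypothetical.
- The HLEN value in the usage example is arbitrary.

```python
from typing import List


def sptb_mtu(fragment_sizes: List[int], is_probe: bool, hlen: int) -> int:
    """MTU value an ETE writes into an SPTB message (Section 5.6.1.1)."""
    if len(fragment_sizes) > 1:
        # Arrived as multiple outer IP fragments: largest fragment - HLEN.
        return max(fragment_sizes) - hlen
    if is_probe:
        return 0  # unfragmented P==1 probe: MTU field carries 0
    raise ValueError("no SPTB is generated for this packet")


def classify_sptb(mtu: int) -> str:
    """How an ITE interprets a validated SPTB (Section 5.6.2.1)."""
    return "probe-confirmed" if mtu == 0 else "size-limitation"
```

For example, a probe that arrived as fragments of 1400 and 200 bytes, with an assumed HLEN of 40, produces a single SPTB reporting 1360, not 0.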
For SPTB messages with MTU!=0, the ITE processes the message as an indication of a packet size limitation as follows. If the inner packet is no larger than 1500 bytes, the ITE reduces its MINMTU value for this ETE. If the inner packet is larger than 1500 bytes and the MTU value is not substantially less than MINMTU bytes, the value is likely to reflect the true MTU of the restricting link on the path to the ETE. Otherwise, a router on the path may be generating runt fragments.

In the latter case, the ITE can consult a plateau table (e.g., as described in [RFC1191]) to rewrite the MTU value to a reduced size. For example, if the ITE receives an IPv4 SPTB message with MTU==256 and inner packet length 4KB, it can rewrite the MTU to 2KB. If the ITE subsequently receives an IPv4 SPTB message with MTU==256 and inner packet length 2KB, it can rewrite the MTU to 1792. It can continue stepping down in this way to a minimum of 1500 bytes. If the ITE is performing stateful MTU determination for this SEAL path, it then writes the new MTU value minus HLEN in MAXMTU.

The ITE then checks its forwarding tables to discover the previous hop toward the source address of the inner packet. If the previous hop is reached via the same tunnel interface on which the SPTB message arrived, the ITE relays the message to the previous hop. To relay the message, the ITE first writes zero in the Identification and ICV fields of the SEAL header within the packet-in-error. The ITE next rewrites the outer SEAL header fields with values corresponding to the previous hop and recalculates the MAC using the MAC calculation parameters associated with the previous hop.
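The runt-fragment check and plateau-table rewrite described earlier in this section can be sketched as follows. The sketch rests on these assumptions:

- The plateau values are hypothetical. They were chosen only to reproduce the 4KB to 2KB to 1792 example above and are not the table from [RFC1191].
- `runt_ratio` is an assumed threshold for "substantially less than MINMTU", which this document leaves undefined.
- The function names are illustrative.

```python
from bisect import bisect_left

# Hypothetical plateau values in bytes (ascending), chosen only to
# reproduce the worked example in the text; not the [RFC1191] table.
PLATEAUS = [1500, 1792, 2048, 4096, 8192]


def next_plateau(inner_len: int) -> int:
    """Largest plateau strictly below inner_len, floored at 1500 bytes."""
    idx = bisect_left(PLATEAUS, inner_len)  # PLATEAUS[:idx] are all < inner_len
    return PLATEAUS[idx - 1] if idx > 0 else PLATEAUS[0]


def rewrite_sptb_mtu(mtu: int, inner_len: int, minmtu: int,
                     runt_ratio: float = 0.5) -> int:
    """MTU an ITE might act on for an SPTB with MTU != 0 (sketch).

    runt_ratio is a hypothetical threshold for "substantially less
    than MINMTU"; this document does not define one.
    """
    if inner_len <= 1500:
        return mtu  # caller reduces its MINMTU for this ETE instead
    if mtu >= minmtu * runt_ratio:
        return mtu  # plausibly the true MTU of the restricting link
    return next_plateau(inner_len)  # likely runt fragments: step down
```

Stepping down one plateau per report converges toward the 1500-byte floor without trusting an implausibly small reported MTU.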
Next, the ITE replaces the SPTB's outer headers with headers of the appropriate protocol version and fills in the header fields as specified in Section 5.4.5. The destination address/port correspond to the previous hop and the source address/port correspond to the ITE. The ITE then sends the message to the previous hop as if it were issuing a new SPTB message. (Note that, in this process, the values within the SEAL header of the packet-in-error are meaningless to the previous hop and therefore cannot be used by the previous hop for authentication purposes.)

If the previous hop is not reached via the same tunnel interface, the ITE instead transcribes the message into a format appropriate for the inner packet (i.e., as described for transcribing ICMP messages in Section 5.4.7). It then sends the resulting transcribed message to the original source. (NB: if the inner packet within the SPTB message is an IPv4 SEAL packet with DF==0, the ITE should set DF=1 and recalculate the IPv4 header checksum while transcribing the message in order to avoid bogon filters.) The ITE then discards the SPTB message.

Note that the ITE may receive an SPTB message from another ITE that is at the head end of a nested level of encapsulation. The ITE has no security associations with this nested ITE; hence, it should treat this SPTB message as if it had received an ICMP PTB message from an ordinary router on the path to the ETE. That is, the ITE should examine the packet-in-error field of the SPTB message and process the message only if it recognizes the packet as one it previously sent.

5.6.2.2. Processing Other SCMP Error Messages

An ITE may receive an SDU message with an appropriate code under the same circumstances that an IPv6 node would receive an ICMPv6 Destination Unreachable message.
The ITE either transcribes or relays the message toward the source address of the inner packet within the packet-in-error, as specified for SPTB messages in Section 5.6.2.1.

An ITE may receive an SPP message when the ETE receives a SEAL packet with an incorrect value in the SEAL header. The ITE should examine the SEAL header within the packet-in-error to determine whether a different setting should be used in subsequent packets, but it does not relay the message further.

TEs process other SCMP message types using methods and procedures specified in other documents. For example, SCMP message types used for tunnel neighbor coordination are specified in VET [I-D.templin-intarea-vet].

6. Link Requirements

Subnetwork designers are expected to follow the recommendations in Section 2 of [RFC3819] when configuring link MTUs.

7. End System Requirements

End systems are encouraged to implement end-to-end MTU assurance (e.g., using Packetization Layer Path MTU Discovery (PLPMTUD) per [RFC4821]) even if the subnetwork is using SEAL.

When end systems use PLPMTUD, SEAL will ensure that the tunnel behaves as a link in the path that assures an MTU of at least 1500 bytes while not precluding discovery of larger MTUs. The PLPMTUD mechanism will therefore be able to function as designed to discover and use larger MTUs.

8. Router Requirements

Routers within the subnetwork are expected to observe the standard IP router requirements, including the implementation of IP fragmentation and reassembly as well as the generation of ICMP messages [RFC0792][RFC1122][RFC1812][RFC2460][RFC4443][RFC6434].
Note that, even when routers support existing requirements for the generation of ICMP messages, these messages are often filtered and discarded by middleboxes on the path to the original source of the packet that triggered the ICMP message. It is therefore not possible to assume delivery of ICMP messages even when routers are correctly implemented.

9. Nested Encapsulation Considerations

SEAL supports nested tunneling for up to 8 layers of encapsulation. In this model, the SEAL ITE has a tunnel neighbor relationship only with ETEs at its own nesting level, i.e., it does not have a tunnel neighbor relationship with other ITEs, nor with ETEs at other nesting levels.

Therefore, when an ITE 'A' within an outer nesting level needs to return an error message to an ITE 'B' within an inner nesting level, it generates an ordinary ICMP error message, just as if it were an ordinary router within the subnetwork. 'B' can then perform message validation as specified in Section 5.4.7, but full message origin authentication is not possible.

Since ordinary ICMP messages are used for coordination between ITEs at different nesting levels, nested SEAL encapsulations should be used only when the ITEs are within a common administrative domain and/or when there is no ICMP-filtering middlebox such as a firewall or NAT between them. An example would be a recursive nesting of mobile networks, where the first network receives service from an ISP, the second network receives service from the first network, the third network receives service from the second network, and so on.

NB: As an alternative, the SCMP protocol could be extended to allow ITE 'A' to return an SCMP message to ITE 'B' rather than an ICMP message.
This would conceptually allow the control messages to pass through firewalls and NATs; however, it would provide no more message origin authentication assurance than ordinary ICMP messages. It was therefore determined that extending the SCMP protocol would add complexity of little value within the context of the anticipated use cases for nested encapsulations.

10. Reliability Considerations

Although a SEAL tunnel may span an arbitrarily large subnetwork expanse, the IP layer sees the tunnel as a simple link that supports the IP service model. Links with high bit error rates (BERs), e.g., IEEE 802.11, use Automatic Repeat reQuest (ARQ) mechanisms [RFC3366] to increase packet delivery ratios, while links with much lower BERs typically omit such mechanisms. SEAL tunnels may traverse arbitrarily long paths over links of various types that already either perform or omit ARQ as appropriate. It would therefore be inefficient to require the tunnel endpoints to also perform ARQ.

11. Integrity Considerations

The SEAL header includes an integrity check field that covers the SEAL header and at least the inner packet headers. This provides header integrity verification on a segment-by-segment basis for a segmented re-encapsulating tunnel path.

Fragmentation and reassembly schemes must also consider packet-splicing errors, e.g., when two fragments from the same packet are concatenated incorrectly or when a fragment from packet X is reassembled with fragments from packet Y. The primary sources of such errors include implementation bugs and wrapping IPv4 ID fields.

In particular, the IPv4 16-bit ID field can wrap with only 64K packets with the same (src, dst, protocol)-tuple alive in the system at a given time [RFC4963].
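To give a sense of scale for the 64K-packet wrap described above, the time for one (src, dst, protocol) flow to exhaust the 16-bit ID space is simple arithmetic. Multiply 2^16 packets by the packet size in bits and divide by the sending rate. The function below is a hypothetical illustration, and the rates and packet size are example inputs, not measurements.

```python
def id_wrap_seconds(rate_bps: float, packet_bytes: int, id_bits: int = 16) -> float:
    """Seconds for one (src, dst, protocol) flow to send 2**id_bits packets.

    After this interval an ID value can repeat while an earlier packet
    carrying the same ID may still be awaiting reassembly.
    """
    return (2 ** id_bits) * packet_bytes * 8 / rate_bps
```

At 1 Gb/s with 1500-byte packets this gives 65536 x 1500 x 8 / 10^9, or about 0.79 seconds. That is far shorter than the 60-second hold time this document allows for pending SEAL segment reassemblies.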
When the IPv4 ID field is rewritten by a middlebox such as a NAT or firewall, ID field wrapping can occur with even fewer packets alive in the system. It is therefore essential that IPv4 fragmentation and reassembly be avoided.

12. IANA Considerations

The IANA is requested to allocate a User Port number for "SEAL" in the 'port-numbers' registry. The Service Name is "SEAL", and the Transport Protocols are TCP and UDP. The Assignee is the IESG (iesg@ietf.org) and the Contact is the IETF Chair (chair@ietf.org). The Description is "Subnetwork Encapsulation and Adaptation Layer (SEAL)", and the Reference is the RFC-to-be currently known as 'draft-templin-intarea-seal'.

13. Security Considerations

SEAL provides a segment-by-segment message origin authentication, integrity, and anti-replay service. The SEAL header is sent in the clear, the same as the outer IP and other outer headers. In this respect, the threat model is no different than for IPv6 extension headers. Unlike IPv6 extension headers, however, the SEAL header can be protected by an integrity check that also covers the inner packet headers.

An amplification/reflection/buffer overflow attack is possible when an attacker sends IP fragments with spoofed source addresses to an ETE. The attacker's aim is to clog the ETE's reassembly buffer and/or cause the ETE to generate a stream of SCMP messages returned to a victim ITE. The SCMP message ICV and Identification fields, as well as the inner headers of the packet-in-error, provide mitigation by allowing spoofed SEAL segments to be detected and discarded.

Security issues that apply to tunneling in general are discussed in [RFC6169].

14. Related Work

Section 3.1.7 of [RFC2764] provides a high-level sketch for supporting large tunnel MTUs via a tunnel-level segmentation and reassembly capability to avoid IP-level fragmentation.

Section 3 of [RFC4459] describes inner and outer fragmentation at the tunnel endpoints as alternatives for accommodating the tunnel MTU.

Section 4 of [RFC2460] specifies a method for inserting and processing extension headers between the base IPv6 header and transport layer protocol data. The SEAL header is inserted and processed in exactly the same manner.

IPsec/AH [RFC4301][RFC4302] is used for full message integrity verification between tunnel endpoints, whereas SEAL only ensures integrity for the inner packet headers. The AYIYA proposal [I-D.massar-v6ops-ayiya] uses similar means for providing message authentication and integrity.

SEAL and the Virtual Enterprise Traversal (VET) [I-D.templin-intarea-vet] tunnel virtual interface abstraction are the functional building blocks for two architectures: the Interior Routing Overlay Network (IRON) [I-D.templin-ironbis] and Routing and Addressing in Networks with Global Enterprise Recursion (RANGER) [RFC5720][RFC6139].

The concepts of path MTU determination through the report of fragmentation and of extending the IPv4 Identification field were first proposed in deliberations of the TCP-IP mailing list and the Path MTU Discovery Working Group (MTUDWG) during the late 1980s and early 1990s. A historical analysis of the evolution of these concepts, as well as the development of the eventual PMTUD mechanism, appears in [RFC5320].

15. Implementation Status

An early implementation of the first revision of SEAL [RFC5320] is available at http://isatap.com/seal.

16. Acknowledgments

The following individuals are acknowledged for helpful comments and suggestions: Jari Arkko, Fred Baker, Iljitsch van Beijnum, Oliver Bonaventure, Teco Boot, Bob Braden, Brian Carpenter, Steve Casner, Ian Chakeres, Noel Chiappa, Remi Denis-Courmont, Remi Despres, Ralph Droms, Aurnaud Ebalard, Gorry Fairhurst, Washam Fan, Dino Farinacci, Joel Halpern, Sam Hartman, John Heffner, Thomas Henderson, Bob Hinden, Christian Huitema, Eliot Lear, Darrel Lewis, Joe Macker, Matt Mathis, Erik Nordmark, Dan Romascanu, Dave Thaler, Joe Touch, Mark Townsley, Ole Troan, Margaret Wasserman, Magnus Westerlund, Robin Whittle, James Woodyatt, and members of the Boeing Research & Technology NST DC&NT group.

Discussions with colleagues following the publication of [RFC5320] have provided useful insights that have resulted in significant improvements to this, the Second Edition of SEAL.

This document received substantial review input from the IESG and IETF area directorates in the February 2013 timeframe. IESG members and IETF area directorate representatives who contributed helpful comments and suggestions are gratefully acknowledged. Discussions on the IETF IPv6 and Intarea mailing lists in the June 2013 timeframe also stimulated several useful ideas.

Path MTU determination through the report of fragmentation was first proposed by Charles Lynn on the TCP-IP mailing list in 1987. Extending the IP identification field was first proposed by Steve Deering on the MTUDWG mailing list in 1989.

17. References

17.1. Normative References

[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, September 1981.

[RFC1122] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure Neighbor Discovery (SEND)", RFC 3971, March 2005.

[RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 4443, March 2006.

[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, September 2007.

17.2. Informative References

[FOLK] Shannon, C., Moore, D., and k. claffy, "Beyond Folklore: Observations on Fragmented Traffic", December 2002.

[FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", October 1987.

[I-D.massar-v6ops-ayiya] Massar, J., "AYIYA: Anything In Anything", draft-massar-v6ops-ayiya-02 (work in progress), July 2004.

[I-D.taylor-v6ops-fragdrop] Jaeggli, J., Colitti, L., Kumari, W., Vyncke, E., Kaeo, M., and T. Taylor, "Why Operators Filter Fragments and What It Implies", draft-taylor-v6ops-fragdrop-01 (work in progress), June 2013.

[I-D.templin-intarea-vet] Templin, F., "Virtual Enterprise Traversal (VET)", draft-templin-intarea-vet-40 (work in progress), May 2013.

[I-D.templin-ironbis] Templin, F., "The Interior Routing Overlay Network (IRON)", draft-templin-ironbis-15 (work in progress), May 2013.

[RFC0994] International Organization for Standardization (ISO) and American National Standards Institute (ANSI), "Final text of DIS 8473, Protocol for Providing the Connectionless-mode Network Service", RFC 994, March 1986.

[RFC1063] Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP MTU discovery options", RFC 1063, July 1988.

[RFC1070] Hagens, R., Hall, N., and M. Rose, "Use of the Internet as a subnetwork for experimentation with the OSI network layer", RFC 1070, February 1989.

[RFC1146] Zweig, J. and C. Partridge, "TCP alternate checksum options", RFC 1146, March 1990.

[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 1701, October 1994.

[RFC1812] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995.

[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, October 1996.

[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997.

[RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 Specification", RFC 2473, December 1998.

[RFC2675] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", RFC 2675, August 1999.

[RFC2764] Gleeson, B., Heinanen, J., Lin, A., Armitage, G., and A. Malis, "A Framework for IP Based Virtual Private Networks", RFC 2764, February 2000.

[RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines For Values In the Internet Protocol and Related Headers", BCP 37, RFC 2780, March 2000.

[RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing", BCP 38, RFC 2827, May 2000.

[RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923, September 2000.

[RFC3232] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On-line Database", RFC 3232, January 2002.
[RFC3366] Fairhurst, G. and L. Wood, "Advice to link designers on link Automatic Repeat reQuest (ARQ)", BCP 62, RFC 3366, August 2002.

[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004.

[RFC4191] Draves, R. and D. Thaler, "Default Router Preferences and More-Specific Routes", RFC 4191, November 2005.

[RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", RFC 4213, October 2005.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4302] Kent, S., "IP Authentication Header", RFC 4302, December 2005.

[RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-Network Tunneling", RFC 4459, April 2006.

[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

[RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly Errors at High Data Rates", RFC 4963, July 2007.

[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", RFC 4987, August 2007.

[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC5320] Templin, F., "The Subnetwork Encapsulation and Adaptation Layer (SEAL)", RFC 5320, February 2010.

[RFC5445] Watson, M., "Basic Forward Error Correction (FEC) Schemes", RFC 5445, March 2009.

[RFC5720] Templin, F., "Routing and Addressing in Networks with Global Enterprise Recursion (RANGER)", RFC 5720, February 2010.
[RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

[RFC6139] Russert, S., Fleischman, E., and F. Templin, "Routing and Addressing in Networks with Global Enterprise Recursion (RANGER) Scenarios", RFC 6139, February 2011.

[RFC6169] Krishnan, S., Thaler, D., and J. Hoagland, "Security Concerns with IP Tunneling", RFC 6169, April 2011.

[RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. Cheshire, "Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry", BCP 165, RFC 6335, August 2011.

[RFC6434] Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node Requirements", RFC 6434, December 2011.

[RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label for Equal Cost Multipath Routing and Link Aggregation in Tunnels", RFC 6438, November 2011.

[RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field", RFC 6864, February 2013.

[RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and UDP Checksums for Tunneled Packets", RFC 6935, April 2013.

[RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement for the Use of IPv6 UDP Datagrams with Zero Checksums", RFC 6936, April 2013.

[RIPE] De Boer, M. and J. Bosma, "Discovering Path MTU Black Holes on the Internet using RIPE Atlas", July 2012.

[SIGCOMM] Luckie, M. and B. Stasiewicz, "Measuring Path MTU Discovery Behavior", November 2010.

[TBIT] Medina, A., Allman, M., and S. Floyd, "Measuring Interactions Between Transport Protocols and Middleboxes", October 2004.

[WAND] Luckie, M., Cho, K., and B. Owens, "Inferring and Debugging Path MTU Discovery Failures", October 2005.

Author's Address

Fred L. Templin (editor)
Boeing Research & Technology
P.O. Box 3707
Seattle, WA 98124
USA

Email: fltemplin@acm.org