Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Informational                            April 22, 2008
Expires: October 24, 2008


        The Subnetwork Encapsulation and Adaptation Layer (SEAL)
                       draft-templin-seal-12.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 24, 2008.

Abstract

   Subnetworks are connected network regions bounded by border routers
   that forward unicast and multicast packets over a virtual topology
   manifested by tunneling.  This virtual topology resembles a "virtual
   ethernet", but may span multiple IP- and/or sub-IP layer forwarding
   hops that can introduce packet duplication and/or traverse links with
   diverse Maximum Transmission Units (MTUs).  This document specifies a
   Subnetwork Encapsulation and Adaptation Layer (SEAL) that
   accommodates such virtual topologies over diverse underlying link
   technologies.

Table of Contents

   1.  Introduction
   2.  Terminology and Requirements
   3.  Applicability Statement
   4.  SEAL Protocol Specification
     4.1.  Model of Operation
     4.2.  ITE Specification
       4.2.1.  Tunnel Interface MTU
       4.2.2.  SEAL Maximum Segment Size (S-MSS) Maintenance
       4.2.3.  Inner Packet Fragmentation
       4.2.4.  SEAL Segmentation and Encapsulation
       4.2.5.  Sending SEAL Protocol packets
       4.2.6.  Sending S-MSS Probes
       4.2.7.  Processing Fragmentation Reports (FRAGREPs)
       4.2.8.  Processing ICMP PTBs
     4.3.  ETE Specification
       4.3.1.  Reassembly Buffer Requirements
       4.3.2.  IPv4-Layer Reassembly
       4.3.3.  Generating Fragmentation Reports (FRAGREPs)
       4.3.4.  SEAL-Layer Reassembly and Decapsulation
   5.  Link Requirements
   6.  End System Requirements
   7.  Router Requirements
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Historic Evolution of PMTUD (written 10/30/2002)
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   As Internet technology and communication have grown and matured, many
   techniques have been developed that use virtual topologies
   (frequently tunnels of one form or another) over an actual IP
   network.  Those virtual topologies have elements that appear as one
   hop in the virtual topology, but are actually multiple IP or sub-IP
   layer hops.  These multiple hops often have quite diverse properties
   that are frequently not even visible to the endpoints of the virtual
   hop.  This introduces many failure modes that current approaches do
   not handle well.

   The use of IP encapsulation has long been considered as an
   alternative for creating such virtual topologies.  However, the
   insertion of an outer IP header reduces the effective path MTU as
   seen by the IP layer.  When IPv4 is used, this reduced MTU can be
   accommodated through the use of IPv4 fragmentation, but unmitigated
   in-the-network fragmentation has been shown to be harmful through
   operational experience and studies conducted over the course of many
   years [FRAG][FOLK][RFC4963].  Additionally, classical path MTU
   discovery [RFC1191] has known operational issues that are exacerbated
   by in-the-network tunnels [RFC2923][RFC4459].

   For the purpose of this document, subnetworks are defined as virtual
   topologies that span connected network regions bounded by border
   routers.  Examples include the global Internet interdomain routing
   core, Mobile Ad hoc Networks (MANETs) and enterprise networks.  These
   subnetworks are manifested by tunnels that may span many underlying
   networks and traditional IP subnets, e.g., in the internal
   organization of an enterprise network.  Subnetwork border routers
   support the Internet protocols [RFC0791][RFC2460] and forward unicast
   and multicast IP packets over the virtual topology across multiple
   IP- and/or sub-IP layer forwarding hops which may introduce packet
   duplication and/or traverse links with diverse Maximum Transmission
   Units (MTUs).

   This document proposes a Subnetwork Encapsulation and Adaptation
   Layer (SEAL) for the operation of IP over subnetworks that connect
   the Ingress- and Egress Tunnel Endpoints (ITEs/ETEs) of border
   routers.  SEAL accommodates links with diverse MTUs and supports
   efficient duplicate packet detection by introducing a minimal
   mid-layer encapsulation.
   The SEAL encapsulation introduces an extended Identification field
   for packet identification and a mid-layer segmentation and reassembly
   capability that allows simplified cutting and pasting of packets
   without invoking in-the-network IP fragmentation.  The SEAL protocol
   is specified in the following sections.

2.  Terminology and Requirements

   The term "subnetwork" in this document refers to a virtual topology
   that is configured over a connected network region bounded by border
   routers and that appears as a fully-connected shared link, i.e., a
   "Virtual Ethernet (VET)" [I-D.templin-autoconf-dhcp].

   The terms "inner" and "outer" respectively refer to the innermost IP
   {layer, protocol, header, packet, etc.} *before* any encapsulation,
   and the outermost IP {layer, protocol, header, packet, etc.} *after*
   any encapsulation.  Between these inner and outer layers, there may
   also be "mid-layer" encapsulations.

   The notation IPvX/*/IPvY refers to an inner IPvX packet encapsulated
   in any '*' mid-layer headers (including the SEAL header) followed by
   an outer IPvY header.  The notation "IP" means either IP protocol
   version (IPv4 or IPv6).

   The following abbreviations correspond to terms used within this
   document and elsewhere in common Internetworking nomenclature:

   Subnetwork - a connected network region bounded by border routers

   SEAL - Subnetwork Encapsulation and Adaptation Layer

   VET - Virtual EThernet

   MANET - Mobile Ad-hoc Network

   ITE - Ingress Tunnel Endpoint

   ETE - Egress Tunnel Endpoint

   ENCAPS - the size of the outer encapsulating SEAL/*/IPv4 headers

   MTU - Maximum Transmission Unit

   S-MSS - the per-ETE SEAL Maximum Segment Size

   PTB - an ICMPv6 "Packet Too Big" or an ICMPv4 "fragmentation
      needed" message

   DF - the IPv4 header Don't Fragment flag

   FRAGREP - a Fragmentation Report message

   SEAL-ID - a 32-bit Identification value; randomly initialized and
      monotonically incremented for each SEAL protocol packet

   SEAL_PROTO - an IPv4 protocol number used for SEAL

   SEAL_PORT - a TCP/UDP service port number used for SEAL

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

3.  Applicability Statement

   SEAL was motivated by the specific use case of subnetwork abstraction
   for MANETs; however, the domain of applicability also extends to
   subnetwork abstractions of enterprise networks, the interdomain
   routing core, etc.  The domain of application therefore also includes
   the map-and-encaps architecture proposals in the IRTF Routing
   Research Group (RRG) (see:
   http://www3.tools.ietf.org/group/irtf/trac/wiki/RoutingResearchGroup).

   SEAL introduces a minimal new mid-layer for IPvX in IPvY
   encapsulation (e.g., as IPv6/SEAL/IPv4), and appears as a subnetwork
   encapsulation as seen by the inner IP layer.
   SEAL can also be used as a mid-layer for encapsulating inner IP
   packets within outer UDP/IPv4 headers (e.g., as IP/SEAL/UDP/IPv4)
   such as for the Teredo domain of applicability [RFC4380].  For
   further study, SEAL may also be useful for "transport-mode"
   applications, e.g., when the inner layer includes ordinary protocol
   data rather than an encapsulated IP packet.

   The current document version is specific to the use of IPv4 as the
   outer encapsulation layer; however, the same principles apply when
   IPv6 is used as the outer layer.

4.  SEAL Protocol Specification

4.1.  Model of Operation

   Ingress Tunnel Endpoints (ITEs) insert a SEAL header in the IP/*/
   IPv4-encapsulated packets they inject into a subnetwork, where the
   outermost IPv4 header contains the source and destination addresses
   of the subnetwork entry/exit points (i.e., the ITE/ETE),
   respectively.  SEAL defines a new IP protocol type and a new
   mid-layer encapsulation for both unicast and multicast inner IP
   packets.
   The ITE inserts a SEAL header during encapsulation as shown in
   Figure 1:

                                       +-------------------------+
                                       |                         |
                                       ~   Outer */IPv4 headers  ~
                                       |                         |
                                       +-------------------------+
                                       |       SEAL Header       |
   +-------------------------+         +-------------------------+
   ~ Any mid-layer * headers ~         ~ Any mid-layer * headers ~
   +-------------------------+         +-------------------------+
   |                         |         |                         |
   ~         Inner IP        ~   --->  ~         Inner IP        ~
   ~          Packet         ~   --->  ~          Packet         ~
   |                         |         |                         |
   +-------------------------+         +-------------------------+
   ~  Any mid-layer trailers ~         ~  Any mid-layer trailers ~
   +-------------------------+         +-------------------------+
                                       ~    Any outer trailers   ~
                                       +-------------------------+

                       Figure 1: SEAL Encapsulation

   where the SEAL header is inserted as follows:

   o  For simple IP/IPv4 encapsulations (e.g.,
      [RFC2003][RFC2004][RFC4213]), the SEAL header is inserted between
      the inner IP and outer IPv4 headers as: IP/SEAL/IPv4.

   o  For tunnel-mode IPsec encapsulations over IPv4 [RFC4301], the
      SEAL header is inserted between the {AH,ESP} header and outer IPv4
      headers as: IP/*/{AH,ESP}/SEAL/IPv4.

   o  For IP encapsulations over transports such as UDP, the SEAL header
      is inserted immediately after the outer transport layer header,
      e.g., as IP/*/SEAL/UDP/IPv4.

   SEAL protocol packets include a 32-bit SEAL-ID formed from the
   concatenation of the 16-bit ID Extension field in the SEAL header as
   the most-significant bits, and with the 16-bit ID value in the outer
   IPv4 header as the least-significant bits.  Routers within the
   subnetwork use the SEAL-ID for duplicate packet detection, and ITEs/
   ETEs use the SEAL-ID for SEAL segmentation and reassembly.

   SEAL enables a multi-level segmentation and reassembly capability.
   First, the ITE can use IPv4 fragmentation to fragment inner IPv4
   packets with DF=0 before SEAL encapsulation to avoid lower-level
   segmentation and reassembly.
   Secondly, the SEAL layer itself provides a simple mid-layer
   cutting-and-pasting of inner IP packets to avoid IPv4 fragmentation
   on the outer packet.  Finally, ordinary IPv4 fragmentation is
   permitted on the outer packet after SEAL encapsulation and is used to
   detect and dampen any in-the-network fragmentation as quickly as
   possible.

   The following sections specify the SEAL-related operations of the
   ITE and ETE, respectively:

4.2.  ITE Specification

4.2.1.  Tunnel Interface MTU

   The ITE configures a tunnel virtual interface over one or more
   underlying links that connect the border router to the subnetwork.
   The tunnel interface must present a fixed MTU to the inner IP layer
   (i.e., Layer 3) as the size for admission of inner IP packets into
   the tunnel.  Since the tunnel interface provides a virtual point-to-
   multipoint abstraction between the ITE and a potentially large set of
   ETEs, however, care must be taken in setting the MTU while still
   upholding end system expectations.

   Due to the ubiquitous deployment of standard Ethernet and similar
   networking gear, the nominal Internet cell size has become 1500
   bytes; this is the de facto size that end systems have come to expect
   will be delivered by the network without loss due to an MTU
   restriction on the path, or a suitable ICMP PTB message returned.
   However, the network may not always deliver the necessary PTBs,
   leading to MTU-related black holes [RFC2923].  The ITE therefore
   requires a means for conveying 1500 byte (or smaller) packets to the
   ETE without loss due to MTU restrictions and without dependence on
   PTB messages from within the subnetwork.

   In common deployments, there may be many forwarding hops between the
   original source and the ITE.  Within those hops, there may be
   additional encapsulations (IPsec, L2TP, etc.)
   such that a 1500 byte packet sent by the original source might grow
   to a larger size by the time it reaches the ITE for encapsulation as
   an inner IP packet, with (2KB-ENCAPS) serving as the nominal
   worst-case upper bound.  Similarly, additional encapsulations on the
   path from the ITE to the ETE could cause the encapsulated packet to
   become larger still and trigger in-the-network fragmentation.  In
   order to preserve the end system expectation of delivery for 1500
   byte and smaller original packets, the ITE therefore requires a means
   for conveying them to the ETE even though there may be links within
   the subnetwork that configure a smaller MTU.

   The ITE upholds the 1500-byte-and-smaller packet delivery expectation
   by setting a tunnel virtual interface MTU of 1500 bytes plus extra
   room to accommodate any additional encapsulations that may occur on
   the path from the original source (i.e., even if the underlying links
   do not support an MTU of this size).  The ITE can set larger MTU
   values still (e.g., up to the maximum MTU size of the underlying
   links), but should select a value that is not so large as to cause
   excessive internally-generated ICMP PTBs coming from within the
   tunnel interface (see: Section 4.2.4).

4.2.2.  SEAL Maximum Segment Size (S-MSS) Maintenance

   The ITE maintains a SEAL Maximum Segment Size (S-MSS) value for each
   ETE as soft state within the tunnel interface (e.g., in the IPv4 path
   MTU discovery cache).  The ITE initializes S-MSS to the MTU of the
   underlying link minus ENCAPS, and decreases or increases S-MSS based
   on any Fragmentation Report (FRAGREP) messages received (see: Section
   4.2.7).

4.2.3.  Inner Packet Fragmentation

   The ITE performs inner packet fragmentation *before* it admits an
   inner packet into the tunnel interface.
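
   As a non-normative illustration, the S-MSS soft state of Section
   4.2.2 and the update rule given later in Section 4.2.7 might be
   sketched in Python as follows.  The function names, the use of
   integer halving, and the example ENCAPS value of 24 bytes (a 20-byte
   outer IPv4 header plus the 4-byte SEAL header) are assumptions made
   for this sketch and are not defined by this document.

```python
S_MSS_FLOOR = 128      # smallest S-MSS value the halving rule can reach
IPV4_NOMINAL_MIN = 576  # the "576" used in the Section 4.2.7 rule


def initial_s_mss(link_mtu: int, encaps: int) -> int:
    """S-MSS starts at the underlying link MTU minus ENCAPS."""
    return link_mtu - encaps


def update_s_mss(s_mss: int, frag_len: int, encaps: int,
                 first_is_final: bool) -> int:
    """Return the new S-MSS after a validated FRAGREP.

    frag_len is LEN, the IPv4 length field of the reported first-fragment.
    first_is_final is True when that first-fragment was also the final
    fragment (i.e., the packet was not actually fragmented).
    """
    # The FRAGREP is discarded (S-MSS unchanged) in this case.
    if frag_len - encaps < s_mss and first_is_final:
        return s_mss
    if frag_len - encaps > s_mss or frag_len >= IPV4_NOMINAL_MIN:
        return frag_len - encaps
    # "Limited halving"; integer division is an assumption of this sketch.
    return max(s_mss // 2, S_MSS_FLOOR)
```

   For example, with ENCAPS=24 and S-MSS=1400, a FRAGREP reporting a
   1000-byte first-fragment of a fragmented packet lowers S-MSS to 976,
   while one reporting a 300-byte first-fragment halves S-MSS to 700.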

   For inner IPv4 packets larger than 1500 bytes and with the IPv4 Don't
   Fragment (DF) bit set to 0, the ITE uses IPv4 fragmentation to break
   the packet into 1500 byte IPv4 fragments, with the final fragment
   possibly smaller than the first fragment.  The IPv4 layer then admits
   each fragment into the tunnel as an independent inner IPv4 packet.
   These IPv4 fragments will ultimately be reassembled by the final
   destination.  (Note that inner fragmentation may not be available for
   certain ITE types, e.g., for tunnel-mode IPsec.)

   For all other inner packets, the ITE admits the packet if it is no
   larger than the tunnel interface MTU; otherwise, it drops the packet
   and sends an ICMP PTB message to the source.

4.2.4.  SEAL Segmentation and Encapsulation

   The ITE performs SEAL segmentation and encapsulation *after* it
   admits an inner packet into the tunnel interface.

   For inner IP packets larger than (2KB-ENCAPS) and also larger than
   S-MSS, the ITE drops the packet and sends an ICMP PTB message back to
   the source.  Otherwise, the ITE encapsulates the packet in any mid-
   layer '*' headers (for '*' other than the SEAL header).  Next, if the
   inner IP packet plus '*' headers is larger than S-MSS the ITE breaks
   it into N segments (N <= 16) that are no larger than S-MSS bytes
   each.  Each segment except the final one MUST be of equal length,
   while the final segment MUST be no larger than the initial segment.
   The first byte of each segment MUST begin immediately after the final
   byte of the previous segment, i.e., the segments MUST NOT overlap.

   Note that this SEAL segmentation and encapsulation ignores the DF bit
   in the inner IPv4 header or (in the case of IPv6) ignores the fact
   that the network is not permitted to perform IPv6 fragmentation.
   This segmentation process is a mid-layer (not an IP layer) operation
   employed by the ITE to adapt the inner IP packet to the subnetwork
   path characteristics, and the ETE will restore the inner packet to
   its original form during decapsulation.  Therefore, the fact that the
   packet may have been segmented within the subnetwork is not
   observable after decapsulation.

   The ITE encapsulates each segment in a SEAL header formatted as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          ID Extension         |R|M|CTL|Segment|  Next Header  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 2: SEAL Header Format

   where the header fields are defined as follows:

   ID Extension (16)
      a 16-bit extension of the 16-bit ID field in the outer IPv4
      header; encodes the most-significant 16 bits of a 32 bit SEAL-ID
      value.

   R (1)
      Reserved.

   M (1)
      the "More Segments" bit.  Set to 1 if this SEAL protocol packet
      contains a non-final segment of a multi-segment inner IP packet.

   CTL (2)
      a 2-bit "Control" field that identifies the type of SEAL protocol
      packet as follows:

      '00' - a Fragmentation Report (FRAGREP).

      '01' - a non-probe.

      '10' - an implicit probe.

      '11' - an explicit probe.

   Segment (4)
      a 4-bit Segment number.  Encodes a segment number between 0 - 15.

   Next Header (8)
      an 8-bit field that encodes an IP protocol number the same as for
      the IPv4 protocol and IPv6 next header fields.

   For single-segment inner IP packets, the ITE encapsulates the segment
   in a SEAL header with (M=0; Segment=0).  For N-segment inner packets
   (N <= 16), the ITE encapsulates each segment in a SEAL header with
   (M=1; Segment=0) for the first segment, (M=1; Segment=1) for the
   second segment, etc., with the final segment setting (M=0;
   Segment=N-1).
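
   As a non-normative illustration, the segmentation rule and the header
   layout of Figure 2 might be sketched in Python as follows.  The
   function names are hypothetical, and the ID Extension and CTL values
   are simply taken as inputs.  Sending the reserved R bit as 0 and
   treating an empty payload as one empty segment are assumptions of
   this sketch.

```python
import struct

MAX_SEGMENTS = 16  # the Segment field is 4 bits wide (0 - 15)


def seal_segments(packet: bytes, s_mss: int) -> list:
    """Cut a packet into contiguous, non-overlapping segments.

    Every segment except the last is exactly s_mss bytes long, and the
    last segment is no longer than the first.
    """
    if s_mss <= 0:
        raise ValueError("s_mss must be positive")
    segments = [packet[i:i + s_mss] for i in range(0, len(packet), s_mss)]
    if not segments:
        segments = [b""]  # assumption: empty payload is one empty segment
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("packet would need more than 16 segments")
    return segments


def segment_flags(n: int) -> list:
    """Return the (M, Segment) pair for each of n segments, in order."""
    return [(1 if i < n - 1 else 0, i) for i in range(n)]


def pack_seal_header(id_ext: int, more: int, ctl: int,
                     segment: int, next_header: int) -> bytes:
    """Build the 4-byte SEAL header of Figure 2 in network byte order."""
    word = (
        ((id_ext & 0xFFFF) << 16)      # ID Extension (16 bits)
        | (0 << 15)                    # R, reserved (assumed sent as 0)
        | ((1 if more else 0) << 14)   # M, the "More Segments" bit
        | ((ctl & 0x3) << 12)          # CTL (2 bits)
        | ((segment & 0xF) << 8)       # Segment number (4 bits)
        | (next_header & 0xFF)         # Next Header (8 bits)
    )
    return struct.pack("!I", word)
```

   For example, a 2000-byte packet with S-MSS=900 yields segments of
   900, 900 and 200 bytes that carry (M=1; Segment=0), (M=1; Segment=1)
   and (M=0; Segment=2), respectively.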

   The ITE next sets CTL in the SEAL header of each segment as specified
   in Section 4.2.6, then writes the IP protocol number corresponding to
   the inner packet in the SEAL 'Next Header' field.  Next, the ITE
   encapsulates the segment in the requisite */IPv4 outer headers
   according to the specific encapsulation format (e.g., [RFC2003],
   [RFC4213], [RFC4380], etc.), except that it writes 'SEAL_PROTO' in
   the protocol field of the outer IPv4 header (when simple IPv4
   encapsulation is used) or writes 'SEAL_PORT' in the outer destination
   service port field (e.g., when UDP/IPv4 encapsulation is used).  The
   ITE finally sets packet identification values as described below.

   For the purpose of packet identification, the ITE maintains a 32-bit
   SEAL-ID value as per-ETE soft state, e.g., in the IPv4 destination
   cache.  The ITE randomly initializes SEAL-ID when the soft state is
   created and monotonically increments it (modulo 2^32) for each
   successive SEAL protocol packet it sends to the ETE.  For each
   packet, the ITE writes the least-significant 16 bits of the SEAL-ID
   value in the ID field in the outer IPv4 header, and writes the most-
   significant 16 bits in the ID Extension field in the SEAL header.

   For tunnels that may traverse an IPv4 Network Address Translator
   (NAT), the ITE instead maintains SEAL-ID as a 16-bit value that it
   randomly initializes when the soft state is created and monotonically
   increments (modulo 2^16) for each successive SEAL protocol packet.
   For each packet, the ITE writes SEAL-ID in the ID Extension field of
   the SEAL header and writes a random 16-bit value in the ID field in
   the outer IPv4 header.  This requires that both the ITE and ETE
   participate in this alternate scheme.

4.2.5.  Sending SEAL Protocol packets

   Following SEAL segmentation and encapsulation, the ITE sets DF=0 in
   the outer IPv4 header of every outer packet it sends.
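
   As a non-normative illustration, the 32-bit packet identification
   procedure of Section 4.2.4 (the case without NAT traversal) might be
   sketched in Python as follows.  The class and function names are
   hypothetical, and the use of the standard "secrets" module for random
   initialization is an assumption of this sketch.

```python
import secrets


class SealIdState:
    """Per-ETE SEAL-ID soft state kept by the ITE (32-bit variant)."""

    def __init__(self, initial=None):
        # Randomly initialized when the soft state is created; a fixed
        # starting value may be supplied, e.g., for testing.
        if initial is None:
            initial = secrets.randbits(32)
        self.seal_id = initial & 0xFFFFFFFF

    def next_packet_ids(self):
        """Return (ID Extension, outer IPv4 ID) for the next packet."""
        current = self.seal_id
        self.seal_id = (current + 1) & 0xFFFFFFFF  # increment modulo 2^32
        # Most-significant 16 bits go in the SEAL header, least-
        # significant 16 bits go in the outer IPv4 ID field.
        return current >> 16, current & 0xFFFF


def join_seal_id(id_ext: int, ipv4_id: int) -> int:
    """Rebuild the 32-bit SEAL-ID from its two 16-bit halves."""
    return ((id_ext & 0xFFFF) << 16) | (ipv4_id & 0xFFFF)
```

   Because the counter wraps modulo 2^32, a carry out of the outer IPv4
   ID field increments the ID Extension field, so consecutive segments
   of one inner packet may carry different ID Extension values.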

   The ITE then sends each outer packet that encapsulates a segment of
   the same inner packet into the tunnel in canonical order, i.e.,
   Segment 0 first, then Segment 1, etc. and finally Segment N-1.

4.2.6.  Sending S-MSS Probes

   When S-MSS is larger than 128, the ITE sends each data packet as an
   implicit probe to detect any in-the-network IPv4 fragmentation.  The
   ITE sets CTL='10' in the SEAL header and DF=0 in the outer IPv4
   header of each SEAL protocol packet, and will receive FRAGREP
   messages from the ETE if fragmentation occurs.  When S-MSS=128, the
   ITE instead sets CTL='01' in the SEAL header to avoid generating
   FRAGREPs for unavoidable in-the-network fragmentation.

   The ITE additionally sends explicit probes periodically to manage a
   window of SEAL-IDs of outstanding probes that allows the ITE to
   validate any FRAGREPs it receives.  The ITE sends explicit probes by
   setting CTL='11' in the SEAL header and DF=0 in the IPv4 header,
   where the probe can be either an ordinary data packet or a NULL
   packet created by setting the 'Next Header' field in the SEAL header
   to a value of "No Next Header".

   The ITE should also send explicit probes that are larger than S-MSS
   periodically to detect increases in the path MTU to the ETE; the ITE
   can send a large probe using either a NULL packet or an ordinary data
   packet that is padded at the end by setting the outer IPv4 length
   field to a larger value than the packet's true length.  When the ETE
   receives an explicit probe, it will return a FRAGREP message whether
   or not any in-the-network fragmentation occurred.

4.2.7.  Processing Fragmentation Reports (FRAGREPs)

   When the ITE receives a potential FRAGREP message, it first verifies
   that the message was formatted correctly (see: Section 4.3.3) and
   that the SEAL-ID embedded in the encapsulated IPv4 first-fragment is
   within the current window of outstanding probes.
   If the FRAGREP is valid, the ITE advances the probe window and sets a
   variable 'LEN' to the value in the first-fragment's IPv4 length
   field.  If (LEN-ENCAPS) is smaller than S-MSS and the first-fragment
   was also the final fragment, the ITE discards the FRAGREP.
   Otherwise, it re-calculates S-MSS as follows:

      if (LEN-ENCAPS) is greater than S-MSS or LEN is at least 576
         set S-MSS to (LEN-ENCAPS)
      else
         set S-MSS to the maximum of S-MSS/2 and 128
      endif

   Finally, if the length field of the inner IP header encapsulated
   within the first-fragment contains a value larger than (2KB-ENCAPS),
   and the length field of the first-fragment header contains a still
   larger value, the ITE discards the FRAGREP.  Otherwise, it
   encapsulates the inner IP packet portion embedded within the first-
   fragment in an ICMP PTB to send back to the original source, with the
   MTU field set to the maximum of (2KB-ENCAPS) and (length of the
   first-fragment minus ENCAPS).

   (NB: The "576" in the S-MSS calculation above is the nominal minimum
   MTU for typical IPv4 links and accounts for normal-case IPv4 first
   fragments, while the "else" clause includes a "limited halving"
   factor that accounts for unusual cases in which the ETE receives a
   small IPv4 first-fragment [RFC1812].  This limited halving may
   require multiple iterations of sending probes and receiving FRAGREPs,
   but will soon converge to a stable value for S-MSS.)

4.2.8.  Processing ICMP PTBs

   Since the ITE sends all SEAL protocol packets with DF=0, it
   unconditionally ignores any ICMP PTBs pertaining to SEAL protocol
   packets that it receives from within the tunnel.

4.3.  ETE Specification

4.3.1.  Reassembly Buffer Requirements

   ETEs MUST be capable of using IPv4-layer reassembly to reassemble
   SEAL protocol outer packets of at least 2KB, and MUST also be capable
   of using SEAL-layer reassembly to reassemble inner IP packets of
   (2KB-ENCAPS).

4.3.2.  IPv4-Layer Reassembly

   The ETE performs IPv4 reassembly as normal, and should maintain a
   conservative high- and low-water mark for the number of outstanding
   reassemblies pending for each ITE.  When the size of the reassembly
   buffer exceeds this high-water mark, the ETE actively discards
   incomplete reassemblies (e.g., using an Active Queue Management (AQM)
   strategy) until the size falls below the low-water mark.

   After reassembly, the ETE either accepts or discards the reassembled
   packet based on the current status of the IPv4 reassembly cache
   (congested vs uncongested).  The SEAL-ID included in the IPv4 first-
   fragment provides an additional level of reassembly assurance, since
   it can record a distinct arrival timestamp useful for associating the
   first-fragment with its corresponding non-initial fragments.  The
   choice of accepting/discarding a reassembly may also depend on the
   strength of the upper-layer integrity check if known (e.g., IPsec/ESP
   provides a strong upper-layer integrity check) and/or the corruption
   tolerance of the data (e.g., multicast streaming audio/video may be
   more corruption-tolerant than file transfer, etc.).

4.3.3.  Generating Fragmentation Reports (FRAGREPs)

   During IPv4-layer reassembly, the ETE determines whether the packet
   belongs to the SEAL protocol by checking for SEAL_PROTO in the outer
   IPv4 header (i.e., for simple IPv4 encapsulation) or for SEAL_PORT in
   the outer */IPv4 header (e.g., for '*'=UDP).
570 When the ETE receives the IPv4 first-fragment of a SEAL protocol 571 packet that was delivered as multiple IPv4 fragments and with 572 CTL='10' in the SEAL header, it sends a FRAGREP message back to the 573 ITE. The ETE also sends a FRAGREP for any SEAL packet with CTL='11', 574 i.e., even if the packet was not fragmented and while treating the 575 unfragmented packet the same as a first-fragment. 577 The ETE prepares the FRAGREP message by encapsulating the leading 256 578 bytes (or up to the end) of the first-fragment in outer SEAL/*/IPv4 579 headers as shown in Figure 3: 581 +-------------------------+ - 582 | | \ 583 ~ Outer */IPv4 headers ~ | 584 ~ of FRAGREP ~ > FRAGREP headers 585 | | | 586 +-------------------------+ | 587 | SEAL Header of FRAGREP | / 588 +-------------------------+ - 589 | | \ 590 ~ IP/*/SEAL/*/IPv4 ~ | 591 ~ hdrs of first-fragment ~ | 592 | | > First 256 bytes (or up to 593 +-------------------------+ | the end) of first-fragment 594 | | | 595 ~ Data of first-fragment ~ | 596 | | / 597 +-------------------------+ - 599 Figure 3: Fragmentation Report (FRAGREP) Message 601 The ETE next sets CTL='00', Segment=0 and M=0 in the outer SEAL 602 header, sets the SEAL-ID the same as for any SEAL packet, then sets 603 the SEAL Next Header field and the fields of the outer */IPv4 headers 604 the same as for ordinay SEAL encapsulation (see: Section 4.2.4). The 605 ETE then sets the FRAGREP's destination address to the source address 606 of the first-fragment and sets the FRAGREP's source address to the 607 destination address of the first-fragment. If the destination 608 address in the first-fragment was multicast, the ETE instead sets the 609 FRAGREP's source address to an address assigned to the underlying 610 IPv4 interface. Finally, the ETE sends the FRAGREP to the ITE. 612 4.3.4. 
SEAL-Layer Reassembly and Decapsulation

614        Following IPv4 reassembly of a SEAL protocol packet and generation of
615        FRAGREPs, the ETE performs SEAL-layer reassembly (if necessary) then
616        decapsulates and processes the inner packet.

618        For SEAL protocol packets that are larger than 2KB and that arrived
619        as multiple IPv4 fragments, the ETE discards all non-initial IPv4
620        fragments and decapsulates the inner packet from the first fragment
621        only.  If the inner packet is a single-segment SEAL packet that was
622        fully contained within the IPv4 first fragment (i.e., all non-initial
623        IPv4 fragments contained only padding bytes), the ETE discards the
624        encapsulating */SEAL/IPv4 headers and processes the inner packet as
625        normal; otherwise it drops the packet.  This ensures that the tunnel is
626        consistent in its handling of large inner packets.

628        For all other SEAL protocol packets, the ETE performs SEAL-layer
629        reassembly for multi-segment inner packets through simple in-order
630        concatenation of the encapsulated segments from N consecutive SEAL
631        protocol packets from the same inner packet.  SEAL-layer reassembly
632        requires the ETE to maintain a cache of recently received SEAL
633        packets for a hold time that would allow for reasonable inter-segment
634        delays.  The ETE uses a SEAL maximum segment lifetime of 15 seconds
635        for this purpose, i.e., the time after which it will discard an
636        incomplete reassembly.  However, the ETE should also actively discard
637        any pending reassemblies that clearly have no opportunity for
638        completion, e.g., when a considerable number of new SEAL packets have
639        been received before a packet that completes a pending reassembly has
640        arrived.

642        The ETE reassembles the inner packet segments in SEAL protocol
643        packets that contain Segment numbers 0 through N-1, with M=0/1 in
644        final and non-final segments (respectively) and with consecutive
645        SEAL-ID values.
That is, for an N-segment inner packet, reassembly
646        entails the concatenation of the SEAL-encapsulated segments with
647        (Segment 0, SEAL-ID i), followed by (Segment 1, SEAL-ID ((i + 1) mod
648        2^32)), etc. up to (Segment N-1, SEAL-ID ((i + N-1) mod 2^32)).  (For
649        tunnels that may traverse an IPv4 NAT, the ETE instead uses only a
650        16-bit SEAL-ID value, and uses mod 2^16 arithmetic to associate the
651        segments of the same packet.)

653        Following SEAL-layer reassembly (if necessary), the ETE discards the
654        encapsulating */SEAL/IPv4 headers and processes the inner packet as
655        normal.

657     5.  Link Requirements

659        Subnetwork designers are strongly encouraged to follow the
660        recommendations in [RFC3819] when configuring link MTUs, where all
661        IPv4 links SHOULD configure a minimum MTU of 576 bytes.  Links that
662        cannot configure an MTU of at least 576 bytes (e.g., due to
663        performance characteristics) SHOULD implement transparent link-layer
664        segmentation and reassembly such that an MTU of at least 576 bytes can
665        still be presented to the IP layer.

667     6.  End System Requirements

669        SEAL provides robust mechanisms for returning ICMP PTB messages to
670        the original source; however, end systems that send unfragmentable IP
671        packets larger than 1500 bytes are strongly encouraged to use
672        Packetization Layer Path MTU Discovery per [RFC4821].

674     7.  Router Requirements

676        IPv4 routers within the subnetwork observe the requirements in
677        [RFC1812], and are strongly encouraged to implement IPv4
678        fragmentation such that the first fragment is the largest and
679        approximately the size of the underlying link MTU.

681     8.  IANA Considerations

683        SEAL_PROTO and SEAL_PORT are taken from their respective ranges of
684        experimental values documented in [RFC3692][RFC4727].  These values
685        are for experimentation purposes only, and are not to be used for any
686        kind of deployment (i.e., they are not to be shipped in any
687        products).
This document therefore has no actions for IANA.

689     9.  Security Considerations

691        Unlike IPv4 fragmentation, overlapping fragment attacks are not
692        possible due to the requirement that SEAL segments be non-
693        overlapping.

695        An amplification/reflection attack is possible when an attacker sends
696        IPv4 first-fragments with spoofed source addresses to an ETE,
697        resulting in a stream of FRAGREP messages returned to a victim ITE.
698        The encapsulated segment of the spoofed IPv4 first-fragment provides
699        mitigation for the ITE to detect and discard spurious FRAGREPs.

701        The SEAL header is sent in the clear (outside of any IPsec/ESP
702        encapsulations) the same as for the IPv4 header.  As for IPv6
703        extension headers, the SEAL header is protected only by L2 integrity
704        checks and is not covered under any L3 integrity checks.

706     10.  Acknowledgments

708        Path MTU determination through the report of fragmentation
709        experienced by the final destination was first proposed by Charles
710        Lynn of BBN on the TCP-IP mailing list in May 1987.  An historical
711        analysis of the evolution of path MTU discovery appears in
712        http://www.tools.ietf.org/html/draft-templin-v6v4-ndisc-01 and is
713        reproduced in Appendix A of this document.

715        The following individuals are acknowledged for helpful comments and
716        suggestions: Jari Arkko, Fred Baker, Teco Boot, Iljitsch van Beijnum,
717        Brian Carpenter, Steve Casner, Ian Chakeres, Remi Denis-Courmont,
718        Arnaud Ebalard, Gorry Fairhurst, Joel Halpern, John Heffner, Bob
719        Hinden, Christian Huitema, Joe Macker, Matt Mathis, Dan Romascanu,
720        Dave Thaler, Joe Touch, Magnus Westerlund, Robin Whittle, James
721        Woodyatt and members of the Boeing PhantomWorks DC&NT group.

723     11.  References

725     11.1.  Normative References

727        [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
728                   September 1981.

730        [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
731                   RFC 1812, June 1995.

733        [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
734                   Requirement Levels", BCP 14, RFC 2119, March 1997.

736        [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
737                   (IPv6) Specification", RFC 2460, December 1998.

739     11.2.  Informative References

741        [FOLK]     Shannon, C., Moore, D., and k. claffy, "Beyond Folklore:
742                   Observations on Fragmented Traffic", December 2002.

744        [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
745                   October 1987.

747        [I-D.ietf-manet-smf]
748                   Macker, J. and S. Team, "Simplified Multicast Forwarding
749                   for MANET", draft-ietf-manet-smf-07 (work in progress),
750                   February 2008.

752        [I-D.templin-autoconf-dhcp]
753                   Templin, F., Russert, S., and S. Yi, "The MANET Virtual
754                   Ethernet (VET) Abstraction",
755                   draft-templin-autoconf-dhcp-14 (work in progress),
756                   April 2008.

758        [MTUDWG]   "IETF MTU Discovery Working Group mailing list,
759                   gatekeeper.dec.com/pub/DEC/WRL/mogul/mtudwg-log, November
760                   1989 - February 1995.".

762        [RFC1063]  Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP
763                   MTU discovery options", RFC 1063, July 1988.

765        [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
766                   November 1990.

768        [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
769                   for IP version 6", RFC 1981, August 1996.

771        [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
772                   October 1996.

774        [RFC2004]  Perkins, C., "Minimal Encapsulation within IP", RFC 2004,
775                   October 1996.

777        [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
778                   RFC 2923, September 2000.

780        [RFC3692]  Narten, T., "Assigning Experimental and Testing Numbers
781                   Considered Useful", BCP 82, RFC 3692, January 2004.

783        [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
784                   Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
785                   Wood, "Advice for Internet Subnetwork Designers", BCP 89,
786                   RFC 3819, July 2004.

788        [RFC4213]  Nordmark, E. and R.
Gilligan, "Basic Transition Mechanisms
789                   for IPv6 Hosts and Routers", RFC 4213, October 2005.

791        [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
792                   Internet Protocol", RFC 4301, December 2005.

794        [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
795                   Network Address Translations (NATs)", RFC 4380,
796                   February 2006.

798        [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
799                   Network Tunneling", RFC 4459, April 2006.

801        [RFC4727]  Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,
802                   ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006.

804        [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
805                   Discovery", RFC 4821, March 2007.

807        [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
808                   Errors at High Data Rates", RFC 4963, July 2007.

810        [TCP-IP]   "TCP-IP mailing list archives,
811                   http://www-mice.cs.ucl.ac.uk/multimedia/mist/tcpip, May
812                   1987 - May 1990.".

814     Appendix A.  Historic Evolution of PMTUD (written 10/30/2002)

816        The topic of Path MTU discovery (PMTUD) saw a flurry of discussion
817        and numerous proposals in the late 1980's through early 1990.  The
818        initial problem was posed by Art Berggreen on May 22, 1987 in a
819        message to the TCP-IP discussion group [TCP-IP].  The discussion that
820        followed provided significant reference material for [FRAG].  An IETF
821        Path MTU Discovery Working Group [MTUDWG] was formed in late 1989
822        with a charter to produce an RFC.  Several variations on a very few
823        basic proposals were entertained, including:

825        1.  Routers record the PMTUD estimate in ICMP-like path probe
826            messages (proposed in [FRAG] and later [RFC1063])

828        2.  The destination reports any fragmentation that occurs for packets
829            received with the "RF" (Report Fragmentation) bit set (Steve
830            Deering's 1989 adaptation of Charles Lynn's Nov. 1987 proposal)

832        3.  A hybrid combination of 1) and Charles Lynn's Nov.
1987 proposal
833            (straw RFC draft by McCloghrie, Fox and Mogul on Jan 12, 1990)

835        4.  Combination of the Lynn proposal with TCP (Fred Bohle, Jan 30,
836            1990)

838        5.  Fragmentation avoidance by setting "IP_DF" flag on all packets
839            and retransmitting if ICMPv4 "fragmentation needed" messages
840            occur (Geof Cooper's 1987 proposal; later adapted into [RFC1191]
841            by Mogul and Deering).

843        Option 1) seemed attractive to the group at the time, since it was
844        believed that routers would migrate more quickly than hosts.  Option
845        2) was a strong contender, but repeated attempts to secure an "RF"
846        bit in the IPv4 header from the IESG failed and the proponents became
847        discouraged. 3) was abandoned because it was perceived as too
848        complicated, and 4) never received any apparent serious
849        consideration.  Proposal 5) was a late entry into the discussion from
850        Steve Deering on Feb. 24th, 1990.  The discussion group soon
851        thereafter seemingly lost track of all other proposals and adopted
852        5), which eventually evolved into [RFC1191] and later [RFC1981].

854        In retrospect, the "RF" bit postulated in 2) is not needed if a
855        "contract" is first established between the peers, as in proposal 4)
856        and a message to the MTUDWG mailing list from jrd@PTT.LCS.MIT.EDU on
857        Feb 19, 1990.
These proposals saw little discussion or rebuttal, and
858        were dismissed based on the following assertions:

860        o  routers upgrade their software faster than hosts

862        o  PCs could not reassemble fragmented packets

864        o  Proteon and Wellfleet routers did not reproduce the "RF" bit
865           properly in fragmented packets

867        o  Ethernet-FDDI bridges would need to perform fragmentation (i.e.,
868           "translucent" not "transparent" bridging)

870        o  the 16-bit IP_ID field could wrap around and disrupt reassembly at
871           high packet arrival rates

873        The first four assertions, although perhaps valid at the time, have
874        been overcome by historical events, leaving only the final one to
875        consider.  But, [FOLK] has shown that IP_ID wraparound simply does
876        not occur within several orders of magnitude of the reassembly timeout
877        window on high-bandwidth networks.

879        (Author's 2/11/08 note: this final point was based on a loose
880        interpretation of [FOLK], and is more accurately addressed in
881        [RFC4963].)

883     Author's Address

885        Fred L. Templin (editor)
886        Boeing Phantom Works
887        P.O. Box 3707
888        Seattle, WA  98124
889        USA

891        Email: fltemplin@acm.org

893     Full Copyright Statement

895        Copyright (C) The IETF Trust (2008).

897        This document is subject to the rights, licenses and restrictions
898        contained in BCP 78, and except as set forth therein, the authors
899        retain all their rights.

901        This document and the information contained herein are provided on an
902        "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
903        OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
904        THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
905        OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
906        THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
907        WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

909     Intellectual Property

911        The IETF takes no position regarding the validity or scope of any
912        Intellectual Property Rights or other rights that might be claimed to
913        pertain to the implementation or use of the technology described in
914        this document or the extent to which any license under such rights
915        might or might not be available; nor does it represent that it has
916        made any independent effort to identify any such rights.  Information
917        on the procedures with respect to rights in RFC documents can be
918        found in BCP 78 and BCP 79.

920        Copies of IPR disclosures made to the IETF Secretariat and any
921        assurances of licenses to be made available, or the result of an
922        attempt made to obtain a general license or permission for the use of
923        such proprietary rights by implementers or users of this
924        specification can be obtained from the IETF on-line IPR repository at
925        http://www.ietf.org/ipr.

927        The IETF invites any interested party to bring to its attention any
928        copyrights, patents or patent applications, or other proprietary
929        rights that may cover technology that may be required to implement
930        this standard.  Please address the information to the IETF at
931        ietf-ipr@ietf.org.