idnits 2.17.1 draft-templin-seal-10.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 886. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 897. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 904. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 910. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 21, 2008) is 5820 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'I-D.ietf-manet-smf' is defined on line 729, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) == Outdated reference: A later version (-14) exists of draft-ietf-manet-smf-07 == Outdated reference: A later version (-38) exists of draft-templin-autoconf-dhcp-14 -- Obsolete informational reference (is this intentional?): RFC 1063 (Obsoleted by RFC 1191) -- Obsolete informational reference (is this intentional?): RFC 1981 (Obsoleted by RFC 8201) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group F. Templin, Ed. 3 Internet-Draft Boeing Phantom Works 4 Intended status: Informational April 21, 2008 5 Expires: October 23, 2008 7 The Subnetwork Encapsulation and Adaptation Layer (SEAL) 8 draft-templin-seal-10.txt 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 
22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on October 23, 2008. 35 Abstract 37 Subnetworks are connected network regions bounded by border routers 38 that forward unicast and multicast packets over a virtual topology 39 manifested by tunneling. This virtual topology resembles a "virtual 40 ethernet", but may span multiple IP- and/or sub-IP layer forwarding 41 hops that can introduce packet duplication and/or traverse links with 42 diverse Maximum Transmission Units (MTUs). This document specifies a 43 Subnetwork Encapsulation and Adaptation Layer (SEAL) that 44 accommodates such virtual topologies over diverse underlying link 45 technologies. 47 Table of Contents 49 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 50 2. Terminology and Requirements . . . . . . . . . . . . . . . . . 4 51 3. Applicability Statement . . . . . . . . . . . . . . . . . . . 5 52 4. SEAL Protocol Specification . . . . . . . . . . . . . . . . . 5 53 4.1. Model of Operation . . . . . . . . . . . . . . . . . . . . 5 54 4.2. ITE Specification . . . . . . . . . . . . . . . . . . . . 7 55 4.2.1. Tunnel Interface MTU . . . . . . . . . . . . . . . . . 7 56 4.2.2. SEAL Maximum Segment Size (S-MSS) Maintenance . . . . 8 57 4.2.3. Inner Packet Fragmentation . . . . . . . . . . . . . . 8 58 4.2.4. SEAL Segmentation and Encapsulation . . . . . . . . . 8 59 4.2.5. Sending SEAL packets . . . . . . . . . . . . . . . . . 10 60 4.2.6. Sending S-MSS Probes . . . . . . . . . . . . . . . . . 11 61 4.2.7. 
Processing Fragmentation Reports (FRAGREPs) . . . . . 11 62 4.2.8. Processing ICMP PTBs . . . . . . . . . . . . . . . . . 12 63 4.3. ETE Specification . . . . . . . . . . . . . . . . . . . . 12 64 4.3.1. Reassembly Buffer Requirements . . . . . . . . . . . . 12 65 4.3.2. IPv4-Layer Reassembly . . . . . . . . . . . . . . . . 12 66 4.3.3. SEAL-Layer Reassembly . . . . . . . . . . . . . . . . 13 67 4.3.4. Generating Fragmentation Reports (FRAGREPs) . . . . . 13 68 5. Link Requirements . . . . . . . . . . . . . . . . . . . . . . 14 69 6. End System Requirements . . . . . . . . . . . . . . . . . . . 15 70 7. Router Requirements . . . . . . . . . . . . . . . . . . . . . 15 71 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 72 9. Security Considerations . . . . . . . . . . . . . . . . . . . 15 73 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 15 74 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 16 75 11.1. Normative References . . . . . . . . . . . . . . . . . . . 16 76 11.2. Informative References . . . . . . . . . . . . . . . . . . 16 77 Appendix A. Historic Evolution of PMTUD (written 10/30/2002) . . 18 78 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 19 79 Intellectual Property and Copyright Statements . . . . . . . . . . 20 81 1. Introduction 83 As internet technology and communication has grown and matured, many 84 techniques have developed that use virtual topologies (frequently 85 tunnels of one form or another) over an actual IP network. Those 86 virtual topologies have elements which appear as one hop in the 87 virtual topology, but are actually multiple IP or sub-IP layer hops. 88 These multiple hops often have quite diverse properties which are 89 often not even visible to the end-points of the virtual hop. This 90 introduces many failure modes that are not dealt with well in current 91 approaches. 
93 The use of IP encapsulation has long been considered as an 94 alternative for creating such virtual topologies. However, the 95 insertion of an outer IP header reduces the effective path MTU as- 96 seen by the IP layer. When IPv4 is used, this reduced MTU can be 97 accommodated through the use of IPv4 fragmentation, but unmitigated 98 in-the-network fragmentation has been shown to be harmful through 99 operational experience and studies conducted over the course of many 100 years [FRAG][FOLK][RFC4963]. Additionally, classical path MTU 101 discovery [RFC1191] has known operational issues that are exacerbated 102 by in-the-network tunnels [RFC2923][RFC4459]. 104 For the purpose of this document, subnetworks are defined as virtual 105 topologies that span connected network regions bounded by border 106 routers. Examples include the global Internet interdomain routing 107 core, Mobile Ad hoc Networks (MANETs) and enterprise networks. These 108 subnetworks are manifested by tunnels that may span many underlying 109 networks and traditional IP subnets, e.g., in the internal 110 organization of an enterprise network. Subnetwork border routers 111 support the Internet protocols [RFC0791][RFC2460] and forward unicast 112 and multicast IP packets over the virtual topology across multiple 113 IP- and/or sub-IP layer forwarding hops which may introduce packet 114 duplication and/or traverse links with diverse Maximum Transmission 115 Units (MTUs). 117 This document proposes a Subnetwork Encapsulation and Adaptation 118 Layer (SEAL) for the operation of IP over subnetworks that connect 119 the Ingress- and Egress Tunnel Endpoints (ITEs/ETEs) of border 120 routers. SEAL accommodates links with diverse MTUs and supports 121 efficient duplicate packet detection by introducing a minimal mid- 122 layer encapsulation. 
The SEAL encapsulation introduces an extended 123 Identification field for packet identification and a mid-layer 124 segmentation and reassembly capability that allows simplified cutting 125 and pasting of packets without invoking in-the-network IP 126 fragmentation. The SEAL protocol is specified in the following 127 sections. 129 2. Terminology and Requirements 131 The term "subnetwork" in this document refers to a virtual topology 132 that is configured over a connected network region bounded by border 133 routers and that appears as a fully-connected shared link, i.e., 134 a "Virtual Ethernet (VET)" [I-D.templin-autoconf-dhcp]. 136 The terms "inner" and "outer" respectively refer to the innermost IP 137 {layer, protocol, header, packet, etc.} *before* any encapsulation, 138 and the outermost IP {layer, protocol, header, packet etc.} *after* 139 any encapsulation. Between these inner and outer layers, there may 140 also be "mid-layer" encapsulations. 142 The notation IPvX/*/IPvY refers to an inner IPvX packet encapsulated 143 in any '*' mid-layer headers (including the SEAL header) followed by 144 an outer IPvY header. The notation "IP" means either IP protocol 145 version (IPv4 or IPv6). 
147 The following abbreviations correspond to terms used within this 148 document and elsewhere in common Internetworking nomenclature: 150 Subnetwork - a connected network region bounded by border routers 152 SEAL - Subnetwork Encapsulation and Adaptation Layer 154 VET - Virtual EThernet 156 MANET - Mobile Ad-hoc Network 158 ITE - Ingress Tunnel Endpoint 160 ETE - Egress Tunnel Endpoint 162 ENCAPS - the size of the outer encapsulating SEAL/*/IPv4 headers 164 MTU - Maximum Transmission Unit 166 S-MSS - the per-ETE SEAL Maximum Segment Size 168 PTB - an ICMPv6 "Packet Too Big" or an ICMPv4 "fragmentation 169 needed" message 171 DF - the IPv4 header Don't Fragment flag 173 FRAGREP - a Fragmentation Report message 175 SEAL-ID - a 32-bit Identification value; randomly initialized and 176 monotonically incremented for each SEAL-encapsulated packet 178 The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, 179 SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this 180 document, are to be interpreted as described in [RFC2119]. 182 3. Applicability Statement 184 SEAL was motivated by the specific use case of subnetwork abstraction 185 for MANETs, however the domain of applicability also extends to 186 subnetwork abstractions of enterprise networks, the interdomain 187 routing core, etc. The domain of application therefore also includes 188 the map-and-encaps architecture proposals in the IRTF Routing 189 Research Group (RRG) (see: http://www3.tools.ietf.org/group/irtf/ 190 trac/wiki/RoutingResearchGroup). 192 SEAL introduces a minimal new mid-layer for IPvX in IPvY 193 encapsulation (e.g., as IPv6/SEAL/IPv4), and appears as a subnetwork 194 encapsulation as seen by the inner IP layer. SEAL can also be used 195 as a mid-layer for encapsulating inner IP packets within outer UDP/ 196 IPv4 header (e.g., as IP/SEAL/UDP/IPv4) such as for the Teredo domain 197 of applicability [RFC4380]. 
For further study, SEAL may also be 198 useful for "transport-mode" applications, e.g., when the inner layer 199 includes ordinary protocol data rather than an encapsulated IP 200 packet. 202 The current document version is specific to the use of IPv4 as the 203 outer encapsulation layer, however the same principles apply when 204 IPv6 is used as the outer layer. 206 4. SEAL Protocol Specification 208 4.1. Model of Operation 210 Ingress Tunnel Endpoints (ITEs) insert a SEAL header in the IP/*/ 211 IPv4-encapsulated packets they inject into a subnetwork, where the 212 outermost IPv4 header contains the source and destination addresses 213 of the subnetwork entry/exit points (i.e., the ITE/ETE), 214 respectively. SEAL defines a new IP protocol type and a new mid- 215 layer encapsulation for both unicast and multicast inner IP packets. 216 The ITE inserts a SEAL header during encapsulation as shown in 217 Figure 1: 219 +-------------------------+ 220 | | 221 ~ Outer */IPv4 headers ~ 222 | | 223 +-------------------------+ 224 | SEAL Header | 225 +-------------------------+ +-------------------------+ 226 ~ Any mid-layer * headers ~ ~ Any mid-layer * headers ~ 227 +-------------------------+ +-------------------------+ 228 | | | | 229 ~ Inner IP ~ ---> ~ Inner IP ~ 230 ~ Packet ~ ---> ~ Packet ~ 231 | | | | 232 +-------------------------+ +-------------------------+ 233 ~ Any mid-layer trailers ~ ~ Any mid-layer trailers ~ 234 +-------------------------+ +-------------------------+ 235 ~ Any outer trailers ~ 236 +-------------------------+ 238 Figure 1: SEAL Encapsulation 240 where the SEAL header is inserted as follows: 242 o For simple IP/IPv4 encapsulations (e.g., 243 [RFC2003][RFC2004][RFC4213]), the SEAL header is inserted between 244 the inner IP and outer IPv4 headers as: IP/SEAL/IPv4. 246 o For tunnel-mode IPsec encapsulations over IPv4, [RFC4301], the 247 SEAL header is inserted between the {AH,ESP} header and outer IPv4 248 headers as: IP/*/{AH,ESP}/SEAL/IPv4. 
250 o For IP encapsulations over transports such as UDP, the SEAL header 251 is inserted immediately after the outer transport layer header, 252 e.g., as IP/*/SEAL/UDP/IPv4. 254 SEAL-encapsulated packets include a 32-bit SEAL-ID formed from the 255 concatenation of the 16-bit ID Extension field in the SEAL header as 256 the most-significant bits, and with the 16-bit ID value in the outer 257 IPv4 header as the least-significant bits. Routers within the 258 subnetwork use the SEAL-ID for duplicate packet detection, and ITEs/ 259 ETEs use the SEAL-ID for SEAL segmentation and reassembly. 261 SEAL enables a multi-level segmentation and reassembly capability. 262 First, the ITE can use IPv4 fragmentation to fragment inner IPv4 263 packets with DF=0 before SEAL encapsulation to avoid lower-level 264 segmentation and reassembly. Secondly, the SEAL layer itself 265 provides a simple mid-layer cutting-and-pasting of inner IP packets 266 to avoid IPv4 fragmentation on the outer packet. Finally, ordinary 267 IPv4 fragmentation is permitted on the outer packet after SEAL 268 encapsulation and used to detect and dampen any in-the-network 269 fragmentation as quickly as possible. 271 The following sections specify the SEAL-related operations of the 272 ITE and ETE, respectively: 274 4.2. ITE Specification 276 4.2.1. Tunnel Interface MTU 278 The ITE configures a tunnel virtual interface over one or more 279 underlying links that connect the border router to the subnetwork. 280 The tunnel interface must present a fixed MTU to the inner IP layer 281 (i.e., Layer 3) as the size for admission of inner IP packets into 282 the tunnel. Since the tunnel interface provides a virtual point-to- 283 multipoint abstraction between the ITE and a potentially large set of 284 ETEs, however, care must be taken in setting the MTU while still 285 upholding end system expectations. 
287 Due to the ubiquitous deployment of standard Ethernet and similar 288 networking gear, the nominal Internet cell size has become 1500 289 bytes; this is the de facto size that end systems have come to expect 290 will be delivered by the network without loss due to an MTU 291 restriction on the path, or a suitable ICMP PTB message returned. 292 However, the network may not always deliver the necessary PTBs, 293 leading to MTU-related black holes [RFC2923]. The ITE therefore 294 requires a means for conveying 1500 byte (or smaller) packets to the 295 ETE without loss due to MTU restrictions and without dependence on 296 PTB messages from within the subnetwork. 298 In common deployments, there may be many forwarding hops between the 299 original source and the ITE. Within those hops, there may be 300 additional encapsulations (IPSec, L2TP, etc.) such that a 1500 byte 301 packet sent by the original source might grow to a larger size by the 302 time it reaches the ITE for encapsulation as an inner IP packet, with 303 (2KB-ENCAPS) serving as the nominal worst-case upper bound. 304 Similarly, additional encapsulations on the path from the ITE to the 305 ETE could cause the encapsulated packet to become larger still and 306 trigger in-the-network fragmentation. In order to preserve the end 307 system expectation of delivery for 1500 byte and smaller original 308 packets, the ITE therefore requires a means for conveying them to the 309 ETE even though there may be links within the subnetwork that 310 configure a smaller MTU. 312 The ITE upholds the 1500-byte-and-smaller packet delivery expectation 313 by setting a tunnel virtual interface MTU of 1500 bytes plus extra 314 room to accommodate any additional encapsulations that may occur on 315 the path from the original source (i.e., even if the underlying links 316 do not support an MTU of this size). 
The ITE can set larger MTU 317 values still (e.g., up to the maximum MTU size of the underlying 318 links), but should select a value that is not so large as to cause 319 excessive internally-generated ICMP PTBs coming from within the 320 tunnel interface (see: Section 4.2.4). 322 4.2.2. SEAL Maximum Segment Size (S-MSS) Maintenance 324 The ITE maintains a SEAL Maximum Segment Size (S-MSS) value for each 325 ETE as soft state within the tunnel interface (e.g., in the IPv4 path 326 MTU discovery cache). The ITE initializes S-MSS to the MTU of the 327 underlying link minus ENCAPS, and decreases or increases S-MSS based 328 on any Fragmentation Report (FRAGREP) messages received (see: Section 329 4.2.7). 331 4.2.3. Inner Packet Fragmentation 333 The ITE performs inner packet fragmentation *before* it admits an 334 inner packet into the tunnel interface. 336 For inner IPv4 packets larger than 1500 bytes and with the IPv4 Don't 337 Fragment (DF) bit set to 0, the ITE uses IPv4 fragmentation to break 338 the packet into 1500 byte IPv4 fragments, with the final fragment 339 possibly smaller than the first fragment. The IPv4 layer then admits 340 each fragment into the tunnel as an independent inner IPv4 packet. 341 These IPv4 fragments will ultimately be reassembled by the final 342 destination. (Note that inner fragmentation may not be available for 343 certain ITE types, e.g., for tunnel-mode IPsec.) 345 For all other inner packets, the ITE admits the packet if it is no 346 larger than the tunnel interface MTU; otherwise, it drops the packet 347 and sends an ICMP PTB message to the source. 349 4.2.4. SEAL Segmentation and Encapsulation 351 The ITE performs SEAL segmentation and encapsulation *after* it 352 admits an inner packet into the tunnel interface. 354 For inner IP packets larger than (2KB-ENCAPS) and also larger than 355 S-MSS, the ITE drops the packet and sends an ICMP PTB message back to 356 the source. 
Otherwise, the ITE encapsulates the packet in any mid- 357 layer '*' headers (for '*' other than the SEAL header). Next, if the 358 inner IP packet plus '*' headers is larger than S-MSS the ITE breaks 359 it into N segments (N <= 16) that are no larger than S-MSS bytes 360 each. Each segment except the final one MUST be of equal length, 361 while the final segment MUST be no larger than the initial segment. 362 The first byte of each segment MUST begin immediately after the final 363 byte of the previous segment, i.e., the segments MUST NOT overlap. 365 Note that this SEAL segmentation and encapsulation ignores the DF bit 366 in the inner IPv4 header or (in the case of IPv6) ignores the fact 367 that the network is not permitted to perform IPv6 fragmentation. 368 This segmentation process is a mid-layer (not an IP layer) operation 369 employed by the ITE to adapt the inner IP packet to the subnetwork 370 path characteristics, and the ETE will restore the inner packet to 371 its original form during decapsulation. Therefore, the fact that the 372 packet may have been segmented within the subnetwork is not 373 observable after decapsulation. 375 The ITE encapsulates each segment in a SEAL header formatted as 376 follows: 378 0 1 2 3 379 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 380 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 381 | ID Extension |R|M|CTL|Segment| Next Header | 382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 384 Figure 2: SEAL Header Format 386 where the header fields are defined as follows: 388 ID Extension (16) 389 a 16-bit extension of the 16-bit ID field in the outer IPv4 390 header; encodes the most-significant 16 bits of a 32 bit SEAL-ID 391 value. 393 R (1) 394 Reserved. 396 M (1) 397 the "More Segments" bit. Set to 1 if this SEAL-encapsulated 398 packet contains a non-final segment of a multi-segment inner IP 399 packet. 
401 CTL (2) 402 a 2-bit "Control" field that identifies the type of SEAL- 403 encapsulated packet as follows: 405 '00' - a Fragmentation Report (FRAGREP). 407 '01' - a non-probe. 409 '10' - an implicit probe. 411 '11' - an explicit probe. 413 Segment (4) 414 a 4-bit Segment number. Encodes a segment number between 0 - 15. 416 Next Header (8) an 8-bit field that encodes an IP protocol number 417 the same as for the IPv4 protocol and IPv6 next header fields. 419 For single-segment inner IP packets, the ITE encapsulates the segment 420 in a SEAL header with (M=0; Segment=0). For N-segment inner packets 421 (N <= 16), the ITE encapsulates each segment in a SEAL header with 422 (M=1; Segment=0) for the first segment, (M=1; Segment=1) for the 423 second segment, etc., with the final segment setting (M=0; 424 Segment=N-1). 426 The ITE next sets CTL in the SEAL header of each segment as specified 427 in Section 4.2.6, then writes the IP protocol number corresponding to 428 the inner packet in the SEAL 'Next Header' field. Finally, the ITE 429 encapsulates the segment in the requisite */IPv4 outer headers 430 according to the specific encapsulation format (e.g., [RFC2003], 431 [RFC4213], etc.) then sets packet identification values as described 432 below. 434 For the purpose of packet identification, the ITE maintains a 32-bit 435 SEAL-ID value as per-ETE soft state, e.g. in the IPv4 destination 436 cache. The ITE randomly-initializes SEAL-ID when the soft state is 437 created and monotonically increments it (modulo 2^32) for each 438 successive SEAL-encapsulated packet it sends to the ETE. For each 439 packet, the ITE writes the least-significant 16 bits of the SEAL-ID 440 value in the ID field in the outer IPv4 header, and writes the most- 441 significant 16 bits in the ID Extension field in the SEAL header. 
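The segmentation, header layout, and SEAL-ID split described in Section 4.2.4 can be sketched as follows. This is a minimal illustration, not part of the specification. The function names (split_seal_id, pack_seal_header, segment_packet, encapsulate) are hypothetical. The sketch assumes every non-final segment is cut to exactly S-MSS bytes, which is one way to satisfy the equal-length rule, and that the SEAL-ID advances by one for each segment, consistent with the consecutive SEAL-ID values expected in Section 4.3.3. The outer */IPv4 headers are not built; only the value destined for the outer IPv4 ID field is returned.

```python
import struct

MAX_SEGMENTS = 16  # N <= 16 (Section 4.2.4)


def split_seal_id(seal_id):
    """Return (id_extension, ipv4_id): the high and low 16 bits of a 32-bit SEAL-ID."""
    seal_id &= 0xFFFFFFFF  # SEAL-ID arithmetic is modulo 2^32
    return seal_id >> 16, seal_id & 0xFFFF


def pack_seal_header(id_ext, more, ctl, segment, next_header):
    """Pack the 4-byte SEAL header of Figure 2 in network byte order (R sent as 0)."""
    if not (0 <= ctl <= 3 and 0 <= segment <= 15):
        raise ValueError("CTL is a 2-bit field and Segment is a 4-bit field")
    flags = (int(more) << 6) | (ctl << 4) | segment  # bit layout: R|M|CTL|Segment
    return struct.pack("!HBB", id_ext & 0xFFFF, flags, next_header & 0xFF)


def segment_packet(packet, s_mss):
    """Cut packet into segments of s_mss bytes (final one possibly shorter).

    Returns None when more than 16 segments would be needed.
    Assumption: non-final segments are exactly s_mss bytes long.
    """
    if not packet:
        return [packet]
    segments = [packet[i:i + s_mss] for i in range(0, len(packet), s_mss)]
    return segments if len(segments) <= MAX_SEGMENTS else None


def encapsulate(packet, s_mss, seal_id, ctl, next_header):
    """Return a list of (outer_ipv4_id, seal_header + segment), or None if too large."""
    segments = segment_packet(packet, s_mss)
    if segments is None:
        return None
    out = []
    for i, seg in enumerate(segments):
        id_ext, ipv4_id = split_seal_id(seal_id + i)
        more = i < len(segments) - 1  # M=1 on every segment except the final one
        out.append((ipv4_id, pack_seal_header(id_ext, more, ctl, i, next_header) + seg))
    return out
```

For example, encapsulate(b"a" * 10, 4, 0x0001FFFF, 1, 4) yields three segments whose SEAL-IDs wrap across the 16-bit boundary: the outer IPv4 ID goes 0xFFFF, 0x0000, 0x0001 while the ID Extension goes 1, 2, 2.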
443 For tunnels that may traverse an IPv4 Network Address Translator 444 (NAT), the ITE instead maintains SEAL-ID as a 16-bit value that it 445 randomly-initializes when the soft state is created and monotonically 446 increments (modulo 2^16) for each successive SEAL-encapsulated 447 packet. For each packet, the ITE writes SEAL-ID in the ID extension 448 field of the SEAL header and writes a random 16-bit value in the ID 449 field in the outer IPv4 header. This requires that both the ITE and 450 ETE participate in this alternate scheme. 452 4.2.5. Sending SEAL packets 454 Following SEAL segmentation and encapsulation, the ITE sets DF=0 in 455 the outer IPv4 header of every outer packet it sends. 457 The ITE then sends each outer packet that encapsulates a segment of 458 the same inner packet into the tunnel in canonical order, i.e., 459 Segment 0 first, then Segment 1, etc. and finally Segment N-1. 461 4.2.6. Sending S-MSS Probes 463 When S-MSS is larger than 128, the ITE sends each data packet as an 464 implicit probe to detect any in-the-network IPv4 fragmentation. The 465 ITE sets CTL='10' in the SEAL header and DF=0 in the outer IPv4 466 header of each SEAL-encapsulated packet, and will receive FRAGREP 467 messages from the ETE if fragmentation occurs. When S-MSS=128, the 468 ITE instead sets CTL='01' in the SEAL header to avoid generating 469 FRAGREPs for unavoidable in-the-network fragmentation. 471 The ITE additionally sends explicit probes periodically to manage a 472 window of SEAL-IDs of outstanding probes that allows the ITE to 473 validate any FRAGREPs it receives. The ITE sends explicit probes by 474 setting CTL='11' in the SEAL header and DF=0 in the IPv4 header, 475 where the probe can be either an ordinary data packet or a NULL 476 packet created by setting the 'Next Header' field in the SEAL header 477 to a value of "No Next Header". 
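The choice of CTL value described so far in Section 4.2.6 can be sketched as below. This is a hedged illustration only: select_ctl and its explicit_probe_due parameter are hypothetical names, and the draft does not define the schedule on which explicit probes become due, so that decision is left to the caller.

```python
# CTL code points from Section 4.2.4
CTL_FRAGREP = 0b00
CTL_NON_PROBE = 0b01
CTL_IMPLICIT_PROBE = 0b10
CTL_EXPLICIT_PROBE = 0b11

MIN_S_MSS = 128  # at this floor, further in-the-network fragmentation is unavoidable


def select_ctl(s_mss, explicit_probe_due=False):
    """Pick the CTL value for an outgoing SEAL data packet.

    explicit_probe_due is an assumption of this sketch: the caller decides
    when a periodic explicit probe should be sent.
    """
    if explicit_probe_due:
        return CTL_EXPLICIT_PROBE
    if s_mss > MIN_S_MSS:
        return CTL_IMPLICIT_PROBE  # ask the ETE to report any fragmentation
    return CTL_NON_PROBE  # suppress FRAGREPs once S-MSS has reached 128
```

In every case the ITE still sends the outer IPv4 packet with DF=0; CTL only governs whether the ETE reports fragmentation back.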
479 The ITE should also send explicit probes that are larger than S-MSS 480 periodically to detect increases in the path MTU to the ETE; the ITE 481 can send a large probe using either a NULL packet or an ordinary data 482 packet that is padded at the end by setting the outer IPv4 length 483 field to a larger value than the packet's true length. When the ETE 484 receives an explicit probe, it will return a FRAGREP message whether 485 or not any in-the-network fragmentation occurred. 487 4.2.7. Processing Fragmentation Reports (FRAGREPs) 489 When the ITE receives a potential FRAGREP message, it first verifies 490 that the message was formatted correctly (see: Section 4.3.4) and 491 that the SEAL-ID embedded in the encapsulated IPv4 first-fragment is 492 within the current window of outstanding probes. If the FRAGREP is 493 valid, the ITE advances the probe window and sets a variable 'LEN' to 494 the value in the first-fragment's IPv4 length field. If (LEN-ENCAPS) 495 is smaller than S-MSS and the first-fragment was also the final 496 fragment, the ITE discards the FRAGREP. Otherwise, it re-calculates 497 S-MSS as follows: 499 if (LEN-ENCAPS) is greater than S-MSS or LEN is at least 576 500 set S-MSS to (LEN-ENCAPS) 501 else 502 set S-MSS to the maximum of S-MSS/2 and 128 503 endif 505 Finally, if the length field of the inner IP header encapsulated 506 within the first-fragment contains a value larger than (2KB-ENCAPS), 507 and the length field of the first-fragment header contains a still 508 larger value, the ITE discards the FRAGREP. Otherwise, it 509 encapsulates the inner IP packet portion embedded within the first- 510 fragment in an ICMP PTB to send back to the original source, with the 511 MTU field set to the maximum of 2KB-ENCAPS and (length of the first- 512 fragment minus ENCAPS). 
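The S-MSS recalculation in Section 4.2.7 can be sketched as a single function. This is an illustrative reading of the pseudocode above, not a definitive implementation. The name update_s_mss and its parameters are hypothetical, FRAGREP validation against the probe window is assumed to have happened already, and S-MSS/2 is taken to mean integer division.

```python
MIN_S_MSS = 128        # floor used by the "limited halving" branch
IPV4_NOMINAL_MIN = 576  # nominal minimum MTU for typical IPv4 links


def update_s_mss(s_mss, length, encaps, first_is_final):
    """Return the new S-MSS after a valid FRAGREP.

    length is LEN, the IPv4 length field of the reported first-fragment;
    encaps is ENCAPS; first_is_final is True when the first-fragment was
    also the final fragment (i.e. the packet arrived unfragmented).
    """
    payload = length - encaps
    if payload < s_mss and first_is_final:
        return s_mss  # FRAGREP is discarded; S-MSS is left unchanged
    if payload > s_mss or length >= IPV4_NOMINAL_MIN:
        return payload
    # Unusually small first-fragment: halve, but never go below 128.
    # Assumption: S-MSS/2 is integer division.
    return max(s_mss // 2, MIN_S_MSS)
```

For instance, with S-MSS=1480 and ENCAPS=20, a fragmented first-fragment of LEN=1020 lowers S-MSS to 1000, while one of LEN=300 only halves it to 740; repeated small reports then converge toward the floor of 128.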
514 (NB: The "576" in the S-MSS calculation above is the nominal minimum 515 MTU for typical IPv4 links and accounts for normal-case IPv4 first 516 fragments, while the "else" clause includes a "limited halving" 517 factor that accounts for unusual cases in which the ETE receives a 518 small IPv4 first-fragment [RFC1812]. This limited halving may 519 require multiple iterations of sending probes and receiving FRAGREPs, 520 but will rapidly converge to a stable value for S-MSS.) 522 4.2.8. Processing ICMP PTBs 524 Since the ITE sends all SEAL-encapsulated packets with DF=0, it 525 unconditionally ignores any ICMP PTBs pertaining to SEAL-encapsulated 526 packets that it receives from within the tunnel. 528 4.3. ETE Specification 530 4.3.1. Reassembly Buffer Requirements 532 ETEs MUST be capable of using IPv4-layer reassembly to reassemble 533 SEAL-encapsulated outer packets of at least 2KB, and MUST also 534 be capable of using SEAL-layer reassembly to reassemble inner IP 535 packets of (2KB-ENCAPS). 537 4.3.2. IPv4-Layer Reassembly 539 The ETE performs IPv4 reassembly as-normal, and should maintain a 540 conservative high- and low-water mark for the number of outstanding 541 reassemblies pending for each ITE. When the size of the reassembly 542 buffer exceeds this high-water mark, the ETE actively discards 543 incomplete reassemblies (e.g., using an Active Queue Management (AQM) 544 strategy) until the size falls below the low-water mark. 546 After reassembly, the ETE either accepts or discards the reassembled 547 packet based on the current status of the IPv4 reassembly cache 548 (congested vs uncongested). The SEAL-ID included in the IPv4 first- 549 fragment provides an additional level of reassembly assurance, since 550 it can record a distinct arrival timestamp useful for associating the 551 first-fragment with its corresponding non-initial fragments. 
The 552 choice of accepting/discarding a reassembly may also depend on the 553 strength of the upper-layer integrity check if known (e.g., IPSec/ESP 554 provides a strong upper-layer integrity check) and/or the corruption 555 tolerance of the data (e.g., multicast streaming audio/video may be 556 more corruption-tolerant than file transfer, etc.). 558 For SEAL-encapsulated packets that are larger than 2KB and that 559 arrive as multiple IPv4 fragments, the ETE uses the IPv4 first 560 fragment to generate a FRAGREP as specified in Section 4.3.4. The 561 ETE then discards all non-initial IPv4 fragments and decapsulates the 562 inner packet from the first fragment only. If the entire inner 563 packet is a single-segment SEAL packet that was fully-contained 564 within the IPv4 first fragment (i.e., all non-initial IPv4 fragments 565 contained only padding bytes), the ETE forwards the inner packet as- 566 normal; otherwise it drops the packet. This ensures that the tunnel is 567 consistent in its handling of large inner packets. 569 4.3.3. SEAL-Layer Reassembly 571 After IPv4-layer reassembly, the ETE performs SEAL-layer reassembly 572 through simple in-order concatenation of the encapsulated segments 573 from N consecutive SEAL-encapsulated packets from the same inner 574 packet. These packets contain Segment numbers 0 through N-1 with 575 M=0/1 in final and non-final segments, respectively, and with 576 consecutive SEAL-ID values encoded in the 32-bit concatenation of the 577 ID Extension field in the SEAL header and the ID field in the IPv4 578 header. That is, for an N-segment inner packet, reassembly entails 579 the concatenation of the SEAL-encapsulated segments with (Segment 0, 580 SEAL-ID i), followed by (Segment 1, SEAL-ID ((i + 1) mod 2^32)), etc. 581 up to (Segment N-1, SEAL-ID ((i + N-1) mod 2^32)). 
(For tunnels that 582 may traverse an IPv4 Network Address Translator (NAT), the ETE 583 instead uses only the 16-bit value in the ID extension field in the 584 SEAL header as a 16-bit SEAL-ID value, and uses mod 2^16 arithmetic 585 to associate the segments of the same packet.) 587 SEAL-layer reassembly requires the ETE to maintain a cache of 588 recently received SEAL packets for a hold time that would allow for 589 reasonable inter-segment delays. The ETE uses a SEAL maximum segment 590 lifetime of 15 seconds for this purpose, i.e., the time after which 591 it will discard an incomplete reassembly. However, the ETE should 592 also actively discard any pending reassemblies that clearly have no 593 opportunity for completion, e.g., when a considerable number of new 594 SEAL packets have been received before a packet that completes a 595 pending reassembly has arrived. 597 4.3.4. Generating Fragmentation Reports (FRAGREPs) 599 When the ETE receives the IPv4 first-fragment of a SEAL packet that 600 was delivered as multiple IPv4 fragments and with CTL='10' in the 601 SEAL header, it sends a FRAGREP message back to the ITE. The ETE 602 also sends a FRAGREP for any SEAL packet with CTL='11', i.e., even if 603 the packet was not fragmented and while treating the unfragmented 604 packet the same as a first-fragment. 
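The ETE behaviour described in Section 4.3.3 and in the paragraph above can be sketched as two small functions. This is a simplified illustration under stated assumptions. The names reassemble and should_send_fragrep are hypothetical, each received SEAL packet is modelled as a (seal_id, segment, more, data) tuple, and the reassembly cache, the 15-second lifetime, and IPv4-layer reassembly are all omitted. The id_bits parameter selects 32-bit SEAL-IDs or the 16-bit variant used across a NAT.

```python
CTL_IMPLICIT_PROBE = 0b10
CTL_EXPLICIT_PROBE = 0b11
MAX_SEGMENTS = 16


def should_send_fragrep(ctl, arrived_as_fragments):
    """Decide whether the ETE returns a FRAGREP for a received SEAL packet."""
    if ctl == CTL_EXPLICIT_PROBE:
        return True  # explicit probes are always answered
    return ctl == CTL_IMPLICIT_PROBE and arrived_as_fragments


def reassemble(packets, id_bits=32):
    """Concatenate the segments of one inner packet, or return None.

    packets is an iterable of (seal_id, segment, more, data) tuples in any
    order. Reassembly succeeds only if segments 0..N-1 are all present,
    carry consecutive SEAL-IDs (modulo 2^id_bits), and only the final
    segment has more == False.
    """
    by_segment = {seg: (sid, more, data) for sid, seg, more, data in packets}
    if 0 not in by_segment:
        return None
    first_id = by_segment[0][0]
    modulus = 1 << id_bits
    parts = []
    for n in range(MAX_SEGMENTS):
        entry = by_segment.get(n)
        if entry is None:
            return None  # a segment is still missing
        sid, more, data = entry
        if sid != (first_id + n) % modulus:
            return None  # SEAL-IDs are not consecutive
        parts.append(data)
        if not more:
            # The final segment must be the last one we hold.
            return b"".join(parts) if n == len(by_segment) - 1 else None
    return None  # 16 segments seen without a final (M=0) segment
```

Passing id_bits=16 gives the modulo 2^16 association used when only the ID Extension field carries the SEAL-ID.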
606 The ETE prepares the FRAGREP message by encapsulating the leading 256 607 bytes (or up to the end) of the first-fragment in outer SEAL/*/IPv4 608 headers as shown in Figure 3:
610    +-------------------------+    -
611    |                         |     \
612    ~   Outer */IPv4 headers  ~     |
613    ~       of FRAGREP        ~     > FRAGREP headers
614    |                         |     |
615    +-------------------------+     |
616    | SEAL Header of FRAGREP  |     /
617    +-------------------------+    -
618    |                         |     \
619    ~    IP/*/SEAL/*/IPv4     ~     |
620    ~ hdrs of first-fragment  ~     |
621    |                         |     > First 256 bytes (or up to
622    +-------------------------+     |  the end) of first-fragment
623    |                         |     |
624    ~ Data of first-fragment  ~     |
625    |                         |     /
626    +-------------------------+    -
628         Figure 3: Fragmentation Report (FRAGREP) Message
630 The ETE next sets CTL='00', Segment=0 and M=0 in the outer SEAL 631 header, sets the SEAL-ID the same as for any SEAL packet, then sets 632 the SEAL Next Header field and the fields of the outer */IPv4 headers 633 according to the specific encapsulation type. The ETE then sets the 634 FRAGREP's destination address to the source address of the 635 first-fragment and sets the FRAGREP's source address to the destination 636 address of the first-fragment. If the destination address in the 637 first-fragment was multicast, the ETE instead sets the FRAGREP's 638 source address to an address assigned to the underlying IPv4 639 interface. Finally, the ETE sends the FRAGREP to the ITE. 641 5. Link Requirements 643 Subnetwork designers are strongly encouraged to follow the 644 recommendations in [RFC3819] when configuring link MTUs, where all 645 IPv4 links SHOULD configure a minimum MTU of 576 bytes. Links that 646 cannot configure an MTU of at least 576 bytes (e.g., due to 647 performance characteristics) SHOULD implement transparent link-layer 648 segmentation and reassembly such that an MTU of at least 576 can 649 still be presented to the IP layer. 651 6.
End System Requirements 653 SEAL provides robust mechanisms for returning ICMP PTB messages to 654 the original source; however, end systems that send unfragmentable IP 655 packets larger than 1500 bytes are strongly encouraged to use 656 Packetization Layer Path MTU Discovery per [RFC4821]. 658 7. Router Requirements 660 IPv4 routers within the subnetwork observe the requirements in 661 [RFC1812], and are strongly encouraged to implement IPv4 662 fragmentation such that the first fragment is the largest and 663 approximately the size of the underlying link MTU. 665 8. IANA Considerations 667 SEAL will use the IANA-assigned value of 253 as an IP protocol value 668 for experimentation purposes [RFC3692]; therefore, this document has 669 no actions for IANA. 671 9. Security Considerations 673 Unlike with IPv4 fragmentation, overlapping-fragment attacks are not 674 possible due to the requirement that SEAL segments be 675 non-overlapping. 677 An amplification/reflection attack is possible when an attacker sends 678 IPv4 first-fragments with spoofed source addresses to an ETE, 679 resulting in a stream of FRAGREP messages returned to a victim ITE. 680 The encapsulated segment of the spoofed IPv4 first-fragment provides 681 a means for the ITE to detect and discard spurious FRAGREPs. 683 The SEAL header is sent in the clear (outside of any IPsec/ESP 684 encapsulations), the same as for the IPv4 header. As with IPv6 685 extension headers, the SEAL header is protected only by L2 integrity 686 checks and is not covered under any L3 integrity checks. 688 10. Acknowledgments 690 Path MTU determination through the report of fragmentation 691 experienced by the final destination was first proposed by Charles 692 Lynn of BBN on the TCP-IP mailing list in May 1987. A historical 693 analysis of the evolution of path MTU discovery appears in 694 http://www.tools.ietf.org/html/draft-templin-v6v4-ndisc-01 and is 695 reproduced in Appendix A of this document.
697 The following individuals are acknowledged for helpful comments and 698 suggestions: Jari Arkko, Fred Baker, Teco Boot, Iljitsch van Beijnum, 699 Brian Carpenter, Steve Casner, Ian Chakeres, Remi Denis-Courmont, 700 Arnaud Ebalard, Gorry Fairhurst, Joel Halpern, John Heffner, Bob 701 Hinden, Christian Huitema, Joe Macker, Matt Mathis, Dan Romascanu, 702 Dave Thaler, Joe Touch, Magnus Westerlund, Robin Whittle, James 703 Woodyatt and members of the Boeing PhantomWorks DC&NT group. 705 11. References 707 11.1. Normative References 709 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 710 September 1981. 712 [RFC1812] Baker, F., "Requirements for IP Version 4 Routers", 713 RFC 1812, June 1995. 715 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 716 Requirement Levels", BCP 14, RFC 2119, March 1997. 718 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 719 (IPv6) Specification", RFC 2460, December 1998. 721 11.2. Informative References 723 [FOLK] Shannon, C., Moore, D., and k. claffy, "Beyond Folklore: Observations on 724 Fragmented Traffic", December 2002. 726 [FRAG] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", 727 October 1987. 729 [I-D.ietf-manet-smf] 730 Macker, J. and S. Team, "Simplified Multicast Forwarding 731 for MANET", draft-ietf-manet-smf-07 (work in progress), 732 February 2008. 734 [I-D.templin-autoconf-dhcp] 735 Templin, F., Russert, S., and S. Yi, "The MANET Virtual 736 Ethernet (VET) Abstraction", 737 draft-templin-autoconf-dhcp-14 (work in progress), 738 April 2008. 740 [MTUDWG] "IETF MTU Discovery Working Group mailing list, 741 gatekeeper.dec.com/pub/DEC/WRL/mogul/mtudwg-log, November 742 1989 - February 1995.". 744 [RFC1063] Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP 745 MTU discovery options", RFC 1063, July 1988. 747 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 748 November 1990. 750 [RFC1981] McCann, J., Deering, S., and J.
Mogul, "Path MTU Discovery 751 for IP version 6", RFC 1981, August 1996. 753 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 754 October 1996. 756 [RFC2004] Perkins, C., "Minimal Encapsulation within IP", RFC 2004, 757 October 1996. 759 [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", 760 RFC 2923, September 2000. 762 [RFC3692] Narten, T., "Assigning Experimental and Testing Numbers 763 Considered Useful", BCP 82, RFC 3692, January 2004. 765 [RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., 766 Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. 767 Wood, "Advice for Internet Subnetwork Designers", BCP 89, 768 RFC 3819, July 2004. 770 [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms 771 for IPv6 Hosts and Routers", RFC 4213, October 2005. 773 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 774 Internet Protocol", RFC 4301, December 2005. 776 [RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through 777 Network Address Translations (NATs)", RFC 4380, 778 February 2006. 780 [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- 781 Network Tunneling", RFC 4459, April 2006. 783 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 784 Discovery", RFC 4821, March 2007. 786 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 787 Errors at High Data Rates", RFC 4963, July 2007. 789 [TCP-IP] "TCP-IP mailing list archives, 790 http://www-mice.cs.ucl.ac.uk/multimedia/mist/tcpip, May 791 1987 - May 1990.". 793 Appendix A. Historic Evolution of PMTUD (written 10/30/2002) 795 The topic of Path MTU discovery (PMTUD) saw a flurry of discussion 796 and numerous proposals in the late 1980's through early 1990. The 797 initial problem was posed by Art Berggreen on May 22, 1987 in a 798 message to the TCP-IP discussion group [TCP-IP]. The discussion that 799 followed provided significant reference material for [FRAG]. 
An IETF 800 Path MTU Discovery Working Group [MTUDWG] was formed in late 1989 801 with a charter to produce an RFC. Several variations on a very few 802 basic proposals were entertained, including: 804 1. Routers record the PMTUD estimate in ICMP-like path probe 805 messages (proposed in [FRAG] and later [RFC1063]) 807 2. The destination reports any fragmentation that occurs for packets 808 received with the "RF" (Report Fragmentation) bit set (Steve 809 Deering's 1989 adaptation of Charles Lynn's Nov. 1987 proposal) 811 3. A hybrid combination of 1) and Charles Lynn's Nov. 1987 proposal 812 (straw RFC draft by McCloghrie, Fox and Mogul on Jan 12, 1990) 814 4. Combination of the Lynn proposal with TCP (Fred Bohle, Jan 30, 815 1990) 817 5. Fragmentation avoidance by setting the "IP_DF" flag on all packets 818 and retransmitting if ICMPv4 "fragmentation needed" messages 819 occur (Geof Cooper's 1987 proposal; later adapted into [RFC1191] 820 by Mogul and Deering). 822 Option 1) seemed attractive to the group at the time, since it was 823 believed that routers would migrate more quickly than hosts. Option 824 2) was a strong contender, but repeated attempts to secure an "RF" 825 bit in the IPv4 header from the IESG failed and the proponents became 826 discouraged. Option 3) was abandoned because it was perceived as too 827 complicated, and option 4) never received any apparent serious 828 consideration. Option 5) was a late entry into the discussion from 829 Steve Deering on Feb. 24th, 1990. The discussion group soon 830 thereafter seemingly lost track of all other proposals and adopted 831 5), which eventually evolved into [RFC1191] and later [RFC1981]. 833 In retrospect, the "RF" bit postulated in 2) is not needed if a 834 "contract" is first established between the peers, as in proposal 4) 835 and a message to the MTUDWG mailing list from jrd@PTT.LCS.MIT.EDU on 836 Feb 19, 1990.
These proposals saw little discussion or rebuttal, and 837 were dismissed based on the following assertions: 839 o routers upgrade their software faster than hosts 841 o PCs could not reassemble fragmented packets 843 o Proteon and Wellfleet routers did not reproduce the "RF" bit 844 properly in fragmented packets 846 o Ethernet-FDDI bridges would need to perform fragmentation (i.e., 847 "translucent" not "transparent" bridging) 849 o the 16-bit IP_ID field could wrap around and disrupt reassembly at 850 high packet arrival rates 852 The first four assertions, although perhaps valid at the time, have 853 been overcome by historical events, leaving only the final one to 854 consider. However, [FOLK] has shown that IP_ID wraparound simply does 855 not occur within several orders of magnitude of the reassembly timeout 856 window on high-bandwidth networks. 858 (Author's 2/11/08 note: this final point was based on a loose 859 interpretation of [FOLK], and is more accurately addressed in 860 [RFC4963].) 862 Author's Address 864 Fred L. Templin (editor) 865 Boeing Phantom Works 866 P.O. Box 3707 867 Seattle, WA 98124 868 USA 870 Email: fltemplin@acm.org 872 Full Copyright Statement 874 Copyright (C) The IETF Trust (2008). 876 This document is subject to the rights, licenses and restrictions 877 contained in BCP 78, and except as set forth therein, the authors 878 retain all their rights. 880 This document and the information contained herein are provided on an 881 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 882 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 883 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 884 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 885 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 886 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
888 Intellectual Property 890 The IETF takes no position regarding the validity or scope of any 891 Intellectual Property Rights or other rights that might be claimed to 892 pertain to the implementation or use of the technology described in 893 this document or the extent to which any license under such rights 894 might or might not be available; nor does it represent that it has 895 made any independent effort to identify any such rights. Information 896 on the procedures with respect to rights in RFC documents can be 897 found in BCP 78 and BCP 79. 899 Copies of IPR disclosures made to the IETF Secretariat and any 900 assurances of licenses to be made available, or the result of an 901 attempt made to obtain a general license or permission for the use of 902 such proprietary rights by implementers or users of this 903 specification can be obtained from the IETF on-line IPR repository at 904 http://www.ietf.org/ipr. 906 The IETF invites any interested party to bring to its attention any 907 copyrights, patents or patent applications, or other proprietary 908 rights that may cover technology that may be required to implement 909 this standard. Please address the information to the IETF at 910 ietf-ipr@ietf.org.