idnits 2.17.1 draft-templin-inetmtu-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 899. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 910. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 917. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 923. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 176: '... TEs MUST configure an MRU (i.e., an...' RFC 2119 keyword, line 178: '...ditionally, they MUST configure an MRU...' RFC 2119 keyword, line 179: '...l interface, and SHOULD configure an M...' RFC 2119 keyword, line 355: '... The TE MUST include trailing data w...' RFC 2119 keyword, line 356: '...eply/solicit packets, and MUST include...' (24 more instances...) 
Miscellaneous warnings:
----------------------------------------------------------------------------

== The copyright year in the IETF Trust Copyright Line does not match the current year

-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.)

-- The document date (September 25, 2007) is 6057 days in the past. Is this intentional?

-- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.

Checking references for intended status: Informational
----------------------------------------------------------------------------

-- Looks like a reference, but probably isn't: '1' on line 715

== Missing Reference: 'N' is mentioned on line 715, but not defined

== Unused Reference: 'RFC1812' is defined on line 807, but no explicit reference was found in the text

== Unused Reference: 'RFC0905' is defined on line 815, but no explicit reference was found in the text

== Unused Reference: 'RFC3385' is defined on line 828, but no explicit reference was found in the text

** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

-- Obsolete informational reference (is this intentional?): RFC 1146 (Obsoleted by RFC 6247)

-- Obsolete informational reference (is this intentional?): RFC 1981 (Obsoleted by RFC 8201)

Summary: 3 errors (**), 0 flaws (~~), 5 warnings (==), 11 comments (--).

Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

2  Network Working Group    F. Templin, Ed.
3  Internet-Draft    Boeing Phantom Works
4  Intended status: Informational    September 25, 2007
5  Expires: March 28, 2008

7  Packetization Layer Path MTU Discovery for IP/*/IPv4 Tunnels
8  draft-templin-inetmtu-01.txt

10  Status of this Memo

12  By submitting this Internet-Draft, each author represents that any
13  applicable patent or other IPR claims of which he or she is aware
14  have been or will be disclosed, and any of which he or she becomes
15  aware will be disclosed, in accordance with Section 6 of BCP 79.

17  Internet-Drafts are working documents of the Internet Engineering
18  Task Force (IETF), its areas, and its working groups. Note that
19  other groups may also distribute working documents as Internet-
20  Drafts.

22  Internet-Drafts are draft documents valid for a maximum of six months
23  and may be updated, replaced, or obsoleted by other documents at any
24  time. It is inappropriate to use Internet-Drafts as reference
25  material or to cite them other than as "work in progress."

27  The list of current Internet-Drafts can be accessed at
28  http://www.ietf.org/ietf/1id-abstracts.txt.

30  The list of Internet-Draft Shadow Directories can be accessed at
31  http://www.ietf.org/shadow.html.

33  This Internet-Draft will expire on March 28, 2008.

35  Copyright Notice

37  Copyright (C) The IETF Trust (2007).

39  Abstract

41  The nominal Maximum Transmission Unit (MTU) of the Internet has
42  become 1500 bytes, but existing IP/*/IPv4 tunneling mechanisms impose
43  an encapsulation overhead that can reduce the effective path MTU to
44  smaller values. Additionally, existing IP/*/IPv4 tunneling
45  mechanisms are limited in their ability to discover and utilize
46  larger MTUs. This document specifies new mechanisms for conveying
47  packets over IP/*/IPv4 tunnels that address these issues.

49  Table of Contents

51  1. Introduction
52  2. Terminology
53  3. Concept of Operation
54  4. Tunnel MTU and MRU
55  5. Tunnel Soft State
56  6. Sending Packets
57     6.1. Conceptual Sending Algorithm
58     6.2. Inner packet Fragmentation
59     6.3. Encapsulation
60        6.3.1. Footer
61        6.3.2. Trailing Data and Checksum
62        6.3.3. Data, Probe Request, and Probe Solicitation Format
63        6.3.4. Probe Reply Format
64     6.4. Outer Packet Fragmentation
65     6.5. Setting DF in the Outer Header
66     6.6. Window Management
67  7. Receiving Packets
68     7.1. Decapsulation
69     7.2. Receiving Packet Too Big (PTB) Errors
70  8. Tunnel Qualification and Soft State Management
71     8.1. Probe Requests
72        8.1.1. Sending Probe Requests
73        8.1.2. Receiving Probe Requests
74     8.2. Probe Solicitations
75        8.2.1. Sending Probe Solicitations
76        8.2.2. Receiving Probe Solicitations
77     8.3. Probe Replies
78        8.3.1. Sending Probe Replies
79        8.3.2. Receiving Probe Replies
80  9. 8-bit Fletcher Checksum Calculation
81  10. Updated Specifications
82  11. IANA Considerations
83  12. Security Considerations
84  13. Acknowledgments
85  14. References
86     14.1. Normative References
87     14.2. Informative References
88  Appendix A. Discussion
89  Author's Address
90  Intellectual Property and Copyright Statements

92  1. Introduction

94  The nominal Maximum Transmission Unit (MTU) of today's Internet has
95  become 1500 bytes due to the preponderance of networking gear that
96  configures an MTU of that size. Since not all links in the Internet
97  configure a 1500 byte MTU, however [RFC3819], packets can be dropped
98  due to an MTU restriction on the path. Internet Protocol, Version 4
99  (IPv4) [RFC0791] is the predominant network layer protocol in the
100  Internet today, and it is likely that IPv4 use will continue to grow
101  into the future. It is therefore essential that tunnels over IPv4
102  (hereafter called IP/*/IPv4 tunnels) be made capable of consistent
103  and efficient handling of packets of various sizes.

105  Upper layers see IP/*/IPv4 tunnels as ordinary links, but even for
106  packets no larger than 1500 bytes these links are susceptible to
107  silent loss (e.g., due to path MTU restrictions, lost error messages,
108  layered encapsulations, reassembly buffer limitations, etc.)
109  resulting in poor performance and/or communications failures
110  [RFC2923][RFC4459][RFC4821][RFC4963].

112  This document specifies new mechanisms for IP/*/IPv4 tunnels that
113  assure robust handling for packets of various sizes; it updates the
114  functional specifications for Tunnel Endpoints (TEs) found in
115  existing IP/*/IPv4 tunneling mechanisms (see: Section 10).

117  2.
Terminology 119 The following abbreviations and terms are used in this document: 121 DF - the IPv4 header "Don't Fragment" flag ([RFC0791], Section 122 3.1). 124 EMTU_R - Effective MTU to Receive ([RFC1122], Section 3.3.2). 126 ENCAPS - the size of the encapsulating */IPv4 headers plus 127 trailers. 129 IPv4 - Internet Protocol, Version 4 131 IPv6 - Internet Protocol, Version 6 133 MaxOuterPktLen - Maximum Outer Packet Length, in bytes 135 MaxInnerPktLen - Maximum Inner Packet Length, in bytes 137 ReassTime - Reassembly Timeout 138 MRU - Maximum Receive Unit. For the purpose of this document, 139 'MRU' has exactly the same meaning as 'EMTU_R' 141 MTU - Maximum Transmission Unit 143 PTB - Packet Too Big error 145 TE - Tunnel Endpoint 147 TFE - Tunnel Far End 149 TNE - Tunnel Near End 151 IP/*/IPv4 - an IP packet encapsulated in */IPv4 headers (e.g. for 152 "*" = NULL, UDP, TCP, AH, ESP, etc.). 154 inner packet/header/payload - an IP packet/header/payload before 155 IP/*/IPv4 encapsulation. 157 outer packet/header/payload - a */IPv4 packet/header/payload after 158 IP/*/IPv4 encapsulation. 160 3. Concept of Operation 162 TEs that implement this scheme engage in a continuous handshaking 163 process while data is flowing through the tunnel to confirm that the 164 TFE is participating and to maintain soft state used for determining 165 maximum packet sizes. When the flow of data through the tunnel is 166 suspended, the handshaking process is discontinued. When one or both 167 of the TEs do not implement the scheme, the behavior automatically 168 reverts to that of the legacy IP/*/IPv4 tunneling mechanism. 170 4. Tunnel MTU and MRU 172 TEs configure an indefinite MTU on the tunnel interface, i.e., there 173 is no logical limit on the size of inner packets that upper layers 174 can present to the tunnel interface. 
176  TEs MUST configure an MRU (i.e., an EMTU_R) that is no smaller than
177  2048 bytes (2KB) on all IPv4 interfaces over which a tunnel interface
178  is configured. Additionally, they MUST configure an MRU that is no
179  smaller than 2KB on the tunnel interface, and SHOULD configure an MRU
180  that is no smaller than the largest MRU of any IPv4 interfaces over
181  which the tunnel is configured.

183  5. Tunnel Soft State

185  TEs maintain the following per-TFE conceptual variables as soft state
186  (e.g., in a conceptual neighbor cache):

188  MaxOuterPktLen
189     the current maximum length outer packet/fragment that can be
190     accommodated by the IPv4 path MTU without further fragmentation.
191     Recommended default value: 128 bytes. Range: 68 bytes to 64KB.

193  MaxInnerPktLen
194     the current maximum length inner packet/fragment that the TFE can
195     reassemble over the tunnel, i.e., the MRU. Recommended default
196     value: the minimum MRU defined for the specific IP/*/IPv4
197     tunneling mechanism (e.g., 1500 bytes for [RFC4213]). Range: 576
198     bytes to (2^32-1) bytes.

200  ReassTime
201     the current timeout value that the TFE uses for reassembly of
202     fragmented packets that traverse the tunnel. Recommended default
203     value: 120 seconds. Range: 4usec to 4*(2^32)usec (~4.77hr).

205  IPv4Id
206     the current IPv4 ID value that the TE will assign in the outer
207     IPv4 header of packets it sends into the tunnel. Initial value:
208     randomly chosen. Range: 0 to 2^16-1.

210  isQualified
211     boolean indicating whether the TFE implements the scheme.
212     Recommended default value: FALSE.

214  isNAT
215     boolean indicating whether there is an IPv4 Network Address
216     Translator (NAT) on the path to the TFE. Default value: TRUE or
217     FALSE, based on the specific IP/*/IPv4 tunneling mechanism.

219  See: [RFC3819], Section 2 for subnetwork MTU recommendations that
220  influence 'MaxOuterPktLen'.

222  See: [RFC1122], Section 3.3.2 for EMTU_R (MRU) and reassembly timeout
223  recommendations.
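As an illustration only, the per-TFE soft state of Section 5 could be held in a small record such as the one below. This is a minimal sketch, not part of the specification: the class name, field names, and helper methods are hypothetical, the 1500-byte 'MaxInnerPktLen' default is taken from the [RFC4213] example, the FALSE default for 'isNAT' is an arbitrary choice (the text leaves it to the tunneling mechanism), and advancing 'IPv4Id' by one per packet is an assumption about how "monotonically-increasing (modulo 64K)" might be realized.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class TunnelSoftState:
    """Per-TFE conceptual variables from Section 5 (names are illustrative)."""

    max_outer_pkt_len: int = 128            # bytes; recommended default
    max_inner_pkt_len: int = 1500           # bytes; assumed default ([RFC4213] example)
    reass_time_usec: int = 120 * 1_000_000  # 120 seconds, in microseconds
    # Initial IPv4 ID is randomly chosen in the range 0 to 2^16-1.
    ipv4_id: int = field(default_factory=lambda: secrets.randbelow(1 << 16))
    is_qualified: bool = False
    is_nat: bool = False                    # assumed; mechanism-specific in the text

    def in_range(self) -> bool:
        """Check each variable against the ranges listed in Section 5."""
        return (
            68 <= self.max_outer_pkt_len <= 64 * 1024
            and 576 <= self.max_inner_pkt_len <= 2**32 - 1
            and 4 <= self.reass_time_usec <= 4 * 2**32
            and 0 <= self.ipv4_id <= 2**16 - 1
        )

    def next_ipv4_id(self) -> int:
        """Return the ID for the next outer header, then advance modulo 64K."""
        current = self.ipv4_id
        self.ipv4_id = (self.ipv4_id + 1) % (1 << 16)
        return current
```

A TE would keep one such record per TFE and reset it to these defaults when the TFE becomes unqualified.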
225  6. Sending Packets

227  TEs send packets across a tunnel to the TFE according to the
228  following specifications:

230  6.1. Conceptual Sending Algorithm

232  With reference to Sections 6.2 - 6.6, TEs use the following
233  conceptual sending algorithm:

235    if inner packet is larger than 'MaxInnerPktLen' and inner
236    packet is not fragmentable (see: Section 6.2)
237      Send PTB appropriate to the inner protocol (e.g., an
238      ICMPv6 PTB [RFC1981]) with MTU = 'MaxInnerPktLen'.
239      Drop packet.
240    else
241      if 'isNAT' and inner packet is not a probe used for
242      'MaxOuterPktLen' determination
243        if inner packet is larger than 2*('MaxOuterPktLen'
244        - ENCAPS) and inner packet is not fragmentable
245        (see: Section 6.2)
246          Send PTB appropriate to the inner protocol
247          with MTU = 2*('MaxOuterPktLen' - ENCAPS).
248          Drop packet.
249        else
250          Fragment inner packet into fragments no larger
251          than MIN('MaxInnerPktLen', 2*('MaxOuterPktLen'
252          - ENCAPS)) (see: Section 6.2).
253        endif
254      else
255        Fragment inner packet into fragments no larger than
256        'MaxInnerPktLen' (see: Section 6.2).
257      endif
258      foreach inner packet/fragment
259        Encapsulate as an outer IPv4 packet (see: Section 6.3).
260        if outer packet is not a probe used for
261        'MaxOuterPktLen' determination
262          fragment outer packet into fragments no larger than
263          'MaxOuterPktLen' (see: Section 6.4).
264        endif
265        foreach outer packet/fragment
266          Set DF in the outer header according to Section 6.5.
267          Send fragment subject to window restrictions
268          (see: Section 6.6).
269        endforeach
270      endforeach
271    endif

273  Figure 1: Conceptual Sending Algorithm

275  6.2. Inner packet Fragmentation

277  An inner packet is fragmentable IFF the TE is permitted to break it
278  into inner fragments before encapsulation, e.g., an IPv6 packet with
279  a fragment header, an IPv4 packet with DF=0, etc.
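The inner-packet half of Figure 1 (everything before the first "foreach") reduces to a decision between returning a PTB and choosing a maximum inner fragment size. The following is a minimal sketch of that decision only, under stated assumptions: the function name, argument names, and the ("ptb" / "fragment") return convention are hypothetical, and ENCAPS is passed in by the caller (28 bytes in the usage below would correspond to a plain IPv4 header plus a UDP header).

```python
from typing import Tuple


def plan_inner_packet(
    length: int,
    fragmentable: bool,
    max_inner: int,
    max_outer: int,
    encaps: int,
    is_nat: bool,
    is_probe: bool = False,
) -> Tuple[str, int]:
    """Return ("ptb", mtu) to drop and signal, or ("fragment", max_size).

    Mirrors the inner-packet branch of the conceptual sending algorithm;
    is_probe marks a probe used for 'MaxOuterPktLen' determination.
    """
    # Too large for the far end to reassemble and cannot be fragmented.
    if length > max_inner and not fragmentable:
        return ("ptb", max_inner)

    if is_nat and not is_probe:
        nat_limit = 2 * (max_outer - encaps)
        if length > nat_limit and not fragmentable:
            return ("ptb", nat_limit)
        return ("fragment", min(max_inner, nat_limit))

    # No NAT (or a size probe): only the far end's MRU limits the size.
    return ("fragment", max_inner)
```

For example, with 'MaxInnerPktLen' = 1500, 'MaxOuterPktLen' = 128, and ENCAPS = 28, an unfragmentable 1280-byte inner packet sent while 'isNAT' is TRUE yields ("ptb", 200), whereas the same packet with 'isNAT' FALSE yields ("fragment", 1500), i.e., it is sent as a single inner packet.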
281 TEs break fragmentable inner packets into inner fragments of no more 282 than 'MaxInnerPktLen' bytes when 'isNAT' is FALSE and no more than 283 MIN('MaxInnerPktLen', 2*('MaxOuterPktLen' - ENCAPS)) bytes when 284 'isNAT' is TRUE. The TE then encapsulates each inner fragment per 285 Section 6.3. These inner fragments will be reassembled by the final 286 destination. 288 When 'isNAT' is TRUE, 2*('MaxOuterPktLen' - ENCAPS) may not be large 289 enough to accommodate the minimum IPv6 MTU such that the TE may be 290 required to drop an IPv6 packet of 1280 bytes or smaller and send an 291 ICMPv6 PTB with an MTU value less than 1280 bytes. The original IPv6 292 source will then include a fragment header in subsequent IPv6 packets 293 and the TE can then perform IPv6 fragmentation on these inner packets 294 using the fragment header included by the source according to the 295 final paragraph of [RFC2460], Section 5. 297 6.3. Encapsulation 299 TEs encapsulate inner IP packets according to the specific IP/*/IPv4 300 document, except that the TE maintains a randomly-initialized and 301 monotonically-increasing (modulo 64K) per-TFE 'IPv4Id' value that it 302 encodes in the outer IPv4 headers of successive encapsulated packets. 304 The TE also appends trailing data as specified in the following 305 sections and increments the innermost '*' header length field by the 306 number of trailing data bytes added, e.g., the UDP length field for 307 IPv6/UDP/IPv4 tunnels, the IPv4 length field for IPv6/IPv4 tunnels, 308 etc. 310 6.3.1. Footer 312 When trailing data is included (see Section 6.3.2), the TE adds the 313 following 4-byte footer as the final 4 bytes of the trailing data. 314 The footer is byte-aligned only, and need not be aligned on an even 315 word/longword/etc. 
boundary:

317   0                   1                   2                   3
318   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
319  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
320  |Version| Type  |   Reserved    |  Fletcher A   |  Fletcher B   |
321  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

323  Figure 2: Footer Format

325  where the fields of the footer are specified as follows:

327  Version (4 bits)
328     The Version field indicates the format of the trailing data. This
329     document describes version 1.

331  Type (4 bits)
332     The type of encapsulated packet. The following types are defined:

334     0 - Ordinary data packet.

336     1 - Probe Request (see: Section 8.1).

338     2 - Probe Solicitation (see: Section 8.2).

340     3 - Probe Reply (see: Section 8.3).

342     4 - 15 - Reserved for future use.

344  Reserved (8 bits)
345     Reserved for future use.

347  Fletcher A (8 bits)
348     The 8-bit Fletcher A checksum component.

350  Fletcher B (8 bits)
351     The 8-bit Fletcher B checksum component.

353  6.3.2. Trailing Data and Checksum

355  The TE MUST include trailing data with a non-zero checksum in the
356  footer of all probe request/reply/solicit packets, and MUST include
357  trailing data with a non-zero checksum in the footer of data packets
358  when 'isNAT' is TRUE. The TE MAY include trailing data with either a
359  zero or non-zero checksum in data packets when 'isNAT' is FALSE, or
360  MAY alternately omit trailing data in those packets.

362  For probe reply packets, the TE appends zero-filled padding bytes as
363  necessary to extend the packet to a minimum of 50 bytes beyond the
364  beginning of the inner IP header then appends a 14 byte control block
365  as specified in Section 6.3.4.

367  For all other packets that will include a non-zero trailing checksum,
368  the TE appends zero-filled padding bytes as necessary to extend the
369  packet to a minimum of 64 bytes beyond the beginning of the inner IP
370  header.
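The padding rule above, together with the footer layout of Section 6.3.1 and the checksum of Section 9, might be sketched as follows for packet types 0 through 2. This is an illustration under several assumptions, none of which are confirmed by the text: the "1's-complement 8-bit accumulators" are implemented with end-around carry, a zero accumulator is represented as 0xFF, the footer is placed after the 64 checksummed bytes rather than inside them, and the Version field occupies the high-order nibble of the first footer byte. All function and constant names are hypothetical.

```python
import struct
from typing import Tuple

FOOTER_VERSION = 1
TYPE_DATA, TYPE_PROBE_REQUEST, TYPE_PROBE_SOLICIT, TYPE_PROBE_REPLY = range(4)
MIN_CHECKED_LEN = 64  # bytes covered by the checksum, from the inner IP header


def fletcher8(data: bytes) -> Tuple[int, int]:
    """8-bit Fletcher checksum with one's-complement (end-around carry) sums."""
    a = b = 0
    for octet in data:
        a += octet
        a = (a & 0xFF) + (a >> 8)  # fold the carry back into the low 8 bits
        b += a
        b = (b & 0xFF) + (b >> 8)
    # Assumed zero rule: a zero accumulator is inverted to all ones.
    return (a or 0xFF, b or 0xFF)


def add_trailer(inner: bytes, pkt_type: int) -> bytes:
    """Pad an inner packet to 64 bytes and append the 4-byte footer.

    Covers types 0-2 only; probe replies (type 3) also carry a control block.
    """
    if pkt_type not in (TYPE_DATA, TYPE_PROBE_REQUEST, TYPE_PROBE_SOLICIT):
        raise ValueError("only packet types 0 through 2 are handled here")
    padded = inner.ljust(MIN_CHECKED_LEN, b"\x00")  # zero-filled padding
    a, b = fletcher8(padded[:MIN_CHECKED_LEN])
    # Version in the high nibble, Type in the low nibble, Reserved = 0.
    footer = struct.pack("!BBBB", (FOOTER_VERSION << 4) | pkt_type, 0, a, b)
    return padded + footer
```

With these assumptions, a 20-byte inner packet becomes 68 bytes on the wire before the outer headers are added: 20 bytes of packet, 44 bytes of zero padding, and the 4-byte footer.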
372  The TE then calculates the 8-bit Fletcher checksum as specified in
373  Section 9 and encodes the results in the Fletcher A and B fields of
374  the footer. The footer is appended as the final 4 bytes of the
375  trailing data, as specified in the following sections.

377  6.3.3. Data, Probe Request, and Probe Solicitation Format

379  The TE uses the following packet format for data, probe request, and
380  probe solicitation packets (types 0 through 2):
381                           +---------------------------------+
382                           |           Outer IPv4            |
383                           |        Header w/'IPv4Id'        |
384                           +---------------------------------+
385                           |            * Headers            |
386                           |                                 |
387  +-------------+          +---------------------------------+
388  |  Inner IP   |          |            Inner IP             |
389  ~   packet    ~   ===>   ~             packet              ~
390  |             |          |                                 |     T
391  +-------------+          +---------------------------------+ -\  r
392   Inner Packet            |                                 |  |  a
393                           ~          Zero Padding           ~  |  i
394                           |                                 |   > l
395                           +---------------------------------+  |  e
396                           |   Footer (see: Section 6.3.1)   |  |  r
397                           +---------------------------------+ -/  s
398                           | Any */IPv4 protocol trailers ...
399                           +------------------------------
400                            Outer Packet with Trailers

402  Figure 3: Data, Probe Request, and Probe Solicitation Format

404  6.3.4.
Probe Reply Format

406  The TE uses the following encapsulation format for all probe reply
407  packets (type 3):

409                           +--------------------------------+
410                           |           Outer IPv4           |
411                           |       Header w/'IPv4Id'        |
412                           +--------------------------------+
413                           |           * Headers            |
414                           |                                |
415  +-------------+          +--------------------------------+
416  |  inner IP   |          |            inner IP            |
417  ~    echo     ~   ===>   ~              echo              ~
418  |    reply    |          |             reply              |
419  +-------------+          +--------------------------------+ -\
420    Inner Reply            |                                |  |
421                           ~          Zero Padding          ~  |
422                           |                                |  |
423                           +--------------------------------+  |  T
424                           |       YourPort / YourId        |  |  r
425                           +--------------------------------+  |  a
426                           |            YourAddr            |   > i
427                           +--------------------------------+  |  l
428                           |           ReassTime            |  |  e
429                           +--------------------------------+  |  r
430                           |         MaxInnerPktLen         |  |  s
431                           +--------------------------------+  |
432                           |  Footer (see: Section 6.3.1)   |  |
433                           +--------------------------------+ -/
434                           | Any */IPv4 protocol trailers ...
435                           +------------------------------
436                            Outer Reply with Trailers

438  Figure 4: Probe Reply Format

440  where the following 14-byte "control block" information is included
441  immediately following the padding and immediately before the trailing
442  footer:

444  YourPort (16 bits) - 1's complement of the observed port number of
445  the probe request.

447  YourId (16 bits) - 1's complement of the observed ip_id number of
448  the probe request.

450  YourAddr (32 bits) - 1's complement of the observed IPv4 source
451  address of the probe request.

453  ReassTime (32 bits) - non-zero value between 1 - (2^32-1) in 4usec
454  increments.

456  MaxInnerPktLen (32 bits) - non-zero value between 576 - (2^32-1)
457  in 1 byte increments.

459  6.4. Outer Packet Fragmentation

461  For packets other than probe requests used for 'MaxOuterPktLen'
462  determination, TEs use IPv4 fragmentation to fragment outer packets
463  after IPv4 encapsulation into fragments no larger than
464  'MaxOuterPktLen' bytes. These outer fragments will be reassembled by
465  the TFE.

467  6.5.
Setting DF in the Outer Header 469 TEs MUST set DF=1 in the outer IPv4 header of probe requests to be 470 used for 'MaxOuterPktLen' determination. TEs MAY set DF=0 in the 471 outer header of other probe requests and SHOULD set DF=0 in the outer 472 header of probe replies. 474 TEs MUST set DF=1 in the outer header of ordinary data packets/ 475 fragments when 'isNAT' is TRUE. 477 TEs MAY set DF=0 in the outer header of ordinary data packets/ 478 fragments when 'isNAT' is FALSE. 480 6.6. Window Management 482 TEs send packets into a tunnel according to a window based on the 483 TFE's advertised 'ReassTime'. In particular, the TE must not admit 484 more than 2^16 packets into the tunnel within the 'ReassTime' window. 486 TE implementations should use discretion when not all of the inner- 487 and outer fragments of the original packet could be admitted into the 488 tunnel within the current window, i.e., implementations are advised 489 to determine when it is appropriate to admit some fragments vs. drop 490 all fragments. 492 7. Receiving Packets 494 7.1. Decapsulation 496 TEs decapsulate each outer packet they receive exactly as specified 497 in the appropriate IP/*/IPv4 document except that when 'isQualified' 498 is TRUE and the packet includes a non-zero trailing checksum the TE 499 first verifies the checksum in the outer packet as specified in 500 Section 9. If the A and B results of the checksum calculation match 501 the values stored in the trailing checksum, the TE decapsulates the 502 packet; otherwise it drops the packet. 504 Note that the initial probe request/reply packets from a new TFE will 505 be received before 'isQualified' is set to TRUE. The TE decapsulates 506 these packets also as specified in Section 8. 508 7.2. 
Receiving Packet Too Big (PTB) Errors 510 TEs may receive ICMPv4 PTB errors with Type=3 ("Destination 511 Unreachable") and Code=4 ("fragmentation needed, and DF set") that 512 include a Next-Hop MTU value [RFC1191] in response to any packets 513 that were admitted into the tunnel with DF=1 [RFC0792]. 515 When a TE receives an ICMPv4 PTB with a Next-Hop MTU value smaller 516 than 'MaxOuterPktLen', it SHOULD reduce 'MaxOuterPktLen' and/or 517 actively probe to discover and confirm a new 'MaxOuterPktLen'. The 518 TE SHOULD NOT send a translated PTB back to the inner source. 520 8. Tunnel Qualification and Soft State Management 522 TEs engage in a probing process to qualify new TFEs and refresh per- 523 TFE soft state for qualified TFEs thereafter. TEs discontinue the 524 probing process and garbage-collect stale soft state for dormant 525 tunnels and unqualified TFEs. TEs exchange probe requests, probe 526 solicitations and probe replies as specified in the following 527 sections: 529 8.1. Probe Requests 531 TEs send and receive probe requests as specified below: 533 8.1.1. Sending Probe Requests 535 8.1.1.1. Basic Probing Strategy 537 TEs send probe requests while data is actively flowing through the 538 tunnel. The TE sends initial probe requests to qualify each new TFE, 539 then sends periodic probe requests thereafter. The TE SHOULD limit 540 the rate at which it sends probe requests to each TFE, but MUST probe 541 frequently enough to refresh per-TFE conceptual variables. 543 The TE retains a cache of recently-sent probe requests and uses them 544 to verify subsequent probe replies. 546 8.1.1.2. MaxOuterPktLen Probing 548 The TE SHOULD probe to detect larger 'MaxOuterPktLen' values by 549 sending progressively larger probe requests padded to the desired 550 probe size. When the TE receives sufficient evidence through probing 551 that the forward path to the TFE supports the probed size, it 552 advances 'MaxOuterPktLen' to the probe size. 
The TE SHOULD NOT send 553 probe requests larger than ('MaxInnerPktLen' + ENCAPS). The TE MAY 554 send a series of probes in parallel to mitigate 'MaxOuterPktLen' 555 fluctuations in the case of multipath routes with diverse path MTUs. 557 8.1.1.3. Generating and Sending Probe Requests 559 TEs generate probe requests by creating a minimum-sized and 560 unfragmentable IP echo request packet according to the inner IP 561 protocol (e.g., an ICMPv6 echo request [RFC4443] when the inner IP 562 protocol is IPv6). The echo request MUST include source and 563 destination addresses that correspond to the TNE and TFE 564 respectively, and SHOULD include additional identifying information 565 (e.g., sequence/identification numbers, nonce values, etc.) that the 566 TFE will echo in its reply. The TE then encapsulates the echo 567 request with padding added to create an outer probe request of the 568 desired probe size and sends the probe request into the tunnel as 569 specified in Section 6. 571 8.1.2. Receiving Probe Requests 573 When a TE receives a potential probe request from a TFE (i.e., as- 574 told by examining the potential trailing footer), it first determines 575 whether the packet includes a valid trailing checksum. If the packet 576 did not include a valid trailing checksum, the TE discontinues probe 577 request processing, decapsulates the packet as for ordinary data and 578 returns from processing. Otherwise, the TE generates a probe reply 579 as specified in Section 8.3. 581 8.2. Probe Solicitations 583 TEs send and receive probe solicitations as specified below: 585 8.2.1. Sending Probe Solicitations 587 When a TE has new information to convey to a TFE, but has not 588 received recent probe requests from the TFE, it MAY send a probe 589 solicitation to the TFE. 
The TE creates a NULL inner IP packet
590  (e.g., an IPv6 header with "No Next Header" in the Next Header field)
591  with source and destination addresses that correspond to the TNE and
592  TFE respectively. The TE then encapsulates the NULL packet as a
593  probe solicitation and sends it into the tunnel as specified in
594  Section 6.

596  8.2.2. Receiving Probe Solicitations

598  When a TE receives a potential probe solicitation from a TFE, it
599  first determines whether the packet includes a valid trailing
600  checksum. If the packet did not include a valid trailing checksum,
601  the TE discontinues probe solicitation processing, decapsulates the
602  inner packet as for ordinary data and returns from processing.

604  Otherwise, the TE SHOULD send an expedited probe request with DF=0 to
605  the TFE as specified in Section 8.1 if it has not successfully probed
606  the TFE recently. The TE then discards the probe solicitation.

608  8.3. Probe Replies

610  TEs send and receive probe replies as specified below:

612  8.3.1. Sending Probe Replies

614  TEs send probe replies in response to valid probe requests and use
615  them as a mechanism for advertising 'MaxInnerPktLen' and 'ReassTime'
616  values to the TFE. TEs also use probe replies to inform the TFE of
617  the IPv4 address and protocol port number that it observed in the
618  TFE's probe request.

620  The TE creates an inner IP echo reply packet according to the inner
621  IP protocol (e.g., an ICMPv6 echo reply [RFC4443] when the inner
622  protocol is IPv6). The TE includes in the echo reply the destination
623  address of the echo request as the source address and the source
624  address of the echo request as the destination address. The TE
625  also includes in the echo reply any additional identifying
626  information that the TFE included in its echo request.

628  The TE then encapsulates the echo reply as specified in Section 6.3.
629 For IP/*/IPv4 tunneling mechanisms that include a port number in the 630 encapsulating * header, the TE includes the 1's complement of the 631 protocol source port number it observed in the TFE's probe request 632 (e.g., the UDP source port number for IPv6/UDP/IPv4 encapsulation) in 633 the 16-bit 'YourPort' field. (Otherwise, the TE encodes the value 634 '0' in the 'YourPort' field.) The TE next includes the 1's 635 complement of the ip_id it observed in the outer IPv4 header of the 636 TFE's probe request in the 16-bit 'YourId' field and encodes the 637 source address of the probe request in the 32-bit 'YourAddr' field. 639 The TE next includes a value that is less than or equal to an MRU 640 appropriate for the interface the TFE's probe request arrived on in 641 the 'MaxInnerPktLen' field. The TE MAY choose to dynamically 642 increase or decrease the 'MaxInnerPktLen' values it advertises to a 643 TFE in successive probe replies, but if so it SHOULD seek to converge 644 to a stable value. 646 The TE finally includes a reassembly timeout value appropriate for 647 the interface the TFE's probe request arrived on in the 'ReassTime' 648 field. The TE MAY choose to dynamically increase or decrease the 649 'ReassTime' value it advertises to a TFE in successive probe replies, 650 but if so it SHOULD seek to converge to a stable value. 652 Following the encoding of the above trailing data, the TE appends the 653 trailing checksum and sends the reply to the TFE. 655 8.3.2. Receiving Probe Replies 657 8.3.2.1. Probe Reply Verification 659 When a TE receives a potential probe reply from a TFE, it first 660 determines whether the packet includes a valid trailing checksum. 661 The TE next verifies that the packet includes enough trailing data to 662 contain a probe reply control block (see: Section 6.3.4) then 663 examines the 'MaxInnerPktLen' and 'ReassTime' values in the potential 664 control block. 
If the packet did not include a valid trailing
665  checksum, or the packet did not include a control block, or if either
666  of the 'MaxInnerPktLen' or 'ReassTime' values in the potential
667  control block lie outside of the acceptable ranges listed in Section
668  6.3.4, the TE discontinues probe reply processing, decapsulates the
669  packet as for ordinary data and returns from processing.

671  Next, the TE verifies that the inner IP echo reply matches one of its
672  cached probe requests by examining the inner IP source and
673  destination addresses as well as any other identifying information in
674  the inner packet. The TE sets 'isQualified' to TRUE for this TFE if
675  the probe reply is valid; otherwise, it discards the probe reply and
676  returns from processing. If the TE receives excessive invalid probe
677  replies from a TFE, it resets 'isQualified' to FALSE and restores
678  'MaxOuterPktLen' and 'MaxInnerPktLen' to default values.

680  8.3.2.2. Probe Reply Processing

682  For IP/*/IPv4 tunneling mechanisms that include port numbers in
683  encapsulating * headers, the TE next examines the 'YourPort',
684  'YourId' and 'YourAddr' values encoded in the packet. If the values
685  match the 1's complement of the probe request's protocol port, ip_id
686  and IPv4 address, respectively, the TE sets 'isNAT' to FALSE;
687  otherwise, it sets 'isNAT' to TRUE. (For encapsulating * headers
688  that do not include port numbers, the TE ignores the 'YourPort'
689  value in this check.)

691  Next, the TE records the 'MaxInnerPktLen' and 'ReassTime' values in
692  the corresponding conceptual variables for this TFE. If the new
693  'MaxInnerPktLen' is smaller than ('MaxOuterPktLen' - ENCAPS), the TE
694  SHOULD reduce 'MaxOuterPktLen' to ('MaxInnerPktLen' + ENCAPS).
695     If the 'MaxInnerPktLen' and 'ReassTime' values fluctuate significantly
696     between successive probe replies, the TE SHOULD record the most
697     conservative values received (e.g., 16KB 'MaxInnerPktLen' instead of
698     64KB, 90sec 'ReassTime' instead of 60sec, etc.).

700     Following the above processing, the TE discards the probe reply.

702  9.  8-bit Fletcher Checksum Calculation

704     The 8-bit Fletcher Checksum is discussed in [RFC1146][STONE1][STONE2]
705     and is used by this specification to provide an integrity check with
706     different properties than those used by common link layers and upper
707     layer protocols.

709     The TE calculates the 8-bit Fletcher checksum of the first 64 bytes
710     of the inner packet beginning with the inner IP header according to
711     the algorithm of [RFC1146], which is reproduced below with an
712     additional rule for representing zero results:

714        The 8-bit Fletcher Checksum Algorithm is calculated over a
715        sequence of data octets (call them D[1] through D[N]) by
716        maintaining 2 unsigned 1's-complement 8-bit accumulators A and B
717        whose contents are initially zero, and performing the following
718        loop where i ranges from 1 to N:

720           A := A + D[i]
721           B := B + A

723        If, at the end of the loop, either or both of the A, B
724        accumulators encode the value 0x00, invert the value
725        in the accumulator(s) to 0xff.

727     Note that faster algorithms are possible and may be used instead of
728     the algorithm above; see [RFC1146] for citations of alternate
729     algorithms.

731  10.  Updated Specifications

733     This document updates the following specifications:

735     o  RFC2003 (IP-in-IP)

736     o  RFC2529 (6over4)

738     o  RFC2661 (L2TP)

740     o  RFC2784 (GRE)

742     o  RFC3056 (6to4)

744     o  RFC3378 (ETHERIP)

746     o  RFC3884 (IPsec Transport Mode for Dynamic Routing)

748     o  RFC4023 (MPLS-in-IP)

750     o  RFC4213 (Basic IPv6 Transition Mechanisms)

752     o  RFC4214 (ISATAP)

754     o  RFC4301 (IPsec)

756     o  RFC4302 (AH)

758     o  RFC4303 (ESP)

760     o  RFC4380 (TEREDO)

762     o  LISP

764     o  others....

766  11.  IANA Considerations

768     The IANA is instructed to create a registry for the Version and Type
769     values that occur in the footers of encapsulated packets per Section
770     6.3.1.

772  12.  Security Considerations

774     A possible attack vector involves an off-path attacker sending probe
775     requests and/or probe solicitations with spoofed source addresses.
776     Legitimate probe requests and replies contain identifying information
777     that is useful for defending against off-path attacks.

779     Security considerations for specific IP/*/IPv4 tunneling mechanisms
780     are given in the respective documents.

782  13.  Acknowledgments

784     This work has benefited from discussions with Fred Baker, Iljitsch
785     van Beijnum, Steve Casner, Gorry Fairhurst, John Heffner, Joe Macker,
786     Matt Mathis, and Joe Touch.  Dan Romascanu mentioned the IEEE 802.3as
787     extension of the Ethernet frame size to 2048 bytes.  Remi
788     Denis-Courmont noted that trailers could be added using the innermost
789     '*' protocol length field.

791  14.  References

793  14.1.  Normative References

795     [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
796                September 1981.

798     [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
799                RFC 792, September 1981.

801     [RFC1122]  Braden, R., "Requirements for Internet Hosts -
802                Communication Layers", STD 3, RFC 1122, October 1989.

804     [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
805                November 1990.

807     [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
808                RFC 1812, June 1995.

810     [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
811                (IPv6) Specification", RFC 2460, December 1998.

813  14.2.  Informative References

815     [RFC0905]  International Organization for Standardization (ISO), "ISO
816                Transport Protocol specification ISO DP 8073", RFC 905,
817                April 1984.

819     [RFC1146]  Zweig, J. and C. Partridge, "TCP alternate checksum
820                options", RFC 1146, March 1990.

822     [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
823                for IP version 6", RFC 1981, August 1996.

825     [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
826                RFC 2923, September 2000.

828     [RFC3385]  Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna,
829                "Internet Protocol Small Computer System Interface (iSCSI)
830                Cyclic Redundancy Check (CRC)/Checksum Considerations",
831                RFC 3385, September 2002.

833     [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
834                Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
835                Wood, "Advice for Internet Subnetwork Designers", BCP 89,
836                RFC 3819, July 2004.

838     [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
839                for IPv6 Hosts and Routers", RFC 4213, October 2005.

841     [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
842                Message Protocol (ICMPv6) for the Internet Protocol
843                Version 6 (IPv6) Specification", RFC 4443, March 2006.

845     [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
846                Network Tunneling", RFC 4459, April 2006.

848     [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
849                Discovery", RFC 4821, March 2007.

851     [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
852                Errors at High Data Rates", RFC 4963, July 2007.

854     [STONE1]   Stone, J., "Checksums in the Internet (Stanford Doctoral
855                Dissertation)", August 2001.

857     [STONE2]   Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
858                "Performance of Checksums and CRCs over Real Data", IEEE/
859                ACM Transactions on Networking, Vol. 6, No. 5,
860                October 1998.

862  Appendix A.  Discussion

864     Probing strategies for packetization layer protocols are specified in
865     ([RFC4821], Section 7) and apply also to the TE's 'MaxOuterPktLen'
866     probing process.

868     Further strategies for handling ICMPv4 PTB errors are specified in
869     ([RFC4821], Section 7) and apply also to the TE's 'MaxOuterPktLen'
870     probing process.

872     Note that decapsulation automatically erases any padding that may
873     have been inserted by the TE along with the trailing checksum.

875  Author's Address

877     Fred L. Templin (editor)
878     Boeing Phantom Works
879     P.O. Box 3707
880     Seattle, WA  98124
881     USA

883     Email: fred.l.templin@boeing.com

885  Full Copyright Statement

887     Copyright (C) The IETF Trust (2007).

889     This document is subject to the rights, licenses and restrictions
890     contained in BCP 78, and except as set forth therein, the authors
891     retain all their rights.

893     This document and the information contained herein are provided on an
894     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
895     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
896     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
897     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
898     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
899     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

901  Intellectual Property

903     The IETF takes no position regarding the validity or scope of any
904     Intellectual Property Rights or other rights that might be claimed to
905     pertain to the implementation or use of the technology described in
906     this document or the extent to which any license under such rights
907     might or might not be available; nor does it represent that it has
908     made any independent effort to identify any such rights.  Information
909     on the procedures with respect to rights in RFC documents can be
910     found in BCP 78 and BCP 79.

912     Copies of IPR disclosures made to the IETF Secretariat and any
913     assurances of licenses to be made available, or the result of an
914     attempt made to obtain a general license or permission for the use of
915     such proprietary rights by implementers or users of this
916     specification can be obtained from the IETF on-line IPR repository at
917     http://www.ietf.org/ipr.

919     The IETF invites any interested party to bring to its attention any
920     copyrights, patents or patent applications, or other proprietary
921     rights that may cover technology that may be required to implement
922     this standard.  Please address the information to the IETF at
923     ietf-ipr@ietf.org.

925  Acknowledgment

927     Funding for the RFC Editor function is provided by the IETF
928     Administrative Support Activity (IASA).
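
As a non-normative illustration of the 8-bit Fletcher checksum
calculation in Section 9, the loop can be sketched in Python as
follows.  The function names, the 'limit' parameter and the choice to
return the two accumulators as a pair are assumptions made for the
example.  The sketch also assumes that a zero accumulator is
represented as the 8-bit value 0xFF, since both accumulators are 8
bits wide.  1's-complement addition is implemented here with an
end-around carry.

```python
from typing import Tuple


def ones_add8(x: int, y: int) -> int:
    """Add two 8-bit values in 1's-complement arithmetic.

    Any carry out of bit 7 is folded back into the low-order bit
    (end-around carry), so the result always fits in 8 bits.
    """
    total = x + y
    return (total & 0xFF) + (total >> 8)


def fletcher8(data: bytes, limit: int = 64) -> Tuple[int, int]:
    """Return the (A, B) accumulators over at most 'limit' octets.

    'limit' defaults to 64 to mirror the "first 64 bytes of the inner
    packet" rule; how A and B are placed in the packet trailer is not
    modeled here.
    """
    a = 0
    b = 0
    for octet in data[:limit]:
        a = ones_add8(a, octet)  # A := A + D[i]
        b = ones_add8(b, a)      # B := B + A
    # Additional rule: a zero result is inverted (assumed 8-bit 0xFF).
    if a == 0:
        a = 0xFF
    if b == 0:
        b = 0xFF
    return a, b
```

For example, fletcher8(bytes([1, 2])) yields (3, 4), and an empty
input yields (0xFF, 0xFF) under the zero rule.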