Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Informational                          October 19, 2007
Expires: April 21, 2008


  Simple Protocol for Robust IP/*/IP Tunnel Endpoint MTU Determination
                              (sprite-mtu)
                      draft-templin-inetmtu-05.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 21, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The nominal Maximum Transmission Unit (MTU) of today's Internet has
   become 1500 bytes, but IP/*/IP tunneling mechanisms impose an
   encapsulation overhead that can reduce the effective path MTU to
   smaller values. Additionally, existing tunneling mechanisms are
   limited in their ability to support larger MTUs. This document
   specifies a simple protocol for robust IP/*/IP tunnel endpoint MTU
   determination (sprite-mtu).

Table of Contents

   1.  Introduction
   2.  Terminology and Requirements
   3.  Concept of Operation
   4.  "sprite-udp" UDP Service
   5.  "sprite-mtu" Protocol Specification
     5.1.  Interactions with End-to-End MTU Determination
     5.2.  Tunnel Virtual Interface MTU and linkMTU
     5.3.  Per-Tunnel Tunnel Soft State
       5.3.1.  TNE Soft State
       5.3.2.  TFE Soft State
     5.4.  Soft State Maintenance
       5.4.1.  TNE Soft State Maintenance
       5.4.2.  TFE Soft State Maintenance
     5.5.  Sending Packets
       5.5.1.  Conceptual Sending Algorithm
       5.5.2.  Inner Packet Fragmentation
       5.5.3.  Encapsulation and Trailers
       5.5.4.  Outer Packet Fragmentation and Setting DF
       5.5.5.  Admitting Packets into the Tunnel
     5.6.  Receiving Packets
       5.6.1.  IPv4 Reassembly Cache Management
       5.6.2.  Decapsulation
       5.6.3.  Receiving Packet Too Big (PTB) Errors
       5.6.4.  Receiving Other ICMP Errors
     5.7.  MTU Probing and Black Hole Detection
     5.8.  Congestion Control
     5.9.  Confidentiality
     5.10. sprite-mtu Checksum Calculation
   6.  Updated Specifications
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The nominal Maximum Transmission Unit (MTU) of today's Internet has
   become 1500 bytes due to the preponderance of networking gear that
   configures an MTU of that size. Since not all links in the Internet
   configure a 1500 byte MTU, however, packets can be dropped due to an
   MTU restriction on the path.

   Upper layers see IP/*/IP tunnels as ordinary links, but even for
   small packets these links are susceptible to silent loss (e.g., due
   to path MTU restrictions, lost error messages, layered
   encapsulations, reassembly buffer limitations, etc.) resulting in
   poor performance and/or communications failures
   [RFC2923][RFC4459][RFC4821][RFC4963].

   This document specifies a simple protocol for robust IP/*/IP tunnel
   endpoint MTU determination (sprite-mtu), and updates the functional
   specifications for Tunnel Endpoints (TEs) found in existing tunneling
   mechanisms (see: Section 6).

   This document seeks to achieve an appropriate balance between
   function in the network and function in the end systems [RFC1958],
   and further observes the tunnel management specifications in
   [RFC2003][RFC2473][RFC4213].

2.  Terminology and Requirements

   The following abbreviations and terms are used in this document:

   ICMP - ICMPv4 [RFC0793] or ICMPv6 [RFC4443].

   IP - IPv4 [RFC0791] or IPv6 [RFC2460].

   IP/*/IP - an inner IP packet encapsulated in outer */IP headers
   (e.g. for "*" = NULL, UDP, TCP, AH, ESP, etc.)

   inner packet/fragment/header - an IP packet/fragment/header before
   */IP encapsulation.

   outer packet/fragment/header - a */IP packet/fragment/header after
   encapsulation.

   AQM - Active Queue Management

   DF - the IPv4 header "Don't Fragment" flag

   ENCAPS - the size of the encapsulating */IP headers plus trailers

   IPAE

   MTU - Maximum Transmission Unit

   linkMTU - MTU assigned to an IP link over which the tunnel is
   configured

   pathMTU - the minimum path MTU for the tunnel

   PTB - an ICMPv4 "Destination Unreachable - fragmentation needed"
   [RFC1191] or an ICMPv6 "Packet Too Big" [RFC1981] message.

   sprite-mtu - the sprite-mtu protocol, specified in this document

   sprite-udp - the sprite-udp UDP messaging service, also specified
   in this document

   sprite - a message of the "sprite-udp" service

   TE - Tunnel Endpoint

   TFE - Tunnel Far End

   TNE - Tunnel Near End

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

3.  Concept of Operation

   TEs use the tunnel management specifications in
   [RFC2003][RFC2473][RFC4213] and also participate in the "sprite-mtu"
   protocol to confirm the participation of the TFE, to determine
   per-tunnel MTU values, to detect path MTU-related black-holes, and to
   detect congestion. The protocol is supported through the exchange of
   messages between TEs using the "sprite-udp" UDP service. The
   mechanisms provide robust MTU determination and congestion control
   when both TEs support the protocol, and support the legacy behavior
   otherwise.

4.  "sprite-udp" UDP Service

   The "sprite-udp" service is a simple UDP service based on "sprite"
   messages. Sprite requests set the UDP destination port to the
   sprite-udp service port (see: Section 7), and set the source port to
   either the sprite-udp service port or a dynamic port number chosen by
   the source.
   Sprite replies set the UDP source port to the sprite-udp
   service port and set the UDP destination port to the value included
   in the UDP source port in the soliciting sprite request.

   All sprite requests and replies include a protocol version number, a
   message type field, a TTL value, a randomly-chosen 8-byte nonce used
   to match requests with their corresponding replies, and zero or more
   bytes of arbitrary data, as shown in Figure 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Vers  | Type  |      TTL      |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                           Nonce                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data ...
   +-+-+-+-+-

                    Figure 1: Sprite Message Format

   where the fields of the message body are defined as follows:

   Vers
      the Version field indicates the sprite-udp protocol version. This
      document describes version 1.

   Type
      the message type. Currently defined values are:

         0 - request

         1 - reply

         2 - 15 reserved for future use

   TTL
      set to 0 in sprite requests; in replies, set to the TTL/Hop Limit
      found in the IP header of the soliciting request.

   Checksum
      the 16-bit one's complement of the one's complement sum of the
      message body, starting with the version field and ending at the
      end of the data field. For computing the checksum, the checksum
      field is first set to zero. An all zero transmitted checksum
      value means that the transmitter generated no checksum.

   Nonce
      an 8-byte nonce field set to a randomly-initialized value in the
      sprite request and echoed in the reply.

   Data
      zero or more octets of arbitrary data, included in the sprite
      request and echoed in the reply.

5.  "sprite-mtu" Protocol Specification

   TEs that implement the sprite-udp service MUST also participate in
   the "sprite-mtu" protocol to: 1) determine whether the TFE implements
   the scheme, 2) detect path MTU-related black holes, 3) provide timely
   aging of stale path MTU information, 4) determine the length of the
   forward path through the tunnel, 5) determine accurate round trip
   times, and 6) detect and report congestion. The following sections
   specify the protocol details:

5.1.  Interactions with End-to-End MTU Determination

   The sprite-mtu protocol operates independently of any end-to-end MTU
   determination; however, it offers improved convergence time and
   efficiency when end-to-end mechanisms such as [RFC4821] are also
   used.

5.2.  Tunnel Virtual Interface MTU and linkMTU

   TEs SHOULD configure an MTU on the tunnel virtual interface (i.e.,
   the MTU that is seen by upper layers) that is at least as large as
   the largest linkMTU for all underlying interfaces over which the
   tunnel virtual interface is configured. For IPv6/*/IP tunnels, the
   tunnel virtual interface MUST configure an MTU no smaller than 1280
   bytes, and for IPv4/*/IP tunnels it MUST configure an MTU no smaller
   than 576 bytes.

   Additionally, operators SHOULD observe the recommendations in
   [RFC3819], Section 2, i.e., they should avoid setting a too-small
   linkMTU on any of the underlying interfaces over which the tunnel
   virtual interface is configured.

5.3.  Per-Tunnel Tunnel Soft State

   TEs maintain per-tunnel soft state information (e.g., in a conceptual
   neighbor cache) that is initialized when data begins flowing over the
   tunnel and is scheduled for deletion based on idle timers, resource
   limitations, etc. thereafter. The TNE maintains state regarding the
   forward path to the TFE, while the TFE maintains state regarding the
   number of packets received, packets in error, etc.
   The minimum state
   kept by the TNE and TFE is given in the following sections:

5.3.1.  TNE Soft State

   The TNE keeps the following minimum per-tunnel soft state, and MAY
   keep additional soft state (e.g., packets/bytes sent, collisions,
   etc.):

   isQualified
      boolean indicating whether the TFE implements the protocol.

      Initial value: FALSE

   TFEAddr
      the inner IP address of the TFE.

   pathMTU
      the current minimum path MTU across the tunnel to the TFE,
      determined through sprite probing to determine the largest size
      outer packet that can traverse the tunnel.

      Initial value: 0.

   TTL
      the current path length across the tunnel to the TFE, determined
      through sprite probing.

      Initial value: 0.

   RTT
      the round trip time for the TFE.

      Initial value: none

   spriteList
      a list of sprite requests that have been sent into the tunnel but
      not yet acknowledged by the TFE.

      Initial list: NULL

5.3.2.  TFE Soft State

   The TFE keeps the following minimum per-tunnel soft state
   information, and MAY keep additional soft state (e.g., congestion,
   error rate, etc.):

   TNEAddr
      the inner IP address of the TNE.

   rxTime
      the system time at which the soft state for this TNE was
      initialized.

   rxPackets
      number of packets received.

   rxBytes
      number of bytes received.

   rxDropped
      number of packets dropped due to incorrect checksums, congestion,
      etc.

5.4.  Soft State Maintenance

5.4.1.  TNE Soft State Maintenance

   When a TNE has data to send to a TFE, it obeys the specification in
   the normative IP/*/IP reference but also creates soft state per
   Section 5.3.1 (unless the soft state already exists) and sends sprite
   requests into the tunnel.
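
   As a rough illustration of the step just described, the following
   Python sketch creates the TNE soft state of Section 5.3.1 with its
   initial values and builds a sprite request using the layout of
   Figure 1. It is a sketch under stated assumptions, not part of the
   protocol: the names TneSoftState, internet_checksum,
   build_sprite_request and send_probe are hypothetical, Vers and Type
   are assumed to occupy 4 bits each, and a computed checksum of zero is
   assumed to be sent as 0xFFFF (as UDP does) so that it is not mistaken
   for "no checksum".

```python
import os
import struct
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

SPRITE_VERSION = 1   # 'Vers' value described in this document
TYPE_REQUEST = 0     # 'Type' value for a sprite request


@dataclass
class TneSoftState:
    """Minimum per-tunnel TNE soft state with its initial values."""
    is_qualified: bool = False        # isQualified
    tfe_addr: Optional[str] = None    # TFEAddr, unknown until a reply arrives
    path_mtu: int = 0                 # pathMTU
    ttl: int = 0                      # TTL (path length across the tunnel)
    rtt: Optional[float] = None       # RTT, initially none
    # spriteList: nonce -> (send time, data) for unacknowledged requests
    sprite_list: Dict[bytes, Tuple[float, bytes]] = field(default_factory=dict)


def internet_checksum(body: bytes) -> int:
    """16-bit one's complement of the one's complement sum of body."""
    if len(body) % 2:
        body += b"\x00"               # pad odd-length input for summing only
    total = sum(word for (word,) in struct.iter_unpack("!H", body))
    while total >> 16:                # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_sprite_request(nonce: bytes, data: bytes = b"") -> bytes:
    """Build a request: Vers=1, Type=0, TTL=0, checksum, nonce, data."""
    if len(nonce) != 8:
        raise ValueError("nonce must be 8 bytes")
    vers_type = (SPRITE_VERSION << 4) | TYPE_REQUEST   # assumed 4+4 bit split
    zeroed = struct.pack("!BBH", vers_type, 0, 0) + nonce + data
    checksum = internet_checksum(zeroed) or 0xFFFF     # assumption: never send 0
    return struct.pack("!BBH", vers_type, 0, checksum) + nonce + data


def send_probe(state: TneSoftState, now: float, data: bytes = b"") -> bytes:
    """Record a new request in spriteList and return its message body."""
    nonce = os.urandom(8)
    state.sprite_list[nonce] = (now, data)
    return build_sprite_request(nonce, data)
```

   A receiver can validate such a message by running internet_checksum
   over the whole body and checking for a result of zero. Keeping the
   send time and data alongside each nonce lets the TNE later match a
   reply on both 'Nonce' and 'Data' and derive the round trip time.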
   The TNE includes in the inner IP header of
   each sprite request one of its currently-assigned IP addresses as the
   source address and a destination address that is either an IP address
   of the TFE if one is known, or a randomly-chosen address otherwise.
   The TNE then sets 'Vers' to '1', sets 'Type' to '0', sets 'TTL' to
   '0', includes a randomly-chosen 8-byte Nonce value, and includes any
   additional data to be echoed in the sprite reply. The TNE finally
   calculates the checksum and writes its value in the 'Checksum' field
   (or, writes the value '0'), then sends the sprite request into the
   tunnel.

   If the TNE receives a sprite reply message that is apparently from
   the TFE and also includes a 'Nonce' and 'Data' that match a request
   in its 'spriteList', it sets 'isQualified' to TRUE; otherwise, it
   discards the reply. The TNE also records the round-trip time (RTT)
   relative to the time at which it sent the soliciting sprite request,
   as well as the difference between the TTL of the soliciting request
   and the TTL encoded in the reply. The TNE finally records the source
   address in the inner IP header of the sprite reply in 'TFEAddr' and
   uses this address as the inner IP destination address in subsequent
   requests.

   The TNE sends sprite requests continuously while it is actively
   sending data into the tunnel, but should limit the rate, e.g., on the
   order of 1 request per second. When the flow of data ceases, the TNE
   stops sending sprite requests and schedules the soft state entry for
   deletion. When the flow of data resumes, the TNE resumes sending
   sprite requests.

   The TNE removes sprite requests from its 'spriteList' when either a
   matching reply is received, or after a timeout period during which no
   matching reply is received. (Timeouts on the order of IPv6 neighbor
   discovery [RFC4861] are recommended.)

5.4.2.  TFE Soft State Maintenance

   When a TFE receives a sprite request, it prepares a reply that
   includes the inner IP source address of the request in the inner IP
   destination address and one of the TFE's currently-assigned IP
   addresses in the inner IP source address. The TFE then sets 'Vers'
   to '1', sets 'Type' to '1', copies the TTL/Hop Limit from the sprite
   request into the 'TTL' field, and copies the 'Nonce' and 'Data' in
   the request message body into the corresponding fields of the reply.
   The TFE finally calculates the checksum and writes its value in the
   'Checksum' field (or, writes the value '0'), then sends the sprite
   reply back to the TNE.

   If the inner IP destination address of the sprite request was the
   same as an address assigned to the TFE, the TFE also creates soft
   state per Section 5.3.2 (unless the soft state already exists) and
   records the inner IP source address in 'TNEAddr'.

   The TFE retains the soft state information as long as it continues to
   receive sprite requests from the TNE. When the flow of requests
   ceases, the TFE schedules the soft state entry for deletion.

5.5.  Sending Packets

   Inner IP packets forwarded by upper layers that are larger than the
   tunnel virtual interface MTU are dropped with an ICMP Packet Too Big
   (PTB) sent back to the original source, as for any IP interface.
   Other inner IP packets are forwarded into the tunnel interface, which
   will encapsulate and send them on the underlying tunnel and/or return
   an internally-generated PTB when necessary.

   TEs that implement the sprite-mtu protocol use the specifications for
   sending packets found in the following sections:

5.5.1.  Conceptual Sending Algorithm

   TEs use the conceptual sending algorithm in Figure 2 for sending
   packets that are forwarded into the tunnel virtual interface by upper
   layers:

      Fragment inner packet if necessary (Section 5.5.2)
      foreach inner packet/fragment
          Encapsulate as an outer packet (Section 5.5.3).
          Fragment outer packet if necessary (Section 5.5.4).
          foreach outer packet/fragment
              Send packet/fragment (Section 5.5.5).
          endforeach
          Send PTB appropriate to the inner protocol if
          necessary (Section 5.5.5).
      endforeach

                Figure 2: Conceptual Sending Algorithm

5.5.2.  Inner Packet Fragmentation

   The TE is not permitted to fragment inner IPv6 packets; therefore, an
   inner IPv6 packet is never fragmentable.

   The TE considers an inner IPv4 packet as fragmentable IFF the DF bit
   is set to 0 in the inner IPv4 header, and assumes that the original
   source has learned through some end-to-end means that the final
   destination is able to reassemble a packet of this size. The TE uses
   IPv4 fragmentation to break fragmentable inner packets into fragments
   no larger than (576 - ENCAPS) before encapsulation; these inner
   fragments will ultimately be reassembled by the final destination.

5.5.3.  Encapsulation and Trailers

   TEs encapsulate inner IP packets according to the specific IP/*/IP
   document. When 'isQualified' is TRUE, the TE includes a trailer with
   a correct non-zero checksum in all packets that may incur outer
   fragmentation (see: Section 5.5.4). For other packets, the TE can
   either: 1) include a trailer with a correct non-zero checksum, 2)
   include a trailer with a zero checksum, or 3) omit the trailer.

   For outer packets that will include a trailer during encapsulation,
   the TE includes zero or more padding bytes plus a 4-byte trailing
   checksum immediately following the inner IP packet.
   The TE
   increments the innermost '*' header length field by the number of
   trailer bytes added before applying the outermost */IP
   encapsulation(s). For example, it increments the UDP length field
   for IP/UDP/IP tunnels ('*' = UDP), the IPv4 length field for IP/IPv4
   tunnels ('*' = NULL), etc. The encapsulation is shown in Figure 3:

                                +---------------------------------+
                                |         Outer IP Header         |
                                |                                 |
                                +---------------------------------+
                                |            * Headers            |
                                |                                 |
   +-------------+              +---------------------------------+
   |  Inner IP   |              |            Inner IP             |
   ~   packet    ~     ===>     ~             packet              ~
   |             |              |                                 |
   +-------------+              +---------------------------------+ -\  T
    Inner Packet                |                                 |  |  r
                                ~    Padding (0 or more bytes)    ~  |  a
                                |                                 |   > i
                                +----------------+----------------+  |  l
                                |  cksum-A (16b) |  cksum-B (16b) |  |  e
                                +----------------+----------------+ -/  r
                                | Any */IP protocol trailers ...
                                +------------------------------
                                           Outer Packet

              Figure 3: Encapsulation Format with Trailer

   For all packets that will include a trailer, the TE appends any
   padding bytes as necessary to extend the packet to a specific length,
   then calculates the checksum as specified in Section 5.10. It then
   writes the results into the A and B fields of the trailing checksum.
   (The TE instead writes the value 0 in these fields if the trailer is
   to include a zero checksum.) The checksum is byte-aligned only,
   i.e., it need not be aligned on an even word/longword/etc. boundary.

   The TE SHOULD NOT include a trailer during encapsulation when
   'isQualified' is FALSE.

5.5.4.  Outer Packet Fragmentation and Setting DF

   The TE is not permitted to fragment outer */IPv6 packets using IPv6
   fragmentation.
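
   The outer-fragmentation test of this section, that is, the IPv6 rule
   above together with the two IPv4 conditions listed immediately after
   this sketch, can be written as a single predicate. The following
   Python sketch is illustrative only; the function name and its
   parameters are hypothetical, and sizes are in bytes.

```python
IPV6_MIN_MTU = 1280   # minimum IPv6 MTU used by the IPv6/*/IPv4 condition
IPV4_MIN_MTU = 576    # IPv4 size used by the IPv4/*/IPv4 condition


def outer_is_fragmentable(inner_version: int, outer_version: int,
                          inner_len: int, path_mtu: int, encaps: int) -> bool:
    """Return True if the outer packet may be fragmented by the TE.

    inner_version / outer_version are 4 or 6; inner_len is the size of the
    inner packet, path_mtu the tunnel's 'pathMTU', encaps the ENCAPS size.
    """
    if outer_version == 6:
        # Outer */IPv6 packets are never fragmented by the TE.
        return False
    if inner_version == 6:
        # IPv6/*/IPv4 tunnels
        return path_mtu < IPV6_MIN_MTU + encaps and inner_len <= IPV6_MIN_MTU
    # IPv4/*/IPv4 tunnels
    return path_mtu < IPV4_MIN_MTU and inner_len <= IPV4_MIN_MTU - encaps
```

   For instance, with ENCAPS = 28 an IPv6/*/IPv4 tunnel whose 'pathMTU'
   is 1300 treats a 1280-byte inner packet as fragmentable after
   encapsulation, while a 1281-byte inner packet is not.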

   The TE considers an outer */IPv4 packet as fragmentable IFF:

   o  for IPv6/*/IPv4 tunnels, 'pathMTU' is less than (1280 bytes +
      ENCAPS) and the inner IPv6 packet is no larger than 1280 bytes,
      or:

   o  for IPv4/*/IPv4 tunnels, 'pathMTU' is less than 576 bytes and the
      inner IPv4 packet is no larger than (576 bytes - ENCAPS)

   The TE's */IPv4 encapsulation layer(s) MAY fragment a fragmentable
   outer packet before admitting it into the tunnel. The TE SHOULD set
   DF=1 in the outer IPv4 header of each fragment if 'pathMTU' is known;
   otherwise, the TE MAY set DF=0 if there is assurance that the TFE can
   receive non-initial IPv4 fragments. These outer fragments will be
   reassembled by the TFE.

   The TE MUST set DF=1 in the outer IPv4 header of all unfragmentable
   outer packets.

5.5.5.  Admitting Packets into the Tunnel

   For IP/*/IPv4 tunnels, when 'isQualified' is FALSE and 'pathMTU' is
   smaller than the minimum values listed in Section 5.5.4, the TE MUST
   institute pacing and AQM to minimize IPv4 reassembly misassociations
   and/or congestion at the TFE. The TE MAY relax this pacing if
   'isQualified' becomes TRUE, but MUST resume pacing if it subsequently
   detects congestion at the TFE (see: Section 5.8).

   The TE admits outer packets into the tunnel subject to pacing and TTL
   restrictions. For unfragmentable outer packets that are larger than
   'pathMTU', the TE admits the packet but also sends a PTB message
   appropriate to the inner IP protocol with an MTU size of ('pathMTU' -
   ENCAPS).

5.6.  Receiving Packets

   TEs that implement the sprite-mtu protocol use the specifications for
   receiving packets found in the following sections:

5.6.1.  IPv4 Reassembly Cache Management

   IP/*/IPv4 TEs SHOULD perform AQM in their IPv4 reassembly cache.
   In
   particular, they should actively discard "stale" reassemblies that
   have no apparent opportunity for successful completion, i.e., even
   before the packets have reached the normal reassembly timeout
   expiration recommended in [RFC1122], Section 3.3.2.

5.6.2.  Decapsulation

   When the TE receives (and, if necessary, reassembles) an encapsulated
   packet, and the packet includes a trailing checksum, the TE accepts
   the packet if the checksum is correct and drops the packet if the
   checksum is incorrect. Otherwise, the TE decapsulates the packet
   exactly as specified in the appropriate IP/*/IP document and discards
   the trailer.

   During decapsulation, the TE also records 'rx*' statistics in its
   soft state entry for the tunnel, i.e., it increments 'rxPackets' and
   'rxBytes' for packets received with no errors, and increments
   'rxDropped' for packets dropped due to checksum errors, congestion,
   etc.

5.6.3.  Receiving Packet Too Big (PTB) Errors

   TNEs may receive PTB errors in response to any packets they admit
   into the tunnel. When the TNE receives a PTB with an MTU value
   smaller than 'pathMTU', it SHOULD record the new 'pathMTU' in its
   soft state entry for the tunnel.

5.6.4.  Receiving Other ICMP Errors

   TEs SHOULD observe the specifications in [RFC2003][RFC2473][RFC4213]
   when they receive other ICMP errors from within the tunnel, but are
   advised that ICMP denial-of-service attacks are possible.

5.7.  MTU Probing and Black Hole Detection

   When 'isQualified' is TRUE, the TE can send sprite requests to the
   TFE with trailing padding added during encapsulation to create an MTU
   probe of the desired length. In particular, when the TE sends a data
   packet into the tunnel with a size that exceeds the current
   'pathMTU', it MAY also send a padded sprite request of the same size
   into the tunnel.
   If the TE receives a matching sprite reply, it
   advances 'pathMTU' to the size of the request.

   The TE MAY send additional sprite requests into the tunnel to
   determine a maximum 'pathMTU' independent of any data packets sent
   into the tunnel, to detect a smaller 'pathMTU' due to routing changes
   and/or to detect MTU-related black holes.

   For IP/*/IPv4 tunnels, the TE MUST set DF=1 in any sprite request
   used for the purpose of 'pathMTU' probing.

5.8.  Congestion Control

   The TNE MUST set the ECN field in the inner IP header of sprite
   requests used for soft state maintenance to either ECT(0) or ECT(1)
   [RFC3168] and MUST also examine the ECN field in the inner IP headers
   of replies. When the TNE begins to observe the CE codepoint in the
   ECN field in the inner IP headers of successive sprite replies at a
   rate that exceeds a high water threshold, it institutes pacing per
   Section 5.5.5. The TNE MAY relax pacing when the rate falls below a
   low water threshold.

   The TFE MUST track the rate at which packets received from the TNE
   are dropped due to, e.g., checksum errors, congestion, etc. When the
   rate exceeds a high water threshold, the TFE MUST begin setting the
   CE codepoint in the ECN field in the inner IP headers of sprite
   replies sent in response to requests with the ECN field set to other
   than not-ECT. The TFE MUST also set the CE codepoint in ordinary
   data packets with the ECN field set to other than not-ECT when it
   forwards them to the next IP hop. When the rate of dropped packets
   falls below a low water threshold, the TFE MAY relax CE codepoint
   marking.

5.9.  Confidentiality

   The soft state maintenance procedures in Section 5.4 are vulnerable
   to denial-of-service and resource exhaustion attacks by attackers
   that know or can guess the inner IP addresses of the TEs.
   For
   bidirectional IPv6/*/IP tunnels, a simple mitigation is available
   through the use of IPv6 privacy addresses [RFC4941] and IPv6 Unique
   Local Address (ULA) prefixes [RFC4193].

   In particular, the TEs of bidirectional IPv6/*/IP tunnels should
   configure time-varying IPv6 privacy addresses from randomly-chosen
   IPv6 ULA prefixes (e.g., regenerate a new IPv6 privacy address from a
   different ULA prefix every 30 seconds) so that attackers cannot
   correlate activities and/or easily guess addresses. These addresses
   will not be routable beyond the scope of the TEs and need not be
   checked for uniqueness, since they need not be used for any purpose
   other than the exchange of sprite requests/replies.

   The same mitigation is also available for IPv4/*/IP tunnels through
   the use of random assignments from IPv4 private address ranges (e.g.,
   10.0.0.0/8); however, the number of bits for randomness is
   significantly smaller than for IPv6.

   This mitigation is not available for unidirectional tunnels, since
   any randomly-chosen site-scoped addresses would not be routable on
   the (non-tunneled) return path from the TFE to the TNE.

5.10.  sprite-mtu Checksum Calculation

   This specification uses a 16-bit variation of the Fletcher Checksum
   [RFC1146][STONE1][STONE2] called the "sprite-mtu checksum", which
   provides a lightweight integrity check with different properties than
   those used by common link layers and upper layer protocols.

   The TE calculates the sprite-mtu checksum beginning with the inner IP
   header up to the end of the inner packet including trailing padding,
   but not including the trailing checksum field itself. The TE
   calculates the checksum by summing every 10th byte of the packet
   beginning with byte 0 up to a maximum of 128 sampled bytes, i.e., for
   inner packets larger than 1280 bytes only the first 1280 bytes are
   covered by the checksum.
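
   The sampling described above can be sketched in Python as follows.
   This is a minimal sketch under stated assumptions rather than a
   reference implementation: the function names are hypothetical, the
   loop starts at byte 0 as the prose above states, the two accumulators
   are treated as 16-bit one's-complement sums with end-around carry,
   and the trailer is assumed to carry A then B in network byte order
   (the byte order is not stated in this document).

```python
import struct
from typing import Tuple

STRIDE = 10     # every 10th byte is sampled
LIMIT = 1280    # bytes at or beyond this offset are not covered


def ones_add16(x: int, y: int) -> int:
    """Add two 16-bit values with end-around carry (one's complement)."""
    total = x + y
    return (total & 0xFFFF) + (total >> 16)


def sprite_mtu_checksum(packet: bytes) -> Tuple[int, int]:
    """Return accumulators (A, B) over the sampled bytes of packet.

    'packet' is the inner IP packet plus any trailing padding, without
    the 4-byte trailing checksum itself.
    """
    a = b = 0
    for i in range(0, min(len(packet), LIMIT), STRIDE):
        a = ones_add16(a, packet[i])
        b = ones_add16(b, a)
    # A zero accumulator is inverted so that a computed checksum is never 0.
    if a == 0:
        a = 0xFFFF
    if b == 0:
        b = 0xFFFF
    return a, b


def make_trailer(packet: bytes) -> bytes:
    """Pack (A, B) as the 4-byte trailer; network byte order is assumed."""
    return struct.pack("!HH", *sprite_mtu_checksum(packet))
```

   Because only bytes 0, 10, 20, ... are read, changing an unsampled
   byte leaves the result unchanged. The check is intended as a
   lightweight guard (for example, against reassembly misassociations)
   and not as full coverage of the packet.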

   The TE calculates the checksum according to the algorithm below,
   which represents a slight variation of that found in [RFC1146]:

      The sprite-mtu checksum is calculated over a sequence of
      unsigned data octets (call them D[0] through D[N-1]) by
      maintaining unsigned 1's-complement 16-bit accumulators
      A and B whose contents are initially zero, and performing
      the following loop where i starts at 0 (the loop
      terminates when i >= N, or i >= 1280):

         A := A + D[i]
         B := B + A
         i := i + 10

      If, at the end of the loop, either or both of the A and B
      accumulators encode the value 0x0000, invert the value in the
      accumulator(s) to 0xffff.

   Note that faster algorithms are possible and may be used instead of
   the algorithm above (see: [RFC1146]). Note also that for '*/IP'
   encapsulations that include an additional checksum, the sprite-mtu
   checksum can be calculated in parallel.

6.  Updated Specifications

   This document updates the following specifications, and possibly
   others:

   o  RFC1853 (IP-in-IP)

   o  RFC2003 (IP-in-IP)

   o  RFC2473 (Generic packet tunneling in IPv6)

   o  RFC2529 (6over4)

   o  RFC2661 (L2TP)

   o  RFC2784 (GRE)

   o  RFC3056 (6to4)

   o  RFC3378 (ETHERIP)

   o  RFC3884 (IPsec Transport Mode for Dynamic Routing)

   o  RFC4023 (MPLS-in-IP)

   o  RFC4213 (Basic IPv6 Transition Mechanisms)

   o  RFC4214 (ISATAP)

   o  RFC4301 (IPsec)

   o  RFC4302 (AH)

   o  RFC4303 (ESP)

   o  RFC4380 (TEREDO)

   o  LISP

   o  others....

7.  IANA Considerations

   A new UDP port number for the "sprite-udp" service is requested.

8.  Security Considerations

   A possible denial of service attack vector involves an off-path
   attacker sending sprite replies with spoofed source addresses;
   however, the 8-byte nonce serves as an effective mitigation.

   A possible resource exhaustion attack vector exists when TEs use
   well-known and/or time-invariant addresses.
Time-varying private 740 addresses serve as an effective mitigation for bidirectional tunnels, 741 as specified in Section 5.9. 743 Security considerations for specific IP/*/IP tunneling mechanisms are 744 specified in their respective documents. 746 9. Acknowledgments 748 This work has benefited from discussions with Fred Baker, Iljitsch 749 van Beijnum, Brian Carpenter, Steve Casner, Remi Denis-Courmont, 750 Aurnaud Ebalard, Gorry Fairhurst, John Heffner, Bob Hinden, Christian 751 Huitema, Joe Macker, Matt Mathis, Dave Thaler, Joe Touch, Magnus 752 Westerlund and James Woodyatt. 754 This work is dedicated to the editor's family. 756 10. References 758 10.1. Normative References 760 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 761 September 1981. 763 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 764 RFC 793, September 1981. 766 [RFC1122] Braden, R., "Requirements for Internet Hosts - 767 Communication Layers", STD 3, RFC 1122, October 1989. 769 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 770 November 1990. 772 [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery 773 for IP version 6", RFC 1981, August 1996. 775 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 776 October 1996. 778 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 779 Requirement Levels", BCP 14, RFC 2119, March 1997. 781 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 782 (IPv6) Specification", RFC 2460, December 1998. 784 [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in 785 IPv6 Specification", RFC 2473, December 1998. 787 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 788 of Explicit Congestion Notification (ECN) to IP", 789 RFC 3168, September 2001. 791 [RFC4193] Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast 792 Addresses", RFC 4193, October 2005. 794 [RFC4213] Nordmark, E. and R. 
Gilligan, "Basic Transition Mechanisms 795 for IPv6 Hosts and Routers", RFC 4213, October 2005. 797 [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control 798 Message Protocol (ICMPv6) for the Internet Protocol 799 Version 6 (IPv6) Specification", RFC 4443, March 2006. 801 [RFC4941] Narten, T., Draves, R., and S. Krishnan, "Privacy 802 Extensions for Stateless Address Autoconfiguration in 803 IPv6", RFC 4941, September 2007. 805 10.2. Informative References 807 [RFC1146] Zweig, J. and C. Partridge, "TCP alternate checksum 808 options", RFC 1146, March 1990. 810 [RFC1958] Carpenter, B., "Architectural Principles of the Internet", 811 RFC 1958, June 1996. 813 [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", 814 RFC 2923, September 2000. 816 [RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., 817 Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. 818 Wood, "Advice for Internet Subnetwork Designers", BCP 89, 819 RFC 3819, July 2004. 821 [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- 822 Network Tunneling", RFC 4459, April 2006. 824 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 825 Discovery", RFC 4821, March 2007. 827 [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, 828 "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, 829 September 2007. 831 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 832 Errors at High Data Rates", RFC 4963, July 2007. 834 [STONE1] Stone, J., "Checksums in the Internet (Stanford Doctoral 835 Dissertation)", August 2001. 837 [STONE2] Stone, J., Greenwald, M., Partridge, C., and J. Hughes, 838 "Performance of Checksums and CRC's over Real Data, IEEE/ 839 ACM Transactions on Networking, Vol 6, No. 5", 840 October 1998. 842 Author's Address 844 Fred L. Templin (editor) 845 Boeing Phantom Works 846 P.O. 
Box 3707 847 Seattle, WA 98124 848 USA 850 Email: fred.l.templin@boeing.com 852 Full Copyright Statement 854 Copyright (C) The IETF Trust (2007). 856 This document is subject to the rights, licenses and restrictions 857 contained in BCP 78, and except as set forth therein, the authors 858 retain all their rights. 860 This document and the information contained herein are provided on an 861 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 862 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 863 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 864 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 865 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 866 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 868 Intellectual Property 870 The IETF takes no position regarding the validity or scope of any 871 Intellectual Property Rights or other rights that might be claimed to 872 pertain to the implementation or use of the technology described in 873 this document or the extent to which any license under such rights 874 might or might not be available; nor does it represent that it has 875 made any independent effort to identify any such rights. Information 876 on the procedures with respect to rights in RFC documents can be 877 found in BCP 78 and BCP 79. 879 Copies of IPR disclosures made to the IETF Secretariat and any 880 assurances of licenses to be made available, or the result of an 881 attempt made to obtain a general license or permission for the use of 882 such proprietary rights by implementers or users of this 883 specification can be obtained from the IETF on-line IPR repository at 884 http://www.ietf.org/ipr. 886 The IETF invites any interested party to bring to its attention any 887 copyrights, patents or patent applications, or other proprietary 888 rights that may cover technology that may be required to implement 889 this standard. 
Please address the information to the IETF at 890 ietf-ipr@ietf.org. 892 Acknowledgment 894 Funding for the RFC Editor function is provided by the IETF 895 Administrative Support Activity (IASA).