idnits 2.17.1

draft-templin-inetmtu-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 872.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 883.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 890.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 896.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 14, 2007) is 6009 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '0' on line 672

  == Missing Reference: 'N-1' is mentioned on line 672, but not defined

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 1981 (Obsoleted by RFC 8201)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 4941 (Obsoleted by RFC 8981)

  -- Obsolete informational reference (is this intentional?): RFC 1146
     (Obsoleted by RFC 6247)

     Summary: 5 errors (**), 0 flaws (~~), 3 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                    F. Templin, Ed.
3   Internet-Draft                                      Boeing Phantom Works
4   Intended status: Informational                         November 14, 2007
5   Expires: May 17, 2008

7     Simple Protocol for Robust IP/*/IP Tunnel Endpoint MTU Determination
8                                 (sprite-mtu)
9                         draft-templin-inetmtu-06.txt

11  Status of this Memo

13     By submitting this Internet-Draft, each author represents that any
14     applicable patent or other IPR claims of which he or she is aware
15     have been or will be disclosed, and any of which he or she becomes
16     aware will be disclosed, in accordance with Section 6 of BCP 79.

18     Internet-Drafts are working documents of the Internet Engineering
19     Task Force (IETF), its areas, and its working groups. Note that
20     other groups may also distribute working documents as Internet-
21     Drafts.

23     Internet-Drafts are draft documents valid for a maximum of six months
24     and may be updated, replaced, or obsoleted by other documents at any
25     time. It is inappropriate to use Internet-Drafts as reference
26     material or to cite them other than as "work in progress."

28     The list of current Internet-Drafts can be accessed at
29     http://www.ietf.org/ietf/1id-abstracts.txt.

31     The list of Internet-Draft Shadow Directories can be accessed at
32     http://www.ietf.org/shadow.html.

34     This Internet-Draft will expire on May 17, 2008.

36  Copyright Notice

38     Copyright (C) The IETF Trust (2007).

40  Abstract

42     The nominal Maximum Transmission Unit (MTU) of today's Internet has
43     become 1500 bytes, but IP/*/IP tunneling mechanisms impose an
44     encapsulation overhead that can reduce the effective path MTU to
45     smaller values. Additionally, existing tunneling mechanisms are
46     limited in their ability to support larger MTUs. This document
47     specifies a simple protocol for robust IP/*/IP tunnel endpoint MTU
48     determination (sprite-mtu).

50  Table of Contents

52     1. Introduction
53     2. Terminology and Requirements
54     3. Concept of Operation
55     4. "sprite-udp" UDP Service
56     5. "sprite-mtu" Protocol Specification
57        5.1. Interactions with End-to-End MTU Determination
58        5.2. Tunnel Virtual Interface MTU and linkMTU
59        5.3. Sprite Addresses
60        5.4. Per-Tunnel Soft State
61           5.4.1. TNE Soft State
62           5.4.2. TFE Soft State
63        5.5. Soft State Maintenance
64           5.5.1. TNE Soft State Maintenance
65           5.5.2. TFE Soft State Maintenance
66        5.6. Sending Packets
67           5.6.1. Conceptual Sending Algorithm
68           5.6.2. Inner Packet Fragmentation
69           5.6.3. Encapsulation and Trailers
70           5.6.4. Outer Packet Fragmentation and Setting DF
71           5.6.5. Admitting Packets into the Tunnel
72        5.7. Receiving Packets
73           5.7.1. IPv4 Reassembly Cache Management
74           5.7.2. Decapsulation
75           5.7.3. Receiving Packet Too Big (PTB) Errors
76           5.7.4. Receiving Other ICMP Errors
77        5.8. MTU Probing and Black Hole Detection
78        5.9. Congestion Control
79        5.10. sprite-mtu Checksum Calculation
80     6. Updated Specifications
81     7. IANA Considerations
82     8. Security Considerations
83     9. Acknowledgments
84     10. References
85        10.1. Normative References
86        10.2. Informative References
87     Author's Address
88     Intellectual Property and Copyright Statements

90  1. Introduction

92     The nominal Maximum Transmission Unit (MTU) of today's Internet has
93     become 1500 bytes due to the preponderance of networking gear that
94     configures an MTU of that size. Since not all links in the Internet
95     configure a 1500 byte MTU, however, packets can be dropped due to an
96     MTU restriction on the path.

98     Upper layers see IP/*/IP tunnels as ordinary links, but even for
99     small packets these links are susceptible to silent loss (e.g., due
100    to path MTU restrictions, lost error messages, layered
101    encapsulations, reassembly buffer limitations, etc.) resulting in
102    poor performance and/or communications failures
103    [RFC2923][RFC4459][RFC4821][RFC4963].

105    This document specifies a simple protocol for robust IP/*/IP tunnel
106    endpoint MTU determination (sprite-mtu), and updates the functional
107    specifications for Tunnel Endpoints (TEs) found in existing tunneling
108    mechanisms (see: Section 6).

110    This document seeks to achieve an appropriate balance between
111    function in the network and function in the end systems [RFC1958],
112    and further observes the tunnel management specifications in
113    [RFC2003][RFC2473][RFC4213].

115 2. Terminology and Requirements

117    The following abbreviations and terms are used in this document:

119    ICMP - ICMPv4 [RFC0793] or ICMPv6 [RFC4443].

121    IP - IPv4 [RFC0791] or IPv6 [RFC2460].

123    IP/*/IP - an inner IP packet encapsulated in outer */IP headers
124       (e.g. for "*" = NULL, UDP, TCP, AH, ESP, etc.)

126    inner packet/fragment/header - an IP packet/fragment/header before
127       */IP encapsulation.

129    outer packet/fragment/header - a */IP packet/fragment/header after
130       encapsulation.

132    AQM - Active Queue Management

134    DF - the IPv4 header "Don't Fragment" flag

135    ENCAPS - the size of the encapsulating */IP headers plus trailers

137    EMTU_R - Effective Maximum Transmission Unit to Receive [RFC1122]
138       for the TFE

140    MTU - Maximum Transmission Unit

142    linkMTU - MTU assigned to a link over which the tunnel is
143       configured

145    pathMTU - the minimum path MTU for the tunnel

147    PTB - an ICMPv4 "Destination Unreachable - fragmentation needed"
148       [RFC1191] or an ICMPv6 "Packet Too Big" [RFC1981] message.

150    sprite-mtu - the sprite-mtu protocol, specified in this document

152    sprite-udp - the sprite-udp UDP messaging service, also specified
153       in this document

155    sprite - a message of the "sprite-udp" service

157    TE - Tunnel Endpoint

159    TFE - Tunnel Far End

161    TNE - Tunnel Near End

163    The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
164    SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
165    document, are to be interpreted as described in [RFC2119].

167 3. Concept of Operation

169    TEs use the tunnel management specifications in
170    [RFC2003][RFC2473][RFC4213] and also participate in the "sprite-mtu"
171    protocol to confirm the participation of the TFE, to determine per-
172    tunnel MTU values, to detect path MTU-related black holes, and to
173    detect congestion. The protocol is supported through the exchange of
174    messages between TEs using the "sprite-udp" UDP service. The
175    mechanisms provide robust MTU determination and congestion control
176    when both TEs support the protocol, and support the legacy behavior
177    otherwise.

179 4. "sprite-udp" UDP Service

181    The "sprite-udp" service is a simple UDP service based on "sprite"
182    messages. Sprite requests set the UDP destination port to the
183    sprite-udp service port (see: Section 7), and set the source port to
184    either the sprite-udp service port or a dynamic port number chosen by
185    the source. Sprite replies set the UDP source port to the sprite-udp
186    service port and set the UDP destination port to the value included
187    in the UDP source port in the soliciting sprite request.
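
The port rules above can be sketched as follows. This is a minimal illustration rather than part of the protocol definition: the port value 49152 is a hypothetical placeholder (the draft only requests an assignment in Section 7), and the helper names request_ports and reply_ports are assumptions introduced here.

```python
# Hypothetical placeholder: no sprite-udp port has been assigned (see Section 7).
SPRITE_UDP_PORT = 49152


def request_ports(dynamic_src=None):
    """Return (source, destination) UDP ports for a sprite request.

    The source port is either the service port or a dynamic port chosen
    by the sender; the destination is always the service port.
    """
    src = SPRITE_UDP_PORT if dynamic_src is None else dynamic_src
    return src, SPRITE_UDP_PORT


def reply_ports(request_src, request_dst):
    """Return (source, destination) UDP ports for the reply to a request.

    The reply is sent from the service port back to whatever source port
    the soliciting request used.
    """
    if request_dst != SPRITE_UDP_PORT:
        raise ValueError("request was not addressed to the sprite-udp port")
    return SPRITE_UDP_PORT, request_src
```

A request sent from dynamic port 40000 would therefore be answered from the service port to port 40000, which lets the requester demultiplex replies without owning the service port itself.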

189    All sprite requests and replies are formatted as shown in Figure 1
190    (format for other sprite messages may be specified in future
191    documents):

193     0                   1                   2                   3
194     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
195    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
196    | Vers  | Type  |      TTL      |           Checksum            |
197    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
198    |        Identification         |        Sequence Number        |
199    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
200    |  Data ...
201    +-+-+-+-+-

203                     Figure 1: Sprite Message Format

205    where the fields of the message body are defined as follows:

207    Vers
208       the Version field indicates the sprite-udp protocol version. This
209       document describes version 1.

211    Type
212       the message type. Currently defined values are:

214          0 - request

216          1 - reply

218          2 - 15 reserved for future use

220    TTL
221       in sprite requests, set to 0; in sprite replies, set to the TTL/
222       Hop Limit in the IP header of the request to which this packet is
223       a reply.

225    Checksum
226       the 16-bit one's complement of the one's complement sum of the
227       message body, starting with the version field and ending at the
228       end of the data field. For computing the checksum, the checksum
229       field is first set to zero. An all zero transmitted checksum
230       value means that the transmitter generated no checksum.

232    Identification, Sequence Number
233       Two 16-bit fields, used exactly as specified for the corresponding
234       fields in ICMP echo request and reply messages.

236    Data
237       Zero or more octets of arbitrary data, included in the sprite
238       request and echoed in the reply.

240 5. "sprite-mtu" Protocol Specification

242    TEs that implement the sprite-udp service MUST also participate in
243    the "sprite-mtu" protocol to: 1) determine whether the TFE implements
244    the scheme, 2) detect path MTU-related black holes, 3) provide timely
245    aging of stale path MTU information, 4) determine the length of the
246    forward path through the tunnel, 5) determine accurate round trip
247    times, and 6) detect and report congestion. The following sections
248    specify the protocol details:

250 5.1. Interactions with End-to-End MTU Determination

252    The sprite-mtu protocol operates independently of any end-to-end MTU
253    determination; however, it offers improved convergence time and
254    efficiency when end-to-end mechanisms such as [RFC4821] are also
255    used.

257 5.2. Tunnel Virtual Interface MTU and linkMTU

259    TEs SHOULD configure an MTU on the tunnel virtual interface (i.e.,
260    the MTU that is seen by upper layers) that is at least as large as
261    the largest linkMTU for all underlying interfaces over which the
262    tunnel virtual interface is configured. For IPv6/*/IP tunnels, the
263    tunnel virtual interface MUST configure an MTU no smaller than 1280
264    bytes, and for IPv4/*/IP tunnels it SHOULD configure an MTU no
265    smaller than 576 bytes.

267    Additionally, operators SHOULD observe the recommendations in
268    [RFC3819], Section 2, i.e., they should avoid setting a too-small
269    linkMTU on any of the underlying interfaces over which the tunnel
270    virtual interface is configured.

272 5.3. Sprite Addresses

274    TEs must configure a time-varying link-local address (e.g., they
275    regenerate a new link-local address every 30 seconds) for the tunnel
276    interface known as the "sprite address". The sprite address is
277    associated with the tunnel interface (i.e., not assigned to the
278    interface) hence it need not be checked for uniqueness and will only
279    be used for the purpose of sprite exchanges for soft state management
280    (see below).

282    For IPv6/*/IPv4 tunnels, link-local IPv6 privacy addresses [RFC4941]
283    are used. For IPv4/*/IP tunnels, random assignments from the IPv4
284    link local address range [RFC3927] are used; however, the number of
285    bits for randomness is significantly smaller than for IPv6.

287    This addressing scheme is not available for unidirectional tunnels,
288    since link local addresses would not be routable on the (non-
289    tunneled) return path from the TFE to the TNE.

291 5.4. Per-Tunnel Soft State

293    TEs maintain per-tunnel soft state information (e.g., in a conceptual
294    neighbor cache) that is initialized when there is evidence that a
295    continuous flow of data will traverse the tunnel and is scheduled for
296    deletion based on idle timers, resource limitations, etc. thereafter.
297    The TNE maintains state regarding the forward path to the TFE. When
298    necessary (e.g., when the tunnel is fragmenting), the TFE also
299    maintains state regarding the number of packets received, packets in
300    error, etc. The minimum state kept by the TNE and TFE is given in
301    the following sections:

303 5.4.1. TNE Soft State

305    The TNE keeps the following minimum per-tunnel soft state for active
306    tunnels, and MAY keep additional soft state (e.g., packets/bytes
307    sent, collisions, etc.):

309    isQualified
310       boolean indicating whether the TFE implements the protocol.

312       Initial value: FALSE

314    TFEAddr
315       the inner IP address of the TFE.

317    pathMTU
318       the current minimum path MTU across the tunnel to the TFE,
319       determined through sprite probing to determine the largest size
320       outer packet that can traverse the tunnel.

322       Initial value: 0.

324    TTL
325       the current path length across the tunnel to the TFE, determined
326       through sprite probing.

328       Initial value: 0.

330    RTT
331       the round trip time for the TFE.

333       Initial value: none

335    spriteList
336       a list of sprite requests that have been sent into the tunnel but
337       not yet acknowledged by the TFE.

339       Initial list: NULL

341 5.4.2. TFE Soft State

343    When requested by the TNE, the TFE keeps the following minimum per-
344    tunnel soft state information, and MAY keep additional soft state
345    (e.g., congestion, error rate, etc.):

347    rxTime
348       the system time at which the soft state for this TNE was
349       initialized.

351    rxPackets
352       number of packets received.

354    rxBytes
355       number of bytes received.

357    rxDropped
358       number of packets dropped due to incorrect checksums, congestion,
359       etc.

361 5.5. Soft State Maintenance

363 5.5.1. TNE Soft State Maintenance

365    When a TNE has data to send to a TFE, it obeys the specification in
366    the normative IP/*/IP reference. When there is evidence that a
367    persistent flow of data will traverse the tunnel, the TNE also
368    creates soft state per Section 5.4.1 (unless the soft state already
369    exists) and sends sprite requests into the tunnel. The TNE includes
370    in the inner IP header of each sprite request its current sprite
371    address as the source address and a destination address that is
372    either the current sprite address of the TFE or a different link
373    local address. The TNE then sets 'Vers' to '1', sets 'Type' to '0',
374    sets 'TTL' to '0', sets 'Identification' and 'Sequence Number' to
375    identifying values, and includes a randomly-chosen nonce value (8
376    bytes recommended) in the 'Data' field along with any other data to
377    be echoed. The TNE finally calculates the checksum and writes its
378    value in the 'Checksum' field (or, writes the value '0'), then sends
379    the sprite request into the tunnel.

381    If the TNE receives a sprite reply message that is apparently from
382    the TFE and also includes an 'Identification', 'Sequence Number', and
383    'Data' that match a request in its 'spriteList', it sets
384    'isQualified' to TRUE; otherwise, it discards the reply. The TNE
385    also records the round-trip time (RTT) relative to the time at which
386    it sent the soliciting sprite request, as well as the difference
387    between the TTL of the soliciting request and the TTL encoded in the
388    reply. The TNE finally records the source address in the inner IP
389    header of the sprite reply in 'TFEAddr'.

391    When the TNE requires the TFE to maintain state (e.g., when the
392    tunnel is fragmenting), it sends continuous sprite requests at a rate
393    of no more than 1 request per second and sets the inner destination
394    address of each request to 'TFEAddr'. When the flow of data ceases,
395    the TNE stops sending sprite requests and schedules the soft state
396    entry for deletion. When the flow of data resumes, the TNE resumes
397    sending sprite requests.

399    The TNE removes sprite requests from its 'spriteList' when either a
400    matching reply is received, or after a timeout period during which no
401    matching reply is received. (Timeouts on the order of IPv6 neighbor
402    discovery [RFC4861] are recommended.)

404 5.5.2. TFE Soft State Maintenance

406    When a TFE receives a sprite request, it prepares a reply that
407    includes the inner IP source address of the request in the inner IP
408    destination address and its current sprite address in the inner IP
409    source address. The TFE then sets 'Vers' to '1', sets 'Type' to '1',
410    copies the TTL/Hop Limit from the sprite request into the 'TTL'
411    field, and copies 'Identification', 'Sequence Number' and 'Data' from
412    the request message body into the corresponding fields of the reply.
413    The TFE finally calculates the checksum and writes its value in the
414    'Checksum' field (or, writes the value '0'), then sends the sprite
415    reply back to the TNE.

417    If the inner IP destination address of the sprite request was the
418    same as the TFE's current sprite address, the TFE creates soft state
419    per Section 5.4.2 if possible. The TFE maintains the soft state as
420    long as it continues to receive sprite requests from the TNE that
421    include an inner destination address that matches its current sprite
422    address. When the flow of requests ceases, the TFE schedules the
423    soft state entry for deletion.

425    If the TFE is unable to maintain soft state, e.g., due to resource
426    limitations, it sets the inner IP destination address in its sprite
427    replies to a different link local IP address than the one included in
428    the inner IP source address of the sprite request. The TNE must
429    accept this as an indication that the TFE is not currently
430    maintaining soft state.

432 5.6. Sending Packets

434    Inner IP packets forwarded by upper layers that are larger than the
435    tunnel virtual interface MTU are dropped with an ICMP Packet Too Big
436    (PTB) sent back to the original source, as for any IP interface.
437    Other inner IP packets are forwarded into the tunnel interface, which
438    will encapsulate and send them on the underlying tunnel and/or return
439    an internally-generated PTB when necessary.

441    TEs that implement the sprite-mtu protocol use the specifications for
442    sending packets found in the following sections:

444 5.6.1. Conceptual Sending Algorithm

446    TEs use the conceptual sending algorithm in Figure 2 for sending
447    packets that are forwarded into the tunnel virtual interface by upper
448    layers:

450      Fragment inner packet if necessary (Section 5.6.2).
451      foreach inner packet/fragment
452          Encapsulate as an outer packet (Section 5.6.3).
453          Fragment outer packet if necessary (Section 5.6.4).
454          foreach outer packet/fragment
455              Send packet/fragment (Section 5.6.5).
456          endforeach
457          Send PTB appropriate to the inner protocol if
458          necessary (Section 5.6.5).
459      endforeach

461                 Figure 2: Conceptual Sending Algorithm

463 5.6.2. Inner Packet Fragmentation

465    The TE considers an inner IPv4 packet as fragmentable IFF the DF bit
466    is set to 0 in the inner IPv4 header, and assumes that the original
467    source has learned through some end-to-end means that the final
468    destination is able to reassemble a packet of this size. The TE uses
469    IPv4 fragmentation to break fragmentable inner packets into fragments
470    no larger than (576 - ENCAPS) before encapsulation; these inner
471    fragments will ultimately be reassembled by the final destination.

473    The TE is not permitted to fragment inner IPv6 packets, therefore an
474    inner IPv6 packet is never fragmentable.

476 5.6.3. Encapsulation and Trailers

478    TEs encapsulate inner IP packets according to the specific IP/*/IP
479    document. When 'isQualified' is TRUE, the TE includes a trailer with
480    a correct non-zero checksum in all packets that may incur outer
481    fragmentation (see: Section 5.6.4). For other packets, the TE can
482    either: 1) include a trailer with a correct non-zero checksum, 2)
483    include a trailer with a zero checksum, or 3) omit the trailer.

485    For outer packets that will include a trailer during encapsulation,
486    the TE includes zero or more padding bytes plus a 4-byte trailing
487    checksum immediately following the inner IP packet. The TE
488    increments the innermost '*' header length field by the number of
489    trailer bytes added before applying the outermost */IP
490    encapsulation(s). For example, it increments the UDP length field
491    for IP/UDP/IP tunnels ('*' = UDP), the IPv4 length field for IP/IPv4
492    tunnels ('*' = NULL), etc.
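
The trailer construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a normative encoding: the checksum follows the loop given in Section 5.10, read here as sampling every 10th octet starting at the first octet; padding octets are zero; the A and B fields are written in network byte order; and pad_to counts the inner packet plus padding, excluding the 4-octet checksum. The function names are introduced here for illustration only, and the '*' header length adjustment is not shown.

```python
import struct


def ones_add16(x, y):
    """Add two 16-bit values with end-around carry (1's-complement addition)."""
    total = x + y
    return (total & 0xFFFF) + (total >> 16)


def sprite_mtu_checksum(data):
    """Return the (A, B) accumulators over every 10th octet of data.

    Assumption: sampling starts at the first octet (index 0).
    """
    a = b = 0
    for i in range(0, len(data), 10):
        a = ones_add16(a, data[i])
        b = ones_add16(b, a)
    # Per Section 5.10, an accumulator that ends at 0x0000 is sent as 0xffff.
    return (a or 0xFFFF, b or 0xFFFF)


def add_trailer(inner_packet, pad_to=0):
    """Append zero padding up to pad_to octets, then the 4-octet checksum."""
    padding = bytes(max(0, pad_to - len(inner_packet)))
    covered = inner_packet + padding
    a, b = sprite_mtu_checksum(covered)
    # Assumption: cksum-A then cksum-B, each 16 bits, network byte order.
    return covered + struct.pack("!HH", a, b)
```

Because only one octet in ten is sampled, the check is cheap but does not cover every octet of the packet.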
The encapsulation is shown in Figure 3:

494                              +---------------------------------+
495                              |         Outer IP Header         |
496                              |                                 |
497                              +---------------------------------+
498                              |            * Headers            |
499                              |                                 |
500    +-------------+           +---------------------------------+
501    |  Inner IP   |           |            Inner IP             |
502    ~   packet    ~   ===>    ~             packet              ~
503    |             |           |                                 |
504    +-------------+           +---------------------------------+ -\  T
505     Inner Packet             |                                 |  |  r
506                              ~    Padding (0 or more bytes)    ~  |  a
507                              |                                 |   > i
508                              +----------------+----------------+  |  l
509                              |  cksum-A (16b) |  cksum-B (16b) |  |  e
510                              +----------------+----------------+ -/  r
511                              |  Any */IP protocol trailers ...
512                              +------------------------------
513                                         Outer Packet

515              Figure 3: Encapsulation Format with Trailer

517    For all packets that will include a trailer, the TE appends any
518    padding bytes as necessary to extend the packet to a specific length,
519    then calculates the checksum as specified in Section 5.10. It then
520    writes the results into the A and B fields of the trailing checksum.
521    (The TE instead writes the value 0 in these fields if the trailer is
522    to include a zero checksum.) The checksum is byte-aligned only,
523    i.e., it need not be aligned on an even word/longword/etc. boundary.

525    The TE SHOULD NOT include a trailer during encapsulation when
526    'isQualified' is FALSE.

528 5.6.4. Outer Packet Fragmentation and Setting DF

530    The TE is not permitted to fragment outer */IPv6 packets using IPv6
531    fragmentation.

533    The TE considers an outer */IPv4 packet as fragmentable IFF:

535    o  for IPv6/*/IPv4 tunnels, 'pathMTU' is less than (1280 bytes +
536       ENCAPS) and the inner IPv6 packet is no larger than 1280 bytes,
537       or:

539    o  for IPv4/*/IPv4 tunnels, 'pathMTU' is less than MIN(EMTU_R, 1280)
540       bytes and the inner IPv4 packet is no larger than MIN(EMTU_R,
541       1280) minus ENCAPS. (When EMTU_R for the TFE is not known, 576
542       bytes must be assumed.)
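
The two conditions above can be expressed as a single predicate. The following is a sketch only; the function name and parameter names are assumptions introduced here, with path_mtu, encaps, and emtu_r standing for 'pathMTU', ENCAPS, and EMTU_R.

```python
def outer_ipv4_fragmentable(inner_version, inner_len, path_mtu, encaps,
                            emtu_r=None):
    """Return True if an outer */IPv4 packet is considered fragmentable.

    inner_version is 6 for IPv6/*/IPv4 tunnels and 4 for IPv4/*/IPv4
    tunnels; all sizes are in bytes.
    """
    if inner_version == 6:
        return path_mtu < 1280 + encaps and inner_len <= 1280
    if inner_version == 4:
        # 576 bytes must be assumed when EMTU_R for the TFE is not known.
        limit = min(576 if emtu_r is None else emtu_r, 1280)
        return path_mtu < limit and inner_len <= limit - encaps
    raise ValueError("inner_version must be 4 or 6")
```

For example, with ENCAPS = 20 an IPv6/*/IPv4 tunnel whose 'pathMTU' is 1200 may fragment the outer packet carrying a 1280-byte inner packet, whereas one with a 'pathMTU' of 1400 may not.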

544    The TE's */IPv4 encapsulation layer(s) MAY fragment a fragmentable
545    outer packet before admitting it into the tunnel. The TE SHOULD set
546    DF=1 in the outer IPv4 header of each fragment if 'pathMTU' is known;
547    otherwise, the TE MAY set DF=0 if there is assurance that the TFE can
548    receive non-initial IPv4 fragments. These outer fragments will be
549    reassembled by the TFE.

551    The TE MUST set DF=1 in the outer IPv4 header of all unfragmentable
552    outer packets.

554 5.6.5. Admitting Packets into the Tunnel

556    For IP/*/IPv4 tunnels, when 'pathMTU' is smaller than the minimum
557    values listed in Section 5.6.4 and the TFE is not maintaining soft
558    state, the TE MUST institute pacing and AQM to minimize IPv4
559    reassembly misassociations and/or congestion at the TFE. The TE MAY
560    relax this pacing when the TFE indicates that it is maintaining soft
561    state, but MUST resume pacing if it subsequently detects congestion
562    at the TFE (see: Section 5.9).

564    The TE admits outer packets into the tunnel subject to pacing and TTL
565    restrictions. For unfragmentable outer packets that are larger than
566    'pathMTU', the TE admits the packet but also sends a PTB message
567    appropriate to the inner IP protocol with an MTU size of ('pathMTU' -
568    ENCAPS).

570 5.7. Receiving Packets

572    TEs that implement the sprite-mtu protocol use the specifications for
573    receiving packets found in the following sections:

575 5.7.1. IPv4 Reassembly Cache Management

577    IP/*/IPv4 TEs SHOULD perform AQM in their IPv4 reassembly cache. In
578    particular, they should actively discard "stale" reassemblies that
579    have no apparent opportunity for successful completion, i.e., even
580    before the packets have reached the normal reassembly timeout
581    expiration recommended in [RFC1122], Section 3.3.2.

583 5.7.2. Decapsulation

585    When the TE receives (and, if necessary, reassembles) an encapsulated
586    packet, and the packet includes a trailing checksum, the TE accepts
587    the packet if the checksum is correct and drops the packet if the
588    checksum is incorrect. Otherwise, the TE decapsulates the packet
589    exactly as specified in the appropriate IP/*/IP document and discards
590    the trailer.

592    During decapsulation, if the TE is maintaining soft state it also
593    records 'rx*' statistics in its soft state entry for the tunnel,
594    i.e., it increments 'rxPackets' and 'rxBytes' for packets received
595    with no errors, and increments 'rxDropped' for packets dropped due to
596    checksum errors, congestion, etc.

598 5.7.3. Receiving Packet Too Big (PTB) Errors

600    TNEs may receive PTB errors in response to any packets they admit
601    into the tunnel. When the TNE receives a PTB with an MTU value
602    smaller than 'pathMTU', it SHOULD record the new 'pathMTU' in its
603    soft state entry for the tunnel.

605 5.7.4. Receiving Other ICMP Errors

607    TEs SHOULD observe the specifications in [RFC2003][RFC2473][RFC4213]
608    when they receive other ICMP errors from within the tunnel, but are
609    advised that ICMP denial-of-service attacks are possible.

611 5.8. MTU Probing and Black Hole Detection

613    When 'isQualified' is TRUE, the TE can send sprite requests to the
614    TFE with trailing padding added during encapsulation to create an MTU
615    probe of the desired length. In particular, when the TE sends a data
616    packet into the tunnel with a size that exceeds the current
617    'pathMTU', it MAY also send a padded sprite request of the same size
618    into the tunnel. If the TE receives a matching sprite reply, it
619    advances 'pathMTU' to the size of the request.
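
The probing step above can be sketched as bookkeeping on the TNE soft state. This is an illustrative sketch, not the protocol's required data layout: the class and function names are assumptions, 'spriteList' is modeled as a dictionary keyed by (Identification, Sequence Number, nonce), and "advances" is read as raising 'pathMTU' only when the acknowledged probe is larger than the current value. Packet transmission itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class TneState:
    """Subset of the TNE soft state used for MTU probing."""
    is_qualified: bool = False          # 'isQualified'
    path_mtu: int = 0                   # 'pathMTU'
    # 'spriteList': outstanding probes, (ident, seq, nonce) -> probe size
    sprite_list: dict = field(default_factory=dict)


def send_probe(state, ident, seq, nonce, size):
    """Record a padded sprite request of size bytes as outstanding.

    Returns False (and records nothing) when the TFE is not yet known
    to implement the protocol.
    """
    if not state.is_qualified:
        return False
    state.sprite_list[(ident, seq, nonce)] = size
    return True


def on_reply(state, ident, seq, nonce):
    """Handle a sprite reply; returns False if it matches no request."""
    size = state.sprite_list.pop((ident, seq, nonce), None)
    if size is None:
        return False                    # unmatched replies are discarded
    if size > state.path_mtu:
        state.path_mtu = size           # advance 'pathMTU' to the probe size
    return True
```

Keying the outstanding probes on the nonce as well as the identification and sequence number means a reply with a guessed or stale nonce cannot advance 'pathMTU'.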
621 The TE MAY send additional sprite requests into the tunnel to 622 determine a maximum 'pathMTU' independent of any data packets sent 623 into the tunnel, to detect a smaller 'pathMTU' due to routing changes 624 and/or to detect MTU-related black holes. 626 For IP/*/IPv4 tunnels, the TE MUST set DF=1 in any sprite request 627 used for the purpose of 'pathMTU' probing. 629 5.9. Congestion Control 631 The TNE MUST set the ECN field in the inner IP header of sprite 632 requests used for soft state maintenance to either ECT(0) or ECT(1) 633 [RFC3168] and MUST also examine the ECN field in the inner IP headers 634 of sprite replys. When the TNE begins to observe the CE codepoint in 635 the ECN field in the inner IP headers of successive sprite replys at 636 a rate that exceeds a high water threshold, it institutes pacing per 637 Section 5.5.5. The TNE MAY relax pacing when the rate falls below a 638 low water threshold. 640 When soft state management is requested by the TNE, the TFE MUST 641 track the rate at which packets received from the TNE are dropped due 642 to, e.g., checksum errors, congestion, etc. When the rate exceeds a 643 high water threshold, the TFE MUST begin setting the CE codepoint in 644 the ECN field in the inner IP headers of sprite replys sent in 645 response to requests with the ECN field set to other than not-ECT. 646 The TFE MUST also set the CE codepoint in ordinary data packets with 647 the ECN field set to other than not-ECT when it forwards them to the 648 next IP hop. When the rate of dropped packets falls below a low 649 water threshold, the TFE MAY relax CE codepoint marking. 651 When the TFE is temporarily unable to maintain soft state, it 652 includes a link local address in the inner IP destination address of 653 its sprite replies that is different than the inner IP source address 654 that the TNE included in its sprite request. The TNE must interpret 655 this as an indication that pacing should resume. 657 5.10. 
sprite-mtu Checksum Calculation 659 This specification uses a 16-bit variation of the Fletcher Checksum 660 [RFC1146][STONE1][STONE2] called: the "sprite-mtu checksum" which 661 provides a lightweight integrity check with different properties than 662 those used by common link layers and upper layer protocols. 664 The TE calculates the sprite-mtu checksum by summing every 10th byte 665 of the packet beginning with the inner IP header up to the end of the 666 inner packet including trailing padding, but not including the 667 trailing checksum field itself. The TE calculates the checksum 668 according to the algorithm below, which represents a slight variation 669 of that found in [RFC1146]: 671 The sprite-mtu checksum is calculated over a sequence of 672 unsigned data octets (call them D[0] through D[N-1]) by 673 maintaining unsigned 1's-complement 16-bit accumulators 674 A and B whose contents are initially zero, and performing 675 the following loop where i ranges from 1 to N: 677 A := A + D[i] 678 B := B + A 679 i := i + 10 681 If, at the end of the loop, either or both of the A and B 682 accumulators encode the value 0x0000, invert the value in the 683 accumulator(s) to 0xffff. 685 Note that faster algorithms are possible and may be used instead of 686 the algorithm above (see: [RFC1146]). Note also that for '*/IP' 687 encapsulations that include an additional checksum, the sprite-mtu 688 checksum can be calculated in parallel. 690 6. 
Updated Specifications

692     This document updates the following specifications, and possibly
693     others:

695     o  RFC1853 (IP-in-IP)

697     o  RFC2003 (IP-in-IP)

699     o  RFC2473 (Generic packet tunneling in IPv6)

701     o  RFC2529 (6over4)

703     o  RFC2661 (L2TP)

705     o  RFC2784 (GRE)

707     o  RFC3056 (6to4)

709     o  RFC3378 (ETHERIP)

711     o  RFC3884 (IPsec Transport Mode for Dynamic Routing)

713     o  RFC4023 (MPLS-in-IP)

715     o  RFC4213 (Basic IPv6 Transition Mechanisms)

717     o  RFC4214 (ISATAP)

719     o  RFC4301 (IPsec)

721     o  RFC4302 (AH)

723     o  RFC4303 (ESP)

725     o  RFC4380 (TEREDO)

727     o  LISP

729     o  IPAE

731     o  others....

733  7.  IANA Considerations

735     A new UDP port number for the "sprite-udp" service is requested.

737  8.  Security Considerations

739     A possible denial of service attack vector involves an off-path
740     attacker sending sprite replies with spoofed source addresses; however,
741     the 8-byte nonce serves as an effective mitigation.

743     A possible resource exhaustion attack vector exists when TEs use well-
744     known and/or time-invariant addresses.  Sprite addresses serve as an
745     effective mitigation for bidirectional tunnels, as specified in
746     Section 5.9.

748     Security considerations for specific IP/*/IP tunneling mechanisms are
749     specified in their respective documents.

751  9.  Acknowledgments

753     This work has benefited from discussions with Fred Baker, Iljitsch
754     van Beijnum, Brian Carpenter, Steve Casner, Remi Denis-Courmont,
755     Aurnaud Ebalard, Gorry Fairhurst, John Heffner, Bob Hinden, Christian
756     Huitema, Joe Macker, Matt Mathis, Dave Thaler, Joe Touch, Magnus
757     Westerlund, Robin Whittle, and James Woodyatt.

759     This work is dedicated to the editor's family.

761  10.  References

763  10.1.  Normative References

765     [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
766                September 1981.

768     [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
769                RFC 793, September 1981.
771     [RFC1122]  Braden, R., "Requirements for Internet Hosts -
772                Communication Layers", STD 3, RFC 1122, October 1989.

774     [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
775                November 1990.

777     [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
778                for IP version 6", RFC 1981, August 1996.

780     [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
781                October 1996.

783     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
784                Requirement Levels", BCP 14, RFC 2119, March 1997.

786     [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
787                (IPv6) Specification", RFC 2460, December 1998.

789     [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
790                IPv6 Specification", RFC 2473, December 1998.

792     [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
793                of Explicit Congestion Notification (ECN) to IP",
794                RFC 3168, September 2001.

796     [RFC3927]  Cheshire, S., Aboba, B., and E. Guttman, "Dynamic
797                Configuration of IPv4 Link-Local Addresses", RFC 3927,
798                May 2005.

800     [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
801                for IPv6 Hosts and Routers", RFC 4213, October 2005.

803     [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
804                Message Protocol (ICMPv6) for the Internet Protocol
805                Version 6 (IPv6) Specification", RFC 4443, March 2006.

807     [RFC4941]  Narten, T., Draves, R., and S. Krishnan, "Privacy
808                Extensions for Stateless Address Autoconfiguration in
809                IPv6", RFC 4941, September 2007.

811  10.2.  Informative References

813     [RFC1146]  Zweig, J. and C. Partridge, "TCP alternate checksum
814                options", RFC 1146, March 1990.

816     [RFC1958]  Carpenter, B., "Architectural Principles of the Internet",
817                RFC 1958, June 1996.

819     [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
820                RFC 2923, September 2000.

822     [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
823                Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
824                Wood, "Advice for Internet Subnetwork Designers", BCP 89,
825                RFC 3819, July 2004.

827     [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
828                Network Tunneling", RFC 4459, April 2006.

830     [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
831                Discovery", RFC 4821, March 2007.

833     [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
834                "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
835                September 2007.

837     [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
838                Errors at High Data Rates", RFC 4963, July 2007.

840     [STONE1]   Stone, J., "Checksums in the Internet (Stanford Doctoral
841                Dissertation)", August 2001.

843     [STONE2]   Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
844                "Performance of Checksums and CRC's over Real Data, IEEE/
845                ACM Transactions on Networking, Vol 6, No. 5",
846                October 1998.

848  Author's Address

850     Fred L. Templin (editor)
851     Boeing Phantom Works
852     P.O. Box 3707
853     Seattle, WA  98124
854     USA

856     Email: fred.l.templin@boeing.com

858  Full Copyright Statement

860     Copyright (C) The IETF Trust (2007).

862     This document is subject to the rights, licenses and restrictions
863     contained in BCP 78, and except as set forth therein, the authors
864     retain all their rights.

866     This document and the information contained herein are provided on an
867     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
868     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
869     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
870     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
871     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
872     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
874  Intellectual Property

876     The IETF takes no position regarding the validity or scope of any
877     Intellectual Property Rights or other rights that might be claimed to
878     pertain to the implementation or use of the technology described in
879     this document or the extent to which any license under such rights
880     might or might not be available; nor does it represent that it has
881     made any independent effort to identify any such rights.  Information
882     on the procedures with respect to rights in RFC documents can be
883     found in BCP 78 and BCP 79.

885     Copies of IPR disclosures made to the IETF Secretariat and any
886     assurances of licenses to be made available, or the result of an
887     attempt made to obtain a general license or permission for the use of
888     such proprietary rights by implementers or users of this
889     specification can be obtained from the IETF on-line IPR repository at
890     http://www.ietf.org/ipr.

892     The IETF invites any interested party to bring to its attention any
893     copyrights, patents or patent applications, or other proprietary
894     rights that may cover technology that may be required to implement
895     this standard.  Please address the information to the IETF at
896     ietf-ipr@ietf.org.

898  Acknowledgment

900     Funding for the RFC Editor function is provided by the IETF
901     Administrative Support Activity (IASA).