idnits 2.17.1 draft-templin-intarea-parcels-10.txt: -(804): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 2 instances of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == The 'Updates: ' line in the draft header should list only the _numbers_ of the RFCs which will be updated by this document (if approved); it should not include the word 'RFC' in the list. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (29 March 2022) is 757 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFCXXXX' is mentioned on line 667, but not defined == Unused Reference: 'I-D.ietf-tcpm-rfc793bis' is defined on line 732, but no explicit reference was found in the text == Outdated reference: A later version (-15) exists of draft-ietf-6man-mtu-option-13 == Outdated reference: A later version (-07) exists of draft-templin-6man-fragrep-06 == Outdated reference: A later version (-74) exists of draft-templin-6man-omni-55 == Outdated reference: A later version (-16) exists of draft-templin-dtn-ltpfrag-08 -- Obsolete informational reference (is this intentional?): RFC 793 (Obsoleted by RFC 9293) -- Obsolete informational reference (is this intentional?): RFC 1063 (Obsoleted by RFC 1191) Summary: 0 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group F. L. Templin, Ed. 3 Internet-Draft Boeing Research & Technology 4 Updates: RFC2675 (if approved) 29 March 2022 5 Intended status: Standards Track 6 Expires: 30 September 2022 8 IP Parcels 9 draft-templin-intarea-parcels-10 11 Abstract 13 IP packets (both IPv4 and IPv6) are understood to contain a unit of 14 data which becomes the retransmission unit in case of loss. Upper 15 layer protocols such as the Transmission Control Protocol (TCP) 16 prepare data units known as "segments", with traditional arrangements 17 including a single segment per packet. This document presents a new 18 construct known as the "IP Parcel" which permits a single packet to 19 carry multiple segments, essentially creating a "packet-of-packets". 
20 IP parcels provide an essential building block for accommodating 21 larger Maximum Transmission Units (MTUs) in the Internet as discussed 22 in this document. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at https://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on 30 September 2022. 41 Copyright Notice 43 Copyright (c) 2022 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 48 license-info) in effect on the date of publication of this document. 49 Please review these documents carefully, as they describe your rights 50 and restrictions with respect to this document. Code Components 51 extracted from this document must include Revised BSD License text as 52 described in Section 4.e of the Trust Legal Provisions and are 53 provided without warranty as described in the Revised BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 58 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 59 3. Background and Motivation . . . . . . . . . . . . . . . . . . 4 60 4. IP Parcel Formation . . . . . . . . . . . . . . . . . . . . . 5 61 5. Transmission of IP Parcels . . . . . . . . . . . . . . . . . 8 62 6. Parcel Path Qualification . . . . . . . . . . . . . . . . . 
. 10 63 7. Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . 14 64 8. RFC2675 Updates . . . . . . . . . . . . . . . . . . . . . . . 15 65 9. IPv4 Jumbograms . . . . . . . . . . . . . . . . . . . . . . . 15 66 10. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 67 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 68 12. Security Considerations . . . . . . . . . . . . . . . . . . . 15 69 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16 70 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 71 14.1. Normative References . . . . . . . . . . . . . . . . . . 16 72 14.2. Informative References . . . . . . . . . . . . . . . . . 16 73 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18 75 1. Introduction 77 IP packets (both IPv4 [RFC0791] and IPv6 [RFC8200]) are understood to 78 contain a unit of data which becomes the retransmission unit in case 79 of loss. Upper layer protocols such as the Transmission Control 80 Protocol (TCP) [RFC0793], QUIC [RFC9000], LTP [RFC5326] and others 81 prepare data units known as "segments", with traditional arrangements 82 including a single segment per packet. This document presents a new 83 construct known as the "IP Parcel" which permits a single packet to 84 carry multiple segments. This essentially creates a "packet-of- 85 packets" with the IP layer headers appearing only once but with 86 possibly multiple upper layer protocol segments included. 88 Parcels are formed when an upper layer protocol entity identified by 89 the "5-tuple" (source IP, source port, destination IP, destination 90 port, protocol number) prepares a data buffer with the concatenation 91 of up to 64 properly-formed segments that can be broken out into 92 smaller parcels using a copy of the IP header. 
All segments except 93 the final segment must be equal in size and no larger than 65535 94 octets (minus headers), while the final segment must not be larger 95 than the others but may be smaller. The upper layer protocol entity 96 then delivers the buffer and non-final segment size to the IP layer, 97 which appends the necessary IP header plus extensions to identify 98 this as a parcel and not an ordinary packet. 100 Parcels can be forwarded over consecutive parcel-capable IP links in 101 the path until arriving at an ingress middlebox at the edge of an 102 intermediate Internetwork. The ingress middlebox may break the 103 parcel out into smaller (sub-)parcels and encapsulate them in headers 104 suitable for traversing the Internetwork. These smaller parcels may 105 then be rejoined into one or more larger parcels at an egress 106 middlebox which either delivers them locally or forwards them further 107 over parcel-capable IP links toward the final destination. Middlebox 108 repackaging of parcels is therefore possible, making reordering and 109 even loss of individual segments possible. But, what matters is that 110 the number of parcels delivered to the final destination should be 111 kept to a minimum, and that loss or receipt of individual segments 112 (and not parcel size) determines the retransmission unit. 114 The following sections discuss rationale for creating and shipping 115 parcels as well as the actual protocol constructs and procedures 116 involved. IP parcels provide an essential building block for 117 accommodating larger Maximum Transmission Units (MTUs) in the 118 Internet. It is further expected that the parcel concept may drive 119 future innovation in applications, operating systems, network 120 equipment and data links. 122 2. Terminology 124 A "parcel" is defined as "a thing or collection of things wrapped in 125 paper in order to be carried or sent by mail". 
Indeed, there are 126 many examples of parcel delivery services worldwide that provide an 127 essential transit backbone for efficient business and consumer 128 transactions. 130 In this same spirit, an "IP parcel" is simply a collection of up to 131 64 upper layer protocol segments wrapped in an efficient package for 132 transmission and delivery (i.e., a "packet-of-packets") while a 133 "singleton IP parcel" is simply a parcel that contains a single 134 segment. IP parcels are distinguished from ordinary packets through 135 the special header constructions discussed in this document. 137 The IP parcels construct is defined for both IPv4 and IPv6. Where 138 the document refers to "IPv4 header length", it means the total 139 length of the base IPv4 header plus all included options, i.e., as 140 determined by consulting the Internet Header Length (IHL) field. 141 Where the document refers to "IPv6 header length", however, it means 142 only the length of the base IPv6 header (i.e., 40 octets), while the 143 length of any extension headers is referred to separately as the 144 "extension header length". Finally, the term "IP header plus 145 extensions" refers generically to an IPv4 header plus all included 146 options or an IPv6 header plus all included extension headers. 148 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 149 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 150 "OPTIONAL" in this document are to be interpreted as described in BCP 151 14 [RFC2119][RFC8174] when, and only when, they appear in all 152 capitals, as shown here. 154 3. Background and Motivation 156 Studies have shown that applications can realize greater performance 157 by sending and receiving larger packets due to reduced numbers of 158 system calls and interrupts as well as larger atomic data copies 159 between kernel and user space. 
Large packets also result in reduced 160 numbers of network device interrupts and better network utilization 161 in comparison with smaller packet sizes. 163 A first study [QUIC] involved performance enhancement of the QUIC 164 protocol [RFC9000] using the Linux Generic Segment/Receive Offload 165 (GSO/GRO) facility. GSO/GRO provide a robust (but non-standard) 166 service very similar in nature to the IP parcel service described 167 here, and their application has shown significant performance increases 168 due to the increased transfer unit size between the operating system 169 kernel and QUIC application. 171 A second study [I-D.templin-dtn-ltpfrag] showed that GSO/GRO also 172 improved performance for the Licklider Transmission Protocol (LTP) 173 [RFC5326] for small- to medium-sized segments. Historically, the NFS 174 protocol also saw significant performance increases using larger 175 (single-segment) UDP datagrams even when IP fragmentation is invoked, 176 and LTP still follows this profile today. Moreover, LTP shows this 177 (single-segment) performance increase profile extending to the 178 largest possible segment size, which suggests that additional 179 performance gains may be possible using (multi-segment) IP parcels 180 that exceed 65535 octets. 182 TCP also benefits from larger packet sizes and efforts have 183 investigated TCP performance using jumbograms internally with changes 184 to the Linux GSO/GRO facilities [BIG-TCP]. The idea is to use the 185 jumbo payload internally and to allow GSO/GRO to use buffer sizes 186 larger than 65535 octets, but with the understanding that links that 187 support jumbos natively are not yet widely available. Hence, IP 188 parcels provide a packaging that can be considered in the near term 189 under current deployment limitations.
191 A limiting consideration for sending large packets is that they are 192 often lost at links with smaller Maximum Transmission Units (MTUs), 193 and the resulting Packet Too Big (PTB) message may be lost somewhere 194 in the path back to the original source. This "Path MTU black hole" 195 condition can degrade performance unless robust path probing 196 techniques are used, however the best case performance always occurs 197 when no packets are lost due to size restrictions. 199 These considerations therefore motivate a design where the maximum 200 segment size should be no larger than 65535 octets (minus headers), 201 while parcels that carry the segments may themselves be significantly 202 larger. Then, even if a middlebox needs to sub-divide the parcels 203 into smaller sub-parcels to forward further toward the final 204 destination, an important performance optimization for the original 205 source, final destination and network middleboxes can be realized. 207 An analogy: when a consumer orders 50 small items from a major online 208 retailer, the retailer does not ship the order in 50 separate small 209 boxes. Instead, the retailer puts as many of the small items as 210 possible into one or a few larger boxes (i.e., parcels) then places 211 the parcels on a semi-truck or airplane. The parcels may then pass 212 through one or more regional distribution centers where they may be 213 repackaged into different parcel configurations and forwarded further 214 until they are finally delivered to the consumer. But most often, 215 the consumer will only find one or a few parcels at their doorstep 216 and not 50 separate small boxes. This flexible parcel delivery 217 service greatly reduces shipping and handling overhead for all 218 including the retailer, regional distribution centers and finally the 219 consumer. 221 4. 
IP Parcel Formation 223 IP parcel formation is invoked by an upper layer protocol (identified 224 by the 5-tuple described above) when it prepares a data buffer 225 containing the concatenation of up to 64 segments. All non-final 226 segments MUST be equal in length while the final segment MUST NOT be 227 larger and MAY be smaller. Each non-final segment MUST NOT be larger 228 than 65535 octets minus the length of the IPv4 header or IPv6 229 extension headers, minus the length of an additional IPv6 header in 230 case an encapsulation middlebox is visited on the path (see: 231 Section 5). The upper layer protocol then presents the buffer and 232 non-final segment size to the IP layer which appends a single IP 233 header (plus extensions) before presenting the parcel either to an 234 adaptation layer interface or directly to an ordinary network 235 interface without engaging the adaptation layer (see: Section 5). 237 For IPv4, the IP layer prepares the parcel by appending an IPv4 238 header with a Jumbo Payload option formed as follows:
240 +--------+--------+--------+--------+--------+--------+
241 |Opt Type|Opt Len |       Jumbo Payload Length        |
242 +--------+--------+--------+--------+--------+--------+
244 The IPv4 Jumbo Payload option format is identical to that defined in 245 [RFC2675], except that the IP layer sets option type to '00001011' 246 and option length to '00000110', noting that the length distinguishes 247 this type from its deprecated use as the IPv4 "Probe MTU" option 248 [RFC1063]. The IP layer then sets "Jumbo Payload Length" to the 249 length of the IPv4 header plus the combined length of all 250 concatenated segments (i.e., as a 32-bit value in network byte 251 order). The IP layer next sets the IPv4 header DF bit to 1, then 252 sets the IPv4 header Total Length field to the length of the IPv4 253 header plus the length of the first segment only.
Note that the IP 254 layer can form true IPv4 jumbograms (as opposed to parcels) by 255 instead setting the IPv4 header Total Length field to the length of 256 the IPv4 header only (see: Section 9). 258 For IPv6, the IP layer forms a parcel by appending an IPv6 header 259 with a Hop-by-Hop Options extension header containing a Jumbo Payload 260 option formatted the same as for IPv4 above, but with option type set 261 to '11000010' and option length set to '00000100'. The IP layer then 262 sets "Jumbo Payload Length" to the lengths of all IPv6 extension 263 headers present plus the combined length of all concatenated 264 segments. The IP layer next sets the IPv6 header Payload Length 265 field to the lengths of all IPv6 extension headers present plus the 266 length of the first segment only. Note that the IP layer can form 267 true IPv6 jumbograms (as opposed to parcels) by instead setting the 268 IPv6 header Payload Length field to 0 (see: [RFC2675]). 270 An IP parcel therefore has the following structure:
272 +--------+--------+--------+--------+
273 |                                   |
274 ~        Segment J (K octets)       ~
275 |                                   |
276 +--------+--------+--------+--------+
277 ~                                   ~
278 ~                                   ~
279 +--------+--------+--------+--------+
280 |                                   |
281 ~        Segment 3 (L octets)       ~
282 |                                   |
283 +--------+--------+--------+--------+
284 |                                   |
285 ~        Segment 2 (L octets)       ~
286 |                                   |
287 +--------+--------+--------+--------+
288 |                                   |
289 ~        Segment 1 (L octets)       ~
290 |                                   |
291 +--------+--------+--------+--------+
292 |     IP Header Plus Extensions     |
293 ~    {Total, Payload} Length = M    ~
294 |      Jumbo Payload Length = N     |
295 +--------+--------+--------+--------+
297 where J is the total number of segments (between 1 and 64), L is the 298 length of each non-final segment which MUST NOT be larger than 65535 299 octets (minus headers as above) and K is the length of the final 300 segment which MUST NOT be larger than L.
The values M and N are then 301 set to the length of the IP header for IPv4 or to the length of the 302 extension headers only for IPv6, then further calculated as follows: 304 M = M + ((J-1) ? L : K) 306 N = N + (((J-1) * L) + K) 308 Using TCP [RFC0793] for example, each of the J segments would include 309 its own TCP header, including Sequence Number, Checksum, etc. The 310 Sequence Number plus segment length (K or L) therefore provides the 311 destination with the necessary parameters for application data 312 reassembly while the Checksum provides per-segment application data 313 integrity. 315 Note: a "singleton" parcel is one that includes only the IP header 316 plus extensions with J=1 and a single segment of length K, while a 317 "null" parcel is a singleton with (J=1; K=0), i.e., a parcel 318 consisting of only the IP header plus extensions with no octets 319 beyond. 321 5. Transmission of IP Parcels 323 The IP layer next presents the parcel to the outgoing network 324 interface. For ordinary IP interfaces, the interface simply forwards 325 the parcel over the underlying link the same as for any IP packet 326 after which it may then be forwarded by any number of routers over 327 additional consecutive parcel-capable IP links. If any next hop IP 328 link in the path either does not support parcels or configures an MTU 329 that is too small to transit the parcel without fragmentation, the 330 router instead opens the parcel and forwards each enclosed segment as 331 a separate IP packet (i.e., by appending a copy of the parcel's IP 332 header to each segment but with the Jumbo Payload option removed 333 according to the standards [RFC0791][RFC8200]). Or, if the router 334 does not recognize parcels at all, it drops the parcel and may return 335 an ICMP "Parameter Problem" message. 
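As an illustration only, the M and N equations from Section 4 and the parcel-opening behavior described above can be sketched in Python. The function and parameter names are hypothetical and not part of this specification. Recovering the per-segment sizes from M and N is an inference from the equations, not a procedure the text defines.

```python
from typing import List, Tuple

MAX_SEGMENTS = 64  # upper bound on J given in Section 4


def parcel_lengths(header_len: int, j: int, l: int, k: int) -> Tuple[int, int]:
    """Return (M, N) for a parcel of J segments.

    header_len is the IPv4 header length, or the IPv6 extension header
    length (hypothetical parameter name). L is ignored when J == 1.
    """
    if not 1 <= j <= MAX_SEGMENTS:
        raise ValueError("J must be between 1 and 64")
    if j > 1 and k > l:
        raise ValueError("final segment must not be larger than L")
    first = l if j > 1 else k              # (J-1) ? L : K
    m = header_len + first                 # {Total, Payload} Length
    n = header_len + (j - 1) * l + k       # Jumbo Payload Length
    return m, n


def segment_sizes(header_len: int, m: int, n: int) -> List[int]:
    """Infer the size of each enclosed segment from M and N (an assumption)."""
    first = m - header_len                 # L, or K for a singleton
    total = n - header_len                 # (J-1)*L + K
    if total == 0:
        return []                          # "null" parcel: no octets beyond headers
    if first <= 0 or total < first:
        raise ValueError("inconsistent M and N")
    count = -(-total // first)             # ceiling division gives J
    last = total - (count - 1) * first     # K
    return [first] * (count - 1) + [last]


def open_parcel(payload: bytes, sizes: List[int]) -> List[bytes]:
    """Split the concatenated segments so each can be sent as its own packet."""
    out, offset = [], 0
    for size in sizes:
        out.append(payload[offset:offset + size])
        offset += size
    return out
```

For example, with a 28-octet IPv4 header, J=4, L=1000 and K=400, the sketch yields M=1028 and N=3428, and a router opening the parcel would recover segment sizes of 1000, 1000, 1000 and 400 octets.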
337 If the outgoing network interface is an OMNI interface 338 [I-D.templin-6man-omni], the OMNI Adaptation Layer (OAL) of this 339 First Hop Segment (FHS) OAL source node forwards the parcel to the 340 next OAL hop which may be either an OAL intermediate node or a Last 341 Hop Segment (LHS) OAL destination node (which may also be the final 342 destination itself). The OAL source assigns a monotonically- 343 incrementing (modulo 127) "Parcel ID" and subdivides the parcel into 344 sub-parcels no larger than the maximum of the path MTU to the next 345 hop or 65535 octets (minus headers) by determining the number of 346 segments of length L that can fit into each sub-parcel under these 347 size constraints. For example, if the OAL source determines that a 348 sub-parcel can contain 3 segments of length L, it creates sub-parcels 349 with the first containing segments 1-3, the second containing 350 segments 4-6, etc. and with the final containing any remaining 351 segments. The OAL source then appends an identical IP header plus 352 extensions to each sub-parcel while resetting M and N in each 353 according to the above equations with J set to 3 (and K = L) for each 354 non-final sub-parcel and with J set to the remaining number of 355 segments for the final sub-parcel. 357 The OAL source next performs IP encapsulation on each sub-parcel with 358 destination set to the next hop IP address then inserts an IPv6 359 Fragment Header after the IP encapsulation header, i.e., even if the 360 encapsulation header is IPv4, even if no actual fragmentation is 361 needed and/or even if the Jumbo Payload option is present. 
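As an illustration only, the sub-parcel subdivision step described above (the step that precedes encapsulation) can be sketched in Python. The names `SubParcel`, `subdivide` and `max_payload` are hypothetical. `max_payload` stands for whatever octet budget the OAL source derives for the segments of one sub-parcel after accounting for headers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubParcel:
    j: int       # number of segments in this sub-parcel
    l: int       # non-final segment length
    k: int       # final segment length within this sub-parcel
    more: bool   # models the "(More) Sub-Parcels (S)" bit


def subdivide(j: int, l: int, k: int, max_payload: int) -> List[SubParcel]:
    """Split a parcel of J segments into sub-parcels that fit max_payload octets."""
    if l < 1:
        raise ValueError("L must be at least 1")
    per_sub = max_payload // l             # segments of length L per sub-parcel
    if per_sub < 1:
        raise ValueError("budget too small for even one segment")
    subs: List[SubParcel] = []
    remaining = j
    while remaining > per_sub:
        # Non-final sub-parcels carry full-size segments only, so K = L and S = 1.
        subs.append(SubParcel(per_sub, l, l, True))
        remaining -= per_sub
    # The final sub-parcel carries the remaining segments, including the
    # original final segment of length K, and has S = 0.
    subs.append(SubParcel(remaining, l, k, False))
    return subs
```

With J=8, L=1000, K=400 and a budget of 3500 octets, this sketch produces sub-parcels carrying segments 1-3, 4-6 and 7-8, matching the example in the text where each non-final sub-parcel contains 3 segments.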
The OAL 362 source then assigns a randomly-initialized 32-bit Identification 363 number that is monotonically-incremented for each consecutive sub- 364 parcel, then performs IPv6 fragmentation over the sub-parcel if 365 necessary to create fragments small enough to traverse the path to 366 the next OAL hop while writing the Parcel ID and setting or clearing 367 the "Parcel (P)" and "(More) Sub-Parcels (S)" bits in the Fragment 368 Header of the first fragment (see: [I-D.templin-6man-fragrep]). (The 369 OAL source sets P to 1 for a parcel or to 0 for a non-parcel. When P 370 is 1, the OAL source next sets S to 1 for non-final sub-parcels or to 371 0 if the sub-parcel contains the final segment.) The OAL source then 372 forwards each IP encapsulated packet/fragment to the next OAL hop. 374 When the next OAL hop receives the encapsulated IP fragments or whole 375 packets, it reassembles if necessary. If the P flag in the first 376 fragment is 0, the next hop then processes the reassembled entity as 377 an ordinary IP packet; otherwise it continues processing as a sub- 378 parcel. If the next hop is an OAL intermediate node, it retains the 379 sub-parcels along with their Parcel ID and Identification values for 380 a brief time in hopes of re-combining with peer sub-parcels of the 381 same original parcel identified by the 4-tuple consisting of the IP 382 encapsulation source and destination, Identification and Parcel ID. 383 The combining entails the concatenation of the segments included in 384 sub-parcels with the same Parcel ID and with Identification values 385 within 64 of one another to create a larger sub-parcel possibly even 386 as large as the entire original parcel. Order of concatenation is 387 not important, with the exception that the final sub-parcel (i.e., 388 the one with S set to 0) must occur as the final concatenation before 389 transmission. 
The OAL intermediate node then appends a common IP 390 header plus extensions to each re-combined sub-parcel while resetting 391 M and N in each according to the above equations with J, K and L set 392 accordingly. 394 This OAL intermediate node next forwards the re-combined sub- 395 parcel(s) to the next hop toward the OAL destination using 396 encapsulation the same as specified above. (The intermediate node 397 MUST ensure that the S flag remains set to 0 in the sub-parcel that 398 contains the final segment.) When the sub-parcel(s) arrive at the 399 OAL destination, the OAL destination re-combines them into the 400 largest possible sub-parcels while honoring the S flag as above. If 401 the OAL destination is also the final destination, it delivers the 402 sub-parcels to the IP layer which acts on the enclosed 5-tuple 403 information supplied by the original source. Otherwise, the OAL 404 destination forwards each sub-parcel toward the final destination the 405 same as for an ordinary IP packet, as discussed above. 407 Note: while the OAL destination and/or final destination could 408 theoretically re-combine the sub-parcels of multiple different 409 parcels with identical upper layer protocol 5-tuples and with non- 410 final segments of identical length, this process could become 411 complicated when the different parcels each have final segments of 412 diverse lengths. Since this might interfere with any perceived 413 performance advantages, the decision of whether and how to perform 414 inter-parcel concatenation is an implementation matter. 416 Note: some IPv6 fragmentation and reassembly implementations may 417 require a well-formed IPv6 header to perform their operations.
When 418 the encapsulation is based on IPv4, such implementations translate 419 the encapsulation header into an IPv6 header with IPv4-Mapped IPv6 420 addresses before performing the fragmentation/reassembly operation, 421 then restore the original IPv4 header before further processing. 423 6. Parcel Path Qualification 425 To determine whether parcels are supported over at least a leading 426 portion of the forward path toward the final destination, the 427 original source can send a singleton IP parcel formatted as a "Parcel 428 Probe" that may include an upper layer protocol probe segment (e.g., 429 a data segment, an ICMP Echo Request message, etc.). The purpose of 430 the probe is to elicit a "Parcel Reply" and possibly also an ordinary 431 upper layer protocol probe reply from the final destination. 433 If the original source receives a positive Parcel Reply, it marks the 434 path as "parcels supported" and ignores any ICMP [RFC0792][RFC4443] 435 and/or Packet Too Big (PTB) messages [RFC1191][RFC8201] concerning 436 the probe. If the original source instead receives a negative Parcel 437 Reply or no reply, it marks the path as "parcels not supported" and 438 may regard any ICMP and/or PTB messages concerning the probe (or its 439 contents) as indications of a possible path MTU restriction. 441 The original source can therefore send Parcel Probes in parallel with 442 sending real data as ordinary IP packets. If the original source 443 receives a positive Parcel Reply, it can begin using IP parcels. 445 Parcel Probes use the Jumbo Payload option type (see: Section 4) but 446 set a different option length and replace the option value with 447 control information plus a 4-octet "Path MTU" value into which 448 conformant middleboxes write the minimum link MTU observed in a 449 similar fashion as described in [RFC1063][I-D.ietf-6man-mtu-option]. 450 Parcel Probes can also include an upper layer protocol probe segment, 451 e.g., per [RFC4821][RFC8899]. 
When an upper layer protocol probe 452 segment is included, it appears immediately after the IP header plus 453 extensions. 455 The original source sends Parcel Probes unidirectionally in the 456 forward path toward the final destination to elicit a Parcel Reply, 457 since it will often be the case that IP parcels are supported only in 458 the forward path and not in the return path. Parcel Probes may be 459 dropped in the forward path by any node that does not recognize IP 460 parcels, but a Parcel Reply must not be dropped even if IP parcels 461 are not recognized along portions of the return path. For this 462 reason, Parcel Probes are packaged as IPv4 (header) options or IPv6 463 Hop-by-Hop options while Parcel Replies are always packaged as IPv6 464 Destination Options (i.e., regardless of the IP protocol version). 466 Original sources send Parcel Probes and Replies that include a Jumbo 467 Payload option coded in an alternate format as follows:
469 +--------+--------+--------+--------+
470 |Opt Type|Opt Len |     Nonce-1     |
471 +--------+--------+--------+--------+
472 |              Nonce-2              |
473 +--------+--------+--------+--------+
474 |               PMTU                |
475 +--------+--------+--------+--------+
476 |  Code  | Check  |
477 +--------+--------+
479 For IPv4, the original source includes the option as an IPv4 option 480 with Type set to '00001011' the same as for an ordinary IPv4 parcel 481 (see: Section 4) but with Length set to '00001110' to distinguish 482 this as a probe/reply. The original source sets Nonce-1 to 0xffff, 483 sets Nonce-2 to a (pseudo)-random 32-bit value and sets PMTU to the 484 MTU of the outgoing IPv4 interface. The original source then sets 485 Code to 0, sets Check to the same value that will appear in the TTL 486 of the outgoing IPv4 header, then finally sets IPv4 Total Length to 487 the length of the IPv4 header plus the upper layer protocol probe 488 segment (if any) and sends the Parcel Probe via the outgoing IPv4 489 interface.
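As an illustration only, the IPv4 Parcel Probe option layout described above can be sketched in Python with the standard struct module. The function names and the `link_mtu`, `ttl` and `nonce2` parameters are hypothetical. The field widths (1, 1, 2, 4, 4, 1 and 1 octets) are read from the option diagram and add up to the 14 octets implied by Length '00001110'.

```python
import secrets
import struct
from typing import Optional

IPV4_PARCEL_OPT_TYPE = 0b00001011   # same type as an ordinary IPv4 parcel
IPV4_PROBE_OPT_LEN = 0b00001110     # 14 octets: marks the option as a probe/reply

# Network byte order: Type, Len, Nonce-1, Nonce-2, PMTU, Code, Check
_PROBE_FORMAT = "!BBHIIBB"


def build_ipv4_probe_option(link_mtu: int, ttl: int,
                            nonce2: Optional[int] = None) -> bytes:
    """Encode the probe option as the original source would for IPv4."""
    if nonce2 is None:
        nonce2 = secrets.randbits(32)   # (pseudo)-random 32-bit Nonce-2
    return struct.pack(
        _PROBE_FORMAT,
        IPV4_PARCEL_OPT_TYPE,
        IPV4_PROBE_OPT_LEN,
        0xFFFF,       # Nonce-1 is fixed to 0xffff for IPv4
        nonce2,
        link_mtu,     # PMTU starts as the MTU of the outgoing interface
        0,            # Code
        ttl,          # Check mirrors the TTL of the outgoing IPv4 header
    )


def parse_probe_option(option: bytes) -> dict:
    """Decode the option fields into a dictionary for inspection."""
    keys = ("type", "length", "nonce1", "nonce2", "pmtu", "code", "check")
    return dict(zip(keys, struct.unpack(_PROBE_FORMAT, option)))
```

A caller could pass a fixed `nonce2` when reproducible output is wanted (for example in a unit test) and omit it otherwise so that each probe carries a fresh value.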
According to [RFC7126], middleboxes (i.e., routers, 490 security gateways, firewalls, etc.) that do not observe this 491 specification SHOULD drop IP packets that contain option type 492 '00001011' ("IPv4 Probe MTU") but some might instead either attempt 493 to implement [RFC1063] or ignore the option altogether. IPv4 494 middleboxes that observe this specification instead MUST process the 495 option as a Parcel Probe as specified below. 497 For IPv6, the original source includes the probe option as an IPv6 498 Hop-by-Hop option with Type set to '11000010' the same as for an 499 ordinary IPv6 parcel (see: Section 4) but with Length set to 500 '00001100' to distinguish this as a probe. The original source sets 501 the concatenation of Nonce-1 and Nonce-2 to a (pseudo)-random 48-bit 502 value and sets PMTU to the MTU of the outgoing IPv6 interface. The 503 original source then sets Code to 0, sets Check to the same value 504 that will appear in the Hop Limit of the outgoing IPv6 header, then 505 finally sets IPv6 Payload Length to the lengths of the IPv6 extension 506 headers plus the upper layer protocol probe segment (if any) and 507 sends the Parcel Probe via the outgoing IPv6 interface. According to 508 [RFC2675], middleboxes (i.e., routers, security gateways, firewalls, 509 etc.) that recognize the IPv6 Jumbo Payload option but do not observe 510 this specification SHOULD return an ICMPv6 Parameter Problem message 511 (and presumably also drop the packet). IPv6 middleboxes that observe 512 this specification instead MUST process the option as a Parcel Probe 513 as specified below. 515 When a middlebox that observes this specification receives a Parcel 516 Probe it first compares the Check value with the IP header Hop Limit/ 517 TTL; if the values differ, the middlebox MUST return a negative 518 Parcel Reply (see below) and drop the probe. 
Otherwise, if the next 519 hop IP link either does not support parcels or configures an MTU that 520 is too small to pass the probe, the middlebox compares the PMTU value 521 with the MTU of the inbound link for the probe and MUST (re)set PMTU 522 to the lower MTU. The middlebox then MUST return a positive Parcel 523 Reply (see below) and convert the probe into an ordinary IP packet by 524 removing the probe option according to [RFC0791] or [RFC8200]. If 525 the next hop IP link configures a sufficiently large MTU to pass the 526 packet, the middlebox then MUST forward the packet to the next hop; 527 otherwise, it MUST drop the packet and return a suitable PTB. If the 528 next hop IP link both supports parcels and configures an MTU that is 529 large enough to pass the probe, the middlebox instead compares the 530 probe PMTU value with the MTUs of both the inbound and outbound links 531 for the probe and MUST (re)set PMTU to the lower MTU. The middlebox 532 then MUST reset Check to the same value that will appear in the TTL/ 533 Hop Limit of the outgoing IP header, and MUST forward the Parcel 534 Probe to the next hop. 536 The final destination may therefore receive either an ordinary IP 537 packet containing an upper layer protocol probe or a Parcel Probe. 538 If the final destination receives an ordinary IP packet, it performs 539 any necessary integrity checks then delivers the packet to upper 540 layers which will return an upper layer probe response. If the final 541 destination instead receives a Parcel Probe, it first compares the 542 Check value with the IP header Hop Limit/TTL; if the values differ, 543 the final destination MUST drop the probe and return a negative 544 Parcel Reply (see below). Otherwise, the final destination compares 545 the probe PMTU value with the MTU of the inbound link and MUST 546 (re)set PMTU to the lower MTU. 
The final destination then MUST 547 return a positive Parcel Reply (see below) and convert the probe into 548 an ordinary IP packet by removing the Parcel Probe option according 549 to the standards [RFC0791][RFC8200]. The final destination then 550 performs any necessary integrity checks and delivers the packet to 551 upper layers. 553 When the middlebox or final destination returns a Parcel Reply, it 554 prepares an IP header of the same protocol version that appeared in 555 the Parcel Probe with source and destination addresses reversed, with 556 {Protocol, Next Header} set to the value '60' (i.e., "IPv6 557 Destination Option") and with an IPv6 Destination Option header with 558 Next Header set to the value '59' (i.e., "IPv6 No Next Header") 559 [RFC8200]. The node next copies the body of the Parcel Probe option 560 as the sole Parcel Reply Destination Option (and for IPv4 resets Type 561 to '11000010' and Length to '00001100') and includes no other octets 562 beyond the end of the option. The node then MUST (re)set Check to 1 563 for a positive or to 0 for a negative Parcel Reply, then MUST finally 564 set the IP header {Total, Payload} Length field according to the 565 length of the included Destination Option and return the Parcel Reply 566 to the source. (Since filtering middleboxes may drop IPv4 packets 567 with Protocol '60', the destination MUST wrap an IPv4 Parcel Reply in 568 UDP/IPv4 headers with the IPv4 source and destination addresses 569 copied from the Parcel Reply and with UDP port numbers set to the UDP 570 port number for OMNI [I-D.templin-6man-omni].) 572 After sending a Parcel Probe, the original source may therefore 573 receive a Parcel Reply (see above) and/or an upper layer protocol 574 probe reply. If the source receives a Parcel Reply, it first matches 575 Nonce-2 (and for IPv6 only also matches Nonce-1) with the values it 576 had included in the Parcel Probe. If the values do not match, the 577 source discards the Parcel Reply.
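The nonce matching performed by the original source can be sketched as follows. This is a minimal, non-normative illustration: the ProbeNonce container and the reply_matches_probe function are hypothetical names that do not appear in this specification, and the nonces are treated as plain integers without assuming particular field widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeNonce:
    """Nonce-1 and Nonce-2 values carried in a Parcel Probe or Parcel Reply.

    Hypothetical container used only for this sketch.
    """
    nonce1: int
    nonce2: int


def reply_matches_probe(ip_version: int, sent: ProbeNonce, reply: ProbeNonce) -> bool:
    """Return True if a Parcel Reply corresponds to the outstanding Parcel Probe.

    Nonce-2 is always compared. Nonce-1 is compared for IPv6 only; for IPv4
    the source ignores Nonce-1 in replies, since a router on the path may
    have rewritten it.
    """
    if ip_version not in (4, 6):
        raise ValueError("ip_version must be 4 or 6")
    if reply.nonce2 != sent.nonce2:
        return False
    if ip_version == 6 and reply.nonce1 != sent.nonce1:
        return False
    return True
```

A source would call reply_matches_probe() for each received Parcel Reply and silently discard the reply when the result is False.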
Next, the source examines the 578 Check value and marks the path as "parcels supported" if the value is 579 1 or "parcels not supported" otherwise. If the source marks the path 580 as "parcels supported", it also records the PMTU value as the maximum 581 parcel size for the forward path to this destination. 583 After receiving a positive Parcel Reply, the original source can 584 begin sending IP parcels addressed to the final destination up to the 585 size recorded in the PMTU. Any upper layer protocol probe replies 586 will determine the maximum segment size that can be included in the 587 parcel, but this is an upper layer consideration. The original 588 source should then periodically re-initiate Parcel Path Qualification 589 as long as it continues to forward parcels toward the final 590 destination (i.e., in case the forward path fluctuates). If at any 591 time performance appears to degrade, the original source should cease 592 sending IP parcels and/or re-initiate Parcel Path Qualification. 594 Note: For IPv4, the original source sets the Parcel Probe Nonce-1 595 field to 0xffff on transmission and ignores the Nonce-1 field value 596 in any corresponding Parcel Replies. This avoids any possible 597 confusion in case an IPv4 router on the path rewrites the Nonce-1 598 field in a wayward attempt to implement [RFC1063]. 600 Note: The PMTU value returned in a positive Parcel Reply determines 601 only the maximum IP parcel size for the path, while the maximum upper 602 layer protocol segment size may be significantly smaller. The upper 603 layer protocol segment size is instead determined separately 604 according to any upper layer protocol probes and must be assumed to 605 be no larger than 1/64th of the maximum IP parcel size unless a 606 larger size is discovered by probing. 608 7. Integrity 610 Each segment of a (multi-segment) IP parcel includes its own upper 611 layer protocol integrity check.
This means that IP parcels can 612 support stronger integrity for the same amount of upper layer 613 protocol data in comparison with an ordinary IP packet or Jumbogram 614 containing only a single segment. The integrity checks must then be 615 verified at the final destination, which accepts any segments with 616 correct integrity while discarding all other segments and counting 617 them as a loss event. 619 IP parcels can range in length from as small as only the IP headers 620 themselves to as large as the IP headers plus (64 * (65535 minus 621 headers)) octets. Although link layer integrity checks provide 622 sufficient protection for contiguous data blocks up to approximately 623 9KB, reliance on the presence of link-layer integrity checks may not 624 be possible over links such as tunnels. Moreover, the segment 625 contents of a received parcel may arrive in an incomplete and/or 626 rearranged order with respect to their original packaging. 628 For these reasons, the OAL at each hop of an OMNI link includes an 629 integrity check when it performs IP fragmentation on a sub-parcel, 630 with the integrity verified during reassembly at the next hop. 632 8. RFC2675 Updates 634 Section 3 of [RFC2675] provides a list of certain conditions to be 635 considered as errors. In particular: 637 error: IPv6 Payload Length != 0 and Jumbo Payload option present 639 error: Jumbo Payload option present and Jumbo Payload Length < 640 65,536 642 Implementations that obey this specification ignore these conditions 643 and do not consider them as errors. 645 9. IPv4 Jumbograms 647 By defining a new IPv4 Jumbo Payload option, this document also 648 implicitly enables a true IPv4 jumbogram service defined as an IPv4 649 packet with a Jumbo Payload option included and with Total Length set 650 to the length of the IPv4 header only. All other aspects of IPv4 651 jumbograms are the same as for IPv6 jumbograms [RFC2675]. 653 10. 
Implementation Status 655 Common widely-deployed implementations include services such as TCP 656 Segmentation Offload (TSO) and Generic Segmentation/Receive Offload 657 (GSO/GRO). These services support a robust (but not standardized) 658 service that has been shown to improve performance in many instances. 659 Implementation of the IP parcel service is a work in progress. 661 11. IANA Considerations 663 The IANA is instructed to change the "MTUP - MTU Probe" entry in the 664 'ip option numbers' registry to the "JUMBO - IPv4 Jumbo Payload" 665 option. The Copy and Class fields must both be set to 0, and the 666 Number and Value fields must both be set to '11'. The reference must 667 be changed to this document [RFCXXXX]. 669 12. Security Considerations 671 Original sources match the Nonce values in received Parcel Replies 672 with their corresponding Parcel Probes. If the values match, the 673 Parcel Reply is likely an authentic response to the Parcel Probe. In 674 environments where stronger authentication is necessary, the message 675 authentication services of OMNI can be applied 676 [I-D.templin-6man-omni]. 678 Multi-layer security solutions may be necessary to ensure 679 confidentiality, integrity and availability in some environments. 681 13. Acknowledgements 683 This work was inspired by ongoing AERO/OMNI/DTN investigations. The 684 concepts were further motivated through discussions on the intarea 685 and 6man lists. 687 A considerable body of work over recent years has produced useful 688 "segmentation offload" facilities available in widely-deployed 689 implementations. 691 . 693 14. References 695 14.1. Normative References 697 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 698 DOI 10.17487/RFC0791, September 1981, 699 . 701 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 702 Requirement Levels", BCP 14, RFC 2119, 703 DOI 10.17487/RFC2119, March 1997, 704 . 706 [RFC2675] Borman, D., Deering, S., and R.
Hinden, "IPv6 Jumbograms", 707 RFC 2675, DOI 10.17487/RFC2675, August 1999, 708 . 710 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 711 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 712 May 2017, . 714 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 715 (IPv6) Specification", STD 86, RFC 8200, 716 DOI 10.17487/RFC8200, July 2017, 717 . 719 14.2. Informative References 721 [BIG-TCP] Dumazet, E., "BIG TCP, Netdev 0x15 Conference (virtual), 722 https://netdevconf.info/0x15/session.html?BIG-TCP", 31 723 August 2021. 725 [I-D.ietf-6man-mtu-option] 726 Hinden, R. M. and G. Fairhurst, "IPv6 Minimum Path MTU 727 Hop-by-Hop Option", Work in Progress, Internet-Draft, 728 draft-ietf-6man-mtu-option-13, 28 February 2022, 729 . 732 [I-D.ietf-tcpm-rfc793bis] 733 Eddy, W. M., "Transmission Control Protocol (TCP) 734 Specification", Work in Progress, Internet-Draft, draft- 735 ietf-tcpm-rfc793bis-28, 7 March 2022, 736 . 739 [I-D.templin-6man-fragrep] 740 Templin, F. L., "IPv6 Fragment Retransmission and Path MTU 741 Discovery Soft Errors", Work in Progress, Internet-Draft, 742 draft-templin-6man-fragrep-06, 9 February 2022, 743 . 746 [I-D.templin-6man-omni] 747 Templin, F. L., "Transmission of IP Packets over Overlay 748 Multilink Network (OMNI) Interfaces", Work in Progress, 749 Internet-Draft, draft-templin-6man-omni-55, 7 March 2022, 750 . 753 [I-D.templin-dtn-ltpfrag] 754 Templin, F. L., "LTP Fragmentation", Work in Progress, 755 Internet-Draft, draft-templin-dtn-ltpfrag-08, 1 February 756 2022, . 759 [QUIC] Ghedini, A., "Accelerating UDP packet transmission for 760 QUIC, https://blog.cloudflare.com/accelerating-udp-packet- 761 transmission-for-quic/", 8 January 2020. 763 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 764 RFC 792, DOI 10.17487/RFC0792, September 1981, 765 . 767 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 768 RFC 793, DOI 10.17487/RFC0793, September 1981, 769 . 
771 [RFC1063] Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP 772 MTU discovery options", RFC 1063, DOI 10.17487/RFC1063, 773 July 1988, . 775 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 776 DOI 10.17487/RFC1191, November 1990, 777 . 779 [RFC4443] Conta, A., Deering, S., and M. Gupta, Ed., "Internet 780 Control Message Protocol (ICMPv6) for the Internet 781 Protocol Version 6 (IPv6) Specification", STD 89, 782 RFC 4443, DOI 10.17487/RFC4443, March 2006, 783 . 785 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 786 Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, 787 . 789 [RFC5326] Ramadas, M., Burleigh, S., and S. Farrell, "Licklider 790 Transmission Protocol - Specification", RFC 5326, 791 DOI 10.17487/RFC5326, September 2008, 792 . 794 [RFC7126] Gont, F., Atkinson, R., and C. Pignataro, "Recommendations 795 on Filtering of IPv4 Packets Containing IPv4 Options", 796 BCP 186, RFC 7126, DOI 10.17487/RFC7126, February 2014, 797 . 799 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., 800 "Path MTU Discovery for IP version 6", STD 87, RFC 8201, 801 DOI 10.17487/RFC8201, July 2017, 802 . 804 [RFC8899] Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T. 805 Völker, "Packetization Layer Path MTU Discovery for 806 Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, 807 September 2020, . 809 [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 810 Multiplexed and Secure Transport", RFC 9000, 811 DOI 10.17487/RFC9000, May 2021, 812 . 814 Author's Address 815 Fred L. Templin (editor) 816 Boeing Research & Technology 817 P.O. Box 3707 818 Seattle, WA 98124 819 United States of America 820 Email: fltemplin@acm.org