2 Network Working Group F. L. Templin, Ed.
3 Internet-Draft Boeing Research & Technology 4 Intended status: Informational 1 February 2022 5 Expires: 5 August 2022 7 LTP Fragmentation 8 draft-templin-dtn-ltpfrag-08 10 Abstract 12 The Licklider Transmission Protocol (LTP) provides a reliable 13 datagram convergence layer for the Delay/Disruption Tolerant 14 Networking (DTN) Bundle Protocol. In common practice, LTP is often 15 configured over UDP/IP sockets and inherits its maximum segment size 16 from the maximum-sized UDP/IP datagram; however, when this size 17 exceeds the maximum IP packet size for the path, a service known as IP 18 fragmentation must be employed. This document discusses LTP 19 interactions with IP fragmentation and mitigations for managing the 20 amount of IP fragmentation employed. 22 Status of This Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF). Note that other groups may also distribute 29 working documents as Internet-Drafts. The list of current Internet- 30 Drafts is at https://datatracker.ietf.org/drafts/current/. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 This Internet-Draft will expire on 5 August 2022. 39 Copyright Notice 41 Copyright (c) 2022 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 46 license-info) in effect on the date of publication of this document. 47 Please review these documents carefully, as they describe your rights 48 and restrictions with respect to this document.
Code Components 49 extracted from this document must include Revised BSD License text as 50 described in Section 4.e of the Trust Legal Provisions and are 51 provided without warranty as described in the Revised BSD License. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 56 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 3. IP Fragmentation Issues . . . . . . . . . . . . . . . . . . . 4 58 4. LTP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 5 59 5. Beyond "sendmmsg()" . . . . . . . . . . . . . . . . . . . . . 6 60 6. LTP Performance Enhancement Using GSO/GRO . . . . . . . . . . 7 61 6.1. LTP and GSO . . . . . . . . . . . . . . . . . . . . . . . 7 62 6.2. LTP and GRO . . . . . . . . . . . . . . . . . . . . . . . 8 63 6.3. LTP GSO/GRO Over OMNI Interfaces . . . . . . . . . . . . 9 64 6.4. IP Parcels . . . . . . . . . . . . . . . . . . . . . . . 11 65 7. Implementation Status . . . . . . . . . . . . . . . . . . . . 11 66 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 67 9. Security Considerations . . . . . . . . . . . . . . . . . . . 11 68 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 69 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 70 11.1. Normative References . . . . . . . . . . . . . . . . . . 12 71 11.2. Informative References . . . . . . . . . . . . . . . . . 12 72 Appendix A. IPv4/IPv6 Protocol Considerations . . . . . . . . . 14 73 Appendix B. The Intergalactic Jigsaw Puzzle Builders Club . . . 14 74 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 15 76 1. Introduction 78 The Licklider Transmission Protocol (LTP) [RFC5326] provides a 79 reliable datagram convergence layer for the Delay/Disruption Tolerant 80 Networking (DTN) Bundle Protocol (BP) [I-D.ietf-dtn-bpbis]. 
In 81 common practice, LTP is often configured over the User Datagram 82 Protocol (UDP) [RFC0768] and Internet Protocol (IP) [RFC0791] using 83 the "socket" abstraction. LTP inherits its maximum segment size from 84 the maximum-sized UDP/IP datagram (i.e., 64KB minus header sizes); 85 however, when that size exceeds the maximum IP packet size for the 86 path, a service known as IP fragmentation must be employed. 88 LTP breaks BP bundles into "blocks", then further breaks these blocks 89 into "segments". The segment size is a configurable option and 90 represents the largest atomic portion of data that LTP will require 91 underlying layers to deliver as a single unit. The segment size is 92 therefore also known as the "retransmission unit", since each lost 93 segment must be retransmitted in its entirety. Experimental and 94 operational evidence has shown that, on robust networks, increasing the 95 LTP segment size (up to the maximum UDP/IP datagram size of slightly 96 less than 64KB) can result in substantial performance increases over 97 smaller segment sizes. However, the performance increases must be 98 balanced against the amount of IP fragmentation invoked, as discussed 99 below. 101 When LTP presents a segment to the operating system kernel (e.g., via 102 a sendmsg() system call), the UDP layer prepends a UDP header to 103 create a UDP datagram. The UDP layer then presents the resulting 104 datagram to the IP layer for packet framing and transmission over a 105 networked path. The path is further characterized by the path 106 Maximum Transmission Unit (Path-MTU), which is a measure of the 107 smallest link MTU (Link-MTU) among all links in the path. 109 When LTP presents a segment to the kernel that is larger than the 110 Path-MTU, the resulting UDP datagram is presented to the IP layer, 111 which in turn performs IP fragmentation to break the datagram into 112 fragments that are no larger than the Path-MTU.
For example, if the 113 LTP segment size is 64KB and the Path-MTU is 1280 bytes, IP 114 fragmentation results in 50+ fragments that are transmitted as 115 individual IP packets. (Note that for IPv4 [RFC0791], fragmentation 116 may occur either in the source host or in a router in the network 117 path, while for IPv6 [RFC8200] only the source host may perform 118 fragmentation.) 120 Each IP fragment is subject to the same best-effort delivery service 121 offered by the network according to current congestion and/or link 122 signal quality conditions; therefore, the IP fragment size is 123 known as the "loss unit". When the packet loss rate is 124 non-negligible, performance can suffer dramatically if the 125 loss unit is significantly smaller than the retransmission unit. In 126 particular, if even a single IP fragment of a fragmented LTP segment 127 is lost, then the entire LTP segment is deemed lost and must be 128 retransmitted. Since LTP does not support flow control or congestion 129 control, this can result in a cascading flood of redundant 130 information when fragments are systematically lost in transit. 132 This document discusses LTP interactions with IP fragmentation and 133 mitigations for managing the amount of IP fragmentation employed. It 134 further discusses methods for increasing LTP performance both with 135 and without the aid of IP fragmentation. 137 2. Terminology 139 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 140 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 141 "OPTIONAL" in this document are to be interpreted as described in BCP 142 14 [RFC2119][RFC8174] when, and only when, they appear in all 143 capitals, as shown here. 145 3. IP Fragmentation Issues 147 IP fragmentation is a fundamental service of the Internet Protocol, 148 yet it has long been understood that its use can be problematic in 149 some environments.
Beginning as early as 1987, "Fragmentation 150 Considered Harmful" [FRAG] outlined multiple issues with the service, 151 including a performance-crippling condition that can occur at high 152 data rates when the loss unit is considerably smaller than the 153 retransmission unit during intermittent and/or steady-state loss 154 conditions. 156 Later investigations also identified the possibility of undetected 157 corruption at high data rates due to a condition known as "ID 158 wraparound" when the 16-bit IP identification field (aka the "IP ID") 159 increments such that new fragments overlap with existing fragments 160 still alive in the network and with identical ID values 161 [RFC4963][RFC6864]. Although this issue occurs only in the IPv4 162 protocol (and not in IPv6, where the IP ID is 32 bits in length), the 163 IPv4 concerns along with the fact that IPv6 does not permit routers 164 to perform "network fragmentation" have led many to discourage the 165 use of fragmentation whenever possible. 167 Even in the modern era, investigators have seen fit to declare "IP 168 Fragmentation Considered Fragile" in an Internet Engineering Task 169 Force (IETF) Best Current Practice (BCP) reference [RFC8900]. 170 Indeed, the BCP recommendations cite the Bundle Protocol LTP 171 convergence layer as a user of IP fragmentation that depends on some 172 of its properties to realize greater performance. However, the BCP 173 summarizes by saying: 175 "Rather than deprecating IP fragmentation, this document 176 recommends that upper-layer protocols address the problem of 177 fragmentation at their layer, reducing their reliance on IP 178 fragmentation to the greatest degree possible." 180 While the performance effects are considerable and have serious 181 implications for real-world applications, our goal in this document 182 is neither to condemn nor embrace IP fragmentation as it pertains to 183 the Bundle Protocol LTP convergence layer operating over UDP/IP 184 sockets.
Instead, we examine ways in which the benefits of IP 185 fragmentation can be realized while avoiding the pitfalls. We 186 therefore next discuss our systematic approach to LTP fragmentation. 188 4. LTP Fragmentation 190 In common LTP implementations over UDP/IP (e.g., the Interplanetary 191 Overlay Network (ION)), performance is greatly dependent on the LTP 192 segment size. This is because a larger segment 193 presented to UDP/IP as a single unit incurs only a single system call 194 and a single data copy from application to kernel space via the 195 sendmsg() system call. Once inside the kernel, the segment incurs 196 UDP/IP encapsulation and IP fragmentation, which again results in a 197 loss unit smaller than the retransmission unit. However, during 198 fragmentation, each fragment is transmitted immediately after the 199 previous one without delay, so that the fragments appear as a "burst" of 200 consecutive packets over the network path, resulting in high network 201 utilization during the burst period. Additionally, the use of IP 202 fragmentation with a larger segment size conserves header framing 203 bytes, since the upper-layer headers appear only in the first IP 204 fragment as opposed to appearing in all fragments. 206 To avoid retransmission congestion (especially when 207 the loss probability is non-negligible), the natural choice would be 208 to set the LTP segment size to a size that is no larger than the 209 Path-MTU. Assuming the IPv4 minimum effective MTU of 576 bytes, however, 210 transmission of 64KB of data using a 576B segment size would require 211 well over 100 independent sendmsg() system calls and data copies as 212 opposed to just one when the largest segment size is used. This 213 greatly reduces the bandwidth advantage offered by IP fragmentation 214 bursts. Therefore, a means for providing the best aspects of both 215 large segment fragment bursting and small segment retransmission 216 efficiency is needed.
218 Common operating systems such as Linux provide the sendmmsg() ("send 219 multiple messages") system call that allows the LTP application to 220 present the kernel with a vector of up to 1024 segments instead of 221 just a single segment. This theoretically affords the bursting 222 behavior of IP fragmentation coupled with the retransmission 223 efficiency of employing small segment sizes. (Note that LTP 224 receivers can also use the recvmmsg() ("receive multiple messages") 225 system call to receive a vector of segments from the kernel in case 226 multiple recent packet arrivals can be combined.) 228 This work therefore recommends that implementations of LTP employ a 229 large block size, a conservative segment size, and a new configuration 230 option known as the "Burst-Limit", which determines the number of 231 segments that can be presented in a single sendmmsg() system call. 232 When the implementation receives an LTP block, it carves Burst-Limit- 233 many segments from the block and presents the vector of segments to 234 sendmmsg(). The kernel will prepare each segment as an independent 235 UDP/IP packet and transmit the packets into the network as a burst in a 236 fashion that parallels IP fragmentation. The loss unit and 237 retransmission unit will be the same; therefore, loss of a single 238 segment does not result in a retransmission congestion event. 240 It should be noted that the Burst-Limit is bounded only by the LTP 241 block size and not by the maximum UDP/IP datagram size. Therefore, 242 each burst can in practice convey significantly more data than a 243 single IP fragmentation event. It should also be noted that the 244 segment size can still be made larger than the Path-MTU in low-loss 245 environments without danger of triggering retransmission storms due 246 to loss of IP fragments. This would result in combined large UDP/IP 247 message transmission and IP fragmentation bursting for increased 248 network utilization in more robust environments.
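The block-carving step described above might be sketched as follows. This is a minimal illustration, not the ION implementation; the helper name and the BURST_LIMIT value are assumptions introduced for the example:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Illustrative configuration value; the real one would come from the
 * LTP engine's Burst-Limit setting. */
#define BURST_LIMIT 64          /* max segments per sendmmsg() call */

/* Carve up to burst_limit segments of at most seg_size octets from an
 * LTP block, filling one mmsghdr/iovec pair per segment.  Returns the
 * number of messages prepared. */
static unsigned build_burst(struct mmsghdr *msgs, struct iovec *iovs,
                            unsigned char *block, size_t block_len,
                            size_t seg_size, unsigned burst_limit)
{
    unsigned n = 0;
    size_t off = 0;

    while (off < block_len && n < burst_limit) {
        size_t len = block_len - off;

        if (len > seg_size)
            len = seg_size;
        iovs[n].iov_base = block + off;
        iovs[n].iov_len = len;
        memset(&msgs[n].msg_hdr, 0, sizeof(msgs[n].msg_hdr));
        msgs[n].msg_hdr.msg_iov = &iovs[n];
        msgs[n].msg_hdr.msg_iovlen = 1;
        off += len;
        n++;
    }
    return n;
}
```

On a connected UDP socket, the prepared vector would then be handed to the kernel in one call, e.g. `sendmmsg(fd, msgs, n, 0)`; each message becomes an independent UDP/IP packet, so the loss unit and retransmission unit coincide.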
Finally, both the 249 Burst-Limit and UDP/IP message sizes need not be static values, and 250 can be tuned to adaptively increase or decrease according to time- 251 varying network conditions. 253 5. Beyond "sendmmsg()" 255 Implementation experience with the ION-DTN distribution, along with 256 two recent studies, has demonstrated limited performance increases 257 from employing sendmmsg() for transmission over UDP/IP sockets. A 258 first study used sendmmsg() as part of an integrated solution to 259 produce 1M packets per second assuming only raw data transmission 260 conditions [MPPS], while a second study focused on performance 261 improvements for the QUIC reliable transport service [QUIC]. In both 262 studies, the use of sendmmsg() alone produced observable increases, 263 but complementary enhancements were identified that (when combined 264 with sendmmsg()) produced considerable additional increases. 266 In [MPPS], additional enhancements such as using recvmmsg() and 267 configuring multiple receive queues at the receiver were introduced 268 in an attempt to achieve greater parallelism and engage multiple 269 processors and threads. However, the system was still limited to a 270 single thread until multiple receiving processes were introduced 271 using the "SO_REUSEPORT" socket option. By having multiple receiving 272 processes (each with its own socket buffer), the performance 273 advantages of parallel processing were realized, achieving the 1M 274 packets per second goal. 276 In [QUIC], a new feature available in recent Linux kernel versions 277 was employed. The feature, known as "Generic Segmentation Offload 278 (GSO) / Generic Receive Offload (GRO)", allows an application to 279 provide the kernel with a "super-buffer" containing up to 64 separate 280 upper layer protocol segments. When the application presents the 281 super-buffer to the kernel, GSO segmentation then sends up to 64 282 separate UDP/IP packets in a burst.
(Note that GSO requires each 283 UDP/IP packet to be no larger than the path MTU so that receivers can 284 invoke GRO without interactions with IP reassembly.) The GSO 285 facility can be invoked by either sendmsg() (i.e., a single super- 286 buffer) or sendmmsg() (i.e., multiple super-buffers), and the study 287 showed a substantial performance increase over using sendmsg() 288 and sendmmsg() alone. 290 For LTP fragmentation, our ongoing efforts explore using these 291 techniques in a manner that parallels the effort undertaken for QUIC. 292 Using these higher-layer segmentation management facilities is 293 consistent with the guidance in "IP Fragmentation Considered Fragile" 294 that states: 296 "Rather than deprecating IP fragmentation, this document 297 recommends that upper-layer protocols address the problem of 298 fragmentation at their layer, reducing their reliance on IP 299 fragmentation to the greatest degree possible." 301 By addressing fragmentation at their layer, the LTP/UDP functions can 302 then be tuned to minimize IP fragmentation in environments where it 303 may be problematic or to adaptively engage IP fragmentation in 304 environments where performance gains can be realized without risking 305 sustained loss and/or data corruption. 307 6. LTP Performance Enhancement Using GSO/GRO 309 Some modern operating systems include Generic Segmentation Offload (GSO) 310 and Generic Receive Offload (GRO) services. For example, GSO/GRO 311 support has been included in Linux beginning with kernel version 312 4.18. Some network drivers and network hardware also support GSO/GRO 313 at or below the operating system network device driver interface 314 layer to provide benefits of delayed segmentation and/or early 315 reassembly. The following sections discuss LTP interactions with GSO 316 and GRO. 318 6.1.
LTP and GSO 320 GSO allows LTP implementations to present the sendmsg() or sendmmsg() 321 system calls with "super-buffers" that include up to 64 LTP segments, 322 which the kernel will subdivide into individual UDP/IP datagrams. 323 LTP implementations enable GSO either on a per-socket basis using the 324 "setsockopt()" system call or on a per-message basis for 325 sendmsg()/sendmmsg() as follows:

      /* Set LTP segment size */
      int gso_size = SEGSIZE;
      ...
      /* Enable GSO for all messages sent on the socket */
      setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
      ...
      /* Alternatively, set per-message GSO control */
      cm = CMSG_FIRSTHDR(&msg);
      cm->cmsg_level = SOL_UDP;
      cm->cmsg_type = UDP_SEGMENT;
      cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
      *((uint16_t *)CMSG_DATA(cm)) = (uint16_t)gso_size;

340 Implementations must set SEGSIZE to a value no larger than the path 341 MTU of the underlying network interface, minus the header sizes 342 (see Appendix A); this ensures that UDP/IP datagrams generated 343 during GSO segmentation will not incur local IP fragmentation prior 344 to transmission. (NB: the Linux kernel returns EINVAL if SEGSIZE is 345 set to a value that would exceed the path MTU.) 347 Implementations should therefore dynamically determine SEGSIZE for 348 paths that traverse multiple links through Packetization Layer Path 349 MTU Discovery for Datagram Transports [RFC8899] (DPMTUD). 350 Implementations should set an initial SEGSIZE to either a known 351 minimum MTU for the path or to the protocol-defined minimum path MTU 352 (i.e., 576 for IPv4 or 1280 for IPv6). Implementations may then 353 dynamically increase SEGSIZE without service interruption if the 354 discovered path MTU is larger. 356 6.2.
LTP and GRO 358 GRO allows the kernel to return "super-buffers" that contain multiple 359 concatenated received segments to the LTP implementation in recvmsg() 360 or recvmmsg() system calls, where each concatenated segment is 361 distinguished by an LTP segment header per [RFC5326]. LTP 362 implementations enable GRO on a per-socket basis using the 363 "setsockopt()" system call, then optionally set up per-receive- 364 message ancillary data to receive the segment length for each message 365 as follows:

      /* Enable GRO */
      int use_gro = 1;    /* boolean */
      setsockopt(fd, SOL_UDP, UDP_GRO, &use_gro, sizeof(use_gro));
      ...
      /* Set per-message GRO control */
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      *((int *)CMSG_DATA(cmsg)) = 0;
      cmsg->cmsg_level = SOL_UDP;
      cmsg->cmsg_type = UDP_GRO;
      ...
      /* Receive per-message GRO segment length */
      if ((segmentLength = *((int *)CMSG_DATA(cmsg))) <= 0)
          segmentLength = messageLength;

381 Implementations pass a pointer to a "use_gro" boolean indication 382 to the kernel to enable GRO; the only interoperability requirement 383 therefore is that each UDP/IP packet include an integral number of 384 properly-formed LTP segments. The kernel and/or underlying network 385 hardware will first coalesce multiple received segments into a larger 386 single segment whenever possible and/or return multiple coalesced or 387 singular segments to the LTP implementation so as to maximize the 388 amount of data returned in a single system call. The "super-buffer" 389 thus prepared MUST contain at most 64 segments, where each non-final 390 segment MUST be equal in length and the final segment MUST NOT be 391 longer than the non-final segment length. 393 Implementations that invoke recvmsg() and/or recvmmsg() will 394 therefore receive "super-buffers" that include one or more 395 concatenated received LTP segments.
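One plausible way for a receiver to walk such a "super-buffer", using the segment length recovered from the UDP_GRO ancillary data, is sketched below. The function and callback names are illustrative, not part of any standard API:

```c
#include <stddef.h>

/* Illustrative per-segment handler type supplied by the LTP engine. */
typedef void (*ltp_segment_cb)(const unsigned char *seg, size_t len);

/* Walk a GRO "super-buffer": every non-final segment is seg_len octets
 * and the final segment may be shorter, per the GRO contract described
 * above.  A seg_len of zero means no coalescing occurred, so the whole
 * buffer is one segment.  Returns the number of segments delivered. */
static unsigned deliver_segments(const unsigned char *buf, size_t buf_len,
                                 size_t seg_len, ltp_segment_cb cb)
{
    unsigned n = 0;
    size_t off = 0;

    if (seg_len == 0)
        seg_len = buf_len;
    while (off < buf_len) {
        size_t len = buf_len - off;

        if (len > seg_len)
            len = seg_len;
        if (cb != NULL)
            cb(buf + off, len);     /* hand one whole LTP segment up */
        off += len;
        n++;
    }
    return n;
}
```

For example, a 3000-octet super-buffer with a 1400-octet segment length would be delivered as two 1400-octet segments followed by one 200-octet final segment.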
The LTP implementation accepts 396 all received LTP segments and identifies any segments that may be 397 missing. The LTP protocol then engages segment report procedures if 398 necessary to request retransmission of any missing segments. 400 6.3. LTP GSO/GRO Over OMNI Interfaces 402 LTP engines produce UDP/IP packets that can be forwarded over an 403 underlying network interface as the head-end of a "link-layer service 404 that transits IP packets". UDP/IP packets that enter the link's near 405 end are deterministically delivered to the link's far end, modulo loss 406 due to corruption, congestion or disruption. The link-layer service 407 is associated with an MTU that deterministically establishes the 408 maximum packet size that can transit the link. The link-layer 409 service may further support a segmentation and reassembly function 410 with fragment retransmissions at a layer below IP; in many cases, 411 these timely link-layer retransmissions can reduce dependency on 412 (slow) end-to-end retransmissions. 414 LTP engines that connect to networks traversed by paths consisting of 415 multiple concatenated links must be prepared to adapt their segment 416 sizes to match the minimum MTU of all links in the path. This could 417 result in a small SEGSIZE that would interfere with the benefits of 418 GSO/GRO layering. However, nodes that configure LTP engines can also 419 establish an Overlay Multilink Network Interface (OMNI) 420 [I-D.templin-6man-omni] that spans the multiple concatenated links 421 while presenting an assured (64KB-1) MTU to the LTP engine. 423 The OMNI interface internally uses IPv6 fragmentation as an OMNI 424 Adaptation Layer (OAL) service not visible to the LTP engine to allow 425 timely link-layer retransmissions of lost fragments where the 426 retransmission unit matches the loss unit.
The LTP engine can then 427 dynamically vary its SEGSIZE (up to a maximum value of (64KB-1) minus 428 headers) to determine the size that produces the best performance at 429 the current time by engaging the combined operational factors at all 430 layers of the multi-layer architecture. This dynamic factoring, 431 coupled with the ideal link properties provided by the OMNI interface, 432 supports an effective layering solution for many DTN networks. 434 When an LTP/UDP/IP packet is transmitted over an OMNI interface, the 435 OAL inserts an IPv6 header and performs IPv6 fragmentation to produce 436 fragments small enough to fit within the path MTU. The OAL then 437 replaces the IPv6 encapsulation headers with OMNI Compressed Headers 438 (OCHs), which are significantly smaller than their uncompressed IPv6 439 header counterparts and even smaller than the IPv4 headers would have 440 been had the packet been sent directly over a physical interface such 441 as Ethernet using IPv4 fragmentation. 443 The end result is that the first fragment produced by the OAL will 444 include a small amount of additional overhead to accommodate the OCH 445 encapsulation header, while all additional fragments will include only 446 an OCH header, which is significantly smaller than even an IPv4 447 header. The act of forwarding the large LTP/UDP/IP packet over the 448 OMNI interface will therefore produce a considerable overhead savings 449 in comparison with direct Ethernet transmission. 451 Using the OMNI interface with its OAL service in addition to the GSO/ 452 GRO mechanism, an LTP engine can therefore theoretically present 453 concatenated LTP segments in a "super-buffer" of up to (64 * ((64KB- 454 1) minus headers)) octets for transmission in a single sendmsg() 455 system call, and may present multiple such "super-buffers" in a 456 single system call when sendmmsg() is used. (Note however that 457 existing implementations limit the maximum-sized "super-buffer" to 458 only 64KB total.)
In the future, this service may realize even 459 greater benefits through the use of IP Jumbograms [RFC2675] over 460 paths that support them. 462 6.4. IP Parcels 464 The so-called "super-buffers" discussed in the previous sections can 465 be applied for GSO/GRO only when the LTP application endpoints are 466 co-resident with the OAL source and destination, respectively. 467 However, it may be desirable for the future architecture to support 468 network forwarding for these "super-buffers" in case the LTP source 469 and/or destination are located one or more IP networking hops away 470 from nodes that configure their respective source and destination 471 OMNI interfaces. Moreover, if the OMNI virtual link spans multiple 472 OMNI intermediate nodes on the path from the OAL source to the OAL 473 destination, it may be desirable to keep the "super-buffers" together 474 as much as possible as they traverse the intermediate hops. For this 475 reason, a new construct known as the "IP Parcel" has been specified 476 [I-D.templin-intarea-parcels]. 478 An IP parcel is a special form of an IP Jumbogram that includes a 479 non-zero value in the IP {Total, Payload} Length field. The value in 480 that field sets the segment size for the first segment included in 481 the parcel, while the value coded in the Jumbo Payload option 482 determines the number of segments included. Each segment "shares" 483 the same IP header, and the parcel can be broken down into sub- 484 parcels if necessary to traverse paths with length restrictions. A 485 full discussion of IP parcels is found in 486 [I-D.templin-intarea-parcels]. 488 7. Implementation Status 490 Supporting code for invoking the sendmmsg() facility is included in 491 the official ION source code distribution, beginning with release 492 ion-4.0.1. 494 Working code for GSO/GRO has been incorporated into a pre-release of 495 ION and scheduled for integration following the next major release. 497 8.
IANA Considerations 499 This document introduces no IANA considerations. 501 9. Security Considerations 503 Communications networking security is necessary to preserve 504 confidentiality, integrity and availability. 506 10. Acknowledgements 508 The NASA Space Communications and Networks (SCaN) directorate 509 coordinates DTN activities for the International Space Station (ISS) 510 and other space exploration initiatives. 512 Akash Agarwal, Madhuri Madhava Badgandi, Keith Philpott, Bill 513 Pohlchuck, Vijayasarathy Rajagopalan, Bhargava Raman Sai Prakash and 514 Eric Yeh are acknowledged for their significant contributions. Tyler 515 Doubrava was the first to mention the "sendmmsg()" facility. Scott 516 Burleigh provided review input, and David Zoller provided useful 517 perspective. 519 11. References 521 11.1. Normative References 523 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 524 DOI 10.17487/RFC0768, August 1980, 525 . 527 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 528 DOI 10.17487/RFC0791, September 1981, 529 . 531 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 532 Requirement Levels", BCP 14, RFC 2119, 533 DOI 10.17487/RFC2119, March 1997, 534 . 536 [RFC5326] Ramadas, M., Burleigh, S., and S. Farrell, "Licklider 537 Transmission Protocol - Specification", RFC 5326, 538 DOI 10.17487/RFC5326, September 2008, 539 . 541 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 542 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 543 May 2017, . 545 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 546 (IPv6) Specification", STD 86, RFC 8200, 547 DOI 10.17487/RFC8200, July 2017, 548 . 550 11.2. Informative References 552 [FRAG] Mogul, J. and C. Kent, "Fragmentation Considered Harmful, 553 ACM Sigcomm 1987", August 1987. 555 [I-D.ietf-dtn-bpbis] 556 Burleigh, S., Fall, K., and E. J. 
Birrane, "Bundle 557 Protocol Version 7", Work in Progress, Internet-Draft, 558 draft-ietf-dtn-bpbis-31, 25 January 2021, 559 . 562 [I-D.templin-6man-omni] 563 Templin, F. L. and T. Whyman, "Transmission of IP Packets 564 over Overlay Multilink Network (OMNI) Interfaces", Work in 565 Progress, Internet-Draft, draft-templin-6man-omni-52, 31 566 December 2021, . 569 [I-D.templin-intarea-parcels] 570 Templin, F. L., "IP Parcels", Work in Progress, Internet- 571 Draft, draft-templin-intarea-parcels-06, 22 December 2021, 572 . 575 [MPPS] Majkowski, M., "How to Receive a Million Packets Per 576 Second, https://blog.cloudflare.com/how-to-receive-a- 577 million-packets/", June 2015. 579 [QUIC] Ghedini, A., "Accelerating UDP Packet Transmission for 580 QUIC, https://calendar.perfplanet.com/2019/accelerating- 581 udp-packet-transmission-for-quic/", December 2019. 583 [RFC2675] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", 584 RFC 2675, DOI 10.17487/RFC2675, August 1999, 585 . 587 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 588 Errors at High Data Rates", RFC 4963, 589 DOI 10.17487/RFC4963, July 2007, 590 . 592 [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field", 593 RFC 6864, DOI 10.17487/RFC6864, February 2013, 594 . 596 [RFC8899] Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T. 597 Völker, "Packetization Layer Path MTU Discovery for 598 Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, 599 September 2020, . 601 [RFC8900] Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., 602 and F. Gont, "IP Fragmentation Considered Fragile", 603 BCP 230, RFC 8900, DOI 10.17487/RFC8900, September 2020, 604 . 606 Appendix A. IPv4/IPv6 Protocol Considerations 608 LTP/UDP/IP peers can communicate either via IPv4 or IPv6 addressing 609 when both peers configure a unique address of the same protocol 610 version on the OMNI interface. 
The IPv4 Total Length field includes
   the length of both the UDP header and the base IPv4 header, while the
   IPv6 Payload Length field includes the length of the UDP header but
   not the base IPv6 header.

   Therefore, unless extension headers are included, each maximum-sized
   LTP/UDP/IPv6 packet can contain 20 octets more actual LTP data than a
   maximum-sized LTP/UDP/IPv4 packet can contain, at the price of only
   20 additional header octets for IPv6.  The overhead of carrying these
   additional 20 header octets in maximum-sized packets is therefore
   insignificant, and becomes smaller still when IPv6 header compression
   is used.

Appendix B.  The Intergalactic Jigsaw Puzzle Builders Club

   The process we are optimizing is like an imaginary Intergalactic
   Jigsaw Puzzle Builders Club.  A first builder starts with an original
   image, admires it momentarily, then breaks it up into like-sized
   puzzle pieces with unique serial numbers.  The first builder then
   delivers each piece to their local post office, which has an
   Intergalactic Puzzle Piece Transporter.

   The transporter can instantly deliver each puzzle piece to a remote
   post office, which could be nearby or in a far-off galaxy.  The
   remote post office then delivers each piece to the next builder, who
   very quickly puts it in the correct place based on the serial number.
   This builder eventually reconstructs the entire original image,
   admires it, and forwards it on to the next builder in the same
   fashion.

   All original images are the same dimensions, but each consecutive
   builder can choose to break them into fewer and larger pieces or more
   and smaller pieces - for example 100, 250, 500, 1000 or even more
   pieces.  The local post office transporter can send smaller pieces
   intact, but must cut larger pieces into fragments that the remote
   post office will paste back together.
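   As a cross-check on the length-field arithmetic in Appendix A, the
   short sketch below (illustrative only; not part of any protocol
   specification) computes the maximum LTP payload per packet for IPv4
   and IPv6, using the base header sizes from RFC 791 and RFC 8200:

```python
# Maximum LTP payload per maximum-sized UDP/IP packet.
IPV4_HEADER = 20       # base IPv4 header, counted in Total Length
IPV6_HEADER = 40       # base IPv6 header, NOT counted in Payload Length
UDP_HEADER = 8         # counted by both length fields
MAX_LEN_FIELD = 65535  # both length fields are 16 bits

# IPv4 Total Length covers the IPv4 header, UDP header and LTP data.
max_ltp_ipv4 = MAX_LEN_FIELD - IPV4_HEADER - UDP_HEADER   # 65507
# IPv6 Payload Length covers only the UDP header and LTP data.
max_ltp_ipv6 = MAX_LEN_FIELD - UDP_HEADER                 # 65527

print(max_ltp_ipv4, max_ltp_ipv6, max_ltp_ipv6 - max_ltp_ipv4)
# -> 65507 65527 20
```

   That is, IPv6 carries 20 more LTP octets per maximum-sized packet at
   the cost of 20 more header octets on the wire (40 versus 20).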
This process is both fast and
   invisible to the builders, who only see whole puzzle pieces and never
   fragments.

   For ION-DTN LTP performance, we believed that performance could
   increase if builders could exchange MULTIPLE puzzle pieces in
   packages called PARCELs with their local post offices instead of just
   one piece at a time, and we have shown that this is true to a limited
   extent for small- to medium-sized pieces.  But we see that overall
   system performance is dominated by the time needed for the receiver
   to install a SINGLE puzzle piece, and we see that builders can
   reassemble puzzles with fewer and larger pieces MUCH faster than ones
   with more and smaller pieces.

   So, why not just use larger puzzle pieces all the time?  The problem
   is that the transporter is imperfect and can lose, damage and/or
   reorder pieces.  And if even a single bit is lost or damaged, the
   sender must retransmit the entire large piece all over again.  This
   is not only expensive (since the post office charges for transporter
   use by weight), but the whole service degrades because the loss unit
   is smaller than the retransmission unit, resulting in a cascading
   flood of redundant information.

   The system is a multi-variable optimization problem, and there are
   many knobs to turn.  Tuning characteristics can also vary over time
   due to fluctuations in transporter performance.  We also believe that
   if the transporter can be made to quickly retransmit lost fragments,
   it can often salvage partial puzzle pieces that would otherwise have
   been discarded.  This would allow builders to use larger pieces to
   increase performance.

   Studies of the QUIC protocol have shown that PARCELs can result in
   major performance increases by making the builder-to-post-office
   interface exchanges more efficient.  For LTP, we have seen limited
   increases (less than factor-2) using smaller segment sizes.
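   The loss-amplification argument above (loss unit smaller than
   retransmission unit) can be illustrated with a toy model.  The MTU
   and per-fragment loss rate below are assumed values for illustration,
   not measurements, and the model ignores selective retransmission:

```python
import math

def expected_sends(seg_size, mtu=1500, frag_loss=0.01):
    """Expected number of times a whole segment must be sent before
    every one of its IP fragments arrives in the same attempt (losing
    any one fragment forces retransmission of the whole segment)."""
    frags = math.ceil(seg_size / mtu)
    p_all_arrive = (1 - frag_loss) ** frags
    return 1 / p_all_arrive

for seg in (1500, 15000, 64000):
    print(seg, round(expected_sends(seg), 2))
# -> 1500 1.01 / 15000 1.11 / 64000 1.54
```

   Under these assumed numbers, a 64000-octet segment costs roughly 50%
   more transmissions per delivery than an unfragmented one, which is
   the "cascading flood of redundant information" in the analogy.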
While
   any increase is good, we believe that either increasing the single
   puzzle piece placement speed or supporting placement of multiple
   pieces simultaneously will be the gating factor for increased
   performance.  This may shift the performance bottleneck back to the
   builder-to-post-office interface, and PARCELs may help achieve
   greater increases even for larger puzzle pieces.

Author's Address

   Fred L. Templin (editor)
   Boeing Research & Technology
   P.O. Box 3707
   Seattle, WA 98124
   United States of America

   Email: fltemplin@acm.org