Network Working Group                                    F. Templin, Ed.
Internet-Draft                              Boeing Research & Technology
Intended status: Informational                             March 8, 2021
Expires: September 9, 2021

                           LTP Fragmentation
                      draft-templin-dtn-ltpfrag-04

Abstract

   The Licklider Transmission Protocol (LTP) provides a reliable
   datagram convergence layer for the Delay/Disruption Tolerant
   Networking (DTN) Bundle Protocol.
   In common practice, LTP is configured over UDP/IP sockets and
   inherits its maximum segment size from the maximum-sized UDP
   datagram; however, when this size exceeds the maximum IP packet size
   for the path, a service known as IP fragmentation must be employed.
   This document discusses LTP interactions with IP fragmentation and
   mitigations for managing the amount of IP fragmentation employed.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  IP Fragmentation Issues
   4.  LTP Fragmentation
   5.  Beyond "sendmmsg()"
   6.  Implementation Status
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   The Licklider Transmission Protocol (LTP) [RFC5326] provides a
   reliable datagram convergence layer for the Delay/Disruption Tolerant
   Networking (DTN) Bundle Protocol (BP) [I-D.ietf-dtn-bpbis].  In
   common practice, LTP is configured over the User Datagram Protocol
   (UDP) [RFC0768] and Internet Protocol (IP) [RFC0791] using the
   "socket" abstraction.  LTP then inherits its maximum segment size
   from the maximum-sized UDP datagram (i.e., 2^16 bytes minus header
   sizes).  However, when the UDP datagram size exceeds the maximum IP
   packet size for the path, a service known as IP fragmentation must
   be employed.

   LTP breaks BP bundles into "blocks", then further breaks these blocks
   into "segments".  The segment size is a configurable option and
   represents the largest atomic piece of data that LTP requires the
   underlying layers to deliver as a single unit.  Each segment is
   therefore also known as a "retransmission unit", since a lost
   segment must be retransmitted in its entirety.
   Experimental and operational evidence has shown that, on robust
   networks, increasing the LTP segment size (up to the maximum UDP
   datagram size of slightly less than 64KB) can yield substantial
   performance increases over smaller segment sizes.  However, these
   gains must be weighed against the amount of IP fragmentation
   invoked, as discussed below.

   When LTP presents a segment to the operating system kernel (e.g., via
   a sendmsg() system call), the UDP layer prepends a UDP header to
   create a UDP datagram.  The UDP layer then presents the resulting
   datagram to the IP layer for packet framing and transmission over a
   networked path.  The path is further characterized by its path
   Maximum Transmission Unit (Path-MTU), which is the smallest link MTU
   (Link-MTU) among all links in the path.

   When LTP presents the kernel with a segment larger than the Path-MTU,
   the resulting UDP datagram is presented to the IP layer, which in
   turn performs IP fragmentation to break the datagram into fragments
   that are no larger than the Path-MTU.  For example, if the LTP
   segment size is 64KB and the Path-MTU is 1280 bytes, IP
   fragmentation yields more than 50 fragments, each transmitted as an
   individual IP packet.  (Note that for IPv4 [RFC0791], fragmentation
   may occur either in the source host or in a router in the network
   path, while for IPv6 [RFC8200] only the source host may perform
   fragmentation.)

   Each IP fragment is subject to the network's best-effort delivery
   service under current congestion and/or link signal quality
   conditions.  The IP fragment therefore becomes the "loss unit".  When
   the packet loss rate is non-negligible, performance can suffer
   dramatically if the loss unit is significantly smaller than the
   retransmission unit.
   In particular, if even a single IP fragment of a fragmented LTP
   segment is lost, the entire LTP segment is deemed lost and must be
   retransmitted.

   This document discusses LTP interactions with IP fragmentation and
   mitigations for managing the amount of IP fragmentation employed.  It
   further discusses methods for increasing LTP performance both with
   and without the aid of IP fragmentation.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119][RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  IP Fragmentation Issues

   IP fragmentation is a fundamental service of the Internet Protocol,
   yet it has long been understood that its use can be problematic in
   some environments.  As early as 1987, "Fragmentation Considered
   Harmful" [FRAG] outlined multiple issues with the service.  These
   include a performance-crippling condition that can occur at high
   data rates when the loss unit is considerably smaller than the
   retransmission unit during intermittent and/or steady-state loss
   conditions.

   Later investigations also identified the possibility of undetected
   data corruption at high data rates due to a condition known as "ID
   wraparound".  Here the 16-bit IP identification field (the "IP ID")
   wraps around, so that new fragments carry the same ID values as
   older fragments still alive in the network [RFC4963][RFC6864].
   Although this issue affects only IPv4 (in IPv6 the IP ID is 32 bits
   long), the IPv4 concerns have led many to discourage the use of IP
   fragmentation.  So has the fact that IPv6 does not permit routers to
   perform "network fragmentation".
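
   The two issues above can be put into rough numbers.  The following C
   sketch is illustrative only.  Its function names are hypothetical,
   and it assumes IPv4 headers without options, fragment losses that
   are independent of one another, and a sender that draws IP IDs from
   a single sequential 16-bit counter for the destination.

```c
#include <stddef.h>

/* Hypothetical helpers that put rough numbers on the issues above.
 * Assumptions: IPv4 headers carry no options, fragments are lost
 * independently, and IP IDs come from one sequential 16-bit counter. */

#define IPV4_HDR_LEN      20  /* IPv4 header without options */
#define IPV6_HDR_LEN      40  /* fixed IPv6 header */
#define IPV6_FRAG_HDR_LEN  8  /* IPv6 Fragment extension header */

/* Number of IP packets carrying a UDP datagram of udp_len bytes (UDP
 * header included) over a path with the given Path-MTU.  Every fragment
 * but the last carries a multiple of 8 payload bytes.  Returns 0 if the
 * MTU is too small to carry any fragment payload. */
size_t ip_fragment_count(size_t udp_len, size_t path_mtu, int ipv6)
{
    size_t hdr = ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN;

    if (udp_len + hdr <= path_mtu)
        return 1;                         /* fits in one packet */
    if (ipv6)
        hdr += IPV6_FRAG_HDR_LEN;         /* each IPv6 fragment adds one */
    if (path_mtu < hdr + 8)
        return 0;

    size_t per_frag = ((path_mtu - hdr) / 8) * 8;
    return (udp_len + per_frag - 1) / per_frag;   /* ceiling division */
}

/* Probability that an LTP segment split into n fragments arrives intact
 * when each fragment is lost with probability p.  Losing any one
 * fragment loses the whole segment, so this is (1 - p)^n. */
double segment_delivery_prob(double p, size_t n)
{
    double q = 1.0;

    for (size_t i = 0; i < n; i++)
        q *= 1.0 - p;
    return q;
}

/* Seconds until the 16-bit IPv4 ID repeats when fragmented datagrams of
 * dgram_bytes are sent back to back at line_rate_bps (per-fragment
 * header overhead ignored).  If this is shorter than the receiver's
 * reassembly timeout, stale fragments can be misassembled with new
 * ones carrying the same ID. */
double ipv4_id_wrap_seconds(double line_rate_bps, double dgram_bytes)
{
    double dgrams_per_sec = line_rate_bps / (8.0 * dgram_bytes);

    return 65536.0 / dgrams_per_sec;
}
```

   For example, a 65000-byte LTP segment (a 65008-byte UDP datagram)
   crossing an IPv6 path with a 1280-byte Path-MTU becomes 53
   fragments.  At an assumed 1% fragment loss rate,
   segment_delivery_prob(0.01, 53) is about 0.59.  Roughly 41% of such
   segments would therefore have to be retransmitted, versus about 1%
   if each segment fit in a single packet.  Over IPv4 at an assumed
   10 Gb/s, ipv4_id_wrap_seconds(10e9, 65008) is about 3.4 seconds.
   That is far shorter than the 30-second default reassembly timeout
   used by Linux.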

   More recently, the Internet Engineering Task Force (IETF) published a
   Best Current Practice (BCP) document titled "IP Fragmentation
   Considered Fragile" [RFC8900].  That BCP cites the Bundle Protocol
   LTP convergence layer as a user of IP fragmentation that depends on
   some of its properties to realize greater performance.  However, the
   BCP summarizes by saying:

      "Rather than deprecating IP fragmentation, this document
      recommends that upper-layer protocols address the problem of
      fragmentation at their layer, reducing their reliance on IP
      fragmentation to the greatest degree possible."

   The performance consequences are considerable and matter for
   real-world applications.  Even so, our goal in this document is
   neither to condemn nor to embrace IP fragmentation as it pertains to
   the Bundle Protocol LTP convergence layer operating over UDP/IP
   sockets.  Instead, we examine ways in which the benefits of IP
   fragmentation can be realized while avoiding its pitfalls.  We
   therefore next discuss our systematic approach to LTP fragmentation.

4.  LTP Fragmentation

   In common LTP implementations over UDP/IP (e.g., the Interplanetary
   Overlay Network (ION)), performance depends greatly on the LTP
   segment size.  This is because a large segment presented to UDP/IP
   as a single unit incurs only a single sendmsg() system call and a
   single data copy from application to kernel space.  Once inside the
   kernel, the segment undergoes UDP/IP encapsulation and IP
   fragmentation, which again results in a loss unit smaller than the
   retransmission unit.
   However, the kernel transmits each fragment immediately after the
   previous one.  The fragments therefore appear as a "burst" of
   consecutive packets on the network path, resulting in high network
   utilization during the burst period.  Additionally, using IP
   fragmentation with a larger segment size conserves header bytes,
   since the UDP and LTP headers appear only in the first IP fragment
   rather than in every IP packet.

   To avoid retransmission congestion (especially when the loss
   probability is non-negligible), the natural choice would be to set
   the LTP segment size no larger than the Path-MTU.  However, consider
   a conservative IPv4 Path-MTU of 576 bytes (the datagram size that
   every IPv4 host must be able to accept [RFC0791]).  Transmitting 64KB
   of data in segments small enough to avoid fragmentation would then
   require well over 100 independent sendmsg() system calls and data
   copies, as opposed to just one when the largest segment size is
   used.  This forfeits much of the bandwidth advantage offered by IP
   fragmentation bursts.  A means is therefore needed that combines the
   fragment bursting of large segments with the retransmission
   efficiency of small segments.

   Common operating systems such as Linux provide the sendmmsg() ("send
   multiple messages") system call.  It allows the LTP application to
   present the kernel with a vector of up to 1024 segments in a single
   call instead of just one segment.  This combines the bursting
   behavior of IP fragmentation with the retransmission efficiency of
   small segment sizes.  (LTP receivers can likewise use the recvmmsg()
   ("receive multiple messages") system call to retrieve several
   recently arrived segments from the kernel in a single call.)
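
   The sendmmsg() pattern above can be sketched in C for Linux.  A
   buffer is cut into segment-sized pieces and the whole vector is
   handed to the kernel in one call, so each piece leaves as its own
   UDP/IP packet.  This is a minimal sketch, not ION code.  The function
   name is hypothetical, and its max_segs parameter stands in for the
   "Burst-Limit" option recommended in this section.

```c
#define _GNU_SOURCE             /* exposes sendmmsg() and struct mmsghdr */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_VLEN 1024           /* Linux caps vlen at UIO_MAXIOV (1024) */

/* Hypothetical helper: carve up to max_segs segments of at most seg_size
 * bytes from the front of buf and hand them to the kernel in ONE
 * sendmmsg() call on a connected UDP socket.  Each segment becomes its
 * own UDP/IP packet.  Returns the number of bytes accepted, or -1 on
 * error. */
ssize_t send_segment_burst(int fd, const unsigned char *buf, size_t len,
                           size_t seg_size, unsigned int max_segs)
{
    struct mmsghdr msgs[MAX_VLEN];
    struct iovec iovs[MAX_VLEN];
    unsigned int n = 0;
    size_t off = 0;

    if (seg_size == 0)
        return -1;
    if (max_segs > MAX_VLEN)
        max_segs = MAX_VLEN;

    /* Build one single-iovec message per segment. */
    while (n < max_segs && off < len) {
        size_t chunk = (len - off < seg_size) ? len - off : seg_size;

        iovs[n].iov_base = (void *)(buf + off);
        iovs[n].iov_len = chunk;
        memset(&msgs[n], 0, sizeof(msgs[n]));
        msgs[n].msg_hdr.msg_iov = &iovs[n];
        msgs[n].msg_hdr.msg_iovlen = 1;
        off += chunk;
        n++;
    }
    if (n == 0)
        return 0;

    int sent = sendmmsg(fd, msgs, n, 0);  /* number of messages sent */
    if (sent < 0)
        return -1;

    ssize_t bytes = 0;
    for (int i = 0; i < sent; i++)
        bytes += msgs[i].msg_len;         /* kernel fills in per-message bytes */
    return bytes;
}
```

   UDP sends each message whole or not at all, so the returned byte
   count always covers a prefix of the buffer.  Since sendmmsg() may
   accept only part of the vector, a caller can simply advance by the
   returned count and call again until the block is exhausted.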

   This document therefore recommends that LTP implementations employ a
   large block size, a conservative segment size, and a new
   configuration option known as the "Burst-Limit".  The Burst-Limit
   determines the number of segments that can be presented in a single
   sendmmsg() system call.  When the implementation receives an LTP
   block, it carves up to Burst-Limit segments from the block and
   presents the vector of segments to sendmmsg().  The kernel prepares
   each segment as an independent UDP/IP packet and transmits the
   packets into the network as a burst, in a fashion that parallels IP
   fragmentation.  Because the loss unit and the retransmission unit
   are then the same, the loss of a single packet requires
   retransmission of only a single segment.  It therefore does not
   result in a retransmission congestion event.

   Note that the Burst-Limit is bounded by the LTP block size (and by
   the kernel's per-call vector limit noted above) but not by the
   maximum UDP datagram size.  Each burst can therefore convey
   significantly more data than a single IP fragmentation event.  Note
   also that in low-loss environments the segment size can still be
   made larger than the Path-MTU without significant danger of
   triggering retransmission storms due to IP fragment loss.  This
   results in combined UDP message and IP fragment bursting for
   increased network utilization in more robust environments.  Finally,
   neither the Burst-Limit nor the UDP message size need be static.
   Both can be tuned to increase or decrease adaptively according to
   time-varying network conditions.

5.  Beyond "sendmmsg()"

   Implementation experience with the ION DTN distribution, along with
   two recent studies, has demonstrated performance increases from
   employing sendmmsg() for transmission over UDP/IP sockets.
   The first study used sendmmsg() as part of an integrated solution to
   achieve 1M packets per second under raw data transmission conditions
   [MPPS].  The second focused on performance improvements for the QUIC
   reliable transport service [QUIC].  In both studies, sendmmsg() alone
   produced observable gains.  Both studies also identified
   complementary enhancements that, when combined with sendmmsg(),
   produced considerable additional gains.

   In [MPPS], additional enhancements were introduced in an attempt to
   achieve greater parallelism and engage multiple processors and
   threads.  These included using recvmmsg() and configuring multiple
   receive queues at the receiver.  However, the receiver remained
   limited to a single thread until multiple receiving processes were
   introduced using the "SO_REUSEPORT" socket option.  With multiple
   receiving processes (each with its own socket buffer), parallel
   processing made it possible to reach the goal of 1M packets per
   second.

   In [QUIC], a feature available in recent Linux kernel versions was
   employed.  This feature, known as "Generic Segmentation Offload (GSO)
   / Generic Receive Offload (GRO)", allows an application to provide
   the kernel with a "super-buffer" containing up to 64 separate
   QUIC/UDP segments.  When the application presents the super-buffer
   to the kernel, GSO segmentation sends the segments as separate
   UDP/IP packets (up to 64) in a burst.  If each packet is larger than
   the Path-MTU, IP fragmentation is invoked for each packet.  This
   leads to high network utilization, at the risk of IP fragment loss
   and retransmission storms.  The GSO facility can be invoked through
   either sendmsg() (i.e., a single super-buffer) or sendmmsg() (i.e.,
   multiple super-buffers).  The study showed a substantial performance
   increase over using sendmsg() or sendmmsg() alone.
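
   A GSO super-buffer can be handed to the kernel as sketched below.
   This minimal C sketch assumes Linux 4.18 or later.  There, UDP GSO is
   requested with a UDP_SEGMENT control message at level SOL_UDP that
   carries a 16-bit segment size.  The function name is hypothetical.
   The fallback value 103 for UDP_SEGMENT is copied from the Linux UAPI
   header <linux/udp.h> for builds whose libc headers lack it.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103          /* from Linux <linux/udp.h>, if libc lacks it */
#endif

/* Hypothetical helper: send len bytes from buf as ONE GSO "super-buffer"
 * on a connected UDP socket.  The UDP_SEGMENT control message asks the
 * kernel to split the payload into seg_size-byte UDP datagrams (the last
 * one may be shorter), so a single sendmsg() yields a burst of packets.
 * Returns the number of bytes accepted, or -1 with errno set. */
ssize_t send_gso_superbuffer(int fd, const void *buf, size_t len,
                             uint16_t seg_size)
{
    struct iovec iov;
    struct msghdr msg;
    union {                                  /* correctly aligned cmsg space */
        char data[CMSG_SPACE(sizeof(uint16_t))];
        struct cmsghdr align;
    } ctrl;

    iov.iov_base = (void *)buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.data;
    msg.msg_controllen = sizeof(ctrl.data);

    /* One control message: level SOL_UDP (== IPPROTO_UDP), a u16 size. */
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = IPPROTO_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
    memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));

    return sendmsg(fd, &msg, 0);
}
```

   The kernel caps the number of segments per super-buffer.  The limit
   is UDP_MAX_SEGMENTS, which was 64 in kernels of the [QUIC] study's
   era.  The kernel also rejects a segment size that, together with its
   UDP/IP headers, does not fit within the MTU it uses for the outgoing
   route; sendmsg() then fails with EINVAL.  The same control message
   can be attached to each entry of a sendmmsg() vector to send several
   super-buffers in one call.  Alternatively, setting UDP_SEGMENT with
   setsockopt() at level SOL_UDP applies a default segment size to
   every send on the socket.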

   For LTP fragmentation, our ongoing efforts explore using these
   techniques in a manner that parallels the effort undertaken for QUIC.
   Using these higher-layer segmentation management facilities is
   consistent with the guidance in "IP Fragmentation Considered Fragile",
   which states:

      "Rather than deprecating IP fragmentation, this document
      recommends that upper-layer protocols address the problem of
      fragmentation at their layer, reducing their reliance on IP
      fragmentation to the greatest degree possible."

   By addressing fragmentation at their own layer, the LTP/UDP functions
   can be tuned to minimize IP fragmentation in environments where it
   may be problematic.  They can also engage IP fragmentation adaptively
   in environments where performance gains can be realized without
   risking data corruption.

6.  Implementation Status

   Supporting code for invoking the sendmmsg() facility is included in
   the official ION source code distribution, beginning with release
   ion-4.0.1.

7.  IANA Considerations

   This document introduces no IANA considerations.

8.  Security Considerations

   Communications networking security is necessary to preserve
   confidentiality, integrity and availability.

9.  Acknowledgements

   The NASA Space Communications and Networks (SCaN) directorate
   coordinates DTN activities for the International Space Station (ISS)
   and other space exploration initiatives.

   Madhuri Madhava Badgandi, Keith Philpott, Bill Pohlchuck,
   Vijayasarathy Rajagopalan and Eric Yeh are acknowledged for their
   significant contributions.  Tyler Doubrava was the first to mention
   the "sendmmsg()" facility.  Scott Burleigh provided review input, and
   David Zoller provided useful perspective.

10.  References

10.1.  Normative References

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5326]  Ramadas, M., Burleigh, S., and S. Farrell, "Licklider
              Transmission Protocol - Specification", RFC 5326,
              DOI 10.17487/RFC5326, September 2008.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

10.2.  Informative References

   [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
              ACM SIGCOMM 1987, August 1987.

   [I-D.ietf-dtn-bpbis]
              Burleigh, S., Fall, K., and E. Birrane, "Bundle Protocol
              Version 7", draft-ietf-dtn-bpbis-31 (work in progress),
              January 2021.

   [MPPS]     Majkowski, M., "How to Receive a Million Packets Per
              Second", June 2015,
              <https://blog.cloudflare.com/how-to-receive-a-million-packets/>.

   [QUIC]     Ghedini, A., "Accelerating UDP Packet Transmission for
              QUIC", December 2019, <https://calendar.perfplanet.com/
              2019/accelerating-udp-packet-transmission-for-quic/>.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
              Errors at High Data Rates", RFC 4963,
              DOI 10.17487/RFC4963, July 2007.

   [RFC6864]  Touch, J., "Updated Specification of the IPv4 ID Field",
              RFC 6864, DOI 10.17487/RFC6864, February 2013.

   [RFC8900]  Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O.,
              and F. Gont, "IP Fragmentation Considered Fragile",
              BCP 230, RFC 8900, DOI 10.17487/RFC8900, September 2020.

Author's Address

   Fred L. Templin (editor)
   Boeing Research & Technology
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fltemplin@acm.org