2 Network Working Group F. L. Templin, Ed.
3 Internet-Draft Boeing Research & Technology 4 Intended status: Informational 1 February 2022 5 Expires: 5 August 2022 7 LTP Fragmentation 8 draft-templin-dtn-ltpfrag-08 10 Abstract 12 The Licklider Transmission Protocol (LTP) provides a reliable 13 datagram convergence layer for the Delay/Disruption Tolerant 14 Networking (DTN) Bundle Protocol. In common practice, LTP is often 15 configured over UDP/IP sockets and inherits its maximum segment size 16 from the maximum-sized UDP/IP datagram; however, when this size 17 exceeds the maximum IP packet size for the path, a service known as IP 18 fragmentation must be employed. This document discusses LTP 19 interactions with IP fragmentation and mitigations for managing the 20 amount of IP fragmentation employed. 22 Status of This Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF). Note that other groups may also distribute 29 working documents as Internet-Drafts. The list of current Internet- 30 Drafts is at https://datatracker.ietf.org/drafts/current/. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 This Internet-Draft will expire on 5 August 2022. 39 Copyright Notice 41 Copyright (c) 2022 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 46 license-info) in effect on the date of publication of this document. 47 Please review these documents carefully, as they describe your rights 48 and restrictions with respect to this document.
Code Components 49 extracted from this document must include Revised BSD License text as 50 described in Section 4.e of the Trust Legal Provisions and are 51 provided without warranty as described in the Revised BSD License. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 56 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 3. IP Fragmentation Issues . . . . . . . . . . . . . . . . . . . 4 58 4. LTP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 5 59 5. Beyond "sendmmsg()" . . . . . . . . . . . . . . . . . . . . . 6 60 6. LTP Performance Enhancement Using GSO/GRO . . . . . . . . . . 7 61 6.1. LTP and GSO . . . . . . . . . . . . . . . . . . . . . . . 7 62 6.2. LTP and GRO . . . . . . . . . . . . . . . . . . . . . . . 8 63 6.3. LTP GSO/GRO Over OMNI Interfaces . . . . . . . . . . . . 9 64 6.4. IP Parcels . . . . . . . . . . . . . . . . . . . . . . . 11 65 7. Implementation Status . . . . . . . . . . . . . . . . . . . . 11 66 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 67 9. Security Considerations . . . . . . . . . . . . . . . . . . . 11 68 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 69 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 70 11.1. Normative References . . . . . . . . . . . . . . . . . . 12 71 11.2. Informative References . . . . . . . . . . . . . . . . . 12 72 Appendix A. IPv4/IPv6 Protocol Considerations . . . . . . . . . 14 73 Appendix B. The Intergalactic Jigsaw Puzzle Builders Club . . . 14 74 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 15 76 1. Introduction 78 The Licklider Transmission Protocol (LTP) [RFC5326] provides a 79 reliable datagram convergence layer for the Delay/Disruption Tolerant 80 Networking (DTN) Bundle Protocol (BP) [I-D.ietf-dtn-bpbis]. 
In 81 common practice, LTP is often configured over the User Datagram 82 Protocol (UDP) [RFC0768] and Internet Protocol (IP) [RFC0791] using 83 the "socket" abstraction. LTP inherits its maximum segment size from 84 the maximum-sized UDP/IP datagram (i.e., 64KB minus header sizes); 85 however, when that size exceeds the maximum IP packet size for the 86 path, a service known as IP fragmentation must be employed. 88 LTP breaks BP bundles into "blocks", then further breaks these blocks 89 into "segments". The segment size is a configurable option and 90 represents the largest atomic portion of data that LTP will require 91 underlying layers to deliver as a single unit. The segment size is 92 therefore also known as the "retransmission unit", since each lost 93 segment must be retransmitted in its entirety. Experimental and 94 operational evidence has shown that, on robust networks, increasing the 95 LTP segment size (up to the maximum UDP/IP datagram size of slightly 96 less than 64KB) can result in substantial performance increases over 97 smaller segment sizes. However, the performance increases must be 98 balanced against the amount of IP fragmentation invoked, as discussed 99 below. 101 When LTP presents a segment to the operating system kernel (e.g., via 102 a sendmsg() system call), the UDP layer prepends a UDP header to 103 create a UDP datagram. The UDP layer then presents the resulting 104 datagram to the IP layer for packet framing and transmission over a 105 networked path. The path is further characterized by the path 106 Maximum Transmission Unit (Path-MTU), which is a measure of the 107 smallest link MTU (Link-MTU) among all links in the path. 109 When LTP presents a segment to the kernel that is larger than the 110 Path-MTU, the resulting UDP datagram is presented to the IP layer, 111 which in turn performs IP fragmentation to break the datagram into 112 fragments that are no larger than the Path-MTU.
For example, if the 113 LTP segment size is 64KB and the Path-MTU is 1280 bytes, IP 114 fragmentation results in 50+ fragments that are transmitted as 115 individual IP packets. (Note that for IPv4 [RFC0791], fragmentation 116 may occur either in the source host or in a router in the network 117 path, while for IPv6 [RFC8200] only the source host may perform 118 fragmentation.) 120 Each IP fragment is subject to the same best-effort delivery service 121 offered by the network according to current congestion and/or link 122 signal quality conditions; therefore, the IP fragment size is 123 known as the "loss unit". When the packet loss rate is 124 non-negligible, performance can suffer dramatically if the 125 loss unit is significantly smaller than the retransmission unit. In 126 particular, if even a single IP fragment of a fragmented LTP segment 127 is lost, then the entire LTP segment is deemed lost and must be 128 retransmitted. Since LTP does not support flow control or congestion 129 control, this can result in a cascading flood of redundant 130 information when fragments are systematically lost in transit. 132 This document discusses LTP interactions with IP fragmentation and 133 mitigations for managing the amount of IP fragmentation employed. It 134 further discusses methods for increasing LTP performance both with 135 and without the aid of IP fragmentation. 137 2. Terminology 139 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 140 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 141 "OPTIONAL" in this document are to be interpreted as described in BCP 142 14 [RFC2119][RFC8174] when, and only when, they appear in all 143 capitals, as shown here. 145 3. IP Fragmentation Issues 147 IP fragmentation is a fundamental service of the Internet Protocol, 148 yet it has long been understood that its use can be problematic in 149 some environments.
Beginning as early as 1987, "Fragmentation 150 Considered Harmful" [FRAG] outlined multiple issues with the service, 151 including a performance-crippling condition that can occur at high 152 data rates when the loss unit is considerably smaller than the 153 retransmission unit during intermittent and/or steady-state loss 154 conditions. 156 Later investigations also identified the possibility of undetected 157 corruption at high data rates due to a condition known as "ID 158 wraparound" when the 16-bit IP identification field (aka the "IP ID") 159 increments such that new fragments overlap with existing fragments 160 still alive in the network and with identical ID values 161 [RFC4963][RFC6864]. Although this issue occurs only in the IPv4 162 protocol (and not in IPv6, where the IP ID is 32 bits in length), the 163 IPv4 concerns along with the fact that IPv6 does not permit routers 164 to perform "network fragmentation" have led many to discourage the 165 use of fragmentation whenever possible. 167 Even in the modern era, investigators have seen fit to declare "IP 168 Fragmentation Considered Fragile" in an Internet Engineering Task 169 Force (IETF) Best Current Practice (BCP) reference [RFC8900]. 170 Indeed, the BCP recommendations cite the Bundle Protocol LTP 171 convergence layer as a user of IP fragmentation that depends on some 172 of its properties to realize greater performance. However, the BCP 173 summarizes by saying: 175 "Rather than deprecating IP fragmentation, this document 176 recommends that upper-layer protocols address the problem of 177 fragmentation at their layer, reducing their reliance on IP 178 fragmentation to the greatest degree possible." 180 While the performance effects are considerable and have serious 181 implications for real-world applications, our goal in this document 182 is neither to condemn nor embrace IP fragmentation as it pertains to 183 the Bundle Protocol LTP convergence layer operating over UDP/IP 184 sockets.
Instead, we examine ways in which the benefits of IP 185 fragmentation can be realized while avoiding the pitfalls. We 186 therefore next discuss our systematic approach to LTP fragmentation. 188 4. LTP Fragmentation 190 In common LTP implementations over UDP/IP (e.g., the Interplanetary 191 Overlay Network (ION)), performance is greatly dependent on the LTP 192 segment size. This is because a larger segment 193 presented to UDP/IP as a single unit incurs only a single system call 194 and a single data copy from application to kernel space via the 195 sendmsg() system call. Once inside the kernel, the segment incurs 196 UDP/IP encapsulation and IP fragmentation, which again results in a 197 loss unit smaller than the retransmission unit. However, during 198 fragmentation, each fragment is transmitted immediately after the 199 previous one without delay, so that the fragments appear as a "burst" of 200 consecutive packets over the network path, resulting in high network 201 utilization during the burst period. Additionally, the use of IP 202 fragmentation with a larger segment size conserves header framing 203 bytes, since the upper-layer headers appear only in the first IP 204 fragment as opposed to appearing in all fragments. 206 To avoid retransmission congestion (especially when 207 the loss probability is non-negligible), the natural choice would be 208 to set the LTP segment size to a size that is no larger than the 209 Path-MTU. Assuming the IPv4 minimum effective MTU of 576 bytes, however, 210 transmission of 64KB of data using a 576B segment size would require 211 well over 100 independent sendmsg() system calls and data copies as 212 opposed to just one when the largest segment size is used. This 213 greatly reduces the bandwidth advantage offered by IP fragmentation 214 bursts. Therefore, a means for providing the best aspects of both 215 large segment fragment bursting and small segment retransmission 216 efficiency is needed.
218 Common operating systems such as Linux provide the sendmmsg() ("send 219 multiple messages") system call that allows the LTP application to 220 present the kernel with a vector of up to 1024 segments instead of 221 just a single segment. This theoretically affords the bursting 222 behavior of IP fragmentation coupled with the retransmission 223 efficiency of employing small segment sizes. (Note that LTP 224 receivers can also use the recvmmsg() ("receive multiple messages") 225 system call to receive a vector of segments from the kernel in case 226 multiple recent packet arrivals can be combined.) 228 This work therefore recommends that implementations of LTP employ a 229 large block size, a conservative segment size, and a new configuration 230 option known as the "Burst-Limit", which determines the number of 231 segments that can be presented in a single sendmmsg() system call. 232 When the implementation receives an LTP block, it carves Burst-Limit- 233 many segments from the block and presents the vector of segments to 234 sendmmsg(). The kernel will prepare each segment as an independent 235 UDP/IP packet and transmit the packets into the network as a burst in a 236 fashion that parallels IP fragmentation. The loss unit and 237 retransmission unit will be the same; therefore, loss of a single 238 segment does not result in a retransmission congestion event. 240 It should be noted that the Burst-Limit is bounded only by the LTP 241 block size and not by the maximum UDP/IP datagram size. Therefore, 242 each burst can in practice convey significantly more data than a 243 single IP fragmentation event. It should also be noted that the 244 segment size can still be made larger than the Path-MTU in low-loss 245 environments without danger of triggering retransmission storms due 246 to loss of IP fragments. This would result in combined large UDP/IP 247 message transmission and IP fragmentation bursting for increased 248 network utilization in more robust environments.
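The block-carving step described above might be sketched as follows. This is a minimal illustration, not the ION implementation; the helper name and the BURST_LIMIT value are assumptions introduced for the example:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Illustrative configuration value; the real one would come from the
 * LTP engine's Burst-Limit setting. */
#define BURST_LIMIT 64          /* max segments per sendmmsg() call */

/* Carve up to burst_limit segments of at most seg_size octets from an
 * LTP block, filling one mmsghdr/iovec pair per segment.  Returns the
 * number of messages prepared. */
static unsigned build_burst(struct mmsghdr *msgs, struct iovec *iovs,
                            unsigned char *block, size_t block_len,
                            size_t seg_size, unsigned burst_limit)
{
    unsigned n = 0;
    size_t off = 0;

    while (off < block_len && n < burst_limit) {
        size_t len = block_len - off;

        if (len > seg_size)
            len = seg_size;
        iovs[n].iov_base = block + off;
        iovs[n].iov_len = len;
        memset(&msgs[n].msg_hdr, 0, sizeof(msgs[n].msg_hdr));
        msgs[n].msg_hdr.msg_iov = &iovs[n];
        msgs[n].msg_hdr.msg_iovlen = 1;
        off += len;
        n++;
    }
    return n;
}
```

On a connected UDP socket, the prepared vector would then be handed to the kernel in one call, e.g. `sendmmsg(fd, msgs, n, 0)`; each message becomes an independent UDP/IP packet, so the loss unit and retransmission unit coincide.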
Finally, both the 249 Burst-Limit and UDP/IP message sizes need not be static values, and 250 can be tuned to adaptively increase or decrease according to time- 251 varying network conditions. 253 5. Beyond "sendmmsg()" 255 Implementation experience with the ION-DTN distribution, along with 256 two recent studies, has demonstrated limited performance increases 257 from employing sendmmsg() for transmission over UDP/IP sockets. A 258 first study used sendmmsg() as part of an integrated solution to 259 produce 1M packets per second assuming only raw data transmission 260 conditions [MPPS], while a second study focused on performance 261 improvements for the QUIC reliable transport service [QUIC]. In both 262 studies, the use of sendmmsg() alone produced observable increases, 263 but complementary enhancements were identified that (when combined 264 with sendmmsg()) produced considerable additional increases. 266 In [MPPS], additional enhancements such as using recvmmsg() and 267 configuring multiple receive queues at the receiver were introduced 268 in an attempt to achieve greater parallelism and engage multiple 269 processors and threads. However, the system was still limited to a 270 single thread until multiple receiving processes were introduced 271 using the "SO_REUSEPORT" socket option. By having multiple receiving 272 processes (each with its own socket buffer), the performance 273 advantages of parallel processing were realized, achieving the 1M 274 packets per second goal. 276 In [QUIC], a new feature available in recent Linux kernel versions 277 was employed. The feature, known as "Generic Segmentation Offload 278 (GSO) / Generic Receive Offload (GRO)", allows an application to 279 provide the kernel with a "super-buffer" containing up to 64 separate 280 upper layer protocol segments. When the application presents the 281 super-buffer to the kernel, GSO segmentation then sends up to 64 282 separate UDP/IP packets in a burst.
(Note that GSO requires each 283 UDP/IP packet to be no larger than the path MTU so that receivers can 284 invoke GRO without interactions with IP reassembly.) The GSO 285 facility can be invoked by either sendmsg() (i.e., a single super- 286 buffer) or sendmmsg() (i.e., multiple super-buffers), and the study 287 showed a substantial performance increase over using sendmsg() 288 and sendmmsg() alone. 290 For LTP fragmentation, our ongoing efforts explore using these 291 techniques in a manner that parallels the effort undertaken for QUIC. 292 Using these higher-layer segmentation management facilities is 293 consistent with the guidance in "IP Fragmentation Considered Fragile" 294 that states: 296 "Rather than deprecating IP fragmentation, this document 297 recommends that upper-layer protocols address the problem of 298 fragmentation at their layer, reducing their reliance on IP 299 fragmentation to the greatest degree possible." 301 By addressing fragmentation at their layer, the LTP/UDP functions can 302 then be tuned to minimize IP fragmentation in environments where it 303 may be problematic or to adaptively engage IP fragmentation in 304 environments where performance gains can be realized without risking 305 sustained loss and/or data corruption. 307 6. LTP Performance Enhancement Using GSO/GRO 309 Some modern operating systems include Generic Segmentation Offload (GSO) 310 and Generic Receive Offload (GRO) services. For example, GSO/GRO 311 support has been included in Linux beginning with kernel version 312 4.18. Some network drivers and network hardware also support GSO/GRO 313 at or below the operating system network device driver interface 314 layer to provide benefits of delayed segmentation and/or early 315 reassembly. The following sections discuss LTP interactions with GSO 316 and GRO. 318 6.1.
LTP and GSO 320 GSO allows LTP implementations to present the sendmsg() or sendmmsg() 321 system calls with "super-buffers" that include up to 64 LTP segments, 322 which the kernel will subdivide into individual UDP/IP datagrams. 323 LTP implementations enable GSO either on a per-socket basis using the 324 "setsockopt()" system call or on a per-message basis for 325 sendmsg()/sendmmsg() as follows:

      /* Set LTP segment size */
      int gso_size = SEGSIZE;
      ...
      /* Enable GSO for all messages sent on the socket */
      setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
      ...
      /* Alternatively, set per-message GSO control */
      cm = CMSG_FIRSTHDR(&msg);
      cm->cmsg_level = SOL_UDP;
      cm->cmsg_type = UDP_SEGMENT;
      cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
      *((uint16_t *)CMSG_DATA(cm)) = (uint16_t)gso_size;

340 Implementations must set SEGSIZE to a value no larger than the path 341 MTU of the underlying network interface, minus the header sizes 342 (see Appendix A); this ensures that UDP/IP datagrams generated 343 during GSO segmentation will not incur local IP fragmentation prior 344 to transmission. (NB: the Linux kernel returns EINVAL if SEGSIZE is 345 set to a value that would exceed the path MTU.) 347 Implementations should therefore dynamically determine SEGSIZE for 348 paths that traverse multiple links through Packetization Layer Path 349 MTU Discovery for Datagram Transports [RFC8899] (DPMTUD). 350 Implementations should set an initial SEGSIZE to either a known 351 minimum MTU for the path or to the protocol-defined minimum path MTU 352 (i.e., 576 for IPv4 or 1280 for IPv6). Implementations may then 353 dynamically increase SEGSIZE without service interruption if the 354 discovered path MTU is larger. 356 6.2.
LTP and GRO 358 GRO allows the kernel to return "super-buffers" that contain multiple 359 concatenated received segments to the LTP implementation in recvmsg() 360 or recvmmsg() system calls, where each concatenated segment is 361 distinguished by an LTP segment header per [RFC5326]. LTP 362 implementations enable GRO on a per-socket basis using the 363 "setsockopt()" system call, then optionally set up per-receive- 364 message ancillary data to receive the segment length for each message 365 as follows:

      /* Enable GRO */
      int use_gro = 1;    /* boolean */
      setsockopt(fd, SOL_UDP, UDP_GRO, &use_gro, sizeof(use_gro));
      ...
      /* Set per-message GRO control */
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      *((int *)CMSG_DATA(cmsg)) = 0;
      cmsg->cmsg_level = SOL_UDP;
      cmsg->cmsg_type = UDP_GRO;
      ...
      /* Receive per-message GRO segment length */
      if ((segmentLength = *((int *)CMSG_DATA(cmsg))) <= 0)
          segmentLength = messageLength;

381 Implementations pass a pointer to a "use_gro" boolean indication 382 to the kernel to enable GRO; the only interoperability requirement 383 therefore is that each UDP/IP packet include an integral number of 384 properly-formed LTP segments. The kernel and/or underlying network 385 hardware will first coalesce multiple received segments into a larger 386 single segment whenever possible and/or return multiple coalesced or 387 singular segments to the LTP implementation so as to maximize the 388 amount of data returned in a single system call. The "super-buffer" 389 thus prepared MUST contain at most 64 segments, where each non-final 390 segment MUST be equal in length and the final segment MUST NOT be 391 longer than the non-final segment length. 393 Implementations that invoke recvmsg() and/or recvmmsg() will 394 therefore receive "super-buffers" that include one or more 395 concatenated received LTP segments.
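One plausible way for a receiver to walk such a "super-buffer", using the segment length recovered from the UDP_GRO ancillary data, is sketched below. The function and callback names are illustrative, not part of any standard API:

```c
#include <stddef.h>

/* Illustrative per-segment handler type supplied by the LTP engine. */
typedef void (*ltp_segment_cb)(const unsigned char *seg, size_t len);

/* Walk a GRO "super-buffer": every non-final segment is seg_len octets
 * and the final segment may be shorter, per the GRO contract described
 * above.  A seg_len of zero means no coalescing occurred, so the whole
 * buffer is one segment.  Returns the number of segments delivered. */
static unsigned deliver_segments(const unsigned char *buf, size_t buf_len,
                                 size_t seg_len, ltp_segment_cb cb)
{
    unsigned n = 0;
    size_t off = 0;

    if (seg_len == 0)
        seg_len = buf_len;
    while (off < buf_len) {
        size_t len = buf_len - off;

        if (len > seg_len)
            len = seg_len;
        if (cb != NULL)
            cb(buf + off, len);     /* hand one whole LTP segment up */
        off += len;
        n++;
    }
    return n;
}
```

For example, a 3000-octet super-buffer with a 1400-octet segment length would be delivered as two 1400-octet segments followed by one 200-octet final segment.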
The LTP implementation accepts 396 all received LTP segments and identifies any segments that may be 397 missing. The LTP protocol then engages segment report procedures if 398 necessary to request retransmission of any missing segments. 400 6.3. LTP GSO/GRO Over OMNI Interfaces 402 LTP engines produce UDP/IP packets that can be forwarded over an 403 underlying network interface as the head-end of a "link-layer service 404 that transits IP packets". UDP/IP packets that enter the link's near 405 end are deterministically delivered to the link's far end, modulo loss 406 due to corruption, congestion or disruption. The link-layer service 407 is associated with an MTU that deterministically establishes the 408 maximum packet size that can transit the link. The link-layer 409 service may further support a segmentation and reassembly function 410 with fragment retransmissions at a layer below IP; in many cases, 411 these timely link-layer retransmissions can reduce dependency on 412 (slow) end-to-end retransmissions. 414 LTP engines that connect to networks traversed by paths consisting of 415 multiple concatenated links must be prepared to adapt their segment 416 sizes to match the minimum MTU of all links in the path. This could 417 result in a small SEGSIZE that would interfere with the benefits of 418 GSO/GRO layering. However, nodes that configure LTP engines can also 419 establish an Overlay Multilink Network Interface (OMNI) 420 [I-D.templin-6man-omni] that spans the multiple concatenated links 421 while presenting an assured (64KB-1) MTU to the LTP engine. 423 The OMNI interface internally uses IPv6 fragmentation as an OMNI 424 Adaptation Layer (OAL) service not visible to the LTP engine to allow 425 timely link-layer retransmissions of lost fragments where the 426 retransmission unit matches the loss unit.
The LTP engine can then 427 dynamically vary its SEGSIZE (up to a maximum value of (64KB-1) minus 428 headers) to determine the size that produces the best performance at 429 the current time by engaging the combined operational factors at all 430 layers of the multi-layer architecture. This dynamic factoring, 431 coupled with the ideal link properties provided by the OMNI interface, 432 supports an effective layering solution for many DTN networks. 434 When an LTP/UDP/IP packet is transmitted over an OMNI interface, the 435 OAL inserts an IPv6 header and performs IPv6 fragmentation to produce 436 fragments small enough to fit within the path MTU. The OAL then 437 replaces the IPv6 encapsulation headers with OMNI Compressed Headers 438 (OCHs), which are significantly smaller than their uncompressed IPv6 439 header counterparts and even smaller than the IPv4 headers would have 440 been had the packet been sent directly over a physical interface such 441 as Ethernet using IPv4 fragmentation. 443 The end result is that the first fragment produced by the OAL will 444 include a small amount of additional overhead to accommodate the OCH 445 encapsulation header, while all additional fragments will include only 446 an OCH header, which is significantly smaller than even an IPv4 447 header. The act of forwarding the large LTP/UDP/IP packet over the 448 OMNI interface will therefore produce a considerable overhead savings 449 in comparison with direct Ethernet transmission. 451 Using the OMNI interface with its OAL service in addition to the GSO/ 452 GRO mechanism, an LTP engine can therefore theoretically present 453 concatenated LTP segments in a "super-buffer" of up to (64 * ((64KB- 454 1) minus headers)) octets for transmission in a single sendmsg() 455 system call, and may present multiple such "super-buffers" in a 456 single system call when sendmmsg() is used. (Note however that 457 existing implementations limit the maximum-sized "super-buffer" to 458 only 64KB total.)
In the future, this service may realize even 459 greater benefits through the use of IP Jumbograms [RFC2675] over 460 paths that support them. 462 6.4. IP Parcels 464 The so-called "super-buffers" discussed in the previous sections can 465 be applied for GSO/GRO only when the LTP application endpoints are 466 co-resident with the OAL source and destination, respectively. 467 However, it may be desirable for the future architecture to support 468 network forwarding for these "super-buffers" in case the LTP source 469 and/or destination are located one or more IP networking hops away 470 from nodes that configure their respective source and destination 471 OMNI interfaces. Moreover, if the OMNI virtual link spans multiple 472 OMNI intermediate nodes on the path from the OAL source to the OAL 473 destination, it may be desirable to keep the "super-buffers" together 474 as much as possible as they traverse the intermediate hops. For this 475 reason, a new construct known as the "IP Parcel" has been specified 476 [I-D.templin-intarea-parcels]. 478 An IP parcel is a special form of an IP Jumbogram that includes a 479 non-zero value in the IP {Total, Payload} Length field. The value in 480 that field sets the segment size for the first segment included in 481 the parcel, while the value coded in the Jumbo Payload option 482 determines the number of segments included. Each segment "shares" 483 the same IP header, and the parcel can be broken down into sub- 484 parcels if necessary to traverse paths with length restrictions. A 485 full discussion of IP parcels is found in 486 [I-D.templin-intarea-parcels]. 488 7. Implementation Status 490 Supporting code for invoking the sendmmsg() facility is included in 491 the official ION source code distribution, beginning with release 492 ion-4.0.1. 494 Working code for GSO/GRO has been incorporated into a pre-release of 495 ION and scheduled for integration following the next major release. 497 8.
IANA Considerations 499 This document introduces no IANA considerations. 501 9. Security Considerations 503 Communications networking security is necessary to preserve 504 confidentiality, integrity and availability. 506 10. Acknowledgements 508 The NASA Space Communications and Networks (SCaN) directorate 509 coordinates DTN activities for the International Space Station (ISS) 510 and other space exploration initiatives. 512 Akash Agarwal, Madhuri Madhava Badgandi, Keith Philpott, Bill 513 Pohlchuck, Vijayasarathy Rajagopalan, Bhargava Raman Sai Prakash and 514 Eric Yeh are acknowledged for their significant contributions. Tyler 515 Doubrava was the first to mention the "sendmmsg()" facility. Scott 516 Burleigh provided review input, and David Zoller provided useful 517 perspective. 519 11. References 521 11.1. Normative References 523 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 524 DOI 10.17487/RFC0768, August 1980, 525 . 527 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 528 DOI 10.17487/RFC0791, September 1981, 529 . 531 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 532 Requirement Levels", BCP 14, RFC 2119, 533 DOI 10.17487/RFC2119, March 1997, 534 . 536 [RFC5326] Ramadas, M., Burleigh, S., and S. Farrell, "Licklider 537 Transmission Protocol - Specification", RFC 5326, 538 DOI 10.17487/RFC5326, September 2008, 539 . 541 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 542 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 543 May 2017, . 545 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 546 (IPv6) Specification", STD 86, RFC 8200, 547 DOI 10.17487/RFC8200, July 2017, 548 . 550 11.2. Informative References 552 [FRAG] Mogul, J. and C. Kent, "Fragmentation Considered Harmful, 553 ACM Sigcomm 1987", August 1987. 555 [I-D.ietf-dtn-bpbis] 556 Burleigh, S., Fall, K., and E. J. 
Birrane, "Bundle 557 Protocol Version 7", Work in Progress, Internet-Draft, 558 draft-ietf-dtn-bpbis-31, 25 January 2021, 559 . 562 [I-D.templin-6man-omni] 563 Templin, F. L. and T. Whyman, "Transmission of IP Packets 564 over Overlay Multilink Network (OMNI) Interfaces", Work in 565 Progress, Internet-Draft, draft-templin-6man-omni-52, 31 566 December 2021, . 569 [I-D.templin-intarea-parcels] 570 Templin, F. L., "IP Parcels", Work in Progress, Internet- 571 Draft, draft-templin-intarea-parcels-06, 22 December 2021, 572 . 575 [MPPS] Majkowski, M., "How to Receive a Million Packets Per 576 Second, https://blog.cloudflare.com/how-to-receive-a- 577 million-packets/", June 2015. 579 [QUIC] Ghedini, A., "Accelerating UDP Packet Transmission for 580 QUIC, https://calendar.perfplanet.com/2019/accelerating- 581 udp-packet-transmission-for-quic/", December 2019. 583 [RFC2675] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", 584 RFC 2675, DOI 10.17487/RFC2675, August 1999, 585 . 587 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 588 Errors at High Data Rates", RFC 4963, 589 DOI 10.17487/RFC4963, July 2007, 590 . 592 [RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field", 593 RFC 6864, DOI 10.17487/RFC6864, February 2013, 594 . 596 [RFC8899] Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T. 597 Völker, "Packetization Layer Path MTU Discovery for 598 Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, 599 September 2020, . 601 [RFC8900] Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., 602 and F. Gont, "IP Fragmentation Considered Fragile", 603 BCP 230, RFC 8900, DOI 10.17487/RFC8900, September 2020, 604 . 606 Appendix A. IPv4/IPv6 Protocol Considerations 608 LTP/UDP/IP peers can communicate either via IPv4 or IPv6 addressing 609 when both peers configure a unique address of the same protocol 610 version on the OMNI interface. 
The IPv4 Total Length field includes
   the length of both the UDP header and the base IPv4 header, while the
   IPv6 Payload Length field includes the length of the UDP header but
   not the base IPv6 header.

   Therefore, unless extension headers are included, each maximum-sized
   LTP/UDP/IPv6 packet can contain 20 octets more actual LTP data than a
   maximum-sized LTP/UDP/IPv4 packet can contain, at the price of only
   20 additional header octets for IPv6.  The overhead of carrying these
   additional 20 header octets in maximum-sized packets is therefore
   insignificant, and becomes smaller still when IPv6 header compression
   is used.

Appendix B.  The Intergalactic Jigsaw Puzzle Builders Club

   The process we are optimizing is like an imaginary Intergalactic
   Jigsaw Puzzle Builders Club.  A first builder starts with an original
   image, admires it momentarily, then breaks it up into like-sized
   puzzle pieces with unique serial numbers.  The first builder then
   delivers each piece to their local post office, which has an
   Intergalactic Puzzle Piece Transporter.

   The transporter can instantly deliver each puzzle piece to a remote
   post office, which could be nearby or in a far-off galaxy.  The
   remote post office then delivers each piece to the next builder, who
   very quickly puts it in the correct place based on the serial number.
   This builder eventually reconstructs the entire original image,
   admires it, and forwards it on to the next builder in the same
   fashion.

   All original images are the same dimensions, but each consecutive
   builder can choose to break them into fewer and larger pieces or more
   and smaller pieces - for example 100, 250, 500, 1000 or even more
   pieces.  The local post office transporter can send smaller pieces
   intact, but must cut larger pieces into fragments that the remote
   post office will paste back together.
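   As a cross-check on the length-field arithmetic in Appendix A, the
   short sketch below (illustrative only; not part of any protocol
   specification) computes the maximum LTP payload per packet for IPv4
   and IPv6, using the base header sizes from RFC 791 and RFC 8200:

```python
# Maximum LTP payload per maximum-sized UDP/IP packet.
IPV4_HEADER = 20       # base IPv4 header, counted in Total Length
IPV6_HEADER = 40       # base IPv6 header, NOT counted in Payload Length
UDP_HEADER = 8         # counted by both length fields
MAX_LEN_FIELD = 65535  # both length fields are 16 bits

# IPv4 Total Length covers the IPv4 header, UDP header and LTP data.
max_ltp_ipv4 = MAX_LEN_FIELD - IPV4_HEADER - UDP_HEADER   # 65507
# IPv6 Payload Length covers only the UDP header and LTP data.
max_ltp_ipv6 = MAX_LEN_FIELD - UDP_HEADER                 # 65527

print(max_ltp_ipv4, max_ltp_ipv6, max_ltp_ipv6 - max_ltp_ipv4)
# -> 65507 65527 20
```

   That is, IPv6 carries 20 more LTP octets per maximum-sized packet at
   the cost of 20 more header octets on the wire (40 versus 20).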
This process is both fast and
   invisible to the builders, who only see whole puzzle pieces and never
   fragments.

   For ION-DTN LTP performance, we believed that performance could
   increase if builders could exchange MULTIPLE puzzle pieces in
   packages called PARCELs with their local post offices instead of just
   one piece at a time, and we have shown that this is true to a limited
   extent for small- to medium-sized pieces.  But we see that overall
   system performance is dominated by the time needed for the receiver
   to install a SINGLE puzzle piece, and we see that builders can
   reassemble puzzles with fewer and larger pieces MUCH faster than ones
   with more and smaller pieces.

   So, why not just use larger puzzle pieces all the time?  The problem
   is that the transporter is imperfect and can lose, damage and/or
   reorder pieces.  And if even a single bit is lost or damaged, the
   sender must retransmit the entire large piece all over again.  This
   is not only expensive (since the post office charges for transporter
   use by weight), but the whole service degrades because the loss unit
   is smaller than the retransmission unit, resulting in a cascading
   flood of redundant information.

   The system is a multi-variable optimization problem, and there are
   many knobs to turn.  Tuning characteristics can also vary over time
   due to fluctuations in transporter performance.  We also believe that
   if the transporter can be made to quickly retransmit lost fragments,
   it can often salvage partial puzzle pieces that would otherwise have
   been discarded.  This would allow builders to use larger pieces to
   increase performance.

   Studies of the QUIC protocol have shown that PARCELs can result in
   major performance increases by making the builder-to-post-office
   interface exchanges more efficient.  For LTP, we have seen limited
   increases (less than factor-2) using smaller segment sizes.
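   The loss-amplification argument above (loss unit smaller than
   retransmission unit) can be illustrated with a toy model.  The MTU
   and per-fragment loss rate below are assumed values for illustration,
   not measurements, and the model ignores selective retransmission:

```python
import math

def expected_sends(seg_size, mtu=1500, frag_loss=0.01):
    """Expected number of times a whole segment must be sent before
    every one of its IP fragments arrives in the same attempt (losing
    any one fragment forces retransmission of the whole segment)."""
    frags = math.ceil(seg_size / mtu)
    p_all_arrive = (1 - frag_loss) ** frags
    return 1 / p_all_arrive

for seg in (1500, 15000, 64000):
    print(seg, round(expected_sends(seg), 2))
# -> 1500 1.01 / 15000 1.11 / 64000 1.54
```

   Under these assumed numbers, a 64000-octet segment costs roughly 50%
   more transmissions per delivery than an unfragmented one, which is
   the "cascading flood of redundant information" in the analogy.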
While
   any increase is good, we believe that either increasing the single
   puzzle piece placement speed or supporting placement of multiple
   pieces simultaneously will be the gating factor for increased
   performance.  This may shift the performance bottleneck back to the
   builder-to-post-office interface, and PARCELs may help achieve
   greater increases even for larger puzzle pieces.

Author's Address

   Fred L. Templin (editor)
   Boeing Research & Technology
   P.O. Box 3707
   Seattle, WA 98124
   United States of America

   Email: fltemplin@acm.org