idnits 2.17.1 

draft-templin-dtn-ltpfrag-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (March 8, 2021) is 1135 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                    F. Templin, Ed.
3	Internet-Draft                              Boeing Research & Technology
4	Intended status: Informational                             March 8, 2021
5	Expires: September 9, 2021

7	                           LTP Fragmentation
8	                      draft-templin-dtn-ltpfrag-04

10	Abstract

12	   The Licklider Transmission Protocol (LTP) provides a reliable
13	   datagram convergence layer for the Delay/Disruption Tolerant
14	   Networking (DTN) Bundle Protocol.  In common practice, LTP is often
15	   configured over UDP/IP sockets and inherits its maximum segment size
16	   from the maximum-sized UDP datagram.  However, when this size exceeds
17	   the maximum IP packet size for the path, a service known as IP
18	   fragmentation must be employed.  This document discusses LTP
19	   interactions with IP fragmentation and mitigations for managing the
20	   amount of IP fragmentation employed.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at https://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on September 9, 2021.

39	Copyright Notice

41	   Copyright (c) 2021 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (https://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	   2.  Terminology
58	   3.  IP Fragmentation Issues
59	   4.  LTP Fragmentation
60	   5.  Beyond "sendmmsg()"
61	   6.  Implementation Status
62	   7.  IANA Considerations
63	   8.  Security Considerations
64	   9.  Acknowledgements
65	   10. References
66	     10.1.  Normative References
67	     10.2.  Informative References
68	   Author's Address

70	1.  Introduction

72	   The Licklider Transmission Protocol (LTP) [RFC5326] provides a
73	   reliable datagram convergence layer for the Delay/Disruption Tolerant
74	   Networking (DTN) Bundle Protocol (BP) [I-D.ietf-dtn-bpbis].  In
75	   common practice, LTP is often configured over the User Datagram
76	   Protocol (UDP) [RFC0768] and Internet Protocol (IP) [RFC0791] using
77	   the "socket" abstraction.  LTP inherits its maximum segment size from
78	   the maximum-sized UDP datagram (i.e., 2^16 - 1 bytes minus headers).
79	   However, when the datagram size exceeds the maximum IP packet size
80	   for the path, a service known as IP fragmentation must be employed.

82	   LTP breaks BP bundles into "blocks", then further breaks these blocks
83	   into "segments".  The segment size is a configurable option and
84	   represents the largest atomic block of data that LTP will require
85	   underlying layers to deliver as a single unit.  The segment size is
86	   therefore also known as the "retransmission unit", since each lost
87	   segment must be retransmitted in its entirety.  Experimental and
88	   operational evidence has shown that on robust networks increasing the
89	   LTP segment size (up to the maximum UDP datagram size of slightly
90	   less than 64KB) can result in substantial performance increases over
91	   smaller segment sizes.  However, the performance increases must be
92	   tempered with the amount of IP fragmentation invoked as discussed
93	   below.

95	   When LTP presents a segment to the operating system kernel (e.g., via
96	   a sendmsg() system call), the UDP layer prepends a UDP header to
97	   create a UDP datagram.  The UDP layer then presents the resulting
98	   datagram to the IP layer for packet framing and transmission over a
99	   networked path.  The path is further characterized by the path
100	   Maximum Transmission Unit (Path-MTU) which is a measure of the
101	   smallest link MTU (Link-MTU) among all links in the path.

103	   When LTP presents a segment to the kernel that is larger than the
104	   Path-MTU, the resulting UDP datagram is presented to the IP layer,
105	   which in turn performs IP fragmentation to break the datagram into
106	   fragments that are no larger than the Path-MTU.  For example, if the
107	   LTP segment size is 64KB and the Path-MTU is 1280 bytes, IP
108	   fragmentation results in 50+ fragments that are transmitted as
109	   individual IP packets.  (Note that for IPv4 [RFC0791], fragmentation
110	   may occur either in the source host or in a router in the network
111	   path, while for IPv6 [RFC8200] only the source host may perform
112	   fragmentation.)
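
   As a rough check of the "50+ fragments" figure above, the following
   Python sketch estimates the fragment count for a maximum-sized UDP
   datagram.  The function name and parameters are illustrative and do
   not come from any LTP implementation.  The sketch assumes a fixed
   40-byte IPv6 header plus an 8-byte Fragment header (or a 20-byte IPv4
   header without options) and no other extension headers.

```python
import math


def fragment_count(datagram_len, path_mtu, ip_header_len=40, frag_header_len=8):
    """Estimate how many IP packets carry one UDP datagram.

    datagram_len is the UDP header plus payload, in bytes.  The defaults
    model IPv6 (40-byte header, 8-byte Fragment header); for IPv4 without
    options pass ip_header_len=20 and frag_header_len=0.
    """
    if datagram_len + ip_header_len <= path_mtu:
        return 1  # fits in a single packet, no fragmentation needed
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (path_mtu - ip_header_len - frag_header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("path_mtu too small to carry any fragment data")
    return math.ceil(datagram_len / per_fragment)


# Largest possible UDP datagrams over a 1280-byte Path-MTU.
ipv6_fragments = fragment_count(65535, 1280)
ipv4_fragments = fragment_count(65515, 1280, ip_header_len=20, frag_header_len=0)
```

   Under these assumptions both calls yield slightly more than 50
   fragments, which is consistent with the example in the text.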

114	   Each IP fragment is subject to the same best-effort delivery service
115	   offered by the network according to current congestion and/or link
116	   signal quality conditions; therefore, the IP fragment size becomes
117	   known as the "loss unit".  Especially when the packet loss rate is
118	   non-negligible, however, performance can suffer dramatically when the
119	   loss unit is significantly smaller than the retransmission unit.  In
120	   particular, if even a single IP fragment of a fragmented LTP segment
121	   is lost then the entire LTP segment is deemed lost and must be
122	   retransmitted.
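
   The effect of a loss unit smaller than the retransmission unit can be
   illustrated with a short Python sketch.  It assumes that fragment
   losses are independent and equally likely, which is a simplification
   (real losses are often bursty), and the function name is
   hypothetical.

```python
def segment_loss_probability(fragment_loss_rate, fragments_per_segment):
    """Probability that at least one fragment of a segment is lost.

    Assumes independent, identically distributed fragment losses.
    """
    return 1.0 - (1.0 - fragment_loss_rate) ** fragments_per_segment


# A segment sent as one packet versus one split into 54 fragments,
# both with a 1 percent per-packet loss rate.
unfragmented = segment_loss_probability(0.01, 1)
fragmented = segment_loss_probability(0.01, 54)
```

   With a 1 percent fragment loss rate and 54 fragments per segment (a
   count in line with the 50+ fragments of the earlier example), this
   model gives a segment loss probability of about 42 percent, versus 1
   percent for a segment that fits in a single packet.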

124	   This document discusses LTP interactions with IP fragmentation and
125	   mitigations for managing the amount of IP fragmentation employed.  It
126	   further discusses methods for increasing LTP performance both with
127	   and without the aid of IP fragmentation.

129	2.  Terminology

131	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
132	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
133	   "OPTIONAL" in this document are to be interpreted as described in BCP
134	   14 [RFC2119][RFC8174] when, and only when, they appear in all
135	   capitals, as shown here.

137	3.  IP Fragmentation Issues

139	   IP fragmentation is a fundamental service of the Internet Protocol,
140	   yet it has long been understood that its use can be problematic in
141	   some environments.  Beginning as early as 1987, "Fragmentation
142	   Considered Harmful" [FRAG] outlined multiple issues with the service
143	   including a performance-crippling condition that can occur at high
144	   data rates when the loss unit is considerably smaller than the
145	   retransmission unit during intermittent and/or steady-state loss
146	   conditions.

148	   Later investigations also identified the possibility for undetected
149	   data corruption at high data rates due to a condition known as "ID
150	   wraparound" when the 16-bit IP identification field (aka the "IP ID")
151	   increments such that new fragments overlap with existing fragments
152	   still alive in the network and with identical ID values
153	   [RFC4963][RFC6864].  Although this issue occurs only in the IPv4
154	   protocol (and not in IPv6, where the IP ID is 32 bits in length), the
155	   IPv4 concerns along with the fact that IPv6 does not permit routers
156	   to perform "network fragmentation" have led many to discourage the
157	   use of IP fragmentation.
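
   To give a sense of how quickly ID wraparound can occur, the Python
   sketch below computes the time needed to consume every value of the
   16-bit IPv4 ID field.  It assumes that each datagram consumes exactly
   one ID from a single counter and that the sender transmits
   continuously at the given rate; the function name is illustrative.

```python
def ipv4_id_wrap_seconds(rate_bps, datagram_bytes, id_bits=16):
    """Seconds until the IP ID counter wraps at a constant sending rate.

    Assumes one ID per datagram drawn from a single shared counter.
    """
    datagrams_per_second = rate_bps / (8 * datagram_bytes)
    return (2 ** id_bits) / datagrams_per_second


# 1 Gbps of 1500-byte datagrams.
wrap_at_1gbps = ipv4_id_wrap_seconds(1e9, 1500)
```

   Under these assumptions a sender at 1 Gbps with 1500-byte datagrams
   cycles through the whole ID space in less than one second, which is
   well within typical reassembly timeouts.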

159	   Even in the modern era, investigators have seen fit to declare "IP
160	   Fragmentation Considered Fragile" in an Internet Engineering Task
161	   Force (IETF) Best Current Practice (BCP) reference [RFC8900].
162	   Indeed, the BCP recommendations cite the Bundle Protocol LTP
163	   convergence layer as a user of IP fragmentation that depends on some
164	   of its properties to realize greater performance.  However, the BCP
165	   summarizes by saying:

167	      "Rather than deprecating IP fragmentation, this document
168	      recommends that upper-layer protocols address the problem of
169	      fragmentation at their layer, reducing their reliance on IP
170	      fragmentation to the greatest degree possible."

172	   While the performance implications are considerable and have serious
173	   consequences for real-world applications, our goal in this document
174	   is neither to condemn nor embrace IP fragmentation as it pertains to
175	   the Bundle Protocol LTP convergence layer operating over UDP/IP
176	   sockets.  Instead, we examine ways in which the benefits of IP
177	   fragmentation can be realized while avoiding the pitfalls.  We
178	   therefore next discuss our systematic approach to LTP fragmentation.

180	4.  LTP Fragmentation

182	   In common LTP implementations over UDP/IP (e.g., the Interplanetary
183	   Overlay Network (ION)), performance is greatly dependent on the LTP
184	   segment size.  This is due to the fact that a larger segment
185	   presented to UDP/IP as a single unit incurs only a single system call
186	   and a single data copy from application to kernel space via the
187	   sendmsg() system call.  Once inside the kernel, the segment incurs
188	   UDP/IP encapsulation and IP fragmentation which again results in a
189	   loss unit smaller than the retransmission unit.  However, during
190	   fragmentation, each fragment is transmitted immediately following the
191	   previous without delay so that the fragments appear as a "burst" of
192	   consecutive packets over the network path resulting in high network
193	   utilization during the burst period.  Additionally, the use of IP
194	   fragmentation with a larger segment size conserves header framing
195	   bytes since the LTP layer headers only appear in the first IP
196	   fragment as opposed to appearing in all IP packets.

198	   In order to avoid retransmission congestion (i.e., especially when
199	   the loss probability is non-negligible), the natural choice would be
200	   to set the LTP segment size to a size that is no larger than the
201	   Path-MTU.  Assuming the minimum IPv4 MTU of 576 bytes, however,
202	   transmission of 64KB of data using a 576B segment size would require
203	   well over 100 independent sendmsg() system calls and data copies as
204	   opposed to just one when the largest segment size is used.  This
205	   greatly reduces the bandwidth advantage offered by IP fragmentation
206	   bursts.  Therefore, a means for providing the best aspects of both
207	   large segment fragment bursting and small segment retransmission
208	   efficiency is needed.

210	   Common operating systems such as Linux provide the sendmmsg() ("send
211	   multiple messages") system call that allows the LTP application to
212	   present the kernel with a vector of up to 1024 segments instead of
213	   just a single segment.  This affords the bursting behavior of IP
214	   fragmentation coupled with the retransmission efficiency of employing
215	   small segment sizes.  (Note that LTP receivers can also use the
216	   recvmmsg() ("receive multiple messages") system call to receive a
217	   vector of segments from the kernel in case multiple recent packet
218	   arrivals can be combined.)
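
   The system-call arithmetic discussed above can be sketched in Python
   as follows.  The helper name is hypothetical, and the sketch ignores
   LTP, UDP and IP header overhead by treating the segment size as pure
   payload.

```python
import math


def syscall_counts(data_len, segment_size, vector_limit=1024):
    """Return (sendmsg calls, sendmmsg calls) needed to send data_len bytes.

    One sendmsg() call carries one segment; one sendmmsg() call carries
    up to vector_limit segments.
    """
    segments = math.ceil(data_len / segment_size)
    return segments, math.ceil(segments / vector_limit)


# 64KB of data carved into 576-byte segments.
sendmsg_calls, sendmmsg_calls = syscall_counts(64 * 1024, 576)
```

   For 64KB of data and 576-byte segments this gives more than 100
   sendmsg() calls but only a single sendmmsg() call.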

220	   This work therefore recommends that implementations of LTP employ a
221	   large block size, a conservative segment size and a new configuration
222	   option known as the "Burst-Limit" which determines the number of
223	   segments that can be presented in a single sendmmsg() system call.
224	   When the implementation receives an LTP block, it carves Burst-Limit-
225	   many segments from the block and presents the vector of segments to
226	   sendmmsg().  The kernel will prepare each segment as an independent
227	   UDP/IP packet and transmit them into the network as a burst in a
228	   fashion that parallels IP fragmentation.  The loss unit and
229	   retransmission unit will be the same; therefore, loss of a single
230	   segment does not result in a retransmission congestion event.
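
   A minimal Python sketch of the carving step is shown below.  It is
   not the ION implementation; the function and parameter names are
   assumptions, and LTP segment headers are omitted so that each
   "segment" is simply a slice of the block.  Each inner list stands for
   the vector that would be handed to one sendmmsg() call.

```python
def carve_bursts(block, segment_size, burst_limit):
    """Split an LTP block into bursts of at most burst_limit segments.

    Returns a list of bursts; each burst is a list of byte strings no
    longer than segment_size.
    """
    if segment_size <= 0 or burst_limit <= 0:
        raise ValueError("segment_size and burst_limit must be positive")
    segments = [block[i:i + segment_size]
                for i in range(0, len(block), segment_size)]
    return [segments[i:i + burst_limit]
            for i in range(0, len(segments), burst_limit)]


# A 10000-byte block, 1000-byte segments, Burst-Limit of 4.
example_block = bytes(10000)
bursts = carve_bursts(example_block, 1000, 4)
burst_sizes = [len(burst) for burst in bursts]
```

   Because every segment is sent as its own UDP/IP packet, a lost packet
   in this model costs the retransmission of one segment only.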

232	   It should be noted that the Burst-Limit is bounded only by the LTP
233	   block size and not by the maximum UDP datagram size.  Therefore, each
234	   burst can in practice convey significantly more data than a single IP
235	   fragmentation event.  It should also be noted that the segment size
236	   can still be made larger than the Path-MTU in low-loss environments
237	   without danger of triggering retransmission storms due to loss of IP
238	   fragments.  This would result in combined UDP message and IP fragment
239	   bursting for increased network utilization in more robust
240	   environments.  Finally, both the Burst-Limit and UDP message sizes
241	   need not be static values, and can be tuned to adaptively increase or
242	   decrease according to time varying network conditions.

244	5.  Beyond "sendmmsg()"

246	   Implementation experience with the ION DTN distribution and two
247	   recent studies have demonstrated performance increases from
248	   employing sendmmsg() for transmission over UDP/IP sockets.  A first
249	   study used sendmmsg() as part of an integrated solution to produce 1M
250	   packets per second assuming only raw data transmission conditions
251	   [MPPS], while a second study focused on performance improvements for
252	   the QUIC reliable transport service [QUIC].  In both studies, the use
253	   of sendmmsg() alone produced observable increases but complementary
254	   enhancements were identified that (when combined with sendmmsg())
255	   produced considerable additional increases.

257	   In [MPPS], additional enhancements such as using recvmmsg() and
258	   configuring multiple receive queues at the receiver were introduced
259	   in an attempt to achieve greater parallelism and engage multiple
260	   processors and threads.  However, the system was still limited to a
261	   single thread until multiple receiving processes were introduced
262	   using the "SO_REUSEPORT" socket option.  By having multiple receiving
263	   processes (each with its own socket buffer), the performance
264	   advantages of parallel processing were employed to achieve the 1M
265	   packets per second goal.
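
   The "SO_REUSEPORT" arrangement can be sketched in Python as follows.
   For brevity the sketch opens all sockets in one process, whereas
   [MPPS] used separate receiving processes; the function name, the
   loopback address and the use of an ephemeral port are assumptions
   made for illustration.  The option is platform dependent and the
   sketch assumes a Linux host.

```python
import socket


def open_reuseport_receivers(count, host="127.0.0.1", port=0):
    """Open `count` UDP sockets that all share one local port."""
    sockets = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # SO_REUSEPORT must be set on every socket before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        # After the first bind, reuse whichever port the kernel chose.
        port = sock.getsockname()[1]
        sockets.append(sock)
    return sockets


receivers = open_reuseport_receivers(2)
shared_ports = {sock.getsockname()[1] for sock in receivers}
for sock in receivers:
    sock.close()
```

   Each socket has its own receive buffer, and the kernel distributes
   incoming datagrams among the sockets bound to the shared port.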

267	   In [QUIC], a new feature available in recent Linux kernel versions
268	   was employed.  The feature, known as "Generic Segmentation Offload
269	   (GSO) / Generic Receive Offload (GRO)" allows an application to
270	   provide the kernel with a "super-buffer" containing up to 64 separate
271	   QUIC/UDP segments.  When the application presents the super-buffer to
272	   the kernel, GSO then sends up to 64 separate UDP/IP packets in
273	   a burst.  If each packet is larger than the Path-MTU, then IP
274	   fragmentation will be invoked for each packet leading to high network
275	   utilization (at the risk of IP fragment loss and retransmission
276	   storms).  The GSO facility can be invoked by either sendmsg() (i.e.,
277	   a single super-buffer) or sendmmsg() (i.e., multiple super-buffers),
278	   and the study showed a substantial performance increase over using
279	   sendmsg() or sendmmsg() alone.
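
   The layout of a GSO "super-buffer" can be sketched in Python as shown
   below.  The sketch only groups data into buffers; it does not call
   the kernel.  The 64-segment cap is taken from the text above, while
   the 65507-byte cap (the largest UDP payload over IPv4) is an
   assumption about the total buffer size, and actual kernel limits may
   differ.  The function name is hypothetical.

```python
def build_super_buffers(data, segment_size, max_segments=64,
                        max_buffer_bytes=65507):
    """Group data into super-buffers of equal-sized segments.

    Each returned buffer holds a whole number of segment_size segments,
    except that the final segment of the final buffer may be shorter.
    max_buffer_bytes is an assumed cap on the total buffer size.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    per_buffer = min(max_segments, max_buffer_bytes // segment_size)
    if per_buffer < 1:
        raise ValueError("segment_size exceeds max_buffer_bytes")
    chunk = per_buffer * segment_size
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


# 100 segments of 1000 bytes each.
buffers = build_super_buffers(bytes(100_000), 1000)
buffer_lengths = [len(buf) for buf in buffers]
```

   On Linux, each such buffer would be passed to sendmsg() or sendmmsg()
   together with the segment size (for example via the UDP_SEGMENT
   socket option), leaving the kernel to split it into individual UDP/IP
   packets.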

281	   For LTP fragmentation, our ongoing efforts explore using these
282	   techniques in a manner that parallels the effort undertaken for QUIC.
283	   Using these higher-layer segmentation management facilities is
284	   consistent with the guidance in "IP Fragmentation Considered Fragile"
285	   that states:

287	      "Rather than deprecating IP fragmentation, this document
288	      recommends that upper-layer protocols address the problem of
289	      fragmentation at their layer, reducing their reliance on IP
290	      fragmentation to the greatest degree possible."

292	   By addressing fragmentation at their layer, the LTP/UDP functions can
293	   then be tuned to minimize IP fragmentation in environments where it
294	   may be problematic or to adaptively engage IP fragmentation in
295	   environments where performance gains can be realized without risking
296	   data corruption.

298	6.  Implementation Status

300	   Supporting code for invoking the sendmmsg() facility is included in
301	   the official ION source code distribution, beginning with release
302	   ion-4.0.1.

304	7.  IANA Considerations

306	   This document introduces no IANA considerations.

308	8.  Security Considerations

310	   Communications networking security is necessary to preserve
311	   confidentiality, integrity and availability.

313	9.  Acknowledgements

315	   The NASA Space Communications and Networks (SCaN) directorate
316	   coordinates DTN activities for the International Space Station (ISS)
317	   and other space exploration initiatives.

319	   Madhuri Madhava Badgandi, Keith Philpott, Bill Pohlchuck,
320	   Vijayasarathy Rajagopalan and Eric Yeh are acknowledged for their
321	   significant contributions.  Tyler Doubrava was the first to mention
322	   the "sendmmsg()" facility.  Scott Burleigh provided review input, and
323	   David Zoller provided useful perspective.

325	10.  References

327	10.1.  Normative References

329	   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
330	              DOI 10.17487/RFC0768, August 1980,
331	              <https://www.rfc-editor.org/info/rfc768>.

333	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
334	              DOI 10.17487/RFC0791, September 1981,
335	              <https://www.rfc-editor.org/info/rfc791>.

337	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
338	              Requirement Levels", BCP 14, RFC 2119,
339	              DOI 10.17487/RFC2119, March 1997,
340	              <https://www.rfc-editor.org/info/rfc2119>.

342	   [RFC5326]  Ramadas, M., Burleigh, S., and S. Farrell, "Licklider
343	              Transmission Protocol - Specification", RFC 5326,
344	              DOI 10.17487/RFC5326, September 2008,
345	              <https://www.rfc-editor.org/info/rfc5326>.

347	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
348	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
349	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

351	   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
352	              (IPv6) Specification", STD 86, RFC 8200,
353	              DOI 10.17487/RFC8200, July 2017,
354	              <https://www.rfc-editor.org/info/rfc8200>.

356	10.2.  Informative References

358	   [FRAG]     Mogul, J. and C. Kent, "Fragmentation Considered Harmful",
359	              ACM SIGCOMM 1987, August 1987.

361	   [I-D.ietf-dtn-bpbis]
362	              Burleigh, S., Fall, K., and E. Birrane, "Bundle Protocol
363	              Version 7", draft-ietf-dtn-bpbis-31 (work in progress),
364	              January 2021.

366	   [MPPS]     Majkowski, M., "How to Receive a Million Packets Per
367	              Second", <https://blog.cloudflare.com/how-to-receive-a-
368	              million-packets/>, June 2015.

370	   [QUIC]     Ghedini, A., "Accelerating UDP Packet Transmission for
371	              QUIC", <https://calendar.perfplanet.com/2019/accelerating-
372	              udp-packet-transmission-for-quic/>, December 2019.

374	   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
375	              Errors at High Data Rates", RFC 4963,
376	              DOI 10.17487/RFC4963, July 2007,
377	              <https://www.rfc-editor.org/info/rfc4963>.

379	   [RFC6864]  Touch, J., "Updated Specification of the IPv4 ID Field",
380	              RFC 6864, DOI 10.17487/RFC6864, February 2013,
381	              <https://www.rfc-editor.org/info/rfc6864>.

383	   [RFC8900]  Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O.,
384	              and F. Gont, "IP Fragmentation Considered Fragile",
385	              BCP 230, RFC 8900, DOI 10.17487/RFC8900, September 2020,
386	              <https://www.rfc-editor.org/info/rfc8900>.

388	Author's Address

390	   Fred L. Templin (editor)
391	   Boeing Research & Technology
392	   P.O. Box 3707
393	   Seattle, WA  98124
394	   USA

396	   Email: fltemplin@acm.org