Transport Working Group                                        J. Morton
Internet-Draft
Intended status: Informational                                  P. Heist
Expires: 18 November 2021                                    17 May 2021

                     Interflow vs Intraflow Delays
            draft-morton-tsvwg-interflow-intraflow-delays-00

Abstract

   Much current literature discusses queuing delays and the effects of
   different queue disciplines, active queue management algorithms,
   and congestion control measures on these delays.  This draft
   highlights an important distinction between different types of
   delay, which may be helpful to practitioners and theoreticians
   alike.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on 18 November 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this
   document.  Please review these documents carefully, as they
   describe your rights and restrictions with respect to this
   document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Baseline Path Delay (BPD) and Baseline Round-Trip Time (BRTT)
   3.  Between-Flow Induced Delay (BFID)
   4.  Within-Flow Induced Delay (WFID)
   5.  Latency Sensitivity of Traffic
   6.  Security Considerations
   7.  IANA Considerations
   8.  Informative References
   Authors' Addresses

1.  Introduction

   Throughput, packet loss ratio, and latency are the three most
   prominent performance characteristics of Internet paths.  Of these,
   throughput has always been the most heavily marketed to consumers,
   possibly because it is the only metric of the three in which bigger
   numbers are better.  Packet loss is also closely managed by network
   engineers, and is mostly kept to usefully low levels in practice,
   probably because excessive packet loss tends to cripple the
   throughput of typical congestion-controlled traffic.  Latency,
   however, despite its great practical importance to many Internet
   applications, is rarely given the attention it needs for proper
   management.

   One consequence of this neglect is the phenomenon of bufferbloat.
   Any given Internet path has a natural baseline delay, which is a
   consequence of the speed of information propagation in the physical
   media, plus processing delays in network nodes that connect link
   segments together, plus (for some link types) additional delays
   associated with shared media negotiation.  To this baseline, we
   must add the delay caused by packets waiting in a queue behind
   other packets, which occurs whenever the link is busy.  If the
   queue is permitted to grow too much, these additional queuing
   delays can become very noticeable to the user, and may even affect
   the reliability of Internet protocols.

   This document does not discuss in detail the many and varied means
   of controlling latency that are currently available or might
   someday become available.  Instead, the characteristics of this
   delay are discussed, including the distinction between "inter-flow
   induced delay" and "intra-flow induced delay".  Despite their
   similar names, these two types of delay typically have different
   effects and may be controlled by different queue mechanisms.
   Simple queues, however, do not attempt to distinguish them.

   To make the two names easier to tell apart, the terms BFID
   (Between-Flow Induced Delay) and WFID (Within-Flow Induced Delay)
   will be used as synonyms for inter-flow and intra-flow induced
   delays, respectively.

2.  Baseline Path Delay (BPD) and Baseline Round-Trip Time (BRTT)

   *Definition:* The delay on a one-way path or round trip due
   entirely to link characteristics and unavoidable processing delays.

   For the avoidance of doubt, the word "unavoidable" in this
   definition refers to the agency of the traffic traversing the path
   in question, and not to that of the network operators or equipment
   manufacturers involved.

   The speed of light is a fundamental limitation on information
   transmission velocity, and thus on the minimum latency of a
   geographically long Internet path.  On radio-based links, this
   limit is approached closely; in optical fibre or copper wires, the
   transmission velocity is somewhat slower.  When avian carriers
   [RFC1149] are involved, the transmission velocity necessarily falls
   below the speed of sound.  In practice, an allowance of one
   millisecond of round-trip delay per 100 km of path is usually
   appropriate.
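   As a worked example of this rule of thumb, the following Python
   sketch estimates the propagation component of the BRTT from the
   cable distance of a path.  The refractive index used is a typical
   figure for glass fibre, assumed here for illustration rather than
   measured on any particular link.

      # Estimate the propagation component of baseline round-trip
      # delay.  Assumes an optical path with refractive index ~1.5,
      # giving a transmission velocity of roughly 200,000 km/s.

      C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
      FIBRE_INDEX = 1.5             # typical for glass fibre (assumed)

      def propagation_rtt_ms(path_km: float) -> float:
          """Round-trip propagation delay over an optical path, in ms."""
          velocity_km_s = C_VACUUM_KM_S / FIBRE_INDEX
          return 2 * path_km / velocity_km_s * 1000

      # 100 km of fibre yields ~1 ms of RTT, matching the allowance
      # of one millisecond per 100 km given above.
      print(propagation_rtt_ms(100))    # ~1.0
      # A 4,000 km continental path: ~40 ms of RTT before any
      # processing or queuing delay is added.
      print(propagation_rtt_ms(4000))   # ~40.0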
   When a packet is received by a network node, it must be held in a
   processing buffer at least long enough to determine in which
   direction it should be sent next.  Since the necessary information
   is typically in the packet header, this may sometimes be less time
   than is needed to receive the entire packet, in which case the head
   of the packet may be sent onward while the tail is still being
   received.  In other cases, the node may receive the packet in whole
   before making a processing decision, and may even aggregate the
   packet with others for efficiency of dispatch.  That efficiency in
   throughput or power consumption is achieved at the expense of
   processing delay.

   Some link types have significant overhead associated with
   initiating a transmission, and/or utilise a shared medium into
   which only one or a small number of stations (out of a larger
   possible total) may transmit simultaneously.  Similar
   characteristics may also be exhibited by power-saving measures on
   portable devices.  These may result in significant and/or variable
   delays in forwarding over such links, which cannot be avoided by
   altering characteristics of the traffic itself.

   In practice, an Internet packet can be sent around the world in
   about 300 milliseconds with current technology.  The round-trip
   latency between Eastern Europe and Western North America is
   presently about 160 milliseconds.  A "typical" Internet round-trip
   delay can be taken to be 80 milliseconds, though more localised
   paths are significantly quicker.  Within a LAN or a datacentre, the
   baseline delay will often be less than one millisecond.

   Whenever two or more packets require sending over the same link
   within the time required to send any one of them, link contention
   exists and must be resolved.  This generally involves either
   placing packets into a queue or discarding them.  These practices
   are not within the definition of "baseline" delays, but they do
   influence the "induced" delays described below.

3.  Between-Flow Induced Delay (BFID)

   *Definition:* The delay which the presence and volume of one flow
   induces in traffic belonging to another flow.

   When packets are held in a queue awaiting delivery, the order in
   which they are dequeued is significant for managing delay.  The
   most common strategy to date is to employ a simple FIFO queue.
   This means that all traffic traversing the same link at about the
   same time experiences the same amount of queue delay.  It also
   means that a single flow occupying a large part of the queue
   induces a large delay in all other flows sharing that queue, even
   if, without that single flow, there would be no need for queuing at
   all.  This is the essence of BFID.

   Large BFIDs can be avoided by distinguishing flows with high queue
   occupancy from those with little or no queue occupancy, and queuing
   them separately.  One effective method of doing so, namely placing
   every flow in its own FIFO and serving those FIFOs in deficit-
   round-robin order, is described in detail in [RFC8290]; this
   "flow-isolating" mechanism reduces the maximum BFID to the
   serialisation time of one full-size packet from each active flow,
   and can be implemented with or without the use of Active Queue
   Management.  It is also feasible to merely categorise flows into
   queue occupancy bands and use a separate FIFO only for each band;
   this renders the BFID experienced by each flow proportionate to the
   BFID it produces.
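   To make the flow-isolating approach concrete, the following Python
   sketch implements a minimal deficit-round-robin scheduler over
   per-flow FIFOs.  It is far simpler than the scheduler of [RFC8290]:
   flow classification by hashing, the sparse-flow optimisation, and
   per-flow AQM are all omitted, and the class name and quantum value
   are illustrative assumptions.

      from collections import deque

      QUANTUM = 1514   # byte credit per flow per round (one full packet)

      class DrrScheduler:
          """Each flow gets its own FIFO, served in DRR order."""

          def __init__(self):
              self.flows = {}        # flow id -> deque of (size, packet)
              self.active = deque()  # rotation order of active flow ids
              self.deficit = {}      # flow id -> byte credit

          def enqueue(self, flow_id, size, packet):
              if flow_id not in self.flows:
                  self.flows[flow_id] = deque()
                  self.deficit[flow_id] = 0
                  self.active.append(flow_id)
              self.flows[flow_id].append((size, packet))

          def dequeue(self):
              while self.active:
                  flow_id = self.active[0]
                  queue = self.flows[flow_id]
                  if queue and self.deficit[flow_id] < queue[0][0]:
                      # Not enough credit: grant a quantum and move the
                      # flow to the back of the rotation.
                      self.deficit[flow_id] += QUANTUM
                      self.active.rotate(-1)
                      continue
                  if queue:
                      size, packet = queue.popleft()
                      self.deficit[flow_id] -= size
                      return packet
                  # Flow drained: remove it so it cannot hoard credit.
                  self.active.popleft()
                  del self.flows[flow_id], self.deficit[flow_id]
              return None

   With this structure, a flow that keeps hundreds of packets queued
   delays each competing flow by at most roughly one full-size packet
   per round, rather than by its whole queue.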
   BFID can also be reduced in a simple FIFO by implementing Active
   Queue Management.  This is because in a simple FIFO, BFID and WFID
   have the same cause and extent, so reducing WFID also reduces BFID.
   The extent to which BFID can be reduced by this method is limited
   compared to dedicated methods, and a considerable amount of delay
   variation typically remains, but this is still much better than
   allowing a large, uncontrolled BFID to persist.

   Capacity-seeking flows with little latency sensitivity are
   particularly prone to producing BFID, while latency-sensitive flows
   that typically use little capacity are particularly affected by
   receiving BFID.

4.  Within-Flow Induced Delay (WFID)

   *Definition:* The delay which the presence and volume of one flow
   induces in traffic belonging to itself.

   Regardless of the order in which packets are delivered from a
   queue, if more than one packet belonging to a given flow is held in
   a queue, one of them induces delay in the other by occupying
   transmission capacity ahead of it.  In general, this WFID is
   calculable as the flow's packet occupancy in the queue divided by
   the flow's packet delivery rate.
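   A short worked example of this relationship, using purely
   illustrative figures:

      # WFID from queue occupancy and delivery rate (figures are
      # illustrative, not measurements).
      delivery_rate_pps = 8_000    # packets/s dequeued for this flow
      queue_occupancy_pkts = 400   # this flow's packets now queued

      # Each queued packet waits behind the flow's other packets,
      # which drain at the delivery rate, so:
      wfid_s = queue_occupancy_pkts / delivery_rate_pps
      print(f"WFID = {wfid_s * 1000:.0f} ms")   # 50 ms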
   In congestion-controlled flows, one typical cause of WFID is that
   the flow's congestion window exceeds the baseline Bandwidth-Delay
   Product (BDP) of the flow's path, where the queue in question is
   the controlling bottleneck defining the Bandwidth factor.  This is
   a natural result of capacity-seeking behaviour, in which the
   congestion window is increased continuously until some explicit
   signal of capacity overload is detected.  If the queue is large and
   does not implement Active Queue Management, WFIDs of many seconds
   are easily achieved and have been observed in practice.

   Another typical cause is that the sender emitted a short-term burst
   of packets, which subsequently collects in one or more downstream
   queues and is thereby spread out in time at the receiver.  This
   cause also applies to non-congestion-controlled protocols that can
   have large datagram payloads.  This form of WFID is usually
   harmless to the flow causing it, except that large bursts can
   exceed the capacity of a queue to absorb them, resulting in packet
   loss and the need for retransmission.

   In simple FIFOs, or where a flow-isolating mechanism is defeated by
   hash collisions or information hiding, the presence of WFID also
   implies the presence of an equal degree of BFID for any other flows
   sharing that queue.  This implies a responsibility to try to
   minimise WFID, even when the flow causing it is not very sensitive
   to its effects (as is typical of capacity-seeking protocols).
   Buffer sizing guidelines (e.g. a typical BDP divided by the square
   root of the number of flows) are among the simplest ways to limit
   WFID to tolerable levels.

   Active Queue Management (AQM) is the primary means of effectively
   controlling WFID without impairing the ability to absorb short-term
   bursts of traffic, by sending congestion signals to flows
   experiencing high queue occupancy.  Early forms of AQM were only
   able to generate congestion signals by artificially inducing packet
   loss.  ECN [RFC3168] introduced the ability to flag congestion on a
   packet without dropping it.  AQM may be used alone as in [RFC8289],
   or in conjunction with flow-isolation mechanisms as in [RFC8290].
   In the latter case, both WFID and BFID are addressed individually
   by natively appropriate mechanisms.
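   The following sketch shows the essence of a sojourn-time AQM with
   ECN support.  It is deliberately much simpler than CoDel [RFC8289],
   whose control law adapts the signalling rate over time; the fixed
   threshold and the interfaces here are illustrative assumptions.

      import time
      from collections import deque

      TARGET_DELAY_S = 0.005   # tolerated standing queue delay (assumed)

      class Packet:
          def __init__(self, ecn_capable: bool):
              self.ecn_capable = ecn_capable   # ECT set per RFC 3168
              self.ce_marked = False
              self.enqueue_time = None

      class SojournAqmQueue:
          def __init__(self):
              self.queue = deque()

          def enqueue(self, pkt: Packet):
              pkt.enqueue_time = time.monotonic()
              self.queue.append(pkt)

          def dequeue(self):
              while self.queue:
                  pkt = self.queue.popleft()
                  sojourn = time.monotonic() - pkt.enqueue_time
                  if sojourn <= TARGET_DELAY_S:
                      return pkt            # no congestion signal needed
                  if pkt.ecn_capable:
                      pkt.ce_marked = True  # flag congestion, keep packet
                      return pkt
                  # Not ECN-capable: induce loss and try the next packet.
              return None

   Note how the ECN-capable packet carries the congestion signal to
   the receiver intact, while a non-ECN packet must be sacrificed to
   convey the same information.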
   Some flows fail to respond to congestion signals applied by an AQM.
   If these flows cause high degrees of WFID, it is reasonable and
   probably wise to include a backstop mechanism to prevent them from
   completely dominating the queue, by artificially inducing enough
   packet loss (without using the ECN "flag" mechanism) to materially
   reduce that flow's queue occupancy.  If possible, this "queue
   protection" mechanism should be specific to the offending flow(s),
   such that it mostly avoids dropping packets from appropriately
   responsive or inoffensive flows.  Without these features, an
   unresponsive flow could seriously impair the quality of service of
   other flows, either by producing a lot of BFID, or by causing an
   overzealous AQM to drop the wrong packets.

5.  Latency Sensitivity of Traffic

   Some protocols and applications are more sensitive than others to
   latency and to variations in delay.  Variations in delay are often
   referred to as "jitter", which is the origin of the term "jitter
   buffer" commonly used in some types of application.

   If the response time for a DNS request exceeds 2 seconds, a timeout
   occurs and the request may be retried, or an error reported to the
   application.  Since DNS is a critical support protocol for many
   Internet applications, the degree of BFID should be kept well below
   2 seconds in all foreseeable cases.  DNS timeouts are a significant
   cause of user-visible application failure, often resulting in
   manual retries and user frustration.  If DNS stops working, "the
   Internet is down".

   Congestion-controlled reliable transports, such as TCP, can have
   difficulty recovering efficiently from occasional packet loss if
   the effective RTT is high, which can be caused by excessive WFID.
   The recovery process may be visible to the user in the form of a
   "stall" in the progress of a download or the rendering of a Web
   page, since data received beyond the lost packet(s) cannot be
   delivered to the application until the lost packet's retransmission
   is successfully received.  The duration of the stall is
   proportional to the effective RTT, so keeping WFID low can maintain
   reasonably smooth perceived application performance even in the
   face of packet loss and recovery.  Implementing AQM with ECN can
   also eliminate packet loss entirely, if the underlying path is
   sufficiently reliable.

   NTP assumes that delay is approximately symmetric on each path.
   For BPD, that is usually true, except in certain highly asymmetric
   routing scenarios.  The assumption is violated, however, when BFID
   persists for a period long enough to defeat NTP's built-in
   filtering.  Even quite small degrees of BFID can distort NTP
   synchronisation.

   VoIP and videoconferencing protocols can usually tolerate a
   surprisingly high BRTT, often higher than the human users
   communicating over them can.  To accommodate delay variations
   caused by inherent link characteristics, BFID, and WFID, they
   require jitter buffers.  The round-trip latency presented to the
   users is the sum of the BRTT and the jitter buffers in both
   directions, so the jitter buffers are tuned at runtime to be only
   as large as necessary to accommodate observed delay variations.
   Since these protocols usually don't produce much WFID, protecting
   them from BFID to the greatest extent practical will noticeably
   improve perceived call quality.
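   The following sketch shows the sizing logic of such an adaptive
   jitter buffer: track the variation in recently observed one-way
   delays and keep the buffer only as deep as needed to absorb it.
   The window size and headroom factor are illustrative assumptions,
   not taken from any particular implementation.

      from collections import deque

      WINDOW = 500      # recent packets to consider (assumed)
      HEADROOM = 1.2    # margin over observed variation (assumed)

      class JitterBufferSizer:
          def __init__(self):
              # Observed one-way delays, in seconds, most recent last.
              self.delays = deque(maxlen=WINDOW)

          def observe(self, delay_s: float):
              self.delays.append(delay_s)

          def target_depth_s(self) -> float:
              """Buffer depth needed to absorb current delay variation."""
              if not self.delays:
                  return 0.0
              spread = max(self.delays) - min(self.delays)
              return spread * HEADROOM

   Under flow isolation, the observed spread stays small and the
   buffer shrinks accordingly; a large, uncontrolled BFID forces the
   buffer, and hence the conversational delay, to grow.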
   Multiplayer games are among the most latency-sensitive applications
   visible to consumers.  The effective RTT determines how quickly it
   is possible for each player to perceive situations in the game and
   transmit responses to them.  In very fast-paced games, every
   millisecond is considered a valuable competitive edge, and
   experienced players become highly sensitive to even minor glitches
   caused by network disturbances.  In slower-paced games, there is
   slightly more tolerance, but a significant "lag spike" at an
   inopportune moment will still be noticed.  Crucially, a defeat
   caused by such a glitch is far more difficult for a player to
   accept than one caused by their own mistakes or an opponent's
   genuinely superior performance.  Accordingly, this class of
   application requires strictly minimising both BRTT and BFID, even
   at the expense of throughput, and should not be routed over links
   with significant inherent delay variation.

6.  Security Considerations

   This is an informational document and raises no security
   considerations.

7.  IANA Considerations

   There are no IANA considerations.

8.  Informative References

   [RFC1149]  Waitzman, D., "Standard for the transmission of IP
              datagrams on avian carriers", RFC 1149,
              DOI 10.17487/RFC1149, April 1990,
              <https://www.rfc-editor.org/info/rfc1149>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue Management",
              RFC 8289, DOI 10.17487/RFC8289, January 2018,
              <https://www.rfc-editor.org/info/rfc8289>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet
              Scheduler and Active Queue Management Algorithm",
              RFC 8290, DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

Authors' Addresses

   Jonathan Morton
   Kokkonranta 21
   FI-31520 Pitkajarvi
   Finland

   Phone: +358 44 927 2377
   Email: chromatix99@gmail.com

   Peter G. Heist
   Redacted
   463 11 Liberec 30
   Czech Republic

   Email: pete@heistp.net