Transport Working Group                                        J. Morton
Internet-Draft
Intended status: Informational                                  P. Heist
Expires: 18 November 2021                                    17 May 2021

                     Interflow vs Intraflow Delays
            draft-morton-tsvwg-interflow-intraflow-delays-00

Abstract

   Much current literature discusses queuing delays and the effects of
   different queue disciplines, active queue management algorithms,
   and congestion control measures on these delays.  This draft
   highlights an important distinction between different types of
   delay, which may be helpful to practitioners and theoreticians
   alike.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on 18 November 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this
   document.  Please review these documents carefully, as they
   describe your rights and restrictions with respect to this
   document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Baseline Path Delay (BPD) and Baseline Round-Trip Time (BRTT)
   3.  Between-Flow Induced Delay (BFID)
   4.  Within-Flow Induced Delay (WFID)
   5.  Latency Sensitivity of Traffic
   6.  Security Considerations
   7.  IANA Considerations
   8.  Informative References
   Authors' Addresses

1.  Introduction

   Throughput, packet loss ratio, and latency are the three most
   prominent performance characteristics of Internet paths.  Of these,
   throughput has always been the most heavily marketed to consumers,
   possibly because it is the only metric of the three in which bigger
   numbers are better.  Packet loss is also closely managed by network
   engineers, and is mostly kept to usefully low levels in practice,
   probably because excessive packet loss tends to cripple the
   throughput of typical congestion-controlled traffic.  Latency,
   however, despite its great practical importance to many Internet
   applications, is rarely given the attention it needs for proper
   management.

   One consequence of this neglect is the phenomenon of bufferbloat.
   Any given Internet path has a natural baseline delay, which is a
   consequence of the speed of information propagation in the physical
   media, plus processing delays in network nodes that connect link
   segments together, plus (for some link types) additional delays
   associated with shared media negotiation.  To this baseline, we
   must add the delay caused by packets waiting in a queue behind
   other packets, which occurs whenever the link is busy.  If the
   queue is permitted to grow too much, these additional queuing
   delays can become very noticeable to the user, and may even affect
   the reliability of Internet protocols.

   This document does not discuss in detail the many and varied means
   of controlling latency that are currently available or might
   someday become available.  Instead, the characteristics of this
   delay are discussed, including the distinction between "inter-flow
   induced delay" and "intra-flow induced delay".  Despite their
   similar names, these two types of delay typically have different
   effects and may be controlled by different queue mechanisms.
   Simple queues, however, do not attempt to distinguish them.

   To make the two names easier to tell apart, the terms BFID
   (Between-Flow Induced Delay) and WFID (Within-Flow Induced Delay)
   will be used as synonyms for inter-flow and intra-flow induced
   delays, respectively.

2.  Baseline Path Delay (BPD) and Baseline Round-Trip Time (BRTT)

   *Definition:* The delay on a one-way path or round trip due
   entirely to link characteristics and unavoidable processing delays.

   For the avoidance of doubt, the word "unavoidable" in this
   definition refers to the agency of the traffic traversing the path
   in question, and not to that of the network operators or equipment
   manufacturers involved.

   The speed of light is a fundamental limitation on information
   transmission velocity, and thus on the minimum latency of a
   geographically long Internet path.  On radio-based links, this
   limit is approached closely; in optical fibre or copper wires, the
   transmission velocity is somewhat slower.  When avian carriers
   [RFC1149] are involved, the transmission velocity necessarily falls
   below the speed of sound.  In practice, an allowance of one
   millisecond of round-trip delay per 100 km of path is usually
   appropriate.
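   As a worked example of this rule of thumb, the following Python
   sketch estimates the propagation component of the BRTT from the
   cable distance of a path.  The refractive index used is a typical
   figure for glass fibre, assumed here for illustration rather than
   measured on any particular link.

      # Estimate the propagation component of baseline round-trip
      # delay.  Assumes an optical path with refractive index ~1.5,
      # giving a transmission velocity of roughly 200,000 km/s.

      C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
      FIBRE_INDEX = 1.5             # typical for glass fibre (assumed)

      def propagation_rtt_ms(path_km: float) -> float:
          """Round-trip propagation delay over an optical path, in ms."""
          velocity_km_s = C_VACUUM_KM_S / FIBRE_INDEX
          return 2 * path_km / velocity_km_s * 1000

      # 100 km of fibre yields ~1 ms of RTT, matching the allowance
      # of one millisecond per 100 km given above.
      print(propagation_rtt_ms(100))    # ~1.0
      # A 4,000 km continental path: ~40 ms of RTT before any
      # processing or queuing delay is added.
      print(propagation_rtt_ms(4000))   # ~40.0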
   When a packet is received by a network node, it must be held in a
   processing buffer at least long enough to determine in which
   direction it should be sent next.  Since the necessary information
   is typically in the packet header, this may sometimes be less time
   than is needed to receive the entire packet, in which case the head
   of the packet may be sent onward while the tail is still being
   received.  In other cases, the node may receive the packet in whole
   before making a processing decision, and may even aggregate the
   packet with others for efficiency of dispatch.  That efficiency in
   throughput or power consumption is achieved at the expense of
   processing delay.

   Some link types have significant overhead associated with
   initiating a transmission, and/or utilise a shared medium into
   which only one or a small number of stations (out of a larger
   possible total) may transmit simultaneously.  Similar
   characteristics may also be exhibited by power-saving measures on
   portable devices.  These may result in significant and/or variable
   delays in forwarding over such links, which cannot be avoided by
   altering characteristics of the traffic itself.

   In practice, an Internet packet can be sent around the world in
   about 300 milliseconds with current technology.  The round-trip
   latency between Eastern Europe and Western North America is
   presently about 160 milliseconds.  A "typical" Internet round-trip
   delay can be taken to be 80 milliseconds, though more localised
   paths are significantly quicker.  Within a LAN or a datacentre, the
   baseline delay will often be less than one millisecond.

   Whenever two or more packets require sending over the same link
   within the time required to send any one of them, link contention
   exists and must be resolved.  This generally involves either
   placing packets into a queue or discarding them.  These practices
   are not within the definition of "baseline" delays, but they do
   influence the "induced" delays described below.

3.  Between-Flow Induced Delay (BFID)

   *Definition:* The delay which the presence and volume of one flow
   induces in traffic belonging to another flow.

   When packets are held in a queue awaiting delivery, the order in
   which they are dequeued is significant for managing delay.  The
   most common strategy to date is to employ a simple FIFO queue.
   This means that all traffic traversing the same link at about the
   same time experiences the same amount of queue delay.  It also
   means that a single flow occupying a large part of the queue
   induces a large delay in all other flows sharing that queue, even
   if, without that single flow, there would be no need for queuing at
   all.  This is the essence of BFID.

   Large BFIDs can be avoided by distinguishing flows with high queue
   occupancy from those with little or no queue occupancy, and queuing
   them separately.  One effective method of doing so, namely placing
   every flow in its own FIFO and serving those FIFOs in deficit-
   round-robin order, is described in detail in [RFC8290]; this
   "flow-isolating" mechanism reduces the maximum BFID to the
   serialisation time of one full-size packet from each active flow,
   and can be implemented with or without the use of Active Queue
   Management.  It is also feasible to merely categorise flows into
   queue occupancy bands and use a separate FIFO only for each band;
   this renders the BFID experienced by each flow proportionate to the
   BFID it produces.
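   To make the flow-isolating approach concrete, the following Python
   sketch implements a minimal deficit-round-robin scheduler over
   per-flow FIFOs.  It is far simpler than the scheduler of [RFC8290]:
   flow classification by hashing, the sparse-flow optimisation, and
   per-flow AQM are all omitted, and the class name and quantum value
   are illustrative assumptions.

      from collections import deque

      QUANTUM = 1514   # byte credit per flow per round (one full packet)

      class DrrScheduler:
          """Each flow gets its own FIFO, served in DRR order."""

          def __init__(self):
              self.flows = {}        # flow id -> deque of (size, packet)
              self.active = deque()  # rotation order of active flow ids
              self.deficit = {}      # flow id -> byte credit

          def enqueue(self, flow_id, size, packet):
              if flow_id not in self.flows:
                  self.flows[flow_id] = deque()
                  self.deficit[flow_id] = 0
                  self.active.append(flow_id)
              self.flows[flow_id].append((size, packet))

          def dequeue(self):
              while self.active:
                  flow_id = self.active[0]
                  queue = self.flows[flow_id]
                  if queue and self.deficit[flow_id] < queue[0][0]:
                      # Not enough credit: grant a quantum and move the
                      # flow to the back of the rotation.
                      self.deficit[flow_id] += QUANTUM
                      self.active.rotate(-1)
                      continue
                  if queue:
                      size, packet = queue.popleft()
                      self.deficit[flow_id] -= size
                      return packet
                  # Flow drained: remove it so it cannot hoard credit.
                  self.active.popleft()
                  del self.flows[flow_id], self.deficit[flow_id]
              return None

   With this structure, a flow that keeps hundreds of packets queued
   delays each competing flow by at most roughly one full-size packet
   per round, rather than by its whole queue.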
   BFID can also be reduced in a simple FIFO by implementing Active
   Queue Management.  This is because in a simple FIFO, BFID and WFID
   have the same cause and extent, so reducing WFID also reduces BFID.
   The extent to which BFID can be reduced by this method is limited
   compared to dedicated methods, and a considerable amount of delay
   variation typically remains, but this is still much better than
   allowing a large, uncontrolled BFID to persist.

   Capacity-seeking flows with little latency sensitivity are
   particularly prone to producing BFID, while latency-sensitive flows
   that typically use little capacity are particularly affected by
   receiving BFID.

4.  Within-Flow Induced Delay (WFID)

   *Definition:* The delay which the presence and volume of one flow
   induces in traffic belonging to itself.

   Regardless of the order in which packets are delivered from a
   queue, if more than one packet belonging to a given flow is held in
   a queue, one of them induces delay in the other by occupying
   transmission capacity ahead of it.  In general, this WFID is
   calculable as the flow's packet occupancy in the queue divided by
   the flow's packet delivery rate.
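   A short worked example of this relationship, using purely
   illustrative figures:

      # WFID from queue occupancy and delivery rate (figures are
      # illustrative, not measurements).
      delivery_rate_pps = 8_000    # packets/s dequeued for this flow
      queue_occupancy_pkts = 400   # this flow's packets now queued

      # Each queued packet waits behind the flow's other packets,
      # which drain at the delivery rate, so:
      wfid_s = queue_occupancy_pkts / delivery_rate_pps
      print(f"WFID = {wfid_s * 1000:.0f} ms")   # 50 ms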
   In congestion-controlled flows, one typical cause of WFID is that
   the flow's congestion window exceeds the baseline Bandwidth-Delay
   Product (BDP) of the flow's path, where the queue in question is
   the controlling bottleneck defining the Bandwidth factor.  This is
   a natural result of capacity-seeking behaviour, in which the
   congestion window is increased continuously until some explicit
   signal of capacity overload is detected.  If the queue is large and
   does not implement Active Queue Management, WFIDs of many seconds
   are easily achieved and have been observed in practice.

   Another typical cause is that the sender emitted a short-term burst
   of packets, which subsequently collects in one or more downstream
   queues and is thereby spread out in time at the receiver.  This
   cause also applies to non-congestion-controlled protocols that can
   have large datagram payloads.  This form of WFID is usually
   harmless to the flow causing it, except that large bursts can
   exceed the capacity of a queue to absorb them, resulting in packet
   loss and the need for retransmission.

   In simple FIFOs, or where a flow-isolating mechanism is defeated by
   hash collisions or information hiding, the presence of WFID also
   implies the presence of an equal degree of BFID for any other flows
   sharing that queue.  This implies a responsibility to try to
   minimise WFID, even when the flow causing it is not very sensitive
   to its effects (as is typical of capacity-seeking protocols).
   Buffer sizing guidelines (e.g. a typical BDP divided by the square
   root of the number of flows) are among the simplest ways to limit
   WFID to tolerable levels.

   Active Queue Management (AQM) is the primary means of effectively
   controlling WFID without impairing the ability to absorb short-term
   bursts of traffic, by sending congestion signals to flows
   experiencing high queue occupancy.  Early forms of AQM were only
   able to generate congestion signals by artificially inducing packet
   loss.  ECN [RFC3168] introduced the ability to flag congestion on a
   packet without dropping it.  AQM may be used alone as in [RFC8289],
   or in conjunction with flow-isolation mechanisms as in [RFC8290].
   In the latter case, both WFID and BFID are addressed individually
   by natively appropriate mechanisms.
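   The following sketch shows the essence of a sojourn-time AQM with
   ECN support.  It is deliberately much simpler than CoDel [RFC8289],
   whose control law adapts the signalling rate over time; the fixed
   threshold and the interfaces here are illustrative assumptions.

      import time
      from collections import deque

      TARGET_DELAY_S = 0.005   # tolerated standing queue delay (assumed)

      class Packet:
          def __init__(self, ecn_capable: bool):
              self.ecn_capable = ecn_capable   # ECT set per RFC 3168
              self.ce_marked = False
              self.enqueue_time = None

      class SojournAqmQueue:
          def __init__(self):
              self.queue = deque()

          def enqueue(self, pkt: Packet):
              pkt.enqueue_time = time.monotonic()
              self.queue.append(pkt)

          def dequeue(self):
              while self.queue:
                  pkt = self.queue.popleft()
                  sojourn = time.monotonic() - pkt.enqueue_time
                  if sojourn <= TARGET_DELAY_S:
                      return pkt            # no congestion signal needed
                  if pkt.ecn_capable:
                      pkt.ce_marked = True  # flag congestion, keep packet
                      return pkt
                  # Not ECN-capable: induce loss and try the next packet.
              return None

   Note how the ECN-capable packet carries the congestion signal to
   the receiver intact, while a non-ECN packet must be sacrificed to
   convey the same information.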
   Some flows fail to respond to congestion signals applied by an AQM.
   If these flows cause high degrees of WFID, it is reasonable and
   probably wise to include a backstop mechanism to prevent them from
   completely dominating the queue, by artificially inducing enough
   packet loss (without using the ECN "flag" mechanism) to materially
   reduce that flow's queue occupancy.  If possible, this "queue
   protection" mechanism should be specific to the offending flow(s),
   such that it mostly avoids dropping packets from appropriately
   responsive or inoffensive flows.  Without these features, an
   unresponsive flow could seriously impair the quality of service of
   other flows, either by producing a lot of BFID, or by causing an
   overzealous AQM to drop the wrong packets.

5.  Latency Sensitivity of Traffic

   Some protocols and applications are more sensitive than others to
   latency and to variations in delay.  Variations in delay are often
   referred to as "jitter", which is the origin of the term "jitter
   buffer" commonly used in some types of application.

   If the response time for a DNS request exceeds 2 seconds, a timeout
   occurs and the request may be retried, or an error reported to the
   application.  Since DNS is a critical support protocol for many
   Internet applications, the degree of BFID should be kept well below
   2 seconds in all foreseeable cases.  DNS timeouts are a significant
   cause of user-visible application failure, often resulting in
   manual retries and user frustration.  If DNS stops working, "the
   Internet is down".

   Congestion-controlled reliable transports, such as TCP, can have
   difficulty recovering efficiently from occasional packet loss if
   the effective RTT is high, which can be caused by excessive WFID.
   The recovery process may be visible to the user in the form of a
   "stall" in the progress of a download or the rendering of a Web
   page, since data received beyond the lost packet(s) cannot be
   delivered to the application until the lost packet's retransmission
   is successfully received.  The duration of the stall is
   proportional to the effective RTT, so keeping WFID low can maintain
   reasonably smooth perceived application performance even in the
   face of packet loss and recovery.  Implementing AQM with ECN can
   also eliminate packet loss entirely, if the underlying path is
   sufficiently reliable.

   NTP assumes that delay is approximately symmetric on each path.
   For BPD, that is usually true, except in certain highly asymmetric
   routing scenarios.  The assumption is violated, however, when BFID
   persists for a period long enough to defeat NTP's built-in
   filtering.  Even quite small degrees of BFID can distort NTP
   synchronisation.

   VoIP and videoconferencing protocols can usually tolerate a
   surprisingly high BRTT, often higher than the human users
   communicating over them can.  To accommodate delay variations
   caused by inherent link characteristics, BFID, and WFID, they
   require jitter buffers.  The round-trip latency presented to the
   users is the sum of the BRTT and the jitter buffers in both
   directions, so the jitter buffers are tuned at runtime to be only
   as large as necessary to accommodate observed delay variations.
   Since these protocols usually don't produce much WFID, protecting
   them from BFID to the greatest extent practical will noticeably
   improve perceived call quality.
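   The following sketch shows the sizing logic of such an adaptive
   jitter buffer: track the variation in recently observed one-way
   delays and keep the buffer only as deep as needed to absorb it.
   The window size and headroom factor are illustrative assumptions,
   not taken from any particular implementation.

      from collections import deque

      WINDOW = 500      # recent packets to consider (assumed)
      HEADROOM = 1.2    # margin over observed variation (assumed)

      class JitterBufferSizer:
          def __init__(self):
              # Observed one-way delays, in seconds, most recent last.
              self.delays = deque(maxlen=WINDOW)

          def observe(self, delay_s: float):
              self.delays.append(delay_s)

          def target_depth_s(self) -> float:
              """Buffer depth needed to absorb current delay variation."""
              if not self.delays:
                  return 0.0
              spread = max(self.delays) - min(self.delays)
              return spread * HEADROOM

   Under flow isolation, the observed spread stays small and the
   buffer shrinks accordingly; a large, uncontrolled BFID forces the
   buffer, and hence the conversational delay, to grow.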
   Multiplayer games are among the most latency-sensitive applications
   visible to consumers.  The effective RTT determines how quickly it
   is possible for each player to perceive situations in the game and
   transmit responses to them.  In very fast-paced games, every
   millisecond is considered a valuable competitive edge, and
   experienced players become highly sensitive to even minor glitches
   caused by network disturbances.  In slower-paced games, there is
   slightly more tolerance, but a significant "lag spike" at an
   inopportune moment will still be noticed.  Crucially, a defeat
   caused by such a glitch is far more difficult for a player to
   accept than one caused by their own mistakes or an opponent's
   genuinely superior performance.  Accordingly, this class of
   application requires strictly minimising both BRTT and BFID, even
   at the expense of throughput, and should not be routed over links
   with significant inherent delay variation.

6.  Security Considerations

   This is an informational document and raises no security
   considerations.

7.  IANA Considerations

   There are no IANA considerations.

8.  Informative References

   [RFC1149]  Waitzman, D., "Standard for the transmission of IP
              datagrams on avian carriers", RFC 1149,
              DOI 10.17487/RFC1149, April 1990,
              <https://www.rfc-editor.org/info/rfc1149>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue Management",
              RFC 8289, DOI 10.17487/RFC8289, January 2018,
              <https://www.rfc-editor.org/info/rfc8289>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet
              Scheduler and Active Queue Management Algorithm",
              RFC 8290, DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

Authors' Addresses

   Jonathan Morton
   Kokkonranta 21
   FI-31520 Pitkajarvi
   Finland

   Phone: +358 44 927 2377
   Email: chromatix99@gmail.com

   Peter G. Heist
   Redacted
   463 11 Liberec 30
   Czech Republic

   Email: pete@heistp.net