Transport Area Working Group                                    G. White
Internet-Draft                                                 CableLabs
Intended status: Informational                          October 22, 2018
Expires: April 25, 2019

 Identifying and Handling Non Queue Building Flows in a Bottleneck Link
                         draft-white-tsvwg-nqb-00

Abstract

   This draft discusses the potential to improve quality of experience
   for broadband internet applications by distinguishing between flows
   that cause queuing latency and flows that don't.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Non-Queue Building Flows
   3.  Identifying NQB traffic
     3.1.  Endpoint marking
     3.2.  Queuing behavior analysis
   4.  Non Queue Building PHB
   5.  End-to-end Support
   6.  Relationship to L4S
   7.  Comparison to Existing Approaches
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. Informative References
   Author's Address

1. Introduction

   Residential broadband internet services are commonly configured with
   a single bottleneck link (the access network link) upon which the
   service definition is applied.  The service definition, typically an
   upstream/downstream data rate tuple, is implemented as a configured
   pair of rate shapers that are applied to the user's traffic.  In
   such networks, the quality of service that each application
   receives, and as a result the quality of experience that it
   generates for the user, is influenced by the characteristics of the
   access network link.

   The vast majority of packets carried by residential broadband access
   networks are managed by an end-to-end congestion control algorithm,
   such as Reno, Cubic or BBR.  These congestion control algorithms
   attempt to seek the available capacity of the end-to-end path
   (which, in the case of residential broadband networks, is frequently
   the access network link), and in doing so they generally overshoot
   the available capacity, causing a queue to build up at the
   bottleneck link.  This queue buildup results in queuing delay that
   the application experiences as variable latency.

   In contrast to congestion-controlled applications, there are a
   variety of relatively low data rate applications that do not
   materially contribute to queuing delay, but are nonetheless
   subjected to it by sharing the same bottleneck link in the access
   network.  Many of these applications may be sensitive to latency or
   latency variation, and thus produce a poor quality of experience in
   such conditions.

   Active Queue Management (AQM) mechanisms (such as PIE [RFC8033],
   DOCSIS-PIE [RFC8034], or CoDel [RFC8289]) can improve the quality of
   experience for latency-sensitive applications, but there are
   practical limits to the amount of improvement that can be achieved
   without impacting the throughput of capacity-seeking applications.

   This document considers differentiating between these two classes of
   traffic in bottleneck links so that both classes can deliver
   exceptional quality of experience for their applications, and it
   solicits discussion and feedback.

2. Non-Queue Building Flows

   There are many applications that send traffic at relatively low data
   rates and/or in a fairly smooth and consistent manner such that they
   are highly unlikely to exceed the available capacity of the network
   path between source and sink.  Such applications are ideal
   candidates to be queued separately from the capacity-seeking
   applications that cause queue buildups and latency.

   These Non-queue-building (NQB) flows are typically UDP flows that
   send traffic at a low data rate and do not seek the capacity of the
   link (examples: online games, voice chat, DNS lookups).  Here the
   data rate is essentially limited by the application itself.  In
   contrast, Queue-building (QB) flows include traffic that uses
   traditional TCP congestion control, QUIC, BBR, or other capacity-
   seeking congestion control algorithms.

   Many applications fall neatly into one of these two categories, but
   there are also application flows that fall into a gray area in
   between (e.g. flows that are NQB on high-speed links but QB on
   slow-speed links).

3. Identifying NQB traffic

   This memo is intended to seek feedback on mechanisms by which Non-
   Queue Building flows can be identified by the network in an
   application-neutral way.  Two mechanisms in particular seem
   feasible, and could (either alone or in concert) be used to
   differentiate between QB and NQB flows.

   o  Endpoint marking.  This mechanism would have application
      endpoints apply a marking (perhaps utilizing the Diffserv field
      of the IP header) to NQB flows that could then be used by the
      network to differentiate between QB and NQB flows.

   o  Queuing behavior analysis.  This mechanism would utilize real-
      time per-flow traffic statistics to identify whether a flow is
      sending traffic at a rate that exceeds the available capacity of
      the bottleneck link and hence is causing a queue to form.

3.1. Endpoint marking

   This mechanism would have application endpoints apply a marking
   (perhaps utilizing the Diffserv field of the IP header) to NQB flows
   that could then be used by the network to differentiate between QB
   and NQB flows.  It would be useful for such a marking to be
   universally agreed upon, rather than being locally defined by the
   network operator, so that applications could be written to apply the
   marking without regard to local network policies.
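   As a purely illustrative sketch of endpoint marking, the fragment
   below shows how a sending application could mark a UDP socket with
   an NQB codepoint, assuming the 0x2A codepoint discussed in Section 4
   were agreed upon.  The codepoint value, address and port are
   assumptions for illustration only; an IPv6 socket would use the
   IPV6_TCLASS option instead of IP_TOS.

      import socket

      NQB_DSCP = 0x2A            # provisional codepoint; see Section 4
      TOS_VALUE = NQB_DSCP << 2  # DSCP occupies the upper 6 bits of the
                                 # former TOS octet

      # Create a UDP socket and mark every packet it sends with the NQB
      # codepoint (works on platforms that honor IP_TOS, e.g. Linux).
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)

      # The application then sends its low-rate, application-limited
      # traffic as usual; the marking is carried in each packet's IP
      # header.
      sock.sendto(b"hello", ("192.0.2.1", 5004))

   Whether such a marking survives the path to the bottleneck link is a
   separate question, discussed in Section 5.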
   Some questions that arise when considering endpoint marking are: How
   can an application determine whether it is queue-building or not,
   given that the sending application is generally not aware of the
   available capacity of the path to the receiving endpoint?  Even in
   cases where an application is aware of the capacity of the path, how
   can it be sure that the available capacity (considering other flows
   that may be sharing the path) would be sufficient to result in the
   application's traffic not causing a queue to form?  In an unmanaged
   environment, how can networks trust endpoint marking, and why
   wouldn't all applications simply mark their packets as NQB?

   In answer to the last question, it is worth noting that the NQB
   designation and marking would be intended to convey verifiable
   traffic behavior, not needs or wants.  Also, it would be important
   that incentives are aligned correctly, i.e. that there is a benefit
   to the application in marking its packets correctly, and no benefit
   to an application in intentionally mismarking its traffic.  Thus, a
   useful property of nodes that support separate queues for NQB and QB
   flows would be that, for NQB flows, the NQB queue provides better
   performance (considering latency, loss and throughput) than the QB
   queue, and that, for QB flows, the QB queue provides better
   performance than the NQB queue.

   Even so, it is possible that, due to an implementation error or
   misconfiguration, a QB flow would end up mismarked as NQB, or vice
   versa.  In the case of an NQB flow that isn't marked as NQB and ends
   up in the QB queue, the flow would only impact its own quality of
   service, and so this case seems to be of lesser concern.  However, a
   QB flow that is mismarked as NQB, whether due to error or due to the
   fact that the application developer cannot predict the data rate
   capabilities of the link, would cause queuing delays for all of the
   other flows that are sharing the NQB queue.

   To prevent this situation from harming the performance of the real
   NQB flows, it would likely be valuable to support a "queue
   protection" function that could identify QB flows that are mismarked
   as NQB and reclassify those flows/packets to the QB queue.  This
   would benefit the reclassified flow by giving it access to a large
   buffer (and thus a lower packet loss rate), and would benefit the
   actual NQB flows by preventing the harm (increased latency
   variability) that the mismarked flow would otherwise cause them.
   Some open questions around this function include: How could such a
   function be implemented in an objective and verifiable manner?  What
   other options might exist to serve this purpose in a dual-queue
   architecture?

3.2. Queuing behavior analysis

   Similar to the queue protection function outlined in the previous
   section, it may be feasible to devise a real-time flow analyzer for
   a node that would identify flows that are causing queue buildup and
   redirect those flows to the QB queue, leaving the remaining flows in
   the NQB queue.
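   As a rough illustration of the per-flow accounting that either the
   queue protection function or such a flow analyzer might perform, the
   sketch below scores each flow by the amount of queuing it would
   contribute and reclassifies a flow to the QB queue once its score
   exceeds a threshold.  This is a minimal sketch under assumed
   parameters (the drain rate, threshold and flow key are all
   assumptions), not a specification of either mechanism.

      import time
      from collections import defaultdict

      # Assumed parameters; a real implementation would derive these
      # from the link rate and the latency target of the NQB queue.
      DRAIN_RATE_BPS = 50_000_000   # rate at which a flow's score drains
      THRESHOLD_BYTES = 10_000      # score above which a flow is judged
                                    # to be queue-building

      class QueueProtection:
          """Toy queuing-score tracker for flows in the NQB queue."""

          def __init__(self):
              self.score = defaultdict(float)  # flow key -> bytes of score
              self.last = defaultdict(float)   # flow key -> last update time

          def on_packet(self, flow_key, size_bytes, now=None):
              """Return 'NQB' to keep the packet in the NQB queue, or
              'QB' to redirect it (and the flow) to the QB queue."""
              now = time.monotonic() if now is None else now
              elapsed = now - self.last[flow_key]
              self.last[flow_key] = now
              # Drain the score at the assumed rate, then add this packet.
              drained = self.score[flow_key] - elapsed * DRAIN_RATE_BPS / 8
              self.score[flow_key] = max(0.0, drained) + size_bytes
              return 'QB' if self.score[flow_key] > THRESHOLD_BYTES else 'NQB'

      # Example: a flow sending persistently faster than the drain rate
      # accumulates score and is eventually redirected to the QB queue.
      qp = QueueProtection()
      verdict = qp.on_packet(("192.0.2.1", 5004, "198.51.100.2", 443), 1200)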
4. Non Queue Building PHB

   This section uses the Diffserv nomenclature of per-hop behavior
   (PHB) to describe how a network node could provide better quality of
   service for NQB flows without reducing the performance of QB flows.

   A node supporting the NQB PHB would provide a separate queue for
   non-queue-building traffic.  This queue would support a latency-
   based queue protection mechanism that is able to identify queue-
   building behavior in flows that are classified into the queue, and
   to redirect flows causing queue buildup to a different queue.

   While there may be some similarities between the characteristics of
   NQB flows and flows marked with the Expedited Forwarding (EF) DSCP,
   the NQB PHB would differ from the Expedited Forwarding PHB in
   several important ways.

   o  NQB traffic is not rate limited or rate policed.  Rather, the NQB
      queue would be expected to support a latency-based queue
      protection mechanism that identifies NQB-marked flows that are
      beginning to cause latency, and redirects packets from those
      flows to the queue for QB flows.

   o  A node supporting the NQB PHB makes no guarantees on latency or
      data rate for NQB-marked flows; instead, it aims to provide sub-
      millisecond queuing delays for as many such flows as it can, and
      to shed load when needed.

   o  EF is commonly used exclusively for voice traffic, for which
      additional functions are applied, such as admission control,
      accounting, prioritized delivery, etc.

   In networks that support the NQB PHB, it may be preferred to also
   include traffic marked EF (101110b) in the NQB queue.  The choice of
   the 0x2A codepoint (101010b) for NQB would conveniently allow a node
   to select these two codepoints using a single mask pattern of
   101x10b.
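   A small worked example of that mask selection, assuming the 0x2A
   codepoint were adopted: the NQB and EF codepoints differ in only one
   bit, so a classifier can match both with a single mask.  The
   function name below is purely illustrative.

      NQB = 0b101010   # 0x2A, the codepoint proposed above
      EF  = 0b101110   # 0x2E, Expedited Forwarding
      MASK = 0b111011  # treat the one differing bit as "don't care"

      def selects_nqb_queue(dscp):
          """True for any DSCP matching the pattern 101x10b."""
          return (dscp & MASK) == (NQB & MASK)

      assert selects_nqb_queue(NQB) and selects_nqb_queue(EF)
      assert not selects_nqb_queue(0b101000)   # CS5 (101000b) does not match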
5. End-to-end Support

   In contrast to the existing standard DSCPs, which are typically only
   enforced within a Diffserv domain (e.g. an AS), this DSCP would be
   intended for end-to-end usage across the Internet.  Some access
   network service providers bleach the Diffserv field on ingress into
   their network, and in some cases apply their own DSCP for internal
   usage.  Access networks that support the NQB PHB would need to allow
   the NQB marking to pass through this bleaching operation so that the
   PHB can be provided at the access network link.

6. Relationship to L4S

   The dual-queue mechanism described in this draft is similar to, and
   is intended to be compatible with, the L4S architecture
   [I-D.ietf-tsvwg-l4s-arch].

7. Comparison to Existing Approaches

   Traditional QoS mechanisms focus on prioritization in an attempt to
   achieve two goals: reduced latency for "latency-sensitive" traffic,
   and increased bandwidth availability for "important" applications.
   Applications are generally given priority in proportion to some
   combination of latency-sensitivity and importance.

   Downsides to this approach include the difficulty of deciding what
   priority level each application should get (making the value
   judgement as to latency-sensitivity and importance), of associating
   packets with priority levels (either a lot of classifier state, or
   trust in endpoint markings and the value judgements that they
   convey), and of ensuring that high-priority traffic doesn't starve
   lower-priority traffic (admission control, weighted scheduling, etc.
   are possible solutions).  This approach can work in a managed
   network, where the network operator can control the usage of the QoS
   mechanisms, but it has not been adopted end-to-end across the
   internet.

   Flow queueing approaches (such as fq_codel [RFC8290]), on the other
   hand, achieve latency improvements by associating packets into
   "flow" queues and then prioritizing "sparse flows", i.e. packets
   that arrive to an empty flow queue.  Flow queueing does not attempt
   to differentiate between flows on the basis of value (importance or
   latency-sensitivity); it simply gives preference to sparse flows,
   and tries to guarantee that the non-sparse flows all get an equal
   share of the remaining channel capacity.  As a result, fq mechanisms
   could be considered more appropriate for unmanaged environments and
   general internet traffic.

   Downsides to this approach include the loss of low-latency
   performance due to hash collisions (where a sparse flow shares a
   queue with a bulk data flow), the complexity of managing a large
   number of queues, and the behavior of the scheduler itself
   (typically DRR): by enforcing that each non-sparse flow gets an
   equal fraction of link bandwidth, it causes problems with VPNs and
   other tunnels, and it exhibits poor behavior with less-aggressive
   congestion control algorithms (e.g. LEDBAT) and with RMCAT
   congestion control algorithms.  In effect, the network element is
   making a decision as to what constitutes a flow, and then forcing
   all such flows to take equal bandwidth at every instant.

   The dual-queue approach achieves the main benefit of fq_codel
   (latency improvement without value judgements) without these
   downsides.

   The distinction between NQB flows and QB flows is similar to the
   distinction made between "sparse flow queues" and "non-sparse flow
   queues" in fq_codel.  In fq_codel, a flow queue is considered sparse
   if it is drained completely by each packet transmission and remains
   empty for at least one cycle of the round robin over the active
   flows (this is approximately equivalent to saying that it utilizes
   less than its fair share of capacity).  While this definition is
   convenient to implement in fq_codel, it is not the only useful
   definition of sparse flows.
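   For readers less familiar with that behavior, the sketch below is a
   heavily simplified rendering of the two-list scheduler described in
   [RFC8290]; the quantum/deficit accounting and the CoDel AQM are
   omitted, so it is intended only to illustrate when a flow counts as
   sparse, not to reproduce fq_codel.

      from collections import deque, defaultdict

      class FqSketch:
          """Greatly simplified fq_codel-style scheduler (illustration only)."""

          def __init__(self):
              self.queues = defaultdict(deque)  # flow key -> packets
              self.new_flows = deque()          # flows treated as sparse
              self.old_flows = deque()          # flows with a recent backlog

          def enqueue(self, flow_key, packet):
              q = self.queues[flow_key]
              if not q and (flow_key not in self.new_flows
                            and flow_key not in self.old_flows):
                  # Packet arrived to an empty, inactive queue: the flow is
                  # treated as sparse and served ahead of the old flows.
                  self.new_flows.append(flow_key)
              q.append(packet)

          def dequeue(self):
              """Serve one packet, preferring flows on the new_flows list."""
              while self.new_flows or self.old_flows:
                  lst = self.new_flows if self.new_flows else self.old_flows
                  flow_key = lst.popleft()
                  q = self.queues[flow_key]
                  if not q:
                      # Visited while empty: the flow stayed empty for a full
                      # cycle, so forget it; its next packet makes it sparse
                      # again.
                      continue
                  pkt = q.popleft()
                  # After service the flow waits at the tail of old_flows (one
                  # packet per visit stands in for DRR's quantum accounting).
                  self.old_flows.append(flow_key)
                  return pkt
              return None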
8. Acknowledgements

   TBD

9. IANA Considerations

   TBD

10. Security Considerations

   TBD

11. Informative References

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
              Low Loss, Scalable Throughput (L4S) Internet Service:
              Architecture", draft-ietf-tsvwg-l4s-arch-02 (work in
              progress), March 2018.

   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
              "Proportional Integral Controller Enhanced (PIE): A
              Lightweight Control Scheme to Address the Bufferbloat
              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017,
              <https://www.rfc-editor.org/info/rfc8033>.

   [RFC8034]  White, G. and R. Pan, "Active Queue Management (AQM)
              Based on Proportional Integral Controller Enhanced (PIE)
              for Data-Over-Cable Service Interface Specifications
              (DOCSIS) Cable Modems", RFC 8034, DOI 10.17487/RFC8034,
              February 2017, <https://www.rfc-editor.org/info/rfc8034>.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue Management",
              RFC 8289, DOI 10.17487/RFC8289, January 2018,
              <https://www.rfc-editor.org/info/rfc8289>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet
              Scheduler and Active Queue Management Algorithm",
              RFC 8290, DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

Author's Address

   Greg White
   CableLabs
   858 Coal Creek Circle
   Louisville, CO 80027
   US

   Email: g.white@cablelabs.com