Transport Area Working Group                                    G. White
Internet-Draft                                             K. Sundaresan
Intended status: Informational                                B. Briscoe
Expires: September 12, 2019                                    CableLabs
                                                          March 11, 2019

              Low Latency DOCSIS - Technology Overview
                      draft-white-tsvwg-lld-00

Abstract

NOTE: This document is a reformatted version of the CableLabs white paper "Low Latency DOCSIS - Technology Overview" (February 2019).

The evolution of the bandwidth capabilities - from kilobits per second to gigabits - across generations of DOCSIS cable broadband technology has paved the way for the applications that today form our digital lives. Along with increased bandwidth, or "speed", the latency performance of DOCSIS technology has also improved in recent years. Although it often gets less attention, latency performance contributes as much as, or more than, speed to the broadband experience and to the feasibility of future applications.

Low Latency DOCSIS technology (LLD) is a specification developed by CableLabs in collaboration with DOCSIS vendors and cable operators that tackles the two main causes of latency in the network: queuing delay and media acquisition delay. LLD introduces an approach wherein data traffic from applications that aren't causing latency can take a different logical path through the DOCSIS network without getting hung up behind data from applications that are causing latency, as is the case in today's Internet architectures. This mechanism doesn't interfere with the way applications share the total bandwidth of the connection, and it doesn't reduce one application's latency at the expense of others. In addition, LLD improves the DOCSIS upstream media acquisition delay with a faster request-grant loop and a new proactive scheduling mechanism.
LLD makes the 37 internet experience better for latency sensitive applications without 38 any negative impact on other applications. 40 The latest generation of DOCSIS equipment that has been deployed in 41 the field - DOCSIS 3.1 - experiences typical latency performance of 42 around 10 milliseconds (ms) on the Access Network link. However, 43 under heavy load, the link can experience delay spikes of 100 ms or 44 more. LLD systems can deliver a consistent 1 ms delay on the DOCSIS 45 network for traffic that isn't causing latency, imperceptible for 46 nearly all applications. The experience will be more consistent with 47 much smaller delay variation. 49 LLD can be deployed by field-upgrading DOCSIS 3.1 cable modem and 50 cable modem termination system devices with new software. The 51 technology includes tools that enable automatic provisioning of these 52 new services, and it also introduces new tools to report statistics 53 of latency performance to the operator. 55 Cable operators, DOCSIS equipment manufacturers, and application 56 providers will all have to act in order to take advantage of LLD. 57 This white paper explains the technology and describes the role that 58 each of these parties plays in making LLD a reality. 60 Status of This Memo 62 This Internet-Draft is submitted in full conformance with the 63 provisions of BCP 78 and BCP 79. 65 Internet-Drafts are working documents of the Internet Engineering 66 Task Force (IETF). Note that other groups may also distribute 67 working documents as Internet-Drafts. The list of current Internet- 68 Drafts is at https://datatracker.ietf.org/drafts/current/. 70 Internet-Drafts are draft documents valid for a maximum of six months 71 and may be updated, replaced, or obsoleted by other documents at any 72 time. It is inappropriate to use Internet-Drafts as reference 73 material or to cite them other than as "work in progress." 75 This Internet-Draft will expire on September 12, 2019. 77 Copyright Notice 79 Copyright (c) 2019 IETF Trust and the persons identified as the 80 document authors. All rights reserved. 82 This document is subject to BCP 78 and the IETF Trust's Legal 83 Provisions Relating to IETF Documents 84 (https://trustee.ietf.org/license-info) in effect on the date of 85 publication of this document. Please review these documents 86 carefully, as they describe your rights and restrictions with respect 87 to this document. Code Components extracted from this document must 88 include Simplified BSD License text as described in Section 4.e of 89 the Trust Legal Provisions and are provided without warranty as 90 described in the Simplified BSD License. 92 Table of Contents 94 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 95 2. Latency in DOCSIS Networks . . . . . . . . . . . . . . . . . 4 96 3. New Dual-Queue Approach . . . . . . . . . . . . . . . . . . . 7 97 3.1. Low-Latency Aggregate Service Flows . . . . . . . . . . . 8 98 3.2. Identifying NQB Packets - Default Classifiers . . . . . . 9 99 3.3. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 10 100 3.4. Queue Protection . . . . . . . . . . . . . . . . . . . . 11 101 4. Upstream Scheduling Improvements . . . . . . . . . . . . . . 12 102 4.1. Faster Request Grant Loop . . . . . . . . . . . . . . . . 12 103 4.2. Proactive Grant Service . . . . . . . . . . . . . . . . . 13 104 5. Low Latency DOCSIS Performance . . . . . . . . . . . . . . . 13 105 6. Deployment Considerations . . . . . . . . . . . . . . . . . . 16 106 6.1. Device Support . . . . . . . . 
. . . . . . . . . . . . . 16 107 6.2. Packet Marking . . . . . . . . . . . . . . . . . . . . . 17 108 6.3. Provisioning Mechanisms . . . . . . . . . . . . . . . . . 18 109 6.3.1. Aggregate QoS Profiles . . . . . . . . . . . . . . . 18 110 6.3.2. Migration Using Existing Configuration File and 111 Service Class Name . . . . . . . . . . . . . . . . . 18 112 6.3.3. Explicit Definition of ASF in the Configuration File 19 113 6.4. Latency Histogram Reporting . . . . . . . . . . . . . . . 19 114 7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 19 115 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 20 116 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 117 10. Security Considerations . . . . . . . . . . . . . . . . . . . 20 118 11. Informative References . . . . . . . . . . . . . . . . . . . 20 119 Appendix A. Low Latency and High Bandwidth: L4S . . . . . . . . 22 120 Appendix B. Simulation Details . . . . . . . . . . . . . . . . . 24 121 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24 123 1. Introduction 125 Let's begin with bandwidth (or "speed"): the amount of data that can 126 be delivered across a network connection over a period of time. 127 Sometimes bandwidth is very important to the broadband experience, 128 particularly when an application is trying to send or receive large 129 amounts of data, such as watching videos on Netflix, downloading 130 videos/music, syncing file-shares or email clients, uploading a video 131 to YouTube or Instagram, or downloading a new application or system 132 update. Other times, bandwidth (or bandwidth alone) isn't enough, 133 and latency has a big effect on the user experience. 135 Latency is the time that it takes for a short message (a packet, in 136 networking terminology) to make it across the network from the sender 137 to the receiver and for a response to come back. Network latency is 138 commonly measured as round-trip-time and is sometimes referred to as 139 "ping time." Applications that are more interactive or real-time, 140 like web browsing, online gaming, and video conferencing/chatting, 141 perform the best when latency is kept low, and adding more bandwidth 142 without addressing latency doesn't make things better. 144 When multiple applications share the broadband connection of one 145 household (e.g., several users doing different activities at the same 146 time), each of those applications can have an impact on the 147 performance of the others. They all share the total bandwidth of the 148 connection (so more active applications mean less bandwidth for each 149 one), and they can all cause the latency of the connection to 150 increase. 152 It turns out that applications today that want to send a lot of data 153 all at once do a reasonably good job of sharing the bandwidth in a 154 fair manner, but they actually cause a pretty big latency problem 155 when they do it because they send data too quickly and expect the 156 network to queue it up. We call these applications "queue-building" 157 applications, e.g., video streaming (Netflix). There are also plenty 158 of other applications that don't send data too quickly, so they don't 159 cause latency. We call these "non-queue-building" applications, 160 e.g., video chatting (FaceTime). 
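The practical difference between these two categories comes down to simple arithmetic: the extra delay a flow imposes on everyone sharing the link is the standing queue it maintains divided by the link rate. The short sketch below illustrates this; the 20 Mbps link rate and the queue depths are assumed example values, not figures from the DOCSIS specifications.

# Illustrative only: delay added by a standing queue at a bottleneck.
def added_delay_ms(standing_queue_bytes: int, link_rate_mbps: float) -> float:
    """Time (in ms) to drain the standing queue at the given link rate."""
    return standing_queue_bytes * 8 / (link_rate_mbps * 1e6) * 1e3

# A queue-building flow holding ~256 kB in a 20 Mbps upstream buffer
# adds roughly 100 ms of delay for every flow sharing that link.
print(round(added_delay_ms(256_000, 20.0), 1), "ms")   # 102.4 ms

# A non-queue-building flow (e.g., a game or VoIP stream paced well
# below the link rate) keeps at most a packet or two queued.
print(round(added_delay_ms(1_500, 20.0), 2), "ms")     # 0.6 ms

This is why a single capacity-seeking download can make a whole household's connection feel sluggish even though it is sharing the bandwidth fairly.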
162 LLD separates these two types of traffic into two logical queues, 163 which greatly improves the latency experienced by the non-queue- 164 building applications (many of which may be latency-sensitive) 165 without having any downside for the queue-building applications. In 166 addition, two queues allow LLD to support a next-generation 167 application protocol that can scale up to sending data at 10 Gbps and 168 beyond while maintaining ultra-low queuing delay, which means that in 169 the future, there may not be queue-building applications at all. 171 As of the writing of this document, the Low Latency DOCSIS 172 specifications have just been published ([DOCSIS-MULPIv3.1], 173 [DOCSIS-CCAP-OSSIv3.1], [DOCSIS-CM-OSSIv3.1]), and DOCSIS equipment 174 manufacturers are working on building support for the functionality. 175 In addition, work is underway in the Internet Engineering Task Force 176 to standardize low-latency architectures across the broader Internet 177 ecosystem. 179 2. Latency in DOCSIS Networks 181 Low Latency DOCSIS technology is the next step in a progression of 182 latency improvements that have been made to the DOCSIS specifications 183 by CableLabs in recent years. Table 1 provides a snapshot of the 184 milestones in round-trip latency performance with DOCSIS technology 185 from the first DOCSIS 3.0 equipment to DOCSIS 3.1 equipment that 186 supports [RFC8034] Active Queue Management, and finally the new Low 187 Latency DOCSIS, which achieves ~1 ms of round-trip latency. The 188 table references three metrics that describe the range of latencies 189 added by the DOCSIS network link that would be experienced by a 190 broadband user. The first, "When Idle," refers to a broadband 191 connection that is not being actively used by the customer. The 192 second, "Under Load," represents average latency while the user is 193 actively using the service (e.g., streaming video). Finally, the 194 third, "99th Percentile," gives an indication of the maximum latency 195 that a customer would commonly experience in real usage scenarios. 196 The table uses order-of-magnitude numbers because the actual 197 performance will vary because of a number of factors including DOCSIS 198 channel configuration and actual application usage pattern. 200 For latency-sensitive applications, the 99th percentile value has the 201 most impact on user experience. 203 TABLE 1. EVOLUTION OF LATENCY PERFORMANCE IN DOCSIS NETWORKS (ROUND- 204 TRIP TIME IN MILLISECONDS BETWEEN THE CM AND CMTS) 206 +-------------------------------+--------+----------+---------------+ 207 | | When | Under | 99th | 208 | | Idle | Load | Percentile | 209 +-------------------------------+--------+----------+---------------+ 210 | DOCSIS 3.0 Early Equipment | ~10 ms | ~1000 ms | ~1000 ms | 211 | DOCSIS 3.0 w/ Buffer Control | ~10 ms | ~100 ms | ~100 ms | 212 | DOCSIS 3.1 Active Queue | ~10 ms | ~10 ms | ~100 ms | 213 | Management | | | | 214 | Low Latency DOCSIS 3.1 | ~1 ms | ~1 ms | ~1 ms | 215 +-------------------------------+--------+----------+---------------+ 217 Table 1 219 The latency described in Table 1 is caused by a series of factors in 220 the DOCSIS cable modem (CM) and cable modem termination system 221 (CMTS). Figure 1 in [LLD-white-paper] illustrates the range of 222 latencies caused by those factors in DOCSIS 3.1 networks. 224 The lowest two latency sources in Figure 1 in [LLD-white-paper] have 225 minor impacts on overall latency. 
227 The "Switching/Forwarding" delay represents the amount of time it 228 takes for the CM and CMTS to make the decision to forward a packet. 229 This has a very minor impact on overall latency. 231 The "Propagation" delay (the amount of time it takes for a signal to 232 travel on the HFC plant) is set by the speed of light and the 233 distance from CM to CMTS. Not much can be done to affect latency 234 from this source. 236 Of the sources in Figure 1 in [LLD-white-paper], the top three 237 significantly drive latency performance. 239 The range of the "Serialization/Encoding" delay comes from the 240 upstream and downstream channel configuration options available to 241 the operator. Some of these configurations provide significant 242 robustness benefits at the expense of latency, whereas others may be 243 less robust to noise but provide very low latency. The LLD 244 specification does not modify the set of options available to the 245 operator. Rather, operators should be encouraged to use the lowest 246 latency channel configurations that they can, given the plant 247 conditions. 249 The "Media Acquisition" delay is a result of the shared-medium 250 scheduling currently provided by DOCSIS technology, in which the CMTS 251 arbitrates access to the upstream channel via a request-grant 252 mechanism. 254 The "Queuing" delay is mainly caused by the current TCP protocol and 255 its variants. Applications today that need to seek out as much 256 bandwidth as possible use a transport protocol like TCP (or the TCP- 257 replacement known as QUIC), which uses a "congestion control" 258 algorithm (such as Reno, Cubic, or BBR) to adjust to the available 259 bandwidth at the bottleneck link through the network. Typically, 260 this will be the last mile link - the DOCSIS link for cable customers 261 - where the bandwidth available for each application often varies 262 rapidly as the activity of all the devices in the household varies. 264 With today's congestion control algorithms, the sender ramps up the 265 sending rate until it's sending data faster than the bottleneck link 266 can support. Packets then start queuing in a buffer at the entrance 267 to the link, i.e. the CM or CMTS. This queue of packets grows 268 quickly until the device decides to discard some newly arriving 269 packets, which triggers the sender to pause for a bit in order to 270 allow the buffer to drain somewhat before resuming sending. This 271 process is an inherent feature of the TCP family of Internet 272 transport protocols, and it repeats over and over again until the 273 file transfer completes. In doing so, it causes latency and packet 274 loss for all of the traffic that shares the broadband link. 276 LLD tackles the two main causes of latency in the network: queuing 277 delay and media acquisition delay. 279 o LLD addresses Queueing Delay by allowing non-queue-building 280 applications to avoid waiting behind the delays caused by the 281 current TCP or its variants. At a high level, the low-latency 282 architecture consists of a dual-queue approach that treats both 283 queues as a single pool of bandwidth. 285 o LLD cuts Media Acquisition Delay by using a faster request-grant 286 loop and by adding support for a new proactive scheduler that can 287 provide extremely low latency service. 289 In addition, LLD introduces detailed statistics on queueing delay via 290 histogram calculations performed by the CM (for upstream) and CMTS 291 (for downstream). 
Furthermore, CableLabs is working with a broad 292 cross-section of stakeholders in the IETF to standardize an end-to- 293 end service architecture that can leverage LLD to enable even high 294 bandwidth TCP flows to achieve ultra-low queuing delay. This 295 technology will be important for future, interactive high-data-rate 296 applications like holographic light field experiences, as well as for 297 enabling higher performance versions of today's applications like web 298 and video conferencing. 300 The sections below describe these features in more detail. 302 3. New Dual-Queue Approach 304 Of all the features of LLD, the dual-queue mechanism has by far the 305 greatest impact on round-trip latency and latency variation. The 306 concept of the dual-queue approach is that the majority of the 307 applications that use the internet can be divided into two 308 categories: 310 o Queue-Building Applications: These application traffic flows 311 frequently send data faster than the path between sender and 312 receiver can support. The most common instance of queue-building 313 flows are flows that use the current TCP or QUIC protocols. As 314 discussed above, these capacity-seeking protocols use a legacy 315 congestion control algorithm that probes for available capacity on 316 the path by sending data faster than the path can support and 317 expecting the network to queue the excess data in internal 318 buffers. The majority of traffic (by volume) today is queue- 319 building. Some examples of queue-building applications are video 320 streaming (e.g., Netflix, YouTube) and application downloads. 322 o Non-Queue-Building Applications: These application traffic flows 323 very rarely send data faster than the path can support. They come 324 in two subcategories: 326 * Today's self-limited, non-capacity-seeking apps, such as 327 multiplayer online games and IP communication apps (such as 328 Skype or FaceTime). These applications send data at a 329 relatively low data rate and generally space their packets out 330 in a manner that does not cause a queue to form in the network. 332 * Future capacity-seeking TCP/QUIC applications that adopt the 333 new L4S congestion control algorithm (see Appendix A) and so 334 can immediately respond to fast congestion signals sent by the 335 network. These applications are still in development, as 336 networks must first support L4S before applications are able to 337 take advantage, but some prime candidates are web browsing, 338 cloud VR, and interactive light field experiences. 340 Queue-building (QB) application flows are the source of queuing 341 delay, and today's non-queue-building (NQB) apps typically suffer 342 from the latency caused by the QB flows. 344 The purpose of the dual-queue mechanism is to segment queue-building 345 traffic from non-queue-building traffic in a manner that can be 346 readily implemented in DOCSIS 3.1 equipment and that doesn't alter 347 the overall bandwidth of the broadband service. 349 By segmenting these two types of applications into separate queues, 350 each can get optimal performance. The QB traffic can build a queue 351 and achieve the necessary and expected throughput performance, and 352 the NQB traffic can take advantage of the available lower latencies 353 by avoiding the delay caused by the QB flows. It is important to 354 note that this segmentation of traffic isn't for purposes of giving 355 one class of traffic benefits at the expense of the other - it isn't 356 a high-priority queue and a low-priority queue. 
Instead, each queue 357 is optimized for the distinct features and requirements of the two 358 classes of traffic, enabling increased functionality and adding value 359 for the broadband user. This is smart network management at work. 361 3.1. Low-Latency Aggregate Service Flows 363 DOCSIS 3.1 equipment, like equipment built against earlier versions 364 of the specification, supports a number of upstream and downstream 365 Service Flows (SFs). These Service Flows are logical pipes that are 366 defined by their configured Quality of Service (QoS) parameters (most 367 commonly, the rate shaping parameters [MULPIv3.1] that specify the 368 speed of user connections) and that carry a subset of the traffic to/ 369 from a particular CM, as specified by a set of packet classifiers 370 configured by the operator. Traditionally, each Service Flow 371 provides near-complete isolation of its traffic from the traffic 372 transiting other Service Flows (those on the same CM as well as those 373 on other CMs) - each Service Flow has its own buffer and queue and is 374 scheduled independently by the CMTS. 376 Typically, the operator defines a service offering via the 377 configuration of a single upstream Service Flow and a single 378 downstream Service Flow with rate shaping enabled, and all of the 379 user's traffic transits these two Service Flows. 381 The DOCSIS 3.1 specification already includes optional support in the 382 CMTS for a mechanism to group any number of the Service Flows serving 383 a particular CM. LLD leverages and extends this "Aggregate Service 384 Flow" (ASF) feature to establish (and group) a pair of Service Flows 385 in each direction specifically to enable low-latency services. One 386 of the Service Flows in the pair (the "Low Latency Service Flow") 387 will carry NQB traffic, and the other Service Flow (the "Classic 388 Service Flow") will carry QB traffic. The Aggregate Service Flow is 389 configured for the service's rate shaping setting, and the two 390 constituent Service Flows inside the Aggregate have rate shaping 391 disabled. The result is that the operator can configure the total 392 aggregate rate of the service offering in each direction and does not 393 have to configure (or even consider) how much of the user's traffic 394 is likely to be NQB vs QB. 396 Figure 2 in [LLD-white-paper] illustrates an example configuration of 397 broadband service as it might look in a current DOCSIS deployment, as 398 well as how it would look with Low Latency DOCSIS. In the 399 traditional configuration, there is a single downstream Service Flow 400 with a rate of 100 Mbps and a single upstream Service Flow with a 401 rate of 20 Mbps. In the LLD configuration, there is a single 402 downstream Aggregate Service Flow with a rate of 100 Mbps, containing 403 two individual Service Flows, one for Low Latency traffic and one for 404 Classic traffic. Similarly, there is single upstream Aggregate 405 Service Flow with a rate of 20 Mbps, containing two individual 406 Service Flows for Low Latency and Classic traffic. 408 The CMTS will enforce the Aggregate "Max Sustained Traffic Rate" 409 (AMSR), and the end-user's applications determine how much of the 410 aggregate bandwidth they consume irrespective of which SF they use - 411 just as they do today with a single DOCSIS SF. 413 As described later, Inter-Service-Flow scheduling is arranged to make 414 the ASF function as a single pool of bandwidth. 416 3.2. 
Identifying NQB Packets - Default Classifiers 418 By default, the traffic within an Aggregate Service Flow is segmented 419 into the two constituent Service Flows by a set of packet classifiers 420 (see Figure 3 in [LLD-white-paper]) that examine the Differentiated 421 Services (DiffServ) Field and the Explicit Congestion Notification 422 (ECN) Field, which are standard elements of the IPv4/IPv6 header 423 [RFC3168]. Specifically, packets with an NQB DiffServ value or an 424 ECN field indicating either ECN Capable Transport 1 (ECT(1)) or 425 Congestion Experienced (CE) will get mapped to the Low Latency 426 Service Flow, and the rest of the traffic will get mapped to the 427 Classic Service Flow. 429 As of the writing of this draft, it is proposed that the DiffServ 430 value 0x2A be standardized in IETF/IANA to indicate NQB 431 [I-D.white-tsvwg-nqb]. Certain existing DiffServ values may also be 432 classified as NQB by default, such as Expedited Forwarding (EF). 434 The expectation is that non-queue-building traffic sources 435 (applications) will either mark their packets with an NQB DiffServ 436 value or support ECN. 438 Although the DiffServ Field is being used to indicate NQB behavior, 439 that does not imply adoption of the Differentiated Services 440 architecture as it is typically understood. In the traditional 441 DiffServ architecture, applications indicate a desire for a 442 particular treatment of their packets - often implemented as a 443 priority level - which in essence conveys a value judgement as to the 444 importance of that traffic relative to the traffic of other 445 applications. Such an architecture can work just fine in a managed 446 environment where all applications conform to a common view of their 447 relative priority levels and so can be trusted to mark their packets 448 appropriately. It fails, however, when applications need to send 449 packets across trust boundaries between networks, where there would 450 be no common view on their relative importance. As a result, the 451 DiffServ architecture is often used within managed networks 452 (corporate networks, campus networks, etc.) but is not used on the 453 Internet. 455 LLD's usage of the DiffServ Field to indicate NQB sidesteps this 456 fundamental problem by eliminating the subjective value judgement on 457 the relative importance of applications. Instead, this usage of the 458 DiffServ Field describes objectively verifiable behavior on the part 459 of the application - that it will not build a queue. Therefore, 460 networks can verify that the marking has been applied properly before 461 a packet is allowed into the Low Latency Service Flow queue (see 462 Section 3.4). 464 The ECN classifiers enable LLD's support of the IETF's Low Latency 465 Low Loss Scalable throughput (L4S) service 466 [I-D.ietf-tsvwg-ecn-l4s-id], which is an evolution of the original 467 ECN facility to support applications needing both high bandwidth and 468 low latency (see Appendix A). 470 3.3. Coupled AQM 472 To manage queuing delay, both the Low Latency Service Flow queue and 473 the Classic Service Flow queue support Active Queue Management (AQM) 474 (see Figure 4 in [LLD-white-paper]). 476 In the case of the Classic Service Flow, the queue implements the 477 same state-of-the-art Active Queue Management techniques used in 478 today's DOCSIS 3.1 networks. 
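Returning briefly to Section 3.2, the default classification can be summarized in code form. The sketch below is illustrative only: the NQB code point 0x2A is still a proposal, the inclusion of EF is a default that operators may change, and the function name is hypothetical.

# Default mapping of a packet to a constituent Service Flow (Section 3.2).
NQB_DSCP = 0x2A          # proposed NQB code point [I-D.white-tsvwg-nqb]
EF_DSCP = 0x2E           # Expedited Forwarding (46), may also map by default
ECT1, CE = 0b01, 0b11    # ECN codepoints: ECT(1) and Congestion Experienced

def default_service_flow(dscp: int, ecn: int) -> str:
    """Return which Service Flow of the ASF a packet is classified to."""
    if dscp in (NQB_DSCP, EF_DSCP) or ecn in (ECT1, CE):
        return "low-latency"
    return "classic"

print(default_service_flow(dscp=NQB_DSCP, ecn=0b00))  # low-latency
print(default_service_flow(dscp=0, ecn=ECT1))         # low-latency (L4S)
print(default_service_flow(dscp=0, ecn=0b00))         # classic

As noted in Section 6.3, operators can override these default classifiers in the provisioning of the Aggregate Service Flow.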
For upstream Classic Service Flows, the 479 DOCSIS 3.1 specification mandates that the CM implement the DOCSIS- 480 PIE (Proportional-Integral-Enhanced AQM Algorithm), which introduces 481 packet drops at an appropriate rate to drive the queue delay to the 482 default target value of 10 ms. For downstream Classic Service Flows, 483 the AQM in the CMTS is still vendor specific. 485 In the case of the Low Latency Service Flow, the queue supports L4S 486 congestion controllers by implementing an Immediate Active Queue 487 Management algorithm that utilizes ECN marking instead of packet 488 drops. By default, the algorithm does not mark the packet if the 489 queuing delay is less than 0.475 milliseconds and always marks the 490 packet if the delay is greater than 1 ms. Between those configurable 491 values, the algorithm marks at a rate that ramps up from 0% to 100% 492 over the range. In addition, per [I-D.ietf-tsvwg-aqm-dualq-coupled], 493 the Immediate AQM in the Low Latency Queue is coupled to the Classic 494 Queue AQM so that congestion in the Classic Queue will induce ECN 495 marking in the Low Latency Queue that will act to balance the per- 496 flow throughput across all of the flows in both queues. L4S 497 congestion control and the role of the dual-queue-coupled-aqm in 498 providing flow balance is described further in Appendix A. 500 To enable the Low Latency Queue to rapidly dequeue an arrived burst 501 of traffic, the Inter-Service-Flow scheduler gives a higher weight to 502 the Low Latency Queue than it does to the Classic Queue. The 503 coupling to the Low Latency AQM counterbalances the weighted 504 scheduler by making low-latency applications leave space for Classic 505 traffic. This ensures that the weighted scheduler does not give 506 priority over bandwidth, as a traditional weighted scheduler would. 508 3.4. Queue Protection 510 Because of the small buffer size of the Low Latency Queue, classic 511 TCP flows or other queue-building flows would see poor performance 512 (due to high packet loss) if they were to end up in the Low Latency 513 Queue. In addition, they would destroy the latency performance for 514 the non-queue-building flows, negating the primary benefits of LLD. 516 To prevent this situation, the packets that are classified to the Low 517 Latency queue pass through a "Queue Protection" function (see 518 Figure 5 in [LLD-white-paper]), which scores each flow's contribution 519 to the growth of the queue. If the queue delay exceeds a threshold, 520 the Queue Protection function identifies the flow or flows that have 521 contributed most to the growth of the queue delay, and it redirects 522 future packets from those flows to the Classic Service Flow. This 523 mechanism is performed objectively and statistically, without 524 examining the identifiers or contents of the data being transmitted. 526 4. Upstream Scheduling Improvements 528 The DOCSIS upstream Media Access Control (MAC) Layer uses a request- 529 grant mechanism. When data to be transmitted arrive at the CM, a 530 request message is sent from the CM to the CMTS. The CMTS schedules 531 the individual transmission bursts for all the CMs and communicates 532 this via a bandwidth allocation map (MAP) message. Each MAP message 533 describes the upstream transmission opportunities (grants) for a time 534 interval and is sent shortly before the interval to which it applies. 536 When a CM has data to send, it waits for a "contention request" 537 transmission opportunity. 
During that opportunity, it sends a short 538 request message indicating the amount of data it has to send. It 539 then waits for a subsequent MAP message granting it a transmission 540 opportunity in which to send its data. This time interval between 541 the arrival of the packet at the CM and the time at which the data 542 arrives at the CMTS on the upstream channel is known as the Request- 543 Grant Delay (see Figure 6 in [LLD-white-paper]). In the absence of 544 queuing delay, this delay is generally 2-8 ms. 546 4.1. Faster Request Grant Loop 548 LLD lowers the request-grant delay by requiring support for a shorter 549 MAP Interval and a shorter MAP Processing Time (see Figure 7 in 550 [LLD-white-paper]). 552 The MAP interval is the amount of time that each MAP message 553 describes. The MAP interval is also the time interval between 554 consecutive MAP messages. Reducing the MAP interval means that the 555 CMTS processes incoming requests more frequently, thus shortening the 556 amount of time that a request might wait at the CMTS before being 557 processed. A shorter MAP interval also means that grants are not 558 scheduled as far into the future within each MAP message. 560 The MAP Processing Time is the amount of time the CMTS uses to 561 perform its scheduling calculations. With a shorter MAP Processing 562 Time, there is less delay between a request being received at the 563 CMTS and the resulting grant being scheduled. 565 The LLD specification requires support for a nominal MAP interval of 566 1 ms or less for OFDMA upstream channels, in place of the 2-4 ms used 567 previously. In certain configurations, a 1 ms MAP interval may 568 introduce tradeoffs such as upstream and/or downstream inefficiency 569 that will need to be weighed against the latency improvement. 571 4.2. Proactive Grant Service 573 DOCSIS scheduling services are designed to customize the behavior of 574 the request-grant process for particular traffic types. LLD 575 introduces a new scheduling service called Proactive Grant Service 576 (PGS), which can eliminate the request-grant loop entirely (see 577 Figure 8 in [LLD-white-paper]). 579 In PGS, a CMTS proactively schedules a stream of grants to a Service 580 Flow at a rate that is intended to match or exceed the instantaneous 581 demand. In doing so, the vast majority of packets carried by the 582 Service Flow can be transmitted without being delayed by the Request- 583 Grant process. During periods when the CMTS estimates no demand for 584 bandwidth for a particular PGS Service Flow, it can conserve 585 bandwidth by providing periodic unicast request opportunities rather 586 than a stream of grants. 588 The service parameters that are specific to PGS are Guaranteed Grant 589 Interval (GGI), Guaranteed Grant Rate (GGR), and Guaranteed Request 590 Interval (GRI). In addition, the traditional rate-shaping 591 parameters, such as Maximum Sustained Traffic Rate and Peak Rate, 592 serve as an upper bound on the grants that can be provided to a PGS 593 Service Flow. 595 PGS can eliminate the delay caused by the Request-Grant loop, but it 596 comes at the price of efficiency. Inevitably, the CMTS will not be 597 able to exactly predict the instantaneous demand for the Service 598 Flow, so it may overestimate the capacity needed. When the shared 599 channel is fully utilized, this could reduce the capacity available 600 to other Service Flows. 602 The PGS scheduling type may appear at first to be similar to an 603 existing DOCSIS upstream scheduling type "UGS/AD." 
The main 604 differences with PGS are that it sets a minimum floor on the level of 605 granting (minimum grant spacing and minimum granted bandwidth) rather 606 than setting a fixed grant pattern (fixed grant size and precise 607 grant spacing), it supports the "Continuous Concatenation and 608 Fragmentation" method of filling grants (where a contiguous sequence 609 of bytes are dequeued to fill the grant, regardless of packet 610 boundaries) rather than only carrying a single packet in each grant, 611 and the CM is expected to continue to send Requests to the CMTS to 612 inform it of packets that might be waiting in the queue. 614 5. Low Latency DOCSIS Performance 616 CableLabs has developed a simulator using the NS3 platform 617 () in order to evaluate the performance of 618 different aspects of LLD. The simulator models a DOCSIS 3.1 link 619 (OFDM/A channel types) between the CM and the CMTS and can be 620 configured to enable or disable various components of the technology. 622 Because the latency performance of the service depends on the mix of 623 applications in use by the customer, we have developed a set of 10 624 traffic mix scenarios that represent what we believe to be common 625 busy-hour behaviors for a cable customer. All traffic mixes include 626 two bidirectional UDP sessions that are modeled after online games, 627 but they could also represent VoIP or video conferencing/chatting 628 applications. One of the sessions has its packets marked as NQB and 629 the other does not, allowing us to see the benefit that the low- 630 latency queue provides. 632 In addition, each traffic mix has a set of other applications that 633 create background load, as summarized in Table 2 (see Appendix B for 634 details on the traffic types). All of this background load traffic 635 utilizes the classic queue. 637 Some of these traffic mixes represent behaviors that may be very 638 common for broadband users during busy hour, whereas others represent 639 more extreme behaviors that users may occasionally engage in. When 640 generating an overall view of the performance across all of the 641 traffic mixes, we model the fact that they may not all be equally 642 likely to occur by giving the more common mixes (1, 2, and 8) ten 643 times the weight that we give to each of the other less common mixes. 645 TABLE2. BACKGROUND TRAFFIC MIXES 647 +----------------+--------------------------------------------------+ 648 | Traffic Mix 1 | 1 web user | 649 | Traffic Mix 2 | 1 web user, 1 video streaming user | 650 | Traffic Mix 3 | 1 web user, 1 FTP upstream | 651 | Traffic Mix 4 | 1 web user, 1 FTP downstream | 652 | Traffic Mix 5 | 1 web user, 1 FTP upstream and 1 FTP downstream | 653 | Traffic Mix 6 | 1 web user, 5 FTP upstream and 5 FTP downstream | 654 | Traffic Mix 7 | 1 web user, 5 FTP up, 5 FTP down, and 2 video | 655 | | streaming users | 656 | Traffic Mix 8 | 5 web users | 657 | Traffic Mix 9 | 16 TCP down (speedtest) | 658 | Traffic Mix 10 | 8 TCP up (speedtest) | 659 +----------------+--------------------------------------------------+ 661 Table 2 663 Table 3 summarizes the 99th percentile per-packet latency for the 664 NQB-marked game traffic across all ten traffic mixes, as well as the 665 weighted overall performance, for four different systems: 667 1. a legacy DOCSIS 3.1 system with AQM disabled, 2 ms MAP interval; 669 2. a legacy DOCSIS 3.1 system with AQM enabled, 2 ms MAP interval; 671 3. a Low Latency DOCSIS 3.1 system without PGS, 1 ms MAP interval; 672 and 674 4. 
a Low Latency DOCSIS 3.1 system with PGS configured for 5 Mbps 675 GGR, 1 ms MAP interval. 677 We include LLD with and without PGS because some network operators 678 may wish to deploy LLD without the overhead that comes with PGS 679 scheduling. 681 TABLE 3. 99TH PERCENTILE ROUND-TRIP LATENCY FOR NQB-MARKED TRAFFIC 682 BETWEEN THE CM AND CMTS 684 +-----------+-------------+------------+--------------+-------------+ 685 | | Legacy | Legacy | Low Latency | Low Latency | 686 | | DOCSIS 3.1 | DOCSIS 3.1 | DOCSIS with | DOCSIS with | 687 | | with no AQM | with AQM | no PGS | PGS | 688 +-----------+-------------+------------+--------------+-------------+ 689 | Traffic | 7.7 ms | 7.7 ms | 4.7 ms | 0.9 ms | 690 | Mix 1 | | | | | 691 | Traffic | 7.7 ms | 7.7 ms | 4.8 ms | 0.9 ms | 692 | Mix 2 | | | | | 693 | Traffic | 159.5 ms | 36.6 ms | 4.7 ms | 0.9 ms | 694 | Mix 3 | | | | | 695 | Traffic | 7.8 ms | 7.9 ms | 4.7 ms | 0.9 ms | 696 | Mix 4 | | | | | 697 | Traffic | 159.6 ms | 57.4 ms | 4.7 ms | 0.9 ms | 698 | Mix 5 | | | | | 699 | Traffic | 253.7 ms | 96.7 ms | 4.7 ms | 0.9 ms | 700 | Mix 6 | | | | | 701 | Traffic | 253.9 ms | 74.7 ms | 4.7 ms | 0.9 ms | 702 | Mix 7 | | | | | 703 | Traffic | 7.7 ms | 7.7 ms | 4.7 ms | 0.9 ms | 704 | Mix 8 | | | | | 705 | Traffic | 259.3 ms | 52.1 ms | 4.8 ms | 0.9 ms | 706 | Mix 9 | | | | | 707 | Traffic | 254.0 ms | 34.1 ms | 4.8 ms | 0.9 ms | 708 | Mix 10 | | | | | 709 | Weighted | 250.5 ms | 32.4 ms | 4.7 ms | 0.9 ms | 710 | Overall | | | | | 711 | P99 | | | | | 712 +-----------+-------------+------------+--------------+-------------+ 714 Table 3 716 As can be seen in this table, there are several traffic mixes 717 (notably 1, 2, 4, and 8) for which the relatively light traffic load 718 doesn't create the conditions for TCP to cause significant queuing 719 delay, so even the "Legacy DOCSIS 3.1 with no AQM" system results in 720 fairly low latency. However, in the heavier traffic mixes, the 721 benefit of AQM can be seen and the benefit of the dual-queue 722 mechanism in LLD becomes very apparent. By separating the NQB-marked 723 traffic from the queue-building traffic, the NQB-marked traffic is 724 isolated from the delay created by the TCP flows entirely, and very 725 reliable low latency is achieved. The right-most system, which 726 additionally implements PGS, can eliminate the request-grant delay 727 for the NQB traffic and thereby drive the round-trip latency below 1 728 ms at 99th percentile. 730 Figure 9 in [LLD-white-paper] illustrates the weighted overall 731 latency performance across all ten traffic mixes. The plot is a log- 732 log complementary cumulative distribution function, with the y-axis 733 labeled with the equivalent quantile values. 735 Focusing, for instance, on the horizontal through the 99th percentile 736 (P99), it can be seen that LLD with PGS holds delay below 0.9 ms for 737 99% of packets. In contrast, a DOCSIS 3.1 network without AQM can 738 only hold delay below 250 ms for 99% of packets. So, P99 delay is 739 more than 250 times better with LLD. We therefore see that LLD will 740 bring a consistent, low-latency, responsive quality to cable 741 broadband performance and user experiences for NBQ traffic. 743 6. Deployment Considerations 745 6.1. Device Support 747 Deploying LLD in the MSO network can be accomplished via software- 748 only upgrades to the existing DOCSIS 3.1 CMs and CMTSs. Table 4 749 shows which LLD features need implementation on the CM side, the CMTS 750 side, or both. 
The Dual Queue feature in the upstream requires an 751 upgrade to the CM as well as to the CMTS. The other features (Dual 752 Queue in Downstream, Upstream Scheduling improvements) only require 753 upgrades on the CMTS, so they can be deployed to CMs that don't 754 support LLD (including DOCSIS 3.0 modems). 756 TABLE 4. DEVICE DEPENDENCIES FOR LLD FEATURES 758 +------------+------------+-------------+-------------+-------------+ 759 | LLD | Downstream | Downstream | Upstream | Upstream | 760 | Feature | Latency Im | Latency Imp | Latency Imp | Latency Imp | 761 | | provements | rovements - | rovements - | rovements - | 762 | | - CMTS | CM upgrade? | CMTS | CM upgrade? | 763 | | upgrade? | | upgrade? | | 764 +------------+------------+-------------+-------------+-------------+ 765 | Dual Queue | Required | Not | Required | Required | 766 | (ASF, | | required | | | 767 | Coupled | | | | | 768 | AQM, QP) | | | | | 769 | Upstream | Not | Not | Required | Not | 770 | Scheduling | applicable | applicable | | required | 771 | (Faster | | | | | 772 | Req-Grant | | | | | 773 | Loop, PGS) | | | | | 774 +------------+------------+-------------+-------------+-------------+ 776 Table 4 778 6.2. Packet Marking 780 The design of LLD takes the approach that applications are in the 781 best position to determine which flows or which packets are non- 782 queue-building. Thus, applications such as online games will be able 783 to tag their packets with the NQB DiffServ value to indicate that 784 they behave in a non-queue-building way, so that LLD will be able to 785 classify them into the Low Latency Service Flow. 787 For these packet markings to be useful for the LLD classifiers, they 788 will need to survive the journey from the application source to the 789 CM or CMTS. In some cases, operators today clear the DiffServ Field 790 in packets entering their network from an interconnecting network, 791 which would prevent the markings making their way to the CMTS. This 792 practice is presumably driven by the view that DiffServ Field usage 793 is defined by each operator for use within its network, in which case 794 preserving another network's markings has no value. As was described 795 in Section 3.2, it is proposed that a single globally standard value 796 be chosen to indicate NQB so that operators that intend to support 797 LLD can ensure that this specific value traverses their inbound 798 interconnects and their network and then arrives at the CMTS intact. 800 Although application marking is preferable, some network operators 801 might want to provide immediate benefits to applications that behave 802 in a non-queue-building way, in advance of application developers 803 introducing support for NQB tagging. It might be possible to 804 repurpose the queue protection function to identify NQB behavior even 805 if the packets are not tagged as NQB, e.g., by assuming that all non- 806 TCP traffic is likely to be NQB and relying on queue protection to 807 redirect the QB flows. This is currently an area of active research. 809 Further, it is possible that intermediary software or devices (either 810 installed by the user or provided by the operator) could identify 811 flows that are expected to be NQB and mark the packets on behalf of 812 the application. 814 6.3. Provisioning Mechanisms 816 The LLD specifications include provisioning mechanisms to allow an 817 MSO to deploy low-latency features with minimal operational impact. 
818 Figure 10 in [LLD-white-paper] shows all the pieces needed to build a 819 low-latency service in the upstream and downstream direction. 820 Although it is possible to define a Low Latency ASF, its constituent 821 Classic and Low Latency SFs, and the associated classifiers 822 explicitly in the CM's configuration file, a new feature known as the 823 Aggregate QoS Profile can make this configuration automatic in many 824 cases. Default classifiers will be created and default parameters 825 for AQM and queue protection will be used, or any of these can be 826 overridden by the operator as needed. 828 6.3.1. Aggregate QoS Profiles 830 Similar to Service Class Names that are expanded by the CMTS into a 831 set of QoS parameters for a Service Flow during the registration 832 process, an operator can create an Aggregate QoS Profile (AQP) on the 833 CMTS to describe the parameters of an Aggregate Service Flow, its 834 constituent Service Flows, and the classifiers used to identify NQB 835 traffic. 837 Just like with Service Class Names, the operator can also provide 838 explicit values in the configuration file for any ASF or SF 839 parameters that they wish to "override". 841 6.3.2. Migration Using Existing Configuration File and Service Class 842 Name 844 One very straightforward way to migrate to LLD configurations may not 845 involve any changes to the CM configuration file. This method 846 involves the automatic expansion of a Service Flow definition to a 847 Low Latency ASF via the use of a Service Class Name and matching AQP 848 definition. 850 When the CMTS sees a Service Class Name in a Service Flow definition 851 from the CM's config file, if the CM indicates support for LLD, then 852 the CMTS will first use the Service Class Name as an AQP Name and 853 look for a matching entry in the AQP Table. If it finds a matching 854 entry, it will automatically expand the Service Flow into an ASF and 855 two Service Flows. 857 This mechanism allows the operator to deploy LLD by simply updating 858 the CMTS to support the feature and configuring AQP entries that 859 match the Service Class Names in use in CM config files. Then, as 860 CMs are updated over time to include support for LLD, they will 861 automatically start being configured with a Low Latency ASF. 863 6.3.3. Explicit Definition of ASF in the Configuration File 865 An operator can also encode a Low Latency ASF in a CM configuration 866 file directly using an Aggregate Service Flow TLV (70 or 71). The 867 ASF TLV could have an AQP Name that is used by the CMTS to look up a 868 definition of the ASF in its AQP Table. It could also have ASF 869 parameters that would explicitly define the ASF or would override the 870 AQP parameters. A configuration could also have explicit individual 871 Service Flow TLVs (24 or 25) that are linked to the ASF via the 872 Aggregate Service Flow Reference TLV. 874 6.4. Latency Histogram Reporting 876 As part of the AQM operation, CMs and CMTSs generate estimates of the 877 queuing latency for the upstream and downstream Service Flows, 878 respectively. The latency histogram reporting function exposes these 879 estimates to the operator to provide information that can be utilized 880 to characterize network performance, optimize configurations, or 881 troubleshoot problems in the field. 883 This latency histogram reporting can be enabled via a configuration 884 file setting or can be initiated by setting a MIB object on the 885 device. 
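Conceptually, the reporting amounts to counting each packet's queuing-delay estimate into a set of operator-configured bins while tracking the maximum value seen. The sketch below illustrates that bookkeeping; it is hypothetical code, not the data model defined in the OSSI specifications, and the bin edges shown are arbitrary examples.

# Hypothetical sketch of latency histogram bookkeeping.
import bisect

class LatencyHistogram:
    def __init__(self, bin_edges_ms):
        self.edges = sorted(bin_edges_ms)           # upper edges of the bins
        self.counts = [0] * (len(self.edges) + 1)   # last bin counts overflow
        self.max_ms = 0.0

    def record(self, latency_ms: float) -> None:
        """Count one packet's queuing-delay estimate and track the maximum."""
        self.counts[bisect.bisect_left(self.edges, latency_ms)] += 1
        self.max_ms = max(self.max_ms, latency_ms)

hist = LatencyHistogram([1, 5, 10, 25, 50, 100])   # example bin edges, in ms
for sample in (0.4, 0.8, 3.2, 12.0, 240.0):        # example delay estimates
    hist.record(sample)
print(hist.counts, hist.max_ms)   # [2, 1, 0, 1, 0, 0, 1] 240.0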
The operator configures the bins of the histogram, and the 886 CM or the CMTS logs the number of packets with recorded latencies 887 into each of the bins. The CM implements histograms for upstream 888 Service Flows, and the CMTS implements histograms for downstream 889 Service Flows. (This function can be enabled even for Service Flows 890 for which AQM is disabled.) The latency estimates from the AQM are 891 represented in the form of a histogram as well as a maximum latency 892 value. See Figure 11 in [LLD-white-paper]. 894 7. Conclusion 896 LLD enables a huge leap in latency performance and will improve the 897 Internet experience overall. With LLD, online gaming will become 898 more responsive and video chats will cease to be "choppy." This 899 technology will enable a range of new applications that require real- 900 time interface between the cyber and physical worlds, such as 901 vehicular communications and remote health care services. 903 To realize the benefits of LLD, a number of parties need to take 904 action. DOCSIS equipment manufacturers will need to develop and 905 integrate the LLD features into software updates for CMTSs and CMs. 906 Cable operators need to plan the roll-out of software updates and 907 configurations to DOCSIS equipment and set up the network to support 908 those services (e.g., carrying DiffServ/ECN markings through the 909 network). Application and operating system vendors will need to 910 adopt packet marking for NQB traffic and/or adopt the L4S congestion 911 controller. Each element of the Internet ecosystem will make these 912 decisions independently; the faster that all take the necessary 913 steps, the more quickly the user experience will improve. 915 The cable industry has provisioned its network with substantial 916 bandwidth and is poised to take another leap forward with its 10G 917 networks. But more bandwidth is only part of the broadband 918 performance story. Latency is becoming crucial to the evolution of 919 broadband. That is why LLD is a cornerstone of cable's 10G future. 921 8. Acknowledgements 923 CableLabs would like to thank the participants of the Low Latency 924 DOCSIS Working Group, representing ARRIS, Broadcom, Casa, Charter, 925 Cisco, Comcast, Cox Communications, Huawei, Intel, Liberty Global, 926 Nokia, Rogers, Shaw, Videotron 928 9. IANA Considerations 930 None 932 10. Security Considerations 934 TBD 936 11. Informative References 938 [DOCSIS-CCAP-OSSIv3.1] 939 Cable Television Laboratories, Inc., "DOCSIS 3.1 CCAP 940 Operations Support System Interface Specification, CM-SP- 941 CCAP-OSSIv3.1-I14-190121", January 21, 2019, 942 . 945 [DOCSIS-CM-OSSIv3.1] 946 Cable Television Laboratories, Inc., "DOCSIS 3.1 Cable 947 Modem Operations Support System Interface Specification, 948 CM-SP-CM-OSSIv3.1-I14-190121", January 21, 2019, 949 . 952 [DOCSIS-MULPIv3.1] 953 Cable Television Laboratories, Inc., "MAC and Upper Layer 954 Protocols Interface Specification, CM-SP- 955 MULPIv3.1-I17-190121", January 21, 2019, 956 . 959 [I-D.ietf-tsvwg-aqm-dualq-coupled] 960 Schepper, K., Briscoe, B., Bondarenko, O., and I. Tsang, 961 "DualQ Coupled AQMs for Low Latency, Low Loss and Scalable 962 Throughput (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-08 963 (work in progress), November 2018. 965 [I-D.ietf-tsvwg-ecn-l4s-id] 966 Schepper, K. and B. Briscoe, "Identifying Modified 967 Explicit Congestion Notification (ECN) Semantics for 968 Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s- 969 id-05 (work in progress), November 2018. 
971 [I-D.ietf-tsvwg-l4s-arch] 972 Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency, 973 Low Loss, Scalable Throughput (L4S) Internet Service: 974 Architecture", draft-ietf-tsvwg-l4s-arch-03 (work in 975 progress), October 2018. 977 [I-D.white-tsvwg-nqb] 978 White, G., "Identifying and Handling Non Queue Building 979 Flows in a Bottleneck Link", draft-white-tsvwg-nqb-00 980 (work in progress), October 2018. 982 [LLD-white-paper] 983 White, G., Sundaresan, K., and B. Briscoe, "Low Latency 984 DOCSIS: Technology Overview", February 2019, 985 . 988 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 989 of Explicit Congestion Notification (ECN) to IP", 990 RFC 3168, DOI 10.17487/RFC3168, September 2001, 991 . 993 [RFC8034] White, G. and R. Pan, "Active Queue Management (AQM) Based 994 on Proportional Integral Controller Enhanced PIE) for 995 Data-Over-Cable Service Interface Specifications (DOCSIS) 996 Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February 997 2017, . 999 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1000 Notification (ECN) Experimentation", RFC 8311, 1001 DOI 10.17487/RFC8311, January 2018, 1002 . 1004 [web-user-model] 1005 3GPP, "3GPP2-TSGC5, HTTP, FTP and TCP models for 1xEV-DV 1006 simulations", 2001. 1008 Appendix A. Low Latency and High Bandwidth: L4S 1010 How can LLD support applications that want maximum speed, and low 1011 latency too? CableLabs is working with the Internet Engineering Task 1012 Force to make this a reality through a new technology called L4S: Low 1013 Latency Low Loss Scalable throughput [I-D.ietf-tsvwg-l4s-arch]. 1015 L4S improves many of today's applications (e.g., video chat, 1016 everything on the web), but it will also enable future applications 1017 that will need both high bandwidth and low delay, such as HD video 1018 conferencing, cloud-rendered interactive video, cloud-rendered 1019 virtual reality, augmented reality, remote presence with remote 1020 control, interactive light field experiences, and others yet to be 1021 invented. 1023 L4S involves incremental changes to the congestion controller on the 1024 sender and to the AQM at the bottleneck. The key is to indicate 1025 congestion by marking packets using Explicit Congestion Notification 1026 (ECN) rather than discarding packets. L4S uses the 2-bit ECN field 1027 in the IP header (v4 or v6) and defines each marked packet to 1028 represent a lower strength of congestion signal [RFC8311] than the 1029 original ECN standard. All the benefits of L4S follow from that. 1031 o Low Latency: The sender's L4S congestion controller makes small 1032 but frequent rate adjustments dependent on the proportion of ECN 1033 marked packets, and the L4S AQM starts applying ECN-marks to 1034 packets at a very shallow buffer threshold. This means an L4S 1035 queue can ripple at the very bottom of the buffer with sub- 1036 millisecond queuing delay but still fully utilize the link. 1037 Small, frequent adjustments could not even be considered if packet 1038 discards were used instead of ECN - they would induce a 1039 prohibitively high loss level. Further, AQMs could not consider a 1040 very shallow threshold if small adjustments were not used, as 1041 severe link under-utilization would result. 1043 o Low Loss: By definition, using ECN eliminates packet discard. In 1044 turn, that eliminates retransmission delays, which particularly 1045 impact the responsiveness of short web-like exchanges of data. 
1046 Using ECN eliminates both the round-trip delay repairing a loss 1047 and the delay while detecting a loss. In addition, an L4S AQM can 1048 immediately signal queue growth using ECN, catching queue growth 1049 early. In contrast, classic AQMs hold back from discarding a 1050 packet for 100-200 ms because if a burst subsides of its own 1051 accord, a loss in itself could cause more harm than the good it 1052 would do as a signal to slow down. Furthermore, eliminating 1053 packet discard eliminates the collateral damage caused to flows 1054 that were not significantly contributing to congestion. 1056 o Scalable Throughput: Existing congestion control algorithms don't 1057 scale, so applications need to open many simultaneous connections 1058 to fully utilize today's broadband connections. An L4S congestion 1059 controller can rapidly ramp up its sending rate to match any link 1060 capacity. This is because L4S uses a "scalable congestion 1061 controller" that maintains the same frequency of control signals 1062 (2 ECN marks per round trip on average) regardless of flow rate. 1063 With classic congestion controllers, the faster they try to go, 1064 the longer they run blind without any control signals. 1066 The technology behind L4S isn't new; it is based on a scalable 1067 congestion control called Data Center TCP (DCTCP) that is currently 1068 used in data centers to get very high throughputs with ultra-low 1069 delay and loss. What is new is the development of a way that 1070 scalable traffic can coexist with the existing TCP and QUIC traffic 1071 on the Internet - the key that unlocks a transition to L4S. Until 1072 now, DCTCP has been confined to data centers because it would starve 1073 any classic flows sharing a link. 1075 Separation into two queues serves two purposes: (1) it isolates L4S 1076 flows from the queuing of classic TCP and QUIC and (2) it sends each 1077 type of traffic appropriately scaled congestion signals. This 1078 results in any number of application flows (of either type) all 1079 getting roughly equal bandwidth each, as if there were just one 1080 aggregate pool of bandwidth, with no division between the Service 1081 Flows. 1083 The approach couples the levels of ECN and drop signaling, as shown 1084 in Figure 12 in [LLD-white-paper]. The packet rate of today's 1085 classic congestion controls conforms to the well-known square-root 1086 rule (on the left of the figure). So, the classic AQM applies a drop 1087 level to Classic traffic that is coupled to the square of the ECN 1088 marking level being applied to Low Latency traffic. The squaring in 1089 the network counterbalances the square root at the sender, so the 1090 packet rates of the two types of flow turn out roughly the same. 1092 Supporting L4S in LLD is relatively straightforward. All that is 1093 needed is to classify L4S flows into the Low Latency SF and support 1094 the logic in the Low Latency SF to perform immediate ECN marking of 1095 packets (see Section 3.2). 1097 Appendix B. Simulation Details 1099 For the results reported in this paper, we set up the following 1100 network with 5 types of client devices behind the CM and a set of 1101 servers north of the CMTS. See Figure 13 in [LLD-white-paper]. The 1102 link delays shown are 1-way values. The DOCSIS link is configured in 1103 the most latency-efficient manner (short interleavers, small OFDMA 1104 frame sizes) and models a plant distance of 8 km. 
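For context, the one-way propagation delay implied by that 8 km plant distance is only a few tens of microseconds; the small calculation below assumes a typical HFC velocity factor of roughly 0.87c, which is an assumption rather than a figure from the specifications.

# One-way propagation delay over the modeled 8 km plant.
C_VACUUM = 299_792_458   # speed of light in vacuum, m/s
VELOCITY_FACTOR = 0.87   # assumed typical value for HFC plant

def propagation_delay_us(distance_km: float) -> float:
    """One-way propagation delay in microseconds."""
    return distance_km * 1e3 / (VELOCITY_FACTOR * C_VACUUM) * 1e6

print(round(propagation_delay_us(8.0), 1), "us")   # ~30.7 us one way

As Section 2 notes, this makes propagation a minor contributor compared with queuing and media acquisition delay.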
The service is 1105 configured with a Maximum Sustained Traffic Rate (rate limit) of 50 1106 Mbps in the upstream direction and 200 Mbps in the downstream 1107 direction. 1109 The upstream game traffic model involves normally distributed packet 1110 interarrival times (mu=33 ms, sigma=3 ms) and normally distributed 1111 packet sizes (mu=110 bytes, sigma=20 bytes) constrained to discard 1112 draws of packet size <32 bytes or >188 bytes. The downstream game 1113 traffic model involves normally distributed packet interarrival times 1114 (mu=33 ms, sigma=5 ms) and normally distributed packet sizes (mu=432 1115 bytes, sigma=20 bytes) constrained to discard draws of packet size 1116 <32 bytes or >832 bytes. 1118 The background load traffic is configured as follows. The web user 1119 is based on the 3GPP standardized web user model [web-user-model]. 1120 The video streaming model is an abstracted model of a Dynamic 1121 Adaptive Streaming over HTTP (DASH) streaming video user where the 1122 video stream is 6 Mbps and is implemented as a 3.75 MB file download 1123 every 5 seconds. Each FTP session involves the sender selecting a 1124 file size using a log-normal random variable (mu=14.8, sigma=2.0, 1125 leading to a median file size of 2.7 MB), opening a TCP connection, 1126 sending the file, closing the TCP connection, then pausing for 100 ms 1127 before repeating the process. Although we refer to this model as an 1128 FTP model, the intention is that it models TCP usage across all 1129 applications other than web browsing and video streaming. 1131 Authors' Addresses 1132 Greg White 1133 CableLabs 1134 858 Coal Creek Circle 1135 Louisville, CO 80027 1136 US 1138 Email: g.white@cablelabs.com 1140 Karthik Sundaresan 1141 CableLabs 1142 858 Coal Creek Circle 1143 Louisville, CO 80027 1144 US 1146 Email: k.sundaresan@cablelabs.com 1148 Bob Briscoe 1149 CableLabs 1150 UK 1152 Email: b.briscoe-contractor@cablelabs.com