Transport Area Working Group                             B. Briscoe, Ed.
Internet-Draft                                        Simula Research Lab
Intended status: Informational                                        K.
                                                             De Schepper
Expires: September 14, 2017                              Nokia Bell Labs
                                                        M. Bagnulo Braun
                                      Universidad Carlos III de Madrid
                                                          March 13, 2017

   Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service:
                              Architecture
                      draft-briscoe-tsvwg-l4s-arch-01

Abstract

   This document describes the L4S architecture for the provision of a
   new service that the Internet could provide to eventually replace
   best efforts for all traffic: Low Latency, Low Loss, Scalable
   throughput (L4S).  It is becoming common for _all_ (or most)
   applications being run by a user at any one time to require low
   latency.  However, the only solution the IETF can offer for ultra-low
   queuing delay is Diffserv, which only favours a minority of packets
   at the expense of others.  In extensive testing the new L4S service
   keeps average queuing delay under a millisecond for _all_
   applications even under very heavy load, without sacrificing
   utilization, and it keeps congestion loss to zero.  It is becoming
   widely recognized that adding more access capacity gives diminishing
   returns, because latency is becoming the critical problem.  Even with
   high capacity broadband access, the reduced latency of L4S remarkably
   and consistently improves performance under load for applications
   such as interactive video, conversational video, voice, Web, gaming,
   instant messaging, remote desktop and cloud-based apps (even when all
   are being used at once over the same access link).  The insight is
   that the root cause of queuing delay is in TCP, not in the queue.  By
   fixing the sending TCP (and other transports), queuing latency
   becomes so much better than today that operators will want to deploy
   the network part of L4S to enable new products and services.
   Further, the network part is simple to deploy - incrementally, with
   zero-config.  Both parts, sender and network, ensure coexistence with
   other legacy traffic.
   At the same time, L4S solves the long-recognized problem with the
   future scalability of TCP throughput.

   This document describes the L4S architecture, briefly describing the
   different components and how they work together to provide the
   aforementioned enhanced Internet service.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . .   3
   2. L4S architecture overview  . . . . . . . . . . . . . . . . .   4
   3. Terminology  . . . . . . . . . . . . . . . . . . . . . . . .   6
   4. L4S architecture components  . . . . . . . . . . . . . . . .   7
   5. Rationale  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     5.1. Why These Primary Components?  . . . . . . . . . . . . .   9
     5.2. Why Not Alternative Approaches?  . . . . . . . . . . . .  10
   6. Applicability  . . . . . . . . . . . . . . . . . . . . . . .  12
     6.1. Use Cases  . . . . . . . . . . . . . . . . . . . . . . .  13
     6.2. Deployment Considerations  . . . . . . . . . . . . . . .  14
   7. IANA Considerations  . . . . . . . . . . . . . . . . . . . .  14
   8. Security Considerations  . . . . . . . . . . . . . . . . . .  14
     8.1. Traffic (Non-)Policing . . . . . . . . . . . . . . . . .  15
     8.2. 'Latency Friendliness' . . . . . . . . . . . . . . . . .  15
     8.3. Policing Prioritized L4S Bandwidth . . . . . . . . . . .  16
     8.4. ECN Integrity  . . . . . . . . . . . . . . . . . . . . .  16
   9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . .  17
   10. References  . . . . . . . . . . . . . . . . . . . . . . . .  17
     10.1. Normative References  . . . . . . . . . . . . . . . . .  17
     10.2. Informative References  . . . . . . . . . . . . . . . .  18
   Appendix A. Required features for scalable transport protocols
               to be safely deployable in the Internet (a.k.a. TCP
               Prague requirements) . . . . . . . . . . . . . . . .  22
   Appendix B. Standardization items  . . . . . . . . . . . . . . .  26
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . .  28

1. Introduction

   It is increasingly common for _all_ of a user's applications at any
   one time to require low delay: interactive Web, Web services, voice,
   conversational video, interactive video, instant messaging, online
   gaming, remote desktop and cloud-based applications.  In the last
   decade or so, much has been done to reduce propagation delay by
   placing caches or servers closer to users.  However, queuing remains
   a major, albeit intermittent, component of latency.
   When present, it typically doubles the path delay relative to the
   base speed-of-light delay.  Low loss is also important because, for
   interactive applications, losses translate into even longer
   retransmission delays.

   It has been demonstrated that, once access network bit rates reach
   levels now common in the developed world, increasing capacity offers
   diminishing returns if latency (delay) is not addressed.
   Differentiated services (Diffserv) offers Expedited Forwarding
   [RFC3246] for some packets at the expense of others, but this is not
   applicable when all (or most) of a user's applications require low
   latency.

   Therefore, the goal is an Internet service with ultra-Low queueing
   Latency, ultra-Low Loss and Scalable throughput (L4S) - for _all_
   traffic.  A service for all traffic will need none of the
   configuration or management baggage (traffic policing, traffic
   contracts) associated with favouring some packets over others.  This
   document describes the L4S architecture for achieving that goal.

   It must be said that queuing delay only degrades performance
   infrequently [Hohlfeld14].  It only occurs when a large enough
   capacity-seeking (e.g. TCP) flow is running alongside the user's
   traffic in the bottleneck link, which is typically in the access
   network, or when the low latency application is itself a large
   capacity-seeking flow (e.g. interactive video).  At these times, the
   performance improvement must be so remarkable that network operators
   will be motivated to deploy it.

   Active Queue Management (AQM) is part of the solution to queuing
   under load.  AQM improves performance for all traffic, but there is a
   limit to how much queuing delay can be reduced by solely changing the
   network without addressing the root of the problem.

   The root of the problem is the presence of standard TCP congestion
   control (Reno [RFC5681]) or compatible variants (e.g.
   TCP Cubic [I-D.ietf-tcpm-cubic]).  We shall call this family of
   congestion controls 'Classic' TCP.  It has been demonstrated that if
   the sending host replaces Classic TCP with a 'Scalable' alternative,
   and a suitable AQM is deployed in the network, the performance under
   load of all the above interactive applications can be stunningly
   improved.  For instance, queuing delay under heavy load with the
   example DCTCP/DualQ solution cited below is roughly 1 millisecond
   (1 ms) at the 99th percentile, without losing link utilization.  This
   compares with 5 to 20 ms on _average_ with a Classic TCP and current
   state-of-the-art AQMs such as fq_CoDel [I-D.ietf-aqm-fq-codel] or PIE
   [RFC8033].  Also, with a Classic TCP, 5 ms of queuing is usually only
   possible by losing some utilization.

   It has been convincingly demonstrated [DCttH15] that it is possible
   to deploy such an L4S service alongside the existing best efforts
   service so that all of a user's applications can shift to it when
   their stack is updated.  Access networks are typically designed with
   one link as the bottleneck for each site (which might be a home,
   small enterprise or mobile device), so deployment at a single node
   should give nearly all the benefit.  The L4S approach requires a
   number of mechanisms in different parts of the Internet to fulfill
   its goal.  This document presents the L4S architecture, by describing
   the different components and how they interact to provide the
   scalable, low-latency, low-loss Internet service.

2. L4S architecture overview

   There are three main components to the L4S architecture (illustrated
   in Figure 1):

   1) Network: The L4S service traffic needs to be isolated from the
      queuing latency of the Classic service traffic.  However, the two
      should be able to freely share a common pool of capacity.
      This is because there is no way to predict how many flows at any
      one time might use each service, and capacity in access networks
      is too scarce to partition into two.  So a 'semi-permeable'
      membrane is needed that partitions latency but not bandwidth.  The
      Dual Queue Coupled AQM [I-D.briscoe-aqm-dualq-coupled] is an
      example of such a semi-permeable membrane.

      Per-flow queuing such as in [I-D.ietf-aqm-fq-codel] could be used,
      but it partitions both latency and bandwidth between every end-to-
      end flow.  So it is rather overkill, which brings disadvantages
      (see Section 5.2), not least that thousands of queues are needed
      when two are sufficient.

   2) Protocol: A host needs to distinguish L4S and Classic packets with
      an identifier so that the network can classify them into their
      separate treatments.  [I-D.briscoe-tsvwg-ecn-l4s-id] considers
      various alternative identifiers, and concludes that all
      alternatives involve compromises, but the ECT(1) codepoint of the
      ECN field is a workable solution.

   3) Host: Scalable congestion controls already exist.  They solve the
      scaling problem with TCP first pointed out in [RFC3649].  The one
      used most widely (in controlled environments) is Data Centre TCP
      (DCTCP [I-D.ietf-tcpm-dctcp]), which has been implemented and
      deployed in Windows Server Editions (since 2012), in Linux and in
      FreeBSD.  Although DCTCP as-is 'works' well over the public
      Internet, most implementations lack certain safety features that
      will be necessary once it is used outside controlled environments
      like data centres (see later).  A similar scalable congestion
      control will also need to be transplanted into protocols other
      than TCP (SCTP, RTP/RTCP, RMCAT, etc.).

   [Figure 1 is an ASCII-art diagram that did not survive extraction.
   Its recoverable content: a Scalable sender and a Classic sender feed
   an IP-ECN Classifier; L4S packets go to a shallow-threshold 'mark'
   AQM and Classic packets to a 'mark/drop' AQM; congestion signals are
   coupled between the two queues, which share capacity under a
   conditional priority scheduler.]

      Figure 1: Components of an L4S Solution: 1) Isolation in separate
        network queues; 2) Packet Identification Protocol; and
        3) Scalable Sending Host

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  In this
   document, these words will appear with that interpretation only when
   in ALL CAPS.  Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.  [COMMENT: Since this
   will be an Informational document, this boilerplate should be
   removed.]

   Classic service:  The 'Classic' service is intended for all the
      congestion control behaviours that currently co-exist with TCP
      Reno (e.g. TCP Cubic, Compound, SCTP, etc.).

   Low-Latency, Low-Loss and Scalable (L4S) service:  The 'L4S' service
      is intended for traffic from scalable TCP algorithms such as Data
      Centre TCP.  But it is also more general - it will allow a set of
      congestion controls with similar scaling properties to DCTCP
      (e.g. Relentless [Mathis09]) to evolve.

      Both Classic and L4S services can cope with a proportion of
      unresponsive or less-responsive traffic as well (e.g. DNS, VoIP,
      etc.).

   Scalable Congestion Control:  A congestion control where the flow
      rate is inversely proportional to the level of congestion signals.
      Then, as the flow rate scales, the number of congestion signals
      per round trip remains invariant, maintaining the same degree of
      control.
      For instance, DCTCP averages 2 congestion signals per round-trip
      whatever the flow rate.

   Classic Congestion Control:  A congestion control with a flow rate
      compatible with standard TCP Reno [RFC5681].  With Classic
      congestion controls, as capacity increases enabling higher flow
      rates, the number of round trips between congestion signals
      (losses or ECN marks) rises in proportion to the flow rate.  So
      control of queuing and/or utilization becomes very slack.  For
      instance, with 1500 B packets and an RTT of 18 ms, as the TCP Reno
      flow rate increases from 2 to 100 Mb/s, the number of round trips
      between congestion signals rises proportionately, from 2 to 100.

      The default congestion control in Linux (TCP Cubic) is Reno-
      compatible for most scenarios expected for some years.  For
      instance, with a typical domestic round-trip time (RTT) of 18 ms,
      TCP Cubic only switches out of Reno-compatibility mode once the
      flow rate approaches 1 Gb/s.  For a typical data centre RTT of
      1 ms, the switch-over point is theoretically 1.3 Tb/s.  However,
      with a less common transcontinental RTT of 100 ms, it only remains
      Reno-compatible up to 13 Mb/s.  All examples assume 1,500 B
      packets.

   Classic ECN:  The original proposed standard Explicit Congestion
      Notification (ECN) protocol [RFC3168], which requires ECN signals
      to be treated the same as drops, both when generated in the
      network and when responded to by the sender.

   Site:  A home, mobile device, small enterprise or campus, where the
      network bottleneck is typically the access link to the site.  Not
      all network arrangements fit this model but it is a useful, widely
      applicable generalisation.

4. L4S architecture components

   The L4S architecture is composed of the following elements.

   Protocols: The L4S architecture encompasses the two protocol changes
   that we describe next:

   a.
      [I-D.briscoe-tsvwg-ecn-l4s-id] recommends that ECT(1) be used as
      the identifier to classify L4S and Classic packets into their
      separate treatments, as required by [RFC4774].

   b. An essential aspect of a scalable congestion control is the use of
      explicit congestion signals rather than losses, because the
      signals need to be sent immediately and frequently - too often to
      use drops.  'Classic' ECN [RFC3168] requires an ECN signal to be
      treated the same as a drop, both when it is generated in the
      network and when it is responded to by hosts.  L4S allows networks
      and hosts to support two separate meanings for ECN.  So the
      standards track [RFC3168] will need to be updated to allow ECT(1)
      packets to depart from the 'same as drop' constraint.

      [I-D.ietf-tsvwg-ecn-experimentation] has been prepared as a
      standards track update to relax specific requirements in RFC 3168
      (and certain other standards track RFCs), which clears the way for
      the above experimental changes proposed for L4S.
      [I-D.ietf-tsvwg-ecn-experimentation] also obsoletes the original
      experimental assignment of the ECT(1) codepoint as an ECN nonce
      [RFC3540] (it was never deployed, and it offers no security
      benefit now that deployment is optional).

   Network components: The Dual Queue Coupled AQM has been specified as
   generically as possible [I-D.briscoe-aqm-dualq-coupled] as a 'semi-
   permeable' membrane, without specifying the particular AQMs to use in
   the two queues.  An informational appendix of the draft provides
   pseudocode examples of different possible AQM approaches.

   Initially a zero-config variant of RED called Curvy RED was
   implemented, tested and documented.  The aim is for designers to be
   free to implement diverse ideas.  So the brief normative body of the
   draft only specifies the minimum constraints an AQM needs to comply
   with to ensure that the L4S and Classic services will coexist.
   For instance, a variant of PIE called Dual PI Squared [PI2] has been
   implemented and found to perform better over a wide range of
   conditions, so it has been documented in a second appendix of
   [I-D.briscoe-aqm-dualq-coupled].

   Host mechanisms: The L4S architecture includes a number of mechanisms
   in the end host that we enumerate next:

   a. Data Centre TCP is the most widely used example of a scalable
      congestion control.  It is being documented in the TCPM WG as an
      informational record of the protocol currently in use
      [I-D.ietf-tcpm-dctcp].  It will be necessary to define a number of
      safety features for a variant usable on the public Internet.  A
      draft list of these, known as the TCP Prague requirements, has
      been drawn up (see Appendix A).  The list also includes some
      optional performance improvements.

   b. Transport protocols other than TCP use various congestion controls
      designed to be friendly with Classic TCP.  Before they can use the
      L4S service, it will be necessary to implement scalable variants
      of each of these transport behaviours.  The following standards
      track RFCs currently define these protocols: ECN in TCP [RFC3168],
      in SCTP [RFC4960], in RTP [RFC6679], and in DCCP [RFC4340].  Not
      all are in widespread use, but those that are will eventually need
      to be updated to allow a different congestion response, which they
      will have to indicate by using the ECT(1) codepoint.  Scalable
      variants are under consideration for some new transport protocols
      that are themselves under development, e.g. QUIC
      [I-D.johansson-quic-ecn] and certain real-time media congestion
      avoidance techniques (RMCAT) protocols.

   c.
      ECN feedback is sufficient for L4S in some transport protocols
      (RTCP, DCCP) but not others:

      *  For the case of TCP, the feedback protocol for ECN embeds the
         assumption from Classic ECN that it is the same as drop, making
         it unusable for a scalable TCP.  Therefore, the implementation
         of TCP receivers will have to be upgraded [RFC7560].  Work to
         standardize more accurate ECN feedback for TCP (AccECN
         [I-D.ietf-tcpm-accurate-ecn]) is already in progress.

      *  ECN feedback is only roughly sketched in an appendix of the
         SCTP specification.  A fuller specification has been proposed
         [I-D.stewart-tsvwg-sctpecn], which would need to be implemented
         and deployed before SCTP could support L4S.

5. Rationale

5.1. Why These Primary Components?

   Explicit congestion signalling (protocol):  Explicit congestion
      signalling is a key part of the L4S approach.  In contrast, use of
      drop as a congestion signal creates a tension, because drop is
      both a useful signal (more drops would reduce queuing delay) and
      an impairment (fewer drops would reduce retransmission delay).
      Explicit congestion signals can be used many times per round trip,
      to keep tight control, without any impairment.  Under heavy load,
      even more explicit signals can be applied so the queue can be kept
      short whatever the load, whereas state-of-the-art AQMs have to
      introduce very high packet drop at high load to keep the queue
      short.  Further, TCP's sawtooth reduction can be smaller, and
      therefore return to the operating point more often, without
      worrying that this causes more signals (one at the top of each
      smaller sawtooth).  The consequent smaller amplitude sawteeth fit
      between a very shallow marking threshold and an empty queue, so
      delay variation can be very low, without risk of under-
      utilization.
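      The scaling contrast behind this argument can be made concrete
      with a small calculation, using the same numbers as the example in
      the Terminology section (1,500 B packets, 18 ms RTT).  This is an
      illustrative sketch only, not part of any specification; it uses
      the standard Reno sawtooth approximation (average window 3/4 of
      the peak Wmax, window growing by one packet per RTT from Wmax/2
      back to Wmax, one congestion signal per sawtooth):

```python
PKT_BITS = 1500 * 8   # packet size from the example above, in bits
RTT = 0.018           # 18 ms round-trip time

def reno_rtts_between_signals(avg_rate_bps):
    """Round trips between congestion signals for a Reno-like sawtooth.

    Approximation: average window = 3/4 * Wmax; the window climbs one
    packet per RTT from Wmax/2 back to Wmax, so each sawtooth lasts
    Wmax/2 round trips and carries exactly one signal (loss or mark).
    """
    avg_window = avg_rate_bps * RTT / PKT_BITS   # packets in flight
    w_max = avg_window / 0.75                    # peak of the sawtooth
    return w_max / 2

def scalable_rtts_between_signals(avg_rate_bps):
    """A DCTCP-style control sees ~2 marks per RTT at any flow rate."""
    return 0.5

for mbps in (2, 100):
    rate = mbps * 1e6
    print(f"{mbps:>3} Mb/s: Reno {reno_rtts_between_signals(rate):6.1f}"
          f" RTTs/signal, scalable {scalable_rtts_between_signals(rate):.1f}")
```

      At 2 Mb/s this gives 2 round trips between Reno signals; at
      100 Mb/s it gives 100, matching the Terminology example, while the
      scalable control's signalling interval stays constant.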
      All the above makes it clear that explicit congestion signalling
      is only advantageous for latency if it does not have to be
      considered 'the same as' drop (as required with Classic ECN
      [RFC3168]).  Therefore, in a DualQ AQM, the L4S queue uses a new
      L4S variant of ECN that is not equivalent to drop
      [I-D.briscoe-tsvwg-ecn-l4s-id], while the Classic queue uses
      either Classic ECN [RFC3168] or drop, which are equivalent.

      Before Classic ECN was standardized, there were various proposals
      to give an ECN mark a different meaning from drop.  However, there
      was no particular reason to agree on any one of the alternative
      meanings, so 'the same as drop' was the only compromise that could
      be reached.  RFC 3168 contains a statement that:

         "An environment where all end nodes were ECN-Capable could
         allow new criteria to be developed for setting the CE
         codepoint, and new congestion control mechanisms for end-node
         reaction to CE packets.  However, this is a research issue, and
         as such is not addressed in this document."

   Latency isolation with coupled congestion notification (network):
      Using just two queues is not essential to L4S (more would be
      possible), but it is the simplest way to isolate all the L4S
      traffic that keeps latency low from all the legacy Classic traffic
      that does not.

      Similarly, coupling the congestion notification between the queues
      is not necessarily essential, but it is a clever and simple way to
      allow senders to determine their rate, packet-by-packet, rather
      than have it overridden by a network scheduler.  Otherwise, a
      network scheduler would have to inspect at least transport layer
      headers, and it would have to continually assign a rate to each
      flow without any easy way to understand application intent.
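      To see why a simple coupling lets senders determine their own
      rates, consider the steady-state models behind the DualQ design: a
      Classic (Reno-like) flow's rate varies roughly as 1/sqrt(p), while
      a scalable flow's varies as 1/p.  If the AQM derives the Classic
      drop probability as the square of a base probability p' and the
      L4S marking probability as k * p', the two rates stay in a fixed
      ratio at any load, with no per-flow scheduling.  The sketch below
      is illustrative only (the constants 1.22, 2 and k = 2 are textbook
      approximations, not values taken from
      [I-D.briscoe-aqm-dualq-coupled], which derives k itself):

```python
import math

K = 2.0   # assumed coupling factor between the two queues' signals

def coupled_probs(p_base):
    """One base probability p' drives both signals: the Classic queue
    drops (or marks) with p'**2, while the L4S queue marks with K * p'."""
    return p_base ** 2, min(1.0, K * p_base)

def classic_rate(p_drop, rtt, pkt_bits):
    """Reno-like steady state: rate ~ 1.22 * pkt / (rtt * sqrt(p))."""
    return 1.22 * pkt_bits / (rtt * math.sqrt(p_drop))

def scalable_rate(p_mark, rtt, pkt_bits):
    """DCTCP-like steady state: rate ~ 2 * pkt / (rtt * p)."""
    return 2 * pkt_bits / (rtt * p_mark)

# The ratio of the two rates is independent of the base probability,
# so the flows share capacity in a fixed proportion at any load.
for p_base in (0.01, 0.05, 0.2):
    p_c, p_l = coupled_probs(p_base)
    ratio = classic_rate(p_c, 0.018, 12000) / scalable_rate(p_l, 0.018, 12000)
    print(f"p'={p_base}: classic/scalable rate ratio = {ratio:.3f}")
```

      Substituting the formulas shows the ratio collapses to
      1.22 * k / 2, independent of p', which is the sense in which the
      coupling balances the two services without a scheduler assigning
      rates.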
   L4S packet identifier (protocol):  Once there are at least two
      separate treatments in the network, hosts need an identifier at
      the IP layer to distinguish which treatment they intend to use.

   Scalable congestion notification (host):  A scalable congestion
      control keeps the signalling frequency high, so that rate
      variations can be small when signalling is stable, and rate can
      track variations in available capacity as rapidly as possible
      otherwise.

5.2. Why Not Alternative Approaches?

   All the following approaches address some part of the same problem
   space as L4S.  In each case, it is shown that L4S complements them or
   improves on them, rather than being a mutually exclusive alternative:

   Diffserv:  Diffserv addresses the problem of bandwidth apportionment
      for important traffic, as well as queuing latency for delay-
      sensitive traffic.  L4S solely addresses the problem of queuing
      latency (as well as loss and throughput scaling).  Diffserv will
      still be necessary where important traffic requires priority (e.g.
      for commercial reasons, or for protection of critical
      infrastructure traffic).  Nonetheless, if there are Diffserv
      classes for important traffic, the L4S approach can provide low
      latency for _all_ traffic within each Diffserv class (including
      the case where there is only one Diffserv class).

      Also, as already explained, Diffserv only works for a small subset
      of the traffic on a link.  It is not applicable when all the
      applications in use at one time at a single site (home, small
      business or mobile device) require low latency.  Also, because L4S
      is for all traffic, it needs none of the management baggage
      (traffic policing, traffic contracts) associated with favouring
      some packets over others.  This baggage has held Diffserv back
      from widespread end-to-end deployment.
   State-of-the-art AQMs:  AQMs such as PIE and fq_CoDel give a
      significant reduction in queuing delay relative to no AQM at all.
      The L4S work is intended to complement these AQMs, and we
      definitely do not want to distract from the need to deploy them as
      widely as possible.  Nonetheless, without addressing the large
      sawtoothing rate variations of Classic congestion controls, AQMs
      alone cannot reduce queuing delay very far without significantly
      reducing link utilization.  The L4S approach resolves this tension
      by ensuring hosts can minimize the size of their sawteeth without
      appearing so aggressive to legacy flows that they starve.

   Per-flow queuing:  Similarly, per-flow queuing is not incompatible
      with the L4S approach.  However, one queue for every flow can be
      thought of as overkill compared to the minimum of two queues for
      all traffic needed for the L4S approach.  The overkill of per-flow
      queuing has side-effects:

      A. fq makes high performance networking equipment costly
         (processing and memory) - in contrast, dual queue code can be
         very simple;

      B. fq requires packet inspection into the end-to-end transport
         layer, which doesn't sit well alongside encryption for privacy
         - in contrast, a dual queue only operates at the IP layer;

      C. fq isolates the queuing of each flow from the others, and it
         prevents any one flow from consuming more than 1/N of the
         capacity.  In contrast, all L4S flows are expected to keep the
         queue shallow, and policing of individual flows to enforce this
         may be applied separately, as a policy choice.

         An fq scheduler has to decide packet-by-packet which flow to
         schedule, without knowing application intent.  Whereas a
         separate policing function can be configured less strictly, so
         that senders can still control the instantaneous rate of each
         flow dependent on the needs of each application (e.g.
         variable rate video), giving more wriggle-room before a flow is
         deemed non-compliant.  Also, policing of queuing and of flow
         rates can be applied independently.

   Alternative Back-off ECN (ABE):  Yet again, L4S is not an alternative
      to ABE but a complement that introduces much lower queuing delay.
      ABE [I-D.khademi-tcpm-alternativebackoff-ecn] alters the host
      behaviour in response to ECN marking to utilize a link better and
      give ECN flows faster throughput, but it assumes the network still
      treats ECN and drop the same.  Therefore ABE exploits any lower
      queuing delay that AQMs can provide.  But, as explained above,
      AQMs still cannot reduce queuing delay very far without losing
      link utilization (for other, non-ABE flows).

6. Applicability

   A transport layer that solves the current latency issues will provide
   new service, product and application opportunities.

   With the L4S approach, the following existing applications will
   immediately experience significantly better quality of experience
   under load in the best effort class:

   o  Gaming

   o  VoIP

   o  Video conferencing

   o  Web browsing

   o  (Adaptive) video streaming

   o  Instant messaging

   The significantly lower queuing latency also enables some interactive
   application functions to be offloaded to the cloud that would hardly
   even be usable today:

   o  Cloud based interactive video

   o  Cloud based virtual and augmented reality

   The above two applications have been successfully demonstrated with
   L4S, both running together over a 40 Mb/s broadband access link
   loaded up with the numerous other latency sensitive applications in
   the previous list, as well as numerous downloads.  A panoramic video
   of a football stadium can be swiped and pinched so that, on the fly,
   a proxy in the cloud generates a sub-window of the match video under
   the finger-gesture control of each user.
   At the same time, a virtual reality headset fed from a 360 degree
   camera in a racing car has been demonstrated, where the user's head
   movements control the scene generated in the cloud.  In both cases,
   with 7 ms end-to-end base delay, the additional queuing delay of
   roughly 1 ms is so low that it seems the video is generated locally.
   See https://riteproject.eu/dctth/ for videos of these demonstrations.

   Using a swiping finger gesture or head movement to pan a video is
   extremely demanding - far more demanding than VoIP - because human
   vision can detect extremely low delays of the order of single
   milliseconds when delay is translated into a visual lag between a
   video and a reference point (the finger or the orientation of the
   head).

   If low network delay is not available, all fine interaction has to be
   done locally, and therefore much more redundant data has to be
   downloaded.  When all interactive processing can be done in the
   cloud, only the data to be rendered for the end user needs to be
   sent.  And once applications can rely on minimal queues in the
   network, they can focus on reducing their own latency by minimizing
   only the application send queue.

6.1. Use Cases

   The following use-cases for L4S are being considered by various
   interested parties:

   o  Where the bottleneck is one of various types of access network:
      DSL, cable, mobile, satellite.

      *  Radio links (cellular, WiFi) that are distant from the source
         are particularly challenging.
The radio link capacity can vary rapidly by orders of magnitude, so it is often desirable to hold a buffer to utilise sudden increases of capacity;

   *  Cellular networks are further complicated by a perceived need to buffer in order to make hand-overs imperceptible;

   *  Satellite networks generally have a very large base RTT, so even with minimal queuing, overall delay can never be extremely low;

   *  Nonetheless, it is certainly desirable not to hold a buffer purely because of the sawteeth of Classic TCP, when it is more than is needed for all the above reasons.

o  Private networks of heterogeneous data centres, where there is no single administrator that can arrange for all the simultaneous changes to senders, receivers and network needed to deploy DCTCP:

   *  a set of private data centres interconnected over a wide area with separate administrations, but within the same company

   *  a set of data centres operated by separate companies interconnected by a community of interest network (e.g. for the finance sector)

   *  multi-tenant (cloud) data centres where tenants choose their operating system stack (Infrastructure as a Service - IaaS)

o  Different types of transport (or application) congestion control:

   *  elastic (TCP/SCTP);

   *  real-time (RTP, RMCAT);

   *  query (DNS/LDAP).

o  Where low delay quality of service is required, but without inspecting or intervening above the IP layer [I-D.you-encrypted-traffic-management]:

   *  Mobile and other networks have tended to inspect higher layers in order to guess application QoS requirements. However, with growing demand for support of privacy and encryption, L4S offers an alternative. There is no need to select which traffic to favour for queuing, when L4S gives favourable queuing to all traffic.
o  If queuing delay is minimized, applications with a fixed delay budget can communicate over longer distances, or via a longer chain of service functions [RFC7665] or onion routers.

6.2. Deployment Considerations

{ToDo: This section TBA - currently, bullet points only.}

Incremental deployment parts.

Possible deployment sequences.

Prioritizing the most-likely bottlenecks in the various use-cases (access links, downstream and upstream, broadband, mobile, DC, etc).

Deployment incentives: Immediate vs. deferred benefits.

7. IANA Considerations

This specification contains no IANA considerations.

8. Security Considerations

8.1. Traffic (Non-)Policing

Because the L4S service can serve all traffic that is using the capacity of a link, it should not be necessary to police access to the L4S service. In contrast, Diffserv only works if some packets get less favourable treatment than others. So it has to use traffic policers to limit how much traffic can be favoured. In turn, traffic policers require traffic contracts between users and networks as well as pairwise between networks. Because L4S will lack all this management complexity, it is more likely to work end-to-end.

During early deployment (and perhaps always), some networks will not offer the L4S service. These networks do not need to police or re-mark L4S traffic - they just forward it unchanged as best efforts traffic, as they would already forward traffic with ECT(1) today. At a bottleneck, such networks will introduce some queuing and dropping. When a scalable congestion control detects a drop it will have to respond as if it is a Classic congestion control (see item 4-1 in Appendix A).
This will ensure safe interworking with other traffic at the 'legacy' bottleneck, but it will degrade the L4S service to no better (but never worse) than classic best efforts, whenever a legacy (non-L4S) bottleneck is encountered on a path.

Certain network operators might choose to restrict access to the L4S class, perhaps only to customers who have paid a premium. Their packet classifier (item 2 in Figure 1) could identify such customers against some other field (e.g. source address range) as well as ECN. If only the ECN L4S identifier matched, but not the source address (say), the classifier could direct these packets (from non-paying customers) into the Classic queue. Allowing operators to use an additional local classifier is intended to remove any incentive to bleach the L4S identifier. Then at least the L4S ECN identifier will be more likely to survive end-to-end even though the service may not be supported at every hop. Such arrangements would only require simple registered/not-registered packet classification, rather than the managed application-specific traffic policing against customer-specific traffic contracts that Diffserv requires.

8.2. 'Latency Friendliness'

The L4S service does rely on self-constraint - not in terms of limiting capacity usage, but in terms of limiting burstiness. It is hoped that standardisation of dynamic behaviour (cf. TCP slow-start) and self-interest will be sufficient to prevent transports from sending excessive bursts of L4S traffic, given the application's own latency will suffer most from such behaviour.

Whether burst policing becomes necessary remains to be seen. Without it, there will be potential for attacks on the low latency of the L4S service. However it may only be necessary to apply such policing reactively, e.g. punitively targeted at any deployments of new bursty malware.

8.3.
Policing Prioritized L4S Bandwidth

As mentioned in Section 5.2, L4S should remove the need for low latency Diffserv classes. However, those Diffserv classes that give certain applications or users priority over capacity would still be applicable. Then, within such Diffserv classes, L4S would often be applicable to give traffic low latency and low loss. Within such a class, the bandwidth available to a user or application is often limited by a rate policer. Similarly, in the default Diffserv class, rate policers are used to partition shared capacity.

A classic rate policer drops any packets exceeding a set rate, usually also giving a burst allowance (variants exist where the policer re-marks non-compliant traffic to a discard-eligible Diffserv codepoint, so that it may be dropped elsewhere during contention). In networks that deploy L4S and use rate policers, it will be preferable to deploy a policer designed to be more friendly to the L4S service.

This might be achieved by setting a threshold where ECN marking is introduced, such that it is just under the policed rate or just under the burst allowance where drop is introduced. This could be applied to various types of policer, e.g. [RFC2697], [RFC2698] or the local (non-ConEx) variant of the ConEx congestion policer [I-D.briscoe-conex-policing]. Otherwise, whenever L4S traffic encounters a rate policer, it will experience drops and the source will fall back to a Classic congestion control, thus losing all the benefits of L4S.

Further discussion of the applicability of L4S to the various Diffserv classes, and the design of suitable L4S rate policers, are left for further study.

8.4. ECN Integrity

Receiving hosts can fool a sender into downloading faster by suppressing feedback of ECN marks (or of losses if retransmissions are not necessary or available otherwise).
[RFC3540] proposes that a TCP sender could pseudorandomly set either of ECT(0) or ECT(1) in each packet of a flow and remember the sequence it had set, termed the ECN nonce. If the receiver supports the nonce, it can prove that it is not suppressing feedback by reflecting its knowledge of the sequence back to the sender. The nonce was proposed on the assumption that receivers might be more likely to cheat congestion control than senders (although senders also have a motive to cheat).

If L4S uses the ECT(1) codepoint of ECN for packet classification, it will have to obsolete the experimental nonce. As far as is known, the ECN nonce has never been deployed, and it was only implemented for a couple of testbed evaluations. It would be nearly impossible to deploy now, because any misbehaving receiver can simply opt out, which would be unremarkable given all receivers currently opt out.

Other ways to protect TCP feedback integrity have since been developed. For instance:

o  The sender can test the integrity of the receiver's feedback by occasionally setting the IP-ECN field to a value normally only set by the network. Then it can test whether the receiver's feedback faithfully reports what it expects [I-D.moncaster-tcpm-rcv-cheat]. This method consumes no extra codepoints. It works for loss and it will work for ECN feedback in any transport protocol suitable for L4S. However, it shares the same assumption as the nonce: that the sender is not cheating and is motivated to prevent the receiver cheating;

o  A network can enforce a congestion response to its ECN markings (or packet losses) by auditing congestion exposure (ConEx) [RFC7713].
Whether the receiver or a downstream network is suppressing congestion feedback or the sender is unresponsive to the feedback, or both, ConEx audit can neutralise any advantage that any of these three parties would otherwise gain. ConEx is only currently defined for IPv6 and consumes a destination option header. It has been implemented, but not deployed as far as is known.

9. Acknowledgements

Thanks to Wes Eddy, Karen Nielsen and David Black for their useful review comments.

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

10.2. Informative References

[Alizadeh-stability] Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis of DCTCP: Stability, Convergence, and Fairness", ACM SIGMETRICS 2011, June 2011.

[DCttH15] De Schepper, K., Bondarenko, O., Tsang, I., and B. Briscoe, "'Data Centre to the Home': Ultra-Low Latency for All", 2015. (Under submission)

[Hohlfeld14] Hohlfeld, O., Pujol, E., Ciucu, F., Feldmann, A., and P. Barford, "A QoE Perspective on Sizing Network Buffers", Proc. ACM Internet Measurement Conf (IMC'14), November 2014.

[I-D.briscoe-aqm-dualq-coupled] De Schepper, K., Briscoe, B., Bondarenko, O., and I. Tsang, "DualQ Coupled AQM for Low Latency, Low Loss and Scalable Throughput", draft-briscoe-aqm-dualq-coupled-01 (work in progress), March 2016.

[I-D.briscoe-conex-policing] Briscoe, B., "Network Performance Isolation using Congestion Policing", draft-briscoe-conex-policing-01 (work in progress), February 2014.

[I-D.briscoe-tsvwg-ecn-l4s-id] De Schepper, K., Briscoe, B., and I.
Tsang, "Identifying Modified Explicit Congestion Notification (ECN) Semantics for Ultra-Low Queuing Delay", draft-briscoe-tsvwg-ecn-l4s-id-02 (work in progress), October 2016.

[I-D.ietf-aqm-fq-codel] Hoeiland-Joergensen, T., McKenney, P., dave.taht@gmail.com, d., Gettys, J., and E. Dumazet, "The FlowQueue-CoDel Packet Scheduler and Active Queue Management Algorithm", draft-ietf-aqm-fq-codel-06 (work in progress), March 2016.

[I-D.ietf-tcpm-accurate-ecn] Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-ecn-02 (work in progress), October 2016.

[I-D.ietf-tcpm-cubic] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", draft-ietf-tcpm-cubic-04 (work in progress), February 2017.

[I-D.ietf-tcpm-dctcp] Bensley, S., Eggert, L., Thaler, D., Balasubramanian, P., and G. Judd, "Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters", draft-ietf-tcpm-dctcp-04 (work in progress), February 2017.

[I-D.ietf-tsvwg-ecn-experimentation] Black, D., "Explicit Congestion Notification (ECN) Experimentation", draft-ietf-tsvwg-ecn-experimentation-01 (work in progress), March 2017.

[I-D.johansson-quic-ecn] Johansson, I., "ECN support in QUIC", draft-johansson-quic-ecn-01 (work in progress), February 2017.

[I-D.khademi-tcpm-alternativebackoff-ecn] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, "TCP Alternative Backoff with ECN (ABE)", draft-khademi-tcpm-alternativebackoff-ecn-01 (work in progress), October 2016.

[I-D.moncaster-tcpm-rcv-cheat] Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Allow Senders to Identify Receiver Non-Compliance", draft-moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.

[I-D.stewart-tsvwg-sctpecn] Stewart, R., Tuexen, M., and X.
Dong, "ECN for Stream Control Transmission Protocol (SCTP)", draft-stewart-tsvwg-sctpecn-05 (work in progress), January 2014.

[I-D.you-encrypted-traffic-management] You, J. and C. Xiong, "The Effect of Encrypted Traffic on the QoS Mechanisms in Cellular Networks", draft-you-encrypted-traffic-management-00 (work in progress), October 2015.

[Mathis09] Mathis, M., "Relentless Congestion Control", PFLDNeT'09, May 2009.

[NewCC_Proc] Eggert, L., "Experimental Specification of New Congestion Control Algorithms", IETF Operational Note ion-tsv-alt-cc, July 2007.

[PI2] De Schepper, K., Bondarenko, O., Tsang, I., and B. Briscoe, "PI^2: A Linearized AQM for both Classic and Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 2016.

[RFC2697] Heinanen, J. and R. Guerin, "A Single Rate Three Color Marker", RFC 2697, DOI 10.17487/RFC2697, September 1999.

[RFC2698] Heinanen, J. and R. Guerin, "A Two Rate Three Color Marker", RFC 2698, DOI 10.17487/RFC2698, September 1999.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, J., Courtney, W., Davari, S., Firoiu, V., and D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002.

[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, DOI 10.17487/RFC3540, June 2003.

[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", RFC 3649, DOI 10.17487/RFC3649, December 2003.

[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.
[RFC4774] Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, DOI 10.17487/RFC4774, November 2006.

[RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 2012.

[RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, "Problem Statement and Requirements for Increased Accuracy in Explicit Congestion Notification (ECN) Feedback", RFC 7560, DOI 10.17487/RFC7560, August 2015.

[RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function Chaining (SFC) Architecture", RFC 7665, DOI 10.17487/RFC7665, October 2015.

[RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) Concepts, Abstract Mechanism, and Requirements", RFC 7713, DOI 10.17487/RFC7713, December 2015.

[RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, "Proportional Integral Controller Enhanced (PIE): A Lightweight Control Scheme to Address the Bufferbloat Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017.

[TCP-sub-mss-w] Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion Window for Small Round Trip Times", BT Technical Report TR-TUB8-2015-002, May 2015.

[TCPPrague] Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 2015, 17:40, Prague", tcpprague mailing list archive, July 2015.

Appendix A. Required features for scalable transport protocols to be safely deployable in the Internet (a.k.a.
TCP Prague requirements)

This appendix contains a list of features, mechanisms and modifications to currently defined behaviour for scalable transport protocols, so that they can be safely deployed over the public Internet. This list of requirements was produced at an ad hoc meeting during IETF-94 in Prague [TCPPrague].

One such scalable transport protocol is DCTCP, currently specified in [I-D.ietf-tcpm-dctcp]. In its current form, DCTCP is specified to be deployable in controlled environments, and deploying it in the public Internet would lead to a number of issues, both from the safety and the performance perspective. In this section, we describe the modifications and additional mechanisms that are required for its deployment over the global Internet. We use DCTCP as a base, but it is likely that most of these requirements equally apply to other scalable transport protocols.

We next provide a brief description of each required feature.

Requirement #4.1: Fall back to Reno/Cubic congestion control on packet loss.

Description: In case of packet loss, the scalable transport MUST react as classic TCP would (whichever classic version of TCP is running in the host, e.g. Reno or Cubic).

Motivation: One of the safety conditions for deploying a scalable transport over the public Internet is to make sure that it behaves properly when some or all of the network devices connecting the two endpoints that implement the scalable transport have not been upgraded. In particular, it may be the case that some of the switches along the path between the two endpoints only react to congestion by dropping packets (i.e. no ECN marking). It is important that in these cases the scalable transport reacts to the congestion signal in the form of a packet drop similarly to classic TCP.
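As an illustration of this requirement only (the class, its method names and the EWMA gain below are assumptions for the sketch, not taken from any specification), a per-RTT congestion update that falls back to a classic response on loss might look like:

```python
# Illustrative sketch of Requirement #4.1 (assumed names, not from any
# spec): a scalable sender keeps a DCTCP-style EWMA of the CE-marking
# fraction, but on packet loss it reacts exactly as a classic sender.

class ScalableSender:
    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd    # congestion window, in MSS
        self.alpha = 0.0    # EWMA of the fraction of CE-marked packets
        self.g = g          # EWMA gain (DCTCP typically uses 1/16)

    def on_round(self, acked, ce_marked, loss=False):
        """Apply one congestion update per RTT from that round's feedback."""
        frac = ce_marked / acked if acked else 0.0
        self.alpha += self.g * (frac - self.alpha)
        if loss:
            # MUST: fall back to the classic response on loss
            # (Reno-style halving here; Cubic's 0.7 factor also qualifies).
            self.cwnd = max(2.0, self.cwnd / 2)
        elif ce_marked:
            # Scalable (DCTCP-style) proportional response to ECN marks.
            self.cwnd = max(2.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1.0    # additive increase
```

In this sketch, a round that contains a loss halves the window exactly as a classic sender would, however many CE marks were also received; the scalable reduction applies only in loss-free rounds.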
In the particular case of DCTCP, the current DCTCP specification states that "It is RECOMMENDED that an implementation deal with loss episodes in the same way as conventional TCP." For safe deployment in the public Internet of a scalable transport, the above requirement needs to be defined as a MUST.

Packet loss, while rare, may also occur in the case that the bottleneck is L4S capable. In this case, the sender may receive a high number of packets marked with the CE bit set and also experience a loss. Current DCTCP implementations react differently to this situation. At least one implementation reacts only to the drop signal (e.g. by halving the CWND) and at least another DCTCP implementation reacts to both signals (e.g. by halving the CWND due to the drop and also further reducing the CWND based on the proportion of marked packets). We believe that further experimentation is needed to understand what is the best behaviour for the public Internet, which may or may not be one of the existing implementations.

Requirement #4.2: Fall back to Reno/Cubic congestion control on classic ECN bottlenecks.

Description: The scalable transport protocol SHOULD/MAY? behave as classic TCP with classic ECN if the path contains a legacy bottleneck which marks both ECT(0) and ECT(1) in the same way as drop (a non-L4S, but ECN capable, bottleneck).

Motivation: Similarly to Requirement #4.1, this requirement is a safety condition in case L4S-capable endpoints are communicating over a path that contains one or more non-L4S but ECN capable switches and one of them happens to be the bottleneck. In this case, the scalable transport will attempt to fill the buffer of the bottleneck switch up to the marking threshold and produce a small sawtooth around that operating point.
The result is that the switch will set its operating point with the buffer full and all other non-scalable transports will be starved (as they will react by reducing their CWND more aggressively than the scalable transport).

Scalable transports then MUST be able to detect the presence of a classic ECN bottleneck and fall back to classic TCP/classic ECN behaviour in this case.

Discussion: It is not clear at this point if it is possible to design a mechanism that always detects the aforementioned cases. One possibility is to base the detection on a measured increase on top of a minimum RTT, but it is not yet clear which value should trigger this. Having a delay-based fall-back response in L4S may also be beneficial for preserving low latency even without legacy network nodes. Even if it is possible to design such a mechanism, it may well encompass additional complexity that implementers may consider unnecessary. The need for this mechanism depends on the extent of classic ECN deployment.

Requirement #4.3: Reduce RTT dependence.

Description: Scalable transport congestion control algorithms MUST reduce or eliminate the RTT bias within the range of RTTs available.

Motivation: Classic TCP's throughput is known to be inversely proportional to RTT. One would expect flows over very low RTT paths to nearly starve flows over larger RTTs. However, because Classic TCP induces a large queue, it has never allowed a very low RTT path to exist so far. For instance, consider two paths with base RTT 1 ms and 100 ms. If Classic TCP induces a 20 ms queue, it turns these RTTs into 21 ms and 120 ms, leading to a throughput ratio of about 1:6. Whereas if a Scalable TCP induces only a 1 ms queue, the ratio is 2:101. Therefore, with small queues, long RTT flows will essentially starve.
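The ratios quoted above follow directly from the rule of thumb that a flow's rate is inversely proportional to its RTT; a quick check (illustrative sketch only; the function name is an assumption):

```python
# Quick check of the throughput ratios in the motivation above, using
# the rule of thumb that classic TCP throughput is proportional to 1/RTT.

def throughput_ratio(base_rtt_a_ms, base_rtt_b_ms, queue_ms):
    """Rate of flow A relative to flow B when both share one bottleneck
    queue and each flow's rate is proportional to 1/RTT (delays in ms)."""
    return (base_rtt_b_ms + queue_ms) / (base_rtt_a_ms + queue_ms)

# Classic TCP, 20 ms induced queue: 120/21, i.e. roughly 6:1 (the "1:6"
# ratio seen from the long-RTT flow's side).
classic = throughput_ratio(1, 100, 20)

# Scalable TCP, 1 ms induced queue: 101/2 = 50.5, i.e. the 2:101 ratio.
scalable = throughput_ratio(1, 100, 1)
```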
Scalable transport protocols MUST then accommodate flows across the range of RTTs enabled by the deployment of the L4S service over the public Internet.

Requirement #4.4: Scaling down the congestion window.

Description: Scalable transports MUST be responsive to congestion when RTTs are significantly smaller than in the current public Internet.

Motivation: As currently specified, the minimum CWND of TCP (and of scalable extensions such as DCTCP) is set to 2 MSS. Once this minimum CWND is reached, the transport protocol ceases to react to congestion signals (the CWND is not further reduced beyond this minimum size).

L4S mechanisms significantly reduce queueing delay, achieving smaller RTTs over the Internet. For the same CWND, smaller RTTs imply higher transmission rates. The result is that, when scalable transports are used and small RTTs are achieved, the minimum CWND currently defined as 2 MSS may still result in a high transmission rate in a large number of common scenarios. For example, as described in [TCP-sub-mss-w], consider a residential setting with a broadband Internet access of 40 Mbps. Suppose a number of equal TCP flows run in parallel, with the Internet access link being the bottleneck, and suppose that for these flows the RTT is 6 ms and the MSS is 1500 B. The minimum transmission rate supported by TCP in this scenario occurs when the CWND is set to 2 MSS, which results in 4 Mbps for each flow. This means that in this scenario, if the number of flows is higher than 10, the congestion control ceases to be responsive and starts to build up a queue in the network.

In order to address this issue, the congestion control mechanism for scalable transports MUST be responsive over the new range of RTTs resulting from the decrease of queueing delay.

There are several ways in which this can be achieved.
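For instance (an illustrative sketch under assumed names, and deliberately not the specific mechanism of [TCP-sub-mss-w]), a sender could emulate a fractional window by keeping the real window at the floor but stretching the pacing interval:

```python
# Illustrative sketch (assumed names; NOT the mechanism of
# [TCP-sub-mss-w]): emulate a sub-MSS congestion window by pacing.
# With MSS = 1500 B and RTT = 6 ms, the 2 MSS floor corresponds to
# 2 * 1500 * 8 / 0.006 = 4 Mb/s, as in the example above.

MSS_BYTES = 1500

def floor_rate_bps(rtt_s, min_cwnd_mss=2, mss_bytes=MSS_BYTES):
    """Lowest rate reachable while cwnd is clamped at the minimum."""
    return min_cwnd_mss * mss_bytes * 8 / rtt_s

def pacing_interval_s(cwnd_frac_mss, rtt_s):
    """Inter-packet gap emulating a (possibly sub-MSS) virtual window:
    one MSS is sent every rtt/cwnd_frac seconds."""
    return rtt_s / cwnd_frac_mss

# floor_rate_bps(0.006) is about 4e6: more than 10 such flows overload
# a 40 Mb/s link even at the minimum window.
# pacing_interval_s(0.5, 0.006) is 0.012: a 0.5 MSS virtual window
# sends one 1500 B segment every 12 ms (1 Mb/s), so the sender stays
# responsive below the 2 MSS floor.
```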
One possible sub-MSS window mechanism is described in [TCP-sub-mss-w].

In addition to the safety requirements described above, there are some optimizations that, while not required for the safe deployment of scalable transports over the public Internet, would result in optimized performance. We describe them next.

Optimization #5.1: Setting ECT in SYN, SYN/ACK and pure ACK packets.

Description: Scalable transports SHOULD set the ECT bit in SYN, SYN/ACK and pure ACK packets.

Motivation: Failing to set the ECT bit in SYN, SYN/ACK or ACK packets results in these packets being more likely to be dropped during congestion events. Dropping SYN and SYN/ACK packets is particularly bad for performance, as the retransmission timers for these packets are large. [RFC3168] prohibits marking these packets for security reasons. The arguments provided there should be revisited in the context of L4S, to evaluate whether avoiding marking these packets is still the best approach.

Optimization #5.2: Faster than additive increase.

Description: Scalable transports MAY support faster than additive increase in the congestion avoidance phase.

Motivation: As currently defined, DCTCP uses additive increase in the congestion avoidance phase. It would be beneficial for performance to update the congestion control algorithm to increase the CWND by more than 1 MSS per RTT during the congestion avoidance phase. In the context of L4S, such a mechanism must also provide fairness with other classes of traffic, including classic TCP and possibly scalable TCP that uses additive increase.

Optimization #5.3: Faster convergence to fairness.

Description: Scalable transports SHOULD converge to a fair share allocation of the available capacity as fast as classic TCP or faster.
Motivation: The time required for a new flow to obtain its fair share of the capacity of the bottleneck, when there are already ongoing flows using up all the bottleneck capacity, is higher in the case of DCTCP than in the case of classic TCP (about a factor of 1.5 to 2 larger according to [Alizadeh-stability]). This is detrimental in general, but it is very harmful for short flows, whose performance can be worse than that obtained with classic TCP. For this reason it is desirable that scalable transports provide convergence times no larger than those of classic TCP.

Appendix B. Standardization items

The following table includes all the items that should be standardized to provide a full L4S architecture.

The table is too wide for the ASCII draft format, so it has been split into two, with a common column of row index numbers on the left.

The columns in the second part of the table have the following meanings:

WG: The IETF WG most relevant to this requirement. The "tcpm/iccrg" combination refers to the procedure typically used for congestion control changes, where tcpm owns the approval decision, but uses the iccrg for expert review [NewCC_Proc];

TCP: Applicable to all forms of TCP congestion control;

DCTCP: Applicable to Data Centre TCP as currently used (in controlled environments);

DCTCP-bis: Applicable to a future Data Centre TCP congestion control intended for controlled environments;

XXX Prague: Applicable to a Scalable variant of XXX (TCP/SCTP/RMCAT) congestion control.
+-----+-----------------------+-------------------------------------+
| Req | Requirement           | Reference                           |
| #   |                       |                                     |
+-----+-----------------------+-------------------------------------+
| 0   | ARCHITECTURE          |                                     |
| 1   | L4S IDENTIFIER        | [I-D.briscoe-tsvwg-ecn-l4s-id]      |
| 2   | DUAL QUEUE AQM        | [I-D.briscoe-aqm-dualq-coupled]     |
| 3   | Suitable ECN Feedback | [I-D.ietf-tcpm-accurate-ecn],       |
|     |                       | [I-D.stewart-tsvwg-sctpecn].        |
|     |                       |                                     |
|     | SCALABLE TRANSPORT -  |                                     |
|     | SAFETY ADDITIONS      |                                     |
| 4-1 | Fall back to          | [I-D.ietf-tcpm-dctcp]               |
|     | Reno/Cubic on loss    |                                     |
| 4-2 | Fall back to          |                                     |
|     | Reno/Cubic if classic |                                     |
|     | ECN bottleneck        |                                     |
|     | detected              |                                     |
|     |                       |                                     |
| 4-3 | Reduce RTT-dependence |                                     |
|     |                       |                                     |
| 4-4 | Scaling TCP's         | [TCP-sub-mss-w]                     |
|     | Congestion Window for |                                     |
|     | Small Round Trip      |                                     |
|     | Times                 |                                     |
|     | SCALABLE TRANSPORT -  |                                     |
|     | PERFORMANCE           |                                     |
|     | ENHANCEMENTS          |                                     |
| 5-1 | Setting ECT in SYN,   | draft-bagnulo-tsvwg-generalized-ECN |
|     | SYN/ACK and pure ACK  |                                     |
|     | packets               |                                     |
| 5-2 | Faster-than-additive  |                                     |
|     | increase              |                                     |
| 5-3 | Less drastic exit     |                                     |
|     | from slow-start       |                                     |
+-----+-----------------------+-------------------------------------+

+-----+--------+-----+-------+-----------+--------+--------+--------+
| #   | WG     | TCP | DCTCP | DCTCP-bis | TCP    | SCTP   | RMCAT  |
|     |        |     |       |           | Prague | Prague | Prague |
+-----+--------+-----+-------+-----------+--------+--------+--------+
| 0   | tsvwg? | Y   | Y     | Y         | Y      | Y      | Y      |
| 1   | tsvwg? |     |       | Y         | Y      | Y      | Y      |
| 2   | aqm?   | n/a | n/a   | n/a       | n/a    | n/a    | n/a    |
|     |        |     |       |           |        |        |        |
| 3   | tcpm   | Y   | Y     | Y         | Y      | n/a    | n/a    |
|     |        |     |       |           |        |        |        |
| 4-1 | tcpm   |     | Y     | Y         | Y      | Y      | Y      |
|     |        |     |       |           |        |        |        |
| 4-2 | tcpm/  |     |       |           | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
|     |        |     |       |           |        |        |        |
| 4-3 | tcpm/  |     |       | Y         | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
| 4-4 | tcpm   | Y   | Y     | Y         | Y      | Y      | ?      |
|     |        |     |       |           |        |        |        |
| 5-1 | tsvwg  | Y   | Y     | Y         | Y      | n/a    | n/a    |
|     |        |     |       |           |        |        |        |
| 5-2 | tcpm/  |     |       | Y         | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
| 5-3 | tcpm/  |     |       | Y         | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
+-----+--------+-----+-------+-----------+--------+--------+--------+

Authors' Addresses

Bob Briscoe (editor)
Simula Research Lab

Email: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/

Koen De Schepper
Nokia Bell Labs
Antwerp
Belgium

Email: koen.de_schepper@nokia.com
URI: https://www.bell-labs.com/usr/koen.de_schepper

Marcelo Bagnulo
Universidad Carlos III de Madrid
Av. Universidad 30
Leganes, Madrid 28911
Spain

Phone: 34 91 6249500
Email: marcelo@it.uc3m.es
URI: http://www.it.uc3m.es