TSVWG                                                           R. Even
Internet-Draft                                                    Huawei
Intended status: Informational                                  R. Huang
Expires: April 25, 2020                    Huawei Technologies Co., Ltd.
                                                        October 23, 2019

              Data Center Fast Congestion Management
              draft-even-iccrg-dc-fast-congestion-00

Abstract

   Fast congestion control is discussed in academic papers as well as
   in the different standards bodies.  No single proposal provides a
   solution that works for all use cases, which has led to multiple
   approaches.  By congestion control we refer to an end-to-end
   solution and not only to the congestion control algorithm on the
   sender side.  This document describes the current state of flow
   control and congestion management for Data Centers and proposes
   future directions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  Abbreviations
   4.  Alternative Congestion Management mechanisms
     4.1.  Mechanisms based on estimation of network status
     4.2.  Network provides limited information
       4.2.1.  ECN and DCTCP
       4.2.2.  DCQCN
       4.2.3.  SCE - Some Congestion Experienced
       4.2.4.  L4S - Low Latency, Low Loss, Scalable Throughput
     4.3.  Network provides more information
     4.4.  Network provides proactive control
   5.  Summary and Proposal
     5.1.  Reflect the network status more accurately
     5.2.  Notify the reaction point as soon as possible.
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Fast congestion control is discussed in academic papers as well as
   in the different standards bodies.  No single proposal provides a
   solution that works for all use cases, which has led to multiple
   approaches.  By congestion control we refer to an end-to-end
   solution and not only to the congestion control algorithm on the
   sender side.

   The major use case that we are looking at is congestion control for
   Data Centers, a controlled environment [RFC8085].  With the
   emergence of Distributed Storage, AI/HPC (High Performance
   Computing), Machine Learning, etc., modern data center applications
   demand high throughput (40 Gbps and above) with ultra-low latency of
   less than 10 microseconds per hop from the network, with low CPU
   overhead.  The end-to-end latency should be less than 50
   microseconds; this value is based on DCQCN [DCQCN].  The high link
   speeds (>40 Gb/s) in Data Centers (DC) make network transfers
   complete faster and in fewer RTTs.
   Network traffic in a data center is often a mix of short and long
   flows, where the short flows require low latency and the long flows
   require high throughput.

   On IP-routed data center networks, RDMA is deployed using the RoCEv2
   [RoCEv2] protocol or iWARP [RFC5040].  RoCEv2 [RoCEv2] is a
   straightforward extension of the RoCE protocol that involves a
   simple modification of the RoCE packet format.  RoCEv2 packets carry
   an IP header, which allows traversal of IP L3 routers, and a UDP
   header that serves as a stateless encapsulation layer for the RDMA
   transport protocol packets over IP.  In Data Centers, RDMA over
   RoCEv2 expects a lossless fabric, which is achieved using ECN and
   PFC.  iWARP congestion control is based on TCP congestion control
   (DCTCP [RFC8257]).

   A good congestion control for data centers should provide low
   latency, fast convergence and high link utilization.  Since multiple
   applications with different requirements may run on the DC network,
   it is important to provide fairness between applications that may
   use different congestion algorithms.  An important issue from the
   user perspective is to achieve a short Flow Completion Time (FCT).

   This document investigates the current congestion control proposals
   and discusses future data center congestion control directions that
   aim to achieve high performance and collaboration.

2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Abbreviations

   RCM - RoCEv2 Congestion Management

   PFC - Priority-based Flow Control

   ECN - Explicit Congestion Notification

   DCQCN - Data Center Quantized Congestion Notification

   AI/HPC - Artificial Intelligence/High-Performance Computing

   ECMP - Equal-Cost Multipath

   NIC - Network Interface Card

   RED - Random Early Detection gateways for congestion avoidance

4.  Alternative Congestion Management mechanisms

   This section describes alternative directions based on current work.
   Looking at the alternatives from the network perspective, we can
   classify them as:

   1.  Based on estimation of network status: traditional TCP, Timely.

   2.  Network provides limited information: DCQCN using only ECN, SCE
       and L4S.

   3.  Network provides some information: HPCC.

   4.  Network provides proactive control: RCP (Rate Control Protocol).

   Note that any research on congestion control that requires network
   participation will be irrelevant if we cannot find a viable
   deployment path in which only part of the network devices support
   the proposed congestion control.

4.1.  Mechanisms based on estimation of network status

   Traditional mechanisms use the fate of packets, e.g. loss or delay,
   as the congestion signal fed back to the sender.  This is based on
   the facts that packets are dropped when a buffer is full and packets
   are delayed when a queue is building up.  It can be achieved simply
   by the interactions between the sender and the receiver, without the
   involvement of the network.  It has worked well on the Internet for
   a very long time, especially for best-effort applications that do
   not have specific performance requirements.

   However, these mechanisms are not optimized for some data center
   applications because the convergence time and throughput are not
   good enough.  This is mainly because the endpoints' estimation of
   the network status is not accurate enough, and these mechanisms lack
   further information to adjust the sender behavior.
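   As an illustration of this class of mechanisms, the sketch below
   adjusts the sending rate from the RTT gradient measured at the
   endpoints, in the spirit of Timely.  The constants and helper names
   are illustrative assumptions and are not taken from any cited
   specification.

      # Illustrative delay-gradient rate control (Timely-like sketch).
      # All constants (ADD_STEP, BETA, EWMA_G) are assumptions chosen
      # only for this example.

      class DelayGradientCc:
          ADD_STEP = 10e6     # additive increase, 10 Mb/s per update
          BETA = 0.8          # multiplicative decrease factor
          EWMA_G = 0.1        # smoothing gain for the RTT gradient

          def __init__(self, rate_bps, min_rtt_s):
              self.rate = rate_bps
              self.prev_rtt = min_rtt_s
              self.gradient = 0.0

          def on_ack(self, rtt_s):
              # Smooth the normalized RTT gradient; a positive value
              # means the queue is growing.
              raw = (rtt_s - self.prev_rtt) / self.prev_rtt
              self.gradient = ((1 - self.EWMA_G) * self.gradient
                               + self.EWMA_G * raw)
              self.prev_rtt = rtt_s
              if self.gradient <= 0:
                  # Queue draining or stable: probe for more bandwidth.
                  self.rate += self.ADD_STEP
              else:
                  # Queue building: back off in proportion to the gradient.
                  self.rate *= max(1 - self.BETA * self.gradient, 0.1)
              return self.rate

   A sender would call on_ack() for every acknowledgment and pace its
   transmissions at the returned rate; no support from the network
   devices is required.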
4.2.  Network provides limited information

   In these mechanisms, the network utilizes the ECN field of the IP
   header to provide some hints about the network status.  The
   following sections describe some typical proposals.

4.2.1.  ECN and DCTCP

   The Internet solutions use ECN [RFC3168] for marking the state of
   the queues in the network device.  They may use an AQM mechanism
   (FQ-CoDel [RFC8290], PIE [RFC8033]) in the network devices and a
   congestion algorithm (New Reno [RFC5681], CUBIC [RFC8312] or DCTCP
   [RFC8257]) on the sender side to address the congestion in the
   network.  Note that ECN is signaled earlier than packet drop but may
   cause an earlier exit from TCP slow start.

   One of the problems for TCP is that ECN is specified for TCP in such
   a way that only one feedback signal can be transmitted per Round-
   Trip Time (RTT).  [I-D.ietf-tcpm-accurate-ecn] specifies an
   alternative feedback scheme that provides more accurate information
   and can be used by DCTCP and L4S.

   Traditional TCP uses the ECN signal to indicate that congestion was
   experienced instead of relying on packet loss; however, it does not
   provide information about the degree of the congestion.  DCTCP
   [RFC8257] tries to solve this issue.  It estimates the fraction of
   bytes that encountered congestion rather than simply detecting the
   presence of congestion, and scales its sending rate accordingly.
   DCTCP is widely implemented in current data center environments.

4.2.2.  DCQCN

   An enhancement to the congestion handling for RoCEv2 is the
   Congestion Control for Large-Scale RDMA Deployments [DCQCN], which
   provides functionality similar to QCN [QCN] and DCTCP [RFC8257].  It
   is implemented in some of the RoCEv2 NICs but is not part of the
   RoCEv2 specification.  As such, vendors have their own
   implementations, which makes it difficult to interoperate with each
   other efficiently.

   The DCQCN tests assume that the Congestion Point uses RED-based ECN
   marking and that the RDMA CNP message is used by the Notification
   Point (the receiver) to report ECN Congestion Experienced (CE).
   DCQCN as presented includes parameters that have to be set, and it
   provides the parameter values that were used during the specific
   tests with Mellanox NICs.  One of the comments about DCQCN is that
   it is not simple to set the parameters so as to obtain an optimized
   solution.  This solution is specific to RoCEv2, addresses only the
   congestion control algorithm, and is implemented in the NIC.

   The DCQCN notification uses a CNP that only reports that at least
   one CE-marked packet was received in the last 50 microseconds; this
   is similar to TCP reporting.  Other UDP-based transports such as RTP
   and QUIC provide information about how many packets marked with CE,
   ECT(0) or ECT(1) were received.
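   Both DCTCP and DCQCN maintain a running estimate of the fraction of
   CE-marked traffic and reduce the sending rate in proportion to that
   estimate.  The sketch below shows a DCTCP-style estimator in the
   spirit of [RFC8257]; the gain value and the variable names are
   illustrative assumptions for this example.

      # DCTCP-style estimator of the fraction of CE-marked bytes
      # (sketch).  The gain G is an assumption; a commonly used value
      # is 1/16.

      G = 1.0 / 16

      class MarkedFractionEstimator:
          def __init__(self):
              self.alpha = 0.0      # smoothed fraction of marked bytes
              self.marked = 0
              self.total = 0

          def on_ack(self, acked_bytes, ce_marked):
              self.total += acked_bytes
              if ce_marked:
                  self.marked += acked_bytes

          def on_window_end(self, cwnd_bytes):
              # Once per observation window (roughly one RTT), update
              # alpha and shrink the congestion window in proportion to
              # the measured congestion level.
              frac = self.marked / self.total if self.total else 0.0
              self.alpha = (1 - G) * self.alpha + G * frac
              self.marked = self.total = 0
              if self.alpha > 0:
                  cwnd_bytes = max(int(cwnd_bytes * (1 - self.alpha / 2)), 1)
              return cwnd_bytes

   Because the window shrinks by alpha/2 rather than by half, a flow
   that sees only a small fraction of marks gives up only a small part
   of its rate, which is what allows the queue to be kept short without
   sacrificing throughput.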
4.2.3.  SCE - Some Congestion Experienced

   [I-D.morton-taht-tsvwg-sce] proposes using ECT(1) as an early
   notification of congestion on ECT(0)-marked packets, which can be
   used by AQM algorithms and transports as an earlier signal of
   congestion than CE ("Congestion Experienced").

   The ECN specification says that the congestion algorithm should
   treat a CE mark the same as a dropped packet.  Using ECT(1) to
   signal SCE permits middleboxes implementing AQM to signal incipient
   congestion, below the threshold required to justify setting CE.
   Existing [RFC3168]-compliant receivers MUST transparently ignore
   this new signal with respect to congestion control, and both
   existing and SCE-aware middleboxes MAY convert SCE to CE in the same
   circumstances as for ECT, thus ensuring backwards compatibility with
   ECN [RFC3168] endpoints.

   This solution uses ECT(1), which was associated in ECN [RFC3168]
   with a one-bit nonce; that use is obsoleted by RFC 8311, and SCE
   reuses the codepoint for the SCE mark.  Other documents may try to
   use this codepoint as well; for example, L4S uses it to signal L4S
   support.  The SCE markings are applied by the AQM algorithm (RED,
   CoDel) and are sent back to the sender by the transport, so support
   for conveying the SCE marking to the sender may need to be added
   (QUIC, for example, already reports the counts of ECT(0) and ECT(1)
   separately).  This solution is simpler than HPCC but provides less
   information.

   [I-D.heist-tsvwg-sce-one-and-two-flow-tests] presents one- and two-
   flow test results for the SCE reference implementation.  These tests
   are not intended to be a comprehensive real-world evaluation of SCE,
   but an illustration of SCE's influence on basic TCP metrics in a
   controlled environment.  The goal of the one-flow tests is to
   analyze the impact of SCE on the TCP throughput and TCP RTT of
   single TCP flows across a range of simulated path bandwidths and
   RTTs.  The tests were run with Reno and DCTCP.  Even though using
   SCE gave better results in general, there was significant under-
   utilization at low bandwidths (<10 Mb/s; <25 Mb/s), a slight
   increase in TCP RTT for DCTCP-SCE at 100 Mbit/160 ms, and a slight
   increase in TCP RTT for SCE-Reno at high BDPs.  The document does
   not describe the congestion algorithm that was used for DCTCP-SCE or
   Reno-SCE and comments that further work is needed to understand the
   reason for this behavior.

   The goal of the two-flow tests is to measure fairness between and
   among SCE and non-SCE TCP flows, through either a single queue or
   with fair queuing.

   The initial results show that SCE-enabled flows back off in the face
   of competition, whereas non-SCE flows fill the queue until a drop or
   CE mark occurs, so fairness is not achieved.  By changing the ramp
   at which SCE is marked, and marking SCE closer to the point of drop
   or CE, fairness improves.
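   The sketch below illustrates how a sender might react to the two
   levels of signal: a gentle reduction in response to SCE feedback and
   an [RFC3168]-style halving in response to CE.  The response factors
   are illustrative assumptions and are not taken from the SCE drafts.

      # Illustrative two-level reaction to SCE and CE feedback (sketch).
      # The reduction factors are assumptions chosen for the example.

      SCE_BACKOFF = 1.0 / 8   # mild reduction for a fully SCE-marked RTT
      CE_BACKOFF = 1.0 / 2    # classic halving on CE, as for a loss

      def update_cwnd(cwnd, acked_segments, sce_count, ce_seen):
          """Return the new congestion window (in segments) after one RTT."""
          if ce_seen:
              return max(cwnd * (1 - CE_BACKOFF), 1.0)
          if sce_count and acked_segments:
              sce_fraction = min(sce_count / acked_segments, 1.0)
              return max(cwnd * (1 - SCE_BACKOFF * sce_fraction), 1.0)
          return cwnd + 1.0   # no congestion signal: additive increase

   Because SCE is signaled below the CE threshold, a sender reacting
   this way can keep the bottleneck queue short while still leaving CE
   (or loss) as the hard back-off signal for flows that ignore SCE.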
4.2.4.  L4S - Low Latency, Low Loss, Scalable Throughput

   There are three main components to the L4S architecture
   [I-D.ietf-tsvwg-l4s-arch]:

   1.  Network: L4S traffic needs to be isolated from the queuing
       latency of Classic traffic.  However, the two should be able to
       freely share a common pool of capacity.  This is because there
       is no way to predict how many flows at any one time might use
       each service, and capacity in access networks is too scarce to
       partition into two.  The Dual Queue Coupled AQM
       [I-D.ietf-tsvwg-aqm-dualq-coupled] was developed as a minimal-
       complexity solution to this problem.  The two queues appear to
       be separated by a 'semi-permeable' membrane that partitions
       latency but not bandwidth.  Per-flow queuing such as in
       [RFC8290] could be used, but it partitions both latency and
       bandwidth between every end-to-end flow.  It is therefore rather
       overkill, which brings disadvantages, not least that a large
       number of queues is needed when two are sufficient.

   2.  Protocol: A host needs to distinguish L4S and Classic packets
       with an identifier so that the network can classify them into
       their separate treatments.  [I-D.ietf-tsvwg-ecn-l4s-id]
       considers various alternative identifiers and concludes that all
       alternatives involve compromises, but the ECT(1) and CE
       codepoints of the ECN field represent a workable solution.

   3.  Host: Scalable congestion controls already exist.  They solve
       the scaling problem with TCP that was first pointed out in
       [RFC3649].  The one used most widely (in controlled
       environments) is Data Center TCP (DCTCP [RFC8257]).  Although
       DCTCP as-is 'works' well over the public Internet, most
       implementations lack certain safety features that will be
       necessary once it is used outside controlled environments like
       data centers.  A similar scalable congestion control will also
       need to be transplanted into protocols other than TCP (QUIC,
       SCTP, RTP/RTCP, RMCAT, etc.).  Indeed, while the L4S
       architecture document was being drafted, the following scalable
       congestion controls were implemented: TCP Prague, QUIC Prague
       and an L4S variant of the RMCAT SCReAM controller [RFC8298].

   Using the Dual Queue provides better fairness between DCTCP and
   Reno/Cubic.  This is less relevant to Data Centers, where the
   competing streams may use DCQCN and DCTCP.
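   A minimal sketch of the coupling idea in
   [I-D.ietf-tsvwg-aqm-dualq-coupled] is shown below: the Classic queue
   computes a drop/mark probability, the L4S queue is CE-marked with a
   probability coupled to it, and the L4S queue additionally applies a
   shallow delay threshold of its own.  The coupling constant, the toy
   Classic law and the threshold values are assumptions made for this
   example, not the values in the draft.

      # DualQ coupled marking sketch.  The square-root coupling follows
      # the coupled-AQM idea; the queue laws and constants below are
      # simplified assumptions for illustration only.

      import math

      K = 2.0  # coupling factor between Classic and L4S marking levels

      def classic_drop_probability(classic_delay_s, target_s=0.015):
          # Toy law: probability grows with queue delay above the target.
          return max(min((classic_delay_s - target_s) * 10.0, 1.0), 0.0)

      def l4s_mark_probability(p_classic, l4s_delay_s, step_s=0.001):
          # L4S marking is the coupled probability, plus a shallow step
          # threshold on the L4S queue's own delay.
          p_coupled = min(K * math.sqrt(p_classic), 1.0)
          p_step = 1.0 if l4s_delay_s > step_s else 0.0
          return max(p_coupled, p_step)

      # Example: 25 ms of Classic queue delay gives a Classic drop/mark
      # probability of about 0.1 and a coupled L4S CE-marking
      # probability of about 0.63.
      p_c = classic_drop_probability(0.025)
      print(p_c, l4s_mark_probability(p_c, 0.0004))

   The square-root coupling is what lets a scalable (DCTCP-like) flow
   in the L4S queue and a Classic flow in the other queue converge to
   roughly equal rates even though they respond to marks differently.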
4.3.  Network provides more information

   HPCC (High Precision Congestion Control) [HPCC] is a new-generation
   congestion control protocol for high-speed cloud networks, aiming to
   achieve the ultimate performance and high stability of the high-
   speed cloud network at the same time.  HPCC was presented at ACM
   SIGCOMM 2019.

   The key design choice of HPCC is to rely on switches to provide
   fine-grained load information, such as queue size and accumulated
   tx/rx traffic, to compute precise flow rates.  This has two major
   benefits: (i) HPCC can quickly converge to proper flow rates to
   highly utilize bandwidth while avoiding congestion; and (ii) HPCC
   can consistently maintain a close-to-zero queue for low latency.

   HPCC is a sender-driven CC framework.  Each packet a sender sends
   will be acknowledged by the receiver.  During the propagation of the
   packet from the sender to the receiver, each switch along the path
   leverages the In-band Network Telemetry (INT) feature of its
   switching ASIC to insert some metadata that reports the current load
   of the packet's egress port, including timestamp (ts), queue length
   (qLen), transmitted bytes (txBytes), and the link bandwidth capacity
   (B).  When the receiver gets the packet, it copies all the metadata
   recorded by the switches into the ACK message it sends back to the
   sender.  The sender decides how to adjust its flow rate each time it
   receives an ACK with network load information.

   Current IETF activity on IOAM [I-D.ietf-ippm-ioam-data] provides a
   standard mechanism for the switches in the middle to insert
   metadata.  IOAM can provide an optional method for the network to
   feed metadata about the congestion status back to the endpoints.
   But to use IOAM, the following points should be considered:

   1.  Whether the current IOAM data fields are sufficient for
       congestion control.

   2.  The encapsulation of IOAM in the data center for congestion
       control.

   3.  The feedback format for sender-driven congestion control.

   The HPCC framework requires each node in the middle to add
   information about its state to the forward-going packet until it
   reaches the receiver, which will send the acknowledgment.  We can
   think of other modes, such as having the nodes in the middle update
   the status information based on their available resources.  This
   solution requires support for INT or IOAM; both protocols need to
   specify the packet format with the INT/IOAM extension.  The HPCC
   document specifies how to implement it for RoCEv2, while for IOAM
   there are some drafts in the IPPM WG describing how to implement it
   for different transports and layer 2 packets.

   The conclusion from the trials was that HPCC can be a next-
   generation CC for high-speed networks, achieving ultra-low latency,
   high bandwidth, and stability simultaneously.  HPCC achieves fast
   convergence, small queues, and fairness by leveraging precise load
   information from INT.

   A similar mechanism is defined in Quick-Start for TCP and IP
   [RFC4782].  There is a difference in the starting rate: while HPCC
   starts at the maximum line speed, [RFC4782] starts at the rate
   specified in the Quick-Start request message.  Quick-Start is
   specified for TCP; if another transport (UDP) is used, there is a
   need to specify how the receiver sends the Quick-Start response
   message.
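   The sketch below shows the flavor of the computation HPCC describes:
   from consecutive per-hop INT records the sender estimates each
   link's utilization and scales its rate according to the most loaded
   hop.  The record fields mirror the ones listed above (ts, qLen,
   txBytes, B); the target utilization value and the update rule are
   simplifications and assumptions, not the exact HPCC algorithm, which
   also includes an additive-increase term and per-RTT reference
   windows.

      # Sketch of HPCC-style rate computation from per-hop INT metadata.
      # ETA and the update rule are simplified assumptions.

      from dataclasses import dataclass

      ETA = 0.95  # target link utilization

      @dataclass
      class HopRecord:
          ts: float        # time the packet left the hop (seconds)
          q_len: int       # queue length at the egress port (bytes)
          tx_bytes: int    # cumulative bytes transmitted on the port
          link_bps: float  # link capacity (bits per second)

      def hop_utilization(prev, cur, base_rtt):
          # Estimate the normalized utilization of one hop from two
          # consecutive INT samples of the same port.
          dt = max(cur.ts - prev.ts, 1e-9)
          tx_rate_bps = (cur.tx_bytes - prev.tx_bytes) * 8.0 / dt
          queue_bps = cur.q_len * 8.0 / base_rtt   # standing-queue term
          return (tx_rate_bps + queue_bps) / cur.link_bps

      def new_rate(cur_rate_bps, prev_hops, cur_hops, base_rtt):
          # Scale the sending rate by the most utilized hop on the path.
          worst = max(hop_utilization(p, c, base_rtt)
                      for p, c in zip(prev_hops, cur_hops))
          return cur_rate_bps * ETA / max(worst, 1e-9)

   The same computation could in principle be driven by IOAM trace data
   instead of INT, which is one of the questions raised in the list
   above.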
4.4.  Network provides proactive control

   The typical algorithm in this category is RCP (Rate Control
   Protocol) [RCP].  In the basic RCP algorithm, a router maintains a
   single rate, R(t), for every link.  The router "stamps" R(t) on
   every passing packet (unless it already carries a slower value).
   The receiver sends the value back to the sender, thus informing it
   about the slowest (or bottleneck) rate along the path.  In this way,
   the sender quickly finds out the rate it should be using (without
   the need for Slow-Start).  The router updates R(t) approximately
   once per round-trip time and strives to emulate Processor Sharing
   among flows.  The biggest advantage of RCP is the short flow
   completion times under a wide range of network and traffic
   characteristics.

   The downside of RCP is that it involves the routers in congestion
   control, so it needs help from the infrastructure.  Although the
   computations are simple, they are performed per packet.  Another
   downside is that although the RCP algorithm strives to keep the
   buffer occupancy low most of the time, there is no guarantee that
   buffers will not overflow or that packet loss will be zero.

5.  Summary and Proposal

   Congestion control is all about how to utilize the network resources
   in a better and more reasonable way under different network
   conditions.  Senders are the reaction points that consume network
   resources, and network nodes are the congestion points.  Ideally,
   reaction points should react as soon as possible when the network
   status changes.  To achieve that, there are two directions:

5.1.  Reflect the network status more accurately

   In order to provide more information than just ECN CE marking, there
   is a need to standardize a mechanism for the network device to
   provide such information and for the receiver to send more
   information to the sender.  The network device should not insert any
   new fields into the IP packet but should be able to modify the value
   of fields in the packets sent from the data sender.

   The network device will update the metadata in the forward-going
   packet to provide more information than a single CE mark or an SCE-
   like solution.

   The receiver will analyze the metadata and report back to the
   sender.  Unlike the general Internet, a data center network can
   benefit more from having accurate information to achieve better
   congestion control, and this means the network and the hosts must
   collaborate to achieve it.

   Issues to be addressed:

   o  How to add the metadata to the forward stream (IOAM is a valid
      option since we are interested in a single DC domain).  The
      encapsulations for both IPv4 and IPv6 should be considered.

   o  Negotiation of the capabilities of different nodes.

   o  The format of the network information feedback to the sender in
      the case of sender-driven mechanisms (a possible encoding is
      sketched after this list).

   o  The semantics of the message (notification or proactive).

   o  Investigation of the extra load on the network device for adding
      the metadata.
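   As a concrete illustration of the feedback-format question, the
   sketch below packs the per-hop load metadata collected on the
   forward path into a compact receiver-to-sender report.  The field
   set and the fixed-size binary layout are assumptions for discussion,
   not a proposed wire format.

      # Sketch of a receiver-to-sender feedback report carrying per-hop
      # load metadata (e.g. collected via IOAM/INT on the forward
      # path).  The field set and layout are illustrative assumptions.

      import struct

      # node_id, queue_len (KB), tx utilization (0..10000), timestamp (ns)
      HOP_FMT = "!IHHQ"
      HOP_SIZE = struct.calcsize(HOP_FMT)

      def encode_report(flow_id, hops):
          # hops: list of (node_id, queue_len_kb, tx_util, ts_ns) tuples.
          header = struct.pack("!IB", flow_id, len(hops))
          return header + b"".join(struct.pack(HOP_FMT, *h) for h in hops)

      def decode_report(buf):
          flow_id, count = struct.unpack_from("!IB", buf, 0)
          hops = [struct.unpack_from(HOP_FMT, buf, 5 + i * HOP_SIZE)
                  for i in range(count)]
          return flow_id, hops

      # Example: two hops, the second one 83.5% utilized with a 96 KB
      # queue.
      report = encode_report(7, [(1, 4, 1200, 1000), (2, 96, 8350, 2000)])
      assert decode_report(report) == (7, [(1, 4, 1200, 1000),
                                           (2, 96, 8350, 2000)])

   A small, fixed-format report keeps the parsing cost low, which
   matters given the NIC implementations and hardware implications
   mentioned elsewhere in this document.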
5.2.  Notify the reaction point as soon as possible.

   In this direction, it is worth investigating whether it is possible
   for the middle nodes to notify the sender directly (like IOAM
   Postcards) about network conditions.  Such a method is challenging
   in terms of addressing security issues, and the first concern will
   be that it can serve as a tool for a DoS attack.  Other ways, for
   example carrying the information in the reverse traffic, would be an
   alternative as long as reverse traffic exists.

   Issues to be addressed:

   o  How to deal with multiple congestion points.

   o  How to identify support by the sender and receiver for this mode
      and how to support legacy systems (same as the previous mode).

   o  How to authenticate the validity of the data.

   o  Hardware implications.

6.  Security Considerations

   TBD

7.  IANA Considerations

   No IANA action.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2.  Informative References

   [CongestionManagment]
              "Understanding RoCEv2 Congestion Management", December
              2018.

   [DCQCN]    Zhu, Y., Eran, H., Firestone, D., Guo, C., Lipshteyn, M.,
              Liron, Y., Padhye, J., Raindel, S., Yahia, M. H., and M.
              Zhang, "Congestion Control for Large-Scale RDMA
              Deployments", ACM SIGCOMM Computer Communication Review,
              Vol. 45, pp. 523-536, August 2015.

   [HPCC]     Li, Y., Miao, R., Liu, H. H., Zhuang, Y., Feng, F., Tang,
              L., Cao, Z., Zhang, M., Kelly, F., Alizadeh, M., and M.
              Yu, "HPCC: High Precision Congestion Control", ACM
              SIGCOMM 2019, August 2019.

   [I-D.heist-tsvwg-sce-one-and-two-flow-tests]
              Heist, P., Grimes, R., and J. Morton, "Some Congestion
              Experienced One and Two-Flow Tests", draft-heist-tsvwg-
              sce-one-and-two-flow-tests-00 (work in progress), July
              2019.

   [I-D.herbert-ipv4-eh]
              Herbert, T., "IPv4 Extension Headers and Flow Label",
              draft-herbert-ipv4-eh-01 (work in progress), May 2019.

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
              Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
              P., Chang, R., daniel.bernier@bell.ca, d., and J. Lemon,
              "Data Fields for In-situ OAM", draft-ietf-ippm-ioam-
              data-07 (work in progress), September 2019.

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-23 (work
              in progress), September 2019.

   [I-D.ietf-tcpm-accurate-ecn]
              Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
              Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-
              ecn-09 (work in progress), July 2019.

   [I-D.ietf-tsvwg-aqm-dualq-coupled]
              Schepper, K., Briscoe, B., and G. White, "DualQ Coupled
              AQMs for Low Latency, Low Loss and Scalable Throughput
              (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-10 (work in
              progress), July 2019.

   [I-D.ietf-tsvwg-ecn-l4s-id]
              Schepper, K. and B. Briscoe, "Identifying Modified
              Explicit Congestion Notification (ECN) Semantics for
              Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s-
              id-07 (work in progress), July 2019.

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., Bagnulo, M., and G. White,
              "Low Latency, Low Loss, Scalable Throughput (L4S)
              Internet Service: Architecture", draft-ietf-tsvwg-l4s-
              arch-04 (work in progress), July 2019.

   [I-D.morton-taht-tsvwg-sce]
              Morton, J. and D. Taht, "The Some Congestion Experienced
              ECN Codepoint", draft-morton-taht-tsvwg-sce-00 (work in
              progress), March 2019.

   [IEEE.802.1QBB_2011]
              IEEE, "IEEE Standard for Local and metropolitan area
              networks--Media Access Control (MAC) Bridges and Virtual
              Bridged Local Area Networks--Amendment 17: Priority-based
              Flow Control", IEEE 802.1Qbb-2011,
              DOI 10.1109/ieeestd.2011.6032693, September 2011.

   [QCN]      Alizadeh, M., Atikoglu, B., Kabbani, A., Lakshmikantha,
              A., Pan, R., Prabhakar, B., and M. Seaman, "Data Center
              Transport Mechanisms: Congestion Control Theory and IEEE
              Standardization", September 2008.

   [RCP]      Dukkipati, N., "Rate Control Protocol (RCP): Congestion
              Control to Make Flows Complete Quickly", October 2007.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC3649]  Floyd, S., "HighSpeed TCP for Large Congestion Windows",
              RFC 3649, DOI 10.17487/RFC3649, December 2003,
              <https://www.rfc-editor.org/info/rfc3649>.

   [RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti,
              "Quick-Start for TCP and IP", RFC 4782,
              DOI 10.17487/RFC4782, January 2007,
              <https://www.rfc-editor.org/info/rfc4782>.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
              Garcia, "A Remote Direct Memory Access Protocol
              Specification", RFC 5040, DOI 10.17487/RFC5040, October
              2007, <https://www.rfc-editor.org/info/rfc5040>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
              and K. Carlberg, "Explicit Congestion Notification (ECN)
              for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
              2012, <https://www.rfc-editor.org/info/rfc6679>.

   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
              "Proportional Integral Controller Enhanced (PIE): A
              Lightweight Control Scheme to Address the Bufferbloat
              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017,
              <https://www.rfc-editor.org/info/rfc8033>.
   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
              October 2017, <https://www.rfc-editor.org/info/rfc8257>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
              and Active Queue Management Algorithm", RFC 8290,
              DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

   [RFC8298]  Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation
              for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December
              2017, <https://www.rfc-editor.org/info/rfc8298>.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
              RFC 8312, DOI 10.17487/RFC8312, February 2018,
              <https://www.rfc-editor.org/info/rfc8312>.

   [RoCEv2]   InfiniBand Trade Association, "Supplement to InfiniBand
              Architecture Specification Volume 1 Release 1.2.2, Annex
              A17: RoCEv2 (IP Routable RoCE)".

Authors' Addresses

   Roni Even
   Huawei

   Email: roni.even@huawei.com


   Rachel Huang
   Huawei Technologies Co., Ltd.

   Email: rachel.huang@huawei.com