Network Working Group                                        S. Bensley
Internet-Draft                                                 Microsoft
Intended status: Informational                                 L. Eggert
Expires: May 5, 2016                                              NetApp
                                                               D. Thaler
                                                      P. Balasubramanian
                                                               Microsoft
                                                                 G. Judd
                                                          Morgan Stanley
                                                        November 2, 2015

     Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters
                        draft-ietf-tcpm-dctcp-01

Abstract

   This informational memo describes Datacenter TCP (DCTCP), an
   improvement to TCP congestion control for datacenter traffic.  DCTCP
   uses improved Explicit Congestion Notification (ECN) processing to
   estimate the fraction of bytes that encounter congestion, rather
   than simply detecting that some congestion has occurred.  DCTCP then
   scales the TCP congestion window based on this estimate.  This
   method achieves high burst tolerance, low latency, and high
   throughput with shallow-buffered switches.  This memo also discusses
   deployment issues related to the coexistence of DCTCP and
   conventional TCP, the lack of a negotiating mechanism between sender
   and receiver, and presents some possible mitigations.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 5, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  DCTCP Algorithm
     3.1.  Marking Congestion on the Switches
     3.2.  Echoing Congestion Information on the Receiver
     3.3.  Processing Congestion Indications on the Sender
     3.4.  Handling of SYN, SYN-ACK, RST Packets
   4.  Implementation Issues
   5.  Deployment Issues
   6.  Known Issues
   7.  Implementation Status
   8.  Security Considerations
   9.  IANA Considerations
   10.  Acknowledgements
   11.  References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Large datacenters necessarily need many network switches to
   interconnect their many servers.  Therefore, a datacenter can
   greatly reduce its capital expenditure by leveraging low-cost
   switches.  However, such low-cost switches tend to have limited
   queue capacities and are thus more susceptible to packet loss due
   to congestion.

   Network traffic in a datacenter is often a mix of short and long
   flows, where the short flows require low latencies and the long
   flows require high throughputs.  Datacenters also experience incast
   bursts, where many servers send traffic to a single server at the
   same time.  For example, this traffic pattern is a natural
   consequence of the MapReduce workload: the worker nodes complete at
   approximately the same time, and all reply to the master node
   concurrently.

   These factors place some conflicting demands on the queue occupancy
   of a switch:

   o  The queue must be short enough that it does not impose excessive
      latency on short flows.

   o  The queue must be long enough to buffer sufficient data for the
      long flows to saturate the path capacity.

   o  The queue must be short enough to absorb incast bursts without
      excessive packet loss.

   Standard TCP congestion control [RFC5681] relies on packet loss to
   detect congestion.  This does not meet the demands described above.
   First, short flows will start to experience unacceptable latencies
   before packet loss occurs.  Second, by the time TCP congestion
   control kicks in on the senders, most of the incast burst has
   already been dropped.

   [RFC3168] describes a mechanism for using Explicit Congestion
   Notification (ECN) from the switches for early detection of
   congestion, rather than waiting for packet loss to occur.  However,
   this method only detects the presence of congestion, not its
   extent.  In the presence of mild congestion, the TCP congestion
   window is reduced too aggressively, which unnecessarily reduces the
   throughput of long flows.

   Datacenter TCP (DCTCP) improves traditional ECN processing by
   estimating the fraction of bytes that encounter congestion, rather
   than simply detecting that some congestion has occurred.  DCTCP
   then scales the TCP congestion window based on this estimate.  This
   method achieves high burst tolerance, low latency, and high
   throughput with shallow-buffered switches.

   It is recommended that DCTCP be deployed in a datacenter
   environment where the endpoints and the switching fabric are under
   a single administrative domain.  This protocol is not meant for
   uncontrolled deployment in the global Internet.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

3.  DCTCP Algorithm

   There are three components involved in the DCTCP algorithm:

   o  The switches (or other intermediate devices in the network)
      detect congestion and set the Congestion Encountered (CE)
      codepoint in the IP header.

   o  The receiver echoes the congestion information back to the
      sender, using the ECN-Echo (ECE) flag in the TCP header.

   o  The sender computes a congestion estimate and reacts by reducing
      the TCP congestion window (cwnd) accordingly.

3.1.  Marking Congestion on the Switches

   The switches in a datacenter fabric indicate congestion to the end
   nodes by setting the CE codepoint in the IP header as specified in
   Section 5 of [RFC3168].  For example, the switches may be
   configured with a congestion threshold.  When a packet arrives at a
   switch and its queue length is greater than the congestion
   threshold, the switch sets the CE codepoint in the packet.
   Section 3.4 of [DCTCP10], for example, suggests threshold marking
   with a threshold K > (RTT * C)/7, where C is the link speed in
   packets per second.  However, the actual algorithm for marking
   congestion is an implementation detail of the switch and will
   generally not be known to the sender and receiver.  Therefore,
   sender and receiver MUST NOT assume that a particular marking
   algorithm is implemented by the switching fabric.

3.2.  Echoing Congestion Information on the Receiver

   According to Section 6.1.3 of [RFC3168], the receiver sets the ECE
   flag if any of the packets being acknowledged had the CE codepoint
   set.  The receiver then continues to set the ECE flag until it
   receives a packet with the Congestion Window Reduced (CWR) flag
   set.  However, the DCTCP algorithm requires more detailed
   congestion information.  In particular, the sender must be able to
   determine the number of bytes sent that encountered congestion.
   Thus, the scheme described in [RFC3168] does not suffice.

   One possible solution is to ACK every packet and set the ECE flag
   in the ACK if and only if the CE codepoint was set in the packet
   being acknowledged.
   However, this prevents the use of delayed ACKs, which are an
   important performance optimization in datacenters.

   Instead, DCTCP introduces a new Boolean TCP state variable, "DCTCP
   Congestion Encountered" (DCTCP.CE), which is initialized to false
   and stored in the Transmission Control Block (TCB).  When sending
   an ACK, the ECE flag MUST be set if and only if DCTCP.CE is true.
   When receiving packets, the CE codepoint MUST be processed as
   follows:

   1.  If the CE codepoint is set and DCTCP.CE is false, send an ACK
       for any previously unacknowledged packets and set DCTCP.CE to
       true.

   2.  If the CE codepoint is not set and DCTCP.CE is true, send an
       ACK for any previously unacknowledged packets and set DCTCP.CE
       to false.

   3.  Otherwise, ignore the CE codepoint.

   The handling of the "Congestion Window Reduced" (CWR) bit is also
   exactly as per [RFC3168], including [RFC3168-ERRATA3639].  That is,
   on receipt of a segment with both the CE and CWR bits set, CWR is
   processed first and then CE is processed.

                              Send immediate
                              ACK with ECE=0
                 .-----.  .--------------.  .----.
    Send 1 ACK  /      v  |              v  |     \
    for every  |     .------.          .------.    |  Send 1 ACK
    m packets  |     | CE=0 |          | CE=1 |    |  for every
    with ECE=0 |     '------'          '------'    |  m packets
                \      |  ^              |  ^     /   with ECE=1
                 '-----'  '--------------'  '----'
                              Send immediate
                              ACK with ECE=1

     Figure 1: ACK generation state machine.  DCTCP.CE abbreviated as
                                   CE.

3.3.  Processing Congestion Indications on the Sender

   The sender estimates the fraction of bytes sent that encountered
   congestion.  The current estimate is stored in a new TCP state
   variable, DCTCP.Alpha, which is initialized to 1 and MUST be
   updated as follows:

      DCTCP.Alpha = DCTCP.Alpha * (1 - g) + g * M

   where

   o  g is the estimation gain, a real number between 0 and 1.  The
      selection of g is left to the implementation.  See Section 4 for
      further considerations.

   o  M is the fraction of bytes sent that encountered congestion
      during the previous observation window, where the observation
      window is chosen to be approximately the Round Trip Time (RTT).
      In particular, an observation window ends when all bytes in
      flight at the beginning of the window have been acknowledged.

   In order to update DCTCP.Alpha, the TCP state variables defined in
   [RFC0793] are used, and three additional TCP state variables are
   introduced:

   o  DCTCP.WindowEnd: The TCP sequence number threshold for beginning
      a new observation window; initialized to SND.UNA.

   o  DCTCP.BytesSent: The number of bytes sent during the current
      observation window; initialized to zero.

   o  DCTCP.BytesMarked: The number of bytes sent during the current
      observation window that encountered congestion; initialized to
      zero.

   The congestion estimator on the sender MUST process acceptable ACKs
   as follows:

   1.  Compute the bytes acknowledged (TCP SACK options [RFC2018] are
       ignored):

          BytesAcked = SEG.ACK - SND.UNA

   2.  Update the bytes sent:

          DCTCP.BytesSent += BytesAcked

   3.  If the ECE flag is set, update the bytes marked:

          DCTCP.BytesMarked += BytesAcked

   4.  If the acknowledgment number is less than or equal to
       DCTCP.WindowEnd, stop processing.  Otherwise, the end of the
       observation window has been reached, so proceed to update the
       congestion estimate as follows:

   5.  Compute the congestion level for the current observation
       window:

          M = DCTCP.BytesMarked / DCTCP.BytesSent

   6.  Update the congestion estimate:

          DCTCP.Alpha = DCTCP.Alpha * (1 - g) + g * M

   7.  Determine the end of the next observation window:

          DCTCP.WindowEnd = SND.NXT

   8.  Reset the byte counters:

          DCTCP.BytesSent = DCTCP.BytesMarked = 0

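   The following sketch illustrates one possible way to structure the
   receiver-side ACK generation of Section 3.2 and the sender-side
   estimator above.  It is written in Python purely for illustration
   and is not part of this specification; all names (DctcpReceiver,
   DctcpSender, on_ack, and so on) are invented for this example, and
   the final cwnd adjustment anticipates the update rule given in the
   next paragraph.

      # Non-normative illustration; a real implementation would live
      # inside a TCP stack and reuse its existing state variables.

      class DctcpReceiver:
          def __init__(self):
              self.ce = False                     # DCTCP.CE

          def on_data_segment(self, ce_marked, send_immediate_ack):
              # Steps 1-3 of Section 3.2 / Figure 1.  The immediate
              # ACK covers previously unacknowledged packets and
              # carries the old DCTCP.CE value in its ECE flag.
              if ce_marked and not self.ce:
                  send_immediate_ack(ece=self.ce)      # ECE=0
                  self.ce = True
              elif not ce_marked and self.ce:
                  send_immediate_ack(ece=self.ce)      # ECE=1
                  self.ce = False
              # Otherwise, normal (possibly delayed) ACKs are sent
              # with ECE set if and only if DCTCP.CE is true.

      class DctcpSender:
          def __init__(self, snd_una, g=1.0 / 16):
              self.alpha = 1.0                    # DCTCP.Alpha
              self.g = g                          # estimation gain
              self.window_end = snd_una           # DCTCP.WindowEnd
              self.bytes_sent = 0                 # DCTCP.BytesSent
              self.bytes_marked = 0               # DCTCP.BytesMarked

          def on_ack(self, seg_ack, snd_una, snd_nxt, ece, cwnd):
              bytes_acked = seg_ack - snd_una               # step 1
              self.bytes_sent += bytes_acked                # step 2
              if ece:
                  self.bytes_marked += bytes_acked          # step 3
              if seg_ack > self.window_end:                 # step 4
                  m = self.bytes_marked / self.bytes_sent   # step 5
                  self.alpha = (self.alpha * (1 - self.g)
                                + self.g * m)               # step 6
                  self.window_end = snd_nxt                 # step 7
                  self.bytes_sent = self.bytes_marked = 0   # step 8
              if ece:
                  # Congestion reaction described in the next
                  # paragraph; a full implementation reacts at most
                  # once per window of data, as noted below.
                  cwnd = cwnd * (1 - self.alpha / 2)
              return cwnd
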
   Rather than always halving the congestion window as described in
   [RFC3168], when the sender receives an indication of congestion
   (ECE), the sender MUST update cwnd as follows:

      cwnd = cwnd * (1 - DCTCP.Alpha / 2)

   Thus, when no bytes sent experienced congestion, DCTCP.Alpha equals
   zero, and cwnd is left unchanged.  When all sent bytes experienced
   congestion, DCTCP.Alpha equals one, and cwnd is reduced by half.
   Lower levels of congestion will result in correspondingly smaller
   reductions to cwnd.

   Just as specified in [RFC3168], TCP should not react to congestion
   indications more than once for every window of data.  The setting
   of the "Congestion Window Reduced" (CWR) bit is also exactly as per
   [RFC3168].

3.4.  Handling of SYN, SYN-ACK, RST Packets

   [RFC3168] requires that a compliant TCP MUST NOT set ECT on SYN or
   SYN-ACK packets.  [RFC5562] proposes setting ECT on SYN-ACK
   packets, but maintains the restriction of no ECT on SYN packets.
   Both of these RFCs prohibit ECT in SYN packets due to security
   concerns regarding malicious SYN packets with ECT set.  These RFCs,
   however, are intended for general Internet use and do not directly
   apply to a controlled datacenter environment.  The switching fabric
   can drop TCP packets that do not have the ECT codepoint set in the
   IP header.  If SYN and SYN-ACK packets for DCTCP connections do not
   have ECT set, they will be dropped with high probability.  For
   DCTCP connections, the sender SHOULD set ECT for SYN, SYN-ACK and
   RST packets.

4.  Implementation Issues

   As noted in Section 3.3, the implementation MUST choose a suitable
   estimation gain.  [DCTCP10] provides a theoretical basis for
   selecting the gain.  However, it may be more practical to use
   experimentation to select a suitable gain for a particular network
   and workload.  The Microsoft implementation of DCTCP in Windows
   Server 2012 uses a fixed estimation gain of 1/16.

   The implementation must also decide when to use DCTCP.  Datacenter
   servers may need to communicate with endpoints outside the
   datacenter, where DCTCP is unsuitable or unsupported.  Thus, a
   global configuration setting to enable DCTCP will generally not
   suffice.  DCTCP provides no mechanism for negotiating its use.
   Thus, there is additional management and configuration overhead
   required to ensure that DCTCP is not used with non-DCTCP endpoints.

   Potential solutions rely on either configuration or heuristics.  A
   heuristic needs to allow endpoints to individually enable DCTCP
   while ensuring that a DCTCP sender is always paired with a DCTCP
   receiver.  One approach is to enable DCTCP based on the IP address
   of the remote endpoint.  Another approach is to detect connections
   that transmit within the bounds of a datacenter.  For example,
   Microsoft Windows Server 2012 (and later versions) supports
   automatic selection of DCTCP if the estimated RTT is less than
   10 msec and ECN is successfully negotiated, under the assumption
   that if the RTT is low, then the two endpoints are likely in the
   same datacenter network.

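   As a purely illustrative (and non-normative) sketch, such a
   heuristic could take the following shape; the function name,
   parameters, and the precedence given to explicit configuration are
   assumptions of this example rather than requirements:

      # Hypothetical enablement heuristic.  The 10 msec threshold
      # mirrors the Windows Server 2012 behavior described above.

      DCTCP_RTT_THRESHOLD = 0.010         # seconds (10 msec)

      def should_use_dctcp(ecn_negotiated, estimated_rtt,
                           peer_configured_for_dctcp=False):
          # Explicit configuration (e.g., by remote IP address or
          # subnet) takes precedence over the heuristic.
          if peer_configured_for_dctcp:
              return True
          # Heuristic: a low RTT suggests that both endpoints are in
          # the same datacenter; DCTCP also requires ECN support.
          return ecn_negotiated and estimated_rtt < DCTCP_RTT_THRESHOLD
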
   It is RECOMMENDED that an implementation deal with loss episodes in
   the same way as conventional TCP.  In case of a timeout or fast
   retransmit or any change in delay (for delay-based congestion
   control), cwnd and other state variables, such as ssthresh, must be
   changed in the same way that a conventional TCP would change them.
   It would be useful to implement DCTCP as additional actions on top
   of an existing congestion control algorithm, such as NewReno.  The
   DCTCP implementation MAY also allow configuration of resetting the
   value of DCTCP.Alpha as part of processing any loss episodes.

   To prevent incast throughput collapse, the minimum RTO (MinRTO)
   used by TCP should be lowered significantly.  The default value of
   MinRTO in Windows is 300 msec, which is much greater than the
   maximum latencies inside a datacenter.  In Microsoft Windows Server
   2012 (and later), the MinRTO value is configurable, allowing values
   as low as 10 msec on a per-subnet or per-port basis (or even
   globally).  A lower MinRTO value requires a correspondingly lower
   delayed ACK timeout on the receiver.  It is RECOMMENDED that an
   implementation allow configuration of lower timeouts for DCTCP
   connections.

   In the same vein, it is also RECOMMENDED that an implementation
   allow configuration of restarting the congestion window (cwnd) of
   idle DCTCP connections as described in [RFC5681], since network
   conditions can change rapidly in datacenters.

   [RFC3168] forbids the ECN-marking of pure ACK packets, because of
   the inability of TCP to mitigate ACK-path congestion and
   protocol-wise preferential treatment by routers.  However, dropping
   pure ACKs, rather than ECN-marking them, has disadvantages for
   typical datacenter traffic patterns.  Because of the prevalence of
   bursty traffic patterns that feature transient congestion, the
   dropping of ACKs causes subsequent retransmissions.  It is
   RECOMMENDED that an implementation provide a configuration knob
   that forces ECT to be set on pure ACKs.

   The DCTCP.Alpha calculation as per the formula in Section 3.3
   involves fractions.  A kernel implementation MAY scale the
   DCTCP.Alpha value so that it can be computed efficiently using
   shift operations.  For example, if the implementation chooses g as
   1/16, multiplications of DCTCP.Alpha by g become right-shifts by 4.
   A scaling implementation SHOULD ensure that DCTCP.Alpha is able to
   reach zero once it falls below the smallest shifted value (16 in
   the above example).  At the other extreme, a scaled update MUST
   also ensure that DCTCP.Alpha does not exceed the scaling factor,
   which would be equivalent to more than 100% congestion.  Therefore,
   DCTCP.Alpha MUST be clamped after an update.

   This results in the following computations replacing steps 5 and 6
   in Section 3.3, where SCF is the chosen scaling factor (65536 in
   the example) and SHF is the shift factor (4 in the example):

   1.  Compute the congestion level for the current observation
       window:

          ScaledM = SCF * DCTCP.BytesMarked / DCTCP.BytesSent

   2.  Update the congestion estimate:

          if (DCTCP.Alpha >> SHF) == 0 then DCTCP.Alpha = 0

          DCTCP.Alpha += (ScaledM >> SHF) - (DCTCP.Alpha >> SHF)

          if DCTCP.Alpha > SCF then DCTCP.Alpha = SCF

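   Purely as an illustration of the scaled update above, the following
   fragment uses the example values g = 1/16 (SHF = 4) and
   SCF = 65536; the variable and function names are hypothetical and
   not part of this specification:

      # Integer-arithmetic version of steps 5 and 6, using the example
      # values above.  Not a normative part of this document.

      SHF = 4            # shift factor: log2(1/g) for g = 1/16
      SCF = 65536        # scaling factor; scaled alpha ranges 0..SCF

      def update_scaled_alpha(scaled_alpha, bytes_marked, bytes_sent):
          # Step 5: congestion level for the observation window,
          # scaled by SCF (integer division is intentional).
          scaled_m = SCF * bytes_marked // bytes_sent

          # Step 6: allow the estimate to decay all the way to zero
          # once it falls below the smallest shifted value (2**SHF).
          if (scaled_alpha >> SHF) == 0:
              scaled_alpha = 0
          scaled_alpha += (scaled_m >> SHF) - (scaled_alpha >> SHF)

          # Clamp: a value above SCF would correspond to more than
          # 100% congestion.
          return min(scaled_alpha, SCF)
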
5.  Deployment Issues

   DCTCP and conventional TCP congestion control do not coexist well
   in the same network.  In DCTCP, the marking threshold is set to a
   very low value to reduce queueing delay, and a relatively small
   amount of congestion will exceed the marking threshold.  During
   such periods of congestion, conventional TCP will suffer packet
   loss and quickly and drastically reduce cwnd.  DCTCP, on the other
   hand, will use the fraction of marked packets to reduce cwnd more
   gradually.  Thus, the rate reduction in DCTCP will be much slower
   than that of conventional TCP, and DCTCP traffic will gain a larger
   share of the capacity compared to conventional TCP traffic
   traversing the same path.  If the traffic in the datacenter is a
   mix of conventional TCP and DCTCP, it is RECOMMENDED that DCTCP
   traffic be segregated from conventional TCP traffic.
   [MORGANSTANLEY] describes a deployment that uses the IP DSCP bits
   to segregate the network such that AQM is applied to DCTCP traffic,
   whereas TCP traffic is managed via drop-tail queueing.

   Today's commodity switches allow configuration of different
   marking/drop profiles for non-TCP and non-IP packets.  Non-TCP and
   non-IP packets should be able to pass through such switches unless
   the switches actually run out of buffer space.  If the datacenter
   traffic includes such traffic (e.g., UDP), one possible mitigation
   would be to mark IP packets as ECT even when there is no transport
   that is reacting to the marking.

   Since DCTCP relies on congestion marking by the switches, DCTCP can
   only be deployed in datacenters where the entire network
   infrastructure supports ECN.  The switches may also support
   configuration of the congestion threshold used for marking.  The
   proposed parameterization can be configured with switches that
   implement RED.  [DCTCP10] provides a theoretical basis for
   selecting the congestion threshold, but as with the estimation
   gain, it may be more practical to rely on experimentation or simply
   to use the default configuration of the device.  DCTCP will degrade
   to loss-based congestion control when transiting a congested
   drop-tail link.

   DCTCP requires changes on both the sender and the receiver, so both
   endpoints must support DCTCP.  Furthermore, DCTCP provides no
   mechanism for negotiating its use, so both endpoints must be
   configured through some out-of-band mechanism to use DCTCP.  A
   variant of DCTCP that can be deployed unilaterally and only
   requires standard ECN behavior has been described in
   [ODCTCP][BSDCAN], but it requires additional experimental
   evaluation.

6.  Known Issues

   DCTCP relies on the sender's ability to reconstruct the stream of
   CE codepoints received by the remote endpoint.  To accomplish this,
   DCTCP avoids using a single ACK packet to acknowledge segments
   received both with and without the CE codepoint set.  However, if
   one or more ACK packets are dropped, it is possible that a
   subsequent ACK will cumulatively acknowledge a mix of CE and non-CE
   segments.  This will, of course, result in a less accurate
   congestion estimate.  There are some potential considerations:

   o  Even with an inaccurate congestion estimate, DCTCP may still
      perform better than [RFC3168].

   o  If the estimation gain is small relative to the packet loss
      rate, the estimate may not be too inaccurate.

   o  If packet loss mostly occurs under heavy congestion, most drops
      will occur during an unbroken string of CE packets, and the
      estimate will be unaffected.

   However, the effect of packet drops on DCTCP under real-world
   conditions has not been analyzed.

   DCTCP provides no mechanism for negotiating its use.
The effect of 488 using DCTCP with a standard ECN endpoint has been analyzed in 489 [ODCTCP][BSDCAN]. Furthermore, it is possible that other 490 implementations may also modify [RFC3168] behavior without 491 negotiation, causing further interoperability issues. 493 Much like standard TCP, DCTCP is biased against flows with longer 494 RTTs. A method for improving the fairness of DCTCP has been proposed 495 in [ADCTCP], but requires additional experimental evaluation. 497 7. Implementation Status 499 This section documents the implementation status of the specification 500 in this document, as recommended by [RFC6982]. 502 This document describes DCTCP as implemented in Microsoft Windows 503 Server 2012. Since publication of the first versions of this 504 document, the Linux [LINUX] and FreeBSD [FREEBSD] operating systems 505 have also implemented support for DCTCP in a way that is believed to 506 follow this document. 508 8. Security Considerations 510 DCTCP enhances ECN and thus inherits the security considerations 511 discussed in [RFC3168]. The processing changes introduced by DCTCP 512 do not exacerbate these considerations or introduce new ones. In 513 particular, with either algorithm, the network infrastructure or the 514 remote endpoint can falsely report congestion and thus cause the 515 sender to reduce cwnd. However, this is no worse than what can be 516 achieved by simply dropping packets. 518 9. IANA Considerations 520 This document has no actions for IANA. 522 10. Acknowledgements 524 The DCTCP algorithm was originally proposed and analyzed in [DCTCP10] 525 by Mohammad Alizadeh, Albert Greenberg, Dave Maltz, Jitu Padhye, 526 Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari 527 Sridharan. 529 We would like to thank Andrew Shewmaker for identifying the problem 530 of clamping DCTCP.Alpha and proposing a solution for it. 532 Lars Eggert has received funding from the European Union's Horizon 533 2020 research and innovation program 2014-2018 under grant agreement 534 No. 644866 ("SSICLOPS"). This document reflects only the authors' 535 views and the European Commission is not responsible for any use that 536 may be made of the information it contains. 538 11. References 540 11.1. Normative References 542 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 543 RFC 793, DOI 10.17487/RFC0793, September 1981, 544 . 546 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 547 Selective Acknowledgment Options", RFC 2018, 548 DOI 10.17487/RFC2018, October 1996, 549 . 551 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 552 Requirement Levels", BCP 14, RFC 2119, 553 DOI 10.17487/RFC2119, March 1997, 554 . 556 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 557 of Explicit Congestion Notification (ECN) to IP", 558 RFC 3168, DOI 10.17487/RFC3168, September 2001, 559 . 561 11.2. Informative References 563 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 564 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 565 . 567 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 568 Ramakrishnan, "Adding Explicit Congestion Notification 569 (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, 570 DOI 10.17487/RFC5562, June 2009, 571 . 573 [RFC6982] Sheffer, Y. and A. Farrel, "Improving Awareness of Running 574 Code: The Implementation Status Section", RFC 6982, 575 DOI 10.17487/RFC6982, July 2013, 576 . 
578 [DCTCP10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, 579 P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data 580 Center TCP (DCTCP)", DOI 10.1145/1851182.1851192, Proc. 581 ACM SIGCOMM 2010 Conference (SIGCOMM 10), August 2010, 582 . 584 [ODCTCP] Kato, M., "Improving Transmission Performance with One- 585 Sided Datacenter TCP", M.S. Thesis, Keio University, 586 2014, . 588 [BSDCAN] Kato, M., Eggert, L., Zimmermann, A., van Meter, R., and 589 H. Tokuda, "Extensions to FreeBSD Datacenter TCP for 590 Incremental Deployment Support", BSDCan 2015, June 2015, 591 . 593 [ADCTCP] Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis 594 of DCTCP: Stability, Convergence, and Fairness", 595 DOI 10.1145/1993744.1993753, Proc. ACM SIGMETRICS Joint 596 International Conference on Measurement and Modeling of 597 Computer Systems (SIGMETRICS 11), June 2011, 598 . 600 [LINUX] Borkmann, D. and F. Westphal, "Linux DCTCP patch", 2014, 601 . 605 [FREEBSD] Kato, M. and H. Panchasara, "DCTCP (Data Center TCP) 606 implementation", 2015, 607 . 610 [MORGANSTANLEY] 611 Judd, G., "Attaining the Promise and Avoiding the Pitfalls 612 of TCP in the Datacenter", Proc. 12th USENIX Symposium on 613 Networked Systems Design and Implementation (NSDI 15), May 614 2015, . 617 [RFC3168-ERRATA3639] 618 Scheffenegger, R., "RFC3168 Errata ID 3639", 2013, 619 . 622 Authors' Addresses 624 Stephen Bensley 625 Microsoft 626 One Microsoft Way 627 Redmond, WA 98052 628 USA 630 Phone: +1 425 703 5570 631 Email: sbens@microsoft.com 633 Lars Eggert 634 NetApp 635 Sonnenallee 1 636 Kirchheim 85551 637 Germany 639 Phone: +49 151 120 55791 640 Email: lars@netapp.com 641 URI: http://eggert.org/ 643 Dave Thaler 644 Microsoft 646 Phone: +1 425 703 8835 647 Email: dthaler@microsoft.com 649 Praveen Balasubramanian 650 Microsoft 652 Phone: +1 425 538 2782 653 Email: pravb@microsoft.com 655 Glenn Judd 656 Morgan Stanley 658 Phone: +1 973 979 6481 659 Email: glenn.judd@morganstanley.com