Network Working Group                                         S. Bensley
Internet-Draft                                                 Microsoft
Intended status: Informational                                 L. Eggert
Expires: May 18, 2017                                             NetApp
                                                               D. Thaler
                                                      P. Balasubramanian
                                                               Microsoft
                                                                 G. Judd
                                                          Morgan Stanley
                                                       November 14, 2016

      Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters
                        draft-ietf-tcpm-dctcp-03

Abstract

   This informational memo describes Datacenter TCP (DCTCP), an
   improvement to TCP congestion control for datacenter traffic.  DCTCP
   uses improved Explicit Congestion Notification (ECN) processing to
   estimate the fraction of bytes that encounter congestion, rather
   than simply detecting that some congestion has occurred.  DCTCP then
   scales the TCP congestion window based on this estimate.  This
   method achieves high burst tolerance, low latency, and high
   throughput with shallow-buffered switches.  This memo also discusses
   deployment issues related to the coexistence of DCTCP and
   conventional TCP and to the lack of a negotiating mechanism between
   sender and receiver, and it presents some possible mitigations.
   DCTCP as described in this draft is applicable to deployments in
   controlled environments like datacenters, but it MUST NOT be
   deployed over the public Internet without additional measures, as
   detailed in Section 5.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 18, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  DCTCP Algorithm
       3.1.  Marking Congestion on the L3 Switches and Routers
       3.2.  Echoing Congestion Information on the Receiver
       3.3.  Processing Congestion Indications on the Sender
       3.4.  Handling of SYN, SYN-ACK, RST Packets
   4.  Implementation Issues
   5.  Deployment Issues
   6.  Known Issues
   7.  Implementation Status
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
       11.1.  Normative References
       11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Large datacenters need many network switches to interconnect their
   many servers.  A datacenter can therefore greatly reduce its capital
   expenditure by leveraging low-cost switches.  However, such low-cost
   switches tend to have limited queue capacities and are thus more
   susceptible to packet loss due to congestion.

   Network traffic in a datacenter is often a mix of short and long
   flows, where the short flows require low latency and the long flows
   require high throughput.  Datacenters also experience incast bursts,
   where many servers send traffic to a single server at the same time.
   For example, this traffic pattern is a natural consequence of a
   MapReduce workload: the worker nodes complete at approximately the
   same time, and all reply to the master node concurrently.

   These factors place some conflicting demands on the queue occupancy
   of a switch:

   o  The queue must be short enough that it does not impose excessive
      latency on short flows.

   o  The queue must be long enough to buffer sufficient data for the
      long flows to saturate the path capacity.

   o  The queue must be long enough to absorb incast bursts without
      excessive packet loss.

   Standard TCP congestion control [RFC5681] relies on packet loss to
   detect congestion.  This does not meet the demands described above.
   First, short flows will start to experience unacceptable latencies
   before packet loss occurs.  Second, by the time TCP congestion
   control kicks in on the senders, most of the incast burst has
   already been dropped.

   [RFC3168] describes a mechanism for using Explicit Congestion
   Notification (ECN) from the switches for detection of congestion.
   However, this method only detects the presence of congestion, not
   its extent.  In the presence of mild congestion, the TCP congestion
   window is reduced too aggressively, which unnecessarily reduces the
   throughput of long flows.

   Datacenter TCP (DCTCP) improves traditional ECN processing by
   estimating the fraction of bytes that encounter congestion, rather
   than simply detecting that some congestion has occurred.  DCTCP then
   scales the TCP congestion window based on this estimate.  This
   method achieves high burst tolerance, low latency, and high
   throughput with shallow-buffered switches.

   It is recommended that DCTCP be deployed only in a datacenter
   environment where the endpoints and the switching fabric are under a
   single administrative domain.  This protocol is not meant for
   uncontrolled deployment in the global Internet.  Refer to Section 5
   for more details.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  Normative
   language is used to describe how necessary the various aspects of
   the Microsoft implementation are for interoperability; however, even
   compliant implementations that omit the measures in Sections 4-6
   would still only be safe to deploy in controlled environments.

3.  DCTCP Algorithm

   There are three components involved in the DCTCP algorithm:

   o  The switches (or other intermediate devices in the network)
      detect congestion and set the Congestion Encountered (CE)
      codepoint in the IP header.

   o  The receiver echoes the congestion information back to the
      sender, using the ECN-Echo (ECE) flag in the TCP header.

   o  The sender computes a congestion estimate and reacts by reducing
      the TCP congestion window (cwnd) accordingly.

3.1.  Marking Congestion on the L3 Switches and Routers

   The L3 switches and routers in a datacenter fabric indicate
   congestion to the end nodes by setting the CE codepoint in the IP
   header as specified in Section 5 of [RFC3168].  For example, the
   switches may be configured with a congestion threshold.  When a
   packet arrives at a switch whose queue length is greater than the
   congestion threshold, the switch sets the CE codepoint in the
   packet.  For example, Section 3.4 of [DCTCP10] suggests threshold
   marking with a threshold K > (RTT * C)/7, where C is the link rate
   in packets per second and RTT is measured in seconds.  However, the
   actual algorithm for marking congestion is an implementation detail
   of the switch and will generally not be known to the sender and
   receiver.  Therefore, sender and receiver should not assume that a
   particular marking algorithm is implemented by the switching fabric.
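   As a purely illustrative calculation (the link speed, packet size,
   and RTT here are assumptions, not recommendations): on a 10 Gbps
   link carrying 1500-byte packets, C is roughly 833,000 packets per
   second, so with an RTT of 500 microseconds the suggested threshold
   becomes

      K > (RTT * C)/7 = (0.0005 * 833,000)/7 ~= 60 packets

   Slower links or shorter round-trip times scale the threshold down
   proportionally.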
3.2.  Echoing Congestion Information on the Receiver

   According to Section 6.1.3 of [RFC3168], the receiver sets the ECE
   flag if any of the packets being acknowledged had the CE codepoint
   set.  The receiver then continues to set the ECE flag until it
   receives a packet with the Congestion Window Reduced (CWR) flag set.
   However, the DCTCP algorithm requires more detailed congestion
   information.  In particular, the sender must be able to determine
   the number of bytes sent that encountered congestion.  Thus, the
   scheme described in [RFC3168] does not suffice.

   One possible solution is to ACK every packet and set the ECE flag in
   the ACK if and only if the CE codepoint was set in the packet being
   acknowledged.  However, this prevents the use of delayed ACKs, which
   are an important performance optimization in datacenters.  If the
   delayed ACK frequency is m, then an ACK is generated every m
   packets.  The typical value of m is 2, but it could be affected by
   ACK throttling or packet coalescing techniques designed to improve
   performance.

   Instead, DCTCP introduces a new Boolean TCP state variable, "DCTCP
   Congestion Encountered" (DCTCP.CE), which is initialized to false
   and stored in the Transmission Control Block (TCB).  When sending an
   ACK, the ECE flag MUST be set if and only if DCTCP.CE is true.  When
   receiving packets, the CE codepoint MUST be processed as follows:

   1.  If the CE codepoint is set and DCTCP.CE is false, send an ACK
       for any previously unacknowledged packets and set DCTCP.CE to
       true.

   2.  If the CE codepoint is not set and DCTCP.CE is true, send an ACK
       for any previously unacknowledged packets and set DCTCP.CE to
       false.

   3.  Otherwise, ignore the CE codepoint.

   The immediate ACK generated in rules 1 and 2 SHOULD NOT acknowledge
   any data in the received packet that changes the DCTCP.CE state.

   Receiver handling of the "Congestion Window Reduced" (CWR) bit is
   also per [RFC3168], including [RFC3168-ERRATA3639].  That is, on
   receipt of a segment with both the CE and CWR bits set, CWR is
   processed first and then ECE is processed.

                             Send immediate
                             ACK with ECE=0
                     .----.   .------------.   .----.
        Send 1 ACK  /     v  |             v  |      \
        for every  |    .------.         .------.     |  Send 1 ACK
        m packets  |    | CE=0 |         | CE=1 |     |  for every
        with ECE=0 |    '------'         '------'     |  m packets
                    \     |  ^             |  ^      /   with ECE=1
                     '----'   '-----------'  '------'
                             Send immediate
                             ACK with ECE=1

    Figure 1: ACK generation state machine.  DCTCP.CE abbreviated as
                                   CE.
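   The state machine can be illustrated with a short C sketch.  This is
   a minimal model rather than code from any of the implementations
   discussed in Section 7: the dctcp_rx structure and the send_ack()
   stub are hypothetical, and a real stack would build and transmit an
   actual ACK segment instead of printing.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Hypothetical per-connection receiver state. */
      struct dctcp_rx {
          bool     ce;      /* DCTCP.CE: congestion state echoed */
          uint32_t pending; /* data packets not yet acknowledged */
          uint32_t m;       /* delayed ACK frequency (typically 2) */
      };

      /* Stub for illustration; an "immediate" ACK must not cover
         the packet that triggered the DCTCP.CE change. */
      static void send_ack(bool ece, bool immediate)
      {
          printf("ACK: ECE=%d immediate=%d\n", ece, immediate);
      }

      /* Process one arriving data packet; ce is the CE codepoint
         from its IP header. */
      static void dctcp_rx_packet(struct dctcp_rx *rx, bool ce)
      {
          if (ce != rx->ce) {
              /* Rules 1 and 2: the CE state changed; immediately
                 ACK all previously unacknowledged packets with the
                 old ECE value, then flip DCTCP.CE. */
              if (rx->pending > 0) {
                  send_ack(rx->ce, true);
                  rx->pending = 0;
              }
              rx->ce = ce;
          }
          /* Rule 3: otherwise use normal delayed ACKs, echoing the
             current DCTCP.CE state (Figure 1). */
          if (++rx->pending >= rx->m) {
              send_ack(rx->ce, false);
              rx->pending = 0;
          }
      }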
3.3.  Processing Congestion Indications on the Sender

   The sender estimates the fraction of bytes sent that encountered
   congestion.  The current estimate is stored in a new TCP state
   variable, DCTCP.Alpha, which is initialized to 1 and SHOULD be
   updated as follows:

      DCTCP.Alpha = DCTCP.Alpha * (1 - g) + g * M

   where

   o  g is the estimation gain, a real number between 0 and 1.  The
      selection of g is left to the implementation.  See Section 4 for
      further considerations.

   o  M is the fraction of bytes sent that encountered congestion
      during the previous observation window, where the observation
      window is chosen to be approximately the Round-Trip Time (RTT).
      In particular, an observation window ends when all bytes in
      flight at the beginning of the window have been acknowledged.

   In order to update DCTCP.Alpha, the TCP state variables defined in
   [RFC0793] are used, and three additional TCP state variables are
   introduced:

   o  DCTCP.WindowEnd: The TCP sequence number threshold for beginning
      a new observation window; initialized to SND.UNA.

   o  DCTCP.BytesAcked: The number of sent bytes acknowledged during
      the current observation window; initialized to zero.

   o  DCTCP.BytesMarked: The number of bytes sent during the current
      observation window that encountered congestion; initialized to
      zero.

   The congestion estimator on the sender SHOULD process acceptable
   ACKs as follows:

   1.  Compute the bytes acknowledged (TCP SACK options [RFC2018] are
       ignored for this computation):

          BytesAcked = SEG.ACK - SND.UNA

   2.  Update the bytes acknowledged:

          DCTCP.BytesAcked += BytesAcked

   3.  If the ECE flag is set, update the bytes marked:

          DCTCP.BytesMarked += BytesAcked

   4.  If the acknowledgment number is less than or equal to
       DCTCP.WindowEnd, stop processing.  Otherwise, the end of the
       observation window has been reached, so proceed to update the
       congestion estimate as follows:

   5.  Compute the congestion level for the current observation window:

          M = DCTCP.BytesMarked / DCTCP.BytesAcked

   6.  Update the congestion estimate:

          DCTCP.Alpha = DCTCP.Alpha * (1 - g) + g * M

   7.  Determine the end of the next observation window:

          DCTCP.WindowEnd = SND.NXT

   8.  Reset the byte counters:

          DCTCP.BytesAcked = DCTCP.BytesMarked = 0
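   The eight steps above map naturally onto a per-ACK routine.  The
   following C sketch is one minimal interpretation under stated
   assumptions: the dctcp_tx structure and function name are
   hypothetical, floating point is used for clarity (see Section 4 for
   an integer variant), and sequence numbers wrap modulo 2^32 as usual.

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical sender-side state; SND.UNA and SND.NXT come
         from the standard TCB [RFC0793]. */
      struct dctcp_tx {
          uint32_t snd_una;      /* SND.UNA before this ACK */
          uint32_t snd_nxt;      /* SND.NXT */
          uint32_t window_end;   /* DCTCP.WindowEnd */
          uint64_t bytes_acked;  /* DCTCP.BytesAcked */
          uint64_t bytes_marked; /* DCTCP.BytesMarked */
          double   alpha;        /* DCTCP.Alpha, initialized to 1 */
          double   g;            /* estimation gain, e.g., 1.0/16 */
      };

      /* Process one acceptable ACK; seg_ack is SEG.ACK and ece is
         the ECE flag from the TCP header. */
      static void dctcp_tx_ack(struct dctcp_tx *tx, uint32_t seg_ack,
                               bool ece)
      {
          /* Steps 1-3: count acknowledged and marked bytes;
             unsigned subtraction handles sequence number wrap. */
          uint32_t acked = seg_ack - tx->snd_una;
          tx->bytes_acked += acked;
          if (ece)
              tx->bytes_marked += acked;

          /* Step 4: wait until the observation window (roughly one
             RTT of data) has been fully acknowledged. */
          if ((int32_t)(seg_ack - tx->window_end) <= 0)
              return;

          /* Steps 5 and 6: fold the window's marking fraction M
             into the running estimate DCTCP.Alpha. */
          double m = (double)tx->bytes_marked /
                     (double)tx->bytes_acked;
          tx->alpha = tx->alpha * (1.0 - tx->g) + tx->g * m;

          /* Steps 7 and 8: begin the next observation window. */
          tx->window_end = tx->snd_nxt;
          tx->bytes_acked = tx->bytes_marked = 0;
      }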
   Rather than always halving the congestion window as described in
   [RFC3168], when the sender receives an indication of congestion
   (ECE), the sender SHOULD update cwnd as follows:

      cwnd = cwnd * (1 - DCTCP.Alpha / 2)

   Thus, when no bytes sent experienced congestion, DCTCP.Alpha equals
   zero, and cwnd is left unchanged.  When all sent bytes experienced
   congestion, DCTCP.Alpha equals one, and cwnd is reduced by half.
   Lower levels of congestion will result in correspondingly smaller
   reductions to cwnd.
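   For illustration, suppose one sixteenth of the bytes sent were
   marked in recent observation windows, so that DCTCP.Alpha has
   converged to that marking fraction:

      DCTCP.Alpha = 1/16 = 0.0625

      cwnd = cwnd * (1 - 0.0625/2) = cwnd * 0.96875

   Such mild congestion shrinks cwnd by about 3%, where a conventional
   ECN response [RFC3168] would have halved it.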
   Just as specified in [RFC3168], DCTCP does not react to congestion
   indications more than once for every window of data.  The setting of
   the "Congestion Window Reduced" (CWR) bit is also as per [RFC3168].
   This is required for interoperation with classic ECN receivers in
   the case of misconfiguration.

   A DCTCP sender MUST deal with loss episodes in the same way as
   conventional TCP.  On a timeout, a fast retransmit, or (for
   delay-based congestion control) any change in delay, cwnd and other
   state variables such as ssthresh must be changed in the same way
   that a conventional TCP would have changed them.

3.4.  Handling of SYN, SYN-ACK, RST Packets

   The switching fabric can drop TCP packets that do not have the ECT
   codepoint set in the IP header.  If SYN and SYN-ACK packets for
   DCTCP connections do not have ECT set, they will be dropped with
   high probability.  For DCTCP connections, the sender SHOULD
   therefore set ECT for SYN, SYN-ACK, and RST packets.

4.  Implementation Issues

   As noted in Section 3.3, the implementation will need to choose a
   suitable estimation gain.  [DCTCP10] provides a theoretical basis
   for selecting the gain.  However, it may be more practical to use
   experimentation to select a suitable gain for a particular network
   and workload.  The Microsoft implementation of DCTCP in Windows
   Server 2012 uses a fixed estimation gain of 1/16.

   The implementation must also decide when to use DCTCP.  Datacenter
   servers may need to communicate with endpoints outside the
   datacenter, where DCTCP is unsuitable or unsupported.  Thus, a
   global configuration setting to enable DCTCP will generally not
   suffice.  DCTCP provides no mechanism for negotiating its use, so
   additional management and configuration overhead is required to
   ensure that DCTCP is not used with non-DCTCP endpoints.

   Potential solutions rely on either configuration or heuristics.
   Heuristics need to allow endpoints to individually enable DCTCP, to
   ensure that a DCTCP sender is always paired with a DCTCP receiver.
   One approach is to enable DCTCP based on the IP address of the
   remote endpoint.  Another approach is to detect connections that
   transmit within the bounds of a datacenter.  For example, Microsoft
   Windows Server 2012 (and later versions) supports automatic
   selection of DCTCP if the estimated RTT is less than 10 msec and ECN
   is successfully negotiated, under the assumption that if the RTT is
   low, then the two endpoints are likely in the same datacenter
   network.

   [RFC3168] forbids the ECN-marking of pure ACK packets, because of
   the inability of TCP to mitigate ACK-path congestion.  RFC 3168
   also forbids ECN-marking of retransmissions, window probes, and
   RSTs.  However, dropping all these control packets - rather than
   ECN-marking them - has considerable performance disadvantages.  It
   is RECOMMENDED that an implementation provide a configuration knob
   that causes ECT to be set on such control packets, for use in
   environments where such concerns do not apply.

   It would be useful to implement DCTCP as additional actions on top
   of an existing congestion control algorithm like NewReno.  A DCTCP
   implementation MAY also allow the value of DCTCP.Alpha to be reset,
   via configuration, as part of processing any loss episode.

   The DCTCP.Alpha calculation as per the formula in Section 3.3
   involves fractions.  A kernel implementation MAY scale the
   DCTCP.Alpha value so that it can be computed efficiently using
   shift operations.  For example, if the implementation chooses g as
   1/16, multiplications of DCTCP.Alpha by g become right-shifts by 4.
   A scaling implementation SHOULD ensure that DCTCP.Alpha is able to
   reach zero once it falls below the smallest shifted value (16 in
   the above example).  At the other extreme, a scaled update MUST
   also ensure that DCTCP.Alpha does not exceed the scaling factor,
   which would be equivalent to more than 100% congestion.
   DCTCP.Alpha MUST therefore be clamped after an update.

   This results in the following computations replacing steps 5 and 6
   in Section 3.3, where SCF is the chosen scaling factor (65536 in
   the example) and SHF is the shift factor (4 in the example):

   1.  Compute the congestion level for the current observation window:

          ScaledM = SCF * DCTCP.BytesMarked / DCTCP.BytesAcked

   2.  Update the congestion estimate:

          if (DCTCP.Alpha >> SHF) == 0 then DCTCP.Alpha = 0

          DCTCP.Alpha += (ScaledM >> SHF) - (DCTCP.Alpha >> SHF)

          if DCTCP.Alpha > SCF then DCTCP.Alpha = SCF
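   The scaled computation can be written compactly in C.  The sketch
   below follows the two steps above with SHF = 4 and SCF = 65536; it
   is illustrative only, and the function name and the 64-bit
   intermediate are choices of this sketch, not requirements.

      #include <stdint.h>

      #define SHF 4               /* shift factor: g = 1/16 */
      #define SCF (1u << 16)      /* scaling factor: 65536 */

      /* One scaled DCTCP.Alpha update at the end of an observation
         window, replacing steps 5 and 6 of Section 3.3. */
      static uint32_t dctcp_alpha_update(uint32_t alpha,
                                         uint64_t bytes_marked,
                                         uint64_t bytes_acked)
      {
          /* Step 1: scaled congestion level; multiply before
             dividing to keep integer precision. */
          uint32_t scaled_m =
              (uint32_t)((SCF * bytes_marked) / bytes_acked);

          /* Let Alpha decay all the way to zero once it drops
             below the smallest value that survives the shift
             (16 here). */
          if ((alpha >> SHF) == 0)
              alpha = 0;

          /* Step 2: alpha += g*M - g*alpha, using shifts. */
          alpha += (scaled_m >> SHF) - (alpha >> SHF);

          /* Clamp: more than 100% congestion is impossible. */
          if (alpha > SCF)
              alpha = SCF;
          return alpha;
      }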
5.  Deployment Issues

   DCTCP and conventional TCP congestion control do not coexist well
   in the same network.  In DCTCP, the marking threshold is set to a
   very low value to reduce queueing delay, so a relatively small
   amount of congestion will exceed the marking threshold.  During
   such periods of congestion, conventional TCP will suffer packet
   loss and quickly and drastically reduce cwnd.  DCTCP, on the other
   hand, will use the fraction of marked packets to reduce cwnd more
   gradually.  Thus, the rate reduction in DCTCP will be much slower
   than that of conventional TCP, and DCTCP traffic will gain a larger
   share of the capacity compared to conventional TCP traffic
   traversing the same path.  If the traffic in the datacenter is a
   mix of conventional TCP and DCTCP, it is RECOMMENDED that DCTCP
   traffic be segregated from conventional TCP traffic.
   [MORGANSTANLEY] describes a deployment that uses the IP
   Differentiated Services Code Point (DSCP) bits to segregate the
   network such that Active Queue Management (AQM) is applied to DCTCP
   traffic, whereas TCP traffic is managed via drop-tail queueing.
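   As one illustration of such segregation, an application (or a
   kernel policy module) can tag DCTCP sockets with a DSCP that the
   switches map to the AQM queue.  The sketch below uses the standard
   IP_TOS socket option; the DSCP value 0x02 is a stand-in that has to
   match the local switch configuration, and on Linux the stack
   manages the two low-order (ECN) bits of this octet itself.

      #include <netinet/in.h>
      #include <netinet/ip.h>
      #include <sys/socket.h>

      #define DCTCP_DSCP 0x02   /* deployment-chosen codepoint */

      /* Tag a TCP socket so switches can steer it into the AQM
         (ECN-marking) queue instead of the drop-tail queue.  The
         DSCP occupies the upper six bits of the former TOS octet. */
      static int tag_dctcp_socket(int fd)
      {
          int tos = DCTCP_DSCP << 2;
          return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos,
                            sizeof(tos));
      }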
   Deployments should take into account segregation of non-TCP traffic
   as well.  Today's commodity switches allow configuration of
   different marking/drop profiles for non-TCP and non-IP packets.
   Non-TCP and non-IP packets should be able to pass through such
   switches unless the switches run out of buffer space.

   Since DCTCP relies on congestion marking by the switches, DCTCP's
   potential can only be realized in datacenters where the entire
   network infrastructure supports ECN.  The switches may also support
   configuration of the congestion threshold used for marking.  The
   proposed parameterization can be configured with switches that
   implement Random Early Detection (RED).  [DCTCP10] provides a
   theoretical basis for selecting the congestion threshold, but as
   with the estimation gain, it may be more practical to rely on
   experimentation or simply to use the default configuration of the
   device.  DCTCP will degrade to loss-based congestion control when
   transiting a congested drop-tail link.

   DCTCP requires changes on both the sender and the receiver, so both
   endpoints must support DCTCP.  Furthermore, DCTCP provides no
   mechanism for negotiating its use, so both endpoints must be
   configured through some out-of-band mechanism to use DCTCP.  A
   variant of DCTCP that can be deployed unilaterally and only
   requires standard ECN behavior has been described in
   [ODCTCP][BSDCAN], but it requires additional experimental
   evaluation.

6.  Known Issues

   DCTCP relies on the sender's ability to reconstruct the stream of
   CE codepoints received by the remote endpoint.  To accomplish this,
   DCTCP avoids using a single ACK packet to acknowledge segments
   received both with and without the CE codepoint set.  However, if
   one or more ACK packets are dropped, it is possible that a
   subsequent ACK will cumulatively acknowledge a mix of CE and non-CE
   segments.  This will, of course, result in a less accurate
   congestion estimate.  There are some mitigating considerations:

   o  Even with an inaccurate congestion estimate, DCTCP may still
      perform better than [RFC3168].

   o  If the estimation gain is small relative to the packet loss
      rate, the estimate may not be too inaccurate.

   o  If packet loss mostly occurs under heavy congestion, most drops
      will occur during an unbroken string of CE packets, and the
      estimate will be unaffected.

   However, the effect of packet drops on DCTCP under real-world
   conditions has not been analyzed.

   DCTCP provides no mechanism for negotiating its use.  The effect of
   using DCTCP with a standard ECN endpoint has been analyzed in
   [ODCTCP][BSDCAN].  Furthermore, it is possible that other
   implementations may also modify [RFC3168] behavior without
   negotiation, causing further interoperability issues.

   Much like standard TCP, DCTCP is biased against flows with longer
   RTTs.  A method for improving the RTT fairness of DCTCP has been
   proposed in [ADCTCP], but it requires additional experimental
   evaluation.

7.  Implementation Status

   This section documents the implementation status of the
   specification in this document, as recommended by [RFC7942].

   This document describes DCTCP as implemented in Microsoft Windows
   Server 2012.  Since publication of the first versions of this
   document, the Linux [LINUX] and FreeBSD [FREEBSD] operating systems
   have also implemented support for DCTCP in a way that is believed
   to follow this document.

8.  Security Considerations

   DCTCP enhances ECN and thus inherits the security considerations
   discussed in [RFC3168].  The processing changes introduced by DCTCP
   do not exacerbate these considerations or introduce new ones.  In
   particular, with either algorithm, the network infrastructure or
   the remote endpoint can falsely report congestion and thus cause
   the sender to reduce cwnd.  However, this is no worse than what can
   be achieved by simply dropping packets.

   [RFC3168] requires that a compliant TCP must not set ECT on SYN or
   SYN-ACK packets.  [RFC5562] proposes setting ECT on SYN-ACK packets
   but maintains the restriction of no ECT on SYN packets.  Both of
   these RFCs prohibit ECT in SYN packets due to security concerns
   regarding malicious SYN packets with ECT set.  These RFCs, however,
   are intended for general Internet use and do not directly apply to
   a controlled datacenter environment.  The security concerns
   addressed by both of these RFCs might not apply in controlled
   environments like datacenters, and it might not be necessary to
   account for the presence of non-ECN servers.  Since most servers in
   datacenters run virtualized, additional security can be imposed on
   the physical servers to intercept and drop traffic resembling an
   attack.

9.  IANA Considerations

   This document has no actions for IANA.

10.  Acknowledgements

   The DCTCP algorithm was originally proposed and analyzed in
   [DCTCP10] by Mohammad Alizadeh, Albert Greenberg, Dave Maltz, Jitu
   Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and
   Murari Sridharan.

   We would like to thank Andrew Shewmaker for identifying the problem
   of clamping DCTCP.Alpha and proposing a solution for it.

   Lars Eggert has received funding from the European Union's Horizon
   2020 research and innovation program 2014-2018 under grant
   agreement No. 644866 ("SSICLOPS").  This document reflects only the
   authors' views, and the European Commission is not responsible for
   any use that may be made of the information it contains.

11.  References

11.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981,
              <http://www.rfc-editor.org/info/rfc793>.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018,
              DOI 10.17487/RFC2018, October 1996,
              <http://www.rfc-editor.org/info/rfc2018>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <http://www.rfc-editor.org/info/rfc3168>.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
              Ramakrishnan, "Adding Explicit Congestion Notification
              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
              DOI 10.17487/RFC5562, June 2009,
              <http://www.rfc-editor.org/info/rfc5562>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September
              2009, <http://www.rfc-editor.org/info/rfc5681>.

11.2.  Informative References
Farrel, "Improving Awareness of Running 581 Code: The Implementation Status Section", BCP 205, 582 RFC 7942, DOI 10.17487/RFC7942, July 2016, 583 . 585 [DCTCP10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, 586 P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data 587 Center TCP (DCTCP)", DOI 10.1145/1851182.1851192, Proc. 588 ACM SIGCOMM 2010 Conference (SIGCOMM 10), August 2010, 589 . 591 [ODCTCP] Kato, M., "Improving Transmission Performance with One- 592 Sided Datacenter TCP", M.S. Thesis, Keio University, 593 2014, . 595 [BSDCAN] Kato, M., Eggert, L., Zimmermann, A., van Meter, R., and 596 H. Tokuda, "Extensions to FreeBSD Datacenter TCP for 597 Incremental Deployment Support", BSDCan 2015, June 2015, 598 . 600 [ADCTCP] Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis 601 of DCTCP: Stability, Convergence, and Fairness", 602 DOI 10.1145/1993744.1993753, Proc. ACM SIGMETRICS Joint 603 International Conference on Measurement and Modeling of 604 Computer Systems (SIGMETRICS 11), June 2011, 605 . 607 [LINUX] Borkmann, D. and F. Westphal, "Linux DCTCP patch", 2014, 608 . 612 [FREEBSD] Kato, M. and H. Panchasara, "DCTCP (Data Center TCP) 613 implementation", 2015, 614 . 617 [MORGANSTANLEY] 618 Judd, G., "Attaining the Promise and Avoiding the Pitfalls 619 of TCP in the Datacenter", Proc. 12th USENIX Symposium on 620 Networked Systems Design and Implementation (NSDI 15), May 621 2015, . 624 [RFC3168-ERRATA3639] 625 Scheffenegger, R., "RFC3168 Errata ID 3639", 2013, 626 . 629 Authors' Addresses 631 Stephen Bensley 632 Microsoft 633 One Microsoft Way 634 Redmond, WA 98052 635 USA 637 Phone: +1 425 703 5570 638 Email: sbens@microsoft.com 640 Lars Eggert 641 NetApp 642 Sonnenallee 1 643 Kirchheim 85551 644 Germany 646 Phone: +49 151 120 55791 647 Email: lars@netapp.com 648 URI: http://eggert.org/ 650 Dave Thaler 651 Microsoft 653 Phone: +1 425 703 8835 654 Email: dthaler@microsoft.com 656 Praveen Balasubramanian 657 Microsoft 659 Phone: +1 425 538 2782 660 Email: pravb@microsoft.com 661 Glenn Judd 662 Morgan Stanley 664 Phone: +1 973 979 6481 665 Email: glenn.judd@morganstanley.com