TCP Maintenance and Minor Extensions (tcpm)           M. Kuehlewind, Ed.
Internet-Draft                                                ETH Zurich
Intended status: Informational                          R. Scheffenegger
Expires: September 10, 2015                                 NetApp, Inc.
                                                                      B.
                                                                 Briscoe
                                                                      BT
                                                           March 9, 2015

  Problem Statement and Requirements for a More Accurate ECN Feedback
                     draft-ietf-tcpm-accecn-reqs-08

Abstract

   Explicit Congestion Notification (ECN) is a mechanism where network
   nodes can mark IP packets instead of dropping them to indicate
   congestion to the end-points. An ECN-capable receiver will feed this
   information back to the sender. ECN is specified for TCP in such a
   way that it can only feed back one congestion signal per Round-Trip
   Time (RTT). In contrast, ECN for other transport protocols, such as
   RTP/UDP and SCTP, is specified with more accurate ECN feedback.
   Recent new TCP mechanisms (like ConEx or DCTCP) need more accurate
   ECN feedback in the case where more than one marking is received in
   one RTT. This document specifies requirements for an update to the
   TCP protocol to provide more accurate ECN feedback.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Recap of Classic ECN and ECN Nonce in IP/TCP
   3.  Use Cases
   4.  Requirements
   5.  Design Approaches
     5.1.  Re-Definition of ECN/NS Header Bits
     5.2.  Using Other Header Bits
     5.3.  Using a TCP Option
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Ambiguity of the More Accurate ECN Feedback in DCTCP
   Authors' Addresses

1.  Introduction

   Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where
   network nodes can mark IP packets instead of dropping them to
   indicate congestion to the end-points. An ECN-capable receiver will
   feed this information back to the sender. ECN is specified for TCP
   in such a way that only one feedback signal can be transmitted per
   Round-Trip Time (RTT).
   This is sufficient for pre-existing TCP
   congestion control mechanisms that perform only one reduction in
   sending rate per RTT, independent of the number of ECN congestion
   marks. But recently proposed or deployed mechanisms like Congestion
   Exposure (ConEx) [RFC6789] or Data Center TCP (DCTCP)
   [I-D.bensley-tcpm-dctcp] need more accurate ECN feedback than
   'classic' ECN [RFC3168] to work correctly in the case where more than
   one marking is received in any one RTT.

   For an in-depth discussion of the application benefits of using ECN
   (including with sufficiently granular feedback) see
   [I-D.welzl-ecn-benefits].

   ECN is also defined for transport protocols besides TCP. ECN feedback
   as defined for RTP/UDP [RFC6679] provides a very detailed level of
   information, delivering individual counters for all four ECN
   codepoints as well as lost and duplicate segments, but at the cost of
   high signaling overhead. ECN feedback for SCTP has been proposed in
   [I-D.stewart-tsvwg-sctpecn]. This delivers a counter for the number
   of CE marked segments between CWR chunks, but also comes at the cost
   of increased overhead.

   Today, implementations of DCTCP already exist that alter TCP's ECN
   feedback protocol in proprietary ways (DCTCP was released in
   Microsoft Windows 8, and implementations exist for Linux and
   FreeBSD). The changes DCTCP makes to TCP are not currently the
   subject of any IETF standardization activity, and they omit
   capability negotiation, relying instead on uniform configuration
   across all hosts and network devices with ECN capability. A primary
   motivation for this document is to intervene before each proprietary
   implementation invents its own non-interoperable handshake, which
   could lead to _de facto_ consumption of the few flags or codepoints
   that remain available for standardizing capability negotiation.

   This document lists requirements for a robust and interoperable TCP/
   ECN feedback protocol that is more accurate than classic ECN
   [RFC3168] and that all implementations of new TCP extensions, like
   ConEx and/or DCTCP, can use. While a new feedback scheme should
   still deliver as much information as classic ECN, this document also
   clarifies what has to be taken into consideration in addition. Thus
   the listed requirements should be addressed in the specification of a
   more accurate ECN feedback scheme. A few solutions have already been
   proposed. Section 5 demonstrates how to use the requirements to
   compare them, by briefly sketching their high level design choices
   and discussing the benefits and drawbacks of each.

   The scope of these requirements is not limited to any specific
   environment and is intended for general deployment over public and
   private IP networks. Candidate solutions should try to adhere to all
   these requirements, but where this is not possible they should
   justify the deviation. The ordering of the requirements listed in
   this document is not to be taken as an order of importance, because
   each requirement might have different weight in different deployment
   scenarios.

   These requirements are only concerned with the type and quality of
   the ECN feedback signal. The requirements do not stipulate how a TCP
   sender might react to the improved ECN signal. The requirements also
   do not imply that any modifications to TCP senders or receivers are
   obligatory.

1.1.  Terminology

   We use the following terminology from [RFC3168] and [RFC3540]:

   The ECN field in the IP header:

      Not-ECT: the not ECN-Capable Transport codepoint,

      CE: the Congestion Experienced codepoint,

      ECT(0): the first ECN-Capable Transport codepoint, and

      ECT(1): the second ECN-Capable Transport codepoint.
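
   As an illustration of the four codepoints of the IP-header ECN
   field, the following Python sketch reads the ECN field from the two
   low-order bits of the IPv4 TOS (or IPv6 Traffic Class) octet. The
   two-bit values are those assigned in [RFC3168]; the names
   EcnCodepoint, ecn_from_tos and is_congestion_mark are hypothetical
   and exist only for this example.

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """Two-bit ECN field values as assigned in RFC 3168."""
    NOT_ECT = 0b00  # not ECN-Capable Transport
    ECT_1 = 0b01    # second ECN-Capable Transport codepoint
    ECT_0 = 0b10    # first ECN-Capable Transport codepoint
    CE = 0b11       # Congestion Experienced


def ecn_from_tos(tos_octet: int) -> EcnCodepoint:
    """Return the codepoint held in the two low-order bits of the octet."""
    return EcnCodepoint(tos_octet & 0b11)


def is_congestion_mark(tos_octet: int) -> bool:
    """True when the packet's ECN field carries the CE codepoint."""
    return ecn_from_tos(tos_octet) is EcnCodepoint.CE
```

   For example, an octet of 0x02 yields ECT(0), and 0x03 yields CE.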

   The ECN flags in the TCP header:

      CWR: the Congestion Window Reduced flag,

      ECE: the ECN-Echo flag, and

      NS: the ECN Nonce Sum flag.

   In this document, the ECN feedback scheme as specified in [RFC3168]
   is called 'classic ECN' and any new proposal is called a 'more
   accurate ECN feedback' scheme. A 'congestion mark' is defined as an
   IP packet where the CE codepoint is set. A 'congestion episode'
   refers to one or more congestion marks that belong to the same
   overload situation in the network (usually during one RTT). A TCP
   segment with the acknowledgment flag set is simply called an ACK.

2.  Recap of Classic ECN and ECN Nonce in IP/TCP

   ECN requires two bits in the IP header. The ECN capability of a
   packet is indicated when either one of the two bits is set. A
   network node can set both bits simultaneously when it experiences
   congestion. This leads to the four codepoints (Not-ECT, ECT(0),
   ECT(1), and CE) as listed above.

   In the TCP header the first two bits in byte 14 are defined as flags
   for the use of ECN (CWR and ECE in Figure 1). A TCP receiver signals
   the reception of a congestion mark using the ECN-Echo (ECE) flag in
   the TCP header. For reliability, the receiver continues to set the
   ECE flag on every ACK. To enable the TCP receiver to determine when
   to stop setting the ECN-Echo flag, the sender sets the CWR flag upon
   reception of an ECE feedback signal. This always leads to a full RTT
   of ACKs with ECE set. Thus the receiver cannot signal back any
   additional CE markings arriving within the same RTT.

   The ECN Nonce [RFC3540] is an experimental addition to ECN that the
   TCP sender can use to protect itself against accidental or malicious
   concealment of CE-marked or dropped packets. This addition defines
   the last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag.
   The receiver maintains a nonce sum that counts the occurrence of
   ECT(1) packets, and signals the least significant bit of this sum on
   the NS flag. There are no known deployments of a TCP stack that
   makes use of the ECN Nonce extension.

     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |               |           | N | C | E | U | A | P | R | S | F |
   | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
   |               |           |   | R | E | G | K | H | T | N | N |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

    Figure 1: The (post-ECN Nonce) definition of the TCP header flags

   An alternative for a sender to assure feedback integrity has been
   proposed where the sender occasionally inserts a CE mark or
   reordering itself, and checks that the receiver feeds it back
   faithfully [I-D.moncaster-tcpm-rcv-cheat]. This alternative consumes
   no header bits or codepoints, as well as releasing the ECT(1)
   codepoint in the IP header and the NS flag in the TCP header for
   other uses.

3.  Use Cases

   The following two examples serve to show where existing mechanisms
   would already benefit from more accurate ECN feedback information.
   However, as it is hard to predict the future, once a more accurate
   ECN feedback mechanism that adheres to the requirements stated in
   this document is widely deployed, it's very likely that additional
   uses are found. The examples listed below are in no particular
   order.

   ConEx is an experimental approach that allows a sender to relay
   congestion feedback provided by the receiver into the network along
   the forward data path. ConEx information can be used for traffic
   management to limit traffic proportionate to the actual congestion
   being caused, rather than limiting traffic based on rate or volume
   [RFC6789].
   A ConEx sender uses selective acknowledgements (SACK)
   [RFC2018] for accurate feedback of loss signals, but currently TCP
   offers no equivalent accurate feedback for ECN.

   DCTCP offers very low and predictable queuing delay. DCTCP changes
   the reaction to congestion of a TCP sender and additionally requires
   switches/routers to have ECN enabled and configured with a low step
   threshold and no signal smoothing, so it is currently only used in
   private networks, e.g. internal to data centers. DCTCP was released
   in Microsoft Windows 8, and implementations exist for Linux and
   FreeBSD. To retrieve sufficient congestion information, the
   different DCTCP implementations use a proprietary ECN feedback
   protocol, but they omit capability negotiation. Moreover, the
   feedback protocol proposed in [I-D.bensley-tcpm-dctcp] only works if
   there are no losses at all; otherwise its feedback becomes ambiguous
   (see Appendix A). Therefore, if a generic more accurate ECN feedback
   scheme were available, it would solve two problems for DCTCP: i) the
   need for a consistent variant of DCTCP to be deployed network-wide,
   and ii) the inability to cope with ACK loss.

   Classic ECN-TCP would not benefit from more accurate ECN feedback,
   but it would not suffer either. The same signal that is currently
   conveyed with ECN following the specification given in [RFC3168]
   would be available.

   The following scenarios briefly show where accurate ECN feedback is
   needed or adds value:

   A sender with standardised TCP congestion control that supports
   ConEx:
      In this case the ConEx mechanism uses the extra information
      per RTT to re-echo the precise congestion information, but
      the congestion control algorithm still ignores multiple marks
      per RTT [RFC5681].

   A sender using DCTCP congestion control without ConEx:
      The congestion control algorithm uses the extra information
      per RTT to perform its decrease depending on the number of
      congestion marks.

   A sender using DCTCP congestion control and supporting ConEx:
      Both the congestion control algorithm and ConEx use the more
      accurate ECN feedback mechanism.

   As-yet-unspecified sender mechanisms:
      The above are two examples of more general interest in sender
      mechanisms that respond to the extent of congestion feedback,
      not just its existence. It will greatly simplify incremental
      deployment if the sender can unilaterally deploy new
      behaviours, and rely on the presence of generic receivers
      that have already implemented more accurate feedback.

   An RFC5681 TCP sender without ConEx:
      No accurate feedback is necessary here. The congestion
      control algorithm still reacts to only one signal per RTT.
      But it is best to feed back all the information the receiver
      gets, whether the sender uses it or not -- at least as long
      as overhead is low or zero.

   Using CE for checking integrity:
      If a more accurate ECN feedback scheme feeds all occurrences
      of CE marks back, a sender could perform integrity checking
      by occasionally injecting CE marks itself. Specifically, a
      sender can send packets which it randomly marks with CE (at
      low frequency), then check if feedback is received for these
      packets. The congestion notification feedback for these
      self-injected markings would not require a congestion
      control reaction [I-D.moncaster-tcpm-rcv-cheat].

4.  Requirements

   The requirements of the accurate ECN feedback protocol are to have
   fairly accurate (not necessarily perfect), timely and protected
   signaling.
   This leads to the following requirements, which should be
   discussed for any proposed more accurate ECN feedback scheme:

   Resilience
      The ECN feedback signal is carried within the ACK. Pure TCP
      ACKs can get lost without recovery (not just due to
      congestion, but also due to deliberate ACK thinning).
      Moreover, delayed ACKs are commonly used with TCP.
      Typically, an ACK is triggered after two data segments (or
      more, e.g., due to receive segment coalescing, ACK
      compression, ACK congestion control [RFC5690] or other
      phenomena, see [RFC3449]). In a high congestion situation
      where most of the packets are marked with CE, an accurate
      feedback mechanism should still be able to signal sufficient
      congestion information. Thus the accurate ECN feedback
      extension has to take delayed ACKs and ACK loss into account.
      Also, a more accurate feedback protocol should still provide
      more accurate feedback than classic ECN when delayed ACKs
      cover more than two segments, or when a thin stream disables
      Nagle's algorithm [RFC0896]. Finally, the feedback mechanism
      should not be impacted by reordering of ACKs, even when the
      ACK'ed sequence number does not increase.

   Timeliness
      A CE mark can be induced by the sending host, or more
      commonly a network node on the transmission path, and is then
      echoed by the receiver in the TCP ACK. Thus when this
      information arrives at the sender, it is naturally already
      about one RTT old. With a sufficient ACK rate a further
      delay of a small number of packets can be tolerated.
      However, this information will become stale with large
      delays, given the dynamic nature of networks. TCP congestion
      control (which itself partly introduces these dynamics)
      operates on a time scale of one RTT. Thus, to be timely,
      congestion feedback information should be delivered within
      about one RTT.

   Integrity
      The integrity of the feedback in a more accurate ECN feedback
      scheme should be assured, at least as well as the ECN Nonce.
      Alternatively, it should at least be possible to give strong
      incentives for the receiver and network nodes to cooperate
      honestly.

      Given there are known problems with ECN Nonce deployment,
      this document only requires that the integrity of the more
      accurate ECN feedback can be assured; it does not require
      that the ECN Nonce mechanism is employed to achieve this.
      Indeed, if integrity could be provided otherwise, a more
      accurate ECN feedback protocol might re-purpose the nonce sum
      (NS) flag in the TCP header.

      If the more accurate ECN feedback scheme provides sufficient
      information, the integrity check could e.g. be performed by
      deterministically setting the CE codepoint at the sender and
      monitoring the respective feedback (similar to ECT(1) and the
      ECN Nonce sum). Whether a sender should take enforcement
      action when it detects wrong feedback information, and what
      kind of enforcement it should apply, are policy issues that
      need not be specified as part of the more accurate ECN
      feedback scheme itself, but rather when specifying an update
      to core TCP mechanisms like congestion control that makes use
      of the more accurate ECN signal.

   Accuracy
      Classic ECN feeds back one congestion notification per RTT,
      which is sufficient for classic TCP congestion control, which
      reduces the sending rate at most once per RTT. Thus the more
      accurate ECN feedback scheme should ensure that, if a
      congestion episode occurs, at least one congestion
      notification is echoed and received per RTT as classic ECN
      would do. Of course, the goal of a more accurate ECN
      extension is to reconstruct the number of CE markings more
      accurately.
      In the best case the new scheme should even
      allow reconstruction of the exact number of payload bytes
      that a CE marked packet was carrying. However, it is
      accepted that it may be too complex for a sender to get the
      exact number of congestion markings or marked bytes in all
      situations. Ideally, the feedback scheme should preserve the
      order in which any (of the four) ECN signals were received.
      And, ideally, it would even be possible for the sender to
      determine which of the packets covered by one delayed ACK
      were congestion marked, e.g. if the flow consists of packets
      of different sizes, or to allow for future protocols where
      the order of the markings may be important.

      In the best case, a sender that sees more accurate ECN
      feedback information would be able to reconstruct the
      occurrence of any of the four codepoints (Not-ECT, CE,
      ECT(0), ECT(1)). However, assuming the sender marks all data
      packets as ECN-capable and uses a default setting of ECT(0)
      (as with [RFC3168]), solely feeding back the occurrence of CE
      and ECT(1) might be sufficient. Because the sender can keep
      account of the transmitted segments with any of the three ECN
      codepoints, conveying any two of these back to the sender is
      sufficient for it to reconstruct the third as observed by the
      receiver. Thus a more accurate ECN feedback scheme should at
      least provide information on two of these signals, e.g. CE
      and ECT(1).

      If a more accurate ECN scheme can reliably deliver feedback
      in most but not all circumstances, ideally the scheme should
      at least not introduce bias. In other words, undetected loss
      of some ACKs should be as likely to increase as decrease the
      sender's estimate of the probability of ECN marking.

   Complexity
      Implementation should be as simple as possible and only a
      minimum of additional state information should be needed.
      This will enable more accurate ECN feedback to be used as the
      default feedback mechanism, even if only one ECN feedback
      signal per RTT is needed.

   Overhead
      A more accurate ECN feedback signal should limit the
      additional network load, because ECN feedback is ultimately
      not critical information (in the worst case, loss will still
      be available as a congestion signal of last resort). As
      feedback information has to be provided frequently and in a
      timely fashion, potentially all or a large fraction of TCP
      acknowledgments might carry this information. Ideally, no
      additional segments should be exchanged compared to an
      RFC3168 TCP session, and the overhead in each segment should
      be minimized.

   Backward and forward compatibility
      Given more accurate ECN feedback will involve a change to the
      TCP protocol, it should be negotiated between the two TCP
      endpoints. If either end does not support the more accurate
      feedback, they should both be able to fall back to classic
      ECN feedback.

      A more accurate ECN feedback extension should aim to traverse
      most middleboxes, including firewalls and network address
      translators (NAT). Further, a feedback mechanism should
      provide a method to fall back to classic ECN signaling if the
      new signal is suppressed by certain middleboxes.

      In order to avoid a fork in the TCP protocol specifications,
      if experiments with the new ECN feedback protocol are
      successful, it is intended to eventually update RFC3168 for
      any TCP/ECN sender, not just for ConEx or DCTCP senders.
      Then future senders will be able to unilaterally deploy new
      behaviours that exploit the existence of more accurate ECN
      feedback in receivers (forward compatibility). Conversely,
      even if another sender only needs one ECN feedback signal per
      RTT, it should be able to use more accurate ECN feedback, and
      simply ignore the excess information.
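
   The Accuracy requirement above observes that feeding back two of
   the three ECN-capable codepoint counts is enough for the sender to
   derive the third. A minimal Python sketch of that arithmetic
   follows. It assumes the sender knows from its own accounting how
   many ECN-capable packets the feedback covers, and that none of them
   had its ECN field cleared to Not-ECT in transit. The function name
   and its parameters are hypothetical and not taken from any proposal.

```python
def reconstruct_ect0(packets_covered: int, ce_count: int, ect1_count: int) -> int:
    """Derive how many packets the receiver observed as ECT(0).

    packets_covered: ECN-capable packets acknowledged by this feedback,
                     known to the sender from its own accounting.
    ce_count, ect1_count: the two counts fed back by the receiver.
    """
    if min(packets_covered, ce_count, ect1_count) < 0:
        raise ValueError("counts must be non-negative")
    ect0_count = packets_covered - ce_count - ect1_count
    if ect0_count < 0:
        # The feedback is inconsistent with what the sender transmitted.
        raise ValueError("feedback reports more marks than packets covered")
    return ect0_count
```

   For instance, if the feedback covers 10 packets and reports 3 CE
   and 0 ECT(1), the sender concludes that 7 packets arrived as ECT(0).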

   Furthermore, the receiver should not make assumptions about the
   mechanism that was used to set the markings nor about any
   interpretation or reaction to the congestion signal. The receiver
   only needs to faithfully reflect congestion information back to the
   sender.

5.  Design Approaches

   This section introduces some possible TCP ECN feedback design
   approaches. The purpose of this section is to give examples of how
   trade-offs might be needed between the requirements, as input to
   future IETF work to specify a protocol. The order is not significant
   and there is no intention to endorse any particular approach.

   All approaches presented below (and proposed so far) are able to
   provide accurate ECN feedback information as long as no ACK loss
   occurs and the congestion rate is reasonable. In the case of a high
   ACK loss rate or very high congestion (CE marking) rate, the proposed
   schemes have different resilience characteristics depending on the
   number of bits used for the encoding. While classic ECN provides
   reliable (but inaccurate) feedback of a maximum of one congestion
   signal per RTT, the proposed schemes do not implement an explicit
   acknowledgement mechanism for the feedback (such as the ECE/CWR
   exchange of [RFC3168]).

5.1.  Re-Definition of ECN/NS Header Bits

   Schemes in this category can additionally use the NS bit for
   capability negotiation during the TCP handshake exchange. Thus
   more accurate ECN feedback could be negotiated without changing the
   classic ECN negotiation, thereby remaining backwards compatible.

   Schemes in this category can simply re-define the ECN header flags,
   ECE and CWR, to encode the occurrence of a CE marking at the
   receiver. This approach provides very limited resilience against
   loss of ACKs, particularly pure ACKs (no payload and therefore
   delivered unreliably).
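
   The limited resilience of such a flag-based encoding can be seen in
   a small Python sketch. It rests on illustration-only assumptions:
   the receiver sends one pure ACK per data packet, sets ECE on an ACK
   exactly when the acknowledged packet carried CE, and adds no
   redundancy. The function name count_fed_back_marks is hypothetical.

```python
def count_fed_back_marks(ce_marked, lost_acks):
    """Count the CE marks a sender learns about under a one-bit encoding.

    ce_marked: list of booleans, one per data packet (True = arrived CE).
    lost_acks: set of packet indices whose ACK never reaches the sender.
    Assumes one pure ACK per packet, with ECE set iff that packet was CE.
    """
    return sum(
        1
        for index, marked in enumerate(ce_marked)
        if marked and index not in lost_acks
    )


marks = [False, True, True, False, True]
print(count_fed_back_marks(marks, lost_acks=set()))   # prints 3
print(count_fed_back_marks(marks, lost_acks={1, 4}))  # prints 1
```

   Every lost ACK that carried ECE silently removes one congestion mark
   from the sender's view, and the sender has no means to detect it.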

   A couple of schemes have been proposed so far:

   o  A naive one-bit scheme that sends one ECE for each CE received
      could use CWR to increase robustness against ACK loss by
      introducing redundant information on the next ACK, but this is
      still vulnerable to ACK loss.

   o  The scheme defined for DCTCP [I-D.bensley-tcpm-dctcp], which
      toggles the ECE feedback on an immediate ACK whenever the CE
      marking changes, and otherwise feeds back delayed ACKs with the
      ECE value unchanged. Appendix A demonstrates that this scheme is
      still ambiguous to the sender if the ACKs are pure ACKs, and if
      some may have been lost.

   Alternatively, the receiver uses the three ECN/NS header flags, ECE,
   CWR and NS to represent a counter that signals the accumulated number
   of CE markings it has received. Resilience against loss is better
   than with the flag-based schemes, but may not suffice in the presence
   of extended ACK loss that otherwise would not affect the TCP sender's
   performance.

   A number of coding schemes have been proposed so far in this
   category:

   o  A 3-bit counter scheme continuously feeds back the three least
      significant bits of a CE counter;

   o  A scheme that defines a standardised lookup table to map the 8
      codepoints onto either a CE counter or an ECT(1) counter.

   These proposed schemes provide accumulated information on ECN-CE
   marking feedback, similar to the number of acknowledged bytes in the
   TCP header. Due to the limited number of bits the ECN feedback
   information will wrap much more often than the acknowledgement field.
   Thus feedback information could be lost due to a relatively small
   sequence of pure-ACK losses. Resilience could be increased by
   introducing redundancy, e.g. by sending each counter increase two or
   more times. Of course any of these additional mechanisms will
   increase the complexity.
   If the congestion rate is greater than the ACK rate
   (multiplied by the number of congestion marks that can be signaled
   per ACK), the congestion information cannot correctly be fed back.
   Covering the worst case where every packet is CE marked can
   potentially be realized by dynamically adapting the ACK rate and
   redundancy. This again increases complexity and perhaps the
   signaling overhead as well. Schemes that do not re-purpose the ECN
   NS bit could still support the ECN Nonce.

5.2.  Using Other Header Bits

   As seen in Figure 1, there are currently three unused flags in the
   TCP header. The proposed 3-bit counter or codepoint schemes could be
   extended by one or more bits to add higher resilience against ACK
   loss. The relative gain would be exponentially higher resilience
   against ACK loss, while the respective drawbacks would remain
   identical.

   Alternatively, a new method could standardise the use of the bits in
   the Urgent Pointer field (see [RFC6093]) to signal more bits of its
   congestion signal counter, but only whenever it does not set the
   Urgent Flag. As this is often the case, resilience could be
   increased without additional header overhead.

   Any proposal to use such bits would need to check the likelihood that
   some middleboxes might discard or 'normalize' the currently unused
   flag bits or a non-zero Urgent Pointer when the Urgent Flag is
   cleared. If during experimentation certain bits have been proven to
   be usable, the assignment of any of these bits would then require an
   IETF standards action.

5.3.  Using a TCP Option

   Alternatively, a new TCP option could be introduced, to help maintain
   the accuracy and integrity of ECN feedback between receiver and
   sender.
   Such an option could provide higher resilience and even more
   information, perhaps as much as ECN for RTP/UDP [RFC6679], which
   explicitly provides the number of ECT(0), ECT(1), CE, Not-ECT marked
   and lost packets, or as much as a proposal for SCTP that counts the
   number of ECN marks [I-D.stewart-tsvwg-sctpecn] between CWR chunks.
   However, deploying new TCP options has its own challenges. Moreover,
   to actually achieve high resilience, this option would need to be
   carried by most or all ACKs as the receiver cannot know if and when
   ACKs may be dropped. Thus this approach would introduce considerable
   signaling overhead even though ECN feedback is not extremely critical
   information (in the worst case, loss will still be available to
   provide a strong congestion feedback signal). In any case, such a TCP
   option could be used in addition to a more accurate ECN feedback
   scheme in the TCP header or in addition to classic ECN, only when
   needed and when space is available.

6.  Acknowledgements

   Thanks to Gorry Fairhurst for his review and for ideas on CE-based
   integrity checking and to Mohammad Alizadeh for suggesting the need
   to avoid bias.

   Bob Briscoe was part-funded by the European Community under its
   Seventh Framework Programme through the Reducing Internet Transport
   Latency (RITE) project (ICT-317700) and through the Trilogy 2 project
   (ICT-317756). The views expressed here are solely those of the
   authors, in the context of the mentioned funding projects.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   ECN feedback information must only be used if the other information
   contained in a received TCP segment indicates that the congestion was
   genuinely part of the flow and not spoofed - i.e.
   the normal TCP
   acceptance techniques have to be used to verify that the segment is
   part of the flow before returning any contained ECN information, and
   similarly ECN feedback is only accepted on valid ACKs.

   Given ECN feedback is used as input for congestion control, the
   respective algorithm would not react appropriately if ECN feedback
   were lost and the resilience mechanism to recover it was inadequate.
   This resilience requirement is articulated in Section 4.  However, it
   should be noted that ECN feedback is not the last resort against
   congestion collapse, because if there is insufficient response to
   ECN, loss will ensue, and TCP will still react appropriately to loss.

   A receiver could suppress ECN feedback information, leading to its
   connections consuming excess sender or network resources.  This
   problem is similar to that seen with the classic ECN feedback scheme
   and should be addressed by integrity checking as required in
   Section 4.

9.  References

9.1.  Normative References

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, June 2003.

9.2.  Informative References

   [I-D.bensley-tcpm-dctcp]
              Bensley, S., Eggert, L., and D. Thaler,
              "Microsoft's Datacenter TCP (DCTCP): TCP Congestion
              Control for Datacenters", draft-bensley-tcpm-dctcp-02
              (work in progress), January 2015.

   [I-D.moncaster-tcpm-rcv-cheat]
              Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
              Allow Senders to Identify Receiver Non-Compliance",
              draft-moncaster-tcpm-rcv-cheat-03 (work in progress),
              July 2014.

   [I-D.stewart-tsvwg-sctpecn]
              Stewart, R., Tuexen, M., and X.
              Dong, "ECN for Stream
              Control Transmission Protocol (SCTP)",
              draft-stewart-tsvwg-sctpecn-05 (work in progress),
              January 2014.

   [I-D.welzl-ecn-benefits]
              Welzl, M. and G. Fairhurst, "The Benefits to Applications
              of using Explicit Congestion Notification (ECN)",
              draft-welzl-ecn-benefits-01 (work in progress), July 2014.

   [RFC0896]  Nagle, J., "Congestion control in IP/TCP internetworks",
              RFC 896, January 1984.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC3449]  Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M.
              Sooriyabandara, "TCP Performance Implications of Network
              Path Asymmetry", BCP 69, RFC 3449, December 2002.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5690]  Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
              Acknowledgement Congestion Control to TCP", RFC 5690,
              February 2010.

   [RFC6093]  Gont, F. and A. Yourtchenko, "On the Implementation of the
              TCP Urgent Mechanism", RFC 6093, January 2011.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
              and K. Carlberg, "Explicit Congestion Notification (ECN)
              for RTP over UDP", RFC 6679, August 2012.

   [RFC6789]  Briscoe, B., Woundy, R., and A. Cooper, "Congestion
              Exposure (ConEx) Concepts and Use Cases", RFC 6789,
              December 2012.

Appendix A.  Ambiguity of the More Accurate ECN Feedback in DCTCP

   As defined in [I-D.bensley-tcpm-dctcp], a DCTCP receiver feeds back
   ECE=0 on delayed ACKs as long as CE remains 0, and also immediately
   sends an ACK with ECE=0 when CE transitions to 1.  Similarly, it
   continually feeds back ECE=1 on delayed ACKs while CE remains 1 and
   immediately feeds back ECE=1 when CE transitions to 0.
   A sender can
   unambiguously decode this scheme if there is never any ACK loss, and
   the sender assumes there will never be any ACK loss.

   The following two examples show that the feedback sequence becomes
   highly ambiguous to the sender, if either of these conditions is
   broken.  Below, '0' will represent ECE=0, '1' will represent ECE=1
   and '.' will represent a gap of one segment between delayed ACKs.
   Now imagine that the sender receives the following sequence of
   feedback on 3 pure ACKs:

      0.0.0

   When the receiver sent this sequence it could have been any of the
   following four sequences:

   a.  0.0.0 (0 x CE)

   b.  010.0 (1 x CE)

   c.  0.010 (1 x CE)

   d.  01010 (2 x CE)

   where any of the 1s represent a possible pure ACK carrying ECE
   feedback that could have been lost.  If the sender guesses (a), it
   might be correct, or it might miss 1 or 2 congestion marks over 5
   packets.  Therefore, when confronted with this simple sequence (that
   is not contrived), a sender can guess that congestion might have been
   0%, 20% or 40%, but it doesn't know which.

   Sequences with a longer gap (e.g. 0...0.0) become far more ambiguous.
   It helps a little if the sender knows the distance the receiver uses
   between delayed ACKs, and it helps a lot if the distance is 1, i.e.
   no delayed ACKs, but even then there will still be ambiguity whenever
   there are pure ACK losses.

Authors' Addresses

   Mirja Kuehlewind (editor)
   ETH Zurich
   Gloriastrasse 35
   Zurich  8092
   Switzerland

   Email: mirja.kuehlewind@tik.ee.ethz.ch

   Richard Scheffenegger
   NetApp, Inc.
   Am Euro Platz 2
   Vienna  1120
   Austria

   Phone: +43 1 3676811 3146
   Email: rs@netapp.com

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/