| draft-ietf-tcpm-accurate-ecn-01.txt | draft-ietf-tcpm-accurate-ecn-02.txt | |||
|---|---|---|---|---|
| TCP Maintenance & Minor Extensions (tcpm) B. Briscoe | TCP Maintenance & Minor Extensions (tcpm) B. Briscoe | |||
| Internet-Draft Simula Research Laboratory | Internet-Draft Simula Research Laboratory | |||
| Intended status: Experimental M. Kuehlewind | Intended status: Experimental M. Kuehlewind | |||
| Expires: January 1, 2017 ETH Zurich | Expires: May 4, 2017 ETH Zurich | |||
| R. Scheffenegger | R. Scheffenegger | |||
| NetApp, Inc. | October 31, 2016 | |||
| June 30, 2016 | ||||
| More Accurate ECN Feedback in TCP | More Accurate ECN Feedback in TCP | |||
| draft-ietf-tcpm-accurate-ecn-01 | draft-ietf-tcpm-accurate-ecn-02 | |||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is a mechanism where network | Explicit Congestion Notification (ECN) is a mechanism where network | |||
| nodes can mark IP packets instead of dropping them to indicate | nodes can mark IP packets instead of dropping them to indicate | |||
| incipient congestion to the end-points. Receivers with an ECN- | incipient congestion to the end-points. Receivers with an ECN- | |||
| capable transport protocol feed back this information to the sender. | capable transport protocol feed back this information to the sender. | |||
| ECN is specified for TCP in such a way that only one feedback signal | ECN is specified for TCP in such a way that only one feedback signal | |||
| can be transmitted per Round-Trip Time (RTT). Recently, new TCP | can be transmitted per Round-Trip Time (RTT). Recently, new TCP | |||
| mechanisms like Congestion Exposure (ConEx) or Data Center TCP | mechanisms like Congestion Exposure (ConEx) or Data Center TCP | |||
| skipping to change at page 1, line 45 | skipping to change at page 1, line 44 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on January 1, 2017. | This Internet-Draft will expire on May 4, 2017. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2016 IETF Trust and the persons identified as the | Copyright (c) 2016 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 17, line 52 | skipping to change at page 17, line 52 | |||
| main TCP header (Section 3.1) is successful, it implicitly declares | main TCP header (Section 3.1) is successful, it implicitly declares | |||
| that the endpoints also support the AccECN TCP Option. | that the endpoints also support the AccECN TCP Option. | |||
| If the TCP client indicated AccECN support, a TCP server that confirms | If the TCP client indicated AccECN support, a TCP server that confirms | |||
| its support for AccECN (as described in Section 3.1) SHOULD also | its support for AccECN (as described in Section 3.1) SHOULD also | |||
| include an AccECN TCP Option in the SYN/ACK. A TCP client that has | include an AccECN TCP Option in the SYN/ACK. A TCP client that has | |||
| successfully negotiated AccECN SHOULD include an AccECN Option in the | successfully negotiated AccECN SHOULD include an AccECN Option in the | |||
| first ACK at the end of the 3WHS. However, this first ACK is not | first ACK at the end of the 3WHS. However, this first ACK is not | |||
| delivered reliably, so the TCP client SHOULD also include an AccECN | delivered reliably, so the TCP client SHOULD also include an AccECN | |||
| Option on the first data segment it sends (if it ever sends one). A | Option on the first data segment it sends (if it ever sends one). A | |||
| host need not include an AccECN Option in any of these three cases if | host MAY NOT include an AccECN Option in any of these three cases if | |||
| it has cached knowledge that the packet would be likely to be blocked | it has cached knowledge that the packet would be likely to be blocked | |||
| on the path to the other host if it included an AccECN Option. | on the path to the other host if it included an AccECN Option. | |||
| If the TCP client has successfully negotiated AccECN but does not | If the TCP client has successfully negotiated AccECN but does not | |||
| receive an AccECN Option on the SYN/ACK, it switches into a mode that | receive an AccECN Option on the SYN/ACK, it switches into a mode that | |||
| assumes that the AccECN Option is not available for this half | assumes that the AccECN Option is not available for this half | |||
| connection. Similarly, if the TCP server has successfully negotiated | connection. Similarly, if the TCP server has successfully negotiated | |||
| AccECN but does not receive an AccECN Option on the first ACK or on | AccECN but does not receive an AccECN Option on the first ACK or on | |||
| the first data segment, it switches into a mode that assumes that the | the first data segment, it switches into a mode that assumes that the | |||
| AccECN Option is not available for this half connection. | AccECN Option is not available for this half connection. | |||
| skipping to change at page 18, line 34 | skipping to change at page 18, line 34 | |||
| AccECN Option. To expedite connection setup, the host SHOULD fall | AccECN Option. To expedite connection setup, the host SHOULD fall | |||
| back to NS=CWR=ECE=0 and no AccECN Option on the retransmission of | back to NS=CWR=ECE=0 and no AccECN Option on the retransmission of | |||
| the SYN/ACK. Implementers MAY use other fall-back strategies if they | the SYN/ACK. Implementers MAY use other fall-back strategies if they | |||
| are found to be more effective (e.g. retransmitting a SYN/ACK with | are found to be more effective (e.g. retransmitting a SYN/ACK with | |||
| AccECN TCP flags but not the AccECN Option; attempting to retransmit | AccECN TCP flags but not the AccECN Option; attempting to retransmit | |||
| a second AccECN segment before fall-back (most appropriate during | a second AccECN segment before fall-back (most appropriate during | |||
| high levels of congestion); or falling back to classic ECN feedback | high levels of congestion); or falling back to classic ECN feedback | |||
| rather than non-ECN). | rather than non-ECN). | |||
| Similarly, if the TCP client detects that the first data segment it | Similarly, if the TCP client detects that the first data segment it | |||
| sent was lost, it SHOULD fall back to no AccECN Option on the | sent with the AccECN Option was lost, it SHOULD fall back to no | |||
| retransmission. Again, implementers MAY use other fall-back | AccECN Option on the retransmission. Again, implementers MAY use | |||
| strategies such as attempting to retransmit a second segment with the | other fall-back strategies such as attempting to retransmit a second | |||
| AccECN Option before fall-back, and/or caching the result of previous | segment with the AccECN Option before fall-back, and/or caching the | |||
| attempts. | result of previous attempts. | |||
| Either host MAY include the AccECN Option in a subsequent segment to | Either host MAY include the AccECN Option in a subsequent segment to | |||
| retest whether the AccECN Option can traverse the path. | retest whether the AccECN Option can traverse the path. | |||
| Currently the Data Sender is not required to test whether the | Currently the Data Sender is not required to test whether the | |||
| arriving byte counters in the AccECN Option have been correctly | arriving byte counters in the AccECN Option have been correctly | |||
| initialised. This allows different initial values to be used as an | initialised. This allows different initial values to be used as an | |||
| additional signalling channel in future. If any inappropriate | additional signalling channel in future. If any inappropriate | |||
| zeroing of these fields is discovered during testing, this approach | zeroing of these fields is discovered during testing, this approach | |||
| will need to be reviewed. | will need to be reviewed. | |||
| skipping to change at page 34, line 51 | skipping to change at page 34, line 51 | |||
| rather than 3, so that the division could be implemented as an | rather than 3, so that the division could be implemented as an | |||
| integer right bit-shift by lg(BEACON_FREQ). | integer right bit-shift by lg(BEACON_FREQ). | |||
| In certain operating systems, it might be too complex to maintain | In certain operating systems, it might be too complex to maintain | |||
| acks_in_round. In others it might be possible by tagging each data | acks_in_round. In others it might be possible by tagging each data | |||
| segment in the retransmit buffer with the number of ACKs sent at the | segment in the retransmit buffer with the number of ACKs sent at the | |||
| point that segment was sent. This would not work well if the Data | point that segment was sent. This would not work well if the Data | |||
| Receiver was not sending data itself, in which case it might be | Receiver was not sending data itself, in which case it might be | |||
| necessary to beacon based on time instead, as follows: | necessary to beacon based on time instead, as follows: | |||
| if (time_now > time_last_option_sent + RTT / BEACON_FREQ) | if ( time_now > time_last_option_sent + (RTT / BEACON_FREQ) ) | |||
| send_full_AccECN_Option() | send_full_AccECN_Option() | |||
| However, this time-based approach does not work well when all the | This time-based approach does not work well when all the ACKs are | |||
| ACKs are sent early in each round trip, as is the case during slow- | sent early in each round trip, as is the case during slow-start. In | |||
| start. | this case few options will be sent (possibly even fewer than 3 per RTT). | |||
| However, when continuously sending data, data packets as well as ACKs | ||||
| {ToDo: A simple and robust beaconing algorithm for all circumstances | will spread out equally over the RTT and sufficient ACKs with the | |||
| is still work-in-progress.} | AccECN option will be sent. | |||
| A.5. Example Algorithm to Count Not-ECT Bytes | A.5. Example Algorithm to Count Not-ECT Bytes | |||
| A Data Sender in AccECN mode can infer the amount of TCP payload data | A Data Sender in AccECN mode can infer the amount of TCP payload data | |||
| arriving at the receiver marked Not-ECT from the difference between | arriving at the receiver marked Not-ECT from the difference between | |||
| the amount of newly ACKed data and the sum of the bytes with the | the amount of newly ACKed data and the sum of the bytes with the | |||
| other three markings, d.ceb, d.e0b and d.e1b. Note that, because | other three markings, d.ceb, d.e0b and d.e1b. Note that, because | |||
| r.e0b is initialised to 1 and the other two counters are initialised | r.e0b is initialised to 1 and the other two counters are initialised | |||
| to 0, the initial sum will be 1, which matches the initial offset of | to 0, the initial sum will be 1, which matches the initial offset of | |||
| the TCP sequence number on completion of the 3WHS. | the TCP sequence number on completion of the 3WHS. | |||
| skipping to change at page 36, line 28 | skipping to change at page 36, line 28 | |||
| middlebox had stripped the option. | middlebox had stripped the option. | |||
| Appendix C. Open Protocol Design Issues (To Be Removed Before | Appendix C. Open Protocol Design Issues (To Be Removed Before | |||
| Publication) | Publication) | |||
| 1. Currently it is specified that the receiver `SHOULD' use Change- | 1. Currently it is specified that the receiver `SHOULD' use Change- | |||
| Triggered ACKs. It is controversial whether this ought to be a | Triggered ACKs. It is controversial whether this ought to be a | |||
| `MUST' instead. A `SHOULD' would leave the Data Sender uncertain | `MUST' instead. A `SHOULD' would leave the Data Sender uncertain | |||
| whether it can rely on the timing and ordering information in | whether it can rely on the timing and ordering information in | |||
| ACKs. If the sender guesses wrongly, it will probably introduce | ACKs. If the sender guesses wrongly, it will probably introduce | |||
| at least 1RTT of delay before it can use this timing information. | at least 1 RTT of delay before it can use this timing | |||
| Ironically it will most likely be wanting this information to | information. Ironically it will most likely be wanting this | |||
| reduce ramp-up delay. A `MUST' could make it hard to implement | information to reduce ramp-up delay. A `MUST' could make it hard | |||
| AccECN in offload hardware. However, it is not known whether | to implement AccECN in offload hardware. However, it is not | |||
| AccECN would be hard to implement in such hardware even with a | known whether AccECN would be hard to implement in such hardware | |||
| `SHOULD' here. For instance, was it hard to offload DCTCP to | even with a `SHOULD' here. For instance, was it hard to offload | |||
| hardware because of change-triggered ACKs, or was this just one | DCTCP to hardware because of change-triggered ACKs, or was this | |||
| of many reasons? The choice between MUST and SHOULD here is | just one of many reasons? The choice between MUST and SHOULD | |||
| critical. Before that choice is made, a clear use-case for | here is critical. Before that choice is made, a clear use-case | |||
| certainty of timing and ordering information is needed, plus | for certainty of timing and ordering information is needed, plus | |||
| well-informed discussion about hardware offload constraints. | well-informed discussion about hardware offload constraints. | |||
| 2. There is possibly a concern that a receiver could deliberately | 2. There is possibly a concern that a receiver could deliberately | |||
| omit the AccECN Option pretending that it had been stripped by a | omit the AccECN Option pretending that it had been stripped by a | |||
| middlebox. No known way can yet be contrived to take advantage | middlebox. No known way can yet be contrived to take advantage | |||
| of this downgrade attack, but it is mentioned here in case | of this downgrade attack, but it is mentioned here in case | |||
| someone else can contrive one. | someone else can contrive one. | |||
| 3. The s.cep counter might increase even if the s.ceb counter does | 3. The s.cep counter might increase even if the s.ceb counter does | |||
| not (e.g. due to a CE-marked control packet). The sender's | not (e.g. due to a CE-marked control packet). The sender's | |||
| skipping to change at page 37, line 26 | skipping to change at page 37, line 26 | |||
| Authors' Addresses | Authors' Addresses | |||
| Bob Briscoe | Bob Briscoe | |||
| Simula Research Laboratory | Simula Research Laboratory | |||
| EMail: ietf@bobbriscoe.net | EMail: ietf@bobbriscoe.net | |||
| URI: http://bobbriscoe.net/ | URI: http://bobbriscoe.net/ | |||
| Mirja Kuehlewind | Mirja Kuehlewind | |||
| ETH Zurich | ETH Zurich | |||
| Gloriastrasse 35 | Zurich | |||
| Zurich 8092 | ||||
| Switzerland | Switzerland | |||
| EMail: mirja.kuehlewind@tik.ee.ethz.ch | EMail: mirja.kuehlewind@tik.ee.ethz.ch | |||
| Richard Scheffenegger | Richard Scheffenegger | |||
| NetApp, Inc. | Vienna | |||
| Am Euro Platz 2 | ||||
| Vienna 1120 | ||||
| Austria | Austria | |||
| Phone: +43 1 3676811 3146 | EMail: rscheff@gmx.at | |||
| EMail: rs@netapp.com | ||||
| End of changes. 12 change blocks. | ||||
| 33 lines changed or deleted | 29 lines changed or added | |||
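
The time-based beaconing test quoted from Appendix A.4 can be sketched as follows. This is a minimal illustration rather than the draft's normative algorithm: the function name `full_option_due`, the integer time unit, and the value `BEACON_FREQ = 4` are assumptions made here. The draft text only notes that a power of two lets the division be done as a right shift by lg(BEACON_FREQ).

```python
# Assumed value for illustration: a power of two, so that
# RTT / BEACON_FREQ can be computed with an integer right shift.
BEACON_FREQ = 4
BEACON_SHIFT = BEACON_FREQ.bit_length() - 1  # lg(BEACON_FREQ) == 2


def full_option_due(time_now, time_last_option_sent, rtt):
    """Return True when a full AccECN Option should be sent.

    All arguments are integers in the same time unit
    (for example microseconds). Hypothetical helper, not a draft API.
    """
    # Same test as the pseudocode:
    #   time_now > time_last_option_sent + (RTT / BEACON_FREQ)
    return time_now > time_last_option_sent + (rtt >> BEACON_SHIFT)
```

With an RTT of 1000 time units the beacon interval is 250 units, so the option becomes due only once more than 250 units have elapsed since the last one was sent.
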
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/
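
The Not-ECT inference described in Appendix A.5 amounts to a single subtraction. The sketch below assumes the deltas d.ceb, d.e0b and d.e1b have already been derived from the arriving byte counters. The function and parameter names are illustrative and not taken from the draft.

```python
def not_ect_bytes(newly_acked_bytes, d_ceb, d_e0b, d_e1b):
    """Infer how many payload bytes arrived marked Not-ECT.

    newly_acked_bytes: TCP payload bytes newly acknowledged.
    d_ceb, d_e0b, d_e1b: increases in the CE, ECT(0) and ECT(1)
    byte counters over the same interval (assumed precomputed).
    """
    # Whatever was ACKed but not counted under the other three
    # markings must have arrived as Not-ECT.
    return newly_acked_bytes - (d_ceb + d_e0b + d_e1b)
```

For instance, if 3000 bytes are newly ACKed while the CE and ECT(0) byte counters grew by 1000 and 1500 respectively, the remaining 500 bytes are attributed to Not-ECT.
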