INTERNET-DRAFT                                                    M. Sun
Intended Status: Standards Track                     HUAWEI Technologies
Expires: May 4, 2017                                    October 31, 2016


        An Improvement of ECN to Enhance TCP Fairness Performance
                   draft-sun-tcpm-ecn-improvement-00

Abstract

   This document describes a scheme that enhances TCP fairness by means
   of two parameters: congestion degree and link idle rate.  It combines
   the reserved flag in the IP header with the 2-bit ECN field to form a
   new IP Congestion and Spare Flag (ICSF) field, and uses 3 of the
   reserved bits in the TCP header to form the TCP Congestion and Spare
   Flag (TCSF) field.  The scheme identifies both congestion and link
   idle status so that every TCP flow is treated with the same degree of
   fairness.  It also speeds up TCP send window adjustment and improves
   transmission efficiency.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Marcus SUN                Expires May 4, 2017                   [Page 1]

INTERNET DRAFT           An Improvement of ECN          October 31, 2016

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1  Introduction
     1.1  Terminology
   2  Generic ECN Overview
     2.1  Detailed information of ECN
     2.2  Shortage
   3  TCP Fairness Performance Improvement
     3.1  Congestion Degree
     3.2  Idle Rate
     3.3  Full Link Congestion and Idle Information
     3.4  Send Window Adjusting Method
       3.4.1  Using the Worst Congestion Degree to Adjust Sending Window
       3.4.2  Using the Worst Idle Rate to Adjust Sending Window
     3.5  IP Option Extend
       3.5.1  Congestion Degree IP Extend
       3.5.2  Idle Rate IP Extend
   4  References
     4.1  Normative References
     4.2  Informative References
   Authors' Addresses

1  Introduction

   Explicit Congestion Notification (ECN) [RFC3168] is an extension to
   the Internet Protocol and to the Transmission Control Protocol.
   ECN allows end-to-end notification of network congestion without
   dropping packets.  ECN is an optional feature that may be used
   between two ECN-enabled endpoints when the underlying network
   infrastructure also supports it.  An ECN-aware router may set a mark
   in the IP header instead of dropping a packet in order to signal
   impending congestion.  The receiver of the packet echoes the
   congestion indication to the sender, which reduces its transmission
   rate as if it had detected a dropped packet.

   This draft specifies the use of congestion degree and idle rate
   information in IP/TCP packets as an improvement of ECN.  The scheme
   lets network devices record the congestion and idle status of the
   whole path in real time and feed it back to the TCP sender, which can
   then adjust its transmission window more accurately and avoid network
   congestion more effectively.  Packets always carry the worst port
   congestion or idle state on the path, so the indications from
   multiple devices do not overlap.  In addition, the scheme reports the
   same network status to every TCP flow, which improves TCP fairness.

   This document requests one flag (1 bit) from the reserved flag field
   of the IP header and 3 of the reserved bits in the TCP header.  It
   also describes the use of IP and TCP option extensions.

1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   TCP:       Transmission Control Protocol
   RTT:       Round Trip Time
   CWND:      Congestion Window
   ACK:       Acknowledgement
   SSThresh:  Slow Start Threshold

2  Generic ECN Overview

2.1  Detailed information of ECN

   Traditional congestion control algorithms are usually designed on the
   basic assumption that the network is a black box.
   These algorithms rely on TCP probing to detect network congestion and
   delay.  The send window at the sender grows gradually until
   congestion or packet loss occurs in the network.  Although these
   congestion control algorithms suit best-effort transmission, they are
   not sensitive to packet loss or network delay, and they are poorly
   suited to interactive applications such as telnet, web browsing, and
   audio/video.  The ECN method lets routers notify TCP when the network
   is about to become congested.  This method avoids not only packet
   loss but also the network delay caused by router packet buffer
   overflow.

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |          DS FIELD, DSCP           | ECN FIELD |
   +-----+-----+-----+-----+-----+-----+-----+-----+

     DSCP: differentiated services codepoint
     ECN:  Explicit Congestion Notification

                 Figure 1  ECN field in IP message

   As shown in Figure 1, the 2-bit ECN field in the IP header records
   whether ECN is enabled and whether the network is congested.

   +-----+-----+
   | ECN FIELD |
   +-----+-----+
     ECT   CE
      0     0         Not-ECT
      0     1         ECT(1)
      1     0         ECT(0)
      1     1         CE

                 Figure 2  The ECN FIELD in IP

   In Figure 2, the value 00 means Not-ECT (not ECN-Capable Transport),
   10 or 01 means ECT, and 11 means CE (Congestion Experienced).

   Two flags are added to the TCP header, CWR and ECE, as shown in
   Figure 3.

     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |               |               |   |   |   |   |   |   |   |   |
   |               |               | C | E | U | A | P | R | S | F |
   | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
   |               |               | R | E | G | K | H | T | N | N |
   |               |               |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

     Figure 3  The new definition of bytes 13 and 14 of the TCP Header

   CWR (Congestion Window Reduced): The sender announces that it has
   already reduced its send window.
   The receiver stops setting ECE in TCP ACKs once it receives a packet
   with CWR set.

   ECE (ECN Echo): If the ECN field is 11 in packets received by the TCP
   receiver, the network is congested.  The receiver therefore sets ECE
   (ECN Echo) in the TCP ACK sent back to the sender, notifying it that
   packets were marked Congestion Experienced (CE).  In addition, ECE is
   used by TCP to negotiate whether both receiver and sender support
   ECN.

   The interaction process of ECN is as follows.

   Step1: ECN is negotiated with the ECE and CWR flags during the TCP
   handshake.  The initiator sets both ECE and CWR in its SYN; a
   responder that supports ECN answers with a SYN-ACK in which ECE is
   set and CWR is clear [RFC3168].  After this exchange both ends know
   that ECN is supported.

   Step2: If both ends support ECN, the TCP sender enables ECN by
   setting the IP ECN field to 10 or 01.

   Step3: When a packet arrives at a network device, the device checks
   the buffer state of the output port.  If the buffer occupancy exceeds
   the ECN threshold and the IP ECN field is ECT, the device changes the
   field to 11, marking the packet as CE.

   Step4: The packet is sent out through the output port.

   Step5-7: The same as Steps 3-4.

   Step8: When the IP ECN field of a received packet is marked CE, the
   receiver sets ECE in the TCP ACK to notify the sender that the
   network is congested.

   Step9: From the total number of packets and the number of packets
   acknowledged with ECE, the sender estimates the congestion level and
   determines the size of the send window.

2.2  Shortage

   The main purpose of ECN is to tell the sender to reduce its window
   when a network device's buffer reaches the threshold value.  This
   congestion handling method has no good answer for a lightly loaded
   network.  Under light load the network cannot quickly tell the TCP
   sender how much capacity is idle.  If it could, the sender could
   enlarge its send window quickly, transmit faster, and make better use
   of the physical bandwidth.

   ECN cannot accurately reflect network congestion status.
   Theoretically, when the buffers of multiple network devices on the
   path exceed the threshold, the congestion status of the path should
   be that of the worst node.  However, each of these devices may set
   the ECN mark, so the proportion of ECN-marked packets is larger than
   the proportion marked by the most congested node alone.  As a result,
   the congestion status seen by the sender is far greater than the
   actual congestion on the path.

   TCP fairness problem.  Flows start in a different order and send a
   different number of packets per RTT, so they observe different
   congestion levels, which makes the flows unfair to one another.

3  TCP Fairness Performance Improvement

3.1  Congestion Degree

   The link congestion degree relates the amount of data queued in the
   link buffer to the physical bandwidth of the link, that is, how long
   it takes to transmit the queued data.  For example, if the link
   buffer holds 20 MB of messages, which is 20 * 8 = 160 Mb, and the
   link bandwidth is 1 Gbps, then draining the buffer takes
   160 Mb / 1 Gbps = 160 / 1024 s, or about 0.16 seconds.  The link
   congestion degree is then expressed as 16%.

3.2  Idle Rate

   Link idle rate is the complement of link usage.  For example, a
   1 Gbps link carrying 600 Mbps of traffic has a link usage of 60% and
   a link idle rate of 40%.

3.3  Full Link Congestion and Idle Information

   Step1: The TCP sender sends a message.

   Step2: After receiving this message, the first network device looks
   up its forwarding table to find the output port, obtains the link
   congestion degree/idle rate of that port, and adds this information
   to the message.

   Step3: The message is sent out from the output port of the network
   device.
   Step4: The next network device looks up its forwarding table and
   obtains the congestion degree/idle rate of its output port as in
   Step2, then compares the value of the port with the value carried in
   the message before sending.  If the current congestion degree/idle
   rate of its port is worse than that in the message, it updates the
   congestion/idle information in the message; otherwise it changes
   nothing.

   Step5: The message is then sent out from the output port of the
   network device.

   Step6-7: The same as Steps 3-4.

   Step8: The TCP receiver copies the link congestion degree/idle rate
   from the message into the ACK, so that the worst congestion
   degree/idle rate of the path is reported to the sender.

   Step9: The TCP sender determines how much to reduce or increase the
   send window from the worst congestion degree/idle rate in the
   received TCP ACK and the current congestion window size.

3.4  Send Window Adjusting Method

3.4.1  Using the Worst Congestion Degree to Adjust Sending Window

   The TCP sender sets the amount by which the window decreases
   according to the worst congestion degree in the received TCP ACK and
   the current send window.  Assume that the worst congestion degree in
   the TCP ACK is 10%, the current window of FLOW1 is 1000, and the
   window of FLOW2 is 200.  A worst congestion degree of 10% means that
   the current traffic exceeds the link bandwidth by 10%, so the total
   traffic of all current flows is (1 + 10%) of the available bandwidth.
   The proportion of traffic to be removed is therefore
   10% / (1 + 10%) = 9.09%.  For FLOW1, the current send window is 1000,
   so the window is reduced by 90.9 (1000 * 9.09%), while FLOW2's send
   window is reduced by 18.2 (200 * 9.09%).

3.4.2  Using the Worst Idle Rate to Adjust Sending Window

   The TCP sender sets the amount by which the window increases
   according to the worst idle rate in the received TCP ACK and the
   current send window.  Assume that the worst idle rate in the TCP ACK
   is 40%, the current window of FLOW1 is 1000, and the window of FLOW2
   is 200.
   Since the worst idle rate is 40%, the current link utilization is
   60%.  All current flows together use 60% of the link and 40% is free,
   so the rate of each flow can be increased by
   40% / (1 - 40%) = 66.67%.  For FLOW1, the current send window is
   1000, so the window is increased by 667 (1000 * 66.67%), while
   FLOW2's send window is 200, so the window is increased by 133
   (200 * 66.67%).

3.5  IP Option Extend

3.5.1  Congestion Degree IP Extend

   The congestion degree can be carried in IP packets by extending the
   IP options.

   +------------------------+---------+-------+
   |          Type          | Length  | Value |
   +------------------------+---------+-------+
   | node congestion degree | 4 bytes |  0.1  |
   +------------------------+---------+-------+

   The worst congestion degree can be carried in the TCP ACK by
   extending the TCP options as follows.

   +-----------------------------+---------+-------+
   |            Type             | Length  | Value |
   +-----------------------------+---------+-------+
   | the worst congestion degree | 4 bytes |  0.1  |
   +-----------------------------+---------+-------+

3.5.2  Idle Rate IP Extend

   The idle rate can be carried in IP packets by extending the IP
   options.

   +----------------+---------+-------+
   |      Type      | Length  | Value |
   +----------------+---------+-------+
   | node idle rate | 4 bytes | 0.45  |
   +----------------+---------+-------+

   The worst idle rate can be carried in the TCP ACK by extending the
   TCP options as follows.

   +---------------------+---------+-------+
   |        Type         | Length  | Value |
   +---------------------+---------+-------+
   | the worst idle rate | 4 bytes | 0.45  |
   +---------------------+---------+-------+

4  References

4.1  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC1925]  Callon, R., "The Twelve Networking Truths", RFC 1925,
              April 1996.
4.2  Informative References

   [RFC2481]  Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit
              Congestion Notification (ECN) to IP", RFC 2481,
              January 1999.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, June 2003.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
              Ramakrishnan, "Adding Explicit Congestion Notification
              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
              June 2009.

Authors' Addresses

   Marcus Sun
   HUAWEI TECHNOLOGIES CO.,LTD
   12 E. Mozhou Rd.
   Jiangning Dist.
   Nanjing, Jiangsu
   China

   EMail: marcus.sun@huawei.com
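
   As an illustration only, the calculations of Sections 3.1 to 3.4 can
   be sketched in Python as follows.  The function names and the
   PathState type are hypothetical and are not defined by this document.
   The sketch also assumes that a "worse" value means a higher
   congestion degree or a lower idle rate, and it takes 1 Gbps as 10^9
   bit/s.

```python
from dataclasses import dataclass


def congestion_degree(buffered_bits: float, link_bps: float) -> float:
    """Section 3.1: seconds needed to drain the buffer, read as a fraction."""
    return buffered_bits / link_bps


def idle_rate(traffic_bps: float, link_bps: float) -> float:
    """Section 3.2: the complement of link usage."""
    return 1.0 - traffic_bps / link_bps


@dataclass(frozen=True)
class PathState:
    """Worst values seen so far on the path (hypothetical container)."""
    worst_congestion: float = 0.0  # nothing queued yet
    worst_idle: float = 1.0        # fully idle until a hop says otherwise


def update_at_hop(state: PathState, port_congestion: float,
                  port_idle: float) -> PathState:
    """Section 3.3, Step4: keep the carried value unless the port is worse.

    Assumption: "worse" is a higher congestion degree or a lower idle rate.
    """
    return PathState(
        worst_congestion=max(state.worst_congestion, port_congestion),
        worst_idle=min(state.worst_idle, port_idle),
    )


def window_decrease(cwnd: float, worst_congestion: float) -> float:
    """Section 3.4.1: amount to remove, cwnd * d / (1 + d)."""
    return cwnd * worst_congestion / (1.0 + worst_congestion)


def window_increase(cwnd: float, worst_idle: float) -> float:
    """Section 3.4.2: amount to add, cwnd * i / (1 - i)."""
    if not 0.0 <= worst_idle < 1.0:
        # i = 1 would divide by zero; the document does not cover this case.
        raise ValueError("worst_idle must be in [0, 1)")
    return cwnd * worst_idle / (1.0 - worst_idle)


if __name__ == "__main__":
    # Three hops; the second one is the bottleneck in both respects.
    state = PathState()
    for congestion, idle in [(0.05, 0.70), (0.10, 0.40), (0.02, 0.90)]:
        state = update_at_hop(state, congestion, idle)
    print(state)

    # The FLOW1 and FLOW2 windows used in Sections 3.4.1 and 3.4.2.
    for cwnd in (1000, 200):
        print(cwnd,
              round(window_decrease(cwnd, state.worst_congestion), 1),
              round(window_increase(cwnd, state.worst_idle)))
```

   Because every flow applies the same ratio to its own window, the
   reduction or increase is proportional to the current window size,
   which is the property Section 3.4 relies on.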