draft-salim-jhsbnns-ecn-00.txt

Internet Engineering Task Force                            Hadi Salim, J
Internet Draft                                                  Nandy, B
                                                              Seddigh, N
                                              Computing Technology Labs,
                                                                  Nortel
                                                               June 1998

   A proposal for Backward ECN for the Internet Protocol (IPv4/IPv6)

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Abstract

This memo proposes an alternative to the current ECN mechanism
proposed in the Internet-Draft [draft-kksjf]. A Backward ECN (BECN)
is proposed which uses the existing IP signalling mechanism, the
Internet Control Message Protocol (ICMP) [RFC 792] Source Quench
message. The use of ICMP Source Quench (ISQ) allows a basic ECN
mechanism for IP which does not require any negotiation between end
systems. Congestion notification is kept at the network (IP) level.
The congestion state can be reflected up to the transport layer (e.g.
TCP or UDP) for appropriate action. The ISQ-based approach reduces the
reaction time to congestion in the network. In addition, the ISQ
message can include information on the severity of the congestion,
allowing the end host to react accordingly so as to make maximal use
of the resources while maintaining network equilibrium.

1.0 Introduction

IP currently has no adhered-to mechanism for notifying its transport
protocols of network congestion problems. ISQs have been used in the
past for congestion notification. TCP implements its own congestion
control algorithm and makes inferences about network congestion:
TCP-Reno and its variants use packet losses as the indicator, whereas
TCP-Vegas uses delay/throughput. UDP applications are usually
unresponsive, and the protocols running over UDP (e.g., RTP) use their
own congestion control methods, if they use any at all. The initial
suggestions for adding Explicit Congestion Notification to IP are
outlined in [Floyd94] and later in the IETF draft [draft-kksjf].

1.1 Current ECN Proposal [draft-kksjf]

Bits 10 and 11 in the IPv6 header are proposed for the ECT (ECN
Capable Transport indicator) and CE (Congestion Experienced indicator)
respectively. Bits 6 and 7 of the IPv4 header TOS field are likewise
proposed as the ECT and CE placeholders respectively.
The TCP header is modified to add an additional flag, the ECN Echo,
with which the receiver notifies the sender that it is contributing to
congestion. The flag's bit is borrowed from the reserved field in the
TCP header. This flag is also referred to as the ECE bit in this text.

The ECT bit is set by the sender end system if both end systems are
ECN capable. This is confirmed by negotiation during the connection
setup phase in TCP. Packets encountering congestion are marked (CE
bit) by a router on their way from the sender end system to the
receiver end system, with a probability proportional to their
bandwidth usage, following the procedure used in RED [RFC2309]
routers. When the receiver end system receives the congestion-causing
packet with the CE and ECT bits set, it informs the sender end system
that it is contributing to congestion by setting the ECE bit in the
ACK packet. The sender end system reacts by halving the congestion
window upon receiving the ACK packet. The sender end system reacts to
ECE messages only once per in-flight window of messages.

1.2 Limitations of the Current ECN Proposal [draft-kksjf]

1) The congestion notification in the [draft-kksjf] proposal is
coupled to the transport layer (TCP) via the use of header information
(the ECE bit). Extending this proposal to other transport protocols
will require changes to each of their respective headers.

2) The proposed [draft-kksjf] scheme requires the congestion
notification to incur a round trip time (RTT) before the sender can
react. In a path with a high delay-bandwidth product this would be
problematic for two reasons: i) in the scenario where the
delay-bandwidth product is dominated mostly by the high bandwidth (as
in high-speed networks), a large amount of traffic will pass through
the intermediate routers, increasing the congestion level before the
sender is notified.
ii) in the scenario where the delay-bandwidth product is dominated
mostly by the high latency/RTT (as in satellite networks), the
reaction will take too long to address the congestion issue. In both
cases, the efficient use of the available bandwidth is affected.

3) Because of the binary nature of the feedback, the reaction is
limited to halving the window size even if the congestion level is
very low. Network resources could be utilized more effectively if the
feedback were indicative of the congestion level at the overloaded
point in the network.

In this document we introduce Backward ECN (BECN), a binary feedback
mechanism, and then an incremental improvement to it, Multilevel
Backward ECN, which we refer to as Multilevel ECN (MECN).

Section 2 introduces our solution and how it addresses the above
limitations: it outlines Backward ECN (BECN) and then multilevel BECN
(MECN), and justifies the use of ISQ. Section 3 goes into the details
of BECN and suggests a role for the router and the end system.
Section 4 goes into the details of MECN and suggests a role for the
router and the end system. Section 5 addresses the situation of
multiple congested routers with our scheme. Section 6 is on security
issues.

2.0 Network Level Signalling for ECN

We argue that ECN is a network level functionality and should be
decoupled from the transport protocols. A mechanism should be provided
for the end IP layer to inform its transport protocols of congestion
problems without using their header bit(s). The benefit is that all IP
transport protocols (including any new ones that might be added in the
future) are notified of network congestion in the same manner.
In this document we deal only with TCP, and in particular with TCP
mechanisms which use packet drops as indicators of congestion, such as
TCP-Reno and its variants.

It is assumed that the participating routers are capable of RED or
some other active queue management mechanism. In such a router, a
packet has a probability of being dropped, and this probability depends
on the average queue size. For a packet with the ECT bit set in the IP
header, when the average queue size is between the minimum and maximum
thresholds, the router, with a given probability, sets the CE bit in
the header and forwards the packet instead of dropping it, as
described in [draft-kksjf].

We leverage ICMP's Source Quench message, whose design intent is to
provide feedback to a source end system about network congestion. Both
the CE and ECT bits defined in [draft-kksjf] are maintained. During the
de-multiplexing of the IP message, the values of both CE and ECT are
passed to the transport layer.

We start by introducing a traditional ISQ which comprises a binary
feedback mechanism and a modified binary reaction at the source end
system (compared with the current requirements for the end host's
reaction to ISQ [RFC1122]).

Definition: The term binary congestion feedback denotes knowledge of
network congestion that is passed back to an end node, explicitly or
otherwise, without regard to the level of congestion. The feedback
says only that the network is congested.

We then introduce a multilevel congestion feedback mechanism based on
the various incipient congestion levels detected at the RED router.
The sender end system in that scenario has the luxury of more varied
reactions based on the congestion level that is fed back. This results
in more effective use of network resources and better performance.

Definition: The term multilevel congestion feedback denotes knowledge
of network congestion that is passed back to an end node with explicit
indicators of how severely the network is congested.

We propose the multilevel congestion feedback and reaction as an
incremental improvement over the binary congestion feedback and
reaction mechanism. In Sections 3 and 4 we suggest some simple
algorithms for both the binary and multilevel solutions.

2.1 Backward ECN (BECN)

This section briefly describes the binary feedback-reaction mechanism.

ICMP Source Quench messages (ISQ) are generated by the intermediate
congested RED router and sent back to the source as an indication of
incipient congestion whenever that router decides to mark the CE bit.
ISQs are usually not generated for a packet that has already been
marked by another router, regardless of whether that packet is
contributing to some congestion; however, when the router queue level
mandates that the packet be dropped, an ISQ is sent back to the source
regardless of whether the packet was previously marked or not.

The source reacts at the transport protocol level by lowering its data
throughput into the network. In TCP, upon identifying the flow causing
the congestion, the sender reacts by halving both the congestion
window and the slow start threshold value for that flow. The sender
does not react to an ISQ message more than once per window. This is
similar to the algorithm defined in [draft-kksjf].

2.2 Multilevel BECN (MECN)

This section briefly describes the multilevel congestion
feedback-reaction mechanism.

Multilevel ICMP Source Quench messages (ISQ) are generated by the RED
router and sent back to the source as an indication of incipient
congestion whenever the CE bit is marked by the intermediate congested
router.
The levels are based on the RED probability, and therefore on the
average queue size, at the time a congestion-causing packet arrives at
the router. The congestion level sent back is the marking probability
scaled by a multiplicative factor and is stored in the 32-bit unused
field of the ISQ. As an example, the multiplicative factor selected is
100. The upper limit of 100 is returned when the probability of
dropping the packet is equal to one (i.e., the average queue size is
above the maximum threshold). ISQs are not generated for a packet that
has already been marked; however, as in the case of BECN, when the
router queue level mandates that a packet be dropped, an ISQ is sent
back to the source regardless of whether the packet was previously
marked or not. The value sent in that case is the maximum, i.e., 100 in
the above example.

2.3 The argument to justify the use of ISQ

ISQ messages, generated by a router to an end system, have in the past
been considered inefficient for the following reasons:

1) Gateway CPU abuse while processing these extra messages and 2)
Bandwidth consumption on the reverse path. It is suggested [RFC 1812]
that routers, if implementing ISQs, should rate limit their generation
because they consume too much bandwidth in the reverse path.

We argue that CPU time is no longer a constrained resource today and
that the benefits provided by ECN outweigh the small performance hit
added. Moreover, it has been shown [red-paper] that when using RED
(with cooperating end systems) fewer packet drops happen at the router
than with the traditional drop-tail algorithms on which the case
against ISQ was based. This implies that the amount of processing
needed at the router is reduced. It has been quantitatively shown in
simulations [kcho-97] that only about 1-5% of the packets are marked or
dropped in a RED gateway under incipient congestion.
We argue that the faster reaction provided by ISQ would alleviate the
congestion sooner, resulting in even further reductions.

Using a RED gateway provides us with an advantage. A connection is
notified (by an ISQ in this case) of congestion at a rate proportional
to the connection's share of the bandwidth at the congested gateway.
Generation of ISQ messages is limited to the period from the detection
of incipient congestion until the source end system adjusts. In fact,
given that our scheme addresses congested routers sequentially along
the downstream path, we argue that the back path, even if it is the
same as the forward path, is probably not congested, since it covers
the path only up to the first point of congestion. More details are
given in Section 5.

In essence, RED addresses both the backward-path congestion problem
(if the back path is the same as the forward path) and the router
processing concerns.

3.0 Suggested BECN algorithm

This is a binary feedback-reaction mechanism. The ISQs sent by the
router to the source host act as an indication of incipient congestion.
The source reacts at the transport level by lowering its congestion
window. The algorithm supplied here is the same as the one used in the
ECN proposal [draft-kksjf].

3.1 Role of the Router

If the incoming message causes the average queue size to go above the
maximum threshold then:
   drop the packet
   if the ECT bit is set in the IP header then:
      send an ISQ back to the source.
else if the incoming message causes the average queue size to go
between the minimum and maximum thresholds then:
   if the RED probability chooses this packet, the ECT bit is set and
   the packet is not already marked then:
      mark the packet (CE bit) and send an ISQ back.
   else if RED chooses this packet and the ECT bit is not set then:
      drop the packet.
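
As an illustration only, the router decision of Section 3.1 can be
sketched in Python. This is a minimal sketch under stated assumptions,
not part of the proposal: the names Packet, Action and
becn_router_action are hypothetical, the RED random choice is passed in
as a boolean (red_selected) so that the example stays deterministic,
the threshold boundaries are treated as inclusive, and forwarding an
already-marked packet without a new ISQ is one reading of Section 2.1.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical labels for what the router does with a packet."""
    FORWARD = "forward unchanged"
    MARK_AND_ISQ = "set CE, forward, send ISQ"
    DROP = "drop"
    DROP_AND_ISQ = "drop, send ISQ"


@dataclass
class Packet:
    ect: bool  # ECN Capable Transport bit from the IP header
    ce: bool   # Congestion Experienced bit from the IP header


def becn_router_action(avg_queue, min_th, max_th, red_selected, pkt):
    """Return the BECN action for one arriving packet.

    avg_queue    -- RED average queue size after this arrival
    min_th       -- RED minimum threshold
    max_th       -- RED maximum threshold
    red_selected -- True if the RED probability chose this packet
                    (assumption: the caller performs the random draw)
    """
    # Above the maximum threshold: always drop, and notify the
    # source only if the transport declared itself ECN capable.
    if avg_queue > max_th:
        return Action.DROP_AND_ISQ if pkt.ect else Action.DROP

    # Between the thresholds: act only on packets chosen by RED.
    if avg_queue >= min_th and red_selected:
        if not pkt.ect:
            return Action.DROP
        if not pkt.ce:
            pkt.ce = True  # mark the packet
            return Action.MARK_AND_ISQ
        # Already marked upstream: no second ISQ (assumption).

    return Action.FORWARD
```

With thresholds of 5 and 10, an ECT packet arriving at an average
queue size of 7 and chosen by RED is marked and triggers an ISQ, while
the same packet arriving at an average queue size of 12 is dropped and
also triggers an ISQ.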

3.2 Role of the Source End System

If an ISQ message is received then the sender knows that there is
network congestion. The flow causing the congestion is identified from
the ICMP data. The TCP source reacts by halving both the congestion
window and the slow start threshold value for that flow.

The sender does not react to ISQ more than once per window. Upon
receipt of an ISQ packet at time t, it notes the packets that are
outstanding at that time (sent but not yet acked) and waits until a
time u when they have all been acknowledged before reacting to a new
ISQ message.

4.0 Suggested MECN algorithm

This is an evolution of BECN. The router now sends levels of congestion
notification and the source end system reacts differently depending on
the severity of the congestion. The level of notification is stored in
the 32-bit unused field in the ISQ.

4.1 Role of the Router

4.1.1 How the congestion level weight is computed

Pb refers to the computed RED packet marking probability. Pb is a
function of the computed average queue size. As the average queue size
varies from the minimum to the maximum threshold, Pb varies between 0
and the maximum value set for it, Maxp. Note that we take Pb to be one
when the average queue size is above the maximum threshold; in that
particular case, the maximum weight is sent to the source system. For
simplicity's sake we choose a multiplicative factor of 100, so that the
weight reads as a percentage congestion level. Above the maximum
threshold we send a value of 100 in the feedback message, indicating
100% incipient congestion. Between the thresholds we scale Pb so that
the weight reflects 99% congestion when Pb reaches its maximum value,
and we add 1 to account for the fact that Pb is zero at the minimum
threshold.
The equation used to compute the weight to send between the minimum
and maximum thresholds is:

   level = Pb*(98/Maxp) + 1

At the maximum threshold the weight sent is 99 and at the minimum
threshold the weight sent is 1. For efficiency, 98/Maxp could be
computed at RED initialization.

4.1.2 The Router functionality

If the incoming message causes the average queue size to go above the
maximum threshold then:
   drop the packet,
   if the ECT bit is set in the IP header then:
      send an ISQ back to the source with a weight of 100

If the incoming message causes the average queue size to go between
the minimum and maximum thresholds then:

   if the RED probability picks this packet then:
      if the ECT bit is set and the CE bit is not already marked then:
         mark the packet and send an ISQ of integer level
         1+(Pb*98/Maxp) back to the source
      else if the ECT bit is not set in the IP header then:
         drop the packet.

4.2 Role of the End System

The end system can now react to a range of congestion level
notifications.

We show here a simple algorithm that could be incrementally improved.
We react to each ISQ received, under the assumption that the effect of
burstiness and spuriousness is accounted for by the RED algorithm at
the router. Since a weight of 100 indicates that the packet was
dropped, we use this information to improve on the TCP retransmission
timeout (RTO) by retransmitting that packet. Note that the packet
sequence number can be deduced from the 8 bytes of the TCP header
passed back in the ISQ message (ISQs always carry 8 bytes of the
original datagram's data in addition to its IP header). The slow start,
congestion avoidance and Fast retransmit/recovery mechanics are
maintained.

4.2.1 The Source end system functionality

If an ISQ message is received then the sender knows that there is
network congestion.
The flow causing the congestion is identified from the ICMP data and
the congestion level is extracted.

   If the congestion level == 100 then:
      extract the TCP sequence number from the ISQ.
      retransmit the packet.
      cut the congestion window and threshold value by 1/2.

   else (we are between max and min threshold at the router) then:
      if congestion level >= 50 then:
         cut the congestion window and threshold value by 1/2.
      else (anything below 50%) then:
         decrement the congestion window linearly by 1.

Note: a) The usual rules about the lower bounds of the threshold and
congestion window values apply when decrementing.

b) The MECN method outlined above will have interactions with the
existing congestion control mechanisms in TCP. The overall effect still
slows down the system throughput if the congestion levels warrant it.

5.0 Multiple congested routers

Multiple congested routers on the path between the sender and the
receiver have their concerns addressed one at a time in a domino
effect. If any of the downstream routers is congested to the extent of
a packet drop then that router's congestion concerns are addressed
immediately. If a packet is marked by a congested router, no further
ISQ message is generated for it on its way to the destination. The
exception to the rule is when, along the path after the marking, some
other intermediate router decides to drop this packet. In that case it
will transmit an ISQ of level 100, to which the end system must react
immediately. Therefore any router which is congested to the level of
dropping packets will participate in the congestion control. Routers
which are closer to the source will be favored in the sense that their
incipient congestion levels will be reacted to first.
If the flow is long enough, the router closest to the source will have
its congestion concerns serviced first, with the next downstream router
serviced next and so forth, the router closest to the destination being
the last one responded to. The bias is more evident when a router
further downstream (other than the one that marked the packet) would
have sent a higher notification level had it had the opportunity,
i.e., had the packet not been marked and given a lesser weight at a
previous router. We feel that this bias is not of great significance
given that any downstream router dropping a packet will contribute to
the congestion reaction at the source.

6.0 Security issues

ISQ messages can be spoofed. This can be used for a Denial of Service
attack on a source end system. Building in authentication is probably
too heavyweight. This is a problem faced by IP in general and so we
have not attempted to address it.

7.0 References

[draft-kksjf] Ramakrishnan, K.K. and Floyd, S. A proposal to add
Explicit Congestion Notification (ECN) to IPv6 and to TCP, IETF Draft
draft-kksjf-ECN-00.txt, November 1997.

[Floyd94] Floyd, S. TCP and Explicit Congestion Notification, ACM
Computer Communications Review, V.24N, October 1994.

[red-paper] Floyd, S. and Jacobson, V. Random Early Detection Gateways
for Congestion Avoidance, IEEE/ACM Transactions on Networking, Aug
1993.

[kcho-97] Cho, K.J. ALTQ/RED Performance,
http://www.csl.csl.sony.co.jp/person/kjc/red/perf.html

[RFC 792] Postel, J. Internet Control Message Protocol (Sep 1981).

[RFC1122] Braden, R. (Editor) Requirements for Internet Hosts --
Communication Layers (Oct 1989).

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S.,
Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C.,
Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and Zhang,
L.
Recommendations on Queue Management and Congestion Avoidance in the
Internet (April 1998).

[RFC 1812] Baker, F. Requirements for IPv4 routers (June 1995).

8.0 Acknowledgements

The authors are much indebted to Alan Chapman. Without his insight and
multiple edits the ideas embedded in here would have been much more
difficult to present.

9.0 Authors' Addresses

Jamal Hadi Salim,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada

Phone: 613-763-6395
Email: hadi@nortel.com

Biswajit Nandy,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada

Phone: 613-765-3709
Email: bnandy@nortel.com

Nabil Seddigh,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada

Phone: 613-763-6396
Email: nseddigh@nortel.com