2 Transport Area Working Group B. Briscoe 3 Internet-Draft BT & UCL 4 Intended status: Informational February 24, 2008 5 Expires: August 27, 2008 7 Byte and Packet Congestion Notification 8 draft-briscoe-tsvwg-byte-pkt-mark-02 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups.
Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on August 27, 2008. 35 Copyright Notice 37 Copyright (C) The IETF Trust (2008). 39 Abstract 41 This memo concerns dropping or marking packets using active queue 42 management (AQM) such as random early detection (RED) or pre- 43 congestion notification (PCN). The primary conclusion is that packet 44 size should be taken into account when transports decode congestion 45 indications, not when network equipment writes them. Reducing drop 46 of small packets has some tempting advantages: i) it drops fewer 47 control packets, which tend to be small, and ii) it makes TCP's bit- 48 rate less dependent on packet size. However, there are ways of 49 addressing these issues at the transport layer, rather than reverse 50 engineering network forwarding to fix specific transport problems. 51 Network layer algorithms like the byte-mode packet drop variant of 52 RED should not be used to drop fewer small packets, because that 53 creates a perverse incentive for transports to use tiny segments, 54 consequently also opening up a DoS vulnerability. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 59 2. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 9 60 2.1. Scaling Congestion Control with Packet Size . . . . . . . 9 61 2.2. Avoiding Perverse Incentives to (ab)use Smaller Packets . 10 62 2.3. Small != Control . . . . . . . . . . .
. . . . . . . . 11 63 3. Working Definition of Congestion Notification . . . . . . . . 12 64 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 12 65 4.1. Congestion Measurement by Queue Length . . . . . . . . . . 12 66 4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 13 67 4.2. Congestion Measurement without a Queue . . . . . . . . . . 14 68 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 14 69 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 16 70 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 17 71 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 17 72 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 17 73 6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 19 74 6.2.3. Making Transports Robust against Control Packet 75 Losses . . . . . . . . . . . . . . . . . . . . . . . . 20 76 6.2.4. Congestion Coding: Summary of Status . . . . . . . . . 21 77 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 23 78 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 23 79 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 24 80 8. Security Considerations . . . . . . . . . . . . . . . . . . . 25 81 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26 82 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 83 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 27 84 Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . 85 Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 28 86 A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 28 87 A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 28 88 A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 29 89 A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 30 90 A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 31 91 Appendix B. 
Congestion Notification Definition: Further 92 Justification . . . . . . . . . . . . . . . . . . . . 31 93 Appendix C. Byte-mode Drop Complicates Policing Congestion 94 Response . . . . . . . . . . . . . . . . . . . . . . 32 95 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 33 96 12.1. Normative References . . . . . . . . . . . . . . . . . . . 33 97 12.2. Informative References . . . . . . . . . . . . . . . . . . 33 98 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 36 99 Intellectual Property and Copyright Statements . . . . . . . . . . 37 101 Changes from Previous Versions 103 To be removed by the RFC Editor on publication. 105 Full incremental diffs between each version are available at 106 107 (courtesy of the rfcdiff tool): 109 From -01 to -02 (this version): 111 Abstract reorganised to align with clearer separation of issue 112 in the memo. 114 Introduction reorganised with motivating arguments removed to 115 new Section 2. 117 Clarified avoiding lock-out of large packets is not the main or 118 only motivation for RED. 120 Mentioned choice of drop or marking explicitly throughout, 121 rather than trying to coin a word to mean either. 123 Generalised the discussion throughout to any packet forwarding 124 function on any network equipment, not just routers. 126 Clarified the last point about why this is a good time to sort 127 out this issue: because it will be hard / impossible to design 128 new transports unless we decide whether the network or the 129 transport is allowing for packet size. 131 Added statement explaining the horizon of the memo is long 132 term, but with short term expediency in mind. 134 Added material on scaling congestion control with packet size 135 (Section 2.1). 137 Separated out issue of normalising TCP's bit rate from issue of 138 preference to control packets (Section 2.3). 
140 Divided up Congestion Measurement section for clarity, 141 including new material on fixed size packet buffers and buffer 142 carving (Section 4.1.1 & Section 6.2.1) and on congestion 143 measurement in wireless link technologies without queues 144 (Section 4.2). 146 Added section on 'Making Transports Robust against Control 147 Packet Losses' (Section 6.2.3) with existing & new material 148 included. 150 Added tabulated results of vendor survey on byte-mode drop 151 variant of RED (Table 2). 153 From -00 to -01: 155 Clarified applicability to drop as well as ECN. 157 Highlighted DoS vulnerability. 159 Emphasised that drop-tail suffers from similar problems to 160 byte-mode drop, so only byte-mode drop should be turned off, 161 not RED itself. 163 Clarified the original apparent motivations for recommending 164 byte-mode drop included protecting SYNs and pure ACKs more than 165 equalising the bit rates of TCPs with different segment sizes. 166 Removed some conjectured motivations. 168 Added support for updates to TCP in progress (ackcc & ecn-syn- 169 ack). 171 Updated survey results with newly arrived data. 173 Pulled all recommendations together into the conclusions. 175 Moved some detailed points into two additional appendices and a 176 note. 178 Considerable clarifications throughout. 180 Updated references 182 Requirements notation 184 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 185 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 186 document are to be interpreted as described in [RFC2119]. 188 1. Introduction 190 When notifying congestion, the problem of how (and whether) to take 191 packet sizes into account has exercised the minds of researchers and 192 practitioners for as long as active queue management (AQM) has been 193 discussed. Indeed, one reason AQM was originally introduced was to 194 reduce the lock-out effects that small packets can have on large 195 packets in drop-tail queues. 
This memo aims to state the principles 196 we should be using and to come to conclusions on what these 197 principles will mean for future protocol design, taking into account 198 the deployments we have already. 200 Note that the byte vs. packet dilemma concerns congestion 201 notification irrespective of whether it is signalled implicitly by 202 drop or using explicit congestion notification (ECN [RFC3168] or PCN 203 [I-D.ietf-pcn-architecture]). Throughout this document, unless clear 204 from the context, the term marking will be used to mean notifying 205 congestion explicitly, while congestion notification will be used to 206 mean notifying congestion either implicitly by drop or explicitly by 207 marking. 209 If the load on a resource depends on the rate at which packets 210 arrive, it is called packet-congestible. If the load depends on the 211 rate at which bits arrive it is called bit-congestible. 213 Examples of packet-congestible resources are route look-up engines 214 and firewalls, because load depends on how many packet headers they 215 have to process. Examples of bit-congestible resources are 216 transmission links, and most buffer memory, because the load depends 217 on how many bits they have to transmit or store. Some machine 218 architectures use fixed size packet buffers, so buffer memory in 219 these cases is packet-congestible (see Section 4.1.1). 221 Note that information is generally processed or transmitted with a 222 minimum granularity greater than a bit (e.g. octets). The 223 appropriate granularity for the resource in question SHOULD be used, 224 but for the sake of brevity we will talk in terms of bytes in this 225 memo. 227 Resources may be congestible at higher levels of granularity than 228 packets, for instance stateful firewalls are flow-congestible and 229 call-servers are session-congestible. 
This memo focuses on 230 congestion of connectionless resources, but the same principles may 231 be applied for congestion notification protocols controlling per-flow 232 and per-session processing or state. 234 The byte vs. packet dilemma arises at three stages in the congestion 235 notification process: 237 Measuring congestion: When the congested resource decides locally how 238 to measure how congested it is. (Should the queue be measured in 239 bytes or packets?); 241 Coding congestion notification into the wire protocol: When the 242 congested resource decides how to notify the level of congestion. 243 (Should the level of notification depend on the byte-size of each 244 particular packet carrying the notification?); 246 Decoding congestion notification from the wire protocol: When the 247 transport interprets the notification. (Should the byte-size of a 248 missing or marked packet be taken into account?). 250 In RED, whether to use packets or bytes when measuring queues is 251 called packet-mode or byte-mode queue measurement. This choice is 252 now fairly well understood but is included in Section 4 to document 253 it in the RFC series. 255 The controversy is mainly around the other two stages: whether to 256 allow for packet size when the network codes or when the transport 257 decodes congestion notification. In RED, the variant that reduces 258 drop probability for packets based on their size in bytes is called 259 byte-mode drop, while the variant that doesn't is called packet-mode 260 drop. Whether queues are measured in bytes or packets is an 261 orthogonal choice, termed byte-mode queue measurement or packet-mode 262 queue measurement. 264 Currently, the RFC series is silent on this matter other than a paper 265 trail of advice referenced from [RFC2309], which conditionally 266 recommends byte-mode (packet-size dependent) drop [pktByteEmail]. 267 However, none of the implementers who responded to our survey has 268 followed this advice.
The primary purpose of this memo is to build a 269 definitive consensus against deliberate preferential treatment for 270 small packets in AQM algorithms and to record this advice within the 271 RFC series. 273 Now is a good time to discuss whether fairness between different 274 sized packets would best be implemented in the network layer, or at 275 the transport, for a number of reasons: 277 1. The packet vs. byte issue requires speedy resolution because the 278 IETF pre-congestion notification (PCN) working group has been 279 chartered to produce a standards track specification of its 280 congestion notification (AQM) algorithm [PCNcharter]; 282 2. [RFC2309] says RED may either take account of packet size or not 283 when dropping, but gives no recommendation between the two, 284 referring instead to advice on the performance implications in an 285 email [pktByteEmail], which recommends byte-mode drop. Further, 286 just before RFC2309 was issued, an addendum was added to the 287 archived email that revisited the issue of packet vs. byte-mode 288 drop in its last para, making the recommendation less clear-cut; 290 3. Without the present memo, the only advice in the RFC series on 291 packet size bias in AQM algorithms would be a reference to an 292 archived email in [RFC2309] (including an addendum at the end of 293 the email to correct the original). 295 4. The IRTF Internet Congestion Control Research Group (ICCRG) 296 recently took on the challenge of building consensus on what 297 common congestion control support should be required from network 298 forwarding functions in future 299 [I-D.irtf-iccrg-welzl-congestion-control-open-research]. The 300 wider Internet community needs to discuss whether the complexity 301 of adjusting for packet size should be in the network or in 302 transports; 304 5. 
Given there are many good reasons why larger path max 305 transmission units (PMTUs) would help solve a number of scaling 306 issues, we don't want to create any bias against large packets 307 that is greater than their true cost; 309 6. The IETF has started to consider the question of fairness between 310 flows that use different packet sizes (e.g. in the small-packet 311 variant of TCP-friendly rate control, TFRC-SP [RFC4828]). Given 312 transports with different packet sizes, if we don't decide 313 whether the network or the transport should allow for packet 314 size, it will be hard if not impossible to design any transport 315 protocol so that its bit-rate relative to other transports meets 316 design guidelines [RFC5033] (Note however that, if the concern 317 were fairness between users, rather than between flows 318 [Rate_fair_Dis], relative rates between flows would have to come 319 under run-time control rather than being embedded in protocol 320 designs). 322 This memo is initially concerned with how we should correctly scale 323 congestion control functions with packet size for the long term. But 324 it also recognises that expediency may be necessary to deal with 325 existing widely deployed protocols that don't live up to the long 326 term goal. It turns out that the 'correct' variant of RED to deploy 327 seems to be the one everyone has deployed, and no-one who responded 328 to our survey has implemented the other variant. However, at the 329 transport layer, TCP congestion control is a widely deployed protocol 330 that we argue doesn't scale correctly with packet size. To date this 331 hasn't been a significant problem because most TCPs have been used 332 with similar packet sizes. But, as we design new congestion 333 controls, we should build in scaling with packet size rather than 334 assuming we should follow TCP's example. 336 Motivating arguments for our advice are given next in Section 2. 
337 Then the body of the memo starts from first principles, defining 338 congestion notification in Section 3, then determining the correct way 339 to measure congestion (Section 4) and to design an idealised 340 congestion notification protocol (Section 5). It then surveys the 341 advice given previously in the RFC series, the research literature 342 and the deployed legacy (Section 6) before listing outstanding issues 343 (Section 7) that will need resolution both to achieve the ideal 344 protocol and to handle legacy. After discussing security 345 considerations (Section 8), strong recommendations for the way forward 346 are given in the conclusions (Section 9). 348 2. Motivating Arguments 350 2.1. Scaling Congestion Control with Packet Size 352 There are two ways of interpreting a dropped or marked packet. It 353 can either be considered as a single loss event or as loss/marking of 354 the bytes in the packet. Here we try to design a test to see which 355 approach scales with packet size. 357 Imagine a bit-congestible link shared by many flows, so that each 358 busy period tends to cause packets to be lost from different flows. 359 The test compares two identical scenarios with the same applications, 360 the same numbers of sources and the same load. But the sources break 361 the load into large packets in one scenario and small packets in the 362 other. Of course, because the load is the same, there will be 363 proportionately more packets in the small packet case. 365 The test of whether a congestion control scales with packet size is 366 that it should respond in the same way to the same congestion 367 excursion, irrespective of the size of the packets that the bytes 368 causing congestion happen to be broken down into. 370 A bit-congestible queue suffering a congestion excursion has to drop 371 or mark the same excess bytes whether they are in a few large packets 372 or many small packets.
So for the same congestion excursion, the 373 same number of bytes has to be shed to get the load back to its 374 operating point. But, of course, for smaller packets more packets 375 will have to be discarded to shed the same bytes. 377 If all the transports interpret each drop/mark as a single loss event 378 irrespective of the size of the packet dropped, they will respond 379 more strongly in the small packet scenario, which suffers more drops for the same congestion excursion, failing our test. On the 380 other hand, if they respond proportionately less when smaller packets 381 are dropped/marked, overall they will be able to respond the same to 382 the same congestion excursion. 384 Therefore, for a congestion control to scale with packet size, it 385 should respond to dropped or marked bytes (as TFRC-SP [RFC4828] 386 effectively does), not just to dropped or marked packets irrespective 387 of packet size (as TCP does). 389 The email [pktByteEmail] referred to by RFC2309 says the question of 390 whether a packet's own size should affect its drop probability 391 "depends on the dominant end-to-end congestion control mechanisms". 392 But we argue the network layer should not be optimised for whatever 393 transport is predominant. 395 TCP congestion control ensures that flows competing for the same 396 resource each maintain the same number of segments in flight, 397 irrespective of segment size. So under similar conditions, flows 398 with different segment sizes will get different bit rates. But even 399 though reducing the drop probability of small packets helps ensure 400 TCPs with different packet sizes will achieve similar bit rates, we 401 argue this should be achieved in TCP itself, not in the network. 403 Effectively, favouring small packets is reverse engineering of the 404 network layer around TCP, contrary to the excellent advice in 405 [RFC3426], which asks designers to question "Why are you proposing a 406 solution at this layer of the protocol stack, rather than at another 407 layer?" 409 2.2.
Avoiding Perverse Incentives to (ab)use Smaller Packets 411 Increasingly, it is being recognised that a protocol design must take 412 care not to cause unintended consequences by giving the parties in 413 the protocol exchange perverse incentives [Evol_cc][RFC3426]. Again, 414 imagine a scenario where the same bit rate of packets will contribute 415 the same to congestion of a link irrespective of whether it is sent 416 as fewer larger packets or more smaller packets. A protocol design 417 that caused larger packets to be more likely to be dropped than 418 smaller ones would be dangerous in this case: 420 Malicious transports: A queue that gives an advantage to small 421 packets can be used to amplify the force of a flooding attack. By 422 sending a flood of small packets, the attacker can get the queue 423 to discard more traffic in large packets, allowing more attack 424 traffic to get through to cause further damage. Such a queue 425 allows attack traffic to have a disproportionately large effect on 426 regular traffic without the attacker having to do much work. The 427 byte-mode drop variant of RED amplifies small packet attacks. 429 Drop-tail queues amplify small packet attacks even more than RED 430 byte-mode drop (see the Security Considerations in 431 Section 8). Wherever possible neither should be used. 433 Normal transports: Even if a transport is not malicious, if it finds 434 small packets go faster, it will tend to act in its own interest 435 and use them. Queues that give an advantage to small packets create 436 an evolutionary pressure for transports to send at the same bit- 437 rate but break their data stream down into tiny segments to reduce 438 their drop rate. Encouraging a high volume of tiny packets might 439 in turn unnecessarily overload a completely unrelated part of the 440 system, perhaps more limited by header-processing than bandwidth.
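The size of this advantage can be made concrete with a short sketch. This is illustrative only: the helper names are hypothetical, and drop probability is simply assumed to scale linearly with packet size, as in RED's byte-mode drop.

```python
# Hypothetical byte-mode drop queue: drop probability is assumed to be
# proportional to packet size, normalised so an MTU-sized packet sees
# 25% drop at this level of congestion.
MTU = 1500
P_MTU = 0.25  # assumed drop probability for a 1500B packet

def drop_prob(pkt_size):
    """Byte-mode drop: probability scales with the packet's size in bytes."""
    return P_MTU * pkt_size / MTU

def goodput(bit_rate, pkt_size):
    """Expected rate surviving the queue for a flow of equal-sized packets."""
    return bit_rate * (1 - drop_prob(pkt_size))

# Two flows, both arriving at 1 Mbps:
print(goodput(1_000_000, 1500))  # large packets: 750000.0 b/s
print(goodput(1_000_000, 60))    # 25x smaller packets: ~990000 b/s
```

Under these assumptions a sender cuts its drop rate 25-fold merely by segmenting the same byte stream into 25x smaller packets, which is exactly the evolutionary pressure described above.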
442 Imagine two flows arrive at a bit-congestible transmission link each 443 with the same bit rate, say 1Mbps, but one consists of 1500B packets and the 444 other of 60B packets, which are 25x smaller. Consider a scenario where 445 gentle RED [gentle_RED] is used, along with the variant of RED we 446 advise against, i.e. where the RED algorithm is configured to adjust 447 the drop probability of packets in proportion to each packet's size 448 (byte-mode packet drop). In this case, if RED drops 25% of the 449 larger packets, it will aim to drop 1% of the smaller packets (but in 450 practice it may drop more as congestion increases 451 [RFC4828](S.B.4)[Note_Variation]). Even though both flows arrive 452 with the same bit rate, the bit rate the RED queue aims to pass to 453 the line will be 750kbps for the flow of larger packets but 990kbps for the 454 flow of smaller packets (but because of rate variation it will be less than 455 this target). It can be seen that this behaviour reopens the same 456 denial of service vulnerability that drop-tail queues offer to floods 457 of small packets, though not necessarily as strongly (see Section 8). 459 2.3. Small != Control 461 It is tempting to drop small packets with lower probability to 462 improve performance, because many control packets are small (TCP SYNs 463 & ACKs, DNS queries & responses, SIP messages, HTTP GETs, etc.) and 464 dropping fewer control packets considerably improves performance. 465 However, we must not give control packets preference purely by virtue 466 of their smallness, otherwise it is too easy for any data source to 467 get the same preferential treatment simply by sending data in smaller 468 packets. Again we should not create perverse incentives to favour 469 small packets rather than control packets, which is what we actually 470 intend to favour. 472 Just because many control packets are small does not mean all small 473 packets are control packets.
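The scaling test of Section 2.1 can also be checked mechanically. In this hedged sketch (function names are illustrative, not taken from this memo), a transport that counts loss events responds 25x more strongly in the small packet scenario, while one that counts lost bytes responds identically:

```python
# Illustrative check of the Section 2.1 scaling test: a bit-congestible
# queue sheds the same excess bytes either as a few large-packet drops
# or as many small-packet drops.
EXCESS_BYTES = 6000  # bytes the queue must shed in one congestion excursion

def drops(pkt_size):
    """Packets that must be dropped to shed the same excess bytes."""
    return EXCESS_BYTES // pkt_size

def response_per_event(pkt_size):
    """TCP-like decoding: every drop counts as one loss event."""
    return drops(pkt_size)

def response_per_byte(pkt_size):
    """TFRC-SP-like decoding: each drop is weighted by the bytes lost."""
    return drops(pkt_size) * pkt_size

print(response_per_event(1500), response_per_event(60))  # 4 vs 100 loss events
print(response_per_byte(1500), response_per_byte(60))    # 6000 vs 6000 bytes
```

Only the per-byte decoding passes the test: its response depends on the congested bytes alone, not on how those bytes happen to be packetised.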
475 So again, rather than fix these problems in the network layer, we 476 argue that the transport should be made more robust against losses of 477 control packets (see 'Making Transports Robust against Control Packet 478 Losses' in Section 6.2.3). 480 3. Working Definition of Congestion Notification 482 Rather than attempting what many others have tried and failed to do, this memo 483 will not try to define congestion. It will give a working definition 484 of what congestion notification should be taken to mean for this 485 document. Congestion notification is a changing signal that aims to 486 communicate the ratio E/L, where E is the instantaneous excess load 487 offered to a resource that it cannot (or would not) serve and L is 488 the instantaneous offered load. 490 The phrase `would not serve' is added, because AQM systems (e.g. 491 RED, PCN [I-D.ietf-pcn-architecture]) use a virtual capacity smaller 492 than actual capacity, then notify congestion of this virtual capacity 493 in order to avoid congestion of the actual capacity. 495 Note that the denominator is offered load, not capacity. Therefore 496 congestion notification is a real number bounded by the range [0,1]. 497 This ties in with the best-understood form of congestion 498 notification: drop rate. It also means that congestion has a natural 499 interpretation as a probability: the probability of offered traffic 500 not being served (or being marked as at risk of not being served). 501 Appendix B describes a further incidental benefit that arises from 502 using load as the denominator of congestion notification. 504 4. Congestion Measurement 506 4.1. Congestion Measurement by Queue Length 508 Queue length is usually the most correct and simplest way to measure 509 congestion of a resource. To avoid the pathological effects of drop 510 tail, an AQM function can then be used to transform queue length into 511 the probability of dropping or marking a packet (e.g.
RED's 512 piecewise linear function between thresholds). If the resource is 513 bit-congestible, the length of the queue SHOULD be measured in bytes. 514 If the resource is packet-congestible, the length of the queue SHOULD 515 be measured in packets. No other choice makes sense, because the 516 number of packets waiting in the queue isn't relevant if the resource 517 gets congested by bytes and vice versa. We discuss the implications 518 on RED's byte mode and packet mode for measuring queue length in 519 Section 6. 521 4.1.1. Fixed Size Packet Buffers 523 Some, mostly older, queuing hardware sets aside fixed sized buffers 524 in which to store each packet in the queue. Also, with some 525 hardware, any fixed sized buffers not completely filled by a packet 526 are padded when transmitted to the wire. If we imagine a theoretical 527 forwarding system with both queuing and transmission in fixed, MTU- 528 sized units, it should clearly be treated as packet-congestible, 529 because the queue length in packets would be a good model of 530 congestion of the lower layer link. 532 If we now imagine a hybrid forwarding system with transmission delay 533 largely dependent on the byte-size of packets but buffers of one MTU 534 per packet, it should strictly require a more complex algorithm to 535 determine the probability of congestion. It should be treated as two 536 resources in sequence, where the sum of the byte-sizes of the packets 537 within each packet buffer models congestion of the line while the 538 length of the queue in packets models congestion of the queue. Then 539 the probability of congesting the forwarding buffer would be a 540 conditional probability--conditional on the previously calculated 541 probability of congesting the line. 543 However, in systems that use fixed size buffers, it is unusual for 544 all the buffers used by an interface to be the same size. 
Typically 545 pools of different sized buffers are provided (Cisco uses the term 546 'buffer carving' for the process of dividing up memory into these 547 pools [IOSArch]). Usually, if the pool of small buffers is 548 exhausted, arriving small packets can borrow space in the pool of 549 large buffers, but not vice versa. However, it is easier to work out 550 what should be done if we temporarily set aside the possibility of 551 such borrowing. Then, with fixed pools of buffers for different 552 sized packets and no borrowing, the size of each pool and the current 553 queue length in each pool would both be measured in packets. So an 554 AQM algorithm would have to maintain the queue length for each pool, 555 and judge whether to drop/mark a packet of a particular size by 556 looking at the pool for packets of that size and using the length (in 557 packets) of its queue. 559 We now return to the issue we temporarily set aside: small packets 560 borrowing space in larger buffers. In this case, the only difference 561 is that the pools for smaller packets have a maximum queue size that 562 includes all the pools for larger packets. And every time a packet 563 takes a larger buffer, the current queue size has to be incremented 564 for all queues in the pools of buffers less than or equal to the 565 buffer size used. 567 We will return to borrowing of fixed sized buffers when we discuss 568 biasing the drop/marking probability of a specific packet because of 569 its size in Section 6.2.1. But here we can give a simple summary of 570 the present discussion on how to measure the length of queues of 571 fixed buffers: no matter how complicated the scheme is, ultimately 572 any fixed buffer system will need to measure its queue length in 573 packets not bytes. 575 4.2. 
Congestion Measurement without a Queue 577 AQM algorithms are nearly always described assuming there is a queue 578 for a congested resource and the algorithm can use the queue length 579 to determine the probability that it will drop or mark each packet. 580 But not all congested resources lead to queues. For instance, 581 wireless spectrum is bit-congestible (for a given coding scheme), 582 because interference increases with the rate at which bits are 583 transmitted. But wireless link protocols do not always maintain a 584 queue that depends on spectrum interference. Similarly, power- 585 limited resources are also usually bit-congestible if energy is 586 primarily required for transmission rather than header processing, 587 but it is rare for a link protocol to build a queue as it approaches 588 maximum power. 590 However, AQM algorithms don't require a queue in order to work. For 591 instance, spectrum congestion can be modelled by signal quality using the 592 target bit-energy-to-noise-density ratio. And, to model radio power 593 exhaustion, transmission power levels can be measured and compared to 594 the maximum power available. [ECNFixedWireless] proposes a practical 595 and theoretically sound way to combine congestion notification for 596 different bit-congestible resources at different layers along an end- 597 to-end path, whether wireless or wired, and whether with or without 598 queues. 600 5. Idealised Wire Protocol Coding 602 We will start by inventing an idealised congestion notification 603 protocol before discussing how to make it practical. The idealised 604 protocol is shown to be correct using examples in Appendix A. 605 Congestion notification involves the congested resource coding a 606 congestion notification signal into the packet stream and the 607 transports decoding it. The idealised protocol uses two different 608 fields in each datagram to signal congestion: one for byte congestion 609 and one for packet congestion.
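As a purely conceptual sketch of this two-field signalling (mirroring the marking and decoding rules given in the bullets below; the field and function names here are our own invention, not a wire format):

```python
import random

def mark(packet, p_b, p_p):
    """A resource codes congestion into a packet, irrespective of its size.

    p_b: byte-congestion level of a bit-congestible resource.
    p_p: packet-congestion level of a packet-congestible resource.
    """
    if random.random() < p_b:           # bit-congestible resource
        packet['byte_congestion'] = True
    if random.random() < p_p:           # packet-congestible resource
        packet['pkt_congestion'] = True

def decode(packets):
    """The transport decodes: a byte-congestion mark counts once per byte
    of the packet; a packet-congestion mark counts once per packet."""
    marked_bytes = sum(p['size'] for p in packets if p.get('byte_congestion'))
    marked_pkts = sum(1 for p in packets if p.get('pkt_congestion'))
    return marked_bytes, marked_pkts
```

Note the transport's count of marked bytes is the "repeated add" referred to below--no multiplication by packet size is needed at either end.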
611 We are not saying two ECN fields will be needed (and we are not 612 saying that somehow a resource should be able to drop a packet in one 613 of two different ways so that the transport can distinguish which 614 sort of drop it was!). These two congestion notification channels 615 are just a conceptual device. They allow us to defer having to 616 decide whether to distinguish between byte and packet congestion when 617 the network resource codes the signal or when the transport decodes 618 it. 620 However, although this idealised mechanism isn't intended for 621 implementation, we do want to emphasise that we may need to find a 622 way to implement it, because it could become necessary to somehow 623 distinguish between bit and packet congestion [RFC3714]. Currently a 624 design goal of network processing equipment such as routers and 625 firewalls is to keep packet processing uncongested even under worst 626 case bit rates with minimum packet sizes. Therefore, packet- 627 congestion is currently rare, but there is no guarantee that it will 628 not become common with future technology trends. 630 The idealised wire protocol is given below. It accounts for packet 631 sizes at the transport layer, not in the network, and then only in 632 the case of bit-congestible resources. This avoids the perverse 633 incentive to send smaller packets and the DoS vulnerability that 634 would otherwise result if the network were to bias towards them (see 635 the motivating argument about avoiding perverse incentives in 636 Section 2.2). 
Incidentally, it also ensures neither the network nor 637 the transport needs to do a multiply operation--multiplication by 638 packet size is effectively achieved as a repeated add when the 639 transport adds to its count of marked bytes as each congestion event 640 is fed to it: 642 o A packet-congestible resource trying to code congestion level p_p 643 into a packet stream should mark the idealised `packet congestion' 644 field in each packet with probability p_p irrespective of the 645 packet's size. The transport should then take a packet with the 646 packet congestion field marked to mean just one mark, irrespective 647 of the packet size. 649 o A bit-congestible resource trying to code time-varying byte- 650 congestion level p_b into a packet stream should mark the `byte 651 congestion' field in each packet with probability p_b, again 652 irrespective of the packet's size. Unlike before, the transport 653 should take a packet with the byte congestion field marked to 654 count as a mark on each byte in the packet. 656 The worked examples in Appendix A show that transports can extract 657 sufficient and correct congestion notification from these protocols 658 for cases when two flows with different packet sizes have matching 659 bit rates or matching packet rates. Examples are also given that mix 660 these two flows into one to show that a flow with mixed packet sizes 661 would still be able to extract sufficient and correct information. 663 Sufficient and correct congestion information means that there is 664 sufficient information for the two different types of transport 665 requirements: 667 Ratio-based: Established transport congestion controls like TCP's 668 [RFC2581] aim to achieve equal segment rates per RTT through the 669 same bottleneck--TCP friendliness [RFC3448]. They work with the 670 ratio of dropped to delivered segments (or marked to unmarked 671 segments in the case of ECN). 
The example scenarios show that 672 these ratio-based transports are effectively the same whether 673 counting in bytes or packets, because the units cancel out. 674 (Incidentally, this is why TCP's bit rate is still proportional to 675 packet size even when byte-counting is used, as recommended for 676 TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security 677 reasons.) 679 Absolute-target-based: Other congestion controls proposed in the 680 research community aim to limit the volume of congestion caused to 681 a constant weight parameter. [MulTCP][WindowPropFair] are 682 examples of weighted proportionally fair transports designed for 683 cost-fair environments [Rate_fair_Dis]. In this case, the 684 transport requires a count (not a ratio) of dropped/marked bytes 685 in the bit-congestible case and of dropped/marked packets in the 686 packet congestible case. 688 6. The State of the Art 690 The original 1993 paper on RED [RED93] proposed two options for the 691 RED active queue management algorithm: packet mode and byte mode. 692 Packet mode measured the queue length in packets and dropped (or 693 marked) individual packets with a probability independent of their 694 size. Byte mode measured the queue length in bytes and marked an 695 individual packet with probability in proportion to its size 696 (relative to the maximum packet size). In the paper's outline of 697 further work, the authors stated that they had made no recommendation on 698 whether the queue size should be measured in bytes or packets, but 699 noted that the difference could be significant. 701 When RED was recommended for general deployment in 1998 [RFC2309], 702 the two modes were mentioned, implying that the choice between them was a 703 question of performance, referring to a 1997 email [pktByteEmail] for 704 advice on tuning.
This email clarified that there were in fact two 705 orthogonal choices: whether to measure queue length in bytes or 706 packets (Section 6.1 below) and whether the drop probability of an 707 individual packet should depend on its own size (Section 6.2 below). 709 6.1. Congestion Measurement: Status 711 The choice of which metric to use to measure queue length was left 712 open in RFC2309. It is now well understood that queues for bit- 713 congestible resources should be measured in bytes, and queues for 714 packet-congestible resources should be measured in packets (see 715 Section 4). 717 Where buffers are not configured or legacy buffers cannot be 718 configured to the above guideline, we don't have to make allowances 719 for such legacy in future protocol design. If a bit-congestible 720 buffer is measured in packets, the operator will have set the 721 thresholds mindful of a typical mix of packet sizes. Any AQM 722 algorithm on such a buffer will be oversensitive to high proportions 723 of small packets, e.g. a DoS attack, and undersensitive to high 724 proportions of large packets. But an operator can safely keep such a 725 legacy buffer because any undersensitivity during unusual traffic 726 mixes cannot lead to congestion collapse given the buffer will 727 eventually revert to tail drop, discarding proportionately more large 728 packets. 730 Some modern queue implementations give a choice for setting RED's 731 thresholds in byte-mode or packet-mode. This may merely be an 732 administrator-interface preference, not altering how the queue itself 733 is measured; but on some hardware it does actually change the way the 734 queue is measured. Whether a resource is bit-congestible or packet- 735 congestible is a property of the resource, so an admin SHOULD NOT 736 ever need to, or be able to, configure the way a queue measures 737 itself. 739 We believe the question of whether to measure queues in bytes or 740 packets is fairly well understood these days.
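The guideline above reduces to a tiny sketch (our own illustration; the names are not from any real implementation):

```python
def queue_length(queue, bit_congestible):
    """Measure a queue in the units its congested resource is congested by.

    queue: list of packet sizes in bytes.
    bit_congestible: True for, e.g., a transmission line (measure in bytes);
    False for, e.g., a header-processing engine (measure in packets).
    """
    return sum(queue) if bit_congestible else len(queue)
```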
The only outstanding 741 issues concern how to measure congestion when the queue is bit 742 congestible but the resource is packet congestible or vice versa (see 743 Section 4). But there is no controversy over what should be done. 744 It is just that working out what should be done requires expertise in 745 probability and, even then, it is not always easy to find a 746 practical algorithm to implement it. 748 6.2. Congestion Coding: Status 750 6.2.1. Network Bias when Encoding 752 The previously mentioned email [pktByteEmail] referred to by 753 [RFC2309] said that the choice over whether a packet's own size 754 should affect its drop probability "depends on the dominant end-to- 755 end congestion control mechanisms". [Section 2 argues against this 756 approach, citing the excellent advice in RFC3246.] The referenced 757 email went on to argue that drop probability should depend on the 758 size of the packet being considered for drop if the resource is bit- 759 congestible, but not if it is packet-congestible, and advised that 760 most scarce resources in the Internet were currently bit-congestible. 761 The argument continued that if packet drops were inflated by packet 762 size (byte-mode dropping), "a flow's fraction of the packet drops is 763 then a good indication of that flow's fraction of the link bandwidth 764 in bits per second". This was consistent with a referenced policing 765 mechanism being worked on at the time for detecting unusually high 766 bandwidth flows, eventually published in 1999 [pBox]. [The problem 767 could have been solved by making the policing mechanism count the 768 volume of bytes randomly dropped, not the number of packets.] 770 A few months before RFC2309 was published, an addendum was added to 771 the above archived email referenced from the RFC, in which the final 772 paragraph seemed to partially retract what had previously been said.
773 It clarified that the question of whether the probability of 774 dropping/marking a packet should depend on its size was not related 775 to whether the resource itself was bit congestible, but a completely 776 orthogonal question. However, the only example given had the queue 777 measured in packets but packet drop depended on the byte-size of the 778 packet in question. No example was given the other way round. 780 In 2000, Cnodder et al [REDbyte] pointed out that there was an error 781 in the part of the original 1993 RED algorithm that aimed to 782 distribute drops uniformly, because it didn't correctly take into 783 account the adjustment for packet size. They recommended an 784 algorithm called RED_4 to fix this. But they also recommended a 785 further change, RED_5, to adjust drop rate dependent on the square of 786 relative packet size. This was indeed consistent with one stated 787 motivation behind RED's byte mode drop--that we should reverse 788 engineer the network to improve the performance of dominant end-to- 789 end congestion control mechanisms. 791 By 2003, a further change had been made to the adjustment for packet 792 size, this time in the RED algorithm of the ns2 simulator. Instead 793 of taking each packet's size relative to a `maximum packet size' it 794 was taken relative to a `mean packet size', intended to be a static 795 value representative of the `typical' packet size on the link. We 796 have not been able to find a justification for this change in the 797 literature; however, Eddy and Allman conducted experiments [REDbias] 798 that assessed how sensitive RED was to this parameter, amongst other 799 things. No-one seems to have pointed out that this changed algorithm 800 can often lead to drop probabilities of greater than 1 [which should 801 ring alarm bells hinting that there's a mistake in the theory 802 somewhere]. On 10-Nov-2004, this variant of byte-mode packet drop 803 was made the default in the ns2 simulator.
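The flaw just noted is easy to reproduce. A hedged sketch (parameter names ours) of an ns2-style adjustment relative to a static `mean packet size':

```python
def byte_mode_drop_prob(p, pkt_size, mean_pkt_size=1000):
    """Inflate the base drop probability p by packet size relative to a
    configured static 'mean' size, in the style of ns2's byte-mode drop."""
    return p * pkt_size / mean_pkt_size

# Any packet larger than the configured mean inflates p, so a 1500B
# packet against a 1000B mean turns a base p of 0.8 into a nominal
# drop probability of 1.2 -- greater than 1.
```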
805 The byte-mode drop variant of RED is, of course, not the only 806 possible bias towards small packets in queueing algorithms. We have 807 already mentioned that tail-drop queues naturally tend to lock out 808 large packets once they are full. But also queues with fixed sized 809 buffers reduce the probability that small packets will be dropped if 810 (and only if) they allow small packets to borrow buffers from the 811 pools for larger packets. As was explained in Section 4.1.1 on fixed 812 size buffer carving, borrowing effectively makes the maximum queue 813 size for small packets greater than that for large packets, because 814 more buffers can be used by small packets while fewer will fit large 815 packets. 817 However, in itself, the bias towards small packets caused by buffer 818 borrowing is perfectly correct. Lower drop probability for small 819 packets is legitimate in buffer borrowing schemes, because small 820 packets genuinely congest the machine's buffer memory less than large 821 packets, given they can fit in more spaces. The bias towards small 822 packets is not artificially added (as it is in RED's byte-mode drop 823 algorithm); it merely reflects the reality of the way fixed buffer 824 memory gets congested. Incidentally, the bias towards small packets 825 from buffer borrowing is nothing like as large as that of RED's byte- 826 mode drop. 828 Nonetheless, fixed-buffer memory with tail drop is still prone to 829 lock out large packets, purely because of the tail-drop aspect. So a 830 good AQM algorithm like RED with packet-mode drop should be used with 831 fixed buffer memories where possible. If RED is too complicated to 832 implement with multiple fixed buffer pools, the minimum necessary to 833 prevent large packet lock-out is to ensure smaller packets never use 834 the last available buffer in any of the pools for larger packets. 836 6.2.2.
Transport Bias when Decoding 838 The above proposals to alter the network layer to give a bias towards 839 smaller packets have largely carried on outside the IETF process 840 (unless one counts a reference in an informational RFC to an archived 841 email!), whereas, within the IETF, there are many different 842 proposals to alter transport protocols to achieve the same goals, 843 i.e. either to make the flow bit-rate take account of packet size, or 844 to protect control packets from loss. This memo argues that altering 845 transport protocols is the more principled approach. 847 One recently approved experimental RFC adapts its transport layer 848 protocol to take account of packet sizes relative to typical TCP 849 packet sizes: it proposes a new small-packet variant of TCP- 850 friendly rate control [RFC3448] called TFRC-SP [RFC4828]. 851 Essentially, it proposes a rate equation that inflates the flow rate 852 by the ratio of a typical TCP segment size (1500B including TCP 853 header) over the actual segment size [PktSizeEquCC]. (There are also 854 other important differences of detail relative to TFRC, such as using 855 virtual packets [CCvarPktSize] to avoid responding to multiple losses 856 per round trip and using a minimum inter-packet interval.) 858 Section 4.5.1 of this TFRC-SP spec discusses the implications of 859 operating in an environment where queues have been configured to drop 860 smaller packets with proportionately lower probability than larger 861 ones. But it only discusses TCP operating in such an environment, 862 only mentioning TFRC-SP briefly when discussing how to define 863 fairness with TCP. And it only discusses the byte-mode dropping 864 version of RED as it was before Cnodder et al pointed out it didn't 865 sufficiently bias towards small packets to make TCP independent of 866 packet size.
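The headline idea of TFRC-SP can be sketched as follows (a deliberate simplification of [RFC4828], which also uses virtual packets and a minimum inter-packet interval as noted above; names are ours):

```python
TYPICAL_SEGMENT = 1500  # bytes, including headers, per the description above

def tfrc_sp_rate(tfrc_rate_bps, segment_size_bytes):
    """Inflate the standard TFRC allowed sending rate by the ratio of a
    typical TCP segment size to the flow's actual segment size."""
    return tfrc_rate_bps * TYPICAL_SEGMENT / segment_size_bytes

# A flow of 500B segments is allowed 3x the bit rate that the standard
# TFRC equation would give it at the same loss level.
```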
868 So the TFRC-SP spec doesn't address the issue of which of the network 869 or the transport _should_ handle fairness between different packet 870 sizes. In its Appendix B.4 it discusses the possibility of both 871 TFRC-SP and some network buffers duplicating each other's attempts to 872 deliberately bias towards small packets. But the discussion is not 873 conclusive, instead reporting simulations of many of the 874 possibilities in order to assess performance but not recommending any 875 particular course of action. 877 The paper originally proposing TFRC with virtual packets (VP-TFRC) 878 [CCvarPktSize] proposed that there should perhaps be two variants to 879 cater for the different variants of RED. However, as the TFRC-SP 880 authors point out, there is no way for a transport to know whether 881 some queues on its path have deployed RED with byte-mode packet drop 882 (except if an exhaustive survey found that no-one has deployed it!-- 883 see Section 6.2.4). Incidentally, VP-TFRC also proposed that byte- 884 mode RED dropping should really square the packet size compensation 885 factor (like that of RED_5, but apparently unaware of it). 887 Pre-congestion notification [I-D.ietf-pcn-architecture] is a proposal 888 to use a virtual queue for AQM marking for packets within one 889 Diffserv class in order to give early warning prior to any real 890 queuing. The proposed PCN marking algorithms have been designed not 891 to take account of packet size when forwarding through queues. 892 Instead the general principle has been to take account of the sizes 893 of marked packets when monitoring the fraction of marking at the edge 894 of the network. 896 6.2.3. Making Transports Robust against Control Packet Losses 898 Recently, two drafts have proposed changes to TCP that make it more 899 robust against losing small control packets [I-D.ietf-tcpm-ecnsyn] 900 [I-D.floyd-tcpm-ackcc]. 
In both cases they note that the case for 901 these TCP changes would be weaker if RED were biased against dropping 902 small packets. We argue here that these two proposals are a safer 903 and more principled way to achieve TCP performance improvements than 904 reverse engineering RED to benefit TCP. 906 Although no proposals exist as far as we know, it would also be 907 possible and perfectly valid to make control packets robust against 908 drop by using their 909 Diffserv code point [RFC2474] to explicitly request a scheduling class 910 with a lower drop probability. 912 The re-ECN protocol proposal [Re-TCP] is designed so that transports 913 can be made more robust against losing control packets. It gives 914 queues an incentive to optionally give preference against drop to 915 packets with the 'feedback not established' codepoint in the proposed 916 'extended ECN' field. Senders have incentives to use this codepoint 917 sparingly, but they can use it on control packets to reduce their 918 chance of being dropped. For instance, the proposed modification to 919 TCP for re-ECN uses this codepoint on the SYN and SYN-ACK. 921 Although not brought to the IETF, a simple proposal from Wischik 922 [DupTCP] suggests that the first three packets of every TCP flow 923 should be routinely duplicated after a short delay. It shows that 924 this would greatly improve the chances of short flows completing 925 quickly, but it would hardly increase traffic levels on the Internet, 926 because Internet bytes have always been concentrated in the large 927 flows. It further shows that the performance of many typical 928 applications depends on completion of long serial chains of short 929 messages. It argues that, given most of the value people get from 930 the Internet is concentrated within short flows, this simple 931 expedient would greatly increase the value of the best-efforts 932 Internet at minimal cost. 934 6.2.4.
Congestion Coding: Summary of Status 936 +-----------+----------------+-----------------+--------------------+ 937 | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | 938 | cc | mode drop) | byte mode drop) | mode drop) | 939 +-----------+----------------+-----------------+--------------------+ 940 | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | 941 | TFRC | | | | 942 | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | 943 +-----------+----------------+-----------------+--------------------+ 945 Table 1: Dependence of flow bit-rate per RTT on packet size s and 946 drop rate p when network and/or transport bias towards small packets 947 to varying degrees 949 Table 1 aims to summarise the positions we may now be in. Each 950 column shows a different possible AQM behaviour in different queues 951 in the network, using the terminology of Cnodder et al outlined 952 earlier (RED_1 is basic RED with packet-mode drop). Each row shows a 953 different transport behaviour: TCP [RFC2581] and TFRC [RFC3448] on 954 the top row with TFRC-SP [RFC4828] below. Suppressing all 955 inessential details, the table shows that independence from packet 956 size should either be achievable by not altering the TCP transport in 957 a RED_5 network, or by using the small packet TFRC-SP transport in a 958 network without any byte-mode dropping RED (top right and bottom 959 left). Top left is the `do nothing' scenario, while bottom right is 960 the `do-both' scenario in which bit-rate would become far too biased 961 towards small packets. Of course, if any form of byte-mode dropping 962 RED has been deployed on a selection of congested queues, each path 963 will present a different hybrid scenario to its transport. 965 Regardless, we can see that the linear byte-mode drop column in the 966 middle considerably complicates the Internet. It's a half-way house 967 that doesn't bias enough towards small packets even if one believes 968 the network should be doing the biasing.
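The entries in Table 1 follow from substituting each AQM's size-adjusted drop probability into the TCP-style dependence bit-rate ~ s/sqrt(p). A small sketch (our own check, dropping all constant factors) confirms, for instance, that under RED_5 the packet-size terms cancel:

```python
from math import sqrt

def tcp_bitrate(s, p):
    """TCP/TFRC-style dependence: bit-rate per RTT ~ s/sqrt(p)."""
    return s / sqrt(p)

def effective_p(p, s, s_max, mode):
    """Drop probability seen at the queue under each variant of Cnodder et al."""
    if mode == 'RED_1':                 # packet-mode drop: size ignored
        return p
    if mode == 'RED_4':                 # linear byte-mode drop
        return p * s / s_max
    if mode == 'RED_5':                 # square byte-mode drop
        return p * (s / s_max) ** 2
    raise ValueError(mode)

# Under RED_5, two TCPs with different segment sizes get the same
# bit-rate (top-right cell: 1/sqrt(p), independent of s).
s_max, p = 1500, 0.01
r_small = tcp_bitrate(500, effective_p(p, 500, s_max, 'RED_5'))
r_large = tcp_bitrate(1500, effective_p(p, 1500, s_max, 'RED_5'))
```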
We argue below that _all_ 969 network layer bias towards small packets should be turned off--if 970 indeed any equipment vendors have implemented it--leaving packet size 971 bias solely as the preserve of the transport layer (solely the 972 leftmost, packet-mode drop column). 974 A survey has been conducted of 84 vendors to assess how widely drop 975 probability based on packet size has been implemented in RED. Prior 976 to the survey, an individual approach to Cisco received confirmation 977 that, having checked the code-base for each of the product ranges, 978 Cisco has not implemented any discrimination based on packet size in 979 any AQM algorithm in any of its products. Also an individual 980 approach to Alcatel-Lucent drew a confirmation that it was very 981 likely that none of their products contained RED code that 982 implemented any packet-size bias. 984 Turning to our more formal survey (Table 2), about 19% of those 985 surveyed have replied so far, giving a sample size of 16. Although 986 we do not have permission to identify the respondents, we can say 987 that those that have responded include most of the larger vendors, 988 covering a large fraction of the market. They range across the large 989 network equipment vendors at L3 & L2, firewall vendors, wireless 990 equipment vendors, as well as large software businesses with a small 991 selection of networking products. So far, all those who have 992 responded have confirmed that they have not implemented the variant 993 of RED with drop dependent on packet size (2 are fairly sure they 994 haven't but need to check more thoroughly). 996 +-------------------------------+----------------+-----------------+ 997 | Response | No. 
of vendors | %age of vendors | 998 +-------------------------------+----------------+-----------------+ 999 | Not implemented | 14 | 17% | 1000 | Not implemented (probably) | 2 | 2% | 1001 | Implemented | 0 | 0% | 1002 | No response | 68 | 81% | 1003 | Total companies/orgs surveyed | 84 | 100% | 1004 +-------------------------------+----------------+-----------------+ 1006 Table 2: Vendor Survey on byte-mode drop variant of RED (lower drop 1007 probability for small packets) 1009 Where reasons have been given, the most prevalent has been the extra 1010 complexity of packet-size bias code, though one vendor had a more principled 1011 reason for avoiding it--similar to the argument of this document. We 1012 have established that Linux does not implement RED with packet size 1013 drop bias, although we have not investigated a wider range of open 1014 source code. 1016 Finally, we repeat that RED's byte mode drop is not the only way to 1017 bias towards small packets--tail-drop tends to lock out large packets 1018 very effectively. Our survey was of vendor implementations, so we 1019 cannot be certain about operator deployment. But we believe many 1020 queues in the Internet are still tail-drop. The author's own company (BT) has 1021 widely deployed RED, but there are bound to be many tail-drop queues, 1022 particularly in access network equipment and on middleboxes like 1023 firewalls, where RED is not always available. Routers using a memory 1024 architecture based on fixed size buffers with borrowing may also 1025 still be prevalent in the Internet. As explained in Section 6.2.1, 1026 these also provide a marginal (but legitimate) bias towards small 1027 packets. So even though RED byte-mode drop is not prevalent, it is 1028 likely there is still some bias towards small packets in the Internet 1029 due to tail drop and fixed buffer borrowing. 1031 7. Outstanding Issues and Next Steps 1033 7.1.
Bit-congestible World 1035 For a connectionless network with only bit-congestible resources, we 1036 believe the recommended position is now unarguably clear--that the 1037 network should not make allowance for packet sizes and the transport 1038 should. This leaves two outstanding issues: 1040 o How to handle any legacy of AQM with byte-mode drop already 1041 deployed; 1043 o The need to start a programme to update transport congestion 1044 control protocol standards to take account of packet size. 1046 The sample of returns from our vendor survey (Section 6.2.4) suggests 1047 that byte-mode packet drop seems not to be implemented at all, let 1048 alone deployed; or if it is, deployment is likely to be very sparse. 1049 Therefore, we do not really need a migration strategy from all but 1050 nothing to nothing. 1052 A programme of standards updates to take account of packet size in 1053 transport congestion control protocols has started with TFRC-SP 1054 [RFC4828], while weighted TCPs implemented in the research community 1055 [WindowPropFair] could form the basis of a future change to TCP 1056 congestion control [RFC2581] itself. 1058 7.2. Bit- & Packet-congestible World 1060 Nonetheless, a connectionless network with both bit-congestible and 1061 packet-congestible resources is a different matter. If we believe we 1062 should allow for this possibility in the future, this space contains 1063 a truly open research issue. 1065 The idealised wire protocol coding described in Section 5 requires at 1066 least two flags for congestion of bit-congestible and packet- 1067 congestible resources. This hides a fundamental problem--much more 1068 fundamental than whether we can magically create header space for yet 1069 another ECN flag in IPv4, or whether it would work while being 1070 deployed incrementally. A congestion notification protocol must 1071 survive a transition from low levels of congestion to high.
Marking 1072 two states is feasible with explicit marking, but much harder if 1073 packets are dropped. Also, it will not always be cost-effective to 1074 implement AQM at every low level resource, so drop will often have to 1075 suffice. Distinguishing drop from delivery naturally provides just 1076 one congestion flag--it is hard to drop a packet in two ways that are 1077 distinguishable remotely. This is a similar problem to that of 1078 distinguishing wireless transmission losses from congestive losses. 1080 We should also note that, strictly, packet-congestible resources are 1081 actually cycle-congestible because load also depends on the 1082 complexity of each look-up and whether the pattern of arrivals is 1083 amenable to caching or not. Further, this reminds us that any 1084 solution must not require a forwarding engine to use excessive 1085 processor cycles in order to decide how to say it has no spare 1086 processor cycles. 1088 The problem of signalling packet processing congestion is not 1089 pressing, as most if not all Internet resources are designed to be 1090 bit-congestible before packet processing starts to congest. However, 1091 given the IRTF ICCRG has set itself the task of reaching consensus on 1092 generic forwarding mechanisms that are necessary and sufficient to 1093 support the Internet's future congestion control requirements 1094 [I-D.irtf-iccrg-welzl-congestion-control-open-research], we must not 1095 give this problem no thought at all, just because it is hard and 1096 currently hypothetical. 1098 8. Security Considerations 1100 This draft recommends that queues do not bias drop probability 1101 towards small packets as this creates a perverse incentive for 1102 transports to break down their flows into tiny segments. One of the 1103 benefits of implementing AQM was meant to be to remove this perverse 1104 incentive that drop-tail queues gave to small packets. 
Of course, if 1105 transports really want to make the greatest gains, they don't have to 1106 respond to congestion anyway. But we don't want applications that 1107 are trying to behave to discover that they can go faster by using 1108 smaller packets. 1110 In practice, transports cannot all be trusted to respond to 1111 congestion. So another reason for recommending that queues do not 1112 bias drop probability towards small packets is to avoid the 1113 vulnerability to small packet DDoS attacks that would otherwise 1114 result. One of the benefits of implementing AQM was meant to be to 1115 remove drop-tail's DoS vulnerability to small packets, so we 1116 shouldn't add it back again. 1118 If most queues implemented AQM with byte-mode drop, the resulting 1119 network would amplify the potency of a small packet DDoS attack. At 1120 the first queue the stream of packets would push aside a greater 1121 proportion of large packets, so more of the small packets would 1122 survive to attack the next queue. Thus a flood of small packets 1123 would continue on towards the destination, pushing regular traffic 1124 with large packets out of the way in one queue after the next, but 1125 suffering much less drop itself. 1127 Appendix C explains why the ability of networks to police the 1128 response of _any_ transport to congestion depends on bit-congestible 1129 network resources only doing packet-mode not byte-mode drop. In 1130 summary, it says that making drop probability depend on the size of 1131 the packets that bits happen to be divided into simply encourages the 1132 bits to be divided into smaller packets. Byte-mode drop would 1133 therefore irreversibly complicate any attempt to fix the Internet's 1134 incentive structures. 1136 9. Conclusions 1138 The strong conclusion is that AQM algorithms such as RED SHOULD NOT 1139 use byte-mode drop. 
More generally, the Internet's congestion notification protocols (drop, ECN & PCN) SHOULD take account of packet size when the notification is read by the transport layer, NOT when it is written by the network layer.  This approach offers sufficient and correct congestion information for all known and future transport protocols and also ensures no perverse incentives are created that would encourage transports to use inappropriately small packet sizes.

The alternative of deflating RED's drop probability for smaller packet sizes (byte-mode drop) has no enduring advantages.  It is more complex, it creates the perverse incentive to fragment segments into tiny pieces and it reopens the vulnerability to floods of small packets that drop-tail queues suffered from and AQM was designed to remove.  Byte-mode drop is a change to the network layer that makes allowance for an omission from the design of TCP, effectively reverse engineering the network layer to contrive to make two TCPs with different packet sizes run at equal bit rates (rather than packet rates) under the same path conditions.  It also improves TCP performance by reducing the chance that a SYN or a pure ACK will be dropped, because they are small.  But we SHOULD NOT hack the network layer to improve or fix certain transport protocols.  No matter how predominant a transport protocol is (even if it's TCP), trying to correct for its failings by biasing towards small packets in the network layer creates a perverse incentive to break down all flows from all transports into tiny segments.

So far, our survey of 84 vendors across the industry has drawn responses from about 19%, none of whom have implemented the byte-mode packet drop variant of RED.
Given there appears to be little, if any, installed base, recommending removal of byte-mode drop from RED is possibly only a paper exercise with few, if any, incremental deployment issues.

If a vendor has implemented byte-mode drop, and an operator has turned it on, it is strongly RECOMMENDED that it SHOULD be turned off.  Note that RED as a whole SHOULD NOT be turned off, as without it, a drop-tail queue also biases against large packets.  But note also that turning off byte-mode may alter the relative performance of applications using different packet sizes, so it would be advisable to establish the implications before turning it off.

Instead, the IETF transport area should continue its programme of updating congestion control protocols to take account of packet size and to make transports less sensitive to losing control packets like SYNs and pure ACKs.

NOTE WELL that RED's byte-mode queue measurement is fine, being completely orthogonal to byte-mode drop.  If a RED implementation has a byte-mode but does not specify what sort of byte-mode, it is most probably byte-mode queue measurement, which is fine.  However, if in doubt, the vendor should be consulted.

The above conclusions cater for the Internet as it is today, with most, if not all, resources being primarily bit-congestible.  A secondary conclusion of this memo is that we may see more packet-congestible resources in the future, so research may be needed to extend the Internet's congestion notification (drop or ECN) so that it can handle a mix of bit-congestible and packet-congestible resources.

10.  Acknowledgements

Thank you to Sally Floyd, who gave extensive and useful review comments.  Also thanks for the reviews from Toby Moncaster and Arnaud Jacquet.
I am grateful to Bruce Davie and his colleagues for providing a timely and efficient survey of RED implementation in Cisco's product range.  Also grateful thanks to Toby Moncaster, Will Dormann, John Regnault, Simon Carter and Stefaan De Cnodder who further helped survey the current status of RED implementation and deployment and, finally, thanks to the anonymous individuals who responded.

11.  Comments Solicited

Comments and questions are encouraged and very welcome.  They can be addressed to the IETF Transport Area working group mailing list, and/or to the authors.

Editorial Comments

[Note_Variation]  The algorithm of the byte-mode drop variant of RED switches off any bias towards small packets whenever the smoothed queue length dictates that the drop probability of large packets should be 100%.  In the example in the Introduction, as the large packet drop probability varies around 25% the small packet drop probability will vary around 1%, but with occasional jumps to 100% whenever the instantaneous queue (after drop) manages to sustain a length above the 100% drop point for longer than the queue averaging period.

Appendix A.  Example Scenarios

A.1.  Notation

To prove the two sets of assertions in the idealised wire protocol (Section 5) are true, we will compare two flows with different packet sizes, s_1 and s_2 [bit/pkt], to make sure their transports each see the correct congestion notification.  Initially, within each flow we will take all packets as having equal sizes, but later we will generalise to flows within which packet sizes vary.  A flow's bit rate, x [bit/s], is related to its packet rate, u [pkt/s], by

   x(t) = s.u(t).
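As a quick numeric check of this relation, here is a short sketch using the 1500B and 60B example packet sizes that recur throughout this appendix (the bit rate itself is an illustrative value, not taken from the draft):

```python
# Numeric check of x = s.u for two flows at equal bit rates,
# using the appendix's example packet sizes (bit rate is illustrative).

s_1, s_2 = 1500 * 8, 60 * 8   # packet sizes [bit/pkt]
x = 12_000_000                # equal bit rate for both flows [bit/s]

u_1, u_2 = x / s_1, x / s_2   # packet rates [pkt/s], from x = s.u
print(u_2 / u_1)              # 25.0, i.e. u_2/u_1 = s_1/s_2

# If a resource marks a proportion p_b of packets irrespective of size,
# the marked fraction is p_b whether counted in packets or in bytes,
# because the units cancel in the ratio:
p_b = 0.25
assert (p_b * u_1) / u_1 == (p_b * u_1 * s_1) / (u_1 * s_1) == p_b
```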
We will consider a 2x2 matrix of four scenarios:

   +-----------------------------+------------------+------------------+
   | resource type and           | A) Equal bit     | B) Equal pkt     |
   | congestion level            | rates            | rates            |
   +-----------------------------+------------------+------------------+
   | i) bit-congestible, p_b     | (Ai)             | (Bi)             |
   | ii) pkt-congestible, p_p    | (Aii)            | (Bii)            |
   +-----------------------------+------------------+------------------+

                                 Table 3

A.2.  Bit-congestible resource, equal bit rates (Ai)

Starting with the bit-congestible scenario, for two flows to maintain equal bit rates (Ai) the ratio of the packet rates must be the inverse of the ratio of packet sizes: u_2/u_1 = s_1/s_2.  So, for instance, a flow of 60B packets would have to send 25x more packets to achieve the same bit rate as a flow of 1500B packets.  If a congested resource marks proportion p_b of packets irrespective of size, the ratio of marked packets received by each transport will still be the same as the ratio of their packet rates, p_b.u_2/p_b.u_1 = s_1/s_2.  So of the 25x more 60B packets sent, 25x more will be marked than in the 1500B packet flow, but 25x more will also go unmarked.

In this scenario, the resource is bit-congestible, so it always uses our idealised bit-congestion field when it marks packets.  Therefore the transport should count marked bytes not packets.  But it doesn't actually matter for ratio-based transports like TCP (Section 5).  The ratio of marked to unmarked bytes seen by each flow will be p_b, as will the ratio of marked to unmarked packets.  Because they are ratios, the units cancel out.

If a flow sent an inconsistent mixture of packet sizes, we have said it should count the ratio of marked and unmarked bytes not packets in order to correctly decode the level of congestion.
But actually, if all it is trying to do is decode p_b, it still doesn't matter.  For instance, imagine the two equal bit rate flows were actually one flow at twice the bit rate sending a mixture of one 1500B packet for every twenty-five 60B packets.  25x more small packets will be marked and 25x more will be unmarked.  The transport can still calculate p_b whether it uses bytes or packets for the ratio.  In general, for any algorithm which works on a ratio of marks to non-marks, either bytes or packets can be counted interchangeably, because the choice cancels out in the ratio calculation.

However, where an absolute target rather than relative volume of congestion caused is important (Section 5), as it is for congestion accountability [Rate_fair_Dis], the transport must count marked bytes not packets, in this bit-congestible case.  Aside from the goal of congestion accountability, this is how the bit rate of a transport can be made independent of packet size: by ensuring the rate of congestion caused is kept to a constant weight [WindowPropFair], rather than merely responding to the ratio of marked and unmarked bytes.

Note the unit of byte-congestion volume is the byte.

A.3.  Bit-congestible resource, equal packet rates (Bi)

If two flows send different packet sizes but at the same packet rate, their bit rates will be in the same ratio as their packet sizes, x_2/x_1 = s_2/s_1.  For instance, a flow sending 1500B packets at the same packet rate as another sending 60B packets will be sending at 25x greater bit rate.  In this case, if a congested resource marks proportion p_b of packets irrespective of size, the ratio of packets received with the byte-congestion field marked by each transport will be the same, p_b.u_2/p_b.u_1 = 1.

Because the byte-congestion field is marked, the transport should count marked bytes not packets.
But because each flow sends consistently sized packets it still doesn't matter for ratio-based transports.  The ratio of marked to unmarked bytes seen by each flow will be p_b, as will the ratio of marked to unmarked packets.  Therefore, if the congestion control algorithm is only concerned with the ratio of marked to unmarked packets (as is TCP), both flows will be able to decode p_b correctly whether they count packets or bytes.

But if the absolute volume of congestion is important, e.g. for congestion accountability, the transport must count marked bytes not packets.  Then the lower bit rate flow using smaller packets will rightly be perceived as causing less byte-congestion even though its packet rate is the same.

If the two flows are mixed into one, of bit rate x_1+x_2, with equal packet rates of each size packet, the ratio p_b will still be measurable by counting the ratio of marked to unmarked bytes (or packets, because the ratio cancels out the units).  However, if the absolute volume of congestion is required, the transport must count the sum of congestion marked bytes, which indeed gives a correct measure of the rate of byte-congestion p_b(x_1 + x_2) caused by the combined bit rate.

A.4.  Pkt-congestible resource, equal bit rates (Aii)

Moving to the case of packet-congestible resources, we now take two flows that send different packet sizes at the same bit rate, but this time the pkt-congestion field is marked by the resource with probability p_p.  As in scenario Ai with the same bit rates but a bit-congestible resource, the flow with smaller packets will have a higher packet rate, so more packets will be both marked and unmarked, but in the same proportion.

This time, the transport should only count marks without taking into account packet sizes.
Transports will get the same result, p_p, by decoding the ratio of marked to unmarked packets in either flow.

If one flow imitates the two flows merged together, the bit rate will double, with more small packets than large.  The ratio of marked to unmarked packets will still be p_p.  But if the absolute number of pkt-congestion marked packets is counted, it will accumulate at the combined packet rate times the marking probability, p_p(u_1+u_2), 26x faster than packet congestion accumulates in the single 1500B packet flow of our example, as required.

But if the transport is interested in the absolute amount of packet congestion, it should just count how many marked packets arrive.  For instance, a flow sending 60B packets will see 25x more marked packets than one sending 1500B packets at the same bit rate, because it is sending more packets through a packet-congestible resource.

Note the unit of packet congestion is packets.

A.5.  Pkt-congestible resource, equal packet rates (Bii)

Finally, if two flows with the same packet rate pass through a packet-congestible resource, they will both suffer the same proportion of marking, p_p, irrespective of their packet sizes.  On detecting that the pkt-congestion field is marked, the transport should count packets, and it will be able to extract the ratio p_p of marked to unmarked packets from both flows, irrespective of packet sizes.

Even if the transport is monitoring the absolute amount of packet congestion over a period, it will still see the same amount of packet congestion from either flow.

And if the two equal packet rates of different size packets are mixed together in one flow, the packet rate will double, so the absolute volume of packet-congestion will accumulate at twice the rate of either flow, 2p_p.u_1 = p_p(u_1+u_2).

Appendix B.
Congestion Notification Definition: Further Justification

In Section 3 on the definition of congestion notification, load not capacity was used as the denominator.  This also has a subtle significance in the related debate over the design of new transport protocols--typical new protocol designs (e.g. in XCP [I-D.falk-xcp-spec] & Quick-Start [RFC4782]) expect the sending transport to communicate its desired flow rate to the network and network elements to progressively subtract from this so that the achievable flow rate emerges at the receiving transport.

Congestion notification with total load in the denominator can serve a similar purpose (though in retrospect, not in advance like XCP & Quick-Start).  Congestion notification is a dimensionless fraction, but each source can extract the necessary rate information from it because it already knows what its own rate is.  Even though congestion notification doesn't communicate a rate explicitly, from each source's point of view congestion notification represents the fraction of the rate it was sending a round trip ago that couldn't (or wouldn't) be served by available resources.  After they were sent, all these fractions of each source's offered load added up to the aggregate fraction of offered load seen by the congested resource.  So, the source can also know the total excess rate by multiplying total load by congestion level.  Therefore congestion notification, as one scale-free dimensionless fraction, implicitly communicates the instantaneous excess flow rate, albeit an RTT ago.

Appendix C.  Byte-mode Drop Complicates Policing Congestion Response

This appendix explains why the ability of networks to police the response of _any_ transport to congestion depends on bit-congestible network resources only doing packet-mode not byte-mode drop.
To be able to police a transport's response to congestion when fairness can only be judged over time and over all an individual's flows, the policer has to have an integrated view of all the congestion an individual (not just one flow) has caused due to all traffic entering the Internet from that individual.  This is termed congestion accountability.

But with byte-mode drop, one dropped or marked packet is not necessarily equivalent to another unless you know the MTU that caused it to be dropped/marked.  To have an integrated view of a user, we believe congestion policing has to be located at an individual's attachment point to the Internet [Re-TCP].  But from there it cannot know the MTU of each remote queue that caused each drop/mark.  Therefore it cannot take an integrated approach to policing all the responses to congestion of all the transports of one individual.  Therefore it cannot police anything.

The security/incentive argument _for_ packet-mode drop is similar.  Firstly, confining RED to packet-mode drop would not preclude bottleneck policing approaches such as [pBox], as it seems likely they could work just as well by monitoring the volume of dropped bytes rather than packets.  Secondly, packet-mode dropping/marking naturally allows the congestion notification of packets to be globally meaningful without relying on MTU information held elsewhere.

Because we recommend that a dropped/marked packet should be taken to mean that all the bytes in the packet are dropped/marked, a policer can remain robust against bits being re-divided into different size packets or across different size flows [Rate_fair_Dis].  Therefore policing would work naturally with just simple packet-mode drop in RED.
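The robustness claimed in the last paragraph can be illustrated with a small sketch (illustrative numbers and helper names, not from the draft): a policer that measures congestion volume as the sum of bytes in marked packets gets the same expected reading however the same bytes are divided into packets, provided marking is packet-mode (every packet marked with the same probability irrespective of size).

```python
# Sketch: congestion volume (sum of marked bytes) is invariant, in
# expectation, under re-dividing the same bytes into different packet
# sizes, when marking is packet-mode.  Names and numbers illustrative.
import random

random.seed(1)

def congestion_volume(pkt_sizes, p):
    """Policer's measure: bytes in congestion-marked packets, where
    each packet is marked with probability p irrespective of size."""
    return sum(s for s in pkt_sizes if random.random() < p)

p = 0.1
# The same 1.5 MB divided two ways:
large = [1500] * 1000    #  1000 x 1500B packets
small = [60] * 25000     # 25000 x   60B packets

# Expected congestion volume is p * 1_500_000 = 150000 bytes in both
# cases; the sampled values below differ only by statistical noise.
print(congestion_volume(large, p))
print(congestion_volume(small, p))
```

This is why re-dividing bits into smaller packets gains a sender nothing against such a policer, whereas under byte-mode drop the marking probability itself would shrink with packet size.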
In summary, making drop probability depend on the size of the packets that bits happen to be divided into simply encourages the bits to be divided into smaller packets.  Byte-mode drop would therefore irreversibly complicate any attempt to fix the Internet's incentive structures.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, April 1998.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3426]  Floyd, S., "General Architectural and Policy
              Considerations", RFC 3426, November 2002.

   [RFC3448]  Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 3448, January 2003.

   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
              (TFRC): The Small-Packet (SP) Variant", RFC 4828,
              April 2007.

   [RFC5033]  Floyd, S. and M. Allman, "Specifying New Congestion
              Control Algorithms", BCP 133, RFC 5033, August 2007.

12.2.  Informative References

   [CCvarPktSize]
              Widmer, J., Boutremans, C., and J-Y.
              Le Boudec, "Congestion Control for Flows with Variable
              Packet Size", ACM CCR 34(2) 137--151, 2004.

   [DupTCP]   Wischik, D., "Short messages", Royal Society workshop on
              networks: modelling and control, September 2007.

   [ECNFixedWireless]
              Siris, V., "Resource Control for Elastic Traffic in CDMA
              Networks", Proc. ACM MOBICOM'02, September 2002.

   [Evol_cc]  Gibbens, R. and F. Kelly, "Resource pricing and the
              evolution of congestion control", Automatica 35(12)
              1969--1985, December 1999.

   [I-D.falk-xcp-spec]
              Falk, A., "Specification for the Explicit Control
              Protocol (XCP)", draft-falk-xcp-spec-03 (work in
              progress), July 2007.

   [I-D.floyd-tcpm-ackcc]
              Floyd, S., "Adding Acknowledgement Congestion Control to
              TCP", draft-floyd-tcpm-ackcc-02 (work in progress),
              November 2007.

   [I-D.ietf-pcn-architecture]
              Eardley, P., "Pre-Congestion Notification Architecture",
              draft-ietf-pcn-architecture-03 (work in progress),
              February 2008.

   [I-D.ietf-tcpm-ecnsyn]
              Floyd, S., "Adding Explicit Congestion Notification (ECN)
              Capability to TCP's SYN/ACK Packets",
              draft-ietf-tcpm-ecnsyn-05 (work in progress),
              February 2008.

   [I-D.ietf-tcpm-rfc2581bis]
              Allman, M., "TCP Congestion Control",
              draft-ietf-tcpm-rfc2581bis-03 (work in progress),
              September 2007.

   [I-D.irtf-iccrg-welzl-congestion-control-open-research]
              Papadimitriou, D., "Open Research Issues in Internet
              Congestion Control",
              draft-irtf-iccrg-welzl-congestion-control-open-research-00
              (work in progress), July 2007.

   [IOSArch]  Bollapragada, V., White, R., and C. Murphy, "Inside Cisco
              IOS Software Architecture", Cisco Press: CCIE
              Professional Development, ISBN13: 978-1-57870-181-0,
              July 2000.

   [MulTCP]   Crowcroft, J. and Ph.
              Oechslin, "Differentiated End to End Internet Services
              using a Weighted Proportional Fair Sharing TCP", CCR
              28(3) 53--69, July 1998.

   [PCNcharter]
              IETF, "Congestion and Pre-Congestion Notification (pcn)",
              IETF w-g charter, Feb 2007.

   [PktSizeEquCC]
              Vasallo, P., "Variable Packet Size Equation-Based
              Congestion Control", ICSI Technical Report tr-00-008,
              2000.

   [RED93]    Floyd, S. and V. Jacobson, "Random Early Detection (RED)
              gateways for Congestion Avoidance", IEEE/ACM Transactions
              on Networking 1(4) 397--413, August 1993.

   [REDbias]  Eddy, W. and M. Allman, "A Comparison of RED's Byte and
              Packet Modes", Computer Networks 42(3) 261--280,
              June 2003.

   [REDbyte]  De Cnodder, S., Elloumi, O., and K. Pauwels, "RED
              behavior with different packet sizes", Proc. 5th IEEE
              Symposium on Computers and Communications (ISCC)
              793--799, July 2000.

   [RFC3714]  Floyd, S. and J. Kempf, "IAB Concerns Regarding
              Congestion Control for Voice Traffic in the Internet",
              RFC 3714, March 2004.

   [RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti,
              "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [Rate_fair_Dis]
              Briscoe, B., "Flow Rate Fairness: Dismantling a
              Religion", ACM CCR 37(2) 63--74, April 2007.

   [Re-TCP]   Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
              "Re-ECN: Adding Accountability for Causing Congestion to
              TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in
              progress), January 2008.

   [WindowPropFair]
              Siris, V., "Service Differentiation and Performance of
              Weighted Window-Based Congestion Control and Packet
              Marking Algorithms in ECN Networks", Computer
              Communications 26(4) 314--326, 2002.

   [gentle_RED]
              Floyd, S., "Recommendation on using the "gentle_" variant
              of RED", Web page, March 2000.

   [pBox]     Floyd, S. and K.
              Fall, "Promoting the Use of End-to-End Congestion Control
              in the Internet", IEEE/ACM Transactions on Networking
              7(4) 458--472, August 1999.

   [pktByteEmail]
              Floyd, S., "RED: Discussions of Byte and Packet Modes",
              email, March 1997.

Author's Address

   Bob Briscoe
   BT & UCL
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.
   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgments

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).  This document was produced
   using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
   RFC-2629 XML format.