2 Transport Area Working Group B. Briscoe 3 Internet-Draft BT & UCL 4 Intended status: Informational February 24, 2008 5 Expires: August 27, 2008 7 Byte and Packet Congestion Notification 8 draft-briscoe-tsvwg-byte-pkt-mark-02 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups.
Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on August 27, 2008. 35 Copyright Notice 37 Copyright (C) The IETF Trust (2008). 39 Abstract 41 This memo concerns dropping or marking packets using active queue 42 management (AQM) such as random early detection (RED) or pre- 43 congestion notification (PCN). The primary conclusion is that packet 44 size should be taken into account when transports decode congestion 45 indications, not when network equipment writes them. Reducing drop 46 of small packets has some tempting advantages: i) it drops fewer 47 control packets, which tend to be small, and ii) it makes TCP's bit- 48 rate less dependent on packet size. However, there are ways of 49 addressing these issues at the transport layer, rather than reverse 50 engineering network forwarding to fix specific transport problems. 51 Network layer algorithms like the byte-mode packet drop variant of 52 RED should not be used to drop fewer small packets, because that 53 creates a perverse incentive for transports to use tiny segments, 54 consequently also opening up a DoS vulnerability. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 59 2. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 9 60 2.1. Scaling Congestion Control with Packet Size . . . . . . . 9 61 2.2. Avoiding Perverse Incentives to (ab)use Smaller Packets . 10 62 2.3. Small != Control . . . . . . . . . . .
. . . . . . . . 11 63 3. Working Definition of Congestion Notification . . . . . . . . 12 64 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 12 65 4.1. Congestion Measurement by Queue Length . . . . . . . . . . 12 66 4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 13 67 4.2. Congestion Measurement without a Queue . . . . . . . . . . 14 68 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 14 69 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 16 70 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 17 71 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 17 72 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 17 73 6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 19 74 6.2.3. Making Transports Robust against Control Packet 75 Losses . . . . . . . . . . . . . . . . . . . . . . . . 20 76 6.2.4. Congestion Coding: Summary of Status . . . . . . . . . 21 77 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 23 78 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 23 79 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 24 80 8. Security Considerations . . . . . . . . . . . . . . . . . . . 25 81 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26 82 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 83 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 27 84 Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . 85 Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 28 86 A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 28 87 A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 28 88 A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 29 89 A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 30 90 A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 31 91 Appendix B. 
Congestion Notification Definition: Further 92 Justification . . . . . . . . . . . . . . . . . . . . 31 93 Appendix C. Byte-mode Drop Complicates Policing Congestion 94 Response . . . . . . . . . . . . . . . . . . . . . . 32 95 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 33 96 12.1. Normative References . . . . . . . . . . . . . . . . . . . 33 97 12.2. Informative References . . . . . . . . . . . . . . . . . . 33 98 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 36 99 Intellectual Property and Copyright Statements . . . . . . . . . . 37 101 Changes from Previous Versions 103 To be removed by the RFC Editor on publication. 105 Full incremental diffs between each version are available at 106 107 (courtesy of the rfcdiff tool): 109 From -01 to -02 (this version): 111 Abstract reorganised to align with clearer separation of issue 112 in the memo. 114 Introduction reorganised with motivating arguments removed to 115 new Section 2. 117 Clarified avoiding lock-out of large packets is not the main or 118 only motivation for RED. 120 Mentioned choice of drop or marking explicitly throughout, 121 rather than trying to coin a word to mean either. 123 Generalised the discussion throughout to any packet forwarding 124 function on any network equipment, not just routers. 126 Clarified the last point about why this is a good time to sort 127 out this issue: because it will be hard / impossible to design 128 new transports unless we decide whether the network or the 129 transport is allowing for packet size. 131 Added statement explaining the horizon of the memo is long 132 term, but with short term expediency in mind. 134 Added material on scaling congestion control with packet size 135 (Section 2.1). 137 Separated out issue of normalising TCP's bit rate from issue of 138 preference to control packets (Section 2.3). 
140 Divided up Congestion Measurement section for clarity, 141 including new material on fixed size packet buffers and buffer 142 carving (Section 4.1.1 & Section 6.2.1) and on congestion 143 measurement in wireless link technologies without queues 144 (Section 4.2). 146 Added section on 'Making Transports Robust against Control 147 Packet Losses' (Section 6.2.3) with existing & new material 148 included. 150 Added tabulated results of vendor survey on byte-mode drop 151 variant of RED (Table 2). 153 From -00 to -01: 155 Clarified applicability to drop as well as ECN. 157 Highlighted DoS vulnerability. 159 Emphasised that drop-tail suffers from similar problems to 160 byte-mode drop, so only byte-mode drop should be turned off, 161 not RED itself. 163 Clarified the original apparent motivations for recommending 164 byte-mode drop included protecting SYNs and pure ACKs more than 165 equalising the bit rates of TCPs with different segment sizes. 166 Removed some conjectured motivations. 168 Added support for updates to TCP in progress (ackcc & ecn-syn- 169 ack). 171 Updated survey results with newly arrived data. 173 Pulled all recommendations together into the conclusions. 175 Moved some detailed points into two additional appendices and a 176 note. 178 Considerable clarifications throughout. 180 Updated references 182 Requirements notation 184 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 185 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 186 document are to be interpreted as described in [RFC2119]. 188 1. Introduction 190 When notifying congestion, the problem of how (and whether) to take 191 packet sizes into account has exercised the minds of researchers and 192 practitioners for as long as active queue management (AQM) has been 193 discussed. Indeed, one reason AQM was originally introduced was to 194 reduce the lock-out effects that small packets can have on large 195 packets in drop-tail queues. 
This memo aims to state the principles 196 we should be using and to come to conclusions on what these 197 principles will mean for future protocol design, taking into account 198 the deployments we have already. 200 Note that the byte vs. packet dilemma concerns congestion 201 notification irrespective of whether it is signalled implicitly by 202 drop or using explicit congestion notification (ECN [RFC3168] or PCN 203 [I-D.ietf-pcn-architecture]). Throughout this document, unless clear 204 from the context, the term marking will be used to mean notifying 205 congestion explicitly, while congestion notification will be used to 206 mean notifying congestion either implicitly by drop or explicitly by 207 marking. 209 If the load on a resource depends on the rate at which packets 210 arrive, it is called packet-congestible. If the load depends on the 211 rate at which bits arrive it is called bit-congestible. 213 Examples of packet-congestible resources are route look-up engines 214 and firewalls, because load depends on how many packet headers they 215 have to process. Examples of bit-congestible resources are 216 transmission links, and most buffer memory, because the load depends 217 on how many bits they have to transmit or store. Some machine 218 architectures use fixed size packet buffers, so buffer memory in 219 these cases is packet-congestible (see Section 4.1.1). 221 Note that information is generally processed or transmitted with a 222 minimum granularity greater than a bit (e.g. octets). The 223 appropriate granularity for the resource in question SHOULD be used, 224 but for the sake of brevity we will talk in terms of bytes in this 225 memo. 227 Resources may be congestible at higher levels of granularity than 228 packets, for instance stateful firewalls are flow-congestible and 229 call-servers are session-congestible. 
This memo focuses on 230 congestion of connectionless resources, but the same principles may 231 be applied for congestion notification protocols controlling per-flow 232 and per-session processing or state. 234 The byte vs. packet dilemma arises at three stages in the congestion 235 notification process: 237 Measuring congestion: When the congested resource decides locally how 238 to measure how congested it is. (Should the queue be measured in 239 bytes or packets?); 241 Coding congestion notification into the wire protocol: When the 242 congested resource decides how to notify the level of congestion. 243 (Should the level of notification depend on the byte-size of each 244 particular packet carrying the notification?); 246 Decoding congestion notification from the wire protocol: When the 247 transport interprets the notification. (Should the byte-size of a 248 missing or marked packet be taken into account?). 250 In RED, whether to use packets or bytes when measuring queues is 251 called packet-mode or byte-mode queue measurement. This choice is 252 now fairly well understood but is included in Section 4 to document 253 it in the RFC series. 255 The controversy is mainly around the other two stages: whether to 256 allow for packet size when the network codes or when the transport 257 decodes congestion notification. In RED, the variant that reduces 258 drop probability for packets based on their size in bytes is called 259 byte-mode drop, while the variant that doesn't is called packet-mode 260 drop. Whether queues are measured in bytes or packets is an 261 orthogonal choice, termed byte-mode queue measurement or packet-mode 262 queue measurement. 264 Currently, the RFC series is silent on this matter other than a paper 265 trail of advice referenced from [RFC2309], which conditionally 266 recommends byte-mode (packet-size dependent) drop [pktByteEmail]. 267 However, none of the implementers who responded to our survey has 268 followed this advice.
The primary purpose of this memo is to build a 269 definitive consensus against deliberate preferential treatment for 270 small packets in AQM algorithms and to record this advice within the 271 RFC series. 273 Now is a good time to discuss whether fairness between different 274 sized packets would best be implemented in the network layer, or at 275 the transport, for a number of reasons: 277 1. The packet vs. byte issue requires speedy resolution because the 278 IETF pre-congestion notification (PCN) working group has been 279 chartered to produce a standards track specification of its 280 congestion notification (AQM) algorithm [PCNcharter]; 282 2. [RFC2309] says RED may either take account of packet size or not 283 when dropping, but gives no recommendation between the two, 284 referring instead to advice on the performance implications in an 285 email [pktByteEmail], which recommends byte-mode drop. Further, 286 just before RFC2309 was issued, an addendum was added to the 287 archived email that revisited the issue of packet vs. byte-mode 288 drop in its last para, making the recommendation less clear-cut; 290 3. Without the present memo, the only advice in the RFC series on 291 packet size bias in AQM algorithms would be a reference to an 292 archived email in [RFC2309] (including an addendum at the end of 293 the email to correct the original). 295 4. The IRTF Internet Congestion Control Research Group (ICCRG) 296 recently took on the challenge of building consensus on what 297 common congestion control support should be required from network 298 forwarding functions in future 299 [I-D.irtf-iccrg-welzl-congestion-control-open-research]. The 300 wider Internet community needs to discuss whether the complexity 301 of adjusting for packet size should be in the network or in 302 transports; 304 5. 
Given there are many good reasons why larger path max 305 transmission units (PMTUs) would help solve a number of scaling 306 issues, we don't want to create any bias against large packets 307 that is greater than their true cost; 309 6. The IETF has started to consider the question of fairness between 310 flows that use different packet sizes (e.g. in the small-packet 311 variant of TCP-friendly rate control, TFRC-SP [RFC4828]). Given 312 transports with different packet sizes, if we don't decide 313 whether the network or the transport should allow for packet 314 size, it will be hard if not impossible to design any transport 315 protocol so that its bit-rate relative to other transports meets 316 design guidelines [RFC5033] (Note however that, if the concern 317 were fairness between users, rather than between flows 318 [Rate_fair_Dis], relative rates between flows would have to come 319 under run-time control rather than being embedded in protocol 320 designs). 322 This memo is initially concerned with how we should correctly scale 323 congestion control functions with packet size for the long term. But 324 it also recognises that expediency may be necessary to deal with 325 existing widely deployed protocols that don't live up to the long 326 term goal. It turns out that the 'correct' variant of RED to deploy 327 seems to be the one everyone has deployed, and no-one who responded 328 to our survey has implemented the other variant. However, at the 329 transport layer, TCP congestion control is a widely deployed protocol 330 that we argue doesn't scale correctly with packet size. To date this 331 hasn't been a significant problem because most TCPs have been used 332 with similar packet sizes. But, as we design new congestion 333 controls, we should build in scaling with packet size rather than 334 assuming we should follow TCP's example. 336 Motivating arguments for our advice are given next in Section 2. 
337 Then the body of the memo starts from first principles, defining 338 congestion notification in Section 3, then determining the correct way 339 to measure congestion (Section 4) and to design an idealised 340 congestion notification protocol (Section 5). It then surveys the 341 advice given previously in the RFC series, the research literature 342 and the deployed legacy (Section 6) before listing outstanding issues 343 (Section 7) that will need resolution both to achieve the ideal 344 protocol and to handle legacy. After discussing security 345 considerations (Section 8), strong recommendations for the way forward 346 are given in the conclusions (Section 9). 348 2. Motivating Arguments 350 2.1. Scaling Congestion Control with Packet Size 352 There are two ways of interpreting a dropped or marked packet. It 353 can either be considered as a single loss event or as loss/marking of 354 the bytes in the packet. Here we try to design a test to see which 355 approach scales with packet size. 357 Imagine a bit-congestible link shared by many flows, so that each 358 busy period tends to cause packets to be lost from different flows. 359 The test compares two identical scenarios with the same applications, 360 the same numbers of sources and the same load. But the sources break 361 the load into large packets in one scenario and small packets in the 362 other. Of course, because the load is the same, there will be 363 proportionately more packets in the small packet case. 365 The test of whether a congestion control scales with packet size is 366 that it should respond in the same way to the same congestion 367 excursion, irrespective of the size of the packets that the bytes 368 causing congestion happen to be broken down into. 370 A bit-congestible queue suffering a congestion excursion has to drop 371 or mark the same excess bytes whether they are in a few large packets 372 or many small packets.
So for the same congestion excursion, the 373 same number of bytes has to be shed to get the load back to its 374 operating point. But, of course, for smaller packets more packets 375 will have to be discarded to shed the same bytes. 377 If all the transports interpret each drop/mark as a single loss event 378 irrespective of the size of the packet dropped, they will respond 379 more strongly in the small packet scenario, which suffers more drops for the same congestion excursion, failing our test. On the 380 other hand, if they respond proportionately less when smaller packets 381 are dropped/marked, overall they will be able to respond the same to 382 the same congestion excursion. 384 Therefore, for a congestion control to scale with packet size, it 385 should respond to dropped or marked bytes (as TFRC-SP [RFC4828] 386 effectively does), not just to dropped or marked packets irrespective 387 of packet size (as TCP does). 389 The email [pktByteEmail] referred to by RFC2309 says the question of 390 whether a packet's own size should affect its drop probability 391 "depends on the dominant end-to-end congestion control mechanisms". 392 But we argue the network layer should not be optimised for whatever 393 transport is predominant. 395 TCP congestion control ensures that flows competing for the same 396 resource each maintain the same number of segments in flight, 397 irrespective of segment size. So under similar conditions, flows 398 with different segment sizes will get different bit rates. But even 399 though reducing the drop probability of small packets helps ensure 400 TCPs with different packet sizes will achieve similar bit rates, we 401 argue this should be achieved in TCP itself, not in the network. 403 Effectively, favouring small packets is reverse engineering of the 404 network layer around TCP, contrary to the excellent advice in 405 [RFC3426], which asks designers to question "Why are you proposing a 406 solution at this layer of the protocol stack, rather than at another 407 layer?" 409 2.2.
Avoiding Perverse Incentives to (ab)use Smaller Packets 411 Increasingly, it is being recognised that a protocol design must take 412 care not to cause unintended consequences by giving the parties in 413 the protocol exchange perverse incentives [Evol_cc][RFC3426]. Again, 414 imagine a scenario where the same bit rate of packets will contribute 415 the same to congestion of a link irrespective of whether it is sent 416 as fewer larger packets or more smaller packets. A protocol design 417 that caused larger packets to be more likely to be dropped than 418 smaller ones would be dangerous in this case: 420 Malicious transports: A queue that gives an advantage to small 421 packets can be used to amplify the force of a flooding attack. By 422 sending a flood of small packets, the attacker can get the queue 423 to discard more traffic in large packets, allowing more attack 424 traffic to get through to cause further damage. Such a queue 425 allows attack traffic to have a disproportionately large effect on 426 regular traffic without the attacker having to do much work. The 427 byte-mode drop variant of RED amplifies small packet attacks. 429 Drop-tail queues amplify small packet attacks even more than RED 430 byte-mode drop (see the Security Considerations in 431 Section 8). Wherever possible neither should be used. 433 Normal transports: Even if a transport is not malicious, if it finds 434 small packets go faster, it will tend to act in its own interest 435 and use them. Queues that give an advantage to small packets create 436 an evolutionary pressure for transports to send at the same bit- 437 rate but break their data stream down into tiny segments to reduce 438 their drop rate. Encouraging a high volume of tiny packets might 439 in turn unnecessarily overload a completely unrelated part of the 440 system, perhaps more limited by header-processing than bandwidth.
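The size of this advantage can be made concrete with a short sketch. This is illustrative only: the helper names are hypothetical, and drop probability is simply assumed to scale linearly with packet size, as in RED's byte-mode drop.

```python
# Hypothetical byte-mode drop queue: drop probability is assumed to be
# proportional to packet size, normalised so an MTU-sized packet sees
# 25% drop at this level of congestion.
MTU = 1500
P_MTU = 0.25  # assumed drop probability for a 1500B packet

def drop_prob(pkt_size):
    """Byte-mode drop: probability scales with the packet's size in bytes."""
    return P_MTU * pkt_size / MTU

def goodput(bit_rate, pkt_size):
    """Expected rate surviving the queue for a flow of equal-sized packets."""
    return bit_rate * (1 - drop_prob(pkt_size))

# Two flows, both arriving at 1 Mbps:
print(goodput(1_000_000, 1500))  # large packets: 750000.0 b/s
print(goodput(1_000_000, 60))    # 25x smaller packets: ~990000 b/s
```

Under these assumptions a sender cuts its drop rate 25-fold merely by segmenting the same byte stream into 25x smaller packets, which is exactly the evolutionary pressure described above.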
442 Imagine two flows arrive at a bit-congestible transmission link each 443 with the same bit rate, say 1Mbps, but one consists of 1500B packets and the 444 other of 60B packets, which are 25x smaller. Consider a scenario where 445 gentle RED [gentle_RED] is used, along with the variant of RED we 446 advise against, i.e. where the RED algorithm is configured to adjust 447 the drop probability of packets in proportion to each packet's size 448 (byte-mode packet drop). In this case, if RED drops 25% of the 449 larger packets, it will aim to drop 1% of the smaller packets (but in 450 practice it may drop more as congestion increases 451 [RFC4828](S.B.4)[Note_Variation]). Even though both flows arrive 452 with the same bit rate, the bit rate the RED queue aims to pass to 453 the line will be 750kbps for the flow of larger packets but 990kbps for the 454 flow of smaller packets (but because of rate variation it will be less than 455 this target). It can be seen that this behaviour reopens the same 456 denial of service vulnerability that drop-tail queues offer to floods 457 of small packets, though not necessarily as strongly (see Section 8). 459 2.3. Small != Control 461 It is tempting to drop small packets with lower probability to 462 improve performance, because many control packets are small (TCP SYNs 463 & ACKs, DNS queries & responses, SIP messages, HTTP GETs, etc.) and 464 dropping fewer control packets considerably improves performance. 465 However, we must not give control packets preference purely by virtue 466 of their smallness, otherwise it is too easy for any data source to 467 get the same preferential treatment simply by sending data in smaller 468 packets. Again we should not create perverse incentives to favour 469 small packets rather than control packets, which is what we actually 470 intend to favour. 472 Just because many control packets are small does not mean all small 473 packets are control packets.
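The scaling test of Section 2.1 can also be checked mechanically. In this hedged sketch (function names are illustrative, not taken from this memo), a transport that counts loss events responds 25x more strongly in the small packet scenario, while one that counts lost bytes responds identically:

```python
# Illustrative check of the Section 2.1 scaling test: a bit-congestible
# queue sheds the same excess bytes either as a few large-packet drops
# or as many small-packet drops.
EXCESS_BYTES = 6000  # bytes the queue must shed in one congestion excursion

def drops(pkt_size):
    """Packets that must be dropped to shed the same excess bytes."""
    return EXCESS_BYTES // pkt_size

def response_per_event(pkt_size):
    """TCP-like decoding: every drop counts as one loss event."""
    return drops(pkt_size)

def response_per_byte(pkt_size):
    """TFRC-SP-like decoding: each drop is weighted by the bytes lost."""
    return drops(pkt_size) * pkt_size

print(response_per_event(1500), response_per_event(60))  # 4 vs 100 loss events
print(response_per_byte(1500), response_per_byte(60))    # 6000 vs 6000 bytes
```

Only the per-byte decoding passes the test: its response depends on the congested bytes alone, not on how those bytes happen to be packetised.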
475 So again, rather than fix these problems in the network layer, we 476 argue that the transport should be made more robust against losses of 477 control packets (see 'Making Transports Robust against Control Packet 478 Losses' in Section 6.2.3). 480 3. Working Definition of Congestion Notification 482 Rather than attempting what many others have tried and failed to do, this memo 483 will not try to define congestion. It will give a working definition 484 of what congestion notification should be taken to mean for this 485 document. Congestion notification is a changing signal that aims to 486 communicate the ratio E/L, where E is the instantaneous excess load 487 offered to a resource that it cannot (or would not) serve and L is 488 the instantaneous offered load. 490 The phrase `would not serve' is added, because AQM systems (e.g. 491 RED, PCN [I-D.ietf-pcn-architecture]) use a virtual capacity smaller 492 than actual capacity, then notify congestion of this virtual capacity 493 in order to avoid congestion of the actual capacity. 495 Note that the denominator is offered load, not capacity. Therefore 496 congestion notification is a real number bounded by the range [0,1]. 497 This ties in with the best-understood form of congestion 498 notification: drop rate. It also means that congestion has a natural 499 interpretation as a probability: the probability of offered traffic 500 not being served (or being marked as at risk of not being served). 501 Appendix B describes a further incidental benefit that arises from 502 using load as the denominator of congestion notification. 504 4. Congestion Measurement 506 4.1. Congestion Measurement by Queue Length 508 Queue length is usually the most correct and simplest way to measure 509 congestion of a resource. To avoid the pathological effects of drop 510 tail, an AQM function can then be used to transform queue length into 511 the probability of dropping or marking a packet (e.g.
RED's 512 piecewise linear function between thresholds). If the resource is 513 bit-congestible, the length of the queue SHOULD be measured in bytes. 514 If the resource is packet-congestible, the length of the queue SHOULD 515 be measured in packets. No other choice makes sense, because the 516 number of packets waiting in the queue isn't relevant if the resource 517 gets congested by bytes and vice versa. We discuss the implications 518 on RED's byte mode and packet mode for measuring queue length in 519 Section 6. 521 4.1.1. Fixed Size Packet Buffers 523 Some, mostly older, queuing hardware sets aside fixed sized buffers 524 in which to store each packet in the queue. Also, with some 525 hardware, any fixed sized buffers not completely filled by a packet 526 are padded when transmitted to the wire. If we imagine a theoretical 527 forwarding system with both queuing and transmission in fixed, MTU- 528 sized units, it should clearly be treated as packet-congestible, 529 because the queue length in packets would be a good model of 530 congestion of the lower layer link. 532 If we now imagine a hybrid forwarding system with transmission delay 533 largely dependent on the byte-size of packets but buffers of one MTU 534 per packet, it should strictly require a more complex algorithm to 535 determine the probability of congestion. It should be treated as two 536 resources in sequence, where the sum of the byte-sizes of the packets 537 within each packet buffer models congestion of the line while the 538 length of the queue in packets models congestion of the queue. Then 539 the probability of congesting the forwarding buffer would be a 540 conditional probability--conditional on the previously calculated 541 probability of congesting the line. 543 However, in systems that use fixed size buffers, it is unusual for 544 all the buffers used by an interface to be the same size. 
Typically 545 pools of different sized buffers are provided (Cisco uses the term 546 'buffer carving' for the process of dividing up memory into these 547 pools [IOSArch]). Usually, if the pool of small buffers is 548 exhausted, arriving small packets can borrow space in the pool of 549 large buffers, but not vice versa. However, it is easier to work out 550 what should be done if we temporarily set aside the possibility of 551 such borrowing. Then, with fixed pools of buffers for different 552 sized packets and no borrowing, the size of each pool and the current 553 queue length in each pool would both be measured in packets. So an 554 AQM algorithm would have to maintain the queue length for each pool, 555 and judge whether to drop/mark a packet of a particular size by 556 looking at the pool for packets of that size and using the length (in 557 packets) of its queue. 559 We now return to the issue we temporarily set aside: small packets 560 borrowing space in larger buffers. In this case, the only difference 561 is that the pools for smaller packets have a maximum queue size that 562 includes all the pools for larger packets. And every time a packet 563 takes a larger buffer, the current queue size has to be incremented 564 for all queues in the pools of buffers less than or equal to the 565 buffer size used. 567 We will return to borrowing of fixed sized buffers when we discuss 568 biasing the drop/marking probability of a specific packet because of 569 its size in Section 6.2.1. But here we can give a simple summary of 570 the present discussion on how to measure the length of queues of 571 fixed buffers: no matter how complicated the scheme is, ultimately 572 any fixed buffer system will need to measure its queue length in 573 packets not bytes. 575 4.2. 
Congestion Measurement without a Queue 577 AQM algorithms are nearly always described assuming there is a queue 578 for a congested resource and the algorithm can use the queue length 579 to determine the probability that it will drop or mark each packet. 580 But not all congested resources lead to queues. For instance, 581 wireless spectrum is bit-congestible (for a given coding scheme), 582 because interference increases with the rate at which bits are 583 transmitted. But wireless link protocols do not always maintain a 584 queue that depends on spectrum interference. Similarly, power- 585 limited resources are also usually bit-congestible if energy is 586 primarily required for transmission rather than header processing, 587 but it is rare for a link protocol to build a queue as it approaches 588 maximum power. 590 However, AQM algorithms don't require a queue in order to work. For 591 instance, spectrum congestion can be modelled by signal quality using the 592 target bit-energy-to-noise-density ratio. And, to model radio power 593 exhaustion, transmission power levels can be measured and compared to 594 the maximum power available. [ECNFixedWireless] proposes a practical 595 and theoretically sound way to combine congestion notification for 596 different bit-congestible resources at different layers along an end- 597 to-end path, whether wireless or wired, and whether with or without 598 queues. 600 5. Idealised Wire Protocol Coding 602 We will start by inventing an idealised congestion notification 603 protocol before discussing how to make it practical. The idealised 604 protocol is shown to be correct using examples in Appendix A. 605 Congestion notification involves the congested resource coding a 606 congestion notification signal into the packet stream and the 607 transports decoding it. The idealised protocol uses two different 608 fields in each datagram to signal congestion: one for byte congestion 609 and one for packet congestion.
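As a purely conceptual sketch of this two-field signalling (mirroring the marking and decoding rules given in the bullets below; the field and function names here are our own invention, not a wire format):

```python
import random

def mark(packet, p_b, p_p):
    """A resource codes congestion into a packet, irrespective of its size.

    p_b: byte-congestion level of a bit-congestible resource.
    p_p: packet-congestion level of a packet-congestible resource.
    """
    if random.random() < p_b:           # bit-congestible resource
        packet['byte_congestion'] = True
    if random.random() < p_p:           # packet-congestible resource
        packet['pkt_congestion'] = True

def decode(packets):
    """The transport decodes: a byte-congestion mark counts once per byte
    of the packet; a packet-congestion mark counts once per packet."""
    marked_bytes = sum(p['size'] for p in packets if p.get('byte_congestion'))
    marked_pkts = sum(1 for p in packets if p.get('pkt_congestion'))
    return marked_bytes, marked_pkts
```

Note the transport's count of marked bytes is the "repeated add" referred to below--no multiplication by packet size is needed at either end.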
611 We are not saying two ECN fields will be needed (and we are not 612 saying that somehow a resource should be able to drop a packet in one 613 of two different ways so that the transport can distinguish which 614 sort of drop it was!). These two congestion notification channels 615 are just a conceptual device. They allow us to defer having to 616 decide whether to distinguish between byte and packet congestion when 617 the network resource codes the signal or when the transport decodes 618 it. 620 However, although this idealised mechanism isn't intended for 621 implementation, we do want to emphasise that we may need to find a 622 way to implement it, because it could become necessary to somehow 623 distinguish between bit and packet congestion [RFC3714]. Currently a 624 design goal of network processing equipment such as routers and 625 firewalls is to keep packet processing uncongested even under worst 626 case bit rates with minimum packet sizes. Therefore, packet- 627 congestion is currently rare, but there is no guarantee that it will 628 not become common with future technology trends. 630 The idealised wire protocol is given below. It accounts for packet 631 sizes at the transport layer, not in the network, and then only in 632 the case of bit-congestible resources. This avoids the perverse 633 incentive to send smaller packets and the DoS vulnerability that 634 would otherwise result if the network were to bias towards them (see 635 the motivating argument about avoiding perverse incentives in 636 Section 2.2). 
Incidentally, it also ensures neither the network nor 637 the transport needs to do a multiply operation--multiplication by 638 packet size is effectively achieved as a repeated add when the 639 transport adds to its count of marked bytes as each congestion event 640 is fed to it: 642 o A packet-congestible resource trying to code congestion level p_p 643 into a packet stream should mark the idealised `packet congestion' 644 field in each packet with probability p_p irrespective of the 645 packet's size. The transport should then take a packet with the 646 packet congestion field marked to mean just one mark, irrespective 647 of the packet size. 649 o A bit-congestible resource trying to code time-varying byte- 650 congestion level p_b into a packet stream should mark the `byte 651 congestion' field in each packet with probability p_b, again 652 irrespective of the packet's size. Unlike before, the transport 653 should take a packet with the byte congestion field marked to 654 count as a mark on each byte in the packet. 656 The worked examples in Appendix A show that transports can extract 657 sufficient and correct congestion notification from these protocols 658 for cases when two flows with different packet sizes have matching 659 bit rates or matching packet rates. Examples are also given that mix 660 these two flows into one to show that a flow with mixed packet sizes 661 would still be able to extract sufficient and correct information. 663 Sufficient and correct congestion information means that there is 664 sufficient information for the two different types of transport 665 requirements: 667 Ratio-based: Established transport congestion controls like TCP's 668 [RFC2581] aim to achieve equal segment rates per RTT through the 669 same bottleneck--TCP friendliness [RFC3448]. They work with the 670 ratio of dropped to delivered segments (or marked to unmarked 671 segments in the case of ECN). 
The example scenarios show that 672 these ratio-based transports are effectively the same whether 673 counting in bytes or packets, because the units cancel out. 674 (Incidentally, this is why TCP's bit rate is still proportional to 675 packet size even when byte-counting is used, as recommended for 676 TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security 677 reasons.) 679 Absolute-target-based: Other congestion controls proposed in the 680 research community aim to limit the volume of congestion caused to 681 a constant weight parameter. [MulTCP][WindowPropFair] are 682 examples of weighted proportionally fair transports designed for 683 cost-fair environments [Rate_fair_Dis]. In this case, the 684 transport requires a count (not a ratio) of dropped/marked bytes 685 in the bit-congestible case and of dropped/marked packets in the 686 packet congestible case. 688 6. The State of the Art 690 The original 1993 paper on RED [RED93] proposed two options for the 691 RED active queue management algorithm: packet mode and byte mode. 692 Packet mode measured the queue length in packets and dropped (or 693 marked) individual packets with a probability independent of their 694 size. Byte mode measured the queue length in bytes and marked an 695 individual packet with probability in proportion to its size 696 (relative to the maximum packet size). In the paper's outline of 697 further work, the authors stated that they had made no recommendation on 698 whether the queue size should be measured in bytes or packets, but 699 noted that the difference could be significant. 701 When RED was recommended for general deployment in 1998 [RFC2309], 702 the two modes were mentioned, implying that the choice between them was a 703 question of performance, referring to a 1997 email [pktByteEmail] for 704 advice on tuning.
This email clarified that there were in fact two 705 orthogonal choices: whether to measure queue length in bytes or 706 packets (Section 6.1 below) and whether the drop probability of an 707 individual packet should depend on its own size (Section 6.2 below). 709 6.1. Congestion Measurement: Status 711 The choice of which metric to use to measure queue length was left 712 open in RFC2309. It is now well understood that queues for bit- 713 congestible resources should be measured in bytes, and queues for 714 packet-congestible resources should be measured in packets (see 715 Section 4). 717 Where buffers are not configured or legacy buffers cannot be 718 configured to the above guideline, we don't have to make allowances 719 for such legacy in future protocol design. If a bit-congestible 720 buffer is measured in packets, the operator will have set the 721 thresholds mindful of a typical mix of packet sizes. Any AQM 722 algorithm on such a buffer will be oversensitive to high proportions 723 of small packets, e.g. a DoS attack, and undersensitive to high 724 proportions of large packets. But an operator can safely keep such a 725 legacy buffer because any undersensitivity during unusual traffic 726 mixes cannot lead to congestion collapse given the buffer will 727 eventually revert to tail drop, discarding proportionately more large 728 packets. 730 Some modern queue implementations give a choice for setting RED's 731 thresholds in byte-mode or packet-mode. This may merely be an 732 administrator-interface preference, not altering how the queue itself 733 is measured; but on some hardware it does actually change the way the 734 queue is measured. Whether a resource is bit-congestible or packet- 735 congestible is a property of the resource, so an admin SHOULD NOT 736 ever need to, or be able to, configure the way a queue measures 737 itself. 739 We believe the question of whether to measure queues in bytes or 740 packets is fairly well understood these days.
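The guideline above reduces to a tiny sketch (our own illustration; the names are not from any real implementation):

```python
def queue_length(queue, bit_congestible):
    """Measure a queue in the units its congested resource is congested by.

    queue: list of packet sizes in bytes.
    bit_congestible: True for, e.g., a transmission line (measure in bytes);
    False for, e.g., a header-processing engine (measure in packets).
    """
    return sum(queue) if bit_congestible else len(queue)
```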
The only outstanding 741 issues concern how to measure congestion when the queue is bit 742 congestible but the resource is packet congestible or vice versa (see 743 Section 4). But there is no controversy over what should be done. 744 It is just that working out what should be done requires expertise in 745 probability and, even then, it is not always easy to find a 746 practical algorithm to implement it. 748 6.2. Congestion Coding: Status 750 6.2.1. Network Bias when Encoding 752 The previously mentioned email [pktByteEmail] referred to by 753 [RFC2309] said that the choice over whether a packet's own size 754 should affect its drop probability "depends on the dominant end-to- 755 end congestion control mechanisms". [Section 2 argues against this 756 approach, citing the excellent advice in RFC3246.] The referenced 757 email went on to argue that drop probability should depend on the 758 size of the packet being considered for drop if the resource is bit- 759 congestible, but not if it is packet-congestible, and advised that 760 most scarce resources in the Internet were currently bit-congestible. 761 The argument continued that if packet drops were inflated by packet 762 size (byte-mode dropping), "a flow's fraction of the packet drops is 763 then a good indication of that flow's fraction of the link bandwidth 764 in bits per second". This was consistent with a referenced policing 765 mechanism being worked on at the time for detecting unusually high 766 bandwidth flows, eventually published in 1999 [pBox]. [The problem 767 could have been solved by making the policing mechanism count the 768 volume of bytes randomly dropped, not the number of packets.] 770 A few months before RFC2309 was published, an addendum was added to 771 the above archived email referenced from the RFC, in which the final 772 paragraph seemed to partially retract what had previously been said.
773 It clarified that the question of whether the probability of 774 dropping/marking a packet should depend on its size was not related 775 to whether the resource itself was bit congestible, but a completely 776 orthogonal question. However, the only example given had the queue 777 measured in packets but packet drop depended on the byte-size of the 778 packet in question. No example was given the other way round. 780 In 2000, Cnodder et al [REDbyte] pointed out that there was an error 781 in the part of the original 1993 RED algorithm that aimed to 782 distribute drops uniformly, because it didn't correctly take into 783 account the adjustment for packet size. They recommended an 784 algorithm called RED_4 to fix this. But they also recommended a 785 further change, RED_5, to adjust drop rate dependent on the square of 786 relative packet size. This was indeed consistent with one stated 787 motivation behind RED's byte mode drop--that we should reverse 788 engineer the network to improve the performance of dominant end-to- 789 end congestion control mechanisms. 791 By 2003, a further change had been made to the adjustment for packet 792 size, this time in the RED algorithm of the ns2 simulator. Instead 793 of taking each packet's size relative to a `maximum packet size' it 794 was taken relative to a `mean packet size', intended to be a static 795 value representative of the `typical' packet size on the link. We 796 have not been able to find a justification for this change in the 797 literature; however, Eddy and Allman conducted experiments [REDbias] 798 that assessed how sensitive RED was to this parameter, amongst other 799 things. No-one seems to have pointed out that this changed algorithm 800 can often lead to drop probabilities of greater than 1 [which should 801 ring alarm bells hinting that there's a mistake in the theory 802 somewhere]. On 10-Nov-2004, this variant of byte-mode packet drop 803 was made the default in the ns2 simulator.
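The flaw just noted is easy to reproduce. A hedged sketch (parameter names ours) of an ns2-style adjustment relative to a static `mean packet size':

```python
def byte_mode_drop_prob(p, pkt_size, mean_pkt_size=1000):
    """Inflate the base drop probability p by packet size relative to a
    configured static 'mean' size, in the style of ns2's byte-mode drop."""
    return p * pkt_size / mean_pkt_size

# Any packet larger than the configured mean inflates p, so a 1500B
# packet against a 1000B mean turns a base p of 0.8 into a nominal
# drop probability of 1.2 -- greater than 1.
```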
805 The byte-mode drop variant of RED is, of course, not the only 806 possible bias towards small packets in queueing algorithms. We have 807 already mentioned that tail-drop queues naturally tend to lock out 808 large packets once they are full. But also queues with fixed sized 809 buffers reduce the probability that small packets will be dropped if 810 (and only if) they allow small packets to borrow buffers from the 811 pools for larger packets. As was explained in Section 4.1.1 on fixed 812 size buffer carving, borrowing effectively makes the maximum queue 813 size for small packets greater than that for large packets, because 814 more buffers can be used by small packets while fewer will fit large 815 packets. 817 However, in itself, the bias towards small packets caused by buffer 818 borrowing is perfectly correct. Lower drop probability for small 819 packets is legitimate in buffer borrowing schemes, because small 820 packets genuinely congest the machine's buffer memory less than large 821 packets, given they can fit in more spaces. The bias towards small 822 packets is not artificially added (as it is in RED's byte-mode drop 823 algorithm); it merely reflects the reality of the way fixed buffer 824 memory gets congested. Incidentally, the bias towards small packets 825 from buffer borrowing is nothing like as large as that of RED's byte- 826 mode drop. 828 Nonetheless, fixed-buffer memory with tail drop is still prone to 829 lock out large packets, purely because of the tail-drop aspect. So a 830 good AQM algorithm like RED with packet-mode drop should be used with 831 fixed buffer memories where possible. If RED is too complicated to 832 implement with multiple fixed buffer pools, the minimum necessary to 833 prevent large packet lock-out is to ensure smaller packets never use 834 the last available buffer in any of the pools for larger packets. 836 6.2.2.
Transport Bias when Decoding 838 The above proposals to alter the network layer to give a bias towards 839 smaller packets have largely carried on outside the IETF process 840 (unless one counts a reference in an informational RFC to an archived 841 email!), whereas, within the IETF, there are many different 842 proposals to alter transport protocols to achieve the same goals, 843 i.e. either to make the flow bit-rate take account of packet size, or 844 to protect control packets from loss. This memo argues that altering 845 transport protocols is the more principled approach. 847 One recently approved experimental RFC adapts its transport layer 848 protocol to take account of packet sizes relative to typical TCP 849 packet sizes: it proposes a new small-packet variant of TCP- 850 friendly rate control [RFC3448] called TFRC-SP [RFC4828]. 851 Essentially, it proposes a rate equation that inflates the flow rate 852 by the ratio of a typical TCP segment size (1500B including TCP 853 header) over the actual segment size [PktSizeEquCC]. (There are also 854 other important differences of detail relative to TFRC, such as using 855 virtual packets [CCvarPktSize] to avoid responding to multiple losses 856 per round trip and using a minimum inter-packet interval.) 858 Section 4.5.1 of this TFRC-SP spec discusses the implications of 859 operating in an environment where queues have been configured to drop 860 smaller packets with proportionately lower probability than larger 861 ones. But it only discusses TCP operating in such an environment, 862 only mentioning TFRC-SP briefly when discussing how to define 863 fairness with TCP. And it only discusses the byte-mode dropping 864 version of RED as it was before Cnodder et al pointed out it didn't 865 sufficiently bias towards small packets to make TCP independent of 866 packet size.
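The headline idea of TFRC-SP can be sketched as follows (a deliberate simplification of [RFC4828], which also uses virtual packets and a minimum inter-packet interval as noted above; names are ours):

```python
TYPICAL_SEGMENT = 1500  # bytes, including headers, per the description above

def tfrc_sp_rate(tfrc_rate_bps, segment_size_bytes):
    """Inflate the standard TFRC allowed sending rate by the ratio of a
    typical TCP segment size to the flow's actual segment size."""
    return tfrc_rate_bps * TYPICAL_SEGMENT / segment_size_bytes

# A flow of 500B segments is allowed 3x the bit rate that the standard
# TFRC equation would give it at the same loss level.
```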
868 So the TFRC-SP spec doesn't address the issue of which of the network 869 or the transport _should_ handle fairness between different packet 870 sizes. In its Appendix B.4 it discusses the possibility of both 871 TFRC-SP and some network buffers duplicating each other's attempts to 872 deliberately bias towards small packets. But the discussion is not 873 conclusive, instead reporting simulations of many of the 874 possibilities in order to assess performance but not recommending any 875 particular course of action. 877 The paper originally proposing TFRC with virtual packets (VP-TFRC) 878 [CCvarPktSize] proposed that there should perhaps be two variants to 879 cater for the different variants of RED. However, as the TFRC-SP 880 authors point out, there is no way for a transport to know whether 881 some queues on its path have deployed RED with byte-mode packet drop 882 (except if an exhaustive survey found that no-one has deployed it!-- 883 see Section 6.2.4). Incidentally, VP-TFRC also proposed that byte- 884 mode RED dropping should really square the packet size compensation 885 factor (like that of RED_5, but apparently unaware of it). 887 Pre-congestion notification [I-D.ietf-pcn-architecture] is a proposal 888 to use a virtual queue for AQM marking for packets within one 889 Diffserv class in order to give early warning prior to any real 890 queuing. The proposed PCN marking algorithms have been designed not 891 to take account of packet size when forwarding through queues. 892 Instead the general principle has been to take account of the sizes 893 of marked packets when monitoring the fraction of marking at the edge 894 of the network. 896 6.2.3. Making Transports Robust against Control Packet Losses 898 Recently, two drafts have proposed changes to TCP that make it more 899 robust against losing small control packets [I-D.ietf-tcpm-ecnsyn] 900 [I-D.floyd-tcpm-ackcc]. 
In both cases they note that the case for 901 these TCP changes would be weaker if RED were biased against dropping 902 small packets. We argue here that these two proposals are a safer 903 and more principled way to achieve TCP performance improvements than 904 reverse engineering RED to benefit TCP. 906 Although no proposals exist as far as we know, it would also be 907 possible and perfectly valid to make control packets robust against 908 drop by using their 909 Diffserv code point [RFC2474] to explicitly request a scheduling class 910 with a lower drop probability. 912 The re-ECN protocol proposal [Re-TCP] is designed so that transports 913 can be made more robust against losing control packets. It gives 914 queues an incentive to optionally give preference against drop to 915 packets with the 'feedback not established' codepoint in the proposed 916 'extended ECN' field. Senders have incentives to use this codepoint 917 sparingly, but they can use it on control packets to reduce their 918 chance of being dropped. For instance, the proposed modification to 919 TCP for re-ECN uses this codepoint on the SYN and SYN-ACK. 921 Although not brought to the IETF, a simple proposal from Wischik 922 [DupTCP] suggests that the first three packets of every TCP flow 923 should be routinely duplicated after a short delay. It shows that 924 this would greatly improve the chances of short flows completing 925 quickly, but it would hardly increase traffic levels on the Internet, 926 because Internet bytes have always been concentrated in the large 927 flows. It further shows that the performance of many typical 928 applications depends on completion of long serial chains of short 929 messages. It argues that, given most of the value people get from 930 the Internet is concentrated within short flows, this simple 931 expedient would greatly increase the value of the best-efforts 932 Internet at minimal cost. 934 6.2.4.
Congestion Coding: Summary of Status 936 +-----------+----------------+-----------------+--------------------+ 937 | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | 938 | cc | mode drop) | byte mode drop) | mode drop) | 939 +-----------+----------------+-----------------+--------------------+ 940 | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | 941 | TFRC | | | | 942 | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | 943 +-----------+----------------+-----------------+--------------------+ 945 Table 1: Dependence of flow bit-rate per RTT on packet size s and 946 drop rate p when network and/or transport bias towards small packets 947 to varying degrees 949 Table 1 aims to summarise the positions we may now be in. Each 950 column shows a different possible AQM behaviour in different queues 951 in the network, using the terminology of Cnodder et al outlined 952 earlier (RED_1 is basic RED with packet-mode drop). Each row shows a 953 different transport behaviour: TCP [RFC2581] and TFRC [RFC3448] on 954 the top row with TFRC-SP [RFC4828] below. Suppressing all 955 inessential details, the table shows that independence from packet 956 size should either be achievable by not altering the TCP transport in 957 a RED_5 network, or by using the small packet TFRC-SP transport in a 958 network without any byte-mode dropping RED (top right and bottom 959 left). Top left is the `do nothing' scenario, while bottom right is 960 the `do-both' scenario in which bit-rate would become far too biased 961 towards small packets. Of course, if any form of byte-mode dropping 962 RED has been deployed on a selection of congested queues, each path 963 will present a different hybrid scenario to its transport. 965 Regardless, we can see that the linear byte-mode drop column in the 966 middle considerably complicates the Internet. It's a half-way house 967 that doesn't bias enough towards small packets even if one believes 968 the network should be doing the biasing.
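The entries in Table 1 follow from substituting each AQM's size-adjusted drop probability into the TCP-style dependence bit-rate ~ s/sqrt(p). A small sketch (our own check, dropping all constant factors) confirms, for instance, that under RED_5 the packet-size terms cancel:

```python
from math import sqrt

def tcp_bitrate(s, p):
    """TCP/TFRC-style dependence: bit-rate per RTT ~ s/sqrt(p)."""
    return s / sqrt(p)

def effective_p(p, s, s_max, mode):
    """Drop probability seen at the queue under each variant of Cnodder et al."""
    if mode == 'RED_1':                 # packet-mode drop: size ignored
        return p
    if mode == 'RED_4':                 # linear byte-mode drop
        return p * s / s_max
    if mode == 'RED_5':                 # square byte-mode drop
        return p * (s / s_max) ** 2
    raise ValueError(mode)

# Under RED_5, two TCPs with different segment sizes get the same
# bit-rate (top-right cell: 1/sqrt(p), independent of s).
s_max, p = 1500, 0.01
r_small = tcp_bitrate(500, effective_p(p, 500, s_max, 'RED_5'))
r_large = tcp_bitrate(1500, effective_p(p, 1500, s_max, 'RED_5'))
```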
We argue below that _all_ 969 network layer bias towards small packets should be turned off--if 970 indeed any equipment vendors have implemented it--leaving packet size 971 bias solely as the preserve of the transport layer (solely the 972 leftmost, packet-mode drop column). 974 A survey has been conducted of 84 vendors to assess how widely drop 975 probability based on packet size has been implemented in RED. Prior 976 to the survey, an individual approach to Cisco received confirmation 977 that, having checked the code-base for each of the product ranges, 978 Cisco has not implemented any discrimination based on packet size in 979 any AQM algorithm in any of its products. Also an individual 980 approach to Alcatel-Lucent drew a confirmation that it was very 981 likely that none of their products contained RED code that 982 implemented any packet-size bias. 984 Turning to our more formal survey (Table 2), about 19% of those 985 surveyed have replied so far, giving a sample size of 16. Although 986 we do not have permission to identify the respondents, we can say 987 that those that have responded include most of the larger vendors, 988 covering a large fraction of the market. They range across the large 989 network equipment vendors at L3 & L2, firewall vendors, wireless 990 equipment vendors, as well as large software businesses with a small 991 selection of networking products. So far, all those who have 992 responded have confirmed that they have not implemented the variant 993 of RED with drop dependent on packet size (2 are fairly sure they 994 haven't but need to check more thoroughly). 996 +-------------------------------+----------------+-----------------+ 997 | Response | No. 
of vendors | %age of vendors | 998 +-------------------------------+----------------+-----------------+ 999 | Not implemented | 14 | 17% | 1000 | Not implemented (probably) | 2 | 2% | 1001 | Implemented | 0 | 0% | 1002 | No response | 68 | 81% | 1003 | Total companies/orgs surveyed | 84 | 100% | 1004 +-------------------------------+----------------+-----------------+ 1006 Table 2: Vendor Survey on byte-mode drop variant of RED (lower drop 1007 probability for small packets) 1009 Where reasons have been given, the most prevalent has been the extra 1010 complexity of packet-size bias code, though one vendor had a more principled 1011 reason for avoiding it--similar to the argument of this document. We 1012 have established that Linux does not implement RED with packet size 1013 drop bias, although we have not investigated a wider range of open 1014 source code. 1016 Finally, we repeat that RED's byte mode drop is not the only way to 1017 bias towards small packets--tail-drop tends to lock out large packets 1018 very effectively. Our survey was of vendor implementations, so we 1019 cannot be certain about operator deployment. But we believe many 1020 queues in the Internet are still tail-drop. The author's own company (BT) has 1021 widely deployed RED, but there are bound to be many tail-drop queues, 1022 particularly in access network equipment and on middleboxes like 1023 firewalls, where RED is not always available. Routers using a memory 1024 architecture based on fixed size buffers with borrowing may also 1025 still be prevalent in the Internet. As explained in Section 6.2.1, 1026 these also provide a marginal (but legitimate) bias towards small 1027 packets. So even though RED byte-mode drop is not prevalent, it is 1028 likely there is still some bias towards small packets in the Internet 1029 due to tail drop and fixed buffer borrowing. 1031 7. Outstanding Issues and Next Steps 1033 7.1.
Bit-congestible World 1035 For a connectionless network with only bit-congestible resources, we 1036 believe the recommended position is now unarguably clear--that the 1037 network should not make allowance for packet sizes and the transport 1038 should. This leaves two outstanding issues: 1040 o How to handle any legacy of AQM with byte-mode drop already 1041 deployed; 1043 o The need to start a programme to update transport congestion 1044 control protocol standards to take account of packet size. 1046 The sample of returns from our vendor survey (Section 6.2.4) suggests 1047 that byte-mode packet drop seems not to be implemented at all, let 1048 alone deployed; or if it is, deployment is likely to be very sparse. 1049 Therefore, we do not really need a migration strategy from all but 1050 nothing to nothing. 1052 A programme of standards updates to take account of packet size in 1053 transport congestion control protocols has started with TFRC-SP 1054 [RFC4828], while weighted TCPs implemented in the research community 1055 [WindowPropFair] could form the basis of a future change to TCP 1056 congestion control [RFC2581] itself. 1058 7.2. Bit- & Packet-congestible World 1060 Nonetheless, a connectionless network with both bit-congestible and 1061 packet-congestible resources is a different matter. If we believe we 1062 should allow for this possibility in the future, this space contains 1063 a truly open research issue. 1065 The idealised wire protocol coding described in Section 5 requires at 1066 least two flags for congestion of bit-congestible and packet- 1067 congestible resources. This hides a fundamental problem--much more 1068 fundamental than whether we can magically create header space for yet 1069 another ECN flag in IPv4, or whether it would work while being 1070 deployed incrementally. A congestion notification protocol must 1071 survive a transition from low levels of congestion to high.
Marking 1072 two states is feasible with explicit marking, but much harder if 1073 packets are dropped. Also, it will not always be cost-effective to 1074 implement AQM at every low level resource, so drop will often have to 1075 suffice. Distinguishing drop from delivery naturally provides just 1076 one congestion flag--it is hard to drop a packet in two ways that are 1077 distinguishable remotely. This is a similar problem to that of 1078 distinguishing wireless transmission losses from congestive losses. 1080 We should also note that, strictly, packet-congestible resources are 1081 actually cycle-congestible because load also depends on the 1082 complexity of each look-up and whether the pattern of arrivals is 1083 amenable to caching or not. Further, this reminds us that any 1084 solution must not require a forwarding engine to use excessive 1085 processor cycles in order to decide how to say it has no spare 1086 processor cycles. 1088 The problem of signalling packet processing congestion is not 1089 pressing, as most if not all Internet resources are designed to be 1090 bit-congestible before packet processing starts to congest. However, 1091 given the IRTF ICCRG has set itself the task of reaching consensus on 1092 generic forwarding mechanisms that are necessary and sufficient to 1093 support the Internet's future congestion control requirements 1094 [I-D.irtf-iccrg-welzl-congestion-control-open-research], we must not 1095 give this problem no thought at all, just because it is hard and 1096 currently hypothetical. 1098 8. Security Considerations 1100 This draft recommends that queues do not bias drop probability 1101 towards small packets as this creates a perverse incentive for 1102 transports to break down their flows into tiny segments. One of the 1103 benefits of implementing AQM was meant to be to remove this perverse 1104 incentive that drop-tail queues gave to small packets. 
Of course, if 1105 transports really want to make the greatest gains, they don't have to 1106 respond to congestion anyway. But we don't want applications that 1107 are trying to behave to discover that they can go faster by using 1108 smaller packets. 1110 In practice, transports cannot all be trusted to respond to 1111 congestion. So another reason for recommending that queues do not 1112 bias drop probability towards small packets is to avoid the 1113 vulnerability to small packet DDoS attacks that would otherwise 1114 result. One of the benefits of implementing AQM was meant to be to 1115 remove drop-tail's DoS vulnerability to small packets, so we 1116 shouldn't add it back again. 1118 If most queues implemented AQM with byte-mode drop, the resulting 1119 network would amplify the potency of a small packet DDoS attack. At 1120 the first queue the stream of packets would push aside a greater 1121 proportion of large packets, so more of the small packets would 1122 survive to attack the next queue. Thus a flood of small packets 1123 would continue on towards the destination, pushing regular traffic 1124 with large packets out of the way in one queue after the next, but 1125 suffering much less drop itself. 1127 Appendix C explains why the ability of networks to police the 1128 response of _any_ transport to congestion depends on bit-congestible 1129 network resources only doing packet-mode not byte-mode drop. In 1130 summary, it says that making drop probability depend on the size of 1131 the packets that bits happen to be divided into simply encourages the 1132 bits to be divided into smaller packets. Byte-mode drop would 1133 therefore irreversibly complicate any attempt to fix the Internet's 1134 incentive structures. 1136 9. Conclusions 1138 The strong conclusion is that AQM algorithms such as RED SHOULD NOT 1139 use byte-mode drop. 
More generally, the Internet's congestion notification protocols (drop, ECN & PCN) SHOULD take account of packet size when the notification is read by the transport layer, NOT when it is written by the network layer.  This approach offers sufficient and correct congestion information for all known and future transport protocols and also ensures no perverse incentives are created that would encourage transports to use inappropriately small packet sizes.

The alternative of deflating RED's drop probability for smaller packet sizes (byte-mode drop) has no enduring advantages.  It is more complex, it creates the perverse incentive to fragment segments into tiny pieces and it reopens the vulnerability to floods of small packets that drop-tail queues suffered from and AQM was designed to remove.  Byte-mode drop is a change to the network layer that makes allowance for an omission from the design of TCP, effectively reverse engineering the network layer to contrive to make two TCPs with different packet sizes run at equal bit rates (rather than packet rates) under the same path conditions.  It also improves TCP performance by reducing the chance that a SYN or a pure ACK will be dropped, because they are small.  But we SHOULD NOT hack the network layer to improve or fix certain transport protocols.  No matter how predominant a transport protocol is (even if it's TCP), trying to correct for its failings by biasing towards small packets in the network layer creates a perverse incentive to break down all flows from all transports into tiny segments.

So far, our survey of 84 vendors across the industry has drawn responses from about 19%, none of whom have implemented the byte-mode packet drop variant of RED.
Given there appears to be little, if any, installed base, recommending removal of byte-mode drop from RED is possibly only a paper exercise with few, if any, incremental deployment issues.

If a vendor has implemented byte-mode drop, and an operator has turned it on, it is strongly RECOMMENDED that it SHOULD be turned off.  Note that RED as a whole SHOULD NOT be turned off, as without it, a drop-tail queue also biases against large packets.  But note also that turning off byte-mode may alter the relative performance of applications using different packet sizes, so it would be advisable to establish the implications before turning it off.

Instead, the IETF transport area should continue its programme of updating congestion control protocols to take account of packet size and to make transports less sensitive to losing control packets like SYNs and pure ACKs.

NOTE WELL that RED's byte-mode queue measurement is fine, being completely orthogonal to byte-mode drop.  If a RED implementation has a byte-mode but does not specify what sort of byte-mode, it is most probably byte-mode queue measurement, which is fine.  However, if in doubt, the vendor should be consulted.

The above conclusions cater for the Internet as it is today, with most, if not all, resources being primarily bit-congestible.  A secondary conclusion of this memo is that we may see more packet-congestible resources in the future, so research may be needed to extend the Internet's congestion notification (drop or ECN) so that it can handle a mix of bit-congestible and packet-congestible resources.

10.  Acknowledgements

Thank you to Sally Floyd, who gave extensive and useful review comments.  Also thanks for the reviews from Toby Moncaster and Arnaud Jacquet.
I am grateful to Bruce Davie and his colleagues for providing a timely and efficient survey of RED implementation in Cisco's product range.  Also grateful thanks to Toby Moncaster, Will Dormann, John Regnault, Simon Carter and Stefaan De Cnodder who further helped survey the current status of RED implementation and deployment and, finally, thanks to the anonymous individuals who responded.

11.  Comments Solicited

Comments and questions are encouraged and very welcome.  They can be addressed to the IETF Transport Area working group mailing list, and/or to the authors.

Editorial Comments

[Note_Variation]  The algorithm of the byte-mode drop variant of RED switches off any bias towards small packets whenever the smoothed queue length dictates that the drop probability of large packets should be 100%.  In the example in the Introduction, as the large packet drop probability varies around 25% the small packet drop probability will vary around 1%, but with occasional jumps to 100% whenever the instantaneous queue (after drop) manages to sustain a length above the 100% drop point for longer than the queue averaging period.

Appendix A.  Example Scenarios

A.1.  Notation

To prove the two sets of assertions in the idealised wire protocol (Section 5) are true, we will compare two flows with different packet sizes, s_1 and s_2 [bit/pkt], to make sure their transports each see the correct congestion notification.  Initially, within each flow we will take all packets as having equal sizes, but later we will generalise to flows within which packet sizes vary.  A flow's bit rate, x [bit/s], is related to its packet rate, u [pkt/s], by

   x(t) = s.u(t).
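As a quick numeric check of this relation, here is a short sketch using the 1500B and 60B example packet sizes that recur throughout this appendix (the bit rate itself is an illustrative value, not taken from the draft):

```python
# Numeric check of x = s.u for two flows at equal bit rates,
# using the appendix's example packet sizes (bit rate is illustrative).

s_1, s_2 = 1500 * 8, 60 * 8   # packet sizes [bit/pkt]
x = 12_000_000                # equal bit rate for both flows [bit/s]

u_1, u_2 = x / s_1, x / s_2   # packet rates [pkt/s], from x = s.u
print(u_2 / u_1)              # 25.0, i.e. u_2/u_1 = s_1/s_2

# If a resource marks a proportion p_b of packets irrespective of size,
# the marked fraction is p_b whether counted in packets or in bytes,
# because the units cancel in the ratio:
p_b = 0.25
assert (p_b * u_1) / u_1 == (p_b * u_1 * s_1) / (u_1 * s_1) == p_b
```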
We will consider a 2x2 matrix of four scenarios:

   +-----------------------------+------------------+------------------+
   | resource type and           | A) Equal bit     | B) Equal pkt     |
   | congestion level            | rates            | rates            |
   +-----------------------------+------------------+------------------+
   | i) bit-congestible, p_b     | (Ai)             | (Bi)             |
   | ii) pkt-congestible, p_p    | (Aii)            | (Bii)            |
   +-----------------------------+------------------+------------------+

                                 Table 3

A.2.  Bit-congestible resource, equal bit rates (Ai)

Starting with the bit-congestible scenario, for two flows to maintain equal bit rates (Ai) the ratio of the packet rates must be the inverse of the ratio of packet sizes: u_2/u_1 = s_1/s_2.  So, for instance, a flow of 60B packets would have to send 25x more packets to achieve the same bit rate as a flow of 1500B packets.  If a congested resource marks proportion p_b of packets irrespective of size, the ratio of marked packets received by each transport will still be the same as the ratio of their packet rates, p_b.u_2/p_b.u_1 = s_1/s_2.  So of the 25x more 60B packets sent, 25x more will be marked than in the 1500B packet flow, but 25x more will also go unmarked.

In this scenario, the resource is bit-congestible, so it always uses our idealised bit-congestion field when it marks packets.  Therefore the transport should count marked bytes not packets.  But it doesn't actually matter for ratio-based transports like TCP (Section 5).  The ratio of marked to unmarked bytes seen by each flow will be p_b, as will the ratio of marked to unmarked packets.  Because they are ratios, the units cancel out.

If a flow sent an inconsistent mixture of packet sizes, we have said it should count the ratio of marked and unmarked bytes not packets in order to correctly decode the level of congestion.
But actually, if all it is trying to do is decode p_b, it still doesn't matter.  For instance, imagine the two equal bit rate flows were actually one flow at twice the bit rate sending a mixture of one 1500B packet for every twenty-five 60B packets.  25x more small packets will be marked and 25x more will be unmarked.  The transport can still calculate p_b whether it uses bytes or packets for the ratio.  In general, for any algorithm which works on a ratio of marks to non-marks, either bytes or packets can be counted interchangeably, because the choice cancels out in the ratio calculation.

However, where an absolute target rather than relative volume of congestion caused is important (Section 5), as it is for congestion accountability [Rate_fair_Dis], the transport must count marked bytes not packets, in this bit-congestible case.  Aside from the goal of congestion accountability, this is how the bit rate of a transport can be made independent of packet size: by ensuring the rate of congestion caused is kept to a constant weight [WindowPropFair], rather than merely responding to the ratio of marked and unmarked bytes.

Note the unit of byte-congestion volume is the byte.

A.3.  Bit-congestible resource, equal packet rates (Bi)

If two flows send different packet sizes but at the same packet rate, their bit rates will be in the same ratio as their packet sizes, x_2/x_1 = s_2/s_1.  For instance, a flow sending 1500B packets at the same packet rate as another sending 60B packets will be sending at 25x greater bit rate.  In this case, if a congested resource marks proportion p_b of packets irrespective of size, the ratio of packets received with the byte-congestion field marked by each transport will be the same, p_b.u_2/p_b.u_1 = 1.

Because the byte-congestion field is marked, the transport should count marked bytes not packets.
But because each flow sends consistently sized packets it still doesn't matter for ratio-based transports.  The ratio of marked to unmarked bytes seen by each flow will be p_b, as will the ratio of marked to unmarked packets.  Therefore, if the congestion control algorithm is only concerned with the ratio of marked to unmarked packets (as is TCP), both flows will be able to decode p_b correctly whether they count packets or bytes.

But if the absolute volume of congestion is important, e.g. for congestion accountability, the transport must count marked bytes not packets.  Then the lower bit rate flow using smaller packets will rightly be perceived as causing less byte-congestion even though its packet rate is the same.

If the two flows are mixed into one, of bit rate x_1+x_2, with equal packet rates of each size packet, the ratio p_b will still be measurable by counting the ratio of marked to unmarked bytes (or packets, because the ratio cancels out the units).  However, if the absolute volume of congestion is required, the transport must count the sum of congestion marked bytes, which indeed gives a correct measure of the rate of byte-congestion p_b(x_1 + x_2) caused by the combined bit rate.

A.4.  Pkt-congestible resource, equal bit rates (Aii)

Moving to the case of packet-congestible resources, we now take two flows that send different packet sizes at the same bit rate, but this time the pkt-congestion field is marked by the resource with probability p_p.  As in scenario Ai with the same bit rates but a bit-congestible resource, the flow with smaller packets will have a higher packet rate, so more packets will be both marked and unmarked, but in the same proportion.

This time, the transport should only count marks without taking into account packet sizes.
Transports will get the same result, p_p, by decoding the ratio of marked to unmarked packets in either flow.

If one flow imitates the two flows merged together, the bit rate will double, with more small packets than large.  The ratio of marked to unmarked packets will still be p_p.  But if the absolute number of pkt-congestion marked packets is counted, it will accumulate at the combined packet rate times the marking probability, p_p(u_1+u_2), 26x faster than packet congestion accumulates in the single 1500B packet flow of our example, as required.

But if the transport is interested in the absolute amount of packet congestion, it should just count how many marked packets arrive.  For instance, a flow sending 60B packets will see 25x more marked packets than one sending 1500B packets at the same bit rate, because it is sending more packets through a packet-congestible resource.

Note the unit of packet congestion is packets.

A.5.  Pkt-congestible resource, equal packet rates (Bii)

Finally, if two flows with the same packet rate pass through a packet-congestible resource, they will both suffer the same proportion of marking, p_p, irrespective of their packet sizes.  On detecting that the pkt-congestion field is marked, the transport should count packets, and it will be able to extract the ratio p_p of marked to unmarked packets from both flows, irrespective of packet sizes.

Even if the transport is monitoring the absolute amount of packet congestion over a period, it will still see the same amount of packet congestion from either flow.

And if the two equal packet rates of different size packets are mixed together in one flow, the packet rate will double, so the absolute volume of packet-congestion will accumulate at twice the rate of either flow, 2p_p.u_1 = p_p(u_1+u_2).

Appendix B.
Congestion Notification Definition: Further Justification

In Section 3 on the definition of congestion notification, load not capacity was used as the denominator.  This also has a subtle significance in the related debate over the design of new transport protocols--typical new protocol designs (e.g. in XCP [I-D.falk-xcp-spec] & Quick-Start [RFC4782]) expect the sending transport to communicate its desired flow rate to the network and network elements to progressively subtract from this so that the achievable flow rate emerges at the receiving transport.

Congestion notification with total load in the denominator can serve a similar purpose (though in retrospect, not in advance like XCP & Quick-Start).  Congestion notification is a dimensionless fraction, but each source can extract the necessary rate information from it because it already knows what its own rate is.  Even though congestion notification doesn't communicate a rate explicitly, from each source's point of view congestion notification represents the fraction of the rate it was sending a round trip ago that couldn't (or wouldn't) be served by available resources.  After they were sent, all these fractions of each source's offered load added up to the aggregate fraction of offered load seen by the congested resource.  So, the source can also know the total excess rate by multiplying total load by congestion level.  Therefore congestion notification, as one scale-free dimensionless fraction, implicitly communicates the instantaneous excess flow rate, albeit an RTT ago.

Appendix C.  Byte-mode Drop Complicates Policing Congestion Response

This appendix explains why the ability of networks to police the response of _any_ transport to congestion depends on bit-congestible network resources only doing packet-mode not byte-mode drop.
To be able to police a transport's response to congestion when fairness can only be judged over time and over all an individual's flows, the policer has to have an integrated view of all the congestion an individual (not just one flow) has caused due to all traffic entering the Internet from that individual.  This is termed congestion accountability.

But with byte-mode drop, one dropped or marked packet is not necessarily equivalent to another unless you know the MTU that caused it to be dropped/marked.  To have an integrated view of a user, we believe congestion policing has to be located at an individual's attachment point to the Internet [Re-TCP].  But from there it cannot know the MTU of each remote queue that caused each drop/mark.  Therefore it cannot take an integrated approach to policing all the responses to congestion of all the transports of one individual.  Therefore it cannot police anything.

The security/incentive argument _for_ packet-mode drop is similar.  Firstly, confining RED to packet-mode drop would not preclude bottleneck policing approaches such as [pBox], as it seems likely they could work just as well by monitoring the volume of dropped bytes rather than packets.  Secondly, packet-mode dropping/marking naturally allows the congestion notification of packets to be globally meaningful without relying on MTU information held elsewhere.

Because we recommend that a dropped/marked packet should be taken to mean that all the bytes in the packet are dropped/marked, a policer can remain robust against bits being re-divided into different size packets or across different size flows [Rate_fair_Dis].  Therefore policing would work naturally with just simple packet-mode drop in RED.
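The robustness claimed in the last paragraph can be illustrated with a small sketch (illustrative numbers and helper names, not from the draft): a policer that measures congestion volume as the sum of bytes in marked packets gets the same expected reading however the same bytes are divided into packets, provided marking is packet-mode (every packet marked with the same probability irrespective of size).

```python
# Sketch: congestion volume (sum of marked bytes) is invariant, in
# expectation, under re-dividing the same bytes into different packet
# sizes, when marking is packet-mode.  Names and numbers illustrative.
import random

random.seed(1)

def congestion_volume(pkt_sizes, p):
    """Policer's measure: bytes in congestion-marked packets, where
    each packet is marked with probability p irrespective of size."""
    return sum(s for s in pkt_sizes if random.random() < p)

p = 0.1
# The same 1.5 MB divided two ways:
large = [1500] * 1000    #  1000 x 1500B packets
small = [60] * 25000     # 25000 x   60B packets

# Expected congestion volume is p * 1_500_000 = 150000 bytes in both
# cases; the sampled values below differ only by statistical noise.
print(congestion_volume(large, p))
print(congestion_volume(small, p))
```

This is why re-dividing bits into smaller packets gains a sender nothing against such a policer, whereas under byte-mode drop the marking probability itself would shrink with packet size.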
In summary, making drop probability depend on the size of the packets that bits happen to be divided into simply encourages the bits to be divided into smaller packets.  Byte-mode drop would therefore irreversibly complicate any attempt to fix the Internet's incentive structures.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, April 1998.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3426]  Floyd, S., "General Architectural and Policy
              Considerations", RFC 3426, November 2002.

   [RFC3448]  Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 3448, January 2003.

   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
              (TFRC): The Small-Packet (SP) Variant", RFC 4828,
              April 2007.

   [RFC5033]  Floyd, S. and M. Allman, "Specifying New Congestion
              Control Algorithms", BCP 133, RFC 5033, August 2007.

12.2.  Informative References

   [CCvarPktSize]
              Widmer, J., Boutremans, C., and J-Y.
              Le Boudec, "Congestion Control for Flows with Variable
              Packet Size", ACM CCR 34(2) 137--151, 2004.

   [DupTCP]   Wischik, D., "Short messages", Royal Society workshop on
              networks: modelling and control, September 2007.

   [ECNFixedWireless]
              Siris, V., "Resource Control for Elastic Traffic in CDMA
              Networks", Proc. ACM MOBICOM'02, September 2002.

   [Evol_cc]  Gibbens, R. and F. Kelly, "Resource pricing and the
              evolution of congestion control", Automatica 35(12)
              1969--1985, December 1999.

   [I-D.falk-xcp-spec]
              Falk, A., "Specification for the Explicit Control
              Protocol (XCP)", draft-falk-xcp-spec-03 (work in
              progress), July 2007.

   [I-D.floyd-tcpm-ackcc]
              Floyd, S., "Adding Acknowledgement Congestion Control to
              TCP", draft-floyd-tcpm-ackcc-02 (work in progress),
              November 2007.

   [I-D.ietf-pcn-architecture]
              Eardley, P., "Pre-Congestion Notification Architecture",
              draft-ietf-pcn-architecture-03 (work in progress),
              February 2008.

   [I-D.ietf-tcpm-ecnsyn]
              Floyd, S., "Adding Explicit Congestion Notification (ECN)
              Capability to TCP's SYN/ACK Packets",
              draft-ietf-tcpm-ecnsyn-05 (work in progress),
              February 2008.

   [I-D.ietf-tcpm-rfc2581bis]
              Allman, M., "TCP Congestion Control",
              draft-ietf-tcpm-rfc2581bis-03 (work in progress),
              September 2007.

   [I-D.irtf-iccrg-welzl-congestion-control-open-research]
              Papadimitriou, D., "Open Research Issues in Internet
              Congestion Control",
              draft-irtf-iccrg-welzl-congestion-control-open-research-00
              (work in progress), July 2007.

   [IOSArch]  Bollapragada, V., White, R., and C. Murphy, "Inside Cisco
              IOS Software Architecture", Cisco Press: CCIE
              Professional Development, ISBN13: 978-1-57870-181-0,
              July 2000.

   [MulTCP]   Crowcroft, J. and Ph.
              Oechslin, "Differentiated End to End Internet Services
              using a Weighted Proportional Fair Sharing TCP", CCR
              28(3) 53--69, July 1998.

   [PCNcharter]
              IETF, "Congestion and Pre-Congestion Notification (pcn)",
              IETF w-g charter, Feb 2007.

   [PktSizeEquCC]
              Vasallo, P., "Variable Packet Size Equation-Based
              Congestion Control", ICSI Technical Report tr-00-008,
              2000.

   [RED93]    Floyd, S. and V. Jacobson, "Random Early Detection (RED)
              gateways for Congestion Avoidance", IEEE/ACM Transactions
              on Networking 1(4) 397--413, August 1993.

   [REDbias]  Eddy, W. and M. Allman, "A Comparison of RED's Byte and
              Packet Modes", Computer Networks 42(3) 261--280,
              June 2003.

   [REDbyte]  De Cnodder, S., Elloumi, O., and K. Pauwels, "RED
              behavior with different packet sizes", Proc. 5th IEEE
              Symposium on Computers and Communications (ISCC)
              793--799, July 2000.

   [RFC3714]  Floyd, S. and J. Kempf, "IAB Concerns Regarding
              Congestion Control for Voice Traffic in the Internet",
              RFC 3714, March 2004.

   [RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti,
              "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [Rate_fair_Dis]
              Briscoe, B., "Flow Rate Fairness: Dismantling a
              Religion", ACM CCR 37(2) 63--74, April 2007.

   [Re-TCP]   Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
              "Re-ECN: Adding Accountability for Causing Congestion to
              TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in
              progress), January 2008.

   [WindowPropFair]
              Siris, V., "Service Differentiation and Performance of
              Weighted Window-Based Congestion Control and Packet
              Marking Algorithms in ECN Networks", Computer
              Communications 26(4) 314--326, 2002.

   [gentle_RED]
              Floyd, S., "Recommendation on using the "gentle_" variant
              of RED", Web page, March 2000.

   [pBox]     Floyd, S. and K.
              Fall, "Promoting the Use of End-to-End Congestion Control
              in the Internet", IEEE/ACM Transactions on Networking
              7(4) 458--472, August 1999.

   [pktByteEmail]
              Floyd, S., "RED: Discussions of Byte and Packet Modes",
              email, March 1997.

Author's Address

   Bob Briscoe
   BT & UCL
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.
   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgments

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).  This document was produced
   using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
   RFC-2629 XML format.