Transport Area Working Group                                 B. Briscoe
Internet-Draft                                                        BT
Updates: 2309 (if approved)                                    J. Manner
Intended status: BCP                                    Aalto University
Expires: November 24, 2013                                  May 23, 2013

                Byte and Packet Congestion Notification
                  draft-ietf-tsvwg-byte-pkt-congest-10

Abstract

   This document provides recommendations of best current practice for
   dropping or marking packets using any active queue management (AQM)
   algorithm, such as random early detection (RED), BLUE, pre-
   congestion notification (PCN), etc.  We give three strong
   recommendations: (1) packet size should be taken into account when
   transports read and respond to congestion indications, (2) packet
   size should not be taken into account when network equipment
   creates congestion signals (marking, dropping), and therefore (3)
   in the specific case of RED, the byte-mode packet drop variant that
   drops fewer small packets should not be used.  This memo updates
   RFC 2309 to deprecate deliberate preferential treatment of small
   packets in AQM algorithms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on November 24, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Terminology and Scoping
      1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop
   2. Recommendations
      2.1. Recommendation on Queue Measurement
      2.2. Recommendation on Encoding Congestion Notification
      2.3. Recommendation on Responding to Congestion
      2.4. Recommendation on Handling Congestion Indications when
           Splitting or Merging Packets
   3. Motivating Arguments
      3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets
      3.2. Small != Control
      3.3. Transport-Independent Network
      3.4. Partial Deployment of AQM
      3.5. Implementation Efficiency
   4. A Survey and Critique of Past Advice
      4.1. Congestion Measurement Advice
           4.1.1. Fixed Size Packet Buffers
           4.1.2. Congestion Measurement without a Queue
      4.2. Congestion Notification Advice
           4.2.1. Network Bias when Encoding
           4.2.2. Transport Bias when Decoding
           4.2.3. Making Transports Robust against Control Packet
                  Losses
           4.2.4. Congestion Notification: Summary of Conflicting
                  Advice
   5. Outstanding Issues and Next Steps
      5.1. Bit-congestible Network
      5.2. Bit- & Packet-congestible Network
   6. Security Considerations
   7. IANA Considerations
   8. Conclusions
   9. Acknowledgements
   10. Comments Solicited
   11. References
       11.1. Normative References
       11.2. Informative References
   Appendix A. Survey of RED Implementation Status
   Appendix B. Sufficiency of Packet-Mode Drop
       B.1. Packet-Size (In)Dependence in Transports
       B.2. Bit-Congestible and Packet-Congestible Indications
   Appendix C. Byte-mode Drop Complicates Policing Congestion Response
   Appendix D. Changes from Previous Versions

1. Introduction

   This memo concerns how we should correctly scale congestion control
   functions with respect to packet size for the long term.  It also
   recognises that expediency may be necessary to deal with existing
   widely deployed protocols that don't live up to the long-term goal.

   When signalling congestion, the problem of how (and whether) to
   take packet sizes into account has exercised the minds of
   researchers and practitioners for as long as active queue
   management (AQM) has been discussed.  Indeed, one reason AQM was
   originally introduced was to reduce the lock-out effects that small
   packets can have on large packets in drop-tail queues.  This memo
   aims to state the principles we should be using and to outline how
   these principles will affect future protocol design, taking into
   account the existing deployments we have already.

   The question of whether to take into account packet size arises at
   three stages in the congestion notification process:

   Measuring congestion:  When a congested resource measures locally
      how congested it is, should it measure its queue length in
      time, bytes or packets?

   Encoding congestion notification into the wire protocol:  When a
      congested network resource signals its level of congestion,
      should it drop / mark each packet dependent on the size of the
      particular packet in question?

   Decoding congestion notification from the wire protocol:  When a
      transport interprets the notification in order to decide how
      much to respond to congestion, should it take into account the
      size of each missing or marked packet?

   Consensus has emerged over the years concerning the first stage:
   whether queues should be measured in bytes or packets, if they
   cannot be measured in time.  Section 2.1 of this memo records this
   consensus in the RFC Series.  In summary, the choice solely depends
   on whether the resource is congested by bytes or packets.

   The controversy is mainly around the last two stages: whether to
   allow for the size of the specific packet notifying congestion
   i) when the network encodes or ii) when the transport decodes the
   congestion notification.

   Currently, the RFC series is silent on this matter other than a
   paper trail of advice referenced from [RFC2309], which
   conditionally recommends byte-mode (packet-size dependent) drop
   [pktByteEmail].

   Reducing drop of small packets certainly has some tempting
   advantages: i) it drops fewer control packets, which tend to be
   small, and ii) it makes TCP's bit-rate less dependent on packet
   size.
   However, there are ways of addressing these issues at the transport
   layer, rather than reverse engineering network forwarding to fix
   the problems.

   This memo updates [RFC2309] to deprecate deliberate preferential
   treatment of small packets in AQM algorithms.  It recommends that
   (1) packet size should be taken into account when transports read
   congestion indications, and (2) not when network equipment writes
   them.  This memo also adds to the congestion control principles
   enumerated in BCP 41 [RFC2914].

   In the particular case of Random Early Detection (RED), this means
   that the byte-mode packet drop variant should not be used to drop
   fewer small packets, because that creates a perverse incentive for
   transports to use tiny segments, consequently also opening up a
   DoS vulnerability.  Fortunately, none of the RED implementers who
   responded to our admittedly limited survey (Section 4.2.4) has
   followed the earlier advice to use byte-mode drop, so the position
   this memo argues for seems to exist in implementations already.

   However, at the transport layer, TCP congestion control is a
   widely deployed protocol that doesn't scale with packet size.  To
   date this hasn't been a significant problem because most TCP
   implementations have been used with similar packet sizes.  But, as
   we design new congestion control mechanisms, this memo recommends
   that we should build in scaling with packet size rather than
   assuming we should follow TCP's example.

   This memo continues as follows.  First it discusses terminology
   and scoping.  Section 2 gives the concrete formal recommendations,
   followed by motivating arguments in Section 3.  We then critically
   survey the advice given previously in the RFC series and the
   research literature (Section 4), referring to an assessment of
   whether or not this advice has been followed in production
   networks (Appendix A).  To wrap up, outstanding issues are
   discussed that will need resolution both to inform future protocol
   designs and to handle legacy (Section 5).  Then security issues
   are collected together in Section 6 before conclusions are drawn
   in Section 8.  The interested reader can find discussion of more
   detailed issues on the theme of byte vs. packet in the appendices.

   This memo intentionally includes a non-negligible amount of
   material on the subject.  For the busy reader, Section 2
   summarises the recommendations for the Internet community.

1.1. Terminology and Scoping

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in [RFC2119].

   This memo applies to the design of all AQM algorithms, for
   example, Random Early Detection (RED) [RFC2309], BLUE [BLUE02],
   Pre-Congestion Notification (PCN) [RFC5670], Controlled Delay
   (CoDel) [CoDel12] and the Proportional Integral controller
   Enhanced (PIE) [I-D.pan-tsvwg-pie].  Throughout, RED is used as a
   concrete example because it is a widely known and deployed AQM
   algorithm.  There is no intention to imply that the advice is any
   less applicable to the other algorithms, nor that RED is
   preferred.
   Congestion Notification:  Congestion notification is a changing
      signal that aims to communicate the probability that the
      network resource(s) will not be able to forward the level of
      traffic load offered (or that there is an impending risk that
      they will not be able to).

      The `impending risk' qualifier is added, because AQM systems
      set a virtual limit smaller than the actual limit to the
      resource, then notify when this virtual limit is exceeded in
      order to avoid uncontrolled congestion of the actual capacity.

      Congestion notification communicates a real number bounded by
      the range [0, 1].  This ties in with the most well-understood
      measure of congestion notification: drop probability.

   Explicit and Implicit Notification:  The byte vs. packet dilemma
      concerns congestion notification irrespective of whether it is
      signalled implicitly by drop or using explicit congestion
      notification (ECN [RFC3168] or PCN [RFC5670]).  Throughout this
      document, unless clear from the context, the term marking will
      be used to mean notifying congestion explicitly, while
      congestion notification will be used to mean notifying
      congestion either implicitly by drop or explicitly by marking.

   Bit-congestible vs. Packet-congestible:  If the load on a resource
      depends on the rate at which packets arrive, it is called
      packet-congestible.  If the load depends on the rate at which
      bits arrive, it is called bit-congestible.

      Examples of packet-congestible resources are route look-up
      engines and firewalls, because load depends on how many packet
      headers they have to process.  Examples of bit-congestible
      resources are transmission links, radio power and most buffer
      memory, because the load depends on how many bits they have to
      transmit or store.  Some machine architectures use fixed size
      packet buffers, so buffer memory in these cases is packet-
      congestible (see Section 4.1.1).

      The path through a machine will typically encounter both
      packet-congestible and bit-congestible resources.  However,
      currently, a design goal of network processing equipment such
      as routers and firewalls is to size the packet-processing
      engine(s) relative to the lines in order to keep packet
      processing uncongested even under worst case packet rates with
      runs of minimum size packets.  Therefore, packet congestion is
      currently rare [RFC6077; S.3.3], but there is no guarantee that
      it will not become more common in future.

      Note that information is generally processed or transmitted
      with a minimum granularity greater than a bit (e.g. octets).
      The appropriate granularity for the resource in question should
      be used, but for the sake of brevity we will talk in terms of
      bytes in this memo.

   Coarser Granularity:  Resources may be congestible at higher
      levels of granularity than bits or packets, for instance
      stateful firewalls are flow-congestible and call-servers are
      session-congestible.  This memo focuses on congestion of
      connectionless resources, but the same principles may be
      applicable for congestion notification protocols controlling
      per-flow and per-session processing or state.
   RED Terminology:  In RED, whether to use packets or bytes when
      measuring queues is called respectively "packet-mode queue
      measurement" or "byte-mode queue measurement".  And whether the
      probability of dropping a particular packet is independent of
      or dependent on its size is called respectively "packet-mode
      drop" or "byte-mode drop".  The terms byte-mode and packet-mode
      should not be used without specifying whether they apply to
      queue measurement or to drop.

1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop

   Taking RED as a well-known example algorithm, a central question
   addressed by this document is whether to recommend RED's packet-
   mode drop variant and to deprecate byte-mode drop.  Table 1
   compares how packet-mode and byte-mode drop affect two flows of
   different size packets.  For each it gives the expected number of
   packets and of bits dropped in one second.  Each example flow runs
   at the same bit-rate of 48Mb/s, but one is broken up into small
   60 byte packets and the other into large 1500 byte packets.

   To keep up the same bit-rate, in one second there are 25 times
   more small packets because they are 25 times smaller.  As can be
   seen from the table, the packet rate is 100,000 small packets
   versus 4,000 large packets per second (pps).

   Parameter              Formula          Small packets  Large packets
   --------------------   --------------   -------------  -------------
   Packet size            s/8              60B            1,500B
   Packet size            s                480b           12,000b
   Bit-rate               x                48Mbps         48Mbps
   Packet-rate            u = x/s          100kpps        4kpps

   Packet-mode Drop
   Pkt loss probability   p                0.1%           0.1%
   Pkt loss-rate          p*u              100pps         4pps
   Bit loss-rate          p*u*s            48kbps         48kbps

   Byte-mode Drop         MTU, M=12,000b
   Pkt loss probability   b = p*s/M        0.004%         0.1%
   Pkt loss-rate          b*u              4pps           4pps
   Bit loss-rate          b*u*s            1.92kbps       48kbps

       Table 1: Example Comparing Packet-mode and Byte-mode Drop

   For packet-mode drop, we illustrate the effect of a drop
   probability of 0.1%, which the algorithm applies to all packets
   irrespective of size.  Because there are 25 times more small
   packets in one second, it naturally drops 25 times more small
   packets, that is 100 small packets but only 4 large packets.  But
   if we count how many bits it drops, there are 48,000 bits in 100
   small packets and 48,000 bits in 4 large packets--the same number
   of bits of small packets as large.

   The packet-mode drop algorithm drops any bit with the same
   probability whether the bit is in a small or a large packet.

   For byte-mode drop, again we use an example drop probability of
   0.1%, but only for maximum size packets (assuming the link MTU is
   1,500B or 12,000b).  The byte-mode algorithm reduces the drop
   probability of smaller packets proportional to their size, making
   the probability that it drops a small packet 25 times smaller at
   0.004%.  But there are 25 times more small packets, so dropping
   them with 25 times lower probability results in dropping the same
   number of packets: 4 drops in both cases.  The 4 small dropped
   packets contain 25 times fewer bits than the 4 large dropped
   packets: 1,920 compared to 48,000.

   The byte-mode drop algorithm drops any bit with a probability
   proportionate to the size of the packet it is in.
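   As a cross-check, the numbers in Table 1 can be reproduced with a
   few lines of arithmetic.  The following fragment (in Python) is
   purely illustrative and is not part of any AQM specification; the
   variable names simply mirror the symbols in the table:

      # Reproduce Table 1: two 48 Mb/s flows, one of 60 B packets
      # and one of 1500 B packets.
      M = 12000          # MTU in bits (1,500 B)
      x = 48e6           # bit-rate of each flow, b/s
      p = 0.001          # drop probability for packet-mode drop

      for size_bytes in (60, 1500):
          s = size_bytes * 8        # packet size in bits
          u = x / s                 # packet rate, packets/s

          # Packet-mode drop: the same p for every packet.
          print(size_bytes, "B, packet-mode:",
                p * u, "pps,", p * u * s / 1e3, "kb/s dropped")

          # Byte-mode drop: probability scaled down with packet size.
          b = p * s / M
          print(size_bytes, "B, byte-mode:  ",
                b * u, "pps,", b * u * s / 1e3, "kb/s dropped")

   Running it prints 100 pps versus 4 pps but an equal 48 kb/s for
   packet-mode drop, and an equal 4 pps but 1.92 kb/s versus 48 kb/s
   for byte-mode drop, as in the table.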
2. Recommendations

   This section gives recommendations related to network equipment in
   Sections 2.1 and 2.2, and in Sections 2.3 and 2.4 we discuss the
   implications for the transport protocols.

2.1. Recommendation on Queue Measurement

   Ideally, an AQM would measure the service time of the queue to
   measure congestion of a resource.  However, service time can only
   be measured as packets leave the queue, where it is not always
   feasible to implement a full AQM algorithm.  To predict the
   service time as packets join the queue, an AQM algorithm needs to
   measure the length of the queue.

   In this case, if the resource is bit-congestible, the AQM
   implementation SHOULD measure the length of the queue in bytes
   and, if the resource is packet-congestible, the implementation
   SHOULD measure the length of the queue in packets.  No other
   choice makes sense, because the number of packets waiting in the
   queue isn't relevant if the resource gets congested by bytes and
   vice versa.  For example, the length of the queue into a
   transmission line would be measured in bytes, while the length of
   the queue into a firewall would be measured in packets.

   To avoid the pathological effects of drop tail, the AQM can then
   transform this service time or queue length into the probability
   of dropping or marking a packet (e.g. RED's piecewise linear
   function between thresholds).

   What this advice means for RED as a specific example:

   1. A RED implementation SHOULD use byte-mode queue measurement for
      measuring the congestion of bit-congestible resources and
      packet-mode queue measurement for packet-congestible resources.

   2. An implementation SHOULD NOT make it possible to configure the
      way a queue measures itself, because whether a queue is bit-
      congestible or packet-congestible is an inherent property of
      the queue.

   Exceptions to these recommendations MAY be necessary, for instance
   where a packet-congestible resource has to be configured as a
   proxy bottleneck for a bit-congestible resource in an adjacent box
   that does not support AQM.

   The recommended approach in less straightforward scenarios, such
   as fixed size packet buffers, resources without a queue and
   buffers comprising a mix of packet and bit-congestible resources,
   is discussed in Section 4.1.  For instance, Section 4.1.1 explains
   that the queue into a line should be measured in bytes even if the
   queue consists of fixed-size packet-buffers, because the root
   cause of any congestion is bytes arriving too fast for the line--
   packets filling buffers are merely a symptom of the underlying
   congestion of the line.
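   To illustrate, the following minimal sketch (our own illustration
   in Python, not taken from any RED implementation) measures a queue
   in the unit matching whatever congests the resource, and then maps
   the measured length to a drop/mark probability with a RED-like
   piecewise-linear function between two thresholds:

      from collections import namedtuple

      Packet = namedtuple("Packet", "size_bytes")

      def queue_length(queue, bit_congestible):
          # Bytes for a bit-congestible resource (e.g. a transmission
          # line); packets for a packet-congestible one (e.g. a
          # firewall's header-processing engine).
          if bit_congestible:
              return sum(p.size_bytes for p in queue)
          return len(queue)

      def drop_probability(qlen, min_th, max_th, max_p):
          # RED-style piecewise-linear mapping from (averaged) queue
          # length to drop/mark probability; the thresholds use the
          # same unit as qlen.
          if qlen < min_th:
              return 0.0
          if qlen >= max_th:
              return 1.0
          return max_p * (qlen - min_th) / (max_th - min_th)

      q = [Packet(60)] * 200 + [Packet(1500)] * 20
      print(queue_length(q, bit_congestible=True))    # 42000 bytes
      print(queue_length(q, bit_congestible=False))   # 220 packets

   Whichever unit is used for measurement, the resulting probability
   is then applied to each packet without regard to its size
   (packet-mode drop), as Section 2.2 recommends.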
2.2. Recommendation on Encoding Congestion Notification

   When encoding congestion notification (e.g. by drop, ECN or PCN),
   the probability that network equipment drops or marks a particular
   packet to notify congestion SHOULD NOT depend on the size of the
   packet in question.  As the example in Section 1.2 illustrates, to
   drop any bit with probability 0.1% it is only necessary to drop
   every packet with probability 0.1%, without regard to the size of
   each packet.

   This approach ensures the network layer offers sufficient
   congestion information for all known and future transport
   protocols and also ensures no perverse incentives are created that
   would encourage transports to use inappropriately small packet
   sizes.

   What this advice means for RED as a specific example:

   1. The RED AQM algorithm SHOULD NOT use byte-mode drop, i.e. it
      ought to use packet-mode drop.  Byte-mode drop is more complex,
      it creates the perverse incentive to fragment segments into
      tiny pieces and it is vulnerable to floods of small packets.

   2. If a vendor has implemented byte-mode drop, and an operator has
      turned it on, it is RECOMMENDED to switch it to packet-mode
      drop, after establishing whether there are any implications on
      the relative performance of applications using different packet
      sizes.  The unlikely possibility of some application-specific
      legacy use of byte-mode drop is the only reason that all the
      above recommendations on encoding congestion notification are
      not phrased more strongly.

      RED as a whole SHOULD NOT be switched off.  Without RED, a
      drop-tail queue biases against large packets and is vulnerable
      to floods of small packets.

   Note well that RED's byte-mode drop is completely orthogonal to
   byte-mode queue measurement and should not be confused with it.
   If a RED implementation has a byte-mode but does not specify what
   sort of byte-mode, it is most probably byte-mode queue
   measurement, which is fine.  However, if in doubt, the vendor
   should be consulted.

   A survey (Appendix A) showed that there appears to be little, if
   any, installed base of the byte-mode drop variant of RED.  This
   suggests that deprecating byte-mode drop will have little, if any,
   incremental deployment impact.

2.3. Recommendation on Responding to Congestion

   When a transport detects that a packet has been lost or congestion
   marked, it SHOULD consider the strength of the congestion
   indication as proportionate to the size in octets (bytes) of the
   missing or marked packet.

   In other words, when a packet indicates congestion (by being lost
   or marked) it can be considered conceptually as if there is a
   congestion indication on every octet of the packet, not just one
   indication per packet.

   To be clear, the above recommendation solely describes how a
   transport should interpret the meaning of a congestion indication.
   It makes no recommendation on whether a transport should act
   differently based on this interpretation.  It merely aids
   interoperability between transports, if they choose to make their
   actions depend on the strength of congestion indications.

   This definition will be useful as the IETF transport area
   continues its programme of:

   o  updating host-based congestion control protocols to take
      account of packet size

   o  making transports less sensitive to losing control packets like
      SYNs and pure ACKs.

   What this advice means for the case of TCP:

   1. If two TCP flows with different packet sizes are required to
      run at equal bit rates under the same path conditions, this
      SHOULD be done by altering TCP (Section 4.2.2), not network
      equipment (the latter affects other transports besides TCP).

   2. If it is desired to improve TCP performance by reducing the
      chance that a SYN or a pure ACK will be dropped, this SHOULD be
      done by modifying TCP (Section 4.2.3), not network equipment.

   To be clear, we are not recommending at all that TCPs under
   equivalent conditions should aim for equal bit-rates.  We are
   merely saying that anyone trying to do such a thing should modify
   their TCP algorithm, not the network.

   These recommendations are phrased as 'SHOULD' rather than 'MUST',
   because there may be cases where compatibility with pre-existing
   versions of a transport protocol makes the recommendations
   impractical.
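   The bookkeeping that this interpretation implies can be sketched
   in a few lines (in Python; the class and variable names are
   hypothetical and no particular rate response is implied):

      class CongestionAccounting:
          # Treat a loss or ECN mark on a packet as an indication of
          # congestion on every one of its octets (Section 2.3).
          def __init__(self):
              self.total_bytes = 0    # octets covered by feedback
              self.marked_bytes = 0   # octets indicating congestion

          def on_feedback(self, size_bytes, congested):
              self.total_bytes += size_bytes
              if congested:           # packet was lost or marked
                  self.marked_bytes += size_bytes

          def congestion_level(self):
              # Fraction of octets indicating congestion; a transport
              # MAY choose to base its response on this.
              if self.total_bytes == 0:
                  return 0.0
              return self.marked_bytes / self.total_bytes

   Counting marked bytes rather than marked packets is also what
   makes this cheap to implement: the multiplication by packet size
   is achieved as a repeated add (see Section 3.5).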
2.4. Recommendation on Handling Congestion Indications when Splitting
     or Merging Packets

   Packets carrying congestion indications may be split or merged in
   some circumstances (e.g. at an RTP/RTCP transcoder or during IP
   fragment reassembly).  Splitting and merging only make sense in
   the context of ECN, not loss.

   The general rule to follow is that the number of octets in packets
   with congestion indications SHOULD be equivalent before and after
   merging or splitting.  This is based on the principle used above:
   that an indication of congestion on a packet can be considered as
   an indication of congestion on each octet of the packet.

   The above rule is not phrased with the word "MUST" to allow the
   following exception.  There are cases where pre-existing protocols
   were not designed to conserve congestion-marked octets (e.g. IP
   fragment reassembly [RFC3168] or loss statistics in RTCP receiver
   reports [RFC3550] before ECN was added [RFC6679]).  When any such
   protocol is updated, it SHOULD comply with the above rule to
   conserve marked octets.  However, the rule may be relaxed if it
   would otherwise become too complex to interoperate with pre-
   existing implementations of the protocol.

   One can think of a splitting or merging process as if all the
   incoming congestion-marked octets increment a counter and all the
   outgoing marked octets decrement the same counter.  In order to
   ensure that congestion indications remain timely, even the
   smallest positive remainder in the conceptual counter should
   trigger the next outgoing packet to be marked (causing the counter
   to go negative).
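   The conceptual counter can be sketched as follows (an illustrative
   Python fragment under the assumptions above, not a normative
   algorithm):

      class MarkedOctetCounter:
          # Incoming congestion-marked octets increment the counter;
          # outgoing marked octets decrement it.  Any positive
          # remainder marks the next outgoing packet, keeping
          # indications timely while conserving marked octets over
          # time.
          def __init__(self):
              self.balance = 0

          def absorb(self, size_bytes, marked):
              # Called for each incoming packet (e.g. a fragment).
              if marked:
                  self.balance += size_bytes

          def emit(self, size_bytes):
              # Called for each outgoing packet; returns whether to
              # mark it.
              mark = self.balance > 0
              if mark:
                  self.balance -= size_bytes   # may go negative
              return mark

   For example, reassembling a 1,500 B datagram from one marked 500 B
   fragment and one unmarked 1,000 B fragment leaves a balance of
   500, so the reassembled datagram is marked and the counter goes
   1,000 negative, to be absorbed by later marked arrivals.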
3. Motivating Arguments

   This section is informative.  It justifies the recommendations
   given in the previous section.

3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets

   Increasingly, it is being recognised that a protocol design must
   take care not to cause unintended consequences by giving the
   parties in the protocol exchange perverse incentives
   [Evol_cc][RFC3426].  Given there are many good reasons why larger
   path maximum transmission units (PMTUs) would help solve a number
   of scaling issues, we do not want to create any bias against large
   packets that is greater than their true cost.

   Imagine a scenario where the same bit rate of packets will
   contribute the same to bit-congestion of a link irrespective of
   whether it is sent as fewer larger packets or more smaller
   packets.  A protocol design that caused larger packets to be more
   likely to be dropped than smaller ones would be dangerous in both
   the following cases:

   Malicious transports:  A queue that gives an advantage to small
      packets can be used to amplify the force of a flooding attack.
      By sending a flood of small packets, the attacker can get the
      queue to discard more traffic in large packets, allowing more
      attack traffic to get through to cause further damage.  Such a
      queue allows attack traffic to have a disproportionately large
      effect on regular traffic without the attacker having to do
      much work.

   Non-malicious transports:  Even if an application designer is not
      actually malicious, if over time it is noticed that small
      packets tend to go faster, designers will act in their own
      interest and use smaller packets.  Queues that give an
      advantage to small packets create an evolutionary pressure for
      applications or transports to send at the same bit-rate but
      break their data stream down into tiny segments to reduce their
      drop rate.  Encouraging a high volume of tiny packets might in
      turn unnecessarily overload a completely unrelated part of the
      system, perhaps more limited by header-processing than
      bandwidth.

   Imagine two unresponsive flows arriving at a bit-congestible
   transmission link each with the same bit rate, say 1Mbps, but one
   consisting of 1500B packets and the other of 60B packets, which
   are 25x smaller.  Consider a scenario where gentle RED
   [gentle_RED] is used, along with the variant of RED we advise
   against, i.e. where the RED algorithm is configured to adjust the
   drop probability of packets in proportion to each packet's size
   (byte-mode packet drop).  In this case, RED aims to drop 25x more
   of the larger packets than the smaller ones.  Thus, for example,
   if RED drops 25% of the larger packets, it will aim to drop 1% of
   the smaller packets (but in practice it may drop more as
   congestion increases [RFC4828; Appx B.4]).  Even though both flows
   arrive with the same bit rate, the bit rate the RED queue aims to
   pass to the line will be 750kbps for the flow of larger packets
   but 990kbps for the smaller packets (because of rate variations it
   will actually be a little less than this target).
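   The arithmetic behind these figures is simple enough to set out in
   a couple of lines (an illustrative Python fragment; the parameter
   names are ours):

      RATE = 1_000_000    # each flow's arrival rate, b/s
      P_FULL = 0.25       # byte-mode drop probability at full size
      MTU = 1500          # bytes

      for size in (1500, 60):
          p = P_FULL * size / MTU   # byte-mode scales p with size
          print(size, "B packets: drop", p,
                "-> passed to line:", RATE * (1 - p) / 1e3, "kb/s")
      # 1500 B: drop 0.25 -> 750.0 kb/s
      #   60 B: drop 0.01 -> 990.0 kb/s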
   Note that, although the byte-mode drop variant of RED amplifies
   small packet attacks, drop-tail queues amplify small packet
   attacks even more (see Security Considerations in Section 6).
   Wherever possible neither should be used.

3.2. Small != Control

   Dropping fewer control packets considerably improves performance.
   It is tempting to drop small packets with lower probability in
   order to improve performance, because many control packets tend to
   be smaller (TCP SYNs & ACKs, DNS queries & responses, SIP
   messages, HTTP GETs, etc).  However, we must not give control
   packets preference purely by virtue of their smallness, otherwise
   it is too easy for any data source to get the same preferential
   treatment simply by sending data in smaller packets.  Again we
   should not create perverse incentives to favour small packets
   rather than to favour control packets, which is what we intend.

   Just because many control packets are small does not mean all
   small packets are control packets.

   So, rather than fix these problems in the network, we argue that
   the transport should be made more robust against losses of control
   packets (see 'Making Transports Robust against Control Packet
   Losses' in Section 4.2.3).

3.3. Transport-Independent Network

   TCP congestion control ensures that flows competing for the same
   resource each maintain the same number of segments in flight,
   irrespective of segment size.  So under similar conditions, flows
   with different segment sizes will get different bit-rates.

   To counter this effect it seems tempting not to follow our
   recommendation, and instead for the network to bias congestion
   notification by packet size in order to equalise the bit-rates of
   flows with different packet sizes.  However, in order to do this,
   the queuing algorithm has to make assumptions about the transport,
   which become embedded in the network.  Specifically:

   o  The queuing algorithm has to assume how aggressively the
      transport will respond to congestion (see Section 4.2.4).  If
      the network assumes the transport responds as aggressively as
      TCP NewReno, it will be wrong for Compound TCP and differently
      wrong for Cubic TCP, etc.  To achieve equal bit-rates, each
      transport then has to guess what assumption the network made,
      and work out how to replace this assumed aggressiveness with
      its own aggressiveness.

   o  Also, if the network biases congestion notification by packet
      size it has to assume a baseline packet size--all proposed
      algorithms use the local MTU (for example see the byte-mode
      loss probability formula in Table 1).  Then if the non-Reno
      transports mentioned above are trying to reverse engineer what
      the network assumed, they also have to guess the MTU of the
      congested link.

   Even though reducing the drop probability of small packets (e.g.
   RED's byte-mode drop) helps ensure TCP flows with different packet
   sizes will achieve similar bit rates, we argue this correction
   should be made to any future transport protocols based on TCP, not
   to the network in order to fix one transport, no matter how
   predominant it is.  Effectively, favouring small packets is
   reverse engineering of network equipment around one particular
   transport protocol (TCP), contrary to the excellent advice in
   [RFC3426], which asks designers to question "Why are you proposing
   a solution at this layer of the protocol stack, rather than at
   another layer?"

   In contrast, if the network never takes account of packet size,
   the transport can be certain it will never need to guess any
   assumptions the network has made.  And the network passes two
   pieces of information to the transport that are sufficient in all
   cases: i) congestion notification on the packet and ii) the size
   of the packet.  Both are available for the transport to combine
   (by taking account of packet size when responding to congestion)
   or not.  Appendix B checks that these two pieces of information
   are sufficient for all relevant scenarios.

   When the network does not take account of packet size, it allows
   transport protocols to choose whether to take account of packet
   size or not.  However, if the network were to bias congestion
   notification by packet size, transport protocols would have no
   choice; those that did not take account of packet size themselves
   would unwittingly become dependent on packet size, and those that
   already took account of packet size would end up taking account of
   it twice.

3.4. Partial Deployment of AQM

   In overview, the argument in this section runs as follows:

   o  Because the network does not and cannot always drop packets in
      proportion to their size, it shouldn't be given the task of
      making drop signals depend on packet size at all.

   o  Transports on the other hand don't always want to make their
      rate response proportional to the size of dropped packets, but
      if they want to, they always can.

   The argument is similar to the end-to-end argument that says
   "Don't do X in the network if end-systems can do X by themselves,
   and they want to be able to choose whether to do X anyway."
   Actually the following argument is stronger; in addition it says
   "Don't give the network task X that could be done by the end-
   systems, if X is not deployed on all network nodes, and end-
   systems won't be able to tell whether their network is doing X, or
   whether they need to do X themselves."  In this case, the X in
   question is "making the response to congestion depend on packet
   size".

   We will now re-run this argument taking each step in more depth.
   The argument applies solely to drop, not to ECN marking.

   A queue drops packets for either of two reasons: a) to signal to
   host congestion controls that they should reduce the load and b)
   because there is no buffer left to store the packets.  Active
   queue management tries to use drops as a signal for hosts to slow
   down (case a) so that drop due to buffer exhaustion (case b)
   should not be necessary.

   AQM is not universally deployed in every queue in the Internet;
   many cheap Ethernet bridges, software firewalls, NATs on consumer
   devices, etc. implement simple tail-drop buffers.  Even if AQM
   were universal, it has to be able to cope with buffer exhaustion
   (by switching to a behaviour like tail-drop), in order to cope
   with unresponsive or excessive transports.  For these reasons
   networks will sometimes be dropping packets as a last resort
   (case b) rather than under AQM control (case a).

   When buffers are exhausted (case b), they don't naturally drop
   packets in proportion to their size.  The network can only reduce
   the probability of dropping smaller packets if it has enough space
   to store them somewhere while it waits for a larger packet that it
   can drop.  If the buffer is exhausted, it does not have this
   choice.  Admittedly tail-drop does naturally drop somewhat fewer
   small packets, but exactly how few depends more on the mix of
   sizes than the size of the packet in question.  Nonetheless, in
   general, if we wanted networks to do size-dependent drop, we would
   need universal deployment of (packet-size dependent) AQM code,
   which is currently unrealistic.

   A host transport cannot know whether any particular drop was a
   deliberate signal from an AQM or a sign of a queue shedding
   packets due to buffer exhaustion.  Therefore, because the network
   cannot universally do size-dependent drop, it should not do it at
   all.

   Whereas universality is desirable in the network, diversity is
   desirable between different transport layer protocols--some, like
   NewReno TCP [RFC5681], may not choose to make their rate response
   proportionate to the size of each dropped packet, while others
   will (e.g. TFRC-SP [RFC4828]).

3.5. Implementation Efficiency

   Biasing against large packets typically requires an extra multiply
   and divide in the network (see the example byte-mode drop formula
   in Table 1).  Allowing for packet size at the transport rather
   than in the network ensures that neither the network nor the
   transport needs to do a multiply operation--multiplication by
   packet size is effectively achieved as a repeated add when the
   transport adds to its count of marked bytes as each congestion
   event is fed to it.  Also the work to do the biasing is spread
   over many hosts, rather than concentrated in just the congested
   network element.  These aren't principled reasons in themselves,
   but they are a happy consequence of the other principled reasons.
4. A Survey and Critique of Past Advice

   This section is informative, not normative.

   The original 1993 paper on RED [RED93] proposed two options for
   the RED active queue management algorithm: packet mode and byte
   mode.  Packet mode measured the queue length in packets and
   dropped (or marked) individual packets with a probability
   independent of their size.  Byte mode measured the queue length in
   bytes and marked an individual packet with a probability in
   proportion to its size (relative to the maximum packet size).  In
   the paper's outline of further work, it was stated that no
   recommendation had been made on whether the queue size should be
   measured in bytes or packets, but it was noted that the difference
   could be significant.

   When RED was recommended for general deployment in 1998 [RFC2309],
   the two modes were mentioned, implying that the choice between
   them was a question of performance, and a 1997 email
   [pktByteEmail] was referenced for advice on tuning.  A later
   addendum to this email introduced the insight that there are in
   fact two orthogonal choices:

   o  whether to measure queue length in bytes or packets
      (Section 4.1)

   o  whether the drop probability of an individual packet should
      depend on its own size (Section 4.2).

   The rest of this section is structured accordingly.

4.1. Congestion Measurement Advice

   The choice of which metric to use to measure queue length was left
   open in RFC2309.  It is now well understood that queues for bit-
   congestible resources should be measured in bytes, and queues for
   packet-congestible resources should be measured in packets
   [pktByteEmail].

   Congestion in some legacy bit-congestible buffers is only measured
   in packets, not bytes.  In such cases, the operator has to set the
   thresholds mindful of a typical mix of packet sizes.  Any AQM
   algorithm on such a buffer will be oversensitive to high
   proportions of small packets, e.g. a DoS attack, and under-
   sensitive to high proportions of large packets.  However, there is
   no need to make allowances for the possibility of such legacy in
   future protocol design.  This is safe because any under-
   sensitivity during unusual traffic mixes cannot lead to congestion
   collapse, given the buffer will eventually revert to tail drop,
   discarding proportionately more large packets.

4.1.1. Fixed Size Packet Buffers

   The question of whether to measure queues in bytes or packets
   seems to be well understood.  However, measuring congestion is
   confusing when the resource is bit-congestible but the queue into
   the resource is packet-congestible.  This section outlines the
   approach to take.

   Some, mostly older, queuing hardware allocates fixed sized buffers
   in which to store each packet in the queue.  This hardware
   forwards to the line in one of two ways:

   o  With some hardware, any fixed sized buffers not completely
      filled by a packet are padded when transmitted to the wire.
      This case should clearly be treated as packet-congestible,
      because both queuing and transmission are in fixed MTU-sized
      units.  Therefore the queue length in packets is a good model
      of congestion of the link.

   o  More commonly, hardware with fixed size packet buffers
      transmits packets to line without padding.  This implies a
      hybrid forwarding system with transmission congestion dependent
      on the size of packets but queue congestion dependent on the
      number of packets, irrespective of their size.
      Nonetheless, there would be no queue at all unless the line had
      become congested--the root cause of any congestion is too many
      bytes arriving for the line.  Therefore, the AQM should measure
      the queue length as the sum of all the packet sizes in bytes
      that are queued up waiting to be serviced by the line,
      irrespective of whether each packet is held in a fixed size
      buffer.

   In the (unlikely) first case, where use of padding means the queue
   should be measured in packets, further confusion is likely because
   the fixed buffers are rarely all one size.  Typically pools of
   different sized buffers are provided (Cisco uses the term 'buffer
   carving' for the process of dividing up memory into these pools
   [IOSArch]).  Usually, if the pool of small buffers is exhausted,
   arriving small packets can borrow space in the pool of large
   buffers, but not vice versa.  However, there is no need to
   consider all this complexity, because the root cause of any
   congestion is still line overload--buffer consumption is only the
   symptom.  Therefore, the length of the queue should be measured as
   the sum of the bytes in the queue that will be transmitted to
   line, including any padding.  In the (unusual) case of
   transmission with padding this means the sum of the sizes of the
   small buffers queued plus the sum of the sizes of the large
   buffers queued.

   We will return to borrowing of fixed sized buffers when we discuss
   biasing the drop/marking probability of a specific packet because
   of its size in Section 4.2.1.  But here we can repeat the simple
   rule for how to measure the length of queues of fixed buffers: no
   matter how complicated the buffering scheme is, ultimately a
   transmission line is nearly always bit-congestible, so the number
   of bytes queued up waiting for the line measures how congested the
   line is, and it is rarely important to measure how congested the
   buffering system is.

4.1.2. Congestion Measurement without a Queue

   AQM algorithms are nearly always described assuming there is a
   queue for a congested resource and the algorithm can use the queue
   length to determine the probability that it will drop or mark each
   packet.  But not all congested resources lead to queues.  For
   instance, wireless spectrum is usually regarded as bit-congestible
   (for a given coding scheme).  But wireless link protocols do not
   always maintain a queue that depends on spectrum interference.
   Similarly, power-limited resources are also usually bit-
   congestible if energy is primarily required for transmission
   rather than header processing, but it is rare for a link protocol
   to build a queue as it approaches maximum power.

   Nonetheless, AQM algorithms do not require a queue in order to
   work.  For instance spectrum congestion can be modelled by signal
   quality using the target bit-energy-to-noise-density ratio.  And,
   to model radio power exhaustion, transmission power levels can be
   measured and compared to the maximum power available.
   [ECNFixedWireless] proposes a practical and theoretically sound
   way to combine congestion notification for different bit-
   congestible resources at different layers along an end-to-end
   path, whether wireless or wired, and whether with or without
   queues.

4.2. Congestion Notification Advice

4.2.1. Network Bias when Encoding
4.2.1.1. Advice on Packet Size Bias in RED

   The previously mentioned email [pktByteEmail] referred to by
   [RFC2309] advised that most scarce resources in the Internet were
   bit-congestible, which is still believed to be true
   (Section 1.1).  But it went on to offer advice that is updated by
   this memo.  It said that drop probability should depend on the
   size of the packet being considered for drop if the resource is
   bit-congestible, but not if it is packet-congestible.  The
   argument continued that if packet drops were inflated by packet
   size (byte-mode dropping), "a flow's fraction of the packet drops
   is then a good indication of that flow's fraction of the link
   bandwidth in bits per second".  This was consistent with a
   referenced policing mechanism being worked on at the time for
   detecting unusually high bandwidth flows, eventually published in
   1999 [pBox].  However, the problem could and should have been
   solved by making the policing mechanism count the volume of bytes
   randomly dropped, not the number of packets.

   A few months before RFC2309 was published, an addendum was added
   to the above archived email referenced from the RFC, in which the
   final paragraph seemed to partially retract what had previously
   been said.  It clarified that the question of whether the
   probability of dropping/marking a packet should depend on its size
   was not related to whether the resource itself was bit-
   congestible, but was a completely orthogonal question.  However,
   the only example given had the queue measured in packets while
   packet drop depended on the size of the packet in question.  No
   example was given the other way round.

   In 2000, Cnodder et al [REDbyte] pointed out that there was an
   error in the part of the original 1993 RED algorithm that aimed to
   distribute drops uniformly, because it didn't correctly take into
   account the adjustment for packet size.  They recommended an
   algorithm called RED_4 to fix this.  But they also recommended a
   further change, RED_5, to adjust the drop rate dependent on the
   square of relative packet size.  This was indeed consistent with
   one implied motivation behind RED's byte-mode drop--that we should
   reverse engineer the network to improve the performance of
   dominant end-to-end congestion control mechanisms.  This memo
   makes a different recommendation in Section 2.

   By 2003, a further change had been made to the adjustment for
   packet size, this time in the RED algorithm of the ns2 simulator.
   Instead of taking each packet's size relative to a `maximum packet
   size' it was taken relative to a `mean packet size', intended to
   be a static value representative of the `typical' packet size on
   the link.  We have not been able to find a justification in the
   literature for this change; however, Eddy and Allman conducted
   experiments [REDbias] that assessed how sensitive RED was to this
   parameter, amongst other things.  Notably, this changed algorithm
   can often lead to drop probabilities greater than 1 (which gives a
   hint that there is probably a mistake in the theory somewhere).

   On 10-Nov-2004, this variant of byte-mode packet drop was made the
   default in the ns2 simulator.
   It seems unlikely that byte-mode drop has ever been implemented in
   production networks (Appendix A); therefore ns2 simulations that
   use RED without disabling byte-mode drop are likely to behave very
   differently from RED in production networks.

4.2.1.2. Packet Size Bias Regardless of AQM

   The byte-mode drop variant of RED (or a similar variant of other
   AQM algorithms) is not the only possible bias towards small
   packets in queueing systems.  We have already mentioned that tail-
   drop queues naturally tend to lock out large packets once they are
   full.

   But queues with fixed sized buffers also reduce the probability
   that small packets will be dropped if (and only if) they allow
   small packets to borrow buffers from the pools for larger packets
   (see Section 4.1.1).  Borrowing effectively makes the maximum
   queue size for small packets greater than that for large packets,
   because more buffers can be used by small packets while fewer will
   fit large packets.  Incidentally, the bias towards small packets
   from buffer borrowing is nothing like as large as that of RED's
   byte-mode drop.

   Nonetheless, fixed-buffer memory with tail drop is still prone to
   lock out large packets, purely because of the tail-drop aspect.
   So, fixed size packet-buffers should be augmented with a good AQM
   algorithm and packet-mode drop.  If an AQM is too complicated to
   implement with multiple fixed buffer pools, the minimum necessary
   to prevent large packet lock-out is to ensure smaller packets
   never use the last available buffer in any of the pools for larger
   packets.

4.2.2. Transport Bias when Decoding

   The above proposals to alter the network equipment to bias towards
   smaller packets have largely carried on outside the IETF process.
   Within the IETF, by contrast, there are many different proposals
   to alter transport protocols to achieve the same goals, i.e.
   either to make the flow bit-rate take account of packet size, or
   to protect control packets from loss.  This memo argues that
   altering transport protocols is the more principled approach.

   A recently approved experimental RFC adapts its transport layer
   protocol to take account of packet sizes relative to typical TCP
   packet sizes.  This proposes a new small-packet variant of TCP-
   friendly rate control [RFC5348] called TFRC-SP [RFC4828].
   Essentially, it proposes a rate equation that inflates the flow
   rate by the ratio of a typical TCP segment size (1500B including
   TCP header) over the actual segment size [PktSizeEquCC].  (There
   are also other important differences of detail relative to TFRC,
   such as using virtual packets [CCvarPktSize] to avoid responding
   to multiple losses per round trip and using a minimum inter-packet
   interval.)

   Section 4.5.1 of this TFRC-SP spec discusses the implications of
   operating in an environment where queues have been configured to
   drop smaller packets with proportionately lower probability than
   larger ones.  But it only discusses TCP operating in such an
   environment, only mentioning TFRC-SP briefly when discussing how
   to define fairness with TCP.  And it only discusses the byte-mode
   dropping version of RED as it was before Cnodder et al pointed out
   that it didn't sufficiently bias towards small packets to make TCP
   independent of packet size.
   So the TFRC-SP spec doesn't address the issue of which of the
   network or the transport _should_ handle fairness between
   different packet sizes.  In its Appendix B.4 it discusses the
   possibility of both TFRC-SP and some network buffers duplicating
   each other's attempts to deliberately bias towards small packets.
   But the discussion is not conclusive, instead reporting
   simulations of many of the possibilities in order to assess
   performance but not recommending any particular course of action.

   The paper originally proposing TFRC with virtual packets (VP-TFRC)
   [CCvarPktSize] proposed that there should perhaps be two variants
   to cater for the different variants of RED.  However, as the
   TFRC-SP authors point out, there is no way for a transport to know
   whether some queues on its path have deployed RED with byte-mode
   packet drop (except if an exhaustive survey found that no-one had
   deployed it!--see Appendix A).  Incidentally, VP-TFRC also
   proposed that byte-mode RED dropping should really square the
   packet-size compensation-factor (like that of Cnodder's RED_5, but
   apparently unaware of it).

   Pre-congestion notification [RFC5670] is an IETF technology that
   uses a virtual queue for AQM marking of packets within one
   Diffserv class in order to give early warning prior to any real
   queuing.  The PCN marking algorithms have been designed not to
   take account of packet size when forwarding through queues.
   Instead, the general principle has been to take account of the
   sizes of marked packets when monitoring the fraction of marking at
   the edge of the network, as recommended here.

4.2.3. Making Transports Robust against Control Packet Losses

   Recently, two RFCs have defined changes to TCP that make it more
   robust against losing small control packets [RFC5562] [RFC5690].
   In both cases they note that the case for these two TCP changes
   would be weaker if RED were biased against dropping small packets.
   We argue here that these two proposals are a safer and more
   principled way to achieve TCP performance improvements than
   reverse engineering RED to benefit TCP.

   Although there are no known proposals, it would also be possible
   and perfectly valid to make control packets robust against drop by
   explicitly requesting a lower drop probability using their
   Diffserv code point [RFC2474] to request a scheduling class with
   lower drop.

   Although not brought to the IETF, a simple proposal from Wischik
   [DupTCP] suggests that the first three packets of every TCP flow
   should be routinely duplicated after a short delay.  It shows that
   this would greatly improve the chances of short flows completing
   quickly, but would hardly increase traffic levels on the Internet,
   because Internet bytes have always been concentrated in the large
   flows.  It further shows that the performance of many typical
   applications depends on completion of long serial chains of short
   messages.  It argues that, given most of the value people get from
   the Internet is concentrated within short flows, this simple
   expedient would greatly increase the value of the best efforts
   Internet at minimal cost.
1048 4.2.4. Congestion Notification: Summary of Conflicting Advice

1050 +-----------+----------------+-----------------+--------------------+ 1051 | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | 1052 | cc | mode drop) | byte mode drop) | mode drop) | 1053 +-----------+----------------+-----------------+--------------------+ 1054 | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | 1055 | TFRC | | | | 1056 | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | 1057 +-----------+----------------+-----------------+--------------------+

1059 Table 2: Dependence of flow bit-rate per RTT on packet size, s, and 1060 drop probability, p, when network and/or transport bias towards small 1061 packets to varying degrees

1063 Table 2 aims to summarise the potential effects of all the advice 1064 from different sources. Each column shows a different possible AQM 1065 behaviour in different queues in the network, using the terminology 1066 of Cnodder et al outlined earlier (RED_1 is basic RED with packet- 1067 mode drop). Each row shows a different transport behaviour: TCP 1069 [RFC5681] and TFRC [RFC5348] on the top row with TFRC-SP [RFC4828] 1070 below. Each cell shows how the bits per round trip of a flow depend 1071 on packet size, s, and drop probability, p. In order to declutter 1072 the formulae to focus on packet-size dependence they are all given 1073 per round trip, which removes any RTT term. Each cell can be derived by substituting the effective drop probability p_eff seen by a packet of size s (proportional to p, p*s or p*s^2 in the three columns respectively) into each transport's rate formula: s/sqrt(p_eff) bits per round trip for TCP or TFRC, and the same inflated by the factor 1/s for TFRC-SP.

1075 Let us assume that the goal is for the bit-rate of a flow to be 1076 independent of packet size. Suppressing all inessential details, the 1077 table shows that this should be achievable either by leaving the 1078 TCP transport unaltered in a RED_5 network (top right), or by using the small-packet TFRC-SP 1079 transport (or similar) in a network without any byte-mode dropping 1080 RED (bottom left). Top left is the `do nothing' 1081 scenario, while bottom right is the `do-both' scenario in which bit- 1082 rate would become far too biased towards small packets. Of course, 1083 if any form of byte-mode dropping RED has been deployed on a subset 1084 of queues that congest, each path through the network will present a 1085 different hybrid scenario to its transport.

1087 In any case, we can see that the linear byte-mode drop column in the 1088 middle would considerably complicate the Internet. It is a half-way 1089 house that does not bias far enough towards small packets even if one 1090 believes the network should be doing the biasing. Section 2 1091 recommends that _all_ bias in network equipment towards small packets 1092 should be turned off--if indeed any equipment vendors have 1093 implemented it--leaving packet-size bias solely as the preserve of 1094 the transport layer (i.e. only the leftmost, packet-mode drop column).

1096 In practice it seems that no deliberate bias towards small packets 1097 has been implemented for production networks. Of the 19% of vendors 1098 who responded to a survey of 84 equipment vendors, none had 1099 implemented byte-mode drop in RED (see Appendix A for details).

1101 5. Outstanding Issues and Next Steps

1103 5.1. Bit-congestible Network

1105 For a connectionless network with nearly all resources being bit- 1106 congestible, the recommended position is clear--the network 1107 should not make allowance for packet sizes and the transport should.
1108 This leaves two outstanding issues:

1110 o How to handle any legacy of AQM with byte-mode drop already 1111 deployed;

1113 o The need to start a programme to update transport congestion 1114 control protocol standards to take account of packet size.

1116 A survey of equipment vendors (Section 4.2.4) found no evidence that 1117 byte-mode packet drop had been implemented, so deployment will be 1118 sparse at best. A migration strategy is not really needed to remove 1119 an algorithm that may not even be deployed.

1121 A programme of experimental updates to take account of packet size in 1122 transport congestion control protocols has already started with 1123 TFRC-SP [RFC4828].

1125 5.2. Bit- & Packet-congestible Network

1127 The position is much less clear-cut if the Internet becomes populated 1128 by a more even mix of both packet-congestible and bit-congestible 1129 resources (see Appendix B.2). This problem is not pressing, because 1130 most Internet resources are designed to be bit-congestible before 1131 packet processing starts to congest (see Section 1.1).

1133 The IRTF Internet congestion control research group (ICCRG) has set 1134 itself the task of reaching consensus on generic forwarding 1135 mechanisms that are necessary and sufficient to support the 1136 Internet's future congestion control requirements (the first 1137 challenge in [RFC6077]). The research question of whether packet 1138 congestion might become common, and what to do if it does, may be 1139 explored in the IRTF in the future (see "Challenge 3: Packet Size" in 1140 [RFC6077]).

1142 6. Security Considerations

1144 This memo recommends that queues do not bias drop probability towards 1145 small packets, as this creates a perverse incentive for transports to 1146 break down their flows into tiny segments. One of the intended benefits of 1147 AQM was to remove this perverse incentive, 1148 which drop-tail queues gave to small packets.

1150 In practice, transports cannot all be trusted to respond to 1151 congestion. So another reason for recommending that queues do not 1152 bias drop probability towards small packets is to avoid the 1153 vulnerability to small-packet DDoS attacks that would otherwise 1154 result. One of the intended benefits of AQM was to 1155 remove drop-tail's DoS vulnerability to small packets, so we 1156 should not add it back again.

1158 If most queues implemented AQM with byte-mode drop, the resulting 1159 network would amplify the potency of a small-packet DDoS attack. At 1160 the first queue the stream of packets would push aside a greater 1161 proportion of large packets, so more of the small packets would 1162 survive to attack the next queue. Thus a flood of small packets 1163 would continue on towards the destination, pushing regular traffic 1164 with large packets out of the way in one queue after the next, but 1165 suffering much less drop itself.

1167 Appendix C explains why the ability of networks to police the 1168 response of _any_ transport to congestion depends on bit-congestible 1169 network resources doing packet-mode drop only, not byte-mode drop. In 1170 summary, it says that making drop probability depend on the size of 1171 the packets that bits happen to be divided into simply encourages the 1172 bits to be divided into smaller packets. Byte-mode drop would 1173 therefore irreversibly complicate any attempt to fix the Internet's 1174 incentive structures.

1176 7. IANA Considerations

1178 This document has no actions for IANA.

1180 8.
Conclusions

1182 This memo identifies the three distinct stages of the congestion 1183 notification process where implementations need to decide whether to 1184 take packet size into account. The recommendations provided in 1185 Section 2 of this memo are different in each case:

1187 o When network equipment measures the length of a queue, whether it 1188 counts in bytes or packets depends on whether the network resource 1189 is congested respectively by bytes or by packets.

1191 o When network equipment decides whether to drop (or mark) a packet, 1192 it is recommended that the size of the particular packet should 1193 not be taken into account.

1195 o However, when a transport algorithm responds to a dropped or 1196 marked packet, the size of the rate reduction should be 1197 proportionate to the size of the packet.

1199 In summary, the answers are 'it depends', 'no' and 'yes' respectively.

1201 For the specific case of RED, this means that byte-mode queue 1202 measurement will often be appropriate although byte-mode drop is 1203 strongly deprecated.

1205 At the transport layer the IETF should continue updating congestion 1206 control protocols to take account of the size of each packet that 1207 indicates congestion. Also the IETF should continue to make 1208 protocols less sensitive to losing control packets such as SYNs and pure 1209 ACKs, and short message exchanges such as DNS. Although many control packets happen to be 1210 small, the alternative of network equipment favouring all small 1211 packets would be dangerous. That would create perverse incentives to 1212 split data transfers into smaller packets.

1214 The memo develops these recommendations from principled arguments 1215 concerning scaling, layering, incentives, inherent efficiency, 1216 security and policeability. But it also addresses practical issues 1217 such as specific buffer architectures and incremental deployment. 1218 Indeed, a limited survey of RED implementations is discussed, which 1219 shows there appears to be little, if any, installed base of RED's 1220 byte-mode drop. Therefore it can be deprecated with few, if any, 1221 incremental deployment complications.

1223 The recommendations have been developed on the well-founded basis 1224 that most Internet resources are bit-congestible, not packet- 1225 congestible. We need to know the likelihood that this assumption 1226 will prevail in the longer term and, if it might not, what protocol changes 1227 will be needed to cater for a mix of the two. The IRTF Internet 1228 Congestion Control Research Group (ICCRG) is currently working on 1229 these problems [RFC6077].

1231 9. Acknowledgements

1233 Thank you to Sally Floyd, who gave extensive and useful review 1234 comments. Also thanks for the reviews from Philip Eardley, David 1235 Black, Fred Baker, Toby Moncaster, Arnaud Jacquet and Mirja 1236 Kuehlewind, as well as helpful explanations of different hardware 1237 approaches from Larry Dunn and Fred Baker. We are grateful to Bruce 1238 Davie and his colleagues for providing a timely and efficient survey 1239 of RED implementation in Cisco's product range. Also grateful thanks 1240 to Toby Moncaster, Will Dormann, John Regnault, Simon Carter and 1241 Stefaan De Cnodder, who further helped survey the current status of 1242 RED implementation and deployment and, finally, thanks to the 1243 anonymous individuals who responded.
1245 Bob Briscoe and Jukka Manner were partly funded by Trilogy, a 1246 research project (ICT-216372) supported by the European Community 1247 under its Seventh Framework Programme. The views expressed here are 1248 those of the authors only.

1250 10. Comments Solicited

1252 Comments and questions are encouraged and very welcome. They can be 1253 addressed to the IETF Transport Area working group mailing list, 1254 and/or to the authors.

1256 11. References

1258 11.1. Normative References

1260 [RFC2119] Bradner, S., "Key words for use in RFCs to 1261 Indicate Requirement Levels", BCP 14, RFC 2119, 1262 March 1997.

1264 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The 1265 Addition of Explicit Congestion Notification 1266 (ECN) to IP", RFC 3168, September 2001.

1268 11.2. Informative References

1270 [BLUE02] Feng, W-c., Shin, K., Kandlur, D., and D. Saha, 1271 "The BLUE active queue management algorithms", 1272 IEEE/ACM Transactions on Networking 10(4) 513-- 1273 528, August 2002.

1276 [CCvarPktSize] Widmer, J., Boutremans, C., and J-Y. Le Boudec, 1277 "Congestion Control for Flows with Variable 1278 Packet Size", ACM CCR 34(2) 137--151, 2004.

1281 [CHOKe_Var_Pkt] Psounis, K., Pan, R., and B. Prabhaker, 1282 "Approximate Fair Dropping for Variable Length 1283 Packets", IEEE Micro 21(1) 48--56, January- 1284 February 2001.

1287 [CoDel12] Nichols, K. and V. Jacobson, "Controlling Queue 1288 Delay", ACM Queue 10(5), May 2012.

1291 [DRQ] Shin, M., Chong, S., and I. Rhee, "Dual-Resource 1292 TCP/AQM for Processing-Constrained Networks", 1293 IEEE/ACM Transactions on Networking 16(2), 1294 April 2008.

1297 [DupTCP] Wischik, D., "Short messages", Royal Society 1298 workshop on networks: modelling and control, 1299 September 2007.

1302 [ECNFixedWireless] Siris, V., "Resource Control for Elastic Traffic 1303 in CDMA Networks", Proc. ACM MOBICOM'02, 1304 September 2002.

1308 [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and 1309 the evolution of congestion control", 1310 Automatica 35(12) 1969--1985, December 1999.

1314 [I-D.pan-tsvwg-pie] Pan, R., Natarajan, P., Piglione, C., and M. 1315 Prabhu, "PIE: A Lightweight Control Scheme To 1316 Address the Bufferbloat Problem", 1317 draft-pan-tsvwg-pie-00 (work in progress), 1318 December 2012.

1320 [IOSArch] Bollapragada, V., White, R., and C. Murphy, 1321 "Inside Cisco IOS Software Architecture", Cisco 1322 Press: CCIE Professional Development, ISBN13: 1323 978-1-57870-181-0, July 2000.

1325 [PktSizeEquCC] Vasallo, P., "Variable Packet Size Equation- 1326 Based Congestion Control", ICSI Technical 1327 Report tr-00-008, 2000.

1331 [RED93] Floyd, S. and V. Jacobson, "Random Early 1332 Detection (RED) gateways for Congestion 1333 Avoidance", IEEE/ACM Transactions on 1334 Networking 1(4) 397--413, August 1993.

1337 [REDbias] Eddy, W. and M. Allman, "A Comparison of RED's 1338 Byte and Packet Modes", Computer Networks 42(3) 1339 261--280, June 2003.

1342 [REDbyte] De Cnodder, S., Elloumi, O., and K. Pauwels, 1343 "RED behavior with different packet sizes", 1344 Proc. 5th IEEE Symposium on Computers and 1345 Communications (ISCC) 793--799, July 2000.

1348 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., 1349 Deering, S., Estrin, D., Floyd, S., Jacobson, 1350 V., Minshall, G., Partridge, C., Peterson, L., 1351 Ramakrishnan, K., Shenker, S., Wroclawski, J., 1352 and L.
Zhang, "Recommendations on Queue 1353 Management and Congestion Avoidance in the 1354 Internet", RFC 2309, April 1998. 1356 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1357 "Definition of the Differentiated Services Field 1358 (DS Field) in the IPv4 and IPv6 Headers", 1359 RFC 2474, December 1998. 1361 [RFC2914] Floyd, S., "Congestion Control Principles", 1362 BCP 41, RFC 2914, September 2000. 1364 [RFC3426] Floyd, S., "General Architectural and Policy 1365 Considerations", RFC 3426, November 2002. 1367 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and 1368 V. Jacobson, "RTP: A Transport Protocol for 1369 Real-Time Applications", STD 64, RFC 3550, 1370 July 2003. 1372 [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding 1373 Congestion Control for Voice Traffic in the 1374 Internet", RFC 3714, March 2004. 1376 [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate 1377 Control (TFRC): The Small-Packet (SP) Variant", 1378 RFC 4828, April 2007. 1380 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. 1381 Widmer, "TCP Friendly Rate Control (TFRC): 1382 Protocol Specification", RFC 5348, 1383 September 2008. 1385 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 1386 Ramakrishnan, "Adding Explicit Congestion 1387 Notification (ECN) Capability to TCP's SYN/ACK 1388 Packets", RFC 5562, June 2009. 1390 [RFC5670] Eardley, P., "Metering and Marking Behaviour of 1391 PCN-Nodes", RFC 5670, November 2009. 1393 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP 1394 Congestion Control", RFC 5681, September 2009. 1396 [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, 1397 "Adding Acknowledgement Congestion Control to 1398 TCP", RFC 5690, February 2010. 1400 [RFC6077] Papadimitriou, D., Welzl, M., Scharf, M., and B. 1401 Briscoe, "Open Research Issues in Internet 1402 Congestion Control", RFC 6077, February 2011. 1404 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., 1405 O'Hanlon, P., and K. Carlberg, "Explicit 1406 Congestion Notification (ECN) for RTP over UDP", 1407 RFC 6679, August 2012. 1409 [RFC6789] Briscoe, B., Woundy, R., and A. Cooper, 1410 "Congestion Exposure (ConEx) Concepts and Use 1411 Cases", RFC 6789, December 2012. 1413 [Rate_fair_Dis] Briscoe, B., "Flow Rate Fairness: Dismantling a 1414 Religion", ACM CCR 37(2)63--74, April 2007, 1415 . 1417 [gentle_RED] Floyd, S., "Recommendation on using the 1418 "gentle_" variant of RED", Web page , 1419 March 2000, 1420 . 1422 [pBox] Floyd, S. and K. Fall, "Promoting the Use of 1423 End-to-End Congestion Control in the Internet", 1424 IEEE/ACM Transactions on Networking 7(4) 458-- 1425 472, August 1999, 1426 . 1428 [pktByteEmail] Floyd, S., "RED: Discussions of Byte and Packet 1429 Modes", email , March 1997, . 1432 Appendix A. Survey of RED Implementation Status 1434 This Appendix is informative, not normative. 1436 In May 2007 a survey was conducted of 84 vendors to assess how widely 1437 drop probability based on packet size has been implemented in RED 1438 Table 3. About 19% of those surveyed replied, giving a sample size 1439 of 16. Although in most cases we do not have permission to identify 1440 the respondents, we can say that those that have responded include 1441 most of the larger equipment vendors, covering a large fraction of 1442 the market. The two who gave permission to be identified were Cisco 1443 and Alcatel-Lucent. 
The others range across the large network 1444 equipment vendors at L3 & L2, firewall vendors, wireless equipment 1445 vendors, as well as large software businesses with a small selection 1446 of networking products. All those who responded confirmed that they 1447 have not implemented the variant of RED with drop dependent on packet 1448 size (2 were fairly sure they had not but needed to check more 1449 thoroughly). At the time the survey was conducted, Linux did not 1450 implement RED with packet-size bias of drop, although we have not 1451 investigated a wider range of open source code.

1453 +-------------------------------+----------------+-----------------+ 1454 | Response | No. of vendors | %age of vendors | 1455 +-------------------------------+----------------+-----------------+ 1456 | Not implemented | 14 | 17% | 1457 | Not implemented (probably) | 2 | 2% | 1458 | Implemented | 0 | 0% | 1459 | No response | 68 | 81% | 1460 | Total companies/orgs surveyed | 84 | 100% | 1461 +-------------------------------+----------------+-----------------+

1463 Table 3: Vendor Survey on byte-mode drop variant of RED (lower drop 1464 probability for small packets)

1466 Where reasons were given, the most prevalent was the extra complexity of 1467 packet-size bias code, though one vendor had a more principled 1468 reason for avoiding it--similar to the argument of this document.

1470 Our survey was of vendor implementations, so we cannot be certain 1471 about operator deployment. But we believe many queues in the 1472 Internet are still tail-drop. The company of one of the co-authors 1473 (BT) has widely deployed RED, but many tail-drop queues are bound to 1474 still exist, particularly in access network equipment and on 1475 middleboxes like firewalls, where RED is not always available.

1477 Routers using a memory architecture based on fixed-size buffers with 1478 borrowing may also still be prevalent in the Internet. As explained 1479 in Section 4.2.1, these also provide a marginal (but legitimate) bias 1480 towards small packets. So even though RED byte-mode drop is not 1481 prevalent, it is likely there is still some bias towards small 1482 packets in the Internet due to tail drop and fixed-buffer borrowing.

1484 Appendix B. Sufficiency of Packet-Mode Drop

1486 This Appendix is informative, not normative.

1488 Here we check that packet-mode drop (or marking) in the network gives 1489 sufficiently generic information for the transport layer to use. We 1490 check against a 2x2 matrix of four scenarios that may occur now or in 1491 the future (Table 4). The horizontal and vertical dimensions have 1492 been chosen because each tests extremes of sensitivity to packet size 1493 in the transport and in the network respectively.

1495 Note that this section does not consider byte-mode drop at all. 1496 Having deprecated byte-mode drop, the goal here is to check that 1497 packet-mode drop will be sufficient in all cases.
1499 +-------------------------------+-----------------+-----------------+ 1500 | Transport | a) Independent | b) Dependent on | 1501 | | of packet size | packet size of | 1502 | Network | of congestion | congestion | 1503 | | notifications | notifications | 1504 +-------------------------------+-----------------+-----------------+ 1505 | 1) Predominantly | Scenario a1) | Scenario b1) | 1506 | bit-congestible network | | | 1507 | 2) Mix of bit-congestible and | Scenario a2) | Scenario b2) | 1508 | pkt-congestible network | | | 1509 +-------------------------------+-----------------+-----------------+

1511 Table 4: Four Possible Congestion Scenarios

1513 Appendix B.1 focuses on the horizontal dimension of Table 4, checking 1514 that packet-mode drop (or marking) gives sufficient information, 1515 whether or not the transport uses it--scenarios b) and a) 1516 respectively.

1518 Appendix B.2 focuses on the vertical dimension of Table 4, checking 1519 that packet-mode drop gives sufficient information to the transport 1520 whether resources in the network are bit-congestible or packet- 1521 congestible (these terms are defined in Section 1.1).

1523 Notation: To be concrete, we will compare two flows with different 1524 packet sizes, s_1 and s_2. As an example, we will take s_1 = 60B 1525 = 480b and s_2 = 1500B = 12,000b.

1527 A flow's bit rate, x [bps], is related to its packet rate, u 1528 [pps], by

1530 x(t) = s.u(t).

1532 In the bit-congestible case, path congestion will be denoted by 1533 p_b, and in the packet-congestible case by p_p. When either case 1534 is implied, the letter p alone will denote path congestion.

1536 B.1. Packet-Size (In)Dependence in Transports

1538 In all cases we consider a packet-mode drop queue that indicates 1539 congestion by dropping (or marking) packets with probability p 1540 irrespective of packet size. We use an example value of loss 1541 (marking) probability, p=0.1%.

1543 A transport like RFC5681 TCP treats a congestion notification on any 1544 packet, whatever its size, as one event. However, a network with just 1545 the packet-mode drop algorithm does give more information if the 1546 transport chooses to use it. We will use Table 5 to illustrate this.

1548 We will set aside the last column until later. The columns labelled 1549 "Flow 1" and "Flow 2" compare two flows consisting of 60B and 1500B 1550 packets respectively. The body of the table considers two separate 1551 cases, one where the flows have equal bit-rate and the other with 1552 equal packet-rates. In both cases, the two flows fill a 96Mbps link. 1553 Therefore, in the equal bit-rate case they each have half the bit- 1554 rate (48Mbps). With equal packet-rates, flow 1 uses packets 25 1555 times smaller, so it gets 25 times less bit-rate--only 1556 1/(1+25) of the link capacity (96Mbps/26 = 4Mbps after rounding)--while 1557 flow 2 gets 25 times more bit-rate (92Mbps) 1558 because its packets are 25 times larger. The packet- 1559 rate shown for each flow is derived from the bit-rate 1560 simply by dividing by the packet size, as shown in the column 1561 labelled "Formula".
1563 Parameter Formula Flow 1 Flow 2 Combined 1564 ----------------------- ----------- ------- ------- -------- 1565 Packet size s/8 60B 1,500B (Mix) 1566 Packet size s 480b 12,000b (Mix) 1567 Pkt loss probability p 0.1% 0.1% 0.1%

1569 EQUAL BIT-RATE CASE 1570 Bit-rate x 48Mbps 48Mbps 96Mbps 1571 Packet-rate u = x/s 100kpps 4kpps 104kpps 1572 Absolute pkt-loss-rate p*u 100pps 4pps 104pps 1573 Absolute bit-loss-rate p*u*s 48kbps 48kbps 96kbps 1574 Ratio of lost/sent pkts p*u/u 0.1% 0.1% 0.1% 1575 Ratio of lost/sent bits p*u*s/(u*s) 0.1% 0.1% 0.1%

1577 EQUAL PACKET-RATE CASE 1578 Bit-rate x 4Mbps 92Mbps 96Mbps 1579 Packet-rate u = x/s 8kpps 8kpps 15kpps 1580 Absolute pkt-loss-rate p*u 8pps 8pps 15pps 1581 Absolute bit-loss-rate p*u*s 4kbps 92kbps 96kbps 1582 Ratio of lost/sent pkts p*u/u 0.1% 0.1% 0.1% 1583 Ratio of lost/sent bits p*u*s/(u*s) 0.1% 0.1% 0.1%

1585 Table 5: Absolute Loss Rates and Loss Ratios for Flows of Small and 1586 Large Packets and Both Combined

1588 So far we have merely set up the scenarios. We now consider 1589 congestion notification in these scenarios. Two TCP flows with the same 1590 round trip time aim to equalise their packet-loss-rates over time; 1591 that is, the number of packets lost in a second, which is the packets 1592 per second (u) multiplied by the probability that each one is dropped 1593 (p). Thus TCP converges on the "Equal packet-rate" case, where both 1594 flows aim for the same "Absolute packet-loss-rate" (both 8pps in the 1595 table).

1597 Packet-mode drop actually gives flows sufficient information to 1598 measure their loss-rate in bits per second, if they choose, not just 1599 packets per second. Each flow can count the size of a lost or marked 1600 packet and scale its rate-response in proportion (as TFRC-SP does). 1601 The result is shown in the row entitled "Absolute bit-loss-rate", 1602 where the number of bits lost in a second is the packets per second (u) 1603 multiplied by the probability of losing a packet (p) multiplied by 1604 the packet size (s). Such an algorithm would try to remove any 1605 imbalance in bit-loss-rate, such as the wide disparity in the "Equal 1606 packet-rate" case (4kbps vs. 92kbps). Instead it would aim for equal 1607 bit-loss-rates, driving both flows towards the "Equal bit-rate" case 1608 (both 48kbps in this example).

1611 The explanation so far has assumed that each flow consists of packets 1612 of only one constant size. Nonetheless, it extends naturally to 1613 flows with mixed packet sizes. In the right-most column of Table 5 a 1614 flow of mixed size packets is created simply by considering flow 1 1615 and flow 2 as a single aggregated flow. There is no need for a flow 1616 to maintain an average packet size. It is only necessary for the 1617 transport to scale its response to each congestion indication by the 1618 size of each individual lost (or marked) packet. Taking the 1619 "Equal packet-rate" case as an example, in one second about 8 small packets and 1620 8 large packets are lost (closer to 15 than 16 losses per 1621 second in total, due to rounding). If the transport multiplies each loss by 1622 its size, in one second it responds to roughly 8*480b plus 8*12,000b of lost 1623 bits (exactly 96,000 lost bits per second before the loss-rates are rounded to 8pps). This double-checks 1624 correctly, being the same as 0.1% of the total bit-rate of 96Mbps. 1625 For completeness, the formula for absolute bit-loss-rate is p*(u1*s1 + 1626 u2*s2).
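The following short sketch (an informal aid for the reader, not part of any specification) reproduces the equal packet-rate column of Table 5 before rounding:

   p = 0.001                  # drop probability at the queue (0.1%)
   s1, s2 = 60 * 8, 1500 * 8  # packet sizes in bits
   link = 96e6                # link capacity in bps

   u = link / (s1 + s2)       # equal packet-rate shared by both flows (pps)
   x1, x2 = u * s1, u * s2    # bit-rates: ~3.7Mbps and ~92.3Mbps

   pkt_loss_rate = p * u                  # per flow: ~7.7pps (8pps rounded)
   bit_loss_rate = p * (u*s1 + u*s2)      # combined: 96,000bps = 0.1% of 96Mbps

   print(round(x1), round(x2), round(pkt_loss_rate, 1), round(bit_loss_rate))

A transport that scales its response by the size of each lost packet is, in effect, responding to bit_loss_rate rather than pkt_loss_rate.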
1628 Incidentally, a transport will always measure the same loss probability 1629 irrespective of whether it measures in packets or in bytes. 1630 In other words, the ratio of lost to sent packets will be the same as 1631 the ratio of lost to sent bytes. (This is why TCP's bit rate is 1632 still proportional to packet size even when byte-counting is used, as 1633 recommended for TCP in [RFC5681], mainly for orthogonal security 1634 reasons.) This can be seen intuitively by comparing two example 1635 flows: one with 60B packets, the other with 1500B packets. If both 1636 flows pass through a queue with drop probability 0.1%, each flow will 1637 lose 1 in 1,000 packets. In the stream of 60B packets the ratio of 1638 bytes lost to sent will be 60B in every 60,000B; and in the stream of 1639 1500B packets, the loss ratio will be 1,500B out of 1,500,000B. When 1640 the transport responds to the ratio of lost to sent packets, it will 1641 measure the same ratio whether it measures in packets or bytes: 0.1% 1642 in both cases. The fact that this ratio is the same whether measured 1643 in packets or bytes can be seen in Table 5, where the ratio of lost 1644 to sent packets and the ratio of lost to sent bytes is always 0.1% in 1645 all cases (recall that the scenario was set up with p=0.1%).

1647 This discussion of how the ratio can be measured in packets or bytes 1648 is only raised here to highlight that it is irrelevant to this memo! 1649 Whether a transport depends on packet size or not depends on how this 1650 ratio is used within the congestion control algorithm.

1652 So far we have shown that packet-mode drop passes sufficient 1653 information to the transport layer so that the transport can take 1654 account of bit-congestion, by using the sizes of the packets that 1655 indicate congestion. We have also shown that the transport can 1656 choose not to take packet size into account if it wishes. We will 1657 now consider whether the transport can know which to do.

1659 B.2. Bit-Congestible and Packet-Congestible Indications

1661 As a thought-experiment, imagine an idealised congestion notification 1662 protocol that supports both bit-congestible and packet-congestible 1663 resources. It would require at least two ECN flags, one for each of 1664 bit-congestible and packet-congestible resources.

1666 1. A packet-congestible resource trying to code congestion level p_p 1667 into a packet stream should mark the idealised `packet 1668 congestion' field in each packet with probability p_p 1669 irrespective of the packet's size. The transport should then 1670 take a packet with the packet congestion field marked to mean 1671 just one mark, irrespective of the packet size.

1673 2. A bit-congestible resource trying to code time-varying byte- 1674 congestion level p_b into a packet stream should mark the `byte 1675 congestion' field in each packet with probability p_b, again 1676 irrespective of the packet's size. Unlike before, the transport 1677 should take a packet with the byte congestion field marked to 1678 count as a mark on each byte in the packet.

1680 This hides a fundamental problem--much more fundamental than whether 1681 we can magically create header space for yet another ECN flag, or 1682 whether it would work while being deployed incrementally. 1683 Distinguishing drop from delivery naturally provides just one 1684 implicit bit of congestion indication information--the packet is 1685 either dropped or not. It is hard to drop a packet in two ways that 1686 are distinguishable remotely. This is a similar problem to that of 1687 distinguishing wireless transmission losses from congestive losses.
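To restate the two notification channels of the thought-experiment in code form (a conceptual device only, not a proposal; the field and function names are invented for illustration):

   import random

   def network_marks(p_p, p_b):
       # Both idealised resources mark with a probability that ignores
       # packet size.
       pkt_congestion  = random.random() < p_p   # packet-congestible resource
       byte_congestion = random.random() < p_b   # bit-congestible resource
       return pkt_congestion, byte_congestion

   def transport_accounting(size_bytes, pkt_congestion, byte_congestion):
       # The asymmetry is entirely in how the transport counts the marks:
       pkt_marks  = 1 if pkt_congestion else 0            # one mark, any size
       byte_marks = size_bytes if byte_congestion else 0  # one mark per byte
       return pkt_marks, byte_marks

   # Example: a 1500B packet with p_p = p_b = 0.001
   print(transport_accounting(1500, *network_marks(0.001, 0.001)))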
1689 This problem would not be solved even if ECN were universally 1690 deployed. A congestion notification protocol must survive a 1691 transition from low levels of congestion to high. Signalling two congestion states 1692 is feasible with explicit marking, but much harder if packets are 1693 dropped. Also, it will not always be cost-effective to implement AQM 1694 at every low-level resource, so drop will often have to suffice.

1696 We are not saying two ECN fields will be needed (and we are not 1697 saying that somehow a resource should be able to drop a packet in one 1698 of two different ways so that the transport can distinguish which 1699 sort of drop it was!). These two congestion notification channels 1700 are a conceptual device to illustrate a dilemma we could face in the 1701 future. Section 3 gives four good reasons why it would be a bad idea 1702 to allow for packet size by biasing drop probability in favour of 1703 small packets within the network. The impracticality of our thought 1704 experiment shows that it will be hard to give transports a practical 1705 way to know whether to take account of the size of congestion 1706 indication packets or not.

1708 Fortunately, this dilemma is not pressing because by design most 1709 equipment becomes bit-congested before its packet-processing becomes 1710 congested (as already outlined in Section 1.1). Therefore transports 1711 can be designed on the relatively sound assumption that a congestion 1712 indication will usually imply bit-congestion.

1714 Nonetheless, although the above idealised protocol isn't intended for 1715 implementation, we do want to emphasise that research is needed to 1716 predict whether there are good reasons to believe that packet 1717 congestion might become more common, and if so, to find a way to 1718 somehow distinguish between bit and packet congestion [RFC3714].

1720 Recently, the dual resource queue (DRQ) proposal [DRQ] has been made 1721 on the premise that, as network processors become more cost- 1722 effective, per-packet operations will become more complex 1723 (irrespective of whether more function in the network is desirable). 1724 Consequently, CPU congestion is expected to become more 1725 common. DRQ is a proposed modification to the RED algorithm that 1726 folds both bit congestion and packet congestion into one signal 1727 (either loss or ECN).

1729 Finally, we note one further complication. Strictly, packet- 1730 congestible resources are often cycle-congestible. For instance, for 1731 routing look-ups, load depends on the complexity of each look-up and 1732 whether the pattern of arrivals is amenable to caching or not. This 1733 also reminds us that any solution must not require a forwarding 1734 engine to use excessive processor cycles in order to decide how to 1735 say it has no spare processor cycles.

1737 Appendix C. Byte-mode Drop Complicates Policing Congestion Response

1739 This section is informative, not normative.

1741 There are two main classes of approach to policing congestion 1742 response: i) policing at each bottleneck link or ii) policing at the 1743 edges of networks. Packet-mode drop in RED is compatible with 1744 either, while byte-mode drop precludes edge policing.
1746 The simplicity of an edge policer relies on one dropped or marked 1747 packet being equivalent to another of the same size without having to 1748 know which link the drop or mark occurred at. However, the byte-mode 1749 drop algorithm has to depend on the local MTU of the line--it needs 1750 to use some concept of a 'normal' packet size. Therefore, one 1751 dropped or marked packet from a byte-mode drop algorithm is not 1752 necessarily equivalent to another from a different link. A policing 1753 function local to the link can know the local MTU where the 1754 congestion occurred. However, a policer at the edge of the network 1755 cannot, at least not without a lot of complexity. 1757 The early research proposals for type (i) policing at a bottleneck 1758 link [pBox] used byte-mode drop, then detected flows that contributed 1759 disproportionately to the number of packets dropped. However, with 1760 no extra complexity, later proposals used packet mode drop and looked 1761 for flows that contributed a disproportionate amount of dropped bytes 1762 [CHOKe_Var_Pkt]. 1764 Work is progressing on the congestion exposure protocol (ConEx 1765 [RFC6789]), which enables a type (ii) edge policer located at a 1766 user's attachment point. The idea is to be able to take an 1767 integrated view of the effect of all a user's traffic on any link in 1768 the internetwork. However, byte-mode drop would effectively preclude 1769 such edge policing because of the MTU issue above. 1771 Indeed, making drop probability depend on the size of the packets 1772 that bits happen to be divided into would simply encourage the bits 1773 to be divided into smaller packets in order to confuse policing. In 1774 contrast, as long as a dropped/marked packet is taken to mean that 1775 all the bytes in the packet are dropped/marked, a policer can remain 1776 robust against bits being re-divided into different size packets or 1777 across different size flows [Rate_fair_Dis]. 1779 Appendix D. Changes from Previous Versions 1781 To be removed by the RFC Editor on publication. 1783 Full incremental diffs between each version are available at 1784 1785 (courtesy of the rfcdiff tool): 1787 From -09 to -10: Following IESG review: 1789 * Updates 2309: Left header unchanged reflecting eventual IESG 1790 consensus [Sean Turner, Pete Resnick]. 1792 * S.1 Intro: This memo adds to the congestion control principles 1793 enumerated in BCP 41 [Pete Resnick] 1795 * Abstract, S.1, S.1.1, s.1.2 Intro, Scoping and Example: Made 1796 applicability to all AQMs clearer listing some more example 1797 AQMs and explained that we always use RED for examples, but 1798 this doesn't mean it's not applicable to other AQMs. [A number 1799 of reviewers have described the draft as "about RED"] 1801 * S.1 & S.2.1 Queue measurement: Explained that the choice 1802 between measuring the queue in packets or bytes is only 1803 relevant if measuring it in time units is infeasible [So as not 1804 to imply that we haven't noticed the advances made by PDPC & 1805 CoDel] 1807 * S.1.1. Terminology: Better explained why hybrid systems 1808 congested by both packets and bytes are often designed to be 1809 treated as bit-congestible [Richard Barnes]. 1811 * S.2.1. Queue measurement advice: Added examples. Added a 1812 counter-example to justify SHOULDs rather than MUSTs. Pointed 1813 to S.4.1 for a list of more complicated scenarios. [Benson 1814 Schliesser, OpsDir] 1816 * S2.2. 
Recommendation on Encoding Congestion Notification: 1817 Removed SHOULD treat packets equally, leaving only SHOULD NOT 1818 drop dependent on packet size, to avoid it sounding like we're 1819 saying QoS is not allowed. Pointed to possible app-specific 1820 legacy use of byte-mode as a counter-example that prevents us 1821 saying MUST NOT. [Pete Resnick] 1823 * S.2.3. Recommendation on Responding to Congestion: capitalised 1824 the two SHOULDs in recommendations for TCP, and gave possible 1825 counter-examples. [noticed while dealing with Pete Resnick's 1826 point] 1828 * S2.4. Splitting & Merging: RTCP -> RTP/RTCP [Pete McCann, Gen- 1829 ART] 1831 * S.3.2 Small != Control: many control packets are small -> 1832 ...tend to be small [Stephen Farrell] 1834 * S.3.1 Perverse incentives: Changed transport designers to app 1835 developers [Stephen Farrell] 1837 * S.4.1.1. Fixed Size Packet Buffers: Nearly completely re- 1838 written to simplify and to reverse the advice when the 1839 underlying resource is bit-congestible, irrespective of whether 1840 the buffer consists of fixed-size packet buffers. [Richard 1841 Barnes & Benson Schliesser] 1843 * S.4.2.1.2. Packet Size Bias Regardless of AQM: Largely re- 1844 written to reflect the earlier change in advice about fixed- 1845 size packet buffers, and to primarily focus on getting rid of 1846 tail-drop, not various nuances of tail-drop. [Richard Barnes & 1847 Benson Schliesser] 1849 * Editorial corrections [Tim Bray, AppsDir, Pete McCann, Gen-ART 1850 and others] 1852 * Updated refs (two I-Ds have become RFCs). [Pete McCann] 1854 From -08 to -09: Following WG last call: 1856 * S.2.1: Made RED-related queue measurement recommendations 1857 clearer 1859 * S.2.3: Added to "Recommendation on Responding to Congestion" to 1860 make it clear that we are definitely not saying transports have 1861 to equalise bit-rates, just how to do it and not do it, if you 1862 want to. 1864 * S.3: Clarified motivation sections S.3.3 "Transport-Independent 1865 Network" and S.3.5 "Implementation Efficiency" 1867 * S.3.4: Completely changed motivating argument from "Scaling 1868 Congestion Control with Packet Size" to "Partial Deployment of 1869 AQM". 1871 From -07 to -08: 1873 * Altered abstract to say it provides best current practice and 1874 highlight that it updates RFC2309 1876 * Added null IANA section 1878 * Updated refs 1880 From -06 to -07: 1882 * A mix-up with the corollaries and their naming in 2.1 to 2.3 1883 fixed. 1885 From -05 to -06: 1887 * Primarily editorial fixes. 1889 From -04 to -05: 1891 * Changed from Informational to BCP and highlighted non-normative 1892 sections and appendices 1894 * Removed language about consensus 1896 * Added "Example Comparing Packet-Mode Drop and Byte-Mode Drop" 1898 * Arranged "Motivating Arguments" into a more logical order and 1899 completely rewrote "Transport-Independent Network" & "Scaling 1900 Congestion Control with Packet Size" arguments. Removed "Why 1901 Now?" 1903 * Clarified applicability of certain recommendations 1905 * Shifted vendor survey to an Appendix 1907 * Cut down "Outstanding Issues and Next Steps" 1909 * Re-drafted the start of the conclusions to highlight the three 1910 distinct areas of concern 1912 * Completely re-wrote appendices 1914 * Editorial corrections throughout. 1916 From -03 to -04: 1918 * Reordered Sections 2 and 3, and some clarifications here and 1919 there based on feedback from Colin Perkins and Mirja 1920 Kuehlewind. 
1922 From -02 to -03 (this version) 1924 * Structural changes: 1926 + Split off text at end of "Scaling Congestion Control with 1927 Packet Size" into new section "Transport-Independent 1928 Network" 1930 + Shifted "Recommendations" straight after "Motivating 1931 Arguments" and added "Conclusions" at end to reinforce 1932 Recommendations 1934 + Added more internal structure to Recommendations, so that 1935 recommendations specific to RED or to TCP are just 1936 corollaries of a more general recommendation, rather than 1937 being listed as a separate recommendation. 1939 + Renamed "State of the Art" as "Critical Survey of Existing 1940 Advice" and retitled a number of subsections with more 1941 descriptive titles. 1943 + Split end of "Congestion Coding: Summary of Status" into a 1944 new subsection called "RED Implementation Status". 1946 + Removed text that had been in the Appendix "Congestion 1947 Notification Definition: Further Justification". 1949 * Reordered the intro text a little. 1951 * Made it clearer when advice being reported is deprecated and 1952 when it is not. 1954 * Described AQM as in network equipment, rather than saying "at 1955 the network layer" (to side-step controversy over whether 1956 functions like AQM are in the transport layer but in network 1957 equipment). 1959 * Minor improvements to clarity throughout 1961 From -01 to -02: 1963 * Restructured the whole document for (hopefully) easier reading 1964 and clarity. The concrete recommendation, in RFC2119 language, 1965 is now in Section 8. 1967 From -00 to -01: 1969 * Minor clarifications throughout and updated references 1971 From briscoe-byte-pkt-mark-02 to ietf-byte-pkt-congest-00: 1973 * Added note on relationship to existing RFCs 1974 * Posed the question of whether packet-congestion could become 1975 common and deferred it to the IRTF ICCRG. Added ref to the 1976 dual-resource queue (DRQ) proposal. 1978 * Changed PCN references from the PCN charter & architecture to 1979 the PCN marking behaviour draft most likely to imminently 1980 become the standards track WG item. 1982 From -01 to -02: 1984 * Abstract reorganised to align with clearer separation of issue 1985 in the memo. 1987 * Introduction reorganised with motivating arguments removed to 1988 new Section 3. 1990 * Clarified avoiding lock-out of large packets is not the main or 1991 only motivation for RED. 1993 * Mentioned choice of drop or marking explicitly throughout, 1994 rather than trying to coin a word to mean either. 1996 * Generalised the discussion throughout to any packet forwarding 1997 function on any network equipment, not just routers. 1999 * Clarified the last point about why this is a good time to sort 2000 out this issue: because it will be hard / impossible to design 2001 new transports unless we decide whether the network or the 2002 transport is allowing for packet size. 2004 * Added statement explaining the horizon of the memo is long 2005 term, but with short term expediency in mind. 2007 * Added material on scaling congestion control with packet size 2008 (Section 3.4). 2010 * Separated out issue of normalising TCP's bit rate from issue of 2011 preference to control packets (Section 3.2). 2013 * Divided up Congestion Measurement section for clarity, 2014 including new material on fixed size packet buffers and buffer 2015 carving (Section 4.1.1 & Section 4.2.1) and on congestion 2016 measurement in wireless link technologies without queues 2017 (Section 4.1.2). 
2019 * Added section on 'Making Transports Robust against Control 2020 Packet Losses' (Section 4.2.3) with existing & new material 2021 included. 2023 * Added tabulated results of vendor survey on byte-mode drop 2024 variant of RED (Table 3). 2026 From -00 to -01: 2028 * Clarified applicability to drop as well as ECN. 2030 * Highlighted DoS vulnerability. 2032 * Emphasised that drop-tail suffers from similar problems to 2033 byte-mode drop, so only byte-mode drop should be turned off, 2034 not RED itself. 2036 * Clarified the original apparent motivations for recommending 2037 byte-mode drop included protecting SYNs and pure ACKs more than 2038 equalising the bit rates of TCPs with different segment sizes. 2039 Removed some conjectured motivations. 2041 * Added support for updates to TCP in progress (ackcc & ecn-syn- 2042 ack). 2044 * Updated survey results with newly arrived data. 2046 * Pulled all recommendations together into the conclusions. 2048 * Moved some detailed points into two additional appendices and a 2049 note. 2051 * Considerable clarifications throughout. 2053 * Updated references 2055 Authors' Addresses 2057 Bob Briscoe 2058 BT 2059 B54/77, Adastral Park 2060 Martlesham Heath 2061 Ipswich IP5 3RE 2062 UK 2064 Phone: +44 1473 645196 2065 EMail: bob.briscoe@bt.com 2066 URI: http://bobbriscoe.net/ 2067 Jukka Manner 2068 Aalto University 2069 Department of Communications and Networking (Comnet) 2070 P.O. Box 13000 2071 FIN-00076 Aalto 2072 Finland 2074 Phone: +358 9 470 22481 2075 EMail: jukka.manner@aalto.fi 2076 URI: http://www.netlab.tkk.fi/~jmanner/