2	Congestion Exposure (ConEx) Working                            M. Mathis
3	Group                                                        Google, Inc
4	Internet-Draft                                                B. Briscoe
5	Intended status: Informational                                        BT
6	Expires: September 15, 2011                               March 14, 2011

8	      Congestion Exposure (ConEx) Concepts and Abstract Mechanism
9	                   draft-ietf-conex-abstract-mech-01

11	Abstract

13	   This document describes an abstract mechanism by which senders inform
14	   the network about the congestion encountered by packets earlier in
15	   the same flow.  Today, the network may signal congestion to the
16	   receiver by ECN markings or by dropping packets, and the receiver
17	   passes this information back to the sender in transport-layer
18	   feedback.  The mechanism to be developed by the ConEx WG will enable
19	   the sender to also relay this congestion information back into the
20	   network in-band at the IP layer, such that the total level of
21	   congestion is visible to all IP devices along the path, from where it
22	   could, for example, provide input to traffic management.

24	Status of This Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on September 15, 2011.

41	Copyright Notice

43	   Copyright (c) 2011 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	Table of Contents

58	   1.  Introduction
59	     1.1.  Terminology
60	   2.  Requirements for the ConEx Signal
61	   3.  Representing Congestion Exposure
62	     3.1.  Strawman Encoding
63	     3.2.  ECN Based Encoding
64	       3.2.1.  ECN Changes
65	     3.3.  Abstract Encoding
66	       3.3.1.  Independent Bits
67	       3.3.2.  Codepoint Encoding
68	   4.  Congestion Exposure Components
69	     4.1.  Modified Senders
70	     4.2.  Receivers (Optionally Modified)
71	     4.3.  Audit
72	       4.3.1.  Using Credit to Simplify Audit
73	       4.3.2.  Behaviour Constraints for the Audit Function
74	     4.4.  Policy Devices
75	       4.4.1.  Policy Monitoring Devices
76	       4.4.2.  Congestion Policers
77	   5.  IANA Considerations
78	   6.  Security Considerations
79	   7.  Conclusions
80	   8.  Acknowledgements
81	   9.  Comments Solicited
82	   10. References
83	     10.1. Normative References
84	     10.2. Informative References

86	1.  Introduction

88	   One of the required functions of a transport protocol is controlling
89	   congestion in the network.  There are three techniques in use today
90	   for the network to signal congestion to a transport:
91	   o  The most common congestion signal is packet loss.  When congested,
92	      the network simply discards some packets either as part of an
93	      active queue management function [RFC2309] or as the consequence
94	      of a queue overflow or other resource starvation.  The transport
95	      receiver detects that some data is missing and signals such
96	      through transport acknowledgments to the transport sender (e.g.
97	      TCP SACK options).  The sender performs the appropriate congestion
98	      control rate reduction (e.g.  [RFC5681] for TCP) and, if it is a
99	      reliable transport, it retransmits the missing data.
100	   o  If the transport supports explicit congestion notification (ECN)
101	      [RFC3168] or pre-congestion notification (PCN) [RFC5670], the
102	      transport sender indicates this by setting an ECN-capable
103	      transport (ECT) codepoint in every packet.  Network devices can
104	      then explicitly signal congestion to the receiver by setting ECN
105	      bits in the IP header of such packets.  The transport receiver
106	      communicates these ECN signals back to the sender, which then
107	      performs the appropriate congestion control rate reduction.
108	   o  Some experimental transport protocols and TCP variants [Vegas]
109	      sense queuing delays in the network and reduce their rate before
110	      the network has to signal congestion using loss or ECN.  A purely
111	      delay-sensing transport will tend to be pushed out by other
112	      competing transports that do not back off until they have driven
113	      the queue into loss.  Therefore, modern delay-sensing algorithms
114	      use delay in some combination with loss to signal congestion (e.g.
115	      LEDBAT [I-D.ietf-ledbat-congestion], Compound
116	      [I-D.sridharan-tcpm-ctcp]).  In the rest of this document, we will
117	      confine the discussion to concrete signals of congestion such as
118	      loss and ECN.  We will not discuss delay-sensing further, because
119	      it can only avoid these more concrete signals of congestion in
120	      some circumstances.

122	   In all cases the congestion signals follow the route indicated in
123	   Figure 1.  A congested network device sends a signal in the data
124	   stream on the forward path to the transport receiver, the receiver
125	   passes it back to the sender through transport level feedback, and
126	   the sender makes some congestion control adjustment.

128	   This document proposes to extend the capabilities of the Internet
129	   protocol suite with the addition of a ConEx Signal that, to a first
130	   approximation, relays the congestion information from the transport
131	   sender back through the internetwork layer.  That signal is shown in
132	   Figure 1.  It would be visible to all internetwork layer devices
133	   along the forward (data) path and is intended to support a number of
134	   new policy-controlled mechanisms that might be used to manage
135	   traffic.

137	   There is no expectation that internetwork layer devices will do fine-
138	   grained congestion control using ConEx information.  That is still
139	   probably best done at the transport sender.  Rather, the network will
140	   be able to use ConEx information to do better bulk traffic
141	   management, which in turn should incentivize end-system transports to
142	   be more careful about congesting others [I-D.conex-concepts-uses].

144	   +---------+                                               +---------+
145	   |Transport|             +-----------+                     |Transport|
146	   | Sender  |>=Data=Path=>|(Congested)|>=====Data=Path=====>| Receiver|
147	   |         |             |  Network  |>-Congestion-Signal->|---.     |
148	   |         |             |   Device  |                     |   |     |
149	   |         |             +-----------+                     |   |     |
150	   |         |                                               |   |     |
151	   |         |<==Feedback=Path==============================<|   |     |
152	   |     ,---|<--Transport Layer returned Congestion Signal-<|<--'     |
153	   |     |   |                                               |         |
154	   |     |   |>==============Data=Path======================>|         |
155	   |     `-->|>---------(new)-IP layer ConEx Signal--------->|         |
156	   |         |        (Carried in Data Packet Headers)       |         |
157	   +---------+                                               +---------+

159	   Not shown are policy devices along the data path that observe the
160	   ConEx Signal, and use the information to monitor or manage traffic.
161	   These are discussed in Section 4.4.

163	                                 Figure 1

165	1.1.  Terminology

167	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
168	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
169	   document are to be interpreted as described in RFC 2119 [RFC2119].

171	   ConEx signals in IP packet headers from the sender to the network
172	   {ToDo: These are placeholders for whatever words we decide to use}:
173	   Not-ConEx:  The transport is not ConEx-capable
174	   ConEx-Capable:  The transport is ConEx-Capable.  This is the opposite
175	      of Not-ConEx and implies one of the following signals
176	      Re-Echo-Loss:  (aka Purple) The transport has experienced a loss
177	      Re-Echo-ECN:  (aka Black) The transport has experienced an ECN
178	         mark

180	      Credit:  (aka Green) The transport is building up credit to allow
181	         for any future delay in expected ConEx signals (see
182	         Section 4.3.1)
183	      ConEx-Not-Marked:  The transport is ConEx-capable but is signaling
184	         none of Re-Echo-Loss, Re-Echo-ECN or Credit
185	      ConEx-Marked:  At least one of Re-Echo-Loss, Re-Echo-ECN or
186	         Credit.

188	2.  Requirements for the ConEx Signal

190	   Ideally, all the following requirements would be met by a Congestion
191	   Exposure Signal.  However, it is already known that some compromises
192	   will be necessary; therefore, all the requirements are expressed with
193	   the keyword 'SHOULD' rather than 'MUST'.  The only mandatory
194	   requirement is that a concrete protocol description MUST give sound
195	   reasoning if it chooses not to meet any of these requirements:
196	   a.  The ConEx Signal SHOULD be visible to internetwork layer devices
197	       along the entire path from the transport sender to the transport
198	       receiver.  Equivalently, it SHOULD be present in the IPv4 or IPv6
199	       header, and in the outermost IP header if using IP in IP
200	       tunneling.  The ConEx Signal SHOULD be immutable once set by the
201	       transport sender.  A corollary of these requirements is that the
202	       chosen ConEx encoding SHOULD pass silently without modification
203	       through pre-existing networking gear.
204	   b.  The ConEx Signal SHOULD be useful under only partial deployment.
205	       A minimal deployment SHOULD only require changes to transport
206	       senders.  Furthermore, partial deployment SHOULD create
207	       incentives for additional deployment, both in terms of enabling
208	       ConEx on more devices and adding richer features to existing
209	       devices.  Nonetheless, ConEx deployment need never be universal,
210	       and it is anticipated that some hosts and some transports may
211	       never support the ConEx Protocol and some networks may never use
212	       the ConEx Signals.
213	   c.  The ConEx Signal SHOULD be accurate.  In potentially hostile
214	       environments such as the public Internet, it SHOULD be possible
215	       for techniques to be deployed to audit the Congestion Exposure
216	       Signal by comparing it to the actual congestion signals on the
217	       forward data path.  The auditing mechanism must have a capability
218	       for providing sufficient disincentives against misreported
219	       congestion, such as by throttling traffic that reports less
220	       congestion than it is actually experiencing.
221	   d.  The ConEx Signal SHOULD be timely.  There will be a delay between
222	       the time when an auditing device sees an actual congestion signal
223	       and when it sees the subsequent Congestion Exposure Signal from
224	       the sender.  The minimum delay will be one round trip, but it may
225	       be much longer depending on the transport's choice of feedback
226	       delay (consider RTCP [RFC3550] for example).  It is not practical
227	       to expect auditing devices in the network to make allowance for
228	       such feedback delays.  Instead, the sender SHOULD be able to send
229	       ConEx signals in advance, as 'credit' for any audit function to
230	       hold as a balance against the risk of congestion during the
231	       feedback delay.  This design choice greatly simplifies auditing
232	       (see Section 4.3.1).

234	   It is important to note that the auditing requirement implies a
235	   number of additional constraints: The basic auditing technique is to
236	   count both actual congestion signals and ConEx Signals someplace
237	   along the data path:
238	   o  For congestion signaled by ECN, auditing is most accurate when
239	      located near the transport receiver.  Within any flow or aggregate
240	      of flows, the volume of data tagged with ConEx Signals should
241	      never be less than the total volume of ECN marked data seen near
242	      the receiver.
243	   o  For congestion signaled by loss, totally accurate auditing is not
244	      believed to be possible in the general case, because it involves a
245	      network node detecting the absence of some packets, when it cannot
246	      necessarily see the transport protocol sequence numbers and when
247	      the missing packets might simply be taking a different route.  But
248	      there are common cases where sufficient audit accuracy should be
249	      possible:
250	      *  For non-IPsec traffic conforming to standard TCP sequence
251	         numbering on a single path, an auditor could detect losses by
252	         observing both the original transmission and the retransmission
253	         after the loss.  Such auditing would be most accurate near the
254	         sender.
255	      *  For networks designed so that losses predominantly occur under
256	         the management of one IP-aware node on the path, the auditor
257	         could be located at this bottleneck.  It could simply compare
258	         ConEx Signals with actual local losses.  This is a good model
259	         for most consumer access networks where audit accuracy could
260	         well be sufficient even if losses occasionally occur at other
261	         nodes in the network, such as border gateways (see Section 4.3
262	         for details).

264	   Given that loss-based and ECN-based ConEx might sometimes be best
265	   audited at different locations, having distinct encodings would widen
266	   the design space for the auditing function.

268	3.  Representing Congestion Exposure

270	   Most protocol specifications start with a description of packet
271	   formats and codepoints with their associated meanings.  This document
272	   does not: It is already known that choosing the encoding for the
273	   ConEx Signal is likely to entail some engineering compromises that
274	   have the potential to reduce the protocol's usefulness in some
275	   settings.  Rather than making these engineering choices prematurely,
276	   this document sidesteps the encoding problem by describing an
277	   abstract representation of ConEx Signals.  All of the elements of the
278	   protocol can be defined in terms of this abstract representation.
279	   Most important, the preliminary use cases for the protocol are
280	   described in terms of the abstract representation in companion
281	   documents [I-D.conex-concepts-uses].

283	   Once we have some example use cases we can evaluate different
284	   encoding schemes.  Since these schemes are likely to include some
285	   conflated code points, some information will be lost, resulting in
286	   weakening or disabling some of the algorithms and eliminating some
287	   use cases.

289	   The goal of this approach is to be as complete as possible for
290	   discovering the potential usage and capabilities of the ConEx
291	   protocol, so we have some hope of making optimal design decisions
292	   when choosing the encoding.

294	3.1.  Strawman Encoding

296	   As an aid to the reader, it might be helpful to describe a naive
297	   strawman encoding of the ConEx protocol described solely in terms of
298	   TCP: set the Reserved bit in the IPv4 header (bit 48 counting from
299	   zero [RFC0791]--aka the "evil bit" [RFC3514]) on all retransmissions
300	   or once per ECN signaled window reduction.  Clearly network devices
301	   along the forward path can see this bit and act on it.  For example
302	   they can count marked and unmarked packets to estimate the congestion
303	   levels along the path.

305	   However, the IESG has chartered the ConEx working group to establish
306	   that there is sufficient demand for an IPv6 ConEx protocol before
307	   using the last available bit in the IPv4 header.  Furthermore this
308	   encoding, by itself, does not sufficiently support partial deployment
309	   or strong auditing and might motivate users and/or applications to
310	   misrepresent the congestion that they are causing.

312	   Nonetheless, this strawman encoding does present a clear mental model
313	   of how the ConEx protocol might function under various uses.
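
   As an illustration only, the counting behaviour described for the
   strawman encoding could be sketched as follows.  The sketch assumes
   a raw IPv4 header is available as bytes, and it counts bytes rather
   than packets, in line with the volume-based view of Section 3.3.
   The names StrawmanCounter, is_strawman_marked and make_header are
   hypothetical and are not part of any ConEx specification.

```python
import struct

# Bit 48 of the IPv4 header (counting from zero) is the top bit of byte 6.
RESERVED_FLAG_MASK = 0x80


def is_strawman_marked(ipv4_header: bytes) -> bool:
    """Return True if the IPv4 Reserved flag (bit 48) is set."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    return bool(ipv4_header[6] & RESERVED_FLAG_MASK)


def total_length(ipv4_header: bytes) -> int:
    """Read the 16-bit Total Length field (header bytes 2 and 3)."""
    return struct.unpack("!H", ipv4_header[2:4])[0]


class StrawmanCounter:
    """Hypothetical forward-path counter of marked and total bytes."""

    def __init__(self) -> None:
        self.marked_bytes = 0
        self.total_bytes = 0

    def observe(self, ipv4_header: bytes) -> None:
        size = total_length(ipv4_header)
        self.total_bytes += size
        if is_strawman_marked(ipv4_header):
            self.marked_bytes += size

    def congestion_estimate(self) -> float:
        """Fraction of observed bytes that carried the strawman mark."""
        if self.total_bytes == 0:
            return 0.0
        return self.marked_bytes / self.total_bytes


def make_header(length: int, marked: bool) -> bytes:
    """Build a minimal 20-byte IPv4 header for illustration only.

    The checksum and addresses are left as zero; this is an assumption
    made to keep the example short, not a valid on-the-wire packet.
    """
    flags_and_fragment = 0x8000 if marked else 0x0000
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, length, 0,
                       flags_and_fragment, 64, 6, 0, bytes(4), bytes(4))


counter = StrawmanCounter()
counter.observe(make_header(1500, marked=False))
counter.observe(make_header(500, marked=True))
print(counter.congestion_estimate())
```

   Such a counter only estimates what senders choose to report; as the
   section notes, this encoding by itself does not support strong
   auditing.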

315	3.2.  ECN Based Encoding

317	   Ideally ConEx and ECN are orthogonal signals and SHOULD be entirely
318	   independent.  However, given the limited number of header bits and/or
319	   code points, these signals may have to share code points, at least
320	   partially.

322	   The re-ECN specification [I-D.briscoe-tsvwg-re-ecn-tcp] presents an
323	   implementation of ConEx that had to be tightly integrated with the
324	   encoding of ECN in order to fit into the IP header.  The central
325	   theme of the re-ECN work is an audit mechanism that can provide
326	   sufficient disincentives against misrepresenting congestion
327	   [I-D.briscoe-tsvwg-re-ecn-motiv], which is analyzed extensively in
328	   Briscoe's PhD dissertation [Refb-dis].

330	   Re-ECN is a good example of one chosen set of compromises attempting
331	   to meet the requirements of Section 2.  However, the present document
332	   takes a step back, aiming to state the ideal requirements in order to
333	   allow the Internet community to assess whether other compromises are
334	   possible.

336	   In particular, different incremental deployment choices may be
337	   desirable to meet the partial deployment requirement of Section 2.
338	   Re-ECN requires the receiver to be at least ECN-capable as well as
339	   requiring an update to the sender.  Although ConEx will inherently
340	   require change at the sender, it would be preferable if it could
341	   work, even partially, with any receiver.

343	   The chosen ConEx protocol certainly must not require ECN to be
344	   deployed in any network.  In this respect re-ECN is already a good
345	   example--it acts perfectly well as a loss-based ConEx protocol if the
346	   loss-based audit techniques in Section 4.3 are used.  However, it
347	   would still be desirable to avoid the dependence on an ECN receiver.

349	   For a tutorial background on re-ECN techniques, see [Re-fb] and
350	   [FairerFaster].

352	3.2.1.  ECN Changes

354	   Although the re-ECN protocol requires no changes to the network part
355	   of the ECN protocol, it is important to note that it does propose
356	   some relatively minor modifications to the host-to-host aspects of
357	   the ECN protocol specified in RFC 3168.  They include: redefining the
358	   ECT(1) code point (the change is consistent with RFC 3168 but requires
359	   deprecating the experimental ECN nonce [RFC3540]); modifications to
360	   the ECN negotiations carried on the SYN and SYN-ACK; and using a
361	   different state machine to carry ECN signals in the transport
362	   acknowledgments from a modified Receiver to the Sender.  This last
363	   change is optional, but it permits the transport protocol to carry
364	   multiple congestion signals per round trip.  It greatly simplifies
365	   accurate auditing, and is likely to be useful in other transports,
366	   e.g.  DCTCP [DCTCP].

368	   All of these adjustments to RFC 3168 may also be needed in a future
369	   standardized ConEx protocol.  There will need to be very careful
370	   consideration of any proposed changes to ECN or other existing
371	   protocols, because any such changes increase the cost of deployment.

373	3.3.  Abstract Encoding

375	   The ConEx protocol could take one of two different encodings:
376	   independently settable bits or an enumerated set of mutually
377	   exclusive codepoints.

379	   In both cases, the amount of congestion is signaled by the volume of
380	   marked data--just as the volume of lost data or ECN marked data
381	   signals the amount of congestion experienced.  Thus the size of each
382	   packet carrying a ConEx Signal is significant.

384	3.3.1.  Independent Bits

386	   This encoding involves flag bits, each of which the sender can set
387	   independently to indicate to the network one of the following four
388	   signals:
389	   ConEx (Not-ConEx)  The transport is (or is not) using ConEx with this
390	      packet (the protocol MUST be arranged so that legacy transport
391	      senders implicitly send Not-ConEx)
392	   Re-Echo-Loss (Not-Re-Echo-Loss)  The transport has (or has not)
393	      experienced a loss
394	   Re-Echo-ECN (Not-Re-Echo-ECN)  The transport has (or has not)
395	      experienced ECN-signaled congestion
396	   Credit (Not-Credit)  The transport is (or is not) building up
397	      congestion credit (see Section 4.3 on the audit function)
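
   A minimal sketch of the independent-bits representation follows.
   The bit positions are hypothetical, since this document assigns
   none; the only property taken from the text is that an all-zero
   value stands for Not-ConEx, so that legacy senders implicitly send
   it.  The helper is_conex_marked follows the ConEx-Marked definition
   in Section 1.1.

```python
from enum import IntFlag


class ConExBits(IntFlag):
    """Four independently settable flags (bit positions are assumed)."""
    CONEX = 0x1
    RE_ECHO_LOSS = 0x2
    RE_ECHO_ECN = 0x4
    CREDIT = 0x8


# A legacy sender sets none of the bits, which reads as Not-ConEx.
NOT_CONEX = ConExBits(0)


def is_conex_marked(bits: ConExBits) -> bool:
    """ConEx-capable and at least one of Re-Echo-Loss, Re-Echo-ECN, Credit."""
    signals = ConExBits.RE_ECHO_LOSS | ConExBits.RE_ECHO_ECN | ConExBits.CREDIT
    return bool(bits & ConExBits.CONEX) and bool(bits & signals)


# Independent bits allow combinations such as Credit plus Re-Echo-ECN.
combined = ConExBits.CONEX | ConExBits.CREDIT | ConExBits.RE_ECHO_ECN
print(is_conex_marked(combined), is_conex_marked(NOT_CONEX))
```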

399	3.3.2.  Codepoint Encoding

401	   This encoding involves signaling one of the following five
402	   codepoints:

404	   ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN, Credit}

406	   Each named codepoint has the same meaning as in the encoding using
407	   independent bits (Section 3.3.1).  The use of any one codepoint
408	   implies the negative of all the others.

410	   Inherently, the semantics of most of the enumerated codepoints are
411	   mutually exclusive.  'Credit' is the only one that might need to be
412	   used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
413	   that requirement is questionable.  It must not be forgotten that the
414	   enumerated encoding loses the flexibility to signal these two
415	   combinations, whereas the encoding with four independent bits is not
416	   so limited.  Alternatively two extra codepoints could be assigned to
417	   these two combinations of semantics.
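
   The limitation described above can be made concrete with a short
   sketch.  The numeric values and the function to_codepoint are
   hypothetical; the sketch only shows that, with the five enumerated
   codepoints, a request to combine Credit with either Re-Echo signal
   has no representation.

```python
from enum import Enum


class ConExCodepoint(Enum):
    """The five mutually exclusive codepoints (values are assumed)."""
    NOT_CONEX = 0
    CONEX_NOT_MARKED = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


def to_codepoint(conex: bool, loss: bool, ecn: bool,
                 credit: bool) -> ConExCodepoint:
    """Map four independent signals onto one codepoint, if possible."""
    if not conex:
        if loss or ecn or credit:
            raise ValueError("signals require a ConEx-capable transport")
        return ConExCodepoint.NOT_CONEX
    chosen = [codepoint for wanted, codepoint in (
        (loss, ConExCodepoint.RE_ECHO_LOSS),
        (ecn, ConExCodepoint.RE_ECHO_ECN),
        (credit, ConExCodepoint.CREDIT),
    ) if wanted]
    if len(chosen) > 1:
        # This is the flexibility the enumerated encoding gives up.
        raise ValueError("combination not expressible with five codepoints")
    return chosen[0] if chosen else ConExCodepoint.CONEX_NOT_MARKED


print(to_codepoint(conex=True, loss=False, ecn=True, credit=False))
```

   Assigning the two extra codepoints mentioned above would amount to
   adding two members to the enumeration and removing the error for
   those two combinations.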

419	4.  Congestion Exposure Components

421	   {ToDo: Picture of the components, similar to that in the last
422	   slideset about conex-concepts-uses?}

424	4.1.  Modified Senders

426	   The sending transport needs to be modified to send Congestion
427	   Exposure Signals in response to congestion feedback signals.
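
   One possible shape for the sender modification is sketched below.
   It is an assumption-laden illustration, not a specified algorithm:
   the class ConExSender, its byte-based bookkeeping, and the order in
   which owed signals are paid off (Credit first, then Re-Echo-Loss,
   then Re-Echo-ECN) are all hypothetical.  It assumes one marking per
   packet, as in the codepoint encoding of Section 3.3.2.

```python
class ConExSender:
    """Hypothetical sender-side bookkeeping, measured in bytes."""

    def __init__(self, initial_credit: int = 0) -> None:
        self.credit_owed = initial_credit  # credit still to be signalled
        self.loss_owed = 0                 # lost bytes not yet re-echoed
        self.ecn_owed = 0                  # ECN-marked bytes not yet re-echoed

    def on_feedback(self, lost_bytes: int = 0, ecn_bytes: int = 0) -> None:
        """Record congestion reported by transport-layer feedback."""
        self.loss_owed += lost_bytes
        self.ecn_owed += ecn_bytes

    def mark_for(self, packet_size: int) -> str:
        """Choose the ConEx marking for the next outgoing packet."""
        if self.credit_owed > 0:
            self.credit_owed = max(0, self.credit_owed - packet_size)
            return "Credit"
        if self.loss_owed > 0:
            self.loss_owed = max(0, self.loss_owed - packet_size)
            return "Re-Echo-Loss"
        if self.ecn_owed > 0:
            self.ecn_owed = max(0, self.ecn_owed - packet_size)
            return "Re-Echo-ECN"
        return "ConEx-Not-Marked"


sender = ConExSender(initial_credit=1500)
print(sender.mark_for(1500))
sender.on_feedback(lost_bytes=1500)
print(sender.mark_for(1500))
```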

429	4.2.  Receivers (Optionally Modified)

431	   The receiving transport may already feed back sufficiently useful
432	   signals to the sender so that it does not need to be altered.

434	   However, a TCP receiver feeds back ECN congestion signals no more
435	   than once within a round trip.  The sender may require more precise
436	   feedback from the receiver; otherwise it will appear to be
437	   understating its ConEx Signals (see Section 3.2.1).

439	   Ideally, ConEx should be added to a transport like TCP without
440	   mandatory modifications to the receiver.  But an optional
441	   modification to the receiver could be recommended for precision.
442	   This was the approach taken when adding re-ECN to TCP
443	   [I-D.briscoe-tsvwg-re-ecn-tcp].

445	4.3.  Audit

447	   To audit ConEx Signals against actual losses (as opposed to ECN) an
448	   auditor could use one of the following techniques:
449	   TCP-specific approach:  The auditor could monitor TCP flows or
450	      aggregates of flows, only holding state on a flow if it first
451	      sends a Credit or a Re-Echo-Loss marking.  The auditor could
452	      detect retransmissions by monitoring sequence numbers.  It would
453	      assure that (volume of retransmitted data) <= (volume of data
454	      marked Re-Echo-Loss).  Traffic would only be auditable in this way
455	      if it conformed to the standard TCP protocol and the IP payload
456	      was not encrypted (e.g. with IPsec).
457	   Predominant bottleneck approach:  Unlike the above TCP-specific
458	      solution, this technique would work for IP packets carrying any
459	      transport layer protocol, and whether encrypted or not.  But it
460	      only works well for networks designed so that losses predominantly
461	      occur under the management of one IP-aware node on the path.  The
462	      auditor could then be located at this bottleneck.  It could simply
463	      compare ConEx Signals with actual local losses.  Most consumer
464	      access networks are designed to this model, e.g. the radio network
465	      controller (RNC) in a cellular network or the broadband remote
466	      access server (BRAS) in a digital subscriber line (DSL) network.

468	      The accuracy of an auditor at one predominant bottleneck might
469	      still be sufficient, even if losses occasionally occurred at other
470	      nodes in the network (e.g. border gateways).  Although the auditor
471	      at the predominant bottleneck would not always be able to detect
472	      losses at other nodes, transports would not know where losses were
473	      occurring either.  Therefore a transport would not know which
474	      losses it could cheat on without getting caught, and which ones it
475	      couldn't.
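
   The TCP-specific approach above could be sketched roughly as
   follows.  This is a simplification under stated assumptions: a
   single path, unencrypted TCP, no sequence number wrap-around, and
   payload bytes used as the unit for both counts.  The names
   TcpLossAuditor and FlowState are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlowState:
    next_seq: int = 0             # highest sequence number seen, plus one
    retransmitted_bytes: int = 0
    re_echo_loss_bytes: int = 0


class TcpLossAuditor:
    """Hypothetical auditor that detects retransmissions by sequence number."""

    def __init__(self) -> None:
        self.flows: Dict[str, FlowState] = {}

    def observe(self, flow_id: str, seq: int, payload_len: int,
                marking: str) -> None:
        state = self.flows.get(flow_id)
        if state is None:
            # Hold no state until the flow first sends Credit or Re-Echo-Loss.
            if marking not in ("Credit", "Re-Echo-Loss"):
                return
            state = self.flows[flow_id] = FlowState(next_seq=seq)
        end = seq + payload_len
        # Bytes below next_seq were seen before: count them as retransmitted.
        overlap = max(0, min(end, state.next_seq) - seq)
        state.retransmitted_bytes += overlap
        state.next_seq = max(state.next_seq, end)
        if marking == "Re-Echo-Loss":
            state.re_echo_loss_bytes += payload_len

    def passes(self, flow_id: str) -> bool:
        """Check (retransmitted volume) <= (Re-Echo-Loss volume)."""
        state = self.flows.get(flow_id)
        if state is None:
            return True
        return state.retransmitted_bytes <= state.re_echo_loss_bytes


auditor = TcpLossAuditor()
auditor.observe("flow-a", seq=0, payload_len=1000, marking="Credit")
auditor.observe("flow-a", seq=1000, payload_len=1000, marking="ConEx-Not-Marked")
auditor.observe("flow-a", seq=1000, payload_len=1000, marking="Re-Echo-Loss")
print(auditor.passes("flow-a"))
```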

477	   To audit ConEx Signals against actual ECN markings or losses, the
478	   auditor could work as follows: monitor flows or aggregates of flows,
479	   only holding state on a flow if it first sends a ConEx-Marked packet
480	   (Credit or either Re-Echo marking).  Count the number of bytes marked
481	   with Credit or Re-Echo-ECN.  Separately count the number of bytes
482	   marked with ECN.  Use Credits to assure that {#ECN} <= {#Re-Echo-ECN}
483	   + {#Credit}, even though the Re-Echo-ECN markings are delayed by at
484	   least one RTT.
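
   The byte counting described above could be sketched as follows.
   The class EcnAuditor and its argument names are hypothetical; the
   sketch assumes the auditor can see, for each packet, its size, its
   ConEx marking, and whether it carries an ECN congestion mark.

```python
class EcnAuditor:
    """Hypothetical per-flow counters for auditing against ECN marks."""

    CONEX_MARKED = ("Credit", "Re-Echo-Loss", "Re-Echo-ECN")

    def __init__(self) -> None:
        self.flows = {}

    def observe(self, flow_id: str, size: int, conex_marking: str,
                ecn_marked: bool) -> None:
        counters = self.flows.get(flow_id)
        if counters is None:
            # Only hold state once the flow sends a ConEx-Marked packet.
            if conex_marking not in self.CONEX_MARKED:
                return
            counters = self.flows[flow_id] = {
                "ecn": 0, "re_echo_ecn": 0, "credit": 0}
        if ecn_marked:
            counters["ecn"] += size
        if conex_marking == "Re-Echo-ECN":
            counters["re_echo_ecn"] += size
        elif conex_marking == "Credit":
            counters["credit"] += size

    def in_balance(self, flow_id: str) -> bool:
        """Check {#ECN} <= {#Re-Echo-ECN} + {#Credit} for a flow."""
        c = self.flows.get(flow_id)
        return c is None or c["ecn"] <= c["re_echo_ecn"] + c["credit"]


auditor = EcnAuditor()
auditor.observe("flow-a", 1500, "Credit", ecn_marked=False)
auditor.observe("flow-a", 1500, "ConEx-Not-Marked", ecn_marked=True)
print(auditor.in_balance("flow-a"))
```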

486	4.3.1.  Using Credit to Simplify Audit

488	   At the audit function, there will be an inherent delay of at least one
489	   round trip between a congestion signal and the subsequent ConEx
490	   signal it triggers--as it makes the two passes of the feedback loop
491	   in Figure 1.  However, the audit function cannot be expected to wait
492	   for a round trip to check that one signal balances the other, because
493	   it is hard for a network device to know the RTT of each transport.

495	   Instead, it considerably simplifies the audit function if the source
496	   transport is made responsible for removing the round trip delay in
497	   ConEx signals.  The transport SHOULD signal sufficient credit in
498	   advance to cover any reasonably expected congestion during its
499	   feedback delay.  Then, the audit function does not need to make
500	   allowance for round trip delays--that it cannot quantify.  This
501	   design choice correctly makes the transport responsible both for
502	   minimizing feedback delay and for the risk that packets in flight
503	   will cause congestion to others before the source can react.

505	   For example, imagine the audit function keeps a running account of
506	   the balance between actual congestion signals (loss or ECN), which it
507	   counts as negative, and ConEx signals, which it counts as positive.
508	   Having made the transport responsible for round trip delays, it will
509	   be expected to have pre-loaded the audit function with some credit at
510	   the start.  Therefore, if ever the balance does go negative, the
511	   audit function can immediately start punishing a flow, without any
512	   grace period.
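
   The running account in this example can be illustrated with a small
   sketch.  It assumes the audit function sees a time-ordered list of
   events, each either an actual congestion signal or a ConEx signal,
   with a size in bytes.  The function first_violation and the event
   names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Each event is ("congestion" or "conex", size in bytes), in arrival order.
Event = Tuple[str, int]


def first_violation(events: Iterable[Event]) -> Optional[int]:
    """Return the index of the first event that drives the balance negative.

    ConEx signals count as positive and actual congestion signals as
    negative.  None means the balance never went below zero.
    """
    balance = 0
    for index, (kind, size) in enumerate(events):
        if kind == "conex":
            balance += size
        elif kind == "congestion":
            balance -= size
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        if balance < 0:
            return index
    return None


# Credit sent in advance covers the round trip before the re-echo arrives.
with_credit = [("conex", 1500), ("congestion", 1500), ("conex", 1500)]
# Without credit, the balance goes negative as soon as congestion is seen.
without_credit = [("congestion", 1500), ("conex", 1500)]
print(first_violation(with_credit), first_violation(without_credit))
```

   Because the check needs no timer and no estimate of the round trip
   time, the audit function can act as soon as the balance goes
   negative.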

514	   The one-way nature of packet forwarding probably makes per-flow state
515	   unavoidable for the audit function.  This was a necessary sacrifice
516	   to avoid per-flow state elsewhere in the wider ConEx architecture.
517	   Nonetheless, care was taken to ensure that packets could bring soft-
518	   state to the audit function, so that it would continue to work if a
519	   flow shifted to a different audit device, perhaps after a reroute or
520	   an audit device failure.  Therefore, although the audit function is
521	   likely to need flow state memory, at least it complies with the
522	   'fate-sharing' design principle of the Internet [IntDesPrinciples],
523	   and at least per-flow audit is only required at the outer edges of
524	   the internetwork, where it is less of a scalability concern.

526	   Note also that ConEx does not intend to embed rules in the network on
527	   how individual flows _behave_.  The audit function only does per-flow
528	   processing to check the integrity of ConEx _information_.

530	4.3.2.  Behaviour Constraints for the Audit Function

532	   There is no intention to standardise how to design or implement the
533	   audit function.  However, it is necessary to lay down the following
534	   normative constraints on audit behaviour so that transport designers
535	   will know what to design against and implementers of audit devices
536	   will know what pitfalls to avoid:
537	   Minimal False Hits:  Audit SHOULD introduce minimal false hits for
538	      honest flows;
539	   Minimal False Misses:  Audit SHOULD quickly detect and sanction
540	      dishonest flows, preferably at the first dishonest packet;
541	   Transport Oblivious:  Audit MUST NOT be designed around one
542	      particular rate response, such as any particular TCP congestion
543	      control algorithm or one particular resource sharing regime such
544	      as TCP-friendliness [RFC3448].  An important goal is to give
545	      ingress networks the freedom to unilaterally allow different rate
546	      responses to congestion and different resource sharing regimes
547	      [Evol_cc], without having to coordinate with downstream networks;
548	   Sufficient Sanction:  Audit MUST introduce sufficient sanction (e.g.
549	      loss in goodput) so that sources cannot understate congestion and
550	      play off losses at the audit function against higher allowed
551	      throughput at a congestion policer [Salvatori05];
552	   Manage Memory Exhaustion:  Audit SHOULD be able to counter state
553	      exhaustion attacks.  For instance, if the audit function uses
554	      flow-state, it should not be possible for sources to exhaust its
555	      memory capacity by gratuitously sending numerous packets, each
556	      with a different flow ID;
557	   Identifier Accountability:  Audit MUST NOT be vulnerable to 'identity
558	      whitewashing', where a transport can label a flow with a new ID
559	      more cheaply than paying the cost of continuing to use its current
560	      ID [CheapPseud].

562	4.4.  Policy Devices

564	   Policy devices are characterised by a need to be configured with a
565	   policy related to the users or neighboring networks being served.  In
566	   contrast, the auditing devices referred to in the previous section
567	   primarily enforce compliance with the ConEx protocol and do not need
568	   to be configured with any client-specific policy.

570	4.4.1.  Policy Monitoring Devices

572	   Policy devices can typically be decomposed into two functions: i)
573	   monitoring the ConEx signal to compare it with a policy, then ii)
574	   acting in some way on the result.  Various actions might be invoked
575	   against 'out of contract' traffic, such as policing (see next
576	   section), re-routing, or downgrading the class of service.

578	   Alternatively, a policy device might not act directly on the traffic,
579	   but instead report to management systems that are designed to control
580	   congestion indirectly.  For instance, the reports might trigger
581	   capacity upgrades, penalty clauses in contracts, charges between
582	   networks based on congestion, or merely warnings to clients who
583	   are causing excessive congestion.

585	   Nonetheless, whatever action is invoked, the policy monitoring
586	   function will always be a necessary part of any policy device.

588	4.4.2.  Congestion Policers

590	   A congestion policer can be implemented in a very similar way to a
591	   bit-rate policer, but its effect can be focused solely on traffic
592	   causing congestion downstream, which ConEx signals make visible.
593	   Without ConEx signals, the only way to mitigate congestion is to
594	   blindly limit traffic bit-rate, on the assumption that high bit-rate
595	   is more likely to cause congestion.

597	   A congestion policer monitors all ConEx traffic entering a network,
598	   or some identifiable subset.  Using ConEx signals, it measures the
599	   amount of congestion that this traffic is contributing to somewhere
600	   downstream.  If this exceeds a policy-configured 'congestion-bit-
601	   rate' the congestion policer will limit all the monitored ConEx
602	   traffic.

604	   A congestion policer can be implemented by a simple token bucket.
605	   But unlike a bit-rate policer, it removes a token only when it
606	   forwards a packet that is ConEx-Marked, effectively treating Not-
607	   ConEx-Marked packets as invisible.  Consequently, because tokens give
608	   the right to send congested bits, the fill-rate of the token bucket
609	   will represent the allowed congestion-bit-rate, which should be
610	   sufficient traffic management without having to additionally
611	   constrain the straight bit-rate.  See [CongPol] for details.
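
   The token bucket described above might look like the following
   sketch.  It is illustrative only; see [CongPol] for the actual
   design.  The class and parameter names are hypothetical, tokens are
   counted in bytes, timestamps are passed in explicitly, and dropping
   every packet while the bucket is empty is a simplifying assumption
   (the text above says only that the monitored traffic is limited).

```python
class CongestionPolicer:
    """Hypothetical token bucket that meters only ConEx-Marked bytes."""

    def __init__(self, fill_rate: float, depth: float) -> None:
        self.fill_rate = fill_rate  # allowed congestion-bit-rate, bytes/s here
        self.depth = depth          # largest burst of congestion tolerated
        self.tokens = depth         # start with a full bucket
        self.last = 0.0             # time of the previous update, seconds

    def forward(self, now: float, size: int, conex_marked: bool) -> bool:
        """Return True to forward the packet, False to drop it."""
        # Refill at the configured rate, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last = now
        if self.tokens <= 0:
            # Bucket empty: limit all monitored traffic (assumed: drop).
            return False
        if conex_marked:
            # Only ConEx-Marked packets consume tokens.
            self.tokens -= size
        return True


policer = CongestionPolicer(fill_rate=100.0, depth=1000.0)
unmarked_ok = policer.forward(0.0, 1500, conex_marked=False)
marked_ok = policer.forward(0.0, 1500, conex_marked=True)
while_empty = policer.forward(1.0, 1500, conex_marked=False)
after_refill = policer.forward(10.0, 100, conex_marked=False)
```

   Packets that are Not-ConEx-Marked leave the bucket untouched, so a
   high bit-rate alone never triggers the policer in this sketch.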

613	5.  IANA Considerations

615	   This memo includes no request to IANA.

617	   Note to RFC Editor: this section may be removed on publication as an
618	   RFC.

620	6.  Security Considerations

622	   Significant parts of this document concern the auditability of
623	   ConEx signals, in particular Section 4.3.

625	7.  Conclusions

627	   {ToDo:}

629	8.  Acknowledgements

631	   This document was improved by review comments from Toby Moncaster,
632	   Nandita Dukkipati, Mirja Kuehlewind and Caitlin Bestler.

634	9.  Comments Solicited

636	   Comments and questions are encouraged and very welcome.  They can be
637	   addressed to the IETF Congestion Exposure (ConEx) working group
638	   mailing list <conex@ietf.org>, and/or to the authors.

640	10.  References

642	10.1.  Normative References

644	   [RFC2119]                         Bradner, S., "Key words for use in
645	                                     RFCs to Indicate Requirement
646	                                     Levels", BCP 14, RFC 2119,
647	                                     March 1997.

649	10.2.  Informative References

651	   [CheapPseud]                      Friedman, E. and P. Resnick, "The
652	                                     Social Cost of Cheap Pseudonyms",
653	                                     Journal of Economics and Management
654	                                     Strategy 10(2)173--199, 1998.

656	   [CongPol]                         Jacquet, A., Briscoe, B., and T.
657	                                     Moncaster, "Policing Freedom to Use
658	                                     the Internet Resource Pool", Proc
659	                                     ACM Workshop on Re-Architecting the
660	                                     Internet (ReArch'08) ,
661	                                     December 2008, <http://
662	                                     bobbriscoe.net/projects/
663	                                     refb/#polfree>.

665	   [DCTCP]                           Alizadeh, M., Greenberg, A., Maltz,
666	                                     D., Padhye, J., Patel, P.,
667	                                     Prabhakar, B., Sengupta, S., and M.
668	                                     Sridharan, "Data Center TCP
669	                                     (DCTCP)", ACM SIGCOMM
670	                                     CCR 40(4)63--74, October 2010,
671	                                     <http://portal.acm.org/
672	                                     citation.cfm?id=1851192>.

674	   [Evol_cc]                         Gibbens, R. and F. Kelly, "Resource
675	                                     pricing and the evolution of
676	                                     congestion control",
677	                                     Automatica 35(12)1969--1985,
678	                                     December 1999, <http://
679	                                     www.statslab.cam.ac.uk/~frank/
680	                                     evol.html>.

682	   [FairerFaster]                    Briscoe, B., "A Fairer, Faster
683	                                     Internet Protocol", IEEE
684	                                     Spectrum Dec 2008:38--43,
685	                                     December 2008, <http://
686	                                     bobbriscoe.net/projects/
687	                                     refb/#fairfastip>.

689	   [I-D.briscoe-tsvwg-re-ecn-motiv]  Briscoe, B., Jacquet, A.,
690	                                     Moncaster, T., and A. Smith, "Re-
691	                                     ECN: A Framework for adding
692	                                     Congestion Accountability to
693	                                     TCP/IP", draft-briscoe-tsvwg-re-
694	                                     ecn-tcp-motivation-02 (work in
695	                                     progress), October 2010.

697	   [I-D.briscoe-tsvwg-re-ecn-tcp]    Briscoe, B., Jacquet, A.,
698	                                     Moncaster, T., and A. Smith, "Re-
699	                                     ECN: Adding Accountability for
700	                                     Causing Congestion to TCP/IP",
701	                                     draft-briscoe-tsvwg-re-ecn-tcp-09
702	                                     (work in progress), October 2010.

704	   [I-D.conex-concepts-uses]         Briscoe, B., Woundy, R., Moncaster,
705	                                     T., and J. Leslie, "ConEx Concepts
706	                                     and Use Cases",
707	                                     draft-ietf-conex-concepts-uses-01
708	                                     (work in progress), March 2011.

710	   [I-D.ietf-ledbat-congestion]      Shalunov, S., Hazel, G., and J.
711	                                     Iyengar, "Low Extra Delay
712	                                     Background Transport (LEDBAT)",
713	                                     draft-ietf-ledbat-congestion-03
714	                                     (work in progress), October 2010.

716	   [I-D.sridharan-tcpm-ctcp]         Sridharan, M., Tan, K., Bansal, D.,
717	                                     and D. Thaler, "Compound TCP: A New
718	                                     TCP Congestion Control for High-
719	                                     Speed and Long Distance  Networks",
720	                                     draft-sridharan-tcpm-ctcp-02 (work
721	                                     in progress), November 2008.

723	   [IntDesPrinciples]                Clark, D., "The Design Philosophy
724	                                     of the DARPA Internet Protocols",
725	                                     ACM SIGCOMM CCR 18(4)106--114,
726	                                     August 1988, <http://www.acm.org/
727	                                     sigcomm/ccr/archive/1995/jan95/
728	                                     ccr-9501-clark.pdf>.

730	   [RFC0791]                         Postel, J., "Internet Protocol",
731	                                     STD 5, RFC 791, September 1981.

733	   [RFC2309]                         Braden, B., Clark, D., Crowcroft,
734	                                     J., Davie, B., Deering, S., Estrin,
735	                                     D., Floyd, S., Jacobson, V.,
736	                                     Minshall, G., Partridge, C.,
737	                                     Peterson, L., Ramakrishnan, K.,
738	                                     Shenker, S., Wroclawski, J., and L.
739	                                     Zhang, "Recommendations on Queue
740	                                     Management and Congestion Avoidance
741	                                     in the Internet", RFC 2309,
742	                                     April 1998.

744	   [RFC3168]                         Ramakrishnan, K., Floyd, S., and D.
745	                                     Black, "The Addition of Explicit
746	                                     Congestion Notification (ECN) to
747	                                     IP", RFC 3168, September 2001.

749	   [RFC3448]                         Handley, M., Floyd, S., Padhye, J.,
750	                                     and J. Widmer, "TCP Friendly Rate
751	                                     Control (TFRC): Protocol
752	                                     Specification", RFC 3448,
753	                                     January 2003.

755	   [RFC3514]                         Bellovin, S., "The Security Flag in
756	                                     the IPv4 Header", RFC 3514,
757	                                     1 April 2003.

759	   [RFC3540]                         Spring, N., Wetherall, D., and D.
760	                                     Ely, "Robust Explicit Congestion
761	                                     Notification (ECN) Signaling with
762	                                     Nonces", RFC 3540, June 2003.

764	   [RFC3550]                         Schulzrinne, H., Casner, S.,
765	                                     Frederick, R., and V. Jacobson,
766	                                     "RTP: A Transport Protocol for
767	                                     Real-Time Applications", STD 64,
768	                                     RFC 3550, July 2003.

770	   [RFC5670]                         Eardley, P., "Metering and Marking
771	                                     Behaviour of PCN-Nodes", RFC 5670,
772	                                     November 2009.

774	   [RFC5681]                         Allman, M., Paxson, V., and E.
775	                                     Blanton, "TCP Congestion Control",
776	                                     RFC 5681, September 2009.

778	   [Re-fb]                           Briscoe, B., Jacquet, A., Di
779	                                     Cairano-Gilfedder, C., Salvatori,
780	                                     A., Soppera, A., and M. Koyabe,
781	                                     "Policing Congestion Response in an
782	                                     Internetwork Using Re-Feedback",
783	                                     ACM SIGCOMM CCR 35(4)277--288,
784	                                     August 2005, <http://www.acm.org/
785	                                     sigs/sigcomm/sigcomm2005/
786	                                     techprog.html#session8>.

788	   [Refb-dis]                        Briscoe, B., "Re-feedback: Freedom
789	                                     with Accountability for Causing
790	                                     Congestion in a Connectionless
791	                                     Internetwork", UCL PhD
792	                                     Dissertation , 2009, <http://
793	                                     bobbriscoe.net/projects/
794	                                     refb/#refb-dis>.

796	   [Salvatori05]                     Salvatori, A., "Closed Loop Traffic
797	                                     Policing", Politecnico Torino and
798	                                     Institut Eurecom Masters Thesis ,
799	                                     September 2005.

801	   [Vegas]                           Brakmo, L. and L. Peterson, "TCP
802	                                     Vegas: End-to-End Congestion
803	                                     Avoidance on a Global Internet",
804	                                     IEEE Journal on Selected Areas in
805	                                     Communications 13(8)1465--80,
806	                                     October 1995, <http://
807	                                     ieeexplore.ieee.org/iel1/49/9740/
808	                                     00464716.pdf?arnumber=464716>.

810	Authors' Addresses

812	   Matt Mathis
813	   Google, Inc
814	   1600 Amphitheatre Parkway
815	   Mountain View, California  94043
816	   USA

818	   EMail: mattmathis@google.com

820	   Bob Briscoe
821	   BT
822	   B54/77, Adastral Park
823	   Martlesham Heath
824	   Ipswich  IP5 3RE
825	   UK

827	   Phone: +44 1473 645196
828	   EMail: bob.briscoe@bt.com
829	   URI:   http://bobbriscoe.net/