2	Congestion Exposure (ConEx) Working                            M. Mathis
3	Group                                                        Google, Inc
4	Internet-Draft                                                B. Briscoe
5	Intended status: Informational                                        BT
6	Expires: April 27, 2015                                 October 24, 2014

8	      Congestion Exposure (ConEx) Concepts, Abstract Mechanism and
9	                              Requirements
10	                   draft-ietf-conex-abstract-mech-13

12	Abstract

14	   This document describes an abstract mechanism by which senders inform
15	   the network about the congestion recently encountered by packets in
16	   the same flow.  Today, network elements at any layer may signal
17	   congestion to the receiver by dropping packets or by ECN markings,
18	   and the receiver passes this information back to the sender in
19	   transport-layer feedback.  The mechanism described here enables the
20	   sender to also relay this congestion information back into the
21	   network in-band at the IP layer, such that the total amount of
22	   congestion from all elements on the path is revealed to all IP
23	   elements along the path, where it could, for example, be used to
24	   provide input to traffic management.  This mechanism is called
25	   congestion exposure or ConEx.  The companion document "ConEx Concepts
26	   and Use Cases" provides the entry-point to the set of ConEx
27	   documentation.

29	Status of This Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at http://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on April 27, 2015.

46	Copyright Notice

48	   Copyright (c) 2014 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	   2.  Overview
65	     2.1.  Terminology
66	   3.  Requirements for the ConEx Abstract Mechanism
67	     3.1.  Requirements for ConEx Signals
68	     3.2.  Constraints on the Audit Function
69	     3.3.  Requirements for non-abstract ConEx specifications
70	   4.  Encoding Congestion Exposure
71	     4.1.  Naive Encoding
72	     4.2.  Null Encoding
73	     4.3.  ECN Based Encoding
74	     4.4.  Independent Bits
75	     4.5.  Codepoint Encoding
76	     4.6.  Units Implied by an Encoding
77	   5.  Congestion Exposure Components
78	     5.1.  Network Devices (Not modified)
79	     5.2.  Modified Senders
80	     5.3.  Receivers (Optionally Modified)
81	     5.4.  Policy Devices
82	       5.4.1.  Congestion Monitoring Devices
83	       5.4.2.  Rest-of-Path Congestion Monitoring
84	       5.4.3.  Congestion Policers
85	     5.5.  Audit
86	   6.  Support for Incremental Deployment
87	   7.  IANA Considerations
88	   8.  Security Considerations
89	   9.  Acknowledgements
90	   10. Comments Solicited
91	   11. References
92	     11.1. Normative References
93	     11.2. Informative References

95	1.  Introduction

97	   This document describes an abstract mechanism by which, to a first
98	   approximation, senders inform the network about the congestion
99	   encountered by packets earlier in the same flow.  It is not a
100	   complete protocol specification, because it is known that designing
101	   an encoding (e.g. packet formats or codepoint allocations) is
102	   likely to entail compromises that preclude some uses of the protocol.
103	   The goal of this document is to provide a framework for developing
104	   and testing algorithms to evaluate the benefits of the ConEx protocol
105	   and to evaluate the consequences of the compromises in various
106	   different encoding designs.  This document lays out requirements for
107	   concrete protocol specifications.

109	   A companion document [RFC6789] provides the entry point to the set of
110	   ConEx documentation.  It outlines concepts that are pre-requisites to
111	   understanding why ConEx is useful, and it outlines various ways that
112	   ConEx might be used.

114	2.  Overview

116	   As typical end-to-end transport protocols continually seek out more
117	   network capacity, network elements signal whenever congestion
118	   results, and the transports are responsible for controlling this
119	   network congestion [RFC5681].  The more a transport tries to use
120	   capacity that others want to use, the more congestion signals will be
121	   attributable to that transport.  Likewise, the more transport
122	   sessions sustained by a user and the longer the user sustains them,
123	   the more congestion signals will be attributable to that user.  The
124	   goal of ConEx is to ensure that the resulting congestion signals are
125	   sufficiently visible and robust, because they are an ideal metric for
126	   networks to use as the basis of traffic management or other related
127	   functions.

129	   Networks indicate congestion by three possible signals: packet loss,
130	   ECN marking or queuing delay.  ECN marking and some packet loss may
131	   be the outcome of Active Queue Management (AQM), which the network
132	   uses to warn senders to reduce their rates.  Packet loss is also the
133	   natural consequence of complete exhaustion of a buffer or other
134	   network resource.  Some experimental transport protocols and TCP
135	   variants infer impending congestion from increasing queuing delay.
136	   However, delay is too amorphous to use as a congestion metric.  In
137	   this and other ConEx documents, the term 'congestion signals' is
138	   generally used solely for ECN markings and packet losses, because
139	   they are unambiguous signals of congestion.

141	   In both cases (loss and ECN marking) the signals follow the route
142	   indicated in Figure 1.  A congested network device sends a signal in
143	   the data stream on the forward path to the transport receiver, the
144	   receiver passes it back to the sender through transport level
145	   feedback, and the sender makes some congestion control adjustment.

147	   This document extends the capabilities of the Internet protocol suite
148	   with the addition of a new Congestion Exposure signal.  To a first
149	   approximation this signal, also shown in Figure 1, relays the
150	   congestion information from the transport sender back through the
151	   internetwork layer where it is visible to any interested internetwork
152	   layer devices along the forward path.  This document frames the
153	   engineering problem of designing the ConEx signal.  The requirements
154	   are described in Section 3 and some example encodings are presented
155	   in Section 4.  Section 5 describes all of the protocol components.

157	   This new signal is expressly designed to support a variety of new
158	   policy mechanisms that might be used to instrument, monitor or manage
159	   traffic.  The policy devices are not shown in Figure 1 but might be
160	   placed anywhere along the forward data path (see Section 5.4).

162	   ,---------.                                               ,---------.
163	   |Transport|                                               |Transport|
164	   | Sender  |   .                                           |Receiver |
165	   |         |  /|___________________________________________|         |
166	   |     ,-<---------------Congestion-Feedback-Signals--<--------.     |
167	   |     |   |/                                              |   |     |
168	   |     |   |\           Transport Layer Feedback Flow      |   |     |
169	   |     |   | \  ___________________________________________|   |     |
170	   |     |   |  \|                                           |   |     |
171	   |     |   |   '         ,-----------.               .     |   |     |
172	   |     |   |_____________|           |_______________|\    |   |     |
173	   |     |   |    IP Layer |           |  Data Flow      \   |   |     |
174	   |     |   |             |(Congested)|                  \  |   |     |
175	   |     |   |             |  Network  |--Congestion-Signals--->-'     |
176	   |     |   |             |  Device   |                    \|         |
177	   |     |   |             |           |                    /|         |
178	   |     `----------->--(new)-IP-Layer-ConEx-Signals-------->|         |
179	   |         |             |           |                  /  |         |
180	   |         |_____________|           |_______________  /   |         |
181	   |         |             |           |               |/    |         |
182	   `---------'             `-----------'               '     `---------'

184	            Figure 1: The Flow of Congestion and ConEx Signals

186	   Since the policy devices can affect how traffic is treated it is
187	   assumed that there is an intrinsic motivation for users, applications
188	   or operating systems to understate the congestion that they are
189	   causing.  Therefore, it is important to be able to audit ConEx
190	   signals, and to be able to apply sufficient sanction to discourage
191	   cheating of congestion policies.  The general approach to auditing is
192	   to count signals on the forward path to confirm that there are never
193	   fewer ConEx signals than congestion signals.  Many ConEx design
194	   constraints come from the need to assure that the audit function is
195	   sufficiently robust.  The audit function is described in Section 5.5;
196	   however, significant portions of this document (and prior research
197	   [Refb-dis]) are motivated by issues relating to the audit function
198	   and making it robust.
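
   As an illustration only, the counting approach described above can
   be sketched in Python.  The class name, the use of bytes as the
   unit and the per-flow scope are assumptions of this sketch and are
   not taken from any ConEx specification; in particular, how an audit
   device learns of a loss is not modelled.

```python
from dataclasses import dataclass


@dataclass
class AuditCounter:
    """Running totals for one flow on the forward path (hypothetical)."""

    congestion_bytes: int = 0  # bytes in packets carrying a congestion signal
    conex_bytes: int = 0       # bytes in packets carrying a ConEx signal

    def observe(self, size: int, congested: bool, conex_marked: bool) -> None:
        # Accumulate both volumes from the same stream of packets.
        if congested:
            self.congestion_bytes += size
        if conex_marked:
            self.conex_bytes += size

    def passes(self) -> bool:
        # Audit condition: never fewer ConEx signals than congestion signals.
        return self.conex_bytes >= self.congestion_bytes


flow = AuditCounter()
flow.observe(1500, congested=True, conex_marked=False)  # ECN-marked packet
deficit_before_re_echo = not flow.passes()              # signal not yet re-echoed
flow.observe(1500, congested=False, conex_marked=True)  # sender re-echoes later
```

   The transient deficit between the congestion signal and its re-echo
   in this example corresponds to the feedback delay that the Credit
   signal is intended to cover.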

200	   The congestion and ConEx signals shown in Figure 1 represent a series
201	   of discrete events: ECN marks or lost packets, carried by the forward
202	   data stream and fed back into the Internetwork layer.  The policy and
203	   audit functions are most likely to act on the accumulated values of
204	   these signals, for which we use the term "volume".  For example,
205	   traffic volume is the total number of bytes delivered, optionally
206	   over a specified time interval and over some aggregate of traffic
207	   (e.g. all traffic from a site), while loss-volume is the total
208	   number of bytes discarded from some aggregate over an interval.  The
209	   term congestion-volume is defined precisely in [RFC6789].  Note that
210	   volume per unit time is (average) rate.
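
   A small worked example may help fix these terms.  The packet sizes
   and the 2 second interval below are invented for illustration, and
   the helper names are assumptions of this sketch.

```python
def volume(packet_sizes):
    """Accumulated bytes over some aggregate of packets."""
    return sum(packet_sizes)


def average_rate(volume_bytes, interval_s):
    """Volume per unit time, in bytes per second."""
    return volume_bytes / interval_s


# Hypothetical observations for one site over a 2 second interval.
delivered = [1500, 1500, 500]   # sizes of packets that were delivered
discarded = [1500]              # sizes of packets that were dropped

traffic_volume = volume(delivered)                # 3500 bytes
loss_volume = volume(discarded)                   # 1500 bytes
traffic_rate = average_rate(traffic_volume, 2.0)  # 1750.0 bytes per second
```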

212	   A design goal of the ConEx protocol is that the important policy
213	   mechanisms can be implemented per logical link without per flow state
214	   (see Section 5.4).  However, the price to pay can be flow state to
215	   audit ConEx signals (Section 5.5).  This is justified in that i)
216	   auditing at the edges, with limited per flow state, enables policy
217	   elsewhere, including in the core, without any per flow state; ii)
218	   auditing can use soft flow state, which does not require route
219	   pinning.

221	   There is a long standing argument over units of congestion: bytes vs
222	   packets (see [RFC7141] and its references).  Section 4.6 explains why
223	   this problem must be addressed carefully.  However, this document
224	   does not take a strong position on this issue.  Nonetheless, it does
225	   require that the units of congestion must be an explicitly stated
226	   property of any proposed encoding, and the consequences of that
227	   design decision must be evaluated along with other aspects of the
228	   design.

230	   To be successful the ConEx protocol needs to have the property that
231	   the relevant stakeholders each have the incentive to unilaterally
232	   start on each stage of partial deployment, which in turn creates
233	   incentives for further deployment.  Furthermore, legacy systems that
234	   will never be upgraded must not become a barrier to deploying ConEx.
235	   Issues relating to partial deployment are described in Section 6.

237	   Note that ConEx signals are not intended to be used for fine-grained
238	   congestion control.  They are anticipated to be most useful at longer
239	   time scales and/or at coarser granularity than single microflows.
240	   For example the total congestion caused by a user might serve as an
241	   input to higher level policy or accountability functions, designed to
242	   create incentives for improving user behavior, such as choosing to
243	   send large quantities of data at off-peak times, at lower data rates
244	   or with less aggressive protocols such as LEDBAT [RFC6817] (see
245	   [RFC6789]).

247	   Ultimately ConEx signals have the potential to provide a mechanism to
248	   regulate global Internet congestion.  From the earliest days of
249	   congestion control research there has been a concern that there is no
250	   mechanism to prevent transport designers from incrementally making
251	   protocols more aggressive without bound and spiraling to a "tragedy
252	   of the commons" Internet congestion collapse.  The "TCP friendly"
253	   paradigm was created in part to forestall this failure.  However, it
254	   no longer commands any authority because it has little to say about
255	   the Internet of today, which has moved beyond the scaling range of
256	   standard TCP.  As a consequence, many transports and applications are
257	   opening arbitrarily large numbers of connections or using arbitrary
258	   levels of aggressiveness.  ConEx represents a recognition that the
259	   IETF cannot regulate this space directly because it concerns the
260	   behaviour of users and applications, not individual transport
261	   protocols.  Instead the IETF can give network operators the protocol
262	   tools to arbitrate the space themselves, with better bulk traffic
263	   management.  This in turn should create incentives for users, and
264	   designers of applications and of transport protocols, to be more
265	   mindful about contributing to congestion.

267	2.1.  Terminology

269	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
270	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
271	   document are to be interpreted as described in RFC 2119 [RFC2119].

273	   ConEx signals in IP packet headers from the sender to the network:
274	   Not-ConEx:  The transport (or at least this packet) is not ConEx-
275	      capable.
276	   ConEx-Capable:  The transport is ConEx-Capable.  This is the opposite
277	      of Not-ConEx.
278	   ConEx Signal:  A signal in a packet sent by a ConEx-Capable
279	      transport.  It carries at least one of the following signals:
280	      Re-Echo-Loss:  The transport has experienced a loss.
281	      Re-Echo-ECN:  The transport has detected an ECN congestion
282	         experienced (CE) mark.
284	      Credit:  The transport is building up credit to signal advance
285	         notice of the risk of packets contributing to congestion, in
286	         contrast to signalling only after inherently delayed feedback
287	         of actual congestion.
288	      ConEx-Not-Marked:  The transport is ConEx-capable but is signaling
289	         none of Re-Echo-Loss, Re-Echo-ECN or Credit.
290	   ConEx-Marked:  At least one of Re-Echo-Loss, Re-Echo-ECN or Credit.
291	   ConEx-Re-Echo:  At least one of Re-Echo-Loss or Re-Echo-ECN.
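
   The relationships between these terms can be sketched as follows.
   This is not an encoding: representing a Not-ConEx packet as None,
   and allowing several signals to be combined in one packet, are
   assumptions made only for the sketch (whether signals can be
   combined depends on the encoding chosen).

```python
from enum import Flag, auto
from typing import Optional


class Signal(Flag):
    NONE = 0               # ConEx-Not-Marked
    RE_ECHO_LOSS = auto()
    RE_ECHO_ECN = auto()
    CREDIT = auto()


def labels(signals: Optional[Signal]) -> set:
    """Return the terminology labels that apply to one packet.

    signals is None for a Not-ConEx packet (an assumption of this sketch).
    """
    if signals is None:
        return {"Not-ConEx"}
    result = {"ConEx-Capable"}
    if signals == Signal.NONE:
        result.add("ConEx-Not-Marked")
    else:
        result.add("ConEx-Marked")
    if signals & (Signal.RE_ECHO_LOSS | Signal.RE_ECHO_ECN):
        result.add("ConEx-Re-Echo")
    return result
```

   For instance, a packet carrying only Credit is ConEx-Marked but is
   not a ConEx-Re-Echo.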

293	3.  Requirements for the ConEx Abstract Mechanism

295	   First-time readers may wish to skim this section, since it is easier
296	   to understand after reading the entire document.

298	3.1.  Requirements for ConEx Signals

300	   Ideally, all the following requirements would be met by a Congestion
301	   Exposure Signal:
302	   a.  The ConEx Signal SHOULD be visible to internetwork layer devices
303	       along the entire path from the transport sender to the transport
304	       receiver.  Equivalently, it SHOULD be present in the IPv4 or IPv6
305	       header, and in the outermost IP header if using IP in IP
306	       tunneling.  It MAY need to be visible if other encapsulating
307	       headers are used to interconnect networks.  The ConEx Signal
308	       SHOULD be immutable once set by the transport sender.  A
309	       corollary of these requirements is that the chosen ConEx encoding
310	       SHOULD pass silently without modification through pre-existing
311	       networking gear.
312	   b.  The ConEx Signal SHOULD be useful under only partial deployment.
313	       A minimal deployment SHOULD only require changes to transport
314	       senders.  Furthermore, partial deployment SHOULD create
315	       incentives for additional deployment, both in terms of enabling
316	       ConEx on more devices and adding richer features to existing
317	       devices.  Nonetheless, ConEx deployment need never be universal,
318	       and it is anticipated that some hosts and some transports may
319	       never support the ConEx Protocol and some networks may never use
320	       the ConEx Signals.
321	   c.  The ConEx signal SHOULD be timely.  There will be a minimum delay
322	       of one RTT, and often longer if the transport protocol sends
323	       infrequent feedback (consider RTCP [RFC3550], [RFC6679] for
324	       example).
325	   d.  The ConEx signal SHOULD be accurate and auditable.  The general
326	       approach for auditing is to observe the volume of congestion
327	       signals and ConEx signals on the forward data path and verify
328	       that the ConEx signals do not under-represent the congestion
329	       signals (see Section 5.5).
331	   e.  The ConEx signals for packet loss and ECN marking SHOULD have
332	       distinct encodings because they are likely to require different
333	       auditing techniques.
334	   f.  Additionally there SHOULD be an auditable ConEx Credit signal.  A
335	       sender can use Credit to indicate potential future congestion,
336	       for example as often seen during startup.  ConEx Credit is
337	       intended to overestimate congestion actually experienced across
338	       the network.

340	   It is already known that implementing ConEx signals is likely to
341	   entail some compromises, and therefore all the requirements above are
342	   expressed with the keyword 'SHOULD' rather than 'MUST'.  The only
343	   mandatory requirement is that a concrete protocol description MUST
344	   give sound reasoning if it chooses not to meet some requirement.

346	3.2.  Constraints on the Audit Function

348	   The role of the audit function and constraints on it are described in
349	   Section 5.5.  There is no intention to standardise the audit
350	   function.  However, it is necessary to lay down the following
351	   normative constraints on audit behaviour so that transport designers
352	   will know what to design against and implementers of audit devices
353	   will know what pitfalls to avoid:
354	   Minimal False Hits:  Audit SHOULD introduce minimal false hits for
355	      honest flows;
356	   Minimal False Misses:  Audit SHOULD quickly detect and sanction
357	      dishonest flows, ideally on the first dishonest packet;
358	   Transport Oblivious:  Audit SHOULD NOT be designed around one
359	      particular rate response, such as any particular TCP congestion
360	      control algorithm or one particular resource sharing regime such
361	      as TCP-friendliness [RFC5348].  An important goal is to give
362	      ingress networks the freedom to unilaterally allow different rate
363	      responses to congestion and different resource sharing regimes
364	      [Evol_cc], without having to coordinate with other networks over
365	      details of individual flow behaviour;
366	   Sufficient Sanction:  Audit SHOULD introduce sufficient sanction
367	      (e.g. loss in goodput) such that senders cannot gain from
368	      understating congestion;
369	   Proportionate Sanction:  To the extent that the audit might be
370	      subject to false hits, the sanction SHOULD be proportionate to the
371	      degree to which congestion is understated.  If audit over-
372	      punishes, attackers will find ways to harness it into amplifying
373	      attacks on others.  Ideally audit should, in the long-run, cause
374	      the user to get no better performance than they would get by being
375	      accurate.
377	   Manage Memory Exhaustion:  Audit SHOULD be able to counter state
378	      exhaustion attacks.  For instance, if the audit function uses
379	      flow-state, it should not be possible for senders to exhaust its
380	      memory capacity by gratuitously sending numerous packets, each
381	      with a different flow ID.
382	   Identifier Accountability:  Audit SHOULD NOT be vulnerable to
383	      `identity whitewashing', where a transport can label a flow with a
384	      new ID more cheaply than paying the cost of continuing to use its
385	      current ID [CheapPseud].

387	3.3.  Requirements for non-abstract ConEx specifications

389	   An experimental ConEx specification SHOULD describe the following
390	   protocol details:
391	   Network Layer:
392	      A.  The specific ConEx signal encodings with packet formats, bit
393	          fields and/or code points;
394	      B.  An inventory of invalid combinations of flags or invalid
395	          codepoints in the encoding.  Whether security gateways should
396	          normalise, discard or ignore such invalid encodings, and what
397	          values they should be considered equivalent to by ConEx-aware
398	          elements;
399	      C.  An inventory of any conflated signals or any other effects
400	          that are known to compromise signal integrity;
401	      D.  Whether the source is responsible for allowing for the round
402	          trip delay in ConEx signals (e.g. using a Credit marking), and
403	          if so whether Credit is maintained for the duration of a flow
404	          or degrades over time, and what defines the end of the
405	          duration of a flow;
406	      E.  A specification for signal units (bytes vs packets, etc), any
407	          approximations allowed and algorithms to do any implied
408	          conversions or accounting;
409	      F.  If the units are bytes a definition of which headers are
410	          included in the size of the packet;
411	      G.  How tunnels should propagate the ConEx encoding;
412	      H.  Whether the encoding fields are mutable or not, to ensure that
413	          header authentication, checksum calculation, etc. process them
414	          correctly.  A ConEx encoding field SHOULD be immutable end-to-
415	          end, so that end points can detect if it has been tampered
416	          with in transit;
417	      I.  If a specific encoding allows mutability (e.g. at proxies), an
418	          inventory of invalid transitions between codepoints.  In all
419	          encodings, transitions from any ConEx marking to Not-ConEx
420	          MUST be invalid;
421	      J.  A statement that the ConEx encoding is only applicable to
422	          unicast and anycast, and that forwarding elements should
423	          silently ignore any ConEx signalling on multicast packets
424	          (they should be forwarded unchanged);
426	      K.  Definition of any extensibility;
427	      L.  Backward and forward compatibility and potential migration
428	          strategies.  In all cases, a ConEx encoding MUST be arranged
429	          so that legacy transport senders implicitly send Not-ConEx;
430	      M.  Any (optional) modification to data-plane forwarding dependent
431	          on the encoding (e.g. preferential discard, interaction with
432	          Diffserv, ECN etc.);
433	      N.  Any warning or error messages relevant to the encoding.

435	      Note regarding item J on multicast: A multicast tree may involve
436	      different levels of congestion on each leg.  Any traffic
437	      management can only monitor or control multicast congestion at or
438	      near each receiver.  It would make no sense for the sender to try
439	      to expose "whole path congestion" in sent packets, because it
440	      cannot hope to describe all the differing congestion levels on
441	      every leg of the tree.
442	   Transport Layer:
443	      A.  A specification of any required changes to congestion feedback
444	          in particular transport protocols.
445	      B.  A specification (or minimally a recommendation) for how a
446	          transport should estimate credits at the beginning of a
447	          connection and while it is in progress.
448	      C.  A specification of whether any other protocol options should
449	          (or must) be enabled along with an implementation of ConEx
450	          (e.g. at least attempting to negotiate ECN and SACK
451	          capability);
452	      D.  A specification of any configuration that a ConEx stack may
453	          require (or preferably confirmation that it requires no
454	          configuration);
455	      E.  A specification of the statistics that a protocol stack should
456	          log for each type of marking on a per-flow or aggregate basis.
457	   Security:
458	      A.  An example of a strong audit algorithm suitable for detecting
459	          if a single flow is misstating congestion.  This algorithm
460	          should present minimal false results, but need not have
461	          optimal scaling properties (e.g. may need per flow state).
462	      B.  An example of an audit algorithm suitable for detecting
463	          misstated congestion in a large aggregate (e.g. no per-flow
464	          state).

466	   The possibility exists that these specifications over-constrain the
467	   ConEx design and cannot be fully satisfied.  An important part of
468	   the evaluation of any particular design will be a thorough inventory
469	   of all ways in which it might fail to satisfy these specifications.
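
   As one example of how an item in the Network Layer list might be
   checked, the rule in item I (no transition from any ConEx marking
   to Not-ConEx) can be sketched as below.  The function name, the
   string labels and the "allowed" set are hypothetical; a concrete
   encoding would supply its own inventory of transitions.

```python
NOT_CONEX = "Not-ConEx"


def transition_valid(before, after, allowed=frozenset()):
    """Check one in-transit change of a ConEx marking (hypothetical helper).

    allowed holds the (before, after) pairs that a particular mutable
    encoding chooses to permit; it is empty for an immutable encoding.
    """
    if before == after:
        return True    # unchanged in transit
    if before != NOT_CONEX and after == NOT_CONEX:
        return False   # invalid in all encodings, whatever allowed says
    return (before, after) in allowed
```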

471	4.  Encoding Congestion Exposure

473	   Most protocol specifications start with a description of packet
474	   formats and codepoints with their associated meanings.  This document
475	   does not: It is already known that choosing the encoding for ConEx is
476	   likely to entail some engineering compromises that have the potential
477	   to reduce the protocol's usefulness in some settings.  For instance
478	   the experimental ConEx encoding chosen for IPv6
479	   [I-D.ietf-conex-destopt] had to make compromises on tunnelling.
480	   Rather than making these engineering choices prematurely, this
481	   document sidesteps the encoding problem by making it abstract.  It
482	   describes several different representations of ConEx Signals, none of
483	   which are specified to the level of specific bits or code points.

485	   The goal of this approach is to be as complete as possible for
486	   discovering the potential usage and capabilities of the ConEx
487	   protocol, so we have some hope of making optimal design decisions
488	   when choosing the encoding.  Even if experiments reveal particular
489	   problems due to the encoding, this document will still serve as
490	   a reference model.

492	4.1.  Naive Encoding

494	   For tutorial purposes, it is helpful to describe a naive encoding of
495	   the ConEx protocol for TCP and similar protocols: set a bit (not
496	   specified here) in the IP header on each retransmission and on each
497	   ECN-signaled window reduction.  Network devices along the forward
498	   path can see this bit and act on it.  For example, any device along
499	   the path might limit the rate of all traffic if the rate of marked
500	   (congested) packets exceeds a threshold.
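
   The example policy above can be sketched as follows.  Measuring the
   marked rate in bytes per second over a fixed interval is an
   assumption of this sketch (this document does not choose between
   bytes and packets), as are the function names and the threshold.

```python
def marked_rate(packets, interval_s):
    """Rate of marked traffic in bytes per second.

    packets is an iterable of (size_bytes, marked) tuples observed
    during an interval of interval_s seconds.
    """
    marked_bytes = sum(size for size, marked in packets if marked)
    return marked_bytes / interval_s


def should_limit(packets, interval_s, threshold_bytes_per_s):
    """True if the device would start limiting the rate of all traffic."""
    return marked_rate(packets, interval_s) > threshold_bytes_per_s


# Hypothetical one second sample: 98 unmarked and 2 marked packets.
sample = [(1500, False)] * 98 + [(1500, True)] * 2
```

   With this sample the marked rate is 3000 bytes per second, so a
   threshold of 2000 would trigger limiting and a threshold of 5000
   would not.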

502	   This simple encoding is sufficient to illustrate many of the benefits
503	   envisioned for ConEx.  At first glance it looks like it might
504	   motivate people to deploy and use it.  It is a one-line code change
505	   that a small number of OS developers and content providers could
506	   unilaterally deploy across a significant fraction of all Internet
507	   traffic.  However, this encoding does not support auditing, so it
508	   would also motivate users and/or applications to misrepresent the
509	   congestion that they are causing [RFC3514].  As a consequence the
510	   naive encoding is not likely to be trusted and thus creates its own
511	   disincentives for deployment.

513	   Nonetheless, this naive encoding does present a clear mental model of
514	   how the ConEx protocol might function under various uses.  It is
515	   useful for thought experiments where it can be stipulated that all
516	   participants are honest, and it does illustrate some of the
517	   incentives that might be introduced by ConEx.

519	4.2.  Null Encoding

521	   In limited contexts it is possible to implement ConEx-like functions
522	   without any signals at all by measuring rest-of-path congestion
523	   directly from TCP headers.  The algorithm is to keep at least one RTT
524	   of past TCP headers and to match each new header against the history
525	   to count duplicate data.
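
   As an illustration only, the matching step above might be sketched
   as follows.  The class name, the fixed time window standing in for
   "one RTT", and the single-flow scope are all assumptions of this
   sketch; it also ignores sequence number wrap-around.

```python
from collections import deque


class DuplicateDataCounter:
    """Counts TCP payload bytes that repeat data seen within a time window.

    Hypothetical helper: one instance tracks one direction of one flow.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds  # assumed to approximate one RTT
        self.history = deque()  # (timestamp, seq_start, seq_end), oldest first
        self.duplicate_bytes = 0

    def observe(self, timestamp, seq, payload_len):
        # Forget headers older than the window.
        while self.history and timestamp - self.history[0][0] > self.window_seconds:
            self.history.popleft()
        start, end = seq, seq + payload_len
        # Sum the overlap between this segment and every remembered segment.
        overlap = 0
        for _, seen_start, seen_end in self.history:
            overlap += max(0, min(end, seen_end) - max(start, seen_start))
        # The same bytes may appear in several remembered segments, so
        # never count more than the size of the new segment.
        overlap = min(overlap, payload_len)
        self.duplicate_bytes += overlap
        self.history.append((timestamp, start, end))
        return overlap
```

   A monitor would call observe() once per data segment and read
   duplicate_bytes as its estimate of retransmitted data.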

527	   This could implement many ConEx policies, without any explicit
528	   protocol.  It is fairly easy to implement, at least at low rate (e.g.
529	   in a software based edge router).  However, it would only be useful
530	   in cases where the network operator can see the TCP headers.  This is
531	   currently (2014) the majority of traffic because UDP, IPsec and VPN
532	   tunnels are used far less than SSL or TLS over TCP/IP, which do not
533	   hide TCP sequence numbers from network devices.  However, anyone
534	   specifically intending to avoid the attention of a congestion policy
535	   device would only have to hide their TCP headers from the network
536	   operator (e.g. by using a VPN tunnel).

538	4.3.  ECN Based Encoding

540	   The re-ECN specification [I-D.briscoe-conex-re-ecn-tcp] presents an
541	   encoding of ConEx in IPv4 and IPv6 that was tightly integrated with
542	   ECN encoding in order to fit into the IPv4 header.  Any individual
543	   packet may need to represent any ECN codepoint and any ConEx signal
544	   value independently.  So, ideally their encoding should be entirely
545	   independent.  However, given the limited number of header bits and/or
546	   code points, re-ECN chooses to partially share code points and to re-
547	   echo both losses and ECN with just one codepoint.

549	   The central theme of the re-ECN work is an audit mechanism that
550	   provides sufficient disincentives against misrepresenting congestion
551	   [I-D.briscoe-conex-re-ecn-motiv].  It is analyzed extensively in
552	   Briscoe's PhD dissertation [Refb-dis].  For a tutorial background on
553	   re-ECN motivation and techniques, see [Re-fb, FairerFaster].

555	   Re-ECN is an example of one chosen set of compromises attempting to
556	   meet the requirements of Section 3.  The present document takes a
557	   step back, aiming to state the ideal requirements in order to allow
558	   the Internet community to assess whether different compromises might
559	   be better.

561	   The problem with Re-ECN is that it requires that receivers be ECN
562	   enabled in addition to sender changes.  Newer encodings
563	   [I-D.ietf-conex-destopt] overcome this problem by being able to
564	   represent loss and ECN based congestion separately.

566	4.4.  Independent Bits

568	   This encoding involves flag bits, each of which the sender can set
569	   independently to indicate to the network one of the following four
570	   signals:
571	   ConEx (Not-ConEx)  The transport is (or is not) using ConEx with this
572	      packet.  Network layer encoding requirement L in Section 3.3 says
573	      the protocol must be arranged so that legacy transport senders
574	      implicitly send Not-ConEx.
575	   Re-Echo-Loss (Not-Re-Echo-Loss)  The transport has (or has not)
576	      experienced a loss.
577	   Re-Echo-ECN (Not-Re-Echo-ECN)  The transport has (or has not)
578	      experienced ECN-signaled congestion.
579	   Credit (Not-Credit)  The transport is (or is not) building up
580	      congestion credit (see Section 5.5 on the audit function).

582	   A packet with ConEx set combined with all three other flags
583	   cleared implies ConEx-Not-Marked.

585	   This encoding does not imply any exclusion property among the
586	   signals.  Multiple types of congestion (ECN, loss) can be signalled
587	   on the same ACK.  So, ideally, a ConEx sender would be able to
588	   reflect these in the next packet.  However, there will be many
589	   invalid combinations of flags (e.g.  Not-ConEx combined with any of
590	   the ConEx-marked flags), which a malicious sender could use to
591	   advantage against naive policy devices that only check each flag
592	   separately.

594	   As long as the packets in a flow have uniform sizes, it does not
595	   matter whether the units of congestion are packets or bytes.
596	   However, if an application sends very irregular packet sizes, it may
597	   be necessary for the sender to mark multiple packets to avoid being
598	   in technical violation of an audit function measuring in bytes (see
599	   Section 4.6).

601	4.5.  Codepoint Encoding

603	   This encoding involves signaling one of the following five
604	   codepoints:

606	   ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN, Credit}

608	   Each named codepoint has the same meaning as in the encoding using
609	   independent bits in the previous section.  The use of any one
610	   codepoint implies the negative of all the others.

612	   Inherently, the semantics of most of the enumerated codepoints are
613	   mutually exclusive.  'Credit' is the only one that might need to be
614	   used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
615	   that requirement is questionable.  It must not be forgotten that the
616	   enumerated encoding loses the flexibility to signal these two
617	   combinations, whereas the encoding with four independent bits is not
618	   so limited.  Alternatively two extra codepoints could be assigned to
619	   these two combinations of semantics.  The comment in the previous
620	   section about units also applies.

622	4.6.  Units Implied by an Encoding

624	   The following comments apply generally to all the other encodings.

626	   Congestion can be due to exhaustion of bit-carrying capacity, or
627	   exhaustion of packet processing power.  When a packet is discarded or
628	   marked to indicate congestion, there is no easy way to know whether
629	   the lost or marked packet signifies bit-congestion or packet-
630	   congestion.  The above ConEx encodings that rely on marking packets
631	   suffer from the same ambiguity.

633	   This problem is most acute when audit needs to check that one count
634	   of markings matches another.  For example if there are ConEx markings
635	   on three large (1500B) packets, is that sufficient to match the loss
636	   of 5 small (60B) packets?  If a packet-marking is defined to mean all
637	   the bytes in the packet are marked, then we have 4500B of ConEx
638	   marked data against 300B of lost data, which is easily sufficient.
639	   If instead we are counting packets, then we have 3 ConEx packets
640	   against 5 lost packets, which is not sufficient.  This problem will
641	   not arise when all the packets in a flow are the same size, but a
642	   choice needs to be made for flows in which packet sizes vary, such as
643	   BGP, SPDY and some variable rate video encoding schemes.
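
   The arithmetic of this example can be reproduced with a few lines of
   code.  The function name and the "bytes"/"packets" switch are
   assumptions made for illustration; a real audit function would be
   considerably more involved.

```python
def markings_sufficient(conex_marked_sizes, lost_sizes, unit):
    """Compare ConEx markings with losses, counting in bytes or packets."""
    if unit == "bytes":
        return sum(conex_marked_sizes) >= sum(lost_sizes)
    if unit == "packets":
        return len(conex_marked_sizes) >= len(lost_sizes)
    raise ValueError("unit must be 'bytes' or 'packets'")


conex_marked = [1500, 1500, 1500]  # three large ConEx-marked packets
lost = [60] * 5                    # five small lost packets

in_bytes = markings_sufficient(conex_marked, lost, "bytes")      # 4500 >= 300
in_packets = markings_sufficient(conex_marked, lost, "packets")  # 3 >= 5
```

   With these inputs the byte-based comparison is satisfied and the
   packet-based comparison is not, which is the ambiguity described
   above.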

645	   Whether to use bytes or packets is not obvious.  For instance, the
646	   most expensive links in the Internet, in terms of cost per bit, are
647	   all at lower data rates, where transmission times are large and
648	   packet sizes are important.  In order for a policy to consider wire
649	   time, it needs to know the number of congested bytes.  However, high
650	   speed networking equipment and the transport protocols themselves
651	   sometimes gauge resource consumption and congestion in terms of
652	   packets.

654	   [RFC7141] advises that congestion indications should be interpreted
655	   in units of bytes when responding to congestion, at least on today's
656	   Internet.  [RFC6789] takes the same view in its definition of
657	   congestion-volume, again for today's Internet.

659	   In any TCP implementation this is simple to achieve for varying size
660	   packets, given TCP SACK tracks losses in bytes.  If an encoding is
661	   specified in units of bytes, the encoding should also specify which
662	   headers to include in the size of a packet (see network layer
663	   requirement F in Section 3.3).

665	   RFC 7141 constructs an argument for why equipment tends to be built
666	   so that the bottleneck will be the bit-carrying capacity of its
667	   interfaces not its packet processing capacity.  However, RFC 7141
668	   acknowledges that the position may change in future, and notes that
669	   new techniques will need to be developed to distinguish packet- and
670	   bit-congestion.

672	   Given this document describes an abstract ConEx mechanism, it is
673	   intended to be timeless.  Therefore it does not take a strong
674	   position on this issue.  However, a ConEx encoding will need to
675	   explicitly specify whether it assumes units of bytes or packets
676	   consistently for both congestion indications and ConEx markings (see
677	   network layer requirement E in Section 3.3).  It may help to refer to
678	   the guidance in [RFC7141].

680	5.  Congestion Exposure Components

682	   The components shown in Figure 1, as well as policy and audit, are
683	   described in more detail below.

685	5.1.  Network Devices (Not modified)

687	   Congestion signals originate from network devices as they do today.
688	   A congested router, switch or other network device can discard or ECN
689	   mark packets when it is congested.

691	5.2.  Modified Senders

693	   The sending transport needs to be modified to send Congestion
694	   Exposure signals in response to congestion feedback signals (e.g. for
695	   the case of a TCP transport see [I-D.ietf-conex-tcp-modifications]).
696	   We want to permit ConEx without ECN (e.g. if the receiver does not
697	   support ECN).  However, we want to encourage a ConEx sender to at
698	   least attempt to negotiate ECN (a ConEx transport protocol spec may
699	   require this), because it is believed that ConEx without ECN is
700	   harder to audit, and thus potentially exposed to cheating.  Since
701	   honest users have the potential to benefit from stronger mechanisms
702	   to manage traffic they have an incentive to deploy ConEx and ECN
703	   together.  This incentive is not sufficient to prevent a dishonest
704	   user from constructing (or configuring) a sender that enables ConEx
705	   after choosing not to negotiate ECN, but it should be sufficient to
706	   prevent this from being the sustained default case for any
707	   significant pool of users.

709	   Permitting ConEx without ECN is necessary to facilitate bootstrapping
710	   other parts of ConEx deployment.

712	5.3.  Receivers (Optionally Modified)

714	   Any receiving transport may already feed back sufficiently useful
715	   signals to the sender so that it does not need to be altered.

717	   The native loss or ECN signaling mechanism required for compliance
718	   with existing congestion control standards (e.g.  RTCP, SCTP) will
719	   typically be sufficient for the Sender to generate ConEx signals.

721	   TCP's loss feedback is sufficient for ConEx if SACK is used
722	   [RFC2018].  However, the original specification for ECN in TCP
723	   [RFC3168] signals congestion no more than once per round trip.  The
724	   sender may require more precise feedback from the receiver; otherwise
725	   it is at risk of appearing to be understating its ConEx Signals.

727	   Ideally, ConEx should be added to a transport like TCP without
728	   mandatory modifications to the receiver.  But in the TCP-ECN case an
729	   optional modification to the receiver could be recommended for
730	   precision (see [I-D.ietf-tcpm-accecn-reqs], which is based on the
731	   approach originally taken when adding re-ECN to TCP
732	   [I-D.briscoe-conex-re-ecn-tcp]).

734	5.4.  Policy Devices

736	   Policy devices are characterised by a need to be configured with a
737	   policy related to the users or neighboring networks being served.  In
738	   contrast, auditing devices solely enforce compliance with the ConEx
739	   protocol and do not need to be configured with any client-specific
740	   policy.

742	   One of the design goals of the ConEx protocol is that none of the
743	   important policy mechanisms requires per flow state, and that policy
744	   mechanisms can even be implemented for heavily aggregated traffic in
745	   the core of the Internet with complexity akin to accumulating marking
746	   volumes per logical link.  Of course, policy mechanisms may sometimes
747	   choose to focus down on individual flows, but ConEx aims to make
748	   aggregate policy devices feasible.

750	5.4.1.  Congestion Monitoring Devices

752	   Policy devices can typically be decomposed into two functions: i)
753	   monitoring the ConEx signal to compare it with a policy, and then ii)
754	   acting in some way on the result.  Various actions might be invoked
755	   against 'out of contract' traffic, such as policing (see
756	   Section 5.4.3), re-routing, or downgrading the class of service.

758	   Alternatively a policy device might not act directly on the traffic,
759	   but instead report to management systems that are designed to control
760	   congestion indirectly.  For instance, the reports might trigger
761	   capacity upgrades, penalty clauses in contracts, charges based on
762	   congestion, or merely warnings to clients who are causing
763	   excessive congestion.

765	   Nonetheless, whatever action is invoked, the congestion monitoring
766	   function will always be a necessary part of any policy device.

768	5.4.2.  Rest-of-Path Congestion Monitoring

770	   ConEx signals indicate the level of congestion along a whole path
771	   from source to destination.  In contrast, ECN signals monitored in
772	   the middle of a network indicate the level of congestion experienced
773	   so far on the path (of course, only in ECN-capable traffic).

775	   If a monitor in the middle of a network (e.g. at a network border)
776	   measures both of these signals, it can subtract the level of ECN
777	   (path so far) from the level of ConEx (whole path) to derive a
778	   measure of the congestion that packets are likely to experience
779	   between the monitoring point and their destination (rest-of-path
780	   congestion).
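
   A minimal sketch of this subtraction follows.  The function and
   parameter names are hypothetical, congestion is expressed as a
   fraction of bytes over one measurement interval, and clamping at zero
   is an assumption of the sketch (ConEx signals lag the congestion they
   re-echo, so a short interval could otherwise yield a negative value).

```python
def rest_of_path_congestion(conex_marked_bytes, ecn_marked_bytes, total_bytes):
    """Estimate downstream congestion as whole-path minus path-so-far."""
    if total_bytes <= 0:
        return 0.0
    whole_path = conex_marked_bytes / total_bytes   # from ConEx signals
    path_so_far = ecn_marked_bytes / total_bytes    # from ECN marks seen here
    return max(0.0, whole_path - path_so_far)
```

   For example, 30 ConEx-marked bytes and 10 ECN-marked bytes in every
   1000 bytes observed would give an estimate of about 2% rest-of-path
   congestion.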

782	   It will often be preferable for policy devices to monitor rest-of-
783	   path congestion if they can, because it is a measure of the
784	   downstream congestion that the policy device can directly influence
785	   by controlling the traffic passing through it.

787	5.4.3.  Congestion Policers

789	   A congestion policer can be implemented in a very similar way to a
790	   bit-rate policer, but its effect can be focused solely on traffic of
791	   users causing congestion downstream, which ConEx signals make
792	   visible.  Without ConEx signals, the only way to mitigate congestion
793	   is to blindly limit traffic bit-rate, on the assumption that high
794	   bit-rate is more likely to cause congestion.

796	   A congestion policer monitors all ConEx traffic entering a network,
797	   or some identifiable subset.  Using ConEx signals and/or Credit
798	   signals (and preferably subtracting ECN signals to yield rest-of-path
799	   congestion), it measures the amount of congestion that this traffic
800	   is contributing somewhere downstream.  If this persistently exceeds a
801	   policy-configured 'congestion-bit-rate' the congestion policer can
802	   limit all the monitored ConEx traffic.

804	   A congestion policer can be implemented by a simple token bucket
805	   applied to an aggregate.  But unlike a bit-rate policer, it removes
806	   tokens only when it forwards packets that are ConEx-Marked and/or
807	   Credit-Marked, effectively treating Not-ConEx-Marked packets as
808	   invisible.  Consequently, because tokens give the right to send
809	   congested bits, the fill-rate of the token bucket will represent the
810	   allowed congestion-bit-rate.  This should provide sufficient traffic
811	   management without having to additionally constrain the straight bit-
812	   rate at all.  See [I-D.briscoe-conex-policing] for details.
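
   The token bucket described above might be sketched as follows.  This
   is an illustration under stated assumptions, not the algorithm of
   [I-D.briscoe-conex-policing]: the class and parameter names are
   hypothetical, tokens are counted in bytes, and the bucket is allowed
   to go into debt so that the caller can see when the aggregate is out
   of contract.

```python
class CongestionPolicer:
    """Token bucket drained only by ConEx-Marked or Credit-Marked packets."""

    def __init__(self, fill_rate_bytes_per_s, bucket_depth_bytes):
        self.fill_rate = fill_rate_bytes_per_s  # allowed congestion-bit-rate
        self.depth = bucket_depth_bytes
        self.tokens = bucket_depth_bytes
        self.last = 0.0

    def forward(self, now, size_bytes, conex_marked):
        """Account for one packet; return True while within contract."""
        # Refill at the configured rate, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last = now
        # Unmarked packets are invisible to the bucket.
        if conex_marked:
            self.tokens -= size_bytes
        return self.tokens >= 0
```

   When forward() returns False, the caller would apply whatever
   limiting action the policy specifies to the monitored aggregate.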

814	   Note that the policing action could be to introduce a throttle
815	   (discard some traffic) immediately upstream of the congestion
816	   monitor.  Alternatively, this throttle could introduce delay using a
817	   queue with its own AQM, which potentially increases the whole path
818	   congestion.  In effect the congestion policer has moved the
819	   congestion earlier in the path, and focused it on one user to protect
820	   downstream resources by reducing the congestion in the rest of the
821	   path.

823	5.5.  Audit

825	   The most critical aspect of ConEx is the capability to support robust
826	   auditing.  It can be assumed that sanctions based on ConEx signals
827	   will create an intrinsic motivation for users to understate the
828	   congestion that they are causing.  So, without strong audit
829	   functions, the ConEx signal would become understated to the point of
830	   being useless.  Therefore the most important feature of an encoding
831	   design is likely to be the robustness of the auditing it supports.

833	   The general goal of an auditor is to make sure that any ConEx-enabled
834	   traffic is sent with sufficient ConEx-Re-Echo and ConEx-Credit
835	   signals.  A concrete definition of the ConEx protocol MUST define
836	   what sufficient means.

838	   If a ConEx-enabled transport does not carry sufficient ConEx signals,
839	   then an auditor is likely to apply some sanction to that traffic.
840	   Although sanctions are beyond the scope of this document, an example
841	   sanction might be to throttle the traffic immediately upstream of the
842	   auditor to prevent the user from getting any advantage by
843	   understating congestion.  Such a throttle would likely include some
844	   combination of delaying or dropping traffic.

846	   A ConEx auditor might use one of the following techniques:

848	   Generic loss auditing:  For congestion signaled by loss, totally
849	      accurate auditing is not believed to be possible in the general
850	      case, because it involves a network node detecting the absence of
851	      some packets, when it cannot always identify
852	      retransmissions or missing packets.  The missing packet might
853	      simply be taking a different route, or the IP payload may be
854	      encrypted.

856	      It is for this reason that it is desirable to motivate the
857	      deployment of ECN, even though ECN is not strictly required for
858	      ConEx.

860	   ECN auditing:  Directly observe and compare the volume of ECN and
861	      ConEx marks.  Since the volume of ECN marks rises monotonically
862	      along a path, ECN auditing is most accurate when located near the
863	      transport receiver.  For this reason ECN should be monitored
864	      downstream of the predominant bottleneck.

866	   TCP-specific loss auditing:  For non-encrypted standard TCP traffic
867	      on a single path, a tactical audit approach could be to measure
868	      losses by detecting retransmissions, which appear as duplicate
869	      sequence numbers upstream of the loss and out of order data
870	      downstream of the loss.  Since some reordering is present in the
871	      Internet, such a loss estimator would be most accurate near the
872	      sender.  Such an audit device should treat non-ECN-capable packets
873	      with encrypted IP payload as Not-ConEx, even if they claim to be
874	      ConEx-capable, unless the operator is also using one of the other
875	      two techniques below that can audit such packets against losses.

877	   Predominant bottleneck loss auditing:  For networks designed so that
878	      losses predominantly occur under the control of one IP-aware
879	      bottleneck node on the path, the auditor could be located at this
880	      bottleneck.  It could simply compare ConEx Signals with actual
881	      local packet discards (and ECN marks).  This is a good model for
882	      most consumer access networks where audit accuracy could well be
883	      sufficient even if losses occasionally occur elsewhere in the
884	      network.

886	      Although the auditor at the predominant bottleneck would not be
887	      able to count losses at other nodes, transports would not know
888	      where losses were occurring either.  Therefore a transport would
889	      not know which losses it could cheat and which ones it couldn't
890	      without getting caught.

892	   ECN tunnel loss auditing:  A network operator can arrange IP-in-IP
893	      tunnels (or IP-in-MPLS etc.) so that any losses within the tunnels
894	      are deferred until the tunnel egress.  Then the audit function can
895	      be deployed at the egress and be aware of all losses.  This is
896	      possible by enabling ECN marking on switches and routers within a
897	      tunnel, irrespective of whether end-systems support ECN, by
898	      exploiting a side-effect of the way tunnels handle the ECN field.
899	      After encapsulation at the tunnel ingress, the network should
900	      arrange for any non-ECN packets (with '00' in ECN field of the
901	      outer) to be set to the ECN-capable transport (ECT(0)) codepoint.

903	      Then, if they experience congestion at one of the ECN-capable
904	      switches or routers within the tunnel, some will be ECN-marked
905	      rather than immediately dropped.  However, when the tunnel
906	      decapsulator strips the outer from such an ECN-marked packet, if
907	      it finds the inner header has '00' in the ECN field (meaning that
908	      the endpoints do not support ECN) it will automatically drop the
909	      packet, assuming it complies with [RFC6040].  Thus, an audit
910	      function at the decapsulator can know which packets would have
911	      been dropped within the tunnel (and even which are genuinely ECN-
912	      marked for the end-to-end protocol).  Non-ECN end-systems outside
913	      the tunnel see no sign of the use of ECN internally.

915	   In addition, other audit techniques may be identified in the future.

917	   [Refb-dis] gives a comprehensive inventory of attacks against audit
918	   proposed by various people.  It includes pseudocode for both
919	   deterministic and statistical audit functions designed to thwart
920	   these attacks and analyses the effectiveness of an implementation.
921	   Although this work is specific to the re-ECN protocol, most of the
922	   material is useful for designing and assessing audit of other
923	   specific ConEx encodings, against both ECN and loss.

925	   The auditing function should be able to trigger sufficient sanction
926	   to discourage understating congestion [Salvatori05].  This seems to
927	   require designing the sanction in concert with the policy functions,
928	   even though they might be implemented in different parts of the
929	   network.  However, [Refb-dis] proves audit and policy functions can
930	   be independent as long as audit drops sufficient traffic to
931	   'normalise' actual congestion signals to be no greater than ConEx
932	   signals.

934	   Similarly, the job of incentivising the sending of ConEx-enabled
935	   packets is proper solely to policy devices, independent of the audit
936	   function.  The audit function's job is policy-neutral, so it should
937	   be solely confined to checking for correctness within those packets
938	   that have been marked as ConEx-capable.  Even if there are Not-ConEx
939	   packets mixed with ConEx packets within a flow, audit will not need
940	   to monitor any Not-ConEx packets.

942	   Note that in the future it might prove to be desirable to provide
943	   advice on uniformly implementing sanctions, because otherwise
944	   insufficient sanctions could impair the ability to implement policy
945	   elsewhere in the network.

947	   Some of the audit algorithms require per flow state.  This cost is
948	   expected to be tolerable, because these techniques are most apropos
949	   near the edges of the network, where traffic is generally much less
950	   aggregated, so the state need not overwhelm any one device.  The
951	   flow-state required for audit creates itself as it detects new flows.
952	   Therefore a flow will not fail if it is re-routed away from the audit
953	   box currently holding its flow-state, so auditing does not require
954	   route pinning and works fine with multipath flows.

956	   Holding flow-state seems to create a vulnerability to attacks that
957	   exhaust the auditor's memory by opening numerous new short flows.
958	   The audit function can protect itself from this attack by not
959	   allocating new flow-state unless a ConEx-marked packet arrives (e.g.
960	   credit at the start of a flow).  Because policy devices rate limit
961	   ConEx-marked packets, this sets a natural limit to the rate at which
962	   a source can create flow-state in audit devices.  The auditor would
963	   treat all the remaining flows without any ConEx-marked packets as a
964	   single misbehaving aggregate.
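
   A minimal sketch of this self-protection rule is shown below.  The
   class, its counters and the shape of the per-flow record are
   hypothetical; the only behaviour taken from the text is that state is
   created solely on a ConEx-marked packet and that all other unknown
   flows are lumped into one aggregate.

```python
class AuditFlowTable:
    """Per-flow audit state that is allocated only on ConEx-marked packets."""

    def __init__(self):
        self.flows = {}                    # flow_id -> per-flow counters
        self.unmarked_aggregate_bytes = 0  # all flows with no state yet

    def on_packet(self, flow_id, size_bytes, conex_marked):
        state = self.flows.get(flow_id)
        if state is None:
            if not conex_marked:
                # No state for this flow and nothing to justify creating
                # any: account for it in the single shared aggregate.
                self.unmarked_aggregate_bytes += size_bytes
                return None
            state = self.flows[flow_id] = {"bytes": 0, "marked_bytes": 0}
        state["bytes"] += size_bytes
        if conex_marked:
            state["marked_bytes"] += size_bytes
        return state
```

   Because state creation is tied to marked packets, a flood of unmarked
   short flows grows only the aggregate counter, not the table.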

966	   Auditing can be distributed and redundant.  One flow may be audited
967	   in multiple places, using multiple techniques.  Some audit techniques
968	   do not require any per flow state and can be applied to aggregate
969	   traffic.  These might be able to detect the presence of understated
970	   congestion at large scale and support recursively hunting for
971	   individual flows that are understating their congestion.  Even at
972	   large scales, flows can be randomly selected for individual auditing.

974	   Sampling techniques can also be used to bound the total auditing
975	   memory footprint, although the implementer needs to counter the
976	   tactic where a source cheats until caught by sampling, then simply
977	   discards that flow ID and starts cheating with a new one (termed
978	   'identifier white-washing when caught').

980	   For the concrete ConEx protocol encoding defined in
981	   [I-D.ietf-conex-destopt], ConEx Credit and ConEx-Re-Echo signals are
982	   intended to be audited separately.  The Credit signal can be audited
983	   directly against actual congestion (loss and ECN).  However, there
984	   will be an inherent delay of at least one round trip between a
985	   congestion signal and the subsequent ConEx-Re-Echo signal it
986	   triggers, as shown in Figure 1.  Therefore ConEx-Re-Echo signals will
987	   need to be audited with some allowance for this delay.  Further
988	   discussion of design and implementation choices for functions
989	   intended to audit this concrete ConEx encoding can be found in
990	   [I-D.wagner-conex-audit].

992	6.  Support for Incremental Deployment

994	   The ConEx abstract protocol described so far is intended to support
995	   incremental deployment in every possible respect.  For convenience,
996	   the following list collects together all the features that support
997	   incremental deployment in the concrete ConEx specifications, and
998	   points to further information on each:

1000	   Packets:  The wire protocol encoding allows each packet to indicate
1001	      whether it is using ConEx or not (see Section 4 on Encoding
1002	      Congestion Exposure).

1004	   Senders:  ConEx requires a modification to the source in order to
1005	      send ConEx packet markings (see Section 5.2).  Although ConEx
1006	      support can be indicated on a packet-by-packet basis, it is likely
1007	      that all the packets in a flow will either consistently support
1008	      ConEx or consistently not.  It is also likely that, if the
1009	      implementation of a transport protocol supports ConEx, all the
1010	      packets sent from that host using that protocol will be ConEx
1011	      marked.

1013	      The implementations of some of the transport protocols on a host
1014	      might not support ConEx (e.g. the implementation of DNS over UDP
1015	      might not support ConEx, while perhaps RTP over UDP and TCP will).
1016	      Any non-upgraded transports and non-upgraded hosts will simply
1017	      continue to send regular Not-ConEx packets as always.

1019	      A network operator can create incentives for senders to
1020	      voluntarily reveal ConEx information (see the item on incremental
1021	      deployment by 'Networks' below).

1023	   Receivers:  A ConEx source should be able to work with the regular
1024	      receiver for the transport in question, without requiring any
1025	      ConEx-specific modifications.  This is true for modern transport
1026	      protocols (RTCP, SCTP etc.) and it is even true for TCP, as long as
1027	      the receiver supports SACK, which is widely deployed anyway.
1028	      However, it is not true for ECN feedback in TCP.  The need for
1029	      more precise ECN feedback in TCP is not exclusive to ConEx; for
1030	      instance, Data Centre TCP (DCTCP [DCTCP]) uses precise feedback to
1031	      good effect.  Therefore, if a receiver offers precise feedback
1032	      [I-D.ietf-tcpm-accecn-reqs], it will be best if ConEx uses it (see
1033	      Section 5.3).  Alternatively, without sufficiently precise
1034	      congestion feedback from the receiver, the source may have to
1035	      conservatively send extra ConEx markings in order to avoid
1036	      understating congestion.

1038	   Proxies:  Although it was stated above that ConEx requires a
1039	      modification to the source, ConEx signals could theoretically be
1040	      introduced by a proxy for the source, as long as it can intercept
1041	      feedback from the receiver.  Similarly, more precise feedback
1042	      could theoretically be provided by a proxy for the receiver rather
1043	      than modifying the receiver itself.

1045	   Forwarding:  No modification to forwarding or queuing is needed for
1046	      ConEx.

1048	      However, once some ConEx is deployed, it is possible that a queue
1049	      implementation could optionally take advantage of the ConEx
1050	      information in packets.  For instance, it has been suggested
1051	      [I-D.ietf-conex-destopt] that a queue would be more robust against
1052	      flooding if it preferentially discarded Not-ConEx packets, then
1053	      ConEx-Not-Marked packets.

1055	      A ConEx sender re-echoes congestion whether the queues signaling
1056	      congestion are ECN-enabled or not.  Nonetheless, an operator
1057	      relying on ConEx signals is recommended to enable ECN in queues
1058	      wherever possible.  This is because auditing works best if most
1059	      congestion is indicated by ECN rather than loss (see Section 3).
1060	      Also, monitoring rest-of-path congestion is not accurate if there
1061	      are congested non-ECN queues upstream of the monitoring point
1062	      (Section 5.4.2).

1064	   Networks:  If a subset of traffic sources (or proxies) use ConEx
1065	      signals to reveal congestion in the internetwork layer, a network
1066	      operator can choose (or not) to use this information for traffic
1067	      management.  As long as the end-to-end ConEx signals are present,
1068	      each network can unilaterally choose to use them--independently of
1069	      whether other networks do.

1071	      ConEx marked packets may safely traverse a network that ignores
1072	      them.  ConEx signals are defined to remain unchanged once set by
1073	      the sender, but some encodings may allow changes in transit (e.g.
1074	      by proxies).  In no circumstances will a network node change ConEx
1075	      marked packets to Not-ConEx (network layer encoding requirement I
1076	      in Section 3.3).  If necessary, endpoints should be able to detect
1077	      if a network is removing ConEx signals (network layer encoding
1078	      requirement H in Section 3.3).

1080	      An operator can deploy policy devices (Section 5.4) wherever
1081	      traffic enters its network, in order to monitor the downstream
1082	      congestion that incoming traffic contributes to, and control it if
1083	      necessary.  A network operator can create incentives for the
1084	      developers of sending applications and transports to voluntarily
1085	      reveal ConEx information.  Without ConEx information, a network
1086	      operator often has to limit the bit-rate or volume from a site
1087	      more than is necessary, just in case it might congest others.
1088	      With ConEx information, the operator can limit only congestion-
1089	      causing traffic, and otherwise allow complete freedom.  This
1090	      greater freedom acts as an inducement for the source to volunteer
1091	      ConEx information.  An operator may also monitor whether a source
1092	      transport has sent ConEx packets, and treat the same transport
1093	      with greater suspicion (e.g. a more stringent rate-limit) whenever
1094	      it selectively sends packets without ConEx support.  See [RFC6789]
1095	      for further discussion of deployment incentives for networks and
1096	      references to scenarios where some networks use ConEx-based policy
1097	      devices and others don't.
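
      One way a policy device might limit only congestion-causing
      traffic is a token bucket that is drained solely by ConEx-marked
      bytes.  The sketch below is illustrative and is not the policer
      defined in any ConEx document: the class name, the fill_rate and
      depth parameters, their units, and the sanction of discarding all
      of a source's packets while its allowance is overdrawn are all
      assumptions.

```python
class CongestionPolicer:
    """Token bucket drained only by ConEx-marked bytes (illustrative)."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # assumed unit: congestion-bytes per second
        self.depth = depth          # assumed unit: congestion-bytes of burst
        self.tokens = depth         # start with a full allowance
        self.last = 0.0             # time of the previous packet, in seconds

    def admit(self, now, size, conex_marked):
        """Return True to forward the packet, False to discard it."""
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last = now
        # Only bytes that declare congestion consume the allowance.
        if conex_marked:
            self.tokens -= size
        # The sanction applies only while the allowance is overdrawn.
        return self.tokens >= 0


# Example: unmarked traffic is free; marked traffic spends the allowance.
policer = CongestionPolicer(fill_rate=1000.0, depth=3000.0)
results = [
    policer.admit(0.0, 1500, False),  # unmarked, costs nothing
    policer.admit(0.0, 1500, True),   # allowance falls to 1500
    policer.admit(0.0, 1500, True),   # allowance falls to 0
    policer.admit(0.0, 1500, True),   # overdrawn, packet discarded
]
```

      Because packets that carry no ConEx marking cost nothing, a source
      that causes little congestion is not constrained by this sketch
      regardless of its bit-rate or volume.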

1099	      An operator can deploy audit devices (Section 5.5) unilaterally
1100	      within its own network to verify that traffic sources are not
1101	      understating ConEx information.  A given network operator (say
1102	      N_a) only cares that the level of ConEx signaling is sufficient
1103	      to cover congestion in its own network.
1104	      If traffic continues into a congested downstream network (say
1105	      N_b), it is of no concern to the first network (N_a) if the end-
1106	      to-end ConEx signaling is insufficient to cover the congestion in
1107	      N_b as well.  This is N_b's concern, and N_b can both detect such
1108	      anomalous traffic and deal with it using ConEx-based audit devices
1109	      itself.
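
      The local nature of this check can be sketched as a per-flow
      comparison between the congestion a source declares and the
      congestion that the auditing network itself observes.  This is a
      simplified illustration under stated assumptions, not the audit
      function of any ConEx specification: the class and method names
      are hypothetical, congestion is counted in bytes, and timing
      effects (such as the delay between a congestion event and the
      re-echoed signal) are ignored.

```python
from collections import defaultdict


class FlowAuditor:
    """Per-flow comparison of declared and locally observed congestion."""

    def __init__(self):
        self.declared = defaultdict(int)  # ConEx-marked bytes per flow
        self.observed = defaultdict(int)  # bytes seen congestion-marked here

    def observe(self, flow_id, size, conex_marked, congestion_marked):
        """Account for one packet passing the audit point."""
        if conex_marked:
            self.declared[flow_id] += size
        if congestion_marked:
            self.observed[flow_id] += size

    def understating(self, flow_id):
        """True if the flow declared less congestion than was seen locally."""
        return self.declared[flow_id] < self.observed[flow_id]


# Example: flow "f1" covers the local congestion; flow "f2" does not.
auditor = FlowAuditor()
auditor.observe("f1", 1500, conex_marked=True, congestion_marked=True)
auditor.observe("f1", 1500, conex_marked=True, congestion_marked=False)
auditor.observe("f2", 1500, conex_marked=False, congestion_marked=True)
```

      In this sketch, congestion that a flow later experiences in a
      downstream network never enters the comparison, which matches the
      point that N_a need only check coverage of its own congestion.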

1111	7.  IANA Considerations

1113	   This memo includes no request to IANA.

1115	   Note to RFC Editor: this section may be removed on publication as an
1116	   RFC.

1118	8.  Security Considerations

1120	   The only known risk associated with ConEx is that users and
1121	   applications are very likely to be motivated to under-represent the
1122	   congestion that they are causing.  Significant portions of this
1123	   document are about mechanisms to audit the ConEx signals and create
1124	   sufficient sanction to inhibit such under-representation.  In
1125	   particular see Section 5.5.

1127	   Security attacks and their defences are best discussed against a
1128	   concrete protocol specification, not the abstract mechanism of this
1129	   document.  A concrete ConEx protocol will need to be accompanied by a
1130	   document describing how the protocol and its audit mechanisms defend
1131	   against likely attacks.  [Refb-dis] will be a useful source for such
1132	   a document.  It gives a comprehensive inventory of attacks against
1133	   audit that have been proposed by various parties.  It includes
1134	   pseudocode for both deterministic and statistical audit functions
1135	   designed to thwart these attacks and analyses the effectiveness of an
1136	   implementation.

1138	   However, [Refb-dis] is specific to the re-ECN protocol, which
1139	   signalled ECN and loss together, whereas the concrete ConEx protocol
1140	   defined in [I-D.ietf-conex-destopt] signals them separately.

1142	   Therefore, although likely attacks will be similar, there will be
1143	   more combinations of attacks to worry about, and defences and their
1144	   analysis are likely to be a little different for ConEx.

1146	   The main known attacks that a security document for a concrete ConEx
1147	   protocol will need to address are listed below, and [Refb-dis] should
1148	   be referred to for how re-ECN was designed to defend against similar
1149	   attacks:
1150	   o  Attacks on the audit function (see Section 7.5 of [Refb-dis]):
1151	      Flow ID Whitewashing:   Designing the audit function so that a
1152	         source cannot gain from starting a new flow once audit has
1153	         detected cheating in a previous flow.
1154	      Dragging Down an Aggregate:   Ensuring that one flow cannot pull
1155	         down the average of an aggregate so that the audit function
1156	         discards packets from all flows within the aggregate, not just
1157	         the offending flow.
1158	      Dragging Down a Spoofed Flow ID:   An attacker understates ConEx
1159	         markings in packets that spoof another flow, which fools the
1160	         audit function into dropping the genuine user's packets.
1161	   o  Attacks by networks on other networks (see Section 8.2 of
1162	      [Refb-dis]):
1163	      Dummy Traffic:   Sending dummy traffic across a border with
1164	         understated ConEx markings to bring down the average ConEx
1165	         markings in the aggregate of border traffic.  This attack can
1166	         be combined with a TTL that expires before the packets reach an
1167	         audit function.
1168	      Signal Poisoning with 'Cancelled' Marking:   Sending high volumes
1169	         of valid packets that are both ConEx-Marked and ECN-Marked.
1170	         This appears to represent congestion upstream, but it also
1171	         makes these packets immune to further ECN marking downstream.
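
   As a small worked example of the "Dragging Down an Aggregate" attack
   listed above, the sketch below contrasts a check on the balance of a
   whole aggregate with a check on each flow.  The flow names, byte
   counts and function names are invented for illustration, and the
   simple "declared minus observed" balance is an assumption rather than
   the audit algorithm of [Refb-dis] or of any ConEx specification.

```python
def aggregate_balance(flows):
    """Sum of (declared - observed) congestion bytes over all flows."""
    return sum(declared - observed for declared, observed in flows.values())


def offending_flows(flows):
    """Names of flows that declared less congestion than was observed."""
    return sorted(name for name, (declared, observed) in flows.items()
                  if declared < observed)


# Each value is (ConEx-declared bytes, observed congestion bytes).
flows = {
    "honest-1": (3000, 3000),
    "honest-2": (4500, 4500),
    "cheat": (0, 6000),
}

# A negative balance would make an aggregate-level audit sanction
# every flow, although only one of them is understating.
balance = aggregate_balance(flows)
offenders = offending_flows(flows)
```

   Here the aggregate balance is negative solely because of one flow, so
   an audit function that acts on the aggregate would penalise the two
   honest flows as well, whereas the per-flow check isolates the
   offender.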

1173	   It is planned to document all known attacks and their defences
1174	   (including all the above) in the RFC series against a concrete ConEx
1175	   protocol specification.  In the interim [Refb-dis] and its references
1176	   should be referred to for details and ways to address these attacks
1177	   in the case of re-ECN.

1179	9.  Acknowledgements

1181	   This document was improved by review comments from Toby Moncaster,
1182	   Nandita Dukkipati, Mirja Kuehlewind, Caitlin Bestler, Marcelo Bagnulo
1183	   Braun, John Leslie, Ingemar Johansson and David Wagner.

1185	   Bob Briscoe's work on this specification received part-funding from
1186	   the European Union's Seventh Framework Programme FP7/2007-2013 under
1187	   Trilogy 2 project, grant agreement no. 317756.  The views expressed
1188	   here are solely those of the author.

1190	10.  Comments Solicited

1192	   Comments and questions are encouraged and very welcome.  They can be
1193	   addressed to the IETF Congestion Exposure (ConEx) working group
1194	   mailing list <conex@ietf.org>, and/or to the authors.

1196	11.  References

1198	11.1.  Normative References

1200	   [RFC2119]                         Bradner, S., "Key words for use in
1201	                                     RFCs to Indicate Requirement
1202	                                     Levels", BCP 14, RFC 2119,
1203	                                     March 1997.

1205	11.2.  Informative References

1207	   [CheapPseud]                      Friedman, E. and P. Resnick, "The
1208	                                     Social Cost of Cheap Pseudonyms",
1209	                                     Journal of Economics and Management
1210	                                     Strategy 10(2)173--199, 1998.

1212	   [DCTCP]                           Alizadeh, M., Greenberg, A., Maltz,
1213	                                     D., Padhye, J., Patel, P.,
1214	                                     Prabhakar, B., Sengupta, S., and M.
1215	                                     Sridharan, "Data Center TCP
1216	                                     (DCTCP)", ACM SIGCOMM
1217	                                     CCR 40(4)63--74, October 2010,
1218	                                     <http://portal.acm.org/
1219	                                     citation.cfm?id=1851192>.

1221	   [Evol_cc]                         Gibbens, R. and F. Kelly, "Resource
1222	                                     pricing and the evolution of
1223	                                     congestion control",
1224	                                     Automatica 35(12)1969--1985,
1225	                                     December 1999, <http://
1226	                                     www.sciencedirect.com/science/
1227	                                     article/pii/S0005109899001351>.

1229	   [FairerFaster]                    Briscoe, B., "A Fairer, Faster
1230	                                     Internet Protocol", IEEE
1231	                                     Spectrum Dec 2008:38--43,
1232	                                     December 2008, <http://
1233	                                     bobbriscoe.net/projects/
1234	                                     refb/#fairfastip>.

1236	   [I-D.briscoe-conex-policing]      Briscoe, B., "Network Performance
1237	                                     Isolation using Congestion
1238	                                     Policing",
1239	                                     draft-briscoe-conex-policing-00
1240	                                     (work in progress), February 2013.

1242	   [I-D.briscoe-conex-re-ecn-motiv]  Briscoe, B., Jacquet, A.,
1243	                                     Moncaster, T., and A. Smith, "Re-
1244	                                     ECN: A Framework for adding
1245	                                     Congestion Accountability to
1246	                                     TCP/IP",
1247	                                     draft-briscoe-conex-re-ecn-motiv-02
1248	                                     (work in progress), July 2013.

1250	   [I-D.briscoe-conex-re-ecn-tcp]    Briscoe, B., Jacquet, A.,
1251	                                     Moncaster, T., and A. Smith, "Re-
1252	                                     ECN: Adding Accountability for
1253	                                     Causing Congestion to TCP/IP",
1254	                                     draft-briscoe-conex-re-ecn-tcp-02
1255	                                     (work in progress), July 2013.

1257	   [I-D.ietf-conex-destopt]          Krishnan, S., Kuehlewind, M., and
1258	                                     C. Ucendo, "IPv6 Destination Option
1259	                                     for ConEx",
1260	                                     draft-ietf-conex-destopt-05 (work
1261	                                     in progress), October 2013.

1263	   [I-D.ietf-tcp-modifications]      Kuehlewind, M. and R.
1264	                                     Scheffenegger, "TCP modifications
1265	                                     for Congestion Exposure", draft-
1266	                                     ietf-conex-tcp-modifications-04
1267	                                     (work in progress), July 2013.

1269	   [I-D.ietf-tcpm-accecn-reqs]       Kuehlewind, M. and R.
1270	                                     Scheffenegger, "Problem Statement
1271	                                     and Requirements for a More
1272	                                     Accurate ECN Feedback",
1273	                                     draft-ietf-tcpm-accecn-reqs-04
1274	                                     (work in progress), October 2013.

1276	   [I-D.wagner-conex-audit]          Wagner, D. and M. Kuehlewind,
1277	                                     "Auditing of Congestion Exposure
1278	                                     (ConEx) signals",
1279	                                     draft-wagner-conex-audit-01 (work
1280	                                     in progress), February 2014.

1282	   [RFC2018]                         Mathis, M., Mahdavi, J., Floyd, S.,
1283	                                     and A. Romanow, "TCP Selective
1284	                                     Acknowledgment Options", RFC 2018,
1285	                                     October 1996.

1287	   [RFC3168]                         Ramakrishnan, K., Floyd, S., and D.
1288	                                     Black, "The Addition of Explicit
1289	                                     Congestion Notification (ECN) to
1290	                                     IP", RFC 3168, September 2001.

1292	   [RFC3514]                         Bellovin, S., "The Security Flag in
1293	                                     the IPv4 Header", RFC 3514, April 1
1294	                                     2003.

1296	   [RFC3550]                         Schulzrinne, H., Casner, S.,
1297	                                     Frederick, R., and V. Jacobson,
1298	                                     "RTP: A Transport Protocol for
1299	                                     Real-Time Applications", STD 64,
1300	                                     RFC 3550, July 2003.

1302	   [RFC5348]                         Floyd, S., Handley, M., Padhye, J.,
1303	                                     and J. Widmer, "TCP Friendly Rate
1304	                                     Control (TFRC): Protocol
1305	                                     Specification", RFC 5348,
1306	                                     September 2008.

1308	   [RFC5681]                         Allman, M., Paxson, V., and E.
1309	                                     Blanton, "TCP Congestion Control",
1310	                                     RFC 5681, September 2009.

1312	   [RFC6040]                         Briscoe, B., "Tunnelling of
1313	                                     Explicit Congestion Notification",
1314	                                     RFC 6040, November 2010.

1316	   [RFC6679]                         Westerlund, M., Johansson, I.,
1317	                                     Perkins, C., O'Hanlon, P., and K.
1318	                                     Carlberg, "Explicit Congestion
1319	                                     Notification (ECN) for RTP over
1320	                                     UDP", RFC 6679, August 2012.

1322	   [RFC6789]                         Briscoe, B., Woundy, R., and A.
1323	                                     Cooper, "Congestion Exposure
1324	                                     (ConEx) Concepts and Use Cases",
1325	                                     RFC 6789, December 2012.

1327	   [RFC6817]                         Shalunov, S., Hazel, G., Iyengar,
1328	                                     J., and M. Kuehlewind, "Low Extra
1329	                                     Delay Background Transport
1330	                                     (LEDBAT)", RFC 6817, December 2012.

1332	   [RFC7141]                         Briscoe, B. and J. Manner, "Byte
1333	                                     and Packet Congestion
1334	                                     Notification", BCP 41, RFC 7141,
1335	                                     February 2014.

1337	   [Re-fb]                           Briscoe, B., Jacquet, A., Di
1338	                                     Cairano-Gilfedder, C., Salvatori,
1339	                                     A., Soppera, A., and M. Koyabe,
1340	                                     "Policing Congestion Response in an
1341	                                     Internetwork Using Re-Feedback",
1342	                                     ACM SIGCOMM CCR 35(4)277--288,
1343	                                     August 2005, <http://
1344	                                     portal.acm.org/
1345	                                     citation.cfm?id=1080091.1080124>.

1347	   [Refb-dis]                        Briscoe, B., "Re-feedback: Freedom
1348	                                     with Accountability for Causing
1349	                                     Congestion in a Connectionless
1350	                                     Internetwork", UCL PhD
1351	                                     Dissertation, 2009,
1352	                                     <http://discovery.ucl.ac.uk/
1353	                                     16274/>.

1355	   [Salvatori05]                     Salvatori, A., "Closed Loop Traffic
1356	                                     Policing", Politecnico Torino and
1357	                                     Institut Eurecom Masters Thesis,
1358	                                     September 2005.

1360	Authors' Addresses

1362	   Matt Mathis
1363	   Google, Inc
1364	   1600 Amphitheater Parkway
1365	   Mountain View, California  93117
1366	   USA

1368	   EMail: mattmathis at google.com

1370	   Bob Briscoe
1371	   BT
1372	   B54/77, Adastral Park
1373	   Martlesham Heath
1374	   Ipswich  IP5 3RE
1375	   UK

1377	   Phone: +44 1473 645196
1378	   EMail: bob.briscoe@bt.com
1379	   URI:   http://bobbriscoe.net/