idnits 2.17.1 

draft-ietf-tsvwg-ecn-tunnel-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1161 has weird spacing: '...   both  admis...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 24, 2009) is 5511 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Encapsulate' is mentioned on line 1488, but not
     defined

  == Outdated reference: A later version (-11) exists of
     draft-ietf-pcn-architecture-10

  == Outdated reference: A later version (-07) exists of
     draft-ietf-pcn-baseline-encoding-02

  == Outdated reference: A later version (-05) exists of
     draft-ietf-pcn-marking-behaviour-02

  == Outdated reference: A later version (-02) exists of
     draft-ietf-pwe3-congestion-frmwk-01

  == Outdated reference: A later version (-02) exists of
     draft-satoh-pcn-st-marking-01

  -- Obsolete informational reference (is this intentional?): RFC 4306
     (Obsoleted by RFC 5996)

  -- Obsolete informational reference (is this intentional?): RFC 4423
     (Obsoleted by RFC 9063)


     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Transport Area Working Group                                  B. Briscoe
3	Internet-Draft                                                        BT
4	Intended status: Standards Track                          March 24, 2009
5	Expires: September 25, 2009

7	             Tunnelling of Explicit Congestion Notification
8	                     draft-ietf-tsvwg-ecn-tunnel-02

10	Status of this Memo

12	   This Internet-Draft is submitted to IETF in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress."

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt.

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	   This Internet-Draft will expire on September 25, 2009.

33	Copyright Notice

35	   Copyright (c) 2009 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents in effect on the date of
40	   publication of this document (http://trustee.ietf.org/license-info).
41	   Please review these documents carefully, as they describe your rights
42	   and restrictions with respect to this document.

44	Abstract

46	   This document redefines how the explicit congestion notification
47	   (ECN) field of the IP header should be constructed on entry to and
48	   exit from any IP in IP tunnel.  On encapsulation it brings all IP in
49	   IP tunnels (v4 or v6) into line with the way RFC4301 IPsec tunnels
50	   now construct the ECN field.  On decapsulation it redefines how the
51	   ECN field in the forwarded IP header should be calculated for two
52	   previously invalid combinations of incoming inner and outer headers,
53	   in order that these combinations may be usefully employed in future
54	   standards actions.  It includes a thorough analysis of the reasoning
55	   for these changes and the implications.

57	Table of Contents

59	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  6
60	     1.1.  Scope  . . . . . . . . . . . . . . . . . . . . . . . . . .  8
61	     1.2.  Document Roadmap . . . . . . . . . . . . . . . . . . . . .  9
62	   2.  Requirements Language  . . . . . . . . . . . . . . . . . . . .  9
63	   3.  Summary of Pre-Existing RFCs . . . . . . . . . . . . . . . . . 10
64	     3.1.  Encapsulation at Tunnel Ingress  . . . . . . . . . . . . . 10
65	     3.2.  Decapsulation at Tunnel Egress . . . . . . . . . . . . . . 12
66	   4.  New ECN Tunnelling Rules . . . . . . . . . . . . . . . . . . . 13
67	     4.1.  Default Tunnel Ingress Behaviour . . . . . . . . . . . . . 14
68	     4.2.  Default Tunnel Egress Behaviour  . . . . . . . . . . . . . 14
69	     4.3.  Design Principles for Future Non-Default Schemes . . . . . 16
70	   5.  Backward Compatibility . . . . . . . . . . . . . . . . . . . . 17
71	     5.1.  Non-Issues Upgrading Any Tunnel Decapsulation  . . . . . . 18
72	     5.2.  Non-Issues for RFC4301 IPsec Encapsulation . . . . . . . . 18
73	     5.3.  Upgrading Other IP in IP Tunnel Encapsulators  . . . . . . 19
74	   6.  Changes from Earlier RFCs  . . . . . . . . . . . . . . . . . . 20
75	   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 21
76	   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 21
77	   9.  Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 23
78	   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 24
79	   11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 25
80	   12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25
81	     12.1. Normative References . . . . . . . . . . . . . . . . . . . 25
82	     12.2. Informative References . . . . . . . . . . . . . . . . . . 25
83	   Appendix A.  Design Constraints  . . . . . . . . . . . . . . . . . 28
84	     A.1.  Security Constraints . . . . . . . . . . . . . . . . . . . 28
85	     A.2.  Control Constraints  . . . . . . . . . . . . . . . . . . . 30
86	     A.3.  Management Constraints . . . . . . . . . . . . . . . . . . 31
87	   Appendix B.  Relative Placement of Tunnelling and In-Path Load
88	                Regulation  . . . . . . . . . . . . . . . . . . . . . 32
89	     B.1.  Identifiers and In-Path Load Regulators  . . . . . . . . . 32
90	     B.2.  Non-Dependence of Tunnelling on In-path Load Regulation  . 33
91	     B.3.  Dependence of In-Path Load Regulation on Tunnelling  . . . 34
92	   Appendix C.  Contribution to Congestion across a Tunnel  . . . . . 37
93	   Appendix D.  Why Not Propagating ECT(1) on Decapsulation
94	                Impedes PCN . . . . . . . . . . . . . . . . . . . . . 38
95	     D.1.  Alternative Ways to Introduce the New Decapsulation
96	           Rules  . . . . . . . . . . . . . . . . . . . . . . . . . . 39
97	   Appendix E.  Why Resetting CE on Encapsulation Impedes PCN . . . . 40
98	   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 40

100	Changes from previous drafts (to be removed by the RFC Editor)

102	   Full text differences between IETF draft versions are available at
103	   <http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-ecn-tunnel/>, and
104	   between earlier individual draft versions at
105	   <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ecn-tunnel>

107	   From ietf-01 to ietf-02 (current):

109	      *  Scope reduced from any encapsulation of an IP packet to solely
110	         IP in IP tunnelled encapsulation.  Consequently changed title
111	         and removed whole section 'Design Guidelines for New
112	         Encapsulations of Congestion Notification' (to be included in a
113	         future companion informational document).

115	      *  Included a new normative decapsulation rule for ECT(0) inner
116	         and ECT(1) outer that had previously only been outlined in the
117	         non-normative appendix 'Comprehensive Decapsulation Rules'.
118	         Consequently:

120	         +  The Introduction has been completely re-written to motivate
121	            this change to decapsulation along with the existing change
122	            to encapsulation.

124	         +  The tentative text in the appendix that first proposed this
125	            change has been split between normative standards text in
126	            Section 4 and Appendix D, which explains specifically why
127	            this change would streamline PCN.  New text on the logic of
128	            the resulting decap rules added.

130	      *  If inner/outer is Not-ECT/ECT(0), changed decapsulation to
131	         propagate Not-ECT rather than drop the packet; and added
132	         reasoning.

134	      *  Considerably restructured:

136	         +  "Design Constraints" analysis moved to an appendix
137	            (Appendix A);

139	         +  Added Section 3 to summarise relevant existing RFCs;

141	         +  Structured Section 4 and Section 5 into subsections.

143	         +  Added tables to sections on old and new rules, for precision
144	            and comparison.

146	         +  Moved Section 4.3 on Design Principles to the end of the
147	            section specifying the new default normative tunnelling
148	            behaviour.  Rewritten and shifted text on identifiers and
149	            in-path load regulators to Appendix B.1.

151	   From ietf-00 to ietf-01:

153	      *  Identified two additional alarm states in the decapsulation
154	         rules (Figure 4) if ECT(X) in outer and inner contradict each
155	         other.

157	      *  Altered Comprehensive Decapsulation Rules (Appendix D) so that
158	         ECT(0) in the outer no longer overrides ECT(1) in the inner.
159	         Used the term 'Comprehensive' instead of 'Ideal'.  And
160	         considerably updated the text in this appendix.

162	      *  Added Appendix D.1 to weigh up the various ways the
163	         Comprehensive Decapsulation Rules might be introduced.  This
164	         replaces the previous contradictory statements saying complex
165	         backwards compatibility interactions would be introduced while
166	         also saying there would be no backwards compatibility issues.

168	      *  Updated references.

170	   From briscoe-01 to ietf-00:

172	      *  Re-wrote Appendix C giving much simpler technique to measure
173	         contribution to congestion across a tunnel.

175	      *  Added discussion of backward compatibility of the ideal
176	         decapsulation scheme in Appendix D.

178	      *  Updated references.  Minor corrections & clarifications
179	         throughout.

181	   From -00 to -01:

183	      *  Related everything conceptually to the uniform and pipe models
184	         of RFC2983 on Diffserv Tunnels, and completely removed the
185	         dependence of tunnelling behaviour on the presence of any in-
186	         path load regulation by using the [1 - Before] [2 - Outer]
187	         function placement concepts from RFC2983;

189	      *  Added specific cases where the existing standards limit new
190	         proposals, particularly Appendix E;

192	      *  Added sub-structure to Introduction (Need for Rationalisation,
193	         Roadmap), added new Introductory subsection on "Scope" and
194	         improved clarity;

196	      *  Added Design Guidelines for New Encapsulations of Congestion
197	         Notification;

199	      *  Considerably clarified the Backward Compatibility section
200	         (Section 5);

202	      *  Considerably extended the Security Considerations section
203	         (Section 8);

205	      *  Summarised the primary rationale much better in the
206	         conclusions;

208	      *  Added numerous extra acknowledgements;

210	      *  Added Appendix E.  "Why resetting CE on encapsulation harms
211	         PCN", Appendix C.  "Contribution to Congestion across a Tunnel"
212	         and Appendix D.  "Ideal Decapsulation Rules";

214	      *  Re-wrote Appendix B.2, explaining how tunnel encapsulation no
215	         longer depends on in-path load-regulation (changed title from
216	         "In-path Load Regulation" to "Non-Dependence of Tunnelling on
217	         In-path Load Regulation"), but explained how an in-path load
218	         regulation function must be carefully placed with respect to
219	         tunnel encapsulation (in a new sub-section entitled "Dependence
220	         of In-Path Load Regulation on Tunnelling").

222	1.  Introduction

224	   This document redefines how the explicit congestion notification
225	   (ECN) field [RFC3168] in the IP header should be constructed for all
226	   IP in IP tunnelling.  Previously, tunnel endpoints blocked visibility
227	   of transitions of the ECN field except the minimum necessary to allow
228	   the basic ECN mechanism to work.  Three main changes are defined, one
229	   on entry to and two on exit from any IP in IP tunnel.  The newly
230	   specified behaviours make all transitions to the ECN field visible
231	   across tunnel end-points, so tunnels no longer restrict new uses of
232	   the ECN field that were not envisaged when ECN was first designed.

234	   The immediate motivation for opening up the ECN behaviour of tunnels
235	   is that otherwise they impede the introduction of pre-congestion
236	   notification (PCN [I-D.ietf-pcn-marking-behaviour]) in networks with
237	   tunnels (Appendix E explains why).  But these changes are not just
238	   intended to ease the introduction of PCN; care has been taken to
239	   ensure the resulting ECN tunnelling behaviour is simple and generic
240	   for other potential future uses.

242	   Given this is a change to behaviour at 'the neck of the hourglass',
243	   an extensive analysis of the trade-offs between control, management
244	   and security constraints has been conducted in order to minimise
245	   unexpected side-effects both now and in the future.  Care has also
246	   been taken to ensure the changes are fully backwards compatible with
247	   all previous tunnelling behaviours.

249	   The ECN protocol allows a forwarding element to notify the onset of
250	   congestion of its resources without having to drop packets.  Instead
251	   it can explicitly mark a proportion of packets by setting the
252	   congestion experienced (CE) codepoint in the 2-bit ECN field in the
253	   IP header (see Table 1 for a recap of the ECN codepoints).

255	     +------------------+----------------+---------------------------+
256	     | Binary codepoint | Codepoint name | Meaning                   |
257	     +------------------+----------------+---------------------------+
258	     |        00        | Not-ECT        | Not ECN-capable transport |
259	     |        01        | ECT(1)         | ECN-capable transport     |
260	     |        10        | ECT(0)         | ECN-capable transport     |
261	     |        11        | CE             | Congestion experienced    |
262	     +------------------+----------------+---------------------------+

264	     Table 1: Recap of Codepoints of the ECN Field [RFC3168] in the IP
265	                                  Header
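   As an informal illustration only, the codepoints of Table 1 can be
   modelled in Python as below.  The names Ecn, ecn_field and is_ect
   are hypothetical and do not come from any specification or library.
   The sketch relies on the RFC3168 layout, in which the ECN field
   occupies the two least significant bits of the IPv4 TOS octet or
   the IPv6 Traffic Class octet.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """ECN codepoints from Table 1, valued as their 2-bit binary codepoints."""
    NOT_ECT = 0b00  # Not ECN-capable transport
    ECT_1 = 0b01    # ECN-capable transport, ECT(1)
    ECT_0 = 0b10    # ECN-capable transport, ECT(0)
    CE = 0b11       # Congestion experienced


def ecn_field(tos_or_traffic_class: int) -> Ecn:
    """Return the ECN codepoint held in the two least significant bits
    of an IPv4 TOS octet or an IPv6 Traffic Class octet."""
    if not 0 <= tos_or_traffic_class <= 0xFF:
        raise ValueError("expected a single octet")
    return Ecn(tos_or_traffic_class & 0b11)


def is_ect(cp: Ecn) -> bool:
    """True for either ECN-capable transport codepoint, ECT(0) or ECT(1)."""
    return cp in (Ecn.ECT_0, Ecn.ECT_1)
```

   For example, a TOS octet of 0xBA (DSCP 46 followed by ECN bits 10)
   yields ECT(0), while 0xB8 (the same DSCP with ECN bits 00) yields
   Not-ECT.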

267	   The outer header of an IP packet can encapsulate one (or more)
268	   additional IP headers tunnelled within it.  A forwarding element that
269	   is using ECN to signify congestion will only mark the outer IP header
270	   that is immediately visible to it.  When a tunnel decapsulator later
271	   removes this outer header, it must follow rules to ensure the marking
272	   is propagated into the IP header being forwarded onwards, otherwise
273	   congestion notifications will disappear into a black hole, leading to
274	   potential congestion collapse.

276	   The rules for constructing the ECN field to be forwarded after tunnel
277	   decapsulation ensure this happens, but they are not wholly
278	   straightforward, and neither are the rules for encapsulating one IP
279	   header in another on entry to a tunnel.  The factor that has
280	   introduced most complication at both ends of a tunnel has been the
281	   possibility that the ECN field might be used as a covert channel to
282	   compromise the integrity of an IPsec tunnel.

284	   A common use for IPsec is to create a secure tunnel between two
285	   secure sites across the public Internet.  A field like ECN that can
286	   change as it traverses the Internet cannot be covered by IPsec's
287	   integrity mechanisms.  Therefore, the ECN field might be toggled
288	   (with two bits per packet) to communicate between a secure site and
289	   someone on the public Internet--a covert channel.

291	   Over the years covert channel restrictions have been added to the
292	   design of ECN (with consequent backward compatibility complications).
293	   However, the latest IPsec architecture [RFC4301] takes the view that
294	   simplicity is more important than closing off the covert channel
295	   threat, which it deems manageable given its bandwidth is limited to
296	   two bits per packet.
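   To make the two-bits-per-packet bound concrete, the Python sketch
   below packs each byte of a message into four 2-bit values, one per
   packet, using all four ECN codepoints.  It is purely illustrative:
   it models only the ECN field (ignoring any marking by the path), and
   the helper names are hypothetical.  The capacity bound is simply the
   packet rate multiplied by two bits, e.g. at most 2,000 bit/s at
   1,000 packets per second.

```python
def to_ecn_symbols(message: bytes) -> list[int]:
    """Split each byte into four 2-bit values, most significant pair first.

    Each value is one ECN codepoint (0b00..0b11), i.e. one packet.
    """
    symbols = []
    for byte in message:
        for shift in (6, 4, 2, 0):
            symbols.append((byte >> shift) & 0b11)
    return symbols


def from_ecn_symbols(symbols: list[int]) -> bytes:
    """Reassemble bytes from consecutive groups of four 2-bit values."""
    if len(symbols) % 4:
        raise ValueError("need a multiple of four packets")
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for sym in symbols[i:i + 4]:
            byte = (byte << 2) | (sym & 0b11)
        out.append(byte)
    return bytes(out)


def max_covert_rate_bps(packets_per_second: float, bits_per_packet: int = 2) -> float:
    """Upper bound on covert-channel capacity through the ECN field."""
    return packets_per_second * bits_per_packet
```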

298	   As a result, an unfortunate sequence of standards actions has left us
299	   with nearly the worst of all possible combinations of outcomes,
300	   despite the best endeavours of everyone concerned.  The new IPsec
301	   architecture [RFC4301] only updates the earlier specification of ECN
302	   tunnelling behaviour [RFC3168] for the case of IPsec tunnels.  For
303	   the case of non-IPsec tunnels the earlier RFC3168 specification still
304	   applies.  At the time RFC3168 was standardised, covert channels
305	   through the ECN field were restricted, whether or not IPsec was being
306	   used.  The perverse position now is that non-IPsec tunnels restrict
307	   covert channels, while IPsec tunnels don't.

309	   Actually, this statement needs some qualification.  IPsec tunnels
310	   leave the ECN covert channel unrestricted only at the ingress.  At the
311	   tunnel egress, the presumption that the ECN covert channel should be
312	   restricted has not been removed from any tunnelling specifications,
313	   whether IPsec or not.

315	   Now that these historic 2-bit covert channel constraints are impeding
316	   the introduction of PCN, this specification is designed to remove
317	   them and at the same time streamline the whole ECN behaviour for the
318	   future.

320	1.1.  Scope

322	   This document only concerns wire protocol processing at tunnel
323	   endpoints and makes no changes or recommendations concerning
324	   algorithms for congestion marking or congestion response.

326	   This document specifies common, default ECN field processing at
327	   encapsulation and decapsulation for any IP in IP tunnelling.  It
328	   applies irrespective of whether IPv4 or IPv6 is used for either of
329	   the inner and outer headers.  It applies to all Diffserv per-hop
330	   behaviours (PHBs), unless stated otherwise in the specification of a
331	   PHB.  It is intended to be a good trade-off between somewhat
332	   conflicting security, control and management requirements.

334	   Nonetheless, if necessary, an alternate congestion encapsulation
335	   behaviour can be introduced as part of the definition of an alternate
336	   congestion marking scheme used by a specific Diffserv PHB (see Section 5 of
337	   [RFC3168] and [RFC4774]).  When designing such new encapsulation
338	   schemes, the principles in Section 4.3 should be followed as closely
339	   as possible.  There is no requirement for a PHB to state anything
340	   about ECN tunnelling behaviour if the new default behaviour is
341	   sufficient.

343	   [RFC2983] is a comprehensive primer on differentiated services and
344	   tunnels.  Given ECN raises similar issues to differentiated services
345	   when interacting with tunnels, useful concepts introduced in RFC2983
346	   are used throughout, with brief recaps of the explanations where
347	   necessary.

349	1.2.  Document Roadmap

351	   The body of the document focuses solely on standards actions
352	   impacting implementation.  Appendices record the analysis that
353	   motivates and justifies these actions.  The whole document is
354	   organised as follows:

356	   o  Section 3 recaps relevant existing RFCs and explains exactly why
357	      changes are needed, referring to Appendix D and Appendix E in
358	      order to explain in detail why current tunnelling behaviours
359	      impede PCN deployment, at egress and ingress respectively.

361	   o  Section 4 uses precise standards terminology to specify the new
362	      ECN tunnelling behaviours.  It refers to Appendix A for analysis
363	      of the trade-offs between security, control and management design
364	      constraints that led to these particular standards actions.

366	   o  Extending the new IPsec tunnel ingress behaviour to all IP in IP
367	      tunnels requires consideration of backwards compatibility, which
368	      is covered in Section 5; detailed changes from earlier RFCs are
369	      brought together in Section 6.

371	   o  Finally, a number of security considerations are discussed and
372	      conclusions are drawn.

374	   o  Additional specialist issues are deferred to appendices in
375	      addition to those already referred to above, in particular
376	      Appendix B discusses specialist tunnelling issues that could arise
377	      when ECN is fed back to a load regulation function on a middlebox,
378	      rather than at the source of the path.

380	2.  Requirements Language

382	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
383	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
384	   document are to be interpreted as described in RFC 2119 [RFC2119].

386	3.  Summary of Pre-Existing RFCs

388	   This section is informative, not normative.  It merely recaps pre-
389	   existing RFCs to help motivate changing these behaviours.  Earlier
390	   relevant RFCs that were either experimental or incomplete with
391	   respect to ECN tunnelling (RFC2481, RFC2401 and RFC2003) are not
392	   discussed, although the backwards compatibility considerations in
393	   Section 5 take them into account.  The question of whether tunnel
394	   implementations used in the Internet comply with any of these RFCs is
395	   also not discussed.

397	3.1.  Encapsulation at Tunnel Ingress

399	   The controversy at tunnel ingress has been over whether to propagate
400	   information about congestion experienced on the path upstream of the
401	   tunnel ingress into the outer header of the tunnel.

403	   Specifically, RFC3168 says that, if a tunnel fully supports ECN
404	   (termed a 'full-functionality' ECN tunnel in [RFC3168]), the tunnel
405	   ingress must not copy a CE marking from the inner header into the
406	   outer header that it creates.  Instead the tunnel ingress must set
407	   the outer header to ECT(0) (i.e. codepoint 10) if the ECN field is
408	   marked CE (codepoint 11) in the arriving IP header.  We term this
409	   'resetting' a CE codepoint.

411	   However, the new IPsec architecture in [RFC4301] reverses this rule,
412	   stating that the tunnel ingress must simply copy the ECN field from
413	   the arriving header to the outer header.  The main purpose of the present
414	   specification is to carry the new behaviour of IPsec over to all IP
415	   in IP tunnels, so all tunnel ingress nodes consistently copy the ECN
416	   field.

418	   RFC3168 also provided a Limited Functionality mode that turns off ECN
419	   processing over the scope of the tunnel.  This is necessary if the
420	   ingress does not know whether the tunnel egress supports propagation
421	   of ECN markings.  Neither Limited Functionality mode nor Full
422	   Functionality mode is used in RFC4301 IPsec.

424	   These pre-existing behaviours are summarised in Figure 1.

426	    +-----------------+-----------------------------------------------+
427	    | Incoming Header |             Outgoing Outer Header             |
428	    | (also equal to  +---------------+---------------+---------------+
429	    | Outgoing Inner  |  RFC3168 ECN  |  RFC3168 ECN  | RFC4301 IPsec |
430	    |     Header)     |    Limited    |     Full      |               |
431	    |                 | Functionality | Functionality |               |
432	    +-----------------+---------------+---------------+---------------+
433	    |    Not-ECT      |   Not-ECT     |   Not-ECT     |   Not-ECT     |
434	    |     ECT(0)      |   Not-ECT     |    ECT(0)     |    ECT(0)     |
435	    |     ECT(1)      |   Not-ECT     |    ECT(1)     |    ECT(1)     |
436	    |       CE        |   Not-ECT     |    ECT(0)     |      CE       |
437	    +-----------------+---------------+---------------+---------------+

439	    Figure 1: IP in IP Encapsulation: Recap of Pre-existing Behaviours
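   The columns of Figure 1 can be expressed as a single function, as in
   the Python sketch below.  It is a non-authoritative model assuming
   the codepoint values of Table 1; the names Ecn, IngressMode and
   encapsulate are hypothetical.

```python
from enum import Enum, IntEnum


class Ecn(IntEnum):
    """ECN codepoints (Table 1), valued as their 2-bit binary codepoints."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class IngressMode(Enum):
    """Pre-existing ingress behaviours, one per column of Figure 1."""
    RFC3168_LIMITED = "RFC3168 ECN Limited Functionality"
    RFC3168_FULL = "RFC3168 ECN Full Functionality"
    RFC4301_IPSEC = "RFC4301 IPsec"


def encapsulate(incoming: Ecn, mode: IngressMode) -> tuple[Ecn, Ecn]:
    """Return (outgoing_inner, outgoing_outer) ECN codepoints.

    The outgoing inner header always equals the incoming header; only
    the ECN field of the newly created outer header depends on the mode.
    """
    if mode is IngressMode.RFC3168_LIMITED:
        # ECN processing is turned off over the scope of the tunnel.
        outer = Ecn.NOT_ECT
    elif mode is IngressMode.RFC3168_FULL:
        # CE is 'reset' to ECT(0); every other codepoint is copied.
        outer = Ecn.ECT_0 if incoming is Ecn.CE else incoming
    else:
        # RFC4301 IPsec: simply copy the arriving ECN field.
        outer = incoming
    return incoming, outer


if __name__ == "__main__":
    # Regenerate the body of Figure 1, one row per incoming codepoint.
    for cp in (Ecn.NOT_ECT, Ecn.ECT_0, Ecn.ECT_1, Ecn.CE):
        outers = [encapsulate(cp, m)[1].name for m in IngressMode]
        print(f"{cp.name:8}", *(f"{o:8}" for o in outers))
```

   Comparing the CE row across the three modes shows that only the
   RFC4301 behaviour leaves congestion experienced upstream of the
   ingress visible in the outer header.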

441	   For encapsulation, the specification in Section 4 below brings all IP
442	   in IP tunnels (v4 or v6) into line with the way IPsec tunnels
443	   [RFC4301] now construct the ECN field, except where a legacy tunnel
444	   egress might not understand ECN at all.  This removes the now
445	   redundant full functionality mode in the middle column of Figure 1.
446	   Wherever possible it ensures that the outer header reveals any
447	   congestion experienced so far on the whole path, not just since the
448	   last tunnel ingress.

450	   Why does it matter if we have different ECN encapsulation behaviours
451	   for IPsec and non-IPsec tunnels?  A general answer is that gratuitous
452	   inconsistency constrains the available design space and makes it
453	   harder to design networks and new protocols that work predictably.

455	   But there is also a specific need not to reset the CE codepoint.  The
456	   standards track proposal for excess rate pre-congestion notification
457	   (PCN [I-D.ietf-pcn-marking-behaviour]) only works correctly in the
458	   presence of RFC4301 IPsec encapsulation or [RFC5129] MPLS
459	   encapsulation, but not with RFC3168 IP in IP encapsulation
460	   (Appendix E explains why).  The PCN architecture
461	   [I-D.ietf-pcn-architecture] states that the regular RFC3168 rules for
462	   IP in IP tunnelling of the ECN field should not be used for PCN.  But
463	   if non-IPsec tunnels are already present within a network to which
464	   PCN is being added, that is not particularly helpful advice.

466	   The present specification provides a clean solution to this problem,
467	   so that network operators who want to use PCN and tunnels can specify
468	   that all tunnel endpoints in a PCN region need to be upgraded to
469	   comply with this specification.  Also, whether using PCN or not, as
470	   more tunnel endpoints comply with this specification, it should make
471	   ECN behaviour simpler, faster and more predictable.

473	   To ensure copying rather than resetting CE on ingress will not cause
474	   unintended side-effects, Appendix A assesses whether either harms any
475	   security, control or management functions.  It finds that resetting
476	   CE makes life difficult in a number of directions, while copying CE
477	   harms nothing (other than opening a low bit-rate covert channel
478	   vulnerability that the IETF Security Area now deems manageable).

480	3.2.  Decapsulation at Tunnel Egress

482	   Both RFC3168 and RFC4301 specify the decapsulation behaviour
483	   summarised in Figure 2.  The ECN field in the outgoing header is set
484	   to the codepoint at the intersection of the appropriate incoming
485	   inner header (row) and incoming outer header (column).
486	    +------------------+----------------------------------------------+
487	    |  Incoming Inner  |             Incoming Outer Header            |
488	    |      Header      +---------+------------+------------+----------+
489	    |                  | Not-ECT |   ECT(0)   |   ECT(1)   |    CE    |
490	    +------------------+---------+------------+------------+----------+
491	    |     Not-ECT      | Not-ECT |   drop(!!!)|   drop(!!!)| drop(!!!)|
492	    |      ECT(0)      |  ECT(0) | ECT(0)     | ECT(0)     |   CE     |
493	    |      ECT(1)      |  ECT(1) | ECT(1)     | ECT(1)     |   CE     |
494	    |        CE        |      CE |     CE     |     CE     |   CE     |
495	    +------------------+---------+------------+------------+----------+
496	                       |                Outgoing Header               |
497	                       +----------------------------------------------+

499	     Figure 2: IP in IP Decapsulation; Recap of Pre-existing Behaviour

501	   The behaviour in the table derives from the logic given in RFC3168,
502	   briefly recapped as follows:

504	   o  On decapsulation, if the inner ECN field is Not-ECT but the outer
505	      ECN field is anything except Not-ECT, the decapsulator must drop
506	      the packet.  Drop is mandated because known legal protocol
507	      transitions should not be able to lead to these cases (indicated
508	      in the table by '(!!!)'), therefore the decapsulator may also
509	      raise an alarm;

511	   o  In all other cases, the outgoing ECN field is set to the more
512	      severe marking of the outer and inner ECN fields, where the
513	      ranking of severity from highest to lowest is CE, ECT, Not-ECT;

515	   o  ECT(0) and ECT(1) are considered of equal severity (indicated by
516	      just 'ECT' in the rank order above).  Where the inner and outer
517	      ECN fields are both ECT but they differ, the packet is forwarded
518	      with the codepoint of the inner ECN field, which prevents ECT
519	      codepoints being used for a covert channel.
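   The three rules above can be captured in a short Python sketch and
   checked against every cell of Figure 2.  It is illustrative only:
   the names are hypothetical, and returning None stands in for
   whatever drop-and-alarm handling an implementation chooses.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """ECN codepoints (Table 1), valued as their 2-bit binary codepoints."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Severity ranking from the recap above: CE > ECT (either) > Not-ECT.
# ECT(0) and ECT(1) deliberately share the same rank.
SEVERITY = {Ecn.NOT_ECT: 0, Ecn.ECT_0: 1, Ecn.ECT_1: 1, Ecn.CE: 2}


def decapsulate_rfc3168(inner: Ecn, outer: Ecn) -> Optional[Ecn]:
    """Pre-existing RFC3168/RFC4301 egress behaviour of Figure 2.

    Returns the ECN codepoint of the forwarded header, or None where the
    packet must be dropped (the '(!!!)' cells, which may also raise an
    alarm).
    """
    if inner is Ecn.NOT_ECT and outer is not Ecn.NOT_ECT:
        return None
    if SEVERITY[outer] > SEVERITY[inner]:
        # Reached only when the outer is CE and the inner is ECT.
        return outer
    # Equal or lower severity: forward the inner codepoint, so a
    # differing outer ECT codepoint is discarded.
    return inner
```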

521	   The specification for decapsulation in Section 4 fixes two problems
522	   with this pre-existing behaviour:

524	   o  Firstly, forwarding the codepoint of the inner header in the cases
525	      where both inner and outer are different values of ECT effectively
526	      implies that any distinction between ECT(0) and ECT(1) cannot be
527	      introduced in the future wherever a tunnel might be deployed.
528	      Therefore, the currently specified tunnel decapsulation behaviour
529	      unnecessarily wastes one of four codepoints (effectively wasting
530	      half a bit) in the IP (v4 & v6) header.  As explained in
531	      Appendix A.1, the original reason for not using the outer ECT
532	      codepoints for onward forwarding was to limit the covert channel
533	      across a decapsulator to 1 bit per packet.  However, now that the
534	      IETF Security Area has deemed that a 2-bit covert channel through
535	      an encapsulator is a manageable risk, the same should be true for
536	      a decapsulator.

538	      As well as being a general future-proofing issue, this problem is
539	      immediately pressing for standardisation of pre-congestion
540	      notification (PCN).  PCN solutions generally require three
541	      encoding states in addition to Not-ECT: one for 'not marked' and
542	      two increasingly severe levels of marking.  Although the ECN field
543	      gives sufficient codepoints for these three states, they cannot
544	      all be used for PCN because a change between ECT(0) and ECT(1) in
545	      any tunnelled packet would be lost when the outer header was
546	      decapsulated, dangerously discarding congestion signalling.  A
547	      number of wasteful or convoluted work-arounds to this problem are
548	      being considered for standardisation by the PCN working group (see
549	      Appendix D), but by far the simplest approach is just to remove
550	      the covert channel blockages from tunnelling behaviour, which are
551	      now deemed unnecessary anyway.  Not only will this streamline PCN
552	      standardisation, but it could also streamline other future uses of
553	      these codepoints.

555	   o  Secondly, mandating drop is not always a good idea just because a
556	      combination of headers seems invalid.  There are many cases where
557	      it has become nearly impossible to deploy new standards because
558	      legacy middleboxes drop packets carrying header values they don't
559	      expect.  Where possible, the new decapsulation behaviour specified
560	      in Section 4 below is more liberal in its response to unexpected
561	      combinations of headers.
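   The PCN problem described above can be illustrated with a short
   sketch.  It assumes a hypothetical three-state PCN-like encoding:
   ECT(0) = not marked, ECT(1) = first marking level, CE = second level.
   All names are illustrative, and only the case where both fields are
   ECN-capable is modelled.

```python
from typing import Callable

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Hypothetical severity ranking for a PCN-like scheme.
SEVERITY = {NOT_ECT: 0, ECT0: 1, ECT1: 2, CE: 3}


def legacy_decap_ect(inner: int, outer: int) -> int:
    """Pre-existing rule when both fields are ECN-capable."""
    if CE in (inner, outer):
        return CE
    return inner  # differing ECT values: the inner codepoint wins


def severity_decap_ect(inner: int, outer: int) -> int:
    """Alternative rule: forward the more severe of inner and outer."""
    return max(inner, outer, key=SEVERITY.__getitem__)


def after_tunnel(arriving: int, interior_mark: int,
                 decap: Callable[[int, int], int]) -> int:
    inner = arriving  # the inner header is never changed inside the tunnel
    # A normal-mode ingress copies `arriving` into the outer header; an
    # interior node then re-marks the outer header to `interior_mark`.
    outer = interior_mark
    return decap(inner, outer)
```

   For example, after_tunnel(ECT0, ECT1, legacy_decap_ect) yields ECT(0),
   so the first-level marking is lost.  The severity-based rule yields
   ECT(1) and preserves it.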

563	4.  New ECN Tunnelling Rules

565	   The ECN tunnel processing rules below in Section 4.1 (ingress
566	   encapsulation) and Section 4.2 (egress decapsulation) are the default
567	   for a packet with any DSCP.  If required, different ECN encapsulation
568	   rules MAY be defined as part of the definition of an appropriate
569	   Diffserv PHB using the guidelines that follow in Section 4.3.
570	   However, the deployment burden of handling exceptional PHBs in
571	   implementations of all affected tunnels and lower layer link
572	   protocols should not be underestimated.

574	4.1.  Default Tunnel Ingress Behaviour

576	   A tunnel ingress compliant with this specification MUST implement a
577	   `normal mode'.  It might also need to implement a `compatibility
578	   mode' for backward compatibility with legacy tunnel egresses that do
579	   not understand ECN (see Section 5 for when compatibility mode is
580	   required).  Note that these are modes of the ingress tunnel endpoint
581	   only, not the tunnel as a whole.

583	   Whatever the mode, the tunnel ingress forwards the inner header
584	   without changing the ECN field.  In normal mode a tunnel ingress
585	   compliant with this specification MUST construct the outer
586	   encapsulating IP header by copying the 2-bit ECN field of the
587	   arriving IP header.  In compatibility mode it clears the ECN field in
588	   the outer header to the Not-ECT codepoint.  These rules are tabulated
589	   for convenience in Figure 3.
590	            +-----------------+-------------------------------+
591	            | Incoming Header |     Outgoing Outer Header     |
592	            | (also equal to  +---------------+---------------+
593	            | Outgoing Inner  | Compatibility |    Normal     |
594	            |     Header)     |     Mode      |     Mode      |
595	            +-----------------+---------------+---------------+
596	            |    Not-ECT      |   Not-ECT     |   Not-ECT     |
597	            |     ECT(0)      |   Not-ECT     |    ECT(0)     |
598	            |     ECT(1)      |   Not-ECT     |    ECT(1)     |
599	            |       CE        |   Not-ECT     |      CE       |
600	            +-----------------+---------------+---------------+

602	              Figure 3: New IP in IP Encapsulation Behaviours

604	   Compatibility mode is the same per packet behaviour as the ingress
605	   end of RFC3168's limited functionality mode.  Normal mode is the same
606	   per packet behaviour as the ingress end of RFC4301 IPsec.
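   The per-packet rules of Figure 3 can be sketched in Python as below.
   This is illustrative only.  The names encapsulate, NORMAL and
   COMPATIBILITY are hypothetical, and the 2-bit RFC3168 codepoint
   values are assumed.

```python
from typing import Tuple

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
NORMAL, COMPATIBILITY = "normal", "compatibility"


def encapsulate(arriving_ecn: int, mode: str) -> Tuple[int, int]:
    """Return (inner_ecn, outer_ecn) following Figure 3."""
    inner = arriving_ecn  # the inner header is forwarded unchanged
    if mode == NORMAL:
        outer = arriving_ecn  # copy the 2-bit ECN field
    elif mode == COMPATIBILITY:
        outer = NOT_ECT  # clear the outer ECN field to Not-ECT
    else:
        raise ValueError(f"unknown ingress mode: {mode!r}")
    return inner, outer
```

   For example, encapsulate(CE, NORMAL) gives (CE, CE), while
   encapsulate(CE, COMPATIBILITY) gives (CE, NOT_ECT).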

608	4.2.  Default Tunnel Egress Behaviour

610	   To decapsulate the inner header at the tunnel egress, a compliant
611	   tunnel egress MUST set the outgoing ECN field to the codepoint at the
612	   intersection of the appropriate incoming inner header (row) and outer
613	   header (column) in Figure 4.

615	    +------------------+----------------------------------------------+
616	    |  Incoming Inner  |             Incoming Outer Header            |
617	    |      Header      +---------+------------+------------+----------+
618	    |                  | Not-ECT |   ECT(0)   |   ECT(1)   |    CE    |
619	    +------------------+---------+------------+------------+----------+
620	    |     Not-ECT      | Not-ECT |Not-ECT(!!!)|   drop(!!!)| drop(!!!)|
621	    |      ECT(0)      |  ECT(0) | ECT(0)     | ECT(1)     |   CE     |
622	    |      ECT(1)      |  ECT(1) | ECT(1)(!!!)| ECT(1)     |   CE     |
623	    |        CE        |      CE |     CE     |     CE(!!!)|   CE     |
624	    +------------------+---------+------------+------------+----------+
625	                       |                Outgoing Header               |
626	                       +----------------------------------------------+

628	              Figure 4: New IP in IP Decapsulation Behaviour

630	   This table for decapsulation behaviour is derived from the following
631	   logic:

633	   o  If the inner ECN field is Not-ECT the decapsulator MUST NOT
634	      propagate any other ECN codepoint in the outer header onwards.
635	      This is because the inner Not-ECT marking is set by transports
636	      that would not understand the ECN protocol.  Instead:

638	      *  If the inner ECN field is Not-ECT and the outer ECN field is
639	         ECT(1) or CE the decapsulator MUST drop the packet.
640	         Reasoning: these combinations of codepoints either imply some
641	         illegal protocol transition has occurred within the tunnel, or
642	         that some locally defined mechanism is being used within the
643	         tunnel that might be signalling congestion.  In either case,
644	         the only appropriate signal to the transport is a packet drop.
645	         It would have been nice to allow packets with ECT(1) in the
646	         outer to be forwarded, but drop has had to be mandated in case
647	         future multi-level ECN schemes are defined, in which ECT(1) and
648	         CE could be used to signify two levels of congestion
649	         severity.

651	      *  If the inner ECN field is Not-ECT and the outer ECN field is
652	         ECT(0) or Not-ECT the decapsulator MUST forward the packet with
653	         the ECN field cleared to Not-ECT.
654	         Reasoning: Although no known legal protocol transition would
655	         lead to ECT(0) in the outer and Not-ECT in the inner, no known
656	         or proposed protocol uses ECT(0) as a congestion signal either.
657	         Therefore in this case the packet can be forwarded rather than
658	         dropped, which will allow future standards actions to use this
659	         combination.

661	   o  In all other cases, the outgoing ECN field is set to the more
662	      severe marking of the outer and inner ECN fields, where the
663	      ranking of severity from highest to lowest is CE, ECT(1), ECT(0),
664	      Not-ECT;

666	   o  There are cases where no currently legal transition in any current
667	      or previous ECN tunnelling specification would result in certain
668	      combinations of inner and outer ECN fields.  These cases are
669	      indicated in Figure 4 by '(!!!)'.  In these cases, the
670	      decapsulator SHOULD log the event and MAY also raise an alarm, but
671	      not so often that the illegal combinations would amplify into a
672	      flood of alarm messages.

674	   The above logic allows for ECT(0) and ECT(1) to both represent the
675	   same severity of congestion marking (e.g. "not congestion marked").
676	   But it also allows future schemes to be defined where ECT(1) is a
677	   more severe marking than ECT(0).  This approach is discussed in
678	   Appendix D and in the discussion of the ECN nonce [RFC3540] in
679	   Section 8.
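   The Figure 4 rules can be sketched minimally in Python as follows.
   The sketch assumes the 2-bit RFC3168 codepoint values.  The names
   decapsulate and UNEXPECTED are hypothetical.  The function returns
   the outgoing codepoint (None meaning drop) plus a flag for the
   '(!!!)' combinations.  Logging those events, and rate-limiting any
   alarms as required above, is left to the caller.

```python
from typing import Optional, Tuple

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Severity ranking of the new egress rule: CE > ECT(1) > ECT(0) > Not-ECT.
SEVERITY = {NOT_ECT: 0, ECT0: 1, ECT1: 2, CE: 3}

# (inner, outer) combinations marked '(!!!)' in Figure 4.
UNEXPECTED = {
    (NOT_ECT, ECT0), (NOT_ECT, ECT1), (NOT_ECT, CE),
    (ECT1, ECT0), (CE, ECT1),
}


def decapsulate(inner: int, outer: int) -> Tuple[Optional[int], bool]:
    """Return (outgoing ECN codepoint or None for drop, unexpected flag)."""
    unexpected = (inner, outer) in UNEXPECTED
    if inner == NOT_ECT:
        if outer in (ECT1, CE):
            return None, unexpected  # MUST drop
        return NOT_ECT, unexpected  # outer Not-ECT or ECT(0): forward Not-ECT
    # All other cases: forward the more severe of inner and outer.
    return max(inner, outer, key=SEVERITY.__getitem__), unexpected
```

   Treating ECT(1) as more severe than ECT(0) in the ranking costs
   nothing today, because no current protocol gives the two different
   meanings.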

681	4.3.  Design Principles for Future Non-Default Schemes

683	   This section is informative not normative.

685	   Section 5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474]
686	   to 'switch in' different behaviours for marking the ECN field, just
687	   as it switches in different per-hop behaviours (PHBs) for
688	   scheduling.  Therefore here we give guidance for designing possibly
689	   different marking schemes.

691	   In one word the guidance is "Don't".  If a scheme requires tunnels to
692	   implement special processing of the ECN field for certain DSCPs, it
693	   is highly unlikely that every implementer of every tunnel will want
694	   to add the required exception and that operators will want to deploy
695	   the required configuration options.  Therefore it is highly likely
696	   that some tunnels within a network will not implement this special
697	   case.  Designers should therefore avoid non-default tunnelling
698	   schemes if at all possible.

700	   That said, if a non-default scheme for processing the ECN field is
701	   really required, the following guidelines may prove useful in its
702	   design:

704	   o  For any new scheme, a tunnel ingress should not set the ECN field
705	      of the outer header if it cannot guarantee that any corresponding
706	      tunnel egress will understand how to handle such an ECN field.

708	   o  On encapsulation in any new scheme, an outer header capable of
709	      carrying congestion markings should reflect accumulated congestion
710	      since the last interface designed to regulate load (see
711	      Appendix A.2 for the definition of a Load Regulator, which is
712	      usually but not always the data source).  This implies that new
713	      schemes for tunnelling congestion notification should copy
714	      congestion notification into each new encapsulating outer header
715	      that supports it.

717	      Reasoning: The constraints from the three perspectives of
718	      security, control and management in Appendix A are somewhat in
719	      tension as to whether a tunnel ingress should copy congestion
720	      markings into the outer header it creates or reset them.  From the
721	      control perspective either copying or resetting works for existing
722	      arrangements, but copying has more potential for simplifying
723	      control.  From the management perspective copying is preferable.
724	      From the security perspective resetting is preferable but copying
725	      is now considered acceptable given the bandwidth of a 2-bit covert
726	      channel can be managed.  Therefore, on balance, copying is simpler
727	      and more useful than resetting and does minimal harm.

729	   o  For any new scheme, a tunnel egress should not forward any ECN
730	      codepoint if the arriving inner header implies the transport will
731	      not understand how to process it.

733	   o  On decapsulation in any new scheme, if a combination of inner and
734	      outer headers is encountered that should not have been possible,
735	      this event should be logged and an alarm raised.  But the packet
736	      should still be forwarded with a safe codepoint setting if at all
737	      possible.  This increases the chances of 'forward compatibility'
738	      with possible future protocol extensions.

740	   o  On decapsulation in any new scheme, the ECN field that the tunnel
741	      egress forwards should reflect the more severe congestion marking
742	      of the arriving inner and outer headers.

744	5.  Backward Compatibility

746	   Note: in RFC3168, a whole tunnel was considered in one of two modes:
747	   limited functionality or full functionality.  The new modes defined
748	   in this specification are only modes of the tunnel ingress.  The new
749	   tunnel egress behaviour has only one mode and doesn't need to know
750	   what mode the ingress is in.

752	5.1.  Non-Issues Upgrading Any Tunnel Decapsulation

754	   This specification only changes the egress per-packet calculation of
755	   the ECN field for combinations of inner and outer headers that have
756	   so far not been used in any IETF protocols.  Therefore, a tunnel
757	   egress complying with any previous specification (RFC4301, both modes
758	   of RFC3168, both modes of RFC2481, RFC2401 and RFC2003) can be
759	   upgraded to comply with this new decapsulation specification without
760	   any backwards compatibility issues.

762	   The proposed tunnel egress behaviour also requires no additional mode
763	   or option configuration at the ingress or egress nor any additional
764	   negotiation with the ingress.  A compliant tunnel egress merely needs
765	   to implement the one behaviour in Section 4.  The reduction to one
766	   mode at the egress has no backwards compatibility issues, because
767	   previously the egress produced the same output whichever mode the
768	   tunnel was in.

770	   These new decapsulation rules have been defined in such a way that
771	   congestion control will still work safely if any of the earlier
772	   versions of ECN processing are used unilaterally at the encapsulating
773	   ingress of the tunnel (any of RFC2003, RFC2401, either mode of
774	   RFC2481, either mode of RFC3168, RFC4301 and this present
775	   specification).  If a tunnel ingress tries to negotiate to use
776	   limited functionality mode or full functionality mode [RFC3168], a
777	   decapsulating tunnel egress compliant with this specification MUST
778	   agree to either request, as its behaviour will be the same in both
779	   cases.

781	   For 'forward compatibility', a compliant tunnel egress SHOULD raise a
782	   warning about any requests to enter modes it doesn't recognise, but
783	   it can continue operating.  If no ECN-related mode is requested, a
784	   compliant tunnel egress can continue without raising any error or
785	   warning as its egress behaviour is compatible with all the legacy
786	   ingress behaviours that don't negotiate capabilities.

788	5.2.  Non-Issues for RFC4301 IPsec Encapsulation

790	   The new normal mode of ingress behaviour defined above (Section 4.1)
791	   brings all IP in IP tunnels into line with [RFC4301].  If one end of
792	   an IPsec tunnel is compliant with [RFC4301], the other end is
793	   guaranteed to also be RFC4301-compliant (there could be corner cases
794	   where manual keying is used, but they will be set aside here).
795	   Therefore the new normal ingress behaviour introduces no backward
796	   compatibility issues with IKEv2 [RFC4306] IPsec [RFC4301] tunnels, and
797	   no need for any new modes, options or configuration.

799	5.3.  Upgrading Other IP in IP Tunnel Encapsulators

801	   At the tunnel ingress, this specification effectively extends the
802	   scope of RFC4301's ingress behaviour to any IP in IP tunnel.  If any
803	   other IP in IP tunnel ingress (i.e. not RFC4301 IPsec) is upgraded to
804	   be compliant with this specification, it has to cater for the
805	   possibility that it is talking to a legacy tunnel egress that may not
806	   know how to process the ECN field.  If ECN capable outer headers were
807	   sent towards a legacy (e.g. [RFC2003]) egress, it would most likely
808	   simply disregard the outer headers, dangerously discarding
809	   information about congestion experienced within the tunnel.  ECN-
810	   capable traffic sources would not see any congestion feedback and
811	   instead continually ratchet up their share of the bandwidth without
812	   realising that cross-flows from other ECN sources were continually
813	   having to ratchet down.

815	   This specification introduces no new backward compatibility issues
816	   when a compliant ingress talks with a legacy egress, but it has to
817	   provide similar safeguards to those already defined in RFC3168.
818	   Therefore, to comply with this specification, a tunnel ingress that
819	   does not always know the ECN capability of its tunnel egress MUST
820	   implement a 'normal' mode and a 'compatibility' mode, and for safety
821	   it MUST initiate each negotiated tunnel in compatibility mode.

823	   However, a tunnel ingress can be compliant even if it only implements
824	   the 'normal mode' of encapsulation behaviour, but only as long as it
825	   is designed or configured so that all possible tunnel egress nodes it
826	   will ever talk to will have at least full ECN functionality
827	   (complying with either RFC3168 full functionality mode, RFC4301 or
828	   this present specification).

830	   Before switching to normal mode, a compliant tunnel ingress that does
831	   not know the egress ECN capability MUST negotiate with the tunnel
832	   egress.  If the egress says it is compliant with this specification
833	   or with RFC3168 full functionality mode, the ingress puts itself into
834	   normal mode.  If the egress denies compliance with both of these or
835	   doesn't understand the question, the tunnel ingress MUST remain in
836	   compatibility mode.
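   The negotiation outcome can be sketched as a simple decision.  This
   is a minimal Python illustration.  The capability labels and the
   reply format are hypothetical: no real negotiation protocol or
   message format is implied.  A reply of None models an egress that
   does not understand the question.

```python
from enum import Enum
from typing import FrozenSet, Optional


class IngressMode(Enum):
    COMPATIBILITY = "compatibility"  # outer ECN field cleared to Not-ECT
    NORMAL = "normal"                # outer ECN field copied from inner


# Hypothetical capability labels an egress might report.
THIS_SPEC = "this-specification"
RFC3168_FULL = "rfc3168-full-functionality"


def mode_after_negotiation(reply: Optional[FrozenSet[str]]) -> IngressMode:
    """Pick the ingress mode; the tunnel is assumed to start in COMPATIBILITY."""
    if reply is not None and (THIS_SPEC in reply or RFC3168_FULL in reply):
        return IngressMode.NORMAL
    # Denial of both capabilities, or no understanding: stay safe.
    return IngressMode.COMPATIBILITY
```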

838	   The encapsulation rules for normal mode and compatibility mode are
839	   defined in Section 4 (i.e. header copying or zeroing respectively).

841	   An ingress cannot claim compliance with this specification simply by
842	   disabling ECN processing across the tunnel (only implementing
843	   compatibility mode).  Although such a tunnel ingress is at least safe
844	   with the ECN behaviour of any egress it may encounter (any of
845	   RFC2003, RFC2401, either mode of RFC2481 and RFC3168's limited
846	   functionality mode), it doesn't meet the aim of introducing ECN.

848	   Therefore, a compliant tunnel ingress MUST at least implement `normal
849	   mode' and, if it might be used with arbitrary tunnel egress nodes, it
850	   MUST also implement `compatibility mode'.

852	   Implementation note: if a compliant node is the ingress for multiple
853	   tunnels, a mode setting will need to be stored for each tunnel
854	   ingress.  However, if a node is the egress for multiple tunnels, none
855	   of the tunnels will need to store a mode setting, because a compliant
856	   egress can only be in one mode.
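   The implementation note can be illustrated with a small sketch of
   per-tunnel state at an ingress node.  The class name, the tunnel
   identifiers and the method names are hypothetical.  The sketch
   assumes an ingress that does not know its egresses' ECN capability,
   so each tunnel starts in compatibility mode.

```python
from typing import Dict

COMPATIBILITY, NORMAL = "compatibility", "normal"
NOT_ECT = 0b00


class IngressNode:
    """Ingress endpoint of several tunnels: one mode setting per tunnel."""

    def __init__(self) -> None:
        self.modes: Dict[str, str] = {}  # hypothetical tunnel id -> mode

    def open_tunnel(self, tunnel_id: str) -> None:
        self.modes[tunnel_id] = COMPATIBILITY  # safe default until negotiated

    def egress_confirmed_full_ecn(self, tunnel_id: str) -> None:
        self.modes[tunnel_id] = NORMAL

    def outer_ecn(self, tunnel_id: str, arriving_ecn: int) -> int:
        # Normal mode copies the ECN field; compatibility mode clears it.
        if self.modes[tunnel_id] == NORMAL:
            return arriving_ecn
        return NOT_ECT
```

   No equivalent table is needed at an egress.  A single decapsulation
   function of the inner and outer ECN fields serves every tunnel.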

858	6.  Changes from Earlier RFCs

860	   On encapsulation, the rule that a normal mode tunnel ingress MUST
861	   copy any ECN field into the outer header is a change to the ingress
862	   behaviour of RFC3168, but it is the same as the rules for IPsec
863	   tunnels in RFC4301.

865	   On decapsulation, the rules for calculating the outgoing ECN field at
866	   a tunnel egress are similar to the full functionality mode of ECN in
867	   RFC3168 and to RFC4301, with the following exceptions:

869	   o  The outer, not the inner, is propagated when the outer is ECT(1)
870	      and the inner is ECT(0);

872	   o  A packet with Not-ECT in the inner may be forwarded as Not-ECT
873	      rather than dropped, if the outer is ECT(0);

875	   o  The following extra illegal combinations have been identified,
876	      which may require logging and/or an alarm: outer ECT(1) with inner
877	      CE; outer ECT(0) with inner ECT(1).
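   The first two exceptions can be checked mechanically by comparing
   the two egress tables cell by cell.  The sketch below is a minimal
   Python illustration.  It assumes that the pre-existing behaviour is
   the one recapped for Figure 2, and it uses the 2-bit RFC3168
   codepoint values.  All function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
NAMES = {NOT_ECT: "Not-ECT", ECT0: "ECT(0)", ECT1: "ECT(1)", CE: "CE"}
NEW_SEVERITY = {NOT_ECT: 0, ECT0: 1, ECT1: 2, CE: 3}


def old_decap(inner: int, outer: int) -> Optional[int]:
    """Pre-existing behaviour as recapped for Figure 2 (None = drop)."""
    if inner == NOT_ECT:
        return NOT_ECT if outer == NOT_ECT else None
    if CE in (inner, outer):
        return CE
    return inner  # differing ECT values: the inner codepoint wins


def new_decap(inner: int, outer: int) -> Optional[int]:
    """Figure 4 behaviour (None = drop)."""
    if inner == NOT_ECT:
        return None if outer in (ECT1, CE) else NOT_ECT
    return max(inner, outer, key=NEW_SEVERITY.__getitem__)


def show(codepoint: Optional[int]) -> str:
    return "drop" if codepoint is None else NAMES[codepoint]


def changed_cells() -> List[Tuple[str, str, str, str]]:
    """(inner, outer, old result, new result) for every cell that differs."""
    return [(NAMES[i], NAMES[o], show(old_decap(i, o)), show(new_decap(i, o)))
            for i, o in product(NAMES, repeat=2)
            if old_decap(i, o) != new_decap(i, o)]
```

   Under these assumptions, changed_cells() returns exactly two cells.
   With inner Not-ECT and outer ECT(0), drop becomes Not-ECT.  With
   inner ECT(0) and outer ECT(1), ECT(0) becomes ECT(1).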

879	   The rules for how a tunnel establishes whether the egress has full
880	   functionality ECN capabilities are an update to RFC3168.  For all the
881	   typical cases, RFC4301 is not updated by the ECN capability check in
882	   this specification, because a typical RFC4301 tunnel ingress will
883	   have already established that it is talking to an RFC4301 tunnel
884	   egress (e.g. if it uses IKEv2).  However, there may be some corner
885	   cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with
886	   an egress with limited functionality ECN handling.  Strictly, for
887	   such corner cases, the requirement to use compatibility mode in this
888	   specification updates RFC4301, but implementing it is unlikely to
889	   be necessary in practice.

891	   The optional ECN Tunnel field in the IPsec security association
892	   database (SAD) and the optional ECN Tunnel Security Association
893	   Attribute defined in RFC3168 are no longer needed.  The security
894	   association (SA) has no policy on ECN usage, because all RFC4301
895	   tunnels now support ECN without any policy choice.

897	   RFC3168 defines a (required) limited functionality mode and an
898	   (optional) full functionality mode for a tunnel, but RFC4301 doesn't
899	   need modes.  In this specification only the ingress might need two
900	   modes: a normal mode (required) and a compatibility mode (required in
901	   some scenarios, optional in others).  The egress needs only one mode
902	   which correctly handles any ingress ECN behaviour.

904	Additional changes to the RFC Index (to be removed by the RFC Editor):

906	   In the RFC index, RFC3168 should be identified as an update to
907	   RFC2003.  RFC4301 should be identified as an update to RFC3168.

909	   This specification updates RFC3168 and RFC4301.

911	7.  IANA Considerations

913	   This memo includes no request to IANA.

915	8.  Security Considerations

917	   Appendix A.1 discusses the security constraints imposed on ECN tunnel
918	   processing.  The new rules for ECN tunnel processing (Section 4)
919	   trade off security (covert channels) against congestion
920	   monitoring & control.  In fact, ensuring congestion markings are not
921	   lost is itself another aspect of security, because if we allowed
922	   congestion notification to be lost, any attempt to enforce a response
923	   to congestion would be much harder.

925	   If alternate congestion notification semantics are defined for a
926	   certain PHB (e.g. the pre-congestion notification architecture
927	   [I-D.ietf-pcn-architecture]), the scope of the alternate semantics
928	   might typically be bounded by the limits of a Diffserv region or
929	   regions, as envisaged in [RFC4774].  The inner headers in tunnels
930	   crossing the boundary of such a Diffserv region but ending within the
931	   region can potentially leak the external congestion notification
932	   semantics into the region, or leak the internal semantics out of the
933	   region.  [RFC2983] discusses the need for Diffserv traffic
934	   conditioning to be applied at these tunnel endpoints as if they are
935	   at the edge of the Diffserv region.  Similar concerns apply to any
936	   processing or propagation of the ECN field at the edges of a Diffserv
937	   region with alternate ECN semantics.  Such edge processing must also
938	   be applied at the endpoints of tunnels with one end inside and the
939	   other outside the domain.  [I-D.ietf-pcn-architecture] gives specific
940	   advice on this for the PCN case, but other definitions of alternate
941	   semantics will need to discuss the specific security implications in
942	   each case.

944	   With the decapsulation rules as they stood in RFC3168 and RFC4301, a
945	   small part of the protection of the ECN nonce [RFC3540] was
946	   compromised.  The new decapsulation rules do not solve this problem.

948	   The minor problem is as follows: The ECN nonce was defined to enable
949	   the data source to detect if a CE marking had been applied then
950	   subsequently removed.  The source could detect this by weaving a
951	   pseudo-random sequence of ECT(0) and ECT(1) values into a stream of
952	   packets, which is termed an ECN nonce.  By the decapsulation rules in
953	   RFC3168 and RFC4301, if the inner and outer headers carry
954	   contradictory ECT values only the inner header is preserved for
955	   onward forwarding.  So if a CE marking added to the outer ECN field
956	   in a tunnel has been illegally (or accidentally) suppressed by a
957	   subsequent node in the tunnel, the decapsulator will revert the ECN
958	   field to its value before tampering, hiding all evidence of the crime
959	   from the onward feedback loop.  We chose not to close this minor
960	   loophole for all the following reasons:

962	   1.  This loophole is only applicable in the corner case where the
963	       attacker controls a network node downstream of a congested node
964	       in the same tunnel;

966	   2.  In tunnelling scenarios, the ECN nonce is already vulnerable to
967	       suppression by nodes downstream of a congested node in the same
968	       tunnel, if they can copy the ECT value in the inner header to the
969	       outer header (any node in the tunnel can do this if the inner
970	       header is not encrypted, and an IPsec tunnel egress can do it
971	       whether or not the tunnel is encrypted);

973	   3.  Although the new decapsulation behaviour removes evidence of
974	       congestion suppression from the onward feedback loop, the
975	       decapsulator itself can at least detect that congestion within
976	       the tunnel has been suppressed;

978	   4.  The ECN nonce [RFC3540] currently has experimental status and
979	       there has been no evidence that anyone has implemented it beyond
980	       the author's prototype.

982	   We could have fixed this loophole by specifying that the outer header
983	   should always be propagated onwards if inner and outer are both ECT.
984	   Although this would close the minor loophole in the nonce, it would
985	   raise a minor safety issue if multilevel ECN or PCN were used.  A
986	   less severe marking in the inner header would override a more severe
987	   one in the outer.  Both are corner cases so it is difficult to decide
988	   which is more important:

990	   1.  The loophole in the nonce is only for a minor case of one tunnel
991	       node attacking another in the same tunnel;

993	   2.  The severity inversion for multilevel congestion notification
994	       would not result from any legal codepoint transition.

996	   We decided safety against misconfiguration was slightly more
997	   important than securing against an attack that has little, if any,
998	   clear motivation.

1000	   If a legacy security policy configures a legacy tunnel ingress to
1001	   negotiate to turn off ECN processing, a compliant tunnel egress will
1002	   agree to a request to turn off ECN processing but it will actually
1003	   still copy CE markings from the outer to the forwarded header.
1004	   Although the tunnel ingress 'I' in Figure 5 (Appendix A.1) will set
1005	   all ECN fields in outer headers to Not-ECT, 'M' could still toggle CE
1006	   on and off to communicate covertly with 'B', because we have
1007	   specified that 'E' only has one mode regardless of what mode it says
1008	   it has negotiated.  We could have specified that 'E' should have a
1009	   limited functionality mode and check for such behaviour.  But we
1010	   decided not to add the extra complexity of two modes on a compliant
1011	   tunnel egress merely to cater for a legacy security concern that is
1012	   now considered manageable.

1014	9.  Conclusions

1016	   This document updates the ingress tunnelling encapsulation of RFC3168
1017	   ECN for all IP in IP tunnels to bring it into line with the new
1018	   behaviour in the IPsec architecture of RFC4301.  It copies rather
1019	   than resets a congestion experienced (CE) marking when creating outer
1020	   headers.

1022	   It also specifies new rules that update both RFC3168 and RFC4301 for
1023	   calculating the outgoing ECN field on tunnel decapsulation.  The new
1024	   rules update egress behaviour for two specific combinations of inner
1025	   and outer header that have no current legal usage, but will now be
1026	   possible to use in future standards actions, rather than being wasted
1027	   by current tunnelling behaviour.

1029	   The new rules propagate changes to the ECN field across tunnel end-
1030	   points that were previously blocked due to a perceived covert channel
1031	   vulnerability.  The new IPsec architecture deems the two-bit covert
1032	   channel that the ECN field opens up is a manageable threat, so these
1033	   new rules bring all IP in IP tunnelling into line with this new more
1034	   permissive attitude.  The result is a single specification for all
1035	   future tunnelling of ECN, whether IPsec or not.  Then equipment can
1036	   be specified against a single ECN behaviour and ECN markings can have
1037	   a well-defined meaning wherever they are measured in a network.  This
1038	   new certainty will enable new uses of the ECN field that would
1039	   otherwise be confounded by ambiguity.

1041	   The immediate motivation for making these changes is to allow the
1042	   introduction of multi-level pre-congestion notification (PCN).  But
1043	   great care has been taken to ensure the resulting ECN tunnelling
1044	   behaviour is simple and generic for other potential future uses.

1046	   The change to encapsulation has been analysed from the three
1047	   perspectives of security, control and management.  They are somewhat
1048	   in tension as to whether a tunnel ingress should copy congestion
1049	   markings into the outer header it creates or reset them.  From the
1050	   control perspective either copying or resetting works for existing
1051	   arrangements, but copying has more potential for simplifying control
1052	   and resetting breaks at least one proposal already on the standards
1053	   track.  From the management and monitoring perspective copying is
1054	   preferable.  From the network security perspective (theft of service
1055	   etc) copying is preferable.  From the information security
1056	   perspective resetting is preferable, but the IETF Security Area now
1057	   considers copying acceptable given the bandwidth of a 2-bit covert
1058	   channel can be managed.  Therefore there are no points against
1059	   copying and a number against resetting CE on ingress.

1061	   The only downside of the changes to decapsulation is that the same
1062	   2-bit covert channel is opened up as at the ingress, but this is now
1063	   deemed to be a manageable threat.  The changes at decapsulation have
1064	   been found to be free of any backwards compatibility issues.

1066	10.  Acknowledgements

1068	   Thanks to Anil Agawaal for pointing out a case where it's safe for a
1069	   tunnel decapsulator to forward a combination of headers it doesn't
1070	   understand.  Thanks to David Black for explaining a better way to
1071	   think about function placement and to Louise Burness for a better way
1072	   to think about multilayer transports and networks, having read
1073	   [Patterns_Arch].  Also thanks to Arnaud Jacquet for the idea for
1074	   Appendix C.  Thanks to Michael Menth, Bruce Davie, Toby Moncaster,
1075	   Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for
1076	   their thoughts and careful review comments.

1078	   Bob Briscoe is partly funded by Trilogy, a research project (ICT-
1079	   216372) supported by the European Community under its Seventh
1080	   Framework Programme.  The views expressed here are those of the
1081	   author only.

1083	11.  Comments Solicited

1085	   Comments and questions are encouraged and very welcome.  They can be
1086	   addressed to the IETF Transport Area working group mailing list
1087	   <tsvwg@ietf.org>, and/or to the authors.

1089	12.  References

1091	12.1.  Normative References

1093	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
1094	              October 1996.

1096	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1097	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1099	   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
1100	              "Definition of the Differentiated Services Field (DS
1101	              Field) in the IPv4 and IPv6 Headers", RFC 2474,
1102	              December 1998.

1104	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1105	              of Explicit Congestion Notification (ECN) to IP",
1106	              RFC 3168, September 2001.

1108	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1109	              Internet Protocol", RFC 4301, December 2005.

1111	12.2.  Informative References

1113	   [I-D.briscoe-pcn-3-in-1-encoding]
1114	              Briscoe, B., "PCN 3-State Encoding Extension in a single
1115	              DSCP", draft-briscoe-pcn-3-in-1-encoding-00 (work in
1116	              progress), October 2008.

1118	   [I-D.charny-pcn-single-marking]
1119	              Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre-
1120	              Congestion Notification Using Single Marking for Admission
1121	              and Termination", draft-charny-pcn-single-marking-03
1122	              (work in progress), November 2007.

1124	   [I-D.ietf-pcn-architecture]
1125	              Eardley, P., "Pre-Congestion Notification (PCN)
1126	              Architecture", draft-ietf-pcn-architecture-10 (work in
1127	              progress), March 2009.

1129	   [I-D.ietf-pcn-baseline-encoding]
1130	              Moncaster, T., Briscoe, B., and M. Menth, "Baseline
1131	              Encoding and Transport of Pre-Congestion Information",
1132	              draft-ietf-pcn-baseline-encoding-02 (work in progress),
1133	              February 2009.

1135	   [I-D.ietf-pcn-marking-behaviour]
1136	              Eardley, P., "Marking behaviour of PCN-nodes",
1137	              draft-ietf-pcn-marking-behaviour-02 (work in progress),
1138	              March 2009.

1140	   [I-D.ietf-pwe3-congestion-frmwk]
1141	              Bryant, S., Davie, B., Martini, L., and E. Rosen,
1142	              "Pseudowire Congestion Control Framework",
1143	              draft-ietf-pwe3-congestion-frmwk-01 (work in progress),
1144	              May 2008.

1146	   [I-D.menth-pcn-psdm-encoding]
1147	              Menth, M., Babiarz, J., Moncaster, T., and B. Briscoe,
1148	              "PCN Encoding for Packet-Specific Dual Marking (PSDM)",
1149	              draft-menth-pcn-psdm-encoding-00 (work in progress),
1150	              July 2008.

1152	   [I-D.moncaster-pcn-3-state-encoding]
1153	              Moncaster, T., Briscoe, B., and M. Menth, "A three state
1154	              extended PCN encoding scheme",
1155	              draft-moncaster-pcn-3-state-encoding-01 (work in
1156	              progress), March 2009.

1158	   [I-D.satoh-pcn-st-marking]
1159	              Satoh, D., Maeda, Y., Phanachet, O., and H. Ueno, "Single
1160	              PCN Threshold Marking by using PCN baseline encoding for
1161	              both admission and termination controls",
1162	              draft-satoh-pcn-st-marking-01 (work in progress),
1163	              March 2009.

1165	   [IEEE802.1au]
1166	              IEEE, "IEEE Standard for Local and Metropolitan Area
1167	              Networks--Virtual Bridged Local Area Networks - Amendment
1168	              10: Congestion Notification", 2008,
1169	              <http://www.ieee802.org/1/pages/802.1au.html>.

1171	              (Work in Progress; Access Controlled link within page)

1173	   [ITU-T.I.371]
1174	              ITU-T, "Traffic Control and Congestion Control in B-ISDN",
1175	              ITU-T Rec. I.371 (03/04), March 2004.

1177	   [PCNcharter]
1178	              IETF, "Congestion and Pre-Congestion Notification (pcn)",
1179	              IETF working group charter, February 2007,
1180	              <http://www.ietf.org/html.charters/pcn-charter.html>.

1182	   [Patterns_Arch]
1183	              Day, J., "Patterns in Network Architecture: A Return to
1184	              Fundamentals", Prentice Hall, ISBN-13 9780132252423,
1185	              Jan 2008.

1187	   [RFC1254]  Mankin, A. and K. Ramakrishnan, "Gateway Congestion
1188	              Control Survey", RFC 1254, August 1991.

1190	   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
1191	              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
1192	              Functional Specification", RFC 2205, September 1997.

1194	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1195	              RFC 2983, October 2000.

1197	   [RFC3426]  Floyd, S., "General Architectural and Policy
1198	              Considerations", RFC 3426, November 2002.

1200	   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
1201	              Congestion Notification (ECN) Signaling with Nonces",
1202	              RFC 3540, June 2003.

1204	   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
1205	              RFC 4306, December 2005.

1207	   [RFC4423]  Moskowitz, R. and P. Nikander, "Host Identity Protocol
1208	              (HIP) Architecture", RFC 4423, May 2006.

1210	   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
1211	              Explicit Congestion Notification (ECN) Field", BCP 124,
1212	              RFC 4774, November 2006.

1214	   [RFC5129]  Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
1215	              Marking in MPLS", RFC 5129, January 2008.

1217	   [Shayman]  "Using ECN to Signal Congestion Within an MPLS Domain",
1218	              2000, <http://www.ee.umd.edu/~shayman/papers.d/
1219	              draft-shayman-mpls-ecn-00.txt>.

1221	              (Expired)

1223	Appendix A.  Design Constraints

1225	   Tunnel processing of a congestion notification field has to meet
1226	   congestion control and management needs without creating new
1227	   information security vulnerabilities (if information security is
1228	   required).  This appendix documents the analysis of the tradeoffs
1229	   between these factors that led to the new encapsulation rules in
1230	   Section 4.1.

1232	A.1.  Security Constraints

1234	   Information security can be assured by using various end-to-end
1235	   security solutions (including IPsec in transport mode [RFC4301]), but
1236	   a commonly used scenario involves the need to communicate between two
1237	   physically protected domains across the public Internet.  In this
1238	   case there are certain management advantages to using IPsec in tunnel
1239	   mode solely across the publicly accessible part of the path.  The
1240	   path followed by a packet then crosses security 'domains': the ones
1241	   protected by physical or other means before and after the tunnel, and
1242	   the one protected by an IPsec tunnel across the otherwise unprotected
1243	   domain.  We will use the scenario in Figure 5 where endpoints 'A' and
1244	   'B' communicate through a tunnel.  The tunnel ingress 'I' and egress
1245	   'E' are within physically protected edge domains, while the tunnel
1246	   spans an unprotected internetwork where there may be 'men in the
1247	   middle', 'M'.

1249	                physically       unprotected     physically
1250	            <-protected domain-><--domain--><-protected domain->
1251	            +------------------+            +------------------+
1252	            |                  |      M     |                  |
1253	            |    A-------->I=========>==========>E-------->B   |
1254	            |                  |            |                  |
1255	            +------------------+            +------------------+
1256	                                <----IPsec secured---->
1257	                                        tunnel

1259	                      Figure 5: IPsec Tunnel Scenario

1261	   IPsec encryption is typically used to prevent 'M' seeing messages
1262	   from 'A' to 'B'.  IPsec authentication is used to prevent 'M'
1263	   masquerading as the sender of messages from 'A' to 'B' or altering
1264	   their contents.  But 'I' can also use IPsec tunnel mode to allow 'A'
1265	   to communicate with 'B', but impose encryption to prevent 'A' leaking
1266	   information to 'M'.  Or 'E' can insist that 'I' uses tunnel mode
1267	   authentication to prevent 'M' communicating information to 'B'.
1268	   Mutable IP header fields such as the ECN field (as well as the TTL/
1269	   Hop Limit and DS fields) cannot be included in the cryptographic
1270	   calculations of IPsec.  Therefore, if 'I' copies these mutable fields
1271	   into the outer header that is exposed across the tunnel it will have
1272	   allowed a covert channel from 'A' to 'M' that bypasses its encryption
1273	   of the inner header.  And if 'E' copies these fields from the outer
1274	   header to the inner, even if it validates authentication from 'I', it
1275	   will have allowed a covert channel from 'M' to 'B'.

1277	   ECN at the IP layer is designed to carry information about congestion
1278	   from a congested resource towards downstream nodes.  Typically a
1279	   downstream transport might feed the information back somehow to the
1280	   point upstream of the congestion that can regulate the load on the
1281	   congested resource, but other actions are possible (see [RFC3168]
1282	   S.6).  In terms of the above unicast scenario, ECN is typically
1283	   intended to create an information channel from 'M' to 'B' (for 'B' to
1284	   feed back to 'A').  Therefore the goals of IPsec and ECN are mutually
1285	   incompatible.

1287	   With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
1288	   "controls are provided to manage the bandwidth of this [covert]
1289	   channel".  Using the ECN processing rules of RFC4301, the channel
1290	   bandwidth is two bits per datagram from 'A' to 'M' and one bit per
1291	   datagram from 'M' to 'B' (because 'E' limits the combinations of the
1292	   2-bit ECN field that it will copy).  In both cases the covert channel
1293	   bandwidth is further reduced by noise from any real congestion
1294	   marking.  RFC4301 therefore implies that these covert channels are
1295	   sufficiently limited to be considered a manageable threat.  However,
1296	   with respect to the larger (6b) DS field, the same section of RFC4301
1297	   says not copying is the default, but a configuration option can allow
1298	   copying "to allow a local administrator to decide whether the covert
1299	   channel provided by copying these bits outweighs the benefits of
1300	   copying".  Of course, an administrator considering copying of the DS
1301	   field has to take into account that it could be concatenated with the
1302	   ECN field giving an 8b per datagram covert channel.
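The channel widths quoted above can be checked by counting how many distinct values the receiving end of each covert channel can observe. The following Python sketch rests on several assumptions. It uses the RFC 3168 codepoint values and a simplified reading of the RFC 4301 rules: a full copy at ingress, and at egress an outer CE is written into the inner header only when the inner header is ECN-capable. The function names are hypothetical. Noise from genuine congestion marking is ignored, so the results are upper bounds.

```python
import math

# ECN codepoints (RFC 3168), as 2-bit integers.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
CODEPOINTS = (NOT_ECT, ECT1, ECT0, CE)


def ingress_copy(inner: int) -> int:
    """Tunnel ingress as assumed here: the outer ECN field copies the inner."""
    return inner


def egress_rfc4301(outer: int, inner: int) -> int:
    """Simplified RFC 4301 egress: only CE is propagated into the inner
    header, and only if the inner header is ECN-capable."""
    if outer == CE and inner != NOT_ECT:
        return CE
    return inner


def channel_bits(observed) -> float:
    """Noise-free upper bound on covert-channel bits per datagram:
    log2 of the number of distinct values the receiver can observe."""
    return math.log2(len(set(observed)))


# 'A' -> 'M': 'A' picks the inner field; 'M' reads the copied outer field.
a_to_m = channel_bits(ingress_copy(v) for v in CODEPOINTS)

# 'M' -> 'B': 'M' rewrites the outer field; 'B' reads the inner field after
# decapsulation.  Assumes 'A' sends an ECT(0) inner header.
m_to_b = channel_bits(egress_rfc4301(v, ECT0) for v in CODEPOINTS)

print(a_to_m, m_to_b)  # 2.0 1.0
```

The egress channel is narrower only because 'E' collapses the four outer codepoints into two outcomes (CE or unchanged) before 'B' can see them.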

1304	   Thus, for tunnelling the 6b Diffserv field two conceptual models have
1305	   had to be defined so that administrators can trade off security
1306	   against the needs of traffic conditioning [RFC2983]:

1308	   The uniform model:  where the Diffserv field is preserved end-to-end
1309	      by copying into the outer header on encapsulation and copying from
1310	      the outer header on decapsulation.

1312	   The pipe model:  where the outer header is independent of that in the
1313	      inner header so it hides the Diffserv field of the inner header
1314	      from any interaction with nodes along the tunnel.
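The difference between the two models can be sketched in Python as follows. This illustrates the behaviour described in [RFC2983] under simplifying assumptions and is not code taken from it. Headers carry only a DSCP, and the pipe-model ingress uses a single hypothetical tunnel_dscp. All names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Header:
    dscp: int  # 6-bit Diffserv codepoint


def encapsulate(inner: Header, model: str, tunnel_dscp: int = 0) -> Header:
    """Create the outer header at the tunnel ingress."""
    if model == "uniform":
        return Header(dscp=inner.dscp)   # inner codepoint exposed to the tunnel
    if model == "pipe":
        return Header(dscp=tunnel_dscp)  # chosen independently of the inner header
    raise ValueError(f"unknown model: {model}")


def decapsulate(outer: Header, inner: Header, model: str) -> Header:
    """Create the forwarded header at the tunnel egress."""
    if model == "uniform":
        return Header(dscp=outer.dscp)   # re-marking inside the tunnel survives
    if model == "pipe":
        return Header(dscp=inner.dscp)   # outer header is simply discarded
    raise ValueError(f"unknown model: {model}")


# A node inside the tunnel re-marks the outer DSCP to 0 (default).
results = {}
for model in ("uniform", "pipe"):
    inner = Header(dscp=46)
    outer = encapsulate(inner, model, tunnel_dscp=10)
    outer.dscp = 0
    results[model] = decapsulate(outer, inner, model).dscp

print(results)  # {'uniform': 0, 'pipe': 46}
```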

1316	   However, for ECN, the new IPsec security architecture in RFC4301 only
1317	   standardised one tunnelling model equivalent to the uniform model.
1318	   It deemed that simplicity was more important than allowing
1319	   administrators the option of a tiny increment in security, especially
1320	   given not copying congestion indications could seriously harm
1321	   everyone's network service.

1323	A.2.  Control Constraints

1325	   Congestion control requires that any congestion notification marked
1326	   into packets by a resource will be able to traverse a feedback loop
1327	   back to a function capable of controlling the load on that resource.
1328	   To be precise, rather than calling this function the data source, we
1329	   will call it the Load Regulator.  This will allow us to deal with
1330	   exceptional cases where load is not regulated by the data source, but
1331	   usually the two terms will be synonymous.  Note the term "a function
1332	   _capable of_ controlling the load" deliberately includes a source
1333	   application that doesn't actually control the load but ought to (e.g.
1334	   an application without congestion control that uses UDP).

1336	                 A--->R--->I=========>M=========>E-------->B

1338	                     Figure 6: Simple Tunnel Scenario

1340	   We now consider a similar tunnelling scenario to the IPsec one just
1341	   described, but without the different security domains so we can just
1342	   focus on ensuring the control loop and management monitoring can work
1343	   (Figure 6).  If we want resources in the tunnel to be able to
1344	   explicitly notify congestion and the feedback path is from 'B' to
1345	   'A', it will certainly be necessary for 'E' to copy any CE marking
1346	   from the outer header to the inner header for onward transmission to
1347	   'B', otherwise congestion notification from resources like 'M' cannot
1348	   be fed back to the Load Regulator ('A').  But it doesn't seem
1349	   necessary for 'I' to copy CE markings from the inner to the outer
1350	   header.  For instance, if resource 'R' is congested, it can send
1351	   congestion information to 'B' using the congestion field in the inner
1352	   header without 'I' copying the congestion field into the outer header
1353	   and 'E' copying it back to the inner header.  'E' can still write any
1354	   additional congestion marking introduced across the tunnel into the
1355	   congestion field of the inner header.

1357	   It might be useful for the tunnel egress to be able to tell whether
1358	   congestion occurred across a tunnel or upstream of it.  If outer
1359	   header congestion marking were reset by the tunnel ingress ('I'), at
1360	   the end of a tunnel ('E') the outer headers would indicate congestion
1361	   experienced across the tunnel ('I' to 'E'), while the inner header
1362	   would indicate congestion upstream of 'I'.  But similar information
1363	   can be gleaned even if the tunnel ingress copies the inner to the
1364	   outer headers.  At the end of the tunnel ('E'), any packet with an
1365	   _extra_ mark in the outer header relative to the inner header
1366	   indicates congestion across the tunnel ('I' to 'E'), while the inner
1367	   header would still indicate congestion upstream of 'I'.  Appendix C
1368	   gives a simple and precise method for a tunnel egress to infer the
1369	   congestion level introduced across a tunnel.
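The inference described above, which Appendix C quantifies, can be sketched in Python as follows. The sketch assumes that the ingress copies the ECN field and that the egress can read both the outer and the inner ECN fields of each packet before decapsulating it. The function names are hypothetical.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints


def classify(outer: int, inner: int) -> str:
    """Where a packet arriving at 'E' was marked, if the ingress copies ECN."""
    if inner == CE:
        return "upstream"   # already CE before 'I' (and copied into the outer)
    if outer == CE:
        return "tunnel"     # an extra mark added between 'I' and 'E'
    return "unmarked"


def congestion_levels(packets):
    """Return (upstream, across-tunnel) congestion levels for a non-empty
    iterable of (outer_ecn, inner_ecn) pairs observed at the egress."""
    counts = {"upstream": 0, "tunnel": 0, "unmarked": 0}
    for outer, inner in packets:
        counts[classify(outer, inner)] += 1
    total = sum(counts.values())
    upstream = counts["upstream"] / total
    # Packets already CE on arrival at 'I' cannot be marked again, so the
    # tunnel's level is measured only over packets that arrived unmarked.
    unmarked_on_arrival = total - counts["upstream"]
    across = counts["tunnel"] / unmarked_on_arrival if unmarked_on_arrival else 0.0
    return upstream, across


# The figures used in Appendix C: 30 packets CE in both headers,
# 12 CE in the outer header only, and 58 unmarked, out of 100.
sample = [(CE, CE)] * 30 + [(CE, ECT0)] * 12 + [(ECT0, ECT0)] * 58
upstream, across = congestion_levels(sample)
print(f"upstream {upstream:.0%}, across tunnel {across:.0%}")
# upstream 30%, across tunnel 17%
```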

1371	   All this shows that 'E' can preserve the control loop irrespective of
1372	   whether 'I' copies congestion notification into the outer header or
1373	   resets it.
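This conclusion can be checked with a small Python model of Figure 6. It is a sketch under stated assumptions. The hypothetical ingress either copies the ECN field or resets CE to ECT(0), as in RFC 3168 full-functionality mode. The egress applies only the rule described above, writing an outer CE into an ECN-capable inner header.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints


def ingress(inner: int, mode: str) -> int:
    """Outer ECN field set by 'I': copy it, or reset CE to ECT(0)."""
    if mode == "copy":
        return inner
    if mode == "reset":
        return ECT0 if inner == CE else inner
    raise ValueError(f"unknown mode: {mode}")


def egress(outer: int, inner: int) -> int:
    """'E' writes any CE in the outer header into an ECN-capable inner header."""
    if outer == CE and inner != NOT_ECT:
        return CE
    return inner


def delivered_to_b(arriving: int, mode: str, congested_at_m: bool) -> int:
    """ECN field that 'B' sees after one trip through the tunnel of Figure 6."""
    outer = ingress(arriving, mode)
    if congested_at_m and outer != NOT_ECT:
        outer = CE  # 'M' marks the outer header
    return egress(outer, arriving)


# Congestion at 'R' (arrives as CE) or at 'M' (marked inside the tunnel)
# reaches 'B', and hence the Load Regulator 'A', in either ingress mode.
for mode in ("copy", "reset"):
    print(mode,
          delivered_to_b(CE, mode, congested_at_m=False) == CE,
          delivered_to_b(ECT0, mode, congested_at_m=True) == CE)
# copy True True
# reset True True
```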

1375	   That is the situation for existing control arrangements but, because
1376	   copying reveals more information, it would open up possibilities for
1377	   better control system designs.  For instance, Appendix E describes
1378	   how resetting CE marking at a tunnel ingress confuses a proposed
1379	   congestion marking scheme on the standards track.  It ends up
1380	   removing excessive amounts of traffic unnecessarily.  In contrast,
1381	   copying CE markings at ingress leads to the correct control behaviour.
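The over-removal can be illustrated with a deliberately simplified fluid model in Python. This is not the PCN excess-rate marking algorithm. The model assumes one marking node upstream of the tunnel, one inside it, and hypothetical rates. In the reset case, it further assumes that the two nodes' marks fall on packets independently.

```python
def excess_marked(arrival_rate: float, already_marked: float,
                  supportable_rate: float) -> float:
    """Rate newly marked for removal by one node in this simplified model:
    traffic already marked is ignored, and only the unmarked traffic in
    excess of the node's supportable rate is marked."""
    unmarked = arrival_rate - already_marked
    return max(0.0, unmarked - supportable_rate)


RATE = 100.0               # offered load (hypothetical units)
S_UP, S_DOWN = 80.0, 70.0  # supportable rates upstream of and inside the tunnel

m_up = excess_marked(RATE, 0.0, S_UP)               # 20 marked upstream

# Ingress copies CE: the node in the tunnel sees the upstream marks.
m_down_copy = excess_marked(RATE, m_up, S_DOWN)     # 10 more
removed_copy = m_up + m_down_copy                   # 30

# Ingress resets CE: the node in the tunnel sees no marks and marks 30 itself.
# The egress then carries both sets of marks in the inner header.  Assuming
# the two sets fall on packets independently, their expected union is removed.
m_down_reset = excess_marked(RATE, 0.0, S_DOWN)     # 30
removed_reset = m_up + m_down_reset - m_up * m_down_reset / RATE  # 44

needed = RATE - min(S_UP, S_DOWN)                   # 30
print(needed, removed_copy, removed_reset)  # 30.0 30.0 44.0
```

With copying, exactly the excess is removed. With resetting, about 14 units of admissible traffic are removed needlessly in this example.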

1383	A.3.  Management Constraints

1385	   As well as control, there are also management constraints.
1386	   Specifically, a management system may monitor congestion markings in
1387	   passing packets, perhaps at the border between networks as part of a
1388	   service level agreement.  For instance, monitors at the borders of
1389	   autonomous systems may need to measure how much congestion has
1390	   accumulated since the original source, perhaps to determine between
1391	   them how much of the congestion is contributed by each domain.
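If monitors at successive borders all measure from the same Congestion Baseline, each domain's share can be derived from the growth in the marked fraction, following the same reasoning as the calculation in Appendix C. The following Python sketch assumes that the monitors observe the same aggregate of packets. The function name and readings are hypothetical.

```python
def per_domain_contribution(observed):
    """Congestion contributed between successive border monitors.

    observed: marked fractions measured, in path order, by monitors that all
    see the same traffic and share the same Congestion Baseline.  A packet
    already marked cannot be marked again, so each domain's contribution is
    measured over the traffic that reached it unmarked.
    """
    contributions = []
    previous = 0.0  # nothing is marked at the Congestion Baseline
    for p in observed:
        if previous >= 1.0:
            raise ValueError("all traffic already marked upstream")
        contributions.append((p - previous) / (1.0 - previous))
        previous = p
    return contributions


# Hypothetical readings: 30% marked at the first border, 42% at the second.
print([round(c, 3) for c in per_domain_contribution([0.30, 0.42])])
# [0.3, 0.171]
```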

1393	   Therefore, when monitoring the middle of a path, it should be
1394	   possible to establish how far back in the path congestion markings
1395	   have accumulated from.  In this document we term this the baseline of
1396	   congestion marking (or the Congestion Baseline), i.e. the source of
1397	   the layer that last reset (or created) the congestion notification
1398	   field.  Given some tunnels cross domain borders (e.g. consider M in
1399	   Figure 6 is monitoring a border), it would therefore be desirable for
1400	   'I' to copy congestion accumulated so far into the outer headers
1401	   exposed across the tunnel.

1403	   Appendix B discusses various scenarios where the Load Regulator
1404	   lies in-path, not at the source host as we would typically expect.
1405	   It concludes that a Congestion Baseline is determined by where the
1406	   Load Regulator function is, which should be identified in the
1407	   transport layer, not by addresses in network layer headers.  This
1408	   applies whether the Load Regulator is at the source host or within
1409	   the path.  The appendix also discusses where a Load Regulator
1410	   function should be located relative to a local tunnel encapsulation
1411	   function.

1413	Appendix B.  Relative Placement of Tunnelling and In-Path Load
1414	             Regulation

1416	B.1.  Identifiers and In-Path Load Regulators

1418	   The Load Regulator is the node to which congestion feedback should be
1419	   returned by the next downstream node with a transport layer feedback
1420	   function (typically but not always the data receiver).  The Load
1421	   Regulator is often, but not always, the data source.  It is not always
1422	   (or even typically) the same thing as the node identified by the
1423	   source address of the outermost exposed header.  In general the
1424	   addressing of the outermost encapsulation header says nothing about
1425	   the identifiers of either the upstream or the downstream transport
1426	   layer functions.  As long as the transport functions know each
1427	   other's addresses, they don't have to be identified in the network
1428	   layer or in any link layer.  It was only a convenience that a TCP
1429	   receiver assumed that the address of the source transport is the same
1430	   as the network layer source address of an IP packet it receives.

1432	   More generally, the return transport address for feedback could be
1433	   identified solely in the transport layer protocol.  For instance, a
1434	   signalling protocol like RSVP [RFC2205] breaks up a path into
1435	   transport layer hops and informs each hop of the address of its
1436	   transport layer neighbour without any need to identify these hops in
1437	   the network layer.  RSVP can be arranged so that these transport
1438	   layer hops are bigger than the underlying network layer hops.  The
1439	   host identity protocol (HIP) architecture [RFC4423] also supports the
1440	   same principled separation (for mobility amongst other things), where
1441	   the transport layer sender identifies its transport address for
1442	   feedback to be sent to, using an identifier provided by a shim below
1443	   the transport layer.

1445	   Keeping to this layering principle deliberately doesn't require a
1446	   network layer packet header to reveal the origin address from where
1447	   congestion notification accumulates (its Congestion Baseline).  It is
1448	   not necessary for the network and lower layers to know the address of
1449	   the Load Regulator.  Only the destination transport needs to know
1450	   that.  With forward congestion notification, the network and link
1451	   layers only notify congestion forwards; they aren't involved in
1452	   feeding it backwards.  If they are (e.g. backward congestion
1453	   notification (BCN) in Ethernet [IEEE802.1au] or EFCI in ATM
1454	   [ITU-T.I.371]), that should be considered as a transport function
1455	   added to the lower layer, which must sort out its own addressing.
1456	   Indeed, this is one reason why ICMP source quench is now deprecated
1457	   [RFC1254]; when congestion occurs within a tunnel it is complex
1458	   (particularly in the case of IPsec tunnels) to return the ICMP
1459	   messages beyond the tunnel ingress back to the Load Regulator.

1461	   Similarly, if a management system is monitoring congestion and needs
1462	   to know the Congestion Baseline, the management system has to find
1463	   this out from the transport; in general it cannot tell solely by
1464	   looking at the network or link layer headers.

1466	B.2.  Non-Dependence of Tunnelling on In-path Load Regulation

1468	   We have said that at any point in a network, the Congestion Baseline
1469	   (where congestion notification starts from zero) should be the
1470	   previous upstream Load Regulator.  We have also said that the ingress
1471	   of an IP in IP tunnel must copy congestion indications to the
1472	   encapsulating outer headers it creates.  If the Load Regulator is in-
1473	   path rather than at the source, and also a tunnel ingress, these two
1474	   requirements seem to be contradictory.  A tunnel ingress must not
1475	   reset incoming congestion, but a Load Regulator must be the
1476	   Congestion Baseline, implying it needs to reset incoming congestion.

1478	   In fact, the two requirements are not contradictory, because a Load
1479	   Regulator and a tunnel ingress are not the names of machines, but the
1480	   names of functions within a machine that typically occur in sequence
1481	   on a stream of packets, not at the same point.  Figure 7 is borrowed
1482	   from [RFC2983] (which was making a similar point about the location
1483	   of Diffserv traffic conditioning relative to the encapsulation
1484	   function of a tunnel).  An in-path Load Regulator can act on packets
1485	   either at [1 - Before] encapsulation or at [2 - Outer] after
1486	   encapsulation.  Load Regulation does not ever need to be integrated
1487	   with the [Encapsulate] function (but it can be for efficiency).
1488	   Therefore we can still mandate that the [Encapsulate] function always
1489	   copies CE into the outer header.

1491	     >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->>
1492	                                         \
1493	                                          \
1494	                                           +--------[2 - Outer]------->>

1496	     Figure 7: Placement of In-Path Load Regulator Relative to Tunnel
1497	                                  Ingress

1499	   Then separately, if there is a Load Regulator at location [2 -
1500	   Outer], it might reset CE to ECT(0), say.  Then the Congestion
1501	   Baseline for the lower layer (outer) will be [2 - Outer], while the
1502	   Congestion Baseline of the inner layer will be unchanged.  But how
1503	   encapsulation works has nothing to do with whether a Load Regulator
1504	   is present or where it is.

1506	   If on the other hand a Load Regulator resets CE at [1 - Before], the
1507	   Congestion Baseline of both the inner and outer headers will be [1 -
1508	   Before].  But again, encapsulation is independent of load regulation.
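The independence of the [Encapsulate] function from load regulation can be sketched in Python. The sketch assumes a Load Regulator that resets CE to ECT(0), as in the example above, and models only the [1 - Before] and [2 - Outer] placements of Figure 7. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints


@dataclass
class Packet:
    inner: int
    outer: Optional[int] = None  # None until encapsulated


def regulator_reset(ecn: int) -> int:
    """Load Regulator making itself the Congestion Baseline: CE -> ECT(0)."""
    return ECT0 if ecn == CE else ecn


def encapsulate(pkt: Packet) -> Packet:
    """The [Encapsulate] function: always copies ECN, never resets it."""
    return Packet(inner=pkt.inner, outer=pkt.inner)


def ingress_node(pkt: Packet, regulator_at: Optional[str]) -> Packet:
    """A node combining tunnel ingress with an optional Load Regulator at
    [1 - Before] ("before") or [2 - Outer] ("outer")."""
    if regulator_at == "before":
        pkt = Packet(inner=regulator_reset(pkt.inner))
    pkt = encapsulate(pkt)
    if regulator_at == "outer":
        pkt.outer = regulator_reset(pkt.outer)
    return pkt


# A packet that experienced congestion upstream of the node arrives as CE.
results = {}
for where in (None, "before", "outer"):
    out = ingress_node(Packet(inner=CE), where)
    results[where] = (out.inner == CE, out.outer == CE)
print(results)
# {None: (True, True), 'before': (False, False), 'outer': (True, False)}
```

The encapsulate function is identical in all three cases. Only the placement of the separate regulator function changes which Congestion Baseline the inner and outer headers expose.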

1510	B.3.  Dependence of In-Path Load Regulation on Tunnelling

1512	   Although encapsulation doesn't need to depend on in-path load
1513	   regulation, the reverse is not true.  The placement of an in-path
1514	   Load Regulator must be carefully considered relative to
1515	   encapsulation.  Some examples are given in the following for
1516	   guidance.

1518	   In the traditional Internet architecture one tends to think of the
1519	   source host as the Load Regulator for a path.  It is generally not
1520	   desirable or practical for a node part way along the path to regulate
1521	   the load.  However, various reasonable proposals for in-path load
1522	   regulation have been made from time to time (e.g. fair queuing,
1523	   traffic engineering, flow admission control).  The IETF has recently
1524	   chartered a working group to standardise admission control across a
1525	   part of a path using pre-congestion notification (PCN) [PCNcharter].
1526	   This is of particular relevance here because it involves congestion
1527	   notification with an in-path Load Regulator, it can involve
1528	   tunnelling and it certainly involves encapsulation more generally.

1530	   We will use the more complex scenario in Figure 8 to tease out all
1531	   the issues that arise when combining congestion notification and
1532	   tunnelling with various possible in-path load regulation schemes.  In
1533	   this case 'I1' and 'E2' break up the path into three separate
1534	   congestion control loops.  The feedback for these loops is shown
1535	   going right to left across the top of the figure.  The 'V's are arrow
1536	   heads representing the direction of feedback, not letters.  But there
1537	   are also two tunnels within the middle control loop: 'I1' to 'E1' and
1538	   'I2' to 'E2'.  The two tunnels might be VPNs, perhaps over two MPLS
1539	   core networks.  M is a congestion monitoring point, perhaps between
1540	   two border routers where the same tunnel continues unbroken across
1541	   the border.

1542	        ______     _______________________________________      _____
1543	       /      \   /                                        \   /     \
1544	      V        \ V                                M         \ V       \
1545	      A--->R--->I1===========>E1----->I2=========>==========>E2------->B

1547	                     Figure 8: Complex Tunnel Scenario

1549	   The question is, should the congestion markings in the outer exposed
1550	   headers of a tunnel represent congestion only since the tunnel
1551	   ingress or over the whole upstream path from the source of the inner
1552	   header (whatever that may mean)?  Or put another way, should 'I1' and
1553	   'I2' copy or reset CE markings?

1554	   Based on the design principles in Section 4.3, the answer is that the
1555	   Congestion Baseline should be the nearest upstream interface designed
1556	   to regulate traffic load--the Load Regulator.  In Figure 8 'A', 'I1'
1557	   or 'E2' are all Load Regulators.  We have shown the feedback loops
1558	   returning to each of these nodes so that they can regulate the load
1559	   causing the congestion notification.  So the Congestion Baseline
1560	   exposed to M should be 'I1' (the Load Regulator), not 'I2'.
1561	   Therefore 'I1' should reset any arriving CE markings.  In this case,
1562	   'I1' knows the tunnel to 'E1' is unrelated to its load regulation
1563	   function.  So the load regulation function within 'I1' should be
1564	   placed at [1 - Before] tunnel encapsulation within 'I1' (using the
1565	   terminology of Figure 7).  Then the Congestion Baseline all across
1566	   the networks from 'I1' to 'E2' in both inner and outer headers will
1567	   be 'I1'.
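The rule that the Congestion Baseline is the nearest upstream Load Regulator can be expressed as a lookup along the path. The following Python sketch assumes the node roles shown in Figure 8. The function name is hypothetical.

```python
# Figure 8, left to right; True marks nodes with a Load Regulator function.
PATH = [("A", True), ("R", False), ("I1", True), ("E1", False),
        ("I2", False), ("M", False), ("E2", True), ("B", False)]


def congestion_baseline(path, observer: str) -> str:
    """Nearest Load Regulator upstream of the observer.  A node counts only
    by virtue of a Load Regulator function; a tunnel ingress alone (such as
    'I2') never becomes a baseline, because encapsulation copies CE."""
    names = [name for name, _ in path]
    for name, is_regulator in reversed(path[:names.index(observer)]):
        if is_regulator:
            return name
    raise ValueError(f"no Load Regulator upstream of {observer}")


print(congestion_baseline(PATH, "M"))   # I1
print(congestion_baseline(PATH, "B"))   # E2
```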

1569	   The following further examples illustrate how this answer might be
1570	   applied:

1572	   o  We argued in Appendix E that resetting CE on encapsulation could
1573	      harm PCN excess rate marking, which marks excess traffic for
1574	      removal in subsequent round trips.  This marking relies on not
1575	      marking packets if another node upstream has already marked them
1576	      for removal.  If there were a tunnel ingress between the two which
1577	      reset CE markings, it would confuse the downstream node into
1578	      marking far too much traffic for removal.  So why do we say that
1579	      'I1' should reset CE, while a tunnel ingress shouldn't?  The
1580	      answer is that it is the Load Regulator function at 'I1' that is
1581	      resetting CE, not the tunnel encapsulator.  The Load Regulator
1582	      needs to set itself as the Congestion Baseline, so the feedback it
1583	      gets will only be about congestion on links it can relieve itself
1584	      (by regulating the load into them).  When it resets CE markings,
1585	      it knows that something else upstream will have dealt with the
1586	      congestion notifications it removes, given it is part of an end-
1587	      to-end admission control signalling loop.  It therefore knows that
1588	      previous hops will be covered by other Load Regulators.
1589	      Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should
1590	      follow the new rule for any tunnel ingress and copy congestion
1591	      marking into the outer tunnel header.  The ingress at 'I1' will
1592	      happen to copy headers that have already been reset just
1593	      beforehand.  But it doesn't need to know that.

1595	   o  [Shayman] suggested feedback of ECN accumulated across an MPLS
1596	      domain could cause the ingress to trigger re-routing to mitigate
1597	      congestion.  This case is more like the simple scenario of
1598	      Figure 6, with a feedback loop across the MPLS domain ('E' back to
1599	      'I').  'I' is a Load Regulator because re-routing around congestion
1600	      is a load regulation function.  But in this case 'I' should only
1601	      reset itself as the Congestion Baseline in outer headers, as it is
1602	      not handling congestion outside its domain, so it must preserve
1603	      the end-to-end congestion feedback loop for something else to
1604	      handle (probably the data source).  Therefore the Load Regulator
1605	      within 'I' should be placed at [2 - Outer] to reset CE markings
1606	      just after the tunnel ingress has copied them from arriving
1607	      headers.  Again, the tunnel encapsulation function at 'I' simply
1608	      copies incoming headers, unaware that the load regulator will
1609	      subsequently reset its outer headers.

1611	   o  The PWE3 working group of the IETF is considering the problem of
1612	      how and whether an aggregate edge-to-edge pseudo-wire emulation
1613	      should respond to congestion [I-D.ietf-pwe3-congestion-frmwk].
1614	      Although the study is still at the requirements stage, some
1615	      (controversial) solution proposals include in-path load regulation
1616	      at the ingress to the tunnel that could lead to tunnel
1617	      arrangements with similar complexity to that of Figure 8.

1619	   These are not contrived scenarios--they could be a lot worse.  For
1620	   instance, a host may create a tunnel for IPsec which is placed inside
1621	   a tunnel for Mobile IP over a remote part of its path.  And around
1622	   all this we may have MPLS labels being pushed and popped as packets
1623	   pass across different core networks.  Similarly, it is possible that
1624	   subnets could be built from link technology (e.g. future Ethernet
1625	   switches) so that link headers being added and removed could involve
1626	   congestion notification in future Ethernet link headers with all the
1627	   same issues as with IP in IP tunnels.

   One reason we introduced the concept of a Load Regulator was to allow
   for in-path load regulation.  In the traditional Internet
   architecture one tends to think of a host and a Load Regulator as
   synonymous, but once tunnelling is considered, even the definition of
   a host is too fuzzy, whereas a Load Regulator is a clearly defined
   function.  Similarly, the concept of an innermost header is too fuzzy
   to support the (incorrect) rule that the source address of the
   innermost header should be the Congestion Baseline.  Which is the
   innermost header when multiple encapsulations may be in use?  Where
   do we stop?  If we say the original source in the above IPsec-Mobile
   IP case is the host, how do we know it isn't tunnelling an encrypted
   packet stream on behalf of another host in a peer-to-peer network?

   We have become used to thinking that only hosts regulate load.  The
   end-to-end design principle advises that this is a good idea
   [RFC3426], but it also advises that it is solely a guiding principle
   intended to make the designer think very carefully before breaking
   it.  There are proposals that place load regulation functions within
   a network path for good, if sometimes controversial, reasons, e.g.
   PCN edge admission control gateways [I-D.ietf-pcn-architecture] or
   traffic engineering functions at domain borders that re-route around
   congestion [Shayman].  Whether or not we want in-path load
   regulation, we have to work around the fact that it will not go away.

Appendix C.  Contribution to Congestion across a Tunnel

   This specification mandates that a tunnel ingress determine the ECN
   field of each new outer tunnel header by copying the arriving header.
   Concern has been expressed that this will make it difficult for the
   tunnel egress to monitor congestion introduced only along a tunnel,
   which is easy if the outer ECN field is reset at a tunnel ingress
   (RFC3168 full functionality mode).  In fact, copying CE marks at the
   ingress still makes it easy for the egress to measure congestion
   introduced across a tunnel, as illustrated below.

   Consider 100 packets measured at the egress.  Of these, 30 are CE
   marked in both the inner and outer headers, and 12 have CE marks in
   the outer header but not the inner.  This means packets arriving at
   the ingress had already experienced 30% congestion.  However, it does
   not mean there was 12% congestion across the tunnel, because the
   tunnel could only add marks to the 70 packets that were not already
   marked.  The correct calculation of congestion across the tunnel is
   p_t = 12/(100-30) = 12/70 = 17% (approximately).  This is easy for
   the egress to measure: it is the number of packets with additional
   CE marking in the outer header (12) as a proportion of packets not
   marked in the inner header (70).
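   The egress calculation above can be sketched in Python.  The function
   name `tunnel_congestion` and the representation of each observed
   packet as an `(inner_ce, outer_ce)` pair of booleans are illustrative
   assumptions, not part of this specification.

```python
from typing import Iterable, Optional, Tuple


def tunnel_congestion(samples: Iterable[Tuple[bool, bool]]) -> Optional[float]:
    """Estimate congestion introduced across a tunnel (p_t).

    Each sample is (inner_ce, outer_ce) for one packet seen at the egress.
    Because the ingress copies CE into the outer header, a packet marked
    before the ingress is CE in both headers and is excluded here.
    """
    not_marked_before_ingress = 0
    marked_in_tunnel = 0
    for inner_ce, outer_ce in samples:
        if inner_ce:
            continue  # congestion already experienced before the ingress
        not_marked_before_ingress += 1
        if outer_ce:
            marked_in_tunnel += 1  # CE added along the tunnel
    if not_marked_before_ingress == 0:
        return None  # no unmarked packets from which to estimate p_t
    return marked_in_tunnel / not_marked_before_ingress


# The example above: 30 marked before the ingress, 12 marked only in the
# tunnel, and the remaining 58 not marked at all.
samples = [(True, True)] * 30 + [(False, True)] * 12 + [(False, False)] * 58
p_t = tunnel_congestion(samples)
print(f"{p_t:.1%}")  # 17.1%
```

   Dividing by all 100 packets instead (12/100) would understate the
   congestion introduced by the tunnel, because the 30 packets already
   marked could not have been marked again.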

   Figure 9 illustrates this in a combinatorial probability diagram.
   The square represents 100 packets.  The 30% division along the bottom
   represents marking before the ingress, and the p_t division up the
   side represents marking along the tunnel.

     +-----+---------+100%
     |     |         |
     | 30  |         |
     |     |         |       The large square
     |     +---------+p_t    represents 100 packets
     |     |   12    |
     +-----+---------+0
     0    30%       100%
     inner header marking

       Figure 9: Tunnel Marking of Packets Already Marked at Ingress
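   Reading areas off Figure 9 gives a general relation.  Writing p_i for
   the proportion of packets already CE marked before the ingress and
   p_o for the proportion CE marked in the outer header at the egress
   (both symbols are introduced here for illustration), the tunnel can
   only mark the unmarked fraction (1 - p_i) of packets:

```latex
p_o = p_i + (1 - p_i)\,p_t
\quad\Longrightarrow\quad
p_t = \frac{p_o - p_i}{1 - p_i}
    = \frac{0.42 - 0.30}{1 - 0.30}
    = \frac{12}{70} \approx 0.17
```

   Here p_o = (30 + 12)/100 = 0.42, so an egress that only tracks the
   aggregate marking proportions of the inner and outer headers can
   still recover p_t.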

Appendix D.  Why Not Propagating ECT(1) on Decapsulation Impedes PCN

   Multi-level congestion notification is currently on the IETF's
   standards track agenda in the Congestion and Pre-Congestion
   Notification (PCN) working group.  The PCN working group will
   eventually require three congestion states (not marked and two
   increasingly severe levels of congestion marking)
   [I-D.ietf-pcn-architecture].  The aim is for the less severe level of
   marking to stop admitting new traffic and the more severe level to
   terminate enough existing flows to bring a network back to its
   operating point after a serious failure.

   Although the ECN field provides enough codepoints for these three
   states, current ECN tunnelling RFCs prevent the PCN working group
   from using three ECN states in case any tunnel decapsulations occur
   within a PCN region (see Appendix A of
   [I-D.ietf-pcn-baseline-encoding]).  If a node in a tunnel sets the
   outer ECN field to ECT(0) or ECT(1), this change will be discarded by
   a tunnel egress compliant with RFC4301 or RFC3168.  This can be seen
   in Figure 2 (Section 3.2), where ECT values in the outer header are
   ignored unless the inner header is the same.  Effectively one ECT
   codepoint is wasted; the ECT(0) and ECT(1) codepoints have to be
   treated as a single codepoint when they could otherwise have been
   used for their intended purpose of congestion notification.
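   The loss of an ECT(1) re-marking at a legacy egress, and the change
   implied by the title of this appendix, can be illustrated with a
   short Python sketch.  It covers only ECN-capable inner headers
   (RFC3168 and RFC4301 treat a Not-ECT inner header differently, so
   that case is omitted).  The functions `decap_legacy` and
   `decap_proposed` are hypothetical names, and the proposed rule is
   assumed here to differ from legacy behaviour only by propagating an
   outer ECT(1) onto an inner ECT(0); the normative rules are those in
   the body of this specification.

```python
from enum import Enum


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field (RFC 3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def decap_legacy(inner: ECN, outer: ECN) -> ECN:
    """RFC3168 / RFC4301 style decapsulation for an ECN-capable inner
    header: only CE in the outer header survives; an outer ECT(0) or
    ECT(1) is discarded."""
    if inner is ECN.NOT_ECT:
        raise ValueError("this sketch covers ECN-capable inner headers only")
    if outer is ECN.CE:
        return ECN.CE
    return inner


def decap_proposed(inner: ECN, outer: ECN) -> ECN:
    """Hypothetical sketch of the new rule: additionally propagate ECT(1)
    from the outer header when the inner header is ECT(0)."""
    if inner is ECN.ECT0 and outer is ECN.ECT1:
        return ECN.ECT1
    return decap_legacy(inner, outer)


# A node inside the tunnel re-marks the outer header from ECT(0) to ECT(1).
print(decap_legacy(ECN.ECT0, ECN.ECT1))    # ECN.ECT0 (the re-marking is lost)
print(decap_proposed(ECN.ECT0, ECN.ECT1))  # ECN.ECT1 (the re-marking survives)
```

   With the legacy function, any distinction between ECT(0) and ECT(1)
   made inside the tunnel disappears at the egress, which is why the two
   codepoints currently have to be treated as one.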

   As a consequence, the PCN working group has initially confined itself
   to two encoding states as a baseline encoding
   [I-D.ietf-pcn-baseline-encoding].  It has also had to propose an
   experimental extension using extra Diffserv codepoint(s) to encode
   the extra states [I-D.moncaster-pcn-3-state-encoding], using up the
   rapidly dwindling DSCP space while leaving ECN codepoints unused.
   Another PCN encoding has been proposed that would survive tunnelling
   without an extra DSCP [I-D.menth-pcn-psdm-encoding], but it requires
   the PCN edge gateways to somehow share state so the egress can
   determine which marking a packet started with at the ingress.  Also,
   with this encoding a PCN ingress node can game the system by
   initiating packets with inappropriate markings.  Yet another
   workaround to the ECN tunnelling problem proposes a more involved
   marking algorithm in the forwarding plane to encode the three
   congestion notification states using only two ECN codepoints
   [I-D.satoh-pcn-st-marking].  Still another proposal compromises the
   precision of the admission control mechanism, but manages to work
   with just two encoding states and a single marking algorithm
   [I-D.charny-pcn-single-marking].

   Rather than require the IETF to bless any of these workarounds, this
   specification fixes the root cause of the problem, so that operators
   deploying PCN can simply require tunnel endpoints within a PCN region
   to comply with this new ECN tunnelling specification.  PCN can then
   use the trivially simple experimental 3-state ECN encoding defined in
   [I-D.briscoe-pcn-3-in-1-encoding].

D.1.  Alternative Ways to Introduce the New Decapsulation Rules

   The new decapsulation rules could be introduced in several ways:

   o  They could be specified in the present standards track proposal
      (preferred) or in an experimental extension;

   o  They could be specified as a new default for all Diffserv PHBs
      (preferred) or as an option to be configured only for Diffserv
      PHBs requiring them (e.g. PCN).

   The argument for making this change now, rather than in a separate
   experimental extension, is to avoid the burden of an extra standard
   that implementations would have to comply with and remain backwards
   compatible with, so that we do not add to the already complex history
   of ECN tunnelling RFCs.  The argument for a separate experimental
   extension is that we may never need this change (if PCN is never
   successfully deployed and if no-one ever needs three ECN or PCN
   encoding states rather than two).  However, the change does no harm
   to existing mechanisms and stops tunnels wasting one of the four
   codepoints of the 2-bit ECN field.

   The argument for making this new decapsulation behaviour the default
   for all PHBs is that it does not change any behaviour that existing
   mechanisms already rely on.  Also, by ending the present waste of a
   codepoint, it would allow a future use of that codepoint to be
   proposed for all PHBs, even if PCN is not successfully deployed.

   In practice, if these new decapsulation rules are specified
   straightaway as the normative default for all PHBs, a network
   operator deploying 3-state PCN would be able to request that tunnels
   comply with the latest specification.  Implementers of non-PCN
   tunnels would not need to comply but, if they did, their code would
   be future-proofed and no harm would be done to legacy operations.
   Therefore, rather than branching their code base, it would be easiest
   for implementers to make all their new tunnel code comply with this
   specification, whether or not it was for PCN.  But they could leave
   old code untouched, unless it was for PCN.

   The alternatives are worse.  Implementers would otherwise have to
   provide configurable decapsulation options, and operators would have
   to configure all IPsec and IP in IP tunnel endpoints for the
   exceptional behaviour of certain PHBs.  The rules for tunnel
   endpoints to handle both the Diffserv field and the ECN field should
   'just work' when handling packets with any Diffserv codepoint.

Appendix E.  Why Resetting CE on Encapsulation Impedes PCN

   Regarding encapsulation, the section of the PCN architecture
   [I-D.ietf-pcn-architecture] on tunnelling says that header copying
   (RFC4301) allows PCN to work correctly, whereas resetting CE markings
   confuses PCN marking.

   The specific issue here concerns PCN excess rate marking
   [I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic
   that exceeds a configured threshold rate.  One of the goals of excess
   rate marking is to enable the speedy removal of excess
   admission-controlled traffic following re-routes caused by link
   failures or other disasters.  This preserves a share of the capacity
   for traffic in lower priority classes.  After failures, traffic
   re-routed onto the remaining links can often stress multiple links
   along a path.  Therefore, traffic can arrive at a link under stress
   with some proportion already marked for removal by a previous link.
   By design, marked traffic will be removed by the overall system in
   subsequent round trips.  So when the excess rate marking algorithm
   decides how much traffic to mark for removal, it does not include
   traffic already marked for removal by another node upstream (the
   `Excess traffic meter function' of [I-D.ietf-pcn-marking-behaviour]).

   However, if an RFC3168 tunnel ingress intervenes, it resets the ECN
   field in all the outer headers, hiding all the evidence of problems
   upstream.  A marker within the tunnel then meters the already-marked
   packets as if they were unmarked and so marks additional traffic.
   Thus, although excess rate marking works fine with RFC4301 IPsec
   tunnels, with RFC3168 tunnels it typically removes large volumes of
   traffic that it did not need to remove at all.
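   The effect can be illustrated with a deliberately simplified Python
   sketch.  It assumes a token-bucket excess-rate marker that skips
   packets already marked upstream; the class `ExcessRateMarker`, the
   function `packets_removed`, their parameters and the packet model are
   all hypothetical and are not the algorithm specified in
   [I-D.ietf-pcn-marking-behaviour].  Six 1000-byte packets reach a
   marker inside the tunnel whose bucket only has room for three, and
   the first three were already marked for removal upstream.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size: int      # bytes
    marked: bool   # marked for removal, as seen by the marker


class ExcessRateMarker:
    """Hypothetical token-bucket excess-rate marker.

    Packets already marked upstream are not metered, so they consume no
    tokens; unmarked packets that find too few tokens are marked.
    """

    def __init__(self, rate_bytes_per_s: float, bucket_bytes: float):
        self.rate = rate_bytes_per_s
        self.depth = bucket_bytes
        self.tokens = bucket_bytes
        self.last = 0.0

    def process(self, pkt: Packet, now: float) -> None:
        # Refill the bucket for the time elapsed since the last packet.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if pkt.marked:
            return  # already marked upstream: excluded from metering
        if self.tokens >= pkt.size:
            self.tokens -= pkt.size
        else:
            pkt.marked = True  # excess traffic: mark for removal


def packets_removed(upstream_marks, reset_at_ingress: bool) -> int:
    """Count packets that leave the tunnel egress marked for removal.

    reset_at_ingress=True models an RFC3168 ingress that resets the outer
    ECN field; False models RFC4301 header copying.
    """
    marker = ExcessRateMarker(rate_bytes_per_s=0.0, bucket_bytes=3000)
    removed = 0
    for inner_marked in upstream_marks:
        outer_marked = False if reset_at_ingress else inner_marked
        outer = Packet(size=1000, marked=outer_marked)
        marker.process(outer, now=0.0)  # the marker sits inside the tunnel
        # Simplification: at decapsulation a mark in either header survives.
        if outer.marked or inner_marked:
            removed += 1
    return removed


upstream = [True, True, True, False, False, False]
print(packets_removed(upstream, reset_at_ingress=False))  # 3
print(packets_removed(upstream, reset_at_ingress=True))   # 6
```

   With header copying, only the three packets marked upstream are
   removed.  With the outer ECN field reset, the marker spends its
   tokens on those same three packets and then marks the other three as
   well, doubling the traffic removed in this example.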

Author's Address

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/