idnits 2.17.1 

draft-davie-ecn-mpls-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 903.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 880.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 887.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 893.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 18, 2006) is 6522 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2475' is defined on line 759, but no explicit
     reference was found in the text

  == Unused Reference: 'I-D.briscoe-tsvwg-re-ecn-border-cheat' is defined on
     line 800, but no explicit reference was found in the text

  == Unused Reference: 'I-D.ietf-nsis-rmd' is defined on line 815, but no
     explicit reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 2475

  ** Downref: Normative reference to an Informational RFC: RFC 3260

  == Outdated reference: A later version (-04) exists of
     draft-briscoe-tsvwg-cl-architecture-02

  == Outdated reference: A later version (-03) exists of
     draft-briscoe-tsvwg-cl-phb-01

  == Outdated reference: A later version (-01) exists of
     draft-briscoe-tsvwg-re-ecn-border-cheat-00

  == Outdated reference: A later version (-09) exists of
     draft-briscoe-tsvwg-re-ecn-tcp-01

  == Outdated reference: A later version (-20) exists of
     draft-ietf-nsis-rmd-06

  == Outdated reference: A later version (-01) exists of
     draft-lefaucheur-rsvp-ecn-00


     Summary: 5 errors (**), 0 flaws (~~), 11 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           B. Davie
3	Internet-Draft                                       Cisco Systems, Inc.
4	Expires: December 20, 2006                                    B. Briscoe
5	                                                                  J. Tay
6	                                                             BT Research
7	                                                           June 18, 2006

9	                  Explicit Congestion Marking in MPLS
10	                      draft-davie-ecn-mpls-00.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on December 20, 2006.

37	Copyright Notice

39	   Copyright (C) The Internet Society (2006).

41	Abstract

43	   RFC 3270 defines how to support the Diffserv architecture in MPLS
44	   networks, including how to encode Diffserv Code Points (DSCPs) in an
45	   MPLS header.  DSCPs may be encoded in the EXP field, while other uses
46	   of that field are not precluded.  RFC 3270 makes no statement about
47	   how Explicit Congestion Notification (ECN) marking might be encoded
48	   in the MPLS header.  This draft defines how an operator might define
49	   some of the EXP codepoints for explicit congestion notification,
50	   without precluding other uses.

52	Requirements Language

54	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
55	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
56	   document are to be interpreted as described in RFC 2119 [RFC2119].

58	Table of Contents

60	   1.  Introduction
61	     1.1.  Background
62	     1.2.  Intent
63	     1.3.  Terminology
64	   2.  Use of MPLS EXP Field for ECN
65	   3.  Per-domain ECT checking
66	   4.  ECN-enabled MPLS domain
67	     4.1.  Pushing (adding) one or more labels to an IP packet
68	     4.2.  Pushing one or more labels onto an MPLS labelled packet
69	     4.3.  Congestion experienced in an interior MPLS node
70	     4.4.  Crossing a Diffserv Domain Boundary
71	     4.5.  Popping an MPLS label (not the end of the stack)
72	     4.6.  Popping the last MPLS label in the stack
73	     4.7.  Diffserv Tunneling Models
74	     4.8.  Extension to Pre-Congestion Notification
75	       4.8.1.  Label Push onto IP packet
76	       4.8.2.  Pushing Additional MPLS Labels
77	       4.8.3.  Admission Control or Pre-emption Marking inside
78	               MPLS domain
79	       4.8.4.  Popping an MPLS Label (not end of stack)
80	       4.8.5.  Popping the last MPLS Label to expose IP header
81	   5.  ECN-disabled MPLS domain
82	   6.  The use of more codepoints with E-LSPs and L-LSPs
83	   7.  Relationship to tunnel behavior in RFC 3168
84	     7.1.  Alternative approach to support ECN in an MPLS domain
85	   8.  Example Uses
86	     8.1.  RFC3168-style ECN
87	     8.2.  ECN Co-existence with Diffserv E-LSPs
88	     8.3.  Congestion-feedback-based Traffic Engineering
89	     8.4.  PCN flow admission control and flow pre-emption
90	   9.  Deployment Considerations
91	     9.1.  Marking non-ECN Capable Packets
92	     9.2.  Non-ECN capable routers in an MPLS Domain
93	   10. IANA Considerations
94	   11. Security Considerations
95	   12. Acknowledgements
96	   13. References
97	     13.1. Normative References
98	     13.2. Informative References
99	   Authors' Addresses
100	   Intellectual Property and Copyright Statements

102	1.  Introduction

104	1.1.  Background

106	   [RFC3270] defines how to support the Diffserv architecture in MPLS
107	   networks, including how to encode Diffserv Code Points (DSCPs) in an
108	   MPLS header.  DSCPs may be encoded in the EXP field, while other uses
109	   of that field are not precluded.  RFC 3270 makes no statement about
110	   how Explicit Congestion Notification (ECN) marking might be encoded
111	   in the MPLS header.  This draft defines how an operator might define
112	   some of the EXP codepoints for explicit congestion notification,
113	   without precluding other uses.  In parallel to the activity defining
114	   the addition of ECN to IP [RFC3168], two proposals were made to add
115	   ECN to MPLS [Floyd] [Shayman].  These proposals, however, fell by the
116	   wayside.  With ECN for IP now a proposed standard, and with developing
117	   interest in using pre-congestion notification (PCN) for admission
118	   control and flow pre-emption [I-D.briscoe-tsvwg-cl-architecture],
119	   there is consequent interest in being able to support ECN across IP
120	   networks consisting of MPLS-enabled domains.  Therefore, it is
121	   necessary to specify the protocol for including ECN or PCN in the MPLS
122	   shim header, and the protocol behaviour of edge MPLS nodes.

124	   We note that in [RFC3168] there are four codepoints used for ECN
125	   marking, which are encoded using two bits of the IP header.  The MPLS
126	   EXP field is the logical place to encode ECN codepoints, but with
127	   only 3 bits (8 codepoints) available, and with the same field being
128	   used to convey DSCP information as well, there is a clear incentive
129	   to conserve the number of codepoints consumed for ECN purposes.
130	   Efficient use of the EXP field has been a focus of prior drafts
131	   [Floyd] [Shayman] and we draw on those efforts in this draft as well.

133	1.2.  Intent

135	   Our intent is to specify how the MPLS shim header [RFC3032] should
136	   denote ECN marking and how MPLS nodes should understand whether the
137	   transport for a packet will be ECN capable.  We offer this as a
138	   building block, from which to build different congestion notification
139	   systems.  We do not intend to specify how the resulting congestion
140	   notification is fed back to an upstream node that can mitigate
141	   congestion.  For instance, unlike [Shayman], we do not specify edge-
142	   to-edge MPLS domain feedback, but we also do not preclude it.
143	   Nonetheless, we do specify how the egress node of an MPLS domain
144	   should copy congestion notification from the MPLS shim into the
145	   underlying IP header if the ECN is to be carried onward towards the
146	   IP receiver.  But we do NOT mandate that MPLS congestion notification
147	   must be copied into the IP header for onward transmission.  This
148	   draft aims to be generic for any use of congestion notification in
149	   MPLS.  PCN or traffic engineering are merely two of many motivating
150	   applications (see Section 8).

152	1.3.  Terminology

154	   This document draws freely on the terminology of ECN [RFC3168] and
155	   MPLS [RFC3031].  For ease of reference, we have included some
156	   definitions here, but refer the reader to the references above for
157	   complete specifications of the relevant technologies:

159	   o  CE: Congestion Experienced.  One of the states with which a packet
160	      may be marked in a network supporting ECN.  A packet is marked in
161	      this state by an ECN-capable router, to indicate that this router
162	      was experiencing congestion at the time the packet arrived.

164	   o  ECT: ECN-capable Transport.  One of the ECN states which a packet
165	      may be in when it is sent by an end system.  An end system marks a
166	      packet with an ECT codepoint to indicate that the end-points of
167	      the transport protocol are ECN-capable.  A router may not mark a
168	      packet as CE unless the packet was marked ECT when it arrived.

170	   o  Not-ECT: Not ECN capable transport.  An end system marks a packet
171	      with this codepoint to indicate that the end-points of the
172	      transport protocol are not ECN-capable.  A congested router cannot
173	      mark such packets as CE, and thus can only drop them to indicate
174	      congestion.

176	   o  EXP field.  A 3 bit field in the MPLS label header [RFC3032] which
177	      may be used to convey Diffserv information (and used in this draft
178	      to carry ECN information).

180	   o  PHP.  Penultimate Hop Popping.  An MPLS operation in which the
181	      penultimate Label Switching Router (LSR) on a Label Switched Path
182	      (LSP) removes the top label from the packet before forwarding the
183	      packet to the final LSR on the LSP.

185	2.  Use of MPLS EXP Field for ECN

187	   We propose that LSRs configured for explicit congestion notification
188	   should use the EXP field in the MPLS shim header.  However, RFC 3270
189	   already defines use of codepoints in the EXP field for differentiated
190	   services.  Although it does not preclude other compatible uses of the
191	   EXP field, this clearly seems to limit the space available for ECN,
192	   given the field is only 3 bits (8 codepoints).

194	   RFC 3270 defines two possible approaches for requesting
195	   differentiated service treatment from an LSR.

197	   o  In the E-LSP approach, different codepoints of the EXP field in
198	      the MPLS shim header are used to indicate the packet's per hop
199	      behaviour (PHB).

201	   o  In the L-LSP approach, an MPLS label is assigned for each PHB
202	      scheduling class (PSC, as defined in [RFC3260]), so that an LSR
203	      determines both its forwarding and its scheduling behaviour from
204	      the label.

206	   If an MPLS domain uses the L-LSP approach, there is likely to be
207	   space in the EXP field for ECN codepoint(s).  Where the E-LSP
208	   approach is used, then codepoint space in the EXP field is likely to
209	   be scarce.  This draft focuses on interworking ECN marking with the
210	   E-LSP approach as it is the tougher problem.  Consequently the same
211	   approach can also be applied with L-LSPs.

213	   We recommend that explicit congestion notification in MPLS should use
214	   codepoints instead of bits in the EXP field.  Since not every DSCP
215	   will need an associated ECN codepoint and some DSCPs might need two
216	   ECN codepoints [I-D.briscoe-tsvwg-cl-architecture], it would be
217	   wasteful and incorrect to assign a bit for ECN.

219	   For each PHB that uses ECN marking, we assume one EXP codepoint will
220	   be defined meaning not congestion marked (Not-CM), and at least one
221	   other codepoint will be defined meaning congestion marked (CM).
222	   Therefore, each PHB that uses ECN marking will consume at least two
223	   EXP codepoints.  But PHBs that do not use ECN marking will only
224	   consume one.
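
   The codepoint budget described above can be checked with a short
   sketch.  This is an illustration only: the function names and the
   example PHB allocation are hypothetical and are not prescribed by
   this draft.

```python
# Illustrative sketch only: count the EXP codepoints a set of PHBs consumes.
EXP_CODEPOINTS = 8  # the EXP field is 3 bits wide


def exp_codepoints_needed(phbs):
    """phbs maps a PHB name to the number of congestion-marked (CM)
    codepoints it uses: 0 for a PHB without ECN marking, 1 or more
    for a PHB that uses ECN marking."""
    total = 0
    for cm_states in phbs.values():
        # One codepoint for the unmarked (Not-CM) state of the PHB,
        # plus one for each congestion-marked state.
        total += 1 + cm_states
    return total


def fits_in_exp(phbs):
    """True if the allocation fits in the 8 available EXP codepoints."""
    return exp_codepoints_needed(phbs) <= EXP_CODEPOINTS


# Hypothetical allocation: one PHB without ECN, two with a single CM
# state, and one with two marked states.
example = {"EF": 0, "AF1": 1, "BE": 1, "PCN": 2}
```

   With this hypothetical allocation the four PHBs consume all eight
   codepoints, so adding any further PHB would not fit.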

226	   Further, we wish to use minimal space in the MPLS shim header to tell
227	   interior LSRs whether each packet will be received by an ECN-capable
228	   transport (ECT).  Nonetheless, we must ensure that an end-point that
229	   would not understand an ECN mark will not receive one, otherwise it
230	   will not be able to respond to congestion as it should.  In the past,
231	   three solutions to this problem have been proposed:

233	   o  One possible approach is for congested LSRs to mark the ECN field
234	      in the underlying IP header at the bottom of the label stack.
235	      Although many commercial LSRs routinely access the IP header for
236	      other reasons (ECMP), there are numerous drawbacks to attempting
237	      to find an IP header beneath an MPLS label stack.  Notably, there
238	      is the challenge of detecting the absence of an IP header when
239	      non-IP packets are carried on an LSP.  Therefore we will not
240	      consider this approach further.

242	   o  In the schemes suggested by [Floyd] and [Shayman], ECT and CE are
243	      overloaded into one bit, so that a 0 means ECT while a 1 might
244	      either mean Not-ECT or it might mean CE.  A packet that has been
245	      marked as having experienced congestion upstream, and then is
246	      picked out for marking at a second congested LSR, will be dropped
247	      by the second LSR since it cannot determine whether the packet has
248	      previously experienced congestion or if ECN is not supported by
249	      the transport.

251	      While such an approach seemed potentially palatable for
252	      traditional ECN, we do not recommend it here for the following
253	      reasons.  In some cases we wish to be able to use ECN marking long
254	      before actual congestion (e.g. pre-congestion notification).  In
255	      these circumstances, marking rates at each LSR might be non-
256	      negligible most of the time, so the chances of a previously marked
257	      packet encountering an LSR that wants to mark it again will also
258	      be non-negligible.  This will lead to unacceptable drop rates.
259	      For instance, if the typical marking rate at every router or LSR
260	      is p, and the typical diameter of the network of LSRs is d, then
261	      the probability that a marked packet will be marked again is
262	      1 - [1+p(d-1)][1-p]^(d-1).  For example, with 6 LSRs in a row, each
263	      marking ECN with 1% probability, this bit overloading scheme would
264	      introduce a drop rate of 0.15% unnecessarily.  Given most modern
265	      core networks are sized to introduce near-zero packet drop, it may
266	      be unacceptable to drop over one in a thousand packets
267	      unnecessarily.

269	   o  A third possible approach is for interior LSRs to assume that the
270	      endpoints are ECN-capable, but this assumption is checked when the
271	      final label is popped.  If an interior LSR has marked ECN in the
272	      EXP field of the shim, but the IP header says the endpoints are
273	      not ECN capable, the edge router (or penultimate if using
274	      penultimate hop popping) drops the packet.  We recommend this
275	      scheme, which we call `per-domain ECT checking', and define it
276	      more precisely in the following section.  Its chief drawback is
277	      that it can involve packets continuing to be forwarded after
278	      encountering congestion only to be dropped at the egress of the
279	      MPLS domain.  The rationale for this decision is given in
280	      Section 9.1.
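
   The drop-rate estimate given above for the bit-overloading scheme
   can be reproduced numerically.  The sketch below assumes, as the
   formula implicitly does, that each of the d LSRs selects a packet
   for marking independently with probability p; the function name is
   hypothetical.

```python
def unnecessary_drop_probability(p, d):
    """Probability that at least two of d LSRs select the same packet
    for marking, assuming independent selection with probability p.

    This evaluates 1 - [1 + p(d-1)] * (1-p)^(d-1), the expression
    quoted in the text.
    """
    return 1 - (1 + p * (d - 1)) * (1 - p) ** (d - 1)


# The example from the text: 6 LSRs in a row, each marking with 1%.
rate = unnecessary_drop_probability(0.01, 6)
```

   For p = 0.01 and d = 6 this evaluates to roughly 0.0015, matching
   the 0.15% figure quoted in the text.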

282	3.  Per-domain ECT checking

284	   For the purposes of this discussion, we define the egress nodes of an
285	   MPLS domain as the nodes that pop the last MPLS label from the label
286	   stack, exposing the IP (or, potentially, non-IP) header.  Note that
287	   such a node may be the ultimate or penultimate hop of an LSP,
288	   depending on whether penultimate hop popping (PHP) is employed.

290	   In the per-domain ECT checking approach, the egress nodes take
291	   responsibility for checking whether the transport is ECN capable.

293	   This draft does not specify how these nodes should pass on congestion
294	   notification, because different approaches are likely in different
295	   scenarios.  However, if congestion notification in the MPLS header is
296	   copied into the IP header, the procedure MUST conform to the
297	   specification given here.

299	   If congestion notification is passed to the transport without first
300	   passing it onward in the IP header, the approach used must take
301	   similar care to check that the transport is ECN capable before
302	   passing it ECN markings.  Specifically, if the transport for a
303	   particular congestion marked MPLS packet is found not to be ECN-
304	   capable, the packet MUST be dropped at this egress node.

306	   In the per-domain ECT checking approach, only the egress nodes check
307	   whether an IP packet is destined for an ECN-capable transport.
308	   Therefore, any single LSR within an MPLS domain MUST NOT be
309	   configured to enable ECN marking unless all the egress LSRs
310	   surrounding it are already configured to handle ECN marking.

312	   We call a domain surrounded by ECN-capable egress LSRs an ECN-enabled
313	   MPLS domain.  This term only implies that all the egress LSRs are
314	   ECN-enabled; some interior LSRs may not be ECN-enabled.  For
315	   instance, it would be possible to use legacy LSRs incapable of
316	   supporting ECN in the interior of an MPLS domain as long as all the
317	   egress LSRs were ECN-capable.  Note that if PHP is used, the
318	   "penultimate hop" routers which perform the pop operation do need to
319	   be ECN-enabled, since they are acting in this context as egress LSRs.
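
   The configuration rule above can be expressed as a simple check.
   This is a minimal sketch under the assumption that an operator holds
   a table of egress LSRs and their ECN capability; the function name
   and the LSR names are hypothetical.

```python
def may_enable_ecn_marking(egress_lsrs):
    """egress_lsrs maps each LSR that pops the last label (including
    penultimate-hop LSRs when PHP is used) to True if that LSR is
    already configured to handle ECN marking.

    ECN marking may be enabled inside the domain only if every such
    LSR is ECN-enabled."""
    return all(egress_lsrs.values())


# Hypothetical domain in which one PHP router is not yet ECN-enabled.
domain = {"PE1": True, "PE2": True, "P3-php": False}
```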

321	4.  ECN-enabled MPLS domain

323	   In the following subsections we describe various operations affecting
324	   the ECN marking of a packet that may be performed at MPLS edge and
325	   core LSRs.

327	4.1.  Pushing (adding) one or more labels to an IP packet

329	   On encapsulating an IP packet with an MPLS label stack, the ECN field
330	   must be translated from the IP packet into the MPLS EXP field.  The
331	   Not-CM (not congestion marked) state is set in the MPLS EXP field if
332	   the ECN status of the IP packet is "Not-ECT" or ECT(1) or ECT(0).
333	   The CM state is set if the ECN status of the IP packet is "CE".  If
334	   more than one label is pushed at one time, the same value should be
335	   placed in the EXP value of all label stack entries.
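
   The translation at label push can be sketched as follows.  The ECN
   field values are those of [RFC3168]; the type and function names are
   hypothetical, and the CM and Not-CM states are kept abstract because
   the actual EXP codepoints are chosen by the operator.

```python
from enum import Enum


class IpEcn(Enum):
    """The two-bit ECN field of the IP header, as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


class Mark(Enum):
    """Abstract congestion state carried in the EXP field."""
    NOT_CM = "Not-CM"
    CM = "CM"


def push_onto_ip(ip_ecn, n_labels):
    """Return the congestion state for each of the n_labels pushed
    label stack entries: CM if the IP packet is CE, else Not-CM."""
    state = Mark.CM if ip_ecn is IpEcn.CE else Mark.NOT_CM
    # The same value is placed in every pushed label stack entry.
    return [state] * n_labels
```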

337	4.2.  Pushing one or more labels onto an MPLS labelled packet

339	   The EXP field is copied directly from the topmost label before the
340	   push to the newly added outer label.  If more than one label is being
341	   pushed, the same EXP value is copied to all label stack entries.
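
   As a minimal sketch of this copy rule, the label stack can be
   modelled as a list of EXP values with the topmost label first.  The
   function name and the list representation are assumptions made for
   illustration.

```python
def push_labels(exp_stack, count):
    """exp_stack lists the EXP values of a non-empty label stack,
    topmost label first.  Returns the stack after pushing count new
    labels, each carrying a copy of the previous topmost EXP value."""
    top_exp = exp_stack[0]
    return [top_exp] * count + exp_stack
```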

343	4.3.  Congestion experienced in an interior MPLS node

345	   If the EXP codepoint of the packet maps to a PHB that uses ECN
346	   marking and the marking algorithm requires the packet to be marked,
347	   the CM state is set (irrespective of whether it is already in the CM
348	   state).

350	   If the buffer is full, the packet is dropped.
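
   The interior behaviour can be sketched as a small decision function.
   The function name and its boolean inputs are hypothetical; how the
   marking algorithm selects packets is outside the scope of the
   sketch, and the case of a selected packet whose PHB does not use ECN
   marking is not described in this subsection and is not modelled.

```python
def interior_lsr(state, phb_uses_ecn, selected_for_marking, buffer_full):
    """Return the congestion state of the packet on leaving the LSR,
    or None if the packet is dropped.  state is "CM" or "Not-CM"."""
    if buffer_full:
        # No room to queue the packet: it is dropped.
        return None
    if phb_uses_ecn and selected_for_marking:
        # Set CM irrespective of whether the packet was already CM.
        return "CM"
    # Otherwise the existing state is carried forward unchanged.
    return state
```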

352	4.4.  Crossing a Diffserv Domain Boundary

354	   If an MPLS-encapsulated packet crosses a Diffserv domain boundary, it
355	   may be the case that the two domains use different encodings of the
356	   same PHB in the EXP field.  In such cases, the EXP field must be
357	   rewritten at the domain boundary.  If the PHB is one that supports
358	   ECN, then the appropriate ECN marking should also be preserved when
359	   the EXP field is mapped at the boundary.

361	   The related issue of Diffserv tunnel models is discussed in
362	   Section 4.7.
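
   The EXP rewrite at a Diffserv domain boundary described in this
   section can be sketched as a table lookup that preserves both the
   PHB and the congestion state.  The codepoint tables below are
   invented for illustration; real encodings are a matter of operator
   configuration.

```python
def remap_exp(exp, upstream_table, downstream_table):
    """Translate an EXP codepoint between two domains.

    Each table maps an EXP codepoint to a (PHB, congestion state)
    pair.  The result is the downstream codepoint that encodes the
    same PHB and the same congestion state."""
    meaning = upstream_table[exp]
    inverse = {pair: code for code, pair in downstream_table.items()}
    return inverse[meaning]


# Hypothetical encodings used by two neighbouring domains.
domain_a = {0: ("BE", "Not-CM"), 1: ("BE", "CM"), 5: ("EF", "Not-CM")}
domain_b = {2: ("BE", "Not-CM"), 3: ("BE", "CM"), 6: ("EF", "Not-CM")}
```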

364	4.5.  Popping an MPLS label (not the end of the stack)

366	   When a packet has more than one MPLS label in the stack and the top
367	   label is popped, another MPLS label is exposed.  In this case the ECN
368	   information should be transferred from the outer EXP field to the
369	   inner MPLS label in the following manner.  If the inner EXP field is
370	   Not-CM, the inner EXP field is set to the same CM or Not-CM state as
371	   the outer EXP field.  If the inner EXP field is CM, it remains
372	   unchanged whatever the outer EXP field.  Note that an inner value of
373	   CM and an outer value of Not-CM should be considered anomalous, and
374	   SHOULD be logged in some way by the LSR.
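
   The transfer rule for a pop that exposes another label can be
   sketched as below.  The function name and the string representation
   of the states are assumptions made for illustration.

```python
def pop_to_inner_label(outer, inner):
    """outer is the state of the popped label, inner the state of the
    exposed label; each is "CM" or "Not-CM".

    Returns (new_inner, anomalous), where anomalous is True for the
    combination that SHOULD be logged."""
    if inner == "CM":
        # An inner CM is never cleared; an outer Not-CM is anomalous.
        return "CM", outer == "Not-CM"
    # Inner is Not-CM: take whatever state the outer label carried.
    return outer, False
```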

376	4.6.  Popping the last MPLS label in the stack

378	   When the last MPLS label is popped from the packet, its payload is
379	   exposed.  If that packet is not IP, and does not have any capability
380	   equivalent to ECT, it is assumed Not-ECT and treated as such.  That
381	   means that if the EXP value of the MPLS header was CM, the packet
382	   MUST be dropped.

384	   Assuming an IP packet was exposed, we have to examine whether that
385	   packet is ECT or not.  If the inner IP packet is Not-ECT, its ECN
386	   field remains unchanged if the EXP field is Not-CM.  However, a Not-
387	   ECT packet MUST be dropped if the EXP field is CM.

389	   If the ECN field of the inner packet is set to ECT(0), ECT(1) or CE,
390	   the ECN field remains unchanged if the EXP field is set to Not-CM.
391	   The ECN field is set to CE if the EXP field is CM.  Note that an
392	   inner value of CE and an outer value of Not-CM should be considered
393	   anomalous, and SHOULD be logged in some way by the LSR.
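
   The egress procedure of this section can be summarised in a short
   sketch.  The function name, the string labels and the use of None
   for a non-IP payload without an ECT equivalent are assumptions made
   for illustration.

```python
def pop_last_label(exp_state, payload_ecn):
    """exp_state is "CM" or "Not-CM".  payload_ecn is "Not-ECT",
    "ECT(0)", "ECT(1)" or "CE" for an IP payload, or None for a
    non-IP payload with no capability equivalent to ECT.

    Returns (action, ecn, anomalous): action is "drop" or "forward",
    ecn is the resulting ECN field, and anomalous flags the
    combination that SHOULD be logged."""
    ecn_capable = payload_ecn in ("ECT(0)", "ECT(1)", "CE")
    if exp_state == "CM":
        if not ecn_capable:
            # The transport could not understand a CE mark.
            return "drop", None, False
        return "forward", "CE", False
    # EXP is Not-CM: the payload is left unchanged.  An inner CE under
    # an outer Not-CM is anomalous.
    return "forward", payload_ecn, payload_ecn == "CE"
```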

395	4.7.  Diffserv Tunneling Models

397	   [RFC3270] describes three tunneling models for Diffserv support
398	   across MPLS Domains, referred to as the uniform, short pipe, and pipe
399	   models.  The differences between these models lie in whether the
400	   Diffserv treatment that applies to a packet while it travels along a
401	   particular LSP is carried to the last hop of the LSP and beyond the
402	   last hop.  Depending on which mode is preferred by an operator, the
403	   EXP value or DSCP value of an exposed header following a label pop
404	   may or may not be dependent on the EXP value of the label that is
405	   removed by the pop operation.  We believe that in the case of ECN
406	   marking, the use of these models should only apply to the encoding of
407	   the Diffserv PHB in the EXP value, and that the choice of codepoint
408	   for ECN should always be made based on the procedures described
409	   above, independent of the tunneling model.

411	4.8.  Extension to Pre-Congestion Notification

413	   To fully support PCN [I-D.briscoe-tsvwg-cl-architecture] in an MPLS
414	   domain for a particular PHB, a total of 3 codepoints need to be
415	   allocated for that PHB.  (See Section 8.4 for further discussion of
416	   PCN and the possibility of using fewer codepoints.)  These 3
417	   codepoints represent the admission marked (AM), pre-emption marked
418	   (PM) and not marked (NM) states.  The procedures described above need
419	   to be slightly modified to support this scenario.  The following
420	   procedures are invoked when the topmost DSCP or EXP value indicates a
421	   PHB that supports PCN.

423	4.8.1.  Label Push onto IP packet

425	   If the IP packet header indicates AM, set the EXP value of all
426	   entries in the label stack to AM.  If the IP packet header indicates
427	   PM, set the EXP value of all entries in the label stack to PM.  For
428	   any other marking of the IP header, set the EXP value of all entries
429	   in the label stack to NM.
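
   A minimal sketch of this push rule follows.  How the AM and PM
   states are encoded in the IP header is not defined here, so the
   sketch treats the IP marking as an abstract string; the function
   name is hypothetical.

```python
def pcn_push(ip_marking, n_labels):
    """Return the PCN state ("AM", "PM" or "NM") placed in the EXP
    field of each of the n_labels pushed label stack entries."""
    # AM and PM are carried over; any other IP marking becomes NM.
    state = ip_marking if ip_marking in ("AM", "PM") else "NM"
    return [state] * n_labels
```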

431	4.8.2.  Pushing Additional MPLS Labels

433	   The procedures of Section 4.2 apply.

435	4.8.3.  Admission Control or Pre-emption Marking inside MPLS domain

437	   The EXP value can be set to AM or PM according to the same procedures
438	   as described in [I-D.briscoe-tsvwg-cl-phb].

440	4.8.4.  Popping an MPLS Label (not end of stack)

442	   When popping an MPLS Label exposes another MPLS label, the AM or PM
443	   marking should be transferred to the exposed EXP field in the
444	   following manner: if the inner EXP value is NM, then it should be set
445	   to the same marking state as the EXP value of the popped label stack
446	   entry.  If the inner EXP value is AM, it should be unchanged if the
447	   popped EXP value was AM, and it should be set to PM if the popped EXP
448	   value was PM.  If the popped EXP value was NM, this should be logged
449	   in some way and the inner EXP value should be unchanged.  If the
450	   inner EXP value is PM, it should be unchanged whatever the popped EXP
451	   value was, but any EXP value other than PM should be logged.

453	4.8.5.  Popping the last MPLS Label to expose IP header

455	   When popping the last MPLS Label exposes the IP header, the AM or PM
456	   marking should be transferred to the exposed IP header field in the
457	   following manner: if the inner IP header value is neither AM nor PM,
458	   and the EXP value was NM, then the IP header should be unchanged.
459	   For any other EXP value, the IP header should be set to the same
460	   marking state as the EXP value of the popped label stack entry.  If
461	   the inner IP header value is AM, it should be unchanged if the popped
462	   EXP value was AM, and it should be set to PM if the popped EXP value
463	   was PM.  If the popped EXP value was NM, this should be logged in
464	   some way and the inner IP header value should be unchanged.  If the
465	   IP header value is PM, it should be unchanged whatever the popped EXP
466	   value was, but any EXP value other than PM should be logged.
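
   The transfer rules of Sections 4.8.4 and 4.8.5 can be restated
   compactly if the three states are ordered by severity (NM < AM <
   PM).  The exposed header then takes the more severe of the two
   states, and a popped state less severe than the exposed one is the
   case to be logged.  This ordering is a restatement for illustration,
   not wording from the procedures above, and the names are
   hypothetical.  For Section 4.8.5, an IP header that is neither AM
   nor PM is treated as NM, and a result of NM means the header is left
   unchanged.

```python
from enum import IntEnum


class Pcn(IntEnum):
    """PCN states ordered by severity."""
    NM = 0  # not marked
    AM = 1  # admission marked
    PM = 2  # pre-emption marked


def pcn_pop(popped, exposed):
    """popped is the state of the removed label stack entry, exposed
    the state of the header revealed by the pop.

    Returns (new_exposed, log), where log is True when the popped
    state is less severe than the exposed one."""
    new_exposed = max(popped, exposed)
    log = popped < exposed
    return new_exposed, log
```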

468	5.  ECN-disabled MPLS domain

470	   If ECN is not enabled on all the egress LSRs of a domain, ECN MUST
471	   NOT be enabled on any LSRs throughout the domain.  If congestion is
472	   experienced on any LSR in an ECN-disabled MPLS domain, packets MUST
473	   be dropped, NOT marked.  The exact algorithm for deciding when to drop
474	   packets during congestion (e.g. tail-drop, RED, etc.) is a local
475	   matter for the operator of the domain.

477	6.  The use of more codepoints with E-LSPs and L-LSPs

479	   RFC 3270 gives different options with E-LSPs and L-LSPs and some of
480	   those could potentially provide ample EXP codepoints for ECN/PCN.

482	   However, deploying L-LSPs vs E-LSPs has many implications such as
483	   platform support and operational complexity.  The above ECN/PCN MPLS
484	   solution should provide some flexibility.  If the operator has
485	   deployed one L-LSP per PHB scheduling class, then EXP space will be a
486	   non-issue and it could be used to achieve more sophisticated ECN/PCN
487	   behavior if required.  If the operator wants to stick to E-LSPs and
488	   uses a handful of EXP codepoints for Diffserv, it may be desirable to
489	   operate with a minimum number of extra ECN/PCN codepoints, even if
490	   this comes with some compromise on ECN/PCN optimality.  See Section 8
491	   for discussion of some possible deployment scenarios.

493	7.  Relationship to tunnel behavior in RFC 3168

495	   [RFC3168] defines two modes of encapsulating ECN-marked IP packets
496	   inside additional IP headers when tunnels are used.  The two modes are
497	   the "full functionality" and "limited functionality" modes.  In the
498	   full functionality mode, the ECT information from the inner header is
499	   copied to the outer header at the tunnel ingress, but the CE
500	   information is not.  In the limited functionality mode, neither ECT
501	   nor CE information is copied to the outer header, and thus ECN cannot
502	   be applied to the encapsulated packet.
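
	   The two encapsulation modes can be sketched as a single function
	   that computes the outer ECN field at tunnel ingress.  This is an
	   illustration under our reading of [RFC3168], not a normative
	   restatement: the function name outer_ecn_at_ingress is hypothetical,
	   and the use of ECT(0) in the outer header when the inner header
	   carries CE is taken from [RFC3168] rather than from the text above.

```python
# ECN field codepoints in the IP header, as defined in RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def outer_ecn_at_ingress(inner_ecn, full_functionality):
    """Return the ECN field for the outer IP header at tunnel ingress."""
    if not full_functionality:
        # Limited functionality: nothing is copied, so the outer header
        # is Not-ECT and ECN cannot be applied to the encapsulated packet.
        return NOT_ECT
    if inner_ecn == CE:
        # Full functionality: the CE information is not copied; the outer
        # header only shows that the transport is ECN capable.
        return ECT_0
    # Not-ECT, ECT(0) and ECT(1) are copied unchanged.
    return inner_ecn
```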

504	   The behavior that is specified in Section 4 of this document
505	   resembles the "full functionality" mode in the sense that it conveys
506	   some information from inner to outer header, and in the sense that it
507	   enables full ECN support along the MPLS LSP (which is analogous to an
508	   IP tunnel in this context).  However, it differs in one respect, which
509	   is that the CE information is conveyed from the inner header to the
510	   outer header.  Our reason for this different design choice is to give
511	   interior routers and LSRs more information about upstream marking in
512	   multi-bottleneck cases.  For instance, the flow pre-emption marking
513	   mechanism proposed for PCN works by only considering packets for
514	   marking that have not already been marked upstream.  Unless existing
515	   pre-emption marking is copied from the inner to the outer header at
516	   tunnel ingress, the mechanism doesn't pre-empt enough traffic in
517	   cases where anomalous events hit multiple MPLS domains at once.
518	   [RFC3168] does not give any reasons against conveying CE information
519	   from the inner header to the outer in the "full functionality" mode.
520	   So, rather than define different encapsulation methods for ECN and
521	   PCN, Section 4 defines a common approach for both.

523	7.1.  Alternative approach to support ECN in an MPLS domain

525	   It is possible to define an approach for MPLS support of ECN that
526	   more closely resembles that of the full functionality mode of
527	   [RFC3168].  This approach would differ from that described in
528	   Section 4 in the following ways:

530	   o  when pushing one or more MPLS labels onto an IP packet, the not-CM
531	      state is set in the EXP field of all label stack entries

533	   o  when pushing one or more MPLS labels onto an MPLS packet, the
534	      not-CM state is set in the EXP field of all newly added label
535	      stack entries

537	   o  when popping an MPLS label and the exposed header is MPLS (i.e.
538	      this is not the end of stack), the EXP field of the MPLS packet
539	      should be set to CM if the popped label's EXP value was CM and
540	      left unchanged otherwise

542	   o  when popping an MPLS label and the exposed header is IP, the IP
543	      ECN field should be set to CE if the EXP value was CM and if the
544	      IP header indicated that the packet was ECN capable.  If the IP
545	      header indicated not-ECT and the EXP value was CM, the packet MUST
546	      be dropped.  If the EXP value was not-CM, the ECN field in the IP
547	      header is unchanged.
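
	   The four rules listed above can be sketched as follows.  The sketch
	   works on abstract states only; the concrete EXP and ECN encodings
	   are left open, and the function names (push_labels, pop_to_mpls,
	   pop_to_ip) are hypothetical.  Returning None to mean "drop the
	   packet" is likewise a convention chosen for this illustration.

```python
# Abstract states; the concrete EXP and ECN bit patterns are not modelled.
CM, NOT_CM = "CM", "not-CM"
NOT_ECT, ECT, CE = "Not-ECT", "ECT", "CE"


def push_labels(count):
    """EXP states of newly pushed label stack entries: always not-CM.

    This covers pushing onto an IP packet and onto an MPLS packet alike,
    since only the newly added entries are set.
    """
    return [NOT_CM] * count


def pop_to_mpls(popped_exp, exposed_exp):
    """Pop a label when the exposed header is another MPLS label."""
    # CM propagates inwards; otherwise the exposed entry is left unchanged.
    return CM if popped_exp == CM else exposed_exp


def pop_to_ip(popped_exp, ip_ecn):
    """Pop the last label. Returns the new IP ECN state, or None to drop."""
    if popped_exp != CM:
        return ip_ecn  # not-CM: the IP ECN field is unchanged
    if ip_ecn == NOT_ECT:
        return None    # CM on a non-ECN-capable packet: drop
    return CE          # ECN-capable (ECT or already CE): set CE
```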

549	   The advantages of this scheme over that described in Section 4 are
550	   greater similarity to [RFC3168], and the ability to determine, at the
551	   end of an LSP, that congestion either did or did not occur along that
552	   LSP (since the initial state is always not-CM at the start of an
553	   LSP).

555	   A disadvantage of this approach is that exceptions to this rule are
556	   necessary in cases where the marking process on LSRs needs to depend
557	   on whether a packet has already suffered upstream marking.  The
558	   currently proposed pre-emption marking in PCN is an example where
559	   such an exception would be necessary (see the discussion at the start
560	   of Section 7).

562	8.  Example Uses

564	8.1.  RFC3168-style ECN

566	   [RFC3168] proposes the use of ECN in TCP and introduces the use of
567	   ECN-Echo and CWR flags in the TCP header for initialisation.  When
568	   the TCP sender receives an ECN-Echo (ECE) ACK packet (that is, an ACK
569	   packet with the ECN-Echo flag set in the TCP header), it knows that
570	   congestion was encountered in the network on the path from the sender
571	   to the receiver, and it responds accordingly (for example, by not
572	   increasing the congestion window).

574	   It would be possible to enable ECN in an MPLS domain for Diffserv
575	   PHBs like AF and best effort that are expected to be used by TCP and
576	   similar transports (e.g.  DCCP [RFC4340]).  Then end-to-end
577	   congestion control in transports capable of understanding ECN would
578	   be able to respond to approaching congestion on LSRs without having
579	   to rely on packet discard to signal congestion.

581	8.2.  ECN Co-existence with Diffserv E-LSPs

583	   Many operators today have deployed Diffserv using the E-LSP approach
584	   of [RFC3270].  In many cases the number of PHBs used is less than 8,
585	   and hence there remain available codepoints in the EXP space.  If an
586	   operator wished to support ECN for a single PHB, this can be
587	   accomplished by simply allocating a second codepoint to the PHB for
588	   the "CM" state of that PHB and retaining the old codepoint for the
589	   "not-CM" state.  An operator with only four deployed PHBs could of
590	   course enable ECN marking on all those PHBs.  It is easy to imagine
591	   cases where some PHBs might benefit more from ECN than others - for
592	   example, an operator might use ECN on a premium data service but not
593	   on a PHB used for best effort internet traffic.

595	   As an illustrative example of how the EXP field might be used in this
596	   case, consider the example of an operator who is using the aggregated
597	   service classes described in [I-D.chan-tsvwg-diffserv-class-aggr].
598	   He may choose to support ECN only for the Assured Elastic Treatment
599	   Aggregate, using the EXP codepoint 010 for the not-CM state and 011
600	   for the CM state.  All other codepoints could be the same as in
601	   [I-D.chan-tsvwg-diffserv-class-aggr].  Of course any other
602	   combination of EXP values can be used according to the specific set
603	   of PHBs and marking conventions used within that operator's network.
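
	   A minimal sketch of this allocation, using only the two codepoints
	   named in the example, is given below.  The table CM_CODEPOINT and
	   the function mark_congestion are hypothetical names; returning None
	   for a PHB without a CM codepoint is an assumption of the sketch,
	   standing for "fall back to the LSR's normal drop behaviour".

```python
# EXP codepoints from the example (Assured Elastic Treatment Aggregate).
AE_NOT_CM = 0b010
AE_CM = 0b011

# Hypothetical table: not-CM codepoint -> CM codepoint, one entry per
# ECN-enabled PHB. Codepoints of other PHBs are deliberately absent.
CM_CODEPOINT = {AE_NOT_CM: AE_CM}


def mark_congestion(exp):
    """Return the EXP value after congestion marking.

    Returns None if the PHB carried in `exp` has no CM codepoint, in
    which case the packet cannot be marked.
    """
    if exp in CM_CODEPOINT:
        return CM_CODEPOINT[exp]
    if exp in CM_CODEPOINT.values():
        return exp  # already marked upstream
    return None
```

	   Enabling ECN on a further PHB would then amount to adding one more
	   entry to the table, at the cost of one more EXP codepoint.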

605	8.3.  Congestion-feedback-based Traffic Engineering

607	   Shayman's traffic engineering [Shayman] proposed the use of ECN by an
608	   egress LSR feeding back congestion to an ingress LSR to mitigate
609	   congestion by employing dynamic traffic engineering techniques such
610	   as shifting flows to an alternate path.  It proposed a new RSVP
611	   TUNNEL CONGESTION message which was sent to the ingress LSR and
612	   ignored by transit LSRs.

614	8.4.  PCN flow admission control and flow pre-emption

616	   [I-D.briscoe-tsvwg-cl-architecture] proposes using pre-congestion
617	   notification (PCN) on routers within an edge-to-edge Diffserv region
618	   to control admission of new flows to the region and, if necessary, to
619	   pre-empt existing flows in response to disasters and other anomalous
620	   routing events.  In this approach, the current level of PCN marking
621	   is picked up by the signalling used to initiate each flow in order to
622	   inform the admission control decision for the whole region at once.
623	   As an example, a minor extension to RSVP signalling has been proposed
624	   [I-D.lefaucheur-rsvp-ecn] to carry this message, but a similar
625	   approach has also been proposed that uses NSIS signalling [I-D.ietf-
626	   nsis-rmd].

628	   If it is possible for LSRs to signify congestion in MPLS, PCN marking
629	   could be used for admission control and flow pre-emption across a
630	   Diffserv region, irrespective of whether it contained pure IP
631	   routers, MPLS LSRs, or both.  Indeed, the solution could be somewhat
632	   more efficient to implement if aggregates could identify themselves
633	   by their MPLS label.  Section 4.8 describes the mechanisms by which
634	   the necessary markings for PCN could be carried in the MPLS header.

636	   As an illustrative example of how the EXP field might be used in this
637	   case, consider the example of an operator who is using the aggregated
638	   service classes described in [I-D.chan-tsvwg-diffserv-class-aggr].
639	   He may choose to support PCN only for the Real Time Treatment
640	   Aggregate, using the EXP codepoint 100 for the not-marked (NM) state,
641	   101 for the Admission Marked (AM) state, and 111 for the Pre-emption
642	   Marked (PM) state.  All other codepoints could be the same as in
643	   [I-D.chan-tsvwg-diffserv-class-aggr].  Of course any other
644	   combination of EXP values can be used according to the specific set
645	   of PHBs and marking conventions used within that operator's network.
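
	   The three PCN codepoints of this example can be written down as a
	   small lookup, sketched below.  The names RealTimePcn, pcn_state and
	   free_codepoints are hypothetical.  The sketch only decodes the
	   states; it does not model the marking algorithms themselves.

```python
from enum import IntEnum


class RealTimePcn(IntEnum):
    """EXP codepoints for the Real Time Treatment Aggregate in the example."""
    NM = 0b100  # not marked
    AM = 0b101  # Admission Marked
    PM = 0b111  # Pre-emption Marked


def pcn_state(exp):
    """Return the PCN state carried by `exp`, or None for any other PHB."""
    try:
        return RealTimePcn(exp)
    except ValueError:
        return None


def free_codepoints(used):
    """EXP is a 3-bit field, so 8 codepoints exist in total."""
    return 8 - len(set(used))
```

	   Since PCN takes three of the eight EXP codepoints here, five remain
	   for all other PHBs, which is why codepoint space may be at a premium.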

647	   It might also be possible to deploy a similar solution using PCN
648	   marking over MPLS for just admission control alone, or just flow pre-
649	   emption alone, particularly if codepoint space was at a premium in
650	   the MPLS EXP field.  However, the feasibility of deploying one
651	   without the other would require further study.

653	9.  Deployment Considerations

655	9.1.  Marking non-ECN Capable Packets

657	   What are the consequences of marking a packet that is not ECN-capable?
658	   Even if it will be dropped before leaving the domain, doesn't this
659	   consume resources unnecessarily?

661	   The problem only arises if there is congestion downstream of an
662	   earlier congested node.  It might be that marked packets are carried
663	   through this second congested router when, within the underlying IP
664	   header, they are not ECN capable, so they will be dropped later.  Such
665	   packets might cause other packets to be marked (or dropped) that
666	   would not otherwise have been.

668	   We decided to use the per-domain ECT checking approach because it
669	   would become optimal as ECN deployment became prevalent.  The
670	   situation where traffic is carried beyond a congested LSR only to be
671	   dropped later should become less prevalent as more transports use
672	   ECN.  This is why we chose not to use the [Floyd] alternative which
673	   introduced a low but persistent level of unnecessary packet drop for
674	   all time.  Although that scheme avoided carrying droppable traffic to
675	   the edge of the MPLS domain, we felt that carrying such traffic was a
676	   small price to pay, and one that is only of concern until ECN becomes
677	   more widely deployed.

679	   A partial solution would be to preferentially drop packets arriving
680	   at a congested router that were already marked.  There is no solution
681	   to the problem of a packet being marked because of congestion caused
682	   by a packet that should have been dropped.  However, the chance of
683	   such an occurrence is very low and the consequences are not
684	   significant: it merely causes an application to slow down its rate
685	   very occasionally when it did not have to.

687	9.2.  Non-ECN capable routers in an MPLS Domain

689	   What if an MPLS domain wants to use ECN, but not all legacy routers
690	   are able to support it?

692	   If the legacy router(s) are used in the interior, this is not a
693	   problem.  They will simply have to drop the packets if they are
694	   congested, rather than mark them, which is the standard behaviour for
695	   IP routers that are not ECN-enabled.

697	   If the legacy router were used as an egress router, it would not be
698	   able to check the ECN capability of the transport correctly.  An
699	   operator in this position would not be able to use this solution and
700	   therefore MUST NOT enable ECN unless all egress routers are ECN-
701	   capable.

703	10.  IANA Considerations

705	   This document makes no request of IANA.

707	   Note to RFC Editor: this section may be removed on publication as an
708	   RFC.

710	11.  Security Considerations

712	   We believe no new vulnerabilities are introduced by this draft.

714	   We have considered whether malicious sources might be able to exploit
715	   the fact that interior LSRs will mark packets that are Not-ECT,
716	   relying on their egress LSR to drop them.  Although this might allow
717	   sources to engineer a situation where more traffic is carried across
718	   an MPLS domain than should be, we concluded that even if we hadn't
719	   introduced this feature, these sources would have been able to
720	   prevent these LSRs dropping this traffic anyway, simply by setting
721	   ECT in the first place.

723	   An ECN sender can use the ECN nonce [RFC3540] to detect a misbehaving
724	   receiver.  The ECN nonce works correctly across an MPLS domain
725	   without requiring any specific support from the proposal in this
726	   draft.  The nonce does not need to be present in the MPLS shim
727	   header.  As long as the nonce is present in the IP header when the
728	   ECN information is copied from the last MPLS shim header, it will be
729	   overwritten if congestion has been experienced by an LSR.  This is
730	   all that is necessary for the sender to detect a misbehaving
731	   receiver.

733	   An alternative proposal currently in progress in the IETF
734	   [I-D.briscoe-tsvwg-re-ecn-tcp] allows the network to prevent
735	   misbehaviour by senders or receivers or other routers.  Like the ECN
736	   nonce, it works correctly without requiring any specific support from
737	   the proposal in this draft.  It uses a bit in the IP header (the RE
738	   bit) which is set by the sender and never changed along the path; it
739	   is only read by certain policing elements in the network.  There is
740	   no need for a copy of this bit in the MPLS shim, as policing nodes
741	   can examine the IP header if they need to, particularly given they
742	   are intended to only be necessary at domain borders where MPLS
743	   headers are often removed.

745	12.  Acknowledgements

747	   Thanks to K.K. Ramakrishnan and Sally Floyd for getting us thinking
748	   about this in the first place and for providing advice on tunneling
749	   of ECN packets, and to Joe Babiarz and Ben Niven-Jenkins for their
750	   comments on the draft.

752	13.  References

754	13.1.  Normative References

756	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
757	              Requirement Levels", BCP 14, RFC 2119, March 1997.

759	   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
760	              and W. Weiss, "An Architecture for Differentiated
761	              Services", RFC 2475, December 1998.

763	   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
764	              Label Switching Architecture", RFC 3031, January 2001.

766	   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
767	              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
768	              Encoding", RFC 3032, January 2001.

770	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
771	              of Explicit Congestion Notification (ECN) to IP",
772	              RFC 3168, September 2001.

774	   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
775	              Diffserv", RFC 3260, April 2002.

777	   [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
778	              P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
779	              Protocol Label Switching (MPLS) Support of Differentiated
780	              Services", RFC 3270, May 2002.

782	13.2.  Informative References

784	   [Floyd]    "A Proposal to Incorporate ECN in MPLS", 1999.

786	              Work in progress. http://www.icir.org/floyd/papers/
787	              draft-ietf-mpls-ecn-00.txt

789	   [I-D.briscoe-tsvwg-cl-architecture]
790	              Briscoe, B., "A Framework for Admission Control over
791	              DiffServ using Pre-Congestion Notification",
792	              draft-briscoe-tsvwg-cl-architecture-02 (work in progress),
793	              March 2006.

795	   [I-D.briscoe-tsvwg-cl-phb]
796	              Briscoe, B., "Pre-Congestion Notification marking",
797	              draft-briscoe-tsvwg-cl-phb-01 (work in progress),
798	              March 2006.

800	   [I-D.briscoe-tsvwg-re-ecn-border-cheat]
801	              Briscoe, B., "Emulating Border Flow Policing using Re-ECN
802	              on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-00
803	              (work in progress), February 2006.

805	   [I-D.briscoe-tsvwg-re-ecn-tcp]
806	              Briscoe, B., "Re-ECN: Adding Accountability for Causing
807	              Congestion to TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-01
808	              (work in progress), March 2006.

810	   [I-D.chan-tsvwg-diffserv-class-aggr]
811	              Chan, K., "Aggregation of DiffServ Service Classes",
812	              draft-chan-tsvwg-diffserv-class-aggr-03 (work in
813	              progress), January 2006.

815	   [I-D.ietf-nsis-rmd]
816	              Bader, A., "RMD-QOSM - The Resource Management in Diffserv
817	              QOS Model", draft-ietf-nsis-rmd-06 (work in progress),
818	              February 2006.

820	   [I-D.lefaucheur-rsvp-ecn]
821	              Le Faucheur, F., "RSVP Extensions for Admission Control over
822	              Diffserv using Pre-congestion Notification",
823	              draft-lefaucheur-rsvp-ecn-00 (work in progress),
824	              October 2005.

826	   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
827	              Congestion Notification (ECN) Signaling with Nonces",
828	              RFC 3540, June 2003.

830	   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
831	              Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

833	   [Shayman]  "Using ECN to Signal Congestion Within an MPLS Domain",
834	              2000.

836	              Work in progress. http://www.ee.umd.edu/~shayman/papers.d/
837	              draft-shayman-mpls-ecn-00.txt

839	Authors' Addresses

841	   Bruce Davie
842	   Cisco Systems, Inc.
843	   1414 Mass. Ave.
844	   Boxborough, MA  01719
845	   USA

847	   Email: bsd@cisco.com

849	   Bob Briscoe
850	   BT Research
851	   B54/77, Sirius House
852	   Adastral Park
853	   Martlesham Heath
854	   Ipswich
855	   Suffolk  IP5 3RE
856	   United Kingdom

858	   Email: bob.briscoe@bt.com

860	   June Tay
861	   BT Research
862	   B54/77, Sirius House
863	   Adastral Park
864	   Martlesham Heath
865	   Ipswich
866	   Suffolk  IP5 3RE
867	   United Kingdom

869	   Email: june.tay@bt.com

871	Intellectual Property Statement

873	   The IETF takes no position regarding the validity or scope of any
874	   Intellectual Property Rights or other rights that might be claimed to
875	   pertain to the implementation or use of the technology described in
876	   this document or the extent to which any license under such rights
877	   might or might not be available; nor does it represent that it has
878	   made any independent effort to identify any such rights.  Information
879	   on the procedures with respect to rights in RFC documents can be
880	   found in BCP 78 and BCP 79.

882	   Copies of IPR disclosures made to the IETF Secretariat and any
883	   assurances of licenses to be made available, or the result of an
884	   attempt made to obtain a general license or permission for the use of
885	   such proprietary rights by implementers or users of this
886	   specification can be obtained from the IETF on-line IPR repository at
887	   http://www.ietf.org/ipr.

889	   The IETF invites any interested party to bring to its attention any
890	   copyrights, patents or patent applications, or other proprietary
891	   rights that may cover technology that may be required to implement
892	   this standard.  Please address the information to the IETF at
893	   ietf-ipr@ietf.org.

895	Disclaimer of Validity

897	   This document and the information contained herein are provided on an
898	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
899	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
900	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
901	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
902	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
903	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

905	Copyright Statement

907	   Copyright (C) The Internet Society (2006).  This document is subject
908	   to the rights, licenses and restrictions contained in BCP 78, and
909	   except as set forth therein, the authors retain all their rights.

911	Acknowledgment

913	   Funding for the RFC Editor function is currently provided by the
914	   Internet Society.