idnits 2.17.1
draft-ietf-tsvwg-ecn-tunnel-02.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
** The document seems to lack a License Notice according IETF Trust
Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
Section 6.b -- however, there's a paragraph with a matching beginning.
Boilerplate error?
(You're using the IETF Trust Provisions' Section 6.b License Notice from
12 Feb 2009 rather than one of the newer Notices. See
https://trustee.ietf.org/license-info/.)
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
No issues found here.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
== Line 1161 has weird spacing: '... both admis...'
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
have content which was first submitted before 10 November 2008. If you
have contacted all the original authors and they are all willing to grant
the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
this comment. If not, you may need to add the pre-RFC5378 disclaimer.
(See the Legal Provisions document at
https://trustee.ietf.org/license-info for more information.)
-- The document date (March 24, 2009) is 5511 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
== Missing Reference: 'Encapsulate' is mentioned on line 1488, but not
defined
== Outdated reference: A later version (-11) exists of
draft-ietf-pcn-architecture-10
== Outdated reference: A later version (-07) exists of
draft-ietf-pcn-baseline-encoding-02
== Outdated reference: A later version (-05) exists of
draft-ietf-pcn-marking-behaviour-02
== Outdated reference: A later version (-02) exists of
draft-ietf-pwe3-congestion-frmwk-01
== Outdated reference: A later version (-02) exists of
draft-satoh-pcn-st-marking-01
-- Obsolete informational reference (is this intentional?): RFC 4306
(Obsoleted by RFC 5996)
-- Obsolete informational reference (is this intentional?): RFC 4423
(Obsoleted by RFC 9063)
Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 4 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Transport Area Working Group B. Briscoe
3 Internet-Draft BT
4 Intended status: Standards Track March 24, 2009
5 Expires: September 25, 2009
7 Tunnelling of Explicit Congestion Notification
8 draft-ietf-tsvwg-ecn-tunnel-02
10 Status of this Memo
12 This Internet-Draft is submitted to IETF in full conformance with the
13 provisions of BCP 78 and BCP 79.
15 Internet-Drafts are working documents of the Internet Engineering
16 Task Force (IETF), its areas, and its working groups. Note that
17 other groups may also distribute working documents as Internet-
18 Drafts.
20 Internet-Drafts are draft documents valid for a maximum of six months
21 and may be updated, replaced, or obsoleted by other documents at any
22 time. It is inappropriate to use Internet-Drafts as reference
23 material or to cite them other than as "work in progress."
25 The list of current Internet-Drafts can be accessed at
26 http://www.ietf.org/ietf/1id-abstracts.txt.
28 The list of Internet-Draft Shadow Directories can be accessed at
29 http://www.ietf.org/shadow.html.
31 This Internet-Draft will expire on September 25, 2009.
33 Copyright Notice
35 Copyright (c) 2009 IETF Trust and the persons identified as the
36 document authors. All rights reserved.
38 This document is subject to BCP 78 and the IETF Trust's Legal
39 Provisions Relating to IETF Documents in effect on the date of
40 publication of this document (http://trustee.ietf.org/license-info).
41 Please review these documents carefully, as they describe your rights
42 and restrictions with respect to this document.
44 Abstract
46 This document redefines how the explicit congestion notification
47 (ECN) field of the IP header should be constructed on entry to and
48 exit from any IP in IP tunnel. On encapsulation it brings all IP in
49 IP tunnels (v4 or v6) into line with the way RFC4301 IPsec tunnels
50 now construct the ECN field. On decapsulation it redefines how the
51 ECN field in the forwarded IP header should be calculated for two
52 previously invalid combinations of incoming inner and outer headers,
53 in order that these combinations may be usefully employed in future
54 standards actions. It includes a thorough analysis of the reasoning
55 for these changes and the implications.
57 Table of Contents
59 1. Introduction
60 1.1. Scope
61 1.2. Document Roadmap
62 2. Requirements Language
63 3. Summary of Pre-Existing RFCs
64 3.1. Encapsulation at Tunnel Ingress
65 3.2. Decapsulation at Tunnel Egress
66 4. New ECN Tunnelling Rules
67 4.1. Default Tunnel Ingress Behaviour
68 4.2. Default Tunnel Egress Behaviour
69 4.3. Design Principles for Future Non-Default Schemes
70 5. Backward Compatibility
71 5.1. Non-Issues Upgrading Any Tunnel Decapsulation
72 5.2. Non-Issues for RFC4301 IPsec Encapsulation
73 5.3. Upgrading Other IP in IP Tunnel Encapsulators
74 6. Changes from Earlier RFCs
75 7. IANA Considerations
76 8. Security Considerations
77 9. Conclusions
78 10. Acknowledgements
79 11. Comments Solicited
80 12. References
81 12.1. Normative References
82 12.2. Informative References
83 Appendix A. Design Constraints
84 A.1. Security Constraints
85 A.2. Control Constraints
86 A.3. Management Constraints
87 Appendix B. Relative Placement of Tunnelling and In-Path Load
88 Regulation
89 B.1. Identifiers and In-Path Load Regulators
90 B.2. Non-Dependence of Tunnelling on In-path Load Regulation
91 B.3. Dependence of In-Path Load Regulation on Tunnelling
92 Appendix C. Contribution to Congestion across a Tunnel
93 Appendix D. Why Not Propagating ECT(1) on Decapsulation
94 Impedes PCN
95 D.1. Alternative Ways to Introduce the New Decapsulation
96 Rules
97 Appendix E. Why Resetting CE on Encapsulation Impedes PCN
98 Author's Address
100 Changes from previous drafts (to be removed by the RFC Editor)
102 Full text differences between IETF draft versions are available at
103 , and
104 between earlier individual draft versions at
105
107 From ietf-01 to ietf-02 (current):
109 * Scope reduced from any encapsulation of an IP packet to solely
110 IP in IP tunnelled encapsulation. Consequently changed title
111 and removed whole section 'Design Guidelines for New
112 Encapsulations of Congestion Notification' (to be included in a
113 future companion informational document).
115 * Included a new normative decapsulation rule for ECT(0) inner
116 and ECT(1) outer that had previously only been outlined in the
117 non-normative appendix 'Comprehensive Decapsulation Rules'.
118 Consequently:
120 + The Introduction has been completely re-written to motivate
121 this change to decapsulation along with the existing change
122 to encapsulation.
124 + The tentative text in the appendix that first proposed this
125 change has been split between normative standards text in
126 Section 4 and Appendix D, which explains specifically why
127 this change would streamline PCN. New text on the logic of
128 the resulting decap rules added.
130 * If inner/outer is Not-ECT/ECT(0), changed decapsulation to
131 propagate Not-ECT rather than drop the packet; and added
132 reasoning.
134 * Considerably restructured:
136 + "Design Constraints" analysis moved to an appendix
137 (Appendix A);
139 + Added Section 3 to summarise relevant existing RFCs;
141 + Structured Section 4 and Section 5 into subsections.
143 + Added tables to sections on old and new rules, for precision
144 and comparison.
146 + Moved Section 4.3 on Design Principles to the end of the
147 section specifying the new default normative tunnelling
148 behaviour. Rewritten and shifted text on identifiers and
149 in-path load regulators to Appendix B.1.
151 From ietf-00 to ietf-01:
153 * Identified two additional alarm states in the decapsulation
154 rules (Figure 4) if ECT(X) in outer and inner contradict each
155 other.
157 * Altered Comprehensive Decapsulation Rules (Appendix D) so that
158 ECT(0) in the outer no longer overrides ECT(1) in the inner.
159 Used the term 'Comprehensive' instead of 'Ideal'. And
160 considerably updated the text in this appendix.
162 * Added Appendix D.1 to weigh up the various ways the
163 Comprehensive Decapsulation Rules might be introduced. This
164 replaces the previous contradictory statements saying complex
165 backwards compatibility interactions would be introduced while
166 also saying there would be no backwards compatibility issues.
168 * Updated references.
170 From briscoe-01 to ietf-00:
172 * Re-wrote Appendix C giving much simpler technique to measure
173 contribution to congestion across a tunnel.
175 * Added discussion of backward compatibility of the ideal
176 decapsulation scheme in Appendix D.
178 * Updated references. Minor corrections & clarifications
179 throughout.
181 From -00 to -01:
183 * Related everything conceptually to the uniform and pipe models
184 of RFC2983 on Diffserv Tunnels, and completely removed the
185 dependence of tunnelling behaviour on the presence of any in-
186 path load regulation by using the [1 - Before] [2 - Outer]
187 function placement concepts from RFC2983;
189 * Added specific cases where the existing standards limit new
190 proposals, particularly Appendix E;
192 * Added sub-structure to Introduction (Need for Rationalisation,
193 Roadmap), added new Introductory subsection on "Scope" and
194 improved clarity;
196 * Added Design Guidelines for New Encapsulations of Congestion
197 Notification;
199 * Considerably clarified the Backward Compatibility section
200 (Section 5);
202 * Considerably extended the Security Considerations section
203 (Section 8);
205 * Summarised the primary rationale much better in the
206 conclusions;
208 * Added numerous extra acknowledgements;
210 * Added Appendix E. "Why resetting CE on encapsulation harms
211 PCN", Appendix C. "Contribution to Congestion across a Tunnel"
212 and Appendix D. "Ideal Decapsulation Rules";
214 * Re-wrote Appendix B.2, explaining how tunnel encapsulation no
215 longer depends on in-path load-regulation (changed title from
216 "In-path Load Regulation" to "Non-Dependence of Tunnelling on
217 In-path Load Regulation"), but explained how an in-path load
218 regulation function must be carefully placed with respect to
219 tunnel encapsulation (in a new sub-section entitled "Dependence
220 of In-Path Load Regulation on Tunnelling").
222 1. Introduction
224 This document redefines how the explicit congestion notification
225 (ECN) field [RFC3168] in the IP header should be constructed for all
226 IP in IP tunnelling. Previously, tunnel endpoints blocked visibility
227 of transitions of the ECN field except the minimum necessary to allow
228 the basic ECN mechanism to work. Three main changes are defined, one
229 on entry to and two on exit from any IP in IP tunnel. The newly
230 specified behaviours make all transitions to the ECN field visible
231 across tunnel end-points, so tunnels no longer restrict new uses of
232 the ECN field that were not envisaged when ECN was first designed.
234 The immediate motivation for opening up the ECN behaviour of tunnels
235 is that they otherwise impede the introduction of pre-congestion
236 notification (PCN [I-D.ietf-pcn-marking-behaviour]) in networks with
237 tunnels (Appendix E explains why). But these changes are not just
238 intended to ease the introduction of PCN; care has been taken to
239 ensure the resulting ECN tunnelling behaviour is simple and generic
240 for other potential future uses.
242 Given this is a change to behaviour at 'the neck of the hourglass',
243 an extensive analysis of the trade-offs between control, management
244 and security constraints has been conducted in order to minimise
245 unexpected side-effects both now and in the future. Care has also
246 been taken to ensure the changes are fully backwards compatible with
247 all previous tunnelling behaviours.
249 The ECN protocol allows a forwarding element to notify the onset of
250 congestion of its resources without having to drop packets. Instead
251 it can explicitly mark a proportion of packets by setting the
252 congestion experienced (CE) codepoint in the 2-bit ECN field in the
253 IP header (see Table 1 for a recap of the ECN codepoints).
255 +------------------+----------------+---------------------------+
256 | Binary codepoint | Codepoint name | Meaning                   |
257 +------------------+----------------+---------------------------+
258 | 00               | Not-ECT        | Not ECN-capable transport |
259 | 01               | ECT(1)         | ECN-capable transport     |
260 | 10               | ECT(0)         | ECN-capable transport     |
261 | 11               | CE             | Congestion experienced    |
262 +------------------+----------------+---------------------------+
264 Table 1: Recap of Codepoints of the ECN Field [RFC3168] in the IP
265 Header
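As an informal illustration of Table 1, the following Python sketch
names the four codepoints and reads or writes the ECN field. It
assumes the field occupies the two least significant bits of the IPv4
TOS octet or IPv6 Traffic Class octet, as defined in [RFC3168]; the
names ECN, ecn_of and with_ecn are hypothetical and are not part of
any specification.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four codepoints of the 2-bit ECN field (Table 1)."""
    NOT_ECT = 0b00  # Not ECN-capable transport
    ECT_1 = 0b01    # ECN-capable transport
    ECT_0 = 0b10    # ECN-capable transport
    CE = 0b11       # Congestion experienced


def ecn_of(tos_octet: int) -> ECN:
    """Return the ECN codepoint carried in a TOS / Traffic Class octet."""
    return ECN(tos_octet & 0b11)


def with_ecn(tos_octet: int, ecn: ECN) -> int:
    """Return the octet with its ECN bits replaced, leaving the DSCP bits alone."""
    return (tos_octet & 0xFC) | ecn
```

For example, with_ecn(0xB8, ECN.CE) marks a packet as CE while leaving
the upper six (DSCP) bits of the octet untouched.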
267 The outer header of an IP packet can encapsulate one (or more)
268 additional IP headers tunnelled within it. A forwarding element that
269 is using ECN to signify congestion will only mark the outer IP header
270 that is immediately visible to it. When a tunnel decapsulator later
271 removes this outer header, it must follow rules to ensure the marking
272 is propagated into the IP header being forwarded onwards, otherwise
273 congestion notifications will disappear into a black hole leading to
274 potential congestion collapse.
276 The rules for constructing the ECN field to be forwarded after tunnel
277 decapsulation ensure this happens, but they are not wholly
278 straightforward, and neither are the rules for encapsulating one IP
279 header in another on entry to a tunnel. The factor that has
280 introduced most complication at both ends of a tunnel has been the
281 possibility that the ECN field might be used as a covert channel to
282 compromise the integrity of an IPsec tunnel.
284 A common use for IPsec is to create a secure tunnel between two
285 secure sites across the public Internet. A field like ECN that can
286 change as it traverses the Internet cannot be covered by IPsec's
287 integrity mechanisms. Therefore, the ECN field might be toggled
288 (with two bits per packet) to communicate between a secure site and
289 someone on the public Internet--a covert channel.
291 Over the years covert channel restrictions have been added to the
292 design of ECN (with consequent backward compatibility complications).
293 However the latest IPsec architecture [RFC4301] takes the view that
294 simplicity is more important than closing off the covert channel
295 threat, which it deems manageable given its bandwidth is limited to
296 two bits per packet.
298 As a result, an unfortunate sequence of standards actions has left us
299 with nearly the worst of all possible combinations of outcomes,
300 despite the best endeavours of everyone concerned. The new IPsec
301 architecture [RFC4301] only updates the earlier specification of ECN
302 tunnelling behaviour [RFC3168] for the case of IPsec tunnels. For
303 the case of non-IPsec tunnels the earlier RFC3168 specification still
304 applies. At the time RFC3168 was standardised, covert channels
305 through the ECN field were restricted, whether or not IPsec was being
306 used. The perverse position now is that non-IPsec tunnels restrict
307 covert channels, while IPsec tunnels don't.
309 Actually, this statement needs some qualification. IPsec tunnels
310 leave the ECN covert channel unrestricted only at the ingress. At the
311 tunnel egress, the presumption that the ECN covert channel should be
312 restricted has not been removed from any tunnelling specifications,
313 whether IPsec or not.
315 Now that these historic 2-bit covert channel constraints are impeding
316 the introduction of PCN, this specification is designed to remove
317 them and at the same time streamline the whole ECN behaviour for the
318 future.
320 1.1. Scope
322 This document only concerns wire protocol processing at tunnel
323 endpoints and makes no changes or recommendations concerning
324 algorithms for congestion marking or congestion response.
326 This document specifies common, default ECN field processing at
327 encapsulation and decapsulation for any IP in IP tunnelling. It
328 applies irrespective of whether IPv4 or IPv6 is used for either of
329 the inner and outer headers. It applies to all Diffserv per-hop
330 behaviours (PHBs), unless stated otherwise in the specification of a
331 PHB. It is intended to be a good trade-off between somewhat
332 conflicting security, control and management requirements.
334 Nonetheless, if necessary, an alternate congestion encapsulation
335 behaviour can be introduced as part of the definition of an alternate
336 congestion marking scheme used by a specific Diffserv PHB (see S.5 of
337 [RFC3168] and [RFC4774]). When designing such new encapsulation
338 schemes, the principles in Section 4.3 should be followed as closely
339 as possible. There is no requirement for a PHB to state anything
340 about ECN tunnelling behaviour if the new default behaviour is
341 sufficient.
343 [RFC2983] is a comprehensive primer on differentiated services and
344 tunnels. Given ECN raises similar issues to differentiated services
345 when interacting with tunnels, useful concepts introduced in RFC2983
346 are used throughout, with brief recaps of the explanations where
347 necessary.
349 1.2. Document Roadmap
351 The body of the document focuses solely on standards actions
352 impacting implementation. Appendices record the analysis that
353 motivates and justifies these actions. The whole document is
354 organised as follows:
356 o Section 3 recaps relevant existing RFCs and explains exactly why
357 changes are needed, referring to Appendix D and Appendix E in
358 order to explain in detail why current tunnelling behaviours
359 impede PCN deployment, at egress and ingress respectively.
361 o Section 4 uses precise standards terminology to specify the new
362 ECN tunnelling behaviours. It refers to Appendix A for analysis
363 of the trade-offs between security, control and management design
364 constraints that led to these particular standards actions.
366 o Extending the new IPsec tunnel ingress behaviour to all IP in IP
367 tunnels requires consideration of backwards compatibility, which
368 is covered in Section 5 and detailed changes from earlier RFCs are
369 brought together in Section 6.
371 o Finally, a number of security considerations are discussed and
372 conclusions are drawn.
374 o Additional specialist issues are deferred to appendices in
375 addition to those already referred to above, in particular
376 Appendix B discusses specialist tunnelling issues that could arise
377 when ECN is fed back to a load regulation function on a middlebox,
378 rather than at the source of the path.
380 2. Requirements Language
382 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
383 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
384 document are to be interpreted as described in RFC 2119 [RFC2119].
386 3. Summary of Pre-Existing RFCs
388 This section is informative, not normative. It merely recaps pre-
389 existing RFCs to help motivate changing these behaviours. Earlier
390 relevant RFCs that were either experimental or incomplete with
391 respect to ECN tunnelling (RFC2481, RFC2401 and RFC2003) are not
392 discussed, although the backwards compatibility considerations in
393 Section 5 take them into account. The question of whether tunnel
394 implementations used in the Internet comply with any of these RFCs is
395 also not discussed.
397 3.1. Encapsulation at Tunnel Ingress
399 The controversy at tunnel ingress has been over whether to propagate
400 information about congestion experienced on the path upstream of the
401 tunnel ingress into the outer header of the tunnel.
403 Specifically, RFC3168 says that, if a tunnel fully supports ECN
404 (termed a 'full-functionality' ECN tunnel in [RFC3168]), the tunnel
405 ingress must not copy a CE marking from the inner header into the
406 outer header that it creates. Instead the tunnel ingress must set
407 the outer header to ECT(0) (i.e. codepoint 10) if the ECN field is
408 marked CE (codepoint 11) in the arriving IP header. We term this
409 'resetting' a CE codepoint.
411 However, the new IPsec architecture in [RFC4301] reverses this rule,
412 stating that the tunnel ingress must simply copy the ECN field from
413 the arriving to the outer header. The main purpose of the present
414 specification is to carry the new behaviour of IPsec over to all IP
415 in IP tunnels, so all tunnel ingress nodes consistently copy the ECN
416 field.
418 RFC3168 also provided a Limited Functionality mode that turns off ECN
419 processing over the scope of the tunnel. This is necessary if the
420 ingress does not know whether the tunnel egress supports propagation
421 of ECN markings. Neither Limited Functionality mode nor Full
422 Functionality mode is used in RFC4301 IPsec.
424 These pre-existing behaviours are summarised in Figure 1.
426 +-----------------+-----------------------------------------------+
427 | Incoming Header | Outgoing Outer Header                         |
428 | (also equal to  +---------------+---------------+---------------+
429 | Outgoing Inner  | RFC3168 ECN   | RFC3168 ECN   | RFC4301 IPsec |
430 | Header)         | Limited       | Full          |               |
431 |                 | Functionality | Functionality |               |
432 +-----------------+---------------+---------------+---------------+
433 | Not-ECT         | Not-ECT       | Not-ECT       | Not-ECT       |
434 | ECT(0)          | Not-ECT       | ECT(0)        | ECT(0)        |
435 | ECT(1)          | Not-ECT       | ECT(1)        | ECT(1)        |
436 | CE              | Not-ECT       | ECT(0)        | CE            |
437 +-----------------+---------------+---------------+---------------+
439 Figure 1: IP in IP Encapsulation: Recap of Pre-existing Behaviours
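The three columns of Figure 1 can be sketched informally in Python as
follows. This is only an illustration of the table, not an
implementation of any of the cited RFCs; the function name
pre_existing_outer_ecn and the behaviour labels are hypothetical, and
codepoints are represented by their binary values from Table 1.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def pre_existing_outer_ecn(inner: int, behaviour: str) -> int:
    """Outer ECN field set by a pre-existing tunnel ingress (Figure 1).

    The inner header is forwarded unchanged in every case.
    """
    if behaviour == "rfc3168-limited":
        # Limited functionality: ECN is turned off across the tunnel.
        return NOT_ECT
    if behaviour == "rfc3168-full":
        # Full functionality: copy, except that CE is 'reset' to ECT(0).
        return ECT0 if inner == CE else inner
    if behaviour == "rfc4301":
        # RFC4301 IPsec: simply copy the arriving ECN field.
        return inner
    raise ValueError(f"unknown behaviour: {behaviour!r}")
```

The only row on which the full-functionality and IPsec columns differ
is the CE row, which is exactly the 'resetting' behaviour discussed
above.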
441 For encapsulation, the specification in Section 4 below brings all IP
442 in IP tunnels (v4 or v6) into line with the way IPsec tunnels
443 [RFC4301] now construct the ECN field, except where a legacy tunnel
444 egress might not understand ECN at all. This removes the now
445 redundant full functionality mode in the middle column of Figure 1.
446 Wherever possible it ensures that the outer header reveals any
447 congestion experienced so far on the whole path, not just since the
448 last tunnel ingress.
450 Why does it matter if we have different ECN encapsulation behaviours
451 for IPsec and non-IPsec tunnels? A general answer is that gratuitous
452 inconsistency constrains the available design space and makes it
453 harder to design networks and new protocols that work predictably.
455 But there is also a specific need not to reset the CE codepoint. The
456 standards track proposal for excess rate pre-congestion notification
457 (PCN [I-D.ietf-pcn-marking-behaviour]) only works correctly in the
458 presence of RFC4301 IPsec encapsulation or [RFC5129] MPLS
459 encapsulation, but not with RFC3168 IP in IP encapsulation
460 (Appendix E explains why). The PCN architecture
461 [I-D.ietf-pcn-architecture] states that the regular RFC3168 rules for
462 IP in IP tunnelling of the ECN field should not be used for PCN. But
463 if non-IPsec tunnels are already present within a network to which
464 PCN is being added, that is not particularly helpful advice.
466 The present specification provides a clean solution to this problem,
467 so that network operators who want to use PCN and tunnels can specify
468 that all tunnel endpoints in a PCN region need to be upgraded to
469 comply with this specification. Also, whether using PCN or not, as
470 more tunnel endpoints comply with this specification, it should make
471 ECN behaviour simpler, faster and more predictable.
473 To ensure copying rather than resetting CE on ingress will not cause
474 unintended side-effects, Appendix A assesses whether either harms any
475 security, control or management functions. It finds that resetting
476 CE makes life difficult in a number of directions, while copying CE
477 harms nothing (other than opening a low bit-rate covert channel
478 vulnerability which the IETF Security Area now deems manageable).
480 3.2. Decapsulation at Tunnel Egress
482 Both RFC3168 and RFC4301 specify the decapsulation behaviour
483 summarised in Figure 2. The ECN field in the outgoing header is set
484 to the codepoint at the intersection of the appropriate incoming
485 inner header (row) and incoming outer header (column).
486 +------------------+----------------------------------------------+
487 | Incoming Inner   | Incoming Outer Header                        |
488 | Header           +---------+------------+------------+----------+
489 |                  | Not-ECT | ECT(0)     | ECT(1)     | CE       |
490 +------------------+---------+------------+------------+----------+
491 | Not-ECT          | Not-ECT | drop(!!!)  | drop(!!!)  | drop(!!!)|
492 | ECT(0)           | ECT(0)  | ECT(0)     | ECT(0)     | CE       |
493 | ECT(1)           | ECT(1)  | ECT(1)     | ECT(1)     | CE       |
494 | CE               | CE      | CE         | CE         | CE       |
495 +------------------+---------+------------+------------+----------+
496                    | Outgoing Header                              |
497                    +----------------------------------------------+
499 Figure 2: IP in IP Decapsulation: Recap of Pre-existing Behaviour
501 The behaviour in the table derives from the logic given in RFC3168,
502 briefly recapped as follows:
504 o On decapsulation, if the inner ECN field is Not-ECT but the outer
505 ECN field is anything except Not-ECT the decapsulator must drop
506 the packet. Drop is mandated because known legal protocol
507 transitions should not be able to lead to these cases (indicated
508 in the table by '(!!!)'), therefore the decapsulator may also
509 raise an alarm;
511 o In all other cases, the outgoing ECN field is set to the more
512 severe marking of the outer and inner ECN fields, where the
513 ranking of severity from highest to lowest is CE, ECT, Not-ECT;
515 o ECT(0) and ECT(1) are considered of equal severity (indicated by
516 just 'ECT' in the rank order above). Where the inner and outer
517 ECN fields are both ECT but they differ, the packet is forwarded
518 with the codepoint of the inner ECN field, which prevents ECT
519 codepoints being used for a covert channel.
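The three rules above reproduce every cell of Figure 2. A minimal
Python sketch, assuming codepoints are represented by their binary
values from Table 1 and that a dropped packet is signalled by
returning None (both choices are illustrative, as is the function
name pre_existing_decap):

```python
from typing import Optional

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def pre_existing_decap(inner: int, outer: int) -> Optional[int]:
    """Outgoing ECN field under the pre-existing rules, or None for drop."""
    if inner == NOT_ECT:
        # Any outer marking other than Not-ECT is an illegal
        # combination ('(!!!)' in Figure 2): the packet is dropped.
        return NOT_ECT if outer == NOT_ECT else None
    if CE in (inner, outer):
        # CE is the most severe marking and always wins.
        return CE
    # Inner is ECT(0) or ECT(1) and outer is Not-ECT or ECT: the inner
    # codepoint is forwarded, so the outer ECT value is never revealed.
    return inner
```

Note that the last branch discards any difference between ECT(0) and
ECT(1) in the outer header, which is the first of the two problems
addressed by the new rules.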
521 The specification for decapsulation in Section 4 fixes two problems
522 with this pre-existing behaviour:
524 o Firstly, forwarding the codepoint of the inner header in the cases
525 where both inner and outer are different values of ECT effectively
526 implies that any distinction between ECT(0) and ECT(1) cannot be
527 introduced in the future wherever a tunnel might be deployed.
528 Therefore, the currently specified tunnel decapsulation behaviour
529 unnecessarily wastes one of four codepoints (effectively wasting
530 half a bit) in the IP (v4 & v6) header. As explained in
531 Appendix A.1, the original reason for not using the outer ECT
532 codepoints for onward forwarding was to limit the covert channel
533 across a decapsulator to 1 bit per packet. However, now that the
534 IETF Security Area has deemed that a 2-bit covert channel through
535 an encapsulator is a manageable risk, the same should be true for
536 a decapsulator.
538 As well as being a general future-proofing issue, this problem is
539 immediately pressing for standardisation of pre-congestion
540 notification (PCN). PCN solutions generally require three
541 encoding states in addition to Not-ECT: one for 'not marked' and
542 two increasingly severe levels of marking. Although the ECN field
543 gives sufficient codepoints for these three states, they cannot
544 all be used for PCN because a change between ECT(0) and ECT(1) in
545 any tunnelled packet would be lost when the outer header was
546 decapsulated, dangerously discarding congestion signalling. A
547 number of wasteful or convoluted work-rounds to this problem are
548 being considered for standardisation by the PCN working group (see
549 Appendix D), but by far the simplest approach is just to remove
550 the covert channel blockages from tunnelling behaviour, which are
551 now deemed unnecessary anyway. Not only will this streamline PCN
552 standardisation, but it could also streamline other future uses of
553 these codepoints.
555 o Secondly, mandating drop is not always a good idea just because a
556 combination of headers seems invalid. There are many cases where
557 it has become nearly impossible to deploy new standards because
558 legacy middleboxes drop packets carrying header values they don't
559 expect. Where possible, the new decapsulation behaviour specified
560 in Section 4 below is more liberal in its response to unexpected
561 combinations of headers.
563 4. New ECN Tunnelling Rules
565 The ECN tunnel processing rules below in Section 4.1 (ingress
566 encapsulation) and Section 4.2 (egress decapsulation) are the default
567 for a packet with any DSCP. If required, different ECN encapsulation
568 rules MAY be defined as part of the definition of an appropriate
569 Diffserv PHB using the guidelines that follow in Section 4.3.
570 However, the deployment burden of handling exceptional PHBs in
571 implementations of all affected tunnels and lower layer link
572 protocols should not be underestimated.
574 4.1. Default Tunnel Ingress Behaviour
576 A tunnel ingress compliant with this specification MUST implement a
577 `normal mode'. It might also need to implement a `compatibility
578 mode' for backward compatibility with legacy tunnel egresses that do
579 not understand ECN (see Section 5 for when compatibility mode is
580 required). Note that these are modes of the ingress tunnel endpoint
581 only, not the tunnel as a whole.
583 Whatever the mode, the tunnel ingress forwards the inner header
584 without changing the ECN field. In normal mode a tunnel ingress
585 compliant with this specification MUST construct the outer
586 encapsulating IP header by copying the 2-bit ECN field of the
587 arriving IP header. In compatibility mode it clears the ECN field in
588 the outer header to the Not-ECT codepoint. These rules are tabulated
589 for convenience in Figure 3.
590 +-----------------+-------------------------------+
591 | Incoming Header |     Outgoing Outer Header     |
592 | (also equal to  +---------------+---------------+
593 | Outgoing Inner  | Compatibility |    Normal     |
594 |     Header)     |     Mode      |     Mode      |
595 +-----------------+---------------+---------------+
596 |     Not-ECT     |    Not-ECT    |    Not-ECT    |
597 |     ECT(0)      |    Not-ECT    |    ECT(0)     |
598 |     ECT(1)      |    Not-ECT    |    ECT(1)     |
599 |       CE        |    Not-ECT    |      CE       |
600 +-----------------+---------------+---------------+
602 Figure 3: New IP in IP Encapsulation Behaviours
604 Compatibility mode is the same per packet behaviour as the ingress
605 end of RFC3168's limited functionality mode. Normal mode is the same
606 per packet behaviour as the ingress end of RFC4301 IPsec.
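
The encapsulation rules of Figure 3 can be illustrated with a short
sketch. It is not part of the specification. The names ECN,
IngressMode and encapsulate are assumptions made purely for
illustration; the 2-bit codepoint values are those assigned in
RFC3168.

```python
from enum import Enum


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field (values per RFC3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class IngressMode(Enum):
    """Modes of the tunnel ingress only (hypothetical names)."""
    NORMAL = "normal"
    COMPATIBILITY = "compatibility"


def encapsulate(arriving: ECN, mode: IngressMode) -> tuple[ECN, ECN]:
    """Return (inner, outer) ECN fields for an encapsulated packet.

    The inner header is always forwarded unchanged. In normal mode the
    outer ECN field is a copy of the arriving field; in compatibility
    mode it is cleared to Not-ECT.
    """
    inner = arriving
    outer = arriving if mode is IngressMode.NORMAL else ECN.NOT_ECT
    return inner, outer
```

For example, encapsulate(ECN.CE, IngressMode.NORMAL) yields CE in both
headers, whereas in compatibility mode the same packet leaves with CE
in the inner header and Not-ECT in the outer.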
608 4.2. Default Tunnel Egress Behaviour
610 To decapsulate the inner header at the tunnel egress, a compliant
611 tunnel egress MUST set the outgoing ECN field to the codepoint at the
612 intersection of the appropriate incoming inner header (row) and outer
613 header (column) in Figure 4.
615 +------------------+----------------------------------------------+
616 |  Incoming Inner  |            Incoming Outer Header             |
617 |      Header      +---------+------------+------------+----------+
618 |                  | Not-ECT |   ECT(0)   |   ECT(1)   |    CE    |
619 +------------------+---------+------------+------------+----------+
620 |     Not-ECT      | Not-ECT |Not-ECT(!!!)|  drop(!!!) | drop(!!!)|
621 |      ECT(0)      |  ECT(0) |   ECT(0)   |   ECT(1)   |    CE    |
622 |      ECT(1)      |  ECT(1) | ECT(1)(!!!)|   ECT(1)   |    CE    |
623 |        CE        |    CE   |     CE     |    CE(!!!) |    CE    |
624 +------------------+---------+------------+------------+----------+
625                    |               Outgoing Header                |
626                    +----------------------------------------------+
628 Figure 4: New IP in IP Decapsulation Behaviour
630 This table for decapsulation behaviour is derived from the following
631 logic:
633 o If the inner ECN field is Not-ECT the decapsulator MUST NOT
634 propagate any other ECN codepoint in the outer header onwards.
635 This is because the inner Not-ECT marking is set by transports
636 that would not understand the ECN protocol. Instead:
638 * If the inner ECN field is Not-ECT and the outer ECN field is
639 ECT(1) or CE the decapsulator MUST drop the packet.
640 Reasoning: these combinations of codepoints either imply some
641 illegal protocol transition has occurred within the tunnel, or
642 that some locally defined mechanism is being used within the
643 tunnel that might be signalling congestion. In either case,
644 the only appropriate signal to the transport is a packet drop.
645 It would have been nice to allow packets with ECT(1) in the
646 outer to be forwarded, but drop has had to be mandated in case
647 future multi-level ECN schemes are defined. Then ECT(1) and CE
648 can be used in the future to signify two levels of congestion
649 severity.
651 * If the inner ECN field is Not-ECT and the outer ECN field is
652 ECT(0) or Not-ECT the decapsulator MUST forward the packet with
653 the ECN field cleared to Not-ECT.
654 Reasoning: Although no known legal protocol transition would
655 lead to ECT(0) in the outer and Not-ECT in the inner, no known
656 or proposed protocol uses ECT(0) as a congestion signal either.
657 Therefore in this case the packet can be forwarded rather than
658 dropped, which will allow future standards actions to use this
659 combination.
661 o In all other cases, the outgoing ECN field is set to the more
662 severe marking of the outer and inner ECN fields, where the
663 ranking of severity from highest to lowest is CE, ECT(1), ECT(0),
664 Not-ECT;
666 o There are cases where no currently legal transition in any current
667 or previous ECN tunneling specification would result in certain
668 combinations of inner and outer ECN fields. These cases are
669 indicated in Figure 4 by '(!!!)'. In these cases, the
670 decapsulator SHOULD log the event and MAY also raise an alarm, but
671 not so often that the illegal combinations would amplify into a
672 flood of alarm messages.
674 The above logic allows for ECT(0) and ECT(1) to both represent the
675 same severity of congestion marking (e.g. "not congestion marked").
676 But it also allows future schemes to be defined where ECT(1) is a
677 more severe marking than ECT(0). This approach is discussed in
678 Appendix D and in the discussion of the ECN nonce [RFC3540] in
679 Section 8.
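
The decapsulation logic above can be sketched as follows. This is an
illustrative sketch only, not a normative implementation. The names
ECN, SEVERITY, UNEXPECTED and decapsulate are assumptions introduced
here; a drop is represented by None, and the second return value tells
the caller that the combination is one marked '(!!!)' in Figure 4, so
that it can log the event (rate limiting of any alarm is left to the
caller).

```python
from enum import Enum
from typing import Optional


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field (values per RFC3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Severity ranking from lowest to highest: Not-ECT, ECT(0), ECT(1), CE.
SEVERITY = {ECN.NOT_ECT: 0, ECN.ECT_0: 1, ECN.ECT_1: 2, ECN.CE: 3}

# (inner, outer) combinations marked '(!!!)' in Figure 4.
UNEXPECTED = {
    (ECN.NOT_ECT, ECN.ECT_0),
    (ECN.NOT_ECT, ECN.ECT_1),
    (ECN.NOT_ECT, ECN.CE),
    (ECN.ECT_1, ECN.ECT_0),
    (ECN.CE, ECN.ECT_1),
}


def decapsulate(inner: ECN, outer: ECN) -> tuple[Optional[ECN], bool]:
    """Return (outgoing ECN field or None for drop, unexpected flag)."""
    unexpected = (inner, outer) in UNEXPECTED
    if inner is ECN.NOT_ECT:
        # The transport does not understand ECN: never propagate the outer.
        if outer in (ECN.ECT_1, ECN.CE):
            return None, unexpected          # drop
        return ECN.NOT_ECT, unexpected       # forward as Not-ECT
    # All other cases: forward the more severe of inner and outer.
    outgoing = inner if SEVERITY[inner] >= SEVERITY[outer] else outer
    return outgoing, unexpected
```

Iterating decapsulate over all sixteen (inner, outer) pairs reproduces
the cells of Figure 4.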
681 4.3. Design Principles for Future Non-Default Schemes
683 This section is informative, not normative.
685 Section 5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to
686 'switch in' different behaviours for marking the ECN field, just as
687 it switches in different per-hop behaviours (PHBs) for scheduling.
688 Therefore here we give guidance for designing possibly different
689 marking schemes.
691 In one word the guidance is "Don't". If a scheme requires tunnels to
692 implement special processing of the ECN field for certain DSCPs, it
693 is highly unlikely that every implementer of every tunnel will want
694 to add the required exception and that operators will want to deploy
695 the required configuration options. Therefore it is highly likely
696 that some tunnels within a network will not implement this special
697 case. Therefore, designers should avoid non-default tunnelling
698 schemes if at all possible.
700 That said, if a non-default scheme for processing the ECN field is
701 really required, the following guidelines may prove useful in its
702 design:
704 o For any new scheme, a tunnel ingress should not set the ECN field
705 of the outer header if it cannot guarantee that any corresponding
706 tunnel egress will understand how to handle such an ECN field.
708 o On encapsulation in any new scheme, an outer header capable of
709 carrying congestion markings should reflect accumulated congestion
710 since the last interface designed to regulate load (see
711 Appendix A.2 for the definition of a Load Regulator, which is
712 usually but not always the data source). This implies that new
713 schemes for tunnelling congestion notification should copy
714 congestion notification into the outer header of each new
715 encapsulating header that supports it.
717 Reasoning: The constraints from the three perspectives of
718 security, control and management in Appendix A are somewhat in
719 tension as to whether a tunnel ingress should copy congestion
720 markings into the outer header it creates or reset them. From the
721 control perspective either copying or resetting works for existing
722 arrangements, but copying has more potential for simplifying
723 control. From the management perspective copying is preferable.
724 From the security perspective resetting is preferable but copying
725 is now considered acceptable given the bandwidth of a 2-bit covert
726 channel can be managed. Therefore, on balance, copying is simpler
727 and more useful than resetting and does minimal harm.
729 o For any new scheme, a tunnel egress should not forward any ECN
730 codepoint if the arriving inner header implies the transport will
731 not understand how to process it.
733 o On decapsulation in any new scheme, if a combination of inner and
734 outer headers is encountered that should not have been possible,
735 this event should be logged and an alarm raised. But the packet
736 should still be forwarded with a safe codepoint setting if at all
737 possible. This increases the chances of 'forward compatibility'
738 with possible future protocol extensions.
740 o On decapsulation in any new scheme, the ECN field that the tunnel
741 egress forwards should reflect the more severe congestion marking
742 of the arriving inner and outer headers.
744 5. Backward Compatibility
746 Note: in RFC3168, a whole tunnel was considered in one of two modes:
747 limited functionality or full functionality. The new modes defined
748 in this specification are only modes of the tunnel ingress. The new
749 tunnel egress behaviour has only one mode and doesn't need to know
750 what mode the ingress is in.
752 5.1. Non-Issues Upgrading Any Tunnel Decapsulation
754 This specification only changes the egress per-packet calculation of
755 the ECN field for combinations of inner and outer headers that have
756 so far not been used in any IETF protocols. Therefore, a tunnel
757 egress complying with any previous specification (RFC4301, both modes
758 of RFC3168, both modes of RFC2481, RFC2401 and RFC2003) can be
759 upgraded to comply with this new decapsulation specification without
760 any backwards compatibility issues.
762 The proposed tunnel egress behaviour also requires no additional mode
763 or option configuration at the ingress or egress nor any additional
764 negotiation with the ingress. A compliant tunnel egress merely needs
765 to implement the one behaviour in Section 4. The reduction to one
766 mode at the egress has no backwards compatibility issues, because
767 previously the egress produced the same output whichever mode the
768 tunnel was in.
770 These new decapsulation rules have been defined in such a way that
771 congestion control will still work safely if any of the earlier
772 versions of ECN processing are used unilaterally at the encapsulating
773 ingress of the tunnel (any of RFC2003, RFC2401, either mode of
774 RFC2481, either mode of RFC3168, RFC4301 and this present
775 specification). If a tunnel ingress tries to negotiate to use
776 limited functionality mode or full functionality mode [RFC3168], a
777 decapsulating tunnel egress compliant with this specification MUST
778 agree to either request, as its behaviour will be the same in both
779 cases.
781 For 'forward compatibility', a compliant tunnel egress SHOULD raise a
782 warning about any requests to enter modes it doesn't recognise, but
783 it can continue operating. If no ECN-related mode is requested, a
784 compliant tunnel egress can continue without raising any error or
785 warning as its egress behaviour is compatible with all the legacy
786 ingress behaviours that don't negotiate capabilities.
788 5.2. Non-Issues for RFC4301 IPsec Encapsulation
790 The new normal mode of ingress behaviour defined above (Section 4.1)
791 brings all IP in IP tunnels into line with [RFC4301]. If one end of
792 an IPsec tunnel is compliant with [RFC4301], the other end is
793 guaranteed to also be RFC4301-compliant (there could be corner cases
794 where manual keying is used, but they will be set aside here).
795 Therefore the new normal ingress behaviour introduces no backward
796 compatibility issues with IKEv2 [RFC4306] IPsec [RFC4301] tunnels, and
797 no need for any new modes, options or configuration.
799 5.3. Upgrading Other IP in IP Tunnel Encapsulators
801 At the tunnel ingress, this specification effectively extends the
802 scope of RFC4301's ingress behaviour to any IP in IP tunnel. If any
803 other IP in IP tunnel ingress (i.e. not RFC4301 IPsec) is upgraded to
804 be compliant with this specification, it has to cater for the
805 possibility that it is talking to a legacy tunnel egress that may not
806 know how to process the ECN field. If ECN capable outer headers were
807 sent towards a legacy (e.g. [RFC2003]) egress, it would most likely
808 simply disregard the outer headers, dangerously discarding
809 information about congestion experienced within the tunnel. ECN-
810 capable traffic sources would not see any congestion feedback and
811 instead continually ratchet up their share of the bandwidth without
812 realising that cross-flows from other ECN sources were continually
813 having to ratchet down.
815 This specification introduces no new backward compatibility issues
816 when a compliant ingress talks with a legacy egress, but it has to
817 provide similar safeguards to those already defined in RFC3168.
818 Therefore, to comply with this specification, a tunnel ingress that
819 does not always know the ECN capability of its tunnel egress MUST
820 implement a 'normal' mode and a 'compatibility' mode, and for safety
821 it MUST initiate each negotiated tunnel in compatibility mode.
823 However, a tunnel ingress can be compliant even if it only implements
824 the 'normal mode' of encapsulation behaviour, but only as long as it
825 is designed or configured so that all possible tunnel egress nodes it
826 will ever talk to will have at least full ECN functionality
827 (complying with either RFC3168 full functionality mode, RFC4301 or
828 this present specification).
830 Before switching to normal mode, a compliant tunnel ingress that does
831 not know the egress ECN capability MUST negotiate with the tunnel
832 egress. If the egress says it is compliant with this specification
833 or with RFC3168 full functionality mode, the ingress puts itself into
834 normal mode. If the egress denies compliance with both of these or
835 doesn't understand the question, the tunnel ingress MUST remain in
836 compatibility mode.
838 The encapsulation rules for normal mode and compatibility mode are
839 defined in Section 4 (i.e. header copying or zeroing respectively).
841 An ingress cannot claim compliance with this specification simply by
842 disabling ECN processing across the tunnel (only implementing
843 compatibility mode). Although such a tunnel ingress is at least safe
844 with the ECN behaviour of any egress it may encounter (any of
845 RFC2003, RFC2401, either mode of RFC2481 and RFC3168's limited
846 functionality mode), it doesn't meet the aim of introducing ECN.
848 Therefore, a compliant tunnel ingress MUST at least implement `normal
849 mode' and, if it might be used with arbitrary tunnel egress nodes, it
850 MUST also implement `compatibility mode'.
852 Implementation note: if a compliant node is the ingress for multiple
853 tunnels, a mode setting will need to be stored for each tunnel
854 ingress. However, if a node is the egress for multiple tunnels, none
855 of the tunnels will need to store a mode setting, because a compliant
856 egress can only be in one mode.
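
The ingress mode rules of this section, together with the
implementation note above, might be sketched as a per-tunnel table.
This sketch is illustrative only. This specification does not define
the negotiation messages, so the EgressReply values, the class
IngressModeTable and its method names are all hypothetical.

```python
from enum import Enum


class IngressMode(Enum):
    COMPATIBILITY = "compatibility"
    NORMAL = "normal"


class EgressReply(Enum):
    """Hypothetical outcomes of the capability negotiation."""
    THIS_SPEC = "this-spec"
    RFC3168_FULL = "rfc3168-full"
    RFC3168_LIMITED = "rfc3168-limited"
    NOT_UNDERSTOOD = "not-understood"


# Replies that allow the ingress to switch to normal mode.
ECN_CAPABLE_REPLIES = {EgressReply.THIS_SPEC, EgressReply.RFC3168_FULL}


class IngressModeTable:
    """One mode setting per tunnel for which this node is the ingress."""

    def __init__(self) -> None:
        self._modes: dict[str, IngressMode] = {}

    def open_tunnel(self, tunnel_id: str) -> None:
        # Each negotiated tunnel starts in compatibility mode for safety.
        self._modes[tunnel_id] = IngressMode.COMPATIBILITY

    def on_egress_reply(self, tunnel_id: str, reply: EgressReply) -> None:
        # Switch to normal mode only on a positive capability reply;
        # otherwise the tunnel remains in compatibility mode.
        if reply in ECN_CAPABLE_REPLIES:
            self._modes[tunnel_id] = IngressMode.NORMAL

    def mode(self, tunnel_id: str) -> IngressMode:
        return self._modes[tunnel_id]
```

No corresponding table is needed at the egress, because a compliant
egress has only one behaviour.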
858 6. Changes from Earlier RFCs
860 On encapsulation, the rule that a normal mode tunnel ingress MUST
861 copy any ECN field into the outer header is a change to the ingress
862 behaviour of RFC3168, but it is the same as the rules for IPsec
863 tunnels in RFC4301.
865 On decapsulation, the rules for calculating the outgoing ECN field at
866 a tunnel egress are similar to the full functionality mode of ECN in
867 RFC3168 and to RFC4301, with the following exceptions:
869 o The outer, not the inner, is propagated when the outer is ECT(1)
870 and the inner is ECT(0);
872 o A packet with Not-ECT in the inner may be forwarded as Not-ECT
873 rather than dropped, if the outer is ECT(0);
875 o The following extra illegal combinations have been identified,
876 which may require logging and/or an alarm: outer ECT(1) with inner
877 CE; outer ECT(0) with inner ECT(1).
879 The rules for how a tunnel establishes whether the egress has full
880 functionality ECN capabilities are an update to RFC3168. For all the
881 typical cases, RFC4301 is not updated by the ECN capability check in
882 this specification, because a typical RFC4301 tunnel ingress will
883 have already established that it is talking to an RFC4301 tunnel
884 egress (e.g. if it uses IKEv2). However, there may be some corner
885 cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with
886 an egress with limited functionality ECN handling. Strictly, for
887 such corner cases, the requirement to use compatibility mode in this
888 specification updates RFC4301, but this is unlikely to be necessary
889 to implement for this corner case in practice.
891 The optional ECN Tunnel field in the IPsec security association
892 database (SAD) and the optional ECN Tunnel Security Association
893 Attribute defined in RFC3168 are no longer needed. The security
894 association (SA) has no policy on ECN usage, because all RFC4301
895 tunnels now support ECN without any policy choice.
897 RFC3168 defines a (required) limited functionality mode and an
898 (optional) full functionality mode for a tunnel, but RFC4301 doesn't
899 need modes. In this specification only the ingress might need two
900 modes: a normal mode (required) and a compatibility mode (required in
901 some scenarios, optional in others). The egress needs only one mode
902 which correctly handles any ingress ECN behaviour.
904 Additional changes to the RFC Index (to be removed by the RFC Editor):
906 In the RFC index, RFC3168 should be identified as an update to
907 RFC2003. RFC4301 should be identified as an update to RFC3168.
909 This specification updates RFC3168 and RFC4301.
911 7. IANA Considerations
913 This memo includes no request to IANA.
915 8. Security Considerations
917 Appendix A.1 discusses the security constraints imposed on ECN tunnel
918 processing. The new rules for ECN tunnel processing (Section 4)
919 trade off between security (covert channels) and congestion
920 monitoring & control. In fact, ensuring congestion markings are not
921 lost is itself another aspect of security, because if we allowed
922 congestion notification to be lost, any attempt to enforce a response
923 to congestion would be much harder.
925 If alternate congestion notification semantics are defined for a
926 certain PHB (e.g. the pre-congestion notification architecture
927 [I-D.ietf-pcn-architecture]), the scope of the alternate semantics
928 might typically be bounded by the limits of a Diffserv region or
929 regions, as envisaged in [RFC4774]. The inner headers in tunnels
930 crossing the boundary of such a Diffserv region but ending within the
931 region can potentially leak the external congestion notification
932 semantics into the region, or leak the internal semantics out of the
933 region. [RFC2983] discusses the need for Diffserv traffic
934 conditioning to be applied at these tunnel endpoints as if they are
935 at the edge of the Diffserv region. Similar concerns apply to any
936 processing or propagation of the ECN field at the edges of a Diffserv
937 region with alternate ECN semantics. Such edge processing must also
938 be applied at the endpoints of tunnels with one end inside and the
939 other outside the domain. [I-D.ietf-pcn-architecture] gives specific
940 advice on this for the PCN case, but other definitions of alternate
941 semantics will need to discuss the specific security implications in
942 each case.
944 With the decapsulation rules as they stood in RFC3168 and RFC4301, a
945 small part of the protection of the ECN nonce [RFC3540] was
946 compromised. The new decapsulation rules do not solve this problem.
948 The minor problem is as follows: The ECN nonce was defined to enable
949 the data source to detect if a CE marking had been applied then
950 subsequently removed. The source could detect this by weaving a
951 pseudo-random sequence of ECT(0) and ECT(1) values into a stream of
952 packets, which is termed an ECN nonce. By the decapsulation rules in
953 RFC3168 and RFC4301, if the inner and outer headers carry
954 contradictory ECT values only the inner header is preserved for
955 onward forwarding. So if a CE marking added to the outer ECN field
956 in a tunnel has been illegally (or accidentally) suppressed by a
957 subsequent node in the tunnel, the decapsulator will revert the ECN
958 field to its value before tampering, hiding all evidence of the crime
959 from the onward feedback loop. We chose not to close this minor
960 loophole for all the following reasons:
962 1. This loophole is only applicable in the corner case where the
963 attacker controls a network node downstream of a congested node
964 in the same tunnel;
966 2. In tunnelling scenarios, the ECN nonce is already vulnerable to
967 suppression by nodes downstream of a congested node in the same
968 tunnel, if they can copy the ECT value in the inner header to the
969 outer header (any node in the tunnel can do this if the inner
970 header is not encrypted, and an IPsec tunnel egress can do it
971 whether or not the tunnel is encrypted);
973 3. Although the new decapsulation behaviour removes evidence of
974 congestion suppression from the onward feedback loop, the
975 decapsulator itself can at least detect that congestion within
976 the tunnel has been suppressed;
978 4. The ECN nonce [RFC3540] currently has experimental status and
979 there has been no evidence that anyone has implemented it beyond
980 the author's prototype.
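
The following sketch illustrates one instance of the loophole and of
reason 3 above. It is a simplified model that covers only packets
whose inner header is ECN-capable, and the function names are
assumptions introduced for illustration. The scenario: the source
sends ECT(1) as part of its nonce, the ingress copies it to the outer
header, a congested node marks the outer CE, and a downstream node in
the same tunnel then overwrites the outer with ECT(0).

```python
NOT_ECT, ECT0, ECT1, CE = "Not-ECT", "ECT(0)", "ECT(1)", "CE"

SEVERITY = {NOT_ECT: 0, ECT0: 1, ECT1: 2, CE: 3}

# Combinations marked '(!!!)' in Figure 4 that have an ECN-capable inner.
UNEXPECTED = {(ECT1, ECT0), (CE, ECT1)}


def legacy_decap(inner: str, outer: str) -> str:
    """RFC3168 full functionality / RFC4301 egress, ECN-capable inner only:
    a CE outer is propagated, otherwise the inner header is preserved."""
    return CE if outer == CE else inner


def new_decap(inner: str, outer: str) -> tuple[str, bool]:
    """Egress per Figure 4, ECN-capable inner only.
    Returns (outgoing codepoint, True if the combination is unexpected)."""
    outgoing = inner if SEVERITY[inner] >= SEVERITY[outer] else outer
    return outgoing, (inner, outer) in UNEXPECTED


inner_sent = ECT1          # nonce value chosen by the source
outer_tampered = ECT0      # CE mark applied in the tunnel, then overwritten

legacy_out = legacy_decap(inner_sent, outer_tampered)
new_out, flagged = new_decap(inner_sent, outer_tampered)
```

In both cases the forwarded codepoint is the ECT(1) originally sent,
so the onward feedback loop sees nothing amiss. With the new rules,
however, the flag is set for this combination, so the decapsulator
itself can log the event.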
982 We could have fixed this loophole by specifying that the outer header
983 should always be propagated onwards if inner and outer are both ECT.
984 Although this would close the minor loophole in the nonce, it would
985 raise a minor safety issue if multilevel ECN or PCN were used. A
986 less severe marking in the outer header would override a more severe
987 one in the inner. Both are corner cases so it is difficult to decide
988 which is more important:
990 1. The loophole in the nonce is only for a minor case of one tunnel
991 node attacking another in the same tunnel;
993 2. The severity inversion for multilevel congestion notification
994 would not result from any legal codepoint transition.
996 We decided safety against misconfiguration was slightly more
997 important than securing against an attack that has little, if any,
998 clear motivation.
1000 If a legacy security policy configures a legacy tunnel ingress to
1001 negotiate to turn off ECN processing, a compliant tunnel egress will
1002 agree to a request to turn off ECN processing but it will actually
1003 still copy CE markings from the outer to the forwarded header.
1004 Although the tunnel ingress 'I' in Figure 5 (Appendix A.1) will set
1005 all ECN fields in outer headers to Not-ECT, 'M' could still toggle CE
1006 on and off to communicate covertly with 'B', because we have
1007 specified that 'E' only has one mode regardless of what mode it says
1008 it has negotiated. We could have specified that 'E' should have a
1009 limited functionality mode and check for such behaviour. But we
1010 decided not to add the extra complexity of two modes on a compliant
1011 tunnel egress merely to cater for a legacy security concern that is
1012 now considered manageable.
1014 9. Conclusions
1016 This document updates the ingress tunnelling encapsulation of RFC3168
1017 ECN for all IP in IP tunnels to bring it into line with the new
1018 behaviour in the IPsec architecture of RFC4301. It copies rather
1019 than resets a congestion experienced (CE) marking when creating outer
1020 headers.
1022 It also specifies new rules that update both RFC3168 and RFC4301 for
1023 calculating the outgoing ECN field on tunnel decapsulation. The new
1024 rules update egress behaviour for two specific combinations of inner
1025 and outer header that have no current legal usage, but will now be
1026 possible to use in future standards actions, rather than being wasted
1027 by current tunnelling behaviour.
1029 The new rules propagate changes to the ECN field across tunnel end-
1030 points that were previously blocked due to a perceived covert channel
1031 vulnerability. The new IPsec architecture deems the two-bit covert
1032 channel that the ECN field opens up to be a manageable threat, so these
1033 new rules bring all IP in IP tunnelling into line with this new more
1034 permissive attitude. The result is a single specification for all
1035 future tunnelling of ECN, whether IPsec or not. Then equipment can
1036 be specified against a single ECN behaviour and ECN markings can have
1037 a well-defined meaning wherever they are measured in a network. This
1038 new certainty will enable new uses of the ECN field that would
1039 otherwise be confounded by ambiguity.
1041 The immediate motivation for making these changes is to allow the
1042 introduction of multi-level pre-congestion notification (PCN). But
1043 great care has been taken to ensure the resulting ECN tunnelling
1044 behaviour is simple and generic for other potential future uses.
1046 The change to encapsulation has been analysed from the three
1047 perspectives of security, control and management. They are somewhat
1048 in tension as to whether a tunnel ingress should copy congestion
1049 markings into the outer header it creates or reset them. From the
1050 control perspective either copying or resetting works for existing
1051 arrangements, but copying has more potential for simplifying control
1052 and resetting breaks at least one proposal already on the standards
1053 track. From the management and monitoring perspective copying is
1054 preferable. From the network security perspective (theft of service
1055 etc) copying is preferable. From the information security
1056 perspective resetting is preferable, but the IETF Security Area now
1057 considers copying acceptable given the bandwidth of a 2-bit covert
1058 channel can be managed. Therefore there are no points against
1059 copying and a number against resetting CE on ingress.
1061 The only downside of the changes to decapsulation is that the same
1062 2-bit covert channel is opened up as at the ingress, but this is now
1063 deemed to be a manageable threat. The changes at decapsulation have
1064 been found to be free of any backwards compatibility issues.
1066 10. Acknowledgements
1068 Thanks to Anil Agawaal for pointing out a case where it's safe for a
1069 tunnel decapsulator to forward a combination of headers it doesn't
1070 understand. Thanks to David Black for explaining a better way to
1071 think about function placement and to Louise Burness for a better way
1072 to think about multilayer transports and networks, having read
1073 [Patterns_Arch]. Also thanks to Arnaud Jacquet for the idea for
1074 Appendix C. Thanks to Michael Menth, Bruce Davie, Toby Moncaster,
1075 Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for
1076 their thoughts and careful review comments.
1078 Bob Briscoe is partly funded by Trilogy, a research project (ICT-
1079 216372) supported by the European Community under its Seventh
1080 Framework Programme. The views expressed here are those of the
1081 author only.
1083 11. Comments Solicited
1085 Comments and questions are encouraged and very welcome. They can be
1086 addressed to the IETF Transport Area working group mailing list
1087 , and/or to the authors.
1089 12. References
1091 12.1. Normative References
1093 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
1094 October 1996.
1096 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1097 Requirement Levels", BCP 14, RFC 2119, March 1997.
1099 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
1100 "Definition of the Differentiated Services Field (DS
1101 Field) in the IPv4 and IPv6 Headers", RFC 2474,
1102 December 1998.
1104 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1105 of Explicit Congestion Notification (ECN) to IP",
1106 RFC 3168, September 2001.
1108 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
1109 Internet Protocol", RFC 4301, December 2005.
1111 12.2. Informative References
1113 [I-D.briscoe-pcn-3-in-1-encoding]
1114 Briscoe, B., "PCN 3-State Encoding Extension in a single
1115 DSCP", draft-briscoe-pcn-3-in-1-encoding-00 (work in
1116 progress), October 2008.
1118 [I-D.charny-pcn-single-marking]
1119 Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre-
1120 Congestion Notification Using Single Marking for Admission
1121 and Termination", draft-charny-pcn-single-marking-03
1122 (work in progress), November 2007.
1124 [I-D.ietf-pcn-architecture]
1125 Eardley, P., "Pre-Congestion Notification (PCN)
1126 Architecture", draft-ietf-pcn-architecture-10 (work in
1127 progress), March 2009.
1129 [I-D.ietf-pcn-baseline-encoding]
1130 Moncaster, T., Briscoe, B., and M. Menth, "Baseline
1131 Encoding and Transport of Pre-Congestion Information",
1132 draft-ietf-pcn-baseline-encoding-02 (work in progress),
1133 February 2009.
1135 [I-D.ietf-pcn-marking-behaviour]
1136 Eardley, P., "Marking behaviour of PCN-nodes",
1137 draft-ietf-pcn-marking-behaviour-02 (work in progress),
1138 March 2009.
1140 [I-D.ietf-pwe3-congestion-frmwk]
1141 Bryant, S., Davie, B., Martini, L., and E. Rosen,
1142 "Pseudowire Congestion Control Framework",
1143 draft-ietf-pwe3-congestion-frmwk-01 (work in progress),
1144 May 2008.
1146 [I-D.menth-pcn-psdm-encoding]
1147 Menth, M., Babiarz, J., Moncaster, T., and B. Briscoe,
1148 "PCN Encoding for Packet-Specific Dual Marking (PSDM)",
1149 draft-menth-pcn-psdm-encoding-00 (work in progress),
1150 July 2008.
1152 [I-D.moncaster-pcn-3-state-encoding]
1153 Moncaster, T., Briscoe, B., and M. Menth, "A three state
1154 extended PCN encoding scheme",
1155 draft-moncaster-pcn-3-state-encoding-01 (work in
1156 progress), March 2009.
1158 [I-D.satoh-pcn-st-marking]
1159 Satoh, D., Maeda, Y., Phanachet, O., and H. Ueno, "Single
1160 PCN Threshold Marking by using PCN baseline encoding for
1161 both admission and termination controls",
1162 draft-satoh-pcn-st-marking-01 (work in progress),
1163 March 2009.
1165 [IEEE802.1au]
1166 IEEE, "IEEE Standard for Local and Metropolitan Area
1167 Networks--Virtual Bridged Local Area Networks - Amendment
1168 10: Congestion Notification", 2008,
1169 .
1171 (Work in Progress; Access Controlled link within page)
1173 [ITU-T.I.371]
1174 ITU-T, "Traffic Control and Congestion Control in B-ISDN",
1175 ITU-T Rec. I.371 (03/04), March 2004.
1177 [PCNcharter]
1178 IETF, "Congestion and Pre-Congestion Notification (pcn)",
1179 IETF w-g charter , Feb 2007,
1180 .
1182 [Patterns_Arch]
1183 Day, J., "Patterns in Network Architecture: A Return to
1184 Fundamentals", Pub: Prentice Hall ISBN-13: 9780132252423,
1185 Jan 2008.
1187 [RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion
1188 Control Survey", RFC 1254, August 1991.
1190 [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
1191 Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
1192 Functional Specification", RFC 2205, September 1997.
1194 [RFC2983] Black, D., "Differentiated Services and Tunnels",
1195 RFC 2983, October 2000.
1197 [RFC3426] Floyd, S., "General Architectural and Policy
1198 Considerations", RFC 3426, November 2002.
1200 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
1201 Congestion Notification (ECN) Signaling with Nonces",
1202 RFC 3540, June 2003.
1204 [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
1205 RFC 4306, December 2005.
1207 [RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol
1208 (HIP) Architecture", RFC 4423, May 2006.
1210 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the
1211 Explicit Congestion Notification (ECN) Field", BCP 124,
1212 RFC 4774, November 2006.
1214 [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
1215 Marking in MPLS", RFC 5129, January 2008.
1217 [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain",
1218 2000, .
1221 (Expired)
1223 Appendix A. Design Constraints
1225 Tunnel processing of a congestion notification field has to meet
1226 congestion control and management needs without creating new
1227 information security vulnerabilities (if information security is
1228 required). This appendix documents the analysis of the tradeoffs
1229 between these factors that led to the new encapsulation rules in
1230 Section 4.1.
1232 A.1. Security Constraints
1234 Information security can be assured by using various end to end
1235 security solutions (including IPsec in transport mode [RFC4301]), but
1236 a commonly used scenario involves the need to communicate between two
1237 physically protected domains across the public Internet. In this
1238 case there are certain management advantages to using IPsec in tunnel
1239 mode solely across the publicly accessible part of the path. The
1240 path followed by a packet then crosses security 'domains'; the ones
1241 protected by physical or other means before and after the tunnel and
1242 the one protected by an IPsec tunnel across the otherwise unprotected
1243 domain. We will use the scenario in Figure 5 where endpoints 'A' and
1244 'B' communicate through a tunnel. The tunnel ingress 'I' and egress
1245 'E' are within physically protected edge domains, while the tunnel
1246 spans an unprotected internetwork where there may be 'men in the
1247 middle', 'M'.
1249      physically     unprotected      physically
1250 <-protected domain-><--domain--><-protected domain->
1251 +------------------+            +------------------+
1252 |                  |      M     |                  |
1253 |    A-------->I=========>==========>E-------->B   |
1254 |                  |            |                  |
1255 +------------------+            +------------------+
1256                <----IPsec secured---->
1257                        tunnel
1259 Figure 5: IPsec Tunnel Scenario
1261 IPsec encryption is typically used to prevent 'M' seeing messages
1262 from 'A' to 'B'. IPsec authentication is used to prevent 'M'
1263 masquerading as the sender of messages from 'A' to 'B' or altering
1264 their contents. But 'I' can also use IPsec tunnel mode to allow 'A'
1265 to communicate with 'B' while imposing encryption to prevent 'A' leaking
1266 information to 'M'. Or 'E' can insist that 'I' uses tunnel mode
1267 authentication to prevent 'M' communicating information to 'B'.
1268 Mutable IP header fields such as the ECN field (as well as the TTL/
1269 Hop Limit and DS fields) cannot be included in the cryptographic
1270 calculations of IPsec. Therefore, if 'I' copies these mutable fields
1271 into the outer header that is exposed across the tunnel it will have
1272 allowed a covert channel from 'A' to 'M' that bypasses its encryption
1273 of the inner header. And if 'E' copies these fields from the outer
1274 header to the inner, even if it validates authentication from 'I', it
1275 will have allowed a covert channel from 'M' to 'B'.
1277 ECN at the IP layer is designed to carry information about congestion
1278 from a congested resource towards downstream nodes. Typically a
1279 downstream transport might feed the information back somehow to the
1280 point upstream of the congestion that can regulate the load on the
1281 congested resource, but other actions are possible (see [RFC3168]
1282 S.6). In terms of the above unicast scenario, ECN is typically
1283 intended to create an information channel from 'M' to 'B' (for 'B' to
1284 feed back to 'A'). Therefore the goals of IPsec and ECN are mutually
1285 incompatible.
1287 With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
1288 "controls are provided to manage the bandwidth of this [covert]
1289 channel". Using the ECN processing rules of RFC4301, the channel
1290 bandwidth is two bits per datagram from 'A' to 'M' and one bit per
1291 datagram from 'M' to 'B' (because 'E' limits the combinations of the
1292 2-bit ECN field that it will copy). In both cases the covert channel
1293 bandwidth is further reduced by noise from any real congestion
1294 marking. RFC4301 therefore implies that these covert channels are
1295 sufficiently limited to be considered a manageable threat. However,
1296 with respect to the larger (6b) DS field, the same section of RFC4301
1297 says not copying is the default, but a configuration option can allow
1298 copying "to allow a local administrator to decide whether the covert
1299 channel provided by copying these bits outweighs the benefits of
1300 copying". Of course, an administrator considering copying of the DS
1301 field has to take into account that it could be concatenated with the
1302 ECN field giving an 8b per datagram covert channel.
1304 Thus, for tunnelling the 6b Diffserv field two conceptual models have
1305 had to be defined so that administrators can trade off security
1306 against the needs of traffic conditioning [RFC2983]:
1308 The uniform model: where the Diffserv field is preserved end-to-end
1309 by copying into the outer header on encapsulation and copying from
1310 the outer header on decapsulation.
1312 The pipe model: where the outer header is independent of that in the
1313 inner header so it hides the Diffserv field of the inner header
1314 from any interaction with nodes along the tunnel.
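As an illustration only, the two conceptual models can be sketched as a pair of functions acting on the 6-bit Diffserv field. The function names, the `model` strings and the `tunnel_dscp` parameter are assumptions introduced for this sketch; they do not come from [RFC2983].

```python
def encapsulate_dscp(inner_dscp: int, model: str, tunnel_dscp: int = 0) -> int:
    """Return the DSCP written into the outer header at the tunnel ingress."""
    if not 0 <= inner_dscp < 64:
        raise ValueError("DSCP is a 6-bit value")
    if model == "uniform":
        # Uniform model: the inner value is exposed along the tunnel.
        return inner_dscp
    if model == "pipe":
        # Pipe model: the outer value is chosen independently of the inner one.
        return tunnel_dscp
    raise ValueError(f"unknown model: {model}")


def decapsulate_dscp(inner_dscp: int, outer_dscp: int, model: str) -> int:
    """Return the DSCP of the packet forwarded by the tunnel egress."""
    if model == "uniform":
        # Uniform model: changes made along the tunnel propagate onwards.
        return outer_dscp
    if model == "pipe":
        # Pipe model: the inner header emerges untouched by the tunnel.
        return inner_dscp
    raise ValueError(f"unknown model: {model}")
```

Under the pipe model nothing about the inner Diffserv field is visible to 'M', which is the security side of the trade-off. Under the uniform model all 6 bits are visible.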
1316 However, for ECN, the new IPsec security architecture in RFC4301 only
1317 standardised one tunnelling model equivalent to the uniform model.
1318 It deemed that simplicity was more important than allowing
1319 administrators the option of a tiny increment in security, especially
1320 given that not copying congestion indications could seriously harm
1321 everyone's network service.
1323 A.2. Control Constraints
1325 Congestion control requires that any congestion notification marked
1326 into packets by a resource will be able to traverse a feedback loop
1327 back to a function capable of controlling the load on that resource.
1328 To be precise, rather than calling this function the data source, we
1329 will call it the Load Regulator. This will allow us to deal with
1330 exceptional cases where load is not regulated by the data source, but
1331 usually the two terms will be synonymous. Note the term "a function
1332 _capable of_ controlling the load" deliberately includes a source
1333 application that doesn't actually control the load but ought to (e.g.
1334 an application without congestion control that uses UDP).
1336 A--->R--->I=========>M=========>E-------->B
1338 Figure 6: Simple Tunnel Scenario
1340 We now consider a similar tunnelling scenario to the IPsec one just
1341 described, but without the different security domains so we can just
1342 focus on ensuring the control loop and management monitoring can work
1343 (Figure 6). If we want resources in the tunnel to be able to
1344 explicitly notify congestion and the feedback path is from 'B' to
1345 'A', it will certainly be necessary for 'E' to copy any CE marking
1346 from the outer header to the inner header for onward transmission to
1347 'B', otherwise congestion notification from resources like 'M' cannot
1348 be fed back to the Load Regulator ('A'). But it doesn't seem
1349 necessary for 'I' to copy CE markings from the inner to the outer
1350 header. For instance, if resource 'R' is congested, it can send
1351 congestion information to 'B' using the congestion field in the inner
1352 header without 'I' copying the congestion field into the outer header
1353 and 'E' copying it back to the inner header. 'E' can still write any
1354 additional congestion marking introduced across the tunnel into the
1355 congestion field of the inner header.
1357 It might be useful for the tunnel egress to be able to tell whether
1358 congestion occurred across a tunnel or upstream of it. If outer
1359 header congestion marking was reset by the tunnel ingress ('I'), at
1360 the end of a tunnel ('E') the outer headers would indicate congestion
1361 experienced across the tunnel ('I' to 'E'), while the inner header
1362 would indicate congestion upstream of 'I'. But similar information
1363 can be gleaned even if the tunnel ingress copies the inner to the
1364 outer headers. At the end of the tunnel ('E'), any packet with an
1365 _extra_ mark in the outer header relative to the inner header
1366 indicates congestion across the tunnel ('I' to 'E'), while the inner
1367 header would still indicate congestion upstream of 'I'. Appendix C
1368 gives a simple and precise method for a tunnel egress to infer the
1369 congestion level introduced across a tunnel.
1371 All this shows that 'E' can preserve the control loop irrespective of
1372 whether 'I' copies congestion notification into the outer header or
1373 resets it.
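The argument above can be checked exhaustively with a deliberately simplified model in which each header carries only a boolean "CE marked" flag (Not-ECT and the two ECT codepoints are ignored). The three helper functions are hypothetical names introduced for this sketch.

```python
def ingress(inner_ce: bool, copy: bool) -> bool:
    """Outer CE flag created by 'I': copied from the inner header, or reset."""
    return inner_ce if copy else False


def tunnel(outer_ce: bool, congested: bool) -> bool:
    """A congested resource such as 'M' can only add CE to the outer header."""
    return outer_ce or congested


def egress(inner_ce: bool, outer_ce: bool) -> bool:
    """'E' forwards CE to 'B' if either header carries it."""
    return inner_ce or outer_ce


# Whatever 'I' does, 'B' receives the same indication.
for upstream_ce in (False, True):
    for tunnel_congested in (False, True):
        seen_by_b = {
            copy: egress(upstream_ce,
                         tunnel(ingress(upstream_ce, copy), tunnel_congested))
            for copy in (True, False)
        }
        assert seen_by_b[True] == seen_by_b[False]
        assert seen_by_b[True] == (upstream_ce or tunnel_congested)
```

In this model the choice between copying and resetting changes only what is visible inside the tunnel, not what reaches 'B'.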
1375 That is the situation for existing control arrangements but, because
1376 copying reveals more information, it would open up possibilities for
1377 better control system designs. For instance, Appendix E describes
1378 how resetting CE marking at a tunnel ingress confuses a proposed
1379 congestion marking scheme on the standards track. It ends up
1380 removing excessive amounts of traffic unnecessarily. Whereas copying
1381 CE markings at ingress leads to the correct control behaviour.
1383 A.3. Management Constraints
1385 As well as control, there are also management constraints.
1386 Specifically, a management system may monitor congestion markings in
1387 passing packets, perhaps at the border between networks as part of a
1388 service level agreement. For instance, monitors at the borders of
1389 autonomous systems may need to measure how much congestion has
1390 accumulated since the original source, perhaps to determine between
1391 them how much of the congestion is contributed by each domain.
1393 Therefore, when monitoring the middle of a path, it should be
1394 possible to establish how far back in the path congestion markings
1395 have accumulated from. In this document we term this the baseline of
1396 congestion marking (or the Congestion Baseline), i.e. the source of
1397 the layer that last reset (or created) the congestion notification
1398 field. Given some tunnels cross domain borders (e.g. consider 'M' in
1399 Figure 6 monitoring a border), it would therefore be desirable for
1400 'I' to copy congestion accumulated so far into the outer headers
1401 exposed across the tunnel.
1403 Appendix B.2 discusses various scenarios where the Load Regulator
1404 lies in-path, not at the source host as we would typically expect.
1405 It concludes that a Congestion Baseline is determined by where the
1406 Load Regulator function is, which should be identified in the
1407 transport layer, not by addresses in network layer headers. This
1408 applies whether the Load Regulator is at the source host or within
1409 the path. The appendix also discusses where a Load Regulator
1410 function should be located relative to a local tunnel encapsulation
1411 function.
1413 Appendix B. Relative Placement of Tunnelling and In-Path Load
1414 Regulation
1416 B.1. Identifiers and In-Path Load Regulators
1418 The Load Regulator is the node to which congestion feedback should be
1419 returned by the next downstream node with a transport layer feedback
1420 function (typically but not always the data receiver). The Load
1421 Regulator is often, but not always, the data source. It is not always
1422 (or even typically) the same thing as the node identified by the
1423 source address of the outermost exposed header. In general the
1424 addressing of the outermost encapsulation header says nothing about
1425 the identifiers of either the upstream or the downstream transport
1426 layer functions. As long as the transport functions know each
1427 other's addresses, they don't have to be identified in the network
1428 layer or in any link layer. It was only a convenience that a TCP
1429 receiver assumed that the address of the source transport is the same
1430 as the network layer source address of an IP packet it receives.
1432 More generally, the return transport address for feedback could be
1433 identified solely in the transport layer protocol. For instance, a
1434 signalling protocol like RSVP [RFC2205] breaks up a path into
1435 transport layer hops and informs each hop of the address of its
1436 transport layer neighbour without any need to identify these hops in
1437 the network layer. RSVP can be arranged so that these transport
1438 layer hops are bigger than the underlying network layer hops. The
1439 host identity protocol (HIP) architecture [RFC4423] also supports the
1440 same principled separation (for mobility amongst other things), where
1441 the transport layer sender identifies its transport address for
1442 feedback to be sent to, using an identifier provided by a shim below
1443 the transport layer.
1445 Keeping to this layering principle deliberately doesn't require a
1446 network layer packet header to reveal the origin address from where
1447 congestion notification accumulates (its Congestion Baseline). It is
1448 not necessary for the network and lower layers to know the address of
1449 the Load Regulator. Only the destination transport needs to know
1450 that. With forward congestion notification, the network and link
1451 layers only notify congestion forwards; they aren't involved in
1452 feeding it backwards. If they are (e.g. backward congestion
1453 notification (BCN) in Ethernet [IEEE802.1au] or EFCI in ATM
1454 [ITU-T.I.371]), that should be considered as a transport function
1455 added to the lower layer, which must sort out its own addressing.
1456 Indeed, this is one reason why ICMP source quench is now deprecated
1457 [RFC1254]; when congestion occurs within a tunnel it is complex
1458 (particularly in the case of IPsec tunnels) to return the ICMP
1459 messages beyond the tunnel ingress back to the Load Regulator.
1461 Similarly, if a management system is monitoring congestion and needs
1462 to know the Congestion Baseline, the management system has to find
1463 this out from the transport; in general it cannot tell solely by
1464 looking at the network or link layer headers.
1466 B.2. Non-Dependence of Tunnelling on In-path Load Regulation
1468 We have said that at any point in a network, the Congestion Baseline
1469 (where congestion notification starts from zero) should be the
1470 previous upstream Load Regulator. We have also said that the ingress
1471 of an IP in IP tunnel must copy congestion indications to the
1472 encapsulating outer headers it creates. If the Load Regulator is in-
1473 path rather than at the source, and also a tunnel ingress, these two
1474 requirements seem to be contradictory. A tunnel ingress must not
1475 reset incoming congestion, but a Load Regulator must be the
1476 Congestion Baseline, implying it needs to reset incoming congestion.
1478 In fact, the two requirements are not contradictory, because a Load
1479 Regulator and a tunnel ingress are not the names of machines, but the
1480 names of functions within a machine that typically occur in sequence
1481 on a stream of packets, not at the same point. Figure 7 is borrowed
1482 from [RFC2983] (which was making a similar point about the location
1483 of Diffserv traffic conditioning relative to the encapsulation
1484 function of a tunnel). An in-path Load Regulator can act on packets
1485 either at [1 - Before] encapsulation or at [2 - Outer] after
1486 encapsulation. Load Regulation does not ever need to be integrated
1487 with the [Encapsulate] function (but it can be for efficiency).
1488 Therefore we can still mandate that the [Encapsulate] function always
1489 copies CE into the outer header.
1491 >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->>
1492                                  \
1493                                   \
1494                                    +--------[2 - Outer]------->>
1496 Figure 7: Placement of In-Path Load Regulator Relative to Tunnel
1497 Ingress
1499 Then separately, if there is a Load Regulator at location [2 -
1500 Outer], it might reset CE to ECT(0), say. Then the Congestion
1501 Baseline for the lower layer (outer) will be [2 - Outer], while the
1502 Congestion Baseline of the inner layer will be unchanged. But how
1503 encapsulation works has nothing to do with whether a Load Regulator
1504 is present or where it is.
1506 If on the other hand a Load Regulator resets CE at [1 - Before], the
1507 Congestion Baseline of both the inner and outer headers will be [1 -
1508 Before]. But again, encapsulation is independent of load regulation.
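The independence of the two functions can be sketched as follows. The `Packet` type and the three function names are assumptions made for illustration. The codepoint values are those of the 2-bit ECN field, and resetting CE to ECT(0) follows the example in the text.

```python
from dataclasses import dataclass, replace
from typing import Optional

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass(frozen=True)
class Packet:
    inner_ecn: int
    outer_ecn: Optional[int] = None  # None until the packet is encapsulated


def encapsulate(pkt: Packet) -> Packet:
    """[Encapsulate]: always copy the arriving ECN field to the outer header."""
    return replace(pkt, outer_ecn=pkt.inner_ecn)


def regulate_before(pkt: Packet) -> Packet:
    """Load Regulator at [1 - Before]: reset CE prior to encapsulation."""
    return replace(pkt, inner_ecn=ECT0) if pkt.inner_ecn == CE else pkt


def regulate_outer(pkt: Packet) -> Packet:
    """Load Regulator at [2 - Outer]: reset CE in the outer header only."""
    return replace(pkt, outer_ecn=ECT0) if pkt.outer_ecn == CE else pkt


arriving = Packet(inner_ecn=CE)

# Regulator before the encapsulator: both baselines move to this node.
before = encapsulate(regulate_before(arriving))

# Regulator after the encapsulator: only the outer baseline moves.
outer = regulate_outer(encapsulate(arriving))
```

In both orderings `encapsulate` is the same function and has no knowledge of whether a Load Regulator is present.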
1510 B.3. Dependence of In-Path Load Regulation on Tunnelling
1512 Although encapsulation doesn't need to depend on in-path load
1513 regulation, the reverse is not true. The placement of an in-path
1514 Load Regulator must be carefully considered relative to
1515 encapsulation. Some examples are given in the following for
1516 guidance.
1518 In the traditional Internet architecture one tends to think of the
1519 source host as the Load Regulator for a path. It is generally not
1520 desirable or practical for a node part way along the path to regulate
1521 the load. However, various reasonable proposals for in-path load
1522 regulation have been made from time to time (e.g. fair queuing,
1523 traffic engineering, flow admission control). The IETF has recently
1524 chartered a working group to standardise admission control across a
1525 part of a path using pre-congestion notification (PCN) [PCNcharter].
1526 This is of particular relevance here because it involves congestion
1527 notification with an in-path Load Regulator, it can involve
1528 tunnelling and it certainly involves encapsulation more generally.
1530 We will use the more complex scenario in Figure 8 to tease out all
1531 the issues that arise when combining congestion notification and
1532 tunnelling with various possible in-path load regulation schemes. In
1533 this case 'I1' and 'E2' break up the path into three separate
1534 congestion control loops. The feedback for these loops is shown
1535 going right to left across the top of the figure. The 'V's are arrow
1536 heads representing the direction of feedback, not letters. But there
1537 are also two tunnels within the middle control loop: 'I1' to 'E1' and
1538 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS
1539 core networks. 'M' is a congestion monitoring point, perhaps between
1540 two border routers where the same tunnel continues unbroken across
1541 the border.
1542   ______      _______________________________________     _____
1543  /      \    /                                       \   /     \
1544 V        \  V                              M          \ V       \
1545 A--->R--->I1===========>E1----->I2=========>==========>E2------->B
1547 Figure 8: Complex Tunnel Scenario
1549 The question is, should the congestion markings in the outer exposed
1550 headers of a tunnel represent congestion only since the tunnel
1551 ingress or over the whole upstream path from the source of the inner
1552 header (whatever that may mean)? Or put another way, should 'I1' and
1553 'I2' copy or reset CE markings?
1554 Based on the design principles in Section 4.3, the answer is that the
1555 Congestion Baseline should be the nearest upstream interface designed
1556 to regulate traffic load--the Load Regulator. In Figure 8 'A', 'I1'
1557 and 'E2' are all Load Regulators. We have shown the feedback loops
1558 returning to each of these nodes so that they can regulate the load
1559 causing the congestion notification. So the Congestion Baseline
1560 exposed to 'M' should be 'I1' (the Load Regulator), not 'I2'.
1561 Therefore 'I1' should reset any arriving CE markings. In this case,
1562 'I1' knows the tunnel to 'E1' is unrelated to its load regulation
1563 function. So the load regulation function within 'I1' should be
1564 placed at [1 - Before] tunnel encapsulation within 'I1' (using the
1565 terminology of Figure 7). Then the Congestion Baseline all across
1566 the networks from 'I1' to 'E2' in both inner and outer headers will
1567 be 'I1'.
1569 The following further examples illustrate how this answer might be
1570 applied:
1572 o We argued in Appendix E that resetting CE on encapsulation could
1573 harm PCN excess rate marking, which marks excess traffic for
1574 removal in subsequent round trips. This marking relies on not
1575 marking packets if another node upstream has already marked them
1576 for removal. If there were a tunnel ingress between the two which
1577 reset CE markings, it would confuse the downstream node into
1578 marking far too much traffic for removal. So why do we say that
1579 'I1' should reset CE, while a tunnel ingress shouldn't? The
1580 answer is that it is the Load Regulator function at 'I1' that is
1581 resetting CE, not the tunnel encapsulator. The Load Regulator
1582 needs to set itself as the Congestion Baseline, so the feedback it
1583 gets will only be about congestion on links it can relieve itself
1584 (by regulating the load into them). When it resets CE markings,
1585 it knows that something else upstream will have dealt with the
1586 congestion notifications it removes, given it is part of an end-
1587 to-end admission control signalling loop. It therefore knows that
1588 previous hops will be covered by other Load Regulators.
1589 Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should
1590 follow the new rule for any tunnel ingress and copy congestion
1591 marking into the outer tunnel header. The ingress at 'I1' will
1592 happen to copy headers that have already been reset just
1593 beforehand. But it doesn't need to know that.
1595 o [Shayman] suggested feedback of ECN accumulated across an MPLS
1596 domain could cause the ingress to trigger re-routing to mitigate
1597 congestion. This case is more like the simple scenario of
1598 Figure 6, with a feedback loop across the MPLS domain ('E' back to
1599 'I'). 'I' is a Load Regulator because re-routing around congestion
1600 is a load regulation function. But in this case 'I' should only
1601 reset itself as the Congestion Baseline in outer headers, as it is
1602 not handling congestion outside its domain, so it must preserve
1603 the end-to-end congestion feedback loop for something else to
1604 handle (probably the data source). Therefore the Load Regulator
1605 within 'I' should be placed at [2 - Outer] to reset CE markings
1606 just after the tunnel ingress has copied them from arriving
1607 headers. Again, the tunnel encapsulation function at 'I' simply
1608 copies incoming headers, unaware that the load regulator will
1609 subsequently reset its outer headers.
1611 o The PWE3 working group of the IETF is considering the problem of
1612 how and whether an aggregate edge-to-edge pseudo-wire emulation
1613 should respond to congestion [I-D.ietf-pwe3-congestion-frmwk].
1614 Although the study is still at the requirements stage, some
1615 (controversial) solution proposals include in-path load regulation
1616 at the ingress to the tunnel that could lead to tunnel
1617 arrangements with similar complexity to that of Figure 8.
1619 These are not contrived scenarios--they could be a lot worse. For
1620 instance, a host may create a tunnel for IPsec which is placed inside
1621 a tunnel for Mobile IP over a remote part of its path. And around
1622 this all we may have MPLS labels being pushed and popped as packets
1623 pass across different core networks. Similarly, it is possible that
1624 subnets could be built from link technology (e.g. future Ethernet
1625 switches) so that link headers being added and removed could involve
1626 congestion notification in future Ethernet link headers with all the
1627 same issues as with IP in IP tunnels.
1629 One reason we introduced the concept of a Load Regulator was to allow
1630 for in-path load regulation. In the traditional Internet
1631 architecture one tends to think of a host and a Load Regulator as
1632 synonymous, but when considering tunnelling, even the definition of a
1633 host is too fuzzy, whereas a Load Regulator is a clearly defined
1634 function. Similarly, the concept of innermost header is too fuzzy to
1635 be able to (wrongly) say that the source address of the innermost
1636 header should be the Congestion Baseline. Which is the innermost
1637 header when multiple encapsulations may be in use? Where do we stop?
1638 If we say the original source in the above IPsec-Mobile IP case is
1639 the host, how do we know it isn't tunnelling an encrypted packet
1640 stream on behalf of another host in a p2p network?
1642 We have become used to thinking that only hosts regulate load. The
1643 end to end design principle advises that this is a good idea
1644 [RFC3426], but it also advises that it is solely a guiding principle
1645 intended to make the designer think very carefully before breaking
1646 it. We do have proposals where load regulation functions sit within
1647 a network path for good, if sometimes controversial, reasons, e.g.
1648 PCN edge admission control gateways [I-D.ietf-pcn-architecture] or
1649 traffic engineering functions at domain borders to re-route around
1650 congestion [Shayman]. Whether or not we want in-path load
1651 regulation, we have to work round the fact that it will not go away.
1653 Appendix C. Contribution to Congestion across a Tunnel
1655 This specification mandates that a tunnel ingress determines the ECN
1656 field of each new outer tunnel header by copying the arriving header.
1657 Concern has been expressed that this will make it difficult for the
1658 tunnel egress to monitor congestion introduced only along a tunnel,
1659 which is easy if the outer ECN field is reset at a tunnel ingress
1660 (RFC3168 full functionality mode). However, in fact copying CE marks
1661 at ingress will still make it easy for the egress to measure
1662 congestion introduced across a tunnel, as illustrated below.
1664 Consider 100 packets measured at the egress. It measures that 30 are
1665 CE marked in the inner and outer headers and 12 have additional CE
1666 marks in the outer but not the inner. This means packets arriving at
1667 the ingress had already experienced 30% congestion. However, it does
1668 not mean there was 12% congestion across the tunnel. The correct
1669 calculation of congestion across the tunnel is p_t = 12/(100-30) =
1670 12/70 = 17%. This is easy for the egress to measure. It is the
1671 packets with additional CE marking in the outer header (12) as a
1672 proportion of packets not marked in the inner header (70).
1674 Figure 9 illustrates this in a combinatorial probability diagram.
1675 The square represents 100 packets. The 30% division along the bottom
1676 represents marking before the ingress, and the p_t division up the
1677 side represents marking along the tunnel.
1679 +-----+---------+100%
1680 |     |         |
1681 | 30  |         |
1682 |     |         |     The large square
1683 |     +---------+p_t  represents 100 packets
1684 |     |    12   |
1685 +-----+---------+0
1686 0    30%       100%
1687   inner header marking
1689 Figure 9: Tunnel Marking of Packets Already Marked at Ingress
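The calculation illustrated in Figure 9 can be sketched as a short function. This is only one possible way for an egress to compute it; the function name and the representation of each packet as an `(inner_ce, outer_ce)` pair of booleans are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple


def tunnel_congestion(packets: Iterable[Tuple[bool, bool]]) -> Optional[float]:
    """Estimate p_t from (inner_ce, outer_ce) pairs observed at the egress.

    Only packets that arrived at the ingress unmarked (inner header not CE)
    can reveal marking added along the tunnel, so they form the denominator.
    """
    unmarked_inner = 0
    extra_outer = 0
    for inner_ce, outer_ce in packets:
        if not inner_ce:
            unmarked_inner += 1
            if outer_ce:
                extra_outer += 1
    if unmarked_inner == 0:
        return None  # every packet was already marked: nothing can be inferred
    return extra_outer / unmarked_inner


# The 100 packets of the worked example: 30 marked in both headers,
# 12 marked only in the outer header, 58 unmarked.
sample = [(True, True)] * 30 + [(False, True)] * 12 + [(False, False)] * 58
p_t = tunnel_congestion(sample)
print(f"{p_t:.1%}")  # prints 17.1%
```

Dividing by all 100 packets instead would give 12%, the underestimate that the text warns against.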
1691 Appendix D. Why Not Propagating ECT(1) on Decapsulation Impedes PCN
1693 Multi-level congestion notification is currently on the IETF's
1694 standards track agenda in the Congestion and Pre-Congestion
1695 Notification (PCN) working group. The PCN working group eventually
1696 requires three congestion states (not marked and two increasingly
1697 severe levels of congestion marking) [I-D.ietf-pcn-architecture].
1698 The aim is for the less severe level of marking to stop admitting new
1699 traffic and the more severe level to terminate sufficient existing
1700 flows to bring a network back to its operating point after a serious
1701 failure.
1703 Although the ECN field gives sufficient codepoints for these three
1704 states, current ECN tunnelling RFCs prevent the PCN working group
1705 from using three ECN states in case any tunnel decapsulations occur
1706 within a PCN region (see Appendix A of
1707 [I-D.ietf-pcn-baseline-encoding]). If a node in a tunnel sets the
1708 ECN field to ECT(0) or ECT(1), this change will be discarded by a
1709 tunnel egress compliant with RFC4301 or RFC3168. This can be seen in
1710 Figure 2 (Section 3.2), where ECT values in the outer header are
1711 ignored unless the inner header is the same. Effectively one ECT
1712 codepoint is wasted; the ECT(0) and ECT(1) codepoints have to be
1713 treated as just one codepoint when they could otherwise have been
1714 used for their intended purpose of congestion notification.
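The loss of the ECT distinction can be seen in a minimal sketch of the ECN handling at an RFC4301 egress, as we understand it: the inner header is set to CE only if it is ECT(0) or ECT(1) and the outer header is CE; otherwise the inner header is left unchanged. The function name and the string representation of codepoints are assumptions made for this example.

```python
NOT_ECT, ECT0, ECT1, CE = "Not-ECT", "ECT(0)", "ECT(1)", "CE"


def decapsulate_rfc4301_style(inner: str, outer: str) -> str:
    """Return the ECN codepoint forwarded by the egress (simplified model)."""
    if inner in (ECT0, ECT1) and outer == CE:
        return CE
    # Any other outer value, including a changed ECT codepoint, is discarded.
    return inner


# A node inside the tunnel changes the outer header from ECT(0) to ECT(1):
lost = decapsulate_rfc4301_style(inner=ECT0, outer=ECT1)

# Only a CE mark in the outer header survives decapsulation:
kept = decapsulate_rfc4301_style(inner=ECT0, outer=CE)
```

Here `lost` is still ECT(0), so the egress cannot convey a second, less severe marking level that was encoded as a change between ECT codepoints.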
1716 As a consequence, the PCN working group has initially confined itself
1717 to two encoding states as a baseline encoding
1718 [I-D.ietf-pcn-baseline-encoding]. And it has had to propose an
1719 experimental extension using extra Diffserv codepoint(s) to encode
1720 the extra states [I-D.moncaster-pcn-3-state-encoding], using up the
1721 rapidly exhausting DSCP space while leaving ECN codepoints unused.
1722 Another PCN encoding has been proposed that would survive tunnelling
1723 without an extra DSCP [I-D.menth-pcn-psdm-encoding], but it requires
1724 the PCN edge gateways to somehow share state so the egress can
1725 determine which marking a packet started with at the ingress. Also a
1726 PCN ingress node can game the system by initiating packets with
1727 inappropriate markings. Yet another work-round to the ECN tunnelling
1728 problem proposes a more involved marking algorithm in the forwarding
1729 plane to encode the three congestion notification states using only
1730 two ECN codepoints [I-D.satoh-pcn-st-marking]. Still another
1731 proposal compromises the precision of the admission control
1732 mechanism, but manages to work with just two encoding states and a
1733 single marking algorithm [I-D.charny-pcn-single-marking].
1735 Rather than require the IETF to bless any of these work-rounds, this
1736 specification fixes the root cause of the problem so that operators
1737 deploying PCN can simply ask that tunnel end-points within a PCN
1738 region should comply with this new ECN tunnelling specification.
1740 Then PCN can use the trivially simple experimental 3-state ECN
1741 encoding defined in [I-D.briscoe-pcn-3-in-1-encoding].
1743 D.1. Alternative Ways to Introduce the New Decapsulation Rules
1745 There are a number of ways for the new decapsulation rules to be
1746 introduced:
1748 o They could be specified in the present standards track proposal
1749 (preferred) or in an experimental extension;
1751 o They could be specified as a new default for all Diffserv PHBs
1752 (preferred) or as an option to be configured only for Diffserv
1753 PHBs requiring them (e.g. PCN).
1755 The argument for making this change now, rather than in a separate
1756 experimental extension, is to avoid the burden of an extra standard
1757 to be compliant with and to be backwards compatible with--so we don't
1758 add to the already complex history of ECN tunnelling RFCs. The
1759 argument for a separate experimental extension is that we may never
1760 need this change (if PCN is never successfully deployed and if no-one
1761 ever needs three ECN or PCN encoding states rather than two).
1762 However, the change does no harm to existing mechanisms and stops
1763 tunnels wasting a quarter of a bit (a 2-bit codepoint).
1765 The argument for making this new decapsulation behaviour the default
1766 for all PHBs is that it doesn't change any expected behaviour that
1767 existing mechanisms rely on already. Also, by ending the present
1768 waste of a codepoint, in the future a use of that codepoint could be
1769 proposed for all PHBs, even if PCN isn't successfully deployed.
1771 In practice, if these new decapsulation rules are specified
1772 straightaway as the normative default for all PHBs, a network
1773 operator deploying 3-state PCN would be able to request that tunnels
1774 comply with the latest specification. Implementers of non-PCN
1775 tunnels would not need to comply but, if they did, their code would
1776 be future proofed and no harm would be done to legacy operations.
1777 Therefore, rather than branching their code base, it would be easiest
1778 for implementers to make all their new tunnel code comply with this
1779 specification, whether or not it was for PCN. But they could leave
1780 old code untouched, unless it was for PCN.
1782 The alternatives are worse. Implementers would otherwise have to
1783 provide configurable decapsulation options and operators would have
1784 to configure all IPsec and IP in IP tunnel endpoints for the
1785 exceptional behaviour of certain PHBs. The rules for tunnel
1786 endpoints to handle both the Diffserv field and the ECN field should
1787 'just work' when handling packets with any Diffserv codepoint.
1789 Appendix E. Why Resetting CE on Encapsulation Impedes PCN
1791 Regarding encapsulation, the section of the PCN architecture
1792 [I-D.ietf-pcn-architecture] on tunnelling says that header copying
1793 (RFC4301) allows PCN to work correctly. Whereas resetting CE
1794 markings confuses PCN marking.
1796 The specific issue here concerns PCN excess rate marking
1797 [I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic
1798 that exceeds a configured threshold rate. One of the goals of excess
1799 rate marking is to enable the speedy removal of excess admission
1800 controlled traffic following re-routes caused by link failures or
1801 other disasters. This maintains a share of the capacity for traffic
1802 in lower priority classes. After failures, traffic re-routed onto
1803 remaining links can often stress multiple links along a path.
1804 Therefore, traffic can arrive at a link under stress with some
1805 proportion already marked for removal by a previous link. By design,
1806 marked traffic will be removed by the overall system in subsequent
1807 round trips. So when the excess rate marking algorithm decides how
1808 much traffic to mark for removal, it doesn't include traffic already
1809 marked for removal by another node upstream (the `Excess traffic
1810 meter function' of [I-D.ietf-pcn-marking-behaviour]).
1812 However, if an RFC3168 tunnel ingress intervenes, it resets the ECN
1813 field in all the outer headers, hiding all the evidence of problems
1814 upstream. Thus, although excess rate marking works fine with RFC4301
1815 IPsec tunnels, with RFC3168 tunnels it typically removes large
1816 volumes of traffic that it didn't need to remove at all.
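The size of the effect can be illustrated with a rough fluid-rate model. It is a simplification made for this example, not the metering algorithm of [I-D.ietf-pcn-marking-behaviour]. It assumes rates in arbitrary units, a marker that marks exactly the metered rate above its threshold, and (in the reset case) outer marks that fall on packets independently of their inner marks. The function name and parameters are hypothetical.

```python
def marked_after_link(total: float, already_marked: float,
                      threshold: float, ingress_copies: bool) -> float:
    """Rate of traffic marked for removal once the egress has combined headers."""
    # A resetting ingress hides upstream marks from nodes along the tunnel.
    visible_marked = already_marked if ingress_copies else 0.0
    # Excess rate marking ignores traffic it can see is already marked.
    metered = total - visible_marked
    newly_marked = max(0.0, metered - threshold)
    if ingress_copies:
        return already_marked + newly_marked
    # After a reset, some new outer marks land on packets whose inner header
    # was already marked, so only the remainder adds to the total.
    fraction_unmarked_inner = (total - already_marked) / total
    return already_marked + newly_marked * fraction_unmarked_inner


# 100 units arrive, 40 already marked upstream, local threshold of 60.
copied = marked_after_link(100.0, 40.0, 60.0, ingress_copies=True)
reset = marked_after_link(100.0, 40.0, 60.0, ingress_copies=False)
print(f"{copied:.0f} {reset:.0f}")  # prints 40 64
```

With copying, the 60 unmarked units just fit the threshold and nothing more is marked. With resetting, about 24 further units are marked for removal unnecessarily under these assumptions.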
1818 Author's Address
1820 Bob Briscoe
1821 BT
1822 B54/77, Adastral Park
1823 Martlesham Heath
1824 Ipswich IP5 3RE
1825 UK
1827 Phone: +44 1473 645196
1828 Email: bob.briscoe@bt.com
1829 URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/