idnits 2.17.1 draft-ietf-tsvwg-ecn-tunnel-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** The document seems to lack a License Notice according IETF Trust Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009 Section 6.b -- however, there's a paragraph with a matching beginning. Boilerplate error? (You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Feb 2009 rather than one of the newer Notices. See https://trustee.ietf.org/license-info/.) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1161 has weird spacing: '... both admis...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 24, 2009) is 5511 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Encapsulate' is mentioned on line 1488, but not defined == Outdated reference: A later version (-11) exists of draft-ietf-pcn-architecture-10 == Outdated reference: A later version (-07) exists of draft-ietf-pcn-baseline-encoding-02 == Outdated reference: A later version (-05) exists of draft-ietf-pcn-marking-behaviour-02 == Outdated reference: A later version (-02) exists of draft-ietf-pwe3-congestion-frmwk-01 == Outdated reference: A later version (-02) exists of draft-satoh-pcn-st-marking-01 -- Obsolete informational reference (is this intentional?): RFC 4306 (Obsoleted by RFC 5996) -- Obsolete informational reference (is this intentional?): RFC 4423 (Obsoleted by RFC 9063) Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Transport Area Working Group B. Briscoe 3 Internet-Draft BT 4 Intended status: Standards Track March 24, 2009 5 Expires: September 25, 2009 7 Tunnelling of Explicit Congestion Notification 8 draft-ietf-tsvwg-ecn-tunnel-02 10 Status of this Memo 12 This Internet-Draft is submitted to IETF in full conformance with the 13 provisions of BCP 78 and BCP 79. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six months 21 and may be updated, replaced, or obsoleted by other documents at any 22 time. 
It is inappropriate to use Internet-Drafts as reference 23 material or to cite them other than as "work in progress." 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt. 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 This Internet-Draft will expire on September 25, 2009. 33 Copyright Notice 35 Copyright (c) 2009 IETF Trust and the persons identified as the 36 document authors. All rights reserved. 38 This document is subject to BCP 78 and the IETF Trust's Legal 39 Provisions Relating to IETF Documents in effect on the date of 40 publication of this document (http://trustee.ietf.org/license-info). 41 Please review these documents carefully, as they describe your rights 42 and restrictions with respect to this document. 44 Abstract 46 This document redefines how the explicit congestion notification 47 (ECN) field of the IP header should be constructed on entry to and 48 exit from any IP in IP tunnel. On encapsulation it brings all IP in 49 IP tunnels (v4 or v6) into line with the way RFC4301 IPsec tunnels 50 now construct the ECN field. On decapsulation it redefines how the 51 ECN field in the forwarded IP header should be calculated for two 52 previously invalid combinations of incoming inner and outer headers, 53 in order that these combinations may be usefully employed in future 54 standards actions. It includes a thorough analysis of the reasoning 55 for these changes and the implications. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 8 61 1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 9 62 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 9 63 3. Summary of Pre-Existing RFCs . . . . . . . . . . . . . . . . . 10 64 3.1. Encapsulation at Tunnel Ingress . . . . . . . . . . . . . 10 65 3.2. 
Decapsulation at Tunnel Egress . . . . . . . . . . . . . . 12 66 4. New ECN Tunnelling Rules . . . . . . . . . . . . . . . . . . . 13 67 4.1. Default Tunnel Ingress Behaviour . . . . . . . . . . . . . 14 68 4.2. Default Tunnel Egress Behaviour . . . . . . . . . . . . . 14 69 4.3. Design Principles for Future Non-Default Schemes . . . . . 16 70 5. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 17 71 5.1. Non-Issues Upgrading Any Tunnel Decapsulation . . . . . . 18 72 5.2. Non-Issues for RFC4301 IPsec Encapsulation . . . . . . . . 18 73 5.3. Upgrading Other IP in IP Tunnel Encapsulators . . . . . . 19 74 6. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 20 75 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 76 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 77 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 23 78 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 24 79 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 25 80 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 81 12.1. Normative References . . . . . . . . . . . . . . . . . . . 25 82 12.2. Informative References . . . . . . . . . . . . . . . . . . 25 83 Appendix A. Design Constraints . . . . . . . . . . . . . . . . . 28 84 A.1. Security Constraints . . . . . . . . . . . . . . . . . . . 28 85 A.2. Control Constraints . . . . . . . . . . . . . . . . . . . 30 86 A.3. Management Constraints . . . . . . . . . . . . . . . . . . 31 87 Appendix B. Relative Placement of Tunnelling and In-Path Load 88 Regulation . . . . . . . . . . . . . . . . . . . . . 32 89 B.1. Identifiers and In-Path Load Regulators . . . . . . . . . 32 90 B.2. Non-Dependence of Tunnelling on In-path Load Regulation . 33 91 B.3. Dependence of In-Path Load Regulation on Tunnelling . . . 34 92 Appendix C. Contribution to Congestion across a Tunnel . . . . . 37 93 Appendix D. 
Why Not Propagating ECT(1) on Decapsulation 94 Impedes PCN . . . . . . . . . . . . . . . . . . . . . 38 95 D.1. Alternative Ways to Introduce the New Decapsulation 96 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 39 97 Appendix E. Why Resetting CE on Encapsulation Impedes PCN . . . . 40 98 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 40 100 Changes from previous drafts (to be removed by the RFC Editor) 102 Full text differences between IETF draft versions are available at 103 , and 104 between earlier individual draft versions at 105 107 From ietf-01 to ietf-02 (current): 109 * Scope reduced from any encapsulation of an IP packet to solely 110 IP in IP tunnelled encapsulation. Consequently changed title 111 and removed whole section 'Design Guidelines for New 112 Encapsulations of Congestion Notification' (to be included in a 113 future companion informational document). 115 * Included a new normative decapsulation rule for ECT(0) inner 116 and ECT(1) outer that had previously only been outlined in the 117 non-normative appendix 'Comprehensive Decapsulation Rules'. 118 Consequently: 120 + The Introduction has been completely re-written to motivate 121 this change to decapsulation along with the existing change 122 to encapsulation. 124 + The tentative text in the appendix that first proposed this 125 change has been split between normative standards text in 126 Section 4 and Appendix D, which explains specifically why 127 this change would streamline PCN. New text on the logic of 128 the resulting decap rules added. 130 * If inner/outer is Not-ECT/ECT(0), changed decapsulation to 131 propagate Not-ECT rather than drop the packet; and added 132 reasoning. 134 * Considerably restructured: 136 + "Design Constraints" analysis moved to an appendix 137 (Appendix A); 139 + Added Section 3 to summarise relevant existing RFCs; 141 + Structured Section 4 and Section 5 into subsections. 
143 + Added tables to sections on old and new rules, for precision 144 and comparison. 146 + Moved Section 4.3 on Design Principles to the end of the 147 section specifying the new default normative tunnelling 148 behaviour. Rewritten and shifted text on identifiers and 149 in-path load regulators to Appendix B.1. 151 From ietf-00 to ietf-01: 153 * Identified two additional alarm states in the decapsulation 154 rules (Figure 4) if ECT(X) in outer and inner contradict each 155 other. 157 * Altered Comprehensive Decapsulation Rules (Appendix D) so that 158 ECT(0) in the outer no longer overrides ECT(1) in the inner. 159 Used the term 'Comprehensive' instead of 'Ideal'. And 160 considerably updated the text in this appendix. 162 * Added Appendix D.1 to weigh up the various ways the 163 Comprehensive Decapsulation Rules might be introduced. This 164 replaces the previous contradictory statements saying complex 165 backwards compatibility interactions would be introduced while 166 also saying there would be no backwards compatibility issues. 168 * Updated references. 170 From briscoe-01 to ietf-00: 172 * Re-wrote Appendix C giving much simpler technique to measure 173 contribution to congestion across a tunnel. 175 * Added discussion of backward compatibility of the ideal 176 decapsulation scheme in Appendix D 178 * Updated references. Minor corrections & clarifications 179 throughout. 
181 From -00 to -01: 183 * Related everything conceptually to the uniform and pipe models 184 of RFC2983 on Diffserv Tunnels, and completely removed the 185 dependence of tunnelling behaviour on the presence of any in- 186 path load regulation by using the [1 - Before] [2 - Outer] 187 function placement concepts from RFC2983; 189 * Added specific cases where the existing standards limit new 190 proposals, particularly Appendix E; 192 * Added sub-structure to Introduction (Need for Rationalisation, 193 Roadmap), added new Introductory subsection on "Scope" and 194 improved clarity; 196 * Added Design Guidelines for New Encapsulations of Congestion 197 Notification; 199 * Considerably clarified the Backward Compatibility section 200 (Section 5); 202 * Considerably extended the Security Considerations section 203 (Section 8); 205 * Summarised the primary rationale much better in the 206 conclusions; 208 * Added numerous extra acknowledgements; 210 * Added Appendix E. "Why resetting CE on encapsulation harms 211 PCN", Appendix C. "Contribution to Congestion across a Tunnel" 212 and Appendix D. "Ideal Decapsulation Rules"; 214 * Re-wrote Appendix B.2, explaining how tunnel encapsulation no 215 longer depends on in-path load-regulation (changed title from 216 "In-path Load Regulation" to "Non-Dependence of Tunnelling on 217 In-path Load Regulation"), but explained how an in-path load 218 regulation function must be carefully placed with respect to 219 tunnel encapsulation (in a new sub-section entitled "Dependence 220 of In-Path Load Regulation on Tunnelling"). 222 1. Introduction 224 This document redefines how the explicit congestion notification 225 (ECN) field [RFC3168] in the IP header should be constructed for all 226 IP in IP tunnelling. Previously, tunnel endpoints blocked visibility 227 of transitions of the ECN field except the minimum necessary to allow 228 the basic ECN mechanism to work. 
Three main changes are defined, one 229 on entry to and two on exit from any IP in IP tunnel. The newly 230 specified behaviours make all transitions to the ECN field visible 231 across tunnel end-points, so tunnels no longer restrict new uses of 232 the ECN field that were not envisaged when ECN was first designed. 234 The immediate motivation for opening up the ECN behaviour of tunnels 235 is that otherwise they impede the introduction of pre-congestion 236 notification (PCN [I-D.ietf-pcn-marking-behaviour]) in networks with 237 tunnels (Appendix E explains why). But these changes are not just 238 intended to ease the introduction of PCN; care has been taken to 239 ensure the resulting ECN tunnelling behaviour is simple and generic 240 for other potential future uses. 242 Given this is a change to behaviour at 'the neck of the hourglass', 243 an extensive analysis of the trade-offs between control, management 244 and security constraints has been conducted in order to minimise 245 unexpected side-effects both now and in the future. Care has also 246 been taken to ensure the changes are fully backwards compatible with 247 all previous tunnelling behaviours. 249 The ECN protocol allows a forwarding element to notify the onset of 250 congestion of its resources without having to drop packets. Instead 251 it can explicitly mark a proportion of packets by setting the 252 congestion experienced (CE) codepoint in the 2-bit ECN field in the 253 IP header (see Table 1 for a recap of the ECN codepoints). 

255 +------------------+----------------+---------------------------+
256 | Binary codepoint | Codepoint name | Meaning                   |
257 +------------------+----------------+---------------------------+
258 | 00               | Not-ECT        | Not ECN-capable transport |
259 | 01               | ECT(1)         | ECN-capable transport     |
260 | 10               | ECT(0)         | ECN-capable transport     |
261 | 11               | CE             | Congestion experienced    |
262 +------------------+----------------+---------------------------+

264 Table 1: Recap of Codepoints of the ECN Field [RFC3168] in the IP 265 Header 267 The outer header of an IP packet can encapsulate one (or more) 268 additional IP headers tunnelled within it. A forwarding element that 269 is using ECN to signify congestion will only mark the outer IP header 270 that is immediately visible to it. When a tunnel decapsulator later 271 removes this outer header, it must follow rules to ensure the marking 272 is propagated into the IP header being forwarded onwards, otherwise 273 congestion notifications will disappear into a black hole leading to 274 potential congestion collapse. 276 The rules for constructing the ECN field to be forwarded after tunnel 277 decapsulation ensure this happens, but they are not wholly 278 straightforward, and neither are the rules for encapsulating one IP 279 header in another on entry to a tunnel. The factor that has 280 introduced most complication at both ends of a tunnel has been the 281 possibility that the ECN field might be used as a covert channel to 282 compromise the integrity of an IPsec tunnel. 284 A common use for IPsec is to create a secure tunnel between two 285 secure sites across the public Internet. A field like ECN that can 286 change as it traverses the Internet cannot be covered by IPsec's 287 integrity mechanisms. Therefore, the ECN field might be toggled 288 (with two bits per packet) to communicate between a secure site and 289 someone on the public Internet--a covert channel. 
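The codepoints recapped in Table 1 can be made concrete with a short sketch. The Python below is illustrative only and is not part of this specification. It assumes the ECN field occupies the two least significant bits of the former IPv4 TOS octet or the IPv6 Traffic Class octet, as defined in [RFC3168]. The names ECN, ecn_of and with_ecn are hypothetical and chosen purely for this example.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four codepoints of the 2-bit ECN field (Table 1)."""
    NOT_ECT = 0b00  # Not ECN-capable transport
    ECT_1 = 0b01    # ECN-capable transport, ECT(1)
    ECT_0 = 0b10    # ECN-capable transport, ECT(0)
    CE = 0b11       # Congestion experienced


def ecn_of(tos_octet: int) -> ECN:
    """Return the ECN codepoint held in the low two bits of an
    IPv4 TOS or IPv6 Traffic Class octet (an int in 0..255)."""
    return ECN(tos_octet & 0b11)


def with_ecn(tos_octet: int, ecn: ECN) -> int:
    """Return the octet with its ECN field replaced, leaving the
    upper six (Diffserv) bits untouched."""
    return (tos_octet & 0b11111100) | int(ecn)
```

A forwarding element that marks congestion would, in these terms, call something like with_ecn(octet, ECN.CE) on the outermost header only, which is why the decapsulator has to carry the marking inwards.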
291 Over the years covert channel restrictions have been added to the 292 design of ECN (with consequent backward compatibility complications). 293 However, the latest IPsec architecture [RFC4301] takes the view that 294 simplicity is more important than closing off the covert channel 295 threat, which it deems manageable given its bandwidth is limited to 296 two bits per packet. 298 As a result, an unfortunate sequence of standards actions has left us 299 with nearly the worst of all possible combinations of outcomes, 300 despite the best endeavours of everyone concerned. The new IPsec 301 architecture [RFC4301] only updates the earlier specification of ECN 302 tunnelling behaviour [RFC3168] for the case of IPsec tunnels. For 303 the case of non-IPsec tunnels the earlier RFC3168 specification still 304 applies. At the time RFC3168 was standardised, covert channels 305 through the ECN field were restricted, whether or not IPsec was being 306 used. The perverse position now is that non-IPsec tunnels restrict 307 covert channels, while IPsec tunnels don't. 309 Actually, this statement needs some qualification. IPsec tunnels 310 leave the ECN covert channel unrestricted only at the ingress. At the 311 tunnel egress, the presumption that the ECN covert channel should be 312 restricted has not been removed from any tunnelling specifications, 313 whether IPsec or not. 315 Now that these historic 2-bit covert channel constraints are impeding 316 the introduction of PCN, this specification is designed to remove 317 them and at the same time streamline the whole ECN behaviour for the 318 future. 320 1.1. Scope 322 This document only concerns wire protocol processing at tunnel 323 endpoints and makes no changes or recommendations concerning 324 algorithms for congestion marking or congestion response. 326 This document specifies common, default ECN field processing at 327 encapsulation and decapsulation for any IP in IP tunnelling. 
It 328 applies irrespective of whether IPv4 or IPv6 is used for either of 329 the inner and outer headers. It applies to all Diffserv per-hop 330 behaviours (PHBs), unless stated otherwise in the specification of a 331 PHB. It is intended to be a good trade off between somewhat 332 conflicting security, control and management requirements. 334 Nonetheless, if necessary, an alternate congestion encapsulation 335 behaviour can be introduced as part of the definition of an alternate 336 congestion marking scheme used by a specific Diffserv PHB (see S.5 of 337 [RFC3168] and [RFC4774]). When designing such new encapsulation 338 schemes, the principles in Section 4.3 should be followed as closely 339 as possible. There is no requirement for a PHB to state anything 340 about ECN tunnelling behaviour if the new default behaviour is 341 sufficient. 343 [RFC2983] is a comprehensive primer on differentiated services and 344 tunnels. Given ECN raises similar issues to differentiated services 345 when interacting with tunnels, useful concepts introduced in RFC2983 346 are used throughout, with brief recaps of the explanations where 347 necessary. 349 1.2. Document Roadmap 351 The body of the document focuses solely on standards actions 352 impacting implementation. Appendices record the analysis that 353 motivates and justifies these actions. The whole document is 354 organised as follows: 356 o Section 3 recaps relevant existing RFCs and explains exactly why 357 changes are needed, referring to Appendix D and Appendix E in 358 order to explain in detail why current tunnelling behaviours 359 impede PCN deployment, at egress and ingress respectively. 361 o Section 4 uses precise standards terminology to specify the new 362 ECN tunnelling behaviours. It refers to Appendix A for analysis 363 of the trade-offs between security, control and management design 364 constraints that led to these particular standards actions. 
366 o Extending the new IPsec tunnel ingress behaviour to all IP in IP 367 tunnels requires consideration of backwards compatibility, which 368 is covered in Section 5 and detailed changes from earlier RFCs are 369 brought together in Section 6. 371 o Finally, a number of security considerations are discussed and 372 conclusions are drawn. 374 o Additional specialist issues are deferred to appendices in 375 addition to those already referred to above, in particular 376 Appendix B discusses specialist tunnelling issues that could arise 377 when ECN is fed back to a load regulation function on a middlebox, 378 rather than at the source of the path. 380 2. Requirements Language 382 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 383 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 384 document are to be interpreted as described in RFC 2119 [RFC2119]. 386 3. Summary of Pre-Existing RFCs 388 This section is informative not normative. It merely recaps pre- 389 existing RFCs to help motivate changing these behaviours. Earlier 390 relevant RFCs that were either experimental or incomplete with 391 respect to ECN tunnelling (RFC2481, RFC2401 and RFC2003) are not 392 discussed, although the backwards compatibility considerations in 393 Section 5 take them into account. The question of whether tunnel 394 implementations used in the Internet comply with any of these RFCs is 395 also not discussed. 397 3.1. Encapsulation at Tunnel Ingress 399 The controversy at tunnel ingress has been over whether to propagate 400 information about congestion experienced on the path upstream of the 401 tunnel ingress into the outer header of the tunnel. 403 Specifically, RFC3168 says that, if a tunnel fully supports ECN 404 (termed a 'full-functionality' ECN tunnel in [RFC3168]), the tunnel 405 ingress must not copy a CE marking from the inner header into the 406 outer header that it creates. 
Instead the tunnel ingress must set 407 the outer header to ECT(0) (i.e. codepoint 10) if the ECN field is 408 marked CE (codepoint 11) in the arriving IP header. We term this 409 'resetting' a CE codepoint. 411 However, the new IPsec architecture in [RFC4301] reverses this rule, 412 stating that the tunnel ingress must simply copy the ECN field from 413 the arriving to the outer header. The main purpose of the present 414 specification is to carry the new behaviour of IPsec over to all IP 415 in IP tunnels, so all tunnel ingress nodes consistently copy the ECN 416 field. 418 RFC3168 also provided a Limited Functionality mode that turns off ECN 419 processing over the scope of the tunnel. This is necessary if the 420 ingress does not know whether the tunnel egress supports propagation 421 of ECN markings. Neither Limited Functionality mode nor Full 422 Functionality mode is used in RFC4301 IPsec. 424 These pre-existing behaviours are summarised in Figure 1.

426 +-----------------+-----------------------------------------------+
427 | Incoming Header |             Outgoing Outer Header             |
428 | (also equal to  +---------------+---------------+---------------+
429 | Outgoing Inner  |  RFC3168 ECN  |  RFC3168 ECN  | RFC4301 IPsec |
430 | Header)         |    Limited    |     Full      |               |
431 |                 | Functionality | Functionality |               |
432 +-----------------+---------------+---------------+---------------+
433 | Not-ECT         | Not-ECT       | Not-ECT       | Not-ECT       |
434 | ECT(0)          | Not-ECT       | ECT(0)        | ECT(0)        |
435 | ECT(1)          | Not-ECT       | ECT(1)        | ECT(1)        |
436 | CE              | Not-ECT       | ECT(0)        | CE            |
437 +-----------------+---------------+---------------+---------------+

439 Figure 1: IP in IP Encapsulation: Recap of Pre-existing Behaviours 441 For encapsulation, the specification in Section 4 below brings all IP 442 in IP tunnels (v4 or v6) into line with the way IPsec tunnels 443 [RFC4301] now construct the ECN field, except where a legacy tunnel 444 egress might not understand ECN at all. 
This removes the now 445 redundant full functionality mode in the middle column of Figure 1. 446 Wherever possible it ensures that the outer header reveals any 447 congestion experienced so far on the whole path, not just since the 448 last tunnel ingress. 450 Why does it matter if we have different ECN encapsulation behaviours 451 for IPsec and non-IPsec tunnels? A general answer is that gratuitous 452 inconsistency constrains the available design space and makes it 453 harder to design networks and new protocols that work predictably. 455 But there is also a specific need not to reset the CE codepoint. The 456 standards track proposal for excess rate pre-congestion notification 457 (PCN [I-D.ietf-pcn-marking-behaviour]) only works correctly in the 458 presence of RFC4301 IPsec encapsulation or [RFC5129] MPLS 459 encapsulation, but not with RFC3168 IP in IP encapsulation 460 (Appendix E explains why). The PCN architecture 461 [I-D.ietf-pcn-architecture] states that the regular RFC3168 rules for 462 IP in IP tunnelling of the ECN field should not be used for PCN. But 463 if non-IPsec tunnels are already present within a network to which 464 PCN is being added, that is not particularly helpful advice. 466 The present specification provides a clean solution to this problem, 467 so that network operators who want to use PCN and tunnels can specify 468 that all tunnel endpoints in a PCN region need to be upgraded to 469 comply with this specification. Also, whether using PCN or not, as 470 more tunnel endpoints comply with this specification, it should make 471 ECN behaviour simpler, faster and more predictable. 473 To ensure copying rather than resetting CE on ingress will not cause 474 unintended side-effects, Appendix A assesses whether either harms any 475 security, control or management functions. 
It finds that resetting 476 CE makes life difficult in a number of directions, while copying CE 477 harms nothing (other than opening a low bit-rate covert channel 478 vulnerability which the IETF Security Area now deems manageable). 480 3.2. Decapsulation at Tunnel Egress 482 Both RFC3168 and RFC4301 specify the decapsulation behaviour 483 summarised in Figure 2. The ECN field in the outgoing header is set 484 to the codepoint at the intersection of the appropriate incoming 485 inner header (row) and incoming outer header (column).

486 +------------------+----------------------------------------------+
487 | Incoming Inner   |             Incoming Outer Header            |
488 | Header           +---------+------------+------------+----------+
489 |                  | Not-ECT | ECT(0)     | ECT(1)     | CE       |
490 +------------------+---------+------------+------------+----------+
491 | Not-ECT          | Not-ECT | drop(!!!)  | drop(!!!)  | drop(!!!)|
492 | ECT(0)           | ECT(0)  | ECT(0)     | ECT(0)     | CE       |
493 | ECT(1)           | ECT(1)  | ECT(1)     | ECT(1)     | CE       |
494 | CE               | CE      | CE         | CE         | CE       |
495 +------------------+---------+------------+------------+----------+
496                    |               Outgoing Header                |
497                    +----------------------------------------------+

499 Figure 2: IP in IP Decapsulation; Recap of Pre-existing Behaviour 501 The behaviour in the table derives from the logic given in RFC3168, 502 briefly recapped as follows: 504 o On decapsulation, if the inner ECN field is Not-ECT but the outer 505 ECN field is anything except Not-ECT the decapsulator must drop 506 the packet. 
Drop is mandated because known legal protocol 507 transitions should not be able to lead to these cases (indicated 508 in the table by '(!!!)'), therefore the decapsulator may also 509 raise an alarm; 511 o In all other cases, the outgoing ECN field is set to the more 512 severe marking of the outer and inner ECN fields, where the 513 ranking of severity from highest to lowest is CE, ECT, Not-ECT; 515 o ECT(0) and ECT(1) are considered of equal severity (indicated by 516 just 'ECT' in the rank order above). Where the inner and outer 517 ECN fields are both ECT but they differ, the packet is forwarded 518 with the codepoint of the inner ECN field, which prevents ECT 519 codepoints being used for a covert channel. 521 The specification for decapsulation in Section 4 fixes two problems 522 with this pre-existing behaviour: 524 o Firstly, forwarding the codepoint of the inner header in the cases 525 where both inner and outer are different values of ECT effectively 526 implies that any distinction between ECT(0) and ECT(1) cannot be 527 introduced in the future wherever a tunnel might be deployed. 528 Therefore, the currently specified tunnel decapsulation behaviour 529 unnecessarily wastes one of four codepoints (effectively wasting 530 half a bit) in the IP (v4 & v6) header. As explained in 531 Appendix A.1, the original reason for not using the outer ECT 532 codepoints for onward forwarding was to limit the covert channel 533 across a decapsulator to 1 bit per packet. However, now that the 534 IETF Security Area has deemed that a 2-bit covert channel through 535 an encapsulator is a manageable risk, the same should be true for 536 a decapsulator. 538 As well as being a general future-proofing issue, this problem is 539 immediately pressing for standardisation of pre-congestion 540 notification (PCN). PCN solutions generally require three 541 encoding states in addition to Not-ECT: one for 'not marked' and 542 two increasingly severe levels of marking. 
Although the ECN field 543 gives sufficient codepoints for these three states, they cannot 544 all be used for PCN because a change between ECT(0) and ECT(1) in 545 any tunnelled packet would be lost when the outer header was 546 decapsulated, dangerously discarding congestion signalling. A 547 number of wasteful or convoluted work-rounds to this problem are 548 being considered for standardisation by the PCN working group (see 549 Appendix D), but by far the simplest approach is just to remove 550 the covert channel blockages from tunnelling behaviour, that are 551 now deemed unnecessary anyway. Not only will this streamline PCN 552 standardisation, but it could also streamline other future uses of 553 these codepoints. 555 o Secondly, mandating drop is not always a good idea just because a 556 combination of headers seems invalid. There are many cases where 557 it has become nearly impossible to deploy new standards because 558 legacy middleboxes drop packets carrying header values they don't 559 expect. Where possible, the new decapsulation behaviour specified 560 in Section 4 below is more liberal in its response to unexpected 561 combinations of headers. 563 4. New ECN Tunnelling Rules 565 The ECN tunnel processing rules below in Section 4.1 (ingress 566 encapsulation) and Section 4.2 (egress decapsulation) are the default 567 for a packet with any DSCP. If required, different ECN encapsulation 568 rules MAY be defined as part of the definition of an appropriate 569 Diffserv PHB using the guidelines that follow in Section 4.3. 570 However, the deployment burden of handling exceptional PHBs in 571 implementations of all affected tunnels and lower layer link 572 protocols should not be underestimated. 574 4.1. Default Tunnel Ingress Behaviour 576 A tunnel ingress compliant with this specification MUST implement a 577 `normal mode'. 
It might also need to implement a `compatibility 578 mode' for backward compatibility with legacy tunnel egresses that do 579 not understand ECN (see Section 5 for when compatibility mode is 580 required). Note that these are modes of the ingress tunnel endpoint 581 only, not the tunnel as a whole. 583 Whatever the mode, the tunnel ingress forwards the inner header 584 without changing the ECN field. In normal mode a tunnel ingress 585 compliant with this specification MUST construct the outer 586 encapsulating IP header by copying the 2-bit ECN field of the 587 arriving IP header. In compatibility mode it clears the ECN field in 588 the outer header to the Not-ECT codepoint. These rules are tabulated 589 for convenience in Figure 3.

590 +-----------------+-------------------------------+
591 | Incoming Header |     Outgoing Outer Header     |
592 | (also equal to  +---------------+---------------+
593 | Outgoing Inner  | Compatibility |    Normal     |
594 | Header)         |     Mode      |     Mode      |
595 +-----------------+---------------+---------------+
596 | Not-ECT         | Not-ECT       | Not-ECT       |
597 | ECT(0)          | Not-ECT       | ECT(0)        |
598 | ECT(1)          | Not-ECT       | ECT(1)        |
599 | CE              | Not-ECT       | CE            |
600 +-----------------+---------------+---------------+

602 Figure 3: New IP in IP Encapsulation Behaviours 604 Compatibility mode is the same per packet behaviour as the ingress 605 end of RFC3168's limited functionality mode. Normal mode is the same 606 per packet behaviour as the ingress end of RFC4301 IPsec. 608 4.2. Default Tunnel Egress Behaviour 610 To decapsulate the inner header at the tunnel egress, a compliant 611 tunnel egress MUST set the outgoing ECN field to the codepoint at the 612 intersection of the appropriate incoming inner header (row) and outer 613 header (column) in Figure 4. 
615 +------------------+----------------------------------------------+
616 | Incoming Inner   |             Incoming Outer Header            |
617 | Header           +---------+------------+------------+----------+
618 |                  | Not-ECT |   ECT(0)   |   ECT(1)   |    CE    |
619 +------------------+---------+------------+------------+----------+
620 |     Not-ECT      | Not-ECT |Not-ECT(!!!)|  drop(!!!) | drop(!!!)|
621 |      ECT(0)      |  ECT(0) |   ECT(0)   |   ECT(1)   |    CE    |
622 |      ECT(1)      |  ECT(1) |ECT(1)(!!!) |   ECT(1)   |    CE    |
623 |        CE        |    CE   |     CE     |  CE(!!!)   |    CE    |
624 +------------------+---------+------------+------------+----------+
625                    |               Outgoing Header                |
626                    +----------------------------------------------+

628 Figure 4: New IP in IP Decapsulation Behaviour

630 This table for decapsulation behaviour is derived from the following 631 logic:

633 o If the inner ECN field is Not-ECT the decapsulator MUST NOT 634 propagate any other ECN codepoint in the outer header onwards. 635 This is because the inner Not-ECT marking is set by transports 636 that would not understand the ECN protocol. Instead:

638 * If the inner ECN field is Not-ECT and the outer ECN field is 639 ECT(1) or CE the decapsulator MUST drop the packet. 640 Reasoning: these combinations of codepoints either imply some 641 illegal protocol transition has occurred within the tunnel, or 642 that some locally defined mechanism is being used within the 643 tunnel that might be signalling congestion. In either case, 644 the only appropriate signal to the transport is a packet drop. 645 It would have been nice to allow packets with ECT(1) in the 646 outer to be forwarded, but drop has had to be mandated in case 647 future multi-level ECN schemes are defined. Then ECT(1) and CE 648 can be used in the future to signify two levels of congestion 649 severity.

651 * If the inner ECN field is Not-ECT and the outer ECN field is 652 ECT(0) or Not-ECT the decapsulator MUST forward the packet with 653 the ECN field cleared to Not-ECT.
654 Reasoning: Although no known legal protocol transition would 655 lead to ECT(0) in the outer and Not-ECT in the inner, no known 656 or proposed protocol uses ECT(0) as a congestion signal either. 657 Therefore in this case the packet can be forwarded rather than 658 dropped, which will allow future standards actions to use this 659 combination. 661 o In all other cases, the outgoing ECN field is set to the more 662 severe marking of the outer and inner ECN fields, where the 663 ranking of severity from highest to lowest is CE, ECT(1), ECT(0), 664 Not-ECT; 666 o There are cases where no currently legal transition in any current 667 or previous ECN tunnelling specification would result in certain 668 combinations of inner and outer ECN fields. These cases are 669 indicated in Figure 4 by '(!!!)'. In these cases, the 670 decapsulator SHOULD log the event and MAY also raise an alarm, but 671 not so often that the illegal combinations would amplify into a 672 flood of alarm messages. 674 The above logic allows for ECT(0) and ECT(1) to both represent the 675 same severity of congestion marking (e.g. "not congestion marked"). 676 But it also allows future schemes to be defined where ECT(1) is a 677 more severe marking than ECT(0). This approach is discussed in 678 Appendix D and in the discussion of the ECN nonce [RFC3540] in 679 Section 8. 681 4.3. Design Principles for Future Non-Default Schemes 683 This section is informative, not normative. 685 S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to 686 'switch in' different behaviours for marking the ECN field, just as 687 it switches in different per-hop behaviours (PHBs) for scheduling. 688 Therefore here we give guidance for designing possibly different 689 marking schemes. 691 In one word the guidance is "Don't".
If a scheme requires tunnels to 692 implement special processing of the ECN field for certain DSCPs, it 693 is highly unlikely that every implementer of every tunnel will want 694 to add the required exception and that operators will want to deploy 695 the required configuration options. Therefore it is highly likely 696 that some tunnels within a network will not implement this special 697 case. Therefore, designers should avoid non-default tunnelling 698 schemes if at all possible. 700 That said, if a non-default scheme for processing the ECN field is 701 really required, the following guidelines may prove useful in its 702 design: 704 o For any new scheme, a tunnel ingress should not set the ECN field 705 of the outer header if it cannot guarantee that any corresponding 706 tunnel egress will understand how to handle such an ECN field. 708 o On encapsulation in any new scheme, an outer header capable of 709 carrying congestion markings should reflect accumulated congestion 710 since the last interface designed to regulate load (see 711 Appendix A.2 for the definition of a Load Regulator, which is 712 usually but not always the data source). This implies that new 713 schemes for tunnelling congestion notification should copy 714 congestion notification into the outer header of each new 715 encapsulating header that supports it. 717 Reasoning: The constraints from the three perspectives of 718 security, control and management in Appendix A are somewhat in 719 tension as to whether a tunnel ingress should copy congestion 720 markings into the outer header it creates or reset them. From the 721 control perspective either copying or resetting works for existing 722 arrangements, but copying has more potential for simplifying 723 control. From the management perspective copying is preferable. 724 From the security perspective resetting is preferable but copying 725 is now considered acceptable given the bandwidth of a 2-bit covert 726 channel can be managed. 
Therefore, on balance, copying is simpler 727 and more useful than resetting and does minimal harm. 729 o For any new scheme, a tunnel egress should not forward any ECN 730 codepoint if the arriving inner header implies the transport will 731 not understand how to process it. 733 o On decapsulation in any new scheme, if a combination of inner and 734 outer headers is encountered that should not have been possible, 735 this event should be logged and an alarm raised. But the packet 736 should still be forwarded with a safe codepoint setting if at all 737 possible. This increases the chances of 'forward compatibility' 738 with possible future protocol extensions. 740 o On decapsulation in any new scheme, the ECN field that the tunnel 741 egress forwards should reflect the more severe congestion marking 742 of the arriving inner and outer headers. 744 5. Backward Compatibility 746 Note: in RFC3168, a whole tunnel was considered in one of two modes: 747 limited functionality or full functionality. The new modes defined 748 in this specification are only modes of the tunnel ingress. The new 749 tunnel egress behaviour has only one mode and doesn't need to know 750 what mode the ingress is in. 752 5.1. Non-Issues Upgrading Any Tunnel Decapsulation 754 This specification only changes the egress per-packet calculation of 755 the ECN field for combinations of inner and outer headers that have 756 so far not been used in any IETF protocols. Therefore, a tunnel 757 egress complying with any previous specification (RFC4301, both modes 758 of RFC3168, both modes of RFC2481, RFC2401 and RFC2003) can be 759 upgraded to comply with this new decapsulation specification without 760 any backwards compatibility issues. 762 The proposed tunnel egress behaviour also requires no additional mode 763 or option configuration at the ingress or egress nor any additional 764 negotiation with the ingress. A compliant tunnel egress merely needs 765 to implement the one behaviour in Section 4. 
The reduction to one 766 mode at the egress has no backwards compatibility issues, because 767 previously the egress produced the same output whichever mode the 768 tunnel was in. 770 These new decapsulation rules have been defined in such a way that 771 congestion control will still work safely if any of the earlier 772 versions of ECN processing are used unilaterally at the encapsulating 773 ingress of the tunnel (any of RFC2003, RFC2401, either mode of 774 RFC2481, either mode of RFC3168, RFC4301 and this present 775 specification). If a tunnel ingress tries to negotiate to use 776 limited functionality mode or full functionality mode [RFC3168], a 777 decapsulating tunnel egress compliant with this specification MUST 778 agree to either request, as its behaviour will be the same in both 779 cases. 781 For 'forward compatibility', a compliant tunnel egress SHOULD raise a 782 warning about any requests to enter modes it doesn't recognise, but 783 it can continue operating. If no ECN-related mode is requested, a 784 compliant tunnel egress can continue without raising any error or 785 warning as its egress behaviour is compatible with all the legacy 786 ingress behaviours that don't negotiate capabilities. 788 5.2. Non-Issues for RFC4301 IPsec Encapsulation 790 The new normal mode of ingress behaviour defined above (Section 4.1) 791 brings all IP in IP tunnels into line with [RFC4301]. If one end of 792 an IPsec tunnel is compliant with [RFC4301], the other end is 793 guaranteed to also be RFC4301-compliant (there could be corner cases 794 where manual keying is used, but they will be set aside here). 795 Therefore the new normal ingress behaviour introduces no backward 796 compatibility issues with IKEv2 [RFC4306] IPsec [RFC4301] tunnels, and 797 no need for any new modes, options or configuration. 799 5.3.
Upgrading Other IP in IP Tunnel Encapsulators 801 At the tunnel ingress, this specification effectively extends the 802 scope of RFC4301's ingress behaviour to any IP in IP tunnel. If any 803 other IP in IP tunnel ingress (i.e. not RFC4301 IPsec) is upgraded to 804 be compliant with this specification, it has to cater for the 805 possibility that it is talking to a legacy tunnel egress that may not 806 know how to process the ECN field. If ECN capable outer headers were 807 sent towards a legacy (e.g. [RFC2003]) egress, it would most likely 808 simply disregard the outer headers, dangerously discarding 809 information about congestion experienced within the tunnel. ECN- 810 capable traffic sources would not see any congestion feedback and 811 instead continually ratchet up their share of the bandwidth without 812 realising that cross-flows from other ECN sources were continually 813 having to ratchet down. 815 This specification introduces no new backward compatibility issues 816 when a compliant ingress talks with a legacy egress, but it has to 817 provide similar safeguards to those already defined in RFC3168. 818 Therefore, to comply with this specification, a tunnel ingress that 819 does not always know the ECN capability of its tunnel egress MUST 820 implement a 'normal' mode and a 'compatibility' mode, and for safety 821 it MUST initiate each negotiated tunnel in compatibility mode. 823 However, a tunnel ingress can be compliant even if it only implements 824 the 'normal mode' of encapsulation behaviour, but only as long as it 825 is designed or configured so that all possible tunnel egress nodes it 826 will ever talk to will have at least full ECN functionality 827 (complying with either RFC3168 full functionality mode, RFC4301 or 828 this present specification). 830 Before switching to normal mode, a compliant tunnel ingress that does 831 not know the egress ECN capability MUST negotiate with the tunnel 832 egress.
If the egress says it is compliant with this specification 833 or with RFC3168 full functionality mode, the ingress puts itself into 834 normal mode. If the egress denies compliance with all of these or 835 doesn't understand the question, the tunnel ingress MUST remain in 836 compatibility mode. 838 The encapsulation rules for normal mode and compatibility mode are 839 defined in Section 4 (i.e. header copying or zeroing respectively). 841 An ingress cannot claim compliance with this specification simply by 842 disabling ECN processing across the tunnel (only implementing 843 compatibility mode). Although such a tunnel ingress is at least safe 844 with the ECN behaviour of any egress it may encounter (any of 845 RFC2003, RFC2401, either mode of RFC2481 and RFC3168's limited 846 functionality mode), it doesn't meet the aim of introducing ECN. 848 Therefore, a compliant tunnel ingress MUST at least implement `normal 849 mode' and, if it might be used with arbitrary tunnel egress nodes, it 850 MUST also implement `compatibility mode'. 852 Implementation note: if a compliant node is the ingress for multiple 853 tunnels, a mode setting will need to be stored for each tunnel 854 ingress. However, if a node is the egress for multiple tunnels, none 855 of the tunnels will need to store a mode setting, because a compliant 856 egress can only be in one mode. 858 6. Changes from Earlier RFCs 860 On encapsulation, the rule that a normal mode tunnel ingress MUST 861 copy any ECN field into the outer header is a change to the ingress 862 behaviour of RFC3168, but it is the same as the rules for IPsec 863 tunnels in RFC4301. 
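The following informal sketch is not part of this specification. It illustrates, in Python, the default ingress rule of Figure 3 and the default egress calculation of Figure 4, which are the behaviours this section compares against earlier RFCs. All names (ECN, encapsulate, decapsulate, is_unexpected) are hypothetical and chosen only for illustration; the integer values are the ECN codepoint encodings of RFC3168.

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """2-bit ECN codepoints (wire encoding from RFC3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Severity ranking from Section 4.2, lowest to highest:
# Not-ECT, ECT(0), ECT(1), CE.
SEVERITY = {ECN.NOT_ECT: 0, ECN.ECT_0: 1, ECN.ECT_1: 2, ECN.CE: 3}

# (inner, outer) pairs flagged '(!!!)' in Figure 4.
UNEXPECTED = {
    (ECN.NOT_ECT, ECN.ECT_0),
    (ECN.NOT_ECT, ECN.ECT_1),
    (ECN.NOT_ECT, ECN.CE),
    (ECN.ECT_1, ECN.ECT_0),
    (ECN.CE, ECN.ECT_1),
}


def encapsulate(inner: ECN, normal_mode: bool) -> ECN:
    """Return the outer ECN field per Figure 3.

    The inner header is forwarded unchanged in both modes, so only
    the outer codepoint is computed here.
    """
    return inner if normal_mode else ECN.NOT_ECT


def decapsulate(inner: ECN, outer: ECN) -> Optional[ECN]:
    """Return the outgoing ECN field per Figure 4, or None for drop."""
    if inner == ECN.NOT_ECT:
        # The transport did not signal ECN support, so no other
        # codepoint may be propagated onwards.
        if outer in (ECN.ECT_1, ECN.CE):
            return None  # drop
        return ECN.NOT_ECT
    # Otherwise forward the more severe of the two markings.
    return max(inner, outer, key=SEVERITY.__getitem__)


def is_unexpected(inner: ECN, outer: ECN) -> bool:
    """True if the combination SHOULD be logged (rate-limited)."""
    return (inner, outer) in UNEXPECTED
```

Under these assumptions, decapsulate(ECN.ECT_0, ECN.ECT_1) yields ECT(1) and decapsulate(ECN.NOT_ECT, ECN.ECT_0) yields Not-ECT rather than a drop. The egress function takes no mode argument, reflecting that a compliant egress has only one mode.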
865 On decapsulation, the rules for calculating the outgoing ECN field at 866 a tunnel egress are similar to the full functionality mode of ECN in 867 RFC3168 and to RFC4301, with the following exceptions: 869 o The outer, not the inner, is propagated when the outer is ECT(1) 870 and the inner is ECT(0); 872 o A packet with Not-ECT in the inner may be forwarded as Not-ECT 873 rather than dropped, if the outer is ECT(0); 875 o The following extra illegal combinations have been identified, 876 which may require logging and/or an alarm: outer ECT(1) with inner 877 CE; outer ECT(0) with inner ECT(1) 879 The rules for how a tunnel establishes whether the egress has full 880 functionality ECN capabilities are an update to RFC3168. For all the 881 typical cases, RFC4301 is not updated by the ECN capability check in 882 this specification, because a typical RFC4301 tunnel ingress will 883 have already established that it is talking to an RFC4301 tunnel 884 egress (e.g. if it uses IKEv2). However, there may be some corner 885 cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with 886 an egress with limited functionality ECN handling. Strictly, for 887 such corner cases, the requirement to use compatibility mode in this 888 specification updates RFC4301, but this is unlikely to be necessary 889 to implement for this corner case in practice. 891 The optional ECN Tunnel field in the IPsec security association 892 database (SAD) and the optional ECN Tunnel Security Association 893 Attribute defined in RFC3168 are no longer needed. The security 894 association (SA) has no policy on ECN usage, because all RFC4301 895 tunnels now support ECN without any policy choice. 897 RFC3168 defines a (required) limited functionality mode and an 898 (optional) full functionality mode for a tunnel, but RFC4301 doesn't 899 need modes. 
In this specification only the ingress might need two 900 modes: a normal mode (required) and a compatibility mode (required in 901 some scenarios, optional in others). The egress needs only one mode 902 which correctly handles any ingress ECN behaviour. 904 Additional changes to the RFC Index (to be removed by the RFC Editor): 906 In the RFC index, RFC3168 should be identified as an update to 907 RFC2003. RFC4301 should be identified as an update to RFC3168. 909 This specification updates RFC3168 and RFC4301. 911 7. IANA Considerations 913 This memo includes no request to IANA. 915 8. Security Considerations 917 Appendix A.1 discusses the security constraints imposed on ECN tunnel 918 processing. The new rules for ECN tunnel processing (Section 4) 919 trade-off between security (covert channels) and congestion 920 monitoring & control. In fact, ensuring congestion markings are not 921 lost is itself another aspect of security, because if we allowed 922 congestion notification to be lost, any attempt to enforce a response 923 to congestion would be much harder. 925 If alternate congestion notification semantics are defined for a 926 certain PHB (e.g. the pre-congestion notification architecture 927 [I-D.ietf-pcn-architecture]), the scope of the alternate semantics 928 might typically be bounded by the limits of a Diffserv region or 929 regions, as envisaged in [RFC4774]. The inner headers in tunnels 930 crossing the boundary of such a Diffserv region but ending within the 931 region can potentially leak the external congestion notification 932 semantics into the region, or leak the internal semantics out of the 933 region. [RFC2983] discusses the need for Diffserv traffic 934 conditioning to be applied at these tunnel endpoints as if they are 935 at the edge of the Diffserv region. Similar concerns apply to any 936 processing or propagation of the ECN field at the edges of a Diffserv 937 region with alternate ECN semantics. 
Such edge processing must also 938 be applied at the endpoints of tunnels with one end inside and the 939 other outside the domain. [I-D.ietf-pcn-architecture] gives specific 940 advice on this for the PCN case, but other definitions of alternate 941 semantics will need to discuss the specific security implications in 942 each case. 944 With the decapsulation rules as they stood in RFC3168 and RFC4301, a 945 small part of the protection of the ECN nonce [RFC3540] was 946 compromised. The new decapsulation rules do not solve this problem. 948 The minor problem is as follows: The ECN nonce was defined to enable 949 the data source to detect if a CE marking had been applied then 950 subsequently removed. The source could detect this by weaving a 951 pseudo-random sequence of ECT(0) and ECT(1) values into a stream of 952 packets, which is termed an ECN nonce. By the decapsulation rules in 953 RFC3168 and RFC4301, if the inner and outer headers carry 954 contradictory ECT values only the inner header is preserved for 955 onward forwarding. So if a CE marking added to the outer ECN field 956 in a tunnel has been illegally (or accidentally) suppressed by a 957 subsequent node in the tunnel, the decapsulator will revert the ECN 958 field to its value before tampering, hiding all evidence of the crime 959 from the onward feedback loop. We chose not to close this minor 960 loophole for all the following reasons: 962 1. This loophole is only applicable in the corner case where the 963 attacker controls a network node downstream of a congested node 964 in the same tunnel; 966 2. In tunnelling scenarios, the ECN nonce is already vulnerable to 967 suppression by nodes downstream of a congested node in the same 968 tunnel, if they can copy the ECT value in the inner header to the 969 outer header (any node in the tunnel can do this if the inner 970 header is not encrypted, and an IPsec tunnel egress can do it 971 whether or not the tunnel is encrypted); 973 3. 
Although the new decapsulation behaviour removes evidence of 974 congestion suppression from the onward feedback loop, the 975 decapsulator itself can at least detect that congestion within 976 the tunnel has been suppressed; 978 4. The ECN nonce [RFC3540] currently has experimental status and 979 there has been no evidence that anyone has implemented it beyond 980 the author's prototype. 982 We could have fixed this loophole by specifying that the outer header 983 should always be propagated onwards if inner and outer are both ECT. 984 Although this would close the minor loophole in the nonce, it would 985 raise a minor safety issue if multilevel ECN or PCN were used. A 986 less severe marking in the inner header would override a more severe 987 one in the outer. Both are corner cases so it is difficult to decide 988 which is more important: 990 1. The loophole in the nonce is only for a minor case of one tunnel 991 node attacking another in the same tunnel; 993 2. The severity inversion for multilevel congestion notification 994 would not result from any legal codepoint transition. 996 We decided safety against misconfiguration was slightly more 997 important than securing against an attack that has little, if any, 998 clear motivation. 1000 If a legacy security policy configures a legacy tunnel ingress to 1001 negotiate to turn off ECN processing, a compliant tunnel egress will 1002 agree to a request to turn off ECN processing but it will actually 1003 still copy CE markings from the outer to the forwarded header. 1004 Although the tunnel ingress 'I' in Figure 5 (Appendix A.1) will set 1005 all ECN fields in outer headers to Not-ECT, 'M' could still toggle CE 1006 on and off to communicate covertly with 'B', because we have 1007 specified that 'E' only has one mode regardless of what mode it says 1008 it has negotiated. We could have specified that 'E' should have a 1009 limited functionality mode and check for such behaviour. 
But we 1010 decided not to add the extra complexity of two modes on a compliant 1011 tunnel egress merely to cater for a legacy security concern that is 1012 now considered manageable. 1014 9. Conclusions 1016 This document updates the ingress tunnelling encapsulation of RFC3168 1017 ECN for all IP in IP tunnels to bring it into line with the new 1018 behaviour in the IPsec architecture of RFC4301. It copies rather 1019 than resets a congestion experienced (CE) marking when creating outer 1020 headers. 1022 It also specifies new rules that update both RFC3168 and RFC4301 for 1023 calculating the outgoing ECN field on tunnel decapsulation. The new 1024 rules update egress behaviour for two specific combinations of inner 1025 and outer header that have no current legal usage, but will now be 1026 possible to use in future standards actions, rather than being wasted 1027 by current tunnelling behaviour. 1029 The new rules propagate changes to the ECN field across tunnel end- 1030 points that were previously blocked due to a perceived covert channel 1031 vulnerability. The new IPsec architecture deems the two-bit covert 1032 channel that the ECN field opens up is a manageable threat, so these 1033 new rules bring all IP in IP tunnelling into line with this new more 1034 permissive attitude. The result is a single specification for all 1035 future tunnelling of ECN, whether IPsec or not. Then equipment can 1036 be specified against a single ECN behaviour and ECN markings can have 1037 a well-defined meaning wherever they are measured in a network. This 1038 new certainty will enable new uses of the ECN field that would 1039 otherwise be confounded by ambiguity. 1041 The immediate motivation for making these changes is to allow the 1042 introduction of multi-level pre-congestion notification (PCN). But 1043 great care has been taken to ensure the resulting ECN tunnelling 1044 behaviour is simple and generic for other potential future uses. 
1046 The change to encapsulation has been analysed from the three 1047 perspectives of security, control and management. They are somewhat 1048 in tension as to whether a tunnel ingress should copy congestion 1049 markings into the outer header it creates or reset them. From the 1050 control perspective either copying or resetting works for existing 1051 arrangements, but copying has more potential for simplifying control 1052 and resetting breaks at least one proposal already on the standards 1053 track. From the management and monitoring perspective copying is 1054 preferable. From the network security perspective (theft of service 1055 etc) copying is preferable. From the information security 1056 perspective resetting is preferable, but the IETF Security Area now 1057 considers copying acceptable given the bandwidth of a 2-bit covert 1058 channel can be managed. Therefore there are no points against 1059 copying and a number against resetting CE on ingress. 1061 The only downside of the changes to decapsulation is that the same 1062 2-bit covert channel is opened up as at the ingress, but this is now 1063 deemed to be a manageable threat. The changes at decapsulation have 1064 been found to be free of any backwards compatibility issues. 1066 10. Acknowledgements 1068 Thanks to Anil Agawaal for pointing out a case where it's safe for a 1069 tunnel decapsulator to forward a combination of headers it doesn't 1070 understand. Thanks to David Black for explaining a better way to 1071 think about function placement and to Louise Burness for a better way 1072 to think about multilayer transports and networks, having read 1073 [Patterns_Arch]. Also thanks to Arnaud Jacquet for the idea for 1074 Appendix C. Thanks to Michael Menth, Bruce Davie, Toby Moncaster, 1075 Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for 1076 their thoughts and careful review comments. 
1078 Bob Briscoe is partly funded by Trilogy, a research project (ICT- 1079 216372) supported by the European Community under its Seventh 1080 Framework Programme. The views expressed here are those of the 1081 author only. 1083 11. Comments Solicited 1085 Comments and questions are encouraged and very welcome. They can be 1086 addressed to the IETF Transport Area working group mailing list 1087 , and/or to the authors. 1089 12. References 1091 12.1. Normative References 1093 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 1094 October 1996. 1096 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1097 Requirement Levels", BCP 14, RFC 2119, March 1997. 1099 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1100 "Definition of the Differentiated Services Field (DS 1101 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1102 December 1998. 1104 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1105 of Explicit Congestion Notification (ECN) to IP", 1106 RFC 3168, September 2001. 1108 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1109 Internet Protocol", RFC 4301, December 2005. 1111 12.2. Informative References 1113 [I-D.briscoe-pcn-3-in-1-encoding] 1114 Briscoe, B., "PCN 3-State Encoding Extension in a single 1115 DSCP", draft-briscoe-pcn-3-in-1-encoding-00 (work in 1116 progress), October 2008. 1118 [I-D.charny-pcn-single-marking] 1119 Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre- 1120 Congestion Notification Using Single Marking for Admission 1121 and Termination", draft-charny-pcn-single-marking-03 1122 (work in progress), November 2007. 1124 [I-D.ietf-pcn-architecture] 1125 Eardley, P., "Pre-Congestion Notification (PCN) 1126 Architecture", draft-ietf-pcn-architecture-10 (work in 1127 progress), March 2009. 1129 [I-D.ietf-pcn-baseline-encoding] 1130 Moncaster, T., Briscoe, B., and M. 
Menth, "Baseline 1131 Encoding and Transport of Pre-Congestion Information", 1132 draft-ietf-pcn-baseline-encoding-02 (work in progress), 1133 February 2009. 1135 [I-D.ietf-pcn-marking-behaviour] 1136 Eardley, P., "Marking behaviour of PCN-nodes", 1137 draft-ietf-pcn-marking-behaviour-02 (work in progress), 1138 March 2009. 1140 [I-D.ietf-pwe3-congestion-frmwk] 1141 Bryant, S., Davie, B., Martini, L., and E. Rosen, 1142 "Pseudowire Congestion Control Framework", 1143 draft-ietf-pwe3-congestion-frmwk-01 (work in progress), 1144 May 2008. 1146 [I-D.menth-pcn-psdm-encoding] 1147 Menth, M., Babiarz, J., Moncaster, T., and B. Briscoe, 1148 "PCN Encoding for Packet-Specific Dual Marking (PSDM)", 1149 draft-menth-pcn-psdm-encoding-00 (work in progress), 1150 July 2008. 1152 [I-D.moncaster-pcn-3-state-encoding] 1153 Moncaster, T., Briscoe, B., and M. Menth, "A three state 1154 extended PCN encoding scheme", 1155 draft-moncaster-pcn-3-state-encoding-01 (work in 1156 progress), March 2009. 1158 [I-D.satoh-pcn-st-marking] 1159 Satoh, D., Maeda, Y., Phanachet, O., and H. Ueno, "Single 1160 PCN Threshold Marking by using PCN baseline encoding for 1161 both admission and termination controls", 1162 draft-satoh-pcn-st-marking-01 (work in progress), 1163 March 2009. 1165 [IEEE802.1au] 1166 IEEE, "IEEE Standard for Local and Metropolitan Area 1167 Networks--Virtual Bridged Local Area Networks - Amendment 1168 10: Congestion Notification", 2008, 1169 . 1171 (Work in Progress; Access Controlled link within page) 1173 [ITU-T.I.371] 1174 ITU-T, "Traffic Control and Congestion Control in B-ISDN", 1175 ITU-T Rec. I.371 (03/04), March 2004. 1177 [PCNcharter] 1178 IETF, "Congestion and Pre-Congestion Notification (pcn)", 1179 IETF w-g charter , Feb 2007, 1180 . 1182 [Patterns_Arch] 1183 Day, J., "Patterns in Network Architecture: A Return to 1184 Fundamentals", Pub: Prentice Hall ISBN-13: 9780132252423, 1185 Jan 2008. 1187 [RFC1254] Mankin, A. and K. 
Ramakrishnan, "Gateway Congestion 1188 Control Survey", RFC 1254, August 1991. 1190 [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. 1191 Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 1192 Functional Specification", RFC 2205, September 1997. 1194 [RFC2983] Black, D., "Differentiated Services and Tunnels", 1195 RFC 2983, October 2000. 1197 [RFC3426] Floyd, S., "General Architectural and Policy 1198 Considerations", RFC 3426, November 2002. 1200 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 1201 Congestion Notification (ECN) Signaling with Nonces", 1202 RFC 3540, June 2003. 1204 [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", 1205 RFC 4306, December 2005. 1207 [RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol 1208 (HIP) Architecture", RFC 4423, May 2006. 1210 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 1211 Explicit Congestion Notification (ECN) Field", BCP 124, 1212 RFC 4774, November 2006. 1214 [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion 1215 Marking in MPLS", RFC 5129, January 2008. 1217 [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain", 1218 2000, . 1221 (Expired) 1223 Appendix A. Design Constraints 1225 Tunnel processing of a congestion notification field has to meet 1226 congestion control and management needs without creating new 1227 information security vulnerabilities (if information security is 1228 required). This appendix documents the analysis of the tradeoffs 1229 between these factors that led to the new encapsulation rules in 1230 Section 4.1. 1232 A.1. Security Constraints 1234 Information security can be assured by using various end to end 1235 security solutions (including IPsec in transport mode [RFC4301]), but 1236 a commonly used scenario involves the need to communicate between two 1237 physically protected domains across the public Internet. 
In this 1238 case there are certain management advantages to using IPsec in tunnel 1239 mode solely across the publicly accessible part of the path. The 1240 path followed by a packet then crosses security 'domains'; the ones 1241 protected by physical or other means before and after the tunnel and 1242 the one protected by an IPsec tunnel across the otherwise unprotected 1243 domain. We will use the scenario in Figure 5 where endpoints 'A' and 1244 'B' communicate through a tunnel. The tunnel ingress 'I' and egress 1245 'E' are within physically protected edge domains, while the tunnel 1246 spans an unprotected internetwork where there may be 'men in the 1247 middle', M.

1249      physically     unprotected      physically
1250 <-protected domain-><--domain--><-protected domain->
1251 +------------------+            +------------------+
1252 |                  |     M      |                  |
1253 |    A-------->I=========>==========>E-------->B   |
1254 |                  |            |                  |
1255 +------------------+            +------------------+
1256                <----IPsec secured---->
1257                        tunnel

1259 Figure 5: IPsec Tunnel Scenario

1261 IPsec encryption is typically used to prevent 'M' seeing messages 1262 from 'A' to 'B'. IPsec authentication is used to prevent 'M' 1263 masquerading as the sender of messages from 'A' to 'B' or altering 1264 their contents. But 'I' can also use IPsec tunnel mode to allow 'A' 1265 to communicate with 'B', but impose encryption to prevent 'A' leaking 1266 information to 'M'. Or 'E' can insist that 'I' uses tunnel mode 1267 authentication to prevent 'M' communicating information to 'B'. 1268 Mutable IP header fields such as the ECN field (as well as the TTL/ 1269 Hop Limit and DS fields) cannot be included in the cryptographic 1270 calculations of IPsec. Therefore, if 'I' copies these mutable fields 1271 into the outer header that is exposed across the tunnel it will have 1272 allowed a covert channel from 'A' to M that bypasses its encryption 1273 of the inner header.
And if 'E' copies these fields from the outer header to the inner, even if it validates authentication from 'I', it will have allowed a covert channel from 'M' to 'B'.

ECN at the IP layer is designed to carry information about congestion from a congested resource towards downstream nodes. Typically a downstream transport might feed the information back somehow to the point upstream of the congestion that can regulate the load on the congested resource, but other actions are possible (see [RFC3168] S.6). In terms of the above unicast scenario, ECN is typically intended to create an information channel from 'M' to 'B' (for 'B' to feed back to 'A'). Therefore the goals of IPsec and ECN are mutually incompatible.

With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, "controls are provided to manage the bandwidth of this [covert] channel". Using the ECN processing rules of RFC4301, the channel bandwidth is two bits per datagram from 'A' to 'M' and one bit per datagram from 'M' to 'B' (because 'E' limits the combinations of the 2-bit ECN field that it will copy). In both cases the covert channel bandwidth is further reduced by noise from any real congestion marking. RFC4301 therefore implies that these covert channels are sufficiently limited to be considered a manageable threat. However, with respect to the larger (6b) DS field, the same section of RFC4301 says not copying is the default, but a configuration option can allow copying "to allow a local administrator to decide whether the covert channel provided by copying these bits outweighs the benefits of copying". Of course, an administrator considering copying of the DS field has to take into account that it could be concatenated with the ECN field giving an 8b per datagram covert channel.
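The per-datagram figures quoted above follow from counting how many field values the receiving party can tell apart. The sketch below is illustrative only: the function names are hypothetical, and the results are upper bounds, since real congestion marking adds noise to the channel.

```python
import math


def covert_bits_per_datagram(distinguishable_values: int) -> float:
    """Upper bound on covert bits carried per datagram by a header field
    whose receiver can tell apart `distinguishable_values` settings."""
    if distinguishable_values < 1:
        raise ValueError("a field must have at least one value")
    return math.log2(distinguishable_values)


def covert_bits_per_second(bits_per_datagram: float,
                           datagrams_per_second: float) -> float:
    """Scale the per-datagram bound by an assumed packet rate."""
    return bits_per_datagram * datagrams_per_second


# 'I' copies the whole 2-bit ECN field outwards: four visible codepoints.
ecn_outbound = covert_bits_per_datagram(4)

# 'E' limits what it copies inwards, leaving two usable states
# (the text above quotes this as one bit per datagram).
ecn_inbound = covert_bits_per_datagram(2)

# 6-bit DS field concatenated with the 2-bit ECN field.
ds_plus_ecn = covert_bits_per_datagram(2 ** 6 * 2 ** 2)

print(ecn_outbound, ecn_inbound, ds_plus_ecn)
```

The packet rate passed to covert_bits_per_second is an assumption supplied by the caller; the source only states per-datagram bandwidths.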
Thus, for tunnelling the 6b Diffserv field two conceptual models have had to be defined so that administrators can trade off security against the needs of traffic conditioning [RFC2983]:

The uniform model:  where the Diffserv field is preserved end-to-end by copying into the outer header on encapsulation and copying from the outer header on decapsulation.

The pipe model:  where the outer header is independent of that in the inner header so it hides the Diffserv field of the inner header from any interaction with nodes along the tunnel.

However, for ECN, the new IPsec security architecture in RFC4301 only standardised one tunnelling model equivalent to the uniform model. It deemed that simplicity was more important than allowing administrators the option of a tiny increment in security, especially given not copying congestion indications could seriously harm everyone's network service.

A.2.  Control Constraints

Congestion control requires that any congestion notification marked into packets by a resource will be able to traverse a feedback loop back to a function capable of controlling the load on that resource. To be precise, rather than calling this function the data source, we will call it the Load Regulator. This will allow us to deal with exceptional cases where load is not regulated by the data source, but usually the two terms will be synonymous. Note the term "a function _capable of_ controlling the load" deliberately includes a source application that doesn't actually control the load but ought to (e.g. an application without congestion control that uses UDP).
   A--->R--->I=========>M=========>E-------->B

                  Figure 6: Simple Tunnel Scenario

We now consider a similar tunnelling scenario to the IPsec one just described, but without the different security domains so we can just focus on ensuring the control loop and management monitoring can work (Figure 6). If we want resources in the tunnel to be able to explicitly notify congestion and the feedback path is from 'B' to 'A', it will certainly be necessary for 'E' to copy any CE marking from the outer header to the inner header for onward transmission to 'B', otherwise congestion notification from resources like 'M' cannot be fed back to the Load Regulator ('A'). But it doesn't seem necessary for 'I' to copy CE markings from the inner to the outer header. For instance, if resource 'R' is congested, it can send congestion information to 'B' using the congestion field in the inner header without 'I' copying the congestion field into the outer header and 'E' copying it back to the inner header. 'E' can still write any additional congestion marking introduced across the tunnel into the congestion field of the inner header.

It might be useful for the tunnel egress to be able to tell whether congestion occurred across a tunnel or upstream of it. If outer header congestion marking was reset by the tunnel ingress ('I'), at the end of a tunnel ('E') the outer headers would indicate congestion experienced across the tunnel ('I' to 'E'), while the inner header would indicate congestion upstream of 'I'. But similar information can be gleaned even if the tunnel ingress copies the inner to the outer headers. At the end of the tunnel ('E'), any packet with an _extra_ mark in the outer header relative to the inner header indicates congestion across the tunnel ('I' to 'E'), while the inner header would still indicate congestion upstream of 'I'.
Appendix C gives a simple and precise method for a tunnel egress to infer the congestion level introduced across a tunnel.

All this shows that 'E' can preserve the control loop irrespective of whether 'I' copies congestion notification into the outer header or resets it.

That is the situation for existing control arrangements but, because copying reveals more information, it would open up possibilities for better control system designs. For instance, Appendix E describes how resetting CE marking at a tunnel ingress confuses a proposed congestion marking scheme on the standards track, so that it ends up removing excessive amounts of traffic unnecessarily, whereas copying CE markings at ingress leads to the correct control behaviour.

A.3.  Management Constraints

As well as control, there are also management constraints. Specifically, a management system may monitor congestion markings in passing packets, perhaps at the border between networks as part of a service level agreement. For instance, monitors at the borders of autonomous systems may need to measure how much congestion has accumulated since the original source, perhaps to determine between them how much of the congestion is contributed by each domain.

Therefore, when monitoring the middle of a path, it should be possible to establish how far back in the path congestion markings have accumulated from. In this document we term this the baseline of congestion marking (or the Congestion Baseline), i.e. the source of the layer that last reset (or created) the congestion notification field. Given some tunnels cross domain borders (e.g. consider 'M' in Figure 6 monitoring a border), it would therefore be desirable for 'I' to copy congestion accumulated so far into the outer headers exposed across the tunnel.
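As a sketch of how an egress (or any monitor that can see both headers) might separate congestion introduced across a tunnel from congestion accumulated upstream when the ingress copies markings, the following uses the figures worked through in Appendix C: 100 packets, 30 CE-marked in both headers, and 12 CE-marked in the outer header only. The function and parameter names are hypothetical.

```python
def tunnel_congestion(total_packets: int,
                      marked_inner_and_outer: int,
                      marked_outer_only: int) -> tuple[float, float]:
    """Return (upstream, across_tunnel) marking fractions.

    upstream:      share of packets already CE-marked when they reached
                   the ingress (marked in inner and outer headers).
    across_tunnel: share of packets that arrived unmarked at the ingress
                   and were then marked inside the tunnel.
    """
    unmarked_at_ingress = total_packets - marked_inner_and_outer
    if total_packets <= 0 or unmarked_at_ingress <= 0:
        raise ValueError("need at least one packet unmarked at the ingress")
    upstream = marked_inner_and_outer / total_packets
    across_tunnel = marked_outer_only / unmarked_at_ingress
    return upstream, across_tunnel


upstream, across_tunnel = tunnel_congestion(100, 30, 12)
print(f"upstream {upstream:.0%}, across tunnel {across_tunnel:.0%}")
```

The denominator is the number of packets that were still unmarked at the ingress, because only those packets could have received an additional mark inside the tunnel.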
Appendix B.2 discusses various scenarios where the Load Regulator lies in-path, not at the source host as we would typically expect. It concludes that a Congestion Baseline is determined by where the Load Regulator function is, which should be identified in the transport layer, not by addresses in network layer headers. This applies whether the Load Regulator is at the source host or within the path. The appendix also discusses where a Load Regulator function should be located relative to a local tunnel encapsulation function.

Appendix B.  Relative Placement of Tunnelling and In-Path Load Regulation

B.1.  Identifiers and In-Path Load Regulators

The Load Regulator is the node to which congestion feedback should be returned by the next downstream node with a transport layer feedback function (typically but not always the data receiver). The Load Regulator is often, but not always, the data source. It is not always (or even typically) the same thing as the node identified by the source address of the outermost exposed header. In general the addressing of the outermost encapsulation header says nothing about the identifiers of either the upstream or the downstream transport layer functions. As long as the transport functions know each other's addresses, they don't have to be identified in the network layer or in any link layer. It was only a convenience that a TCP receiver assumed that the address of the source transport is the same as the network layer source address of an IP packet it receives.

More generally, the return transport address for feedback could be identified solely in the transport layer protocol.
For instance, a signalling protocol like RSVP [RFC2205] breaks up a path into transport layer hops and informs each hop of the address of its transport layer neighbour without any need to identify these hops in the network layer. RSVP can be arranged so that these transport layer hops are bigger than the underlying network layer hops. The host identity protocol (HIP) architecture [RFC4423] also supports the same principled separation (for mobility amongst other things), where the transport layer sender identifies its transport address for feedback to be sent to, using an identifier provided by a shim below the transport layer.

Keeping to this layering principle deliberately doesn't require a network layer packet header to reveal the origin address from where congestion notification accumulates (its Congestion Baseline). It is not necessary for the network and lower layers to know the address of the Load Regulator. Only the destination transport needs to know that. With forward congestion notification, the network and link layers only notify congestion forwards; they aren't involved in feeding it backwards. If they are (e.g. backward congestion notification (BCN) in Ethernet [IEEE802.1au] or EFCI in ATM [ITU-T.I.371]), that should be considered as a transport function added to the lower layer, which must sort out its own addressing. Indeed, this is one reason why ICMP source quench is now deprecated [RFC1254]; when congestion occurs within a tunnel it is complex (particularly in the case of IPsec tunnels) to return the ICMP messages beyond the tunnel ingress back to the Load Regulator.
Similarly, if a management system is monitoring congestion and needs to know the Congestion Baseline, the management system has to find this out from the transport; in general it cannot tell solely by looking at the network or link layer headers.

B.2.  Non-Dependence of Tunnelling on In-path Load Regulation

We have said that at any point in a network, the Congestion Baseline (where congestion notification starts from zero) should be the previous upstream Load Regulator. We have also said that the ingress of an IP in IP tunnel must copy congestion indications to the encapsulating outer headers it creates. If the Load Regulator is in-path rather than at the source, and also a tunnel ingress, these two requirements seem to be contradictory. A tunnel ingress must not reset incoming congestion, but a Load Regulator must be the Congestion Baseline, implying it needs to reset incoming congestion.

In fact, the two requirements are not contradictory, because a Load Regulator and a tunnel ingress are not the names of machines, but the names of functions within a machine that typically occur in sequence on a stream of packets, not at the same point. Figure 7 is borrowed from [RFC2983] (which was making a similar point about the location of Diffserv traffic conditioning relative to the encapsulation function of a tunnel). An in-path Load Regulator can act on packets either at [1 - Before] encapsulation or at [2 - Outer] after encapsulation. Load Regulation does not ever need to be integrated with the [Encapsulate] function (but it can be for efficiency). Therefore we can still mandate that the [Encapsulate] function always copies CE into the outer header.
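The separation of functions argued for above can be sketched as a small packet pipeline in which encapsulation always copies the ECN field and an optional Load Regulator stage is placed either before or after it. This is an illustration only: the Packet type and function names are hypothetical, and resetting CE to ECT(0) is just one possible choice of reset value.

```python
from dataclasses import dataclass
from typing import Optional

NOT_ECT, ECT0, ECT1, CE = "Not-ECT", "ECT(0)", "ECT(1)", "CE"


@dataclass
class Packet:
    inner_ecn: str
    outer_ecn: Optional[str] = None  # None until the packet is encapsulated


def reset_ce(ecn: str) -> str:
    """Assumed reset behaviour: clear CE back to ECT(0), leave the rest."""
    return ECT0 if ecn == CE else ecn


def regulate_before(pkt: Packet) -> Packet:
    """Load Regulator placed at [1 - Before]: acts on the arriving header."""
    pkt.inner_ecn = reset_ce(pkt.inner_ecn)
    return pkt


def encapsulate(pkt: Packet) -> Packet:
    """[Encapsulate]: always copies the arriving ECN field outwards and
    knows nothing about any Load Regulator."""
    pkt.outer_ecn = pkt.inner_ecn
    return pkt


def regulate_outer(pkt: Packet) -> Packet:
    """Load Regulator placed at [2 - Outer]: acts on the outer header only."""
    pkt.outer_ecn = reset_ce(pkt.outer_ecn)
    return pkt


plain = encapsulate(Packet(inner_ecn=CE))
after = regulate_outer(encapsulate(Packet(inner_ecn=CE)))
before = encapsulate(regulate_before(Packet(inner_ecn=CE)))

print(plain, after, before, sep="\n")
```

In all three cases encapsulate() is identical; only the presence and position of the regulator stage changes which header becomes the Congestion Baseline.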
   >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->>
                                    \
                                     \
                                      +--------[2 - Outer]------->>

   Figure 7: Placement of In-Path Load Regulator Relative to Tunnel Ingress

Then separately, if there is a Load Regulator at location [2 - Outer], it might reset CE to ECT(0), say. Then the Congestion Baseline for the lower layer (outer) will be [2 - Outer], while the Congestion Baseline of the inner layer will be unchanged. But how encapsulation works has nothing to do with whether a Load Regulator is present or where it is.

If on the other hand a Load Regulator resets CE at [1 - Before], the Congestion Baseline of both the inner and outer headers will be [1 - Before]. But again, encapsulation is independent of load regulation.

B.3.  Dependence of In-Path Load Regulation on Tunnelling

Although encapsulation doesn't need to depend on in-path load regulation, the reverse is not true. The placement of an in-path Load Regulator must be carefully considered relative to encapsulation. Some examples are given in the following for guidance.

In the traditional Internet architecture one tends to think of the source host as the Load Regulator for a path. It is generally not desirable or practical for a node part way along the path to regulate the load. However, various reasonable proposals for in-path load regulation have been made from time to time (e.g. fair queuing, traffic engineering, flow admission control). The IETF has recently chartered a working group to standardise admission control across a part of a path using pre-congestion notification (PCN) [PCNcharter]. This is of particular relevance here because it involves congestion notification with an in-path Load Regulator, it can involve tunnelling and it certainly involves encapsulation more generally.
We will use the more complex scenario in Figure 8 to tease out all the issues that arise when combining congestion notification and tunnelling with various possible in-path load regulation schemes. In this case 'I1' and 'E2' break up the path into three separate congestion control loops. The feedback for these loops is shown going right to left across the top of the figure. The 'V's are arrow heads representing the direction of feedback, not letters. But there are also two tunnels within the middle control loop: 'I1' to 'E1' and 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS core networks. 'M' is a congestion monitoring point, perhaps between two border routers where the same tunnel continues unbroken across the border.

     ______      _______________________________________      _____
    /      \    /                                       \    /     \
   V        \  V                              M          \  V       \
   A--->R--->I1===========>E1----->I2=========>==========>E2------->B

                  Figure 8: Complex Tunnel Scenario

The question is, should the congestion markings in the outer exposed headers of a tunnel represent congestion only since the tunnel ingress or over the whole upstream path from the source of the inner header (whatever that may mean)? Or put another way, should 'I1' and 'I2' copy or reset CE markings?

Based on the design principles in Section 4.3, the answer is that the Congestion Baseline should be the nearest upstream interface designed to regulate traffic load--the Load Regulator. In Figure 8 'A', 'I1' and 'E2' are all Load Regulators. We have shown the feedback loops returning to each of these nodes so that they can regulate the load causing the congestion notification. So the Congestion Baseline exposed to 'M' should be 'I1' (the Load Regulator), not 'I2'. Therefore 'I1' should reset any arriving CE markings. In this case, 'I1' knows the tunnel to 'E1' is unrelated to its load regulation function.
So the load regulation function within 'I1' should be placed at [1 - Before] tunnel encapsulation within 'I1' (using the terminology of Figure 7). Then the Congestion Baseline all across the networks from 'I1' to 'E2' in both inner and outer headers will be 'I1'.

The following further examples illustrate how this answer might be applied:

o  We argued in Appendix E that resetting CE on encapsulation could harm PCN excess rate marking, which marks excess traffic for removal in subsequent round trips. This marking relies on not marking packets if another node upstream has already marked them for removal. If there were a tunnel ingress between the two which reset CE markings, it would confuse the downstream node into marking far too much traffic for removal. So why do we say that 'I1' should reset CE, while a tunnel ingress shouldn't? The answer is that it is the Load Regulator function at 'I1' that is resetting CE, not the tunnel encapsulator. The Load Regulator needs to set itself as the Congestion Baseline, so the feedback it gets will only be about congestion on links it can relieve itself (by regulating the load into them). When it resets CE markings, it knows that something else upstream will have dealt with the congestion notifications it removes, given it is part of an end-to-end admission control signalling loop. It therefore knows that previous hops will be covered by other Load Regulators. Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should follow the new rule for any tunnel ingress and copy congestion marking into the outer tunnel header. The ingress at 'I1' will happen to copy headers that have already been reset just beforehand. But it doesn't need to know that.
o  [Shayman] suggested feedback of ECN accumulated across an MPLS domain could cause the ingress to trigger re-routing to mitigate congestion. This case is more like the simple scenario of Figure 6, with a feedback loop across the MPLS domain ('E' back to 'I'). 'I' is a Load Regulator because re-routing around congestion is a load regulation function. But in this case 'I' should only reset itself as the Congestion Baseline in outer headers, as it is not handling congestion outside its domain, so it must preserve the end-to-end congestion feedback loop for something else to handle (probably the data source). Therefore the Load Regulator within 'I' should be placed at [2 - Outer] to reset CE markings just after the tunnel ingress has copied them from arriving headers. Again, the tunnel encapsulation function at 'I' simply copies incoming headers, unaware that the load regulator will subsequently reset its outer headers.

o  The PWE3 working group of the IETF is considering the problem of how and whether an aggregate edge-to-edge pseudo-wire emulation should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. Although the study is still at the requirements stage, some (controversial) solution proposals include in-path load regulation at the ingress to the tunnel that could lead to tunnel arrangements with similar complexity to that of Figure 8.

These are not contrived scenarios--they could be a lot worse. For instance, a host may create a tunnel for IPsec which is placed inside a tunnel for Mobile IP over a remote part of its path. And around all this we may have MPLS labels being pushed and popped as packets pass across different core networks. Similarly, it is possible that subnets could be built from link technology (e.g.
future Ethernet switches) so that link headers being added and removed could involve congestion notification in future Ethernet link headers with all the same issues as with IP in IP tunnels.

One reason we introduced the concept of a Load Regulator was to allow for in-path load regulation. In the traditional Internet architecture one tends to think of a host and a Load Regulator as synonymous, but when considering tunnelling, even the definition of a host is too fuzzy, whereas a Load Regulator is a clearly defined function. Similarly, the concept of innermost header is too fuzzy to be able to (wrongly) say that the source address of the innermost header should be the Congestion Baseline. Which is the innermost header when multiple encapsulations may be in use? Where do we stop? If we say the original source in the above IPsec-Mobile IP case is the host, how do we know it isn't tunnelling an encrypted packet stream on behalf of another host in a p2p network?

We have become used to thinking that only hosts regulate load. The end to end design principle advises that this is a good idea [RFC3426], but it also advises that it is solely a guiding principle intended to make the designer think very carefully before breaking it. We do have proposals where load regulation functions sit within a network path for good, if sometimes controversial, reasons, e.g. PCN edge admission control gateways [I-D.ietf-pcn-architecture] or traffic engineering functions at domain borders to re-route around congestion [Shayman]. Whether or not we want in-path load regulation, we have to work round the fact that it will not go away.

Appendix C.  Contribution to Congestion across a Tunnel

This specification mandates that a tunnel ingress determines the ECN field of each new outer tunnel header by copying the arriving header.
Concern has been expressed that this will make it difficult for the tunnel egress to monitor congestion introduced only along a tunnel, which is easy if the outer ECN field is reset at a tunnel ingress (RFC3168 full functionality mode). However, in fact copying CE marks at ingress will still make it easy for the egress to measure congestion introduced across a tunnel, as illustrated below.

Consider 100 packets measured at the egress. It measures that 30 are CE marked in the inner and outer headers and 12 have additional CE marks in the outer but not the inner. This means packets arriving at the ingress had already experienced 30% congestion. However, it does not mean there was 12% congestion across the tunnel. The correct calculation of congestion across the tunnel is p_t = 12/(100-30) = 12/70, or about 17%. This is easy for the egress to measure. It is the packets with additional CE marking in the outer header (12) as a proportion of packets not marked in the inner header (70).

Figure 9 illustrates this in a combinatorial probability diagram. The square represents 100 packets. The 30% division along the bottom represents marking before the ingress, and the p_t division up the side represents marking along the tunnel.

   +-----+---------+100%
   |     |         |
   | 30  |         |
   |     |         |      The large square
   |     +---------+p_t   represents 100 packets
   |     |   12    |
   +-----+---------+0
   0    30%      100%
   inner header marking

   Figure 9: Tunnel Marking of Packets Already Marked at Ingress

Appendix D.  Why Not Propagating ECT(1) on Decapsulation Impedes PCN

Multi-level congestion notification is currently on the IETF's standards track agenda in the Congestion and Pre-Congestion Notification (PCN) working group.
The PCN working group eventually requires three congestion states (not marked and two increasingly severe levels of congestion marking) [I-D.ietf-pcn-architecture]. The aim is for the less severe level of marking to stop admitting new traffic and the more severe level to terminate sufficient existing flows to bring a network back to its operating point after a serious failure.

Although the ECN field gives sufficient codepoints for these three states, current ECN tunnelling RFCs prevent the PCN working group from using three ECN states in case any tunnel decapsulations occur within a PCN region (see Appendix A of [I-D.ietf-pcn-baseline-encoding]). If a node in a tunnel sets the ECN field to ECT(0) or ECT(1), this change will be discarded by a tunnel egress compliant with RFC4301 or RFC3168. This can be seen in Figure 2 (Section 3.2), where ECT values in the outer header are ignored unless the inner header is the same. Effectively one ECT codepoint is wasted; the ECT(0) and ECT(1) codepoints have to be treated as just one codepoint when they could otherwise have been used for their intended purpose of congestion notification.

As a consequence, the PCN working group has initially confined itself to two encoding states as a baseline encoding [I-D.ietf-pcn-baseline-encoding]. And it has had to propose an experimental extension using extra Diffserv codepoint(s) to encode the extra states [I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting DSCP space while leaving ECN codepoints unused. Another PCN encoding has been proposed that would survive tunnelling without an extra DSCP [I-D.menth-pcn-psdm-encoding], but it requires the PCN edge gateways to somehow share state so the egress can determine which marking a packet started with at the ingress.
Also a PCN ingress node can game the system by initiating packets with inappropriate markings. Yet another work-round to the ECN tunnelling problem proposes a more involved marking algorithm in the forwarding plane to encode the three congestion notification states using only two ECN codepoints [I-D.satoh-pcn-st-marking]. Still another proposal compromises the precision of the admission control mechanism, but manages to work with just two encoding states and a single marking algorithm [I-D.charny-pcn-single-marking].

Rather than require the IETF to bless any of these work-rounds, this specification fixes the root cause of the problem so that operators deploying PCN can simply ask that tunnel end-points within a PCN region should comply with this new ECN tunnelling specification.

Then PCN can use the trivially simple experimental 3-state ECN encoding defined in [I-D.briscoe-pcn-3-in-1-encoding].

D.1.  Alternative Ways to Introduce the New Decapsulation Rules

There are a number of ways for the new decapsulation rules to be introduced:

o  They could be specified in the present standards track proposal (preferred) or in an experimental extension;

o  They could be specified as a new default for all Diffserv PHBs (preferred) or as an option to be configured only for Diffserv PHBs requiring them (e.g. PCN).

The argument for making this change now, rather than in a separate experimental extension, is to avoid the burden of an extra standard to be compliant with and to be backwards compatible with--so we don't add to the already complex history of ECN tunnelling RFCs. The argument for a separate experimental extension is that we may never need this change (if PCN is never successfully deployed and if no-one ever needs three ECN or PCN encoding states rather than two).
However, the change does no harm to existing mechanisms and stops tunnels wasting a quarter of a bit (a 2-bit codepoint).

The argument for making this new decapsulation behaviour the default for all PHBs is that it doesn't change any expected behaviour that existing mechanisms rely on already. Also, by ending the present waste of a codepoint, in the future a use of that codepoint could be proposed for all PHBs, even if PCN isn't successfully deployed.

In practice, if these new decapsulation rules are specified straightaway as the normative default for all PHBs, a network operator deploying 3-state PCN would be able to request that tunnels comply with the latest specification. Implementers of non-PCN tunnels would not need to comply but, if they did, their code would be future proofed and no harm would be done to legacy operations. Therefore, rather than branching their code base, it would be easiest for implementers to make all their new tunnel code comply with this specification, whether or not it was for PCN. But they could leave old code untouched, unless it was for PCN.

The alternatives are worse. Implementers would otherwise have to provide configurable decapsulation options and operators would have to configure all IPsec and IP in IP tunnel endpoints for the exceptional behaviour of certain PHBs. The rules for tunnel endpoints to handle both the Diffserv field and the ECN field should 'just work' when handling packets with any Diffserv codepoint.

Appendix E.  Why Resetting CE on Encapsulation Impedes PCN

Regarding encapsulation, the section of the PCN architecture [I-D.ietf-pcn-architecture] on tunnelling says that header copying (RFC4301) allows PCN to work correctly, whereas resetting CE markings confuses PCN marking.
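To make the contrast concrete before the detailed explanation, the following is a deliberately simplified fluid sketch of two stressed links in sequence, with a tunnel ingress between them. It is not the PCN marking algorithm itself. It assumes both links have the same configured threshold rate, that an excess rate marker meters only traffic it sees as unmarked, and that marks applied to outer headers fall evenly across all packets. All names and numbers are hypothetical.

```python
def excess_rate_mark(unmarked_rate: float, threshold: float) -> float:
    """Volume newly marked by a link: only unmarked traffic is metered."""
    return max(0.0, unmarked_rate - threshold)


def marked_at_egress(arrival: float, threshold: float,
                     ingress_copies: bool) -> float:
    """Total traffic marked for removal once the tunnel egress has
    combined outer and inner markings."""
    if arrival <= 0:
        raise ValueError("arrival rate must be positive")

    # First stressed link, upstream of the tunnel, marks inner headers.
    inner_marked = excess_rate_mark(arrival, threshold)
    inner_unmarked = arrival - inner_marked

    if ingress_copies:
        # Outer headers inherit the marks, so the second link meters
        # only the traffic that is still unmarked.
        extra = excess_rate_mark(inner_unmarked, threshold)
        return inner_marked + extra

    # Ingress reset: every outer header looks unmarked, so the second
    # link meters all the traffic again.
    outer_marked = excess_rate_mark(arrival, threshold)
    # Assumption: outer marks are spread evenly, so only the share that
    # lands on inner-unmarked packets adds new marks at the egress.
    newly_marked = outer_marked * inner_unmarked / arrival
    return inner_marked + newly_marked


copied = marked_at_egress(100.0, 60.0, ingress_copies=True)
reset = marked_at_egress(100.0, 60.0, ingress_copies=False)
print(copied, reset)
```

With these illustrative numbers, removing 40 units would have been enough to bring the load down to the threshold of 60; the reset case marks noticeably more than that.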
The specific issue here concerns PCN excess rate marking [I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic that exceeds a configured threshold rate. One of the goals of excess rate marking is to enable the speedy removal of excess admission controlled traffic following re-routes caused by link failures or other disasters. This maintains a share of the capacity for traffic in lower priority classes. After failures, traffic re-routed onto remaining links can often stress multiple links along a path. Therefore, traffic can arrive at a link under stress with some proportion already marked for removal by a previous link. By design, marked traffic will be removed by the overall system in subsequent round trips. So when the excess rate marking algorithm decides how much traffic to mark for removal, it doesn't include traffic already marked for removal by another node upstream (the `Excess traffic meter function' of [I-D.ietf-pcn-marking-behaviour]).

However, if an RFC3168 tunnel ingress intervenes, it resets the ECN field in all the outer headers, hiding all the evidence of problems upstream. Thus, although excess rate marking works fine with RFC4301 IPsec tunnels, with RFC3168 tunnels it typically removes large volumes of traffic that it didn't need to remove at all.

Author's Address

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/