Transport Area Working Group                                  B. Briscoe
Internet-Draft                                                        BT
Updates: 3168, 4301, 4774                                August 26, 2010
(if approved)
Intended status: Standards Track
Expires: February 27, 2011


             Tunnelling of Explicit Congestion Notification
                     draft-ietf-tsvwg-ecn-tunnel-10

Abstract

   This document redefines how the explicit congestion notification (ECN) field of the IP header should be constructed on entry to and exit from any IP in IP tunnel. On encapsulation, it updates RFC3168 to bring all IP in IP tunnels (v4 or v6) into line with RFC4301 IPsec ECN processing. On decapsulation, it updates both RFC3168 and RFC4301 to add new behaviours for previously unused combinations of inner and outer header. The new rules ensure the ECN field is correctly propagated across a tunnel whether it is used to signal one or two severity levels of congestion, whereas before only one severity level was supported.
   Tunnel endpoints can be updated in any order without affecting pre-existing uses of the ECN field, thus ensuring backward compatibility. Nonetheless, operators wanting to support two severity levels (e.g. for pre-congestion notification--PCN) can require compliance with this new specification. A thorough analysis of the reasoning for these changes and their implications is included. In the unlikely event that the new rules do not meet a specific need, RFC4774 gives guidance on designing alternate ECN semantics, and this document extends that guidance to include tunnelling issues.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 27, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Scope
   2.  Terminology
   3.  Summary of Pre-Existing RFCs
     3.1.  Encapsulation at Tunnel Ingress
     3.2.  Decapsulation at Tunnel Egress
   4.  New ECN Tunnelling Rules
     4.1.  Default Tunnel Ingress Behaviour
     4.2.  Default Tunnel Egress Behaviour
     4.3.  Encapsulation Modes
     4.4.  Single Mode of Decapsulation
   5.  Updates to Earlier RFCs
     5.1.  Changes to RFC4301 ECN processing
     5.2.  Changes to RFC3168 ECN processing
     5.3.  Motivation for Changes
       5.3.1.  Motivation for Changing Encapsulation
       5.3.2.  Motivation for Changing Decapsulation
   6.  Backward Compatibility
     6.1.  Non-Issues Updating Decapsulation
     6.2.  Non-Update of RFC4301 IPsec Encapsulation
     6.3.  Update to RFC3168 Encapsulation
   7.  Design Principles for Alternate ECN Tunnelling Semantics
   8.  IANA Considerations (to be removed on publication):
   9.  Security Considerations
   10. Conclusions
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Early ECN Tunnelling RFCs
   Appendix B.  Design Constraints
     B.1.  Security Constraints
     B.2.  Control Constraints
     B.3.  Management Constraints
   Appendix C.  Contribution to Congestion across a Tunnel
   Appendix D.  Compromise on Decap with ECT(1) Inner and ECT(0) Outer
   Appendix E.  Open Issues

Request to the RFC Editor (to be removed on publication):

   In the RFC index, RFC3168 should be identified as an update to RFC2003. RFC4301 should be identified as an update to RFC3168.

Changes from previous drafts (to be removed by the RFC Editor)

   Full text differences between IETF draft versions are available at , and between earlier individual draft versions at

   From ietf-09 to ietf-10 (current):

      * Editorial changes:

         + Clarified a couple of sentences in the Introduction and one in section 6.3 to distinguish whether the terms 'RFC3168' & 'RFC4301' refer to implementations or documents.

         + Corrected garbled sentence in the introduction about backward compatibility.

         + Made it clear that 'drop' in Fig 2, Fig 4 and the following para is an action, not a codepoint.

         + In sections 5.1 & 5.2, specifically identified the updated sections of RFC3168 & RFC4301.
         + Avoided describing compatibility mode as 'optional' at the end of section 5.2, where it should have said 'not always obligatory' instead, because in section 4 compatibility mode is normatively defined as obligatory in some circumstances (rather than always optional).

         + Added RFC5659 as informative reference on pseudowires and clarified that only some pseudowires might be relevant examples.

         + Deleted "The views expressed here are those of the author only." in the acknowledgements.

         + Fixed a few nits.

   From ietf-08 to ietf-09: Added change log entry for -07 to -08 that was previously omitted.

      * Changes to standards action text:

         + Added RFC4774 to 'Updates:' header (the draft has always extended the advice in RFC4774 (BCP124), which said very little about tunnels. The GENART reviewer merely pointed out that the header did not highlight this fact.)

      * Editorial changes:

         + Abstract: s/providing backward compatibility./thus ensuring backward compatibility./

         + Moved PCN-related text motivating changes to decapsulation from "Default Tunnel Egress Behaviour" (Section 4.2) to "Motivation for Changing Decapsulation" (Section 5.3.2), where it was merged with existing similar text.

         + In the non-normative Design Principles, avoided using words in lower case where they were in contexts that might make them confusable with upper case RFC2119 normative language.

         + Added Stephen Hanna and Ben Campbell to acks and corrected spelling of Agarwal.

         + Deleted endnote discussing corner case with IKEv2 manual keying (identified as "to be removed before publication following SecDir review").

         + Deleted Appendices D & E on why existing ingress & egress tunnelling behaviour impedes PCN and the endnotes that referred to them (identified as "to be removed before publication").

         + Various minor corrections pointed out by reviewers.
   From ietf-07 to ietf-08:

      * Changes to standards actions:

         + Section 4: Changed non-RFC2119 phrase 'NOT RECOMMENDED' to 'SHOULD be avoided', wrt alternate ECN tunnelling schemes.

         + Section 4.2: Used upper-case in 'Alarms SHOULD be rate-limited'.

         + Section 7: Made bullet #1 in the decapsulation guidelines for alternate schemes more precise. Also changed any upper-case keywords in this informative section to lower case.

      * Editorial changes:

         + Changed copyright notice to allow for pre-5378 material.

         + Shifted supporting text intended for deletion on publication into editorial comments.

         + Explained how to read the decapsulation matrices in their captions.

         + Minor clarifications throughout.

   From ietf-06 to ietf-07:

      * Emphasised that this is the opposite of a fork in the RFC series.

      * Altered Section 5 to focus on updates to implementations of earlier RFCs, rather than on updates to the text of the RFCs.

      * Removed potential loop-holes in normative text that implementers might have used to claim compliance without implementing normal mode. Highlighted the deliberate distinction between "MUST implement" and "SHOULD use" normal mode.

      * Added question for Security Directorate reviewers on whether to mention a corner-case concerning manual keying of IPsec tunnels.

      * Minor clarifications, updated references and updated acks.

      * Marked two appendices about PCN motivations for removal before publication.

   From ietf-05 to ietf-06:

      * Minor textual clarifications and corrections.

   From ietf-04 to ietf-05:

      * Functional changes:

         + Section 4.2: ECT(1) outer with Not-ECT inner: reverted to forwarding as Not-ECT (as in RFC3168 & RFC4301), rather than dropping.

         + Altered rationale in bullet 3 of Section 5.3.2 to justify this.
         + Distinguished alarms for dangerous and invalid combinations, and allowed combinations that are valid in some tunnel configurations but dangerous in others to be alarmed at the discretion of the implementer and/or operator.

         + Altered advice on designing alternate ECN tunnelling semantics to reflect the above changes.

      * Textual changes:

         + Changed "Future non-default schemes" to "Alternate ECN Tunnelling Semantics" throughout.

         + Cut down Appendix D and Appendix E for brevity.

         + A number of clarifying edits & updated refs.

   From ietf-03 to ietf-04:

      * Functional changes: none

      * Structural changes:

         + Added "Open Issues" appendix

      * Textual changes:

         + Section title: "Changes from Earlier RFCs" -> "Updates to Earlier RFCs"

         + Emphasised that change on decap to previously unused combinations will propagate PCN encoding.

         + Acknowledged additional reviewers and updated references

   From ietf-02 to ietf-03:

      * Functional changes:

         + Corrected errors in recap of previous RFCs, which wrongly stated the different decapsulation behaviours of RFC3168 & RFC4301 with a Not-ECT inner header. This also required corrections to the "Changes from Earlier RFCs" and the Motivations for these changes.

         + Mandated that any future standards action SHOULD NOT use the ECT(0) codepoint as an indication of congestion, without giving strong reasons.

         + Added optional alarm when decapsulating ECT(1) outer, ECT(0), but noted it would need to be disabled for 2-severity level congestion (e.g. PCN).

      * Structural changes:

         + Removed Document Roadmap which merely repeated the Contents (previously Section 1.2).

         + Moved "Changes from Earlier RFCs" (Section 5) before Section 6 on Backward Compatibility and internally organised both by RFC, rather than by ingress then egress.
         + Moved motivation for changing existing RFCs (Section 5.3) to after the changes are specified.

         + Moved informative "Design Principles for Future Non-Default Schemes" after all the normative sections.

         + Added Appendix A on early history of ECN tunnelling RFCs.

         + Removed specialist appendix on "Relative Placement of Tunnelling and In-Path Load Regulation" (Appendix D in the -02 draft)

         + Moved and updated specialist text on "Compromise on Decap with ECT(1) Inner and ECT(0) Outer" from Security Considerations to Appendix D

      * Textual changes:

         + Simplified vocabulary for non-native-English speakers

         + Simplified Introduction and defined regularly used terms in an expanded Terminology section.

         + More clearly distinguished statically configured tunnels from dynamic tunnel endpoint discovery, before explaining operating modes.

         + Simplified, cut-down and clarified throughout

         + Updated references.

   From ietf-01 to ietf-02:

      * Scope reduced from any encapsulation of an IP packet to solely IP in IP tunnelled encapsulation. Consequently changed title and removed whole section 'Design Guidelines for New Encapsulations of Congestion Notification' (to be included in a future companion informational document).

      * Included a new normative decapsulation rule for ECT(0) inner and ECT(1) outer that had previously only been outlined in the non-normative appendix 'Comprehensive Decapsulation Rules'. Consequently:

         + The Introduction has been completely re-written to motivate this change to decapsulation along with the existing change to encapsulation.

         + The tentative text in the appendix that first proposed this change has been split between normative standards text in Section 4 and Appendix D, which explains specifically why this change would streamline PCN. New text on the logic of the resulting decap rules added.
      * If inner/outer is Not-ECT/ECT(0), changed decapsulation to propagate Not-ECT rather than drop the packet; and added reasoning.

      * Considerably restructured:

         + "Design Constraints" analysis moved to an appendix (Appendix B);

         + Added Section 3 to summarise relevant existing RFCs;

         + Structured Section 4 and Section 6 into subsections.

         + Added tables to sections on old and new rules, for precision and comparison.

         + Moved Section 7 on Design Principles to the end of the section specifying the new default normative tunnelling behaviour. Rewritten and shifted text on identifiers and in-path load regulators to Appendix B.1 [deleted in revision -03].

   From ietf-00 to ietf-01:

      * Identified two additional alarm states in the decapsulation rules (Figure 4) if ECT(X) in outer and inner contradict each other.

      * Altered Comprehensive Decapsulation Rules (Appendix D) so that ECT(0) in the outer no longer overrides ECT(1) in the inner. Used the term 'Comprehensive' instead of 'Ideal'. And considerably updated the text in this appendix.

      * Added Appendix D.1 (removed again in a later revision) to weigh up the various ways the Comprehensive Decapsulation Rules might be introduced. This replaces the previous contradictory statements saying complex backwards compatibility interactions would be introduced while also saying there would be no backwards compatibility issues.

      * Updated references.

   From briscoe-01 to ietf-00:

      * Re-wrote Appendix C giving much simpler technique to measure contribution to congestion across a tunnel.

      * Added discussion of backward compatibility of the ideal decapsulation scheme in Appendix D

      * Updated references. Minor corrections & clarifications throughout.
   From briscoe-00 to briscoe-01:

      * Related everything conceptually to the uniform and pipe models of RFC2983 on Diffserv Tunnels, and completely removed the dependence of tunnelling behaviour on the presence of any in-path load regulation by using the [1 - Before] [2 - Outer] function placement concepts from RFC2983;

      * Added specific cases where the existing standards limit new proposals, particularly Appendix E;

      * Added sub-structure to Introduction (Need for Rationalisation, Roadmap), added new Introductory subsection on "Scope" and improved clarity;

      * Added Design Guidelines for New Encapsulations of Congestion Notification;

      * Considerably clarified the Backward Compatibility section (Section 6);

      * Considerably extended the Security Considerations section (Section 9);

      * Summarised the primary rationale much better in the conclusions;

      * Added numerous extra acknowledgements;

      * Added Appendix E. "Why resetting CE on encapsulation harms PCN", Appendix C. "Contribution to Congestion across a Tunnel" and Appendix D. "Ideal Decapsulation Rules";

      * Re-wrote Appendix B [deleted in a later revision], explaining how tunnel encapsulation no longer depends on in-path load-regulation (changed title from "In-path Load Regulation" to "Non-Dependence of Tunnelling on In-path Load Regulation"), but explained how an in-path load regulation function must be carefully placed with respect to tunnel encapsulation (in a new sub-section entitled "Dependence of In-Path Load Regulation on Tunnelling").

1.  Introduction

   Explicit congestion notification (ECN [RFC3168]) allows a forwarding element (e.g. a router) to signal the onset of congestion without having to drop packets. Instead it can explicitly mark a proportion of packets in the 2-bit ECN field in the IP header (Table 1 recaps the ECN codepoints).
   The outer header of an IP packet can encapsulate one or more IP headers for tunnelling. A forwarding element using ECN to signify congestion will only mark the immediately visible outer IP header. When a tunnel decapsulator later removes this outer header, it follows rules to propagate congestion markings by combining the ECN fields of the inner and outer IP header into one outgoing IP header.

   This document updates those rules for IPsec [RFC4301] and non-IPsec [RFC3168] tunnels to add new behaviours for previously unused combinations of inner and outer header. It also updates the ingress behaviour of RFC3168 tunnels to match that of RFC4301 tunnels. Tunnel endpoints complying with the updated rules will be backward compatible when interworking with tunnel endpoints complying with RFC4301, RFC3168 or any earlier specification.

   When ECN and its tunnelling were defined in RFC3168, only the minimum necessary changes to the ECN field were propagated through tunnel endpoints--just enough for the basic ECN mechanism to work. This was due to concerns that the ECN field might be toggled to communicate between a secure site and someone on the public Internet--a covert channel. The concern arose because a mutable field like ECN cannot be protected by IPsec's integrity mechanisms--it has to be able to change as it traverses the Internet.

   Nonetheless, the latest IPsec architecture [RFC4301] considered a bandwidth limit of 2 bits per packet on a covert channel to be a manageable risk. Therefore, for simplicity, an RFC4301 ingress copies the whole ECN field to encapsulate a packet. RFC4301 dispensed with the two modes of RFC3168, one which partially copied the ECN field, and the other which blocked all propagation of ECN changes.
   Unfortunately, this entirely reasonable sequence of standards actions resulted in a perverse outcome: non-IPsec tunnels (RFC3168) blocked the 2-bit covert channel, while IPsec tunnels (RFC4301) did not--at least not at the ingress. At the egress, both IPsec and non-IPsec tunnels still partially restricted propagation of the full ECN field.

   The trigger for the changes in this document was the introduction of pre-congestion notification (PCN [RFC5670]) to the IETF standards track. PCN needs the ECN field to be copied at a tunnel ingress, and it needs four states of congestion signalling to be propagated at the egress, but pre-existing tunnels only propagate three in the ECN field.

   This document draws on currently unused (CU) combinations of inner and outer headers to add tunnelling of four-state congestion signalling to RFC3168 and RFC4301. Operators of tunnels who specifically want to support four states can require that all their tunnels comply with this specification. However, this is not a fork in the RFC series. It is an update that can be deployed first by those that need it, and subsequently by all tunnel endpoint implementations (RFC4301, RFC3168, RFC2481, RFC2401, RFC2003), which can safely be updated to this new specification as part of general code maintenance. This will gradually add support for four congestion states to the Internet. Existing three-state schemes will continue to work as before.

   In fact, this document is the opposite of a fork. At the same time as supporting a fourth state, the opportunity has been taken to draw together divergent ECN tunnelling specifications into a single consistent behaviour, harmonising differences such as perverse covert channel treatment. As a result, any tunnel can be deployed unilaterally, and it will support the full range of congestion control and management schemes without any modes or configuration.
   Further, any host or router can expect the ECN field to behave in the same way, whatever type of tunnel might intervene in the path.

1.1.  Scope

   This document only concerns wire protocol processing of the ECN field at tunnel endpoints and makes no changes or recommendations concerning algorithms for congestion marking or congestion response.

   This document specifies common ECN field processing at encapsulation and decapsulation for any IP in IP tunnelling, whether IPsec or non-IPsec tunnels. It applies irrespective of whether IPv4 or IPv6 is used for either of the inner and outer headers. It applies for packets with any destination address type, whether unicast or multicast. It applies as the default for all Diffserv per-hop behaviours (PHBs), unless stated otherwise in the specification of a PHB (but Section 4 strongly deprecates such exceptions). It is intended to be a good trade-off between somewhat conflicting security, control and management requirements.

   [RFC2983] is a comprehensive primer on differentiated services and tunnels. Because ECN raises similar issues to differentiated services when interacting with tunnels, useful concepts introduced in RFC2983 are used throughout, with brief recaps of the explanations where necessary.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

   Table 1 recaps the names of the ECN codepoints [RFC3168].
   +------------------+----------------+---------------------------+
   | Binary codepoint | Codepoint name | Meaning                   |
   +------------------+----------------+---------------------------+
   | 00               | Not-ECT        | Not ECN-capable transport |
   | 01               | ECT(1)         | ECN-capable transport     |
   | 10               | ECT(0)         | ECN-capable transport     |
   | 11               | CE             | Congestion experienced    |
   +------------------+----------------+---------------------------+

     Table 1: Recap of Codepoints of the ECN Field [RFC3168] in the IP Header

   Further terminology used within this document:

   Encapsulator:  The tunnel endpoint function that adds an outer IP header to tunnel a packet (also termed the 'ingress tunnel endpoint' or just the 'ingress' where the context is clear).

   Decapsulator:  The tunnel endpoint function that removes an outer IP header from a tunnelled packet (also termed the 'egress tunnel endpoint' or just the 'egress' where the context is clear).

   Incoming header:  The header of an arriving packet before encapsulation.

   Outer header:  The header added to encapsulate a tunnelled packet.

   Inner header:  The header encapsulated by the outer header.

   Outgoing header:  The header constructed by the decapsulator using logic that combines the fields in the outer and inner headers.

   Copying ECN:  On encapsulation, setting the ECN field of the new outer header to be a copy of the ECN field in the incoming header.

   Zeroing ECN:  On encapsulation, clearing the ECN field of the new outer header to Not-ECT ("00").

   Resetting ECN:  On encapsulation, setting the ECN field of the new outer header to be a copy of the ECN field in the incoming header, except that the outer ECN field is set to the ECT(0) codepoint if the incoming ECN field is CE.

3.  Summary of Pre-Existing RFCs

   This section is informative not normative, as it recaps pre-existing RFCs.
   Earlier relevant RFCs that were either experimental or incomplete with respect to ECN tunnelling (RFC2481, RFC2401 and RFC2003) are briefly outlined in Appendix A. The question of whether tunnel implementations used in the Internet comply with any of these RFCs is not discussed.

3.1.  Encapsulation at Tunnel Ingress

   At the encapsulator, the controversy has been over whether to propagate information about congestion experienced on the path so far into the outer header of the tunnel.

   Specifically, RFC3168 says that, if a tunnel fully supports ECN (termed a 'full-functionality' ECN tunnel in [RFC3168]), the encapsulator must not copy a CE marking from the inner header into the outer header that it creates. Instead the encapsulator must set the outer header to ECT(0) if the ECN field is marked CE in the arriving IP header. We term this 'resetting' a CE codepoint.

   However, the new IPsec architecture in [RFC4301] reverses this rule, stating that the encapsulator must simply copy the ECN field from the incoming header to the outer header.

   RFC3168 also provided a Limited Functionality mode that turns off ECN processing over the scope of the tunnel by setting the outer header to Not-ECT ("00"). Congestion within the tunnel is then signalled by dropping such packets rather than by marking them with ECN. This is necessary for the ingress to interwork with legacy decapsulators ([RFC2481], [RFC2401] and [RFC2003]) that do not propagate ECN markings added to the outer header. Otherwise such legacy decapsulators would throw away congestion notifications before they reached the transport layer.

   Neither Limited Functionality mode nor Full Functionality mode is used by an RFC4301 IPsec encapsulator, which simply copies the incoming ECN field into the outer header.
   An earlier key-exchange phase ensures an RFC4301 ingress will not have to interwork with a legacy egress that does not support ECN.

   These pre-existing behaviours are summarised in Figure 1.

   +-----------------+-----------------------------------------------+
   | Incoming Header |             Outgoing Outer Header             |
   | (also equal to  +---------------+---------------+---------------+
   | Outgoing Inner  | RFC3168 ECN   | RFC3168 ECN   | RFC4301 IPsec |
   | Header)         | Limited       | Full          |               |
   |                 | Functionality | Functionality |               |
   +-----------------+---------------+---------------+---------------+
   | Not-ECT         | Not-ECT       | Not-ECT       | Not-ECT       |
   | ECT(0)          | Not-ECT       | ECT(0)        | ECT(0)        |
   | ECT(1)          | Not-ECT       | ECT(1)        | ECT(1)        |
   | CE              | Not-ECT       | ECT(0)        | CE            |
   +-----------------+---------------+---------------+---------------+

       Figure 1: IP in IP Encapsulation: Recap of Pre-existing Behaviours

3.2.  Decapsulation at Tunnel Egress

   RFC3168 and RFC4301 specify the decapsulation behaviour summarised in Figure 2. The ECN field in the outgoing header is set to the codepoint at the intersection of the appropriate incoming inner header (row) and incoming outer header (column).
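   As an illustration only, the pre-existing decapsulation logic that Figure 2 tabulates can be sketched in Python. The names ECN and legacy_decapsulate are hypothetical, chosen for this sketch; a return value of None stands for dropping the packet, and the rfc3168 flag selects between the RFC3168 and RFC4301 variants.

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """ECN codepoints (Table 1); values are the 2-bit binary field."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def legacy_decapsulate(inner: ECN, outer: ECN, rfc3168: bool) -> Optional[ECN]:
    """Return the outgoing ECN field, or None if the packet is dropped.

    Hypothetical sketch of the pre-existing rules (Figure 2):
    rfc3168=True models RFC3168, rfc3168=False models RFC4301.
    """
    if inner == ECN.NOT_ECT:
        # The outer is ignored, except that RFC3168 drops a packet
        # with a Not-ECT inner and a CE outer.
        if rfc3168 and outer == ECN.CE:
            return None
        return ECN.NOT_ECT
    if outer == ECN.CE:
        # A CE outer is propagated when the inner is ECT(0), ECT(1) or CE.
        return ECN.CE
    # Otherwise the outer is ignored and the inner is forwarded.
    return inner


if __name__ == "__main__":
    order = (ECN.NOT_ECT, ECN.ECT_0, ECN.ECT_1, ECN.CE)
    # Print the RFC3168 matrix row by row (inner = row, outer = column).
    for inner in order:
        row = [legacy_decapsulate(inner, outer, rfc3168=True) for outer in order]
        print(inner.name, ["drop" if o is None else o.name for o in row])
```

   In this sketch an ECT(0) or ECT(1) outer never changes the outgoing field; only CE in the outer is propagated. This is why the pre-existing rules propagate only three states in the ECN field.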
            +---------+------------------------------------------------+
            |Incoming |             Incoming Outer Header              |
            | Inner   +---------+------------+------------+------------+
            | Header  | Not-ECT | ECT(0)     | ECT(1)     | CE         |
            +---------+---------+------------+------------+------------+
   RFC3168->| Not-ECT | Not-ECT | Not-ECT    | Not-ECT    | drop       |
   RFC4301->| Not-ECT | Not-ECT | Not-ECT    | Not-ECT    | Not-ECT    |
            | ECT(0)  | ECT(0)  | ECT(0)     | ECT(0)     | CE         |
            | ECT(1)  | ECT(1)  | ECT(1)     | ECT(1)     | CE         |
            | CE      | CE      | CE         | CE         | CE         |
            +---------+---------+------------+------------+------------+

   In pre-existing RFCs, the ECN field in the outgoing header was set to the codepoint at the intersection of the appropriate incoming inner header (row) and incoming outer header (column), or the packet was dropped where indicated.

          Figure 2: IP in IP Decapsulation: Recap of Pre-existing Behaviour

   The behaviour in the table derives from the logic given in RFC3168 and RFC4301, briefly recapped as follows:

   o  On decapsulation, if the inner ECN field is Not-ECT, the outer is ignored, except that RFC3168 (but not RFC4301) specified that the decapsulator must drop a packet with a Not-ECT inner and CE in the outer.

   o  In all other cases, if the outer is CE, the outgoing ECN field is set to CE, but otherwise the outer is ignored and the inner is used for the outgoing ECN field.

   Section 9.2.2 of RFC3168 also made it an auditable event for an IPsec tunnel "if the ECN Field is changed inappropriately within an IPsec tunnel...". Inappropriate changes were not specifically enumerated. RFC4301 did not mention inappropriate ECN changes.

4.  New ECN Tunnelling Rules

   The standards actions below in Section 4.1 (ingress encapsulation) and Section 4.2 (egress decapsulation) define new default ECN tunnel processing rules for any IP packet (v4 or v6) with any Diffserv codepoint.
   If these defaults do not meet a particular requirement, an alternate ECN tunnelling scheme can be introduced as part of the definition of an alternate congestion marking scheme used by a specific Diffserv PHB (see section 5 of [RFC3168] and [RFC4774]). When designing such alternate ECN tunnelling schemes, the principles in Section 7 should be followed. However, alternate ECN tunnelling schemes SHOULD be avoided whenever possible, because the deployment burden of handling exceptional PHBs in implementations of all affected tunnels should not be underestimated. There is no requirement for a PHB definition to state anything about ECN tunnelling behaviour if the default behaviour in the present specification is sufficient.

4.1.  Default Tunnel Ingress Behaviour

   Two modes of encapsulation are defined here: a REQUIRED `normal mode' and a `compatibility mode' for backward compatibility with tunnel decapsulators that do not understand ECN. Note that these are modes of the ingress tunnel endpoint only, not the whole tunnel. Section 4.3 explains why two modes are necessary and specifies the circumstances in which it is sufficient to implement normal mode alone.

   Whatever the mode, an encapsulator forwards the inner header without changing the ECN field.

   In normal mode an encapsulator compliant with this specification MUST construct the outer encapsulating IP header by copying the 2-bit ECN field of the incoming IP header. In compatibility mode it clears the ECN field in the outer header to the Not-ECT codepoint (the IPv4 header checksum also changes whenever the ECN field is changed). These rules are tabulated for convenience in Figure 3.
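The two encapsulation rules can be illustrated with a minimal Python sketch. It assumes the ECN field is handled as a 2-bit integer with the RFC3168 codepoint values (ECT(0) = 0b10, ECT(1) = 0b01). The names `IngressMode`, `outer_ecn`, `ipv4_checksum` and `set_ipv4_ecn` are hypothetical and do not come from any real tunnel implementation. The ECN field shares a byte with the DSCP and is covered by the IPv4 header checksum. For that reason, the sketch also rewrites the two ECN bits of a raw IPv4 header and then recomputes the checksum from scratch. An incremental checksum update would work equally well.

```python
from enum import Enum

# ECN codepoints (RFC3168): the two low-order bits of the IPv4 TOS byte
# or the IPv6 Traffic Class byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class IngressMode(Enum):
    NORMAL = "normal"                # REQUIRED: copy the incoming ECN field
    COMPATIBILITY = "compatibility"  # for legacy egresses: clear to Not-ECT


def outer_ecn(mode: IngressMode, incoming_ecn: int) -> int:
    """ECN field for the new outer header (Figure 3).

    The inner header keeps the incoming ECN field in either mode, so only
    the outer value depends on the mode.
    """
    if mode is IngressMode.NORMAL:
        return incoming_ecn
    return NOT_ECT


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ipv4_ecn(header: bytearray, ecn: int) -> None:
    """Rewrite the ECN bits of a raw IPv4 header and refresh its checksum."""
    header[1] = (header[1] & 0xFC) | (ecn & 0b11)  # DSCP bits untouched
    header[10:12] = b"\x00\x00"                    # zero before summing
    header[10:12] = ipv4_checksum(header).to_bytes(2, "big")


# Example: set the outer header's ECN field for a CE packet in normal mode.
outer = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
set_ipv4_ecn(outer, outer_ecn(IngressMode.NORMAL, CE))
```

A receiver that sums a header with a correct checksum in place obtains zero from `ipv4_checksum`, which is a convenient way to validate the rewrite.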
      +-----------------+-------------------------------+
      | Incoming Header |     Outgoing Outer Header     |
      | (also equal to  +---------------+---------------+
      | Outgoing Inner  | Compatibility |    Normal     |
      |     Header)     |     Mode      |     Mode      |
      +-----------------+---------------+---------------+
      |     Not-ECT     |    Not-ECT    |    Not-ECT    |
      |     ECT(0)      |    Not-ECT    |    ECT(0)     |
      |     ECT(1)      |    Not-ECT    |    ECT(1)     |
      |       CE        |    Not-ECT    |      CE       |
      +-----------------+---------------+---------------+

                Figure 3: New IP in IP Encapsulation Behaviours

4.2.  Default Tunnel Egress Behaviour

   To decapsulate the inner header at the tunnel egress, a compliant tunnel egress MUST set the outgoing ECN field to the codepoint at the intersection of the appropriate incoming inner header (row) and outer header (column) in Figure 4 (the IPv4 header checksum also changes whenever the ECN field is changed). There is no need for more than one mode of decapsulation, as these rules cater for all known requirements.

      +---------+------------------------------------------------+
      |Incoming |             Incoming Outer Header              |
      | Inner   +---------+------------+------------+------------+
      | Header  | Not-ECT |   ECT(0)   |   ECT(1)   |     CE     |
      +---------+---------+------------+------------+------------+
      | Not-ECT | Not-ECT |Not-ECT(!!!)|Not-ECT(!!!)| <drop>(!!!)|
      |  ECT(0) |  ECT(0) |   ECT(0)   |   ECT(1)   |     CE     |
      |  ECT(1) |  ECT(1) | ECT(1) (!) |   ECT(1)   |     CE     |
      |    CE   |    CE   |     CE     |   CE(!!!)  |     CE     |
      +---------+---------+------------+------------+------------+

   The ECN field in the outgoing header is set to the codepoint at the intersection of the appropriate incoming inner header (row) and incoming outer header (column), or the packet is dropped where indicated.
   Currently unused combinations are indicated by '(!!!)' or '(!)'.

                 Figure 4: New IP in IP Decapsulation Behaviour

   This table for decapsulation behaviour is derived from the following logic:

   o  If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically:

      *  If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0) or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT.

   o  In all other cases where the inner supports ECN, the decapsulator MUST set the outgoing ECN field to the more severe marking of the outer and inner ECN fields, where the ranking of severity from highest to lowest is CE, ECT(1), ECT(0), Not-ECT. This in no way precludes cases where ECT(1) and ECT(0) have the same severity.

   o  Certain combinations of inner and outer ECN fields cannot result from any transition in any current or previous ECN tunnelling specification. These currently unused (CU) combinations are indicated in Figure 4 by '(!!!)' or '(!)', where '(!!!)' means the combination is CU and always potentially dangerous, while '(!)' means it is CU and possibly dangerous. In these cases, particularly the more dangerous ones, the decapsulator SHOULD log the event and MAY also raise an alarm.

      That the highlighted combinations are currently unused does not mean that all the other combinations are always valid. Some are only valid if they have arrived from a particular type of legacy ingress, and dangerous otherwise.
      Therefore, an implementation MAY allow an operator to configure logging and alarms for such additional header combinations known to be dangerous or CU for the particular configuration of tunnel endpoints deployed at run-time.

      Alarms SHOULD be rate-limited so that the anomalous combinations will not amplify into a flood of alarm messages. It MUST be possible to suppress alarms or logging, e.g. if it becomes apparent that a combination that previously was not used has started to be used for legitimate purposes such as a new standards action.

   The above logic allows ECT(0) and ECT(1) to both represent the same severity of congestion marking (e.g. "not congestion marked"). But it also allows future schemes to be defined where ECT(1) is a more severe marking than ECT(0), in particular enabling the simplest possible encoding for PCN [I-D.ietf-pcn-3-in-1-encoding] (see Section 5.3.2). Treating ECT(1) as either the same as ECT(0) or as a higher severity level is explained in the discussion of the ECN nonce [RFC3540] in Section 9, which in turn refers to Appendix D.

4.3.  Encapsulation Modes

   Section 4.1 introduces two encapsulation modes, normal mode and compatibility mode, defining their encapsulation behaviour (i.e. header copying or zeroing respectively). Note that these are modes of the ingress tunnel endpoint only, not the tunnel as a whole.

   To comply with this specification, a tunnel ingress MUST at least implement `normal mode'. Unless it will never be used with legacy tunnel egress nodes (RFC2003, RFC2401 or RFC2481 or the limited functionality mode of RFC3168), an ingress MUST also implement `compatibility mode' for backward compatibility with tunnel egresses that do not propagate explicit congestion notifications [RFC4774].
   We can categorise the way that an ingress tunnel endpoint is paired with an egress as either static or dynamically discovered:

   Static:  Tunnel endpoints paired together by prior configuration.

      Some implementations of encapsulator might always be statically deployed, and constrained to never be paired with a legacy decapsulator (RFC2003, RFC2401 or RFC2481 or the limited functionality mode of RFC3168). In such a case, only normal mode needs to be implemented.

      For instance, RFC4301-compatible IPsec tunnel endpoints invariably use IKEv2 [RFC4306] for key exchange, which was introduced alongside RFC4301. Therefore both endpoints of an RFC4301 tunnel can be sure that the other end is RFC4301-compatible, because the tunnel is only formed after IKEv2 key management has completed, at which point both ends will be RFC4301-compliant by definition. Consequently, an IPsec tunnel ingress does not need compatibility mode, as it will never interact with legacy ECN tunnels. To comply with the present specification, it only needs to implement the required normal mode, which is identical to the pre-existing RFC4301 behaviour.

   Dynamic Discovery:  Tunnel endpoints paired together by some form of tunnel endpoint discovery, typically finding an egress on the path taken by the first packet.

      This specification does not require or recommend dynamic discovery and it does not define how dynamic negotiation might be done, but it recognises that proprietary tunnel endpoint discovery protocols exist. It therefore sets down some constraints on discovery protocols to ensure safe interworking.

      If dynamic tunnel endpoint discovery might pair an ingress with a legacy egress (RFC2003, RFC2401 or RFC2481 or the limited functionality mode of RFC3168), the ingress MUST implement both normal and compatibility mode.
      If the tunnel discovery process is arranged to only ever find a tunnel egress that propagates ECN (RFC3168 full functionality mode, RFC4301 or the present specification), then a tunnel ingress can be compliant with the present specification without implementing compatibility mode.

      While a compliant tunnel ingress is discovering an egress, it MUST send packets in compatibility mode in case the egress it discovers is a legacy egress. If, through the discovery protocol, the egress indicates that it is compliant with the present specification, with RFC4301 or with RFC3168 full functionality mode, the ingress can switch itself into normal mode. If the egress denies compliance with any of these or returns an error that implies it does not understand a request to work to any of these ECN specifications, the tunnel ingress MUST remain in compatibility mode.

   If an ingress claims compliance with this specification, it MUST NOT permanently disable ECN processing across the tunnel (i.e. only use compatibility mode). Such a tunnel ingress would at least be safe with the ECN behaviour of any egress it may encounter, but it would not meet the central aim of this specification: introducing ECN support to tunnels.

   Instead, if the ingress knows that the egress does support propagation of ECN (full functionality mode of RFC3168 or RFC4301 or the present specification), it SHOULD use normal mode, in order to support ECN where possible. Note that this section started by saying an ingress "MUST implement" normal mode, while it has just said an ingress "SHOULD use" normal mode. This distinction is deliberate: it allows the mode to be turned off in exceptional circumstances but ensures all implementations make normal mode available.
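One way a tunnel ingress could follow these discovery constraints is sketched below in Python. This is not a discovery protocol. The present specification defines none, so the string labels that the egress is assumed to return (`"this-spec"`, `"rfc4301"`, `"rfc3168-full"`) and the `TunnelIngress` class are hypothetical. One instance is kept per tunnel, because each tunnel ingress needs its own mode setting.

```python
from enum import Enum
from typing import Optional


class IngressMode(Enum):
    NORMAL = "normal"
    COMPATIBILITY = "compatibility"


# Hypothetical labels a discovery protocol might return for an egress that
# propagates ECN; none of these strings comes from any real protocol.
ECN_PROPAGATING_EGRESS = {"this-spec", "rfc4301", "rfc3168-full"}


class TunnelIngress:
    """Mode state for one tunnel ingress (one instance per tunnel)."""

    def __init__(self) -> None:
        # While the egress is unknown it might be a legacy egress, so the
        # ingress MUST send in compatibility mode during discovery.
        self.mode = IngressMode.COMPATIBILITY

    def on_discovery_reply(self, egress_claims: Optional[str]) -> IngressMode:
        """Set the mode from the egress's reply to a discovery request.

        `egress_claims` is None if the egress denied compliance or returned
        an error suggesting it did not understand the request.
        """
        if egress_claims in ECN_PROPAGATING_EGRESS:
            # The egress propagates ECN, so normal mode SHOULD be used.
            self.mode = IngressMode.NORMAL
        else:
            # Denial, error or unrecognised answer: MUST stay compatible.
            self.mode = IngressMode.COMPATIBILITY
        return self.mode


tunnels = {"to-egress-a": TunnelIngress(), "to-egress-b": TunnelIngress()}
tunnels["to-egress-a"].on_discovery_reply("rfc4301")
tunnels["to-egress-b"].on_discovery_reply(None)
```

An unrecognised reply is treated the same as a denial. This is the conservative reading of the rule that the ingress must remain in compatibility mode unless the egress confirms it propagates ECN.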
   Implementation note:  If a compliant node is the ingress for multiple tunnels, a mode setting will need to be stored for each tunnel ingress. However, if a node is the egress for multiple tunnels, none of the tunnels will need to store a mode setting, because a compliant egress only needs one mode.

4.4.  Single Mode of Decapsulation

   A compliant decapsulator only needs one mode of operation. However, if a compliant egress is implemented to be dynamically discoverable, it may need to respond to discovery requests from various types of legacy tunnel ingress. This specification does not define how dynamic negotiation might be done by (proprietary) discovery protocols, but it sets down some constraints to ensure safe interworking.

   Through the discovery protocol, a tunnel ingress compliant with the present specification might ask if the egress is compliant with the present specification, with RFC4301 or with RFC3168 full functionality mode. Or an RFC3168 tunnel ingress might try to negotiate to use limited functionality or full functionality mode [RFC3168]. In all these cases, a decapsulating tunnel egress compliant with this specification MUST agree to any of these requests, since it will behave identically in all these cases.

   If no ECN-related mode is requested, a compliant tunnel egress MUST continue without raising any error or warning, because its egress behaviour is compatible with all the legacy ingress behaviours that do not negotiate capabilities.

   A compliant tunnel egress SHOULD raise a warning alarm about any requests to enter modes it does not recognise but, for 'forward compatibility' with standards actions possibly defined after it was implemented, it SHOULD continue operating.

5.  Updates to Earlier RFCs

5.1.  Changes to RFC4301 ECN processing

   Ingress:  An RFC4301 IPsec encapsulator is not changed at all by the present specification. It uses the normal mode of the present specification, which defines packet encapsulation identically to RFC4301.

   Egress:  An RFC4301 egress will need to be updated to the new decapsulation behaviour in Figure 4, in order to comply with the present specification. However, the changes are backward compatible; combinations of inner and outer that result from any protocol defined in the RFC series so far are unaffected. Only combinations that have never been used have been changed, effectively adding new behaviours to RFC4301 decapsulation without altering existing behaviours. The following specific updates to section 5.1.2 of RFC4301 have been made:

      *  The outer, not the inner, is propagated when the outer is ECT(1) and the inner is ECT(0);

      *  A packet with Not-ECT in the inner and an outer of CE is dropped rather than forwarded as Not-ECT;

      *  Certain combinations of inner and outer ECN field have been identified as currently unused. These can trigger logging and/or raise alarms.

   Modes:  RFC4301 tunnel endpoints do not need modes and are not updated by the modes in the present specification. Effectively an RFC4301 IPsec ingress solely uses the REQUIRED normal mode of encapsulation, which is unchanged from RFC4301 encapsulation. It will never need the OPTIONAL compatibility mode as explained in Section 4.3.

5.2.  Changes to RFC3168 ECN processing

   Ingress:  On encapsulation, the new rule in Figure 3 that a normal mode tunnel ingress copies any ECN field into the outer header updates the full functionality behaviour of an RFC3168 ingress [RFC3168; section 9.1.1]. Nonetheless, the new compatibility mode encapsulates packets identically to the limited functionality mode of an RFC3168 ingress.
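The relationship with RFC3168 can be made concrete with a short Python sketch. It is illustrative only, and the function names are hypothetical. The sketch encodes the outer-header rules from the RFC3168 columns of Figure 1 and from the two modes in Figure 3, then lists the incoming codepoints on which each pair disagrees. Normal mode differs from RFC3168 full functionality only for an incoming CE, which RFC3168 reset to ECT(0) in the outer header. Compatibility mode never differs from limited functionality mode.

```python
from typing import Callable, List, Tuple

# ECN codepoints (RFC3168): two bits of the IPv4 TOS / IPv6 Traffic Class.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
CODEPOINTS = (NOT_ECT, ECT_0, ECT_1, CE)

Encap = Callable[[int], int]  # incoming ECN -> outer-header ECN


def rfc3168_limited(ecn: int) -> int:
    return NOT_ECT                      # Figure 1: ECN turned off


def rfc3168_full(ecn: int) -> int:
    return ECT_0 if ecn == CE else ecn  # Figure 1: CE reset to ECT(0)


def compatibility_mode(ecn: int) -> int:
    return NOT_ECT                      # Figure 3


def normal_mode(ecn: int) -> int:
    return ecn                          # Figure 3: copy, as RFC4301 does


def differences(old: Encap, new: Encap) -> List[Tuple[int, int, int]]:
    """(incoming, old outer, new outer) wherever the two rules disagree."""
    return [(c, old(c), new(c)) for c in CODEPOINTS if old(c) != new(c)]


print(differences(rfc3168_limited, compatibility_mode))  # []
print(differences(rfc3168_full, normal_mode))            # [(3, 2, 3)]
```

The single difference, `(3, 2, 3)`, reads as: incoming CE, outer ECT(0) under RFC3168 full functionality, outer CE under normal mode.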
   Egress:  An RFC3168 egress will need to be updated to the new decapsulation behaviour in Figure 4, in order to comply with the present specification. However, the changes are backward compatible; combinations of inner and outer that result from any protocol defined in the RFC series so far are unaffected. Only combinations that have never been used have been changed, effectively adding new behaviours to RFC3168 decapsulation without altering existing behaviours. The following specific updates to section 9.1.1 of RFC3168 have been made:

      *  The outer, not the inner, is propagated when the outer is ECT(1) and the inner is ECT(0);

      *  Certain combinations of inner and outer ECN field have been identified as currently unused. These can trigger logging and/or raise alarms.

   Modes:  An RFC3168 ingress will need to be updated if it is to comply with the present specification, whether or not it implemented the optional full functionality mode of section 9.1.1 of RFC3168.

      Section 9.1 of RFC3168 defined a (required) limited functionality mode and an (optional) full functionality mode for a tunnel. In RFC3168, modes applied to both ends of the tunnel, while in the present specification, modes are only used at the ingress; a single egress behaviour covers all cases.

      The normal mode of encapsulation is an update to the encapsulation behaviour of the full functionality mode of an RFC3168 ingress. The compatibility mode of encapsulation is identical to the encapsulation behaviour of the limited functionality mode of an RFC3168 ingress, except that it is not always obligatory.

      The constraints on how tunnel discovery protocols set modes in Section 4.3 and Section 4.4 are an update to RFC3168, but they are unlikely to require code changes as they document existing safe practice.

5.3.  Motivation for Changes

   An overriding goal is to ensure the same ECN signals can mean the same thing whatever tunnels happen to encapsulate an IP packet flow. This removes gratuitous inconsistency, which otherwise constrains the available design space and makes it harder to design networks and new protocols that work predictably.

5.3.1.  Motivation for Changing Encapsulation

   The normal mode in Section 4 updates RFC3168 to make all IP in IP encapsulation of the ECN field consistent with the way both RFC4301 IPsec [RFC4301] and IP in MPLS or MPLS in MPLS encapsulation [RFC5129] construct the ECN field.

   Compatibility mode has also been defined so that a non-RFC4301 ingress can still switch to using drop across a tunnel for backward compatibility with legacy decapsulators that do not propagate ECN correctly.

   The trigger that motivated this update to RFC3168 encapsulation was a standards track proposal for pre-congestion notification (PCN [RFC5670]). PCN excess rate marking only works correctly if the ECN field is copied on encapsulation (as in RFC4301 and RFC5129); it does not work if ECN is reset (as in RFC3168). This is because PCN excess rate marking depends on the outer header revealing any congestion experienced so far on the whole path, not just since the last tunnel ingress.

   PCN allows a network operator to add flow admission and termination for inelastic traffic at the edges of a Diffserv domain, but without any per-flow mechanisms in the interior and without the generous provisioning typical of Diffserv, aiming to significantly reduce costs. The PCN architecture [RFC5559] states that RFC3168 IP in IP tunnelling of the ECN field cannot be used for any tunnel ingress in a PCN domain.
   Prior to the present specification, this left a stark choice between not being able to use PCN for inelastic traffic control and not being able to use the many tunnels already deployed for Mobile IP, VPNs and so forth.

   The present specification provides a clean solution to this problem, so that network operators who want to use both PCN and tunnels can specify that every tunnel ingress in a PCN region must comply with this latest specification.

   Rather than allow tunnel specifications to fragment further into one for PCN, one for IPsec and one for other tunnels, the opportunity has been taken to consolidate the diverging specifications back into a single tunnelling behaviour. Resetting ECN was originally motivated by a covert channel concern that has been deliberately set aside in RFC4301 IPsec. Therefore the reset behaviour of RFC3168 is an anomaly that we do not need to keep. Copying ECN on encapsulation is also simpler than resetting. So, as more tunnel endpoints comply with this single consistent specification, encapsulation will be simpler as well as more predictable.

   Appendix B assesses whether copying rather than resetting CE on ingress will cause any unintended side-effects, from the three perspectives of security, control and management. In summary, this analysis finds that:

   o  From the control perspective, either copying or resetting works for existing arrangements, but copying has more potential for simplifying control, and resetting breaks at least one proposal already on the standards track.

   o  From the management and monitoring perspective, copying is preferable.

   o  From the traffic security perspective (enforcing congestion control, mitigating denial of service, etc.), copying is preferable.
   o  From the information security perspective, resetting is preferable, but the IETF Security Area now considers copying acceptable given that the bandwidth of a 2-bit covert channel can be managed.

   Therefore there are two points against resetting CE on ingress, while copying CE causes no significant harm.

5.3.2.  Motivation for Changing Decapsulation

   The specification for decapsulation in Section 4 fixes three problems with the pre-existing behaviours of both RFC3168 and RFC4301:

   1.  The pre-existing rules prevented the introduction of alternate ECN semantics to signal more than one severity level of congestion [RFC4774], [RFC5559]. The four states of the 2-bit ECN field provide room for signalling two severity levels in addition to not-congested and not-ECN-capable states. But the pre-existing rules assumed that two of the states (ECT(0) and ECT(1)) are always equivalent. This unnecessarily restricts the use of one of four codepoints (half a bit) in the IP (v4 & v6) header. The new rules are designed to work in either case: whether ECT(1) is more severe than or equivalent to ECT(0).

       As explained in Appendix B.1, the original reason for not forwarding the outer ECT codepoints was to limit the covert channel across a decapsulator to 1 bit per packet. However, now that the IETF Security Area has deemed that a 2-bit covert channel through an encapsulator is a manageable risk, the same should be true for a decapsulator.

       As well as being useful for general future-proofing, this problem is immediately pressing for standardisation of pre-congestion notification (PCN), which uses two severity levels of congestion.
       If a congested queue used ECT(1) in the outer header to signal more severe congestion than ECT(0), the pre-existing decapsulation rules would have thrown away this congestion signal, preventing tunnelled traffic from ever knowing that it should reduce its load.

       Before the present specification was written, the PCN working group had to consider a number of wasteful or convoluted work-rounds to this problem. Without wishing to disparage the ingenuity of these work-rounds, none were chosen for the standards track because they were either somewhat wasteful, imprecise or complicated. Instead, a baseline PCN encoding was specified [RFC5696] that supported only one severity level of congestion but allowed space for these work-rounds as experimental extensions.

       But by far the simplest approach is that taken by the current specification: just to remove the covert channel blockages from tunnelling behaviour, which are now deemed unnecessary anyway. Then network operators that want to support two congestion severity-levels for PCN can specify that every tunnel egress in a PCN region must comply with this latest specification. Having taken this step, the simplest possible encoding for PCN with two severity levels of congestion [I-D.ietf-pcn-3-in-1-encoding] can be used.

       This makes two congestion severity-levels available not only for PCN but also for other potential uses of the extra ECN codepoint (e.g. [VCP]).

   2.  Cases are documented where a middlebox (e.g. a firewall) drops packets with header values that were currently unused (CU) when the box was deployed, often on the grounds that anything unexpected might be an attack. This tends to bar future use of CU values. The new decapsulation rules specify optional logging and/or alarms for specific combinations of inner and outer header that are currently unused.
       The aim is to give implementers a recourse other than drop if they are concerned about the security of CU values. It recognises legitimate security concerns about CU values but still eases their future use. If the alarms are interpreted as indicating an attack (e.g. by a management system), the offending packets can be dropped. But alarms can be turned off if these combinations come into regular use (e.g. through a future standards action).

   3.  While reviewing currently unused combinations of inner and outer, the opportunity was taken to define a single consistent behaviour for the three cases with a Not-ECT inner header but a different outer. RFC3168 and RFC4301 had diverged in this respect, and even their common behaviours had never been justified.

       None of these combinations should result from Internet protocols in the RFC series, but future standards actions might put any or all of them to good use. Therefore it was decided that a decapsulator must forward a Not-ECT inner unchanged when the arriving outer is ECT(0) or ECT(1). But for safety it must drop a combination of Not-ECT inner and CE outer. Then, if some unfortunate misconfiguration resulted in a congested router marking CE on a packet that was originally Not-ECT, drop would be the only appropriate signal for the egress to propagate, because it is the only signal a non-ECN-capable transport (Not-ECT) would understand.

       It may seem contradictory that the same argument has not been applied to the ECT(1) codepoint, given it is being proposed as an intermediate level of congestion in a scheme progressing through the IETF [I-D.ietf-pcn-3-in-1-encoding]. Instead, a decapsulator must forward a Not-ECT inner unchanged when its outer is ECT(1). The rationale for not dropping this CU combination is to ensure it will be usable if needed in the future.
       If any misconfiguration led to ECT(1) congestion signals with a Not-ECT inner, it would not be disastrous for the tunnel egress to suppress them, because the congestion should then escalate to CE marking, which the egress would drop, thus at least preventing congestion collapse.

   Problems 2 and 3 alone would not warrant a change to decapsulation, but it was decided they are worth fixing and making consistent at the same time as decapsulation code is changed to fix problem 1 (two congestion severity-levels).

6.  Backward Compatibility

   A tunnel endpoint compliant with the present specification is backward compatible when paired with any tunnel endpoint compliant with any previous tunnelling RFC, whether RFC4301, RFC3168 (see Section 3) or the earlier RFCs summarised in Appendix A (RFC2481, RFC2401 and RFC2003). Each case is enumerated below.

6.1.  Non-Issues Updating Decapsulation

   At the egress, this specification only augments the per-packet calculation of the ECN field (RFC3168 and RFC4301) for combinations of inner and outer headers that have so far not been used in any IETF protocols.

   Therefore, all other things being equal, if an RFC4301 IPsec egress is updated to comply with the new rules, it will still interwork with any RFC4301 compliant ingress, and the packet outputs will be identical to those it would have output before (fully backward compatible).

   Similarly, all other things being equal, if an RFC3168 egress is updated to comply with the same new rules, it will still interwork with any ingress complying with any previous specification (both modes of RFC3168, both modes of RFC2481, RFC2401 and RFC2003), and the packet outputs will be identical to those it would have output before (fully backward compatible).
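The per-packet claims above can be checked mechanically. The Python sketch below is an illustration only. It encodes the pre-existing decapsulation rules of Figure 2 and the new rules of Figure 4 as functions, then lists every (inner, outer) combination on which an updated egress would behave differently. The function names, and the use of `None` to stand for a dropped packet, are assumptions made for this sketch. The differences it reports match the specific updates listed in Sections 5.1 and 5.2, which the text above states have not been used in IETF protocols.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
CODEPOINTS = (NOT_ECT, ECT_0, ECT_1, CE)
NAMES = {NOT_ECT: "Not-ECT", ECT_0: "ECT(0)", ECT_1: "ECT(1)", CE: "CE"}
DROP = None  # sketch convention: None stands for "drop the packet"

# Severity ranking of the new rules: CE > ECT(1) > ECT(0) > Not-ECT.
SEVERITY = {NOT_ECT: 0, ECT_0: 1, ECT_1: 2, CE: 3}

Decap = Callable[[int, int], Optional[int]]


def decap_new(inner: int, outer: int) -> Optional[int]:
    """Present specification (Figure 4)."""
    if inner == NOT_ECT:
        return DROP if outer == CE else NOT_ECT
    return max(inner, outer, key=SEVERITY.__getitem__)


def decap_rfc4301(inner: int, outer: int) -> Optional[int]:
    """Pre-existing RFC4301 behaviour (Figure 2)."""
    if inner == NOT_ECT:
        return NOT_ECT
    return CE if outer == CE else inner


def decap_rfc3168(inner: int, outer: int) -> Optional[int]:
    """Pre-existing RFC3168 behaviour (Figure 2): as RFC4301, except that
    a Not-ECT inner arriving with a CE outer is dropped."""
    if inner == NOT_ECT and outer == CE:
        return DROP
    return decap_rfc4301(inner, outer)


def changed(old: Decap) -> List[Tuple[str, str]]:
    """(inner, outer) pairs where the updated egress differs from `old`."""
    return [(NAMES[i], NAMES[o]) for i, o in product(CODEPOINTS, repeat=2)
            if old(i, o) != decap_new(i, o)]


print(changed(decap_rfc4301))  # [('Not-ECT', 'CE'), ('ECT(0)', 'ECT(1)')]
print(changed(decap_rfc3168))  # [('ECT(0)', 'ECT(1)')]
```

Each of the remaining 14 (RFC4301) or 15 (RFC3168) combinations produces the same output before and after the update.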
   A compliant tunnel egress merely needs to implement the one behaviour in Section 4, without any additional mode or option configuration at the ingress or egress and without any additional negotiation with the ingress. The new decapsulation rules have been defined in such a way that congestion control will still work safely if any of the earlier versions of ECN processing are used unilaterally at the encapsulating ingress of the tunnel (any of RFC2003, RFC2401, either mode of RFC2481, either mode of RFC3168, RFC4301 and the present specification).

6.2.  Non-Update of RFC4301 IPsec Encapsulation

   An RFC4301 IPsec ingress can comply with this new specification without any update, and it has no need for any new modes, options or configuration. So, all other things being equal, it will continue to interwork identically with any egress it worked with before (fully backward compatible).

6.3.  Update to RFC3168 Encapsulation

   The encapsulation behaviour of the new normal mode copies the ECN field, whereas an RFC3168 ingress in full functionality mode resets it. However, all other things being equal, if an RFC3168 ingress is updated to the present specification, the outgoing packets from any tunnel egress will still be unchanged. This is because all variants of tunnelling at either end (RFC4301, both modes of RFC3168, both modes of RFC2481, RFC2401, RFC2003 and the present specification) have always propagated an incoming CE marking through the inner header and onward into the outgoing header, whether the outer header is reset or copied. Therefore, if the tunnel is considered as a black box, the packets output from any egress will be identical with or without an update to the ingress.
   Nonetheless, if packets are observed within the black box (between the tunnel endpoints), CE markings copied by the updated ingress will be visible within the black box, whereas they would not have been before. Therefore, the update to encapsulation can be termed 'black-box backward compatible' (i.e. identical unless you look inside the tunnel).

   This specification introduces no new backward compatibility issues when a compliant ingress talks with a legacy egress, but it has to provide similar safeguards to those already defined in RFC3168. RFC3168 laid down rules to ensure that an RFC3168 ingress turns off ECN (limited functionality mode) if it is paired with a legacy egress (RFC2481, RFC2401 or RFC2003), which would not propagate ECN correctly. The present specification carries forward those rules (Section 4.3). It uses compatibility mode whenever RFC3168 would have used limited functionality mode, and their per-packet behaviours are identical. Therefore, all other things being equal, an ingress using the new rules will interwork with any legacy tunnel egress in exactly the same way as an RFC3168 ingress (still black-box backward compatible).

7.  Design Principles for Alternate ECN Tunnelling Semantics

   This section is informative, not normative.

   Section 5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to 'switch in' alternative behaviours for marking the ECN field, just as it switches in different per-hop behaviours (PHBs) for scheduling. [RFC4774] gives best current practice for designing such alternative ECN semantics and very briefly mentions in section 5.4 that tunnelling needs to be considered. The guidance below complements and extends RFC4774, giving additional guidance on designing any alternate ECN semantics that would also require alternate tunnelling semantics.
The overriding guidance is: "Avoid designing alternate ECN tunnelling semantics, if at all possible." If a scheme requires tunnels to implement special processing of the ECN field for certain DSCPs, it will be hard to guarantee that every implementer of every tunnel will have added the required exception, or that operators will have ubiquitously deployed the required updates. It is unlikely that a single authority is even aware of all the tunnels in a network, which may include tunnels set up by applications between endpoints, or tunnels created dynamically in the network. Therefore it is highly likely that some tunnels within a network or on hosts connected to it will not implement the required special case.

That said, if a non-default scheme for tunnelling the ECN field is really required, the following guidelines might prove useful in its design:

On encapsulation in any alternate scheme:

1. The ECN field of the outer header ought to be cleared to Not-ECT ("00") unless it is guaranteed that the corresponding tunnel egress will correctly propagate congestion markings introduced across the tunnel in the outer header.

2. If it has established that ECN will be correctly propagated, an encapsulator ought to also copy incoming congestion notification into the outer header. The general principle here is that the outer header should reflect congestion accumulated along the whole upstream path, not just since the tunnel ingress (as explained in Appendix B.3 on management and monitoring).

   In some circumstances (e.g. PCN [RFC5559] and perhaps some pseudowires [RFC5659]), the whole path is divided into segments, each with its own congestion notification and feedback loop. In these cases, the function that regulates load at the start of each segment will need to reset congestion notification for its segment.
   Often the point where congestion notification is reset will also be located at the start of a tunnel. However, the resetting function can be thought of as being applied to packets after the encapsulation function; the two are logically separate functions even though they might run on the same physical box. Then the code module doing encapsulation can keep to the copying rule and the load regulator module can reset congestion, without any code in either module being conditional on whether the other is there.

On decapsulation in any alternate scheme:

1. If the arriving inner header is Not-ECT, it implies the transport will not understand other ECN codepoints. If the outer header carries an explicit congestion marking, the alternate scheme would be expected to drop the packet, because drop is the only indication of congestion the transport will understand. If the alternate scheme recommends forwarding rather than dropping such a packet, it will need to clearly justify this decision. If the inner is Not-ECT and the outer carries any other ECN codepoint that does not indicate congestion, the alternate scheme can forward the packet, but probably only as Not-ECT.

2. If the arriving inner header is other than Not-ECT, the ECN field that the alternate decapsulation scheme forwards ought to reflect the more severe congestion marking of the arriving inner and outer headers.

3. Any alternate scheme will need to define a behaviour for all combinations of inner and outer headers. This includes combinations that would not be expected to result from standards known at the time, and combinations that would not be expected from the tunnel ingress paired with the egress at run-time.
   Consideration should be given to logging such unexpected combinations and raising an alarm, particularly if there is a danger that the invalid combination implies congestion signals are not being propagated correctly. The presence of currently unused combinations may represent an attack, but the alternate scheme should try to define a way to forward such packets, at least if a safe outgoing codepoint can be defined.

   Raising an alarm allows a management system to decide whether the anomaly is indeed an attack, in which case it can decide to drop such packets. This is preferable to hard-coded discard of packets that seem anomalous today but may be needed tomorrow in future standards actions.

8. IANA Considerations (to be removed on publication):

This memo includes no request to IANA.

9. Security Considerations

Appendix B.1 discusses the security constraints imposed on ECN tunnel processing. The new rules for ECN tunnel processing (Section 4) trade off information security (covert channels) against traffic security (congestion monitoring & control). Ensuring congestion markings are not lost is itself an aspect of security. If congestion notification were allowed to be lost, any attempt to enforce a response to congestion would be much harder.

Security issues in unlikely but possible scenarios:

Tunnels intersecting Diffserv regions with alternate ECN semantics: If alternate congestion notification semantics are defined for a certain Diffserv PHB, the scope of the alternate semantics might typically be bounded by the limits of a Diffserv region or regions, as envisaged in [RFC4774] (e.g. the pre-congestion notification architecture [RFC5559]).
The inner headers in tunnels crossing the boundary of such a Diffserv region but ending within the region can potentially leak the external congestion notification semantics into the region, or leak the internal semantics out of the region. [RFC2983] discusses the need for Diffserv traffic conditioning to be applied at these tunnel endpoints as if they were at the edge of the Diffserv region. Similar concerns apply to any processing or propagation of the ECN field at the endpoints of tunnels with one end inside and the other outside the domain. [RFC5559] gives specific advice on this for the PCN case, but other definitions of alternate semantics will need to discuss the specific security implications in each case.

ECN nonce tunnel coverage: The new decapsulation rules improve the coverage of the ECN nonce [RFC3540] relative to the previous rules in RFC3168 and RFC4301. However, nonce coverage is still not perfect, because perfect coverage would have led to a safety problem in another case. Both are corner cases, so discussion of the compromise between them is deferred to Appendix D.

Covert channel not turned off: A legacy (RFC3168) tunnel ingress could ask an RFC3168 egress to turn off ECN processing as well as turning off ECN itself. An egress compliant with the present specification will agree to such a request from a legacy ingress, but it relies on the ingress always sending Not-ECT in the outer. If the egress receives other ECN codepoints in the outer, it will process them as normal, so it will actually still copy congestion markings from the outer to the outgoing header.
Referring for example to Figure 5 (Appendix B.1), although the tunnel ingress 'I' will set all ECN fields in outer headers to Not-ECT, 'M' could still toggle CE or ECT(1) on and off to communicate covertly with 'B'. This is because 'E' is specified to have only one mode, regardless of what mode it says it has negotiated. We could have specified that 'E' should have a limited functionality mode and check for such behaviour. But we decided not to add the extra complexity of two modes on a compliant tunnel egress merely to cater for a historical security concern that is now considered manageable.

10. Conclusions

This document allows tunnels to propagate an extra level of congestion severity. It uses previously unused combinations of inner and outer header to augment the rules for calculating the ECN field when decapsulating IP packets at the egress of IPsec (RFC4301) and non-IPsec (RFC3168) tunnels.

This document also updates the ingress tunnelling encapsulation of RFC3168 ECN to bring all IP in IP tunnels into line with the new behaviour in the IPsec architecture of RFC4301, which copies rather than resets the ECN field when creating outer headers.

The need for both these updated behaviours was triggered by the introduction of pre-congestion notification (PCN) onto the IETF standards track. Operators wanting to support PCN or other alternate ECN schemes that use an extra severity level can require that their tunnels comply with the present specification. This is not a fork in the RFC series. It is an update that can be deployed first by those that need it, and subsequently by all tunnel endpoint implementations during general code maintenance.
It is backward compatible with all previous tunnelling behaviours, so existing single-severity-level schemes will continue to work as before, but support for two severity levels will gradually be added to the Internet.

The new rules propagate changes to the ECN field across tunnel endpoints that previously blocked them to restrict the bandwidth of a potential covert channel. Limiting the channel's bandwidth to 2 bits per packet is now considered sufficient.

At the same time as removing these legacy constraints, the opportunity has been taken to draw together diverging tunnel specifications into a single consistent behaviour. Then any tunnel can be deployed unilaterally, and it will support the full range of congestion control and management schemes without any modes or configuration. Further, any host or router can expect the ECN field to behave in the same way, whatever type of tunnel might intervene in the path. This new certainty could enable new uses of the ECN field that would otherwise be confounded by ambiguity.

11. Acknowledgements

Thanks to David Black for his insightful reviews and patient explanations of better ways to think about function placement and alarms. Thanks to David and to Anil Agarwal for pointing out cases where it is safe to forward CU combinations of headers. Also thanks to Arnaud Jacquet for the idea for Appendix C. Thanks to Gorry Fairhurst, Teco Boot, Michael Menth, Bruce Davie, Toby Moncaster, Sally Floyd, Alfred Hoenes, Gabriele Corliano, Ingemar Johansson, Philip Eardley and David Harrington for their thoughts and careful review comments, and to Stephen Hanna, Ben Campbell and members of the IESG for respectively conducting the Security Directorate, General Area and IESG reviews.
Bob Briscoe is partly funded by Trilogy, a research project (ICT-216372) supported by the European Community under its Seventh Framework Programme.

Comments Solicited (to be removed by the RFC Editor):

Comments and questions are encouraged and very welcome. They can be addressed to the IETF Transport Area working group mailing list , and/or to the authors.

12. References

12.1. Normative References

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

12.2. Informative References

[I-D.ietf-pcn-3-in-1-encoding] Briscoe, B., Moncaster, T., and M. Menth, "Encoding 3 PCN-States in the IP header using a single DSCP", draft-ietf-pcn-3-in-1-encoding-03 (work in progress), July 2010.

[RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983, October 2000.

[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, June 2003.
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC 4306, December 2005.

[RFC4774] Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, November 2006.

[RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion Marking in MPLS", RFC 5129, January 2008.

[RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) Architecture", RFC 5559, June 2009.

[RFC5659] Bocci, M. and S. Bryant, "An Architecture for Multi-Segment Pseudowire Emulation Edge-to-Edge", RFC 5659, October 2009.

[RFC5670] Eardley, P., "Metering and Marking Behaviour of PCN-Nodes", RFC 5670, November 2009.

[RFC5696] Moncaster, T., Briscoe, B., and M. Menth, "Baseline Encoding and Transport of Pre-Congestion Information", RFC 5696, November 2009.

[VCP] Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman, "One more bit is enough", Proc. SIGCOMM'05, ACM CCR 35(4):37-48, 2005, .

Appendix A. Early ECN Tunnelling RFCs

IP in IP tunnelling was originally defined in [RFC2003]. On encapsulation, the incoming header was copied to the outer, and on decapsulation the outer was simply discarded. Initially, IPsec tunnelling [RFC2401] followed the same behaviour.

When ECN was introduced experimentally in [RFC2481], legacy (RFC2003 or RFC2401) tunnels would have discarded any congestion markings added to the outer header. So RFC2481 introduced rules for calculating the outgoing header from a combination of the inner and outer on decapsulation. RFC2481 also introduced a second mode for IPsec tunnels, which turned off ECN processing (Not-ECT) in the outer header on encapsulation, because an RFC2401 decapsulator would discard the outer on decapsulation.
For RFC2401 IPsec this had the side effect of completely blocking the covert channel.

In RFC2481 the ECN field was defined as two separate bits. But when ECN moved from the experimental to the standards track [RFC3168], the ECN field was redefined as four codepoints. This required a different calculation of the ECN field on decapsulation from that used in RFC2481. RFC3168 also had two modes. The first was a 'full functionality mode' that restricted the covert channel as much as possible but still allowed ECN to be used with IPsec. The second completely turned off ECN processing across the tunnel. This 'limited functionality mode' both offered a way for operators to completely block the covert channel and allowed an RFC3168 ingress to interwork with a legacy tunnel egress (RFC2481, RFC2401 or RFC2003).

The present specification includes a similar compatibility mode to interwork safely with tunnels compliant with any of these three earlier RFCs. However, unlike RFC3168, it is only a mode of the ingress, as decapsulation behaviour is the same in either case.

Appendix B. Design Constraints

Tunnel processing of a congestion notification field has to meet congestion control and management needs without creating new information security vulnerabilities (if information security is required). This appendix documents the analysis of the tradeoffs between these factors that led to the new encapsulation rules in Section 4.1.

B.1. Security Constraints

Information security can be assured by using various end-to-end security solutions (including IPsec in transport mode [RFC4301]), but a commonly used scenario involves the need to communicate between two physically protected domains across the public Internet.
In this case there are certain management advantages to using IPsec in tunnel mode solely across the publicly accessible part of the path. The path followed by a packet then crosses security 'domains': the ones protected by physical or other means before and after the tunnel, and the one protected by an IPsec tunnel across the otherwise unprotected domain. The scenario in Figure 5 will be used, where endpoints 'A' and 'B' communicate through a tunnel. The tunnel ingress 'I' and egress 'E' are within physically protected edge domains, while the tunnel spans an unprotected internetwork where there may be 'men in the middle', 'M'.

        physically     unprotected      physically
   <-protected domain-><--domain--><-protected domain->
   +------------------+            +------------------+
   |                  |      M     |                  |
   | A-------->I=========>==========>E-------->B      |
   |                  |            |                  |
   +------------------+            +------------------+
               <----IPsec secured---->
                       tunnel

Figure 5: IPsec Tunnel Scenario

IPsec encryption is typically used to prevent 'M' seeing messages from 'A' to 'B'. IPsec authentication is used to prevent 'M' masquerading as the sender of messages from 'A' to 'B' or altering their contents. 'I' can use IPsec tunnel mode to allow 'A' to communicate with 'B', but impose encryption to prevent 'A' leaking information to 'M'. Or 'E' can insist that 'I' uses tunnel mode authentication to prevent 'M' communicating information to 'B'.

Mutable IP header fields such as the ECN field (as well as the TTL/Hop Limit and DS fields) cannot be included in the cryptographic calculations of IPsec. Therefore, if 'I' copies these mutable fields into the outer header that is exposed across the tunnel, it will have allowed a covert channel from 'A' to 'M' that bypasses its encryption of the inner header.
And if 'E' copies these fields from the outer header to the inner, even if it validates authentication from 'I', it will have allowed a covert channel from 'M' to 'B'.

ECN at the IP layer is designed to carry information about congestion from a congested resource towards downstream nodes. Typically a downstream transport might feed the information back somehow to the point upstream of the congestion that can regulate the load on the congested resource, but other actions are possible [RFC3168; section 6]. In terms of the above unicast scenario, ECN effectively intends to create an information channel (for congestion signalling) from 'M' to 'B' (for 'B' to feed back to 'A'). Therefore the goals of IPsec and ECN are mutually incompatible, and some compromise is required.

With respect to using the DS or ECN fields as covert channels, section 5.1.2 of RFC4301 says, "controls are provided to manage the bandwidth of this channel". Using the ECN processing rules of RFC4301, the channel bandwidth is two bits per datagram from 'A' to 'M' and one bit per datagram from 'M' to 'B' (because 'E' limits the combinations of the 2-bit ECN field that it will copy). In both cases the covert channel bandwidth is further reduced by noise from any real congestion marking. RFC4301 implies that these covert channels are sufficiently limited to be considered a manageable threat. However, with respect to the larger (6-bit) DS field, the same section of RFC4301 says not copying is the default, but a configuration option can allow copying "to allow a local administrator to decide whether the covert channel provided by copying these bits outweighs the benefits of copying". Of course, an administrator considering copying of the DS field has to take into account that it could be concatenated with the ECN field, giving an 8-bit per datagram covert channel.
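The per-datagram figures above can be reproduced by counting how many distinct values one party can make the other observe. The sketch below is illustrative only and assumes the following:

- The ingress copies the inner ECN field into the outer.
- The egress is a simplified RFC4301-style egress. It copies a CE outer onto an ECT inner and otherwise forwards the inner unchanged, and the drop case for a Not-ECT inner is ignored.

The function names are hypothetical. Counting distinguishable outputs gives the capacity of a noiseless channel. Real congestion marking only reduces it further, as noted above.

```python
import math

CODEPOINTS = ("Not-ECT", "ECT(0)", "ECT(1)", "CE")


def ingress_copy(inner: str) -> str:
    """Assumed RFC4301-style ingress: the outer ECN field copies the inner."""
    return inner


def egress_rfc4301_like(inner: str, outer: str) -> str:
    """Assumed simplification: only a CE outer is copied, onto an ECT inner."""
    if outer == "CE" and inner in ("ECT(0)", "ECT(1)"):
        return "CE"
    return inner


def bits(distinct_values: int) -> float:
    """Capacity in bits of a noiseless channel with this many symbols."""
    return math.log2(distinct_values)


# 'A' -> 'M': 'A' chooses the inner field; 'M' sees the copied outer field.
a_to_m = bits(len({ingress_copy(inner) for inner in CODEPOINTS}))

# 'M' -> 'B': for a fixed inner field, 'M' chooses the outer field and 'B'
# sees whatever 'E' forwards; take the inner value that suits 'M' best.
m_to_b = bits(max(
    len({egress_rfc4301_like(inner, outer) for outer in CODEPOINTS})
    for inner in CODEPOINTS
))

# If the 6-bit DS field were also copied, it concatenates with the ECN field.
ds_and_ecn = bits(2 ** 6 * len(CODEPOINTS))

print(a_to_m, m_to_b, ds_and_ecn)  # 2.0 1.0 8.0
```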
For tunnelling the 6-bit Diffserv field, two conceptual models have had to be defined so that administrators can trade off security against the needs of traffic conditioning [RFC2983]:

The uniform model: where the Diffserv field is preserved end-to-end by copying into the outer header on encapsulation and copying from the outer header on decapsulation.

The pipe model: where the outer header is independent of the inner header, so it hides the Diffserv field of the inner header from any interaction with nodes along the tunnel.

However, for ECN, the new IPsec security architecture in RFC4301 only standardised one tunnelling model, equivalent to the uniform model. It deemed that simplicity was more important than allowing administrators the option of a tiny increment in security, especially given that not copying congestion indications could seriously harm everyone's network service.

B.2. Control Constraints

Congestion control requires that any congestion notification marked into packets by a resource will be able to traverse a feedback loop back to a function capable of controlling the load on that resource. To be precise, rather than calling this function the data source, it will be called the Load Regulator. This allows for exceptional cases where load is not regulated by the data source, but usually the two terms will be synonymous. Note the term "a function _capable of_ controlling the load" deliberately includes a source application that doesn't actually control the load but ought to (e.g. an application without congestion control that uses UDP).
   A--->R--->I=========>M=========>E-------->B

Figure 6: Simple Tunnel Scenario

A tunnelling scenario similar to the IPsec one just described will now be considered (Figure 6), but without the different security domains, because the focus now shifts to whether the control loop and management monitoring work. Suppose resources in the tunnel are to be able to explicitly notify congestion and the feedback path is from 'B' to 'A'. Then 'E' will certainly need to copy any CE marking from the outer header to the inner header for onward transmission to 'B'. Otherwise, congestion notification from resources like 'M' cannot be fed back to the Load Regulator ('A'). But it does not seem necessary for 'I' to copy CE markings from the inner to the outer header. For instance, if resource 'R' is congested, it can send congestion information to 'B' using the congestion field in the inner header, without 'I' copying the congestion field into the outer header and 'E' copying it back to the inner header. 'E' can still write any additional congestion marking introduced across the tunnel into the congestion field of the inner header.

All this shows that 'E' can preserve the control loop irrespective of whether 'I' copies congestion notification into the outer header or resets it.

That is the situation for existing control arrangements. However, because copying reveals more information, it would open up possibilities for better control system designs. For instance, resetting CE marking on encapsulation breaks the standards track PCN congestion marking scheme [RFC5670], which ends up removing excessive amounts of traffic unnecessarily, whereas copying CE markings at ingress leads to the correct control behaviour.

B.3. Management Constraints

As well as control, there are also management constraints.
Specifically, a management system may monitor congestion markings in passing packets, perhaps at the border between networks as part of a service level agreement. For instance, monitors at the borders of autonomous systems may need to measure how much congestion has accumulated so far along the path, perhaps to determine between them how much of the congestion is contributed by each domain.

In this document the baseline of congestion marking (or the Congestion Baseline) is defined as the source of the layer that created (or most recently reset) the congestion notification field. When monitoring congestion, it would be desirable if the Congestion Baseline did not depend on whether packets were tunnelled or not. Some tunnels cross domain borders (e.g. consider that 'M' in Figure 6 is monitoring a border). Therefore it would be desirable for 'I' to copy congestion accumulated so far into the outer headers, so that it is exposed across the tunnel.

For management purposes, it might be useful for the tunnel egress to be able to monitor whether congestion occurred across a tunnel or upstream of it. Superficially, copying congestion markings at the ingress appears to make this difficult, whereas it was straightforward when an RFC3168 ingress reset them. However, Appendix C gives a simple and precise method for a tunnel egress to infer the congestion level introduced across a tunnel. It works irrespective of whether the ingress copies or resets congestion markings.

Appendix C. Contribution to Congestion across a Tunnel

This specification mandates that a tunnel ingress determines the ECN field of each new outer tunnel header by copying the arriving header.
Concern has been expressed that this will make it difficult for the tunnel egress to monitor congestion introduced only along a tunnel, which is easy if the outer ECN field is reset at a tunnel ingress (RFC3168 full functionality mode). In fact, copying CE marks at ingress will still make it easy for the egress to measure congestion introduced across a tunnel, as illustrated below.

Consider 100 packets measured at the egress. Say it measures that 30 are CE marked in both the inner and outer headers, and 12 have additional CE marks in the outer but not the inner. This means packets arriving at the ingress had already experienced 30% congestion. However, it does not mean there was 12% congestion across the tunnel. The correct calculation of congestion across the tunnel is p_t = 12/(100-30) = 12/70 = 17%. This is easy for the egress to measure. It is simply the proportion of packets not marked in the inner header (70) that have a CE marking in the outer header (12). This technique works whether the ingress copies or resets CE markings, so it can be used by an egress that is not sure which RFC the ingress complies with.

Figure 7 illustrates this in a combinatorial probability diagram. The square represents 100 packets. The 30% division along the bottom represents marking before the ingress, and the p_t division up the side represents marking introduced across the tunnel.

      ^ outer header marking
      |
 100% +-----+---------+       The large square
      |     |         |       represents 100 packets
      |  30 |         |
      |     |         |       p_t = 12/(100-30)
  p_t +     +---------+           = 12/70
      |     |   12    |           = 17%
    0 +-----+---------+--->
      0    30%      100%  inner header marking

Figure 7: Tunnel Marking of Packets Already Marked at Ingress

Appendix D. Compromise on Decap with ECT(1) Inner and ECT(0) Outer

A packet with an ECT(1) inner and an ECT(0) outer should never arise from any known IETF protocol. Without giving a reason, RFC3168 and RFC4301 both say the outer should be ignored when decapsulating such a packet. This appendix explains why it was decided not to change this advice.

In summary, ECT(0) always means 'not congested', while ECT(1) may imply the same [RFC3168] or it may imply a higher severity congestion signal [RFC4774], [I-D.ietf-pcn-3-in-1-encoding], depending on the transport in use. Whether they mean the same or not, at the ingress the outer should have started the same as the inner, and only a broken or compromised router could have changed the outer to ECT(0).

The decapsulator can detect this anomaly. But the question is: should it correct the anomaly by ignoring the outer, or should it reveal the anomaly to the end-to-end transport by forwarding the outer?

On balance, it was decided that the decapsulator should correct the anomaly, but log the event and optionally raise an alarm. This is the safe action if ECT(1) is being used as a more severe marking than ECT(0), because it passes the more severe signal to the transport. However, it is not a good idea to hide anomalies, which is why an optional alarm is suggested. This anomaly may be the result of two changes to the outer: a broken or compromised router within the tunnel might be erasing congestion markings introduced earlier in the same tunnel by a congested router. In this case, the anomaly would be losing congestion signals, which needs immediate attention.

The original reason for defining ECT(0) and ECT(1) as equivalent was so that the data source could use the ECN nonce [RFC3540] to detect if congestion signals were being erased.
However, in this case, the decapsulator does not need a nonce to detect any anomalies introduced within the tunnel, because it has the inner as a record of the header at the ingress. Therefore, it was decided that the best compromise would be to give precedence to solving the safety issue over revealing the anomaly, because the anomaly could at least be detected and dealt with internally.

Superficially, the opposite case, where the inner and outer carry different ECT values but with an ECT(1) outer and an ECT(0) inner, seems to require a similar compromise. However, because that case is reversed, no compromise is necessary. It is best to forward the outer, whether the transport expects ECT(1) to mean a higher severity than ECT(0) or the same severity. Forwarding the outer either preserves a higher value (if it is higher) or reveals an anomaly to the transport (if the two ECT codepoints mean the same severity).

Appendix E. Open Issues

The new decapsulation behaviour defined in Section 4.2 adds support for propagation of 2 severity levels of congestion. However, transports have no way to discover whether there are any legacy tunnels on their path that will not propagate 2 severity levels. It would have been nice to add a feature for transports to check path support, but this remains an open issue. It will have to be addressed in any future standards action to define an end-to-end scheme that requires 2 severity levels of congestion. PCN avoids this problem because it is only for a controlled region, so all legacy tunnels can be upgraded by the same operator that deploys PCN.

Author's Address

Bob Briscoe
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK

Phone: +44 1473 645196
EMail: bob.briscoe@bt.com
URI: http://bobbriscoe.net/