Network Working Group                                           B. Davie
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                              B. Briscoe
Expires: April 6, 2008                                            J. Tay
                                                             BT Research
                                                         October 4, 2007

                  Explicit Congestion Marking in MPLS
                    draft-ietf-tsvwg-ecn-mpls-02.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 6, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   RFC 3270 defines how to support the Diffserv architecture in MPLS
   networks, including how to encode Diffserv Code Points (DSCPs) in an
   MPLS header. DSCPs may be encoded in the EXP field, while other uses
   of that field are not precluded. RFC 3270 makes no statement about
   how Explicit Congestion Notification (ECN) marking might be encoded
   in the MPLS header. This draft defines how an operator might define
   some of the EXP codepoints for explicit congestion notification,
   without precluding other uses.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Change History

   [Note to RFC Editor: This section to be removed before publication]

   Changes in this version (draft-ietf-tsvwg-ecn-mpls-02.txt) relative
   to the last (draft-ietf-tsvwg-ecn-mpls-01.txt):

   o  Added new text about misordering considerations in Section 6.

   o  Swapped order of Section 8 and Section 9.

   o  Explained more fully the example of congestion-based traffic
      engineering in Section 9.3.

   o  Trimmed the example of PCN in Section 9.4 and updated to latest
      preferred PCN terminology in PCN appendix.

   Changes in draft-ietf-tsvwg-ecn-mpls-01.txt relative to
   draft-ietf-tsvwg-ecn-mpls-00.txt:

   o  Moved the detailed discussion of marking procedures for
      Pre-Congestion Notification (PCN) to an appendix.

   o  Removed PCN as a motivation for the efficient code-point usage in
      Section 2.

   o  Clarified the rationale for preferring the ECT-checking approach
      over the approach of [Floyd] in Section 8.1.

   o  Updated discussion of relationship to RFC 3168 in Section 7.

   o  Removed discussion of re-ECN from Security Considerations.

   o  Fixed typos and nits.

   Changes in draft-ietf-tsvwg-ecn-mpls-00.txt relative to
   draft-davie-ecn-mpls-00:

   o  Corrected the description of ECN-MPLS marking proposed in
      [Shayman], which closely corresponds to that proposed in this
      document.

   o  Pre-congestion notification (PCN) marking is now described in a
      way that does not require normative references to PCN
      specifications. PCN discussion now serves only to illustrate how
      the ECN marking concepts can be extended to cover more complex
      scenarios, with PCN being an example.

   o  Added specification of behavior when MPLS encapsulated packets
      cross from an ECN-enabled domain to a domain that is not
      ECN-enabled.

   o  Clarified that copying MPLS ECN or PCN marking into exposed IP
      header on egress is not mandatory.

   o  Fixed typos and nits.

Table of Contents

   1.  Introduction
     1.1.  Background
     1.2.  Intent
     1.3.  Terminology
   2.  Use of MPLS EXP Field for ECN
   3.  Per-domain ECT checking
   4.  ECN-enabled MPLS domain
     4.1.  Pushing (adding) one or more labels to an IP packet
     4.2.  Pushing one or more labels onto an MPLS labelled packet
     4.3.  Congestion experienced in an interior MPLS node
     4.4.  Crossing a Diffserv Domain Boundary
     4.5.  Popping an MPLS label (not the end of the stack)
     4.6.  Popping the last MPLS label in the stack
     4.7.  Diffserv Tunneling Models
   5.  ECN-disabled MPLS domain
   6.  The use of more codepoints with E-LSPs and L-LSPs
   7.  Relationship to tunnel behavior in RFC 3168
   8.  Deployment Considerations
     8.1.  Marking non-ECN Capable Packets
     8.2.  Non-ECN capable routers in an MPLS Domain
   9.  Example Uses
     9.1.  RFC3168-style ECN
     9.2.  ECN Co-existence with Diffserv E-LSPs
     9.3.  Congestion-feedback-based Traffic Engineering
     9.4.  PCN flow admission control and flow termination
   10. IANA Considerations
   11. Security Considerations
   12. Acknowledgments
   Appendix A.  Extension to Pre-Congestion Notification
     Appendix A.1.  Label Push onto IP packet
     Appendix A.2.  Pushing Additional MPLS Labels
     Appendix A.3.  Admission Control or Flow Termination Marking
                    inside MPLS domain
     Appendix A.4.  Popping an MPLS Label (not end of stack)
     Appendix A.5.  Popping the last MPLS Label to expose IP header
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

1.1.  Background

   [RFC3168] defines Explicit Congestion Notification for IP.
   The primary purpose of ECN is to allow congestion to be signalled
   without dropping packets.

   [RFC3270] defines how to support the Diffserv architecture in MPLS
   networks, including how to encode Diffserv Code Points (DSCPs) in an
   MPLS header. DSCPs may be encoded in the EXP field, while other uses
   of that field are not precluded. RFC 3270 makes no statement about
   how Explicit Congestion Notification (ECN) marking might be encoded
   in the MPLS header.

   This draft defines how an operator might define some of the EXP
   codepoints for explicit congestion notification, without precluding
   other uses. In parallel to the activity defining the addition of ECN
   to IP [RFC3168], two proposals were made to add ECN to MPLS
   [Floyd][Shayman]. These proposals, however, fell by the wayside.
   With ECN for IP now being a proposed standard, and developing
   interest in using pre-congestion notification (PCN) for admission
   control and flow termination [I-D.ietf-pcn-architecture], there is
   consequent interest in being able to support ECN across IP networks
   consisting of MPLS-enabled domains. Therefore it is necessary to
   specify the protocol for including ECN in the MPLS shim header, and
   the protocol behavior of edge MPLS nodes.

   We note that in [RFC3168] there are four codepoints used for ECN
   marking, which are encoded using two bits of the IP header. The MPLS
   EXP field is the logical place to encode ECN codepoints, but with
   only 3 bits (8 codepoints) available, and with the same field being
   used to convey DSCP information as well, there is a clear incentive
   to conserve the number of codepoints consumed for ECN purposes.
   Efficient use of the EXP field has been a focus of prior drafts
   [Floyd] [Shayman] and we draw on those efforts in this draft as well.
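
   As a rough illustration of the field sizes discussed above, the
   following sketch extracts the 3-bit EXP field from a 32-bit label
   stack entry, assuming the [RFC3032] layout (20-bit label, 3-bit EXP,
   1-bit bottom-of-stack flag, 8-bit TTL), and the 2-bit ECN field from
   the low-order bits of the former IPv4 TOS octet as in [RFC3168]. The
   function and table names are hypothetical and are not defined by
   this draft.

```python
# Illustrative sketch only; names are assumptions, not part of the draft.

# The four IP ECN codepoints of RFC 3168 (two bits).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def mpls_exp(entry: int) -> int:
    """Return the 3-bit EXP field of a 32-bit MPLS label stack entry.

    Assumed layout (RFC 3032): label in bits 31..12, EXP in bits 11..9,
    bottom-of-stack flag in bit 8, TTL in bits 7..0.
    """
    return (entry >> 9) & 0x7


def ip_ecn(tos_octet: int) -> int:
    """Return the 2-bit ECN field from the former IPv4 TOS octet."""
    return tos_octet & 0x3
```

   With only eight EXP values available, every codepoint spent on
   congestion marking is one fewer available for Diffserv, which is the
   incentive for economy noted above.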

   We also note that [RFC3168] defines default usage of the ECN field
   but allows for the possibility that some Diffserv PHBs might include
   different specifications on how the ECN field is to be used. This
   draft seeks to preserve that capability.

1.2.  Intent

   Our intent is to specify how the MPLS shim header [RFC3032] should
   denote ECN marking and how MPLS nodes should understand whether the
   transport for a packet will be ECN capable. We offer this as a
   building block, from which to build different congestion notification
   systems. We do not intend to specify how the resulting congestion
   notification is fed back to an upstream node that can mitigate
   congestion. For instance, unlike [Shayman], we do not specify
   edge-to-edge MPLS domain feedback, but we also do not preclude it.
   Nonetheless, we do specify how the egress node of an MPLS domain
   should copy congestion notification from the MPLS shim into the
   encapsulated IP header if the ECN is to be carried onward towards the
   IP receiver. But we do NOT mandate that MPLS congestion notification
   must be copied into the IP header for onward transmission. This
   draft aims to be generic for any use of congestion notification in
   MPLS. Support of [RFC3168] is our primary motivation; some
   additional potential applications to illustrate the flexibility of
   our approach are described in Section 9. In particular, we aim to
   support possible future schemes that may use more than one level of
   congestion marking.

1.3.  Terminology

   This document draws freely on the terminology of ECN [RFC3168] and
   MPLS [RFC3031]. For ease of reference, we have included some
   definitions here, but refer the reader to the references above for
   complete specifications of the relevant technologies:

   o  CE: Congestion Experienced. One of the states with which a packet
      may be marked in a network supporting ECN.
      A packet is marked in this state by an ECN-capable router, to
      indicate that this router was experiencing congestion at the time
      the packet arrived.

   o  ECT: ECN-capable Transport. One of the ECN states which a packet
      may be in when it is sent by an end system. An end system marks a
      packet with an ECT codepoint to indicate that the end-points of
      the transport protocol are ECN-capable. A router is not allowed
      to mark a packet as CE unless the packet was marked ECT when it
      arrived.

   o  Not-ECT: Not ECN-capable transport. An end system marks a packet
      with this codepoint to indicate that the end-points of the
      transport protocol are not ECN-capable. A congested router cannot
      mark such packets as CE, and thus can only drop them to indicate
      congestion.

   o  EXP field. A 3-bit field in the MPLS label header [RFC3032] which
      may be used to convey Diffserv information (and is also used in
      this draft to carry ECN information).

   o  PHP. Penultimate Hop Popping. An MPLS operation in which the
      penultimate Label Switching Router (LSR) on a Label Switched Path
      (LSP) removes the top label from the packet before forwarding the
      packet to the final LSR on the LSP.

2.  Use of MPLS EXP Field for ECN

   We propose that LSRs configured for explicit congestion notification
   should use the EXP field in the MPLS shim header. However, [RFC3270]
   already defines use of codepoints in the EXP field for differentiated
   services. Although it does not preclude other compatible uses of the
   EXP field, this limits the space available for ECN, given the field
   is only 3 bits (8 codepoints).

   [RFC3270] defines two possible approaches for requesting
   differentiated service treatment from an LSR.

   o  In the E-LSP approach, different codepoints of the EXP field in
      the MPLS shim header are used to indicate the packet's per hop
      behavior (PHB).

   o  In the L-LSP approach, an MPLS label is assigned for each PHB
      scheduling class (PSC, as defined in [RFC3260]), so that an LSR
      determines both its forwarding and its scheduling behavior from
      the label.

   If an MPLS domain uses the L-LSP approach, there is likely to be
   space in the EXP field for ECN codepoint(s). Where the E-LSP
   approach is used, codepoint space in the EXP field is likely to be
   scarce. This draft focuses on interworking ECN marking with the
   E-LSP approach as it is the tougher problem. The same approach can
   then also be applied with L-LSPs.

   We recommend that explicit congestion notification in MPLS should use
   codepoints instead of bits in the EXP field. Since not every PHB
   will necessarily require an associated ECN codepoint, it would be
   wasteful to assign a dedicated bit for ECN. (There may also be cases
   where a given PHB might need more than one ECN-like codepoint; see
   Section 9.4 for an example.)

   For each PHB that uses ECN marking, we assume one EXP codepoint will
   be defined meaning not congestion marked (Not-CM), and at least one
   other codepoint will be defined meaning congestion marked (CM).
   Therefore, each PHB that uses ECN marking will consume at least two
   EXP codepoints. But PHBs that do not use ECN marking will only
   consume one.

   Further, we wish to use minimal space in the MPLS shim header to tell
   interior LSRs whether each packet will be received by an ECN-capable
   transport (ECT). Nonetheless, we must ensure that an end-point that
   would not understand an ECN mark will not receive one, otherwise it
   will not be able to respond to congestion as it should. In the past,
   three solutions to this problem have been proposed:

   o  One possible approach is for congested LSRs to mark the ECN field
      in the underlying IP header at the bottom of the label stack.
      Although many commercial LSRs routinely access the IP header for
      other reasons (ECMP), there are numerous drawbacks to attempting
      to find an IP header beneath an MPLS label stack. Notably, there
      is the challenge of detecting the absence of an IP header when
      non-IP packets are carried on an LSP. Therefore we will not
      consider this approach further.

   o  In the scheme suggested by [Floyd], ECT and CE are overloaded into
      one bit, so that a 0 means ECT while a 1 might either mean Not-ECT
      or it might mean CE. A packet that has been marked as having
      experienced congestion upstream, and then is picked out for
      marking at a second congested LSR, will be dropped by the second
      LSR since it cannot determine whether the packet has previously
      experienced congestion or if ECN is not supported by the
      transport.

      While such an approach seemed potentially palatable, we do not
      recommend it here for the following reasons. In some cases we
      wish to be able to use ECN marking long before actual congestion
      (e.g. pre-congestion notification). In these circumstances,
      marking rates at each LSR might be non-negligible most of the
      time, so the chances of a previously marked packet encountering an
      LSR that wants to mark it again will also be non-negligible. In
      the case where CE and Not-ECT are indistinguishable to core
      routers, such a scenario could lead to unacceptable drop rates.
      If the typical marking rate at every router or LSR is p, and the
      typical diameter of the network of LSRs is d, then the probability
      that a marked packet will be chosen for marking more than once is

         1 - [Pr(never marked) + Pr(marked at exactly one hop)]
            = 1 - [(1-p)^d + dp(1-p)^(d-1)].

      For instance, with 6 LSRs in a row, each marking ECN with 1%
      probability, the chance of a packet that is already marked being
      chosen for marking a second time is about 0.15%.
      The bit overloading scheme would therefore introduce a drop rate
      of 0.15% unnecessarily. Given that most modern core networks are
      sized to introduce near-zero packet drop, it may be unacceptable
      to drop over one in a thousand packets unnecessarily.

   o  A third possible approach was suggested by [Shayman]. In this
      scheme, interior LSRs assume that the endpoints are ECN-capable,
      but this assumption is checked when the final label is popped. If
      an interior LSR has marked ECN in the EXP field of the shim
      header, but the IP header says the endpoints are not ECN capable,
      the edge router (or penultimate router, if using penultimate hop
      popping) drops the packet. We recommend this scheme, which we
      call `per-domain ECT checking', and define it more precisely in
      the following section. Its chief drawback is that it can cause
      packets to be forwarded after encountering congestion only to be
      dropped at the egress of the MPLS domain. The rationale for this
      decision is given in Section 8.1.

3.  Per-domain ECT checking

   For the purposes of this discussion, we define the egress nodes of an
   MPLS domain as the nodes that pop the last MPLS label from the label
   stack, exposing the IP (or, potentially non-IP) header. Note that
   such a node may be the ultimate or penultimate hop of an LSP,
   depending on whether penultimate hop popping (PHP) is employed.

   In the per-domain ECT checking approach, the egress nodes take
   responsibility for checking whether the transport is ECN capable.
   This draft does not specify how these nodes should pass on congestion
   notification, because different approaches are likely in different
   scenarios. However, if congestion notification in the MPLS header is
   copied into the IP header, the procedure MUST conform to the
   specification given here.
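
   One possible reading of the egress check, combined with the copy
   rules given in Section 4.6, is sketched below. It assumes the
   operator has chosen to copy the marking into the exposed IP header.
   The enumeration and function names are hypothetical; a non-IP
   payload with no ECT equivalent would be passed in as Not-ECT by the
   caller.

```python
# Illustrative sketch only; names are assumptions, not part of the draft.
from enum import Enum
from typing import Optional


class Ecn(Enum):
    """IP ECN codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def egress_ecn(exp_is_cm: bool, inner: Ecn) -> Optional[Ecn]:
    """ECN value to forward after popping the last label.

    Returns None when the packet has to be dropped.
    """
    if not exp_is_cm:
        # Not-CM: the inner ECN field is left unchanged.
        return inner
    if inner is Ecn.NOT_ECT:
        # Congestion marked, but the transport is not ECN-capable: drop.
        return None
    # Congestion marked and ECT(0), ECT(1) or CE: forward as CE.
    return Ecn.CE
```

   For example, egress_ecn(True, Ecn.NOT_ECT) yields None (drop), while
   egress_ecn(True, Ecn.ECT_0) yields Ecn.CE.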

   If congestion notification is passed to the transport without first
   passing it onward in the IP header, the approach used must take
   similar care to check that the transport is ECN capable before
   passing it ECN markings. Specifically, if the transport for a
   particular congestion marked MPLS packet is found not to be
   ECN-capable, the packet MUST be dropped at this egress node.

   In the per-domain ECT checking approach, only the egress nodes check
   whether an IP packet is destined for an ECN-capable transport.
   Therefore, any single LSR within an MPLS domain MUST NOT be
   configured to enable ECN marking unless all the egress LSRs
   surrounding it are already configured to handle ECN marking.

   We call a domain surrounded by ECN-capable egress LSRs an ECN-enabled
   MPLS domain. This term only implies that all the egress LSRs are
   ECN-enabled; some interior LSRs may not be ECN-enabled. For
   instance, it would be possible to use some legacy LSRs incapable of
   supporting ECN in the interior of an MPLS domain as long as all the
   egress LSRs were ECN-capable. Note that if PHP is used, the
   "penultimate hop" routers which perform the pop operation do need to
   be ECN-enabled, since they are acting in this context as egress LSRs.

4.  ECN-enabled MPLS domain

   In the following subsections we describe various operations affecting
   the ECN marking of a packet that may be performed at MPLS edge and
   core LSRs.

4.1.  Pushing (adding) one or more labels to an IP packet

   On encapsulating an IP packet with an MPLS label stack, the ECN field
   must be translated from the IP packet into the MPLS EXP field. The
   Not-CM (not congestion marked) state is set in the MPLS EXP field if
   the ECN status of the IP packet is "Not ECT" or ECT(1) or ECT(0).
   The CM state is set if the ECN status of the IP packet is "CE".
   If more than one label is pushed at one time, the same value should
   be placed in the EXP field of all label stack entries.

4.2.  Pushing one or more labels onto an MPLS labelled packet

   The EXP field is copied directly from the topmost label before the
   push to the newly added outer label. If more than one label is being
   pushed, the same EXP value is copied to all label stack entries.

4.3.  Congestion experienced in an interior MPLS node

   If the EXP codepoint of the packet maps to a PHB that uses ECN
   marking and the marking algorithm requires the packet to be marked,
   the CM state is set (irrespective of whether it is already in the CM
   state).

   If the buffer is full, a packet is dropped.

4.4.  Crossing a Diffserv Domain Boundary

   If an MPLS-encapsulated packet crosses a Diffserv domain boundary, it
   may be the case that the two domains use different encodings of the
   same PHB in the EXP field. In such cases, the EXP field must be
   rewritten at the domain boundary. If the PHB is one that supports
   ECN, then the appropriate ECN marking should also be preserved when
   the EXP field is mapped at the boundary.

   If an MPLS-encapsulated packet that is in the CM state crosses from a
   domain that is ECN-enabled (as defined in Section 3) to a domain that
   is not ECN-enabled, then it is necessary to perform the egress
   checking procedures at the egress LSR of the ECN-enabled domain.
   This means that if the encapsulated packet is not ECN capable, the
   packet MUST be dropped. Note that this implies the egress LSR must
   be able to look beneath the MPLS header without popping the label
   stack.

   The related issue of Diffserv tunnel models is discussed in
   Section 4.7.

4.5.  Popping an MPLS label (not the end of the stack)

   When a packet has more than one MPLS label in the stack and the top
   label is popped, another MPLS label is exposed.
   In this case the ECN information should be transferred from the
   outer EXP field to the inner MPLS label in the following manner. If
   the inner EXP field is Not-CM, the inner EXP field is set to the same
   CM or Not-CM state as the outer EXP field. If the inner EXP field is
   CM, it remains unchanged regardless of the outer EXP field. Note
   that an inner value of CM and an outer value of Not-CM should be
   considered anomalous, and SHOULD be logged in some way by the LSR.

4.6.  Popping the last MPLS label in the stack

   When the last MPLS label is popped from the packet, its payload is
   exposed. If that packet is not IP, and does not have any capability
   equivalent to ECT, it is assumed Not-ECT and treated as such. That
   means that if the EXP value of the MPLS header was CM, the packet
   MUST be dropped.

   Assuming an IP packet was exposed, we have to examine whether that
   packet is ECT or not. A Not-ECT packet MUST be dropped if the EXP
   field is CM.

   For the remainder of this section, we describe the behavior that is
   required if the ECN information is to be transferred from the MPLS
   header into the exposed IP header for onward transmission. As noted
   in Section 1.2, such behavior is not mandated by this document, but
   may be selected by an operator.

   If the inner IP packet is Not-ECT, its ECN field remains unchanged if
   the EXP field is Not-CM. If the ECN field of the inner packet is set
   to ECT(0), ECT(1) or CE, the ECN field remains unchanged if the EXP
   field is set to Not-CM. The ECN field is set to CE if the EXP field
   is CM. Note that an inner value of CE and an outer value of Not-CM
   should be considered anomalous, and SHOULD be logged in some way by
   the LSR.

4.7.  Diffserv Tunneling Models

   [RFC3270] describes three tunneling models for Diffserv support
   across MPLS Domains, referred to as the uniform, short pipe, and pipe
   models.
   The differences between these models lie in whether the Diffserv
   treatment that applies to a packet while it travels along a
   particular LSP is carried to the last hop of the LSP and beyond the
   last hop. Depending on which model is preferred by an operator, the
   EXP value or DSCP value of an exposed header following a label pop
   may or may not be dependent on the EXP value of the label that is
   removed by the pop operation. We believe that in the case of ECN
   marking, the use of these models should only apply to the encoding of
   the Diffserv PHB in the EXP value, and that the choice of codepoint
   for ECN should always be made based on the procedures described
   above, independent of the tunneling model.

5.  ECN-disabled MPLS domain

   If ECN is not enabled on all the egress LSRs of a domain, ECN MUST
   NOT be enabled on any LSRs throughout the domain. If congestion is
   experienced on any LSR in an ECN-disabled MPLS domain, packets MUST
   be dropped, NOT marked. The exact algorithm for deciding when to
   drop packets during congestion (e.g. tail-drop, RED, etc.) is a local
   matter for the operator of the domain.

6.  The use of more codepoints with E-LSPs and L-LSPs

   [RFC3270] gives different options with E-LSPs and L-LSPs and some of
   those could potentially provide ample EXP codepoints for ECN.
   However, deploying L-LSPs vs E-LSPs has many implications such as
   platform support and operational complexity. The above ECN MPLS
   solution should provide some flexibility. If the operator has
   deployed one L-LSP per PHB scheduling class, then EXP space will be a
   non-issue and it could be used to achieve more sophisticated ECN
   behavior if required. If the operator wants to stick to E-LSPs and
   uses a handful of EXP codepoints for Diffserv, it may be desirable to
   operate with a minimum number of extra ECN codepoints, even if this
   comes with some compromise on ECN optimality.
   See Section 9 for discussion of some possible deployment scenarios.

   We note that in a network where L-LSPs are used, ECN marking SHOULD
   NOT cause packets from the same microflow but with different ECN
   markings to be sent on different LSPs. As discussed in [RFC3270],
   packets of a single microflow should always travel on the same LSP to
   avoid possible misordering. Thus, ECN marking of packets on L-LSPs
   SHOULD only affect the EXP value of the packets.

7.  Relationship to tunnel behavior in RFC 3168

   [RFC3168] defines two modes of encapsulating ECN-marked IP packets
   inside additional IP headers when tunnels are used. The two modes
   are the "full functionality" and "limited functionality" modes. In
   the full functionality mode, the ECT information from the inner
   header is copied to the outer header at the tunnel ingress, but the
   CE information is not. In the limited functionality mode, neither
   ECT nor CE information is copied to the outer header, and thus ECN
   cannot be applied to the encapsulated packet.

   The behavior that is specified in Section 4 of this document
   resembles the "full functionality" mode in the sense that it conveys
   some information from inner to outer header, and in the sense that it
   enables full ECN support along the MPLS LSP (which is analogous to an
   IP tunnel in this context). However it differs in one respect, which
   is that the CE information is conveyed from the inner header to the
   outer header. Our original reason for this different design choice
   was to give interior routers and LSRs more information about upstream
   marking in multi-bottleneck cases. For instance, the flow
   termination marking mechanism proposed for PCN works by only
   considering packets for marking that have not already been marked
   upstream.
   Unless existing flow termination marking is copied from the inner to
   the outer header at tunnel ingress, the mechanism doesn't terminate
   enough traffic in cases where anomalous events hit multiple domains
   at once. [RFC3168] does not give any reasons against conveying CE
   information from the inner header to the outer in the "full
   functionality" mode. Furthermore, [RFC4301] specifies that the ECN
   marking should be copied from inner header to outer header in IPsec
   tunnels, consistent with the approach defined here.
   [I-D.briscoe-tsvwg-ecn-tunnel] discusses this issue in more detail.
   In summary, the approach described in Section 4 appears to be both a
   sound technical choice and consistent with the current state of
   thinking in the IETF.

8.  Deployment Considerations

8.1.  Marking non-ECN Capable Packets

   What are the consequences of marking a packet that is not
   ECN-capable? Even if it will be dropped before leaving the domain,
   doesn't this consume resources unnecessarily?

   The problem only arises if there is congestion downstream of an
   earlier congested queue in the same MPLS domain. Downstream
   congested LSRs might forward packets already marked, even though they
   will be dropped later when the inner IP header is found to be Not-ECT
   on decapsulation. Such packets might cause the downstream LSRs to
   mark (or drop) other packets that they would otherwise not have had
   to.

   We expect congestion will typically be rare in MPLS networks, but it
   might not be. The extra unnecessary load at downstream LSRs will not
   be more than the fraction of marked packets from upstream LSRs, even
   in the worst case where no transports are ECN capable.
Therefore the 586 amount of unnecessary marking (or drop) on an LSR will not be more 587 than the product of its local marking rate and the marking rate due 588 to upstream LSRs within the same domain - typically the product of 589 two small (often zero) probabilities. 591 This is why we decided to use the per-domain ECT checking approach - 592 because the most likely effect would be a very slightly increased 593 marking rate, which would result in very slightly higher drop only 594 for non-ECN-capable transports. We chose not to use the [Floyd] 595 alternative which introduced a low but persistent level of 596 unnecessary packet drop for all time, even for ECN-capable 597 transports. Although that scheme did not carry traffic to the edge 598 of the MPLS domain only to be dropped on decapsulation, we felt our 599 minor inefficiency was a small price to pay. And it would get 600 smaller still if ECN deployment widened. 602 A partial solution would be to preferentially drop packets arriving 603 at a congested router that were already marked. There is no solution 604 to the problem of marking a packet when congestion is caused by 605 another packet that should have been dropped. However, the chance of 606 such an occurrence is very low and the consequences are not 607 significant. It merely causes an application to very occasionally 608 slow down its rate when it did not have to. 610 8.2. Non-ECN capable routers in an MPLS Domain 612 What if an MPLS domain wants to use ECN, but not all legacy routers 613 are able to support it? 615 If the legacy router(s) are used in the interior, this is not a 616 problem. They will simply have to drop the packets if they are 617 congested, rather than mark them, which is the standard behavior for 618 IP routers that are not ECN-enabled. 620 If the legacy router were used as an egress router, it would not be 621 able to check the ECN capability of the transport correctly. 
An 622 operator in this position would not be able to use this solution and 623 therefore MUST NOT enable ECN unless all egress routers are ECN- 624 capable. 626 9. Example Uses 628 9.1. RFC3168-style ECN 630 [RFC3168] proposes the use of ECN in TCP and introduces the use of 631 the ECN-Echo and CWR flags in the TCP header for initialization. When 632 the TCP sender receives an ECN-Echo (ECE) ACK packet (that is, an 633 ACK packet with the ECN-Echo flag set in the TCP header), it knows 634 that congestion was encountered in the network on the path from the 635 sender to the receiver, and it responds accordingly (for example, by 636 not increasing the congestion window). 638 It would be possible to enable ECN in an MPLS domain for Diffserv 639 PHBs like AF and best effort that are expected to be used by TCP and 640 similar transports (e.g. DCCP [RFC4340]). Then end-to-end 641 congestion control in transports capable of understanding ECN would 642 be able to respond to approaching congestion on LSRs without having 643 to rely on packet discard to signal congestion. 645 9.2. ECN Co-existence with Diffserv E-LSPs 647 Many operators today have deployed Diffserv using the E-LSP approach 648 of [RFC3270]. In many cases the number of PHBs used is fewer than 8, 649 and hence there remain available codepoints in the EXP space. If an 650 operator wishes to support ECN for a single PHB, this can be 651 accomplished by simply allocating a second codepoint to the PHB for 652 the "CM" state of that PHB and retaining the old codepoint for the 653 "not-CM" state. An operator with only four deployed PHBs could of 654 course enable ECN marking on all those PHBs. It is easy to imagine 655 cases where some PHBs might benefit more from ECN than others - for 656 example, an operator might use ECN on a premium data service but not 657 on a PHB used for best effort internet traffic.
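The codepoint allocation described above can be sketched in a few lines of Python. This is a minimal illustration and not part of this specification: the PHB names, the EXP values, the table layout, and the helper names cm_codepoint and on_congestion are all hypothetical, and the choice to drop packets whose PHB has no "CM" codepoint is an assumption reflecting ordinary non-ECN behavior.

```python
# Hypothetical E-LSP table: EXP value -> (PHB name, congestion-marked?).
# PHB names and EXP values are illustrative assumptions, not allocations.
EXP_TABLE = {
    0b000: ("best-effort", False),
    0b010: ("premium-data", False),  # "not-CM" state (the old codepoint)
    0b011: ("premium-data", True),   # "CM" state (the second codepoint)
    0b101: ("voice", False),
}


def cm_codepoint(exp):
    """Return the CM codepoint for the PHB of `exp`, or None if it has none."""
    phb, _ = EXP_TABLE[exp]
    for value, (name, marked) in EXP_TABLE.items():
        if name == phb and marked:
            return value
    return None


def on_congestion(exp):
    """Return (new_exp, drop) for a packet seen by a congested LSR.

    Assumption: a PHB without a CM codepoint falls back to packet drop.
    """
    target = cm_codepoint(exp)
    if target is None:
        return exp, True       # PHB is not ECN-enabled: drop instead of mark
    return target, False       # set (or keep) the CM state
```

In this sketch only the "premium-data" PHB consumes a second EXP value, so three PHBs occupy four of the eight codepoints and the remainder stay free for other uses.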
659 As an illustrative example of how the EXP field might be used in this 660 case, consider the example of an operator who is using the aggregated 661 service classes proposed in [I-D.ietf-tsvwg-diffserv-class-aggr]. He 662 may choose to support ECN only for the Assured Elastic Treatment 663 Aggregate, using the EXP codepoint 010 for the not-CM state and 011 664 for the CM state. All other codepoints could be the same as in 665 [I-D.ietf-tsvwg-diffserv-class-aggr]. Of course any other 666 combination of EXP values can be used according to the specific set 667 of PHBs and marking conventions used within that operator's network. 669 9.3. Congestion-feedback-based Traffic Engineering 671 Shayman's traffic engineering [Shayman] presents another example 672 application of ECN feedback in an MPLS domain. Shayman proposed the 673 use of ECN by an egress LSR feeding back congestion to an ingress LSR 674 to mitigate congestion by employing dynamic traffic engineering 675 techniques such as shifting flows to an alternate path. It proposed 676 a new RSVP message which was sent by the egress LSR to the ingress 677 LSR (and ignored by transit LSRs) to indicate congestion along the 678 path. Thus, rather than providing the same style of congestion 679 notification to endpoints as defined in [RFC3168], [Shayman] limits 680 its scope to the MPLS domain only. This application of ECN in an 681 MPLS domain could make use of the ECN encoding in the MPLS header 682 that is defined in this document. 684 9.4. PCN flow admission control and flow termination 686 [I-D.ietf-pcn-architecture] proposes using pre-congestion 687 notification (PCN) on routers within an edge-to-edge Diffserv region 688 to control admission of new flows to the region and, if necessary, to 689 terminate existing flows in response to disasters and other anomalous 690 routing events. 
In this approach, the current level of PCN marking 691 is picked up by the signalling used to initiate each flow in order to 692 inform the admission control decision for the whole region at once. 693 For example, extensions to RSVP [I-D.lefaucheur-rsvp-ecn] and NSIS 694 [I-D.ietf-nsis-rmd], [I-D.arumaithurai-nsis-pcn] have been proposed. 696 If LSRs are able to mark packets to signify congestion in MPLS, PCN 697 marking could be used for admission control and flow termination 698 across a Diffserv region, irrespective of whether it contained pure 699 IP routers, MPLS LSRs, or both. Indeed, the solution could be 700 somewhat more efficient to implement if aggregates could identify 701 themselves by their MPLS label. Appendix A describes the mechanisms 702 by which the necessary markings for PCN could be carried in the MPLS 703 header. 705 10. IANA Considerations 707 This document makes no request of IANA. 709 Note to RFC Editor: this section may be removed on publication as an 710 RFC. 712 11. Security Considerations 714 We believe no new vulnerabilities are introduced by this draft. 716 We have considered whether malicious sources might be able to exploit 717 the fact that interior LSRs will mark packets that are Not-ECT, 718 relying on their egress LSR to drop them. Although this might allow 719 sources to engineer a situation where more traffic is carried across 720 an MPLS domain than should be, we figured that even if we hadn't 721 introduced this feature, these sources would have been able to 722 prevent these LSRs dropping this traffic anyway, simply by setting 723 ECT in the first place. 725 An ECN sender can use the ECN nonce [RFC3540] to detect a misbehaving 726 receiver. The ECN nonce works correctly across an MPLS domain 727 without requiring any specific support from the proposal in this 728 draft. The nonce does not need to be present in the MPLS shim 729 header. 
As long as the nonce is present in the IP header when the 730 ECN information is copied from the last MPLS shim header, it will be 731 overwritten if congestion has been experienced by an LSR. This is 732 all that is necessary for the sender to detect a misbehaving 733 receiver. 735 12. Acknowledgments 737 Thanks to K.K. Ramakrishnan and Sally Floyd for getting us thinking 738 about this in the first place and for providing advice on tunneling 739 of ECN packets, and to Sally Floyd, Joe Babiarz, Ben Niven-Jenkins, 740 Phil Eardley, Ruediger Geib, and Magnus Westerlund for their comments 741 on the draft. 743 Appendix A. Extension to Pre-Congestion Notification 745 This appendix describes how the mechanisms described in the body of 746 the document can be extended to support PCN 747 [I-D.ietf-pcn-architecture]. Our intent here is to show that the 748 mechanisms are readily extended to more complex scenarios than ECN, 749 particularly in the case where more codepoints are needed, but this 750 appendix may be safely ignored if one is interested only in 751 supporting ECN. Note that the PCN standards are still very much 752 under development at the time of writing, hence the precise details 753 contained in this appendix may be subject to change, and we stress 754 that this appendix is for illustrative purposes only. 756 The relevant aspects of PCN for the purposes of this discussion are: 758 o PCN uses 3 states rather than the 2 used for ECN - these are referred to as 759 admission marked (AM), termination marked (TM) and not marked (NM) 760 states. (See Section 9.4 for further discussion of PCN and the 761 possibility of using fewer codepoints.) 763 o A packet can go from NM to AM, from NM to TM, or from AM to TM, 764 but no other transition is possible. 766 o The determination of whether a packet is subject to PCN is based 767 on the PHB of the packet.
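The permitted transitions listed above amount to an ordering on the three states: a marking may be raised but never lowered. The following Python fragment is a sketch for illustration only. The names Pcn, may_transition and remark are hypothetical, the integer values merely express the ordering, and no particular EXP encoding is implied.

```python
from enum import IntEnum


class Pcn(IntEnum):
    # Integer values only express the ordering NM < AM < TM (an assumption
    # made for this sketch); they are not EXP codepoints.
    NM = 0  # not marked
    AM = 1  # admission marked
    TM = 2  # termination marked


def may_transition(old, new):
    """True only for NM->AM, NM->TM and AM->TM (a strict increase)."""
    return new > old


def remark(old, requested):
    """Apply a marking request, ignoring any that would lower the state."""
    return requested if may_transition(old, requested) else old
```

Modelling the states this way means a node that wants to mark a packet can simply request its own marking; a packet that already carries a more severe marking keeps it.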
769 Thus, to support PCN fully in an MPLS domain for a particular PHB, a 770 total of 3 codepoints need to be allocated for that PHB. These 3 771 codepoints represent the admission marked (AM), termination marked 772 (TM) and not marked (NM) states. The procedures described in 773 Section 4 above need to be slightly modified to support this 774 scenario. The following procedures are invoked when the topmost DSCP 775 or EXP value indicates a PHB that supports PCN. 777 Appendix A.1. Label Push onto IP packet 779 If the IP packet header indicates AM, set the EXP value of all 780 entries in the label stack to AM. If the IP packet header indicates 781 TM, set the EXP value of all entries in the label stack to TM. For 782 any other marking of the IP header, set the EXP value of all entries 783 in the label stack to NM. 785 Appendix A.2. Pushing Additional MPLS Labels 787 The procedures of Section 4.2 apply. 789 Appendix A.3. Admission Control or Flow Termination Marking inside MPLS 790 domain 792 The EXP value can be set to AM or TM according to the same procedures 793 as described in [I-D.briscoe-tsvwg-cl-phb]. For the purposes of this 794 document, it does not matter exactly what algorithms are used to 795 decide when to set AM or TM; all that matters is that if a router 796 would have marked AM (or TM) in the IP header, it should set the EXP 797 value in the MPLS header to the AM (or TM) codepoint. 799 Appendix A.4. Popping an MPLS Label (not end of stack) 801 When popping an MPLS Label exposes another MPLS label, the AM or TM 802 marking should be transferred to the exposed EXP field in the 803 following manner: 805 o If the inner EXP value is NM, then it should be set to the same 806 marking state as the EXP value of the popped label stack entry. 808 o If the inner EXP value is AM, it should be unchanged if the popped 809 EXP value was AM, and it should be set to TM if the popped EXP 810 value was TM. 
If the popped EXP value was NM, this should be 811 logged in some way and the inner EXP value should be unchanged. 813 o If the inner EXP value is TM, it should be unchanged whatever the 814 popped EXP value was, but any EXP value other than TM should be 815 logged. 817 Appendix A.5. Popping the last MPLS Label to expose IP header 819 When popping the last MPLS Label exposes the IP header, there are two 820 cases to consider: 822 o the popping LSR is NOT the egress router of the PCN region, in 823 which case AM or TM marking should be transferred to the exposed 824 IP header field; or 826 o the popping LSR IS the egress router of the PCN region. 828 In the latter case, the behavior of the egress LSR is defined in 829 [I-D.ietf-pcn-architecture] and is beyond the scope of this document. 830 In the former case, the marking should be transferred from the popped 831 MPLS header to the exposed IP header as follows: 833 o If the inner IP header value is neither AM nor TM, and the EXP 834 value was NM, then the IP header should be unchanged. For any 835 other EXP value, the IP header should be set to the same marking 836 state as the EXP value of the popped label stack entry. 838 o If the inner IP header value is AM, it should be unchanged if the 839 popped EXP value was AM, and it should be set to TM if the popped 840 EXP value was TM. If the popped EXP value was NM, this should be 841 logged in some way and the inner IP header value should be 842 unchanged. 844 o If the IP header value is TM, it should be unchanged whatever the 845 popped EXP value was, but any EXP value other than TM should be 846 logged. 848 13. References 850 13.1. Normative References 852 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 853 Requirement Levels", BCP 14, RFC 2119, March 1997. 855 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 856 Label Switching Architecture", RFC 3031, January 2001. 
858 [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., 859 Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack 860 Encoding", RFC 3032, January 2001. 862 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 863 of Explicit Congestion Notification (ECN) to IP", 864 RFC 3168, September 2001. 866 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, 867 P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi- 868 Protocol Label Switching (MPLS) Support of Differentiated 869 Services", RFC 3270, May 2002. 871 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 872 Internet Protocol", RFC 4301, December 2005. 874 13.2. Informative References 876 [Floyd] "A Proposal to Incorporate ECN in MPLS", 1999. 878 Work in progress. http://www.icir.org/floyd/papers/ 879 draft-ietf-mpls-ecn-00.txt 881 [I-D.arumaithurai-nsis-pcn] 882 Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service 883 Model for Pre-Congestion Notification (PCN)", 884 draft-arumaithurai-nsis-pcn-00 (work in progress), 885 September 2007. 887 [I-D.briscoe-tsvwg-cl-phb] 888 Briscoe, B., "Pre-Congestion Notification marking", 889 draft-briscoe-tsvwg-cl-phb-03 (work in progress), 890 October 2006. 892 [I-D.briscoe-tsvwg-ecn-tunnel] 893 Briscoe, B., "Layered Encapsulation of Congestion 894 Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in 895 progress), July 2007. 897 [I-D.ietf-nsis-rmd] 898 Bader, A., "RMD-QOSM - The Resource Management in Diffserv 899 QOS Model", draft-ietf-nsis-rmd-11 (work in progress), 900 August 2007. 902 [I-D.ietf-pcn-architecture] 903 Eardley, P., "Pre-Congestion Notification Architecture", 904 draft-ietf-pcn-architecture-00 (work in progress), 905 August 2007. 907 [I-D.ietf-tsvwg-diffserv-class-aggr] 908 Chan, K., "Aggregation of DiffServ Service Classes", 909 draft-ietf-tsvwg-diffserv-class-aggr-04 (work in 910 progress), August 2007. 
912 [I-D.lefaucheur-rsvp-ecn] 913 Faucheur, F., "RSVP Extensions for Admission Control over 914 Diffserv using Pre-congestion Notification (PCN)", 915 draft-lefaucheur-rsvp-ecn-01 (work in progress), 916 June 2006. 918 [RFC3260] Grossman, D., "New Terminology and Clarifications for 919 Diffserv", RFC 3260, April 2002. 921 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 922 Congestion Notification (ECN) Signaling with Nonces", 923 RFC 3540, June 2003. 925 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 926 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 928 [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain", 929 2000. 931 Work in progress. http://www.ee.umd.edu/~shayman/papers.d/ 932 draft-shayman-mpls-ecn-00.txt 934 Authors' Addresses 936 Bruce Davie 937 Cisco Systems, Inc. 938 1414 Mass. Ave. 939 Boxborough, MA 01719 940 USA 942 Email: bsd@cisco.com 944 Bob Briscoe 945 BT Research 946 B54/77, Sirius House 947 Adastral Park 948 Martlesham Heath 949 Ipswich 950 Suffolk IP5 3RE 951 United Kingdom 953 Email: bob.briscoe@bt.com 954 June Tay 955 BT Research 956 B54/77, Sirius House 957 Adastral Park 958 Martlesham Heath 959 Ipswich 960 Suffolk IP5 3RE 961 United Kingdom 963 Email: june.tay@bt.com 965 Full Copyright Statement 967 Copyright (C) The IETF Trust (2007). 969 This document is subject to the rights, licenses and restrictions 970 contained in BCP 78, and except as set forth therein, the authors 971 retain all their rights. 
973 This document and the information contained herein are provided on an 974 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 975 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 976 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 977 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 978 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 979 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 981 Intellectual Property 983 The IETF takes no position regarding the validity or scope of any 984 Intellectual Property Rights or other rights that might be claimed to 985 pertain to the implementation or use of the technology described in 986 this document or the extent to which any license under such rights 987 might or might not be available; nor does it represent that it has 988 made any independent effort to identify any such rights. Information 989 on the procedures with respect to rights in RFC documents can be 990 found in BCP 78 and BCP 79. 992 Copies of IPR disclosures made to the IETF Secretariat and any 993 assurances of licenses to be made available, or the result of an 994 attempt made to obtain a general license or permission for the use of 995 such proprietary rights by implementers or users of this 996 specification can be obtained from the IETF on-line IPR repository at 997 http://www.ietf.org/ipr. 999 The IETF invites any interested party to bring to its attention any 1000 copyrights, patents or patent applications, or other proprietary 1001 rights that may cover technology that may be required to implement 1002 this standard. Please address the information to the IETF at 1003 ietf-ipr@ietf.org. 1005 Acknowledgments 1007 Funding for the RFC Editor function is provided by the IETF 1008 Administrative Support Activity (IASA). 
This document was produced 1009 using xml2rfc v1.32 (of http://xml.resource.org/) from a source in 1010 RFC-2629 XML format.