Network Working Group                                           B. Davie
Internet-Draft                                       Cisco Systems, Inc.
Expires: December 20, 2006                                    B. Briscoe
                                                                  J.
                                                                     Tay
                                                             BT Research
                                                           June 18, 2006


                  Explicit Congestion Marking in MPLS
                      draft-davie-ecn-mpls-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 20, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   RFC 3270 defines how to support the Diffserv architecture in MPLS
   networks, including how to encode Diffserv Code Points (DSCPs) in an
   MPLS header.  DSCPs may be encoded in the EXP field, while other uses
   of that field are not precluded.  RFC 3270 makes no statement about
   how Explicit Congestion Notification (ECN) marking might be encoded
   in the MPLS header.  This draft defines how an operator might define
   some of the EXP codepoints for explicit congestion notification,
   without precluding other uses.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction
     1.1.  Background
     1.2.  Intent
     1.3.  Terminology
   2.  Use of MPLS EXP Field for ECN
   3.  Per-domain ECT checking
   4.  ECN-enabled MPLS domain
     4.1.  Pushing (adding) one or more labels to an IP packet
     4.2.  Pushing one or more labels onto an MPLS labelled packet
     4.3.  Congestion experienced in an interior MPLS node
     4.4.  Crossing a Diffserv Domain Boundary
     4.5.  Popping an MPLS label (not the end of the stack)
     4.6.  Popping the last MPLS label in the stack
     4.7.  Diffserv Tunneling Models
     4.8.  Extension to Pre-Congestion Notification
       4.8.1.  Label Push onto IP packet
       4.8.2.  Pushing Additional MPLS Labels
       4.8.3.  Admission Control or Pre-emption Marking inside
               MPLS domain
       4.8.4.  Popping an MPLS Label (not end of stack)
       4.8.5.  Popping the last MPLS Label to expose IP header
   5.  ECN-disabled MPLS domain
   6.  The use of more codepoints with E-LSPs and L-LSPs
   7.  Relationship to tunnel behavior in RFC 3168
     7.1.  Alternative approach to support ECN in an MPLS domain
   8.  Example Uses
     8.1.  RFC3168-style ECN
     8.2.  ECN Co-existence with Diffserv E-LSPs
     8.3.  Congestion-feedback-based Traffic Engineering
     8.4.  PCN flow admission control and flow pre-emption
   9.  Deployment Considerations
     9.1.  Marking non-ECN Capable Packets
     9.2.  Non-ECN capable routers in an MPLS Domain
   10. IANA Considerations
   11. Security Considerations
   12. Acknowledgements
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

1.1.  Background

   [RFC3270] defines how to support the Diffserv architecture in MPLS
   networks, including how to encode Diffserv Code Points (DSCPs) in an
   MPLS header.  DSCPs may be encoded in the EXP field, while other uses
   of that field are not precluded.  RFC 3270 makes no statement about
   how Explicit Congestion Notification (ECN) marking might be encoded
   in the MPLS header.  This draft defines how an operator might define
   some of the EXP codepoints for explicit congestion notification,
   without precluding other uses.  In parallel to the activity defining
   the addition of ECN to IP [RFC3168], two proposals were made to add
   ECN to MPLS [Floyd][Shayman].  These proposals, however, fell by the
   wayside.  With ECN for IP now being a proposed standard, and
   developing interest in using pre-congestion notification (PCN) for
   admission control and flow pre-emption
   [I-D.briscoe-tsvwg-cl-architecture], there is consequent interest in
   being able to support ECN across IP networks consisting of
   MPLS-enabled domains.
   Therefore it is necessary to specify the protocol for including ECN
   or PCN in the MPLS shim header, and the protocol behaviour of edge
   MPLS nodes.

   We note that in [RFC3168] there are four codepoints used for ECN
   marking, which are encoded using two bits of the IP header.  The MPLS
   EXP field is the logical place to encode ECN codepoints, but with
   only 3 bits (8 codepoints) available, and with the same field being
   used to convey DSCP information as well, there is a clear incentive
   to conserve the number of codepoints consumed for ECN purposes.
   Efficient use of the EXP field has been a focus of prior drafts
   [Floyd] [Shayman] and we draw on those efforts in this draft as well.

1.2.  Intent

   Our intent is to specify how the MPLS shim header [RFC3032] should
   denote ECN marking and how MPLS nodes should understand whether the
   transport for a packet will be ECN capable.  We offer this as a
   building block, from which to build different congestion notification
   systems.  We do not intend to specify how the resulting congestion
   notification is fed back to an upstream node that can mitigate
   congestion.  For instance, unlike [Shayman], we do not specify
   edge-to-edge MPLS domain feedback, but we also do not preclude it.
   Nonetheless, we do specify how the egress node of an MPLS domain
   should copy congestion notification from the MPLS shim into the
   underlying IP header if the ECN is to be carried onward towards the
   IP receiver.  But we do NOT mandate that MPLS congestion notification
   must be copied into the IP header for onward transmission.  This
   draft aims to be generic for any use of congestion notification in
   MPLS.  PCN or traffic engineering are merely two of many motivating
   applications (see Section 8).

1.3.  Terminology

   This document draws freely on the terminology of ECN [RFC3168] and
   MPLS [RFC3031].
   For ease of reference, we have included some
   definitions here, but refer the reader to the references above for
   complete specifications of the relevant technologies:

   o  CE: Congestion Experienced.  One of the states with which a packet
      may be marked in a network supporting ECN.  A packet is marked in
      this state by an ECN-capable router, to indicate that this router
      was experiencing congestion at the time the packet arrived.

   o  ECT: ECN-capable Transport.  One of the ECN states which a packet
      may be in when it is sent by an end system.  An end system marks a
      packet with an ECT codepoint to indicate that the end-points of
      the transport protocol are ECN-capable.  A router may not mark a
      packet as CE unless the packet was marked ECT when it arrived.

   o  Not-ECT: Not ECN capable transport.  An end system marks a packet
      with this codepoint to indicate that the end-points of the
      transport protocol are not ECN-capable.  A congested router cannot
      mark such packets as CE, and thus can only drop them to indicate
      congestion.

   o  EXP field.  A 3 bit field in the MPLS label header [RFC3032] which
      may be used to convey Diffserv information (and used in this draft
      to carry ECN information).

   o  PHP.  Penultimate Hop Popping.  An MPLS operation in which the
      penultimate Label Switching Router (LSR) on a Label Switched Path
      (LSP) removes the top label from the packet before forwarding the
      packet to the final LSR on the LSP.

2.  Use of MPLS EXP Field for ECN

   We propose that LSRs configured for explicit congestion notification
   should use the EXP field in the MPLS shim header.  However, RFC 3270
   already defines use of codepoints in the EXP field for differentiated
   services.  Although it does not preclude other compatible uses of the
   EXP field, this clearly seems to limit the space available for ECN,
   given the field is only 3 bits (8 codepoints).
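
   As a purely illustrative aid, the following sketch shows where the
   3-bit EXP field sits in a 32-bit label stack entry, using the layout
   of [RFC3032] (20-bit label, 3-bit EXP, 1-bit bottom-of-stack flag,
   8-bit TTL).  The function names and the example values are
   hypothetical and are not defined by this draft.

```python
# Illustrative sketch only: helper names and example values are
# hypothetical, not part of this draft.
# Label stack entry layout (RFC 3032): Label(20) | EXP(3) | S(1) | TTL(8)

EXP_SHIFT = 9
EXP_MASK = 0x7 << EXP_SHIFT


def unpack_label_stack_entry(entry: int) -> dict:
    """Split a 32-bit MPLS label stack entry into its four fields."""
    return {
        "label": (entry >> 12) & 0xFFFFF,
        "exp": (entry >> EXP_SHIFT) & 0x7,
        "s": (entry >> 8) & 0x1,
        "ttl": entry & 0xFF,
    }


def set_exp(entry: int, exp: int) -> int:
    """Return a copy of the entry with only the EXP field rewritten."""
    if not 0 <= exp <= 7:
        raise ValueError("EXP is a 3-bit field (0..7)")
    return (entry & ~EXP_MASK & 0xFFFFFFFF) | (exp << EXP_SHIFT)


# Example entry: label 16, EXP 010, bottom of stack, TTL 64.
entry = (16 << 12) | (0b010 << EXP_SHIFT) | (1 << 8) | 64
fields = unpack_label_stack_entry(entry)

# Re-marking touches the EXP bits and nothing else.
remarked = set_exp(entry, 0b011)
```

   Because the field holds only eight values, every codepoint given to a
   congestion-marked state is one fewer available for Diffserv.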

   RFC 3270 defines two possible approaches for requesting
   differentiated service treatment from an LSR.

   o  In the E-LSP approach, different codepoints of the EXP field in
      the MPLS shim header are used to indicate the packet's per hop
      behaviour (PHB).

   o  In the L-LSP approach, an MPLS label is assigned for each PHB
      scheduling class (PSC, as defined in [RFC3260]), so that an LSR
      determines both its forwarding and its scheduling behaviour from
      the label.

   If an MPLS domain uses the L-LSP approach, there is likely to be
   space in the EXP field for ECN codepoint(s).  Where the E-LSP
   approach is used, then codepoint space in the EXP field is likely to
   be scarce.  This draft focuses on interworking ECN marking with the
   E-LSP approach as it is the tougher problem.  Consequently the same
   approach can also be applied with L-LSPs.

   We recommend that explicit congestion notification in MPLS should use
   codepoints instead of bits in the EXP field.  Since not every DSCP
   will need an associated ECN codepoint and some DSCPs might need two
   ECN codepoints [I-D.briscoe-tsvwg-cl-architecture], it would be
   wasteful and incorrect to assign a bit for ECN.

   For each PHB that uses ECN marking, we assume one EXP codepoint will
   be defined meaning not congestion marked (Not-CM), and at least one
   other codepoint will be defined meaning congestion marked (CM).
   Therefore, each PHB that uses ECN marking will consume at least two
   EXP codepoints.  But PHBs that do not use ECN marking will only
   consume one.

   Further, we wish to use minimal space in the MPLS shim header to tell
   interior LSRs whether each packet will be received by an ECN-capable
   transport (ECT).  Nonetheless, we must ensure that an end-point that
   would not understand an ECN mark will not receive one, otherwise it
   will not be able to respond to congestion as it should.
   In the past, three solutions to this problem have been proposed:

   o  One possible approach is for congested LSRs to mark the ECN field
      in the underlying IP header at the bottom of the label stack.
      Although many commercial LSRs routinely access the IP header for
      other reasons (ECMP), there are numerous drawbacks to attempting
      to find an IP header beneath an MPLS label stack.  Notably, there
      is the challenge of detecting the absence of an IP header when
      non-IP packets are carried on an LSP.  Therefore we will not
      consider this approach further.

   o  In the schemes suggested by [Floyd] and [Shayman], ECT and CE are
      overloaded into one bit, so that a 0 means ECT while a 1 might
      either mean Not-ECT or it might mean CE.  A packet that has been
      marked as having experienced congestion upstream, and then is
      picked out for marking at a second congested LSR, will be dropped
      by the second LSR since it cannot determine whether the packet has
      previously experienced congestion or if ECN is not supported by
      the transport.

      While such an approach seemed potentially palatable for
      traditional ECN, we do not recommend it here for the following
      reasons.  In some cases we wish to be able to use ECN marking long
      before actual congestion (e.g. pre-congestion notification).  In
      these circumstances, marking rates at each LSR might be
      non-negligible most of the time, so the chances of a previously
      marked packet encountering an LSR that wants to mark it again will
      also be non-negligible.  This will lead to unacceptable drop
      rates.  For instance, if the typical marking rate at every router
      or LSR is p, and the typical diameter of the network of LSRs is d,
      then the probability that a marked packet will be marked again is
      1 - [1+p(d-1)][1-p]^(d-1).
      For instance, with 6 LSRs in a row, each
      marking ECN with 1% probability, this bit overloading scheme would
      introduce a drop rate of 0.15% unnecessarily.  Given most modern
      core networks are sized to introduce near-zero packet drop, it may
      be unacceptable to drop over one in a thousand packets
      unnecessarily.

   o  A third possible approach is for interior LSRs to assume that the
      endpoints are ECN-capable, but this assumption is checked when the
      final label is popped.  If an interior LSR has marked ECN in the
      EXP field of the shim, but the IP header says the endpoints are
      not ECN capable, the edge router (or penultimate if using
      penultimate hop popping) drops the packet.  We recommend this
      scheme, which we call `per-domain ECT checking'; and define it
      more precisely in the following section.  Its chief drawback is
      that it can involve packets continuing to be forwarded after
      encountering congestion only to be dropped at the egress of the
      MPLS domain.  The rationale for this decision is given in
      Section 9.1.

3.  Per-domain ECT checking

   For the purposes of this discussion, we define the egress nodes of an
   MPLS domain as the nodes that pop the last MPLS label from the label
   stack, exposing the IP (or, potentially non-IP) header.  Note that
   such a node may be the ultimate or penultimate hop of an LSP,
   depending on whether penultimate hop popping (PHP) is employed.

   In the per-domain ECT checking approach, the egress nodes take
   responsibility for checking whether the transport is ECN capable.

   This draft does not specify how these nodes should pass on congestion
   notification, because different approaches are likely in different
   scenarios.  However, if congestion notification in the MPLS header is
   copied into the IP header, the procedure MUST conform to the
   specification given here.
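
   As an aside, the drop-rate estimate quoted in Section 2 for the
   bit-overloading scheme can be checked numerically.  The sketch below
   evaluates the expression 1 - [1+p(d-1)][1-p]^(d-1) and compares it
   with the probability that a packet is selected for marking at two or
   more of d independent LSRs, which is our reading of the expression
   and an assumption of this sketch.  The function names are
   hypothetical.

```python
# Illustrative check of the Section 2 drop-rate estimate.
# Assumption: each of d LSRs independently selects a packet for marking
# with probability p; function names are hypothetical.

def double_mark_probability(p: float, d: int) -> float:
    """Closed form from the text: 1 - [1 + p(d-1)] * (1-p)^(d-1)."""
    return 1.0 - (1.0 + p * (d - 1)) * (1.0 - p) ** (d - 1)


def at_least_two_marks(p: float, d: int) -> float:
    """Binomial form: 1 - P(no LSR marks) - P(exactly one LSR marks)."""
    none = (1.0 - p) ** d
    exactly_one = d * p * (1.0 - p) ** (d - 1)
    return 1.0 - none - exactly_one


rate = double_mark_probability(0.01, 6)
print(f"{rate:.4%}")  # about 0.146%, which the text rounds to 0.15%
```

   With p = 0.01 and d = 6 the two forms agree, and the result is a
   little over one packet in a thousand.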

   If congestion notification is passed to the transport without first
   passing it onward in the IP header, the approach used must take
   similar care to check that the transport is ECN capable before
   passing it ECN markings.  Specifically, if the transport for a
   particular congestion marked MPLS packet is found not to be
   ECN-capable, the packet MUST be dropped at this egress node.

   In the per-domain ECT checking approach, only the egress nodes check
   whether an IP packet is destined for an ECN-capable transport.
   Therefore, any single LSR within an MPLS domain MUST NOT be
   configured to enable ECN marking unless all the egress LSRs
   surrounding it are already configured to handle ECN marking.

   We call a domain surrounded by ECN-capable egress LSRs an ECN-enabled
   MPLS domain.  This term only implies that all the egress LSRs are
   ECN-enabled; some interior LSRs may not be ECN-enabled.  For
   instance, it would be possible to use legacy LSRs incapable of
   supporting ECN in the interior of an MPLS domain as long as all the
   egress LSRs were ECN-capable.  Note that if PHP is used, the
   "penultimate hop" routers which perform the pop operation do need to
   be ECN-enabled, since they are acting in this context as egress LSRs.

4.  ECN-enabled MPLS domain

   In the following subsections we describe various operations affecting
   the ECN marking of a packet that may be performed at MPLS edge and
   core LSRs.

4.1.  Pushing (adding) one or more labels to an IP packet

   On encapsulating an IP packet with an MPLS label stack, the ECN field
   must be translated from the IP packet into the MPLS EXP field.  The
   Not-CM (not congestion marked) state is set in the MPLS EXP field if
   the ECN status of the IP packet is "Not ECT" or ECT(1) or ECT(0).
   The CM state is set if the ECN status of the IP packet is "CE".
   If more than one label is pushed at one time, the same value should
   be placed in the EXP value of all label stack entries.

4.2.  Pushing one or more labels onto an MPLS labelled packet

   The EXP field is copied directly from the topmost label before the
   push to the newly added outer label.  If more than one label is being
   pushed, the same EXP value is copied to all label stack entries.

4.3.  Congestion experienced in an interior MPLS node

   If the EXP codepoint of the packet maps to a PHB that uses ECN
   marking and the marking algorithm requires the packet to be marked,
   the CM state is set (irrespective of whether it is already in the CM
   state).

   If the buffer is full, the packet would be dropped.

4.4.  Crossing a Diffserv Domain Boundary

   If an MPLS-encapsulated packet crosses a Diffserv domain boundary, it
   may be the case that the two domains use different encodings of the
   same PHB in the EXP field.  In such cases, the EXP field must be
   rewritten at the domain boundary.  If the PHB is one that supports
   ECN, then the appropriate ECN marking should also be preserved when
   the EXP field is mapped at the boundary.

   The related issue of Diffserv tunnel models is discussed in
   Section 4.7.

4.5.  Popping an MPLS label (not the end of the stack)

   When a packet has more than one MPLS label in the stack and the top
   label is popped, another MPLS label is exposed.  In this case the ECN
   information should be transferred from the outer EXP field to the
   inner MPLS label in the following manner.  If the inner EXP field is
   Not-CM, the inner EXP field is set to the same CM or Not-CM state as
   the outer EXP field.  If the inner EXP field is CM, it remains
   unchanged whatever the outer EXP field.  Note that an inner value of
   CM and an outer value of not-CM should be considered anomalous, and
   SHOULD be logged in some way by the LSR.

4.6.
Popping the last MPLS label in the stack

   When the last MPLS label is popped from the packet, its payload is
   exposed.  If that packet is not IP, and does not have any capability
   equivalent to ECT, it is assumed Not-ECT and treated as such.  That
   means that if the EXP value of the MPLS header was CM, the packet
   MUST be dropped.

   Assuming an IP packet was exposed, we have to examine whether that
   packet is ECT or not.  If the inner IP packet is Not-ECT, its ECN
   field remains unchanged if the EXP field is Not-CM.  However, a
   Not-ECT packet MUST be dropped if the EXP field is CM.

   If the ECN field of the inner packet is set to ECT(0), ECT(1) or CE,
   the ECN field remains unchanged if the EXP field is set to Not-CM.
   The ECN field is set to CE if the EXP field is CM.  Note that an
   inner value of CE and an outer value of not-CM should be considered
   anomalous, and SHOULD be logged in some way by the LSR.

4.7.  Diffserv Tunneling Models

   [RFC3270] describes three tunneling models for Diffserv support
   across MPLS Domains, referred to as the uniform, short pipe, and pipe
   models.  The differences between these models lie in whether the
   Diffserv treatment that applies to a packet while it travels along a
   particular LSP is carried to the last hop of the LSP and beyond the
   last hop.  Depending on which mode is preferred by an operator, the
   EXP value or DSCP value of an exposed header following a label pop
   may or may not be dependent on the EXP value of the label that is
   removed by the pop operation.  We believe that in the case of ECN
   marking, the use of these models should only apply to the encoding of
   the Diffserv PHB in the EXP value, and that the choice of codepoint
   for ECN should always be made based on the procedures described
   above, independent of the tunneling model.

4.8.
Extension to Pre-Congestion Notification

   To fully support PCN [I-D.briscoe-tsvwg-cl-architecture] in an MPLS
   domain for a particular PHB, a total of 3 codepoints need to be
   allocated for that PHB.  (See Section 8.4 for further discussion of
   PCN and the possibility of using fewer codepoints.)  These 3
   codepoints represent the admission marked (AM), pre-emption marked
   (PM) and not marked (NM) states.  The procedures described above need
   to be slightly modified to support this scenario.  The following
   procedures are invoked when the topmost DSCP or EXP value indicates a
   PHB that supports PCN.

4.8.1.  Label Push onto IP packet

   If the IP packet header indicates AM, set the EXP value of all
   entries in the label stack to AM.  If the IP packet header indicates
   PM, set the EXP value of all entries in the label stack to PM.  For
   any other marking of the IP header, set the EXP value of all entries
   in the label stack to NM.

4.8.2.  Pushing Additional MPLS Labels

   The procedures of Section 4.2 apply.

4.8.3.  Admission Control or Pre-emption Marking inside MPLS domain

   The EXP value can be set to AM or PM according to the same procedures
   as described in [I-D.briscoe-tsvwg-cl-phb].

4.8.4.  Popping an MPLS Label (not end of stack)

   When popping an MPLS Label exposes another MPLS label, the AM or PM
   marking should be transferred to the exposed EXP field in the
   following manner: if the inner EXP value is NM, then it should be set
   to the same marking state as the EXP value of the popped label stack
   entry.  If the inner EXP value is AM, it should be unchanged if the
   popped EXP value was AM, and it should be set to PM if the popped EXP
   value was PM.  If the popped EXP value was NM, this should be logged
   in some way and the inner EXP value should be unchanged.
   If the inner EXP value is PM, it should be unchanged whatever the
   popped EXP value was, but any EXP value other than PM should be
   logged.

4.8.5.  Popping the last MPLS Label to expose IP header

   When popping the last MPLS Label exposes the IP header, the AM or PM
   marking should be transferred to the exposed IP header field in the
   following manner: if the inner IP header value is neither AM nor PM,
   and the EXP value was NM, then the IP header should be unchanged.
   For any other EXP value, the IP header should be set to the same
   marking state as the EXP value of the popped label stack entry.  If
   the inner IP header value is AM, it should be unchanged if the popped
   EXP value was AM, and it should be set to PM if the popped EXP value
   was PM.  If the popped EXP value was NM, this should be logged in
   some way and the inner IP header value should be unchanged.  If the
   IP header value is PM, it should be unchanged whatever the popped EXP
   value was, but any EXP value other than PM should be logged.

5.  ECN-disabled MPLS domain

   If ECN is not enabled on all the egress LSRs of a domain, ECN MUST
   NOT be enabled on any LSRs throughout the domain.  If congestion is
   experienced on any LSR in an ECN-disabled MPLS domain, packets MUST
   be dropped, not marked.  The exact algorithm for deciding when to
   drop packets during congestion (e.g. tail-drop, RED, etc.) is a local
   matter for the operator of the domain.

6.  The use of more codepoints with E-LSPs and L-LSPs

   RFC 3270 gives different options with E-LSPs and L-LSPs and some of
   those could potentially provide ample EXP codepoints for ECN/PCN.

   However, deploying L-LSPs vs E-LSPs has many implications such as
   platform support and operational complexity.  The above ECN/PCN MPLS
   solution should provide some flexibility.
   If the operator has deployed one L-LSP per PHB scheduling class, then
   EXP space will be a non-issue and it could be used to achieve more
   sophisticated ECN/PCN behavior if required.  If the operator wants to
   stick to E-LSPs and uses a handful of EXP codepoints for Diffserv, it
   may be desirable to operate with a minimum number of extra ECN/PCN
   codepoints, even if this comes with some compromise on ECN/PCN
   optimality.  See Section 8 for discussion of some possible deployment
   scenarios.

7.  Relationship to tunnel behavior in RFC 3168

   [RFC3168] defines two modes of encapsulating ECN-marked IP packets
   inside additional IP headers when tunnels are used.  The two modes
   are the "full functionality" and "limited functionality" modes.  In
   the full functionality mode, the ECT information from the inner
   header is copied to the outer header at the tunnel ingress, but the
   CE information is not.  In the limited functionality mode, neither
   ECT nor CE information is copied to the outer header, and thus ECN
   cannot be applied to the encapsulated packet.

   The behavior that is specified in Section 4 of this document
   resembles the "full functionality" mode in the sense that it conveys
   some information from inner to outer header, and in the sense that it
   enables full ECN support along the MPLS LSP (which is analogous to an
   IP tunnel in this context).  However it differs in one respect, which
   is that the CE information is conveyed from the inner header to the
   outer header.  Our reason for this different design choice is to give
   interior routers and LSRs more information about upstream marking in
   multi-bottleneck cases.  For instance, the flow pre-emption marking
   mechanism proposed for PCN works by only considering packets for
   marking that have not already been marked upstream.
   Unless existing pre-emption marking is copied from the inner to the
   outer header at tunnel ingress, the mechanism doesn't pre-empt enough
   traffic in cases where anomalous events hit multiple MPLS domains at
   once.  [RFC3168] does not give any reasons against conveying CE
   information from the inner header to the outer in the "full
   functionality" mode.  So, rather than define different encapsulation
   methods for ECN and PCN, Section 4 defines a common approach for
   both.

7.1.  Alternative approach to support ECN in an MPLS domain

   It is possible to define an approach for MPLS support of ECN that
   more closely resembles that of the full functionality mode of
   [RFC3168].  This approach would differ from that described in
   Section 4 in the following ways:

   o  when pushing one or more MPLS labels onto an IP packet, the not-CM
      state is set in the EXP field of all label stack entries

   o  when pushing one or more MPLS labels onto an MPLS packet, the
      not-CM state is set in the EXP field of all newly added label
      stack entries

   o  when popping an MPLS label and the exposed header is MPLS (i.e.
      this is not the end of stack), the EXP field of the MPLS packet
      should be set to CM if the popped label's EXP value was CM and
      left unchanged otherwise

   o  when popping an MPLS label and the exposed header is IP, the IP
      ECN field should be set to CE if the EXP value was CM and if the
      IP header indicated that the packet was ECN capable.  If the IP
      header indicated not-ECT and the EXP value was CM, the packet MUST
      be dropped.  If the EXP value was not-CM, the ECN field in the IP
      header is unchanged.

   The advantages of this scheme over that described in Section 4 are
   greater similarity to [RFC3168], and the ability to determine, at the
   end of an LSP, that congestion either did or did not occur along that
   LSP (since the initial state is always not-CM at the start of an
   LSP).
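
   To make the comparison concrete, the sketch below models the label
   push onto an IP packet under both schemes, together with the final
   pop, which behaves the same way in Section 4.6 and in the alternative
   above.  It is a minimal illustration only: the enum and function
   names are hypothetical, and returning None to represent a dropped
   packet is a convention of the sketch rather than anything specified
   here.

```python
# Illustrative sketch only: all names are hypothetical, and None is
# used here to stand for "packet dropped".
from enum import Enum
from typing import Optional


class Ecn(Enum):
    NOT_ECT = "Not-ECT"
    ECT0 = "ECT(0)"
    ECT1 = "ECT(1)"
    CE = "CE"


class Exp(Enum):
    NOT_CM = "Not-CM"
    CM = "CM"


def push_onto_ip(ip_ecn: Ecn, alternative: bool = False) -> Exp:
    """EXP state written to every label pushed onto an IP packet."""
    if alternative:
        # Section 7.1: an LSP always starts in the not-CM state.
        return Exp.NOT_CM
    # Section 4.1: CE maps to CM, everything else maps to Not-CM.
    return Exp.CM if ip_ecn is Ecn.CE else Exp.NOT_CM


def pop_last_label(exp: Exp, ip_ecn: Ecn) -> Optional[Ecn]:
    """IP ECN field after the last label is popped, or None for a drop."""
    if exp is Exp.NOT_CM:
        return ip_ecn      # IP header left unchanged
    if ip_ecn is Ecn.NOT_ECT:
        return None        # CM but transport not ECN-capable: drop
    return Ecn.CE          # CM and ECN-capable: mark CE


# A packet that was already CE before entering the LSP:
section4_exp = push_onto_ip(Ecn.CE)
alternative_exp = push_onto_ip(Ecn.CE, alternative=True)
```

   In this example, a CM value seen at the egress under the alternative
   scheme can only have been set inside the LSP, whereas under Section 4
   it may simply reflect the CE mark that the packet carried on entry.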

A disadvantage of this approach is that exceptions to this rule are
necessary in cases where the marking process on LSRs needs to depend
on whether a packet has already suffered upstream marking. The
currently proposed pre-emption marking in PCN is an example where
such an exception would be necessary (see the discussion at the start
of Section 7).

8. Example Uses

8.1. RFC3168-style ECN

[RFC3168] proposes the use of ECN in TCP and introduces the use of
ECN-Echo and CWR flags in the TCP header for initialisation. When the
TCP sender receives an ECN-Echo (ECE) ACK packet (that is, an ACK
packet with the ECN-Echo flag set in the TCP header), it knows that
congestion was encountered in the network on the path from the sender
to the receiver, and it responds accordingly (such as by not
increasing the congestion window).

It would be possible to enable ECN in an MPLS domain for Diffserv
PHBs like AF and best effort that are expected to be used by TCP and
similar transports (e.g. DCCP [RFC4340]). Then end-to-end
congestion control in transports capable of understanding ECN would
be able to respond to approaching congestion on LSRs without having
to rely on packet discard to signal congestion.

8.2. ECN Co-existence with Diffserv E-LSPs

Many operators today have deployed Diffserv using the E-LSP approach
of [RFC3270]. In many cases the number of PHBs used is fewer than 8,
and hence there remain available codepoints in the EXP space. If an
operator wished to support ECN for a single PHB, this can be
accomplished by simply allocating a second codepoint to the PHB for
the "CM" state of that PHB and retaining the old codepoint for the
"not-CM" state. An operator with only four deployed PHBs could of
course enable ECN marking on all those PHBs.
It is easy to imagine cases where some PHBs might benefit more from
ECN than others. For example, an operator might use ECN on a premium
data service but not on a PHB used for best effort Internet traffic.

As an illustrative example of how the EXP field might be used in this
case, consider the example of an operator who is using the aggregated
service classes described in [I-D.chan-tsvwg-diffserv-class-aggr].
The operator may choose to support ECN only for the Assured Elastic
Treatment Aggregate, using the EXP codepoint 010 for the not-CM state
and 011 for the CM state. All other codepoints could be the same as
in [I-D.chan-tsvwg-diffserv-class-aggr]. Of course any other
combination of EXP values can be used according to the specific set
of PHBs and marking conventions used within that operator's network.

8.3. Congestion-feedback-based Traffic Engineering

Shayman's traffic engineering [Shayman] proposed the use of ECN by an
egress LSR feeding back congestion to an ingress LSR to mitigate
congestion by employing dynamic traffic engineering techniques such
as shifting flows to an alternate path. It proposed a new RSVP
TUNNEL CONGESTION message which was sent to the ingress LSR and
ignored by transit LSRs.

8.4. PCN flow admission control and flow pre-emption

[I-D.briscoe-tsvwg-cl-architecture] proposes using pre-congestion
notification (PCN) on routers within an edge-to-edge Diffserv region
to control admission of new flows to the region and, if necessary, to
pre-empt existing flows in response to disasters and other anomalous
routing events. In this approach, the current level of PCN marking
is picked up by the signalling used to initiate each flow in order to
inform the admission control decision for the whole region at once.
As an example, a minor extension to RSVP signalling has been proposed
[I-D.lefaucheur-rsvp-ecn] to carry this message, but a similar
approach has also been proposed that uses NSIS signalling
[I-D.ietf-nsis-rmd].

If it is possible for LSRs to signify congestion in MPLS, PCN marking
could be used for admission control and flow pre-emption across a
Diffserv region, irrespective of whether it contained pure IP
routers, MPLS LSRs, or both. Indeed, the solution could be somewhat
more efficient to implement if aggregates could identify themselves
by their MPLS label. Section 4.8 describes the mechanisms by which
the necessary markings for PCN could be carried in the MPLS header.

As an illustrative example of how the EXP field might be used in this
case, consider the example of an operator who is using the aggregated
service classes described in [I-D.chan-tsvwg-diffserv-class-aggr].
The operator may choose to support PCN only for the Real Time
Treatment Aggregate, using the EXP codepoint 100 for the not-marked
(NM) state, 101 for the Admission Marked (AM) state, and 111 for the
Pre-emption Marked (PM) state. All other codepoints could be the
same as in [I-D.chan-tsvwg-diffserv-class-aggr]. Of course any other
combination of EXP values can be used according to the specific set
of PHBs and marking conventions used within that operator's network.

It might also be possible to deploy a similar solution using PCN
marking over MPLS for just admission control alone, or just flow
pre-emption alone, particularly if codepoint space were at a premium
in the MPLS EXP field. However, the feasibility of deploying one
without the other would require further study.

9. Deployment Considerations

9.1. Marking non-ECN Capable Packets

What are the consequences of marking a packet that is not ECN-capable?
Even if it will be dropped before leaving the domain, doesn't this
consume resources unnecessarily?

The problem only arises if there is congestion downstream of an
earlier congested node. Marked packets might be carried through this
second congested router even though their underlying IP header shows
they are not ECN-capable, so they will be dropped later. Such
packets might cause other packets to be marked (or dropped) that
would not otherwise have been.

We decided to use the per-domain ECT checking approach because it
would become optimal as ECN deployment became prevalent. The
situation where traffic is carried beyond a congested LSR only to be
dropped later should become less prevalent as more transports use
ECN. This is why we chose not to use the [Floyd] alternative which
introduced a low but persistent level of unnecessary packet drop for
all time. Although that scheme did not carry droppable traffic to
the edge of the MPLS domain, we felt that carrying such traffic was a
small price to pay, and it was anyway only of concern until ECN had
become more widely deployed.

A partial solution would be to preferentially drop packets arriving
at a congested router that were already marked. There is no solution
to the problem of a packet being marked because of congestion caused
by another packet that should have been dropped. However, the chance
of such an occurrence is very low and the consequences are not
significant. It merely causes an application to very occasionally
slow down its rate when it did not have to.

9.2. Non-ECN capable routers in an MPLS Domain

What if an MPLS domain wants to use ECN, but not all legacy routers
are able to support it?

If the legacy router(s) are used in the interior, this is not a
problem. They will simply have to drop the packets if they are
congested, rather than mark them, which is the standard behaviour for
IP routers that are not ECN-enabled.
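
The interior case can be illustrated with a small sketch. It is a
simplification under stated assumptions: forward() and traverse() are
hypothetical helpers, the state names are symbolic rather than real
EXP codepoints, and "congested" is taken to mean that the LSR's queue
management has selected this particular packet for marking or drop.

```python
from typing import List, Optional, Tuple

NOT_CM, CM = "not-CM", "CM"  # hypothetical symbolic EXP states


def forward(exp: str, congested: bool, ecn_enabled: bool) -> Optional[str]:
    """One interior LSR hop: return the outgoing EXP state, or None on drop."""
    if not congested:
        return exp    # nothing to signal, state passes through unchanged
    if ecn_enabled:
        return CM     # ECN-enabled LSR marks instead of dropping
    return None       # legacy LSR can only signal congestion by dropping


def traverse(hops: List[Tuple[bool, bool]], exp: str = NOT_CM) -> Optional[str]:
    """Carry one packet across (congested, ecn_enabled) hops in order."""
    for congested, ecn_enabled in hops:
        result = forward(exp, congested, ecn_enabled)
        if result is None:
            return None   # packet was dropped at this hop
        exp = result
    return exp
```

An uncongested legacy LSR passes the EXP state through unchanged, so
mixing legacy and ECN-enabled LSRs in the interior does not corrupt
markings made elsewhere on the path.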

If the legacy router were used as an egress router, it would not be
able to check the ECN capability of the transport correctly. An
operator in this position would not be able to use this solution and
therefore MUST NOT enable ECN unless all egress routers are
ECN-capable.

10. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an
RFC.

11. Security Considerations

We believe no new vulnerabilities are introduced by this draft.

We have considered whether malicious sources might be able to exploit
the fact that interior LSRs will mark packets that are Not-ECT,
relying on their egress LSR to drop them. Although this might allow
sources to engineer a situation where more traffic is carried across
an MPLS domain than should be, we concluded that even if we hadn't
introduced this feature, these sources would have been able to
prevent these LSRs dropping this traffic anyway, simply by setting
ECT in the first place.

An ECN sender can use the ECN nonce [RFC3540] to detect a misbehaving
receiver. The ECN nonce works correctly across an MPLS domain
without requiring any specific support from the proposal in this
draft. The nonce does not need to be present in the MPLS shim
header. As long as the nonce is present in the IP header when the
ECN information is copied from the last MPLS shim header, it will be
overwritten if congestion has been experienced by an LSR. This is
all that is necessary for the sender to detect a misbehaving
receiver.

An alternative proposal currently in progress in the IETF
[I-D.briscoe-tsvwg-re-ecn-tcp] allows the network to prevent
misbehaviour by senders or receivers or other routers. Like the ECN
nonce, it works correctly without requiring any specific support from
the proposal in this draft.
It uses a bit in the IP header (the RE bit) which is set by the
sender and never changed along the path; it is only read by certain
policing elements in the network. There is no need for a copy of
this bit in the MPLS shim, as policing nodes can examine the IP
header if they need to, particularly given they are intended to only
be necessary at domain borders where MPLS headers are often removed.

12. Acknowledgements

Thanks to K.K. Ramakrishnan and Sally Floyd for getting us thinking
about this in the first place and for providing advice on tunneling
of ECN packets, and to Joe Babiarz and Ben Niven-Jenkins for their
comments on the draft.

13. References

13.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
           and W. Weiss, "An Architecture for Differentiated
           Services", RFC 2475, December 1998.

[RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
           Label Switching Architecture", RFC 3031, January 2001.

[RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
           Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
           Encoding", RFC 3032, January 2001.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
           of Explicit Congestion Notification (ECN) to IP",
           RFC 3168, September 2001.

[RFC3260]  Grossman, D., "New Terminology and Clarifications for
           Diffserv", RFC 3260, April 2002.

[RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
           P., Krishnan, R., Cheval, P., and J. Heinanen,
           "Multi-Protocol Label Switching (MPLS) Support of
           Differentiated Services", RFC 3270, May 2002.

13.2. Informative References

[Floyd]    "A Proposal to Incorporate ECN in MPLS", 1999. Work in
           progress.
           http://www.icir.org/floyd/papers/draft-ietf-mpls-ecn-00.txt

[I-D.briscoe-tsvwg-cl-architecture]
           Briscoe, B., "A Framework for Admission Control over
           DiffServ using Pre-Congestion Notification",
           draft-briscoe-tsvwg-cl-architecture-02 (work in progress),
           March 2006.

[I-D.briscoe-tsvwg-cl-phb]
           Briscoe, B., "Pre-Congestion Notification marking",
           draft-briscoe-tsvwg-cl-phb-01 (work in progress),
           March 2006.

[I-D.briscoe-tsvwg-re-ecn-border-cheat]
           Briscoe, B., "Emulating Border Flow Policing using Re-ECN
           on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-00
           (work in progress), February 2006.

[I-D.briscoe-tsvwg-re-ecn-tcp]
           Briscoe, B., "Re-ECN: Adding Accountability for Causing
           Congestion to TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-01
           (work in progress), March 2006.

[I-D.chan-tsvwg-diffserv-class-aggr]
           Chan, K., "Aggregation of DiffServ Service Classes",
           draft-chan-tsvwg-diffserv-class-aggr-03 (work in
           progress), January 2006.

[I-D.ietf-nsis-rmd]
           Bader, A., "RMD-QOSM - The Resource Management in Diffserv
           QOS Model", draft-ietf-nsis-rmd-06 (work in progress),
           February 2006.

[I-D.lefaucheur-rsvp-ecn]
           Le Faucheur, F., "RSVP Extensions for Admission Control
           over Diffserv using Pre-congestion Notification",
           draft-lefaucheur-rsvp-ecn-00 (work in progress),
           October 2005.

[RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
           Congestion Notification (ECN) Signaling with Nonces",
           RFC 3540, June 2003.

[RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
           Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

[Shayman]  "Using ECN to Signal Congestion Within an MPLS Domain",
           2000. Work in progress.
           http://www.ee.umd.edu/~shayman/papers.d/draft-shayman-mpls-ecn-00.txt

Authors' Addresses

Bruce Davie
Cisco Systems, Inc.
1414 Mass. Ave.
Boxborough, MA 01719
USA

Email: bsd@cisco.com

Bob Briscoe
BT Research
B54/77, Sirius House
Adastral Park
Martlesham Heath
Ipswich
Suffolk IP5 3RE
United Kingdom

Email: bob.briscoe@bt.com

June Tay
BT Research
B54/77, Sirius House
Adastral Park
Martlesham Heath
Ipswich
Suffolk IP5 3RE
United Kingdom

Email: june.tay@bt.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.