idnits 2.17.1 draft-davie-ecn-mpls-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 17. -- Found old boilerplate from RFC 3978, Section 5.5 on line 959. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 970. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 977. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 983. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 18, 2006) is 6398 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2475' is defined on line 833, but no explicit reference was found in the text == Unused Reference: 'I-D.briscoe-tsvwg-re-ecn-border-cheat' is defined on line 874, but no explicit reference was found in the text ** Downref: Normative reference to an Informational RFC: RFC 2475 ** Downref: Normative reference to an Informational RFC: RFC 3260 == Outdated reference: A later version (-04) exists of draft-briscoe-tsvwg-cl-architecture-03 == Outdated reference: A later version (-03) exists of draft-briscoe-tsvwg-cl-phb-02 == Outdated reference: A later version (-09) exists of draft-briscoe-tsvwg-re-ecn-tcp-02 == Outdated reference: A later version (-20) exists of draft-ietf-nsis-rmd-07 Summary: 5 errors (**), 0 flaws (~~), 7 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group B. Davie 3 Internet-Draft Cisco Systems, Inc. 4 Intended status: Standards Track B. Briscoe 5 Expires: April 21, 2007 J. Tay 6 BT Research 7 October 18, 2006 9 Explicit Congestion Marking in MPLS 10 draft-davie-ecn-mpls-01.txt 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 
19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on April 21, 2007. 37 Copyright Notice 39 Copyright (C) The Internet Society (2006). 41 Abstract 43 RFC 3270 defines how to support the Diffserv architecture in MPLS 44 networks, including how to encode Diffserv Code Points (DSCPs) in an 45 MPLS header. DSCPs may be encoded in the EXP field, while other uses 46 of that field are not precluded. RFC 3270 makes no statement about 47 how Explicit Congestion Notification (ECN) marking might be encoded 48 in the MPLS header. This draft describes how an operator might define 49 some of the EXP codepoints for explicit congestion notification, 50 without precluding other uses. 52 Requirements Language 54 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 55 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 56 document are to be interpreted as described in RFC 2119 [RFC2119]. 58 Table of Contents 60 1. Introduction 61 1.1. Changes From Previous (-00) Version 62 1.2. Background 63 1.3. Intent 64 1.4. Terminology 65 2. Use of MPLS EXP Field for ECN 
66 3. Per-domain ECT checking 67 4. ECN-enabled MPLS domain 68 4.1. Pushing (adding) one or more labels to an IP packet 69 4.2. Pushing one or more labels onto an MPLS labelled packet 70 4.3. Congestion experienced in an interior MPLS node 71 4.4. Crossing a Diffserv Domain Boundary 72 4.5. Popping an MPLS label (not the end of the stack) 73 4.6. Popping the last MPLS label in the stack 74 4.7. Diffserv Tunneling Models 75 4.8. Extension to Pre-Congestion Notification 76 4.8.1. Label Push onto IP packet 77 4.8.2. Pushing Additional MPLS Labels 78 4.8.3. Admission Control or Pre-emption Marking inside 79 MPLS domain 80 4.8.4. Popping an MPLS Label (not end of stack) 81 4.8.5. Popping the last MPLS Label to expose IP header 82 5. ECN-disabled MPLS domain 83 6. The use of more codepoints with E-LSPs and L-LSPs 84 7. Relationship to tunnel behavior in RFC 3168 85 7.1. Alternative approach to support ECN in an MPLS domain 86 8. Example Uses 87 8.1. RFC3168-style ECN 88 8.2. ECN Co-existence with Diffserv E-LSPs 89 8.3. Congestion-feedback-based Traffic Engineering 90 8.4. PCN flow admission control and flow pre-emption 91 9. Deployment Considerations 92 9.1. Marking non-ECN Capable Packets 93 9.2. Non-ECN capable routers in an MPLS Domain 94 10. IANA Considerations 95 11. Security Considerations 96 12. Acknowledgements 97 13. References 98 13.1. Normative References 99 13.2. Informative References 100 Authors' Addresses 101 Intellectual Property and Copyright Statements 
103 1. Introduction 105 1.1. Changes From Previous (-00) Version 107 [Note to RFC Editor: This section to be removed before publication] 109 o Corrected the description of ECN-MPLS marking proposed in 110 [Shayman], which closely corresponds to that proposed in this 111 document. 113 o Pre-congestion notification (PCN) marking is now described in a 114 way that does not require normative references to PCN 115 specifications. PCN discussion now serves only to illustrate how 116 the ECN marking concepts can be extended to cover more complex 117 scenarios, with PCN being an example. 119 o Added specification of behavior when MPLS encapsulated packets 120 cross from an ECN-enabled domain to a domain that is not ECN- 121 enabled. 123 o Clarified that copying MPLS ECN or PCN marking into exposed IP 124 header on egress is not mandatory. 126 o Fixed typos and nits. 128 1.2. Background 130 [RFC3270] defines how to support the Diffserv architecture in MPLS 131 networks, including how to encode Diffserv Code Points (DSCPs) in an 132 MPLS header. DSCPs may be encoded in the EXP field, while other uses 133 of that field are not precluded. RFC 3270 makes no statement about 134 how Explicit Congestion Notification (ECN) marking might be encoded 135 in the MPLS header. This draft describes how an operator might define 136 some of the EXP codepoints for explicit congestion notification, 137 without precluding other uses. 
In parallel with the activity defining 138 the addition of ECN to IP [RFC3168], two proposals were made to add 139 ECN to MPLS [Floyd] [Shayman]. These proposals, however, fell by the 140 wayside. With ECN for IP now being a proposed standard, and with 141 growing interest in using pre-congestion notification (PCN) for 142 admission control and flow pre-emption 143 [I-D.briscoe-tsvwg-cl-architecture], there is consequent interest in 144 being able to support ECN across IP networks consisting of MPLS- 145 enabled domains. Therefore it is necessary to specify the protocol 146 for including ECN in the MPLS shim header, and the protocol behavior 147 of edge MPLS nodes. 149 We note that in [RFC3168] there are four codepoints used for ECN 150 marking, which are encoded using two bits of the IP header. The MPLS 151 EXP field is the logical place to encode ECN codepoints, but with 152 only 3 bits (8 codepoints) available, and with the same field being 153 used to convey DSCP information as well, there is a clear incentive 154 to limit the number of codepoints consumed for ECN purposes. 155 Efficient use of the EXP field has been a focus of prior drafts 156 [Floyd] [Shayman] and we draw on those efforts in this draft as well. 158 1.3. Intent 160 Our intent is to specify how the MPLS shim header [RFC3032] should 161 denote ECN marking and how MPLS nodes should understand whether the 162 transport for a packet will be ECN capable. We offer this as a 163 building block, from which to build different congestion notification 164 systems. We do not intend to specify how the resulting congestion 165 notification is fed back to an upstream node that can mitigate 166 congestion. For instance, unlike [Shayman], we do not specify edge- 167 to-edge MPLS domain feedback, but we also do not preclude it. 
168 Nonetheless, we do specify how the egress node of an MPLS domain 169 should copy congestion notification from the MPLS shim into the 170 underlying IP header if the ECN is to be carried onward towards the 171 IP receiver. But we do NOT mandate that MPLS congestion notification 172 must be copied into the IP header for onward transmission. This 173 draft aims to be generic for any use of congestion notification in 174 MPLS. PCN or traffic engineering are merely two of many motivating 175 applications (see Section 8.) 177 1.4. Terminology 179 This document draws freely on the terminology of ECN [RFC3168] and 180 MPLS [RFC3031]. For ease of reference, we have included some 181 definitions here, but refer the reader to the references above for 182 complete specifications of the relevant technologies: 184 o CE: Congestion Experienced. One of the states with which a packet 185 may be marked in a network supporting ECN. A packet is marked in 186 this state by an ECN-capable router, to indicate that this router 187 was experiencing congestion at the time the packet arrived. 189 o ECT: ECN-capable Transport. One of the ECN states which a packet 190 may be in when it is sent by an end system. An end system marks a 191 packet with an ECT codepoint to indicate that the end-points of 192 the transport protocol are ECN-capable. A router may not mark a 193 packet as CE unless the packet was marked ECT when it arrived. 195 o Not-ECT: Not ECN capable transport. An end system marks a packet 196 with this codepoint to indicate that the end-points of the 197 transport protocol are not ECN-capable. A congested router cannot 198 mark such packets as CE, and thus can only drop them to indicate 199 congestion. 201 o EXP field. A 3 bit field in the MPLS label header [RFC3032] which 202 may be used to convey Diffserv information (and used in this draft 203 to carry ECN information). 205 o PHP. Penultimate Hop Popping. 
An MPLS operation in which the 206 penultimate Label Switching Router (LSR) on a Label Switched Path 207 (LSP) removes the top label from the packet before forwarding the 208 packet to the final LSR on the LSP. 210 2. Use of MPLS EXP Field for ECN 212 We propose that LSRs configured for explicit congestion notification 213 should use the EXP field in the MPLS shim header. However, RFC 3270 214 already defines use of codepoints in the EXP field for differentiated 215 services. Although it does not preclude other compatible uses of the 216 EXP field, this limits the space available for ECN, 217 given the field is only 3 bits (8 codepoints). 219 RFC 3270 defines two possible approaches for requesting 220 differentiated service treatment from an LSR. 222 o In the E-LSP approach, different codepoints of the EXP field in 223 the MPLS shim header are used to indicate the packet's per hop 224 behavior (PHB). 226 o In the L-LSP approach, an MPLS label is assigned for each PHB 227 scheduling class (PSC, as defined in [RFC3260]), so that an LSR 228 determines both its forwarding and its scheduling behavior from 229 the label. 231 If an MPLS domain uses the L-LSP approach, there is likely to be 232 space in the EXP field for ECN codepoint(s). Where the E-LSP 233 approach is used, codepoint space in the EXP field is likely to 234 be scarce. This draft focuses on interworking ECN marking with the 235 E-LSP approach as it is the tougher problem. Consequently the same 236 approach can also be applied with L-LSPs. 238 We recommend that explicit congestion notification in MPLS should use 239 codepoints instead of bits in the EXP field. Since not every PHB 240 will need an associated ECN codepoint and in some applications a 241 given PHB might need two ECN codepoints (see, for 242 example, [I-D.briscoe-tsvwg-cl-architecture]), it would be wasteful to 243 assign a dedicated bit for ECN. 
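To make the codepoint-versus-bit comparison concrete, the following Python sketch counts EXP codepoints under both schemes. It is only an illustration: the function names and the example PHB mix are assumptions made here, not part of this draft or of RFC 3270, and the accounting (one unmarked codepoint per PHB plus one per congestion-marked state) is one plausible reading of the text above.

```python
# Illustrative sketch only; names and the PHB mix below are hypothetical.
EXP_BITS = 3
EXP_CODEPOINTS = 2 ** EXP_BITS  # 8 codepoints in the 3-bit EXP field


def phbs_with_dedicated_bits(ecn_bits=1):
    """Number of PHBs encodable if ecn_bits of the EXP field are reserved for ECN."""
    return 2 ** (EXP_BITS - ecn_bits)


def codepoints_needed(marked_states_per_phb):
    """Codepoints consumed when each PHB gets one unmarked codepoint
    plus one codepoint per congestion-marked state it needs."""
    return sum(1 + marked for marked in marked_states_per_phb.values())


# Assumed mix: one PHB without marking, two with a single marked state,
# and one needing two marked states.
example_mix = {"phb-a": 0, "phb-b": 1, "phb-c": 1, "phb-d": 2}
needed = codepoints_needed(example_mix)
fits = needed <= EXP_CODEPOINTS
```

Under these assumptions the mix of four PHBs consumes exactly the 8 available codepoints, whereas reserving one bit leaves room for four PHBs but no second marked state, and reserving two bits leaves room for only two PHBs.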
245 For each PHB that uses ECN marking, we assume one EXP codepoint will 246 be defined meaning not congestion marked (Not-CM), and at least one 247 other codepoint will be defined meaning congestion marked (CM). 248 Therefore, each PHB that uses ECN marking will consume at least two 249 EXP codepoints. But PHBs that do not use ECN marking will only 250 consume one. 252 Further, we wish to use minimal space in the MPLS shim header to tell 253 interior LSRs whether each packet will be received by an ECN-capable 254 transport (ECT). Nonetheless, we must ensure that an end-point that 255 would not understand an ECN mark will not receive one, otherwise it 256 will not be able to respond to congestion as it should. In the past, 257 three solutions to this problem have been proposed: 259 o One possible approach is for congested LSRs to mark the ECN field 260 in the underlying IP header at the bottom of the label stack. 261 Although many commercial LSRs routinely access the IP header for 262 other reasons (ECMP), there are numerous drawbacks to attempting 263 to find an IP header beneath an MPLS label stack. Notably, there 264 is the challenge of detecting the absence of an IP header when 265 non-IP packets are carried on an LSP. Therefore we will not 266 consider this approach further. 268 o In the scheme suggested by [Floyd] ECT and CE are overloaded into 269 one bit, so that a 0 means ECT while a 1 might either mean Not-ECT 270 or it might mean CE. A packet that has been marked as having 271 experienced congestion upstream, and then is picked out for 272 marking at a second congested LSR, will be dropped by the second 273 LSR since it cannot determine whether the packet has previously 274 experienced congestion or if ECN is not supported by the 275 transport. 277 While such an approach seemed potentially palatable, we do not 278 recommend it here for the following reasons. In some cases we 279 wish to be able to use ECN marking long before actual congestion 280 (e.g. 
pre-congestion notification). In these circumstances, 281 marking rates at each LSR might be non-negligible most of the 282 time, so the chances of a previously marked packet encountering an 283 LSR that wants to mark it again will also be non-negligible. In 284 the case where CE and not-ECT are indistinguishable to core 285 routers, such a scenario could lead to unacceptable drop rates. 286 If the typical marking rate at every router or LSR is p, and the 287 typical diameter of the network of LSRs is d, then the probability 288 that a packet will be chosen for marking more than once is 289 1-[Pr(never marked) + Pr(marked at exactly one hop)] = 1- [(1-p)^d 290 + dp(1-p)^(d-1)]. For instance, with 6 LSRs in a row, each 291 marking ECN with 1% probability, this is 1 - [0.99^6 + 6(0.01)(0.99)^5] = 1 - [0.94148 + 0.05706] = 0.00146, so the chance of a packet that is 292 already marked being chosen for marking a second time is about 0.15%. 293 The bit overloading scheme would therefore introduce a drop rate 294 of 0.15% unnecessarily. Given that most modern core networks are 295 sized to introduce near-zero packet drop, it may be unacceptable 296 to drop over one in a thousand packets unnecessarily. 298 o A third possible approach was suggested by [Shayman]. In this 299 scheme, interior LSRs assume that the endpoints are ECN-capable, 300 but this assumption is checked when the final label is popped. If 301 an interior LSR has marked ECN in the EXP field of the shim 302 header, but the IP header says the endpoints are not ECN capable, 303 the edge router (or penultimate router, if using penultimate hop 304 popping) drops the packet. We recommend this scheme, which we 305 call `per-domain ECT checking', and define it more precisely in 306 the following section. Its chief drawback is that it can cause 307 packets to be forwarded after encountering congestion only to be 308 dropped at the egress of the MPLS domain. The rationale for this 309 decision is given in Section 9.1. 311 3. 
Per-domain ECT checking 313 For the purposes of this discussion, we define the egress nodes of an 314 MPLS domain as the nodes that pop the last MPLS label from the label 315 stack, exposing the IP (or, potentially non-IP) header. Note that 316 such a node may be the ultimate or penultimate hop of an LSP, 317 depending on whether penultimate hop popping (PHP) is employed. 319 In the per-domain ECT checking approach, the egress nodes take 320 responsibility for checking whether the transport is ECN capable. 321 This draft does not specify how these nodes should pass on congestion 322 notification, because different approaches are likely in different 323 scenarios. However, if congestion notification in the MPLS header is 324 copied into the IP header, the procedure MUST conform to the 325 specification given here. 327 If congestion notification is passed to the transport without first 328 passing it onward in the IP header, the approach used must take 329 similar care to check that the transport is ECN capable before 330 passing it ECN markings. Specifically, if the transport for a 331 particular congestion marked MPLS packet is found not to be ECN- 332 capable, the packet MUST be dropped at this egress node. 334 In the per-domain ECT checking approach, only the egress nodes check 335 whether an IP packet is destined for an ECN-capable transport. 336 Therefore, any single LSR within an MPLS domain MUST NOT be 337 configured to enable ECN marking unless all the egress LSRs 338 surrounding it are already configured to handle ECN marking. 340 We call a domain surrounded by ECN-capable egress LSRs an ECN-enabled 341 MPLS domain. This term only implies that all the egress LSRs are 342 ECN-enabled; some interior LSRs may not be ECN-enabled. For 343 instance, it would be possible to use legacy LSRs incapable of 344 supporting ECN in the interior of an MPLS domain as long as all the 345 egress LSRs were ECN-capable. 
Note that if PHP is used, the 346 "penultimate hop" routers which perform the pop operation do need to 347 be ECN-enabled, since they are acting in this context as egress LSRs. 349 4. ECN-enabled MPLS domain 351 In the following subsections we describe various operations affecting 352 the ECN marking of a packet that may be performed at MPLS edge and 353 core LSRs. 355 4.1. Pushing (adding) one or more labels to an IP packet 357 On encapsulating an IP packet with an MPLS label stack, the ECN field 358 must be translated from the IP packet into the MPLS EXP field. The 359 Not-CM (not congestion marked) state is set in the MPLS EXP field if 360 the ECN status of the IP packet is "Not ECT" or ECT(1) or ECT(0). 361 The CM state is set if the ECN status of the IP packet is "CE". If 362 more than one label is pushed at one time, the same value should be 363 placed in the EXP value of all label stack entries. 365 4.2. Pushing one or more labels onto an MPLS labelled packet 367 The EXP field is copied directly from the topmost label before the 368 push to the newly added outer label. If more than one label is being 369 pushed, the same EXP value is copied to all label stack entries. 371 4.3. Congestion experienced in an interior MPLS node 373 If the EXP codepoint of the packet maps to a PHB that uses ECN 374 marking and the marking algorithm requires the packet to be marked, 375 the CM state is set (irrespective of whether it is already in the CM 376 state). 378 If the buffer is full, a packet is dropped. 380 4.4. Crossing a Diffserv Domain Boundary 382 If an MPLS-encapsulated packet crosses a Diffserv domain boundary, it 383 may be the case that the two domains use different encodings of the 384 same PHB in the EXP field. In such cases, the EXP field must be 385 rewritten at the domain boundary. If the PHB is one that supports 386 ECN, then the appropriate ECN marking should also be preserved when 387 the EXP field is mapped at the boundary. 
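For reference, the push and marking rules of Sections 4.1 to 4.3 can be sketched in Python as follows. This is a simplified model under stated assumptions: it tracks only the congestion state (CM or Not-CM) of each label stack entry and ignores the PHB that the same EXP codepoint also encodes. All type and function names are hypothetical and are not defined by this draft.

```python
from enum import Enum


class IpEcn(Enum):
    NOT_ECT = "Not-ECT"
    ECT0 = "ECT(0)"
    ECT1 = "ECT(1)"
    CE = "CE"


class Exp(Enum):
    NOT_CM = "Not-CM"
    CM = "CM"


def push_onto_ip(ip_ecn, n_labels=1):
    """Section 4.1: CE maps to CM; Not-ECT, ECT(0) and ECT(1) map to Not-CM.
    Every pushed label stack entry carries the same state."""
    state = Exp.CM if ip_ecn is IpEcn.CE else Exp.NOT_CM
    return [state] * n_labels


def push_onto_mpls(stack, n_labels=1):
    """Section 4.2: copy the state of the topmost entry (stack[0])
    into every newly pushed entry."""
    return [stack[0]] * n_labels + list(stack)


def mark_on_congestion(stack, phb_uses_ecn, wants_mark):
    """Section 4.3: set CM on the top entry if the PHB uses ECN marking and
    the marking algorithm selects the packet, even if it is already CM."""
    if phb_uses_ecn and wants_mark:
        return [Exp.CM] + list(stack[1:])
    return list(stack)
```

Dropping a packet when the buffer is full is outside this sketch, since it does not change any marking state.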
389 If an MPLS-encapsulated packet that is in the CM state crosses from a 390 domain that is ECN-enabled (as defined in Section 3) to a domain that 391 is not ECN-enabled, then it is necessary to perform the egress 392 checking procedures at the egress LSR of the ECN-enabled domain. 393 This means that if the encapsulated packet is not ECN capable, the 394 packet MUST be dropped. Note that this implies the egress LSR must 395 be able to look beneath the MPLS header without popping the label 396 stack. 398 The related issue of Diffserv tunnel models is discussed in 399 Section 4.7. 401 4.5. Popping an MPLS label (not the end of the stack) 403 When a packet has more than one MPLS label in the stack and the top 404 label is popped, another MPLS label is exposed. In this case the ECN 405 information should be transferred from the outer EXP field to the 406 inner MPLS label in the following manner. If the inner EXP field is 407 Not-CM, the inner EXP field is set to the same CM or Not-CM state as 408 the outer EXP field. If the inner EXP field is CM, it remains 409 unchanged whatever the outer EXP field. Note that an inner value of 410 CM and an outer value of not-CM should be considered anomalous, and 411 SHOULD be logged in some way by the LSR. 413 4.6. Popping the last MPLS label in the stack 415 When the last MPLS label is popped from the packet, its payload is 416 exposed. If that packet is not IP, and does not have any capability 417 equivalent to ECT, it is assumed Not-ECT and treated as such. That 418 means that if the EXP value of the MPLS header was CM, the packet 419 MUST be dropped. 421 Assuming an IP packet was exposed, we have to examine whether that 422 packet is ECT or not. A Not-ECT packet MUST be dropped if the EXP 423 field is CM. 425 For the remainder of this section, we describe the behavior that is 426 required if the ECN information is to be transferred from the MPLS 427 header into the exposed IP header for onward transmission. 
As noted 428 in Section 1.3, such behavior is not mandated by this document, but 429 may be selected by an operator. 431 If the inner IP packet is Not-ECT, its ECN field remains unchanged if 432 the EXP field is Not-CM. If the ECN field of the inner packet is set 433 to ECT(0), ECT(1) or CE, the ECN field remains unchanged if the EXP 434 field is set to Not-CM. The ECN field is set to CE if the EXP field 435 is CM. Note that an inner value of CE and an outer value of not-CM 436 should be considered anomalous, and SHOULD be logged in some way by 437 the LSR. 439 4.7. Diffserv Tunneling Models 441 [RFC3270] describes three tunneling models for Diffserv support 442 across MPLS Domains, referred to as the uniform, short pipe, and pipe 443 models. The differences between these models lie in whether the 444 Diffserv treatment that applies to a packet while it travels along a 445 particular LSP is carried to the last hop of the LSP and beyond the 446 last hop. Depending on which mode is preferred by an operator, the 447 EXP value or DSCP value of an exposed header following a label pop 448 may or may not be dependent on the EXP value of the label that is 449 removed by the pop operation. We believe that in the case of ECN 450 marking, the use of these models should only apply to the encoding of 451 the Diffserv PHB in the EXP value, and that the choice of codepoint 452 for ECN should always be made based on the procedures described 453 above, independent of the tunneling model. 455 4.8. Extension to Pre-Congestion Notification 457 This section describes how the preceding mechanisms can be extended 458 to support PCN [I-D.briscoe-tsvwg-cl-architecture]. Our intent here 459 is to show that the mechanisms are readily extended to more complex 460 scenarios than ECN, but this section may be safely ignored if one is 461 interested only in supporting ECN. 
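As a baseline for the PCN extension, the ECN pop rules of Sections 4.5 and 4.6 can be summarised in a short Python sketch. It assumes that the operator has chosen to copy the marking into the exposed IP header, models the EXP state as a boolean (True for CM), and represents a non-IP payload with no ECT equivalent as None. The function names and return shapes are hypothetical.

```python
def pop_to_mpls(outer_cm, inner_cm):
    """Section 4.5: pop a label that exposes another MPLS label.
    Returns (new_inner_cm, anomalous)."""
    # An inner CM with an outer Not-CM is anomalous and should be logged.
    anomalous = inner_cm and not outer_cm
    # An inner CM is never cleared; an inner Not-CM takes the outer state.
    return (inner_cm or outer_cm, anomalous)


def pop_last_label(outer_cm, ip_ecn):
    """Section 4.6: pop the last label, exposing the payload.
    ip_ecn is "Not-ECT", "ECT(0)", "ECT(1)", "CE", or None for a non-IP
    payload. Returns (action, new_ip_ecn, anomalous)."""
    not_ect = ip_ecn is None or ip_ecn == "Not-ECT"
    if not_ect:
        if outer_cm:
            # A congestion-marked packet for a non-ECN-capable transport
            # must be dropped.
            return ("drop", None, False)
        return ("forward", ip_ecn, False)
    if outer_cm:
        # ECN-capable payload and CM in the popped label: set CE.
        return ("forward", "CE", False)
    # Not-CM leaves the ECN field unchanged; an inner CE with an outer
    # Not-CM is anomalous and should be logged.
    return ("forward", ip_ecn, ip_ecn == "CE")
```

Returning an explicit anomaly flag rather than logging inside the function keeps the sketch free of side effects; how an LSR logs the anomaly is left open by the text.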
463 The relevant aspects of PCN for the purposes of this discussion are: 465 o PCN uses 3 states rather than 2 for ECN - these are referred to as 466 admission marked (AM), pre-emption marked (PM) and not marked (NM) 467 states. (See Section 8.4 for further discussion of PCN and the 468 possibility of using fewer codepoints.) 470 o A packet can go from NM to AM, from NM to PM, or from AM to PM, 471 but no other transition is possible. 473 o Whereas ECN-capable packets are identified by the ECT value in the 474 IP header, PCN-capability is determined by the PHB of the packet. 476 Thus, to support PCN fully in an MPLS domain for a particular PHB, a 477 total of 3 codepoints need to be allocated for that PHB. These 3 478 codepoints represent the admission marked (AM), pre-emption marked 479 (PM) and not marked (NM) states. The procedures described above need 480 to be slightly modified to support this scenario. The following 481 procedures are invoked when the topmost DSCP or EXP value indicates a 482 PHB that supports PCN. 484 4.8.1. Label Push onto IP packet 486 If the IP packet header indicates AM, set the EXP value of all 487 entries in the label stack to AM. If the IP packet header indicates 488 PM, set the EXP value of all entries in the label stack to PM. For 489 any other marking of the IP header, set the EXP value of all entries 490 in the label stack to NM. 492 4.8.2. Pushing Additional MPLS Labels 494 The procedures of Section 4.2 apply. 496 4.8.3. Admission Control or Pre-emption Marking inside MPLS domain 498 The EXP value can be set to AM or PM according to the same procedures 499 as described in [I-D.briscoe-tsvwg-cl-phb]. For the purposes of this 500 document, it does not matter exactly what algorithms are used to 501 decide when to set AM or PM; all that matters is that if a router 502 would have marked AM (or PM) in the IP header, it should set the EXP 503 value in the MPLS header to the AM (or PM) codepoint. 505 4.8.4. 
Popping an MPLS Label (not end of stack) 507 When popping an MPLS Label exposes another MPLS label, the AM or PM 508 marking should be transferred to the exposed EXP field in the 509 following manner: if the inner EXP value is NM, then it should be set 510 to the same marking state as the EXP value of the popped label stack 511 entry. If the inner EXP value is AM, it should be unchanged if the 512 popped EXP value was AM, and it should be set to PM if the popped EXP 513 value was PM. If the popped EXP value was NM, this should be logged 514 in some way and the inner EXP value should be unchanged. If the 515 inner EXP value is PM, it should be unchanged whatever the popped EXP 516 value was, but any EXP value other than PM should be logged. 518 4.8.5. Popping the last MPLS Label to expose IP header 520 When popping the last MPLS Label exposes the IP header, there are two 521 cases to consider: 523 o the popping LSR is NOT the egress router of the PCN region, in 524 which case AM or PM marking should be transferred to the exposed 525 IP header field; or 527 o the popping LSR IS the egress router of the PCN region. 529 In the latter case, the behavior of the egress LSR is defined in 530 [I-D.briscoe-tsvwg-cl-architecture] and is beyond the scope of this 531 document. In the former case, the marking should be transferred from 532 the popped MPLS header to the exposed IP header as follows: if the 533 inner IP header value is neither AM nor PM, and the EXP value was NM, 534 then the IP header should be unchanged. For any other EXP value, the 535 IP header should be set to the same marking state as the EXP value of 536 the popped label stack entry. If the inner IP header value is AM, it 537 should be unchanged if the popped EXP value was AM, and it should be 538 set to PM if the popped EXP value was PM. If the popped EXP value 539 was NM, this should be logged in some way and the inner IP header 540 value should be unchanged. 
If the IP header value is PM, it should 541 be unchanged whatever the popped EXP value was, but any EXP value 542 other than PM should be logged. 544 5. ECN-disabled MPLS domain 546 If ECN is not enabled on all the egress LSRs of a domain, ECN MUST 547 NOT be enabled on any LSRs throughout the domain. If congestion is 548 experienced on any LSR in an ECN-disabled MPLS domain, packets MUST 549 be dropped, NOT marked. The exact algorithm for deciding when to 550 drop packets during congestion (e.g. tail-drop, RED, etc.) is a local 551 matter for the operator of the domain. 553 6. The use of more codepoints with E-LSPs and L-LSPs 555 RFC 3270 gives different options with E-LSPs and L-LSPs and some of 556 those could potentially provide ample EXP codepoints for ECN/PCN. 557 However, deploying L-LSPs vs E-LSPs has many implications such as 558 platform support and operational complexity. The above ECN/PCN MPLS 559 solution should provide some flexibility. If the operator has 560 deployed one L-LSP per PHB scheduling class, then EXP space will be a 561 non-issue and it could be used to achieve more sophisticated ECN/PCN 562 behavior if required. If the operator wants to stick to E-LSPs and 563 uses a handful of EXP codepoints for Diffserv, it may be desirable to 564 operate with a minimum number of extra ECN/PCN codepoints, even if 565 this comes with some compromise on ECN/PCN optimality. See Section 8 566 for discussion of some possible deployment scenarios. 568 7. Relationship to tunnel behavior in RFC 3168 570 [RFC3168] defines two modes of encapsulating ECN-marked IP packets 571 inside additional IP headers when tunnels are used. The two modes 572 are the "full functionality" and "limited functionality" modes. In 573 the full functionality mode, the ECT information from the inner 574 header is copied to the outer header at the tunnel ingress, but the 575 CE information is not. 
In the limited functionality mode, neither 576 ECT nor CE information is copied to the outer header, and thus ECN 577 cannot be applied to the encapsulated packet. 579 The behavior that is specified in Section 4 of this document 580 resembles the "full functionality" mode in the sense that it conveys 581 some information from inner to outer header, and in the sense that it 582 enables full ECN support along the MPLS LSP (which is analogous to an 583 IP tunnel in this context). However, it differs in one respect, which 584 is that the CE information is conveyed from the inner header to the 585 outer header. Our reason for this different design choice is to give 586 interior routers and LSRs more information about upstream marking in 587 multi-bottleneck cases. For instance, the flow pre-emption marking 588 mechanism proposed for PCN works by considering for marking only those 589 packets that have not already been marked upstream. Unless existing 590 pre-emption marking is copied from the inner to the outer header at 591 tunnel ingress, the mechanism does not pre-empt enough traffic in 592 cases where anomalous events hit multiple MPLS domains at once. 593 [RFC3168] does not give any reasons against conveying CE information 594 from the inner header to the outer in the "full functionality" mode. 595 So, rather than define different encapsulation methods for ECN and 596 PCN, Section 4 defines a common approach for both. 598 7.1. Alternative approach to support ECN in an MPLS domain 600 It is possible to define an approach for MPLS support of ECN that 601 more closely resembles that of the full functionality mode of 602 [RFC3168].
This approach would differ from that described in 603 Section 4 in the following ways: 605 o when pushing one or more MPLS labels onto an IP packet, the not-CM 606 state is set in the EXP field of all label stack entries 608 o when pushing one or more MPLS labels onto an MPLS packet, the 609 not-CM state is set in the EXP field of all newly added label 610 stack entries 612 o when popping an MPLS label and the exposed header is MPLS (i.e. 613 this is not the end of stack), the EXP field of the MPLS packet 614 should be set to CM if the popped label's EXP value was CM and 615 left unchanged otherwise 617 o when popping an MPLS label and the exposed header is IP, the IP 618 ECN field should be set to CE if the EXP value was CM and if the 619 IP header indicated that the packet was ECN capable. If the IP 620 header indicated not-ECT and the EXP value was CM, the packet MUST 621 be dropped. If the EXP value was not-CM, the ECN field in the IP 622 header is unchanged. 624 The advantages of this scheme over that described in Section 4 are 625 greater similarity to [RFC3168], and the ability to determine, at the 626 end of an LSP, that congestion either did or did not occur along that 627 LSP (since the initial state is always not-CM at the start of an 628 LSP). 630 A disadvantage of this approach is that exceptions to this rule are 631 necessary in cases where the marking process on LSRs needs to depend 632 on whether a packet has already suffered upstream marking. The 633 currently proposed pre-emption marking in PCN is an example where 634 such an exception would be necessary (see the discussion at the start 635 of Section 7). 637 8. Example Uses 639 8.1. RFC3168-style ECN 641 [RFC3168] proposes the use of ECN in TCP and introduces the use of 642 ECN-Echo and CWR flags in the TCP header for initialization. 
When the TCP 643 sender receives an ECN-Echo (ECE) ACK packet (that is, an 644 ACK packet with the ECN-Echo flag set in the TCP header), it 645 knows that congestion was encountered in the network on the path from 646 the sender to the receiver, and it responds accordingly (for example, 647 by not increasing the congestion window). 649 It would be possible to enable ECN in an MPLS domain for Diffserv 650 PHBs like AF and best effort that are expected to be used by TCP and 651 similar transports (e.g. DCCP [RFC4340]). Then end-to-end 652 congestion control in transports capable of understanding ECN would 653 be able to respond to approaching congestion on LSRs without having 654 to rely on packet discard to signal congestion. 656 8.2. ECN Co-existence with Diffserv E-LSPs 658 Many operators today have deployed Diffserv using the E-LSP approach 659 of [RFC3270]. In many cases the number of PHBs used is fewer than 8, 660 and hence there remain available codepoints in the EXP space. If an 661 operator wishes to support ECN for a single PHB, this can be 662 accomplished by simply allocating a second codepoint to the PHB for 663 the "CM" state of that PHB and retaining the old codepoint for the 664 "not-CM" state. An operator with only four deployed PHBs could of 665 course enable ECN marking on all those PHBs. It is easy to imagine 666 cases where some PHBs might benefit more from ECN than others - for 667 example, an operator might use ECN on a premium data service but not 668 on a PHB used for best effort internet traffic. 670 As an illustrative example of how the EXP field might be used in this 671 case, consider an operator who is using the aggregated 672 service classes described in [I-D.chan-tsvwg-diffserv-class-aggr]. 673 The operator may choose to support ECN only for the Assured Elastic Treatment 674 Aggregate, using the EXP codepoint 010 for the not-CM state and 011 675 for the CM state.
All other codepoints could be the same as in 676 [I-D.chan-tsvwg-diffserv-class-aggr]. Of course any other 677 combination of EXP values can be used according to the specific set 678 of PHBs and marking conventions used within that operator's network. 680 8.3. Congestion-feedback-based Traffic Engineering 682 Shayman's traffic engineering [Shayman] proposed the use of ECN by an 683 egress LSR feeding back congestion to an ingress LSR to mitigate 684 congestion by employing dynamic traffic engineering techniques such 685 as shifting flows to an alternate path. It proposed a new RSVP 686 TUNNEL CONGESTION message which was sent to the ingress LSR and 687 ignored by transit LSRs. 689 8.4. PCN flow admission control and flow pre-emption 691 [I-D.briscoe-tsvwg-cl-architecture] proposes using pre-congestion 692 notification (PCN) on routers within an edge-to-edge Diffserv region 693 to control admission of new flows to the region and, if necessary, to 694 pre-empt existing flows in response to disasters and other anomalous 695 routing events. In this approach, the current level of PCN marking 696 is picked up by the signalling used to initiate each flow in order to 697 inform the admission control decision for the whole region at once. 698 As an example, a minor extension to RSVP signalling has been proposed 699 [I-D.lefaucheur-rsvp-ecn] to carry this message, but a similar 700 approach has also been proposed that uses NSIS signalling 701 [I-D.ietf-nsis-rmd]. 703 If it is possible for LSRs to signify congestion in MPLS, PCN marking 704 could be used for admission control and flow pre-emption across a 705 Diffserv region, irrespective of whether it contained pure IP 706 routers, MPLS LSRs, or both. Indeed, the solution could be somewhat 707 more efficient to implement if aggregates could identify themselves 708 by their MPLS label. Section 4.8 describes the mechanisms by which 709 the necessary markings for PCN could be carried in the MPLS header. 
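As a rough illustration of those mechanisms, the rule for popping a label to expose another label (not end of stack) can be sketched in Python as follows. The names PcnMark and merge_on_pop are hypothetical and are not defined by this document. The sketch assumes that the rule can be summarized as "the exposed EXP field ends up with the more severe of the two markings, and a popped marking that is less severe than the inner one should be logged", which is one reading of the text in Section 4.8 rather than a normative statement.

```python
from enum import IntEnum
from typing import Tuple


class PcnMark(IntEnum):
    """PCN marking states ordered by severity (hypothetical names)."""
    NM = 0  # not marked
    AM = 1  # admission marked
    PM = 2  # pre-emption marked


def merge_on_pop(popped: PcnMark, inner: PcnMark) -> Tuple[PcnMark, bool]:
    """Return (new inner marking, whether the event should be logged).

    Assumption: the pop rule is equivalent to taking the more severe
    marking of the popped and the exposed label stack entries.
    """
    # The exposed EXP field keeps the more severe of the two markings.
    result = max(popped, inner)
    # A popped marking less severe than the inner one is unexpected,
    # so the rule asks for it to be logged in some way.
    should_log = popped < inner
    return result, should_log
```

For instance, merge_on_pop(PcnMark.NM, PcnMark.AM) leaves the inner marking at AM and flags the event for logging, while merge_on_pop(PcnMark.PM, PcnMark.NM) raises the inner marking to PM without logging.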
711 As an illustrative example of how the EXP field might be used in this 712 case, consider an operator who is using the aggregated 713 service classes described in [I-D.chan-tsvwg-diffserv-class-aggr]. 714 The operator may choose to support PCN only for the Real Time Treatment 715 Aggregate, using the EXP codepoint 100 for the not-marked (NM) state, 716 101 for the Admission Marked (AM) state, and 111 for the Pre-emption 717 Marked (PM) state. All other codepoints could be the same as in 718 [I-D.chan-tsvwg-diffserv-class-aggr]. Of course any other 719 combination of EXP values can be used according to the specific set 720 of PHBs and marking conventions used within that operator's network. 722 It might also be possible to deploy a similar solution using PCN 723 marking over MPLS for admission control alone, or for flow 724 pre-emption alone, particularly if codepoint space were at a premium in 725 the MPLS EXP field. However, the feasibility of deploying one 726 without the other would require further study. 728 9. Deployment Considerations 730 9.1. Marking non-ECN Capable Packets 732 What are the consequences of marking a packet that is not ECN-capable? 733 Even if it will be dropped before leaving the domain, doesn't this 734 consume resources unnecessarily? 736 The problem only arises if there is congestion downstream of an 737 earlier congested node. It might be that marked packets are carried 738 through this second congested router even though, according to the underlying IP 739 header, they are not ECN capable, so they will be dropped later. Such 740 packets might cause other packets to be marked (or dropped) that 741 would not otherwise have been. 743 We decided to use the per-domain ECT checking approach because it 744 would become optimal as ECN deployment became prevalent. The 745 situation where traffic is carried beyond a congested LSR only to be 746 dropped later should become less prevalent as more transports use 747 ECN.
This is why we chose not to use the [Floyd] alternative which 748 introduced a low but persistent level of unnecessary packet drop for 749 all time. Although that scheme did not carry droppable traffic to 750 the edge of the MPLS domain, we felt this was a small price to pay, 751 and it was anyway only of concern until ECN had become more widely 752 deployed. 754 A partial solution would be to preferentially drop packets arriving 755 at a congested router that were already marked. There is no solution 756 to the problem of marking a packet when congestion is caused by 757 another packet that should have been dropped. However, the chance of 758 such an occurrence is very low and the consequences are not 759 significant. It merely causes an application to very occasionally 760 slow down its rate when it did not have to. 762 9.2. Non-ECN capable routers in an MPLS Domain 764 What if an MPLS domain wants to use ECN, but not all legacy routers 765 are able to support it? 766 If the legacy router(s) are used in the interior, this is not a 767 problem. They will simply have to drop the packets if they are 768 congested, rather than mark them, which is the standard behavior for 769 IP routers that are not ECN-enabled. 771 If the legacy router were used as an egress router, it would not be 772 able to check the ECN capability of the transport correctly. An 773 operator in this position would not be able to use this solution and 774 therefore MUST NOT enable ECN unless all egress routers are ECN- 775 capable. 777 10. IANA Considerations 779 This document makes no request of IANA. 781 Note to RFC Editor: this section may be removed on publication as an 782 RFC. 784 11. Security Considerations 786 We believe no new vulnerabilities are introduced by this draft. 788 We have considered whether malicious sources might be able to exploit 789 the fact that interior LSRs will mark packets that are Not-ECT, 790 relying on their egress LSR to drop them. 
Although this might allow 791 sources to engineer a situation where more traffic is carried across 792 an MPLS domain than should be, we concluded that, even if we had not 793 introduced this feature, these sources would have been able to 794 prevent these LSRs from dropping this traffic anyway, simply by setting 795 ECT in the first place. 797 An ECN sender can use the ECN nonce [RFC3540] to detect a misbehaving 798 receiver. The ECN nonce works correctly across an MPLS domain 799 without requiring any specific support from the proposal in this 800 draft. The nonce does not need to be present in the MPLS shim 801 header. As long as the nonce is present in the IP header when the 802 ECN information is copied from the last MPLS shim header, it will be 803 overwritten if congestion has been experienced by an LSR. This is 804 all that is necessary for the sender to detect a misbehaving 805 receiver. 807 An alternative proposal currently in progress in the IETF 808 [I-D.briscoe-tsvwg-re-ecn-tcp] allows the network to prevent 809 misbehavior by senders or receivers or other routers. Like the ECN 810 nonce, it works correctly without requiring any specific support from 811 the proposal in this draft. It uses a bit in the IP header (the RE 812 bit) which is set by the sender and never changed along the path; it 813 is only read by certain policing elements in the network. There is 814 no need for a copy of this bit in the MPLS shim, as policing nodes 815 can examine the IP header if they need to, particularly given that they 816 are intended to be necessary only at domain borders where MPLS 817 headers are often removed. 819 12. Acknowledgements 821 Thanks to K.K. Ramakrishnan and Sally Floyd for getting us thinking 822 about this in the first place and for providing advice on tunneling 823 of ECN packets, and to Joe Babiarz, Ben Niven-Jenkins, Phil Eardley, 824 and Ruediger Geib for their comments on the draft. 826 13. References 828 13.1.
Normative References 830 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 831 Requirement Levels", BCP 14, RFC 2119, March 1997. 833 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 834 and W. Weiss, "An Architecture for Differentiated 835 Services", RFC 2475, December 1998. 837 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 838 Label Switching Architecture", RFC 3031, January 2001. 840 [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., 841 Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack 842 Encoding", RFC 3032, January 2001. 844 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 845 of Explicit Congestion Notification (ECN) to IP", 846 RFC 3168, September 2001. 848 [RFC3260] Grossman, D., "New Terminology and Clarifications for 849 Diffserv", RFC 3260, April 2002. 851 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, 852 P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi- 853 Protocol Label Switching (MPLS) Support of Differentiated 854 Services", RFC 3270, May 2002. 856 13.2. Informative References 858 [Floyd] "A Proposal to Incorporate ECN in MPLS", 1999. 860 Work in progress. http://www.icir.org/floyd/papers/ 861 draft-ietf-mpls-ecn-00.txt 863 [I-D.briscoe-tsvwg-cl-architecture] 864 Briscoe, B., "An edge-to-edge Deployment Model for Pre- 865 Congestion Notification: Admission Control over a 866 DiffServ Region", draft-briscoe-tsvwg-cl-architecture-03 867 (work in progress), June 2006. 869 [I-D.briscoe-tsvwg-cl-phb] 870 Briscoe, B., "Pre-Congestion Notification marking", 871 draft-briscoe-tsvwg-cl-phb-02 (work in progress), 872 June 2006. 874 [I-D.briscoe-tsvwg-re-ecn-border-cheat] 875 Briscoe, B., "Emulating Border Flow Policing using Re-ECN 876 on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 877 (work in progress), June 2006. 
879 [I-D.briscoe-tsvwg-re-ecn-tcp] 880 Briscoe, B., "Re-ECN: Adding Accountability for Causing 881 Congestion to TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-02 882 (work in progress), June 2006. 884 [I-D.chan-tsvwg-diffserv-class-aggr] 885 Chan, K., "Aggregation of DiffServ Service Classes", 886 draft-chan-tsvwg-diffserv-class-aggr-03 (work in 887 progress), January 2006. 889 [I-D.ietf-nsis-rmd] 890 Bader, A., "RMD-QOSM - The Resource Management in Diffserv 891 QOS Model", draft-ietf-nsis-rmd-07 (work in progress), 892 June 2006. 894 [I-D.lefaucheur-rsvp-ecn] 895 Le Faucheur, F., "RSVP Extensions for Admission Control over 896 Diffserv using Pre-congestion Notification (PCN)", 897 draft-lefaucheur-rsvp-ecn-01 (work in progress), 898 June 2006. 900 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 901 Congestion Notification (ECN) Signaling with Nonces", 902 RFC 3540, June 2003. 904 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 905 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 907 [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain", 908 2000. 910 Work in progress. http://www.ee.umd.edu/~shayman/papers.d/ 911 draft-shayman-mpls-ecn-00.txt 913 Authors' Addresses 915 Bruce Davie 916 Cisco Systems, Inc. 917 1414 Mass. Ave. 918 Boxborough, MA 01719 919 USA 921 Email: bsd@cisco.com 923 Bob Briscoe 924 BT Research 925 B54/77, Sirius House 926 Adastral Park 927 Martlesham Heath 928 Ipswich 929 Suffolk IP5 3RE 930 United Kingdom 932 Email: bob.briscoe@bt.com 934 June Tay 935 BT Research 936 B54/77, Sirius House 937 Adastral Park 938 Martlesham Heath 939 Ipswich 940 Suffolk IP5 3RE 941 United Kingdom 943 Email: june.tay@bt.com 945 Full Copyright Statement 947 Copyright (C) The Internet Society (2006). 949 This document is subject to the rights, licenses and restrictions 950 contained in BCP 78, and except as set forth therein, the authors 951 retain all their rights.
953 This document and the information contained herein are provided on an 954 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 955 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 956 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 957 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 958 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 959 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 961 Intellectual Property 963 The IETF takes no position regarding the validity or scope of any 964 Intellectual Property Rights or other rights that might be claimed to 965 pertain to the implementation or use of the technology described in 966 this document or the extent to which any license under such rights 967 might or might not be available; nor does it represent that it has 968 made any independent effort to identify any such rights. Information 969 on the procedures with respect to rights in RFC documents can be 970 found in BCP 78 and BCP 79. 972 Copies of IPR disclosures made to the IETF Secretariat and any 973 assurances of licenses to be made available, or the result of an 974 attempt made to obtain a general license or permission for the use of 975 such proprietary rights by implementers or users of this 976 specification can be obtained from the IETF on-line IPR repository at 977 http://www.ietf.org/ipr. 979 The IETF invites any interested party to bring to its attention any 980 copyrights, patents or patent applications, or other proprietary 981 rights that may cover technology that may be required to implement 982 this standard. Please address the information to the IETF at 983 ietf-ipr@ietf.org. 985 Acknowledgment 987 Funding for the RFC Editor function is provided by the IETF 988 Administrative Support Activity (IASA).