Transport Area Working Group                                  B. Briscoe
Internet-Draft                                                        BT
Intended status: Standards Track                           June 30, 2007
Expires: January 1, 2008


            Layered Encapsulation of Congestion Notification
                   draft-briscoe-tsvwg-ecn-tunnel-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 1, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document redefines how the explicit congestion notification
   (ECN) field of the outer IP header of a tunnel should be constructed.
   It brings all IP in IP tunnels (v4 or v6) into line with the way
   IPsec tunnels now construct the ECN field, ensuring that the outer
   header reveals any congestion experienced so far on the path. It
   specifies the default ECN tunnelling behaviour for any Diffserv
   per-hop behaviour (PHB), but also gives general principles to guide
   the design of alternate congestion marking behaviours for specific
   PHBs and for lower layer congestion notification schemes.

Table of Contents

   1. Introduction
   2. Requirements notation
   3. Design Constraints
      3.1. Security Constraints
      3.2. Control Constraints
      3.3. Management Constraints
   4. Design Principles
   5. Default ECN Tunnelling Rules
   6. Backward Compatibility
   7. Changes from Earlier RFCs
   8. IANA Considerations
   9. Security Considerations
   10. Conclusions
   11. Acknowledgements
   12. Comments Solicited
   13. References
      13.1. Normative References
      13.2. Informative References
   Appendix A. In-path Load Regulation
   Author's Address
   Intellectual Property and Copyright Statements

1. Introduction

   This document redefines how the explicit congestion notification
   (ECN) field [RFC3168] of the outer IP header of a tunnel should be
   constructed. It brings all IP in IP tunnels (v4 or v6) into line
   with the way IPsec tunnels [RFC4301] now construct the ECN field,
   ensuring that the outer header reveals any congestion experienced so
   far on the path. Although this memo focuses on IP in IP tunnelling
   it also gives generalised advice for any encapsulation by lower layer
   headers.

   ECN allows a congested resource to notify the onset of congestion
   without having to drop packets, by explicitly marking a proportion of
   packets with the congestion experienced (CE) codepoint. Congestion
   notification is unusual in that it propagates from the physical layer
   upwards to the transport layer, because congestion is exhaustion of a
   physical resource. The transport layer can directly detect loss of a
   packet (or frame) by a lower layer. But if a lower layer marks a
   packet (or frame) to notify incipient congestion, this marking has to
   be explicitly copied up the layers at every header decapsulation.
   So, at each decapsulation of an outer (lower layer) header a
   congestion marking has to be arranged to propagate into the forwarded
   (upper layer) header. It must continue upwards until it reaches the
   destination transport, which should feed congestion notification back
   to the source transport.

   Note that often lower layer resources are arranged to be protected by
   higher layer buffers, so instead of blocking occurring at the lower
   layer, it occurs when the higher layer queue overflows.
   Thus, non-blocking link and physical layer technologies do not have
   to implement congestion notification, which can be introduced solely
   in IP layer active queue management (AQM). However, if we want to use
   congestion notification, we have to arrange for it to be explicitly
   copied up the layers when IP is tunnelled in IP (and if a particular
   link layer technology isn't protected from blocking by network layer
   queues).

   IPsec tunnel mode is a specific form of tunnelling that can hide the
   inner headers. Because the ECN field has to be mutable, it cannot be
   covered by IPsec encryption or authentication calculations.
   Therefore concern has been raised in the past that the ECN field
   could be used as a low bandwidth covert channel to communicate with
   someone on the unprotected public Internet even if an end-host is
   restricted to only communicate with the public Internet through an
   IPsec gateway. However, the recently updated version of IPsec
   [RFC4301] chose not to block this covert channel, deciding that the
   threat could be managed given the channel bandwidth is so limited
   (ECN is a 2-bit field).

   An unfortunate sequence of standards actions leading up to this
   latest change in IPsec has left us with nearly the worst of all
   possible combinations of outcomes, despite the best endeavours of
   everyone concerned. Even though information about congestion
   experienced on the upstream path has various uses if it is revealed
   in the outer header of a tunnel, when ECN was standardised [RFC3168]
   it was decided that all IP in IP tunnels should hide upstream
   congestion information simply to avoid the extra complexity of two
   different mechanisms for IPsec and non-IPsec tunnels. However, now
   that [RFC4301] IPsec tunnels deliberately no longer hide this
   information, we are left in the perverse position where non-IPsec
   tunnels still hide congestion information unnecessarily.
   This document is designed to correct that anomaly.

   Specifically, RFC3168 says that, if a tunnel supports ECN (termed a
   'full-functionality' ECN tunnel), the tunnel ingress must not copy a
   CE marking from the inner header into the outer header that it
   creates. Instead the tunnel ingress has to set the ECN field of the
   outer header to ECT(0) (i.e. codepoint 10). We term this 'resetting'
   a CE codepoint. However, RFC4301 reverses this, stating that the
   tunnel ingress must simply copy the ECN field from the inner to the
   outer header. The main purpose of this document is to carry over
   this new relaxed attitude to covert channels from IPsec to all IP in
   IP tunnels, so all tunnel ingress nodes consistently copy the ECN
   field.

   The rest of the document deals with the knock-on effects of this
   apparently minor change. It is organised as follows:

   o  S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to
      'switch in' different behaviours for marking the ECN field, just
      as it switches in different per-hop behaviours (PHBs) for
      scheduling. Therefore we cannot only discuss the ECN protocol
      that RFC3168 gives as a default. We need to also give guidance
      for possible different marking schemes. Therefore in Section 3 we
      lay out the design constraints when tunnelling congestion
      notification.

   o  Then in Section 4 we resolve the tensions between these
      constraints to give general design principles on how a tunnel
      should process congestion notification; principles that could
      apply to any marking behaviour for any PHB, not just the default
      in RFC3168. In particular, we examine the underlying principles
      behind whether CE should be reset or copied into the outer header
      at the ingress to a tunnel--or indeed at the ingress of any
      layered encapsulation of headers with congestion notification
      fields.

   o  Section 5 then confirms the precise rules for the default ECN
      tunnelling behaviour based on the above design principles. These
      rules apply to all PHBs, unless stated otherwise in the
      specification of a PHB. There is no requirement for a PHB to
      state anything about ECN behaviour if the default behaviour is
      sufficient.

   o  Extending the new IPsec tunnel ingress behaviour to all IP in IP
      tunnels causes one further knock-on effect that is dealt with in
      Section 6 on Backward Compatibility. If one end of an IPsec
      tunnel is compliant with [RFC4301], assuming IKEv2 key management
      is used, the other end can be guaranteed to also be [RFC4301]
      compliant. So there is no backward compatibility problem with
      IKEv2 RFC4301 IPsec tunnels. But once we extend our scope to any
      IP in IP tunnel, we have to cater for the possibility that a
      tunnel ingress compliant with this specification is sending to an
      egress that doesn't even understand ECN (e.g. a legacy [RFC2003]
      tunnel egress). If a tunnel ingress copied incoming ECN-capable
      headers into outer headers, then a legacy tunnel egress would
      discard any congestion markings added to the outer header within
      the tunnel. ECN-capable traffic sources would not see any
      congestion feedback and instead continually ratchet up their share
      of the bandwidth without realising that cross-flows from other ECN
      sources were continually having to ratchet down.

   The scope of this document is all IP in IP tunnelling, irrespective
   of whether IPv4 or IPv6 is used for either of the inner and outer
   headers. The document only concerns wire protocol processing at
   tunnel endpoints and makes no changes or recommendations concerning
   algorithms for congestion marking or congestion response.
   The general design principles of Section 4 may also be useful when
   any datagram/packet/frame with a congestion notification capability
   is encapsulated by a connectionless outer header [BBnet] that might
   also support a congestion notification capability in the future as
   discussed in S.9.3 of [RFC3168] (e.g. IP encapsulated in L2TP
   [RFC2661], GRE [RFC1701] or PPTP [RFC2637]). However, of course, the
   IETF does not have standards authority over every link or tunnel
   protocol, so this document focuses only on IP in IP.
   [I-D.ietf-tsvwg-ecn-mpls] applies these principles to IP in MPLS and
   to MPLS in MPLS.

2. Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3. Design Constraints

   Tunnel processing of a congestion notification field has to meet
   congestion control needs without creating new information security
   vulnerabilities (if information security is required).

3.1. Security Constraints

   Information security can be assured by using various end to end
   security solutions (including IPsec in transport mode [RFC4301]), but
   a commonly used scenario involves the need to communicate between two
   physically protected domains across the public Internet. In this
   case there are certain management advantages to using IPsec in tunnel
   mode solely across the publicly accessible part of the path. The
   path followed by a packet then crosses security 'domains'; the ones
   protected by physical or other means before and after the tunnel and
   the one protected by an IPsec tunnel across the otherwise unprotected
   domain.
   We will use the scenario in Figure 1 where endpoints 'A' and
   'B' communicate through a tunnel with ingress 'I' and egress 'E'
   within physically protected edge domains across an unprotected
   internetwork where there may be 'men in the middle', 'M'.

          physically       unprotected     physically
      <-protected domain-><--domain--><-protected domain->
      +------------------+            +------------------+
      |                  |      M     |                  |
      |    A-------->I=========>==========>E-------->B   |
      |                  |            |                  |
      +------------------+            +------------------+
                     <----IPsec secured---->
                              tunnel

                   Figure 1: IPsec Tunnel Scenario

   IPsec encryption is typically used to prevent 'M' seeing messages
   from 'A' to 'B'. IPsec authentication is used to prevent 'M'
   masquerading as the sender of messages from 'A' to 'B' or altering
   their contents. But 'I' can also use IPsec tunnel mode to allow 'A'
   to communicate with 'B', but impose encryption to prevent 'A' leaking
   information to 'M'. Or 'E' can insist that 'I' uses tunnel mode
   authentication to prevent 'M' communicating information to 'B'.
   Mutable IP header fields such as the ECN field (as well as the
   TTL/Hop Limit and DS fields) cannot be included in the cryptographic
   calculations of IPsec. Therefore, if 'I' encrypts but copies these
   mutable fields into the outer header that is exposed across the
   tunnel it will have allowed a covert channel from 'A' to 'M'. And if
   'E' copies these fields from the outer header to the inner, even if
   it validates authentication from 'I', it will have allowed a covert
   channel from 'M' to 'B'.

   ECN at the IP layer is designed to carry information about congestion
   from a congested resource to some downstream node that will feed the
   information back somehow to the point upstream of the congestion that
   can regulate the load on the congested resource.
   In terms of the above scenario, ECN is effectively intended to create
   an information channel from 'M' to 'B', for 'B' to forward to 'A'.
   Therefore the goals of IPsec and ECN are mutually incompatible.

   With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
   "controls are provided to manage the bandwidth of this [covert]
   channel". Using the ECN processing rules of RFC4301, the channel
   bandwidth is two bits per datagram from 'A' to 'M' and one bit per
   datagram from 'M' to 'A' because 'E' limits the combinations it will
   copy. In both cases the covert channel bandwidth is further reduced
   by noise from any real congestion marking. RFC4301 therefore implies
   that these covert channels are sufficiently limited to be considered
   a manageable threat. However, with respect to the larger (6b) DS
   field, the same section of RFC4301 says not copying is the default,
   but a configuration option can allow copying "to allow a local
   administrator to decide whether the covert channel provided by
   copying these bits outweighs the benefits of copying". Of course, an
   administrator considering copying of the DS field has to take into
   account that it could be concatenated with the ECN field giving an 8b
   per datagram channel.

3.2. Control Constraints

   Congestion control requires that any congestion notification marked
   into packets by a resource will be able to traverse a feedback loop
   back to a node capable of controlling the load on that resource. To
   avoid ambiguity later rather than calling this node the data source
   we will call it the Load Regulator. This will allow us to deal with
   exceptional cases where load is not regulated by the data source, but
   usually the two will be synonymous. Note the term "a node _capable
   of_ controlling the load" deliberately includes a source application
   that doesn't actually control the load but ought to (e.g.
   an application without congestion control that uses UDP).

                A--->R--->I=========>M=========>E-------->B

                   Figure 2: Simple Tunnel Scenario

   We now consider a similar tunnelling scenario to the IPsec one just
   described, but without the different security domains so we can just
   focus on ensuring the control loop and management monitoring can work
   (Figure 2). If we want resources in the tunnel to be able to
   explicitly notify congestion and the feedback loop is from 'B' to
   'A', it will certainly be necessary for 'E' to copy any CE marking
   from the outer header to the inner header for onward transmission to
   'B', otherwise congestion notification from resources like 'M' cannot
   be fed back to the Load Regulator ('A'). But it doesn't seem
   necessary for 'I' to copy CE markings from the inner to the outer
   header. For instance, if resource 'R' is congested, it can send
   congestion information to 'B' using the congestion field in the inner
   header without 'I' copying the congestion field into the outer header
   and 'E' copying it back to the inner header. 'E' can then write any
   additional congestion marking introduced across the tunnel into the
   congestion field of the inner header.

   Indeed, this arrangement can be extended to multi-level congestion
   marking (such as that proposed for PCN [PCN-arch]) as long as all the
   marks have unambiguously ranked values. For instance, if a
   hypothetical multi-level marking scheme for PCN had PCN-capable
   codepoints ranked 1, 2 and 3, then, if 'I' reset the outer congestion
   field to the lowest ranked value that is PCN-capable (1), 'E' would
   simply write the highest ranked of the inner and outer congestion
   markings into the forwarded header. For instance, if the inner
   marking on arrival at 'I' was 3 and 'I' reset the outer to 1, but 'M'
   subsequently set it to 2, then the header forwarded by 'E' would be
   max(3,2) = 3.
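
   The multi-level example above can be sketched in a few lines of
   Python. This is only an illustration: the ranks 1 to 3 are the
   hypothetical PCN-capable codepoints from the text, and the names
   ingress_reset and egress_merge are invented here rather than taken
   from any specification.

```python
# Hypothetical multi-level marking scheme: a higher rank means more
# severe congestion. Rank 1 is the lowest rank that is still PCN-capable.
LOWEST_CAPABLE_RANK = 1


def ingress_reset(inner_rank: int) -> int:
    """Outer marking written by a resetting ingress 'I'.

    The inner marking is deliberately ignored: a resetting ingress always
    starts the outer header at the lowest capable rank.
    """
    return LOWEST_CAPABLE_RANK


def egress_merge(inner_rank: int, outer_rank: int) -> int:
    """Marking forwarded by egress 'E': the higher ranked of the two."""
    return max(inner_rank, outer_rank)


# Worked example from the text: the inner marking arrives at 'I' as 3,
# 'I' resets the outer to 1, 'M' raises the outer to 2, and 'E' forwards
# max(3, 2).
inner = 3
outer = ingress_reset(inner)
outer = max(outer, 2)  # 'M' marks the outer header up to rank 2
forwarded = egress_merge(inner, outer)
print(forwarded)  # 3
```

   Because the egress takes the maximum, the forwarded marking never
   falls below the inner marking, so upstream congestion is preserved
   even though the ingress reset the outer field.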

   It might be useful for the tunnel egress to be able to tell whether
   congestion occurred across a tunnel or upstream of it. If outer
   header congestion marking was reset at the tunnel ingress ('I'), by
   the end of a tunnel ('E') the outer headers would indicate congestion
   experienced across the tunnel ('I' to 'E'), while the inner header
   would indicate congestion upstream of 'I'. But the same information
   could be gleaned even if the tunnel ingress copied the inner to the
   outer headers. By the end of the tunnel ('E'), any packet with an
   _extra_ mark in the outer header relative to the inner header would
   indicate congestion across the tunnel ('I' to 'E'), while the inner
   header would still indicate congestion upstream of 'I'.

   All this shows that 'E' can preserve the control loop irrespective of
   whether 'I' copies congestion notification into the outer header or
   resets it.

3.3. Management Constraints

   As well as control, there are also management constraints.
   Specifically, a management system may monitor congestion markings in
   passing packets, perhaps at the border between networks as part of a
   service level agreement. For instance, monitors at the borders of
   autonomous systems may need to measure how much congestion has
   accumulated since the original source to determine between them how
   much of the congestion is contributed by each domain.

   Therefore it should be clear how far back in the path the congestion
   markings have accumulated from. In this document we term this the
   baseline of the congestion marking, i.e. the source of the layer that
   last reset rather than copied the congestion notification field when
   creating an outer header. Given some tunnels cross domain borders
   (e.g. if 'M' in Figure 2 is monitoring a border), it is therefore
   desirable for 'I' to copy congestion accumulated so far into the
   outer headers exposed across the tunnel.
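
   The comparison of inner and outer headers described at the start of
   this passage, for the case where 'I' copies the ECN field, might be
   sketched as follows. The helper names extra_mark_in_tunnel and
   tunnel_marking_fraction are hypothetical, and the sample packets are
   made-up values chosen only to exercise the code.

```python
from enum import Enum


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field [RFC3168]."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def extra_mark_in_tunnel(inner: ECN, outer: ECN) -> bool:
    """True if the outer header carries a CE mark that the inner lacks."""
    return outer is ECN.CE and inner is not ECN.CE


def tunnel_marking_fraction(packets) -> float:
    """Fraction of (inner, outer) pairs seen at 'E' with an extra mark."""
    packets = list(packets)
    if not packets:
        return 0.0
    extra = sum(1 for inner, outer in packets
                if extra_mark_in_tunnel(inner, outer))
    return extra / len(packets)


# Illustrative (inner, outer) pairs as they might arrive at 'E'.
sample = [
    (ECN.ECT0, ECN.ECT0),  # no congestion seen anywhere
    (ECN.ECT0, ECN.CE),    # marked within the tunnel
    (ECN.CE, ECN.CE),      # marked upstream of 'I' and copied by 'I'
    (ECN.ECT0, ECN.ECT0),
]
print(tunnel_marking_fraction(sample))  # 0.25
```

   A packet that was already CE-marked upstream of 'I' cannot show a
   further mark, so a count like this slightly under-reports congestion
   within the tunnel when upstream marking is heavy.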

   Appendix A discusses various scenarios where the Load Regulator lies
   in-path, not at the source host as we would typically expect. It
   concludes that the baseline for congestion notification should be
   determined by where the Load Regulator function is, whether it is at
   the source host or within the path. Therefore every tunnel ingress
   should copy the ECN field into the outer header it creates unless it
   is also a Load Regulator, in which case it should reset any CE
   markings, which is an exception to the normal copying rule for a
   tunnel ingress.

4. Design Principles

   The constraints from the three perspectives of security, control and
   management in Section 3 are somewhat in tension as to whether a
   tunnel ingress should copy congestion markings into the outer header
   it creates or reset them. From the control perspective either
   copying or resetting works. From the management perspective copying
   is preferable (with the exception of an in-path load regulator).
   From the security perspective resetting is preferable but copying is
   now considered acceptable given the bandwidth of a 2-bit covert
   channel can be managed.

   Therefore an outer encapsulating header capable of carrying
   congestion markings SHOULD reflect accumulated congestion since the
   last interface designed to regulate load (the Load Regulator). This
   implies congestion notification SHOULD be copied into the outer
   header of each new encapsulating header that supports it--except at
   an in-path Load Regulator. An in-path Load Regulator knows its
   function is to regulate load, so if it also acts as the ingress to a
   tunnel, in every new outer header it creates it MUST reset any
   congestion marking.

   The Load Regulator is the node to which congestion feedback should be
   returned by the next downstream node with a transport layer function
   (typically but not always the data receiver).
   The Load Regulator is not always (or even typically) the same thing
   as the node identified by the source address of the outermost exposed
   header. In general the addressing of the outermost encapsulation
   header says nothing about the identifiers of either the upstream or
   the downstream transport layer functions. As long as the transport
   functions know each other's addresses, they don't have to be
   identified in the network layer or in any link layer. It was only a
   convenience that a TCP receiver assumed that the address of the
   source transport is the same as the network layer source address of a
   packet it receives.

   More generally, the return transport address could be identified
   solely in the transport layer protocol. For instance, a signalling
   protocol like RSVP [RFC2205] breaks up a path into transport layer
   hops and informs each hop of the address of its transport layer
   neighbour without any need to identify these hops in the network
   layer. RSVP can be arranged so that these transport layer hops are
   bigger than the underlying network layer hops. The host identity
   protocol (HIP) architecture [RFC4423] also supports the same
   principled separation (for mobility amongst other things), where the
   transport layer receiver identifies the transport layer sender using
   an identifier provided by the transport layer, which gets mapped to a
   network layer address below the transport layer.

   Note that this principle deliberately doesn't require a packet header
   to reveal the origin address of the baseline that congestion
   notification has accumulated from. It is not necessary for the
   network and lower layers to know the address of the Load Regulator.
   Only the destination transport needs to know that. With congestion
   notification, the network and link layers only notify congestion
   forwards, they aren't involved in feeding it backwards. If they are,
   e.g.
   backward congestion notification (BCN) in Ethernet [802.1au],
   that should be considered as a transport function added to the lower
   layer, which must sort out its own addressing. Indeed, this is one
   reason why ICMP source quench is now deprecated [RFC1254]; when
   congestion occurs within a tunnel it is complex (particularly in the
   case of IPsec tunnels) to return the ICMP messages beyond the tunnel
   ingress back to the Load Regulator.

   Similarly, if a management system is monitoring congestion and needs
   to know the baseline of congestion notification, the management
   system has to find this out from the transport; in general it cannot
   tell solely by looking at the network or link layer headers.

   We have said that a tunnel ingress that is not a Load Regulator
   SHOULD (as opposed to MUST) copy incoming congestion notification
   into an outer encapsulating header that supports it. In the case of
   2-bit ECN, the IETF security area have deemed the benefit always
   outweighs the risk. Therefore for 2-bit ECN we can and we will say
   'MUST' (Section 5). But in this section where we are setting down
   general design principles, we leave it as a 'SHOULD'. This allows
   for future multi-bit congestion notification fields where the risk
   from the covert channel created by copying congestion notification
   might outweigh the congestion control benefit of copying.

5. Default ECN Tunnelling Rules

   The following ECN tunnel processing rules are the default for a
   packet with any DSCP. If required, different ECN processing rules
   MAY be defined for the appropriate Diffserv PHB using the guidelines
   in Section 4.

   When a tunnel ingress creates an encapsulating IP header, the 2-bit
   ECN field of the inner IP header MUST be copied into the outer IP
   header, for all types of IP in IP tunnel (except if the tunnel
   ingress is in compatibility mode--see Section 6).
   If the tunnel ingress is also a Load Regulator, it MUST instead reset
   the outer header to ECT(0).

   To decapsulate the inner header at the tunnel egress, the outgoing
   inner header MUST be calculated from the combination of the incoming
   inner and outer headers setting the outgoing ECN field to the
   codepoints displayed in the body of Table 1.

                        +--Incoming Outer Header---

   +--------------------+---------+------------+-----------+-----------+
   | Incoming Inner     | Not-ECT | ECT(0)     | ECT(1)    | CE        |
   | Header             |         |            |           |           |
   +--------------------+---------+------------+-----------+-----------+
   | Not-ECT            | Not-ECT | drop (!!!) | drop(!!!) | drop(!!!) |
   | ECT(0)             | ECT(0)  | ECT(0)     | ECT(0)    | CE        |
   | ECT(1)             | ECT(1)  | ECT(1)     | ECT(1)    | CE        |
   | CE                 | CE      | CE (!!!)   | CE (!!!)  | CE        |
   +--------------------+---------+------------+-----------+-----------+

                        +-----Outgoing Header------

                    Table 1: IP in IP Decapsulation

   The exclamation marks '(!!!)' in Table 1 indicate that this
   combination of inner and outer headers should not be possible if only
   legal transitions have taken place. So, the decapsulator should drop
   or mark the ECN field as the table specifies, but it MAY also raise
   an appropriate alarm. It MUST NOT raise an alarm so often that the
   illegal combinations would amplify into a flood of alarm messages.

6. Backward Compatibility

   A legacy tunnel egress may not know how to process an ECN field, so
   it will most likely simply disregard all outer headers. Therefore,
   unless a compliant tunnel ingress has established that the tunnel
   egress understands ECN processing, it MUST only send packets with the
   ECN field set to Not-ECT in the outer header. Otherwise, if ECN
   capable outer headers were sent towards a legacy egress, it would
   dangerously remove information about congestion experienced within
   the tunnel.
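
   The ingress rule of Section 5, the Not-ECT fallback just described,
   and the egress rule of Table 1 can be sketched together in Python.
   This is a minimal illustration, not a reference implementation. The
   function and parameter names are invented here. One interpretation is
   assumed and marked in the code: an ingress that is also a Load
   Regulator is taken to reset only CE to ECT(0), following the sense of
   'resetting' a CE codepoint given in the Introduction, and to copy
   other codepoints unchanged.

```python
from enum import Enum
from typing import Optional, Tuple


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field [RFC3168]."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def encapsulate(inner: ECN, egress_known_ecn_capable: bool,
                is_load_regulator: bool = False) -> ECN:
    """ECN field a tunnel ingress writes into the new outer header."""
    if not egress_known_ecn_capable:
        # The egress might ignore outer headers, so expose no ECN capability.
        return ECN.NOT_ECT
    if is_load_regulator:
        # ASSUMPTION: 'reset' is read as applying to CE only.
        return ECN.ECT0 if inner is ECN.CE else inner
    return inner  # default rule: copy the inner field


def decapsulate(inner: ECN, outer: ECN) -> Tuple[Optional[ECN], bool]:
    """Apply Table 1.

    Returns (outgoing ECN field, anomaly flag). An outgoing value of None
    means the packet is dropped. The flag is True for the combinations
    that Table 1 marks with '(!!!)'.
    """
    if inner is ECN.NOT_ECT:
        if outer is ECN.NOT_ECT:
            return ECN.NOT_ECT, False
        return None, True  # ECN-capable outer on a Not-ECT inner: drop
    if inner is ECN.CE:
        return ECN.CE, outer in (ECN.ECT0, ECN.ECT1)
    # Inner is ECT(0) or ECT(1): only a CE in the outer header changes it.
    if outer is ECN.CE:
        return ECN.CE, False
    return inner, False


print(decapsulate(ECN.ECT0, ECN.CE))     # (<ECN.CE: 3>, False)
print(decapsulate(ECN.NOT_ECT, ECN.CE))  # (None, True)
```

   Returning the anomaly flag separately leaves the caller free to rate
   limit any alarm it raises, as the text following Table 1 requires.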

   A tunnel ingress may establish whether its tunnel egress will
   understand ECN processing by configuration or by negotiation. Note
   that an [RFC4301] tunnel ingress that has used IKEv2 key management
   [RFC4306] can guarantee that the tunnel egress is also
   RFC4301-compliant and therefore need not negotiate ECN capabilities.

   To be compliant with this specification a tunnel ingress that does
   not know the egress ECN capability (e.g. by configuration) MUST
   implement a 'normal' mode and a 'compatibility' mode, and it MUST
   initiate each negotiated tunnel in compatibility mode. On the other
   hand, a compliant tunnel egress MUST merely implement the one
   behaviour in Section 5, which we term 'full-functionality' mode.

   Before switching to normal mode, a compliant tunnel ingress that does
   not know the egress ECN capability (e.g. by configuration) MUST
   negotiate with the tunnel egress to establish whether the egress is
   in full-functionality mode. If the egress is in full-functionality
   mode, the ingress puts itself into normal mode. In normal mode the
   ingress follows the encapsulation rule in Section 5 (i.e. it copies
   the inner ECN field into the outer header). If the egress is not in
   full-functionality mode or doesn't understand the question, the
   tunnel ingress MUST remain in compatibility mode.

   A tunnel ingress in compatibility mode MUST set all outer headers to
   Not-ECT.

   The decapsulation rules for the egress of the tunnel in Section 5
   have been defined in such a way that congestion control will still
   work safely if any of the earlier versions of ECN processing are used
   unilaterally at the encapsulating ingress of the tunnel.
If a tunnel 547 ingress tries to negotiate to use limited functionality mode or full 548 functionality mode, a decapsulating tunnel egress compliant with this 549 specification MUST agree to the request, even though its behaviour 550 will be the same in both cases. For 'forward compatibility', a 551 compliant tunnel egress MUST raise a warning about any requests to 552 enter modes it doesn't recognise, but it can continue operating. If 553 no ECN-related mode is requested, no error or warning need be raised 554 as the egress behaviour is compatible with all the legacy ingress 555 behaviours that don't negotiate capabilities. 557 Note that if a compliant node is the ingress for multiple tunnels, a 558 mode setting will need to be stored for each tunnel ingress. 559 However, if a node is the egress for multiple tunnels, none of the 560 tunnels will need to store a mode setting, because a compliant egress 561 can only be in one mode. 563 7. Changes from Earlier RFCs 565 The rule that a tunnel ingress MUST copy any ECN field into the outer 566 header is a change to RFC3168 (unless it is a Load Regulator as well, 567 in which case there is no change). 569 The rules for calculating the outgoing ECN field on decapsulation at 570 a tunnel egress are in line with the full functionality mode of ECN 571 in RFC3168 and with RFC4301, except that neither identified the need 572 to raise an alarm if the inner header was CE but the outer header was 573 ECT. 575 The rules for how a tunnel establishes whether the egress has full 576 functionality ECN capabilities are an update to RFC3168. For all the 577 typical cases, RFC4301 is not updated by the ECN capability check in 578 this specification, because a typical RFC4301 tunnel ingress will 579 have already established that it is talking to an RFC4301 tunnel 580 egress (e.g. if it uses IKEv2). However, there may be some corner 581 cases (e.g. 
manual keying) where an RFC4301 tunnel ingress talks with
582 an egress with limited functionality ECN handling. For such corner
583 cases, the requirement to use compatibility mode in this
584 specification updates RFC4301.

586 The optional ECN Tunnel field in the IPsec security association
587 database (SAD) and the optional ECN Tunnel Security Association
588 Attribute defined in RFC3168 are no longer needed. The security
589 association (SA) has no policy on ECN usage, because all RFC4301
590 tunnels now support ECN without any policy choice.

592 RFC3168 defines a (required) limited functionality mode and an
593 (optional) full functionality mode for a tunnel, but RFC4301 doesn't
594 need modes. In this specification only the ingress might need two
595 modes, unlike the modes of RFC3168 that were properties of the pair
596 of tunnel endpoints after negotiation.

598 All these ECN processing rules update RFC2003 on IP in IP tunnelling.

600 8. IANA Considerations

602 This memo includes no request to IANA.

604 9. Security Considerations

606 Section 3.1 discusses the security constraints imposed on ECN tunnel
607 processing. The Design Principles of Section 4 trade off security
608 (covert channels) against congestion monitoring and control. In
609 fact, ensuring congestion markings are not lost is itself another
610 aspect of security, because if we allowed congestion notification to
611 be lost, any attempt to enforce a response to congestion would be
612 much harder.

614 We keep the behaviour defined in both RFC3168 and RFC4301 where, if
615 the inner and outer headers carry contradictory ECT values, the inner
616 header is preserved for onward forwarding. However, in writing this
617 document we noticed this behaviour would hide illegal suppression of
618 congestion notification from the detection mechanism designed for
619 this attack.
One reason two ECT codepoints were defined was to 620 enable the source to detect if a CE marking had been applied then 621 subsequently removed. The source could detect this by weaving a 622 pseudo-random sequence of ECT(0) and ECT(1) values into a stream of 623 packets [RFC3540]. With the rules as they stand in RFC3168 and 624 RFC4301, within a tunnel a CE marking could be added and subsequently 625 removed by a non-compliant node without detection, because the 626 evidence of such misbehaviour is removed by the decapsulator. 628 We could have specified that an outer header value of ECT should 629 overwrite a contradictory ECT value in the inner header to close this 630 loophole. But we chose not to for two reasons: i) we wanted to avoid 631 any changes to IPsec tunnelling behaviour; ii) allowing ECT values in 632 the outer header to override the inner header would have increased 633 the bandwidth of the covert channel through the egress gateway from 1 634 to 1.5 bit per datagram, potentially threatening to upset the 635 consensus established in the security area that says that the 636 bandwidth of this covert channel can now be safely managed. 638 10. Conclusions 640 This document updates the tunnelling treatment of RFC3168 ECN for all 641 IP in IP tunnels to bring it into line with the new behaviour in the 642 IPsec architecture of RFC4301. 644 At the tunnel egress, header decapsulation for the default ECN 645 marking behaviour is broadly unchanged except that one exceptional 646 case has been catered for. At the ingress, for all forms of IP in IP 647 tunnel, encapsulation has been brought into line with the new IPsec 648 rules in RFC4301 which copy rather than reset CE markings when 649 creating outer headers. 
Previously, upstream congestion information 650 was not revealed in the outer header, which limited the scope of some 651 management monitoring techniques and prevented certain active queue 652 management algorithms from taking account of upstream congestion 653 markings. The change ensures all IP in IP tunnels reflect the more 654 relaxed attitude to revealing congestion information in the new IPsec 655 architecture, which now deems that the threat from 2-bit covert 656 channels can be managed without disabling ECN. 658 Also, this document defines more generic principles to guide the 659 design of alternate forms of tunnel processing of congestion 660 notification, if required for specific Diffserv PHBs (such as will be 661 required for the PCN working group) or for other lower layer 662 encapsulating protocols that might support congestion notification in 663 the future (e.g. MPLS). 665 11. Acknowledgements 667 Thanks to David Black, Bruce Davie, Toby Moncaster and Gabriele 668 Corliano for their careful review comments. 670 12. Comments Solicited 672 Comments and questions are encouraged and very welcome. They can be 673 addressed to the IETF Transport Area working group mailing list 674 , and/or to the authors. 676 13. References 678 13.1. Normative References 680 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 681 October 1996. 683 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 684 Requirement Levels", BCP 14, RFC 2119, March 1997. 686 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 687 "Definition of the Differentiated Services Field (DS 688 Field) in the IPv4 and IPv6 Headers", RFC 2474, 689 December 1998. 691 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 692 of Explicit Congestion Notification (ECN) to IP", 693 RFC 3168, September 2001. 695 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 696 Internet Protocol", RFC 4301, December 2005. 698 13.2. 
Informative References 700 [802.1au] "IEEE Standard for Local and Metropolitan Area Networks-- 701 Virtual Bridged Local Area Networks - Amendment 10: 702 Congestion Notification", 2006, 703 . 705 (Work in Progress; Access Controlled link within page) 707 [BBnet] Sexton, M. and A. Reid, "Broadband Networking: {ATM}, 708 {SDH} and {SONET}", Artech House telecommunications 709 library ISBN: 0-89006-578-0, 1997. 711 [I-D.ietf-tsvwg-ecn-mpls] 712 Davie, B., "Explicit Congestion Marking in MPLS", 713 draft-ietf-tsvwg-ecn-mpls-00 (work in progress), 714 March 2007. 716 [I-D.rosen-pwe3-congestion] 717 Rosen, E., "Pseudowire Congestion Control Framework", 718 draft-rosen-pwe3-congestion-04 (work in progress), 719 October 2006. 721 [PCN-arch] 722 Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R., 723 Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion 724 Notification Architecture", 725 draft-eardley-pcn-architecture-00 (work in progress), 726 June 2007. 728 [PCNcharter] 729 IETF, "Congestion and Pre-Congestion Notification (pcn)", 730 IETF w-g charter , Feb 2007, 731 . 733 [RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion 734 Control Survey", RFC 1254, August 1991. 736 [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic 737 Routing Encapsulation (GRE)", RFC 1701, October 1994. 739 [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. 740 Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 741 Functional Specification", RFC 2205, September 1997. 743 [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, 744 W., and G. Zorn, "Point-to-Point Tunneling Protocol", 745 RFC 2637, July 1999. 747 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 748 G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", 749 RFC 2661, August 1999. 751 [RFC3426] Floyd, S., "General Architectural and Policy 752 Considerations", RFC 3426, November 2002. 754 [RFC3540] Spring, N., Wetherall, D., and D. 
Ely, "Robust Explicit 755 Congestion Notification (ECN) Signaling with Nonces", 756 RFC 3540, June 2003. 758 [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", 759 RFC 4306, December 2005. 761 [RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol 762 (HIP) Architecture", RFC 4423, May 2006. 764 [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain", 765 2000, . 768 (Expired) 770 Appendix A. In-path Load Regulation 772 In the traditional Internet architecture one tends to think of the 773 source host as the Load Regulator for a path. It is generally not 774 desirable or practical for a node part way along the path to regulate 775 the load. However, various reasonable proposals for in-path load 776 regulation have been made from time to time (e.g. fair queuing, 777 traffic engineering). Also the IETF has recently chartered a working 778 group to standardise admission control across a part of a path using 779 pre-congestion notification (PCN) [PCNcharter], which involves in- 780 path load regulation. This is of particular relevance here because 781 it involves congestion notification with an in-path Load Regulator 782 and it can involve tunnelling. 784 We will use the more complex scenario in Figure 3 to tease out all 785 the issues that arise when combining congestion notification and 786 tunnelling with various possible in-path load regulation schemes. In 787 this case 'I1' and 'E2' break up the path into three separate 788 congestion control loops. The feedback for these loops is shown 789 going right to left across the top of the figure. The 'V's are arrow 790 heads representing the direction of feedback, not letters. But there 791 are also two tunnels within the middle control loop: 'I1' to 'E1' and 792 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS 793 core networks. M is a congestion monitoring point, perhaps between 794 two border routers where the same tunnel continues unbroken across 795 the border. 
796   ______     _______________________________________     _____
797  /      \   /                                       \   /     \
798 V        \ V                               M         \ V       \
799 A--->R--->I1===========>E1----->I2=========>==========>E2------->B

801 Figure 3: complex Tunnel Scenario

803 The question is, should the congestion markings in the outer exposed
804 headers of a tunnel represent congestion only since the tunnel
805 ingress or over the whole upstream path from the source of the inner
806 header (whatever that may mean)? Or put another way, should 'I1' and
807 'I2' copy or reset CE markings?

809 The answer is that the baseline of congestion marking should be the
810 nearest upstream interface designed to regulate traffic load--the
811 Load Regulator. In Figure 3 'A', 'I1' and 'E2' are all Load
812 Regulators. We have shown the feedback loops returning to each of
813 these nodes so that they can regulate the load causing the congestion
814 notification. So the baseline for congestion markings exposed to M
815 should be 'I1' (the Load Regulator), not 'I2'. That is, 'I2' SHOULD
816 copy any CE marking into the outer header it creates, while 'I1' is
817 an exception because it is an in-path load regulator, so it should
818 reset the ECN field in the outer header it creates.

820 The following further examples illustrate how this answer might be
821 applied:

823 o Preemption marking is currently defined for PCN [PCN-arch] so that
824 the rate of unmarked packets at the end of a path of multiple
825 bottlenecks determines the maximum sustainable aggregate bit rate
826 over that path. To produce the correct marking by the end, each
827 congested node must only consider packets to be eligible for
828 marking if they have not already been marked by any previous
829 bottleneck along a path that may span multiple tunnels (including
830 MPLS encapsulations etc.). This scheme only results in the
831 correct marking rate if the markings accumulated so far along the
832 path are copied into the outer exposed header of each tunnel or
833 encapsulation.
Consider that 'I1' and 'E2' in the complex 834 scenario of Figure 3 are edge gateways of a PCN region. Admission 835 control based on PCN measurements is a form of load regulation, so 836 'I1' regulates the load on the PCN region. Therefore 'I1' should 837 be the baseline of congestion marking for _both_ tunnels within 838 the scope of its feedback loop. Therefore 'I2' should follow the 839 normal rules and copy congestion marking into the outer tunnel 840 header, while 'I1' is an exception because it is also a load 841 regulator, so it should reset CE markings in the outer header. 843 o [Shayman] suggested feedback of ECN accumulated across an MPLS 844 domain could cause the ingress to trigger re-routing to mitigate 845 congestion. This case is more like the simple scenario of 846 Figure 2, with a feedback loop across the MPLS domain ('E' back to 847 'I'). The baseline for congestion exposed in outer headers in 848 this case will be the tunnel ingress, which should therefore reset 849 the ECN field in the outer headers it creates. But the reason it 850 should act as the baseline is because it is an in-path load 851 regulator (re-routing around congestion is a load regulation 852 function), not just because it is a tunnel ingress. 854 o The PWE3 working group of the IETF is considering the problem of 855 how and whether an aggregate private wire emulation should respond 856 to congestion [I-D.rosen-pwe3-congestion]. Although the study is 857 still at the requirements stage, some (controversial) solution 858 proposals include in-path load regulation at the ingress to the 859 tunnel that could lead to tunnel arrangements with similar 860 complexity to that of Figure 3. 862 These are not contrived scenarios--they could be a lot worse. For 863 instance, a host may create a tunnel for IPsec which is placed inside 864 a tunnel for Mobile IP over a remote part of its path. 
And around
865 all this we may have MPLS labels being pushed and popped as packets
866 pass across different core networks. Similarly, it is possible that
867 subnets could be built from link technology (e.g. Ethernet switches)
868 so that link headers being added and removed could involve congestion
869 notification in future link headers with all the same issues as with
870 IP in IP tunnels.

872 The reason we introduced the concept of a Load Regulator was to allow
873 for in-path load regulation. In the traditional Internet
874 architecture one tends to think of a host and a Load Regulator as
875 synonymous, but when considering tunnelling, even the definition of a
876 host is too fuzzy, whereas a Load Regulator is a clearly defined
877 function. Similarly, the concept of innermost header is too fuzzy to
878 be able to (wrongly) say that the source address of the innermost
879 header should be the baseline. Which is the innermost header when
880 multiple encapsulations may be in use? Where do we stop? If we say
881 the original source in the above IPsec-Mobile IP case is the host,
882 how do we know it isn't tunnelling an encrypted packet stream on
883 behalf of another host in a p2p network?

885 The reason there has been so much confusion over the question of
886 whether a tunnel ingress should copy or reset CE markings is that we
887 have become used to thinking that only hosts regulate load. The
888 end-to-end design principle advises that this is a good idea [RFC3426],
889 but it also advises that it is only a guiding principle intended to
890 make the designer think very carefully before breaking it. We do
891 have proposals where load regulation functions sit within a network
892 path for good, if sometimes controversial, reasons, e.g. PCN edge
893 admission control gateways [PCN-arch] or traffic engineering
894 functions at domain borders to re-route around congestion [Shayman].
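The following Python fragment is a minimal sketch, not a definitive implementation, of how a tunnel ingress might choose the outer ECN field by combining the encapsulation rule of Section 5, the modes of Section 6 and the Load Regulator exception discussed in this appendix. The names Mode, select_mode and outer_ecn are hypothetical. Two assumptions are made: compatibility mode takes precedence over the Load Regulator reset (because a compatibility-mode ingress MUST set all outer headers to Not-ECT), and the reset to ECT(0) is applied whatever the inner codepoint, which is a literal reading of the rule.

```python
from enum import Enum
from typing import Optional


class ECN(Enum):
    """The four codepoints of the two-bit ECN field (RFC 3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class Mode(Enum):
    """Hypothetical names for the two ingress modes of Section 6."""
    COMPATIBILITY = "compatibility"
    NORMAL = "normal"


def select_mode(configured_capable: bool,
                egress_reply: Optional[bool] = None) -> Mode:
    """Pick the ingress mode for one tunnel.

    configured_capable: the egress is known, e.g. by configuration, to
        understand ECN processing.
    egress_reply: True if negotiation showed the egress to be in
        full-functionality mode; False or None if it is not, or if it
        did not understand the question.
    """
    if configured_capable or egress_reply is True:
        return Mode.NORMAL
    return Mode.COMPATIBILITY


def outer_ecn(inner: ECN, mode: Mode, is_load_regulator: bool = False) -> ECN:
    """Return the ECN field for the outer header created at the ingress."""
    if mode is Mode.COMPATIBILITY:
        # Assumption: compatibility mode overrides everything else.
        return ECN.NOT_ECT
    if is_load_regulator:
        # Assumption: literal reading, reset regardless of the inner value.
        return ECN.ECT_0
    # Normal mode: copy the inner ECN field into the outer header.
    return inner


# Figure 3: 'I1' is an in-path Load Regulator, 'I2' is not.
i1_outer = outer_ecn(ECN.CE, Mode.NORMAL, is_load_regulator=True)
i2_outer = outer_ecn(ECN.CE, Mode.NORMAL)
```

In this sketch a CE-marked packet entering 'I1' gets an outer header of ECT(0), so that markings exposed within the tunnel are counted from 'I1', whereas the same packet entering 'I2' has its CE marking copied into the outer header. A per-tunnel mode would be stored at the ingress, in line with the note at the end of Section 6.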
896 Author's Address 898 Bob Briscoe 899 BT 900 B54/77, Adastral Park 901 Martlesham Heath 902 Ipswich IP5 3RE 903 UK 905 Phone: +44 1473 645196 906 Email: bob.briscoe@bt.com 907 URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ 909 Full Copyright Statement 911 Copyright (C) The IETF Trust (2007). 913 This document is subject to the rights, licenses and restrictions 914 contained in BCP 78, and except as set forth therein, the authors 915 retain all their rights. 917 This document and the information contained herein are provided on an 918 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 919 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 920 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 921 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 922 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 923 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 925 Intellectual Property 927 The IETF takes no position regarding the validity or scope of any 928 Intellectual Property Rights or other rights that might be claimed to 929 pertain to the implementation or use of the technology described in 930 this document or the extent to which any license under such rights 931 might or might not be available; nor does it represent that it has 932 made any independent effort to identify any such rights. Information 933 on the procedures with respect to rights in RFC documents can be 934 found in BCP 78 and BCP 79. 936 Copies of IPR disclosures made to the IETF Secretariat and any 937 assurances of licenses to be made available, or the result of an 938 attempt made to obtain a general license or permission for the use of 939 such proprietary rights by implementers or users of this 940 specification can be obtained from the IETF on-line IPR repository at 941 http://www.ietf.org/ipr. 
943 The IETF invites any interested party to bring to its attention any 944 copyrights, patents or patent applications, or other proprietary 945 rights that may cover technology that may be required to implement 946 this standard. Please address the information to the IETF at 947 ietf-ipr@ietf.org. 949 Acknowledgments 951 Funding for the RFC Editor function is provided by the IETF 952 Administrative Support Activity (IASA). This document was produced 953 using xml2rfc v1.32 (of http://xml.resource.org/) from a source in 954 RFC-2629 XML format.