Transport Area Working Group                                  B. Briscoe
Internet-Draft                                               Independent
Updates: 3819 (if approved)                            J. Kaippallimalil
Intended status: Best Current Practice                         Futurewei
Expires: May 19, 2021                                  November 15, 2020


    Guidelines for Adding Congestion Notification to Protocols that
                             Encapsulate IP
                draft-ietf-tsvwg-ecn-encap-guidelines-14

Abstract

   The purpose of this document is to guide the design of congestion
   notification in any lower layer or tunnelling protocol that
   encapsulates IP.  The aim is for explicit congestion signals to
   propagate consistently from lower layer protocols into IP.  Then the
   IP internetwork layer can act as a portability layer to carry
   congestion notification from non-IP-aware congested nodes up to the
   transport layer (L4).  Following these guidelines should assure
   interworking among IP layer and lower layer congestion notification
   mechanisms, whether specified by the IETF or other standards bodies.
   This document updates the advice to subnetwork designers about ECN in
   RFC 3819.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 19, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Update to RFC 3819
     1.2.  Scope
   2.  Terminology
   3.  Modes of Operation
     3.1.  Feed-Forward-and-Up Mode
     3.2.  Feed-Up-and-Forward Mode
     3.3.  Feed-Backward Mode
     3.4.  Null Mode
   4.  Feed-Forward-and-Up Mode: Guidelines for Adding Congestion
       Notification
     4.1.  IP-in-IP Tunnels with Shim Headers
     4.2.  Wire Protocol Design: Indication of ECN Support
     4.3.  Encapsulation Guidelines
     4.4.  Decapsulation Guidelines
     4.5.  Sequences of Similar Tunnels or Subnets
     4.6.  Reframing and Congestion Markings
   5.  Feed-Up-and-Forward Mode: Guidelines for Adding Congestion
       Notification
   6.  Feed-Backward Mode: Guidelines for Adding Congestion
       Notification
   7.  IANA Considerations
   8.  Security Considerations
   9.  Conclusions
   10. Acknowledgements
   11. Contributors
   12. Comments Solicited
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Changes in This Version (to be removed by RFC Editor)
   Authors' Addresses

1.  Introduction

   The benefits of Explicit Congestion Notification (ECN) described in
   [RFC8087] and summarized below can only be fully realized if support
   for ECN is added to the relevant subnetwork technology, as well as to
   IP.  When a lower layer buffer drops a packet, the packet does not
   just disappear from that layer; it disappears from all layers.
   In contrast, when active queue management (AQM) at a lower layer
   marks a packet with ECN, the marking needs to be explicitly
   propagated up the layers.  The same is true if AQM marks the outer
   header of a packet that encapsulates inner tunnelled headers.
   Forwarding ECN is not as straightforward as forwarding other headers,
   because it has to be assumed that ECN may be only partially deployed.
   If a lower layer header that contains ECN congestion indications is
   stripped off by a subnet egress that is not ECN-aware, or if the
   ultimate receiver or sender is not ECN-aware, congestion needs to be
   indicated by dropping a packet, not marking it.

   The purpose of this document is to guide the addition of congestion
   notification to any subnet technology or tunnelling protocol, so that
   lower layer AQM algorithms can signal congestion explicitly and the
   signals will propagate consistently into encapsulated (higher layer)
   headers.  Otherwise the signals will not reach their ultimate
   destination.

   ECN is defined in the IP header (v4 and v6) [RFC3168] to allow a
   resource to notify the onset of queue build-up without having to drop
   packets, by explicitly marking a proportion of packets with the
   congestion experienced (CE) codepoint.

   Given a suitable marking scheme, ECN removes nearly all congestion
   loss and it cuts delays for two main reasons:

   o  It avoids the delay when recovering from congestion losses, which
      particularly benefits small flows or real-time flows, making their
      delivery time predictably short [RFC2884];

   o  As ECN is used more widely by end-systems, it will gradually
      remove the need to configure a degree of delay into buffers before
      they start to notify congestion (the cause of bufferbloat).
      This is because drop involves a trade-off between sending a timely
      signal and trying to avoid impairment, whereas ECN is solely a
      signal, not an impairment, so there is no harm in triggering it
      earlier.

   Some lower layer technologies (e.g. MPLS, Ethernet) are used to form
   subnetworks with IP-aware nodes only at the edges.  These networks
   are often sized so that it is rare for interior queues to overflow.
   However, until recently this was more due to the inability of TCP to
   saturate the links.  For many years, fixes such as window scaling
   [RFC7323] proved hard to deploy.  And the Reno variant of TCP has
   remained in widespread use despite its inability to scale to high
   flow rates.  Now that modern operating systems are finally capable of
   saturating interior links, even the buffers of well-provisioned
   interior switches will need to signal episodes of queuing.

   Propagation of ECN is defined for MPLS [RFC5129], and is being
   defined for TRILL [RFC7780], [I-D.ietf-trill-ecn-support], but it
   remains to be defined for a number of other subnetwork technologies.

   Similarly, ECN propagation is yet to be defined for many tunnelling
   protocols.  [RFC6040] defines how ECN should be propagated for IP-in-
   IPv4 [RFC2003], IP-in-IPv6 [RFC2473] and IPsec [RFC4301] tunnels, but
   there are numerous other tunnelling protocols with a shim and/or a
   layer 2 header between two IP headers (v4 or v6).  Some address ECN
   propagation between the IP headers, but many do not.  This document
   gives guidance on how to address ECN propagation for future
   tunnelling protocols, and a companion standards track specification
   [I-D.ietf-tsvwg-rfc6040update-shim] updates those existing IP-shim-
   (L2)-IP protocols that are under IETF change control and still widely
   used.

   Incremental deployment is the most delicate aspect when adding
   support for ECN.
   The original ECN protocol in IP [RFC3168] was carefully designed so
   that a congested buffer would not mark a packet (rather than drop it)
   unless both source and destination hosts were ECN-capable.  Otherwise
   its congestion markings would never be detected and congestion would
   just build up further.  However, to support congestion marking below
   the IP layer or within tunnels, it is not sufficient to only check
   that the two layer 4 transport end-points support ECN; correct
   operation also depends on the decapsulator at each subnet or tunnel
   egress faithfully propagating congestion notifications to the higher
   layer.  Otherwise, a legacy decapsulator might silently fail to
   propagate any ECN signals from the outer to the forwarded header.
   Then the lost signals would never be detected and again congestion
   would build up further.  The guidelines given later require protocol
   designers to carefully consider incremental deployment, and suggest
   various safe approaches for different circumstances.

   Of course, the IETF does not have standards authority over every link
   layer protocol.  So this document gives guidelines for designing
   propagation of congestion notification across the interface between
   IP and protocols that may encapsulate IP (i.e. that can be layered
   beneath IP).  Each lower layer technology will exhibit different
   issues and compromises, so the IETF or the relevant standards body
   must be free to define the specifics of each lower layer congestion
   notification scheme.  Nonetheless, if the guidelines are followed,
   congestion notification should interwork between different
   technologies, using IP in its role as a 'portability layer'.

   Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often
   used in preference to 'MUST' or 'MUST NOT', because it is difficult
   to know the compromises that will be necessary in each protocol
   design.
   If a particular protocol design chooses not to follow a
   'SHOULD (NOT)' given in the advice below, it MUST include a sound
   justification.

   It has not been possible to give common guidelines for all lower
   layer technologies, because they do not all fit a common pattern.
   Instead they have been divided into a few distinct modes of
   operation: feed-forward-and-up; feed-up-and-forward; feed-backward;
   and null mode.  These modes are described in Section 3, and separate
   guidelines for each mode are given in the subsequent sections.

1.1.  Update to RFC 3819

   This document updates the brief advice to subnetwork designers about
   ECN in [RFC3819], by replacing the last two paragraphs of Section 13
   with the following sentence:

      By following the guidelines in [this document], subnetwork
      designers can enable a layer-2 protocol to participate in
      congestion control without dropping packets via propagation of
      explicit congestion notification (ECN [RFC3168]) to receivers.

   and adding [this document] as an informative reference.  {RFC Editor:
   Please replace both instances of [this document] above with the
   number of the present RFC when published.}

1.2.  Scope

   This document only concerns wire protocol processing of explicit
   notification of congestion.  It makes no changes or recommendations
   concerning algorithms for congestion marking or for congestion
   response, because algorithm issues should be independent of the layer
   the algorithm operates in.

   The default ECN semantics are described in [RFC3168] and updated by
   [RFC8311].  Also the guidelines for AQM designers [RFC7567] clarify
   the semantics of both drop and ECN signals from AQM algorithms.
   [RFC4774] is the appropriate best current practice specification of
   how algorithms with alternative semantics for the ECN field can be
   partitioned from Internet traffic that uses the default ECN
   semantics.
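
   As an illustration of the default semantics referred to above, the
   following Python sketch models the 2-bit ECN field, which [RFC3168]
   places in the two least significant bits of the IPv4 TOS octet or
   the IPv6 Traffic Class octet, below the 6-bit DSCP.  The helper
   names (split_tos, join_tos, aqm_congestion_signal) are hypothetical
   and belong to no specification.  The last helper only shows the
   default rule that a congested buffer marks CE on ECN-capable packets
   and drops packets that are not ECN-capable.

```python
from enum import IntEnum
from typing import Optional, Tuple


class ECN(IntEnum):
    """ECN codepoints in the IP header, as defined in RFC 3168."""
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport (1)
    ECT_0 = 0b10    # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def split_tos(tos: int) -> Tuple[int, ECN]:
    """Split a TOS / Traffic Class octet into (DSCP, ECN field)."""
    return tos >> 2, ECN(tos & 0b11)


def join_tos(dscp: int, ecn: ECN) -> int:
    """Rebuild the octet: DSCP in the upper 6 bits, ECN in the lower 2."""
    return ((dscp & 0x3F) << 2) | int(ecn)


def aqm_congestion_signal(tos: int) -> Optional[int]:
    """Hypothetical AQM action on a packet it has chosen to signal.

    Returns the rewritten octet, or None if the packet must be dropped.
    """
    dscp, ecn = split_tos(tos)
    if ecn == ECN.NOT_ECT:
        return None                # transport cannot see CE: drop instead
    return join_tos(dscp, ECN.CE)  # ECT(0), ECT(1) or already CE: mark CE


# Arbitrary DSCP of 46 with ECT(0), i.e. octet 0xBA:
print(hex(aqm_congestion_signal(0xBA)))  # prints 0xbb
```

   The DSCP bits are carried through untouched; only the ECN field
   changes, and a Not-ECT packet yields None (drop) rather than a mark.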

   There are two main examples of how alternative ECN semantics have
   been defined in practice:

   o  RFC 4774 suggests using the ECN field in combination with a
      Diffserv codepoint such as in PCN [RFC6660], Voice over 3G [UTRAN]
      or Voice over LTE (VoLTE) [LTE-RA];

   o  RFC 8311 suggests using the ECT(1) codepoint of the ECN field to
      indicate alternative semantics such as for the experimental Low
      Latency Low Loss Scalable throughput (L4S) service
      [I-D.ietf-tsvwg-ecn-l4s-id].

   The aim is that the default rules for encapsulating and decapsulating
   the ECN field are sufficiently generic that tunnels and subnets will
   encapsulate and decapsulate packets without regard to how algorithms
   elsewhere are setting or interpreting the semantics of the ECN field.
   [RFC6040] updates RFC 4774 to allow alternative encapsulation and
   decapsulation behaviours to be defined for alternative ECN semantics.
   However, it reinforces the same point: it is far preferable to try to
   fit within the common ECN encapsulation and decapsulation behaviours,
   because expecting all lower layer technologies and tunnels to be
   updated is likely to be completely impractical.

   Alternative semantics for the ECN field can be defined to depend on
   the traffic class indicated by the DSCP.  Therefore correct
   propagation of congestion signals could depend on correct propagation
   of the DSCP between the layers and along the path.  For instance, if
   the meaning of the ECN field depends on the DSCP (as in PCN or VoLTE)
   and if the outer DSCP is stripped on decapsulation, as in the pipe
   model of [RFC2983], the special semantics of the ECN field would be
   lost.  Similarly, if the DSCP is changed at the boundary between
   Diffserv domains, the special ECN semantics would also be lost.  This
   is an important implication of the localized scope of most Diffserv
   arrangements.
   In this document, correct propagation of traffic class information
   is assumed, while what 'correct' means and how it is achieved is
   covered elsewhere (e.g. RFC 2983) and is outside the scope of the
   present document.

   The guidelines in this document do ensure that common encapsulation
   and decapsulation rules are sufficiently generic to cover cases where
   ECT(1) is used instead of ECT(0) to identify alternative ECN
   semantics (as in L4S [I-D.ietf-tsvwg-ecn-l4s-id]) and where ECN
   marking algorithms use ECT(1) to encode 3 severity levels into the
   ECN field (e.g. PCN [RFC6660]) rather than the default of 2.  All
   these different semantics for the ECN field work because it has been
   possible to define common default decapsulation rules that allow for
   all cases.

   Note that the guidelines in this document do not necessarily require
   the subnet wire protocol to be changed to add support for congestion
   notification.  For instance, the Feed-Up-and-Forward Mode
   (Section 3.2) and the Null Mode (Section 3.4) do not.  Another way to
   add congestion notification without consuming header space in the
   subnet protocol might be to use a parallel control plane protocol.

   This document focuses on the congestion notification interface
   between IP and lower layer or tunnel protocols that can encapsulate
   IP, where the term 'IP' includes v4 or v6, unicast, multicast or
   anycast.  However, it is likely that the guidelines will also be
   useful when a lower layer protocol or tunnel encapsulates itself,
   e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah) or when
   it encapsulates other protocols.  In the feed-backward mode,
   propagation of congestion signals for multicast and anycast packets
   is out of scope (because the complexity would make it unlikely to be
   attempted).

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119] [RFC8174]
   when, and only when, they appear in all capitals, as shown here.

   Further terminology used within this document:

   Protocol data unit (PDU):  Information that is delivered as a unit
      among peer entities of a layered network, consisting of protocol
      control information (typically a header) and possibly user data
      (payload) of that layer.  The scope of this document includes
      layer 2 and layer 3 networks, where the PDU is respectively termed
      a frame or a packet (or a cell in ATM).  PDU is a general term for
      any of these.  This definition also includes a payload with a shim
      header lying somewhere between layer 2 and 3.

   Transport:  The end-to-end transmission control function,
      conventionally considered at layer-4 in the OSI reference model.
      Given that the audience for this document will often use the word
      transport to mean low level bit carriage, whenever the term is
      used it will be qualified, e.g. 'L4 transport'.

   Encapsulator:  The link or tunnel endpoint function that adds an
      outer header to a PDU (also termed the 'link ingress', the 'subnet
      ingress', the 'ingress tunnel endpoint' or just the 'ingress'
      where the context is clear).

   Decapsulator:  The link or tunnel endpoint function that removes an
      outer header from a PDU (also termed the 'link egress', the
      'subnet egress', the 'egress tunnel endpoint' or just the 'egress'
      where the context is clear).

   Incoming header:  The header of an arriving PDU before encapsulation.

   Outer header:  The header added to encapsulate a PDU.

   Inner header:  The header encapsulated by the outer header.

   Outgoing header:  The header forwarded by the decapsulator.

   CE:  Congestion Experienced [RFC3168]

   ECT:  ECN-Capable (L4) Transport [RFC3168]

   Not-ECT:  Not ECN-Capable (L4) Transport [RFC3168]

   Load Regulator:  For each flow of PDUs, the transport function that
      is capable of controlling the data rate.  Typically located at the
      data source, but in-path nodes can regulate load in some
      congestion control arrangements (e.g. admission control, policing
      nodes or transport circuit-breakers [RFC8084]).  Note that the
      term "a function capable of controlling the load" deliberately
      includes a transport that does not actually control the load
      responsively but ideally ought to (e.g. a sending application
      without congestion control that uses UDP).

   ECN-PDU:  A PDU at the IP layer or below with a capacity to signal
      congestion that is part of a congestion control feedback loop
      within which all the nodes necessary to propagate the signal back
      to the Load Regulator are capable of doing that propagation.  An
      IP packet with a non-zero ECN field implies that the endpoints are
      ECN-capable, so this would be an ECN-PDU.  However, ECN-PDU is
      intended to be a general term for a PDU at lower layers, as well
      as at the IP layer.

   Not-ECN-PDU:  A PDU at the IP layer or below that is part of a
      congestion control feedback-loop within which at least one node
      necessary to propagate any explicit congestion notification
      signals back to the Load Regulator is not capable of doing that
      propagation.

3.  Modes of Operation

   This section sets down the different modes by which congestion
   information is passed between the lower layer and the higher one.
   It acts as a reference framework for the following sections, which
   give normative guidelines for designers of explicit congestion
   notification protocols, taking each mode in turn:

   Feed-Forward-and-Up:  Nodes feed forward congestion notification
      towards the egress within the lower layer then up and along the
      layers towards the end-to-end destination at the transport layer.
      The following local optimisation is possible:

      Feed-Up-and-Forward:  A lower layer switch feeds congestion
         notification up directly into the higher layer (e.g. into the
         ECN field in the IP header), irrespective of whether the node
         is at the egress of a subnet.

   Feed-Backward:  Nodes feed back congestion signals towards the
      ingress of the lower layer and (optionally) attempt to control
      congestion within their own layer.

   Null:  Nodes cannot experience congestion at the lower layer except
      at ingress nodes (which are IP-aware or equivalently higher-layer-
      aware).

3.1.  Feed-Forward-and-Up Mode

   Like IP and MPLS, many subnet technologies are based on self-
   contained protocol data units (PDUs) or frames sent unreliably.  They
   provide no feedback channel at the subnetwork layer, instead relying
   on higher layers (e.g. TCP) to feed back loss signals.

   In these cases, ECN may best be supported by standardising explicit
   notification of congestion into the lower layer protocol that carries
   the data forwards.  Then a specification is needed for how the egress
   of the lower layer subnet propagates this explicit signal into the
   forwarded upper layer (IP) header.  This signal continues forwards
   until it finally reaches the destination transport (at L4).  Then
   typically the destination will feed this congestion notification back
   to the source transport using an end-to-end protocol (e.g. TCP).
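
   The egress step described above might be sketched as follows.  This
   is a hypothetical illustration only: the Frame class and its one-bit
   'congested' flag stand in for whatever congestion indication a given
   subnet technology might define, and are not taken from any real
   protocol.  On removing the lower layer header, the sketched egress
   copies the indication up into the IP ECN field as CE, but drops the
   packet instead if the IP header shows the L4 transport is not
   ECN-capable, in line with the rule in the Introduction that
   congestion has to be signalled by drop in that case.

```python
from dataclasses import dataclass
from typing import Optional

# IP ECN codepoints (RFC 3168), carried in the two least significant
# bits of the IPv4 TOS / IPv6 Traffic Class octet.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class Frame:
    """Hypothetical lower layer frame carrying one IP packet.

    'congested' stands in for a congestion indication that some subnet
    technology might define in its own header (an assumption of this
    sketch, not a field of any real protocol).
    """
    congested: bool
    ip_tos: int  # TOS / Traffic Class octet of the encapsulated IP header


def egress_decapsulate(frame: Frame) -> Optional[int]:
    """Strip the lower layer header and feed any congestion signal up.

    Returns the TOS octet of the forwarded IP header, or None if the
    packet is to be dropped.
    """
    if not frame.congested:
        return frame.ip_tos                # nothing to propagate
    if (frame.ip_tos & 0b11) == NOT_ECT:
        return None                        # L4 cannot see CE: signal by drop
    return (frame.ip_tos & 0xFC) | CE      # propagate the mark up into IP


# A switch inside the subnet flagged congestion on an ECT(0) packet:
print(hex(egress_decapsulate(Frame(congested=True, ip_tos=0xBA))))
```

   Run as is, the example prints 0xbb: the DSCP bits are unchanged and
   the ECN field now carries CE for the destination transport to feed
   back.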

   This feed-forward-and-up arrangement has already been used to add ECN
   to IP-in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129].

   Figure 1 illustrates this mode.  Along the middle of the figure,
   layers 2, 3 and 4 of the protocol stack are shown, and one packet is
   shown along the bottom as it progresses across the network from
   source to destination, crossing two subnets connected by a router,
   and crossing two switches on the path across each subnet.  Congestion
   at the output of the first switch (shown as *) leads to a congestion
   marking in the L2 header (shown as C in the illustration of the
   packet).  The chevrons show the progress of the resulting congestion
   indication.  It is propagated from link to link across the subnet in
   the L2 header, then when the router removes the marked L2 header, it
   propagates the marking up into the L3 (IP) header.  The router
   forwards the marked L3 header into subnet B, and when it adds a new
   L2 header it copies the L3 marking into the L2 header as well, as
   shown by the 'C's in both layers (assuming the technology of subnet B
   also supports explicit congestion marking).

   Note that there is no implication that each 'C' marking is encoded
   the same; a different encoding might be used for the 'C' marking in
   each protocol.

   Finally, for completeness, we show the L3 marking arriving at the
   destination, where the host transport protocol (e.g. TCP) feeds it
   back to the source in the L4 acknowledgement (the 'C' at L4 in the
   packet at the top of the diagram).

                                   _ _ _
                         /_______ | | |C| ACK Packet (V)
                         \        |_|_|_|
    +---+                   layer: 2 3 4 header                 +---+
    |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4
    |   |                         +---+                         | ^ |
    |   | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3
    |   |     +---+     +---+     | ^ |     +---+     +---+     |   |
    |   |     |  *|>>>>>|>>>|>>>>>|>^ |     |   |     |   |     |   |L2
    |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
    source       subnet A        router        subnet B         dest
        __ _ _ _    __ _ _ _      __ _ _      __ _ _ _
       |  | | | |  |  | | |C|    |  | |C|    |  | |C|C| Data________\
       |__|_|_|_|  |__|_|_|_|    |__|_|_|    |__|_|_|_| Packet (U)  /
    layer: 4 3 2A      4 3 2A        4 3         4 3 2B
    header

                   Figure 1: Feed-Forward-and-Up Mode

   Of course, modern networks are rarely as simple as this text-book
   example, often involving multiple nested layers.  For example, a 3GPP
   mobile network may have two IP-in-IP (GTP [GTPv1]) tunnels in series
   and an MPLS backhaul between the base station and the first router.
   Nonetheless, the example illustrates the general idea of feeding
   congestion notification forward then upward whenever a header is
   removed at the egress of a subnet.

   Note that the FECN (forward ECN) bit in Frame Relay [Buck00] and the
   explicit forward congestion indication (EFCI [ITU-T.I.371]) bit in
   ATM user data cells follow a feed-forward pattern.  However, in ATM,
   this arrangement is only part of a feed-forward-and-backward pattern
   at the lower layer, not feed-forward-and-up out of the lower layer;
   the intention was never to interface to IP ECN at the subnet egress.
   To our knowledge, Frame Relay FECN is solely used to detect where
   more capacity should be provisioned.

3.2.  Feed-Up-and-Forward Mode

   Ethernet is particularly difficult to extend incrementally to support
   explicit congestion notification.  One way to support ECN in such
   cases has been to use so-called 'layer-3 switches'.  These are
   Ethernet switches that dig into the Ethernet payload to find an IP
   header and manipulate or act on certain IP fields (specifically
   Diffserv & ECN).
   For instance, in Data Center TCP [RFC8257], layer-3 switches are
   configured to mark the ECN field of the IP header within the Ethernet
   payload when their output buffer becomes congested.  With respect to
   switching, a layer-3 switch acts solely on the addresses in the
   Ethernet header; it does not use IP addresses, and it does not
   decrement the TTL field in the IP header.

                                   _ _ _
                         /_______ | | |C| ACK packet (V)
                         \        |_|_|_|
    +---+                   layer: 2 3 4 header                 +---+
    |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4
    |   |                         +---+                         | ^ |
    |   | . . .  >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3
    |   |     +--^+     +---+     |   |     +---+     +---+     |   |
    |   |     |  *|     |   |     |   |     |   |     |   |     |   |L2
    |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
    source       subnet E        router        subnet F         dest
        __ _ _ _    __ _ _ _      __ _ _      __ _ _ _
       |  | | | |  |  | |C| |    |  | |C|    |  | |C|C| data________\
       |__|_|_|_|  |__|_|_|_|    |__|_|_|    |__|_|_|_| packet (U)  /
    layer: 4 3 2       4 3 2         4 3         4 3 2
    header

                   Figure 2: Feed-Up-and-Forward Mode

   By comparing Figure 2 with Figure 1, it can be seen that subnet E
   (perhaps a subnet of layer-3 Ethernet switches) works in feed-up-and-
   forward mode by notifying congestion directly into L3 at the point of
   congestion, even though the congested switch does not otherwise act
   at L3.  In this example, the technology in subnet F (e.g. MPLS) does
   support ECN natively, so when the router adds the layer-2 header it
   copies the ECN marking from L3 to L2 as well.

3.3.  Feed-Backward Mode

   In some layer 2 technologies, explicit congestion notification has
   been defined for use internally within the subnet with its own
   feedback and load regulation, but typically the interface with IP for
   ECN has not been defined.

   For instance, for the available bit-rate (ABR) service in ATM, the
   relative rate mechanism was one of the more popular mechanisms for
   managing traffic, tending to supersede earlier designs.  In this
   approach ATM switches send special resource management (RM) cells in
   both the forward and backward directions to control the ingress rate
   of user data into a virtual circuit.  If a switch buffer is
   approaching congestion or is congested, it sends an RM cell back
   towards the ingress with, respectively, the No Increase (NI) or
   Congestion Indication (CI) bit set in its message type field
   [ATM-TM-ABR].  The ingress then holds or decreases its sending bit-
   rate accordingly.

                                   _ _ _
                         /_______ | | |C| ACK packet (X)
                         \        |_|_|_|
    +---+                   layer: 2 3 4 header                 +---+
    |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4
    |   |                         +---+                         | ^ |
    |   |                         |  *|>>> Packet W >>>>>>>>>>>>|>^ |L3
    |   |     +---+     +---+     |   |     +---+     +---+     |   |
    |   |     |   |     |   |     |  <|<<<<<|<<<|<(V)<|<<<|     |   |L2
    |   | . . | . |Packet U | . . | . | . . | . | . . | .*| . . |   |L2
    |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
    source       subnet G        router        subnet H         dest
        __ _ _ _    __ _ _ _      __ _ _      __ _ _ _ later
       |  | | | |  |  | | | |    |  | | |    |  | |C| | data________\
       |__|_|_|_|  |__|_|_|_|    |__|_|_|    |__|_|_|_| packet (W)  /
           4 3 2       4 3 2         4 3         4 3 2
                                              _
                                         /__ |C| Feedback control
                                         \   |_| cell/frame (V)
                                              2
        __ _ _ _    __ _ _ _      __ _ _      __ _ _ _ earlier
       |  | | | |  |  | | | |    |  | | |    |  | | | | data________\
       |__|_|_|_|  |__|_|_|_|    |__|_|_|    |__|_|_|_| packet (U)  /
    layer: 4 3 2       4 3 2         4 3         4 3 2
    header

                      Figure 3: Feed-Backward Mode

   ATM's feed-backward approach does not fit well when layered beneath
   IP's feed-forward approach, unless the initial data source is the
   same node as the ATM ingress.  Figure 3 shows the feed-backward
   approach being used in subnet H.
   If the final switch on the path is congested (*), it does not feed
   forward any congestion indications on packet (U).  Instead it sends a
   control cell (V) back to the router at the ATM ingress.

   However, the backward feedback does not reach the original data
   source directly because IP does not support backward feedback (and
   subnet G is independent of subnet H).  Instead, the router in the
   middle throttles down its sending rate but the original data sources
   do not reduce their rates.  The resulting rate mismatch causes the
   middle router's buffer at layer 3 to back up until it becomes
   congested, which it signals forwards on later data packets at layer 3
   (e.g. packet W).  Note that the forward signal from the middle router
   is not triggered directly by the backward signal.  Rather, it is
   triggered by congestion resulting from the middle router's mismatched
   rate response to the backward signal.

   In response to this later forward signalling, end-to-end feedback at
   layer-4 finally completes the tortuous path of congestion indications
   back to the original data source, as before.

   Quantized congestion notification (QCN [IEEE802.1Q]) would suffer
   from similar problems if extended to multiple subnets.  However, from
   the start QCN was clearly characterized as solely applicable to a
   single subnet (see Section 6).

3.4.  Null Mode

   Often link and physical layer resources are 'non-blocking' by design.
   In these cases congestion notification may be implemented but it does
   not need to be deployed at the lower layer; ECN in IP would be
   sufficient.

   A degenerate example is a point-to-point Ethernet link.  Excess
   loading of the link merely causes the queue from the higher layer to
   back up, while the lower layer remains immune to congestion.
   Even a whole meshed subnetwork can be made immune to interior congestion by limiting ingress capacity and sizing interior links sufficiently, e.g. a non-blocking fat-tree network [Leiserson85]. An alternative to fat links near the root is numerous thin links with multi-path routing to ensure that even worst-case patterns of load cannot congest any link, e.g. a Clos network [Clos53].

4.  Feed-Forward-and-Up Mode: Guidelines for Adding Congestion Notification

   Feed-forward-and-up is the mode already used for signalling ECN up the layers through MPLS into IP [RFC5129] and through IP-in-IP tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6 [RFC2473] or IPsec [RFC4301]. These RFCs take a consistent approach, and the following guidelines are designed to ensure this consistency continues as ECN support is added to other protocols that encapsulate IP. The guidelines are also designed to ensure compliance with the more general best current practice for the design of alternate ECN schemes given in [RFC4774] and extended by [RFC8311].

   The rest of this section is structured as follows:

   o  Section 4.1 addresses the most straightforward cases, where [RFC6040] can be applied directly to add ECN to tunnels that are effectively IP-in-IP tunnels, but with shim header(s) between the IP headers.

   o  The subsequent sections give guidelines for adding ECN to a subnet technology that uses feed-forward-and-up mode like IP, but that is not so similar to IP that the rules of [RFC6040] can be applied directly. Specifically:

      *  Sections 4.2, 4.3 and 4.4 respectively address how to add ECN support to the wire protocol and to the encapsulators and decapsulators at the ingress and egress of the subnet.

      *  Section 4.5 deals with the special, but common, case of sequences of tunnels or subnets that all use the same technology.

      *  Section 4.6 deals with the question of reframing when IP packets do not map 1:1 into lower layer frames.

4.1.  IP-in-IP Tunnels with Shim Headers

   A common pattern for many tunnelling protocols is to encapsulate an inner IP header with shim header(s) then an outer IP header. A shim header is defined as one that is not sufficient alone to forward the packet as an outer header. Another common pattern is for a shim to encapsulate a layer 2 (L2) header, which in turn encapsulates (or might encapsulate) an IP header. [I-D.ietf-tsvwg-rfc6040update-shim] clarifies that RFC 6040 is just as applicable when there are shim(s) and possibly an L2 header between two IP headers.

   However, it is not always feasible or necessary to propagate ECN between IP headers when separated by a shim. For instance, it might be too costly to dig to arbitrary depths to find an inner IP header, there may be little or no congestion within the tunnel by design (see null mode in Section 3.4 above), or a legacy implementation might not support ECN. In cases where a tunnel does not support ECN, it is important that the ingress does not copy the ECN field from an inner IP header to an outer one. Therefore Section 4 of [I-D.ietf-tsvwg-rfc6040update-shim] requires network operators to configure the ingress of a tunnel that does not support ECN so that it zeros the ECN field in the outer IP header.

   Nonetheless, in many cases it is feasible to propagate the ECN field between IP headers separated by shim header(s) and/or an L2 header, particularly in the typical case where the outer IP header and the shim(s) are added (or removed) as part of the same procedure.
   Even if the shim(s) encapsulate an L2 header, it is often possible to find an inner IP header within the L2 PDU and propagate ECN between that and the outer IP header. This can be thought of as a special case of the feed-up-and-forward mode (Section 3.2), so the guidelines for this mode apply (Section 5).

   Numerous shim protocols have been defined for IP tunnelling. More recent ones, e.g. Generic UDP Encapsulation (GUE) [I-D.ietf-intarea-gue] and Geneve [I-D.ietf-nvo3-geneve], cite and follow RFC 6040. Some earlier ones, e.g. CAPWAP [RFC5415] and LISP [RFC6830], cite RFC 3168, which is compatible with RFC 6040.

   However, as Section 9.3 of RFC 3168 pointed out, ECN support needs to be defined for many earlier shim-based tunnelling protocols, e.g. L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637], GTP [GTPv1], [GTPv1-U], [GTPv2-C] and Teredo [RFC4380], as well as some recent ones, e.g. VXLAN [RFC7348], NVGRE [RFC7637] and NSH [RFC8300].

   All these IP-based encapsulations can be updated in one shot by simple reference to RFC 6040. However, it would not be appropriate to update all these protocols from within the present guidance document. Instead, a companion specification [I-D.ietf-tsvwg-rfc6040update-shim] has been prepared that has the appropriate standards track status to update standards track protocols. For those that are not under IETF change control, [I-D.ietf-tsvwg-rfc6040update-shim] can only recommend that the relevant body update them.

4.2.  Wire Protocol Design: Indication of ECN Support

   This section is intended to guide the redesign of any lower layer protocol that encapsulates IP to add native ECN support at the lower layer. It reflects the approaches used in [RFC6040] and in [RFC5129].
   Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or [RFC5129] will already satisfy this guidance.

   A lower layer (or subnet) congestion notification system:

   1.  SHOULD NOT apply explicit congestion notifications to PDUs that are destined for legacy layer-4 transport implementations that will not understand ECN, and

   2.  SHOULD NOT apply explicit congestion notifications to PDUs if the egress of the subnet might not propagate congestion notifications onward into the higher layer.

       We use the term ECN-PDU for a PDU on a feedback loop that will propagate congestion notification properly because it meets both the above criteria, and the term Not-ECN-PDU for a PDU on a feedback loop that fails to meet at least one of the criteria, and will therefore not propagate congestion notification properly. A corollary of the above is that a lower layer congestion notification protocol:

   3.  SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs.

   Note that there is no need for all interior nodes within a subnet to be able to mark congestion explicitly. A mix of ECN and drop signals from different nodes is fine. However, if _any_ interior nodes might generate ECN markings, guideline 2 above says that all relevant egress node(s) SHOULD be able to propagate those markings up to the higher layer.

   In IP, if the ECN field in each PDU is cleared to the Not-ECT (not ECN-capable transport) codepoint, it indicates that the L4 transport will not understand congestion markings. A congested buffer must not mark these Not-ECT PDUs, and therefore drops them instead.

   The mechanism a lower layer uses to distinguish the ECN-capability of PDUs need not mimic that of IP. The above guidelines merely say that the lower layer system, as a whole, should achieve the same outcome.
   For instance, ECN-capable feedback loops might use PDUs that are identified by a particular set of labels or tags. Alternatively, logical link protocols that use flow state might determine whether a PDU can be congestion marked by checking for ECN support in the flow state. Other protocols might depend on out-of-band control signals.

   The per-domain checking of ECN support in MPLS [RFC5129] is a good example of a way to avoid sending congestion markings to L4 transports that will not understand them, without using any header space in the subnet protocol.

   In MPLS, header space is extremely limited; therefore RFC 5129 does not provide a field in the MPLS header to indicate whether the PDU is an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are allowed to set explicit congestion indications without checking whether the PDU is destined for an L4 transport that will understand them. Nonetheless, this is made safe by requiring the network operator to upgrade all decapsulating edges of a whole domain at once, as soon as even one switch within the domain is configured to mark rather than drop during congestion. Therefore, any edge node that might decapsulate a packet will be capable of checking whether the higher layer transport is ECN-capable. When decapsulating a CE-marked packet, if the decapsulator discovers that the higher layer (inner header) indicates the transport is not ECN-capable, it drops the packet, effectively on behalf of the earlier congested node (see Decapsulation Guideline 1 in Section 4.4).

   It was only appropriate to define such an incremental deployment strategy because MPLS is targeted solely at professional operators, who can be expected to ensure that a whole subnetwork is consistently configured.
   This strategy might not be appropriate for other link technologies targeted at zero-configuration deployment or deployment by the general public (e.g. Ethernet). For such 'plug-and-play' environments it will be necessary to invent a failsafe approach that ensures congestion markings will never fall into black holes, no matter how inconsistently a system is put together. Alternatively, congestion notification relying on correct system configuration could be confined to flavours of Ethernet intended only for professional network operators, such as Provider Backbone Bridges (PBB [IEEE802.1Q]; previously 802.1ah).

   ECN support in TRILL [I-D.ietf-trill-ecn-support] provides a good example of how to add ECN to a lower layer protocol without relying on careful and consistent operator configuration. TRILL provides an extension header word with space for flags of different categories, depending on whether logic to understand the extension is critical. The congestion experienced marking has been defined as a 'critical ingress-to-egress' flag. So if a transit RBridge sets this flag and an egress RBridge does not have any logic to process it, the egress will drop the frame, which is the desired default action anyway. Therefore TRILL RBridges can be updated with support for ECN in no particular order and, at the egress of the TRILL campus, congestion notification will be propagated to IP as ECN whenever ECN logic has been implemented, or as drop otherwise.

   QCN [IEEE802.1Q] is not intended to extend beyond a single subnet, or to interoperate with ECN. Nonetheless, the way QCN indicates to lower layer devices that the end-points will not understand QCN provides another example that a lower layer protocol designer might be able to mimic for their scenario.
   An operator can define certain Priority Code Points (PCPs [IEEE802.1Q]; previously 802.1p) to indicate non-QCN frames, and an ingress bridge is required to map arriving not-QCN-capable IP packets to one of these non-QCN PCPs.

4.3.  Encapsulation Guidelines

   This section is intended to guide the redesign of any node that encapsulates IP with a lower layer header when adding native ECN support to the lower layer protocol. It reflects the approaches used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or [RFC5129] will already satisfy this guidance.

   1.  Egress Capability Check: A subnet ingress needs to be sure that the corresponding egress of a subnet will propagate any congestion notification added to the outer header across the subnet. This is necessary in addition to checking that an incoming PDU indicates an ECN-capable (L4) transport. Examples of how this guarantee might be provided include:

       *  by configuration (e.g. if any label switches in a domain support ECN marking, [RFC5129] requires all egress nodes to have been configured to propagate ECN);

       *  by the ingress explicitly checking that the egress propagates ECN (e.g. an early attempt to add ECN support to TRILL used IS-IS to check path capabilities before adding ECN extension flags to each frame [RFC7780]);

       *  by inherent design of the protocol (e.g. by encoding ECN marking on the outer header in such a way that a legacy egress that does not understand ECN will consider the PDU corrupt or invalid and discard it, thus at least propagating a form of congestion signal).

   2.  Egress Fails Capability Check: If the ingress cannot guarantee that the egress will propagate congestion notification, the ingress SHOULD disable ECN at the lower layer when it forwards the PDU.
       An example of how the ingress might disable ECN at the lower layer would be by setting the outer header of the PDU to identify it as a Not-ECN-PDU, assuming the subnet technology supports such a concept.

   3.  Standard Congestion Monitoring Baseline: Once the ingress to a subnet has established that the egress will correctly propagate ECN, on encapsulation it SHOULD encode the same level of congestion in outer headers as is arriving in incoming headers. For example, it might copy any incoming congestion notification into the outer header of the lower layer protocol.

       This ensures that bulk congestion monitoring of outer headers (e.g. by a network management node monitoring ECN in passing frames) will measure congestion accumulated along the whole upstream path since the Load Regulator, not just since the ingress of the subnet. A node that is not the Load Regulator SHOULD NOT re-initialize the level of CE markings in the outer header to zero.

       It would still also be possible to measure congestion introduced across one subnet (or tunnel) by subtracting the level of CE markings on inner headers from that on outer headers (see Appendix C of [RFC6040]). For example:

       *  If this guideline has been followed and if the level of CE markings is 0.4% on the outer and 0.1% on the inner, 0.4% congestion has been introduced across all the networks since the Load Regulator, and 0.3% (= 0.4% - 0.1%) has been introduced since the ingress to the current subnet (or tunnel);

       *  Without this guideline, if the subnet ingress had re-initialized the outer congestion level to zero, the outer and inner would measure 0.3% and 0.1% respectively. It would still be possible to infer that the congestion introduced since the Load Regulator was 0.4% (= 0.3% + 0.1%), but only if the monitoring system somehow knows whether the subnet ingress re-initialized the congestion level.

       As long as subnet and tunnel technologies use the standard congestion monitoring baseline in this guideline, monitoring systems will know to use the former approach, rather than having to "somehow know" which approach to use.

4.4.  Decapsulation Guidelines

   This section is intended to guide the redesign of any node that decapsulates IP from within a lower layer header when adding native ECN support to the lower layer protocol. It reflects the approaches used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or [RFC5129] will already satisfy this guidance.

   A subnet egress SHOULD NOT simply copy congestion notification from outer headers to the forwarded header. It SHOULD calculate the outgoing congestion notification field from the inner and outer headers using the following guidelines. If there is any conflict, rules earlier in the list take precedence over rules later in the list:

   1.  If the arriving inner header is a Not-ECN-PDU, it implies the L4 transport will not understand explicit congestion markings. Then:

       *  If the outer header carries an explicit congestion marking, drop is the only indication of congestion that the L4 transport will understand. If the congestion marking is the most severe possible, the packet MUST be dropped. However, if congestion can be marked with multiple levels of severity and the packet's marking is not the most severe, this requirement can be relaxed to: the packet SHOULD be dropped.

       *  If the outer is a Not-ECN-PDU, or an ECN-PDU that carries no indication of congestion, the PDU SHOULD be forwarded, but still as a Not-ECN-PDU.

   2.  If the outer header does not support explicit congestion notification (a Not-ECN-PDU), but the inner header does (an ECN-PDU), the inner header SHOULD be forwarded unchanged.

   3.  In some lower layer protocols congestion may be signalled as a numerical level, such as in the control frames of quantized congestion notification (QCN [IEEE802.1Q]). If such a multi-bit encoding encapsulates an ECN-capable IP data packet, a function will be needed to convert the quantized congestion level into the frequency of congestion markings in outgoing IP packets.

   4.  Congestion indications might be encoded by a severity level. For instance, increasing levels of congestion might be encoded by numerically increasing indications, e.g. pre-congestion notification (PCN) can be encoded in each PDU at three severity levels in IP or MPLS [RFC6660], and the default encapsulation and decapsulation rules [RFC6040] are compatible with this interpretation of the ECN field.

       If the arriving inner header is an ECN-PDU, where the inner and outer headers carry indications of congestion of different severity, the more severe indication SHOULD be forwarded in preference to the less severe.

   5.  The inner and outer headers might carry a combination of congestion notification fields that should not be possible given any currently used protocol transitions. For instance, if Encapsulation Guideline 3 in Section 4.3 had been followed, it should not be possible to have a less severe indication of congestion in the outer than in the inner. It MAY be appropriate to log unexpected combinations of headers and possibly raise an alarm.

       If a safe outgoing codepoint can be defined for such a PDU, the PDU SHOULD be forwarded rather than dropped. Some implementers discard PDUs with currently unused combinations of headers just in case they represent an attack.
       However, an approach using alarms and policy-mediated drop is preferable to hard-coded drop, so that operators can keep track of possible attacks but currently unused combinations are not precluded from future use through new standards actions.

4.5.  Sequences of Similar Tunnels or Subnets

   In some deployments, particularly in 3GPP networks, an IP packet may traverse two or more IP-in-IP tunnels in sequence that all use identical technology (e.g. GTP).

   In such cases, it would be sufficient for every encapsulation and decapsulation in the chain to comply with RFC 6040. Alternatively, as an optimisation, a node that decapsulates a packet and immediately re-encapsulates it for the next tunnel MAY copy the incoming outer ECN field directly to the outgoing outer and the incoming inner ECN field directly to the outgoing inner. Then the overall behavior across the sequence of tunnel segments would still be consistent with RFC 6040.

   Appendix C of RFC 6040 describes how a tunnel egress can monitor how much congestion has been introduced within a tunnel. A network operator might want to monitor how much congestion has been introduced within a whole sequence of tunnels. Using the technique in Appendix C of RFC 6040 at the final egress, the operator could monitor the whole sequence of tunnels, but only if the above optimisation were used consistently along the sequence of tunnels, in order to make it appear as a single tunnel. Therefore, tunnel endpoint implementations SHOULD allow the operator to configure whether this optimisation is enabled.

   When ECN support is added to a subnet technology, consideration SHOULD be given to a similar optimisation between subnets in sequence if they all use the same technology.

4.6.  Reframing and Congestion Markings

   The guidance in this section is worded in terms of framing boundaries, but it applies equally whether the protocol data units are frames, cells, packets or fragments.

   Where an AQM marks the ECN field of IP packets as they queue into a layer-2 link, there will be no problem with framing boundaries, because the ECN markings would be applied directly to IP packets. The guidance in this section is only applicable where an ECN capability is being added to a layer-2 protocol so that layer-2 frames can be ECN-marked by an AQM at layer 2. This would only be necessary where AQM will be applied at pure layer-2 nodes (without IP-awareness). Where framing boundaries do not necessarily align with packet boundaries, the following guidance will be needed. It explains how to propagate ECN markings from layer-2 frame headers when they are stripped off and IP PDUs with different boundaries are reassembled for forwarding.

   Congestion indications SHOULD be propagated so that an encapsulator or decapsulator approximately preserves the proportion of PDUs carrying congestion indications between the PDUs arriving and those leaving.

   The mechanism for propagating congestion indications SHOULD ensure that any incoming congestion indication is propagated immediately, rather than being held until enough further congestion indications have arrived to justify indicating congestion on an outgoing PDU.

5.  Feed-Up-and-Forward Mode: Guidelines for Adding Congestion Notification

   The guidance in this section is applicable, for example, when IP packets:

   o  are encapsulated in Ethernet headers, which have no support for ECN;

   o  are forwarded by the eNode-B (base station) of a 3GPP radio access network, which is required to apply ECN marking during congestion [LTE-RA] [UTRAN], but the Packet Data Convergence Protocol (PDCP) that encapsulates the IP header over the radio access has no support for ECN.

   This guidance also generalizes to encapsulation by other subnet technologies with no native support for explicit congestion notification at the lower layer, but with support for finding and processing an IP header. It is unlikely to be applicable or necessary for IP-in-IP encapsulation, where feed-forward-and-up mode based on [RFC6040] would be more appropriate.

   Marking the IP header while switching at layer 2 (by using a layer-3 switch) or while forwarding in a radio access network seems to represent a layering violation. However, it can be considered a benign optimisation if the guidelines below are followed. Feed-up-and-forward is certainly not a general alternative to implementing feed-forward congestion notification in the lower layer, because:

   o  IPv4 and IPv6 are not the only layer-3 protocols that might be encapsulated by lower layer protocols

   o  Link-layer encryption might be in use, making the layer-2 payload inaccessible

   o  Many Ethernet switches do not have 'layer-3 switch' capabilities, so they cannot read or modify an IP payload

   o  It might be costly to find an IP header (v4 or v6) when it may be encapsulated by more than one lower layer header, e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah).
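   The cost of finding an IP header can be kept bounded by searching only a fixed number of headers deep. The Python sketch below illustrates one way a hypothetical layer-3-aware Ethernet switch might do this once its AQM has decided to signal congestion on a frame: it skips at most `max_tags` IEEE 802.1Q/802.1ad VLAN tags, sets CE in the first IPv4 or IPv6 header it finds if that header is ECN-capable, and otherwise falls back to drop. The EtherType and ECN codepoint values are the standard ones; the function names, the default tag limit and the choice of drop as the fallback are illustrative assumptions, and other encapsulations (e.g. MAC-in-MAC) are not handled.

```python
# Sketch of feed-up-and-forward marking in a hypothetical layer-3-aware
# Ethernet switch. Function names and the max_tags limit are illustrative
# assumptions; the EtherType and ECN codepoint values are the standard ones.

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
VLAN_TAG_TYPES = {0x8100, 0x88A8}  # 802.1Q C-tag and 802.1ad S-tag
NOT_ECT, CE = 0b00, 0b11           # ECN codepoints (RFC 3168)


def ipv4_checksum(hdr) -> int:
    """Internet checksum over hdr: with the checksum field zeroed this is the
    value to store; computed over a valid header it is 0."""
    total = 0
    for i in range(0, len(hdr), 2):
        total += (hdr[i] << 8) | hdr[i + 1]
    while total > 0xFFFF:  # fold carries back in (ones'-complement addition)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def find_ip_header(frame, max_tags=2):
    """Return (ethertype, offset of the first IP header), or None.

    Skips at most max_tags VLAN tags, then gives up. Also gives up on an
    unrecognised EtherType or a truncated frame."""
    pos = 12  # first EtherType follows the two 6-byte MAC addresses
    for _ in range(max_tags + 1):
        if pos + 2 > len(frame):
            return None
        ethertype = int.from_bytes(frame[pos:pos + 2], "big")
        if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
            return ethertype, pos + 2
        if ethertype not in VLAN_TAG_TYPES:
            return None
        pos += 4  # skip this tag's EtherType and 2-byte TCI
    return None


def signal_congestion(frame: bytearray, max_tags=2) -> str:
    """Called once the AQM has decided to signal congestion on this frame.

    Returns "mark" if CE was set in the first IP header found, or "drop"
    if the switch should fall back to dropping the frame."""
    found = find_ip_header(frame, max_tags)
    if found is None:
        return "drop"  # no IP header within the search limit
    ethertype, ip = found
    if ethertype == ETHERTYPE_IPV4:
        if ip + 20 > len(frame):
            return "drop"
        ihl = (frame[ip] & 0x0F) * 4
        if ihl < 20 or ip + ihl > len(frame) or (frame[ip + 1] & 0x03) == NOT_ECT:
            return "drop"
        frame[ip + 1] |= CE                   # ECN = low 2 bits of the TOS byte
        frame[ip + 10:ip + 12] = b"\x00\x00"  # zero the checksum, then recompute
        frame[ip + 10:ip + 12] = ipv4_checksum(frame[ip:ip + ihl]).to_bytes(2, "big")
    else:
        if ip + 40 > len(frame) or ((frame[ip + 1] >> 4) & 0x03) == NOT_ECT:
            return "drop"
        frame[ip + 1] |= CE << 4              # ECN = low 2 bits of Traffic Class
    return "mark"
```

   Returning "drop" both for Not-ECT packets and when no IP header is found within the limit means the switch never behaves worse than an ECN-unaware switch. A real implementation would still face the other limitations listed above, such as link-layer encryption.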

   Nonetheless, configuring lower layer equipment to look for an ECN field in an encapsulated IP header is a useful optimisation. If the implementation follows the guidelines below, this optimisation does not have to be confined to a controlled environment such as within a data centre; it could usefully be applied on any network, even if the operator cannot be sure that none of the above issues will apply:

   1.  If a native lower-layer congestion notification mechanism exists for a subnet technology, it is safe to mix feed-up-and-forward with feed-forward-and-up on other switches in the same subnet. However, it will generally be more efficient to use the native mechanism.

   2.  The depth of the search for an IP header SHOULD be limited. If an IP header is not found soon enough, or an unrecognized or unreadable header is encountered, the switch SHOULD resort to an alternative means of signalling congestion (e.g. drop, or the native lower layer mechanism if available).

   3.  It is sufficient to use the first IP header found in the stack; the egress of the relevant tunnel can propagate congestion notification upwards to any more deeply encapsulated IP headers later.

6.  Feed-Backward Mode: Guidelines for Adding Congestion Notification

   It can be seen from Section 3.3 that congestion notification in a subnet using feed-backward mode has generally not been designed to be directly coupled with IP layer congestion notification. The subnet attempts to minimize congestion internally, and if the incoming load at the ingress exceeds the capacity somewhere through the subnet, the layer 3 buffer into the ingress backs up. Thus, a feed-backward mode subnet is in some sense similar to a null mode subnet, in that there is no need for any direct interaction between the subnet and higher layer congestion notification.
   Therefore no detailed protocol design guidelines are appropriate. Nonetheless, a more general guideline is appropriate:

      A subnetwork technology intended to eventually interface to IP SHOULD NOT be designed using only the feed-backward mode, which is certainly best for a stand-alone subnet, but would need to be modified to work efficiently as part of the wider Internet, because IP uses feed-forward-and-up mode.

   The feed-backward approach at least works beneath IP, where the term 'works' is used only in a narrow functional sense, because feed-backward can result in very inefficient and sluggish congestion control. The exception is when it is confined to the subnet directly connected to the original data source, in which case it is faster than feed-forward. It would be valid to design a protocol that could work in feed-backward mode for paths that only cross one subnet, and in feed-forward-and-up mode for paths that cross subnets.

   In the early days of TCP/IP, a similar feed-backward approach was tried for explicit congestion signalling, using source-quench (SQ) ICMP control packets. However, SQ fell out of favour and is now formally deprecated [RFC6633]. The main problem was that it is hard for a data source to tell the difference between a spoofed SQ message and a quench request from a genuine buffer on the path. It is also hard for a lower layer buffer to address an SQ message to the original source port number, which may be buried within many layers of headers, and possibly encrypted.

   QCN (also known as backward congestion notification, BCN; see Sections 30-33 of [IEEE802.1Q]; previously known as 802.1Qau) uses a feed-backward mode structurally similar to ATM's relative rate mechanism. However, QCN confines its applicability to scenarios such as some data centres where all endpoints are directly attached by the same Ethernet technology.
   If a QCN subnet were later connected into a wider IP-based internetwork (e.g. when attempting to interconnect multiple data centres), it would suffer the inefficiency shown in Figure 3.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   If a lower layer wire protocol is redesigned to include explicit congestion signalling in-band in the protocol header, care SHOULD be taken to ensure that the field used is specified as mutable during transit. Otherwise interior nodes signalling congestion would invalidate any authentication protocol applied to the lower layer header, by altering a header field that had been assumed to be immutable.

   The redesign of protocols that encapsulate IP in order to propagate congestion signals between layers raises potential signal integrity concerns. Experimental or proposed approaches exist for assuring the end-to-end integrity of in-band congestion signals, e.g.:

   o  Congestion exposure (ConEx) for networks to audit that their congestion signals are not being suppressed by other networks or by receivers, and for networks to police that senders are responding sufficiently to the signals, irrespective of the L4 transport protocol used [RFC7713].

   o  A test for a sender to detect whether a network or the receiver is suppressing congestion signals (for example, see the second paragraph of Section 20.2 of [RFC3168]).

   Given that these end-to-end approaches are already being specified, it would make little sense to attempt to design hop-by-hop congestion signal integrity into a new lower layer protocol, because end-to-end integrity inherently achieves hop-by-hop integrity.

   Section 6 gives vulnerability to spoofing as one of the reasons for deprecating feed-backward mode.

9.  Conclusions

   Following the guidance in this document enables ECN support to be extended to numerous protocols that encapsulate IP (v4 and v6) in a consistent way, so that IP continues to fulfil its role as an end-to-end interoperability layer. This includes:

   o  A wide range of tunnelling protocols, including those with various forms of shim header between two IP headers, possibly also separated by an L2 header;

   o  A wide range of subnet technologies, particularly those that work in the same 'feed-forward-and-up' mode that is used to support ECN in IP and MPLS.

   Guidelines have been defined for supporting propagation of ECN between Ethernet and IP on so-called Layer-3 Ethernet switches, using a 'feed-up-and-forward' mode. This approach could enable other subnet technologies to pass ECN signals into the IP layer, even if they do not support ECN natively.

   Finally, attempting to add ECN to a subnet technology in feed-backward mode is deprecated except in special cases, due to its likely sluggish response to congestion.

10.  Acknowledgements

   Thanks to Gorry Fairhurst and David Black for extensive reviews. Thanks also to the following reviewers: Joe Touch, Andrew McGregor, Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon, Donald Eastlake, Jonathan Morton and Michael Welzl, who pointed out that lower layer congestion notification signals may have different semantics to those in IP. Thanks are also due to the tsvwg chairs, TSV ADs and IETF liaison people such as Eric Gray, Dan Romascanu and Gonzalo Camarillo for helping with the liaisons with the IEEE and 3GPP. And thanks to Georg Mayer and particularly to Erik Guttman for the extensive search and categorisation of any 3GPP specifications that cite ECN specifications.

   Bob Briscoe was part-funded by the European Community under its Seventh Framework Programme through the Trilogy project (ICT-216372) for initial drafts and through the Reducing Internet Transport Latency (RITE) project (ICT-317700) subsequently. The views expressed here are solely those of the authors.

11.  Contributors

   Pat Thaler
   Broadcom Corporation (retired)
   CA
   USA

   Pat was a co-author of this draft, but retired before its publication.

12.  Comments Solicited

   Comments and questions are encouraged and very welcome. They can be addressed to the IETF Transport Area working group mailing list, and/or to the authors.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC3819]  Karn, P., Ed., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, DOI 10.17487/RFC3819, July 2004.

   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, DOI 10.17487/RFC4774, November 2006.

   [RFC5129]  Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion Marking in MPLS", RFC 5129, DOI 10.17487/RFC5129, January 2008.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, DOI 10.17487/RFC6040, November 2010.

13.2.  Informative References

   [ATM-TM-ABR]  Cisco, "Understanding the Available Bit Rate (ABR) Service Category for ATM VCs", Design Technote 10415, June 2005.

   [Buck00] Buckwalter, J., "Frame Relay: Technology and Practice", Pub. Addison Wesley, ISBN-13: 978-0201485240, 2000.

   [Clos53] Clos, C., "A Study of Non-Blocking Switching Networks", Bell System Technical Journal 32(2):406-424, March 1953.

   [GTPv1] 3GPP, "GPRS Tunnelling Protocol (GTP) across the Gn and Gp interface", Technical Specification TS 29.060.

   [GTPv1-U] 3GPP, "General Packet Radio System (GPRS) Tunnelling Protocol User Plane (GTPv1-U)", Technical Specification TS 29.281.

   [GTPv2-C] 3GPP, "Evolved General Packet Radio Service (GPRS) Tunnelling Protocol for Control plane (GTPv2-C)", Technical Specification TS 29.274.

   [I-D.ietf-intarea-gue] Herbert, T., Yong, L., and O. Zia, "Generic UDP Encapsulation", draft-ietf-intarea-gue-09 (work in progress), October 2019.

   [I-D.ietf-nvo3-geneve] Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic Network Virtualization Encapsulation", draft-ietf-nvo3-geneve-16 (work in progress), March 2020.

   [I-D.ietf-trill-ecn-support] Eastlake, D. and B. Briscoe, "TRILL (TRansparent Interconnection of Lots of Links): ECN (Explicit Congestion Notification) Support", draft-ietf-trill-ecn-support-07 (work in progress), February 2018.

   [I-D.ietf-tsvwg-ecn-l4s-id] De Schepper, K. and B. Briscoe, "Identifying Modified Explicit Congestion Notification (ECN) Semantics for Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s-id-11 (work in progress), November 2020.

   [I-D.ietf-tsvwg-rfc6040update-shim] Briscoe, B., "Propagating Explicit Congestion Notification Across IP Tunnel Headers Separated by a Shim", draft-ietf-tsvwg-rfc6040update-shim-10 (work in progress), March 2020.

   [IEEE802.1Q] IEEE, "IEEE Standard for Local and Metropolitan Area Network--Bridges and Bridged Networks", IEEE Std 802.1Q-2018, July 2018.

   [ITU-T.I.371] ITU-T, "Traffic Control and Congestion Control in B-ISDN", ITU-T Rec. I.371 (03/04), March 2004.

   [Leiserson85] Leiserson, C., "Fat-trees: universal networks for hardware-efficient supercomputing", IEEE Transactions on Computers 34(10):892-901, October 1985.

   [LTE-RA] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Overall description; Stage 2", Technical Specification TS 36.300.

   [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI 10.17487/RFC2003, October 1996.

   [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, December 1998.

   [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W., and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, DOI 10.17487/RFC2637, July 1999.

   [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, DOI 10.17487/RFC2661, August 1999.

   [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, DOI 10.17487/RFC2784, March 2000.

   [RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, DOI 10.17487/RFC2884, July 2000.

   [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3931] Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, DOI 10.17487/RFC3931, March 2005.

   [RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005.

   [RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs)", RFC 4380, DOI 10.17487/RFC4380, February 2006.

   [RFC5415] Calhoun, P., Ed., Montemurro, M., Ed., and D. Stanley, Ed., "Control And Provisioning of Wireless Access Points (CAPWAP) Protocol Specification", RFC 5415, DOI 10.17487/RFC5415, March 2009.

   [RFC6633] Gont, F., "Deprecation of ICMP Source Quench Messages", RFC 6633, DOI 10.17487/RFC6633, May 2012.

   [RFC6660] Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three Pre-Congestion Notification (PCN) States in the IP Header Using a Single Diffserv Codepoint (DSCP)", RFC 6660, DOI 10.17487/RFC6660, July 2012.

   [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 10.17487/RFC6830, January 2013.

   [RFC7323] Borman, D., Braden, B., Jacobson, V., and R. Scheffenegger, Ed., "TCP Extensions for High Performance", RFC 7323, DOI 10.17487/RFC7323, September 2014.

   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network Virtualization Using Generic Routing Encapsulation", RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) Concepts, Abstract Mechanism, and Requirements", RFC 7713, DOI 10.17487/RFC7713, December 2015.

   [RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A., Ghanwani, A., and S. Gupta, "Transparent Interconnection of Lots of Links (TRILL): Clarifications, Corrections, and Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016.

   [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

   [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using Explicit Congestion Notification (ECN)", RFC 8087, DOI 10.17487/RFC8087, March 2017.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., and G. Judd, "Data Center TCP (DCTCP): TCP Congestion Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, October 2017.

   [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed., "Network Service Header (NSH)", RFC 8300, DOI 10.17487/RFC8300, January 2018.

   [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion Notification (ECN) Experimentation", RFC 8311, DOI 10.17487/RFC8311, January 2018.

   [UTRAN] 3GPP, "UTRAN Overall Description", Technical Specification TS 25.401.

Appendix A. Changes in This Version (to be removed by RFC Editor)

   From ietf-12 to ietf-13

   * Following 3rd tsvwg WGLC:

      + Formalized update to RFC 3819 in its own subsection (1.1) and referred to it in the abstract

      + Scope: Clarified that the specification of alternative ECN semantics using ECT(1) was not in RFC 4774, but rather in RFC 8311, and that using a DSCP to indicate alternative semantics causes problems at domain boundaries as well as at tunnels.

      + Terminology: tightened up definitions of ECN-PDU and Not-ECN-PDU, and removed definition of Congestion Baseline, given it was only used once.

      + Mentioned QCN where feed-backward is first introduced (S.3), referring forward to where it is discussed more deeply (S.4).

      + Clarified that the IS-IS solution to adding ECN support to TRILL was not pursued

      + Completely rewrote the rationale for the guideline about a Standard Congestion Monitoring Baseline, to focus on standardization of the otherwise unknown scenario used, rather than the relative usefulness of the info in each approach

      + Explained the re-framing problem better and added fragmentation as another possible cause of the problem

      + Acknowledged new reviewers

      + Updated references, replaced citations of 802.1Qau and 802.1ah with the rolled-up 802.1Q, and added citations of fat-trees and Clos networks

      + Numerous other editorial improvements

   From ietf-11 to ietf-12

   * Updated references

   From ietf-10 to ietf-11

   * Removed short section (was 3) 'Guidelines for All Cases' because it was out of scope, being covered by RFC 4774. Expanded the Scope section (1.2) to explain all this.
     Explained that the default encap/decap rules already support certain alternative semantics, particularly all three of the alternative semantics for ECT(1): equivalent to ECT(0), higher severity than ECT(0), and unmarked but implying different marking semantics from ECT(0).

   * Clarified why the QCN example was being given even though it is not about incremental deployment of ECN

   * Pointed to the spoofing issue with feed-backward mode from the Security Considerations section, to aid security review.

   * Removed any ambiguity in the word 'transport' throughout

   From ietf-09 to ietf-10

   * Updated section 5.1 on "IP-in-IP tunnels with Shim Headers" to be consistent with updates to draft-ietf-tsvwg-rfc6040update-shim.

   * Removed reference to the ECN nonce, which has been made historic by RFC 8311

   * Removed "Open Issues" Appendix, given all have been addressed.

   From ietf-08 to ietf-09

   * Updated para in Intro that listed all the IP-in-IP tunnelling protocols, to instead refer to draft-ietf-tsvwg-rfc6040update-shim

   * Updated section 5.1 on "IP-in-IP tunnels with Shim Headers" to summarize guidance that has evolved as rfc6040update-shim has developed.

   From ietf-07 to ietf-08: Refreshed to avoid expiry. Updated references.

   From ietf-06 to ietf-07:

   * Added the people involved in liaisons to the acknowledgements.

   From ietf-05 to ietf-06:

   * Introduction: Added GUE and Geneve as examples of tightly coupled shims between IP headers that cite RFC 6040. And added VXLAN to list of those that do not.

   * Replaced normative text about tightly coupled shims between IP headers, with reference to new draft-ietf-tsvwg-rfc6040update-shim

   * Wire Protocol Design: Indication of ECN Support: Added TRILL as an example of a well-designed protocol that does not need an indication of ECN support in the wire protocol.

   * Encapsulation Guidelines: In the case of a Not-ECN-PDU with a CE outer, replaced "SHOULD be dropped" with explanations of when SHOULD or MUST are appropriate.

   * Feed-Up-and-Forward Mode: Explained examples more carefully, referred to PDCP and cited UTRAN spec as well as E-UTRAN.

   * Updated references.

   * Marked open issues as resolved, but did not delete Open Issues Appendix (yet).

   From ietf-04 to ietf-05:

   * Explained why tightly coupled shim headers only "SHOULD" comply with RFC 6040, not "MUST".

   * Updated references

   From ietf-03 to ietf-04:

   * Addressed Richard Scheffenegger's review comments: primarily editorial corrections, and addition of examples for clarity.

   From ietf-02 to ietf-03:

   * Updated references, and cited RFC 4774.

   From ietf-01 to ietf-02:

   * Added Section for guidelines that are applicable in all cases.

   * Updated references.

   From ietf-00 to ietf-01: Updated references.

   From briscoe-04 to ietf-00: Changed filename following tsvwg adoption.

   From briscoe-03 to 04:

   * Re-arranged the introduction to describe the purpose of the document first before introducing ECN in more depth. And clarified the introduction throughout.

   * Added applicability to 3GPP TS 36.300.

   From briscoe-02 to 03:

   * Scope section:

      + Added dependence on correct propagation of traffic class information

      + For the feed-backward mode, deemed multicast and anycast out of scope

   * Ensured all guidelines referring to subnet technologies also refer to tunnels and vice versa by adding applicability sentences at the start of sections 4.1, 4.2, 4.3, 4.4, 4.6 and 5.

   * Added Security Considerations on ensuring congestion signal fields are classed as immutable and on using end-to-end congestion signal integrity technologies rather than hop-by-hop.

   From briscoe-01 to 02:

   * Added authors: JK & PT

   * Added

      + Section 4.1 "IP-in-IP Tunnels with Tightly Coupled Shim Headers"

      + Section 4.5 "Sequences of Similar Tunnels or Subnets"

      + roadmap at the start of Section 4, given the subsections have become quite fragmented.

      + Section 9 "Conclusions"

   * Clarified why transports are starting to be able to saturate interior links

   * Under Section 1.1, addressed the question of alternative signal semantics and included multicast & anycast.

   * Under Section 3.1, included a 3GPP example.

   * Section 4.2. "Wire Protocol Design":

      + Altered guideline 2. to make it clear that it only applies to the immediate subnet egress, not later ones

      + Added a reminder that it is only necessary to check that ECN propagates at the egress, not whether interior nodes mark ECN

      + Added example of how QCN uses 802.1p to indicate support for QCN.

   * Added references to Appendix C of RFC6040, about monitoring the amount of congestion signals introduced within a tunnel

   * Appendix A: Added more issues to be addressed, including plan to produce a standards track update to IP-in-IP tunnel protocols.

   * Updated acks and references

   From briscoe-00 to 01:

   * Intended status: BCP (was Informational) & updates 3819 added.

   * Briefer Introduction: Introductory para justifying benefits of ECN. Moved all but a brief enumeration of modes of operation to their own new section (from both Intro & Scope). Introduced incr. deployment as most tricky part.

   * Tightened & added to terminology section

   * Structured with Modes of Operation, then Guidelines section for each mode.

   * Tightened up guideline text to remove vagueness / passive voice / ambiguity and highlight main guidelines as numbered items.

   * Added Outstanding Document Issues Appendix

   * Updated references

Authors' Addresses

   Bob Briscoe
   Independent
   UK

   EMail: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/

   John Kaippallimalil
   Futurewei
   5700 Tennyson Parkway, Suite 600
   Plano, Texas 75024
   USA

   EMail: kjohn@futurewei.com