Transport Area Working Group                                  B. Briscoe
Internet-Draft                                               Independent
Updates: 3819 (if approved)                            J. Kaippallimalil
Intended status: Best Current Practice                         Futurewei
Expires: November 26, 2021                                  May 25, 2021

    Guidelines for Adding Congestion Notification to Protocols that
                             Encapsulate IP
                draft-ietf-tsvwg-ecn-encap-guidelines-16

Abstract

   The purpose of this document is to guide the design of congestion
   notification in any lower layer or tunnelling protocol that
   encapsulates IP.  The aim is for explicit congestion signals to
   propagate consistently from lower layer protocols into IP.  Then the
   IP internetwork layer can act as a portability layer to carry
   congestion notification from non-IP-aware congested nodes up to the
   transport layer (L4).  Following these guidelines should assure
   interworking among IP layer and lower layer congestion notification
   mechanisms, whether specified by the IETF or other standards bodies.
   This document updates the advice to subnetwork designers about ECN in
   RFC 3819.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 26, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Update to RFC 3819
     1.2.  Scope
   2.  Terminology
   3.  Modes of Operation
     3.1.  Feed-Forward-and-Up Mode
     3.2.  Feed-Up-and-Forward Mode
     3.3.  Feed-Backward Mode
     3.4.  Null Mode
   4.  Feed-Forward-and-Up Mode: Guidelines for Adding Congestion
       Notification
     4.1.  IP-in-IP Tunnels with Shim Headers
     4.2.  Wire Protocol Design: Indication of ECN Support
     4.3.  Encapsulation Guidelines
     4.4.  Decapsulation Guidelines
     4.5.  Sequences of Similar Tunnels or Subnets
     4.6.  Reframing and Congestion Markings
   5.  Feed-Up-and-Forward Mode: Guidelines for Adding Congestion
       Notification
   6.  Feed-Backward Mode: Guidelines for Adding Congestion
       Notification
   7.  IANA Considerations
   8.  Security Considerations
   9.  Conclusions
   10. Acknowledgements
   11. Contributors
   12. Comments Solicited
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Changes in This Version (to be removed by RFC Editor)
   Authors' Addresses

1.  Introduction

   The benefits of Explicit Congestion Notification (ECN) described in
   [RFC8087] and summarized below can only be fully realized if support
   for ECN is added to the relevant subnetwork technology, as well as to
   IP.  When a lower layer buffer drops a packet obviously it does not
   just drop at that layer; the packet disappears from all layers.
In 101 contrast, when active queue management (AQM) at a lower layer marks a 102 packet with ECN, the marking needs to be explicitly propagated up the 103 layers. The same is true if AQM marks the outer header of a packet 104 that encapsulates inner tunnelled headers. Forwarding ECN is not as 105 straightforward as other headers because it has to be assumed ECN may 106 be only partially deployed. If a lower layer header that contains 107 ECN congestion indications is stripped off by a subnet egress that is 108 not ECN-aware, or if the ultimate receiver or sender is not ECN- 109 aware, congestion needs to be indicated by dropping a packet, not 110 marking it. 112 The purpose of this document is to guide the addition of congestion 113 notification to any subnet technology or tunnelling protocol, so that 114 lower layer AQM algorithms can signal congestion explicitly and it 115 will propagate consistently into encapsulated (higher layer) headers, 116 otherwise the signals will not reach their ultimate destination. 118 ECN is defined in the IP header (v4 and v6) [RFC3168] to allow a 119 resource to notify the onset of queue build-up without having to drop 120 packets, by explicitly marking a proportion of packets with the 121 congestion experienced (CE) codepoint. 123 Given a suitable marking scheme, ECN removes nearly all congestion 124 loss and it cuts delays for two main reasons: 126 o It avoids the delay when recovering from congestion losses, which 127 particularly benefits small flows or real-time flows, making their 128 delivery time predictably short [RFC2884]; 130 o As ECN is used more widely by end-systems, it will gradually 131 remove the need to configure a degree of delay into buffers before 132 they start to notify congestion (the cause of bufferbloat). 
This 133 is because drop involves a trade-off between sending a timely 134 signal and trying to avoid impairment, whereas ECN is solely a 135 signal not an impairment, so there is no harm triggering it 136 earlier. 138 Some lower layer technologies (e.g. MPLS, Ethernet) are used to form 139 subnetworks with IP-aware nodes only at the edges. These networks 140 are often sized so that it is rare for interior queues to overflow. 141 However, until recently this was more due to the inability of TCP to 142 saturate the links. For many years, fixes such as window scaling 143 [RFC7323] proved hard to deploy. And the Reno variant of TCP has 144 remained in widespread use despite its inability to scale to high 145 flow rates. However, now that modern operating systems are finally 146 capable of saturating interior links, even the buffers of well- 147 provisioned interior switches will need to signal episodes of 148 queuing. 150 Propagation of ECN is defined for MPLS [RFC5129], and is being 151 defined for TRILL [RFC7780], [I-D.ietf-trill-ecn-support], but it 152 remains to be defined for a number of other subnetwork technologies. 154 Similarly, ECN propagation is yet to be defined for many tunnelling 155 protocols. [RFC6040] defines how ECN should be propagated for IP-in- 156 IPv4 [RFC2003], IP-in-IPv6 [RFC2473] and IPsec [RFC4301] tunnels, but 157 there are numerous other tunnelling protocols with a shim and/or a 158 layer 2 header between two IP headers (v4 or v6). Some address ECN 159 propagation between the IP headers, but many do not. This document 160 gives guidance on how to address ECN propagation for future 161 tunnelling protocols, and a companion standards track specification 162 [I-D.ietf-tsvwg-rfc6040update-shim] updates those existing IP-shim- 163 (L2)-IP protocols that are under IETF change control and still widely 164 used. 166 Incremental deployment is the most delicate aspect when adding 167 support for ECN. 
The original ECN protocol in IP [RFC3168] was 168 carefully designed so that a congested buffer would not mark a packet 169 (rather than drop it) unless both source and destination hosts were 170 ECN-capable. Otherwise its congestion markings would never be 171 detected and congestion would just build up further. However, to 172 support congestion marking below the IP layer or within tunnels, it 173 is not sufficient to only check that the two layer 4 transport end- 174 points support ECN; correct operation also depends on the 175 decapsulator at each subnet or tunnel egress faithfully propagating 176 congestion notifications to the higher layer. Otherwise, a legacy 177 decapsulator might silently fail to propagate any ECN signals from 178 the outer to the forwarded header. Then the lost signals would never 179 be detected and again congestion would build up further. The 180 guidelines given later require protocol designers to carefully 181 consider incremental deployment, and suggest various safe approaches 182 for different circumstances. 184 Of course, the IETF does not have standards authority over every link 185 layer protocol. So this document gives guidelines for designing 186 propagation of congestion notification across the interface between 187 IP and protocols that may encapsulate IP (i.e. that can be layered 188 beneath IP). Each lower layer technology will exhibit different 189 issues and compromises, so the IETF or the relevant standards body 190 must be free to define the specifics of each lower layer congestion 191 notification scheme. Nonetheless, if the guidelines are followed, 192 congestion notification should interwork between different 193 technologies, using IP in its role as a 'portability layer'. 195 Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often 196 used in preference to 'MUST' or 'MUST NOT', because it is difficult 197 to know the compromises that will be necessary in each protocol 198 design. 
If a particular protocol design chooses not to follow a 199 'SHOULD (NOT)' given in the advice below, it MUST include a sound 200 justification. 202 It has not been possible to give common guidelines for all lower 203 layer technologies, because they do not all fit a common pattern. 204 Instead they have been divided into a few distinct modes of 205 operation: feed-forward-and-upward; feed-upward-and-forward; feed- 206 backward; and null mode. These modes are described in Section 3, 207 then in the subsequent sections separate guidelines are given for 208 each mode. 210 1.1. Update to RFC 3819 212 This document updates the brief advice to subnetwork designers about 213 ECN in [RFC3819], by replacing the last two paragraphs of Section 13 214 with the following sentence: 216 By following the guidelines in [this document], subnetwork 217 designers can enable a layer-2 protocol to participate in 218 congestion control without dropping packets via propagation of 219 explicit congestion notification (ECN [RFC3168]) to receivers. 221 and adding [this document] as an informative reference. {RFC Editor: 222 Please replace both instances of [this document] above with the 223 number of the present RFC when published.} 225 1.2. Scope 227 This document only concerns wire protocol processing of explicit 228 notification of congestion. It makes no changes or recommendations 229 concerning algorithms for congestion marking or for congestion 230 response, because algorithm issues should be independent of the layer 231 the algorithm operates in. 233 The default ECN semantics are described in [RFC3168] and updated by 234 [RFC8311]. Also the guidelines for AQM designers [RFC7567] clarify 235 the semantics of both drop and ECN signals from AQM algorithms. 236 [RFC4774] is the appropriate best current practice specification of 237 how algorithms with alternative semantics for the ECN field can be 238 partitioned from Internet traffic that uses the default ECN 239 semantics. 
   There are two main examples for how alternative ECN
   semantics have been defined in practice:

   o  RFC 4774 suggests using the ECN field in combination with a
      Diffserv codepoint such as in PCN [RFC6660], Voice over 3G [UTRAN]
      or Voice over LTE (VoLTE) [LTE-RA];

   o  RFC 8311 suggests using the ECT(1) codepoint of the ECN field to
      indicate alternative semantics such as for the experimental Low
      Latency Low Loss Scalable throughput (L4S) service
      [I-D.ietf-tsvwg-ecn-l4s-id].

   The aim is that the default rules for encapsulating and decapsulating
   the ECN field are sufficiently generic that tunnels and subnets will
   encapsulate and decapsulate packets without regard to how algorithms
   elsewhere are setting or interpreting the semantics of the ECN field.
   [RFC6040] updates RFC 4774 to allow alternative encapsulation and
   decapsulation behaviours to be defined for alternative ECN semantics.
   However, it reinforces the same point: it is far preferable to try to
   fit within the common ECN encapsulation and decapsulation behaviours,
   because expecting all lower layer technologies and tunnels to be
   updated is likely to be completely impractical.

   Alternative semantics for the ECN field can be defined to depend on
   the traffic class indicated by the DSCP.  Therefore correct
   propagation of congestion signals could depend on correct propagation
   of the DSCP between the layers and along the path.  For instance, if
   the meaning of the ECN field depends on the DSCP (as in PCN or VoLTE)
   and if the outer DSCP is stripped on decapsulation, as in the pipe
   model of [RFC2983], the special semantics of the ECN field would be
   lost.  Similarly, if the DSCP is changed at the boundary between
   Diffserv domains, the special ECN semantics would also be lost.  This
   is an important implication of the localized scope of most Diffserv
   arrangements.
In this document, correct propagation of traffic class 273 information is assumed, while what 'correct' means and how it is 274 achieved is covered elsewhere (e.g. RFC 2983) and is outside the 275 scope of the present document. 277 The guidelines in this document do ensure that common encapsulation 278 and decapsulation rules are sufficiently generic to cover cases where 279 ECT(1) is used instead of ECT(0) to identify alternative ECN 280 semantics (as in L4S [I-D.ietf-tsvwg-ecn-l4s-id]) and where ECN 281 marking algorithms use ECT(1) to encode 3 severity levels into the 282 ECN field (e.g. PCN [RFC6660]) rather than the default of 2. All 283 these different semantics for the ECN field work because it has been 284 possible to define common default decapsulation rules that allow for 285 all cases. 287 Note that the guidelines in this document do not necessarily require 288 the subnet wire protocol to be changed to add support for congestion 289 notification. For instance, the Feed-Up-and-Forward Mode 290 (Section 3.2) and the Null Mode (Section 3.4) do not. Another way to 291 add congestion notification without consuming header space in the 292 subnet protocol might be to use a parallel control plane protocol. 294 This document focuses on the congestion notification interface 295 between IP and lower layer or tunnel protocols that can encapsulate 296 IP, where the term 'IP' includes v4 or v6, unicast, multicast or 297 anycast. However, it is likely that the guidelines will also be 298 useful when a lower layer protocol or tunnel encapsulates itself, 299 e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah) or when 300 it encapsulates other protocols. In the feed-backward mode, 301 propagation of congestion signals for multicast and anycast packets 302 is out-of-scope (because the complexity would make it unlikely to be 303 attempted). 305 2. 
Terminology 307 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 308 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 309 document are to be interpreted as described in [RFC2119] [RFC8174] 310 when, and only when, they appear in all capitals, as shown here. 312 Further terminology used within this document: 314 Protocol data unit (PDU): Information that is delivered as a unit 315 among peer entities of a layered network consisting of protocol 316 control information (typically a header) and possibly user data 317 (payload) of that layer. The scope of this document includes 318 layer 2 and layer 3 networks, where the PDU is respectively termed 319 a frame or a packet (or a cell in ATM). PDU is a general term for 320 any of these. This definition also includes a payload with a shim 321 header lying somewhere between layer 2 and 3. 323 Transport: The end-to-end transmission control function, 324 conventionally considered at layer-4 in the OSI reference model. 325 Given the audience for this document will often use the word 326 transport to mean low level bit carriage, whenever the term is 327 used it will be qualified, e.g. 'L4 transport'. 329 Encapsulator: The link or tunnel endpoint function that adds an 330 outer header to a PDU (also termed the 'link ingress', the 'subnet 331 ingress', the 'ingress tunnel endpoint' or just the 'ingress' 332 where the context is clear). 334 Decapsulator: The link or tunnel endpoint function that removes an 335 outer header from a PDU (also termed the 'link egress', the 336 'subnet egress', the 'egress tunnel endpoint' or just the 'egress' 337 where the context is clear). 339 Incoming header: The header of an arriving PDU before encapsulation. 341 Outer header: The header added to encapsulate a PDU. 343 Inner header: The header encapsulated by the outer header. 345 Outgoing header: The header forwarded by the decapsulator. 
347 CE: Congestion Experienced [RFC3168] 349 ECT: ECN-Capable (L4) Transport [RFC3168] 351 Not-ECT: Not ECN-Capable (L4) Transport [RFC3168] 353 Load Regulator: For each flow of PDUs, the transport function that 354 is capable of controlling the data rate. Typically located at the 355 data source, but in-path nodes can regulate load in some 356 congestion control arrangements (e.g. admission control, policing 357 nodes or transport circuit-breakers [RFC8084]). Note the term "a 358 function capable of controlling the load" deliberately includes a 359 transport that does not actually control the load responsively but 360 ideally it ought to (e.g. a sending application without congestion 361 control that uses UDP). 363 ECN-PDU: A PDU at the IP layer or below with a capacity to signal 364 congestion that is part of a congestion control feedback loop 365 within which all the nodes necessary to propagate the signal back 366 to the Load Regulator are capable of doing that propagation. An 367 IP packet with a non-zero ECN field implies that the endpoints are 368 ECN-capable, so this would be an ECN-PDU. However, ECN-PDU is 369 intended to be a general term for a PDU at lower layers, as well 370 as at the IP layer. 372 Not-ECN-PDU: A PDU at the IP layer or below that is part of a 373 congestion control feedback-loop within which at least one node 374 necessary to propagate any explicit congestion notification 375 signals back to the Load Regulator is not capable of doing that 376 propagation. 378 3. Modes of Operation 380 This section sets down the different modes by which congestion 381 information is passed between the lower layer and the higher one. 
It 382 acts as a reference framework for the following sections, which give 383 normative guidelines for designers of explicit congestion 384 notification protocols, taking each mode in turn: 386 Feed-Forward-and-Up: Nodes feed forward congestion notification 387 towards the egress within the lower layer then up and along the 388 layers towards the end-to-end destination at the transport layer. 389 The following local optimisation is possible: 391 Feed-Up-and-Forward: A lower layer switch feeds-up congestion 392 notification directly into the higher layer (e.g. into the ECN 393 field in the IP header), irrespective of whether the node is at 394 the egress of a subnet. 396 Feed-Backward: Nodes feed back congestion signals towards the 397 ingress of the lower layer and (optionally) attempt to control 398 congestion within their own layer. 400 Null: Nodes cannot experience congestion at the lower layer except 401 at ingress nodes (which are IP-aware or equivalently higher-layer- 402 aware). 404 3.1. Feed-Forward-and-Up Mode 406 Like IP and MPLS, many subnet technologies are based on self- 407 contained protocol data units (PDUs) or frames sent unreliably. They 408 provide no feedback channel at the subnetwork layer, instead relying 409 on higher layers (e.g. TCP) to feed back loss signals. 411 In these cases, ECN may best be supported by standardising explicit 412 notification of congestion into the lower layer protocol that carries 413 the data forwards. Then a specification is needed for how the egress 414 of the lower layer subnet propagates this explicit signal into the 415 forwarded upper layer (IP) header. This signal continues forwards 416 until it finally reaches the destination transport (at L4). Then 417 typically the destination will feed this congestion notification back 418 to the source transport using an end-to-end protocol (e.g. TCP). 
   This is the arrangement that has already been used to add ECN to
   IP-in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129].

   This mode is illustrated in Figure 1.  Along the middle of the
   figure, layers 2, 3 and 4 of the protocol stack are shown, and one
   packet is shown along the bottom as it progresses across the network
   from source to destination, crossing two subnets connected by a
   router, and crossing two switches on the path across each subnet.
   Congestion at the output of the first switch (shown as *) leads to a
   congestion marking in the L2 header (shown as C in the illustration
   of the packet).  The chevrons show the progress of the resulting
   congestion indication.  It is propagated from link to link across the
   subnet in the L2 header, then when the router removes the marked L2
   header, it propagates the marking up into the L3 (IP) header.  The
   router forwards the marked L3 header into subnet B, and when it adds
   a new L2 header it copies the L3 marking into the L2 header as well,
   as shown by the 'C's in both layers (assuming the technology of
   subnet B also supports explicit congestion marking).

   Note that there is no implication that each 'C' marking is encoded
   the same; a different encoding might be used for the 'C' marking in
   each protocol.

   Finally, for completeness, we show the L3 marking arriving at the
   destination, where the host transport protocol (e.g. TCP) feeds it
   back to the source in the L4 acknowledgement (the 'C' at L4 in the
   packet at the top of the diagram).

                     _ _ _
          /_______  | | |C|  ACK Packet (V)
          \         |_|_|_|
   +---+      layer: 2 3 4  header                             +---+
   |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4
   |   |                         +---+                         | ^ |
   |   | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3
   |   |     +---+     +---+     | ^ |     +---+     +---+     |   |
   |   |     |  *|>>>>>|>>>|>>>>>|>^ |     |   |     |   |     |   |L2
   |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
   source        subnet A        router        subnet B         dest
       __ _ _ _     __ _ _ _      __ _ _        __ _ _ _
      |  | | | |   |  | | |C|    |  | |C|      |  | |C|C|  Data________\
      |__|_|_|_|   |__|_|_|_|    |__|_|_|      |__|_|_|_|  Packet (U)  /
   layer: 4 3 2A       4 3 2A        4 3           4 3 2B
   header

                   Figure 1: Feed-Forward-and-Up Mode

   Of course, modern networks are rarely as simple as this text-book
   example, often involving multiple nested layers.  For example, a 3GPP
   mobile network may have two IP-in-IP (GTP [GTPv1]) tunnels in series
   and an MPLS backhaul between the base station and the first router.
   Nonetheless, the example illustrates the general idea of feeding
   congestion notification forward then upward whenever a header is
   removed at the egress of a subnet.

   Note that the FECN (forward ECN) bit in Frame Relay [Buck00] and the
   explicit forward congestion indication (EFCI [ITU-T.I.371]) bit in
   ATM user data cells follow a feed-forward pattern.  However, in ATM,
   this arrangement is only part of a feed-forward-and-backward pattern
   at the lower layer, not feed-forward-and-up out of the lower layer--
   the intention was never to interface to IP ECN at the subnet egress.
   To our knowledge, Frame Relay FECN is solely used to detect where
   more capacity should be provisioned.

3.2.  Feed-Up-and-Forward Mode

   Ethernet is particularly difficult to extend incrementally to support
   explicit congestion notification.  One way to support ECN in such
   cases has been to use so-called 'layer-3 switches'.  These are
   Ethernet switches that dig into the Ethernet payload to find an IP
   header and manipulate or act on certain IP fields (specifically
   Diffserv & ECN).
   For instance, in Data Center TCP [RFC8257], layer-3
   switches are configured to mark the ECN field of the IP header within
   the Ethernet payload when their output buffer becomes congested.
   With respect to switching, a layer-3 switch acts solely on the
   addresses in the Ethernet header; it does not use IP addresses, and
   it does not decrement the TTL field in the IP header.

                     _ _ _
          /_______  | | |C|  ACK packet (V)
          \         |_|_|_|
   +---+      layer: 2 3 4  header                             +---+
   |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4
   |   |                         +---+                         | ^ |
   |   | . . .  >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3
   |   |     +--^+     +---+     |   |     +---+     +---+     |   |
   |   |     |  *|     |   |     |   |     |   |     |   |     |   |L2
   |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
   source        subnet E        router        subnet F         dest
       __ _ _ _     __ _ _ _      __ _ _        __ _ _ _
      |  | | | |   |  | |C| |    |  | |C|      |  | |C|C|  data________\
      |__|_|_|_|   |__|_|_|_|    |__|_|_|      |__|_|_|_|  packet (U)  /
   layer: 4 3 2        4 3 2         4 3           4 3 2
   header

                   Figure 2: Feed-Up-and-Forward Mode

   By comparing Figure 2 with Figure 1, it can be seen that subnet E
   (perhaps a subnet of layer-3 Ethernet switches) works in
   feed-up-and-forward mode by notifying congestion directly into L3 at
   the point of congestion, even though the congested switch does not
   otherwise act at L3.  In this example, the technology in subnet F
   (e.g. MPLS) does support ECN natively, so when the router adds the
   layer-2 header it copies the ECN marking from L3 to L2 as well.

3.3.  Feed-Backward Mode

   In some layer 2 technologies, explicit congestion notification has
   been defined for use internally within the subnet with its own
   feedback and load regulation, but typically the interface with IP for
   ECN has not been defined.

   For instance, for the available bit-rate (ABR) service in ATM, the
   relative rate mechanism was one of the more popular mechanisms for
   managing traffic, tending to supersede earlier designs.  In this
   approach ATM switches send special resource management (RM) cells in
   both the forward and backward directions to control the ingress rate
   of user data into a virtual circuit.  If a switch buffer is
   approaching congestion or is congested it sends an RM cell back
   towards the ingress with respectively the No Increase (NI) or
   Congestion Indication (CI) bit set in its message type field
   [ATM-TM-ABR].  The ingress then holds or decreases its sending
   bit-rate accordingly.

                     _ _ _
          /_______  | | |C|  ACK packet (X)
          \         |_|_|_|
   +---+      layer: 2 3 4  header                             +---+
   |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4
   |   |                         +---+                         | ^ |
   |   |                         |  *|>>> Packet W >>>>>>>>>>>>|>^ |L3
   |   |     +---+     +---+     |   |     +---+     +---+     |   |
   |   |     |   |     |   |     |  <|<<<<<|<<<|<(V)<|<<<|     |   |L2
   |   | . . | . |Packet U | . . | . | . . | . | . . | .*| . . |   |L2
   |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
   source        subnet G        router        subnet H         dest
       __ _ _ _     __ _ _ _      __ _ _        __ _ _ _   later
      |  | | | |   |  | | | |    |  | | |      |  | |C| |  data________\
      |__|_|_|_|   |__|_|_|_|    |__|_|_|      |__|_|_|_|  packet (W)  /
          4 3 2        4 3 2         4 3           4 3 2
                                        _
                                 /__   |C|  Feedback control
                                 \     |_|  cell/frame (V)
                                        2
       __ _ _ _     __ _ _ _      __ _ _        __ _ _ _   earlier
      |  | | | |   |  | | | |    |  | | |      |  | | | |  data________\
      |__|_|_|_|   |__|_|_|_|    |__|_|_|      |__|_|_|_|  packet (U)  /
   layer: 4 3 2        4 3 2         4 3           4 3 2
   header

                      Figure 3: Feed-Backward Mode

   ATM's feed-backward approach does not fit well when layered beneath
   IP's feed-forward approach--unless the initial data source is the
   same node as the ATM ingress.  Figure 3 shows the feed-backward
   approach being used in subnet H.
If the final switch on the path is 575 congested (*), it does not feed-forward any congestion indications on 576 packet (U). Instead it sends a control cell (V) back to the router 577 at the ATM ingress. 579 However, the backward feedback does not reach the original data 580 source directly because IP does not support backward feedback (and 581 subnet G is independent of subnet H). Instead, the router in the 582 middle throttles down its sending rate but the original data sources 583 don't reduce their rates. The resulting rate mismatch causes the 584 middle router's buffer at layer 3 to back up until it becomes 585 congested, which it signals forwards on later data packets at layer 3 586 (e.g. packet W). Note that the forward signal from the middle router 587 is not triggered directly by the backward signal. Rather, it is 588 triggered by congestion resulting from the middle router's mismatched 589 rate response to the backward signal. 591 In response to this later forward signalling, end-to-end feedback at 592 layer-4 finally completes the tortuous path of congestion indications 593 back to the origin data source, as before. 595 Quantized congestion notification (QCN [IEEE802.1Q]) would suffer 596 from similar problems if extended to multiple subnets. However, from 597 the start QCN was clearly characterized as solely applicable to a 598 single subnet (see Section 6). 600 3.4. Null Mode 602 Often link and physical layer resources are 'non-blocking' by design. 603 In these cases congestion notification may be implemented but it does 604 not need to be deployed at the lower layer; ECN in IP would be 605 sufficient. 607 A degenerate example is a point-to-point Ethernet link. Excess 608 loading of the link merely causes the queue from the higher layer to 609 back up, while the lower layer remains immune to congestion. 
Even a whole meshed subnetwork can be made immune to interior congestion by limiting ingress capacity and sufficient sizing of interior links, e.g. a non-blocking fat-tree network [Leiserson85]. An alternative to fat links near the root is numerous thin links with multi-path routing to ensure even worst-case patterns of load cannot congest any link, e.g. a Clos network [Clos53].

4.  Feed-Forward-and-Up Mode: Guidelines for Adding Congestion Notification

Feed-forward-and-up is the mode already used for signalling ECN up the layers through MPLS into IP [RFC5129] and through IP-in-IP tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6 [RFC2473] or IPsec [RFC4301]. These RFCs take a consistent approach and the following guidelines are designed to ensure this consistency continues as ECN support is added to other protocols that encapsulate IP. The guidelines are also designed to ensure compliance with the more general best current practice for the design of alternate ECN schemes given in [RFC4774] and extended by [RFC8311].

The rest of this section is structured as follows:

   o  Section 4.1 addresses the most straightforward cases, where [RFC6040] can be applied directly to add ECN to tunnels that are effectively IP-in-IP tunnels, but with shim header(s) between the IP headers.

   o  The subsequent sections give guidelines for adding ECN to a subnet technology that uses feed-forward-and-up mode like IP, but it is not so similar to IP that [RFC6040] rules can be applied directly. Specifically:

      *  Sections 4.2, 4.3 and 4.4 respectively address how to add ECN support to the wire protocol and to the encapsulators and decapsulators at the ingress and egress of the subnet.
      *  Section 4.5 deals with the special, but common, case of sequences of tunnels or subnets that all use the same technology.

      *  Section 4.6 deals with the question of reframing when IP packets do not map 1:1 into lower layer frames.

4.1.  IP-in-IP Tunnels with Shim Headers

A common pattern for many tunnelling protocols is to encapsulate an inner IP header with shim header(s) then an outer IP header. A shim header is defined as one that is not sufficient alone to forward the packet as an outer header. Another common pattern is for a shim to encapsulate a layer 2 (L2) header, which in turn encapsulates (or might encapsulate) an IP header. [I-D.ietf-tsvwg-rfc6040update-shim] clarifies that RFC 6040 is just as applicable when there are shim(s) and possibly a L2 header between two IP headers.

However, it is not always feasible or necessary to propagate ECN between IP headers when separated by a shim. For instance, it might be too costly to dig to arbitrary depths to find an inner IP header, there may be little or no congestion within the tunnel by design (see null mode in Section 3.4 above), or a legacy implementation might not support ECN. In cases where a tunnel does not support ECN, it is important that the ingress does not copy the ECN field from an inner IP header to an outer. Therefore Section 4 of [I-D.ietf-tsvwg-rfc6040update-shim] requires network operators to configure the ingress of a tunnel that does not support ECN so that it zeros the ECN field in the outer IP header.

Nonetheless, in many cases it is feasible to propagate the ECN field between IP headers separated by shim header(s) and/or a L2 header, particularly in the typical case when the outer IP header and the shim(s) are added (or removed) as part of the same procedure.
Even if the shim(s) encapsulate a L2 header, it is often possible to find an inner IP header within the L2 PDU and propagate ECN between that and the outer IP header. This can be thought of as a special case of the feed-up-and-forward mode (Section 3.2), so the guidelines for this mode apply (Section 5).

Numerous shim protocols have been defined for IP tunnelling. More recent ones, e.g. Geneve [RFC8926] and Generic UDP Encapsulation (GUE) [I-D.ietf-intarea-gue], cite and follow RFC 6040. And some earlier ones, e.g. CAPWAP [RFC5415] and LISP [RFC6830], cite RFC 3168, which is compatible with RFC 6040.

However, as Section 9.3 of RFC 3168 pointed out, ECN support needs to be defined for many earlier shim-based tunnelling protocols, e.g. L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637], GTP [GTPv1], [GTPv1-U], [GTPv2-C] and Teredo [RFC4380] as well as some recent ones, e.g. VXLAN [RFC7348], NVGRE [RFC7637] and NSH [RFC8300].

All these IP-based encapsulations can be updated in one shot by simple reference to RFC 6040. However, it would not be appropriate to update all these protocols from within the present guidance document. Instead a companion specification [I-D.ietf-tsvwg-rfc6040update-shim] has been prepared that has the appropriate standards track status to update standards track protocols. For those that are not under IETF change control, [I-D.ietf-tsvwg-rfc6040update-shim] can only recommend that the relevant body updates them.

4.2.  Wire Protocol Design: Indication of ECN Support

This section is intended to guide the redesign of any lower layer protocol that encapsulates IP to add native ECN support at the lower layer. It reflects the approaches used in [RFC6040] and in [RFC5129].
Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or [RFC5129] will already satisfy this guidance.

A lower layer (or subnet) congestion notification system:

   1.  SHOULD NOT apply explicit congestion notifications to PDUs that are destined for legacy layer-4 transport implementations that will not understand ECN, and

   2.  SHOULD NOT apply explicit congestion notifications to PDUs if the egress of the subnet might not propagate congestion notifications onward into the higher layer.

       We use the term ECN-PDU for a PDU on a feedback loop that will propagate congestion notification properly because it meets both the above criteria. And a Not-ECN-PDU is a PDU on a feedback loop that does not meet at least one of the criteria, and will therefore not propagate congestion notification properly. A corollary of the above is that a lower layer congestion notification protocol:

   3.  SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs.

Note that there is no need for all interior nodes within a subnet to be able to mark congestion explicitly. A mix of ECN and drop signals from different nodes is fine. However, if _any_ interior nodes might generate ECN markings, guideline 2 above says that all relevant egress node(s) SHOULD be able to propagate those markings up to the higher layer.

In IP, if the ECN field in each PDU is cleared to the Not-ECT (not ECN-capable transport) codepoint, it indicates that the L4 transport will not understand congestion markings. A congested buffer must not mark these Not-ECT PDUs, and therefore drops them instead.

The mechanism a lower layer uses to distinguish the ECN-capability of PDUs need not mimic that of IP. The above guidelines merely say that the lower layer system, as a whole, should achieve the same outcome.
For instance, ECN-capable feedback loops might use PDUs that are identified by a particular set of labels or tags. Alternatively, logical link protocols that use flow state might determine whether a PDU can be congestion marked by checking for ECN-support in the flow state. Other protocols might depend on out-of-band control signals.

The per-domain checking of ECN support in MPLS [RFC5129] is a good example of a way to avoid sending congestion markings to L4 transports that will not understand them, without using any header space in the subnet protocol.

In MPLS, header space is extremely limited, so RFC 5129 does not provide a field in the MPLS header to indicate whether the PDU is an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are allowed to set explicit congestion indications without checking whether the PDU is destined for a L4 transport that will understand them. Nonetheless, this is made safe by requiring that the network operator upgrades all decapsulating edges of a whole domain at once, as soon as even one switch within the domain is configured to mark rather than drop during congestion. Therefore, any edge node that might decapsulate a packet will be capable of checking whether the higher layer transport is ECN-capable. When decapsulating a CE-marked packet, if the decapsulator discovers that the higher layer (inner header) indicates the transport is not ECN-capable, it drops the packet--effectively on behalf of the earlier congested node (see Decapsulation Guideline 1 in Section 4.4).

It was only appropriate to define such an incremental deployment strategy because MPLS is targeted solely at professional operators, who can be expected to ensure that a whole subnetwork is consistently configured.
This strategy might not be appropriate for other link technologies targeted at zero-configuration deployment or deployment by the general public (e.g. Ethernet). For such 'plug-and-play' environments it will be necessary to invent a failsafe approach that ensures congestion markings will never fall into black holes, no matter how inconsistently a system is put together. Alternatively, congestion notification relying on correct system configuration could be confined to flavours of Ethernet intended only for professional network operators, such as Provider Backbone Bridges (PBB [IEEE802.1Q]; previously 802.1ah).

ECN support in TRILL [I-D.ietf-trill-ecn-support] provides a good example of how to add ECN to a lower layer protocol without relying on careful and consistent operator configuration. TRILL provides an extension header word with space for flags of different categories depending on whether logic to understand the extension is critical. The congestion experienced marking has been defined as a 'critical ingress-to-egress' flag. So if a transit RBridge sets this flag and an egress RBridge does not have any logic to process it, the egress will drop the frame, which is the desired default action anyway. Therefore TRILL RBridges can be updated with support for ECN in no particular order and, at the egress of the TRILL campus, congestion notification will be propagated to IP as ECN whenever ECN logic has been implemented, or as drop otherwise.

QCN [IEEE802.1Q] is not intended to extend beyond a single subnet, or to interoperate with ECN. Nonetheless, the way QCN indicates to lower layer devices that the end-points will not understand QCN provides another example that a lower layer protocol designer might be able to mimic for their scenario.
An operator can define certain Priority Code Points (PCPs [IEEE802.1Q]; previously 802.1p) to indicate non-QCN frames and an ingress bridge is required to map arriving not-QCN-capable IP packets to one of these non-QCN PCPs.

4.3.  Encapsulation Guidelines

This section is intended to guide the redesign of any node that encapsulates IP with a lower layer header when adding native ECN support to the lower layer protocol. It reflects the approaches used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or [RFC5129] will already satisfy this guidance.

   1.  Egress Capability Check: A subnet ingress needs to be sure that the corresponding egress of a subnet will propagate any congestion notification added to the outer header across the subnet. This is necessary in addition to checking that an incoming PDU indicates an ECN-capable (L4) transport. Examples of how this guarantee might be provided include:

       *  by configuration (e.g. if any label switches in a domain support ECN marking, [RFC5129] requires all egress nodes to have been configured to propagate ECN);

       *  by the ingress explicitly checking that the egress propagates ECN (e.g. an early attempt to add ECN support to TRILL used IS-IS to check path capabilities before adding ECN extension flags to each frame [RFC7780]);

       *  by inherent design of the protocol (e.g. by encoding ECN marking on the outer header in such a way that a legacy egress that does not understand ECN will consider the PDU corrupt or invalid and discard it, thus at least propagating a form of congestion signal).

   2.  Egress Fails Capability Check: If the ingress cannot guarantee that the egress will propagate congestion notification, the ingress SHOULD disable ECN at the lower layer when it forwards the PDU.
       An example of how the ingress might disable ECN at the lower layer would be by setting the outer header of the PDU to identify it as a Not-ECN-PDU, assuming the subnet technology supports such a concept.

   3.  Standard Congestion Monitoring Baseline: Once the ingress to a subnet has established that the egress will correctly propagate ECN, on encapsulation it SHOULD encode the same level of congestion in outer headers as is arriving in incoming headers. For example it might copy any incoming congestion notification into the outer header of the lower layer protocol.

       This ensures that bulk congestion monitoring of outer headers (e.g. by a network management node monitoring ECN in passing frames) will measure congestion accumulated along the whole upstream path, i.e. since the Load Regulator, not just since the ingress of the subnet. A node that is not the Load Regulator SHOULD NOT re-initialize the level of CE markings in the outer to zero.

       It would still also be possible to measure congestion introduced across one subnet (or tunnel) by subtracting the level of CE markings on inner headers from that on outer headers (see Appendix C of [RFC6040]). For example:

       *  If this guideline has been followed and if the level of CE markings is 0.4% on the outer and 0.1% on the inner, 0.4% congestion has been introduced across all the networks since the Load Regulator, and 0.3% (= 0.4% - 0.1%) has been introduced since the ingress to the current subnet (or tunnel);

       *  Without this guideline, if the subnet ingress had re-initialized the outer congestion level to zero, the inner and outer would measure 0.1% and 0.3% respectively. It would still be possible to infer that the congestion introduced since the Load Regulator was 0.4% (= 0.1% + 0.3%). But only if the monitoring system somehow knows whether the subnet ingress re-initialized the congestion level.
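       The arithmetic in the two bullets above can be sketched as follows. This is only an illustration: the function name and the `outer_reinitialized` flag are hypothetical, and marking levels are combined by simple addition and subtraction, as in the text, which is a reasonable approximation only when marking fractions are small.

```python
def congestion_breakdown(outer_ce, inner_ce, outer_reinitialized=False):
    """Return (since_load_regulator, since_subnet_ingress).

    outer_ce and inner_ce are the fractions of CE-marked PDUs seen on the
    outer and inner headers at one monitoring point (hypothetical inputs).
    """
    if outer_reinitialized:
        # Ingress zeroed the outer: the outer only counts this subnet,
        # so the upstream (inner) level has to be added back.
        since_ingress = outer_ce
        since_regulator = inner_ce + outer_ce
    else:
        # Standard baseline: the outer already carries the whole path,
        # so the subnet's own contribution is the difference.
        since_regulator = outer_ce
        since_ingress = outer_ce - inner_ce
    return since_regulator, since_ingress
```

       With the figures used above, both cases attribute 0.4% to the path since the Load Regulator and 0.3% to the current subnet. The difference is that the second case gives the right answer only if the monitor is told that the outer was re-initialized.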
       As long as subnet and tunnel technologies use the standard congestion monitoring baseline in this guideline, monitoring systems will know to use the former approach, rather than having to "somehow know" which approach to use.

4.4.  Decapsulation Guidelines

This section is intended to guide the redesign of any node that decapsulates IP from within a lower layer header when adding native ECN support to the lower layer protocol. It reflects the approaches used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or [RFC5129] will already satisfy this guidance.

A subnet egress SHOULD NOT simply copy congestion notification from outer headers to the forwarded header. It SHOULD calculate the outgoing congestion notification field from the inner and outer headers using the following guidelines. If there is any conflict, rules earlier in the list take precedence over rules later in the list:

   1.  If the arriving inner header is a Not-ECN-PDU it implies the L4 transport will not understand explicit congestion markings. Then:

       *  If the outer header carries an explicit congestion marking, drop is the only indication of congestion that the L4 transport will understand. If the congestion marking is the most severe possible, the packet MUST be dropped. However, if congestion can be marked with multiple levels of severity and the packet's marking is not the most severe, this requirement can be relaxed to: the packet SHOULD be dropped.

       *  If the outer is an ECN-PDU that carries no indication of congestion, or is a Not-ECN-PDU, the PDU SHOULD be forwarded, but still as a Not-ECN-PDU.

   2.  If the outer header does not support explicit congestion notification (a Not-ECN-PDU), but the inner header does (an ECN-PDU), the inner header SHOULD be forwarded unchanged.

   3.
       In some lower layer protocols congestion may be signalled as a numerical level, such as in the control frames of quantized congestion notification (QCN [IEEE802.1Q]). If such a multi-bit encoding encapsulates an ECN-capable IP data packet, a function will be needed to convert the quantized congestion level into the frequency of congestion markings in outgoing IP packets.

   4.  Congestion indications might be encoded by a severity level. For instance increasing levels of congestion might be encoded by numerically increasing indications, e.g. pre-congestion notification (PCN) can be encoded in each PDU at three severity levels in IP or MPLS [RFC6660] and the default encapsulation and decapsulation rules [RFC6040] are compatible with this interpretation of the ECN field.

       If the arriving inner header is an ECN-PDU, where the inner and outer headers carry indications of congestion of different severity, the more severe indication SHOULD be forwarded in preference to the less severe.

   5.  The inner and outer headers might carry a combination of congestion notification fields that should not be possible given any currently used protocol transitions. For instance, if Encapsulation Guideline 3 in Section 4.3 had been followed, it should not be possible to have a less severe indication of congestion in the outer than in the inner. It MAY be appropriate to log unexpected combinations of headers and possibly raise an alarm.

       If a safe outgoing codepoint can be defined for such a PDU, the PDU SHOULD be forwarded rather than dropped. Some implementers discard PDUs with currently unused combinations of headers just in case they represent an attack.
       However, an approach using alarms and policy-mediated drop is preferable to hard-coded drop, so that operators can keep track of possible attacks but currently unused combinations are not precluded from future use through new standards actions.

4.5.  Sequences of Similar Tunnels or Subnets

In some deployments, particularly in 3GPP networks, an IP packet may traverse two or more IP-in-IP tunnels in sequence that all use identical technology (e.g. GTP).

In such cases, it would be sufficient for every encapsulation and decapsulation in the chain to comply with RFC 6040. Alternatively, as an optimisation, a node that decapsulates a packet and immediately re-encapsulates it for the next tunnel MAY copy the incoming outer ECN field directly to the outgoing outer and the incoming inner ECN field directly to the outgoing inner. Then the overall behavior across the sequence of tunnel segments would still be consistent with RFC 6040.

Appendix C of RFC 6040 describes how a tunnel egress can monitor how much congestion has been introduced within a tunnel. A network operator might want to monitor how much congestion had been introduced within a whole sequence of tunnels. Using the technique in Appendix C of RFC 6040 at the final egress, the operator could monitor the whole sequence of tunnels, but only if the above optimisation were used consistently along the sequence of tunnels, in order to make it appear as a single tunnel. Therefore, tunnel endpoint implementations SHOULD allow the operator to configure whether this optimisation is enabled.

When ECN support is added to a subnet technology, consideration SHOULD be given to a similar optimisation between subnets in sequence if they all use the same technology.

4.6.
Reframing and Congestion Markings

The guidance in this section is worded in terms of framing boundaries, but it applies equally whether the protocol data units are frames, cells or packets.

Where an AQM marks the ECN field of IP packets as they queue into a layer-2 link, there will be no problem with framing boundaries, because the ECN markings would be applied directly to IP packets. The guidance in this section is only applicable where an ECN capability is being added to a layer-2 protocol so that layer-2 frames can be ECN-marked by an AQM at layer-2. This would only be necessary where AQM will be applied at pure layer-2 nodes (without IP-awareness).

When layer-2 frame headers are stripped off and IP PDUs with different boundaries are forwarded, the provisions in RFC 7141 for handling congestion indications when splitting or merging packets apply (see Section 2.4 of [RFC7141]). Those provisions include: "The general rule to follow is that the number of octets in packets with congestion indications SHOULD be equivalent before and after merging or splitting." See RFC 7141 for the complete provisions and related discussion, including an exception to that general rule.

As also recommended in RFC 7141, the mechanism for propagating congestion indications SHOULD ensure that any new incoming congestion indication is propagated immediately, and not held awaiting possible arrival of further congestion indications sufficient to indicate congestion for all of the octets of an outgoing IP PDU.

5.
Feed-Up-and-Forward Mode: Guidelines for Adding Congestion Notification

The guidance in this section is applicable, for example, when IP packets:

   o  are encapsulated in Ethernet headers, which have no support for ECN;

   o  are forwarded by the eNode-B (base station) of a 3GPP radio access network, which is required to apply ECN marking during congestion [LTE-RA], [UTRAN], but the Packet Data Convergence Protocol (PDCP) that encapsulates the IP header over the radio access has no support for ECN.

This guidance also generalizes to encapsulation by other subnet technologies with no native support for explicit congestion notification at the lower layer, but with support for finding and processing an IP header. It is unlikely to be applicable or necessary for IP-in-IP encapsulation, where feed-forward-and-up mode based on [RFC6040] would be more appropriate.

Marking the IP header while switching at layer-2 (by using a layer-3 switch) or while forwarding in a radio access network seems to represent a layering violation. However, it can be considered as a benign optimisation if the guidelines below are followed. Feed-up-and-forward is certainly not a general alternative to implementing feed-forward congestion notification in the lower layer, because:

   o  IPv4 and IPv6 are not the only layer-3 protocols that might be encapsulated by lower layer protocols;

   o  Link-layer encryption might be in use, making the layer-2 payload inaccessible;

   o  Many Ethernet switches do not have 'layer-3 switch' capabilities so they cannot read or modify an IP payload;

   o  It might be costly to find an IP header (v4 or v6) when it may be encapsulated by more than one lower layer header, e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah).
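To illustrate the cost concern in the last bullet, the following minimal sketch shows how a layer-3-aware switch might bound its search for an IP header beneath stacked VLAN tags and fall back to drop when the search fails, in the spirit of the guidelines later in this section. It is a sketch under stated assumptions, not an implementation from this document: the function names and the `max_tags` limit are hypothetical, only 802.1Q and 802.1ad tags are parsed, and any other encapsulation (including MAC in MAC) is treated as "IP header not found".

```python
import struct

ETH_IPV4, ETH_IPV6 = 0x0800, 0x86DD
VLAN_TPIDS = {0x8100, 0x88A8}  # 802.1Q C-tag and 802.1ad S-tag
NOT_ECT = 0b00                 # ECN codepoint: transport is not ECN-capable


def find_ecn(frame: bytes, max_tags: int = 2):
    """Return the 2-bit ECN field of the first IP header, or None if no IP
    header is found within max_tags VLAN tags (hypothetical search limit)."""
    offset = 12  # skip destination and source MAC addresses
    for _ in range(max_tags + 1):
        if len(frame) < offset + 2:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        offset += 2
        if ethertype in VLAN_TPIDS:
            offset += 2  # skip the tag control field; next EtherType follows
            continue
        if ethertype == ETH_IPV4 and len(frame) >= offset + 2:
            return frame[offset + 1] & 0x03         # low 2 bits of TOS octet
        if ethertype == ETH_IPV6 and len(frame) >= offset + 2:
            return (frame[offset + 1] >> 4) & 0x03  # low 2 bits of Traffic Class
        return None  # unrecognized header: give up rather than dig deeper
    return None      # too many tags: search depth exceeded


def congestion_signal(frame: bytes, max_tags: int = 2) -> str:
    """Choose how a congested queue signals congestion for this frame."""
    ecn = find_ecn(frame, max_tags)
    if ecn is None or ecn == NOT_ECT:
        return "drop"     # alternative means of signalling congestion
    return "mark-CE"      # ECN-capable transport: mark instead of dropping
```

The fixed bound keeps the per-frame cost predictable; a real switch would also need to rewrite the ECN field (and the IPv4 header checksum) when marking, which this sketch omits.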
Nonetheless, configuring lower layer equipment to look for an ECN field in an encapsulated IP header is a useful optimisation. If the implementation follows the guidelines below, this optimisation does not have to be confined to a controlled environment such as within a data centre; it could usefully be applied on any network--even if the operator cannot be sure that none of the above issues will apply:

   1.  If a native lower-layer congestion notification mechanism exists for a subnet technology, it is safe to mix feed-up-and-forward with feed-forward-and-up on other switches in the same subnet. However, it will generally be more efficient to use the native mechanism.

   2.  The depth of the search for an IP header SHOULD be limited. If an IP header is not found soon enough, or an unrecognized or unreadable header is encountered, the switch SHOULD resort to an alternative means of signalling congestion (e.g. drop, or the native lower layer mechanism if available).

   3.  It is sufficient to use the first IP header found in the stack; the egress of the relevant tunnel can propagate congestion notification upwards to any more deeply encapsulated IP headers later.

6.  Feed-Backward Mode: Guidelines for Adding Congestion Notification

It can be seen from Section 3.3 that congestion notification in a subnet using feed-backward mode has generally not been designed to be directly coupled with IP layer congestion notification. The subnet attempts to minimize congestion internally, and if the incoming load at the ingress exceeds the capacity somewhere through the subnet, the layer 3 buffer into the ingress backs up. Thus, a feed-backward mode subnet is in some sense similar to a null mode subnet, in that there is no need for any direct interaction between the subnet and higher layer congestion notification.
Therefore no detailed protocol design guidelines are appropriate. Nonetheless, a more general guideline is appropriate:

      A subnetwork technology intended to eventually interface to IP SHOULD NOT be designed using only the feed-backward mode, which is certainly best for a stand-alone subnet, but would need to be modified to work efficiently as part of the wider Internet, because IP uses feed-forward-and-up mode.

The feed-backward approach at least works beneath IP, where the term 'works' is used only in a narrow functional sense because feed-backward can result in very inefficient and sluggish congestion control--except if it is confined to the subnet directly connected to the original data source, when it is faster than feed-forward. It would be valid to design a protocol that could work in feed-backward mode for paths that only cross one subnet, and in feed-forward-and-up mode for paths that cross subnets.

In the early days of TCP/IP, a similar feed-backward approach was tried for explicit congestion signalling, using source-quench (SQ) ICMP control packets. However, SQ fell out of favour and is now formally deprecated [RFC6633]. The main problem was that it is hard for a data source to tell the difference between a spoofed SQ message and a quench request from a genuine buffer on the path. It is also hard for a lower layer buffer to address an SQ message to the original source port number, which may be buried within many layers of headers, and possibly encrypted.

QCN (also known as backward congestion notification, BCN; see Sections 30--33 of [IEEE802.1Q]; previously known as 802.1Qau) uses a feed-backward mode structurally similar to ATM's relative rate mechanism. However, QCN confines its applicability to scenarios such as some data centres where all endpoints are directly attached by the same Ethernet technology.
If a QCN subnet were later connected into a wider IP-based internetwork (e.g. when attempting to interconnect multiple data centres) it would suffer the inefficiency shown in Figure 3.

7.  IANA Considerations

This memo includes no request to IANA.

8.  Security Considerations

If a lower layer wire protocol is redesigned to include explicit congestion signalling in-band in the protocol header, care SHOULD be taken to ensure that the field used is specified as mutable during transit. Otherwise interior nodes signalling congestion would invalidate any authentication protocol applied to the lower layer header--by altering a header field that had been assumed as immutable.

The redesign of protocols that encapsulate IP in order to propagate congestion signals between layers raises potential signal integrity concerns. Experimental or proposed approaches exist for assuring the end-to-end integrity of in-band congestion signals, e.g.:

   o  Congestion exposure (ConEx) for networks to audit that their congestion signals are not being suppressed by other networks or by receivers, and for networks to police that senders are responding sufficiently to the signals, irrespective of the L4 transport protocol used [RFC7713].

   o  A test for a sender to detect whether a network or the receiver is suppressing congestion signals (for example see 2nd para of Section 20.2 of [RFC3168]).

Given these end-to-end approaches are already being specified, it would make little sense to attempt to design hop-by-hop congestion signal integrity into a new lower layer protocol, because end-to-end integrity inherently achieves hop-by-hop integrity.

Section 6 gives vulnerability to spoofing as one of the reasons for deprecating feed-backward mode.

9.
Conclusions

Following the guidance in this document enables ECN support to be extended to numerous protocols that encapsulate IP (v4 & v6) in a consistent way, so that IP continues to fulfil its role as an end-to-end interoperability layer. This includes:

   o  A wide range of tunnelling protocols including those with various forms of shim header between two IP headers, possibly also separated by a L2 header;

   o  A wide range of subnet technologies, particularly those that work in the same 'feed-forward-and-up' mode that is used to support ECN in IP and MPLS.

Guidelines have been defined for supporting propagation of ECN between Ethernet and IP on so-called Layer-3 Ethernet switches, using a 'feed-up-and-forward' mode. This approach could enable other subnet technologies to pass ECN signals into the IP layer, even if they do not support ECN natively.

Finally, attempting to add ECN to a subnet technology in feed-backward mode is deprecated except in special cases, due to its likely sluggish response to congestion.

10.  Acknowledgements

Thanks to Gorry Fairhurst and David Black for extensive reviews. Thanks also to the following reviewers: Joe Touch, Andrew McGregor, Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon, Donald Eastlake, Jonathan Morton and Michael Welzl, who pointed out that lower layer congestion notification signals may have different semantics to those in IP. Thanks are also due to the tsvwg chairs, TSV ADs and IETF liaison people such as Eric Gray, Dan Romascanu and Gonzalo Camarillo for helping with the liaisons with the IEEE and 3GPP. And thanks to Georg Mayer and particularly to Erik Guttman for the extensive search and categorisation of any 3GPP specifications that cite ECN specifications.

   Bob Briscoe was part-funded by the European Community under its
   Seventh Framework Programme through the Trilogy project (ICT-216372)
   for initial drafts and through the Reducing Internet Transport
   Latency (RITE) project (ICT-317700) subsequently.  The views
   expressed here are solely those of the authors.

11.  Contributors

   Pat Thaler
   Broadcom Corporation (retired)
   CA
   USA

   Pat was a co-author of this draft, but retired before its
   publication.

12.  Comments Solicited

   Comments and questions are encouraged and very welcome.  They can be
   addressed to the IETF Transport Area working group mailing list,
   and/or to the authors.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC3819]  Karn, P., Ed., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, DOI 10.17487/RFC3819, July 2004.

   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
              Explicit Congestion Notification (ECN) Field", BCP 124,
              RFC 4774, DOI 10.17487/RFC4774, November 2006.

   [RFC5129]  Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
              Marking in MPLS", RFC 5129, DOI 10.17487/RFC5129, January
              2008.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC7141]  Briscoe, B. and J. Manner, "Byte and Packet Congestion
              Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141,
              February 2014.

13.2.
Informative References

   [ATM-TM-ABR]
              Cisco, "Understanding the Available Bit Rate (ABR) Service
              Category for ATM VCs", Design Technote 10415, June 2005.

   [Buck00]   Buckwalter, J., "Frame Relay: Technology and Practice",
              Pub. Addison Wesley ISBN-13: 978-0201485240, 2000.

   [Clos53]   Clos, C., "A Study of Non-Blocking Switching Networks",
              Bell Systems Technical Journal 32(2):406-424, March 1953.

   [GTPv1]    3GPP, "GPRS Tunnelling Protocol (GTP) across the Gn and Gp
              interface", Technical Specification TS 29.060.

   [GTPv1-U]  3GPP, "General Packet Radio System (GPRS) Tunnelling
              Protocol User Plane (GTPv1-U)", Technical Specification TS
              29.281.

   [GTPv2-C]  3GPP, "Evolved General Packet Radio Service (GPRS)
              Tunnelling Protocol for Control plane (GTPv2-C)",
              Technical Specification TS 29.274.

   [I-D.ietf-intarea-gue]
              Herbert, T., Yong, L., and O. Zia, "Generic UDP
              Encapsulation", draft-ietf-intarea-gue-09 (work in
              progress), October 2019.

   [I-D.ietf-trill-ecn-support]
              Eastlake, D. E. and B. Briscoe, "TRILL (TRansparent
              Interconnection of Lots of Links): ECN (Explicit
              Congestion Notification) Support",
              draft-ietf-trill-ecn-support-07 (work in progress),
              February 2018.

   [I-D.ietf-tsvwg-ecn-l4s-id]
              Schepper, K. D. and B. Briscoe, "Explicit Congestion
              Notification (ECN) Protocol for Ultra-Low Queuing Delay
              (L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress),
              March 2021.

   [I-D.ietf-tsvwg-rfc6040update-shim]
              Briscoe, B., "Propagating Explicit Congestion Notification
              Across IP Tunnel Headers Separated by a Shim",
              draft-ietf-tsvwg-rfc6040update-shim-13 (work in progress),
              March 2021.

   [IEEE802.1Q]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks--Virtual Bridged Local Area Networks--Amendment
              6: Provider Backbone Bridges", IEEE Std 802.1Q-2018, July
              2018.

   [ITU-T.I.371]
              ITU-T, "Traffic Control and Congestion Control in B-ISDN",
              ITU-T Rec. I.371 (03/04), March 2004.

   [Leiserson85]
              Leiserson, C., "Fat-trees: universal networks for
              hardware-efficient supercomputing", IEEE Transactions on
              Computers 34(10):892-901, October 1985.

   [LTE-RA]   3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA)
              and Evolved Universal Terrestrial Radio Access Network
              (E-UTRAN); Overall description; Stage 2", Technical
              Specification TS 36.300.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              DOI 10.17487/RFC2003, October 1996.

   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
              IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
              December 1998.

   [RFC2637]  Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
              W., and G. Zorn, "Point-to-Point Tunneling Protocol
              (PPTP)", RFC 2637, DOI 10.17487/RFC2637, July 1999.

   [RFC2661]  Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
              G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
              RFC 2661, DOI 10.17487/RFC2661, August 1999.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              DOI 10.17487/RFC2784, March 2000.

   [RFC2884]  Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
              Explicit Congestion Notification (ECN) in IP Networks",
              RFC 2884, DOI 10.17487/RFC2884, July 2000.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3931]  Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed.,
              "Layer Two Tunneling Protocol - Version 3 (L2TPv3)",
              RFC 3931, DOI 10.17487/RFC3931, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005.

   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              Network Address Translations (NATs)", RFC 4380,
              DOI 10.17487/RFC4380, February 2006.

   [RFC5415]  Calhoun, P., Ed., Montemurro, M., Ed., and D. Stanley,
              Ed., "Control And Provisioning of Wireless Access Points
              (CAPWAP) Protocol Specification", RFC 5415,
              DOI 10.17487/RFC5415, March 2009.

   [RFC6633]  Gont, F., "Deprecation of ICMP Source Quench Messages",
              RFC 6633, DOI 10.17487/RFC6633, May 2012.

   [RFC6660]  Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three
              Pre-Congestion Notification (PCN) States in the IP Header
              Using a Single Diffserv Codepoint (DSCP)", RFC 6660,
              DOI 10.17487/RFC6660, July 2012.

   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
              Locator/ID Separation Protocol (LISP)", RFC 6830,
              DOI 10.17487/RFC6830, January 2013.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
              Recommendations Regarding Active Queue Management",
              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [RFC7713]  Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
              Concepts, Abstract Mechanism, and Requirements", RFC 7713,
              DOI 10.17487/RFC7713, December 2015.

   [RFC7780]  Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
              Ghanwani, A., and S. Gupta, "Transparent Interconnection
              of Lots of Links (TRILL): Clarifications, Corrections, and
              Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016.

   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

   [RFC8087]  Fairhurst, G. and M. Welzl, "The Benefits of Using
              Explicit Congestion Notification (ECN)", RFC 8087,
              DOI 10.17487/RFC8087, March 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
              October 2017.

   [RFC8300]  Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed.,
              "Network Service Header (NSH)", RFC 8300,
              DOI 10.17487/RFC8300, January 2018.

   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
              Notification (ECN) Experimentation", RFC 8311,
              DOI 10.17487/RFC8311, January 2018.

   [RFC8926]  Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed.,
              "Geneve: Generic Network Virtualization Encapsulation",
              RFC 8926, DOI 10.17487/RFC8926, November 2020.

   [UTRAN]    3GPP, "UTRAN Overall Description", Technical
              Specification TS 25.401.

Appendix A.
Changes in This Version (to be removed by RFC Editor)

   From ietf-12 to ietf-13

      *  Following 3rd tsvwg WGLC:

         +  Formalized update to RFC 3819 in its own subsection (1.1)
            and referred to it in the abstract

         +  Scope: Clarified that the specification of alternative ECN
            semantics using ECT(1) was not in RFC 4774, but rather in
            RFC 8311, and that using a DSCP to indicate alternative
            semantics has issues at domain boundaries as well as
            tunnels.

         +  Terminology: tightened up definitions of ECN-PDU and
            Not-ECN-PDU, and removed definition of Congestion Baseline,
            given it was only used once.

         +  Mentioned QCN where feed-backward is first introduced (S.3),
            referring forward to where it is discussed more deeply
            (S.4).

         +  Clarified that IS-IS solution to adding ECN support to TRILL
            was not pursued

         +  Completely rewrote the rationale for the guideline about a
            Standard Congestion Monitoring Baseline, to focus on
            standardization of the otherwise unknown scenario used,
            rather than the relative usefulness of the info in each
            approach

         +  Explained the re-framing problem better and added
            fragmentation as another possible cause of the problem

         +  Acknowledged new reviewers

         +  Updated references, replaced citations of 802.1Qau and
            802.1ah with rolled up 802.1Q, and added citations of Fat
            trees and Clos Networks

         +  Numerous other editorial improvements

   From ietf-11 to ietf-12

      *  Updated references

   From ietf-10 to ietf-11

      *  Removed short section (was 3) 'Guidelines for All Cases'
         because it was out of scope, being covered by RFC 4774.
         Expanded the Scope section (1.2) to explain all this.
         Explained that the default encap/decap rules already support
         certain alternative semantics, particularly all three of the
         alternative semantics for ECT(1): equivalent to ECT(0), higher
         severity than ECT(0), and unmarked but implying different
         marking semantics from ECT(0).

      *  Clarified why the QCN example was being given even though not
         about incremental deployment of ECN

      *  Pointed to the spoofing issue with feed-backward mode from the
         Security Considerations section, to aid security review.

      *  Removed any ambiguity in the word 'transport' throughout

   From ietf-09 to ietf-10

      *  Updated section 5.1 on "IP-in-IP tunnels with Shim Headers" to
         be consistent with updates to
         draft-ietf-tsvwg-rfc6040update-shim.

      *  Removed reference to the ECN nonce, which has been made
         historic by RFC 8311

      *  Removed "Open Issues" Appendix, given all have been addressed.

   From ietf-08 to ietf-09

      *  Updated para in Intro that listed all the IP-in-IP tunnelling
         protocols, to instead refer to
         draft-ietf-tsvwg-rfc6040update-shim

      *  Updated section 5.1 on "IP-in-IP tunnels with Shim Headers" to
         summarize guidance that has evolved as rfc6040update-shim has
         developed.

   From ietf-07 to ietf-08: Refreshed to avoid expiry.  Updated
   references.

   From ietf-06 to ietf-07:

      *  Added the people involved in liaisons to the acknowledgements.

   From ietf-05 to ietf-06:

      *  Introduction: Added GUE and Geneve as examples of tightly
         coupled shims between IP headers that cite RFC 6040.  And added
         VXLAN to list of those that do not.

      *  Replaced normative text about tightly coupled shims between IP
         headers, with reference to new
         draft-ietf-tsvwg-rfc6040update-shim

      *  Wire Protocol Design: Indication of ECN Support: Added TRILL as
         an example of a well-designed protocol that does not need an
         indication of ECN support in the wire protocol.

      *  Encapsulation Guidelines: In the case of a Not-ECN-PDU with a
         CE outer, replaced "SHOULD be dropped" with explanations of
         when SHOULD or MUST are appropriate.

      *  Feed-Up-and-Forward Mode: Explained examples more carefully,
         referred to PDCP and cited UTRAN spec as well as E-UTRAN.

      *  Updated references.

      *  Marked open issues as resolved, but did not delete Open Issues
         Appendix (yet).

   From ietf-04 to ietf-05:

      *  Explained why tightly coupled shim headers only "SHOULD" comply
         with RFC 6040, not "MUST".

      *  Updated references

   From ietf-03 to ietf-04:

      *  Addressed Richard Scheffenegger's review comments: primarily
         editorial corrections, and addition of examples for clarity.

   From ietf-02 to ietf-03:

      *  Updated references, and cited RFC 4774.

   From ietf-01 to ietf-02:

      *  Added Section for guidelines that are applicable in all cases.

      *  Updated references.

   From ietf-00 to ietf-01: Updated references.

   From briscoe-04 to ietf-00: Changed filename following tsvwg
   adoption.

   From briscoe-03 to 04:

      *  Re-arranged the introduction to describe the purpose of the
         document first before introducing ECN in more depth.  And
         clarified the introduction throughout.

      *  Added applicability to 3GPP TS 36.300.

   From briscoe-02 to 03:

      *  Scope section:

         +  Added dependence on correct propagation of traffic class
            information

         +  For the feed-backward mode, deemed multicast and anycast out
            of scope

      *  Ensured all guidelines referring to subnet technologies also
         refer to tunnels and vice versa by adding applicability
         sentences at the start of sections 4.1, 4.2, 4.3, 4.4, 4.6 and
         5.

      *  Added Security Considerations on ensuring congestion signal
         fields are classed as mutable and on using end-to-end
         congestion signal integrity technologies rather than
         hop-by-hop.

   From briscoe-01 to 02:

      *  Added authors: JK & PT

      *  Added

         +  Section 4.1 "IP-in-IP Tunnels with Tightly Coupled Shim
            Headers"

         +  Section 4.5 "Sequences of Similar Tunnels or Subnets"

         +  roadmap at the start of Section 4, given the subsections
            have become quite fragmented.

         +  Section 9 "Conclusions"

      *  Clarified why transports are starting to be able to saturate
         interior links

      *  Under Section 1.1, addressed the question of alternative signal
         semantics and included multicast & anycast.

      *  Under Section 3.1, included a 3GPP example.

      *  Section 4.2. "Wire Protocol Design":

         +  Altered guideline 2. to make it clear that it only applies
            to the immediate subnet egress, not later ones

         +  Added a reminder that it is only necessary to check that ECN
            propagates at the egress, not whether interior nodes mark
            ECN

         +  Added example of how QCN uses 802.1p to indicate support for
            QCN.

      *  Added references to Appendix C of RFC 6040, about monitoring
         the amount of congestion signals introduced within a tunnel

      *  Appendix A: Added more issues to be addressed, including plan
         to produce a standards track update to IP-in-IP tunnel
         protocols.

      *  Updated acks and references

   From briscoe-00 to 01:

      *  Intended status: BCP (was Informational) & updates 3819 added.

      *  Briefer Introduction: Introductory para justifying benefits of
         ECN.  Moved all but a brief enumeration of modes of operation
         to their own new section (from both Intro & Scope).  Introduced
         incremental deployment as the trickiest part.

      *  Tightened & added to terminology section

      *  Structured with Modes of Operation, then Guidelines section for
         each mode.

      *  Tightened up guideline text to remove vagueness / passive voice
         / ambiguity and highlight main guidelines as numbered items.

      *  Added Outstanding Document Issues Appendix

      *  Updated references

Authors' Addresses

   Bob Briscoe
   Independent
   UK

   EMail: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/


   John Kaippallimalil
   Futurewei
   5700 Tennyson Parkway, Suite 600
   Plano, Texas  75024
   USA

   EMail: kjohn@futurewei.com