Transport Area Working Group                                  B. Briscoe
Internet-Draft                                               Independent
Updates: 3819 (if approved)                            J. Kaippallimalil
Intended status: Best Current Practice                            Huawei
Expires: November 21, 2019                                     P. Thaler
                                          Broadcom Corporation (retired)
                                                            May 20, 2019


    Guidelines for Adding Congestion Notification to Protocols that
                            Encapsulate IP
                draft-ietf-tsvwg-ecn-encap-guidelines-13

Abstract

   The purpose of this document is to guide the design of congestion
   notification in any lower layer or tunnelling protocol that
   encapsulates IP. The aim is for explicit congestion signals to
   propagate consistently from lower layer protocols into IP. Then the
   IP internetwork layer can act as a portability layer to carry
   congestion notification from non-IP-aware congested nodes up to the
   transport layer (L4).
   Following these guidelines should ensure interworking among IP layer
   and lower layer congestion notification mechanisms, whether
   specified by the IETF or other standards bodies. This document
   updates the advice to subnetwork designers about ECN in RFC 3819.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 21, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Update to RFC 3819
     1.2.  Scope
   2.  Terminology
   3.  Modes of Operation
     3.1.  Feed-Forward-and-Up Mode
     3.2.  Feed-Up-and-Forward Mode
     3.3.  Feed-Backward Mode
     3.4.  Null Mode
   4.  Feed-Forward-and-Up Mode: Guidelines for Adding Congestion
       Notification
     4.1.  IP-in-IP Tunnels with Shim Headers
     4.2.  Wire Protocol Design: Indication of ECN Support
     4.3.  Encapsulation Guidelines
     4.4.  Decapsulation Guidelines
     4.5.  Sequences of Similar Tunnels or Subnets
     4.6.  Reframing and Congestion Markings
   5.  Feed-Up-and-Forward Mode: Guidelines for Adding Congestion
       Notification
   6.  Feed-Backward Mode: Guidelines for Adding Congestion
       Notification
   7.  IANA Considerations (to be removed by RFC Editor)
   8.  Security Considerations
   9.  Conclusions
   10. Acknowledgements
   11. Comments Solicited
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Changes in This Version (to be removed by RFC Editor)
   Authors' Addresses

1.  Introduction

   The benefits of Explicit Congestion Notification (ECN) described in
   [RFC8087] and summarized below can only be fully realized if support
   for ECN is added to the relevant subnetwork technology, as well as to
   IP. When a lower layer buffer drops a packet, obviously it is not
   just dropped at that layer; the packet disappears from all layers.
   In contrast, when active queue management (AQM) at a lower layer
   marks a packet with ECN, the marking needs to be explicitly
   propagated up the layers. The same is true if AQM marks the outer
   header of a packet that encapsulates inner tunnelled headers.
   Forwarding ECN is not as straightforward as forwarding other header
   fields, because it has to be assumed that ECN may be only partially
   deployed. If a lower layer header that contains ECN congestion
   indications is stripped off by a subnet egress that is not
   ECN-aware, or if the ultimate receiver or sender is not ECN-aware,
   congestion needs to be indicated by dropping a packet, not marking
   it.

   The purpose of this document is to guide the addition of congestion
   notification to any subnet technology or tunnelling protocol, so that
   lower layer AQM algorithms can signal congestion explicitly and the
   signal will propagate consistently into encapsulated (higher layer)
   headers. Otherwise, the signals will not reach their ultimate
   destination.

   ECN is defined in the IP header (v4 and v6) [RFC3168] to allow a
   resource to notify the onset of queue build-up without having to drop
   packets, by explicitly marking a proportion of packets with the
   congestion experienced (CE) codepoint.
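
   As an illustration only, the ECN field that [RFC3168] places in the
   two least-significant bits of the IPv4 TOS octet (or IPv6 Traffic
   Class octet) could be read and marked as in the following Python
   sketch. The names Ecn, get_ecn and signal_congestion are
   hypothetical and are not taken from any specification. The choice
   to represent "drop" by returning None is an assumption made for
   brevity.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """The four codepoints of the two-bit ECN field (RFC 3168)."""
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport, ECT(1)
    ECT_0 = 0b10    # ECN-Capable Transport, ECT(0)
    CE = 0b11       # Congestion Experienced


ECN_MASK = 0b11  # the ECN field is the two least-significant bits


def get_ecn(tos: int) -> Ecn:
    """Extract the ECN field from an IPv4 TOS / IPv6 Traffic Class octet."""
    return Ecn(tos & ECN_MASK)


def signal_congestion(tos: int) -> Optional[int]:
    """Hypothetical helper: return the octet with CE set, or None to drop.

    A Not-ECT packet cannot carry a marking, so congestion has to be
    signalled by dropping it (represented here by returning None).
    """
    if get_ecn(tos) is Ecn.NOT_ECT:
        return None
    # Setting both ECN bits yields CE and leaves the six DSCP bits intact.
    return tos | Ecn.CE
```

   In this sketch the upper six bits (the Diffserv codepoint) pass
   through unchanged, so marking a packet does not alter its traffic
   class.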

   Given a suitable marking scheme, ECN removes nearly all congestion
   loss and it cuts delays for two main reasons:

   o  It avoids the delay when recovering from congestion losses, which
      particularly benefits small flows or real-time flows, making their
      delivery time predictably short [RFC2884];

   o  As ECN is used more widely by end-systems, it will gradually
      remove the need to configure a degree of delay into buffers before
      they start to notify congestion (the cause of bufferbloat). This
      is because drop involves a trade-off between sending a timely
      signal and trying to avoid impairment, whereas ECN is solely a
      signal not an impairment, so there is no harm triggering it
      earlier.

   Some lower layer technologies (e.g. MPLS, Ethernet) are used to form
   subnetworks with IP-aware nodes only at the edges. These networks
   are often sized so that it is rare for interior queues to overflow.
   However, until recently this was more due to the inability of TCP to
   saturate the links. For many years, fixes such as window scaling
   [RFC1323] (now [RFC7323]) proved hard to deploy. And the Reno
   variant of TCP has remained in widespread use despite its inability
   to scale to high flow rates. However, now that modern operating
   systems are finally capable of saturating interior links, even the
   buffers of well-provisioned interior switches will need to signal
   episodes of queuing.

   Propagation of ECN is defined for MPLS [RFC5129], and is being
   defined for TRILL [RFC7780], [I-D.ietf-trill-ecn-support], but it
   remains to be defined for a number of other subnetwork technologies.

   Similarly, ECN propagation is yet to be defined for many tunnelling
   protocols.
   [RFC6040] defines how ECN should be propagated for IP-in-IPv4
   [RFC2003], IP-in-IPv6 [RFC2473] and IPsec [RFC4301] tunnels, but
   there are numerous other tunnelling protocols with a shim and/or a
   layer 2 header between two IP headers (v4 or v6). Some address ECN
   propagation between the IP headers, but many do not. This document
   gives guidance on how to address ECN propagation for future
   tunnelling protocols, and a companion standards track specification
   [I-D.ietf-tsvwg-rfc6040update-shim] updates those existing
   IP-shim-(L2)-IP protocols that are under IETF change control and
   still widely used.

   Incremental deployment is the most delicate aspect when adding
   support for ECN. The original ECN protocol in IP [RFC3168] was
   carefully designed so that a congested buffer would not mark a packet
   (rather than drop it) unless both source and destination hosts were
   ECN-capable. Otherwise its congestion markings would never be
   detected and congestion would just build up further. However, to
   support congestion marking below the IP layer or within tunnels, it
   is not sufficient to only check that the two layer 4 transport
   end-points support ECN; correct operation also depends on the
   decapsulator at each subnet or tunnel egress faithfully propagating
   congestion notifications to the higher layer. Otherwise, a legacy
   decapsulator might silently fail to propagate any ECN signals from
   the outer to the forwarded header. Then the lost signals would never
   be detected and again congestion would build up further. The
   guidelines given later require protocol designers to carefully
   consider incremental deployment, and suggest various safe approaches
   for different circumstances.

   Of course, the IETF does not have standards authority over every link
   layer protocol.
   So this document gives guidelines for designing propagation of
   congestion notification across the interface between IP and
   protocols that may encapsulate IP (i.e. that can be layered beneath
   IP). Each lower layer technology will exhibit different issues and
   compromises, so the IETF or the relevant standards body must be free
   to define the specifics of each lower layer congestion notification
   scheme. Nonetheless, if the guidelines are followed, congestion
   notification should interwork between different technologies, using
   IP in its role as a 'portability layer'.

   Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often
   used in preference to 'MUST' or 'MUST NOT', because it is difficult
   to know the compromises that will be necessary in each protocol
   design. If a particular protocol design chooses not to follow a
   'SHOULD (NOT)' given in the advice below, it MUST include a sound
   justification.

   It has not been possible to give common guidelines for all lower
   layer technologies, because they do not all fit a common pattern.
   Instead they have been divided into a few distinct modes of
   operation: feed-forward-and-up; feed-up-and-forward; feed-backward;
   and null mode. These modes are described in Section 3, then in the
   subsequent sections separate guidelines are given for each mode.

1.1.  Update to RFC 3819

   This document updates the brief advice to subnetwork designers about
   ECN in [RFC3819], by replacing the last two paragraphs of Section 13
   with the following sentence:

      By following the guidelines in [RFCXXXX], subnetwork designers can
      enable a layer-2 protocol to participate in congestion control
      without dropping packets via propagation of explicit congestion
      notification (ECN [RFC3168]) to receivers.

   and adding [RFCXXXX] as an informative reference.
   {RFC Editor: Please replace both instances of XXXX above with the
   number of this RFC when published.}

1.2.  Scope

   This document only concerns wire protocol processing of explicit
   notification of congestion. It makes no changes or recommendations
   concerning algorithms for congestion marking or for congestion
   response, because algorithm issues should be independent of the layer
   the algorithm operates in.

   The default ECN semantics are described in [RFC3168] and updated by
   [RFC8311]. Also the guidelines for AQM designers [RFC7567] clarify
   the semantics of both drop and ECN signals from AQM algorithms.

   [RFC4774] is the appropriate best current practice specification of
   how algorithms with alternative semantics for the ECN field can be
   partitioned from Internet traffic that uses the default ECN
   semantics. There are two main examples of how alternative ECN
   semantics have been defined in practice:

   o  RFC 4774 suggests using the ECN field in combination with a
      Diffserv codepoint such as in PCN [RFC6660], Voice over 3G [UTRAN]
      or Voice over LTE (VoLTE) [LTE-RA];

   o  RFC 8311 suggests using the ECT(1) codepoint of the ECN field to
      indicate alternative semantics such as for the experimental Low
      Latency Low Loss Scalable throughput (L4S) service
      [I-D.ietf-tsvwg-ecn-l4s-id].

   The aim is that the default rules for encapsulating and decapsulating
   the ECN field are sufficiently generic that tunnels and subnets will
   encapsulate and decapsulate packets without regard to how algorithms
   elsewhere are setting or interpreting the semantics of the ECN field.
   [RFC6040] updates RFC 4774 to allow alternative encapsulation and
   decapsulation behaviours to be defined for alternative ECN semantics.
   However, it reinforces the same point - that it is far preferable to
   try to fit within the common ECN encapsulation and decapsulation
   behaviours, because expecting all lower layer technologies and
   tunnels to be updated is likely to be completely impractical.

   Alternative semantics for the ECN field can be defined to depend on
   the traffic class indicated by the DSCP. Therefore correct
   propagation of congestion signals could depend on correct propagation
   of the DSCP between the layers and along the path. For instance, if
   the meaning of the ECN field depends on the DSCP (as in PCN or VoLTE)
   and if the outer DSCP is stripped on decapsulation, as in the pipe
   model of [RFC2983], the special semantics of the ECN field would be
   lost. Similarly, if the DSCP is changed at the boundary between
   Diffserv domains, the special ECN semantics would also be lost. This
   is an important implication of the localized scope of most Diffserv
   arrangements. In this document, correct propagation of traffic class
   information is assumed, while what 'correct' means and how it is
   achieved is covered elsewhere (e.g. RFC 2983) and is outside the
   scope of the present document.

   The guidelines in this document do ensure that common encapsulation
   and decapsulation rules are sufficiently generic to cover cases where
   ECT(1) is used instead of ECT(0) to identify alternative ECN
   semantics (as in L4S [I-D.ietf-tsvwg-ecn-l4s-id]) and where ECN
   marking algorithms use ECT(1) to encode 3 severity levels into the
   ECN field (e.g. PCN [RFC6660]) rather than the default of 2. All
   these different semantics for the ECN field work because it has been
   possible to define common default decapsulation rules that allow for
   all cases.

   Note that the guidelines in this document do not necessarily require
   the subnet wire protocol to be changed to add support for congestion
   notification.
   For instance, the Feed-Up-and-Forward Mode (Section 3.2) and the
   Null Mode (Section 3.4) do not. Another way to add congestion
   notification without consuming header space in the subnet protocol
   might be to use a parallel control plane protocol.

   This document focuses on the congestion notification interface
   between IP and lower layer or tunnel protocols that can encapsulate
   IP, where the term 'IP' includes v4 or v6, unicast, multicast or
   anycast. However, it is likely that the guidelines will also be
   useful when a lower layer protocol or tunnel encapsulates itself,
   e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah) or when
   it encapsulates other protocols. In the feed-backward mode,
   propagation of congestion signals for multicast and anycast packets
   is out-of-scope (because the complexity would make it unlikely to be
   attempted).

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119]
   when, and only when, they appear in all capitals, as shown here.

   Further terminology used within this document:

   Protocol data unit (PDU): Information that is delivered as a unit
      among peer entities of a layered network consisting of protocol
      control information (typically a header) and possibly user data
      (payload) of that layer. The scope of this document includes
      layer 2 and layer 3 networks, where the PDU is respectively termed
      a frame or a packet (or a cell in ATM). PDU is a general term for
      any of these. This definition also includes a payload with a shim
      header lying somewhere between layer 2 and 3.

   Transport: The end-to-end transmission control function,
      conventionally considered at layer-4 in the OSI reference model.
      Given that the audience for this document will often use the word
      transport to mean low level bit carriage, whenever the term is
      used it will be qualified, e.g. 'L4 transport'.

   Encapsulator: The link or tunnel endpoint function that adds an
      outer header to a PDU (also termed the 'link ingress', the 'subnet
      ingress', the 'ingress tunnel endpoint' or just the 'ingress'
      where the context is clear).

   Decapsulator: The link or tunnel endpoint function that removes an
      outer header from a PDU (also termed the 'link egress', the
      'subnet egress', the 'egress tunnel endpoint' or just the 'egress'
      where the context is clear).

   Incoming header: The header of an arriving PDU before encapsulation.

   Outer header: The header added to encapsulate a PDU.

   Inner header: The header encapsulated by the outer header.

   Outgoing header: The header forwarded by the decapsulator.

   CE: Congestion Experienced [RFC3168]

   ECT: ECN-Capable (L4) Transport [RFC3168]

   Not-ECT: Not ECN-Capable (L4) Transport [RFC3168]

   Load Regulator: For each flow of PDUs, the transport function that
      is capable of controlling the data rate. Typically located at the
      data source, but in-path nodes can regulate load in some
      congestion control arrangements (e.g. admission control, policing
      nodes or transport circuit-breakers [RFC8084]). Note the term "a
      function capable of controlling the load" deliberately includes a
      transport that does not actually control the load responsively but
      ideally it ought to (e.g. a sending application without congestion
      control that uses UDP).

   ECN-PDU: A PDU at the IP layer or below with a capacity to signal
      congestion that is part of a congestion control feedback loop
      within which all the nodes necessary to propagate the signal back
      to the Load Regulator are capable of doing that propagation.
      An IP packet with a non-zero ECN field implies that the endpoints
      are ECN-capable, so this would be an ECN-PDU. However, ECN-PDU is
      intended to be a general term for a PDU at lower layers, as well
      as at the IP layer.

   Not-ECN-PDU: A PDU at the IP layer or below that is part of a
      congestion control feedback-loop within which at least one node
      necessary to propagate any explicit congestion notification
      signals back to the Load Regulator is not capable of doing that
      propagation.

3.  Modes of Operation

   This section sets down the different modes by which congestion
   information is passed between the lower layer and the higher one. It
   acts as a reference framework for the following sections, which give
   normative guidelines for designers of explicit congestion
   notification protocols, taking each mode in turn:

   Feed-Forward-and-Up: Nodes feed forward congestion notification
      towards the egress within the lower layer then up and along the
      layers towards the end-to-end destination at the transport layer.
      The following local optimisation is possible:

      Feed-Up-and-Forward: A lower layer switch feeds-up congestion
         notification directly into the higher layer (e.g. into the ECN
         field in the IP header), irrespective of whether the node is at
         the egress of a subnet.

   Feed-Backward: Nodes feed back congestion signals towards the
      ingress of the lower layer and (optionally) attempt to control
      congestion within their own layer.

   Null: Nodes cannot experience congestion at the lower layer except
      at ingress nodes (which are IP-aware or equivalently
      higher-layer-aware).

3.1.  Feed-Forward-and-Up Mode

   Like IP and MPLS, many subnet technologies are based on
   self-contained protocol data units (PDUs) or frames sent unreliably.
   They provide no feedback channel at the subnetwork layer, instead
   relying on higher layers (e.g. TCP) to feed back loss signals.

   In these cases, ECN may best be supported by standardising explicit
   notification of congestion into the lower layer protocol that carries
   the data forwards. Then a specification is needed for how the egress
   of the lower layer subnet propagates this explicit signal into the
   forwarded upper layer (IP) header. This signal continues forwards
   until it finally reaches the destination transport (at L4). Then
   typically the destination will feed this congestion notification back
   to the source transport using an end-to-end protocol (e.g. TCP).
   This is the arrangement that has already been used to add ECN to
   IP-in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129].

   This mode is illustrated in Figure 1. Along the middle of the
   figure, layers 2, 3 and 4 of the protocol stack are shown, and one
   packet is shown along the bottom as it progresses across the network
   from source to destination, crossing two subnets connected by a
   router, and crossing two switches on the path across each subnet.
   Congestion at the output of the first switch (shown as *) leads to a
   congestion marking in the L2 header (shown as C in the illustration
   of the packet). The chevrons show the progress of the resulting
   congestion indication. It is propagated from link to link across the
   subnet in the L2 header, then when the router removes the marked L2
   header, it propagates the marking up into the L3 (IP) header. The
   router forwards the marked L3 header into subnet 2, and when it adds
   a new L2 header it copies the L3 marking into the L2 header as well,
   as shown by the 'C's in both layers (assuming the technology of
   subnet 2 also supports explicit congestion marking).

   Note that there is no implication that each 'C' marking is encoded
   the same; a different encoding might be used for the 'C' marking in
   each protocol.
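
   The router behaviour just described can be sketched in Python as
   follows. This is a simplified illustration under stated
   assumptions, not a normative algorithm: the lower layer marking is
   modelled as a single boolean, the function names decapsulate and
   encapsulate are hypothetical, and the rule that a marked frame
   carrying a Not-ECT packet is dropped follows the principle from the
   Introduction that congestion has to be signalled by drop when a
   marking could not be understood.

```python
from typing import Optional

# ECN field codepoints from RFC 3168.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def decapsulate(outer_marked: bool, inner_ecn: int) -> Optional[int]:
    """Feed up: combine a lower layer marking with the inner IP ECN field.

    Returns the ECN field of the forwarded IP header, or None if the
    packet has to be dropped (assumed representation of 'drop').
    """
    if not outer_marked:
        return inner_ecn      # nothing to propagate
    if inner_ecn == NOT_ECT:
        return None           # transport cannot understand a mark: drop
    return CE                 # propagate the marking up into the IP header


def encapsulate(inner_ecn: int) -> bool:
    """Feed forward: copy an existing IP marking into the new outer header."""
    return inner_ecn == CE


# Walk-through of Figure 1: the first switch marks the L2 header, the
# router propagates it up into L3, then copies it into the new L2 header.
forwarded = decapsulate(outer_marked=True, inner_ecn=ECT_0)
new_outer_marked = encapsulate(forwarded)
```

   A real lower layer protocol would also need a way to tell whether a
   PDU may be marked at all; that question of wire protocol design is
   taken up in Section 4.2.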

   Finally, for completeness, we show the L3 marking arriving at the
   destination, where the host transport protocol (e.g. TCP) feeds it
   back to the source in the L4 acknowledgement (the 'C' at L4 in the
   packet at the top of the diagram).

                        _ _ _
             /_______  | | |C|  ACK Packet (V)
             \         |_|_|_|
   +---+         layer: 2 3 4 header                           +---+
   |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4
   |   |                         +---+                         | ^ |
   |   | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3
   |   |     +---+     +---+     | ^ |     +---+     +---+     |   |
   |   |     |  *|>>>>>|>>>|>>>>>|>^ |     |   |     |   |     |   |L2
   |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
   source       subnet A        router        subnet B         dest
        __ _ _ _    __ _ _ _    __ _ _     __ _ _ _
       |  | | | |  |  | | |C|  |  | |C|   |  | |C|C|   Data________\
       |__|_|_|_|  |__|_|_|_|  |__|_|_|   |__|_|_|_|   Packet (U)  /
   layer:  4 3 2A      4 3 2A      4 3        4 3 2B
   header

                   Figure 1: Feed-Forward-and-Up Mode

   Of course, modern networks are rarely as simple as this text-book
   example, often involving multiple nested layers. For example, a 3GPP
   mobile network may have two IP-in-IP (GTP [GTPv1]) tunnels in series
   and an MPLS backhaul between the base station and the first router.
   Nonetheless, the example illustrates the general idea of feeding
   congestion notification forward then upward whenever a header is
   removed at the egress of a subnet.

   Note that the FECN (forward ECN) bit in Frame Relay [Buck00] and the
   explicit forward congestion indication (EFCI [ITU-T.I.371]) bit in
   ATM user data cells follow a feed-forward pattern. However, in ATM,
   this arrangement is only part of a feed-forward-and-backward pattern
   at the lower layer, not feed-forward-and-up out of the lower layer--
   the intention was never to interface to IP ECN at the subnet egress.
   To our knowledge, Frame Relay FECN is solely used to detect where
   more capacity should be provisioned.

3.2.  Feed-Up-and-Forward Mode

   Ethernet is particularly difficult to extend incrementally to support
   explicit congestion notification. One way to support ECN in such
   cases has been to use so-called 'layer-3 switches'. These are
   Ethernet switches that dig into the Ethernet payload to find an IP
   header and manipulate or act on certain IP fields (specifically
   Diffserv & ECN). For instance, in Data Center TCP [RFC8257], layer-3
   switches are configured to mark the ECN field of the IP header within
   the Ethernet payload when their output buffer becomes congested.
   With respect to switching, a layer-3 switch acts solely on the
   addresses in the Ethernet header; it does not use IP addresses, and
   it does not decrement the TTL field in the IP header.

                        _ _ _
             /_______  | | |C|  ACK packet (V)
             \         |_|_|_|
   +---+         layer: 2 3 4 header                           +---+
   |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4
   |   |                         +---+                         | ^ |
   |   |  . . . >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3
   |   |     +--^+     +---+     |   |     +---+     +---+     |   |
   |   |     |  *|     |   |     |   |     |   |     |   |     |   |L2
   |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
   source       subnet E        router        subnet F         dest
        __ _ _ _    __ _ _ _    __ _ _     __ _ _ _
       |  | | | |  |  | |C| |  |  | |C|   |  | |C|C|   data________\
       |__|_|_|_|  |__|_|_|_|  |__|_|_|   |__|_|_|_|   packet (U)  /
   layer:  4 3 2       4 3 2       4 3        4 3 2
   header

                   Figure 2: Feed-Up-and-Forward Mode

   By comparing Figure 2 with Figure 1, it can be seen that subnet E
   (perhaps a subnet of layer-3 Ethernet switches) works in
   feed-up-and-forward mode by notifying congestion directly into L3 at
   the point of congestion, even though the congested switch does not
   otherwise act at L3. In this example, the technology in subnet F
   (e.g. MPLS) does support ECN natively, so when the router adds the
   layer-2 header it copies the ECN marking from L3 to L2 as well.

3.3.  Feed-Backward Mode

   In some layer 2 technologies, explicit congestion notification has
   been defined for use internally within the subnet with its own
   feedback and load regulation, but typically the interface with IP for
   ECN has not been defined.

   For instance, for the available bit-rate (ABR) service in ATM, the
   relative rate mechanism was one of the more popular mechanisms for
   managing traffic, tending to supersede earlier designs. In this
   approach ATM switches send special resource management (RM) cells in
   both the forward and backward directions to control the ingress rate
   of user data into a virtual circuit. If a switch buffer is
   approaching congestion or is congested it sends an RM cell back
   towards the ingress with respectively the No Increase (NI) or
   Congestion Indication (CI) bit set in its message type field
   [ATM-TM-ABR]. The ingress then holds or decreases its sending
   bit-rate accordingly.

                        _ _ _
             /_______  | | |C|  ACK packet (X)
             \         |_|_|_|
   +---+         layer: 2 3 4 header                           +---+
   |  <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4
   |   |                         +---+                         | ^ |
   |   |                         |  *|>>> Packet W >>>>>>>>>>>>|>^ |L3
   |   |     +---+     +---+     |   |     +---+     +---+     |   |
   |   |     |   |     |   |     |  <|<<<<<|<<<|<(V)<|<<<|     |   |L2
   |   | . . | . |Packet U | . . | . | . . | . | . . | .*| . . |   |L2
   |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___|
   source       subnet G        router        subnet H         dest
        __ _ _ _    __ _ _ _    __ _ _     __ _ _ _    later
       |  | | | |  |  | | | |  |  | | |   |  | |C| |   data________\
       |__|_|_|_|  |__|_|_|_|  |__|_|_|   |__|_|_|_|   packet (W)  /
           4 3 2       4 3 2       4 3        4 3 2
                                      _
                                /__  |C|  Feedback control
                                \    |_|  cell/frame (V)
                                      2
        __ _ _ _    __ _ _ _    __ _ _     __ _ _ _    earlier
       |  | | | |  |  | | | |  |  | | |   |  | | | |   data________\
       |__|_|_|_|  |__|_|_|_|  |__|_|_|   |__|_|_|_|   packet (U)  /
   layer:  4 3 2       4 3 2       4 3        4 3 2
   header

                      Figure 3: Feed-Backward Mode

   ATM's feed-backward approach does not fit well when layered beneath
   IP's feed-forward approach--unless the initial data source is the
   same node as the ATM ingress. Figure 3 shows the feed-backward
   approach being used in subnet H. If the final switch on the path is
   congested (*), it does not feed-forward any congestion indications on
   packet (U). Instead it sends a control cell (V) back to the router
   at the ATM ingress.

   However, the backward feedback does not reach the original data
   source directly because IP does not support backward feedback (and
   subnet G is independent of subnet H). Instead, the router in the
   middle throttles down its sending rate but the original data sources
   don't reduce their rates. The resulting rate mismatch causes the
   middle router's buffer at layer 3 to back up until it becomes
   congested, which it signals forwards on later data packets at layer 3
   (e.g. packet W). Note that the forward signal from the middle router
   is not triggered directly by the backward signal. Rather, it is
   triggered by congestion resulting from the middle router's mismatched
   rate response to the backward signal.

   In response to this later forward signalling, end-to-end feedback at
   layer-4 finally completes the tortuous path of congestion indications
   back to the origin data source, as before.
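
   The delay inherent in this indirect path can be illustrated with a
   toy discrete-time model of the middle router's layer 3 queue. The
   Python sketch below is only an illustration: the function name
   first_marking_tick, the fixed per-tick rates and the simple
   threshold marking rule are all assumptions chosen for brevity, not
   behaviour taken from ATM or from any AQM specification.

```python
from typing import Optional


def first_marking_tick(arrival: int, service: int, throttled_service: int,
                       throttle_tick: int, mark_threshold: int,
                       ticks: int) -> Optional[int]:
    """Return the first tick at which the router starts marking at L3.

    arrival           -- packets arriving per tick from the sources
    service           -- packets sent per tick before the backward signal
    throttled_service -- packets sent per tick after the backward signal
    throttle_tick     -- tick at which the backward signal takes effect
    mark_threshold    -- queue length above which forwarded packets are
                         marked (assumed marking rule)
    """
    queue = 0
    for tick in range(ticks):
        rate = throttled_service if tick >= throttle_tick else service
        # Sources keep sending at the same rate, so any shortfall in
        # the sending rate accumulates in the layer 3 queue.
        queue = max(0, queue + arrival - rate)
        if queue > mark_threshold:
            return tick
    return None


# Sources send 10 packets per tick; the backward signal at tick 5 makes
# the router cut its sending rate from 10 to 6 packets per tick.
tick = first_marking_tick(arrival=10, service=10, throttled_service=6,
                          throttle_tick=5, mark_threshold=20, ticks=50)
```

   With these example numbers the queue grows by 4 packets per tick
   after the backward signal, so forward marking only starts once the
   queue exceeds the threshold, several ticks after the lower layer
   first signalled congestion.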

   Quantized congestion notification (QCN [IEEE802.1Q]) would suffer
   from similar problems if extended to multiple subnets. However, from
   the start QCN was clearly characterized as solely applicable to a
   single subnet (see Section 6).

3.4.  Null Mode

   Often link and physical layer resources are 'non-blocking' by design.
   In these cases congestion notification may be implemented but it does
   not need to be deployed at the lower layer; ECN in IP would be
   sufficient.

   A degenerate example is a point-to-point Ethernet link. Excess
   loading of the link merely causes the queue from the higher layer to
   back up, while the lower layer remains immune to congestion. Even a
   whole meshed subnetwork can be made immune to interior congestion by
   limiting ingress capacity and sufficient sizing of interior links,
   e.g. a non-blocking fat-tree network [Leiserson85]. An alternative
   to fat links near the root is numerous thin links with multi-path
   routing to ensure even worst-case patterns of load cannot congest any
   link, e.g. a Clos network [Clos53].

4.  Feed-Forward-and-Up Mode: Guidelines for Adding Congestion
    Notification

   Feed-forward-and-up is the mode already used for signalling ECN up
   the layers through MPLS into IP [RFC5129] and through IP-in-IP
   tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6
   [RFC2473] or IPsec [RFC4301]. These RFCs take a consistent approach
   and the following guidelines are designed to ensure this consistency
   continues as ECN support is added to other protocols that encapsulate
   IP. The guidelines are also designed to ensure compliance with the
   more general best current practice for the design of alternate ECN
   schemes given in [RFC4774] and extended by [RFC8311].
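
   As a concrete reference point for the consistent approach these RFCs
   take, the sketch below expresses decapsulation of the IP ECN field as
   a severity comparison, in the spirit of the decapsulation guidelines
   in Section 4.4. The enum, its numeric ranking, the function name and
   the use of None to signal a drop are assumptions made for
   illustration only. The normative behaviour is defined in [RFC6040],
   which also recommends logging or alarms for some unexpected
   combinations that this sketch does not model.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """IP ECN codepoints. The values are a severity ranking chosen for
    this sketch; they are NOT the 2-bit wire encoding."""
    NOT_ECT = 0
    ECT0 = 1
    ECT1 = 2
    CE = 3


def decapsulate(inner: Ecn, outer: Ecn) -> Optional[Ecn]:
    """Return the outgoing ECN codepoint, or None to drop the packet."""
    if inner is Ecn.NOT_ECT:
        # The L4 transport will not understand markings, so drop is the
        # only congestion signal it can receive.
        return None if outer is Ecn.CE else Ecn.NOT_ECT
    if outer is Ecn.NOT_ECT:
        # The outer carried no ECN information; forward the inner as is.
        return inner
    # Both headers are ECN-capable: forward the more severe indication.
    return max(inner, outer)
```

   For example, decapsulate(Ecn.ECT0, Ecn.CE) yields Ecn.CE, whereas
   decapsulate(Ecn.NOT_ECT, Ecn.CE) yields None, meaning the packet is
   dropped on behalf of the congested node.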

   The rest of this section is structured as follows:

   o  Section 4.1 addresses the most straightforward cases, where
      [RFC6040] can be applied directly to add ECN to tunnels that are
      effectively IP-in-IP tunnels, but with shim header(s) between the
      IP headers.

   o  The subsequent sections give guidelines for adding ECN to a subnet
      technology that uses feed-forward-and-up mode like IP, but is not
      so similar to IP that the [RFC6040] rules can be applied directly.
      Specifically:

      *  Sections 4.2, 4.3 and 4.4 respectively address how to add ECN
         support to the wire protocol and to the encapsulators and
         decapsulators at the ingress and egress of the subnet.

      *  Section 4.5 deals with the special, but common, case of
         sequences of tunnels or subnets that all use the same
         technology.

      *  Section 4.6 deals with the question of reframing when IP
         packets do not map 1:1 into lower layer frames.

4.1.  IP-in-IP Tunnels with Shim Headers

   A common pattern for many tunnelling protocols is to encapsulate an
   inner IP header with shim header(s) then an outer IP header. A shim
   header is defined as one that is not sufficient alone to forward the
   packet as an outer header. Another common pattern is for a shim to
   encapsulate a layer 2 (L2) header, which in turn encapsulates (or
   might encapsulate) an IP header. [I-D.ietf-tsvwg-rfc6040update-shim]
   clarifies that RFC 6040 is just as applicable when there are shim(s)
   and possibly a L2 header between two IP headers.

   However, it is not always feasible or necessary to propagate ECN
   between IP headers when separated by a shim. For instance, it might
   be too costly to dig to arbitrary depths to find an inner IP header,
   there may be little or no congestion within the tunnel by design (see
   null mode in Section 3.4 above), or a legacy implementation might not
   support ECN.
   In cases where a tunnel does not support ECN, it is
   important that the ingress does not copy the ECN field from an inner
   IP header to an outer. Therefore section 4 of
   [I-D.ietf-tsvwg-rfc6040update-shim] requires network operators to
   configure the ingress of a tunnel that does not support ECN so that
   it zeros the ECN field in the outer IP header.

   Nonetheless, in many cases it is feasible to propagate the ECN field
   between IP headers separated by shim header(s) and/or a L2 header.
   This is particularly so in the typical case when the outer IP header
   and the shim(s) are added (or removed) as part of the same procedure.
   Even if the shim(s) encapsulate a L2 header, it is often possible to
   find an inner IP header within the L2 PDU and propagate ECN between
   that and the outer IP header. This can be thought of as a special
   case of the feed-up-and-forward mode (Section 3.2), so the guidelines
   for this mode apply (Section 5).

   Numerous shim protocols have been defined for IP tunnelling. More
   recent ones, e.g. Generic UDP Encapsulation (GUE)
   [I-D.ietf-intarea-gue] and Geneve [I-D.ietf-nvo3-geneve], cite and
   follow RFC 6040. Some earlier ones, e.g. CAPWAP [RFC5415] and
   LISP [RFC6830], cite RFC 3168, which is compatible with RFC 6040.

   However, as Section 9.3 of RFC 3168 pointed out, ECN support needs to
   be defined for many earlier shim-based tunnelling protocols, e.g.
   L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637],
   GTP [GTPv1], [GTPv1-U], [GTPv2-C] and Teredo [RFC4380] as well as
   some recent ones, e.g. VXLAN [RFC7348], NVGRE [RFC7637] and NSH
   [RFC8300].

   All these IP-based encapsulations can be updated in one shot by
   simple reference to RFC 6040. However, it would not be appropriate
   to update all these protocols from within the present guidance
   document.
   Instead a companion specification
   [I-D.ietf-tsvwg-rfc6040update-shim] has been prepared that has the
   appropriate standards track status to update standards track
   protocols. For those that are not under IETF change control,
   [I-D.ietf-tsvwg-rfc6040update-shim] can only recommend that the
   relevant body updates them.

4.2.  Wire Protocol Design: Indication of ECN Support

   This section is intended to guide the redesign of any lower layer
   protocol that encapsulates IP to add native ECN support at the lower
   layer. It reflects the approaches used in [RFC6040] and in
   [RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS
   encapsulations that already comply with [RFC6040] or [RFC5129] will
   already satisfy this guidance.

   A lower layer (or subnet) congestion notification system:

   1.  SHOULD NOT apply explicit congestion notifications to PDUs that
       are destined for legacy layer-4 transport implementations that
       will not understand ECN, and

   2.  SHOULD NOT apply explicit congestion notifications to PDUs if the
       egress of the subnet might not propagate congestion notifications
       onward into the higher layer.

       We use the term ECN-PDU for a PDU on a feedback loop that will
       propagate congestion notification properly because it meets both
       the above criteria. And a Not-ECN-PDU is a PDU on a feedback
       loop that does not meet at least one of the criteria, and will
       therefore not propagate congestion notification properly. A
       corollary of the above is that a lower layer congestion
       notification protocol:

   3.  SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs.

   Note that there is no need for all interior nodes within a subnet to
   be able to mark congestion explicitly. A mix of ECN and drop signals
   from different nodes is fine.
   However, if _any_ interior nodes might
   generate ECN markings, guideline 2 above says that all relevant
   egress node(s) SHOULD be able to propagate those markings up to the
   higher layer.

   In IP, if the ECN field in each PDU is cleared to the Not-ECT (not
   ECN-capable transport) codepoint, it indicates that the L4 transport
   will not understand congestion markings. A congested buffer must not
   mark these Not-ECT PDUs, and therefore drops them instead.

   The mechanism a lower layer uses to distinguish the ECN-capability of
   PDUs need not mimic that of IP. The above guidelines merely say that
   the lower layer system, as a whole, should achieve the same outcome.
   For instance, ECN-capable feedback loops might use PDUs that are
   identified by a particular set of labels or tags. Alternatively,
   logical link protocols that use flow state might determine whether a
   PDU can be congestion marked by checking for ECN-support in the flow
   state. Other protocols might depend on out-of-band control signals.

   The per-domain checking of ECN support in MPLS [RFC5129] is a good
   example of a way to avoid sending congestion markings to L4
   transports that will not understand them, without using any header
   space in the subnet protocol.

   In MPLS, header space is extremely limited, therefore RFC 5129 does
   not provide a field in the MPLS header to indicate whether the PDU is
   an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are
   allowed to set explicit congestion indications without checking
   whether the PDU is destined for a L4 transport that will understand
   them. Nonetheless, this is made safe by requiring that the network
   operator upgrades all decapsulating edges of a whole domain at once,
   as soon as even one switch within the domain is configured to mark
   rather than drop during congestion.
   Therefore, any edge node that
   might decapsulate a packet will be capable of checking whether the
   higher layer transport is ECN-capable. When decapsulating a
   CE-marked packet, if the decapsulator discovers that the higher layer
   (inner header) indicates the transport is not ECN-capable, it drops
   the packet--effectively on behalf of the earlier congested node (see
   Decapsulation Guideline 1 in Section 4.4).

   It was only appropriate to define such an incremental deployment
   strategy because MPLS is targeted solely at professional operators,
   who can be expected to ensure that a whole subnetwork is consistently
   configured. This strategy might not be appropriate for other link
   technologies targeted at zero-configuration deployment or deployment
   by the general public (e.g. Ethernet). For such 'plug-and-play'
   environments it will be necessary to invent a failsafe approach that
   ensures congestion markings will never fall into black holes, no
   matter how inconsistently a system is put together. Alternatively,
   congestion notification relying on correct system configuration could
   be confined to flavours of Ethernet intended only for professional
   network operators, such as Provider Backbone Bridges (PBB
   [IEEE802.1Q]; previously 802.1ah).

   ECN support in TRILL [I-D.ietf-trill-ecn-support] provides a good
   example of how to add ECN to a lower layer protocol without relying
   on careful and consistent operator configuration. TRILL provides an
   extension header word with space for flags of different categories
   depending on whether logic to understand the extension is critical.
   The congestion experienced marking has been defined as a 'critical
   ingress-to-egress' flag. So if a transit RBridge sets this flag and
   an egress RBridge does not have any logic to process it, the egress
   will drop the frame, which is the desired default action anyway.
   Therefore TRILL
   RBridges can be updated with support for ECN in no particular order
   and, at the egress of the TRILL campus, congestion notification will
   be propagated to IP as ECN whenever ECN logic has been implemented,
   or as drop otherwise.

   QCN [IEEE802.1Q] is not intended to extend beyond a single subnet, or
   to interoperate with ECN. Nonetheless, the way QCN indicates to
   lower layer devices that the end-points will not understand QCN
   provides another example that a lower layer protocol designer might
   be able to mimic for their scenario. An operator can define certain
   802.1p classes of service to indicate non-QCN frames and an ingress
   bridge is required to map arriving not-QCN-capable IP packets to one
   of these non-QCN 802.1p classes.

4.3.  Encapsulation Guidelines

   This section is intended to guide the redesign of any node that
   encapsulates IP with a lower layer header when adding native ECN
   support to the lower layer protocol. It reflects the approaches used
   in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or
   IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with
   [RFC6040] or [RFC5129] will already satisfy this guidance.

   1.  Egress Capability Check: A subnet ingress needs to be sure that
       the corresponding egress of a subnet will propagate any
       congestion notification added to the outer header across the
       subnet. This is necessary in addition to checking that an
       incoming PDU indicates an ECN-capable (L4) transport. Examples
       of how this guarantee might be provided include:

       *  by configuration (e.g. if any label switches in a domain
          support ECN marking, [RFC5129] requires all egress nodes to
          have been configured to propagate ECN)

       *  by the ingress explicitly checking that the egress propagates
          ECN (e.g. an early attempt to add ECN support to TRILL used
          IS-IS to check path capabilities before adding ECN extension
          flags to each frame [RFC7780]).

       *  by inherent design of the protocol (e.g. by encoding ECN
          marking on the outer header in such a way that a legacy egress
          that does not understand ECN will consider the PDU corrupt or
          invalid and discard it, thus at least propagating a form of
          congestion signal).

   2.  Egress Fails Capability Check: If the ingress cannot guarantee
       that the egress will propagate congestion notification, the
       ingress SHOULD disable ECN at the lower layer when it forwards
       the PDU. An example of how the ingress might disable ECN at the
       lower layer would be by setting the outer header of the PDU to
       identify it as a Not-ECN-PDU, assuming the subnet technology
       supports such a concept.

   3.  Standard Congestion Monitoring Baseline: Once the ingress to a
       subnet has established that the egress will correctly propagate
       ECN, on encapsulation it SHOULD encode the same level of
       congestion in outer headers as is arriving in incoming headers.
       For example it might copy any incoming congestion notification
       into the outer header of the lower layer protocol.

       This ensures that bulk congestion monitoring of outer headers
       (e.g. by a network management node monitoring ECN in passing
       frames) will measure congestion accumulated along the whole
       upstream path, since the Load Regulator, not just since the
       ingress of the subnet. A node that is not the Load Regulator
       SHOULD NOT re-initialize the level of CE markings in the outer to
       zero.

       It would still also be possible to measure congestion introduced
       across one subnet (or tunnel) by subtracting the level of CE
       markings on inner headers from that on outer headers (see
       Appendix C of [RFC6040]).
       For example:

       *  If this guideline has been followed and if the level of CE
          markings is 0.4% on the outer and 0.1% on the inner, 0.4%
          congestion has been introduced across all the networks since
          the Load Regulator, and 0.3% (= 0.4% - 0.1%) has been
          introduced since the ingress to the current subnet (or
          tunnel);

       *  Without this guideline, if the subnet ingress had
          re-initialized the outer congestion level to zero, the outer
          and inner would measure 0.3% and 0.1%. It would still be
          possible to infer that the congestion introduced since the
          Load Regulator was 0.4% (= 0.1% + 0.3%). But only if the
          monitoring system somehow knows whether the subnet ingress
          re-initialized the congestion level.

       As long as subnet and tunnel technologies use the standard
       congestion monitoring baseline in this guideline, monitoring
       systems will know to use the former approach, rather than having
       to "somehow know" which approach to use.

4.4.  Decapsulation Guidelines

   This section is intended to guide the redesign of any node that
   decapsulates IP from within a lower layer header when adding native
   ECN support to the lower layer protocol. It reflects the approaches
   used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or
   IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with
   [RFC6040] or [RFC5129] will already satisfy this guidance.

   A subnet egress SHOULD NOT simply copy congestion notification from
   outer headers to the forwarded header. It SHOULD calculate the
   outgoing congestion notification field from the inner and outer
   headers using the following guidelines. If there is any conflict,
   rules earlier in the list take precedence over rules later in the
   list:

   1.  If the arriving inner header is a Not-ECN-PDU it implies the L4
       transport will not understand explicit congestion markings.
       Then:

       *  If the outer header carries an explicit congestion marking,
          drop is the only indication of congestion that the L4
          transport will understand. If the congestion marking is the
          most severe possible, the packet MUST be dropped. However, if
          congestion can be marked with multiple levels of severity and
          the packet's marking is not the most severe, the packet MAY be
          forwarded, but it SHOULD be dropped.

       *  If the outer is an ECN-PDU that carries no indication of
          congestion or a Not-ECN-PDU the PDU SHOULD be forwarded, but
          still as a Not-ECN-PDU.

   2.  If the outer header does not support explicit congestion
       notification (a Not-ECN-PDU), but the inner header does (an
       ECN-PDU), the inner header SHOULD be forwarded unchanged.

   3.  In some lower layer protocols congestion may be signalled as a
       numerical level, such as in the control frames of quantized
       congestion notification (QCN [IEEE802.1Q]). If such a multi-bit
       encoding encapsulates an ECN-capable IP data packet, a function
       will be needed to convert the quantized congestion level into the
       frequency of congestion markings in outgoing IP packets.

   4.  Congestion indications might be encoded by a severity level. For
       instance increasing levels of congestion might be encoded by
       numerically increasing indications, e.g. pre-congestion
       notification (PCN) can be encoded in each PDU at three severity
       levels in IP or MPLS [RFC6660] and the default encapsulation and
       decapsulation rules [RFC6040] are compatible with this
       interpretation of the ECN field.

       If the arriving inner header is an ECN-PDU, where the inner and
       outer headers carry indications of congestion of different
       severity, the more severe indication SHOULD be forwarded in
       preference to the less severe.

   5.  The inner and outer headers might carry a combination of
       congestion notification fields that should not be possible given
       any currently used protocol transitions. For instance, if
       Encapsulation Guideline 3 in Section 4.3 had been followed, it
       should not be possible to have a less severe indication of
       congestion in the outer than in the inner. It MAY be appropriate
       to log unexpected combinations of headers and possibly raise an
       alarm.

       If a safe outgoing codepoint can be defined for such a PDU, the
       PDU SHOULD be forwarded rather than dropped. Some implementers
       discard PDUs with currently unused combinations of headers just
       in case they represent an attack. However, an approach using
       alarms and policy-mediated drop is preferable to hard-coded drop,
       so that operators can keep track of possible attacks but
       currently unused combinations are not precluded from future use
       through new standards actions.

4.5.  Sequences of Similar Tunnels or Subnets

   In some deployments, particularly in 3GPP networks, an IP packet may
   traverse two or more IP-in-IP tunnels in sequence that all use
   identical technology (e.g. GTP).

   In such cases, it would be sufficient for every encapsulation and
   decapsulation in the chain to comply with RFC 6040. Alternatively,
   as an optimisation, a node that decapsulates a packet and immediately
   re-encapsulates it for the next tunnel MAY copy the incoming outer
   ECN field directly to the outgoing outer and the incoming inner ECN
   field directly to the outgoing inner. Then the overall behavior
   across the sequence of tunnel segments would still be consistent with
   RFC 6040.

   Appendix C of RFC 6040 describes how a tunnel egress can monitor how
   much congestion has been introduced within a tunnel. A network
   operator might want to monitor how much congestion had been
   introduced within a whole sequence of tunnels.
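
   As a rough illustration of this kind of monitoring, the sketch below
   applies the subtraction described in Section 4.3 to packet counts
   observed at an egress. The function and variable names are
   hypothetical. Treating the difference between two marking fractions
   as the congestion introduced within the tunnel is the simplification
   used in this document's own worked example; it is not a substitute
   for the procedure in Appendix C of RFC 6040.

```python
from fractions import Fraction


def ce_fraction(ce_marked: int, total: int) -> Fraction:
    """Fraction of observed packets that carry a CE marking."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= ce_marked <= total:
        raise ValueError("ce_marked must be between 0 and total")
    return Fraction(ce_marked, total)


def congestion_within_tunnel(outer: Fraction, inner: Fraction) -> Fraction:
    """Congestion added since the tunnel ingress (outer minus inner).

    Assumes the ingress copied incoming markings into the outer header
    (Encapsulation Guideline 3), so the outer level should never be
    below the inner level.
    """
    if outer < inner:
        raise ValueError("outer level below inner level: unexpected")
    return outer - inner


# Worked numbers from Section 4.3: 0.4% on the outer, 0.1% on the inner.
outer_level = ce_fraction(40, 10_000)
inner_level = ce_fraction(10, 10_000)
since_load_regulator = outer_level
since_ingress = congestion_within_tunnel(outer_level, inner_level)
print(f"since Load Regulator: {float(since_load_regulator):.1%}")
print(f"since tunnel ingress: {float(since_ingress):.1%}")
```

   Exact fractions are used here only so that the arithmetic of the
   worked example is reproduced without floating-point rounding.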
   Using the technique
   in Appendix C of RFC 6040 at the final egress, the operator could
   monitor the whole sequence of tunnels, but only if the above
   optimisation were used consistently along the sequence of tunnels, in
   order to make it appear as a single tunnel. Therefore, tunnel
   endpoint implementations SHOULD allow the operator to configure
   whether this optimisation is enabled.

   When ECN support is added to a subnet technology, consideration
   SHOULD be given to a similar optimisation between subnets in sequence
   if they all use the same technology.

4.6.  Reframing and Congestion Markings

   The guidance in this section is worded in terms of framing
   boundaries, but it applies equally whether the protocol data units
   are frames, cells, packets or fragments.

   Where an AQM marks the ECN field of IP packets as they queue into a
   layer-2 link, there will be no problem with framing boundaries,
   because the ECN markings would be applied directly to IP packets.
   The guidance in this section is only applicable where an ECN
   capability is being added to a layer-2 protocol so that layer-2
   frames can be ECN-marked by an AQM at layer-2. This would only be
   necessary where AQM will be applied at pure layer-2 nodes (without
   IP-awareness). Where framing boundaries do not necessarily align
   with packet boundaries, the following guidance will be needed. It
   explains how to propagate ECN markings from layer-2 frame headers
   when they are stripped off and IP PDUs with different boundaries are
   reassembled for forwarding.

   Congestion indications SHOULD be propagated on the basis that a
   congestion indication on a PDU applies to all the octets in the PDU.
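
   One way to apply this octet-based principle when frame and packet
   boundaries differ is to keep a running balance of marked octets, in
   the manner of the counter example given later in this section. The
   following is a minimal sketch under that reading. The class and
   method names are hypothetical, and sizes are assumed to be counted in
   octets, excluding any encapsulating headers being added or stripped.

```python
class MarkedOctetBalance:
    """Running balance of arriving minus departing marked octets."""

    def __init__(self) -> None:
        self.balance = 0  # octets; allowed to go negative by design

    def on_arrival(self, size: int, marked: bool) -> None:
        """Record an arriving frame; only marked frames add to the balance."""
        if marked:
            self.balance += size

    def on_departure(self, size: int) -> bool:
        """Return True if the departing frame should be congestion-marked."""
        if self.balance > 0:
            # Mark immediately, even if fewer marked octets are
            # outstanding than the departing frame holds. The deficit is
            # carried forward as a negative balance.
            self.balance -= size
            return True
        return False


# Example: 500-octet marked fragments reassembled into 1500-octet packets.
tracker = MarkedOctetBalance()
tracker.on_arrival(500, marked=True)
first = tracker.on_departure(1500)    # marked; balance becomes -1000
tracker.on_arrival(500, marked=True)
tracker.on_arrival(500, marked=True)  # balance returns to 0
second = tracker.on_departure(1500)   # not marked; balance is not positive
```

   The negative remainder suppresses further marking until enough marked
   octets have arrived to cancel it, which is how the number of marked
   octets is approximately preserved on average.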
   On average, an encapsulator or decapsulator SHOULD approximately
   preserve the number of marked octets arriving and leaving (counting
   the size of inner headers, but not encapsulating headers that are
   being added or stripped).

   The next departing frame SHOULD be immediately marked even if only
   enough incoming marked octets have arrived for part of the departing
   frame. This ensures that any outstanding congestion marked octets
   are propagated immediately, rather than held back waiting for a frame
   no bigger than the outstanding marked octets--which might involve a
   long wait.

   For instance, an algorithm for marking departing frames could
   maintain a counter representing the balance of arriving marked octets
   minus departing marked octets. It adds the size of every marked
   frame that arrives and if the counter is positive it marks the next
   frame to depart and subtracts its size from the counter. This will
   often leave a negative remainder in the counter, which is deliberate.

5.  Feed-Up-and-Forward Mode: Guidelines for Adding Congestion
    Notification

   The guidance in this section is applicable, for example, when IP
   packets:

   o  are encapsulated in Ethernet headers, which have no support for
      ECN;

   o  are forwarded by the eNode-B (base station) of a 3GPP radio access
      network, which is required to apply ECN marking during congestion,
      [LTE-RA], [UTRAN], but the Packet Data Convergence Protocol (PDCP)
      that encapsulates the IP header over the radio access has no
      support for ECN.

   This guidance also generalizes to encapsulation by other subnet
   technologies with no native support for explicit congestion
   notification at the lower layer, but with support for finding and
   processing an IP header.
   It is unlikely to be applicable or
   necessary for IP-in-IP encapsulation, where feed-forward-and-up mode
   based on [RFC6040] would be more appropriate.

   Marking the IP header while switching at layer-2 (by using a layer-3
   switch) or while forwarding in a radio access network seems to
   represent a layering violation. However, it can be considered as a
   benign optimisation if the guidelines below are followed.
   Feed-up-and-forward is certainly not a general alternative to
   implementing feed-forward congestion notification in the lower layer,
   because:

   o  IPv4 and IPv6 are not the only layer-3 protocols that might be
      encapsulated by lower layer protocols;

   o  Link-layer encryption might be in use, making the layer-2 payload
      inaccessible;

   o  Many Ethernet switches do not have 'layer-3 switch' capabilities
      so they cannot read or modify an IP payload;

   o  It might be costly to find an IP header (v4 or v6) when it may be
      encapsulated by more than one lower layer header, e.g. Ethernet
      MAC in MAC ([IEEE802.1Q]; previously 802.1ah).

   Nonetheless, configuring lower layer equipment to look for an ECN
   field in an encapsulated IP header is a useful optimisation. If the
   implementation follows the guidelines below, this optimisation does
   not have to be confined to a controlled environment such as within a
   data centre; it could usefully be applied on any network--even if the
   operator cannot be sure that none of the above issues will apply:

   1.  If a native lower-layer congestion notification mechanism exists
       for a subnet technology, it is safe to mix feed-up-and-forward
       with feed-forward-and-up on other switches in the same subnet.
       However, it will generally be more efficient to use the native
       mechanism.

   2.  The depth of the search for an IP header SHOULD be limited.
       If an IP header is not found soon enough, or an unrecognized or
       unreadable header is encountered, the switch SHOULD resort to an
       alternative means of signalling congestion (e.g. drop, or the
       native lower layer mechanism if available).

   3.  It is sufficient to use the first IP header found in the stack;
       the egress of the relevant tunnel can propagate congestion
       notification upwards to any more deeply encapsulated IP headers
       later.

6.  Feed-Backward Mode: Guidelines for Adding Congestion Notification

   It can be seen from Section 3.3 that congestion notification in a
   subnet using feed-backward mode has generally not been designed to be
   directly coupled with IP layer congestion notification. The subnet
   attempts to minimize congestion internally, and if the incoming load
   at the ingress exceeds the capacity somewhere through the subnet, the
   layer 3 buffer into the ingress backs up. Thus, a feed-backward mode
   subnet is in some sense similar to a null mode subnet, in that there
   is no need for any direct interaction between the subnet and higher
   layer congestion notification. Therefore no detailed protocol design
   guidelines are appropriate. Nonetheless, a more general guideline is
   appropriate:

      A subnetwork technology intended to eventually interface to IP
      SHOULD NOT be designed using only the feed-backward mode, which is
      certainly best for a stand-alone subnet, but would need to be
      modified to work efficiently as part of the wider Internet,
      because IP uses feed-forward-and-up mode.

   The feed-backward approach at least works beneath IP, where the term
   'works' is used only in a narrow functional sense because
   feed-backward can result in very inefficient and sluggish congestion
   control--except if it is confined to the subnet directly connected to
   the original data source, when it is faster than feed-forward.
   It would be valid to design a protocol that could work in
   feed-backward mode for paths that only cross one subnet, and in
   feed-forward-and-up mode for paths that cross subnets.

   In the early days of TCP/IP, a similar feed-backward approach was
   tried for explicit congestion signalling, using source-quench (SQ)
   ICMP control packets. However, SQ fell out of favour and is now
   formally deprecated [RFC6633]. The main problem was that it is hard
   for a data source to tell the difference between a spoofed SQ message
   and a quench request from a genuine buffer on the path. It is also
   hard for a lower layer buffer to address an SQ message to the
   original source port number, which may be buried within many layers
   of headers, and possibly encrypted.

   QCN (also known as backward congestion notification, BCN; see
   Sections 30--33 of [IEEE802.1Q]; previously known as 802.1Qau) uses a
   feed-backward mode structurally similar to ATM's relative rate
   mechanism. However, QCN confines its applicability to scenarios such
   as some data centres where all endpoints are directly attached by the
   same Ethernet technology. If a QCN subnet were later connected into
   a wider IP-based internetwork (e.g. when attempting to interconnect
   multiple data centres) it would suffer the inefficiency shown in
   Figure 3.

7.  IANA Considerations (to be removed by RFC Editor)

   This memo includes no request to IANA.

8.  Security Considerations

   If a lower layer wire protocol is redesigned to include explicit
   congestion signalling in-band in the protocol header, care SHOULD be
   taken to ensure that the field used is specified as mutable during
   transit. Otherwise interior nodes signalling congestion would
   invalidate any authentication protocol applied to the lower layer
   header--by altering a header field that had been assumed as
   immutable.

   The redesign of protocols that encapsulate IP in order to propagate
   congestion signals between layers raises potential signal integrity
   concerns. Experimental or proposed approaches exist for assuring the
   end-to-end integrity of in-band congestion signals, e.g.:

   o  Congestion exposure (ConEx) for networks to audit that their
      congestion signals are not being suppressed by other networks or
      by receivers, and for networks to police that senders are
      responding sufficiently to the signals, irrespective of the L4
      transport protocol used [RFC7713].

   o  A test for a sender to detect whether a network or the receiver is
      suppressing congestion signals (for example see 2nd para of
      Section 20.2 of [RFC3168]).

   Given these end-to-end approaches are already being specified, it
   would make little sense to attempt to design hop-by-hop congestion
   signal integrity into a new lower layer protocol, because end-to-end
   integrity inherently achieves hop-by-hop integrity.

   Section 6 gives vulnerability to spoofing as one of the reasons for
   deprecating feed-backward mode.

9.  Conclusions

   Following the guidance in this document enables ECN support to be
   extended to numerous protocols that encapsulate IP (v4 & v6) in a
   consistent way, so that IP continues to fulfil its role as an
   end-to-end interoperability layer. This includes:

   o  A wide range of tunnelling protocols including those with various
      forms of shim header between two IP headers, possibly also
      separated by a L2 header;

   o  A wide range of subnet technologies, particularly those that work
      in the same 'feed-forward-and-up' mode that is used to support ECN
      in IP and MPLS.

   Guidelines have been defined for supporting propagation of ECN
   between Ethernet and IP on so-called Layer-3 Ethernet switches, using
   a 'feed-up-and-forward' mode.
This approach could enable other 1213 subnet technologies to pass ECN signals into the IP layer, even if 1214 they do not support ECN natively. 1216 Finally, attempting to add ECN to a subnet technology in feed- 1217 backward mode is deprecated except in special cases, due to its 1218 likely sluggish response to congestion. 1220 10. Acknowledgements 1222 Thanks to Gorry Fairhurst and David Black for extensive reviews. 1223 Thanks also to the following reviewers: Joe Touch, Andrew McGregor, 1224 Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon and Michael 1225 Welzl, who pointed out that lower layer congestion notification 1226 signals may have different semantics to those in IP. Thanks are also 1227 due to the tsvwg chairs, TSV ADs and IETF liaison people such as Eric 1228 Gray, Dan Romascanu and Gonzalo Camarillo for helping with the 1229 liaisons with the IEEE and 3GPP. And thanks to Georg Mayer and 1230 particularly to Erik Guttman for the extensive search and 1231 categorisation of any 3GPP specifications that cite ECN 1232 specifications. 1234 Bob Briscoe was part-funded by the European Community under its 1235 Seventh Framework Programme through the Trilogy project (ICT-216372) 1236 for initial drafts and through the Reducing Internet Transport 1237 Latency (RITE) project (ICT-317700) subsequently. The views 1238 expressed here are solely those of the authors. 1240 11. Comments Solicited 1242 Comments and questions are encouraged and very welcome. They can be 1243 addressed to the IETF Transport Area working group mailing list 1244 , and/or to the authors. 1246 12. References 1248 12.1. Normative References 1250 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1251 Requirement Levels", BCP 14, RFC 2119, 1252 DOI 10.17487/RFC2119, March 1997, 1253 . 1255 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1256 of Explicit Congestion Notification (ECN) to IP", 1257 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1258 . 
1260 [RFC3819] Karn, P., Ed., Bormann, C., Fairhurst, G., Grossman, D., 1261 Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. 1262 Wood, "Advice for Internet Subnetwork Designers", BCP 89, 1263 RFC 3819, DOI 10.17487/RFC3819, July 2004, 1264 . 1266 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 1267 Explicit Congestion Notification (ECN) Field", BCP 124, 1268 RFC 4774, DOI 10.17487/RFC4774, November 2006, 1269 . 1271 [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion 1272 Marking in MPLS", RFC 5129, DOI 10.17487/RFC5129, January 1273 2008, . 1275 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 1276 Notification", RFC 6040, DOI 10.17487/RFC6040, November 1277 2010, . 1279 12.2. Informative References 1281 [ATM-TM-ABR] 1282 Cisco, "Understanding the Available Bit Rate (ABR) Service 1283 Category for ATM VCs", Design Technote 10415, June 2005. 1285 [Buck00] Buckwalter, J., "Frame Relay: Technology and Practice", 1286 Pub. Addison Wesley ISBN-13: 978-0201485240, 2000. 1288 [Clos53] Clos, C., "A Study of Non-Blocking Switching Networks", 1289 Bell Systems Technical Journal 32(2):406--424, March 1953. 1291 [GTPv1] 3GPP, "GPRS Tunnelling Protocol (GTP) across the Gn and Gp 1292 interface", Technical Specification TS 29.060. 1294 [GTPv1-U] 3GPP, "General Packet Radio System (GPRS) Tunnelling 1295 Protocol User Plane (GTPv1-U)", Technical Specification TS 1296 29.281. 1298 [GTPv2-C] 3GPP, "Evolved General Packet Radio Service (GPRS) 1299 Tunnelling Protocol for Control plane (GTPv2-C)", 1300 Technical Specification TS 29.274. 1302 [I-D.ietf-intarea-gue] 1303 Herbert, T., Yong, L., and O. Zia, "Generic UDP 1304 Encapsulation", draft-ietf-intarea-gue-07 (work in 1305 progress), March 2019. 1307 [I-D.ietf-nvo3-geneve] 1308 Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic 1309 Network Virtualization Encapsulation", draft-ietf- 1310 nvo3-geneve-13 (work in progress), March 2019. 
1312 [I-D.ietf-trill-ecn-support] 1313 Eastlake, D. and B. Briscoe, "TRILL (TRansparent 1314 Interconnection of Lots of Links): ECN (Explicit 1315 Congestion Notification) Support", draft-ietf-trill-ecn- 1316 support-07 (work in progress), February 2018. 1318 [I-D.ietf-tsvwg-ecn-l4s-id] 1319 Schepper, K. and B. Briscoe, "Identifying Modified 1320 Explicit Congestion Notification (ECN) Semantics for 1321 Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s- 1322 id-06 (work in progress), March 2019. 1324 [I-D.ietf-tsvwg-rfc6040update-shim] 1325 Briscoe, B., "Propagating Explicit Congestion Notification 1326 Across IP Tunnel Headers Separated by a Shim", draft-ietf- 1327 tsvwg-rfc6040update-shim-08 (work in progress), March 1328 2019. 1330 [IEEE802.1Q] 1331 IEEE, "IEEE Standard for Local and Metropolitan Area 1332 Networks--Virtual Bridged Local Area Networks--Amendment 1333 6: Provider Backbone Bridges", IEEE Std 802.1Q-2018, July 1334 2018, . 1336 [ITU-T.I.371] 1337 ITU-T, "Traffic Control and Congestion Control in B-ISDN", 1338 ITU-T Rec. I.371 (03/04), March 2004, 1339 . 1342 [Leiserson85] 1343 Leiserson, C., "Fat-trees: universal networks for 1344 hardware-efficient supercomputing", IEEE Transactions on 1345 Computers 34(10):892-901, October 1985. 1347 [LTE-RA] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA) 1348 and Evolved Universal Terrestrial Radio Access Network 1349 (E-UTRAN); Overall description; Stage 2", Technical 1350 Specification TS 36.300. 1352 [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions 1353 for High Performance", RFC 1323, DOI 10.17487/RFC1323, May 1354 1992, . 1356 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 1357 DOI 10.17487/RFC2003, October 1996, 1358 . 1360 [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in 1361 IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, 1362 December 1998, . 1364 [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, 1365 W., and G. 
Zorn, "Point-to-Point Tunneling Protocol 1366 (PPTP)", RFC 2637, DOI 10.17487/RFC2637, July 1999, 1367 . 1369 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 1370 G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", 1371 RFC 2661, DOI 10.17487/RFC2661, August 1999, 1372 . 1374 [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. 1375 Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, 1376 DOI 10.17487/RFC2784, March 2000, 1377 . 1379 [RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of 1380 Explicit Congestion Notification (ECN) in IP Networks", 1381 RFC 2884, DOI 10.17487/RFC2884, July 2000, 1382 . 1384 [RFC2983] Black, D., "Differentiated Services and Tunnels", 1385 RFC 2983, DOI 10.17487/RFC2983, October 2000, 1386 . 1388 [RFC3931] Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., 1389 "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", 1390 RFC 3931, DOI 10.17487/RFC3931, March 2005, 1391 . 1393 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1394 Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, 1395 December 2005, . 1397 [RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through 1398 Network Address Translations (NATs)", RFC 4380, 1399 DOI 10.17487/RFC4380, February 2006, 1400 . 1402 [RFC5415] Calhoun, P., Ed., Montemurro, M., Ed., and D. Stanley, 1403 Ed., "Control And Provisioning of Wireless Access Points 1404 (CAPWAP) Protocol Specification", RFC 5415, 1405 DOI 10.17487/RFC5415, March 2009, 1406 . 1408 [RFC6633] Gont, F., "Deprecation of ICMP Source Quench Messages", 1409 RFC 6633, DOI 10.17487/RFC6633, May 2012, 1410 . 1412 [RFC6660] Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three 1413 Pre-Congestion Notification (PCN) States in the IP Header 1414 Using a Single Diffserv Codepoint (DSCP)", RFC 6660, 1415 DOI 10.17487/RFC6660, July 2012, 1416 . 1418 [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. 
Lewis, "The 1419 Locator/ID Separation Protocol (LISP)", RFC 6830, 1420 DOI 10.17487/RFC6830, January 2013, 1421 . 1423 [RFC7323] Borman, D., Braden, B., Jacobson, V., and R. 1424 Scheffenegger, Ed., "TCP Extensions for High Performance", 1425 RFC 7323, DOI 10.17487/RFC7323, September 2014, 1426 . 1428 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1429 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 1430 eXtensible Local Area Network (VXLAN): A Framework for 1431 Overlaying Virtualized Layer 2 Networks over Layer 3 1432 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, 1433 . 1435 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 1436 Recommendations Regarding Active Queue Management", 1437 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1438 . 1440 [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network 1441 Virtualization Using Generic Routing Encapsulation", 1442 RFC 7637, DOI 10.17487/RFC7637, September 2015, 1443 . 1445 [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) 1446 Concepts, Abstract Mechanism, and Requirements", RFC 7713, 1447 DOI 10.17487/RFC7713, December 2015, 1448 . 1450 [RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A., 1451 Ghanwani, A., and S. Gupta, "Transparent Interconnection 1452 of Lots of Links (TRILL): Clarifications, Corrections, and 1453 Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016, 1454 . 1456 [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", 1457 BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, 1458 . 1460 [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using 1461 Explicit Congestion Notification (ECN)", RFC 8087, 1462 DOI 10.17487/RFC8087, March 2017, 1463 . 1465 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 1466 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 1467 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 1468 October 2017, . 1470 [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. 
Pignataro, Ed., 1471 "Network Service Header (NSH)", RFC 8300, 1472 DOI 10.17487/RFC8300, January 2018, 1473 . 1475 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1476 Notification (ECN) Experimentation", RFC 8311, 1477 DOI 10.17487/RFC8311, January 2018, 1478 . 1480 [UTRAN] 3GPP, "UTRAN Overall Description", Technical 1481 Specification TS 25.401. 1483 Appendix A. Changes in This Version (to be removed by RFC Editor) 1485 From ietf-12 to ietf-13 1487 * Following 3rd tsvwg WGLC: 1489 + Formalized update to RFC 3819 in its own subsection (1.1) 1490 and referred to it in the abstract 1492 + Scope: Clarified that the specification of alternative ECN 1493 semantics using ECT(1) was not in RFC 4774, but rather in 1494 RFC 8311, and that the problem with using a DSCP to indicate 1495 alternative semantics has issues at domain boundaries as 1496 well as tunnels. 1498 + Terminology: tightened up definitions of ECN-PDU and Not-ECN- 1499 PDU, and removed definition of Congestion Baseline, given it 1500 was only used once. 1502 + Mentioned QCN where feed-backward is first introduced (S.3), 1503 referring forward to where it is discussed more deeply 1504 (S.4).
1506 + Clarified that IS-IS solution to adding ECN support to TRILL 1507 was not pursued 1509 + Completely rewrote the rationale for the guideline about a 1510 Standard Congestion Monitoring Baseline, to focus on 1511 standardization of the otherwise unknown scenario used, 1512 rather than the relative usefulness of the info in each 1513 approach 1515 + Explained the re-framing problem better and added 1516 fragmentation as another possible cause of the problem 1518 + Acknowledged new reviewers 1520 + Updated references, replaced citations of 802.1Qau and 1521 802.1ah with rolled up 802.1Q, and added citations of Fat 1522 trees and Clos Networks 1524 + Numerous other editorial improvements 1526 From ietf-11 to ietf-12 1528 * Updated references 1530 From ietf-10 to ietf-11 1531 * Removed short section (was 3) 'Guidelines for All Cases' 1532 because it was out of scope, being covered by RFC 4774. 1533 Expanded the Scope section (1.2) to explain all this. 1534 Explained that the default encap/decap rules already support 1535 certain alternative semantics, particularly all three of the 1536 alternative semantics for ECT(1): equivalent to ECT(0), higher 1537 severity than ECT(0), and unmarked but implying different 1538 marking semantics from ECT(0). 1540 * Clarified why the QCN example was being given even though not 1541 about incremental deployment of ECN 1543 * Pointed to the spoofing issue with feed-backward mode from the 1544 Security Considerations section, to aid security review. 1546 * Removed any ambiguity in the word 'transport' throughout 1548 From ietf-09 to ietf-10 1550 * Updated section 5.1 on "IP-in-IP tunnels with Shim Headers" to 1551 be consistent with updates to draft-ietf-tsvwg-rfc6040update- 1552 shim. 1554 * Removed reference to the ECN nonce, which has been made 1555 historic by RFC 8311 1557 * Removed "Open Issues" Appendix, given all have been addressed.
1559 From ietf-08 to ietf-09 1561 * Updated para in Intro that listed all the IP-in-IP tunnelling 1562 protocols, to instead refer to draft-ietf-tsvwg-rfc6040update- 1563 shim 1565 * Updated section 5.1 on "IP-in-IP tunnels with Shim Headers" to 1566 summarize guidance that has evolved as rfc6040update-shim has 1567 developed. 1569 From ietf-07 to ietf-08: Refreshed to avoid expiry. Updated 1570 references. 1572 From ietf-06 to ietf-07: 1574 * Added the people involved in liaisons to the acknowledgements. 1576 From ietf-05 to ietf-06: 1578 * Introduction: Added GUE and Geneve as examples of tightly 1579 coupled shims between IP headers that cite RFC 6040. And added 1580 VXLAN to list of those that do not. 1582 * Replaced normative text about tightly coupled shims between IP 1583 headers, with reference to new draft-ietf-tsvwg-rfc6040update- 1584 shim 1586 * Wire Protocol Design: Indication of ECN Support: Added TRILL as 1587 an example of a well-designed protocol that does not need an 1588 indication of ECN support in the wire protocol. 1590 * Encapsulation Guidelines: In the case of a Not-ECN-PDU with a 1591 CE outer, replaced SHOULD be dropped, with explanations of when 1592 SHOULD or MUST are appropriate. 1594 * Feed-Up-and-Forward Mode: Explained examples more carefully, 1595 referred to PDCP and cited UTRAN spec as well as E-UTRAN. 1597 * Updated references. 1599 * Marked open issues as resolved, but did not delete Open Issues 1600 Appendix (yet). 1602 From ietf-04 to ietf-05: 1604 * Explained why tightly coupled shim headers only "SHOULD" comply 1605 with RFC 6040, not "MUST". 1607 * Updated references 1609 From ietf-03 to ietf-04: 1611 * Addressed Richard Scheffenegger's review comments: primarily 1612 editorial corrections, and addition of examples for clarity. 1614 From ietf-02 to ietf-03: 1616 * Updated references, and cited RFC4774. 1618 From ietf-01 to ietf-02: 1620 * Added Section for guidelines that are applicable in all cases. 1622 * Updated references.
1624 From ietf-00 to ietf-01: Updated references. 1626 From briscoe-04 to ietf-00: Changed filename following tsvwg 1627 adoption. 1629 From briscoe-03 to 04: 1631 * Re-arranged the introduction to describe the purpose of the 1632 document first before introducing ECN in more depth. And 1633 clarified the introduction throughout. 1635 * Added applicability to 3GPP TS 36.300. 1637 From briscoe-02 to 03: 1639 * Scope section: 1641 + Added dependence on correct propagation of traffic class 1642 information 1644 + For the feed-backward mode, deemed multicast and anycast out 1645 of scope 1647 * Ensured all guidelines referring to subnet technologies also 1648 refer to tunnels and vice versa by adding applicability 1649 sentences at the start of sections 4.1, 4.2, 4.3, 4.4, 4.6 and 1650 5. 1652 * Added Security Considerations on ensuring congestion signal 1653 fields are classed as immutable and on using end-to-end 1654 congestion signal integrity technologies rather than hop-by- 1655 hop. 1657 From briscoe-01 to 02: 1659 * Added authors: JK & PT 1661 * Added 1663 + Section 4.1 "IP-in-IP Tunnels with Tightly Coupled Shim 1664 Headers" 1666 + Section 4.5 "Sequences of Similar Tunnels or Subnets" 1668 + roadmap at the start of Section 4, given the subsections 1669 have become quite fragmented. 1671 + Section 9 "Conclusions" 1673 * Clarified why transports are starting to be able to saturate 1674 interior links 1676 * Under Section 1.1, addressed the question of alternative signal 1677 semantics and included multicast & anycast. 1679 * Under Section 3.1, included a 3GPP example. 1681 * Section 4.2. "Wire Protocol Design": 1683 + Altered guideline 2. to make it clear that it only applies 1684 to the immediate subnet egress, not later ones 1686 + Added a reminder that it is only necessary to check that ECN 1687 propagates at the egress, not whether interior nodes mark 1688 ECN 1690 + Added example of how QCN uses 802.1p to indicate support for 1691 QCN. 
1693 * Added references to Appendix C of RFC6040, about monitoring the 1694 amount of congestion signals introduced within a tunnel 1696 * Appendix A: Added more issues to be addressed, including plan 1697 to produce a standards track update to IP-in-IP tunnel 1698 protocols. 1700 * Updated acks and references 1702 From briscoe-00 to 01: 1704 * Intended status: BCP (was Informational) & updates 3819 added. 1706 * Briefer Introduction: Introductory para justifying benefits of 1707 ECN. Moved all but a brief enumeration of modes of operation 1708 to their own new section (from both Intro & Scope). Introduced 1709 incr. deployment as most tricky part. 1711 * Tightened & added to terminology section 1713 * Structured with Modes of Operation, then Guidelines section for 1714 each mode. 1716 * Tightened up guideline text to remove vagueness / passive voice 1717 / ambiguity and highlight main guidelines as numbered items. 1719 * Added Outstanding Document Issues Appendix 1720 * Updated references 1722 Authors' Addresses 1724 Bob Briscoe 1725 Independent 1726 UK 1728 EMail: ietf@bobbriscoe.net 1729 URI: http://bobbriscoe.net/ 1731 John Kaippallimalil 1732 Huawei 1733 5340 Legacy Drive, Suite 175 1734 Plano, Texas 75024 1735 USA 1737 EMail: john.kaippallimalil@huawei.com 1739 Pat Thaler 1740 Broadcom Corporation (retired) 1741 CA 1742 USA