Congestion Exposure (ConEx) Working                            M. Mathis
Group                                                        Google, Inc
Internet-Draft                                                B. Briscoe
Intended status: Informational                                        BT
Expires: April 27, 2015                                 October 24, 2014


     Congestion Exposure (ConEx) Concepts, Abstract Mechanism and
                             Requirements
                   draft-ietf-conex-abstract-mech-13

Abstract

   This document describes an abstract mechanism by which senders inform
   the network about the congestion recently encountered by packets in
   the same flow.  Today, network elements at any layer may signal
   congestion to the receiver by dropping packets or by ECN markings,
   and the receiver passes this information back to the sender in
   transport-layer feedback.  The mechanism described here enables the
   sender to also relay this congestion information back into the
   network in-band at the IP layer, such that the total amount of
   congestion from all elements on the path is revealed to all IP
   elements along the path, where it could, for example, be used to
   provide input to traffic management.  This mechanism is called
   congestion exposure or ConEx.  The companion document "ConEx Concepts
   and Use Cases" provides the entry-point to the set of ConEx
   documentation.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview
     2.1.  Terminology
   3.  Requirements for the ConEx Abstract Mechanism
     3.1.  Requirements for ConEx Signals
     3.2.  Constraints on the Audit Function
     3.3.  Requirements for non-abstract ConEx specifications
   4.  Encoding Congestion Exposure
     4.1.  Naive Encoding
     4.2.  Null Encoding
     4.3.  ECN Based Encoding
     4.4.  Independent Bits
     4.5.  Codepoint Encoding
     4.6.  Units Implied by an Encoding
   5.  Congestion Exposure Components
     5.1.  Network Devices (Not modified)
     5.2.  Modified Senders
     5.3.  Receivers (Optionally Modified)
     5.4.  Policy Devices
       5.4.1.  Congestion Monitoring Devices
       5.4.2.  Rest-of-Path Congestion Monitoring
       5.4.3.  Congestion Policers
     5.5.  Audit
   6.  Support for Incremental Deployment
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Comments Solicited
   11. References
     11.1.  Normative References
     11.2.  Informative References

1.  Introduction

   This document describes an abstract mechanism by which, to a first
   approximation, senders inform the network about the congestion
   encountered by packets earlier in the same flow.  It is not a
   complete protocol specification, because it is known that designing
   an encoding (e.g. packet formats, codepoint allocations, etc.) is
   likely to entail compromises that preclude some uses of the protocol.
   The goal of this document is to provide a framework for developing
   and testing algorithms to evaluate the benefits of the ConEx protocol
   and to evaluate the consequences of the compromises in various
   different encoding designs.  This document lays out requirements for
   concrete protocol specifications.

   A companion document [RFC6789] provides the entry point to the set of
   ConEx documentation.  It outlines concepts that are pre-requisites to
   understanding why ConEx is useful, and it outlines various ways that
   ConEx might be used.

2.  Overview

   As typical end-to-end transport protocols continually seek out more
   network capacity, network elements signal whenever congestion
   results, and the transports are responsible for controlling this
   network congestion [RFC5681].  The more a transport tries to use
   capacity that others want to use, the more congestion signals will be
   attributable to that transport.  Likewise, the more transport
   sessions sustained by a user and the longer the user sustains them,
   the more congestion signals will be attributable to that user.  The
   goal of ConEx is to ensure that the resulting congestion signals are
   sufficiently visible and robust, because they are an ideal metric for
   networks to use as the basis of traffic management or other related
   functions.

   Networks indicate congestion by three possible signals: packet loss,
   ECN marking or queuing delay.  ECN marking and some packet loss may
   be the outcome of Active Queue Management (AQM), which the network
   uses to warn senders to reduce their rates.  Packet loss is also the
   natural consequence of complete exhaustion of a buffer or other
   network resource.  Some experimental transport protocols and TCP
   variants infer impending congestion from increasing queuing delay.
   However, delay is too amorphous to use as a congestion metric.  In
   this and other ConEx documents, the term 'congestion signals' is
   generally used solely for ECN markings and packet losses, because
   they are unambiguous signals of congestion.

   In both cases (ECN marking and packet loss), the congestion signals
   follow the route indicated in Figure 1.  A congested network device
   sends a signal in the data stream on the forward path to the
   transport receiver, the receiver passes it back to the sender through
   transport level feedback, and the sender makes some congestion
   control adjustment.
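
   The feedback route just described can be walked through in a few
   lines of Python.  This is only an illustrative sketch: the names
   (Packet, congested_device, receiver_feedback, sender_adjust) are
   hypothetical, the choice of which packets are dropped or marked is
   made by hand, and halving the window stands in for whatever
   congestion control adjustment a real transport would make.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int            # sequence number assigned by the sender
    size: int           # packet size in bytes
    ce: bool = False    # ECN Congestion Experienced (CE) mark


def congested_device(packets, mark_seqs, drop_seqs):
    """Forward path: drop some packets and CE-mark others (chosen by hand)."""
    delivered = []
    for p in packets:
        if p.seq in drop_seqs:
            continue  # loss: the packet never reaches the receiver
        delivered.append(Packet(p.seq, p.size, ce=p.seq in mark_seqs))
    return delivered


def receiver_feedback(delivered, sent_count):
    """Receiver: summarise congestion signals for transport-layer feedback."""
    seen = {p.seq for p in delivered}
    return {
        "lost": [s for s in range(sent_count) if s not in seen],
        "ce_marked": [p.seq for p in delivered if p.ce],
    }


def sender_adjust(cwnd, feedback):
    """Sender: placeholder response (assumption: halve on any signal)."""
    if feedback["lost"] or feedback["ce_marked"]:
        return max(1, cwnd // 2)
    return cwnd + 1


sent = [Packet(seq, 1500) for seq in range(8)]
delivered = congested_device(sent, mark_seqs={2}, drop_seqs={5})
feedback = receiver_feedback(delivered, len(sent))
new_cwnd = sender_adjust(8, feedback)
```

   Note that nothing in this loop is visible at the IP layer on the
   forward path: the feedback travels only between the two transport
   end points.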

   This document extends the capabilities of the Internet protocol suite
   with the addition of a new Congestion Exposure signal.  To a first
   approximation this signal, also shown in Figure 1, relays the
   congestion information from the transport sender back through the
   internetwork layer where it is visible to any interested internetwork
   layer devices along the forward path.  This document frames the
   engineering problem of designing the ConEx signal.  The requirements
   are described in Section 3 and some example encodings are presented
   in Section 4.  Section 5 describes all of the protocol components.

   This new signal is expressly designed to support a variety of new
   policy mechanisms that might be used to instrument, monitor or manage
   traffic.  The policy devices are not shown in Figure 1 but might be
   placed anywhere along the forward data path (see Section 5.4).

 ,---------.                                               ,---------.
 |Transport|                                               |Transport|
 | Sender  |   .                                           |Receiver |
 |         |  /|___________________________________________|         |
 |     ,-<---------------Congestion-Feedback-Signals--<--------.     |
 |     |   |/                                              |   |     |
 |     |   |\           Transport Layer Feedback Flow      |   |     |
 |     |   | \  ___________________________________________|   |     |
 |     |   |  \|                                           |   |     |
 |     |   |   '         ,-----------.               .     |   |     |
 |     |   |_____________|           |_______________|\    |   |     |
 |     |   |    IP Layer |           |  Data Flow      \   |   |     |
 |     |   |             |(Congested)|                  \  |   |     |
 |     |   |             |  Network  |--Congestion-Signals--->-'     |
 |     |   |             |  Device   |                    \|         |
 |     |   |             |           |                    /|         |
 |     `----------->--(new)-IP-Layer-ConEx-Signals-------->|         |
 |         |             |           |                  /  |         |
 |         |_____________|           |_______________  /   |         |
 |         |             |           |               |/    |         |
 `---------'             `-----------'               '     `---------'

           Figure 1: The Flow of Congestion and ConEx Signals

   Since the policy devices can affect how traffic is treated it is
   assumed that there is an intrinsic motivation for users, applications
   or operating systems to understate the congestion that they are
   causing.  Therefore, it is important to be able to audit ConEx
   signals, and to be able to apply sufficient sanction to discourage
   cheating of congestion policies.  The general approach to auditing is
   to count signals on the forward path to confirm that there are never
   fewer ConEx signals than congestion signals.  Many ConEx design
   constraints come from the need to assure that the audit function is
   sufficiently robust.  The audit function is described in Section 5.5;
   however, significant portions of this document (and prior research
   [Refb-dis]) are motivated by issues relating to the audit function
   and making it robust.

   The congestion and ConEx signals shown in Figure 1 represent a series
   of discrete events: ECN marks or lost packets, carried by the forward
   data stream and fed back into the internetwork layer.  The policy and
   audit functions are most likely to act on the accumulated values of
   these signals, for which we use the term "volume".  For example,
   traffic volume is the total number of bytes delivered, optionally
   over a specified time interval and over some aggregate of traffic
   (e.g. all traffic from a site).  Similarly, loss-volume is the total
   number of bytes discarded from some aggregate over an interval.  The
   term congestion-volume is defined precisely in [RFC6789].  Note that
   volume per unit time is (average) rate.

   A design goal of the ConEx protocol is that the important policy
   mechanisms can be implemented per logical link without per flow state
   (see Section 5.4).  However, the price to pay can be flow state to
   audit ConEx signals (Section 5.5).  This is justified in that i)
   auditing at the edges, with limited per flow state, enables policy
   elsewhere, including in the core, without any per flow state; ii)
   auditing can use soft flow state, which does not require route
   pinning.
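
   The auditing principle and the "volume" terminology above can be
   made concrete with a small sketch.  It is not the audit function of
   Section 5.5: the class and field names are hypothetical, the units
   are assumed to be bytes, the sum of lost and CE-marked bytes is used
   as a simple stand-in for the congestion-volume defined in [RFC6789],
   and the auditor is simply assumed to know which packets were lost.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    size: int                    # packet size in bytes
    conex_marked: bool = False   # carries Re-Echo-Loss, Re-Echo-ECN or Credit
    ce_marked: bool = False      # carries an ECN CE congestion mark
    lost: bool = False           # discarded (assumed known to the auditor)


class VolumeAudit:
    """Running byte volumes for one flow or aggregate (hypothetical helper)."""

    def __init__(self):
        self.traffic_volume = 0   # bytes delivered
        self.loss_volume = 0      # bytes discarded
        self.ecn_volume = 0       # bytes carrying CE marks
        self.conex_volume = 0     # bytes carrying ConEx marks
        self.ever_understated = False

    def observe(self, obs):
        if obs.lost:
            self.loss_volume += obs.size
        else:
            self.traffic_volume += obs.size
            if obs.ce_marked:
                self.ecn_volume += obs.size
        if obs.conex_marked:
            self.conex_volume += obs.size
        # Audit check: ConEx signals must never fall behind congestion signals.
        if self.conex_volume < self.loss_volume + self.ecn_volume:
            self.ever_understated = True


honest = VolumeAudit()
for obs in [Observation(1500, conex_marked=True),   # ConEx mark sent in advance
            Observation(1500, ce_marked=True),      # congestion signal, covered
            Observation(1500, conex_marked=True),   # further ConEx mark
            Observation(1500, lost=True)]:          # loss, still covered
    honest.observe(obs)

cheat = VolumeAudit()
for obs in [Observation(1500), Observation(1500, ce_marked=True)]:
    cheat.observe(obs)   # CE mark arrives with no ConEx volume to cover it
```

   Dividing any of these counters by the length of the measurement
   interval gives the corresponding average rate.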

   There is a long standing argument over units of congestion: bytes vs
   packets (see [RFC7141] and its references).  Section 4.6 explains why
   this problem must be addressed carefully.  However, this document
   does not take a strong position on this issue.  Nonetheless, it does
   require that the units of congestion must be an explicitly stated
   property of any proposed encoding, and the consequences of that
   design decision must be evaluated along with other aspects of the
   design.

   To be successful the ConEx protocol needs to have the property that
   the relevant stakeholders each have the incentive to unilaterally
   start on each stage of partial deployment, which in turn creates
   incentives for further deployment.  Furthermore, legacy systems that
   will never be upgraded should not become a barrier to deploying
   ConEx.  Issues relating to partial deployment are described in
   Section 6.

   Note that ConEx signals are not intended to be used for fine-grained
   congestion control.  They are anticipated to be most useful at longer
   time scales and/or at coarser granularity than single microflows.
   For example, the total congestion caused by a user might serve as an
   input to higher level policy or accountability functions, designed to
   create incentives for improving user behavior, such as choosing to
   send large quantities of data at off-peak times, at lower data rates
   or with less aggressive protocols such as LEDBAT [RFC6817] (see
   [RFC6789]).

   Ultimately ConEx signals have the potential to provide a mechanism to
   regulate global Internet congestion.  From the earliest days of
   congestion control research there has been a concern that there is no
   mechanism to prevent transport designers from incrementally making
   protocols more aggressive without bound and spiraling to a "tragedy
   of the commons" Internet congestion collapse.  The "TCP friendly"
   paradigm was created in part to forestall this failure.  However, it
   no longer commands any authority because it has little to say about
   the Internet of today, which has moved beyond the scaling range of
   standard TCP.  As a consequence, many transports and applications are
   opening arbitrarily large numbers of connections or using arbitrary
   levels of aggressiveness.  ConEx represents a recognition that the
   IETF cannot regulate this space directly because it concerns the
   behaviour of users and applications, not individual transport
   protocols.  Instead the IETF can give network operators the protocol
   tools to arbitrate the space themselves, with better bulk traffic
   management.  This in turn should create incentives for users, and
   for designers of applications and of transport protocols, to be more
   mindful about contributing to congestion.

2.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   ConEx signals in IP packet headers from the sender to the network:
   Not-ConEx:  The transport (or at least this packet) is not
      ConEx-capable.
   ConEx-Capable:  The transport is ConEx-Capable.  This is the opposite
      of Not-ConEx.
   ConEx Signal:  A signal in a packet sent by a ConEx-Capable
      transport.  It carries at least one of the following signals:
      Re-Echo-Loss:  The transport has experienced a loss.
      Re-Echo-ECN:  The transport has detected an ECN congestion
         experienced (CE) mark.
      Credit:  The transport is building up credit to signal advance
         notice of the risk of packets contributing to congestion, in
         contrast to signalling only after inherently delayed feedback
         of actual congestion.
   ConEx-Not-Marked:  The transport is ConEx-capable but is signaling
      none of Re-Echo-Loss, Re-Echo-ECN or Credit.
   ConEx-Marked:  At least one of Re-Echo-Loss, Re-Echo-ECN or Credit.
   ConEx-Re-Echo:  At least one of Re-Echo-Loss or Re-Echo-ECN.

3.  Requirements for the ConEx Abstract Mechanism

   First time readers may wish to skim this section, since it is more
   understandable having read the entire document.

3.1.  Requirements for ConEx Signals

   Ideally, all the following requirements would be met by a Congestion
   Exposure Signal:
   a.  The ConEx Signal SHOULD be visible to internetwork layer devices
       along the entire path from the transport sender to the transport
       receiver.  Equivalently, it SHOULD be present in the IPv4 or IPv6
       header, and in the outermost IP header if using IP in IP
       tunneling.  It MAY need to be visible if other encapsulating
       headers are used to interconnect networks.  The ConEx Signal
       SHOULD be immutable once set by the transport sender.  A
       corollary of these requirements is that the chosen ConEx encoding
       SHOULD pass silently without modification through pre-existing
       networking gear.
   b.  The ConEx Signal SHOULD be useful under only partial deployment.
       A minimal deployment SHOULD only require changes to transport
       senders.  Furthermore, partial deployment SHOULD create
       incentives for additional deployment, both in terms of enabling
       ConEx on more devices and adding richer features to existing
       devices.  Nonetheless, ConEx deployment need never be universal,
       and it is anticipated that some hosts and some transports may
       never support the ConEx Protocol and some networks may never use
       the ConEx Signals.
   c.  The ConEx signal SHOULD be timely.  There will be a minimum delay
       of one RTT, and often longer if the transport protocol sends
       infrequent feedback (consider RTCP [RFC3550], [RFC6679] for
       example).
   d.  The ConEx signal SHOULD be accurate and auditable.  The general
       approach for auditing is to observe the volume of congestion
       signals and ConEx signals on the forward data path and verify
       that the ConEx signals do not under-represent the congestion
       signals (see Section 5.5).
   e.  The ConEx signals for packet loss and ECN marking SHOULD have
       distinct encodings because they are likely to require different
       auditing techniques.
   f.  Additionally there SHOULD be an auditable ConEx Credit signal.  A
       sender can use Credit to indicate potential future congestion,
       for example as often seen during startup.  ConEx Credit is
       intended to overestimate congestion actually experienced across
       the network.

   It is already known that implementing ConEx signals is likely to
   entail some compromises, and therefore all the requirements above are
   expressed with the keyword 'SHOULD' rather than 'MUST'.  The only
   mandatory requirement is that a concrete protocol description MUST
   give sound reasoning if it chooses not to meet some requirement.

3.2.  Constraints on the Audit Function

   The role of the audit function and constraints on it are described in
   Section 5.5.  There is no intention to standardise the audit
   function.  However, it is necessary to lay down the following
   normative constraints on audit behaviour so that transport designers
   will know what to design against and implementers of audit devices
   will know what pitfalls to avoid:
   Minimal False Hits:  Audit SHOULD introduce minimal false hits for
      honest flows;
   Minimal False Misses:  Audit SHOULD quickly detect and sanction
      dishonest flows, ideally on the first dishonest packet;
   Transport Oblivious:  Audit SHOULD NOT be designed around one
      particular rate response, such as any particular TCP congestion
      control algorithm or one particular resource sharing regime such
      as TCP-friendliness [RFC5348].
      An important goal is to give
      ingress networks the freedom to unilaterally allow different rate
      responses to congestion and different resource sharing regimes
      [Evol_cc], without having to coordinate with other networks over
      details of individual flow behaviour;
   Sufficient Sanction:  Audit SHOULD introduce sufficient sanction
      (e.g. loss in goodput) such that senders cannot gain from
      understating congestion;
   Proportionate Sanction:  To the extent that the audit might be
      subject to false hits, the sanction SHOULD be proportionate to the
      degree to which congestion is understated.  If audit
      over-punishes, attackers will find ways to harness it into
      amplifying attacks on others.  Ideally audit should, in the
      long-run, cause the user to get no better performance than they
      would get by being accurate;
   Manage Memory Exhaustion:  Audit SHOULD be able to counter state
      exhaustion attacks.  For instance, if the audit function uses
      flow-state, it should not be possible for senders to exhaust its
      memory capacity by gratuitously sending numerous packets, each
      with a different flow ID;
   Identifier Accountability:  Audit SHOULD NOT be vulnerable to
      'identity whitewashing', where a transport can label a flow with a
      new ID more cheaply than paying the cost of continuing to use its
      current ID [CheapPseud].

3.3.  Requirements for non-abstract ConEx specifications

   An experimental ConEx specification SHOULD describe the following
   protocol details:
   Network Layer:
      A.  The specific ConEx signal encodings with packet formats, bit
          fields and/or code points;
      B.  An inventory of invalid combinations of flags or invalid
          codepoints in the encoding.  Whether security gateways should
          normalise, discard or ignore such invalid encodings, and what
          values they should be considered equivalent to by ConEx-aware
          elements;
      C.  An inventory of any conflated signals or any other effects
          that are known to compromise signal integrity;
      D.  Whether the source is responsible for allowing for the round
          trip delay in ConEx signals (e.g. using a Credit marking), and
          if so whether Credit is maintained for the duration of a flow
          or degrades over time, and what defines the end of the
          duration of a flow;
      E.  A specification for signal units (bytes vs packets, etc.), any
          approximations allowed and algorithms to do any implied
          conversions or accounting;
      F.  If the units are bytes, a definition of which headers are
          included in the size of the packet;
      G.  How tunnels should propagate the ConEx encoding;
      H.  Whether the encoding fields are mutable or not, to ensure that
          header authentication, checksum calculation, etc. process them
          correctly.  A ConEx encoding field SHOULD be immutable
          end-to-end, so that end points can detect if it has been
          tampered with in transit;
      I.  If a specific encoding allows mutability (e.g. at proxies), an
          inventory of invalid transitions between codepoints.  In all
          encodings, transitions from any ConEx marking to Not-ConEx
          MUST be invalid;
      J.  A statement that the ConEx encoding is only applicable to
          unicast and anycast, and that forwarding elements should
          silently ignore any ConEx signalling on multicast packets
          (they should be forwarded unchanged);
      K.  Definition of any extensibility;
      L.  Backward and forward compatibility and potential migration
          strategies.  In all cases, a ConEx encoding MUST be arranged
          so that legacy transport senders implicitly send Not-ConEx;
      M.  Any (optional) modification to data-plane forwarding dependent
          on the encoding (e.g. preferential discard, interaction with
          Diffserv, ECN etc.);
      N.  Any warning or error messages relevant to the encoding.

      Note regarding item J on multicast: A multicast tree may involve
      different levels of congestion on each leg.  Any traffic
      management can only monitor or control multicast congestion at or
      near each receiver.  It would make no sense for the sender to try
      to expose "whole path congestion" in sent packets, because it
      cannot hope to describe all the differing congestion levels on
      every leg of the tree.
   Transport Layer:
      A.  A specification of any required changes to congestion feedback
          in particular transport protocols.
      B.  A specification (or minimally a recommendation) for how a
          transport should estimate credits at the beginning of a
          connection and while it is in progress.
      C.  A specification of whether any other protocol options should
          (or must) be enabled along with an implementation of ConEx
          (e.g. at least attempting to negotiate ECN and SACK
          capability);
      D.  A specification of any configuration that a ConEx stack may
          require (or preferably confirmation that it requires no
          configuration);
      E.  A specification of the statistics that a protocol stack should
          log for each type of marking on a per-flow or aggregate basis.
   Security:
      A.  An example of a strong audit algorithm suitable for detecting
          if a single flow is misstating congestion.  This algorithm
          should present minimal false results, but need not have
          optimal scaling properties (e.g. may need per flow state).
      B.  An example of an audit algorithm suitable for detecting
          misstated congestion in a large aggregate (e.g. no per-flow
          state).

   The possibility exists that these specifications over-constrain the
   ConEx design, and cannot be fully satisfied.  An important part of
   the evaluation of any particular design will be a thorough inventory
   of all ways in which it might fail to satisfy these specifications.

4.  Encoding Congestion Exposure

   Most protocol specifications start with a description of packet
   formats and codepoints with their associated meanings.  This document
   does not: It is already known that choosing the encoding for ConEx is
   likely to entail some engineering compromises that have the potential
   to reduce the protocol's usefulness in some settings.  For instance
   the experimental ConEx encoding chosen for IPv6
   [I-D.ietf-conex-destopt] had to make compromises on tunnelling.
   Rather than making these engineering choices prematurely, this
   document sidesteps the encoding problem by making it abstract.  It
   describes several different representations of ConEx Signals, none of
   which are specified to the level of specific bits or code points.

   The goal of this approach is to be as complete as possible for
   discovering the potential usage and capabilities of the ConEx
   protocol, so we have some hope of making optimal design decisions
   when choosing the encoding.  Even if experiments reveal particular
   problems due to the encoding, then this document will still serve as
   a reference model.

4.1.  Naive Encoding

   For tutorial purposes, it is helpful to describe a naive encoding of
   the ConEx protocol for TCP and similar protocols: set a bit (not
   specified here) in the IP header on each retransmission and on each
   ECN signaled window reduction.  Network devices along the forward
   path can see this bit and act on it.  For example any device along
   the path might limit the rate of all traffic if the rate of marked
   (congested) packets exceeds a threshold.

   This simple encoding is sufficient to illustrate many of the benefits
   envisioned for ConEx.  At first glance it looks like it might
   motivate people to deploy and use it.
   It is a one line code change
   that a small number of OS developers and content providers could
   unilaterally deploy across a significant fraction of all Internet
   traffic.  However, this encoding does not support auditing so it
   would also motivate users and/or applications to misrepresent the
   congestion that they are causing [RFC3514].  As a consequence the
   naive encoding is not likely to be trusted and thus creates its own
   disincentives for deployment.

   Nonetheless, this Naive encoding does present a clear mental model of
   how the ConEx protocol might function under various uses.  It is
   useful for thought experiments where it can be stipulated that all
   participants are honest and it does illustrate some of the incentives
   that might be introduced by ConEx.

4.2.  Null Encoding

   In limited contexts it is possible to implement ConEx-like functions
   without any signals at all by measuring rest-of-path congestion
   directly from TCP headers.  The algorithm is to keep at least one RTT
   of past TCP headers and match each new header against the history to
   count duplicate data.

   This could implement many ConEx policies, without any explicit
   protocol.  It is fairly easy to implement, at least at low rate (e.g.
   in a software based edge router).  However, it would only be useful
   in cases where the network operator can see the TCP headers.  This is
   currently (2014) the majority of traffic because UDP, IPsec and VPN
   tunnels are used far less than SSL or TLS over TCP/IP, which do not
   hide TCP sequence numbers from network devices.  However, anyone
   specifically intending to avoid the attention of a congestion policy
   device would only have to hide their TCP headers from the network
   operator (e.g. by using a VPN tunnel).

4.3.  ECN Based Encoding

   The re-ECN specification [I-D.briscoe-conex-re-ecn-tcp] presents an
   encoding of ConEx in IPv4 and IPv6 that was tightly integrated with
   ECN encoding in order to fit into the IPv4 header.  Any individual
   packet may need to represent any ECN codepoint and any ConEx signal
   value independently.  So, ideally their encoding should be entirely
   independent.  However, given the limited number of header bits and/or
   code points, re-ECN chooses to partially share code points and to
   re-echo both losses and ECN with just one codepoint.

   The central theme of the re-ECN work is an audit mechanism that
   provides sufficient disincentives against misrepresenting congestion
   [I-D.briscoe-conex-re-ecn-motiv].  It is analyzed extensively in
   Briscoe's PhD dissertation [Refb-dis].  For a tutorial background on
   re-ECN motivation and techniques, see [Re-fb, FairerFaster].

   Re-ECN is an example of one chosen set of compromises attempting to
   meet the requirements of Section 3.  The present document takes a
   step back, aiming to state the ideal requirements in order to allow
   the Internet community to assess whether different compromises might
   be better.

   The problem with Re-ECN is that it requires that receivers be ECN
   enabled in addition to sender changes.  Newer encodings
   [I-D.ietf-conex-destopt] overcome this problem by being able to
   represent loss and ECN based congestion separately.

4.4.  Independent Bits

   This encoding involves flag bits, each of which the sender can set
   independently to indicate to the network one of the following four
   signals:
   ConEx (Not-ConEx)  The transport is (or is not) using ConEx with this
      packet (network layer encoding requirement L in Section 3.3 says
      the protocol must be arranged so that legacy transport senders
      implicitly send Not-ConEx);
   Re-Echo-Loss (Not-Re-Echo-Loss)  The transport has (or has not)
      experienced a loss;
   Re-Echo-ECN (Not-Re-Echo-ECN)  The transport has (or has not)
      experienced ECN-signaled congestion;
   Credit (Not-Credit)  The transport is (or is not) building up
      congestion credit (see Section 5.5 on the audit function).

   A packet with ConEx set combined with all the three other flags
   cleared implies ConEx-Not-Marked.

   This encoding does not imply any exclusion property among the
   signals.  Multiple types of congestion (ECN, loss) can be signalled
   on the same ACK.  So, ideally, a ConEx sender would be able to
   reflect these in the next packet.  However, there will be many
   invalid combinations of flags (e.g. Not-ConEx combined with any of
   the ConEx-marked flags), which a malicious sender could use to
   advantage against naive policy devices that only check each flag
   separately.

   As long as the packets in a flow have uniform sizes, it does not
   matter whether the units of congestion are packets or bytes.
   However, if an application sends very irregular packet sizes, it may
   be necessary for the sender to mark multiple packets to avoid being
   in technical violation of an audit function measuring in bytes (see
   Section 4.6).

4.5.  Codepoint Encoding

   This encoding involves signaling one of the following five
   codepoints:

   ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN, Credit}

   Each named codepoint has the same meaning as in the encoding using
   independent bits in the previous section.
   The use of any one
   codepoint implies the negative of all the others.

   Inherently, the semantics of most of the enumerated codepoints are
   mutually exclusive. 'Credit' is the only one that might need to be
   used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
   that requirement is questionable. It must not be forgotten that the
   enumerated encoding loses the flexibility to signal these two
   combinations, whereas the encoding with four independent bits is not
   so limited. Alternatively, two extra codepoints could be assigned to
   these two combinations of semantics. The comment in the previous
   section about units also applies.

4.6. Units Implied by an Encoding

   The following comments apply generally to all the other encodings.

   Congestion can be due to exhaustion of bit-carrying capacity, or
   exhaustion of packet processing power. When a packet is discarded or
   marked to indicate congestion, there is no easy way to know whether
   the lost or marked packet signifies bit-congestion or
   packet-congestion. The above ConEx encodings that rely on marking
   packets suffer from the same ambiguity.

   This problem is most acute when audit needs to check that one count
   of markings matches another. For example, if there are ConEx markings
   on three large (1500B) packets, is that sufficient to match the loss
   of 5 small (60B) packets? If a packet-marking is defined to mean all
   the bytes in the packet are marked, then we have 4500B of
   ConEx-marked data against 300B of lost data, which is easily
   sufficient. If instead we are counting packets, then we have 3 ConEx
   packets against 5 lost packets, which is not sufficient. This problem
   will not arise when all the packets in a flow are the same size, but
   a choice needs to be made for flows in which packet sizes vary, such
   as BGP, SPDY and some variable rate video encoding schemes.
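
   The arithmetic in the example above can be checked with a short
   sketch. This is illustrative only: the Packet type and the covers()
   helper are hypothetical names introduced here, not part of any ConEx
   specification, and the sketch assumes that "sufficient" simply means
   the marked count is at least the congested count in the chosen unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; only the size matters here."""
    size_bytes: int


def covers(conex_marked, congested, unit):
    """Return True if the ConEx markings match or exceed the congestion.

    Assumption: 'sufficient' means marked >= congested, counted either
    in bytes or in packets, depending on `unit`.
    """
    if unit == "bytes":
        marked = sum(p.size_bytes for p in conex_marked)
        needed = sum(p.size_bytes for p in congested)
    elif unit == "packets":
        marked = len(conex_marked)
        needed = len(congested)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return marked >= needed


# The example from the text: three 1500B ConEx-marked packets
# against five lost 60B packets.
marked = [Packet(1500)] * 3
lost = [Packet(60)] * 5

bytes_ok = covers(marked, lost, "bytes")      # 4500B against 300B
packets_ok = covers(marked, lost, "packets")  # 3 packets against 5
```

   The same traffic passes the check when counted in bytes and fails it
   when counted in packets, which is why an encoding has to state its
   unit explicitly.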

   Whether to use bytes or packets is not obvious. For instance, the
   most expensive links in the Internet, in terms of cost per bit, are
   all at lower data rates, where transmission times are large and
   packet sizes are important. In order for a policy to consider wire
   time, it needs to know the number of congested bytes. However,
   high-speed networking equipment and the transport protocols
   themselves sometimes gauge resource consumption and congestion in
   terms of packets.

   [RFC7141] advises that congestion indications should be interpreted
   in units of bytes when responding to congestion, at least on today's
   Internet. [RFC6789] takes the same view in its definition of
   congestion-volume, again for today's Internet.

   In any TCP implementation this is simple to achieve for varying size
   packets, given TCP SACK tracks losses in bytes. If an encoding is
   specified in units of bytes, the encoding should also specify which
   headers to include in the size of a packet (see network layer
   requirement F in Section 3.3).

   RFC 7141 constructs an argument for why equipment tends to be built
   so that the bottleneck will be the bit-carrying capacity of its
   interfaces, not its packet processing capacity. However, RFC 7141
   acknowledges that the position may change in future, and notes that
   new techniques will need to be developed to distinguish packet- and
   bit-congestion.

   Given this document describes an abstract ConEx mechanism, it is
   intended to be timeless. Therefore it does not take a strong
   position on this issue. However, a ConEx encoding will need to
   explicitly specify whether it assumes units of bytes or packets
   consistently for both congestion indications and ConEx markings (see
   network layer requirement E in Section 3.3). It may help to refer to
   the guidance in [RFC7141].

5. Congestion Exposure Components

   The components shown in Figure 1, as well as policy and audit, are
   described in more detail below.

5.1. Network Devices (Not Modified)

   Congestion signals originate from network devices as they do today.
   A congested router, switch or other network device can discard or ECN
   mark packets when it is congested.

5.2. Modified Senders

   The sending transport needs to be modified to send Congestion
   Exposure signals in response to congestion feedback signals (e.g. for
   the case of a TCP transport see [I-D.ietf-tcp-modifications]). We
   want to permit ConEx without ECN (e.g. if the receiver does not
   support ECN). However, we want to encourage a ConEx sender to at
   least attempt to negotiate ECN (a ConEx transport protocol spec may
   require this), because it is believed that ConEx without ECN is
   harder to audit, and thus potentially exposed to cheating. Since
   honest users have the potential to benefit from stronger mechanisms
   to manage traffic, they have an incentive to deploy ConEx and ECN
   together. This incentive is not sufficient to prevent a dishonest
   user from constructing (or configuring) a sender that enables ConEx
   after choosing not to negotiate ECN, but it should be sufficient to
   prevent this from being the sustained default case for any
   significant pool of users.

   Permitting ConEx without ECN is necessary to facilitate bootstrapping
   other parts of ConEx deployment.

5.3. Receivers (Optionally Modified)

   Any receiving transport may already feed back sufficiently useful
   signals to the sender so that it does not need to be altered.

   The native loss or ECN signaling mechanism required for compliance
   with existing congestion control standards (e.g. RTCP, SCTP) will
   typically be sufficient for the sender to generate ConEx signals.

   TCP's loss feedback is sufficient for ConEx if SACK is used
   [RFC2018].
   However, the original specification for ECN in TCP
   [RFC3168] signals congestion no more than once per round trip. The
   sender may require more precise feedback from the receiver; otherwise
   it is at risk of appearing to be understating its ConEx signals.

   Ideally, ConEx should be added to a transport like TCP without
   mandatory modifications to the receiver. But in the TCP-ECN case an
   optional modification to the receiver could be recommended for
   precision (see [I-D.ietf-tcpm-accecn-reqs], which is based on the
   approach originally taken when adding re-ECN to TCP
   [I-D.briscoe-conex-re-ecn-tcp]).

5.4. Policy Devices

   Policy devices are characterised by a need to be configured with a
   policy related to the users or neighboring networks being served. In
   contrast, auditing devices solely enforce compliance with the ConEx
   protocol and do not need to be configured with any client-specific
   policy.

   One of the design goals of the ConEx protocol is that none of the
   important policy mechanisms requires per-flow state, and that policy
   mechanisms can even be implemented for heavily aggregated traffic in
   the core of the Internet with complexity akin to accumulating marking
   volumes per logical link. Of course, policy mechanisms may sometimes
   choose to focus down on individual flows, but ConEx aims to make
   aggregate policy devices feasible.

5.4.1. Congestion Monitoring Devices

   Policy devices can typically be decomposed into two functions: i)
   monitoring the ConEx signal to compare it with a policy, then ii)
   acting in some way on the result. Various actions might be invoked
   against 'out of contract' traffic, such as policing (see
   Section 5.4.3), re-routing, or downgrading the class of service.

   Alternatively, a policy device might not act directly on the traffic,
   but instead report to management systems that are designed to control
   congestion indirectly.
   For instance, the reports might trigger
   capacity upgrades, penalty clauses in contracts, or charges based
   on congestion, or they might merely prompt warnings to clients who
   are causing excessive congestion.

   Nonetheless, whatever action is invoked, the congestion monitoring
   function will always be a necessary part of any policy device.

5.4.2. Rest-of-Path Congestion Monitoring

   ConEx signals indicate the level of congestion along a whole path
   from source to destination. In contrast, ECN signals monitored in
   the middle of a network indicate the level of congestion experienced
   so far on the path (of course, only in ECN-capable traffic).

   If a monitor in the middle of a network (e.g. at a network border)
   measures both of these signals, it can subtract the level of ECN
   (path so far) from the level of ConEx (whole path) to derive a
   measure of the congestion that packets are likely to experience
   between the monitoring point and their destination (rest-of-path
   congestion).

   It will often be preferable for policy devices to monitor
   rest-of-path congestion if they can, because it is a measure of the
   downstream congestion that the policy device can directly influence
   by controlling the traffic passing through it.

5.4.3. Congestion Policers

   A congestion policer can be implemented in a very similar way to a
   bit-rate policer, but its effect can be focused solely on traffic of
   users causing congestion downstream, which ConEx signals make
   visible. Without ConEx signals, the only way to mitigate congestion
   is to blindly limit traffic bit-rate, on the assumption that high
   bit-rate is more likely to cause congestion.

   A congestion policer monitors all ConEx traffic entering a network,
   or some identifiable subset.
   Using ConEx signals and/or Credit
   signals (and preferably subtracting ECN signals to yield rest-of-path
   congestion), it measures the amount of congestion that this traffic
   is contributing somewhere downstream. If this persistently exceeds a
   policy-configured 'congestion-bit-rate' the congestion policer can
   limit all the monitored ConEx traffic.

   A congestion policer can be implemented by a simple token bucket
   applied to an aggregate. But unlike a bit-rate policer, it removes
   tokens only when it forwards packets that are ConEx-Marked and/or
   Credit-Marked, effectively treating Not-ConEx-Marked packets as
   invisible. Consequently, because tokens give the right to send
   congested bits, the fill-rate of the token bucket will represent the
   allowed congestion-bit-rate. This should provide sufficient traffic
   management without having to additionally constrain the straight
   bit-rate at all. See [I-D.briscoe-conex-policing] for details.

   Note that the policing action could be to introduce a throttle
   (discard some traffic) immediately upstream of the congestion
   monitor. Alternatively, this throttle could introduce delay using a
   queue with its own AQM, which potentially increases the whole path
   congestion. In effect the congestion policer has moved the
   congestion earlier in the path, and focused it on one user to protect
   downstream resources by reducing the congestion in the rest of the
   path.

5.5. Audit

   The most critical aspect of ConEx is the capability to support robust
   auditing. It can be assumed that sanctions based on ConEx signals
   will create an intrinsic motivation for users to understate the
   congestion that they are causing. So, without strong audit
   functions, the ConEx signal would become understated to the point of
   being useless. Therefore the most important feature of an encoding
   design is likely to be the robustness of the auditing it supports.
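
   As a point of comparison for the audit discussion, the token-bucket
   congestion policer of Section 5.4.3 might be sketched as follows.
   This is a minimal sketch under stated assumptions, not the algorithm
   of [I-D.briscoe-conex-policing]: the class name, the use of bytes
   per second as the unit of the allowed congestion rate, and the
   choice to discard every monitored packet while the bucket is empty
   are all assumptions made for illustration.

```python
class CongestionPolicer:
    """Token bucket drained only by ConEx-marked or Credit-marked bytes.

    Hypothetical sketch: the fill rate plays the role of the allowed
    'congestion-bit-rate' from the text, expressed here in bytes/s.
    """

    def __init__(self, fill_rate_bytes_per_s, depth_bytes):
        self.fill_rate = float(fill_rate_bytes_per_s)
        self.depth = float(depth_bytes)
        self.tokens = float(depth_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous packet

    def _refill(self, now):
        elapsed = max(0.0, now - self.last)
        self.last = now
        self.tokens = min(self.depth,
                          self.tokens + elapsed * self.fill_rate)

    def forward(self, now, size_bytes, marked):
        """Return True to forward the packet, False to discard it.

        `marked` is True for ConEx-Marked and/or Credit-Marked packets.
        Unmarked packets never consume tokens, so the ordinary bit-rate
        is left unconstrained while the bucket is non-empty.
        """
        self._refill(now)
        if self.tokens <= 0:
            # Out of contract: limit all the monitored traffic.
            return False
        if marked:
            self.tokens -= size_bytes
        return True
```

   For example, with a fill rate of 1000 bytes/s and a depth of 3000
   bytes, any number of unmarked 1500B packets is forwarded without
   cost, two marked 1500B packets empty the bucket, and forwarding
   resumes once the bucket has refilled.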

   The general goal of an auditor is to make sure that any ConEx-enabled
   traffic is sent with sufficient ConEx-Re-Echo and ConEx-Credit
   signals. A concrete definition of the ConEx protocol MUST define
   what 'sufficient' means.

   If a ConEx-enabled transport does not carry sufficient ConEx signals,
   then an auditor is likely to apply some sanction to that traffic.
   Although sanctions are beyond the scope of this document, an example
   sanction might be to throttle the traffic immediately upstream of the
   auditor to prevent the user from getting any advantage by
   understating congestion. Such a throttle would likely include some
   combination of delaying or dropping traffic.

   A ConEx auditor might use one of the following techniques:

   Generic loss auditing: For congestion signaled by loss, totally
      accurate auditing is not believed to be possible in the general
      case, because it involves a network node detecting the absence of
      some packets, when it cannot always identify
      retransmissions or missing packets. The missing packet might
      simply be taking a different route, or the IP payload may be
      encrypted.

      It is for this reason that it is desirable to motivate the
      deployment of ECN, even though ECN is not strictly required for
      ConEx.

   ECN auditing: Directly observe and compare the volume of ECN and
      ConEx marks. Since the volume of ECN marks rises monotonically
      along a path, ECN auditing is most accurate when located near the
      transport receiver. For this reason ECN should be monitored
      downstream of the predominant bottleneck.

   TCP-specific loss auditing: For non-encrypted standard TCP traffic
      on a single path, a tactical audit approach could be to measure
      losses by detecting retransmissions, which appear as duplicate
      sequence numbers upstream of the loss and out of order data
      downstream of the loss.
      Since some reordering is present in the
      Internet, such a loss estimator would be most accurate near the
      sender. Such an audit device should treat non-ECN-capable packets
      with encrypted IP payload as Not-ConEx, even if they claim to be
      ConEx-capable, unless the operator is also using one of the other
      two techniques below that can audit such packets against losses.

   Predominant bottleneck loss auditing: For networks designed so that
      losses predominantly occur under the control of one IP-aware
      bottleneck node on the path, the auditor could be located at this
      bottleneck. It could simply compare ConEx Signals with actual
      local packet discards (and ECN marks). This is a good model for
      most consumer access networks where audit accuracy could well be
      sufficient even if losses occasionally occur elsewhere in the
      network.

      Although the auditor at the predominant bottleneck would not be
      able to count losses at other nodes, transports would not know
      where losses were occurring either. Therefore a transport would
      not know which losses it could cheat and which ones it couldn't
      without getting caught.

   ECN tunnel loss auditing: A network operator can arrange IP-in-IP
      tunnels (or IP-in-MPLS etc.) so that any losses within the tunnels
      are deferred until the tunnel egress. Then the audit function can
      be deployed at the egress and be aware of all losses. This is
      possible by enabling ECN marking on switches and routers within a
      tunnel, irrespective of whether end-systems support ECN, by
      exploiting a side-effect of the way tunnels handle the ECN field.
      After encapsulation at the tunnel ingress, the network should
      arrange for any non-ECN packets (with '00' in ECN field of the
      outer) to be set to the ECN-capable transport (ECT(0)) codepoint.
      Then, if they experience congestion at one of the ECN-capable
      switches or routers within the tunnel, some will be ECN-marked
      rather than immediately dropped. However, when the tunnel
      decapsulator strips the outer header from such an ECN-marked
      packet, if it finds the inner header has '00' in the ECN field
      (meaning that the endpoints do not support ECN) it will
      automatically drop the packet, assuming it complies with
      [RFC6040]. Thus, an audit function at the decapsulator can know
      which packets would have been dropped within the tunnel (and even
      which are genuinely ECN-marked for the end-to-end protocol).
      Non-ECN end-systems outside the tunnel see no sign of the use of
      ECN internally.

   In addition, other audit techniques may be identified in the future.

   [Refb-dis] gives a comprehensive inventory of attacks against audit
   proposed by various people. It includes pseudocode for both
   deterministic and statistical audit functions designed to thwart
   these attacks and analyses the effectiveness of an implementation.
   Although this work is specific to the re-ECN protocol, most of the
   material is useful for designing and assessing audit of other
   specific ConEx encodings, against both ECN and loss.

   The auditing function should be able to trigger sufficient sanction
   to discourage understating congestion [Salvatori05]. This seems to
   require designing the sanction in concert with the policy functions,
   even though they might be implemented in different parts of the
   network. However, [Refb-dis] proves audit and policy functions can
   be independent as long as audit drops sufficient traffic to
   'normalise' actual congestion signals to be no greater than ConEx
   signals.

   Similarly, the job of incentivising the sending of ConEx-enabled
   packets belongs solely to policy devices, independent of the audit
   function.
   The audit function's job is policy-neutral, so it should
   be solely confined to checking for correctness within those packets
   that have been marked as ConEx-capable. Even if there are Not-ConEx
   packets mixed with ConEx packets within a flow, audit will not need
   to monitor any Not-ConEx packets.

   Note that in the future it might prove to be desirable to provide
   advice on uniformly implementing sanctions, because otherwise
   insufficient sanctions could impair the ability to implement policy
   elsewhere in the network.

   Some of the audit algorithms require per-flow state. This cost is
   expected to be tolerable, because these techniques are most
   appropriate near the edges of the network, where traffic is generally
   much less aggregated, so the state need not overwhelm any one device.
   The flow-state required for audit creates itself as it detects new
   flows. Therefore a flow will not fail if it is re-routed away from
   the audit box currently holding its flow-state, so auditing does not
   require route pinning and works fine with multipath flows.

   Holding flow-state seems to create a vulnerability to attacks that
   exhaust the auditor's memory by opening numerous new short flows.
   The audit function can protect itself from this attack by not
   allocating new flow-state unless a ConEx-marked packet arrives (e.g.
   credit at the start of a flow). Because policy devices rate limit
   ConEx-marked packets, this sets a natural limit to the rate at which
   a source can create flow-state in audit devices. The auditor would
   treat all the remaining flows without any ConEx-marked packets as a
   single misbehaving aggregate.

   Auditing can be distributed and redundant. One flow may be audited
   in multiple places, using multiple techniques. Some audit techniques
   do not require any per-flow state and can be applied to aggregate
   traffic.
   These might be able to detect the presence of understated
   congestion at large scale and support recursively hunting for
   individual flows that are understating their congestion. Even at
   large scales, flows can be randomly selected for individual auditing.

   Sampling techniques can also be used to bound the total auditing
   memory footprint, although the implementer needs to counter the
   tactic where a source cheats until caught by sampling, then simply
   discards that flow ID and starts cheating with a new one (termed
   'identifier white-washing when caught').

   For the concrete ConEx protocol encoding defined in
   [I-D.ietf-conex-destopt], ConEx Credit and ConEx-Re-Echo signals are
   intended to be audited separately. The Credit signal can be audited
   directly against actual congestion (loss and ECN). However, there
   will be an inherent delay of at least one round trip between a
   congestion signal and the subsequent ConEx-Re-Echo signal it
   triggers, as shown in Figure 1. Therefore ConEx-Re-Echo signals will
   need to be audited with some allowance for this delay. Further
   discussion of design and implementation choices for functions
   intended to audit this concrete ConEx encoding can be found in
   [I-D.wagner-conex-audit].

6. Support for Incremental Deployment

   The ConEx abstract protocol described so far is intended to support
   incremental deployment in every possible respect. For convenience,
   the following list collects together all the features that support
   incremental deployment in the concrete ConEx specifications, and
   points to further information on each:

   Packets: The wire protocol encoding allows each packet to indicate
      whether it is using ConEx or not (see Section 4 on Encoding
      Congestion Exposure).

   Senders: ConEx requires a modification to the source in order to
      send ConEx packet markings (see Section 5.2).
      Although ConEx
      support can be indicated on a packet-by-packet basis, it is likely
      that all the packets in a flow will either consistently support
      ConEx or consistently not. It is also likely that, if the
      implementation of a transport protocol supports ConEx, all the
      packets sent from that host using that protocol will be
      ConEx-marked.

      The implementations of some of the transport protocols on a host
      might not support ConEx (e.g. the implementation of DNS over UDP
      might not support ConEx, while perhaps RTP over UDP and TCP will).
      Any non-upgraded transports and non-upgraded hosts will simply
      continue to send regular Not-ConEx packets as always.

      A network operator can create incentives for senders to
      voluntarily reveal ConEx information (see the item on incremental
      deployment by 'Networks' below).

   Receivers: A ConEx source should be able to work with the regular
      receiver for the transport in question, without requiring any
      ConEx-specific modifications. This is true for modern transport
      protocols (RTCP, SCTP, etc.) and it is even true for TCP, as long
      as the receiver supports SACK, which is widely deployed anyway.
      However, it is not true for ECN feedback in TCP. The need for
      more precise ECN feedback in TCP is not exclusive to ConEx; for
      instance, Data Centre TCP (DCTCP [DCTCP]) uses precise feedback to
      good effect. Therefore, if a receiver offers precise feedback
      [I-D.ietf-tcpm-accecn-reqs], it will be best if ConEx uses it (see
      Section 5.3). Alternatively, without sufficiently precise
      congestion feedback from the receiver, the source may have to
      conservatively send extra ConEx markings in order to avoid
      understating congestion.

   Proxies: Although it was stated above that ConEx requires a
      modification to the source, ConEx signals could theoretically be
      introduced by a proxy for the source, as long as it can intercept
      feedback from the receiver. Similarly, more precise feedback
      could theoretically be provided by a proxy for the receiver rather
      than modifying the receiver itself.

   Forwarding: No modification to forwarding or queuing is needed for
      ConEx.

      However, once some ConEx is deployed, it is possible that a queue
      implementation could optionally take advantage of the ConEx
      information in packets. For instance, it has been suggested
      [I-D.ietf-conex-destopt] that a queue would be more robust against
      flooding if it preferentially discarded Not-ConEx packets, then
      Not-Marked ConEx packets.

      A ConEx sender re-echoes congestion whether the queues signaling
      congestion are ECN-enabled or not. Nonetheless, an operator
      relying on ConEx signals is recommended to enable ECN in queues
      wherever possible. This is because auditing works best if most
      congestion is indicated by ECN rather than loss (see Section 3).
      Also, monitoring rest-of-path congestion is not accurate if there
      are congested non-ECN queues upstream of the monitoring point
      (Section 5.4.2).

   Networks: If a subset of traffic sources (or proxies) use ConEx
      signals to reveal congestion in the internetwork layer, a network
      operator can choose (or not) to use this information for traffic
      management. As long as the end-to-end ConEx signals are present,
      each network can unilaterally choose to use them, independently of
      whether other networks do.

      ConEx-marked packets may safely traverse a network that ignores
      them. ConEx signals are defined to remain unchanged once set by
      the sender, but some encodings may allow changes in transit (e.g.
      by proxies).
      In no circumstances will a network node change
      ConEx-marked packets to Not-ConEx (network layer encoding
      requirement I in Section 3.3). If necessary, endpoints should be
      able to detect if a network is removing ConEx signals (network
      layer encoding requirement H in Section 3.3).

      An operator can deploy policy devices (Section 5.4) wherever
      traffic enters its network, in order to monitor the downstream
      congestion that incoming traffic contributes to, and control it if
      necessary. A network operator can create incentives for the
      developers of sending applications and transports to voluntarily
      reveal ConEx information. Without ConEx information, a network
      operator tends to have to limit the bit-rate or volume from a site
      more than is necessary, just in case it might congest others.
      With ConEx information, the operator can solely limit
      congestion-causing traffic, and otherwise allow complete freedom.
      This greater freedom acts as an inducement for the source to
      volunteer ConEx information. An operator may also monitor whether
      a source transport has sent ConEx packets, and treat the same
      transport with greater suspicion (e.g. a more stringent
      rate-limit) whenever it selectively sends packets without ConEx
      support. See [RFC6789] for further discussion of deployment
      incentives for networks and references to scenarios where some
      networks use ConEx-based policy devices and others don't.

      An operator can deploy audit devices (Section 5.5) unilaterally
      within its own network to verify that traffic sources are not
      understating ConEx information. One network operator (say N_a)
      only cares that the level of ConEx signaling is sufficient to
      cover congestion in its own network.
      If traffic continues into a congested downstream network (say
      N_b), it is of no concern to the first network (N_a) if the
      end-to-end ConEx signaling is insufficient to cover the congestion
      in N_b as well. This is N_b's concern, and N_b can both detect
      such anomalous traffic and deal with it using ConEx-based audit
      devices itself.

7. IANA Considerations

   This memo includes no request to IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

8. Security Considerations

   The only known risk associated with ConEx is that users and
   applications are very likely to be motivated to under-represent the
   congestion that they are causing. Significant portions of this
   document are about mechanisms to audit the ConEx signals and create
   sufficient sanction to inhibit such under-representation. In
   particular see Section 5.5.

   Security attacks and their defences are best discussed against a
   concrete protocol specification, not the abstract mechanism of this
   document. A concrete ConEx protocol will need to be accompanied by a
   document describing how the protocol and its audit mechanisms defend
   against likely attacks. [Refb-dis] will be a useful source for such
   a document. It gives a comprehensive inventory of attacks against
   audit that have been proposed by various parties. It includes
   pseudocode for both deterministic and statistical audit functions
   designed to thwart these attacks and analyses the effectiveness of an
   implementation.

   However, [Refb-dis] is specific to the re-ECN protocol, which
   signaled ECN and loss together, whereas the concrete ConEx protocol
   defined in [I-D.ietf-conex-destopt] signals them separately.
   Therefore, although likely attacks will be similar, there will be
   more combinations of attacks to worry about, and defences and their
   analysis are likely to be a little different for ConEx.

   The main known attacks that a security document for a concrete ConEx
   protocol will need to address are listed below, and [Refb-dis] should
   be referred to for how re-ECN was designed to defend against similar
   attacks:
   o  Attacks on the audit function (see Section 7.5 of [Refb-dis]):
      Flow ID Whitewashing: Designing the audit function so that a
         source cannot gain from starting a new flow once audit has
         detected cheating in a previous flow.
      Dragging Down an Aggregate: Avoiding audit discarding packets
         from all flows within an aggregate, which would allow one flow
         to pull down the average so that the audit function would
         discard packets from all flows, not just the offending flow.
      Dragging Down a Spoofed Flow ID: An attacker understates ConEx
         markings in packets that spoof another flow, which fools the
         audit function into dropping the genuine user's packets.
   o  Attacks by networks on other networks (see Section 8.2 of
      [Refb-dis]):
      Dummy Traffic: Sending dummy traffic across a border with
         understated ConEx markings to bring down the average ConEx
         markings in the aggregate of border traffic. This attack can
         be combined with a TTL that expires before the packets reach an
         audit function.
      Signal Poisoning with 'Cancelled' Marking: Sending high volumes
         of valid packets that are both ConEx-Marked and ECN-Marked,
         which seems to represent congestion upstream, but it makes
         these packets immune to being further ECN-Marked downstream.

   It is planned to document all known attacks and their defences
   (including all the above) in the RFC series against a concrete ConEx
   protocol specification.
In the interim [Refb-dis] and its references 1176 should be referred to for details and ways to address these attacks 1177 in the case of re-ECN. 1179 9. Acknowledgements 1181 This document was improved by review comments from Toby Moncaster, 1182 Nandita Dukkipati, Mirja Kuehlewind, Caitlin Bestler, Marcelo Bagnulo 1183 Braun, John Leslie, Ingemar Johansson and David Wagner. 1185 Bob Briscoe's work on this specification received part-funding from 1186 the European Union's Seventh Framework Programme FP7/2007-2013 under 1187 Trilogy 2 project, grant agreement no. 317756. The views expressed 1188 here are solely those of the author. 1190 10. Comments Solicited 1192 Comments and questions are encouraged and very welcome. They can be 1193 addressed to the IETF Congestion Exposure (ConEx) working group 1194 mailing list , and/or to the authors. 1196 11. References 1198 11.1. Normative References 1200 [RFC2119] Bradner, S., "Key words for use in 1201 RFCs to Indicate Requirement 1202 Levels", BCP 14, RFC 2119, 1203 March 1997. 1205 11.2. Informative References 1207 [CheapPseud] Friedman, E. and P. Resnick, "The 1208 Social Cost of Cheap Pseudonyms", 1209 Journal of Economics and Management 1210 Strategy 10(2)173--199, 1998. 1212 [DCTCP] Alizadeh, M., Greenberg, A., Maltz, 1213 D., Padhye, J., Patel, P., 1214 Prabhakar, B., Sengupta, S., and M. 1215 Sridharan, "Data Center TCP 1216 (DCTCP)", ACM SIGCOMM 1217 CCR 40(4)63--74, October 2010, . 1221 [Evol_cc] Gibbens, R. and F. Kelly, "Resource 1222 pricing and the evolution of 1223 congestion control", 1224 Automatica 35(12)1969--1985, 1225 December 1999, . 1229 [FairerFaster] Briscoe, B., "A Fairer, Faster 1230 Internet Protocol", IEEE 1231 Spectrum Dec 2008:38--43, 1232 December 2008, . 1236 [I-D.briscoe-conex-policing] Briscoe, B., "Network Performance 1237 Isolation using Congestion 1238 Policing", 1239 draft-briscoe-conex-policing-00 1240 (work in progress), February 2013. 
1242 [I-D.briscoe-conex-re-ecn-motiv] Briscoe, B., Jacquet, A., 1243 Moncaster, T., and A. Smith, "Re- 1244 ECN: A Framework for adding 1245 Congestion Accountability to 1246 TCP/IP", 1247 draft-briscoe-conex-re-ecn-motiv-02 1248 (work in progress), July 2013. 1250 [I-D.briscoe-conex-re-ecn-tcp] Briscoe, B., Jacquet, A., 1251 Moncaster, T., and A. Smith, "Re- 1252 ECN: Adding Accountability for 1253 Causing Congestion to TCP/IP", 1254 draft-briscoe-conex-re-ecn-tcp-02 1255 (work in progress), July 2013. 1257 [I-D.ietf-conex-destopt] Krishnan, S., Kuehlewind, M., and 1258 C. Ucendo, "IPv6 Destination Option 1259 for ConEx", 1260 draft-ietf-conex-destopt-05 (work 1261 in progress), October 2013. 1263 [I-D.ietf-tcp-modifications] Kuehlewind, M. and R. 1264 Scheffenegger, "TCP modifications 1265 for Congestion Exposure", draft- 1266 ietf-conex-tcp-modifications-04 1267 (work in progress), July 2013. 1269 [I-D.ietf-tcpm-accecn-reqs] Kuehlewind, M. and R. 1270 Scheffenegger, "Problem Statement 1271 and Requirements for a More 1272 Accurate ECN Feedback", 1273 draft-ietf-tcpm-accecn-reqs-04 1274 (work in progress), October 2013. 1276 [I-D.wagner-conex-audit] Wagner, D. and M. Kuehlewind, 1277 "Auditing of Congestion Exposure 1278 (ConEx) signals", 1279 draft-wagner-conex-audit-01 (work 1280 in progress), February 2014. 1282 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., 1283 and A. Romanow, "TCP Selective 1284 Acknowledgment Options", RFC 2018, 1285 October 1996. 1287 [RFC3168] Ramakrishnan, K., Floyd, S., and D. 1288 Black, "The Addition of Explicit 1289 Congestion Notification (ECN) to 1290 IP", RFC 3168, September 2001. 1292 [RFC3514] Bellovin, S., "The Security Flag in 1293 the IPv4 Header", RFC 3514, April 1 1294 2003. 1296 [RFC3550] Schulzrinne, H., Casner, S., 1297 Frederick, R., and V. Jacobson, 1298 "RTP: A Transport Protocol for 1299 Real-Time Applications", STD 64, 1300 RFC 3550, July 2003. 1302 [RFC5348] Floyd, S., Handley, M., Padhye, J., 1303 and J. 
Widmer, "TCP Friendly Rate 1304 Control (TFRC): Protocol 1305 Specification", RFC 5348, 1306 September 2008. 1308 [RFC5681] Allman, M., Paxson, V., and E. 1309 Blanton, "TCP Congestion Control", 1310 RFC 5681, September 2009. 1312 [RFC6040] Briscoe, B., "Tunnelling of 1313 Explicit Congestion Notification", 1314 RFC 6040, November 2010. 1316 [RFC6679] Westerlund, M., Johansson, I., 1317 Perkins, C., O'Hanlon, P., and K. 1318 Carlberg, "Explicit Congestion 1319 Notification (ECN) for RTP over 1320 UDP", RFC 6679, August 2012. 1322 [RFC6789] Briscoe, B., Woundy, R., and A. 1323 Cooper, "Congestion Exposure 1324 (ConEx) Concepts and Use Cases", 1325 RFC 6789, December 2012. 1327 [RFC6817] Shalunov, S., Hazel, G., Iyengar, 1328 J., and M. Kuehlewind, "Low Extra 1329 Delay Background Transport 1330 (LEDBAT)", RFC 6817, December 2012. 1332 [RFC7141] Briscoe, B. and J. Manner, "Byte 1333 and Packet Congestion 1334 Notification", BCP 41, RFC 7141, 1335 February 2014. 1337 [Re-fb] Briscoe, B., Jacquet, A., Di 1338 Cairano-Gilfedder, C., Salvatori, 1339 A., Soppera, A., and M. Koyabe, 1340 "Policing Congestion Response in an 1341 Internetwork Using Re-Feedback", 1342 ACM SIGCOMM CCR 35(4)277--288, 1343 August 2005, . 1347 [Refb-dis] Briscoe, B., "Re-feedback: Freedom 1348 with Accountability for Causing 1349 Congestion in a Connectionless 1350 Internetwork", UCL PhD 1351 Dissertation , 2009, 1352 . 1355 [Salvatori05] Salvatori, A., "Closed Loop Traffic 1356 Policing", Politecnico Torino and 1357 Institut Eurecom Masters Thesis , 1358 September 2005. 1360 Authors' Addresses 1362 Matt Mathis 1363 Google, Inc 1364 1600 Amphitheater Parkway 1365 Mountain View, California 93117 1366 USA 1368 EMail: mattmathis at google.com 1370 Bob Briscoe 1371 BT 1372 B54/77, Adastral Park 1373 Martlesham Heath 1374 Ipswich IP5 3RE 1375 UK 1377 Phone: +44 1473 645196 1378 EMail: bob.briscoe@bt.com 1379 URI: http://bobbriscoe.net/