Congestion Exposure (ConEx) Working                            M. Mathis
Group                                                        Google, Inc
Internet-Draft                                                B. Briscoe
Intended status: Informational                                        BT
Expires: September 14, 2014                               March 13, 2014

      Congestion Exposure (ConEx) Concepts and Abstract Mechanism
                   draft-ietf-conex-abstract-mech-11

Abstract

   This document describes an abstract mechanism by which senders inform
   the network about the congestion encountered by packets earlier in
   the same flow. Today, network elements at any layer may signal
   congestion to the receiver by dropping packets or by ECN markings,
   and the receiver passes this information back to the sender in
   transport-layer feedback. The mechanism described here enables the
   sender to also relay this congestion information back into the
   network in-band at the IP layer, such that the total amount of
   congestion from all elements on the path is revealed to all IP
   elements along the path, where it could, for example, be used to
   provide input to traffic management. This mechanism is called
   congestion exposure or ConEx. The companion document "ConEx Concepts
   and Use Cases" provides the entry-point to the set of ConEx
   documentation.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Overview
     2.1. Terminology
   3. Requirements for the ConEx Abstract Mechanism
     3.1. Requirements for ConEx Signals
     3.2. Requirements for the Audit Function
     3.3. Requirements for non-abstract ConEx specifications
   4. Encoding Congestion Exposure
     4.1. Naive Encoding
     4.2. Null Encoding
     4.3. ECN Based Encoding
     4.4. Independent Bits
     4.5. Codepoint Encoding
     4.6. Units Implied by an Encoding
   5. Congestion Exposure Components
     5.1. Network Devices (Not modified)
     5.2. Modified Senders
     5.3. Receivers (Optionally Modified)
     5.4. Policy Devices
       5.4.1. Congestion Monitoring Devices
       5.4.2. Rest-of-Path Congestion Monitoring
       5.4.3. Congestion Policers
     5.5. Audit
   6. Support for Incremental Deployment
   7. IANA Considerations
   8. Security Considerations
   9. Acknowledgements
   10. Comments Solicited
   11. References
     11.1. Normative References
     11.2. Informative References

1. Introduction

   This document describes an abstract mechanism by which, to a first
   approximation, senders inform the network about the congestion
   encountered by packets earlier in the same flow. It is not a
   complete protocol specification, because it is known that designing
   an encoding (e.g. packet formats, codepoint allocations, etc) is
   likely to entail compromises that preclude some uses of the protocol.
   The goal of this document is to provide a framework for developing
   and testing algorithms to evaluate the benefits of the ConEx protocol
   and to evaluate the consequences of the compromises in various
   different encoding designs.

   A companion document [RFC6789] provides the entry point to the set of
   ConEx documentation. It outlines concepts that are pre-requisites to
   understanding why ConEx is useful, and it outlines various ways that
   ConEx might be used.

2. Overview

   As typical end-to-end transport protocols continually seek out more
   network capacity, network elements signal whenever congestion
   results, and the transports are responsible for controlling this
   network congestion [RFC5681].
   The more a transport tries to use
   capacity that others want to use, the more congestion signals will be
   attributable to that transport. Likewise, the more transport
   sessions sustained by a user and the longer the user sustains them,
   the more congestion signals will be attributable to that user. The
   goal of ConEx is to ensure that the resulting congestion signals are
   sufficiently visible and robust, because they are an ideal metric for
   networks to use as the basis of traffic management or other related
   functions.

   Networks indicate congestion by three possible signals: packet loss,
   ECN marking or queueing delay. ECN marking and some packet loss may
   be the outcome of Active Queue Management (AQM), which the network
   uses to warn senders to reduce their rates. Packet loss is also the
   natural consequence of complete exhaustion of a buffer or other
   network resource. Some experimental transport protocols and TCP
   variants infer impending congestion from increasing queuing delay.
   However, delay is too amorphous to use as a congestion metric. In
   this and other ConEx documents, the term 'congestion signals' is
   generally used solely for ECN markings and packet losses, because
   they are unambiguous signals of congestion.

   In both cases the congestion signals follow the route indicated in
   Figure 1. A congested network device sends a signal in the data
   stream on the forward path to the transport receiver, the receiver
   passes it back to the sender through transport level feedback, and
   the sender makes some congestion control adjustment.

   This document extends the capabilities of the Internet protocol suite
   with the addition of a new Congestion Exposure signal.
   To a first approximation this signal, also shown in Figure 1, relays
   the congestion information from the transport sender back through the
   internetwork layer where it is visible to any interested internetwork
   layer devices along the forward path. This document frames the
   engineering problem of designing the ConEx signal. The requirements
   are described in Section 3 and some example encodings are presented
   in Section 4. Section 5 describes all of the protocol components.

   This new signal is expressly designed to support a variety of new
   policy mechanisms that might be used to instrument, monitor or manage
   traffic. The policy devices are not shown in Figure 1 but might be
   placed anywhere along the forward data path (see Section 5.4).

   ,---------.                                               ,---------.
   |Transport|                                               |Transport|
   | Sender  |   .                                           |Receiver |
   |         |  /|___________________________________________|         |
   |     ,-<---------------Congestion-Feedback-Signals--<--------.     |
   |     |   |/                                              |   |     |
   |     |   |\           Transport Layer Feedback Flow      |   |     |
   |     |   | \  ___________________________________________|   |     |
   |     |   |  \|                                           |   |     |
   |     |   |   '         ,-----------.               .     |   |     |
   |     |   |_____________|           |_______________|\    |   |     |
   |     |   |    IP Layer |           |  Data Flow      \   |   |     |
   |     |   |             |(Congested)|                  \  |   |     |
   |     |   |             |  Network  |--Congestion-Signals--->-'     |
   |     |   |             |  Device   |                    \|         |
   |     |   |             |           |                    /|         |
   |     `----------->--(new)-IP-Layer-ConEx-Signals-------->|         |
   |         |             |           |                  /  |         |
   |         |_____________|           |_______________  /   |         |
   |         |             |           |               |/    |         |
   `---------'             `-----------'               '     `---------'

           Figure 1: The Flow of Congestion and ConEx Signals

   Since the policy devices can affect how traffic is treated it is
   assumed that there is an intrinsic motivation for users, applications
   or operating systems to understate the congestion that they are
   causing.
   Therefore, it is important to be able to audit ConEx
   signals, and to be able to apply sufficient sanction to discourage
   cheating of congestion policies. The general approach to auditing is
   to count signals on the forward path to confirm that there are never
   fewer ConEx signals than congestion signals. Many ConEx design
   constraints come from the need to assure that the audit function is
   sufficiently robust. The audit function is described in Section 5.5;
   however, significant portions of this document (and prior research
   [Refb-dis]) are motivated by issues relating to the audit function
   and making it robust.

   The congestion and ConEx signals shown in Figure 1 represent a series
   of discrete events: ECN marks or lost packets, carried by the forward
   data stream and fed back into the Internetwork layer. The policy and
   audit functions are most likely to act on the accumulated values of
   these signals, for which we use the term "volume". For example,
   traffic volume is the total number of bytes delivered, optionally
   over a specified time interval and over some aggregate of traffic
   (e.g. all traffic from a site), while loss-volume is the total
   number of bytes discarded from some aggregate over an interval. The
   term congestion-volume is defined precisely in [RFC6789]. Note that
   volume per unit time is (average) rate.

   A design goal of the ConEx protocol is that the important policy
   mechanisms can be implemented per logical link without per flow state
   (see Section 5.4). However, the price to pay can be flow state to
   audit ConEx signals (Section 5.5). This is justified in that i)
   auditing at the edges, with limited per flow state, enables policy
   elsewhere, including in the core, without any per flow state; ii)
   auditing can use soft flow state, which does not require route
   pinning.
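
   The counting approach to audit and the notion of "volume" described
   in this section can be illustrated with a short sketch. The Python
   fragment below is purely illustrative and is not part of any ConEx
   specification: the class name, the field names and the choice of
   bytes as the unit are assumptions made for the example, and it
   presumes the observer can already tell which packets carried a
   congestion signal (detecting losses in practice needs extra state).

```python
from dataclasses import dataclass


@dataclass
class VolumeCounter:
    """Hypothetical per-aggregate counters, all in bytes (an assumption)."""

    traffic_volume: int = 0     # bytes in all packets observed
    congestion_volume: int = 0  # bytes in packets seen as ECN-marked or lost
    conex_volume: int = 0       # bytes in ConEx-Marked packets

    def observe(self, size: int, congested: bool = False,
                conex_marked: bool = False) -> None:
        """Account for one packet of `size` bytes."""
        self.traffic_volume += size
        if congested:
            self.congestion_volume += size
        if conex_marked:
            self.conex_volume += size

    def understated(self) -> bool:
        """True if ConEx signals fall short of the congestion signals."""
        return self.conex_volume < self.congestion_volume


counter = VolumeCounter()
counter.observe(1500)                     # ordinary packet
counter.observe(1500, congested=True)     # congestion signal observed
counter.observe(1500, conex_marked=True)  # sender re-echoes it later
print(counter.understated())              # prints False
```

   A real audit function would also have to allow for the round trip
   delay between a congestion signal and its re-echo, which this sketch
   ignores.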

   There is a long standing argument over units of congestion: bytes vs
   packets (see [RFC7141] and its references). Section 4.6 explains why
   this problem must be addressed carefully. However, this document
   does not take a strong position on this issue. Nonetheless, it does
   require that the units of congestion must be an explicitly stated
   property of any proposed encoding, and the consequences of that
   design decision must be evaluated along with other aspects of the
   design.

   To be successful the ConEx protocol must have the property that the
   relevant stakeholders each have the incentive to unilaterally start
   on each stage of partial deployment, which in turn creates incentives
   for further deployment. Furthermore, legacy systems that will never
   be upgraded must not become a barrier to deploying ConEx. Issues
   relating to partial deployment are described in Section 6.

   Note that ConEx signals are not intended to be used for fine-grained
   congestion control. They are anticipated to be most useful at longer
   time scales and/or at coarser granularity than single microflows.

   For example, the total congestion caused by a user might serve as an
   input to higher level policy or accountability functions, designed to
   create incentives for improving user behavior, such as choosing to
   send large quantities of data at off-peak times, at lower data rates
   or with less aggressive protocols such as LEDBAT [RFC6817] (see
   [RFC6789]).

   Ultimately ConEx signals have the potential to provide a mechanism to
   regulate global Internet congestion. From the earliest days of
   congestion control research there has been a concern that there is no
   mechanism to prevent transport designers from incrementally making
   protocols more aggressive without bound and spiraling to a "tragedy
   of the commons" Internet congestion collapse.
   The "TCP friendly"
   paradigm was created in part to forestall this failure. However, it
   no longer commands any authority because it has little to say about
   the Internet of today, which has moved beyond the scaling range of
   standard TCP. As a consequence, many transports and applications are
   opening arbitrarily large numbers of connections or using arbitrary
   levels of aggressiveness. ConEx represents a recognition that the
   IETF cannot regulate this space directly because it concerns the
   behaviour of users and applications, not individual transport
   protocols. Instead the IETF can give network operators the protocol
   tools to arbitrate the space themselves, with better bulk traffic
   management. This in turn should create incentives for users, and
   designers of applications and of transport protocols, to be more
   mindful about contributing to congestion.

2.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   ConEx signals in IP packet headers from the sender to the network:
   Not-ConEx: The transport (or at least this packet) is not
      ConEx-capable.
   ConEx-Capable: The transport is ConEx-capable. This is the opposite
      of Not-ConEx.
   ConEx Signal: A signal in a packet sent by a ConEx-capable
      transport. It carries at least one of the following signals:
      Re-Echo-Loss: The transport has experienced a loss.
      Re-Echo-ECN: The transport has detected an ECN congestion
         experienced (CE) mark.
      Credit: The transport is building up credit to signal advance
         notice of the risk of packets contributing to congestion, in
         contrast to signalling only after inherently delayed feedback
         of actual congestion.
   ConEx-Not-Marked: The transport is ConEx-capable but is signaling
      none of Re-Echo-Loss, Re-Echo-ECN or Credit.
   ConEx-Marked: At least one of Re-Echo-Loss, Re-Echo-ECN or Credit.
   ConEx-Re-Echo: At least one of Re-Echo-Loss or Re-Echo-ECN.

3. Requirements for the ConEx Abstract Mechanism

   First time readers may wish to skim this section, since it is more
   understandable having read the entire document.

3.1. Requirements for ConEx Signals

   Ideally, all the following requirements would be met by a Congestion
   Exposure Signal:
   a. The ConEx Signal SHOULD be visible to internetwork layer devices
      along the entire path from the transport sender to the transport
      receiver. Equivalently, it SHOULD be present in the IPv4 or IPv6
      header, and in the outermost IP header if using IP in IP
      tunneling. The ConEx Signal SHOULD be immutable once set by the
      transport sender. A corollary of these requirements is that the
      chosen ConEx encoding SHOULD pass silently without modification
      through pre-existing networking gear.
   b. The ConEx Signal SHOULD be useful under only partial deployment.
      A minimal deployment SHOULD only require changes to transport
      senders. Furthermore, partial deployment SHOULD create
      incentives for additional deployment, both in terms of enabling
      ConEx on more devices and adding richer features to existing
      devices. Nonetheless, ConEx deployment need never be universal,
      and it is anticipated that some hosts and some transports may
      never support the ConEx Protocol and some networks may never use
      the ConEx Signals.
   c. The ConEx signal SHOULD be timely. There will be a minimum delay
      of one RTT, and often longer if the transport protocol sends
      infrequent feedback (consider RTCP [RFC3550], [RFC6679] for
      example).
   d. The ConEx signal SHOULD be accurate and auditable.
      The general
      approach for auditing is to observe the volume of congestion
      signals and ConEx signals on the forward data path and verify
      that the ConEx signals do not under-represent the congestion
      signals (see Section 5.5). Furthermore, the ConEx signals for
      packet loss and ECN marking SHOULD have distinct encodings
      because they are likely to require different auditing techniques.
   e. Additionally there SHOULD be an auditable ConEx Credit signal. A
      sender can use Credit to indicate potential future congestion,
      for example as often seen during startup. ConEx Credit is
      intended to overestimate congestion actually experienced across
      the network.

   It is already known that implementing ConEx signals is likely to
   entail some compromises, and therefore all the requirements above are
   expressed with the keyword 'SHOULD' rather than 'MUST'. The only
   mandatory requirement is that a concrete protocol description MUST
   give sound reasoning if it chooses not to meet some requirement.

3.2. Requirements for the Audit Function

   The role of the audit function and the constraints on it are
   described in Section 5.5. There is no intention to standardise the
   audit function. However, it is necessary to lay down the following
   normative constraints on audit behaviour so that transport designers
   will know what to design against and implementers of audit devices
   will know what pitfalls to avoid:
   Minimal False Hits: Audit SHOULD introduce minimal false hits for
      honest flows;
   Minimal False Misses: Audit SHOULD quickly detect and sanction
      dishonest flows, ideally on the first dishonest packet;
   Transport Oblivious: Audit SHOULD NOT be designed around one
      particular rate response, such as any particular TCP congestion
      control algorithm or one particular resource sharing regime such
      as TCP-friendliness [RFC5348].
      An important goal is to give
      ingress networks the freedom to unilaterally allow different rate
      responses to congestion and different resource sharing regimes
      [Evol_cc], without having to coordinate with other networks over
      details of individual flow behaviour;
   Sufficient Sanction: Audit SHOULD introduce sufficient sanction
      (e.g. loss in goodput) such that senders cannot gain from
      understating congestion;
   Proportionate Sanction: To the extent that the audit might be
      subject to false hits, the sanction SHOULD be proportionate to the
      degree to which congestion is understated. If audit over-punishes,
      attackers will find ways to harness it into amplifying
      attacks on others. Ideally audit should, in the long-run, cause
      the user to get no better performance than they would get by being
      accurate;
   Manage Memory Exhaustion: Audit SHOULD be able to counter state
      exhaustion attacks. For instance, if the audit function uses
      flow-state, it should not be possible for senders to exhaust its
      memory capacity by gratuitously sending numerous packets, each
      with a different flow ID;
   Identifier Accountability: Audit SHOULD NOT be vulnerable to
      `identity whitewashing', where a transport can label a flow with a
      new ID more cheaply than paying the cost of continuing to use its
      current ID [CheapPseud].

3.3. Requirements for non-abstract ConEx specifications

   An experimental ConEx specification SHOULD describe the following
   protocol details:
   Network Layer:
      A. The specific ConEx signal encodings with packet formats, bit
         fields and/or code points;
      B. An inventory of invalid combinations of flags or invalid
         codepoints in the encoding, stating whether security gateways
         should normalise, discard or ignore such invalid encodings, and
         what values they should be considered equivalent to by
         ConEx-aware elements;
      C. An inventory of any conflated signals or any other effects
         that are known to compromise signal integrity;
      D. Whether the source is responsible for allowing for the round
         trip delay in ConEx signals (e.g. using a Credit marking), and
         if so whether Credit is maintained for the duration of a flow
         or degrades over time, and what defines the end of the
         duration of a flow;
      E. A specification for signal units (bytes vs packets, etc), any
         approximations allowed and algorithms to do any implied
         conversions or accounting;
      F. If the units are bytes, a definition of which headers are
         included in the size of the packet;
      G. How tunnels should propagate the ConEx encoding;
      H. Whether the encoding fields are mutable or not, to ensure that
         header authentication, checksum calculation, etc. process them
         correctly. A ConEx encoding field SHOULD be immutable
         end-to-end, so that end points can detect if it has been
         tampered with in transit;
      I. If a specific encoding allows mutability (e.g. at proxies), an
         inventory of invalid transitions between codepoints. In all
         encodings, transitions from any ConEx marking to Not-ConEx
         MUST be invalid;
      J. A statement that the ConEx encoding is only applicable to
         unicast and anycast, and that forwarding elements should
         silently ignore any ConEx signalling on multicast packets
         (they should be forwarded unchanged);
      K. Definition of any extensibility;
      L. Backward and forward compatibility and potential migration
         strategies. In all cases, a ConEx encoding MUST be arranged
         so that legacy transport senders implicitly send Not-ConEx;
      M. Any (optional) modification to data-plane forwarding dependent
         on the encoding (e.g. preferential discard, interaction with
         Diffserv, ECN etc.);
      N. Any warning or error messages relevant to the encoding.
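
   Network layer items B and I above can be made concrete with a small
   sketch. The Python fragment below is only an illustration under
   stated assumptions: it models the signals of Section 2.1 as
   independent flags (one of the representations discussed in
   Section 4.4), the names ConExFlags, is_valid and is_valid_transition
   are hypothetical, and it reads "any ConEx marking" in item I as "any
   ConEx-capable packet". A concrete encoding would supply its own
   inventory.

```python
from enum import Flag, auto


class ConExFlags(Flag):
    """Hypothetical flag representation of the Section 2.1 signals."""
    NONE = 0              # no flag set, i.e. Not-ConEx and unmarked
    CONEX = auto()        # ConEx-capable
    RE_ECHO_LOSS = auto()
    RE_ECHO_ECN = auto()
    CREDIT = auto()


MARKS = ConExFlags.RE_ECHO_LOSS | ConExFlags.RE_ECHO_ECN | ConExFlags.CREDIT


def is_valid(flags: ConExFlags) -> bool:
    """Item B: Not-ConEx combined with any ConEx mark is invalid."""
    return bool(flags & ConExFlags.CONEX) or not (flags & MARKS)


def is_valid_transition(before: ConExFlags, after: ConExFlags) -> bool:
    """Item I: a ConEx packet must never be rewritten to Not-ConEx."""
    was_conex = bool(before & ConExFlags.CONEX)
    is_conex = bool(after & ConExFlags.CONEX)
    return not (was_conex and not is_conex)


# Checking the combination as a whole, rather than each flag separately.
print(is_valid(ConExFlags.CONEX | ConExFlags.CREDIT))   # prints True
print(is_valid(ConExFlags.RE_ECHO_LOSS))                # prints False
print(is_valid_transition(ConExFlags.CONEX, ConExFlags.NONE))  # prints False
```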

      Note regarding item J on multicast: A multicast tree may involve
      different levels of congestion on each leg. Any traffic
      management can only monitor or control multicast congestion at or
      near each receiver. It would make no sense for the sender to try
      to expose "whole path congestion" in sent packets, because it
      cannot hope to describe all the differing congestion levels on
      every leg of the tree.
   Transport Layer:
      A. A specification of any required changes to congestion feedback
         in particular transport protocols.
      B. A specification (or minimally a recommendation) for how a
         transport should estimate credits at the beginning of a
         connection and while it is in progress.
      C. A specification of whether any other protocol options should
         (or must) be enabled along with an implementation of ConEx
         (e.g. at least attempting to negotiate ECN and SACK
         capability);
      D. A specification of any configuration that a ConEx stack may
         require (or preferably confirmation that it requires no
         configuration);
      E. A specification of the statistics that a protocol stack should
         log for each type of marking on a per-flow or aggregate basis.
   Security:
      A. An example of a strong audit algorithm suitable for detecting
         if a single flow is misstating congestion. This algorithm
         should present minimal false results, but need not have
         optimal scaling properties (e.g. may need per flow state).
      B. An example of an audit algorithm suitable for detecting
         misstated congestion in a large aggregate (e.g. no per-flow
         state).

   The possibility exists that these specifications over-constrain the
   ConEx design, and cannot be fully satisfied. An important part of
   the evaluation of any particular design will be a thorough inventory
   of all ways in which it might fail to satisfy these specifications.

4. Encoding Congestion Exposure

   Most protocol specifications start with a description of packet
   formats and codepoints with their associated meanings. This document
   does not: It is already known that choosing the encoding for ConEx is
   likely to entail some engineering compromises that have the potential
   to reduce the protocol's usefulness in some settings. For instance
   the experimental ConEx encoding chosen for IPv6
   [I-D.ietf-conex-destopt] had to make compromises on tunnelling.
   Rather than making these engineering choices prematurely, this
   document sidesteps the encoding problem by making it abstract. It
   describes several different representations of ConEx Signals, none of
   which are specified to the level of specific bits or code points.

   The goal of this approach is to be as complete as possible for
   discovering the potential usage and capabilities of the ConEx
   protocol, so we have some hope of making optimal design decisions
   when choosing the encoding. Even if experiments reveal particular
   problems due to the encoding, then this document will still serve as
   a reference model.

4.1. Naive Encoding

   For tutorial purposes, it is helpful to describe a naive encoding of
   the ConEx protocol for TCP and similar protocols: set a bit (not
   specified here) in the IP header on each retransmission and on each
   ECN signaled window reduction. Network devices along the forward
   path can see this bit and act on it. For example any device along
   the path might limit the rate of all traffic if the rate of marked
   (congested) packets exceeds a threshold.

   This simple encoding is sufficient to illustrate many of the benefits
   envisioned for ConEx. At first glance it looks like it might
   motivate people to deploy and use it.
   It is a one line code change
   that a small number of OS developers and content providers could
   unilaterally deploy across a significant fraction of all Internet
   traffic. However, this encoding does not support auditing so it
   would also motivate users and/or applications to misrepresent the
   congestion that they are causing [RFC3514]. As a consequence the
   naive encoding is not likely to be trusted and thus creates its own
   disincentives for deployment.

   Nonetheless, this Naive encoding does present a clear mental model of
   how the ConEx protocol might function under various uses. It is
   useful for thought experiments where it can be stipulated that all
   participants are honest and it does illustrate some of the incentives
   that might be introduced by ConEx.

4.2. Null Encoding

   In limited contexts it is possible to implement ConEx-like functions
   without any signals at all by measuring rest-of-path congestion
   directly from TCP headers. The algorithm is to keep at least one RTT
   of past TCP headers and match each new header against the history
   to count duplicate data.

   This could implement many ConEx policies, without any explicit
   protocol. It is fairly easy to implement, at least at low rate (e.g.
   in a software based edge router). However, it would only be useful
   in cases where the network operator can see the TCP headers. This is
   currently (2014) the majority of traffic because UDP, IPSec and VPN
   tunnels are used far less than SSL or TLS over TCP/IP, which do not
   hide TCP sequence numbers from network devices. However, anyone
   specifically intending to avoid the attention of a congestion policy
   device would only have to hide their TCP headers from the network
   operator (e.g. by using a VPN tunnel).

4.3. ECN Based Encoding

   The re-ECN specification [I-D.briscoe-conex-re-ecn-tcp] presents an
   encoding of ConEx in IPv4 and IPv6 that was tightly integrated with
   ECN encoding in order to fit into the IPv4 header. Any individual
   packet may need to represent any ECN codepoint and any ConEx signal
   value independently. So, ideally their encoding should be entirely
   independent. However, given the limited number of header bits and/or
   code points, re-ECN chooses to partially share code points and to
   re-echo both losses and ECN with just one codepoint.

   The central theme of the re-ECN work is an audit mechanism that
   provides sufficient disincentives against misrepresenting congestion
   [I-D.briscoe-conex-re-ecn-motiv]. It is analyzed extensively in
   Briscoe's PhD dissertation [Refb-dis]. For a tutorial background on
   re-ECN motivation and techniques, see [Re-fb, FairerFaster].

   Re-ECN is an example of one chosen set of compromises attempting to
   meet the requirements of Section 3. The present document takes a
   step back, aiming to state the ideal requirements in order to allow
   the Internet community to assess whether different compromises might
   be better.

   The problem with Re-ECN is that it requires that receivers be ECN
   enabled in addition to sender changes. Newer encodings
   [I-D.ietf-conex-destopt] overcome this problem by being able to
   represent loss and ECN based congestion separately.

4.4. Independent Bits

   This encoding involves flag bits, each of which the sender can set
   independently to indicate to the network one of the following four
   signals:
   ConEx (Not-ConEx) The transport is (or is not) using ConEx with this
      packet (the protocol must be arranged so that legacy transport
      senders implicitly send Not-ConEx; see network layer encoding
      requirement L in Section 3.3)
   Re-Echo-Loss (Not-Re-Echo-Loss) The transport has (or has not)
      experienced a loss
   Re-Echo-ECN (Not-Re-Echo-ECN) The transport has (or has not)
      experienced ECN-signaled congestion
   Credit (Not-Credit) The transport is (or is not) building up
      congestion credit (see Section 5.5 on the audit function)

   A packet with ConEx set combined with all three of the other flags
   cleared implies ConEx-Not-Marked.

   This encoding does not imply any exclusion property among the
   signals. Multiple types of congestion (ECN, loss) can be signalled
   on the same ACK. So, ideally, a ConEx sender would be able to
   reflect these in the next packet. However, there will be many
   invalid combinations of flags (e.g. Not-ConEx combined with any of
   the ConEx-marked flags), which a malicious sender could use to
   advantage against naive policy devices that only check each flag
   separately.

   As long as the packets in a flow have uniform sizes, it does not
   matter whether the units of congestion are packets or bytes.
   However, if an application sends very irregular packet sizes, it may
   be necessary for the sender to mark multiple packets to avoid being
   in technical violation of an audit function measuring in bytes (see
   Section 4.6).

4.5. Codepoint Encoding

   This encoding involves signaling one of the following five
   codepoints:

   ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN, Credit}

   Each named codepoint has the same meaning as in the encoding using
   independent bits in the previous section.
   The use of any one codepoint implies the negative of all the others.

   Inherently, the semantics of most of the enumerated codepoints are
   mutually exclusive. 'Credit' is the only one that might need to be
   used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
   that requirement is questionable. It must not be forgotten that the
   enumerated encoding loses the flexibility to signal these two
   combinations, whereas the encoding with four independent bits is not
   so limited. Alternatively two extra codepoints could be assigned to
   these two combinations of semantics. The comment in the previous
   section about units also applies.

4.6. Units Implied by an Encoding

   The following comments apply generally to all the other encodings.

   Congestion can be due to exhaustion of bit-carrying capacity, or
   exhaustion of packet processing power. When a packet is discarded or
   marked to indicate congestion, there is no easy way to know whether
   the lost or marked packet signifies bit-congestion or
   packet-congestion. The above ConEx encodings that rely on marking
   packets suffer from the same ambiguity.

   This problem is most acute when audit needs to check that one count
   of markings matches another. For example, if there are ConEx markings
   on three large (1500B) packets, is that sufficient to match the loss
   of 5 small (60B) packets? If a packet-marking is defined to mean all
   the bytes in the packet are marked, then we have 4500B of
   ConEx-marked data against 300B of lost data, which is easily
   sufficient. If instead we are counting packets, then we have 3 ConEx
   packets against 5 lost packets, which is not sufficient. This problem
   will not arise when all the packets in a flow are the same size, but
   a choice needs to be made for flows in which packet sizes vary, such
   as BGP, SPDY and some variable rate video encoding schemes.
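
   The arithmetic in the example above can be checked with a short
   sketch. It is illustrative only: the function name markings_cover()
   and the idea of passing packet sizes as lists are assumptions made
   for this example, not part of any ConEx encoding.

```python
def markings_cover(conex_sizes, lost_sizes, unit):
    """Return True if ConEx markings are sufficient to match losses.

    conex_sizes / lost_sizes are lists of packet sizes in bytes
    (a hypothetical representation chosen for this sketch).
    """
    if unit == "bytes":
        # Every byte of a marked (or lost) packet counts.
        return sum(conex_sizes) >= sum(lost_sizes)
    if unit == "packets":
        # Each packet counts once, whatever its size.
        return len(conex_sizes) >= len(lost_sizes)
    raise ValueError("unit must be 'bytes' or 'packets'")


conex_marked = [1500] * 3   # three large ConEx-marked packets
lost = [60] * 5             # five small lost packets

in_bytes = markings_cover(conex_marked, lost, "bytes")      # 4500B vs 300B
in_packets = markings_cover(conex_marked, lost, "packets")  # 3 vs 5
print(in_bytes, in_packets)
```

   The same traffic passes the check in one unit and fails it in the
   other, which is why an encoding has to state its unit explicitly.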

   Whether to use bytes or packets is not obvious. For instance, the
   most expensive links in the Internet, in terms of cost per bit, are
   all at lower data rates, where transmission times are large and
   packet sizes are important. In order for a policy to consider wire
   time, it needs to know the number of congested bytes. However, high
   speed networking equipment and the transport protocols themselves
   sometimes gauge resource consumption and congestion in terms of
   packets.

   This document does not take a strong position on this issue.
   However, a ConEx encoding will need to explicitly specify whether it
   assumes units of bytes or packets consistently for both congestion
   indications and ConEx markings (see network layer requirement E in
   Section 3.3). It may help to refer to the guidance in [RFC7141].

   [RFC7141] advises that congestion indications should be interpreted
   in units of bytes when responding to congestion, at least on today's
   Internet. In any TCP implementation this is simple to achieve for
   varying size packets, given TCP SACK tracks losses in bytes. If an
   encoding is specified in units of bytes, the encoding should also
   specify which headers to include in the size of a packet (see network
   layer requirement F in Section 3.3).

5. Congestion Exposure Components

   The components shown in Figure 1, as well as policy and audit, are
   described in more detail below.

5.1. Network Devices (Not modified)

   Congestion signals originate from network devices as they do today.
   A congested router, switch or other network device can discard or ECN
   mark packets when it is congested.

5.2. Modified Senders

   The sending transport needs to be modified to send Congestion
   Exposure Signals in response to congestion feedback signals (e.g.
   for the case of a TCP transport see [I-D.ietf-tcp-modifications]).
   We want to permit ConEx without ECN (e.g.
   if the receiver does not support ECN). However, we want to
   encourage a ConEx sender to at least attempt to negotiate ECN (a
   ConEx transport protocol spec may require this), because it is
   believed that ConEx without ECN is harder to audit, and thus
   potentially exposed to cheating. Since honest users have the
   potential to benefit from stronger mechanisms to manage traffic,
   they have an incentive to deploy ConEx and ECN together. This
   incentive is not sufficient to prevent a dishonest user from
   constructing (or configuring) a sender that enables ConEx after
   choosing not to negotiate ECN, but it should be sufficient to
   prevent this from being the sustained default case for any
   significant pool of users.

   Permitting ConEx without ECN is necessary to facilitate bootstrapping
   other parts of ConEx deployment.

5.3. Receivers (Optionally Modified)

   Any receiving transport may already feed back sufficiently useful
   signals to the sender so that it does not need to be altered.

   The native loss or ECN signaling mechanism required for compliance
   with existing congestion control standards (e.g. RTCP, SCTP) will
   typically be sufficient for the Sender to generate ConEx signals.

   TCP's loss feedback is sufficient for ConEx if SACK is used
   [RFC2018]. However, the original specification for ECN in TCP
   [RFC3168] signals congestion no more than once per round trip. The
   sender may require more precise feedback from the receiver; otherwise
   it is at risk of appearing to be understating its ConEx Signals.

   Ideally, ConEx should be added to a transport like TCP without
   mandatory modifications to the receiver. But in the TCP-ECN case an
   optional modification to the receiver could be recommended for
   precision (see [I-D.ietf-tcpm-accecn-reqs], which is based on the
   approach originally taken when adding re-ECN to TCP
   [I-D.briscoe-conex-re-ecn-tcp]).

5.4. Policy Devices

   Policy devices are characterised by a need to be configured with a
   policy related to the users or neighboring networks being served. In
   contrast, auditing devices solely enforce compliance with the ConEx
   protocol and do not need to be configured with any client-specific
   policy.

   One of the design goals of the ConEx protocol is that none of the
   important policy mechanisms requires per flow state, and that policy
   mechanisms can even be implemented for heavily aggregated traffic in
   the core of the Internet with complexity akin to accumulating marking
   volumes per logical link. Of course, policy mechanisms may sometimes
   choose to focus down on individual flows, but ConEx aims to make
   aggregate policy devices feasible.

5.4.1. Congestion Monitoring Devices

   Policy devices can typically be decomposed into two functions: i)
   monitoring the ConEx signal to compare it with a policy, then ii)
   acting in some way on the result. Various actions might be invoked
   against 'out of contract' traffic, such as policing (see
   Section 5.4.3), re-routing, or downgrading the class of service.

   Alternatively a policy device might not act directly on the traffic,
   but instead report to management systems that are designed to control
   congestion indirectly. For instance the reports might trigger
   capacity upgrades, penalty clauses in contracts, charges based on
   congestion, or merely warnings to clients who are causing excessive
   congestion.

   Nonetheless, whatever action is invoked, the congestion monitoring
   function will always be a necessary part of any policy device.

5.4.2. Rest-of-Path Congestion Monitoring

   ConEx signals indicate the level of congestion along a whole path
   from source to destination.
   In contrast, ECN signals monitored in the middle of a network
   indicate the level of congestion experienced so far on the path (of
   course, only in ECN-capable traffic).

   If a monitor in the middle of a network (e.g. at a network border)
   measures both of these signals, it can subtract the level of ECN
   (path so far) from the level of ConEx (whole path) to derive a
   measure of the congestion that packets are likely to experience
   between the monitoring point and their destination (rest-of-path
   congestion).

   It will often be preferable for policy devices to monitor
   rest-of-path congestion if they can, because it is a measure of the
   downstream congestion that the policy device can directly influence
   by controlling the traffic passing through it.

5.4.3. Congestion Policers

   A congestion policer can be implemented in a very similar way to a
   bit-rate policer, but its effect can be focused solely on traffic of
   users causing congestion downstream, which ConEx signals make
   visible. Without ConEx signals, the only way to mitigate congestion
   is to blindly limit traffic bit-rate, on the assumption that high
   bit-rate is more likely to cause congestion.

   A congestion policer monitors all ConEx traffic entering a network,
   or some identifiable subset. Using ConEx signals and/or Credit
   signals (and preferably subtracting ECN signals to yield rest-of-path
   congestion), it measures the amount of congestion that this traffic
   is contributing somewhere downstream. If this persistently exceeds a
   policy-configured 'congestion-bit-rate' the congestion policer can
   limit all the monitored ConEx traffic.

   A congestion policer can be implemented by a simple token bucket
   applied to an aggregate. But unlike a bit-rate policer, it removes
   tokens only when it forwards packets that are ConEx-Marked and/or
   Credit-Marked, effectively treating Not-ConEx-Marked packets as
   invisible.
   Consequently, because tokens give the right to send congested bits,
   the fill-rate of the token bucket will represent the allowed
   congestion-bit-rate. This should provide sufficient traffic
   management without having to additionally constrain the straight
   bit-rate at all. See [I-D.briscoe-conex-policing] for details.

   Note that the policing action could be to introduce a throttle
   (discard some traffic) immediately upstream of the congestion
   monitor. Alternatively, this throttle could introduce delay using a
   queue with its own AQM, which potentially increases the whole path
   congestion. In effect the congestion policer has moved the
   congestion earlier in the path, and focused it on one user to protect
   downstream resources by reducing the congestion in the rest of the
   path.

5.5. Audit

   The most critical aspect of ConEx is the capability to support robust
   auditing. It can be assumed that sanctions based on ConEx signals
   will create an intrinsic motivation for users to understate the
   congestion that they are causing. So, without strong audit
   functions, the ConEx signal would become understated to the point of
   being useless. Therefore the most important feature of an encoding
   design is likely to be the robustness of the auditing it supports.

   The general goal of an auditor is to make sure that any ConEx-enabled
   traffic is sent with sufficient ConEx-Re-Echo and ConEx-Credit
   signals. A concrete definition of the ConEx protocol MUST define
   what sufficient means.

   If a ConEx-enabled transport does not carry sufficient ConEx signals,
   then an auditor is likely to apply some sanction to that traffic.
   Although sanctions are beyond the scope of this document, an example
   sanction might be to throttle the traffic immediately upstream of the
   auditor to prevent the user from getting any advantage by
   understating congestion.
   Such a throttle would likely include some combination of delaying
   or dropping traffic.

   A ConEx auditor might use one of the following techniques:

   Generic loss auditing:

      For congestion signaled by loss, totally accurate auditing is not
      believed to be possible in the general case, because it involves a
      network node detecting the absence of some packets, when it cannot
      always identify retransmissions or missing packets. The missing
      packet might simply be taking a different route, or the IP payload
      may be encrypted.

      It is for this reason that it is desirable to motivate the
      deployment of ECN, even though ECN is not strictly required for
      ConEx.

   ECN auditing:

      Directly observe and compare the volume of ECN and ConEx marks.
      Since the volume of ECN marks rises monotonically along a path,
      ECN auditing is most accurate when located near the transport
      receiver. For this reason ECN should be monitored downstream of
      the predominant bottleneck.

   TCP-specific loss auditing:

      For non-encrypted standard TCP traffic on a single path, a
      tactical audit approach could be to measure losses by detecting
      retransmissions, which appear as duplicate sequence numbers
      upstream of the loss and out of order data downstream of the loss.
      Since some reordering is present in the Internet, such a loss
      estimator would be most accurate near the sender. Such an audit
      device should treat non-ECN-capable packets with encrypted IP
      payload as Not-ConEx, even if they claim to be ConEx-capable,
      unless the operator is also using one of the other two techniques
      below that can audit such packets against losses.

   Predominant bottleneck loss auditing:

      For networks designed so that losses predominantly occur under the
      control of one IP-aware bottleneck node on the path, the auditor
      could be located at this bottleneck.
      It could simply compare ConEx Signals with actual local packet
      discards (and ECN marks). This is a good model for most consumer
      access networks where audit accuracy could well be sufficient
      even if losses occasionally occur elsewhere in the network.

      Although the auditor at the predominant bottleneck would not be
      able to count losses at other nodes, transports would not know
      where losses were occurring either. Therefore a transport would
      not know which losses it could cheat and which ones it couldn't
      without getting caught.

   ECN tunnel loss auditing:

      A network operator can arrange IP-in-IP tunnels (or IP-in-MPLS
      etc.) so that any losses within the tunnels are deferred until the
      tunnel egress. Then the audit function can be deployed at the
      egress and be aware of all losses. This is possible by enabling
      ECN marking on switches and routers within a tunnel, irrespective
      of whether end-systems support ECN, by exploiting a side-effect of
      the way tunnels handle the ECN field. After encapsulation at the
      tunnel ingress, the network should arrange for any non-ECN packets
      (with '00' in ECN field of the outer) to be set to the ECN-capable
      transport (ECT(0)) codepoint. Then, if they experience congestion
      at one of the ECN-capable switches or routers within the tunnel,
      some will be ECN-marked rather than immediately dropped. However,
      when the tunnel decapsulator strips the outer from such an
      ECN-marked packet, if it finds the inner header has '00' in the
      ECN field (meaning that the endpoints do not support ECN) it will
      automatically drop the packet, assuming it complies with
      [RFC6040]. Thus, an audit function at the decapsulator can know
      which packets would have been dropped within the tunnel (and even
      which are genuinely ECN-marked for the end-to-end protocol).
      Non-ECN end-systems outside the tunnel see no sign of the use of
      ECN internally.
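
   The decapsulator behaviour described under ECN tunnel loss auditing
   can be sketched roughly as follows. This is a minimal illustration,
   not a specification: audit_at_egress() and tally() are hypothetical
   names, and counting a packet whose inner header already carries a
   congestion mark as an ECN event is an assumption of this sketch. Only
   the drop rule for a marked outer with '00' in the inner comes from
   the text above.

```python
from collections import Counter

# Two-bit ECN field values as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def audit_at_egress(outer_ecn, inner_ecn):
    """Classify one packet arriving at the tunnel egress.

    Returns (forward, event). `forward` says whether the decapsulator
    forwards the packet; `event` is "loss", "ecn" or None.
    """
    if outer_ecn == CE and inner_ecn == NOT_ECT:
        # Marked inside the tunnel, but the endpoints do not support
        # ECN: the decapsulator drops it, and audit counts a loss.
        return False, "loss"
    if outer_ecn == CE or inner_ecn == CE:
        # Genuinely ECN-marked for the end-to-end protocol.
        # (Treating an inner mark this way is this sketch's assumption.)
        return True, "ecn"
    return True, None


def tally(packets):
    """Count loss and ECN events over (outer_ecn, inner_ecn) pairs."""
    counts = Counter()
    for outer_ecn, inner_ecn in packets:
        _, event = audit_at_egress(outer_ecn, inner_ecn)
        if event is not None:
            counts[event] += 1
    return counts


sample = [(CE, NOT_ECT), (CE, ECT0), (ECT0, NOT_ECT), (ECT0, ECT0)]
print(tally(sample))
```

   An auditor built this way would compare the tallied loss and ECN
   events against the ConEx signals seen on the same traffic.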

   In addition, other audit techniques may be identified in the future.

   [Refb-dis] gives a comprehensive inventory of attacks against audit
   proposed by various people. It includes pseudocode for both
   deterministic and statistical audit functions designed to thwart
   these attacks and analyses the effectiveness of an implementation.
   Although this work is specific to the re-ECN protocol, most of the
   material is useful for designing and assessing audit of other
   specific ConEx encodings, against both ECN and loss.

   The auditing function should be able to trigger sufficient sanction
   to discourage understating congestion [Salvatori05]. This seems to
   require designing the sanction in concert with the policy functions,
   even though they might be implemented in different parts of the
   network. However, [Refb-dis] proves audit and policy functions can
   be independent as long as audit drops sufficient traffic to
   'normalise' actual congestion signals to be no greater than ConEx
   signals.

   Similarly, the job of incentivising the sending of ConEx-enabled
   packets belongs solely to policy devices, independent of the audit
   function. The audit function's job is policy-neutral, so it should
   be confined to checking for correctness within those packets that
   have been marked as ConEx-capable. Even if there are Not-ConEx
   packets mixed with ConEx packets within a flow, audit will not need
   to monitor any Not-ConEx packets.

   Note that in the future it might prove to be desirable to provide
   advice on uniformly implementing sanctions, because otherwise
   insufficient sanctions could impair the ability to implement policy
   elsewhere in the network.

   Some of the audit algorithms require per flow state.
   This cost is expected to be tolerable, because these techniques are
   most appropriate near the edges of the network, where traffic is
   generally much less aggregated, so the state need not overwhelm any
   one device. The auditor creates the flow-state it requires as it
   detects new flows. Therefore a flow will not fail if it is re-routed
   away from the audit box currently holding its flow-state, so auditing
   does not require route pinning and works fine with multipath flows.

   Holding flow-state seems to create a vulnerability to attacks that
   exhaust the auditor's memory by opening numerous new short flows.
   The audit function can protect itself from this attack by not
   allocating new flow-state unless a ConEx-marked packet arrives (e.g.
   credit at the start of a flow). Because policy devices rate limit
   ConEx-marked packets, this sets a natural limit to the rate at which
   a source can create flow-state in audit devices. The auditor would
   treat all the remaining flows without any ConEx-marked packets as a
   single misbehaving aggregate.

   Auditing can be distributed and redundant. One flow may be audited
   in multiple places, using multiple techniques. Some audit techniques
   do not require any per flow state and can be applied to aggregate
   traffic. These might be able to detect the presence of understated
   congestion at large scale and support recursively hunting for
   individual flows that are understating their congestion. Even at
   large scales, flows can be randomly selected for individual auditing.

   Sampling techniques can also be used to bound the total auditing
   memory footprint, although the implementer must counter the tactic
   where a source cheats until caught by sampling, then simply discards
   that flow ID and starts cheating with a new one (termed 'identifier
   white-washing when caught').

   ConEx Credit and ConEx-Re-Echo signals are intended to be audited
   separately.
   The Credit signal can be audited directly against actual congestion
   (loss and ECN). However, there will be an inherent delay of at
   least one round trip between a congestion signal and the subsequent
   ConEx-Re-Echo signal it triggers, as shown in Figure 1. Therefore
   ConEx-Re-Echo signals will need to be audited with some allowance
   for this delay.

6. Support for Incremental Deployment

   The ConEx abstract protocol described so far is intended to support
   incremental deployment in every possible respect. For convenience,
   the following list collects together all the features that support
   incremental deployment in the concrete ConEx specifications, and
   points to further information on each:

   Packets: The wire protocol encoding allows each packet to indicate
      whether it is using ConEx or not (see Section 4 on Encoding
      Congestion Exposure).

   Senders: ConEx requires a modification to the source in order to
      send ConEx packet markings (see Section 5.2). Although ConEx
      support can be indicated on a packet-by-packet basis, it is likely
      that all the packets in a flow will either consistently support
      ConEx or consistently not. It is also likely that, if the
      implementation of a transport protocol supports ConEx, all the
      packets sent from that host using that protocol will be ConEx
      marked.

      The implementations of some of the transport protocols on a host
      might not support ConEx (e.g. the implementation of DNS over UDP
      might not support ConEx, while perhaps RTP over UDP and TCP will).
      Any non-upgraded transports and non-upgraded hosts will simply
      continue to send regular Not-ConEx packets as always.

      A network operator can create incentives for senders to
      voluntarily reveal ConEx information (see the item on incremental
      deployment by 'Networks' below).

   Receivers: A ConEx source should be able to work with the regular
      receiver for the transport in question, without requiring any
      ConEx-specific modifications. This is true for modern transport
      protocols (RTCP, SCTP etc) and it is even true for TCP, as long as
      the receiver supports SACK, which is widely deployed anyway.
      However, it is not true for ECN feedback in TCP. The need for
      more precise ECN feedback in TCP is not exclusive to ConEx; for
      instance Data Centre TCP (DCTCP [DCTCP]) uses precise feedback to
      good effect. Therefore, if a receiver offers precise feedback
      [I-D.ietf-tcpm-accecn-reqs], it will be best if ConEx uses it (see
      Section 5.3). Alternatively, without sufficiently precise
      congestion feedback from the receiver, the source may have to
      conservatively send extra ConEx markings in order to avoid
      understating congestion.

   Proxies: Although it was stated above that ConEx requires a
      modification to the source, ConEx signals could theoretically be
      introduced by a proxy for the source, as long as it can intercept
      feedback from the receiver. Similarly, more precise feedback
      could theoretically be provided by a proxy for the receiver rather
      than modifying the receiver itself.

   Forwarding:

      No modification to forwarding or queuing is needed for ConEx.

      However, once some ConEx is deployed, it is possible that a queue
      implementation could optionally take advantage of the ConEx
      information in packets. For instance, it has been suggested
      [I-D.ietf-conex-destopt] that a queue would be more robust against
      flooding if it preferentially discarded Not-ConEx packets then
      Not-Marked ConEx packets.

      A ConEx sender re-echoes congestion whether the queues signaling
      congestion are ECN-enabled or not. Nonetheless, an operator
      relying on ConEx signals is recommended to enable ECN in queues
      wherever possible.
      This is because auditing works best if most congestion is
      indicated by ECN rather than loss (see Section 3). Also,
      monitoring rest-of-path congestion is not accurate if there are
      congested non-ECN queues upstream of the monitoring point
      (Section 5.4.2).

   Networks: If a subset of traffic sources (or proxies) use ConEx
      signals to reveal congestion in the internetwork layer, a network
      operator can choose (or not) to use this information for traffic
      management. As long as the end-to-end ConEx signals are present,
      each network can unilaterally choose to use them, independently of
      whether other networks do.

      ConEx marked packets may safely traverse a network that ignores
      them. ConEx signals are defined to remain unchanged once set by
      the sender, but some encodings may allow changes in transit (e.g.
      by proxies). In no circumstances will a network node change ConEx
      marked packets to Not-ConEx (network layer encoding requirement I
      in Section 3.3). If necessary, endpoints should be able to detect
      if a network is removing ConEx signals (network layer encoding
      requirement H in Section 3.3).

      An operator can deploy policy devices (Section 5.4) wherever
      traffic enters its network, in order to monitor the downstream
      congestion that incoming traffic contributes to, and control it if
      necessary. A network operator can create incentives for the
      developers of sending applications and transports to voluntarily
      reveal ConEx information. Without ConEx information, a network
      operator tends to have to limit the bit-rate or volume from a site
      more than is necessary, just in case it might congest others.
      With ConEx information, the operator can limit only
      congestion-causing traffic, and otherwise allow complete freedom.
      This greater freedom acts as an inducement for the source to
      volunteer ConEx information.
      An operator may also monitor whether a source transport has sent
      ConEx packets, and treat the same transport with greater suspicion
      (e.g. a more stringent rate-limit) whenever it selectively sends
      packets without ConEx support. See [RFC6789] for further
      discussion of deployment incentives for networks and references to
      scenarios where some networks use ConEx-based policy devices and
      others don't.

      An operator can deploy audit devices (Section 5.5) unilaterally
      within its own network to verify that traffic sources are not
      understating ConEx information. One network operator (say N_a)
      only cares that the level of ConEx signaling is sufficient to
      cover congestion in its own network. If traffic continues into a
      congested downstream network (say N_b), it is of no concern to the
      first network (N_a) if the end-to-end ConEx signaling is
      insufficient to cover the congestion in N_b as well. This is
      N_b's concern, and N_b can both detect such anomalous traffic and
      deal with it using ConEx-based audit devices itself.

7. IANA Considerations

   This memo includes no request to IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

8. Security Considerations

   The only known risk associated with ConEx is that users and
   applications are very likely to be motivated to under-represent the
   congestion that they are causing. Significant portions of this
   document are about mechanisms to audit the ConEx signals and create
   sufficient sanction to inhibit such under-representation. In
   particular see Section 5.5.

   Security attacks and their defences are best discussed against a
   concrete protocol specification, not the abstract mechanism of this
   document.
   A concrete ConEx protocol will need to be accompanied by a document
   describing how the protocol and its audit mechanisms defend against
   likely attacks. [Refb-dis] will be a useful source for such a
   document. It gives a comprehensive inventory of attacks against
   audit that have been proposed by various parties. It includes
   pseudocode for both deterministic and statistical audit functions
   designed to thwart these attacks and analyses the effectiveness of an
   implementation.

   However, [Refb-dis] is specific to the re-ECN protocol, which
   signalled ECN & loss together, whereas the concrete ConEx protocol
   defined in [I-D.ietf-conex-destopt] signals them separately.
   Therefore, although likely attacks will be similar, there will be
   more combinations of attacks to worry about, and defences and their
   analysis are likely to be a little different for ConEx.

   The main known attacks that a security document for a concrete ConEx
   protocol will need to address are listed below, and [Refb-dis] should
   be referred to for how re-ECN was designed to defend against similar
   attacks:

   o Attacks on the audit function (see Section 7.5 of [Refb-dis]):

      Flow ID Whitewashing: Designing the audit function so that a
         source cannot gain from starting a new flow once audit has
         detected cheating in a previous flow.

      Dragging Down an Aggregate: Avoiding audit discarding packets
         from all flows within an aggregate, which would allow one flow
         to pull down the average so that the audit function would
         discard packets from all flows, not just the offending flow.

      Dragging Down a Spoofed Flow ID: An attacker understates ConEx
         markings in packets that spoof another flow, which fools the
         audit function into dropping the genuine user's packets.

   o Attacks by networks on other networks (see Section 8.2 of
      [Refb-dis]):

      Dummy Traffic: Sending dummy traffic across a border with
         understated ConEx markings to bring down the average ConEx
         markings in the aggregate of border traffic. This attack can
         be combined with a TTL that expires before the packets reach an
         audit function.

      Signal Poisoning with 'Cancelled' Marking: Sending high volumes
         of valid packets that are both ConEx-Marked and ECN-Marked,
         which seems to represent congestion upstream, but it makes
         these packets immune to being further ECN-Marked downstream.

   It is planned to document all known attacks and their defences
   (including all the above) in the RFC series against a concrete ConEx
   protocol specification. In the interim [Refb-dis] and its references
   should be referred to for details and ways to address these attacks
   in the case of re-ECN.

9. Acknowledgements

   This document was improved by review comments from Toby Moncaster,
   Nandita Dukkipati, Mirja Kuehlewind, Caitlin Bestler, Marcelo Bagnulo
   Braun, John Leslie, Ingemar Johansson and David Wagner.

10. Comments Solicited

   Comments and questions are encouraged and very welcome. They can be
   addressed to the IETF Congestion Exposure (ConEx) working group
   mailing list, and/or to the authors.

11. References

11.1. Normative References

   [RFC2119]
      Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

11.2. Informative References

   [CheapPseud]
      Friedman, E. and P. Resnick, "The Social Cost of Cheap
      Pseudonyms", Journal of Economics and Management
      Strategy 10(2)173--199, 1998.

   [DCTCP]
      Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, P.,
      Prabhakar, B., Sengupta, S., and M. Sridharan, "Data Center TCP
      (DCTCP)", ACM SIGCOMM CCR 40(4)63--74, October 2010.

   [Evol_cc]
      Gibbens, R. and F. Kelly, "Resource pricing and the evolution of
      congestion control", Automatica 35(12)1969--1985, December 1999.

   [FairerFaster]
      Briscoe, B., "A Fairer, Faster Internet Protocol", IEEE
      Spectrum Dec 2008:38--43, December 2008.

   [I-D.briscoe-conex-policing]
      Briscoe, B., "Network Performance Isolation using Congestion
      Policing", draft-briscoe-conex-policing-00 (work in progress),
      February 2013.

   [I-D.briscoe-conex-re-ecn-motiv]
      Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, "Re-ECN: A
      Framework for adding Congestion Accountability to TCP/IP",
      draft-briscoe-conex-re-ecn-motiv-02 (work in progress), July 2013.

   [I-D.briscoe-conex-re-ecn-tcp]
      Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, "Re-ECN:
      Adding Accountability for Causing Congestion to TCP/IP",
      draft-briscoe-conex-re-ecn-tcp-02 (work in progress), July 2013.

   [I-D.ietf-conex-destopt]
      Krishnan, S., Kuehlewind, M., and C. Ucendo, "IPv6 Destination
      Option for ConEx", draft-ietf-conex-destopt-05 (work in progress),
      October 2013.

   [I-D.ietf-tcp-modifications]
      Kuehlewind, M. and R. Scheffenegger, "TCP modifications for
      Congestion Exposure", draft-ietf-conex-tcp-modifications-04 (work
      in progress), July 2013.

   [I-D.ietf-tcpm-accecn-reqs]
      Kuehlewind, M. and R. Scheffenegger, "Problem Statement and
      Requirements for a More Accurate ECN Feedback",
      draft-ietf-tcpm-accecn-reqs-04 (work in progress), October 2013.

   [RFC2018]
      Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective
      Acknowledgment Options", RFC 2018, October 1996.

   [RFC3168]
      Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
      Explicit Congestion Notification (ECN) to IP", RFC 3168,
      September 2001.

   [RFC3514]                        Bellovin, S., "The Security Flag in
                                    the IPv4 Header", RFC 3514, April 1
                                    2003.

   [RFC3550]                        Schulzrinne, H., Casner, S.,
                                    Frederick, R., and V. Jacobson,
                                    "RTP: A Transport Protocol for
                                    Real-Time Applications", STD 64,
                                    RFC 3550, July 2003.

   [RFC5348]                        Floyd, S., Handley, M., Padhye, J.,
                                    and J. Widmer, "TCP Friendly Rate
                                    Control (TFRC): Protocol
                                    Specification", RFC 5348,
                                    September 2008.

   [RFC5681]                        Allman, M., Paxson, V., and E.
                                    Blanton, "TCP Congestion Control",
                                    RFC 5681, September 2009.

   [RFC6040]                        Briscoe, B., "Tunnelling of
                                    Explicit Congestion Notification",
                                    RFC 6040, November 2010.

   [RFC6679]                        Westerlund, M., Johansson, I.,
                                    Perkins, C., O'Hanlon, P., and K.
                                    Carlberg, "Explicit Congestion
                                    Notification (ECN) for RTP over
                                    UDP", RFC 6679, August 2012.

   [RFC6789]                        Briscoe, B., Woundy, R., and A.
                                    Cooper, "Congestion Exposure
                                    (ConEx) Concepts and Use Cases",
                                    RFC 6789, December 2012.

   [RFC6817]                        Shalunov, S., Hazel, G., Iyengar,
                                    J., and M. Kuehlewind, "Low Extra
                                    Delay Background Transport
                                    (LEDBAT)", RFC 6817, December 2012.

   [RFC7141]                        Briscoe, B. and J. Manner, "Byte
                                    and Packet Congestion
                                    Notification", BCP 41, RFC 7141,
                                    February 2014.

   [Re-fb]                          Briscoe, B., Jacquet, A., Di
                                    Cairano-Gilfedder, C., Salvatori,
                                    A., Soppera, A., and M. Koyabe,
                                    "Policing Congestion Response in an
                                    Internetwork Using Re-Feedback",
                                    ACM SIGCOMM CCR 35(4)277--288,
                                    August 2005.

   [Refb-dis]                       Briscoe, B., "Re-feedback: Freedom
                                    with Accountability for Causing
                                    Congestion in a Connectionless
                                    Internetwork", UCL PhD
                                    Dissertation, 2009.

   [Salvatori05]                    Salvatori, A., "Closed Loop Traffic
                                    Policing", Politecnico Torino and
                                    Institut Eurecom Masters Thesis,
                                    September 2005.

Authors' Addresses

   Matt Mathis
   Google, Inc
   1600 Amphitheatre Parkway
   Mountain View, California 94043
   USA

   EMail: mattmathis at google.com


   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   EMail: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/