idnits 2.17.1

draft-ietf-conex-abstract-mech-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 14, 2011) is 4791 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'FairerFaster' is defined on line 682, but no explicit
     reference was found in the text

  == Unused Reference: 'Re-fb' is defined on line 778, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-05) exists of
     draft-ietf-conex-concepts-uses-01

  == Outdated reference: A later version (-10) exists of
     draft-ietf-ledbat-congestion-03

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 3448
     (Obsoleted by RFC 5348)

     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Congestion Exposure (ConEx) Working                            M. Mathis
3 Group                                                        Google, Inc
4 Internet-Draft                                                B. Briscoe
5 Intended status: Informational                                        BT
6 Expires: September 15, 2011                               March 14, 2011

8       Congestion Exposure (ConEx) Concepts and Abstract Mechanism
9                    draft-ietf-conex-abstract-mech-01

11 Abstract

13    This document describes an abstract mechanism by which senders inform
14    the network about the congestion encountered by packets earlier in
15    the same flow.  Today, the network may signal congestion to the
16    receiver by ECN markings or by dropping packets, and the receiver
17    passes this information back to the sender in transport-layer
18    feedback.  The mechanism to be developed by the ConEx WG will enable
19    the sender to also relay this congestion information back into the
20    network in-band at the IP layer, such that the total level of
21    congestion is visible to all IP devices along the path, from where it
22    could, for example, provide input to traffic management.

24 Status of This Memo

26    This Internet-Draft is submitted in full conformance with the
27    provisions of BCP 78 and BCP 79.

29    Internet-Drafts are working documents of the Internet Engineering
30    Task Force (IETF).  Note that other groups may also distribute
31    working documents as Internet-Drafts.  The list of current Internet-
32    Drafts is at http://datatracker.ietf.org/drafts/current/.

34    Internet-Drafts are draft documents valid for a maximum of six months
35    and may be updated, replaced, or obsoleted by other documents at any
36    time.  It is inappropriate to use Internet-Drafts as reference
37    material or to cite them other than as "work in progress."

39    This Internet-Draft will expire on September 15, 2011.

41 Copyright Notice

43    Copyright (c) 2011 IETF Trust and the persons identified as the
44    document authors.  All rights reserved.

46    This document is subject to BCP 78 and the IETF Trust's Legal
47    Provisions Relating to IETF Documents
48    (http://trustee.ietf.org/license-info) in effect on the date of
49    publication of this document.
Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 59 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 60 2. Requirements for the ConEx Signal . . . . . . . . . . . . . . 5 61 3. Representing Congestion Exposure . . . . . . . . . . . . . . . 6 62 3.1. Strawman Encoding . . . . . . . . . . . . . . . . . . . . 7 63 3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 7 64 3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 8 65 3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9 66 3.3.1. Independent Bits . . . . . . . . . . . . . . . . . . . 9 67 3.3.2. Codepoint Encoding . . . . . . . . . . . . . . . . . . 9 68 4. Congestion Exposure Components . . . . . . . . . . . . . . . . 10 69 4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 10 70 4.2. Receivers (Optionally Modified) . . . . . . . . . . . . . 10 71 4.3. Audit . . . . . . . . . . . . . . . . . . . . . . . . . . 10 72 4.3.1. Using Credit to Simplify Audit . . . . . . . . . . . . 11 73 4.3.2. Behaviour Constraints for the Audit Function . . . . . 12 74 4.4. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 13 75 4.4.1. Policy Monitoring Devices . . . . . . . . . . . . . . 13 76 4.4.2. Congestion Policers . . . . . . . . . . . . . . . . . 13 77 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 78 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 79 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 14 80 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14 81 9. Comments Solicited . . . . 
. . . . . . . . . . . . . . . . . . 14 82 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 83 10.1. Normative References . . . . . . . . . . . . . . . . . . . 14 84 10.2. Informative References . . . . . . . . . . . . . . . . . . 14 86 1. Introduction 88 One of the required functions of a transport protocol is controlling 89 congestion in the network. There are three techniques in use today 90 for the network to signal congestion to a transport: 91 o The most common congestion signal is packet loss. When congested, 92 the network simply discards some packets either as part of an 93 active queue management function [RFC2309] or as the consequence 94 of a queue overflow or other resource starvation. The transport 95 receiver detects that some data is missing and signals such 96 through transport acknowledgments to the transport sender (e.g. 97 TCP SACK options). The sender performs the appropriate congestion 98 control rate reduction (e.g. [RFC5681] for TCP) and, if it is a 99 reliable transport, it retransmits the missing data. 100 o If the transport supports explicit congestion notification (ECN) 101 [RFC3168] or pre-congestion notification (PCN) [RFC5670] , the 102 transport sender indicates this by setting an ECN-capable 103 transport (ECT) codepoint in every packet. Network devices can 104 then explicitly signal congestion to the receiver by setting ECN 105 bits in the IP header of such packets. The transport receiver 106 communicates these ECN signals back to the sender, which then 107 performs the appropriate congestion control rate reduction. 108 o Some experimental transport protocols and TCP variants [Vegas] 109 sense queuing delays in the network and reduce their rate before 110 the network has to signal congestion using loss or ECN. A purely 111 delay-sensing transport will tend to be pushed out by other 112 competing transports that do not back off until they have driven 113 the queue into loss. 
Therefore, modern delay-sensing algorithms
114       use delay in some combination with loss to signal congestion (e.g.
115       LEDBAT [I-D.ietf-ledbat-congestion], Compound
116       [I-D.sridharan-tcpm-ctcp]).  In the rest of this document, we will
117       confine the discussion to concrete signals of congestion such as
118       loss and ECN.  We will not discuss delay-sensing further, because
119       it can only avoid these more concrete signals of congestion in
120       some circumstances.

122    In all cases the congestion signals follow the route indicated in
123    Figure 1.  A congested network device sends a signal in the data
124    stream on the forward path to the transport receiver, the receiver
125    passes it back to the sender through transport level feedback, and
126    the sender makes some congestion control adjustment.

128    This document proposes to extend the capabilities of the Internet
129    protocol suite with the addition of a ConEx Signal that, to a first
130    approximation, relays the congestion information from the transport
131    sender back through the internetwork layer.  That signal is shown in
132    Figure 1.  It would be visible to all internetwork layer devices
133    along the forward (data) path and is intended to support a number of
134    new policy-controlled mechanisms that might be used to manage
135    traffic.

137    There is no expectation that internetwork layer devices will do fine-
138    grained congestion control using ConEx information.  That is still
139    probably best done at the transport sender.  Rather, the network will
140    be able to use ConEx information to do better bulk traffic
141    management, which in turn should incentivize end-system transports to
142    be more careful about congesting others [I-D.conex-concepts-uses].

144    +---------+                                               +---------+
145    |Transport|             +-----------+                     |Transport|
146    | Sender  |>=Data=Path=>|(Congested)|>=====Data=Path=====>| Receiver|
147    |         |             |  Network  |>-Congestion-Signal->|---.     |
148    |         |             |  Device   |                     |   |     |
149    |         |             +-----------+                     |   |     |
150    |         |                                               |   |     |
151    |         |<==Feedback=Path==============================<|   |     |
152    |     ,---|<--Transport Layer returned Congestion Signal-<|<--'     |
153    |     |   |                                               |         |
154    |     |   |>==============Data=Path======================>|         |
155    |     `-->|>---------(new)-IP layer ConEx Signal--------->|         |
156    |         |       (Carried in Data Packet Headers)        |         |
157    +---------+                                               +---------+

159    Not shown are policy devices along the data path that observe the
160    ConEx Signal, and use the information to monitor or manage traffic.
161    These are discussed in Section 4.4.

163                                  Figure 1

165 1.1.  Terminology

167    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
168    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
169    document are to be interpreted as described in RFC 2119 [RFC2119].

171    ConEx signals in IP packet headers from the sender to the network
172    {ToDo: These are placeholders for whatever words we decide to use}:
173       Not-ConEx:  The transport is not ConEx-capable
174       ConEx-Capable:  The transport is ConEx-Capable.  This is the opposite
175          of Not-ConEx and implies one of the following signals
176          Re-Echo-Loss:  (aka Purple) The transport has experienced a loss
177          Re-Echo-ECN:  (aka Black) The transport has experienced an ECN
178             mark

180          Credit:  (aka Green) The transport is building up credit to allow
181             for any future delay in expected ConEx signals (see
182             Section 4.3.1)
183          ConEx-Not-Marked:  The transport is ConEx-capable but is signaling
184             none of Re-Echo-Loss, Re-Echo-ECN or Credit
185          ConEx-Marked:  At least one of Re-Echo-Loss, Re-Echo-ECN or
186             Credit.

188 2.  Requirements for the ConEx Signal

190    Ideally, all the following requirements would be met by a Congestion
191    Exposure Signal.  However it is already known that some compromises
192    will be necessary, therefore all the requirements are expressed with
193    the keyword 'SHOULD' rather than 'MUST'.
The only mandatory 194 requirement is that a concrete protocol description MUST give sound 195 reasoning if it chooses not to meet any of these requirements: 196 a. The ConEx Signal SHOULD be visible to internetwork layer devices 197 along the entire path from the transport sender to the transport 198 receiver. Equivalently, it SHOULD be present in the IPv4 or IPv6 199 header, and in the outermost IP header if using IP in IP 200 tunneling. The ConEx Signal SHOULD be immutable once set by the 201 transport sender. A corollary of these requirements is that the 202 chosen ConEx encoding SHOULD pass silently without modification 203 through pre-existing networking gear. 204 b. The ConEx Signal SHOULD be useful under only partial deployment. 205 A minimal deployment SHOULD only require changes to transport 206 senders. Furthermore, partial deployment SHOULD create 207 incentives for additional deployment, both in terms of enabling 208 ConEx on more devices and adding richer features to existing 209 devices. Nonetheless, ConEx deployment need never be universal, 210 and it is anticipated that some hosts and some transports may 211 never support the ConEx Protocol and some networks may never use 212 the ConEx Signals. 213 c. The ConEx Signal SHOULD be accurate. In potentially hostile 214 environments such as the public Internet, it SHOULD be possible 215 for techniques to be deployed to audit the Congestion Exposure 216 Signal by comparing it to the actual congestion signals on the 217 forward data path. The auditing mechanism must have a capability 218 for providing sufficient disincentives against misreported 219 congestion, such as by throttling traffic that reports less 220 congestion than it is actually experiencing. 221 d. The ConEx Signal SHOULD be timely. There will be a delay between 222 the time when an auditing device sees an actual congestion signal 223 and when it sees the subsequent Congestion Exposure Signal from 224 the sender. 
The minimum delay will be one round trip, but it may 225 be much longer depending on the transport's choice of feedback 226 delay (consider RTCP [RFC3550] for example). It is not practical 227 to expect auditing devices in the network to make allowance for 228 such feedback delays. Instead, the sender SHOULD be able to send 229 ConEx signals in advance, as 'credit' for any audit function to 230 hold as a balance against the risk of congestion during the 231 feedback delay. This design choice greatly simplifies auditing 232 (see Section 4.3.1). 234 It is important to note that the auditing requirement implies a 235 number of additional constraints: The basic auditing technique is to 236 count both actual congestion signals and ConEx Signals someplace 237 along the data path: 238 o For congestion signaled by ECN, auditing is most accurate when 239 located near the transport receiver. Within any flow or aggregate 240 of flows, the volume of data tagged with ConEx Signals should 241 never be less than the total volume of ECN marked data seen near 242 the receiver. 243 o For congestion signaled by loss, totally accurate auditing is not 244 believed to be possible in the general case, because it involves a 245 network node detecting the absence of some packets, when it cannot 246 necessarily see the transport protocol sequence numbers and when 247 the missing packets might simply be taking a different route. But 248 there are common cases where sufficient audit accuracy should be 249 possible: 250 * For non-IPsec traffic conforming to standard TCP sequence 251 numbering on a single path, an auditor could detect losses by 252 observing both the original transmission and the retransmission 253 after the loss. Such auditing would be most accurate near the 254 sender. 255 * For networks designed so that losses predominantly occur under 256 the management of one IP-aware node on the path, the auditor 257 could be located at this bottleneck. 
It could simply compare 258 ConEx Signals with actual local losses. This is a good model 259 for most consumer access networks where audit accuracy could 260 well be sufficient even if losses occasionally occur at other 261 nodes in the network, such as border gateways (see Section 4.3 262 for details). 264 Given that loss-based and ECN-based ConEx might sometimes be best 265 audited at different locations, having distinct encodings would widen 266 the design space for the auditing function. 268 3. Representing Congestion Exposure 270 Most protocol specifications start with a description of packet 271 formats and codepoints with their associated meanings. This document 272 does not: It is already known that choosing the encoding for the 273 ConEx Signal is likely to entail some engineering compromises that 274 have the potential to reduce the protocol's usefulness in some 275 settings. Rather than making these engineering choices prematurely, 276 this document side steps the encoding problem by describing an 277 abstract representation of ConEx Signals. All of the elements of the 278 protocol can be defined in terms of this abstract representation. 279 Most important, the preliminary use cases for the protocol are 280 described in terms of the abstract representation in companion 281 documents [I-D.conex-concepts-uses]. 283 Once we have some example use cases we can evaluate different 284 encoding schemes. Since these schemes are likely to include some 285 conflated code points, some information will be lost resulting in 286 weakening or disabling some of the algorithms and eliminating some 287 use cases. 289 The goal of this approach is to be as complete as possible for 290 discovering the potential usage and capabilities of the ConEx 291 protocol, so we have some hope of making optimal design decisions 292 when choosing the encoding. 294 3.1. 
Strawman Encoding

296    As an aid to the reader, it might be helpful to describe a naive
297    strawman encoding of the ConEx protocol described solely in terms of
298    TCP: set the Reserved bit in the IPv4 header (bit 48 counting from
299    zero [RFC0791]--aka the "evil bit" [RFC3514]) on all retransmissions
300    or once per ECN signaled window reduction.  Clearly network devices
301    along the forward path can see this bit and act on it.  For example
302    they can count marked and unmarked packets to estimate the congestion
303    levels along the path.

305    However, the IESG has chartered the ConEx working group to establish
306    that there is sufficient demand for an IPv6 ConEx protocol before
307    using the last available bit in the IPv4 header.  Furthermore this
308    encoding, by itself, does not sufficiently support partial deployment
309    or strong auditing and might motivate users and/or applications to
310    misrepresent the congestion that they are causing.

312    Nonetheless, this strawman encoding does present a clear mental model
313    of how the ConEx protocol might function under various uses.

315 3.2.  ECN Based Encoding

317    Ideally ConEx and ECN are orthogonal signals and SHOULD be entirely
318    independent.  However, given the limited number of header bits and/or
319    code points, these signals may have to share code points, at least
320    partially.

322    The re-ECN specification [I-D.briscoe-tsvwg-re-ecn-tcp] presents an
323    implementation of ConEx that had to be tightly integrated with the
324    encoding of ECN in order to fit into the IP header.  The central
325    theme of the re-ECN work is an audit mechanism that can provide
326    sufficient disincentives against misrepresenting congestion
327    [I-D.briscoe-tsvwg-re-ecn-motiv], which is analyzed extensively in
328    Briscoe's PhD dissertation [Refb-dis].

330    Re-ECN is a good example of one chosen set of compromises attempting
331    to meet the requirements of Section 2.  However, the present document
332    takes a step back, aiming to state the ideal requirements in order to
333    allow the Internet community to assess whether other compromises are
334    possible.

336    In particular, different incremental deployment choices may be
337    desirable to meet the partial deployment requirement of Section 2.
338    Re-ECN requires the receiver to be at least ECN-capable as well as
339    requiring an update to the sender.  Although ConEx will inherently
340    require change at the sender, it would be preferable if it could
341    work, even partially, with any receiver.

343    The chosen ConEx protocol certainly must not require ECN to be
344    deployed in any network.  In this respect re-ECN is already a good
345    example--it acts perfectly well as a loss-based ConEx protocol if the
346    loss-based audit techniques in Section 4.3 are used.  However, it
347    would still be desirable to avoid the dependence on an ECN receiver.

349    For a tutorial background on re-ECN techniques, see [Re-fb,
350    FairerFaster].

352 3.2.1.  ECN Changes

354    Although the re-ECN protocol requires no changes to the network part
355    of the ECN protocol, it is important to note that it does propose
356    some relatively minor modifications to the host-to-host aspects of
357    the ECN protocol specified in RFC 3168.  They include: redefining the
358    ECT(1) code point (the change is consistent with RFC3168 but requires
359    deprecating the experimental ECN nonce [RFC3540]); modifications to
360    the ECN negotiations carried on the SYN and SYN-ACK; and using a
361    different state machine to carry ECN signals in the transport
362    acknowledgments from a modified Receiver to the Sender.  This last
363    change is optional, but it permits the transport protocol to carry
364    multiple congestion signals per round trip.  It greatly simplifies
365    accurate auditing, and is likely to be useful in other transports,
366    e.g. DCTCP [DCTCP].
368 All of these adjustments to RFC 3168 may also be needed in a future 369 standardized ConEx protocol. There will need to be very careful 370 consideration of any proposed changes to ECN or other existing 371 protocols, because any such changes increase the cost of deployment. 373 3.3. Abstract Encoding 375 The ConEx protocol could take one of two different encodings: 376 independently settable bits or an enumerated set of mutually 377 exclusive codepoints. 379 In both cases, the amount of congestion is signaled by the volume of 380 marked data--just as the volume of lost data or ECN marked data 381 signals the amount of congestion experienced. Thus the size of each 382 packet carrying a ConEx Signal is significant. 384 3.3.1. Independent Bits 386 This encoding involves flag bits, each of which the sender can set 387 independently to indicate to the network one of the following four 388 signals: 389 ConEx (Not-ConEx) The transport is (or is not) using ConEx with this 390 packet (the protocol MUST be arranged so that legacy transport 391 senders implicitly send Not-ConEx) 392 Re-Echo-Loss (Not-Re-Echo-Loss) The transport has (or has not) 393 experienced a loss 394 Re-Echo-ECN (Not-Re-Echo-ECN) The transport has (or has not) 395 experienced ECN-signaled congestion 396 Credit (Not-Credit) The transport is (or is not) building up 397 congestion credit (see Section 4.3 on the audit function) 399 3.3.2. Codepoint Encoding 401 This encoding involves signaling one of the following five 402 codepoints: 404 ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN, Credit} 406 Each named codepoint has the same meaning as in the encoding using 407 independent bits (Section 3.3.1). The use of any one codepoint 408 implies the negative of all the others. 410 Inherently, the semantics of most of the enumerated codepoints are 411 mutually exclusive. 
'Credit' is the only one that might need to be
412    used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
413    that requirement is questionable.  It must not be forgotten that the
414    enumerated encoding loses the flexibility to signal these two
415    combinations, whereas the encoding with four independent bits is not
416    so limited.  Alternatively two extra codepoints could be assigned to
417    these two combinations of semantics.

419 4.  Congestion Exposure Components

421    {ToDo: Picture of the components, similar to that in the last
422    slideset about conex-concepts-uses?}

424 4.1.  Modified Senders

426    The sending transport needs to be modified to send Congestion
427    Exposure Signals in response to congestion feedback signals.

429 4.2.  Receivers (Optionally Modified)

431    The receiving transport may already feed back sufficiently useful
432    signals to the sender so that it does not need to be altered.

434    However, a TCP receiver feeds back ECN congestion signals no more
435    than once within a round trip.  The sender may require more precise
436    feedback from the receiver, otherwise it will appear to be
437    understating its ConEx Signals (see Section 3.2.1).

439    Ideally, ConEx should be added to a transport like TCP without
440    mandatory modifications to the receiver.  But an optional
441    modification to the receiver could be recommended for precision.
442    This was the approach taken when adding re-ECN to TCP
443    [I-D.briscoe-tsvwg-re-ecn-tcp].

445 4.3.  Audit

447    To audit ConEx Signals against actual losses (as opposed to ECN) an
448    auditor could use one of the following techniques:
449    TCP-specific approach:  The auditor could monitor TCP flows or
450       aggregates of flows, only holding state on a flow if it first
451       sends a Credit or a Re-Echo-Loss marking.  The auditor could
452       detect retransmissions by monitoring sequence numbers.  It would
453       check that (volume of retransmitted data) <= (volume of data
454       marked Re-Echo-Loss).  Traffic would only be auditable in this way
455       if it conformed to the standard TCP protocol and the IP payload
456       was not encrypted (e.g. with IPsec).
457    Predominant bottleneck approach:  Unlike the above TCP-specific
458       solution, this technique would work for IP packets carrying any
459       transport layer protocol, and whether encrypted or not.  But it
460       only works well for networks designed so that losses predominantly
461       occur under the management of one IP-aware node on the path.  The
462       auditor could then be located at this bottleneck.  It could simply
463       compare ConEx Signals with actual local losses.  Most consumer
464       access networks are designed to this model, e.g. the radio network
465       controller (RNC) in a cellular network or the broadband remote
466       access server (BRAS) in a digital subscriber line (DSL) network.

468       The accuracy of an auditor at one predominant bottleneck might
469       still be sufficient, even if losses occasionally occurred at other
470       nodes in the network (e.g. border gateways).  Although the auditor
471       at the predominant bottleneck would not always be able to detect
472       losses at other nodes, transports would not know where losses were
473       occurring either.  Therefore a transport would not know which
474       losses it could cheat on without getting caught, and which ones it
475       couldn't.

477    To audit ConEx Signals against actual ECN markings or losses, the
478    auditor could work as follows: monitor flows or aggregates of flows,
479    only holding state on a flow if it first sends a ConEx-Marked packet
480    (Credit or either Re-Echo marking).  Count the number of bytes marked
481    with Credit or Re-Echo-ECN.  Separately count the number of bytes
482    marked with ECN.  Use Credits to ensure that {#ECN} <= {#Re-Echo-ECN}
483    + {#Credit}, even though the Re-Echo-ECN markings are delayed by at
484    least one RTT.

486 4.3.1.
Using Credit to Simplify Audit

488    At the audit function, there will be an inherent delay of at least one
489    round trip between a congestion signal and the subsequent ConEx
490    signal it triggers--as it makes the two passes of the feedback loop
491    in Figure 1.  However, the audit function cannot be expected to wait
492    for a round trip to check that one signal balances the other, because
493    it is hard for a network device to know the RTT of each transport.

495    Instead, it considerably simplifies the audit function if the source
496    transport is made responsible for removing the round trip delay in
497    ConEx signals.  The transport SHOULD signal sufficient credit in
498    advance to cover any reasonably expected congestion during its
499    feedback delay.  Then, the audit function does not need to make
500    allowance for round trip delays that it cannot quantify.  This
501    design choice correctly makes the transport responsible both for
502    minimizing feedback delay and for the risk that packets in flight
503    will cause congestion to others before the source can react.

505    For example, imagine the audit function keeps a running account of
506    the balance between actual congestion signals (loss or ECN), which it
507    counts as negative, and ConEx signals, which it counts as positive.
508    Having made the transport responsible for round trip delays, it will
509    be expected to have pre-loaded the audit function with some credit at
510    the start.  Therefore, if ever the balance does go negative, the
511    audit function can immediately start punishing a flow, without any
512    grace period.

514    The one-way nature of packet forwarding probably makes per-flow state
515    unavoidable for the audit function.  This was a necessary sacrifice
516    to avoid per-flow state elsewhere in the wider ConEx architecture.
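
   The running per-flow account described in this section might be
   sketched as follows.  This is only an illustration under stated
   assumptions, not a design for the audit function (which this document
   does not intend to standardise): the class and method names are
   hypothetical, the balance is kept in bytes, and the auditor is
   assumed to have already classified each packet as ConEx-Marked or as
   an actual congestion signal.

```python
from dataclasses import dataclass


@dataclass
class FlowAudit:
    """Hypothetical per-flow audit state: a running balance in bytes."""

    balance: int = 0  # ConEx-Marked bytes minus congestion bytes

    def on_conex(self, size: int) -> None:
        # Credit and Re-Echo markings count as positive.
        self.balance += size

    def on_congestion(self, size: int) -> None:
        # Actual congestion (lost or ECN-marked bytes) counts as negative.
        self.balance -= size

    def in_deficit(self) -> bool:
        # A negative balance means the flow has understated congestion.
        # No grace period is given for round trip delay, because the
        # transport is expected to have sent credit in advance.
        return self.balance < 0


# Usage: the flow pre-loads 3000 bytes of credit, then congestion arrives.
audit = FlowAudit()
audit.on_conex(3000)       # Credit sent at the start of the flow
audit.on_congestion(1500)  # covered by the credit already held
covered = not audit.in_deficit()
audit.on_congestion(3000)  # exceeds the remaining credit
sanction = audit.in_deficit()
```

   In this sketch the sanction decision is a single comparison, which is
   the simplification that credit is meant to buy: the auditor never has
   to estimate the round trip time of the flow.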
517 Nonetheless, care was taken to ensure that packets could bring soft- 518 state to the audit function, so that it would continue to work if a 519 flow shifted to a different audit device, perhaps after a reroute or 520 an audit device failure. Therefore, although the audit function is 521 likely to need flow state memory, at least it complies with the 522 'fate-sharing' design principle of the Internet [IntDesPrinciples], 523 and at least per-flow audit is only required at the outer edges of 524 the internetwork, where it is less of a scalability concern. 526 Note also that ConEx does not intend to embed rules in the network on 527 how individual flows _behave_. The audit function only does per-flow 528 processing to check the integrity of ConEx _information_. 530 4.3.2. Behaviour Constraints for the Audit Function 532 There is no intention to standardise how to design or implement the 533 audit function. However, it is necessary to lay down the following 534 normative constraints on audit behaviour so that transport designers 535 will know what to design against and implementers of audit devices 536 will know what pitfalls to avoid: 537 Minimal False Hits: Audit SHOULD introduce minimal false hits for 538 honest flows; 539 Minimal False Misses: Audit SHOULD quickly detect and sanction 540 dishonest flows, preferably at the first dishonest packet; 541 Transport Oblivious: Audit MUST NOT be designed around one 542 particular rate response, such as any particular TCP congestion 543 control algorithm or one particular resource sharing regime such 544 as TCP-friendliness [RFC3448]. An important goal is to give 545 ingress networks the freedom to unilaterally allow different rate 546 responses to congestion and different resource sharing regimes 547 [Evol_cc], without having to coordinate with downstream networks; 548 Sufficient Sanction: Audit MUST introduce sufficient sanction (e.g. 
549       loss in goodput) so that sources cannot understate congestion and
550       play off losses at the audit function against higher allowed
551       throughput at a congestion policer [Salvatori05];
552    Manage Memory Exhaustion:  Audit SHOULD be able to counter state
553       exhaustion attacks.  For instance, if the audit function uses
554       flow-state, it should not be possible for sources to exhaust its
555       memory capacity by gratuitously sending numerous packets, each
556       with a different flow ID;
557    Identifier Accountability:  Audit MUST NOT be vulnerable to 'identity
558       whitewashing', where a transport can label a flow with a new ID
559       more cheaply than paying the cost of continuing to use its current
560       ID [CheapPseud].

562 4.4.  Policy Devices

564    Policy devices are characterised by a need to be configured with a
565    policy related to the users or neighboring networks being served.  In
566    contrast, the auditing devices referred to in the previous section
567    primarily enforce compliance with the ConEx protocol and do not need
568    to be configured with any client-specific policy.

570 4.4.1.  Policy Monitoring Devices

572    Policy devices can typically be decomposed into two functions: i)
573    monitoring the ConEx signal to compare it with a policy, then ii)
574    acting in some way on the result.  Various actions might be invoked
575    against 'out of contract' traffic, such as policing (see next
576    section), re-routing, or downgrading the class of service.

578    Alternatively a policy device might not act directly on the traffic,
579    but instead report to management systems that are designed to control
580    congestion indirectly.  For instance the reports might trigger
581    capacity upgrades, penalty clauses in contracts, or charges levied
582    between networks based on congestion, or might merely prompt
583    warnings to clients who are causing excessive congestion.

585    Nonetheless, whatever action is invoked, the policy monitoring
586    function will always be a necessary part of any policy device.
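
   The two-function decomposition above (monitor, then act) might be
   sketched as follows.  This is a minimal illustration, not a specified
   mechanism: the names PolicyMonitor and Action, the per-interval byte
   allowance and the fixed configured action are all assumptions
   introduced here, and the classification of packets as ConEx-Marked is
   taken as given.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes a policy device might choose between."""

    NONE = "none"
    POLICE = "police"
    REROUTE = "re-route"
    DOWNGRADE = "downgrade class of service"
    REPORT = "report to management system"


class PolicyMonitor:
    """Counts ConEx-Marked bytes for one monitored user or aggregate."""

    def __init__(self, allowed_marked_bytes: int, action: Action) -> None:
        # Policy: volume of ConEx-Marked bytes allowed per interval
        # (an assumed form of policy, chosen for simplicity).
        self.allowed_marked_bytes = allowed_marked_bytes
        self.action = action
        self.marked_bytes = 0

    def observe(self, size: int, conex_marked: bool) -> None:
        # Function i): monitor the ConEx signal.  Only marked bytes
        # count, because they represent congestion the traffic causes.
        if conex_marked:
            self.marked_bytes += size

    def end_interval(self) -> Action:
        # Function ii): compare with the policy and pick the action
        # to invoke against 'out of contract' traffic.
        over = self.marked_bytes > self.allowed_marked_bytes
        self.marked_bytes = 0
        return self.action if over else Action.NONE
```

   A device that only reports, rather than acting on traffic directly,
   would be configured with Action.REPORT in this sketch; the monitoring
   step is the same in either case.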
588 4.4.2. Congestion Policers 590 A congestion policer can be implemented in a very similar way to a 591 bit-rate policer, but its effect can be focused solely on traffic 592 causing congestion downstream, which ConEx signals make visible. 593 Without ConEx signals, the only way to mitigate congestion is to 594 blindly limit traffic bit-rate, on the assumption that high bit-rate 595 is more likely to cause congestion. 597 A congestion policer monitors all ConEx traffic entering a network, 598 or some identifiable subset. Using ConEx signals, it measures the 599 amount of congestion that this traffic is contributing to somewhere 600 downstream. If this exceeds a policy-configured 'congestion-bit- 601 rate' the congestion policer will limit all the monitored ConEx 602 traffic. 604 A congestion policer can be implemented by a simple token bucket. 605 But unlike a bit-rate policer, it removes a token only when it 606 forwards a packet that is ConEx-Marked, effectively treating Not- 607 ConEx-Marked packets as invisible. Consequently, because tokens give 608 the right to send congested bits, the fill-rate of the token bucket 609 will represent the allowed congestion-bit-rate, which should be 610 sufficient traffic management without having to additionally 611 constrain the straight bit-rate. See [CongPol] for details. 613 5. IANA Considerations 615 This memo includes no request to IANA. 617 Note to RFC Editor: this section may be removed on publication as an 618 RFC. 620 6. Security Considerations 622 Significant parts of this whole document are about auditability of 623 ConEx Signals, in particular Section 4.3. 625 7. Conclusions 627 {ToDo:} 629 8. Acknowledgements 631 This document was improved by review comments from Toby Moncaster, 632 Nandita Dukkipati, Mirja Kuehlewind and Caitlin Bestler. 634 9. Comments Solicited 636 Comments and questions are encouraged and very welcome. 
They can be 637 addressed to the IETF Congestion Exposure (ConEx) working group 638 mailing list, and/or to the authors. 640 10. References 642 10.1. Normative References 644 [RFC2119] Bradner, S., "Key words for use in 645 RFCs to Indicate Requirement 646 Levels", BCP 14, RFC 2119, 647 March 1997. 649 10.2. Informative References 651 [CheapPseud] Friedman, E. and P. Resnick, "The 652 Social Cost of Cheap Pseudonyms", 653 Journal of Economics and Management 654 Strategy 10(2)173--199, 1998. 656 [CongPol] Jacquet, A., Briscoe, B., and T. 657 Moncaster, "Policing Freedom to Use 658 the Internet Resource Pool", Proc 659 ACM Workshop on Re-Architecting the 660 Internet (ReArch'08), 661 December 2008. 665 [DCTCP] Alizadeh, M., Greenberg, A., Maltz, 666 D., Padhye, J., Patel, P., 667 Prabhakar, B., Sengupta, S., and M. 668 Sridharan, "Data Center TCP 669 (DCTCP)", ACM SIGCOMM 670 CCR 40(4)63--74, October 2010. 674 [Evol_cc] Gibbens, R. and F. Kelly, "Resource 675 pricing and the evolution of 676 congestion control", 677 Automatica 35(12)1969--1985, 678 December 1999. 682 [FairerFaster] Briscoe, B., "A Fairer, Faster 683 Internet Protocol", IEEE 684 Spectrum Dec 2008:38--43, 685 December 2008. 689 [I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A., 690 Moncaster, T., and A. Smith, "Re- 691 ECN: A Framework for adding 692 Congestion Accountability to 693 TCP/IP", draft-briscoe-tsvwg-re- 694 ecn-tcp-motivation-02 (work in 695 progress), October 2010. 697 [I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A., 698 Moncaster, T., and A. Smith, "Re- 699 ECN: Adding Accountability for 700 Causing Congestion to TCP/IP", 701 draft-briscoe-tsvwg-re-ecn-tcp-09 702 (work in progress), October 2010. 704 [I-D.conex-concepts-uses] Briscoe, B., Woundy, R., Moncaster, 705 T., and J. Leslie, "ConEx Concepts 706 and Use Cases", 707 draft-ietf-conex-concepts-uses-01 708 (work in progress), March 2011. 710 [I-D.ietf-ledbat-congestion] Shalunov, S., Hazel, G., and J.
711 Iyengar, "Low Extra Delay 712 Background Transport (LEDBAT)", 713 draft-ietf-ledbat-congestion-03 714 (work in progress), October 2010. 716 [I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D., 717 and D. Thaler, "Compound TCP: A New 718 TCP Congestion Control for High- 719 Speed and Long Distance Networks", 720 draft-sridharan-tcpm-ctcp-02 (work 721 in progress), November 2008. 723 [IntDesPrinciples] Clark, D., "The Design Philosophy 724 of the DARPA Internet Protocols", 725 ACM SIGCOMM CCR 18(4)106--114, 726 August 1988. 730 [RFC0791] Postel, J., "Internet Protocol", 731 STD 5, RFC 791, September 1981. 733 [RFC2309] Braden, B., Clark, D., Crowcroft, 734 J., Davie, B., Deering, S., Estrin, 735 D., Floyd, S., Jacobson, V., 736 Minshall, G., Partridge, C., 737 Peterson, L., Ramakrishnan, K., 738 Shenker, S., Wroclawski, J., and L. 739 Zhang, "Recommendations on Queue 740 Management and Congestion Avoidance 741 in the Internet", RFC 2309, 742 April 1998. 744 [RFC3168] Ramakrishnan, K., Floyd, S., and D. 745 Black, "The Addition of Explicit 746 Congestion Notification (ECN) to 747 IP", RFC 3168, September 2001. 749 [RFC3448] Handley, M., Floyd, S., Padhye, J., 750 and J. Widmer, "TCP Friendly Rate 751 Control (TFRC): Protocol 752 Specification", RFC 3448, 753 January 2003. 755 [RFC3514] Bellovin, S., "The Security Flag in 756 the IPv4 Header", RFC 3514, April 1 757 2003. 759 [RFC3540] Spring, N., Wetherall, D., and D. 760 Ely, "Robust Explicit Congestion 761 Notification (ECN) Signaling with 762 Nonces", RFC 3540, June 2003. 764 [RFC3550] Schulzrinne, H., Casner, S., 765 Frederick, R., and V. Jacobson, 766 "RTP: A Transport Protocol for 767 Real-Time Applications", STD 64, 768 RFC 3550, July 2003. 770 [RFC5670] Eardley, P., "Metering and Marking 771 Behaviour of PCN-Nodes", RFC 5670, 772 November 2009. 774 [RFC5681] Allman, M., Paxson, V., and E. 775 Blanton, "TCP Congestion Control", 776 RFC 5681, September 2009.
778 [Re-fb] Briscoe, B., Jacquet, A., Di 779 Cairano-Gilfedder, C., Salvatori, 780 A., Soppera, A., and M. Koyabe, 781 "Policing Congestion Response in an 782 Internetwork Using Re-Feedback", 783 ACM SIGCOMM CCR 35(4)277--288, 784 August 2005. 788 [Refb-dis] Briscoe, B., "Re-feedback: Freedom 789 with Accountability for Causing 790 Congestion in a Connectionless 791 Internetwork", UCL PhD 792 Dissertation, 2009. 796 [Salvatori05] Salvatori, A., "Closed Loop Traffic 797 Policing", Politecnico Torino and 798 Institut Eurecom Masters Thesis, 799 September 2005. 801 [Vegas] Brakmo, L. and L. Peterson, "TCP 802 Vegas: End-to-End Congestion 803 Avoidance on a Global Internet", 804 IEEE Journal on Selected Areas in 805 Communications 13(8)1465--80, 806 October 1995. 810 Authors' Addresses 812 Matt Mathis 813 Google, Inc 814 1600 Amphitheatre Parkway 815 Mountain View, California 94043 816 USA 818 EMail: mattmathis at google.com 820 Bob Briscoe 821 BT 822 B54/77, Adastral Park 823 Martlesham Heath 824 Ipswich IP5 3RE 825 UK 827 Phone: +44 1473 645196 828 EMail: bob.briscoe@bt.com 829 URI: http://bobbriscoe.net/