Congestion Exposure                                          T. Moncaster
Internet-Draft                                                    L. Krug
Intended status: Informational                                         BT
Expires: September 2, 2010                                       M. Menth
                                                 University of Wuerzburg
                                                                J. Araujo
                                                                      UCL
                                                                 S. Blake
                                                         Extreme Networks
                                                           R. Woundy, Ed.
                                                                  Comcast
                                                            March 1, 2010

            The Need for Congestion Exposure in the Internet
                     draft-moncaster-conex-problem-00

Abstract

Today's Internet is a product of its history.  TCP is the main transport protocol responsible for sharing out bandwidth and preventing a recurrence of congestion collapse, while packet drop is the primary signal of congestion at bottlenecks.  Since packet drop (and increased delay) impacts all their customers negatively, network operators would like to be able to distinguish between overly aggressive congestion control and a confluence of many low-bandwidth, low-impact flows.  But they are unable to see the actual congestion signal and thus have to implement bandwidth and/or usage limits based on the only information they can see or measure (the contents of the packet headers and the rate of the traffic).  Such measures don't solve the packet-drop problems effectively and are leading to calls for government regulation (which also won't solve the problem).

We propose congestion exposure as a possible solution.
This allows packets to carry an accurate prediction of the congestion they expect to cause downstream, thus making that congestion visible to ISPs and network operators.  This memo sets out the motivations for congestion exposure and introduces a strawman protocol designed to achieve it.

Status of This Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 2, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Table of Contents

   1.  Introduction
       1.1.  Definitions
       1.2.  Changes from previous versions
   2.  The Problem
       2.1.  Congestion is not the problem
       2.2.  Increase capacity or manage traffic?
             2.2.1.  Making Congestion Visible
             2.2.2.  ECN - a Step in the Right Direction
   3.  Existing Approaches to Traffic Control
       3.1.  Layer 3 Measurement
             3.1.1.  Volume Accounting
             3.1.2.  Rate Measurement
       3.2.  Higher Layer Discrimination
             3.2.1.  Bottleneck Rate Policing
             3.2.2.  DPI and Application Rate Policing
   4.  Why Now?
   5.  Requirements for a Solution
   6.  A Strawman Congestion Exposure Protocol
   7.  Use Cases
       7.1.  Improved Policing
             7.1.1.  Per Aggregate Policing
             7.1.2.  Per Customer Policing
   8.  IANA Considerations
   9.  Security Considerations
   10. Conclusions
   11. Acknowledgements
   12. Informative References

1. Introduction

The Internet has grown from humble origins to become a global phenomenon with billions of end-users able to share the network and exchange data.  One of the key elements in this success has been the use of distributed algorithms such as TCP that share capacity while avoiding congestion collapse.  These algorithms rely on the end-systems altruistically reducing their transmission rate in response to any congestion they see.

In recent years ISPs have seen a minority of users taking a larger share of the network by using applications that transfer data continuously for hours or even days at a time and even opening multiple simultaneous TCP connections.  This issue became prevalent with the advent of "always on" broadband connections.  Frequently peer-to-peer protocols have been held responsible [RFC5594], but streaming video traffic is becoming increasingly significant.  In order to improve the network experience for the majority of their customers, many ISPs have chosen to impose controls on how their network's capacity is shared rather than continually buying more capacity.  They calculate that most customers will be unwilling to contribute to the cost of extra shared capacity if that will only really benefit a minority of users.  Approaches include volume counting or charging, and application rate limiting.  Typically these traffic controls, whilst not impacting most customers, set a restriction on a customer's level of network usage, as defined in a "fair usage policy".

We believe that such traffic controls seek to control the wrong quantity.  What matters in the network is neither the volume of traffic nor the rate of traffic; it is the contribution to congestion over time.  Congestion means that your traffic impacts other users, and conversely that their traffic impacts you.  So if there is no congestion there need not be any restriction on the amount a user can send; restrictions only need to apply when others are sending traffic such that there is congestion.  In fact some of the current work at the IETF [LEDBAT] and IRTF [CC-open-research] already reflects this thinking.  For example, an application intending to transfer large amounts of data could use LEDBAT to try to reduce its transmission rate before any competing TCP flows do, by detecting an increase in end-to-end delay (as a measure of incipient congestion).  However these techniques rely on voluntary, altruistic action by end users and their application providers.  ISPs cannot enforce their use.  This leads to our second point.

The Internet was designed so that end-hosts detect and control congestion.  We believe that congestion needs to be visible to network nodes as well, not just to the end hosts.  More specifically, a network needs to be able to measure how much congestion traffic causes between the monitoring point in the network and the destination ("rest-of-path congestion").
This would be a new capability; today a network can use explicit congestion notification (ECN) [RFC3168] to detect how much congestion traffic has suffered between the source and a monitoring point in the network, but not beyond.  Such a capability would enable an ISP to give incentives for the use, without restrictions, of LEDBAT-like applications whilst perhaps restricting excessive use of TCP and UDP ones.

So we propose a new approach which we call congestion exposure.  We propose that congestion information should be made visible at the IP layer, so that any network node can measure the contribution to congestion of an aggregate of traffic as easily as straight volume can be measured today.  Once the information is exposed in this way, it is then possible to use it to measure the true impact of any traffic on the network.  Lacking the ability to see congestion, some ISPs count the volume each user transfers.  On this basis LEDBAT applications would get blamed for hogging the network, given the large amount of volume they transfer.  However, because they yield rather than hog, they actually contribute very little to congestion.  One use of exposed congestion information would be to measure the congestion attributable to a given user, and thereby incentivise the use of protocols such as [LEDBAT] which aim to reduce the congestion caused by bulk data transfers.

Creating the incentive to deploy low-congestion protocols such as LEDBAT is just one of many motivations for congestion exposure.  In general, congestion exposure gives ISPs a principled way to hold their customers accountable for the impact on others of their network usage and to reward them for choosing congestion-sensitive applications.  It can measure the impact of an individual consumer, a large enterprise network or the traffic crossing a border from another ISP - anywhere where volume is used today as a (poor) measure of usage.  Section 7 gives a range of potential use cases for congestion exposure, showing the breadth of ways in which the exposed congestion information could be used.

1.1. Definitions

We refer to congestion throughout this document.  Congestion has a wide range of definitions.  For the purposes of this document it is defined using the simplest way that it can be measured: the instantaneous fraction of loss.  More precisely, congestion is bits lost divided by bits sent, taken over any brief period.  By extension, if explicit congestion notification (ECN) is being used, the fraction of bits marked (rather than lost) gives a useful metric that can be thought of as analogous to congestion.  Strictly, congestion should measure impairment, whereas ECN aims to avoid any loss or delay impairments due to congestion.  But for the purposes of this document, the two will both be called congestion.

We also need to define two specific terms carefully:

Upstream Congestion:  The congestion that has already been experienced by a packet as it travels along its path.  In other words, at any point on the path it is the congestion between that point and the source of the packet.

Downstream Congestion:  The congestion that a packet still has to experience on the remainder of its path.  In other words, at any point it is the congestion still to be experienced as the packet travels between that point and its destination.
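To make these definitions concrete, here is a small worked example in Python (ours, purely illustrative; the function name and the figures are not part of any protocol): congestion over a brief interval is the fraction of bytes lost or marked, and downstream congestion at a monitoring point is simply whole-path congestion minus upstream congestion.

   def congestion_fraction(bytes_lost_or_marked, bytes_sent):
       """Congestion over a brief interval, per Section 1.1: bytes lost
       (or ECN-marked) divided by bytes sent."""
       if bytes_sent == 0:
           return 0.0
       return bytes_lost_or_marked / bytes_sent

   # A monitoring point part-way along a path (figures are made up).
   whole_path = congestion_fraction(20_000, 1_000_000)  # 2.0% over the whole path
   upstream = congestion_fraction(5_000, 1_000_000)      # 0.5% from source to here

   # Downstream ("rest-of-path") congestion is what remains to be experienced.
   downstream = whole_path - upstream                     # 1.5%
   print(f"upstream={upstream:.1%} downstream={downstream:.1%}")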
1.2. Changes from previous versions

From -03 to -04 (current version):

   Many edits throughout per comments from Bob Briscoe about the intentions of ConEx.

   References section updated; reference to Comcast congestion management system added as ISP example.

   NOTE: there are still sections needing more work, especially the Use Cases.  The whole document also needs trimming in places and checking for repetition or omission.

From -02 to -03:

   Abstract re-written again following comments from John Leslie.

   Use Cases Section re-written.

   Security Considerations section improved.

   This ChangeLog added.

From -01 to -02:

   Extensive changes throughout the document:

   +  Abstract and Introduction re-written.

   +  The Problem section re-written and extended significantly.

   +  Why Now? Section re-written and extended.

   +  Requirements extended.

   +  Security Considerations expanded.

   Other less major changes throughout.

From -00 to -01:

   Significant changes throughout including re-organising the main structure.

   New Abstract and changes to Introduction.

2. The Problem

2.1. Congestion is not the problem

The problem is not congestion itself.  The problem is how best to share available capacity.  When too much traffic meets too little capacity, congestion occurs.  Then we have to share out what capacity there is.  But we should not (and cannot) solve the capacity sharing problem by trying to make it go away - by saying there should somehow be no congestion, whether through slower traffic or more capacity.  That misses the whole point of the Internet: to multiplex or share available capacity at maximum bit-rate.

So, as we say, the problem is not congestion in itself.  Every elastic data transfer should (and usually will) congest a healthy data network.  If it doesn't, its transport protocol is broken.  There should always be periods approaching 100% utilisation at some link along every data path through the Internet, implying that frequent periods of congestion are a healthy sign.  If transport protocols are too weak to congest capacity, they are under-utilising it and hanging around longer than they need to, reducing the capacity available for the next data transfers that might be about to start.

2.2. Increase capacity or manage traffic?

Some say the answer is simply for ISPs to invest in more capacity.  Certainly increasing capacity should make the congested periods during data transfers shorter and the non-congested gaps between them longer.  The argument goes that if capacity were large enough it would make the periods when there is a capacity sharing problem insignificant and not worth solving.

Yet ISPs are facing a quandary: traffic is growing rapidly and traffic patterns are changing significantly (see Section 4 and [Cisco-VNI]).  They know that any increases in capacity will have to be paid for by all their customers, but capacity growth will be of most benefit to the heaviest users.  Faced with these problems, some ISPs are seeking to reduce what they regard as "heavy usage" in order to improve the service experienced by the majority of their customers.

If done properly, managing traffic should be a valid alternative to increasing capacity.
An ISP's customers can vote with their feet if the ISP chooses the wrong balance between managing heavy traffic and charging for too much shared capacity.  Current traffic management techniques (Section 3) fight against the capacity shares that TCP is aiming for.  Ironically, they try to impose something approaching LEDBAT-like behaviour on heavier flows.  But as we have seen, they cannot give LEDBAT the credit for doing this itself - the network just sees a LEDBAT flow as a large amount of volume.

Thus the problem for the IETF is to ensure that ISPs and their equipment suppliers have appropriate protocol support - not just to impose good capacity sharing themselves, but to encourage end-to-end protocols to share out capacity in everyone's best interests.

2.2.1. Making Congestion Visible

Unfortunately ISPs are only able to see limited information about the traffic they forward.  As we will see in Section 3, they are forced to use the only information they do have available, which leads to myopic control that has scant regard for the actual impact of the traffic or the underlying network conditions.  All their approaches are unsound because they cannot measure the most useful metric.  The volume or rate of a given flow or aggregate doesn't directly affect other users, but the congestion it causes does.  This can be seen with a simple illustration.  A 5Mbps flow in an otherwise empty 10Mbps bottleneck causes no congestion and so affects no other users.  By contrast a 1Mbps flow entering a 10Mbps bottleneck that is already fully occupied causes significant congestion and impacts every other user sharing that bottleneck, as well as suffering impairment itself.  So the real problem that needs to be addressed is how to close this information gap.  How can we expose congestion at the IP layer so that it can be used as the basis for measuring the impact of any traffic on the network as a whole?

2.2.2. ECN - a Step in the Right Direction

Explicit Congestion Notification (ECN) [RFC3168] allows routers to explicitly tell end-hosts that they are approaching the point of congestion.  ECN builds on active queue management (AQM) mechanisms such as random early detection (RED) [RFC2309] by allowing the router to mark a packet with a Congestion Experienced (CE) codepoint, rather than dropping it.  The probability of a packet being marked increases with the length of the queue, and thus the rate of CE marks is a guide to the level of congestion at that queue.  This CE codepoint travels forward through the network to the receiver, which then informs the sender that it has seen congestion.  The sender is then required to respond as if it had experienced a packet loss.  Because the CE codepoint is visible in the IP layer, this approach reveals the upstream congestion level for a packet.
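As a rough sketch of this mechanism (ours, not taken from [RFC3168] or [RFC2309]; the thresholds and the linear marking rule are simplifying assumptions), the Python fragment below shows how an AQM queue might turn its length into a CE-marking probability, so that the rate of CE marks grows with the level of congestion at that queue.

   import random

   def mark_probability(queue_len, min_th=5, max_th=15, max_p=0.1):
       """Simplified RED-style rule: no marking below min_th packets,
       probability rising linearly to max_p at max_th, certain above that."""
       if queue_len < min_th:
           return 0.0
       if queue_len >= max_th:
           return 1.0
       return max_p * (queue_len - min_th) / (max_th - min_th)

   def forward(packet, queue_len):
       """Mark ECN-capable packets with CE instead of dropping them."""
       if random.random() < mark_probability(queue_len):
           packet["ce"] = True   # the receiver echoes this back to the sender
       return packet

   print(forward({"ce": False}, queue_len=12))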
So is ECN the solution?  Alas not.  ECN does allow downstream nodes to measure the upstream congestion for any flow, but this is not enough.  This can make a receiver accountable for the congestion caused by incoming traffic.  But a receiver can only control incoming congestion indirectly, by politely asking the sender to control it.  A receiver cannot make a sender install an adaptive codec, or install LEDBAT instead of TCP.  And a receiver cannot ask an attacker to stop flooding it with traffic.  What is needed is knowledge of the downstream congestion level, for which you need additional information that is still concealed from the network - by design.

3. Existing Approaches to Traffic Control

Existing approaches intended to address the problems outlined above can be broadly divided into two groups: those that passively monitor traffic and can thus measure the apparent impact of a given flow of packets, and those that can actively discriminate against certain packets, flows, applications or users based on various characteristics or metrics.

3.1. Layer 3 Measurement

L3 measurement of traffic relies on using the information that can be measured directly or is revealed in the IP header of the packet (or lower layers).  Architecturally, L3 measurement is best since it fits with the hourglass design of the Internet [RFC3439], which asserts that "the complexity of the Internet belongs at the edges, and the IP layer of the Internet should remain as simple as possible."

3.1.1. Volume Accounting

Volume accounting is a technique that is often used to discriminate between heavy and light users.  The volume of traffic sent by a given user or network is one of the easiest pieces of information to monitor in a network.  Measuring the size of every packet from the header and adding them up is a simple operation.  Consequently this has long been a favoured measure used by operators to control their customers.

The precise manner in which this volume information is used may vary.  Typically ISPs may impose an overall volume cap on their customers (perhaps 10 Gbytes a month).  Alternatively they may decide that the heaviest users each month are subjected to some sanction.

Volume is naively thought to indicate the impact that one party's traffic has on others.  But the same volume can cause very different impacts on others if it is transferred at slightly different times, or between slightly different endpoints.  Also the impact on others greatly depends on how responsive the transport is to congestion, whether responsive (TCP), very responsive (LEDBAT), aggressive (multiple TCPs) or totally unresponsive.

3.1.2. Rate Measurement

Rate measurements might be thought indicative of the impact of one aggregate of traffic on others, and rate is often limited to avoid impact on others.  However such limits generally constrain everyone much more than they need to, just in case most parties send fast at the same time.  And such limits constrain everyone too little at other times, when everyone actually does send fast at the same time.

The problem with measuring rate is that it doesn't say how much of the shared capacity a sender occupies over time, nor whether the high rate of one user comes at times when others want a high rate.

3.2. Higher Layer Discrimination

Over recent years a number of traffic management techniques have emerged that explicitly differentiate between different traffic types, applications and even users.  This is done because ISPs and operators feel they need such techniques to better control a new raft of applications that break some of the implicit design assumptions behind TCP (short-lived flows, limited flows per connection, generally between server and client).
3.2.1. Bottleneck Rate Policing

Bottleneck flow rate policers such as [XCHOKe] and [pBox] have been proposed as approaches for rate policing traffic.  But they must be deployed at bottlenecks in order to work.  Unfortunately, capacity sharing is not only about the congestion-responsive behaviour of each flow, but also about how long the flows occupy the capacity and the combined total of multiple flows.  Such rate policers also make an assumption about what constitutes acceptable per-flow behaviour.  If these bottleneck policers were widely deployed, the Internet could find itself with one universal rate adaptation policy embedded throughout the network.  With TCP's congestion control algorithm approaching its scalability limits as network bandwidth continues to increase, new algorithms are being developed for high-speed congestion control.  Embedding assumptions about acceptable rate adaptation would make evolution to such new algorithms extremely painful.

3.2.2. DPI and Application Rate Policing

Some operators use deep packet inspection (DPI) and traffic analysis to identify certain applications they believe to have an excessive impact on the network.  ISPs generally pick on applications that they judge as low value to the customer in question and high impact on other customers.  A common example is peer-to-peer file-sharing.  Having identified a flow as belonging to such an application, the operator uses differential scheduling to limit the impact of that flow on others, which usually limits its throughput as well.  This has fuelled the on-going battle between application developers and DPI vendors.

When operators first started to limit the throughput of P2P, it soon became common knowledge that turning on encryption could boost your throughput.  The DPI vendors then improved their equipment so that it could identify P2P traffic by the pattern of packets it sends.  This risks becoming an endless vicious cycle - an arms race that neither side can win.  Furthermore such techniques may put the operator in direct conflict with its customers, regulators and content providers.

4. Why Now?

The accountability and capacity sharing problems highlighted so far have always characterised the Internet to some extent.  In 1988 Van Jacobson coded capacity sharing into TCP's e2e congestion control algorithms [TCPcc].  But fair queuing algorithms were already being written for network operators to ensure each active user received an equal share of a link and couldn't game the system [RFC0970].  The two approaches have divergent objectives, but they have co-existed ever since.

The main new factor has been the introduction of residential broadband, making 'always-on' available to all, not just campuses and enterprises.  Neither TCP nor approaches like fair queuing take account of how much of each user's data is occupying a link over time, which can significantly reduce the capacity available to lighter usage.  Therefore residential ISPs have been introducing new traffic management equipment that can prioritise based on each customer's usage volume, e.g. [Comcast].  Otherwise capacity upgrades get eaten up by transfers of large amounts of data, with little gain for interactive usage [BB-Incentive].
In campus networks, capacity upgrades are the easiest way to mitigate the inability of TCP or FQ to take account of activity over time.  But capacity upgrades are much more expensive in residential broadband networks that are spread over large geographic areas, and customers will only be happy to pay more for their service if the majority can see a significant benefit.

However, these traffic management techniques fight against the capacity shares that e2e protocols are aiming for, rather than working with them.  And the more optimal ISPs try to make their controls, the more they need application knowledge within the network - which isn't how the Internet was designed to work.  Congestion exposure hasn't been considered before, because the depth of the problem has only recently been understood.  We now understand that both networks and end-systems need to focus on contribution to congestion, not volume or rate.  Then application knowledge is only needed on the end-system, where it should be.  But the reason this isn't happening is that the network cannot see the information it needs (congestion).

As long as ISPs continue to use rate and volume as the key metrics for determining when to control traffic, there is no incentive to use LEDBAT or other low-congestion protocols to improve the performance of competing interactive traffic.  We believe that congestion exposure gives ISPs the information they need to be able to discriminate in favour of such low-congestion transports.  In turn this will give users a direct benefit from using such transports and so encourage their wider use.

5. Requirements for a Solution

This section proposes some requirements for any solution to this problem.  We believe that a solution that meets most of these requirements is likely to be better than one that doesn't, but we recognise that if a working group is established in this area, it may have to make tradeoffs.

o  Allow both upstream and downstream congestion to be visible at the IP layer -- visibility at the IP layer allows congestion in the heart of the network to be monitored at the edges without deploying complicated and intrusive equipment such as DPI boxes.  This gives several advantages:

   1.  It enables bulk policing of traffic based on the congestion it is actually going to cause in the network.

   2.  It allows the amount of congestion across ISP borders to be monitored.

   3.  It supports a diversity of intra-domain and inter-domain congestion management practices.

   4.  It allows the contribution to congestion over time to be counted as easily as volume can be counted today.

   5.  It supports contractual arrangements for managing traffic (acceptable use policies, SLAs, etc.) between just the two parties exchanging traffic across their point of attachment, without involving others.

o  Avoid making assumptions about the behaviour of specific applications (i.e. be agnostic to application and transport behaviour).

o  Support the widest possible range of transport protocols for the widest range of data types (elastic, inelastic, real-time, background, etc.) -- don't force a "universal rate adaptation policy" such as TCP-friendliness [RFC3448].

o  Be responsive to real-time congestion in the network.
o  Allow incremental deployment of the solution, and ideally design for permanent partial deployment, to increase the chances of successful deployment.

o  Ensure packets supporting congestion exposure are distinguishable from others, so that each transport can control when it chooses to deploy congestion exposure, and ISPs can manage the two types of traffic distinctly.

o  Support mechanisms that ensure the integrity of congestion notifications, thus making it hard for a user or network to distort the congestion signal.

o  Be robust in the face of DoS attacks, so that congestion information can be used to identify and limit DoS traffic and to protect the hosts and network elements implementing congestion exposure.

Many of these requirements are by no means unique to the problem of congestion exposure.  Incremental deployment, for instance, is a critical requirement for any new protocol that affects something as fundamental as IP.  Being robust under attack is also a pre-requisite for any protocol to succeed in the real Internet; this is covered in more detail in Section 9.

6. A Strawman Congestion Exposure Protocol

In this section we explore a simple strawman protocol that would solve the congestion exposure problem.  This protocol neatly illustrates how a solution might work.  A practical implementation of this protocol has been produced, and both simulations and real-life testing show that it works.  The protocol is based on a concept known as re-feedback [Re-fb] and builds on existing active queue management techniques like RED [RFC2309] and ECN [RFC3168] that network elements can already use to measure and expose congestion.

Re-feedback, standing for re-inserted feedback, is a system designed to allow end-hosts to reveal to the network information about their network path that they have received via conventional feedback (for instance congestion).

In our strawman protocol we imagine that packets have two "congestion" fields in their IP header:

o  The first is a congestion experienced field to record the upstream congestion level along the path.  Routers indicate their current congestion level by updating this field in every packet.  As the packet traverses the network it builds up a record of the overall congestion along its path in this field.  This data is sent back to the sender, who uses it to determine its transmission rate.

o  The other is a whole-path congestion field that uses re-feedback to record the total congestion along the path.  The sender does this by re-inserting the current congestion level for the path into this field for every packet it transmits.

Thus at any node downstream of the sender you can see the upstream congestion for the packet (the congestion thus far) and the whole-path congestion (with a time lag of 1 RTT), and can calculate the downstream congestion by subtracting one from the other.

So congestion exposure can be achieved by coupling congestion notification from routers with the re-insertion of this information by the sender.  This establishes information symmetry between users and network providers.
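The Python sketch below is our illustration of this arithmetic only, not a wire format; the field names and the additive marking model are simplifying assumptions.  It shows the sender re-inserting its whole-path estimate, routers accumulating upstream congestion, and any node on the path recovering rest-of-path (downstream) congestion by subtraction.

   from dataclasses import dataclass

   @dataclass
   class Packet:
       upstream: float = 0.0    # "congestion experienced" accumulated so far
       whole_path: float = 0.0  # re-inserted by the sender (re-feedback)

   def send(path_estimate):
       """The sender re-inserts its latest whole-path congestion estimate,
       learned from the receiver's feedback roughly one RTT earlier."""
       return Packet(upstream=0.0, whole_path=path_estimate)

   def router(pkt, local_congestion):
       """Each router adds its current congestion level to the upstream
       field (a simplification of probabilistic CE-style marking)."""
       pkt.upstream += local_congestion
       return pkt

   def downstream_at(pkt):
       """Any node can estimate rest-of-path congestion by subtraction."""
       return max(pkt.whole_path - pkt.upstream, 0.0)

   # A path of three routers currently running at 1%, 0.5% and 2% congestion.
   pkt = send(path_estimate=0.035)
   pkt = router(pkt, 0.010)
   pkt = router(pkt, 0.005)
   print(f"rest-of-path seen mid-path: {downstream_at(pkt):.1%}")    # ~2.0%
   pkt = router(pkt, 0.020)
   print(f"rest-of-path at the receiver: {downstream_at(pkt):.1%}")  # ~0.0%

Note that the subtraction only tells the truth if the sender's re-inserted estimate is honest, which is one reason the integrity mechanisms discussed in Section 9 matter.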
7. Use Cases

Once downstream congestion information is revealed in the IP header it can be used for a number of purposes.  Precise details of how the information might be used are beyond the scope of this document, but this section gives an overview of some possible uses.  {ToDo: write up the rest of this section properly.  Concentrate on a couple of the most useful potential use cases (traffic management and accountability?) and mention a couple of more arcane uses (traffic engineering and e2e QoS).  The key thing is to clarify that Congestion Exposure is a tool that can be used for many other things...}

It allows an ISP to accurately identify which traffic is having the greatest impact on the network and either police directly on that basis or use it to determine which users should be policed.  It can form the basis of inter-domain contracts between operators.  It could even be used as the basis for inter-domain routing, thus encouraging operators to invest appropriately in improving their infrastructure.

From Rich Woundy: "I would add a section about use cases.  The primary use case would seem to be an "incentive environment that ensures optimal sharing of capacity", although that could use a better title.  Other use cases may include "DDoS mitigation", "end-to-end QoS", "traffic engineering", and "inter-provider service monitoring".  (You can see I am stealing liberally from the motivation draft here.  We'll have to see whether the other use cases are "core" to this group, or "freebies" that come along with re-ECN as a particular protocol.)"

My take on this is we need to concentrate on one or two major use cases.  The most obvious one is using this to control user behaviour and encourage the use of "congestion friendly" protocols such as LEDBAT.

{Comments from Louise Krug:} Simply say that operators must turn off any kind of rate limitation for LEDBAT traffic, and what that might mean for the amount of bandwidth they see compared to a throttled customer?  You could then extend that to say how it leads to better QoS differentiation under the assumption that there is a broad traffic mix anyway?  Not sure how much detail you want to go into here though?

{ToDo: better incorporate this text from Mirja into Michael's text below.}  Congestion exposure can enable ISPs to give end-systems an incentive to respond to congestion in a way that leads to a better share of the available capacity.  For example, the introduction of a per-user congestion volume might motivate heavy users to back off with their high-bandwidth traffic (when congestion occurs) to save their congestion volume for more time-critical traffic.  If every end-system reacts to congestion in such a way that it avoids congestion for non-critical traffic and allows a certain level of congestion for the more important traffic (from the user's point of view), the overall user experience will improve.  Moreover, the network might be utilised more evenly when less important traffic is shifted to less congested time slots.

7.1. Improved Policing

As described earlier in this document, ISPs throttle traffic not because it causes congestion in the network but because users have exceeded their traffic profile or because individual applications or flows are suspected of causing congestion.  This is done because it is not possible to police only the traffic that is causing congestion.  Congestion exposure allows new possibilities for rate policing.
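As a rough illustration of the kind of policer this makes possible (a sketch under our own assumptions; the allowance figures and interface are hypothetical, loosely anticipating the per-customer allowance of Section 7.1.2), congestion-volume can be metered with a simple token bucket that is drained by bytes sent multiplied by the downstream congestion they declare.

   class CongestionPolicer:
       """Hypothetical per-customer congestion-volume policer (token
       bucket).  allowance_bps is an allowance of congestion-volume in
       bytes per second: bytes sent times the downstream congestion
       they declare."""

       def __init__(self, allowance_bps, burst):
           self.allowance_bps = allowance_bps
           self.burst = burst
           self.bucket = burst

       def tick(self, seconds):
           """Refill the allowance as time passes."""
           self.bucket = min(self.burst,
                             self.bucket + self.allowance_bps * seconds)

       def packet(self, size_bytes, downstream_congestion):
           """Charge the packet by the congestion it declares it will
           cause; return False once the customer should be throttled."""
           self.bucket -= size_bytes * downstream_congestion
           return self.bucket >= 0

   policer = CongestionPolicer(allowance_bps=1_000, burst=10_000)
   policer.tick(1.0)
   # The same 1500-byte packet costs 200 times more at 2% congestion than
   # at 0.01%, so congestion-avoiding (LEDBAT-like) traffic is barely charged.
   print(policer.packet(1500, 0.0001), policer.packet(1500, 0.02))

A transfer that hardly ever sees congestion drains such a bucket far more slowly than the same volume pushed through congested bottlenecks, which is exactly the incentive described in the rest of this section.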
7.1.1. Per Aggregate Policing

A straightforward application of congestion exposure is per-flow or per-aggregate congestion policing.  Instead of limiting flows or aggregates because they have exceeded certain rate thresholds, they can be throttled if they cause too much congestion in the network.  This is throttling on evidence instead of suspicion.

7.1.2. Per Customer Policing

The assumption is that every customer has an allowance of congestion per second.  If a customer causes more congestion than this throughout the network, their traffic can be policed or shaped to ensure they stay within their allowance.  The nice features of this approach are that it sets incentives for the use of congestion-minimising transport protocols such as LEDBAT and allows tariffs that better reflect the relative impact of customers on each other.

Incentives for congestion-minimising transports:  A user generates foreground and background traffic.  Foreground traffic needs to go fast while background traffic can afford to go slow.  With per-customer congestion policing, users can optimise their network experience by using congestion-minimising transport protocols for background traffic and normal TCP-like or even high-speed transport protocols for foreground traffic.  Doing so means background traffic only causes minimal congestion, so foreground traffic can go faster than when both were transmitted over the same transport protocols.  Hence, per-customer congestion policing sets incentives for selfish users to utilise congestion-minimising transport protocols.

Improved tariff structures:  Currently customers are offered tariffs with all manner of differentiators, from peak access rate to volume limit and even specific application rate limits.  Congestion policing offers a better means of distinguishing between tariffs.  Heavy users and light users will get equal access in terms of speed and short-term throughput, but customers that cause more congestion and thus have a bigger impact on others will have to pay for the privilege or suffer reduced throughput during periods of heavy congestion.  However tariffs are a subject best left to the market to determine, not the IETF.

8. IANA Considerations

This document makes no request to IANA.

9. Security Considerations

One intended use of exposed congestion information is to hold the e2e transport and the network accountable to each other.  Therefore, any congestion exposure protocol will have to provide the necessary hooks to mechanisms that can assure the integrity of this information.  The network cannot be relied on to report information to the receiver against its interest, and the same applies to the information the receiver feeds back to the sender, and that the sender reports back to the network.  Looking at each in turn:

o  The Network.  In general it is not in any network's interest to under-declare congestion, since this will have potentially negative consequences for all users of that network.  It may be in its interest to over-declare congestion if, for instance, it wishes to force traffic to move away to a different network or indeed simply wants to reduce the amount of traffic it is carrying.
   Congestion Exposure itself shouldn't significantly alter the incentives for and against honest declaration of congestion by a network, but it is possible to imagine applications of Congestion Exposure that would change these incentives.  There is a general perception among networks that their level of congestion is a business secret.  Actually, in the Internet architecture congestion is one of the worst-kept secrets a network has, because end-hosts can see congestion better than networks can.  Nonetheless, one goal of a congestion exposure protocol is to allow networks to pinpoint whether congestion is on one side or the other of a border.  Although this extra transparency should be good for ISPs with low congestion, those with underprovisioned networks may try to obstruct deployment.

o  The Receiver.  Receivers generally have an incentive to under-declare congestion, since they generally wish to receive the data from the sender as rapidly as possible.  [Savage] explains how a receiver can significantly improve its throughput by failing to declare congestion.  This is a problem with or without Congestion Exposure.  [KGao] explains one possible technique to encourage receivers to be honest in their declaration of congestion.

o  The Sender.  One proposed mechanism for congestion exposure adds a requirement for a sender to let the network know how much congestion it has suffered or caused.  Although most senders currently respond to congestion they are informed of, one use of exposed congestion information might be to encourage sources of excessive congestion to respond more than previously.  Then clearly there may be an incentive for the sender to under-declare congestion.  This will be a particular problem with sources of flooding attacks.

In addition there are potential problems from source spoofing.  A malicious sender can pretend to be another user by spoofing the source address.  A congestion exposure protocol will need to be robust against the injection of false congestion information into the forward path that could distort or disrupt the integrity of the congestion signal.

10. Conclusions

Congestion exposure is the idea that traffic itself indicates to all nodes on its path how much congestion it causes on the entire path.  It enables network operators to police traffic only when it really causes congestion in the Internet, instead of doing blind rate capping independently of the congestion situation.  This change would give users incentives to adopt new transport protocols such as LEDBAT which try to avoid congestion more than TCP does.  Requirements for congestion exposure in the IP header were summarised, one technical solution was presented, and additional use cases for congestion exposure were discussed.

11. Acknowledgements

A number of people other than the authors have provided text and comments for this memo.  The document is being produced in support of a BoF on Congestion Exposure as discussed extensively on the mailing list.

12. Informative References

[BB-Incentive]  MIT Communications Futures Program (CFP) and Cambridge University Communications Research Network, "The Broadband Incentive Problem", September 2005.
[CC-open-research]  Welzl, M., Scharf, M., Briscoe, B., and D. Papadimitriou, "Open Research Issues in Internet Congestion Control", draft-irtf-iccrg-welzl-congestion-control-open-research-05 (work in progress), September 2009.

[Cisco-VNI]  Cisco Systems, Inc., "Cisco Visual Networking Index: Forecast and Methodology, 2008-2013", June 2009.

[Comcast]  Bastian, C., Klieber, T., Livingood, J., Mills, J., and R. Woundy, "Comcast's Protocol-Agnostic Congestion Management System", draft-livingood-woundy-congestion-mgmt-03 (work in progress), February 2010.

[KGao]  Gao, K. and C. Wang, "Incrementally Deployable Prevention to TCP Attack with Misbehaving Receivers", December 2004.

[LEDBAT]  Shalunov, S., "Low Extra Delay Background Transport (LEDBAT)", draft-ietf-ledbat-congestion-00 (work in progress), October 2009.

[RFC0970]  Nagle, J., "On packet switches with infinite storage", RFC 970, December 1985.

[RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3439]  Bush, R. and D. Meyer, "Some Internet Architectural Guidelines and Philosophy", RFC 3439, December 2002.

[RFC3448]  Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 3448, January 2003.

[RFC5594]  Peterson, J. and A. Cooper, "Report from the IETF Workshop on Peer-to-Peer (P2P) Infrastructure, May 28, 2008", RFC 5594, July 2009.

[Re-fb]  Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., Salvatori, A., Soppera, A., and M. Koyabe, "Policing Congestion Response in an Internetwork Using Re-Feedback", ACM SIGCOMM CCR 35(4) 277--288, August 2005.

[Savage]  Savage, S., Wetherall, D., and T. Anderson, "TCP Congestion Control with a Misbehaving Receiver", ACM SIGCOMM Computer Communication Review, 1999.

[TCPcc]  Jacobson, V. and M. Karels, "Congestion Avoidance and Control", Proc. ACM SIGCOMM'88 Symposium, Computer Communication Review 18(4) 314--329, August 1988.

[XCHOKe]  Chhabra, P., Chuig, S., Goel, A., John, A., Kumar, A., Saran, H., and R. Shorey, "XCHOKe: Malicious Source Control for Congestion Avoidance at Internet Gateways", Proceedings of IEEE International Conference on Network Protocols (ICNP-02), November 2002.

[pBox]  Floyd, S. and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking 7(4) 458--472, August 1999.
Authors' Addresses

Toby Moncaster
BT
B54/70, Adastral Park
Martlesham Heath
Ipswich  IP5 3RE
UK

Phone: +44 7918 901170
EMail: toby.moncaster@bt.com

Louise Krug
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich  IP5 3RE
UK

EMail: louise.burness@bt.com

Michael Menth
University of Wuerzburg
Room B206, Institute of Computer Science
Am Hubland
Wuerzburg  D-97074
Germany

Phone: +49 931 888 6644
EMail: menth@informatik.uni-wuerzburg.de

Joao Taveira Araujo
UCL
GS206, Department of Electronic and Electrical Engineering
Torrington Place
London  WC1E 7JE
UK

EMail: j.araujo@ee.ucl.ac.uk

Steven Blake
Extreme Networks
Pamlico Building One, Suite 100
3306/08 E. NC Hwy 54
RTP, NC 27709
US

EMail: sblake@extremenetworks.com

Richard Woundy (editor)
Comcast
Comcast Cable Communications
27 Industrial Avenue
Chelmsford, MA 01824
US

EMail: richard_woundy@cable.comcast.com
URI:   http://www.comcast.com