Transport Area Working Group                             B. Briscoe, Ed.
Internet-Draft                                                A. Jacquet
Intended status: Historic                                             BT
Expires: September 12, 2014                                 T. Moncaster
                                                           Moncaster.com
                                                                A. Smith
                                                                      BT
                                                          March 11, 2014

   Re-ECN: A Framework for adding Congestion Accountability to TCP/IP
                   draft-briscoe-conex-re-ecn-motiv-03

Abstract

This document describes a framework for using a new protocol called re-ECN (re-inserted explicit congestion notification), which can be deployed incrementally around unmodified routers.
Re-ECN allows accurate congestion monitoring throughout the network, thus enabling the upstream party at any trust boundary in the internetwork to be held responsible for the congestion it causes, or allows to be caused. Networks can therefore introduce straightforward accountability for congestion, and policing mechanisms for incoming traffic from end-customers or from neighbouring network domains. As well as giving the motivation for re-ECN, this document gives examples of mechanisms that can use the protocol to ensure data sources respond correctly to congestion, and it describes example mechanisms that ensure the dominant selfish strategy of both network domains and end-points will be to use the protocol honestly.

Note concerning Intended Status: If this draft were ever published as an RFC it would probably have historic status. There is limited space in the IP header, so re-ECN had to compromise by requiring the receiver to be ECN-enabled, otherwise the sender could not use re-ECN. Re-ECN was a precursor to the chartering of the IETF's Congestion Exposure (ConEx) working group, but at the time of chartering too few receivers had ECN enabled, so it was decided to pursue other compromises in order to fit a similar capability into the IP header.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 12, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Motivation
      1.2. Re-ECN Protocol in Brief
      1.3. The Re-ECN Framework
      1.4. Solving Hard Problems
      1.5. The Rest of this Document
   2. Requirements notation
   3. Motivation
      3.1. Policing Congestion Response
           3.1.1. The Policing Problem
           3.1.2. The Case Against Bottleneck Policing
   4. Re-ECN Incentive Framework
      4.1. Revealing Congestion Along the Path
           4.1.1. Positive and Negative Flows
      4.2. Incentive Framework Overview
      4.3. Egress Dropper
      4.4. Ingress Policing
      4.5. Inter-domain Policing
      4.6. Inter-domain Fail-safes
      4.7. The Case against Classic Feedback
      4.8. Simulations
   5. Other Applications of Re-ECN
      5.1. DDoS Mitigation
      5.2. End-to-end QoS
      5.3. Traffic Engineering
      5.4. Inter-Provider Service Monitoring
   6. Limitations
   7. Incremental Deployment
      7.1. Incremental Deployment Features
      7.2. Incremental Deployment Incentives
   8. Architectural Rationale
   9. Related Work
      9.1. Policing Rate Response to Congestion
      9.2. Congestion Notification Integrity
      9.3. Identifying Upstream and Downstream Congestion
   10. Security Considerations
   11. IANA Considerations
   12. Conclusions
   13. Acknowledgements
   14. Comments Solicited
   15. References
       15.1. Normative References
       15.2. Informative References
   Appendix A. Example Egress Dropper Algorithm
   Appendix B. Policer Designs to ensure Congestion Responsiveness
      B.1. Per-user Policing
      B.2. Per-flow Rate Policing
   Appendix C. Downstream Congestion Metering Algorithms
      C.1. Bulk Downstream Congestion Metering Algorithm
      C.2. Inflation Factor for Persistently Negative Flows
   Appendix D. Re-TTL
   Appendix E. Argument for holding back the ECN nonce

Authors' Statement: Status (to be removed by the RFC Editor)

Although the re-ECN protocol is intended to make a simple but far-reaching change to the Internet architecture, the most immediate priority for the authors is to delay any move of the ECN nonce to Proposed Standard status. The argument for this position is developed in Appendix E.

1. Introduction

This document aims to:

o  Describe the motivation for wanting to introduce re-ECN;

o  Provide a very brief description of the protocol;

o  Describe the framework within which the protocol sits;

o  Show how a number of hard problems become much easier to solve once re-ECN is available in IP.

This introduction starts with a run-through of these four points.

1.1. Motivation

Re-ECN is proposed as a means of allowing accurate monitoring of congestion throughout the Internet. The current Internet relies on the vast majority of end-systems running TCP and reacting to detected congestion by reducing their sending rates. Thus congestion control is conducted by the collaboration of the majority of end-systems.

In this situation it is possible for applications that are unresponsive to congestion to take whatever share of bottleneck resources they want from responsive flows; the responsive flows reduce their sending rate in the face of congestion and effectively get out of the way of unresponsive flows.
An increasing proportion of such applications could lead to congestion collapse becoming more common [RFC3714]. Each network has no visibility of whole-path congestion and can only respond to congestion on a local basis.

Using re-ECN will allow any point along a path to calculate congestion both upstream and downstream of that point. As a consequence, policing of congestion /could/ be carried out in the network if end-systems fail to do so. Re-ECN enables flows and users to be policed, and for that policing to happen at network ingress and at network borders.

1.2. Re-ECN Protocol in Brief

In re-ECN each sender makes a prediction of the congestion that each flow will cause and signals that prediction within the IP headers of that flow. The prediction is based on, but not limited to, feedback received from the receiver. Sending a prediction of the congestion gives network equipment a view of the congestion both downstream and upstream.

In order to explain this mechanism we introduce the notion of IP packets carrying different, notional values depending on the state of their header flags:

o  Negative - packets marked by queues when incipient congestion is detected. This is exactly the same as ECN [RFC3168];

o  Positive - packets sent by the sender in proportion to the number of bytes in packets that have been marked negative, according to feedback received from the receiver;

o  Cautious - packets sent whenever the sender cannot be sure of the correct amount of positive bytes to inject into the network, for example at the start of a flow, to indicate that feedback has not been established;

o  Cancelled - packets sent by the sender as positive that get marked as negative by queues in the network due to incipient congestion;

o  Neutral - normal IP packets that indicate to queues that they may be marked negative.

A flow starts to transmit packets.
No feedback has been established, so a number of cautious packets are sent (see the protocol definition [Re-TCP] for an analysis of how many cautious packets should be sent at flow start). The rest are sent as neutral.

The packets traverse a congested queue. A fraction are marked negative as an indication of incipient congestion.

The packets are received by the receiver. The receiver feeds back to the sender a count of the number of packets that have been marked negative. This feedback can be provided either by the transport (e.g. TCP) or by higher-layer control messages.

The sender receives the feedback and then sends a number of positive packets in proportion to the bytes represented by packets that have been marked negative. It is important to note that congestion is revealed by the fraction of marked packets rather than by a field in the IP header. This is due to the limited codepoints available, even including use of the last unallocated bit (sometimes called the evil bit [RFC3514]). Full details of the codepoints used are given in [Re-TCP]. This shortage of codepoints is a constraint of IPv4; ECN is similarly restricted.

The number of bytes inside the negative packets and positive packets should therefore be approximately equal at the termination point of the flow. To put it another way, the balance of negative and positive should be zero.

1.3. The Re-ECN Framework

The introduction of the protocol enables three things:

o  It gives a view of whole-path congestion;

o  It enables policing of flows;

o  It allows networks to monitor the flow of congestion across their borders.

At any point in the network a device can calculate the upstream congestion from the fraction of bytes in negative packets relative to total bytes. It could do this using ECN alone, by calculating the fraction of packets marked Congestion Experienced.
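As an illustrative sketch only (the Packet record and marking names below are our stand-ins for the actual codepoints defined in [Re-TCP]), the upstream congestion fraction just described can be computed by counting the bytes carried in negative-marked packets:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    size: int       # bytes in the packet
    marking: str    # 'neutral', 'negative', 'positive', 'cautious' or 'cancelled'

def upstream_congestion(packets):
    """Fraction of bytes in negative-marked packets, as observed so far
    at any point on the path (meaningful only as an average over many packets)."""
    total = sum(p.size for p in packets)
    negative = sum(p.size for p in packets if p.marking == 'negative')
    return negative / total if total else 0.0

# Example: 3 negative packets out of 100 equal-sized packets
# correspond to 3% upstream congestion at this observation point.
packets = [Packet(1500, 'negative')] * 3 + [Packet(1500, 'neutral')] * 97
```

A real observation point would of course maintain running byte counters rather than a packet list; the sketch only shows the ratio being measured.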
Using re-ECN, a device in the network can calculate downstream congestion by subtracting the fraction of negative packets from the fraction of positive packets.

A user can be restricted to causing only a certain amount of congestion. A policer could be introduced at the ingress of a network that counts the number of positive packets being sent and limits the sender if that sender tries to transmit more positive packets than their allowance.

A user could deliberately ignore some or all of the feedback and transmit packets with a zero or much lower proportion of positive packets than negative packets. To solve this a dropper is proposed. This would be placed at the egress of a network. If the number of negative packets exceeds the number of positive packets then the flow could be dropped or some other sanction enacted.

Policers and droppers could be used between networks in order to police bulk traffic. A whole network harbouring users causing congestion in downstream networks can be held responsible, or policed, by its downstream neighbour.

1.4. Solving Hard Problems

We have already shown that, by making flows declare the level of congestion they cause, they can be policed. More specifically, these are the kinds of problem that can be solved:

o  mitigating distributed denial of service (DDoS);

o  simplifying differentiation of quality of service (QoS);

o  policing compliance to congestion control;

o  inter-provider service monitoring;

o  etc.

Uniquely, re-ECN manages to enable solutions to these problems without unduly stifling innovative new ways to use the Internet. This was a hard balance to strike, given it could be argued that DDoS is an innovative way to use the Internet. The most valuable insight was to allow each network to choose the level of constraint it wishes to impose.
Also, re-ECN has been carefully designed so that networks that choose to use it conservatively can protect themselves against the congestion caused in their network by users on other networks with more liberal policies.

For instance, some network owners want to block applications like voice and video unless their network is compensated for the extra share of bottleneck bandwidth taken. These real-time applications tend to be unresponsive when congestion arises, whereas elastic TCP-based applications back away quickly, ending up taking a much smaller share of congested capacity for themselves. Other network owners want to invest in large amounts of capacity and make their gains from simplicity of operation and economies of scale.

While we have designed re-ECN so that networks can choose to deploy stringent policing, this does not imply we advocate that every network should introduce tight controls on those that cause congestion. Re-ECN has been specifically designed to allow different networks to choose how conservative or liberal they wish to be with respect to policing congestion. But those that choose to be conservative can protect themselves from the excesses that liberal networks allow their users.

Re-ECN allows the more conservative networks to police out flows that have not asked permission to be unresponsive to congestion, not because they are voice or video, but simply because they do not respond to congestion. But it also allows other networks to choose not to police.

Crucially, when flows from liberal networks cross into a conservative network, re-ECN enables the conservative network to apply penalties to its neighbouring networks for the congestion they allow to be caused. And these penalties can be applied to bulk data, without regard to flows.
Then, if unresponsive applications become so dominant that some of the more liberal networks experience congestion collapse [RFC3714], they can change their minds and use re-ECN to apply tighter controls in order to bring congestion back under control.

Re-ECN reduces the need for complex network equipment to perform these functions.

1.5. The Rest of this Document

This document is structured as follows. First the motivation for the new protocol is given (Section 3), followed by the incentive framework that the protocol makes possible (Section 4). Section 5 then describes other important applications of re-ECN, such as policing DDoS, QoS and congestion control. Although these applications do not require standardisation themselves, they are described in a fair degree of detail in order to explain how re-ECN can be used. Given re-ECN proposes to use the last undefined bit in the IPv4 header, we felt it necessary to outline the potential that re-ECN could release in return for being given that bit.

Deployment issues discussed throughout the document are brought together in Section 7, which is followed by a brief section explaining the somewhat subtle rationale for the design from an architectural perspective (Section 8). We end by describing related work (Section 9), listing security considerations (Section 10) and finally drawing conclusions (Section 12).

2. Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

This document first specifies a protocol, then describes a framework that creates the right incentives to ensure compliance with the protocol.
This could cause confusion because the second part of the document considers many cases where malicious nodes may not comply with the protocol. When such contingencies are described, if any of the above keywords are not capitalised, that is deliberate. So, for instance, the following two apparently contradictory sentences would be perfectly consistent: i) x MUST do this; ii) x may not do this.

3. Motivation

3.1. Policing Congestion Response

3.1.1. The Policing Problem

The current Internet architecture trusts hosts to respond voluntarily to congestion. Limited evidence shows that the large majority of end-points on the Internet comply with a TCP-friendly response to congestion. But telephony (and increasingly video) services over the best-effort Internet are attracting the interest of major commercial operations. Most of these applications do not respond to congestion at all; those that can, switch to lower-rate codecs.

Of course, the Internet is intended to support many different application behaviours. But the problem is that this freedom can be exercised irresponsibly. The greater problem is that we will never be able to agree on where the boundary is between responsible and irresponsible. Therefore re-ECN is designed to allow different networks to set their own view of the limit of irresponsibility, and to allow networks that choose a more conservative limit to push back against congestion caused in more liberal networks.

As an example of the impossibility of setting a standard for fairness, mandating TCP-friendliness would set the bar too high for unresponsive streaming media, but still some would say the bar was too low [relax-fairness].
Even though all known peer-to-peer filesharing applications are TCP-compatible, they can cause a disproportionate amount of congestion, simply by using multiple flows and by transferring data continuously relative to other short-lived sessions. On the other hand, if we swung the other way and set the bar low enough to allow streaming media to be unresponsive, we would also allow denial-of-service attacks, which are typically unresponsive to congestion and consist of multiple continuous flows.

Applications that need (or choose) to be unresponsive to congestion can effectively take (some would say steal) whatever share of bottleneck resources they want from responsive flows. Whether or not such free-riding is common, the inability to prevent it increases the risk of poor returns for investors in network infrastructure, leading to under-investment. An increasing proportion of unresponsive or free-riding demand coupled with persistent under-supply is a broken economic cycle. Therefore, if the current, largely co-operative consensus continues to erode, congestion collapse could become more common in more areas of the Internet [RFC3714].

3.1.2. The Case Against Bottleneck Policing

The state of the art in rate policing is the bottleneck policer, which is intended to be deployed at any forwarding resource that may become congested.
Its aim is to detect flows that cause significantly more local congestion than others. Although operators might solve their immediate problems by deploying bottleneck policers, we are concerned that widespread deployment would make it extremely hard to evolve new application behaviours. We believe the IETF should offer re-ECN as the preferred protocol on which to base solutions to the policing problems of operators, because it would not harm evolvability and, frankly, it would be far more effective (see later for why).

Approaches like [XCHOKe] and [pBox] are attractive ways to rate-police traffic without the benefit of whole-path information (such as could be provided by re-ECN). But they must be deployed at bottlenecks in order to work. Unfortunately, a large proportion of traffic traverses at least two bottlenecks (in two access networks), particularly with the current traffic mix where peer-to-peer file-sharing is prevalent. If ECN were deployed, we believe it would be likely that these bottleneck policers would be adapted to combine ECN congestion marking from the upstream path with local congestion knowledge. But then the only useful placement for such policers would be close to the egress of the internetwork.

But then, if these bottleneck policers were widely deployed (which would require them to be more effective than they are now), the Internet would find itself with one universal rate-adaptation policy (probably TCP-friendliness) embedded throughout the network. Given TCP's congestion control algorithm is already known to be hitting its scalability limits and new algorithms are being developed for high-speed congestion control, embedding TCP policing into the Internet would make evolution to new algorithms extremely painful.
If a source wanted to use a different algorithm, it would have to first discover and then negotiate with all the policers on its path, particularly those in the far access network. The IETF has already travelled that path with the Intserv architecture and found that it constrains scalability [RFC2208].

Anyway, if bottleneck policers were ever widely deployed, they would be likely to be bypassed by determined attackers. They inherently have to police fairness per flow or per source-destination pair. Therefore they can easily be circumvented, either by opening multiple flows (by varying the end-point port number), or by spoofing the source address but arranging with the receiver to hide the true return address at a higher layer.

4. Re-ECN Incentive Framework

The aim is to create an incentive environment that ensures optimal sharing of capacity despite everyone acting selfishly (including lying and cheating). Of course, the mechanisms put in place for this can lie dormant wherever co-operation is the norm.

4.1. Revealing Congestion Along the Path

Throughout this document we focus on path congestion. But some forms of fairness, particularly TCP's, also depend on round-trip time. If TCP-fairness is required, we also propose to measure downstream path delay using re-feedback. We give a simple outline of how this could work in Appendix D. However, we do not expect this to be necessary, as researchers tend to agree that only congestion control dynamics need to depend on RTT, not the rate that the algorithm would converge on after a period of stability.

Recall that re-ECN can be used to measure path congestion at any point on the path. End-systems know the whole-path congestion: the receiver knows it from the ratio of negative packets to all other packets it observes, and the sender knows the same information via the feedback.
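To make the arithmetic concrete, here is a small illustrative sketch (ours, not part of the protocol specification; the marking fractions are assumed example values): downstream congestion at an observation point is approximated by subtracting the negative fraction seen so far from the whole-path positive fraction.

```python
def downstream_congestion(positive_fraction, negative_fraction):
    """Approximation used in this document: downstream = positive - negative."""
    return positive_fraction - negative_fraction

# Assumed example fractions: the receiver sees 3% of bytes marked negative
# and feeds this back, so the sender marks 3% of bytes positive.
whole_path = 0.03
# Negative fraction accumulated upstream of three observation points:
upstream = {'L': 0.00, 'M': 0.01, 'N': 0.03}
for point, neg in upstream.items():
    print(point, f"{downstream_congestion(whole_path, neg):.0%}")
```

With the sender re-echoing a 3% negative fraction, points seeing 0%, 1% and 3% negative marking infer roughly 3%, 2% and 0% downstream congestion respectively.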
       +---+    +----+                +----+    +---+
       | S |----| Q1 |----------------| Q2 |----| R |
       +---+    +----+                +----+    +---+

       ^
       |                positive fraction
    3% |-------------------------------+=======
       |                               |
    2% |                               |
       |        negative fraction      |
    1% |       +-----------------------+
       |       |
    0% +--------------------------------------->
          ^                ^                ^
          L                M                N    Observation points

                Figure 1: A 2-Queue Example (Imprecise)

Figure 1 uses a simple network to illustrate how re-ECN allows queues to measure downstream congestion. The receiver counts negative packets as being 3% of all received packets. This fraction is fed back to the sender, which sets 3% of its packets to be positive to match it. This fraction of positive packets can be observed along the path, shown by the horizontal line at 3% in the figure. The negative fraction is shown by the stepped line, which rises to meet the positive fraction line with a step at each queue where packets are marked negative. Two queues are shown (Q1 and Q2) that are currently congested. Each time packets pass through one, a fraction are marked negative (1% at Q1 and 2% at Q2). The approximate downstream congestion can be measured at the observation points shown along the path by subtracting the negative fraction from the positive fraction, as shown in the table below ([Re-TCP] derives these approximations from a precise analysis).
   +-------------------+------------------------------+
   | Observation point | Approx downstream congestion |
   +-------------------+------------------------------+
   |         L         |         3% - 0% = 3%         |
   |         M         |         3% - 1% = 2%         |
   |         N         |         3% - 3% = 0%         |
   +-------------------+------------------------------+

   Table 1: Downstream Congestion Measured at Example Observation Points

All along the path, whole-path congestion remains unchanged, so it can be used as a reference against which to compare upstream congestion. The difference predicts downstream congestion for the rest of the path. Therefore, measuring the fractions of negative and positive packets at any point in the Internet will reveal upstream, downstream and whole-path congestion.

Note: to be absolutely clear, these fractions are averages that would result from the behaviour of the protocol handler mechanically sending positive packets in direct response to incoming feedback; we are not saying any protocol handler has to work with these average fractions directly.

4.1.1. Positive and Negative Flows

In Section 1.2 we introduced the notion of IP packets having different values (negative, positive, cautious, cancelled and neutral). Positive and cautious packets have a value of +1, negative packets -1, and cancelled and neutral packets have zero value.

In the rest of this document we will loosely talk of positive or negative flows. A negative flow is one where more negative bytes than positive bytes arrive at the receiver. Likewise, positive flows are those where more positive bytes arrive than negative bytes. Both of these indicate that the wrong amount of positive bytes has been sent.

4.2. Incentive Framework Overview

Figure 2 sketches the incentive framework that we will describe piece by piece throughout this section. We will do a first pass in overview, then return to each piece in detail.
We re-use the earlier example of how downstream congestion is derived by subtracting upstream congestion from path congestion (Figure 1), but depict multiple trust boundaries to turn it into an internetwork. For clarity, only downstream congestion is shown (the difference between the two earlier plots). The graph displays the downstream path congestion seen in a typical flow as it traverses an example path from sender S to receiver R, across networks N1, N2 & N3. Everyone is shown using re-ECN correctly, but we intend to show why everyone would /choose/ to use it correctly, and honestly.

Three main types of self-interest can be identified:

o  Users want to transmit data across the network as fast as possible, paying as little as possible for the privilege. In this respect, there is no distinction between senders and receivers, but we must be wary of potential malice by one on the other;

o  Network operators want to maximise revenues from the resources they invest in. They compete amongst themselves for the custom of users;

o  Attackers (whether users or networks) want to use any opportunity to subvert the new re-ECN system for their own gain or to damage the service of their victims, whether targeted or random.

       policer                               dropper
          |                                     |
          |                                     |
   S <-----N1----> <---N2---> <---N3--> R     domain
                 |          |
                 |          |
               Border Gateways

               Figure 2: Incentive Framework

Source congestion control: We want to ensure that the sender will throttle its rate as downstream congestion increases. Whatever the agreed congestion response (whether TCP-compatible or some enhanced QoS), to some extent it will always be against the sender's interest to comply.

Ingress policing: But it is in all the network operators' interests to encourage a fair congestion response, so that their investments are employed to satisfy the most valuable demand.
The re-ECN
620 protocol ensures packets carry the necessary information about
621 their own expected downstream congestion so that N1 can deploy a
622 policer at its ingress to check that S is complying with whatever
623 congestion control it should be using (Section 4.4).  If N1 is
624 extremely conservative it could police each flow, but it is likely
625 to just police the bulk amount of congestion each customer causes
626 without regard to flows, or if it is extremely liberal it need not
627 police congestion control at all.  Whatever its choice, it is always
628 preferable to police traffic at the very first ingress into an
629 internetwork, before non-compliant traffic can cause any damage.

631 Edge egress dropper:  If the policer grants a source less right to a
632 high rate the higher the downstream congestion it declares,
633 the source has a clear incentive to understate downstream
634 congestion.  But, if flows of packets are understated when they
635 enter the internetwork, they will have become negative by the time
636 they leave.  So, we introduce a dropper at the last network
637 egress, which drops packets in flows that persistently declare
638 negative downstream congestion (see Section 4.3 for details).

640 Inter-domain traffic policing:  But next we must ask, if congestion
641 arises downstream (say in N3), what is the ingress network's
642 (N1's) incentive to police its customers' response?  If N1 turns a
643 blind eye, its own customers benefit while other networks suffer.
644 This is why all inter-domain QoS architectures (e.g. Intserv,
645 Diffserv) police traffic each time it crosses a trust boundary.
646 We have already shown that re-ECN gives a trustworthy measure of
647 the expected downstream congestion that a flow will cause by
648 subtracting negative volume from positive at any intermediate
649 point on a path.
N3 (say) can use this measure to police all the
650 responses to congestion of all the sources beyond its upstream
651 neighbour (N2), but in bulk with one very simple passive
652 mechanism, rather than per flow, as we will now explain.

654 Emulating policing with inter-domain congestion penalties:  Between
655 high-speed networks, we would rather avoid per-flow policing, and
656 we would rather avoid holding back traffic while it is policed.
657 Instead, once re-ECN has arranged headers to carry downstream
658 congestion honestly, N2 can contract to pay N3 penalties in
659 proportion to a single bulk count of the congestion metrics
660 crossing their mutual trust boundary (Section 4.5).  In this way,
661 N3 puts pressure on N2 to suppress downstream congestion, for
662 every flow passing through the border interface, even though they
663 will all start and end in different places, and even though they
664 may all be allowed different responses to congestion.  The figure
665 depicts this downward pressure on N2 by the solid downward arrow
666 at the egress of N2.  Then N2 has an incentive either to police
667 the congestion response of its own ingress traffic (from N1) or to
668 emulate policing by applying penalties to N1 in turn on the basis
669 of congestion counted at their mutual boundary.  In this recursive
670 way, the incentive for each flow to respond correctly to
671 congestion traces back precisely to each source,
672 despite the mechanism not recognising flows (see Section 5.2).

674 Inter-domain congestion charging diversity:  Any two networks are
675 free to agree any of a range of penalty regimes between themselves,
676 but a regime only provides the right incentives if it stays
677 within the following reasonable constraints: N2 should expect
678 to pay penalties to N3 that increase monotonically
679 with the volume of congestion, and negative penalties are not
680 allowed.
For instance, they may agree an SLA with tiered 681 congestion thresholds, where higher penalties apply the higher the 682 threshold that is broken. But the most obvious (and useful) form 683 of penalty is where N3 levies a charge on N2 proportional to the 684 volume of downstream congestion N2 dumps into N3. In the 685 explanation that follows, we assume this specific variant of 686 volume charging between networks - charging proportionate to the 687 volume of congestion. 689 We must make clear that we are not advocating that everyone should 690 use this form of contract. We are well aware that the IETF tries 691 to avoid standardising technology that depends on a particular 692 business model. And we strongly share this desire to encourage 693 diversity. But our aim is merely to show that border policing can 694 at least work with this one model, then we can assume that 695 operators might experiment with the metric in other models (see 696 Section 4.5 for examples). Of course, operators are free to 697 complement this usage element of their charges with traditional 698 capacity charging, and we expect they will as predicted by 699 economics. 701 No congestion charging to users: Bulk congestion penalties at trust 702 boundaries are passive and extremely simple, and lose none of 703 their per-packet precision from one boundary to the next (unlike 704 Diffserv all-address traffic conditioning agreements, which 705 dissipate their effectiveness across long topologies). But at any 706 trust boundary, there is no imperative to use congestion charging. 707 Traditional traffic policing can be used, if the complexity and 708 cost is preferred. In particular, at the boundary with end 709 customers (e.g. between S and N1), traffic policing will most 710 likely be more appropriate. Policer complexity is less of a 711 concern at the edge of the network. And end-customers are known 712 to be highly averse to the unpredictability of congestion 713 charging. 
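For concreteness, the two penalty regimes described above, tiered SLA thresholds and charging proportional to congestion volume, might be sketched as follows (invented units, rates and thresholds; re-ECN specifies none of this):

```python
def proportional_penalty(congestion_mb, cents_per_mb):
    """The `useful' variant: N3 charges N2 in direct proportion to the
    volume of downstream congestion N2 dumps into N3 over the
    accounting period (units here are megabytes and cents)."""
    return congestion_mb * cents_per_mb

def tiered_penalty(congestion_mb, tiers):
    """SLA variant: `tiers' is an ascending list of (threshold_mb, fee)
    pairs; the highest threshold broken determines the fee, so
    penalties increase monotonically and are never negative."""
    fee_due = 0
    for threshold, fee in tiers:
        if congestion_mb > threshold:
            fee_due = fee
    return fee_due

tiers = [(1, 10), (5, 50), (20, 200)]
print(proportional_penalty(10, 5))  # 50 cents for 10 MB of congestion
print(tiered_penalty(10, tiers))    # 50: the 5 MB tier is the highest broken
```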
715 NOTE WELL: This document neither advocates nor requires congestion
716 charging for end customers; it advocates, but does not require,
717 inter-domain congestion charging.

719 Competitive discipline of inter-domain traffic engineering:  With
720 inter-domain congestion charging, a domain seems to have a
721 perverse incentive to fake congestion; N2's profit depends on the
722 difference between congestion at its ingress (its revenue) and at
723 its egress (its cost).  So, overstating internal congestion seems
724 to increase profit.  However, smart border routing [Smart_rtg] by
725 N1 will bias its routing towards the least-cost routes.  So, N2
726 risks losing all its revenue to competitive routes if it
727 overstates congestion (see Section 5.3).  In other words, if N2 is
728 the least congested route, its ability to raise excess profits is
729 limited by the congestion on the next least congested route.

731 Closing the loop:  All the above elements conspire to trap everyone
732 between two opposing pressures, ensuring the downstream congestion
733 metric arrives at the destination neither above nor below zero.
734 So, we have arrived back where we started in our argument.  The
735 ingress edge network can rely on downstream congestion declared in
736 the packet headers presented by the sender.  So it can police the
737 sender's congestion response accordingly.

739 Evolvability of congestion control:  We have seen that re-ECN enables
740 policing at the very first ingress.  We have also seen that, as
741 flows continue on their path through further networks downstream,
742 re-ECN removes the need for further per-domain ingress policing of
743 all the different congestion responses allowed to each different
744 flow.  This is why the evolvability of re-ECN policing is so
745 superior to bottleneck policing or to any policing of different
746 QoS for different flows.
Even if all access networks choose to
747 conservatively police congestion per flow, each will want to
748 compete with the others to allow new responses to congestion for
749 new types of application.  With re-ECN, each can introduce new
750 controls independently, without coordinating with other networks
751 and without having to standardise anything.  But, as we have just
752 seen, by making inter-domain penalties proportionate to bulk
753 downstream congestion, downstream networks can be agnostic to the
754 specific congestion response for each flow, but they can still
755 apply more penalty the more liberal the ingress access network has
756 been in the response to congestion it allowed for each flow.

758 We now take a second pass over the incentive framework, filling in
759 the detail.

761 4.3.  Egress Dropper

763 As traffic leaves the last network before the receiver (domain N3 in
764 Figure 2), the fraction of positive octets in a flow should match the
765 fraction of negative octets introduced by congestion marking (red
766 packets), leaving a balance of zero.  If it is less (a negative
767 flow), it implies that the source is understating path congestion
768 (which will reduce the penalties that N2 owes N3).

770 If flows are positive, N3 need take no action---this simply means its
771 upstream neighbour is paying more penalties than it needs to, and the
772 source is going slower than it needs to.  But, to protect itself
773 against persistently negative flows, N3 will need to install a
774 dropper at its egress.  Appendix A gives a suggested algorithm for
775 this dropper.  There is no intention that the dropper algorithm needs
776 to be standardised; it is merely provided to show that an efficient,
777 robust algorithm is possible.
But whatever algorithm is used, it must
778 meet the criteria below:

780 o  It SHOULD introduce minimal false positives for honest flows;

782 o  It SHOULD quickly detect and sanction dishonest flows (minimal
783    false negatives);

785 o  It SHOULD be invulnerable to state exhaustion attacks from
786    malicious sources.  For instance, if the dropper uses flow-state,
787    it should not be possible for a source to send numerous packets,
788    each with a different flow ID, to force the dropper to exhaust its
789    memory capacity (rationale for SHOULD: continuously sending keep-
790    alive packets might be perfectly reasonable behaviour, so we can't
791    distinguish a deliberate attack from reasonable levels of such
792    behaviour.  Therefore it is strictly impossible to be invulnerable
793    to such an attack);

795 o  It MUST introduce sufficient loss in goodput so that malicious
796    sources cannot play off losses in the egress dropper against
797    higher allowed throughput.  Salvatori [CLoop_pol] describes this
798    attack, which involves the source understating path congestion
799    then inserting forward error correction (FEC) packets to
800    compensate for expected losses;

802 o  It MUST NOT be vulnerable to `identity whitewashing', where a
803    transport can label a flow with a new ID more cheaply than paying
804    the cost of continuing to use its current ID.

806 Note that the dropper operates on flows but we would like it not to
807 require per-flow state.  This is why we have been careful to ensure
808 that all flows MUST start with a cautious packet.  If a flow does not
809 start with a cautious packet, a dropper is likely to treat it
810 unfavourably.  This risk makes it worth sending a cautious packet at
811 the start of a flow, even though there is a cost to the sender of
812 doing so (positive `worth').  Indeed, with cautious packets, the rate
813 at which a sender can generate new flows can be limited (Appendix B).
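To illustrate the balance test that these criteria constrain, here is a toy per-flow sketch (purely hypothetical; the Appendix A algorithm aggregates flow state and is considerably more robust):

```python
class ToyDropper:
    """Toy egress dropper: tracks a flow's running positive-minus-
    negative byte balance and starts sanctioning the flow once it is
    persistently negative beyond a tolerance threshold."""

    def __init__(self, threshold_bytes=-1000):
        self.balance = 0                  # positive minus negative bytes
        self.threshold = threshold_bytes  # tolerance before sanctioning

    def packet(self, size, worth):
        """worth: +1 (positive/cautious), -1 (negative), 0 (neutral or
        cancelled).  Returns True if this packet should be dropped."""
        self.balance += worth * size
        # Sanction only negative and neutral packets, and only while
        # the flow's balance sits below the tolerance threshold.
        return self.balance < self.threshold and worth <= 0

d = ToyDropper()
print([d.packet(500, -1) for _ in range(3)])  # [False, False, True]
```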
814 In this respect, cautious packets work like Handley's state set-up
815 bit [Steps_DoS].

817 Appendix A also gives an example dropper implementation that
818 aggregates flow state.  Dropper algorithms will often maintain a
819 moving average across flows of the fraction of positive packets.
820 When maintaining an average across flows, a dropper SHOULD only allow
821 flows into the average if they start with a cautious packet, but it
822 SHOULD NOT include cautious packets in the positive packet average.
823 A sender sends cautious packets when it does not have the benefit of
824 feedback from the receiver.  So, counting cautious packets would be
825 likely to make the average unnecessarily positive, providing headroom
826 (or should we say footroom?) for dishonest (negative) traffic.

828 If the dropper detects a persistently negative flow, it SHOULD drop
829 sufficient negative and neutral packets to force the flow to not be
830 negative.  Drops SHOULD be focused on just sufficient packets in
831 misbehaving flows to remove the negative bias while doing minimal
832 extra harm.

834 4.4.  Ingress Policing

836 Access operators who wish to limit the congestion that a sender is
837 able to cause can deploy policers at the very first ingress to the
838 internetwork.  Re-ECN has been designed to avoid the need for
839 bottleneck policing so that we can avoid a future where a single rate
840 adaptation policy is embedded throughout the network.  Instead, re-
841 ECN allows the particular rate adaptation policy to be solely agreed
842 bilaterally between the sender and its ingress access provider ([ref
843 other document] discusses possible ways to signal between them),
844 which allows congestion control to be policed, but maintains its
845 evolvability, requiring only a single, local box to be updated.

847 Appendix B gives examples of per-user policing algorithms.  But there
848 is no implication that these algorithms are to be standardised, or
849 that they are ideal.
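To give a flavour of per-user bulk policing, here is a hypothetical token-bucket sketch (not one of the Appendix B algorithms): each customer is granted a contracted rate of congestion volume, the bytes of positive and cautious packets drain the bucket, and the customer's traffic is limited once it empties.

```python
class CongestionPolicer:
    """Toy per-user bulk policer: a token bucket filled at the user's
    contracted congestion-bitrate and drained by the bytes of positive
    (and cautious) packets the user sends, regardless of flows."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed congestion-bytes per second
        self.depth = depth          # burst tolerance in bytes
        self.tokens = depth
        self.now = 0.0

    def allow(self, t, size, positive):
        # Top up tokens since the last packet, capped at the bucket depth.
        self.tokens = min(self.depth,
                          self.tokens + (t - self.now) * self.fill_rate)
        self.now = t
        if not positive:
            return True             # only congestion-causing bytes are policed
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False                # congestion allowance exhausted
```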
The ingress rate policer is the part of the re-
850 ECN incentive framework that is intended to be the most flexible.
851 Once endpoint protocol handlers for re-ECN and egress droppers are in
852 place, operators can choose exactly which congestion response they
853 want to police, and whether they want to do it per user, per flow or
854 not at all.

856 The re-ECN protocol allows these ingress policers to easily perform
857 bulk per-user policing (Appendix B.1).  This is likely to provide
858 sufficient incentive to the user to correctly respond to congestion
859 without needing the policing function to be overly complex.  If an
860 access operator chooses, it could use per-flow policing according to
861 the widely adopted TCP rate adaptation (Appendix B.2) or other
862 alternatives; however, this would introduce extra complexity to the
863 system.

865 If a per-flow rate policer is used, it should use path (not
866 downstream) congestion as the relevant metric, which is represented
867 by the fraction of octets in packets with positive worth (positive
868 and cautious packets) plus cancelled packets.  Of course, re-ECN
869 provides all the information a policer needs directly in the packets
870 being policed.  So, even policing TCP's AIMD algorithm is relatively
871 straightforward (Appendix B.2).

873 Note that we have included cancelled packets in the measure of path
874 congestion.  Cancelled packets arise when the sender sends a positive
875 packet in response to feedback, but then this positive packet just
876 happens to be congestion marked itself.  One would not normally
877 expect many cancelled packets at the first ingress because one would
878 not normally expect much congestion marking to have been necessary
879 that soon in the path.  However, a home network or campus network may
880 well sit between the sending endpoint and the ingress policer, so
881 some congestion may occur upstream of the policer.
And if congestion 882 does occur upstream, some cancelled packets should be visible, and 883 should be taken into account in the measure of path congestion. 885 But a much more important reason for including cancelled packets in 886 the measure of path congestion at an ingress policer is that a sender 887 might otherwise subvert the protocol by sending cancelled packets 888 instead of neutral packets. Like neutral, cancelled packets are 889 worth zero, so the sender knows they won't be counted against any 890 quota it might have been allowed. But unlike neutral packets, 891 cancelled packets are immune to congestion marking, because they have 892 already been congestion marked. So, it is both correct and useful 893 that cancelled packets should be included in a policer's measure of 894 path congestion, as this removes the incentive the sender would 895 otherwise have to mark more packets as cancelled than it should. 897 An ingress policer should also ensure that flows are not already 898 negative when they enter the access network. As with cancelled 899 packets, the presence of negative packets will typically be unusual. 900 Therefore it will be easy to detect negative flows at the ingress by 901 just detecting negative packets then monitoring the flow they belong 902 to. 904 Of course, even if the sender does operate its own network, it may 905 arrange not to congestion mark traffic. Whether the sender does this 906 or not is of no concern to anyone else except the sender. Such a 907 sender will not be policed against its own network's contribution to 908 congestion, but the only resulting problem would be overload in the 909 sender's own network. 911 Finally, we must not forget that an easy way to circumvent re-ECN's 912 defences is for the source to turn off re-ECN support, by setting the 913 Not-RECT codepoint, implying RFC3168 compliant traffic. 
Therefore an 914 ingress policer should put a general rate-limit on Not-RECT traffic, 915 which SHOULD be lax during early, patchy deployment, but will have to 916 become stricter as deployment widens. Similarly, flows starting 917 without a cautious packet can be confined by a strict rate-limit used 918 for the remainder of flows that haven't proved they are well-behaved 919 by starting correctly (therefore they need not consume any flow 920 state---they are just confined to the `misbehaving' bin if they carry 921 an unrecognised flow ID). 923 4.5. Inter-domain Policing 925 One of the main design goals of re-ECN is for border security 926 mechanisms to be as simple as possible, otherwise they will become 927 the pinch-points that limit scalability of the whole internetwork. 928 We want to avoid per-flow processing at borders and to keep to 929 passive mechanisms that can monitor traffic in parallel to 930 forwarding, rather than having to filter traffic inline---in series 931 with forwarding. Such passive, off-line mechanisms are essential for 932 future high-speed all-optical border interconnection where packets 933 cannot be buffered while they are checked for policy compliance. 935 So far, we have been able to keep the border mechanisms simple, 936 despite having had to harden them against some subtle attacks on the 937 re-ECN design. The mechanisms are still passive and avoid per-flow 938 processing. 940 The basic accounting mechanism at each border interface simply 941 involves accumulating the volume of packets with positive worth 942 (positive and cautious packets), and subtracting the volume of those 943 with negative worth (red packets). Even though this mechanism takes 944 no regard of flows, over an accounting period (say a month) this 945 subtraction will account for the downstream congestion caused by all 946 the flows traversing the interface, wherever they come from, and 947 wherever they go to. 
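In outline, the accounting amounts to no more than this (a sketch in the spirit of the Appendix C.1 pseudo-code, not a copy of it):

```python
def border_account(packets):
    """Accumulate the downstream congestion volume crossing a border
    interface over an accounting period, with no per-flow state.
    `packets' yields (size_bytes, worth) pairs, where worth is +1 for
    positive/cautious packets, -1 for negative (red) packets and 0
    otherwise."""
    volume = 0
    for size, worth in packets:
        volume += worth * size  # positive worth adds, negative subtracts
    return volume               # bulk downstream congestion in bytes

# One positive, one neutral, one negative and one small positive packet:
print(border_account([(1500, 1), (1500, 0), (1500, -1), (500, 1)]))  # 500
```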
The two networks can agree to use this metric 948 however they wish to determine some congestion-related penalty 949 against the upstream network. Although the algorithm could hardly be 950 simpler, it is spelled out using pseudo-code in Appendix C.1. 952 Various attempts to subvert the re-ECN design have been made. In all 953 cases their root cause is persistently negative flows. But, after 954 describing these attacks we will show that we don't actually have to 955 get rid of all persistently negative flows in order to thwart the 956 attacks. 958 In honest flows, downstream congestion is measured as positive minus 959 negative volume. So if all flows are honest (i.e. not persistently 960 negative), adding all positive volume and all negative volume without 961 regard to flows will give an aggregate measure of downstream 962 congestion. But such simple aggregation is only possible if no flows 963 are persistently negative. Unless persistently negative flows are 964 completely removed, they will reduce the aggregate measure of 965 congestion. The aggregate may still be positive overall, but not as 966 positive as it would have been had the negative flows been removed. 968 In Section 4.3 we discussed how to sanction traffic to remove, or at 969 least to identify, persistently negative flows. But, even if the 970 sanction for negative traffic is to discard it, unless it is 971 discarded at the exact point it goes negative, it will wrongly 972 subtract from aggregate downstream congestion, at least at any 973 borders it crosses after it has gone negative but before it is 974 discarded. 976 We rely on sanctions to deter dishonest understatement of congestion. 977 But even the ultimate sanction of discard can only be effective if 978 the sender is bothered about the data getting through to its 979 destination. 
A number of attacks have been identified where a sender 980 gains from sending dummy traffic or it can attack someone or 981 something using dummy traffic even though it isn't communicating any 982 information to anyone: 984 o A host can send traffic with no positive packets towards its 985 intended destination, aiming to transmit as much traffic as any 986 dropper will allow [Bauer06]. It may add forward error correction 987 (FEC) to repair as much drop as it experiences. 989 o A host can send dummy traffic into the network with no positive 990 packets and with no intention of communicating with anyone, but 991 merely to cause higher levels of congestion for others who do want 992 to communicate (DoS). So, to ride over the extra congestion, 993 everyone else has to spend more of whatever rights to cause 994 congestion they have been allowed. 996 o A network can simply create its own dummy traffic to congest 997 another network, perhaps causing it to lose business at no cost to 998 the attacking network. This is a form of denial of service 999 perpetrated by one network on another. The preferential drop 1000 measures in [ref other document] provide crude protection against 1001 such attacks, but we are not overly worried about more accurate 1002 prevention measures, because it is already possible for networks 1003 to DoS other networks on the general Internet, but they generally 1004 don't because of the grave consequences of being found out. We 1005 are only concerned if re-ECN increases the motivation for such an 1006 attack, as in the next example. 1008 o A network can just generate negative traffic and send it over its 1009 border with a neighbour to reduce the overall penalties that it 1010 should pay to that neighbour. It could even initialise the TTL so 1011 it expired shortly after entering the neighbouring network, 1012 reducing the chance of detection further downstream. 
This attack 1013 need not be motivated by a desire to deny service and indeed need 1014 not cause denial of service. A network's main motivator would 1015 most likely be to reduce the penalties it pays to a neighbour. 1016 But, the prospect of financial gain might tempt the network into 1017 mounting a DoS attack on the other network as well, given the gain 1018 would offset some of the risk of being detected. 1020 The first step towards a solution to all these problems with negative 1021 flows is to be able to estimate the contribution they make to 1022 downstream congestion at a border and to correct the measure 1023 accordingly. Although ideally we want to remove negative flows 1024 themselves, perhaps surprisingly, the most effective first step is to 1025 cancel out the polluting effect negative flows have on the measure of 1026 downstream congestion at a border. It is more important to get an 1027 unbiased estimate of their effect, than to try to remove them all. A 1028 suggested algorithm to give an unbiased estimate of the contribution 1029 from negative flows to the downstream congestion measure is given in 1030 Appendix C.2. 1032 Although making an accurate assessment of the contribution from 1033 negative flows may not be easy, just the single step of neutralising 1034 their polluting effect on congestion metrics removes all the gains 1035 networks could otherwise make from mounting dummy traffic attacks on 1036 each other. This puts all networks on the same side (only with 1037 respect to negative flows of course), rather than being pitched 1038 against each other. The network where this flow goes negative as 1039 well as all the networks downstream lose out from not being 1040 reimbursed for any congestion this flow causes. So they all have an 1041 interest in getting rid of these negative flows. 
Networks forwarding 1042 a flow before it goes negative aren't strictly on the same side, but 1043 they are disinterested bystanders---they don't care that the flow 1044 goes negative downstream, but at least they can't actively gain from 1045 making it go negative. The problem becomes localised so that once a 1046 flow goes negative, all the networks from where it happens and beyond 1047 downstream each have a small problem, each can detect it has a 1048 problem and each can get rid of the problem if it chooses to. But 1049 negative flows can no longer be used for any new attacks. 1051 Once an unbiased estimate of the effect of negative flows can be 1052 made, the problem reduces to detecting and preferably removing flows 1053 that have gone negative as soon as possible. But importantly, 1054 complete eradication of negative flows is no longer critical---best 1055 endeavours will be sufficient. 1057 For instance, let us consider the case where a source sends traffic 1058 with no positive packets at all, hoping to at least get as much 1059 traffic delivered as network-based droppers will allow. The flow is 1060 likely to go at least slightly negative in the first network on the 1061 path (N1 if we use the example network layout in Figure 2). If all 1062 networks use the algorithm in Appendix C.2 to inflate penalties at 1063 their border with an upstream network, they will remove the effect of 1064 negative flows. So, for instance, N2 will not be paying a penalty to 1065 N1 for this flow. Further, because the flow contributes no positive 1066 packets at all, a dropper at the egress will completely remove it. 1068 The remaining problem is that every network is carrying a flow that 1069 is causing congestion to others but not being held to account for the 1070 congestion it is causing. 
Whenever the fail-safe border algorithm 1071 (Section 4.6) or the border algorithm to compensate for negative 1072 flows (Appendix C.2) detects a negative flow, it can instantiate a 1073 focused dropper for that flow locally. It may be some time before 1074 the flow is detected, but the more strongly negative the flow is, the 1075 more quickly it will be detected by the fail-safe algorithm. But, in 1076 the meantime, it will not be distorting border incentives. Until it 1077 is detected, if it contributes to drop anywhere, its packets will 1078 tend to be dropped before others if queues use the preferential drop 1079 rules in [ref other document], which discriminate against non- 1080 positive packets. All networks below the point where a flow goes 1081 negative (N1, N2 and N3 in this case) have an incentive to remove 1082 this flow, but the queue where it first goes negative (in N1) can of 1083 course remove the problem for everyone downstream. 1085 In the case of DDoS attacks, Section 5.1 describes how re-ECN 1086 mitigates their force. 1088 4.6. Inter-domain Fail-safes 1090 The mechanisms described so far create incentives for rational 1091 network operators to behave. That is, one operator aims to make 1092 another behave responsibly by applying penalties and expects a 1093 rational response (i.e. one that trades off costs against benefits). 1094 It is usually reasonable to assume that other network operators will 1095 behave rationally (policy routing can avoid those that might not). 1096 But this approach does not protect against the misconfigurations and 1097 accidents of other operators. 1099 Therefore, we propose the following two mechanisms at a network's 1100 borders to provide "defence in depth". Both are similar: 1102 Highly positive flows: A small sample of positive packets should be 1103 picked randomly as they cross a border interface. Then subsequent 1104 packets matching the same source and destination address and DSCP 1105 should be monitored. 
If the fraction of positive packets is well
1106 above a threshold (to be determined by operational practice), a
1107 management alarm SHOULD be raised, and the flow MAY be
1108 automatically subject to focused drop.

1110 Persistently negative flows:  A small sample of congestion marked
1111 (red) packets should be picked randomly as they cross a border
1112 interface.  Then subsequent packets matching the same source and
1113 destination address and DSCP should be monitored.  If the balance
1114 of positive packets minus negative packets (measured in bytes) is
1115 persistently negative, a management alarm SHOULD be raised, and
1116 the flow MAY be automatically subject to focused drop.

1118 Both these mechanisms rely on the fact that highly positive (or
1119 negative) flows will appear more quickly in the sample by selecting
1120 randomly solely from positive (or negative) packets.

1122 4.7.  The Case against Classic Feedback

1124 A system that produces an optimal outcome as a result of everyone's
1125 selfish actions is extremely powerful, especially one that enables
1126 evolvability of congestion control.  But why do we have to change to
1127 re-ECN to achieve it?  Can't classic congestion feedback (as used
1128 already by standard ECN) be arranged to provide similar incentives
1129 and similar evolvability?  Superficially it can.  Kelly's seminal
1130 work showed how we can allow everyone the freedom to evolve whatever
1131 congestion control behaviour is in their application's best interest
1132 but still optimise the whole system of networks and users by placing
1133 a price on congestion to ensure responsible use of this
1134 freedom [Evol_cc].  Kelly used ECN with its classic congestion
1135 feedback model as the mechanism to convey congestion price
1136 information.  The mechanism could be thought of as volume charging,
1137 except that only the volume of packets marked with congestion
1138 experienced (CE) was counted.
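Such Kelly-style volume charging can be sketched in a couple of lines (hypothetical price; the point is that only CE-marked volume is counted, and it only emerges at the receiving side of the feedback loop):

```python
def congestion_charge(packets, price_per_byte):
    """Classic-feedback congestion charging: count only the bytes of
    packets carrying a Congestion Experienced (CE) mark.  Whole-path
    congestion only materialises at the destination, so this charge
    falls on the receiver, who must somehow refer it to the sender."""
    ce_volume = sum(size for size, ce_marked in packets if ce_marked)
    return ce_volume * price_per_byte

# 2000 CE-marked bytes at 2 price units per byte:
print(congestion_charge([(1500, True), (1500, False), (500, True)], 2))
```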
1140 However, below we explain why relying on classic feedback /required/ 1141 congestion charging to be used, while re-ECN achieves the same 1142 powerful outcome (given it is built on Kelly's foundations), but does 1143 not /require/ congestion charging. In brief, the problem with 1144 classic feedback is that the incentives have to trace the indirect 1145 path back to the sender---the long way round the feedback loop. For 1146 example, if classic feedback were used in Figure 2, N2 would have had 1147 to influence N1 via all of N3, R & S rather than directly. 1149 Inability to agree what is happening downstream: In order to police 1150 its upstream neighbour's congestion response, the neighbours 1151 should be able to agree on the congestion to be responded to. 1152 Whatever the feedback regime, as packets change hands at each 1153 trust boundary, any path metrics they carry are verifiable by both 1154 neighbours. But, with a classic path metric, they can only agree 1155 on the /upstream/ path congestion. 1157 Inaccessible back-channel: The network needs a whole-path congestion 1158 metric if it wants to control the source. Classically, whole path 1159 congestion emerges at the destination, to be fed back from 1160 receiver to sender in a back-channel. But, in any data network, 1161 back-channels need not be visible to relays, as they are 1162 essentially communications between the end-points. They may be 1163 encrypted, asymmetrically routed or simply omitted, so no network 1164 element can reliably intercept them. The congestion charging 1165 literature solves this problem by charging the receiver and 1166 assuming this will cause the receiver to refer the charges to the 1167 sender. But, of course, this creates unintended side-effects... 
1169 `Receiver pays' unacceptable: In connectionless datagram networks, 1170 receivers and receiving networks cannot prevent reception from 1171 malicious senders, so `receiver pays' opens them to `denial of 1172 funds' attacks. 1174 End-user congestion charging unacceptable in many societies: Even if 1175 'denial of funds' were not a problem, we know that end-users are 1176 highly averse to the unpredictability of congestion charging and 1177 anyway, we want to avoid restricting network operators to just one 1178 retail tariff. But with classic feedback only an upstream metric 1179 is available, so we cannot avoid having to wrap the `receiver 1180 pays' money flow around the feedback loop, necessarily forcing 1181 end-users to be subjected to congestion charging. 1183 To summarise so far, with classic feedback, policing congestion 1184 response without losing evolvability /requires/ congestion charging 1185 of end-users and a `receiver pays' model, whereas, with re-ECN, it is 1186 still possible to influence incentives using congestion charging but 1187 using the safer `sender pays' model. However, congestion charging is 1188 only likely to be appropriate between domains. So, without losing 1189 evolvability, re-ECN enables technical policing mechanisms that are 1190 more appropriate for end users than congestion pricing. 1192 4.8. Simulations 1194 Simulations of policer and dropper performance done for the multi-bit 1195 version of re-feedback have been included in section 5 "Dropper 1196 Performance" of [Re-fb]. Simulations of policer and dropper for the 1197 re-ECN version described in this document are work in progress. 1199 5. Other Applications of Re-ECN 1201 5.1. DDoS Mitigation 1203 A flooding attack is inherently about congestion of a resource. 1204 Because re-ECN ensures the sources causing network congestion 1205 experience the cost of their own actions, it acts as a first line of 1206 defence against DDoS. 
As load focuses on a victim, upstream queues 1207 grow, requiring honest sources to pre-load packets with a higher 1208 fraction of positive packets. Once downstream queues are so 1209 congested that they are dropping traffic, they will be marking 100% 1210 of the traffic they do forward to negative. Honest sources will 1211 therefore be sending 100% positive packets (and will therefore be 1212 severely rate-limited at the ingress). 1214 Senders under malicious control can either do the same as honest 1215 sources, and be rate-limited at ingress, or they can understate 1216 congestion by sending more neutral RECT packets than they should. If 1217 sources understate congestion (i.e. do not re-echo sufficient 1218 positive packets) and the preferential drop ranking is implemented on 1219 queues ([ref other document]), these queues will preserve positive 1220 traffic until last. So, the neutral traffic from malicious sources 1221 will all be automatically dropped first. Either way, the malicious 1222 sources cannot send more than honest sources. 1224 Further, hosts under malicious control will tend to be re-used for 1225 many different attacks. They will therefore build up a long term 1226 history of causing congestion. As long as the population 1227 of potentially compromisable hosts around the Internet is limited, 1228 the per-user policing algorithms in Appendix B.1 will gradually 1229 throttle down zombies and other launchpads for attacks. Therefore, 1230 widespread deployment of re-ECN could considerably dampen the force 1231 of DDoS. Admittedly, zombie armies could hold their fire for long 1232 enough to build up enough credit in the per-user policers 1233 to launch an attack. But they would then still be limited to no more 1234 throughput than other, honest users.
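The preferential drop ranking described above might be sketched as follows. The marking labels and queue structure are hypothetical simplifications of the codepoint encoding defined in [Re-TCP]; this is a sketch of the principle, not an implementation.

```python
# Sketch of preferential drop under overload: when a queue must shed
# traffic, neutral (RECT) packets are dropped before positive
# (re-echoed) packets, so sources that understate congestion lose
# their traffic first. Marking labels are illustrative only.

DROP_ORDER = {"neutral": 0, "positive": 1}  # lower rank dropped first

def enqueue_with_preferential_drop(queue, packet, capacity):
    """Add a packet to the queue; if over capacity, drop the
    lowest-ranked (most droppable) packet currently queued."""
    queue.append(packet)
    if len(queue) > capacity:
        victim = min(queue, key=lambda p: DROP_ORDER[p["marking"]])
        queue.remove(victim)
    return queue

q = []
for marking in ["positive", "neutral", "positive", "neutral", "positive"]:
    enqueue_with_preferential_drop(q, {"marking": marking}, capacity=3)

# Under overload, only the positive packets survive.
assert all(p["marking"] == "positive" for p in q)
```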
1236 Inter-domain traffic policing (see Section 4.5) ensures that any 1237 network that harbours compromised `zombie' hosts will have to bear 1238 the cost of the congestion caused in downstream networks by traffic 1239 from those zombies. Such networks will be incentivised to deploy 1240 per-user policers that rate-limit hosts that are unresponsive to 1241 congestion so they can only send very slowly into congested paths. 1242 As well as protecting other networks, the extremely poor performance 1243 at any sign of congestion will incentivise the zombie's owner to 1244 clean it up. However, the host should behave normally when using 1245 uncongested paths. 1247 Uniquely, re-ECN handles DDoS traffic without relying on the validity 1248 of identifiers in packets. Certainly the egress dropper relies on 1249 uniqueness of flow identifiers, but not their validity. So if a 1250 source spoofs another address, re-ECN works just as well, as long as 1251 the attacker cannot imitate all the flow identifiers of another 1252 active flow passing through the same dropper (see Section 6). 1253 Similarly, the ingress policer relies on uniqueness of flow IDs, not 1254 their validity. A new flow will only be allowed any rate at 1255 all if it starts with a cautious packet, and the more cautious 1256 packets there are starting new flows, the more they will be limited. 1257 Essentially a re-ECN policer limits the bulk of all congestion 1258 entering the network through a physical interface; limiting the 1259 congestion caused by each flow is merely an optional extra. 1261 5.2. End-to-end QoS 1263 {ToDo: (Section 3.3.2 of [Re-fb] entitled `Edge QoS' gives an outline 1264 of the text that will be added here).} 1266 5.3. Traffic Engineering 1268 Classic feedback makes congestion-based traffic engineering 1269 inefficient too. Network N3 can see which of its alternative 1270 upstream networks is less congested. But it is N1 that 1271 makes the routing decision.
This is why current traffic engineering 1272 requires a continuous message stream from congestion monitors to the 1273 routing controller. And even then the monitors can only be trusted 1274 for /intra-/domain traffic engineering. The trustworthiness of re- 1275 ECN enables /inter-/domain traffic engineering without messaging 1276 overhead. {ToDo: Elaborate} 1278 5.4. Inter-Provider Service Monitoring 1280 {ToDo: } 1282 6. Limitations 1284 {ToDo: See also: slide of limitations} 1286 The known limitations of the re-ECN approach are: 1288 o We still cannot defend against the attack described in Section 10 1289 where a malicious source sends negative traffic through the same 1290 egress dropper as another flow and imitates its flow identifiers, 1291 allowing it to cause an innocent flow to 1292 experience heavy drop. 1294 o Re-feedback for TTL (re-TTL) would also be desirable at the same 1295 time as re-ECN. Unfortunately this requires a further standards 1296 action for the mechanisms briefly described in Appendix D. 1298 o Traffic must be ECN-capable for re-ECN to be effective. The only 1299 defence against malicious users who turn off ECN capability is that 1300 networks are expected to rate limit Not-ECT traffic and to apply 1301 higher drop preference to it during congestion. Although these 1302 are blunt instruments, they at least represent a feasible scenario 1303 for the future Internet where Not-ECT traffic co-exists with re- 1304 ECN traffic, but as a severely hobbled under-class. We recommend 1305 (Section 7.1) that while accommodating a smooth initial transition 1306 to re-ECN, policing policies should gradually be tightened to rate 1307 limit Not-ECT traffic more strictly in the longer term. 1309 o When checking whether a flow is balancing positive packets with 1310 negative packets (measured in bytes), re-ECN can only account for 1311 congestion marking, not drops.
So, whenever a sender experiences 1312 drop, it does not have to re-echo the congestion event by sending 1313 positive packet(s). Nonetheless, it is hardly any advantage to be 1314 able to send faster than other flows only when your traffic is 1315 being dropped and the other traffic isn't. 1317 o We are considering whether it would be useful to 1318 truncate rather than drop packets that appear to be malicious, so 1319 that the feedback loop is not broken but useful data can be 1320 removed. 1322 {ToDo: Monopolies over Routes} 1324 7. Incremental Deployment 1326 7.1. Incremental Deployment Features 1328 The design of the re-ECN protocol started from the fact that the 1329 current ECN marking behaviour of queues was sufficient and that re- 1330 feedback could be introduced around these queues by changing the 1331 sender behaviour but not the routers. Otherwise, if we had required 1332 routers to be changed, the chance of encountering a path that had 1333 every router upgraded would be vanishingly small during early 1334 deployment, giving no incentive to start deployment. Also, as there 1335 is no new forwarding behaviour, routers and hosts do not have to 1336 signal or negotiate anything. 1338 However, networks that choose to protect themselves using re-ECN do 1339 have to add new security functions at their trust boundaries with 1340 others. They distinguish legacy traffic by its ECN field. Traffic 1341 from Not-ECT transports is distinguishable by its Not-ECT marking. 1342 Traffic from RFC3168 compliant ECN transports is distinguished from 1343 re-ECN by which of ECT(0) or ECT(1) is used. We chose to use ECT(1) 1344 for re-ECN traffic deliberately. Existing ECN sources set ECT(0) on 1345 either 50% (the nonce) or 100% (the default) of packets, whereas re- 1346 ECN does not use ECT(0) at all.
We can use this distinguishing 1347 feature of RFC3168 compliant ECN traffic to separate it out for 1348 different treatment at the various border security functions: egress 1349 dropping, ingress policing and border policing. 1351 The general principle we adopt is that an egress dropper will not 1352 drop any legacy traffic, but ingress and border policers will limit 1353 the bulk rate of legacy traffic (Not-ECT, ECT(0) and packets marked 1354 with the unused codepoint as defined in [Re-TCP]) that can enter each 1355 network. Then, during early re-ECN deployment, operators can set 1356 very permissive (or non-existent) rate-limits on legacy traffic, but 1357 once re-ECN implementations are generally available, legacy traffic 1358 can be rate-limited increasingly harshly. Ultimately, an operator 1359 might choose to block all legacy traffic entering its network, or at 1360 least only allow through a trickle. 1362 The more strictly the limits are set, the more RFC3168 ECN 1363 sources will gain by upgrading to re-ECN. Thus, towards the end of 1364 the voluntary incremental deployment period, RFC3168 compliant 1365 transports can be given progressively stronger encouragement to 1366 upgrade. 1368 7.2. Incremental Deployment Incentives 1370 It would only be worth standardising the re-ECN protocol if there 1371 existed a coherent story for how it might be incrementally deployed. 1372 In order for it to have a chance of deployment, everyone who needs to 1373 act must have a strong incentive to act, and the incentives must 1374 arise in the order that deployment would have to happen. Re-ECN 1375 works around unmodified ECN routers, but we can't just discuss why 1376 and how re-ECN deployment might build on ECN deployment, because 1377 there is precious little to build on in the first place. Instead, we 1378 aim to show that re-ECN deployment could carry ECN with it.
We focus 1379 on commercial deployment incentives, although some of the arguments 1380 apply equally to academic or government sectors. 1382 ECN deployment: 1384 ECN is largely implemented in commercial routers, but generally 1385 not as a supported feature, and it has largely not been deployed 1386 by commercial network operators. ECN has been implemented in most 1387 Unix-based operating systems for some time. Microsoft first 1388 implemented ECN in Windows Vista, but it is only on by default for 1389 the server end of a TCP connection. Unfortunately the client end 1390 had to be turned off by default, because a non-zero ECN field 1391 triggers a bug in a legacy home gateway which makes it crash. For 1392 detailed deployment status, see [ECN-Deploy]. We believe the 1393 reason ECN deployment has not happened is twofold: 1395 * ECN requires changes to both routers and hosts. If someone 1396 wanted to sell the improvement that ECN offers, they would have 1397 to co-ordinate deployment of their product with others. An ECN 1398 server only gives any improvement on an ECN network. An ECN 1399 network only gives any improvement if used by ECN devices. 1400 Deployment that requires co-ordination adds cost and delay and 1401 tends to dilute any competitive advantage that might be gained. 1403 * ECN `only' gives a performance improvement. Making a product a 1404 bit faster (whether the product is a device or a network), 1405 isn't usually a sufficient selling point to be worth the cost 1406 of co-ordinating across the industry to deploy it. Network 1407 operators tend to avoid re-configuring a working network unless 1408 launching a new product. 1410 ECN and Re-ECN for Edge-to-edge Assured QoS: 1412 We believe the proposal to provide assured QoS sessions using a 1413 form of ECN called pre-congestion notification (PCN) [RFC5559] is 1414 most likely to break the deadlock in ECN deployment first. 
It 1415 only requires edge-to-edge deployment so it does not require 1416 endpoint support. It can be deployed in a single network, then 1417 grow incrementally to interconnected networks. And it provides a 1418 different `product' (internetworked assured QoS), rather than 1419 merely making an existing product a bit faster. 1421 Not only could this assured QoS application kick-start ECN 1422 deployment, it could also carry re-ECN deployment with it, because 1423 re-ECN can enable the assured QoS region to expand to a large 1424 internetwork where neighbouring networks do not trust each other. 1425 [Re-PCN] argues that re-ECN security should be built in to the QoS 1426 system from the start, explaining why and how. 1428 If ECN and re-ECN were deployed edge-to-edge for assured QoS, 1429 operators would gain valuable experience. They would also clear 1430 away many technical obstacles such as firewall configurations that 1431 block all but the RFC3168 settings of the ECN field and the RE 1432 flag. 1434 ECN in Access Networks: 1436 The next obstacle to ECN deployment would be extension to access 1437 and backhaul networks, where considerable link layer differences 1438 make implementation non-trivial, particularly on congested 1439 wireless links. ECN and re-ECN work fine during partial 1440 deployment, but they will not be very useful if the most congested 1441 elements in networks are the last to support them. Access network 1442 support is one of the weakest parts of this deployment story. All 1443 we can hope is that, once the benefits of ECN are better 1444 understood by operators, they will push for the necessary link 1445 layer implementations as deployment proceeds. 1447 Policing Unresponsive Flows: 1449 Re-ECN allows a network to offer differentiated quality of service 1450 as explained in Section 5.2. But we do not believe this will 1451 motivate initial deployment of re-ECN, because the industry is 1452 already set on alternative ways of doing QoS.
Despite being much 1453 more complicated and expensive, the alternative approaches are 1454 here and now. 1456 But re-ECN is critical to QoS deployment in another respect. It 1457 can be used to prevent applications from taking whatever bandwidth 1458 they choose without asking. 1460 Currently, applications that remain resolute in their lack of 1461 response to congestion are rewarded by other TCP applications. In 1462 other words, TCP is naively friendly, in that it reduces its rate 1463 in response to congestion whether it is competing with friends 1464 (other TCPs) or with enemies (unresponsive applications). 1466 Therefore, those network owners that want to sell QoS will be keen 1467 to ensure that their users can't help themselves to QoS for free. 1468 Given the very large revenues at stake, we believe effective 1469 policing of congestion response will become highly sought after by 1470 network owners. 1472 But this does not necessarily argue for re-ECN deployment. 1473 Network owners might choose to deploy bottleneck policers rather 1474 than re-ECN-based policing. However, under Related Work 1475 (Section 9) we argue that bottleneck policers are inherently 1476 vulnerable to circumvention. 1478 Therefore we believe there will be a strong demand from network 1479 owners for re-ECN deployment so they can police flows that do not 1480 ask to be unresponsive to congestion, in order to protect their 1481 revenues from flows that do ask (QoS). In particular, we suspect 1482 that the operators of cellular networks will want to prevent VoIP 1483 and video applications being used freely on their networks as a 1484 more open market develops in GPRS and 3G devices. 1486 Initial deployments are likely to be isolated to single cellular 1487 networks. Cellular operators would first place requirements on 1488 device manufacturers to include re-ECN in the standards for mobile 1489 devices. In parallel, they would put out tenders for ingress and 1490 egress policers. 
Then, after a while, they would start to tighten 1491 rate limits on Not-ECT traffic from non-standard devices and they 1492 would start policing whatever non-accredited applications people 1493 might install on mobile devices with re-ECN support in the 1494 operating system. This would force even independent mobile device 1495 manufacturers to provide re-ECN support. Early standardisation 1496 across the cellular operators is likely, including interconnection 1497 agreements with penalties for excess downstream congestion. 1499 We suspect some fixed broadband networks (whether cable or DSL) 1500 would follow a similar path. However, we also believe that larger 1501 parts of the fixed Internet would not choose to police on a per- 1502 flow basis. Some might choose to police congestion on a per-user 1503 basis in order to manage heavy peer-to-peer file-sharing, but it 1504 seems likely that a sizeable majority would not deploy any form of 1505 policing. 1507 This hybrid situation begs the question, "How does re-ECN work for 1508 networks that choose to use policing if they connect with others 1509 that don't?" Traffic from non-ECN capable sources will arrive 1510 from other networks and cause congestion within the policed, ECN- 1511 capable networks. So networks that chose to police congestion 1512 would rate-limit Not-ECT traffic throughout their network, 1513 particularly at their borders. They would probably also set 1514 higher usage prices in their interconnection contracts for 1515 incoming Not-ECT and Not-RECT traffic. We assume that 1516 interconnection contracts between networks in the same tier will 1517 include congestion penalties before contracts with provider 1518 backbones do. 1520 A hybrid situation could remain for all time. As was explained in 1521 the introduction, we believe in healthy competition between 1522 policing and not policing, with no imperative to convert the whole 1523 world to the religion of policing.
Networks that chose not to 1524 deploy egress droppers would leave themselves open to being 1525 congested by senders in other networks. But that would be their 1526 choice. 1528 The important aspect of the egress dropper, though, is that it 1529 chiefly protects the network that deploys it. If a network does not 1530 deploy an egress dropper, sources sending into it from other 1531 networks will be able to understate the congestion they are 1532 causing. Whereas, if a network deploys an egress dropper, it can 1533 know how much congestion other networks are dumping into it, and 1534 apply penalties or charges accordingly. So, whether or not a 1535 network polices its own sources at ingress, it is in its interests 1536 to deploy an egress dropper. 1538 Host support: 1540 In the above deployment scenario, host operating system support 1541 for re-ECN came about through the cellular operators demanding it 1542 in device standards (i.e. 3GPP). Of course, increasingly, mobile 1543 devices are being built to support multiple wireless technologies. 1544 So, if re-ECN were stipulated for cellular devices, it would 1545 automatically appear in those devices connected to the wireless 1546 fringes of fixed networks if they coupled cellular with WiFi or 1547 Bluetooth technology, for instance. Also, once implemented in the 1548 operating system of one mobile device, it would tend to be found 1549 in other devices using the same family of operating system. 1551 Therefore, whether or not a fixed network deployed ECN, or 1552 deployed re-ECN policers and droppers, many of its hosts might 1553 well be using re-ECN over it. Indeed, they would be at an 1554 advantage when communicating with hosts across re-ECN policed 1555 networks that rate limited Not-RECT traffic. 1557 Other possible scenarios: 1559 The above is thankfully not the only plausible scenario we can 1560 think of.
One of the many clubs of operators that meet regularly 1561 around the world might decide to act together to persuade a major 1562 operating system manufacturer to implement re-ECN. And they may 1563 agree between them on an interconnection model that includes 1564 congestion penalties. 1566 Re-ECN provides an interesting opportunity for device 1567 manufacturers as well as network operators. Policers can be 1568 configured loosely when first deployed. Then as re-ECN take-up 1569 increases, they can be tightened up, so that a network with re-ECN 1570 deployed can gradually squeeze down the service provided to 1571 RFC3168 compliant devices that have not upgraded to re-ECN. Many 1572 device vendors rely on replacement sales. And operating system 1573 companies rely heavily on new release sales. Also support 1574 services would like to be able to force stragglers to upgrade. 1575 So, the ability to throttle service to RFC3168 compliant operating 1576 systems is quite valuable. 1578 Also, policing unresponsive sources may not be the only or even 1579 the first application that drives deployment. It may be policing 1580 causes of heavy congestion (e.g. peer-to-peer file-sharing). Or 1581 it may be mitigation of denial of service. Or we may be wrong in 1582 thinking simpler QoS will not be the initial motivation for re-ECN 1583 deployment. Indeed, the combined pressure for all these may be 1584 the motivator, but it seems optimistic to expect such a level of 1585 joined-up thinking from today's communications industry. We 1586 believe a single application alone must be a sufficient motivator. 1588 In short, everyone gains from adding accountability to TCP/IP, 1589 except the selfish or malicious. So, deployment incentives tend 1590 to be strong. 1592 8. Architectural Rationale 1594 In the Internet's technical community, the danger of not responding 1595 to congestion is well-understood, as well as its attendant risk of 1596 congestion collapse [RFC3714]. 
However, one side of the Internet's 1597 commercial community considers that the very essence of IP is to 1598 provide open access to the internetwork for all applications. They 1599 see congestion as a symptom of over-conservative investment, and rely 1600 on revising application designs to find novel ways to keep 1601 applications working despite congestion. They argue that the 1602 Internet was never intended to be solely for TCP-friendly 1603 applications. Meanwhile, another side of the Internet's commercial 1604 community believes that it is worthwhile providing a network for 1605 novel applications only if it has sufficient capacity, which can 1606 happen only if a greater share of application revenues can be 1607 /assured/ for the infrastructure provider. Otherwise the major 1608 investments required would carry too much risk and wouldn't happen. 1610 The lesson articulated in [Tussle] is that we shouldn't embed our 1611 view on these arguments into the Internet at design time. Instead we 1612 should design the Internet so that the outcome of these arguments can 1613 get decided at run-time. Re-ECN is designed in that spirit. Once 1614 the protocol is available, different network operators can choose how 1615 liberal they want to be in holding people accountable for the 1616 congestion they cause. Some might boldly invest in capacity and not 1617 police its use at all, hoping that novel applications will result. 1618 Others might use re-ECN for fine-grained flow policing, expecting to 1619 make money selling vertically integrated services. Yet others might 1620 sit somewhere half-way, perhaps doing coarse, per-user policing. All 1621 might change their minds later. But re-ECN always allows them to 1622 interconnect so that the careful ones can protect themselves from the 1623 liberal ones. 
1625 The incentive-based approach used for re-ECN is based on Gibbens and 1626 Kelly's arguments [Evol_cc] on allowing endpoints the freedom to 1627 evolve new congestion control algorithms for new applications. They 1628 ensured responsible behaviour despite everyone's self-interest by 1629 applying pricing to ECN marking, and Kelly had proved stability and 1630 optimality in an earlier paper. 1632 Re-ECN keeps all the underlying economic incentives, but rearranges 1633 the feedback. The idea is to allow a network operator (if it 1634 chooses) to deploy engineering mechanisms like policers at the front 1635 of the network which can be designed to behave /as if/ they are 1636 responding to congestion prices. Rather than having to subject users 1637 to congestion pricing, networks can then use more traditional 1638 charging regimes (or novel ones). But the engineering can constrain 1639 the overall amount of congestion a user can cause. This provides a 1640 buffer against completely outrageous congestion control, but still 1641 makes it easy for novel applications to evolve if they need different 1642 congestion control to the norms. It also allows novel charging 1643 regimes to evolve. 1645 Despite being achieved with a relatively minor protocol change, re- 1646 ECN is an architectural change. Previously, Internet congestion 1647 could only be controlled by the data sender, because it was the only 1648 one both in a position to control the load and in a position to see 1649 information on congestion. Re-ECN levels the playing field. It 1650 recognises that the network also has a role to play in moderating 1651 (policing) congestion control. But policing is only truly effective 1652 at the first ingress into an internetwork, whereas path congestion 1653 was previously only visible at the last egress. So, re-ECN 1654 democratises congestion information. 
Then the choice over who 1655 actually controls congestion can be made at run-time, not design 1656 time---a bit like an aircraft with dual controls. And different 1657 operators can make different choices. We believe non-architectural 1658 approaches to this problem are unlikely to offer more than partial 1659 solutions (see Section 9). 1661 Importantly, re-ECN does not require assumptions about specific 1662 congestion responses to be embedded in any network elements, except 1663 at the first ingress to the internetwork if that level of control is 1664 desired by the ingress operator. But such tight policing will be a 1665 matter of agreement between the source and its access network 1666 operator. The ingress operator need not police congestion response 1667 at flow granularity; it can simply hold a source responsible for the 1668 aggregate congestion it causes, perhaps keeping it within a monthly 1669 congestion quota. Or if the ingress network trusts the source, it 1670 can do nothing. 1672 Therefore, the aim of the re-ECN protocol is NOT solely to police 1673 TCP-friendliness. Re-ECN preserves IP as a generic network layer for 1674 all sorts of responses to congestion, for all sorts of transports. 1675 Re-ECN merely ensures truthful downstream congestion information is 1676 available in the network layer for all sorts of accountability 1677 applications. 1679 The end to end design principle does not say that all functions 1680 should be moved out of the lower layers---only those functions that 1681 are not generic to all higher layers. Re-ECN adds a function to the 1682 network layer that is generic, but was omitted: accountability for 1683 causing congestion. Accountability is not something that an end-user 1684 can provide to themselves. We believe re-ECN adds no more than is 1685 sufficient to hold each flow accountable, even if it consists of a 1686 single datagram. 
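The aggregate, flow-agnostic ingress policing described above, which simply holds a source to a congestion quota, might be sketched like this. The class name, quota size and byte-level accounting are illustrative assumptions for the sketch, not part of the protocol.

```python
# Sketch of flow-agnostic ingress policing: a source is held
# responsible only for the aggregate congestion it causes (its
# positive-marked bytes), drawn against a congestion quota.
# The quota size and refill policy (e.g. monthly) are hypothetical.

class CongestionQuotaPolicer:
    def __init__(self, quota_bytes):
        self.remaining = quota_bytes  # e.g. refilled each month

    def admit(self, size, positive):
        """Decide whether to forward a packet. Positive
        (congestion-declaring) packets consume quota; once the quota
        is exhausted they are rate-limited (here, simply refused)."""
        if not positive:
            return True              # congestion-free traffic is unlimited
        if self.remaining >= size:
            self.remaining -= size
            return True
        return False

policer = CongestionQuotaPolicer(quota_bytes=3000)
assert policer.admit(1500, positive=True)      # first marked packet: ok
assert policer.admit(1500, positive=True)      # quota now exhausted
assert not policer.admit(1500, positive=True)  # over quota: limited
assert policer.admit(1500, positive=False)     # unmarked traffic unaffected
```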
1688 "Accountability" implies being able to identify who is responsible 1689 for causing congestion. However, at the network layer it would NOT 1690 be useful to identify the cause of congestion by adding individual or 1691 organisational identity information, NOR by using source IP 1692 addresses. Rather than bringing identity information to the point of 1693 congestion, we bring downstream congestion information to the point 1694 where the cause can be most easily identified and dealt with. That 1695 is, at any trust boundary congestion can be associated with the 1696 physically connected upstream neighbour that is directly responsible 1697 for causing it (whether intentionally or not). A trust boundary 1698 interface is exactly the place to police or throttle in order to 1699 directly mitigate congestion, rather than having to trace the 1700 (ir)responsible party in order to shut them down. 1702 Some considered that ECN itself was a layering violation. The 1703 reasoning went that the interface to a layer should provide a service 1704 to the higher layer and hide how the lower layer does it. However, 1705 ECN reveals the state of the network layer and below to the transport 1706 layer. A more positive way to describe ECN is that it is like the 1707 return value of a function call to the network layer. It explicitly 1708 returns the status of the request to deliver a packet, by returning a 1709 value representing the current risk that a packet will not be served. 1710 Re-ECN has similar semantics, except the transport layer must try to 1711 guess the return value, then it can use the actual return value from 1712 the network layer to modify the next guess. 1714 The guiding principle behind all the discussion in Section 4.5 on 1715 Policing is that any gain from subverting the protocol should be 1716 precisely neutralised, rather than punished. 
If a gain is punished 1717 to a greater extent than is sufficient to neutralise it, it will most 1718 likely open up a new vulnerability, where the amplifying effect of 1719 the punishment mechanism can be turned on others. 1721 For instance, if possible, flows should be removed as soon as they go 1722 negative, but we do NOT RECOMMEND any attempts to discard such flows 1723 further upstream while they are still positive. Such over-zealous 1724 push-back is unnecessary and potentially dangerous. These flows have 1725 paid their `fare' up to the point they go negative, so there is no 1726 harm in delivering them that far. If someone downstream asks for a 1727 flow to be dropped as near to the source as possible, because they 1728 say it is going to become negative later, an upstream node cannot 1729 test the truth of this assertion. Rather than have to authenticate 1730 such messages, re-ECN has been designed so that flows can be dropped 1731 solely based on locally measurable evidence. A message hinting that 1732 a flow should be watched closely to test for negativity is fine. But 1733 not a message that claims that a positive flow will go negative 1734 later, so it should be dropped. 1736 9. Related Work 1738 {Due to lack of time, this section is incomplete. The reader is 1739 referred to the Related Work section of [Re-fb] for a brief selection 1740 of related ideas.} 1742 9.1. Policing Rate Response to Congestion 1744 ATM network elements send congestion back-pressure 1745 messages [ITU-T.I.371] along each connection, duplicating any end to 1746 end feedback because they don't trust it. On the other hand, re-ECN 1747 ensures information in forwarded packets can be used for congestion 1748 management without requiring a connection-oriented architecture, 1749 re-using the overhead of fields that are already set aside for end to 1750 end congestion control (and routing loop detection in the case of re- 1751 TTL in Appendix D).
1753 We borrowed ideas from policers in the literature ([pBox], [XCHOKe], 1754 AFD, etc.) for our rate equation policer. However, without the benefit 1755 of re-ECN they don't police the correct rate for the condition of 1756 their path. They detect unusually high /absolute/ rates, but only 1757 while the policer itself is congested, because they work by detecting 1758 prevalent flows in the discards from the local RED queue. These 1759 policers must sit at every potential bottleneck, whereas our policer 1760 need only be located at each ingress to the internetwork. As Floyd & 1761 Fall explain [pBox], the limitation of their approach is that a high 1762 sending rate might be perfectly legitimate, if the rest of the path 1763 is uncongested or the round trip time is short. Commercially 1764 available rate policers cap the rate of any one flow, or they 1765 enforce monthly volume caps in an attempt to control high volume 1766 file-sharing. They limit the value a customer derives. They might 1767 also limit the congestion customers can cause, but only as an 1768 accidental side-effect. They actually punish traffic that fills 1769 troughs as much as traffic that causes peaks in utilisation. In 1770 practice network operators need to be able to allocate service by 1771 cost during congestion, and by value at other times.
For each packet it sends, the sender 1785 chooses between the two ECT codepoints in a pseudo-random sequence. 1786 Then, whenever the network marks a packet with CE, if the receiver 1787 wants to deny congestion happened, she has to guess which ECT 1788 codepoint was overwritten. She has only a 50:50 chance of being 1789 correct each time she denies a congestion mark or a drop, which 1790 ultimately will give her away. 1792 The purpose of a network-layer nonce should primarily be protection 1793 of the network, while a transport-layer nonce would be better used to 1794 protect the sender from cheating receivers. Now, the assumption 1795 behind the ECN nonce is that a sender will want to detect whether a 1796 receiver is suppressing congestion feedback. This is only true if 1797 the sender's interests are aligned with the network's, or with the 1798 community of users as a whole. This may be true for certain large 1799 senders, who are under close scrutiny and have a reputation to 1800 maintain. But we have to deal with a more hostile world, where 1801 traffic may be dominated by peer-to-peer transfers, rather than 1802 downloads from a few popular sites. Often the `natural' self- 1803 interest of a sender is not aligned with the interests of other 1804 users. The sender usually wants to transfer data to the receiver 1805 just as quickly as the receiver wants to get it. 1807 In contrast, the re-ECN protocol enables policing of an agreed rate- 1808 response to congestion (e.g. TCP-friendliness) at the sender's 1809 interface with the internetwork. It also ensures downstream networks 1810 can police their upstream neighbours, to encourage them to police 1811 their users in turn. But most importantly, it requires the sender to 1812 declare path congestion to the network and it can remove traffic at 1813 the egress if this declaration is dishonest.
So it can police 1814 correctly, irrespective of whether the receiver tries to suppress 1815 congestion feedback or whether the sender ignores genuine congestion 1816 feedback. Therefore the re-ECN protocol addresses a much wider range 1817 of cheating problems, which includes the one addressed by the ECN 1818 nonce. 1820 9.3. Identifying Upstream and Downstream Congestion 1822 Purple [Purple] proposes that queues should use the CWR flag in the 1823 TCP header of ECN-capable flows to work out path congestion and 1824 therefore downstream congestion in a similar way to re-ECN. However, 1825 because CWR is in the transport layer, it is not always visible to 1826 network layer routers and policers. Purple's motivation was to 1827 improve AQM, not policing. But, of course, nodes trying to avoid a 1828 policer would not be expected to allow CWR to be visible. 1830 10. Security Considerations 1832 {ToDo: enrich this section}{ToDo: Describe attacks by networks on 1833 flows (and by spoofing sources).} {ToDo: Re-ECN & DNS servers} 1835 Nearly the whole of this document concerns security. 1837 11. IANA Considerations 1839 This memo includes no request to IANA. 1841 12. Conclusions 1843 {ToDo:} 1845 13. Acknowledgements 1847 Sebastien Cazalet and Andrea Soppera contributed to the idea of re- 1848 feedback. 
All the following have given helpful comments: Andrea 1849 Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley, 1850 Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright, 1851 John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru 1852 Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd 1853 (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark 1854 Handley (who developed the attack with cancelled packets), Adam 1855 Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft 1856 (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who 1857 complemented our own dummy traffic attacks with others), Liz Maida 1858 (MIT), and comments from participants in the CRN/CFP Broadband and 1859 DoS-resistant Internet working groups. A special thank you to 1860 Alessandro Salvatori for coming up with fiendish attacks on re-ECN. 1862 14. Comments Solicited 1864 Comments and questions are encouraged and very welcome. They can be 1865 addressed to the IETF Transport Area working group's mailing list, 1866 and/or to the authors. 1868 15. References 1870 15.1. Normative References 1872 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1873 Requirement Levels", BCP 14, RFC 2119, March 1997. 1875 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The 1876 Addition of Explicit Congestion Notification (ECN) 1877 to IP", RFC 3168, September 2001. 1879 15.2. Informative References 1881 [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing 1882 the assumptions underlying mechanism design for the 1883 Internet", Proc. Workshop on the Economics of 1884 Networked Systems (NetEcon06), June 2006. 1888 [CLoop_pol] Salvatori, A., "Closed Loop Traffic Policing", 1889 Politecnico Torino and Institut Eurecom Masters 1890 Thesis, September 2005. 1892 [ECN-Deploy] Floyd, S., "ECN (Explicit Congestion Notification) 1893 in TCP/IP; Implementation and Deployment of ECN", 1894 Web-page, May 2004.
1897 [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the 1898 evolution of congestion control", 1899 Automatica 35(12)1969--1985, December 1999. 1903 [ITU-T.I.371] ITU-T, "Traffic Control and Congestion Control in 1904 B-ISDN", ITU-T Rec. I.371 (03/04), March 2004. 1906 [Jiang02] Jiang, H. and C. Dovrolis, "Passive Estimation of 1907 TCP Round-Trip Times", ACM SIGCOMM CCR 32(3)75--88, 1908 July 2002. 1911 [Mathis97] Mathis, M., Semke, J., Mahdavi, J., and T. Ott, 1912 "The Macroscopic Behavior of the TCP Congestion 1913 Avoidance Algorithm", ACM SIGCOMM CCR 27(3)67--82, 1914 July 1997. 1917 [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: 1918 Predictive Active Queue Management Utilizing 1919 Congestion Information", Proc. Local Computer 1920 Networks (LCN 2003), October 2003. 1922 [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., 1923 O'Dell, M., Romanow, A., Weinrib, A., and L. Zhang, 1924 "Resource ReSerVation Protocol (RSVP) Version 1 1925 Applicability Statement Some Guidelines on 1926 Deployment", RFC 2208, September 1997. 1928 [RFC3514] Bellovin, S., "The Security Flag in the IPv4 1929 Header", RFC 3514, April 2003. 1931 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust 1932 Explicit Congestion Notification (ECN) Signaling 1933 with Nonces", RFC 3540, June 2003. 1935 [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding 1936 Congestion Control for Voice Traffic in the 1937 Internet", RFC 3714, March 2004. 1939 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1940 Congestion Control Protocol (DCCP)", RFC 4340, 1941 March 2006. 1943 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram 1944 Congestion Control Protocol (DCCP) Congestion 1945 Control ID 2: TCP-like Congestion Control", 1946 RFC 4341, March 2006. 1948 [RFC4342] Floyd, S., Kohler, E., and J.
Padhye, "Profile for 1949 Datagram Congestion Control Protocol (DCCP) 1950 Congestion Control ID 3: TCP-Friendly Rate Control 1951 (TFRC)", RFC 4342, March 2006. 1953 [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) 1954 Architecture", RFC 5559, June 2009. 1956 [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using 1957 Re-PCN on Bulk Data", 1958 draft-briscoe-re-pcn-border-cheat-03 (work in 1959 progress), October 2009. 1961 [Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. 1962 Smith, "Re-ECN: Adding Accountability for Causing 1963 Congestion to TCP/IP", 1964 draft-briscoe-conex-re-ecn-tcp-03 (work in 1965 progress), March 2014. 1967 [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., 1968 Salvatori, A., Soppera, A., and M. Koyabe, 1969 "Policing Congestion Response in an Internetwork 1970 Using Re-Feedback", ACM SIGCOMM CCR 35(4)277--288, 1971 August 2005. 1974 [Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. 1975 Anderson, "TCP congestion control with a 1976 misbehaving receiver", ACM SIGCOMM CCR 29(5), 1977 October 1999. 1980 [Smart_rtg] Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. 1981 Zhang, "Optimizing Cost and Performance for 1982 Multihoming", ACM SIGCOMM CCR 34(4)79--92, 1983 October 2004. 1986 [Steps_DoS] Handley, M. and A. Greenhalgh, "Steps towards a 1987 DoS-resistant Internet Architecture", Proc. ACM 1988 SIGCOMM workshop on Future directions in network 1989 architecture (FDNA'04) pp 49--56, August 2004. 1991 [Tussle] Clark, D., Sollins, K., Wroclawski, J., and R. 1992 Braden, "Tussle in Cyberspace: Defining Tomorrow's 1993 Internet", ACM SIGCOMM CCR 32(4)347--356, 1994 October 2002. 1997 [XCHOKe] Chhabra, P., Chuig, S., Goel, A., John, A., Kumar, 1998 A., Saran, H., and R. Shorey, "XCHOKe: Malicious 1999 Source Control for Congestion Avoidance at Internet 2000 Gateways", Proceedings of IEEE International 2001 Conference on Network Protocols (ICNP-02), 2002 November 2002.
2005 [pBox] Floyd, S. and K. Fall, "Promoting the Use of End- 2006 to-End Congestion Control in the Internet", IEEE/ 2007 ACM Transactions on Networking 7(4) 458--472, 2008 August 1999. 2011 [relax-fairness] Briscoe, B., Moncaster, T., and L. Burness, 2012 "Problem Statement: Transport Protocols Don't Have 2013 To Do Fairness", 2014 draft-briscoe-tsvwg-relax-fairness-01 (work in 2015 progress), July 2008. 2017 Appendix A. Example Egress Dropper Algorithm 2019 {ToDo: Write up the basic algorithm with flow state, then the 2020 aggregated one.} 2022 Appendix B. Policer Designs to ensure Congestion Responsiveness 2024 B.1. Per-user Policing 2026 User policing requires a policer on the ingress interface of the 2027 access router associated with the user. At that point, the traffic 2028 of the user hasn't diverged on different routes yet; nor has it mixed 2029 with traffic from other sources. 2031 In order to ensure that a user doesn't generate more congestion in 2032 the network than her due share, a modified bulk token-bucket is 2033 maintained with the following parameters: 2035 o b_0 the initial token level 2037 o r the filling rate 2039 o b_max the bucket depth 2041 The same token bucket algorithm is used as in many areas of 2042 networking, but how it is used is very different: 2044 o all traffic from a user over the lifetime of their subscription is 2045 policed in the same token bucket. 2047 o only congestion-declaring packets (positive, cautious and 2048 cancelled) consume tokens 2050 Such a policer will allow network operators to throttle the 2051 contribution of their users to network congestion. This will require 2052 the appropriate contractual terms to be in place between operators 2053 and users.
For instance: a condition for a user to subscribe to a 2054 given network service may be that she should not cause more than a 2055 volume C_user of congestion over a reference period T_user, although 2056 she may carry forward up to N_user times her allowance at the end of 2057 each period. These terms directly set the parameters of the user 2058 policer: 2060 o b_0 = C_user 2062 o r = C_user/T_user 2064 o b_max = b_0 * (N_user +1) 2066 Besides the congestion budget policer above, another user policer may 2067 be necessary to further rate-limit cautious packets, if they are to 2068 be marked rather than dropped (see discussion in [ref other 2069 document]). Rate-limiting cautious packets will prevent high bursts 2070 of new flow arrivals, which is a very useful feature in DoS 2071 prevention. A condition to subscribe to a given network service 2072 would have to be that a user should not generate more than C_cautious 2073 cautious packets, over a reference period T_cautious, with no option 2074 to carry forward any of the allowance at the end of each period. 2075 These terms directly set the parameters of the cautious packet 2076 policer: 2078 o b_0 = C_cautious 2080 o r = C_cautious/T_cautious 2082 o b_max = b_0 2084 T_cautious should be a much shorter period than T_user: for instance 2085 T_cautious could be in the order of minutes while T_user could be in 2086 the order of weeks. 2088 B.2. Per-flow Rate Policing 2090 Whilst we believe that simple per-user policing would be sufficient 2091 to ensure senders comply with congestion control, some operators may 2092 wish to police the rate response of each flow to congestion as well. 2093 Although we do not believe this will be necessary, we include this 2094 section to show how one could perform per-flow policing using 2095 enforcement of TCP-fairness as an example. Per-flow policing aims to 2096 enforce congestion responsiveness on the shortest information 2097 timescale on a network path: packet roundtrips.
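Before turning to per-flow policing, the per-user congestion policer of Appendix B.1 above can be sketched as follows. This is a minimal illustration in Python; the class name, the `marking` strings and the choice of units are our own assumptions, not taken from any specification.

```python
class PerUserCongestionPolicer:
    """Token bucket charged only by congestion-declaring packets.

    Per the contractual terms in the text: the user may cause at most
    C_user of congestion volume per period T_user, and may carry
    forward up to N_user periods' worth of unused allowance.
    """

    def __init__(self, C_user, T_user, N_user):
        self.rate = C_user / T_user          # r = C_user / T_user
        self.depth = C_user * (N_user + 1)   # b_max = b_0 * (N_user + 1)
        self.level = C_user                  # b_0 = C_user
        self.last = 0.0                      # time of last update

    def packet(self, now, size, marking):
        """Account one forwarded packet; True while the user is in credit.

        Only positive, cautious and cancelled packets consume tokens;
        all other traffic passes without charge.
        """
        # Refill at rate r, capped at the bucket depth.
        self.level = min(self.depth,
                         self.level + self.rate * (now - self.last))
        self.last = now
        if marking in ("positive", "cautious", "cancelled"):
            self.level -= size
        return self.level >= 0
```

Note that, unlike a conventional token bucket, this one is charged in units of congestion volume over the lifetime of the subscription, so the bulk of the user's (unmarked) traffic never touches it.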
2099 This again requires that the appropriate terms be agreed between a 2100 network operator and its users, where a congestion responsiveness 2101 policy might be required for the use of a given network service 2102 (perhaps unless the user specifically requests otherwise). 2104 As an example, we describe below how a rate adaptation policer can be 2105 designed when the applicable rate adaptation policy is TCP- 2106 compliance. In that context, the average throughput of a flow will 2107 be expected to be bounded by the value of the TCP throughput during 2108 congestion avoidance, given by Mathis' formula [Mathis97] 2110 x_TCP = k * s / ( T * sqrt(m) ) 2112 where: 2114 o x_TCP is the throughput of the TCP flow in bytes per second, 2116 o k is a constant upper-bounded by sqrt(3/2), 2117 o s is the average packet size of the flow, 2119 o T is the roundtrip time of the flow, 2121 o m is the congestion level experienced by the flow. 2123 We define the marking period N=1/m which represents the average 2124 number of packets between two positive or cancelled packets. Mathis' 2125 formula can be re-written as: 2127 x_TCP = k*s*sqrt(N)/T 2129 We can then get the average inter-mark time in a compliant TCP flow, 2130 dt_TCP, by solving (x_TCP/s)*dt_TCP = N which gives 2132 dt_TCP = sqrt(N)*T/k 2134 We rely on this equation for the design of a rate-adaptation policer 2135 as a variation of a token bucket. In that case a policer has to be 2136 set up for each policed flow. This may be triggered by cautious 2137 packets, with the remainder of flows being all rate limited together 2138 if they do not start with a cautious packet. 2140 Where maintaining per flow state is not a problem, for instance on 2141 some access routers, systematic per-flow policing may be considered. 2142 Should per-flow state be more constrained, rate adaptation policing 2143 could be limited to a random sample of flows exhibiting positive or 2144 cancelled packets.
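As a numerical sanity check on the derivation above, the two forms of Mathis' formula and the inter-mark time can be evaluated together. The values below (1500 byte packets, 100 ms roundtrip, 1% congestion) are purely illustrative.

```python
import math

k = math.sqrt(3.0 / 2.0)   # upper bound on Mathis' constant k
s = 1500.0                 # average packet size (bytes)
T = 0.1                    # roundtrip time (seconds)
m = 0.01                   # congestion level
N = 1.0 / m                # marking period: packets between two marks

# Mathis' formula x_TCP = k*s/(T*sqrt(m)), rewritten with N = 1/m.
# (Since s is in bytes, x_TCP comes out in bytes per second.)
x_tcp = k * s * math.sqrt(N) / T
assert math.isclose(x_tcp, k * s / (T * math.sqrt(m)))

# The average inter-mark time solves (x_TCP/s) * dt_TCP = N ...
dt_tcp = N / (x_tcp / s)
# ... which agrees with the closed form dt_TCP = sqrt(N)*T/k.
assert math.isclose(dt_tcp, math.sqrt(N) * T / k)
```

For these example values the bound is roughly 184 kB/s, with a compliant flow showing a mark about every 0.82 seconds.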
2146 As in the case of user policing, only positive or cancelled packets 2147 will consume tokens; however the amount of tokens consumed will 2148 depend on the congestion signal. 2150 When a new rate adaptation policer is set up for flow j, the 2151 following state is created: 2153 o a token bucket b_j of depth b_max starting at level b_0 2155 o a timestamp t_j = timenow() 2157 o a counter N_j = 0 2159 o a roundtrip estimate T_j 2161 o a filling rate r 2163 When the policing node forwards a packet of flow j that is neither 2164 positive nor cancelled: 2166 o the counter is incremented: N_j += 1 2168 When the policing node forwards a packet of flow j that is positive 2169 or cancelled: 2171 o the counter is incremented: N_j += 1 2173 o the token level is adjusted: b_j += r*(timenow()-t_j) - sqrt(N_j)* 2174 T_j/k 2176 o the counter is reset: N_j = 0 2178 o the timer is reset: t_j = timenow() 2180 An implementation example will be given in a later draft that avoids 2181 having to extract the square root. 2183 Analysis: For a TCP flow, with r = 1 token/sec, on average, 2185 r*(timenow()-t_j) - sqrt(N_j)*T_j/k = dt_TCP - sqrt(N)*T/k = 0 2187 This means that the token level will fluctuate around its initial 2188 level. The depth b_max of the bucket sets the timescale on which the 2189 rate adaptation policy is performed while the filling rate r sets the 2190 trade-off between responsiveness and robustness: 2192 o the higher b_max, the longer it will take to catch greedy flows 2194 o the higher r, the fewer false positives (greedy verdict on 2195 compliant flows) but the more false negatives (compliant verdict 2196 on greedy flows) 2198 This rate adaptation policer requires the availability of a roundtrip 2199 estimate, which may be obtained, for instance, from the application of 2200 re-feedback to the downstream delay (Appendix D) or from passive 2201 estimation [Jiang02].
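The state and update rules above can be sketched as follows. This is a rough Python illustration with our own names; it charges tokens on positive or cancelled packets, per the rule at the start of this appendix, and keeps the explicit square root that the text says a later draft would avoid.

```python
import math

class FlowRatePolicer:
    """Token-bucket variant policing a flow's rate response to congestion.

    On each congestion-declaring (positive or cancelled) packet, the
    bucket is refilled by r*(now - t_j) and charged sqrt(N_j)*T_j/k,
    the inter-mark time a compliant TCP flow would have exhibited.
    """

    K = math.sqrt(3.0 / 2.0)  # upper bound on Mathis' constant k

    def __init__(self, rtt, b_0=5.0, b_max=10.0, r=1.0):
        self.T = rtt      # roundtrip estimate T_j
        self.b = b_0      # token level b_j
        self.b_max = b_max
        self.r = r        # filling rate (tokens/sec)
        self.N = 0        # packets since last mark, N_j
        self.t = 0.0      # timestamp of last mark, t_j

    def forward(self, now, marked):
        """Account one forwarded packet of this flow; return True
        while the flow is within its congestion-response allowance."""
        self.N += 1
        if marked:  # positive or cancelled packet
            charge = math.sqrt(self.N) * self.T / self.K
            self.b = min(self.b_max,
                         self.b + self.r * (now - self.t) - charge)
            self.N = 0
            self.t = now
        return self.b >= 0
```

For a flow whose inter-mark time matches dt_TCP = sqrt(N)*T/k, refill and charge cancel on average, so the token level fluctuates around b_0, as the analysis above states; a flow sending faster than TCP would for the congestion it declares drains the bucket.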
2203 When the bucket of a policer located at the access router (whether it 2204 is a per-user policer or a per-flow policer) becomes empty, the 2205 access router SHOULD drop at least all packets causing the token 2206 level to become negative. The network operator MAY take further 2207 sanctions if the token level of the per-flow policers associated with 2208 a user becomes negative. 2210 Appendix C. Downstream Congestion Metering Algorithms 2211 C.1. Bulk Downstream Congestion Metering Algorithm 2213 To meter the bulk amount of downstream congestion in traffic crossing 2214 an inter-domain border an algorithm is needed that accumulates the 2215 size of positive packets and subtracts the size of negative packets. 2216 We maintain two counters: 2218 V_b: accumulated congestion volume 2220 B: total data volume (in case it is needed) 2222 A suitable pseudo-code algorithm for a border router is as follows: 2224 ==================================================================== 2225 V_b = 0 2226 B = 0 2227 for each Re-ECN-capable packet { 2228 b = readLength(packet) /* set b to packet size */ 2229 B += b /* accumulate total volume */ 2230 if readEECN(packet) == (positive || cautious) { 2231 V_b += b /* increment... */ 2232 } elseif readEECN(packet) == negative { 2233 V_b -= b /* ...or decrement V_b... */ 2234 } /*...depending on EECN field */ 2235 } 2236 ==================================================================== 2238 At the end of an accounting period this counter V_b represents the 2239 congestion volume that penalties could be applied to, as described in 2240 Section 4.5. 2242 For instance, the accumulated volume of congestion through a border 2243 interface over a month might be V_b = 5PB (petabyte = 10^15 byte). 2244 This might have resulted from an average downstream congestion level 2245 of 1% on an accumulated total data volume of B = 500PB. 2247 {ToDo: Include algorithm for precise downstream congestion.} 2249 C.2.
Inflation Factor for Persistently Negative Flows 2251 The following process is suggested to complement the simple algorithm 2252 above in order to protect against the various attacks from 2253 persistently negative flows described in Section 4.5. As explained 2254 in that section, the most important and first step is to estimate the 2255 contribution of persistently negative flows to the bulk volume of 2256 downstream congestion and to inflate this bulk volume as if these 2257 flows weren't there. The process below has been designed to give an 2258 unbiased estimate, but it may be possible to define other processes 2259 that achieve similar ends. 2261 While the above simple metering algorithm is counting the bulk of 2262 traffic over an accounting period, the meter should also select a 2263 subset of the whole flow ID space that is small enough to be able to 2264 realistically measure but large enough to give a realistic sample. 2265 Many different samples of different subsets of the ID space should be 2266 taken at different times during the accounting period, preferably 2267 covering the whole ID space. During each sample, the meter should 2268 count the volume of positive packets and subtract the volume of 2269 negative, maintaining a separate account for each flow in the sample. 2270 Each sample should run for a lot longer than the large majority of 2271 flows, to avoid a bias from missing the starts and ends of flows, 2272 which tend to be positive and negative respectively. 2274 Once the accounting period finishes, the meter should calculate the 2275 total of the accounts V_{bI} for the subset of flows I in the sample, 2276 and the total of the accounts V_{fI} excluding flows with a negative 2277 account from the subset I. Then the weighted mean of all these 2278 samples should be taken: a_S = sum_{forall I} V_{fI} / sum_{forall I} 2279 V_{bI}.
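As a small worked example of this estimator (the per-flow accounts below are invented purely for illustration):

```python
# Per-flow accounts (positive volume minus negative volume, in bytes)
# from three sampled subsets I of the flow ID space.
samples = [
    [120, 80, -50],   # one persistently negative flow in this subset
    [200, -20, 60],
    [90, 40, 30],     # no negative flows in this subset
]

# V_{bI}: total of all accounts in subset I;
# V_{fI}: total excluding flows with a negative account.
V_b_total = sum(sum(flows) for flows in samples)
V_f_total = sum(sum(a for a in flows if a >= 0) for flows in samples)

# Weighted mean over all samples: a_S = sum V_{fI} / sum V_{bI}.
# a_S >= 1, since excluding negative accounts can only raise the total.
a_S = V_f_total / V_b_total
```

Here V_b over the samples is 550 while excluding the negative accounts gives 620, so a_S is about 1.13; the bulk meter's V_b for the accounting period would then be inflated by that factor.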
2281 If V_b is the result of the bulk accounting algorithm over the 2282 accounting period (Appendix C.1) it can be inflated by this factor 2283 a_S to get a good unbiased estimate of the volume of downstream 2284 congestion over the accounting period, a_S * V_b, without being polluted 2285 by the effect of persistently negative flows. 2287 Appendix D. Re-TTL 2289 This Appendix gives an overview of a proposal to overload 2290 the TTL field in the IP header in order to monitor downstream propagation 2291 delay. This is included to show that it would be possible to take 2292 account of RTT if that were deemed desirable. 2294 Delay re-feedback can be achieved by overloading the TTL field, 2295 without changing IP or router TTL processing. A target value for TTL 2296 at the destination would need standardising, say 16. If the path hop 2297 count increased by more than 16 during a routing change, it would 2298 temporarily be mistaken for a routing loop, so this target would need 2299 to be chosen to exceed typical hop count increases. The TCP wire 2300 protocol and handlers would need modifying to feed back the 2301 destination TTL and initialise it. It would be necessary to 2302 standardise the unit of TTL in terms of real time (as was the 2303 original intent in the early days of the Internet). 2305 In the longer term, precision could be improved if routers 2306 decremented TTL to represent exact propagation delay to the next 2307 router. That is, for a router to decrement TTL by, say, 1.8 time 2308 units it would alternate the decrement of every packet between 1 and 2 2309 at a ratio of 1:4. Although this might sometimes require a seemingly 2310 dangerous null decrement, a packet in a loop would still decrement to 2311 zero after 255 time units on average.
As more routers were upgraded 2312 to this more accurate TTL decrement, path delay estimates would 2313 become increasingly accurate despite the presence of some legacy 2314 routers that continued to always decrement the TTL by 1. 2316 Appendix E. Argument for holding back the ECN nonce 2318 The ECN nonce is a mechanism that allows a /sending/ transport to 2319 detect if drop or ECN marking at a congested router has been 2320 suppressed by a node somewhere in the feedback loop---another router 2321 or the receiver. 2323 Space for the ECN nonce was set aside in [RFC3168] (currently 2324 proposed standard) while the full nonce mechanism is specified in 2325 [RFC3540] (currently experimental). The specification of [RFC4340] 2326 (currently proposed standard) requires that "Each DCCP sender SHOULD 2327 set ECN Nonces on its packets...". It also mandates as a requirement 2328 for all CCID profiles that "Any newly defined acknowledgement 2329 mechanism MUST include a way to transmit ECN Nonce Echoes back to the 2330 sender.", therefore: 2332 o The CCID profile for TCP-like Congestion Control [RFC4341] 2333 (currently proposed standard) says "The sender will use the ECN 2334 Nonce for data packets, and the receiver will echo those nonces in 2335 its Ack Vectors." 2337 o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] 2338 recommends that "The sender [use] Loss Intervals options' ECN 2339 Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to 2340 probabilistically verify that the receiver is correctly reporting 2341 all dropped or marked packets." 2343 The primary function of the ECN nonce is to protect the integrity of 2344 the information about congestion: ECN marks and packet drops.
2345 However, when the nonce is used to protect the integrity of 2346 information about packet drops, rather than ECN marks, a transport 2347 layer nonce will always be sufficient (because a drop loses the 2348 transport header as well as the ECN field in the network header), 2349 which would avoid using scarce IP header codepoint space. Similarly, 2350 a transport layer nonce would protect against a receiver sending 2351 early acknowledgements [Savage99]. 2353 If the ECN nonce reveals integrity problems with the information 2354 about congestion, the sending transport can use that knowledge for 2355 two functions: 2357 o to protect its own resources, by allocating them in proportion to 2358 the rates that each network path can sustain, based on congestion 2359 control, 2361 o and to protect congested routers in the network, by drastically 2362 slowing down any connection whose congestion information has been 2363 corrupted. 2365 If the sending transport chooses to act in the interests of congested 2366 routers, it can reduce its rate if it detects that some malicious party 2367 in the feedback loop may be suppressing ECN feedback. But it would only 2368 be useful to congested routers when /all/ senders using them are 2369 trusted to act in the interests of the congested routers.
Certainly, this 2379 is a useful function, but the IETF should carefully decide whether 2380 such a single, very specific case warrants IP header space. 2382 In contrast, Re-ECN allows all routers to fully protect themselves 2383 from such attacks, without having to trust anyone - senders, 2384 receivers, neighbouring networks. Re-ECN is therefore proposed in 2385 preference to the ECN nonce on the basis that it addresses the 2386 generic problem of accountability for congestion of a network's 2387 resources at the IP layer. 2389 Delaying the ECN nonce is justified because the applicability of the 2390 ECN nonce seems too limited for it to consume a two-bit codepoint in 2391 the IP header. It therefore seems prudent to give time for an 2392 alternative way to be found to do the one function the nonce is 2393 essential for. 2395 Moreover, while we have re-designed the Re-ECN codepoints so that 2396 they do not prevent the ECN nonce progressing, the same is not true 2397 the other way round. If the ECN nonce started to see some deployment 2398 (perhaps because it was blessed with proposed standard status), 2399 incremental deployment of Re-ECN would effectively be impossible, 2400 because Re-ECN marking fractions at inter-domain borders would be 2401 polluted by unknown levels of nonce traffic. 2403 The authors are aware that Re-ECN must prove it has the potential it 2404 claims if it is to displace the nonce. Therefore, every effort has 2405 been made to complete a comprehensive specification of Re-ECN so that 2406 its potential can be assessed. We therefore seek the opinion of the 2407 Internet community on whether the Re-ECN protocol is sufficiently 2408 useful to warrant standards action. 
2410 Authors' Addresses 2412 Bob Briscoe (editor) 2413 BT 2414 B54/77, Adastral Park 2415 Martlesham Heath 2416 Ipswich IP5 3RE 2417 UK 2419 Phone: +44 1473 645196 2420 EMail: bob.briscoe@bt.com 2421 URI: http://bobbriscoe.net/ 2423 Arnaud Jacquet 2424 BT 2425 B54/70, Adastral Park 2426 Martlesham Heath 2427 Ipswich IP5 3RE 2428 UK 2430 Phone: +44 1473 647284 2431 EMail: arnaud.jacquet@bt.com 2432 URI: 2434 Toby Moncaster 2435 Moncaster.com 2436 Dukes 2437 Layer Marney 2438 Colchester CO5 9UZ 2439 UK 2441 EMail: toby@moncaster.com 2442 Alan Smith 2443 BT 2444 B54/76, Adastral Park 2445 Martlesham Heath 2446 Ipswich IP5 3RE 2447 UK 2449 Phone: +44 1473 640404 2450 EMail: alan.p.smith@bt.com