idnits 2.17.1 draft-jholland-cb-assisted-cc-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([I-D.ietf-tsvwg-circuit-breaker]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (April 21, 2017) is 2559 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Missing Reference: 'TBD' is mentioned on line 342, but not defined ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) -- Obsolete informational reference (is this intentional?): RFC 5226 (Obsoleted by RFC 8126) Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Transport Area Working Group J. Holland 3 Internet-Draft Akamai Technologies, Inc. 
4 Intended status: Experimental April 21, 2017 5 Expires: October 23, 2017 7 Circuit Breaker Assisted Congestion Control (CBACC): Protocol 8 Specification 9 draft-jholland-cb-assisted-cc-01 11 Abstract 13 This document specifies Circuit Breaker Assisted Congestion Control 14 (CBACC), which provides bandwidth information from senders to 15 intermediate network nodes to enable good decisions for fast-trip 16 Network Transport Circuit Breaker activity 17 ([I-D.ietf-tsvwg-circuit-breaker]) when necessary for network health. 18 CBACC is specifically designed to support protocols using IP 19 multicast, particularly as a supplement to receiver-driven congestion 20 control protocols to help affected networks rapidly detect and 21 mitigate the impact of scenarios in which a network is oversubscribed 22 to flows which are not responsive to congestion. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on October 23, 2017. 41 Copyright Notice 43 Copyright (c) 2017 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. 
Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 59 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 3. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 61 4. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 5 62 5. Protocol Specification . . . . . . . . . . . . . . . . . . . 5 63 5.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 5 64 5.2. Packet Header Fields . . . . . . . . . . . . . . . . . . 6 65 5.2.1. Bandwidth Advertisement . . . . . . . . . . . . . . . 6 66 5.2.1.1. As an IP header option . . . . . . . . . . . . . 6 67 5.2.1.2. Field definitions . . . . . . . . . . . . . . . . 7 68 5.3. States . . . . . . . . . . . . . . . . . . . . . . . . . 8 69 5.3.1. Interface State . . . . . . . . . . . . . . . . . . . 8 70 5.3.2. Flow State . . . . . . . . . . . . . . . . . . . . . 9 71 5.4. Functionality . . . . . . . . . . . . . . . . . . . . . . 10 72 6. Requirements from other building blocks . . . . . . . . . . . 12 73 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 74 8. Security Considerations . . . . . . . . . . . . . . . . . . . 13 75 8.1. Forged Packets . . . . . . . . . . . . . . . . . . . . . 13 76 8.2. Overloading of Slow Paths . . . . . . . . . . . . . . . . 14 77 8.3. Overloading of State . . . . . . . . . . . . . . . . . . 14 78 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15 79 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 80 10.1. Normative References . . . . . . . . . . . . . . . . . . 15 81 10.2. Informative References . . . . . . . . . . . 
. . . . . . 16 82 Appendix A. Overjoining . . . . . . . . . . . . . . . . . . . . 18 83 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 19 85 1. Introduction 87 This document specifies Circuit Breaker Assisted Congestion Control 88 (CBACC). 90 CBACC is a congestion control building block designed for use with IP 91 traffic that has a known maximum bandwidth, which does not reduce its 92 sending rate in response to congestion. CBACC is specifically 93 designed to supplement protocols using receiver-driven multicast 94 congestion control systems that rely on well-behaved receivers to 95 achieve congestion control in a very highly scalable system (up to 96 millions of receivers) without a feedback path that reduces sending 97 rates by senders. Examples of congestion control systems fitting 98 this description include PLM, RLM, RLC, FLID-DL, SMCC, ESMCC, QIRLM, 99 and WEBRC [RFC3738]. 101 CBACC addresses a vulnerability to "overjoining", a condition in 102 which receivers (particularly malicious receivers) subscribe to 103 traffic which, from the sending side, is non-responsive to 104 congestion. Overjoining attacks and the challenges they present are 105 discussed in more detail in Appendix A. 107 A careful reading of the congestion control requirements of UDP Best 108 Practices [I-D.ietf-tsvwg-rfc5405bis] suggests that a network that 109 forwards multicast traffic is required to operate a circuit breaker 110 to maintain network health under a persistent overjoining condition, 111 at a cost of cutting off some or all multicast traffic across the 112 network during high congestion. 114 CBACC provides a mechanism for networks to mitigate the impact of 115 overjoining within a network by introducing a mechanism for 116 communicating the bandwidth of non-responsive flows from the sender 117 of the flow to the transit nodes forwarding the flow. 
The bandwidth 118 information is sufficient to implement a fast-trip circuit breaker 119 [I-D.ietf-tsvwg-circuit-breaker] within a single network node which 120 can specifically block or police flows when receivers have overjoined 121 the network's capacity. 123 In conjunction with receiver counts (e.g. via [RFC6807]) such nodes 124 can also provide much improved network fairness for circuit breaking 125 decisions during an overjoining condition. 127 In addition to streams using multicast receiver-driven congestion 128 control, CBACC may also be suitable for use with other traffic, both 129 unicast and multicast, that does not respond to congestion by 130 reducing sending rates, including certain profiles of RTP [RFC3550] 131 over either unicast or multicast, as well as several tunneling 132 protocols (e.g. AMT [RFC7450] and GRE [RFC2784]) when they are known 133 to carry traffic that would be suitable for CBACC. A complete 134 specification for use of CBACC with unicast protocols and with 135 tunneling protocols is out of scope for this document, though the 136 security issues section does mention a few special considerations for 137 potential unicast usage. 139 CBACC-compliant senders transmit Bandwidth Advertisements through the 140 same transport path as the data traffic, so that circuit breakers can 141 make informed decisions about how flows should be prioritized for 142 circuit breaking. Additionally, CBACC-compliant circuit breakers 143 transmit information to receivers about flows which have been or 144 might soon be circuit-broken, to encourage CBACC-aware applications 145 to use alternate methods to retrieve equivalent (though probably 146 lower-quality and possibly less efficient) data when possible. 148 This document describes a building block as defined in [RFC3048]. 149 This document describes a congestion control building block that 150 conforms to [RFC2357]. 
This document follows the general guidelines 151 provided in [RFC3269], in addition to the requirements on RFCs from 152 [RFC5226] and [RFC3552]. 154 2. Terminology 156 +--------------+----------------------------------------------------+ 157 | Term | Definition | 158 +--------------+----------------------------------------------------+ 159 | circuit | See [I-D.ietf-tsvwg-circuit-breaker] | 160 | breaker | | 161 | controlled | See [I-D.ietf-tsvwg-rfc5405bis] Section 3.6 | 162 | environment | | 163 | general | See [I-D.ietf-tsvwg-rfc5405bis] Section 3.6 | 164 | internet | | 165 | flow | traffic for a single (source,destination) IP pair, | 166 | | including destinations that are group addresses | 167 | upstream | along a network topology path in the direction of | 168 | | a flow's sender | 169 | downstream | along a network topology path in the direction of | 170 | | a flow's receiver | 171 | ingress | the (single) upstream interface for a flow in a | 172 | interface | circuit breaker | 173 | egress | a downstream interface for a flow in a circuit | 174 | interface | breaker | 175 +--------------+----------------------------------------------------+ 177 Table 1 179 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 180 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 181 document are to be interpreted as described in [RFC2119]. 183 3. Rationale 185 CBACC is defined as an independent congestion control building block 186 because it would be a useful supplement to a wide variety of receiver- 187 driven multicast congestion control schemes, such as [PLM] or other 188 methods based on receiver-driven conformance to a measurement of 189 available network bandwidth or congestion. 191 CBACC is also potentially valuable, even without other congestion 192 control systems, in controlled environments where congestion control 193 may not be required (e.g.
for certain profiles of RTP [RFC3550]), 194 since CBACC can provide protection for such a network against 195 congestion due to sender or network mis-configuration. 197 CBACC provides a new form of communication between senders and 198 network transit nodes to facilitate fast-trip circuit breakers as 199 described in section 5.1 of [I-D.ietf-tsvwg-circuit-breaker] which 200 are not available via previously existing methods. When used in 201 conjunction with compatible circuit breakers, CBACC can greatly 202 improve the safety of a network that accepts and delivers interdomain 203 massively scalable multicast traffic to potentially untrusted 204 receivers. 206 4. Applicability 208 CBACC relies on the presence of CBACC-aware circuit breakers on a 209 flow's transit path in order to provide congestion control in a 210 network. In the absence of any CBACC-aware circuit breakers on a 211 network path, CBACC constitutes a small extra overhead to a flow 212 without providing any additional value. 214 CBACC provides a form of congestion control for massively scalable 215 protocols using the IP multicast service. CBACC is best used in 216 conjunction with another receiver-driven multicast congestion 217 control, but it is also suitable for use even without another 218 congestion control mechanism, or when presence of another congestion 219 control mechanism is unproven, such as when accepting multicast joins 220 from untrusted receivers. 222 5. Protocol Specification 224 5.1. Overview 226 CBACC senders send Bandwidth Advertisement packets to advertise the 227 maximum sending bandwidth along the data path for a flow through a 228 network. 230 CBACC bandwidth information is monitored by CBACC circuit breakers 231 along the network path, which may block the forwarding of traffic for 232 some flows in order to maintain network health. 
When a flow is 233 blocked, a CBACC circuit breaker sets a bit in Bandwidth 234 Advertisement packets before they're forwarded downstream that 235 indicates to subscribed receivers of that flow that traffic has been 236 blocked. 238 The protocol also defines a way to notify downstream receivers when a 239 flow is in danger of being circuit broken in the near future. A 240 CBACC-capable transport node SHOULD send this information when it is 241 known, as described in section [TBD]. This gives applications an 242 opportunity to gracefully shift to a lower-bandwidth version of the 243 same content, when possible, providing an early warning system for 244 avoiding congestion more smoothly. 246 A Bandwidth Advertisement packet constitutes an "ingress meter" as 247 described in section 3.1 of [I-D.ietf-tsvwg-circuit-breaker]. The 248 configured bandwidth caps of egress interfaces likewise constitute 249 "egress meters". However, the diagram in the referenced document is 250 simplified by running the ingress and egress on the same network 251 node. At the CBACC-aware circuit breaker, the CBACC node has both 252 pieces of information as soon as a Bandwidth Advertisement is 253 received, and can trip the circuit breaker if the aggregate 254 advertised CBACC bandwidth exceeds the actual bandwidth available on 255 any egress interfaces. 257 5.2. Packet Header Fields 259 5.2.1. Bandwidth Advertisement 261 5.2.1.1. As an IP header option 263 Bandwidth advertisements can appear as either an IPv4 header option 264 (as in Section 3.1 of [RFC0791]) or as an IPv6 extension header 265 option (as in section 4.2 of [RFC2460]). 
They have the same layout: 267 0 1 2 3 268 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 269 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 270 | Type | Length |B|D|P| Res | Priority | 271 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 272 | Bandwidth | 273 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 275 Figure 1 277 Bandwidth advertisements sent as IPv4 header options use option value 278 [TBD], with the "copied" bit set and the option class "control", as 279 specified in [RFC0791] section 3.1. Until and unless IANA assigns a 280 value, this will be option number 158 as described in section 8 of 281 [RFC4727] for experiments using IPv4 Option types. The length field 282 is 8. 284 Bandwidth advertisements sent as IPv6 header options use option value 285 [TBD], with the "action" bits set to "skip" and the "change" bit set 286 to 1, as specified in [RFC2460] section 4.2. Until and unless IANA 287 assigns a value, this will be option number 0x3e as described in 288 section 8 of [RFC4727] for experiments using IPv6 Option Types. The 289 length field is 6. 291 Using an IP header option has the benefit of exposing the bandwidth 292 to all CBACC-compatible routers, in much the same way the IP Router 293 Alert option would, but without being processed or causing undue load 294 in non-CBACC routers. 296 The IP Header encapsulations DO work with IPSEC. As described in 297 Appendix A of [RFC4302], the IP header fields are properly treated as 298 mutable and zeroed for the IPSEC ICV calculations. CBACC circuit 299 breakers MAY change bits in transit. The Bandwidth Advertisement 300 header itself IS NOT protected by IPSEC security services, but 301 protection of other parts of the packet remain unchanged. 303 5.2.1.2. Field definitions 305 5.2.1.2.1. 
Bandwidth 307 As in several other protocols sending bandwidth values such as OSPF- 308 TE [RFC3630], the bandwidth is expressed in bytes per second (not 309 bits), in IEEE floating point format. For quick reference, this 310 format is as follows: 312 0 1 2 3 313 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 314 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 315 |S| Exponent | Fraction | 316 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 318 Figure 2 320 S is the sign, Exponent is the exponent base 2 in "excess 127" 321 notation, and Fraction is the mantissa - 1, with an implied binary 322 point in front of it. Thus, the above represents the value: 324 (-1)**(S) * 2**(Exponent-127) * (1 + Fraction) 326 For more details, refer to [IEEE.754.1985]. 328 Figure 3 330 5.2.1.2.2. B (Blocked) bit 332 Indicates that the flow has been circuit-broken. 334 5.2.1.2.3. D (Danger) bit 336 Indicates that the flow is in danger of being circuit-broken. 338 5.2.1.2.4. P (Police) bit 340 Indicates that the flow should be policed instead of blocked. Flows 341 marked for policing by the sender should have traffic proportionally 342 dropped when bandwidth is needed, according to their priority. [TBD] 343 Flesh this concept out, and decide whether it's actually viable. 344 This was my attempt at addressing a suggestion from Bob Briscoe at 345 IETF 97 in ICCRG at the mic, IIRC. It probably requires more state, 346 such as total desired policable bandwidth, total current policed 347 bandwidth, and current policing bandwidth per-flow, plus some 348 definition of how to decide between cutting off some flows and 349 policing others. This may not be worth the hassle, but there are 350 some use cases such as FEC repair traffic which might actually be 351 nicer this way. However, it might also be possible to get the same 352 effect by assigning priority to those repair flows. 
Things like 353 video enhancement layers of course are probably better done as a 354 complete cutoff. 356 5.2.1.2.5. Res (Reserved bits) 358 The sender MUST set all reserved bits to 0 when sending a CBACC 359 control packet. Receivers and CBACC-capable transit nodes MUST 360 accept any value in the reserved bits. 362 5.2.1.2.6. Priority 364 The sender MAY indicate relative priorities of different streams from 365 the same sender with this field. This is an 8-bit unsigned integer, 366 and higher values are kept preferentially over other traffic from the 367 same sender with lower priority values, so all flows with a lower 368 priority value are circuit-broken before any flows with a higher 369 priority value. Among multiple flows from the same sender with the 370 same priority, the highest bandwidth flows are circuit-broken first. 372 5.3. States 374 5.3.1. Interface State 376 A CBACC circuit breaker holds the following state for each interface, 377 for both the inbound and outbound directions on that interface: 379 o aggregate bandwidth: The sum of the bandwidths of all non- 380 circuit-broken CBACC flows which transit this interface in this 381 direction. 383 o bandwidth limit: The maximum aggregate CBACC advertised bandwidth 384 allowed, not including circuit-broken flows. This may depend on 385 administrative configuration and congestion measurements for the 386 network, whether from this node or other nodes. It's out of 387 scope for this document to define such congestion measurements. 388 Network operators should carefully consider that this bandwidth 389 limit applies to flows that are unresponsive to congestion. 391 When reducing the bandwidth limit due to congestion, the circuit 392 breaker MUST NOT reduce the limit by more than half its value in 393 10 seconds, and SHOULD use a smoothing function to reduce the 394 limit gradually over time.
396 It is RECOMMENDED that no more than half the capacity for a link 397 be allocated to CBACC flows if the link might be shared with TCP 398 or other traffic that is responsive to congestion. 400 Depending on administrative configuration and the physical 401 characteristics of the interface, the bandwidth limit may be 402 either shared between upstream and downstream traffic, or it may 403 be separate. Either a single shared value should be used, or two 404 separate independent values should be used for the inbound and 405 outbound directions for an interface. 407 o CBACC bandwidth warning threshold: A soft bandwidth threshold. 408 When the aggregate CBACC advertised bandwidth exceeds this 409 threshold, flows that would have been circuit-broken with a 410 bandwidth limit at this threshold MUST have the Danger bit set in 411 the Bandwidth Advertisement packets that are forwarded by this 412 circuit breaker. This threshold SHOULD be configurable as a 413 proportion of the bandwidth limit, and MUST remain at or below 414 the bandwidth limit when the bandwidth limit changes. The 415 recommended proportion value is .75, but specific networks may 416 use a different value if deemed useful by the network operators. 418 5.3.2. Flow State 420 The following state is kept for flows that are joined from at least 421 one downstream interface and for which at least one CBACC Bandwidth 422 Advertisement packet has been received: 424 o bandwidth: The bandwidth from the most recently received 425 Bandwidth Advertisement. 427 o ingress status: One of the following values: 429 * 'subscribed' 430 Indicates that the circuit breaker is subscribed upstream to 431 the flow and forwarding data and control packets through zero 432 or more egress interfaces. 434 * 'pruned' 435 Indicates that the flow has been circuit-broken. A request to 436 unsubscribe from the flow has been sent upstream, e.g.
a PIM 437 prune (section 3.5 of [RFC7761]) or a "leave" operation via 438 IGMP, MLD, or another appropriate group membership protocol. 440 * 'probing' 441 Indicates that the flow was circuit-broken previously, and is 442 currently joined upstream to refresh the most recent Bandwidth 443 Advertisement in order to evaluate reinstating the flow. 445 o probe timer: Used to periodically probe a flow in the 'pruned' 446 state, to evaluate returning to 'forwarding'. 448 Flows additionally have a per-interface state for egress interfaces: 450 o egress status: One of the following values: 452 * 'forwarding' 453 Indicates that the flow is a non-circuit-broken flow in steady 454 state, forwarding data and control packets downstream. 456 * 'blocked' 457 Indicates that data packets for this flow are NOT forwarded 458 downstream via this interface. Bandwidth Advertisements are 459 still forwarded, each with the 'Blocked' bit set to 1. All 460 other flow traffic MUST be dropped. 462 5.4. Functionality 464 The CBACC building block on a sender MUST have access to the maximum 465 bandwidth that may be sent at any time in the following 3 seconds. A 466 CBACC sender MUST send this value in a Bandwidth Advertisement packet 467 once per second. The end result of the traffic sent on the wire for 468 a particular flow MUST honor this maximum bandwidth commitment, such 469 that bandwidth measurements taken over any sliding window one-second 470 period MUST NOT exceed any of the prior 3 maximum Bandwidth 471 Advertisements (or any of them, if fewer than 3 have been sent). 473 A CBACC circuit breaker MUST order its monitored flows based on per- 474 flow estimates of network fairness and preferentially circuit break 475 less fair flows when bandwidth limits are exceeded.
A normative 476 method to determine network fairness for a flow is out of scope for 477 this document, but CBACC circuit breaker implementations SHOULD 478 provide a capability for network operators to configure 479 administrative biases for specific sets of flows, and network 480 operators SHOULD consider fairness concerns as expressed in [RFC2914] 481 section 3.2 and other relevant documents describing best practices. 483 In particular, fairness metrics SHOULD favor multicast flows with 484 many receivers over multicast flows with few receivers and flows with 485 low bandwidth over flows with high bandwidth. When receiver counts 486 are known (for example via the experimental PIM extension specified 487 in [RFC6807]) a RECOMMENDED metric is (bandwidth/receiver count), 488 though other metrics MAY be used where deemed appropriate by network 489 operators following internet best practices, or when receiver counts 490 can't be determined. 492 A CBACC sender MUST send Bandwidth Advertisements once per second. 493 (Implementation-specific jitter in timer implementations not 494 exceeding .1s is acceptable.) 496 If a circuit breaker receives more than 5 Bandwidth Advertisement 497 packets for a flow in two seconds, the circuit breaker SHOULD set the 498 flow to "pruned" and leave the upstream channel, and MUST drop 499 Bandwidth Advertisement packets in excess of one per second. 501 Flows which are currently circuit-broken on an egress interface are 502 set to "blocked". When a flow on an egress interface is in blocked 503 state, Bandwidth Advertisement packets MUST be forwarded except as 504 described in the preceding paragraph, the "Blocked" bit MUST be set 505 to 1 before forwarding, and other traffic for that flow MUST NOT be 506 forwarded along that interface. 
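As an informal illustration only, the fairness ordering and the Blocked and Danger decisions described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a normative algorithm: the names Flow, cost_per_receiver, flows_to_break, and classify are hypothetical, receiver counts are assumed to be known, the fairness metric is the RECOMMENDED (bandwidth/receiver count), the warning threshold uses the recommended proportion of .75, and the Priority field and administrative biases are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str          # hypothetical label for the flow
    bandwidth: float   # advertised bandwidth, bytes per second
    receivers: int     # downstream receiver count (assumed known)


def cost_per_receiver(flow):
    """RECOMMENDED metric: bandwidth / receiver count (higher = less fair)."""
    return flow.bandwidth / max(flow.receivers, 1)


def flows_to_break(flows, limit):
    """Pick the least fair flows until the aggregate fits within limit."""
    aggregate = sum(f.bandwidth for f in flows)
    broken = []
    for flow in sorted(flows, key=cost_per_receiver, reverse=True):
        if aggregate <= limit:
            break
        broken.append(flow.name)
        aggregate -= flow.bandwidth
    return broken


def classify(flows, limit, warning_ratio=0.75):
    """Return (blocked, danger) sets of flow names for one egress interface.

    A flow gets the Danger bit if it is not blocked but would be blocked
    were the warning threshold used as the bandwidth limit.
    """
    blocked = set(flows_to_break(flows, limit))
    at_risk = set(flows_to_break(flows, limit * warning_ratio))
    return blocked, at_risk - blocked


if __name__ == "__main__":
    flows = [
        Flow("a", 600.0, 100),
        Flow("b", 500.0, 2),
        Flow("c", 300.0, 30),
    ]
    blocked, danger = classify(flows, limit=1000.0)
    print("blocked:", sorted(blocked))
    print("danger:", sorted(danger))
```

In this sketch a blocked flow would still have its Bandwidth Advertisements forwarded with the Blocked bit set, while a flow in the danger set keeps forwarding both data and control packets with the Danger bit set.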
508 When a flow is blocked or pruned, the circuit breaker MAY truncate 509 the Bandwidth Advertisement packet, keeping only the headers of the 510 packet containing the Bandwidth Advertisement before forwarding. 512 When a flow is pruned, the circuit-breaker MUST generate and forward 513 a Bandwidth Advertisement packet once per second with the "Blocked" 514 bit set when there are still downstream receivers connected. 516 In flows which are not circuit-broken but which would be circuit- 517 broken if the bandwidth warning threshold were the bandwidth limit, 518 the Danger bit MUST be set to 1 before forwarding. Both data and 519 control packets are forwarded for flows in this situation. The 520 "Danger" bit MAY be used by receivers to take early action to avoid 521 getting circuit-broken by shifting to a lower-bandwidth 522 representation, if available. 524 When a flow is in the "blocked" state on every egress interface, the 525 circuit breaker MAY set the flow to "pruned" on the ingress interface 526 and leave the channel upstream. 528 In addition to monitoring the advertised bandwidth, a CBACC circuit 529 breaker or other assisting nodes in the network SHOULD monitor the 530 observed bandwidth per flow, and SHOULD circuit break "overactive" 531 flows, defined as those which exceed their CBACC maximum bandwidth 532 commitment. A circuit breaker MAY perform constant monitoring on all 533 flows, or MAY use load sharing techniques such as random selection or 534 round robin to monitor only a certain subset of flows at a time. 536 When detecting overactive flows, circuit breakers MUST use techniques 537 to avoid false positives due to transient upstream network conditions 538 such as packet compression or occasional packet duplication. For 539 example, using an average of bandwidth measurements over the prior 3 540 seconds would qualify, where a half-second window would not. 
(A full 541 listing of reasonable false-positive avoidance techniques is out of 542 scope for this document.) 544 [TBD: examples with network diagrams and bandwidths?] [TBD: some 545 internal structure on this section. "wall of text" was some feedback] 547 6. Requirements from other building blocks 549 The sender needs to know the bandwidth, including any upcoming 550 changes, at least 3 seconds in advance. There is no requirement on 551 how building blocks define this functionality except on the packets 552 on the wire--the advance knowledge might, for example, be implemented 553 by buffering and pacing on the sending machine. Specifics of the 554 sending bandwidth implementations are out of scope for this document, 555 as it's intended to provide requirements that will be applicable to a 556 broad range of possible implementations, including RTP and WEBRC. 558 7. IANA Considerations 560 This draft requests IANA to allocate an IPv6 packet header option 561 number with the "action" bits set to "skip" and the "change" bit set 562 to 1, as specified in [RFC2460] section 4.2. [TO BE REMOVED: This 563 registration should take place at the following location: 564 http://www.iana.org/assignments/ipv6-parameters/ipv6- 565 parameters.xhtml#extension-header.] 567 This draft also requests IANA to allocate an IPv4 packet header 568 option number with the "copied" bit set and the option class 569 "control", as specified in [RFC0791] section 3.1. [TO BE REMOVED: 570 This registration should take place at the following location: 572 http://www.iana.org/assignments/ip-parameters/ip-parameters.xhtml#ip- 573 parameters-1.] 575 If those are deemed unacceptable, as an alternative with some 576 compromises described in Section 5.2.1, this draft instead requests 577 IANA to allocate a UDP destination port number. 
[TO BE REMOVED: This 578 registration should take place at the following location: 579 http://www.iana.org/assignments/service-names-port-numbers/service- 580 names-port-numbers.xhtml.] 582 8. Security Considerations 584 8.1. Forged Packets 586 Forged Bandwidth Advertisement packets that get accepted by CBACC 587 circuit breakers which dramatically over-report or under-report the 588 correct bandwidth would present a potential DoS against a CBACC flow, 589 by making the circuit breaker believe the flow exceeds the node's 590 capacity when over-reporting, or by letting the node notice an 591 apparent violation of the commitment to remain under the advertised 592 bandwidth when under-reporting. 594 Similarly, it is possible to forge a CBACC Bandwidth Advertisement 595 for a non-CBACC flow, which likewise may constitute a DoS against 596 that flow. 598 For multicast, an attacker would have to be on-path in order to deliver 599 a forged packet to a CBACC circuit breaker, because the join's 600 reverse path propagation will only reach the sender on a legitimate 601 network path to its source address. 603 For unicast, it's a bigger problem, because ANY sender along a path 604 that lacks the RPF checks of BCP 38 [RFC2827] can attack the 605 flow via a forged packet that substantially under-reports or over- 606 reports bandwidth. 608 For AMT tunnels, when RPF checks along a path to the gateway are not 609 present, nothing stops forged packets from being forwarded by the 610 gateway. If these packets contain CBACC control packets, it's 611 possible to inject a forged packet into the network downstream from 612 the gateway, combining the unicast hole with the multicast hole. 613 This is a vulnerability that should probably be addressed by a new 614 AMT version with some defense against forgery of data.

   For IPsec, the Bandwidth Advertisement IP header option is mutable,
   so it is not protected by the IPsec security services.  The
   Bandwidth Advertisement can therefore be forged for consumption by
   the circuit breakers, even though the packet will be rejected by the
   end host with the security association.  When CBACC traffic crosses
   untrusted network paths, this allows a DoS via the intermediate
   circuit breakers by over-reporting or under-reporting flow
   bandwidth.

   The unicast vulnerabilities would be substantially mitigated by RPF
   checks as recommended by BCP 38 [RFC2827] at every hop, or otherwise
   maintained by the network.  Absent such checks, cheap DoS
   vulnerabilities may be present from any permissive network
   locations.

8.2.  Overloading of Slow Paths

   CBACC control packets are sent as part of the data stream so that
   they traverse the same intermediate network nodes as the rest of the
   data, but they also carry control information that must be processed
   by certain nodes along that path.

   This creates potential problems very similar to the problems with
   the Router Alert IP option discussed in Section 3 of [RFC6398],
   where a circuit breaker might have a "fast path" for forwarding that
   can handle a much higher traffic volume than the "slow path"
   necessary to process CBACC control packets, which is potentially
   vulnerable to overloading.

   If a CBACC-compatible circuit breaker receives a high rate of CBACC
   control packets, the circuit breaker MUST maintain network health
   for other flows.  A circuit breaker MAY drop all packets, including
   all CBACC control packets, for a flow in which more than 5 CBACC
   control packets were received in less than a second.  (This number
   is intended to allow for moderate IP packet duplication and packet
   compression by upstream routers, while still being slow enough for
   handling of packets on the slow path.)

8.3.  Overloading of State

   Since CBACC flows require state, it may be possible for a set of
   receivers and/or senders, possibly acting in concert, to generate
   many flows in an attempt to overflow the circuit breakers' state
   tables.

   It is permissible for a network node to behave as a CBACC circuit
   breaker for some CBACC flows while treating other CBACC flows as
   non-CBACC, as part of a load balancing strategy for the network as a
   whole, or simply as defense against this concern when the number of
   monitored flows exceeds some threshold.

   The same techniques described in Section 3.1 of [RFC4609] can be
   used to help mitigate this attack, for much the same reasons.  It is
   RECOMMENDED that network operators implement measures to mitigate
   such attacks.

9.  Acknowledgements

   Many thanks to Devin Anderson and Ben Kaduk for detailed reviews and
   many great suggestions.  Thanks also to Cheng Jin, Scott Brown,
   Miroslav Kaduk, and Bob Briscoe for their thoughtful contributions.

10.  References

10.1.  Normative References

   [IEEE.754.1985]
              Institute of Electrical and Electronics Engineers,
              "Standard for Binary Floating-Point Arithmetic",
              IEEE Standard 754, August 1985.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
              December 1998.

   [RFC3048]  Whetten, B., Vicisano, L., Kermode, R., Handley, M.,
              Floyd, S., and M. Luby, "Reliable Multicast Transport
              Building Blocks for One-to-Many Bulk-Data Transfer",
              RFC 3048, DOI 10.17487/RFC3048, January 2001.

   [RFC3738]  Luby, M. and V.
              Goyal, "Wave and Equation Based Rate Control (WEBRC)
              Building Block", RFC 3738, DOI 10.17487/RFC3738,
              April 2004.

   [RFC4302]  Kent, S., "IP Authentication Header", RFC 4302,
              DOI 10.17487/RFC4302, December 2005.

   [RFC4727]  Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,
              ICMPv6, UDP, and TCP Headers", RFC 4727,
              DOI 10.17487/RFC4727, November 2006.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
              Multicast - Sparse Mode (PIM-SM): Protocol Specification
              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761,
              March 2016.

10.2.  Informative References

   [I-D.ietf-tsvwg-circuit-breaker]
              Fairhurst, G., "Network Transport Circuit Breakers",
              draft-ietf-tsvwg-circuit-breaker-15 (work in progress),
              April 2016.

   [I-D.ietf-tsvwg-rfc5405bis]
              Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", draft-ietf-tsvwg-rfc5405bis-19 (work in
              progress), October 2016.

   [PLM]      A. Legout, E.W. Biersack, Institut EURECOM, "Fast
              Convergence for Cumulative Layered Multicast Transmission
              Schemes", 1999.

   [RFC2357]  Mankin, A., Romanow, A., Bradner, S., and V. Paxson, "IETF
              Criteria for Evaluating Reliable Multicast Transport and
              Application Protocols", RFC 2357, DOI 10.17487/RFC2357,
              June 1998.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              DOI 10.17487/RFC2784, March 2000.

   [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP Source
              Address Spoofing", BCP 38, RFC 2827, DOI 10.17487/RFC2827,
              May 2000.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, DOI 10.17487/RFC2914, September 2000.

   [RFC3269]  Kermode, R. and L.
              Vicisano, "Author Guidelines for Reliable Multicast
              Transport (RMT) Building Blocks and Protocol Instantiation
              documents", RFC 3269, DOI 10.17487/RFC3269, April 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003.

   [RFC3630]  Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering
              (TE) Extensions to OSPF Version 2", RFC 3630,
              DOI 10.17487/RFC3630, September 2003.

   [RFC4609]  Savola, P., Lehtonen, R., and D. Meyer, "Protocol
              Independent Multicast - Sparse Mode (PIM-SM) Multicast
              Routing Security Issues and Enhancements", RFC 4609,
              DOI 10.17487/RFC4609, October 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [RFC6398]  Le Faucheur, F., Ed., "IP Router Alert Considerations and
              Usage", BCP 168, RFC 6398, DOI 10.17487/RFC6398,
              October 2011.

   [RFC6807]  Farinacci, D., Shepherd, G., Venaas, S., and Y. Cai,
              "Population Count Extensions to Protocol Independent
              Multicast (PIM)", RFC 6807, DOI 10.17487/RFC6807,
              December 2012.

   [RFC7450]  Bumgardner, G., "Automatic Multicast Tunneling", RFC 7450,
              DOI 10.17487/RFC7450, February 2015.

Appendix A.  Overjoining

   [I-D.ietf-tsvwg-rfc5405bis] describes several remedies for unicast
   congestion control under UDP, even though UDP does not itself provide
   congestion control.  In general, any network node under congestion
   could in theory collect evidence that a unicast flow's sending rate
   is not responding to congestion, and would then be justified in
   circuit-breaking it.

   With multicast IP, the situation is different, especially in the
   presence of malicious receivers.  A well-behaved sender using a
   receiver-controlled congestion scheme such as WEBRC does not reduce
   its send rate in response to congestion, instead relying on receivers
   to leave the appropriate multicast groups.

   This leads to a situation where, when a network accepts inter-domain
   multicast traffic, as long as there are senders somewhere in the
   world with aggregate bandwidth that exceeds a network's capacity,
   receivers in that network can join the flows and overflow the network
   capacity.  A receiver controlled by an attacker could do this at the
   IGMP/MLD level without running the application layer protocol that
   participates in the receiver-controlled congestion control.

   A network might be able to detect and defend against the most naive
   version of such an attack by blocking end users that try to join too
   many flows at once.  However, an attacker can achieve the same effect
   by joining a few high-bandwidth flows, if those exist anywhere, and
   an attacker that controls a few machines in a network can coordinate
   the receivers so they join disjoint sets of non-responsive sending
   flows.

   This scenario will produce congestion in a middle node in the network
   that can't be easily detected at the edge where the IGMP/MLD join is
   accepted.  Thus, an attacker with a small set of machines in a target
   network can always trip a circuit breaker if present, or can induce
   excessive congestion within the bandwidth allocated to multicast.
   This problem gets worse as more multicast flows become available.

   This is a significant barrier to multicast adoption because there is
   no present defense which does not itself constitute a denial of
   service attack.
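
   The coordinated case described above, in which a few receivers join
   disjoint sets of flows, can be made concrete with a small worked
   example.  The flow rates, the per-receiver join limit, and the
   shared link capacity are all hypothetical values chosen for
   illustration; none of them come from this specification.

```python
def aggregate_load(assignments, flow_bps):
    """Sum the rate of every distinct flow joined by at least one receiver.

    A multicast flow crosses a shared link once no matter how many
    downstream receivers joined it, so flows are counted once each.
    """
    joined = set().union(*assignments.values())
    return sum(flow_bps[f] for f in joined)


# Hypothetical inputs: 12 non-responsive flows at 20 Mbit/s each.
flow_bps = {f"flow{i}": 20e6 for i in range(12)}
MAX_JOINS_PER_RECEIVER = 4   # assumed edge policy on joins per end user
SHARED_LINK_BPS = 100e6      # assumed capacity of a mid-network link

# Three attacker-controlled hosts each join a disjoint set of 4 flows.
flows = sorted(flow_bps)
assignments = {
    f"host{h}": set(flows[h * 4:(h + 1) * 4]) for h in range(3)
}

per_receiver_ok = all(
    len(joined) <= MAX_JOINS_PER_RECEIVER for joined in assignments.values()
)
load = aggregate_load(assignments, flow_bps)
overloaded = load > SHARED_LINK_BPS

print("every receiver within join limit:", per_receiver_ok)
print("aggregate load on shared link (bit/s):", load)
print("shared link overloaded:", overloaded)
```

   Under these assumptions every receiver passes the edge check, yet
   the shared link is asked to carry more than twice its capacity,
   which is why a per-user join limit at the edge does not by itself
   prevent overjoining.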

   Although the same can apply to non-responsive unicast traffic,
   network operators can assume that non-responsive unicast sending
   flows are in violation of congestion control best practices, and can
   therefore cut off such flows.  Non-responsive multicast senders, by
   contrast, are likely to be well-behaved participants in
   receiver-controlled congestion control schemes.

   Receiver-controlled congestion control schemes also show the most
   promise for efficient massive scale content distribution via
   multicast, provided network health can be ensured.  Therefore,
   mechanisms to mitigate overjoining attacks while still permitting
   receiver-controlled congestion control are necessary.  [TBD: this
   whole section should be expanded and moved to a separate
   informational draft]

   TBD: network diagram

                                 Figure 4

Author's Address

   Jacob Holland
   Akamai Technologies, Inc.
   150 Broadway
   Cambridge, Massachusetts  02142
   USA

   Phone: +1 617 444 3000
   Email: jholland@akamai.com
   URI:   https://www.akamai.com/