idnits 2.17.1 

draft-jholland-cb-assisted-cc-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references
     ([I-D.ietf-tsvwg-circuit-breaker]), which it shouldn't.  Please replace
     those with straight textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 21, 2017) is 2559 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Missing Reference: 'TBD' is mentioned on line 342, but not defined

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 5226
     (Obsoleted by RFC 8126)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Transport Area Working Group                                  J. Holland
3	Internet-Draft                                 Akamai Technologies, Inc.
4	Intended status: Experimental                             April 21, 2017
5	Expires: October 23, 2017

7	     Circuit Breaker Assisted Congestion Control (CBACC): Protocol
8	                             Specification
9	                    draft-jholland-cb-assisted-cc-01

11	Abstract

13	   This document specifies Circuit Breaker Assisted Congestion Control
14	   (CBACC), which provides bandwidth information from senders to
15	   intermediate network nodes to enable good decisions for fast-trip
16	   Network Transport Circuit Breaker activity
17	   ([I-D.ietf-tsvwg-circuit-breaker]) when necessary for network health.
18	   CBACC is specifically designed to support protocols using IP
19	   multicast, particularly as a supplement to receiver-driven congestion
20	   control protocols to help affected networks rapidly detect and
21	   mitigate the impact of scenarios in which a network is oversubscribed
22	   to flows which are not responsive to congestion.

24	Status of This Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on October 23, 2017.

41	Copyright Notice

43	   Copyright (c) 2017 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	Table of Contents

58	   1.  Introduction
59	   2.  Terminology
60	   3.  Rationale
61	   4.  Applicability
62	   5.  Protocol Specification
63	     5.1.  Overview
64	     5.2.  Packet Header Fields
65	       5.2.1.  Bandwidth Advertisement
66	         5.2.1.1.  As an IP header option
67	         5.2.1.2.  Field definitions
68	     5.3.  States
69	       5.3.1.  Interface State
70	       5.3.2.  Flow State
71	     5.4.  Functionality
72	   6.  Requirements from other building blocks
73	   7.  IANA Considerations
74	   8.  Security Considerations
75	     8.1.  Forged Packets
76	     8.2.  Overloading of Slow Paths
77	     8.3.  Overloading of State
78	   9.  Acknowledgements
79	   10. References
80	     10.1.  Normative References
81	     10.2.  Informative References
82	   Appendix A.  Overjoining
83	   Author's Address

85	1.  Introduction

87	   This document specifies Circuit Breaker Assisted Congestion Control
88	   (CBACC).

90	   CBACC is a congestion control building block designed for use with IP
91	   traffic that has a known maximum bandwidth, which does not reduce its
92	   sending rate in response to congestion.  CBACC is specifically
93	   designed to supplement protocols using receiver-driven multicast
94	   congestion control systems that rely on well-behaved receivers to
95	   achieve congestion control in a very highly scalable system (up to
96	   millions of receivers) without a feedback path that reduces sending
97	   rates by senders.  Examples of congestion control systems fitting
98	   this description include PLM, RLM, RLC, FLID-DL, SMCC, ESMCC, QIRLM,
99	   and WEBRC [RFC3738].

101	   CBACC addresses a vulnerability to "overjoining", a condition in
102	   which receivers (particularly malicious receivers) subscribe to
103	   traffic which, from the sending side, is non-responsive to
104	   congestion.  Overjoining attacks and the challenges they present are
105	   discussed in more detail in Appendix A.

107	   A careful reading of the congestion control requirements of UDP Best
108	   Practices [I-D.ietf-tsvwg-rfc5405bis] suggests that a network that
109	   forwards multicast traffic is required to operate a circuit breaker
110	   to maintain network health under a persistent overjoining condition,
111	   at a cost of cutting off some or all multicast traffic across the
112	   network during high congestion.

114	   CBACC provides a mechanism for networks to mitigate the impact of
115	   overjoining within a network by introducing a mechanism for
116	   communicating the bandwidth of non-responsive flows from the sender
117	   of the flow to the transit nodes forwarding the flow.  The bandwidth
118	   information is sufficient to implement a fast-trip circuit breaker
119	   [I-D.ietf-tsvwg-circuit-breaker] within a single network node which
120	   can specifically block or police flows when receivers have overjoined
121	   the network's capacity.

123	   In conjunction with receiver counts (e.g. via [RFC6807]) such nodes
124	   can also provide much improved network fairness for circuit breaking
125	   decisions during an overjoining condition.

127	   In addition to streams using multicast receiver-driven congestion
128	   control, CBACC may also be suitable for use with other traffic, both
129	   unicast and multicast, that does not respond to congestion by
130	   reducing sending rates, including certain profiles of RTP [RFC3550]
131	   over either unicast or multicast, as well as several tunneling
132	   protocols (e.g.  AMT [RFC7450] and GRE [RFC2784]) when they are known
133	   to carry traffic that would be suitable for CBACC.  A complete
134	   specification for use of CBACC with unicast protocols and with
135	   tunneling protocols is out of scope for this document, though the
136	   security issues section does mention a few special considerations for
137	   potential unicast usage.

139	   CBACC-compliant senders transmit Bandwidth Advertisements through the
140	   same transport path as the data traffic, so that circuit breakers can
141	   make informed decisions about how flows should be prioritized for
142	   circuit breaking.  Additionally, CBACC-compliant circuit breakers
143	   transmit information to receivers about flows which have been or
144	   might soon be circuit-broken, to encourage CBACC-aware applications
145	   to use alternate methods to retrieve equivalent (though probably
146	   lower-quality and possibly less efficient) data when possible.

148	   This document describes a building block as defined in [RFC3048].
149	   This document describes a congestion control building block that
150	   conforms to [RFC2357].  This document follows the general guidelines
151	   provided in [RFC3269], in addition to the requirements on RFCs from
152	   [RFC5226] and [RFC3552].

154	2.  Terminology

156	   +--------------+----------------------------------------------------+
157	   |     Term     |                     Definition                     |
158	   +--------------+----------------------------------------------------+
159	   |   circuit    |        See [I-D.ietf-tsvwg-circuit-breaker]        |
160	   |   breaker    |                                                    |
161	   |  controlled  |    See [I-D.ietf-tsvwg-rfc5405bis] Section 3.6     |
162	   | environment  |                                                    |
163	   |   general    |    See [I-D.ietf-tsvwg-rfc5405bis] Section 3.6     |
164	   |   internet   |                                                    |
165	   |     flow     | traffic for a single (source,destination) IP pair, |
166	   |              |  including destinations that are group addresses   |
167	   |   upstream   | along a network topology path in the direction of  |
168	   |              |                  a flow's sender                   |
169	   |  downstream  | along a network topology path in the direction of  |
170	   |              |                 a flow's receiver                  |
171	   |   ingress    |  the (single) upstream interface for a flow in a   |
172	   |  interface   |                  circuit breaker                   |
173	   |    egress    |   a downstream interface for a flow in a circuit   |
174	   |  interface   |                      breaker                       |
175	   +--------------+----------------------------------------------------+

177	                                  Table 1

179	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
180	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
181	   document are to be interpreted as described in [RFC2119].

183	3.  Rationale

185	   CBACC is defined as an independent congestion control building block
186	   because it would be a useful supplement to a wide variety of
187	   receiver-driven multicast congestion control schemes, such as [PLM]
188	   or other methods based on receiver-driven conformance to a
189	   measurement of available network bandwidth or congestion.

191	   CBACC is also potentially valuable, even without other congestion
192	   control systems, in controlled environments where congestion control
193	   may not be required (e.g. for certain profiles of RTP [RFC3550]),
194	   since CBACC can provide protection for such a network against
195	   congestion due to sender or network mis-configuration.

197	   CBACC provides a new form of communication between senders and
198	   network transit nodes to facilitate fast-trip circuit breakers as
199	   described in section 5.1 of [I-D.ietf-tsvwg-circuit-breaker] which
200	   are not available via previously existing methods.  When used in
201	   conjunction with compatible circuit breakers, CBACC can greatly
202	   improve the safety of a network that accepts and delivers interdomain
203	   massively scalable multicast traffic to potentially untrusted
204	   receivers.

206	4.  Applicability

208	   CBACC relies on the presence of CBACC-aware circuit breakers on a
209	   flow's transit path in order to provide congestion control in a
210	   network.  In the absence of any CBACC-aware circuit breakers on a
211	   network path, CBACC constitutes a small extra overhead to a flow
212	   without providing any additional value.

214	   CBACC provides a form of congestion control for massively scalable
215	   protocols using the IP multicast service.  CBACC is best used in
216	   conjunction with another receiver-driven multicast congestion
217	   control mechanism, but it is also suitable for use even without
218	   another congestion control mechanism, or when the presence of another
219	   congestion control mechanism is unproven, such as when accepting
220	   multicast joins from untrusted receivers.

222	5.  Protocol Specification

224	5.1.  Overview

226	   CBACC senders send Bandwidth Advertisement packets to advertise the
227	   maximum sending bandwidth along the data path for a flow through a
228	   network.

230	   CBACC bandwidth information is monitored by CBACC circuit breakers
231	   along the network path, which may block the forwarding of traffic for
232	   some flows in order to maintain network health.  When a flow is
233	   blocked, a CBACC circuit breaker sets a bit in Bandwidth
234	   Advertisement packets before they're forwarded downstream that
235	   indicates to subscribed receivers of that flow that traffic has been
236	   blocked.

238	   The protocol also defines a way to notify downstream receivers when a
239	   flow is in danger of being circuit broken in the near future.  A
240	   CBACC-capable transport node SHOULD send this information when it is
241	   known, as described in section [TBD].  This gives applications an
242	   opportunity to gracefully shift to a lower-bandwidth version of the
243	   same content, when possible, providing an early warning system for
244	   avoiding congestion more smoothly.

246	   A Bandwidth Advertisement packet constitutes an "ingress meter" as
247	   described in section 3.1 of [I-D.ietf-tsvwg-circuit-breaker].  The
248	   configured bandwidth caps of egress interfaces likewise constitute
249	   "egress meters".  However, the diagram in the referenced document is
250	   simplified by running the ingress and egress on the same network
251	   node.  At the CBACC-aware circuit breaker, the CBACC node has both
252	   pieces of information as soon as a Bandwidth Advertisement is
253	   received, and can trip the circuit breaker if the aggregate
254	   advertised CBACC bandwidth exceeds the actual bandwidth available on
255	   any egress interfaces.

257	5.2.  Packet Header Fields

259	5.2.1.  Bandwidth Advertisement

261	5.2.1.1.  As an IP header option

263	   Bandwidth advertisements can appear as either an IPv4 header option
264	   (as in Section 3.1 of [RFC0791]) or as an IPv6 extension header
265	   option (as in section 4.2 of [RFC2460]).  They have the same layout:

267	    0                   1                   2                   3
268	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
269	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
270	   |     Type      |     Length    |B|D|P|   Res   |   Priority    |
271	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
272	   |                           Bandwidth                           |
273	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

275	                                 Figure 1

277	   Bandwidth advertisements sent as IPv4 header options use option value
278	   [TBD], with the "copied" bit set and the option class "control", as
279	   specified in [RFC0791] section 3.1.  Until and unless IANA assigns a
280	   value, this will be option number 158 as described in section 8 of
281	   [RFC4727] for experiments using IPv4 Option types.  The length field
282	   is 8.

284	   Bandwidth advertisements sent as IPv6 header options use option value
285	   [TBD], with the "action" bits set to "skip" and the "change" bit set
286	   to 1, as specified in [RFC2460] section 4.2.  Until and unless IANA
287	   assigns a value, this will be option number 0x3e as described in
288	   section 8 of [RFC4727] for experiments using IPv6 Option Types.  The
289	   length field is 6.
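
As an illustration only, the layout in Figure 1 could be serialized as in the following Python sketch. The helper name pack_advertisement and its parameters are hypothetical and not part of this specification. The type values 158 and 0x3e are the experimental values named above, and the flag positions are read off Figure 1 (B is the most significant bit of the third octet). Any padding needed to align the option within the IP header is not shown.

```python
import struct

# Experimental option types named in the text (section 8 of RFC 4727),
# used until and unless IANA assigns values.
IPV4_OPTION_TYPE = 158   # "copied" bit set, option class "control"
IPV6_OPTION_TYPE = 0x3E  # "action" bits "skip", "change" bit 1

# Flag positions within the third octet, read off Figure 1.
B_BLOCKED = 0x80
D_DANGER = 0x40
P_POLICE = 0x20


def pack_advertisement(option_type, length, bandwidth, priority=0,
                       blocked=False, danger=False, police=False):
    """Return the 8 octets of Figure 1 (hypothetical helper)."""
    flags = 0
    if blocked:
        flags |= B_BLOCKED
    if danger:
        flags |= D_DANGER
    if police:
        flags |= P_POLICE
    # Reserved bits stay 0, as required of senders.
    # "!" selects network byte order; "f" is IEEE 754 single precision.
    return struct.pack("!BBBBf", option_type, length, flags, priority,
                       bandwidth)


# Length field is 8 for the IPv4 option and 6 for the IPv6 option.
ipv4_option = pack_advertisement(IPV4_OPTION_TYPE, 8, 1250000.0,
                                 priority=7, danger=True)
ipv6_option = pack_advertisement(IPV6_OPTION_TYPE, 6, 1250000.0,
                                 priority=7)
```

Here 1250000.0 bytes per second corresponds to a 10 Mbit/s flow; the value is an arbitrary example.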

291	   Using an IP header option has the benefit of exposing the bandwidth
292	   to all CBACC-compatible routers, in much the same way the IP Router
293	   Alert option would, but without being processed or causing undue load
294	   in non-CBACC routers.

296	   The IP Header encapsulations DO work with IPSEC.  As described in
297	   Appendix A of [RFC4302], the IP header fields are properly treated as
298	   mutable and zeroed for the IPSEC ICV calculations.  CBACC circuit
299	   breakers MAY change bits in transit.  The Bandwidth Advertisement
300	   header itself IS NOT protected by IPSEC security services, but
301	   protection of other parts of the packet remains unchanged.

303	5.2.1.2.  Field definitions

305	5.2.1.2.1.  Bandwidth

307	   As in several other protocols sending bandwidth values such as OSPF-
308	   TE [RFC3630], the bandwidth is expressed in bytes per second (not
309	   bits), in IEEE floating point format.  For quick reference, this
310	   format is as follows:

312	    0                   1                   2                   3
313	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
314	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
315	   |S|    Exponent   |                  Fraction                   |
316	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

318	                                 Figure 2

320	   S is the sign, Exponent is the exponent base 2 in "excess 127"
321	   notation, and Fraction is the mantissa - 1, with an implied binary
322	   point in front of it.  Thus, the above represents the value:

324	   (-1)**(S) * 2**(Exponent-127) * (1 + Fraction)

326	                For more details, refer to [IEEE.754.1985].

328	                                 Figure 3
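
The formula above can be checked with a short Python sketch. The function name decode_bandwidth is hypothetical, and the sketch assumes a normalized value (Exponent between 1 and 254); zero, denormals, infinities and NaN are not handled.

```python
import struct


def decode_bandwidth(word):
    """Apply (-1)**S * 2**(Exponent-127) * (1 + Fraction) to a 32-bit word.

    Assumes a normalized IEEE 754 single-precision value.
    """
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    # The 23 fraction bits sit behind an implied binary point.
    fraction = (word & 0x7FFFFF) / float(1 << 23)
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction)


# Encode an example bandwidth with the standard library, then decode it
# by hand using the formula.
example_word = struct.unpack("!I", struct.pack("!f", 1250000.0))[0]
decoded = decode_bandwidth(example_word)
```

For the example value the manual decoding agrees with the library encoding, since 1250000 is exactly representable in single precision.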

330	5.2.1.2.2.  B (Blocked) bit

332	   Indicates that the flow has been circuit-broken.

334	5.2.1.2.3.  D (Danger) bit

336	   Indicates that the flow is in danger of being circuit-broken.

338	5.2.1.2.4.  P (Police) bit

340	   Indicates that the flow should be policed instead of blocked.  Flows
341	   marked for policing by the sender should have traffic proportionally
342	   dropped when bandwidth is needed, according to their priority.  [TBD]
343	   Flesh this concept out, and decide whether it's actually viable.
344	   This was my attempt at addressing a suggestion from Bob Briscoe at
345	   IETF 97 in ICCRG at the mic, IIRC.  It probably requires more state,
346	   such as total desired policable bandwidth, total current policed
347	   bandwidth, and current policing bandwidth per-flow, plus some
348	   definition of how to decide between cutting off some flows and
349	   policing others.  This may not be worth the hassle, but there are
350	   some use cases such as FEC repair traffic which might actually be
351	   nicer this way.  However, it might also be possible to get the same
352	   effect by assigning priority to those repair flows.  Things like
353	   video enhancement layers of course are probably better done as a
354	   complete cutoff.

356	5.2.1.2.5.  Res (Reserved bits)

358	   The sender MUST set all reserved bits to 0 when sending a CBACC
359	   control packet.  Receivers and CBACC-capable transit nodes MUST
360	   accept any value in the reserved bits.

362	5.2.1.2.6.  Priority

364	   The sender MAY indicate relative priorities of different streams from
365	   the same sender with this field.  This is an 8-bit unsigned integer,
366	   and higher values are kept preferentially over other traffic from the
367	   same sender with lower priority values, so all flows with a lower
368	   priority value are circuit-broken before any flows with a higher
369	   priority value.  Among multiple flows from the same sender with the
370	   same priority, the highest bandwidth flows are circuit-broken first.
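
A minimal sketch of this ordering, for flows from a single sender, is shown below. The tuple layout and the name break_order are assumptions made for illustration.

```python
def break_order(flows):
    """Order one sender's flows from first-to-break to last-to-break.

    Each flow is a (name, priority, bandwidth) tuple (hypothetical layout).
    Lower priority values break first; within one priority value, the
    highest-bandwidth flow breaks first.
    """
    return sorted(flows, key=lambda flow: (flow[1], -flow[2]))


flows = [
    ("base-layer", 200, 500000.0),
    ("enhancement-a", 10, 250000.0),
    ("enhancement-b", 10, 750000.0),
]
ordered_names = [name for name, _, _ in break_order(flows)]
```

With these example values, "enhancement-b" is circuit-broken first, then "enhancement-a", and "base-layer" last.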

372	5.3.  States

374	5.3.1.  Interface State

376	   A CBACC circuit breaker holds the following state for each interface,
377	   for both the inbound and outbound directions on that interface:

379	   o  aggregate bandwidth:  The sum of the bandwidths of all non-
380	       circuit-broken CBACC flows which transit this interface in this
381	       direction.

383	   o  bandwidth limit:  The maximum aggregate CBACC advertised bandwidth
384	       allowed, not including circuit-broken flows.  This may depend on
385	       administrative configuration and congestion measurements for the
386	       network, whether from this node or other nodes.  It's out of
387	       scope for this document to define such congestion measurements.
388	       Network operators should carefully consider that this bandwidth
389	       limit applies to flows that are unresponsive to congestion.

391	       When reducing the bandwidth limit due to congestion, the circuit
392	       breaker MUST NOT reduce the limit by more than half its value in
393	       10 seconds, and SHOULD use a smoothing function to reduce the
394	       limit gradually over time.

396	       It is RECOMMENDED that no more than half the capacity for a link
397	       be allocated to CBACC flows if the link might be shared with TCP
398	       or other traffic that is responsive to congestion.

400	       Depending on administrative configuration and the physical
401	       characteristics of the interface, the bandwidth limit may be
402	       either shared between upstream and downstream traffic, or it may
403	       be separate.  Either a single shared value should be used, or two
404	       separate independent values should be used for the inbound and
405	       outbound directions for an interface.

407	   o  CBACC bandwidth warning threshold:  A soft bandwidth threshold.
408	       When the aggregate CBACC advertised bandwidth exceeds this
409	       threshold, flows that would have been circuit-broken with a
410	       bandwidth limit at this threshold MUST have the Danger bit set in
411	       the Bandwidth Advertisement packets that are forwarded by this
412	       circuit breaker.  This threshold SHOULD be configurable as a
413	       proportion of the bandwidth limit, and MUST remain at or below
414	       the bandwidth limit when the bandwidth limit changes.  The
415	       recommended proportion value is .75, but specific networks may
416	       use a different value if deemed useful by the network operators.
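
One possible way to honor the limit-reduction rule and the warning threshold is sketched below. The exponential decay is only one choice of smoothing function and is an assumption, as are the function names. The sketch caps each update so that successive reductions compound to at most a halving per 10 seconds of elapsed time.

```python
HALVING_PERIOD_S = 10.0            # limit may at most halve per 10 seconds
DEFAULT_WARNING_PROPORTION = 0.75  # recommended proportion from the text


def next_bandwidth_limit(current, target, elapsed_s):
    """Move the bandwidth limit toward `target` (hypothetical helper).

    Increases apply immediately.  Reductions are bounded below by an
    exponential decay of the current limit over the elapsed time.
    """
    if target >= current:
        return target
    floor = current * 0.5 ** (elapsed_s / HALVING_PERIOD_S)
    return max(target, floor)


def warning_threshold(limit, proportion=DEFAULT_WARNING_PROPORTION):
    """Derive the warning threshold as a proportion of the limit."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("threshold must remain at or below the limit")
    return limit * proportion


# A congestion signal asks for a tenfold cut; only a halving is allowed
# after a full 10 seconds.
limit_after_10s = next_bandwidth_limit(100.0, 10.0, 10.0)
threshold = warning_threshold(limit_after_10s)
```

Because the threshold is recomputed from the limit, it stays at or below the limit whenever the limit changes.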

418	5.3.2.  Flow State

420	   The following state is kept for flows that are joined from at least
421	   one downstream interface and for which at least one CBACC Bandwidth
422	   Advertisement packet has been received:

424	   o  bandwidth:  The bandwidth from the most recently received
425	       Bandwidth Advertisement.

427	   o  ingress status:  One of the following values:

429	          * 'subscribed'
430	          Indicates that the circuit breaker is subscribed upstream to
431	          the flow and forwarding data and control packets through zero
432	          or more egress interfaces.

434	          * 'pruned'
435	          Indicates that the flow has been circuit-broken.  A request to
436	          unsubscribe from the flow has been sent upstream, e.g. a PIM
437	          prune (section 3.5 of [RFC7761]) or a "leave" operation via
438	          IGMP, MLD, or another appropriate group membership protocol.

440	          * 'probing'
441	          Indicates that the flow was circuit-broken previously, and is
442	          currently joined upstream to refresh the most recent Bandwidth
443	          Advertisement in order to evaluate reinstating the flow.

445	   o  probe timer:  Used to periodically probe a flow in the 'pruned'
446	       state, to evaluate returning to 'forwarding'.

448	   Flows additionally have a per-interface state for egress interfaces:

450	   o  egress status:  One of the following values:

452	          * 'forwarding'
453	          Indicates that the flow is a non-circuit-broken flow in steady
454	          state, forwarding data and control packets downstream.

456	          * 'blocked'
457	          Indicates that data packets for this flow are NOT forwarded
458	          downstream via this interface.  Bandwidth Advertisements are
459	          still forwarded, each with the 'Blocked' bit set to 1.  All
460	          other flow traffic MUST be dropped.
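
The flow state above might be represented as in the following sketch. The class and attribute names are hypothetical; only the status values themselves come from this section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class IngressStatus(Enum):
    SUBSCRIBED = "subscribed"
    PRUNED = "pruned"
    PROBING = "probing"


class EgressStatus(Enum):
    FORWARDING = "forwarding"
    BLOCKED = "blocked"


@dataclass
class FlowState:
    source: str
    destination: str
    bandwidth: float  # from the most recent Bandwidth Advertisement
    ingress_status: IngressStatus = IngressStatus.SUBSCRIBED
    probe_deadline: Optional[float] = None  # probe timer, if armed
    # Per-interface egress status, keyed by interface name.
    egress_status: Dict[str, EgressStatus] = field(default_factory=dict)

    def all_egress_blocked(self):
        """True when the flow is blocked on every egress interface."""
        return bool(self.egress_status) and all(
            status is EgressStatus.BLOCKED
            for status in self.egress_status.values()
        )


flow = FlowState("192.0.2.1", "232.1.1.1", 1250000.0)
flow.egress_status["eth1"] = EgressStatus.BLOCKED
```

The addresses and interface names are examples only. The all_egress_blocked check is a convenience for deciding whether the flow may be set to 'pruned' on the ingress interface.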

462	5.4.  Functionality

464	   The CBACC building block on a sender MUST have access to the maximum
465	   bandwidth that may be sent at any time in the following 3 seconds.  A
466	   CBACC sender MUST send this value in a Bandwidth Advertisement packet
467	   once per second.  The end result of the traffic sent on the wire for
468	   a particular flow MUST honor this maximum bandwidth commitment, such
469	   that bandwidth measurements taken over any sliding window one-second
470	   period MUST NOT exceed any of the prior 3 maximum Bandwidth
471	   Advertisements (or any of them, if fewer than 3 have been sent).
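
Read literally, the commitment means the measured rate must stay at or below the smallest of the last 3 advertised values. A sketch of that bookkeeping on the sender side follows; the class name and methods are assumptions for illustration.

```python
from collections import deque


class CommitmentTracker:
    """Track the last 3 advertised maxima (hypothetical helper)."""

    def __init__(self):
        # Older advertisements fall out automatically.
        self._recent = deque(maxlen=3)

    def advertise(self, bandwidth):
        self._recent.append(bandwidth)

    def allowed_rate(self):
        """Highest rate that exceeds none of the prior advertisements."""
        return min(self._recent) if self._recent else None

    def within_commitment(self, measured_rate):
        allowed = self.allowed_rate()
        return allowed is not None and measured_rate <= allowed


tracker = CommitmentTracker()
for advertised in (100.0, 80.0, 120.0, 150.0):
    tracker.advertise(advertised)
allowed_now = tracker.allowed_rate()
```

In this example the sender raised its advertisement from 80.0 to 150.0, but the earlier value still bounds the sending rate until it ages out of the last 3 advertisements.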

473	   A CBACC circuit breaker MUST order its monitored flows based on per-
474	   flow estimates of network fairness and preferentially circuit break
475	   less fair flows when bandwidth limits are exceeded.  A normative
476	   method to determine network fairness for a flow is out of scope for
477	   this document, but CBACC circuit breaker implementations SHOULD
478	   provide a capability for network operators to configure
479	   administrative biases for specific sets of flows, and network
480	   operators SHOULD consider fairness concerns as expressed in [RFC2914]
481	   section 3.2 and other relevant documents describing best practices.

483	   In particular, fairness metrics SHOULD favor multicast flows with
484	   many receivers over multicast flows with few receivers and flows with
485	   low bandwidth over flows with high bandwidth.  When receiver counts
486	   are known (for example via the experimental PIM extension specified
487	   in [RFC6807]) a RECOMMENDED metric is (bandwidth/receiver count),
488	   though other metrics MAY be used where deemed appropriate by network
489	   operators following internet best practices, or when receiver counts
490	   can't be determined.
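
As a hedged sketch, the RECOMMENDED metric could drive the choice of flows to circuit break as shown below. The tuple layout, the function name, and the greedy selection loop are assumptions; this document does not define a normative selection method.

```python
def select_flows_to_break(flows, bandwidth_limit):
    """Pick flows to circuit break until the aggregate fits the limit.

    Each flow is a (flow_id, bandwidth, receiver_count) tuple
    (hypothetical layout).  Flows with the highest bandwidth per receiver
    are treated as least fair and are broken first.
    """
    ranked = sorted(
        flows,
        key=lambda flow: flow[1] / max(flow[2], 1),
        reverse=True,
    )
    aggregate = sum(bandwidth for _, bandwidth, _ in flows)
    broken = []
    for flow_id, bandwidth, _ in ranked:
        if aggregate <= bandwidth_limit:
            break
        broken.append(flow_id)
        aggregate -= bandwidth
    return broken


flows = [
    ("popular", 10.0, 100),  # 0.1 per receiver
    ("single", 5.0, 1),      # 5.0 per receiver
    ("pair", 8.0, 2),        # 4.0 per receiver
]
to_break = select_flows_to_break(flows, bandwidth_limit=15.0)
```

With these example numbers the aggregate of 23.0 exceeds the limit of 15.0, so "single" and then "pair" are broken, leaving the flow with many receivers intact.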

492	   A CBACC sender MUST send Bandwidth Advertisements once per second.
493	   (Implementation-specific jitter in timer implementations not
494	   exceeding .1s is acceptable.)

496	   If a circuit breaker receives more than 5 Bandwidth Advertisement
497	   packets for a flow in two seconds, the circuit breaker SHOULD set the
498	   flow to "pruned" and leave the upstream channel, and MUST drop
499	   Bandwidth Advertisement packets in excess of one per second.
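
The rate check in the preceding paragraph could be implemented along the following lines. The class name is hypothetical, and the 0.9 second minimum spacing is an assumption chosen to tolerate the .1s of sender timer jitter permitted above.

```python
from collections import deque

WINDOW_S = 2.0        # observation window from the text
MAX_IN_WINDOW = 5     # more than this many suggests pruning
MIN_SPACING_S = 0.9   # assumption: 1 second minus allowed jitter


class AdvertisementRateGuard:
    """Per-flow guard against excessive Bandwidth Advertisements."""

    def __init__(self):
        self._arrivals = deque()
        self._last_forwarded = None

    def on_advertisement(self, now):
        """Return (forward, should_prune) for an arrival at time `now`."""
        self._arrivals.append(now)
        # Forget arrivals that have left the two-second window.
        while now - self._arrivals[0] >= WINDOW_S:
            self._arrivals.popleft()
        should_prune = len(self._arrivals) > MAX_IN_WINDOW
        forward = (self._last_forwarded is None
                   or now - self._last_forwarded >= MIN_SPACING_S)
        if forward:
            self._last_forwarded = now
        return forward, should_prune


guard = AdvertisementRateGuard()
arrival_times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
results = [guard.on_advertisement(t) for t in arrival_times]
```

In this example only the arrivals at 0.0 and 1.0 are forwarded, and the sixth arrival within two seconds reports that the flow should be set to "pruned".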

501	   Flows which are currently circuit-broken on an egress interface are
502	   set to "blocked".  When a flow on an egress interface is in blocked
503	   state, Bandwidth Advertisement packets MUST be forwarded except as
504	   described in the preceding paragraph, the "Blocked" bit MUST be set
505	   to 1 before forwarding, and other traffic for that flow MUST NOT be
506	   forwarded along that interface.

508	   When a flow is blocked or pruned, the circuit breaker MAY truncate
509	   the Bandwidth Advertisement packet, keeping only the headers of the
510	   packet containing the Bandwidth Advertisement before forwarding.

512	   When a flow is pruned, the circuit-breaker MUST generate and forward
513	   a Bandwidth Advertisement packet once per second with the "Blocked"
514	   bit set when there are still downstream receivers connected.

516	   In flows which are not circuit-broken but which would be circuit-
517	   broken if the bandwidth warning threshold were the bandwidth limit,
518	   the Danger bit MUST be set to 1 before forwarding.  Both data and
519	   control packets are forwarded for flows in this situation.  The
520	   "Danger" bit MAY be used by receivers to take early action to avoid
521	   getting circuit-broken by shifting to a lower-bandwidth
522	   representation, if available.
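
The per-packet handling on an egress interface described above can be summarized in a small sketch. The function name, its arguments, and the returned tuples are assumptions; the bit masks follow the flag positions in Figure 1.

```python
B_BLOCKED = 0x80  # B bit, most significant bit of the flags octet
D_DANGER = 0x40   # D bit


def egress_action(is_advertisement, flags, blocked, in_danger):
    """Decide what to do with one packet on one egress interface.

    Returns ("drop", None) or ("forward", flags), where `flags` is the
    possibly updated flags octet for Bandwidth Advertisements and is
    passed through unchanged (None) for data packets.
    """
    if blocked:
        if not is_advertisement:
            return ("drop", None)          # data is not forwarded
        return ("forward", flags | B_BLOCKED)
    if is_advertisement and in_danger:
        return ("forward", flags | D_DANGER)
    return ("forward", flags)


blocked_advertisement = egress_action(True, 0x00, blocked=True, in_danger=False)
blocked_data = egress_action(False, None, blocked=True, in_danger=False)
danger_advertisement = egress_action(True, 0x00, blocked=False, in_danger=True)
```

The rate limit on forwarded Bandwidth Advertisements is a separate check and is not repeated here.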

524	   When a flow is in the "blocked" state on every egress interface, the
525	   circuit breaker MAY set the flow to "pruned" on the ingress interface
526	   and leave the channel upstream.

528	   In addition to monitoring the advertised bandwidth, a CBACC circuit
529	   breaker or other assisting nodes in the network SHOULD monitor the
530	   observed bandwidth per flow, and SHOULD circuit break "overactive"
531	   flows, defined as those which exceed their CBACC maximum bandwidth
532	   commitment.  A circuit breaker MAY perform constant monitoring on all
533	   flows, or MAY use load sharing techniques such as random selection or
534	   round robin to monitor only a certain subset of flows at a time.

536	   When detecting overactive flows, circuit breakers MUST use techniques
537	   to avoid false positives due to transient upstream network conditions
538	   such as packet compression or occasional packet duplication.  For
539	   example, using an average of bandwidth measurements over the prior 3
540	   seconds would qualify, where a half-second window would not.  (A full
541	   listing of reasonable false-positive avoidance techniques is out of
542	   scope for this document.)
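
For instance, the 3-second averaging mentioned above might look like the following sketch. The class name and the choice of one sample per second are assumptions; other false-positive avoidance techniques are equally valid.

```python
from collections import deque


class OveractiveDetector:
    """Average per-second byte counts over a 3-second window."""

    def __init__(self, window_samples=3):
        self._samples = deque(maxlen=window_samples)

    def add_second(self, bytes_observed):
        """Record the bytes observed for the flow during one second."""
        self._samples.append(bytes_observed)

    def is_overactive(self, committed_bytes_per_s):
        # Withhold judgment until the window is full, so a single
        # burst cannot trip the check on its own.
        if len(self._samples) < self._samples.maxlen:
            return False
        average = sum(self._samples) / len(self._samples)
        return average > committed_bytes_per_s


detector = OveractiveDetector()
for observed in (100.0, 100.0, 130.0):
    detector.add_second(observed)
spike_tolerated = not detector.is_overactive(120.0)
```

In this example a one-second burst of 130.0 against a commitment of 120.0 does not flag the flow, because the 3-second average is 110.0.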

544	   [TBD: examples with network diagrams and bandwidths?]  [TBD: some
545	   internal structure on this section. "wall of text" was some feedback]

547	6.  Requirements from other building blocks

549	   The sender needs to know the bandwidth, including any upcoming
550	   changes, at least 3 seconds in advance.  There is no requirement on
551	   how building blocks define this functionality except on the packets
552	   on the wire--the advance knowledge might, for example, be implemented
553	   by buffering and pacing on the sending machine.  Specifics of the
554	   sending bandwidth implementations are out of scope for this document,
555	   as it's intended to provide requirements that will be applicable to a
556	   broad range of possible implementations, including RTP and WEBRC.

558	7.  IANA Considerations

560	   This draft requests IANA to allocate an IPv6 packet header option
561	   number with the "action" bits set to "skip" and the "change" bit set
562	   to 1, as specified in [RFC2460] section 4.2.  [TO BE REMOVED: This
563	   registration should take place at the following location:
564	   http://www.iana.org/assignments/ipv6-parameters/ipv6-
565	   parameters.xhtml#extension-header.]

567	   This draft also requests IANA to allocate an IPv4 packet header
568	   option number with the "copied" bit set and the option class
569	   "control", as specified in [RFC0791] section 3.1.  [TO BE REMOVED:
570	   This registration should take place at the following location:

572	   http://www.iana.org/assignments/ip-parameters/ip-parameters.xhtml#ip-
573	   parameters-1.]
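   For illustration only, the bit layouts named in the two requests
   above can be sketched in Python.  The field widths follow the option
   type encodings in [RFC2460] section 4.2 and [RFC0791] section 3.1.
   The low-order value 30 is a placeholder assumed for this example; the
   actual numbers are for IANA to assign.

```python
ACT_SKIP = 0b00        # IPv6 "action" bits: skip the option if unrecognized
CHG_MAY_CHANGE = 1     # IPv6 "change" bit: option data may change en route
COPIED = 1             # IPv4 "copied" flag: copy option into all fragments
CLASS_CONTROL = 0      # IPv4 option class "control"
PLACEHOLDER_NUMBER = 30  # hypothetical value, NOT an IANA assignment


def ipv6_option_type(act, chg, rest):
    """Compose an IPv6 option type octet: 2-bit act, 1-bit chg, 5-bit rest."""
    if not (0 <= act <= 3 and chg in (0, 1) and 0 <= rest <= 31):
        raise ValueError("field out of range")
    return (act << 6) | (chg << 5) | rest


def ipv4_option_type(copied, opt_class, number):
    """Compose an IPv4 option type octet: 1-bit copied, 2-bit class, 5-bit number."""
    if not (copied in (0, 1) and 0 <= opt_class <= 3 and 0 <= number <= 31):
        raise ValueError("field out of range")
    return (copied << 7) | (opt_class << 5) | number


v6 = ipv6_option_type(ACT_SKIP, CHG_MAY_CHANGE, PLACEHOLDER_NUMBER)
v4 = ipv4_option_type(COPIED, CLASS_CONTROL, PLACEHOLDER_NUMBER)
print(hex(v6), hex(v4))
```

   Whatever number is assigned, the requested IPv6 option type has its
   three high-order bits equal to 001, and the requested IPv4 option
   type has its three high-order bits equal to 100.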

575	   If those are deemed unacceptable, as an alternative with some
576	   compromises described in Section 5.2.1, this draft instead requests
577	   IANA to allocate a UDP destination port number.  [TO BE REMOVED: This
578	   registration should take place at the following location:
579	   http://www.iana.org/assignments/service-names-port-numbers/service-
580	   names-port-numbers.xhtml.]

582	8.  Security Considerations

584	8.1.  Forged Packets

586	   Forged Bandwidth Advertisement packets that are accepted by CBACC
587	   circuit breakers and that dramatically over-report or under-report
588	   the correct bandwidth present a potential DoS against a CBACC flow.
589	   Over-reporting can make the circuit breaker believe the flow exceeds
590	   the node's capacity; under-reporting can make the node observe an
591	   apparent violation of the commitment to remain under the advertised
592	   bandwidth.

594	   Similarly, it is possible to forge a CBACC Bandwidth Advertisement
595	   for a non-CBACC flow, which likewise may constitute a DoS against
596	   that flow.

598	   For multicast, an attacker would have to be on-path in order to
599	   deliver a forged packet to a CBACC circuit breaker, because the
600	   join's reverse path propagation will only reach the sender on a
601	   legitimate network path to its source address.

603	   For unicast, the problem is bigger, because any sender along a path
604	   that lacks the RPF checks of BCP 38 [RFC2827] can attack the flow
605	   with a forged packet that substantially under-reports or
606	   over-reports bandwidth.

608	   For AMT tunnels, when RPF checks along a path to the gateway are not
609	   present, nothing stops forged packets from being forwarded by the
610	   gateway.  If these packets contain CBACC control packets, it's
611	   possible to inject a forged packet into the network downstream from
612	   the gateway, combining the unicast hole with the multicast hole.
613	   This is a vulnerability that should probably be addressed by a new
614	   AMT version with some defense against forgery of data.

616	   For IPsec, since the Bandwidth Advertisement IP header option is
617	   mutable, it's not protected by the IPsec security services, so the
618	   Bandwidth Advertisement can be forged for consumption by the circuit
619	   breakers, even though the packet will be rejected by the end host
620	   with the security association.  This could enable a DoS via the
621	   intermediate circuit breakers by over-reporting or under-reporting
622	   flow bandwidth, when processing CBACC traffic through untrusted
623	   network paths.

625	   The unicast vulnerabilities would be largely mitigated by RPF checks
626	   as recommended by BCP 38 [RFC2827], whether applied at every hop or
627	   otherwise maintained by the network.  Absent such checks, cheap DoS
628	   attacks may be possible from any permissive network location.

630	8.2.  Overloading of Slow Paths

632	   CBACC control packets are sent as part of the data stream so that
633	   they traverse the same intermediate network nodes as the rest of the
634	   data, but they also carry control information that must be processed
635	   by certain nodes along that path.

637	   This creates potential problems very similar to the problems with the
638	   Router Alert IP option discussed in Section 3 of [RFC6398]: a circuit
639	   breaker might have a "fast path" for forwarding that can handle a
640	   much higher traffic volume than the "slow path" necessary to process
641	   CBACC control packets, leaving the slow path potentially vulnerable
642	   to overloading.

644	   If a CBACC-compatible circuit breaker receives a high rate of CBACC
645	   control packets, the circuit breaker MUST maintain network health for
646	   other flows.  A circuit-breaker MAY drop all packets, including all
647	   CBACC control packets, for a flow in which more than 5 CBACC control
648	   packets were received in less than a second.  (This number is
649	   intended to allow for moderate IP packet duplication and packet
650	   compression by upstream routers, while still being slow enough for
651	   handling of packets on the slow path.)
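   The threshold above (more than 5 CBACC control packets in less than a
   second) might be checked per flow as in the following non-normative
   Python sketch.  The class name, the flow key, and the use of a
   sliding one-second window are assumptions of this example; a False
   result only indicates that the flow has crossed the threshold at
   which a circuit breaker MAY drop its packets.

```python
from collections import deque


class ControlPacketLimiter:
    """Counts recent CBACC control packets per flow in a sliding window."""

    def __init__(self, max_packets=5, window_s=1.0):
        self.max_packets = max_packets  # threshold from the text above
        self.window_s = window_s        # window length, in seconds
        self.arrivals = {}              # flow key -> deque of timestamps

    def within_limit(self, flow, now_s):
        """Record one control packet; False once the threshold is exceeded."""
        recent = self.arrivals.setdefault(flow, deque())
        # Forget arrivals that are a full window old or older.
        while recent and now_s - recent[0] >= self.window_s:
            recent.popleft()
        recent.append(now_s)
        return len(recent) <= self.max_packets
```

   Six control packets that all arrive within one second cause the sixth
   call to return False, while a flow sending four control packets per
   second never crosses the threshold.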

653	8.3.  Overloading of State

655	   Since CBACC flows require state, it may be possible for a set of
656	   receivers and/or senders, possibly acting in concert, to generate
657	   many flows in an attempt to overflow the circuit breakers' state
658	   tables.

660	   It is permissible for a network node to behave as a CBACC circuit
661	   breaker for some CBACC flows while treating other CBACC flows as non-
662	   CBACC, as part of a load balancing strategy for the network as a
663	   whole, or simply as defense against this concern when the number of
664	   monitored flows exceeds some threshold.
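   A minimal, non-normative sketch of the threshold behavior described
   above follows.  The class name, the string results, and the fixed
   limit are hypothetical; a real node would also need to age out state
   for flows that have ended, which this sketch omits.

```python
class BoundedFlowTable:
    """Keeps CBACC state for at most max_flows flows."""

    def __init__(self, max_flows):
        self.max_flows = max_flows  # hypothetical operator-chosen limit
        self.state = {}             # flow key -> advertised bits per second

    def classify(self, flow, advertised_bps):
        """Return how the node treats this flow's Bandwidth Advertisement."""
        if flow in self.state:
            # Already monitored: refresh the stored advertisement.
            self.state[flow] = advertised_bps
            return "cbacc"
        if len(self.state) >= self.max_flows:
            # Table is full: forward the flow as non-CBACC, keep no state.
            return "non-cbacc"
        self.state[flow] = advertised_bps
        return "cbacc"
```

   Once the limit is reached, advertisements for new flows create no
   state, so a flood of new flows cannot grow the table further.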

666	   The same techniques described in section 3.1 of [RFC4609] can be used
667	   to help mitigate this attack, for much the same reasons.  It is
668	   RECOMMENDED that network operators implement measures to mitigate
669	   such attacks.

671	9.  Acknowledgements

673	   Many thanks to Devin Anderson and Ben Kaduk for detailed reviews and
674	   many great suggestions.  Thanks also to Cheng Jin, Scott Brown,
675	   Miroslav Kaduk, and Bob Briscoe for their thoughtful contributions.

677	10.  References

679	10.1.  Normative References

681	   [IEEE.754.1985]
682	              Institute of Electrical and Electronics Engineers,
683	              "Standard for Binary Floating-Point Arithmetic",
684	              IEEE Standard 754, August 1985.

686	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
687	              DOI 10.17487/RFC0791, September 1981,
688	              <http://www.rfc-editor.org/info/rfc791>.

690	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
691	              Requirement Levels", BCP 14, RFC 2119,
692	              DOI 10.17487/RFC2119, March 1997,
693	              <http://www.rfc-editor.org/info/rfc2119>.

695	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
696	              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
697	              December 1998, <http://www.rfc-editor.org/info/rfc2460>.

699	   [RFC3048]  Whetten, B., Vicisano, L., Kermode, R., Handley, M.,
700	              Floyd, S., and M. Luby, "Reliable Multicast Transport
701	              Building Blocks for One-to-Many Bulk-Data Transfer",
702	              RFC 3048, DOI 10.17487/RFC3048, January 2001,
703	              <http://www.rfc-editor.org/info/rfc3048>.

705	   [RFC3738]  Luby, M. and V. Goyal, "Wave and Equation Based Rate
706	              Control (WEBRC) Building Block", RFC 3738,
707	              DOI 10.17487/RFC3738, April 2004,
708	              <http://www.rfc-editor.org/info/rfc3738>.

710	   [RFC4302]  Kent, S., "IP Authentication Header", RFC 4302,
711	              DOI 10.17487/RFC4302, December 2005,
712	              <http://www.rfc-editor.org/info/rfc4302>.

714	   [RFC4727]  Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,
715	              ICMPv6, UDP, and TCP Headers", RFC 4727,
716	              DOI 10.17487/RFC4727, November 2006,
717	              <http://www.rfc-editor.org/info/rfc4727>.

719	   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
720	              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
721	              Multicast - Sparse Mode (PIM-SM): Protocol Specification
722	              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
723	              2016, <http://www.rfc-editor.org/info/rfc7761>.

725	10.2.  Informative References

727	   [I-D.ietf-tsvwg-circuit-breaker]
728	              Fairhurst, G., "Network Transport Circuit Breakers",
729	              draft-ietf-tsvwg-circuit-breaker-15 (work in progress),
730	              April 2016.

732	   [I-D.ietf-tsvwg-rfc5405bis]
733	              Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
734	              Guidelines", draft-ietf-tsvwg-rfc5405bis-19 (work in
735	              progress), October 2016.

737	   [PLM]      Legout, A. and E.W. Biersack, "Fast Convergence for
738	              Cumulative Layered Multicast Transmission Schemes",
739	              Institut EURECOM, 1999.

741	   [RFC2357]  Mankin, A., Romanow, A., Bradner, S., and V. Paxson, "IETF
742	              Criteria for Evaluating Reliable Multicast Transport and
743	              Application Protocols", RFC 2357, DOI 10.17487/RFC2357,
744	              June 1998, <http://www.rfc-editor.org/info/rfc2357>.

746	   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
747	              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
748	              DOI 10.17487/RFC2784, March 2000,
749	              <http://www.rfc-editor.org/info/rfc2784>.

751	   [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
752	              Defeating Denial of Service Attacks which employ IP Source
753	              Address Spoofing", BCP 38, RFC 2827, DOI 10.17487/RFC2827,
754	              May 2000, <http://www.rfc-editor.org/info/rfc2827>.

756	   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
757	              RFC 2914, DOI 10.17487/RFC2914, September 2000,
758	              <http://www.rfc-editor.org/info/rfc2914>.

760	   [RFC3269]  Kermode, R. and L. Vicisano, "Author Guidelines for
761	              Reliable Multicast Transport (RMT) Building Blocks and
762	              Protocol Instantiation documents", RFC 3269,
763	              DOI 10.17487/RFC3269, April 2002,
764	              <http://www.rfc-editor.org/info/rfc3269>.

766	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
767	              Jacobson, "RTP: A Transport Protocol for Real-Time
768	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
769	              July 2003, <http://www.rfc-editor.org/info/rfc3550>.

771	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
772	              Text on Security Considerations", BCP 72, RFC 3552,
773	              DOI 10.17487/RFC3552, July 2003,
774	              <http://www.rfc-editor.org/info/rfc3552>.

776	   [RFC3630]  Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering
777	              (TE) Extensions to OSPF Version 2", RFC 3630,
778	              DOI 10.17487/RFC3630, September 2003,
779	              <http://www.rfc-editor.org/info/rfc3630>.

781	   [RFC4609]  Savola, P., Lehtonen, R., and D. Meyer, "Protocol
782	              Independent Multicast - Sparse Mode (PIM-SM) Multicast
783	              Routing Security Issues and Enhancements", RFC 4609,
784	              DOI 10.17487/RFC4609, October 2006,
785	              <http://www.rfc-editor.org/info/rfc4609>.

787	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
788	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
789	              DOI 10.17487/RFC5226, May 2008,
790	              <http://www.rfc-editor.org/info/rfc5226>.

792	   [RFC6398]  Le Faucheur, F., Ed., "IP Router Alert Considerations and
793	              Usage", BCP 168, RFC 6398, DOI 10.17487/RFC6398, October
794	              2011, <http://www.rfc-editor.org/info/rfc6398>.

796	   [RFC6807]  Farinacci, D., Shepherd, G., Venaas, S., and Y. Cai,
797	              "Population Count Extensions to Protocol Independent
798	              Multicast (PIM)", RFC 6807, DOI 10.17487/RFC6807, December
799	              2012, <http://www.rfc-editor.org/info/rfc6807>.

801	   [RFC7450]  Bumgardner, G., "Automatic Multicast Tunneling", RFC 7450,
802	              DOI 10.17487/RFC7450, February 2015,
803	              <http://www.rfc-editor.org/info/rfc7450>.

805	Appendix A.  Overjoining

807	   [I-D.ietf-tsvwg-rfc5405bis] describes several remedies for unicast
808	   congestion control under UDP, even though UDP does not itself provide
809	   congestion control.  In general, any network node under congestion
810	   could in theory collect evidence that a unicast flow's sending rate
811	   is not responding to congestion, and would then be justified in
812	   circuit-breaking it.

814	   With multicast IP, the situation is different, especially in the
815	   presence of malicious receivers.  A well-behaved sender using a
816	   receiver-controlled congestion scheme such as WEBRC does not reduce
817	   its send rate in response to congestion, instead relying on receivers
818	   to leave the appropriate multicast groups.

820	   As a result, when a network accepts inter-domain multicast traffic,
821	   and senders anywhere in the world have aggregate bandwidth exceeding
822	   that network's capacity, receivers in that network can join enough
823	   flows to overflow it.  A receiver controlled by an attacker could do
824	   this at the IGMP/MLD level without running the application layer
825	   protocol that participates in the receiver-controlled congestion
826	   control.

828	   A network might be able to detect and defend against the most naive
829	   version of such an attack by blocking end users that try to join too
830	   many flows at once.  However, an attacker can achieve the same effect
831	   by joining a few high-bandwidth flows, if those exist anywhere, and
832	   an attacker that controls a few machines in a network can coordinate
833	   the receivers so they join disjoint sets of non-responsive sending
834	   flows.

836	   This scenario will produce congestion in a middle node in the network
837	   that can't be easily detected at the edge where the IGMP/MLD join is
838	   accepted.  Thus, an attacker with a small set of machines in a target
839	   network can always trip a circuit breaker if present, or can induce
840	   excessive congestion within the bandwidth allocated to multicast.
841	   This problem gets worse as more multicast flows become available.

843	   This is a significant barrier to multicast adoption because there is
844	   no present defense that does not itself constitute a
845	   denial-of-service attack.

847	   Although the same can apply to non-responsive unicast traffic,
848	   network operators can assume that non-responsive sending flows are in
849	   violation of congestion control best practices, and can therefore cut
850	   off such flows.  However, non-responsive multicast senders are likely
851	   to be well-behaved participants in receiver-controlled congestion
852	   control schemes.

854	   At the same time, receiver-controlled congestion control schemes show
855	   the most promise for efficient massive-scale content distribution
856	   via multicast, provided network health can be ensured.  Therefore,
857	   mechanisms to mitigate overjoining attacks while still permitting
858	   receiver-controlled congestion control are necessary.  [TBD: this
859	   whole section should be expanded and moved to a separate
860	   informational draft]

862	   TBD: network diagram

864	                                 Figure 4

866	Author's Address

868	   Jacob Holland
869	   Akamai Technologies, Inc.
870	   150 Broadway
871	   Cambridge, Massachusetts  02142
872	   USA

874	   Phone: +1 617 444 3000
875	   Email: jholland@akamai.com
876	   URI:   https://www.akamai.com/