idnits 2.17.1 

draft-ietf-tsvwg-circuit-breaker-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 17, 2015) is 3114 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC3828' is mentioned on line 577, but not defined

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	TSVWG Working Group                                         G. Fairhurst
3	Internet-Draft                                    University of Aberdeen
4	Intended status: Best Current Practice                  October 17, 2015
5	Expires: April 19, 2016

7	                   Network Transport Circuit Breakers
8	                  draft-ietf-tsvwg-circuit-breaker-06

10	Abstract

12	   This document explains what is meant by the term "network transport
13	   Circuit Breaker" (CB).  It describes the need for circuit breakers
14	   when using network tunnels, and other non-congestion controlled
15	   applications, and explains where circuit breakers are, and are not,
16	   needed.  It also defines requirements for building a circuit breaker
17	   and the expected outcomes of using a circuit breaker within the
18	   Internet.

20	Status of This Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at http://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on April 19, 2016.

37	Copyright Notice

39	   Copyright (c) 2015 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (http://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of
49	   the Trust Legal Provisions and are provided without warranty as
50	   described in the Simplified BSD License.

52	Table of Contents

54	   1.  Introduction
55	     1.1.  Types of Circuit-Breaker
56	   2.  Terminology
57	   3.  Design of a Circuit-Breaker (What makes a good circuit
58	       breaker?)
59	     3.1.  Functional Components
60	   4.  Requirements for a Network Transport Circuit Breaker
61	   5.  Other network topologies
62	     5.1.  Use with a multicast control/routing protocol
63	     5.2.  Use with control protocols supporting pre-provisioned
64	           capacity
65	     5.3.  Unidirectional Circuit Breakers over Controlled Paths
66	   6.  Examples of Circuit Breakers
67	     6.1.  A Fast-Trip Circuit Breaker
68	       6.1.1.  A Fast-Trip Circuit Breaker for RTP
69	     6.2.  A Slow-trip Circuit Breaker
70	     6.3.  A Managed Circuit Breaker
71	       6.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires
72	       6.3.2.  A Managed Circuit Breaker for Pseudowires (PWs)
73	   7.  Examples where circuit breakers may not be needed.
74	     7.1.  CBs over pre-provisioned Capacity
75	     7.2.  CBs with tunnels carrying Congestion-Controlled Traffic
76	     7.3.  CBs with Uni-directional Traffic and no Control Path
77	   8.  Security Considerations
78	   9.  IANA Considerations
79	   10. Acknowledgments
80	   11. Revision Notes
81	   12. References
82	     12.1.  Normative References
83	     12.2.  Informative References
84	   Author's Address

86	1.  Introduction

88	   A network transport Circuit Breaker (CB) is an automatic mechanism
89	   that is used to estimate congestion caused by a flow, and to
90	   terminate (or significantly reduce the rate of) the flow when
91	   persistent congestion is detected.  This is a safety measure to
92	   prevent starvation of network resources denying other flows from
93	   access to the Internet.  Such measures are essential for an Internet
94	   that is heterogeneous and for traffic that is hard to predict in
95	   advance.  Avoiding persistent congestion is important to reduce the
96	   potential for "Congestion Collapse" [RFC2914].

98	   The term "Circuit Breaker" originates in electricity supply, and has
99	   nothing to do with network circuits or virtual circuits.  In
100	   electricity supply, a Circuit Breaker is intended as a protection
101	   mechanism of last resort.  Under normal circumstances, a Circuit
102	   Breaker ought not to be triggered; it is designed to protect the
103	   supply network and attached equipment when there is overload.
104	   People do not expect the electrical circuit-breaker (or fuse) in
105	   their home to be triggered, except when there is a wiring fault or a
106	   problem with an electrical appliance.

108	   In networking, the Circuit Breaker principle can be used as a
109	   protection mechanism of last resort to avoid persistent congestion
110	   impacting other flows that share network capacity.  Persistent
111	   congestion was a feature of the early Internet of the 1980s.  This
112	   resulted in excess traffic starving other connections from access to
113	   the Internet.  It was countered by the requirement to use congestion
114	   control (CC) by the Transmission Control Protocol (TCP) [Jacobsen88]
115	   [RFC1112].  These mechanisms operate in Internet hosts to cause TCP
116	   connections to "back off" during congestion.  The introduction of a
117	   Congestion Controller in TCP (currently documented in [RFC5681])
118	   ensured the stability of the Internet, because it was able to detect
119	   congestion and promptly react.  This worked well while TCP was by far
120	   the dominant traffic in the Internet, and most TCP flows were long-
121	   lived (ensuring that they could detect and respond to congestion
122	   before the flows terminated).  This is no longer the case, and non-
123	   congestion controlled traffic, including many applications of the
124	   User Datagram Protocol (UDP) can form a significant proportion of the
125	   total traffic traversing a link.  The current Internet therefore
126	   requires that non-congestion-controlled traffic be
127	   considered in order to avoid persistent congestion.

129	   There are important differences between a transport circuit-breaker
130	   and a congestion-control method.  Specifically, congestion control
131	   (as implemented in TCP, SCTP, and DCCP) operates on a timescale on
132	   the order of a packet round-trip-time (RTT), the time from sender to
133	   destination and return.  Congestion control methods are able to react
134	   to a single packet loss/marking and reduce the transmission rate for
135	   each loss or congestion event.  The goal is usually to limit the
136	   maximum transmission rate to a rate that reflects the available
137	   capacity across a network path.  These methods typically operate on
138	   individual traffic flows (e.g., a 5-tuple).

140	   In contrast, Circuit Breakers are recommended for non-congestion-
141	   controlled Internet flows and for traffic aggregates, e.g., traffic
142	   sent using a network tunnel.  People have been implementing what this
143	   draft characterizes as circuit breakers on an ad hoc basis to protect
144	   Internet traffic.  This draft therefore provides guidance on how to
145	   deploy and use these mechanisms.  Later sections provide examples of
146	   cases where circuit-breakers may or may not be desirable.

148	   A Circuit Breaker needs to measure (meter) the traffic to determine
149	   if the network is experiencing congestion and needs to be designed to
150	   trigger robustly when there is persistent congestion.  This means the
151	   trigger needs to operate on a timescale much longer than the path
152	   round trip time (e.g., seconds to possibly many tens of seconds).
153	   This longer period is needed to provide sufficient time for
154	   transports (or applications) to adjust their rate following
155	   congestion, and for the network load to stabilize after any
156	   adjustment.

158	   A Circuit Breaker trigger will often utilize a series of successive
159	   sample measurements metered at an ingress point and an egress point
160	   (either of which could be a transport endpoint).  These measurements
161	   need to be taken over a reasonably long period of time.  This is to
162	   ensure that a Circuit Breaker does not accidentally trigger following
163	   a single congestion event, or even successive events (congestion events
164	   are what trigger congestion control, and are to be regarded as
165	   normal on a network link operating near its capacity).  Once
166	   triggered, a control function needs to remove traffic from the
167	   network, either by disabling the flow or by significantly reducing
168	   the level of traffic.  This reaction provides the required protection
169	   to prevent persistent congestion being experienced by other flows
170	   that share the congested part of the network path.

172	   Section 4 defines requirements for building a Circuit Breaker.

174	1.1.  Types of Circuit-Breaker

176	   There are various forms of network transport circuit breaker.  These
177	   are differentiated mainly on the timescale over which they are
178	   triggered, but also in the intended protection they offer:

180	   o  Fast-Trip Circuit Breakers: The relatively short timescale used by
181	      this form of circuit breaker is intended to provide protection for
182	      network traffic from a single flow or related group of flows.

184	   o  Slow-Trip Circuit Breakers: This circuit breaker utilizes a longer
185	      timescale and is designed to protect network traffic from
186	      congestion by traffic aggregates.

188	   o  Managed Circuit Breakers: Utilize the operations and management
189	      functions that might be present in a managed service to implement
190	      a circuit breaker.

192	   Examples of each type of circuit breaker are provided in Section 6.

194	2.  Terminology

196	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
197	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
198	   document are to be interpreted as described in [RFC2119].

200	3.  Design of a Circuit-Breaker (What makes a good circuit breaker?)

202	   Although circuit breakers have been talked about in the IETF for many
203	   years, there has not yet been guidance on the cases where circuit
204	   breakers are needed or upon the design of circuit breaker mechanisms.
205	   This document seeks to offer advice on these two topics.

207	   Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
208	   carry non-congestion-controlled Internet flows and for traffic
209	   aggregates.  This includes traffic sent using a network tunnel.
210	   Designers of other protocols and tunnel encapsulations also ought to
211	   consider the use of these techniques as a last resort to
212	   protect traffic that shares the network path being used.

214	   This document defines the requirements for design of a Circuit
215	   Breaker and provides examples of how a Circuit Breaker can be
216	   constructed.  The specifications of individual protocols and tunnel
217	   encapsulations need to detail the protocol mechanisms needed to
218	   implement a Circuit Breaker.

220	   Section 3.1 describes the functional components of a circuit breaker
221	   and Section 4 defines requirements for implementing a Circuit
222	   Breaker.

224	3.1.  Functional Components

226	   The basic design of a transport circuit breaker involves
227	   communication between an ingress point (a sender) and an egress point
228	   (a receiver) of a network flow or set of flows.  A simple picture of
229	   Circuit Breaker operation is provided in figure 1.  This shows a set
230	   of routers (each labelled R) connecting a set of endpoints.

232	   A Circuit Breaker is used to control traffic passing through a subset
233	   of these routers, acting between the ingress and egress point
234	   network devices.  The path between the ingress and egress could be
235	   provided by a tunnel or other network-layer technique.  One expected
236	   use would be at the ingress and egress of a service, where all
237	   traffic being considered terminates beyond the egress point, and
238	   hence the ingress and egress carry the same set of flows.

240	 +--------+                                                   +--------+
241	 |Endpoint|                                                   |Endpoint|
242	 +--+-----+          >>> circuit breaker traffic >>>          +--+-----+
243	    |                                                            |
244	    | +-+  +-+  +---------+  +-+  +-+  +-+  +--------+  +-+  +-+ |
245	    +-+R+--+R+->+ Ingress +--+R+--+R+--+R+--+ Egress |--+R+--+R+-+
246	      +++  +-+  +------+--+  +-+  +-+  +-+  +-----+--+  +++  +-+
247	       |         ^     |                          |      |
248	       |         |  +--+------+            +------+--+   |
249	       |         |  | Ingress |            | Egress  |   |
250	       |         |  | Meter   |            | Meter   |   |
251	       |         |  +----+----+            +----+----+   |
252	       |         |       |                      |        |
253	  +-+  |         |  +----+----+                 |        |  +-+
254	  |R+--+         |  | Measure +<----------------+        +--+R|
255	  +++            |  +----+----+      Reported               +++
256	   |             |       |           Egress                  |
257	   |             |  +----+----+      Measurement             |
258	+--+-----+       |  | Trigger |                           +--+-----+
259	|Endpoint|       |  +----+----+                           |Endpoint|
260	+--------+       |       |                                +--------+
261	                 +---<---+
262	                  Reaction

264	   Figure 1: A CB controlling the part of the end-to-end path between an
265	   ingress point and an egress point.  (Note: In some cases, the trigger
266	   and measure functions could alternatively be located at other
267	   locations, e.g., at a network operations centre.)

269	   In the context of a Circuit Breaker, the ingress and egress functions
270	   could be implemented in different places.  For example, they could be
271	   located in network devices at a tunnel ingress and at the tunnel
272	   egress.  In some cases, they could be located at one or both network
273	   endpoints (see figure 2), implemented as components within a
274	   transport protocol.

276	    +----------+                 +----------+
277	    | Ingress  |  +-+  +-+  +-+  | Egress   |
278	    | Endpoint +->+R+--+R+--+R+--+ Endpoint |
279	    +--+----+--+  +-+  +-+  +-+  +----+-----+
280	       ^    |                         |
281	       | +--+------+             +----+----+
282	       | | Ingress |             | Egress  |
283	       | | Meter   |             | Meter   |
284	       | +----+----+             +----+----+
285	       |      |                       |
286	       | +----+----+                  |
287	       | | Measure +<-----------------+
288	       | +----+----+      Reported
289	       |      |           Egress
290	       | +----+----+      Measurement
291	       | | Trigger |
292	       | +----+----+
293	       |      |
294	       +---<--+
295	       Reaction

297	   Figure 2: An endpoint CB implemented at the sender (ingress) and
298	   receiver (egress).

300	   The set of components needed to implement a Circuit Breaker are:

302	   1.  An ingress meter (at the sender or tunnel ingress) records the
303	       number of packets/bytes sent in each measurement interval.  This
304	       measures the offered network load for a flow or set of flows.
305	       For example, the measurement interval could be many seconds (or
306	       every few tens of seconds or a series of successive shorter
307	       measurements that are combined by the Circuit Breaker Measurement
308	       function).

310	   2.  An egress meter (at the receiver or tunnel egress) records the
311	       number of packets/bytes received in each measurement interval.  This
312	       measures the supported load for the flow or set of flows, and
313	       could utilize other signals to detect the effect of congestion
314	       (e.g., loss/marking experienced over the path).  The measurements
315	       at the egress could be synchronised (including an offset for the
316	       time of flight of the data, or referencing the measurements to a
317	       particular packet) to ensure any counters refer to the same span
318	       of packets.

320	   3.  The measured values at the ingress and egress are communicated to
321	       the Circuit Breaker Measurement function.  This could use several
322	       methods including: sending return measurement packets from a
323	       receiver to a trigger function at the sender; an implementation
324	       using Operations, Administration and Management (OAM); or
325	       sending another in-band signalling datagram to the trigger
326	       function.  This could also be implemented purely as a control
327	       plane function, e.g., using a software-defined network
328	       controller.

330	   4.  The measurement function combines the ingress and egress
331	       measurements to assess the present level of network congestion.
332	       (For example, the loss rate for each measurement interval could
333	       be deduced from calculating the difference between ingress and
334	       egress counter values.)  Note the method does not require high
335	       accuracy for the period of the measurement interval (or therefore
336	       the measured value), since isolated and/or infrequent loss events
337	       need to be disregarded.

339	   5.  A trigger function determines if the measurements indicate
340	       persistent congestion.  This function defines an appropriate
341	       threshold for determining there is persistent congestion between
342	       the ingress and egress.  This preferably considers a rate or
343	       ratio, rather than an absolute value (e.g., more than 10% loss,
344	       but other methods could also be based on the rate of transmission
345	       as well as the loss rate).  The transport Circuit Breaker is
346	       triggered when the threshold is exceeded in multiple measurement
347	       intervals (e.g., 3 successive measurements).  Designs need to be
348	       robust so that single or spurious events do not trigger a
349	       reaction.

351	   6.  A reaction that is applied at the Ingress when the Circuit
352	       Breaker is triggered.  This seeks to automatically remove the
353	       traffic causing persistent congestion.

355	   7.  A feedback mechanism that triggers when the ingress
356	       measurements, the egress measurements, or both are not available,
357	       since this could also indicate a loss of control packets (also a
358	       symptom of heavy congestion or inability to control the load).
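
   As an illustration only, the measurement and trigger functions
   (items 4, 5, and 7 above) could be sketched as follows.  The names
   loss_ratio and Trigger are hypothetical and do not come from this
   document; the 10% threshold and the 3 successive intervals are the
   example values quoted in item 5.  Treating a missing egress report
   as total loss is one assumed way to make the breaker fail towards
   tripping when control packets are lost.

```python
from collections import deque
from typing import Optional


def loss_ratio(ingress_count: int, egress_count: Optional[int]) -> float:
    """Measurement function: fraction of metered traffic lost in one interval.

    Assumption: a missing egress report (None) is treated as total loss,
    so a failed control path pushes the breaker towards tripping (item 7).
    """
    if egress_count is None:
        return 1.0
    if ingress_count <= 0:
        return 0.0  # nothing was sent, so nothing could be lost
    lost = max(ingress_count - egress_count, 0)
    return lost / ingress_count


class Trigger:
    """Trigger function: trips after N successive intervals over threshold."""

    def __init__(self, threshold: float = 0.10, intervals: int = 3) -> None:
        self.threshold = threshold            # example value from item 5
        self.recent = deque(maxlen=intervals)  # sliding window of verdicts
        self.tripped = False

    def update(self, ratio: float) -> bool:
        """Record one interval; return True once the breaker has tripped."""
        self.recent.append(ratio > self.threshold)
        if len(self.recent) == self.recent.maxlen and all(self.recent):
            self.tripped = True
        return self.tripped
```

   With this sketch, a single congested interval never trips the
   breaker; only an unbroken run of congested (or unreported)
   intervals does, which matches the robustness goal stated in item 5.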

360	4.  Requirements for a Network Transport Circuit Breaker

362	   The requirements for implementing a Circuit Breaker are:

364	   o  There MUST be a communication path used for control messages from
365	      the ingress meter and the egress meter to the point of
366	      measurement.  The Circuit Breaker MUST trigger if there is a
367	      failure of the communication path used for the control messages.
368	      That is, the feedback indicating a congested period needs to be
369	      designed so that the Circuit Breaker is triggered when it fails to
370	      receive measurement reports that indicate an absence of
371	      congestion, rather than relying on the successful transmission of
372	      a "congested" signal back to the sender.  (The feedback signal
373	      could itself be lost under congestion).

375	   o  A Circuit Breaker MUST define a measurement period over which the
376	      Circuit Breaker Measurement function measures the level of
377	      congestion or loss.  This method does not have to detect
378	      individual packet loss, but MUST have a way to know that packets
379	      have been lost/marked from the traffic flow.  If Explicit
380	      Congestion Notification (ECN) is enabled [RFC3168], an egress
381	      meter MAY also count the number of ECN congestion marks/events per
382	      measurement interval, but even if ECN is used, loss MUST still be
383	      measured, since this better reflects the impact of persistent
384	      congestion.  In this context, loss represents a reliable
385	      indication of congestion, as opposed to the finer-grain marking of
386	      incipient congestion that can be provided via ECN.  The type of
387	      Circuit Breaker will determine how long this measurement period
388	      needs to be.

390	   o  The measurement period used by a Circuit Breaker Measurement
391	      function MUST be longer than the time that current Congestion
392	      Control algorithms need to reduce their rate following detection
393	      of congestion.  This is important because end-to-end Congestion
394	      Control algorithms require at least one RTT to notify and adjust
395	      the traffic to experienced congestion, and congestion bottlenecks
396	      can share traffic with a diverse range of RTTs.  The measurement
397	      period is therefore expected to be significantly longer than the
398	      RTT experienced by the Circuit Breaker itself.

400	   o  If necessary, a Circuit Breaker MAY combine successive individual
401	      meter samples from the ingress and egress to ensure observation of
402	      an average over a sufficiently long interval.  (Note that when
403	      meter samples need to be combined, the combination needs to reflect
404	      the sum of the individual sample counts divided by the total
405	      time/volume over which the samples were measured.  Individual
406	      samples over different intervals cannot be directly combined to
407	      generate an average value.)

409	   o  A Circuit Breaker is REQUIRED to define a threshold to determine
410	      whether the measured congestion is considered excessive.

412	   o  A Circuit Breaker is REQUIRED to define the triggering interval,
413	      defining the period over which the trigger uses the collected
414	      measurements.  Circuit Breakers need to trigger over a
415	      sufficiently long period to avoid additionally penalizing flows
416	      with a long path RTT (e.g., many path RTTs).

418	   o  A Circuit Breaker MUST be robust to multiple congestion events.
419	      This usually will define a number of measured persistent
420	      congestion events per triggering period.  For example, a Circuit
421	      Breaker MAY combine the results of several measurement periods to
422	      determine if the Circuit Breaker is triggered (e.g., triggered
423	      when persistent congestion is detected in 3 of the measurements
424	      within the triggering interval).

426	   o  A Circuit Breaker SHOULD be constructed so that it does not
427	      trigger under light or intermittent congestion.

429	   o  The default response to a trigger SHOULD disable all traffic that
430	      contributed to congestion.

432	   o  Once triggered, the Circuit Breaker MUST react decisively by
433	      disabling or significantly reducing traffic at the source (e.g.,
434	      ingress).  A reaction that results in a reduction SHOULD result in
435	      reducing the traffic by at least an order of magnitude, each time
436	      the Circuit Breaker is triggered.  This response needs to be much
437	      more severe than that of a Congestion Controller algorithm (such
438	      as TCP's congestion control [RFC5681] or TFRC [RFC5348]), because
439	      the Circuit Breaker reacts to more persistent congestion and
440	      operates over longer timescales (i.e., the overload condition will
441	      have persisted for a longer time before the Circuit Breaker is
442	      triggered).

444	   o  A Circuit Breaker that reduces the rate of a flow MUST continue
445	      to monitor the level of congestion and MUST further reduce the
446	      rate if the Circuit Breaker is again triggered.

448	   o  The reaction to a triggered Circuit Breaker MUST continue for a
449	      period that is at least the triggering interval.  Operator
450	      intervention will usually be required to restore a flow.  If an
451	      automated response is needed to reset the trigger, then this reset
452	      needs to be delayed rather than immediate.  The design of an
453	      automated reset mechanism needs to be sufficiently conservative
454	      that it does not adversely interact with other mechanisms
455	      (including other Circuit Breaker algorithms that control traffic
456	      over a common path).  It SHOULD NOT perform an automated reset
457	      when there is evidence of continued congestion.

459	   o  When a Circuit Breaker is triggered, it SHOULD be regarded as an
460	      abnormal network event.  As such, this event SHOULD be logged.
461	      The measurements that led to triggering of the Circuit Breaker
462	      SHOULD also be logged.
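
   Two of the requirements above lend themselves to a short sketch:
   combining meter samples, and the reaction with its hold-down
   period.  The code below is an illustration under stated
   assumptions, not a definitive implementation.  The names
   combined_loss_ratio and Reaction are hypothetical, samples are
   assumed to be (sent, received) packet counts, and the sketch shows
   the rate-reduction option rather than the default of disabling all
   traffic.

```python
from dataclasses import dataclass
from typing import List, Tuple


def combined_loss_ratio(samples: List[Tuple[int, int]]) -> float:
    """Combine (sent, received) samples as total lost over total sent.

    Averaging the per-sample ratios instead would give a small sample
    the same weight as a large one, which the requirement warns against.
    """
    sent = sum(s for s, _ in samples)
    lost = sum(max(s - r, 0) for s, r in samples)
    return lost / sent if sent else 0.0


@dataclass
class Reaction:
    """Hypothetical rate-reducing reaction applied at the ingress."""

    rate_bps: float              # currently permitted sending rate
    hold_s: float                # assumed >= the triggering interval
    blocked_until: float = 0.0   # earliest time an automated reset may run

    def on_trigger(self, now: float) -> None:
        # Reduce by an order of magnitude each time the breaker trips.
        self.rate_bps /= 10.0
        self.blocked_until = now + self.hold_s

    def may_reset(self, now: float, still_congested: bool) -> bool:
        # No automated reset during the hold-down period, and none
        # while there is evidence of continued congestion.
        return now >= self.blocked_until and not still_congested
```

   For example, the samples (100 sent, 50 received) and (900 sent,
   900 received) combine to a loss ratio of 50/1000 = 0.05, whereas
   the mean of the two per-sample ratios would be 0.25.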

464	5.  Other network topologies

466	   A Circuit Breaker can be deployed in networks with topologies
467	   different to that presented in figure 2.  This section describes
468	   examples of such usage, and possible places where functions may be
469	   implemented.

471	5.1.  Use with a multicast control/routing protocol

473	    +----------+                 +--------+  +----------+
474	    | Ingress  |  +-+  +-+  +-+  | Egress |  |  Egress  |
475	    | Endpoint +->+R+--+R+--+R+--+ Router |--+ Endpoint +->+
476	    +----+-----+  +-+  +-+  +-+  +---+--+-+  +----+-----+  |
477	         ^         ^    ^    ^       |  ^         |        |
478	         |         |    |    |       |  |         |        |
479	    +----+----+    + - - - < - - - - +  |    +----+----+   | Reported
480	    | Ingress |      multicast Prune    |    | Egress  |   | Ingress
481	    | Meter   |                         |    | Meter   |   | Measurement
482	    +---------+                         |    +----+----+   |
483	                                        |         |        |
484	                                        |    +----+----+   |
485	                                        |    | Measure +<--+
486	                                        |    +----+----+
487	                                        |         |
488	                                        |    +----+----+
489	                              multicast |    | Trigger |
490	                              Leave     |    +----+----+
491	                              Message   |         |
492	                                        +----<----+

494	   Figure 3: An example of a multicast CB controlling the end-to-end
495	   path between an ingress endpoint and an egress endpoint.

497	   Figure 3 shows one example of how a multicast circuit breaker could
498	   be implemented at a pair of multicast endpoints (e.g., to implement a
499	   Fast-Trip Circuit Breaker, Section 6.1).  The ingress endpoint (the
500	   sender that sources the multicast traffic) meters the ingress load,
501	   generating an ingress measurement (e.g., recording timestamped packet
502	   counts), and sends this measurement to the multicast group together
503	   with the traffic it has measured.

505	   Routers along a multicast path forward the multicast traffic
506	   (including the ingress measurement) to all active endpoint receivers.
507	   Each last hop (egress) router forwards the traffic to one or more
508	   egress endpoint(s).

510	   In this figure, each endpoint includes a meter that performs a local
511	   egress load measurement.  An endpoint also extracts the received
512	   ingress measurement from the traffic, and compares the ingress and
513	   egress measurements to determine if the Circuit Breaker ought to be
514	   triggered.  This measurement has to be robust to loss (see
515	   Section 4).  If the Circuit Breaker is triggered, it generates a
516	   multicast leave message for the egress (e.g., an IGMP or MLD message
517	   sent to the last hop router), which causes the upstream router to
518	   cease forwarding traffic to the egress endpoint.
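
   The egress endpoint behaviour described above could be sketched as
   follows.  This is a minimal illustration under assumptions: the
   class name EgressEndpointBreaker and the leave_group callback are
   hypothetical, and the threshold and interval count are example
   values.  In a real endpoint the callback might drop the group
   membership on the receiving socket (for instance with the
   IP_DROP_MEMBERSHIP socket option), which causes the host to signal
   the leave to its last hop router.  A missing in-band ingress
   measurement is assumed to count as a congested interval.

```python
from typing import Callable, Optional


class EgressEndpointBreaker:
    """Hypothetical egress-endpoint breaker for one multicast group."""

    def __init__(self, leave_group: Callable[[], None],
                 threshold: float = 0.10, intervals: int = 3) -> None:
        self.leave_group = leave_group  # assumed hook that leaves the group
        self.threshold = threshold      # example loss threshold
        self.intervals = intervals      # successive congested intervals
        self.congested_run = 0
        self.joined = True

    def on_interval(self, ingress_count: Optional[int],
                    egress_count: int) -> bool:
        """Compare the in-band ingress count with the local egress count.

        Returns True once the breaker has tripped and the group was left.
        """
        if not self.joined:
            return True
        if ingress_count is None:
            congested = True    # ingress measurement itself was lost
        elif ingress_count <= 0:
            congested = False   # nothing was sent in this interval
        else:
            lost = max(ingress_count - egress_count, 0)
            congested = lost / ingress_count > self.threshold
        self.congested_run = self.congested_run + 1 if congested else 0
        if self.congested_run >= self.intervals:
            self.joined = False
            self.leave_group()  # reaction: stop receiving the group
        return not self.joined
```

   Each egress endpoint runs its own instance, so the decision to
   leave is taken autonomously and needs no message to the ingress.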

520	   Any multicast router that has no active receivers for a particular
521	   multicast group will prune traffic for that group, sending a prune
522	   message to its upstream router.  This starts the process of releasing
523	   the capacity used by the traffic and is a standard multicast routing
524	   function (e.g., using the PIM-SM routing protocol).  Each egress
525	   operates autonomously, and the circuit breaker "reaction" is executed
526	   by the multicast control plane (e.g., by the PIM multicast routing
527	   protocol), requiring no explicit signalling by the circuit breaker
528	   along the communication path used for the control messages.  Note:
529	   there is no direct communication with the Ingress, and hence a
530	   triggered Circuit Breaker only controls traffic downstream of the
531	   first hop router.  It does not stop traffic flowing from the sender
532	   to the first hop router; this is however the common practice for
533	   multicast deployment.

535	   The method could also be used with a multicast tunnel or subnetwork
536	   (e.g., Section 6.2, Section 6.3), where a meter at the ingress
537	   generates additional control messages to carry the measurement data
538	   towards the egress where the egress metering is implemented.

540	5.2.  Use with control protocols supporting pre-provisioned capacity

542	   Some paths are provisioned using a control protocol, e.g., flows
543	   provisioned using Multi-Protocol Label Switching (MPLS) services,
544	   paths provisioned using the Resource Reservation Protocol (RSVP),
545	   networks utilizing Software Defined Network (SDN) functions, or
546	   admission-controlled Differentiated Services.

548	   Figure 1 shows one expected use case, in which a separate device
549	   could be used to perform the measurement and trigger functions.
550	   The reaction generated by the trigger could take the form of a
551	   network control message sent to the ingress and/or other network
552	   elements, causing these elements to react to the Circuit Breaker.
553	   Examples of this type of use are provided in Section 6.3.

555	5.3.  Unidirectional Circuit Breakers over Controlled Paths

557	   A Circuit Breaker can be used to control uni-directional UDP traffic,
558	   provided that there is a communication path that can be used for
559	   control messages to connect the functional components at the Ingress
560	   and Egress.  This communication path for the control messages can
561	   exist in networks for which the traffic flow is purely
562	   unidirectional.  For example, a multicast stream that sends packets
563	   across an Internet path can use multicast routing to prune flows to
564	   shed network load.  Some other types of subnetwork also utilize
565	   control protocols that can be used to control traffic flows.

567	6.  Examples of Circuit Breakers

569	   There are multiple types of Circuit Breaker that could be defined for
570	   use in different deployment cases.  This section provides examples of
571	   different types of circuit breaker.

573	6.1.  A Fast-Trip Circuit Breaker

575	   Applications ought to use a full-featured transport (TCP, SCTP,
576	   DCCP).  Applications that do not (e.g., those using UDP and its
577	   UDP-Lite variant [RFC3828]) need to provide appropriate congestion
578	   avoidance.  [RFC2309] discusses the dangers of congestion-
579	   unresponsive flows and states that "all UDP-based streaming
580	   applications should incorporate effective congestion avoidance
581	   mechanisms".  Guidance for applications that do not use congestion-
582	   controlled transports is provided in [ID-ietf-tsvwg-RFC5405.bis].
583	   Such mechanisms can be designed to react on much shorter timescales
584	   than a circuit breaker, which only observes a traffic envelope.
585	   These methods can also interact with an application to more
586	   effectively control its sending rate.

588	   A fast-trip circuit breaker is the most responsive form of Circuit
589	   Breaker.  It has a response time that is only slightly larger than
590	   that of the traffic that it controls.  It is suited to traffic with
591	   well-understood characteristics (and could include one or more
592	   trigger functions specifically tailored to the type of traffic for
593	   which it is designed).  It is not suited to arbitrary network
594	   traffic and may be unsuitable for traffic aggregates, since it could
595	   prematurely trigger (e.g., when multiple congestion-controlled flows
596	   lead to short-term overload).

598	   These mechanisms are suitable for implementation in endpoints, where
599	   they can also complement end-to-end congestion control methods.  A
600	   shorter response time enables these mechanisms to trigger before
601	   other forms of circuit breaker (e.g., circuit breakers operating on
602	   traffic aggregates at a point along the network path).

604	6.1.1.  A Fast-Trip Circuit Breaker for RTP

606	   A set of fast-trip Circuit Breaker methods has been specified for
607	   use together by a Real-time Transport Protocol (RTP) flow using the
608	   RTP/AVP Profile [RTP-CB].  It is expected that, in the absence of
609	   severe congestion, all RTP applications running on best-effort IP
610	   networks will be able to run without triggering these circuit
611	   breakers.  A fast-trip RTP Circuit Breaker is therefore implemented
612	   as a fail-safe that, when triggered, will terminate RTP traffic.

614	   The sender monitors reception of RTCP reception report blocks, as
615	   contained in SR or RR packets, that convey reception quality feedback
616	   information.  This is used to measure (congestion) loss, possibly in
617	   combination with ECN [RFC6679].

619	   The Circuit Breaker action (shutdown of the flow) is triggered when
620	   any of the following trigger conditions are true:

622	   1.  An RTP Circuit Breaker triggers on reported lack of progress.

624	   2.  An RTP Circuit Breaker triggers when no receiver report messages
625	       are received.

627	   3.  An RTP Circuit Breaker uses a TFRC-style check and sets a hard
628	       upper limit to the long-term RTP throughput (over many RTTs).

630	   4.  An RTP Circuit Breaker includes the notion of Media Usability.
631	       This circuit breaker is triggered when the quality of the
632	       transported media falls below some required minimum acceptable
633	       quality.
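
   Trigger condition 3 could be illustrated with the widely used
   simplified TCP throughput model, X = s / (R * sqrt(2p/3)), where s is
   the packet size, R the round-trip time, and p the loss fraction.  The
   Python sketch below is only an illustration under stated
   assumptions: the function names and the multiplier of 10 are chosen
   for the example, and the actual checks and parameters are those
   defined in [RTP-CB].

```python
import math


def tcp_throughput_estimate(packet_bytes: float, rtt_s: float,
                            loss_fraction: float) -> float:
    """Simplified TCP throughput model, in bytes per second."""
    if loss_fraction <= 0.0:
        return math.inf  # no reported loss: the model imposes no limit
    return packet_bytes / (rtt_s * math.sqrt(2.0 * loss_fraction / 3.0))


def exceeds_limit(send_rate: float, packet_bytes: float, rtt_s: float,
                  loss_fraction: float, factor: float = 10.0) -> bool:
    """True if send_rate (bytes/s) is above `factor` times the estimate.

    `factor` is a hypothetical safety margin: the check acts as a hard
    upper limit rather than as a congestion controller.
    """
    estimate = tcp_throughput_estimate(packet_bytes, rtt_s, loss_fraction)
    return send_rate > factor * estimate
```

   A sender would evaluate such a check over many RTTs, so that only
   long-term throughput far above the estimate causes the flow to be
   shut down.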

635	6.2.  A Slow-trip Circuit Breaker

637	   A slow-trip Circuit Breaker could be implemented in an endpoint or
638	   network device.  This type of Circuit Breaker is much slower at
639	   responding to congestion than a fast-trip Circuit Breaker and is
640	   expected to be more common.

642	   One example where a slow-trip Circuit Breaker is needed is where
643	   flows or traffic-aggregates use a tunnel or encapsulation and the
644	   flows within the tunnel do not all support TCP-style congestion
645	   control (e.g., TCP, SCTP, TFRC), see [ID-ietf-tsvwg-RFC5405.bis]
646	   section 3.1.3.  A use case is where tunnels are deployed in the
647	   general Internet (rather than "controlled environments" within an
648	   Internet service provider or enterprise network), especially when the
649	   tunnel could need to cross a customer access router.

651	6.3.  A Managed Circuit Breaker

653	   A managed Circuit Breaker is implemented in the signalling protocol
654	   or management plane that relates to the traffic aggregate being
655	   controlled.  This type of circuit breaker is typically applicable
656	   when the deployment is within a "controlled environment".

658	   A Circuit Breaker requires more than the ability to determine that a
659	   network path is forwarding data, or to measure the rate of a path -
660	   which are often normal network operational functions.  There is an
661	   additional need to determine a metric for congestion on the path and
662	   to trigger a reaction when a threshold is crossed that indicates
663	   persistent congestion.

665	6.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires

667	   [RFC4553], SAToP Pseudo-Wires (PWE3), section 8 describes an example
668	   of a managed circuit breaker for isochronous flows.

670	   If such flows were to run over a pre-provisioned (e.g., Multi-
671	   Protocol Label Switching, MPLS) infrastructure, then it could be
672	   expected that the Pseudowire (PW) would not experience congestion,
673	   because such a flow is not expected to increase (or decrease) its
674	   rate.  If instead Pseudo-Wire traffic is multiplexed with other
675	   traffic over the general Internet, it could experience congestion.
676	   [RFC4553] states: "If SAToP PWs run over a PSN providing best-effort
677	   service, they SHOULD monitor packet loss in order to detect "severe
678	   congestion".  The currently recommended measurement period is 1
679	   second, and the trigger operates when there are more than three
680	   measured Severely Errored Seconds (SES) within a period.  If such a
681	   condition is detected, a SAToP PW ought to shut down bidirectionally
682	   for some period of time...".
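
   The Severely Errored Seconds trigger quoted above could be sketched
   as follows.  This Python example is an illustration only: the class
   name, the 10-second sliding window, and the 30% per-second loss ratio
   used to classify a second as severely errored are assumptions, and
   are not taken from [RFC4553].  Only the 1-second measurement period
   and the "more than three SES" rule come from the text.

```python
from collections import deque


class SesTrigger:
    """Hypothetical SES counter over a sliding window of seconds."""

    def __init__(self, window_s: int = 10, ses_loss_ratio: float = 0.3,
                 max_ses: int = 3):
        self.window = deque(maxlen=window_s)  # one flag per second
        self.ses_loss_ratio = ses_loss_ratio  # assumed SES classification
        self.max_ses = max_ses                # trigger on more than this

    def add_second(self, expected: int, received: int) -> bool:
        """Record one 1-second measurement; return True if triggered."""
        lost = max(0, expected - received)
        is_ses = expected > 0 and lost / expected > self.ses_loss_ratio
        self.window.append(is_ses)  # the oldest second drops out when full
        return sum(self.window) > self.max_ses
```

   A True result corresponds to the condition under which the PW would
   be shut down for some period of time.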

684	   The concept was that when the packet loss ratio (congestion) level
685	   increased above a threshold, the PW was by default disabled.  This
686	   use case considered fixed-rate transmission, where the PW had no
687	   reasonable way to shed load.

689	   The trigger needs to be set at the level at which the PW was likely
690	   to experience a serious problem, possibly making the service non-
691	   compliant.  At this point, triggering the Circuit Breaker would
692	   remove the traffic, preventing undue impact on congestion-responsive
693	   traffic (e.g., TCP).  Part of the rationale was that high loss
694	   ratios typically indicated that something was "broken" and ought to
695	   have already resulted in operator intervention, and therefore needed
696	   to trigger this intervention.

698	   An operator-based response provides opportunity for other action to
699	   restore the service quality, e.g., by shedding other loads or
700	   assigning additional capacity, or to consciously avoid reacting to
701	   the trigger while engineering a solution to the problem.  This could
702	   require the trigger to be sent to a third location (e.g., a network
703	   operations centre, NOC) responsible for operation of the tunnel
704	   ingress, rather than the tunnel ingress itself.

706	6.3.2.  A Managed Circuit Breaker for Pseudowires (PWs)

708	   Pseudowires (PWs) [RFC3985] have become a common mechanism for
709	   tunneling traffic, and may compete for network resources both with
710	   other PWs and with non-PW traffic, such as TCP/IP flows.

712	   [ID-ietf-pals-congcons] discusses congestion conditions that can
713	   arise when PWs compete with elastic (i.e., congestion responsive)
714	   network traffic (e.g., TCP traffic).  Elastic PWs carrying IP traffic
715	   (see [RFC4488]) do not raise major concerns because all of the
716	   traffic involved responds, reducing the transmission rate when
717	   network congestion is detected.

719	   In contrast, inelastic PWs (e.g., a fixed bandwidth Time Division
720	   Multiplex, TDM [RFC4553] [RFC5086] [RFC5087]) have the potential to
721	   harm congestion responsive traffic or to contribute to excessive
722	   congestion because inelastic PWs do not adjust their transmission
723	   rate in response to congestion.  [ID-ietf-pals-congcons] analyses TDM
724	   PWs, with an initial conclusion that a TDM PW operating with a degree
725	   of loss that may result in congestion-related problems is also
726	   operating with a degree of loss that results in an unacceptable TDM
727	   service.  For that reason, the draft suggests that a managed circuit
728	   breaker that shuts down a PW when it persistently fails to deliver
729	   acceptable TDM service is a useful means for addressing these
730	   congestion concerns.

732	7.  Examples where circuit breakers may not be needed

734	   A Circuit Breaker is not required for a single Congestion Controller-
735	   controlled flow using TCP, SCTP, TFRC, etc.  In these cases, the
736	   Congestion Control methods are already designed to prevent persistent
737	   congestion.

739	7.1.  CBs over pre-provisioned Capacity

741	   One common question is whether a Circuit Breaker is needed when a
742	   tunnel is deployed in a private network with pre-provisioned
743	   capacity.

745	   In this case, compliant traffic that does not exceed the provisioned
746	   capacity ought not to result in persistent congestion.  A Circuit
747	   Breaker will hence only be triggered when there is non-compliant
748	   traffic.  It could be argued that this event ought never to happen -
749	   but it could also be argued that the Circuit Breaker equally ought
750	   never to be triggered.  If a Circuit Breaker were to be implemented,
751	   it will provide an appropriate response if persistent congestion
752	   occurs in an operational network.

754	   Implementing a Circuit Breaker will not reduce the performance of the
755	   flows, but in the event that persistent congestion occurs it protects
756	   network traffic that shares network capacity with these flows.  A
757	   Circuit Breaker could also be used to protect other traffic that
758	   shares the network from a failure that causes the Circuit Breaker
759	   traffic to be routed over a non-pre-provisioned path.

761	7.2.  CBs with tunnels carrying Congestion-Controlled Traffic

763	   IP-based traffic is generally assumed to be congestion-controlled,
764	   i.e., it is assumed that the transport protocols generating IP-based
765	   traffic at the sender already employ mechanisms that are sufficient
766	   to address congestion on the path [ID-ietf-tsvwg-RFC5405.bis].  A
767	   question therefore arises when people deploy a tunnel that is thought
768	   to only carry an aggregate of TCP (or some other Congestion
769	   Controller-controlled) traffic: Is there advantage in this case in
770	   using a Circuit Breaker?

772	   Traffic in such a tunnel will certainly respond to congestion.
773	   However, the answer to the question is not always obvious, because
774	   the overall traffic formed by an aggregate of flows that implement a
775	   Congestion Controller mechanism does not necessarily prevent
776	   persistent congestion.  For instance, most Congestion Controller
777	   mechanisms require long-lived flows to react to reduce the rate of a
778	   flow; an aggregate of many short flows could result in many
779	   terminating before they experience congestion.  It is also often
780	   impossible for a tunnel service provider to know that the tunnel only
781	   contains CC-controlled traffic (e.g., inspecting packet headers might
782	   not be possible).  The important thing to note is that if the
783	   aggregate of the traffic does not result in persistent congestion
784	   (impacting other flows), then the Circuit Breaker will not trigger.
785	   This is the expected case in this context - so implementing a Circuit
786	   Breaker will not reduce performance of the tunnel, but in the event
787	   that persistent congestion occurs this protects other network traffic
788	   that shares capacity with the tunnel traffic.

790	7.3.  CBs with Uni-directional Traffic and no Control Path

792	   A one-way forwarding path could have no associated communication path
793	   for sending control messages, and therefore cannot be controlled
794	   using an automated process.  This service could be provided using a
795	   path that has dedicated capacity and does not share this capacity
796	   with other elastic Internet flows (i.e., flows that vary their rate).

798	   A way to mitigate the impact on other flows when capacity could be
799	   shared is to manage the traffic envelope by using ingress policing.

801	   Supporting this type of traffic in the general Internet requires
802	   operator monitoring to detect and respond to persistent congestion.

804	8.  Security Considerations

806	   All Circuit Breaker mechanisms rely upon coordination between the
807	   ingress and egress meters and communication with the trigger
808	   function.  This is usually achieved by passing network control
809	   information (or protocol messages) across the network.  Timely
810	   operation of a circuit breaker depends on the choice of measurement
811	   period.  If the receiver has an interval that is overly long, then
812	   the responsiveness of the circuit breaker decreases.  This impacts
813	   the ability of the circuit breaker to detect and react to congestion.

815	   Mechanisms need to be implemented to prevent attacks on the network
816	   control information that would result in Denial of Service (DoS).
817	   The source and integrity of control information (measurements and
818	   triggers) MUST be protected from off-path attacks.  Without
819	   protection, it could be trivial for an attacker to inject packets
820	   with values that could prematurely trigger a circuit breaker
821	   resulting in DoS.  Simple protection can be provided by using a
822	   randomized source port, or equivalent field in the packet header
823	   (such as the RTP SSRC value and the RTP sequence number) expected not
824	   to be known to an off-path attacker.  Stronger protection can be
825	   achieved using a secure authentication protocol.

827	   Transmission of network control information consumes network
828	   capacity.  This control traffic needs to be considered in the design
829	   of a Circuit Breaker and could potentially add to network congestion.
830	   If this traffic is sent over a shared path, it is RECOMMENDED that
831	   this control traffic is prioritized to reduce the probability of loss
832	   under congestion.  Control traffic also needs to be considered when
833	   provisioning a network that uses a circuit breaker.

835	   The circuit breaker MUST be designed to be robust to packet loss that
836	   can also be experienced during congestion/overload.  Loss of control
837	   messages could be a side-effect of a congested network, but also
838	   could arise from other causes.  This does not imply that it is
839	   desirable to provide reliable delivery (e.g., over TCP), since this
840	   can incur additional delay in responding to congestion.  Appropriate
841	   mechanisms could be to duplicate control messages to provide
842	   increased robustness to loss, and/or to regard a lack of control
843	   traffic as an indication that excessive congestion may be being
844	   experienced [ID-ietf-tsvwg-RFC5405.bis].
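
   The two mechanisms suggested above (duplicated control messages, and
   treating a lack of control traffic as a sign of congestion) could be
   combined at the receiver of the control messages as sketched below.
   This Python example is an illustration only: the class name, the use
   of a sequence number to discard duplicates, and the default of three
   missed 1-second intervals are assumptions made for the example.

```python
class ControlMonitor:
    """Hypothetical receiver-side monitor for circuit breaker control."""

    def __init__(self, start_time: float, interval_s: float = 1.0,
                 max_missed: int = 3):
        self.last_heard = start_time
        self.timeout_s = interval_s * max_missed  # assumed silence limit
        self.highest_seq = -1

    def on_message(self, seq: int, now: float) -> bool:
        """Return True for a new message, False for a duplicate."""
        self.last_heard = now  # any copy shows the control path works
        if seq <= self.highest_seq:
            return False       # duplicate (or old) copy: ignore content
        self.highest_seq = seq
        return True

    def silent_too_long(self, now: float) -> bool:
        """True if control traffic has been absent beyond the timeout."""
        return now - self.last_heard > self.timeout_s
```

   A trigger function could treat silent_too_long() returning True in
   the same way as a measurement that indicates excessive congestion.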

846	   The security implications depend on the design of the mechanisms, the
847	   type of traffic being controlled and the intended deployment
848	   scenario.  Each design of a Circuit Breaker MUST therefore evaluate
849	   whether the particular circuit breaker mechanism has new security
850	   implications.

852	9.  IANA Considerations

854	   This document makes no request of IANA.

856	10.  Acknowledgments

858	   There are many people who have discussed and described the issues
859	   that have motivated this draft.  Contributions and comments included:
860	   Lars Eggert, Colin Perkins, David Black, Matt Mathis and Andrew
861	   McGregor.  This work was part-funded by the European Community under
862	   its Seventh Framework Programme through the Reducing Internet
863	   Transport Latency (RITE) project (ICT-317700).

865	11.  Revision Notes

867	   XXX RFC-Editor: Please remove this section prior to publication XXX

869	   Draft 00

871	   This was the first revision.  Help and comments are greatly
872	   appreciated.

874	   Draft 01

876	   Contained clarifications and changes in response to received
877	   comments, plus addition of diagram and definitions.  Comments are
878	   welcome.

880	   WG Draft 00

882	   Approved as a WG work item on 28th Aug 2014.

884	   WG Draft 01
885	   Incorporates feedback after Dallas IETF TSVWG meeting.  This version
886	   is thought ready for WGLC comments.

888	   WG Draft 02

890	   Minor fixes for typos.  Rewritten security considerations section.

892	   WG Draft 03

894	   Updates following WGLC comments (see TSV mailing list).  Comments
895	   from C Perkins; D Black and off-list feedback.

897	   A clear recommendation of intended scope.

899	   Changes include: Improvement of language on timescales and minimum
900	   measurement period; clearer articulation of endpoint and multicast
901	   examples - with new diagrams; separation of the controlled network
902	   case; updated text on position of trigger function; corrections to
903	   RTP-CB text; clarification of loss v ECN metrics; checks against
904	   submission checklist (use of keywords, added meters to diagrams).

906	   WG Draft 04

908	   Added section on PW CB for TDM - a newly adopted draft (D.  Black).

910	   WG Draft 05

912	   Added clarifications requested during AD review.

914	   WG Draft 06

916	   Fixed some remaining typos.

918	   Update following detailed review by Bob Briscoe, and comments by D.
919	   Black.

921	12.  References

923	12.1.  Normative References

925	   [ID-ietf-tsvwg-RFC5405.bis]
926	              Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
927	              Guidelines (Work-in-Progress)", 2015.

929	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
930	              Requirement Levels", BCP 14, RFC 2119,
931	              DOI 10.17487/RFC2119, March 1997,
932	              <http://www.rfc-editor.org/info/rfc2119>.

934	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
935	              of Explicit Congestion Notification (ECN) to IP",
936	              RFC 3168, DOI 10.17487/RFC3168, September 2001,
937	              <http://www.rfc-editor.org/info/rfc3168>.

939	12.2.  Informative References

941	   [ID-ietf-pals-congcons]
942	              Stein, YJ., Black, D., and B. Briscoe, "Pseudowire
943	              Congestion Considerations (Work-in-Progress)", 2015.

945	   [Jacobsen88]
946	              Jacobson, V., "Congestion Avoidance and Control",
947	              SIGCOMM Symposium proceedings on Communications
948	              architectures and protocols,
949	              August 1988.

951	   [RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
952	              RFC 1112, DOI 10.17487/RFC1112, August 1989,
953	              <http://www.rfc-editor.org/info/rfc1112>.

955	   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
956	              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
957	              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
958	              S., Wroclawski, J., and L. Zhang, "Recommendations on
959	              Queue Management and Congestion Avoidance in the
960	              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
961	              <http://www.rfc-editor.org/info/rfc2309>.

963	   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
964	              RFC 2914, DOI 10.17487/RFC2914, September 2000,
965	              <http://www.rfc-editor.org/info/rfc2914>.

967	   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
968	              Edge-to-Edge (PWE3) Architecture", RFC 3985,
969	              DOI 10.17487/RFC3985, March 2005,
970	              <http://www.rfc-editor.org/info/rfc3985>.

972	   [RFC4488]  Levin, O., "Suppression of Session Initiation Protocol
973	              (SIP) REFER Method Implicit Subscription", RFC 4488,
974	              DOI 10.17487/RFC4488, May 2006,
975	              <http://www.rfc-editor.org/info/rfc4488>.

977	   [RFC4553]  Vainshtein, A., Ed. and YJ. Stein, Ed., "Structure-
978	              Agnostic Time Division Multiplexing (TDM) over Packet
979	              (SAToP)", RFC 4553, DOI 10.17487/RFC4553, June 2006,
980	              <http://www.rfc-editor.org/info/rfc4553>.

982	   [RFC5086]  Vainshtein, A., Ed., Sasson, I., Metz, E., Frost, T., and
983	              P. Pate, "Structure-Aware Time Division Multiplexed (TDM)
984	              Circuit Emulation Service over Packet Switched Network
985	              (CESoPSN)", RFC 5086, DOI 10.17487/RFC5086, December 2007,
986	              <http://www.rfc-editor.org/info/rfc5086>.

988	   [RFC5087]  Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi,
989	              "Time Division Multiplexing over IP (TDMoIP)", RFC 5087,
990	              DOI 10.17487/RFC5087, December 2007,
991	              <http://www.rfc-editor.org/info/rfc5087>.

993	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
994	              Friendly Rate Control (TFRC): Protocol Specification",
995	              RFC 5348, DOI 10.17487/RFC5348, September 2008,
996	              <http://www.rfc-editor.org/info/rfc5348>.

998	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
999	              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
1000	              <http://www.rfc-editor.org/info/rfc5681>.

1002	   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
1003	              and K. Carlberg, "Explicit Congestion Notification (ECN)
1004	              for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
1005	              2012, <http://www.rfc-editor.org/info/rfc6679>.

1007	   [RTP-CB]   Perkins, C. and V. Singh, "Multimedia Congestion Control:
1008	              Circuit Breakers for Unicast RTP Sessions", February 2014.

1010	Author's Address

1012	   Godred Fairhurst
1013	   University of Aberdeen
1014	   School of Engineering
1015	   Fraser Noble Building
1016	   Aberdeen, Scotland  AB24 3UE
1017	   UK

1019	   Email: gorry@erg.abdn.ac.uk
1020	   URI:   http://www.erg.abdn.ac.uk