idnits 2.17.1 

draft-fairhurst-tsvwg-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 12, 2014) is 3660 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RTP-CB' is defined on line 468, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6040' is defined on line 487, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobsen88'

  ** Obsolete normative reference: RFC 5405 (Obsoleted by RFC 8085)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RTP-CB'


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TSVWG Working Group                                         G. Fairhurst
3	Internet-Draft                                    University of Aberdeen
4	Intended status: Standards Track                          April 12, 2014
5	Expires: October 14, 2014

7	                   Network Transport Circuit Breakers
8	                       draft-fairhurst-tsvwg-00

10	Abstract

12	   This note explains what is meant by the term "transport circuit
13	   breaker" in the context of an Internet tunnel service.

15	Status of This Memo

17	   This Internet-Draft is submitted in full conformance with the
18	   provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF).  Note that other groups may also distribute
22	   working documents as Internet-Drafts.  The list of current Internet-
23	   Drafts is at http://datatracker.ietf.org/drafts/current/.

25	   Internet-Drafts are draft documents valid for a maximum of six months
26	   and may be updated, replaced, or obsoleted by other documents at any
27	   time.  It is inappropriate to use Internet-Drafts as reference
28	   material or to cite them other than as "work in progress."

30	   This Internet-Draft will expire on October 14, 2014.

32	Copyright Notice

34	   Copyright (c) 2014 IETF Trust and the persons identified as the
35	   document authors.  All rights reserved.

37	   This document is subject to BCP 78 and the IETF Trust's Legal
38	   Provisions Relating to IETF Documents
39	   (http://trustee.ietf.org/license-info) in effect on the date of
40	   publication of this document.  Please review these documents
41	   carefully, as they describe your rights and restrictions with respect
42	   to this document.  Code Components extracted from this document must
43	   include Simplified BSD License text as described in Section 4.e of
44	   the Trust Legal Provisions and are provided without warranty as
45	   described in the Simplified BSD License.

47	Table of Contents

49	   1.  Introduction
50	     1.1.  Types of Circuit-Breaker
51	   2.  Terminology
52	   3.  Designing a Circuit-Breaker (What makes a good circuit
53	       breaker?)
54	     3.1.  Basic Function
55	   4.  Examples of Circuit Breakers
56	     4.1.  A fast-trip Circuit Breaker
57	       4.1.1.  A fast-trip RTP Circuit Breaker
58	     4.2.  A Slow-trip Circuit Breaker
59	     4.3.  A Managed Circuit Breaker
60	       4.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires
61	   5.  Examples where circuit breakers may not be needed.
62	     5.1.  CBs and uni-directional Traffic
63	     5.2.  CBs over pre-provisioned Capacity
64	     5.3.  CBs with CC Traffic
65	   6.  Security Considerations
66	   7.  IANA Considerations
67	   8.  Acknowledgments
68	   9.  Revision Notes
69	   10. References
70	     10.1.  Normative References
71	     10.2.  Informative References
72	   Author's Address

74	1.  Introduction

76	   A transport Circuit Breaker (CB) is an automatic mechanism that is
77	   used to estimate congestion caused by a flow, and to terminate (or
78	   significantly reduce the rate of) the flow when excessive congestion
79	   is detected.  This is a safety measure to prevent congestion collapse
80	   (starvation of resources available to other flows), essential for an
81	   Internet that is heterogeneous and for traffic that is hard to
82	   predict in advance.

84	   A CB is intended as a protection mechanism of last resort.  Under
85	   normal circumstances, a CB should not be triggered; it is designed to
86	   protect the network when there is overload.  Similarly, people do
87	   not expect the electrical circuit-breaker (or fuse) in their home to
88	   be triggered, except when there is a wiring fault or a problem with
89	   an electrical appliance.

91	   Persistent congestion (also known as "congestion collapse") was a
92	   feature of the early Internet of the 1980s.  This resulted in excess
93	   traffic starving other connections of access to the Internet.  It
94	   was countered by the requirement to use congestion control (CC) in
95	   the TCP transport protocol [Jacobsen88] [RFC1112].  These mechanisms
96	   operate in Internet hosts to cause TCP connections to "back off"
97	   during congestion.  The introduction of CC in TCP (currently
98	   documented in [RFC5681]) ensured the stability of the Internet,
99	   because it was able to detect congestion and promptly react.  This
100	   worked well while TCP was by far the dominant traffic in the
101	   Internet, and most TCP flows were long-lived (ensuring that they
102	   could detect and respond to congestion before the flows terminated).
103	   This is no longer the case, and non-congestion-controlled traffic,
104	   such as UDP, can form a significant proportion of the total traffic
105	   traversing a link.  The current Internet therefore requires that
106	   non-congestion-controlled traffic be considered in order to avoid
107	   congestion collapse.

109	   There are important differences between a transport circuit-breaker
110	   and a congestion-control method.  Specifically, congestion control
111	   (as implemented in TCP, SCTP, and DCCP) needs to operate on the
112	   timescale on the order of a packet round-trip-time (RTT), the time
113	   from sender to destination and return.  Congestion control methods
114	   may react to a single packet loss/marking and reduce the transmission
115	   rate for each loss or congestion event.  The goal is usually to limit
116	   the maximum transmission rate that reflects the available capacity of
117	   a network path.  These methods typically operate on individual
118	   traffic flows (e.g. a 5-tuple).

120	   In contrast, CBs are recommended for traffic aggregates, e.g. traffic
121	   sent using a network tunnel.  Later sections provide examples of
122	   cases where circuit-breakers may or may not be desirable.

124	   A CB needs to be designed to trigger robustly when there is
125	   persistent congestion.  It will often operate on a much longer
126	   timescale: many RTTs, possibly many 10s of seconds.  This longer
127	   period is needed to provide sufficient time for transports (or
128	   applications) to adjust their rate following congestion, and for the
129	   network load to stabilise after adjustment.  A CB also needs to
130	   decide if a reaction is required based on a series of successive
131	   samples taken over a reasonably long period of time.  This is to
132	   ensure that a CB does not accidentally trigger following a single (or
133	   even successive) congestion events (congestion events are what
134	   triggers congestion control, and are to be regarded as normal on a
135	   network link operating near its capacity).

137	1.1.  Types of Circuit-Breaker

139	   There are various forms of circuit breaker, which are differentiated
140	   mainly on the timescale over which they are triggered, but also in
141	   the intended protection they offer:

143	   o  Fast-Trip Circuit Breakers: The relatively short timescale used by
144	      this form of circuit breaker is intended to protect a flow or
145	      related group of flows.

147	   o  Slow-Trip Circuit Breakers: This circuit breaker utilises a longer
148	      timescale and is designed to protect traffic aggregates.

150	   o  Managed Circuit Breakers: Utilise the operations and management
151	      functions that may be present in a managed service to implement a
152	      circuit breaker.

154	   Examples of each type of circuit breaker are provided in section 4.

156	2.  Terminology

158	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
159	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
160	   document are to be interpreted as described in [RFC2119].

162	3.  Designing a Circuit-Breaker (What makes a good circuit breaker?)

164	   Although circuit breakers have been talked about in the IETF for many
165	   years, there has not yet been guidance on the cases where they are
166	   needed or on the design of circuit breaker mechanisms.  This document
167	   seeks to offer advice on these topics.

169	   The basic design of a circuit breaker involves communication between
170	   the sender and receiver of a network flow.  It is assumed that a
171	   sender can control the rate of the flow, but the effect of congestion
172	   can only be measured at the corresponding receiver (after loss/
173	   marking is experienced across the end-to-end path).  The receiver
174	   therefore needs to be responsible for either measuring the level of
175	   congestion (and returning this measure to the sender to inform a
176	   trigger) or for detecting excessive congestion (returning the trigger
177	   to the sender).  Whether the trigger is generated at the receiver or
178	   based on measurements returned to the sender, the result of the
179	   trigger (the circuit-breaker action) needs to be applied at the
180	   sender.

182	   The components needed to implement a circuit breaker are:

184	   o  There MUST be a control path from the receiver to the sender.
185	      Ideally the CB should trigger if this control path fails.  That
186	      is, the feedback indicating a congested period is designed so that
187	      the sender triggers the CB action when it fails to receive reports
188	      from the receiver that indicate an absence of congestion, rather
189	      than relying on the successful transmission of a "congested"
190	      signal back to the sender.  (The feedback signal could itself be
191	      lost under congestion collapse).

193	   o  A CB MUST define a measurement period over which the receiver
194	      measures the level of congestion.  This method does not have to
195	      detect individual packet loss, but MUST have a way to know that
196	      packets have been lost/marked from the traffic flow.  If ECN is
197	      enabled, a receiver MAY also count the number of Explicit
198	      Congestion Notification (ECN)[RFC3168] marks per measurement
199	      interval, but even if ECN is used, the loss MUST still be
200	      measured, since this better reflects the impact of excessive
201	      congestion.  The type of CB will determine how long this
202	      measurement period needs to be.  The minimum time must be
203	      significantly longer than the time that current CC algorithms need
204	      to reduce their rate following detection of congestion (i.e. many
205	      path RTTs).

207	   o  A CB MUST define a threshold to determine whether the measured
208	      congestion is considered excessive.

210	   o  A CB MUST define a period over which the trigger uses collected
211	      measurements.

213	   o  A CB MUST be robust to multiple congestion events.  This usually
214	      will define a number of measured excessive congestion events per
215	      triggering period.  For example, a CB may combine the results of
216	      several measurement periods to determine if the CB is triggered.
217	      (e.g. triggered when excessive congestion is detected in 3
218	      measurements within the triggering interval).

220	   o  A triggered CB MUST react decisively by reducing traffic at the
221	      source (e.g. tunnel ingress).  A CB SHOULD be constructed so that
222	      it does not trigger under light or intermittent congestion, hence
223	      the response when triggered needs to be much more severe than that
224	      of a CC algorithm.  By default, a CB SHOULD disable the flow;
225	      alternatively, it could significantly reduce the rate of the flow
226	      it controls.

228	   o  Triggering a CB SHOULD result in a response that continues for a
229	      period of time.  This by default SHOULD be at least the triggering
230	      interval.  Manual operator intervention MAY be required to restore
231	      the flow.  If an automated response is needed to restore the flow,
232	      then this MUST NOT be immediate.

234	   o  When a CB is triggered, it SHOULD be regarded as an abnormal
235	      network event.  As such, this event SHOULD be logged.  The
236	      measurements that lead to triggering of the CB SHOULD also be
237	      logged.
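
   The components listed above can be illustrated with a minimal
   sketch.  It is not a normative algorithm.  The class name, the
   method name, and every default value (a 10% loss threshold, 3
   excessive measurements within a window of 6 measurement periods,
   and a response lasting 6 periods) are assumptions chosen only for
   illustration.  The sketch treats a missing receiver report as
   excessive congestion, logs the trigger, and keeps the flow disabled
   for a hold period so that restoration is never immediate.

```python
import logging
from collections import deque

log = logging.getLogger("circuit_breaker")


class CircuitBreaker:
    """Illustrative CB state machine; all parameter values are assumptions."""

    def __init__(self, loss_threshold=0.10, excessive_needed=3,
                 window_periods=6, hold_periods=6):
        self.loss_threshold = loss_threshold      # loss ratio deemed excessive
        self.excessive_needed = excessive_needed  # excessive periods that trigger
        self.hold_periods = hold_periods          # duration of the response
        # One flag per measurement period within the triggering interval.
        self.history = deque(maxlen=window_periods)
        self.hold_remaining = 0

    def on_period(self, loss_ratio):
        """Process one measurement period at the sender.

        loss_ratio is the value reported by the receiver, or None when
        no report arrived.  Returns True while the flow must stay disabled.
        """
        if self.hold_remaining > 0:
            # Response in progress: restoration is never immediate.
            self.hold_remaining -= 1
            return True
        # Fail-safe: a missing report counts as excessive congestion.
        excessive = loss_ratio is None or loss_ratio > self.loss_threshold
        self.history.append(excessive)
        if sum(self.history) >= self.excessive_needed:
            # Triggering is an abnormal event, so log it with its evidence.
            log.warning("circuit breaker triggered; recent periods: %s",
                        list(self.history))
            self.hold_remaining = self.hold_periods
            self.history.clear()
            return True
        return False
```

   In this sketch the flow resumes automatically once the hold period
   has elapsed.  A deployment that requires manual operator
   intervention would instead keep the flow disabled until explicitly
   reset.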

239	3.1.  Basic Function

241	   This section provides one example of a suitable method to measure
242	   congestion:

244	   1.  A sender or a tunnel ingress records the number of packets/bytes
245	       sent in each measurement interval.  The measurement interval
246	       could be every few seconds.

248	   2.  The receiver or tunnel egress also records the number of
249	       packets/bytes received in each measurement interval.

251	   3.  The receiver periodically returns the measured values.  (This
252	       could be using Operations and Management (OAM), or an in-band
253	       signalling datagram).

255	   4.  Using the ingress and egress measurements, the loss rate for each
256	       measurement interval can be deduced by calculating the
257	       difference between these two counter values.  Note that accurate
258	       measurement intervals are not typically important, since isolated
259	       loss events need to be disregarded.  An appropriate threshold for
260	       determining excessive congestion needs to be set (e.g. more than
261	       10% loss, but other methods could also be based on the rate of
262	       transmission as well as the loss rate).

264	   5.  The transport circuit breaker is triggered when the threshold is
265	       exceeded in multiple measurement intervals (e.g. 3 successive
266	       measurements).  This design ensures that a single or spurious
267	       event does not result in a trigger.

269	   6.  The design may also trigger when it does not receive receiver
270	       measurements for 3 successive measurement periods - this may
271	       indicate a loss of control packets.
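
   The steps above can be sketched as follows.  This is a minimal
   illustration rather than a definitive implementation.  The function
   names and the representation of a missing report as None are
   hypothetical.  The 10% threshold and the count of 3 successive
   intervals are the example values from the steps above.  As a
   simplifying assumption, an interval with a missing report and an
   interval with excessive loss are counted in the same run.

```python
def loss_ratio(sent, received):
    """Fraction of packets lost in one interval, from the two counters."""
    if sent <= 0:
        return 0.0
    # Clamp at zero in case interval boundaries differ slightly at each end.
    return max(0.0, (sent - received) / sent)


def first_trigger(sent_counts, received_reports, threshold=0.10, successive=3):
    """Return the index of the interval in which the CB triggers, or None.

    sent_counts: packets sent per interval, recorded at the ingress.
    received_reports: packets received per interval as reported by the
        egress, with None where no report reached the ingress.
    """
    run = 0  # successive intervals that were over threshold or unreported
    for i, (sent, received) in enumerate(zip(sent_counts, received_reports)):
        # Assumption: a lost report is treated like excessive loss.
        bad = received is None or loss_ratio(sent, received) > threshold
        run = run + 1 if bad else 0
        if run >= successive:
            return i
    return None
```

   With 1000 packets sent in every interval and reports of 990, 850,
   880 and 800 packets received, the second, third and fourth
   intervals each exceed 10% loss, so this sketch triggers in the
   fourth interval.  An isolated interval of high loss between two
   clean intervals resets the run and does not trigger.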

273	4.  Examples of Circuit Breakers

275	   This section provides examples of different types of circuit breaker.
276	   There are multiple types of circuit breaker that may be defined for
277	   use in different deployment cases:

279	4.1.  A fast-trip Circuit Breaker

281	   A fast-trip circuit breaker is the most responsive.  It has a response
282	   time that is only slightly larger than that of the traffic it
283	   controls.  It is suited to traffic with well-understood
284	   characteristics.  It is not suited to arbitrary network traffic,
285	   since it may prematurely trigger (e.g. when multiple congestion-
286	   controlled flows lead to short-term overload).

288	4.1.1.  A fast-trip RTP Circuit Breaker

290	   A set of fast-trip CB methods has been specified for use together by
291	   a Real-time Transport Protocol (RTP) flow using the RTP/AVP Profile
292	   [RTP-CB].  It is expected that, in the absence of severe congestion,
293	   all RTP applications running on best-effort IP networks will be able
294	   to run without triggering these circuit breakers.

296	   The RTP congestion control specification is therefore implemented as
297	   a fail-safe.

299	   The sender monitors reception of RTCP Reception Report (RR or XRR)
300	   packets that convey reception quality feedback information.  This is
301	   used to measure (congestion) loss, possibly in combination with ECN
302	   [RFC6679].

304	   The CB action (shutdown of the flow) is triggered when any of the
305	   following trigger conditions are true:

307	   1.  An RTP CB triggers on reported lack of progress.

309	   2.  An RTP CB triggers when no receiver report messages are
310	       received.

312	   3.  An RTP CB uses a TFRC-style check and sets a hard upper limit to
313	       the long-term RTP throughput (over many RTTs).

315	   4.  An RTP CB includes the notion of Media Usability.  This circuit
316	       breaker is triggered when the quality of the transported media
317	       falls below some required minimum acceptable quality.

319	4.2.  A Slow-trip Circuit Breaker

321	   It is expected that most circuit breakers will be slower at
322	   responding to loss.

324	   One example where a circuit breaker is needed is where flows or
325	   traffic-aggregates use a tunnel or encapsulation and the flows within
326	   the tunnel do not all support TCP-style congestion control (e.g. TCP,
327	   SCTP, TFRC), see [RFC5405] section 3.1.3.  The usual case where this
328	   is needed is when tunnels are deployed in the general Internet
329	   (rather than "controlled environments" within an ISP or Enterprise),
330	   especially when the tunnel may need to cross a customer access
331	   router.

333	4.3.  A Managed Circuit Breaker

335	   This type of circuit breaker is implemented in the signalling
336	   protocol or management plane that relates to the traffic aggregate
337	   being controlled.  This type of circuit breaker is typically
338	   applicable when the deployment is within a "controlled environment".

340	4.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires

342	   [RFC4553], SAToP Pseudo-Wires (PWE3), section 8 describes an example
343	   of a managed circuit breaker for isochronous flows.

345	   If such flows were to run over a pre-provisioned (e.g. MPLS)
346	   infrastructure, then it may be expected that the Pseudo-Wire (PW)
347	   would not experience congestion, because a flow is not expected to
348	   either increase or decrease its rate.  If instead Pseudo-Wire
349	   traffic is multiplexed with other traffic over the general Internet,
350	   it could experience congestion.  [RFC4553] states: "If SAToP PWs run
351	   over a PSN providing best-effort service, they SHOULD monitor packet
352	   loss in order to detect "severe congestion".  The currently
353	   recommended measurement period is 1 second, and the trigger operates
354	   when there are more than three measured Severely Errored Seconds
355	   (SES) within a period.

357	   If such a condition is detected, a SAToP PW should shut down
358	   bidirectionally for some period of time..." The concept was that when
359	   the packet loss ratio (congestion) level increased above a threshold,
360	   the PW was by default disabled.  This use case considered fixed-rate
361	   transmission, where the PW had no reasonable way to shed load.

363	   The trigger needs to be set at the rate at which the PW was likely to
364	   have a serious problem, possibly making the service non-compliant.
365	   At this point, triggering the CB would remove the traffic to prevent
366	   undue impact on congestion-responsive traffic (e.g., TCP).  Part of
367	   the rationale was that high loss ratios typically indicated that
368	   something was "broken" and should have already resulted in operator
369	   intervention, and should trigger this intervention.  An operator-
370	   based response provides opportunity for other action to restore the
371	   service quality, e.g. by shedding other loads or assigning additional
372	   capacity, or to consciously avoid reacting to the trigger while
373	   engineering a solution to the problem.  This may require the trigger
374	   to be sent to a third location (e.g. a network operations centre,
375	   NOC) responsible for operation of the tunnel ingress, rather than the
376	   tunnel ingress itself.

378	5.  Examples where circuit breakers may not be needed.

380	   A CB is not required for a single CC-controlled flow using TCP, SCTP,
381	   TFRC, etc.  In these cases, the CC methods are designed to prevent
382	   congestion collapse.

384	5.1.  CBs and uni-directional Traffic

386	   A CB cannot be used to control uni-directional UDP traffic.  The
387	   lack of feedback prevents automated triggering of the CB.  Supporting
388	   this type of traffic in the general Internet requires operator
389	   monitoring to detect and respond to congestion collapse or the use of
390	   dedicated capacity - e.g. using pre-provisioned MPLS services, RSVP,
391	   or admission-controlled Differentiated Services.

393	5.2.  CBs over pre-provisioned Capacity

395	   One common question is whether a CB is needed when a tunnel is
396	   deployed in a private network with pre-provisioned capacity.  In this
397	   case, compliant traffic that does not exceed the provisioned capacity
398	   should not result in congestion.  The CB will hence only be triggered
399	   when there is non-compliant traffic.  It could be argued that this
400	   event should never happen - but it may also be argued that the CB
401	   equally should never be triggered.  If a CB were to be implemented,
402	   it would provide an appropriate response should this excessive
403	   congestion occur in an operational network.

405	5.3.  CBs with CC Traffic

407	   IP-based traffic is generally assumed to be congestion-controlled,
408	   i.e., it is assumed that the transport protocols generating IP-based
409	   traffic at the sender already employ mechanisms that are sufficient
410	   to address congestion on the path [RFC5405].  A question therefore
411	   arises when people deploy a tunnel that is thought to only carry an
412	   aggregate of TCP (or some other CC-controlled) traffic: Is there an
413	   advantage in this case in using a CB?  Certainly, traffic in such a
414	   tunnel will respond to congestion.  However, the answer to the
415	   question is not obvious, because the overall traffic formed by an
416	   aggregate of flows that implement a CC mechanism does not necessarily
417	   prevent congestion collapse.  For instance, most CC mechanisms
418	   require long-lived flows to react to reduce the rate of a flow; an
419	   aggregate of many short flows may result in many terminating before
420	   they experience congestion.  It is also often impossible for a tunnel
421	   service provider to know that the tunnel only contains CC-controlled
422	   traffic (e.g. inspecting packet headers may not be possible).  The
423	   important thing to note is that if the aggregate of the traffic does
424	   not result in persistent congestion (impacting other flows), then the
425	   CB will not trigger.  This is the expected case in this context - so
426	   implementing a CB will not reduce performance of the tunnel, but
427	   offers protection should congestion collapse occur.

429	6.  Security Considerations

431	   This section will describe security considerations.

433	7.  IANA Considerations

435	   This document makes no request from IANA.

437	8.  Acknowledgments

439	   There are many people who have discussed and described the issues
440	   that have motivated this draft.

442	9.  Revision Notes

444	   RFC-Editor: Please remove this section prior to publication

446	   Draft 00

448	   This was the first revision.  Help and comments are greatly
449	   appreciated.

451	10.  References

453	10.1.  Normative References

455	   [Jacobsen88]
456	              Jacobson, V., "Congestion Avoidance and Control",
457	              SIGCOMM Symposium proceedings on Communications
458	              architectures and protocols,
459	              August 1988.

461	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
462	              Requirement Levels", BCP 14, RFC 2119, March 1997.

464	   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
465	              for Application Designers", BCP 145, RFC 5405, November
466	              2008.

468	   [RTP-CB]   and , "Multimedia Congestion Control: Circuit Breakers for
469	              Unicast RTP Sessions", February 2014.

471	10.2.  Informative References

473	   [RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
474	              RFC 1112, August 1989.

476	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
477	              of Explicit Congestion Notification (ECN) to IP", RFC
478	              3168, September 2001.

480	   [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
481	              Division Multiplexing (TDM) over Packet (SAToP)", RFC
482	              4553, June 2006.

484	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
485	              Control", RFC 5681, September 2009.

487	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
488	              Notification", RFC 6040, November 2010.

490	   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
491	              and K. Carlberg, "Explicit Congestion Notification (ECN)
492	              for RTP over UDP", RFC 6679, August 2012.

494	Author's Address

496	   Godred Fairhurst
497	   University of Aberdeen
498	   School of Engineering
499	   Fraser Noble Building
500	   Aberdeen, Scotland  AB24 3UE
501	   UK

503	   Email: gorry@erg.abdn.ac.uk
504	   URI:   http://www.erg.abdn.ac.uk