idnits 2.17.1

draft-fairhurst-tsvwg-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 12, 2014) is 3660 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RTP-CB' is defined on line 468, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6040' is defined on line 487, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobsen88'

  ** Obsolete normative reference: RFC 5405 (Obsoleted by RFC 8085)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RTP-CB'

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    TSVWG Working Group                                                   G.
     Fairhurst
3    Internet-Draft                                    University of Aberdeen
4    Intended status: Standards Track                          April 12, 2014
5    Expires: October 14, 2014

7                      Network Transport Circuit Breakers
8                           draft-fairhurst-tsvwg-00

10   Abstract

12      This note explains what is meant by the term "transport circuit
13      breaker" in the context of an Internet tunnel service.

15   Status of This Memo

17      This Internet-Draft is submitted in full conformance with the
18      provisions of BCP 78 and BCP 79.

20      Internet-Drafts are working documents of the Internet Engineering
21      Task Force (IETF). Note that other groups may also distribute
22      working documents as Internet-Drafts. The list of current Internet-
23      Drafts is at http://datatracker.ietf.org/drafts/current/.

25      Internet-Drafts are draft documents valid for a maximum of six months
26      and may be updated, replaced, or obsoleted by other documents at any
27      time. It is inappropriate to use Internet-Drafts as reference
28      material or to cite them other than as "work in progress."

30      This Internet-Draft will expire on October 14, 2014.

32   Copyright Notice

34      Copyright (c) 2014 IETF Trust and the persons identified as the
35      document authors. All rights reserved.

37      This document is subject to BCP 78 and the IETF Trust's Legal
38      Provisions Relating to IETF Documents
39      (http://trustee.ietf.org/license-info) in effect on the date of
40      publication of this document. Please review these documents
41      carefully, as they describe your rights and restrictions with respect
42      to this document. Code Components extracted from this document must
43      include Simplified BSD License text as described in Section 4.e of
44      the Trust Legal Provisions and are provided without warranty as
45      described in the Simplified BSD License.

47   Table of Contents

49      1.  Introduction
50        1.1.  Types of Circuit-Breaker
51      2.  Terminology
52      3.
            Designing a Circuit-Breaker (What makes a good circuit
53          breaker?)
54        3.1.  Basic Function
55      4.  Examples of Circuit Breakers
56        4.1.  A fast-trip Circuit Breaker
57          4.1.1.  A fast-trip RTP Circuit Breaker
58        4.2.  A Slow-trip Circuit Breaker
59        4.3.  A Managed Circuit Breaker
60          4.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires
61      5.  Examples where circuit breakers may not be needed.
62        5.1.  CBs and uni-directional Traffic
63        5.2.  CBs over pre-provisioned Capacity
64        5.3.  CBs with CC Traffic
65      6.  Security Considerations
66      7.  IANA Considerations
67      8.  Acknowledgments
68      9.  Revision Notes
69      10. References
70        10.1.  Normative References
71        10.2.  Informative References
72      Author's Address

74   1.  Introduction

76      A transport Circuit Breaker (CB) is an automatic mechanism that is
77      used to estimate congestion caused by a flow, and to terminate (or
78      significantly reduce the rate of) the flow when excessive congestion
79      is detected. This is a safety measure to prevent congestion collapse
80      (starvation of resources available to other flows), essential for an
81      Internet that is heterogeneous and for traffic that is hard to
82      predict in advance.

84      A CB is intended as a protection mechanism of last resort.
        Under
85      normal circumstances, a CB should not be triggered; it is designed to
86      protect things when there is overload. Similarly, people do not expect
87      the electrical circuit-breaker (or fuse) in their home to be
88      triggered, except when there is a wiring fault or a problem with an
89      electrical appliance.

91      Persistent congestion (also known as "congestion collapse") was a
92      feature of the early Internet of the 1980s. This resulted in excess
93      traffic starving other connections of access to the Internet. It
94      was countered by the requirement to use congestion control (CC) by
95      the TCP transport protocol [Jacobsen88] [RFC1112]. These mechanisms
96      operate in Internet hosts to cause TCP connections to "back off"
97      during congestion. The introduction of CC in TCP (currently
98      documented in [RFC5681]) ensured the stability of the Internet,
99      because it was able to detect congestion and promptly react. This
100     worked well while TCP was by far the dominant traffic in the
101     Internet, and most TCP flows were long-lived (ensuring that they
102     could detect and respond to congestion before the flows terminated).
103     This is no longer the case, and non-congestion-controlled traffic,
104     such as UDP, can form a significant proportion of the total traffic
105     traversing a link. The current Internet therefore requires that
106     non-congestion-controlled traffic be considered to avoid
107     congestion collapse.

109     There are important differences between a transport circuit-breaker
110     and a congestion-control method. Specifically, congestion control
111     (as implemented in TCP, SCTP, and DCCP) needs to operate on a
112     timescale on the order of a packet round-trip-time (RTT), the time
113     from sender to destination and return. Congestion control methods
114     may react to a single packet loss/marking and reduce the transmission
115     rate for each loss or congestion event.
        The goal is usually to limit
116     the transmission rate to a maximum that reflects the available
117     capacity of a network path. These methods typically operate on
118     individual traffic flows (e.g. a 5-tuple).

120     In contrast, CBs are recommended for traffic aggregates, e.g., traffic
121     sent using a network tunnel. Later sections provide examples of
122     cases where circuit-breakers may or may not be desirable.

124     A CB needs to be designed to trigger robustly when there is
125     persistent congestion. It will often operate on a much longer
126     timescale: many RTTs, possibly many tens of seconds. This longer
127     period is needed to provide sufficient time for transports (or
128     applications) to adjust their rate following congestion, and for the
129     network load to stabilise after adjustment. A CB also needs to
130     decide if a reaction is required based on a series of successive
131     samples taken over a reasonably long period of time. This is to
132     ensure that a CB does not accidentally trigger following a single
133     congestion event, or even several successive ones (congestion events
134     are what trigger congestion control, and are to be regarded as normal
135     on a network link operating near its capacity).

137  1.1.  Types of Circuit-Breaker

139     There are various forms of circuit breaker, which are differentiated
140     mainly by the timescale over which they are triggered, but also by
141     the intended protection they offer:

143     o  Fast-Trip Circuit Breakers: The relatively short timescale used by
144        this form of circuit breaker is intended to protect a flow or
145        related group of flows.

147     o  Slow-Trip Circuit Breakers: This circuit breaker utilises a longer
148        timescale and is designed to protect traffic aggregates.

150     o  Managed Circuit Breakers: These utilise the operations and management
151        functions that may be present in a managed service to implement a
152        circuit breaker.

154     Examples of each type of circuit breaker are provided in section 4.

156  2.
     Terminology

158     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
159     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
160     document are to be interpreted as described in [RFC2119].

162  3.  Designing a Circuit-Breaker (What makes a good circuit breaker?)

164     Although circuit breakers have been talked about in the IETF for many
165     years, there has not yet been guidance on the cases where they are
166     needed or on the design of circuit breaker mechanisms. This document
167     seeks to offer advice on these topics.

169     The basic design of a circuit breaker involves communication between
170     the sender and receiver of a network flow. It is assumed that a
171     sender can control the rate of the flow, but the effect of congestion
172     can only be measured at the corresponding receiver (after loss/
173     marking is experienced across the end-to-end path). The receiver
174     therefore needs to be responsible for either measuring the level of
175     congestion (and returning this measure to the sender to inform a
176     trigger) or for detecting excessive congestion (returning the trigger
177     to the sender). Whether the trigger is generated at the receiver or
178     based on measurements returned to the sender, the result of the
179     trigger (the circuit-breaker action) needs to be applied at the
180     sender.

182     The components needed to implement a circuit breaker are:

184     o  There MUST be a control path from the receiver to the sender.
185        Ideally the CB should trigger if this control path fails. That
186        is, the feedback indicating a congested period is designed so that
187        the sender triggers the CB action when it fails to receive reports
188        from the receiver that indicate an absence of congestion, rather
189        than relying on the successful transmission of a "congested"
190        signal back to the sender. (The feedback signal could itself be
191        lost under congestion collapse).

193     o  A CB MUST define a measurement period over which the receiver
194        measures the level of congestion. This method does not have to
195        detect individual packet loss, but MUST have a way to know that
196        packets have been lost/marked from the traffic flow. If Explicit
197        Congestion Notification (ECN) [RFC3168] is enabled, a receiver MAY
198        also count the number of ECN marks per measurement
199        interval, but even if ECN is used, the loss MUST still be
200        measured, since this better reflects the impact of excessive
201        congestion. The type of CB will determine how long this
202        measurement period needs to be. The minimum time must be
203        significantly longer than the time that current CC algorithms need
204        to reduce their rate following detection of congestion (i.e. many
205        path RTTs).

207     o  A CB MUST define a threshold to determine whether the measured
208        congestion is considered excessive.

210     o  A CB MUST define a period over which the trigger uses collected
211        measurements.

213     o  A CB MUST be robust to multiple congestion events. This will
214        usually require defining a number of measured excessive congestion
215        events per triggering period. For example, a CB may combine the
216        results of several measurement periods to determine if the CB is
217        triggered (e.g. triggered when excessive congestion is detected in 3
218        measurements within the triggering interval).

220     o  A triggered CB MUST react decisively by reducing traffic at the
221        source (e.g. tunnel ingress). A CB SHOULD be constructed so that
222        it does not trigger under light or intermittent congestion, hence
223        the response when triggered needs to be much more severe than that
224        of a CC algorithm. By default, a CB SHOULD disable the flow; it
225        could alternatively significantly reduce the rate of the flow it
226        controls.

228     o  Triggering a CB SHOULD result in a response that continues for a
229        period of time. This by default SHOULD be at least the triggering
230        interval.
           Manual operator intervention MAY be required to restore
231        the flow. If an automated response is needed to restore the flow,
232        then this MUST NOT be immediate.

234     o  When a CB is triggered, it SHOULD be regarded as an abnormal
235        network event. As such, this event SHOULD be logged. The
236        measurements that led to triggering of the CB SHOULD also be
237        logged.

239  3.1.  Basic Function

241     This section provides one example of a suitable method to measure
242     congestion:

244     1.  A sender or a tunnel ingress records the number of packets/bytes
245         sent in each measurement interval. The measurement interval
246         could be every few seconds.

248     2.  The receiver or tunnel egress also records the number of
249         packets/bytes received in each measurement interval.

251     3.  The receiver periodically returns the measured values. (This
252         could be using Operations and Management (OAM), or an in-band
253         signalling datagram).

255     4.  Using the ingress and egress measurements, the loss rate for each
256         measurement interval can be deduced by calculating the
257         difference between these two counter values. Note that accurate
258         measurement intervals are not typically important, since isolated
259         loss events need to be disregarded. An appropriate threshold for
260         determining excessive congestion needs to be set (e.g. more than
261         10% loss, but other methods could also be based on the rate of
262         transmission as well as the loss rate).

264     5.  The transport circuit breaker is triggered when the threshold is
265         exceeded in multiple measurement intervals (e.g. 3 successive
266         measurements). This design makes the trigger robust to single or
267         spurious events.

269     6.  The design may also trigger when it does not receive
270         receiver measurements for 3 successive measurement periods - this
271         may indicate a loss of control packets.

273  4.  Examples of Circuit Breakers

275     This section provides examples of different types of circuit breaker.
276     There are multiple types of circuit breaker that may be defined for
277     use in different deployment cases:

279  4.1.  A fast-trip Circuit Breaker

281     A fast-trip circuit breaker is the most responsive. It has a response
282     time that is only slightly larger than that of the traffic it
283     controls. It is suited to traffic with well-understood
284     characteristics. It is not suited to arbitrary network traffic,
285     since it may prematurely trigger (e.g. when multiple congestion-
286     controlled flows lead to short-term overload).

288  4.1.1.  A fast-trip RTP Circuit Breaker

290     A set of fast-trip CB methods has been specified for use together by
291     a Real-time Transport Protocol (RTP) flow using the RTP/AVP Profile
292     [RTP-CB]. It is expected that, in the absence of severe congestion,
293     all RTP applications running on best-effort IP networks will be able
294     to run without triggering these circuit breakers.

296     The RTP congestion control specification is therefore implemented as
297     a fail-safe.

299     The sender monitors reception of RTCP Reception Report (RR or XRR)
300     packets that convey reception quality feedback information. This is
301     used to measure (congestion) loss, possibly in combination with ECN
302     [RFC6679].

304     The CB action (shutdown of the flow) is triggered when any of the
305     following trigger conditions is true:

307     1.  An RTP CB triggers on reported lack of progress.

309     2.  An RTP CB triggers when no receiver report messages are
310         received.

312     3.  An RTP CB uses a TFRC-style check and sets a hard upper limit to
313         the long-term RTP throughput (over many RTTs).

315     4.  An RTP CB includes the notion of Media Usability. This circuit
316         breaker is triggered when the quality of the transported media
317         falls below some required minimum acceptable quality.

319  4.2.  A Slow-trip Circuit Breaker

321     It is expected that most circuit breakers will be slower at
322     responding to loss.
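
        As a rough illustration, the example method of Section 3.1 could
        be sketched as follows. This is a minimal sketch under stated
        assumptions, not a specified algorithm: the class name, method
        name and parameter names are all hypothetical, the 10% loss
        threshold and the count of 3 successive intervals are the example
        values given in Section 3.1, and the choice to leave the
        congestion count unchanged when a report is missing is an
        assumption of this sketch.

```python
class SlowTripBreaker:
    """Illustrative slow-trip circuit breaker (all names are hypothetical)."""

    def __init__(self, loss_threshold=0.10, trip_count=3, max_missing_reports=3):
        self.loss_threshold = loss_threshold            # example: "more than 10% loss"
        self.trip_count = trip_count                    # example: "3 successive measurements"
        self.max_missing_reports = max_missing_reports  # example: 3 periods without a report
        self.congested_run = 0   # successive reported intervals over the threshold
        self.missing_run = 0     # successive intervals with no receiver report
        self.tripped = False

    def end_of_interval(self, sent, received=None):
        """Process one measurement interval and return True once tripped.

        sent:     packets counted at the ingress during the interval.
        received: packets reported by the egress, or None if no report arrived.
        """
        if self.tripped:
            return True  # no automatic restore: stays tripped in this sketch

        if received is None:
            # A lost report may indicate loss of control packets.
            # Assumption: it carries no loss evidence, so congested_run is kept.
            self.missing_run += 1
        else:
            self.missing_run = 0
            # Loss ratio from the difference between the two counters.
            loss = 0.0 if sent == 0 else max(0.0, (sent - received) / sent)
            if loss > self.loss_threshold:
                self.congested_run += 1
            else:
                self.congested_run = 0  # an isolated event is disregarded

        if (self.congested_run >= self.trip_count
                or self.missing_run >= self.max_missing_reports):
            self.tripped = True
        return self.tripped
```

        A caller would invoke end_of_interval() once per measurement
        interval at the ingress and disable (or heavily rate-limit) the
        tunnel when it returns True. Restoring the flow is deliberately
        left outside the sketch, since Section 3 says that any automated
        restore must not be immediate.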

324     One example where a circuit breaker is needed is where flows or
325     traffic-aggregates use a tunnel or encapsulation and the flows within
326     the tunnel do not all support TCP-style congestion control (e.g. TCP,
327     SCTP, TFRC); see [RFC5405], section 3.1.3. The usual case where this
328     is needed is when tunnels are deployed in the general Internet
329     (rather than "controlled environments" within an ISP or Enterprise),
330     especially when the tunnel may need to cross a customer access
331     router.

333  4.3.  A Managed Circuit Breaker

335     This type of circuit breaker is implemented in the signalling
336     protocol or management plane that relates to the traffic aggregate
337     being controlled. This type of circuit breaker is typically
338     applicable when the deployment is within a "controlled environment".

340  4.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires

342     [RFC4553], SAToP Pseudo-Wires (PWE3), section 8 describes an example
343     of a managed circuit breaker for isochronous flows.

345     If such flows were to run over a pre-provisioned (e.g. MPLS)
346     infrastructure, then it may be expected that the Pseudo-Wire (PW)
347     would not experience congestion, because the flows are not expected
348     to increase (or decrease) their rate. If instead Pseudo-Wire
349     traffic is multiplexed with other traffic over the general Internet,
350     it could experience congestion. [RFC4553] states: "If SAToP PWs run
351     over a PSN providing best-effort service, they SHOULD monitor packet
352     loss in order to detect "severe congestion". The currently
353     recommended measurement period is 1 second, and the trigger operates
354     when there are more than three measured Severely Errored Seconds
355     (SES) within a period.

357     If such a condition is detected, a SAToP PW should shut down
358     bidirectionally for some period of time..." The concept was that when
359     the packet loss ratio (congestion) level increased above a threshold,
360     the PW was by default disabled.
        This use case considered fixed-rate
361     transmission, where the PW had no reasonable way to shed load.

363     The trigger needs to be set at the rate at which the PW was likely to
364     have a serious problem, possibly making the service non-compliant. At
365     this point triggering the CB would remove the traffic, preventing undue
366     impact on congestion-responsive traffic (e.g., TCP). Part of the
367     rationale was that high loss ratios typically indicated that something
368     was "broken" and should already have resulted in operator intervention,
369     and should therefore trigger this intervention. An operator-based response
370     provides opportunity for other action to restore the service quality,
371     e.g. by shedding other loads or assigning additional capacity, or to
372     consciously avoid reacting to the trigger while engineering a
373     solution to the problem. This may require the trigger to be sent to
374     a third location (e.g. a network operations centre, NOC) responsible
375     for operation of the tunnel ingress, rather than the tunnel ingress
376     itself.

378  5.  Examples where circuit breakers may not be needed.

380     A CB is not required for a single CC-controlled flow using TCP, SCTP,
381     TFRC, etc. In these cases, the CC methods are designed to prevent
382     congestion collapse.

384  5.1.  CBs and uni-directional Traffic

386     A CB cannot be used to control uni-directional UDP traffic. The
387     lack of feedback prevents automated triggering of the CB. Supporting
388     this type of traffic in the general Internet requires operator
389     monitoring to detect and respond to congestion collapse or the use of
390     dedicated capacity - e.g. using pre-provisioned MPLS services, RSVP,
391     or admission-controlled Differentiated Services.

393  5.2.  CBs over pre-provisioned Capacity

395     One common question is whether a CB is needed when a tunnel is
396     deployed in a private network with pre-provisioned capacity.
        In this
397     case, compliant traffic that does not exceed the provisioned capacity
398     should not result in congestion. The CB will hence only be triggered
399     when there is non-compliant traffic. It could be argued that this
400     event should never happen - but it may also be argued that the CB
401     equally should never be triggered. If a CB were to be implemented,
402     it would provide an appropriate response should this excessive
403     congestion occur in an operational network.

405  5.3.  CBs with CC Traffic

407     IP-based traffic is generally assumed to be congestion-controlled,
408     i.e., it is assumed that the transport protocols generating IP-based
409     traffic at the sender already employ mechanisms that are sufficient
410     to address congestion on the path [RFC5405]. A question therefore
411     arises when people deploy a tunnel that is thought to only carry an
412     aggregate of TCP (or some other CC-controlled) traffic: Is there an
413     advantage in this case in using a CB? Certainly, traffic in such a
414     tunnel will respond to congestion. However, the answer to the
415     question is not obvious, because the overall traffic formed by an
416     aggregate of flows that implement a CC mechanism does not necessarily
417     prevent congestion collapse. For instance, most CC mechanisms
418     require long-lived flows to react to reduce the rate of a flow; an
419     aggregate of many short flows may result in many terminating before
420     they experience congestion. It is also often impossible for a tunnel
421     service provider to know that the tunnel only contains CC-controlled
422     traffic (e.g. inspecting packet headers may not be possible). The
423     important thing to note is that if the aggregate of the traffic does
424     not result in persistent congestion (impacting other flows), then the
425     CB will not trigger. This is the expected case in this context - so
426     implementing a CB will not reduce performance of the tunnel, but
427     offers protection should congestion collapse occur.

429  6.
     Security Considerations

431     This section will describe security considerations.

433  7.  IANA Considerations

435     This document makes no request of IANA.

437  8.  Acknowledgments

439     There are many people who have discussed and described the issues
440     that have motivated this draft.

442  9.  Revision Notes

444     RFC-Editor: Please remove this section prior to publication

446     Draft 00

448     This was the first revision. Help and comments are greatly
449     appreciated.

451  10.  References

453  10.1.  Normative References

455     [Jacobsen88]
456                Jacobson, V.,
457                "Congestion Avoidance and Control", SIGCOMM Symposium
458                proceedings on Communications architectures and
459                protocols, August 1988.

461     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
462                Requirement Levels", BCP 14, RFC 2119, March 1997.

464     [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
465                for Application Designers", BCP 145, RFC 5405, November
466                2008.

468     [RTP-CB]   and , "Multimedia Congestion Control: Circuit Breakers for
469                Unicast RTP Sessions", February 2014.

471  10.2.  Informative References

473     [RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
474                RFC 1112, August 1989.

476     [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
477                of Explicit Congestion Notification (ECN) to IP", RFC
478                3168, September 2001.

480     [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
481                Division Multiplexing (TDM) over Packet (SAToP)", RFC
482                4553, June 2006.

484     [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
485                Control", RFC 5681, September 2009.

487     [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
488                Notification", RFC 6040, November 2010.

490     [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
491                and K. Carlberg, "Explicit Congestion Notification (ECN)
492                for RTP over UDP", RFC 6679, August 2012.

494  Author's Address

496     Godred Fairhurst
497     University of Aberdeen
498     School of Engineering
499     Fraser Noble Building
500     Aberdeen, Scotland AB24 3UE
501     UK

503     Email: gorry@erg.abdn.ac.uk
504     URI:   http://www.erg.abdn.ac.uk