idnits 2.17.1

draft-ietf-tsvwg-circuit-breaker-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 07, 2015) is 3125 days in the past.  Is this
     intentional?

  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5405 (Obsoleted by RFC 8085)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

TSVWG Working Group                                         G. Fairhurst
Internet-Draft                                    University of Aberdeen
Intended status: Best Current Practice                  October 07, 2015
Expires: April 9, 2016

                   Network Transport Circuit Breakers
                  draft-ietf-tsvwg-circuit-breaker-05

Abstract

   This document explains what is meant by the term "network transport
   Circuit Breaker" (CB).  It describes the need for circuit breakers
   when using network tunnels, and other non-congestion controlled
   applications, and explains where circuit breakers are, and are not,
   needed.
   It also defines requirements for building a circuit breaker
   and the expected outcomes of using a circuit breaker within the
   Internet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 9, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Types of Circuit-Breaker
   2.  Terminology
   3.  Design of a Circuit-Breaker (What makes a good circuit
       breaker?)
     3.1.  Functional Components
   4.
       Requirements for a Network Transport Circuit Breaker
     4.1.  Unidirectional Circuit Breakers over Controlled Paths
       4.1.1.  Use with a multicast control/routing protocol
       4.1.2.  Use with control protocols supporting
               pre-provisioned capacity
   5.  Examples of Circuit Breakers
     5.1.  A Fast-Trip Circuit Breaker
       5.1.1.  A Fast-Trip Circuit Breaker for RTP
     5.2.  A Slow-trip Circuit Breaker
     5.3.  A Managed Circuit Breaker
       5.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires
       5.3.2.  A Managed Circuit Breaker for Pseudowires (PWs)
   6.  Examples where circuit breakers may not be needed.
     6.1.  CBs over pre-provisioned Capacity
     6.2.  CBs with tunnels carrying Congestion-Controlled Traffic
     6.3.  CBs with Uni-directional Traffic and no Control Path
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. Revision Notes
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Author's Address

1.  Introduction

   A network transport Circuit Breaker (CB) is an automatic mechanism
   that is used to estimate congestion caused by a flow, and to
   terminate (or significantly reduce the rate of) the flow when
   persistent congestion is detected.
   This is a safety measure to
   prevent congestion collapse (starvation of network resources denying
   other flows access to the Internet), essential for an Internet
   that is heterogeneous and for traffic that is hard to predict in
   advance.

   The term "Circuit Breaker" originates in electricity supply, and has
   nothing to do with network circuits or virtual circuits.  In
   electricity supply, a Circuit Breaker is intended as a protection
   mechanism of last resort.  Under normal circumstances, a Circuit
   Breaker ought not to be triggered; it is designed to protect the
   supply network and attached equipment when there is overload.
   Similarly, people do not expect the electrical circuit-breaker (or
   fuse) in their home to be triggered, except when there is a wiring
   fault or a problem with an electrical appliance.

   In networking, the Circuit Breaker principle can be used as a
   protection mechanism of last resort to avoid persistent congestion.
   Persistent congestion (also known as "congestion collapse") was a
   feature of the early Internet of the 1980s.  This resulted in excess
   traffic starving other connections of access to the Internet.  It
   was countered by the requirement to use congestion control (CC) by
   the Transmission Control Protocol (TCP) [Jacobsen88] [RFC1112].
   These mechanisms operate in Internet hosts to cause TCP connections
   to "back off" during congestion.  The introduction of a Congestion
   Controller in TCP (currently documented in [RFC5681]) ensured the
   stability of the Internet, because it was able to detect congestion
   and promptly react.  This worked well while TCP was by far the
   dominant traffic in the Internet, and most TCP flows were long-lived
   (ensuring that they could detect and respond to congestion before the
   flows terminated).
   This is no longer the case, and non-congestion-controlled traffic,
   including many applications of the User Datagram Protocol (UDP), can
   form a significant proportion of the total traffic traversing a link.
   The current Internet therefore requires that non-congestion-
   controlled traffic be considered in order to avoid congestion
   collapse.

   There are important differences between a transport circuit-breaker
   and a congestion-control method.  Specifically, congestion control
   (as implemented in TCP, SCTP, and DCCP) operates on a timescale on
   the order of a packet round-trip-time (RTT), the time from sender to
   destination and return.  Congestion control methods are able to react
   to a single packet loss/marking and reduce the transmission rate for
   each loss or congestion event.  The goal is usually to limit the
   maximum transmission rate to a rate that reflects the available
   capacity across a network path.  These methods typically operate on
   individual traffic flows (e.g., a 5-tuple).

   In contrast, Circuit Breakers are recommended for non-congestion-
   controlled Internet flows and for traffic aggregates, e.g., traffic
   sent using a network tunnel.  People have been implementing what this
   draft characterizes as circuit breakers on an ad hoc basis to protect
   Internet traffic; this draft therefore provides guidance on how to
   deploy and use these mechanisms.  Later sections provide examples of
   cases where circuit-breakers may or may not be desirable.

   A Circuit Breaker needs to measure (meter) the traffic to determine
   if the network is experiencing congestion and needs to be designed to
   trigger robustly when there is persistent congestion.  This means the
   trigger needs to operate on a timescale much longer than the path
   round trip time (e.g., seconds to possibly many tens of seconds).
   This longer period is needed to provide sufficient time for
   transports (or applications) to adjust their rate following
   congestion, and for the network load to stabilize after any
   adjustment.

   A Circuit Breaker trigger will often utilize a series of successive
   sample measurements metered at an ingress point and an egress point
   (either of which could be a transport endpoint).  These measurements
   need to be taken over a reasonably long period of time.  This is to
   ensure that a Circuit Breaker does not accidentally trigger following
   a single congestion event, or even several successive events
   (congestion events are what triggers congestion control, and are to
   be regarded as normal on a network link operating near its capacity).
   Once triggered, a control function needs to remove traffic from the
   network, either by disabling the flow or by significantly reducing
   the level of traffic.  This reaction provides the required protection
   to prevent persistent congestion being experienced by other flows
   that share the congested part of the network path.

   Section 4 defines requirements for building a Circuit Breaker.

1.1.  Types of Circuit-Breaker

   There are various forms of network transport circuit breaker.  These
   are differentiated mainly on the timescale over which they are
   triggered, but also in the intended protection they offer:

   o  Fast-Trip Circuit Breakers: The relatively short timescale used by
      this form of circuit breaker is intended to protect a flow or
      related group of flows.

   o  Slow-Trip Circuit Breakers: This circuit breaker utilizes a longer
      timescale and is designed to protect traffic aggregates.

   o  Managed Circuit Breakers: Utilize the operations and management
      functions that might be present in a managed service to implement
      a circuit breaker.

   Examples of each type of circuit breaker are provided in Section 5.

2.
Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Design of a Circuit-Breaker (What makes a good circuit breaker?)

   Although circuit breakers have been talked about in the IETF for many
   years, there has not yet been guidance on the cases where circuit
   breakers are needed or upon the design of circuit breaker mechanisms.
   This document seeks to offer advice on these two topics.

   Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
   carry non-congestion-controlled Internet flows and for traffic
   aggregates, e.g., traffic sent using a network tunnel.  Designers of
   other protocols and tunnel encapsulations also ought to consider the
   use of these techniques to provide last resort protection to the
   network paths over which they are used.

   This document defines the requirements for design of a Circuit
   Breaker and provides examples of how a Circuit Breaker can be
   constructed.  The specifications of individual protocols and tunnel
   encapsulations need to detail the protocol mechanisms needed to
   implement a Circuit Breaker.

   Section 3.1 describes the functional components of a circuit breaker
   and Section 4 defines requirements for implementing a Circuit
   Breaker.

3.1.  Functional Components

   The basic design of a transport circuit breaker involves
   communication between an ingress point (a sender) and an egress point
   (a receiver) of a network flow.  A simple picture of Circuit Breaker
   operation is provided in figure 1.  This shows a set of routers (each
   labelled R) connecting a set of endpoints.  A Circuit Breaker is used
   to control traffic passing through a subset of these routers, acting
   between the ingress and egress network devices.
   The path between the ingress and egress could be provided by a
   tunnel or other network-layer technique.  One expected use would be
   at the ingress and egress of a service.

 +--------+                                                   +--------+
 |Endpoint|                                                   |Endpoint|
 +--+-----+          >>> circuit breaker traffic >>>          +--+-----+
    |                                                            |
    | +-+  +-+  +---------+  +-+  +-+  +-+  +--------+  +-+  +-+ |
    +-+R+--+R+->+ Ingress +--+R+--+R+--+R+--+ Egress |--+R+--+R+-+
      +++  +-+  +------+--+  +-+  +-+  +-+  +-----+--+  +++  +-+
       |         ^     |                          |      |
       |         |  +--+------+            +------+--+   |
       |         |  | Ingress |            | Egress  |   |
       |         |  | Meter   |            | Meter   |   |
       |         |  +----+----+            +----+----+   |
       |         |       |                      |        |
  +-+  |         |  +----+----+                 |        |  +-+
  |R+--+         |  | Measure +<----------------+        +--+R|
  +++            |  +----+----+     Reported                +++
   |             |       |          Egress                   |
   |             |  +----+----+     Measurement              |
+--+-----+       |  | Trigger +                           +--+-----+
|Endpoint|       |  +----+----+                           |Endpoint|
+--------+       |       |                                +--------+
                 +---<---+
                  Reaction

   Figure 1: A CB controlling the part of the end-to-end path between an
   ingress point and an egress point.  (Note: In some cases, the trigger
   and measure functions could alternatively be located at other
   locations (e.g., at a network operations centre).)

   In the context of a Circuit Breaker, the ingress and egress functions
   could be implemented in different places.  For example, they could be
   located in network devices at a tunnel ingress and at the tunnel
   egress.  In some cases, they could be located at one or both network
   endpoints (see figure 2), implemented as components within a
   transport protocol.

      +----------+                 +----------+
      | Ingress  |  +-+  +-+  +-+  | Egress   |
      | Endpoint +->+R+--+R+--+R+--+ Endpoint |
      +--+----+--+  +-+  +-+  +-+  +----+-----+
         ^    |                         |
         | +--+------+             +----+----+
         | | Ingress |             | Egress  |
         | | Meter   |             | Meter   |
         | +----+----+             +----+----+
         |      |                       |
         | +----+----+                  |
         | | Measure +<-----------------+
         | +----+----+      Reported
         |      |           Egress
         | +----+----+      Measurement
         | | Trigger |
         | +----+----+
         |      |
         +---<--+
         Reaction

   Figure 2: An endpoint CB implemented at the sender (ingress) and
   receiver (egress).

   The set of components needed to implement a Circuit Breaker are:

   1.  An ingress meter (at the sender or tunnel ingress) records the
       number of packets/bytes sent in each measurement interval.  This
       measures the offered network load.  For example, the measurement
       interval could be every few seconds.

   2.  An egress meter (at the receiver or tunnel egress) records the
       number of packets/bytes received in each measurement interval.
       This measures the supported load and could utilize other signals
       to detect the effect of congestion (e.g., loss/marking
       experienced over the path).

   3.  The measured values at the ingress and egress are communicated to
       the Circuit Breaker Measurement function.  This could use several
       methods including: sending return measurement packets from a
       receiver to a trigger function at the sender; an implementation
       using Operations, Administration and Management (OAM); or
       sending another in-band signalling datagram to the trigger
       function.  This could also be implemented purely as a control
       plane function, e.g., using a software-defined network
       controller.

   4.  The measurement function combines the ingress and egress
       measurements to assess the present level of network congestion.
       (For example, the loss rate for each measurement interval could
       be deduced from calculating the difference between ingress and
       egress counter values.  Note the method does not require high
       accuracy for the period of the measurement interval (or therefore
       for the measured value), since isolated and/or infrequent loss
       events need to be disregarded.)

   5.  A trigger function determines if the measurements indicate
       persistent congestion.  This function defines an appropriate
       threshold for determining there is persistent congestion between
       the ingress and egress.  The threshold preferably considers a
       rate or ratio, rather than an absolute value (e.g., more than 10%
       loss, but other methods could also be based on the rate of
       transmission as well as the loss rate).  The transport Circuit
       Breaker is triggered when the threshold is exceeded in multiple
       measurement intervals (e.g., 3 successive measurements).  Designs
       need to be robust so that single or spurious events do not
       trigger a reaction.

   6.  A reaction that is applied at the Ingress when the Circuit
       Breaker is triggered.  This seeks to automatically remove the
       traffic causing persistent congestion.

   7.  A feedback mechanism that triggers when either the ingress or the
       egress measurements are not available, since this also could
       indicate a loss of control packets (also a symptom of heavy
       congestion or inability to control the load).

4.  Requirements for a Network Transport Circuit Breaker

   The requirements for implementing a Circuit Breaker are:

   o  There MUST be a control path from the ingress meter and the egress
      meter to the point of measurement.  The Circuit Breaker MUST
      trigger if this control path fails.
      That is, the feedback
      indicating a congested period needs to be designed so that the
      Circuit Breaker is triggered when it fails to receive measurement
      reports that indicate an absence of congestion, rather than
      relying on the successful transmission of a "congested" signal
      back to the sender.  (The feedback signal could itself be lost
      under congestion.)

   o  A Circuit Breaker MUST define a measurement period over which the
      receiver measures the level of congestion or loss.  This method
      does not have to detect individual packet loss, but MUST have a
      way to know that packets have been lost/marked from the traffic
      flow.  If Explicit Congestion Notification (ECN) is enabled
      [RFC3168], an egress meter MAY also count the number of ECN
      congestion marks/events per measurement interval, but even if ECN
      is used, loss MUST still be measured, since this better reflects
      the impact of persistent congestion.  In this context, loss
      represents a reliable indication of congestion, as opposed to the
      finer-grain marking of incipient congestion that can be provided
      via ECN.  The type of Circuit Breaker will determine how long this
      measurement period needs to be.

   o  The measurement period MUST be longer than the time that current
      Congestion Control algorithms need to reduce their rate following
      detection of congestion.  This is important because end-to-end
      Congestion Control algorithms require at least one RTT to notify
      and adjust to experienced congestion, and congestion bottlenecks
      can share traffic with a diverse range of RTTs.  Circuit Breakers
      hence need to perform measurements over a sufficiently long
      period (e.g., many path RTTs) to avoid additionally penalizing
      flows with a long path RTT.  In some implementations, this
      may require a measurement to combine multiple meter samples to
      achieve a sufficiently long measurement period.
      In most cases,
      the measurement period is expected to be significantly longer than
      the RTT experienced by the Circuit Breaker itself.

   o  A Circuit Breaker is REQUIRED to define a threshold to determine
      whether the measured congestion is considered excessive.

   o  A Circuit Breaker is REQUIRED to define the triggering interval,
      defining the period over which the trigger uses the collected
      measurements.

   o  A Circuit Breaker MUST be robust to multiple congestion events.
      This usually will define a number of measured persistent
      congestion events per triggering period.  For example, a Circuit
      Breaker MAY combine the results of several measurement periods to
      determine if the Circuit Breaker is triggered (e.g., triggered
      when persistent congestion is detected in 3 of the measurements
      within the triggering interval).

   o  A Circuit Breaker SHOULD be constructed so that it does not
      trigger under light or intermittent congestion, with a default
      response to a trigger that disables all traffic that contributed
      to congestion.

   o  Once triggered, the Circuit Breaker MUST react decisively by
      disabling or significantly reducing traffic at the source (e.g.,
      ingress).  A reaction that results in a reduction SHOULD result in
      reducing the traffic by at least a factor of ten, each time the
      Circuit Breaker is triggered.  This response needs to be much more
      severe than that of a Congestion Controller algorithm (such as
      TCP's congestion control [RFC5681] or TFRC [RFC5348]), because the
      Circuit Breaker reacts to more persistent congestion and operates
      over longer timescales (i.e., the overload condition will have
      persisted for a longer time before the Circuit Breaker is
      triggered).

   o  A Circuit Breaker that reduces the rate of a flow MUST continue
      to monitor the level of congestion and MUST further reduce the
      rate if the Circuit Breaker is again triggered.

   o  The reaction to a triggered Circuit Breaker MUST continue for a
      period that is at least the triggering interval.  Manual operator
      intervention will usually be required to restore a flow.  If an
      automated response is needed to reset the trigger, then this
      reset needs to be delayed rather than immediate.  The design of an
      automated reset mechanism needs to be sufficiently conservative
      that it does not adversely interact with other mechanisms
      (including other Circuit Breaker algorithms that control traffic
      over a common path).  It SHOULD NOT perform an automated reset
      when there is evidence of continued congestion.

   o  When a Circuit Breaker is triggered, it SHOULD be regarded as an
      abnormal network event.  As such, this event SHOULD be logged.
      The measurements that lead to triggering of the Circuit Breaker
      SHOULD also be logged.

4.1.  Unidirectional Circuit Breakers over Controlled Paths

   A Circuit Breaker can be used to control uni-directional UDP traffic,
   provided that there is a control path to connect the functional
   components at the Ingress and Egress.  This control path can exist in
   networks for which the traffic flow is purely unidirectional.  For
   example, a multicast stream that sends packets across an Internet
   path can use multicast routing to prune flows to shed network
   load.  Some other types of subnetwork also utilize control protocols
   that can be used to control traffic flows.

4.1.1.
Use with a multicast control/routing protocol

   +----------+                 +--------+  +----------+
   | Ingress  |  +-+  +-+  +-+  | Egress |  |  Egress  |
   | Endpoint +->+R+--+R+--+R+--+ Router |--+ Endpoint +->+
   +----+-----+  +-+  +-+  +-+  +---+--+-+  +----+-----+  |
        ^         ^    ^    ^       |  ^         |        |
        |         |    |    |       |  |         |        |
   +----+----+    + - - - < - - - - +  |    +----+----+   | Reported
   | Ingress |      multicast Prune    |    | Egress  |   | Ingress
   |  Meter  |                         |    |  Meter  |   | Measurement
   +---------+                         |    +----+----+   |
                                       |         |        |
                                       |    +----+----+   |
                                       |    | Measure +<--+
                                       |    +----+----+
                                       |         |
                                       |    +----+----+
                             multicast |    | Trigger |
                             Leave     |    +----+----+
                             Message   |         |
                                       +----<----+

   Figure 3: An example of a multicast CB controlling the end-to-end
   path between an ingress endpoint and an egress endpoint.

   Figure 3 shows one example of how a multicast circuit breaker could
   be implemented at a pair of multicast endpoints (e.g., to implement
   the fast-trip Circuit Breaker of Section 5.1).  The ingress endpoint
   (the sender that sources the multicast traffic) meters the ingress
   load, generating an ingress measurement (e.g., recording timestamped
   packet counts), and sends this measurement to the multicast group
   together with the traffic it has measured.

   Routers along a multicast path forward the multicast traffic
   (including the ingress measurement) to all active endpoint receivers.
   Each last hop (egress) router forwards the traffic to one or more
   egress endpoint(s).

   In this figure, each endpoint includes a meter that performs a local
   egress load measurement.  An endpoint also extracts the received
   ingress measurement from the traffic, and compares the ingress and
   egress measurements to determine if the Circuit Breaker ought to be
   triggered.  This measurement has to be robust to loss (see previous
   section).
   If the Circuit Breaker is triggered, it generates a
   multicast leave message for the egress (e.g., an IGMP or MLD message
   sent to the last hop router), which causes the upstream router to
   cease forwarding traffic to the egress endpoint.

   Any multicast router that has no active receivers for a particular
   multicast group will prune traffic for that group, sending a prune
   message to its upstream router.  This starts the process of releasing
   the capacity used by the traffic and is a standard multicast routing
   function (e.g., using the PIM-SM routing protocol).  Each egress
   operates autonomously, and the circuit breaker "reaction" is executed
   by the multicast control plane (e.g., PIM), requiring no explicit
   signalling by the circuit breaker along the control path.  Note:
   there is no direct communication with the Ingress, and hence a
   triggered Circuit Breaker only controls traffic downstream of the
   first hop router.  It does not stop traffic flowing from the sender
   to the first hop router; this is however the common practice for
   multicast deployment.

   The method could also be used with a multicast tunnel or subnetwork
   (e.g., Section 5.2, Section 5.3), where a meter at the ingress
   generates additional control messages to carry the measurement data
   towards the egress where the egress metering is implemented.

4.1.2.  Use with control protocols supporting pre-provisioned capacity

   Some paths are provisioned using a control protocol, e.g., flows
   provisioned using the Multi-Protocol Label Switching (MPLS) services,
   paths provisioned using the Resource Reservation Protocol (RSVP),
   networks utilizing Software Defined Network (SDN) functions, or
   admission-controlled Differentiated Services.

   Figure 1 shows one expected use case, where in this usage a separate
   device could be used to perform the measurement and trigger
   functions.
   The reaction generated by the trigger could take the form
   of a network control message sent to the ingress and/or other network
   elements causing these elements to react to the Circuit Breaker.
   Examples of this type of use are provided in Section 5.3.

5.  Examples of Circuit Breakers

   There are multiple types of Circuit Breaker that could be defined for
   use in different deployment cases.  This section provides examples of
   different types of circuit breaker.

5.1.  A Fast-Trip Circuit Breaker

   A fast-trip circuit breaker is the most responsive form of Circuit
   Breaker.  It has a response time that is only slightly larger than
   that of the traffic that it controls.  It is suited to traffic with
   well-understood characteristics (and could include one or more
   trigger functions specifically tailored to the type of traffic for
   which it is designed).  It is not suited to arbitrary network
   traffic, since it could prematurely trigger (e.g., when multiple
   congestion-controlled flows lead to short-term overload).

5.1.1.  A Fast-Trip Circuit Breaker for RTP

   A set of fast-trip Circuit Breaker methods have been specified for
   use together by a Real-time Transport Protocol (RTP) flow using the
   RTP/AVP Profile [RTP-CB].  It is expected that, in the absence of
   severe congestion, all RTP applications running on best-effort IP
   networks will be able to run without triggering these circuit
   breakers.  A fast-trip RTP Circuit Breaker is therefore implemented
   as a fail-safe that, when triggered, will terminate RTP traffic.

   The sender monitors reception of RTCP reception report blocks, as
   contained in SR or RR packets, that convey reception quality feedback
   information.  This is used to measure (congestion) loss, possibly in
   combination with ECN [RFC6679].

   The Circuit Breaker action (shutdown of the flow) is triggered when
   any of the following trigger conditions are true:

   1.
       An RTP Circuit Breaker triggers on reported lack of progress.

   2.  An RTP Circuit Breaker triggers when no receiver report messages
       are received.

   3.  An RTP Circuit Breaker uses a TFRC-style check and sets a hard
       upper limit to the long-term RTP throughput (over many RTTs).

   4.  An RTP Circuit Breaker includes the notion of Media Usability.
       This circuit breaker is triggered when the quality of the
       transported media falls below some required minimum acceptable
       quality.

5.2.  A Slow-trip Circuit Breaker

   A slow-trip Circuit Breaker could be implemented in an endpoint or
   network device.  This type of Circuit Breaker is much slower at
   responding to congestion than a fast-trip Circuit Breaker and is
   expected to be more common.

   One example where a slow-trip Circuit Breaker is needed is where
   flows or traffic-aggregates use a tunnel or encapsulation and the
   flows within the tunnel do not all support TCP-style congestion
   control (e.g., TCP, SCTP, TFRC), see [RFC5405] section 3.1.3.  A use
   case is where tunnels are deployed in the general Internet (rather
   than "controlled environments" within an ISP or Enterprise),
   especially when the tunnel could need to cross a customer access
   router.

5.3.  A Managed Circuit Breaker

   A managed Circuit Breaker is implemented in the signalling protocol
   or management plane that relates to the traffic aggregate being
   controlled.  This type of circuit breaker is typically applicable
   when the deployment is within a "controlled environment".

   A Circuit Breaker requires more than the ability to determine that a
   network path is forwarding data, or to measure the rate of a path,
   both of which are often normal network operational functions.  There
   is an additional need to determine a metric for congestion on the
   path and to trigger a reaction when a threshold is crossed that
   indicates persistent congestion.

5.3.1.
A Managed Circuit Breaker for SAToP Pseudo-Wires

   [RFC4553], SAToP Pseudo-Wires (PWE3), section 8 describes an example
   of a managed circuit breaker for isochronous flows.

   If such flows were to run over a pre-provisioned (e.g., MPLS)
   infrastructure, then it could be expected that the Pseudo-Wire (PW)
   would not experience congestion, because the flows are not expected
   to increase (or decrease) their rate.  If instead Pseudo-Wire
   traffic is multiplexed with other traffic over the general Internet,
   it could experience congestion.  [RFC4553] states: "If SAToP PWs run
   over a PSN providing best-effort service, they SHOULD monitor packet
   loss in order to detect "severe congestion".  The currently
   recommended measurement period is 1 second, and the trigger operates
   when there are more than three measured Severely Errored Seconds
   (SES) within a period.  If such a condition is detected, a SAToP PW
   ought to shut down bidirectionally for some period of time...".

   The concept was that when the packet loss ratio (congestion) level
   increased above a threshold, the PW was by default disabled.  This
   use case considered fixed-rate transmission, where the PW had no
   reasonable way to shed load.

   The trigger needs to be set at the loss rate at which the PW is
   likely to experience a serious problem, possibly making the service
   non-compliant.  At this point, triggering the Circuit Breaker would
   remove the traffic, preventing undue impact on congestion-responsive
   traffic (e.g., TCP).  Part of the rationale was that high loss
   ratios typically indicated that something was "broken" and ought to
   have already resulted in operator intervention, and therefore needs
   to trigger this intervention.
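
   The trigger rule described above (1-second measurements, trip when
   more than three Severely Errored Seconds fall within a period) can
   be sketched as follows.  This is a minimal illustration, not a
   method specified by this document or by [RFC4553]: the class name,
   the loss ratio that marks a second as severely errored (0.3), and
   the 10-second period are all assumptions chosen for the example.
   Treating a missing egress report as a severely errored second is
   also an assumption, made so that a failed control path pushes the
   breaker towards tripping.

```python
from collections import deque


class SesCircuitBreaker:
    """Hypothetical SES-counting trigger; names and defaults are assumptions."""

    def __init__(self, ses_loss_ratio=0.3, window_seconds=10, max_ses=3):
        self.ses_loss_ratio = ses_loss_ratio  # assumed loss ratio marking an SES
        self.max_ses = max_ses                # trip when MORE than this many SES
        # One boolean per 1-second measurement; old entries fall off the left.
        self.window = deque(maxlen=window_seconds)
        self.tripped = False

    def record_second(self, sent, received):
        """Record one 1-second measurement and return the tripped state.

        `sent` is the ingress count for the second; `received` is the
        egress count, or None when no egress report arrived.
        """
        if received is None:
            severely_errored = True   # assumption: missing report counts as SES
        elif sent == 0:
            severely_errored = False  # nothing offered, so nothing lost
        else:
            loss_ratio = max(0, sent - received) / sent
            severely_errored = loss_ratio > self.ses_loss_ratio
        self.window.append(severely_errored)
        if sum(self.window) > self.max_ses:
            # Stays tripped: restoring the flow is left to the operator.
            self.tripped = True
        return self.tripped
```

   In this sketch the breaker latches once tripped, which mirrors the
   expectation in Section 4 that manual operator intervention will
   usually be required to restore a flow.  Isolated lossy seconds
   never accumulate to more than three within the period, so they do
   not trip it.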
An operator-based response provides an opportunity for other action
to restore the service quality, e.g., by shedding other loads or
assigning additional capacity, or to consciously avoid reacting to
the trigger while engineering a solution to the problem. This could
require the trigger to be sent to a third location (e.g., a network
operations centre, NOC) responsible for operation of the tunnel
ingress, rather than the tunnel ingress itself.

5.3.2. A Managed Circuit Breaker for Pseudowires (PWs)

Pseudowires (PWs) [RFC3985] have become a common mechanism for
tunneling traffic, and may compete for network resources both with
other PWs and with non-PW traffic, such as TCP/IP flows.

[ID-ietf-pals-congcons] discusses congestion conditions that can
arise when PWs compete with elastic (i.e., congestion responsive)
network traffic (e.g., TCP traffic). Elastic PWs carrying IP traffic
(see [RFC4488]) do not raise major concerns because all of the
traffic involved responds, reducing the transmission rate when
network congestion is detected.

In contrast, inelastic PWs (e.g., fixed bandwidth Time Division
Multiplex (TDM) [RFC4553] [RFC5086] [RFC5087]) have the potential to
harm congestion responsive traffic or to contribute to excessive
congestion because inelastic PWs do not adjust their transmission
rate in response to congestion. [ID-ietf-pals-congcons] analyses TDM
PWs, with an initial conclusion that a TDM PW operating with a degree
of loss that may result in congestion-related problems is also
operating with a degree of loss that results in an unacceptable TDM
service. For that reason, the draft suggests that a managed circuit
breaker that shuts down a PW when it persistently fails to deliver
acceptable TDM service is a useful means for addressing these
congestion concerns.

6. Examples where circuit breakers may not be needed.
A Circuit Breaker is not required for a single Congestion
Controller-controlled flow using TCP, SCTP, TFRC, etc. In these
cases, the Congestion Control methods are already designed to prevent
congestion collapse.

6.1. CBs over pre-provisioned Capacity

One common question is whether a Circuit Breaker is needed when a
tunnel is deployed in a private network with pre-provisioned
capacity.

In this case, compliant traffic that does not exceed the provisioned
capacity ought not to result in congestion collapse. A Circuit
Breaker will hence only be triggered when there is non-compliant
traffic. It could be argued that this event ought never to happen -
but it could also be argued that the Circuit Breaker equally ought
never to be triggered. If a Circuit Breaker were to be implemented,
it would provide an appropriate response if persistent congestion
occurs in an operational network.

Implementing a Circuit Breaker will not reduce the performance of the
flows, but offers protection in the event that persistent congestion
occurs. This also could be used to protect from a failure that
causes traffic to be routed over a non-pre-provisioned path.

6.2. CBs with tunnels carrying Congestion-Controlled Traffic

IP-based traffic is generally assumed to be congestion-controlled,
i.e., it is assumed that the transport protocols generating IP-based
traffic at the sender already employ mechanisms that are sufficient
to address congestion on the path [RFC5405]. A question therefore
arises when people deploy a tunnel that is thought to only carry an
aggregate of TCP (or some other Congestion Controller-controlled)
traffic: Is there an advantage in this case in using a Circuit
Breaker?

For sure, traffic in such a tunnel will respond to congestion.
However, the answer to the question is not always obvious, because
the overall traffic formed by an aggregate of flows that implement a
Congestion Controller mechanism does not necessarily prevent
congestion collapse. For instance, most Congestion Controller
mechanisms require long-lived flows in order to react and reduce the
rate of a flow; an aggregate of many short flows could result in many
flows terminating before they experience congestion. It is also
often impossible for a tunnel service provider to know that the
tunnel only contains CC-controlled traffic (e.g., inspecting packet
headers might not be possible). The important thing to note is that
if the aggregate of the traffic does not result in persistent
congestion (impacting other flows), then the Circuit Breaker will not
trigger. This is the expected case in this context - so implementing
a Circuit Breaker will not reduce performance of the tunnel, but
offers protection in the event that persistent congestion occurs.

6.3. CBs with Uni-directional Traffic and no Control Path

A one-way forwarding path could have no associated control path, and
therefore cannot be controlled using an automated process. This
service could be provided using a path that has dedicated capacity
and does not share this capacity with other elastic Internet flows
(i.e., flows that vary their rate).

A way to mitigate the impact on other flows when capacity could be
shared is to manage the traffic envelope by using ingress policing.

Supporting this type of traffic in the general Internet requires
operator monitoring to detect and respond to persistent congestion.

7. Security Considerations

All Circuit Breaker mechanisms rely upon coordination between the
ingress and egress meters and communication with the trigger
function. This is usually achieved by passing network control
information (or protocol messages) across the network.
Timely operation of a circuit breaker depends on the choice of
measurement period. If the receiver has an interval that is overly
long, then the responsiveness of the circuit breaker decreases. This
impacts the ability of the circuit breaker to detect and react to
congestion.

Mechanisms need to be implemented to prevent attacks on the network
control information that would result in Denial of Service (DoS).
The source and integrity of control information (measurements and
triggers) MUST be protected from off-path attacks. Without
protection, it could be trivial for an attacker to inject packets
with values that could prematurely trigger a circuit breaker
resulting in DoS. Simple protection can be provided by using a
randomized source port, or equivalent field in the packet header
(such as the RTP SSRC value and the RTP sequence number) expected not
to be known to an off-path attacker. Stronger protection can be
achieved using a secure authentication protocol.

Transmission of network control information consumes network
capacity. This control traffic needs to be considered in the design
of a circuit breaker and could potentially add to network congestion.
If this traffic is sent over a shared path, it is RECOMMENDED that
this control traffic be prioritized to reduce the probability of loss
under congestion. Control traffic also needs to be considered when
provisioning a network that uses a circuit breaker.

The circuit breaker MUST be designed to be robust to packet loss that
can also be experienced during congestion/overload. Loss of control
traffic could be a side-effect of a congested network, but also could
arise from other causes.

The security implications depend on the design of the mechanisms, the
type of traffic being controlled and the intended deployment
scenario.
Each design of a Circuit Breaker MUST therefore evaluate whether the
particular circuit breaker mechanism has new security implications.

8. IANA Considerations

This document makes no request of IANA.

9. Acknowledgments

There are many people who have discussed and described the issues
that have motivated this draft. Contributions and comments included:
Lars Eggert, Colin Perkins, David Black, Matt Mathis and Andrew
McGregor. This work was part-funded by the European Community under
its Seventh Framework Programme through the Reducing Internet
Transport Latency (RITE) project (ICT-317700).

10. Revision Notes

XXX RFC-Editor: Please remove this section prior to publication XXX

Draft 00

This was the first revision. Help and comments are greatly
appreciated.

Draft 01

Contained clarifications and changes in response to received
comments, plus addition of diagram and definitions. Comments are
welcome.

WG Draft 00

Approved as a WG work item on 28th Aug 2014.

WG Draft 01

Incorporates feedback after Dallas IETF TSVWG meeting. This version
is thought ready for WGLC comments.

WG Draft 02

Minor fixes for typos. Rewritten security considerations section.

WG Draft 03

Updates following WGLC comments (see TSV mailing list). Comments
from C Perkins; D Black and off-list feedback.

A clear recommendation of intended scope.

Changes include: Improvement of language on timescales and minimum
measurement period; clearer articulation of endpoint and multicast
examples - with new diagrams; separation of the controlled network
case; updated text on position of trigger function; corrections to
RTP-CB text; clarification of loss v ECN metrics; checks against
submission checklist (use of keywords, added meters to diagrams).

WG Draft 04

Added section on PW CB for TDM - a newly adopted draft (D. Black).
WG Draft 05

Added clarifications requested during AD review.

11. References

11.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
           of Explicit Congestion Notification (ECN) to IP",
           RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
           for Application Designers", BCP 145, RFC 5405,
           DOI 10.17487/RFC5405, November 2008.

11.2. Informative References

[ID-ietf-pals-congcons]
           Stein, YJ., Black, D., and B. Briscoe, "Pseudowire
           Congestion Considerations (Work-in-Progress)", 2015.

[Jacobsen88]
           Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
           Symposium proceedings on Communications architectures and
           protocols, August 1988.

[RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
           RFC 1112, DOI 10.17487/RFC1112, August 1989.

[RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
           Edge-to-Edge (PWE3) Architecture", RFC 3985,
           DOI 10.17487/RFC3985, March 2005.

[RFC4488]  Levin, O., "Suppression of Session Initiation Protocol
           (SIP) REFER Method Implicit Subscription", RFC 4488,
           DOI 10.17487/RFC4488, May 2006.

[RFC4553]  Vainshtein, A., Ed. and YJ. Stein, Ed., "Structure-
           Agnostic Time Division Multiplexing (TDM) over Packet
           (SAToP)", RFC 4553, DOI 10.17487/RFC4553, June 2006.

[RFC5086]  Vainshtein, A., Ed., Sasson, I., Metz, E., Frost, T., and
           P. Pate, "Structure-Aware Time Division Multiplexed (TDM)
           Circuit Emulation Service over Packet Switched Network
           (CESoPSN)", RFC 5086, DOI 10.17487/RFC5086, December 2007.

[RFC5087]  Stein, Y(J)., Shashoua, R., Insler, R., and M.
           Anavi, "Time Division Multiplexing over IP (TDMoIP)",
           RFC 5087, DOI 10.17487/RFC5087, December 2007.

[RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
           Friendly Rate Control (TFRC): Protocol Specification",
           RFC 5348, DOI 10.17487/RFC5348, September 2008.

[RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
           Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
           and K. Carlberg, "Explicit Congestion Notification (ECN)
           for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
           2012.

[RTP-CB]   Perkins, and Singh, "Multimedia Congestion Control:
           Circuit Breakers for Unicast RTP Sessions", February 2014.

Author's Address

Godred Fairhurst
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: gorry@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk