TSVWG Working Group                                         G. Fairhurst
Internet-Draft                                    University of Aberdeen
Intended status: Best Current Practice                  October 17, 2015
Expires: April 19, 2016

                   Network Transport Circuit Breakers
                  draft-ietf-tsvwg-circuit-breaker-06

Abstract

   This document explains what is meant by the term "network transport
   Circuit Breaker" (CB).
   It describes the need for circuit breakers when using network
   tunnels, and other non-congestion controlled applications, and
   explains where circuit breakers are, and are not, needed.  It also
   defines requirements for building a circuit breaker and the expected
   outcomes of using a circuit breaker within the Internet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 19, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Types of Circuit-Breaker
   2.  Terminology
   3.  Design of a Circuit-Breaker (What makes a good circuit breaker?)
     3.1.  Functional Components
   4.  Requirements for a Network Transport Circuit Breaker
   5.  Other network topologies
     5.1.  Use with a multicast control/routing protocol
     5.2.  Use with control protocols supporting pre-provisioned
           capacity
     5.3.  Unidirectional Circuit Breakers over Controlled Paths
   6.  Examples of Circuit Breakers
     6.1.  A Fast-Trip Circuit Breaker
       6.1.1.  A Fast-Trip Circuit Breaker for RTP
     6.2.  A Slow-trip Circuit Breaker
     6.3.  A Managed Circuit Breaker
       6.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires
       6.3.2.  A Managed Circuit Breaker for Pseudowires (PWs)
   7.  Examples where circuit breakers may not be needed.
     7.1.  CBs over pre-provisioned Capacity
     7.2.  CBs with tunnels carrying Congestion-Controlled Traffic
     7.3.  CBs with Uni-directional Traffic and no Control Path
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgments
   11. Revision Notes
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Author's Address

1.  Introduction

   A network transport Circuit Breaker (CB) is an automatic mechanism
   that is used to estimate congestion caused by a flow, and to
   terminate (or significantly reduce the rate of) the flow when
   persistent congestion is detected.  This is a safety measure to
   prevent starvation of network resources denying other flows from
   access to the Internet.  Such measures are essential for an Internet
   that is heterogeneous and for traffic that is hard to predict in
   advance.  Avoiding persistent congestion is important to reduce the
   potential for "Congestion Collapse" [RFC2914].

   The term "Circuit Breaker" originates in electricity supply, and has
   nothing to do with network circuits or virtual circuits.  In
   electricity supply, a Circuit Breaker is intended as a protection
   mechanism of last resort.  Under normal circumstances, a Circuit
   Breaker ought not to be triggered; it is designed to protect the
   supply network and attached equipment when there is overload.
   Similarly, people do not expect the electrical circuit-breaker (or
   fuse) in their home to be triggered, except when there is a wiring
   fault or a problem with an electrical appliance.

   In networking, the Circuit Breaker principle can be used as a
   protection mechanism of last resort to avoid persistent congestion
   impacting other flows that share network capacity.  Persistent
   congestion was a feature of the early Internet of the 1980s.  This
   resulted in excess traffic starving other connections from access to
   the Internet.  It was countered by the requirement to use congestion
   control (CC) by the Transmission Control Protocol (TCP) [Jacobsen88]
   [RFC1112].  These mechanisms operate in Internet hosts to cause TCP
   connections to "back off" during congestion.
   The introduction of a Congestion Controller in TCP (currently
   documented in [RFC5681]) ensured the stability of the Internet,
   because it was able to detect congestion and promptly react.  This
   worked well while TCP was by far the dominant traffic in the
   Internet, and most TCP flows were long-lived (ensuring that they
   could detect and respond to congestion before the flows terminated).
   This is no longer the case, and non-congestion-controlled traffic,
   including many applications of the User Datagram Protocol (UDP), can
   form a significant proportion of the total traffic traversing a link.
   The current Internet therefore requires that non-congestion-
   controlled traffic be considered to avoid persistent congestion.

   There are important differences between a transport circuit-breaker
   and a congestion-control method.  Specifically, congestion control
   (as implemented in TCP, SCTP, and DCCP) operates on a timescale on
   the order of a packet round-trip time (RTT), the time from sender to
   destination and return.  Congestion control methods are able to react
   to a single packet loss/marking and reduce the transmission rate for
   each loss or congestion event.  The goal is usually to limit the
   maximum transmission rate to a rate that reflects the available
   capacity across a network path.  These methods typically operate on
   individual traffic flows (e.g., a 5-tuple).

   In contrast, Circuit Breakers are recommended for non-congestion-
   controlled Internet flows and for traffic aggregates, e.g., traffic
   sent using a network tunnel.  People have been implementing what this
   draft characterizes as circuit breakers on an ad hoc basis to protect
   Internet traffic.  This draft therefore provides guidance on how to
   deploy and use these mechanisms.  Later sections provide examples of
   cases where circuit-breakers may or may not be desirable.

   A Circuit Breaker needs to measure (meter) the traffic to determine
   if the network is experiencing congestion and needs to be designed to
   trigger robustly when there is persistent congestion.  This means the
   trigger needs to operate on a timescale much longer than the path
   round trip time (e.g., seconds to possibly many tens of seconds).
   This longer period is needed to provide sufficient time for
   transports (or applications) to adjust their rate following
   congestion, and for the network load to stabilize after any
   adjustment.

   A Circuit Breaker trigger will often utilize a series of successive
   sample measurements metered at an ingress point and an egress point
   (either of which could be a transport endpoint).  These measurements
   need to be taken over a reasonably long period of time.  This is to
   ensure that a Circuit Breaker does not accidentally trigger following
   a single congestion event, or even several successive congestion
   events (congestion events are what triggers congestion control, and
   are to be regarded as normal on a network link operating near its
   capacity).  Once triggered, a control function needs to remove
   traffic from the network, either by disabling the flow or by
   significantly reducing the level of traffic.  This reaction provides
   the required protection to prevent persistent congestion being
   experienced by other flows that share the congested part of the
   network path.

   Section 4 defines requirements for building a Circuit Breaker.

1.1.  Types of Circuit-Breaker

   There are various forms of network transport circuit breaker.  These
   are differentiated mainly on the timescale over which they are
   triggered, but also in the intended protection they offer:

   o  Fast-Trip Circuit Breakers: The relatively short timescale used by
      this form of circuit breaker is intended to provide protection for
      network traffic from a single flow or related group of flows.

   o  Slow-Trip Circuit Breakers: This circuit breaker utilizes a longer
      timescale and is designed to protect network traffic from
      congestion by traffic aggregates.

   o  Managed Circuit Breakers: Utilize the operations and management
      functions that might be present in a managed service to implement
      a circuit breaker.

   Examples of each type of circuit breaker are provided in Section 6.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Design of a Circuit-Breaker (What makes a good circuit breaker?)

   Although circuit breakers have been talked about in the IETF for many
   years, there has not yet been guidance on the cases where circuit
   breakers are needed or upon the design of circuit breaker mechanisms.
   This document seeks to offer advice on these two topics.

   Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
   carry non-congestion-controlled Internet flows and for traffic
   aggregates.  This includes traffic sent using a network tunnel.
   Designers of other protocols and tunnel encapsulations also ought to
   consider the use of these techniques to provide last-resort
   protection for traffic that shares the network path being used.

   This document defines the requirements for design of a Circuit
   Breaker and provides examples of how a Circuit Breaker can be
   constructed.  The specifications of individual protocols and tunnel
   encapsulations need to detail the protocol mechanisms needed to
   implement a Circuit Breaker.

   Section 3.1 describes the functional components of a circuit breaker
   and Section 4 defines requirements for implementing a Circuit
   Breaker.

3.1.  Functional Components

   The basic design of a transport circuit breaker involves
   communication between an ingress point (a sender) and an egress point
   (a receiver) of a network flow or set of flows.  A simple picture of
   Circuit Breaker operation is provided in figure 1.  This shows a set
   of routers (each labelled R) connecting a set of endpoints.

   A Circuit Breaker is used to control traffic passing through a subset
   of these routers, acting between the ingress and egress point network
   devices.  The path between the ingress and egress could be provided
   by a tunnel or other network-layer technique.  One expected use would
   be at the ingress and egress of a service, where all traffic being
   considered terminates beyond the egress point, and hence the ingress
   and egress carry the same set of flows.

 +--------+                                                   +--------+
 |Endpoint|                                                   |Endpoint|
 +--+-----+          >>> circuit breaker traffic >>>          +--+-----+
    |                                                            |
    | +-+  +-+  +---------+  +-+  +-+  +-+  +--------+  +-+  +-+ |
    +-+R+--+R+->+ Ingress +--+R+--+R+--+R+--+ Egress |--+R+--+R+-+
      +++  +-+  +------+--+  +-+  +-+  +-+  +-----+--+  +++  +-+
       |         ^     |                          |      |
       |         |  +--+------+            +------+--+   |
       |         |  | Ingress |            | Egress  |   |
       |         |  | Meter   |            | Meter   |   |
       |         |  +----+----+            +----+----+   |
       |         |       |                      |        |
  +-+  |         |  +----+----+                 |        |  +-+
  |R+--+         |  | Measure +<----------------+        +--+R|
  +++            |  +----+----+      Reported               +++
   |             |       |           Egress                  |
   |             |  +----+----+      Measurement             |
+--+-----+       |  | Trigger +                           +--+-----+
|Endpoint|       |  +----+----+                           |Endpoint|
+--------+       |       |                                +--------+
                 +---<---+
                  Reaction

   Figure 1: A CB controlling the part of the end-to-end path between an
   ingress point and an egress point.  (Note: In some cases, the trigger
   and measure functions could alternatively be located at other
   locations, e.g., at a network operations centre.)

   In the context of a Circuit Breaker, the ingress and egress functions
   could be implemented in different places.
   For example, they could be located in network devices at a tunnel
   ingress and at the tunnel egress.  In some cases, they could be
   located at one or both network endpoints (see figure 2), implemented
   as components within a transport protocol.

 +----------+                 +----------+
 | Ingress  |  +-+  +-+  +-+  | Egress   |
 | Endpoint +->+R+--+R+--+R+--+ Endpoint |
 +--+----+--+  +-+  +-+  +-+  +----+-----+
    ^    |                         |
    | +--+------+             +----+----+
    | | Ingress |             | Egress  |
    | | Meter   |             | Meter   |
    | +----+----+             +----+----+
    |      |                       |
    | +----+----+                  |
    | | Measure +<-----------------+
    | +----+----+      Reported
    |      |           Egress
    | +----+----+      Measurement
    | | Trigger |
    | +----+----+
    |      |
    +---<--+
    Reaction

   Figure 2: An endpoint CB implemented at the sender (ingress) and
   receiver (egress).

   The set of components needed to implement a Circuit Breaker are:

   1.  An ingress meter (at the sender or tunnel ingress) records the
       number of packets/bytes sent in each measurement interval.  This
       measures the offered network load for a flow or set of flows.
       For example, the measurement interval could be many seconds (or
       every few tens of seconds or a series of successive shorter
       measurements that are combined by the Circuit Breaker Measurement
       function).

   2.  An egress meter (at the receiver or tunnel egress) records the
       number of packets/bytes received in each measurement interval.
       This measures the supported load for the flow or set of flows,
       and could utilize other signals to detect the effect of
       congestion (e.g., loss/marking experienced over the path).  The
       measurements at the egress could be synchronised (including an
       offset for the time of flight of the data, or referencing the
       measurements to a particular packet) to ensure any counters refer
       to the same span of packets.

   3.  The measured values at the ingress and egress are communicated to
       the Circuit Breaker Measurement function.  This could use several
       methods, including: sending return measurement packets from a
       receiver to a trigger function at the sender; an implementation
       using Operations, Administration and Management (OAM); or
       sending another in-band signalling datagram to the trigger
       function.  This could also be implemented purely as a control
       plane function, e.g., using a software-defined network
       controller.

   4.  The measurement function combines the ingress and egress
       measurements to assess the present level of network congestion.
       (For example, the loss rate for each measurement interval could
       be deduced from calculating the difference between ingress and
       egress counter values.)  Note the method does not require high
       accuracy for the period of the measurement interval (or therefore
       the measured value), since isolated and/or infrequent loss events
       need to be disregarded.

   5.  A trigger function determines if the measurements indicate
       persistent congestion.  This function defines an appropriate
       threshold for determining there is persistent congestion between
       the ingress and egress.  This preferably considers a rate or
       ratio, rather than an absolute value (e.g., more than 10% loss,
       but other methods could also be based on the rate of transmission
       as well as the loss rate).  The transport Circuit Breaker is
       triggered when the threshold is exceeded in multiple measurement
       intervals (e.g., 3 successive measurements).  Designs need to be
       robust so that single or spurious events do not trigger a
       reaction.

   6.  A reaction that is applied at the Ingress when the Circuit
       Breaker is triggered.  This seeks to automatically remove the
       traffic causing persistent congestion.

   7.  A feedback mechanism that triggers when either the receive or
       ingress and egress measurements are not available, since this
       also could indicate a loss of control packets (also a symptom of
       heavy congestion or inability to control the load).

4.  Requirements for a Network Transport Circuit Breaker

   The requirements for implementing a Circuit Breaker are:

   o  There MUST be a communication path used for control messages from
      the ingress meter and the egress meter to the point of
      measurement.  The Circuit Breaker MUST trigger if there is a
      failure of the communication path used for the control messages.
      That is, the feedback indicating a congested period needs to be
      designed so that the Circuit Breaker is triggered when it fails to
      receive measurement reports that indicate an absence of
      congestion, rather than relying on the successful transmission of
      a "congested" signal back to the sender.  (The feedback signal
      could itself be lost under congestion.)

   o  A Circuit Breaker MUST define a measurement period over which the
      Circuit Breaker Measurement function measures the level of
      congestion or loss.  This method does not have to detect
      individual packet loss, but MUST have a way to know that packets
      have been lost/marked from the traffic flow.  If Explicit
      Congestion Notification (ECN) is enabled [RFC3168], an egress
      meter MAY also count the number of ECN congestion marks/events per
      measurement interval, but even if ECN is used, loss MUST still be
      measured, since this better reflects the impact of persistent
      congestion.  In this context, loss represents a reliable
      indication of congestion, as opposed to the finer-grain marking of
      incipient congestion that can be provided via ECN.  The type of
      Circuit Breaker will determine how long this measurement period
      needs to be.

   o  The measurement period used by a Circuit Breaker Measurement
      function MUST be longer than the time that current Congestion
      Control algorithms need to reduce their rate following detection
      of congestion.  This is important because end-to-end Congestion
      Control algorithms require at least one RTT to notify and adjust
      the traffic to experienced congestion, and congestion bottlenecks
      can share traffic with a diverse range of RTTs.  The measurement
      period is therefore expected to be significantly longer than the
      RTT experienced by the Circuit Breaker itself.

   o  If necessary, a Circuit Breaker MAY combine successive individual
      meter samples from the ingress and egress to ensure observation of
      an average over a sufficiently long interval.  (Note when meter
      samples need to be combined, the combination needs to reflect the
      sum of the individual sample counts divided by the total
      time/volume over which the samples were measured.  Individual
      samples over different intervals cannot be directly combined to
      generate an average value.)

   o  A Circuit Breaker is REQUIRED to define a threshold to determine
      whether the measured congestion is considered excessive.

   o  A Circuit Breaker is REQUIRED to define the triggering interval,
      defining the period over which the trigger uses the collected
      measurements.  Circuit Breakers need to trigger over a
      sufficiently long period to avoid additionally penalizing flows
      with a long path RTT (e.g., many path RTTs).

   o  A Circuit Breaker MUST be robust to multiple congestion events.
      This usually will define a number of measured persistent
      congestion events per triggering period.  For example, a Circuit
      Breaker MAY combine the results of several measurement periods to
      determine if the Circuit Breaker is triggered (e.g., triggered
      when persistent congestion is detected in 3 of the measurements
      within the triggering interval).

   o  A Circuit Breaker SHOULD be constructed so that it does not
      trigger under light or intermittent congestion.

   o  The default response to a trigger SHOULD be to disable all traffic
      that contributed to congestion.

   o  Once triggered, the Circuit Breaker MUST react decisively by
      disabling or significantly reducing traffic at the source (e.g.,
      ingress).  A reaction that results in a reduction SHOULD result in
      reducing the traffic by at least an order of magnitude, each time
      the Circuit Breaker is triggered.  This response needs to be much
      more severe than that of a Congestion Controller algorithm (such
      as TCP's congestion control [RFC5681] or TFRC [RFC5348]), because
      the Circuit Breaker reacts to more persistent congestion and
      operates over longer timescales (i.e., the overload condition will
      have persisted for a longer time before the Circuit Breaker is
      triggered).

   o  A Circuit Breaker that reduces the rate of a flow MUST continue to
      monitor the level of congestion and MUST further reduce the rate
      if the Circuit Breaker is again triggered.

   o  The reaction to a triggered Circuit Breaker MUST continue for a
      period that is at least the triggering interval.  Operator
      intervention will usually be required to restore a flow.  If an
      automated response is needed to reset the trigger, then this needs
      to not be immediate.  The design of an automated reset mechanism
      needs to be sufficiently conservative that it does not adversely
      interact with other mechanisms (including other Circuit Breaker
      algorithms that control traffic over a common path).  It SHOULD
      NOT perform an automated reset when there is evidence of continued
      congestion.

   o  When a Circuit Breaker is triggered, it SHOULD be regarded as an
      abnormal network event.  As such, this event SHOULD be logged.
      The measurements that led to triggering of the Circuit Breaker
      SHOULD also be logged.

5.  Other network topologies

   A Circuit Breaker can be deployed in networks with topologies
   different to that presented in figure 2.  This section describes
   examples of such usage, and possible places where functions may be
   implemented.

5.1.  Use with a multicast control/routing protocol

    +----------+                 +--------+  +----------+
    | Ingress  |  +-+  +-+  +-+  | Egress |  | Egress   |
    | Endpoint +->+R+--+R+--+R+--+ Router |--+ Endpoint +->+
    +----+-----+  +-+  +-+  +-+  +---+--+-+  +----+-----+  |
         ^         ^    ^    ^       |  ^         |        |
         |         |    |    |       |  |         |        |
    +----+----+    + - - - < - - - - +  |    +----+----+   | Reported
    | Ingress |      multicast Prune    |    | Egress  |   | Ingress
    | Meter   |                         |    | Meter   |   | Measurement
    +---------+                         |    +----+----+   |
                                        |         |        |
                                        |    +----+----+   |
                                        |    | Measure +<--+
                                        |    +----+----+
                                        |         |
                                        |    +----+----+
                              multicast |    | Trigger |
                              Leave     |    +----+----+
                              Message   |         |
                                        +----<----+

   Figure 3: An example of a multicast CB controlling the end-to-end
   path between an ingress endpoint and an egress endpoint.

   Figure 3 shows one example of how a multicast circuit breaker could
   be implemented at a pair of multicast endpoints (e.g., to implement a
   Fast-Trip Circuit Breaker, Section 6.1).  The ingress endpoint (the
   sender that sources the multicast traffic) meters the ingress load,
   generating an ingress measurement (e.g., recording timestamped packet
   counts), and sends this measurement to the multicast group together
   with the traffic it has measured.

   Routers along a multicast path forward the multicast traffic
   (including the ingress measurement) to all active endpoint receivers.
   Each last hop (egress) router forwards the traffic to one or more
   egress endpoint(s).

   In this figure, each endpoint includes a meter that performs a local
   egress load measurement.
   An endpoint also extracts the received ingress measurement from the
   traffic, and compares the ingress and egress measurements to
   determine if the Circuit Breaker ought to be triggered.  This
   measurement has to be robust to loss (see previous section).  If the
   Circuit Breaker is triggered, it generates a multicast leave message
   for the egress (e.g., an IGMP or MLD message sent to the last hop
   router), which causes the upstream router to cease forwarding traffic
   to the egress endpoint.

   Any multicast router that has no active receivers for a particular
   multicast group will prune traffic for that group, sending a prune
   message to its upstream router.  This starts the process of releasing
   the capacity used by the traffic and is a standard multicast routing
   function (e.g., using the PIM-SM routing protocol).  Each egress
   operates autonomously, and the circuit breaker "reaction" is executed
   by the multicast control plane (e.g., by the PIM multicast routing
   protocol), requiring no explicit signalling by the circuit breaker
   along the communication path used for the control messages.  Note:
   there is no direct communication with the Ingress, and hence a
   triggered Circuit Breaker only controls traffic downstream of the
   first hop router.  It does not stop traffic flowing from the sender
   to the first hop router; this is however the common practice for
   multicast deployment.

   The method could also be used with a multicast tunnel or subnetwork
   (e.g., Section 6.2, Section 6.3), where a meter at the ingress
   generates additional control messages to carry the measurement data
   towards the egress where the egress metering is implemented.

5.2.  Use with control protocols supporting pre-provisioned capacity

   Some paths are provisioned using a control protocol, e.g., flows
   provisioned using the Multi-Protocol Label Switching (MPLS) services,
   paths provisioned using the Resource Reservation Protocol (RSVP),
   networks utilizing Software Defined Network (SDN) functions, or
   admission-controlled Differentiated Services.

   Figure 1 shows one expected use case, in which a separate device
   could be used to perform the measurement and trigger functions.  The
   reaction generated by the trigger could take the form of a network
   control message sent to the ingress and/or other network elements
   causing these elements to react to the Circuit Breaker.  Examples of
   this type of use are provided in Section 6.3.

5.3.  Unidirectional Circuit Breakers over Controlled Paths

   A Circuit Breaker can be used to control uni-directional UDP traffic,
   provided that there is a communication path that can be used for
   control messages to connect the functional components at the Ingress
   and Egress.  This communication path for the control messages can
   exist in networks for which the traffic flow is purely
   unidirectional.  For example, a multicast stream that sends packets
   across an Internet path can use multicast routing to prune flows to
   shed network load.  Some other types of subnetwork also utilize
   control protocols that can be used to control traffic flows.

6.  Examples of Circuit Breakers

   There are multiple types of Circuit Breaker that could be defined for
   use in different deployment cases.  This section provides examples of
   different types of circuit breaker.

6.1.  A Fast-Trip Circuit Breaker

   Applications ought to use a full-featured transport (TCP, SCTP,
   DCCP); applications that do not (e.g., those using UDP and its
   UDP-Lite variant [RFC3828]) need to provide appropriate congestion
   avoidance.
   [RFC2309] discusses the dangers of congestion-unresponsive flows and
   states that "all UDP-based streaming applications should incorporate
   effective congestion avoidance mechanisms".  Guidance for
   applications that do not use congestion-controlled transports is
   provided in [ID-ietf-tsvwg-RFC5405.bis].  Such mechanisms can be
   designed to react on much shorter timescales than a circuit breaker,
   which only observes a traffic envelope.  These methods can also
   interact with an application to more effectively control its sending
   rate.

   A fast-trip circuit breaker is the most responsive form of Circuit
   Breaker.  It has a response time that is only slightly larger than
   that of the traffic that it controls.  It is suited to traffic with
   well-understood characteristics (and could include one or more
   trigger functions specifically tailored to the type of traffic for
   which it is designed).  It is not suited to arbitrary network traffic
   and may be unsuitable for traffic aggregates, since it could
   prematurely trigger (e.g., when multiple congestion-controlled flows
   lead to short-term overload).

   These mechanisms are suitable for implementation in endpoints, where
   they can also complement end-to-end congestion control methods.  A
   shorter response time enables these mechanisms to trigger before
   other forms of circuit breaker (e.g., circuit breakers operating on
   traffic aggregates at a point along the network path).

6.1.1.  A Fast-Trip Circuit Breaker for RTP

   A set of fast-trip Circuit Breaker methods have been specified for
   use together by a Real-time Transport Protocol (RTP) flow using the
   RTP/AVP Profile [RTP-CB].  It is expected that, in the absence of
   severe congestion, all RTP applications running on best-effort IP
   networks will be able to run without triggering these circuit
   breakers.
A fast-trip RTP Circuit Breaker is therefore implemented as a
fail-safe that, when triggered, will terminate RTP traffic.

The sender monitors reception of RTCP reception report blocks, as
contained in SR or RR packets, that convey reception quality feedback
information. This is used to measure (congestion) loss, possibly in
combination with ECN [RFC6679].

The Circuit Breaker action (shutdown of the flow) is triggered when
any of the following trigger conditions is true:

1. An RTP Circuit Breaker triggers on reported lack of progress.

2. An RTP Circuit Breaker triggers when no receiver report messages
   are received.

3. An RTP Circuit Breaker uses a TFRC-style check and sets a hard
   upper limit to the long-term RTP throughput (over many RTTs).

4. An RTP Circuit Breaker includes the notion of Media Usability.
   This Circuit Breaker is triggered when the quality of the
   transported media falls below some required minimum acceptable
   quality.

6.2. A Slow-trip Circuit Breaker

A slow-trip Circuit Breaker could be implemented in an endpoint or
network device. This type of Circuit Breaker is much slower at
responding to congestion than a fast-trip Circuit Breaker and is
expected to be more common.

One example where a slow-trip Circuit Breaker is needed is where
flows or traffic aggregates use a tunnel or encapsulation and the
flows within the tunnel do not all support TCP-style congestion
control (e.g., TCP, SCTP, TFRC); see Section 3.1.3 of
[ID-ietf-tsvwg-RFC5405.bis]. A use case is where tunnels are deployed
in the general Internet (rather than "controlled environments" within
an Internet service provider or enterprise network), especially when
the tunnel could need to cross a customer access router.

6.3. A Managed Circuit Breaker

A managed Circuit Breaker is implemented in the signalling protocol
or management plane that relates to the traffic aggregate being
controlled. This type of Circuit Breaker is typically applicable
when the deployment is within a "controlled environment".

A Circuit Breaker requires more than the ability to determine that a
network path is forwarding data, or to measure the rate of a path,
both of which are often normal network operational functions. There
is an additional need to determine a metric for congestion on the
path and to trigger a reaction when a threshold is crossed that
indicates persistent congestion.

6.3.1. A Managed Circuit Breaker for SAToP Pseudo-Wires

Section 8 of [RFC4553], SAToP Pseudo-Wires (PWE3), describes an
example of a managed Circuit Breaker for isochronous flows.

If such flows were to run over a pre-provisioned (e.g., Multi-
Protocol Label Switching, MPLS) infrastructure, then it could be
expected that the Pseudowire (PW) would not experience congestion,
because a flow is not expected to either increase or decrease its
rate. If instead Pseudo-Wire traffic is multiplexed with other
traffic over the general Internet, it could experience congestion.
[RFC4553] states: "If SAToP PWs run over a PSN providing best-effort
service, they SHOULD monitor packet loss in order to detect "severe
congestion". The currently recommended measurement period is 1
second, and the trigger operates when there are more than three
measured Severely Errored Seconds (SES) within a period. If such a
condition is detected, a SAToP PW ought to shut down bidirectionally
for some period of time...".

The concept was that when the packet loss ratio (congestion) level
increased above a threshold, the PW was by default disabled. This
use case considered fixed-rate transmission, where the PW had no
reasonable way to shed load.
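
As a non-normative illustration, the trigger described in the quoted
text can be sketched in Python. The class name, the loss ratio used
to classify a one-second sample as severely errored, and the length
of the observation period are assumptions made for this sketch; they
are not taken from [RFC4553] or from this document. Only the
one-second measurement period and the "more than three SES" rule
come from the text above.

```python
from collections import deque


class SesCircuitBreaker:
    """Illustrative managed Circuit Breaker for a fixed-rate PW (hypothetical)."""

    def __init__(self, ses_loss_ratio=0.3, period_seconds=10, max_ses=3):
        # ses_loss_ratio and period_seconds are assumed values for this sketch.
        # max_ses=3 follows the text: trigger on MORE than three SES in a period.
        self.ses_loss_ratio = ses_loss_ratio
        self.max_ses = max_ses
        self.history = deque(maxlen=period_seconds)  # one flag per 1-second sample
        self.tripped = False

    def record_second(self, packets_sent, packets_received):
        """Feed one 1-second measurement; return True once the PW should shut down."""
        lost = max(packets_sent - packets_received, 0)
        loss_ratio = lost / packets_sent if packets_sent else 0.0
        self.history.append(loss_ratio >= self.ses_loss_ratio)
        if sum(self.history) > self.max_ses:
            # Latch the trigger: restoring the PW is left to the operator.
            self.tripped = True
        return self.tripped


cb = SesCircuitBreaker()
for sent, received in [(1000, 500)] * 4:
    state = cb.record_second(sent, received)
print(state)  # True: the fourth SES exceeds the limit of three
```

The trigger is latched in this sketch because the quoted text calls
for the PW to shut down for some period of time; how and when the PW
is re-enabled is a management decision outside the sketch.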

The trigger needs to be set at a rate at which the PW is likely to
experience a serious problem, possibly making the service non-
compliant. At this point, triggering the Circuit Breaker would
remove the traffic, preventing undue impact on congestion-responsive
traffic (e.g., TCP). Part of the rationale was that high loss
ratios typically indicated that something was "broken" and ought to
have already resulted in operator intervention, and therefore need to
trigger this intervention.

An operator-based response provides an opportunity for other action
to restore the service quality, e.g., by shedding other loads or
assigning additional capacity, or to consciously avoid reacting to
the trigger while engineering a solution to the problem. This could
require the trigger to be sent to a third location (e.g., a network
operations centre, NOC) responsible for operation of the tunnel
ingress, rather than the tunnel ingress itself.

6.3.2. A Managed Circuit Breaker for Pseudowires (PWs)

Pseudowires (PWs) [RFC3985] have become a common mechanism for
tunneling traffic, and may compete for network resources both with
other PWs and with non-PW traffic, such as TCP/IP flows.

[ID-ietf-pals-congcons] discusses congestion conditions that can
arise when PWs compete with elastic (i.e., congestion-responsive)
network traffic (e.g., TCP traffic). Elastic PWs carrying IP traffic
(see [RFC4488]) do not raise major concerns because all of the
traffic involved responds, reducing the transmission rate when
network congestion is detected.

In contrast, inelastic PWs (e.g., a fixed-bandwidth Time Division
Multiplex, TDM, PW [RFC4553] [RFC5086] [RFC5087]) have the potential
to harm congestion-responsive traffic or to contribute to excessive
congestion because inelastic PWs do not adjust their transmission
rate in response to congestion.
[ID-ietf-pals-congcons] analyses TDM PWs, with an initial conclusion
that a TDM PW operating with a degree of loss that may result in
congestion-related problems is also operating with a degree of loss
that results in an unacceptable TDM service. For that reason, the
draft suggests that a managed Circuit Breaker that shuts down a PW
when it persistently fails to deliver acceptable TDM service is a
useful means for addressing these congestion concerns.

7. Examples where circuit breakers may not be needed.

A Circuit Breaker is not required for a single congestion-controlled
flow using TCP, SCTP, TFRC, etc. In these cases, the congestion
control methods are already designed to prevent persistent
congestion.

7.1. CBs over pre-provisioned Capacity

One common question is whether a Circuit Breaker is needed when a
tunnel is deployed in a private network with pre-provisioned
capacity.

In this case, compliant traffic that does not exceed the provisioned
capacity ought not to result in persistent congestion. A Circuit
Breaker will hence only be triggered when there is non-compliant
traffic. It could be argued that this event ought never to happen,
but it could also be argued that the Circuit Breaker equally ought
never to be triggered. If a Circuit Breaker were to be implemented,
it would provide an appropriate response if persistent congestion
occurs in an operational network.

Implementing a Circuit Breaker will not reduce the performance of the
flows, but in the event that persistent congestion occurs it protects
network traffic that shares network capacity with these flows. A
Circuit Breaker could also be used to protect other traffic sharing
the network from a failure that causes the Circuit Breaker traffic to
be routed over a non-pre-provisioned path.

7.2. CBs with tunnels carrying Congestion-Controlled Traffic

IP-based traffic is generally assumed to be congestion-controlled,
i.e., it is assumed that the transport protocols generating IP-based
traffic at the sender already employ mechanisms that are sufficient
to address congestion on the path [ID-ietf-tsvwg-RFC5405.bis]. A
question therefore arises when people deploy a tunnel that is thought
to only carry an aggregate of TCP (or some other congestion-
controlled) traffic: Is there an advantage in this case in using a
Circuit Breaker?

Traffic in such a tunnel will certainly respond to congestion.
However, the answer to the question is not always obvious, because
the overall traffic formed by an aggregate of flows that implement a
congestion control mechanism does not necessarily prevent persistent
congestion. For instance, most congestion control mechanisms require
long-lived flows in order to react and reduce the rate of a flow; an
aggregate of many short flows could result in many flows terminating
before they experience congestion. It is also often impossible for a
tunnel service provider to know that the tunnel only contains
congestion-controlled traffic (e.g., inspecting packet headers might
not be possible). The important thing to note is that if the
aggregate of the traffic does not result in persistent congestion
(impacting other flows), then the Circuit Breaker will not trigger.
This is the expected case in this context, so implementing a Circuit
Breaker will not reduce performance of the tunnel, but in the event
that persistent congestion occurs it protects other network traffic
that shares capacity with the tunnel traffic.

7.3. CBs with Uni-directional Traffic and no Control Path

A one-way forwarding path could have no associated communication path
for sending control messages, and therefore cannot be controlled
using an automated process.
This service could be provided using a path that has dedicated
capacity and does not share this capacity with other elastic Internet
flows (i.e., flows that vary their rate).

A way to mitigate the impact on other flows when capacity could be
shared is to manage the traffic envelope by using ingress policing.

Supporting this type of traffic in the general Internet requires
operator monitoring to detect and respond to persistent congestion.

8. Security Considerations

All Circuit Breaker mechanisms rely upon coordination between the
ingress and egress meters and communication with the trigger
function. This is usually achieved by passing network control
information (or protocol messages) across the network. Timely
operation of a circuit breaker depends on the choice of measurement
period. If the receiver has an interval that is overly long, then
the responsiveness of the circuit breaker decreases. This impacts
the ability of the circuit breaker to detect and react to congestion.

Mechanisms need to be implemented to prevent attacks on the network
control information that would result in Denial of Service (DoS).
The source and integrity of control information (measurements and
triggers) MUST be protected from off-path attacks. Without
protection, it could be trivial for an attacker to inject packets
with values that could prematurely trigger a circuit breaker
resulting in DoS. Simple protection can be provided by using a
randomized source port, or equivalent field in the packet header
(such as the RTP SSRC value and the RTP sequence number) expected not
to be known to an off-path attacker. Stronger protection can be
achieved using a secure authentication protocol.

Transmission of network control information consumes network
capacity.
This control traffic needs to be considered in the design of a
Circuit Breaker and could potentially add to network congestion.
If this traffic is sent over a shared path, it is RECOMMENDED that
this control traffic is prioritized to reduce the probability of loss
under congestion. Control traffic also needs to be considered when
provisioning a network that uses a Circuit Breaker.

The Circuit Breaker MUST be designed to be robust to packet loss that
could also be experienced during congestion/overload. Loss of control
messages could be a side-effect of a congested network, but could
also arise from other causes. This does not imply that it is
desirable to provide reliable delivery (e.g., over TCP), since this
can incur additional delay in responding to congestion. Appropriate
mechanisms could be to duplicate control messages to provide
increased robustness to loss, and/or to regard a lack of control
traffic as an indication that excessive congestion may be being
experienced [ID-ietf-tsvwg-RFC5405.bis].

The security implications depend on the design of the mechanisms, the
type of traffic being controlled and the intended deployment
scenario. Each design of a Circuit Breaker MUST therefore evaluate
whether the particular Circuit Breaker mechanism has new security
implications.

9. IANA Considerations

This document makes no request of IANA.

10. Acknowledgments

There are many people who have discussed and described the issues
that have motivated this draft. Contributions and comments included:
Lars Eggert, Colin Perkins, David Black, Matt Mathis and Andrew
McGregor. This work was part-funded by the European Community under
its Seventh Framework Programme through the Reducing Internet
Transport Latency (RITE) project (ICT-317700).

11. Revision Notes

XXX RFC-Editor: Please remove this section prior to publication XXX

Draft 00

This was the first revision. Help and comments are greatly
appreciated.

Draft 01

Contained clarifications and changes in response to received
comments, plus addition of diagram and definitions. Comments are
welcome.

WG Draft 00

Approved as a WG work item on 28th Aug 2014.

WG Draft 01

Incorporates feedback after Dallas IETF TSVWG meeting. This version
is thought ready for WGLC comments.

WG Draft 02

Minor fixes for typos. Rewritten security considerations section.

WG Draft 03

Updates following WGLC comments (see TSV mailing list). Comments
from C Perkins; D Black and off-list feedback.

A clear recommendation of intended scope.

Changes include: Improvement of language on timescales and minimum
measurement period; clearer articulation of endpoint and multicast
examples - with new diagrams; separation of the controlled network
case; updated text on position of trigger function; corrections to
RTP-CB text; clarification of loss v ECN metrics; checks against
submission checklist (use of keywords, added meters to diagrams).

WG Draft 04

Added section on PW CB for TDM - a newly adopted draft (D. Black).

WG Draft 05

Added clarifications requested during AD review.

WG Draft 06

Fixed some remaining typos.

Update following detailed review by Bob Briscoe, and comments by D.
Black.

12. References

12.1. Normative References

[ID-ietf-tsvwg-RFC5405.bis]
          Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
          Guidelines (Work-in-Progress)", 2015.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119,
          DOI 10.17487/RFC2119, March 1997.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
          of Explicit Congestion Notification (ECN) to IP",
          RFC 3168, DOI 10.17487/RFC3168, September 2001.

12.2. Informative References

[ID-ietf-pals-congcons]
          Stein, YJ., Black, D., and B. Briscoe, "Pseudowire
          Congestion Considerations (Work-in-Progress)", 2015.

[Jacobsen88]
          Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
          Symposium proceedings on Communications architectures and
          protocols, August 1988.

[RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5,
          RFC 1112, DOI 10.17487/RFC1112, August 1989.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
          S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
          Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
          S., Wroclawski, J., and L. Zhang, "Recommendations on
          Queue Management and Congestion Avoidance in the
          Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998.

[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41,
          RFC 2914, DOI 10.17487/RFC2914, September 2000.

[RFC3985] Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
          Edge-to-Edge (PWE3) Architecture", RFC 3985,
          DOI 10.17487/RFC3985, March 2005.

[RFC4488] Levin, O., "Suppression of Session Initiation Protocol
          (SIP) REFER Method Implicit Subscription", RFC 4488,
          DOI 10.17487/RFC4488, May 2006.

[RFC4553] Vainshtein, A., Ed. and YJ. Stein, Ed., "Structure-
          Agnostic Time Division Multiplexing (TDM) over Packet
          (SAToP)", RFC 4553, DOI 10.17487/RFC4553, June 2006.

[RFC5086] Vainshtein, A., Ed., Sasson, I., Metz, E., Frost, T., and
          P. Pate, "Structure-Aware Time Division Multiplexed (TDM)
          Circuit Emulation Service over Packet Switched Network
          (CESoPSN)", RFC 5086, DOI 10.17487/RFC5086, December 2007.

[RFC5087] Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi,
          "Time Division Multiplexing over IP (TDMoIP)", RFC 5087,
          DOI 10.17487/RFC5087, December 2007.

[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
          Friendly Rate Control (TFRC): Protocol Specification",
          RFC 5348, DOI 10.17487/RFC5348, September 2008.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
          Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
          and K. Carlberg, "Explicit Congestion Notification (ECN)
          for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
          2012.

[RTP-CB]  Perkins, and Singh, "Multimedia Congestion Control:
          Circuit Breakers for Unicast RTP Sessions", February 2014.

Author's Address

Godred Fairhurst
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: gorry@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk