idnits 2.17.1

draft-ietf-tsvwg-circuit-breaker-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 25, 2015) is 3133 days in the past.  Is this
     intentional?

  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5405 (Obsoleted by RFC 8085)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

TSVWG Working Group                                         G. Fairhurst
Internet-Draft                                    University of Aberdeen
Intended status: Best Current Practice                September 25, 2015
Expires: March 28, 2016

                   Network Transport Circuit Breakers
                  draft-ietf-tsvwg-circuit-breaker-04

Abstract

   This document explains what is meant by the term "network transport
   Circuit Breaker" (CB).  It describes the need for circuit breakers
   when using network tunnels, and other non-congestion controlled
   applications.
   It also defines requirements for building a circuit
   breaker and the expected outcomes of using a circuit breaker within
   the Internet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 28, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Types of Circuit-Breaker
   2.  Terminology
   3.  Design of a Circuit-Breaker (What makes a good circuit
       breaker?)
     3.1.  Functional Components
   4.
       Requirements for a Network Transport Circuit Breaker
     4.1.  Unidirectional Circuit Breakers over Controlled Paths
       4.1.1.  Use with a multicast control/routing protocol
       4.1.2.  Use with control protocols supporting pre-provisioned
               capacity
   5.  Examples of Circuit Breakers
     5.1.  A Fast-Trip Circuit Breaker
       5.1.1.  A Fast-Trip Circuit Breaker for RTP
     5.2.  A Slow-trip Circuit Breaker
     5.3.  A Managed Circuit Breaker
       5.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires
       5.3.2.  A Managed Circuit Breaker for Pseudowires (PWs)
   6.  Examples where circuit breakers may not be needed.
     6.1.  CBs over pre-provisioned Capacity
     6.2.  CBs with tunnels carrying Congestion-Controlled Traffic
     6.3.  CBs with Uni-directional Traffic and no Control Path
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. Revision Notes
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Author's Address

1.  Introduction

   A network transport Circuit Breaker (CB) is an automatic mechanism
   that is used to estimate congestion caused by a flow, and to
   terminate (or significantly reduce the rate of) the flow when
   persistent congestion is detected.
   This is a safety measure to
   prevent congestion collapse (starvation of resources available to
   other flows), essential for an Internet that is heterogeneous and for
   traffic that is hard to predict in advance.

   The term "Circuit Breaker" originates in electricity supply, and has
   nothing to do with network circuits or virtual circuits.  In
   electricity supply, a Circuit Breaker is intended as a protection
   mechanism of last resort.  Under normal circumstances, a Circuit
   Breaker ought not to be triggered; it is designed to protect the
   supply network and attached equipment when there is overload.
   Similarly, people do not expect the electrical circuit-breaker (or
   fuse) in their home to be triggered, except when there is a wiring
   fault or a problem with an electrical appliance.

   In networking, the Circuit Breaker principle can be used as a
   protection mechanism of last resort to avoid persistent congestion.
   Persistent congestion (also known as "congestion collapse") was a
   feature of the early Internet of the 1980s.  This resulted in excess
   traffic starving other connections of access to the Internet.  It
   was countered by the requirement to use congestion control (CC) by
   the Transmission Control Protocol (TCP) [Jacobsen88] [RFC1112].
   These mechanisms operate in Internet hosts to cause TCP connections
   to "back off" during congestion.  The introduction of a Congestion
   Controller in TCP (currently documented in [RFC5681]) ensured the
   stability of the Internet, because it was able to detect congestion
   and promptly react.  This worked well while TCP was by far the
   dominant traffic in the Internet, and most TCP flows were long-lived
   (ensuring that they could detect and respond to congestion before the
   flows terminated).
   This is no longer the case, and
   non-congestion-controlled traffic, including many applications of the
   User Datagram Protocol (UDP), can form a significant proportion of
   the total traffic traversing a link.  The current Internet therefore
   requires that non-congestion-controlled traffic be considered in
   order to avoid congestion collapse.

   There are important differences between a transport circuit-breaker
   and a congestion-control method.  Specifically, congestion control
   (as implemented in TCP, SCTP, and DCCP) operates on a timescale on
   the order of a packet round-trip time (RTT), the time from sender to
   destination and return.  Congestion control methods are able to react
   to a single packet loss/marking and reduce the transmission rate for
   each loss or congestion event.  The goal is usually to limit the
   maximum transmission rate to a rate that reflects the available
   capacity across a network path.  These methods typically operate on
   individual traffic flows (e.g., a 5-tuple).

   In contrast, Circuit Breakers are recommended for
   non-congestion-controlled Internet flows and for traffic aggregates,
   e.g., traffic sent using a network tunnel.  Later sections provide
   examples of cases where circuit-breakers may or may not be desirable.

   A Circuit Breaker needs to measure (meter) the traffic to determine
   if the network is experiencing congestion and needs to be designed to
   trigger robustly when there is persistent congestion.  This means the
   trigger needs to operate on a timescale much longer than the path
   round-trip time (e.g., seconds to possibly many tens of seconds).
   This longer period is needed to provide sufficient time for
   transports (or applications) to adjust their rate following
   congestion, and for the network load to stabilize after any
   adjustment.
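
   The difference in timescale can be illustrated with a small Python
   sketch.  It is not part of any specification: the class name
   SlowTrigger, the 10% loss threshold, and the three-interval rule are
   illustrative assumptions, and the caller is assumed to invoke
   update() once per measurement interval (an interval of several
   seconds, i.e., many RTTs).  Unlike a congestion controller, the
   sketch ignores an isolated lossy interval and trips only when loss
   stays high across consecutive intervals.

```python
from dataclasses import dataclass, field


@dataclass
class SlowTrigger:
    """Trips only after loss stays high for several long intervals.

    All default values are illustrative assumptions, not requirements.
    """
    loss_threshold: float = 0.10   # assumed: fraction of packets lost
    intervals_to_trip: int = 3     # assumed: consecutive bad intervals
    bad_intervals: int = field(default=0, init=False)
    tripped: bool = field(default=False, init=False)

    def update(self, sent: int, received: int) -> bool:
        """Feed one interval's ingress (sent) and egress (received) counts."""
        if self.tripped:
            return True
        loss = 0.0 if sent == 0 else max(sent - received, 0) / sent
        if loss > self.loss_threshold:
            self.bad_intervals += 1
        else:
            # Congestion did not persist: forget earlier bad intervals.
            self.bad_intervals = 0
        if self.bad_intervals >= self.intervals_to_trip:
            self.tripped = True
        return self.tripped
```

   A single interval with 30% loss followed by a clean interval leaves
   the trigger untouched; three lossy intervals in a row trip it, and
   it then stays tripped until something outside the sketch resets it.
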

   A Circuit Breaker trigger will often utilize a series of successive
   sample measurements metered at an ingress point and an egress point
   (either of which could be a transport endpoint).  These measurements
   need to be taken over a reasonably long period of time.  This is to
   ensure that a Circuit Breaker does not accidentally trigger following
   a single congestion event, or even several successive congestion
   events (congestion events are what triggers congestion control, and
   are to be regarded as normal on a network link operating near its
   capacity).  Once triggered, a control function needs to remove
   traffic from the network, either by disabling the flow or by
   significantly reducing the level of traffic.  This reaction provides
   the required protection to prevent persistent congestion being
   experienced by other flows that share the congested part of the
   network path.

   Section 4 defines requirements for building a Circuit Breaker.

1.1.  Types of Circuit-Breaker

   There are various forms of network transport circuit breaker.  These
   are differentiated mainly on the timescale over which they are
   triggered, but also in the intended protection they offer:

   o  Fast-Trip Circuit Breakers: The relatively short timescale used by
      this form of circuit breaker is intended to protect a flow or
      related group of flows.

   o  Slow-Trip Circuit Breakers: This circuit breaker utilizes a longer
      timescale and is designed to protect traffic aggregates.

   o  Managed Circuit Breakers: Utilize the operations and management
      functions that might be present in a managed service to implement
      a circuit breaker.

   Examples of each type of circuit breaker are provided in Section 5.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.
    Design of a Circuit-Breaker (What makes a good circuit breaker?)

   Although circuit breakers have been talked about in the IETF for many
   years, there has not yet been guidance on the cases where circuit
   breakers are needed or upon the design of circuit breaker mechanisms.
   This document seeks to offer advice on these two topics.

   Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
   carry non-congestion-controlled Internet flows and for traffic
   aggregates, e.g., traffic sent using a network tunnel.  Designers of
   other protocols and tunnel encapsulations also ought to consider the
   use of these techniques to provide last-resort protection to the
   network paths over which they are used.

   This document defines the requirements for design of a Circuit
   Breaker and provides examples of how a Circuit Breaker can be
   constructed.  The specifications of individual protocols and tunnel
   encapsulations need to detail the protocol mechanisms needed to
   implement a Circuit Breaker.

   Section 3.1 describes the functional components of a circuit breaker
   and Section 4 defines requirements for implementing a Circuit
   Breaker.

3.1.  Functional Components

   The basic design of a transport circuit breaker involves
   communication between an ingress point (a sender) and an egress point
   (a receiver) of a network flow.  A simple picture of Circuit Breaker
   operation is provided in Figure 1.  This shows a set of routers (each
   labelled R) connecting a set of endpoints.  A Circuit Breaker is used
   to control traffic passing through a subset of these routers, acting
   between the ingress point and egress point network devices.  The path
   between the ingress and egress could be provided by a tunnel or other
   network-layer technique.  One expected use would be at the ingress
   and egress of a service.

 +--------+                                                   +--------+
 |Endpoint|                                                   |Endpoint|
 +--+-----+          >>> circuit breaker traffic >>>          +--+-----+
    |                                                            |
    | +-+  +-+  +---------+  +-+  +-+  +-+  +--------+  +-+  +-+ |
    +-+R+--+R+->+ Ingress +--+R+--+R+--+R+--+ Egress |--+R+--+R+-+
      +++  +-+  +------+--+  +-+  +-+  +-+  +-----+--+  +++  +-+
       |         ^     |                          |      |
       |         |  +--+------+            +------+--+   |
       |         |  | Ingress |            | Egress  |   |
       |         |  | Meter   |            | Meter   |   |
       |         |  +----+----+            +----+----+   |
       |         |       |                      |        |
  +-+  |         |  +----+----+                 |        |  +-+
  |R+--+         |  | Measure +<----------------+        +--+R|
  +++            |  +----+----+      Reported               +++
   |             |       |           Egress                  |
   |             |  +----+----+      Measurement             |
+--+-----+       |  | Trigger |                           +--+-----+
|Endpoint|       |  +----+----+                           |Endpoint|
+--------+       |       |                                +--------+
                 +---<---+
                  Reaction

   Figure 1: A CB controlling the part of the end-to-end path between an
   ingress point and an egress point.  (Note: In some cases, the trigger
   and measure functions could alternatively be located at other
   locations, e.g., at a network operations centre.)

   In the context of a Circuit Breaker, the ingress and egress functions
   could be located in one or both network endpoints (see Figure 2), for
   example, implemented as components within a transport protocol.

  +----------+                 +----------+
  | Ingress  |  +-+  +-+  +-+  | Egress   |
  | Endpoint +->+R+--+R+--+R+--+ Endpoint |
  +--+----+--+  +-+  +-+  +-+  +----+-----+
     ^    |                         |
     | +--+------+             +----+----+
     | | Ingress |             | Egress  |
     | | Meter   |             | Meter   |
     | +----+----+             +----+----+
     |      |                       |
     | +----+----+                  |
     | | Measure +<-----------------+
     | +----+----+      Reported
     |      |           Egress
     | +----+----+      Measurement
     | | Trigger |
     | +----+----+
     |      |
     +---<--+
      Reaction

   Figure 2: An endpoint CB implemented at the sender (ingress) and
   receiver (egress).

   The components needed to implement a Circuit Breaker are:

   1.
       An ingress meter (at the sender or tunnel ingress) records the
       number of packets/bytes sent in each measurement interval.  This
       measures the offered network load.  For example, the measurement
       interval could be every few seconds.

   2.  An egress meter (at the receiver or tunnel egress) records the
       number of packets/bytes received in each measurement interval.
       This measures the supported load and could utilize other signals
       to detect the effect of congestion (e.g., loss/marking
       experienced over the path).

   3.  The measured values at the ingress and egress are communicated to
       the Circuit Breaker Measurement function.  This could use several
       methods, including: sending return measurement packets from a
       receiver to a trigger function at the sender; an implementation
       using Operations, Administration and Management (OAM); or
       sending another in-band signalling datagram to the trigger
       function.  This could also be implemented purely as a control
       plane function, e.g., using a software-defined network
       controller.

   4.  The measurement function combines the ingress and egress
       measurements to assess the present level of network congestion.
       (For example, the loss rate for each measurement interval could
       be deduced by calculating the difference between ingress and
       egress counter values.  Note that the method does not require
       high accuracy for the period of the measurement interval (or
       therefore the measured value), since isolated and/or infrequent
       loss events need to be disregarded.)

   5.  A trigger function determines if the measurements indicate
       persistent congestion.  This function defines an appropriate
       threshold for determining that there is persistent congestion
       between the ingress and egress.  This preferably considers a rate
       or ratio, rather than an absolute value (e.g., more than 10%
       loss, but other methods could also be based on the rate of
       transmission as well as the loss rate).
       The transport Circuit Breaker is
       triggered when the threshold is exceeded in multiple measurement
       intervals (e.g., 3 successive measurements).  Designs need to be
       robust so that single or spurious events do not trigger a
       reaction.

   6.  A reaction that is applied at the Ingress when the Circuit
       Breaker is triggered.  This seeks to automatically remove the
       traffic causing persistent congestion.

   7.  A feedback mechanism that triggers when the ingress and/or
       egress measurements are not available, since this also could
       indicate a loss of control packets (also a symptom of heavy
       congestion or inability to control the load).

4.  Requirements for a Network Transport Circuit Breaker

   The requirements for implementing a Circuit Breaker are:

   o  There MUST be a control path from the ingress meter and the egress
      meter to the point of measurement.  The Circuit Breaker MUST
      trigger if this control path fails.  That is, the feedback
      indicating a congested period needs to be designed so that the
      Circuit Breaker is triggered when it fails to receive measurement
      reports that indicate an absence of congestion, rather than
      relying on the successful transmission of a "congested" signal
      back to the sender.  (The feedback signal could itself be lost
      under congestion.)

   o  A Circuit Breaker MUST define a measurement period over which the
      receiver measures the level of congestion or loss.  This method
      does not have to detect individual packet loss, but MUST have a
      way to know that packets have been lost/marked from the traffic
      flow.  If Explicit Congestion Notification (ECN) is enabled
      [RFC3168], an egress meter MAY also count the number of ECN
      congestion marks/events per measurement interval, but even if ECN
      is used, loss MUST still be measured, since this better reflects
      the impact of persistent congestion.
      In this context, loss
      represents a reliable indication of congestion, as opposed to the
      finer-grain marking of incipient congestion that can be provided
      via ECN.  The type of Circuit Breaker will determine how long this
      measurement period needs to be.

   o  The measurement period MUST be longer than the time that current
      Congestion Control algorithms need to reduce their rate following
      detection of congestion.  This is important because end-to-end
      Congestion Control algorithms require at least one RTT to notify
      and adjust to experienced congestion.  Congestion bottlenecks can
      carry traffic with a diverse range of RTTs, and Circuit Breakers
      hence need to perform measurements over a sufficiently long period
      (e.g., many path RTTs) to avoid additionally penalizing flows with
      a long path RTT.  In some implementations, this may require a
      measurement to combine multiple meter samples to achieve a
      sufficiently long measurement period.  In most cases, the
      measurement period is expected to be significantly longer than
      the RTT experienced by the Circuit Breaker itself.

   o  A Circuit Breaker is REQUIRED to define a threshold to determine
      whether the measured congestion is considered excessive.

   o  A Circuit Breaker is REQUIRED to define the triggering interval,
      defining the period over which the trigger uses the collected
      measurements.

   o  A Circuit Breaker MUST be robust to multiple congestion events.
      This usually will define a number of measured persistent
      congestion events per triggering period.  For example, a Circuit
      Breaker MAY combine the results of several measurement periods to
      determine if the Circuit Breaker is triggered (e.g., triggered
      when persistent congestion is detected in 3 of the measurements
      within the triggering interval).

   o  A Circuit Breaker SHOULD be constructed so that it does not
      trigger under light or intermittent congestion, with a default
      response to a trigger that disables all traffic that contributed
      to congestion.

   o  Once triggered, the Circuit Breaker MUST react decisively by
      disabling or significantly reducing traffic at the source (e.g.,
      ingress).  A reaction that results in a reduction SHOULD result in
      reducing the traffic by at least a factor of ten, each time the
      Circuit Breaker is triggered.

   o  Some circuit breaker designs use a reaction that reduces, rather
      than disables, the flows it controls.  This response MUST be much
      more severe than that of a Congestion Controller algorithm,
      because the Circuit Breaker reacts to more persistent congestion
      and operates over longer timescales (i.e., the overload condition
      will have persisted for a longer time before the Circuit Breaker
      is triggered).  A Circuit Breaker that reduces the rate of a flow
      MUST continue to monitor the level of congestion and MUST further
      reduce the rate if the Circuit Breaker is again triggered.

   o  The reaction to a triggered Circuit Breaker MUST continue for a
      period that is at least the triggering interval.  Manual operator
      intervention will usually be required to restore a flow.  If an
      automated response is needed to reset the trigger, then this MUST
      NOT be immediate.  The design of an automated reset mechanism
      needs to be sufficiently conservative that it does not adversely
      interact with other mechanisms (including other Circuit Breaker
      algorithms that control traffic over a common path).  It SHOULD
      NOT perform an automated reset when there is evidence of continued
      congestion.

   o  When a Circuit Breaker is triggered, it SHOULD be regarded as an
      abnormal network event.  As such, this event SHOULD be logged.
      The measurements that lead to triggering of the Circuit Breaker
      SHOULD also be logged.

4.1.  Unidirectional Circuit Breakers over Controlled Paths

   A Circuit Breaker can be used to control uni-directional UDP traffic,
   provided that there is a control path to connect the functional
   components at the Ingress and Egress.  This control path can exist in
   networks for which the traffic flow is purely unidirectional.  For
   example, a multicast stream sent across an Internet path can use
   multicast routing to prune flows to shed network load.  Some other
   types of subnetwork also utilize control protocols that can be used
   to control traffic flows.

4.1.1.  Use with a multicast control/routing protocol

+----------+                 +--------+  +----------+
| Ingress  |  +-+  +-+  +-+  | Egress |  | Egress   |
| Endpoint +->+R+--+R+--+R+--+ Router |--+ Endpoint +->+
+----+-----+  +-+  +-+  +-+  +---+--+-+  +----+-----+  |
     ^         ^    ^    ^       |  ^         |        |
     |         |    |    |       |  |         |        |
+----+----+    + - - - < - - - - +  |    +----+----+   | Reported
| Ingress |      multicast Prune    |    | Egress  |   | Ingress
| Meter   |                         |    | Meter   |   | Measurement
+---------+                         |    +----+----+   |
                                    |         |        |
                                    |    +----+----+   |
                                    |    | Measure +<--+
                                    |    +----+----+
                                    |         |
                                    |    +----+----+
                          multicast |    | Trigger |
                          Leave     |    +----+----+
                          Message   |         |
                                    +----<----+

   Figure 3: An example of a multicast CB controlling the end-to-end
   path between an ingress endpoint and an egress endpoint.

   Figure 3 shows one example of how a multicast circuit breaker could
   be implemented at a pair of multicast endpoints (e.g., to implement a
   fast-trip Circuit Breaker, Section 5.1).  The ingress endpoint (the
   sender that sources the multicast traffic) meters the ingress load,
   generating an ingress measurement (e.g., recording timestamped packet
   counts), and sends this measurement to the multicast group together
   with the traffic it has measured.

   Routers along a multicast path forward the multicast traffic
   (including the ingress measurement) to all active endpoint receivers.
   Each last hop (egress) router forwards the traffic to one or more
   egress endpoint(s).

   In this figure, each endpoint includes a meter that performs a local
   egress load measurement.  An endpoint also extracts the received
   ingress measurement from the traffic, and compares the ingress and
   egress measurements to determine if the Circuit Breaker ought to be
   triggered.  This measurement has to be robust to loss (see previous
   section).  If the Circuit Breaker is triggered, it generates a
   multicast leave message for the egress (e.g., an IGMP or MLD message
   sent to the last hop router), which causes the upstream router to
   cease forwarding traffic to the egress endpoint.

   Any multicast router that has no active receivers for a particular
   multicast group will prune traffic for that group, sending a prune
   message to its upstream router.  This starts the process of releasing
   the capacity used by the traffic and is a standard multicast routing
   function (e.g., using the PIM-SM routing protocol).  Each egress
   operates autonomously, and the circuit breaker "reaction" is executed
   by the multicast control plane (e.g., PIM), requiring no explicit
   signalling by the circuit breaker along the control path.  Note:
   there is no direct communication with the Ingress, and hence a
   triggered Circuit Breaker only controls traffic downstream of the
   first hop router.  It does not stop traffic flowing from the sender
   to the first hop router; this is, however, the common practice for
   multicast deployment.

   The method could also be used with a multicast tunnel or subnetwork
   (e.g., Section 5.2, Section 5.3), where a meter at the ingress
   generates additional control messages to carry the measurement data
   towards the egress where the egress metering is implemented.
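
   The egress endpoint behaviour described in this section can be
   sketched in Python as follows.  This is only an illustration under
   stated assumptions: the class MulticastEgressBreaker, the
   leave_group callback, the 10% threshold, and the three-interval rule
   are hypothetical names and values chosen for the example.  A missing
   ingress measurement is counted as a congested interval, following
   the requirement that a Circuit Breaker trigger when its control path
   fails.

```python
from typing import Callable, Optional


class MulticastEgressBreaker:
    """Egress-side trigger that reacts by leaving the multicast group."""

    def __init__(self, leave_group: Callable[[], None],
                 loss_threshold: float = 0.10,   # assumed value
                 intervals_to_trip: int = 3):    # assumed value
        # leave_group is a hypothetical hook; a real endpoint would make
        # the host send the IGMP or MLD leave for the group.
        self._leave_group = leave_group
        self._loss_threshold = loss_threshold
        self._intervals_to_trip = intervals_to_trip
        self._bad = 0
        self.tripped = False

    def on_interval(self, ingress_count: Optional[int],
                    egress_count: int) -> bool:
        """Process one measurement interval.

        ingress_count is the sender's packet count carried in the
        traffic, or None if no ingress measurement arrived.
        """
        if self.tripped:
            return True
        if ingress_count is None:
            congested = True        # fail-safe: lost measurement report
        elif ingress_count == 0:
            congested = False       # nothing was sent in this interval
        else:
            lost = max(ingress_count - egress_count, 0)
            congested = lost / ingress_count > self._loss_threshold
        self._bad = self._bad + 1 if congested else 0
        if self._bad >= self._intervals_to_trip:
            self.tripped = True
            self._leave_group()     # reaction: stop receiving the group
        return self.tripped
```

   Because each egress runs this logic on its own, no message to the
   ingress is needed; the leave is the only action the endpoint takes,
   and pruning upstream is left to the multicast routing protocol.
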

4.1.2.  Use with control protocols supporting pre-provisioned capacity

   Some paths are provisioned using a control protocol, e.g., flows
   provisioned using Multi-Protocol Label Switching (MPLS) services,
   paths provisioned using the Resource Reservation Protocol (RSVP),
   networks utilizing Software-Defined Networking (SDN) functions, or
   admission-controlled Differentiated Services.

   Figure 1 shows one expected use case, in which a separate device
   could be used to perform the measurement and trigger functions.  The
   reaction generated by the trigger could take the form of a network
   control message sent to the ingress and/or other network elements,
   causing these elements to react to the Circuit Breaker.  Examples of
   this type of use are provided in Section 5.3.

5.  Examples of Circuit Breakers

   There are multiple types of Circuit Breaker that could be defined for
   use in different deployment cases.  This section provides examples of
   different types of circuit breaker.

5.1.  A Fast-Trip Circuit Breaker

   A fast-trip circuit breaker is the most responsive form of Circuit
   Breaker.  It has a response time that is only slightly larger than
   that of the traffic that it controls.  It is suited to traffic with
   well-understood characteristics (and could include one or more
   trigger functions specifically tailored to the type of traffic for
   which it is designed).  It is not suited to arbitrary network
   traffic, since it could prematurely trigger (e.g., when multiple
   congestion-controlled flows lead to short-term overload).

5.1.1.  A Fast-Trip Circuit Breaker for RTP

   A set of fast-trip Circuit Breaker methods have been specified for
   use together by a Real-time Transport Protocol (RTP) flow using the
   RTP/AVP Profile [RTP-CB].
   It is expected that, in the absence of
   severe congestion, all RTP applications running on best-effort IP
   networks will be able to run without triggering these circuit
   breakers.  A fast-trip RTP Circuit Breaker is therefore implemented
   as a fail-safe.

   The sender monitors reception of RTCP reception report blocks, as
   contained in SR or RR packets, that convey reception quality feedback
   information.  This is used to measure (congestion) loss, possibly in
   combination with ECN [RFC6679].

   The Circuit Breaker action (shutdown of the flow) is triggered when
   any of the following trigger conditions are true:

   1.  An RTP Circuit Breaker triggers on reported lack of progress.

   2.  An RTP Circuit Breaker triggers when no receiver report messages
       are received.

   3.  An RTP Circuit Breaker uses a TFRC-style check and sets a hard
       upper limit to the long-term RTP throughput (over many RTTs).

   4.  An RTP Circuit Breaker includes the notion of Media Usability.
       This circuit breaker is triggered when the quality of the
       transported media falls below some required minimum acceptable
       quality.

5.2.  A Slow-trip Circuit Breaker

   A slow-trip Circuit Breaker could be implemented in an endpoint or
   network device.  This type of Circuit Breaker is much slower at
   responding to congestion than a fast-trip Circuit Breaker and is
   expected to be more common.

   One example where a slow-trip Circuit Breaker is needed is where
   flows or traffic-aggregates use a tunnel or encapsulation and the
   flows within the tunnel do not all support TCP-style congestion
   control (e.g., TCP, SCTP, TFRC); see [RFC5405] section 3.1.3.  A use
   case is where tunnels are deployed in the general Internet (rather
   than "controlled environments" within an ISP or Enterprise),
   especially when the tunnel could need to cross a customer access
   router.

5.3.
      A Managed Circuit Breaker

   A managed Circuit Breaker is implemented in the signalling protocol
   or management plane that relates to the traffic aggregate being
   controlled.  This type of circuit breaker is typically applicable
   when the deployment is within a "controlled environment".

   A Circuit Breaker requires more than the ability to determine that a
   network path is forwarding data, or to measure the rate of a path -
   which are often normal network operational functions.  There is an
   additional need to determine a metric for congestion on the path and
   to trigger a reaction when a threshold is crossed that indicates
   persistent congestion.

5.3.1.  A Managed Circuit Breaker for SAToP Pseudo-Wires

   [RFC4553], SAToP Pseudo-Wires (PWE3), section 8 describes an example
   of a managed circuit breaker for isochronous flows.

   If such flows were to run over a pre-provisioned (e.g., MPLS)
   infrastructure, then it could be expected that the Pseudo-Wire (PW)
   would not experience congestion, because such a flow is not expected
   to increase (or decrease) its rate.  If instead Pseudo-Wire traffic
   is multiplexed with other traffic over the general Internet, it could
   experience congestion.  [RFC4553] states: "If SAToP PWs run over a
   PSN providing best-effort service, they SHOULD monitor packet loss in
   order to detect 'severe congestion'."  The currently recommended
   measurement period is 1 second, and the trigger operates when there
   are more than three measured Severely Errored Seconds (SES) within a
   period.

   [RFC4553] continues: "If such a condition is detected, a SAToP PW
   ought to shut down bidirectionally for some period of time...".  The
   concept was that when the packet loss ratio (congestion) level
   increased above a threshold, the PW was by default disabled.  This
   use case considered fixed-rate transmission, where the PW had no
   reasonable way to shed load.
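
   The counting rule quoted above (one-second measurements, trigger on
   more than three Severely Errored Seconds within a period) can be
   sketched in Python.  The sketch is illustrative only: the class name
   SesBreaker, the loss ratio used here to classify a second as
   severely errored, and the ten-second period are assumptions made for
   the example and are not taken from [RFC4553].  Because the PW is
   fixed-rate, the number of packets expected in each second is assumed
   to be known at the receiver.

```python
from collections import deque


class SesBreaker:
    """Counts Severely Errored Seconds (SES) over a sliding period."""

    def __init__(self, ses_loss_ratio: float = 0.10,  # assumed SES test
                 max_ses: int = 3,                    # "more than three"
                 window_s: int = 10):                 # assumed period
        self.ses_loss_ratio = ses_loss_ratio
        self.max_ses = max_ses
        # One boolean per one-second measurement; old entries fall out.
        self._window = deque(maxlen=window_s)
        self.tripped = False

    def on_second(self, expected: int, received: int) -> bool:
        """Record one second of reception and report whether tripped."""
        if self.tripped:
            return True
        lost = max(expected - received, 0)
        is_ses = expected > 0 and lost / expected > self.ses_loss_ratio
        self._window.append(is_ses)
        if sum(self._window) > self.max_ses:
            self.tripped = True     # caller shuts the PW down
        return self.tripped
```

   Three severely errored seconds within the period are tolerated; the
   fourth trips the breaker.  What happens next (shutting the PW down,
   or notifying an operator) is outside the sketch.
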
628 The trigger needs to be set at the rate at which the PW was likely to 629 experience a serious problem, possibly making the service non- 630 compliant. At this point, triggering the Circuit Breaker would 631 remove the traffic, preventing undue impact on congestion-responsive 632 traffic (e.g., TCP). Part of the rationale was that high loss 633 ratios typically indicated that something was "broken" and ought to 634 have already resulted in operator intervention, and therefore needs to 635 trigger this intervention. 637 An operator-based response provides opportunity for other action to 638 restore the service quality, e.g., by shedding other loads or 639 assigning additional capacity, or to consciously avoid reacting to 640 the trigger while engineering a solution to the problem. This could 641 require the trigger to be sent to a third location (e.g., a network 642 operations centre, NOC) responsible for operation of the tunnel 643 ingress, rather than the tunnel ingress itself. 645 5.3.2. A Managed Circuit Breaker for Pseudowires (PWs) 647 Pseudowires (PWs) [RFC3985] have become a common mechanism for 648 tunneling traffic, and may compete for network resources both with 649 other PWs and with non-PW traffic, such as TCP/IP flows. 651 [ID-ietf-pals-congcons] discusses congestion conditions that can 652 arise when PWs compete with elastic (i.e., congestion responsive) 653 network traffic (e.g., TCP traffic). Elastic PWs carrying IP traffic 654 (see [RFC4488]) do not raise major concerns because all of the 655 traffic involved responds, reducing the transmission rate when 656 network congestion is detected. 658 In contrast, inelastic PWs (e.g., fixed-bandwidth Time 659 Division Multiplex (TDM) [RFC4553] [RFC5086] [RFC5087]) have the 660 potential to harm congestion responsive traffic or to contribute to 661 excessive congestion because inelastic PWs do not adjust their 662 transmission rate in response to congestion.
[ID-ietf-pals-congcons] 663 analyses TDM PWs, with an initial conclusion that a TDM PW operating 664 with a degree of loss that may result in congestion-related problems 665 is also operating with a degree of loss that results in an 666 unacceptable TDM service. For that reason, the draft suggests that a 667 managed circuit breaker that shuts down a PW when it persistently 668 fails to deliver acceptable TDM service is a useful means for 669 addressing these congestion concerns. 671 6. Examples where circuit breakers may not be needed. 673 A Circuit Breaker is not required for a single Congestion Controller- 674 controlled flow using TCP, SCTP, TFRC, etc. In these cases, the 675 Congestion Control methods are already designed to prevent congestion 676 collapse. 678 6.1. CBs over pre-provisioned Capacity 680 One common question is whether a Circuit Breaker is needed when a 681 tunnel is deployed in a private network with pre-provisioned 682 capacity. 684 In this case, compliant traffic that does not exceed the provisioned 685 capacity ought not to result in congestion collapse. A Circuit 686 Breaker will hence only be triggered when there is non-compliant 687 traffic. It could be argued that this event ought never to happen - 688 but it could also be argued that the Circuit Breaker equally ought 689 never to be triggered. If a Circuit Breaker were to be implemented, 690 it would provide an appropriate response if persistent congestion 691 occurs in an operational network. 693 Implementing a Circuit Breaker will not reduce the performance of the 694 flows, but offers protection in the event that persistent congestion 695 occurs. This also could be used to protect from a failure that 696 causes traffic to be routed over a non-pre-provisioned path. 698 6.2.
CBs with tunnels carrying Congestion-Controlled Traffic 700 IP-based traffic is generally assumed to be congestion-controlled, 701 i.e., it is assumed that the transport protocols generating IP-based 702 traffic at the sender already employ mechanisms that are sufficient 703 to address congestion on the path [RFC5405]. A question therefore 704 arises when people deploy a tunnel that is thought to only carry an 705 aggregate of TCP (or some other Congestion Controller-controlled) 706 traffic: Is there an advantage in this case in using a Circuit Breaker? 708 Certainly, traffic in such a tunnel will respond to congestion. 709 However, the answer to the question is not always obvious, because 710 the overall traffic formed by an aggregate of flows that implement a 711 Congestion Controller mechanism does not necessarily prevent 712 congestion collapse. For instance, most Congestion Controller 713 mechanisms require long-lived flows to react to reduce the rate of a 714 flow, so an aggregate of many short flows could result in many 715 terminating before they experience congestion. It is also often 716 impossible for a tunnel service provider to know that the tunnel only 717 contains CC-controlled traffic (e.g., inspecting packet headers might 718 not be possible). The important thing to note is that if the 719 aggregate of the traffic does not result in persistent congestion 720 (impacting other flows), then the Circuit Breaker will not trigger. 721 This is the expected case in this context - so implementing a Circuit 722 Breaker will not reduce performance of the tunnel, but offers 723 protection in the event that persistent congestion occurs. 725 6.3. CBs with Uni-directional Traffic and no Control Path 727 A one-way forwarding path could have no associated control path, and 728 therefore cannot be controlled using an automated process.
This 729 service could be provided using a path that has dedicated capacity 730 and does not share this capacity with other elastic Internet flows 731 (i.e., flows that vary their rate). 733 A way to mitigate the impact on other flows when capacity could be 734 shared is to manage the traffic envelope by using ingress policing. 736 Supporting this type of traffic in the general Internet requires 737 operator monitoring to detect and respond to persistent congestion. 739 7. Security Considerations 741 All Circuit Breaker mechanisms rely upon coordination between the 742 ingress and egress meters and communication with the trigger 743 function. This is usually achieved by passing network control 744 information (or protocol messages) across the network. Timely 745 operation of a circuit breaker depends on the choice of measurement 746 period. If the receiver has an interval that is overly long, then 747 the responsiveness of the circuit breaker decreases. This impacts 748 the ability of the circuit breaker to detect and react to congestion. 750 Mechanisms need to be implemented to prevent attacks on the network 751 control information that would result in Denial of Service (DoS). 752 The source and integrity of control information (measurements and 753 triggers) MUST be protected from off-path attacks. Without 754 protection, it could be trivial for an attacker to inject packets 755 with values that could prematurely trigger a circuit breaker 756 resulting in DoS. Simple protection can be provided by using a 757 randomized source port, or equivalent field in the packet header 758 (such as the RTP SSRC value and the RTP sequence number) expected not 759 to be known to an off-path attacker. Stronger protection can be 760 achieved using a secure authentication protocol. 762 Transmission of network control information consumes network 763 capacity. 
This control traffic needs to be considered in the design 764 of a circuit breaker and could potentially add to network congestion. 765 If this traffic is sent over a shared path, it is RECOMMENDED that 766 this control traffic is prioritized to reduce the probability of loss 767 under congestion. Control traffic also needs to be considered when 768 provisioning a network that uses a circuit breaker. 770 The circuit breaker MUST be designed to be robust to packet loss that 771 can also be experienced during congestion/overload. Loss of control 772 traffic could be a side-effect of a congested network, but also could 773 arise from other causes. 775 Each design of a Circuit Breaker MUST evaluate whether the particular 776 circuit breaker mechanism has new security implications. 778 8. IANA Considerations 780 This document makes no request from IANA. 782 9. Acknowledgments 784 There are many people who have discussed and described the issues 785 that have motivated this draft. Contributions and comments included: 786 Lars Eggert, Colin Perkins, David Black, Matt Mathis and Andrew 787 McGregor. This work was part-funded by the European Community under 788 its Seventh Framework Programme through the Reducing Internet 789 Transport Latency (RITE) project (ICT-317700). 791 10. Revision Notes 793 XXX RFC-Editor: Please remove this section prior to publication XXX 795 Draft 00 797 This was the first revision. Help and comments are greatly 798 appreciated. 800 Draft 01 802 Contained clarifications and changes in response to received 803 comments, plus addition of diagram and definitions. Comments are 804 welcome. 806 WG Draft 00 808 Approved as a WG work item on 28th Aug 2014. 810 WG Draft 01 812 Incorporates feedback after Dallas IETF TSVWG meeting. This version 813 is thought ready for WGLC comments. 815 WG Draft 02 817 Minor fixes for typos. Rewritten security considerations section. 819 WG Draft 03 820 Updates following WGLC comments (see TSV mailing list). 
Comments 821 from C Perkins; D Black and off-list feedback. 823 A clear recommendation of intended scope. 825 Changes include: Improvement of language on timescales and minimum 826 measurement period; clearer articulation of endpoint and multicast 827 examples - with new diagrams; separation of the controlled network 828 case; updated text on position of trigger function; corrections to 829 RTP-CB text; clarification of loss v ECN metrics; checks against 830 submission checklist (use of keywords, added meters to diagrams). 832 WG Draft 04 834 Added section on PW CB for TDM - a newly adopted draft (D. Black). 836 11. References 838 11.1. Normative References 840 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 841 Requirement Levels", BCP 14, RFC 2119, 842 DOI 10.17487/RFC2119, March 1997, 843 . 845 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 846 of Explicit Congestion Notification (ECN) to IP", 847 RFC 3168, DOI 10.17487/RFC3168, September 2001, 848 . 850 [RFC5405] Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines 851 for Application Designers", BCP 145, RFC 5405, 852 DOI 10.17487/RFC5405, November 2008, 853 . 855 11.2. Informative References 857 [ID-ietf-pals-congcons] 858 Stein, YJ., Black, D., and B. Briscoe, "Pseudowire 859 Congestion Considerations (Work-in-Progress)", 2015. 861 [Jacobsen88] 862 Jacobson, V., 863 "Congestion Avoidance and Control", SIGCOMM Symposium 864 proceedings on Communications architectures and 865 protocols, August 1988. 867 [RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5, 868 RFC 1112, DOI 10.17487/RFC1112, August 1989, 869 . 871 [RFC3985] Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation 872 Edge-to-Edge (PWE3) Architecture", RFC 3985, 873 DOI 10.17487/RFC3985, March 2005, 874 .
876 [RFC4488] Levin, O., "Suppression of Session Initiation Protocol 877 (SIP) REFER Method Implicit Subscription", RFC 4488, 878 DOI 10.17487/RFC4488, May 2006, 879 . 881 [RFC4553] Vainshtein, A., Ed. and YJ. Stein, Ed., "Structure- 882 Agnostic Time Division Multiplexing (TDM) over Packet 883 (SAToP)", RFC 4553, DOI 10.17487/RFC4553, June 2006, 884 . 886 [RFC5086] Vainshtein, A., Ed., Sasson, I., Metz, E., Frost, T., and 887 P. Pate, "Structure-Aware Time Division Multiplexed (TDM) 888 Circuit Emulation Service over Packet Switched Network 889 (CESoPSN)", RFC 5086, DOI 10.17487/RFC5086, December 2007, 890 . 892 [RFC5087] Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi, 893 "Time Division Multiplexing over IP (TDMoIP)", RFC 5087, 894 DOI 10.17487/RFC5087, December 2007, 895 . 897 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 898 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 899 . 901 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 902 and K. Carlberg, "Explicit Congestion Notification (ECN) 903 for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 904 2012, . 906 [RTP-CB] Perkins, C. and V. Singh, "Multimedia Congestion Control: 907 Circuit Breakers for Unicast RTP Sessions", February 2014. 909 Author's Address 910 Godred Fairhurst 911 University of Aberdeen 912 School of Engineering 913 Fraser Noble Building 914 Aberdeen, Scotland AB24 3UE 915 UK 917 Email: gorry@erg.abdn.ac.uk 918 URI: http://www.erg.abdn.ac.uk