2 SOC Working Group V. Gurbani, Ed. 3 Internet-Draft V.
Hilt 4 Intended status: Standards Track Bell Laboratories, 5 Expires: June 12, 2014 Alcatel-Lucent 6 H. Schulzrinne 7 Columbia University 8 December 9, 2013 10 Session Initiation Protocol (SIP) Overload Control 11 draft-ietf-soc-overload-control-14 13 Abstract 15 Overload occurs in Session Initiation Protocol (SIP) networks when 16 SIP servers have insufficient resources to handle all SIP messages 17 they receive. Even though the SIP protocol provides a limited 18 overload control mechanism through its 503 (Service Unavailable) 19 response code, SIP servers are still vulnerable to overload. This 20 document defines the behaviour of SIP servers involved in overload 21 control, and in addition, it specifies a loss-based overload scheme 22 for SIP. 24 Status of this Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on June 12, 2014. 41 Copyright Notice 43 Copyright (c) 2013 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. 
Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 59 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 60 3. Overview of operations . . . . . . . . . . . . . . . . . . . . 5 61 4. Via header parameters for overload control . . . . . . . . . . 6 62 4.1. The oc parameter . . . . . . . . . . . . . . . . . . . . . 6 63 4.2. The oc-algo parameter . . . . . . . . . . . . . . . . . . 7 64 4.3. The oc-validity parameter . . . . . . . . . . . . . . . . 8 65 4.4. The oc-seq parameter . . . . . . . . . . . . . . . . . . . 8 66 5. General behaviour . . . . . . . . . . . . . . . . . . . . . . 9 67 5.1. Determining support for overload control . . . . . . . . . 9 68 5.2. Creating and updating the overload control parameters . . 10 69 5.3. Determining the 'oc' Parameter Value . . . . . . . . . . . 12 70 5.4. Processing the Overload Control Parameters . . . . . . . . 12 71 5.5. Using the Overload Control Parameter Values . . . . . . . 13 72 5.6. Forwarding the overload control parameters . . . . . . . . 13 73 5.7. Terminating overload control . . . . . . . . . . . . . . . 14 74 5.8. Stabilizing overload algorithm selection . . . . . . . . . 14 75 5.9. Self-Limiting . . . . . . . . . . . . . . . . . . . . . . 15 76 5.10. Responding to an Overload Indication . . . . . . . . . . . 15 77 5.10.1. Message prioritization at the hop before the 78 overloaded server . . . . . . . . . . . . . . . . . . 16 79 5.10.2. Rejecting requests at an overloaded server . . . . . 16 80 5.11. 100-Trying provisional response and overload control 81 parameters . . . . . . . . . . . . . . . . . . . . . . . . 17 82 6. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 83 7. 
The loss-based overload control scheme . . . . . . . . . . . . 18 84 7.1. Special parameter values for loss-based overload 85 control . . . . . . . . . . . . . . . . . . . . . . . . . 19 86 7.2. Default algorithm for loss-based overload control . . . . 19 87 8. Relationship with other IETF SIP load control efforts . . . . 23 88 9. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 89 10. Design Considerations . . . . . . . . . . . . . . . . . . . . 23 90 10.1. SIP Mechanism . . . . . . . . . . . . . . . . . . . . . . 24 91 10.1.1. SIP Response Header . . . . . . . . . . . . . . . . . 24 92 10.1.2. SIP Event Package . . . . . . . . . . . . . . . . . . 24 93 10.2. Backwards Compatibility . . . . . . . . . . . . . . . . . 25 94 11. Security Considerations . . . . . . . . . . . . . . . . . . . 26 95 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 96 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 97 13.1. Normative References . . . . . . . . . . . . . . . . . . . 28 98 13.2. Informative References . . . . . . . . . . . . . . . . . . 29 99 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 29 100 Appendix B. RFC5390 requirements . . . . . . . . . . . . . . . . 30 101 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 36 103 1. Introduction 105 As with any network element, a Session Initiation Protocol (SIP) 106 [RFC3261] server can suffer from overload when the number of SIP 107 messages it receives exceeds the number of messages it can process. 108 Overload can pose a serious problem for a network of SIP servers. 109 During periods of overload, the throughput of a network of SIP 110 servers can be significantly degraded. In fact, overload may lead to 111 a situation in which the throughput drops down to a small fraction of 112 the original processing capacity. This is often called congestion 113 collapse. 
115 Overload is said to occur if a SIP server does not have sufficient 116 resources to process all incoming SIP messages. These resources may 117 include CPU processing capacity, memory, network bandwidth, input/ 118 output, or disk resources. 120 For overload control, we only consider failure cases where SIP 121 servers are unable to process all SIP requests due to resource 122 constraints. There are other cases where a SIP server can 123 successfully process incoming requests but has to reject them due to 124 failure conditions unrelated to the SIP server being overloaded. For 125 example, a PSTN gateway that runs out of trunks but still has plenty 126 of capacity to process SIP messages should reject incoming INVITEs 127 using a 488 (Not Acceptable Here) response [RFC4412]. Similarly, a 128 SIP registrar that has lost connectivity to its registration database 129 but is still capable of processing SIP requests should reject 130 REGISTER requests with a 500 (Server Error) response [RFC3261]. 131 Overload control does not apply to these cases and SIP provides 132 appropriate response codes for them. 134 The SIP protocol provides a limited mechanism for overload control 135 through its 503 (Service Unavailable) response code. However, this 136 mechanism cannot prevent overload of a SIP server and it cannot 137 prevent congestion collapse. In fact, the use of the 503 (Service 138 Unavailable) response code may cause traffic to oscillate and to 139 shift between SIP servers and thereby worsen an overload condition. 140 A detailed discussion of the SIP overload problem, the problems with 141 the 503 (Service Unavailable) response code and the requirements for 142 a SIP overload control mechanism can be found in [RFC5390]. 
144 This document defines the protocol for communicating overload 145 information between SIP servers and clients, so that clients can 146 reduce the volume of traffic sent to overloaded servers, avoiding 147 congestion collapse and increasing useful throughput. Section 4 148 describes the Via header parameters used for this communication. The 149 general behaviour of SIP servers and clients involved in overload 150 control is described in Section 5. In addition, Section 7 specifies 151 a loss-based overload control scheme. SIP clients and servers 152 conformant to this specification MUST implement the loss-based 153 overload control scheme. They MAY implement other overload control 154 schemes as well. 156 2. Terminology 158 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 159 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 160 document are to be interpreted as described in RFC 2119 [RFC2119]. 162 In this document, the terms "SIP client" and "SIP server" are used in 163 their generic forms. Thus, a "SIP client" could refer to the client 164 transaction state machine in a SIP proxy or it could refer to a user 165 agent client. Similarly, a "SIP server" could be a user agent server 166 or the server transaction state machine in a proxy. Various 167 permutations of this are also possible, for instance, SIP clients and 168 servers could also be part of back-to-back user agents (B2BUAs). 170 However, irrespective of the context (i.e., proxy, B2BUA, UAS, UAC) 171 these terms are used in, "SIP client" applies to any SIP entity that 172 provides overload control to traffic destined downstream. Similarly, 173 "SIP server" applies to any SIP entity that is experiencing overload 174 and would like its upstream neighbour to throttle incoming traffic. 176 Unless otherwise specified, all SIP entities described in this 177 document are assumed to support this specification. 
179 The normative statements in this specification as they apply to SIP 180 clients and SIP servers assume that both the SIP clients and SIP 181 servers support this specification. If, for instance, only a SIP 182 client supports this specification and not the SIP server, then 183 it follows that the normative statements in this specification pertinent 184 to the behavior of a SIP server do not apply to the server that does 185 not support this specification. 187 3. Overview of operations 189 We now give an overview of how the overload control mechanism 190 operates by introducing the overload control parameters. Section 4 191 provides more details and normative behavior on the parameters listed 192 below. 194 Because overload control is performed hop-by-hop, the Via parameter 195 is attractive since it allows two adjacent SIP entities to indicate 196 support for, and exchange information associated with, overload 197 control [RFC6357]. Additional advantages of this choice are 198 discussed in Section 10.1.1. An alternative mechanism using SIP 199 event packages was also considered, and the characteristics of that 200 choice are further outlined in Section 10.1.2. 202 This document defines four new parameters for the SIP Via header for 203 overload control. These parameters provide a mechanism for conveying 204 overload control information between adjacent SIP entities. The "oc" 205 parameter is used by a SIP server to indicate a reduction in the 206 number of requests arriving at the server. The "oc-algo" parameter 207 contains a token or a list of tokens corresponding to the class of 208 overload control algorithms supported by the client. The server 209 chooses one algorithm from this list. The "oc-validity" parameter 210 establishes a time limit for which overload control is in effect, and 211 the "oc-seq" parameter aids in sequencing the responses at the 212 client. These parameters are discussed in detail in the next 213 section. 215 4. 
Via header parameters for overload control 217 The four Via header parameters are introduced below. Further context 218 about how to interpret these under various conditions is provided in 219 Section 5. 221 4.1. The oc parameter 223 This parameter is inserted by the SIP client and updated by the SIP 224 server. 226 A SIP client MUST add an "oc" parameter to the topmost Via header it 227 inserts into every SIP request. This provides an indication to 228 downstream neighbors that the client supports overload control. 229 There MUST NOT be a value associated with the parameter (the value 230 will be added by the server). 232 The downstream server MUST add a value to the "oc" parameter in the 233 response going upstream to a client that included the "oc" parameter 234 in the request. Inclusion of a value to the parameter represents two 235 things: one, upon the first contact (see Section 5.1), addition of a 236 value by the server to this parameter indicates (to the client) that 237 the downstream server supports overload control as defined in this 238 document. Second, if overload control is active, then it indicates 239 the level of control to be applied. 241 When a SIP client receives a response with the value in the "oc" 242 parameter filled in, it MUST reduce, as indicated by the "oc" and 243 "oc-algo" parameters, the number of requests going downstream to the 244 SIP server from which it received the response (see Section 5.10 for 245 pertinent discussion on traffic reduction). 247 4.2. The oc-algo parameter 249 This parameter is inserted by the SIP client and updated by the SIP 250 server. 252 A SIP client MUST add an "oc-algo" parameter to the topmost Via 253 header it inserts into every SIP request, with a default value of 254 "loss". 256 This parameter contains names of one or more classes of overload 257 control algorithms. 
A SIP client MUST support the loss-based 258 overload control scheme and MUST insert at least the token "loss" as 259 one of the "oc-algo" parameter values. In addition, the SIP client 260 MAY insert other tokens, separated by a comma, in the "oc-algo" 261 parameter if it supports other overload control schemes such as a 262 rate-based scheme ([I-D.ietf-soc-overload-rate-control]). Each 263 element in the comma-separated list corresponds to the class of 264 overload control algorithms supported by the SIP client. When more 265 than one class of overload control algorithms is present in the "oc- 266 algo" parameter, the client may indicate algorithm preference by 267 ordering the list in a decreasing order of preference. However, the 268 client must not assume that the server will pick the most preferred 269 algorithm. 271 When a downstream SIP server receives a request with multiple 272 overload control algorithms specified in the "oc-algo" parameter 273 (optionally sorted by decreasing order of preference), it MUST choose 274 one algorithm from the list and return the single selected algorithm 275 in the response to the upstream SIP client. 277 Once the SIP server has chosen, and communicated to the client, a 278 mutually agreeable class of overload control algorithm, the selection 279 stays in effect until such time that the algorithm is changed by the 280 server. Furthermore, the client MUST continue to include all the 281 supported algorithms in subsequent requests; the server MUST respond 282 with the agreed to algorithm until such time that the algorithm is 283 changed by the server. The selection SHOULD stay the same for a non- 284 trivial duration of time to allow the overload control algorithm to 285 stabilize its behaviour (see Section 5.8). 
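The selection rule above can be sketched in a few lines of Python. This is an illustration only, not part of the specification: the function name choose_algorithm and its arguments are hypothetical, and preferring the client's first mutually supported token is just one possible server policy, since the server is free to pick any algorithm from the list.

```python
def choose_algorithm(offered, supported, current=None):
    """Pick one overload control algorithm class for a client.

    offered   -- tokens from the client's "oc-algo" list, most preferred first
    supported -- set of classes this (hypothetical) server implements
    current   -- class already agreed with this client, if any
    """
    # Keep the existing agreement while the client still offers it.
    if current is not None and current in offered:
        return current
    # Otherwise take the client's most preferred class that we support
    # (an assumed policy; the server may choose differently).
    for token in offered:
        if token in supported:
            return token
    # A conforming client always offers "loss", so reaching this point
    # means the request did not follow the specification.
    raise ValueError("no mutually supported overload control algorithm")


offered = ["rate", "loss"]
print(choose_algorithm(offered, {"loss"}))                  # loss
print(choose_algorithm(offered, {"loss", "rate"}))          # rate
print(choose_algorithm(offered, {"loss", "rate"}, "loss"))  # loss
```

Passing the previously agreed class as current keeps the selection stable across requests, which matches the expectation that the choice stays in effect until the server deliberately changes it.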
287 The "oc-algo" parameter does not define the exact algorithm to be 288 used for traffic reduction, rather, the intent is to use any 289 algorithm from a specific class of algorithms that affect traffic 290 reduction similarly. For example, the reference algorithm in 291 Section 7.2 can be used as a loss-based algorithm, or it can be 292 substituted by any other loss-based algorithm that results in 293 equivalent traffic reduction. 295 4.3. The oc-validity parameter 297 This parameter MAY be inserted by the SIP server in a response; it 298 MUST NOT be inserted by the SIP client in a request. 300 This parameter contains a value that indicates an interval of time 301 (measured in milliseconds) that the load reduction specified in the 302 value of the "oc" parameter should be in effect. The default value 303 of the "oc-validity" parameter is 500 (millisecond). If the client 304 receives a response with the "oc" and "oc-algo" parameters suitably 305 filled in, but no "oc-validity" parameter, the SIP client should 306 behave as if it had received "oc-validity=500". 308 A value of 0 in the "oc-validity" parameter is reserved to denote the 309 event that the server wishes to stop overload control, or to indicate 310 that it supports overload control, but is not currently requesting 311 any reduction in traffic (see Section 5.7). 313 A non-zero value for the "oc-validity" parameter MUST only be present 314 in conjunction with an "oc" parameter. A SIP client MUST discard a 315 non-zero value of the "oc-validity" parameter if the client receives 316 it in a response without the corresponding "oc" parameter being 317 present as well. 319 After the value specified in the "oc-validity" parameter expires and 320 until the SIP client receives an updated set of overload control 321 parameters from the SIP server, the client MUST behave as if overload 322 control is not in effect between it and the downstream SIP server. 324 4.4. 
The oc-seq parameter 326 This parameter MUST be inserted by the SIP server in a response; it 327 MUST NOT be inserted by the SIP client in a request. 329 This parameter contains an unsigned integer value that indicates the 330 sequence number associated with the "oc" parameter. This sequence 331 number is used to differentiate two "oc" parameter values generated 332 by an overload control algorithm at two different instants in time. 333 "oc" parameter values generated by an overload control algorithm at 334 time t and t+1 MUST have increasing values in the "oc-seq" 335 parameter. This allows the upstream SIP client to properly collate 336 out-of-order responses. 338 A timestamp can be used as a value of the "oc-seq" parameter. 340 If the value contained in the "oc-seq" parameter overflows during the 341 period in which the load reduction is in effect, then the "oc-seq" 342 parameter MUST be reset to the current timestamp or an appropriate 343 base value. 345 A client implementation can recognize that an overflow has 346 occurred when it receives an "oc-seq" parameter whose value is 347 significantly less than several previous values. (Note that an 348 "oc-seq" parameter whose value does not deviate significantly from 349 the last several values is symptomatic of a tardy packet. 350 However, overflow will cause an "oc-seq" parameter value 351 to be significantly less than the last several values.) If an 352 overflow is detected, then the client should use the overload 353 parameters in the new message, even though the sequence number is 354 lower. The client should also reset any internal state to reflect 355 the overflow so that future messages (following the overflow) will 356 be accepted. 358 5. General behaviour 360 When forwarding a SIP request, a SIP client uses the SIP procedures 361 of [RFC3263] to determine the next hop SIP server. 
The procedures of 362 [RFC3263] take as input a SIP URI, extract the domain portion of that 363 URI for use as a lookup key, and query the Domain Name Service (DNS) 364 to obtain an ordered set of one or more IP addresses with a port 365 number and transport corresponding to each IP address in this set 366 (the "Expected Output"). 368 After selecting a specific SIP server from the Expected Output, a SIP 369 client MUST determine whether overload controls are currently active 370 with that server. If overload controls are currently active (and oc- 371 validity period has not yet expired), the client applies the relevant 372 algorithm to determine whether or not to send the SIP request to the 373 server. If overload controls are not currently active with this 374 server (which will be the case if this is the initial contact with 375 the server, or the last response from this server had "oc- 376 validity=0", or the time period indicated by the "oc-validity" 377 parameter has expired), the SIP client sends the SIP message to the 378 server without invoking any overload control algorithm. 380 5.1. Determining support for overload control 382 If a client determines that this is the first contact with a server, 383 the client MUST insert the "oc" parameter without any value, and MUST 384 insert the "oc-algo" parameter with a list of algorithms it supports. 386 This list MUST include "loss" and MAY include other algorithm names 387 approved by IANA and described in corresponding documents. The 388 client transmits the request to the chosen server. 390 If a server receives a SIP request containing the "oc" and "oc-algo" 391 parameters, the server MUST determine if it has already selected the 392 overload control algorithm class with this client. If it has, the 393 server SHOULD use the previously selected algorithm class in its 394 response to the message. 
If the server determines that the message 395 is from a new client, or a client the server has not heard from in a 396 long time, the server MUST choose one algorithm from the list of 397 algorithms in the "oc-algo" parameter. It MUST put the chosen 398 algorithm as the sole parameter value in the "oc-algo" parameter of 399 the response it sends to the client. In addition, if the server is 400 currently not in an overload condition, it MUST set the value of the 401 "oc" parameter to be 0 and MAY insert an "oc-validity=0" parameter in 402 the response to further qualify the value in the "oc" parameter. If 403 the server is currently overloaded, it MUST follow the procedures of 404 Section 5.2. 406 A client that supports the rate-based overload control scheme 407 [I-D.ietf-soc-overload-rate-control] will consider "oc=0" as an 408 indication not to send any requests downstream at all. Thus, when 409 the server inserts "oc-validity=0" as well, it is indicating that 410 it does support overload control, but it is not under overload 411 mode right now (see Section 5.7). 413 5.2. Creating and updating the overload control parameters 415 A SIP server provides overload control feedback to its upstream 416 clients by providing a value for the "oc" parameter to the topmost 417 Via header field of a SIP response, that is, the Via header added by 418 the client before it sent the request to the server. 420 Since the topmost Via header of a response will be removed by an 421 upstream client after processing it, overload control feedback 422 contained in the "oc" parameter will not travel beyond the upstream 423 SIP client. A Via header parameter therefore provides hop-by-hop 424 semantics for overload control feedback (see [RFC6357]) even if the 425 next hop neighbor does not support this specification. 
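As a rough illustration of the feedback a server places in the topmost Via header of a response, the Python sketch below assembles the parameter values for the two cases described above (not overloaded, and overloaded). The function feedback_params and its dictionary return value are hypothetical; the sketch does not attempt to render the actual Via header syntax.

```python
def feedback_params(algo, seq, reduction=None, validity_ms=None):
    """Return the overload parameters a server adds to the topmost Via.

    algo        -- the single algorithm class selected for this client
    seq         -- the "oc-seq" value for this response
    reduction   -- the "oc" value, or None if the server is not overloaded
    validity_ms -- the "oc-validity" value in milliseconds, if any
    """
    if reduction is None:
        # Not overloaded: report "oc=0" and, optionally, "oc-validity=0"
        # to say that overload control is supported but not active.
        return {"oc": 0, "oc-algo": algo, "oc-validity": 0, "oc-seq": seq}
    params = {"oc": reduction, "oc-algo": algo, "oc-seq": seq}
    if validity_ms is not None:
        # If omitted, the client behaves as if it received 500.
        params["oc-validity"] = validity_ms
    return params


print(feedback_params("loss", 7))
print(feedback_params("loss", 8, reduction=20, validity_ms=1000))
```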
427 The "oc" parameter can be used in all response types, including 428 provisional, success and failure responses (see Section 5.11 429 for special consideration on transporting overload control parameters 430 in a 100-Trying response). A SIP server MAY update the "oc" 431 parameter in a response, asking the client to increase or decrease the 432 number of requests destined to the server, or to stop performing 433 overload control altogether. 435 A SIP server that has updated the "oc" parameter SHOULD also add an 436 "oc-validity" parameter. The "oc-validity" parameter defines the 437 time in milliseconds during which the overload control feedback 438 specified in the "oc" parameter is valid. The default value of the 439 "oc-validity" parameter is 500 (milliseconds). 441 When a SIP server retransmits a response, it SHOULD use the "oc" 442 parameter value and "oc-validity" parameter value consistent with the 443 overload state at the time the retransmitted response is sent. This 444 implies that the values in the "oc" and "oc-validity" parameters may 445 be different from the ones used in previous retransmissions of the 446 response. Because responses sent over UDP may be 447 subject to delays in the network and arrive out of order, the "oc- 448 seq" parameter aids in detecting a stale "oc" parameter value. 450 Implementations that are capable of updating the "oc" and "oc- 451 validity" parameter values during retransmissions MUST insert the 452 "oc-seq" parameter. The values of this parameter MUST be 453 drawn from an increasing sequence. 
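One way a server might generate such an increasing sequence is sketched below in Python. The class OcSeq, its limit argument (an assumed maximum value, not taken from this document) and the injectable clock are all hypothetical; the sketch seeds the counter from the current time and, on overflow, restarts from the current timestamp.

```python
import time


class OcSeq:
    """Source of increasing "oc-seq" values, seeded from the clock."""

    def __init__(self, limit=2**32, clock=time.time):
        self._limit = limit    # assumed field width, chosen for illustration
        self._clock = clock
        self._last = int(clock()) % limit

    def next(self):
        """Return the next sequence number, restarting on overflow."""
        value = self._last + 1
        if value >= self._limit:
            # Overflow: reset to a base value derived from the timestamp.
            value = int(self._clock()) % self._limit
        self._last = value
        return value


# A tiny limit and a fixed clock make the overflow visible.
seq = OcSeq(limit=8, clock=lambda: 5)
print([seq.next() for _ in range(3)])   # [6, 7, 5]
```

After the reset, the value is lower than its predecessors, which is the situation a client has to recognize as an overflow rather than as a late response.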
455 Implementations that are not capable of updating the "oc" and "oc- 456 validity" parameter values during retransmissions --- or 457 implementations that do not want to do so because they will have to 458 regenerate the message to be retransmitted --- MUST still insert a 459 "oc-seq" parameter in the first response associated with a 460 transaction; however, they do not have to update the value in 461 subsequent retransmissions. 463 The "oc-validity" and "oc-seq" Via header parameters are only defined 464 in SIP responses and MUST NOT be used in SIP requests. These 465 parameters are only useful to the upstream neighbor of a SIP server 466 (i.e., the entity that is sending requests to the SIP server) since 467 the client is the entity that can offload traffic by redirecting or 468 rejecting new requests. If requests are forwarded in both directions 469 between two SIP servers (i.e., the roles of upstream/downstream 470 neighbors change), there are also responses flowing in both 471 directions. Thus, both SIP servers can exchange overload 472 information. 474 Since overload control protects a SIP server from overload, it is 475 RECOMMENDED that a SIP server uses the mechanisms described in this 476 specification. However, if a SIP server wanted to limit its overload 477 control capability for privacy reasons, it MAY decide to perform 478 overload control only for requests that are received on a secure 479 transport channel, such as TLS. This enables a SIP server to protect 480 overload control information and ensure that it is only visible to 481 trusted parties. 483 5.3. Determining the 'oc' Parameter Value 485 The value of the "oc" parameter is determined by the overloaded 486 server using any pertinent information at its disposal. 
The only 487 constraint imposed by this document is that the server control 488 algorithm MUST produce a value for the "oc" parameter that it expects 489 the receiving SIP clients to apply to all downstream SIP requests 490 (dialogue forming as well as in-dialogue) to this SIP server. Beyond 491 this stipulation, the process by which an overloaded server 492 determines the value of the "oc" parameter is considered out of scope 493 for this document. 495 Note that this stipulation is required so that both the client and 496 server have a common view of which messages the overload control 497 applies to. With this stipulation in place, the client can 498 prioritize messages as discussed in Section 5.10.1. 500 As an example, a value of "oc=10" when the loss-based algorithm is 501 used implies that 10% of the total number of SIP requests (dialogue 502 forming as well as in-dialogue) are subject to reduction at the 503 client. Analogously, a value of "oc=10" when the rate-based 504 algorithm [I-D.ietf-soc-overload-rate-control] is used indicates that 505 the client should send SIP requests at a rate of 10 SIP requests or 506 fewer per second. 508 5.4. Processing the Overload Control Parameters 510 A SIP client SHOULD remove "oc", "oc-validity" and "oc-seq" 511 parameters from all Via headers of a response received, except for 512 the topmost Via header. This prevents overload control parameters 513 that were accidentally or maliciously inserted into Via headers by a 514 downstream SIP server from traveling upstream. 516 The scope of overload control applies to unique combinations of IP 517 address and port values. A SIP client maintains the overload control values 518 received (along with the address and port number of the SIP servers 519 from which they were received) for the duration specified in the "oc- 520 validity" parameter or the default duration. 
Each time a SIP client 521 receives a response with overload control parameter from a downstream 522 SIP server, it compares the "oc-seq" value extracted from the Via 523 header with the "oc-seq" value stored for this server. If these 524 values match, the response does not update the overload control 525 parameters related to this server and the client continues to provide 526 overload control as previously negotiated. If the "oc-seq" value 527 extracted from the Via header is larger than the stored value, the 528 client updates the stored values by copying the new values of "oc", 529 "oc-algo" and "oc-seq" parameters from the Via header to the stored 530 values. Upon such an update of the overload control parameters, the 531 client restarts the validity period of the new overload control 532 parameters. The overload control parameters now remain in effect 533 until the validity period expires or the parameters are updated in a 534 new response. Stored overload control parameters MUST be reset to 535 default values once the validity period has expired (see Section 5.7 536 for the detailed steps on terminating overload control). 538 5.5. Using the Overload Control Parameter Values 540 A SIP client MUST honor overload control values it receives from 541 downstream neighbors. The SIP client MUST NOT forward more requests 542 to a SIP server than allowed by the current "oc" and "oc-algo" 543 parameter values from that particular downstream server. 545 When forwarding a SIP request, a SIP client uses the SIP procedures 546 of [RFC3263] to determine the next hop SIP server. The procedures of 547 [RFC3263] take as input a SIP URI, extract the domain portion of that 548 URI for use as a lookup key, and query the Domain Name Service (DNS) 549 to obtain an ordered set of one or more IP addresses with a port 550 number and transport corresponding to each IP address in this set 551 (the "Expected Output"). 
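The per-server bookkeeping described in Section 5.4 (compare the received "oc-seq" with the stored one, copy the new values if it is larger, and restart the validity period) can be sketched as follows in Python. The class ServerState and its fields are hypothetical names; a client would keep one instance per IP address and port. Overflow detection for "oc-seq" is deliberately left out of this sketch, so a lower sequence number is simply treated as stale.

```python
import time

DEFAULT_VALIDITY_MS = 500  # default "oc-validity" when the server sends none


class ServerState:
    """Overload parameters a client keeps for one (IP address, port) pair."""

    def __init__(self):
        self.oc = None
        self.algo = None
        self.seq = None
        self.expires_at = 0.0   # time (seconds) at which the feedback lapses

    def update(self, oc, algo, seq, validity_ms=None, now=None):
        """Apply feedback from a response; return True if state changed."""
        now = time.monotonic() if now is None else now
        if self.seq is not None and seq <= self.seq:
            # Same or older sequence number: keep the stored values.
            return False
        if validity_ms is None:
            validity_ms = DEFAULT_VALIDITY_MS
        self.oc, self.algo, self.seq = oc, algo, seq
        # Restart the validity period for the new parameters.
        self.expires_at = now + validity_ms / 1000.0
        return True


state = ServerState()
print(state.update(20, "loss", 10, now=0.0))   # True
print(state.update(50, "loss", 10, now=0.1))   # False
print(state.oc)                                # 20
```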
553 After selecting a specific SIP server from the Expected Output, the 554 SIP client MUST determine if it already has overload control 555 parameter values for the server chosen from the Expected Output. If 556 the SIP client has a non-expired "oc" parameter value for the server 557 chosen from the Expected Output, then this chosen server is operating 558 in overload control mode. Thus, the SIP client MUST determine if it 559 can or cannot forward the current request to the SIP server based on 560 the "oc" and "oc-algo" parameters and any relevant local policy. 562 The particular algorithm used to determine whether or not to forward 563 a particular SIP request is a matter of local policy, and may take 564 into account a variety of prioritization factors. However, this 565 local policy SHOULD transmit the same number of SIP requests as the 566 sample algorithm defined by the overload control scheme being used. 567 (See Section 7.2 for the default loss-based overload control 568 algorithm.) 570 5.6. Forwarding the overload control parameters 572 Overload control is defined in a hop-by-hop manner. Therefore, 573 forwarding the contents of the overload control parameters is 574 generally NOT RECOMMENDED and should only be performed if permitted 575 by the configuration of SIP servers. This means that a SIP proxy 576 SHOULD strip the overload control parameters inserted by the client 577 before proxying the request further downstream. 579 5.7. Terminating overload control 581 A SIP client removes overload control if one of the following events 582 occur: 584 1. The "oc-validity" period previously received by the client from 585 this server (or the default value of 500ms if the server did not 586 previously specify an "oc-validity" parameter) expires; 587 2. The client is explicitly told by the server to stop performing 588 overload control using the "oc-validity=0" parameter. 
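The two termination events listed above reduce to a small check on the client side. The Python sketch below is one possible reading, with a hypothetical function name and arguments; times are in seconds and are assumed to come from the same clock.

```python
DEFAULT_VALIDITY_MS = 500  # used when the server sent no "oc-validity"


def overload_active(received_at, now, validity_ms=None):
    """Return True while the client must keep applying overload control.

    received_at -- time the current overload parameters were received
    now         -- current time, from the same clock as received_at
    validity_ms -- last "oc-validity" from the server, or None if absent
    """
    if validity_ms is None:
        validity_ms = DEFAULT_VALIDITY_MS
    if validity_ms == 0:
        return False   # event 2: the server explicitly stopped control
    # Event 1: control ends once the validity period has expired.
    return now - received_at < validity_ms / 1000.0


print(overload_active(0.0, 0.4))        # True
print(overload_active(0.0, 0.6))        # False
print(overload_active(0.0, 0.1, 0))     # False
print(overload_active(0.0, 1.5, 2000))  # True
```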
590 A SIP server can decide to terminate overload control by explicitly 591 signaling the client. To do so, the SIP server MUST set the value of 592 the "oc-validity" parameter to 0. The SIP server MUST increment the 593 value of "oc-seq", and SHOULD set the value of the "oc" parameter to 594 0. 596 Note that the loss-based overload control scheme (Section 7) can 597 effectively stop overload control by setting the value of the "oc" 598 parameter to 0. However, the rate-based scheme 599 ([I-D.ietf-soc-overload-rate-control]) needs an additional piece 600 of information in the form of "oc-validity=0". 602 When the client receives a response with a higher "oc-seq" number 603 than the one it most recently processed, it checks the "oc-validity" 604 parameter. If the value of the "oc-validity" parameter is 0, the 605 client MUST stop performing overload control of messages destined to 606 the server and the traffic should flow without any reduction. 607 Furthermore, when the value of the "oc-validity" parameter is 0, the 608 client SHOULD disregard the value in the "oc" parameter. 610 5.8. Stabilizing overload algorithm selection 612 Realities of deployments of SIP necessitate that the overload control 613 algorithm may be changed upon a system reboot or a software upgrade. 614 However, frequent changes of the overload control algorithm must be 615 avoided. Frequent changes of the overload control algorithm will not 616 benefit the client or the server as such flapping does not allow the 617 chosen algorithm to stabilize. An algorithm change, when desired, is 618 simply accomplished by the SIP server choosing a new algorithm from 619 the list in the client's "oc-algo" parameter and sending it back to 620 the client in a response. 622 The client associates a specific algorithm with each server it sends 623 traffic to and when the server changes the algorithm, the client must 624 change its behaviour accordingly. 
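As a non-normative illustration of the algorithm choice described above, the Python sketch below parses the quoted, comma-separated "oc-algo" list offered by the client and returns the first entry of a server-side preference list that the client also offered. The function names, the preference list, and the exact-match comparison of tokens are assumptions of this example and are not defined by this specification.

```python
from typing import List, Optional, Sequence


def parse_oc_algo(value: str) -> List[str]:
    """Split an oc-algo value such as '"loss,A"' into its tokens."""
    inner = value.strip().strip('"')
    return [token.strip() for token in inner.split(",") if token.strip()]


def select_algorithm(client_oc_algo: str,
                     server_preference: Sequence[str]) -> Optional[str]:
    """Pick the server's most preferred algorithm that the client offered.

    Returns None when there is no algorithm in common.
    """
    offered = parse_oc_algo(client_oc_algo)
    for algo in server_preference:
        if algo in offered:
            return algo
    return None
```

The server would place the selected token, by itself, in the "oc-algo" parameter of the response it returns to the client.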
626 Once the server selects a specific overload control algorithm for a 627 given client, the server SHOULD NOT change the algorithm 628 associated with that client for at least 3600 seconds (1 hour). This 629 period may involve one or more cycles of overload control being in 630 effect and then being stopped depending on the traffic and resources 631 at the server. 633 One way to accomplish this involves the server saving the time of 634 the last algorithm change in a lookup table, indexed by the 635 client's network identifiers. The server only changes the "oc- 636 algo" parameter when the time since the last change has surpassed 637 3600 seconds. 639 5.9. Self-Limiting 641 In some cases, a SIP client may not receive a response from a server 642 after sending a request. RFC3261 [RFC3261] defines that when a 643 timeout error is received from the transaction layer, it MUST be 644 treated as if a 408 (Request Timeout) status code has been received. 645 If a fatal transport error is reported by the transport layer, it 646 MUST be treated as a 503 (Service Unavailable) status code. 648 In the event of repeated timeouts or fatal transport errors, the SIP 649 client MUST stop sending requests to this server. The SIP client 650 SHOULD periodically probe if the downstream server is alive using any 651 mechanism at its disposal. Clients should be conservative in their 652 probing (e.g., using an exponential back-off) so that their liveness 653 probes do not exacerbate an overload situation. Once a SIP client 654 has successfully received a normal response for a request sent to the 655 downstream server, the SIP client can resume sending SIP requests. 656 It should, of course, honor any overload control parameters it may 657 receive in the initial, or later, responses. 659 5.10. Responding to an Overload Indication 661 A SIP client can receive overload control feedback indicating that it 662 needs to reduce the traffic it sends to its downstream server.
The 663 client can accomplish this task by sending some of the requests that 664 would have gone to the overloaded element to a different destination. 665 It needs to ensure, however, that this destination is not in overload 666 and is capable of processing the extra load. A client can also buffer 667 requests in the hope that the overload condition will resolve quickly 668 and the requests can still be forwarded in time. In many cases, 669 however, it will need to reject these requests with a "503 (Service 670 Unavailable)" response without the Retry-After header. 672 5.10.1. Message prioritization at the hop before the overloaded server 674 During an overload condition, a SIP client needs to prioritize 675 requests and select those requests that need to be rejected or 676 redirected. This selection is largely a matter of local policy. It 677 is expected that a SIP client will follow local policy as long as the 678 resulting reduction of traffic is consistent with the overload 679 algorithm in effect at that node. Accordingly, the normative 680 behaviour in the next three paragraphs should be interpreted with the 681 understanding that the SIP client will aim to preserve local policy 682 to the fullest extent possible. 684 A SIP client SHOULD honor the local policy for prioritizing SIP 685 requests such as policies based on message type, e.g., INVITEs versus 686 requests associated with existing sessions. 688 A SIP client SHOULD honor the local policy for prioritizing SIP 689 requests based on the content of the Resource-Priority header (RPH, 690 RFC4412 [RFC4412]). Specific (namespace.value) RPH contents may 691 indicate high-priority requests that should be preserved as much as 692 possible during overload. The RPH contents can also indicate a low- 693 priority request that is eligible to be dropped during times of 694 overload.
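One possible shape of such a local policy, covering only the message-type and RPH criteria of the two paragraphs above, is sketched below in Python. It is purely illustrative: the function name is invented for this example, and the set of protected (namespace.value) entries is a hypothetical operator choice, not a recommendation of this specification. The sketch assumes that RPH values are compared case-insensitively and that a request carrying several values is protected if any one of them is in the protected set.

```python
from typing import FrozenSet, Optional

# Hypothetical local policy: RPH values the operator wants preserved.
PROTECTED_RPH: FrozenSet[str] = frozenset({"ets.0", "wps.0"})


def should_preserve(in_dialog: bool, resource_priority: Optional[str]) -> bool:
    """Return True if local policy says the request should not be dropped first."""
    # Message-type policy: favour requests tied to an existing session.
    if in_dialog:
        return True
    # No Resource-Priority header: the request is a candidate for reduction.
    if resource_priority is None:
        return False
    # The header may carry several comma-separated namespace.value entries.
    values = {v.strip().lower() for v in resource_priority.split(",")}
    return not values.isdisjoint(PROTECTED_RPH)
```

Requests for which the function returns False would be the first candidates for rejection or redirection while overload control is in effect.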
696 A SIP client SHOULD honor the local policy for prioritizing SIP 697 requests relating to emergency calls as identified by the SOS URN 698 [RFC5031] indicating an emergency request. 700 A local policy can be expected to combine both the SIP request type 701 and the prioritization markings, and SHOULD be honored when overload 702 conditions prevail. 704 5.10.2. Rejecting requests at an overloaded server 706 If the upstream SIP client to the overloaded server does not support 707 overload control, it will continue to direct requests to the 708 overloaded server. Thus, for the non-participating client, the 709 overloaded server must bear the cost of rejecting some requests from 710 the client as well as the cost of processing the non-rejected 711 requests to completion. It would be fair to devote the same amount 712 of processing at the overloaded server to the combination of 713 rejection and processing from a non-participating client as the 714 overloaded server would devote to processing requests from a 715 participating client. This is to ensure that SIP clients that do not 716 support this specification don't receive an unfair advantage over 717 those that do. 719 A SIP server that is under overload and has started to throttle 720 incoming traffic MUST reject some requests from non-participating 721 clients with a "503 (Service Unavailable)" response without the 722 Retry-After header. 724 5.11. 100-Trying provisional response and overload control parameters 726 The overload control information sent from a SIP server to a client 727 is transported in the responses. While implementations can insert 728 overload control information in any response, special attention 729 should be accorded to overload control information transported in a 730 100-Trying response. 732 Traditionally, the 100-Trying response has been used in SIP to quench 733 retransmissions. 
In some implementations, the 100-Trying message may 734 not be generated by the transaction user (TU) nor consumed by the TU. 735 In these implementations, the 100-Trying response is generated at the 736 transaction layer and sent to the upstream SIP client. At the 737 receiving SIP client, the 100-Trying is consumed at the transaction 738 layer by inhibiting the retransmission of the corresponding request. 739 Consequently, implementations that insert overload control 740 information in the 100-Trying cannot assume that the upstream SIP 741 client passed the overload control information in the 100-Trying to 742 their corresponding TU. For this reason, implementations that insert 743 overload control information in the 100-Trying MUST re-insert the 744 same (or updated) overload control information in the first non-100 745 response being sent to the upstream SIP client. 747 6. Example 749 Consider a SIP client, P1, which is sending requests to another 750 downstream SIP server, P2. The following snippets of SIP messages 751 demonstrate how the overload control parameters work. 753 INVITE sips:user@example.com SIP/2.0 754 Via: SIP/2.0/TLS p1.example.net; 755 branch=z9hG4bK2d4790.1;oc;oc-algo="loss,A" 756 ... 758 SIP/2.0 100 Trying 759 Via: SIP/2.0/TLS p1.example.net; 760 branch=z9hG4bK2d4790.1;received=192.0.2.111; 761 oc=0;oc-algo="loss";oc-validity=0 762 ... 764 In the messages above, the first line is sent by P1 to P2. This line 765 is a SIP request; because P1 supports overload control, it inserts 766 the "oc" parameter in the topmost Via header that it created. P1 767 supports two overload control algorithms: loss and some algorithm 768 called "A". 770 The second line --- a SIP response --- shows the topmost Via header 771 amended by P2 according to this specification and sent to P1. 772 Because P2 also supports overload control, and because it chooses the 773 "loss" based scheme, it sends "loss" back to P1 in the "oc-algo" 774 parameter. 
It also sets the value of "oc" and "oc-validity" 775 parameters to 0 because it is not currently requesting overload 776 control activation. 778 Had P2 not supported overload control, it would have left the "oc" 779 and "oc-algo" parameters unchanged, thus allowing the client to know 780 that it did not support overload control. 782 At some later time, P2 starts to experience overload. It sends the 783 following SIP message indicating that P1 should decrease the messages 784 arriving to P2 by 20% for 0.5s. 786 SIP/2.0 180 Ringing 787 Via: SIP/2.0/TLS p1.example.net; 788 branch=z9hG4bK2d4790.3;received=192.0.2.111; 789 oc=20;oc-algo="loss";oc-validity=500; 790 oc-seq=1282321615.782 791 ... 793 After some time, the overload condition at P2 subsides. It then 794 changes the parameter values in the response it sends to P1 to allow 795 P1 to send all messages destined to P2. 797 SIP/2.0 183 Queued 798 Via: SIP/2.0/TLS p1.example.net; 799 branch=z9hG4bK2d4790.4;received=192.0.2.111; 800 oc=0;oc-algo="loss";oc-validity=0;oc-seq=1282321892.439 801 ... 803 7. The loss-based overload control scheme 805 Under a loss-based approach, a SIP server asks an upstream neighbor 806 to reduce the number of requests it would normally forward to this 807 server by a certain percentage. For example, a SIP server can ask an 808 upstream neighbor to reduce the number of requests this neighbor 809 would normally send by 10%. The upstream neighbor then redirects or 810 rejects 10% of the traffic originally destined for that server. 812 This section specifies the semantics of the overload control 813 parameters associated with the loss-based overload control scheme. 815 The general behaviour of SIP clients and servers is specified in 816 Section 5 and is applicable to SIP clients and servers that implement 817 loss-based overload control. 819 7.1. Special parameter values for loss-based overload control 821 The loss-based overload control scheme is identified using the token 822 "loss". 
This token MUST appear in the "oc-algo" parameter list sent 823 by the SIP client. 825 A SIP server that has selected the loss-based algorithm, upon 826 entering the overload state, will assign a value to the "oc" 827 parameter. This value MUST be in the range of [0, 100], inclusive. 828 This value MUST be interpreted by the client as a percentage, and the 829 SIP client MUST reduce the number of requests being forwarded to the 830 overloaded server by that percentage. The SIP client may use any 831 algorithm that reduces the traffic it sends to the overloaded server 832 by the amount indicated. Such an algorithm SHOULD honor the message 833 prioritization discussion of Section 5.10.1. While a particular 834 algorithm is not subject to standardization, for completeness a 835 default algorithm for loss-based overload control is provided in 836 Section 7.2. 838 7.2. Default algorithm for loss-based overload control 840 This section describes a default algorithm that a SIP client can use 841 to throttle SIP traffic going downstream by the percentage loss value 842 specified in the "oc" parameter. 844 The client maintains two categories of requests; the first category 845 will include requests that are candidates for reduction, and the 846 second category will include requests that are not subject to 847 reduction except when all messages in the first category have been 848 rejected, and further reduction is still needed. 849 Section 5.10.1 contains directives on identifying messages for 850 inclusion in the second category. The remaining messages are 851 allocated to the first category. 853 Under an overload condition, the client converts the value of the "oc" 854 parameter to a value that it applies to requests in the first 855 category.
As a simple example, if "oc=10" and 40% of the requests 856 should be included in the first category, then: 858 10 / 40 * 100 = 25 860 That is, dropping 25% of the requests in the first category yields 861 an overall reduction of 10%. The client uses random discard to 862 achieve the 25% reduction of messages in the first category. 864 Messages in the second category proceed downstream unscathed. To 865 effect the 25% reduction rate from the first category, the client 866 draws a random number between 1 and 100 for the request picked from 867 the first category. If the random number is less than or equal to 868 the converted value of the "oc" parameter, the request is not forwarded; 869 otherwise the request is forwarded. 871 A reference algorithm is shown below. 873 cat1 := 80.0 // Category 1 --- subject to reduction 874 cat2 := 100.0 - cat1 // Category 2 --- Under normal operations 875 // only subject to reduction after category 1 is exhausted. 876 // Note that the above ratio is simply a reasonable default. 877 // The actual values will change through periodic sampling 878 // as the traffic mix changes over time. 880 while (true) { 881 // We're modeling message processing as a single work queue 882 // that contains both incoming and outgoing messages.
883 sip_msg := get_next_message_from_work_queue() 885 update_mix(cat1, cat2) // See Note below 887 switch (sip_msg.type) { 889 case outbound request: 890 destination := get_next_hop(sip_msg) 891 oc_context := get_oc_context(destination) 893 if (oc_context == null) { 894 send_to_network(sip_msg) // Process it normally by sending the 895 // request to the next hop since this particular destination 896 // is not subject to overload 897 } 898 else { 899 // Determine if server wants to enter in overload or is in 900 // overload 901 in_oc := extract_in_oc(oc_context) 903 oc_value := extract_oc(oc_context) 904 oc_validity := extract_oc_validity(oc_context) 906 if (in_oc == false or oc_validity is not in effect) { 907 send_to_network(sip_msg) // Process it normally by sending 908 // the request to the next hop since this particular 909 // destination is not subject to overload. Optionally, 910 // clear the oc context for this server (not shown). 911 } 912 else { // Begin perform overload control 913 r := random() 914 drop_msg := false 916 category := assign_msg_to_category(sip_msg) 918 pct_to_reduce_cat1 := oc_value / cat1 * 100 920 if (oc_value <= cat1) { // Reduce only msgs from category 1 921 if (r <= pct_to_reduce_cat1 && category == cat1) { 922 drop_msg := true 923 } 924 } 925 else { // oc_value > category 1. Reduce 100% of msgs from 926 // category 1 and remaining from category 2. 927 pct_to_reduce_cat2 := (oc_value - cat1) / cat2 * 100 928 if (category == cat1) { 929 drop_msg := true 930 } 931 else { 932 if (r <= pct_to_reduce_cat2) { 933 drop_msg := true 934 } 935 } 936 } 938 if (drop_msg == false) { 939 send_to_network(sip_msg) // Process it normally by 940 // sending the request to the next hop 941 } 942 else { 943 // Do not send request downstream, handle locally by 944 // generating response (if a proxy) or treating as 945 // an error (if a user agent).
946 } 948 } // End perform overload control 949 } 951 end case // outbound request 953 case outbound response: 954 if (we are in overload) { 955 add_overload_parameters(sip_msg) 956 } 957 send_to_network(sip_msg) 959 end case // outbound response 960 case inbound response: 962 if (sip_msg has oc parameter values) { 963 create_or_update_oc_context() // For the specific server 964 // that sent the response, create or update the oc context; 965 // i.e., extract the values of the oc-related parameters 966 // and store them for later use. 967 } 968 process_msg(sip_msg) 970 end case // inbound response 971 case inbound request: 973 if (we are not in overload) { 974 process_msg(sip_msg) 975 } 976 else { // We are in overload 977 if (sip_msg has oc parameters) { // Upstream client supports 978 process_msg(sip_msg) // oc; only sends important requests 979 } 980 else { // Upstream client does not support oc 981 if (local_policy(sip_msg) says process message) { 982 process_msg(sip_msg) 983 } 984 else { 985 send_response(sip_msg, 503) 986 } 987 } 988 } 989 end case // inbound request 990 } 991 } 993 Note: A simple way to sample the traffic mix for category 1 and 994 category 2 is to associate a counter with each category of message. 995 Periodically (every 5-10s) get the value of the counters and calculate 996 the ratio of category 1 messages to category 2 messages since the 997 last calculation. 999 Example: In the last 5 seconds, a total of 500 requests arrived 1000 at the queue. 450 out of the 500 were messages subject 1001 to reduction and 50 out of 500 were classified as requests not 1002 subject to reduction. Based on this ratio, cat1 := 90 and 1003 cat2 := 10, so a 90/10 mix will be used in overload calculations. 1005 8. 
Relationship with other IETF SIP load control efforts 1007 The overload control mechanism described in this document is reactive 1008 in nature and apart from message prioritization directives listed in 1009 Section 5.10.1 the mechanisms described in this draft will not 1010 discriminate requests based on user identity, filtering action and 1011 arrival time. SIP networks that require pro-active overload control 1012 mechanisms can upload user-level load control filters as described in 1013 [I-D.ietf-soc-load-control-event-package]. Local policy will also 1014 dictate the precedence of different overload control mechanisms 1015 applied to the traffic. Specifically, in a scenario where load 1016 control filters are installed by signaling neighbours [I-D.ietf-soc- 1017 load-control-event-package] and the same traffic can also be 1018 throttled using the overload control mechanism, local policy will 1019 dictate which of these schemes shall be given precedence. 1020 Interactions between the two schemes are out of scope for this 1021 document. 1023 9. Syntax 1025 This specification extends the existing definition of the Via header 1026 field parameters of [RFC3261] as follows: 1028 via-params = via-ttl / via-maddr 1029 / via-received / via-branch 1030 / oc / oc-validity 1031 / oc-seq / oc-algo / via-extension 1033 oc = "oc" [EQUAL oc-num] 1034 oc-num = 1*DIGIT 1035 oc-validity = "oc-validity" [EQUAL delta-ms] 1036 oc-seq = "oc-seq" EQUAL 1*12DIGIT "." 1*5DIGIT 1037 oc-algo = "oc-algo" EQUAL DQUOTE algo-list *(COMMA algo-list) 1038 DQUOTE 1039 algo-list = "loss" / *(other-algo) 1040 other-algo = %x41-5A / %x61-7A / %x30-39 1041 delta-ms = 1*DIGIT 1043 10. Design Considerations 1045 This section discusses specific design considerations for the 1046 mechanism described in this document. General design considerations 1047 for SIP overload control can be found in [RFC6357]. 1049 10.1. 
SIP Mechanism 1051 A SIP mechanism is needed to convey overload feedback from the 1052 receiving to the sending SIP entity. A number of different 1053 alternatives exist to implement such a mechanism. 1055 10.1.1. SIP Response Header 1057 Overload control information can be transmitted using a new Via 1058 header field parameter for overload control. A SIP server can add 1059 this header parameter to the responses it is sending upstream to 1060 provide overload control feedback to its upstream neighbors. This 1061 approach has the following characteristics: 1063 o A Via header parameter is light-weight and creates very little 1064 overhead. It does not require the transmission of additional 1065 messages for overload control and does not increase traffic or 1066 processing burdens in an overload situation. 1067 o Overload control status can frequently be reported to upstream 1068 neighbors since it is a part of a SIP response. This enables the 1069 use of this mechanism in scenarios where the overload status needs 1070 to be adjusted frequently. It also enables the use of overload 1071 control mechanisms that use regular feedback such as window-based 1072 overload control. 1073 o With a Via header parameter, overload control status is inherent 1074 in SIP signaling and is automatically conveyed to all relevant 1075 upstream neighbors, i.e., neighbors that are currently 1076 contributing traffic. There is no need for a SIP server to 1077 specifically track and manage the set of current upstream or 1078 downstream neighbors with which it should exchange overload 1079 feedback. 1080 o Overload status is not conveyed to inactive senders. This avoids 1081 the transmission of overload feedback to inactive senders, which 1082 do not contribute traffic. If an inactive sender starts to 1083 transmit while the receiver is in overload it will receive 1084 overload feedback in the first response and can adjust the amount 1085 of traffic forwarded accordingly. 
1086 o A SIP server can limit the distribution of overload control 1087 information by only inserting it into responses to known upstream 1088 neighbors. A SIP server can use transport level authentication 1089 (e.g., via TLS) with its upstream neighbors. 1091 10.1.2. SIP Event Package 1093 Overload control information can also be conveyed from a receiver to 1094 a sender using a new event package. Such an event package enables a 1095 sending entity to subscribe to the overload status of its downstream 1096 neighbors and receive notifications of overload control status 1097 changes in NOTIFY requests. This approach has the following 1098 characteristics: 1100 o Overload control information is conveyed decoupled from SIP 1101 signaling. It enables an overload control manager, which is a 1102 separate entity, to monitor the load on other servers and provide 1103 overload control feedback to all SIP servers that have set up 1104 subscriptions with the controller. 1105 o With an event package, a receiver can send updates to senders that 1106 are currently inactive. Inactive senders will receive a 1107 notification about the overload and can refrain from sending 1108 traffic to this neighbor until the overload condition is resolved. 1109 The receiver can also notify all potential senders once they are 1110 permitted to send traffic again. However, these notifications do 1111 generate additional traffic, which adds to the overall load. 1112 o A SIP entity needs to set up and maintain overload control 1113 subscriptions with all upstream and downstream neighbors. A new 1114 subscription needs to be set up before/while a request is 1115 transmitted to a new downstream neighbor. Servers can be 1116 configured to subscribe at boot time. However, this would require 1117 additional protection to avoid the avalanche restart problem for 1118 overload control. 
Subscriptions need to be terminated when they 1119 are not needed any more, which can be done, for example, using a 1120 timeout mechanism. 1121 o A receiver needs to send NOTIFY messages to all subscribed 1122 upstream neighbors in a timely manner when the control algorithm 1123 requires a change in the control variable (e.g., when a SIP server 1124 is in an overload condition). This includes active as well as 1125 inactive neighbors. These NOTIFYs add to the amount of traffic 1126 that needs to be processed. To ensure that these requests will 1127 not be dropped due to overload, a priority mechanism needs to be 1128 implemented in all servers these requests will pass through. 1129 o As overload feedback is sent to all senders in separate messages, 1130 this mechanism is not suitable when frequent overload control 1131 feedback is needed. 1132 o A SIP server can limit the set of senders that can receive 1133 overload control information by authenticating subscriptions to 1134 this event package. 1135 o This approach requires each proxy to implement user agent 1136 functionality (UAS and UAC) to manage the subscriptions. 1138 10.2. Backwards Compatibility 1140 A new overload control mechanism needs to be backwards compatible so 1141 that it can be gradually introduced into a network and function 1142 properly even if only a fraction of the servers support it. 1144 Hop-by-hop overload control (see [RFC6357]) has the advantage that it 1145 does not require that all SIP entities in a network support it. It 1146 can be used effectively between two adjacent SIP servers if both 1147 servers support overload control and does not depend on the support 1148 from any other server or user agent. The more SIP servers in a 1149 network support hop-by-hop overload control, the better protected the 1150 network is against occurrences of overload. 1152 A SIP server may have multiple upstream neighbors of which only 1153 some may support overload control.
If a server simply used this 1154 overload control mechanism, only those that support it would reduce 1155 traffic. Others would keep sending at the full rate and benefit from 1156 the throttling by the servers that support overload control. In 1157 other words, upstream neighbors that do not support overload control 1158 would be better off than those that do. 1160 A SIP server should therefore follow the behaviour outlined in 1161 Section 5.10.2 to handle clients that do not support overload 1162 control. 1164 11. Security Considerations 1166 Overload control mechanisms can be used by an attacker to conduct a 1167 denial-of-service attack on a SIP entity if the attacker can pretend 1168 that the SIP entity is overloaded. When such a forged overload 1169 indication is received by an upstream SIP client, it will stop 1170 sending traffic to the victim. Thus, the victim is subject to a 1171 denial-of-service attack. 1173 To better understand the threat model, consider the following 1174 diagram: 1176 Pa ------- ------ Pb 1177 \ / 1178 : ------ +-------- P1 ------+------ : 1179 / L1 L2 \ 1180 : ------- ------ : 1182 -----> Downstream (requests) 1183 <----- Upstream (responses) 1185 Here, requests travel downstream from the left-hand side, through 1186 Proxy P1, towards the right-hand side, and responses travel upstream 1187 from the right-hand side, through P1, towards the left-hand side. 1188 Proxies Pa, Pb and P1 support overload control. L1 and L2 are labels 1189 for the links connecting P1 to the upstream clients and downstream 1190 servers. 1192 If an attacker is able to modify traffic between Pa and P1 on link 1193 L1, it can cause a denial-of-service attack on P1 by having Pa not send 1194 any traffic to P1. Such an attack can proceed by the attacker 1195 modifying the response from P1 to Pa such that Pa's Via header is 1196 changed to indicate that all requests destined towards P1 should be 1197 dropped.
Conversely, the attacker can simply remove any "oc", "oc- 1198 validity" and "oc-seq" markings added by P1 in a response to Pa. In 1199 such a case, the attacker will force P1 into overload control by 1200 denying request quenching at Pa even though Pa is capable of 1201 performing overload control. 1203 Similarly, if an attacker is able to modify traffic between P1 and Pb 1204 on link L2, it can change the Via header associated with P1 in a 1205 response from Pb to P1 such that all subsequent requests destined 1206 towards Pb from P1 are dropped. In essence, the attacker mounts a 1207 denial-of-service attack on Pb by indicating false overload control. 1208 Note that it is immaterial whether Pb supports overload control or 1209 not; the attack will succeed as long as the attacker is able to 1210 control L2. Conversely, an attacker can suppress a genuine overload 1211 condition at Pb by simply removing any "oc", "oc-validity" and "oc-seq" 1212 markings added by Pb in a response to P1. In such a case, the 1213 attacker will force P1 into sending requests to Pb even under 1214 overload conditions because P1 would not be aware that Pb 1215 supports overload control. 1217 Attacks that indicate false overload control can be mitigated by 1218 using TCP or WebSocket [RFC6455], or better yet, TLS in conjunction 1219 with applying BCP 38 [RFC2827]. Attacks that are mounted to suppress 1220 genuine overload conditions can be avoided by using TLS on the 1221 connection. 1223 Another way to conduct an attack is to send a message containing a 1224 high overload feedback value through a proxy that does not support 1225 this extension. If this feedback is added to the second Via header 1226 (or all Via headers), it will reach the next upstream proxy.
If the 1227 attacker can make the recipient believe that the overload status was 1228 created by its direct downstream neighbor (and not by the attacker 1229 further downstream) the recipient stops sending traffic to the 1230 victim. A precondition for this attack is that the victim proxy does 1231 not support this extension since it would not pass through overload 1232 control feedback otherwise. 1234 A malicious SIP entity could gain an advantage by pretending to 1235 support this specification but never reducing the amount of traffic 1236 it forwards to the downstream neighbor. If its downstream neighbor 1237 receives traffic from multiple sources which correctly implement 1238 overload control, the malicious SIP entity would benefit since all 1239 other sources to its downstream neighbor would reduce load. 1241 The solution to this problem depends on the overload control 1242 method. For rate-based and window-based overload control, it is 1243 very easy for a downstream entity to monitor if the upstream 1244 neighbor throttles traffic forwarded as directed. For percentage 1245 throttling this is not always obvious since the load forwarded 1246 depends on the load received by the upstream neighbor. 1248 To prevent such attacks, servers should monitor client behavior to 1249 determine whether they are complying with overload control policies. 1250 If a client is not conforming to such policies, then the server 1251 should treat it as a non-supporting client (see Section 5.10.2). 1253 12. IANA Considerations 1255 This specification defines four new Via header parameters as detailed 1256 below in the "Header Field Parameter and Parameter Values" sub- 1257 registry as per the registry created by [RFC3968]. 
The required 1258 information is: 1260 Header Field Parameter Name Predefined Values Reference 1261 __________________________________________________________ 1262 Via oc Yes RFCXXXX 1263 Via oc-validity Yes RFCXXXX 1264 Via oc-seq Yes RFCXXXX 1265 Via oc-algo Yes RFCXXXX 1267 RFC XXXX [NOTE TO RFC-EDITOR: Please replace with final RFC 1268 number of this specification.] 1270 13. References 1272 13.1. Normative References 1274 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1275 Requirement Levels", BCP 14, RFC 2119, March 1997. 1277 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1278 A., Peterson, J., Sparks, R., Handley, M., and E. 1279 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1280 June 2002. 1282 [RFC3263] Rosenberg, J. and H. Schulzrinne, "Session Initiation 1283 Protocol (SIP): Locating SIP Servers", RFC 3263, 1284 June 2002. 1286 [RFC3968] Camarillo, G., "The Internet Assigned Number Authority 1287 (IANA) Header Field Parameter Registry for the Session 1288 Initiation Protocol (SIP)", BCP 98, RFC 3968, 1289 December 2004. 1291 [RFC4412] Schulzrinne, H. and J. Polk, "Communications Resource 1292 Priority for the Session Initiation Protocol (SIP)", 1293 RFC 4412, February 2006. 1295 13.2. Informative References 1297 [I-D.ietf-soc-load-control-event-package] 1298 Shen, C., Schulzrinne, H., and A. Koike, "A Session 1299 Initiation Protocol (SIP) Load Control Event Package", 1300 draft-ietf-soc-load-control-event-package-10 (work in 1301 progress), November 2013. 1303 [I-D.ietf-soc-overload-rate-control] 1304 Noel, E. and P. Williams, "Session Initiation Protocol 1305 (SIP) Rate Control", 1306 draft-ietf-soc-overload-rate-control-06 (work in 1307 progress), October 2013. 1309 [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: 1310 Defeating Denial of Service Attacks which employ IP Source 1311 Address Spoofing", BCP 38, RFC 2827, May 2000. 
1313 [RFC5031] Schulzrinne, H., "A Uniform Resource Name (URN) for 1314 Emergency and Other Well-Known Services", RFC 5031, 1315 January 2008. 1317 [RFC5390] Rosenberg, J., "Requirements for Management of Overload in 1318 the Session Initiation Protocol", RFC 5390, December 2008. 1320 [RFC6357] Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design 1321 Considerations for Session Initiation Protocol (SIP) 1322 Overload Control", RFC 6357, August 2011. 1324 [RFC6455] Fette, I. and A. Melnikov, "The WebSocket Protocol", 1325 RFC 6455, December 2011. 1327 Appendix A. Acknowledgements 1329 The authors acknowledge the contributions of Bruno Chatras, Keith 1330 Drage, Janet Gunn, Rich Terpstra, Daryl Malas, Eric Noel, R. 1331 Parthasarathi, Antoine Roly, Jonathan Rosenberg, Charles Shen, Rahul 1332 Srivastava, Padma Valluri, Shaun Bharrat, Paul Kyzivat and Jeroen Van 1333 Bemmel to this document. 1335 Adam Roach and Eric McMurry helped flesh out the different cases for 1336 handling SIP messages described in the algorithm of Section 7.2. 1337 Janet Gunn reviewed the algorithm and suggested changes that led to 1338 simpler processing for the case where "oc_value > cat1". 1340 Richard Barnes provided invaluable comments as part of area director 1341 review of the draft. 1343 Appendix B. RFC5390 requirements 1345 Table 1 provides a summary of how this specification fulfills the 1346 requirements of [RFC5390]. A more detailed view on how each 1347 requirement is fulfilled is provided after the table.
1349 +-------------+-------------------+
1350 | Requirement | Meets requirement |
1351 +-------------+-------------------+
1352 | REQ 1       | Yes               |
1353 | REQ 2       | Yes               |
1354 | REQ 3       | Partially         |
1355 | REQ 4       | Partially         |
1356 | REQ 5       | Partially         |
1357 | REQ 6       | Not applicable    |
1358 | REQ 7       | Yes               |
1359 | REQ 8       | Partially         |
1360 | REQ 9       | Yes               |
1361 | REQ 10      | Yes               |
1362 | REQ 11      | Yes               |
1363 | REQ 12      | Yes               |
1364 | REQ 13      | Yes               |
1365 | REQ 14      | Yes               |
1366 | REQ 15      | Yes               |
1367 | REQ 16      | Yes               |
1368 | REQ 17      | Partially         |
1369 | REQ 18      | Yes               |
1370 | REQ 19      | Yes               |
1371 | REQ 20      | Yes               |
1372 | REQ 21      | Yes               |
1373 | REQ 22      | Yes               |
1374 | REQ 23      | Yes               |
1375 +-------------+-------------------+

1377 Summary of meeting requirements in RFC5390

1379 Table 1

1381 REQ 1: The overload mechanism shall strive to maintain the overall 1382 useful throughput (taking into consideration the quality-of-service 1383 needs of the using applications) of a SIP server at reasonable 1384 levels, even when the incoming load on the network is far in excess 1385 of its capacity. The overall throughput under load is the ultimate 1386 measure of the value of an overload control mechanism. 1388 Meeting REQ 1: Yes, the overload control mechanism allows an 1389 overloaded SIP server to maintain a reasonable level of throughput as 1390 it enters into congestion mode by requesting the upstream clients to 1391 reduce traffic destined downstream. 1393 REQ 2: When a single network element fails, goes into overload, or 1394 suffers from reduced processing capacity, the mechanism should strive 1395 to limit the impact of this on other elements in the network. This 1396 helps to prevent a small-scale failure from becoming a widespread 1397 outage. 1399 Meeting REQ 2: Yes. When a SIP server enters overload mode, it will 1400 request the upstream clients to throttle the traffic destined to it.
1401 As a consequence of this, the overloaded SIP server will itself 1402 generate proportionally less downstream traffic, thereby limiting the 1403 impact on other elements in the network. 1405 REQ 3: The mechanism should seek to minimize the amount of 1406 configuration required in order to work. For example, it is better 1407 to avoid needing to configure a server with its SIP message 1408 throughput, as these kinds of quantities are hard to determine. 1410 Meeting REQ 3: Partially. On the server side, the overload condition 1411 is determined by monitoring S (cf. Section 4 of [RFC6357]) and 1412 reporting a load feedback F as a value to the "oc" parameter. On the 1413 client side, a throttle T is applied to requests going downstream 1414 based on F. This specification does not prescribe any value for S, 1415 nor a particular value for F. The "oc-algo" parameter allows for 1416 automatic convergence to a particular class of overload control 1417 algorithm. There are suggested default values for the "oc-validity" 1418 parameter. 1420 REQ 4: The mechanism must be capable of dealing with elements that do 1421 not support it, so that a network can consist of a mix of elements 1422 that do and don't support it. In other words, the mechanism should 1423 not work only in environments where all elements support it. It is 1424 reasonable to assume that it works better in such environments, of 1425 course. Ideally, there should be incremental improvements in overall 1426 network throughput as increasing numbers of elements in the network 1427 support the mechanism. 1429 Meeting REQ 4: Partially. The mechanism is designed to reduce 1430 congestion when a pair of communicating entities support it. If a 1431 downstream overloaded SIP server does not respond to a request in 1432 time, a SIP client will attempt to reduce traffic destined towards 1433 the non-responsive server as outlined in Section 5.9.
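As an illustration of the client-side throttle T described under "Meeting REQ 3", the following is a minimal sketch and not part of this specification. It assumes the loss-based scheme, with the feedback F carried in "oc" read as the percentage of requests, in the range [0, 100], that the client should not forward. The names should_forward and throttle are hypothetical, and the sketch ignores the message prioritization discussed in Section 5.10.

```python
import random


def should_forward(oc_percent: int, rng: random.Random) -> bool:
    """Decide whether one request passes the throttle T.

    oc_percent is the feedback F, assumed here to be a loss
    percentage in [0, 100] (an assumption of this sketch).
    """
    if not 0 <= oc_percent <= 100:
        raise ValueError("oc must be within [0, 100] for this sketch")
    # Draw uniformly from [0, 100); the request is dropped when the
    # draw falls below F, so F = 0 forwards all and F = 100 drops all.
    return rng.random() * 100 >= oc_percent


def throttle(requests, oc_percent: int, seed: int = 0) -> list:
    """Return the subset of requests that would be sent downstream."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    return [r for r in requests if should_forward(oc_percent, rng)]


if __name__ == "__main__":
    sent = throttle(range(1000), oc_percent=20)
    print(f"forwarded {len(sent)} of 1000 requests")
```

A random draw per request keeps the client stateless with respect to the throttle. A deterministic scheme (for example, dropping every Nth request) would also satisfy the same percentage and is equally compatible with this sketch.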
1435 REQ 5: The mechanism should not assume that it will only be deployed 1436 in environments with completely trusted elements. It should seek to 1437 operate as effectively as possible in environments where other 1438 elements are malicious; this includes preventing malicious elements 1439 from obtaining more than a fair share of service. 1441 Meeting REQ 5: Partially. Since overload control information is 1442 shared between a pair of communicating entities, a confidential and 1443 authenticated channel can be used for this communication. However, 1444 if such a channel is not available, then the security ramifications 1445 outlined in Section 11 apply. 1447 REQ 6: When overload is signaled by means of a specific message, the 1448 message must clearly indicate that it is being sent because of 1449 overload, as opposed to other, non overload-based failure conditions. 1450 This requirement is meant to avoid some of the problems that have 1451 arisen from the reuse of the 503 response code for multiple purposes. 1452 Of course, overload is also signaled by lack of response to requests. 1453 This requirement applies only to explicit overload signals. 1455 Meeting REQ 6: Not applicable. Overload control information is 1456 signaled as part of the Via header and not in a new header. 1458 REQ 7: The mechanism shall provide a way for an element to throttle 1459 the amount of traffic it receives from an upstream element. This 1460 throttling shall be graded so that it is not all-or-nothing as with 1461 the current 503 mechanism. This recognizes the fact that "overload" 1462 is not a binary state and that there are degrees of overload. 1464 Meeting REQ 7: Yes, please see Section 5.5 and Section 5.10. 1466 REQ 8: The mechanism shall ensure that, when a request was not 1467 processed successfully due to overload (or failure) of a downstream 1468 element, the request will not be retried on another element that is 1469 also overloaded or whose status is unknown.
This requirement derives 1470 from REQ 1. 1472 Meeting REQ 8: Partially. A SIP client that has overload information 1473 from multiple downstream servers will not retry the request on 1474 another element. However, if a SIP client does not know the overload 1475 status of a downstream server, it may send the request to that 1476 server. 1478 REQ 9: That a request has been rejected from an overloaded element 1479 shall not unduly restrict the ability of that request to be submitted 1480 to and processed by an element that is not overloaded. This 1481 requirement derives from REQ 1. 1483 Meeting REQ 9: Yes, a SIP client conformant to this specification 1484 will send the request to a different element. 1486 REQ 10: The mechanism should support servers that receive requests 1487 from a large number of different upstream elements, where the set of 1488 upstream elements is not enumerable. 1490 Meeting REQ 10: Yes, there are no constraints on the number of 1491 upstream clients. 1493 REQ 11: The mechanism should support servers that receive requests 1494 from a finite set of upstream elements, where the set of upstream 1495 elements is enumerable. 1497 Meeting REQ 11: Yes, there are no constraints on the number of 1498 upstream clients. 1500 REQ 12: The mechanism should work between servers in different 1501 domains. 1503 Meeting REQ 12: Yes, there are no inherent limitations on using 1504 overload control between domains. 1506 REQ 13: The mechanism must not dictate a specific algorithm for 1507 prioritizing the processing of work within a proxy during times of 1508 overload. It must permit a proxy to prioritize requests based on any 1509 local policy, so that certain ones (such as a call for emergency 1510 services or a call with a specific value of the Resource-Priority 1511 header field [RFC4412]) are given preferential treatment, such as not 1512 being dropped, being given additional retransmission, or being 1513 processed ahead of others. 
1515 Meeting REQ 13: Yes, please see Section 5.10. 1517 REQ 14: The mechanism should provide unambiguous directions 1518 to clients on when they should retry a request and when they should 1519 not. This especially applies to TCP connection establishment and SIP 1520 registrations, in order to mitigate against avalanche restart. 1522 Meeting REQ 14: Yes, Section 5.9 provides normative behavior on when 1523 to retry a request after repeated timeouts and fatal transport errors 1524 resulting from communications with a non-responsive downstream SIP 1525 server. 1527 REQ 15: In cases where a network element fails, is so overloaded that 1528 it cannot process messages, or cannot communicate due to a network 1529 failure or network partition, it will not be able to provide explicit 1530 indications of the nature of the failure or its levels of congestion. 1531 The mechanism must properly function in these cases. 1533 Meeting REQ 15: Yes, Section 5.9 provides normative behavior on when 1534 to retry a request after repeated timeouts and fatal transport errors 1535 resulting from communications with a non-responsive downstream SIP 1536 server. 1538 REQ 16: The mechanism should attempt to minimize the overhead of the 1539 overload control messaging. 1541 Meeting REQ 16: Yes, overload control messages are sent in the 1542 topmost Via header, which is always processed by the SIP elements. 1544 REQ 17: The overload mechanism must not provide an avenue for 1545 malicious attack, including DoS and DDoS attacks. 1547 Meeting REQ 17: Partially. Since overload control information is 1548 shared between a pair of communicating entities, a confidential and 1549 authenticated channel can be used for this communication. However, 1550 if such a channel is not available, then the security ramifications 1551 outlined in Section 11 apply.
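To make the point under "Meeting REQ 16" concrete, the following sketch shows how an element that already processes the topmost Via header might pick out the four parameters registered in Section 12 ("oc", "oc-validity", "oc-seq", and "oc-algo"). It is illustrative only: the function name parse_oc_params, the host name, and the parameter values in the sample header are hypothetical. The parser is deliberately simplified (it splits on ";" and does not handle a ";" inside a quoted string) and is not a full implementation of the Via grammar in [RFC3261].

```python
# The four Via parameters registered in the IANA Considerations section.
OC_PARAMS = ("oc", "oc-validity", "oc-seq", "oc-algo")


def parse_oc_params(via_value: str) -> dict:
    """Extract overload control parameters from one Via header value.

    Simplification (assumption of this sketch): parameters are split
    on ';' and quoted strings containing ';' are not supported.
    """
    params = {}
    # The first segment is "protocol sent-by"; parameters follow it.
    for part in via_value.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        name = name.lower()
        if name in OC_PARAMS:
            # A bare "oc" with no value yields an empty string.
            params[name] = value.strip().strip('"')
    return params


if __name__ == "__main__":
    # Hypothetical header value, for illustration only.
    via = ('SIP/2.0/TCP ss1.example.com:5060;branch=z9hG4bK2d4790.1;'
           'oc=20;oc-algo="loss";oc-validity=500;oc-seq=1282321615.782')
    print(parse_oc_params(via))
```

Because the parameters ride on a header that each SIP element must parse anyway, no additional message or header lookup is needed to obtain the feedback.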
1553 REQ 18: The overload mechanism should be unambiguous about whether a 1554 load indication applies to a specific IP address, host, or URI, so 1555 that an upstream element can determine the load of the entity to 1556 which a request is to be sent. 1558 Meeting REQ 18: Yes, please see discussion in Section 5.5. 1560 REQ 19: The specification for the overload mechanism should give 1561 guidance on which message types might be desirable to process over 1562 others during times of overload, based on SIP-specific 1563 considerations. For example, it may be more beneficial to process a 1564 SUBSCRIBE refresh with Expires of zero than a SUBSCRIBE refresh with 1565 a non-zero expiration (since the former reduces the overall amount of 1566 load on the element), or to process re-INVITEs over new INVITEs. 1568 Meeting REQ 19: Yes, please see Section 5.10. 1570 REQ 20: In a mixed environment of elements that do and do not 1571 implement the overload mechanism, no disproportionate benefit shall 1572 accrue to the users or operators of the elements that do not 1573 implement the mechanism. 1575 Meeting REQ 20: Yes, an element that does not implement overload 1576 control does not receive any measure of extra benefit. 1578 REQ 21: The overload mechanism should ensure that the system remains 1579 stable. When the offered load drops from above the overall capacity 1580 of the network to below the overall capacity, the throughput should 1581 stabilize and become equal to the offered load. 1583 Meeting REQ 21: Yes, the overload control mechanism described in this 1584 draft ensures the stability of the system. 1586 REQ 22: It must be possible to disable the reporting of load 1587 information towards upstream targets based on the identity of those 1588 targets. This allows a domain administrator who considers the load 1589 of their elements to be sensitive information, to restrict access to 1590 that information. 
Of course, in such cases, there is no expectation 1591 that the overload mechanism itself will help prevent overload from 1592 that upstream target. 1594 Meeting REQ 22: Yes, an operator of a SIP server can configure the 1595 SIP server to only report overload control information for requests 1596 received over a confidential channel, for example. However, note 1597 that this requirement is in conflict with REQ 3, as it introduces a 1598 modicum of extra configuration. 1600 REQ 23: It must be possible for the overload mechanism to work in 1601 cases where there is a load balancer in front of a farm of proxies. 1603 Meeting REQ 23: Yes. Depending on the type of load balancer, this 1604 requirement is met. A load balancer fronting a farm of SIP proxies 1605 could be a SIP-aware load balancer or one that is not SIP-aware. If 1606 the load balancer is SIP-aware, it can make conscious decisions on 1607 throttling outgoing traffic towards the individual server in the farm 1608 based on the overload control parameters returned by the server. On 1609 the other hand, if the load balancer is not SIP-aware, then there are 1610 other strategies to perform overload control. Section 6 of [RFC6357] 1611 documents some of these strategies in more detail (see discussion 1612 related to Figure 3(a) in Section 6). 1614 Authors' Addresses 1616 Vijay K. Gurbani (editor) 1617 Bell Laboratories, Alcatel-Lucent 1618 1960 Lucent Lane, Rm 9C-533 1619 Naperville, IL 60563 1620 USA 1622 Email: vkg@bell-labs.com 1624 Volker Hilt 1625 Bell Laboratories, Alcatel-Lucent 1626 791 Holmdel-Keyport Rd 1627 Holmdel, NJ 07733 1628 USA 1630 Email: volkerh@bell-labs.com 1631 Henning Schulzrinne 1632 Columbia University/Department of Computer Science 1633 450 Computer Science Building 1634 New York, NY 10027 1635 USA 1637 Phone: +1 212 939 7004 1638 Email: hgs@cs.columbia.edu 1639 URI: http://www.cs.columbia.edu