idnits 2.17.1 draft-ietf-soc-overload-control-13.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 712 has weird spacing: '...control param...' == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but does not include the phrase in its RFC 2119 key words list. -- The document date (May 23, 2013) is 3983 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 758 -- Looks like a reference, but probably isn't: '100' on line 758 == Outdated reference: A later version (-13) exists of draft-ietf-soc-load-control-event-package-08 == Outdated reference: A later version (-10) exists of draft-ietf-soc-overload-rate-control-04 Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 SOC Working Group V. Gurbani, Ed. 3 Internet-Draft V. 
Hilt 4 Intended status: Standards Track Bell Laboratories, Alcatel-Lucent 5 Expires: November 24, 2013 H. Schulzrinne 6 Columbia University 7 May 23, 2013 9 Session Initiation Protocol (SIP) Overload Control 10 draft-ietf-soc-overload-control-13 12 Abstract 14 Overload occurs in Session Initiation Protocol (SIP) networks when 15 SIP servers have insufficient resources to handle all SIP messages 16 they receive. Even though the SIP protocol provides a limited 17 overload control mechanism through its 503 (Service Unavailable) 18 response code, SIP servers are still vulnerable to overload. This 19 document defines the behaviour of SIP servers involved in overload 20 control, and in addition, it specifies a loss-based overload scheme 21 for SIP. 23 Status of this Memo 25 This Internet-Draft is submitted in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet- 31 Drafts is at http://datatracker.ietf.org/drafts/current/. 33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on November 24, 2013. 40 Copyright Notice 42 Copyright (c) 2013 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. 
Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 58 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 59 3. Overview of operations . . . . . . . . . . . . . . . . . . . . 5 60 4. Via header parameters for overload control . . . . . . . . . . 6 61 4.1. The oc parameter . . . . . . . . . . . . . . . . . . . . . 6 62 4.2. The oc-algo parameter . . . . . . . . . . . . . . . . . . 7 63 4.3. The oc-validity parameter . . . . . . . . . . . . . . . . 8 64 4.4. The oc-seq parameter . . . . . . . . . . . . . . . . . . . 8 65 5. General behaviour . . . . . . . . . . . . . . . . . . . . . . 9 66 5.1. Determining support for overload control . . . . . . . . . 9 67 5.2. Creating and updating the overload control parameters . . 10 68 5.3. Determining the 'oc' Parameter Value . . . . . . . . . . . 11 69 5.4. Processing the Overload Control Parameters . . . . . . . . 12 70 5.5. Using the Overload Control Parameter Values . . . . . . . 13 71 5.6. Forwarding the overload control parameters . . . . . . . . 13 72 5.7. Terminating overload control . . . . . . . . . . . . . . . 13 73 5.8. Stabilizing overload algorithm selection . . . . . . . . . 14 74 5.9. Self-Limiting . . . . . . . . . . . . . . . . . . . . . . 15 75 5.10. Responding to an Overload Indication . . . . . . . . . . . 15 76 5.10.1. Message prioritization at the hop before the 77 overloaded server . . . . . . . . . . . . . . . . . . 15 78 5.10.2. Rejecting requests at an overloaded server . . . . . 16 79 5.11. 100-Trying provisional response and overload control 80 parameters . . . . . . . . . . . . . . . . . . . . . . . . 16 81 6. The loss-based overload control scheme . . . . . . . . . . . . 17 82 6.1. 
Special parameter values for loss-based overload 83 control . . . . . . . . . . . . . . . . . . . . . . . . . 17 84 6.2. Example . . . . . . . . . . . . . . . . . . . . . . . . . 18 85 6.3. Default algorithm for loss-based overload control . . . . 19 86 7. Relationship with other IETF SIP load control efforts . . . . 23 87 8. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 88 9. Design Considerations . . . . . . . . . . . . . . . . . . . . 24 89 9.1. SIP Mechanism . . . . . . . . . . . . . . . . . . . . . . 24 90 9.1.1. SIP Response Header . . . . . . . . . . . . . . . . . 24 91 9.1.2. SIP Event Package . . . . . . . . . . . . . . . . . . 25 92 9.2. Backwards Compatibility . . . . . . . . . . . . . . . . . 26 93 10. Security Considerations . . . . . . . . . . . . . . . . . . . 26 94 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 95 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 96 12.1. Normative References . . . . . . . . . . . . . . . . . . . 27 97 12.2. Informative References . . . . . . . . . . . . . . . . . . 28 98 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 28 99 Appendix B. RFC5390 requirements . . . . . . . . . . . . . . . . 29 100 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 34 102 1. Introduction 104 As with any network element, a Session Initiation Protocol (SIP) 105 [RFC3261] server can suffer from overload when the number of SIP 106 messages it receives exceeds the number of messages it can process. 107 Overload can pose a serious problem for a network of SIP servers. 108 During periods of overload, the throughput of a network of SIP 109 servers can be significantly degraded. In fact, overload may lead to 110 a situation in which the throughput drops down to a small fraction of 111 the original processing capacity. This is often called congestion 112 collapse. 
114 Overload is said to occur if a SIP server does not have sufficient 115 resources to process all incoming SIP messages. These resources may 116 include CPU processing capacity, memory, network bandwidth, input/ 117 output, or disk resources. 119 For overload control, we only consider failure cases where SIP 120 servers are unable to process all SIP requests due to resource 121 constraints. There are other cases where a SIP server can 122 successfully process incoming requests but has to reject them due to 123 failure conditions unrelated to the SIP server being overloaded. For 124 example, a PSTN gateway that runs out of trunks but still has plenty 125 of capacity to process SIP messages should reject incoming INVITEs 126 using a 488 (Not Acceptable Here) response [RFC4412]. Similarly, a 127 SIP registrar that has lost connectivity to its registration database 128 but is still capable of processing SIP requests should reject 129 REGISTER requests with a 500 (Server Error) response [RFC3261]. 130 Overload control does not apply to these cases and SIP provides 131 appropriate response codes for them. 133 The SIP protocol provides a limited mechanism for overload control 134 through its 503 (Service Unavailable) response code. However, this 135 mechanism cannot prevent overload of a SIP server and it cannot 136 prevent congestion collapse. In fact, the use of the 503 (Service 137 Unavailable) response code may cause traffic to oscillate and to 138 shift between SIP servers and thereby worsen an overload condition. 139 A detailed discussion of the SIP overload problem, the problems with 140 the 503 (Service Unavailable) response code and the requirements for 141 a SIP overload control mechanism can be found in [RFC5390]. 
143 This document defines the protocol for communicating overload 144 information between SIP servers and clients, so that clients can 145 reduce the volume of traffic sent to overloaded servers, avoiding 146 congestion collapse and increasing useful throughput. Section 4 147 describes the Via header parameters used for this communication. The 148 general behaviour of SIP servers and clients involved in overload 149 control is described in Section 5. In addition, Section 6 specifies 150 a loss-based overload control scheme. SIP clients and servers 151 conformant to this specification MUST implement the loss-based 152 overload control scheme. They MAY implement other overload control 153 schemes as well. 155 2. Terminology 157 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 158 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 159 document are to be interpreted as described in RFC 2119 [RFC2119]. 161 In this document, the terms "SIP client" and "SIP server" are used in 162 their generic forms. Thus, a "SIP client" could refer to the client 163 transaction state machine in a SIP proxy or it could refer to a user 164 agent client. Similarly, a "SIP server" could be a user agent server 165 or the server transaction state machine in a proxy. Various 166 permutations of this are also possible, for instance, SIP clients and 167 servers could also be part of back-to-back user agents (B2BUAs). 169 However, irrespective of the context (i.e., proxy, B2BUA, UAS, UAC) 170 these terms are used in, "SIP client" applies to any SIP entity that 171 provides overload control to traffic destined downstream. Similarly, 172 "SIP server" applies to any SIP entity that is experiencing overload 173 and would like its upstream neighbour to throttle incoming traffic. 175 Unless otherwise specified, all SIP entities described in this 176 document are assumed to support this specification. 
178 The normative statements in this specification as they apply to SIP 179 clients and SIP servers assume that both the SIP clients and SIP 180 servers support this specification. If, for instance, only a SIP 181 client supports this specification and not the SIP server, then 182 it follows that the normative statements in this specification pertinent 183 to the behavior of a SIP server do not apply to the server that does 184 not support this specification. 186 3. Overview of operations 188 We now give an overview of how the overload control mechanism 189 operates by introducing the overload control parameters. Section 4 190 provides more details and normative behavior on the parameters listed 191 below. 193 Because overload control is performed hop-by-hop, the Via parameter 194 is attractive since it allows two adjacent SIP entities to indicate 195 support for, and exchange information associated with, overload 196 control [RFC6357]. Additional advantages of this choice are 197 discussed in Section 9.1.1. An alternative mechanism using SIP event 198 packages was also considered, and the characteristics of that choice 199 are further outlined in Section 9.1.2. 201 This document defines four new parameters for the SIP Via header for 202 overload control. These parameters provide a mechanism for conveying 203 overload control information between adjacent SIP entities. The "oc" 204 parameter is used by a SIP server to indicate a reduction in the 205 number of requests arriving at the server. The "oc-algo" parameter 206 contains a token or a list of tokens corresponding to the class of 207 overload control algorithms supported by the client. The server 208 chooses one algorithm from this list. The "oc-validity" parameter 209 establishes a time limit for which overload control is in effect, and 210 the "oc-seq" parameter aids in sequencing the responses at the 211 client. These parameters are discussed in detail in the next 212 section. 214 4. 
Via header parameters for overload control 216 The four Via header parameters are introduced below. Further context 217 about how to interpret these under various conditions is provided in 218 Section 5. 220 4.1. The oc parameter 222 This parameter is inserted by the SIP client and updated by the SIP 223 server. 225 A SIP client MUST add an "oc" parameter to the topmost Via header it 226 inserts into every SIP request. This provides an indication to 227 downstream neighbors that the client supports overload control. 228 There MUST NOT be a value associated with the parameter (the value 229 will be added by the server). 231 The downstream server MUST add a value to the "oc" parameter in the 232 response going upstream to a client that included the "oc" parameter 233 in the request. Inclusion of a value to the parameter represents two 234 things: one, upon the first contact (see Section 5.1), addition of a 235 value by the server to this parameter indicates (to the client) that 236 the downstream server supports overload control as defined in this 237 document. Second, if overload control is active, then it indicates 238 the level of control to be applied. 240 When a SIP client receives a response with the value in the "oc" 241 parameter filled in, it MUST reduce, as indicated by the "oc" and 242 "oc-algo" parameters, the number of requests going downstream to the 243 SIP server from which it received the response (see Section 5.10 for 244 pertinent discussion on traffic reduction). 246 4.2. The oc-algo parameter 248 This parameter is inserted by the SIP client and updated by the SIP 249 server. 251 A SIP client MUST add an "oc-algo" parameter to the topmost Via 252 header it inserts into every SIP request, with a default value of 253 "loss". 255 This parameter contains names of one or more classes of overload 256 control algorithms. 
A SIP client MUST support the loss-based 257 overload control scheme and MUST insert at least the token "loss" as 258 one of the "oc-algo" parameter values. In addition, the SIP client 259 MAY insert other tokens, separated by a comma, in the "oc-algo" 260 parameter if it supports other overload control schemes such as a 261 rate-based scheme ([I-D.ietf-soc-overload-rate-control]). Each 262 element in the comma-separated list corresponds to the class of 263 overload control algorithms supported by the SIP client. When more 264 than one class of overload control algorithms is present in the "oc- 265 algo" parameter, the client may indicate algorithm preference by 266 ordering the list in a decreasing order of preference. However, the 267 client must not assume that the server will pick the most preferred 268 algorithm. 270 When a downstream SIP server receives a request with multiple 271 overload control algorithms specified in the "oc-algo" parameter 272 (optionally sorted by decreasing order of preference), it MUST choose 273 one algorithm from the list and return the single selected algorithm 274 in the response to the upstream SIP client. 276 Once the SIP server has chosen, and communicated to the client, a 277 mutually agreeable class of overload control algorithm, the selection 278 stays in effect until such time that the algorithm is changed by the 279 server. Furthermore, the client MUST continue to include all the 280 supported algorithms in subsequent requests; the server MUST respond 281 with the agreed to algorithm until such time that the algorithm is 282 changed by the server. The selection SHOULD stay the same for a non- 283 trivial duration of time to allow the overload control algorithm to 284 stabilize its behaviour (see Section 5.8). 
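The selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not part of the specification: the function name choose_algorithm, the "supported" tuple, and the policy of honouring the client's preference order are hypothetical, and the token "rate" is used purely as an example of a second algorithm class.

```python
from typing import Optional, Sequence


def choose_algorithm(offered: str,
                     supported: Sequence[str] = ("loss",),
                     previous: Optional[str] = None) -> str:
    """Pick one algorithm class from a client's comma-separated oc-algo list.

    offered   -- raw oc-algo value from the request, e.g. 'rate,loss'
    supported -- classes this (hypothetical) server implements
    previous  -- class already selected for this client, if any
    """
    # Tolerate an optionally quoted value and stray whitespace.
    tokens = [t.strip() for t in offered.strip('"').split(",") if t.strip()]

    # Keep the earlier selection while the client still offers it, so the
    # choice stays stable across requests.
    if previous is not None and previous in tokens:
        return previous

    # Otherwise take the first offered class the server implements. The
    # client's ordering is treated as a hint; a server may choose differently.
    for token in tokens:
        if token in supported:
            return token

    # A conforming client always lists "loss", so this indicates a bad request.
    raise ValueError("no mutually supported overload control algorithm")
```

The server would then return the single chosen token in the "oc-algo" parameter of its response, for example choose_algorithm("rate,loss") yields "loss" for a server that implements only the loss-based class.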
286 The "oc-algo" parameter does not define the exact algorithm to be 287 used for traffic reduction, rather, the intent is to use any 288 algorithm from a specific class of algorithms that affect traffic 289 reduction similarly. For example, the reference algorithm in 290 Section 6.3 can be used as a loss-based algorithm, or it can be 291 substituted by any other loss-based algorithm that results in 292 equivalent traffic reduction. 294 4.3. The oc-validity parameter 296 This parameter MAY be inserted by the SIP server in a response; it 297 MUST NOT be inserted by the SIP client in a request. 299 This parameter contains a value that indicates an interval of time 300 (measured in milliseconds) that the load reduction specified in the 301 value of the "oc" parameter should be in effect. The default value 302 of the "oc-validity" parameter is 500 (millisecond). If the client 303 receives a response with the "oc" and "oc-algo" parameters suitably 304 filled in, but no "oc-validity" parameter, the SIP client should 305 behave as if it had received "oc-validity=500". 307 A value of 0 in the "oc-validity" parameter is reserved to denote the 308 event that the server wishes to stop overload control, or to indicate 309 that it supports overload control, but is not currently requesting 310 any reduction in traffic (see Section 5.7). 312 A non-zero value for the "oc-validity" parameter MUST only be present 313 in conjunction with an "oc" parameter. A SIP client MUST discard a 314 non-zero value of the "oc-validity" parameter if the client receives 315 it in a response without the corresponding "oc" parameter being 316 present as well. 318 After the value specified in the "oc-validity" parameter expires and 319 until the SIP client receives an updated set of overload control 320 parameters from the SIP server, the client MUST behave as if overload 321 control is not in effect between it and the downstream SIP server. 323 4.4. 
The oc-seq parameter 325 This parameter MUST be inserted by the SIP server in a response; it 326 MUST NOT be inserted by the SIP client in a request. 328 This parameter contains a value that indicates the sequence number 329 associated with the "oc" parameter. This sequence number is used to 330 differentiate two "oc" parameter values generated by an overload 331 control algorithm at two different instants in time. "oc" parameter 332 values generated by an overload control algorithm at time t and t+1 333 MUST have an increasing value in the "oc-seq" parameter. This allows 334 the upstream SIP client to properly collate out-of-order responses. 336 A timestamp can be used as a value of the "oc-seq" parameter. 338 If the value contained in "oc-seq" parameter overflows during the 339 period in which the load reduction is in effect, then the "oc-seq" 340 parameter MUST be reset to the current timestamp or an appropriate 341 base value. 343 Due to an overflow, client implementations should be prepared to 344 receive an "oc-seq" parameter whose value is less than the 345 previous value. Client implementations can handle this by 346 continuing to perform overload control until the "oc-validity" 347 related to the previous value of "oc-seq" parameter expires. 349 5. General behaviour 351 When forwarding a SIP request, a SIP client uses the SIP procedures 352 of [RFC3263] to determine the next hop SIP server. The procedures of 353 [RFC3263] take as input a SIP URI, extract the domain portion of that 354 URI for use as a lookup key, and query the Domain Name Service (DNS) 355 to obtain an ordered set of one or more IP addresses with a port 356 number and transport corresponding to each IP address in this set 357 (the "Expected Output"). 359 After selecting a specific SIP server from the Expected Output, a SIP 360 client MUST determine whether overload controls are currently active 361 with that server. 
If overload controls are currently active (and oc- 362 validity period has not yet expired), the client applies the relevant 363 algorithm to determine whether or not to send the SIP request to the 364 server. If overload controls are not currently active with this 365 server (which will be the case if this is the initial contact with 366 the server, or the last response from this server had "oc- 367 validity=0", or the time period indicated by the "oc-validity" 368 parameter has expired), the SIP client sends the SIP message to the 369 server without invoking any overload control algorithm. 371 5.1. Determining support for overload control 373 If a client determines that this is the first contact with a server, 374 the client MUST insert the "oc" parameter without any value, and MUST 375 insert the "oc-algo" parameter with a list of algorithms it supports. 376 This list MUST include "loss" and MAY include other algorithm names 377 approved by IANA and described in corresponding documents. The 378 client transmits the request to the chosen server. 380 If a server receives a SIP request containing the "oc" and "oc-algo" 381 parameters, the server MUST determine if it has already selected the 382 overload control algorithm class with this client. If it has, the 383 server SHOULD use the previously selected algorithm class in its 384 response to the message. If the server determines that the message 385 is from a new client, or a client the server has not heard from in a 386 long time, the server MUST choose one algorithm from the list of 387 algorithms in the "oc-algo" parameter. It MUST put the chosen 388 algorithm as the sole parameter value in the "oc-algo" parameter of 389 the response it sends to the client. In addition, if the server is 390 currently not in an overload condition, it MUST set the value of the 391 "oc" parameter to be 0 and MAY insert an "oc-validity=0" parameter in 392 the response to further qualify the value in the "oc" parameter. 
If 393 the server is currently overloaded, it MUST follow the procedures of 394 Section 5.2. 396 A client that supports the rate-based overload control scheme 397 [I-D.ietf-soc-overload-rate-control] will consider "oc=0" as an 398 indication not to send any requests downstream at all. Thus, when 399 the server inserts "oc-validity=0" as well, it is indicating that 400 it does support overload control, but it is not under overload 401 mode right now (see Section 5.7). 403 5.2. Creating and updating the overload control parameters 405 A SIP server provides overload control feedback to its upstream 406 clients by providing a value for the "oc" parameter in the topmost 407 Via header field of a SIP response, that is, the Via header added by 408 the client before it sent the request to the server. 410 Since the topmost Via header of a response will be removed by an 411 upstream client after processing it, overload control feedback 412 contained in the "oc" parameter will not travel beyond the upstream 413 SIP client. A Via header parameter therefore provides hop-by-hop 414 semantics for overload control feedback (see [RFC6357]) even if the 415 next hop neighbor does not support this specification. 417 The "oc" parameter can be used in all response types, including 418 provisional, success and failure responses (see Section 5.11 419 for special consideration on transporting overload control parameters 420 in a 100-Trying response). A SIP server MAY update the "oc" 421 parameter in a response, asking the client to increase or decrease the 422 number of requests destined to the server, or to stop performing 423 overload control altogether. 425 A SIP server that has updated the "oc" parameter SHOULD also add an 426 "oc-validity" parameter. The "oc-validity" parameter defines the 427 time in milliseconds during which the overload control feedback 428 specified in the "oc" parameter is valid. The default value of the 429 "oc-validity" parameter is 500 (millisecond). 
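How a client might turn the "oc-validity" value into a deadline can be sketched as below. This is a minimal sketch under assumptions: the helper name feedback_deadline, the dictionary of already parsed Via parameters, and the integer millisecond clock are all hypothetical and are not defined by this document.

```python
from typing import Mapping, Optional

DEFAULT_VALIDITY_MS = 500  # default "oc-validity" stated in the text


def feedback_deadline(params: Mapping[str, str],
                      received_at_ms: int) -> Optional[int]:
    """Return the time (in ms) until which the feedback applies.

    params         -- parsed parameters of the topmost Via header of a response
    received_at_ms -- local clock reading when the response arrived
    Returns None when no traffic reduction is currently requested.
    """
    if "oc" not in params:
        # An "oc-validity" value without "oc" is discarded.
        return None

    validity = int(params.get("oc-validity", DEFAULT_VALIDITY_MS))
    if validity == 0:
        # "oc-validity=0": overload control is supported but not requested.
        return None

    return received_at_ms + validity
```

With this helper, a response carrying only "oc=20" that arrives at local time 1000 ms would be honoured until 1500 ms, after which the client behaves as if overload control were not in effect.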
431 When a SIP server retransmits a response, it SHOULD use the "oc" 432 parameter value and "oc-validity" parameter value consistent with the 433 overload state at the time the retransmitted response is sent. This 434 implies that the values in the "oc" and "oc-validity" parameters may 435 be different from the ones used in previous retransmissions of the 436 response. Because responses sent over UDP may be 437 subject to delays in the network and arrive out of order, the "oc- 438 seq" parameter aids in detecting a stale "oc" parameter value. 440 Implementations that are capable of updating the "oc" and "oc- 441 validity" parameter values during retransmissions MUST insert the 442 "oc-seq" parameter. The value of this parameter MUST be a set of 443 numbers drawn from an increasing sequence. 445 Implementations that are not capable of updating the "oc" and "oc- 446 validity" parameter values during retransmissions --- or 447 implementations that do not want to do so because they will have to 448 regenerate the message to be retransmitted --- MUST still insert an 449 "oc-seq" parameter in the first response associated with a 450 transaction; however, they do not have to update the value in 451 subsequent retransmissions. 453 The "oc-validity" and "oc-seq" Via header parameters are only defined 454 in SIP responses and MUST NOT be used in SIP requests. These 455 parameters are only useful to the upstream neighbor of a SIP server 456 (i.e., the entity that is sending requests to the SIP server) since 457 the client is the entity that can offload traffic by redirecting or 458 rejecting new requests. If requests are forwarded in both directions 459 between two SIP servers (i.e., the roles of upstream/downstream 460 neighbors change), there are also responses flowing in both 461 directions. Thus, both SIP servers can exchange overload 462 information. 
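The role of "oc-seq" in detecting stale feedback from delayed or reordered responses can be sketched as follows. This is an illustrative sketch only: the StoredFeedback record, the apply_response function, and the use of a float for the sequence value are assumptions made for the example, and validity handling is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFeedback:
    oc: int      # last accepted "oc" value for one downstream server
    seq: float   # "oc-seq" that accompanied it (numeric type is an assumption)


def apply_response(stored: Optional[StoredFeedback],
                   oc: int, seq: float) -> StoredFeedback:
    """Return the feedback a client keeps after seeing one response."""
    if stored is None or seq > stored.seq:
        # First feedback from this server, or a newer sequence number:
        # adopt the new values.
        return StoredFeedback(oc=oc, seq=seq)

    # Equal or smaller sequence number: a retransmission or a reordered
    # response, so the stored feedback is left untouched.
    return stored
```

A response that arrives late with an older "oc-seq" therefore cannot overwrite a more recent "oc" value that the client has already applied.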
464 Since overload control protects a SIP server from overload, it is 465 RECOMMENDED that a SIP server use the mechanisms described in this 466 specification. However, if a SIP server wants to limit its overload 467 control capability for privacy reasons, it MAY decide to perform 468 overload control only for requests that are received on a secure 469 transport channel, such as TLS. This enables a SIP server to protect 470 overload control information and ensure that it is only visible to 471 trusted parties. 473 5.3. Determining the 'oc' Parameter Value 475 The value of the "oc" parameter is determined by the overloaded 476 server using any pertinent information at its disposal. The only 477 constraint imposed by this document is that the server control 478 algorithm MUST produce a value for the "oc" parameter that it expects 479 the receiving SIP clients to apply to all downstream SIP requests 480 (dialog-forming as well as in-dialog) to this SIP server. Beyond 481 this stipulation, the process by which an overloaded server 482 determines the value of the "oc" parameter is considered out of scope 483 for this document. 485 Note that this stipulation is required so that both the client and 486 server have a common view of which messages the overload control 487 applies to. With this stipulation in place, the client can 488 prioritize messages as discussed in Section 5.10.1. 490 As an example, a value of "oc=10" when the loss-based algorithm is 491 used implies that 10% of the total number of SIP requests (dialog-forming 492 as well as in-dialog) are subject to reduction at the 493 client. Analogously, a value of "oc=10" when the rate-based 494 algorithm [I-D.ietf-soc-overload-rate-control] is used indicates that 495 the client should send SIP requests at a rate of 10 SIP requests or 496 fewer per second. 498 5.4. 
Processing the Overload Control Parameters 500 A SIP client SHOULD remove "oc", "oc-validity" and "oc-seq" 501 parameters from all Via headers of a received response, except for 502 the topmost Via header. This prevents overload control parameters 503 that were accidentally or maliciously inserted into Via headers by a 504 downstream SIP server from traveling upstream. 506 The scope of overload control applies to unique combinations of IP address 507 and port values. A SIP client maintains the overload control values 508 received (along with the address and port number of the SIP servers 509 from which they were received) for the duration specified in the "oc- 510 validity" parameter or the default duration. Each time a SIP client 511 receives a response with overload control parameters from a downstream 512 SIP server, it compares the "oc-seq" value extracted from the Via 513 header with the "oc-seq" value stored for this server. If these 514 values match, the response does not update the overload control 515 parameters related to this server and the client continues to provide 516 overload control as previously negotiated. If the "oc-seq" value 517 extracted from the Via header is larger than the stored value, the 518 client updates the stored values by copying the new values of "oc", 519 "oc-algo" and "oc-seq" parameters from the Via header to the stored 520 values. Upon such an update of the overload control parameters, the 521 client restarts the validity period of the new overload control 522 parameters. The overload control parameters now remain in effect 523 until the validity period expires or the parameters are updated in a 524 new response. Stored overload control parameters MUST be reset to 525 default values once the validity period has expired (see Section 5.7 526 for the detailed steps on terminating overload control). 528 5.5. 
Using the Overload Control Parameter Values 530 A SIP client MUST honor overload control values it receives from 531 downstream neighbors. The SIP client MUST NOT forward more requests 532 to a SIP server than allowed by the current "oc" and "oc-algo" 533 parameter values from that particular downstream server. 535 When forwarding a SIP request, a SIP client uses the SIP procedures 536 of [RFC3263] to determine the next hop SIP server. The procedures of 537 [RFC3263] take as input a SIP URI, extract the domain portion of that 538 URI for use as a lookup key, and query the Domain Name Service (DNS) 539 to obtain an ordered set of one or more IP addresses with a port 540 number and transport corresponding to each IP address in this set 541 (the "Expected Output"). 543 After selecting a specific SIP server from the Expected Output, the 544 SIP client MUST determine if it already has overload control 545 parameter values for the server chosen from the Expected Output. If 546 the SIP client has a non-expired "oc" parameter value for the server 547 chosen from the Expected Output, then this chosen server is operating 548 in overload control mode. Thus, the SIP client MUST determine if it 549 can or cannot forward the current request to the SIP server based on 550 the "oc" and "oc-algo" parameters and any relevant local policy. 552 The particular algorithm used to determine whether or not to forward 553 a particular SIP request is a matter of local policy, and may take 554 into account a variety of prioritization factors. However, this 555 local policy SHOULD transmit the same number of SIP requests as the 556 sample algorithm defined by the overload control scheme being used. 557 (See Section 6.3 for the default loss-based overload control 558 algorithm.) 560 5.6. Forwarding the overload control parameters 562 Overload control is defined in a hop-by-hop manner. 
   Therefore, forwarding the contents of the overload control
   parameters is generally NOT RECOMMENDED and should only be performed
   if permitted by the configuration of SIP servers.  This means that a
   SIP proxy SHOULD strip the overload control parameters inserted by
   the client before proxying the request further downstream.

5.7.  Terminating overload control

   A SIP client removes overload control if one of the following events
   occurs:

   1.  The "oc-validity" period previously received by the client from
       this server (or the default value of 500ms if the server did not
       previously specify an "oc-validity" parameter) expires;
   2.  The client is explicitly told by the server to stop performing
       overload control using the "oc-validity=0" parameter.

   A SIP server can decide to terminate overload control by explicitly
   signaling the client.  To do so, the SIP server MUST set the value
   of the "oc-validity" parameter to 0.  The SIP server MUST increment
   the value of "oc-seq", and SHOULD set the value of the "oc"
   parameter to 0.

      Note that the loss-based overload control scheme (Section 6) can
      effectively stop overload control by setting the value of the
      "oc" parameter to 0.  However, the rate-based scheme
      ([I-D.ietf-soc-overload-rate-control]) needs an additional piece
      of information in the form of "oc-validity=0".

   When the client receives a response with a higher "oc-seq" number
   than the one it most recently processed, it checks the "oc-validity"
   parameter.  If the value of the "oc-validity" parameter is 0, the
   client MUST stop performing overload control of messages destined to
   the server and the traffic should flow without any reduction.
   Furthermore, when the value of the "oc-validity" parameter is 0, the
   client SHOULD disregard the value in the "oc" parameter.

5.8.
Stabilizing overload algorithm selection

   Realities of deployments of SIP necessitate that the overload
   control algorithm may be changed upon a system reboot or a software
   upgrade.  However, frequent changes of the overload control
   algorithm MUST be avoided.  Frequent changes of the overload control
   algorithm will not benefit the client or the server as such flapping
   does not allow the chosen algorithm to stabilize.  An algorithm
   change, when desired, is simply accomplished by the SIP server
   choosing a new algorithm from the list in the client's "oc-algo"
   parameter and sending it back to the client in a response.

   The client associates a specific algorithm with each server it sends
   traffic to and when the server changes the algorithm, the client
   must change its behaviour accordingly.

   Once the server selects a specific overload control algorithm for a
   given client, the algorithm MUST remain in effect for at least 3600
   seconds (1 hour) before another change occurs.  This period may
   involve one or more cycles of overload control being in effect and
   then being stopped depending on the traffic and resources at the
   server.

      One way to accomplish this involves the server saving the time of
      the last algorithm change in a lookup table, indexed by the
      client's network identifiers.  The server only changes the
      "oc-algo" parameter when the time since the last change has
      surpassed 3600 seconds.

5.9.  Self-Limiting

   In some cases, a SIP client may not receive a response from a server
   after sending a request.  RFC3261 [RFC3261] defines that when a
   timeout error is received from the transaction layer, it MUST be
   treated as if a 408 (Request Timeout) status code has been received.
   If a fatal transport error is reported by the transport layer, it
   MUST be treated as a 503 (Service Unavailable) status code.
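
   Returning briefly to the lookup table suggested in Section 5.8: the
   following is a minimal sketch of one way a server might keep it, not
   a normative procedure.  The class name AlgoSelector, its method
   names, the injectable clock, and the algorithm token "rate" used in
   the usage note are all illustrative assumptions; only the 3600
   second minimum and the idea of a table indexed by the client's
   network identifiers come from the text.

```python
import time

MIN_ALGO_LIFETIME_S = 3600  # minimum time an algorithm stays in effect (Section 5.8)


class AlgoSelector:
    """Remembers, per client, the chosen algorithm and when it last changed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock (assumption, eases testing)
        self._table = {}     # client_id -> (algorithm, time_of_last_change)

    def select(self, client_id, offered, preferred):
        """Return the "oc-algo" value to send back to this client.

        client_id -- hashable network identifier, e.g. an (ip, port) tuple
        offered   -- algorithms listed in the client's "oc-algo" parameter
        preferred -- the algorithm the server would like to use now
        """
        if preferred not in offered:
            raise ValueError("preferred algorithm was not offered by the client")
        now = self._clock()
        entry = self._table.get(client_id)
        if entry is None:
            # First selection for this client: record it and start the timer.
            self._table[client_id] = (preferred, now)
            return preferred
        current, changed_at = entry
        if preferred != current and now - changed_at >= MIN_ALGO_LIFETIME_S:
            # Enough time has passed, so the change is allowed.
            self._table[client_id] = (preferred, now)
            return preferred
        # Too soon to change (or nothing to change): keep the current choice.
        return current
```

   With a client keyed as ("192.0.2.111", 5061), a first call that
   prefers "loss" selects it; a call ten seconds later that prefers a
   hypothetical "rate" algorithm still returns "loss", and the change
   only takes effect once 3600 seconds have elapsed.  The sketch does
   not handle a client that later stops offering the stored algorithm.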

   In the event of repeated timeouts or fatal transport errors, the SIP
   client MUST stop sending requests to this server.  The SIP client
   SHOULD periodically probe if the downstream server is alive using
   any mechanism at its disposal.  Once a SIP client has successfully
   received a normal response for a request sent to the downstream
   server, the SIP client can resume sending SIP requests.  It should,
   of course, honor any overload control parameters it may receive in
   the initial, or later, responses.

5.10.  Responding to an Overload Indication

   A SIP client can receive overload control feedback indicating that
   it needs to reduce the traffic it sends to its downstream server.
   The client can accomplish this task by sending some of the requests
   that would have gone to the overloaded element to a different
   destination.  It needs to ensure, however, that this destination is
   not in overload and is capable of processing the extra load.  A
   client can also buffer requests in the hope that the overload
   condition will resolve quickly and the requests still can be
   forwarded in time.  In many cases, however, it will need to reject
   these requests with a "503 (Service Unavailable)" response without
   the Retry-After header.

5.10.1.  Message prioritization at the hop before the overloaded server

   During an overload condition, a SIP client needs to prioritize
   requests and select those requests that need to be rejected or
   redirected.  This selection is largely a matter of local policy.  It
   is expected that a SIP client will follow local policy as long as
   the resulting reduction in traffic is consistent with the overload
   algorithm in effect at that node.  Accordingly, the normative
   behaviour in the next three paragraphs should be interpreted with
   the understanding that the SIP client will aim to preserve local
   policy to the fullest extent possible.

   A SIP client SHOULD honor the local policy for prioritizing SIP
   requests such as policies based on message type, e.g., INVITEs
   versus requests associated with existing sessions.

   A SIP client SHOULD honor the local policy for prioritizing SIP
   requests based on the content of the Resource-Priority header (RPH,
   RFC4412 [RFC4412]).  Specific (namespace.value) RPH contents may
   indicate high priority requests that should be preserved as much as
   possible during overload.  The RPH contents can also indicate a
   low-priority request that is eligible to be dropped during times of
   overload.

   A SIP client SHOULD honor the local policy for prioritizing SIP
   requests relating to emergency calls as identified by the SOS URN
   [RFC5031] indicating an emergency request.

   A local policy can be expected to combine both the SIP request type
   and the prioritization markings, and SHOULD be honored when overload
   conditions prevail.

5.10.2.  Rejecting requests at an overloaded server

   If the upstream SIP client to the overloaded server does not support
   overload control, it will continue to direct requests to the
   overloaded server.  Thus, for the non-participating client, the
   overloaded server must bear the cost of rejecting some requests from
   the client as well as the cost of processing the non-rejected
   requests to completion.  It would be fair to devote the same amount
   of processing at the overloaded server to the combination of
   rejection and processing from a non-participating client as the
   overloaded server would devote to processing requests from a
   participating client.  This is to ensure that SIP clients that do
   not support this specification don't receive an unfair advantage
   over those that do.
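
   The fairness argument above can be turned into a small worked
   calculation.  This is only a sketch under stated assumptions, not
   part of the mechanism: it assumes a participating and a
   non-participating client offer the same load, and that the server
   can estimate a per-request processing cost and a (smaller)
   per-request rejection cost.  The function name and both cost
   parameters are hypothetical.  If participating clients reduce
   traffic by a fraction p, the rejected fraction q for a
   non-participating client that equalizes server work satisfies
   q * cost_reject + (1 - q) * cost_process = (1 - p) * cost_process.

```python
def reject_fraction(reduction, cost_process, cost_reject):
    """Fraction of a non-participating client's requests to reject.

    reduction    -- fraction (0.0 to 1.0) by which participating clients
                    were asked to cut their traffic
    cost_process -- assumed cost of fully processing one request
    cost_reject  -- assumed cost of rejecting one request with a 503
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    if not 0.0 <= cost_reject < cost_process:
        raise ValueError("rejecting must be cheaper than processing")
    # Solve q*cost_reject + (1 - q)*cost_process == (1 - reduction)*cost_process
    # for q, then cap at 1.0 because no more than every request can be rejected.
    q = reduction * cost_process / (cost_process - cost_reject)
    return min(q, 1.0)
```

   For a 20% reduction with a rejection that costs a quarter of full
   processing, reject_fraction(0.2, 1.0, 0.25) gives roughly 0.267: the
   server has to reject somewhat more than 20% of the non-participating
   client's requests, because each rejection itself consumes resources.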

   A SIP server that is under overload and has started to throttle
   incoming traffic MUST reject some requests from non-participating
   clients with a "503 (Service Unavailable)" response without the
   Retry-After header.

5.11.  100-Trying provisional response and overload control parameters

   The overload control information sent from a SIP server to a client
   is transported in the responses.  While implementations can insert
   overload control information in any response, special attention
   should be accorded to overload control information transported in a
   100-Trying response.

   Traditionally, the 100-Trying response has been used in SIP to
   quench retransmissions.  In some implementations, the 100-Trying
   message may not be generated by the transaction user (TU) nor
   consumed by the TU.  In these implementations, the 100-Trying
   response is generated at the transaction layer and sent to the
   upstream SIP client.  At the receiving SIP client, the 100-Trying is
   consumed at the transaction layer by inhibiting the retransmission
   of the corresponding request.  Consequently, implementations that
   insert overload control information in the 100-Trying cannot assume
   that the upstream SIP client passed the overload control information
   in the 100-Trying to its corresponding TU.  For this reason,
   implementations that insert overload control information in the
   100-Trying MUST re-insert the same (or updated) overload control
   information in the first non-100 response being sent to the upstream
   SIP client.

6.  The loss-based overload control scheme

   Under a loss-based approach, a SIP server asks an upstream neighbor
   to reduce the number of requests it would normally forward to this
   server by a certain percentage.  For example, a SIP server can ask
   an upstream neighbor to reduce the number of requests this neighbor
   would normally send by 10%.
   The upstream neighbor then redirects or rejects 10% of the traffic
   originally destined for that server.

   This section specifies the semantics of the overload control
   parameters associated with the loss-based overload control scheme.
   The general behaviour of SIP clients and servers is specified in
   Section 5 and is applicable to SIP clients and servers that
   implement loss-based overload control.

6.1.  Special parameter values for loss-based overload control

   The loss-based overload control scheme is identified using the token
   "loss".  This token MUST appear in the "oc-algo" parameter list sent
   by the SIP client.

   A SIP server that has selected the loss-based algorithm, upon
   entering the overload state, will assign a value to the "oc"
   parameter.  This value MUST be in the range of [0, 100], inclusive.
   This value MUST be interpreted by the client as a percentage, and
   the SIP client MUST reduce the number of requests being forwarded to
   the overloaded server by that percentage.  The SIP client may use
   any algorithm that reduces the traffic it sends to the overloaded
   server by the amount indicated.  Such an algorithm SHOULD honor the
   message prioritization discussion of Section 5.10.1.  While a
   particular algorithm is not subject to standardization, for
   completeness a default algorithm for loss-based overload control is
   provided in Section 6.3.

   When a SIP server using the loss-based algorithm receives a request
   from a client with an "oc" parameter but the SIP server is not
   experiencing overload, it MUST assign a value of 0 to the "oc"
   parameter in the response.  Assigning such a value lets the client
   know that the server supports overload control but is not currently
   requesting any reduction in traffic.
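
   The client-side interpretation of the loss-based "oc" value
   described in this section can be sketched as follows.  This is a
   deliberately simplified illustration, not the default algorithm of
   Section 6.3: it applies uniform random discard and ignores the
   message prioritization of Section 5.10.1.  The function names
   parse_loss_oc and should_forward are hypothetical.

```python
import random


def parse_loss_oc(value):
    """Validate the text of an "oc" value received under the "loss" algorithm."""
    if not (value.isascii() and value.isdigit()):
        raise ValueError('"oc" must consist of decimal digits')
    pct = int(value)
    if not 0 <= pct <= 100:
        raise ValueError('"oc" must be in the range [0, 100]')
    return pct


def should_forward(oc_pct, rng=random):
    """Return True if a request may be forwarded at this loss percentage."""
    # Draw an integer in 1..100; draws at or below oc_pct select the request
    # for redirection or rejection.  oc_pct=0 therefore forwards everything
    # and oc_pct=100 forwards nothing.
    return rng.randint(1, 100) > oc_pct
```

   A client that stored "oc=20" for a server would call
   should_forward(parse_loss_oc("20")) for each outbound request and,
   over many requests, forward about 80% of them.  A value of 0, as
   sent by a server that is not overloaded, leaves traffic untouched.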

   When the "oc-validity" parameter is used to signify overload control
   termination (Section 5.7), the server MUST insert a value of 0 in
   the "oc-validity" parameter.  The server MUST insert a value of 0 in
   the "oc" parameter as well.  When a client receives a response whose
   "oc-validity" parameter contains a 0, it MUST treat any non-zero
   value in the "oc" parameter as if it had received a value of 0 in
   that parameter.

6.2.  Example

   Consider a SIP client, P1, which is sending requests to another
   downstream SIP server, P2.  The following snippets of SIP messages
   demonstrate how the overload control parameters work.

      INVITE sips:user@example.com SIP/2.0
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.1;oc;oc-algo="loss,A"
      ...

      SIP/2.0 100 Trying
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.1;received=192.0.2.111;
        oc=0;oc-algo="loss";oc-validity=0
      ...

   In the messages above, the first message is sent by P1 to P2.  This
   message is a SIP request; because P1 supports overload control, it
   inserts the "oc" parameter in the topmost Via header that it
   created.  P1 supports two overload control algorithms: loss and some
   algorithm called "A".

   The second message, a SIP response, shows the topmost Via header
   amended by P2 according to this specification and sent to P1.
   Because P2 also supports overload control, and because it chooses
   the "loss" based scheme, it sends "loss" back to P1 in the "oc-algo"
   parameter.  It also sets the value of "oc" and "oc-validity"
   parameters to 0 because it is not currently requesting overload
   control activation.

   Had P2 not supported overload control, it would have left the "oc"
   and "oc-algo" parameters unchanged, thus allowing the client to know
   that it did not support overload control.

   At some later time, P2 starts to experience overload.
   It sends the following SIP message indicating that P1 should
   decrease the messages arriving to P2 by 20% for 0.5s.

      SIP/2.0 180 Ringing
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.3;received=192.0.2.111;
        oc=20;oc-algo="loss";oc-validity=500;
        oc-seq=1282321615.782
      ...

   After some time, the overload condition at P2 subsides.  It then
   changes the parameter values in the response it sends to P1 to allow
   P1 to send all messages destined to P2.

      SIP/2.0 183 Queued
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.4;received=192.0.2.111;
        oc=0;oc-algo="loss";oc-validity=0;oc-seq=1282321892.439
      ...

6.3.  Default algorithm for loss-based overload control

   This section describes a default algorithm that a SIP client can use
   to throttle SIP traffic going downstream by the percentage loss
   value specified in the "oc" parameter.

   The client maintains two categories of requests; the first category
   will include requests that are candidates for reduction, and the
   second category will include requests that are not subject to
   reduction except when all messages in the first category have been
   rejected, and further reduction is still needed.  Section 5.10.1
   contains directives on identifying messages for inclusion in the
   second category.  The remaining messages are allocated to the first
   category.

   Under an overload condition, the client converts the value of the
   "oc" parameter to a value that it applies to requests in the first
   category.  As a simple example, if "oc=10" and 40% of the requests
   should be included in the first category, then:

      10 / 40 * 100 = 25

   Or, 25% of the requests in the first category can be reduced to get
   an overall reduction of 10%.  The client uses random discard to
   achieve the 25% reduction of messages in the first category.
   Messages in the second category proceed downstream unscathed.
   To effect the 25% reduction rate from the first category, the client
   draws a random number between 1 and 100 for the request picked from
   the first category.  If the random number is less than or equal to
   the converted value of the "oc" parameter, the request is not
   forwarded; otherwise the request is forwarded.

   A reference algorithm is shown below.

   cat1 := 80.0          // Category 1 -- subject to reduction
   cat2 := 100.0 - cat1  // Category 2 -- Under normal operations
   // only subject to reduction after category 1 is exhausted.
   // Note that the above ratio is simply a reasonable default.
   // The actual values will change through periodic sampling
   // as the traffic mix changes over time.

   while (true) {
     // We're modeling message processing as a single work queue
     // that contains both incoming and outgoing messages.
     sip_msg := get_next_message_from_work_queue()

     update_mix(cat1, cat2)  // See Note below

     switch (sip_msg.type) {

       case outbound request:
         destination := get_next_hop(sip_msg)
         oc_context := get_oc_context(destination)

         if (oc_context == null) {
           send_to_network(sip_msg)  // Process it normally by sending
           // the request to the next hop since this particular
           // destination is not subject to overload
         }
         else {
           // Determine if server wants to enter in overload or is in
           // overload
           in_oc := extract_in_oc(oc_context)

           oc_value := extract_oc(oc_context)
           oc_validity := extract_oc_validity(oc_context)

           if (in_oc == false or oc_validity is not in effect) {
             send_to_network(sip_msg)  // Process it normally by
             // sending the request to the next hop since this
             // particular destination is not subject to overload.
             // Optionally, clear the oc context for this server
             // (not shown).
           }
           else {  // Begin perform overload control
             r := random()
             drop_msg := false

             category := assign_msg_to_category(sip_msg)

             pct_to_reduce_cat1 := oc_value / cat1 * 100

             if (oc_value <= cat1) {  // Reduce all msgs from category 1
               if (r <= pct_to_reduce_cat1 && category == cat1) {
                 drop_msg := true
               }
             }
             else {  // oc_value > category 1.  Reduce 100% of msgs from
                     // category 1 and remaining from category 2.
               pct_to_reduce_cat2 := (oc_value - cat1) / cat2 * 100
               if (category == cat1) {
                 drop_msg := true
               }
               else {
                 if (r <= pct_to_reduce_cat2) {
                   drop_msg := true
                 }
               }
             }

             if (drop_msg == false) {
               send_to_network(sip_msg)  // Process it normally by
               // sending the request to the next hop
             }
             else {
               // Do not send request downstream, handle locally by
               // generating response (if a proxy) or treating as
               // an error (if a user agent).
             }

           }  // End perform overload control
         }

       end case  // outbound request

       case outbound response:

         if (we are in overload) {
           add_overload_parameters(sip_msg)
         }
         send_to_network(sip_msg)

       end case  // outbound response

       case inbound response:

         if (sip_msg has oc parameter values) {
           create_or_update_oc_context()  // For the specific server
           // that sent the response, create or update the oc context;
           // i.e., extract the values of the oc-related parameters
           // and store them for later use.
         }
         process_msg(sip_msg)

       end case  // inbound response

       case inbound request:

         if (we are not in overload) {
           process_msg(sip_msg)
         }
         else {  // We are in overload
           if (sip_msg has oc parameters) {  // Upstream client supports
             process_msg(sip_msg)  // oc; only sends important requests
           }
           else {  // Upstream client does not support oc
             if (local_policy(sip_msg) says process message) {
               process_msg(sip_msg)
             }
             else {
               send_response(sip_msg, 503)
             }
           }
         }
       end case  // inbound request
     }
   }

   Note: A simple way to sample the traffic mix for category 1 and
   category 2 is to associate a counter with each category of message.
   Periodically (every 5-10 seconds) get the value of the counters and
   calculate the ratio of category 1 messages to category 2 messages
   since the last calculation.

   Example: In the last 5 seconds, a total of 500 requests arrived at
   the queue.  450 out of the 500 were messages subject to reduction
   and 50 out of 500 were classified as requests not subject to
   reduction.  Based on this ratio, cat1 := 90 and cat2 := 10, so a
   90/10 mix will be used in overload calculations.

7.  Relationship with other IETF SIP load control efforts

   The overload control mechanism described in this document is
   reactive in nature and apart from message prioritization directives
   listed in Section 5.10.1 the mechanisms described in this draft will
   not discriminate requests based on user identity, filtering action
   and arrival time.  SIP networks that require pro-active overload
   control mechanisms can upload user-level load control filters as
   described in [I-D.ietf-soc-load-control-event-package].  Local
   policy will also dictate the precedence of different overload
   control mechanisms applied to the traffic.
   Specifically, in a scenario where load control filters are installed
   by signaling neighbours [I-D.ietf-soc-load-control-event-package]
   and the same traffic can also be throttled using the overload
   control mechanism, local policy will dictate which of these schemes
   shall be given precedence.  Interactions between the two schemes are
   out of scope for this document.

8.  Syntax

   This specification extends the existing definition of the Via header
   field parameters of [RFC3261] as follows:

      via-params  = via-ttl / via-maddr
                    / via-received / via-branch
                    / oc / oc-validity
                    / oc-seq / oc-algo / via-extension

      oc          = "oc" [EQUAL oc-num]
      oc-num      = 1*DIGIT
      oc-validity = "oc-validity" [EQUAL delta-ms]
      oc-seq      = "oc-seq" EQUAL 1*12DIGIT "." 1*5DIGIT
      oc-algo     = "oc-algo" EQUAL DQUOTE algo-list *(COMMA algo-list)
                    DQUOTE
      algo-list   = "loss" / *(other-algo)
      other-algo  = %x41-5A / %x61-7A / %x30-39
      delta-ms    = 1*DIGIT

9.  Design Considerations

   This section discusses specific design considerations for the
   mechanism described in this document.  General design considerations
   for SIP overload control can be found in [RFC6357].

9.1.  SIP Mechanism

   A SIP mechanism is needed to convey overload feedback from the
   receiving to the sending SIP entity.  A number of different
   alternatives exist to implement such a mechanism.

9.1.1.  SIP Response Header

   Overload control information can be transmitted using a new Via
   header field parameter for overload control.  A SIP server can add
   this header parameter to the responses it is sending upstream to
   provide overload control feedback to its upstream neighbors.  This
   approach has the following characteristics:

   o  A Via header parameter is light-weight and creates very little
      overhead.
      It does not require the transmission of additional messages for
      overload control and does not increase traffic or processing
      burdens in an overload situation.
   o  Overload control status can frequently be reported to upstream
      neighbors since it is a part of a SIP response.  This enables the
      use of this mechanism in scenarios where the overload status
      needs to be adjusted frequently.  It also enables the use of
      overload control mechanisms that use regular feedback such as
      window-based overload control.
   o  With a Via header parameter, overload control status is inherent
      in SIP signaling and is automatically conveyed to all relevant
      upstream neighbors, i.e., neighbors that are currently
      contributing traffic.  There is no need for a SIP server to
      specifically track and manage the set of current upstream or
      downstream neighbors with which it should exchange overload
      feedback.
   o  Overload status is not conveyed to inactive senders.  This avoids
      the transmission of overload feedback to inactive senders, which
      do not contribute traffic.  If an inactive sender starts to
      transmit while the receiver is in overload it will receive
      overload feedback in the first response and can adjust the amount
      of traffic forwarded accordingly.
   o  A SIP server can limit the distribution of overload control
      information by only inserting it into responses to known upstream
      neighbors.  A SIP server can use transport level authentication
      (e.g., via TLS) with its upstream neighbors.

9.1.2.  SIP Event Package

   Overload control information can also be conveyed from a receiver to
   a sender using a new event package.  Such an event package enables a
   sending entity to subscribe to the overload status of its downstream
   neighbors and receive notifications of overload control status
   changes in NOTIFY requests.
   This approach has the following characteristics:

   o  Overload control information is conveyed decoupled from SIP
      signaling.  It enables an overload control manager, which is a
      separate entity, to monitor the load on other servers and provide
      overload control feedback to all SIP servers that have set up
      subscriptions with the controller.
   o  With an event package, a receiver can send updates to senders
      that are currently inactive.  Inactive senders will receive a
      notification about the overload and can refrain from sending
      traffic to this neighbor until the overload condition is
      resolved.  The receiver can also notify all potential senders
      once they are permitted to send traffic again.  However, these
      notifications do generate additional traffic, which adds to the
      overall load.
   o  A SIP entity needs to set up and maintain overload control
      subscriptions with all upstream and downstream neighbors.  A new
      subscription needs to be set up before/while a request is
      transmitted to a new downstream neighbor.  Servers can be
      configured to subscribe at boot time.  However, this would
      require additional protection to avoid the avalanche restart
      problem for overload control.  Subscriptions need to be
      terminated when they are not needed any more, which can be done,
      for example, using a timeout mechanism.
   o  A receiver needs to send NOTIFY messages to all subscribed
      upstream neighbors in a timely manner when the control algorithm
      requires a change in the control variable (e.g., when a SIP
      server is in an overload condition).  This includes active as
      well as inactive neighbors.  These NOTIFYs add to the amount of
      traffic that needs to be processed.  To ensure that these
      requests will not be dropped due to overload, a priority
      mechanism needs to be implemented in all servers these requests
      will pass through.
   o  As overload feedback is sent to all senders in separate messages,
      this mechanism is not suitable when frequent overload control
      feedback is needed.
   o  A SIP server can limit the set of senders that can receive
      overload control information by authenticating subscriptions to
      this event package.
   o  This approach requires each proxy to implement user agent
      functionality (UAS and UAC) to manage the subscriptions.

9.2.  Backwards Compatibility

   A new overload control mechanism needs to be backwards compatible so
   that it can be gradually introduced into a network and functions
   properly if only a fraction of the servers support it.

   Hop-by-hop overload control (see [RFC6357]) has the advantage that
   it does not require that all SIP entities in a network support it.
   It can be used effectively between two adjacent SIP servers if both
   servers support overload control and does not depend on the support
   from any other server or user agent.  The more SIP servers in a
   network support hop-by-hop overload control, the better protected
   the network is against occurrences of overload.

   A SIP server may have multiple upstream neighbors of which only some
   may support overload control.  If a server would simply use this
   overload control mechanism, only those that support it would reduce
   traffic.  Others would keep sending at the full rate and benefit
   from the throttling by the servers that support overload control.
   In other words, upstream neighbors that do not support overload
   control would be better off than those that do.

   A SIP server should therefore follow the behaviour outlined in
   Section 5.10.2 to handle clients that do not support overload
   control.

10.
Security Considerations

   Overload control mechanisms can be used by an attacker to conduct a
   denial-of-service attack on a SIP entity if the attacker can pretend
   that the SIP entity is overloaded.  When such a forged overload
   indication is received by an upstream SIP client, it will stop
   sending traffic to the victim.  Thus, the victim is subject to a
   denial-of-service attack.

   An attacker can create forged overload feedback by inserting itself
   into the communication between the victim and its upstream
   neighbors.  The attacker would need to add overload feedback
   indicating a high load to the responses passed from the victim to
   its upstream neighbor.  Proxies can prevent this attack by
   communicating via TLS.  Since overload feedback has no meaning
   beyond the next hop, there is no need to secure the communication
   over multiple hops.

   Another way to conduct an attack is to send a message containing a
   high overload feedback value through a proxy that does not support
   this extension.  If this feedback is added to the second Via header
   (or all Via headers), it will reach the next upstream proxy.  If the
   attacker can make the recipient believe that the overload status was
   created by its direct downstream neighbor (and not by the attacker
   further downstream) the recipient stops sending traffic to the
   victim.  A precondition for this attack is that the victim proxy
   does not support this extension since it would not pass through
   overload control feedback otherwise.

   A malicious SIP entity could gain an advantage by pretending to
   support this specification but never reducing the amount of traffic
   it forwards to the downstream neighbor.
   If its downstream neighbor receives traffic from multiple sources
   which correctly implement overload control, the malicious SIP entity
   would benefit since all other sources to its downstream neighbor
   would reduce load.

      The solution to this problem depends on the overload control
      method.  For rate-based and window-based overload control, it is
      very easy for a downstream entity to monitor if the upstream
      neighbor throttles traffic forwarded as directed.  For percentage
      throttling this is not always obvious since the load forwarded
      depends on the load received by the upstream neighbor.

11.  IANA Considerations

   This specification defines four new Via header parameters as
   detailed below in the "Header Field Parameter and Parameter Values"
   sub-registry as per the registry created by [RFC3968].  The required
   information is:

      Header Field  Parameter Name  Predefined Values  Reference
      __________________________________________________________
      Via           oc              Yes                RFCXXXX
      Via           oc-validity     Yes                RFCXXXX
      Via           oc-seq          Yes                RFCXXXX
      Via           oc-algo         Yes                RFCXXXX

      RFC XXXX [NOTE TO RFC-EDITOR: Please replace with final RFC
      number of this specification.]

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3263]  Rosenberg, J. and H. Schulzrinne, "Session Initiation
              Protocol (SIP): Locating SIP Servers", RFC 3263,
              June 2002.

   [RFC3968]  Camarillo, G., "The Internet Assigned Number Authority
              (IANA) Header Field Parameter Registry for the Session
              Initiation Protocol (SIP)", BCP 98, RFC 3968,
              December 2004.

   [RFC4412]  Schulzrinne, H. and J.
              Polk, "Communications Resource Priority for the Session
              Initiation Protocol (SIP)", RFC 4412, February 2006.

12.2.  Informative References

   [I-D.ietf-soc-load-control-event-package]
              Shen, C., Schulzrinne, H., and A. Koike, "A Session
              Initiation Protocol (SIP) Load Control Event Package",
              draft-ietf-soc-load-control-event-package-08 (work in
              progress), March 2013.

   [I-D.ietf-soc-overload-rate-control]
              Noel, E. and P. Williams, "Session Initiation Protocol
              (SIP) Rate Control",
              draft-ietf-soc-overload-rate-control-04 (work in
              progress), April 2013.

   [RFC5031]  Schulzrinne, H., "A Uniform Resource Name (URN) for
              Emergency and Other Well-Known Services", RFC 5031,
              January 2008.

   [RFC5390]  Rosenberg, J., "Requirements for Management of Overload
              in the Session Initiation Protocol", RFC 5390,
              December 2008.

   [RFC6357]  Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design
              Considerations for Session Initiation Protocol (SIP)
              Overload Control", RFC 6357, August 2011.

Appendix A.  Acknowledgements

   The authors acknowledge the contributions of Bruno Chatras, Keith
   Drage, Janet Gunn, Rich Terpstra, Daryl Malas, Eric Noel, R.
   Parthasarathi, Antoine Roly, Jonathan Rosenberg, Charles Shen, Rahul
   Srivastava, Padma Valluri, Shaun Bharrat, Paul Kyzivat and Jeroen
   Van Bemmel to this document.

   Adam Roach and Eric McMurry helped flesh out the different cases for
   handling SIP messages described in the algorithm of Section 6.3.
   Janet Gunn reviewed the algorithm and suggested changes that led to
   simpler processing for the case where "oc_value > cat1".

Appendix B.  RFC5390 requirements

   Table 1 provides a summary of how this specification fulfills the
   requirements of [RFC5390].  A more detailed view on how each
   requirement is fulfilled is provided after the table.
1295 +-------------+-------------------+
1296 | Requirement | Meets requirement |
1297 +-------------+-------------------+
1298 | REQ 1       | Yes               |
1299 | REQ 2       | Yes               |
1300 | REQ 3       | Partially         |
1301 | REQ 4       | Partially         |
1302 | REQ 5       | Partially         |
1303 | REQ 6       | Not applicable    |
1304 | REQ 7       | Yes               |
1305 | REQ 8       | Partially         |
1306 | REQ 9       | Yes               |
1307 | REQ 10      | Yes               |
1308 | REQ 11      | Yes               |
1309 | REQ 12      | Yes               |
1310 | REQ 13      | Yes               |
1311 | REQ 14      | Yes               |
1312 | REQ 15      | Yes               |
1313 | REQ 16      | Yes               |
1314 | REQ 17      | Partially         |
1315 | REQ 18      | Yes               |
1316 | REQ 19      | Yes               |
1317 | REQ 20      | Yes               |
1318 | REQ 21      | Yes               |
1319 | REQ 22      | Yes               |
1320 | REQ 23      | Yes               |
1321 +-------------+-------------------+
1323 Summary of meeting requirements in RFC5390
1325 Table 1
1327 REQ 1: The overload mechanism shall strive to maintain the overall 1328 useful throughput (taking into consideration the quality-of-service 1329 needs of the using applications) of a SIP server at reasonable 1330 levels, even when the incoming load on the network is far in excess 1331 of its capacity. The overall throughput under load is the ultimate 1332 measure of the value of an overload control mechanism. 1334 Meeting REQ 1: Yes, the overload control mechanism allows an 1335 overloaded SIP server to maintain a reasonable level of throughput as 1336 it enters into congestion mode by requesting the upstream clients to 1337 reduce traffic destined downstream. 1339 REQ 2: When a single network element fails, goes into overload, or 1340 suffers from reduced processing capacity, the mechanism should strive 1341 to limit the impact of this on other elements in the network. This 1342 helps to prevent a small-scale failure from becoming a widespread 1343 outage. 1345 Meeting REQ 2: Yes. When a SIP server enters overload mode, it will 1346 request the upstream clients to throttle the traffic destined to it.
1347 As a consequence of this, the overloaded SIP server will itself 1348 generate proportionally less downstream traffic, thereby limiting the 1349 impact on other elements in the network. 1351 REQ 3: The mechanism should seek to minimize the amount of 1352 configuration required in order to work. For example, it is better 1353 to avoid needing to configure a server with its SIP message 1354 throughput, as these kinds of quantities are hard to determine. 1356 Meeting REQ 3: Partially. On the server side, the overload condition 1357 is determined by monitoring S (cf. Section 4 of [RFC6357]) and is 1358 reported as a load feedback F carried as the value of the "oc" parameter. On the 1359 client side, a throttle T is applied to requests going downstream 1360 based on F. This specification does not prescribe any value for S, 1361 nor a particular value for F. The "oc-algo" parameter allows for 1362 automatic convergence to a particular class of overload control 1363 algorithm. There are suggested default values for the "oc-validity" 1364 parameter. 1366 REQ 4: The mechanism must be capable of dealing with elements that do 1367 not support it, so that a network can consist of a mix of elements 1368 that do and don't support it. In other words, the mechanism should 1369 not work only in environments where all elements support it. It is 1370 reasonable to assume that it works better in such environments, of 1371 course. Ideally, there should be incremental improvements in overall 1372 network throughput as increasing numbers of elements in the network 1373 support the mechanism. 1375 Meeting REQ 4: Partially. The mechanism is designed to reduce 1376 congestion when a pair of communicating entities support it. If a 1377 downstream overloaded SIP server does not respond to a request in 1378 time, a SIP client will attempt to reduce traffic destined towards 1379 the non-responsive server as outlined in Section 5.9.
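As a minimal sketch of the client-side throttle T discussed under REQ 3, the following Python fragment assumes that the feedback F arrives in the "oc" parameter as a loss percentage between 0 and 100, that is, the share of requests the client is asked not to forward. The function name should_forward and the rng argument are hypothetical and exist only for this illustration; this is not the normative algorithm of Section 6.3.

```python
import random


def should_forward(oc_percent, rng=random.random):
    """Decide whether one request may be forwarded downstream.

    oc_percent: assumed loss percentage (0-100) taken from the "oc"
                parameter; 0 forwards everything, 100 drops everything.
    rng:        callable returning a float in [0.0, 1.0); injectable so
                that the decision can be tested deterministically.
    """
    if not 0 <= oc_percent <= 100:
        raise ValueError("oc value must lie between 0 and 100")
    # Draw a point in [0, 100) and forward only if it falls outside
    # the share of traffic that the server asked the client to shed.
    return rng() * 100 >= oc_percent


if __name__ == "__main__":
    forwarded = sum(should_forward(30) for _ in range(10000))
    print("forwarded", forwarded, "of 10000 requests at oc=30")
```

A random draw per request keeps the client stateless; an implementation could equally use a deterministic counter, and would normally exempt prioritized requests before applying the throttle.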
1381 REQ 5: The mechanism should not assume that it will only be deployed 1382 in environments with completely trusted elements. It should seek to 1383 operate as effectively as possible in environments where other 1384 elements are malicious; this includes preventing malicious elements 1385 from obtaining more than a fair share of service. 1387 Meeting REQ 5: Partially. Since overload control information is 1388 shared between a pair of communicating entities, a confidential and 1389 authenticated channel can be used for this communication. However, 1390 if such a channel is not available, then the security ramifications 1391 outlined in Section 10 apply. 1393 REQ 6: When overload is signaled by means of a specific message, the 1394 message must clearly indicate that it is being sent because of 1395 overload, as opposed to other, non overload-based failure conditions. 1396 This requirement is meant to avoid some of the problems that have 1397 arisen from the reuse of the 503 response code for multiple purposes. 1398 Of course, overload is also signaled by lack of response to requests. 1399 This requirement applies only to explicit overload signals. 1401 Meeting REQ 6: Not applicable. Overload control information is 1402 signaled as part of the Via header and not in a new header. 1404 REQ 7: The mechanism shall provide a way for an element to throttle 1405 the amount of traffic it receives from an upstream element. This 1406 throttling shall be graded so that it is not all-or-nothing as with 1407 the current 503 mechanism. This recognizes the fact that "overload" 1408 is not a binary state and that there are degrees of overload. 1410 Meeting REQ 7: Yes, please see Section 5.5 and Section 5.10. 1412 REQ 8: The mechanism shall ensure that, when a request was not 1413 processed successfully due to overload (or failure) of a downstream 1414 element, the request will not be retried on another element that is 1415 also overloaded or whose status is unknown.
This requirement derives 1416 from REQ 1. 1418 Meeting REQ 8: Partially. A SIP client that has overload information 1419 from multiple downstream servers will not retry the request on 1420 another element. However, if a SIP client does not know the overload 1421 status of a downstream server, it may send the request to that 1422 server. 1424 REQ 9: That a request has been rejected from an overloaded element 1425 shall not unduly restrict the ability of that request to be submitted 1426 to and processed by an element that is not overloaded. This 1427 requirement derives from REQ 1. 1429 Meeting REQ 9: Yes, a SIP client conformant to this specification 1430 will send the request to a different element. 1432 REQ 10: The mechanism should support servers that receive requests 1433 from a large number of different upstream elements, where the set of 1434 upstream elements is not enumerable. 1436 Meeting REQ 10: Yes, there are no constraints on the number of 1437 upstream clients. 1439 REQ 11: The mechanism should support servers that receive requests 1440 from a finite set of upstream elements, where the set of upstream 1441 elements is enumerable. 1443 Meeting REQ 11: Yes, there are no constraints on the number of 1444 upstream clients. 1446 REQ 12: The mechanism should work between servers in different 1447 domains. 1449 Meeting REQ 12: Yes, there are no inherent limitations on using 1450 overload control between domains. 1452 REQ 13: The mechanism must not dictate a specific algorithm for 1453 prioritizing the processing of work within a proxy during times of 1454 overload. It must permit a proxy to prioritize requests based on any 1455 local policy, so that certain ones (such as a call for emergency 1456 services or a call with a specific value of the Resource-Priority 1457 header field [RFC4412]) are given preferential treatment, such as not 1458 being dropped, being given additional retransmission, or being 1459 processed ahead of others. 
1461 Meeting REQ 13: Yes, please see Section 5.10. 1463 REQ 14: The mechanism should provide unambiguous directions 1464 to clients on when they should retry a request and when they should 1465 not. This especially applies to TCP connection establishment and SIP 1466 registrations, in order to mitigate against avalanche restart. 1468 Meeting REQ 14: Yes, Section 5.9 provides normative behavior on when 1469 to retry a request after repeated timeouts and fatal transport errors 1470 resulting from communications with a non-responsive downstream SIP 1471 server. 1473 REQ 15: In cases where a network element fails, is so overloaded that 1474 it cannot process messages, or cannot communicate due to a network 1475 failure or network partition, it will not be able to provide explicit 1476 indications of the nature of the failure or its levels of congestion. 1477 The mechanism must properly function in these cases. 1479 Meeting REQ 15: Yes, Section 5.9 provides normative behavior on when 1480 to retry a request after repeated timeouts and fatal transport errors 1481 resulting from communications with a non-responsive downstream SIP 1482 server. 1484 REQ 16: The mechanism should attempt to minimize the overhead of the 1485 overload control messaging. 1487 Meeting REQ 16: Yes, overload control messages are sent in the 1488 topmost Via header, which is always processed by the SIP elements. 1490 REQ 17: The overload mechanism must not provide an avenue for 1491 malicious attack, including DoS and DDoS attacks. 1493 Meeting REQ 17: Partially. Since overload control information is 1494 shared between a pair of communicating entities, a confidential and 1495 authenticated channel can be used for this communication. However, 1496 if such a channel is not available, then the security ramifications 1497 outlined in Section 10 apply.
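REQ 16 above notes that overload control information travels in the topmost Via header. A minimal sketch of extracting the four parameters registered in Section 11 could look as follows. The function name parse_oc_params, the example host name, and all parameter values are hypothetical, and the naive split on ";" assumes that no quoted string in the header contains a semicolon, which a full SIP parser would not assume.

```python
# Names of the Via parameters registered in the IANA Considerations.
OC_PARAMS = {"oc", "oc-validity", "oc-seq", "oc-algo"}


def parse_oc_params(via_value):
    """Return the overload control parameters found in one Via value.

    via_value: the text after "Via:" for a single (topmost) Via entry.
    Returns a dict mapping parameter name to its raw string value; a
    parameter present without "=value" maps to the empty string.
    """
    found = {}
    # The first ';'-separated field is protocol and sent-by; skip it.
    for part in via_value.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        name = name.strip().lower()
        if name in OC_PARAMS:
            # Drop optional surrounding double quotes.
            found[name] = value.strip().strip('"')
    return found


if __name__ == "__main__":
    via = ('SIP/2.0/TCP proxy.example.com;branch=z9hG4bK2d4790;'
           'oc=20;oc-algo="loss";oc-validity=500;oc-seq=1282321615.782')
    print(parse_oc_params(via))
```

Values are returned as strings on purpose, so that the caller decides how to interpret them for the algorithm named in "oc-algo".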
1499 REQ 18: The overload mechanism should be unambiguous about whether a 1500 load indication applies to a specific IP address, host, or URI, so 1501 that an upstream element can determine the load of the entity to 1502 which a request is to be sent. 1504 Meeting REQ 18: Yes, please see discussion in Section 5.5. 1506 REQ 19: The specification for the overload mechanism should give 1507 guidance on which message types might be desirable to process over 1508 others during times of overload, based on SIP-specific 1509 considerations. For example, it may be more beneficial to process a 1510 SUBSCRIBE refresh with Expires of zero than a SUBSCRIBE refresh with 1511 a non-zero expiration (since the former reduces the overall amount of 1512 load on the element), or to process re-INVITEs over new INVITEs. 1514 Meeting REQ 19: Yes, please see Section 5.10. 1516 REQ 20: In a mixed environment of elements that do and do not 1517 implement the overload mechanism, no disproportionate benefit shall 1518 accrue to the users or operators of the elements that do not 1519 implement the mechanism. 1521 Meeting REQ 20: Yes, an element that does not implement overload 1522 control does not receive any measure of extra benefit. 1524 REQ 21: The overload mechanism should ensure that the system remains 1525 stable. When the offered load drops from above the overall capacity 1526 of the network to below the overall capacity, the throughput should 1527 stabilize and become equal to the offered load. 1529 Meeting REQ 21: Yes, the overload control mechanism described in this 1530 draft ensures the stability of the system. 1532 REQ 22: It must be possible to disable the reporting of load 1533 information towards upstream targets based on the identity of those 1534 targets. This allows a domain administrator who considers the load 1535 of their elements to be sensitive information, to restrict access to 1536 that information. 
Of course, in such cases, there is no expectation 1537 that the overload mechanism itself will help prevent overload from 1538 that upstream target. 1540 Meeting REQ 22: Yes, an operator of a SIP server can configure the 1541 SIP server to only report overload control information for requests 1542 received over a confidential channel, for example. However, note 1543 that this requirement is in conflict with REQ 3, as it introduces a 1544 modicum of extra configuration. 1546 REQ 23: It must be possible for the overload mechanism to work in 1547 cases where there is a load balancer in front of a farm of proxies. 1549 Meeting REQ 23: Yes. Depending on the type of load balancer, this 1550 requirement is met. A load balancer fronting a farm of SIP proxies 1551 could be a SIP-aware load balancer or one that is not SIP-aware. If 1552 the load balancer is SIP-aware, it can make conscious decisions on 1553 throttling outgoing traffic towards the individual server in the farm 1554 based on the overload control parameters returned by the server. On 1555 the other hand, if the load balancer is not SIP-aware, then there are 1556 other strategies to perform overload control. Section 6 of [RFC6357] 1557 documents some of these strategies in more detail (see discussion 1558 related to Figure 3(a) in Section 6). 1560 Authors' Addresses 1562 Vijay K. Gurbani (editor) 1563 Bell Laboratories, Alcatel-Lucent 1564 1960 Lucent Lane, Rm 9C-533 1565 Naperville, IL 60563 1566 USA 1568 Email: vkg@bell-labs.com 1570 Volker Hilt 1571 Bell Laboratories, Alcatel-Lucent 1572 791 Holmdel-Keyport Rd 1573 Holmdel, NJ 07733 1574 USA 1576 Email: volkerh@bell-labs.com 1578 Henning Schulzrinne 1579 Columbia University/Department of Computer Science 1580 450 Computer Science Building 1581 New York, NY 10027 1582 USA 1584 Phone: +1 212 939 7004 1585 Email: hgs@cs.columbia.edu 1586 URI: http://www.cs.columbia.edu