idnits 2.17.1

draft-ietf-soc-overload-control-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 742 has weird spacing: '...control param...'

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (March 3, 2014) is 3707 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 845

  -- Looks like a reference, but probably isn't: '100' on line 845

  == Outdated reference: A later version (-10) exists of
     draft-ietf-soc-overload-rate-control-07

     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    SOC Working Group                                        V. Gurbani, Ed.
3    Internet-Draft                                                   V. Hilt
4    Intended status: Standards Track                      Bell Laboratories,
5    Expires: September 4, 2014                                Alcatel-Lucent
6                                                              H. Schulzrinne
7                                                         Columbia University
8                                                               March 3, 2014

10              Session Initiation Protocol (SIP) Overload Control
11                      draft-ietf-soc-overload-control-15

13   Abstract

15      Overload occurs in Session Initiation Protocol (SIP) networks when
16      SIP servers have insufficient resources to handle all SIP messages
17      they receive. Even though the SIP protocol provides a limited
18      overload control mechanism through its 503 (Service Unavailable)
19      response code, SIP servers are still vulnerable to overload. This
20      document defines the behaviour of SIP servers involved in overload
21      control, and in addition, it specifies a loss-based overload scheme
22      for SIP.

24   Status of this Memo

26      This Internet-Draft is submitted in full conformance with the
27      provisions of BCP 78 and BCP 79.

29      Internet-Drafts are working documents of the Internet Engineering
30      Task Force (IETF). Note that other groups may also distribute
31      working documents as Internet-Drafts. The list of current Internet-
32      Drafts is at http://datatracker.ietf.org/drafts/current/.

34      Internet-Drafts are draft documents valid for a maximum of six months
35      and may be updated, replaced, or obsoleted by other documents at any
36      time. It is inappropriate to use Internet-Drafts as reference
37      material or to cite them other than as "work in progress."

39      This Internet-Draft will expire on September 4, 2014.

41   Copyright Notice

43      Copyright (c) 2014 IETF Trust and the persons identified as the
44      document authors. All rights reserved.

46      This document is subject to BCP 78 and the IETF Trust's Legal
47      Provisions Relating to IETF Documents
48      (http://trustee.ietf.org/license-info) in effect on the date of
49      publication of this document. Please review these documents
50      carefully, as they describe your rights and restrictions with respect
51      to this document. Code Components extracted from this document must
52      include Simplified BSD License text as described in Section 4.e of
53      the Trust Legal Provisions and are provided without warranty as
54      described in the Simplified BSD License.

56   Table of Contents

58      1.  Introduction
59      2.  Terminology
60      3.  Overview of operations
61      4.  Via header parameters for overload control
62        4.1.  The oc parameter
63        4.2.  The oc-algo parameter
64        4.3.  The oc-validity parameter
65        4.4.  The oc-seq parameter
66      5.  General behaviour
67        5.1.  Determining support for overload control
68        5.2.  Creating and updating the overload control parameters
69        5.3.  Determining the 'oc' Parameter Value
70        5.4.  Processing the Overload Control Parameters
71        5.5.  Using the Overload Control Parameter Values
72        5.6.  Forwarding the overload control parameters
73        5.7.  Terminating overload control
74        5.8.  Stabilizing overload algorithm selection
75        5.9.  Self-Limiting
76        5.10. Responding to an Overload Indication
77          5.10.1.  Message prioritization at the hop before the
78                   overloaded server
79          5.10.2.  Rejecting requests at an overloaded server
80        5.11. 100-Trying provisional response and overload control
81              parameters
82      6.  Example
83      7.  The loss-based overload control scheme
84        7.1.  Special parameter values for loss-based overload
85              control
86        7.2.  Default algorithm for loss-based overload control
87      8.  Relationship with other IETF SIP load control efforts
88      9.  Syntax
89      10. Design Considerations
90        10.1. SIP Mechanism
91          10.1.1.  SIP Response Header
92          10.1.2.  SIP Event Package
93        10.2. Backwards Compatibility
94      11. Security Considerations
95      12. IANA Considerations
96      13. References
97        13.1. Normative References
98        13.2. Informative References
99      Appendix A.  Acknowledgements
100     Appendix B.  RFC5390 requirements
101     Authors' Addresses

103  1.  Introduction

105     As with any network element, a Session Initiation Protocol (SIP)
106     [RFC3261] server can suffer from overload when the number of SIP
107     messages it receives exceeds the number of messages it can process.
108     Overload can pose a serious problem for a network of SIP servers.
109     During periods of overload, the throughput of a network of SIP
110     servers can be significantly degraded. In fact, overload may lead to
111     a situation where the retransmissions of dropped SIP messages may
112     overwhelm the capacity of the network. This is often called
113     congestion collapse.

115     Overload is said to occur if a SIP server does not have sufficient
116     resources to process all incoming SIP messages. These resources may
117     include CPU processing capacity, memory, input/output, or disk
118     resources.

120     For overload control, this document only addresses failure cases
121     where SIP servers are unable to process all SIP requests due to
122     resource constraints. There are other cases where a SIP server can
123     successfully process incoming requests but has to reject them due to
124     failure conditions unrelated to the SIP server being overloaded. For
125     example, a PSTN gateway that runs out of trunks but still has plenty
126     of capacity to process SIP messages should reject incoming INVITEs
127     using a 488 (Not Acceptable Here) response [RFC4412]. Similarly, a
128     SIP registrar that has lost connectivity to its registration database
129     but is still capable of processing SIP requests should reject
130     REGISTER requests with a 500 (Server Error) response [RFC3261].
131     Overload control does not apply to these cases and SIP provides
132     appropriate response codes for them.

134     The SIP protocol provides a limited mechanism for overload control
135     through its 503 (Service Unavailable) response code. However, this
136     mechanism cannot prevent overload of a SIP server and it cannot
137     prevent congestion collapse. In fact, the use of the 503 (Service
138     Unavailable) response code may cause traffic to oscillate and to
139     shift between SIP servers and thereby worsen an overload condition.
140     A detailed discussion of the SIP overload problem, the problems with
141     the 503 (Service Unavailable) response code and the requirements for
142     a SIP overload control mechanism can be found in [RFC5390].
144     This document defines the protocol for communicating overload
145     information between SIP servers and clients, so that clients can
146     reduce the volume of traffic sent to overloaded servers, avoiding
147     congestion collapse and increasing useful throughput. Section 4
148     describes the Via header parameters used for this communication. The
149     general behaviour of SIP servers and clients involved in overload
150     control is described in Section 5. In addition, Section 7 specifies
151     a loss-based overload control scheme.

153     This document specifies the loss-based overload control scheme
154     (Section 7), which is mandatory-to-implement for this specification.
155     In addition, this document allows other overload control schemes to
156     be supported as well. To do so effectively, the expectations and
157     primitive protocol parameters common to all classes of overload
158     control schemes are specified in this document.

160  2.  Terminology

162     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
163     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
164     document are to be interpreted as described in RFC 2119 [RFC2119].

166     In this document, the terms "SIP client" and "SIP server" are used in
167     their generic forms. Thus, a "SIP client" could refer to the client
168     transaction state machine in a SIP proxy or it could refer to a user
169     agent client. Similarly, a "SIP server" could be a user agent server
170     or the server transaction state machine in a proxy. Various
171     permutations of this are also possible, for instance, SIP clients and
172     servers could also be part of back-to-back user agents (B2BUAs).

174     However, irrespective of the context (i.e., proxy, B2BUA, UAS, UAC)
175     these terms are used in, "SIP client" applies to any SIP entity that
176     provides overload control to traffic destined downstream. Similarly,
177     "SIP server" applies to any SIP entity that is experiencing overload
178     and would like its upstream neighbour to throttle incoming traffic.

180     Unless otherwise specified, all SIP entities described in this
181     document are assumed to support this specification.

183     The normative statements in this specification as they apply to SIP
184     clients and SIP servers assume that both the SIP clients and SIP
185     servers support this specification. If, for instance, only a SIP
186     client supports this specification and not the SIP server, then it
187     follows that the normative statements in this specification pertinent
188     to the behavior of a SIP server do not apply to the server that does
189     not support this specification.

191  3.  Overview of operations

193     This section provides an overview of how the overload control
194     mechanism operates by introducing the overload control parameters.
195     Section 4 provides more details and normative behavior on the
196     parameters listed below.

198     Because overload control is performed hop-by-hop, the Via parameter
199     is attractive since it allows two adjacent SIP entities to indicate
200     support for, and exchange information associated with, overload
201     control [RFC6357]. Additional advantages of this choice are
202     discussed in Section 10.1.1. An alternative mechanism using SIP
203     event packages was also considered, and the characteristics of that
204     choice are further outlined in Section 10.1.2.

206     This document defines four new parameters for the SIP Via header for
207     overload control. These parameters provide a mechanism for conveying
208     overload control information between adjacent SIP entities. The "oc"
209     parameter is used by a SIP server to indicate a reduction in the
210     number of requests arriving at the server. The "oc-algo" parameter
211     contains a token or a list of tokens corresponding to the class of
212     overload control algorithms supported by the client. The server
213     chooses one algorithm from this list. The "oc-validity" parameter
214     establishes a time limit for which overload control is in effect, and
215     the "oc-seq" parameter aids in sequencing the responses at the
216     client. These parameters are discussed in detail in the next
217     section.

219  4.  Via header parameters for overload control

221     The four Via header parameters are introduced below. Further context
222     about how to interpret these under various conditions is provided in
223     Section 5.

225  4.1.  The oc parameter

227     This parameter is inserted by the SIP client and updated by the SIP
228     server.

230     A SIP client MUST add an "oc" parameter to the topmost Via header it
231     inserts into every SIP request. This provides an indication to
232     downstream neighbors that the client supports overload control.
233     There MUST NOT be a value associated with the parameter (the value
234     will be added by the server).

236     The downstream server MUST add a value to the "oc" parameter in the
237     response going upstream to a client that included the "oc" parameter
238     in the request. Inclusion of a value in the parameter represents two
239     things. First, upon the first contact (see Section 5.1), addition of
240     a value by the server to this parameter indicates (to the client)
241     that the downstream server supports overload control as defined in
242     this document. Second, if overload control is active, then it
243     indicates the level of control to be applied.

245     When a SIP client receives a response with the value in the "oc"
246     parameter filled in, it MUST reduce, as indicated by the "oc" and
247     "oc-algo" parameters, the number of requests going downstream to the
248     SIP server from which it received the response (see Section 5.10 for
249     pertinent discussion on traffic reduction).

251  4.2.  The oc-algo parameter

253     This parameter is inserted by the SIP client and updated by the SIP
254     server.
256     A SIP client MUST add an "oc-algo" parameter to the topmost Via
257     header it inserts into every SIP request, with a default value of
258     "loss".

260     This parameter contains names of one or more classes of overload
261     control algorithms. A SIP client MUST support the loss-based
262     overload control scheme and MUST insert at least the token "loss" as
263     one of the "oc-algo" parameter values. In addition, the SIP client
264     MAY insert other tokens, separated by a comma, in the "oc-algo"
265     parameter if it supports other overload control schemes such as a
266     rate-based scheme ([I-D.ietf-soc-overload-rate-control]). Each
267     element in the comma-separated list corresponds to the class of
268     overload control algorithms supported by the SIP client. When more
269     than one class of overload control algorithms is present in the "oc-
270     algo" parameter, the client may indicate algorithm preference by
271     ordering the list in a decreasing order of preference. However, the
272     client cannot assume that the server will pick the most preferred
273     algorithm.

275     When a downstream SIP server receives a request with multiple
276     overload control algorithms specified in the "oc-algo" parameter
277     (optionally sorted by decreasing order of preference), it chooses one
278     algorithm from the list and MUST return the single selected algorithm
279     to the client.

281     Once the SIP server has chosen, and communicated to the client, a
282     mutually agreeable class of overload control algorithm, the selection
283     stays in effect until such time that the algorithm is changed by the
284     server. Furthermore, the client MUST continue to include all the
285     supported algorithms in subsequent requests; the server MUST respond
286     with the agreed-to algorithm until such time that the algorithm is
287     changed by the server. The selection SHOULD stay the same for a non-
288     trivial duration of time to allow the overload control algorithm to
289     stabilize its behaviour (see Section 5.8).
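The algorithm negotiation described above can be illustrated with a short Python sketch. It is not part of the specification. The function names, the `supported` tuple, and the policy of honouring the client's preference order are assumptions made for illustration only; as the text notes, a server is free to pick any offered class.

```python
from typing import List, Optional, Sequence


def parse_oc_algo(value: str) -> List[str]:
    """Split an oc-algo value such as '"loss,rate"' into its tokens."""
    inner = value.strip().strip('"')
    return [tok.strip() for tok in inner.split(",") if tok.strip()]


def choose_algorithm(offered: Sequence[str],
                     supported: Sequence[str] = ("loss",),
                     previous: Optional[str] = None) -> str:
    """Pick the single algorithm class to return in the response.

    offered   -- tokens from the client's oc-algo list, most preferred first
    supported -- classes this hypothetical server implements
    previous  -- class already agreed with this client, if any
    """
    # Keep an existing selection stable while the client still offers it.
    if previous is not None and previous in offered:
        return previous
    # Assumed policy: first offered class that the server also supports.
    for token in offered:
        if token in supported:
            return token
    # Should not happen when both sides implement "loss" as required.
    raise ValueError("client did not offer a supported algorithm class")
```

Because every client must offer "loss" and every implementation must support it, the loop finds a match whenever both sides follow the specification.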
291     The "oc-algo" parameter does not define the exact algorithm to be
292     used for traffic reduction; rather, the intent is to use any
293     algorithm from a specific class of algorithms that affect traffic
294     reduction similarly. For example, the reference algorithm in
295     Section 7.2 can be used as a loss-based algorithm, or it can be
296     substituted by any other loss-based algorithm that results in
297     equivalent traffic reduction.

299  4.3.  The oc-validity parameter

301     This parameter MAY be inserted by the SIP server in a response; it
302     MUST NOT be inserted by the SIP client in a request.

304     This parameter contains a value that indicates an interval of time
305     (measured in milliseconds) that the load reduction specified in the
306     value of the "oc" parameter should be in effect. The default value
307     of the "oc-validity" parameter is 500 (milliseconds). If the client
308     receives a response with the "oc" and "oc-algo" parameters suitably
309     filled in, but no "oc-validity" parameter, the SIP client should
310     behave as if it had received "oc-validity=500".

312     A value of 0 in the "oc-validity" parameter is reserved to denote the
313     event that the server wishes to stop overload control, or to indicate
314     that it supports overload control, but is not currently requesting
315     any reduction in traffic (see Section 5.7).

317     A non-zero value for the "oc-validity" parameter MUST only be present
318     in conjunction with an "oc" parameter. A SIP client MUST discard a
319     non-zero value of the "oc-validity" parameter if the client receives
320     it in a response without the corresponding "oc" parameter being
321     present as well.

323     After the value specified in the "oc-validity" parameter expires and
324     until the SIP client receives an updated set of overload control
325     parameters from the SIP server, overload control is not in effect
326     between the client and the downstream SIP server.

328  4.4.  The oc-seq parameter

330     This parameter MUST be inserted by the SIP server in a response; it
331     MUST NOT be inserted by the SIP client in a request.

333     This parameter contains an unsigned integer value that indicates the
334     sequence number associated with the "oc" parameter. This sequence
335     number is used to differentiate two "oc" parameter values generated
336     by an overload control algorithm at two different instants in time.
337     "oc" parameter values generated by an overload control algorithm at
338     time t and t+1 MUST have an increasing value in the "oc-seq"
339     parameter. This allows the upstream SIP client to properly collate
340     out-of-order responses.

342     A timestamp can be used as a value of the "oc-seq" parameter.

344     If the value contained in the "oc-seq" parameter overflows during the
345     period in which the load reduction is in effect, then the "oc-seq"
346     parameter MUST be reset to the current timestamp or an appropriate
347     base value.

349        A client implementation can recognize that an overflow has
350        occurred when it receives an "oc-seq" parameter whose value is
351        significantly less than several previous values. (Note that an
352        "oc-seq" parameter whose value does not deviate significantly from
353        the last several previous values is symptomatic of a tardy packet.
354        However, overflow will cause an "oc-seq" parameter value
355        to be significantly less than the last several values.) If an
356        overflow is detected, then the client should use the overload
357        parameters in the new message, even though the sequence number is
358        lower. The client should also reset any internal state to reflect
359        the overflow so that future messages (following the overflow) will
360        be accepted.

362  5.  General behaviour

364     When forwarding a SIP request, a SIP client uses the SIP procedures
365     of [RFC3263] to determine the next hop SIP server. The procedures of
366     [RFC3263] take as input a SIP URI, extract the domain portion of that
367     URI for use as a lookup key, and query the Domain Name Service (DNS)
368     to obtain an ordered set of one or more IP addresses with a port
369     number and transport corresponding to each IP address in this set
370     (the "Expected Output").

372     After selecting a specific SIP server from the Expected Output, a SIP
373     client determines whether overload controls are currently active with
374     that server. If overload controls are currently active (and the "oc-
375     validity" period has not yet expired), the client applies the
376     relevant algorithm to determine whether or not to send the SIP
377     request to the server. If overload controls are not currently active
378     with this server (which will be the case if this is the initial
379     contact with the server, or the last response from this server had
380     "oc-validity=0", or the time period indicated by the "oc-validity"
381     parameter has expired), the SIP client sends the SIP message to the
382     server without invoking any overload control algorithm.

384  5.1.  Determining support for overload control

386     If a client determines that this is the first contact with a server,
387     the client MUST insert the "oc" parameter without any value, and MUST
388     insert the "oc-algo" parameter with a list of algorithms it supports.

390     This list MUST include "loss" and MAY include other algorithm names
391     approved by IANA and described in corresponding documents. The
392     client transmits the request to the chosen server.

394     If a server receives a SIP request containing the "oc" and "oc-algo"
395     parameters, the server MUST determine if it has already selected the
396     overload control algorithm class with this client. If it has, the
397     server SHOULD use the previously selected algorithm class in its
398     response to the message. If the server determines that the message
399     is from a new client, or a client the server has not heard from in a
400     long time, the server MUST choose one algorithm from the list of
401     algorithms in the "oc-algo" parameter. It MUST put the chosen
402     algorithm as the sole parameter value in the "oc-algo" parameter of
403     the response it sends to the client. In addition, if the server is
404     currently not in an overload condition, it MUST set the value of the
405     "oc" parameter to be 0 and MAY insert an "oc-validity=0" parameter in
406     the response to further qualify the value in the "oc" parameter. If
407     the server is currently overloaded, it MUST follow the procedures of
408     Section 5.2.

410        A client that supports the rate-based overload control scheme
411        [I-D.ietf-soc-overload-rate-control] will consider "oc=0" as an
412        indication not to send any requests downstream at all. Thus, when
413        the server inserts "oc-validity=0" as well, it is indicating that
414        it does support overload control, but it is not under overload
415        mode right now (see Section 5.7).

417  5.2.  Creating and updating the overload control parameters

419     A SIP server provides overload control feedback to its upstream
420     clients by providing a value for the "oc" parameter to the topmost
421     Via header field of a SIP response, that is, the Via header added by
422     the client before it sent the request to the server.

424     Since the topmost Via header of a response will be removed by an
425     upstream client after processing it, overload control feedback
426     contained in the "oc" parameter will not travel beyond the upstream
427     SIP client. A Via header parameter therefore provides hop-by-hop
428     semantics for overload control feedback (see [RFC6357]) even if the
429     next hop neighbor does not support this specification.
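As a rough illustration of the feedback path described above, the following Python sketch fills in the overload control parameters on the topmost Via value that a server copies into its response. The function name, the example host name, and the exact textual form of the parameters (in particular the quoted "oc-algo" value) are assumptions; the normative grammar is given in the Syntax section of this document. The parser is deliberately naive and assumes that no parameter value contains a semicolon.

```python
OC_PARAMS = ("oc", "oc-algo", "oc-validity", "oc-seq")


def add_overload_feedback(via: str, oc: int, algo: str,
                          validity_ms: int, seq: int) -> str:
    """Return the topmost Via value with overload feedback filled in.

    The Via value is returned unchanged when the client did not include
    a bare "oc" parameter, i.e. did not signal overload control support.
    """
    parts = [p.strip() for p in via.split(";")]
    sent_by, params = parts[0], parts[1:]
    names = [p.split("=", 1)[0].strip().lower() for p in params]
    if "oc" not in names:
        return via
    # Drop the client's placeholders, keep everything else (e.g. branch).
    kept = [p for p, n in zip(params, names) if n not in OC_PARAMS]
    feedback = [
        f"oc={oc}",
        f'oc-algo="{algo}"',          # single selected algorithm class
        f"oc-validity={validity_ms}",
        f"oc-seq={seq}",
    ]
    return ";".join([sent_by] + kept + feedback)


# Hypothetical request Via from an upstream client.
request_via = 'SIP/2.0/TLS p1.example.net;branch=z9hG4bK2d4790.1;oc;oc-algo="loss,A"'
response_via = add_overload_feedback(request_via, 20, "loss", 500, 1)
```

Only the topmost Via is touched, which is what keeps the feedback hop-by-hop: the upstream client removes that header before passing the response on.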
431     The "oc" parameter can be used in all response types, including
432     provisional, success and failure responses (please see Section 5.11
433     for special consideration on transporting overload control parameters
434     in a 100-Trying response). A SIP server can update the "oc"
435     parameter in a response, asking the client to increase or decrease
436     the number of requests destined to the server, or to stop performing
437     overload control altogether.

439     A SIP server that has updated the "oc" parameter SHOULD also add an
440     "oc-validity" parameter. The "oc-validity" parameter defines the
441     time in milliseconds during which the overload control feedback
442     specified in the "oc" parameter is valid. The default value of the
443     "oc-validity" parameter is 500 (milliseconds).

445     When a SIP server retransmits a response, it SHOULD use the "oc"
446     parameter value and "oc-validity" parameter value consistent with the
447     overload state at the time the retransmitted response is sent. This
448     implies that the values in the "oc" and "oc-validity" parameters may
449     be different than the ones used in previous retransmissions of the
450     response. Due to the fact that responses sent over UDP may be
451     subject to delays in the network and arrive out of order, the "oc-
452     seq" parameter aids in detecting a stale "oc" parameter value.

454     Implementations that are capable of updating the "oc" and "oc-
455     validity" parameter values during retransmissions MUST insert the
456     "oc-seq" parameter. The value of this parameter MUST be a set of
457     numbers drawn from an increasing sequence.
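One way to obtain values "drawn from an increasing sequence" is sketched below in Python. The class name and the choice of whole seconds as the base are assumptions for illustration; Section 4.4 only states that a timestamp can be used. The generator falls back to a simple increment whenever the clock has not advanced, so two values produced in the same second still differ.

```python
import time
from typing import Callable


class OcSeqGenerator:
    """Produce strictly increasing oc-seq values seeded from a clock."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._last = 0

    def next(self) -> int:
        candidate = int(self._clock())      # whole seconds since the epoch
        if candidate <= self._last:
            # Clock did not advance (or stepped back): keep increasing.
            candidate = self._last + 1
        self._last = candidate
        return candidate
```

A server would call `next()` each time its control algorithm produces a new "oc" value, and reuse the most recent number for responses that carry an unchanged value.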
459 Implementations that are not capable of updating the "oc" and "oc- 460 validity" parameter values during retransmissions --- or 461 implementations that do not want to do so because they will have to 462 regenerate the message to be retransmitted --- MUST still insert a 463 "oc-seq" parameter in the first response associated with a 464 transaction; however, they do not have to update the value in 465 subsequent retransmissions. 467 The "oc-validity" and "oc-seq" Via header parameters are only defined 468 in SIP responses and MUST NOT be used in SIP requests. These 469 parameters are only useful to the upstream neighbor of a SIP server 470 (i.e., the entity that is sending requests to the SIP server) since 471 the client is the entity that can offload traffic by redirecting or 472 rejecting new requests. If requests are forwarded in both directions 473 between two SIP servers (i.e., the roles of upstream/downstream 474 neighbors change), there are also responses flowing in both 475 directions. Thus, both SIP servers can exchange overload 476 information. 478 This specification provides a good overload control mechanism that 479 can protect a SIP server from overload. However, if a SIP server 480 wanted to limit advertisements of overload control capability for 481 privacy reasons, it might decide to perform overload control only for 482 requests that are received on a secure transport, such as TLS. 483 Indicating support for overload control on a request received on an 484 untrusted link can leak privacy in the form of capabilities supported 485 by the server. To limit the knowledge that the server supports 486 overload control, a server can adopt a policy of inserting overload 487 control parameters in only those requests received over trusted links 488 such that these parameters are only visible to trusted neighbours. 490 5.3. 
Determining the 'oc' Parameter Value 492 The value of the "oc" parameter is determined by the overloaded 493 server using any pertinent information at its disposal. The only 494 constraint imposed by this document is that the server control 495 algorithm MUST produce a value for the "oc" parameter that it expects 496 the receiving SIP clients to apply to all downstream SIP requests 497 (dialogue forming as well as in-dialogue) to this SIP server. Beyond 498 this stipulation, the process by which an overloaded server 499 determines the value of the "oc" parameter is considered out of scope 500 for this document. 502 Note that this stipulation is required so that both the client and 503 server have an common view of which messages the overload control 504 applies to. With this stipulation in place, the client can 505 prioritize messages as discussed in Section 5.10.1. 507 As an example, a value of "oc=10" when the loss-based algorithm is 508 used implies that 10% of the total number of SIP requests (dialog 509 forming as well as in-dialogue) are subject to reduction at the 510 client. Analogously, a value of "oc=10" when the rate-based 511 algorithm [I-D.ietf-soc-overload-rate-control] is used indicates that 512 the client should send SIP requests at a rate of 10 SIP requests or 513 fewer per second. 515 5.4. Processing the Overload Control Parameters 517 A SIP client SHOULD remove "oc", "oc-validity" and "oc-seq" 518 parameters from all Via headers of a response received, except for 519 the topmost Via header. This prevents overload control parameters 520 that were accidentally or maliciously inserted into Via headers by a 521 downstream SIP server from traveling upstream. 523 The scope of overload control applies to unique combinations of IP 524 and port values. 
A SIP client maintains the overload control values 525 received (along with the address and port number of the SIP servers 526 from which they were received) for the duration specified in the "oc- 527 validity" parameter or the default duration. Each time a SIP client 528 receives a response with overload control parameter from a downstream 529 SIP server, it compares the "oc-seq" value extracted from the Via 530 header with the "oc-seq" value stored for this server. If these 531 values match, the response does not update the overload control 532 parameters related to this server and the client continues to provide 533 overload control as previously negotiated. If the "oc-seq" value 534 extracted from the Via header is larger than the stored value, the 535 client updates the stored values by copying the new values of "oc", 536 "oc-algo" and "oc-seq" parameters from the Via header to the stored 537 values. Upon such an update of the overload control parameters, the 538 client restarts the validity period of the new overload control 539 parameters. The overload control parameters now remain in effect 540 until the validity period expires or the parameters are updated in a 541 new response. Stored overload control parameters MUST be reset to 542 default values once the validity period has expired (see Section 5.7 543 for the detailed steps on terminating overload control). 545 5.5. Using the Overload Control Parameter Values 547 A SIP client MUST honor overload control values it receives from 548 downstream neighbors. The SIP client MUST NOT forward more requests 549 to a SIP server than allowed by the current "oc" and "oc-algo" 550 parameter values from that particular downstream server. 552 When forwarding a SIP request, a SIP client uses the SIP procedures 553 of [RFC3263] to determine the next hop SIP server. 
   The procedures of [RFC3263] take as input a SIP URI, extract the
   domain portion of that URI for use as a lookup key, and query the
   Domain Name System (DNS) to obtain an ordered set of one or more IP
   addresses with a port number and transport corresponding to each IP
   address in this set (the "Expected Output").

   After selecting a specific SIP server from the Expected Output, the
   SIP client determines if it already has overload control parameter
   values for the server chosen from the Expected Output.  If the SIP
   client has a non-expired "oc" parameter value for the server chosen
   from the Expected Output, then this chosen server is operating in
   overload control mode.  Thus, the SIP client determines if it can or
   cannot forward the current request to the SIP server based on the
   "oc" and "oc-algo" parameters and any relevant local policy.

   The particular algorithm used to determine whether or not to forward
   a particular SIP request is a matter of local policy, and may take
   into account a variety of prioritization factors.  However, this
   local policy SHOULD transmit the same number of SIP requests as the
   sample algorithm defined by the overload control scheme being used.
   (See Section 7.2 for the default loss-based overload control
   algorithm.)

5.6.  Forwarding the overload control parameters

   Overload control is defined in a hop-by-hop manner.  Therefore,
   forwarding the contents of the overload control parameters is
   generally NOT RECOMMENDED and should only be performed if permitted
   by the configuration of SIP servers.  This means that a SIP proxy
   SHOULD strip the overload control parameters inserted by the client
   before proxying the request further downstream.  Of course, when the
   proxy acts as a client and proxies the request downstream, it is free
   to add overload control parameters pertinent to itself in the Via
   header it inserted in the request.

5.7.  Terminating overload control

   A SIP client removes overload control if one of the following events
   occurs:

   1.  The "oc-validity" period previously received by the client from
       this server (or the default value of 500 ms if the server did
       not previously specify an "oc-validity" parameter) expires;
   2.  The client is explicitly told by the server to stop performing
       overload control using the "oc-validity=0" parameter.

   A SIP server can decide to terminate overload control by explicitly
   signaling the client.  To do so, the SIP server MUST set the value of
   the "oc-validity" parameter to 0.  The SIP server MUST increment the
   value of "oc-seq", and SHOULD set the value of the "oc" parameter to
   0.

   Note that the loss-based overload control scheme (Section 7) can
   effectively stop overload control by setting the value of the "oc"
   parameter to 0.  However, the rate-based scheme
   ([I-D.ietf-soc-overload-rate-control]) needs an additional piece
   of information in the form of "oc-validity=0".

   When the client receives a response with a higher "oc-seq" number
   than the one it most recently processed, it checks the "oc-validity"
   parameter.  If the value of the "oc-validity" parameter is 0, this
   indicates to the client that overload control of messages destined to
   the server is no longer necessary and the traffic can flow without
   any reduction.  Furthermore, when the value of the "oc-validity"
   parameter is 0, the client SHOULD disregard the value in the "oc"
   parameter.

5.8.  Stabilizing overload algorithm selection

   Deployment realities of SIP mean that the overload control algorithm
   may have to be changed upon a system reboot or a software upgrade.
   However, frequent changes of the overload control algorithm must be
   avoided.
   Frequent changes of the overload control algorithm will not
   benefit the client or the server, as such flapping does not allow the
   chosen algorithm to stabilize.  An algorithm change, when desired, is
   simply accomplished by the SIP server choosing a new algorithm from
   the list in the client's "oc-algo" parameter and sending it back to
   the client in a response.

   The client associates a specific algorithm with each server it sends
   traffic to, and when the server changes the algorithm, the client
   must change its behaviour accordingly.

   Once the server selects a specific overload control algorithm for a
   given client, the server SHOULD NOT change the algorithm associated
   with that client for at least 3600 seconds (1 hour).  This period
   may involve one or more cycles of overload control being in effect
   and then being stopped, depending on the traffic and resources at
   the server.

      One way to accomplish this involves the server saving the time of
      the last algorithm change in a lookup table, indexed by the
      client's network identifiers.  The server only changes the
      "oc-algo" parameter when the time since the last change has
      surpassed 3600 seconds.

5.9.  Self-Limiting

   In some cases, a SIP client may not receive a response from a server
   after sending a request.  RFC 3261 [RFC3261] defines that when a
   timeout error is received from the transaction layer, it MUST be
   treated as if a 408 (Request Timeout) status code has been received.
   If a fatal transport error is reported by the transport layer, it
   MUST be treated as a 503 (Service Unavailable) status code.

   In the event of repeated timeouts or fatal transport errors, the SIP
   client MUST stop sending requests to this server.  The SIP client
   SHOULD periodically probe if the downstream server is alive using any
   mechanism at its disposal.
   Clients should be conservative in their
   probing (e.g., using an exponential back-off) so that their liveness
   probes do not exacerbate an overload situation.  Once a SIP client
   has successfully received a normal response for a request sent to the
   downstream server, the SIP client can resume sending SIP requests.
   It should, of course, honor any overload control parameters it may
   receive in the initial, or later, responses.

5.10.  Responding to an Overload Indication

   A SIP client can receive overload control feedback indicating that it
   needs to reduce the traffic it sends to its downstream server.  The
   client can accomplish this task by sending some of the requests that
   would have gone to the overloaded element to a different destination.
   It needs to ensure, however, that this destination is not in overload
   and is capable of processing the extra load.  A client can also
   buffer requests in the hope that the overload condition will resolve
   quickly and the requests can still be forwarded in time.  In many
   cases, however, it will need to reject these requests with a "503
   (Service Unavailable)" response without the Retry-After header.

5.10.1.  Message prioritization at the hop before the overloaded server

   During an overload condition, a SIP client needs to prioritize
   requests and select those requests that need to be rejected or
   redirected.  This selection is largely a matter of local policy.  It
   is expected that a SIP client will follow local policy as long as the
   resulting reduction of traffic is consistent with the overload
   algorithm in effect at that node.  Accordingly, the normative
   behaviour in the next three paragraphs should be interpreted with the
   understanding that the SIP client will aim to preserve local policy
   to the fullest extent possible.

   A SIP client SHOULD honor the local policy for prioritizing SIP
   requests such as policies based on message type, e.g., INVITEs versus
   requests associated with existing sessions.

   A SIP client SHOULD honor the local policy for prioritizing SIP
   requests based on the content of the Resource-Priority header (RPH,
   RFC 4412 [RFC4412]).  Specific (namespace.value) RPH contents may
   indicate high priority requests that should be preserved as much as
   possible during overload.  The RPH contents can also indicate a
   low-priority request that is eligible to be dropped during times of
   overload.

   A SIP client SHOULD honor the local policy for prioritizing SIP
   requests relating to emergency calls as identified by the SOS URN
   [RFC5031] indicating an emergency request.  This policy ensures that
   when a server is overloaded and non-emergency calls outnumber
   emergency calls in the traffic arriving at the client, the few
   emergency calls will be given preference.  If, on the other hand, the
   server is overloaded and the majority of calls arriving at the client
   are emergency in nature, then no amount of message prioritization
   will ensure the delivery of all emergency calls if the client is to
   reduce the amount of traffic as requested by the server.

   A local policy can be expected to combine both the SIP request type
   and the prioritization markings, and SHOULD be honored when overload
   conditions prevail.

5.10.2.  Rejecting requests at an overloaded server

   If the upstream SIP client to the overloaded server does not support
   overload control, it will continue to direct requests to the
   overloaded server.  Thus, for the non-participating client, the
   overloaded server must bear the cost of rejecting some requests from
   the client as well as the cost of processing the non-rejected
   requests to completion.
   It would be fair to devote the same amount
   of processing at the overloaded server to the combination of
   rejection and processing from a non-participating client as the
   overloaded server would devote to processing requests from a
   participating client.  This is to ensure that SIP clients that do not
   support this specification don't receive an unfair advantage over
   those that do.

   A SIP server that is under overload and has started to throttle
   incoming traffic MUST reject some requests from non-participating
   clients with a "503 (Service Unavailable)" response without the
   Retry-After header.

5.11.  100-Trying provisional response and overload control parameters

   The overload control information sent from a SIP server to a client
   is transported in the responses.  While implementations can insert
   overload control information in any response, special attention
   should be accorded to overload control information transported in a
   100-Trying response.

   Traditionally, the 100-Trying response has been used in SIP to quench
   retransmissions.  In some implementations, the 100-Trying message may
   be neither generated nor consumed by the transaction user (TU).
   In these implementations, the 100-Trying response is generated at the
   transaction layer and sent to the upstream SIP client.  At the
   receiving SIP client, the 100-Trying is consumed at the transaction
   layer by inhibiting the retransmission of the corresponding request.
   Consequently, implementations that insert overload control
   information in the 100-Trying cannot assume that the upstream SIP
   client passed the overload control information in the 100-Trying to
   its corresponding TU.  For this reason, implementations that insert
   overload control information in the 100-Trying MUST re-insert the
   same (or updated) overload control information in the first non-100
   response being sent to the upstream SIP client.

6.  Example

   Consider a SIP client, P1, which is sending requests to another
   downstream SIP server, P2.  The following snippets of SIP messages
   demonstrate how the overload control parameters work.

      INVITE sips:user@example.com SIP/2.0
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.1;oc;oc-algo="loss,A"
      ...

      SIP/2.0 100 Trying
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.1;received=192.0.2.111;
        oc=0;oc-algo="loss";oc-validity=0
      ...

   In the messages above, the first message is sent by P1 to P2.  It
   is a SIP request; because P1 supports overload control, it inserts
   the "oc" parameter in the topmost Via header that it created.  P1
   supports two overload control algorithms: loss and some algorithm
   called "A".

   The second message, a SIP response, shows the topmost Via header
   amended by P2 according to this specification and sent to P1.
   Because P2 also supports overload control, and because it chooses the
   "loss" based scheme, it sends "loss" back to P1 in the "oc-algo"
   parameter.  It also sets the value of the "oc" and "oc-validity"
   parameters to 0 because it is not currently requesting overload
   control activation.

   Had P2 not supported overload control, it would have left the "oc"
   and "oc-algo" parameters unchanged, thus allowing the client to know
   that it did not support overload control.

   At some later time, P2 starts to experience overload.  It sends the
   following SIP message indicating that P1 should decrease the messages
   arriving at P2 by 20% for 0.5 s.

      SIP/2.0 180 Ringing
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.3;received=192.0.2.111;
        oc=20;oc-algo="loss";oc-validity=500;
        oc-seq=1282321615.782
      ...

   After some time, the overload condition at P2 subsides.  It then
   changes the parameter values in the response it sends to P1 to allow
   P1 to send all messages destined to P2.

      SIP/2.0 183 Queued
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.4;received=192.0.2.111;
        oc=0;oc-algo="loss";oc-validity=0;oc-seq=1282321892.439
      ...

7.  The loss-based overload control scheme

   Under a loss-based approach, a SIP server asks an upstream neighbor
   to reduce the number of requests it would normally forward to this
   server by a certain percentage.  For example, a SIP server can ask an
   upstream neighbor to reduce the number of requests this neighbor
   would normally send by 10%.  The upstream neighbor then redirects or
   rejects 10% of the traffic originally destined for that server.

   This section specifies the semantics of the overload control
   parameters associated with the loss-based overload control scheme.
   The general behaviour of SIP clients and servers is specified in
   Section 5 and is applicable to SIP clients and servers that implement
   loss-based overload control.

7.1.  Special parameter values for loss-based overload control

   The loss-based overload control scheme is identified using the token
   "loss".  This token appears in the "oc-algo" parameter list sent by
   the SIP client.

   A SIP server that has selected the loss-based algorithm, upon
   entering the overload state, will assign a value to the "oc"
   parameter.  This value MUST be in the range of [0, 100], inclusive.
   This value indicates to the client the percentage by which the client
   is to reduce the number of requests being forwarded to the overloaded
   server.  The SIP client may use any algorithm that reduces the
   traffic it sends to the overloaded server by the amount indicated.
   Such an algorithm should honor the message prioritization discussion
   of Section 5.10.1.  While a particular algorithm is not subject to
   standardization, for completeness a default algorithm for loss-based
   overload control is provided in Section 7.2.

7.2.  Default algorithm for loss-based overload control

   This section describes a default algorithm that a SIP client can use
   to throttle SIP traffic going downstream by the percentage loss value
   specified in the "oc" parameter.

   The client maintains two categories of requests; the first category
   will include requests that are candidates for reduction, and the
   second category will include requests that are not subject to
   reduction except when all messages in the first category have been
   rejected, and further reduction is still needed.  Section 5.10.1
   contains directives on identifying messages for inclusion in the
   second category.  The remaining messages are allocated to the first
   category.

   Under overload condition, the client converts the value of the "oc"
   parameter to a value that it applies to requests in the first
   category.  As a simple example, if "oc=10" and 40% of the requests
   should be included in the first category, then:

      10 / 40 * 100 = 25

   Or, 25% of the requests in the first category can be reduced to get
   an overall reduction of 10%.  The client uses random discard to
   achieve the 25% reduction of messages in the first category.
   Messages in the second category proceed downstream unscathed.  To
   effect the 25% reduction rate from the first category, the client
   draws a random number between 1 and 100 for the request picked from
   the first category.  If the random number is less than or equal to
   the converted value of the "oc" parameter, the request is not
   forwarded; otherwise the request is forwarded.

   A reference algorithm is shown below.

   cat1 := 80.0         // Category 1: subject to reduction
   cat2 := 100.0 - cat1 // Category 2: under normal operations,
   // only subject to reduction after category 1 is exhausted.
   // Note that the above ratio is simply a reasonable default.
   // The actual values will change through periodic sampling
   // as the traffic mix changes over time.

   while (true) {
     // We're modeling message processing as a single work
     // queue that contains both incoming and outgoing messages.
     sip_msg := get_next_message_from_work_queue()

     update_mix(cat1, cat2)  // See Note below

     switch (sip_msg.type) {

       case outbound request:
         destination := get_next_hop(sip_msg)
         oc_context := get_oc_context(destination)

         if (oc_context == null) {
           send_to_network(sip_msg) // Process it normally by
           // sending the request to the next hop since this
           // particular destination is not subject to overload
         }
         else {
           // Determine if server wants to enter in overload or is in
           // overload
           in_oc := extract_in_oc(oc_context)

           oc_value := extract_oc(oc_context)
           oc_validity := extract_oc_validity(oc_context)

           if (in_oc == false or oc_validity is not in effect) {
             send_to_network(sip_msg) // Process it normally by sending
             // the request to the next hop since this particular
             // destination is not subject to overload.  Optionally,
             // clear the oc context for this server (not shown).
           }
           else {  // Begin perform overload control
             r := random()  // Uniform random number between 1 and 100
             drop_msg := false

             // Returns 1 for category 1, 2 for category 2
             category := assign_msg_to_category(sip_msg)

             pct_to_reduce_cat1 := oc_value / cat1 * 100

             if (oc_value <= cat1) {  // Reduce msgs from category 1 only
               if (r <= pct_to_reduce_cat1 && category == 1) {
                 drop_msg := true
               }
             }
             else {  // oc_value > category 1.  Reduce 100% of msgs from
                     // category 1 and remaining from category 2.
               pct_to_reduce_cat2 := (oc_value - cat1) / cat2 * 100
               if (category == 1) {
                 drop_msg := true
               }
               else {
                 if (r <= pct_to_reduce_cat2) {
                   drop_msg := true
                 }
               }
             }

             if (drop_msg == false) {
               send_to_network(sip_msg) // Process it normally by
               // sending the request to the next hop
             }
             else {
               // Do not send request downstream, handle locally by
               // generating response (if a proxy) or treating as
               // an error (if a user agent).
             }

           }  // End perform overload control
         }

       end case // outbound request

       case outbound response:
         if (we are in overload) {
           add_overload_parameters(sip_msg)
         }
         send_to_network(sip_msg)

       end case // outbound response

       case inbound response:

         if (sip_msg has oc parameter values) {
           create_or_update_oc_context()  // For the specific server
           // that sent the response, create or update the oc context;
           // i.e., extract the values of the oc-related parameters
           // and store them for later use.
         }
         process_msg(sip_msg)

       end case // inbound response

       case inbound request:

         if (we are not in overload) {
           process_msg(sip_msg)
         }
         else {  // We are in overload
           if (sip_msg has oc parameters) {  // Upstream client supports
             process_msg(sip_msg)  // oc; only sends important requests
           }
           else {  // Upstream client does not support oc
             if (local_policy(sip_msg) says process message) {
               process_msg(sip_msg)
             }
             else {
               send_response(sip_msg, 503)
             }
           }
         }
       end case // inbound request
     }
   }

   Note: A simple way to sample the traffic mix for category 1 and
   category 2 is to associate a counter with each category of message.
   Periodically (every 5-10 s) get the value of the counters and
   calculate the ratio of category 1 messages to category 2 messages
   since the last calculation.

   Example: In the last 5 seconds, a total of 500 requests arrived
   at the queue.
   450 out of the 500 were messages subject
   to reduction and 50 out of 500 were classified as requests not
   subject to reduction.  Based on this ratio, cat1 := 90 and
   cat2 := 10, so a 90/10 mix will be used in overload calculations.

8.  Relationship with other IETF SIP load control efforts

   The overload control mechanism described in this document is reactive
   in nature and, apart from the message prioritization directives
   listed in Section 5.10.1, the mechanisms described in this draft will
   not discriminate requests based on user identity, filtering action
   and arrival time.  SIP networks that require proactive overload
   control mechanisms can upload user-level load control filters as
   described in [I-D.ietf-soc-load-control-event-package].  Local policy
   will also dictate the precedence of different overload control
   mechanisms applied to the traffic.  Specifically, in a scenario where
   load control filters are installed by signaling neighbours
   [I-D.ietf-soc-load-control-event-package] and the same traffic can
   also be throttled using the overload control mechanism, local policy
   will dictate which of these schemes shall be given precedence.
   Interactions between the two schemes are out of scope for this
   document.

9.  Syntax

   This specification extends the existing definition of the Via header
   field parameters of [RFC3261] as follows:

       via-params  =/ oc / oc-validity / oc-seq / oc-algo
       oc          = "oc" [EQUAL oc-num]
       oc-num      = 1*DIGIT
       oc-validity = "oc-validity" [EQUAL delta-ms]
       oc-seq      = "oc-seq" EQUAL 1*12DIGIT "." 1*5DIGIT
       oc-algo     = "oc-algo" EQUAL DQUOTE algo-list *(COMMA algo-list)
                     DQUOTE
       algo-list   = "loss" / *(other-algo)
       other-algo  = %x41-5A / %x61-7A / %x30-39
       delta-ms    = 1*DIGIT

10.  Design Considerations

   This section discusses specific design considerations for the
   mechanism described in this document.
   General design considerations
   for SIP overload control can be found in [RFC6357].

10.1.  SIP Mechanism

   A SIP mechanism is needed to convey overload feedback from the
   receiving to the sending SIP entity.  A number of different
   alternatives exist to implement such a mechanism.

10.1.1.  SIP Response Header

   Overload control information can be transmitted using a new Via
   header field parameter for overload control.  A SIP server can add
   this header parameter to the responses it is sending upstream to
   provide overload control feedback to its upstream neighbors.  This
   approach has the following characteristics:

   o  A Via header parameter is light-weight and creates very little
      overhead.  It does not require the transmission of additional
      messages for overload control and does not increase traffic or
      processing burdens in an overload situation.
   o  Overload control status can frequently be reported to upstream
      neighbors since it is a part of a SIP response.  This enables the
      use of this mechanism in scenarios where the overload status needs
      to be adjusted frequently.  It also enables the use of overload
      control mechanisms that use regular feedback such as window-based
      overload control.
   o  With a Via header parameter, overload control status is inherent
      in SIP signaling and is automatically conveyed to all relevant
      upstream neighbors, i.e., neighbors that are currently
      contributing traffic.  There is no need for a SIP server to
      specifically track and manage the set of current upstream or
      downstream neighbors with which it should exchange overload
      feedback.
   o  Overload status is not conveyed to inactive senders.  This avoids
      the transmission of overload feedback to inactive senders, which
      do not contribute traffic.
      If an inactive sender starts to
      transmit while the receiver is in overload it will receive
      overload feedback in the first response and can adjust the amount
      of traffic forwarded accordingly.
   o  A SIP server can limit the distribution of overload control
      information by only inserting it into responses to known upstream
      neighbors.  A SIP server can use transport level authentication
      (e.g., via TLS) with its upstream neighbors.

10.1.2.  SIP Event Package

   Overload control information can also be conveyed from a receiver to
   a sender using a new event package.  Such an event package enables a
   sending entity to subscribe to the overload status of its downstream
   neighbors and receive notifications of overload control status
   changes in NOTIFY requests.  This approach has the following
   characteristics:

   o  Overload control information is conveyed decoupled from SIP
      signaling.  It enables an overload control manager, which is a
      separate entity, to monitor the load on other servers and provide
      overload control feedback to all SIP servers that have set up
      subscriptions with the controller.
   o  With an event package, a receiver can send updates to senders that
      are currently inactive.  Inactive senders will receive a
      notification about the overload and can refrain from sending
      traffic to this neighbor until the overload condition is resolved.
      The receiver can also notify all potential senders once they are
      permitted to send traffic again.  However, these notifications do
      generate additional traffic, which adds to the overall load.
   o  A SIP entity needs to set up and maintain overload control
      subscriptions with all upstream and downstream neighbors.  A new
      subscription needs to be set up before/while a request is
      transmitted to a new downstream neighbor.  Servers can be
      configured to subscribe at boot time.
      However, this would require
      additional protection to avoid the avalanche restart problem for
      overload control.  Subscriptions need to be terminated when they
      are not needed any more, which can be done, for example, using a
      timeout mechanism.
   o  A receiver needs to send NOTIFY messages to all subscribed
      upstream neighbors in a timely manner when the control algorithm
      requires a change in the control variable (e.g., when a SIP server
      is in an overload condition).  This includes active as well as
      inactive neighbors.  These NOTIFYs add to the amount of traffic
      that needs to be processed.  To ensure that these requests will
      not be dropped due to overload, a priority mechanism needs to be
      implemented in all servers these requests will pass through.
   o  As overload feedback is sent to all senders in separate messages,
      this mechanism is not suitable when frequent overload control
      feedback is needed.
   o  A SIP server can limit the set of senders that can receive
      overload control information by authenticating subscriptions to
      this event package.
   o  This approach requires each proxy to implement user agent
      functionality (UAS and UAC) to manage the subscriptions.

10.2.  Backwards Compatibility

   A new overload control mechanism needs to be backwards compatible so
   that it can be gradually introduced into a network and functions
   properly if only a fraction of the servers support it.

   Hop-by-hop overload control (see [RFC6357]) has the advantage that it
   does not require that all SIP entities in a network support it.  It
   can be used effectively between two adjacent SIP servers if both
   servers support overload control and does not depend on the support
   from any other server or user agent.  The more SIP servers in a
   network support hop-by-hop overload control, the better protected the
   network is against occurrences of overload.

   A SIP server may have multiple upstream neighbors, of which only
   some may support overload control.  If a server simply used this
   overload control mechanism, only those that support it would reduce
   traffic.  Others would keep sending at the full rate and benefit from
   the throttling by the servers that support overload control.  In
   other words, upstream neighbors that do not support overload control
   would be better off than those that do.

   A SIP server should therefore follow the behaviour outlined in
   Section 5.10.2 to handle clients that do not support overload
   control.

11.  Security Considerations

   Overload control mechanisms can be used by an attacker to conduct a
   denial-of-service attack on a SIP entity if the attacker can pretend
   that the SIP entity is overloaded.  When such a forged overload
   indication is received by an upstream SIP client, it will stop
   sending traffic to the victim.  Thus, the victim is subject to a
   denial-of-service attack.

   To better understand the threat model, consider the following
   diagram:

         Pa -------                    ------ Pb
                   \                  /
         : ------ +-------- P1 ------+------ :
                   /    L1      L2    \
         : -------                    ------ :

         -----> Downstream (requests)
         <----- Upstream (responses)

   Here, requests travel downstream from the left-hand side, through
   Proxy P1, towards the right-hand side, and responses travel upstream
   from the right-hand side, through P1, towards the left-hand side.
   Proxies Pa, Pb and P1 support overload control.  L1 and L2 are labels
   for the links connecting P1 to the upstream clients and downstream
   servers.

   If an attacker is able to modify traffic between Pa and P1 on link
   L1, it can cause a denial-of-service attack on P1 by having Pa not
   send any traffic to P1.
   Such an attack can proceed by the attacker
   modifying the response from P1 to Pa such that Pa's Via header is
   changed to indicate that all requests destined towards P1 should be
   dropped.  Conversely, the attacker can simply remove any "oc",
   "oc-validity" and "oc-seq" markings added by P1 in a response to Pa.
   In such a case, the attacker will force P1 into overload by denying
   request quenching at Pa even though Pa is capable of performing
   overload control.

   Similarly, if an attacker is able to modify traffic between P1 and Pb
   on link L2, it can change the Via header associated with P1 in a
   response from Pb to P1 such that all subsequent requests destined
   towards Pb from P1 are dropped.  In essence, the attacker mounts a
   denial-of-service attack on Pb by indicating false overload control.
   Note that it is immaterial whether Pb supports overload control or
   not; the attack will succeed as long as the attacker is able to
   control L2.  Conversely, an attacker can suppress a genuine overload
   condition at Pb by simply removing any "oc", "oc-validity" and
   "oc-seq" markings added by Pb in a response to P1.  In such a case,
   the attacker will force P1 into sending requests to Pb even under
   overload conditions because P1 would not be aware that Pb supports
   overload control.

   Attacks that indicate false overload control are best mitigated by
   using TLS in conjunction with applying BCP 38 [RFC2827].  Attacks
   that are mounted to suppress genuine overload conditions can be
   similarly avoided by using TLS on the connection.  Generally, TCP or
   WebSockets [RFC6455] in conjunction with BCP 38 makes it more
   difficult for an attacker to insert or modify messages, but may still
   prove inadequate against an adversary that controls links L1 and L2.
   TLS provides the best protection from an attacker with access to the
   network links.

   Another way to conduct an attack is to send a message containing a
   high overload feedback value through a proxy that does not support
   this extension.  If this feedback is added to the second Via header
   (or all Via headers), it will reach the next upstream proxy.  If the
   attacker can make the recipient believe that the overload status was
   created by its direct downstream neighbor (and not by the attacker
   further downstream) the recipient stops sending traffic to the
   victim.  A precondition for this attack is that the victim proxy does
   not support this extension since it would not pass through overload
   control feedback otherwise.

   A malicious SIP entity could gain an advantage by pretending to
   support this specification but never reducing the amount of traffic
   it forwards to the downstream neighbor.  If its downstream neighbor
   receives traffic from multiple sources which correctly implement
   overload control, the malicious SIP entity would benefit since all
   other sources to its downstream neighbor would reduce load.

      The solution to this problem depends on the overload control
      method.  With rate-based, window-based and other similar overload
      control algorithms that promise to produce no more than a
      specified number of requests per unit time, the overloaded server
      can regulate the traffic arriving to it.  However, when using
      loss-based overload control, such policing is not always obvious
      since the load forwarded depends on the load received by the
      client.

   To prevent such attacks, servers should monitor client behavior to
   determine whether they are complying with overload control policies.
   If a client is not conforming to such policies, then the server
   should treat it as a non-supporting client (see Section 5.10.2).
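
   The monitoring suggested above is not specified by this document.
   As one possible illustration, the sketch below counts requests per
   upstream client in fixed windows and stops treating a client as
   participating once it exceeds a request budget in several
   consecutive windows.  It only fits schemes that promise a bounded
   number of requests per unit time; as noted above, loss-based control
   is harder to police this way.  The class names, the budget, the
   window length and the strike threshold are all assumptions made for
   this example.

```python
from dataclasses import dataclass


@dataclass
class ClientWindow:
    """Per-client counters for the current observation window."""
    window_start: float
    count: int = 0
    strikes: int = 0


class ComplianceMonitor:
    """Hypothetical helper, not part of the draft: flags upstream clients
    that keep exceeding an assumed per-window request budget."""

    def __init__(self, allowed_per_window, window_seconds=1.0, max_strikes=3):
        self.allowed_per_window = allowed_per_window  # assumed request budget
        self.window_seconds = window_seconds          # assumed window length
        self.max_strikes = max_strikes                # assumed tolerance
        self.clients = {}

    def record_request(self, client_id, now):
        """Count one request arriving at time `now` (seconds).

        Returns True while the client is still treated as participating
        in overload control, False once it should be handled like a
        non-supporting client.
        """
        state = self.clients.setdefault(client_id, ClientWindow(window_start=now))
        if now - state.window_start >= self.window_seconds:
            # The previous window is over: judge it against the budget.
            if state.count > self.allowed_per_window:
                state.strikes += 1
            else:
                state.strikes = 0  # a compliant window clears earlier strikes
            state.window_start = now
            state.count = 0
        state.count += 1
        return state.strikes < self.max_strikes
```

   A server could call record_request() for every inbound request and,
   when it returns False, apply the rejection behaviour of
   Section 5.10.2 to that client.  A window is only evaluated when the
   next request from the same client arrives, which keeps the sketch
   free of timers.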
Finally, a distributed denial of service (DDoS) attack could cause an
honest server to start signaling an overload condition. Such a DDoS
attack could be mounted without controlling the communications links
since the attack simply depends on the attacker injecting a large
volume of packets on the communication links. If the honest server
attacked by a DDoS attack has a long "oc-validity" interval, and the
attacker can guess this interval, the attacker can keep the server
overloaded by synchronizing the DDoS traffic with the validity period.
While such an attack may be relatively easy to spot, mechanisms for
combating it are outside the scope of this document and, of course,
since attackers can invent new variations, the appropriate mechanisms
are likely to change over time.

12. IANA Considerations

This specification defines four new Via header parameters as detailed
below in the "Header Field Parameter and Parameter Values" sub-registry
as per the registry created by [RFC3968]. The required information is:

   Header Field  Parameter Name  Predefined Values  Reference
   __________________________________________________________
   Via           oc              Yes                RFCXXXX
   Via           oc-validity     Yes                RFCXXXX
   Via           oc-seq          Yes                RFCXXXX
   Via           oc-algo         Yes                RFCXXXX

   RFC XXXX [NOTE TO RFC-EDITOR: Please replace with final RFC
   number of this specification.]

13. References

13.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3263]  Rosenberg, J. and H. Schulzrinne, "Session Initiation
              Protocol (SIP): Locating SIP Servers", RFC 3263,
              June 2002.
   [RFC3968]  Camarillo, G., "The Internet Assigned Number Authority
              (IANA) Header Field Parameter Registry for the Session
              Initiation Protocol (SIP)", BCP 98, RFC 3968,
              December 2004.

   [RFC4412]  Schulzrinne, H. and J. Polk, "Communications Resource
              Priority for the Session Initiation Protocol (SIP)",
              RFC 4412, February 2006.

13.2. Informative References

   [I-D.ietf-soc-load-control-event-package]
              Shen, C., Schulzrinne, H., and A. Koike, "A Session
              Initiation Protocol (SIP) Load Control Event Package",
              draft-ietf-soc-load-control-event-package-13 (work in
              progress), December 2013.

   [I-D.ietf-soc-overload-rate-control]
              Noel, E. and P. Williams, "Session Initiation Protocol
              (SIP) Rate Control",
              draft-ietf-soc-overload-rate-control-07 (work in
              progress), January 2014.

   [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP
              Source Address Spoofing", BCP 38, RFC 2827, May 2000.

   [RFC5031]  Schulzrinne, H., "A Uniform Resource Name (URN) for
              Emergency and Other Well-Known Services", RFC 5031,
              January 2008.

   [RFC5390]  Rosenberg, J., "Requirements for Management of Overload
              in the Session Initiation Protocol", RFC 5390,
              December 2008.

   [RFC6357]  Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design
              Considerations for Session Initiation Protocol (SIP)
              Overload Control", RFC 6357, August 2011.

   [RFC6455]  Fette, I. and A. Melnikov, "The WebSocket Protocol",
              RFC 6455, December 2011.

Appendix A. Acknowledgements

The authors acknowledge the contributions of Bruno Chatras, Keith
Drage, Janet Gunn, Rich Terpstra, Daryl Malas, Eric Noel, R.
Parthasarathi, Antoine Roly, Jonathan Rosenberg, Charles Shen, Rahul
Srivastava, Padma Valluri, Shaun Bharrat, Paul Kyzivat and Jeroen Van
Bemmel to this document.
Adam Roach and Eric McMurry helped flesh out the different cases for
handling SIP messages described in the algorithm of Section 7.2. Janet
Gunn reviewed the algorithm and suggested changes that led to simpler
processing for the case where "oc_value > cat1".

Richard Barnes provided invaluable comments as part of area director
review of the draft.

Appendix B. RFC5390 requirements

Table 1 provides a summary of how this specification fulfills the
requirements of [RFC5390]. A more detailed view on how each
requirement is fulfilled is provided after the table.

   +-------------+-------------------+
   | Requirement | Meets requirement |
   +-------------+-------------------+
   | REQ 1       | Yes               |
   | REQ 2       | Yes               |
   | REQ 3       | Partially         |
   | REQ 4       | Partially         |
   | REQ 5       | Partially         |
   | REQ 6       | Not applicable    |
   | REQ 7       | Yes               |
   | REQ 8       | Partially         |
   | REQ 9       | Yes               |
   | REQ 10      | Yes               |
   | REQ 11      | Yes               |
   | REQ 12      | Yes               |
   | REQ 13      | Yes               |
   | REQ 14      | Yes               |
   | REQ 15      | Yes               |
   | REQ 16      | Yes               |
   | REQ 17      | Partially         |
   | REQ 18      | Yes               |
   | REQ 19      | Yes               |
   | REQ 20      | Yes               |
   | REQ 21      | Yes               |
   | REQ 22      | Yes               |
   | REQ 23      | Yes               |
   +-------------+-------------------+

       Summary of meeting requirements in RFC5390

                        Table 1

REQ 1: The overload mechanism shall strive to maintain the overall
useful throughput (taking into consideration the quality-of-service
needs of the using applications) of a SIP server at reasonable levels,
even when the incoming load on the network is far in excess of its
capacity. The overall throughput under load is the ultimate measure of
the value of an overload control mechanism.
Meeting REQ 1: Yes, the overload control mechanism allows an
overloaded SIP server to maintain a reasonable level of throughput as
it enters into congestion mode by requesting the upstream clients to
reduce traffic destined downstream.

REQ 2: When a single network element fails, goes into overload, or
suffers from reduced processing capacity, the mechanism should strive
to limit the impact of this on other elements in the network. This
helps to prevent a small-scale failure from becoming a widespread
outage.

Meeting REQ 2: Yes. When a SIP server enters overload mode, it will
request the upstream clients to throttle the traffic destined to it.
As a consequence of this, the overloaded SIP server will itself
generate proportionally less downstream traffic, thereby limiting the
impact on other elements in the network.

REQ 3: The mechanism should seek to minimize the amount of
configuration required in order to work. For example, it is better to
avoid needing to configure a server with its SIP message throughput,
as these kinds of quantities are hard to determine.

Meeting REQ 3: Partially. On the server side, the overload condition
is determined by monitoring S (cf. Section 4 of [RFC6357]) and
reporting a load feedback F as a value to the "oc" parameter. On the
client side, a throttle T is applied to requests going downstream
based on F. This specification does not prescribe any value for S, nor
a particular value for F. The "oc-algo" parameter allows for automatic
convergence to a particular class of overload control algorithm. There
are suggested default values for the "oc-validity" parameter.

REQ 4: The mechanism must be capable of dealing with elements that do
not support it, so that a network can consist of a mix of elements
that do and don't support it.
In other words, the mechanism should not work only in environments
where all elements support it. It is reasonable to assume that it
works better in such environments, of course. Ideally, there should be
incremental improvements in overall network throughput as increasing
numbers of elements in the network support the mechanism.

Meeting REQ 4: Yes. The mechanism is designed to reduce congestion
when a pair of communicating entities support it. If a downstream
overloaded SIP server does not respond to a request in time, a SIP
client will attempt to reduce traffic destined towards the
non-responsive server as outlined in Section 5.9.

REQ 5: The mechanism should not assume that it will only be deployed
in environments with completely trusted elements. It should seek to
operate as effectively as possible in environments where other
elements are malicious; this includes preventing malicious elements
from obtaining more than a fair share of service.

Meeting REQ 5: Partially. Since overload control information is shared
between a pair of communicating entities, a confidential and
authenticated channel can be used for this communication. However, if
such a channel is not available, then the security ramifications
outlined in Section 11 apply.

REQ 6: When overload is signaled by means of a specific message, the
message must clearly indicate that it is being sent because of
overload, as opposed to other, non overload-based failure conditions.
This requirement is meant to avoid some of the problems that have
arisen from the reuse of the 503 response code for multiple purposes.
Of course, overload is also signaled by lack of response to requests.
This requirement applies only to explicit overload signals.

Meeting REQ 6: Not applicable. Overload control information is
signaled as part of the Via header and not in a new header.
REQ 7: The mechanism shall provide a way for an element to throttle
the amount of traffic it receives from an upstream element. This
throttling shall be graded so that it is not all-or-nothing as with
the current 503 mechanism. This recognizes the fact that "overload" is
not a binary state and that there are degrees of overload.

Meeting REQ 7: Yes, please see Section 5.5 and Section 5.10.

REQ 8: The mechanism shall ensure that, when a request was not
processed successfully due to overload (or failure) of a downstream
element, the request will not be retried on another element that is
also overloaded or whose status is unknown. This requirement derives
from REQ 1.

Meeting REQ 8: Partially. A SIP client that has overload information
from multiple downstream servers will not retry the request on another
element. However, if a SIP client does not know the overload status of
a downstream server, it may send the request to that server.

REQ 9: That a request has been rejected from an overloaded element
shall not unduly restrict the ability of that request to be submitted
to and processed by an element that is not overloaded. This
requirement derives from REQ 1.

Meeting REQ 9: Yes, a SIP client conformant to this specification will
send the request to a different element.

REQ 10: The mechanism should support servers that receive requests
from a large number of different upstream elements, where the set of
upstream elements is not enumerable.

Meeting REQ 10: Yes, there are no constraints on the number of
upstream clients.

REQ 11: The mechanism should support servers that receive requests
from a finite set of upstream elements, where the set of upstream
elements is enumerable.

Meeting REQ 11: Yes, there are no constraints on the number of
upstream clients.
REQ 12: The mechanism should work between servers in different
domains.

Meeting REQ 12: Yes, there are no inherent limitations on using
overload control between domains. However, interconnection points that
engage in overload control between domains will have to populate and
maintain the overload control parameters as requests cross domains.

REQ 13: The mechanism must not dictate a specific algorithm for
prioritizing the processing of work within a proxy during times of
overload. It must permit a proxy to prioritize requests based on any
local policy, so that certain ones (such as a call for emergency
services or a call with a specific value of the Resource-Priority
header field [RFC4412]) are given preferential treatment, such as not
being dropped, being given additional retransmission, or being
processed ahead of others.

Meeting REQ 13: Yes, please see Section 5.10.

REQ 14: The mechanism should provide unambiguous directions to clients
on when they should retry a request and when they should not. This
especially applies to TCP connection establishment and SIP
registrations, in order to mitigate against avalanche restart.

Meeting REQ 14: Yes, Section 5.9 provides normative behavior on when
to retry a request after repeated timeouts and fatal transport errors
resulting from communications with a non-responsive downstream SIP
server.

REQ 15: In cases where a network element fails, is so overloaded that
it cannot process messages, or cannot communicate due to a network
failure or network partition, it will not be able to provide explicit
indications of the nature of the failure or its levels of congestion.
The mechanism must properly function in these cases.
Meeting REQ 15: Yes, Section 5.9 provides normative behavior on when
to retry a request after repeated timeouts and fatal transport errors
resulting from communications with a non-responsive downstream SIP
server.

REQ 16: The mechanism should attempt to minimize the overhead of the
overload control messaging.

Meeting REQ 16: Yes, overload control messages are sent in the topmost
Via header, which is always processed by the SIP elements.

REQ 17: The overload mechanism must not provide an avenue for
malicious attack, including DoS and DDoS attacks.

Meeting REQ 17: Partially. Since overload control information is
shared between a pair of communicating entities, a confidential and
authenticated channel can be used for this communication. However, if
such a channel is not available, then the security ramifications
outlined in Section 11 apply.

REQ 18: The overload mechanism should be unambiguous about whether a
load indication applies to a specific IP address, host, or URI, so
that an upstream element can determine the load of the entity to which
a request is to be sent.

Meeting REQ 18: Yes, please see discussion in Section 5.5.

REQ 19: The specification for the overload mechanism should give
guidance on which message types might be desirable to process over
others during times of overload, based on SIP-specific considerations.
For example, it may be more beneficial to process a SUBSCRIBE refresh
with Expires of zero than a SUBSCRIBE refresh with a non-zero
expiration (since the former reduces the overall amount of load on the
element), or to process re-INVITEs over new INVITEs.

Meeting REQ 19: Yes, please see Section 5.10.
REQ 20: In a mixed environment of elements that do and do not
implement the overload mechanism, no disproportionate benefit shall
accrue to the users or operators of the elements that do not implement
the mechanism.

Meeting REQ 20: Yes, an element that does not implement overload
control does not receive any measure of extra benefit.

REQ 21: The overload mechanism should ensure that the system remains
stable. When the offered load drops from above the overall capacity of
the network to below the overall capacity, the throughput should
stabilize and become equal to the offered load.

Meeting REQ 21: Yes, the overload control mechanism described in this
draft ensures the stability of the system.

REQ 22: It must be possible to disable the reporting of load
information towards upstream targets based on the identity of those
targets. This allows a domain administrator who considers the load of
their elements to be sensitive information, to restrict access to that
information. Of course, in such cases, there is no expectation that
the overload mechanism itself will help prevent overload from that
upstream target.

Meeting REQ 22: Yes, an operator of a SIP server can configure the SIP
server to only report overload control information for requests
received over a confidential channel, for example. However, note that
this requirement is in conflict with REQ 3, as it introduces a modicum
of extra configuration.

REQ 23: It must be possible for the overload mechanism to work in
cases where there is a load balancer in front of a farm of proxies.

Meeting REQ 23: Yes. Depending on the type of load balancer, this
requirement is met. A load balancer fronting a farm of SIP proxies
could be a SIP-aware load balancer or one that is not SIP-aware.
If the load balancer is SIP-aware, it can make conscious decisions on
throttling outgoing traffic towards the individual server in the farm
based on the overload control parameters returned by the server. On
the other hand, if the load balancer is not SIP-aware, then there are
other strategies to perform overload control. Section 6 of [RFC6357]
documents some of these strategies in more detail (see discussion
related to Figure 3(a) in Section 6).

Authors' Addresses

   Vijay K. Gurbani (editor)
   Bell Laboratories, Alcatel-Lucent
   1960 Lucent Lane, Rm 9C-533
   Naperville, IL 60563
   USA

   Email: vkg@bell-labs.com

   Volker Hilt
   Bell Laboratories, Alcatel-Lucent
   791 Holmdel-Keyport Rd
   Holmdel, NJ 07733
   USA

   Email: volkerh@bell-labs.com

   Henning Schulzrinne
   Columbia University/Department of Computer Science
   450 Computer Science Building
   New York, NY 10027
   USA

   Phone: +1 212 939 7004
   Email: hgs@cs.columbia.edu
   URI:   http://www.cs.columbia.edu