SOC Working Group                                        V. Gurbani, Ed.
Internet-Draft                                                   V. Hilt
Intended status: Standards Track       Bell Laboratories, Alcatel-Lucent
Expires: September 13, 2012                               H. Schulzrinne
                                                     Columbia University
                                                          March 12, 2012

           Session Initiation Protocol (SIP) Overload Control
                   draft-ietf-soc-overload-control-08

Abstract

Overload occurs in Session Initiation Protocol (SIP) networks when
SIP servers have insufficient resources to handle all SIP messages
they receive.  Even though the SIP protocol provides a limited
overload control mechanism through its 503 (Service Unavailable)
response code, SIP servers are still vulnerable to overload.  This
document defines the behaviour of SIP servers involved in overload
control, and in addition, it specifies a loss-based overload scheme
for SIP.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current
Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 13, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1.  Introduction
2.  Terminology
3.  Overview of operations
4.  Via header parameters for overload control
  4.1.  The oc parameter
  4.2.  The oc-algo parameter
  4.3.  The oc-validity parameter
  4.4.  The oc-seq parameter
5.  General behaviour
  5.1.  Handshake to determine support for overload control
  5.2.  Creating and updating the overload control parameters
  5.3.  Determining the 'oc' Parameter Value
  5.4.  Processing the Overload Control Parameters
  5.5.  Using the Overload Control Parameter Values
  5.6.  Forwarding the overload control parameters
  5.7.  Terminating overload control
  5.8.  Stabilizing overload algorithm selection
  5.9.  Self-Limiting
  5.10. Responding to an Overload Indication
    5.10.1.  Message prioritization at the hop before the
             overloaded server
    5.10.2.  Rejecting requests at an overloaded server
  5.11. 100-Trying provisional response and overload control
        parameters
6.  The loss-based overload control scheme
  6.1.  Special parameter values for loss-based overload control
  6.2.  Example
  6.3.  Default algorithm for loss-based overload control
7.  Relationship with other IETF SIP load control efforts
8.  Syntax
9.  Design Considerations
  9.1.  SIP Mechanism
    9.1.1.  SIP Response Header
    9.1.2.  SIP Event Package
  9.2.  Backwards Compatibility
10. Security Considerations
11. IANA Considerations
12. References
  12.1. Normative References
  12.2. Informative References
Appendix A.  Acknowledgements
Appendix B.  RFC5390 requirements
Authors' Addresses

1.  Introduction

As with any network element, a Session Initiation Protocol (SIP)
[RFC3261] server can suffer from overload when the number of SIP
messages it receives exceeds the number of messages it can process.
Overload can pose a serious problem for a network of SIP servers.
During periods of overload, the throughput of a network of SIP
servers can be significantly degraded.  In fact, overload may lead to
a situation in which the throughput drops down to a small fraction of
the original processing capacity.  This is often called congestion
collapse.

Overload is said to occur if a SIP server does not have sufficient
resources to process all incoming SIP messages.  These resources may
include CPU processing capacity, memory, network bandwidth,
input/output, or disk resources.

For overload control, we only consider failure cases where SIP
servers are unable to process all SIP requests due to resource
constraints.  There are other cases where a SIP server can
successfully process incoming requests but has to reject them due to
failure conditions unrelated to the SIP server being overloaded.  For
example, a PSTN gateway that runs out of trunks but still has plenty
of capacity to process SIP messages should reject incoming INVITEs
using a 488 (Not Acceptable Here) response [RFC4412].  Similarly, a
SIP registrar that has lost connectivity to its registration database
but is still capable of processing SIP requests should reject
REGISTER requests with a 500 (Server Error) response [RFC3261].
Overload control does not apply to these cases and SIP provides
appropriate response codes for them.

The SIP protocol provides a limited mechanism for overload control
through its 503 (Service Unavailable) response code.  However, this
mechanism cannot prevent overload of a SIP server and it cannot
prevent congestion collapse.  In fact, the use of the 503 (Service
Unavailable) response code may cause traffic to oscillate and to
shift between SIP servers and thereby worsen an overload condition.
A detailed discussion of the SIP overload problem, the problems with
the 503 (Service Unavailable) response code and the requirements for
a SIP overload control mechanism can be found in [RFC5390].

This document defines the general behaviour of SIP servers and
clients involved in overload control in Section 5.  In addition,
Section 6 specifies a loss-based overload control scheme.
SIP clients and servers conformant to this specification MUST implement
the loss-based overload control scheme.  They MAY implement other
overload control schemes as well.

2.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

In this document, the terms "SIP client" and "SIP server" are used in
their generic forms.  Thus, a "SIP client" could refer to the client
transaction state machine in a SIP proxy or it could refer to a user
agent client.  Similarly, a "SIP server" could be a user agent server
or the server transaction state machine in a proxy.  Various
permutations of this are also possible; for instance, SIP clients and
servers could also be part of back-to-back user agents (B2BUAs).

However, irrespective of the context (i.e., proxy, B2BUA, UAS, UAC)
these terms are used in, "SIP client" applies to any SIP entity that
provides overload control to traffic destined downstream.  Similarly,
"SIP server" applies to any SIP entity that is experiencing overload
and would like its upstream neighbour to throttle incoming traffic.

Unless otherwise specified, all SIP entities described in this
document are assumed to support this specification.

The normative statements in this specification as they apply to SIP
clients and SIP servers assume that both the SIP clients and SIP
servers support this specification.  If, for instance, only a SIP
client supports this specification and not the SIP server, then it
follows that the normative statements in this specification pertinent
to the behavior of a SIP server do not apply to the server that does
not support this specification.

3.  Overview of operations

This section gives an overview of how the overload control mechanism
operates by introducing the overload control parameters.  Section 4
provides more details and normative behavior on the parameters listed
below.

Because overload control is best performed hop-by-hop, a Via header
parameter is attractive since it allows two adjacent SIP entities to
indicate support for, and exchange information associated with,
overload control [RFC6357].  Additional advantages of this choice are
discussed in Section 9.1.1.  An alternative mechanism using SIP event
packages was also considered, and the characteristics of that choice
are further outlined in Section 9.1.2.

This document defines four new parameters for the SIP Via header for
overload control.  These parameters provide a mechanism for conveying
overload control information between adjacent SIP entities.  The "oc"
parameter is used by a SIP server to indicate a reduction in the
number of requests arriving at the server.  The "oc-algo" parameter
contains a token or a list of tokens corresponding to the class of
overload control algorithms supported by the client.  The server
chooses one algorithm from this list.  The "oc-validity" parameter
establishes a time limit for which overload control is in effect, and
the "oc-seq" parameter aids in sequencing the responses at the
client.  These parameters are discussed in detail in the next
section.

4.  Via header parameters for overload control

The four Via header parameters are introduced below.  Further context
about how to interpret these under various conditions is provided in
Section 5.

4.1.  The oc parameter

This parameter is inserted by the SIP client and updated by the SIP
server.

A SIP client MUST add an "oc" parameter to the topmost Via header it
inserts into the SIP request.
This provides an indication to
downstream neighbors that the client supports overload control.
There MUST NOT be a value associated with the parameter (the value
will be added by the server).

The downstream server MUST add a value to the "oc" parameter in the
response going upstream.  Inclusion of a value in the parameter
represents two things.  First, upon an initial handshake (see
Section 5.1), addition of a value by the server to this parameter
indicates (to the client) that the downstream server supports
overload control as defined in this document.  Second, if overload
control is active, then it indicates the level of control to be
applied.

When a SIP client receives a response with the value in the "oc"
parameter filled in, it SHOULD reduce, as indicated by the "oc" and
"oc-algo" parameters, the number of requests going downstream to the
SIP server from which it received the response (see Section 5.10 for
pertinent discussion on traffic reduction).

4.2.  The oc-algo parameter

This parameter is inserted by the SIP client and updated by the SIP
server.

A SIP client MUST add an "oc-algo" parameter to the topmost Via
header it inserts into the SIP request.  This parameter contains the
names of one or more overload control algorithms.  A SIP client MUST
support the loss-based overload control scheme and MUST insert the
token "loss" as the "oc-algo" parameter value.  In addition, the SIP
client MAY insert other tokens, separated by commas, in the "oc-algo"
parameter if it supports other overload control schemes such as a
rate-based scheme ([I-D.noel-soc-overload-rate-control]).  Each
element in the comma-separated list corresponds to the class of
overload control algorithms supported by the SIP client.
When more than one class of
overload control algorithms is present in the "oc-algo" parameter,
the client may indicate algorithm preference by ordering the list in
a decreasing order of preference.  However, the client must not
assume that the server will pick the most preferred algorithm.

When a downstream SIP server receives a request with multiple
overload control algorithms specified in the "oc-algo" parameter
(optionally sorted by decreasing order of preference), it MUST choose
one algorithm from the list and MUST pare the list down to include
only the chosen algorithm.  The pared-down list consisting of the
chosen algorithm MUST be returned to the upstream SIP client in the
response.

Once a SIP client and a SIP server have converged to a mutually
agreeable class of overload control algorithm, the agreed-upon class
stays in effect for a non-trivial duration of time to allow the
overload control algorithm to stabilize its behaviour (see
Section 5.8).  Furthermore, the client MUST continue to include all
supported algorithms in subsequent requests; the server MUST respond
with the agreed-to algorithm until such time that the algorithm is
changed by the server (see Section 5.8).

4.3.  The oc-validity parameter

This parameter is inserted by the SIP server.

This parameter contains a value that indicates an interval of time
(measured in milliseconds) that the load reduction specified in the
value of the "oc" parameter should be in effect.  The default value
of the "oc-validity" parameter is 500 milliseconds.

A value of 0 in the "oc-validity" parameter is reserved to denote the
event that the server wishes to stop overload control (see
Section 5.7 for more information).

A SIP client MUST discard the "oc-validity" parameter if the client
receives it in a response without the corresponding "oc" parameter
being present as well.
A non-zero value for the "oc-validity"
parameter MUST only be present in conjunction with an "oc" parameter.

When the period during which the load reduction is in effect expires,
the SIP client MUST NOT accord any special meaning to the value of
the "oc", "oc-seq" and "oc-algo" parameters.

4.4.  The oc-seq parameter

This parameter is inserted by the SIP server.

This parameter contains a value that indicates the sequence number
associated with the "oc" parameter.  Some implementations may be
capable of updating the overload control information before the
validity period specified by the "oc-validity" parameter expires.
Such implementations MUST have an increasing value in the "oc-seq"
parameter for each response sent to the upstream SIP client.  This is
to allow the upstream SIP client to properly collate out-of-order
responses.

A timestamp can be used as a value of the "oc-seq" parameter.

If the value contained in the "oc-seq" parameter overflows during the
period in which the load reduction is in effect, then the "oc-seq"
parameter MUST be reset to the current timestamp or an appropriate
base value.

5.  General behaviour

When forwarding a SIP request, a SIP client uses the SIP procedures
of [RFC3263] to determine the next hop SIP server.  The procedures of
[RFC3263] take as input a SIP URI, extract the domain portion of that
URI for use as a lookup key, and query the Domain Name System (DNS)
to obtain an ordered set of one or more IP addresses with a port
number and transport corresponding to each IP address in this set
(the "Expected Output").

After selecting a specific SIP server from the Expected Output, a SIP
client MUST determine if it is operating under overload control mode
with the server (see Section 5.5) or if this is the initial contact
with the server.
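
The per-server state that this check consults can be sketched as
follows.  This is a minimal illustration, not part of the
specification: the names OverloadTable and OverloadEntry, the integer
millisecond clock, and the rule of ignoring a non-increasing "oc-seq"
while an entry is still valid are assumptions made for the example,
and the "oc-seq" reset on overflow described in Section 4.4 is not
handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DEFAULT_VALIDITY_MS = 500  # default "oc-validity" given in Section 4.3

Server = Tuple[str, int]   # (IP address, port) of the next hop


@dataclass
class OverloadEntry:
    oc: int           # last "oc" value received from this server
    seq: int          # last "oc-seq" value received from this server
    expires_ms: int   # absolute time at which the entry lapses


class OverloadTable:
    """Hypothetical client-side store of overload feedback per server."""

    def __init__(self) -> None:
        self._entries: Dict[Server, OverloadEntry] = {}

    def update(self, server: Server, now_ms: int, oc: Optional[int],
               seq: int, validity_ms: Optional[int] = None) -> bool:
        """Apply parameters from a response; return False if ignored."""
        if oc is None:
            # "oc-validity" without "oc" is discarded (Section 4.3).
            return False
        current = self._entries.get(server)
        if (current is not None and current.expires_ms > now_ms
                and seq <= current.seq):
            # Assumption: a non-increasing "oc-seq" marks a stale or
            # out-of-order response while the stored entry is valid.
            return False
        if validity_ms == 0:
            # "oc-validity=0" asks the client to stop overload control.
            self._entries.pop(server, None)
            return True
        if validity_ms is None:
            validity_ms = DEFAULT_VALIDITY_MS
        self._entries[server] = OverloadEntry(oc, seq, now_ms + validity_ms)
        return True

    def active_oc(self, server: Server, now_ms: int) -> Optional[int]:
        """Return the stored "oc" value, or None if none is in effect."""
        entry = self._entries.get(server)
        if entry is None:
            return None
        if now_ms >= entry.expires_ms:
            del self._entries[server]  # validity period has ended
            return None
        return entry.oc
```

A return value of None from active_oc() means the client holds no
unexpired feedback for that address and port, so no reduction applies.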

If the client determines that this is the initial contact with the
server, it follows the steps outlined in the first paragraph of
Section 5.1.  Otherwise, the client has conversed with this server
before and any overload control parameters established during the
previous exchange remain in effect.

5.1.  Handshake to determine support for overload control

If a client determines that this is the initial contact with the
server, the client MUST insert the "oc" parameter without any value,
and MUST insert the "oc-algo" parameter with a list of algorithms it
supports.  This list MUST include "loss" and MAY include other
algorithm names approved by IANA and described in corresponding
documents.  The client transmits the request to the chosen server.

A server that supports overload control MUST choose one algorithm
from the list of algorithms in the "oc-algo" parameter.  It MUST put
the chosen algorithm as the sole parameter value in the "oc-algo"
parameter of the response it sends to the client.  In addition, if
the server is currently not in an overload condition, it MUST set the
value of the "oc" parameter to be 0 and MAY insert an "oc-validity=0"
parameter in the response to further qualify the value in the "oc"
parameter.  If the server is currently overloaded, it MUST follow the
procedures of Section 5.2.

A client that supports the rate-based overload control scheme
[I-D.noel-soc-overload-rate-control] will consider "oc=0" as an
indication not to send any requests downstream at all.  Thus, when
the server inserts "oc-validity=0" as well, it is indicating that
it does support overload control, but it is not under overload
mode right now (see Section 5.7).

5.2.  Creating and updating the overload control parameters

A SIP server provides overload control feedback to its upstream
clients by providing a value for the "oc" parameter in the topmost
Via header field of a SIP response, that is, the Via header added by
the client before it sent the request to the server.

Since the topmost Via header of a response will be removed by an
upstream client after processing it, overload control feedback
contained in the "oc" parameter will not travel beyond the upstream
SIP client.  A Via header parameter therefore provides hop-by-hop
semantics for overload control feedback (see [RFC6357]) even if the
next hop neighbor does not support this specification.

The "oc" parameter can be used in all response types, including
provisional, success and failure responses (please see Section 5.11
for special consideration on transporting overload control parameters
in a 100-Trying response).  A SIP server MAY update the "oc"
parameter in all responses it is sending.  A SIP server MUST update
the "oc" parameter in responses when the transmission of overload
control feedback is required by the overload control algorithm to
limit the traffic received by the server.  That is, a SIP server MUST
update the "oc" parameter when the overload control algorithm sets
the value of an "oc" parameter to a value different than the default
value.

A SIP server that has updated the "oc" parameter in a Via header
SHOULD also add an "oc-validity" parameter to the same Via header.
The "oc-validity" parameter defines the time in milliseconds during
which the content (i.e., the overload control feedback) of the "oc"
parameter is valid.  The default value of the "oc-validity" parameter
is 500 milliseconds.  A SIP server SHOULD specify an oc-validity time
that is responsive to changing client-originated traffic rates, but
not so short as to introduce instability.
This is a complex subject and
outside the scope of this specification.  If the "oc-validity"
parameter is not present, its default value is used.  The
"oc-validity" parameter MUST NOT be used in a Via header that did not
originally contain an "oc" parameter when received.

When a SIP server retransmits a response, it SHOULD use the "oc"
parameter value and "oc-validity" parameter value consistent with the
overload state at the time the retransmitted response is sent.  This
implies that the values in the "oc" and "oc-validity" parameters may
be different than the ones used in previous retransmissions of the
response.  Because responses sent over UDP may be subject to delays
in the network and arrive out of order, the "oc-seq" parameter aids
in detecting a stale "oc" parameter value.

Implementations that are capable of updating the "oc" and
"oc-validity" parameter values for retransmissions MUST insert the
"oc-seq" parameter.  The value of this parameter MUST be a set of
numbers drawn from an increasing sequence.

Implementations that are not capable of updating the "oc" and
"oc-validity" parameter values for retransmissions (or implementations
that do not want to do so because they would have to regenerate the
message to be retransmitted) MUST still insert an "oc-seq"
parameter in the first response associated with a transaction;
however, they do not have to update the value in subsequent
retransmissions.

The "oc-validity" and "oc-seq" Via header parameters are only defined
in SIP responses and MUST NOT be used in SIP requests.  These
parameters are only useful to the upstream neighbor of a SIP server
(i.e., the entity that is sending requests to the SIP server) since
this is the entity that can offload traffic by redirecting/rejecting
new requests.
If requests are forwarded in both directions between
two SIP servers (i.e., the roles of upstream/downstream neighbors
change), there are also responses flowing in both directions.  Thus,
both SIP servers can exchange overload information.

Since overload control protects a SIP server from overload, it is
RECOMMENDED that a SIP server use the mechanisms described in this
specification.  However, if a SIP server wants to limit its overload
control capability for privacy reasons, it MAY decide to perform
overload control only for requests that are received on a secure
transport channel, such as TLS.  This enables a SIP server to protect
overload control information and ensure that it is only visible to
trusted parties.

5.3.  Determining the 'oc' Parameter Value

The value of the "oc" parameter is determined by the overloaded
server using any pertinent information at its disposal.  The only
constraint imposed by this document is that the server control
algorithm MUST produce a value for the "oc" parameter such that the
receiving clients can apply it to all downstream requests
(dialog-forming as well as in-dialog).  Beyond this stipulation, the
process by which an overloaded server determines the value of the
"oc" parameter is considered out of scope for this document.

Note that this stipulation is required so that both the client and
server have a common view of which messages to include in the
calculation of the feedback.  With this stipulation in place, the
client can prioritize messages as discussed in Section 5.10.1.

As an example, a value of "oc=10" when the loss-based algorithm is
used implies that 10% of all requests (dialog-forming as well as
in-dialog) are subject to reduction at the client.
Analogously, a
value of "oc=10" when the rate-based algorithm
[I-D.noel-soc-overload-rate-control] is used indicates that the
client should send SIP requests at a rate no greater than
10 SIP requests per second.

5.4.  Processing the Overload Control Parameters

A SIP client SHOULD remove "oc", "oc-validity" and "oc-seq"
parameters from all Via headers of a response received, except for
the topmost Via header.  This prevents overload control parameters
that were accidentally or maliciously inserted into Via headers by a
downstream SIP server from traveling upstream.

The scope of overload control applies to unique combinations of IP
address and port values.  A SIP client maintains the "oc" parameter
values received along with the address and port number of the SIP
servers from which they were received for the duration specified in
the "oc-validity" parameter or the default duration.  Each time a SIP
client receives a response with an "oc" parameter from a downstream
SIP server, it overwrites the "oc" value it has currently stored for
this server with the new value received.  The SIP client restarts the
validity period of an "oc" parameter each time a response with an
"oc" parameter is received from this server.  A stored "oc" parameter
value MUST be discarded once it has reached the end of its validity.

5.5.  Using the Overload Control Parameter Values

A SIP client MUST honor overload control values it receives from
downstream neighbors.  The SIP client MUST NOT forward more requests
to a SIP server than allowed by the current "oc" parameter value from
that particular downstream server.

When forwarding a SIP request, a SIP client uses the SIP procedures
of [RFC3263] to determine the next hop SIP server.
The procedures of
[RFC3263] take as input a SIP URI, extract the domain portion of that
URI for use as a lookup key, and query the Domain Name System (DNS)
to obtain an ordered set of one or more IP addresses with a port
number and transport corresponding to each IP address in this set
(the "Expected Output").

After selecting a specific SIP server from the Expected Output, the
SIP client MUST determine if it already has overload control
parameter values for the server chosen from the Expected Output.  If
the SIP client has a non-expired "oc" parameter value for the server
chosen from the Expected Output, then this chosen server is operating
in overload control mode.  Thus, the SIP client MUST determine if it
can or cannot forward the current request to the SIP server depending
on the nature of the request and the prevailing overload conditions.

The particular algorithm used to determine whether or not to forward
a particular SIP request is a matter of local policy, and may take
into account a variety of prioritization factors.  However, this
local policy SHOULD generate the same number of SIP requests as the
default algorithm defined by the overload control scheme being used.

5.6.  Forwarding the overload control parameters

Overload control is defined in a hop-by-hop manner.  Therefore,
forwarding the contents of the overload control parameters is
generally NOT RECOMMENDED and should only be performed if permitted
by the configuration of SIP servers.  This means that a SIP proxy
SHOULD strip the overload control parameters inserted by the client
before proxying the request further downstream.

5.7.  Terminating overload control

A SIP client removes overload control if one of the following events
occurs:

1.  The "oc-validity" period negotiated to put the server and client
    in overload state expires;
2.  The client is explicitly told by the server to stop performing
    overload control using the "oc-validity=0" parameter.

A SIP server can decide to terminate overload control by explicitly
signaling the client.  To do so, the SIP server MUST set the value of
the "oc-validity" parameter to 0.  The SIP server MUST increment the
value of "oc-seq", and SHOULD set the value of the "oc" parameter to
0.

Note that the loss-based overload control scheme (Section 6) can
effectively stop overload control by setting the value of the "oc"
parameter to 0.  However, the rate-based scheme
([I-D.noel-soc-overload-rate-control]) needs an additional piece
of information in the form of "oc-validity=0".

When the client receives a response with a higher "oc-seq" number
than the one it currently is processing, it checks the "oc-validity"
parameter.  If the value of the "oc-validity" parameter is 0, the
client MUST stop performing overload control of messages destined to
the server and the traffic should flow without any reduction.
Furthermore, when the value of the "oc-validity" parameter is 0, the
client SHOULD disregard the value in the "oc" parameter.

5.8.  Stabilizing overload algorithm selection

Realities of deployments of SIP necessitate that the overload control
algorithm be renegotiated upon a system reboot or a software upgrade.
However, frequent renegotiation of the overload control algorithm
MUST be avoided.  A rapid renegotiation of the overload control
algorithm will not benefit the client or the server as such flapping
does not allow the chosen algorithm to measure and fine tune its
behavior over a period of time.  Renegotiation, when desired, is
simply accomplished by the SIP server choosing a new algorithm from
the list in the "oc-algo" parameter and sending it back to the client
in a response.
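
The server-side choice described above can be sketched as follows.
This is an illustration under stated assumptions, not a normative
procedure: the function name choose_algorithm is hypothetical, the
token "rate" stands in for some non-mandatory algorithm name, and
walking the client's list in its stated order is only one possible
policy, since the server is free to pick any algorithm it supports.

```python
from typing import Optional, Sequence


def choose_algorithm(offered: Sequence[str],
                     supported: Sequence[str] = ("loss",),
                     current: Optional[str] = None) -> Optional[str]:
    """Pick the single token the server returns in "oc-algo".

    offered   -- tokens from the client's "oc-algo" list, in the
                 client's order of preference
    supported -- tokens this server implements (hypothetical config)
    current   -- algorithm already agreed with this client, if any
    """
    # Keep the agreed algorithm while the client still offers it, so
    # that the selection does not flap between requests.
    if current is not None and current in offered:
        return current
    # Otherwise take the first offered token the server implements.
    for token in offered:
        if token in supported:
            return token
    # No overlap; not expected, since "loss" is mandatory for both.
    return None
```

For example, choose_algorithm(["rate", "loss"]) yields "loss" on a
server that implements only the mandatory scheme.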
573 The client associates a specific algorithm with each server it sends 574 traffic to so that when the server changes the algorithm, the 575 client must behave accordingly as well. 577 Once the client and server agree on an overload control algorithm, it 578 MUST remain in effect for at least 3600 seconds (1 hour) before 579 renegotiation occurs. 581 One way to accomplish this involves the server saving the time of 582 the last negotiation in a lookup table, indexed by the client's 583 network identifiers. Renegotiation is only done when the time elapsed since 584 the last negotiation exceeds 3600 seconds. 586 5.9. Self-Limiting 588 In some cases, a SIP client may not receive a response from a server 589 after sending a request. RFC3261 [RFC3261] specifies that when a 590 timeout error is received from the transaction layer, it MUST be 591 treated as if a 408 (Request Timeout) status code has been received. 592 If a fatal transport error is reported by the transport layer, it 593 MUST be treated as a 503 (Service Unavailable) status code. 595 In the event of repeated timeouts or fatal transport errors, the SIP 596 client MUST stop sending requests to this server. The SIP client 597 SHOULD periodically probe whether the downstream server is alive using any 598 mechanism at its disposal. Once a SIP client has 599 successfully transmitted a request to the downstream server, the SIP 600 client can resume normal traffic rates. It should, of course, honor 601 any "oc" parameters it may receive subsequent to resuming normal 602 traffic rates. 604 5.10. Responding to an Overload Indication 606 A SIP client can receive overload control feedback indicating that it 607 needs to reduce the traffic it sends to its downstream server. The 608 client can accomplish this task by sending some of the requests that 609 would have gone to the overloaded element to a different destination.
610 It needs to ensure, however, that this destination is not in overload 611 and is capable of processing the extra load. A client can also buffer 612 requests in the hope that the overload condition will resolve quickly 613 and the requests can still be forwarded in time. In many cases, 614 however, it will need to reject these requests. 616 5.10.1. Message prioritization at the hop before the overloaded server 618 During an overload condition, a SIP client needs to prioritize 619 requests and select those requests that need to be rejected or 620 redirected. While this selection is largely a matter of local 621 policy, certain heuristics can be suggested. First, during overload 622 control, the SIP client should preserve existing dialogs as much as 623 possible. This suggests that mid-dialog requests MAY be given 624 preferential treatment. Similarly, requests that result in releasing 625 resources (such as a BYE) MAY also be given preferential treatment. 627 A SIP client SHOULD honor the local policy for prioritizing SIP 628 requests such as policies based on the content of the 629 Resource-Priority header (RPH, RFC4412 [RFC4412]). Specific (namespace.value) 630 RPH contents may indicate high-priority requests that should be 631 preserved as much as possible during overload. The RPH contents can 632 also indicate a low-priority request that is eligible to be dropped 633 during times of overload. Other indicators, such as the SOS URN 634 [RFC5031] indicating an emergency request, may also be used for 635 prioritization. 637 Local policy could also include giving precedence to mid-dialog SIP 638 requests (re-INVITEs, UPDATEs, BYEs, etc.) in times of overload. A 639 local policy can be expected to combine both the SIP request type and 640 the prioritization markings, and SHOULD be honored when overload 641 conditions prevail.
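The heuristics above might be combined into a single classification step, as in the sketch below. It is illustrative only: the Request fields, the HIGH_PRIORITY_RPH set, and the is_protected() function are hypothetical names introduced here, and which Resource-Priority values count as high priority remains a matter of local policy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical local-policy table of (namespace.value) RPH contents that
# mark a request as high priority; a deployment would configure its own.
HIGH_PRIORITY_RPH = {"ets.0", "wps.0"}


@dataclass
class Request:
    """Hypothetical, simplified view of an outbound SIP request."""
    method: str                              # e.g. "INVITE", "BYE"
    in_dialog: bool = False                  # True for mid-dialog requests
    request_uri: str = ""
    resource_priority: Optional[str] = None  # e.g. "ets.0"


def is_protected(req):
    """Return True if the request should be shielded from reduction."""
    if req.request_uri.lower().startswith("urn:service:sos"):
        return True   # emergency request (SOS URN)
    if req.resource_priority in HIGH_PRIORITY_RPH:
        return True   # high-priority RPH content
    if req.in_dialog:
        return True   # preserve existing dialogs
    if req.method == "BYE":
        return True   # request releases resources
    return False
```

Requests for which is_protected() returns False are the natural candidates for rejection or redirection when the downstream server signals overload.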
643 A SIP client SHOULD honor user-level load control filters installed 644 by signaling neighbors [I-D.ietf-soc-load-control-event-package] by 645 sending the SIP messages that matched the filter downstream. 647 5.10.2. Rejecting requests at an overloaded server 649 If the SIP client upstream of the overloaded server does not support 650 overload control, it will continue to direct requests to the 651 overloaded server. Thus, the overloaded server must bear the cost of 652 rejecting some session requests as well as the cost of processing 653 other requests to completion. It would be fair to devote the same 654 amount of processing at the overloaded server to the combination of 655 rejection and processing as the overloaded server would devote to 656 processing requests from an upstream SIP client that supported 657 overload control. This is to ensure that SIP servers that do not 658 support this specification do not receive an unfair advantage over 659 those that do. 661 A SIP server that is in overload and has started to throttle 662 incoming traffic MUST reject some requests from upstream neighbors 663 that do not support overload control with a 503 (Service 664 Unavailable) response that carries no Retry-After header. 667 5.11. 100-Trying provisional response and overload control parameters 669 The overload control information sent from a SIP server to a client 670 is transported in the responses. While implementations can insert 671 overload control information in any response, special attention 672 should be accorded to overload control information transported in a 673 100-Trying response. 675 Traditionally, the 100-Trying response has been used in SIP to quench 676 retransmissions. In some implementations, the 100-Trying message may 677 be neither generated nor consumed by the transaction user (TU). 678 In these implementations, the 100-Trying response is generated at the 679 transaction layer and sent to the upstream SIP client.
At the 680 receiving SIP client, the 100-Trying is consumed at the transaction 681 layer by inhibiting the retransmission of the corresponding request. 682 Consequently, implementations that insert overload control 683 information in the 100-Trying cannot assume that the upstream SIP 684 client passed the overload control information in the 100-Trying to 685 their corresponding TU. For this reason, implementations that insert 686 overload control information in the 100-Trying MUST re-insert the 687 same (or updated) overload control information in the first non-100 688 response being sent to the upstream SIP client. 690 6. The loss-based overload control scheme 692 A loss percentage enables a SIP server to ask an upstream neighbor to 693 reduce the number of requests it would normally forward to this 694 server by X%. For example, a SIP server can ask an upstream neighbor 695 to reduce the number of requests this neighbor would normally send by 696 10%. The upstream neighbor then redirects or rejects 10% of the 697 traffic that is destined for this server. 699 This section specifies the semantics of the overload control 700 parameters associated with the loss-based overload control scheme. 701 The general behaviour of SIP clients and servers is specified in 702 Section 5 and is applicable to SIP clients and servers that implement 703 loss-based overload control. 705 6.1. Special parameter values for loss-based overload control 707 The loss-based overload control scheme is identified using the token 708 "loss". This token MUST appear in the "oc-algo" parameter. 710 A SIP server, upon entering the overload state, will assign a value 711 to the "oc" parameter. This value MUST be restricted in the range of 712 [0, 100], inclusive. This value MUST be interpreted as a percentage, 713 and the SIP client MUST reduce the number of requests being forwarded 714 to the overloaded server by that amount. 
The SIP client may use any 715 algorithm that reduces the traffic arriving at the overloaded server 716 by the amount indicated. Such an algorithm SHOULD honor the message 717 prioritization discussion of Section 5.10.1. While a particular 718 algorithm is not subject to standardization, for completeness a 719 default algorithm for loss-based overload control is provided in 720 Section 6.3. 722 When a SIP server receives a request from a client with an "oc" 723 parameter but without a value, and the SIP server is not experiencing 724 overload, it MUST assign a value of 0 to the "oc" parameter in the 725 response. Assigning such a value lets the client know that the 726 server supports overload control and is not currently experiencing 727 overload. 729 When the "oc-validity" parameter is used to signify overload control 730 termination (Section 5.7), the server MUST insert a value of 0 in the 731 "oc-validity" parameter. The server MUST insert a value of 0 in the 732 "oc" parameter as well. When a client receives a response whose 733 "oc-validity" parameter contains a 0, it MUST treat any non-zero value in 734 the "oc" parameter as if it had received a value of 0 in that 735 parameter. 737 6.2. Example 739 Consider a SIP client, P1, which is sending requests to a 740 downstream SIP server, P2. The following snippets of SIP messages 741 demonstrate how the overload control parameters work.

      INVITE sips:user@example.com SIP/2.0
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.1;oc;oc-algo="loss,A"
      ...

      SIP/2.0 100 Trying
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.1;received=192.0.2.111;
        oc=0;oc-algo="loss";
      ...

754 In the messages above, the first message is sent by P1 to P2. This message 755 is a SIP request; because P1 supports overload control, it inserts 756 the "oc" parameter in the topmost Via header that it created. P1 757 supports two overload control algorithms: loss and some algorithm 758 called "A".
760 The second message, a SIP response, shows the topmost Via header 761 amended by P2 according to this specification and sent to P1. 762 Because P2 also supports overload control, it chooses the loss-based 763 scheme and sends that back to P1 in the "oc-algo" parameter. 764 It also sets the value of the "oc" parameter to 0. 766 Had P2 not supported overload control, it would have left the "oc" 767 and "oc-algo" parameters unchanged, thus allowing the client to know 768 that it did not support overload control. 770 At some later time, P2 starts to experience overload. It sends the 771 following SIP message indicating that P1 should reduce the messages 772 arriving at P2 by 20% for 1 second.

      SIP/2.0 180 Ringing
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.3;received=192.0.2.111;
        oc=20;oc-algo="loss";oc-validity=1000;
        oc-seq=1282321615.782
      ...

781 After 500 ms, the overload condition at P2 subsides. It then sends 782 out the message below to allow P1 to send all messages destined to 783 P2.

      SIP/2.0 183 Queued
      Via: SIP/2.0/TLS p1.example.net;
        branch=z9hG4bK2d4790.4;received=192.0.2.111;
        oc=0;oc-algo="loss";oc-validity=0;oc-seq=1282321887.783
      ...

791 6.3. Default algorithm for loss-based overload control 793 This section describes a default algorithm that a SIP client can use to 794 throttle SIP traffic going downstream by the percentage loss value 795 specified in the "oc" parameter. 797 The client maintains two categories of requests: the first category 798 will include requests that are candidates for reduction, and the 799 second category will include requests that are not subject to 800 reduction (except under extenuating circumstances when there aren't 801 any messages in the first category that can be reduced). 802 Section 5.10.1 contains normative directives on how to prioritize 803 messages for inclusion in the second category. The remaining 804 messages can be allocated to the first category.
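As a rough, runnable illustration of the two-category scheme in this section, the sketch below converts the overall "oc" percentage into a drop percentage for first-category requests and applies random discard. It is a sketch under stated assumptions rather than a normative algorithm: the function names and the cat1_share parameter are hypothetical, and Python's random module stands in for whatever random source an implementation uses.

```python
import random


def category1_drop_pct(oc_value, cat1_share):
    """Convert the overall "oc" percentage into the percentage of
    first-category requests to drop.

    oc_value   -- value of the "oc" parameter, 0..100
    cat1_share -- percentage of all requests that fall into the first
                  category (candidates for reduction), 0 < share <= 100
    """
    # Cap at 100: the first category cannot give up more than all of
    # its requests.
    return min(100.0, oc_value / cat1_share * 100.0)


def should_drop(in_category1, oc_value, cat1_share, rng=random):
    """Random discard for one request; second-category requests pass."""
    if not in_category1:
        return False
    # Draw an integer in [1, 100]; drop the request when the draw does
    # not exceed the converted percentage.
    return rng.randint(1, 100) <= category1_drop_pct(oc_value, cat1_share)
```

With oc_value=10 and cat1_share=40, category1_drop_pct() yields 25, which matches the worked example that follows in this section. Handling of second-category requests under extenuating circumstances is left to local policy and is not modeled here.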
806 The client determines the mix of requests falling into the first 807 category and those falling into the second category. For example, 808 40% of the requests may be eligible for reduction and 60% not 809 eligible (and therefore must be sent downstream). 811 Under overload conditions, the client converts the value of the "oc" 812 parameter to a value that it applies to requests in the first 813 category. As a simple example, if "oc=10" and 40% of the requests 814 should be included in the first category, then:

      10 / 40 * 100 = 25

818 That is, 25% of the requests in the first category can be reduced to get 819 an overall reduction of 10%. The client uses random discard to 820 achieve the 25% reduction of messages in the first category. 821 Messages in the second category proceed downstream unscathed. To 822 effect the 25% reduction rate from the first category, the client 823 draws a random number between 1 and 100 for the request picked from 824 the first category. If the random number is less than or equal to 825 the converted value of the "oc" parameter, the request is not forwarded; 826 otherwise the request is forwarded. 828 A reference algorithm is shown below.

   cat1 := 80.0         // Category 1 --- subject to reduction
   cat2 := 100.0 - cat1 // Category 2 --- Not subject to
                        // reduction.  80/20 mix.

   while (true) {
     // We're modeling message processing as a single work queue
     // that contains both incoming and outgoing messages.
     sip_msg := get_next_message_from_work_queue()

     switch (sip_msg.type) {

       case outbound request:
         destination := get_next_hop(sip_msg)
         oc_context := get_oc_context(destination)

         if (oc_context == null) {
           send_to_network(sip_msg) // Process it normally by sending the
           // request to the next hop since this particular destination
           // is not subject to overload
         }
         else {
           // Determine if server wants to enter in overload or is in
           // overload
           in_oc := extract_in_oc(oc_context)

           oc_value := extract_oc(oc_context)
           oc_validity := extract_oc_validity(oc_context)

           if (in_oc == false or oc_validity is not in effect) {
             send_to_network(sip_msg) // Process it normally by sending
             // the request to the next hop since this particular
             // destination is not subject to overload.  Optionally,
             // clear the oc context for this server (not shown).
           }
           else {
             category := assign_msg_to_category(sip_msg)
             drop_msg := false
             pct_to_reduce := min(100, oc_value / cat1 * 100)

             r := random()  // Random number between 1 and 100
             if (r <= pct_to_reduce) {
               drop_msg := true
             }

             if (category == cat2 && drop_msg == true) {
               if (local_policy(sip_msg, oc_value) says
                   process message) {
                 drop_msg := false  // See Note below
               }
             }

             if (drop_msg == false) {
               send_to_network(sip_msg) // Process it normally by
               // sending the request to the next hop
             }
             else {
               // Do not send request downstream, handle locally by
               // generating response (if a proxy) or treating as
               // an error (if a user agent).
             }
           }
         }

       end case // outbound request

       case outbound response:
         if (we are in overload) {
           add_overload_parameters(sip_msg)
         }
         send_to_network(sip_msg)

       end case // outbound response

       case inbound response:

         if (sip_msg has oc parameter values) {
           create_or_update_oc_context() // For the specific server
           // that sent the response, create or update the oc context;
           // i.e., extract the values of the oc-related parameters
           // and store them for later use.
         }
         process_msg(sip_msg)

       end case // inbound response

       case inbound request:

         if (we are not in overload) {
           process_msg(sip_msg)
         }
         else { // We are in overload
           if (sip_msg has oc parameters) { // Upstream client supports
             process_msg(sip_msg)  // oc; only sends important requests
           }
           else { // Upstream client does not support oc
             if (local_policy(sip_msg) says process message) {
               process_msg(sip_msg)
             }
             else {
               send_response(sip_msg, 503)
             }
           }
         }
       end case // inbound request
     }
   }

937 Note: local_policy() will have to decide whether to allow a category 938 2 request downstream if that request has been marked for discard. 939 Some discussion on how to make this decision is captured in Section 940 5.10.1. 942 There are four cases to consider in figuring out how local_policy() 943 should behave. These are enumerated below, and in these cases, t is 944 the invocation rate of local_policy() and oc is the value of 945 the "oc" parameter.

      Case 1: t is small (default: <= 10 times/sec) and oc is small (default: < 20%)
      Case 2: t is large (default: >= 200 times/sec) and oc is large (default: > 70%)
      Case 3: t is small and oc is large
      Case 4: t is large and oc is small

954 The decision in cases 1 and 3 seems simple. In case 1, local_policy() 955 is not invoked as often and the oc value is small.
On the few 956 times that local_policy() is invoked, it could allow the request to 957 be sent to the server. 959 In case 3, local_policy() is not invoked as often but the oc value 960 is large. This implies that there are enough category 1 messages 961 that are being dropped. On the few times that local_policy() is 962 invoked, it could allow the request to be sent to the server. 964 It is in cases 2 and 4 that local_policy() should do something more 965 intelligent. 967 In case 2, local_policy() is getting invoked very 968 often and the oc value is also large. This implies that category 1 969 requests are being dropped as much as possible and it will help 970 to drop a good number of category 2 requests as well. Thus, 971 it seems reasonable to drop all but the SOS URN [RFC5031] 972 requests and high-priority RPH content requests. 974 In case 4, local_policy() is getting invoked very often, but the 975 oc value is small. This implies that the bulk of traffic to be 976 dropped consists of category 2 requests. So here, it seems 977 reasonable to drop all but the SOS URN [RFC5031] requests. 979 7. Relationship with other IETF SIP load control efforts 981 The overload control mechanism described in this document is reactive 982 in nature, and apart from the message prioritization directives listed in 983 Section 5.10.1, the mechanisms described in this draft will not 984 discriminate among requests based on user identity, filtering action, and 985 arrival time. SIP networks that require pro-active overload control 986 mechanisms can upload user-level load control filters as described in 987 [I-D.ietf-soc-load-control-event-package]. 989 8.
Syntax 991 This specification extends the existing definition of the Via header 992 field parameters of [RFC3261] as follows:

      via-params  = via-ttl / via-maddr
                    / via-received / via-branch
                    / oc / oc-validity
                    / oc-seq / oc-algo / via-extension

      oc          = "oc" [EQUAL oc-num]
      oc-num      = 1*DIGIT
      oc-validity = "oc-validity" [EQUAL delta-ms]
      oc-seq      = "oc-seq" EQUAL 1*12DIGIT "." 1*5DIGIT
      oc-algo     = "oc-algo" EQUAL DQUOTE algo-list *(COMMA algo-list)
                    DQUOTE
      algo-list   = "loss" / *(other-algo)
      other-algo  = %x41-5A / %x61-7A / %x30-39
      delta-ms    = 1*DIGIT

1009 9. Design Considerations 1011 This section discusses specific design considerations for the 1012 mechanism described in this document. General design considerations 1013 for SIP overload control can be found in [RFC6357]. 1015 9.1. SIP Mechanism 1017 A SIP mechanism is needed to convey overload feedback from the 1018 receiving to the sending SIP entity. A number of different 1019 alternatives exist to implement such a mechanism. 1021 9.1.1. SIP Response Header 1023 Overload control information can be transmitted using a new Via 1024 header field parameter for overload control. A SIP server can add 1025 this header parameter to the responses it is sending upstream to 1026 provide overload control feedback to its upstream neighbors. This 1027 approach has the following characteristics: 1029 o A Via header parameter is light-weight and creates very little 1030 overhead. It does not require the transmission of additional 1031 messages for overload control and does not increase traffic or 1032 processing burdens in an overload situation. 1033 o Overload control status can frequently be reported to upstream 1034 neighbors since it is a part of a SIP response. This enables the 1035 use of this mechanism in scenarios where the overload status needs 1036 to be adjusted frequently.
It also enables the use of overload 1037 control mechanisms that use regular feedback such as window-based 1038 overload control. 1040 o With a Via header parameter, overload control status is inherent 1041 in SIP signaling and is automatically conveyed to all relevant 1042 upstream neighbors, i.e., neighbors that are currently 1043 contributing traffic. There is no need for a SIP server to 1044 specifically track and manage the set of current upstream or 1045 downstream neighbors with which it should exchange overload 1046 feedback. 1047 o Overload status is not conveyed to inactive senders. This avoids 1048 the transmission of overload feedback to inactive senders, which 1049 do not contribute traffic. If an inactive sender starts to 1050 transmit while the receiver is in overload it will receive 1051 overload feedback in the first response and can adjust the amount 1052 of traffic forwarded accordingly. 1053 o A SIP server can limit the distribution of overload control 1054 information by only inserting it into responses to known upstream 1055 neighbors. A SIP server can use transport level authentication 1056 (e.g., via TLS) with its upstream neighbors. 1058 9.1.2. SIP Event Package 1060 Overload control information can also be conveyed from a receiver to 1061 a sender using a new event package. Such an event package enables a 1062 sending entity to subscribe to the overload status of its downstream 1063 neighbors and receive notifications of overload control status 1064 changes in NOTIFY requests. This approach has the following 1065 characteristics: 1067 o Overload control information is conveyed decoupled from SIP 1068 signaling. It enables an overload control manager, which is a 1069 separate entity, to monitor the load on other servers and provide 1070 overload control feedback to all SIP servers that have set up 1071 subscriptions with the controller. 1072 o With an event package, a receiver can send updates to senders that 1073 are currently inactive. 
Inactive senders will receive a 1074 notification about the overload and can refrain from sending 1075 traffic to this neighbor until the overload condition is resolved. 1076 The receiver can also notify all potential senders once they are 1077 permitted to send traffic again. However, these notifications do 1078 generate additional traffic, which adds to the overall load. 1079 o A SIP entity needs to set up and maintain overload control 1080 subscriptions with all upstream and downstream neighbors. A new 1081 subscription needs to be set up before or while a request is 1082 transmitted to a new downstream neighbor. Servers can be 1083 configured to subscribe at boot time. However, this would require 1084 additional protection to avoid the avalanche restart problem for 1085 overload control. Subscriptions need to be terminated when they 1086 are no longer needed, which can be done, for example, using a 1087 timeout mechanism. 1089 o A receiver needs to send NOTIFY messages to all subscribed 1090 upstream neighbors in a timely manner when the control algorithm 1091 requires a change in the control variable (e.g., when a SIP server 1092 is in an overload condition). This includes active as well as 1093 inactive neighbors. These NOTIFYs add to the amount of traffic 1094 that needs to be processed. To ensure that these requests will 1095 not be dropped due to overload, a priority mechanism needs to be 1096 implemented in all servers these requests will pass through. 1097 o As overload feedback is sent to all senders in separate messages, 1098 this mechanism is not suitable when frequent overload control 1099 feedback is needed. 1100 o A SIP server can limit the set of senders that can receive 1101 overload control information by authenticating subscriptions to 1102 this event package. 1103 o This approach requires each proxy to implement user agent 1104 functionality (UAS and UAC) to manage the subscriptions. 1106 9.2.
Backwards Compatibility 1108 A new overload control mechanism needs to be backwards compatible so 1109 that it can be gradually introduced into a network and function 1110 properly even if only a fraction of the servers support it. 1112 Hop-by-hop overload control (see [RFC6357]) has the advantage that it 1113 does not require that all SIP entities in a network support it. It 1114 can be used effectively between two adjacent SIP servers if both 1115 servers support overload control and does not depend on the support 1116 from any other server or user agent. The more SIP servers in a 1117 network support hop-by-hop overload control, the better protected the 1118 network is against occurrences of overload. 1120 A SIP server may have multiple upstream neighbors, of which only 1121 some support overload control. If a server simply used this 1122 overload control mechanism, only those that support it would reduce 1123 traffic. Others would keep sending at the full rate and benefit from 1124 the throttling by the servers that support overload control. In 1125 other words, upstream neighbors that do not support overload control 1126 would be better off than those that do. 1128 A SIP server should therefore follow the behaviour outlined in 1129 Section 5.10.2 to handle clients that do not support overload 1130 control. 1132 10. Security Considerations 1134 Overload control mechanisms can be used by an attacker to conduct a 1135 denial-of-service attack on a SIP entity if the attacker can pretend 1136 that the SIP entity is overloaded. When such a forged overload 1137 indication is received by an upstream SIP client, it will stop 1138 sending traffic to the victim. Thus, the victim is subject to a 1139 denial-of-service attack. 1141 An attacker can create forged overload feedback by inserting itself 1142 into the communication between the victim and its upstream neighbors.
1143 The attacker would need to add overload feedback indicating a high 1144 load to the responses passed from the victim to its upstream 1145 neighbor. Proxies can prevent this attack by communicating via TLS. 1146 Since overload feedback has no meaning beyond the next hop, there is 1147 no need to secure the communication over multiple hops. 1149 Another way to conduct an attack is to send a message containing a 1150 high overload feedback value through a proxy that does not support 1151 this extension. If this feedback is added to the second Via headers 1152 (or all Via headers), it will reach the next upstream proxy. If the 1153 attacker can make the recipient believe that the overload status was 1154 created by its direct downstream neighbor (and not by the attacker 1155 further downstream) the recipient stops sending traffic to the 1156 victim. A precondition for this attack is that the victim proxy does 1157 not support this extension since it would not pass through overload 1158 control feedback otherwise. 1160 A malicious SIP entity could gain an advantage by pretending to 1161 support this specification but never reducing the amount of traffic 1162 it forwards to the downstream neighbor. If its downstream neighbor 1163 receives traffic from multiple sources which correctly implement 1164 overload control, the malicious SIP entity would benefit since all 1165 other sources to its downstream neighbor would reduce load. 1167 The solution to this problem depends on the overload control 1168 method. For rate-based and window-based overload control, it is 1169 very easy for a downstream entity to monitor if the upstream 1170 neighbor throttles traffic forwarded as directed. For percentage 1171 throttling this is not always obvious since the load forwarded 1172 depends on the load received by the upstream neighbor. 1174 11. 
IANA Considerations 1176 This specification defines four new Via header parameters as detailed 1177 below in the "Header Field Parameters and Parameter Values" 1178 sub-registry as per the registry created by [RFC3968]. The required 1179 information is:

      Header Field  Parameter Name  Predefined Values  Reference
      __________________________________________________________
      Via           oc              Yes                RFCXXXX
      Via           oc-validity     Yes                RFCXXXX
      Via           oc-seq          Yes                RFCXXXX
      Via           oc-algo         Yes                RFCXXXX

1188 RFC XXXX [NOTE TO RFC-EDITOR: Please replace with final RFC 1189 number of this specification.] 1191 NOTE: Do we need to do anything special to register "loss" 1192 as a value for the "oc-algo" parameter? 1194 12. References 1196 12.1. Normative References 1198 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1199 Requirement Levels", BCP 14, RFC 2119, March 1997. 1201 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1202 A., Peterson, J., Sparks, R., Handley, M., and E. 1203 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1204 June 2002. 1206 [RFC3263] Rosenberg, J. and H. Schulzrinne, "Session Initiation 1207 Protocol (SIP): Locating SIP Servers", RFC 3263, 1208 June 2002. 1210 [RFC3968] Camarillo, G., "The Internet Assigned Number Authority 1211 (IANA) Header Field Parameter Registry for the Session 1212 Initiation Protocol (SIP)", BCP 98, RFC 3968, 1213 December 2004. 1215 [RFC4412] Schulzrinne, H. and J. Polk, "Communications Resource 1216 Priority for the Session Initiation Protocol (SIP)", 1217 RFC 4412, February 2006. 1219 12.2. Informative References 1221 [I-D.ietf-soc-load-control-event-package] 1222 Shen, C., Schulzrinne, H., and A. Koike, "A Session 1223 Initiation Protocol (SIP) Load Control Event Package", 1224 draft-ietf-soc-load-control-event-package-03 (work in 1225 progress), March 2012. 1227 [I-D.noel-soc-overload-rate-control] 1228 Noel, E. and P.
Williams, "Session Initiation Protocol 1229 (SIP) Rate Control", 1230 draft-noel-soc-overload-rate-control-02 (work in 1231 progress), December 2011. 1233 [RFC5031] Schulzrinne, H., "A Uniform Resource Name (URN) for 1234 Emergency and Other Well-Known Services", RFC 5031, 1235 January 2008. 1237 [RFC5390] Rosenberg, J., "Requirements for Management of Overload in 1238 the Session Initiation Protocol", RFC 5390, December 2008. 1240 [RFC6357] Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design 1241 Considerations for Session Initiation Protocol (SIP) 1242 Overload Control", RFC 6357, August 2011. 1244 Appendix A. Acknowledgements 1246 Many thanks to Bruno Chatras, Keith Drage, Janet Gunn, Rich Terpstra, 1247 Daryl Malas, R. Parthasarathi, Antoine Roly, Jonathan Rosenberg, 1248 Charles Shen, Rahul Srivastava, Padma Valluri, Shaun Bharrat, Paul 1249 Kyzivat and Jeroen Van Bemmel for their contributions to this 1250 specification. 1252 Adam Roach and Eric McMurry helped flesh out the different cases for 1253 handling SIP messages described in the algorithm of Section 6.3. 1255 Appendix B. RFC5390 requirements 1257 Table 1 provides a summary of how this specification fulfills the 1258 requirements of [RFC5390]. A more detailed view of how each 1259 requirement is fulfilled is provided after the table.
      +-------------+-------------------+
      | Requirement | Meets requirement |
      +-------------+-------------------+
      | REQ 1       | Yes               |
      | REQ 2       | Yes               |
      | REQ 3       | Partially         |
      | REQ 4       | Partially         |
      | REQ 5       | Partially         |
      | REQ 6       | Not applicable    |
      | REQ 7       | Yes               |
      | REQ 8       | Partially         |
      | REQ 9       | Yes               |
      | REQ 10      | Yes               |
      | REQ 11      | Yes               |
      | REQ 12      | Yes               |
      | REQ 13      | Yes               |
      | REQ 14      | Yes               |
      | REQ 15      | Yes               |
      | REQ 16      | Yes               |
      | REQ 17      | Partially         |
      | REQ 18      | Yes               |
      | REQ 19      | Yes               |
      | REQ 20      | Yes               |
      | REQ 21      | Yes               |
      | REQ 22      | Yes               |
      | REQ 23      | Yes               |
      +-------------+-------------------+

            Summary of meeting requirements in RFC5390

                             Table 1

1293 REQ 1: The overload mechanism shall strive to maintain the overall 1294 useful throughput (taking into consideration the quality-of-service 1295 needs of the using applications) of a SIP server at reasonable 1296 levels, even when the incoming load on the network is far in excess 1297 of its capacity. The overall throughput under load is the ultimate 1298 measure of the value of an overload control mechanism. 1300 Meeting REQ 1: Yes, the overload control mechanism allows an 1301 overloaded SIP server to maintain a reasonable level of throughput as 1302 it enters into congestion mode by requesting the upstream clients to 1303 reduce traffic destined downstream. 1305 REQ 2: When a single network element fails, goes into overload, or 1306 suffers from reduced processing capacity, the mechanism should strive 1307 to limit the impact of this on other elements in the network. This 1308 helps to prevent a small-scale failure from becoming a widespread 1309 outage. 1311 Meeting REQ 2: Yes. When a SIP server enters overload mode, it will 1312 request the upstream clients to throttle the traffic destined to it.
1313 As a consequence of this, the overloaded SIP server will itself 1314 generate proportionally less downstream traffic, thereby limiting the 1315 impact on other elements in the network. 1317 REQ 3: The mechanism should seek to minimize the amount of 1318 configuration required in order to work. For example, it is better 1319 to avoid needing to configure a server with its SIP message 1320 throughput, as these kinds of quantities are hard to determine. 1322 Meeting REQ 3: Partially. On the server side, the overload condition 1323 is determined by monitoring S (cf. Section 4 of [RFC6357]), and a 1324 load feedback F is reported as the value of the "oc" parameter. On the 1325 client side, a throttle T is applied to requests going downstream 1326 based on F. This specification does not prescribe any value for S, 1327 nor a particular value for F. The "oc-algo" parameter allows for 1328 automatic convergence to a particular class of overload control 1329 algorithm. There are suggested default values for the "oc-validity" 1330 parameter. 1332 REQ 4: The mechanism must be capable of dealing with elements that do 1333 not support it, so that a network can consist of a mix of elements 1334 that do and don't support it. In other words, the mechanism should 1335 not work only in environments where all elements support it. It is 1336 reasonable to assume that it works better in such environments, of 1337 course. Ideally, there should be incremental improvements in overall 1338 network throughput as increasing numbers of elements in the network 1339 support the mechanism. 1341 Meeting REQ 4: Partially. The mechanism is designed to reduce 1342 congestion when a pair of communicating entities support it. If a 1343 downstream overloaded SIP server does not respond to a request in 1344 time, a SIP client will attempt to reduce traffic destined towards 1345 the non-responsive server as outlined in Section 5.9.
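The feedback loop described under REQ 3 (the server reports a load feedback F in the "oc" parameter, and the client applies a throttle T to downstream requests) might be sketched as follows for a loss-based scheme. This is a minimal, non-normative illustration. It assumes that F is a loss percentage in the range [0, 100] and that the client drops each request independently with that probability; the function names should_forward and throttle are hypothetical and do not appear in this specification.

```python
import random


def should_forward(oc_percent: int, rng: random.Random) -> bool:
    """Decide whether one request is sent downstream.

    oc_percent is the assumed loss percentage taken from the "oc"
    parameter (0 means no throttling, 100 means drop everything).
    """
    if not 0 <= oc_percent <= 100:
        raise ValueError("oc must be in the range [0, 100]")
    # rng.random() is uniform in [0.0, 1.0); the request is dropped when
    # the scaled draw falls below the requested loss percentage.
    return rng.random() * 100 >= oc_percent


def throttle(requests, oc_percent: int, seed=None):
    """Return the subset of requests that survive the loss throttle."""
    rng = random.Random(seed)
    return [r for r in requests if should_forward(oc_percent, rng)]


if __name__ == "__main__":
    offered = list(range(1000))
    forwarded = throttle(offered, oc_percent=20, seed=1)
    print(f"offered={len(offered)} forwarded={len(forwarded)}")
```

A random draw per request is only one way to realize a percentage loss. A deterministic scheme, such as dropping a fixed number of requests out of every hundred, would serve the same purpose, and a real client would also apply any local prioritization policy before dropping.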
1347 REQ 5: The mechanism should not assume that it will only be deployed 1348 in environments with completely trusted elements. It should seek to 1349 operate as effectively as possible in environments where other 1350 elements are malicious; this includes preventing malicious elements 1351 from obtaining more than a fair share of service. 1353 Meeting REQ 5: Partially. Since overload control information is 1354 shared between a pair of communicating entities, a confidential and 1355 authenticated channel can be used for this communication. However, 1356 if such a channel is not available, then the security ramifications 1357 outlined in Section 10 apply. 1359 REQ 6: When overload is signaled by means of a specific message, the 1360 message must clearly indicate that it is being sent because of 1361 overload, as opposed to other, non overload-based failure conditions. 1362 This requirement is meant to avoid some of the problems that have 1363 arisen from the reuse of the 503 response code for multiple purposes. 1364 Of course, overload is also signaled by lack of response to requests. 1365 This requirement applies only to explicit overload signals. 1367 Meeting REQ 6: Not applicable. Overload control information is 1368 signaled as part of the Via header and not in a new header. 1370 REQ 7: The mechanism shall provide a way for an element to throttle 1371 the amount of traffic it receives from an upstream element. This 1372 throttling shall be graded so that it is not all-or-nothing as with 1373 the current 503 mechanism. This recognizes the fact that "overload" 1374 is not a binary state and that there are degrees of overload. 1376 Meeting REQ 7: Yes, please see Section 5.5 and Section 5.10. 1378 REQ 8: The mechanism shall ensure that, when a request was not 1379 processed successfully due to overload (or failure) of a downstream 1380 element, the request will not be retried on another element that is 1381 also overloaded or whose status is unknown.
This requirement derives 1382 from REQ 1. 1384 Meeting REQ 8: Partially. A SIP client that has overload information 1385 from multiple downstream servers will not retry the request on 1386 another element. However, if a SIP client does not know the overload 1387 status of a downstream server, it may send the request to that 1388 server. 1390 REQ 9: That a request has been rejected from an overloaded element 1391 shall not unduly restrict the ability of that request to be submitted 1392 to and processed by an element that is not overloaded. This 1393 requirement derives from REQ 1. 1395 Meeting REQ 9: Yes, a SIP client conformant to this specification 1396 will send the request to a different element. 1398 REQ 10: The mechanism should support servers that receive requests 1399 from a large number of different upstream elements, where the set of 1400 upstream elements is not enumerable. 1402 Meeting REQ 10: Yes, there are no constraints on the number of 1403 upstream clients. 1405 REQ 11: The mechanism should support servers that receive requests 1406 from a finite set of upstream elements, where the set of upstream 1407 elements is enumerable. 1409 Meeting REQ 11: Yes, there are no constraints on the number of 1410 upstream clients. 1412 REQ 12: The mechanism should work between servers in different 1413 domains. 1415 Meeting REQ 12: Yes, there are no inherent limitations on using 1416 overload control between domains. 1418 REQ 13: The mechanism must not dictate a specific algorithm for 1419 prioritizing the processing of work within a proxy during times of 1420 overload. It must permit a proxy to prioritize requests based on any 1421 local policy, so that certain ones (such as a call for emergency 1422 services or a call with a specific value of the Resource-Priority 1423 header field [RFC4412]) are given preferential treatment, such as not 1424 being dropped, being given additional retransmission, or being 1425 processed ahead of others. 
1427 Meeting REQ 13: Yes, please see Section 5.10. 1429 REQ 14: The mechanism should provide unambiguous directions 1430 to clients on when they should retry a request and when they should 1431 not. This especially applies to TCP connection establishment and SIP 1432 registrations, in order to mitigate against avalanche restart. 1434 Meeting REQ 14: Yes, Section 5.9 provides normative behavior on when 1435 to retry a request after repeated timeouts and fatal transport errors 1436 resulting from communications with a non-responsive downstream SIP 1437 server. 1439 REQ 15: In cases where a network element fails, is so overloaded that 1440 it cannot process messages, or cannot communicate due to a network 1441 failure or network partition, it will not be able to provide explicit 1442 indications of the nature of the failure or its levels of congestion. 1443 The mechanism must properly function in these cases. 1445 Meeting REQ 15: Yes, Section 5.9 provides normative behavior on when 1446 to retry a request after repeated timeouts and fatal transport errors 1447 resulting from communications with a non-responsive downstream SIP 1448 server. 1450 REQ 16: The mechanism should attempt to minimize the overhead of the 1451 overload control messaging. 1453 Meeting REQ 16: Yes, overload control messages are sent in the 1454 topmost Via header, which is always processed by the SIP elements. 1456 REQ 17: The overload mechanism must not provide an avenue for 1457 malicious attack, including DoS and DDoS attacks. 1459 Meeting REQ 17: Partially. Since overload control information is 1460 shared between a pair of communicating entities, a confidential and 1461 authenticated channel can be used for this communication. However, 1462 if such a channel is not available, then the security ramifications 1463 outlined in Section 10 apply.
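As noted under REQ 16, overload control information travels as parameters of the topmost Via header, so a receiver needs no extra message to obtain it. The following non-normative sketch shows one way a client might pull those parameters out of a single Via header value. The helper name parse_oc_params and the sample header are hypothetical. The sketch assumes that parameters are separated by ";" and that no quoted value contains a semicolon, and it makes no attempt at full SIP grammar parsing.

```python
def parse_oc_params(via_value: str) -> dict:
    """Extract overload-control parameters from one Via header value.

    Returns a dict keyed by lower-cased parameter name ("oc",
    "oc-algo", "oc-validity", ...). A parameter without a value, such
    as a bare "oc", maps to the empty string.
    """
    params = {}
    # The first piece is the protocol and sent-by part; the parameters
    # follow it. This naive split ignores quoted semicolons.
    for piece in via_value.split(";")[1:]:
        name, _, value = piece.strip().partition("=")
        name = name.lower()
        if name == "oc" or name.startswith("oc-"):
            params[name] = value.strip().strip('"')
    return params


if __name__ == "__main__":
    # Illustrative header value, not taken from this specification.
    via = ('SIP/2.0/TLS p1.example.net;branch=z9hG4bK2d4790.1;'
           'oc=20;oc-algo="loss";oc-validity=500')
    print(parse_oc_params(via))
```

Values are returned as strings so that the caller decides how to validate and interpret them, for example by checking the range of "oc" before applying a throttle.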
1465 REQ 18: The overload mechanism should be unambiguous about whether a 1466 load indication applies to a specific IP address, host, or URI, so 1467 that an upstream element can determine the load of the entity to 1468 which a request is to be sent. 1470 Meeting REQ 18: Yes, please see discussion in Section 5.5. 1472 REQ 19: The specification for the overload mechanism should give 1473 guidance on which message types might be desirable to process over 1474 others during times of overload, based on SIP-specific 1475 considerations. For example, it may be more beneficial to process a 1476 SUBSCRIBE refresh with Expires of zero than a SUBSCRIBE refresh with 1477 a non-zero expiration (since the former reduces the overall amount of 1478 load on the element), or to process re-INVITEs over new INVITEs. 1480 Meeting REQ 19: Yes, please see Section 5.10. 1482 REQ 20: In a mixed environment of elements that do and do not 1483 implement the overload mechanism, no disproportionate benefit shall 1484 accrue to the users or operators of the elements that do not 1485 implement the mechanism. 1487 Meeting REQ 20: Yes, an element that does not implement overload 1488 control does not receive any measure of extra benefit. 1490 REQ 21: The overload mechanism should ensure that the system remains 1491 stable. When the offered load drops from above the overall capacity 1492 of the network to below the overall capacity, the throughput should 1493 stabilize and become equal to the offered load. 1495 Meeting REQ 21: Yes, the overload control mechanism described in this 1496 draft ensures the stability of the system. 1498 REQ 22: It must be possible to disable the reporting of load 1499 information towards upstream targets based on the identity of those 1500 targets. This allows a domain administrator who considers the load 1501 of their elements to be sensitive information, to restrict access to 1502 that information. 
Of course, in such cases, there is no expectation 1503 that the overload mechanism itself will help prevent overload from 1504 that upstream target. 1506 Meeting REQ 22: Yes, an operator of a SIP server can configure the 1507 SIP server to only report overload control information for requests 1508 received over a confidential channel, for example. However, note 1509 that this requirement is in conflict with REQ 3, as it introduces a 1510 modicum of extra configuration. 1512 REQ 23: It must be possible for the overload mechanism to work in 1513 cases where there is a load balancer in front of a farm of proxies. 1515 Meeting REQ 23: Yes. How the requirement is met depends on the 1516 type of load balancer. A load balancer fronting a farm of SIP proxies 1517 may or may not be SIP-aware. If 1518 the load balancer is SIP-aware, it can make informed decisions on 1519 throttling outgoing traffic towards each individual server in the farm 1520 based on the overload control parameters returned by that server. On 1521 the other hand, if the load balancer is not SIP-aware, then there are 1522 other strategies to perform overload control. Section 6 of [RFC6357] 1523 documents some of these strategies in more detail (see the discussion 1524 related to Figure 3(a) in Section 6). 1526 Authors' Addresses 1528 Vijay K. Gurbani (editor) 1529 Bell Laboratories, Alcatel-Lucent 1530 1960 Lucent Lane, Rm 9C-533 1531 Naperville, IL 60563 1532 USA 1534 Email: vkg@bell-labs.com 1536 Volker Hilt 1537 Bell Laboratories, Alcatel-Lucent 1538 791 Holmdel-Keyport Rd 1539 Holmdel, NJ 07733 1540 USA 1542 Email: volkerh@bell-labs.com 1543 Henning Schulzrinne 1544 Columbia University/Department of Computer Science 1545 450 Computer Science Building 1546 New York, NY 10027 1547 USA 1549 Phone: +1 212 939 7004 1550 Email: hgs@cs.columbia.edu 1551 URI: http://www.cs.columbia.edu