SOC Working Group                                                V. Hilt
Internet-Draft                                  Bell Labs/Alcatel-Lucent
Intended status: Informational                                   E. Noel
Expires: May 23, 2011                                          AT&T Labs
                                                                 C. Shen
                                                     Columbia University
                                                              A. Abdelal
                                                          Sonus Networks
                                                       November 19, 2010

     Design Considerations for Session Initiation Protocol (SIP)
                           Overload Control
                  draft-ietf-soc-overload-design-02

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.
   This document discusses models and design considerations for a SIP
   overload control mechanism.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 23, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SIP Overload Problem
   3.  Explicit vs. Implicit Overload Control
   4.  System Model
   5.  Degree of Cooperation
     5.1.  Hop-by-Hop
     5.2.  End-to-End
     5.3.  Local Overload Control
   6.  Topologies
   7.  Fairness
   8.  Performance Metrics
   9.  Explicit Overload Control Feedback
     9.1.  Rate-based Overload Control
     9.2.  Loss-based Overload Control
     9.3.  Window-based Overload Control
     9.4.  Overload Signal-based Overload Control
     9.5.  On-/Off Overload Control
   10. Implicit Overload Control
   11. Overload Control Algorithms
   12. Message Prioritization
   13. Security Considerations
   14. IANA Considerations
   15. Informative References
   Appendix A.  Contributors
   Authors' Addresses

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload occurs if a SIP server does not have sufficient resources to
   process all incoming SIP messages.  These resources may include CPU,
   memory, input/output, or disk resources.

   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.
   In fact, overload may lead to a situation in which the throughput
   drops down to a small fraction of the original processing capacity.
   This is often called congestion collapse.

   An overload control mechanism enables a SIP server to perform close
   to its capacity limit during times of overload.  Overload control is
   used by a SIP server if it is unable to process all SIP requests due
   to resource constraints.  There are other failure cases in which a
   SIP server can successfully process incoming requests but has to
   reject them for other reasons.  For example, a PSTN gateway that runs
   out of trunk lines but still has plenty of capacity to process SIP
   messages should reject incoming INVITEs using a response such as 488
   (Not Acceptable Here), as described in [RFC4412].  Similarly, a SIP
   registrar that has lost connectivity to its registration database but
   is still capable of processing SIP messages should reject REGISTER
   requests with a 500 (Server Error) response [RFC3261].  Overload
   control mechanisms do not apply in these cases and SIP provides
   appropriate response codes for them.

   There are cases in which a SIP server runs other services that do not
   involve the processing of SIP messages (e.g., processing of RTP
   packets, database queries, software updates and event handling).
   These services may, or may not, be correlated with the SIP message
   volume.  These services can use up a substantial share of resources
   available on the server (e.g., CPU cycles) and leave the server in a
   condition where it is unable to process all incoming SIP requests.
   In these cases, the SIP server applies SIP overload control
   mechanisms to avoid congestion collapse on the SIP signaling plane.
   However, controlling the number of SIP requests may not significantly
   reduce the load on the server if the resource shortage was created by
   another service.
   In these cases, it is to be expected that the server uses appropriate
   methods of controlling the resource usage of other services.  The
   specifics of controlling the resource usage of other services and
   their coordination are out of scope for this document.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code and the Retry-
   After header.  However, this mechanism cannot prevent overload of a
   SIP server and it cannot prevent congestion collapse.  In fact, it
   may cause traffic to oscillate and to shift between SIP servers and
   thereby worsen an overload condition.  A detailed discussion of the
   SIP overload problem, the problems with the 503 (Service Unavailable)
   response code and the Retry-After header, and the requirements for a
   SIP overload control mechanism can be found in [RFC5390].  In
   addition, 503 is used for other situations (with or without Retry-
   After), not just SIP server overload.  A SIP overload control
   mechanism based on 503 would have to specify exactly which cause
   values trigger the overload control.

   This document discusses the models, assumptions and design
   considerations for a SIP overload control mechanism.  The document is
   a product of the SIP overload control design team.

2.  SIP Overload Problem

   A key contributor to the SIP congestion collapse [RFC5390] is the
   regenerative behavior of overload in the SIP protocol.  When SIP is
   running over the UDP protocol, it will retransmit messages that were
   dropped or excessively delayed by a SIP server due to overload and
   thereby increase the offered load for the already overloaded server.
   This increase in load worsens the severity of the overload condition
   and, in turn, causes more messages to be dropped.  A congestion
   collapse can occur [Hilt et al.], [Noel et al.], [Shen et al.] and
   [Abdelal et al.].
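   The regenerative effect can be illustrated with a toy fixed-point
   model.  All numbers, the retransmission limit, and the iteration
   count below are illustrative assumptions, not values taken from this
   document:

```python
# Toy model of regenerative overload (illustrative assumptions only):
# messages dropped by an overloaded server return as retransmissions
# and inflate the offered load of the next interval.

def offered_load(new_calls, capacity, max_retransmits=6, rounds=20):
    """Iterate: offered load = new calls + retransmissions of drops."""
    offered = float(new_calls)
    for _ in range(rounds):
        dropped = max(0.0, offered - capacity)   # messages not served
        # Drops regenerate as retransmissions, bounded by the
        # per-message retransmission limit.
        offered = new_calls + min(dropped, new_calls * max_retransmits)
    return offered
```

   Below capacity the offered load is stable, while even a modest
   overload keeps amplifying itself round after round, which is the
   regenerative behavior described above.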

   Regenerative behavior under overload should ideally be avoided by any
   protocol as this would lead to stable operation under overload.
   However, this is often difficult to achieve in practice.  For
   example, changing the SIP retransmission timer mechanisms can reduce
   the degree of regeneration during overload but will impact the
   ability of SIP to recover from message losses.  Without any
   retransmission, each message that is dropped due to SIP server
   overload will eventually lead to a failed call.

   For a SIP INVITE transaction to be successful, a minimum of three
   messages need to be forwarded by a SIP server.  Often an INVITE
   transaction consists of five or more SIP messages.  If a SIP server
   under overload randomly discards messages without evaluating them,
   the chances that all messages belonging to a transaction are
   successfully forwarded will decrease as the load increases.  Thus,
   the number of transactions that complete successfully will decrease
   even if the message throughput of a server remains constant and the
   overload behavior is fully non-regenerative.  A SIP server might
   (partially) parse incoming messages to determine if they are new
   requests or messages belonging to an existing transaction.  However,
   after having spent resources on parsing a SIP message, discarding
   this message is expensive as the resources already spent are lost.
   The number of successful transactions will therefore decline with an
   increase in load as fewer and fewer resources can be spent on
   forwarding messages and more and more resources are consumed by
   inspecting messages that will eventually be dropped.  The slope of
   the decline depends on the amount of resources spent to inspect each
   message.

   Another challenge for SIP overload control is controlling the rate of
   the true traffic source.  Overload is often caused by a large number
   of UAs, each of which creates only a single message.
   However, the sum of their traffic can overload a SIP server.  The
   overload mechanisms suitable for controlling a SIP server (e.g., rate
   control) may not be effective for individual UAs.  In some cases,
   there are other non-SIP mechanisms for limiting the load from the
   UAs.  These may operate independently from, or in conjunction with,
   the SIP overload mechanisms described here.  In either case, they are
   out of scope for this document.

3.  Explicit vs. Implicit Overload Control

   The main difference between explicit and implicit overload control
   is the way overload is signaled from a SIP server that is reaching
   overload condition to its upstream neighbors.

   In an explicit overload control mechanism, a SIP server uses an
   explicit overload signal to indicate that it is reaching its capacity
   limit.  Upstream neighbors receiving this signal can adjust their
   transmission rate according to the overload signal to a level that is
   acceptable to the downstream server.  The overload signal enables a
   SIP server to steer the load it is receiving to a rate at which it
   can perform at maximum capacity.

   Implicit overload control uses the absence of responses and packet
   loss as an indication of overload.  A SIP server that is sensing such
   a condition reduces the load it is forwarding to a downstream
   neighbor.  Since there is no explicit overload signal, this mechanism
   is robust as it does not depend on actions taken by the SIP server
   running into overload.

   The ideas of explicit and implicit overload control are in fact
   complementary.  By considering implicit overload indications, a
   server can avoid overloading an unresponsive downstream neighbor.  An
   explicit overload signal enables a SIP server to actively steer the
   incoming load to a desired level.

4.  System Model

   The model shown in Figure 1 identifies fundamental components of an
   explicit SIP overload control mechanism:

   SIP Processor:  The SIP Processor processes SIP messages and is the
      component that is protected by overload control.
   Monitor:  The Monitor measures the current load of the SIP processor
      on the receiving entity.  It implements the mechanisms needed to
      determine the current usage of resources relevant for the SIP
      processor and reports load samples (S) to the Control Function.
   Control Function:  The Control Function implements the overload
      control algorithm.  The control function uses the load samples (S)
      and determines if overload has occurred and a throttle (T) needs
      to be set to adjust the load sent to the SIP processor on the
      receiving entity.  The control function on the receiving entity
      sends load feedback (F) to the sending entity.
   Actuator:  The Actuator implements the algorithms needed to act on
      the throttles (T) and ensures that the amount of traffic forwarded
      to the receiving entity meets the criteria of the throttle.  For
      example, a throttle may instruct the Actuator to not forward more
      than 100 INVITE messages per second.  The Actuator implements the
      algorithms to achieve this objective, e.g., using message gapping.
      It also implements algorithms to select the messages that will be
      affected and determine whether they are rejected or redirected.

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e., loss-based,
   rate-based, window-based or signal-based overload control; see
   Section 9), the overload control algorithm (see Section 11), as well
   as other design parameters.  The feedback (F) enables the sending
   entity to adjust the amount of traffic forwarded to the receiving
   entity to a level that is acceptable to the receiving entity without
   causing overload.
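
   As an illustration of one feedback type, a loss-based throttle can be
   sketched in a few lines.  The class and method names below are
   hypothetical and chosen only for this example; this document does not
   prescribe any particular structure:

```python
import random

# Sketch of a loss-based throttle (illustrative assumption): the
# feedback (F) is a loss rate, e.g. "reject 10% of requests", which
# the Actuator applies to each outgoing request.

class LossBasedActuator:
    def __init__(self, rng=None):
        self.loss_rate = 0.0            # throttle (T): fraction to reject
        self.rng = rng or random.Random()

    def on_feedback(self, loss_rate):
        """Copy loss-rate feedback (F) directly into the throttle (T)."""
        self.loss_rate = loss_rate

    def admit(self, request):
        """Return True if the request may be forwarded downstream."""
        return self.rng.random() >= self.loss_rate
```

   A request that is not admitted would then be rejected or redirected
   by the Actuator, as described above.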

   Figure 1 depicts a general system model for overload control.  In
   this diagram, one instance of the control function is on the sending
   entity (i.e., associated with the actuator) and one is on the
   receiving entity (i.e., associated with the monitor).  However, a
   specific mechanism may not require both elements.  In this case, one
   of the two control function elements can be empty and simply passes
   along feedback.  E.g., if (F) is defined as a loss-rate (e.g., reduce
   traffic by 10%), there is no need for a control function on the
   sending entity as the content of (F) can be copied directly into (T).

   The model in Figure 1 shows a scenario with one sending and one
   receiving entity.  In a more realistic scenario, a receiving entity
   will receive traffic from multiple sending entities and vice versa
   (see Section 6).  The feedback generated by a Monitor will therefore
   often be distributed across multiple Actuators.  A Monitor needs to
   be able to split the load it can process across multiple sending
   entities and generate feedback that correctly adjusts the load each
   sending entity is allowed to send.  Similarly, an Actuator needs to
   be prepared to receive different levels of feedback from different
   receiving entities and throttle traffic to these entities
   accordingly.

   In a realistic deployment, SIP messages will flow in both directions,
   from server B to server A as well as from server A to server B.  The
   overload control mechanisms in each direction can be considered
   independently.  For messages flowing from server A to server B, the
   sending entity is server A and the receiving entity is server B, and
   vice versa.  The control loops in both directions operate
   independently.
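
   The receiving-entity half of this model can be sketched as follows.
   This is a minimal illustration only: the assumption that load is
   sampled as a utilization value in [0, 1] and that feedback is
   expressed as a loss rate, as well as all class names, are
   hypothetical and not mandated by this document:

```python
# Hypothetical sketch of the Monitor and Control Function on the
# receiving entity (see the system model above).

class Monitor:
    def __init__(self, sip_processor):
        self.sip_processor = sip_processor

    def sample(self):
        """Report a load sample (S) for the SIP processor."""
        return self.sip_processor.utilization()

class ControlFunction:
    def __init__(self, target=0.9):
        self.target = target          # desired utilization of the processor

    def feedback(self, load_sample):
        """Turn a load sample (S) into loss-rate feedback (F)."""
        if load_sample <= self.target:
            return 0.0                # no throttling needed
        # Fraction of traffic that must be shed to return to the target.
        return min(1.0, (load_sample - self.target) / load_sample)
```

   The sending entity would then feed (F) into its Actuator as a
   throttle (T).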

             Sending                  Receiving
              Entity                   Entity
       +----------------+    +----------------+
       |    Server A    |    |    Server B    |
       |  +----------+  |    |  +----------+  |   -+
       |  | Control  |  |  F |  | Control  |  |    |
       |  | Function |<-+----+--| Function |  |    |
       |  +----------+  |    |  +----------+  |    |
       |     T |        |    |       ^        |    | Overload
       |       v        |    |       | S      |    | Control
       |  +----------+  |    |  +----------+  |    |
       |  | Actuator |  |    |  | Monitor  |  |    |
       |  +----------+  |    |  +----------+  |    |
       |       |        |    |       ^        |   -+
       |       v        |    |       |        |   -+
       |  +----------+  |    |  +----------+  |    |
     <-+--|   SIP    |  |    |  |   SIP    |  |    | SIP
     --+->|Processor |--+----+->|Processor |--+->  | System
       |  +----------+  |    |  +----------+  |    |
       +----------------+    +----------------+   -+

       Figure 1: System Model for Explicit Overload Control

5.  Degree of Cooperation

   A SIP request is usually processed by more than one SIP server on
   its path to the destination.  Thus, a design choice for an explicit
   overload control mechanism is where to place the components of
   overload control along the path of a request and, in particular,
   where to place the Monitor and Actuator.  This design choice
   determines the degree of cooperation between the SIP servers on the
   path.  Overload control can be implemented hop-by-hop with the
   Monitor on one server and the Actuator on its direct upstream
   neighbor.  Overload control can be implemented end-to-end with
   Monitors on all SIP servers along the path of a request and an
   Actuator on the sender.  In this case, the Control Functions
   associated with each Monitor have to cooperate to jointly determine
   the overall feedback for this path.  Finally, overload control can be
   implemented locally on a SIP server if Monitor and Actuator reside on
   the same server.
   In this case, the sending entity and receiving
   entity are the same SIP server, and Actuator and Monitor operate on
   the same SIP processor (although the Actuator typically operates on
   a pre-processing stage in local overload control).  Local overload
   control is an internal overload control mechanism, as the control
   loop is implemented internally on one server.  Hop-by-hop and end-to-
   end are external overload control mechanisms.  All three
   configurations are shown in Figure 2.

              +---------+        +------(+)---------+
     +------+ |         |        |       ^          |
     |      | | |     +---+      |       |      +---+
     v      | v //=>| C |        v       |  //=>| C |
   +---+    +---+   +---+      +---+   +---+    +---+
   | A |===>| B |              | A |===>| B |
   +---+    +---+   +---+      +---+   +---+    +---+
            ^  \\=>| D |         ^       |  \\=>| D |
            |      +---+         |       |      +---+
            |          |         |       v          |
            +---------+          +------(+)---------+

        (a) hop-by-hop              (b) end-to-end

                     +-+
                     v |
      +-+      +-+  +---+
      v |      v |//=>| C |
     +---+    +---+   +---+
     | A |===>| B |
     +---+    +---+   +---+
               \\=>| D |
                   +---+
                     ^ |
                     +-+

         (c) local

     ==>  SIP request flow
     <--  Overload feedback loop

          Figure 2: Degree of Cooperation between Servers

5.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all neighboring SIP servers that directly
   exchange traffic.  I.e., the Actuator is located on the SIP server
   that is the direct upstream neighbor of the SIP server that has the
   corresponding Monitor.  Each control loop between two servers is
   completely independent of the control loop between other servers
   further up- or downstream.  In the example in Figure 2(a), three
   independent overload control loops are instantiated: A - B, B - C and
   B - D.  Each loop only controls a single hop.  Overload feedback
   received from a downstream neighbor is not forwarded further
   upstream.  Instead, a SIP server acts on this feedback, for example,
   by rejecting SIP messages if needed.
   If the upstream neighbor of a
   server also becomes overloaded, it will report this problem to its
   upstream neighbors, which again take action based on the reported
   feedback.  Thus, in hop-by-hop overload control, overload is always
   resolved by the direct upstream neighbors of the overloaded server
   without the need to involve entities that are located multiple SIP
   hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and can avoid congestion collapse.  It is simple and scales
   well to networks with many SIP entities.  An advantage is that it
   does not require feedback to be transmitted across multiple hops,
   possibly crossing multiple trust domains.  Feedback is sent to the
   next hop only.  Furthermore, it does not require a SIP entity to
   aggregate a large number of overload status values or keep track of
   the overload status of SIP servers it is not communicating with.

5.2.  End-to-End

   End-to-end overload control implements an overload control loop along
   the entire path of a SIP request, from UAC to UAS.  An end-to-end
   overload control mechanism consolidates overload information from all
   SIP servers on the way (including all proxies and the UAS) and uses
   this information to throttle traffic as far upstream as possible.  An
   end-to-end overload control mechanism has to be able to frequently
   collect the overload status of all servers on the potential path(s)
   to a destination and combine this data into meaningful overload
   feedback.

   A UA or SIP server only throttles requests if it knows that these
   requests will eventually be forwarded to an overloaded server.  For
   example, if D is overloaded in Figure 2(b), A should only throttle
   requests it forwards to B when it knows that they will be forwarded
   to D.  It should not throttle requests that will eventually be
   forwarded to C, since server C is not overloaded.
   In many cases, it
   is difficult for A to determine which requests will be routed to C
   and D since this depends on the local routing decision made by B.
   These routing decisions can be highly variable and, for example,
   depend on call routing policies configured by the user, services
   invoked on a call, load balancing policies, etc.  The fact that a
   previous message to a target has been routed through an overloaded
   server does not necessarily mean the next message to this target will
   also be routed through the same server.

   The main problem of end-to-end overload control is its inherent
   complexity, since UACs or SIP servers need to monitor all potential
   paths to a destination in order to determine which requests should be
   throttled and which requests may be sent.  Even if this information
   is available, it is not clear which path a specific request will
   take.

   A variant of end-to-end overload control is to implement a control
   loop between a set of well-known SIP servers along the path of a SIP
   request.  For example, an overload control loop can be instantiated
   between a server that only has one downstream neighbor or a set of
   closely coupled SIP servers.  A control loop spanning multiple hops
   can be used if the sending entity has full knowledge about the SIP
   servers on the path of a SIP message.

   A key difference from transport protocols using end-to-end congestion
   control such as TCP is that the traffic exchanged between SIP servers
   consists of many individual SIP messages.  Each of these SIP messages
   has its own source and destination.  Even SIP messages containing
   identical SIP URIs (e.g., a SUBSCRIBE and an INVITE message to the
   same SIP URI) can be routed to different destinations.  This is
   different from TCP, which controls a stream of packets between a
   single source and a single destination.

5.3.  Local Overload Control

   The idea of local overload control (see Figure 2(c)) is to run the
   Monitor and Actuator on the same server.  This enables the server to
   monitor the current resource usage and to reject messages that can't
   be processed without overusing the local resources.  The fundamental
   assumption behind local overload control is that it is less resource
   consuming for a server to reject messages than to process them.  A
   server can therefore reject the excess messages it cannot process to
   stop all retransmissions of these messages.  Since rejecting messages
   does consume resources on a SIP server, local overload control alone
   cannot prevent a congestion collapse.

   Local overload control can be used in conjunction with other overload
   control mechanisms and provides an additional layer of protection
   against overload.  It is fully implemented within a SIP server and
   does not require cooperation between servers.  In general, SIP
   servers should apply other overload control techniques to control
   load before a local overload control mechanism is activated as a
   mechanism of last resort.

6.  Topologies

   The following topologies describe four generic SIP server
   configurations.  These topologies illustrate specific challenges for
   an overload control mechanism.  An actual SIP server topology is
   likely to consist of combinations of these generic scenarios.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E and F) receives traffic from a single source A.  A
   load balancer is a typical example of such a configuration.  In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its downstream
   neighbors D, E and F.  If one of the downstream neighbors becomes
   overloaded, A can direct traffic to the servers that still have
   capacity.
   If one of the servers serves as a backup, it can be
   activated once one of the primary servers reaches overload.

   If A can reliably determine that D, E and F are its only downstream
   neighbors and all of them are in overload, it may choose to report
   overload upstream on behalf of D, E and F.  However, if the set of
   downstream neighbors is not fixed or only some of them are in
   overload, then A should not activate overload control, since A can
   still forward the requests destined to non-overloaded downstream
   neighbors.  These requests would be throttled as well if A used
   overload control towards its upstream neighbors.

   In some cases, the servers D, E, and F are in a server farm and
   configured to appear as a single server to their upstream neighbors.
   In this case, server A can report overload on behalf of the server
   farm.  If the load balancer is not a SIP entity, servers D, E, and F
   can report the overall load of the server farm (i.e., the load of the
   virtual server) in their messages.  As an alternative, one of the
   servers (e.g., server E) can report overload on behalf of the server
   farm.  In this case, not all messages contain overload control
   information and it needs to be ensured that all upstream neighbors
   are periodically served by server E to receive updated information.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.  The set of active upstream neighbors of D
   can change as servers may become inactive and previously inactive
   servers may start contributing traffic to D.

   If D becomes overloaded, it needs to generate feedback to reduce the
   amount of traffic it receives from its upstream neighbors.  D needs
   to decide by how much each upstream neighbor should reduce traffic.
   This decision can require the consideration of the amount of traffic
   sent by each upstream neighbor, and it may need to be re-adjusted as
   the traffic contributed by each upstream neighbor varies over time.
   Server D can use a local fairness policy to determine how much
   traffic it accepts from each upstream neighbor.

   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

              +---+
          /-->| D |          +---+
         /    +---+          | A |-\
        /                    +---+  \
   +---+-/    +---+                  \    +---+
   | A |----->| E |          +---+    \-->|   |
   +---+-\    +---+          | B |------->| D |
        \                    +---+    /-->|   |
         \    +---+                  /    +---+
          \-->| F |          +---+  /
              +---+          | C |-/
                             +---+

     (a) load balancer       (b) multiple sources

   +---+
   | A |---\                 a--\
   +---+-\  \---->+---+          \
          \/----->| D |      b--\ \--->+---+
   +---+--/\  /-->+---+          \---->|   |
   | B |    \/               c-------->| D |
   +---+---\/\--->+---+                |   |
          /\----->| E |       ...  /-->+---+
   +---+-/  /--->+---+            /
   | C |----/                z---/
   +---+

        (c) mesh             (d) edge proxy

                     Figure 3: Topologies

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   infrequently sends a request.  This scenario is shown in Figure 3(d).
   An edge proxy that is connected to many UAs is a typical example of
   such a configuration.

   Since each UA typically only contributes a few requests, which are
   often related to the same call, it can't decrease its message rate to
   resolve the overload.  In such a configuration, a SIP server can
   resort to local overload control by rejecting a percentage of the
   requests it receives with 503 (Service Unavailable) responses.
   Since there are many upstream neighbors that contribute to the
   overall load, sending 503 (Service Unavailable) to a fraction of them
   can gradually reduce load without entirely stopping all incoming
   traffic.  The Retry-After header can be used in 503 (Service
   Unavailable) responses to ask UAs to wait a given number of seconds
   before trying the call again.  Using 503 (Service Unavailable)
   towards individual sources cannot, however, prevent overload if a
   large number of users places calls at the same time.

   Note: The requirements of the "edge proxy" topology are different
   from those of the other topologies, which may require a different
   method for overload control.

7.  Fairness

   There are many different ways to define fairness between multiple
   upstream neighbors of a SIP server.  In the context of SIP server
   overload, it is helpful to describe two categories of fairness: basic
   fairness and customized fairness.  With basic fairness, a SIP server
   treats all call attempts equally and ensures that each call attempt
   has the same chance of succeeding.  With customized fairness, the
   server allocates resources according to different priorities.  An
   example application of the basic fairness criterion is the "Third
   caller receives free tickets" scenario, where each call attempt
   should have an equal success probability in making calls through an
   overloaded SIP server, irrespective of the service provider where it
   was initiated.  An example of customized fairness would be a server
   which assigns different resource allocations to its upstream
   neighbors (e.g., service providers) as defined in a service level
   agreement (SLA).

8.  Performance Metrics

   The performance of an overload control mechanism can be measured
   using different metrics.

   A key performance indicator is the goodput of a SIP server under
   overload.
Ideally, an overload control mechanism enables a SIP server to perform at its capacity limit during periods of overload. For example, if a SIP server has a processing capacity of 140 INVITE transactions per second, then an overload control mechanism should enable it to process 140 INVITEs per second even if the offered load is much higher. The delay introduced by a SIP server is another important indicator. An overload control mechanism should ensure that the delay encountered by a SIP message is not increased significantly during periods of overload. Significantly increased delay can lead to time-outs and retransmissions of SIP messages, making the overload worse.

Responsiveness and stability are other important performance indicators. An overload control mechanism should react quickly to an overload occurrence and ensure that a SIP server does not become overloaded, even during sudden peaks of load. Similarly, an overload control mechanism should quickly stop rejecting calls if the overload disappears. Stability is another important criterion: an overload control mechanism should not cause significant oscillations of load on a SIP server. The performance of SIP overload control mechanisms is discussed in [Noel et al.], [Shen et al.], [Hilt et al.] and [Abdelal et al.].

In addition to the above metrics, there are other indicators that are relevant for the evaluation of an overload control mechanism:

Fairness: Which types of fairness does the overload control mechanism implement?
Self-limiting: Is the overload control self-limiting if a SIP server becomes unresponsive?
Changes in neighbor set: How does the mechanism adapt to a changing set of sending entities?
Data points to monitor: Which and how many data points does an overload control mechanism need to monitor?
Computational complexity: What is the (CPU) load created by the overload "monitor" and "actuator"?

9.
Explicit Overload Control Feedback

Explicit overload control feedback enables a receiver to indicate how much traffic it wants to receive. Explicit overload control mechanisms can be differentiated based on the type of information conveyed in the overload control feedback and on whether the control function is located in the receiving entity, the sending entity (receiver- vs. sender-based overload control), or both.

9.1. Rate-based Overload Control

The key idea of rate-based overload control is to limit the request rate at which an upstream element is allowed to forward traffic to the downstream neighbor. If overload occurs, a SIP server instructs each upstream neighbor to send at most X requests per second. Each upstream neighbor can be assigned a different rate cap.

An example algorithm for an Actuator in the sending entity is request gapping. After transmitting a request to a downstream neighbor, a server waits for 1/X seconds before it transmits the next request to the same neighbor. Requests that arrive during the waiting period are not forwarded and are either redirected, rejected or buffered.

The rate cap ensures that the number of requests received by a SIP server never exceeds the sum of all rate caps granted to upstream neighbors. Rate-based overload control protects a SIP server against overload even during load spikes, assuming that no new upstream neighbors start sending traffic. New upstream neighbors need to be accounted for in the rate caps assigned to all upstream neighbors; the rates assigned to existing neighbors need to be adjusted when new neighbors join. During periods when new neighbors are joining, overload can occur in extreme cases until the rate caps of all senders are adjusted to again match the overall rate cap of the server.
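The request gapping algorithm described above can be sketched as follows. This is a minimal sketch under the stated assumptions; the class and method names are illustrative only and not defined by this document.

```python
import time

class RequestGapper:
    """Enforce a rate cap of X requests per second toward one neighbor."""

    def __init__(self, rate_cap):
        self.gap = 1.0 / rate_cap   # minimum spacing (1/X s) between requests
        self.next_allowed = 0.0     # earliest time the next request may go

    def try_forward(self, now=None):
        """Return True if a request may be forwarded now.

        Requests arriving while this returns False fall into the waiting
        period and must be redirected, rejected or buffered.
        """
        now = time.monotonic() if now is None else now
        if now >= self.next_allowed:
            self.next_allowed = now + self.gap
            return True
        return False
```

One such gapper would be kept per downstream neighbor, since each neighbor can carry a different rate cap.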
The overall rate cap of a SIP server is determined by an overload control algorithm, e.g., based on system load.

Rate-based overload control requires a SIP server to assign a rate cap to each of its upstream neighbors while it is activated. Effectively, a server needs to assign a share of its overall capacity to each upstream neighbor. A server needs to ensure that the sum of all rate caps assigned to upstream neighbors does not substantially oversubscribe its actual processing capacity. This requires a SIP server to keep track of the set of upstream neighbors and to adjust the rate caps if a new upstream neighbor appears or an existing neighbor stops transmitting. For example, if the capacity of the server is X and the server is receiving traffic from two upstream neighbors, it can assign a rate cap of X/2 to each of them. If a third sender appears, the rate cap for each sender is lowered to X/3. If the overall rate cap is too high, a server may experience overload. If the cap is too low, the upstream neighbors will reject requests even though they could be processed by the server.

An approach for estimating the rate cap for each upstream neighbor is to use a fixed proportion of a control variable, X, where X is initially equal to the capacity of the SIP server. The server then increases or decreases X until the workload arrival rate matches the actual server capacity. Usually, this means that the sum of the rate caps sent out by the server (i.e., X) exceeds its actual capacity, but it enables upstream neighbors that are not generating more than their fair share of the work to remain effectively unrestricted. In this approach, the server only has to measure the aggregate arrival rate. However, since the overall rate cap is usually higher than the actual capacity, brief periods of overload may occur.

9.2.
Loss-based Overload Control

A loss percentage enables a SIP server to ask an upstream neighbor to reduce the number of requests it would normally forward to this server by X%. For example, a SIP server can ask an upstream neighbor to reduce the number of requests this neighbor would normally send by 10%. The upstream neighbor then redirects or rejects 10% of the traffic that is destined for this server.

An algorithm for the sending entity to implement a loss percentage is to draw a random number between 1 and 100 for each request to be forwarded. The request is not forwarded to the server if the random number is less than or equal to X.

An advantage of loss-based overload control is that the receiving entity does not need to track the set of upstream neighbors or the request rate it receives from each upstream neighbor. It is sufficient to monitor the overall system utilization. To reduce load, a server can ask its upstream neighbors to lower the traffic forwarded to it by a certain percentage. The server calculates this percentage by combining the loss percentage that is currently in use (i.e., the loss percentage the upstream neighbors are currently applying when forwarding traffic), the current system utilization and the desired system utilization. For example, if the server load approaches 90% and the current loss percentage is set to a 50% traffic reduction, then the server can decide to increase the loss percentage to 55% in order to bring system utilization down to 80%. Similarly, the server can lower the loss percentage if the system utilization permits.

Loss-based overload control requires that the throttle percentage be adjusted to the current overall number of requests received by the server. This is particularly important if the number of requests received fluctuates quickly.
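The random-number algorithm for the sending entity described in this section can be sketched as follows. This is a minimal sketch; the forward and reject callbacks are hypothetical placeholders for the sending entity's actual handling, not part of the mechanism itself.

```python
import random

def apply_loss(request, loss_percent, forward, reject):
    """Throttle loss_percent of the requests destined for a server."""
    # Draw a random number between 1 and 100 for each request; the
    # request is not forwarded if the number is less than or equal to
    # the loss percentage X requested by the overloaded server.
    if random.randint(1, 100) <= loss_percent:
        return reject(request)   # redirect or reject locally
    return forward(request)
```

Note that this throttles a fixed fraction of the offered traffic, which is why the percentage must be re-adjusted when the offered load changes.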
For example, if a SIP server sets a throttle value of 10% at time t1 and the number of requests increases by 20% between time t1 and t2 (t1