SOC Working Group                                                V. Hilt
Internet-Draft                                  Bell Labs/Alcatel-Lucent
Intended status: Informational                                   E. Noel
Expires: June 5, 2011                                          AT&T Labs
                                                                 C. Shen
                                                     Columbia University
                                                              A. Abdelal
                                                          Sonus Networks
                                                        December 2, 2010

      Design Considerations for Session Initiation Protocol (SIP)
                            Overload Control
                   draft-ietf-soc-overload-design-03

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.
   This document discusses models and design considerations for a SIP
   overload control mechanism.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 5, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SIP Overload Problem
   3.  Explicit vs. Implicit Overload Control
   4.  System Model
   5.  Degree of Cooperation
     5.1.  Hop-by-Hop
     5.2.  End-to-End
     5.3.  Local Overload Control
   6.  Topologies
   7.  Fairness
   8.  Performance Metrics
   9.  Explicit Overload Control Feedback
     9.1.  Rate-based Overload Control
     9.2.  Loss-based Overload Control
     9.3.  Window-based Overload Control
     9.4.  Overload Signal-based Overload Control
     9.5.  On-/Off Overload Control
   10. Implicit Overload Control
   11. Overload Control Algorithms
   12. Message Prioritization
   13. Security Considerations
   14. IANA Considerations
   15. Informative References
   Appendix A.  Contributors
   Authors' Addresses

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload occurs if a SIP server does not have sufficient resources
   to process all incoming SIP messages.  These resources may include
   CPU, memory, input/output, or disk resources.

   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.
   In fact, overload may lead to a situation in which the throughput
   drops down to a small fraction of the original processing capacity.
   This is often called congestion collapse.

   An overload control mechanism enables a SIP server to perform close
   to its capacity limit during times of overload.  Overload control is
   used by a SIP server if it is unable to process all SIP requests due
   to resource constraints.  There are other failure cases in which a
   SIP server can successfully process incoming requests but has to
   reject them for other reasons.  For example, a PSTN gateway that
   runs out of trunk lines but still has plenty of capacity to process
   SIP messages should reject incoming INVITEs using a response such as
   488 (Not Acceptable Here), as described in [RFC4412].  Similarly, a
   SIP registrar that has lost connectivity to its registration
   database but is still capable of processing SIP messages should
   reject REGISTER requests with a 500 (Server Error) response
   [RFC3261].  Overload control mechanisms do not apply in these cases
   and SIP provides appropriate response codes for them.

   There are cases in which a SIP server runs other services that do
   not involve the processing of SIP messages (e.g., processing of RTP
   packets, database queries, software updates and event handling).
   These services may, or may not, be correlated with the SIP message
   volume.  These services can use up a substantial share of resources
   available on the server (e.g., CPU cycles) and leave the server in a
   condition where it is unable to process all incoming SIP requests.
   In these cases, the SIP server applies SIP overload control
   mechanisms to avoid congestion collapse on the SIP signaling plane.
   However, controlling the number of SIP requests may not
   significantly reduce the load on the server if the resource shortage
   was created by another service.
   In these cases, the server is expected to use appropriate methods of
   controlling the resource usage of other services.  The specifics of
   controlling the resource usage of other services and their
   coordination are out of scope for this document.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code and the
   Retry-After header.  However, this mechanism cannot prevent overload
   of a SIP server and it cannot prevent congestion collapse.  In fact,
   it may cause traffic to oscillate and to shift between SIP servers,
   thereby worsening an overload condition.  A detailed discussion of
   the SIP overload problem, the problems with the 503 (Service
   Unavailable) response code and the Retry-After header, and the
   requirements for a SIP overload control mechanism can be found in
   [RFC5390].  In addition, 503 is used for other situations (with or
   without Retry-After), not just SIP server overload.  A SIP overload
   control process based on 503 would have to specify exactly which
   cause values trigger the overload control.

   This document discusses the models, assumptions and design
   considerations for a SIP overload control mechanism.  The document
   is a product of the SIP overload control design team.

2.  SIP Overload Problem

   A key contributor to SIP congestion collapse [RFC5390] is the
   regenerative behavior of overload in the SIP protocol.  When SIP is
   running over UDP, it will retransmit messages that were dropped or
   excessively delayed by a SIP server due to overload and thereby
   increase the offered load for the already overloaded server.  This
   increase in load worsens the severity of the overload condition and,
   in turn, causes more messages to be dropped.  A congestion collapse
   can occur [Hilt et al.], [Noel et al.], [Shen et al.] and
   [Abdelal et al.].
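   The amplification effect of retransmissions can be made concrete
   with a small back-of-the-envelope model.  The sketch below is
   illustrative only and is not part of this document; it assumes SIP
   over UDP with up to six INVITE retransmissions (Timer A doubling
   from T1 = 0.5s until Timer B fires, per [RFC3261]) and that every
   dropped copy triggers the next retransmission.

```python
# Illustrative sketch (not from the draft): how UDP retransmissions
# amplify the load offered to an overloaded server.  Assumes each
# unanswered INVITE is retransmitted up to 6 times before Timer B
# expires, and that each dropped copy triggers the next retransmission.

MAX_RETRANSMITS = 6

def offered_load(new_requests_per_s: float, drop_prob: float) -> float:
    """Total message rate seen by the server, including retransmissions.

    A request dropped with probability p is retransmitted; that copy
    is again dropped with probability p, and so on up to the limit.
    The expected number of copies per request is 1 + p + p^2 + ... + p^6.
    """
    copies = sum(drop_prob ** i for i in range(MAX_RETRANSMITS + 1))
    return new_requests_per_s * copies

# A server dropping half of all messages sees nearly twice the base
# load; at a 90% drop rate, the offered load more than quintuples.
for p in (0.0, 0.5, 0.9):
    print(f"drop={p:.1f}  offered={offered_load(100, p):.1f} msg/s")
```

   Real behavior is more complex (responses, timer interactions, the
   retransmission limit interacting with delay), but the sketch shows
   why dropped messages regenerate load instead of relieving it.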
   Regenerative behavior under overload should ideally be avoided by
   any protocol as this would lead to stable operation under overload.
   However, this is often difficult to achieve in practice.  For
   example, changing the SIP retransmission timer mechanisms can reduce
   the degree of regeneration during overload but will impact the
   ability of SIP to recover from message losses.  Without any
   retransmission, each message that is dropped due to SIP server
   overload will eventually lead to a failed call.

   For a SIP INVITE transaction to be successful, a minimum of three
   messages need to be forwarded by a SIP server.  Often an INVITE
   transaction consists of five or more SIP messages.  If a SIP server
   under overload randomly discards messages without evaluating them,
   the chances that all messages belonging to a transaction are
   successfully forwarded will decrease as the load increases.  Thus,
   the number of transactions that complete successfully will decrease
   even if the message throughput of a server remains up and even if
   the overload behavior is fully non-regenerative.  A SIP server might
   (partially) parse incoming messages to determine if they are new
   requests or messages belonging to an existing transaction.  However,
   after having spent resources on parsing a SIP message, discarding
   this message is expensive as the resources already spent are lost.
   The number of successful transactions will therefore decline with an
   increase in load as fewer and fewer resources can be spent on
   forwarding messages and more and more resources are consumed by
   inspecting messages that will eventually be dropped.  The slope of
   the decline depends on the amount of resources spent to inspect each
   message.

   Another challenge for SIP overload control is controlling the rate
   of the true traffic source.  Overload is often caused by a large
   number of UAs, each of which creates only a single message.
   However, the sum of their traffic can overload a SIP server.  The
   overload mechanisms suitable for controlling a SIP server (e.g.,
   rate control) may not be effective for individual UAs.  In some
   cases, there are other non-SIP mechanisms for limiting the load from
   the UAs.  These may operate independently from, or in conjunction
   with, the SIP overload mechanisms described here.  In either case,
   they are out of scope for this document.

3.  Explicit vs. Implicit Overload Control

   The main difference between explicit and implicit overload control
   is the way overload is signaled from a SIP server that is reaching
   an overload condition to its upstream neighbors.

   In an explicit overload control mechanism, a SIP server uses an
   explicit overload signal to indicate that it is reaching its
   capacity limit.  Upstream neighbors receiving this signal can adjust
   their transmission rate according to the overload signal to a level
   that is acceptable to the downstream server.  The overload signal
   enables a SIP server to steer the load it is receiving to a rate at
   which it can perform at maximum capacity.

   Implicit overload control uses the absence of responses and packet
   loss as an indication of overload.  A SIP server that is sensing
   such a condition reduces the load it is forwarding to a downstream
   neighbor.  Since there is no explicit overload signal, this
   mechanism is robust as it does not depend on actions taken by the
   SIP server running into overload.

   The ideas of explicit and implicit overload control are in fact
   complementary.  By considering implicit overload indications, a
   server can avoid overloading an unresponsive downstream neighbor.
   An explicit overload signal enables a SIP server to actively steer
   the incoming load to a desired level.

4.  System Model
   The model shown in Figure 1 identifies fundamental components of an
   explicit SIP overload control mechanism:

   SIP Processor:  The SIP Processor processes SIP messages and is the
      component that is protected by overload control.

   Monitor:  The Monitor measures the current load of the SIP processor
      on the receiving entity.  It implements the mechanisms needed to
      determine the current usage of resources relevant for the SIP
      processor and reports load samples (S) to the Control Function.

   Control Function:  The Control Function implements the overload
      control algorithm.  The Control Function uses the load samples
      (S) and determines if overload has occurred and a throttle (T)
      needs to be set to adjust the load sent to the SIP processor on
      the receiving entity.  The Control Function on the receiving
      entity sends load feedback (F) to the sending entity.

   Actuator:  The Actuator implements the algorithms needed to act on
      the throttles (T) and ensures that the amount of traffic
      forwarded to the receiving entity meets the criteria of the
      throttle.  For example, a throttle may instruct the Actuator to
      not forward more than 100 INVITE messages per second.  The
      Actuator implements the algorithms to achieve this objective,
      e.g., using message gapping.  It also implements algorithms to
      select the messages that will be affected and determine whether
      they are rejected or redirected.

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e.,
   loss-based, rate-based, window-based or signal-based overload
   control; see Section 9), the overload control algorithm (see
   Section 11), as well as other design parameters.  The feedback (F)
   enables the sending entity to adjust the amount of traffic forwarded
   to the receiving entity to a level that is acceptable to the
   receiving entity without causing overload.
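   Purely as an illustration of how these components fit together (all
   class names and numbers below are invented for this sketch and are
   not defined by this document), a loss-based instantiation of the
   model might look like:

```python
# Illustrative sketch of the explicit overload control model: a Monitor
# reports load samples (S), a Control Function derives feedback (F),
# and an Actuator applies the throttle (T).  Here F is a loss rate, so
# the sending side can copy F directly into T.  All names and numbers
# are inventions of this sketch, not part of the draft.

class Monitor:
    """Measures the load of the SIP processor (load samples S)."""
    def __init__(self, capacity_per_s: float):
        self.capacity = capacity_per_s
    def sample(self, arrival_rate: float) -> float:
        return arrival_rate / self.capacity  # utilization; 1.0 = full

class ControlFunction:
    """Turns load samples into overload feedback F (here, a loss rate)."""
    def feedback(self, utilization: float) -> float:
        # Ask senders to reject the fraction of traffic above capacity.
        return max(0.0, 1.0 - 1.0 / utilization) if utilization > 1 else 0.0

class Actuator:
    """Applies the throttle T on the sending entity."""
    def __init__(self):
        self.loss_rate = 0.0
    def set_throttle(self, feedback: float):
        self.loss_rate = feedback  # F copied directly into T
    def admit_rate(self, offered_rate: float) -> float:
        return offered_rate * (1.0 - self.loss_rate)

# One control loop iteration: 140 msg/s capacity, 200 msg/s offered.
monitor, control, actuator = Monitor(140.0), ControlFunction(), Actuator()
actuator.set_throttle(control.feedback(monitor.sample(200.0)))
print(f"forwarded: {actuator.admit_rate(200.0):.0f} msg/s")  # prints 140
```

   Because the feedback is already a loss rate, the Control Function on
   the sending entity is trivial in this sketch; with rate- or
   window-based feedback it would have more work to do.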
   Figure 1 depicts a general system model for overload control.  In
   this diagram, one instance of the Control Function is on the sending
   entity (i.e., associated with the Actuator) and one is on the
   receiving entity (i.e., associated with the Monitor).  However, a
   specific mechanism may not require both elements.  In this case, one
   of the two Control Function elements can be empty and simply pass
   along feedback.  E.g., if (F) is defined as a loss rate (e.g.,
   reduce traffic by 10%), there is no need for a Control Function on
   the sending entity as the content of (F) can be copied directly into
   (T).

   The model in Figure 1 shows a scenario with one sending and one
   receiving entity.  In a more realistic scenario, a receiving entity
   will receive traffic from multiple sending entities and vice versa
   (see Section 6).  The feedback generated by a Monitor will therefore
   often be distributed across multiple Actuators.  A Monitor needs to
   be able to split the load it can process across multiple sending
   entities and generate feedback that correctly adjusts the load each
   sending entity is allowed to send.  Similarly, an Actuator needs to
   be prepared to receive different levels of feedback from different
   receiving entities and throttle traffic to these entities
   accordingly.

   In a realistic deployment, SIP messages will flow in both
   directions, from server B to server A as well as from server A to
   server B.  The overload control mechanisms in each direction can be
   considered independently.  For messages flowing from server A to
   server B, the sending entity is server A and the receiving entity is
   server B, and vice versa.  The control loops in both directions
   operate independently.
        Sending               Receiving
         Entity                Entity
    +----------------+     +----------------+
    |    Server A    |     |    Server B    |
    |  +----------+  |     |  +----------+  |    -+
    |  | Control  |  |  F  |  | Control  |  |     |
    |  | Function |<-+-----+--| Function |  |     |
    |  +----------+  |     |  +----------+  |     |
    |     T |        |     |       ^        |     | Overload
    |       v        |     |       | S      |     | Control
    |  +----------+  |     |  +----------+  |     |
    |  | Actuator |  |     |  | Monitor  |  |     |
    |  +----------+  |     |  +----------+  |     |
    |       |        |     |       ^        |    -+
    |       v        |     |       |        |    -+
    |  +----------+  |     |  +----------+  |     |
   <-+--|   SIP    |  |     |  |   SIP    |  |     | SIP
   --+->|Processor |--+-----+->|Processor |--+->   | System
    |  +----------+  |     |  +----------+  |     |
    +----------------+     +----------------+    -+

       Figure 1: System Model for Explicit Overload Control

5.  Degree of Cooperation

   A SIP request is usually processed by more than one SIP server on
   its path to the destination.  Thus, a design choice for an explicit
   overload control mechanism is where to place the components of
   overload control along the path of a request and, in particular,
   where to place the Monitor and Actuator.  This design choice
   determines the degree of cooperation between the SIP servers on the
   path.  Overload control can be implemented hop-by-hop with the
   Monitor on one server and the Actuator on its direct upstream
   neighbor.  Overload control can be implemented end-to-end with
   Monitors on all SIP servers along the path of a request and an
   Actuator on the sender.  In this case, the Control Functions
   associated with each Monitor have to cooperate to jointly determine
   the overall feedback for this path.  Finally, overload control can
   be implemented locally on a SIP server if Monitor and Actuator
   reside on the same server.
   In this case, the sending entity and receiving entity are the same
   SIP server, and the Actuator and Monitor operate on the same SIP
   processor (although the Actuator typically operates on a
   pre-processing stage in local overload control).  Local overload
   control is an internal overload control mechanism as the control
   loop is implemented internally on one server.  Hop-by-hop and
   end-to-end are external overload control mechanisms.  All three
   configurations are shown in Figure 2.

                 +---------+
        +------+ |         |
        |      | |      +---+
        v      | v  //=>| C |
      +---+   +---+//   +---+
      | A |===>| B |
      +---+   +---+\\   +---+
                ^   \\=>| D |
                |       +---+
                |         |
                +---------+

              (a) hop-by-hop

      +------(+)---------+
      |       ^          |
      |       |      +---+
      v       |  //=>| C |
    +---+   +---+//  +---+
    | A |===>| B |
    +---+   +---+\\  +---+
      ^       |   \\=>| D |
      |       |      +---+
      |       v        |
      +------(+)-------+

              (b) end-to-end

                      +-+
                      v |
      +-+     +-+     +---+
      v |     v |  //=>| C |
    +---+   +---+ //   +---+
    | A |===>| B |
    +---+   +---+ \\   +---+
               \\=>| D |
                   +---+
                   ^ |
                   +-+

              (c) local

      ==> SIP request flow
      <-- Overload feedback loop

        Figure 2: Degree of Cooperation between Servers

5.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all neighboring SIP servers that directly
   exchange traffic.  I.e., the Actuator is located on the SIP server
   that is the direct upstream neighbor of the SIP server that has the
   corresponding Monitor.  Each control loop between two servers is
   completely independent of the control loop between other servers
   further up- or downstream.  In the example in Figure 2(a), three
   independent overload control loops are instantiated: A - B, B - C
   and B - D.  Each loop only controls a single hop.  Overload feedback
   received from a downstream neighbor is not forwarded further
   upstream.  Instead, a SIP server acts on this feedback, for example,
   by rejecting SIP messages if needed.
   If the upstream neighbor of a server also becomes overloaded, it
   will report this problem to its upstream neighbors, which again take
   action based on the reported feedback.  Thus, in hop-by-hop overload
   control, overload is always resolved by the direct upstream
   neighbors of the overloaded server without the need to involve
   entities that are located multiple SIP hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and can avoid congestion collapse.  It is simple and scales
   well to networks with many SIP entities.  An advantage is that it
   does not require feedback to be transmitted across multiple hops,
   possibly crossing multiple trust domains.  Feedback is sent to the
   next hop only.  Furthermore, it does not require a SIP entity to
   aggregate a large number of overload status values or keep track of
   the overload status of SIP servers it is not communicating with.

5.2.  End-to-End

   End-to-end overload control implements an overload control loop
   along the entire path of a SIP request, from UAC to UAS.  An
   end-to-end overload control mechanism consolidates overload
   information from all SIP servers on the way (including all proxies
   and the UAS) and uses this information to throttle traffic as far
   upstream as possible.  An end-to-end overload control mechanism has
   to be able to frequently collect the overload status of all servers
   on the potential path(s) to a destination and combine this data into
   meaningful overload feedback.

   A UA or SIP server only throttles requests if it knows that these
   requests will eventually be forwarded to an overloaded server.  For
   example, if D is overloaded in Figure 2(b), A should only throttle
   requests it forwards to B when it knows that they will be forwarded
   to D.  It should not throttle requests that will eventually be
   forwarded to C, since server C is not overloaded.
   In many cases, it is difficult for A to determine which requests
   will be routed to C and D since this depends on the local routing
   decision made by B.  These routing decisions can be highly variable
   and, for example, depend on call routing policies configured by the
   user, services invoked on a call, load balancing policies, etc.  The
   fact that a previous message to a target has been routed through an
   overloaded server does not necessarily mean the next message to this
   target will also be routed through the same server.

   The main problem of end-to-end overload control is its inherent
   complexity, since the UAC or SIP servers need to monitor all
   potential paths to a destination in order to determine which
   requests should be throttled and which requests may be sent.  Even
   if this information is available, it is not clear which path a
   specific request will take.

   A variant of end-to-end overload control is to implement a control
   loop between a set of well-known SIP servers along the path of a SIP
   request.  For example, an overload control loop can be instantiated
   between a server that only has one downstream neighbor or a set of
   closely coupled SIP servers.  A control loop spanning multiple hops
   can be used if the sending entity has full knowledge about the SIP
   servers on the path of a SIP message.

   A key difference to transport protocols using end-to-end congestion
   control such as TCP is that the traffic exchanged between SIP
   servers consists of many individual SIP messages.  Each of these SIP
   messages has its own source and destination.  Even SIP messages
   containing identical SIP URIs (e.g., a SUBSCRIBE and an INVITE
   message to the same SIP URI) can be routed to different
   destinations.  This is different from TCP, which controls a stream
   of packets between a single source and a single destination.

5.3.  Local Overload Control
   The idea of local overload control (see Figure 2(c)) is to run the
   Monitor and Actuator on the same server.  This enables the server to
   monitor the current resource usage and to reject messages that can't
   be processed without overusing the local resources.  The fundamental
   assumption behind local overload control is that it is less resource
   consuming for a server to reject messages than to process them.  A
   server can therefore reject the excess messages it cannot process to
   stop all retransmissions of these messages.  Since rejecting
   messages does consume resources on a SIP server, local overload
   control alone cannot prevent a congestion collapse.

   Local overload control can be used in conjunction with other
   overload control mechanisms and provides an additional layer of
   protection against overload.  It is fully implemented within a SIP
   server and does not require cooperation between servers.  In
   general, SIP servers should apply other overload control techniques
   to control load before a local overload control mechanism is
   activated as a mechanism of last resort.

6.  Topologies

   The following topologies describe four generic SIP server
   configurations.  These topologies illustrate specific challenges for
   an overload control mechanism.  An actual SIP server topology is
   likely to consist of combinations of these generic scenarios.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E and F) receives traffic from a single source A.  A
   load balancer is a typical example of such a configuration.  In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its
   downstream neighbors D, E and F.  If one of the downstream neighbors
   becomes overloaded, A can direct traffic to the servers that still
   have capacity.
   If one of the servers serves as a backup, it can be activated once
   one of the primary servers reaches overload.

   If A can reliably determine that D, E and F are its only downstream
   neighbors and all of them are in overload, it may choose to report
   overload upstream on behalf of D, E and F.  However, if the set of
   downstream neighbors is not fixed or only some of them are in
   overload, then A should not activate an overload control since A can
   still forward the requests destined to non-overloaded downstream
   neighbors.  These requests would be throttled as well if A would use
   overload control towards its upstream neighbors.

   In some cases, the servers D, E, and F are in a server farm and
   configured to appear as a single server to their upstream neighbors.
   In this case, server A can report overload on behalf of the server
   farm.  If the load balancer is not a SIP entity, servers D, E, and F
   can report the overall load of the server farm (i.e., the load of
   the virtual server) in their messages.  As an alternative, one of
   the servers (e.g., server E) can report overload on behalf of the
   server farm.  In this case, not all messages contain overload
   control information, and it needs to be ensured that all upstream
   neighbors are periodically served by server E to receive updated
   information.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.  The set of active upstream neighbors of D
   can change as servers may become inactive and previously inactive
   servers may start contributing traffic to D.

   If D becomes overloaded, it needs to generate feedback to reduce the
   amount of traffic it receives from its upstream neighbors.  D needs
   to decide by how much each upstream neighbor should reduce traffic.
   This decision can require the consideration of the amount of traffic
   sent by each upstream neighbor, and it may need to be re-adjusted as
   the traffic contributed by each upstream neighbor varies over time.
   Server D can use a local fairness policy to determine how much
   traffic it accepts from each upstream neighbor.

   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

                +---+
            /-->| D |
           /    +---+
          /
     +---+-/    +---+
     | A |----->| E |
     +---+-\    +---+
          \
           \    +---+
            \-->| F |
                +---+

        (a) load balancer

     +---+
     | A |-\
     +---+  \
             \   +---+
     +---+    \->|   |
     | B |------>| D |
     +---+    /->|   |
             /   +---+
     +---+  /
     | C |-/
     +---+

        (b) multiple sources

     +---+
     | A |---\
     +---+-\  \---->+---+
            \/----->| D |
     +---+--/\  /-->+---+
     | B |    \/
     +---+---\/\--->+---+
             /\---->| E |
     +---+--/  /-->+---+
     | C |-----/
     +---+

        (c) mesh

     a--\
         \
     b--\ \--->+---+
         \---->|   |
     c-------->| D |
               |   |
     ...  /--->+---+
         /
     z--/

        (d) edge proxy

               Figure 3: Topologies

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   infrequently sends a request.  This scenario is shown in
   Figure 3(d).  An edge proxy that is connected to many UAs is a
   typical example of such a configuration.

   Since each UA typically only contributes a few requests, which are
   often related to the same call, it can't decrease its message rate
   to resolve the overload.  In such a configuration, a SIP server can
   resort to local overload control by rejecting a percentage of the
   requests it receives with 503 (Service Unavailable) responses.
   Since there are many upstream neighbors that contribute to the
   overall load, sending 503 (Service Unavailable) to a fraction of
   them can gradually reduce load without entirely stopping all
   incoming traffic.  The Retry-After header can be used in 503
   (Service Unavailable) responses to ask UAs to wait a given number of
   seconds before trying the call again.  Using 503 (Service
   Unavailable) towards individual sources cannot, however, prevent
   overload if a large number of users place calls at the same time.

      Note: The requirements of the "edge proxy" topology are different
      from the ones of the other topologies, which may require a
      different method for overload control.

7.  Fairness

   There are many different ways to define fairness between multiple
   upstream neighbors of a SIP server.  In the context of SIP server
   overload, it is helpful to describe two categories of fairness:
   basic fairness and customized fairness.  With basic fairness, a SIP
   server treats all call attempts equally and ensures that each call
   attempt has the same chance of succeeding.  With customized
   fairness, the server allocates resources according to different
   priorities.  An example application of the basic fairness criterion
   is the "third caller receives free tickets" scenario, where each
   call attempt should have an equal success probability in making
   calls through an overloaded SIP server, irrespective of the service
   provider where it was initiated.  An example of customized fairness
   would be a server which assigns different resource allocations to
   its upstream neighbors (e.g., service providers) as defined in a
   service level agreement (SLA).

8.  Performance Metrics

   The performance of an overload control mechanism can be measured
   using different metrics.

   A key performance indicator is the goodput of a SIP server under
   overload.
Ideally, a SIP server is enabled to operate at its capacity limit
during periods of overload.  For example, if a SIP server has a
processing capacity of 140 INVITE transactions per second, then an
overload control mechanism should enable it to process 140 INVITEs
per second even if the offered load is much higher.  The delay
introduced by a SIP server is another important indicator.  An
overload control mechanism should ensure that the delay encountered
by a SIP message does not increase significantly during periods of
overload.  Significantly increased delay can lead to timeouts and
retransmissions of SIP messages, making the overload worse.

Responsiveness and stability are other important performance
indicators.  An overload control mechanism should react quickly to
an overload occurrence and ensure that a SIP server does not become
overloaded, even during sudden peaks of load.  Similarly, an
overload control mechanism should quickly stop rejecting calls when
the overload disappears.  Stability is another important criterion.
An overload control mechanism should not cause significant
oscillations of load on a SIP server.  The performance of SIP
overload control mechanisms is discussed in [Noel et al.],
[Shen et al.], [Hilt et al.] and [Abdelal et al.].

In addition to the above metrics, there are other indicators that
are relevant for the evaluation of an overload control mechanism:

Fairness:  Which types of fairness does the overload control
   mechanism implement?
Self-limiting:  Is the overload control self-limiting if a SIP
   server becomes unresponsive?
Changes in neighbor set:  How does the mechanism adapt to a changing
   set of sending entities?
Data points to monitor:  Which and how many data points does an
   overload control mechanism need to monitor?
Computational load:  What is the (CPU) load created by the overload
   "monitor" and "actuator"?
9. Explicit Overload Control Feedback

Explicit overload control feedback enables a receiver to indicate
how much traffic it wants to receive.  Explicit overload control
mechanisms can be differentiated based on the type of information
conveyed in the overload control feedback and on whether the control
function is located in the receiving entity, the sending entity
(receiver- vs. sender-based overload control), or both.

9.1. Rate-based Overload Control

The key idea of rate-based overload control is to limit the request
rate at which an upstream element is allowed to forward traffic to
the downstream neighbor.  If overload occurs, a SIP server instructs
each upstream neighbor to send at most X requests per second.  Each
upstream neighbor can be assigned a different rate cap.

An example algorithm for an Actuator in the sending entity is
request gapping.  After transmitting a request to a downstream
neighbor, a server waits for 1/X seconds before it transmits the
next request to the same neighbor.  Requests that arrive during the
waiting period are not forwarded; they are either redirected,
rejected or buffered.

The rate cap ensures that the number of requests received by a SIP
server never exceeds the sum of all rate caps granted to upstream
neighbors.  Rate-based overload control therefore protects a SIP
server against overload even during load spikes, assuming that no
new upstream neighbors start sending traffic.  New upstream
neighbors need to be considered in the rate caps assigned to all
upstream neighbors; the rates assigned to existing neighbors need to
be adjusted when new neighbors join.  During periods when new
neighbors are joining, overload can occur in extreme cases until the
rate caps of all servers are adjusted to again match the overall
rate cap of the server.
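The request-gapping actuator described above can be sketched as
follows.  This is a minimal illustration only, not taken from any
SIP implementation; the class and attribute names are hypothetical,
and it assumes a rate cap of X requests per second toward a single
downstream neighbor:

```python
import time

class RequestGapper:
    """Allows at most `rate_cap` requests per second toward one
    downstream neighbor (hypothetical sketch of request gapping).

    Requests arriving during the 1/rate_cap waiting period are not
    forwarded; the caller must redirect, reject or buffer them.
    """

    def __init__(self, rate_cap, clock=time.monotonic):
        self.gap = 1.0 / rate_cap   # minimum spacing between requests
        self.clock = clock          # injectable clock, for testing
        self.next_allowed = 0.0     # earliest time the next request may go

    def may_forward(self):
        now = self.clock()
        if now >= self.next_allowed:
            self.next_allowed = now + self.gap
            return True             # forward this request
        return False                # redirect, reject or buffer it
```

For example, with a rate cap of 2 requests per second (a gap of
0.5 s), a request arriving 0.1 s after a forwarded one is throttled,
while one arriving 0.5 s later is forwarded.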
The overall rate cap of a SIP server is determined by an overload
control algorithm, e.g., based on system load.

Rate-based overload control requires a SIP server to assign a rate
cap to each of its upstream neighbors while it is activated.
Effectively, a server needs to assign a share of its overall
capacity to each upstream neighbor.  A server needs to ensure that
the sum of all rate caps assigned to upstream neighbors does not
substantially oversubscribe its actual processing capacity.  This
requires a SIP server to keep track of the set of upstream neighbors
and to adjust the rate caps if a new upstream neighbor appears or an
existing neighbor stops transmitting.  For example, if the capacity
of the server is X and this server is receiving traffic from two
upstream neighbors, it can assign a rate cap of X/2 to each of them.
If a third sender appears, the rate cap for each sender is lowered
to X/3.  If the overall rate cap is too high, the server may
experience overload.  If it is too low, the upstream neighbors will
reject requests even though they could have been processed by the
server.

An approach for estimating a rate cap for each upstream neighbor is
to use a fixed proportion of a control variable, X, where X is
initially equal to the capacity of the SIP server.  The server then
increases or decreases X until the workload arrival rate matches its
actual capacity.  Usually, this means that the sum of the rate caps
sent out by the server (i.e., X) exceeds its actual capacity, but it
enables upstream neighbors that are not generating more than their
fair share of the work to remain effectively unrestricted.  In this
approach, the server only has to measure the aggregate arrival rate.
However, since the overall rate cap is usually higher than the
actual capacity, brief periods of overload may occur.
9.2. Loss-based Overload Control

A loss percentage enables a SIP server to ask an upstream neighbor
to reduce the number of requests it would normally forward to this
server by X%.  For example, a SIP server can ask an upstream
neighbor to reduce the number of requests this neighbor would
normally send by 10%.  The upstream neighbor then redirects or
rejects 10% of the traffic that is destined for this server.

An algorithm for the sending entity to implement a loss percentage
is to draw a random number between 1 and 100 for each request to be
forwarded.  The request is not forwarded to the server if the random
number is less than or equal to X.

An advantage of loss-based overload control is that the receiving
entity does not need to track the set of upstream neighbors or the
request rate it receives from each of them.  It is sufficient to
monitor the overall system utilization.  To reduce load, a server
can ask its upstream neighbors to lower the traffic they forward by
a certain percentage.  The server calculates this percentage by
combining the loss percentage that is currently in use (i.e., the
loss percentage the upstream neighbors are currently applying when
forwarding traffic), the current system utilization and the desired
system utilization.  For example, if the server load approaches 90%
and the current loss percentage is set to a 50% traffic reduction,
the server can decide to increase the loss percentage to 55% in
order to reach a system utilization of 80%.  Similarly, the server
can lower the loss percentage if the system utilization permits.

Loss-based overload control requires that the throttle percentage be
adjusted to the current overall number of requests received by the
server.  This is particularly important if the number of requests
received fluctuates quickly.
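The random-number throttle described earlier in this section can be
sketched as follows; this is a minimal illustration (the function
name is hypothetical), in which each request is independently
dropped with probability X/100 for a loss percentage X:

```python
import random

def should_forward(loss_percentage, rng=random):
    """Return True if a request should be forwarded downstream,
    given the loss percentage X the server asked for (0 <= X <= 100).

    Draws a random number between 1 and 100; the request is not
    forwarded if the number is less than or equal to X, so on
    average X% of the traffic is redirected or rejected instead.
    """
    return rng.randint(1, 100) > loss_percentage
```

Over many requests, this drops approximately X% of the traffic
without the receiving entity having to track individual upstream
neighbors or their request rates.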
For example, if a SIP server sets a throttle value of 10% at time t1
and the number of requests increases by 20% between time t1 and t2
(t1