SOC Working Group                                                V. Hilt
Internet-Draft                                  Bell Labs/Alcatel-Lucent
Intended status: Informational                                   E. Noel
Expires: November 19, 2011                                     AT&T Labs
                                                                 C. Shen
                                                     Columbia University
                                                              A. Abdelal
                                                          Sonus Networks
                                                            May 18, 2011


  Design Considerations for Session Initiation Protocol (SIP) Overload
                                 Control
                    draft-ietf-soc-overload-design-06

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.
   Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.  This
   document discusses models and design considerations for a SIP
   overload control mechanism.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 19, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SIP Overload Problem
   3.  Explicit vs. Implicit Overload Control
   4.  System Model
   5.  Degree of Cooperation
       5.1.  Hop-by-Hop
       5.2.  End-to-End
       5.3.  Local Overload Control
   6.  Topologies
   7.  Fairness
   8.  Performance Metrics
   9.  Explicit Overload Control Feedback
       9.1.  Rate-based Overload Control
       9.2.  Loss-based Overload Control
       9.3.  Window-based Overload Control
       9.4.  Overload Signal-based Overload Control
       9.5.  On-/Off Overload Control
   10.  Implicit Overload Control
   11.  Overload Control Algorithms
   12.  Message Prioritization
   13.  Security Considerations
   14.  IANA Considerations
   15.  Informative References
   Appendix A.  Contributors
   Authors' Addresses

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload occurs if a SIP server does not have sufficient resources to
   process all incoming SIP messages.  These resources may include CPU,
   memory, input/output, or disk resources.

   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.  In fact, overload may lead to
   a situation in which the throughput drops to a small fraction of the
   original processing capacity.  This is often called congestion
   collapse.

   An overload control mechanism enables a SIP server to perform close
   to its capacity limit during times of overload.  Overload control is
   used by a SIP server if it is unable to process all SIP requests due
   to resource constraints.  There are other failure cases in which a
   SIP server can successfully process incoming requests but has to
   reject them for other reasons.  For example, a PSTN gateway that runs
   out of trunk lines but still has plenty of capacity to process SIP
   messages should reject incoming INVITEs using a response such as 488
   (Not Acceptable Here), as described in [RFC4412].  Similarly, a SIP
   registrar that has lost connectivity to its registration database but
   is still capable of processing SIP messages should reject REGISTER
   requests with a 500 (Server Error) response [RFC3261].  Overload
   control mechanisms do not apply in these cases, and SIP provides
   appropriate response codes for them.

   There are cases in which a SIP server runs other services that do not
   involve the processing of SIP messages (e.g., processing of RTP
   packets, database queries, software updates, and event handling).
   These services may, or may not, be correlated with the SIP message
   volume.  These services can use up a substantial share of the
   resources available on the server (e.g., CPU cycles) and leave the
   server in a condition where it is unable to process all incoming SIP
   requests.  In these cases, the SIP server applies SIP overload
   control mechanisms to avoid congestion collapse on the SIP signaling
   plane.
   However, controlling the number of SIP requests may not significantly
   reduce the load on the server if the resource shortage was created by
   another service.  In these cases, the server is expected to use
   appropriate methods of controlling the resource usage of the other
   services.  The specifics of controlling the resource usage of other
   services and their coordination are out of scope for this document.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code and the Retry-
   After header.  However, this mechanism cannot prevent overload of a
   SIP server, and it cannot prevent congestion collapse.  In fact, it
   may cause traffic to oscillate and to shift between SIP servers,
   thereby worsening an overload condition.  A detailed discussion of
   the SIP overload problem, the problems with the 503 (Service
   Unavailable) response code and the Retry-After header, and the
   requirements for a SIP overload control mechanism can be found in
   [RFC5390].  In addition, 503 is used in other situations, not just
   SIP server overload.  A SIP overload control mechanism based on 503
   would have to specify exactly which cause values trigger overload
   control.

   This document discusses the models, assumptions, and design
   considerations for a SIP overload control mechanism.  The document
   originated in the SIP overload control design team and has been
   further developed by the SOC working group.

2.  SIP Overload Problem

   A key contributor to SIP congestion collapse [RFC5390] is the
   regenerative behavior of overload in the SIP protocol.  When SIP is
   running over UDP, it will retransmit messages that were dropped or
   excessively delayed by a SIP server due to overload and thereby
   increase the offered load for the already overloaded server.
   This increase in load worsens the severity of the overload condition
   and, in turn, causes more messages to be dropped.  A congestion
   collapse can occur [Hilt et al.], [Noel et al.], [Shen et al.] and
   [Abdelal et al.].

   Regenerative behavior under overload should ideally be avoided by any
   protocol, as it leads to unstable operation under overload.  However,
   this is often difficult to achieve in practice.  For example,
   changing the SIP retransmission timer mechanisms can reduce the
   degree of regeneration during overload but will impact the ability of
   SIP to recover from message losses.  Without any retransmissions,
   each message that is dropped due to SIP server overload will
   eventually lead to a failed transaction.

   For a SIP INVITE transaction to be successful, a minimum of three
   messages need to be forwarded by a SIP server.  Often an INVITE
   transaction consists of five or more SIP messages.  If a SIP server
   under overload randomly discards messages without evaluating them,
   the chance that all messages belonging to a transaction are
   successfully forwarded decreases as the load increases.  Thus, the
   number of transactions that complete successfully will decrease even
   if the message throughput of a server remains high and even if the
   overload behavior is fully non-regenerative.  A SIP server might
   (partially) parse incoming messages to determine whether a message is
   a new request or belongs to an existing transaction.  Discarding a
   SIP message after spending the resources to parse it is expensive.
   The number of successful transactions will therefore decline with an
   increase in load, as fewer resources can be spent on forwarding
   messages and more resources are consumed by inspecting messages that
   will eventually be dropped.  The rate of the decline depends on the
   amount of resources spent to inspect each message.
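   The effect of random message discard on transaction success can be
   illustrated with a toy calculation (the function below is our own
   sketch, not part of this document's model): if each of the n messages
   of a transaction must be forwarded and each is independently dropped
   with probability d, the transaction succeeds with probability
   (1 - d)^n, so longer transactions suffer disproportionately.

```python
# Toy model (illustrative only): probability that a SIP transaction
# completes when a server randomly discards messages.  Assumes each of
# the n messages must be forwarded and each is independently dropped
# with probability `drop_rate`.

def transaction_success_rate(n_messages: int, drop_rate: float) -> float:
    """Probability that all n messages survive random discard."""
    return (1.0 - drop_rate) ** n_messages

# A five-message INVITE transaction under 30% random discard:
p5 = transaction_success_rate(5, 0.3)
# Compare a minimal three-message transaction under the same discard:
p3 = transaction_success_rate(3, 0.3)
assert p3 > p5  # longer transactions fare worse under random discard
```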
   Another challenge for SIP overload control is controlling the rate of
   the true traffic sources.  Overload is often caused by a large number
   of UAs, each of which creates only a single message.  However, the
   sum of their traffic can overload a SIP server.  The overload control
   mechanisms suitable for controlling a SIP server (e.g., rate control)
   may not be effective for individual UAs.  In some cases, there are
   other non-SIP mechanisms for limiting the load from the UAs.  These
   may operate independently from, or in conjunction with, the SIP
   overload mechanisms described here.  In either case, they are out of
   scope for this document.

3.  Explicit vs. Implicit Overload Control

   The main difference between explicit and implicit overload control is
   the way overload is signaled from a SIP server that is reaching an
   overload condition to its upstream neighbors.

   In an explicit overload control mechanism, a SIP server uses an
   explicit overload signal to indicate that it is reaching its capacity
   limit.  Upstream neighbors receiving this signal can adjust their
   transmission rate according to the overload signal to a level that is
   acceptable to the downstream server.  The overload signal enables a
   SIP server to steer the load it is receiving to a rate at which it
   can perform at maximum capacity.

   Implicit overload control uses the absence of responses and packet
   loss as an indication of overload.  A SIP server that is sensing such
   a condition reduces the load it is forwarding to a downstream
   neighbor.  Since there is no explicit overload signal, this mechanism
   is robust, as it does not depend on actions taken by the SIP server
   running into overload.

   The ideas of explicit and implicit overload control are in fact
   complementary.  By considering implicit overload indications, a
   server can avoid overloading an unresponsive downstream neighbor.
An 226 explicit overload signal enables a SIP server to actively steer the 227 incoming load to a desired level. 229 4. System Model 231 The model shown in Figure 1 identifies fundamental components of an 232 explicit SIP overload control mechanism: 234 SIP Processor: The SIP Processor processes SIP messages and is the 235 component that is protected by overload control. 236 Monitor: The Monitor measures the current load of the SIP processor 237 on the receiving entity. It implements the mechanisms needed to 238 determine the current usage of resources relevant for the SIP 239 processor and reports load samples (S) to the Control Function. 240 Control Function: The Control Function implements the overload 241 control algorithm. The control function uses the load samples (S) 242 and determines if overload has occurred and a throttle (T) needs 243 to be set to adjust the load sent to the SIP processor on the 244 receiving entity. The control function on the receiving entity 245 sends load feedback (F) to the sending entity. 246 Actuator: The Actuator implements the algorithms needed to act on 247 the throttles (T) and ensures that the amount of traffic forwarded 248 to the receiving entity meets the criteria of the throttle. For 249 example, a throttle may instruct the Actuator to not forward more 250 than 100 INVITE messages per second. The Actuator implements the 251 algorithms to achieve this objective, e.g., using message gapping. 252 It also implements algorithms to select the messages that will be 253 affected and determine whether they are rejected or redirected. 255 The type of feedback (F) conveyed from the receiving to the sending 256 entity depends on the overload control method used (i.e., loss-based, 257 rate-based, window-based or signal-based overload control; see 258 Section 9), the overload control algorithm (see Section 11) as well 259 as other design parameters. 
   The feedback (F) enables the sending
   entity to adjust the amount of traffic forwarded to the receiving
   entity to a level that is acceptable to the receiving entity without
   causing overload.

   Figure 1 depicts a general system model for overload control.  In
   this diagram, one instance of the Control Function is on the sending
   entity (i.e., associated with the Actuator) and one is on the
   receiving entity (i.e., associated with the Monitor).  However, a
   specific mechanism may not require both elements.  In this case, one
   of the two Control Function elements can be empty and simply passes
   along feedback.  For example, if (F) is defined as a loss rate (e.g.,
   reduce traffic by 10%), there is no need for a Control Function on
   the sending entity, as the content of (F) can be copied directly into
   (T).

   The model in Figure 1 shows a scenario with one sending and one
   receiving entity.  In a more realistic scenario, a receiving entity
   will receive traffic from multiple sending entities and vice versa
   (see Section 6).  The feedback generated by a Monitor will therefore
   often be distributed across multiple Actuators.  A Monitor needs to
   be able to split the load it can process across multiple sending
   entities and generate feedback that correctly adjusts the load each
   sending entity is allowed to send.  Similarly, an Actuator needs to
   be prepared to receive different levels of feedback from different
   receiving entities and to throttle traffic to these entities
   accordingly.

   In a realistic deployment, SIP messages flow in both directions, from
   server B to server A as well as from server A to server B.  The
   overload control mechanisms in each direction can be considered
   independently.  For messages flowing from server A to server B, the
   sending entity is server A and the receiving entity is server B, and
   vice versa.  The control loops in the two directions operate
   independently.
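   As an illustration only (the class and method names below are
   invented for this sketch and are not defined by this document), the
   roles above can be mimicked in a few lines of code for the simple
   case where the feedback (F) is a loss rate that the sending entity
   copies directly into its throttle (T):

```python
# Illustrative sketch of the system-model roles: a Monitor reports a
# load sample (S), a Control Function turns it into feedback (F), and
# an Actuator applies the resulting throttle (T).  All names and the
# capacity figure are assumptions made for this example.

class Monitor:
    def __init__(self, capacity_cps: float):
        self.capacity_cps = capacity_cps

    def sample(self, offered_cps: float) -> float:
        # Load sample S: utilization of the SIP Processor.
        return offered_cps / self.capacity_cps

class ControlFunction:
    def feedback(self, load_sample: float) -> float:
        # Feedback F: fraction of traffic the senders should shed.
        if load_sample <= 1.0:
            return 0.0
        return 1.0 - 1.0 / load_sample

class Actuator:
    def throttle(self, offered_cps: float, loss_rate: float) -> float:
        # Apply throttle T: forward only the admitted share of traffic.
        return offered_cps * (1.0 - loss_rate)

monitor, control, actuator = Monitor(140.0), ControlFunction(), Actuator()
s = monitor.sample(200.0)       # receiving entity sees 200 req/s offered
f = control.feedback(s)         # ask senders to shed 30% of the traffic
forwarded = actuator.throttle(200.0, f)
assert round(forwarded) == 140  # load settles at the capacity limit
```

   The design choice illustrated here is the one named in the text: with
   a loss-rate (F), the sending entity needs no Control Function of its
   own, since (F) maps directly onto (T).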
        Sending                 Receiving
         Entity                   Entity
  +----------------+      +----------------+
  |    Server A    |      |    Server B    |
  |  +----------+  |      |  +----------+  |    -+
  |  | Control  |  |  F   |  | Control  |  |     |
  |  | Function |<-+------+--| Function |  |     |
  |  +----------+  |      |  +----------+  |     |
  |     T |        |      |       ^        |     | Overload
  |       v        |      |       | S      |     | Control
  |  +----------+  |      |  +----------+  |     |
  |  | Actuator |  |      |  | Monitor  |  |     |
  |  +----------+  |      |  +----------+  |     |
  |       |        |      |       ^        |    -+
  |       v        |      |       |        |    -+
  |  +----------+  |      |  +----------+  |     |
<-+--|   SIP    |  |      |  |   SIP    |  |     | SIP
--+->|Processor |--+------+->|Processor |--+->   | System
  |  +----------+  |      |  +----------+  |     |
  +----------------+      +----------------+    -+

       Figure 1: System Model for Explicit Overload Control

5.  Degree of Cooperation

   A SIP request is usually processed by more than one SIP server on its
   path to the destination.  Thus, a design choice for an explicit
   overload control mechanism is where to place the components of
   overload control along the path of a request and, in particular,
   where to place the Monitor and Actuator.  This design choice
   determines the degree of cooperation between the SIP servers on the
   path.  Overload control can be implemented hop-by-hop, with the
   Monitor on one server and the Actuator on its direct upstream
   neighbor.  Overload control can be implemented end-to-end, with
   Monitors on all SIP servers along the path of a request and an
   Actuator on the sender.  In this case, the Control Functions
   associated with each Monitor have to cooperate to jointly determine
   the overall feedback for this path.  Finally, overload control can be
   implemented locally on a SIP server if Monitor and Actuator reside on
   the same server.
   In this case, the sending entity and receiving
   entity are the same SIP server, and the Actuator and Monitor operate
   on the same SIP Processor (although the Actuator typically operates
   on a pre-processing stage in local overload control).  Local overload
   control is an internal overload control mechanism, as the control
   loop is implemented internally on one server.  Hop-by-hop and
   end-to-end are external overload control mechanisms.  All three
   configurations are shown in Figure 2.

             +---------+                 +-------(+)--------+
    +------+ |         |                 |        ^         |
    |      | |        +---+              |        |        +---+
    v      | v    //=>| C |              v        |    //=>| C |
  +---+    +---+ //   +---+            +---+    +---+ //   +---+
  | A |===>| B |                       | A |===>| B |
  +---+    +---+ \\   +---+            +---+    +---+ \\   +---+
             ^    \\=>| D |              ^        |    \\=>| D |
             |        +---+              |        |        +---+
             |          |                |        v          |
             +----------+                +-------(+)---------+

       (a) hop-by-hop                      (b) end-to-end

                                 +-+
                                 v |
             +-+      +-+       +---+
             v |      v |   //=>| C |
            +---+    +---+ //   +---+
            | A |===>| B |
            +---+    +---+ \\   +---+
                            \\=>| D |
                                +---+
                                 ^ |
                                 +-+

                         (c) local

          ==> SIP request flow
          <-- Overload feedback loop

          Figure 2: Degree of Cooperation between Servers

5.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all neighboring SIP servers that directly
   exchange traffic.  That is, the Actuator is located on the SIP server
   that is the direct upstream neighbor of the SIP server that has the
   corresponding Monitor.  Each control loop between two servers is
   completely independent of the control loops between other servers
   further up- or downstream.  In the example in Figure 2(a), three
   independent overload control loops are instantiated: A - B, B - C,
   and B - D.  Each loop only controls a single hop.  Overload feedback
   received from a downstream neighbor is not forwarded further
   upstream.  Instead, a SIP server acts on this feedback, for example,
   by rejecting SIP messages if needed.
   If the upstream neighbor of a
   server also becomes overloaded, it will report this problem to its
   own upstream neighbors, which again take action based on the reported
   feedback.  Thus, in hop-by-hop overload control, overload is always
   resolved by the direct upstream neighbors of the overloaded server,
   without the need to involve entities that are located multiple SIP
   hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and can avoid congestion collapse.  It is simple and scales
   well to networks with many SIP entities.  An advantage is that it
   does not require feedback to be transmitted across multiple hops,
   possibly crossing multiple trust domains.  Feedback is sent to the
   next hop only.  Furthermore, it does not require a SIP entity to
   aggregate a large number of overload status values or to keep track
   of the overload status of SIP servers it is not communicating with.

5.2.  End-to-End

   End-to-end overload control implements an overload control loop along
   the entire path of a SIP request, from UAC to UAS.  An end-to-end
   overload control mechanism consolidates overload information from all
   SIP servers on the way (including all proxies and the UAS) and uses
   this information to throttle traffic as far upstream as possible.  An
   end-to-end overload control mechanism has to be able to frequently
   collect the overload status of all servers on the potential path(s)
   to a destination and combine this data into meaningful overload
   feedback.

   A UA or SIP server should only throttle requests if it knows that
   these requests will eventually be forwarded to an overloaded server.
   For example, if D is overloaded in Figure 2(b), A should only
   throttle requests it forwards to B when it knows that they will be
   forwarded to D.  It should not throttle requests that will eventually
   be forwarded to C, since server C is not overloaded.
   In many cases, it
   is difficult for A to determine which requests will be routed to C
   and which to D, since this depends on the local routing decisions
   made by B.  These routing decisions can be highly variable and, for
   example, depend on call routing policies configured by the user,
   services invoked on a call, load balancing policies, etc.  The fact
   that a previous message to a target has been routed through an
   overloaded server does not necessarily mean the next message to this
   target will also be routed through the same server.

   The main problem with end-to-end overload control is its inherent
   complexity, since a UAC or SIP server needs to monitor all potential
   paths to a destination in order to determine which requests should be
   throttled and which requests may be sent.  Even if this information
   is available, it is not clear which path a specific request will
   take.

   A variant of end-to-end overload control is to implement a control
   loop between a set of well-known SIP servers along the path of a SIP
   request.  For example, an overload control loop can be instantiated
   between a server that only has one downstream neighbor or a set of
   closely coupled SIP servers.  A control loop spanning multiple hops
   can be used if the sending entity has full knowledge about the SIP
   servers on the path of a SIP message.

   Overload control for SIP servers differs from the end-to-end
   congestion control used by transport protocols such as TCP.  The
   traffic exchanged between SIP servers consists of many individual SIP
   messages.  Each SIP message is created by a SIP UA to achieve a
   specific goal (e.g., to start setting up a call).  All messages have
   their own source and destination addresses.  Even SIP messages
   containing identical SIP URIs (e.g., a SUBSCRIBE and an INVITE
   message to the same SIP URI) can be routed to different destinations.
   This
   is different from TCP, where the traffic exchanged between routers
   consists of packets belonging to a usually longer flow of messages
   exchanged between a source and a destination (e.g., to transmit a
   file).  If congestion occurs, the sources can detect this condition
   and adjust the rate at which subsequent packets are transmitted.

5.3.  Local Overload Control

   The idea of local overload control (see Figure 2(c)) is to run the
   Monitor and Actuator on the same server.  This enables the server to
   monitor the current resource usage and to reject messages that can't
   be processed without overusing the local resources.  The fundamental
   assumption behind local overload control is that it is less resource
   consuming for a server to reject messages than to process them.  A
   server can therefore reject the excess messages it cannot process to
   stop all retransmissions of these messages.  Since rejecting messages
   does consume resources on a SIP server, local overload control alone
   cannot prevent a congestion collapse.

   Local overload control can be used in conjunction with other overload
   control mechanisms and provides an additional layer of protection
   against overload.  It is fully implemented within a SIP server and
   does not require cooperation between servers.  In general, SIP
   servers should apply other overload control techniques to control
   load before a local overload control mechanism is activated as a
   mechanism of last resort.

6.  Topologies

   The following topologies describe four generic SIP server
   configurations.  These topologies illustrate specific challenges for
   an overload control mechanism.  An actual SIP server topology is
   likely to consist of combinations of these generic scenarios.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E, and F) receives traffic from a single source A.
   A
   load balancer is a typical example of such a configuration.  In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its downstream
   neighbors D, E, and F.  If one of the downstream neighbors becomes
   overloaded, A can direct traffic to the servers that still have
   capacity.  If one of the servers serves as a backup, it can be
   activated once one of the primary servers reaches overload.

   If A can reliably determine that D, E, and F are its only downstream
   neighbors and all of them are in overload, it may choose to report
   overload upstream on behalf of D, E, and F.  However, if the set of
   downstream neighbors is not fixed or only some of them are in
   overload, then A should not activate overload control, since A can
   still forward the requests destined to non-overloaded downstream
   neighbors.  These requests would be throttled as well if A were to
   use overload control towards its upstream neighbors.

   In some cases, the servers D, E, and F are in a server farm and are
   configured to appear as a single server to their upstream neighbors.
   In this case, server A can report overload on behalf of the server
   farm.  If the load balancer is not a SIP entity, servers D, E, and F
   can report the overall load of the server farm (i.e., the load of the
   virtual server) in their messages.  As an alternative, one of the
   servers (e.g., server E) can report overload on behalf of the server
   farm.  In this case, not all messages contain overload control
   information, and it needs to be ensured that all upstream neighbors
   are periodically served by server E to receive updated information.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B, and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.
   The set of active upstream neighbors of D
   can change, as servers may become inactive and previously inactive
   servers may start contributing traffic to D.

   If D becomes overloaded, it needs to generate feedback to reduce the
   amount of traffic it receives from its upstream neighbors.  D needs
   to decide by how much each upstream neighbor should reduce traffic.
   This decision can require the consideration of the amount of traffic
   sent by each upstream neighbor, and it may need to be re-adjusted as
   the traffic contributed by each upstream neighbor varies over time.
   Server D can use a local fairness policy to determine how much
   traffic it accepts from each upstream neighbor.

   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B, and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

            +---+       +---+
         /->| D |       | A |-\
        /   +---+       +---+  \
       /                        \   +---+
+---+-/     +---+       +---+    \->|   |
| A |------>| E |       | B |------>| D |
+---+-\     +---+       +---+    /->|   |
       \                        /   +---+
        \   +---+       +---+  /
         \->| F |       | C |-/
            +---+       +---+

     (a) load balancer    (b) multiple sources

+---+
| A |---\                 a--\
+---+-\  \---->+---+          \
       \/----->| D |      b--\ \--->+---+
+---+--/\  /-->+---+          \---->|   |
| B |    \/               c-------->| D |
+---+---\/\--->+---+                |   |
        /\---->| E |       ...  /-->+---+
+---+--/   /-->+---+           /
| C |-----/               z--/
+---+

     (c) mesh             (d) edge proxy

                  Figure 3: Topologies

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   sends a very small number of requests.  This scenario is shown in
   Figure 3(d).
   An edge proxy that is connected to many UAs is a
   typical example of such a configuration.  Since each UA typically
   sends requests only infrequently, and these requests are often
   related to the same session, it can't decrease its message rate
   enough to resolve the overload.

   A SIP server that receives traffic from many sources, each of which
   contributes only a small number of requests, can resort to local
   overload control by rejecting a percentage of the requests it
   receives with 503 (Service Unavailable) responses.  Since it has many
   upstream neighbors, it can send 503 (Service Unavailable) to a
   fraction of them to gradually reduce load without entirely stopping
   all incoming traffic.  The Retry-After header can be used in 503
   (Service Unavailable) responses to ask upstream neighbors to wait a
   given number of seconds before trying the request again.  Using 503
   (Service Unavailable) cannot, however, prevent overload if a large
   number of sources create requests (e.g., to place calls) at the same
   time.

      Note: The requirements of the "edge proxy" topology are different
      from those of the other topologies, which may call for a
      different method of overload control.

7.  Fairness

   There are many different ways to define fairness between multiple
   upstream neighbors of a SIP server.  In the context of SIP server
   overload, it is helpful to describe two categories of fairness: basic
   fairness and customized fairness.  With basic fairness, a SIP server
   treats all requests equally and ensures that each request has the
   same chance of succeeding.  With customized fairness, the server
   allocates resources according to different priorities.
An example 606 application of the basic fairness criterion is the "Third caller 607 receives free tickets" scenario, where each call attempt should have 608 an equal success probability in making calls through an overloaded 609 SIP server, irrespective of the service provider where it was 610 initiated. An example of customized fairness would be a server that 611 assigns different resource allocations to its upstream neighbors 612 (e.g., service providers) as defined in a service level agreement 613 (SLA). 615 8. Performance Metrics 617 The performance of an overload control mechanism can be measured 618 using different metrics. 620 A key performance indicator is the goodput of a SIP server under 621 overload. Ideally, a SIP server should be able to operate at its 622 capacity limit during periods of overload. For example, if a SIP server has 623 a processing capacity of 140 INVITE transactions per second, then an 624 overload control mechanism should enable it to process 140 INVITEs 625 per second even if the offered load is much higher. The delay 626 introduced by a SIP server is another important indicator. An 627 overload control mechanism should ensure that the delay encountered 628 by a SIP message is not increased significantly during periods of 629 overload. Significantly increased delay can lead to time-outs and 630 retransmissions of SIP messages, making the overload worse. 632 Responsiveness and stability are other important performance 633 indicators. An overload control mechanism should quickly react to an 634 overload occurrence and ensure that a SIP server does not become 635 overloaded even during sudden peaks of load. Similarly, an overload 636 control mechanism should quickly stop rejecting requests if the 637 overload disappears. Stability is another important criterion. An 638 overload control mechanism should not cause significant oscillations 639 of load on a SIP server.
The performance of SIP overload control 640 mechanisms is discussed in [Noel et al.], [Shen et al.], [Hilt et 641 al.] and [Abdelal et al.]. 643 In addition to the above metrics, there are other indicators that are 644 relevant for the evaluation of an overload control mechanism: 646 Fairness: Which types of fairness does the overload control 647 mechanism implement? 648 Self-limiting: Is the overload control self-limiting if a SIP server 649 becomes unresponsive? 650 Changes in neighbor set: How does the mechanism adapt to a changing 651 set of sending entities? 652 Data points to monitor: Which and how many data points does an 653 overload control mechanism need to monitor? 654 Computational load: What is the (CPU) load created by the overload 655 "monitor" and "actuator"? 657 9. Explicit Overload Control Feedback 659 Explicit overload control feedback enables a receiver to indicate how 660 much traffic it wants to receive. Explicit overload control 661 mechanisms can be differentiated based on the type of information 662 conveyed in the overload control feedback and whether the control 663 function is in the receiving or sending entity (receiver- vs. sender- 664 based overload control), or both. 666 9.1. Rate-based Overload Control 668 The key idea of rate-based overload control is to limit the request 669 rate at which an upstream element is allowed to forward traffic to 670 the downstream neighbor. If overload occurs, a SIP server instructs 671 each upstream neighbor to send at most X requests per second. Each 672 upstream neighbor can be assigned a different rate cap. 674 An example algorithm for an Actuator in the sending entity is request 675 gapping. After transmitting a request to a downstream neighbor, a 676 server waits for 1/X seconds before it transmits the next request to 677 the same neighbor. Requests that arrive during the waiting period 678 are not forwarded and are either redirected, rejected or buffered.
679 Request gapping only affects requests that are targeted by overload 680 control (e.g., requests that initiate a transaction and not 681 retransmissions in an ongoing transaction). 683 The rate cap ensures that the number of requests received by a SIP 684 server never increases beyond the sum of all rate caps granted to 685 upstream neighbors. Rate-based overload control protects a SIP 686 server against overload even during load spikes assuming there are no 687 new upstream neighbors that start sending traffic. New upstream 688 neighbors need to be considered in the rate caps assigned to all 689 upstream neighbors. The rate assigned to upstream neighbors needs to 690 be adjusted when new neighbors join. During periods when new 691 neighbors are joining, overload can occur in extreme cases until the 692 rate caps of all servers are adjusted to again match the overall rate 693 cap of the server. The overall rate cap of a SIP server is 694 determined by an overload control algorithm, e.g., based on system 695 load. 697 Rate-based overload control requires a SIP server to assign a rate 698 cap to each of its upstream neighbors while it is activated. 699 Effectively, a server needs to assign a share of its overall capacity 700 to each upstream neighbor. A server needs to ensure that the sum of 701 all rate caps assigned to upstream neighbors does not substantially 702 oversubscribe its actual processing capacity. This requires a SIP 703 server to keep track of the set of upstream neighbors and to adjust 704 the rate cap if a new upstream neighbor appears or an existing 705 neighbor stops transmitting. For example, if the capacity of the 706 server is X and this server is receiving traffic from two upstream 707 neighbors, it can assign a rate of X/2 to each of them. If a third 708 sender appears, the rate for each sender is lowered to X/3. If the 709 overall rate cap is too high, a server may experience overload. 
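The request gapping behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any SIP implementation; the class and method names are invented for the example, and the rate cap X is assumed to be the per-second cap granted by the downstream neighbor.

```python
import time

class RequestGapper:
    """Sketch of a request-gapping actuator for one downstream neighbor.

    After forwarding a request, further requests to the same neighbor
    are held back for 1/X seconds, where X is the rate cap (requests
    per second) assigned by that neighbor.
    """

    def __init__(self, rate_cap, clock=time.monotonic):
        self.gap = 1.0 / rate_cap   # minimum spacing between forwarded requests
        self.clock = clock
        self.next_allowed = 0.0     # earliest time the next request may be sent

    def admit(self, now=None):
        """Return True if a request may be forwarded now; otherwise the
        caller must redirect, reject, or buffer it."""
        now = self.clock() if now is None else now
        if now >= self.next_allowed:
            self.next_allowed = now + self.gap
            return True
        return False

# With a cap of 10 requests per second (gap of 0.1 s), two requests that
# arrive 50 ms apart cannot both be forwarded:
gapper = RequestGapper(rate_cap=10)
first = gapper.admit(now=0.0)    # forwarded
second = gapper.admit(now=0.05)  # inside the 0.1 s gap: throttled
third = gapper.admit(now=0.12)   # after the gap: forwarded
```

Note that, as the text says, only requests targeted by overload control (e.g., transaction-initiating requests) would pass through such a gapper; retransmissions within an ongoing transaction would bypass it.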
If 710 the cap is too low, the upstream neighbors will reject requests even 711 though they could be processed by the server. 713 An approach for estimating a rate cap for each upstream neighbor is 714 to use a fixed proportion of a control variable, X, where X is 715 initially equal to the capacity of the SIP server. The server then 716 increases or decreases X until the workload arrival rate matches the 717 actual server capacity. Usually, this will mean that the sum of the 718 rate caps sent out by the server (=X) exceeds its actual capacity, 719 but enables upstream neighbors that are not generating more than their 720 fair share of the work to be effectively unrestricted. In this 721 approach, the server only has to measure the aggregate arrival rate. 722 However, since the overall rate cap is usually higher than the actual 723 capacity, brief periods of overload may occur. 725 9.2. Loss-based Overload Control 727 A loss percentage enables a SIP server to ask an upstream neighbor to 728 reduce the number of requests it would normally forward to this 729 server by X%. For example, a SIP server can ask an upstream neighbor 730 to reduce the number of requests this neighbor would normally send by 731 10%. The upstream neighbor then redirects or rejects 10% of the 732 traffic that is destined for this server. 734 An algorithm for the sending entity to implement a loss percentage is 735 to draw a random number between 1 and 100 for each request to be 736 forwarded. The request is not forwarded to the server if the random 737 number is less than or equal to X. 739 An advantage of loss-based overload control is that the receiving 740 entity does not need to track the set of upstream neighbors or the 741 request rate it receives from each upstream neighbor. It is 742 sufficient to monitor the overall system utilization. To reduce 743 load, a server can ask its upstream neighbors to lower the traffic 744 forwarded by a certain percentage.
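The random-number algorithm for the sending entity can be sketched as follows. This is an illustrative sketch only; the function name and the seeded generator are choices made for the example, not taken from any specification.

```python
import random

def apply_loss_percentage(requests, loss_percentage, rng=None):
    """Sketch of loss-based throttling at the sending entity.

    For each request, draw a random number between 1 and 100; the
    request is throttled (redirected or rejected) if the number is
    less than or equal to the loss percentage, and forwarded otherwise.
    """
    rng = rng or random.Random()
    forwarded, throttled = [], []
    for req in requests:
        if rng.randint(1, 100) <= loss_percentage:
            throttled.append(req)   # e.g., reject with 503 or redirect
        else:
            forwarded.append(req)
    return forwarded, throttled

# A neighbor asked to reduce traffic by 10% throttles roughly
# 10% of its requests over a large sample:
fwd, dropped = apply_loss_percentage(range(10000), loss_percentage=10,
                                     rng=random.Random(42))
```

As the surrounding text notes, the receiving server needs no per-neighbor state for this to work; each sender applies the percentage independently to its own traffic.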
The server calculates this 745 percentage by combining the loss percentage that is currently in use 746 (i.e., the loss percentage the upstream neighbors are currently using 747 when forwarding traffic), the current system utilization and the 748 desired system utilization. For example, if the server load 749 approaches 90% and the current loss percentage is set to a 50% 750 traffic reduction, then the server can decide to increase the loss 751 percentage to 55% in order to get to a system utilization of 80%. 752 Similarly, the server can lower the loss percentage if permitted by 753 the system utilization. 755 Loss-based overload control requires that the throttle percentage be 756 adjusted to the current overall number of requests received by the 757 server. This is particularly important if the number of requests 758 received fluctuates quickly. For example, if a SIP server sets a 759 throttle value of 10% at time t1 and the number of requests increases 760 by 20% between time t1 and t2 (t1