2 SOC Working Group V. Hilt 3 Internet-Draft Bell Labs/Alcatel-Lucent 4 Intended status: Informational E. Noel 5 Expires: January 10, 2012 AT&T Labs 6 C. Shen 7 Columbia University 8 A. Abdelal 9 Sonus Networks 10 July 9, 2011 12 Design Considerations for Session Initiation Protocol (SIP) Overload 13 Control 14 draft-ietf-soc-overload-design-07 16 Abstract 18 Overload occurs in Session Initiation Protocol (SIP) networks when 19 SIP servers have insufficient resources to handle all SIP messages 20 they receive. 
Even though the SIP protocol provides a limited 21 overload control mechanism through its 503 (Service Unavailable) 22 response code, SIP servers are still vulnerable to overload. This 23 document discusses models and design considerations for a SIP 24 overload control mechanism. 26 Status of this Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on January 10, 2012. 43 Copyright Notice 45 Copyright (c) 2011 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 61 2. SIP Overload Problem . . . . . . . . . . . . . . . . . . . . . 4 62 3. Explicit vs. Implicit Overload Control . . . . . . . . . . . . 5 63 4. System Model . . . . . . . . . . . . . . . . . . . . . . . . . 6 64 5. 
Degree of Cooperation . . . . . . . . . . . . . . . . . . . . 7 65 5.1. Hop-by-Hop . . . . . . . . . . . . . . . . . . . . . . . . 9 66 5.2. End-to-End . . . . . . . . . . . . . . . . . . . . . . . . 10 67 5.3. Local Overload Control . . . . . . . . . . . . . . . . . . 11 68 6. Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . 12 69 7. Fairness . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 70 8. Performance Metrics . . . . . . . . . . . . . . . . . . . . . 14 71 9. Explicit Overload Control Feedback . . . . . . . . . . . . . . 15 72 9.1. Rate-based Overload Control . . . . . . . . . . . . . . . 16 73 9.2. Loss-based Overload Control . . . . . . . . . . . . . . . 17 74 9.3. Window-based Overload Control . . . . . . . . . . . . . . 18 75 9.4. Overload Signal-based Overload Control . . . . . . . . . . 19 76 9.5. On-/Off Overload Control . . . . . . . . . . . . . . . . . 20 77 10. Implicit Overload Control . . . . . . . . . . . . . . . . . . 20 78 11. Overload Control Algorithms . . . . . . . . . . . . . . . . . 20 79 12. Message Prioritization . . . . . . . . . . . . . . . . . . . . 21 80 13. Security Considerations . . . . . . . . . . . . . . . . . . . 21 81 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 82 15. Informative References . . . . . . . . . . . . . . . . . . . . 23 83 Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 24 84 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24 86 1. Introduction 88 As with any network element, a Session Initiation Protocol (SIP) 89 [RFC3261] server can suffer from overload when the number of SIP 90 messages it receives exceeds the number of messages it can process. 91 Overload occurs if a SIP server does not have sufficient resources to 92 process all incoming SIP messages. These resources may include CPU, 93 memory, input/output, or disk resources. 95 Overload can pose a serious problem for a network of SIP servers. 
96 During periods of overload, the throughput of SIP messages in a 97 network of SIP servers can be significantly degraded. In fact, 98 overload in a SIP server may lead to a situation in which the 99 overload is amplified by retransmissions of SIP messages, causing the 100 throughput to drop to a very small fraction of the original 101 processing capacity. This is often called congestion collapse. 103 An overload control mechanism enables a SIP server to process SIP 104 messages close to its capacity limit during times of overload. 105 Overload control is used by a SIP server if it is unable to process 106 all SIP requests due to resource constraints. There are other 107 failure cases in which a SIP server can successfully process incoming 108 requests but has to reject them for other reasons. For example, a 109 PSTN gateway that runs out of trunk lines but still has plenty of 110 capacity to process SIP messages should reject incoming INVITEs using 111 a response such as 488 (Not Acceptable Here), as described in 112 [RFC4412]. Similarly, a SIP registrar that has lost connectivity to 113 its registration database but is still capable of processing SIP 114 messages should reject REGISTER requests with a 500 (Server Internal Error) 115 response [RFC3261]. Overload control mechanisms do not apply in 116 these cases and SIP provides appropriate response codes for them. 118 There are cases in which a SIP server runs other services that do not 119 involve the processing of SIP messages (e.g., processing of RTP 120 packets, database queries, software updates and event handling). 121 These services may, or may not, be correlated with the SIP message 122 volume. These services can use up a substantial share of resources 123 available on the server (e.g., CPU cycles) and leave the server in a 124 condition where it is unable to process all incoming SIP requests. 
125 In these cases, the SIP server applies SIP overload control 126 mechanisms to avoid congestion collapse on the SIP signaling plane. 127 However, controlling the number of SIP requests may not significantly 128 reduce the load on the server if the resource shortage was created by 129 another service. In these cases, the 130 server is expected to use appropriate methods to control the resource usage of 131 other services. The specifics of controlling the resource usage of 132 other services and their coordination are out of scope for this 133 document. 135 The SIP protocol provides a limited mechanism for overload control 136 through its 503 (Service Unavailable) response code and the Retry- 137 After header. However, this mechanism cannot prevent overload of a 138 SIP server and it cannot prevent congestion collapse. In fact, it 139 may cause traffic to oscillate and to shift between SIP servers and 140 thereby worsen an overload condition. A detailed discussion of the 141 SIP overload problem, the problems with the 503 (Service Unavailable) 142 response code and the Retry-After header, and the requirements for a 143 SIP overload control mechanism can be found in [RFC5390]. In 144 addition, 503 is used for other situations, not just SIP server 145 overload. A SIP overload control process based on 503 would have to 146 specify exactly which cause values trigger overload control. 148 This document discusses the models, assumptions and design 149 considerations for a SIP overload control mechanism. The document 150 originated in the SIP overload control design team and has been 151 further developed by the SIP Overload Control (SOC) working group. 153 2. SIP Overload Problem 155 A key contributor to SIP congestion collapse [RFC5390] is the 156 regenerative behavior of overload in the SIP protocol. 
When SIP is 157 running over the UDP protocol, it will retransmit messages that were 158 dropped or excessively delayed by a SIP server due to overload and 159 thereby increase the offered load for the already overloaded server. 160 This increase in load worsens the severity of the overload condition 161 and, in turn, causes more messages to be dropped. A congestion 162 collapse can occur [Hilt et al.], [Noel et al.], [Shen et al.] and 163 [Abdelal et al.]. 165 Regenerative behavior under overload should ideally be avoided by any 166 protocol, as it leads to unstable operation under overload. 167 However, this is often difficult to achieve in practice. For 168 example, changing the SIP retransmission timer mechanisms can reduce 169 the degree of regeneration during overload but will impact the 170 ability of SIP to recover from message losses. Without any 171 retransmission, each message that is dropped due to SIP server 172 overload will eventually lead to a failed transaction. 174 For a SIP INVITE transaction to be successful, a minimum of three 175 messages need to be forwarded by a SIP server. Often an INVITE 176 transaction consists of five or more SIP messages. If a SIP server 177 under overload randomly discards messages without evaluating them, 178 the chances that all messages belonging to a transaction are 179 successfully forwarded will decrease as the load increases. Thus, 180 the number of transactions that complete successfully will decrease 181 even if the message throughput of a server remains up and assuming 182 the overload behavior is fully non-regenerative. A SIP server might 183 (partially) parse incoming messages to determine whether each is a new 184 request or a message belonging to an existing transaction. 185 Discarding a SIP message after spending the resources to parse it is 186 expensive. 
The number of successful transactions will therefore 187 decline with an increase in load as fewer resources can be spent on 188 forwarding messages and more resources are consumed by inspecting 189 messages that will eventually be dropped. The rate of the decline 190 depends on the amount of resources spent to inspect each message. 192 Another challenge for SIP overload control is controlling the rate of 193 the true traffic source. Overload is often caused by a large number 194 of user agents (UAs), each of which creates only a single message. 195 However, the sum of their traffic can overload a SIP server. The 196 overload mechanisms suitable for controlling a SIP server (e.g., rate 197 control) may not be effective for individual UAs. In some cases, 198 there are other non-SIP mechanisms for limiting the load from the 199 UAs. These may operate independently from, or in conjunction with, 200 the SIP overload mechanisms described here. In either case, they are 201 out of scope for this document. 203 3. Explicit vs. Implicit Overload Control 205 The main difference between explicit and implicit overload control 206 is the way overload is signaled from a SIP server that is reaching an 207 overload condition to its upstream neighbors. 209 In an explicit overload control mechanism, a SIP server uses an 210 explicit overload signal to indicate that it is reaching its capacity 211 limit. Upstream neighbors receiving this signal can adjust their 212 transmission rate according to the overload signal to a level that is 213 acceptable to the downstream server. The overload signal enables a 214 SIP server to steer the load it is receiving to a rate at which it 215 can perform at maximum capacity. 217 Implicit overload control uses the absence of responses and packet 218 loss as an indication of overload. A SIP server that is sensing such 219 a condition reduces the load it is forwarding to a downstream neighbor. 
220 Since there is no explicit overload signal, this mechanism is robust 221 as it does not depend on actions taken by the SIP server running into 222 overload. 224 The ideas of explicit and implicit overload control are in fact 225 complementary. By considering implicit overload indications a server 226 can avoid overloading an unresponsive downstream neighbor. An 227 explicit overload signal enables a SIP server to actively steer the 228 incoming load to a desired level. 230 4. System Model 232 The model shown in Figure 1 identifies fundamental components of an 233 explicit SIP overload control mechanism: 235 SIP Processor: The SIP Processor processes SIP messages and is the 236 component that is protected by overload control. 237 Monitor: The Monitor measures the current load of the SIP processor 238 on the receiving entity. It implements the mechanisms needed to 239 determine the current usage of resources relevant for the SIP 240 processor and reports load samples (S) to the Control Function. 241 Control Function: The Control Function implements the overload 242 control algorithm. The control function uses the load samples (S) 243 and determines if overload has occurred and a throttle (T) needs 244 to be set to adjust the load sent to the SIP processor on the 245 receiving entity. The control function on the receiving entity 246 sends load feedback (F) to the sending entity. 247 Actuator: The Actuator implements the algorithms needed to act on 248 the throttles (T) and ensures that the amount of traffic forwarded 249 to the receiving entity meets the criteria of the throttle. For 250 example, a throttle may instruct the Actuator to not forward more 251 than 100 INVITE messages per second. The Actuator implements the 252 algorithms to achieve this objective, e.g., using message gapping. 253 It also implements algorithms to select the messages that will be 254 affected and determine whether they are rejected or redirected. 
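The interaction of these components can be illustrated with a minimal sketch. The Python below is illustrative only: the class names, the loss-rate form of the feedback, and the 90% utilization target are assumptions made for this example, not part of any specified mechanism.

```python
class Monitor:
    """Reports load samples (S), e.g., utilization of the SIP processor."""
    def __init__(self):
        self.utilization = 0.0          # current load, 0.0 .. 1.0

    def sample(self):
        return self.utilization

class ControlFunction:
    """Turns load samples (S) into feedback (F); here F is a loss rate."""
    def __init__(self, target=0.9):
        self.target = target            # desired utilization of the SIP processor

    def feedback(self, sample):
        # Ask senders to drop the fraction of traffic exceeding the target.
        if sample <= self.target:
            return 0.0
        return (sample - self.target) / sample

class Actuator:
    """Applies the throttle (T); here by probabilistically dropping requests."""
    def __init__(self):
        self.loss_rate = 0.0

    def set_throttle(self, feedback):
        # With a loss-rate feedback, F can be copied directly into T.
        self.loss_rate = feedback

    def admit(self, rand):
        # rand is a uniform random number in [0, 1); True means "forward".
        return rand >= self.loss_rate
```

A real mechanism would also define how (F) is encoded and conveyed between the entities, which this sketch leaves out.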
256 The type of feedback (F) conveyed from the receiving to the sending 257 entity depends on the overload control method used (i.e., loss-based, 258 rate-based, window-based or signal-based overload control; see 259 Section 9), the overload control algorithm (see Section 11) as well 260 as other design parameters. The feedback (F) enables the sending 261 entity to adjust the amount of traffic forwarded to the receiving 262 entity to a level that is acceptable to the receiving entity without 263 causing overload. 265 Figure 1 depicts a general system model for overload control. In 266 this diagram, one instance of the control function is on the sending 267 entity (i.e., associated with the actuator) and one is on the 268 receiving entity (i.e., associated with the monitor). However, a 269 specific mechanism may not require both elements. In this case, one 270 of the two control function elements can be empty and simply pass along 271 feedback. E.g., if (F) is defined as a loss-rate (e.g., reduce 272 traffic by 10%), there is no need for a control function on the 273 sending entity as the content of (F) can be copied directly into (T). 275 The model in Figure 1 shows a scenario with one sending and one 276 receiving entity. In a more realistic scenario a receiving entity 277 will receive traffic from multiple sending entities and vice versa 278 (see Section 6). The feedback generated by a Monitor will therefore 279 often be distributed across multiple Actuators. A Monitor needs to 280 be able to split the load it can process across multiple sending 281 entities and generate feedback that correctly adjusts the load each 282 sending entity is allowed to send. Similarly, an Actuator needs to 283 be prepared to receive different levels of feedback from different 284 receiving entities and throttle traffic to these entities 285 accordingly. 
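One conceivable way for a receiving entity to split the load it can process across multiple sending entities is to cap each sender in proportion to its recently offered load. The sketch below is illustrative: the function name, the use of request rates as the unit, and the proportional policy are assumptions for this example; Section 7 discusses other fairness policies that could be substituted.

```python
def split_capacity(capacity, offered):
    """Divide an acceptable total rate (requests/s) among sending entities
    in proportion to the load each has recently offered.

    capacity: total rate the receiving entity can process
    offered:  dict mapping sender id -> recently offered rate
    Returns a dict mapping sender id -> allowed rate (the feedback F).
    """
    total = sum(offered.values())
    if total <= capacity:
        # No overload: every sender may keep its current rate.
        return dict(offered)
    return {sender: capacity * rate / total
            for sender, rate in offered.items()}
```

Because the offered load of each sender varies over time, the allocation would have to be recomputed periodically.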
287 In a realistic deployment, SIP messages will flow in both directions, 288 from server B to server A as well as server A to server B. The 289 overload control mechanisms in each direction can be considered 290 independently. For messages flowing from server A to server B, the 291 sending entity is server A and the receiving entity is server B and 292 vice versa. The control loops in both directions operate 293 independently. 295 Sending Receiving 296 Entity Entity 297 +----------------+ +----------------+ 298 | Server A | | Server B | 299 | +----------+ | | +----------+ | -+ 300 | | Control | | F | | Control | | | 301 | | Function |<-+------+--| Function | | | 302 | +----------+ | | +----------+ | | 303 | T | | | ^ | | Overload 304 | v | | | S | | Control 305 | +----------+ | | +----------+ | | 306 | | Actuator | | | | Monitor | | | 307 | +----------+ | | +----------+ | | 308 | | | | ^ | -+ 309 | v | | | | -+ 310 | +----------+ | | +----------+ | | 311 <-+--| SIP | | | | SIP | | | SIP 312 --+->|Processor |--+------+->|Processor |--+-> | System 313 | +----------+ | | +----------+ | | 314 +----------------+ +----------------+ -+ 316 Figure 1: System Model for Explicit Overload Control 318 5. Degree of Cooperation 320 A SIP request is usually processed by more than one SIP server on its 321 path to the destination. Thus, a design choice for an explicit 322 overload control mechanism is where to place the components of 323 overload control along the path of a request and, in particular, 324 where to place the Monitor and Actuator. This design choice 325 determines the degree of cooperation between the SIP servers on the 326 path. Overload control can be implemented hop-by-hop with the 327 Monitor on one server and the Actuator on its direct upstream 328 neighbor. Overload control can be implemented end-to-end with 329 Monitors on all SIP servers along the path of a request and an 330 Actuator on the sender. 
In this case, the Control Functions 331 associated with each Monitor have to cooperate to jointly determine 332 the overall feedback for this path. Finally, overload control can be 333 implemented locally on a SIP server if Monitor and Actuator reside on 334 the same server. In this case, the sending entity and receiving 335 entity are the same SIP server and Actuator and Monitor operate on 336 the same SIP processor (although, the Actuator typically operates on 337 a pre-processing stage in local overload control). Local overload 338 control is an internal overload control mechanism as the control loop 339 is implemented internally on one server. Hop-by-hop and end-to-end 340 are external overload control mechanisms. All three configurations 341 are shown in Figure 2. 343 +---------+ +------(+)---------+ 344 +------+ | | | ^ | 345 | | | +---+ | | +---+ 346 v | v //=>| C | v | //=>| C | 347 +---+ +---+ // +---+ +---+ +---+ // +---+ 348 | A |===>| B | | A |===>| B | 349 +---+ +---+ \\ +---+ +---+ +---+ \\ +---+ 350 ^ \\=>| D | ^ | \\=>| D | 351 | +---+ | | +---+ 352 | | | v | 353 +---------+ +------(+)---------+ 355 (a) hop-by-hop (b) end-to-end 357 +-+ 358 v | 359 +-+ +-+ +---+ 360 v | v | //=>| C | 361 +---+ +---+ // +---+ 362 | A |===>| B | 363 +---+ +---+ \\ +---+ 364 \\=>| D | 365 +---+ 366 ^ | 367 +-+ 369 (c) local 371 ==> SIP request flow 372 <-- Overload feedback loop 374 Figure 2: Degree of Cooperation between Servers 376 5.1. Hop-by-Hop 378 The idea of hop-by-hop overload control is to instantiate a separate 379 control loop between all neighboring SIP servers that directly 380 exchange traffic. I.e., the Actuator is located on the SIP server 381 that is the direct upstream neighbor of the SIP server that has the 382 corresponding Monitor. Each control loop between two servers is 383 completely independent of the control loop between other servers 384 further up- or downstream. 
In the example in Figure 2(a), three 385 independent overload control loops are instantiated: A - B, B - C and 386 B - D. Each loop only controls a single hop. Overload feedback 387 received from a downstream neighbor is not forwarded further 388 upstream. Instead, a SIP server acts on this feedback, for example, 389 by rejecting SIP messages if needed. If the upstream neighbor of a 390 server also becomes overloaded, it will report this problem to its 391 upstream neighbors, which again take action based on the reported 392 feedback. Thus, in hop-by-hop overload control, overload is always 393 resolved by the direct upstream neighbors of the overloaded server 394 without the need to involve entities that are located multiple SIP 395 hops away. 397 Hop-by-hop overload control reduces the impact of overload on a SIP 398 network and can avoid congestion collapse. It is simple and scales 399 well to networks with many SIP entities. An advantage is that it 400 does not require feedback to be transmitted across multiple hops, 401 possibly crossing multiple trust domains. Feedback is sent to the 402 next hop only. Furthermore, it does not require a SIP entity to 403 aggregate a large number of overload status values or keep track of 404 the overload status of SIP servers it is not communicating with. 406 5.2. End-to-End 408 End-to-end overload control implements an overload control loop along 409 the entire path of a SIP request, from user agent client (UAC) to 410 user agent server (UAS). An end-to-end overload control mechanism 411 consolidates overload information from all SIP servers on the way 412 (including all proxies and the UAS) and uses this information to 413 throttle traffic as far upstream as possible. An end-to-end overload 414 control mechanism has to be able to frequently collect the overload 415 status of all servers on the potential path(s) to a destination and 416 combine this data into meaningful overload feedback. 
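If the per-hop overload status is expressed as a loss rate, one conceivable way to combine the collected data is multiplicative: a request reaches the destination only if every server on the path admits it, so the end-to-end acceptance fraction is the product of the per-hop acceptance fractions. The sketch below illustrates this idea only; it is not a specified mechanism, and collecting the per-hop values is left open.

```python
def path_loss_rate(hop_loss_rates):
    """Combine per-hop loss rates into one end-to-end loss rate.

    hop_loss_rates: iterable of per-hop loss rates in [0.0, 1.0].
    A request survives the path only if every hop admits it, so the
    end-to-end acceptance fraction is the product of per-hop ones.
    """
    accept = 1.0
    for loss in hop_loss_rates:
        accept *= (1.0 - loss)
    return 1.0 - accept
```

Note that this combination is only meaningful for a known path; as discussed above, the path a specific request will take is often not predictable.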
418 A UA or SIP server only throttles requests if it knows that these 419 requests will eventually be forwarded to an overloaded server. For 420 example, if D is overloaded in Figure 2(b), A should only throttle 421 requests it forwards to B when it knows that they will be forwarded 422 to D. It should not throttle requests that will eventually be 423 forwarded to C, since server C is not overloaded. In many cases, it 424 is difficult for A to determine which requests will be routed to C 425 and D since this depends on the local routing decision made by B. 426 These routing decisions can be highly variable and, for example, 427 depend on call routing policies configured by the user, services 428 invoked on a call, load balancing policies, etc. The fact that a 429 previous message to a target has been routed through an overloaded 430 server does not necessarily mean the next message to this target will 431 also be routed through the same server. 433 The main problem of end-to-end overload control is its inherent 434 complexity, since UACs or SIP servers need to monitor all potential 435 paths to a destination in order to determine which requests should be 436 throttled and which requests may be sent. Even if this information 437 is available, it is not clear which path a specific request will 438 take. 440 A variant of end-to-end overload control is to implement a control 441 loop between a set of well-known SIP servers along the path of a SIP 442 request. For example, an overload control loop can be instantiated 443 between a server that only has one downstream neighbor or a set of 444 closely coupled SIP servers. A control loop spanning multiple hops 445 can be used if the sending entity has full knowledge about the SIP 446 servers on the path of a SIP message. 448 Overload control for SIP servers is different from end-to-end 449 congestion control used by transport protocols such as TCP. 
The 450 traffic exchanged between SIP servers consists of many individual SIP 451 messages. Each SIP message is created by a SIP UA to achieve a 452 specific goal (e.g., to start setting up a call). All messages have 453 their own source and destination addresses. Even SIP messages 454 containing identical SIP URIs (e.g., a SUBSCRIBE and an INVITE message 455 to the same SIP URI) can be routed to different destinations. This 456 is different from TCP where the traffic exchanged between routers 457 consists of packets belonging to a usually longer flow of messages 458 exchanged between a source and a destination (e.g., to transmit a 459 file). If congestion occurs, the sources can detect this condition 460 and adjust the rate at which subsequent packets are transmitted. 462 5.3. Local Overload Control 464 The idea of local overload control (see Figure 2(c)) is to run the 465 Monitor and Actuator on the same server. This enables the server to 466 monitor the current resource usage and to reject messages that can't 467 be processed without overusing the local resources. The fundamental 468 assumption behind local overload control is that it is less resource 469 consuming for a server to reject messages than to process them. A 470 server can therefore reject the excess messages it cannot process to 471 stop all retransmissions of these messages. Since rejecting messages 472 does consume resources on a SIP server, local overload control alone 473 cannot prevent a congestion collapse. 475 Local overload control can be used in conjunction with other 476 overload control mechanisms and provides an additional layer of 477 protection against overload. It is fully implemented within a SIP 478 server and does not require cooperation between servers. In general, 479 SIP servers should apply other overload control techniques to control 480 load before a local overload control mechanism is activated as a 481 mechanism of last resort. 483 6. 
Topologies 485 The following topologies describe four generic SIP server 486 configurations. These topologies illustrate specific challenges for 487 an overload control mechanism. An actual SIP server topology is 488 likely to consist of combinations of these generic scenarios. 490 In the "load balancer" configuration shown in Figure 3(a) a set of 491 SIP servers (D, E and F) receives traffic from a single source A. A 492 load balancer is a typical example of such a configuration. In this 493 configuration, overload control needs to prevent server A (i.e., the 494 load balancer) from sending too much traffic to any of its downstream 495 neighbors D, E and F. If one of the downstream neighbors becomes 496 overloaded, A can direct traffic to the servers that still have 497 capacity. If one of the servers serves as a backup, it can be 498 activated once one of the primary servers reaches overload. 500 If A can reliably determine that D, E and F are its only downstream 501 neighbors and all of them are in overload, it may choose to report 502 overload upstream on behalf of D, E and F. However, if the set of 503 downstream neighbors is not fixed or only some of them are in 504 overload then A should not activate overload control since A can 505 still forward the requests destined to non-overloaded downstream 506 neighbors. These requests would be throttled as well if A used 507 overload control towards its upstream neighbors. 509 In some cases, the servers D, E, and F are in a server farm and 510 configured to appear as a single server to their upstream neighbors. 511 In this case, server A can report overload on behalf of the server 512 farm. If the load balancer is not a SIP entity, servers D, E, and F 513 can report the overall load of the server farm (i.e., the load of the 514 virtual server) in their messages. As an alternative, one of the 515 servers (e.g., server E) can report overload on behalf of the server 516 farm. 
In this case, not all messages contain overload control 517 information and it needs to be ensured that all upstream neighbors 518 are periodically served by server E to receive updated information. 520 In the "multiple sources" configuration shown in Figure 3(b), a SIP 521 server D receives traffic from multiple upstream sources A, B and C. 522 Each of these sources can contribute a different amount of traffic, 523 which can vary over time. The set of active upstream neighbors of D 524 can change as servers may become inactive and previously inactive 525 servers may start contributing traffic to D. 527 If D becomes overloaded, it needs to generate feedback to reduce the 528 amount of traffic it receives from its upstream neighbors. D needs 529 to decide by how much each upstream neighbor should reduce traffic. 530 This decision can require the consideration of the amount of traffic 531 sent by each upstream neighbor and it may need to be re-adjusted as 532 the traffic contributed by each upstream neighbor varies over time. 533 Server D can use a local fairness policy to determine how much 534 traffic it accepts from each upstream neighbor. 536 In many configurations, SIP servers form a "mesh" as shown in 537 Figure 3(c). Here, multiple upstream servers A, B and C forward 538 traffic to multiple alternative servers D and E. This configuration 539 is a combination of the "load balancer" and "multiple sources" 540 scenario. 542 +---+ +---+ 543 /->| D | | A |-\ 544 / +---+ +---+ \ 545 / \ +---+ 546 +---+-/ +---+ +---+ \->| | 547 | A |------>| E | | B |------>| D | 548 +---+-\ +---+ +---+ /->| | 549 \ / +---+ 550 \ +---+ +---+ / 551 \->| F | | C |-/ 552 +---+ +---+ 554 (a) load balancer (b) multiple sources 556 +---+ 557 | A |---\ a--\ 558 +---+-\ \---->+---+ \ 559 \/----->| D | b--\ \--->+---+ 560 +---+--/\ /-->+---+ \---->| | 561 | B | \/ c-------->| D | 562 +---+---\/\--->+---+ | | 563 /\---->| E | ... 
/--->+---+ 564 +---+--/ /-->+---+ / 565 | C |-----/ z--/ 566 +---+ 568 (c) mesh (d) edge proxy 570 Figure 3: Topologies 572 Overload control that is based on reducing the number of messages a 573 sender is allowed to send is not suited for servers that receive 574 requests from a very large population of senders, each of which only 575 sends a very small number of requests. This scenario is shown in 576 Figure 3(d). An edge proxy that is connected to many UAs is a 577 typical example of such a configuration. Since each UA typically 578 only infrequently sends requests, which are often related to the same 579 session, it can't decrease its message rate to resolve the overload. 581 A SIP server that receives traffic from many sources, which each 582 contribute only a small number of requests, can resort to local 583 overload control by rejecting a percentage of the requests it 584 receives with 503 (Service Unavailable) responses. Since it has many 585 upstream neighbors, it can send 503 (Service Unavailable) to a 586 fraction of them to gradually reduce load without entirely stopping 587 all incoming traffic. The Retry-After header can be used in 503 588 (Service Unavailable) responses to ask upstream neighbors to wait a 589 given number of seconds before trying the request again. Using 503 590 (Service Unavailable) cannot, however, prevent overload if a large 591 number of sources create requests (e.g., to place calls) at the same 592 time. 594 Note: The requirements of the "edge proxy" topology are different 595 from those of the other topologies, which may require a 596 different method for overload control. 598 7. Fairness 600 There are many different ways to define fairness between multiple 601 upstream neighbors of a SIP server. In the context of SIP server 602 overload, it is helpful to describe two categories of fairness: basic 603 fairness and customized fairness. 
With basic fairness, a SIP server 604 treats all requests equally and ensures that each request has the 605 same chance of succeeding. With customized fairness, the server 606 allocates resources according to different priorities. An example 607 application of the basic fairness criterion is the "Third caller 608 receives free tickets" scenario, where each call attempt should have 609 an equal success probability in making calls through an overloaded 610 SIP server, irrespective of the service provider where it was 611 initiated. An example of customized fairness would be a server that 612 assigns different resource allocations to its upstream neighbors 613 (e.g., service providers) as defined in a service level agreement 614 (SLA). 616 8. Performance Metrics 618 The performance of an overload control mechanism can be measured 619 using different metrics. 621 A key performance indicator is the goodput of a SIP server under 622 overload. Ideally, a SIP server should be able to perform at its 623 capacity limit during periods of overload. For example, if a SIP server has 624 a processing capacity of 140 INVITE transactions per second, then an 625 overload control mechanism should enable it to process 140 INVITEs 626 per second even if the offered load is much higher. The delay 627 introduced by a SIP server is another important indicator. An 628 overload control mechanism should ensure that the delay encountered 629 by a SIP message is not increased significantly during periods of 630 overload. Significantly increased delay can lead to time-outs and 631 retransmissions of SIP messages, making the overload worse. 633 Responsiveness and stability are other important performance 634 indicators. An overload control mechanism should quickly react to an 635 overload occurrence and ensure that a SIP server does not become 636 overloaded even during sudden peaks of load. 
Similarly, an overload 637 control mechanism should quickly stop rejecting requests if the 638 overload disappears. Stability is another important criterion. An 639 overload control mechanism should not cause significant oscillations 640 of load on a SIP server. The performance of SIP overload control 641 mechanisms is discussed in [Noel et al.], [Shen et al.], [Hilt et 642 al.] and [Abdelal et al.]. 644 In addition to the above metrics, there are other indicators that are 645 relevant for the evaluation of an overload control mechanism: 647 Fairness: Which types of fairness does the overload control 648 mechanism implement? 649 Self-limiting: Is the overload control self-limiting if a SIP server 650 becomes unresponsive? 651 Changes in neighbor set: How does the mechanism adapt to a changing 652 set of sending entities? 653 Data points to monitor: Which and how many data points does an 654 overload control mechanism need to monitor? 655 Computational load: What is the (CPU) load created by the overload 656 "monitor" and "actuator"? 658 9. Explicit Overload Control Feedback 660 Explicit overload control feedback enables a receiver to indicate how 661 much traffic it wants to receive. Explicit overload control 662 mechanisms can be differentiated based on the type of information 663 conveyed in the overload control feedback and whether the control 664 function is in the receiving or sending entity (receiver- vs. sender- 665 based overload control), or both. 667 9.1. Rate-based Overload Control 669 The key idea of rate-based overload control is to limit the request 670 rate at which an upstream element is allowed to forward traffic to 671 the downstream neighbor. If overload occurs, a SIP server instructs 672 each upstream neighbor to send at most X requests per second. Each 673 upstream neighbor can be assigned a different rate cap. 675 An example algorithm for an Actuator in the sending entity is request 676 gapping. 
After transmitting a request to a downstream neighbor, a 677 server waits for 1/X seconds before it transmits the next request to 678 the same neighbor. Requests that arrive during the waiting period 679 are not forwarded and are either redirected, rejected or buffered. 680 Request gapping only affects requests that are targeted by overload 681 control (e.g., requests that initiate a transaction and not 682 retransmissions in an ongoing transaction). 684 The rate cap ensures that the number of requests received by a SIP 685 server never increases beyond the sum of all rate caps granted to 686 upstream neighbors. Rate-based overload control protects a SIP 687 server against overload even during load spikes, assuming there are no 688 new upstream neighbors that start sending traffic. New upstream 689 neighbors need to be considered in the rate caps assigned to all 690 upstream neighbors. The rate assigned to upstream neighbors needs to 691 be adjusted when new neighbors join. During periods when new 692 neighbors are joining, overload can occur in extreme cases until the 693 rate caps of all servers are adjusted to again match the overall rate 694 cap of the server. The overall rate cap of a SIP server is 695 determined by an overload control algorithm, e.g., based on system 696 load. 698 Rate-based overload control requires a SIP server to assign a rate 699 cap to each of its upstream neighbors while it is activated. 700 Effectively, a server needs to assign a share of its overall capacity 701 to each upstream neighbor. A server needs to ensure that the sum of 702 all rate caps assigned to upstream neighbors does not substantially 703 oversubscribe its actual processing capacity. This requires a SIP 704 server to keep track of the set of upstream neighbors and to adjust 705 the rate cap if a new upstream neighbor appears or an existing 706 neighbor stops transmitting. 
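The request-gapping actuator described above can be sketched in a few lines of code. The following is an illustrative sketch only, not part of this specification; the class and method names (RequestGapper, try_forward) are hypothetical, and a real sender would additionally redirect, reject, or buffer requests that fall inside the gap.

```python
import time

class RequestGapper:
    """Sketch of a request-gapping actuator for one downstream neighbor.

    After forwarding a request, the sender waits 1/X seconds before
    forwarding the next request to the same neighbor, so at most X
    requests per second are forwarded on average.
    """

    def __init__(self, rate_cap_x):
        self.gap = 1.0 / rate_cap_x   # minimum spacing between forwarded requests
        self.next_allowed = 0.0       # earliest time the next request may be sent

    def try_forward(self, now=None):
        """Return True if a request may be forwarded now, False if it
        arrives inside the gap (and must be redirected/rejected/buffered)."""
        now = time.monotonic() if now is None else now
        if now >= self.next_allowed:
            self.next_allowed = now + self.gap
            return True
        return False
```

For a rate cap of X = 10 requests per second the gap is 100 ms: a request at t = 0.0 s is forwarded, one arriving at t = 0.05 s is throttled, and one at t = 0.1 s is forwarded again.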
For example, if the capacity of the 707 server is X and this server is receiving traffic from two upstream 708 neighbors, it can assign a rate of X/2 to each of them. If a third 709 sender appears, the rate for each sender is lowered to X/3. If the 710 overall rate cap is too high, a server may experience overload. If 711 the cap is too low, the upstream neighbors will reject requests even 712 though they could be processed by the server. 714 An approach for estimating a rate cap for each upstream neighbor is 715 to use a fixed proportion of a control variable, X, where X is 716 initially equal to the capacity of the SIP server. The server then 717 increases or decreases X until the workload arrival rate matches the 718 actual server capacity. Usually, this will mean that the sum of the 719 rate caps sent out by the server (=X) exceeds its actual capacity, 720 but enables upstream neighbors that are not generating more than their 721 fair share of the work to be effectively unrestricted. In this 722 approach, the server only has to measure the aggregate arrival rate. 723 However, since the overall rate cap is usually higher than the actual 724 capacity, brief periods of overload may occur. 726 9.2. Loss-based Overload Control 728 A loss percentage enables a SIP server to ask an upstream neighbor to 729 reduce the number of requests it would normally forward to this 730 server by X%. For example, a SIP server can ask an upstream neighbor 731 to reduce the number of requests this neighbor would normally send by 732 10%. The upstream neighbor then redirects or rejects 10% of the 733 traffic that is destined for this server. 735 An algorithm for the sending entity to implement a loss percentage is 736 to draw a random number between 1 and 100 for each request to be 737 forwarded. The request is not forwarded to the server if the random 738 number is less than or equal to X. 
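The random-drop algorithm just described can be sketched as follows. This is an illustrative sketch only, not part of this specification; the function name (should_forward) is hypothetical.

```python
import random

def should_forward(loss_percent_x, rng=random):
    """Loss-based throttle sketch for the sending entity.

    For each request, draw a random number between 1 and 100.  The
    request is NOT forwarded (it is redirected or rejected instead)
    if the number is less than or equal to X, so on average X% of
    requests destined for the overloaded server are dropped.
    """
    return rng.randint(1, 100) > loss_percent_x
```

With a loss percentage of X = 10, roughly 90% of requests are forwarded; the boundary cases behave as expected, since X = 0 forwards every request and X = 100 forwards none.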
740 An advantage of loss-based overload control is that the receiving 741 entity does not need to track the set of upstream neighbors or the 742 request rate it receives from each upstream neighbor. It is 743 sufficient to monitor the overall system utilization. To reduce 744 load, a server can ask its upstream neighbors to lower the traffic 745 forwarded by a certain percentage. The server calculates this 746 percentage by combining the loss percentage that is currently in use 747 (i.e., the loss percentage the upstream neighbors are currently using 748 when forwarding traffic), the current system utilization and the 749 desired system utilization. For example, if the server load 750 approaches 90% and the current loss percentage is set to a 50% 751 traffic reduction, then the server can decide to increase the loss 752 percentage to 55% in order to get to a system utilization of 80%. 753 Similarly, the server can lower the loss percentage if permitted by 754 the system utilization. 756 Loss-based overload control requires that the throttle percentage be 757 adjusted to the current overall number of requests received by the 758 server. This is particularly important if the number of requests 759 received fluctuates quickly. For example, if a SIP server sets a 760 throttle value of 10% at time t1 and the number of requests increases 761 by 20% between time t1 and t2 (t1