SOC Working Group                                                V. Hilt
Internet-Draft                                  Bell Labs/Alcatel-Lucent
Intended status: Informational                                   E. Noel
Expires: May 23, 2011                                          AT&T Labs
                                                                 C. Shen
                                                     Columbia University
                                                              A. Abdelal
                                                          Sonus Networks
                                                       November 19, 2010

     Design Considerations for Session Initiation Protocol (SIP)
                           Overload Control
                  draft-ietf-soc-overload-design-02

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.
   This document discusses models and design considerations for a SIP
   overload control mechanism.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 23, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  SIP Overload Problem
   3.  Explicit vs. Implicit Overload Control
   4.  System Model
   5.  Degree of Cooperation
     5.1.  Hop-by-Hop
     5.2.  End-to-End
     5.3.  Local Overload Control
   6.  Topologies
   7.  Fairness
   8.  Performance Metrics
   9.  Explicit Overload Control Feedback
     9.1.  Rate-based Overload Control
     9.2.  Loss-based Overload Control
     9.3.  Window-based Overload Control
     9.4.  Overload Signal-based Overload Control
     9.5.  On-/Off Overload Control
   10. Implicit Overload Control
   11. Overload Control Algorithms
   12. Message Prioritization
   13. Security Considerations
   14. IANA Considerations
   15. Informative References
   Appendix A.  Contributors
   Authors' Addresses

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload occurs if a SIP server does not have sufficient resources to
   process all incoming SIP messages.  These resources may include CPU,
   memory, input/output, or disk resources.

   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.
   In fact, overload may lead to a situation in which the throughput
   drops down to a small fraction of the original processing capacity.
   This is often called congestion collapse.

   An overload control mechanism enables a SIP server to perform close
   to its capacity limit during times of overload.  Overload control is
   used by a SIP server if it is unable to process all SIP requests due
   to resource constraints.  There are other failure cases in which a
   SIP server can successfully process incoming requests but has to
   reject them for other reasons.  For example, a PSTN gateway that runs
   out of trunk lines but still has plenty of capacity to process SIP
   messages should reject incoming INVITEs using a response such as 488
   (Not Acceptable Here), as described in [RFC4412].  Similarly, a SIP
   registrar that has lost connectivity to its registration database but
   is still capable of processing SIP messages should reject REGISTER
   requests with a 500 (Server Error) response [RFC3261].  Overload
   control mechanisms do not apply in these cases and SIP provides
   appropriate response codes for them.

   There are cases in which a SIP server runs other services that do not
   involve the processing of SIP messages (e.g., processing of RTP
   packets, database queries, software updates and event handling).
   These services may, or may not, be correlated with the SIP message
   volume.  These services can use up a substantial share of resources
   available on the server (e.g., CPU cycles) and leave the server in a
   condition where it is unable to process all incoming SIP requests.
   In these cases, the SIP server applies SIP overload control
   mechanisms to avoid congestion collapse on the SIP signaling plane.
   However, controlling the number of SIP requests may not significantly
   reduce the load on the server if the resource shortage was created by
   another service.
   In these cases, it is to be expected that the server uses appropriate
   methods of controlling the resource usage of other services.  The
   specifics of controlling the resource usage of other services and
   their coordination are out of scope for this document.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code and the Retry-
   After header.  However, this mechanism cannot prevent overload of a
   SIP server and it cannot prevent congestion collapse.  In fact, it
   may cause traffic to oscillate and to shift between SIP servers and
   thereby worsen an overload condition.  A detailed discussion of the
   SIP overload problem, the problems with the 503 (Service Unavailable)
   response code and the Retry-After header, and the requirements for a
   SIP overload control mechanism can be found in [RFC5390].  In
   addition, 503 is used for other situations (with or without Retry-
   After), not just SIP server overload.  A SIP overload control
   mechanism based on 503 would have to specify exactly which cause
   values trigger the overload control.

   This document discusses the models, assumptions and design
   considerations for a SIP overload control mechanism.  The document is
   a product of the SIP overload control design team.

2.  SIP Overload Problem

   A key contributor to the SIP congestion collapse [RFC5390] is the
   regenerative behavior of overload in the SIP protocol.  When SIP is
   running over the UDP protocol, it will retransmit messages that were
   dropped or excessively delayed by a SIP server due to overload and
   thereby increase the offered load for the already overloaded server.
   This increase in load worsens the severity of the overload condition
   and, in turn, causes more messages to be dropped.  A congestion
   collapse can occur [Hilt et al.], [Noel et al.], [Shen et al.] and
   [Abdelal et al.].
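   The regenerative effect can be illustrated with a toy fixed-point
   model.  All numbers, the retransmission limit, and the iteration
   count below are illustrative assumptions, not values taken from this
   document:

```python
# Toy model of regenerative overload (illustrative assumptions only):
# messages dropped by an overloaded server return as retransmissions
# and inflate the offered load of the next interval.

def offered_load(new_calls, capacity, max_retransmits=6, rounds=20):
    """Iterate: offered load = new calls + retransmissions of drops."""
    offered = float(new_calls)
    for _ in range(rounds):
        dropped = max(0.0, offered - capacity)   # messages not served
        # Drops regenerate as retransmissions, bounded by the
        # per-message retransmission limit.
        offered = new_calls + min(dropped, new_calls * max_retransmits)
    return offered
```

   Below capacity the offered load is stable, while even a modest
   overload keeps amplifying itself round after round, which is the
   regenerative behavior described above.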

   Regenerative behavior under overload should ideally be avoided by any
   protocol as this would lead to stable operation under overload.
   However, this is often difficult to achieve in practice.  For
   example, changing the SIP retransmission timer mechanisms can reduce
   the degree of regeneration during overload but will impact the
   ability of SIP to recover from message losses.  Without any
   retransmission, each message that is dropped due to SIP server
   overload will eventually lead to a failed call.

   For a SIP INVITE transaction to be successful, a minimum of three
   messages need to be forwarded by a SIP server.  Often an INVITE
   transaction consists of five or more SIP messages.  If a SIP server
   under overload randomly discards messages without evaluating them,
   the chances that all messages belonging to a transaction are
   successfully forwarded will decrease as the load increases.  Thus,
   the number of transactions that complete successfully will decrease
   even if the message throughput of a server remains constant and the
   overload behavior is fully non-regenerative.  A SIP server might
   (partially) parse incoming messages to determine if they are new
   requests or messages belonging to an existing transaction.  However,
   after having spent resources on parsing a SIP message, discarding
   this message is expensive as the resources already spent are lost.
   The number of successful transactions will therefore decline with an
   increase in load as fewer and fewer resources can be spent on
   forwarding messages and more and more resources are consumed by
   inspecting messages that will eventually be dropped.  The slope of
   the decline depends on the amount of resources spent to inspect each
   message.

   Another challenge for SIP overload control is controlling the rate of
   the true traffic source.  Overload is often caused by a large number
   of UAs, each of which creates only a single message.
   However, the sum of their traffic can overload a SIP server.  The
   overload mechanisms suitable for controlling a SIP server (e.g., rate
   control) may not be effective for individual UAs.  In some cases,
   there are other non-SIP mechanisms for limiting the load from the
   UAs.  These may operate independently from, or in conjunction with,
   the SIP overload mechanisms described here.  In either case, they are
   out of scope for this document.

3.  Explicit vs. Implicit Overload Control

   The main difference between explicit and implicit overload control
   is the way overload is signaled from a SIP server that is reaching
   overload condition to its upstream neighbors.

   In an explicit overload control mechanism, a SIP server uses an
   explicit overload signal to indicate that it is reaching its capacity
   limit.  Upstream neighbors receiving this signal can adjust their
   transmission rate according to the overload signal to a level that is
   acceptable to the downstream server.  The overload signal enables a
   SIP server to steer the load it is receiving to a rate at which it
   can perform at maximum capacity.

   Implicit overload control uses the absence of responses and packet
   loss as an indication of overload.  A SIP server that is sensing such
   a condition reduces the load it is forwarding to a downstream
   neighbor.  Since there is no explicit overload signal, this mechanism
   is robust as it does not depend on actions taken by the SIP server
   running into overload.

   The ideas of explicit and implicit overload control are in fact
   complementary.  By considering implicit overload indications, a
   server can avoid overloading an unresponsive downstream neighbor.  An
   explicit overload signal enables a SIP server to actively steer the
   incoming load to a desired level.

4.  System Model

   The model shown in Figure 1 identifies fundamental components of an
   explicit SIP overload control mechanism:

   SIP Processor:  The SIP Processor processes SIP messages and is the
      component that is protected by overload control.
   Monitor:  The Monitor measures the current load of the SIP processor
      on the receiving entity.  It implements the mechanisms needed to
      determine the current usage of resources relevant for the SIP
      processor and reports load samples (S) to the Control Function.
   Control Function:  The Control Function implements the overload
      control algorithm.  The control function uses the load samples (S)
      and determines if overload has occurred and a throttle (T) needs
      to be set to adjust the load sent to the SIP processor on the
      receiving entity.  The control function on the receiving entity
      sends load feedback (F) to the sending entity.
   Actuator:  The Actuator implements the algorithms needed to act on
      the throttles (T) and ensures that the amount of traffic forwarded
      to the receiving entity meets the criteria of the throttle.  For
      example, a throttle may instruct the Actuator to not forward more
      than 100 INVITE messages per second.  The Actuator implements the
      algorithms to achieve this objective, e.g., using message gapping.
      It also implements algorithms to select the messages that will be
      affected and determine whether they are rejected or redirected.

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e., loss-based,
   rate-based, window-based or signal-based overload control; see
   Section 9), the overload control algorithm (see Section 11), as well
   as other design parameters.  The feedback (F) enables the sending
   entity to adjust the amount of traffic forwarded to the receiving
   entity to a level that is acceptable to the receiving entity without
   causing overload.
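
   As an illustration of one feedback type, a loss-based throttle can be
   sketched in a few lines.  The class and method names below are
   hypothetical and chosen only for this example; this document does not
   prescribe any particular structure:

```python
import random

# Sketch of a loss-based throttle (illustrative assumption): the
# feedback (F) is a loss rate, e.g. "reject 10% of requests", which
# the Actuator applies to each outgoing request.

class LossBasedActuator:
    def __init__(self, rng=None):
        self.loss_rate = 0.0            # throttle (T): fraction to reject
        self.rng = rng or random.Random()

    def on_feedback(self, loss_rate):
        """Copy loss-rate feedback (F) directly into the throttle (T)."""
        self.loss_rate = loss_rate

    def admit(self, request):
        """Return True if the request may be forwarded downstream."""
        return self.rng.random() >= self.loss_rate
```

   A request that is not admitted would then be rejected or redirected
   by the Actuator, as described above.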

   Figure 1 depicts a general system model for overload control.  In
   this diagram, one instance of the control function is on the sending
   entity (i.e., associated with the actuator) and one is on the
   receiving entity (i.e., associated with the monitor).  However, a
   specific mechanism may not require both elements.  In this case, one
   of the two control function elements can be empty and simply passes
   along feedback.  E.g., if (F) is defined as a loss-rate (e.g., reduce
   traffic by 10%), there is no need for a control function on the
   sending entity as the content of (F) can be copied directly into (T).

   The model in Figure 1 shows a scenario with one sending and one
   receiving entity.  In a more realistic scenario, a receiving entity
   will receive traffic from multiple sending entities and vice versa
   (see Section 6).  The feedback generated by a Monitor will therefore
   often be distributed across multiple Actuators.  A Monitor needs to
   be able to split the load it can process across multiple sending
   entities and generate feedback that correctly adjusts the load each
   sending entity is allowed to send.  Similarly, an Actuator needs to
   be prepared to receive different levels of feedback from different
   receiving entities and throttle traffic to these entities
   accordingly.

   In a realistic deployment, SIP messages will flow in both directions,
   from server B to server A as well as from server A to server B.  The
   overload control mechanisms in each direction can be considered
   independently.  For messages flowing from server A to server B, the
   sending entity is server A and the receiving entity is server B, and
   vice versa.  The control loops in both directions operate
   independently.
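
   The receiving-entity half of this model can be sketched as follows.
   This is a minimal illustration only: the assumption that load is
   sampled as a utilization value in [0, 1] and that feedback is
   expressed as a loss rate, as well as all class names, are
   hypothetical and not mandated by this document:

```python
# Hypothetical sketch of the Monitor and Control Function on the
# receiving entity (see the system model above).

class Monitor:
    def __init__(self, sip_processor):
        self.sip_processor = sip_processor

    def sample(self):
        """Report a load sample (S) for the SIP processor."""
        return self.sip_processor.utilization()

class ControlFunction:
    def __init__(self, target=0.9):
        self.target = target          # desired utilization of the processor

    def feedback(self, load_sample):
        """Turn a load sample (S) into loss-rate feedback (F)."""
        if load_sample <= self.target:
            return 0.0                # no throttling needed
        # Fraction of traffic that must be shed to return to the target.
        return min(1.0, (load_sample - self.target) / load_sample)
```

   The sending entity would then feed (F) into its Actuator as a
   throttle (T).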

             Sending                  Receiving
              Entity                   Entity
       +----------------+    +----------------+
       |    Server A    |    |    Server B    |
       |  +----------+  |    |  +----------+  |   -+
       |  | Control  |  |  F |  | Control  |  |    |
       |  | Function |<-+----+--| Function |  |    |
       |  +----------+  |    |  +----------+  |    |
       |     T |        |    |       ^        |    | Overload
       |       v        |    |       | S      |    | Control
       |  +----------+  |    |  +----------+  |    |
       |  | Actuator |  |    |  | Monitor  |  |    |
       |  +----------+  |    |  +----------+  |    |
       |       |        |    |       ^        |   -+
       |       v        |    |       |        |   -+
       |  +----------+  |    |  +----------+  |    |
     <-+--|   SIP    |  |    |  |   SIP    |  |    | SIP
     --+->|Processor |--+----+->|Processor |--+->  | System
       |  +----------+  |    |  +----------+  |    |
       +----------------+    +----------------+   -+

       Figure 1: System Model for Explicit Overload Control

5.  Degree of Cooperation

   A SIP request is usually processed by more than one SIP server on
   its path to the destination.  Thus, a design choice for an explicit
   overload control mechanism is where to place the components of
   overload control along the path of a request and, in particular,
   where to place the Monitor and Actuator.  This design choice
   determines the degree of cooperation between the SIP servers on the
   path.  Overload control can be implemented hop-by-hop with the
   Monitor on one server and the Actuator on its direct upstream
   neighbor.  Overload control can be implemented end-to-end with
   Monitors on all SIP servers along the path of a request and an
   Actuator on the sender.  In this case, the Control Functions
   associated with each Monitor have to cooperate to jointly determine
   the overall feedback for this path.  Finally, overload control can be
   implemented locally on a SIP server if Monitor and Actuator reside on
   the same server.
   In this case, the sending entity and receiving
   entity are the same SIP server, and Actuator and Monitor operate on
   the same SIP processor (although the Actuator typically operates on
   a pre-processing stage in local overload control).  Local overload
   control is an internal overload control mechanism, as the control
   loop is implemented internally on one server.  Hop-by-hop and end-to-
   end are external overload control mechanisms.  All three
   configurations are shown in Figure 2.

              +---------+        +------(+)---------+
     +------+ |         |        |       ^          |
     |      | | |     +---+      |       |      +---+
     v      | v //=>| C |        v       |  //=>| C |
   +---+    +---+   +---+      +---+   +---+    +---+
   | A |===>| B |              | A |===>| B |
   +---+    +---+   +---+      +---+   +---+    +---+
            ^  \\=>| D |         ^       |  \\=>| D |
            |      +---+         |       |      +---+
            |          |         |       v          |
            +---------+          +------(+)---------+

        (a) hop-by-hop              (b) end-to-end

                     +-+
                     v |
      +-+      +-+  +---+
      v |      v |//=>| C |
     +---+    +---+   +---+
     | A |===>| B |
     +---+    +---+   +---+
               \\=>| D |
                   +---+
                     ^ |
                     +-+

         (c) local

     ==>  SIP request flow
     <--  Overload feedback loop

          Figure 2: Degree of Cooperation between Servers

5.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all neighboring SIP servers that directly
   exchange traffic.  I.e., the Actuator is located on the SIP server
   that is the direct upstream neighbor of the SIP server that has the
   corresponding Monitor.  Each control loop between two servers is
   completely independent of the control loop between other servers
   further up- or downstream.  In the example in Figure 2(a), three
   independent overload control loops are instantiated: A - B, B - C and
   B - D.  Each loop only controls a single hop.  Overload feedback
   received from a downstream neighbor is not forwarded further
   upstream.  Instead, a SIP server acts on this feedback, for example,
   by rejecting SIP messages if needed.
   If the upstream neighbor of a
   server also becomes overloaded, it will report this problem to its
   upstream neighbors, which again take action based on the reported
   feedback.  Thus, in hop-by-hop overload control, overload is always
   resolved by the direct upstream neighbors of the overloaded server
   without the need to involve entities that are located multiple SIP
   hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and can avoid congestion collapse.  It is simple and scales
   well to networks with many SIP entities.  An advantage is that it
   does not require feedback to be transmitted across multiple hops,
   possibly crossing multiple trust domains.  Feedback is sent to the
   next hop only.  Furthermore, it does not require a SIP entity to
   aggregate a large number of overload status values or keep track of
   the overload status of SIP servers it is not communicating with.

5.2.  End-to-End

   End-to-end overload control implements an overload control loop along
   the entire path of a SIP request, from UAC to UAS.  An end-to-end
   overload control mechanism consolidates overload information from all
   SIP servers on the way (including all proxies and the UAS) and uses
   this information to throttle traffic as far upstream as possible.  An
   end-to-end overload control mechanism has to be able to frequently
   collect the overload status of all servers on the potential path(s)
   to a destination and combine this data into meaningful overload
   feedback.

   A UA or SIP server only throttles requests if it knows that these
   requests will eventually be forwarded to an overloaded server.  For
   example, if D is overloaded in Figure 2(b), A should only throttle
   requests it forwards to B when it knows that they will be forwarded
   to D.  It should not throttle requests that will eventually be
   forwarded to C, since server C is not overloaded.
   In many cases, it
   is difficult for A to determine which requests will be routed to C
   and D since this depends on the local routing decision made by B.
   These routing decisions can be highly variable and, for example,
   depend on call routing policies configured by the user, services
   invoked on a call, load balancing policies, etc.  The fact that a
   previous message to a target has been routed through an overloaded
   server does not necessarily mean the next message to this target will
   also be routed through the same server.

   The main problem of end-to-end overload control is its inherent
   complexity, since UACs or SIP servers need to monitor all potential
   paths to a destination in order to determine which requests should be
   throttled and which requests may be sent.  Even if this information
   is available, it is not clear which path a specific request will
   take.

   A variant of end-to-end overload control is to implement a control
   loop between a set of well-known SIP servers along the path of a SIP
   request.  For example, an overload control loop can be instantiated
   between a server that only has one downstream neighbor or a set of
   closely coupled SIP servers.  A control loop spanning multiple hops
   can be used if the sending entity has full knowledge about the SIP
   servers on the path of a SIP message.

   A key difference from transport protocols using end-to-end congestion
   control such as TCP is that the traffic exchanged between SIP servers
   consists of many individual SIP messages.  Each of these SIP messages
   has its own source and destination.  Even SIP messages containing
   identical SIP URIs (e.g., a SUBSCRIBE and an INVITE message to the
   same SIP URI) can be routed to different destinations.  This is
   different from TCP, which controls a stream of packets between a
   single source and a single destination.

5.3.  Local Overload Control

   The idea of local overload control (see Figure 2(c)) is to run the
   Monitor and Actuator on the same server.  This enables the server to
   monitor the current resource usage and to reject messages that can't
   be processed without overusing the local resources.  The fundamental
   assumption behind local overload control is that it is less resource
   consuming for a server to reject messages than to process them.  A
   server can therefore reject the excess messages it cannot process to
   stop all retransmissions of these messages.  Since rejecting messages
   does consume resources on a SIP server, local overload control alone
   cannot prevent a congestion collapse.

   Local overload control can be used in conjunction with other overload
   control mechanisms and provides an additional layer of protection
   against overload.  It is fully implemented within a SIP server and
   does not require cooperation between servers.  In general, SIP
   servers should apply other overload control techniques to control
   load before a local overload control mechanism is activated as a
   mechanism of last resort.

6.  Topologies

   The following topologies describe four generic SIP server
   configurations.  These topologies illustrate specific challenges for
   an overload control mechanism.  An actual SIP server topology is
   likely to consist of combinations of these generic scenarios.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E and F) receives traffic from a single source A.  A
   load balancer is a typical example of such a configuration.  In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its downstream
   neighbors D, E and F.  If one of the downstream neighbors becomes
   overloaded, A can direct traffic to the servers that still have
   capacity.
   If one of the servers serves as a backup, it can be
   activated once one of the primary servers reaches overload.

   If A can reliably determine that D, E and F are its only downstream
   neighbors and all of them are in overload, it may choose to report
   overload upstream on behalf of D, E and F.  However, if the set of
   downstream neighbors is not fixed or only some of them are in
   overload, then A should not activate overload control, since A can
   still forward the requests destined to non-overloaded downstream
   neighbors.  These requests would be throttled as well if A used
   overload control towards its upstream neighbors.

   In some cases, the servers D, E, and F are in a server farm and
   configured to appear as a single server to their upstream neighbors.
   In this case, server A can report overload on behalf of the server
   farm.  If the load balancer is not a SIP entity, servers D, E, and F
   can report the overall load of the server farm (i.e., the load of the
   virtual server) in their messages.  As an alternative, one of the
   servers (e.g., server E) can report overload on behalf of the server
   farm.  In this case, not all messages contain overload control
   information and it needs to be ensured that all upstream neighbors
   are periodically served by server E to receive updated information.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.  The set of active upstream neighbors of D
   can change as servers may become inactive and previously inactive
   servers may start contributing traffic to D.

   If D becomes overloaded, it needs to generate feedback to reduce the
   amount of traffic it receives from its upstream neighbors.  D needs
   to decide by how much each upstream neighbor should reduce traffic.
   This decision can require the consideration of the amount of traffic
   sent by each upstream neighbor, and it may need to be re-adjusted as
   the traffic contributed by each upstream neighbor varies over time.
   Server D can use a local fairness policy to determine how much
   traffic it accepts from each upstream neighbor.

   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

              +---+
          /-->| D |          +---+
         /    +---+          | A |-\
        /                    +---+  \
   +---+-/    +---+                  \    +---+
   | A |----->| E |          +---+    \-->|   |
   +---+-\    +---+          | B |------->| D |
        \                    +---+    /-->|   |
         \    +---+                  /    +---+
          \-->| F |          +---+  /
              +---+          | C |-/
                             +---+

     (a) load balancer       (b) multiple sources

   +---+
   | A |---\                 a--\
   +---+-\  \---->+---+          \
          \/----->| D |      b--\ \--->+---+
   +---+--/\  /-->+---+          \---->|   |
   | B |    \/               c-------->| D |
   +---+---\/\--->+---+                |   |
          /\----->| E |       ...  /-->+---+
   +---+-/  /--->+---+            /
   | C |----/                z---/
   +---+

        (c) mesh             (d) edge proxy

                     Figure 3: Topologies

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   infrequently sends a request.  This scenario is shown in Figure 3(d).
   An edge proxy that is connected to many UAs is a typical example of
   such a configuration.

   Since each UA typically only contributes a few requests, which are
   often related to the same call, it can't decrease its message rate to
   resolve the overload.  In such a configuration, a SIP server can
   resort to local overload control by rejecting a percentage of the
   requests it receives with 503 (Service Unavailable) responses.
   Since there are many upstream neighbors that contribute to the
   overall load, sending 503 (Service Unavailable) to a fraction of them
   can gradually reduce load without entirely stopping all incoming
   traffic.  The Retry-After header can be used in 503 (Service
   Unavailable) responses to ask UAs to wait a given number of seconds
   before trying the call again.  Using 503 (Service Unavailable)
   towards individual sources cannot, however, prevent overload if a
   large number of users places calls at the same time.

   Note: The requirements of the "edge proxy" topology are different
   from those of the other topologies, which may require a different
   method for overload control.

7.  Fairness

   There are many different ways to define fairness between multiple
   upstream neighbors of a SIP server.  In the context of SIP server
   overload, it is helpful to describe two categories of fairness: basic
   fairness and customized fairness.  With basic fairness, a SIP server
   treats all call attempts equally and ensures that each call attempt
   has the same chance of succeeding.  With customized fairness, the
   server allocates resources according to different priorities.  An
   example application of the basic fairness criterion is the "Third
   caller receives free tickets" scenario, where each call attempt
   should have an equal success probability in making calls through an
   overloaded SIP server, irrespective of the service provider where it
   was initiated.  An example of customized fairness would be a server
   which assigns different resource allocations to its upstream
   neighbors (e.g., service providers) as defined in a service level
   agreement (SLA).

8.  Performance Metrics

   The performance of an overload control mechanism can be measured
   using different metrics.

   A key performance indicator is the goodput of a SIP server under
   overload.
Ideally, an overload control mechanism enables a SIP server to perform at its capacity limit during periods of overload. For example, if a SIP server has a processing capacity of 140 INVITE transactions per second, then an overload control mechanism should enable it to process 140 INVITEs per second even if the offered load is much higher. The delay introduced by a SIP server is another important indicator. An overload control mechanism should ensure that the delay encountered by a SIP message is not increased significantly during periods of overload. Significantly increased delay can lead to time-outs and retransmissions of SIP messages, making the overload worse.

Responsiveness and stability are other important performance indicators. An overload control mechanism should react quickly to an overload occurrence and ensure that a SIP server does not become overloaded, even during sudden peaks of load. Similarly, an overload control mechanism should quickly stop rejecting calls if the overload disappears. Stability is another important criterion: an overload control mechanism should not cause significant oscillations of load on a SIP server. The performance of SIP overload control mechanisms is discussed in [Noel et al.], [Shen et al.], [Hilt et al.] and [Abdelal et al.].

In addition to the above metrics, there are other indicators that are relevant for the evaluation of an overload control mechanism:

Fairness: Which types of fairness does the overload control mechanism implement?
Self-limiting: Is the overload control self-limiting if a SIP server becomes unresponsive?
Changes in neighbor set: How does the mechanism adapt to a changing set of sending entities?
Data points to monitor: Which and how many data points does an overload control mechanism need to monitor?
Computational complexity: What is the (CPU) load created by the overload "monitor" and "actuator"?

9.
Explicit Overload Control Feedback

Explicit overload control feedback enables a receiver to indicate how much traffic it wants to receive. Explicit overload control mechanisms can be differentiated based on the type of information conveyed in the overload control feedback and on whether the control function is located in the receiving entity, the sending entity (receiver- vs. sender-based overload control), or both.

9.1. Rate-based Overload Control

The key idea of rate-based overload control is to limit the request rate at which an upstream element is allowed to forward traffic to the downstream neighbor. If overload occurs, a SIP server instructs each upstream neighbor to send at most X requests per second. Each upstream neighbor can be assigned a different rate cap.

An example algorithm for an Actuator in the sending entity is request gapping. After transmitting a request to a downstream neighbor, a server waits for 1/X seconds before it transmits the next request to the same neighbor. Requests that arrive during the waiting period are not forwarded and are either redirected, rejected or buffered.

The rate cap ensures that the number of requests received by a SIP server never exceeds the sum of all rate caps granted to upstream neighbors. Rate-based overload control protects a SIP server against overload even during load spikes, assuming that no new upstream neighbors start sending traffic. New upstream neighbors need to be accounted for in the rate caps assigned to all upstream neighbors; the rates assigned to existing neighbors need to be adjusted when new neighbors join. During periods when new neighbors are joining, overload can occur in extreme cases until the rate caps of all senders are adjusted to again match the overall rate cap of the server.
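The request gapping algorithm described above can be sketched as follows. This is a minimal sketch under the stated assumptions; the class and method names are illustrative only and not defined by this document.

```python
import time

class RequestGapper:
    """Enforce a rate cap of X requests per second toward one neighbor."""

    def __init__(self, rate_cap):
        self.gap = 1.0 / rate_cap   # minimum spacing (1/X s) between requests
        self.next_allowed = 0.0     # earliest time the next request may go

    def try_forward(self, now=None):
        """Return True if a request may be forwarded now.

        Requests arriving while this returns False fall into the waiting
        period and must be redirected, rejected or buffered.
        """
        now = time.monotonic() if now is None else now
        if now >= self.next_allowed:
            self.next_allowed = now + self.gap
            return True
        return False
```

One such gapper would be kept per downstream neighbor, since each neighbor can carry a different rate cap.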
The overall rate cap of a SIP server is determined by an overload control algorithm, e.g., based on system load.

Rate-based overload control requires a SIP server to assign a rate cap to each of its upstream neighbors while it is activated. Effectively, a server needs to assign a share of its overall capacity to each upstream neighbor. A server needs to ensure that the sum of all rate caps assigned to upstream neighbors does not substantially oversubscribe its actual processing capacity. This requires a SIP server to keep track of the set of upstream neighbors and to adjust the rate caps if a new upstream neighbor appears or an existing neighbor stops transmitting. For example, if the capacity of the server is X and the server is receiving traffic from two upstream neighbors, it can assign a rate cap of X/2 to each of them. If a third sender appears, the rate cap for each sender is lowered to X/3. If the overall rate cap is too high, a server may experience overload. If the cap is too low, the upstream neighbors will reject requests even though they could be processed by the server.

An approach for estimating the rate cap for each upstream neighbor is to use a fixed proportion of a control variable, X, where X is initially equal to the capacity of the SIP server. The server then increases or decreases X until the workload arrival rate matches the actual server capacity. Usually, this means that the sum of the rate caps sent out by the server (i.e., X) exceeds its actual capacity, but it enables upstream neighbors that are not generating more than their fair share of the work to remain effectively unrestricted. In this approach, the server only has to measure the aggregate arrival rate. However, since the overall rate cap is usually higher than the actual capacity, brief periods of overload may occur.

9.2.
Loss-based Overload Control

A loss percentage enables a SIP server to ask an upstream neighbor to reduce the number of requests it would normally forward to this server by X%. For example, a SIP server can ask an upstream neighbor to reduce the number of requests this neighbor would normally send by 10%. The upstream neighbor then redirects or rejects 10% of the traffic that is destined for this server.

An algorithm for the sending entity to implement a loss percentage is to draw a random number between 1 and 100 for each request to be forwarded. The request is not forwarded to the server if the random number is less than or equal to X.

An advantage of loss-based overload control is that the receiving entity does not need to track the set of upstream neighbors or the request rate it receives from each upstream neighbor. It is sufficient to monitor the overall system utilization. To reduce load, a server can ask its upstream neighbors to lower the traffic forwarded to it by a certain percentage. The server calculates this percentage by combining the loss percentage that is currently in use (i.e., the loss percentage the upstream neighbors are currently applying when forwarding traffic), the current system utilization and the desired system utilization. For example, if the server load approaches 90% and the current loss percentage is set to a 50% traffic reduction, then the server can decide to increase the loss percentage to 55% in order to bring system utilization down to 80%. Similarly, the server can lower the loss percentage if the system utilization permits.

Loss-based overload control requires that the throttle percentage be adjusted to the current overall number of requests received by the server. This is particularly important if the number of requests received fluctuates quickly.
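The random-number algorithm for the sending entity described in this section can be sketched as follows. This is a minimal sketch; the forward and reject callbacks are hypothetical placeholders for the sending entity's actual handling, not part of the mechanism itself.

```python
import random

def apply_loss(request, loss_percent, forward, reject):
    """Throttle loss_percent of the requests destined for a server."""
    # Draw a random number between 1 and 100 for each request; the
    # request is not forwarded if the number is less than or equal to
    # the loss percentage X requested by the overloaded server.
    if random.randint(1, 100) <= loss_percent:
        return reject(request)   # redirect or reject locally
    return forward(request)
```

Note that this throttles a fixed fraction of the offered traffic, which is why the percentage must be re-adjusted when the offered load changes.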
For example, if a SIP server sets a throttle value of 10% at time t1 and the number of requests increases by 20% between time t1 and t2 (t1