SIPPING Working Group                                       V. Hilt (Ed.)
Internet-Draft                                   Bell Labs/Alcatel-Lucent
Intended status: Informational                              March 7, 2009
Expires: September 8, 2009


    Design Considerations for Session Initiation Protocol (SIP) Overload
                                  Control
                    draft-ietf-sipping-overload-design-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 8, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.
   Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.  This
   document discusses models and design considerations for a SIP
   overload control mechanism.

Table of Contents

   1.  Introduction
   2.  SIP Overload Problem
   3.  Explicit vs. Implicit Overload Control
   4.  System Model
   5.  Degree of Cooperation
     5.1.  Hop-by-Hop
     5.2.  End-to-End
     5.3.  Local Overload Control
   6.  Topologies
   7.  Fairness
   8.  Performance Metrics
   9.  Explicit Overload Control Feedback
     9.1.  Rate-based Overload Control
     9.2.  Loss-based Overload Control
     9.3.  Window-based Overload Control
     9.4.  Overload Signal-based Overload Control
     9.5.  On-/Off Overload Control
   10. Implicit Overload Control
   11. Overload Control Algorithms
   12. Message Prioritization
   13. Security Considerations
   14. IANA Considerations
   15. Informative References
   Appendix A.  Contributors
   Author's Address

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload occurs if a SIP server does not have sufficient resources to
   process all incoming SIP messages.  These resources may include CPU,
   memory, network bandwidth, input/output, or disk resources.

   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.  In fact, overload may lead to
   a situation in which the throughput drops down to a small fraction of
   the original processing capacity.  This is often called congestion
   collapse.

   An overload control mechanism enables a SIP server to perform close
   to its capacity limit during times of overload.  Overload control is
   used by a SIP server if it is unable to process all SIP requests due
   to resource constraints.  There are other failure cases in which a
   SIP server can successfully process incoming requests but has to
   reject them for other reasons.  For example, a PSTN gateway that runs
   out of trunk lines but still has plenty of capacity to process SIP
   messages should reject incoming INVITEs using a 488 (Not Acceptable
   Here) response [RFC4412].  Similarly, a SIP registrar that has lost
   connectivity to its registration database but is still capable of
   processing SIP messages should reject REGISTER requests with a 500
   (Server Error) response [RFC3261].  Overload control mechanisms do
   not apply in these cases and SIP provides appropriate response codes
   for them.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code and the Retry-
   After header.
   However, this mechanism cannot prevent overload of a
   SIP server and it cannot prevent congestion collapse.  In fact, it
   may cause traffic to oscillate and to shift between SIP servers and
   thereby worsen an overload condition.  A detailed discussion of the
   SIP overload problem, the problems with the 503 (Service Unavailable)
   response code and the Retry-After header, and the requirements for a
   SIP overload control mechanism can be found in [RFC5390].

   This document discusses the models, assumptions and design
   considerations for a SIP overload control mechanism.  The document is
   a product of the SIP overload control design team.

2.  SIP Overload Problem

   A key contributor to the SIP congestion collapse [RFC5390] is the
   regenerative behavior of overload in the SIP protocol.  When SIP is
   running over UDP, it will retransmit messages that were
   dropped by a SIP server due to overload and thereby increase the
   offered load for the already overloaded server.  This increase in
   load worsens the severity of the overload condition and, in turn,
   causes more messages to be dropped.  A congestion collapse can occur
   [Noel et al.], [Shen et al.], [Hilt et al.].

   Regenerative behavior under overload should ideally be avoided by any
   protocol as this would lead to stable operation under overload.
   However, this is often difficult to achieve in practice.  For
   example, changing the SIP retransmission timer mechanisms can reduce
   the degree of regeneration during overload but will impact the
   ability of SIP to recover from message losses.  Without any
   retransmissions, each message that is dropped due to SIP server
   overload will eventually lead to a failed call.

   For a SIP INVITE transaction to be successful, a minimum of three
   messages need to be forwarded by a SIP server.  Often an INVITE
   transaction consists of five or more SIP messages.
   If a SIP server
   under overload randomly discards messages without evaluating them,
   the chances that all messages belonging to a transaction are
   successfully forwarded will decrease as the load increases.  Thus,
   the number of transactions that complete successfully will decrease
   even if the message throughput of a server remains high and the
   overload behavior is fully non-regenerative.  A SIP server might
   (partially) parse incoming messages to determine whether they are new
   requests or messages belonging to an existing transaction.  However,
   after having spent resources on parsing a SIP message, discarding
   this message is expensive as the resources already spent are lost.
   The number of successful transactions will therefore decline with an
   increase in load as fewer and fewer resources can be spent on
   forwarding messages and more and more resources are consumed by
   inspecting messages that will eventually be dropped.  The slope of
   the decline depends on the amount of resources spent to inspect each
   message.

   Another key challenge for SIP overload control is that the rate of
   the true traffic sources usually cannot be controlled.  Overload is
   often caused by a large number of UAs, each of which creates only a
   single message.  These UAs cannot be rate controlled as they only
   send one message.  However, the sum of their traffic can overload a
   SIP server.

3.  Explicit vs. Implicit Overload Control

   The main difference between explicit and implicit overload control
   is the way overload is signaled from a SIP server that is reaching
   overload condition to its upstream neighbors.

   In an explicit overload control mechanism, a SIP server uses an
   explicit overload signal to indicate that it is reaching its capacity
   limit.
   Upstream neighbors receiving this signal can adjust their
   transmission rate as indicated by the overload signal to a level that
   is acceptable to the downstream server.  The overload signal enables
   a SIP server to steer the load it is receiving to a rate at which it
   can perform at maximum capacity.

   Implicit overload control uses the absence of responses and packet
   loss as an indication of overload.  A SIP server that is sensing such
   a condition reduces the load it is forwarding to a downstream
   neighbor.  Since there is no explicit overload signal, this mechanism
   is robust as it does not depend on actions taken by the SIP server
   running into overload.

   The ideas of explicit and implicit overload control are in fact
   complementary.  By considering implicit overload indications, a
   server can avoid overloading an unresponsive downstream neighbor.  An
   explicit overload signal enables a SIP server to actively steer the
   incoming load to a desired level.

4.  System Model

   The model shown in Figure 1 identifies fundamental components of an
   explicit SIP overload control mechanism:

   SIP Processor:  The SIP Processor processes SIP messages and is the
      component that is protected by overload control.

   Monitor:  The Monitor measures the current load of the SIP processor
      on the receiving entity.  It implements the mechanisms needed to
      determine the current usage of resources relevant for the SIP
      processor and reports load samples (S) to the Control Function.

   Control Function:  The Control Function implements the overload
      control algorithm.  The control function uses the load samples (S)
      and determines if overload has occurred and a throttle (T) needs
      to be set to adjust the load sent to the SIP processor on the
      receiving entity.  The control function on the receiving entity
      sends load feedback (F) to the sending entity.
   Actuator:  The Actuator implements the algorithms needed to act on
      the throttles (T) and ensures that the amount of traffic forwarded
      to the receiving entity meets the criteria of the throttle.  For
      example, a throttle may instruct the Actuator to not forward more
      than 100 INVITE messages per second.  The Actuator implements the
      algorithms to achieve this objective, e.g., using message gapping.
      It also implements algorithms to select the messages that will be
      affected and to determine whether they are rejected or redirected.

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e., loss-based,
   rate-based, window-based or signal-based overload control; see
   Section 9), the overload control algorithm (see Section 11), as well
   as other design parameters.  The feedback (F) enables the sending
   entity to adjust the amount of traffic forwarded to the receiving
   entity to a level that is acceptable to the receiving entity without
   causing overload.

   Figure 1 depicts a general system model for overload control.  In
   this diagram, one instance of the control function is on the sending
   entity (i.e., associated with the actuator) and one is on the
   receiving entity (i.e., associated with the monitor).  However, a
   specific mechanism may not require both elements.  In this case, one
   of the two control function elements can be empty and simply passes
   along feedback.  E.g., if (F) is defined as a loss-rate (e.g., reduce
   traffic by 10%), there is no need for a control function on the
   sending entity as the content of (F) can be copied directly into (T).

   The model in Figure 1 shows a scenario with one sending and one
   receiving entity.  In a more realistic scenario, a receiving entity
   will receive traffic from multiple sending entities and vice versa
   (see Section 6).
   The feedback generated by a Monitor will therefore
   often be distributed across multiple Actuators.  A Monitor needs to
   be able to split the load it can process across multiple sending
   entities and generate feedback that correctly adjusts the load each
   sending entity is allowed to send.  Similarly, an Actuator needs to
   be prepared to receive different levels of feedback from different
   receiving entities and throttle traffic to these entities
   accordingly.

          Sending                Receiving
           Entity                  Entity
      +----------------+     +----------------+
      |    Server A    |     |    Server B    |
      |  +----------+  |     |  +----------+  |  -+
      |  | Control  |  |  F  |  | Control  |  |   |
      |  | Function |<-+-----+--| Function |  |   |
      |  +----------+  |     |  +----------+  |   |
      |     T |        |     |       ^        |   | Overload
      |       v        |     |       | S      |   | Control
      |  +----------+  |     |  +----------+  |   |
      |  | Actuator |  |     |  | Monitor  |  |   |
      |  +----------+  |     |  +----------+  |   |
      |       |        |     |       ^        |  -+
      |       v        |     |       |        |  -+
      |  +----------+  |     |  +----------+  |   |
    <-+--|   SIP    |  |     |  |   SIP    |  |   | SIP
    --+->|Processor |--+-----+->|Processor |--+-> | System
      |  +----------+  |     |  +----------+  |   |
      +----------------+     +----------------+  -+

        Figure 1: System Model for Explicit Overload Control

5.  Degree of Cooperation

   A SIP request is usually processed by more than one SIP server on its
   path to the destination.  Thus, a design choice for an explicit
   overload control mechanism is where to place the components of
   overload control along the path of a request and, in particular,
   where to place the Monitor and Actuator.  This design choice
   determines the degree of cooperation between the SIP servers on the
   path.  Overload control can be implemented hop-by-hop with the
   Monitor on one server and the Actuator on its direct upstream
   neighbor.  Overload control can be implemented end-to-end with
   Monitors on all SIP servers along the path of a request and an
   Actuator on the sender.
   In this case, the Control Functions
   associated with each Monitor have to cooperate to jointly determine
   the overall feedback for this path.  Finally, overload control can be
   implemented locally on a SIP server if Monitor and Actuator reside on
   the same server.  In this case, the sending entity and receiving
   entity are the same SIP server and Actuator and Monitor operate on
   the same SIP processor (although the Actuator typically operates on
   a pre-processing stage in local overload control).  Local overload
   control is an internal overload control mechanism as the control loop
   is implemented internally on one server.  Hop-by-hop and end-to-end
   are external overload control mechanisms.  All three configurations
   are shown in Figure 2.

                +---------+               +------(+)---------+
    +------+    |         |               |       ^          |
    |      |    |    +---+                |       |     +---+
    v      |    v //=>| C |               v       | //=>| C |
  +---+   +---+  //   +---+             +---+   +---+ //  +---+
  | A |===>| B |                        | A |===>| B |
  +---+   +---+  \\   +---+             +---+   +---+ \\  +---+
            ^    \\=>| D |                ^       | \\=>| D |
            |         +---+               |       |     +---+
            |           |                 |       v       |
            +-----------+                 +------(+)------+

       (a) hop-by-hop                      (b) end-to-end

                     +-+
                     v |
      +-+     +-+    +---+
      v |     v | //=>| C |
     +---+   +---+ // +---+
     | A |===>| B |
     +---+   +---+ \\ +---+
               \\=>| D |
                    +---+
                     ^ |
                     +-+

          (c) local

      ==> SIP request flow
      <-- Overload feedback loop

        Figure 2: Degree of Cooperation between Servers

5.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all neighboring SIP servers that directly
   exchange traffic.  I.e., the Actuator is located on the SIP server
   that is the direct upstream neighbor of the SIP server that has the
   corresponding Monitor.  Each control loop between two servers is
   completely independent of the control loops between other servers
   further up- or downstream.
   In the example in Figure 2(a), three
   independent overload control loops are instantiated: A - B, B - C and
   B - D.  Each loop only controls a single hop.  Overload feedback
   received from a downstream neighbor is not forwarded further
   upstream.  Instead, a SIP server acts on this feedback, for example,
   by rejecting SIP messages if needed.  If the upstream neighbor of a
   server also becomes overloaded, it will report this problem to its
   upstream neighbors, which again take action based on the reported
   feedback.  Thus, in hop-by-hop overload control, overload is always
   resolved by the direct upstream neighbors of the overloaded server
   without the need to involve entities that are located multiple SIP
   hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and can avoid congestion collapse.  It is simple and scales
   well to networks with many SIP entities.  An advantage is that it
   does not require feedback to be transmitted across multiple hops,
   possibly crossing multiple trust domains.  Feedback is sent to the
   next hop only.  Furthermore, it does not require a SIP entity to
   aggregate a large number of overload status values or keep track of
   the overload status of SIP servers it is not communicating with.

5.2.  End-to-End

   End-to-end overload control implements an overload control loop along
   the entire path of a SIP request, from UAC to UAS.  An end-to-end
   overload control mechanism consolidates overload information from all
   SIP servers on the way (including all proxies and the UAS) and uses
   this information to throttle traffic as far upstream as possible.  An
   end-to-end overload control mechanism has to be able to frequently
   collect the overload status of all servers on the potential path(s)
   to a destination and combine this data into meaningful overload
   feedback.
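   As an illustration of the combining step, the following sketch (an
   assumption for illustration only; no specific feedback format is
   mandated by this document) merges loss-rate style feedback from all
   servers on one path into a single end-to-end value.  Each server is
   assumed to report the fraction of requests it wants reduced, and a
   request survives the path only if every server admits it, so the
   admission probabilities multiply:

```python
# Illustrative sketch: combining per-server loss-rate feedback along a
# path into one end-to-end loss fraction.  The function name and the
# loss-rate feedback format are assumptions, not part of any
# standardized mechanism.

def combined_loss(per_server_loss):
    """Return the overall loss fraction for a path.

    per_server_loss holds the reduction fraction requested by each
    server on the path (e.g., 0.1 for "reduce traffic by 10%").
    """
    admit = 1.0
    for loss in per_server_loss:
        admit *= (1.0 - loss)  # request must be admitted by every hop
    return 1.0 - admit

# Path A -> B -> D where B asks for a 10% reduction and D for 20%:
print(round(combined_loss([0.0, 0.1, 0.2]), 2))  # 0.28
```

   Even this simple combination presumes that the path is known in
   advance, which, as discussed above, is often not the case.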
   A UA or SIP server only throttles requests if it knows that these
   requests will eventually be forwarded to an overloaded server.  For
   example, if D is overloaded in Figure 2(b), A should only throttle
   requests it forwards to B when it knows that they will be forwarded
   to D.  It should not throttle requests that will eventually be
   forwarded to C, since server C is not overloaded.  In many cases, it
   is difficult for A to determine which requests will be routed to C
   and D since this depends on the local routing decision made by B.
   These routing decisions can be highly variable and, for example,
   depend on call routing policies configured by the user, services
   invoked on a call, load balancing policies, etc.  The fact that a
   previous message to a target has been routed through an overloaded
   server does not necessarily mean the next message to this target will
   also be routed through the same server.

   The main problem of end-to-end overload control is its inherent
   complexity, since a UAC or SIP server needs to monitor all potential
   paths to a destination in order to determine which requests should be
   throttled and which requests may be sent.  Even if this information
   is available, it is not clear which path a specific request will
   take.

   A variant of end-to-end overload control is to implement a control
   loop between a set of well-known SIP servers along the path of a SIP
   request.  For example, an overload control loop can be instantiated
   between a server and its only downstream neighbor or among a set of
   closely coupled SIP servers.  A control loop spanning multiple hops
   can be used if the sending entity has full knowledge about the SIP
   servers on the path of a SIP message.

   A key difference to transport protocols using end-to-end congestion
   control such as TCP is that the traffic exchanged between SIP servers
   consists of many individual SIP messages.
   Each of these SIP messages
   has its own source and destination.  Even SIP messages containing
   identical SIP URIs (e.g., a SUBSCRIBE and an INVITE message to the
   same SIP URI) can be routed to different destinations.  This is
   different from TCP, which controls a stream of packets between a
   single source and a single destination.

5.3.  Local Overload Control

   The idea of local overload control (see Figure 2(c)) is to run the
   Monitor and Actuator on the same server.  This enables the server to
   monitor the current resource usage and to reject messages that can't
   be processed without overusing the local resources.  The fundamental
   assumption behind local overload control is that it is less resource
   consuming for a server to reject messages than to process them.  A
   server can therefore reject the excess messages it cannot process to
   stop all retransmissions of these messages.

   Local overload control can be used in conjunction with other
   overload control mechanisms and provides an additional layer of
   protection against overload.  It is fully implemented on the local
   server and does not require any cooperation from upstream neighbors.
   In general, SIP servers should apply implicit or explicit overload
   control techniques to control load before a local overload control
   mechanism is activated as a mechanism of last resort.

6.  Topologies

   The following topologies describe four generic SIP server
   configurations.  These topologies illustrate specific challenges for
   an overload control mechanism.  An actual SIP server topology is
   likely to consist of combinations of these generic scenarios.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E and F) receives traffic from a single source A.  A
   load balancer is a typical example of such a configuration.
   In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its downstream
   neighbors D, E and F.  If one of the downstream neighbors becomes
   overloaded, A can direct traffic to the servers that still have
   capacity.  If one of the servers serves as a backup, it can be
   activated once one of the primary servers reaches overload.

   If A can reliably determine that D, E and F are its only downstream
   neighbors and all of them are in overload, it may choose to report
   overload upstream on behalf of D, E and F.  However, if the set of
   downstream neighbors is not fixed or only some of them are in
   overload, then A should not activate overload control since A can
   still forward the requests destined to non-overloaded downstream
   neighbors.  These requests would be throttled as well if A were to
   use overload control towards its upstream neighbors.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.  The set of active upstream neighbors of D
   can change as servers may become inactive and previously inactive
   servers may start contributing traffic to D.

   If D becomes overloaded, it needs to generate feedback to reduce the
   amount of traffic it receives from its upstream neighbors.  D needs
   to decide by how much each upstream neighbor should reduce traffic.
   This decision can require the consideration of the amount of traffic
   sent by each upstream neighbor, and it may need to be re-adjusted as
   the traffic contributed by each upstream neighbor varies over time.
   Server D can use a local fairness policy to determine how much
   traffic it accepts from each upstream neighbor.
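   One possible local fairness policy is to split D's capacity among
   its currently active upstream neighbors in proportion to the traffic
   each of them has recently offered.  The following sketch is
   illustrative only; the function name, the proportional policy, and
   the rate-style feedback are assumptions, as this document does not
   prescribe a specific policy:

```python
# Hypothetical sketch of a proportional fairness policy for the
# "multiple sources" scenario.  All names are illustrative.

def assign_rate_caps(capacity, offered):
    """Map each upstream neighbor to a rate cap (requests/sec).

    'offered' maps each active upstream neighbor to its measured
    arrival rate.  The caps must be recomputed whenever the set of
    neighbors or their offered traffic changes.
    """
    total = sum(offered.values())
    if total <= capacity:
        # No overload: every neighbor may keep its current rate.
        return dict(offered)
    scale = capacity / total
    return {nbr: rate * scale for nbr, rate in offered.items()}

# D has capacity 100 req/s; A, B and C jointly offer 200 req/s:
caps = assign_rate_caps(100.0, {"A": 80.0, "B": 40.0, "C": 80.0})
print(caps)  # {'A': 40.0, 'B': 20.0, 'C': 40.0}
```

   A service provider could equally well plug in a different policy
   here, e.g., weights derived from service level agreements, which
   corresponds to the customized fairness discussed in Section 7.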
   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

              +---+                   +---+
          /-->| D |                   | A |-\
         /    +---+                   +---+  \
        /                                     \    +---+
   +---+-/    +---+                   +---+    \-->|   |
   | A |----->| E |                   | B |------->| D |
   +---+-\    +---+                   +---+    /-->|   |
        \                                     /    +---+
         \    +---+                   +---+  /
          \-->| F |                   | C |-/
              +---+                   +---+

     (a) load balancer                (b) multiple sources

   +---+
   | A |---\                           a--\
   +---+-\  \---->+---+                    \
          \/----->| D |                b--\ \--->+---+
   +---+--/\  /-->+---+                    \---->|   |
   | B |    \/                         c-------->| D |
   +---+---\/\--->+---+                          |   |
           /\---->| E |                ...  /--->+---+
   +---+--/  /--->+---+                    /
   | C |-----/                         z--/
   +---+

        (c) mesh                       (d) edge proxy

                        Figure 3: Topologies

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   infrequently sends a request.  This scenario is shown in Figure 3(d).
   An edge proxy that is connected to many UAs is a typical example of
   such a configuration.

   Since each UA typically only contributes a few requests, which are
   often related to the same call, it can't decrease its message rate to
   resolve the overload.  In such a configuration, a SIP server can
   resort to local overload control by rejecting a percentage of the
   requests it receives with 503 (Service Unavailable) responses.  Since
   there are many upstream neighbors that contribute to the overall
   load, sending 503 (Service Unavailable) to a fraction of them can
   gradually reduce load without entirely stopping all incoming traffic.
   The Retry-After header can be used in 503 (Service Unavailable)
   responses to ask UAs to wait a given number of seconds before trying
   the call again.  Using 503 (Service Unavailable) towards individual
   sources cannot, however, prevent overload if a large number of users
   place calls at the same time.

      Note: The requirements of the "edge proxy" topology are different
      from the ones of the other topologies, which may require a
      different method for overload control.

7.  Fairness

   There are many different ways to define fairness if a SIP server has
   multiple upstream neighbors.  In the context of SIP server overload,
   it is helpful to describe two categories of fairness criteria: basic
   fairness and customized fairness.  With basic fairness, a SIP server
   treats all end-users equally and ensures that each end-user has the
   same chance of accessing the server resources.  With customized
   fairness, the server allocates resources according to different
   priorities.  An example application of the basic fairness criterion
   is the "Third caller receives free tickets" scenario, where each end-
   user should have an equal success probability in making calls through
   an overloaded SIP server, regardless of which service provider he/she
   is subscribing to.  An example of customized fairness would be a
   server that gives different resource allocations to its upstream
   neighbors (e.g., service providers) as defined in service level
   agreements.

8.  Performance Metrics

   The performance of an overload control mechanism can be measured
   using different metrics.

   A key performance indicator is the goodput of a SIP server during
   overload.  Ideally, a SIP server is enabled to perform at its
   capacity limit during periods of overload.
   E.g., if a SIP server has
   a processing capacity of 140 INVITE transactions per second, then an
   overload control mechanism should enable it to handle 140 INVITEs per
   second even if the offered load is much higher.  The delay introduced
   by a SIP server is another important indicator.  An overload control
   mechanism should ensure that the delay encountered by a SIP message
   is not increased significantly during periods of overload.

   Reactiveness and stability are other important performance
   indicators.  An overload control mechanism should quickly react to an
   overload occurrence and ensure that a SIP server does not become
   overloaded even during sudden peaks of load.  Similarly, an overload
   control mechanism should quickly remove all throttles if the overload
   disappears.  Stability is another important criterion, as using an
   overload control mechanism should not lead to the oscillation of load
   on a SIP server.  The performance of SIP overload control mechanisms
   is discussed in [Noel et al.], [Shen et al.] and [Hilt et al.].

   In addition to the above metrics, there are other indicators that are
   relevant for the evaluation of an overload control mechanism:

   Fairness:  Which types of fairness does the overload control
      mechanism implement?

   Self-limiting:  Is the overload control self-limiting if a SIP server
      becomes unresponsive?

   Changes in neighbor set:  How does the mechanism adapt to a changing
      set of sending entities?

   Data points to monitor:  Which data points does an overload control
      mechanism need to monitor?

   Tuning requirements:  Does the algorithm work out of the box or is
      parameter tweaking required?

   TBD: A discussion of these metrics for the following overload
   control mechanisms is needed.

9.  Explicit Overload Control Feedback

   Explicit overload control feedback enables a receiver to indicate how
   much traffic it wants to receive.
Explicit overload control mechanisms can be differentiated based on the type of information conveyed in the overload control feedback. Another way to classify explicit overload control mechanisms is by whether the control function resides in the receiving or the sending entity (receiver-based vs. sender-based overload control).

9.1.  Rate-based Overload Control

The key idea of rate-based overload control is to limit the request rate at which an upstream element is allowed to forward traffic to the downstream neighbor. If overload occurs, a SIP server instructs each upstream neighbor to send at most X requests per second. Each upstream neighbor can be assigned a different rate cap.

An example algorithm for the Actuator in a sending entity to implement a rate cap is request gapping. After transmitting a request to a downstream neighbor, a server waits 1/X seconds before it transmits the next request to the same neighbor. Requests that arrive during the waiting period are not forwarded and are either redirected, rejected or buffered.

The rate cap ensures that the number of requests received by a SIP server never exceeds the sum of all rate caps granted to upstream neighbors. Rate-based overload control protects a SIP server against overload even during load spikes, assuming no new upstream neighbors start sending traffic. New upstream neighbors need to be accounted for in all rate caps currently assigned to upstream neighbors. The current overall rate cap of a SIP server is determined by an overload control algorithm, e.g., based on system load.

Rate-based overload control requires a SIP server to assign a rate cap to each of its upstream neighbors while it is activated. Effectively, a server needs to assign a share of its overall capacity to each upstream neighbor.
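The request gapping behavior described above can be sketched as follows. This is a minimal illustration, not part of any specified SIP mechanism; the class name, clock source, and the decision of what to do with refused requests are assumptions of this sketch:

```python
import time

class RequestGapper:
    """Enforce a rate cap of max_rate requests/second towards one
    downstream neighbor by spacing transmissions 1/max_rate apart."""

    def __init__(self, max_rate):
        self.gap = 1.0 / max_rate   # minimum spacing in seconds
        self.next_allowed = 0.0     # earliest time the next request may go

    def try_forward(self, now=None):
        """Return True if a request may be forwarded now.  Requests
        arriving inside the gap are refused; the caller must then
        redirect, reject or buffer them."""
        now = time.monotonic() if now is None else now
        if now >= self.next_allowed:
            self.next_allowed = now + self.gap
            return True
        return False
```

With a rate cap of X = 10 requests per second, try_forward permits at most one request per 100 ms window; whether refused requests are redirected, rejected or buffered is left to the sending entity, as in the text above.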
A server needs to ensure that the sum of all rate caps assigned to upstream neighbors is not (significantly) higher than its actual processing capacity. This requires a SIP server to keep track of the set of upstream neighbors and to adjust the rate caps if a new upstream neighbor appears or an existing neighbor stops transmitting. For example, if the capacity of the server is X and the server is receiving traffic from two upstream neighbors, it can assign a rate cap of X/2 to each of them. If a third sender appears, the rate cap for each sender is lowered to X/3. If the rate caps assigned to upstream neighbors are too high, a server may still experience overload. If the caps are too low, the upstream neighbors will reject requests even though they could be processed by the server.

One approach to estimating a rate cap for each upstream neighbor is to use a fixed proportion of a control variable X, where X is initially equal to the capacity of the SIP server. The server then increases or decreases X until the workload arrival rate matches the actual server capacity. Usually, this means that the sum of the rate caps sent out by the server (i.e., X) exceeds its actual capacity, but it enables upstream neighbors that are not generating more than their fair share of the work to remain effectively unrestricted. With this approach, the server only has to measure the aggregate arrival rate. However, since the overall rate cap is usually higher than the actual capacity, brief periods of overload may occur.

9.2.  Loss-based Overload Control

A loss percentage enables a SIP server to ask an upstream neighbor to reduce the number of requests it would normally forward to this server by a percentage X. For example, a SIP server can ask an upstream neighbor to reduce the number of requests this neighbor would normally send by 10%.
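Such a percentage reduction can be implemented in the sending entity with a per-request random draw, as sketched below. The function name and the use of a pseudo-random number generator are illustrative assumptions of this sketch, not part of any specified mechanism:

```python
import random

def should_forward(loss_percent, rng=random):
    """Decide whether to forward a single request under a loss
    percentage.  Draw a number between 1 and 100; the request is
    throttled (rejected or redirected) if the draw is less than or
    equal to loss_percent, so on average loss_percent of requests
    are not forwarded."""
    return rng.randint(1, 100) > loss_percent
```

For the 10% example above, should_forward(10) forwards roughly 90% of the requests; should_forward(0) forwards everything and should_forward(100) forwards nothing.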
The upstream neighbor then redirects or rejects X percent of the traffic destined for this server.

An algorithm for the sending entity to implement a loss percentage is to draw a random number between 1 and 100 for each request to be forwarded. The request is not forwarded to the server if the random number is less than or equal to X.

An advantage of loss-based overload control is that the receiving entity does not need to track the set of upstream neighbors or the request rate it receives from each upstream neighbor. It is sufficient to monitor the overall system utilization. To reduce load, a server can ask its upstream neighbors to lower the traffic they forward by a certain percentage. The server calculates this percentage by combining the loss percentage that is currently in use (i.e., the loss percentage the upstream neighbors are currently applying when forwarding traffic), the current system utilization and the desired system utilization. For example, if the server load approaches 90% and the current loss percentage is set to a 50% traffic reduction, then the server can decide to increase the loss percentage to 55% in order to get to a system utilization of 80%. Similarly, the server can lower the loss percentage if permitted by the system utilization.

Loss-based overload control requires that the throttle percentage be adjusted to the current overall number of requests received by the server. This is particularly important if the number of requests received fluctuates quickly. For example, if a SIP server sets a throttle value of 10% at time t1 and the number of requests increases by 20% between time t1 and t2 (t1