SIPPING Working Group                                      V. Hilt (Ed.)
Internet-Draft                                 Bell Labs/Alcatel-Lucent
Intended status: Informational                          October 22, 2008
Expires: April 25, 2009

   Design Considerations for Session Initiation Protocol (SIP) Overload
                                 Control
                  draft-ietf-sipping-overload-design-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 25, 2009.

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.  This
   document discusses models and design considerations for a SIP
   overload control mechanism.

Table of Contents

   1.  Introduction
   2.  SIP Overload Problem
   3.  Explicit vs. Implicit Overload Control
   4.  System Model
   5.  Degree of Cooperation
     5.1.  Hop-by-Hop
     5.2.  End-to-End
     5.3.  Local Overload Control
   6.  Topologies
   7.  Explicit Overload Control Feedback
     7.1.  Rate-based Overload Control
     7.2.  Loss-based Overload Control
     7.3.  Window-based Overload Control
     7.4.  Overload Signal-based Overload Control
     7.5.  On-/Off Overload Control
   8.  Implicit Overload Control
   9.  Overload Control Algorithms
   10. Security Considerations
   11. IANA Considerations
   Appendix A.  Contributors
   12. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload occurs if a SIP server does not have sufficient resources
   to process all incoming SIP messages.  These resources may include
   CPU, memory, network bandwidth, input/output, or disk resources.

   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.  In fact, overload may lead
   to a situation in which the throughput drops to a small fraction of
   the original processing capacity.  This is often called congestion
   collapse.

   An overload control mechanism enables a SIP server to perform close
   to its capacity limit during times of overload.  Overload control is
   used by a SIP server if it is unable to process all SIP requests due
   to resource constraints.  There are other failure cases in which a
   SIP server can successfully process incoming requests but has to
   reject them for other reasons.
   For example, a PSTN gateway that runs out of trunk lines but still
   has plenty of capacity to process SIP messages should reject
   incoming INVITEs using a 488 (Not Acceptable Here) response
   [RFC4412].  Similarly, a SIP registrar that has lost connectivity to
   its registration database but is still capable of processing SIP
   messages should reject REGISTER requests with a 500 (Server Internal
   Error) response [RFC3261].  Overload control mechanisms do not apply
   in these cases and SIP provides appropriate response codes for them.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code and the Retry-
   After header.  However, this mechanism cannot prevent overload of a
   SIP server and it cannot prevent congestion collapse.  In fact, it
   may cause traffic to oscillate and to shift between SIP servers and
   thereby worsen an overload condition.  A detailed discussion of the
   SIP overload problem, the problems with the 503 (Service
   Unavailable) response code and the Retry-After header, and the
   requirements for a SIP overload control mechanism can be found in
   [I-D.ietf-sipping-overload-reqs].

   This document discusses the models, assumptions and design
   considerations for a SIP overload control mechanism.  The document
   is a product of the SIP overload control design team.

2.  SIP Overload Problem

   A key contributor to the SIP congestion collapse
   [I-D.ietf-sipping-overload-reqs] is the regenerative behavior of
   overload in the SIP protocol.  When SIP is running over UDP, it will
   retransmit messages that were dropped by a SIP server due to
   overload and thereby increase the offered load for the already
   overloaded server.  This increase in load worsens the severity of
   the overload condition and, in turn, causes more messages to be
   dropped.  A congestion collapse can occur.

   While any protocol should ideally avoid regenerative behavior under
   overload, since avoiding it leads to stable operation under
   overload, this is often difficult to achieve in practice.  For
   example, changing the SIP retransmission timer mechanisms can reduce
   the degree of regeneration during overload; however, these changes
   will impact the ability of SIP to recover from message losses.
   Without any retransmissions, each message that is dropped due to SIP
   server overload will eventually lead to a failed call.

   For a SIP INVITE transaction to be successful, a minimum of three
   messages, and often five or more, need to be forwarded by a SIP
   server.  If a SIP server under overload randomly discards messages
   without evaluating them, the chances that all messages belonging to
   a transaction are passed on will decrease as the load increases.
   Thus, the number of successful transactions will decrease even if
   the message throughput of a server remains stable and the overload
   behavior is fully non-regenerative.  A SIP server might (partially)
   parse incoming messages to determine whether a message is a new
   request or belongs to an existing transaction.  However, after
   having spent resources on parsing a SIP message, discarding this
   message becomes expensive as the resources already spent are lost.
   The number of successful transactions will therefore decline with an
   increase in load as fewer and fewer resources can be spent on
   forwarding messages.  The slope of the decline depends on the amount
   of resources spent to evaluate each message.
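   As a rough illustration of this effect (a simplified model, not an
   analysis from this document): if a server under overload passes each
   message on independently with probability p, a transaction that
   needs n messages to traverse the server succeeds with probability
   p^n.  In LaTeX notation:

      P_{succ} = p^{n}, \qquad \text{e.g., } p = 0.8,\; n = 5:\quad
      0.8^{5} \approx 0.33

   Even when 80% of individual messages still get through, only about a
   third of five-message transactions complete, which is why the
   transaction success rate declines faster than the raw message
   throughput.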
   Another key challenge for SIP overload control is that the rate of
   the true traffic source usually cannot be controlled.  Overload is
   often caused by a large number of UAs, each of which creates only a
   single message.  These UAs cannot be rate controlled as they only
   send one message.  However, the sum of their traffic can overload a
   SIP server.

3.  Explicit vs. Implicit Overload Control

   The main difference between explicit and implicit overload control
   is the way overload is signaled from a SIP server that is reaching
   overload condition to its upstream neighbors.

   In an explicit overload control mechanism, a SIP server uses an
   explicit overload signal to indicate that it is reaching its
   capacity limit.  Upstream neighbors receiving this signal can adjust
   their transmission rate as indicated in the overload signal to a
   level that is acceptable to the downstream server.  The overload
   signal enables a SIP server to steer the load it is receiving to a
   rate at which it can perform at maximum capacity.

   Implicit overload control uses the absence of responses and packet
   loss as an indication of overload.  A SIP server that is sensing
   such a condition reduces the load it is forwarding to a downstream
   neighbor.  Since there is no explicit overload signal, this
   mechanism is robust as it does not depend on actions taken by the
   SIP server running into overload.

   The ideas of explicit and implicit overload control are in fact
   complementary.  By considering implicit overload indications a
   server can avoid overloading an unresponsive downstream neighbor.
   An explicit overload signal enables a SIP server to actively steer
   the incoming load to a desired level.

4.  System Model

   The model shown in Figure 1 identifies fundamental components of an
   explicit SIP overload control mechanism:

   SIP Processor:  The SIP Processor processes SIP messages and is the
      component that is protected by overload control.
   Monitor:  The Monitor measures the current load of the SIP processor
      on the receiving entity.  It implements the mechanisms needed to
      determine the current usage of resources relevant for the SIP
      processor and reports load samples (S) to the Control Function.
   Control Function:  The Control Function implements the overload
      control algorithm.  The control function uses the load samples
      (S) and determines if overload has occurred and a throttle (T)
      needs to be set to adjust the load sent to the SIP processor on
      the receiving entity.  The control function on the receiving
      entity sends load feedback (F) to the sending entity.
   Actuator:  The Actuator implements the algorithms needed to act on
      the throttles (T) and ensures that the amount of traffic
      forwarded to the receiving entity meets the criteria of the
      throttle.  For example, a throttle may instruct the Actuator to
      not forward more than 100 INVITE messages per second.  The
      Actuator implements the algorithms to achieve this objective,
      e.g., using message gapping.  It also implements algorithms to
      select the messages that will be affected and determine whether
      they are rejected or redirected.
   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e., loss-
   based, rate-based or window-based overload control; see Section 7),
   the overload control algorithm (see Section 9) as well as other
   design parameters.  In any case, the feedback (F) enables the
   sending entity to adjust the amount of traffic forwarded to the
   receiving entity to a level that is acceptable to the receiving
   entity without causing overload.

   Figure 1 depicts a general system model for overload control.  In
   this diagram, one instance of the control function is on the sending
   entity (i.e., associated with the actuator) and one is on the
   receiving entity (i.e., associated with the monitor).  However, a
   specific mechanism may not require both elements.  In this case, one
   of the two control function elements can be empty and simply passes
   along feedback.  For example, if (F) is defined as a loss rate
   (e.g., reduce traffic by 10%), there is no need for a control
   function on the sending entity as the content of (F) can be copied
   directly into (T).

   The model in Figure 1 shows a scenario with one sending and one
   receiving entity.  In a more realistic scenario, a receiving entity
   will receive traffic from multiple sending entities and vice versa
   (see Section 6).  The feedback generated by a Monitor will therefore
   often be distributed across multiple Actuators.  An Actuator needs
   to be prepared to receive different levels of feedback from
   different receiving entities and throttle traffic to these entities
   accordingly.

       Sending                     Receiving
       Entity                      Entity
    +----------------+          +----------------+
    |    Server A    |          |    Server B    |
    |  +----------+  |          |  +----------+  |    -+
    |  | Control  |  |    F     |  | Control  |  |     |
    |  | Function |<-+----------+--| Function |  |     |
    |  +----------+  |          |  +----------+  |     |
    |     T |        |          |       ^        |     |  Overload
    |       v        |          |       | S      |     |  Control
    |  +----------+  |          |  +----------+  |     |
    |  | Actuator |  |          |  | Monitor  |  |     |
    |  +----------+  |          |  +----------+  |     |
    |       |        |          |       ^        |    -+
    |       v        |          |       |        |    -+
    |  +----------+  |          |  +----------+  |     |
  <-+--|   SIP    |  |          |  |   SIP    |  |     |  SIP
  --+->|Processor |--+----------+->|Processor |--+->   |  System
    |  +----------+  |          |  +----------+  |     |
    +----------------+          +----------------+    -+

       Figure 1: System Model for Explicit Overload Control

5.  Degree of Cooperation

   A SIP request is often processed by more than one SIP server on its
   path to the destination.  Thus, a design choice for an explicit
   overload control mechanism is where to place the components of
   overload control along the path of a request and, in particular,
   where to place the Monitor and Actuator.  This design choice
   determines the degree of cooperation between the SIP servers on the
   path.  Overload control can be implemented hop-by-hop with the
   Monitor on one server and the Actuator on its direct upstream
   neighbor.  Overload control can be implemented end-to-end with
   Monitors on all SIP servers along the path of a request and an
   Actuator on the sender.  In this case, the Control Functions
   associated with each Monitor have to cooperate to jointly determine
   the overall feedback for this path.  Finally, overload control can
   be implemented locally on a SIP server if Monitor and Actuator
   reside on the same server.
   In this case, the sending entity and receiving entity are the same
   SIP server and Actuator and Monitor operate on the same SIP
   processor (although the Actuator typically operates on a pre-
   processing stage in local overload control).  Local overload control
   is an internal overload control mechanism as the control loop is
   implemented internally on one server.  Hop-by-hop and end-to-end are
   external overload control mechanisms.  All three configurations are
   shown in Figure 2.

                 +---------+           +------(+)---------+
        +------+ |         |           |       ^          |
        |      | |     +---+           |       |      +---+
        v      | v //=>| C |           v       |  //=>| C |
      +---+    +---+// +---+         +---+    +---+// +---+
      | A |===>| B |                 | A |===>| B |
      +---+    +---+\\ +---+         +---+    +---+\\ +---+
                 ^ \\=>| D |           ^       |  \\=>| D |
                 |     +---+           |       |      +---+
                 |         |           |       v          |
                 +---------+           +------(+)---------+

          (a) hop-by-hop                  (b) end-to-end

                         +-+
                         v |
       +-+     +-+     +---+
       v |     v | //=>| C |
      +---+    +---+// +---+
      | A |===>| B |
      +---+    +---+\\ +---+
                   \\=>| D |
                       +---+
                         ^ |
                         +-+

          (c) local

      ==> SIP request flow
      <-- Overload feedback loop

        Figure 2: Degree of Cooperation between Servers

5.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all neighboring SIP servers that directly
   exchange traffic.  That is, the Actuator is located on the SIP
   server that is the direct upstream neighbor of the SIP server that
   has the corresponding Monitor.  Each control loop between two
   servers is completely independent of the control loop between other
   servers further up- or downstream.  In the example in Figure 2(a),
   three independent overload control loops are instantiated: A - B,
   B - C and B - D.  Each loop only controls a single hop.  Overload
   feedback received from a downstream neighbor is not forwarded
   further upstream.  Instead, a SIP server acts on this feedback, for
   example, by re-routing or rejecting traffic if needed.  If the
   upstream neighbor of a server also becomes overloaded, it will
   report this problem to its upstream neighbors, which again take
   action based on the reported feedback.  Thus, in hop-by-hop overload
   control, overload is always resolved by the direct upstream
   neighbors of the overloaded server without the need to involve
   entities that are located multiple SIP hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and can avoid congestion collapse.  It is simple and scales
   well to networks with many SIP entities.  A key advantage is that it
   does not require feedback to be transmitted across multiple hops,
   possibly crossing multiple trust domains.  Feedback is sent to the
   next hop only.  Furthermore, it does not require a SIP entity to
   aggregate a large number of overload status values or keep track of
   the overload status of SIP servers it is not communicating with.

5.2.  End-to-End

   End-to-end overload control implements an overload control loop
   along the entire path of a SIP request, from UAC to UAS.  An end-to-
   end overload control mechanism consolidates overload information
   from all SIP servers on the way (including all proxies and the UAS)
   and uses this information to throttle traffic as far upstream as
   possible.  An end-to-end overload control mechanism has to be able
   to frequently collect the overload status of all servers on the
   potential path(s) to a destination and combine this data into
   meaningful overload feedback.

   A UA or SIP server only needs to throttle requests if it knows that
   these requests will eventually be forwarded to an overloaded server.
   For example, if D is overloaded in Figure 2(b), A should only
   throttle requests it forwards to B when it knows that they will be
   forwarded to D.  It should not throttle requests that will
   eventually be forwarded to C, since server C is not overloaded.  In
   many cases, it is difficult for A to determine which requests will
   be routed to C and D since this depends on the local routing
   decision made by B.  These routing decisions can be highly variable
   and, for example, depend on call routing policies configured by the
   user, services invoked on a call, load balancing policies, etc.  The
   fact that a previous call to a target has been routed through an
   overloaded server does not necessarily mean the next call to this
   target will also be routed through the same server.

   Overall, the main problem of end-to-end overload control is its
   inherent complexity, since a UAC or SIP server needs to monitor all
   potential paths to a destination in order to determine which
   requests should be throttled and which requests may be sent.  Even
   if this information is available, it is not clear which path a
   specific request will take.  Therefore, end-to-end overload control
   is likely to only work well in simple, well-known topologies (e.g.,
   a server that is known to only have one downstream neighbor).

   A key difference from transport protocols using end-to-end
   congestion control such as TCP is that the traffic exchanged by SIP
   servers consists of many individual SIP messages.  Each of these SIP
   messages has its own source and destination.  This is different
   from TCP, which controls a stream of packets between a single source
   and a single destination.

5.3.  Local Overload Control

   The idea of local overload control is to run the Monitor and
   Actuator on the same server.  This enables the server to monitor the
   current resource usage and to reject messages that can't be
   processed without overusing the local resources.  The fundamental
   assumption behind local overload control is that it is less
   resource-consuming for a server to reject messages than to process
   them.  A server can therefore reject the excess messages it cannot
   process, stopping all retransmissions of these messages.

   Local overload control can be used in conjunction with an implicit
   or explicit overload control mechanism and provides an additional
   layer of protection against overload.  It is fully implemented on
   the local server and does not require any cooperation from upstream
   neighbors.  In general, servers should use implicit or explicit
   overload control techniques before using local overload control as
   a mechanism of last resort.
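   As a minimal sketch of local overload control (illustrative only;
   the queue bound, the helper names and the rejection policy are
   hypothetical choices, not requirements of this document):

      # Local overload control: Monitor and Actuator on the same
      # server, with the Actuator at a pre-processing stage.
      from collections import deque

      MAX_QUEUE = 1000          # hypothetical bound on queued messages
      queue = deque()

      def send_cheap_503(raw_msg: bytes) -> None:
          # Stand-in for generating a stateless 503 (Service
          # Unavailable); rejecting here must cost less than fully
          # processing the message would.
          pass

      def on_message(raw_msg: bytes) -> None:
          # Monitor: the input queue length is the load sample (S).
          if len(queue) >= MAX_QUEUE:
              # Actuator: reject before the message consumes SIP
              # processor resources.
              send_cheap_503(raw_msg)
              return
          queue.append(raw_msg)  # handed to the SIP processor later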
6.  Topologies

   The following topologies describe four generic SIP server
   configurations.  These topologies illustrate specific challenges for
   an overload control mechanism.  An actual SIP server topology is
   likely to consist of combinations of these generic scenarios.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E and F) receives traffic from a single source A.  A
   load balancer is a typical example of such a configuration.  In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its
   downstream neighbors D, E and F.  If one of the downstream neighbors
   becomes overloaded, A can direct traffic to the servers that still
   have capacity.  If one of the servers serves as a backup, it can be
   activated once one of the primary servers reaches overload.

   If A can reliably determine that D, E and F are its only downstream
   neighbors and all of them are in overload, it may choose to report
   overload upstream on behalf of D, E and F.  However, if the set of
   downstream neighbors is not fixed or only some of them are in
   overload, then A should not use overload control, since A can still
   forward the requests destined to non-overloaded downstream
   neighbors.  These requests would be throttled as well if A used
   overload control towards its upstream neighbors.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.  The set of active upstream neighbors of D
   can change as servers may become inactive and previously inactive
   servers may start contributing traffic to D.

   If D becomes overloaded, it needs to generate feedback to reduce the
   amount of traffic it receives from its upstream neighbors.  D needs
   to decide by how much each upstream neighbor should reduce traffic.
   This decision can require the consideration of the amount of traffic
   sent by each upstream neighbor and it may need to be re-adjusted as
   the traffic contributed by each upstream neighbor varies over time.

   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

                 +---+            +---+
              /->| D |            | A |-\
             /   +---+            +---+  \
            /                             \   +---+
       +---+-/   +---+            +---+    \->|   |
       | A |---->| E |            | B |------>| D |
       +---+-\   +---+            +---+    /->|   |
            \                             /   +---+
             \   +---+            +---+  /
              \->| F |            | C |-/
                 +---+            +---+

       (a) load balancer          (b) multiple sources

       +---+
       | A |---\                  a--\
       +---+-\  \---->+---+           \
              \/----->| D |       b--\ \--->+---+
       +---+--/\  /-->+---+           \---->|   |
       | B |    \/                c-------->| D |
       +---+---\/\--->+---+                 |   |
            /\------->| E |       ...  /--->+---+
       +---+--/  /--->+---+           /
       | C |------/               z--/
       +---+

       (c) mesh                   (d) edge proxy

                       Figure 3: Topologies

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   infrequently sends a request.  This scenario is shown in
   Figure 3(d).  An edge proxy that is connected to many UAs is a
   typical example of such a configuration.

   Since each UA typically only contributes a few requests, which are
   often related to the same call, it can't decrease its message rate
   to resolve the overload.  In such a configuration, a SIP server can
   resort to local overload control by rejecting a percentage of the
   requests it receives with 503 (Service Unavailable) responses.
   Since there are many upstream neighbors that contribute to the
   overall load, sending 503 (Service Unavailable) to a fraction of
   them can gradually reduce load without entirely stopping all
   incoming traffic.  The Retry-After header can be used in 503
   (Service Unavailable) responses to ask UAs to wait a given number of
   seconds before trying the call again.  However, using 503 (Service
   Unavailable) towards individual sources cannot prevent overload if a
   large number of users place calls at the same time.

      Note: The requirements of the "edge proxy" topology are different
      from those of the other topologies, which may require a different
      method for overload control.
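   As a hedged sketch of this behavior (the rejection fraction, the
   Retry-After value and the helper function are hypothetical examples,
   not values recommended by this document):

      # Percentage-based rejection with 503 at a server that receives
      # requests from many infrequent senders (the "edge proxy" case).
      import random

      REJECT_FRACTION = 0.2   # hypothetical; an overload control
                              # algorithm would derive this from load

      def process(request: str) -> str:
          return "SIP/2.0 100 Trying\r\n\r\n"  # stand-in for real work

      def handle_request(request: str) -> str:
          if random.random() < REJECT_FRACTION:
              # Ask the UA to wait before retrying the call.
              return ("SIP/2.0 503 Service Unavailable\r\n"
                      "Retry-After: 5\r\n\r\n")
          return process(request)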
7.  Explicit Overload Control Feedback

   Explicit overload control feedback enables a receiver to indicate
   how much traffic it wants to receive.  Explicit overload control
   mechanisms can be differentiated based on the type of information
   conveyed in the overload control feedback.

7.1.  Rate-based Overload Control

   The key idea of rate-based overload control is to limit the request
   rate at which an upstream element is allowed to forward traffic to
   the downstream neighbor.  If overload occurs, a SIP server instructs
   each upstream neighbor to send at most X requests per second.  Each
   upstream neighbor can be assigned a different rate cap.

   An example algorithm for the Actuator in a sending entity to
   implement a rate cap is request gapping.  After transmitting a
   request to a downstream neighbor, a server waits for 1/X seconds
   before it transmits the next request to the same neighbor.  Requests
   that arrive during the waiting period are not forwarded and are
   either redirected, rejected or buffered.
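   A minimal sketch of request gapping (illustrative; the send helper
   and the caller's drop policy are hypothetical):

      # Request gapping: after forwarding a request, wait 1/X seconds
      # before forwarding the next request to the same neighbor.
      import time

      def forward(request: str) -> None:
          pass                      # stand-in for sending downstream

      class GappingActuator:
          def __init__(self, rate_cap: float):
              self.gap = 1.0 / rate_cap  # X = rate_cap requests/second
              self.next_send = 0.0

          def try_forward(self, request: str) -> bool:
              now = time.monotonic()
              if now < self.next_send:
                  return False      # caller redirects, rejects or buffers
              self.next_send = now + self.gap
              forward(request)
              return True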
   The rate cap ensures that the number of requests received by a SIP
   server never increases beyond the sum of all rate caps granted to
   upstream neighbors.  Rate-based overload control protects a SIP
   server against overload even during load spikes, assuming there are
   no new upstream neighbors that start sending traffic.  New upstream
   neighbors need to be considered in all rate caps currently assigned
   to upstream neighbors.  The current overall rate cap of a SIP server
   is determined by an overload control algorithm, e.g., based on
   system load.

   Rate-based overload control requires a SIP server to assign a rate
   cap to each of its upstream neighbors while it is activated.
   Effectively, a server needs to assign a share of its overall
   capacity to each upstream neighbor.  A server needs to ensure that
   the sum of all rate caps assigned to upstream neighbors is not
   (significantly) higher than its actual processing capacity.  This
   requires a SIP server to keep track of the set of upstream neighbors
   and to adjust the rate cap if a new upstream neighbor appears or an
   existing neighbor stops transmitting.  For example, if the capacity
   of the server is X and this server is receiving traffic from two
   upstream neighbors, it can assign a rate of X/2 to each of them.  If
   a third sender appears, the rate for each sender is lowered to X/3.
   If the rate cap assigned to upstream neighbors is too high, a server
   may still experience overload.  If the cap is too low, the upstream
   neighbors will reject requests even though they could be processed
   by the server.

   An approach for estimating a rate cap for each upstream neighbor is
   to use a fixed proportion of a control variable, X, where X is
   initially equal to the capacity of the SIP server.  The server then
   increases or decreases X until the workload arrival rate matches the
   actual server capacity.  Usually, this will mean that the sum of the
   rate caps sent out by the server (i.e., X) exceeds its actual
   capacity, but enables upstream neighbors that are not generating
   more than their fair share of the work to be effectively
   unrestricted.  In this approach, the server only has to measure the
   aggregate arrival rate.  However, since the overall rate cap is
   usually higher than the actual capacity, brief periods of overload
   may occur.

7.2.  Loss-based Overload Control

   A loss percentage enables a SIP server to ask an upstream neighbor
   to reduce the number of requests it would normally forward to this
   server by a percentage X.  For example, a SIP server can ask an
   upstream neighbor to reduce the number of requests this neighbor
   would normally send by 10%.  The upstream neighbor then redirects or
   rejects X percent of the traffic that is destined for this server.

   An algorithm for the sending entity to implement a loss percentage
   is to draw a random number between 1 and 100 for each request to be
   forwarded.  The request is not forwarded to the server if the random
   number is less than or equal to X.
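   In code, the algorithm above could look like this (a sketch; the
   function name is a hypothetical choice):

      # Loss-based throttle: do not forward X percent of the requests
      # destined for the overloaded server.
      import random

      def should_forward(loss_percentage: int) -> bool:
          # Draw a random number between 1 and 100; the request is not
          # forwarded if the number is less than or equal to X.
          return random.randint(1, 100) > loss_percentage

   A request that is not forwarded is then redirected or rejected, as
   described above.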
   An advantage of loss-based overload control is that the receiving
   entity does not need to track the set of upstream neighbors or the
   request rate it receives from each upstream neighbor.  It is
   sufficient to monitor the overall system utilization.  To reduce
   load, a server can ask its upstream neighbors to lower the traffic
   forwarded by a certain percentage.  The server calculates this
   percentage by combining the loss percentage that is currently in use
   (i.e., the loss percentage the upstream neighbors are currently
   using when forwarding traffic), the current system utilization and
   the desired system utilization.  For example, if the server load
   approaches 90% and the current loss percentage is set to a 50%
   traffic reduction, then the server can decide to increase the loss
   percentage to 55% in order to get to a system utilization of 80%.
   Similarly, the server can lower the loss percentage if permitted by
   the system utilization.

   Loss-based overload control requires that the throttle percentage be
   adjusted to the current overall number of requests received by the
   server.  This is particularly important if the number of requests
   received fluctuates quickly.  For example, if a SIP server sets a
   throttle value of 10% at time t1 and the number of requests
   increases by 20% between time t1 and t2 (t1