2	Network Working Group                                        B. Campbell
3	Internet-Draft                                                   Tekelec
4	Intended status: Informational                             July 15, 2013
5	Expires: January 16, 2014

7	               Diameter Overload Control Solution Issues
8	                 draft-campbell-dime-overload-issues-01

10	Abstract

12	   The Diameter Maintenance and Extensions (DIME) working group has
13	   undertaken an "overload control" work item, with the goal of
14	   standardizing a mechanism to allow Diameter nodes to report overload
15	   information among themselves.  Requirements currently include, among
16	   others, the need to accurately report the scope of overload
17	   conditions, and the ability to report overload information between
18	   nodes that are not directly connected at the transport layer.  These
19	   requirements introduce complex issues.  This document describes those
20	   issues, in the hope that it will assist the working group's decision
21	   process.

23	Status of This Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on January 16, 2014.

40	Copyright Notice

42	   Copyright (c) 2013 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  Document Conventions
59	   3.  Non-adjacent Overload Information
60	     3.1.  Use-Cases for Non-adjacent Overload Control
61	       3.1.1.  Interconnect
62	       3.1.2.  Non-Supporting Agents
63	     3.2.  Issues with Non-Adjacent Overload Control
64	       3.2.1.  Topology Issues
65	       3.2.2.  Support Negotiation
66	       3.2.3.  Overload Report Delivery
67	       3.2.4.  Non-Adjacent Overload Scopes
68	     3.3.  Non-adjacent Overload Control Recommendations
69	   4.  Overload Scopes
70	     4.1.  Explicit vs Implicit Indication of Scopes
71	     4.2.  Types of Overload Scopes
72	       4.2.1.  Connection Scope-Type
73	       4.2.2.  Peer Scope-Type
74	       4.2.3.  Destination-Host Scope-Type
75	       4.2.4.  Origin-Host Scope-Type
76	       4.2.5.  Diameter-Application Scope-Type
77	       4.2.6.  Destination-Realm Scope-Type
78	       4.2.7.  Session Scope-Type
79	       4.2.8.  Session-Group Scope-Type
80	     4.3.  Scope Values
81	     4.4.  Combining Scopes
82	     4.5.  Scope Extensibility
83	     4.6.  Scope Recommendations
84	   5.  IANA Considerations
85	   6.  Security Considerations
86	   7.  References
87	     7.1.  Normative References
88	     7.2.  Informative References
89	   Appendix A.  Contributors
90	   Author's Address

92	1.  Introduction

94	   When a Diameter [RFC6733] server or agent becomes overloaded, it
95	   needs to be able to gracefully reduce its load, typically by
96	   requesting other nodes to reduce the number of Diameter requests for
97	   some period of time.

99	   The Diameter Overload Control Requirements
100	   [I-D.ietf-dime-overload-reqs] describe requirements for overload
101	   control mechanisms.  Requirement 31 states that Diameter nodes must
102	   be able to report overload with sufficient granularity to avoid
103	   forcing available capacity to go unused.  Requirement 34 requires the
104	   ability to report overload across Diameter nodes that do not support
105	   the mechanism.  These requirements introduce significant and
106	   interrelated complexities to potential solutions.  This document
107	   describes the related issues.  The author hopes that this document
108	   will assist the working group's decision process related to these
109	   requirements.

111	   At the time of this writing, there have been two proposals for
112	   Diameter overload control solutions.  "A Mechanism for Diameter
113	   Overload Control" (MDOC) [I-D.roach-dime-overload-ctrl] defines a
114	   solution that piggybacks overload and load state information over
115	   existing Diameter messages.  "The Diameter Overload Control
116	   Application" (DOCA) [I-D.korhonen-dime-ovl] defines a solution that
117	   uses a new dedicated Diameter application to communicate similar
118	   information.

120	      While there are significant differences between the two proposals,
121	      they carry similar information.  In many ways, the issues related
122	      to Requirements 31 and 34 apply to both proposals.  This
123	      discussion is not specific to one proposal or the other, unless
124	      explicitly mentioned.

126	   This document serves two purposes.  The primary purpose is to explore
127	   the issues related to Requirement 34, that is, the requirement for
128	   the overload control mechanism to support sending load and overload
129	   information across intermediaries that do not support the mechanism
130	   (referred to herein as "non-adjacent" overload reporting).  The
131	   document describes two use cases for non-adjacent overload reporting.
132	   It does not, however, attempt to describe the use cases for Diameter
133	   agents in general.  For a more thorough treatment of Diameter agent
134	   use cases in the context of overload control, please see
135	   [I-D.ietf-dime-overload-reqs].

137	   The secondary purpose is to help the reader understand the concept of
138	   overload scopes, and make recommendations about what kinds of
139	   overload scope should be supported by the mechanism.  These purposes
140	   are interrelated, since an understanding of overload scopes is
141	   necessary to fully understand some of the issues with non-adjacent
142	   overload reporting.

144	2.  Document Conventions

146	   This document uses terms defined in [RFC6733] and
147	   [I-D.ietf-dime-overload-reqs].  In particular, the terms "client",
148	   "server", "upstream", and "downstream" are used as defined in RFC
149	   6733.  In addition, this document uses the following terms:

151	   Overload: A condition where a Diameter node needs a reduction in the
152	             number of requests that it must handle.

154	   Overload Report:  A request to reduce traffic that contributes to an
155	             overload condition.

157	   Overload Scope:  A classifier that defines the set of requests that
158	             may contribute to particular overload conditions.
159	             Alternatively, the purposes for which a node may be
160	             overloaded.  For example, if a server is overloaded for the
161	             purposes of one Diameter application but not another, the
162	             overload condition can be considered "scoped" to that
163	             application.

165	   Reporting Node:  The node that sends an overload report.  Also known
166	             as an "overloaded node".

168	   Reacting Node:  A node that consumes and possibly acts on an overload
169	             report.

171	   Adjacent Overload Reporting:  Overload reports exchanged between
172	             adjacent Diameter peers.

174	   Non-Adjacent Overload Reporting:  Overload reports sent between
175	             Diameter nodes separated by one or more intermediate
176	             Diameter agents (i.e., relays or proxies).

178	   Piggybacked Overload Reporting:  The inclusion of overload reports in
179	             existing Diameter messages.

181	   Application-Based Overload Reporting:  The sending of overload
182	             reports in a separate, dedicated Diameter application.

184	3.  Non-adjacent Overload Information

186	   Requirement 34 of [I-D.ietf-dime-overload-reqs] says that the
187	   selected Diameter overload control mechanism "SHOULD" be able to
188	   communicate overload and load information across intermediaries that
189	   do not support the mechanism.  This requirement complicates the
190	   solution effort in several ways: it affects how Diameter nodes
191	   negotiate support for overload control, address and route overload
192	   reports to the right places, and act on received overload
193	   reports.

195	   While the requirement does not explicitly say it, we interpret
196	   "intermediaries" in this context to mean Diameter agents.  The
197	   requirement is irrelevant for lower layer intermediaries (e.g.
198	   routers), and cannot be reasonably applied for non-Diameter entities,
199	   or hybrid entities such as gateways between Diameter and other
200	   protocols.

202	   The requirement to traverse non-supporting intermediaries is not
203	   necessarily the same thing as a requirement for end-to-end
204	   communication of overload reports between Diameter clients and
205	   servers.  Non-adjacent reporting can include client-to-server
206	   scenarios.  It can also include server-to-agent scenarios and
207	   agent-to-client scenarios.  All such scenarios may include one or
208	   more intervening agents.  Since Diameter allows transactions to be
209	   sent from server to client, all scenarios may be reversed.
210	   Therefore, we refer to this requirement as "Non-adjacent Overload
211	   Control".

213	3.1.  Use-Cases for Non-adjacent Overload Control

215	   There are two primary use-cases for non-adjacent overload control.

217	3.1.1.  Interconnect

219	   The first significant non-adjacent use-case is the interconnect
220	   scenario described in Section 2.3 of the overload control
221	   requirements [I-D.ietf-dime-overload-reqs].  Two or more Diameter
222	   network operators communicate with each other across a third-party
223	   interconnect provider that brokers Diameter traffic between the
224	   operators.  Figure 1 illustrates the interconnect use case.

226	                +-------------------------------------------+
227	                |               Interconnect                |
228	                |                                           |
229	                |   +--------------+      +--------------+  |
230	                |   |     Agent    |------|     Agent    |  |
231	                |   +--------------+      +--------------+  |
232	                |         .'                      `.        |
233	                +------.-'--------------------------`.------+
234	                     .'                               `.
235	                  .-'                                   `.

237	    ------------.'-----+                             +----`.------------
238	          +----------+ |                             | +----------+
239	          |Edge Agent|                               | |Edge Agent|
240	          +----------+ |                             | +----------+
241	                       |                             |
242	            Operator 1 |                             |  Operator 2
243	    -------------------+                             +------------------

245	               Figure 1: Two Operator Interconnect Scenario

247	   If the interconnect provider does not support Diameter overload
248	   control, each operator network becomes an island of overload control,
249	   similar to those in the non-supporting agent use-case
250	   (Section 3.1.2).  Even if the interconnect provider does support
251	   overload control, the operators may not trust it to generate and act
252	   on overload reports on the operators' behalf, and may prefer to
253	   exchange overload and load information directly with each other.

255	   The interconnect use-case may introduce additional security concerns.
256	   While the non-supporting agent use case typically (but not
257	   necessarily) occurs inside a single administrative domain, the
258	   interconnect case will almost always involve sending overload reports
259	   across multiple administrative domains.  Since a malicious or
260	   incorrect overload report can effectively shut down Diameter
261	   processing, the current lack of a viable solution for end-to-end
262	   integrity protection of Diameter messages may be a problem.

264	3.1.2.  Non-Supporting Agents

266	   [I-D.ietf-dime-overload-reqs] requires the solution to function in
267	   networks where not all Diameter elements support it.  That is, the
268	   solution must allow gradual deployment, and must not require a flag-
269	   day cutover.  If non-adjacent overload control is not supported, one
270	   or more non-supporting Diameter Agents can divide a network into
271	   overload control islands, where overload information is communicated
272	   inside each island, but not among separate islands.

274	      In the author's strictly personal opinion, the non-supporting
275	      agent use case is less compelling than the interconnect case.  The
276	      non-supporting agent case would typically occur inside one
277	      administrative domain.  The operator of that domain has
278	      considerably more control over the implementations used in the
279	      domain than it might have for third-party domains.

281	3.2.  Issues with Non-Adjacent Overload Control

283	3.2.1.  Topology Issues
284	   Many of the issues with non-adjacent overload control derive from the
285	   fact that a Diameter node is unlikely to know the topology of the
286	   Diameter network past its immediate peers.  In a trivial topology,
287	   that is, a Diameter network with only clients and servers, this is
288	   not a problem.  But if the immediate peer is a Diameter agent, a node
289	   is unlikely to know what next hop the agent will select for a given
290	   Diameter message.  This is particularly difficult if the agent hides
291	   topology in either direction, or uses dynamic peer discovery.  While
292	   a node may be able to infer the path a given message will take in
293	   some specific cases (e.g., for mid-session messages), it cannot do
294	   this in general.  And even those specific cases may fail if an agent
295	   on the message path performs topology hiding.

297	   This lack of topology knowledge impacts the way that nodes can
298	   negotiate overload-control support, the ways they send overload
299	   reports, and the ways a reacting node can act to mitigate overload.
300	   A non-adjacent overload-control mechanism will need to solve the
301	   topology issues, either by offering ways to discover non-adjacent
302	   topologies, or offering ways to constrain overload-control relevant
303	   parts of such topologies in ways where a node could reasonably know
304	   them in advance.

306	3.2.2.  Support Negotiation

308	   Diameter nodes need to negotiate or otherwise indicate their support
309	   for overload control to other nodes.  This includes indicating
310	   support for overload control in general, as well as potentially
311	   indicating support of certain parameters of the overload control
312	   solution.  For example, a node may need to indicate which overload
313	   algorithms it supports.  This becomes complex if two non-adjacent
314	   nodes need to negotiate support.

316	   In a Diameter application-based solution, support for the overload
317	   control application would be indicated during the capabilities exchange
318	   between peers.  Diameter capabilities exchange occurs strictly
319	   between peers; Diameter offers no mechanism for indicating support of
320	   a given Application-ID between non-adjacent nodes.

322	   Diameter allows non-negotiated use of an arbitrary Application-Id
323	   between non-adjacent nodes across Diameter agents that implement the
324	   Diameter Relay application.  In theory, this means that an
325	   application-based, non-adjacent overload control could only traverse
326	   Diameter relays, or Diameter proxies that explicitly support the
327	   overload-control Application-Id.  In the latter case, we assume that
328	   a proxy will not indicate support for the overload-control
329	   Application-Id unless it supports the overload-control mechanism;
330	   such a proxy cannot be considered a non-supporting agent.

332	   In practice, a Diameter agent can act as a proxy for some purposes
333	   and a relay for others.  If a Diameter proxy indicates support for
334	   the Diameter relay application, we assume that it will relay any
335	   arbitrary application.  This means it can be considered a relay for
336	   the purposes of overload control.

338	   For both application-based and piggybacked solutions, a supporting
339	   node needs to know the other nodes with which it should negotiate.  For
340	   overload-control between Diameter peers, this is easy; a node
341	   exchanges support information with its immediate peers.  But for non-
342	   adjacent overload control, this is more difficult for reasons
343	   discussed in Section 3.2.1.

345	   Therefore, for non-adjacent overload control negotiation, each
346	   supporting node either needs advance knowledge of all nodes with
347	   which it may negotiate overload-control support, or it needs a
348	   mechanism for discovering that knowledge dynamically.

350	3.2.3.  Overload Report Delivery

352	   With adjacent overload control reporting, overload report addressing
353	   and delivery is relatively simple.  A node sends overload reports
354	   directly to its peers.  This becomes more complex for non-adjacent
355	   overload-control.

357	   For application-based overload control, nodes could address overload
358	   reports to specific endpoint nodes using the Destination-Host AVP.
359	   Doing so would be subject to the same non-adjacent topology issues
360	   described in Section 3.2.1.  That is, a node can only send overload
361	   reports to non-adjacent clients or servers that it knows about,
362	   either from prior knowledge (i.e. provisioning) or from which it has
363	   observed previous Diameter messages.

365	   An application-based mechanism could possibly address reports to non-
366	   adjacent Diameter agents using the Destination-Host AVP.  This would
367	   effectively make the agent into an endpoint for the overload-control
368	   application.

370	   A piggy-backed mechanism will have more difficulty addressing non-
371	   adjacent overload reports.  A piggy-backed mechanism sends overload
372	   reports in already existing Diameter requests; that is, requests that
373	   have their own purposes and destinations independent of the overload-
374	   report.  Thus, nodes can only select the destination of an overload
375	   report by bundling it into a Diameter message that was already going
376	   to that destination.  While a piggy-backed mechanism might be able to
377	   send overload-reports across quiescent transport connections using
378	   watchdog (DWR/DWA) messages, these messages cannot be exchanged
379	   between non-adjacent nodes.
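
   The delivery constraint described above can be illustrated with a
   minimal sketch, written in Python for illustration only.  It assumes
   a hypothetical PiggybackQueue class in which a report waits until an
   outgoing message happens to be bound for the same destination.  The
   class, its method names, and the dictionary-based message model are
   assumptions made for this example and are not part of any proposal.

```python
class PiggybackQueue:
    """Holds overload reports until a message to the same destination appears.

    Hypothetical model: messages are plain dicts with a "destination" key.
    """

    def __init__(self):
        # destination identity -> most recent pending report
        self._pending = {}

    def queue_report(self, destination, report):
        # A newer report for the same destination replaces the older one.
        self._pending[destination] = report

    def attach(self, message):
        """Return the message, with a pending report attached if one exists."""
        report = self._pending.pop(message["destination"], None)
        if report is not None:
            message = dict(message, overload_report=report)
        return message

    def undelivered(self):
        """Reports still waiting for traffic toward their destination."""
        return dict(self._pending)
```

   A report queued for a destination that receives no further traffic
   stays in undelivered() indefinitely, which is the situation a
   piggybacked mechanism cannot resolve on its own.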

381	      In some cases, the limit of sending overload reports to
382	      destinations to which existing traffic is bound may be acceptable.
383	      If a node is contributing to an overload condition, then it's
384	      reasonable to assume that node is regularly exchanging traffic
385	      with the overloaded node.  However, there may be cases where an
386	      overload report causes a connection to become quiescent.  If the
387	      reporting node needed to tell a reacting node that the condition
388	      has resolved or improved, it would need to send a new report
389	      across the now quiescent connection.  There may also be cases
390	      where a reacting node redirects traffic along a different path,
391	      causing a previously quiescent node to suddenly start sending
392	      requests to the overloaded node.  Thus, without careful selection
393	      of the overload report scope, an overloaded node may find itself
394	      engaged in a game of Whack-a-Mole [Whac-a-Mole] with previously
395	      quiescent non-adjacent nodes.

397	   For both piggy-backed and application-based solutions, non-adjacent
398	   overload control introduces a need to identify the sender of a
399	   report, or at least determine whether the report is from an adjacent
400	   or non-adjacent node.  This is not required for purely adjacent
401	   solutions, since the sender could always be assumed to be the peer.

403	   For example, a non-adjacent report with a "Connection" scope does not
404	   make sense.  If a node receives one, it should ignore it.  But in
405	   order to make that decision, it must be able to distinguish a
406	   non-adjacent report from an adjacent one.  For example, in an
407	   application-based mechanism,

409	3.2.4.  Non-Adjacent Overload Scopes

411	   A reacting node will typically attempt to mitigate an overload
412	   condition by either reducing the number of requests that contribute
413	   to the condition, or by rerouting part of that traffic to avoid the
414	   problem.  In both cases, the reacting node is limited by its
415	   ability to determine which Diameter requests contribute to the
416	   overload condition in the first place.  The overload scope concept
417	   (Section 4) offers a way for overloaded nodes to indicate what
418	   traffic is likely to contribute to an overload condition and should
419	   be abated.

421	   Not all of the scope-types described in Section 4 make sense for non-
422	   adjacent overload control.  The "Connection" scope-type is an obvious
423	   example, since the reacting node will never share a transport
424	   connection with a non-adjacent node; this is the very definition of
425	   non-adjacent nodes.

427	   Since a Diameter node cannot control how requests are forwarded to
428	   non-adjacent nodes, the "Peer" scope-type also does not work well,
429	   especially when there are multiple possible destinations up or
430	   downstream from the adjacent peer.  For example in Figure 2, Node A
431	   sends Diameter requests to Nodes B and C across a non-supporting
432	   agent.  If Node B becomes overloaded but Node C does not, Node A
433	   cannot reroute requests to Node C, since it has very little way to
434	   influence where the agent will forward any given request.  If Node A
435	   tries to reduce traffic by 50%, the agent will likely still send half
436	   of the remaining traffic to Node B. If B and C are endpoints, Node A
437	   may in some cases be able to use the Destination-Host AVP for this
438	   purpose (in which case the "Destination-Host" scope-type would be
439	   more appropriate), but this does not help if B and C are also agents
440	   rather than servers.

442	                      +--------+       +--------+
443	                      | Node B |       | Node C |
444	                      +----+---+       +---+----+
445	                           |               |
446	                           +-------+-------+
447	                                   |
448	                           +-------+--------+
449	                           | Non-Supporting |
450	                           |  Agent         |
451	                           +-------+--------+
452	                                   |
453	                                   |
454	                              +----+----+
455	                              | Node  A |
456	                              +---------+

458	                      Figure 2: Non-Adjacent Routing
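
   The effect described for Figure 2 can be checked with simple
   arithmetic.  The sketch below assumes that the non-supporting agent
   splits traffic evenly between Nodes B and C and that Node A
   initially offers 100 requests per second.  Both figures, and the
   delivered_load helper, are illustrative assumptions.

```python
def delivered_load(offered, split):
    """Return per-node request rates after the agent applies its split.

    offered: total request rate that Node A hands to the agent.
    split:   mapping of upstream node name -> share of traffic (sums to 1).
    """
    return {node: offered * share for node, share in split.items()}


# Assumption: the agent balances evenly and ignores overload state.
even_split = {"B": 0.5, "C": 0.5}

before = delivered_load(100.0, even_split)  # Node A sends everything
after = delivered_load(50.0, even_split)    # Node A sheds 50% overall
```

   Under these assumptions Node B's load falls from 50 to 25 requests
   per second, but so does Node C's, even though Node C was never
   overloaded.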

460	   Scope-types that classify traffic by origin or final destinations,
461	   such as "Origin-Host", "Destination-Realm", "Application-ID", and
462	   "Destination-Host" can be used for non-adjacent overload control.  In
463	   general, scope-types that may denote non-adjacent intermediary
464	   devices, such as "Peer", cannot, nor can scope-types that refer only
465	   to peers, e.g., "Connection".
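
   A reacting node could apply this rule as a simple filter.  The
   following sketch is one possible reading, not a specified behavior.
   The function name and the sender_is_adjacent flag are hypothetical,
   and the code assumes that adjacent reports may carry any scope-type.
   Scope-types that this section does not classify (for example,
   "Session") are rejected for non-adjacent reports purely as a
   conservative assumption.

```python
# Scope-types this section describes as usable for non-adjacent reports.
NON_ADJACENT_SCOPE_TYPES = frozenset(
    {"Origin-Host", "Destination-Realm", "Application-ID", "Destination-Host"}
)


def should_process(scope_type, sender_is_adjacent):
    """Return True if a reacting node should act on the report."""
    if sender_is_adjacent:
        # Assumption: an adjacent peer may use any scope-type.
        return True
    # Non-adjacent: accept only origin- or destination-oriented scope-types.
    return scope_type in NON_ADJACENT_SCOPE_TYPES
```

   The filter only works if the reacting node can tell whether the
   sender is adjacent, which is why sender identification matters for
   non-adjacent overload control.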

467	   Even for destination-oriented scope-types, the sender of an overload
468	   report must be authoritative for the indicated scope.  That is, it
469	   must have full knowledge of the congestion state for the scope.  For
470	   example, if Node B and C both serve the realm "example.com", and B
471	   becomes 50% overloaded while C does not, B cannot simply report 50%
472	   overload at realm scope.  If it did, Node A would reduce its
473	   generated traffic by 50%.  Since the overall realm is really only
474	   overloaded by 25% (assuming B and C have equal capacity), this would
475	   leave the realm operating beneath available capacity.
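
   The realm-wide figure follows from a weighted average.  The sketch
   below assumes that each node's overload is expressed as the fraction
   of its traffic it needs shed, and that Nodes B and C normally
   receive equal shares of the realm's traffic.  The realm_overload
   helper and its input format are illustrative assumptions.

```python
def realm_overload(nodes):
    """Traffic-weighted overload for a group of nodes serving one realm.

    nodes: iterable of (traffic_share, requested_reduction) pairs, where
           requested_reduction is a fraction between 0 and 1.
    """
    nodes = list(nodes)
    total_share = sum(share for share, _ in nodes)
    if total_share <= 0:
        raise ValueError("total traffic share must be positive")
    excess = sum(share * reduction for share, reduction in nodes)
    return excess / total_share


# Node B needs a 50% reduction; Node C needs none; equal traffic shares.
example = realm_overload([(1.0, 0.5), (1.0, 0.0)])
```

   With these inputs the realm as a whole needs a 25% reduction, half
   of what Node B would report if it spoke for the realm alone.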

477	      The need to be authoritative for an indicated scope is also true
478	      for strictly adjacent reporting mechanisms.  But in an adjacent
479	      mechanism, it is easier for an intervening agent to learn the
480	      overload state of upstream nodes.  In the example, if the agent
481	      supported the overload control mechanism, it would most likely
482	      receive reports from Nodes B and C, and could then construct
483	      downstream reports that incorporate the state of B, C, and its own
484	      local state.  This contrasts with the non-adjacent case where B
485	      must understand the current state of C even though it is not in
486	      the path of overload reports from C.

488	   Therefore, a given node must only report overload for scopes for
489	   which it has full knowledge of the load and overload state.  That is,
490	   it must be a "scope authority" for any scope it reports.  In the
491	   example, nodes B and C (and any other nodes serving "example.com")
492	   would be required to share current load and overload state.  The
493	   state-sharing requirement could be substantial for high-capacity
494	   nodes.

496	   When a node reports overload for a certain scope, reacting nodes will
497	   treat the overload condition as uniform across the entire scope.  For
498	   example, if a node reports overload for an entire realm, reacting
499	   nodes will reduce traffic equally for all servers that serve that
500	   realm.  If the servers are unequally overloaded, they must use a more
501	   granular scope-type, for example, "Destination-Host".

503	3.3.  Non-adjacent Overload Control Recommendations

505	   An adjacent reporting mechanism allows for very flexible and fine-
506	   grained overload control.  It solves or simplifies a number of
507	   issues, such as negotiation of support and parameters, requirements
508	   for topology knowledge, end-to-end security, etc., by avoiding them in
509	   the first place.  Adding non-adjacent support to such a mechanism
510	   would complicate it considerably.

512	   Non-adjacent overload control mechanisms are better for connecting
513	   islands of overload control.  Such a mechanism works well for larger
514	   scopes and relatively static topologies.

516	   The author believes that we are unlikely to find a single solution
517	   that works well for both adjacent and non-adjacent overload control.
518	   While a single solution is more desirable in general, a single
519	   solution that works well for both cases is likely to be extremely
520	   complicated.  Therefore, the working group should consider a separate
521	   mechanism for the non-adjacent delivery of overload reports.

523	   If the group chooses to accept two separate solutions, we should be
524	   able to specify a single data model and set of AVPs that work for
525	   both, with some restrictions.  (For example, the non-adjacent
526	   solution would likely forbid the use of the "Connection" scope-type.)

528	   If the working group chooses to add non-adjacent features to MDOC or
529	   DOCA, we will need to change the support negotiation mechanisms to
530	   allow for the non-adjacent case, specify how a node can determine
531	   whether a report is adjacent or non-adjacent, and state what subset
532	   of scope-types are allowed in non-adjacent reports.  We will also
533	   need to study how we can meet the security-related requirements
534	   [I-D.ietf-dime-overload-reqs] given the current lack of end-to-end
535	   security features in Diameter.

537	4.  Overload Scopes

539	   Diameter overload does not necessarily affect all kinds of Diameter
540	   traffic.  A node may become overloaded for some requests but not
541	   others.  For example, a Diameter agent may handle requests for more
542	   than one Diameter Application, and may route requests to a different
543	   set of servers for each application.  If one server set becomes
544	   overloaded, but the other does not, then the agent itself is
545	   effectively overloaded for one application, but can process the other
546	   at normal capacity.

548	   The Diameter overload requirements [I-D.ietf-dime-overload-reqs] list
549	   several scenarios that illustrate overload that affects some requests
550	   but not others.  We refer to the set of requests affected by a
551	   particular overload event as the "scope" of the overload event.  The
552	   overload requirements require the mechanism to be able to send
553	   overload reports that are "scoped" to (that is, they affect requests
554	   targeted to) a particular Diameter node, a Realm, or a Diameter
555	   Application.

557	      The concept of scope may also be useful when applied to reported
558	      load even without an overload condition.  This usage is out of
559	      "scope" for this document.

561	   A scope indication in an overload report is a set of classifiers that
562	   identify requests likely to contribute to the overload condition.  In
563	   general, this could include any aspect of a Diameter message that a
564	   reacting node can observe.  For example, requests could be classified
565	   by Attribute Value Pair (AVP) values or next-hop routing decisions.

567	   The ability to express the scope of an overload condition is only
568	   useful when reacting nodes can act on the information.  There are
569	   only a small number of actions a reacting node may take to mitigate
570	   overload.  Essentially these actions boil down to reducing the number
571	   of requests that "match" the scope, either by sending fewer requests
572	   in the first place, or by routing around the problem.  The former is
573	   limited by the node's ability to distinguish between requests that
574	   match the overload scope, and requests that do not.  The latter is
575	   limited by the node's ability to predict or influence how a request
576	   will be routed.
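
   As a rough illustration of the first option, the following Python
   sketch forwards only a share of the requests that match a scope.  It
   is not taken from MDOC or DOCA: the Throttle class, the keep_fraction
   parameter, and the view of a request as a dictionary of AVP names are
   all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Throttle:
    """Forward roughly keep_fraction of the requests that match a scope.

    Hypothetical helper; neither MDOC nor DOCA defines this class.
    """
    scope_type: str        # AVP name used as the classifier (assumption)
    scope_value: str       # value reported as overloaded
    keep_fraction: float   # assumed knob: share of matching requests kept
    _credit: float = 0.0   # running credit, gives evenly spaced sends

    def matches(self, request: dict) -> bool:
        # A request matches when it carries the AVP with the scoped value.
        return request.get(self.scope_type) == self.scope_value

    def should_send(self, request: dict) -> bool:
        # Requests outside the scope are never throttled.
        if not self.matches(request):
            return True
        # Accumulate credit and send whenever a whole unit is available.
        self._credit += self.keep_fraction
        if self._credit >= 1.0:
            self._credit -= 1.0
            return True
        return False
```

   Requests that fall outside the scope are never held back, which is
   why the node's ability to tell the two groups apart bounds how
   precisely it can react.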

578	      Reacting nodes most likely take additional application-specific
579	      actions to mitigate overload conditions.  If a client reduces the
580	      number of messages it sends, it almost certainly has to take
581	      additional application-specific steps that affect its own client
582	      application.  Depending on the application, it might refuse some
583	      client application requests, redirect some of its own clients to
584	      different services (e.g. offloading mobile data sessions to local
585	      WiFi networks), or assert an overload condition in the client
586	      application protocol (e.g., the Session Initiation Protocol
587	      (SIP)).

589	   This section discusses the meanings of the required scope-types, and
590	   analyses their implications for the selected mechanism.

592	4.1.  Explicit vs Implicit Indication of Scopes

594	   Both MDOC and DOCA use explicit scope indication.  That is, the scope
595	   of an overload report is not, in general, implied by the type of
596	   message that carries the report.  For example, if an overload report
597	   is scoped to a particular Diameter Application-Id, the report
598	   explicitly indicates the affected Application-Id, rather than leaving
599	   the reacting-node to infer the Application-Id based on that of the
600	   message that carries the report.  There are a few exceptions to this;
601	   for example MDOC supports a "Connection" scope that, when specified,
602	   pertains to requests to be sent over the same transport connection
603	   over which the overload report arrived.

605	      List discussions have shown a common assumption that overload
606	      reports sent over a piggy-backed solution such as MDOC would only
607	      affect requests associated with the same Diameter Application-Id.
608	      For MDOC, this is a false assumption.  MDOC's explicit use of
609	      scopes allows overload reports sent over one application to affect
610	      requests for any arbitrary application.  On the other hand,
611	      solutions that use a dedicated Application-Id (such as DOCA)
612	      necessarily require the ability to report overload for arbitrary
613	      applications; otherwise it would only be possible for an overload
614	      control application to report overload on itself.

616	   Some list participants have suggested that the solution include a
617	   concept of a default scope, that is, a scope that is implied if no
618	   other scope is explicitly indicated.  The concept of default or
619	   implicit scopes requires further study by the working group.

621	4.2.  Types of Overload Scopes

623	   There are several different kinds, or types, of overload scopes.  The
624	   type of a scope defines how the reacting node interprets it.  Table 1
625	   gives a summary of the scope types discussed in this document.  The
626	   "Scope Type" column gives the name of the scope.  The "Affected
627	   Traffic" column describes what Diameter requests are impacted by the
628	   scope-type.  The "Reacting-Node" column describes which Diameter
629	   nodes may be able to take action on an overload report with the
630	   respective scope-type.  Finally, the "Draft" column describes which
631	   proposed solution includes the respective scope-type.

633	   +------------------+-----------------------+---------------+--------+
634	   | Scope Type       | Affected Traffic      | Reacting-Node | Draft  |
635	   +------------------+-----------------------+---------------+--------+
636	   | Connection       | Requests sent         | Adjacent Peer | MDOC,  |
637	   |                  | directly to the       |               | DOCA   |
638	   |                  | reporting-node on a   |               |        |
639	   |                  | particular transport  |               |        |
640	   |                  | connection            |               |        |
641	   | Peer             | Requests routed       | Adjacent Peer | MDOC,  |
642	   |                  | directly to           |               | DOCA   |
643	   |                  | reporting-node.       |               |        |
644	   | Destination-Host | Requests with a       | Any           | MDOC   |
645	   |                  | matching Destination- |               |        |
646	   |                  | Host AVP              |               |        |
647	   | Origin-Host      | Requests including a  | Any           | DOCA?  |
648	   |                  | matching Origin-Host  |               |        |
649	   |                  | AVP                   |               |        |
650	   | Diameter         | Requests with a       | Any           | MDOC,  |
651	   | Application      | matching Application- |               | DOCA   |
652	   |                  | Id AVP                |               |        |
653	   | Destination      | Requests with a       | Any           | MDOC,  |
654	   | Realm            | matching Destination- |               | DOCA   |
655	   |                  | Realm AVP             |               |        |
656	   | Session          | Requests with a       | Any           | MDOC   |
657	   |                  | matching Session-Id   |               |        |
658	   |                  | AVP                   |               |        |
659	   | Session-Group    | Requests belonging to | Any           | MDOC   |
660	   |                  | sessions assigned     |               |        |
661	   |                  | matching labels       |               |        |
662	   +------------------+-----------------------+---------------+--------+

664	                 Table 1: Summary of Overload Scope Types

666	4.2.1.  Connection Scope-Type

667	   The "Connection" scope-type indicates that the reacting node should
668	   reduce traffic sent on the transport connection on which it received
669	   the overload report.  A Connection scope indication does not include
670	   an explicit value; rather it implies "this connection".

672	4.2.2.  Peer Scope-Type

674	   The "Peer" scope-type indicates that a particular Diameter node is
675	   overloaded.  Other nodes should mitigate the overload by reducing the
676	   number of requests that will land on the overloaded node, either by
677	   sending fewer requests, or by attempting to route requests around the
678	   overloaded node.

680	      In both MDOC and DOCA, the "Peer" scope-type is named "Host".  In
681	      practice, only immediate peers can act as the reacting node for a
682	      Host scoped overload report.  This is due to the fact that non-
683	      adjacent nodes have limited ability to influence routing decisions
684	      beyond the immediate next hop.  This document uses the term "Peer"
685	      to illustrate that fact.

687	   Large-scale Diameter nodes are often implemented as clusters of IP
688	   hosts, which may or may not share their knowledge about upstream
689	   overload conditions.  Certain IP hosts in a cluster could become
690	   overloaded when others do not.  Furthermore, if the reacting-node is
691	   also clustered, it may be difficult for the cluster members to share
692	   real-time knowledge of the reporting-node's overload state.  This can
693	   make it difficult for a node to know conclusively whether any two
694	   connections that appear to connect to the same peer can be treated as
695	   such for the purposes of overload control.  The working group should
696	   study whether the Peer scope-type should be deprecated in favor of
697	   the "Connection" scope-type.

699	4.2.3.  Destination-Host Scope-Type

701	   The "Destination-Host" scope type pertains to requests that contain a
702	   Destination-Host AVP that matches the indicated Destination-Host
703	   value.  Destination-Host always refers to the endpoint for a given
704	   Diameter request.

706	   The best the reacting node can do is reduce the number of requests
707	   that contain a Destination-Host AVP that matches the overloaded node.
708	   Rerouting will not help in general, since the requests will simply
709	   take different routes to arrive at the same overloaded server.
710	   Unless the destination node is also a direct peer, the reacting node
711	   cannot do much about requests that don't contain a Destination-Host
712	   AVP in the first place, since it cannot predict whether these
713	   requests will land on the overloaded endpoint.  The Destination-Host
714	   scope type is useful for requests bound to a particular server, for
715	   example, mid-session requests for a session-stateful application.
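
   The limitation described above can be pictured as a three-way
   classification rather than a simple yes or no.  The Python sketch
   below is illustrative only: the Match enumeration, the classify
   function, and the dictionary view of a request are hypothetical, and
   the exact string comparison is a simplification.

```python
from enum import Enum


class Match(Enum):
    YES = "addressed to the overloaded host"
    NO = "addressed to a different host"
    UNKNOWN = "no Destination-Host AVP; endpoint cannot be predicted"


def classify(request: dict, overloaded_host: str) -> Match:
    """Classify a request against a Destination-Host scoped report."""
    dest = request.get("Destination-Host")
    if dest is None:
        # Routing decides the endpoint later, so the reacting node
        # cannot tell whether this request will reach the overloaded host.
        return Match.UNKNOWN
    return Match.YES if dest == overloaded_host else Match.NO
```

   Only requests classified as YES can be reduced with confidence; a
   reacting node that is not a direct peer of the destination has no
   reliable basis for acting on the UNKNOWN group.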

717	   Sections 4.2.7 and 4.2.8 cover the "Session" and "Session-Group"
718	   scope-types, and argue for removal of "Session".

720	4.2.4.  Origin-Host Scope-Type

722	   While most scope-types refer to where a request is likely to go, the
723	   "Origin-Host" scope-type refers to where the request originates.
724	   That is, any request with a matching Origin-Host AVP would match.
725	   The Origin-Host scope type is useful for situations where a specific
726	   client or set of clients sends an excessive number of requests.  An
727	   overload report with an Origin-Host scope would tell matching clients
728	   to reduce traffic, or agents to throttle requests that came from
729	   matching clients.

731	      Note that the Origin-Host scope-type is not explicitly mentioned
732	      in the requirements document.  The author includes it here because
733	      others have mentioned the need in conversation.

735	4.2.5.  Diameter-Application Scope-Type

737	   The "Diameter Application" scope-type indicates overload for a
738	   particular Diameter application.  That is, it impacts all requests
739	   with the matching value in an Application-Id AVP.

741	   The Diameter Application scope-type is useful for declaring an
742	   overload condition that affects a specific Diameter service,
743	   typically, but not necessarily, in a specific realm.

745	   Since the Diameter Application scope-type indicates overload for an
746	   entire application, reacting nodes should reduce the number of
747	   requests sent for that application.  Similarly to the Realm scope-
748	   type, it will rarely if ever make sense for a Diameter node to
749	   reroute traffic to a different Diameter application.

751	4.2.6.  Destination-Realm Scope-Type

752	   The "Destination-Realm" scope-type indicates overload for all servers
753	   that handle requests for the particular Diameter realm.  That is, it
754	   impacts all requests with the particular realm in the Destination-
755	   Realm AVP.

757	   The Realm scope-type is useful for declaring a global overload
758	   condition within a network serving a single realm.  It is also useful
759	   for requesting third-parties to reduce Diameter traffic sent to a
760	   particular realm, for example, in roaming scenarios.

762	   Since the Realm scope-type indicates overload for an entire realm,
763	   reacting nodes should reduce the number of messages sent for the
764	   realm.  Rerouting traffic does not make sense for the Realm scope
765	   type, since it would probably never be useful for Diameter nodes to
766	   reroute traffic destined for an overloaded realm to a different, non-
767	   overloaded realm.  Client applications might, however, be able to
768	   choose to use services from a different operator if the Diameter
769	   realm of one operator reports an overload condition.

771	   MDOC currently makes the Realm scope-type mandatory to implement.
772	   List participants have indicated that there may be use cases where
773	   all Diameter traffic on a network uses the same Realm, and that the
774	   use of the Realm scope-type would be redundant in such networks.
775	   Whether the Realm scope-type should remain mandatory or become
776	   optional to implement requires further study.

778	4.2.7.  Session Scope-Type

780	   MDOC currently includes a "Session" scope-type.  This scope-type
781	   refers to messages that include a matching Session-Id.  Conceptually,
782	   this applies to all requests that are part of a previously
783	   established session.  This scope-type could potentially be useful for
784	   a session-stateful agent that assigns session-establishing requests
785	   to a certain server, and then sends all future requests in that
786	   session to the same server.  If that server became overloaded, the
787	   agent could send an overload report scoped to the assigned session.

789	   However, the Session scope-type will become unwieldy for anything
790	   other than very small-scale installations.  The number of sessions
791	   assigned to any specific server is likely to be quite large.
792	   Therefore, the number of Session scope values would probably become
793	   quite large.  The working group should consider deprecating the
794	   Session scope-type.  For non-topology-hiding agents, the Destination-
795	   Host scope-type can be used to affect all sessions assigned to a
796	   particular server.  For topology-hiding agents, the session-group
797	   mechanism can do the same.

799	4.2.8.  Session-Group Scope-Type

801	   Diameter agents that implement certain topology-hiding schemes may
802	   modify Origin-Host AVPs inserted by servers, and use some local
803	   mechanism to bind sessions to specific servers.  The "Destination-
804	   Host" type may not function correctly in this case.  MDOC specifies a
805	   "session-group" scope-type, where an agent or server can assign a
806	   common identifier to sessions that are fate-shared in some way, such
807	   as being bound to the same server.  If that server becomes
808	   overloaded, the agent can send an overload report that matches
809	   requests in all sessions with the matching identifier.
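
   The following Python sketch shows one way such labels could work.  It
   is an assumption-laden illustration, not MDOC's actual procedure: the
   SessionGroupAgent class, the "group-N" label format, and the way the
   label reaches the client (here, the return value of bind) are all
   hypothetical.

```python
class SessionGroupAgent:
    """Agent that hides server identities behind opaque group labels."""

    def __init__(self):
        self._label_of_server = {}    # server name -> group label
        self._group_of_session = {}   # Session-Id -> group label

    def bind(self, session_id, server):
        """Bind a new session to a server and return its group label."""
        if server not in self._label_of_server:
            count = len(self._label_of_server) + 1
            self._label_of_server[server] = f"group-{count}"
        label = self._label_of_server[server]
        self._group_of_session[session_id] = label
        return label

    def overload_scope(self, server):
        """Build the scope for a report about server, without naming it."""
        return ("Session-Group", self._label_of_server[server])


def request_matches(scope, labels_by_session, session_id):
    """Reacting-node check: is this session in the reported group?"""
    scope_type, label = scope
    if scope_type != "Session-Group":
        return False
    return labels_by_session.get(session_id) == label
```

   A single report then covers every session bound to the overloaded
   server, and the reacting node never learns the server's identity.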

811	   This scope-type may be useful under certain circumstances, but may
812	   also be complex to implement.  Since the mechanism is required to
813	   allow extensible scope-types, session-groups could still be added in
814	   the future.  The working group should study whether the
815	   Session-Group scope-type should be included in the base overload
816	   control solution, or removed with the potential to add it as an
817	   extension scope-type in the future.

820	4.3.  Scope Values

822	   Scope labels in an overload report will typically take the form of a
823	   scope-type and a value.  For example, if the "example.com" realm is
824	   overloaded for all services, the overload report would indicate a
825	   scope-type of "Realm" and a scope-value of "example.com".

827	   The Connection scope-type is an exception.  Since an overload report
828	   with a Connection scope is only actionable by one of the peers
829	   connected via the specified connection, it makes sense to treat the
830	   Connection scope-type as always having a value of "this connection".
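
   One possible in-memory representation of a scope label is sketched
   below in Python.  The Scope class and the THIS_CONNECTION marker are
   hypothetical names; neither MDOC nor DOCA defines them, and the
   validation rules simply restate the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker meaning "the connection the report arrived on".
THIS_CONNECTION = "this connection"


@dataclass(frozen=True)
class Scope:
    scope_type: str
    value: Optional[str] = None

    def __post_init__(self):
        if self.scope_type == "Connection":
            # A Connection scope carries no explicit value.
            if self.value not in (None, THIS_CONNECTION):
                raise ValueError("Connection scope takes no explicit value")
            # Frozen dataclass: bypass __setattr__ to store the implied value.
            object.__setattr__(self, "value", THIS_CONNECTION)
        elif not self.value:
            # Every other scope-type needs a scope-value.
            raise ValueError(f"{self.scope_type} scope requires a value")
```

   With this model, Scope("Realm", "example.com") expresses the example
   above, while Scope("Connection") is complete without a value.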

832	4.4.  Combining Scopes

834	   Diameter nodes will commonly need to construct overload reports that
835	   apply to a combination of scopes.  For example, if a given realm is
836	   overloaded for a subset of the applications it supports, it might
837	   indicate both a realm scope and one or more Diameter application
838	   scopes.

840	   Logically, combining multiple scopes of different types reduces the
841	   overall set of requests to which the overload report would apply.
842	   Combining multiple scopes of the same type increases the applicable
843	   set.  A function that determines the requests affected by an overload
844	   report could model this as a logical "and" or "intersection" operator
845	   for combining scopes of different types, and a logical "or" or
846	   "union" operator for combining scopes of the same type.
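
   The function described above might look like the following Python
   sketch.  The (type, value) pairs and the dictionary view of a request
   are simplifying assumptions, and the scope-type names are
   illustrative only.  Note that a report with no scopes matches every
   request in this sketch; that behavior is an assumption, not something
   either proposal specifies.

```python
from collections import defaultdict


def report_applies(scopes, request):
    """Return True if the request falls within the combined scopes.

    scopes:  list of (scope_type, scope_value) pairs from one report.
    request: dict mapping a scope-type name to the request's value for
             it (a simplification; a real node would inspect AVPs and
             routing state).
    """
    values_by_type = defaultdict(set)
    for scope_type, scope_value in scopes:
        values_by_type[scope_type].add(scope_value)

    # "and" across different types, "or" among values of the same type.
    return all(
        request.get(scope_type) in allowed
        for scope_type, allowed in values_by_type.items()
    )
```

   For example, a report carrying one "Realm" scope and two
   "Application" scopes applies to a request only if its realm matches
   and its application is either of the two listed.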

848	   The working group should study whether all possible combinations
849	   should be allowed.  For example, it may or may not make sense to
850	   combine a "Connection" scope with other scopes, or to allow more than
851	   one "Connection" scope-value for a single overload report.

853	4.5.  Scope Extensibility

855	   [I-D.ietf-dime-overload-reqs] requires scope-types to be extensible.
856	   This requirement implies that the chosen mechanism or mechanisms must
857	   discuss how new scope-types can be added, how support for specific
858	   scope-types should be declared or negotiated, and which scope-types
859	   might be mandatory to support.
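
   As a sketch of what extensibility could mean for an implementation,
   the Python fragment below keeps scope-types in a registry so that new
   ones can be added without changing the matching code.  Everything
   here is hypothetical: the function names, the use of a simple set
   intersection to stand in for negotiation, and the choice to reject
   unknown scope-types are assumptions, since the proposals leave these
   points open.

```python
# Registry of scope-type name -> matcher(request, value) -> bool.
SCOPE_MATCHERS = {}


def register_scope_type(name, matcher):
    """Add (or replace) the matcher for a scope-type."""
    SCOPE_MATCHERS[name] = matcher


def usable_scope_types(peer_types):
    """Scope-types both sides understand (assumed negotiation result)."""
    return sorted(set(SCOPE_MATCHERS) & set(peer_types))


def matches(scope_type, value, request):
    """Apply the registered matcher; unknown types are rejected here."""
    matcher = SCOPE_MATCHERS.get(scope_type)
    if matcher is None:
        raise KeyError(f"unsupported scope-type: {scope_type}")
    return matcher(request, value)


# Base scope-types known at start-up (illustrative selection).
register_scope_type(
    "Destination-Realm",
    lambda request, value: request.get("Destination-Realm") == value,
)
register_scope_type(
    "Application",
    lambda request, value: request.get("Application-Id") == value,
)
```

   A later extension, such as an "Origin-Host" scope-type, would then
   only need one more register_scope_type call.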

861	4.6.  Scope Recommendations

863	   In the author's opinion, the selected solution or solutions should
864	   support, at a minimum, the "Connection", "Destination-Host", "Realm"
865	   and "Application-ID" scope-types.  The working group should consider
866	   also adding the "Origin-Host" scope-type.

868	   The working group should consider whether the advantages of the
869	   "session-group" concept and scope-type are worth the complexity.  The
870	   group should also study whether the Peer scope-type adds sufficient
871	   utility over the Connection scope-type to warrant its inclusion.

873	5.  IANA Considerations

875	   This draft makes no requests of IANA.

877	6.  Security Considerations

879	   Overload reports induce Diameter nodes to reduce or reroute traffic.
880	   For large scopes, a single erroneous or malicious overload report
881	   could effectively shut down Diameter processing for an entire realm.
882	   A Diameter overload control solution needs mechanisms to ensure that
883	   overload reports are only accepted from trusted sources, and that
884	   nothing tampers with the reports en route.

886	   For adjacent approaches, the transport connection can be protected
887	   with TLS or IPsec.  But this will not help for non-adjacent
888	   reporting, since no such transport connection exists.

890	   While such work is in progress in the DIME working group, Diameter
891	   has no currently viable mechanism for end-to-end authentication and
892	   integrity protection.  The working group should consider either
893	   making non-adjacent overload control contingent on a generic Diameter
894	   end-to-end protection mechanism, or adding a specialized protection
895	   mechanism to any resulting non-adjacent overload control solution.

897	7.  References

899	7.1.  Normative References

901	   [RFC6733]  Fajardo, V., Arkko, J., Loughney, J., and G. Zorn,
902	              "Diameter Base Protocol", RFC 6733, October 2012.

904	   [I-D.ietf-dime-overload-reqs]
905	              McMurry, E. and B. Campbell, "Diameter Overload Control
906	              Requirements", draft-ietf-dime-overload-reqs-07 (work in
907	              progress), June 2013.

909	7.2.  Informative References

911	   [I-D.roach-dime-overload-ctrl]
912	              Roach, A. and E. McMurry, "A Mechanism for Diameter
913	              Overload Control", draft-roach-dime-overload-ctrl-03 (work
914	              in progress), May 2013.

916	   [I-D.korhonen-dime-ovl]
917	              Korhonen, J. and H. Tschofenig, "The Diameter Overload
918	              Control Application (DOCA)", draft-korhonen-dime-ovl-01
919	              (work in progress), February 2013.

921	   [Whac-a-Mole]
922	              "Whack-a-Mole Colloquial Usage", <http://
923	              en.wikipedia.org/wiki/Whack-a-mole#Colloquial_usage>.

925	Appendix A.  Contributors

927	   Eric McMurry and Robert Sparks made significant contributions to the
928	   concepts in this draft.

930	Author's Address

932	   Ben Campbell
933	   Tekelec
934	   17210 Campbell Rd.
935	   Suite 250
936	   Dallas, TX  75252
937	   US

939	   Email: ben@nostrum.com