idnits 2.17.1 

draft-campbell-dime-load-considerations-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 6, 2015) is 3332 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'C' is mentioned on line 560, but not defined

  == Missing Reference: 'A1' is mentioned on line 560, but not defined

  == Missing Reference: 'A2' is mentioned on line 560, but not defined

  == Missing Reference: 'S4' is mentioned on line 560, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 648

  -- Looks like a reference, but probably isn't: '2' on line 648

  == Outdated reference: A later version (-10) exists of
     draft-ietf-dime-ovli-03


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                              B. Campbell
3	Internet-Draft                                           S. Donovan, Ed.
4	Intended status: Informational                                    Oracle
5	Expires: September 7, 2015                                   JJ. Trottin
6	                                                          Alcatel-Lucent
7	                                                           March 6, 2015

9	       Architectural Considerations for Diameter Load Information
10	               draft-campbell-dime-load-considerations-01

12	Abstract

14	   RFC 7068 describes requirements for Overload Control in Diameter.
15	   This includes a requirement to allow Diameter nodes to send "load"
16	   information, even when the node is not overloaded.  The Diameter
17	   Overload Information Conveyance (DOIC) solution describes a mechanism
18	   meeting most of the requirements, but does not currently include the
19	   ability to send load information.  This document explores some
20	   architectural considerations for a mechanism to send Diameter load
21	   information.

23	Status of This Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on September 7, 2015.

40	Copyright Notice

42	   Copyright (c) 2015 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  Differences between Load and Overload information
59	   3.  How is Load Information Used?
60	   4.  Piggy-Backing vs a Dedicated Application
61	   5.  Which Nodes Exchange Load Information?
62	   6.  Scope of Load Information
63	   7.  Frequency of Sending Load Information
64	   8.  Load Information Semantics
65	   9.  Is Negotiation of Support Needed?
66	   10. Topology Scenarios
67	     10.1.  No Agent
68	     10.2.  Single Agent
69	     10.3.  Multiple Agents
70	     10.4.  Linked Agents
71	     10.5.  Shared Server Pools
72	     10.6.  Agent Chains
73	     10.7.  Fully Meshed Layers
74	     10.8.  Partitions
75	     10.9.  Active-Standby Nodes
76	     10.10. Addition and removal of Nodes
77	   11. Security Considerations
78	   12. IANA Considerations
79	   13. References
80	     13.1.  Normative References
81	     13.2.  Informative References
82	   Authors' Addresses

84	1.  Introduction

86	   [RFC7068] describes requirements for Overload Control in Diameter
87	   [RFC6733].  At the time of this writing, the DIME working group is
88	   working on the Diameter Overload Information Conveyance (DOIC)
89	   mechanism [I-D.ietf-dime-ovli].  As currently specified, DOIC
90	   fulfills some, but not all, of the requirements.

92	   In particular, DOIC does not fulfill Req 24, which requires a
93	   mechanism where Diameter nodes can indicate their current load, even
94	   if they are not currently overloaded.  DOIC also does not fulfill Req
95	   23, which requires that nodes that divert traffic away from
96	   overloaded nodes be provided with sufficient information to select
97	   targets that are most likely to have sufficient capacity.

99	   There are several other requirements in RFC 7068 that mention both
100	   overload and load information that are only partially fulfilled by
101	   DOIC.

103	   The DIME working group explicitly chose not to fulfill these
104	   requirements in DOIC due to several reasons.  A principal reason was
105	   that the working group did not agree on a general approach for
106	   conveying load information.  It chose to progress the rest of DOIC,
107	   and defer load information conveyance to a DOIC extension or a
108	   separate mechanism.

110	   This document describes some high level architectural decisions that
111	   the working group will need to consider in order to solve the load-
112	   related requirements from RFC 7068.

114	   At the time of this writing, there have been several attempts to
115	   create mechanisms for conveyance of both load and overload control
116	   information that were not adopted by the DIME working group.  While
117	   these drafts are not expected to progress, they may be instructive
118	   when considering these decisions.

120	   o  [I-D.tschofenig-dime-dlba] proposed a dedicated Diameter
121	      application for exchanging load balancing information.

123	   o  [I-D.roach-dime-overload-ctrl] described a strictly peer-to-peer
124	      exchange of both load and overload information in new AVPs piggy-
125	      backed on existing Diameter messages.

127	   o  [I-D.korhonen-dime-ovl] described a dedicated Diameter application
128	      for exchanging both load and overload information.

130	2.  Differences between Load and Overload information

132	   Previous discussions of how to solve the load-related requirements in
133	   [RFC7068] have shown that people do not have an agreed-upon concept
134	   of how "load" information differs from "overload" information.  The
135	   two concepts are highly interrelated, and so far the working group
136	   has not defined a bright line between what constitutes load
137	   information and what constitutes overload information.

139	   In the opinion of the authors, there are two primary differences.
140	   First, a Diameter node always has a load.  At any given time that
141	   load may be effectively zero, effectively fully loaded, or somewhere
142	   in between.  In contrast, overload is an exceptional condition.  A
143	   node only has overload information when it is in an overloaded state.

145	   Furthermore, the relationship between a node's load level and
146	   overload state at any given time may be vague.  For example, a node
147	   may normally operate at a "fully loaded" level, but still not be
148	   considered overloaded.  Another node may declare itself to be
149	   "overloaded" even though it might not be fully "loaded".

151	   Second, overload information, in the form of a DOIC Overload Report
152	   (OLR) [I-D.ietf-dime-ovli], indicates an explicit request for action
153	   on the part of the reacting node.  That is, the OLR requests that the
154	   reacting node reduce the offered load -- the actual traffic sent to
155	   the reporting node after overload abatement and routing decisions are
156	   made -- by an indicated amount or to an indicated level.
157	   Effectively, DOIC provides a contract between the reporting node and
158	   the reacting node.

160	   In contrast, load is informational.  That is, load information can be
161	   considered a hint to the recipient node.  That node may use the load
162	   information for load balancing purposes, as an input to certain
163	   overload abatement techniques, to make inferences about the
164	   likelihood that the sending node becomes overloaded in the immediate
165	   future, or for other purposes.

167	   None of this prevents a Diameter node from deciding to reduce the
168	   offered load based on load information.  The fundamental difference
169	   is that an overload report requires that reduction.  It is also
170	   reasonable for a Diameter node to decide to increase the offered load
171	   based on load information.

173	3.  How is Load Information Used?

175	   [RFC7068] contemplates two primary uses for load information.  Req 23
176	   discusses how load information might be used when performing
177	   diversion as an overload abatement technique, as described in
178	   [I-D.ietf-dime-ovli].  When a reacting node diverts traffic away from
179	   an overloaded node, it needs load information for the other
180	   candidates for that traffic in order to effectively load balance the
181	   diverted load between potential candidates.  Otherwise, diversion has
182	   a greater potential to drive other nodes into overload.

184	   Req 24 discusses how Diameter load information might be used when no
185	   overload condition currently exists.  Diameter nodes can use the load
186	   information to make decisions to try to avoid overload conditions in
187	   the first place.  Normal load-balancing falls into this category.  A
188	   node might also take other proactive steps to reduce offered load
189	   based on load information, so that the loaded node never goes into
190	   overload in the first place.

192	   If the loaded nodes are Diameter servers (or clients in the case of
193	   server-to-client transactions), both of these uses are most
194	   effectively accomplished by a Diameter node that performs server
195	   selection.  Typically, server selection is performed by a node (a
196	   client or an agent) that is an immediate peer of the server.
197	   However, there are scenarios (see Section 10) where a client or proxy
198	   that is not the immediate peer to the selected servers performs
199	   server selection.  In this case, the client or proxy enforces the
200	   server selection by inserting a Destination-Host AVP.

202	      For example, a Diameter node (e.g. client) can use a redirect
203	      agent to get candidate destination host addresses.  The redirect
204	      agent might return several destination host addresses, from which
205	      the Diameter node selects one.  The Diameter node can use load
206	      information received from these hosts to make the selection.
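
      As a non-normative illustration of the selection step above, the
      following sketch picks one host from a candidate list using
      whatever load values have been received.  The function name, the
      host names, and the 0-100 load scale are assumptions made for
      this example only; none of them is defined by this document.

```python
from typing import Dict, List, Optional


def select_least_loaded(candidates: List[str],
                        load_by_host: Dict[str, int]) -> Optional[str]:
    """Pick the candidate with the lowest reported load.

    Hypothetical helper: `candidates` are destination hosts (for example,
    as returned by a redirect agent) and `load_by_host` holds the most
    recent load value received from each host, on an assumed 0-100 scale.
    """
    if not candidates:
        return None
    # Prefer hosts for which load information is actually known.
    known = [host for host in candidates if host in load_by_host]
    if known:
        return min(known, key=lambda host: load_by_host[host])
    # No load information at all: fall back to the first candidate.
    return candidates[0]
```

      How to treat candidates with no known load is itself a design
      choice; the sketch simply ranks them behind hosts that have
      reported a value.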

208	   Just as load information can be used as part of server selection, it
209	   can also be used as input to the selection of the next-hop peer to
210	   which a request is to be routed.

212	   One area that requires thought is how load information is used, if at
213	   all, in the presence of an overload report from the same Diameter
214	   node.  It might be that the load information from that Diameter node
215	   is ignored for the duration of the time that the overload report is
216	   in effect.  It might also be possible that the load information can
217	   aid in the routing of non-abated requests targeted for the overloaded
218	   Diameter node.

220	4.  Piggy-Backing vs a Dedicated Application

222	   [I-D.roach-dime-overload-ctrl] embeds load and overload information
223	   onto messages of existing applications.  This is known as a "piggy-
224	   back" approach.  Such an approach has the advantage of not requiring
225	   new messages to carry load information.  It has an additional
226	   advantage of scaling with load; that is, the more the transaction
227	   load, the more opportunities to send load information.

229	   DOIC [I-D.ietf-dime-ovli] also uses a piggy-backed approach to send
230	   OLRs.  Given the potentially tight connection between load and
231	   overload information, there may be advantages to maintaining
232	   consistency with DOIC.

234	   [I-D.tschofenig-dime-dlba] used a dedicated application to carry load
235	   information.  This application has quasi-subscription semantics,
236	   where a client requests updates according to a cadence.  The server
237	   can send unsolicited updates if the load level changes between
238	   updates in the cadence.

240	   [I-D.korhonen-dime-ovl] also used a dedicated application, but
241	   allowed nodes to send unsolicited reports containing load and
242	   overload information.  The mechanism has an issue that the sender of
243	   load information may not know which other nodes need the information.
244	   It may be possible to infer that information from other application
245	   messages handled by the sender.

247	   Another potential approach is that of a dedicated Diameter
248	   application with a slightly different subscription semantic than that
249	   of [I-D.tschofenig-dime-dlba].  In such an application, a node that
250	   consumes load information sends a Diameter request to the source of
251	   the load information.  This request indicates that the consumer
252	   wishes to receive load information for some period of time.  The load
253	   source would send periodic Diameter requests indicating the current
254	   load level, until such time that the subscription period expired, or
255	   the subscriber explicitly unsubscribed.  After the initial
256	   notification, the sender would only send updates when the load level
257	   changed.
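
   The subscription semantic described above can be sketched as
   follows.  This is only an illustration under stated assumptions:
   the class and method names are hypothetical, time is passed in
   explicitly as a number of seconds, and the Diameter messages
   themselves are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Subscription:
    expires_at: float                 # time at which the subscription lapses
    last_sent: Optional[int] = None   # last load value notified, if any


class LoadSource:
    """Hypothetical load source that tracks its subscribers."""

    def __init__(self) -> None:
        self._subs: Dict[str, Subscription] = {}

    def subscribe(self, consumer: str, now: float, duration: float) -> None:
        self._subs[consumer] = Subscription(expires_at=now + duration)

    def unsubscribe(self, consumer: str) -> None:
        self._subs.pop(consumer, None)

    def notifications(self, load: int, now: float) -> List[Tuple[str, int]]:
        """Return (consumer, load) pairs that should be notified now."""
        # Drop subscriptions whose period has expired.
        for consumer in [c for c, s in self._subs.items()
                         if now >= s.expires_at]:
            del self._subs[consumer]
        due = []
        for consumer, sub in self._subs.items():
            # The first notification is always sent; later ones only
            # when the load level differs from the last value sent.
            if sub.last_sent != load:
                sub.last_sent = load
                due.append((consumer, load))
        return due
```

   With a 60-second subscription, a consumer in this sketch is notified
   once initially, again whenever the value changes, and not at all
   after the period expires or after it unsubscribes.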

259	5.  Which Nodes Exchange Load Information?

261	   Section 10 illustrates a number of Diameter network topologies where
262	   load information may be useful.  However, there are potentially
263	   limitless configurations where load information might be used to make
264	   peer and server selection choices.  Nodes may be unaware of the
265	   topology beyond their immediate peers, which may limit the utility of
266	   load information for nodes beyond that peer.

268	   There may in fact be scenarios where a peer-selection decision is
269	   impacted by the load of non-adjacent nodes, or where a node needs to
270	   force selection of a particular non-adjacent server.  While explicit
271	   knowledge of the load of such non-adjacent nodes may be useful in
272	   such decisions, the working group should consider whether this
273	   utility is worth the added complexity.

275	      For instance, one approach would be to support two types of load
276	      reports, endpoint load reports and peer load reports.  In this
277	      scenario, load reports would likely require an AVP indicating the
278	      Diameter node to which the report applies.  This would be needed
279	      to differentiate between endpoint load reports and next hop load
280	      reports.  This would imply that a single message will likely have
281	      two load reports, one for the endpoint and one for the next hop.
282	      This would also add complexity in agents, sometimes needing to
283	      strip next hop load reports and sometimes not.
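
      The two-report approach can be sketched as follows, purely as an
      illustration.  The record layout, the "endpoint" and "peer"
      labels, and the rule that an agent always replaces the next hop
      report with its own are assumptions of this example.  As noted
      above, an agent may sometimes need to leave next hop reports in
      place, which the sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoadReport:
    kind: str   # "endpoint" or "peer" (hypothetical labels)
    node: str   # Diameter identity the report applies to
    load: int   # load value on an assumed 0-100 scale


def forward_reports(received: List[LoadReport],
                    own_identity: str,
                    own_load: int) -> List[LoadReport]:
    """Reports an agent would place on the message it forwards."""
    # Endpoint reports pass through unchanged.
    kept = [r for r in received if r.kind == "endpoint"]
    # The upstream next hop report is stripped and replaced by a
    # report describing this agent.
    kept.append(LoadReport("peer", own_identity, own_load))
    return kept
```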

285	   Previous load related efforts have made different assumptions about
286	   which Diameter nodes exchange load information.

288	   [I-D.roach-dime-overload-ctrl] operated in a strictly peer-to-peer
289	   mode.  Each node would only learn the load (and overload) information
290	   from its immediate peers.

292	   [I-D.korhonen-dime-ovl] and [I-D.tschofenig-dime-dlba] are each
293	   effectively any-to-any.  That is, they each allowed any node to send
294	   load information to any other node that supported the dedicated
295	   overload or load application, respectively.

297	   In the latter case, load is effectively sent between clients and
298	   servers of the dedicated application, but those roles may not match
299	   the client and server roles for the "main" Diameter applications in
300	   use.  For example, a pair of adjacent Diameter agents might be
301	   "client" and "server" for the dedicated "load" application,
302	   effectively creating a peer-to-peer relationship similar to that of
303	   [I-D.roach-dime-overload-ctrl].

305	   Each approach has advantages.  Peer-to-peer transmission covers the
306	   case when server selection is done by the server's immediate peers.
307	   Additionally, selection of non-terminal nodes is generally done on a
308	   peer-to-peer basis.  If the loaded node is an agent, for example, the
309	   load information is only useful to immediate peers.  Peer-to-peer
310	   transmission is the easiest to negotiate.  (See Section 9)

312	   Any-to-Any transmission offers more flexibility, and could
313	   potentially cover the case where server selection is done by nodes
314	   that are not peers to the candidate servers.

316	6.  Scope of Load Information

318	   Load information could refer to several different scopes:

320	   o  Load of a Node -- The load information refers to the load for an
321	      entire Diameter host, that is a Client, Agent, or Server described
322	      by a Diameter Identity.

324	   o  Load of an Application -- The load for a specific Diameter node
325	      that supports multiple Diameter applications might differ between
326	      applications.

328	   o  Load of a set of nodes -- The load would likely be the aggregated
329	      load of the nodes in the set.  This would likely require a
330	      separate Diameter identity be assigned to the set of nodes and the
331	      load information would be associated with that Diameter identity.

333	   o  Aggregate Load -- Different paths via different agents may exist
334	      between a node making a peer selection decision and the final
335	      destination of the request.  The least loaded destination may only
336	      be reachable via certain peers.

338	   o  Load of an agent plus load of a Diameter endpoint -- Different
339	      paths via different Diameter agents may exist between the node
340	      doing the server selection and the targeted Diameter endpoint.
341	      The load information on the Diameter endpoint might be used for
342	      server selection and the load information on the agent might be
343	      used for selecting the next hop in the route to the Diameter
344	      endpoint.

346	   The "scope" of load information defines what the load indication
347	   applies to.  For example, load could apply to a whole Diameter node,
348	   or a node could report different load for different applications.  It
349	   might be possible to have a load value for a whole realm, or a group
350	   of nodes.

352	   [I-D.roach-dime-overload-ctrl] has a very expressive concept of
353	   scope, which applies both to load and overload information.  It
354	   defines the scopes of "Destination-Realm", "Application-ID",
355	   "Destination-Host", "Host", "Connection", "Session", and "Session-
356	   Group".  Scopes can be combined.

358	   [I-D.tschofenig-dime-dlba] does not have an explicit concept of
359	   scope.  Load information describes the load of a server for all
360	   Diameter purposes.

362	   [I-D.korhonen-dime-ovl] defines several scopes for overload
363	   information.  However, load information applies to a whole node.

365	   One view is that the load level of a Diameter node will usually apply
366	   to the whole node.  In this case, the working group should consider a
367	   single "whole node" scope for load information.  Alternatively, a
368	   "per-connection" scope could simulate "whole node" scope without
369	   requiring the recipient to pay attention to whether multiple
370	   transport connections terminate at the same peer.

372	   Other scopes might also be considered based on the analysis of the
373	   use cases identified for the use of load information.

375	7.  Frequency of Sending Load Information

377	   While it is true that a node always has a discrete load, a
378	   determination needs to be made as to the frequency with which load
379	   information is sent.

381	   This interacts with the method for transporting load information --
382	   piggy-backed versus a dedicated application -- discussed in
383	   Section 4.

385	   With a piggy-backed approach the following alternatives exist:

387	   1.  Send load information in every message.

389	   2.  Send load information when it changes by some amount.  For
390	       instance, only send a new load report when the load value has
391	       changed by some percentage.

393	   3.  Send load information at regular time intervals.  With this
394	       approach, load information would be sent once every fixed
395	       number of seconds.

397	   With alternatives 2 and 3 there would need to be a mechanism for the
398	   sender of the load information to ensure that all consumers of the
399	   load information receive the periodic load information.  This is more
400	   straightforward if the load information is sent only to peers.  It
401	   becomes more difficult if the load information is sent to non-
402	   adjacent nodes.  This might require option 1 if the load mechanism
403	   supports sending of load information to non-adjacent nodes.
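
   The three piggy-back alternatives can be compared with a small
   sketch of the sender-side decision.  All names here are
   hypothetical.  The change threshold is expressed in absolute points
   on an assumed 0-100 scale rather than as a percentage, and one
   policy object is assumed per recipient of load information.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    EVERY_MESSAGE = 1   # alternative 1
    ON_CHANGE = 2       # alternative 2
    PERIODIC = 3        # alternative 3


class LoadReportPolicy:
    """Decides whether to attach a load report to an outgoing message."""

    def __init__(self, policy: Policy,
                 change_threshold: int = 10,
                 interval: float = 5.0) -> None:
        self.policy = policy
        self.change_threshold = change_threshold
        self.interval = interval
        self._last_value: Optional[int] = None
        self._last_time: Optional[float] = None

    def should_attach(self, load: int, now: float) -> bool:
        if self.policy is Policy.EVERY_MESSAGE or self._last_value is None:
            send = True   # always send, or nothing has been sent yet
        elif self.policy is Policy.ON_CHANGE:
            send = abs(load - self._last_value) >= self.change_threshold
        else:
            send = (now - self._last_time) >= self.interval
        if send:
            self._last_value = load
            self._last_time = now
        return send
```

   The sketch shows why alternatives 2 and 3 need per-recipient state:
   the sender has to remember what each consumer last received, which
   is manageable for peers but harder for non-adjacent nodes.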

405	   If a dedicated application is used for transporting of load
406	   information then part of the application definition would need to
407	   define the frequency of sending load information.  Options 2 and 3 in
408	   the above list would be the likely alternatives.

410	8.  Load Information Semantics

412	   Both [I-D.tschofenig-dime-dlba] and [I-D.korhonen-dime-ovl] define
413	   load level to be a range between zero and some maximum value, where
414	   zero means no load at all and the max value means fully loaded.  The
415	   former uses a range of 0-10, while the latter uses 0-100.

417	   [I-D.roach-dime-overload-ctrl] treats load information as a strictly
418	   relative weighting factor.  The weight is only meaningful when load-
419	   balancing across multiple destinations.  That is, a maximum load
420	   value does not necessarily imply that the node cannot handle more
421	   traffic.  The load level scale is zero to 65535.  That scale was
422	   chosen to match the resolution of the weight field from a DNS SRV
423	   record [RFC2782].
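
   To illustrate the difference in scales, the following sketch maps a
   value from an absolute scale (such as 0-10 or 0-100) onto a common
   fraction between 0.0 and 1.0.  The function name is hypothetical.
   The sketch deliberately does not cover a strictly relative weight,
   since such a value has no "fully loaded" endpoint to normalize
   against.

```python
def normalize_load(value: int, scale_max: int) -> float:
    """Map an absolute load value in [0, scale_max] to [0.0, 1.0]."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= value <= scale_max:
        raise ValueError("value is outside the declared scale")
    return value / scale_max
```

   Under this mapping, a value of 5 on a 0-10 scale and a value of 50
   on a 0-100 scale describe the same load level.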

425	9.  Is Negotiation of Support Needed?

427	   The working group should discuss whether a load conveyance mechanism
428	   requires negotiation or declaration of support.  Several
429	   considerations apply to this discussion.

431	   If load information is treated as a hint, it can be safely ignored by
432	   nodes that don't understand it.  However, security considerations may
433	   apply if load information is accidentally leaked across a non-
434	   supporting node to a node that is not authorized to receive it.

436	   If load information is conveyed using a dedicated Diameter
437	   application, the normal mechanisms for negotiating support for
438	   Diameter applications apply.  However, the Diameter Capabilities
439	   Exchange [RFC6733] mechanism is inherently peer-to-peer.  If there is
440	   a need to convey load information across a node that does not
441	   understand the mechanism, the standard Diameter mechanism would
442	   involve probing for support by sending load requests and watching for
443	   error answers with a result code of DIAMETER_APPLICATION_UNSUPPORTED.
444	   If the probe request also includes load information, there is again a
445	   potential for leaking load information to unauthorized parties.

447	   If load information was treated in a strictly peer-to-peer fashion,
448	   there would be no need to probe to see if non-adjacent nodes support
449	   the mechanism.  However, there would still be a need to control
450	   whether a non-supporting node would leak load information.  Such a
451	   leak could be prevented if adjacent peers declared support, and never
452	   sent load information to a peer that did not declare support.

454	   A peer-to-peer mechanism would also need a way to make sure that, if
455	   load information leaked across a non-supporting node, the receiving
456	   node would not mistakenly think the information came from the non-
457	   supporting node.  This could be mitigated with a mechanism to declare
458	   support as in the previous paragraph, or with a mechanism to identify
459	   the origin of the load information.  In the latter case, the
460	   receiving node would treat any load information as invalid if the
461	   origin of that information did not match the identity of the peer
462	   node.
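
   The origin check described above can be sketched as follows.  The
   record type and function name are hypothetical, and comparing the
   identities case-insensitively is an assumption of this example.

```python
from typing import NamedTuple, Optional


class LoadReport(NamedTuple):
    origin: str   # identity of the node that generated the report
    load: int     # reported load value


def accept_load_report(report: LoadReport,
                       peer_identity: str) -> Optional[LoadReport]:
    """Return the report only if it originated at the adjacent peer."""
    # A mismatch suggests the report leaked across a non-supporting
    # node, so it is treated as invalid and discarded.
    if report.origin.casefold() != peer_identity.casefold():
        return None
    return report
```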

464	10.  Topology Scenarios

466	   This section presents a number of Diameter topology scenarios, and
467	   discusses how load information might be used in each scenario.
468	   Nothing in this section should be construed to mean that a given
469	   scenario is in scope for this effort, or even a good idea.  Some
470	   scenarios might be considered as not relevant in practice and
471	   subsequently discarded.

473	10.1.  No Agent

475	   Figure 1 shows a simple client-server scenario, where a client picks
476	   from a set of candidate servers available for a particular realm and
477	   application.  The client selects the server for a given transaction
478	   using the load information received from each server.

480	     ------S1
481	    /
482	   C
483	    \
484	     ------S2

486	                  Figure 1: Basic Client Server Scenario

488	      Open Issue: Will a Diameter node include potential peers that it
489	      is not currently connected to as part of the candidate set?  It is
490	      unlikely the client would have load information from peers that it
491	      is not currently connected to.

493	      Note: The use of dynamic connections needs to be considered.

495	10.2.  Single Agent

497	   Figure 2 shows a client that sends requests to an agent.  The agent
498	   selects the request destination from a set of candidate servers,
499	   using load information received from each server.  The client does
500	   not need to receive load information, since it does not select
501	   between multiple agents.

503	          ------S1
504	         /
505	   C----A
506	         \
507	          ------S2

509	                      Figure 2: Simple Agent Scenario

511	10.3.  Multiple Agents

513	   Figure 3 shows a client selecting between multiple agents, and each
514	   agent selecting from multiple servers.  The client selects an agent
515	   based on the load information received from each agent.  Each agent
516	   selects a server based on the load information received from its
517	   servers.

519	   This scenario adds a complication that one set of servers may be more
520	   loaded than the other set.  If, for example, S4 was the least loaded
521	   server, C would need to know to select agent A2 to reach S4.  This
522	   might require C to receive load information from the servers as well
523	   as the agents.  Alternatively, each agent might use the load of its
524	   servers as an input into calculating its own load, in effect
525	   aggregating upstream load.
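
   One possible aggregation function is sketched below.  This is only
   an illustration; the function name and the specific formula (the
   agent's own load, raised to the load of its least loaded server when
   that is higher) are assumptions, and other formulas are equally
   plausible.

```python
from typing import Iterable


def aggregated_load(own_load: int, server_loads: Iterable[int]) -> int:
    """Load an agent might report downstream, folding in its servers."""
    loads = list(server_loads)
    if not loads:
        return own_load
    # The path through this agent is at best as lightly loaded as its
    # least loaded server, and never better than the agent itself.
    return max(own_load, min(loads))
```

   For example, if A1 has a load of 20 with servers at 80 and 90, and
   A2 has a load of 30 with servers at 70 and 10, this formula reports
   80 for A1 and 30 for A2, which would steer C toward A2 without C
   seeing any server load directly.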

527	   Similarly, if C sends a host-routed request [I-D.ietf-dime-ovli], it
528	   needs to know which agent can deliver requests to the selected
529	   server.  Without some special, potentially proprietary, knowledge of
530	   the topology upstream of A1 and A2, C would select the agent based on
531	   the normal peer selection procedures for the realm and application,
532	   and perhaps consider the load information from A1 and A2.  If C sends
533	   a request to A1 that contains a Destination-Host AVP with a value of
534	   S4, A1 will not be able to deliver the request.

536	           -----S3
537	          /
538	     ---A1------S1
539	    /
540	   C
541	    \
542	     ---A2------S2
543	          \
544	           ---- S4

546	                   Figure 3: Multiple Agents and Servers

548	10.4.  Linked Agents

550	   Figure 4 shows a scenario similar to that of Figure 3, except that
551	   the agents are linked, so that A1 can forward a request to A2, and
552	   vice-versa.  Each agent could receive load information from the
553	   linked agent, as well as its connected servers.

555	   This somewhat simplifies the complication from Figure 3, due to the
556	   fact that C does not necessarily need to choose a particular agent to
557	   reach a particular server.  But it creates a similar question of how,
558	   for example, A1 might know that S4 was less loaded than S1 or S3.
559	   Additionally, it creates the opportunity for sub-optimal request
560	   paths.  For example [C,A1,A2,S4] vs. [C,A2,S4].

562	   A likely application for linked agents is when each agent prefers to
563	   route only to directly connected servers and only forwards requests
564	   to another agent under exceptional circumstances.  For example, A1
565	   might not forward requests to A2 unless both S1 and S3 are
566	   overloaded.  In this case, A1 might use the load information from S1
567	   and S3 to select between those, and only consider the load
568	   information from A2 (and other connected agents) if it needs to
569	   divert requests to different agents.
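
   The routing preference described above can be sketched as follows.
   The function and parameter names are hypothetical, and the sketch
   assumes that the agent already knows which directly connected
   servers are currently overloaded.

```python
from typing import Dict, Optional, Set


def choose_next_hop(server_load: Dict[str, int],
                    overloaded: Set[str],
                    agent_load: Dict[str, int]) -> Optional[str]:
    """Prefer direct servers; use linked agents only as a fallback."""
    # Directly connected servers that are not overloaded.
    usable = {s: load for s, load in server_load.items()
              if s not in overloaded}
    if usable:
        return min(usable, key=lambda s: usable[s])
    # Every direct server is overloaded: divert to the least loaded
    # linked agent, if there is one.
    if agent_load:
        return min(agent_load, key=lambda a: agent_load[a])
    return None
```

   In this sketch, A1 ignores the load of A2 entirely until both S1 and
   S3 are overloaded, which matches the exceptional-circumstances
   behavior described above.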

571	            -----S3
572	           /
573	      ---A1------S1
574	    /    |
575	   C     |
576	    \    |
577	      ---A2------S2
578	           \
579	            ---- S4

581	                          Figure 4: Linked Agents
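
   The "exceptional circumstances" policy described for A1 before
   Figure 4 could be sketched as follows.  This is an illustration
   only: it assumes that load is expressed as a percentage and that a
   fixed threshold marks a server as overloaded.  Neither assumption,
   nor the names used, comes from this document.

```python
# Assumed percentage at or above which a server counts as overloaded.
OVERLOAD_THRESHOLD = 90


def a1_next_hop(server_load, agent_load, threshold=OVERLOAD_THRESHOLD):
    """Choose the next hop for a request arriving at A1.

    server_load: load of directly connected servers, e.g. {"S1": 40}
    agent_load:  load reported by linked agents, e.g. {"A2": 30}
    """
    # Prefer directly connected servers that are not overloaded.
    usable = {name: load for name, load in server_load.items()
              if load < threshold}
    if usable:
        return min(usable, key=usable.get)
    # Only now consider the load reported by linked agents.
    if agent_load:
        return min(agent_load, key=agent_load.get)
    return None
```

   Note that the load reported by A2 is ignored entirely while S1 or
   S3 remains usable, even if A2 reports a much lower load.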

583	   Figure 5 is a variant of Figure 4.  In this case, C1 sends all
584	   traffic through A1 and C2 sends all traffic through A2.  By default,
585	   A1 will load balance traffic between S1 and S3 and A2 will load
586	   balance traffic between S2 and S4.

588	   Now, if S1 and S3 are significantly more loaded than S2 and S4, A1
589	   may route some C1 traffic to A2.  This is a non-optimal path, but it
590	   allows better load balancing between the servers.  To achieve this,
591	   A1 needs to receive load information from A2 about S2 and S4.

593	            -----S3
594	           /
595	   C1----A1------S1
596	         |
597	         |
598	         |
599	   C2----A2------S2
600	           \
601	            ---- S4

603	                          Figure 5: Linked Agents
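
   As a rough illustration of the Figure 5 behavior, the sketch below
   computes the share of C1 traffic that A1 might divert to A2.  It
   assumes that A2 reports a single percentage summarizing the load of
   S2 and S4, and it uses an arbitrary margin, scaling factor, and cap.
   All of these are assumptions made for the example, not properties
   of any defined mechanism.

```python
def diverted_share(local_loads, remote_load, margin=20.0, cap=0.5):
    """Fraction (0.0 to cap) of C1 traffic that A1 sends to A2.

    local_loads: loads (0-100) of A1's own servers, e.g. [80, 90]
    remote_load: single load value (0-100) reported by A2
    """
    local = sum(local_loads) / len(local_loads)
    gap = local - remote_load
    # Stay on the optimal path unless the imbalance is significant.
    if gap <= margin:
        return 0.0
    # Divert more as the gap grows, but never more than the cap.
    return min(cap, gap / 200.0)
```

   The margin keeps A1 from using the longer path for small
   differences in load.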

605	10.5.  Shared Server Pools

607	   Figure 6 is similar to Figure 4, except that instead of a link
608	   between agents, each agent is linked to all servers.  (The links to
609	   each set of servers should be interpreted as a link to each server.
610	   The links are not shown separately due to the limitations of ASCII
611	   art.)
612	   In this scenario, each agent can select among all of the servers,
613	   based on the load information from the servers.  The client need only
614	   be concerned with the load information of the agents.

616	     ---A1---S[1], S[2]...S[p]
617	    /     \ /
618	   C       x
619	    \     / \
620	     ---A2---S[p+1], S[p+2] ...S[n]

622	                       Figure 6: Shared Server Pools

624	10.6.  Agent Chains

626	   The scenario in Figure 7 is similar to that of Figure 3, except that,
627	   instead of the client possibly needing to select an agent that can
628	   route requests to the least loaded server, in this case A1 and A2
629	   need to make similar decisions when selecting between A3 and A4.  As
630	   in the former scenario, this could be mitigated if A3 and A4 aggregate
631	   upstream loads into the load information they report downstream.

633	     ---A1---A3----S[1], S[2]...S[p]
634	    /   | \ /
635	   C    |  x
636	    \   | / \
637	     ---A2---A4----S[p+1], S[p+2] ...S[n]

639	                          Figure 7: Agent Chains
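
   One possible form of the aggregation mentioned above is sketched
   here.  The rule used (report the larger of the agent's own load and
   the mean load of its upstream servers) is an assumption chosen for
   simplicity; this document does not define an aggregation function.

```python
def reported_load(own_load, upstream_loads):
    """Load value that an agent such as A3 reports downstream.

    own_load:       the agent's own load (0-100)
    upstream_loads: loads (0-100) reported by its upstream servers
    """
    if not upstream_loads:
        return own_load
    upstream = sum(upstream_loads) / len(upstream_loads)
    # Assumed rule: the agent looks as loaded as its busier part.
    return max(own_load, upstream)


def pick_agent(reports):
    """Pick the agent with the lowest reported load."""
    return min(reports, key=reports.get)
```

   With this rule, A1 would prefer A4 over a lightly loaded A3 whose
   servers are busy, which is the effect the aggregation is meant to
   achieve.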

641	10.7.  Fully Meshed Layers

643	   Figure 8 extends the scenario in Figure 6 by adding an extra layer of
644	   agents.  But since each layer of nodes can reach any node in the next
645	   layer, each node only needs to consider the load of its next-hop
646	   peer.

648	     ---A1---A3---S[1], S[2]...S[p]
649	    /   | \ / |\ /
650	   C    |  x  | x
651	    \   | / \ |/ \
652	     ---A2---A4---S[p+1], S[p+2] ...S[n]

654	                            Figure 8: Full Mesh

656	10.8.  Partitions

658	   A Diameter network with multiple servers is said to be "partitioned"
659	   when only a subset of the available servers can serve a particular
660	   realm-routed request.  For example, one group of servers may handle
661	   users whose names start with "A" through "M", and another group may
662	   handle "N" through "Z".

664	   In such a partitioned network, nodes cannot load-balance requests
665	   across partitions, since not all servers can handle the request.  A
666	   client, or an intermediate agent, may still be able to load-balance
667	   between servers inside a partition.
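
   The name-based example above can be sketched as a two-step
   selection: first find the partition, then load-balance inside it.
   The server names, the partition table, and the load values are
   hypothetical and serve only to illustrate the constraint.

```python
# Hypothetical partitions: (first letter, last letter, servers).
PARTITIONS = [
    ("A", "M", ["S1", "S2"]),
    ("N", "Z", ["S3", "S4"]),
]


def servers_for_user(user_name):
    """Return the servers that can handle this user, possibly none."""
    first = user_name[:1].upper()
    for low, high, servers in PARTITIONS:
        if low <= first <= high:
            return servers
    return []


def select_server(user_name, load):
    """Pick the least-loaded server inside the user's partition."""
    candidates = servers_for_user(user_name)
    if not candidates:
        return None
    return min(candidates, key=lambda server: load[server])
```

   A request for "alice" is sent to S1 or S2 even if S3 and S4 are
   nearly idle, since load information cannot move a request across
   partitions.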

669	10.9.  Active-Standby Nodes

671	   The previous scenarios assume that traffic can be load balanced among
672	   all peers that are eligible to handle a request.  That is, the peers
673	   operate in an "active-active" configuration.  In an "active-standby"
674	   configuration, traffic would be load-balanced among active peers.
675	   Requests would only be sent to peers in a "standby" state if the
676	   active peers became unavailable.  For example, requests might be
677	   diverted to a standby peer if one or more active peers become
678	   overloaded.
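
   A minimal sketch of active-standby selection follows.  It assumes
   that each peer has a configured role and a load percentage, that an
   unreachable peer is represented by a load of None, and that standby
   peers are used only when no active peer is usable.  Diverting as
   soon as a single active peer becomes overloaded would be an equally
   valid policy; the choice here is arbitrary.

```python
def select_peer(peers, threshold=90):
    """Select a peer, preferring active peers over standby peers.

    peers: name -> (role, load), where role is "active" or "standby"
           and load is 0-100, or None if the peer is unavailable.
    """
    def usable(role):
        return {name: load for name, (peer_role, load) in peers.items()
                if peer_role == role
                and load is not None and load < threshold}

    # Try active peers first and fall back to standby peers.
    for role in ("active", "standby"):
        candidates = usable(role)
        if candidates:
            return min(candidates, key=candidates.get)
    return None
```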

680	10.10.  Addition and Removal of Nodes

682	   When a Diameter node is added, the new node will start by advertising
683	   its load.  Downstream nodes will need to factor the new load
684	   information into load balancing decisions.  The downstream nodes
685	   should attempt to ensure a smooth increase of the traffic to the new
686	   node, avoiding an immediate spike of traffic to the new node.  It
687	   should be determined if this use case is in the scope of the load
688	   control mechanism.
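
   One way a downstream node might smooth the traffic increase is to
   scale the selection weight of a new node over a ramp-up period.
   The sketch below is only an illustration: the linear ramp, the
   300-second period, the initial floor, and the use of spare capacity
   as a weight are all assumptions.

```python
def ramp_factor(seconds_since_added, ramp_seconds=300.0, floor=0.1):
    """Scale factor that grows linearly from floor to 1.0."""
    elapsed = max(0.0, seconds_since_added)
    if elapsed >= ramp_seconds:
        return 1.0
    return floor + (1.0 - floor) * (elapsed / ramp_seconds)


def selection_weight(load, seconds_since_added, ramp_seconds=300.0):
    """Weight for picking a node: spare capacity, damped while new.

    load: advertised load of the node (0-100)
    """
    spare = 100.0 - load
    return spare * ramp_factor(seconds_since_added, ramp_seconds)
```

   A new node advertising zero load would otherwise receive the
   largest weight immediately, which is the spike the text above warns
   against.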

690	   When removing a node in a controlled way (e.g., for maintenance
691	   rather than because of a failure), it might be appropriate to
692	   progressively reduce the traffic to this node by routing traffic to
693	   other nodes.  Simple load information (a load percentage) would not
694	   be sufficient.  It should be determined if this use case is in the
695	   scope of the load control mechanism.
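
   To illustrate why a load percentage alone does not cover this case,
   the sketch below adds a separate, hypothetical drain signal: the
   time since the node began draining.  The signal, the linear
   schedule, and the 600-second period are assumptions made for the
   example and are not part of any defined mechanism.

```python
def drain_factor(seconds_since_drain_start, drain_seconds=600.0):
    """Scale factor that falls linearly from 1.0 to 0.0 while draining.

    A value of None means that the node is not being drained.
    """
    if seconds_since_drain_start is None:
        return 1.0
    remaining = 1.0 - seconds_since_drain_start / drain_seconds
    return max(0.0, min(1.0, remaining))


def draining_weight(load, seconds_since_drain_start=None,
                    drain_seconds=600.0):
    """Selection weight: spare capacity, reduced while draining."""
    spare = 100.0 - load
    return spare * drain_factor(seconds_since_drain_start, drain_seconds)
```

   The advertised load of the node can stay low throughout; it is the
   extra signal that moves traffic away.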

697	11.  Security Considerations

699	   Load information may be sensitive information in some cases.
700	   Depending on the mechanism, an unauthorized recipient might be able
701	   to infer the topology of a Diameter network from load information.
702	   Load information might be useful in identifying targets for Denial of
703	   Service (DoS) attacks, where a node known to be already heavily
704	   loaded might be a tempting target.  Load information might also be
705	   useful as feedback about the success of an ongoing DoS attack.

707	   Any load information conveyance mechanism will need to allow
708	   operators to avoid sending load information to nodes that are not
709	   authorized to receive it.  Since Diameter currently only offers
710	   authentication of nodes at the transport level, any solution that
711	   sends load information to non-peer nodes might require a transitive-
712	   trust model.

714	12.  IANA Considerations

716	   This document makes no requests of IANA.

718	13.  References

720	13.1.  Normative References

722	   [I-D.ietf-dime-ovli]
723	              Korhonen, J., Donovan, S., Campbell, B., and L. Morand,
724	              "Diameter Overload Indication Conveyance", draft-ietf-
725	              dime-ovli-03 (work in progress), July 2014.

727	   [RFC6733]  Fajardo, V., Arkko, J., Loughney, J., and G. Zorn,
728	              "Diameter Base Protocol", RFC 6733, October 2012.

730	   [RFC7068]  McMurry, E. and B. Campbell, "Diameter Overload Control
731	              Requirements", RFC 7068, November 2013.

733	13.2.  Informative References

735	   [I-D.korhonen-dime-ovl]
736	              Korhonen, J. and H. Tschofenig, "The Diameter Overload
737	              Control Application (DOCA)", draft-korhonen-dime-ovl-01
738	              (work in progress), February 2013.

740	   [I-D.roach-dime-overload-ctrl]
741	              Roach, A. and E. McMurry, "A Mechanism for Diameter
742	              Overload Control", draft-roach-dime-overload-ctrl-03 (work
743	              in progress), May 2013.

745	   [I-D.tschofenig-dime-dlba]
746	              Tschofenig, H., "The Diameter Load Balancing Application
747	              (DLBA)", draft-tschofenig-dime-dlba-00 (work in progress),
748	              July 2013.

750	   [RFC2782]  Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
751	              specifying the location of services (DNS SRV)", RFC 2782,
752	              February 2000.

754	Authors' Addresses

756	   Ben Campbell
757	   Oracle
758	   7460 Warren Parkway # 300
759	   Frisco, Texas  75034
760	   USA

762	   Email: ben@nostrum.com

764	   Steve Donovan (editor)
765	   Oracle
766	   7460 Warren Parkway # 300
767	   Frisco, Texas  75034
768	   United States

770	   Email: srdonovan@usdonovans.com

772	   Jean-Jacques Trottin
773	   Alcatel-Lucent
774	   Route de Villejust
775	   91620 Nozay
776	   France

778	   Email: jean-jacques.trottin@alcatel-lucent.com