Internet Engineering Task Force                              B. Campbell
Internet-Draft                                           S. Donovan, Ed.
Intended status: Informational                                    Oracle
Expires: September 7, 2015                                   JJ. Trottin
                                                          Alcatel-Lucent
                                                           March 6, 2015


       Architectural Considerations for Diameter Load Information
               draft-campbell-dime-load-considerations-01

Abstract

   RFC 7068 describes requirements for Overload Control in Diameter.
   This includes a requirement to allow Diameter nodes to send "load"
   information, even when the node is not overloaded. The Diameter
   Overload Indication Conveyance (DOIC) solution describes a mechanism
   meeting most of the requirements, but does not currently include the
   ability to send load information. This document explores some
   architectural considerations for a mechanism to send Diameter load
   information.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 7, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Differences between Load and Overload information
   3.  How is Load Information Used?
   4.  Piggy-Backing vs a Dedicated Application.
   5.  Which Nodes Exchange Load Information?
   6.  Scope of Load Information
   7.  Frequency of Sending Load Information
   8.  Load Information Semantics
   9.  Is Negotiation of Support Needed?
   10. Topology Scenarios
     10.1.  No Agent
     10.2.  Single Agent
     10.3.  Multiple Agents
     10.4.  Linked Agents
     10.5.  Shared Server Pools
     10.6.  Agent Chains
     10.7.  Fully Meshed Layers
     10.8.  Partitions
     10.9.  Active-Standby Nodes
     10.10. Addition and removal of Nodes
   11. Security Considerations
   12. IANA Considerations
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   [RFC7068] describes requirements for Overload Control in Diameter
   [RFC6733]. At the time of this writing, the DIME working group is
   working on the Diameter Overload Indication Conveyance (DOIC)
   mechanism [I-D.ietf-dime-ovli]. As currently specified, DOIC
   fulfills some, but not all, of the requirements.

   In particular, DOIC does not fulfill Req 24, which requires a
   mechanism where Diameter nodes can indicate their current load, even
   if they are not currently overloaded. DOIC also does not fulfill Req
   23, which requires that nodes that divert traffic away from
   overloaded nodes be provided with sufficient information to select
   targets that are most likely to have sufficient capacity.

   Several other requirements in RFC 7068 that mention both overload
   and load information are only partially fulfilled by DOIC.

   The DIME working group explicitly chose not to fulfill these
   requirements in DOIC for several reasons. A principal reason was
   that the working group did not agree on a general approach for
   conveying load information. It chose to progress the rest of DOIC,
   and defer load information conveyance to a DOIC extension or a
   separate mechanism.

   This document describes some high-level architectural decisions that
   the working group will need to consider in order to solve the
   load-related requirements from RFC 7068.

   At the time of this writing, there have been several attempts to
   create mechanisms for conveyance of both load and overload control
   information that were not adopted by the DIME working group. While
   these drafts are not expected to progress, they may be instructive
   when considering these decisions.

   o  [I-D.tschofenig-dime-dlba] proposed a dedicated Diameter
      application for exchanging load balancing information.

   o  [I-D.roach-dime-overload-ctrl] described a strictly peer-to-peer
      exchange of both load and overload information in new AVPs
      piggy-backed on existing Diameter messages.

   o  [I-D.korhonen-dime-ovl] described a dedicated Diameter application
      for exchanging both load and overload information.

2.  Differences between Load and Overload information

   Previous discussions of how to solve the load-related requirements in
   [RFC7068] have shown that people do not have an agreed-upon concept
   of how "load" information differs from "overload" information. The
   two concepts are highly interrelated, and so far the working group
   has not defined a bright line between what constitutes load
   information and what constitutes overload information.

   In the opinion of the authors, there are two primary differences.
   First, a Diameter node always has a load. At any given time that
   load may be effectively zero, effectively fully loaded, or somewhere
   in between. In contrast, overload is an exceptional condition. A
   node only has overload information when it is in an overloaded state.

   Furthermore, the relationship between a node's load level and
   overload state at any given time may be vague. For example, a node
   may normally operate at a "fully loaded" level, but still not be
   considered overloaded. Another node may declare itself to be
   "overloaded" even though it might not be fully "loaded".

   Second, overload information, in the form of a DOIC Overload Report
   (OLR) [I-D.ietf-dime-ovli], indicates an explicit request for action
   on the part of the reacting node. That is, the OLR requests that the
   reacting node reduce the offered load -- the actual traffic sent to
   the reporting node after overload abatement and routing decisions are
   made -- by an indicated amount or to an indicated level.
   Effectively, DOIC provides a contract between the reporting node and
   the reacting node.
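
   As a non-normative illustration of such a contract, the sketch below
   assumes a report that asks for a percentage reduction of the traffic
   that would otherwise be sent to the reporting node. The function
   name and its parameters are hypothetical; they are not defined by
   DOIC or by this document.

```python
def offered_load_after_olr(current_rate: float,
                           reduction_percent: float) -> float:
    """Return the request rate a reacting node may still send.

    Assumption (not taken from DOIC): the overload report carries a
    requested reduction expressed as a percentage between 0 and 100.
    """
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    return current_rate * (100 - reduction_percent) / 100


# A reacting node that would otherwise send 200 requests per second
# receives a report asking for a 25 percent reduction.
print(offered_load_after_olr(200.0, 25.0))
```

   The reacting node is expected to honor the requested reduction; how
   it chooses which requests to abate is outside this sketch.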

   In contrast, load is informational. That is, load information can be
   considered a hint to the recipient node. That node may use the load
   information for load balancing purposes, as an input to certain
   overload abatement techniques, to make inferences about the
   likelihood that the sending node becomes overloaded in the immediate
   future, or for other purposes.

   None of this prevents a Diameter node from deciding to reduce the
   offered load based on load information. The fundamental difference
   is that an overload report requires that reduction. It is also
   reasonable for a Diameter node to decide to increase the offered load
   based on load information.

3.  How is Load Information Used?

   [RFC7068] contemplates two primary uses for load information. Req 23
   discusses how load information might be used when performing
   diversion as an overload abatement technique, as described in
   [I-D.ietf-dime-ovli]. When a reacting node diverts traffic away from
   an overloaded node, it needs load information for the other
   candidates for that traffic in order to effectively load balance the
   diverted load between potential candidates. Otherwise, diversion has
   a greater potential to drive other nodes into overload.

   Req 24 discusses how Diameter load information might be used when no
   overload condition currently exists. Diameter nodes can use the load
   information to make decisions to try to avoid overload conditions in
   the first place. Normal load-balancing falls into this category. A
   node might also take other proactive steps to reduce offered load
   based on load information, so that the loaded node never goes into
   overload in the first place.

   If the loaded nodes are Diameter servers (or clients in the case of
   server-to-client transactions), both of these uses are most
   effectively accomplished by a Diameter node that performs server
   selection.
   Typically, server selection is performed by a node (a
   client or an agent) that is an immediate peer of the server.
   However, there are scenarios (see Section 10) where a client or proxy
   that is not the immediate peer to the selected servers performs
   server selection. In this case, the client or proxy enforces the
   server selection by inserting a Destination-Host AVP.

      For example, a Diameter node (e.g., a client) can use a redirect
      agent to get candidate destination host addresses. The redirect
      agent might return several destination host addresses, from which
      the Diameter node selects one. The Diameter node can use load
      information received from these hosts to make the selection.

   Just as load information can be used as part of server selection, it
   can also be used as input to the selection of the next-hop peer to
   which a request is to be routed.

   One area that requires thought is how load information is used, if at
   all, in the presence of an overload report from the same Diameter
   node. It might be that the load information from that Diameter node
   is ignored for the duration of the time that the overload report is
   in effect. It might also be possible that the load information can
   aid in the routing of non-abated requests targeted for the overloaded
   Diameter node.

4.  Piggy-Backing vs a Dedicated Application.

   [I-D.roach-dime-overload-ctrl] embeds load and overload information
   in messages of existing applications. This is known as a
   "piggy-back" approach. Such an approach has the advantage of not
   requiring new messages to carry load information. It has an
   additional advantage of scaling with load; that is, the higher the
   transaction load, the more opportunities there are to send load
   information.

   DOIC [I-D.ietf-dime-ovli] also uses a piggy-backed approach to send
   OLRs.
   Given the potentially tight connection between load and
   overload information, there may be advantages to maintaining
   consistency with DOIC.

   [I-D.tschofenig-dime-dlba] used a dedicated application to carry load
   information. This application has quasi-subscription semantics,
   where a client requests updates according to a cadence. The server
   can send unsolicited updates if the load level changes between
   updates in the cadence.

   [I-D.korhonen-dime-ovl] also used a dedicated application, but
   allowed nodes to send unsolicited reports containing load and
   overload information. The mechanism has an issue that the sender of
   load information may not know which other nodes need the information.
   It may be possible to infer that information from other application
   messages handled by the sender.

   Another potential approach is that of a dedicated Diameter
   application with a slightly different subscription semantic than that
   of [I-D.tschofenig-dime-dlba]. In such an application, a node that
   consumes load information sends a Diameter request to the source of
   the load information. This request indicates that the consumer
   wishes to receive load information for some period of time. The load
   source would send periodic Diameter requests indicating the current
   load level, until such time that the subscription period expired, or
   the subscriber explicitly unsubscribed. After the initial
   notification, the sender would only send updates when the load level
   changed.

5.  Which Nodes Exchange Load Information?

   Section 10 illustrates a number of Diameter network topologies where
   load information may be useful. However, there are potentially
   limitless configurations where load information might be used to make
   peer and server selection choices.
   Nodes may be unaware of the
   topology beyond their immediate peers, which may limit the utility of
   load information for nodes beyond that peer.

   There may in fact be scenarios where a peer-selection decision is
   impacted by the load of non-adjacent nodes, or where a node needs to
   force selection of a particular non-adjacent server. While explicit
   knowledge of the load of such non-adjacent nodes may be useful in
   such decisions, the working group should consider whether this
   utility is worth the added complexity.

      For instance, one approach would be to support two types of load
      reports, endpoint load reports and peer load reports. In this
      scenario, load reports would likely require an AVP indicating the
      Diameter node to which the report applies. This would be needed
      to differentiate between endpoint load reports and next-hop load
      reports. This would imply that a single message will likely have
      two load reports, one for the endpoint and one for the next hop.
      This would also add complexity in agents, which would sometimes
      need to strip next-hop load reports and sometimes not.

   Previous load-related efforts have made different assumptions about
   which Diameter nodes exchange load information.

   [I-D.roach-dime-overload-ctrl] operated in a strictly peer-to-peer
   mode. Each node would only learn the load (and overload) information
   from its immediate peers.

   [I-D.korhonen-dime-ovl] and [I-D.tschofenig-dime-dlba] are each
   effectively any-to-any. That is, they each allowed any node to send
   load information to any other node that supported the dedicated
   overload or load application, respectively.

   In the latter case, load is effectively sent between clients and
   servers of the dedicated application, but those roles may not match
   the client and server roles for the "main" Diameter applications in
   use.
   For example, a pair of adjacent Diameter agents might be
   "client" and "server" for the dedicated "load" application,
   effectively creating a peer-to-peer relationship similar to that of
   [I-D.roach-dime-overload-ctrl].

   Each approach has advantages. Peer-to-peer transmission covers the
   case when server selection is done by the server's immediate peers.
   Additionally, selection of non-terminal nodes is generally done on a
   peer-to-peer basis. If the loaded node is an agent, for example, the
   load information is only useful to immediate peers. Peer-to-peer
   transmission is the easiest to negotiate. (See Section 9.)

   Any-to-any transmission offers more flexibility, and could
   potentially cover the case where server selection is done by nodes
   that are not peers to the candidate servers.

6.  Scope of Load Information

   Load information could refer to several different scopes:

   o  Load of a Node -- The load information refers to the load for an
      entire Diameter host, that is, a Client, Agent, or Server
      described by a Diameter Identity.

   o  Load of an Application -- The load for a specific Diameter node
      that supports multiple Diameter applications might differ between
      applications.

   o  Load of a set of nodes -- The load would likely be the aggregated
      load of the nodes in the set. This would likely require a
      separate Diameter identity be assigned to the set of nodes and the
      load information would be associated with that Diameter identity.

   o  Aggregate Load -- Different paths via different agents may exist
      between a node making a peer selection decision and the final
      destination of the request. The least loaded destination may only
      be reachable via certain peers.

   o  Load of an agent plus load of a Diameter endpoint -- Different
      paths via different Diameter agents may exist between the node
      doing the server selection and the targeted Diameter endpoint.
      The load information on the Diameter endpoint might be used for
      server selection and the load information on the agent might be
      used for selecting the next hop in the route to the Diameter
      endpoint.

   The "scope" of load information defines what the load indication
   applies to. For example, load could apply to a whole Diameter node,
   or a node could report different load for different applications. It
   might be possible to have a load value for a whole realm, or a group
   of nodes.

   [I-D.roach-dime-overload-ctrl] has a very expressive concept of
   scope, which applies both to load and overload information. It
   defines the scopes of "Destination-Realm", "Application-ID",
   "Destination-Host", "Host", "Connection", "Session", and
   "Session-Group". Scopes can be combined.

   [I-D.tschofenig-dime-dlba] does not have an explicit concept of
   scope. Load information describes the load of a server for all
   Diameter purposes.

   [I-D.korhonen-dime-ovl] defines several scopes for overload
   information. However, load information applies to a whole node.

   One view is that the load level of a Diameter node will usually apply
   to the whole node. In this case, the working group should consider a
   single "whole node" scope for load information. Alternatively, a
   "per-connection" scope could simulate "whole node" scope without
   requiring the recipient to pay attention to whether multiple
   transport connections terminate at the same peer.

   Other scopes might also be considered based on the analysis of the
   use cases identified for the use of load information.

7.  Frequency of Sending Load Information

   While it is true that a node always has a discrete load, a
   determination needs to be made as to the frequency with which load
   information is sent.

   This interacts with the method for transporting load information --
   piggy-backed versus a dedicated application -- discussed in
   Section 4.

   With a piggy-backed approach the following alternatives exist:

   1.  Send load information in every message.

   2.  Send load information when it changes by some amount. For
       instance, only send a new load report when the load value has
       changed by some percentage.

   3.  Send load information at a fixed interval of time. With this
       approach, load information would be sent every so many seconds.

   With alternatives 2 and 3 there would need to be a mechanism for the
   sender of the load information to ensure that all consumers of the
   load information receive the periodic load information. This is more
   straightforward if the load information is sent only to peers. It
   becomes more difficult if the load information is sent to
   non-adjacent nodes. This might require alternative 1 if the load
   mechanism supports sending of load information to non-adjacent nodes.

   If a dedicated application is used for transporting load
   information then part of the application definition would need to
   define the frequency of sending load information. Alternatives 2 and
   3 in the above list would be the likely choices.

8.  Load Information Semantics

   Both [I-D.tschofenig-dime-dlba] and [I-D.korhonen-dime-ovl] define
   load level to be a range between zero and some maximum value, where
   zero means no load at all and the max value means fully loaded. The
   former uses a range of 0-10, while the latter uses 0-100.

   [I-D.roach-dime-overload-ctrl] treats load information as a strictly
   relative weighting factor. The weight is only meaningful when
   load-balancing across multiple destinations. That is, a maximum load
   value does not necessarily imply that the node cannot handle more
   traffic. The load level scale is zero to 65535.
   That scale was
   chosen to match the resolution of the weight field from a DNS SRV
   record [RFC2782].

9.  Is Negotiation of Support Needed?

   The working group should discuss whether a load conveyance mechanism
   requires negotiation or declaration of support. Several
   considerations apply to this discussion.

   If load information is treated as a hint, it can be safely ignored by
   nodes that don't understand it. However, security considerations may
   apply if load information is accidentally leaked across a
   non-supporting node to a node that is not authorized to receive it.

   If load information is conveyed using a dedicated Diameter
   application, the normal mechanisms for negotiating support for
   Diameter applications apply. However, the Diameter Capabilities
   Exchange [RFC6733] mechanism is inherently peer-to-peer. If there is
   a need to convey load information across a node that does not
   understand the mechanism, the standard Diameter mechanism would
   involve probing for support by sending load requests and watching for
   error answers with a result code of DIAMETER_APPLICATION_UNSUPPORTED.
   If the probe request also includes load information, there is again a
   potential for leaking load information to unauthorized parties.

   If load information were treated in a strictly peer-to-peer fashion,
   there would be no need to probe to see if non-adjacent nodes support
   the mechanism. However, there would still be a need to control
   whether a non-supporting node would leak load information. Such a
   leak could be prevented if adjacent peers declared support, and never
   sent load information to a peer that did not declare support.

   A peer-to-peer mechanism would also need a way to make sure that, if
   load information leaked across a non-supporting node, the receiving
   node would not mistakenly think the information came from the
   non-supporting node.
   This could be mitigated with a mechanism to declare
   support as in the previous paragraph, or with a mechanism to identify
   the origin of the load information. In the latter case, the
   receiving node would treat any load information as invalid if the
   origin of that information did not match the identity of the peer
   node.

10.  Topology Scenarios

   This section presents a number of Diameter topology scenarios, and
   discusses how load information might be used in each scenario.
   Nothing in this section should be construed to mean that a given
   scenario is in scope for this effort, or even a good idea. Some
   scenarios might be considered as not relevant in practice and
   subsequently discarded.

10.1.  No Agent

   Figure 1 shows a simple client-server scenario, where a client picks
   from a set of candidate servers available for a particular realm and
   application. The client selects the server for a given transaction
   using the load information received from each server.

        ------S1
       /
     C
       \
        ------S2

                 Figure 1: Basic Client Server Scenario

      Open Issue: Will a Diameter node include potential peers that it
      is not currently connected to as part of the candidate set? It is
      unlikely the client would have load information from peers that it
      is not currently connected to.

      Note: The use of dynamic connections needs to be considered.

10.2.  Single Agent

   Figure 2 shows a client that sends requests to an agent. The agent
   selects the request destination from a set of candidate servers,
   using load information received from each server. The client does
   not need to receive load information, since it does not select
   between multiple agents.

             ------S1
            /
     C----A
            \
             ------S2

                    Figure 2: Simple Agent Scenario

10.3.  Multiple Agents

   Figure 3 shows a client selecting between multiple agents, and each
   agent selecting from multiple servers.
   The client selects an agent
   based on the load information received from each agent. Each agent
   selects a server based on the load information received from its
   servers.

   This scenario adds a complication that one set of servers may be more
   loaded than the other set. If, for example, S4 was the least loaded
   server, C would need to know to select agent A2 to reach S4. This
   might require C to receive load information from the servers as well
   as the agents. Alternatively, each agent might use the load of its
   servers as an input into calculating its own load, in effect
   aggregating upstream load.

   Similarly, if C sends a host-routed request [I-D.ietf-dime-ovli], it
   needs to know which agent can deliver requests to the selected
   server. Without some special, potentially proprietary, knowledge of
   the topology upstream of A1 and A2, C would select the agent based on
   the normal peer selection procedures for the realm and application,
   and perhaps consider the load information from A1 and A2. If C sends
   a request to A1 that contains a Destination-Host AVP with a value of
   S4, A1 will not be able to deliver the request.

               -----S3
              /
        ---A1------S1
       /
     C
       \
        ---A2------S2
              \
               ---- S4

                 Figure 3: Multiple Agents and Servers

10.4.  Linked Agents

   Figure 4 shows a scenario similar to that of Figure 3, except that
   the agents are linked, so that A1 can forward a request to A2, and
   vice-versa. Each agent could receive load information from the
   linked agent, as well as its connected servers.

   This somewhat simplifies the complication from Figure 3, due to the
   fact that C does not necessarily need to choose a particular agent to
   reach a particular server. But it creates a similar question of how,
   for example, A1 might know that S4 was less loaded than S1 or S3.
   Additionally, it creates the opportunity for sub-optimal request
   paths.
   For example, [C,A1,A2,S4] vs. [C,A2,S4].

   A likely application for linked agents is when each agent prefers to
   route only to directly connected servers and only forwards requests
   to another agent under exceptional circumstances. For example, A1
   might not forward requests to A2 unless both S1 and S3 are
   overloaded. In this case, A1 might use the load information from S1
   and S3 to select between those, and only consider the load
   information from A2 (and other connected agents) if it needs to
   divert requests to different agents.

               -----S3
              /
        ---A1------S1
       /   |
     C     |
       \   |
        ---A2------S2
              \
               ---- S4

                        Figure 4: Linked Agents

   Figure 5 is a variant of Figure 4. In this case, C1 sends all
   traffic through A1 and C2 sends all traffic through A2. By default,
   A1 will load balance traffic between S1 and S3 and A2 will load
   balance traffic between S2 and S4.

   Now, if S1 and S3 are significantly more loaded than S2 and S4, A1
   may route some C1 traffic to A2. This is a non-optimal path but
   allows better load balancing between the servers. To achieve this,
   A1 needs to receive some load information from A2 about the load of
   S2 and S4.

             -----S3
            /
   C1----A1------S1
         |
         |
         |
   C2----A2------S2
            \
             ---- S4

                        Figure 5: Linked Agents

10.5.  Shared Server Pools

   Figure 6 is similar to Figure 4, except that instead of a link
   between agents, each agent is linked to all servers. (The links to
   each set of servers should be interpreted as a link to each server.
   The links are not shown separately due to the limitations of ASCII
   art.)

   In this scenario, each agent can select among all of the servers,
   based on the load information from the servers. The client need only
   be concerned with the load information of the agents.

        ---A1---S[1], S[2]...S[p]
       /     \ /
     C        x
       \     / \
        ---A2---S[p+1], S[p+2] ...S[n]

                     Figure 6: Shared Server Pools

10.6.  Agent Chains

   The scenario in Figure 7 is similar to that of Figure 3, except that,
   instead of the client possibly needing to select an agent that can
   route requests to the least loaded server, in this case A1 and A2
   need to make similar decisions when selecting between A3 or A4. As
   in the former scenario, this could be mitigated if A3 and A4
   aggregate upstream loads into the load information they report
   downstream.

        ---A1---A3----S[1], S[2]...S[p]
       /   |       \ /
     C     |        x
       \   |       / \
        ---A2---A4----S[p+1], S[p+2] ...S[n]

                         Figure 7: Agent Chains

10.7.  Fully Meshed Layers

   Figure 8 extends the scenario in Figure 6 by adding an extra layer of
   agents. But since each layer of nodes can reach any node in the next
   layer, each node only needs to consider the load of its next-hop
   peer.

        ---A1---A3---S[1], S[2]...S[p]
       /   | \ / |\ /
     C     |  x  | x
       \   | / \ |/ \
        ---A2---A4---S[p+1], S[p+2] ...S[n]

                          Figure 8: Full Mesh

10.8.  Partitions

   A Diameter network with multiple servers is said to be "partitioned"
   when only a subset of available servers can serve a particular
   realm-routed request. For example, one group of servers may handle
   users whose names start with "A" through "M", and another group may
   handle "N" through "Z".

   In such a partitioned network, nodes cannot load-balance requests
   across partitions, since not all servers can handle the request. A
   client, or an intermediate agent, may still be able to load-balance
   between servers inside a partition.

10.9.  Active-Standby Nodes

   The previous scenarios assume that traffic can be load balanced among
   all peers that are eligible to handle a request. That is, the peers
   operate in an "active-active" configuration. In an "active-standby"
   configuration, traffic would be load-balanced among active peers.
   Requests would only be sent to peers in a "standby" state if the
   active peers became unavailable.
   For example, requests might be diverted to a standby peer if one or
   more active peers become overloaded.

10.10.  Addition and Removal of Nodes

   When a Diameter node is added, the new node will start by advertising
   its load.  Downstream nodes will need to factor the new load
   information into load balancing decisions.  The downstream nodes
   should attempt to ensure a smooth increase of the traffic to the new
   node, avoiding an immediate spike of traffic to the new node.  It
   should be determined if this use case is in the scope of the load
   control mechanism.

   When removing a node in a controlled way (e.g., for maintenance
   rather than because of a failure), it might be appropriate to
   progressively reduce the traffic to this node by routing traffic to
   other nodes.  Simple load information (a load percentage) would not
   be sufficient for this.  It should be determined if this use case is
   in the scope of the load control mechanism.

11.  Security Considerations

   Load information may be sensitive in some cases.  Depending on the
   mechanism, an unauthorized recipient might be able to infer the
   topology of a Diameter network from load information.  Load
   information might be useful in identifying targets for Denial of
   Service (DoS) attacks, where a node known to be already heavily
   loaded might be a tempting target.  Load information might also be
   useful as feedback about the success of an ongoing DoS attack.

   Any load information conveyance mechanism will need to allow
   operators to avoid sending load information to nodes that are not
   authorized to receive it.  Since Diameter currently only offers
   authentication of nodes at the transport level, any solution that
   sends load information to non-peer nodes might require a
   transitive-trust model.

12.  IANA Considerations

   This document makes no requests of IANA.

13.  References

13.1.
Normative References

   [I-D.ietf-dime-ovli]
              Korhonen, J., Donovan, S., Campbell, B., and L. Morand,
              "Diameter Overload Indication Conveyance",
              draft-ietf-dime-ovli-03 (work in progress), July 2014.

   [RFC6733]  Fajardo, V., Arkko, J., Loughney, J., and G. Zorn,
              "Diameter Base Protocol", RFC 6733, October 2012.

   [RFC7068]  McMurry, E. and B. Campbell, "Diameter Overload Control
              Requirements", RFC 7068, November 2013.

13.2.  Informative References

   [I-D.korhonen-dime-ovl]
              Korhonen, J. and H. Tschofenig, "The Diameter Overload
              Control Application (DOCA)", draft-korhonen-dime-ovl-01
              (work in progress), February 2013.

   [I-D.roach-dime-overload-ctrl]
              Roach, A. and E. McMurry, "A Mechanism for Diameter
              Overload Control", draft-roach-dime-overload-ctrl-03
              (work in progress), May 2013.

   [I-D.tschofenig-dime-dlba]
              Tschofenig, H., "The Diameter Load Balancing Application
              (DLBA)", draft-tschofenig-dime-dlba-00 (work in progress),
              July 2013.

   [RFC2782]  Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
              specifying the location of services (DNS SRV)", RFC 2782,
              February 2000.

Authors' Addresses

   Ben Campbell
   Oracle
   7460 Warren Parkway # 300
   Frisco, Texas 75034
   USA

   Email: ben@nostrum.com

   Steve Donovan (editor)
   Oracle
   7460 Warren Parkway # 300
   Frisco, Texas 75034
   United States

   Email: srdonovan@usdonovans.com

   Jean-Jacques Trottin
   Alcatel-Lucent
   Route de Villejust
   91620 Nozay
   France

   Email: jean-jacques.trottin@alcatel-lucent.com