Network Working Group                                        B. Campbell
Internet-Draft                                                   Tekelec
Intended status: Informational                             July 15, 2013
Expires: January 16, 2014

               Diameter Overload Control Solution Issues
                 draft-campbell-dime-overload-issues-01

Abstract

   The Diameter Maintenance and Extensions (DIME) working group has
   undertaken an "overload control" work item, with the goal of
   standardizing a mechanism to allow Diameter nodes to report overload
   information among themselves.
   Requirements currently include, among others, the need to accurately
   report the scope of overload conditions, and the ability to report
   overload information between nodes that are not directly connected
   at the transport layer. These requirements introduce complex issues.
   This document describes those issues, in the hope that it will assist
   the working group's decision process.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Document Conventions
   3.  Non-adjacent Overload Information
     3.1.  Use-Cases for Non-adjacent Overload Control
       3.1.1.  Interconnect
       3.1.2.  Non-Supporting Agents
     3.2.  Issues with Non-Adjacent Overload Control
       3.2.1.  Topology Issues
       3.2.2.  Support Negotiation
       3.2.3.  Overload Report Delivery
       3.2.4.  Non-Adjacent Overload Scopes
     3.3.  Non-adjacent Overload Control Recommendations
   4.  Overload Scopes
     4.1.  Explicit vs Implicit Indication of Scopes
     4.2.  Types of Overload Scopes
       4.2.1.  Connection Scope-Type
       4.2.2.  Peer Scope-Type
       4.2.3.  Destination-Host Scope-Type
       4.2.4.  Origin-Host Scope-Type
       4.2.5.  Diameter-Application Scope-Type
       4.2.6.  Destination-Realm Scope-Type
       4.2.7.  Session Scope-Type
       4.2.8.  Session-Group Scope-Type
     4.3.  Scope Values
     4.4.  Combining Scopes
     4.5.  Scope Extensibility
     4.6.  Scope Recommendations
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Contributors
   Author's Address

1. Introduction

   When a Diameter [RFC6733] server or agent becomes overloaded, it
   needs to be able to gracefully reduce its load, typically by
   requesting other nodes to reduce the number of Diameter requests for
   some period of time.

   The Diameter Overload Control Requirements
   [I-D.ietf-dime-overload-reqs] describe requirements for overload
   control mechanisms. Requirement 31 states that Diameter nodes must
   be able to report overload with sufficient granularity to avoid
   forcing available capacity to go unused. Requirement 34 requires the
   ability to report overload across Diameter nodes that do not support
   the mechanism. These requirements introduce significant and
   interrelated complexities to potential solutions. This document
   describes the related issues. The author hopes that this document
   will assist the working group's decision process related to these
   requirements.

   At the time of this writing, there have been two proposals for
   Diameter overload control solutions. "A Mechanism for Diameter
   Overload Control" (MDOC) [I-D.roach-dime-overload-ctrl] defines a
   solution that piggybacks overload and load state information over
   existing Diameter messages. "The Diameter Overload Control
   Application" (DOCA) [I-D.korhonen-dime-ovl] defines a solution that
   uses a new dedicated Diameter application to communicate similar
   information.

   While there are significant differences between the two proposals,
   they carry similar information. In many ways, the issues related
   to Requirements 31 and 34 apply to both proposals. This
   discussion is not specific to one proposal or the other, unless
   explicitly mentioned.

   This document serves two purposes.
   The primary purpose is to explore the issues related to
   Requirement 34, that is, the requirement for the overload control
   mechanism to support sending load and overload information across
   intermediaries that do not support the mechanism (referred to herein
   as "non-adjacent" overload reporting). The document describes two
   use cases for non-adjacent overload reporting. It does not, however,
   attempt to describe the use cases for Diameter agents in general.
   For a more thorough treatment of Diameter agent use cases in the
   context of overload control, please see
   [I-D.ietf-dime-overload-reqs].

   The secondary purpose is to help the reader understand the concept of
   overload scopes, and to make recommendations about what kinds of
   overload scope should be supported by the mechanism. These purposes
   are interrelated, since an understanding of overload scopes is
   necessary to fully understand some of the issues with non-adjacent
   overload reporting.

2. Document Conventions

   This document uses terms defined in [RFC6733] and
   [I-D.ietf-dime-overload-reqs]. In particular, the terms "client",
   "server", "upstream", and "downstream" are used as defined in RFC
   6733. In addition, this document uses the following terms:

   Overload:  A condition where a Diameter node needs a reduction in the
      number of requests that it must handle.

   Overload Report:  A request to reduce traffic that contributes to an
      overload condition.

   Overload Scope:  A classifier that defines the set of requests that
      may contribute to particular overload conditions. Alternatively,
      the purposes for which a node may be overloaded. For example, if
      a server is overloaded for the purposes of one Diameter
      application but not another, the overload condition can be
      considered "scoped" to that application.

   Reporting Node:  The node that sends an overload report. Also known
      as an "overloaded node".

   Reacting Node:  A node that consumes and possibly acts on an overload
      report.

   Adjacent Overload Reporting:  Overload reports exchanged between
      adjacent Diameter peers.

   Non-Adjacent Overload Reporting:  Overload reports sent between
      Diameter nodes separated by one or more intermediate Diameter
      agents (i.e. relays or proxies).

   Piggybacked Overload Reporting:  The inclusion of overload reports in
      existing Diameter messages.

   Application-Based Overload Reporting:  The sending of overload
      reports in a separate, dedicated Diameter application.

3. Non-adjacent Overload Information

   Requirement 34 of [I-D.ietf-dime-overload-reqs] says that the
   selected Diameter overload control mechanism "SHOULD" be able to
   communicate overload and load information across intermediaries that
   do not support the mechanism. This requirement complicates the
   solution effort in several ways: it affects how Diameter nodes
   negotiate support for overload control, how they address and route
   overload reports to the right places, and how they act on received
   overload reports.

   While the requirement does not explicitly say it, we interpret
   "intermediaries" in this context to mean Diameter agents. The
   requirement is irrelevant for lower layer intermediaries (e.g.
   routers), and cannot be reasonably applied to non-Diameter entities,
   or to hybrid entities such as gateways between Diameter and other
   protocols.

   The requirement to traverse non-supporting intermediaries is not
   necessarily the same thing as a requirement for end-to-end
   communication of overload reports between Diameter clients and
   servers. Non-adjacent reporting can include client-to-server
   scenarios. It can also include server-to-agent scenarios and
   agent-to-client scenarios. All such scenarios may include one or
   more intervening agents.
   Since Diameter allows transactions to be sent from server to client,
   all scenarios may be reversed. Therefore, we refer to this
   requirement as "Non-adjacent Overload Control".

3.1. Use-Cases for Non-adjacent Overload Control

   There are two primary use-cases for non-adjacent overload control.

3.1.1. Interconnect

   The first significant non-adjacent use-case is the interconnect
   scenario described in section 2.3 of the overload control
   requirements [I-D.ietf-dime-overload-reqs]. Two or more Diameter
   network operators communicate with each other across a third-party
   interconnect provider that brokers Diameter traffic between the
   operators. Figure 1 illustrates the interconnect use case.

            +-------------------------------------------+
            |               Interconnect                |
            |                                           |
            |   +--------------+      +--------------+  |
            |   |    Agent     |------|    Agent     |  |
            |   +--------------+      +--------------+  |
            |         .'                      `.        |
            +------.-'--------------------------`.------+
                 .'                               `.
              .-'                                   `.
------------.'-----+                             +----`.------------
      +----------+ |                             | +----------+
      |Edge Agent| |                             | |Edge Agent|
      +----------+ |                             | +----------+
                   |                             |
    Operator 1     |                             |    Operator 2
-------------------+                             +------------------

            Figure 1: Two Operator Interconnect Scenario

   If the interconnect provider does not support Diameter overload
   control, each operator network becomes an island of overload control,
   similar to those in the non-supporting agent use-case
   (Section 3.1.2). Even if the interconnect provider does support
   overload control, the operators may not trust it to generate and act
   on overload reports on the operators' behalf, and may prefer to
   exchange overload and load information directly with each other.

   The interconnect use-case may introduce additional security concerns.
   While the non-supporting agent use case typically (but not
   necessarily) occurs inside a single administrative domain, the
   interconnect case will almost always involve sending overload reports
   across multiple administrative domains. Since a malicious or
   incorrect overload report can effectively shut down Diameter
   processing, the current lack of a viable solution for end-to-end
   integrity protection of Diameter messages may be a problem.

3.1.2. Non-Supporting Agents

   [I-D.ietf-dime-overload-reqs] requires the solution to function in
   networks where not all Diameter elements support it. That is, the
   solution must allow gradual deployment, and must not require a
   flag-day cutover. If non-adjacent overload control is not supported,
   one or more non-supporting Diameter Agents can divide a network into
   overload control islands, where overload information is communicated
   inside each island, but not among separate islands.

   In the author's strictly personal opinion, the non-supporting
   agent use case is less compelling than the interconnect case. The
   non-supporting agent case would typically occur inside one
   administrative domain. The operator of that domain has
   considerably more control over the implementations used in the
   domain than it might have for third-party domains.

3.2. Issues with Non-Adjacent Overload Control

3.2.1. Topology Issues

   Many of the issues with non-adjacent overload control derive from the
   fact that a Diameter node is unlikely to know the topology of the
   Diameter network past its immediate peers. In a trivial topology,
   that is, a Diameter network with only clients and servers, this is
   not a problem. But if the immediate peer is a Diameter agent, a node
   is unlikely to know what next hop the agent will select for a given
   Diameter message.
   This is particularly difficult if the agent hides topology in either
   direction, or uses dynamic peer discovery. While a node may be able
   to infer the path a given message will take in some specific cases
   (e.g. for mid-session messages), it cannot do this in general. And
   even those specific cases may fail if an agent on the message path
   performs topology hiding.

   This lack of topology knowledge impacts the way that nodes can
   negotiate overload-control support, the ways they send overload
   reports, and the ways a reacting node can act to mitigate overload.
   A non-adjacent overload-control mechanism will need to solve the
   topology issues, either by offering ways to discover non-adjacent
   topologies, or by constraining the overload-control relevant parts
   of such topologies so that a node could reasonably know them in
   advance.

3.2.2. Support Negotiation

   Diameter nodes need to negotiate or otherwise indicate their support
   for overload control to other nodes. This includes indicating
   support for overload control in general, as well as potentially
   indicating support of certain parameters of the overload control
   solution. For example, a node may need to indicate which overload
   algorithms it supports. This becomes complex if two non-adjacent
   nodes need to negotiate support.

   In a Diameter application-based solution, support for the overload
   control application would be indicated during the capabilities
   exchange between peers. Diameter capabilities exchange occurs
   strictly between peers; Diameter offers no mechanism for indicating
   support of a given Application-Id between non-adjacent nodes.

   Diameter allows non-negotiated use of an arbitrary Application-Id
   between non-adjacent nodes across Diameter agents that implement the
   Diameter Relay application.
   In theory, this means that an application-based, non-adjacent
   overload control mechanism could only traverse Diameter relays, or
   Diameter proxies that explicitly support the overload-control
   Application-Id. In the latter case, we assume that a proxy will not
   indicate support for the overload-control Application-Id unless it
   supports the overload-control mechanism; such a proxy cannot be
   considered a non-supporting agent.

   In practice, a Diameter agent can act as a proxy for some purposes
   and a relay for others. If a Diameter proxy indicates support for
   the Diameter relay application, we assume that it will relay any
   arbitrary application. This means it can be considered a relay for
   the purposes of overload control.

   For both application-based and piggybacked solutions, a supporting
   node needs to know the other nodes with which it should negotiate.
   For overload control between Diameter peers, this is easy; a node
   exchanges support information with its immediate peers. But for
   non-adjacent overload control, this is more difficult for reasons
   discussed in Section 3.2.1.

   Therefore, for non-adjacent overload control negotiation, each
   supporting node either needs advance knowledge of all nodes with
   which it may negotiate overload-control support, or it needs a
   mechanism for discovering that knowledge dynamically.

3.2.3. Overload Report Delivery

   With adjacent overload control reporting, overload report addressing
   and delivery is relatively simple. A node sends overload reports
   directly to its peers. This becomes more complex for non-adjacent
   overload control.

   For application-based overload control, nodes could address overload
   reports to specific endpoint nodes using the Destination-Host AVP.
   Doing so would be subject to the same non-adjacent topology issues
   described in Section 3.2.1.
   That is, a node can only send overload reports to non-adjacent
   clients or servers that it knows about, either from prior knowledge
   (i.e. provisioning) or because it has observed previous Diameter
   messages from them.

   An application-based mechanism could possibly address reports to
   non-adjacent Diameter agents using the Destination-Host AVP. This
   would effectively make the agent into an endpoint for the
   overload-control application.

   A piggybacked mechanism will have more difficulty addressing
   non-adjacent overload reports. A piggybacked mechanism sends
   overload reports in already existing Diameter requests; that is,
   requests that have their own purposes and destinations independent of
   the overload report. Thus, nodes can only select the destination of
   an overload report by bundling it into a Diameter message that was
   already going to that destination. While a piggybacked mechanism
   might be able to send overload reports across quiescent transport
   connections using watchdog (DWR/DWA) messages, these messages cannot
   be exchanged between non-adjacent nodes.

   In some cases, the limit of sending overload reports to
   destinations to which existing traffic is bound may be acceptable.
   If a node is contributing to an overload condition, then it's
   reasonable to assume that node is regularly exchanging traffic
   with the overloaded node. However, there may be cases where an
   overload report causes a connection to become quiescent. If the
   reporting node needed to tell a reacting node that the condition
   has resolved or improved, it would need to send a new report
   across the now quiescent connection. There may also be cases
   where a reacting node redirects traffic along a different path,
   causing a previously quiescent node to suddenly start sending
   requests to the overloaded node.
   Thus, without careful selection of the overload report scope, an
   overloaded node may find itself engaged in a game of Whack-a-Mole
   [Whac-a-Mole] with previously quiescent non-adjacent nodes.

   For both piggybacked and application-based solutions, non-adjacent
   overload control introduces a need to identify the sender of a
   report, or at least to determine whether the report is from an
   adjacent or non-adjacent node. This is not required for purely
   adjacent solutions, since the sender could always be assumed to be
   the peer.

   For example, a non-adjacent report with a "Connection" scope does not
   make sense. If a node receives one, it should ignore it. But in
   order to make that decision, it must be able to distinguish a
   non-adjacent report from an adjacent one. For example, in an
   application-based mechanism,

3.2.4. Non-Adjacent Overload Scopes

   A reacting node will typically attempt to mitigate an overload
   condition by either reducing the number of requests that contribute
   to the condition, or by rerouting part of that traffic to avoid the
   problem. In both cases, the reacting node is limited by its ability
   to determine which Diameter requests contribute to the overload
   condition in the first place. The overload scope concept
   (Section 4) offers a way for overloaded nodes to indicate what
   traffic is likely to contribute to an overload condition and should
   be abated.

   Not all of the scope-types described in Section 4 make sense for
   non-adjacent overload control. The "Connection" scope-type is an
   obvious example, since the reacting node will never share a transport
   connection with a non-adjacent node; this is the very definition of
   non-adjacent nodes.
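
   The check described above, ignoring a "Connection" scoped report that
   did not come from the adjacent peer, can be sketched as follows.
   This is only an illustration under stated assumptions: neither MDOC
   nor DOCA is known to define these names. The OverloadReport class,
   its reporter_host field (an assumed way of identifying the sender of
   a report), and the should_honor() helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverloadReport:
    """Hypothetical, simplified view of a received overload report."""
    reporter_host: str    # assumed field: identity of the reporting node
    scope_type: str       # e.g. "Connection" or "Destination-Host"
    scope_value: str = ""


def is_adjacent(report: OverloadReport, peer_host: str) -> bool:
    """Treat a report as adjacent when its sender is the peer on the
    transport connection over which the report arrived."""
    return report.reporter_host == peer_host


def should_honor(report: OverloadReport, peer_host: str,
                 peer_only_scopes=frozenset({"Connection"})) -> bool:
    """Ignore peer-only scope-types in reports from non-adjacent nodes.

    The default set holds only "Connection"; which other scope-types
    belong in it is left to the caller (an assumption of this sketch).
    """
    if is_adjacent(report, peer_host):
        return True
    return report.scope_type not in peer_only_scopes


# Example: the adjacent peer on this connection is agent1.example.com.
local = OverloadReport("agent1.example.com", "Connection")
remote = OverloadReport("server7.example.net", "Connection")
remote_host_scope = OverloadReport(
    "server7.example.net", "Destination-Host", "server7.example.net")
```

   With these example values, should_honor(local, "agent1.example.com")
   is True, while the same call with the remote "Connection" report is
   False. The sketch only works if every report carries a trustworthy
   sender identity, which is exactly the new requirement that
   non-adjacent reporting introduces.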

   Since a Diameter node cannot control how requests are forwarded to
   non-adjacent nodes, the "Peer" scope-type also does not work well,
   especially when there are multiple possible destinations upstream or
   downstream from the adjacent peer. For example, in Figure 2, Node A
   sends Diameter requests to Nodes B and C across a non-supporting
   agent. If Node B becomes overloaded but Node C does not, Node A
   cannot reroute requests to Node C, since it has little ability to
   influence where the agent will forward any given request. If Node A
   tries to reduce traffic by 50%, the agent will likely still send half
   of the remaining traffic to Node B. If B and C are endpoints, Node A
   may in some cases be able to use the Destination-Host AVP for this
   purpose (in which case the "Destination-Host" scope-type would be
   more appropriate), but this does not help if B and C are also agents
   rather than servers.

   +--------+       +--------+
   | Node B |       | Node C |
   +----+---+       +---+----+
        |               |
        +-------+-------+
                |
        +-------+--------+
        | Non-Supporting |
        |     Agent      |
        +-------+--------+
                |
                |
           +----+----+
           | Node A  |
           +---------+

                   Figure 2: Non-Adjacent Routing

   Scope-types that classify traffic by origin or final destination,
   such as "Origin-Host", "Destination-Realm", "Application-ID", and
   "Destination-Host", can be used for non-adjacent overload control.
   In general, scope-types that may denote non-adjacent intermediary
   devices, such as "Peer", cannot, nor can scope-types that refer only
   to peers, e.g. "Connection".

   Even for destination-oriented scope-types, the sender of an overload
   report must be authoritative for the indicated scope. That is, it
   must have full knowledge of the congestion state for the scope.
   For example, if Nodes B and C both serve the realm "example.com", and
   B becomes 50% overloaded while C does not, B cannot simply report 50%
   overload at realm scope. If it did, Node A would reduce its
   generated traffic by 50%. Since the overall realm is really only
   overloaded by 25% (assuming B and C normally share the realm's
   traffic equally), this would leave the realm operating beneath
   available capacity.

   The need to be authoritative for an indicated scope is also true
   for strictly adjacent reporting mechanisms. But in an adjacent
   mechanism, it is easier for an intervening agent to learn the
   overload state of upstream nodes. In the example, if the agent
   supported the overload control mechanism, it would most likely
   receive reports from Nodes B and C, and could then construct
   downstream reports that incorporate the state of B, C, and its own
   local state. This contrasts with the non-adjacent case where B
   must understand the current state of C even though it is not in
   the path of overload reports from C.

   Therefore, a given node must only report overload for scopes for
   which it has full knowledge of the load and overload state. That is,
   it must be a "scope authority" for any scope it reports. In the
   example, nodes B and C (and any other nodes serving "example.com")
   would be required to share current load and overload state. The
   state-sharing requirement could be substantial for high-capacity
   nodes.

   When a node reports overload for a certain scope, reacting nodes will
   treat the overload condition as uniform across the entire scope. For
   example, if a node reports overload for an entire realm, reacting
   nodes will reduce traffic equally for all servers that serve that
   realm. If the servers are unequally overloaded, they must use a more
   granular scope-type, for example, "Destination-Host".

3.3. Non-adjacent Overload Control Recommendations

   An adjacent reporting mechanism allows for very flexible and
   fine-grained overload control. It solves or simplifies a number of
   issues, such as negotiation of support and parameters, requirements
   for topology knowledge, end-to-end security, etc., by avoiding them
   in the first place. Adding non-adjacent support to such a mechanism
   would complicate it considerably.

   Non-adjacent overload control mechanisms are better suited to
   connecting islands of overload control. Such a mechanism works well
   for larger scopes and relatively static topologies.

   The author believes that we are unlikely to find a single solution
   that works well for both adjacent and non-adjacent overload control.
   While a single solution is more desirable in general, a single
   solution that works well for both cases is likely to be extremely
   complicated. Therefore, the working group should consider a separate
   mechanism for the non-adjacent delivery of overload reports.

   If the group chooses to accept two separate solutions, we should be
   able to specify a single data model and set of AVPs that work for
   both, with some restrictions. (For example, the non-adjacent
   solution would likely forbid the use of the "Connection" scope-type.)

   If the working group chooses to add non-adjacent features to MDOC or
   DOCA, we will need to change the support negotiation mechanisms to
   allow for the non-adjacent case, specify how a node can determine
   whether a report is adjacent or non-adjacent, and state what subset
   of scope-types are allowed in non-adjacent reports. We will also
   need to study how we can meet the security-related requirements
   [I-D.ietf-dime-overload-reqs] given the current lack of end-to-end
   security features in Diameter.

4. Overload Scopes

   Diameter overload does not necessarily affect all kinds of Diameter
   traffic.
   A node may become overloaded for some requests but not others. For
   example, a Diameter agent may handle requests for more than one
   Diameter Application, and may route requests to a different set of
   servers for each application. If one server set becomes overloaded,
   but the other does not, then the agent itself is effectively
   overloaded for one application, but can process the other at normal
   capacity.

   The Diameter overload requirements [I-D.ietf-dime-overload-reqs] list
   several scenarios that illustrate overload that affects some requests
   but not others. We refer to the set of requests affected by a
   particular overload event as the "scope" of the overload event. The
   overload requirements require the mechanism to be able to send
   overload reports that are "scoped" to (that is, they affect requests
   targeted to) a particular Diameter node, a Realm, or a Diameter
   Application.

   The concept of scope may also be useful when applied to reported
   load even without an overload condition. This usage is out of
   "scope" for this document.

   A scope indication in an overload report is a set of classifiers that
   identify requests likely to contribute to the overload condition. In
   general, this could include any aspect of a Diameter message that a
   reacting node can observe. For example, requests could be classified
   by Attribute Value Pair (AVP) values or next-hop routing decisions.

   The ability to express the scope of an overload condition is only
   useful when reacting nodes can act on the information. There are
   only a small number of actions a reacting node may take to mitigate
   overload. Essentially these actions boil down to reducing the number
   of requests that "match" the scope, either by sending fewer requests
   in the first place, or by routing around the problem.
   The former is limited by the node's ability to distinguish between
   requests that match the overload scope, and requests that do not.
   The latter is limited by the node's ability to predict or influence
   how a request will be routed.

   Reacting nodes will most likely need to take additional
   application-specific actions to mitigate overload conditions. If a
   client reduces the number of messages it sends, it almost certainly
   has to take additional application-specific steps that affect its own
   client application. Depending on the application, it might refuse
   some client application requests, redirect some of its own clients to
   different services (e.g. offloading mobile data sessions to local
   WiFi networks), or assert an overload condition in the client
   application protocol (e.g. the Session Initiation Protocol (SIP)).

   This section discusses the meanings of the required scope-types, and
   analyses their implications for the selected mechanism.

4.1. Explicit vs Implicit Indication of Scopes

   Both MDOC and DOCA use explicit scope indication. That is, the scope
   of an overload report is not, in general, implied by the type of
   message that carries the report. For example, if an overload report
   is scoped to a particular Diameter Application-Id, the report
   explicitly indicates the affected Application-Id, rather than leaving
   the reacting node to infer the Application-Id from that of the
   message that carries the report. There are a few exceptions to this;
   for example, MDOC supports a "Connection" scope that, when specified,
   pertains to requests to be sent over the same transport connection
   over which the overload report arrived.

   List discussions have shown a common assumption that overload
   reports sent over a piggybacked solution such as MDOC would only
   affect requests associated with the same Diameter Application-Id.
   For MDOC, this is a false assumption.
MDOC's explicit use of scopes allows overload reports sent over one
application to affect requests for any arbitrary application. On the
other hand, solutions that use a dedicated Application-Id (such as
DOCA) necessarily require the ability to report overload for
arbitrary applications; otherwise an overload control application
could only report overload on itself.

Some list participants have suggested that the solution include a
concept of a default scope, that is, a scope that is implied if no
other scope is explicitly indicated. The concept of default or
implicit scopes requires further study by the working group.

4.2. Types of Overload Scopes

There are several different kinds, or types, of overload scopes. The
type of a scope defines how the reacting node interprets it. Table 1
gives a summary of the scope types discussed in this document. The
"Scope Type" column gives the name of the scope. The "Affected
Traffic" column describes which Diameter requests the scope-type
affects. The "Reacting-Node" column describes which Diameter nodes
may be able to take action on an overload report with the respective
scope-type. Finally, the "Draft" column lists which proposed
solutions include the respective scope-type.

+------------------+-----------------------+---------------+--------+
| Scope Type       | Affected Traffic      | Reacting-Node | Draft  |
+------------------+-----------------------+---------------+--------+
| Connection       | Requests sent         | Adjacent Peer | MDOC,  |
|                  | directly to the       |               | DOCA   |
|                  | reporting-node on a   |               |        |
|                  | particular transport  |               |        |
|                  | connection            |               |        |
| Peer             | Requests routed       | Adjacent Peer | MDOC,  |
|                  | directly to the       |               | DOCA   |
|                  | reporting-node        |               |        |
| Destination-Host | Requests with a       | Any           | MDOC   |
|                  | matching Destination- |               |        |
|                  | Host AVP              |               |        |
| Origin-Host      | Requests including a  | Any           | DOCA?  |
|                  | matching Origin-Host  |               |        |
|                  | AVP                   |               |        |
| Diameter         | Requests with a       | Any           | MDOC,  |
| Application      | matching Application- |               | DOCA   |
|                  | Id AVP                |               |        |
| Destination-     | Requests with a       | Any           | MDOC,  |
| Realm            | matching Destination- |               | DOCA   |
|                  | Realm AVP             |               |        |
| Session          | Requests with a       | Any           | MDOC   |
|                  | matching Session-Id   |               |        |
|                  | AVP                   |               |        |
| Session-Group    | Requests belonging to | Any           | MDOC   |
|                  | sessions assigned     |               |        |
|                  | matching labels       |               |        |
+------------------+-----------------------+---------------+--------+

Table 1: Summary of Overload Scope Types

4.2.1. Connection Scope-Type

The "Connection" scope-type indicates that the reacting node should
reduce traffic sent on the transport connection on which it received
the overload report. A Connection scope indication does not include
an explicit value; rather it implies "this connection".

4.2.2. Peer Scope-Type

The "Peer" scope-type indicates that a particular Diameter node is
overloaded. Other nodes should mitigate the overload by reducing the
number of requests that will land on the overloaded node, either by
sending fewer requests, or by attempting to route requests around the
overloaded node.

In both MDOC and DOCA, the "Peer" scope-type is named "Host". In
practice, only immediate peers can act as the reacting node for a
Host-scoped overload report, because non-adjacent nodes have limited
ability to influence routing decisions beyond the immediate next hop.
This document uses the term "Peer" to illustrate that fact.
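As a rough illustration of the two mitigation options described above
for a Peer-scoped report, the following Python sketch shows an
adjacent reacting node first trying to route around an overloaded
peer and, failing that, falling back to sending fewer requests. The
function name, the peer identifiers, and the representation of
overload state as a simple set are assumptions made for illustration;
neither MDOC nor DOCA defines them. The sketch also treats overload
as binary, whereas a real report would more likely ask for a partial
reduction.

```python
from typing import Iterable, Optional, Set


def choose_next_hop(candidates: Iterable[str],
                    overloaded: Set[str]) -> Optional[str]:
    """Return the first candidate peer not known to be overloaded.

    Returns None when every candidate is overloaded. In that case the
    caller cannot route around the problem and has to send fewer
    requests instead (for example, by rejecting this one locally).
    """
    for peer in candidates:
        if peer not in overloaded:
            return peer
    return None


# Hypothetical state: peers that sent this node a Peer-scoped report.
overloaded_peers = {"server1.example.com"}
```

A request that could go to either "server1.example.com" or
"server2.example.com" would be sent to the latter. A request that
can only go to "server1.example.com" yields no usable next hop, which
is the point at which the node must reduce traffic.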
Large-scale Diameter nodes are often implemented as clusters of IP
hosts, which may or may not share their knowledge about upstream
overload conditions. Certain IP hosts in a cluster could become
overloaded when others do not. Furthermore, if the reacting-node is
also clustered, it may be difficult for the cluster members to share
real-time knowledge of the reporting-node's overload state. This can
make it difficult for a node to know conclusively whether any two
connections that appear to connect to the same peer can be treated as
such for the purposes of overload control. The working group should
study whether the Peer scope-type should be deprecated in favor of
the "Connection" scope-type.

4.2.3. Destination-Host Scope-Type

The "Destination-Host" scope-type pertains to requests that contain a
Destination-Host AVP that matches the indicated Destination-Host
value. Destination-Host always refers to the endpoint for a given
Diameter request.

The best the reacting node can do is reduce the number of requests
that contain a Destination-Host AVP that matches the overloaded node.
Rerouting will not help in general, since the requests will simply
take different routes to arrive at the same overloaded server.
Unless the destination node is also a direct peer, the reacting node
cannot do much about requests that don't contain a Destination-Host
AVP in the first place, since it cannot predict whether these
requests will land on the overloaded endpoint. The Destination-Host
scope-type is useful for requests bound to a particular server, for
example, mid-session requests for a session-stateful application.

Go ahead and cover details for "session" and "session-groups", and
argue for removal of "session".

4.2.4. Origin-Host Scope-Type

While most scope-types refer to where a request is likely to go, the
"Origin-Host" scope-type refers to where the request originates.
That is, any request with a matching Origin-Host AVP would match.
The Origin-Host scope-type is useful for situations where a specific
client or set of clients sends an excessive number of requests. An
overload report with an Origin-Host scope would tell matching clients
to reduce traffic, or agents to throttle requests that came from
matching clients.

Note that the Origin-Host scope-type is not explicitly mentioned in
the requirements document. The authors include it here because
others have mentioned the need in conversation.

4.2.5. Diameter-Application Scope-Type

The "Diameter Application" scope-type indicates overload for a
particular Diameter application. That is, it impacts all requests
with the matching value in an Application-Id AVP.

The Diameter Application scope-type is useful for declaring an
overload condition that affects a specific Diameter service,
typically, but not necessarily, in a specific realm.

Since the Diameter Application scope-type indicates overload for an
entire application, reacting nodes should reduce the number of
requests sent for that application. As with the Realm scope-type,
it will rarely if ever make sense for a Diameter node to reroute
traffic to a different Diameter application.

4.2.6. Destination-Realm Scope-Type

The "Destination-Realm" scope-type indicates overload for all servers
that handle requests for the particular Diameter realm. That is, it
impacts all requests with the particular realm in the
Destination-Realm AVP.

The Realm scope-type is useful for declaring a global overload
condition within a network serving a single realm.
It is also useful for asking third parties to reduce Diameter traffic
sent to a particular realm, for example, in roaming scenarios.

Since the Realm scope-type indicates overload for an entire realm,
reacting nodes should reduce the number of messages sent for the
realm. Rerouting traffic does not make sense for the Realm
scope-type, since it would probably never be useful for Diameter
nodes to reroute traffic destined for an overloaded realm to a
different, non-overloaded realm. Client applications might, however,
be able to choose to use services from a different operator if the
Diameter realm of one operator reports an overload condition.

MDOC currently makes the Realm scope-type mandatory to implement.
List participants have indicated that there may be use cases where
all Diameter traffic on a network uses the same Realm, and that the
use of the Realm scope-type would be redundant in such networks.
Whether the Realm scope-type should remain mandatory or become
optional to implement requires further study.

4.2.7. Session Scope-Type

MDOC currently includes a "Session" scope-type. This scope-type
refers to messages that include a matching Session-Id. Conceptually,
this applies to all requests that are part of a previously
established session. This scope-type could potentially be useful for
a session-stateful agent that assigns session-establishing requests
to a certain server, and then sends all future requests in that
session to the same server. If that server became overloaded, the
agent could send an overload report scoped to the assigned session.

However, the Session scope-type will become unwieldy for anything
other than very small-scale installations. The number of sessions
assigned to any specific server is likely to be quite large.
Therefore, the number of Session scope values would probably become
quite large.
The working group should consider deprecating the Session scope-type.
For agents that do not hide topology, the Destination-Host scope-type
can be used to affect all sessions assigned to a particular server.
For topology-hiding agents, the session-group mechanism can do the
same.

4.2.8. Session-Group Scope-Type

Diameter agents that implement certain topology-hiding schemes may
modify Origin-Host AVPs inserted by servers, and use some local
mechanism to bind sessions to specific servers. The
"Destination-Host" type may not function correctly in this case.
MDOC specifies a "session-group" scope-type, where an agent or server
can assign a common identifier to sessions that are fate-shared in
some way, such as being bound to the same server. If that server
becomes overloaded, the agent can send an overload report that
matches requests in all sessions with the matching identifier.

This scope-type may be useful under certain circumstances, but may
also be complex to implement. Since the mechanism is required to
allow extensible scope-types, session-groups could still be added in
the future. The working group should study whether the Session-Group
mechanism should be included in the base overload control solution,
or removed with the potential to add it later as an extension
scope-type.

4.3. Scope Values

Scope labels in an overload report will typically take the form of a
scope-type and a value. For example, if the "example.com" realm is
overloaded for all services, the overload report would indicate a
scope-type of "Realm" and a scope-value of "example.com".

The Connection scope-type is an exception.
Since an overload report with a Connection scope is only actionable
by one of the peers connected via the specified connection, it makes
sense to treat the Connection scope-type as always having a value of
"this connection".

4.4. Combining Scopes

Diameter nodes will commonly need to construct overload reports that
apply to a combination of scopes. For example, if a given realm is
overloaded for a subset of the applications it supports, it might
indicate both a realm scope and one or more Diameter application
scopes.

Logically, combining multiple scopes of different types reduces the
overall set of requests to which the overload report would apply.
Combining multiple scopes of the same type increases the applicable
set. A function that determines the requests affected by an overload
report could model this as a logical "and" or "intersection" operator
for combining scopes of different types, and a logical "or" or
"union" operator for combining scopes of the same type.

The working group should study whether all possible combinations
should be allowed. For example, it may or may not make sense to
combine a "Connection" scope with other scopes, or to allow more than
one "Connection" scope-value for a single overload report.

4.5. Scope Extensibility

[I-D.ietf-dime-overload-reqs] requires scope-types to be extensible.
This requirement implies that the chosen mechanism or mechanisms must
discuss how new scope-types can be added, how support for specific
scope-types should be declared or negotiated, and which scope-types
might be mandatory to support.

4.6. Scope Recommendations

In the author's opinion, the selected solution or solutions should
support, at a minimum, the "Connection", "Destination-Host", "Realm"
and "Application-ID" scope-types. The working group should consider
also adding the "Origin-Host" scope-type.
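The combination rule described in Section 4.4 can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: the function name, the representation of a request as a
dictionary keyed by scope-type, and the application labels "app-1"
and "app-2" are all hypothetical, and neither MDOC nor DOCA encodes
scopes this way. The sketch ORs scope-values that share a scope-type
and ANDs across different scope-types.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Scope = Tuple[str, str]  # (scope-type, scope-value); names are illustrative


def request_matches(request: Dict[str, str],
                    scopes: Iterable[Scope]) -> bool:
    """Return True if the request falls inside the combined scopes.

    Values that share a scope-type are OR-ed (union); different
    scope-types are AND-ed (intersection).
    """
    values_by_type: Dict[str, Set[str]] = defaultdict(set)
    for scope_type, value in scopes:
        values_by_type[scope_type].add(value)
    if not values_by_type:
        # Assumption: a report with no scopes matches nothing. The
        # question of default scopes is left open in Section 4.1.
        return False
    return all(
        request.get(scope_type) in values
        for scope_type, values in values_by_type.items()
    )


# Hypothetical report: one realm, limited to two of its applications.
report = [
    ("Realm", "example.com"),
    ("Application", "app-1"),
    ("Application", "app-2"),
]
```

With this report, a request for "app-2" in "example.com" matches,
while a request in the same realm for a third application does not,
and neither does a request for "app-1" in a different realm. The
Connection scope-type is left out of the sketch, since its value is
implicit rather than carried in the report.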
The working group should consider whether the advantages of the
"session-group" concept and scope-type are worth the complexity. The
group should also study whether the Peer scope-type adds sufficient
utility over the Connection scope-type to warrant its inclusion.

5. IANA Considerations

This draft makes no requests of IANA.

6. Security Considerations

Overload reports induce Diameter nodes to reduce or reroute traffic.
For large scopes, a single erroneous or malicious overload report
could effectively shut down Diameter processing for an entire realm.
A Diameter overload control solution needs mechanisms to ensure that
overload reports are only accepted from trusted sources, and that
nothing tampers with the reports en route.

For adjacent approaches, the transport connection can be protected
with TLS or IPsec. But this will not help for non-adjacent
reporting, since no such transport connection exists.

While such work is in progress in the DIME working group, Diameter
currently has no viable mechanism for end-to-end authentication and
integrity protection. The working group should consider either
making non-adjacent overload control contingent on a generic Diameter
end-to-end protection mechanism, or adding a specialized protection
mechanism to any resulting non-adjacent overload control solution.

7. References

7.1. Normative References

[RFC6733]  Fajardo, V., Arkko, J., Loughney, J., and G. Zorn,
           "Diameter Base Protocol", RFC 6733, October 2012.

[I-D.ietf-dime-overload-reqs]
           McMurry, E. and B. Campbell, "Diameter Overload Control
           Requirements", draft-ietf-dime-overload-reqs-07 (work in
           progress), June 2013.

7.2. Informative References

[I-D.roach-dime-overload-ctrl]
           Roach, A. and E. McMurry, "A Mechanism for Diameter
           Overload Control", draft-roach-dime-overload-ctrl-03 (work
           in progress), May 2013.
[I-D.korhonen-dime-ovl]
           Korhonen, J. and H. Tschofenig, "The Diameter Overload
           Control Application (DOCA)", draft-korhonen-dime-ovl-01
           (work in progress), February 2013.

[Whac-a-Mole]
           "Whack-a-Mole Colloquial Usage".

Appendix A. Contributors

Eric McMurry and Robert Sparks made significant contributions to the
concepts in this draft.

Author's Address

Ben Campbell
Tekelec
17210 Campbell Rd.
Suite 250
Dallas, TX 75252
US

Email: ben@nostrum.com