DIME                                                         A. B. Roach
Internet-Draft                                                   Tekelec
Intended status: Standards Track                        October 22, 2012
Expires: April 25, 2013


                A Mechanism for Diameter Overload Control
                    draft-roach-dime-overload-ctrl-01

Abstract

   When a Diameter server or agent becomes overloaded, it needs to be
   able to gracefully reduce its load, typically by informing clients to
   reduce or stop sending traffic for some period of time. Otherwise,
   it must continue to expend resources parsing and responding to
   Diameter messages.

   This document proposes a concrete, application-independent mechanism
   to address the challenge of communicating load and overload state
   among Diameter peers, and specifies an algorithm for load abatement
   to address such overload conditions as they occur. The load
   abatement algorithm is extensible, allowing for future documents to
   define additional load abatement approaches.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Mechanism Properties
     1.2.  Overview of Operation
     1.3.  Documentation Conventions
   2.  Overload Scopes
     2.1.  Scope Descriptions
     2.2.  Combining Scopes
   3.  Diameter Node Behavior
     3.1.  Connection Establishment Procedures
     3.2.  Diameter Client and Diameter Server Behavior
       3.2.1.  Sending a Request
       3.2.2.  Receiving a Request
       3.2.3.  Sending an Answer
       3.2.4.  Receiving an Answer
     3.3.  Diameter Agent Behavior
       3.3.1.  Proxying a Request
       3.3.2.  Proxying an Answer
     3.4.  Proactive Load and Overload Communication
     3.5.  Load Processing
       3.5.1.  Sending Load Information
       3.5.2.  Receiving Load Information
     3.6.  Session Establishment for Session Groups
       3.6.1.  Session Group Concepts
       3.6.2.  Session Group Procedures
   4.  Loss-Based Overload Control Algorithm
     4.1.  Overload-Metric values for the 'Loss' Algorithm
     4.2.  Example Implementation
   5.  Diameter AVPs for Overload
     5.1.  Load-Info AVP
     5.2.  Supported-Scopes AVP
     5.3.  Overload-Algorithm AVP
     5.4.  Overload-Info-Scope AVP
       5.4.1.  Realm Scope
       5.4.2.  Application-ID Scope
       5.4.3.  Host Scope
       5.4.4.  Session Scope
       5.4.5.  Connection Scope
       5.4.6.  Session Group Scope
     5.5.  Overload-Metric AVP
     5.6.  Period-Of-Validity AVP
     5.7.  Session-Group AVP
     5.8.  Load AVP
   6.  Security Considerations
   7.  IANA Considerations
     7.1.  New Diameter AVPs
     7.2.  New Diameter Disconnect-Cause
     7.3.  New Diameter Response Code
     7.4.  New Command Flag
     7.5.  Overload Algorithm Registry
     7.6.  Overload Scope Registry
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Requirements Analysis
   Appendix C.  Extending the Overload Mechanism
     C.1.  New Algorithms
     C.2.  New Scopes
   Appendix D.  Design Rationale
     D.1.  Piggybacking
     D.2.  Load AVP in All Packets
     D.3.  Graceful Failure
   Author's Address

1.  Introduction

   When a Diameter [I-D.ietf-dime-rfc3588bis] server or agent becomes
   overloaded, it needs to be able to gracefully reduce its load,
   typically by informing clients to reduce or stop sending traffic for
   some period of time. Otherwise, it must continue to expend resources
   parsing and responding to Diameter messages.

   This document defines a mechanism for communicating the load and
   overload information among Diameter nodes. It also defines a base
   algorithm for shedding traffic under overload circumstances. The
   design of the mechanism described in this document allows for the
   definition of alternate load abatement algorithms as well.

   The mechanism proposed in this document is heavily influenced by the
   work performed in the IETF Session Initiation Protocol (SIP) Overload
   Control Working Group, and draws on the conclusions reached by that
   working group after extensive network modeling.

   The solution described in this document is intended to satisfy the
   requirements described in [I-D.ietf-dime-overload-reqs], with the
   exception of REQ 35.
   As discussed in that document, the intention of a Diameter overload
   mechanism is to handle overload of the actual message processing
   portions of Diameter servers. This is in contrast to congestion,
   which is the inability of the underlying switching and routing fabric
   of the network to carry traffic at the volume that IP hosts wish to
   send it. Handling of congestion is relegated to the underlying
   transport protocol (TCP or SCTP), and will not be discussed.

   Philosophically, the approach in designing this mechanism is based on
   the premise that building a base-level, fully compliant
   implementation should be a very simple and straightforward exercise.
   However, the protocol includes many additional features that may be
   implemented to allow Diameter nodes to apply increasingly
   sophisticated behaviors. This approach gives implementors the
   freedom to implement as sophisticated a scheme as they desire, while
   freeing them from the burden of unnecessary complexity. It thereby
   allows for rapid development and deployment of the mechanism,
   followed by a period of steady and gradual improvements as
   implementations become more capable.

1.1.  Mechanism Properties

   The core Diameter overload mechanism described in this document is
   fundamentally hop-by-hop. The rationale for using a hop-by-hop
   approach is the same as that described in section 5.1 of [RFC6357].
   However, because Diameter networks frequently have traffic that is
   easily grouped into a few well-defined categories, we have added
   some concepts that allow Diameter agents to push back on subsets of
   traffic that correspond to certain well-defined and client-visible
   constructs (such as Destination-Host, Destination-Realm, and
   Application-ID). These constructs are termed "Scopes" in this
   document. A more complete discussion of Scopes is found in
   Section 2.

   The key information transmitted between Diameter peers is the current
   server load (to allow for better balancing of traffic, so as to
   preempt overload in the first place) as well as an indication of
   overload state and severity (overload information). The actual load
   and overload information is conveyed as a new compound AVP, added to
   any Diameter messages that allow for extensibility. As discussed in
   section 3.2 of [I-D.ietf-dime-rfc3588bis], all Command Code Formats
   (CCFs) are encouraged to include AVP-level extensibility by inclusion
   of a "* [ AVP ]" construct in their syntax definition. The document
   author has conducted an extensive (although admittedly not
   exhaustive) audit of existing applications, and found none lacking
   this property. The inclusion of load and overload information in
   existing messages has the property that the frequency with which
   information can be exchanged increases as load on the system goes up.

   For the purpose of grouping the several different parts of load
   information together, this mechanism makes use of a Grouped AVP,
   called "Load-Info". The Load-Info AVP may appear one or more times
   in any extensible command, with the restriction that each instance of
   the Load-Info AVP must contain different Scopes.

   Load and overload information can be conveyed during times of
   inter-node quiescence through the use of DWR/DWA exchanges. These
   exchanges can also be used to proactively report a change in the
   load or overload level of a server when no other transaction is ready
   to be sent. Finally, in the unlikely event that an application is
   defined that precludes the inclusion of new AVPs in its commands,
   DWR/DWA exchanges can be sent at any rate acceptable to the server in
   order to convey load and overload information.

      In [RFC3588], the DWR and DWA message syntax did not allow for the
      addition of new AVPs.
      This oversight was fixed in [I-D.ietf-dime-rfc3588bis]. To allow
      for transmission of load information on quiescent links,
      implementations of the mechanism described in this document are
      expected to correctly handle extension AVPs in DWR and DWA
      messages, even if such implementations have not otherwise been
      upgraded to support [I-D.ietf-dime-rfc3588bis].

1.2.  Overview of Operation

   During the capabilities exchange phase of connection establishment,
   peers determine whether the connection will make use of the overload
   control mechanism; and, if so, which optional behaviors are to be
   employed.

   The information sent between adjacent nodes includes two key metrics:
   Load (which, roughly speaking, provides a linear metric of how busy
   the node is), and Overload-Metric (which is input to the negotiated
   load abatement algorithm).

   Message originators (whether originating a request or an answer)
   include one or more Load-Info AVPs in messages when they form them.
   These Load-Info AVPs reflect the originators' own load and overload
   state.

   Because information is being used on a hop-by-hop basis, it is
   exchanged only between adjacent nodes. This means that any Diameter
   agent that forwards a message (request or answer) is required to
   remove any information received from the previous hop, and act upon
   it as necessary. Agents also add their own load and overload
   information (which may, at implementors' preference, take
   previous-hop information into account) into a new Load-Info AVP
   before sending the request or answer along.

      Because the mechanism requires affirmative indication of support
      in the capabilities exchange phase of connection establishment,
      load and overload information will never be sent to intermediaries
      that do not support the overload mechanism.
      Therefore, no special provisions need to be made for removal of
      information at such intermediaries -- it will simply not be sent
      to them.

   Message recipients are responsible for reading and acting upon load
   and overload information that they receive in such messages.

1.3.  Documentation Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Overload Scopes

   In normal operation, a Diameter node may be overloaded for some but
   not all possible requests. For example, an agent that supports two
   realms (realm A and realm B in this example) may route traffic to one
   set of servers for realm A, and another set of servers for realm B.
   If the realm A servers are overloaded but realm B servers are not,
   then the agent is effectively overloaded for realm A but not for
   realm B.

   Despite the fact that Diameter agents can report on scopes that
   semantically map to constructs elsewhere in the network, it is
   important to keep in mind that overload state is still reported on a
   hop-by-hop basis. In other words, the overload state reported for
   realm A in the example above represents the aggregate of the agent's
   overload state along with the overload state being reported by
   applicable upstream servers (those serving realm A).

   Even without the use of Diameter agents, similar situations may arise
   in servers that need to make use of external resources for certain
   applications but not for others. For example, if a single server is
   handling two applications, one of which uses an external database
   while the other does not, it may become overloaded for the
   application that uses the external database when the database
   response latency increases.

   The indication of scopes for overload information (using the
   Overload-Info-Scope AVP; see Section 5.4) allows a node to indicate a
   subset of requests to which overload information is to be applied.
   This document defines seven scopes; only "Connection" scope is
   mandatory to implement. The use of the optional scopes, along with
   the use of any additional scopes defined in other documents, is
   negotiated at connection establishment time; see Section 3.1.

2.1.  Scope Descriptions

   Destination-Realm:  This scope, which nodes SHOULD implement,
      pertains to all transactions that have a Destination-Realm AVP
      matching the indicated value.

   Application-ID:  This scope, which nodes SHOULD implement, pertains
      to all transactions that contain an Application-ID field
      matching the indicated value.

   Destination-Host:  This scope, which nodes SHOULD implement, pertains
      to all transactions that have a Destination-Host AVP matching
      the indicated value.

   Host:  This scope, which nodes SHOULD implement, pertains to all
      transactions sent directly to the host matching the indicated
      value.

   Connection:  This scope, which nodes MUST implement, pertains to all
      transactions sent on the same TCP connection or SCTP
      association. This scope has no details indicating which
      connection or association it applies to; instead, the recipient
      of an indication of "Connection" scope is to use the connection
      or association on which the message was received as the
      indicated connection or association. In other words, any use
      of Connection scope applies to "this connection."

   Session-Group:  This scope, which nodes MAY implement, pertains to
      all transactions in a session that has been assigned to the
      indicated group. For more information on assigning sessions to
      groups, see Section 3.6.

   Session:  This scope, which nodes MAY implement, pertains to all
      transactions in the indicated session.
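
   As a non-normative illustration, the per-scope matching rules listed
   above might be sketched as follows. The Transaction record, its field
   names, and the scope_matches helper are hypothetical names introduced
   only for this sketch. It also assumes that values are compared by
   simple equality and that the session group of a transaction has
   already been resolved by the procedures of Section 3.6.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScopeKind(Enum):
    DESTINATION_REALM = "Destination-Realm"
    APPLICATION_ID = "Application-ID"
    DESTINATION_HOST = "Destination-Host"
    HOST = "Host"
    CONNECTION = "Connection"
    SESSION_GROUP = "Session-Group"
    SESSION = "Session"


@dataclass(frozen=True)
class Transaction:
    """Hypothetical view of the fields a node inspects for one transaction."""
    application_id: int
    next_hop_host: str                  # peer the message is sent to directly
    connection_id: int                  # local handle for the connection/association
    destination_realm: Optional[str] = None
    destination_host: Optional[str] = None
    session_id: Optional[str] = None
    session_group: Optional[str] = None  # assumed to be resolved beforehand


def scope_matches(kind, value, txn, received_on):
    """Return True if txn falls within the scope (kind, value).

    received_on is the handle of the connection on which the scope was
    received. Connection scope carries no value: it means "this connection".
    """
    if kind is ScopeKind.CONNECTION:
        return txn.connection_id == received_on
    if value is None:
        return False
    field = {
        ScopeKind.DESTINATION_REALM: txn.destination_realm,
        ScopeKind.APPLICATION_ID: txn.application_id,
        ScopeKind.DESTINATION_HOST: txn.destination_host,
        ScopeKind.HOST: txn.next_hop_host,
        ScopeKind.SESSION_GROUP: txn.session_group,
        ScopeKind.SESSION: txn.session_id,
    }[kind]
    return field == value
```

   A real implementation would likely normalize host and realm names
   before comparing them; the sketch leaves that out for brevity.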

   Some applications do not have long-running sessions containing
   multiple transactions. For such applications, the use of
   "Session-Group" and "Session" scopes does not make sense. Such
   applications will instead make use of the most applicable of the
   remaining five scopes (plus any negotiated extension scopes) to
   achieve overload control.

   OPEN ISSUE: Is there value to including a stream-level scope for
   SCTP? We haven't been able to come up with a use case for doing so
   yet, but it wouldn't necessarily be unreasonable.

2.2.  Combining Scopes

   To allow for the expression of more complicated scopes than the
   primitives defined above, multiple Overload-Info-Scope AVPs may be
   included in a single Load-Info AVP. Semantically, these scopes are
   combined in the following way:

   o  Attributes of different kinds are logically and-ed together
      (e.g., if both "Destination-Realm" and "Application-ID" are
      present, the information applies to requests that match both
      the realm and the application).

   o  Attributes of the same kind are logically or-ed together (e.g., if
      two "Destination-Realm"s are present, the information applies to
      requests sent to either realm).

   o  If a transaction falls within more than one scope, the "most
      overloaded" scope is used for traffic shaping.

   To avoid the complexity of implementing arbitrary scope combination
   rules, only the following combinations of scopes are allowed (OPEN
   ISSUE -- we need to figure out what makes most sense for expressing
   these combinations. Formal grammar? Prose? A table of some kind?
   For now, they're expressed as a pseudo-ABNF):

   o  1*(Destination-Realm) 0*1(Application-ID)
   o  1*(Application-ID) 0*1(Destination-Realm)
   o  1*(Application-ID) 0*1(Destination-Host)
   o  1*(Application-ID) 0*1(Host)
   o  1*(Application-ID) 0*1(Connection)
   o  1*(Destination-Host)
   o  1*1(Host)
   o  1*1(Connection)
   o  1*(Session-Group) 0*1(Host | Connection)
   o  1*(Session) 0*1(Host | Connection)

      OPEN ISSUE: Is this the right set of scope combinations? Is there
      a need for more? Are any of these unnecessary? Ideally, this
      should be the smallest set of combinations that lets nodes report
      what they realistically need to report.

   Any document that creates additional scopes MUST define how they may
   be combined with all scopes registered with IANA at the time of their
   publication.

3.  Diameter Node Behavior

   The following sections outline the behavior expected of Diameter
   clients, servers, and agents that implement the overload control
   mechanism.

   OPEN ISSUE: SIP Overload Control includes a sequence parameter to
   ensure that out-of-order messages do not cause the receiver to act on
   state that is no longer accurate. Is message reordering a concern in
   Diameter? That is, do we need to include sequence numbers in the
   messages to ensure that the receiver does not act on stale state
   information? Because Diameter uses only reliable, in-order
   transports, it seems that this isn't likely to be an issue. Is there
   room for a race when multiple connections are in use?

3.1.  Connection Establishment Procedures

   Negotiation for support of this mechanism is performed during
   Diameter capabilities exchange. Optional protocol features and
   extensions to this mechanism are also negotiated at this time. No
   provision is made for renegotiation of mechanism use or extensions
   during the course of a connection.
   If peers wish to make changes to the mechanism, they must create a
   new connection to do so.

   The connection initiator includes a Load-Info AVP in the CER
   (Capabilities-Exchange-Request) message that it sends after
   establishing the connection. This Load-Info AVP MUST contain a
   Supported-Scopes AVP. The Supported-Scopes AVP includes a
   comprehensive list of the scopes that the connection initiator can
   receive and understand. See Section 5.2 for information on the
   format of the Supported-Scopes AVP.

   The Load-Info AVP in a CER message also MAY contain one or more
   Overload-Algorithm AVPs. If present, these AVPs indicate every
   Overload-Algorithm the connection initiator is willing to support for
   the connection that is being established. If the connection
   initiator supports only the "Loss" algorithm, it MAY indicate this
   fact by omitting the Overload-Algorithm altogether.

   The Load-Info AVP in a CER message MAY also contain additional AVPs,
   as defined in other documents, for the purpose of negotiating
   extensions to the Overload mechanism.

   The Diameter node that receives a CER message first examines it for
   the presence of a Load-Info AVP. If no such AVP is present, the node
   concludes that the overload control mechanism is not supported for
   this connection, and no further overload-related negotiation is
   performed. If the received CER contains a Load-Info AVP, the
   recipient of that message stores that information locally in the
   context of the connection being established. It then examines the
   Overload-Algorithm AVPs, if present, and selects a single algorithm
   from that list. If no Overload-Algorithm is indicated, then the base
   "Loss" algorithm is used for the connection. In either case, the
   recipient of the CER stores this algorithm in the context of the
   connection.
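
   The CER processing just described might look roughly like the
   following non-normative sketch. The dictionary layout of the decoded
   Load-Info AVP, the ConnectionContext record, and the process_cer
   helper are assumptions made for illustration. This document does not
   say how the recipient picks among several offered algorithms, nor
   what happens when none of them is supported locally; the sketch
   simply takes the first locally preferred match and raises an error
   otherwise.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

LOSS = "Loss"  # base algorithm, used when no Overload-Algorithm is indicated


@dataclass(frozen=True)
class ConnectionContext:
    """Hypothetical per-connection state kept by the CER recipient."""
    peer_scopes: FrozenSet[str]   # scopes the CER sender can receive
    algorithm: str                # algorithm selected for the connection


def process_cer(load_info: Optional[dict],
                local_algorithms: Sequence[str] = (LOSS,)
                ) -> Optional[ConnectionContext]:
    """Return the stored context, or None if overload control is not in use."""
    if load_info is None:
        # No Load-Info AVP: the mechanism is not supported on this connection.
        return None
    peer_scopes = frozenset(load_info["supported_scopes"])
    offered = list(load_info.get("overload_algorithms", []))
    if not offered:
        algorithm = LOSS
    else:
        # Assumption: honor local preference order among the offered algorithms.
        candidates = [a for a in local_algorithms if a in offered]
        if not candidates:
            raise ValueError("no mutually supported overload algorithm")
        algorithm = candidates[0]
    return ConnectionContext(peer_scopes, algorithm)
```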

   When a node conformant to this specification sends a
   Capabilities-Exchange-Answer (CEA) message in answer to a CER that
   contained a Load-Info AVP, the CEA MUST contain a Load-Info AVP.
   This Load-Info AVP MUST contain a Supported-Scopes AVP that includes
   a comprehensive list of the supported scopes that the connection
   initiator can receive and understand. The CEA also contains zero or
   one Overload-Algorithm AVPs. If present, this Overload-Algorithm
   MUST match one of the Overload-Algorithm AVPs sent in the CER, and it
   indicates the overload control algorithm that will be used for the
   connection. If the CEA contains no Overload-Algorithm, the
   connection will use the "Loss" algorithm.

   When a node receives a CEA message, it examines it for the presence
   of a Load-Info AVP. If no such AVP is present, the node concludes
   that the overload mechanism is not supported for this connection. If
   the received CEA contains a Load-Info AVP, then the recipient
   extracts the Supported-Scopes information and stores it locally in
   the context of the connection being established. It then checks for
   the presence of an Overload-Algorithm AVP. If present, this AVP
   indicates the overload control algorithm that will be used for the
   connection. If absent, then the connection will use the "Loss"
   algorithm.

   If a node receives a CEA message that indicates support for a scope
   that it did not indicate in its CER, or that selects an overload
   control algorithm that it did not advertise in its CER, then it MUST
   terminate the connection by sending a DPR with a Disconnect-Cause of
   NEGOTIATION_FAILURE (128 [actual value TBD]), indicating that the CEA
   sender has failed to properly follow the negotiation process
   described above.
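
   The checks that the connection initiator applies to a received CEA
   can be sketched as follows. This is a non-normative illustration: the
   CeaResult record, the check_cea helper, and the dictionary layout of
   the decoded Load-Info AVP are hypothetical. The sketch assumes that a
   CER that omitted the Overload-Algorithm AVP is treated as having
   advertised only "Loss", and it uses the placeholder value 128 for
   NEGOTIATION_FAILURE, whose actual value is still to be assigned.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

LOSS = "Loss"
NEGOTIATION_FAILURE = 128  # placeholder; the actual value is TBD


@dataclass(frozen=True)
class CeaResult:
    overload_enabled: bool
    peer_scopes: FrozenSet[str] = frozenset()
    algorithm: Optional[str] = None
    disconnect_cause: Optional[int] = None  # set when a DPR must be sent


def check_cea(cer_scopes: Iterable[str],
              cer_algorithms: Iterable[str],
              cea_load_info: Optional[dict]) -> CeaResult:
    if cea_load_info is None:
        # No Load-Info AVP: the mechanism is not used on this connection.
        return CeaResult(overload_enabled=False)

    advertised_scopes = frozenset(cer_scopes)
    # Assumption: omitting Overload-Algorithm in the CER advertises "Loss".
    advertised_algorithms = frozenset(cer_algorithms) or frozenset({LOSS})

    peer_scopes = frozenset(cea_load_info["supported_scopes"])
    chosen = cea_load_info.get("overload_algorithm")

    if not peer_scopes <= advertised_scopes:
        return CeaResult(False, disconnect_cause=NEGOTIATION_FAILURE)
    if chosen is not None and chosen not in advertised_algorithms:
        return CeaResult(False, disconnect_cause=NEGOTIATION_FAILURE)

    # An absent Overload-Algorithm means the connection uses "Loss".
    return CeaResult(True, peer_scopes, chosen or LOSS)
```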

   Note that the Supported-Scopes announcement during capabilities
   exchange is a set of mutual advertisements of which scopes the two
   nodes are willing to receive information about. It is not a
   negotiation. It is perfectly acceptable for a node to send
   information for scopes it did not include in the Supported-Scopes AVP
   it sent, as long as the recipient indicated support for receiving
   such a scope. For example, a Diameter agent, during connection
   establishment with a client, may indicate support for receiving only
   "Connection" and "Host" scope; however, if the client indicated
   support for "Application-ID" scope, then the agent is free to send
   Load-Info AVPs that make use of "Application-ID" scope to the client.

3.2.  Diameter Client and Diameter Server Behavior

   The following sections describe the behavior that Diameter clients
   and Diameter servers implement for the overload control mechanism.
   Behavior at Diameter Agents is described in Section 3.3.

   To implement overload control, Diameter nodes need to keep track of
   three important metrics for each of the scopes for which information
   has been received: the overload metric for the scope, the period of
   validity for that overload metric, and the load within that scope.
   Conceptually, these are data records indexed by the scope to which
   they apply. In the following sections, we refer to these data
   records with the term "scope entry." Further, when it is necessary
   to distinguish between those scope entries referring to the load
   information received from other nodes and those referring to the load
   information sent to other nodes, we use the term "remote scope entry"
   to refer to the information received from other nodes, and "local
   scope entry" to refer to the information that is being maintained to
   send to other nodes.
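
   One possible in-memory shape for such scope entries is sketched
   below, purely as an illustration. The ScopeEntry and ScopeTable
   names, the representation of a scope as a set of (kind, value)
   pairs, the integer types, and the use of seconds for the period of
   validity are all assumptions of this sketch rather than anything
   this document defines.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# A scope is modelled as a set of (kind, value) pairs, for example
# {("Destination-Realm", "example.com"), ("Application-ID", "4")}.
ScopePair = Tuple[str, str]
ScopeKey = FrozenSet[ScopePair]


@dataclass
class ScopeEntry:
    overload_metric: int   # zero is taken to mean "not in overload"
    valid_until: float     # expiry time derived from the period of validity
    load: int

    def in_overload(self, now: float) -> bool:
        """An overload metric whose period of validity has expired is ignored."""
        return self.overload_metric != 0 and now < self.valid_until


class ScopeTable:
    """Scope entries indexed by scope; a node would keep remote tables
    (information received from peers) separate from its local table."""

    def __init__(self) -> None:
        self._entries: Dict[ScopeKey, ScopeEntry] = {}

    def update(self, scope: Iterable[ScopePair], overload_metric: int,
               validity_seconds: float, load: int, now: float) -> ScopeEntry:
        entry = ScopeEntry(overload_metric, now + validity_seconds, load)
        self._entries[frozenset(scope)] = entry   # create or overwrite
        return entry

    def lookup(self, scope: Iterable[ScopePair]) -> Optional[ScopeEntry]:
        return self._entries.get(frozenset(scope))
```

   Keying the table on an unordered set means that two Load-Info AVPs
   listing the same scopes in a different order update the same entry.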

   In order to allow recipients of overload information to perform
   certain performance optimizations, we also define a new command flag,
   called 'O'verload. This bit, when set, indicates that the message
   contains at least one Load-Info AVP with a non-zero Overload-Metric
   -- in other words, the sending node is overloaded for at least one
   scope. See Section 7.4 for the definition of the 'O'verload bit.

      OPEN ISSUE: Is there anything we can do to make this 'O'verload
      bit even more useful? Perhaps setting it only when the overload
      value has changed, or changed by a certain amount?

3.2.1.  Sending a Request

   This section applies only to those requests sent to peers who
   negotiated use of the overload control mechanism during capabilities
   exchange. Requests sent over other connections are handled the same
   as they would be in the absence of the overload control mechanism.

   Before sending a request, a Diameter node must first determine which
   scope applies. It does this as follows: first, a next hop host and
   connection are determined, according to normal Diameter procedures
   (potentially modified as described in Section 3.5.2). The sending
   node then searches through its list of remote scope entries (ignoring
   any whose Period-of-Validity has expired) to determine which ones
   match the combination of the fields in the current request, the
   next-hop host, and the selected connection. If none of the matching
   scope entries are in overload, then the message is handled normally,
   and no additional processing is required.

   As an optimization, a sending node MAY choose to track whether any of
   its peers are in overload, and to skip the preceding step if it knows
   that no scopes are in overload.

   If one or more matching scope entries are in overload, then the
   sending node determines which scope is most overloaded.
   The sending node then sends, drops, queues, or otherwise modifies
   handling of the request according to the negotiated overload control
   algorithm, using the Overload-Metric from the selected scope entry as
   input to the algorithm.

   When determining which requests are impacted by the overload control
   algorithm, request senders MAY take into account the type of message
   being sent and its contents. For example, messages within an
   existing session may be prioritized over those that create a new
   session. The exact rules for such prioritization will likely vary
   from application to application. The authors expect that
   specifications that define or specify the use of specific Diameter
   Applications may choose to formally define a set of rules for such
   prioritization on a per-Application basis.

   The foregoing notwithstanding, senders MUST NOT use the content or
   type of request to exempt that request from overload handling. For
   example, if a peer requests a 50% decrease in sent traffic using the
   "Loss" algorithm (see Section 4), but 65% of the traffic that the
   sending node wishes to send is traffic that the sender considers
   critical, then the sender is nonetheless obliged to drop some portion
   of that critical traffic (e.g., it may elect to drop all of the
   non-critical traffic, which is 35% of the total, and 23% of the
   critical traffic, which is about 15% of the total, resulting in an
   overall 50% reduction).

   The sending node then inserts one or more Load-Info AVPs (see
   Section 5.1) into the request. If the sender inserts more than one
   Load-Info AVP, then each Load-Info AVP MUST contain a unique scope,
   as specified by the Overload-Info-Scope AVP(s) inside the Load-Info
   AVP.

   Each Load-Info AVP in the request MUST contain an Overload-Metric
   (see Section 5.5), indicating whether (and to what degree) the sender
   is overloaded for the indicated scope.
If this metric is not zero, then the Load-Info AVP MUST also contain a Period-Of-Validity AVP (see Section 5.6), indicating the maximum period the recipient should consider the Overload-Metric to be valid. Any message containing a non-zero Overload-Metric also MUST set the 'O'verload bit in the Command Flags field to indicate to the recipient that the message contains an overload indication. See Section 7.4 for the definition of the 'O'verload bit.

Each Load-Info AVP MUST also contain a Load AVP, indicating the server's load level within the context of the indicated scope. See Section 3.5.1 for details on generating this load metric. Note that a server's load may frequently be identical for all the scopes for which it sends information.

3.2.2. Receiving a Request

3.2.2.1. Receiving a Request from a Compliant Peer

A node that receives a request from a peer that has negotiated support for the overload control mechanism will extract the Load-Info AVPs from the request and use each of them to update its remote scope entries. First, the node attempts to locate an existing scope entry that corresponds to the Overload-Scope indicated in the Load-Info AVP. If one does not exist, it is created. The scope entry is then populated with the overload metric, period of validity, and load information. The message is then processed as normal.

3.2.2.2. Receiving a Request from a Noncompliant Peer

An important aspect of the overload control mechanism is that Diameter nodes that do not implement the mechanism cannot have an advantage over those that do. In other words, it is necessary to prevent a situation in which a network in overload ceases servicing transactions from overload-compliant nodes in favor of those sent by nodes that do not implement the overload control mechanism.
To achieve this goal, message recipients need to track the overload control metric on behalf of those sending nodes that do not implement overload control, and to reject messages from those nodes that would have been dropped if the sender had implemented the overload mechanism.

A node that receives a request from a peer that has not negotiated support for the overload control mechanism searches through its list of local scope entries to determine which ones match the combination of the fields in the received request. (These are the entries that indicate the Overload-Metric that the node would have sent to the peer if the peer had supported the overload mechanism.) If none of the matching scope entries are in overload, then the message is processed normally, and no additional processing is required.

If one or more matching local scope entries are in overload, then the node determines which scope is most overloaded. The node then executes the "Loss" overload control algorithm (see Section 4) using the overload metric in that most overloaded scope. If the result of running that algorithm determines that a sender who had implemented the overload control mechanism would have dropped the message, then the recipient MUST reply to the request with a DIAMETER_PEER_IN_OVERLOAD response (see Section 7.3).

3.2.3. Sending an Answer

This section applies only to those answers sent to peers who negotiated use of the overload control mechanism during capabilities exchange.

When sending an answer, a Diameter node inserts one or more Load-Info AVPs (see Section 5.1) into the answer. If the sender inserts more than one Load-Info AVP, then each Load-Info AVP MUST contain a unique scope, as specified by the Overload-Info-Scope AVP(s) inside the Load-Info AVP.
Each Load-Info AVP in the answer MUST contain an Overload-Metric (see Section 5.5), indicating whether (and to what degree) the server is overloaded for the indicated scope. If this metric is not zero, then the Load-Info AVP MUST also contain a Period-Of-Validity AVP (see Section 5.6), indicating the maximum period the recipient should consider the Overload-Metric to be valid. Any message containing a non-zero Overload-Metric also MUST set the 'O'verload bit in the Command Flags field to indicate to the recipient that the message contains an overload indication. See Section 7.4 for the definition of the 'O'verload bit.

Each Load-Info AVP MUST also contain a Load AVP, indicating the server's load level within the context of the indicated scope. See Section 3.5.1 for details on generating this load metric. Note that a server's load may frequently be identical for all the scopes for which it sends information.

3.2.4. Receiving an Answer

A node that receives an answer from a peer that has negotiated support for the overload control mechanism will extract the Load-Info AVPs from the answer and use each of them to update its remote scope entries. First, the node attempts to locate an existing scope entry that corresponds to the Overload-Scope indicated in the Load-Info AVP. If one does not exist, it is created. The scope entry is then populated with the overload metric, period of validity, and load information. The message is then processed as normal.

3.3. Diameter Agent Behavior

This section discusses the behavior of a Diameter Agent acting as a Proxy or Relay. Diameter Agents that provide redirect or translation services behave the same as Diameter Servers for the purpose of overload control, and follow the procedures defined in Section 3.2.
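
As a non-normative illustration of the Section 3.2 procedures referenced above, the following Python sketch shows one way a node might keep its remote scope entries: creating or overwriting an entry when a Load-Info AVP arrives (Sections 3.2.2.1 and 3.2.4), and picking the most overloaded unexpired entry before sending a request (Section 3.2.1). The class and method names, the use of a tuple as the scope key, the treatment of Period-Of-Validity as a number of seconds, and the choice of "highest Overload-Metric" as the meaning of "most overloaded" are all assumptions made for this example; none of them is defined by this document.

```python
import time
from dataclasses import dataclass


@dataclass
class ScopeEntry:
    """Hypothetical in-memory form of one remote scope entry."""
    overload_metric: int   # 0 means "not in overload"
    load: int              # 0..65535, informational only
    expires_at: float      # clock value after which the metric is ignored


class RemoteScopeTable:
    """Hypothetical store of scope entries learned from one peer."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def update(self, scope, overload_metric, load, period_of_validity=0):
        # Locate-or-create collapses to a dictionary assignment: an
        # existing entry for the same scope is simply overwritten.
        # ASSUMPTION: period_of_validity is expressed in seconds.
        self._entries[scope] = ScopeEntry(
            overload_metric, load, self._clock() + period_of_validity)

    def most_overloaded(self, matching_scopes):
        """Return the unexpired, overloaded entry with the highest
        metric among the given scopes, or None if there is none."""
        now = self._clock()
        live = [
            self._entries[s] for s in matching_scopes
            if s in self._entries
            and self._entries[s].overload_metric > 0
            and self._entries[s].expires_at > now
        ]
        return max(live, key=lambda e: e.overload_metric, default=None)
```

A sender would pass the scopes that match the outgoing request to most_overloaded() and, if an entry is returned, feed its overload_metric to the negotiated overload control algorithm.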
Whenever sending a request or an answer, Agents MUST include a Load-Info AVP reflecting the Agent's overload and load information. In formulating this information, the Agent may choose to use only that information relating to its own local resources. However, better network behavior can be achieved if agents incorporate information received from their peers when generating overload information. The exact means for incorporating such information is left to local policy at the agent.

For example: consider an agent that distributes sessions and transactions among three Diameter servers, each hosting a different Diameter application. While it would be compliant for the Agent to only report its own overload state (i.e., at "Host" scope), overall network behavior would be improved if it chose to also report overload state for up to three additional scopes (i.e., at "Application-ID" scope), incorporating the Overload information received from each server in these scopes.

3.3.1. Proxying a Request

Upon receiving a request, a Diameter Proxy or Relay performs the steps detailed in Section 3.2.2.

The agent then MUST remove all Load-Info AVPs from the request: Load-Info is never passed through a Proxy or Relay transparently.

When the Diameter Agent proxies or relays a request, it follows the process outlined in Section 3.2.1.

3.3.2. Proxying an Answer

Upon receiving an answer, a Diameter Agent follows the process described in Section 3.2.4 to update its remote scope entries.

The Agent then MUST remove all Load-Info AVPs from the answer: Load-Info is never passed through a Proxy or Relay transparently.

When the Diameter Agent proxies or relays a response, it follows the process outlined in Section 3.2.3.

3.4. Proactive Load and Overload Communication

Because not all Diameter links will have constant traffic, it may occasionally be necessary to send overload and/or load information over links that would otherwise be quiescent. To proactively send such information to peers, the Diameter node with information to convey may choose to send a Diameter Watchdog Request (DWR) message to its peers. The procedure described in Section 3.2.1 applies to these requests, which provides the means to send load and overload information.

In order to prevent unnecessarily diminished throughput between peers, a Diameter node SHOULD proactively send a DWR to all its peers whenever it leaves an overload state. Similarly, in order to provide peers the proper data for load distribution, nodes SHOULD send DWR messages to a peer if the load information most recently sent to that peer has changed by more than 20% and is more than 5 seconds old.

3.5. Load Processing

While the remainder of the mechanism described in this document is aimed at handling overload situations once they occur, it is far better for a system if overload can be avoided altogether. In order to facilitate overload avoidance, the overload mechanism includes the ability to convey node load information.

Semantically, the Load information sent by a Diameter node indicates the current utilization of its most constrained resource. It is a linear scale from 0 (least loaded) to 65535 (most loaded).

It is critical to distinguish between the value conveyed in the Load AVP and the value conveyed in the Overload-Metric AVP. The Load AVP is computed and used independent of the Overload-Algorithm selected for a connection, while the Overload-Metric is meaningful only in the context of the selected algorithm. Most importantly, the Load information never has any impact on the behavior specified in the overload algorithm.
If a node reports a Load of 65535, but the Overload-Metric does not indicate any need to apply the selected overload control algorithm, then the sender MUST NOT apply the selected overload control algorithm. Conversely, if a node is reporting an Overload-Metric that requires the recipient to take action to reduce traffic, those actions MUST be taken, even if the node is simultaneously reporting a Load value of 0.

3.5.1. Sending Load Information

Diameter nodes implementing the overload mechanism described in this document MUST include a Load AVP (inside a Load-Info AVP) in every Diameter message (request and answer) they send over a connection that has been negotiated to use the overload control mechanism. Note that this requirement does not necessitate calculation of the Load metric each time a message is sent; the Load value may be calculated periodically (e.g., every 100 ms), and used for every message sent until it is recalculated.

The algorithm for generation of the load metric is a matter of local policy at the Diameter node, and may vary widely based on the internal software architecture of that node.

For advanced calculations of Load, anticipated inputs to the computation include CPU utilization, network utilization, processor interrupts, I/O throughput, and internal message queue depths.

To free implementors from the potential complexity of determining an optimal calculation for load, we define a very simple, baseline load calculation that MAY be used for the purpose of populating the Load AVP. Implementations using this simplified calculation will use a configured, hard-coded, or Service Level Agreement (SLA)-defined maximum number of transactions per second (TPS) which a node is known to be able to support without issue.
These implementations simply report their load as a linear representation of how much of this known capacity is currently in use:

   Load = MIN(Current_TPS * 65535 / Maximum_TPS, 65535)

To prevent rapid fluctuations in the load metric, nodes SHOULD report a rolling average of the calculated load rather than the actual instantaneous load at any given moment.

Load information is scoped to the level indicated by the Overload-Info-Scope AVP present in the Load-Info AVP in which the Load AVP appears.

3.5.2. Receiving Load Information

While sending load information is mandatory, the actual processing of load information at a recipient is completely optional. Ideally, recipients will use the load information as input to a decision regarding which of multiple equivalent servers to use when initiating a new connection. Recipients may choose to update load information on receipt of every message; alternately, they may periodically "sample" messages from a host to determine the load it is currently reporting.

3.5.2.1. Example Load Handling

This section describes a non-normative example of how recipients can use Load information received from other Diameter nodes. At a high level, the concept is that received load metrics are used to scale the distribution algorithm that the node uses for selection of a server from a group of equivalent servers.

Consider a client that uses DNS to resolve a host name into IP addresses. In this example, the client is attempting to reach the server for the realm example.com. It performs a NAPTR query for the "AAA+D2T" record for that domain, and receives a result pointing to the SRV record "_diameter._tcp.example.com".
Querying for this SRV record, in turn, results in three entries, all with the same priority:

   +------------+----------------------+
   | SRV Weight | Server Name          |
   +------------+----------------------+
   | 20         | server-a.example.com |
   | 20         | server-b.example.com |
   | 60         | server-c.example.com |
   +------------+----------------------+

The client then examines the currently reported loads for each of the three servers. In this example, we are asserting that the reported load metrics are as follows:

   +-------------+----------------------+
   | Load        | Server Name          |
   +-------------+----------------------+
   | 13107 (20%) | server-a.example.com |
   | 26214 (40%) | server-b.example.com |
   | 52428 (80%) | server-c.example.com |
   +-------------+----------------------+

Based on this load information, the client scales each SRV weight according to the corresponding server's reported load; the general formula is:

   new_weight = original_weight * (65535 - load) / 65535

The node then calculates a new set of weights for the destination hosts:

   o  server-a: new_weight = 20 * (65535 - 13107) / 65535 = 16

   o  server-b: new_weight = 20 * (65535 - 26214) / 65535 = 12

   o  server-c: new_weight = 60 * (65535 - 52428) / 65535 = 12

These three new weights (16, 12, and 12) are then used as input to the random selection process traditionally used when selecting among several SRV records.

Note that this example is provided in the context of DNS SRV processing; however, it works equally well in the case that server processing weights are provisioned or made available through an alternate resolution process.

3.6. Session Establishment for Session Groups

The procedure in this section applies to any Diameter operation that may result in the creation of a new Diameter session.
Note that these operations are performed in addition to any normal message processing, and in addition to the operations described in the following sections.

3.6.1. Session Group Concepts

At the time a session is established, the server and/or the client may choose to assign the newly created session to a Session Group that they can use to refer to the session (and other sessions in the same group) in later overload-related messages. This grouping is intended to be used by servers that have visibility into resources that may be independently overloaded, but which do not correspond to an existing Diameter construct (such as Application, Realm, or Destination Server).

One example of a server having visibility into resources that don't have a corresponding Diameter construct is a Diameter Agent servicing a mixed community of users -- say, one authenticated by a "Business" server, and another authenticated by a "Residential" server. The client in this network does not know which group any given session belongs in; the routing of sessions is based on information available only to the agent.

   +-------------+    +-------------+
   |             |    |             |
   |  Server A   |    |  Server B   |
   | (Business)  |    |(Residential)|
   |             |    |             |
   +-------------+    +-------------+
              `.        ,'
                `.    ,'
                  `. ,'
           +-----+---+-----+
           |               |
           |     Agent     |
           |               |
           +---------------+
                   ^
                   |
           +-------+-------+
           |               |
           |    Client     |
           |               |
           +---------------+

In this case, the Agent may wish to assign sessions to two client-visible Session Groups when the session is established. By doing so, the Agent gains the ability to report Load and Overload metrics to the Client independently for the two classes of users.
This can be extremely helpful, for example, in allowing the Agent to ask the Client to throttle traffic for the Residential server when it becomes overloaded, without impacting sessions pertaining to the Business server.

Similar situations can arise even without the presence of Diameter Agents in the network: a server may have a class of sessions that require access to an off-board database (which can, itself, become overloaded), while also servicing a class of sessions that is handled entirely by a local authentication table. The server can use Session Groups to assign these two classes of sessions to different groups, and report overload on the class using the (overloaded) off-board database without impacting the other sessions.

In some applications, it is possible to have the session established by one peer (e.g., in the upstream direction), while some subsequent in-session transactions are initiated by the other peer (e.g., in the downstream direction). Because of this possibility, the overload mechanism allows both peers to establish a Session Group at the time the session is set up. Session group identifiers are scoped to the node that sends them. In other words, if a server assigns a session to a group called "Residential", this group is not related to a client group (if any) by the same name. For clarity, this document will refer to the session group assigned by the server performing the processing as a "local session group," and the session group assigned by the remote node as a "remote session group."

Nodes that send a session-creating request follow normal Diameter procedures, along with the additional behavior described in Section 3.2.1 and Section 3.3.1, as appropriate.
Such nodes may also assign the session to a Session Group, as long as the peer with which they are communicating indicated support for the "Session-Group" scope during capabilities exchange. Whether to do so, and which group to assign a session to, is decided according to local policy. To perform such assignment, the node will include a Session-Group AVP (see Section 5.7) in the Load-Info AVP for the session-creating request. These nodes also store the assigned name as the session's local session group.

3.6.2. Session Group Procedures

The procedures in this section only apply on connections for which support for the "Session-Group" scope has been negotiated during capabilities exchange. See Section 3.1.

When a node receives a session-creating request, it MUST check that request for the presence of a Session-Group AVP in its Load-Info AVP. If one is present, it stores that session group name as the remote session group name for that session. This allows clients to assign the session to a group, allowing them to indicate overload for server-initiated transactions in the resulting session.

When a node replies to a session-creating request, it can choose to assign the newly-established session to a session group. Whether it chooses to do so is independent of whether the remote node assigned the session to a session group. To perform such an assignment, the node includes a Session-Group AVP in the Load-Info AVP sent in answer to the session-creating request. These nodes also store the assigned name as the session's local session group.

Finally, when a node that has sent a session-creating request receives a corresponding answer message, it MUST check that answer for the presence of a Session-Group AVP in its Load-Info AVP. If one is present, it stores that session group name as the remote session group name for that session.

4. Loss-Based Overload Control Algorithm

This section describes a baseline, mandatory-to-implement overload control algorithm, identified by the indicator "Loss". This algorithm allows a Diameter peer to ask its peers to reduce the number of requests they would ordinarily send by a specified percentage. For example, if a peer requests of another peer that it reduce the traffic it is sending by 10%, then that peer will redirect, reject, or treat as failed 10% of the traffic that would have otherwise been sent to this Diameter node.

4.1. Overload-Metric values for the 'Loss' Algorithm

A Diameter node entering the overload state for any of the scopes that it uses with its peers will calculate a value for its Overload Metric, in the range of 0 to 100 (inclusive). This value indicates the percentage traffic reduction the Diameter node wishes its peers to implement. The computation of the exact value for this parameter is left as an implementation choice at the sending node. It is acceptable for implementations to request different levels of traffic reduction from different peers according to local policy at the Diameter node. These Overload Metrics are then communicated to peers using the Overload-Metric AVP in requests and answers sent by this node.

Recipients of Overload-Metric AVPs on connections for which the "Loss" algorithm has been specified MUST reduce the number of requests sent in the corresponding scope by that percentage, either by redirecting them to an alternate destination, or by failing the request. For a Diameter Agent, these failures are indicated to the peer who originated the request by sending a DIAMETER_PEER_IN_OVERLOAD response (see Section 7.3). For Diameter clients, these failures cause the client to behave as if it had received a transient error in response to the request.
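
As a non-normative illustration of the reduction requirement above, the following Python sketch decides, request by request, whether to send or withhold a request by drawing a random number and comparing it with the Overload-Metric. The function name should_send and its parameters are invented for this example, and random selection is only one possible way of meeting the requirement; priority handling is deliberately left out.

```python
import random


def should_send(overload_metric, rng=random):
    """Decide whether one request is sent under the "Loss" algorithm.

    overload_metric is the requested traffic reduction, 0 to 100
    inclusive. Returns True to send the request, or False to redirect
    it or fail it locally. (Hypothetical helper, not part of the
    mechanism itself.)
    """
    if not 0 <= overload_metric <= 100:
        raise ValueError("Loss Overload-Metric must be in 0..100")
    # rng.random() is uniform on [0.0, 1.0); scaled to [0.0, 100.0),
    # it falls below overload_metric with probability metric/100.
    return rng.random() * 100.0 >= overload_metric
```

With an Overload-Metric of 0 every request is sent, with 100 none are, and with 10 roughly one request in ten is withheld over a large enough sample.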
It is acceptable, when implementing the "Loss" algorithm, for the reduction in transactions to make use of a statistical loss function (e.g., random assignment of transactions into "success" and "failure" categories based on the indicated percentage). In such a case, the actual traffic reduction might vary slightly from the percentage indicated, albeit by an insignificant amount.

The selection of which messages to withhold from sending does not need to be arbitrary. For example, implementations are allowed to distinguish between higher-priority and lower-priority messages, and drop the lower-priority messages in preference to dropping the higher-priority messages, as long as the total reduction in traffic conforms to the Overload-Metric in effect at the time. The selection of which messages to prioritize over others will likely vary from application to application (and may even be subject to standardization as part of the application definition). One example of such a prioritization scheme would be to treat those messages that result in the creation of a new session as lower priority than those messages sent in the context of an established session.

4.2. Example Implementation

The exact means a client uses to implement the requirement that it reduce traffic by a requested percentage is left to the discretion of the implementor. However, to aid in understanding the nature of such an implementation, we present an example of a valid implementation in pseudo-code.

In this example, we consider that the sending node maintains two classes of request. Requests in the first category are considered to be of lower priority than those in the second category. If a reduction in traffic is required, then these lower-priority requests will be dropped before any of the higher-priority requests are dropped.
The sending Diameter node determines the mix of requests falling into the first category, and those falling into the second category. For example, 40% of the requests may be in the lower-priority category, while 60% are in the higher-priority category.

When a node receives an overload indication from one of its peers, it converts the Overload-Metric value to a value that applies to the first category of requests. For example, if the Overload-Metric for the applicable context is "10", and 40% of the requests are in the lower-priority category, then:

   10 / 40 * 100 = 25

Or 25% of the requests in the first category can be dropped, with an overall reduction in sent traffic of 10%. The sender then drops 25% of all category 1 requests. This can be done stochastically, by selecting a random number between 1 and 100 (inclusive) for each request to be sent, and dropping any request for which the resulting number is equal to or less than 25. In this set of circumstances, messages in the second category do not require any reduction to meet the requirement of 10% traffic reduction.

A reference algorithm is shown below, using pseudo-code.

   cat1 := 80.0         // Category 1 --- subject to reduction
   cat2 := 100.0 - cat1 // Category 2 --- Under normal operations
   // only subject to reduction after category 1 is exhausted.
   // Note that the above ratio is simply a reasonable default.
   // The actual values will change through periodic sampling
   // as the traffic mix changes over time.

   while (true) {
     // We're modeling message processing as a single work queue
     // that contains both incoming and outgoing messages.
     msg := get_next_message_from_work_queue()

     update_mix(cat1, cat2)  // See Note below

     switch (msg.type) {

       case outbound request:
         destination := get_next_hop(msg)
         oc_context := get_oc_scope(destination,msg)

         if (we are in overload) {
           add_overload_avps(msg)
         }

         if (oc_context == null) {
           send_to_network(msg) // Process it normally by sending the
           // request to the next hop since this particular
           // destination is not subject to overload
         }
         else {
           // Determine if server wants to enter in overload or is in
           // overload
           in_oc := extract_in_oc(oc_context)

           oc_value := extract_oc(oc_context)
           oc_validity := extract_oc_validity(oc_context)

           if (in_oc == false or oc_validity is not in effect) {
             send_to_network(msg) // Process it normally by sending
             // the request to the next hop since this particular
             // destination is not subject to overload. Optionally,
             // clear the oc context for this server (not shown).
           }
           else {  // Begin perform overload control
             r := random()  // Uniform value from 1 to 100 (inclusive)
             drop_msg := false
             if (cat1 >= cat2) {
               category := assign_msg_to_category(msg)  // 1 or 2
               pct_to_reduce_cat2 := 0
               pct_to_reduce_cat1 := oc_value / cat1 * 100
               if (pct_to_reduce_cat1 > 100) {
                 // Category 1 alone cannot absorb the reduction:
                 // drop all of it, and get the remaining reduction
                 // (oc_value - cat1) from category 2
                 pct_to_reduce_cat2 := (oc_value - cat1) / cat2 * 100
                 pct_to_reduce_cat1 := 100
               }

               if (category == 1) {  // Message from category 1
                 if (r <= pct_to_reduce_cat1) {
                   drop_msg := true
                 }
               }
               else {  // Message from category 2
                 if (r <= pct_to_reduce_cat2) {
                   drop_msg := true
                 }
               }
             }
             else { // More category 2 messages than category 1;
                    // indicative of an emergency situation. Since
                    // there are more category 2 messages, don't
                    // bother distinguishing between category 1 or
                    // 2 --- treat them equal (for simplicity).
               if (r <= oc_value) {
                 drop_msg := true
               }
             }

             if (drop_msg == false) {
               send_to_network(msg) // Process it normally by
               // sending the request to the next hop
             }
             else {
               // Do not send request downstream, handle locally by
               // generating response (if a proxy) or treating as
               // an error (if a user agent).
             }
           }  // End perform overload control
         }

       end case // outbound request

       case outbound answer:
         if (we are in overload) {
           add_overload_avps(msg)
         }
         send_to_network(msg)

       end case // outbound answer

       case inbound answer:
         create_or_update_oc_scope() // For the specific server
         // that sent the answer, create or update the oc scope;
         // i.e., extract the values of the overload AVPs
         // and store them in the proper scopes for later use.
         process_msg(msg)

       end case // inbound answer

       case inbound request:
         create_or_update_oc_scope()

         if (we are not in overload) {
           process_msg(msg)
         }
         else {  // We are in overload
           if (connection supports overload) {
             process_msg(msg)
           }
           else {  // Sender does not support oc
             if (local_policy(msg) says process message) {
               process_msg(msg)
             }
             else {
               send_answer(msg, DIAMETER_PEER_IN_OVERLOAD)
             }
           }
         }
       end case // inbound request
     }
   }

Note: A simple way to sample the traffic mix for category 1 and category 2 is to associate a counter with each category of message. Periodically (every 5-10s), get the value of the counters and calculate the ratio of category 1 messages to category 2 messages since the last calculation.

Example: In the last 5 seconds, a total of 500 requests were scheduled to be sent. Assume that 450 out of 500 were messages subject to reduction and 50 out of 500 were classified as requests not subject to reduction.
      Based on this ratio, cat1 := 90 and cat2 := 10; that is, a 90/10
      mix will be used in overload calculations.

   Of course, this scheme can be generalized to include an arbitrary
   number of priorities, depending on how many different classes of
   messages make sense for the given application.

5.  Diameter AVPs for Overload

   NOTE: THE AVP NUMBERS IN THIS SECTION ARE USED FOR EXAMPLE PURPOSES
   ONLY.  THE FINAL AVP CODES TO BE USED WILL BE ASSIGNED BY IANA DURING
   THE PUBLICATION PROCESS, WHEN AND IF THIS DOCUMENT IS PUBLISHED AS AN
   RFC.

   +---------------------+-------+-------+-------------+------+--------+
   | Attribute Name      | AVP   | Sec.  | Data Type   | MUST | MUST   |
   |                     | Code  | Def.  |             |      | NOT    |
   +---------------------+-------+-------+-------------+------+--------+
   | Load-Info           | 1600  | 5.1   | Grouped     |      | M,V    |
   | Supported-Scopes    | 1601  | 5.2   | Unsigned64  |      | M,V    |
   | Overload-Algorithm  | 1602  | 5.3   | Enumerated  |      | M,V    |
   | Overload-Info-Scope | 1603  | 5.4   | OctetString |      | M,V    |
   | Overload-Metric     | 1604  | 5.5   | Unsigned32  |      | M,V    |
   | Period-Of-Validity  | 1605  | 5.6   | Unsigned32  |      | M,V    |
   | Session-Group       | 1606  | 5.7   | UTF8String  |      | M,V    |
   | Load                | 1607  | 5.8   | Unsigned32  |      | M,V    |
   +---------------------+-------+-------+-------------+------+--------+

5.1.  Load-Info AVP

   The Load-Info AVP (AVP code 1600) is of type Grouped, and is used as
   a top-level container to group together all information pertaining to
   load and overload.  Every Load-Info AVP MUST contain one
   Overload-Info-Scope AVP, and one Overload-Metric AVP.

   The Grouped Data field of the Load-Info AVP has the following CCF
   grammar:

      < Load-Info > ::= < AVP Header: 1600 >
                        < Overload-Metric >
                      * { Overload-Info-Scope }
                        [ Supported-Scopes ]
                      * [ Overload-Algorithm ]
                        [ Period-Of-Validity ]
                        [ Session-Group ]
                        [ Load ]
                      * [ AVP ]

5.2.  Supported-Scopes AVP

   The Supported-Scopes AVP (AVP code 1601) is of type Unsigned64, and
   is used during capabilities exchange to indicate the scopes that a
   given node can receive on the connection.  Nodes that support the
   mechanism defined in this document MUST include a Supported-Scopes
   AVP in all CER messages.  It MUST also appear in any CEA messages
   sent in answer to a CER message containing a Load-Info AVP.  The
   Supported-Scopes AVP MUST NOT appear in any other message types.  See
   Section 5.4 for an initial list of scopes.

   The Supported-Scopes AVP contains a bitmap that indicates the scopes
   supported by the sender.  Within the bitmap, the least significant
   bit indicates support for scope 1 (Destination-Realm), while the next
   least significant bit indicates support for scope 2 (Application-ID),
   and so on.  In general, if we consider the bits to be numbered from 0
   (LSB) to 63 (MSB), then any bit n corresponds to the scope type
   numbered n+1.  This scheme allows for up to 64 total scopes to be
   supported.
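
   As an illustration of the bitmap layout just described, the following
   minimal sketch (in Python) builds and decodes a Supported-Scopes
   value.  The helper names scopes_to_bitmap and bitmap_to_scopes are
   hypothetical and are not defined by this document; the sketch assumes
   only the rule that bit n (counting from the LSB at 0) stands for
   scope n+1.

```python
def scopes_to_bitmap(scopes):
    """Return the 64-bit Supported-Scopes value for the given scope numbers.

    Hypothetical helper: scope numbers run from 1 to 64, and bit n
    (LSB = bit 0) stands for scope n + 1.
    """
    bitmap = 0
    for scope in scopes:
        if not 1 <= scope <= 64:
            raise ValueError("scope number out of range: %r" % (scope,))
        bitmap |= 1 << (scope - 1)
    return bitmap


def bitmap_to_scopes(bitmap):
    """Return the sorted list of scope numbers whose bits are set."""
    return [bit + 1 for bit in range(64) if bitmap & (1 << bit)]
```

   Under these assumptions, a node that supports only scope 1
   (Destination-Realm) and scope 2 (Application-ID) would advertise the
   value 0x0000000000000003.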
   More formally, the bitmask used to indicate support for the scope
   numbered n is calculated as follows (where the symbol "<<" indicates
   a bit shift left):

      bitmask = 1 << (n - 1)

   For additional clarity, the bitmasks for the scopes defined in this
   document are as follows:

   +-------+--------------------+-------------------+
   | Scope | Bitmask            | Name              |
   +-------+--------------------+-------------------+
   | 1     | 0x0000000000000001 | Destination-Realm |
   | 2     | 0x0000000000000002 | Application-ID    |
   | 3     | 0x0000000000000004 | Destination-Host  |
   | 4     | 0x0000000000000008 | Host              |
   | 5     | 0x0000000000000010 | Connection        |
   | 6     | 0x0000000000000020 | Session-Group     |
   | 7     | 0x0000000000000040 | Session           |
   +-------+--------------------+-------------------+

   The advertisement process that makes use of the Supported-Scopes AVP
   is described in Section 3.1.

5.3.  Overload-Algorithm AVP

   The Overload-Algorithm AVP (AVP code 1602) is of type Enumerated, and
   is used to negotiate the algorithm that will be used for load
   abatement.  The Overload-Algorithm AVP MAY appear in CER and CEA
   messages, and MUST NOT appear in any other message types.  If absent,
   an Overload Algorithm of type 1 (Loss) is indicated.  Additional
   values can be registered by other documents; see Appendix C.1.
   Initial values for the enumeration are as follows:

   +------------+----------------+------------+
   | AVP Values | Attribute Name | Reference  |
   +------------+----------------+------------+
   | 0          | Reserved       | -          |
   | 1          | Loss           | [RFC xxxx] |
   +------------+----------------+------------+

5.4.  Overload-Info-Scope AVP

   The Overload-Info-Scope AVP (AVP code 1603) is of type OctetString,
   and is used to indicate to which scope the Overload-Metric applies.

   See Section 2 for a definition of the different scope types and a
   formal description of how they are applied.
   Other documents may define additional scopes; see Appendix C.2 for
   details.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Scope     |                                               |
   +-+-+-+-+-+-+-+-+                  Details                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-------+-------------------+------------+
   | Scope | Attribute Name    | Reference  |
   +-------+-------------------+------------+
   | 0     | Reserved          | [RFC xxxx] |
   | 1     | Destination-Realm | [RFC xxxx] |
   | 2     | Application-ID    | [RFC xxxx] |
   | 3     | Destination-Host  | [RFC xxxx] |
   | 4     | Host              | [RFC xxxx] |
   | 5     | Connection        | [RFC xxxx] |
   | 6     | Session-Group     | [RFC xxxx] |
   | 7     | Session           | [RFC xxxx] |
   +-------+-------------------+------------+

   Each Overload-Info-Scope has a different encoding, according to the
   identifier used to designate the corresponding scope.  The formats
   for the scopes defined in this document are given in the following
   subsections.

5.4.1.  Realm Scope

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       1       |                                               |
   +-+-+-+-+-+-+-+-+           Realm (DiameterIdentity)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.4.2.  Application-ID Scope

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       2       |            Reserved (set to zeros)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Application-ID (Unsigned32)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.4.3.  Host Scope

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       3       |                                               |
   +-+-+-+-+-+-+-+-+            Host (DiameterIdentity)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.4.4.  Session Scope

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       4       |                                               |
   +-+-+-+-+-+-+-+-+            Session-ID (UTF8String)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.4.5.  Connection Scope

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       5       |            Reserved (set to zeros)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.4.6.  Session Group Scope

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       6       |                                               |
   +-+-+-+-+-+-+-+-+            Group Name (UTF8String)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.5.  Overload-Metric AVP

   The Overload-Metric AVP (AVP code 1604) is of type Unsigned32, and is
   used as input to the load mitigation algorithm.  Its definition and
   interpretation are left up to each individual algorithm, with the
   exception that an Overload-Metric of "0" always indicates that the
   node is not in overload (that is, no load abatement procedures are in
   effect) for the indicated scope.

5.6.  Period-Of-Validity AVP

   The Period-Of-Validity AVP (AVP code 1605) is of type Unsigned32, and
   is used to indicate the length of time, in seconds, the
   Overload-Metric is to be considered valid (unless overridden by a
   subsequent Overload-Metric in the same scope).
   It MUST NOT be present if the Overload-Metric is "0", and MUST be
   present otherwise.

5.7.  Session-Group AVP

   The Session-Group AVP (AVP code 1606) is of type UTF8String, and is
   used to assign a new session to the session group that it names.  The
   Session-Group AVP MAY appear once in the answer to a session-creating
   request, and MUST NOT appear in any other message types.

5.8.  Load AVP

   The Load AVP (AVP code 1607) is of type Unsigned32, and is used to
   indicate the load level of the scope in which it appears.  See
   Section 3.5 for additional information.

6.  Security Considerations

   A key concern for recipients of overload metrics and load information
   is whether the peer from which the information has been received is
   authorized to speak for the indicated scope.  For scopes such as
   "Host" and "Connection", such authorization is obvious.  For other
   scopes, such as "Application-ID" and "Realm", the potential for a
   peer to maliciously or accidentally reduce traffic to a third party
   is evident.  Implementations may choose to ignore indications from
   hosts which do not clearly have authority over the indicated scope;
   alternatively, they may wish to further restrict the scope to apply
   only to the host from which the information has been received.

   On the other hand, multiple nodes that are under the same
   administrative control (or a tightly controlled confederation of
   control) may be implicitly trusted to speak for all scopes within
   that domain of control.  Implementations are encouraged to allow
   configuration of inherently trusted servers to which the foregoing
   restrictions are not applied.

      Open Issue: There are almost certainly other security issues to
      take into consideration here.
      For example, we might need to include guidance around who gets to
      see our own load information, and potentially changing the
      granularity of information presented based on trust relationships.

7.  IANA Considerations

   This document defines new entries in several existing IANA tables.
   It also creates two new tables.

7.1.  New Diameter AVPs

   The following entries are added to the "AVP Codes" table under the
   "aaa-parameters" registry.

   +----------+---------------------+-----------+
   | AVP Code | Attribute Name      | Reference |
   +----------+---------------------+-----------+
   | 1600     | Load-Info           | RFC xxxx  |
   | 1601     | Supported-Scopes    | RFC xxxx  |
   | 1602     | Overload-Algorithm  | RFC xxxx  |
   | 1603     | Overload-Info-Scope | RFC xxxx  |
   | 1604     | Overload-Metric     | RFC xxxx  |
   | 1605     | Period-Of-Validity  | RFC xxxx  |
   | 1606     | Session-Group       | RFC xxxx  |
   | 1607     | Load                | RFC xxxx  |
   +----------+---------------------+-----------+

7.2.  New Diameter Disconnect-Cause

   The following entry is added to the "Disconnect-Cause AVP Values
   (code 273)" table in the "aaa-parameters" registry:

   +------------------------+---------------------+-----------+
   | AVP Values             | Attribute Name      | Reference |
   +------------------------+---------------------+-----------+
   | 128 [actual value TBD] | NEGOTIATION_FAILURE | RFC xxxx  |
   +------------------------+---------------------+-----------+

7.3.  New Diameter Response Code

   The following entry is added to the "Result-Code AVP Values (code
   268) - Transient Failures" table in the "aaa-parameters" registry:

   +-------------------------+---------------------------+-----------+
   | AVP Values              | Attribute Name            | Reference |
   +-------------------------+---------------------------+-----------+
   | 4128 [actual value TBD] | DIAMETER_PEER_IN_OVERLOAD | RFC xxxx  |
   +-------------------------+---------------------------+-----------+

7.4.  New Command Flag

   The following entry is added to the "Command Flags" table in the
   "aaa-parameters" registry:

   +-----+------------+-----------+
   | bit | Name       | Reference |
   +-----+------------+-----------+
   | 4   | 'O'verload | RFC xxxx  |
   +-----+------------+-----------+

7.5.  Overload Algorithm Registry

   This document defines a new table, to be titled "Overload-Algorithm
   Values (code 1602)", in the "aaa-parameters" registry.  Its initial
   values are to be taken from the table in Section 5.3.

   New entries in this table follow the IANA policy of "Specification
   Required."  (Open Issue: The WG should discuss registration policy to
   ensure that we think this is the right balance).

7.6.  Overload Scope Registry

   This document defines a new table, to be titled "Overload-Info-Scope
   Values (code 1603)", in the "aaa-parameters" registry.  Its initial
   values are to be taken from the table in Section 5.4.

   New entries in this table follow the IANA policy of "Specification
   Required."  (Open Issue: The WG should discuss registration policy to
   ensure that we think this is the right balance).

8.  References

8.1.  Normative References

   [I-D.ietf-dime-overload-reqs]
              McMurry, E. and B. Campbell, "Diameter Overload Control
              Requirements", draft-ietf-dime-overload-reqs-00 (work in
              progress), September 2012.

   [I-D.ietf-dime-rfc3588bis]
              Fajardo, V., Arkko, J., Loughney, J., and G. Zorn,
              "Diameter Base Protocol", draft-ietf-dime-rfc3588bis-34
              (work in progress), June 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [I-D.ietf-soc-overload-control]
              Gurbani, V., Hilt, V., and H. Schulzrinne, "Session
              Initiation Protocol (SIP) Overload Control",
              draft-ietf-soc-overload-control-10 (work in progress),
              October 2012.

   [RFC3588]  Calhoun, P., Loughney, J., Guttman, E., Zorn, G., and J.
              Arkko, "Diameter Base Protocol", RFC 3588, September 2003.

   [RFC6357]  Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design
              Considerations for Session Initiation Protocol (SIP)
              Overload Control", RFC 6357, August 2011.

Appendix A.  Acknowledgements

   This work was inspired by and borrows heavily from the SIP overload
   control mechanism described in [I-D.ietf-soc-overload-control].  The
   author of this document is deeply grateful to the editor and authors
   of that work, as well as its many contributors.

   Thanks to Ben Campbell and Eric McMurry for significant input to the
   initial mechanism design.  The author also thanks Martin Dolly, Bob
   Wallace, John Gilmore, Matt McCann, Jonathan Palmer, Kedar Karmarkar,
   Imtiaz Shaikh, Jouni Korhonen, Uri Baniel, Jianrong Wang, Brian
   Freeman, and Eric Noel for early feedback on the mechanism.

Appendix B.  Requirements Analysis

   This section analyzes the mechanism described in this document
   against the set of requirements detailed in
   [I-D.ietf-dime-overload-reqs].

      Open Issue: This analysis to be performed after requirements have
      been finalized.

Appendix C.  Extending the Overload Mechanism

   This specification includes two key extension points to allow for new
   behaviors to be smoothly added to the mechanism in the future.  The
   following sections discuss the means by which future documents are
   expected to extend the mechanism.

C.1.  New Algorithms

   In order to provide the ability for different means of traffic
   abatement in the future, this specification allows for descriptions
   of new traffic reduction algorithms.  In general, documents that
   define new algorithms need to describe externally-observable node
   behavior in sufficient detail as to allow interoperation.

   At a minimum, such description needs to include:

   1.  The name and IANA-registered number for negotiating the algorithm
       (see Section 5.3).
   2.  A clear description of how the Overload-Metric AVP is to be
       interpreted, keeping in mind that "0" is reserved to indicate
       that no overload condition exists.
   3.  An example, proof-of-concept description (preferably in
       pseudo-code) of how nodes can implement the algorithm.

   New algorithms must be capable of working with all applications, not
   just a subset of applications.

C.2.  New Scopes

   Because it is impossible to foresee every construct to which it might
   be useful to scope overload operations, we allow for the registration
   of new scopes.

   At a minimum, documents that define new scopes need to include:

   1.  The name and IANA-registered number for negotiating and
       indicating the scope (see Section 5.4).
   2.  A syntax for the "Details" field of the Overload-Info-Scope AVP,
       preferably derived from one of the base Diameter data types.
   3.  An explicit and unambiguous description of how both parties to
       the overload control mechanism can determine which transactions
       correspond to the indicated scope.
   4.  A clear and exhaustive list that extends the one in Section 2.2,
       indicating exactly which combinations of scopes are allowed with
       the new scope.  This list must take into account all of the
       IANA-registered scopes at the time of its publication.

   It is acceptable for new scopes to be specific to constructs within
   one or several applications.  In other words, it may be desirable to
   define scopes that can be applied to one kind of application while
   not making sense for another.  Extension documents that choose to do
   so should, however, state very clearly that such is the case.

Appendix D.  Design Rationale

   The current design proposed in this document takes into account
   several trade-offs and requirements that may not be immediately
   obvious.  The remainder of this appendix highlights some of the
   potentially more controversial and/or non-obvious of these, and
   attempts to explain why such decisions were made the way they were.

   That said, none of the following text is intended to represent a line
   in the sand.  All of the decisions can be revisited if necessary,
   especially if additional facts are brought into the analysis that
   change the balance of the decisions.

D.1.  Piggybacking

   The decision to piggyback load information on existing messages
   derives primarily from REQ 14 in [I-D.ietf-dime-overload-reqs]: "The
   mechanism SHOULD provide for increased feedback when traffic levels
   increase.  The mechanism MUST NOT do this in such a way that it
   increases the number of messages while at high loads."

   If we were to introduce new messaging -- say, by defining a new
   overload control Application -- then a node in overload would be
   required to generate more messages at high load in order to keep
   overload information in its peers up-to-date.

   If further analysis determines that other factors are ultimately more
   important than the provisions of REQ 14, several factors would need
   to be considered.

   First and foremost would be the prohibition, in the base Diameter
   specification ([I-D.ietf-dime-rfc3588bis]), against adding new
   commands to an existing application.  Specifically, section 1.3.4
   stipulates: "[A] new Diameter application MUST be created when one or
   more of the following criteria are met:... A new command is used
   within the existing application either because an additional command
   is added, an existing command has been modified so that a new Command
   Code had to be registered, or a command has been deleted."  Because
   of this stipulation, the addition of new command codes to existing
   applications would require registration of entirely new application
   IDs for those applications to support overload control.  We consider
   this to be too disruptive a change to consider.

   By the author's reading, there is no provision that exempts the
   "Diameter Common Messages" Application (Application ID 0) from the
   above clauses.  This effectively prohibits the addition of new
   messages to this Application.  While it may be theoretically possible
   to specify behavior that hijacks the DWR/DWA watchdog messages for
   the purpose of overload control messaging, doing so requires a
   complete redefinition of their behavior and, fundamentally, their
   semantics.  This approach seems, at first blush, to be an
   unacceptable change to the base Application.

   The remaining approach -- defining a new application for overload
   control -- has some promise, if we decide not to fulfill REQ 14.
   It remains to be seen whether the users of the Diameter protocol,
   including other SDOs who define applications for Diameter, are
   willing to specify the use of multiple Diameter Applications for use
   on a single reference point.

D.2.  Load AVP in All Packets

   Some have questioned the currently specified behavior of message
   senders including a Load AVP in every message sent.  This is being
   proposed as a potential performance enhancement, with the idea being
   that message recipients can save processing time by examining
   arbitrarily selected messages for load information, rather than
   looking for a Load AVP in every message that arrives.  Of course, to
   enable this kind of sampling, the Load AVP must be guaranteed to be
   present; otherwise, attempts to find it will occasionally fail.

   The reciprocal approach, of sending a Load AVP only when the Load has
   changed (or changed by more than a certain amount), requires the
   recipient to search through the Load-Info grouped AVP in every
   message received in order to determine whether a Load AVP is present.

   On a cursory analysis, we determined that appending a Load AVP to
   each message is fundamentally a cheaper operation than traversing the
   contents of each Load-Info AVP to determine whether a Load AVP is
   present.

   If a later decision is made to require examination of each message to
   determine whether it includes a Load AVP, we may be able to obtain
   some efficiencies by requiring Load to be the first AVP in the
   Load-Info AVP.

D.3.  Graceful Failure

   Some commenters have raised the question of whether a node can reject
   an incoming connection upon recognizing that the remote node does not
   support the Diameter overload control mechanism.  One suggestion has
   been to add a response code to indicate exactly such a situation.

   So far, we have opted against doing so.
   Instead, we anticipate an incremental deployment of the overload
   control mechanism, which will likely consist of a mixture of nodes
   that support and nodes that do not support the mechanism.  Were we to
   allow the rejection of connections that do not support the mechanism,
   we would create a situation that necessitates a "flag day," on which
   every Diameter node in a network is required to simultaneously, and
   in perfect synchronization, switch from not supporting the overload
   mechanism, to supporting it.

   Given the operational difficulty of the foregoing, we have decided
   that defining a response code, even if optional, that was to be used
   to reject connections merely for the lack of overload control
   support, would form an attractive nuisance for implementors.  The
   result could easily be an operational nightmare for network
   operators.

Author's Address

   Adam Roach
   Tekelec
   17210 Campbell Rd.
   Suite 250
   Dallas, TX  75252
   US

   Email: adam@nostrum.com