Diameter Maintenance and Extensions (DIME)               S. Donovan, Ed.
Internet-Draft                                                    Oracle
Intended status: Standards Track                                 E. Noel
Expires: August 20, 2017                                       AT&T Labs
                                                       February 16, 2017


                     Diameter Overload Rate Control
                draft-ietf-dime-doic-rate-control-05.txt

Abstract

   This specification documents an extension to the Diameter Overload
   Indication Conveyance (DOIC) [RFC7683] base solution.  This extension
   adds a new overload control abatement algorithm.  This abatement
   algorithm allows for a DOIC reporting node to specify a maximum rate
   at which a DOIC reacting node sends Diameter requests to the DOIC
   reporting node.

Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 20, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Abbreviations
   3.  Interaction with DOIC report types
   4.  Capability Announcement
   5.  Overload Report Handling
     5.1.  Reporting Node Overload Control State
     5.2.  Reacting Node Overload Control State
     5.3.  Reporting Node Maintenance of Overload Control State
     5.4.  Reacting Node Maintenance of Overload Control State
     5.5.  Reporting Node Behavior for Rate Abatement Algorithm
     5.6.  Reacting Node Behavior for Rate Abatement Algorithm
   6.  Rate Abatement Algorithm AVPs
     6.1.  OC-Supported-Features AVP
       6.1.1.  OC-Feature-Vector AVP
     6.2.  OC-OLR AVP
       6.2.1.  OC-Maximum-Rate AVP
     6.3.  Attribute Value Pair flag rules
   7.  Rate Based Abatement Algorithm
     7.1.  Overview
     7.2.  Reporting Node Behavior
     7.3.  Reacting Node Behavior
       7.3.1.  Default Algorithm
       7.3.2.  Priority Treatment
       7.3.3.  Optional Enhancement: Avoidance of Resonance
   8.  IANA Consideration
     8.1.  AVP codes
     8.2.  New registries
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document defines a new Diameter overload control abatement
   algorithm.

   The base Diameter overload specification [RFC7683] defines the loss
   algorithm as the default Diameter overload abatement algorithm.  The
   loss algorithm allows a reporting node to instruct a reacting node to
   reduce the amount of traffic sent to the reporting node by abating
   (diverting or throttling) a percentage of requests sent to the
   server.  While this can effectively decrease the load handled by the
   server, it does not directly address cases where the rate of arrival
   of service requests increases quickly.  If the service requests that
   result in Diameter transactions increase quickly, then the loss
   algorithm cannot guarantee that the load presented to the server
   remains below a specific rate level.  The loss algorithm can be slow
   to protect the stability of reporting nodes when subjected to
   rapidly changing loads.
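
   As a rough illustration of this limitation, the sketch below compares
   the traffic forwarded under a percentage-based (loss) request with
   the traffic forwarded under a rate cap.  The function names are
   hypothetical, and the numbers (a 10% reduction, a cap of 90 requests
   per second, offered loads of 100 and 1000 requests per second) are
   illustrative values matching the example discussed in this
   Introduction; none of them are defined by this specification.

```python
def loss_admitted(offered_rate, reduction_percent):
    """Requests/second forwarded when a percentage of traffic is abated."""
    return offered_rate * (100 - reduction_percent) / 100


def rate_admitted(offered_rate, maximum_rate):
    """Requests/second forwarded when traffic is capped at a maximum rate."""
    return min(offered_rate, maximum_rate)


if __name__ == "__main__":
    # Illustrative offered loads: steady state and a tenfold spike.
    for offered in (100, 1000):
        print(
            f"offered={offered}/s "
            f"loss(10%)={loss_admitted(offered, 10):.0f}/s "
            f"rate(max 90)={rate_admitted(offered, 90)}/s"
        )
```

   With a fixed percentage, the forwarded load scales with the offered
   load; with a rate cap, it does not exceed the requested maximum.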

   Consider the case where a reacting node is handling 100 service
   requests per second, where each of these service requests results in
   one Diameter transaction being sent to a reporting node.  If the
   reporting node is approaching an overload state, or is already in an
   overload state, it will send a Diameter overload report requesting a
   percentage reduction in traffic sent.  Assume for this discussion
   that the reporting node requests a 10% reduction.  The reacting node
   will then abate (divert or throttle) ten Diameter transactions per
   second, sending the remaining 90 transactions per second to the
   reporting node.

   Now assume that the rate of service requests at the reacting node
   spikes to 1000 requests per second.  The reacting node will continue
   to honor the reporting node's request for a 10% reduction in traffic.
   In this example, this results in the reacting node sending 900
   Diameter transactions per second and abating the remaining 100
   transactions per second.  This spike in traffic is significantly
   higher than the reporting node is expecting to handle and can
   negatively impact the stability of the reporting node.

   The reporting node can, and likely would, send another overload
   report requesting that the reacting node abate 91% of requests to get
   back to the desired 90 transactions per second.  However, once the
   spike has abated and the rate of service requests handled by the
   reacting node returns to 100 per second, this will result in just 9
   transactions per second being sent to the reporting node, requiring a
   new overload report setting the reduction percentage back to 10%.
   This control feedback loop has the potential to make the situation
   worse by causing wide fluctuations in traffic on multiple nodes in
   the Diameter network.

   One of the benefits of a rate-based algorithm is that it better
   handles spikes in traffic.
   Instead of sending a request to reduce
   traffic by a percentage, the rate approach allows the reporting node
   to specify the maximum number of Diameter requests per second that
   can be sent to the reporting node.  For instance, in this example,
   the reporting node could send a rate-based request specifying a
   maximum of 90 transactions per second.  The reacting node will then
   send at most 90 transactions per second, regardless of whether it is
   receiving 100 or 1000 service requests per second.

   This document extends the base DOIC solution [RFC7683] to add support
   for the rate-based overload abatement algorithm.

   This document draws heavily on work in the SIP Overload Control
   working group.  The definition of the rate abatement algorithm is
   copied almost verbatim from the SOC document [RFC7415], with changes
   focused on making the wording consistent with the DOIC solution and
   the Diameter protocol.

2.  Terminology and Abbreviations

   Diameter Node

      A Diameter Client, Diameter Server, or Diameter Agent as defined
      in [RFC6733].

   Diameter Endpoint

      A Diameter Client or Diameter Server as defined in [RFC6733].

   DOIC Node

      A Diameter Node that supports the DOIC solution defined in
      [RFC7683].

   Reporting Node

      A DOIC Node that sends a DOIC overload report.

   Reacting Node

      A DOIC Node that receives and acts on a DOIC overload report.

3.  Interaction with DOIC report types

   As of the publication of this specification, there are two DOIC
   report types defined, with the specification of a third in progress:

   1.  Host - Overload of a specific Diameter Application at a specific
       Diameter Node as defined in [RFC7683].

   2.  Realm - Overload of a specific Diameter Application at a specific
       Diameter Realm as defined in [RFC7683].

   3.  Peer - Overload of a specific Diameter peer as defined in
       [I-D.ietf-dime-agent-overload].

   The rate algorithm MAY be selected by reporting nodes for any of
   these report types.

   It is expected that all report types defined in the future will
   indicate whether or not the rate algorithm can be used with that
   report type.

4.  Capability Announcement

   This extension defines the rate abatement algorithm (referred to as
   rate in this document) feature.  Support for the rate feature will be
   reflected by use of a new value, as defined in Section 6.1.1, in the
   OC-Feature-Vector AVP per the rules defined in [RFC7683].

   Note that Diameter nodes that support the rate feature will, by
   definition, support both the loss and rate-based abatement
   algorithms.  DOIC reacting nodes SHOULD indicate support for both the
   loss and rate algorithms in the OC-Feature-Vector AVP.

   There may be local policy reasons that cause a DOIC node that
   supports the rate abatement algorithm to not include it in the
   OC-Feature-Vector AVP.  All reacting nodes, however, must continue to
   include loss in the OC-Feature-Vector AVP in order to remain
   compliant with [RFC7683].

   A reporting node MAY select one abatement algorithm to apply to host
   and realm reports and a different algorithm to apply to peer reports.

      For host or realm reports, the selected algorithm is reflected in
      the OC-Feature-Vector AVP sent as part of the OC-Supported-
      Features AVP included in answer messages for transactions where
      the request contained an OC-Supported-Features AVP.  This is per
      the procedures defined in [RFC7683].

      For peer reports, the selected algorithm is reflected in the OC-
      Peer-Algo AVP sent as part of the OC-Supported-Features AVP
      included in answer messages for transactions where the request
      contained an OC-Supported-Features AVP.  This is per the
      procedures defined in [I-D.ietf-dime-agent-overload].
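
   The capability exchange for host and realm reports described in this
   section can be sketched as follows.  This is a minimal illustration
   under stated assumptions, not a normative procedure: the function
   names and the policy flags are hypothetical, OLR_DEFAULT_ALGO is the
   loss algorithm bit defined in [RFC7683], and OLR_RATE_ALGORITHM is
   the bit defined in Section 6.1.1 of this document.

```python
OLR_DEFAULT_ALGO = 0x0000000000000001    # loss algorithm bit from RFC 7683
OLR_RATE_ALGORITHM = 0x0000000000000004  # rate algorithm bit, Section 6.1.1


def reacting_node_features(supports_rate, policy_allows_rate=True):
    """Build the OC-Feature-Vector value a reacting node puts in requests.

    Loss is always advertised; rate is added only when it is supported
    and local policy (a hypothetical flag here) does not suppress it.
    """
    features = OLR_DEFAULT_ALGO
    if supports_rate and policy_allows_rate:
        features |= OLR_RATE_ALGORITHM
    return features


def select_algorithm(request_features, reporting_node_prefers_rate):
    """Pick the single algorithm bit a reporting node returns in answers."""
    if reporting_node_prefers_rate and request_features & OLR_RATE_ALGORITHM:
        return OLR_RATE_ALGORITHM
    return OLR_DEFAULT_ALGO


if __name__ == "__main__":
    advertised = reacting_node_features(supports_rate=True)
    selected = select_algorithm(advertised, reporting_node_prefers_rate=True)
    print(hex(advertised), hex(selected))
```

   The sketch falls back to loss whenever the request did not advertise
   rate, which reflects the requirement that loss is always supported.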

      Editor's Note: The peer report specification is still under
      development and, as such, the description of peer report handling
      above is subject to change.

5.  Overload Report Handling

   This section describes any changes to the behavior defined in
   [RFC7683] for handling of overload reports when the rate overload
   abatement algorithm is used.

5.1.  Reporting Node Overload Control State

   A reporting node that uses the rate abatement algorithm SHOULD
   maintain reporting node Overload Control State (OCS) for each
   reacting node to which it sends a rate Overload Report (OLR).

      This is different from the behavior defined in [RFC7683], where a
      single loss percentage is sent to all reacting nodes.

   When using the rate abatement algorithm, a reporting node SHOULD
   maintain OCS entries per supported Diameter application, per targeted
   reacting node, and per report type.

   A rate OCS entry is identified by the tuple of Application-Id, report
   type, and DiameterIdentity of the target of the rate OLR.

   A reporting node that supports the rate abatement algorithm MUST
   include the rate of its abatement algorithm in the OC-Maximum-Rate
   AVP when sending a rate OLR.

   All other elements for the OCS defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] also apply to the reporting node's OCS
   when using the rate abatement algorithm.

5.2.  Reacting Node Overload Control State

   A reacting node that supports the rate abatement algorithm MUST
   indicate rate as the selected abatement algorithm in the reacting
   node OCS when receiving a rate OLR.

   A reacting node that supports the rate abatement algorithm MUST
   include the rate specified in the OC-Maximum-Rate AVP included in the
   OC-OLR AVP as an element of the abatement algorithm specific portion
   of reacting node OCS entries.
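
   One possible in-memory shape for the rate-specific state described in
   Sections 5.1 and 5.2 is sketched below.  It is an illustration only:
   the class and method names, the string encoding of report types, the
   example identities, and the 30-second validity value are assumptions
   made for the example, and the remaining OCS elements required by
   [RFC7683] are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (Application-Id, report type, DiameterIdentity of the OLR target)
OcsKey = Tuple[int, str, str]


@dataclass
class RateOcsEntry:
    algorithm: str          # always "rate" for entries created here
    maximum_rate: int       # value carried in OC-Maximum-Rate (requests/s)
    validity_duration: int  # value carried in OC-Validity-Duration (s)


class RateOcsTable:
    """Hypothetical OCS store keyed by the tuple named in Section 5.1."""

    def __init__(self) -> None:
        self._entries: Dict[OcsKey, RateOcsEntry] = {}

    def store(self, application_id: int, report_type: str, identity: str,
              maximum_rate: int, validity_duration: int) -> RateOcsEntry:
        entry = RateOcsEntry("rate", maximum_rate, validity_duration)
        self._entries[(application_id, report_type, identity)] = entry
        return entry

    def lookup(self, application_id: int, report_type: str,
               identity: str) -> Optional[RateOcsEntry]:
        return self._entries.get((application_id, report_type, identity))


if __name__ == "__main__":
    table = RateOcsTable()
    # Illustrative values: two reacting nodes given different rates.
    table.store(4, "host", "client1.example.com", 90, 30)
    table.store(4, "host", "client2.example.com", 45, 30)
    print(table.lookup(4, "host", "client1.example.com"))
```

   Keying on the full tuple is what lets a reporting node hand out a
   different rate to each reacting node, unlike a single shared loss
   percentage.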

   All other elements for the OCS defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] also apply to the reacting node's OCS
   when using the rate abatement algorithm.

5.3.  Reporting Node Maintenance of Overload Control State

   A reporting node that has selected the rate overload abatement
   algorithm and enters an overload condition MUST indicate rate as the
   abatement algorithm in the resulting reporting node OCS entries.

   A reporting node that has selected the rate abatement algorithm and
   enters an overload condition MUST indicate the selected rate in the
   resulting reporting node OCS entries.

   When selecting the rate algorithm in the response to a request that
   contained an OC-Supported-Features AVP with an OC-Feature-Vector AVP
   indicating support for the rate feature, a reporting node MUST ensure
   that a reporting node OCS entry exists for the target of the overload
   report.  The target is defined as follows:

   o  For Host reports, the target is the DiameterIdentity contained in
      the Origin-Host AVP received in the request.

   o  For Realm reports, the target is the DiameterIdentity contained in
      the Origin-Realm AVP received in the request.

   o  For Peer reports, the target is the DiameterIdentity of the
      Diameter Peer from which the request was received.

5.4.  Reacting Node Maintenance of Overload Control State

   When receiving an answer message indicating that the reporting node
   has selected the rate algorithm, a reacting node MUST indicate the
   rate abatement algorithm in the reacting node OCS entry for the
   reporting node.

   A reacting node receiving an overload report for the rate abatement
   algorithm MUST save the rate received in the OC-Maximum-Rate AVP
   contained in the OC-OLR AVP in the reacting node OCS entry.

5.5.  Reporting Node Behavior for Rate Abatement Algorithm

   When in an overload condition with rate selected as the overload
   abatement algorithm and when handling a request that contained an OC-
   Supported-Features AVP that indicated support for the rate abatement
   algorithm, a reporting node SHOULD include an OC-OLR AVP for the rate
   algorithm using the parameters stored in the reporting node OCS for
   the target of the overload report.

   When sending an overload report for the rate algorithm, the OC-
   Maximum-Rate AVP MUST be included and the OC-Reduction-Percentage AVP
   MUST NOT be included.

5.6.  Reacting Node Behavior for Rate Abatement Algorithm

   When determining if abatement treatment should be applied to a
   request being sent to a reporting node that has selected the rate
   overload abatement algorithm, the reacting node MAY use the algorithm
   detailed in Section 7.

      Note: Other algorithms for controlling the rate can be implemented
      by the reacting node as long as they result in the correct rate of
      traffic being sent to the reporting node.

   Once a determination is made by the reacting node that an individual
   Diameter request is to be subjected to abatement treatment, the
   procedures for throttling and diversion defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] apply.

6.  Rate Abatement Algorithm AVPs

6.1.  OC-Supported-Features AVP

   The rate algorithm does not add any new AVPs to the OC-Supported-
   Features AVP.

   The rate algorithm does add a new feature bit to be carried in the
   OC-Feature-Vector AVP.

6.1.1.  OC-Feature-Vector AVP

   This extension adds the following capability to the OC-Feature-
   Vector AVP.

   OLR_RATE_ALGORITHM (0x0000000000000004)

      When this flag is set by the overload control endpoint, it
      indicates that the DOIC Node supports the rate overload control
      algorithm.

6.2.  OC-OLR AVP

   This extension defines the OC-Maximum-Rate AVP to be an optional part
   of the OC-OLR AVP.

      OC-OLR ::= < AVP Header: TBD2 >
                 < OC-Sequence-Number >
                 < OC-Report-Type >
                 [ OC-Reduction-Percentage ]
                 [ OC-Validity-Duration ]
                 [ SourceID ]
                 [ OC-Maximum-Rate ]
               * [ AVP ]

   This extension makes no changes to the other AVPs that are part of
   the OC-OLR AVP.

   This extension does not define new overload report types.  The
   existing report types of host and realm defined in [RFC7683] apply to
   the rate control algorithm.  The peer report type defined in
   [I-D.ietf-dime-agent-overload] also applies to the rate control
   algorithm.

6.2.1.  OC-Maximum-Rate AVP

   The OC-Maximum-Rate AVP (AVP code TBD1) is of type Unsigned32 and
   describes the maximum rate at which the sender is requested to send
   traffic.  This is specified in terms of requests per second.

   A value of zero indicates that no traffic is to be sent.

6.3.  Attribute Value Pair flag rules

                                                         +---------+
                                                         |AVP flag |
                                                         |rules    |
                                                         +----+----+
                      AVP   Section                      |    |MUST|
    Attribute Name    Code  Defined  Value Type          |MUST| NOT|
   +-----------------------------------------------------+----+----+
   |OC-Maximum-Rate   TBD1  6.2.1    Unsigned32          |    | V  |
   +-----------------------------------------------------+----+----+

7.  Rate Based Abatement Algorithm

   This section is pulled from [RFC7415], with minor changes needed to
   make it apply to the Diameter protocol.

7.1.  Overview

   The reporting node is the one protected by the overload control
   algorithm defined here.  The reacting node is the one that abates
   traffic towards the reporting node.

   Following the procedures defined in [RFC7683], the reacting node and
   reporting node signal to one another support for rate-based overload
   control.

   Then periodically, the reporting node relies on internal measurements
   (e.g.,
   CPU utilization or queuing delay) to evaluate its overload
   state and estimate a target maximum Diameter request rate in number
   of requests per second (as opposed to target percent reduction in the
   case of loss-based abatement).

   When in an overloaded state, the reporting node uses the OC-OLR AVP
   to inform reacting nodes of its overload state and of the target
   Diameter request rate.

   Upon receiving the overload report with a target maximum Diameter
   request rate, each reacting node applies abatement treatment for new
   Diameter requests towards the reporting node.

7.2.  Reporting Node Behavior

   The actual algorithm used by the reporting node to determine its
   overload state and estimate a target maximum Diameter request rate is
   beyond the scope of this document.

   However, the reporting node MUST periodically evaluate its overload
   state and estimate a target Diameter request rate beyond which it
   would become overloaded.  The reporting node must allocate a portion
   of the target Diameter request rate to each of its reacting nodes.
   The reporting node may set the same rate for every reacting node, or
   may set different rates for different reacting nodes.

   The maximum rate determined by the reporting node for a reacting node
   applies to the entire stream of Diameter requests, even though
   abatement may only affect a particular subset of the requests, since
   the reacting node might apply priority as part of its decision of
   which requests to abate.

   When setting the maximum rate for a particular reacting node, the
   reporting node may need to take into account the workload (e.g., CPU
   load per request) of the distribution of message types from that
   reacting node.
   Furthermore, because the reacting node may prioritize
   the specific types of messages it sends while under overload
   restriction, this distribution of message types may be different from
   the message distribution for that reacting node under non-overload
   conditions (e.g., resulting in either higher or lower CPU load).

   Note that the value of the OC-Maximum-Rate AVP is an upper bound (in
   request messages per second) on the traffic sent by the reacting node
   to the reporting node.  The reacting node may send traffic at a rate
   significantly lower than the upper bound, for a variety of reasons.

   In other words, when multiple reacting nodes are being controlled by
   an overloaded reporting node, at any given time some reacting nodes
   may receive requests at a rate below their target maximum Diameter
   request rate while others receive requests at a rate above it.  But
   the resulting request rate presented to the overloaded reporting node
   will converge towards the target Diameter request rate.

   Upon detection of overload, and the determination to invoke overload
   controls, the reporting node MUST follow the specifications in
   [RFC7683] to notify its reacting nodes of the allocated target
   maximum Diameter request rate and to notify them that the rate
   overload abatement is in effect.

   The reporting node MUST use the OC-Maximum-Rate AVP defined in this
   specification to communicate a target maximum Diameter request rate
   to each of its reacting nodes.

7.3.  Reacting Node Behavior

7.3.1.  Default Algorithm

   In determining whether or not to transmit a specific message, the
   reacting node can use any algorithm that limits the message rate to
   the OC-Maximum-Rate AVP value in units of messages per second.  For
   ease of discussion, we define T = 1/[OC-Maximum-Rate] as the target
   inter-Diameter request interval.  The algorithm may be strictly
   deterministic, or it may be probabilistic.
   It may or may not have a tolerance
   factor to allow for short bursts, as long as the long-term rate
   remains below 1/T.

   The algorithm may have provisions for prioritizing traffic.

   If the algorithm requires other parameters (in addition to "T", which
   is 1/OC-Maximum-Rate), they may be set autonomously by the reacting
   node, or they may be negotiated independently between reacting node
   and reporting node.

   In either case, the coordination is out of scope for this document.
   The default algorithms presented here (one with and one without
   provisions for prioritizing traffic) are only examples.

   To apply abatement treatment to new Diameter requests at the rate
   specified in the OC-Maximum-Rate AVP value sent by the reporting node
   to its reacting nodes, the reacting node MAY use the proposed default
   algorithm for rate-based control or any other equivalent algorithm
   that forwards messages in conformance with the upper bound of 1/T
   messages per second.

   The default Leaky Bucket algorithm presented here is based on [ITU-T
   Rec. I.371] Appendix A.2.  The algorithm makes it possible for
   reacting nodes to deliver Diameter requests at a rate specified in
   the OC-Maximum-Rate value with tolerance parameter TAU (preferably
   configurable).

   Conceptually, the Leaky Bucket algorithm can be viewed as a finite
   capacity bucket whose real-valued content drains out at a continuous
   rate of 1 unit of content per time unit and whose content increases
   by the increment T for each forwarded Diameter request.  T is
   computed as the inverse of the rate specified in the OC-Maximum-Rate
   AVP value, namely T = 1 / OC-Maximum-Rate.

   Note that when the OC-Maximum-Rate value is 0 with a non-zero OC-
   Validity-Duration, then the reacting node should apply abatement
   treatment to 100% of Diameter requests destined to the overloaded
   reporting node.
   However, when the OC-Validity-Duration value is 0,
   the reacting node should stop applying abatement treatment.

   If, at a new Diameter request arrival, the content of the bucket is
   less than or equal to the limit value TAU, then the Diameter request
   is forwarded to the reporting node; otherwise, the abatement
   treatment is applied to the Diameter request.

   Note that the capacity of the bucket (the upper bound of the counter)
   is (T + TAU).

   The tolerance parameter TAU determines how close the long-term
   admitted rate is to an ideal control that would admit all Diameter
   requests for arrival rates less than 1/T and then admit Diameter
   requests precisely at the rate of 1/T for arrival rates above 1/T.
   In particular, at mean arrival rates close to 1/T, it determines the
   tolerance to deviation of the inter-arrival time from T (the larger
   TAU, the more tolerance to deviations from the inter-departure
   interval T).

   This deviation from the inter-departure interval influences the
   admitted rate burstiness, or the number of consecutive Diameter
   requests forwarded to the reporting node (burst size proportional to
   TAU over the difference between 1/T and the arrival rate).

   In situations where reacting nodes are configured with some knowledge
   about the reporting node (e.g., operator pre-provisioning), it can be
   beneficial to choose a value of TAU based on how many reacting nodes
   will be sending requests to the reporting node.

   Reporting nodes with a very large number of reacting nodes, each with
   a relatively small arrival rate, will generally benefit from a
   smaller value for TAU in order to limit queuing (and hence response
   times) at the reporting node when subjected to a sudden surge of
   traffic from all reacting nodes.
   Conversely, a reporting node with a
   relatively small number of reacting nodes, each with proportionally
   larger arrival rate, will benefit from a larger value of TAU.

   Once the control has been activated, at the arrival time of the k-th
   new Diameter request, ta(k), the content of the bucket is
   provisionally updated to the value

      X' = X - (ta(k) - LCT)

   where X is the value of the leaky bucket counter after arrival of the
   last forwarded Diameter request, and LCT is the time at which the
   last Diameter request was forwarded.

   If X' is less than or equal to the limit value TAU, then the new
   Diameter request is forwarded and the leaky bucket counter X is set
   to X' (or to 0 if X' is negative) plus the increment T, and LCT is
   set to the current time ta(k).  If X' is greater than the limit value
   TAU, then the abatement treatment is applied to the new Diameter
   request and the values of X and LCT are unchanged.

   When the first response from the reporting node has been received
   indicating control activation (OC-Validity-Duration>0), LCT is set to
   the time of activation, and the leaky bucket counter is initialized
   to the parameter TAU0 (preferably configurable), which is 0 or larger
   but less than or equal to TAU.

   TAU can assume any positive real number value and is not necessarily
   bounded by T.

   TAU=4*T is a reasonable compromise between burst size and abatement
   rate adaptation at low offered rate.

   Note that specification of a value for TAU, and any communication or
   coordination between servers, is beyond the scope of this document.

   A reference algorithm is shown below.

   No priority case:

      // T: inter-transmission interval, set to 1 / OC-Maximum-Rate
      // TAU: tolerance parameter
      // ta: arrival time of the most recent arrival
      // LCT: arrival time of last Diameter request that was sent to
      //      the reporting node (initialized to the first arrival time)
      // X: current value of the leaky bucket counter (initialized to
      //    TAU0)

      // After most recent arrival, calculate auxiliary variable Xp
      Xp = X - (ta - LCT);

      if (Xp <= TAU) {
        // Transmit Diameter request
        // Update X and LCT
        X = max (0, Xp) + T;
        LCT = ta;
      } else {
        // Apply abatement treatment to Diameter request
        // Do not update X and LCT
      }

7.3.2.  Priority Treatment

   The reacting node is responsible for applying message priority and
   for maintaining two categories of requests: request candidates for
   reduction, and requests not subject to reduction (except under
   extenuating circumstances when there aren't any messages in the first
   category that can be reduced).

   Accordingly, the proposed Leaky Bucket implementation is modified to
   support priority using two thresholds for Diameter requests in the
   set of request candidates for reduction.  With two priorities, the
   proposed Leaky Bucket requires two thresholds TAU1 < TAU2:

   o  All new requests would be admitted when the leaky bucket counter
      is at or below TAU1,

   o  Only higher priority requests would be admitted when the leaky
      bucket counter is between TAU1 and TAU2,

   o  All requests would be rejected when the bucket counter is above
      TAU2.

   This can be generalized to n priorities using n thresholds for n>2 in
   the obvious way.

   With a priority scheme that relies on two tolerance parameters (TAU2
   influences the priority traffic, TAU1 influences the non-priority
   traffic), always set TAU1 <= TAU2 (TAU is replaced by TAU1 and TAU2).
   Setting both tolerance parameters to the same value is equivalent to
   having no priority.
   TAU1 influences the admitted rate the same way as TAU does when no
   priority is set.  The larger the difference between TAU1 and TAU2,
   the closer the control is to strict priority queuing.

   TAU1 and TAU2 can assume any positive real number value and are not
   necessarily bounded by T.

   Reasonable values for TAU0, TAU1 & TAU2 are:

   o  TAU0 = 0,

   o  TAU1 = 1/2 * TAU2, and

   o  TAU2 = 10 * T.

   Note that specification of a value for TAU1 and TAU2, and any
   communication or coordination between servers, is beyond the scope of
   this document.

   A reference algorithm is shown below.

   Priority case:

      // T: inter-transmission interval, set to 1 / OC-Maximum-Rate
      // TAU1: tolerance parameter of no priority Diameter requests
      // TAU2: tolerance parameter of priority Diameter requests
      // ta: arrival time of the most recent arrival
      // LCT: arrival time of last Diameter request that was sent to
      //      the server (initialized to the first arrival time)
      // X: current value of the leaky bucket counter (initialized to
      //    TAU0)

      // After most recent arrival, calculate auxiliary variable Xp
      Xp = X - (ta - LCT);

      if ((AnyRequestReceived && Xp <= TAU1) ||
          (PriorityRequestReceived && Xp <= TAU2 && Xp > TAU1)) {
        // Transmit Diameter request
        // Update X and LCT
        X = max (0, Xp) + T;
        LCT = ta;
      } else {
        // Apply abatement treatment to Diameter request
        // Do not update X and LCT
      }

7.3.3.  Optional Enhancement: Avoidance of Resonance

   As the number of reacting node sources of traffic increases and the
   throughput of the reporting node decreases, the maximum rate admitted
   by each reacting node needs to decrease, and therefore the value of T
   becomes larger.  Under some circumstances, e.g.
   if the traffic arises very quickly simultaneously at many sources,
   the occupancies of each bucket can become synchronized, resulting in
   the admissions from each source being close in time and batched or
   very 'peaky' arrivals at the reporting node, which not only gives
   rise to control instability, but also very poor delays and even lost
   messages.  An appropriate term for this is 'resonance' [Erramilli].

   If the network topology is such that resonance can occur, then a
   simple way to avoid resonance is to randomize the bucket occupancy at
   two appropriate points -- at the activation of control and whenever
   the bucket empties -- as described below.

   After updating the value of the leaky bucket to X', generate a value
   u as follows:

      if X' > 0, then u=0

      else if X' <= 0, then let u be set to a random value uniformly
      distributed between -1/2 and +1/2

   Then (only) if the arrival is admitted, increase the bucket by an
   amount T + uT, which will therefore be just T if the bucket hadn't
   emptied, or lie between T/2 and 3T/2 if it had.

   This randomization should also be done when control is activated,
   i.e. instead of simply initializing the leaky bucket counter to TAU0,
   initialize it to TAU0 + uT, where u is uniformly distributed as
   above.  Since activation would have been a result of response to a
   request sent by the reacting node, the second term in this expression
   can be interpreted as being the bucket increment following that
   admission.

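   As a non-normative illustration, the randomization steps described
   above can be sketched in Python as follows.  This is a minimal
   sketch under stated assumptions, not a reference implementation: the
   class name LeakyBucket, the method name admit, and the constructor
   parameters (max_rate, tau, tau0, now, rng) are hypothetical names
   chosen for this example, and priority handling is omitted.

```python
import random


class LeakyBucket:
    """Illustrative leaky bucket with randomized increments (hypothetical names)."""

    def __init__(self, max_rate, tau, tau0=0.0, now=0.0, rng=None):
        self.T = 1.0 / max_rate      # inter-transmission interval T
        self.tau = tau               # tolerance parameter TAU
        self.rng = rng if rng is not None else random.Random()
        # At control activation, initialize the counter to TAU0 + u*T
        # instead of simply TAU0.
        self.X = tau0 + self._draw_u() * self.T
        self.LCT = now               # set to the time of activation

    def _draw_u(self):
        # u is uniformly distributed between -1/2 and +1/2.
        return self.rng.uniform(-0.5, 0.5)

    def admit(self, ta):
        """Return True if the request arriving at time ta is forwarded."""
        xp = self.X - (ta - self.LCT)        # provisional value X'
        if xp <= self.tau:
            # Randomize only when the bucket had emptied (X' <= 0).
            u = 0.0 if xp > 0 else self._draw_u()
            self.X = max(0.0, xp) + self.T + u * self.T
            self.LCT = ta
            return True
        # Abatement treatment applies; X and LCT are left unchanged.
        return False
```

   For instance, with TAU = 0 and OC-Maximum-Rate = 10 (T = 0.1), a
   request arriving after the bucket has emptied is admitted and leaves
   the counter between T/2 and 3T/2, so each reacting node waits a
   randomized gap before its next admission.
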
   This method has the following characteristics:

   o  If TAU0 is chosen to be equal to TAU and all sources activate
      control at the same time due to an extremely high request rate,
      then the time until the first request admitted by each reacting
      node would be uniformly distributed over [0,T];

   o  The maximum occupancy is TAU + (3/2)T, rather than TAU + T without
      randomization;

   o  For the special case of 'classic gapping' where TAU=0, the
      minimum time between admissions is uniformly distributed over
      [T/2, 3T/2], and the mean time between admissions is the same,
      i.e. T+1/R where R is the request arrival rate.

   o  At high load randomization rarely occurs, so there is no loss of
      precision of the admitted rate, even though the randomized
      'phasing' of the buckets remains.

8.  IANA Considerations

8.1.  AVP codes

   New AVPs defined by this specification are listed in Section 6.  All
   AVP codes are allocated from the 'Authentication, Authorization, and
   Accounting (AAA) Parameters' AVP Codes registry.

8.2.  New registries

   There are no new IANA registries introduced by this document.

9.  Security Considerations

   The rate overload abatement mechanism is an extension to the base
   Diameter overload mechanism.  As such, all of the security
   considerations outlined in [RFC7683] apply to the rate overload
   abatement mechanism.

10.  Acknowledgements

11.  References

11.1.  Normative References

   [I-D.ietf-dime-agent-overload]
              Donovan, S., "Diameter Agent Overload",
              draft-ietf-dime-agent-overload-00 (work in progress),
              December 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [RFC6733]  Fajardo, V., Ed., Arkko, J., Loughney, J., and G. Zorn,
              Ed., "Diameter Base Protocol", RFC 6733,
              DOI 10.17487/RFC6733, October 2012.

   [RFC7683]  Korhonen, J., Ed., Donovan, S., Ed., Campbell, B., and L.
              Morand, "Diameter Overload Indication Conveyance",
              RFC 7683, DOI 10.17487/RFC7683, October 2015.

11.2.  Informative References

   [Erramilli]
              Erramilli, A. and L. Forys, "Traffic Synchronization
              Effects In Teletraffic Systems", 1991.

   [RFC7415]  Noel, E. and P. Williams, "Session Initiation Protocol
              (SIP) Rate Control", RFC 7415, DOI 10.17487/RFC7415,
              February 2015.

Authors' Addresses

   Steve Donovan (editor)
   Oracle
   17210 Campbell Road
   Dallas, Texas 75254
   United States

   Email: srdonovan@usdonovans.com


   Eric Noel
   AT&T Labs
   200s Laurel Avenue
   Middletown, NJ 07747
   United States

   Email: ecnoel@research.att.com