| < draft-ietf-dime-overload-reqs-02.txt | draft-ietf-dime-overload-reqs-03.txt > | |||
|---|---|---|---|---|
| Network Working Group E. McMurry | Network Working Group E. McMurry | |||
| Internet-Draft B. Campbell | Internet-Draft B. Campbell | |||
| Intended status: Standards Track Tekelec | Intended status: Standards Track Tekelec | |||
| Expires: June 20, 2013 December 17, 2012 | Expires: July 19, 2013 January 15, 2013 | |||
| Diameter Overload Control Requirements | Diameter Overload Control Requirements | |||
| draft-ietf-dime-overload-reqs-02 | draft-ietf-dime-overload-reqs-03 | |||
| Abstract | Abstract | |||
| When a Diameter server or agent becomes overloaded, it needs to be | When a Diameter server or agent becomes overloaded, it needs to be | |||
| able to gracefully reduce its load, typically by informing clients to | able to gracefully reduce its load, typically by informing clients to | |||
| reduce sending traffic for some period of time. Otherwise, it must | reduce sending traffic for some period of time. Otherwise, it must | |||
| continue to expend resources parsing and responding to Diameter | continue to expend resources parsing and responding to Diameter | |||
| messages, possibly resulting in congestion collapse. The existing | messages, possibly resulting in congestion collapse. The existing | |||
| mechanisms provided by Diameter are not sufficient for this purpose. | Diameter mechanisms, listed in Section 3, are not sufficient for this | |||
| This document describes the limitations of the existing mechanisms, | purpose. This document describes the limitations of the existing | |||
| and provides requirements for new overload management mechanisms. | mechanisms in Section 4. Requirements for new overload management | |||
| mechanisms are provided in Section 7. | ||||
| Status of this Memo | Status of this Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on June 20, 2013. | This Internet-Draft will expire on July 19, 2013. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| skipping to change at page 2, line 20 | skipping to change at page 2, line 21 | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Causes of Overload . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Causes of Overload . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.2. Effects of Overload . . . . . . . . . . . . . . . . . . . 5 | 1.2. Effects of Overload . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.3. Overload vs. Network Congestion . . . . . . . . . . . . . 5 | 1.3. Overload vs. Network Congestion . . . . . . . . . . . . . 5 | |||
| 1.4. Diameter Applications in a Broader Network . . . . . . . . 5 | 1.4. Diameter Applications in a Broader Network . . . . . . . . 5 | |||
| 1.5. Documentation Conventions . . . . . . . . . . . . . . . . 6 | 1.5. Documentation Conventions . . . . . . . . . . . . . . . . 6 | |||
| 2. Overload Scenarios . . . . . . . . . . . . . . . . . . . . . . 6 | 2. Overload Scenarios . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2.1. Peer to Peer Scenarios . . . . . . . . . . . . . . . . . . 7 | 2.1. Peer to Peer Scenarios . . . . . . . . . . . . . . . . . . 7 | |||
| 2.2. Agent Scenarios . . . . . . . . . . . . . . . . . . . . . 9 | 2.2. Agent Scenarios . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12 | 2.3. Interconnect Scenario . . . . . . . . . . . . . . . . . . 12 | |||
| 3. Extensibility . . . . . . . . . . . . . . . . . . . . . . . . 13 | 3. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 4. Existing Mechanisms . . . . . . . . . . . . . . . . . . . . . 14 | 4. Issues with the Current Mechanisms . . . . . . . . . . . . . . 14 | |||
| 5. Issues with the Current Mechanisms . . . . . . . . . . . . . . 14 | 4.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 15 | |||
| 5.1. Problems with Implicit Mechanism . . . . . . . . . . . . . 15 | 4.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 15 | |||
| 5.2. Problems with Explicit Mechanisms . . . . . . . . . . . . 15 | 5. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 16 | |||
| 6. Diameter Overload Case Studies . . . . . . . . . . . . . . . . 16 | 5.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 16 | |||
| 6.1. Overload in Mobile Data Networks . . . . . . . . . . . . . 16 | 5.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 17 | |||
| 6.2. 3GPP Study on Core Network Overload . . . . . . . . . . . 17 | 6. Extensibility and Application Independence . . . . . . . . . . 18 | |||
| 7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 18 | 7. Solution Requirements . . . . . . . . . . . . . . . . . . . . 19 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 23 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 24 | |||
| 9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24 | 9.1. Access Control . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24 | 9.2. Denial-of-Service Attacks . . . . . . . . . . . . . . . . 24 | |||
| 9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 24 | 9.3. Replay Attacks . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25 | 9.4. Man-in-the-Middle Attacks . . . . . . . . . . . . . . . . 25 | |||
| 9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25 | 9.5. Compromised Hosts . . . . . . . . . . . . . . . . . . . . 25 | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 10.1. Normative References . . . . . . . . . . . . . . . . . . . 25 | 10.1. Normative References . . . . . . . . . . . . . . . . . . . 26 | |||
| 10.2. Informative References . . . . . . . . . . . . . . . . . . 26 | 10.2. Informative References . . . . . . . . . . . . . . . . . . 26 | |||
| Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 26 | Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 27 | |||
| Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 26 | Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 27 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 1. Introduction | 1. Introduction | |||
| When a Diameter [RFC6733] server or agent becomes overloaded, it | When a Diameter [RFC6733] server or agent becomes overloaded, it | |||
| needs to be able to gracefully reduce its load, typically by | needs to be able to gracefully reduce its load, typically by | |||
| informing clients to reduce sending traffic for some period of time. | informing clients to reduce sending traffic for some period of time. | |||
| Otherwise, it must continue to expend resources parsing and | Otherwise, it must continue to expend resources parsing and | |||
| responding to Diameter messages, possibly resulting in congestion | responding to Diameter messages, possibly resulting in congestion | |||
| collapse. The existing mechanisms provided by Diameter are not | collapse. The existing mechanisms provided by Diameter are not | |||
| sufficient for this purpose. This document describes the limitations | sufficient for this purpose. This document describes the limitations | |||
| of the existing mechanisms, and provides requirements for new | of the existing mechanisms, and provides requirements for new | |||
| overload management mechanisms. | overload management mechanisms. | |||
| This document draws on [RFC5390] and the work done on SIP overload | This document draws on the work done on SIP overload control | |||
| control as well as on overload practices in SS7 networks and studies | ([RFC5390], [RFC6357]) as well as on experience gained via overload | |||
| done by 3GPP. | handling in Signaling System No. 7 (SS7) networks and studies done by | |||
| the Third Generation Partnership Project (3GPP) (Section 5). | ||||
| Diameter is not typically an end-user protocol; rather it is | Diameter is not typically an end-user protocol; rather it is | |||
| generally used as one component in support of some end-user activity. | generally used as one component in support of some end-user activity. | |||
| For example, a WiFi access point might use Diameter to authenticate | ||||
| and authorize user access via 802.11. Overload in a network that | For example, a SIP server might use Diameter to authenticate and | |||
| uses Diameter applications will likely spill over into the end-user | authorize user access. Overload in the Diameter backend | |||
| application network. The impact of Diameter overload on the client | infrastructure will likely impact the experience observed by the end | |||
| application (a client application may use the Diameter protocol and | user in the SIP application. | |||
| other protocols to do its job) is beyond the scope of this document. | ||||
| The impact of Diameter overload on the client application (a client | ||||
| application may use the Diameter protocol and other protocols to do | ||||
| its job) is beyond the scope of this document. | ||||
| This document presents non-normative descriptions of causes of | This document presents non-normative descriptions of causes of | |||
| overload along with related scenarios and studies. Finally, it | overload along with related scenarios and studies. Finally, it | |||
| offers a set of normative requirements for an improved overload | offers a set of normative requirements for an improved overload | |||
| indication mechanism. | indication mechanism. | |||
| 1.1. Causes of Overload | 1.1. Causes of Overload | |||
| Overload occurs when an element, such as a Diameter server or agent, | Overload occurs when an element, such as a Diameter server or agent, | |||
| has insufficient resources to successfully process all of the traffic | has insufficient resources to successfully process all of the traffic | |||
| skipping to change at page 5, line 19 | skipping to change at page 5, line 26 | |||
| transaction volumes. If a Diameter node becomes overloaded, or even | transaction volumes. If a Diameter node becomes overloaded, or even | |||
| worse, fails completely, a large number of messages may be lost very | worse, fails completely, a large number of messages may be lost very | |||
| quickly. Even with redundant servers, many messages can be lost in | quickly. Even with redundant servers, many messages can be lost in | |||
| the time it takes for failover to complete. While a Diameter client | the time it takes for failover to complete. While a Diameter client | |||
| or agent should be able to retry such requests, an overloaded peer | or agent should be able to retry such requests, an overloaded peer | |||
| may cause a sudden large increase in the number of | may cause a sudden large increase in the number of | |||
| transactions needing to be retried, rapidly filling local queues or | transactions needing to be retried, rapidly filling local queues or | |||
| otherwise contributing to local overload. Therefore Diameter devices | otherwise contributing to local overload. Therefore Diameter devices | |||
| need to be able to shed load before critical failures can occur. | need to be able to shed load before critical failures can occur. | |||
| Diameter depends heavily on the "Authentication, Authorization, | ||||
| and Accounting (AAA) Transport Profile" [RFC3539], which states | ||||
| assumptions about the scale of AAA services which may be incorrect | ||||
| for current uses of Diameter. In particular, the document | ||||
| suggests that AAA services will typically be low volume and that | ||||
| traffic will typically be application-driven. Section 2.1 of that | ||||
| document uses an example of a 48 port NAS. However, Diameter is | ||||
| commonly used in large-scale mobile data environments, where a | ||||
| typical client could be a packet gateway that serves millions of | ||||
| users, and generates Diameter messages at network-driven rates. | ||||
| 1.3. Overload vs. Network Congestion | 1.3. Overload vs. Network Congestion | |||
| This document uses the term "overload" to refer to application-layer | This document uses the term "overload" to refer to application-layer | |||
| overload at Diameter nodes. This is distinct from "network | overload at Diameter nodes. This is distinct from "network | |||
| congestion", that is, congestion that occurs at the lower networking | congestion", that is, congestion that occurs at the lower networking | |||
| layers that may impact the delivery of Diameter messages between | layers that may impact the delivery of Diameter messages between | |||
| nodes. The authors recognize that element overload and network | nodes. The authors recognize that element overload and network | |||
| congestion are interrelated, and that overload can contribute to | congestion are interrelated, and that overload can contribute to | |||
| network congestion and vice versa. | network congestion and vice versa. | |||
| skipping to change at page 13, line 37 | skipping to change at page 13, line 37 | |||
| shared between components within a network operator's network. | shared between components within a network operator's network. | |||
| Network operators may not want to convey topology or operational | Network operators may not want to convey topology or operational | |||
| information, which limits how much overload and loading information | information, which limits how much overload and loading information | |||
| can be sent. For the interconnect scenario shown, Server 2 may want | can be sent. For the interconnect scenario shown, Server 2 may want | |||
| to signal overload to Server 1, to affect traffic coming from Network | to signal overload to Server 1, to affect traffic coming from Network | |||
| Operator 1. | Operator 1. | |||
| This case is distinct from those internal to a network operator's | This case is distinct from those internal to a network operator's | |||
| network, where there may be many more elements in a more complicated | network, where there may be many more elements in a more complicated | |||
| topology. Also, the elements in the interconnect network may not | topology. Also, the elements in the interconnect network may not | |||
| support diameter overload control, and the network operators may not | support Diameter overload control, and the network operators may not | |||
| want the interconnect network to use overload or loading information. | want the interconnect network to use overload or loading information. | |||
| They may only want the information to pass through the interconnect | They may only want the information to pass through the interconnect | |||
| network without further processing or action by the interconnect | network without further processing or action by the interconnect | |||
| network even if the elements in the interconnect network do support | network even if the elements in the interconnect network do support | |||
| diameter overload control. | Diameter overload control. | |||
| 3. Extensibility | ||||
| Given the variety of scenarios diameter elements can be deployed in, | ||||
| and the variety of roles they can fulfill with diameter and other | ||||
| technologies, a single algorithm for handling overload may not be | ||||
| sufficient. This effort cannot anticipate all possible future | ||||
| scenarios and roles. Extensibility, particularly of algorithms used | ||||
| to deal with overload, will be important to cover these cases. | ||||
| 4. Existing Mechanisms | 3. Existing Mechanisms | |||
| Diameter offers both implicit and explicit mechanisms for a Diameter | Diameter offers both implicit and explicit mechanisms for a Diameter | |||
| node to learn that a peer is overloaded or unreachable. The implicit | node to learn that a peer is overloaded or unreachable. The implicit | |||
| mechanism is simply the lack of responses to requests. If a client | mechanism is simply the lack of responses to requests. If a client | |||
| fails to receive a response in a certain time period, it assumes the | fails to receive a response in a certain time period, it assumes the | |||
| upstream peer is unavailable, or overloaded to the point of effective | upstream peer is unavailable, or overloaded to the point of effective | |||
| unavailability. The watchdog mechanism [RFC3539] ensures that a | unavailability. The watchdog mechanism [RFC3539] ensures that a | |||
| certain rate of transaction responses occur even when there is | certain rate of transaction responses occur even when there is | |||
| otherwise little or no other Diameter traffic. | otherwise little or no other Diameter traffic. | |||
| skipping to change at page 14, line 49 | skipping to change at page 14, line 40 | |||
| issues with transport (e.g. congestion propagation and window | issues with transport (e.g. congestion propagation and window | |||
| management) are managed at that level. But even with a congestion- | management) are managed at that level. But even with a congestion- | |||
| managed transport, a Diameter node can become overloaded at the | managed transport, a Diameter node can become overloaded at the | |||
| Diameter protocol or application layers due to the causes described | Diameter protocol or application layers due to the causes described | |||
| in Section 1.1 and congestion managed transports do not provide | in Section 1.1 and congestion managed transports do not provide | |||
| facilities (and are at the wrong level) to handle server overload. | facilities (and are at the wrong level) to handle server overload. | |||
| Transport level congestion management is also not sufficient to | Transport level congestion management is also not sufficient to | |||
| address overload in cases of multi-hop and multi-destination | address overload in cases of multi-hop and multi-destination | |||
| signaling. | signaling. | |||
| 5. Issues with the Current Mechanisms | 4. Issues with the Current Mechanisms | |||
| The currently available Diameter mechanisms for indicating an | The currently available Diameter mechanisms for indicating an | |||
| overload condition are not adequate to avoid service outages due to | overload condition are not adequate to avoid service outages due to | |||
| overload. This inadequacy may, in turn, contribute to broader | overload. This inadequacy may, in turn, contribute to broader | |||
| congestion collapse due to unresponsive Diameter nodes causing | congestion collapse due to unresponsive Diameter nodes causing | |||
| application or transport layer retransmissions. In particular, they | application or transport layer retransmissions. In particular, they | |||
| do not allow a Diameter agent or server to shed load as it approaches | do not allow a Diameter agent or server to shed load as it approaches | |||
| overload. At best, a node can only indicate that it needs to | overload. At best, a node can only indicate that it needs to | |||
| entirely stop receiving requests, i.e. that it has effectively | entirely stop receiving requests, i.e. that it has effectively | |||
| failed. Even that is problematic due to the inability to indicate | failed. Even that is problematic due to the inability to indicate | |||
| durational validity on the transient errors available in the base | durational validity on the transient errors available in the base | |||
| Diameter protocol. Diameter offers no mechanism to allow a node to | Diameter protocol. Diameter offers no mechanism to allow a node to | |||
| indicate different overload states for different categories of | indicate different overload states for different categories of | |||
| messages, for example, if it is overloaded for one Diameter | messages, for example, if it is overloaded for one Diameter | |||
| application but not another. | application but not another. | |||
| 5.1. Problems with Implicit Mechanism | 4.1. Problems with Implicit Mechanism | |||
| The implicit mechanism doesn't allow an agent or server to inform the | The implicit mechanism doesn't allow an agent or server to inform the | |||
| client of a problem until it is effectively too late to do anything | client of a problem until it is effectively too late to do anything | |||
| about it. The client does not know to take action until the upstream | about it. The client does not know to take action until the upstream | |||
| node has effectively failed. A Diameter node has no opportunity to | node has effectively failed. A Diameter node has no opportunity to | |||
| shed load early to avoid collapse in the first place. | shed load early to avoid collapse in the first place. | |||
| Additionally, the implicit mechanism cannot distinguish between | Additionally, the implicit mechanism cannot distinguish between | |||
| overload of a Diameter node and network congestion. Diameter treats | overload of a Diameter node and network congestion. Diameter treats | |||
| the failure to receive an answer as a transport failure. | the failure to receive an answer as a transport failure. | |||
| 5.2. Problems with Explicit Mechanisms | 4.2. Problems with Explicit Mechanisms | |||
| The Diameter specification is ambiguous on how a client should handle | The Diameter specification is ambiguous on how a client should handle | |||
| receipt of a DIAMETER_TOO_BUSY response. The base specification | receipt of a DIAMETER_TOO_BUSY response. The base specification | |||
| [RFC6733] indicates that the sending client should attempt to send | [RFC6733] indicates that the sending client should attempt to send | |||
| the request to a different peer. It makes no suggestion that a the | the request to a different peer. It makes no suggestion that the | |||
| receipt of a DIAMETER_TOO_BUSY response should affect future Diameter | receipt of a DIAMETER_TOO_BUSY response should affect future Diameter | |||
| messages in any way. | messages in any way. | |||
| The Authentication, Authorization, and Accounting (AAA) Transport | The Authentication, Authorization, and Accounting (AAA) Transport | |||
| Profile [RFC3539] recommends that a AAA node that receives a "Busy" | Profile [RFC3539] recommends that a AAA node that receives a "Busy" | |||
| response failover all remaining requests to a different agent or | response failover all remaining requests to a different agent or | |||
| server. But while the Diameter base specification explicitly depends | server. But while the Diameter base specification explicitly depends | |||
| on RFC3539 to define transport behavior, it does not refer to RFC3539 | on RFC3539 to define transport behavior, it does not refer to RFC3539 | |||
| in the description of behavior on receipt of DIAMETER_TOO_BUSY. | in the description of behavior on receipt of DIAMETER_TOO_BUSY. | |||
| There's a strong likelihood that at least some implementations will | There's a strong likelihood that at least some implementations will | |||
| skipping to change at page 16, line 40 | skipping to change at page 16, line 32 | |||
| DIAMETER_UNABLE_TO_DELIVER, or using DPR with cause code BUSY also | DIAMETER_UNABLE_TO_DELIVER, or using DPR with cause code BUSY also | |||
| have no mechanisms for specifying the scope or cause of the failure, | have no mechanisms for specifying the scope or cause of the failure, | |||
| or the durational validity. | or the durational validity. | |||
| The issues with error responses in [RFC6733] extend beyond the | The issues with error responses in [RFC6733] extend beyond the | |||
| particular issues for overload control and have been addressed in an | particular issues for overload control and have been addressed in an | |||
| ad hoc fashion by various implementations. Addressing these in a | ad hoc fashion by various implementations. Addressing these in a | |||
| standard way would be a useful exercise, but it is beyond the scope | standard way would be a useful exercise, but it is beyond the scope | |||
| of this document. | of this document. | |||
| 6. Diameter Overload Case Studies | 5. Diameter Overload Case Studies | |||
| 6.1. Overload in Mobile Data Networks | 5.1. Overload in Mobile Data Networks | |||
| As the number of Third Generation (3G) and Long Term Evolution (LTE) | As the number of Third Generation (3G) and Long Term Evolution (LTE) | |||
| enabled smartphone devices continues to expand in mobility networks, | enabled smartphone devices continues to expand in mobility networks, | |||
| there have been situations where high signaling traffic load led to | there have been situations where high signaling traffic load led to | |||
| overload events at the Diameter-based Home Location Registries (HLR) | overload events at the Diameter-based Home Location Registries (HLR) | |||
| and/or Home Subscriber Servers (HSS) [TR23.843]. The root causes of | and/or Home Subscriber Servers (HSS) [TR23.843]. The root causes of | |||
| the HLR congestion events were manifold but included hardware failure | the HLR congestion events were manifold but included hardware failure | |||
| and procedural errors. The result was high signaling traffic load on | and procedural errors. The result was high signaling traffic load on | |||
| the HLR and HSS. | the HLR and HSS. | |||
| The 3GPP architecture [TS23.002] makes extensive use of Diameter. It | The 3GPP architecture [TS23.002] makes extensive use of Diameter. It | |||
| is used for mobility management [TS29.272] (and others), IMS | is used for mobility management [TS29.272] (and others), (IP | |||
| [TS29.228] (and others), policy and charging control [TS29.212] (and | Multimedia Subsystem) IMS [TS29.228] (and others), policy and | |||
| others) as well as other functions. The details of the architecture | charging control [TS29.212] (and others) as well as other functions. | |||
| are out of scope for this document, but it is worth noting that there | The details of the architecture are out of scope for this document, | |||
| are quite a few Diameter applications, some with quite large amounts | but it is worth noting that there are quite a few Diameter | |||
| of Diameter signaling in deployed networks. | applications, some with quite large amounts of Diameter signaling in | |||
| deployed networks. | ||||
| The 3GPP specifications do not currently address overload for | The 3GPP specifications do not currently address overload for | |||
| Diameter applications or provide an equivalent load control mechanism | Diameter applications or provide an equivalent load control mechanism | |||
| to those provided in the more traditional SS7 elements in GSM | to those provided in the more traditional SS7 elements in (Global | |||
| [TS29.002]. The capabilities specified in the 3GPP standards do not | System for Mobile Communications) GSM [TS29.002]. The capabilities | |||
| adequately address the abnormal condition where excessively high | specified in the 3GPP standards do not adequately address the | |||
| signaling traffic load situations are experienced. | abnormal condition where excessively high signaling traffic load | |||
| situations are experienced. | ||||
| Smartphones contribute much more heavily, relative to non- | Smartphones, an increasingly large percentage of mobile devices, | |||
| smartphones, to the continuation of a registration surge due to their | contribute much more heavily, relative to non-smartphones, to the | |||
| very aggressive registration algorithms. The aggressive smartphone | continuation of a registration surge due to their very aggressive | |||
| logic is designed to: | registration algorithms. Smartphone behavior contributes to network | |||
| loading and can contribute to overload conditions. The aggressive | ||||
| smartphone logic is designed to: | ||||
| a. always have voice and data registration, and | a. always have voice and data registration, and | |||
| b. constantly try to be on 3G or LTE data (and thus on 3G voice or | b. constantly try to be on 3G or LTE data (and thus on 3G voice or | |||
| VoLTE) for their added benefits. | VoLTE) for their added benefits. | |||
| Non-smartphones typically have logic to wait for a time period after | Non-smartphones typically have logic to wait for a time period after | |||
| registering successfully on voice and data. | registering successfully on voice and data. | |||
| The smartphone aggressive registration is problematic in two ways: | The smartphone aggressive registration is problematic in two ways: | |||
| o first by generating excessive signaling load towards the HLR that | o first by generating excessive signaling load towards the HLR that | |||
| is ten times that from a non-smartphone, | is ten times that from a non-smartphone, | |||
| o and second by causing continual registration attempts when a | o and second by causing continual registration attempts when a | |||
| network failure affects registrations through the 3G data network. | network failure affects registrations through the 3G data network. | |||
| 6.2. 3GPP Study on Core Network Overload | 5.2. 3GPP Study on Core Network Overload | |||
| A study in 3GPP SA2 on core network overload has produced the | A study in 3GPP SA2 on core network overload has produced the | |||
| technical report [TR23.843]. This enumerates several causes of | technical report [TR23.843]. This enumerates several causes of | |||
| overload in mobile core networks including portions that are signaled | overload in mobile core networks including portions that are signaled | |||
| using Diameter. This document is a work in progress and is not | using Diameter. This document is a work in progress and is not | |||
| complete. However, it is useful for pointing out scenarios and the | complete. However, it is useful for pointing out scenarios and the | |||
| general need for an overload control mechanism for Diameter. | general need for an overload control mechanism for Diameter. | |||
| It is common for mobile networks to employ more than one radio | It is common for mobile networks to employ more than one radio | |||
| technology and to do so in an overlay fashion with multiple | technology and to do so in an overlay fashion with multiple | |||
| technologies present in the same location (such as GSM, UMTS or CDMA | technologies present in the same location (such as 2nd or 3rd | |||
| along with LTE). This presents opportunities for traffic storms when | generation mobile technologies along with LTE). This presents | |||
| issues occur on one overlay and not another as all devices that had | opportunities for traffic storms when issues occur on one overlay and | |||
| been on the overlay with issues switch. This causes a large amount | not another as all devices that had been on the overlay with issues | |||
| of Diameter traffic as locations and policies are updated. | switch. This causes a large amount of Diameter traffic as locations | |||
| and policies are updated. | ||||
| Another scenario called out by this study is a flood of registration | Another scenario called out by this study is a flood of registration | |||
| and mobility management events caused by some element in the core | and mobility management events caused by some element in the core | |||
| network failing. This flood of traffic from end nodes falls under | network failing. This flood of traffic from end nodes falls under | |||
| the network initiated traffic flood category. There is likely to | the network initiated traffic flood category. There is likely to | |||
| also be traffic resulting directly from the component failure in this | also be traffic resulting directly from the component failure in this | |||
| case. A similar flood can occur when elements or components recover | case. A similar flood can occur when elements or components recover | |||
| as well. | as well. | |||
| Subscriber initiated traffic floods are also indicated in this study | Subscriber initiated traffic floods are also indicated in this study | |||
| as an overload mechanism where a large number of mobile devices | as an overload mechanism where a large number of mobile devices | |||
| attempt to access services at the same time, such as in response | attempt to access services at the same time, such as in response | |||
| to an entertainment event or a catastrophic event. | to an entertainment event or a catastrophic event. | |||
| While this 3GPP study is concerned with the broader effects of these | While this 3GPP study is concerned with the broader effects of these | |||
| scenarios on wireless networks and their elements, they have | scenarios on wireless networks and their elements, they have | |||
| implications specifically for Diameter signaling. One of the goals | implications specifically for Diameter signaling. One of the goals | |||
| of this document is to provide guidance for a core mechanism that can | of this document is to provide guidance for a core mechanism that can | |||
| be used to mitigate the scenarios called out by this study. | be used to mitigate the scenarios called out by this study. | |||
| 6. Extensibility and Application Independence | ||||
| Given the variety of scenarios Diameter elements can be deployed in, | ||||
| and the variety of roles they can fulfill with Diameter and other | ||||
| technologies, a single algorithm for handling overload may not be | ||||
| sufficient. This effort cannot anticipate all possible future | ||||
| scenarios and roles. Extensibility, particularly of algorithms used | ||||
| to deal with overload, will be important to cover these cases. | ||||
| Similarly, the scopes that overload information may apply to may | ||||
| include cases that have not yet been considered. Extensibility in | ||||
| this area will also be important. | ||||
| The basic mechanism is intended to be application-independent, that | ||||
| is, a Diameter node can use it across any existing and future | ||||
| Diameter applications and expect reasonable results. Certain | ||||
| Diameter applications might, however, benefit from application- | ||||
| specific behavior over and above the mechanism's defaults. For | ||||
| example, an application specification might specify relative | ||||
| priorities of messages or selection of a specific overload control | ||||
| algorithm. | ||||
| 7. Solution Requirements | 7. Solution Requirements | |||
| This section proposes requirements for an improved mechanism to | This section proposes requirements for an improved mechanism to | |||
| control Diameter overload, with the goals of improving the issues | control Diameter overload, with the goals of improving the issues | |||
| described in Section 5 and supporting the scenarios described in | described in Section 4 and supporting the scenarios described in | |||
| Section 2. | Section 2. | |||
| REQ 1: The overload control mechanism MUST provide a communication | REQ 1: The overload control mechanism MUST provide a communication | |||
| method for Diameter nodes to exchange load and overload | method for Diameter nodes to exchange load and overload | |||
| information. | information. | |||
| REQ 2: [Open Issue: The following requirement has generated list | REQ 2: The mechanism MUST allow Diameter nodes to support overload | |||
| discussion that is unresolved at the time of this writing. | control regardless of which Diameter applications they | |||
| The discussion concerns whether this requirement is needed | support. | |||
| at all, whether it should include the "MUST NOT require | ||||
| specification changes" language vs saying that it should not | ||||
| force changes large enough to require new application IDs, | ||||
| and whether we should include additional language to forbid | ||||
| assumptions about the behavior of specific implementations.] | ||||
| The overload control mechanism MUST be useable with any | ||||
| existing or future Diameter application. It MUST NOT | ||||
| require specification changes for existing Diameter | ||||
| applications. | ||||
| REQ 3: The overload control mechanism MUST limit the impact of | REQ 3: The overload control mechanism MUST limit the impact of | |||
| overload on the overall useful throughput of a Diameter | overload on the overall useful throughput of a Diameter | |||
| server, even when the incoming load on the network is far in | server, even when the incoming load on the network is far in | |||
| excess of its capacity. The overall useful throughput under | excess of its capacity. The overall useful throughput under | |||
| load is the ultimate measure of the value of an overload | load is the ultimate measure of the value of an overload | |||
| control mechanism. | control mechanism. | |||
| REQ 4: Diameter allows requests to be sent from either side of a | REQ 4: Diameter allows requests to be sent from either side of a | |||
| connection and either side of a connection may have need to | connection and either side of a connection may have need to | |||
| skipping to change at page 20, line 5 ¶ | skipping to change at page 20, line 17 ¶ | |||
| decisions using the most currently available information. | decisions using the most currently available information. | |||
| REQ 9: The mechanism MUST function across fully loaded as well as | REQ 9: The mechanism MUST function across fully loaded as well as | |||
| quiescent transport connections. This is partially derived | quiescent transport connections. This is partially derived | |||
| from the requirements for stability and hysteresis control | from the requirements for stability and hysteresis control | |||
| above. | above. | |||
| REQ 10: Consumers of overload state indications MUST be able to | REQ 10: Consumers of overload state indications MUST be able to | |||
| determine when the overload condition improves or ends. | determine when the overload condition improves or ends. | |||
| REQ 11: The overload control mechanism MUST be scalable. That is, | REQ 11: The overload control mechanism MUST be able to operate in | |||
| it MUST be able to operate in different sized networks. | networks of different sizes. | |||
| REQ 12: When a single network node fails, goes into overload, or | REQ 12: When a single network node fails, goes into overload, or | |||
| suffers from reduced processing capacity, the mechanism MUST | suffers from reduced processing capacity, the mechanism MUST | |||
| make it possible to limit the impact of this on other nodes | make it possible to limit the impact of this on other nodes | |||
| in the network. This helps to prevent a small-scale failure | in the network. This helps to prevent a small-scale failure | |||
| from becoming a widespread outage. | from becoming a widespread outage. | |||
| REQ 13: The mechanism MUST NOT introduce substantial additional work | REQ 13: The mechanism MUST NOT introduce substantial additional work | |||
| for a node in an overloaded state. For example, a requirement | for a node in an overloaded state. For example, a requirement | |||
| for an overloaded node to send overload information every | for an overloaded node to send overload information every | |||
| skipping to change at page 20, line 45 ¶ | skipping to change at page 21, line 12 ¶ | |||
| environment with a mix of nodes that do, and nodes that do | environment with a mix of nodes that do, and nodes that do | |||
| not, support the mechanism. | not, support the mechanism. | |||
| REQ 17: In a mixed environment with nodes that support the overload | REQ 17: In a mixed environment with nodes that support the overload | |||
| control mechanism and that do not, the mechanism MUST result | control mechanism and that do not, the mechanism MUST result | |||
| in at least as much useful throughput as would have resulted | in at least as much useful throughput as would have resulted | |||
| if the mechanism were not present. It SHOULD result in less | if the mechanism were not present. It SHOULD result in less | |||
| severe congestion in this environment. | severe congestion in this environment. | |||
| REQ 18: In a mixed environment of nodes that support the overload | REQ 18: In a mixed environment of nodes that support the overload | |||
| control mechanism and that do not, users and operators of | control mechanism and that do not, the mechanism MUST NOT | |||
| nodes that do not support the mechanism MUST NOT unfairly | preclude elements that support overload control from | |||
| benefit from the mechanism. | treating elements that do not support overload control in an | |||
| equitable fashion relative to those that do. Users and | ||||
| operators of nodes that do not support the mechanism MUST | ||||
| NOT unfairly benefit from the mechanism. The mechanism | ||||
| specification SHOULD provide guidance to implementors for | ||||
| dealing with elements not supporting overload control. | ||||
| REQ 19: It MUST be possible to use the mechanism between nodes in | REQ 19: It MUST be possible to use the mechanism between nodes in | |||
| different realms and in different administrative domains. | different realms and in different administrative domains. | |||
| REQ 20: Any explicit overload indication MUST distinguish between | REQ 20: Any explicit overload indication MUST distinguish between | |||
| actual overload, as opposed to other, non-overload related | actual overload, as opposed to other, non-overload related | |||
| failures. | failures. | |||
| REQ 21: In cases where a network node fails, is so overloaded that | REQ 21: In cases where a network node fails, is so overloaded that | |||
| it cannot process messages, or cannot communicate due to a | it cannot process messages, or cannot communicate due to a | |||
| skipping to change at page 26, line 10 ¶ | skipping to change at page 26, line 26 ¶ | |||
| RFC 2914, September 2000. | RFC 2914, September 2000. | |||
| [RFC3539] Aboba, B. and J. Wood, "Authentication, Authorization and | [RFC3539] Aboba, B. and J. Wood, "Authentication, Authorization and | |||
| Accounting (AAA) Transport Profile", RFC 3539, June 2003. | Accounting (AAA) Transport Profile", RFC 3539, June 2003. | |||
| 10.2. Informative References | 10.2. Informative References | |||
| [RFC5390] Rosenberg, J., "Requirements for Management of Overload in | [RFC5390] Rosenberg, J., "Requirements for Management of Overload in | |||
| the Session Initiation Protocol", RFC 5390, December 2008. | the Session Initiation Protocol", RFC 5390, December 2008. | |||
| [RFC6357] Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design | ||||
| Considerations for Session Initiation Protocol (SIP) | ||||
| Overload Control", RFC 6357, August 2011. | ||||
| [TR23.843] | [TR23.843] | |||
| 3GPP, "Study on Core Network Overload Solutions", | 3GPP, "Study on Core Network Overload Solutions", | |||
| TR 23.843 0.6.0, October 2012. | TR 23.843 0.6.0, October 2012. | |||
| [IR.34] GSMA, "Inter-Service Provider IP Backbone Guidelines", | [IR.34] GSMA, "Inter-Service Provider IP Backbone Guidelines", | |||
| IR 34 7.0, January 2012. | IR 34 7.0, January 2012. | |||
| [IR.88] GSMA, "LTE Roaming Guidelines", IR 88 7.0, January 2012. | [IR.88] GSMA, "LTE Roaming Guidelines", IR 88 7.0, January 2012. | |||
| [TS23.002] | [TS23.002] | |||
| End of changes. 32 change blocks. | ||||
| 99 lines changed or deleted | 111 lines changed or added | |||