Network Working Group                                           A. Lior
Internet-Draft                                      Bridgewater Systems
Expires: April 16, 2004                                October 17, 2003


                      RADIUS UDP transport mapping
               draft-lior-radius-udp-transport-mapping-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 16, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   The Remote Authentication Dial In User Service (RADIUS) Requests for
   Comments (RFCs) specify that RADIUS should use a retransmission
   strategy to recover from packet loss, but they provide no further
   guidance.  This has resulted in implementations that lack support
   for congestion control.  This Internet-Draft discusses and
   recommends strategies for measuring, calculating and applying a
   retransmission timer for RADIUS.

Table of Contents

   1.   Introduction
   1.1  Requirements Language
   1.2  Terminology
   2.   Overview
   2.1  Current Practice
   3.   Recommendations
   3.1  Calculating Retransmit Timeout
   3.2  How to Measure
   3.3  Where Should We Retransmit
   4.   Other Matters
   4.1  Is the Server Up
   4.2  Open question: Use of Jitter
   5.   Security Considerations
        Normative References
        Informative References
        Author's Address
        Intellectual Property and Copyright Statements

1.  Introduction

   The Remote Authentication Dial In User Service (RADIUS) Requests for
   Comments (RFCs) specify that RADIUS clients should retransmit their
   requests when they do not receive a response after some time.
   However, the RFCs give implementors no specifics.  According to
   RFC2865 [2]: "Retry and fall-back algorithms are the topic of
   current research and are not specified in detail in this document."

   This document discusses and makes recommendations on issues
   specifically associated with the retransmission of RADIUS packets.
   Fail-over strategies are an important contributor to making any
   RADIUS deployment more robust and more reactive, but they will be
   discussed in another document.

1.1  Requirements Language

   In this document, several words are used to signify the requirements
   of the specification.  These words are often capitalized.  The
   keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [1].

1.2  Terminology

2.  Overview

   For reasons explained in RFC2865 [2], the RADIUS protocol uses the
   unreliable UDP transport.  To recover from lost RADIUS packets, a
   RADIUS client must retransmit packets for which it has not received
   a response.  Note that we use the term RADIUS client for the NAS.
   A RADIUS server that forwards packets also acts as a client and,
   according to RFC2865 [2], will also retransmit packets (later on we
   discuss whether or not such a RADIUS server should retransmit).

   Failure to receive a reply in RADIUS within a specified amount of
   time can be attributed to several conditions:

   o  The packet or the response packet was actually dropped by the
      network.

   o  The packet was delayed by the network due to congestion.

   o  No response was received because the server is no longer alive.

   o  No response was received because the server is congested.  The
      reply is delayed or the message was purged from the receive queue
      in the server.

   o  No response was received because the packet was silently
      discarded due to errors.

   The problem is that a RADIUS client has no direct means to
   distinguish between these conditions.  Therefore, a RADIUS client
   would typically behave as follows:

   o  Transmit a packet and start a retransmit timer.

   o  If the retransmit timer expires, retry sending the packet to the
      same server.

   o  After a number of retries, retransmit using another server if
      available (fail-over).

   Retransmission is only effective when the network or the server has
   dropped the request or response packet.

   Retransmitting into an already congested network just adds to the
   congestion.  A sender should only retransmit when it is reasonably
   assured that the original packet has left the network.
   Failing to do so adds to the congestion and, worse, may actually
   cause the "congestion collapse" of the network.  Later in the
   document we describe a mechanism that should be used by a RADIUS
   client to calculate a dynamic value for its retransmission timer.
   This value takes into account the network response time (RTT).

   Retransmitting to a server that is no longer alive is useless; it
   only generates unnecessary traffic.  After several failed attempts
   at transmitting and retransmitting to a failed server, a RADIUS
   client should fail over to another server if available.

   Similarly, retransmitting to an already congested server is not
   helpful, because it adds to the congestion of the server.

   Lastly, retransmitting a packet that has been silently discarded is
   also wasteful, since the packet will probably be discarded again.

   While the new AAA protocol, Diameter RFC3588 [7], addresses these
   issues, it is still possible to strengthen the reliability of
   RADIUS.  The challenge is to do so while maintaining backwards
   compatibility.  Therefore, in this document we present strategies
   that make the RADIUS transport more robust while remaining
   backwards compatible.  We do not define new commands, introduce new
   attributes, or add a heartbeat mechanism.

2.1  Current Practice

   Current practice is that RADIUS clients, in this case NASes, use a
   configurable static retransmit timeout.  The NAS will time out and
   retransmit several times (typically 3) and then fail over to
   another RADIUS server.  The timeout at the NAS is typically
   selected to be the longest possible timeout that would be
   acceptable to the end user.  For example, in a dial-up world, we
   know that users will only wait a few seconds before hanging up the
   call and trying again.

   The RADIUS servers that service the NAS (the first hop) will have a
   retransmit timeout set.
   Some RADIUS servers can set timeout periods on a per-target-proxy
   basis.  The retransmit timeouts will be set such that they fall
   within the setting at the NAS.  This is achievable because the
   RADIUS servers and the NAS are managed within the same
   administrative domain.

   Administering retransmit timers in other administrative domains is
   very difficult because it requires knowledge of what is happening
   in administrative domains that are upstream.  Furthermore, an
   upstream RADIUS server may be getting traffic from various
   downstream networks.  For these reasons, we recommend that in
   general intermediaries should not be configured to retransmit.

   RADIUS servers that act as proxies for accounting, and that proxy
   accounting records using store and forward, would use a retransmit
   strategy.

   Because of the types of issues discussed here, it is apparent that
   setting the retransmit behavior of RADIUS servers in a network can
   be tedious.  It is more of an art form than a science.

3.  Recommendations

3.1  Calculating Retransmit Timeout

   As discussed above, in the absence of guidance, one of the problems
   with RADIUS today is that some implementations use a configurable
   yet static retransmission timer.  This overly aggressive,
   non-adaptive approach may worsen congestion and in some
   circumstances may cause the "congestion collapse" of the network,
   first described in [5] and then in [6].

   To address this problem it is recommended that RADIUS adopt the
   method for calculating its retransmission timer described in
   RFC2988 [4].

   RFC2988 [4] describes an algorithm for calculating the
   retransmission timeout (RTO) based on the observed round trip time
   (RTT), and its variance, of packets recently sent.
   The RFC also describes how to make the RTT measurements and how to
   apply the calculated RTO in determining the retransmission time.
   The reader is asked to refer to the RFC for details.

   In the next few sections we discuss how to apply RFC2988 [4] to
   RADIUS.  Specifically, we look at how a client measures the RTT
   from which RTO is computed, and at which RADIUS clients retransmit
   packets.

3.2  How to Measure

   RFC2988 [4] describes how to measure RTT and compute RTO, but this
   measurement is specific to TCP, and certain issues need to be
   considered when applying the algorithm to RADIUS.

   A RADIUS client sends RADIUS packets to a RADIUS server.  In
   certain conditions the RADIUS server acts as an intermediary and
   forwards the packet to another RADIUS server.  The client has no
   visibility into what the server will do with the packet.  Will the
   packet terminate there or will it be forwarded on?  If it is
   forwarded on, how is it routed?  Is it routed based on the realm or
   on some other attribute?

   To get an accurate value for RTO we need to consider routing.  RTO
   is based on the RTT of the entire route of the packet and its
   response.

   To illustrate the problem, consider the following scenario.  A
   client sends its packets to a RADIUS server that acts as a proxy
   serving two realms, realm-A and realm-B.  Realm-A is very
   responsive and realm-B is not; that is, RTT-A is smaller than
   RTT-B.  In addition, less traffic is sent to realm-B than to
   realm-A.  If the client bases its calculation of RTO on the RTT
   observed solely on the next hop (and not on the realm), then the
   calculated RTO will largely be influenced by the RTT of the faster
   realm-A.  The net result is that the client will time out
   prematurely when waiting for responses to requests forwarded to the
   slower realm-B.

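   One way to avoid the effect described above is to keep one RTT
   estimator per (next hop, realm) pair rather than one per next hop.
   The following is a minimal Python sketch of that idea, not a
   definitive implementation.  The smoothing constants, the initial
   RTO of 3 seconds and the 1 second minimum are those given in
   RFC2988 [4].  The class and method names, the 100 msec clock
   granularity and the use of the realm as the routing key are
   assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Constants taken from RFC 2988.
ALPHA = 1 / 8        # gain applied to SRTT
BETA = 1 / 4         # gain applied to RTTVAR
K = 4                # variance multiplier
INITIAL_RTO = 3.0    # seconds, used until the first RTT sample
MIN_RTO = 1.0        # seconds, lower bound on the computed RTO


@dataclass
class RtoEstimator:
    """RFC 2988 style estimator for a single destination."""
    granularity: float = 0.1          # clock granularity G (assumed value)
    srtt: Optional[float] = None      # smoothed RTT
    rttvar: Optional[float] = None    # RTT variation

    def sample(self, rtt: float) -> None:
        """Fold one RTT measurement (in seconds) into the estimate."""
        if self.srtt is None:
            # First measurement.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt

    @property
    def rto(self) -> float:
        if self.srtt is None:
            return INITIAL_RTO
        return max(MIN_RTO, self.srtt + max(self.granularity, K * self.rttvar))


class PerRealmRto:
    """Hypothetical helper: one estimator per (server, realm) pair."""

    def __init__(self) -> None:
        self._estimators: Dict[Tuple[str, str], RtoEstimator] = {}

    def _estimator(self, server: str, realm: str) -> RtoEstimator:
        return self._estimators.setdefault((server, realm), RtoEstimator())

    def record(self, server: str, realm: str, rtt: float,
               retransmitted: bool = False) -> None:
        # Karn's algorithm: skip samples from retransmitted requests,
        # since the response cannot be matched to one transmission.
        if not retransmitted:
            self._estimator(server, realm).sample(rtt)

    def rto(self, server: str, realm: str) -> float:
        return self._estimator(server, realm).rto
```

   With this structure, a burst of fast responses from realm-A leaves
   the RTO used for realm-B untouched.  The sketch omits the back-off
   of RTO on timer expiry and any upper bound on RTO, both of which
   RFC2988 [4] also describes.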
   NASes typically exhibit the above behavior in that they send
   packets to a primary RADIUS server and fail over to a secondary
   server.  They may even use a round-robin strategy between several
   RADIUS servers.  Regardless, NASes typically are not concerned with
   routing.

3.3  Where Should We Retransmit

   The RADIUS specifications allow RADIUS servers that proxy or
   forward to retransmit.

   If we are going to allow a RADIUS server in a proxy chain to
   retransmit then we want to make sure that we only retransmit when
   we are reasonably assured that the original packet has left the
   network.  This means that:

   o  We use RTO as described above;

   o  We must make sure that at each hop we accommodate the retransmit
      behavior of the other nodes that perform retransmission in the
      proxy chain.

   The latter point requires manual configuration and an understanding
   of what is going on in the network.  However, if we know that every
   node in the network uses the same algorithm for computing RTO, then
   all we need to know is how many times each node will retransmit,
   whether or not it fails over, and whether or not it uses binary
   back-off.  We also need to make sure that the nodes measure RTO
   based on how the packets are routed in the network.

   For example, suppose we have three nodes A, B and C, with B being a
   proxy for A and C being a proxy for B, and that each node uses
   binary back-off and will only retry once.  Then:

   o  Node C will require a maximum of 3 RTO periods.

   o  Node B should initially wait 3 RTOs before retransmitting the
      first time, plus 2 RTOs for the second retransmission, requiring
      a total of 5 RTO periods.

   o  Node A should initially wait 5 RTOs before retrying, plus 2 RTOs
      for the second retry.

   Note that if the maximum time to respond to the user is the source
   of the bound, then the allocation needs to be done in the reverse
   order, starting at the NAS.
   For example, suppose we want to make sure that we get a response
   back to the user within a maximum of 15 seconds.  Allowing the NAS
   (node-A) to fail over to a secondary server budgets 7.5 seconds for
   retransmitting at the primary server.  Allowing for two retransmits
   with a binary back-off requires three RTO periods, with a limit of
   2.5 seconds per RTO period.  Similarly, the next hop, node-B, would
   require 3 RTO periods: one for the initial retry and two for the
   second retry.  Therefore, for node-B the RTO limit would be 833
   msec (2.5 seconds divided by 3).  This is clearly too low (too
   aggressive); according to RFC2988 [4] the minimum RTO should be set
   to 1 second.

   The point of the above example is to illustrate the complexity
   introduced by having intermediaries retransmit.  Given the
   complexity, we question the need for retransmission by intermediary
   nodes in general.

   In general we recommend that only RADIUS clients that originate the
   packets retransmit, with the following exceptions:

   o  Proxy nodes that store and forward accounting packets should
      retransmit.

   o  Since NASes typically use statically configured retransmission
      timers and do not consider routing (as discussed above), it may
      make sense for the RADIUS server serving the NAS to retransmit.
      However, it must complete its retransmit procedure within the
      timeout period configured in the NAS.

   o  Where a particularly bad link exists it may make sense to
      retransmit over that link.

   Note: while it may not make sense to retransmit at intermediate
   nodes, fail-over to alternate nodes at intermediary nodes does make
   sense.  Therefore, intermediary nodes should keep track of failures
   and should measure RTO and use this information in their fail-over
   strategy.

4.  Other Matters

4.1  Is the Server Up

   As pointed out above, a RADIUS client has difficulty determining
   why it has not received a reply to a message.  Is it the network
   downstream or is it the server?
   Under certain circumstances the client can determine whether the
   node is operating or not and even whether it is slow or not.  For
   example, if the server is serving more than one realm and the
   client keeps track of the RTT on a per-realm basis, then one realm
   being responsive while the other is not indicates conclusively to
   the client that the server is up.  If neither realm is responsive,
   this indicates that the server is probably down.  A robust
   implementation should take advantage of this type of information.

4.2  Open question: Use of Jitter

   Rather than an exponential back-off, the introduction of
   transmission jitter might be a more effective strategy.  Should it
   be combined with RTO measurements?

   Jittering is used to spread out requests.  It may be useful to use
   jitter in the case where a client detects an extraordinary number
   of messages to send.  Note, however, that typically there are only
   a few seconds available to respond to the user.  Impatient users
   will simply retry logging in, generating even more work for the
   server.

5.  Security Considerations

   TBD.

Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Rigney, C., Willens, S., Rubens, A. and W. Simpson, "Remote
        Authentication Dial In User Service (RADIUS)", RFC 2865, June
        2000.

   [3]  Rigney, C., "RADIUS Accounting", RFC 2866, June 2000.

   [4]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
        Timer", RFC 2988, November 2000.

Informative References

   [5]  Nagle, J., "Congestion control in IP/TCP internetworks", RFC
        896, January 1984.

   [6]  Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914,
        September 2000.

   [7]  Calhoun, P., Loughney, J., Guttman, E., Zorn, G. and J. Arkko,
        "Diameter Base Protocol", RFC 3588, September 2003.

Author's Address

   Avi Lior
   Bridgewater Systems Corporation
   303 Terry Fox Drive
   Suite 100
   Ottawa, Ontario  K2K 3J1
   Canada

   Phone: (613) 591-6655
   EMail: avi@bridgewatersystems.com
   URI:   http://www.bridgewatersystems.com/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.