Network Working Group                                            A. Lior
Internet-Draft                                       Bridgewater Systems
Expires: April 16, 2004                                 October 17, 2003


                      RADIUS UDP transport mapping
               draft-lior-radius-udp-transport-mapping-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 16, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   The Remote Authentication Dial In User Service (RADIUS) RFCs specify
   that RADIUS should use a retransmission strategy to recover from
   packet loss, but they provide no further guidance. This has resulted
   in implementations that lack support for congestion control. This
   Internet-Draft discusses and recommends strategies for measuring,
   calculating and applying a retransmission timer for RADIUS.









Lior                     Expires April 16, 2004                 [Page 1]

Internet-Draft        RADIUS UDP transport mapping          October 2003


Table of Contents

   1.  Introduction
   1.1 Requirements Language
   1.2 Terminology
   2.  Overview
   2.1 Current Practice
   3.  Recommendations
   3.1 Calculating Retransmit Timeout
   3.2 How to Measure
   3.3 Where Should We Retransmit
   4.  Other Matters
   4.1 Is the Server Up
   4.2 Open question: Use of Jitter
   5.  Security Consideration
       Normative References
       Informative References
       Author's Address
       Intellectual Property and Copyright Statements


1. Introduction

   The Remote Authentication Dial In User Service (RADIUS) RFCs specify
   that RADIUS clients should retransmit a request when no response
   arrives within some period of time. However, the RFCs give
   implementors no specifics. According to RFC2865 [2]: "Retry and
   fall-back algorithms are the topic of current research and are not
   specified in detail in this document."

   This document discusses and makes recommendations on issues
   specifically associated with the retransmission of RADIUS packets.
   Fail-over strategies are an important contributor to making a RADIUS
   deployment more robust and more reactive, but they will be discussed
   in another document.

1.1 Requirements Language

   In this document, several words are used to signify the requirements
   of the specification. These words are often capitalized. The keywords
   "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
   "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document
   are to be interpreted as described in RFC2119 [1].

1.2 Terminology


2. Overview

   For reasons explained in RFC2865 [2], the RADIUS protocol uses the
   unreliable UDP transport. To recover from lost RADIUS packets, a
   RADIUS client must retransmit any packet for which it has not
   received a response.

   Note that we use the term RADIUS client for the NAS. A RADIUS server
   that forwards packets also acts as a client and, according to
   RFC2865 [2], will also retransmit packets (later on we discuss
   whether or not such a RADIUS server should retransmit).

   Failure to receive a reply in RADIUS within a specified amount of
   time can be attributed to several conditions:

   o  The packet or the response packet was actually dropped by the
      network.

   o  The packet was delayed by the network due to congestion.

   o  No response was received because the server is no longer alive.

   o  No response was received because the server is congested. The
      reply is delayed or the message was purged from the receive queue
      in the server.

   o  No response was received because the packet was silently discarded
      due to errors.

   The problem is that a RADIUS client has no direct means to
   distinguish between these conditions.

   Therefore, a RADIUS client would typically behave as follows:

   o  Transmit a packet and start a retransmit timer.

   o  If the retransmit timer expires, retry sending the packet to the
      same server.

   o  After a number of retries, retransmit to another server, if one
      is available (fail-over).
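   The behavior listed above can be sketched as follows. This is a
   minimal illustration in Python and is not taken from any
   implementation: the send hook, the name request_with_failover, and
   the default timeout and retry count are assumptions made for the
   example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical transport hook: sends `packet` to `server`, waits up to
# `timeout` seconds, and returns the reply or None if the timer expired.
SendFn = Callable[[str, bytes, float], Optional[bytes]]


def request_with_failover(servers: Sequence[str], packet: bytes,
                          send: SendFn, timeout: float = 3.0,
                          retries: int = 3) -> Optional[bytes]:
    """Static-timer client: retry each server, then fail over."""
    for server in servers:
        # One initial transmission plus `retries` retransmissions.
        for _attempt in range(1 + retries):
            reply = send(server, packet, timeout)
            if reply is not None:
                return reply
        # Every attempt to this server timed out: try the next one.
    return None
```

   Note that the timeout here is static; nothing in this loop adapts
   to the conditions that caused the reply to be missing.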

   Retransmission is only effective when the network or the server has
   dropped the request or response packet. Retransmitting into an
   already congested network just adds to the congestion. A RADIUS
   client should only retransmit when it is reasonably assured that the
   original packet has left the network. Failing to do so adds to the
   congestion and, worse, may actually cause the "congestion collapse"
   of the network. Later in the document we describe a mechanism that
   a RADIUS client should use to calculate a dynamic value for its
   retransmission timer. This value takes into account the network
   round trip time (RTT).

   Retransmitting to a server that is no longer alive is useless and
   generates unnecessary traffic. After several failed attempts at
   transmitting and retransmitting to a failed server, a RADIUS client
   should fail-over to another server, if one is available.

   Similarly, retransmitting to an already congested server is not
   helpful because it adds to the congestion of the server.

   Lastly, retransmitting a packet that has been silently discarded is
   also wasteful, since the packet will probably be discarded again.

   While the new AAA protocol, Diameter RFC3588 [7], addresses these
   issues, it is still possible to strengthen the reliability of RADIUS.
   The challenge is to do so while maintaining backwards compatibility.
   Therefore, in this document we present strategies that make RADIUS
   transport more robust while remaining backwards compatible. We do not
   define new commands, introduce new attributes, or add a heartbeat
   mechanism.

2.1 Current Practice

   Current practice is that RADIUS clients, in this case NASes, use a
   configurable static retransmit timeout. The NAS will time out and
   retransmit several times (typically 3) and then fail-over to another
   RADIUS server. The timeout at the NAS is typically selected to be the
   longest timeout that would be acceptable to the end user. For
   example, in the dial-up world, we know that users will only wait a
   few seconds before hanging up the call and trying again.

   The RADIUS servers that service the NAS (the first hop) will have a
   retransmit timeout set. Some RADIUS servers can set timeout periods
   on a per-target-proxy basis. The retransmit timeouts will be set
   such that they fall within the setting at the NAS. This is achievable
   because the RADIUS servers and the NAS are managed within the same
   administrative domain.

   Administering retransmit timers across other administrative domains
   is very difficult because it requires knowledge of what is happening
   in the upstream administrative domains. Furthermore, an upstream
   RADIUS server may be receiving traffic from various downstream
   networks. For these reasons, we recommend that, in general,
   intermediaries should not be configured to retransmit.

   RADIUS servers that act as a proxy for accounting, and that proxy
   accounting records using store and forward, would use a retransmit
   strategy.

   Because of the types of issues discussed here, it is apparent that
   setting the retransmit behavior of RADIUS servers in a network can be
   tedious. It is more of an art form than a science.


3. Recommendations

3.1 Calculating Retransmit Timeout

   As discussed above, in the absence of guidance, one of the problems
   with RADIUS today is that some implementations use a configurable yet
   static retransmission timer. This overly aggressive, non-adaptive
   approach may worsen congestion and in some circumstances may cause
   the "congestion collapse" of the network first described in [5] and
   then in [6].

   To address this problem, it is recommended that RADIUS adopt the
   method for calculating its retransmission timer described in
   RFC2988 [4].

   RFC2988 [4] describes an algorithm for calculating the retransmission
   timeout (RTO) based on the observed round trip time (RTT), and its
   variance, of recently sent packets. The RFC also describes how to
   make the RTT measurements and how to apply the calculated RTO in
   determining the retransmission time. The reader is asked to refer to
   the RFC for details.

   In the next few sections we discuss how to apply RFC2988 [4] to
   RADIUS. Specifically, we look at how a client measures the RTT from
   which RTO is computed, and at which RADIUS clients retransmit
   packets.
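   As a rough illustration of the RFC2988 [4] computation referred to
   in this section, the sketch below keeps a smoothed RTT and an RTT
   variance and derives RTO from them. The gains, the multiplier, the
   1 second floor, the 3 second initial value and the 60 second ceiling
   follow RFC2988; the class name, the method names and the 10 msec
   clock granularity are assumptions made for the example.

```python
class RtoEstimator:
    """Retransmission timeout per the RFC2988 equations (sketch)."""

    ALPHA = 1 / 8       # gain applied to SRTT
    BETA = 1 / 4        # gain applied to RTTVAR
    K = 4               # variance multiplier
    MIN_RTO = 1.0       # seconds, lower bound on RTO
    MAX_RTO = 60.0      # seconds, optional upper bound (assumed here)
    INITIAL_RTO = 3.0   # seconds, used until the first RTT sample

    def __init__(self, granularity: float = 0.01) -> None:
        self.g = granularity  # assumed clock granularity G, in seconds
        self.srtt = None
        self.rttvar = None
        self.rto = self.INITIAL_RTO

    def on_sample(self, rtt: float) -> float:
        """Fold in an RTT measured on a request that was sent once.

        Samples from retransmitted requests are ambiguous and should
        not be passed in (Karn's algorithm).
        """
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(self.MAX_RTO, max(self.MIN_RTO, rto))
        return self.rto

    def on_timeout(self) -> float:
        """Double the timer after a retransmission timeout."""
        self.rto = min(self.MAX_RTO, self.rto * 2)
        return self.rto
```

   With these constants, a first sample of 2.0 seconds on a fresh
   estimator gives SRTT = 2.0, RTTVAR = 1.0 and RTO = 6.0 seconds.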

3.2 How to Measure

   RFC2988 [4] describes how to measure RTT and compute RTO, but the
   measurement is specific to TCP, and certain issues need to be
   considered when applying the algorithm to RADIUS.

   A RADIUS client sends RADIUS packets to a RADIUS server. In certain
   conditions the RADIUS server acts as an intermediary server and
   forwards the packet to another RADIUS server. The client has no
   visibility as to what the server will do with the packet. Will the
   packet terminate there or will it be forwarded on? If it is forwarded
   on, how is it forwarded on? Is it routed based on the realm or on
   some other attribute?

   To get an accurate value for RTO we need to consider routing. RTO is
   based on the RTT of the entire route of the packet and its response.
   To illustrate the problem, consider the following scenario. A client
   sends its packets to a RADIUS server that acts as a proxy serving two
   realms, realm-A and realm-B. Realm-A is very responsive and realm-B
   is not that responsive. That is, RTT-A is smaller than RTT-B. As
   well, less traffic is sent to realm-B than to realm-A. If the client
   bases its calculation of RTO on the RTT observed solely on the next
   hop (and not on the realm), then the calculated RTO will largely be
   influenced by the RTT of the faster realm-A. The net result would
   then be that the client will time out prematurely when waiting for
   responses to requests forwarded to the slower realm-B.
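   One way to avoid mixing the two realms is to keep a separate
   estimator per (next hop, realm) pair instead of one per next hop.
   The sketch below assumes that the client can tell which realm a
   request belongs to; how it does so is outside this example. The
   names Rto, record_rtt and rto_for, and the server name "proxy1",
   are hypothetical.

```python
from collections import defaultdict


class Rto:
    """Compact RFC2988-style estimator: gains 1/8, 1/4; K = 4."""

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.value = 3.0  # initial RTO in seconds

    def sample(self, rtt):
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        # RTO is never allowed below 1 second.
        self.value = max(1.0, self.srtt + 4 * self.rttvar)


# One estimator per (next hop, realm), created on first use.
estimators = defaultdict(Rto)


def record_rtt(next_hop, realm, rtt):
    estimators[(next_hop, realm)].sample(rtt)


def rto_for(next_hop, realm):
    return estimators[(next_hop, realm)].value


# The fast realm no longer drags down the timer of the slow realm.
record_rtt("proxy1", "realm-A", 0.1)
record_rtt("proxy1", "realm-B", 2.0)
```

   After these two samples, rto_for("proxy1", "realm-A") is 1.0 second
   (the floor) and rto_for("proxy1", "realm-B") is 6.0 seconds.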

   NASes typically exhibit the above behavior in that they send packets
   to a primary RADIUS server and fail-over to a secondary server. They
   may even use a round-robin strategy between several RADIUS servers.
   Regardless, NASes typically are not concerned with routing.

3.3 Where Should We Retransmit

   The RADIUS specifications allow a RADIUS server that proxies or
   forwards packets to retransmit. If we are going to allow a RADIUS
   server in a proxy chain to retransmit, then we want to make sure that
   it only retransmits when it is reasonably assured that the original
   packet has left the network. This means that:

   o  We use RTO as described above;

   o  We must make sure that at each hop we accommodate the retransmit
      behavior of the other nodes that perform retransmission in the
      proxy chain.

   The latter point requires manual configuration and an understanding
   of what is going on in the network. However, if we know that every
   node in the network uses the same algorithm for computing RTO, then
   all we need to know is how many times each node will retransmit,
   whether or not it fails over, and whether or not it uses binary
   back-off. We also need to make sure that nodes measure RTO based on
   how the packets are routed in the network.

   For example, suppose we have three nodes A, B and C, with B being a
   proxy for A and C being a proxy for B. Furthermore, suppose that
   each node uses binary back-off and will only retry once. Then:

   o  Node C will require a maximum of 3 RTO periods.

   o  Node B should initially wait 3 RTOs before retransmitting the
      first time, plus 2 RTOs for the second retransmission, requiring
      a total of 5 RTO periods.

   o  Node A should initially wait 5 RTOs before retrying, plus 2 RTOs
      for the second retry.

   Note that if the maximum time to respond to the user is the source of
   the bound, then the allocation needs to be done in the reverse order,
   starting at the NAS. For example, suppose we want to make sure that
   we get a response back to the user within a maximum of 15 seconds.
   Allowing the NAS (node-A) to fail-over to a secondary server budgets
   7.5 seconds for retransmitting to the primary server. Allowing for
   two attempts with binary back-off requires three RTO periods (one
   for the first attempt and two for the second), which limits each RTO
   period to 2.5 seconds. Similarly, the next hop, node-B, would
   require 3 RTO periods within node-A's 2.5 second RTO period: one for
   the initial attempt and two for the second. Therefore, for node-B,
   the RTO limit would be 833 msec. This is clearly too low (too
   aggressive): according to RFC2988 [4], the minimum RTO should be
   1 second.
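   The arithmetic of this example can be reproduced with the short
   sketch below. It takes the figures from the text as given (a 15
   second user budget, an even split between primary and secondary
   server, and three RTO periods per node); the helper name rto_limit
   and the variable names are assumptions made for the example.

```python
MIN_RTO = 1.0  # seconds, minimum RTO per RFC2988


def rto_limit(budget: float, rto_periods: int) -> float:
    """Largest RTO that fits `rto_periods` periods into `budget`."""
    return budget / rto_periods


user_budget = 15.0                   # longest wait acceptable to user
per_server = user_budget / 2         # primary and secondary share it
nas_rto = rto_limit(per_server, 3)   # node-A: 1 + 2 RTO periods
proxy_rto = rto_limit(nas_rto, 3)    # node-B fits in one node-A RTO

print(f"node-A RTO limit: {nas_rto:.3f} s")
print(f"node-B RTO limit: {proxy_rto:.3f} s")
print(f"node-B below minimum RTO: {proxy_rto < MIN_RTO}")
```

   The node-B limit comes out at roughly 0.833 seconds, which is below
   the 1 second minimum.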

   The point of the above example is to illustrate the complexity
   introduced by having intermediaries retransmit. Given this
   complexity, we question the need for retransmission by intermediary
   nodes in general. We recommend that, in general, only the RADIUS
   clients that originate the packets retransmit, with the following
   exceptions:

   o  Proxy nodes that store and forward accounting packets should
      retransmit.

   o  Since NASes use statically configured retransmission timers and
      typically do not consider routing (as discussed above), it may
      make sense for the RADIUS server serving the NAS to retransmit.
      However, it must complete its retransmit procedure within the
      timeout period configured in the NAS.

   o  Where a particularly bad link exists, it may make sense to
      retransmit over that link.

   Note: while it may not make sense to retransmit at intermediate
   nodes, fail-over to alternate nodes at intermediary nodes does make
   sense. Therefore, intermediary nodes should keep track of failures,
   should measure RTO, and should use this information in their
   fail-over strategy.


4. Other Matters

4.1 Is the Server Up

   As pointed out above, a RADIUS client has difficulty determining why
   it has not received a reply to a message. Is it the network
   downstream or is it the server? Under certain circumstances the
   client can determine whether the node is operating or not, and even
   whether it is slow or not. For example, suppose the server serves
   more than one realm and the client keeps track of the RTT on a
   per-realm basis. If one realm is responsive and the other is not,
   this indicates conclusively to the client that the server is up. If
   neither realm is responsive, this indicates that the server is
   probably down.

   A robust implementation should take advantage of this type of
   information.
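   A minimal sketch of this inference follows. It assumes the client
   already maintains, for each realm reached through a given server, a
   flag saying whether that realm has been responsive recently; how
   that flag is derived is left open. The function name and the
   returned strings are hypothetical.

```python
from typing import Mapping


def infer_server_state(realm_responsive: Mapping[str, bool]) -> str:
    """Classify a next-hop server from per-realm responsiveness."""
    if not realm_responsive:
        return "unknown"        # no per-realm observations yet
    if any(realm_responsive.values()):
        return "up"             # some realm answers via this server
    return "probably down"      # no realm behind this server answers
```

   For instance, {"realm-A": True, "realm-B": False} yields "up",
   while two unresponsive realms yield "probably down".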

4.2 Open question: Use of Jitter

   Rather than an exponential back-off, the introduction of transmission
   jitter might be a more effective strategy.

   Should it be combined with RTO measurements?

   Jitter is used to spread out requests. It may be useful to use jitter
   when a client detects that it has an extraordinary number of messages
   to send.

   Note, however, that typically there are only a few seconds available
   to respond to the user. Users who do not get a timely response will
   simply retry logging in, generating even more work for the server.


5. Security Consideration

   TBD.


Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Rigney, C., Willens, S., Rubens, A. and W. Simpson, "Remote
        Authentication Dial In User Service (RADIUS)", RFC 2865, June
        2000.

   [3]  Rigney, C., "RADIUS Accounting", RFC 2866, June 2000.

   [4]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
        Timer", RFC 2988, November 2000.


Informative References

   [5]  Nagle, J., "Congestion control in IP/TCP internetworks", RFC
        896, January 1984.

   [6]  Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914,
        September 2000.

   [7]  Calhoun, P., Loughney, J., Guttman, E., Zorn, G. and J. Arkko,
        "Diameter Base Protocol", RFC 3588, September 2003.


Author's Address

   Avi Lior
   Bridgewater Systems Corporation
   303 Terry Fox Drive
   Suite 100
   Ottawa, Ontario  K2K 3J1
   Canada

   Phone: (613) 591-6655
   EMail: avi@bridgewatersystems.com
   URI:   http://www.bridgewatersystems.com/


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.