INTERNET DRAFT                                          Pat R. Calhoun
Category: Standards Track                       Sun Microsystems, Inc.
Title: draft-calhoun-diameter-reliable-00.txt          Allan C. Rubens
Date: November 1998                              Ascend Communications


               DIAMETER Reliable Transport Extensions


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

Many services that require DIAMETER need retransmission and timeout faster than TCP can provide. An example would be in a NAS environment where DIAMETER is used for the authentication and authorization of users. The amount of time that it takes for TCP to determine that a connection to a server is broken is longer than the disconnect timeout of the PPP clients on whose behalf the server is being contacted.

RADIUS has been able to handle this situation by operating over UDP. However, RADIUS fails to define a standard retransmission and timeout scheme, which has resulted in many different methods across implementations.

This DIAMETER specification defines the extensions necessary for the base protocol to operate over a non-reliable transport (e.g. UDP).

Calhoun                    expires May 1999                    [Page 1]
Table of Contents

   1.0  Introduction
      1.1  Definitions
   2.0  Protocol Overview
      2.1  Flow Control
      2.2  Suggested implementation
      2.3  Peer failure recovery
   3.0  Extended Header Format
   4.0  DIAMETER AVPs
      4.1  Receive-Window
   5.0  References
   6.0  Acknowledgements
   7.0  Author's Address
   Appendix A:  Acknowledgment Timeouts
      A.1  Calculating Adaptive Acknowledgment Timeout
      A.2  Flow Control: Adjusting for Timeout
   Appendix B:  Examples of sequence numbering
      B.1  Lock-step session establishment
      B.2  Multiple packets acknowledged
      B.3  Lost packet with retransmission

1.0 Introduction

The extensions defined in this specification are mandatory for all DIAMETER extensions operating over a non-reliable transport (e.g. UDP).

1.1 Definitions

In this document, several words are used to signify the requirements of the specification. These words are often capitalized.

MUST
   This word, or the adjective "required", means that the definition is an absolute requirement of the specification.

MUST NOT
   This phrase means that the definition is an absolute prohibition of the specification.

SHOULD
   This word, or the adjective "recommended", means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications must be understood and carefully weighed before choosing a different course.

MAY
   This word, or the adjective "optional", means that this item is one of an allowed set of alternatives. An implementation which does not include this option MUST be prepared to interoperate with another implementation which does include the option.

2.0 Protocol Overview

This section provides a detailed overview of how reliable transport can be optionally provided by DIAMETER. No negotiation mechanism for determining if this optional capability is required by either peer of a DIAMETER session is defined herein.
The mechanism for deciding this is beyond the scope of this document.

2.1 Flow Control

There are two different types of DIAMETER messages. A DIAMETER message that contains only the header and no Attribute-Value Pairs (AVPs) is known as a zero-length body (ZLB) message. ZLB messages are used for explicitly acknowledging packets to the peer. Non-ZLB DIAMETER messages are messages that contain AVPs and can be of any type defined in [10].

Two optional fields in the DIAMETER header are important to the operation of DIAMETER when it is not being run over TCP: Nr (Next Received) and Ns (Next Send). A single sequence number state is maintained for all DIAMETER messages to a given peer. The sequence number starts at 0. Each subsequent non-ZLB packet is sent with the next increment of the sequence number. The sequence number is thus a free-running counter represented modulo 65536. For purposes of detecting duplication, a received sequence value is considered less than or equal to the last received value if its value lies in the range of the last value and its 32767 predecessor values. For example, if the last received sequence number was 15, then received packets with Ns values in the range ( 32784, ... 65535, 0, ... 15 ) would be considered duplicates and would be silently discarded. A packet with sequence number 16 would be treated as the next in-sequence packet, and packets with other sequence numbers are out-of-order.

It is an implementation decision as to whether DIAMETER messages received out-of-order are queued for later processing or silently discarded. The former is recommended when possible.

In this document, the sequence number state for each peer is represented for clarity of discussion by distinct pairs of state variables, Sr and Ss. Sr represents the value of the next in-sequence message expected to be received for a given session by a peer.
Ss represents the sequence number to be placed in the Ns field of the next message sent to a given peer. Each state is initialized such that the first message sent and the first message expected to be received to/from each peer has an Ns value of 0. This corresponds to initializing Ss and Sr to 0 for each peer.

As messages are sent to a given peer, Nr is set in these messages to reflect one more than the Ns value of the highest (modulo 2^16) in-order message received from that peer; if sent before any packet is received, Nr will be 0, indicating that the peer expects the next new Ns value to be 0.

When a non-ZLB message is received with an Ns value that matches the peer's current Sr value, Sr is incremented by 1 (modulo 2^16). It is important to note that Sr is not modified if a message is received with a value of Ns greater than the current Sr value. Retransmission of lost packets will eventually provide the receiving peer with its next expected message.

Every time a peer sends a non-ZLB message it increments its Ss value for that peer by 1 (modulo 2^16). This increment takes place after the current Ss value is copied to Ns in the message to be sent. New outgoing messages normally include the current value of Sr for the corresponding peer in their Nr field.

A peer may not wish to send the latest Sr value back to its peer due to congestion (i.e., its receive buffer for the session is full). In this case it is permissible for the peer to send back an Nr value containing the Ns value of the first message in the window. It is preferable to return an acknowledgment with this old Nr value rather than to withhold acknowledgments entirely when the receive window is full.

Retransmitted messages should also include the current value of Sr in their Nr field, but some implementations may choose not to update Nr to avoid having to perform another hash in the Integrity-Check-Vector AVP. Note that the hash would only have to be recomputed if the Nr value had changed.
This restriction does not apply to end-to-end integrity since the Ns and Nr fields are mutable. When retransmitting a message the identifier in the protocol header MUST NOT be changed.

When transmitting packets, a DIAMETER peer must obey the receive window size offered by its peer. The default window size is 7. A DIAMETER peer MUST NOT send new packets when its peer's window is closed (the number of packets unacknowledged is equal to the advertised, or assumed, window size). Previously transmitted packets may be retransmitted while the peer's window is closed. A peer communicating via UDP can specify the window size it is providing to its peer by specifying this value in the Device-Reboot-Ind message.

A ZLB message is used to communicate Nr and Ns fields. The Nr and the Ns fields are filled in as above, but the sequence number state, Ss, is not modified. Thus a ZLB message sent after a non-ZLB message will contain the new Ss value, while a non-ZLB message sent after a ZLB message will contain the same value of Ns as the ZLB message did.

Upon receipt of an in-order non-ZLB message, the receiving peer must increment its Sr value and may acknowledge the message by sending back the updated value of Sr in the Nr field of the next outgoing message. This updated Sr value can be piggybacked in the Nr field of any outgoing messages that the peer may happen to send back.

If a peer does not have a message queued to transmit at the time a non-ZLB message is received, then it should delay a short time before sending a ZLB message containing the latest values of Sr and Ss, as described above. This short delay is to allow for the possible arrival of a message to be transmitted back to its peer, thus avoiding the need to issue a ZLB.
The suggested value for this time delay is 1/4 the receiving peer's value of Round-Trip-Time (RTT - see Appendix A), if it computes RTT, or a maximum of 1/2 of its fixed acknowledgment timeout interval otherwise. This timeout should provide a reasonable opportunity for the receiving peer to obtain a payload message destined for its peer, upon which the ACK of the received message can be piggybacked. Note that if a peer's window is full, it MAY advertise an older Nr value if it is not ready to accept new messages.

This delay value should be treated as a suggested maximum; an implementation could make this delay quite small without adversely affecting the protocol. The default time delay is 2 seconds. To provide for better throughput, the receiving peer should skip this delay entirely and send a ZLB message immediately in the case where its receive window is filled and it has no queued data to send for this connection or it can't send queued data because the transmit window is closed.

See Appendix B for some examples of how sequence numbers progress.

2.2 Suggested implementation

A suggested implementation of this delay is as follows: Upon receiving a non-ZLB message, the receiver starts a timer that will expire in the recommended time interval. A variable, Lr (Last Nr value sent), is used by the transmitter to store the last value sent in the Nr field of a transmitted payload message for this connection. Upon expiration of this timer, Sr is compared to Lr and, if they are not equal, a ZLB ACK is issued. If they are equal, then no ACK's are outstanding and no action needs to be taken.

This timer should not be reinitialized if a new message is received while it is active since such messages will be acknowledged when the timer expires. This ensures that periodic ACK's are issued with a maximum period equal to the recommended delay time interval.
This interval should be short enough to not cause false acknowledgement timeouts at the transmitter when payload messages are being sent in one direction only. Since such ACK's are being sent on what would otherwise be an idle data path, their effect on performance should be small, if not negligible.

In order for a DIAMETER implementation to be able to retransmit messages, it MUST queue transmitted messages until the messages are acknowledged. It must also maintain a retransmission timer that determines when to assume that either a sent message did not arrive at the peer or the acknowledgment sent by the peer was lost. See Appendix A for a recommended retransmit timer implementation.

There are two recommended methods for implementing the retransmission procedure. One method is for the sender to resend the entire window of unacknowledged messages when the retransmit timeout expires. This is the simplest method, but is inefficient when a receiver is not rotating the window due to congestion. The alternative method is to only resend the first message in the window (the first unacknowledged message) until an acknowledgment is received. This acknowledgment will indicate to the sender the next, if any, message in the current window that needs to be retransmitted. A particular implementation may use either or both methods if desired.

When a DIAMETER node has retransmitted a message to a given peer the maximum number of times (the recommended value is 3), it may send the request to an alternate DIAMETER server. This procedure may continue until either all of the servers have been tried, or the node selectively issues a failure to the requestor.

2.3 Peer failure recovery

A DIAMETER message with the Command-Code AVP set to Device-Reboot-Ind and the Ns and Nr values set to zero (0) indicates that the peer has rebooted. This message MUST be recognized and supported by a DIAMETER implementation.
When this event occurs, the Ss and Sr values must be reset and the retransmission queue MUST be cleared.

Since the protocol requires that all new messages include a random identifier in the protocol header, a Device-Reboot-Ind that is received with the same identifier as the last processed Device-Reboot-Ind is considered a retransmission and SHOULD NOT change the peer's state to inactive.

Messages other than the Device-Reboot-Ind MUST NOT be sent to the peer until both the acknowledgement for the transmitted Device-Reboot-Ind AND the peer's Device-Reboot-Ind have been received. When both of these have been received, the peer is considered to be in the active state.

3.0 Extended Header Format

The DIAMETER Base Protocol [4] assumes that the underlying transport is reliable (e.g. TCP). This section defines the optional fields in the DIAMETER header that allow DIAMETER to provide reliability. See [4] for a full description of the header fields not introduced in this document.

A summary of the DIAMETER data format is shown below. The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  RADIUS PCC   |Flags|A|W| Ver |         Packet Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Identifier                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Next Send (Ns)         |      Next Received (Nr)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  AVPs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-

PKT Flags

   The Packet Flags field is five bits, and is used in order to identify any options. This field MUST be initialized to zero. The following flags may be set:

   The 'W' bit (Window-Present) is set when the Next Send (Ns) and Next Received (Nr) fields are present in the header. This SHOULD be set unless the underlying layer provides reliability (e.g. TCP).
   The 'A' bit is set to indicate that the packet is an acknowledgement only and does not contain a Command-Code AVP following the header. Note that the Security AVPs MUST still be present within an acknowledgment message.

Next Send

   This field is present when the Window-Present bit is set in the header flags. The Next Send (Ns) is copied from the send sequence number state variable, Ss, at the time the message is transmitted. Ss is incremented after copying if the message is not a ZLB ACK.

Next Received

   This field is present when the Window-Present bit is set in the header flags. Nr is copied from the receive sequence number state variable, Sr, and indicates one more than the Ns value of the highest (modulo 2^16) in-sequence message received. See section 2.1 for more information.

4.0 DIAMETER AVPs

This section defines a mandatory AVP which MUST be supported by all DIAMETER implementations supporting this extension. The following AVP is defined in this document:

   Attribute Name       Attribute Code
   -----------------------------------
   Receive-Window            277

4.1 Receive-Window

Description

   This AVP is used by a peer to inform its peer of its local receive window size. The size indicated is the number of packets that it is willing to accept before the window is full. A sending peer MUST stop sending new DIAMETER messages when this many messages are outstanding (sent but not yet acknowledged). If a peer does not issue this attribute, a receive window size of 7 is assumed by its peer.

   This attribute is only valid in the Device-Reboot-Ind message.
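The sender-side rule in the description above (stop sending new messages once the number of outstanding messages equals the peer's advertised or assumed window) can be sketched as follows. This is a minimal illustration and not part of the specification: the class name SendWindow and its methods are hypothetical, and the modulo-2^16 comparison used to decide which messages an Nr value acknowledges is an assumption based on the sequence numbering rules in section 2.1.

```python
from collections import deque

MOD = 1 << 16          # Ns and Nr are 16-bit values
DEFAULT_WINDOW = 7     # assumed when the peer sends no Receive-Window AVP


class SendWindow:
    """Tracks messages sent to one peer but not yet acknowledged."""

    def __init__(self, peer_window=DEFAULT_WINDOW):
        self.peer_window = peer_window
        self.unacked = deque()  # (ns, message) pairs, oldest first

    def can_send_new(self):
        # The window is closed when outstanding == advertised window size.
        return len(self.unacked) < self.peer_window

    def record_sent(self, ns, message):
        if not self.can_send_new():
            raise RuntimeError("peer's receive window is closed")
        self.unacked.append((ns, message))

    def on_ack(self, nr):
        # Nr acknowledges every queued message whose Ns precedes it
        # (modulo 2^16); those messages leave the retransmission queue.
        while self.unacked and (nr - self.unacked[0][0] - 1) % MOD < MOD // 2:
            self.unacked.popleft()
```

Retransmissions would reuse entries already held in the queue, so they are not subject to the can_send_new() check; only new messages are.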
AVP Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           AVP Code                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          AVP Length           |     Reserved      |P|T|V|E|H|M|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Integer32                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

AVP Code

   277 Receive-Window

AVP Length

   The length of this attribute MUST be 12.

AVP Flags

   The 'M' bit MUST be set. The 'H' and 'E' bits MAY be set depending upon the security model used. The 'V', 'T' and 'P' bits MUST NOT be set.

Integer32

   This field contains the receive window size.

5.0 References

[1] Reynolds, Postel, "Assigned Numbers", RFC 1700, October 1994.

[2] Postel, "User Datagram Protocol", RFC 768, August 1980.

[3] Calhoun, Zorn, Pan, "DIAMETER Framework", Internet-Draft, draft-calhoun-diameter-framework-00.txt, May 1998.

[4] Calhoun, Rubens, "DIAMETER Base Protocol", Internet-Draft, draft-calhoun-diameter-05.txt, May 1998.

[5] K. Hamzeh, T. Kolar, M. Littlewood, G. Singh Pall, J. Taarud, A. J. Valencia, W. Verthein, W.M. Townsley, B. Palter, A. Rubens, "Layer Two Tunneling Protocol (L2TP)", Internet-Draft, May 1998.

6.0 Acknowledgements

The Authors would like to acknowledge the following people for their contribution in the development of the DIAMETER protocol:

Bernard Aboba, Jari Arkko, William Bulley, Daniel C. Fox, Lol Grant, Nancy Greene, Peter Heitman, Ryan Moats, Victor Muslin, Kenneth Peirce, Sumit Vakil, John R. Vollbrecht, Jeff Weisberg and Glen Zorn

The authors would also like to thank the authors of the L2TP spec since most of the windowing text in this draft was shamefully copied from that spec.

7.0 Author's Address

Questions about this memo can be directed to:

   Pat R.
   Calhoun
   Technology Development
   Sun Microsystems, Inc.
   15 Network Circle
   Menlo Park, California, 94025
   USA

   Phone:  1-650-786-7733
   Fax:    1-650-786-6445
   E-mail: pcalhoun@eng.sun.com

   Allan C. Rubens
   Ascend Communications
   1678 Broadway
   Ann Arbor, MI 48105-1812
   USA

   Phone:  1-734-761-6025
   E-Mail: acr@del.com

Appendix A: Acknowledgment Timeouts

DIAMETER uses sliding windows and timeouts to provide flow-control across the underlying medium and to perform efficient data buffering to keep two DIAMETER peers' receive window full without causing receive buffer overflow. DIAMETER requires that a timeout be used to recover from dropped packets.

When the timeout for a peer expires, the previously transmitted message with Ns value equal to the highest in-sequence value of Nr received from the peer is retransmitted. The receiving peer does not advance its value for the receive sequence number state, Sr, until it receives a message with Ns equal to its current value of Sr. This rule assures that all subsequent acknowledgements to this peer will contain an Nr value equal to the Ns value of the first missing message until a message with the missing Ns value is received.

The exact implementation of the acknowledgment timeout is vendor-specific. It is suggested that an adaptive timeout be implemented with backoff for flow control. The timeout mechanism proposed here has the following properties:

   Independent timeouts for each peer. A device will have to maintain and calculate timeouts for every active peer.

   An administrator-adjustable maximum timeout, MaxTimeOut, unique to each device.

   An adaptive timeout mechanism that compensates for changing throughput. To reduce packet processing overhead, vendors may choose not to recompute the adaptive timeout for every received acknowledgment. The result of this overhead reduction is that the timeout will not respond as quickly to rapid network changes.
   Timer backoff on timeout to reduce congestion. The backed-off timer value is limited by the configurable maximum timeout value. Timer backoff is done every time an acknowledgment timeout occurs.

In general, this mechanism has the desirable behavior of quickly backing off upon a timeout and of slowly decreasing the timeout value as packets are delivered without errors.

A.1 Calculating Adaptive Acknowledgment Timeout

We must decide how much time to allow for acknowledgments to return. If the timeout is set too high, we may wait an unnecessarily long time for dropped packets. If the timeout is too short, we may time out just before the acknowledgment arrives. The acknowledgment timeout should also be reasonable and responsive to changing network conditions.

The suggested adaptive algorithm detailed below is based on the TCP 1989 implementation and is explained in Richard Stevens' book TCP/IP Illustrated, Volume 1 (page 300). 'n' means this iteration of the calculation, and 'n-1' refers to values from the last calculation.

   DIFF[n] = SAMPLE[n] - RTT[n-1]
   DEV[n]  = DEV[n-1] + (beta * (|DIFF[n]| - DEV[n-1]))
   RTT[n]  = RTT[n-1] + (alpha * DIFF[n])
   ATO[n]  = MIN (RTT[n] + (chi * DEV[n]), MaxTimeOut)

DIFF represents the error between the last estimated round-trip time and the measured time. DIFF is calculated on each iteration.

DEV is the estimated mean deviation. This approximates the standard deviation. DEV is calculated on each iteration and stored for use in the next iteration. Initially, it is set to 0.

RTT is the estimated round-trip time of an average packet. RTT is calculated on each iteration and stored for use in the next iteration. Initially, it is set to PPD.

ATO is the adaptive timeout for the next transmitted packet. ATO is calculated on each iteration. Its value is limited, by the MIN function, to be a maximum of the configured MaxTimeOut value.
Alpha is the gain for the round trip estimate error and is typically 1/8 (0.125).

Beta is the gain for the deviation and is typically 1/4 (0.250).

Chi is the gain for the timeout and is typically set to 4.

To eliminate division operations for fractional gain elements, the entire set of equations can be scaled. With the suggested gain constants, they should be scaled by 8 to eliminate all division. To simplify calculations, all gain values are kept to powers of two so that shift operations can be used in place of multiplication or division. The above calculations are carried out each time an acknowledgment is received for a packet that was not retransmitted (no timeout occurred).

A.2 Flow Control: Adjusting for Timeout

This section describes how the calculation of ATO is modified in the case where a timeout does occur. When a timeout occurs, the timeout value should be adjusted rapidly upward. To compensate for shifting internetwork time delays, a strategy must be employed to increase the timeout when it expires. A simple formula called Karn's Algorithm is used in TCP implementations and may be used in implementing the backoff timers for the DIAMETER peers. Notice that in addition to increasing the timeout, we also shrink the size of the window as described in the next section.

Karn's timer backoff algorithm, as used in TCP, is:

   NewTIMEOUT = delta * TIMEOUT

Adapted to our timeout calculations, for an interval in which a timeout occurs, the new timeout interval ATO is calculated as:

   RTT[n] = delta * RTT[n-1]
   DEV[n] = DEV[n-1]
   ATO[n] = MIN (RTT[n] + (chi * DEV[n]), MaxTimeOut)

In this modified calculation of ATO, only the two values that contribute to ATO and that are stored for the next iteration are calculated. RTT is scaled by delta, and DEV is unmodified. DIFF is not carried forward and is not used in this scenario. A value of 2 for Delta, the timeout gain factor for RTT, is suggested.
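The calculations in A.1 and A.2 can be combined into a single per-peer timer object. The sketch below is illustrative only: the class name AckTimer and its method names are hypothetical, floating-point arithmetic is used instead of the scaled integer form mentioned in A.1, and the initial RTT is left to the caller rather than assumed.

```python
from dataclasses import dataclass

ALPHA = 1 / 8   # gain for the round-trip estimate error
BETA = 1 / 4    # gain for the deviation
CHI = 4         # gain for the timeout
DELTA = 2       # timeout gain factor applied to RTT on a timeout


@dataclass
class AckTimer:
    """Adaptive acknowledgment timeout state for one peer."""
    rtt: float          # estimated round-trip time (initial value from caller)
    max_timeout: float  # administrator-configured MaxTimeOut
    dev: float = 0.0    # estimated mean deviation, initially 0

    def ato(self) -> float:
        # ATO[n] = MIN(RTT[n] + chi * DEV[n], MaxTimeOut)
        return min(self.rtt + CHI * self.dev, self.max_timeout)

    def on_ack(self, sample: float) -> float:
        """Update on an ACK for a message that was not retransmitted (A.1)."""
        diff = sample - self.rtt                   # DIFF[n]
        self.dev += BETA * (abs(diff) - self.dev)  # DEV[n]
        self.rtt += ALPHA * diff                   # RTT[n]
        return self.ato()

    def on_timeout(self) -> float:
        """Back off after an acknowledgment timeout (A.2); DEV is unchanged."""
        self.rtt *= DELTA
        return self.ato()
```

For example, starting from AckTimer(rtt=1.0, max_timeout=10.0), a measured sample of 2.0 yields an ATO of 2.125, and a subsequent timeout raises it to 3.25; repeated timeouts stop increasing the result once it reaches MaxTimeOut.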
Appendix B: Examples of sequence numbering

This appendix uses several common scenarios to illustrate how sequence number state progresses and is interpreted.

B.1 Lock-step session establishment

In this example, a DIAMETER host establishes communication with a peer, with the exchange involving each side alternating in the sending of messages. This example is contrived, in that the final acknowledgement typically would be included in the Device-Watchdog-Ind message.

   DIAMETER Host A                 DIAMETER Host B

   -> Device-Reboot-Ind
      Nr: 0, Ns: 0
                                   (ZLB) <-
                                   Nr: 1, Ns: 0
   -> Device-Watchdog-Ind
      Nr: 0, Ns: 1
                                   (delay)
                                   (ZLB) <-
                                   Nr: 2, Ns: 0

B.2 Multiple packets acknowledged

This example shows a flow of packets from DIAMETER Host B to Host A, with Host A having no traffic of its own. Host A is waiting 1/4 of its timeout interval, and then acknowledging all packets seen since the last interval.

   DIAMETER Host A                 DIAMETER Host B

   (previous packet flow precedes this)

   -> (ZLB)
      Nr: 7000, Ns: 1000
                                   (non-ZLB) <-
                                   Nr: 1000, Ns: 7000
                                   (non-ZLB) <-
                                   Nr: 1000, Ns: 7001
                                   (non-ZLB) <-
                                   Nr: 1000, Ns: 7002

   (Host A's timer indicates it should acknowledge pending traffic)

   -> (ZLB)
      Nr: 7003, Ns: 1000

B.3 Lost packet with retransmission

Host A attempts to communicate with Host B. The Device-Reboot-Ind sent from B to A is lost and must be retransmitted by Host B.

   DIAMETER Host A                 DIAMETER Host B

   -> Device-Reboot-Ind
      Nr: 0, Ns: 0
                                   (packet lost)
                                   Device-Reboot-Ind <-
                                   Nr: 1, Ns: 0

   (pause; Host A's timer started first, so fires first)

   -> Device-Reboot-Ind
      Nr: 0, Ns: 0

   (Host B realizes it has already seen this packet)
   (Host B might use this as a cue to retransmit, as in this example)

                                   Device-Reboot-Ind <-
                                   Nr: 1, Ns: 0
   -> Device-Watchdog-Ind
      Nr: 1, Ns: 1
                                   (delay)
                                   (ZLB) <-
                                   Nr: 2, Ns: 1
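The Ss/Sr bookkeeping behind these examples can be sketched in a few lines. The following is a rough illustration, not a normative implementation: the class name SeqState and its methods are hypothetical, and the duplicate test assumes that a duplicate is any Ns equal to the last in-sequence value received or to one of the 32767 values before it (modulo 65536).

```python
MOD = 1 << 16  # sequence numbers are represented modulo 65536


class SeqState:
    """Sequence number state kept for one peer."""

    def __init__(self):
        self.ss = 0  # Ns to place in the next message sent
        self.sr = 0  # Ns expected in the next in-sequence message

    def send(self, zlb=False):
        """Return the (Ns, Nr) pair for an outgoing message."""
        ns, nr = self.ss, self.sr
        if not zlb:
            # Ss advances only for non-ZLB messages, after being copied to Ns.
            self.ss = (self.ss + 1) % MOD
        return ns, nr

    def receive(self, ns):
        """Classify a received non-ZLB message and update Sr if in sequence."""
        if ns == self.sr:
            self.sr = (self.sr + 1) % MOD
            return "in-sequence"
        last = (self.sr - 1) % MOD
        if (last - ns) % MOD < MOD // 2:
            return "duplicate"      # silently discarded
        return "out-of-order"       # queued or discarded; Sr is unchanged


# Replay of the B.1 exchange: Host A sends two non-ZLB messages and
# Host B acknowledges each with a ZLB.
host_a, host_b = SeqState(), SeqState()
for name in ("Device-Reboot-Ind", "Device-Watchdog-Ind"):
    ns, nr = host_a.send()
    print(f"A -> {name}: Nr: {nr}, Ns: {ns}")
    host_b.receive(ns)
    ns, nr = host_b.send(zlb=True)
    print(f"B <- (ZLB): Nr: {nr}, Ns: {ns}")
```

Run as written, the loop prints the same Nr and Ns pairs as the B.1 diagram. Host A never processes the Nr values it receives here; a fuller sketch would use them to release messages from its retransmission queue.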