L2TP Working Group                                            Vipin Jain
INTERNET DRAFT                                     Nortel Networks, Inc.
Expires Nov 2001                                                May 2001


                     Fail over extensions for L2TP
                  draft-vipin-l2tpext-failover-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The Layer Two Tunneling Protocol (L2TP) [1] provides a standard
   method for tunneling PPP [2] packets. Because L2TP control packets,
   and optionally data packets, use sequencing, it is difficult to
   preserve L2TP tunnels and the sessions within them when a system
   fails. Protocol extensions are required to inform the peer about a
   fail-over, so that the failed system can recover better and exhibit
   more predictable behavior. This provides resiliency in an L2TP based
   network, thereby improving the end user's PPP connectivity. The
   extensions can also be used to provide planned shutdown of L2TP
   tunnels.

Jain, Vipin              expires November 2001                  [Page 1]

Internet-Draft    draft-ietf-l2tpext-failover-00.txt            May,2001

1.0 Introduction

   The L2TP control plane uses sequencing, timeouts and retransmissions
   to transmit control packets reliably, whereas the L2TP data plane
   uses sequencing to detect packet loss. The sliding window mechanism
   used by L2TP makes it difficult for a standby to take over from a
   system that fails.
   To be able to do this, an implementation has to maintain an active
   copy of the transmit and receive windows for every tunnel on the
   standby.

   This document defines new AVPs and procedures that extend the
   protocol so that a node can inform its peer about a fail-over and
   about its fail-over capabilities. Upon such an indication, a peer
   understands the new sequencing requirements on the data and control
   planes and does not drop existing L2TP tunnels and sessions. The
   proposed extensions are backward compatible.

1.1 Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [5].

2.0 Fail Over Protocol

   This section describes the protocol followed between LAC and LNS
   before and after a fail-over occurs.

2.1 Tunnel Establishment

   A LAC or LNS sending an SCCRQ or SCCRP includes the Fail-over
   capability AVP to indicate its level of support for a fail-over
   situation. This means that the granularity of a fail-over operation
   is per tunnel. Appendix A discusses the design considerations behind
   per-tunnel granularity.

5.2 Session Establishment

   There is no change to how L2TP [1] describes L2TP session
   establishment. A node that is required to support fail-over must
   maintain the state (and other relevant information) of each session
   on a redundant card or processor. How it achieves this is left to
   the implementation and is out of the scope of this document.

5.3 Fail Over protocol

   This section describes the behavior of the two endpoints of a tunnel
   should the LAC or LNS fail. The behavior is the same for a LAC
   failure and an LNS failure. Appendix B contains example dialogues
   for fail-over situations.

5.3.1.
On the Node that Fails

   After a fail-over occurs, the node sends an SCCRQ with the following
   considerations:

   o It includes all AVPs that it had sent when the tunnel was
     established.

   o It includes the Fail-Over AVP, indicating that a fail-over has
     occurred.

   o It includes the Old-Assigned-Tunnel-ID AVP, carrying the tunnel ID
     that was assigned by the peer in the prior tunnel establishment
     dialogue. This AVP identifies the peer's tunnel that is being
     subjected to fail-over.

   o This SCCRQ MUST use a new tunnel ID value in the Assigned Tunnel
     ID AVP. Using a different tunnel ID avoids acknowledging control
     messages from the peer that were meant for the previous tunnel.
     The session IDs of the sessions remain the same.

   After the new tunnel is established, the node MUST retransmit all
   CDNs that were not acknowledged by the peer. It MUST also use the
   new tunnel ID when retransmitting these messages.

5.3.2. On the Node that gets indication of peer failure

   Upon receipt of an SCCRQ that indicates a fail-over, a node that
   supports fail-over responds as follows:

   o It MUST use the Old-Assigned-Tunnel-ID AVP to identify the tunnel
     that is subjected to fail-over. If it cannot find the Old Assigned
     Tunnel ID AVP in the SCCRQ, it MUST reject the SCCRQ and send a
     StopCCN in response.

   o If the Fail-Over AVP indicates a value that is different from what
     the peer advertised in the Fail-over Capability AVP, the SCCRQ
     MUST be rejected with a StopCCN in response. In this case the node
     should not take any action on any tunnel that matches the Old
     Assigned Tunnel ID AVP.

   o The SCCRP sent in response to the SCCRQ MUST use a new tunnel ID
     in the Assigned Tunnel ID AVP. This avoids acknowledging control
     messages from the peer that were meant for the previous tunnel.
     The session IDs of the sessions remain the same.

   o It retains all sessions within the tunnel and has them belong to
     the new tunnel upon establishment.
   o It MUST retransmit all non-ZLB control messages that were not
     acknowledged by the peer. It MUST also use the new tunnel ID when
     retransmitting these control messages.

5.3.3. Data Session sequencing

   If sequencing is used on any session within a tunnel, then both
   peers MUST reset their sequence numbers to 0. This allows the data
   plane to come back in sync and avoids any confusion over packet
   loss.

6.0. Fail Over AVPs

   The following new AVPs are introduced. They should be included in
   SCCRQ and SCCRP messages to deal with fail-over situations.

6.1. Fail-over capability AVP

   The Fail-over capability AVP, Attribute Type [TBD], describes the
   node's capability for a fail-over situation. A node should depend on
   its peer's advertised fail-over capability in a fail-over situation.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|H| rsvd  |      Length       |       Vendor ID [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |        Attribute Value        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The only valid value for the attribute is 1. Higher levels of
   fail-over capability can be defined in the future to support further
   requirements. This document describes the mechanism only for level 1
   fail-over support.

   The AVP is not mandatory (the M-bit MUST be set to 0). However, an
   implementation that requires fail-over capability from its peer
   might reject an SCCRQ that does not contain the Fail-over capability
   AVP. The AVP MAY be hidden (the H-bit set to 0 or 1).

6.2. Old Assigned Tunnel ID AVP

   The Old Assigned Tunnel ID AVP, Attribute Type [TBD], encodes, in
   SCCRQ and SCCRP messages, the Tunnel ID that was assigned by the
   sender before a fail-over occurred.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|H| rsvd  |      Length       |       Vendor ID [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |        Attribute Value        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Attribute Value is the old Assigned Tunnel ID, a 2 octet
   non-zero unsigned integer.

   When used in a fail-over situation this AVP MUST be included in the
   SCCRQ.

   This AVP is mandatory (the M-bit MUST be set to 1). The AVP MAY be
   hidden (the H-bit set to 0 or 1).

6.3 Fail-over AVP

   The Fail-over AVP, Attribute Type [TBD], indicates to the peer that
   a fail-over has occurred on the node and that the node would like
   the peer to restart tunnel establishment and preserve all sessions
   that were established.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|H| rsvd  |      Length       |       Vendor ID [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |        Attribute Value        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The value of this attribute should be the same as what was
   advertised by the peer in the Fail-over Capability AVP defined in
   section 6.1. It indicates the level of fail-over support required
   in the fail-over situation. The only valid value of the attribute is
   1, because a peer will not advertise anything other than 1. Higher
   levels of Fail-over AVP behavior can be defined in the future to
   support further requirements. This document describes the mechanism
   only for level 1 fail-over support.

   When used in a fail-over situation this AVP MUST be included in the
   SCCRQ.

   This AVP is mandatory (the M-bit MUST be set to 1). The AVP MAY be
   hidden (the H-bit set to 0 or 1).

7.
Security considerations

   The fail-over mechanism does not introduce additional security
   problems. To prevent potential misuse of a fail-over situation,
   tunnel authentication is recommended.

8. Future work

   Future work would comprise defining behavior among peers for new
   levels of fail-over support.

9. IANA Considerations

   To be completed.

10. Acknowledgments

   Many thanks to Mark Townsley, Keyur Parikh, Andy Kocinski and
   Reinaldo Penno for their valuable comments.

11. References

   [1] Townsley, W., et al., "Layer Two Tunneling Protocol L2TP",
       RFC 2661, August 1999.

   [2] Simpson, W., "The Point-to-Point Protocol (PPP)", STD 51,
       RFC 1661, July 1994.

   [5] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

12. Authors' Addresses

   Vipin Jain
   Nortel Networks, Inc.
   2305 Mission College Blvd
   Santa Clara, CA 95054
   Phone: +1 408.565.2636
   Email: vipin@nortelnetworks.com

Appendix A

   This section lists some design considerations:

   1. Why tunnel level granularity? Why not system wide? Why not per
      session?

   Tunnel level granularity is preferred over system level or session
   level granularity for the following reasons:

   o When fail-over occurs at a system from which only one tunnel is
     established, only that tunnel needs fail-over support.

   o There might be implementations that want to provide resiliency
     only for a specific set of tunnels (for example, due to QoS
     agreements) and not for everything else.

   o Session level granularity would require many more message
     exchanges for no real advantage.

Appendix B

   This section contains examples of how the L2TP control channel
   recovers from various fail-over situations.

2.1.
ICRQ sent:

                   LAC                             LNS

sid=x, tid=y          ---- ICRQ (Ns=a, Nr=b) ----X Fail-over occurs
                                                   before receiving ICRQ
new tid=y1            <--- SCCRQ (Ns=0, Nr=0) ---- Send SCCRQ with
                                                   Fail-over AVP
                      ---- SCCRP (Ns=0, Nr=1) ---> New LAC tid noted
                                                   here
                      <--- SCCCN (Ns=1, Nr=1) ----
                      ---- ZLB (Ns=1, Nr=2) ----->
sid=x, tid=y1         -RESEND ICRQ (Ns=1, Nr=2)--> Valid ICRQ, send ICRP
                      <--- ICRP (Ns=2, Nr=2) -----

2.2. ICCN sent:

                   LAC                             LNS

sid=x, tid=y          ---- ICRQ (Ns=a, Nr=b) ----> Valid ICRQ
                      <--- ICRP (Ns=b, Nr=a+1) --- Send ICRP and save
                                                   the session's state
                      --- ICCN (Ns=a+1, Nr=b+1) -X Fail-over occurs
                                                   before receiving ICCN
new tid=y1            <--- SCCRQ (Ns=0, Nr=0) ---- Send SCCRQ with
                                                   Fail-over AVP
                      ---- SCCRP (Ns=0, Nr=1) ---> New LAC tid y2 noted
                      <--- SCCCN (Ns=1, Nr=1) ----
                      ---- ZLB (Ns=1, Nr=2) ----->
sid=x, tid=y1         ---- ICCN (Ns=1, Nr=2) ----> LNS from its previous
                                                   saved state knows it
                                                   had sent ICRP
                      <--- ZLB (Ns=2, Nr=2) ------ tid=y2

2.3. ICRP sent:

                   LAC                             LNS

Save session state    ---- ICRQ (Ns=a, Nr=b) ----> Valid ICRQ, Send ICRP
Fail-over occurs      X--- ICRP (Ns=b, Nr=a+1) --- sid=x, tid=y
before receiving ICRP
Send SCCRQ with       ---- SCCRQ (Ns=0, Nr=0) ---> new tid=y1
Fail-over AVP
New LNS tid y2        <--- SCCRP (Ns=0, Nr=1) ----
noted here
                      ---- SCCCN (Ns=1, Nr=1) --->
                      <--- ZLB (Ns=1, Nr=2) ------
                                                   sid=x, tid=y1
                      <--- ICRP (Ns=1, Nr=2) ----- LNS resends
                                                   unacknowledged ICRP
                      ---- ICCN (Ns=2, Nr=2) ----> sid=x, tid=y2

2.4.
ICCN Acked:

                   LAC                             LNS

Save session state    ---- ICRQ (Ns=a, Nr=b) ----> Valid ICRQ, Send ICRP
                      <--- ICRP (Ns=b, Nr=a+1) --- sid=x, tid=y
Save session state    --- ICCN (Ns=a+1, Nr=b+1) -> Valid ICCN, Send ZLB
Fail-over occurs      X--- ZLB (Ns=b+1, Nr=a+2) --
before receiving ZLB
Send SCCRQ with       ---- SCCRQ (Ns=0, Nr=0) ---> new tid=y1
Fail-over AVP
New LNS tid y2        <--- SCCRP (Ns=0, Nr=1) ----
noted here
                      ---- SCCCN (Ns=1, Nr=1) --->
                      <--- ZLB (Ns=1, Nr=2) ------
LAC is not required                                LNS is not required
to resend anything                                 to resend ZLB

2.5. CDN Sent by LAC:

                   LAC                             LNS

sid=x, tid=y          ---- CDN (Ns=a, Nr=b) -----X Fail-over occurs
                                                   before receiving CDN
new tid=y1            <--- SCCRQ (Ns=0, Nr=0) ---- Send SCCRQ with
                                                   Fail-over AVP
                      ---- SCCRP (Ns=0, Nr=1) ---> New LAC tid noted
                                                   here
                      <--- SCCCN (Ns=1, Nr=1) ----
                      ---- ZLB (Ns=1, Nr=2) ----->
sid=x, tid=y1         -RESEND CDN (Ns=1, Nr=2)---> Valid CDN, send ZLB
                                                   Ack
                      <--- ZLB (Ns=2, Nr=2) ------

2.6. CDN Sent by LNS:

                   LAC                             LNS

                      <--- CDN (Ns=b, Nr=a+1) ---- sid=x, tid=y,
                                                   LNS remembers
                                                   unacknowledged CDNs
                                                   here
                      -ZLB Ack (Ns=a+1, Nr=b+1)--X Fail-over occurs
                                                   before receiving ZLB
Send SCCRQ with       ---- SCCRQ (Ns=0, Nr=0) ---> new tid=y1
Fail-over AVP
New LNS tid y2        <--- SCCRP (Ns=0, Nr=1) ----
noted here
                      ---- SCCCN (Ns=1, Nr=1) --->
                      <--- ZLB (Ns=1, Nr=2) ------
                                                   sid=x, tid=y1
                      <--RESEND CDN (Ns=1, Nr=2)-- LNS resends CDN
If LAC finds          ---- ZLB (Ns=2, Nr=2) ----->
a session with sid=x
it deletes the
session
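
   As an informal illustration of the AVP layouts in section 6, the
   following Python sketch packs and unpacks the three AVPs. It is not
   part of the protocol definition. Several details are assumptions:
   the attribute type numbers are placeholders, since this document
   leaves them as [TBD]; the 2 octet value for the Fail-over and
   Fail-over capability AVPs is read from the diagrams rather than
   stated in the text; Vendor ID 0 for IETF AVPs and the 6 octet header
   follow RFC 2661; hiding (H-bit set to 1) is not shown. All function
   names are hypothetical.

```python
import struct

# Placeholder attribute types: the draft leaves these as [TBD].
ATTR_FAILOVER_CAPABILITY = 0xFF01
ATTR_OLD_ASSIGNED_TUNNEL_ID = 0xFF02
ATTR_FAILOVER = 0xFF03

IETF_VENDOR_ID = 0   # RFC 2661: Vendor ID 0 marks IETF-defined AVPs
AVP_HEADER_LEN = 6   # flags/length (2) + vendor ID (2) + attribute type (2)


def encode_avp(attr_type, value, mandatory):
    """Pack one unhidden AVP: M bit, H=0, rsvd=0, 10-bit Length, then fields."""
    length = AVP_HEADER_LEN + len(value)
    if length > 0x3FF:
        raise ValueError("AVP does not fit the 10-bit Length field")
    flags_len = (int(mandatory) << 15) | length
    return struct.pack("!HHH", flags_len, IETF_VENDOR_ID, attr_type) + value


def decode_avp(data):
    """Unpack the first AVP in data; return (fields, remaining bytes)."""
    if len(data) < AVP_HEADER_LEN:
        raise ValueError("truncated AVP header")
    flags_len, vendor, attr_type = struct.unpack("!HHH", data[:AVP_HEADER_LEN])
    length = flags_len & 0x3FF
    if length < AVP_HEADER_LEN or length > len(data):
        raise ValueError("bad AVP length")
    fields = {
        "mandatory": bool(flags_len & 0x8000),
        "hidden": bool(flags_len & 0x4000),
        "vendor": vendor,
        "type": attr_type,
        "value": data[AVP_HEADER_LEN:length],
    }
    return fields, data[length:]


def failover_sccrq_avps(old_tunnel_id, level=1):
    """Extra AVPs that section 5.3.1 adds to the SCCRQ sent after fail-over."""
    if not 1 <= old_tunnel_id <= 0xFFFF:
        raise ValueError("tunnel ID must be a non-zero 16-bit value")
    # Assumption: the fail-over level is carried as a 2 octet integer.
    failover = encode_avp(ATTR_FAILOVER, struct.pack("!H", level),
                          mandatory=True)
    old_tid = encode_avp(ATTR_OLD_ASSIGNED_TUNNEL_ID,
                         struct.pack("!H", old_tunnel_id), mandatory=True)
    return failover + old_tid


if __name__ == "__main__":
    blob = failover_sccrq_avps(old_tunnel_id=0x1234)
    print(blob.hex())  # 80080000ff03000180080000ff021234
    while blob:
        avp, blob = decode_avp(blob)
        print(avp)
```

   The M-bit is the only per-AVP difference in the header: the
   Fail-over capability AVP is encoded with mandatory=False, the other
   two with mandatory=True, matching sections 6.1 to 6.3.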