Network Working Group                                Sira Panduranga Rao
Internet Draft                                                       UTA
Expiration Date: July 2001                                    Alex Zinin
File name: draft-ietf-ospf-dc-00.txt                       Cisco Systems
                                                           November 2000

         Detecting Inactive Neighbors over OSPF Demand Circuits
                       draft-ietf-ospf-dc-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups. Note that other
   groups may also distribute working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   OSPF [RFC2328] is a link-state intra-domain routing protocol used in
   IP networks. OSPF behavior over demand circuits is optimized in
   [RFC1793] to minimize the amount of overhead traffic. A part of OSPF
   demand circuit extensions is the Hello suppression mechanism. This
   technique allows a demand circuit to go down when no interesting
   traffic is going through the link. However, it also introduces a
   problem: it becomes impossible to detect an OSPF-inactive neighbor
   over such a link. This memo addresses the problem by introducing
   three mechanisms: Hello probes, a limit on the number of LSA
   retransmissions, and flushing of self-originated LSAs.

1. Motivation

   In some situations, when operating over demand circuits, the remote
   neighbor may be unable to run OSPF and, as a possible result, unable
   to route application traffic.

Rao, Zinin                                                      [Page 1]

INTERNET DRAFT   OSPF DC Inactive Neighbor Detection       October 2000

   Possible scenarios include:

   o  The OSPF process might have died on the remote neighbor.

   o  Oversubscription (Section 7 of [RFC1793]) may cause a continuous
      drop of application data at the link level.

   The problem here is that the local router cannot identify problems
   such as these, since Hello exchange is suppressed on demand circuits.
   If the topology of the network is such that other routers cannot
   communicate their knowledge about the remote neighbor via flooding,
   the local router and all routers behind it will never know about the
   problem, so application traffic may continue being forwarded to the
   OSPF-incapable router.

   This memo describes two techniques that solve the described problem.
   First, a neighbor probing mechanism using Hellos is introduced, and
   second, the number of LSA retransmit attempts on demand circuits is
   limited. We also encourage flushing of self-originated LSAs when the
   OSPF process is going down.

2. Proposed Solution

   The first part of the solution this document proposes makes use of
   Hellos to detect whether the OSPF process is operational on the
   remote neighbor. We call this process "Hello probing". The idea
   behind this technique is to allow either of the two neighbors
   connected over a demand circuit to test the remote neighbor at any
   time (see Section 2.1.2).

   The routers across the demand circuit can be connected by a
   point-to-point link, a virtual link, or a point-to-multipoint
   interface. The case of routers connected by broadcast networks or
   NBMA is not considered, since Hello suppression is not used in these
   cases (Section 3.2 of [RFC1793]).

   Since Hellos are suppressed on demand circuit interfaces, the local
   router must make sure the remote router supports Hello probing before
   testing it. Otherwise the remote router may be mistakenly declared
   inoperational. To accomplish this, we introduce a new capability bit
   that is exchanged in DBD packets (see Section 2.1.1).

   The Hello probing mechanism is used as follows.
   After a router has synchronized the LSDB with its neighbor over the
   demand circuit, the demand circuit may be torn down if there is no
   more application traffic. When application traffic starts going over
   the link, the link is brought up, and the routers may probe each
   other. The routers may also probe each other any time the link is up
   (this could be implemented as a configurable option), with the
   caution that OSPF Hello packets are not considered interesting
   traffic and do not cause the demand circuit to remain up.

   The case when one or more of the router's links are oversubscribed
   (see Section 7 of [RFC1793]) should be considered by implementations.
   In such a situation, even if the link status is up and application
   data is being sent on the link, only a limited number of neighbors
   are really reachable. To make sure temporarily unreachable neighbors
   are not mistakenly declared down, Hello probing should be restricted
   to those neighbors that are actually reachable (i.e., there is a
   circuit established with the neighbor at the moment the probing
   procedure needs to be initiated). This check itself is considered an
   implementation detail.

   The second part of the solution is limiting the number of times LSAs
   can be retransmitted over a demand circuit. See Section 2.2 for more
   details.

   The third part of the solution is flushing of self-originated LSAs
   whenever the OSPF process on a router is going down.

   Hello probing and the LSA retransmission limit may be used together
   or alone. This memo does not dictate which of them, or how many, must
   be implemented; it only provides mechanisms to solve the described
   problem. This memo does, however, recommend flushing some locally
   originated LSAs, when possible, when the OSPF process is going down.

2.1 Hello Probing

   The Hello probing mechanism allows routers connected over a demand
   circuit to test each other's OSPF capabilities.
   In order to do so, both routers need to support this functionality;
   otherwise operational routers may mistakenly be declared unreachable.
   We ensure this by introducing a new capability bit in the Extended
   Options TLV announced in the link-local signaling (LLS) data block of
   DBD packets (see [LLS] for more information on LLS).

   We also use the same bit in Hello packets as a Hello reply request
   (RR) flag. This helps avoid race conditions in which a Hello sent in
   reply causes another reply to be sent, and so on. When a router needs
   to probe its neighbor, it sends a Hello with the RR bit set. The
   receiving side sends a Hello packet in reply with the RR bit clear.

2.1.1 Extended Options TLV

   The Extended Options TLV is a part of the LLS specification (see
   [LLS]) and is announced in both Hello and DBD packets.

   A new bit is introduced in the Value field of this TLV as shown in
   Figure 1. The value of the bit is 0x00000004.

   +---+---+---+---+---+---+---+- -+---+---+---+---+-----+---+---+
   | * | * | * | * | * | * | * |...| * | * | * | * |HP/RR| RS| OR|
   +---+---+---+---+---+---+---+- -+---+---+---+---+-----+---+---+

               Figure 1. Bits in Extended Options TLV

   When used in DBD packets, the new bit indicates the router's Hello
   Probing capability and is called the HP-bit. When used in Hello
   packets, the new bit means that a Hello must be sent in reply and is
   called the Reply Request (RR) bit. Routers supporting Hello probing
   must always set the HP bit in their DBD packets.

   For a description of the RS and OR bits, see [HELLO] and [OOB],
   respectively.

2.1.2 Hello Probing Procedure

   OSPF routers are allowed to perform Hello probing at any time.
   However, it is not recommended to do so when the link is down,
   because, at one extreme, it will keep the demand circuit up or
   bouncing, and, at the other extreme, it may cause a neighbor to be
   mistakenly declared unreachable.
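
   The dual use of the HP/RR bit described in Sections 2.1 and 2.1.1
   can be sketched as follows. This is only an illustrative sketch, not
   part of the specification: the bit value 0x00000004 is taken from
   Section 2.1.1, while the function names and the representation of
   the Extended Options value as a plain integer are assumptions made
   for the example.

```python
from typing import Optional

# Bit value given in Section 2.1.1 for the HP/RR bit of the
# Extended Options TLV. All function names below are hypothetical.
HP_RR_BIT = 0x00000004


def neighbor_supports_probing(dbd_options: int) -> bool:
    """HP-bit: set in a neighbor's DBD packets if it supports Hello probing."""
    return bool(dbd_options & HP_RR_BIT)


def make_probe_options(local_options: int) -> int:
    """Options for a Hello probe: the local bits plus the RR-bit."""
    return local_options | HP_RR_BIT


def reply_options_for(received_options: int,
                      local_options: int) -> Optional[int]:
    """Return the options for the reply Hello, or None if none was requested.

    The reply always has the RR-bit clear, so a reply can never
    trigger another reply.
    """
    if not received_options & HP_RR_BIT:
        return None
    return local_options & ~HP_RR_BIT


probe = make_probe_options(0x00000001)
reply = reply_options_for(probe, 0x00000001)
print(hex(probe), hex(reply))  # prints "0x5 0x1"
```

   Because the reply is built with the RR-bit cleared, passing a reply
   back through reply_options_for() yields None, which is the property
   that prevents the Hello exchange from looping.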
   It is recommended that both sides perform Hello probing whenever the
   demand circuit goes up, and periodically if the circuit stays in the
   active state. Note, however, that care must be taken not to let OSPF
   Hello probes keep the circuit in the active state without any
   application traffic going through it.

   When a router needs to probe a neighbor, it should start its Hello
   and Dead timers and send Hello packets with the RR-bit set. If the
   associated interface is point-to-multipoint, it is recommended to
   account for neighbor-specific timers and send Hello probes as IP
   unicasts.

   On the receiving side, when a packet with the RR-bit set is received,
   the router should immediately reply with a unicast Hello packet
   without setting the RR-bit. Using unicast Hellos limits the scope of
   Hello probing.

   The described procedure makes it possible for the sides to probe
   their corresponding neighbors asynchronously and without
   coordination.

2.2 Limiting Number of LSA Retransmissions

   An alternative method (that can be used together with Hello probing)
   to identify OSPF-incapable neighbors is to limit the number of LSA
   retransmissions over a demand circuit. The router should count the
   number of retransmit attempts for each neighbor. When an LSA is
   acknowledged by the neighbor, the router should reset the counter to
   zero. When the counter reaches a predefined (or configured) value, a
   KillNbr event should be generated for the neighbor experiencing the
   problem.

   Note that this method does not require cooperation of the routers on
   both sides of a demand circuit and can be used with already installed
   OSPF routers without requiring them to be upgraded with new software.

2.3 OSPF Process Shutdown and Flushing of LSAs

   It is recommended that an OSPF process flush its self-originated
   LSAs when it is going down.
   This way the router informs all other routers in the area that they
   should no longer consider it a transit router and should look for
   alternative routes. Care must be taken not to introduce instability
   in the network by flushing all LSAs. It is acceptable to flush only
   the self-originated router-LSA in the appropriate area and let other
   LSAs age out.

   Note that situations can arise in which the router cannot reliably
   flush its LSAs within a reasonable time frame. This could be due to
   packet loss, the demand circuit being down, or a delay in
   establishing a path to the neighbor. One situation that highlights
   this problem is when the router is oversubscribed (see Section 7 of
   [RFC1793]) and thus cannot communicate the news to its neighbors.

3. Support of Virtual Links and Point-to-multipoint Interfaces

   Virtual links can be treated analogously to point-to-point links, so
   the techniques described in this memo are applicable to virtual links
   as well. A point-to-multipoint interface running as a demand circuit
   (Section 3.5 of [RFC1793]) can be treated as a set of individual
   point-to-point links, for which the solution has been described in
   Section 2.

4. Compatibility issues

   Backward compatibility of the Hello probing mechanism is ensured by
   introducing the HP bit in the Extended Options TLV. Limiting the
   number of LSA retransmissions is a backward-compatible technique by
   its nature.

5. Considerations

   In addition to the lost functionality mentioned in Section 6 of
   [RFC1793], Hello probing adds overhead: Hello packets are transmitted
   whenever the link is up, which increases the overall cost.

6. Acknowledgements

   The authors would like to thank John Moy, Vijayapal Reddy Patil, SVR
   Anand, and Peter Psenak for their comments on this work.

7. References

   [RFC2328] J. Moy, OSPF Version 2.
             Technical Report RFC2328, Internet Engineering Task Force,
             1998. ftp://ftp.isi.edu/in-notes/rfc2328.txt

   [RFC1793] J. Moy, Extending OSPF to support Demand Circuits.
             Technical Report RFC1793, Internet Engineering Task Force,
             1995. ftp://ftp.isi.edu/in-notes/rfc1793.txt

   [LLS]     Zinin, Friedman, Roy, Nguyen, Yeung, "OSPF Link-local
             Signaling", draft-ietf-ospf-lls-00.txt, Work in progress.

   [HELLO]   Zinin, Roy, Nguyen, "OSPF Restart Signaling",
             draft-ietf-ospf-restart-00.txt, Work in progress.

   [OOB]     Zinin, Roy, Nguyen, "OSPF Out-of-band LSDB
             resynchronization", draft-ietf-ospf-oob-resync-00.txt,
             Work in progress.

8. Authors' addresses

   Sira Panduranga Rao
   The University of Texas at Arlington
   Arlington, TX 76013
   Email: siraprao@hotmail.com

   Alex Zinin
   Cisco Systems
   150 West Tasman Dr.
   San Jose, CA 95134
   Email: azinin@cisco.com