Network Working Group                                           K. Patel
Internet-Draft                                                C. Appanna
Intended status: Standards Track                            P. Mohapatra
Expires: February 9, 2011                                  Cisco Systems
                                                              J. Scudder
                                                        Juniper Networks
                                                               J. Uttaro
                                                                    AT&T
                                                          August 8, 2010


           Root cause notification to solve BGP path hunting
                     draft-keyupate-bgp-rcn-00.txt

Abstract

   Whenever a prefix is withdrawn using the BGP withdrawal mechanism, it
   triggers, in certain scenarios, a number of updates before the prefix
   is completely withdrawn from the entire BGP network.  This phenomenon
   is popularly known as _path exploration_ or _path hunting_ and occurs
   because of the path vector property of BGP.  It results in a series
   of unwanted or redundant transitions that overload the BGP network.
   This document describes a mechanism to help limit the amount of such
   path exploration by defining two optional transitive path attributes
   for BGP: SPEAKERID_PATH and ROOT_CAUSE.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 9, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Patel, et al.           Expires February 9, 2011                [Page 1]

Internet-Draft           Root Cause Notification             August 2010

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Reference Diagram
   3.  SPEAKERID_PATH attribute
   4.  ROOT_CAUSE attribute
   5.  Operation
     5.1.  Sending SPEAKERID_PATH attribute
     5.2.  Sending ROOT_CAUSE attribute
       5.2.1.  At the point of occurrence
       5.2.2.  At an intermediate point
     5.3.  Receiving ROOT_CAUSE Attribute
     5.4.  Usage of BGP Aggregates
     5.5.  BGP Confederation
     5.6.  BGP Inactive Timer
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   Whenever a prefix is withdrawn using the BGP withdrawal mechanism, it
   triggers, in certain scenarios, a number of updates before the prefix
   is completely withdrawn from the entire BGP network.  This phenomenon
   is popularly known as _path exploration_ or _path hunting_ and occurs
   because of the path vector property of BGP.  It results in a series
   of unwanted or redundant transitions that overload the BGP network
   ([I-D.li-bgp-stability]).

   It is interesting to note that these redundant transitions can end up
   triggering route dampening ([RFC2439]), if deployed in the network.
   Additionally, route dampening itself is known to cause path
   exploration in the network due to the delay it introduces
   ([I-D.li-bgp-stability]).  This effectively creates a spiral of BGP
   instability.  Both the generation of unwanted update messages and the
   triggering of route dampening can adversely affect the BGP
   convergence time.

   The problem lies in the way the BGP path vector is defined.  With a
   link state protocol, each router stores a complete view of the entire
   network and derives reachability information from that view.  In the
   event of a flap, each router can correctly determine all paths that
   suffer from the same root cause.  This approach does not scale to the
   large networks in which BGP operates.
   By design, BGP advertises to its neighbors, with each prefix, only
   the path it is using, expressed in terms of ASes.  Unfortunately,
   this information is coarse even in a simple topology, as the number
   of possible paths through the routers is quite large.  When a route
   becomes unreachable, the BGP selection process may end up choosing an
   alternative path that is actually not available, because detailed
   route information is not included.  After a series of such
   transitions, the BGP speaker resolves this abnormality and settles on
   a correct available path based on receiver-side loop detection.

   This document proposes a mechanism to identify unreachable paths for
   which BGP withdrawals have not been received and to prevent them from
   being selected as preferred paths.  This helps avoid unnecessary
   route flapping within the network.

   A new optional transitive path attribute, SPEAKERID_PATH, is attached
   to BGP announcements as the prefix travels through the network,
   essentially creating more granular information about the routers in
   the path.  When a prefix is withdrawn, another optional transitive
   attribute, ROOT_CAUSE, is attached to the implicit or explicit
   withdraws that are generated at different points in the network.
   This attribute is created once, at the point of occurrence of the
   fault, and is carried unchanged in the resulting UPDATE messages
   throughout the network.  At a receiving speaker, the ROOT_CAUSE
   attribute is matched against the SPEAKERID_PATH attributes of
   available paths to help identify and avoid those that are unreachable
   because they are affected by the same root cause.

   Path exploration caused by new prefix advertisements is not discussed
   in this document.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Reference Diagram

                           +-------+
                           |  AS5  |
          ...............R11       R13..............
          .                |       |               .
          .                +--R12--+               .
          .                    .                   .
          .                    .                   .
          .                    .                   .
          .                    .                   .
       +--R4---+           +---R7--+           +--R10--+
       |       |           |       |           |       |
      R2       R3.........R5       R6.........R8       R9
      .|  AS2  |           |  AS3  |           |  AS4  |
      .+-------+           +-------+           +-------+
      .
      .
      .
      .
   +-- R1--+
   |       |
   |  AS1  |
   |       |
   +-------+

   The figure above describes a topology that leads to the classic path
   hunting problem.  In steady state, AS5 has 3 paths for prefixes
   received from AS1:

                        +----------+---------+
                        | Path     | AS_PATH |
                        +----------+---------+
                        | p1(best) | 2 1     |
                        | p2       | 3 2 1   |
                        | p3       | 4 3 2 1 |
                        +----------+---------+

   When the link between AS1 and AS2 goes down, it leads to a series of
   events and actions at AS5 as follows:

   +------+---------------------+--------------------------------------+
   | Step | Event               | Action                               |
   +------+---------------------+--------------------------------------+
   | 1    | Recv withdraw of p1 | Select p2 as best                    |
   |      |                     | Send AS_PATH (5 3 2 1) upstream      |
   | --   | --                  | --                                   |
   | 2    | Recv withdraw of p2 | Select p3 as best                    |
   |      |                     | Send AS_PATH (5 4 3 2 1) upstream    |
   | --   | --                  | --                                   |
   | 3    | Recv withdraw of p3 | Prefixes have no path                |
   |      |                     | Send withdraw for the prefixes       |
   |      |                     | upstream                             |
   +------+---------------------+--------------------------------------+

   Even this trivial example creates unnecessary churn in the network
   until the end state is reached.

3.  SPEAKERID_PATH attribute

   SPEAKERID_PATH is an optional transitive attribute that is very
   similar in encoding and operation to the AS_PATH attribute.  It is
   composed of a sequence of SPEAKERID path segments.  Each segment is
   represented by a triple (type, length, value).  The format is as
   follows:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |    Length     |
      |   (1 octet)   |   (1 octet)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               ~
      ~             Value             ~
      ~                               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The type is a 1-octet field with the following value defined:

      Value     Type definition

      1         AS_ID_SEQUENCE: ordered set of AS and Speaker ID pairs
                that a route in the UPDATE message has traversed.

   The length is a 1-octet field containing the number of such pairs.
   Thus, when the type is 1, the value contains one or more entries of
   the following form:

      +---------------------------------+
      |          AS (4 bytes)           |
      +---------------------------------+
      |      SPEAKER-ID (4 bytes)       |
      +---------------------------------+

   The use and meaning of these fields are as follows:

   AS:

      The AS is a four-octet field that indicates the AS number of the
      BGP speaker.  If this ASN is from the public ASN space, it must
      have been assigned by the appropriate authority (use of ASN values
      from the private ASN space is strongly discouraged).

      Note that when a four-octet AS supporting speaker (NEW) announces
      an UPDATE to a two-octet AS supporting speaker (OLD), it encodes
      AS_TRANS as a two-octet AS in the AS_PATH attribute instead of its
      own AS ([I-D.ietf-idr-rfc4893bis]).  When encoding the
      SPEAKERID_PATH attribute, however, it MUST put its own four-octet
      AS in this field regardless of whether the neighbor to whom the
      UPDATE message is being sent is an OLD or NEW speaker.

   SPEAKER-ID:

      The SPEAKER-ID is a four-octet field that indicates the router-id
      of the BGP speaker.  If the router-id is from the public address
      space, it must have been assigned by the appropriate authority
      (use of a private IP address as a router-id is strongly
      discouraged).

4.  ROOT_CAUSE attribute

   ROOT_CAUSE is an optional transitive attribute that is composed of
   one or more triples (type, length, value).
   The format is as follows:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |    Length     |
      |   (1 octet)   |   (1 octet)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               ~
      ~             Value             ~
      ~                               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The type is a 1-octet field with the following value defined:

      Value     Type definition

      1         AS_ID_CONN: AS and router-ID pairs from both sides of
                the connection that is the point of occurrence for the
                withdraw.

   The length is a 1-octet field containing the length in octets of the
   value field.  When the type is 1, the value contains the following:

      +--------------+
      |Flags(1 octet)|
      +---------------------------------+
      |        left AS (4 bytes)        |
      +---------------------------------+
      |    left SPEAKER-ID (4 bytes)    |
      +---------------------------------+
      |       right AS (4 bytes)        |
      +---------------------------------+
      |   right SPEAKER-ID (4 bytes)    |
      +---------------------------------+

5.  Operation

5.1.  Sending SPEAKERID_PATH attribute

   When a BGP speaker supporting the mechanism described in this
   document propagates a route it learned from another BGP speaker's
   UPDATE message, it modifies the route's SPEAKERID_PATH attribute by
   prepending its own router-ID and AS number as the last pair of the
   sequence.  If there is no such attribute, the local system creates
   the attribute, creates a new segment of type AS_ID_SEQUENCE in it,
   and places its own pair into that segment.  If the act of prepending
   would cause an overflow in the existing segment (i.e., more than 255
   pairs), it MUST prepend a new segment of type AS_ID_SEQUENCE and
   prepend its own pair to this new segment.

   This operation should be performed regardless of whether the peer is
   IBGP or EBGP.

5.2.  Sending ROOT_CAUSE attribute

5.2.1.  At the point of occurrence

   A BGP speaker originates the ROOT_CAUSE attribute into an UPDATE
   message in one of the following scenarios:

   o  A session with a peer AS goes down, or the associated link goes
      down, and the received prefixes need to be withdrawn or their
      bestpath changes.

   o  It receives withdraws for some prefixes without the ROOT_CAUSE
      attribute, and they in turn need to be either withdrawn from the
      ASes upstream or re-advertised with new paths.

   While originating the attribute, the speaker encodes the router-ID
   and AS of each side of the session.

5.2.2.  At an intermediate point

   Any speaker receiving a withdrawal UPDATE message with the ROOT_CAUSE
   attribute should preserve the attribute and announce the resulting
   UPDATE message with the same attribute value.  The resulting message
   can be an explicit withdraw for a prefix or an implicit withdraw.

   Any speaker receiving an UPDATE message that advertises reachability
   and carries the ROOT_CAUSE attribute should preserve the attribute
   but not announce it in the resulting UPDATE message, unless the
   resulting UPDATE message is an explicit withdrawal message.

5.3.  Receiving ROOT_CAUSE Attribute

   Whenever a BGP speaker receives an UPDATE message containing
   withdrawn prefixes, it does the following:

   o  Remove the BGP path of the withdrawn prefix.

   o  Find all other paths that match the ROOT_CAUSE information carried
      with the path that is removed.  Place these paths on an Inactive
      timer for an Inactive time interval.  Do not select these paths
      during BGP bestpath selection.

5.4.  Usage of BGP Aggregates

   Whenever a BGP speaker creates an aggregate route from more specific
   routes, the aggregate does not inherit any SPEAKERID_PATH information
   from the more specific routes used for aggregation.  Instead, the
   speaker creates its own SPEAKERID_PATH attribute when it announces
   the aggregate route to its BGP peers, i.e., the attribute will
   contain one segment with only its own (AS, router-id) pair when it
   announces the aggregate.

5.5.  BGP Confederation

   A BGP Confederation speaker that peers with EBGP peers and receives
   routes from them will exchange BGP Route Originator attributes as
   well.  Whenever a Special Withdrawal message is received, the
   following is done:

   o  Remove the path announced by the peer (the one sending the Special
      Withdrawal message).

   o  Do not select any other BGP paths with a matching Route Originator
      Attribute (matching the one received in the Special Withdrawal).

   o  If there are no alternate paths available, forward the Special
      Withdrawal message (with originate Route Originator Attribute).

5.6.  BGP Inactive Timer

   The BGP inactive timer is used to suppress path information from
   being used in BGP bestpath selection.  This prevents BGP from
   selecting alternate paths for which withdrawals have not yet been
   received.  A BGP speaker should remove suppressed paths whenever they
   are withdrawn.  A BGP speaker must make all suppressed paths eligible
   for BGP bestpath selection again if they have not been withdrawn by
   the time the inactive timer expires.  The timeout for an Inactive
   Timer should be large enough to allow the withdrawal information to
   propagate across the AS.

6.  Acknowledgements

   The authors would like to thank Robert Raszuk and Pedro Marques for
   their input.

7.  IANA Considerations

   IANA shall assign codepoints for the SPEAKERID_PATH and ROOT_CAUSE
   attributes.  These codepoints will come from the "BGP Path
   Attributes" registry.

8.  Security Considerations

   This extension to BGP does not change the underlying security issues.

9.  References

9.1.  Normative References

   [I-D.ietf-idr-rfc4893bis]
              Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
              Number Space", draft-ietf-idr-rfc4893bis-01 (work in
              progress), October 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
   [RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
              Flap Damping", RFC 2439, November 1998.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

9.2.  Informative References

   [I-D.li-bgp-stability]
              Huston, G. and T. Li, "BGP Stability Improvements",
              draft-li-bgp-stability-01 (work in progress), June 2007.

Authors' Addresses

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: keyupate@cisco.com


   Chandra Appanna
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: chandra@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: pmohapat@cisco.com


   John Scudder
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA  94089
   USA

   Email: jgs@juniper.net


   James Uttaro
   AT&T
   200 S. Laurel Ave
   Middletown, NJ  07748
   USA

   Email: uttaro@att.com
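Illustrative sketch

The prepending step of Section 5.1, the segment layout of Section 3, and the suppression step of Section 5.3 can be sketched in Python as shown below, using the topology of Section 2. This is a minimal sketch under stated assumptions, not a definitive implementation. The names Hop, RootCause, prepend, encode_segment, is_affected, paths_to_suppress and rid are hypothetical, as are the router-ids 192.0.2.n chosen for router Rn. The sketch assumes network byte order, a single segment of at most 255 pairs, that the originating router R1 places its own pair in the attribute, and that a path "matches" a ROOT_CAUSE when the two ends of the failed connection appear as adjacent pairs in its SPEAKERID_PATH. The draft does not spell out the matching rule, so that last point is an interpretation.

```python
import struct
from typing import Dict, List, NamedTuple

AS_ID_SEQUENCE = 1  # segment type defined in Section 3


class Hop(NamedTuple):
    asn: int         # four-octet AS number
    speaker_id: int  # four-octet router-id


class RootCause(NamedTuple):
    left: Hop   # one side of the failed connection
    right: Hop  # the other side of the failed connection


def prepend(path: List[Hop], own: Hop) -> List[Hop]:
    """Section 5.1: place our own pair in front of the received pairs."""
    return [own] + path


def encode_segment(path: List[Hop]) -> bytes:
    """Encode one AS_ID_SEQUENCE segment: type, pair count, then the pairs."""
    if not 1 <= len(path) <= 255:
        raise ValueError("this sketch handles one segment of 1..255 pairs")
    out = struct.pack("!BB", AS_ID_SEQUENCE, len(path))
    for hop in path:
        # Network byte order is an assumption of this sketch.
        out += struct.pack("!II", hop.asn, hop.speaker_id)
    return out


def is_affected(path: List[Hop], cause: RootCause) -> bool:
    """ASSUMED rule: both ends of the failed connection are adjacent pairs."""
    ends = {cause.left, cause.right}
    return any({a, b} == ends for a, b in zip(path, path[1:]))


def paths_to_suppress(paths: Dict[str, List[Hop]], cause: RootCause) -> List[str]:
    """Section 5.3: names of paths to place on the Inactive timer."""
    return [name for name, path in paths.items() if is_affected(path, cause)]


def rid(n: int) -> int:
    """Hypothetical router-id 192.0.2.n for router Rn."""
    return (192 << 24) | (2 << 8) | n


# SPEAKERID_PATHs as received by AS5 in the reference diagram.
origin = [Hop(1, rid(1))]                                     # R1
via_r2 = prepend(origin, Hop(2, rid(2)))                      # R2, R1
p1 = prepend(via_r2, Hop(2, rid(4)))                          # R4, R2, R1
via_r5 = prepend(prepend(via_r2, Hop(2, rid(3))), Hop(3, rid(5)))
p2 = prepend(via_r5, Hop(3, rid(7)))                          # R7, R5, R3, R2, R1
via_r8 = prepend(prepend(via_r5, Hop(3, rid(6))), Hop(4, rid(8)))
p3 = prepend(via_r8, Hop(4, rid(10)))                         # R10, R8, R6, R5, ...

# The AS1-AS2 link (R1-R2) fails; the withdraw of p1 carries this ROOT_CAUSE.
cause = RootCause(left=Hop(2, rid(2)), right=Hop(1, rid(1)))
print(paths_to_suppress({"p2": p2, "p3": p3}, cause))
```

Under these assumptions, both remaining paths traverse the failed R1-R2 connection, so AS5 would suppress p2 and p3 on receiving the first withdraw rather than stepping through them one at a time as in the table of Section 2.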