Network Working Group                                      Daniel Walton
Internet Draft                                                David Cook
Expiration Date: May 2003                                  Alvaro Retana
File name: draft-walton-bgp-add-paths-01.txt                John Scudder
                                                           Cisco Systems
                                                           November 2002


                Advertisement of Multiple Paths in BGP
                  draft-walton-bgp-add-paths-01.txt


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups.  Note that
   other groups may also distribute working documents as Internet
   Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   The BGP specification [BGP] defines an "Update-Send Process" to
   advertise the routes chosen by the Decision Process to other BGP
   speakers.  No provisions are made to facilitate the advertisement
   of multiple paths to the same destination.  In fact, a route with
   the same NLRI as a previously advertised route implicitly replaces
   the original advertisement.

   This document proposes a mechanism that will allow the
   advertisement of multiple paths for the same prefix without the new
   paths implicitly replacing any previous ones.  The essence of the
   mechanism is that each path is identified by an arbitrary
   identifier in addition to its prefix.


Walton, et al                                                   [Page 1]

INTERNET DRAFT          Multiple Paths in BGP             November 2002


1. Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].


2. Advertisement of Multiple Paths in BGP

   This section describes an alternate NLRI encoding that allows the
   advertisement of multiple paths in BGP.


2.1. Capability Advertisement

   This specification defines the capability [BGP_CAP] ADD_PATH.  The
   ADD_PATH capability has code TBD.  Its length is zero; it carries
   no data.

   Capability code 4 defined in [RFC3107] MUST NOT be advertised if
   ADD_PATH is advertised (see also the section below entitled
   'Modifications to "Carrying Label Information in BGP-4"').


2.2. NLRI Encoding

   If two BGP speakers advertise the ADD_PATH capability to each
   other, the NLRI encoding is modified to add two new fields at the
   beginning of the NLRI: a "flags" field (described below), and an
   identifier to distinguish the NLRI from other NLRI with the same
   prefix but different path attributes and/or nexthop.

   We note that in many BGP operations, the prefix is used as a key
   for identifying a datum.  For example, when withdrawing a route
   using the procedures of [BGP], only the prefix needs to be
   specified in order to withdraw the entire route.  For such
   purposes, the identifier field introduced by this specification is
   treated as part of the key.

   The following subsections specify the necessary modifications to
   existing encodings.  We recommend that future documents which
   specify NLRI encodings for BGP include an encoding (possibly the
   sole encoding) compatible with this specification.


2.2.1. Modifications to "BGP-4"

   "BGP-4" [BGP], section 4.6 (the sub-sections titled "Withdrawn
   Routes" and "Network Layer Reachability Information") is updated
   by the following:

      The Network Layer Reachability information is encoded as one or
      more 4-tuples of the form (Flags, Identifier, Length, Prefix),
      whose fields are described below:

               +---------------------------+
               |   Flags (1 octet)         |
               +---------------------------+
               |   Identifier (2 octets)   |
               +---------------------------+
               |   Length (1 octet)        |
               +---------------------------+
               |   Prefix (variable)       |
               +---------------------------+

      The use and the meaning of these fields are as follows:

         a) Flags:

            This is a one-octet bit-field and MUST NOT be used for
            identifying the path.  In other words, it does not form
            part of the key used to identify the path.

            The following values are defined:

            BestPath (0x01)

               If set to one, the bestpath bit indicates that the path
               associated with the NLRI has been selected by the BGP
               speaker for installation into its FIB.  If set to zero,
               the path has not been selected.

               If a route which was advertised with the bestpath bit
               set to one is removed from the advertiser's FIB, the
               route MUST be re-advertised with the bestpath bit set
               to zero, or withdrawn.

               Likewise, if a route which was advertised with the
               bestpath bit set to zero is selected for installation
               in the advertiser's FIB, the route MUST be re-advertised
               with the bestpath bit set to one, or withdrawn.

            FirstPath (0x02)

               If set to one, the firstpath bit indicates the current
               update contains the first of a series of paths for a
               specific prefix.  Any paths received before this one
               MUST be removed by the receiver.  If set to zero, it
               indicates that the current update is not the first in
               the series.

            LastPath (0x04)

               If set to one, the lastpath bit indicates that the
               current update is the last one for the prefix.  If set
               to zero, it indicates that more paths for the same
               prefix MAY be advertised.

         b) Identifier:

            The Identifier field allows the address prefix and its
            associated path attributes ("path") to be distinguished
            from other paths for the same prefix.  The selection of
            identifier values is a local implementation decision.

            If the Identifier is set to 65535, then it MUST be
            interpreted as an explicit withdraw for all paths
            associated with the prefix.

         c) Length:

            The Length field indicates the length in bits of the
            address prefix.  A length of zero indicates a prefix that
            matches all (as specified by the address family) addresses
            (with prefix, itself, of zero octets).

         d) Prefix:

            The Prefix field contains an address prefix followed by
            enough trailing bits to make the end of the field fall on
            an octet boundary.  Note that the value of trailing bits
            is irrelevant.


2.2.2. Modifications to "Multiprotocol Extensions for BGP-4"

   "Multiprotocol Extensions for BGP-4" [MP_BGP], section 7 is
   replaced by the following:

      The Network Layer Reachability information is encoded as one or
      more 4-tuples of the form (Flags, Identifier, Length, Prefix),
      whose fields are described below:

               +---------------------------+
               |   Flags (1 octet)         |
               +---------------------------+
               |   Identifier (2 octets)   |
               +---------------------------+
               |   Length (1 octet)        |
               +---------------------------+
               |   Prefix (variable)       |
               +---------------------------+

      The use and the meaning of these fields are as follows:

         a) Flags:

            This is a one-octet bit-field and MUST NOT be used for
            identifying the path.  In other words, it does not form
            part of the key used to identify the path.

            The following values are defined:

            BestPath (0x01)

               If set to one, the bestpath bit indicates that the path
               associated with the NLRI has been selected by the BGP
               speaker for installation into its FIB.  If set to zero,
               the path has not been selected.

               If a route which was advertised with the bestpath bit
               set to one is removed from the advertiser's FIB, the
               route MUST be re-advertised with the bestpath bit set
               to zero, or withdrawn.

               Likewise, if a route which was advertised with the
               bestpath bit set to zero is selected for installation
               in the advertiser's FIB, the route MUST be re-advertised
               with the bestpath bit set to one, or withdrawn.

            FirstPath (0x02)

               If set to one, the firstpath bit indicates the current
               update contains the first of a series of paths for a
               specific prefix.  Any paths received before this one
               MUST be removed by the receiver.  If set to zero, it
               indicates that the current update is not the first in
               the series.

            LastPath (0x04)

               If set to one, the lastpath bit indicates that the
               current update is the last one for the prefix.  If set
               to zero, it indicates that more paths for the same
               prefix MAY be advertised.

         b) Identifier:

            The Identifier field allows the address prefix and its
            associated path attributes ("path") to be distinguished
            from other paths for the same prefix.  The selection of
            identifier values is a local implementation decision.

            If the Identifier is set to 65535, then it MUST be
            interpreted as an explicit withdraw for all paths
            associated with the prefix.

         c) Length:

            The Length field indicates the length in bits of the
            address prefix.  A length of zero indicates a prefix that
            matches all (as specified by the address family) addresses
            (with prefix, itself, of zero octets).

         d) Prefix:

            The Prefix field contains an address prefix followed by
            enough trailing bits to make the end of the field fall on
            an octet boundary.  Note that the value of trailing bits
            is irrelevant.


2.2.3. Modifications to "Carrying Label Information in BGP-4"

   "Carrying Label Information in BGP-4" [RFC3107] is modified as
   follows.

   Section 4 ("Advertising Multiple Routes to a Destination") is
   deleted, as the procedures of this specification allow multiple
   routes to be advertised, so no other procedures are required.  For
   the same reason, the final paragraph of Section 5 (which specifies
   capability code 4) is deleted.  Section 3 is replaced by the
   following:

      Label mapping information is carried as part of the Network
      Layer Reachability Information (NLRI) in the Multiprotocol
      Extensions attributes.  The AFI indicates, as usual, the address
      family of the associated route.  The fact that the NLRI contains
      a label is indicated by using SAFI value 4.

      The Network Layer Reachability information is encoded as one or
      more 5-tuples of the form (Flags, Identifier, Length, Label,
      Prefix), whose fields are described below:

               +---------------------------+
               |   Flags (1 octet)         |
               +---------------------------+
               |   Identifier (2 octets)   |
               +---------------------------+
               |   Length (1 octet)        |
               +---------------------------+
               |   Label (3 octets)        |
               +---------------------------+

               +---------------------------+
               |   Prefix (variable)       |
               +---------------------------+

      The use and the meaning of these fields are as follows:

         a) Flags:

            This is a one-octet bit-field and MUST NOT be used for
            identifying the path.  In other words, it does not form
            part of the key used to identify the path.

            The following values are defined:

            BestPath (0x01)

               If set to one, the bestpath bit indicates that the path
               associated with the NLRI has been selected by the BGP
               speaker for installation into its FIB.  If set to zero,
               the path has not been selected.

               If a route which was advertised with the bestpath bit
               set to one is removed from the advertiser's FIB, the
               route MUST be re-advertised with the bestpath bit set
               to zero, or withdrawn.

               Likewise, if a route which was advertised with the
               bestpath bit set to zero is selected for installation
               in the advertiser's FIB, the route MUST be re-advertised
               with the bestpath bit set to one, or withdrawn.

            FirstPath (0x02)

               If set to one, the firstpath bit indicates the current
               update contains the first of a series of paths for a
               specific prefix.  Any paths received before this one
               MUST be removed by the receiver.  If set to zero, it
               indicates that the current update is not the first in
               the series.

            LastPath (0x04)

               If set to one, the lastpath bit indicates that the
               current update is the last one for the prefix.  If set
               to zero, it indicates that more paths for the same
               prefix MAY be advertised.

         b) Identifier:

            The Identifier field allows the address prefix and its
            associated path attributes ("path") to be distinguished
            from other paths for the same prefix.  The selection of
            identifier values is a local implementation decision.

            If the Identifier is set to 65535, then it MUST be
            interpreted as an explicit withdraw for all paths
            associated with the prefix.

         c) Length:

            The Length field indicates the length in bits of the
            address prefix.  A length of zero indicates a prefix that
            matches all (as specified by the address family) addresses
            (with prefix, itself, of zero octets).

         d) Label:

            The Label field carries one or more labels (that
            correspond to the stack of labels [LABELS]).  Each label
            is encoded as 3 octets, where the high-order 20 bits
            contain the label value, and the low-order bit contains
            "Bottom of Stack" (as defined in [LABELS]).

         e) Prefix:

            The Prefix field contains an address prefix followed by
            enough trailing bits to make the end of the field fall on
            an octet boundary.  Note that the value of trailing bits
            is irrelevant.
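
   As an illustration only, the labeled encoding described in items
   a) through e) might be packed and parsed as in the following
   sketch.  The function and constant names are hypothetical and do
   not come from this document or from any BGP implementation.  The
   sketch assumes that the Length field counts prefix bits only, as
   item c) states, and that the end of the label stack is found from
   the "Bottom of Stack" bit; the three bits between the label value
   and that bit are left at zero.

```python
import struct

# Flag bits and the reserved identifier, as defined in the field list.
BEST_PATH = 0x01
FIRST_PATH = 0x02
LAST_PATH = 0x04
WITHDRAW_ALL_ID = 65535


def encode_labeled_nlri(flags, identifier, prefix_bits, labels, prefix):
    """Build one (Flags, Identifier, Length, Label, Prefix) tuple.

    Assumption: Length counts prefix bits only, as item c) states.
    """
    if not labels:
        raise ValueError("at least one label is required")
    out = struct.pack("!BHB", flags, identifier, prefix_bits)
    for i, label in enumerate(labels):
        if not 0 <= label < (1 << 20):
            raise ValueError("label does not fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0
        # High-order 20 bits: label value; low-order bit: Bottom of Stack.
        out += ((label << 4) | bottom).to_bytes(3, "big")
    n_octets = (prefix_bits + 7) // 8
    if len(prefix) < n_octets:
        raise ValueError("prefix too short for the stated length")
    return out + bytes(prefix[:n_octets])


def decode_labeled_nlri(data, offset=0):
    """Parse one tuple starting at offset; return (fields, next_offset)."""
    flags, identifier, prefix_bits = struct.unpack_from("!BHB", data, offset)
    offset += 4
    labels = []
    while True:
        chunk = data[offset:offset + 3]
        if len(chunk) < 3:
            raise ValueError("truncated label stack")
        word = int.from_bytes(chunk, "big")
        offset += 3
        labels.append(word >> 4)
        if word & 0x01:  # Bottom of Stack reached
            break
    n_octets = (prefix_bits + 7) // 8
    prefix = bytes(data[offset:offset + n_octets])
    if len(prefix) < n_octets:
        raise ValueError("truncated prefix")
    return (flags, identifier, prefix_bits, labels, prefix), offset + n_octets


def path_key(identifier, prefix_bits, prefix):
    """Key for a path: Identifier plus prefix; Flags are excluded."""
    octets = bytearray(prefix[:(prefix_bits + 7) // 8])
    spare = (8 - prefix_bits % 8) % 8
    if spare and octets:
        # Trailing bits are irrelevant, so clear them before comparing.
        octets[-1] &= (0xFF << spare) & 0xFF
    return (identifier, prefix_bits, bytes(octets))
```

   For instance, encoding flags BestPath|FirstPath, identifier 7, the
   24-bit prefix 10.1.2 and the label stack [100, 200] under these
   assumptions gives 13 octets: 4 for the fixed fields, 6 for the two
   labels, and 3 for the prefix.  Keeping the Flags octet out of
   path_key() reflects the rule that flags MUST NOT be used to
   identify the path.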

      The label(s) specified for a particular route (and associated
      with its address prefix) must be assigned by the LSR which is
      identified by the value of the Next Hop attribute of the route.

      When a BGP speaker redistributes a route, the label(s) assigned
      to that route must not be changed (except by omission), unless
      the speaker changes the value of the Next Hop attribute of the
      route.

      A BGP speaker can withdraw a previously advertised route (as
      well as the binding between this route and a label) by either
      (a) advertising a new route (and, optionally, a label) with the
      same NLRI as the previously advertised route (keeping in mind
      that the identifier comprises part of the NLRI for this
      purpose), or (b) listing the NLRI (again keeping in mind the
      inclusion of the identifier as part of the NLRI for this
      purpose) of the previously advertised route in the Withdrawn
      Routes field of an Update message.  In the latter case, no label
      information need be included.


2.3. Operation

   Using the identifier specified in the previous subsection, the same
   prefix can be advertised multiple times without subsequent
   advertisements replacing previous ones.  Apart from the fact that
   this is possible, the route advertisement rules of [BGP] are not
   changed.  In particular, a new advertisement of a given NLRI
   (remembering that the identifier is part of the NLRI's definition)
   replaces a previous advertisement of the given NLRI.

   When two BGP speakers have advertised the ADD_PATH capability to
   each other, the NLRI encoding defined in this document MUST be
   used.


3. Deployment Considerations

   The intent of this extension is to be used in a controlled fashion
   for applications that require only partial propagation of the
   routing information, or specific individual recipients.  Care
   should be taken when deploying this enhancement.
   If deployed improperly, the presence of extra paths in some parts
   of the AS and not in others can cause inconsistent routing.

   One scenario of particular concern involves the IGP metric to the
   address depicted by the NEXT_HOP, and the MED attribute.

   If this extension is used to advertise alternate paths, the best
   path [BGP] SHOULD also be advertised.  As long as the best path is
   still selected as best, the presence of additional paths in some
   parts of the AS and not others will not cause inconsistent routing.
   However, if the IGP metric to the address depicted by the NEXT_HOP
   should change such that a non-best path is now preferred over the
   best path, then every router in the path to the address depicted by
   the NEXT_HOP should have the additional paths.

   Because the MED is only compared between routes from the same AS
   [BGP], it is possible that an additional path could be selected as
   the best path.  This may cause inconsistent routing if all routers
   in the forwarding path of the affected routers do not have the
   additional paths.

   In a simple topology, it may be possible to anticipate these
   scenarios and avoid inconsistent routing while still enabling
   appropriate applications.  Documents proposing applications of this
   extension SHOULD specify restrictions for propagating additional
   paths and should supply specific deployment guidelines.


4. Security Considerations

   This document introduces no new security concerns to BGP or other
   specifications referenced in this document.


5. Acknowledgments

   We would like to thank Dave Meyer, Srihari Ramachandra, Eric Rosen,
   Dan Tappan, Robert Raszuk, Mark Turner and Enke Chen for their
   comments and suggestions.


6. References

   [BGP]      Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)," RFC 1771, March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels," RFC 2119, March 1997.

   [BGP_CAP]  Chandra, R. and J. Scudder, "Capabilities Advertisement
              with BGP-4," RFC 2842, May 2000.

   [MP_BGP]   Bates, T., R. Chandra, D. Katz and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4," RFC 2858, June
              2000.

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4," RFC 3107, May 2001.

   [LABELS]   Rosen, E., D. Tappan, G. Fedorkow, Y. Rekhter, D.
              Farinacci, T. Li and A. Conta, "MPLS Label Stack
              Encoding," RFC 3032, January 2001.


7. Authors' Addresses

   Daniel Walton
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: dwalton@cisco.com

   Alvaro Retana
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: aretana@cisco.com

   David Cook
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: dacook@cisco.com

   John G. Scudder
   Cisco Systems, Inc.
   100 S. Main Suite 200
   Ann Arbor, MI 48104
   Email: jgs@cisco.com