Network Working Group                                        P. Lapukhov
Internet-Draft                                                  Facebook
Intended status: Informational                               J. Tantsura
Expires: January 2, 2020                                    Apstra, Inc.
                                                            July 1, 2019


              Equal-Cost Multipath Considerations for BGP
               draft-lapukhov-bgp-ecmp-considerations-02

Abstract

   BGP (Border Gateway Protocol) [RFC4271] employs tie-breaking logic to
   select a single best path among multiple paths available, known as
   BGP best path selection.  At the same time, it is a common practice
   to allow for "equal-cost multipath" (ECMP) selection and programming
   of multiple next-hops in routing tables.  This document summarizes
   some common considerations for the ECMP logic when BGP is used as the
   routing protocol, with the intent of providing a common reference for
   an otherwise unstandardized set of features.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 2, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  AS-PATH attribute comparison
   3.  Multipath among eBGP-learned paths
   4.  Multipath among iBGP learned paths
   5.  Multipath among eBGP and iBGP paths
   6.  Multipath with AIGP
   7.  Best path advertisement
   8.  Multipath and non-deterministic tie-breaking
   9.  Weighted equal-cost multipath
   10. Informative References
   Authors' Addresses

1.  Introduction

   Section 9.1.2.2 of [RFC4271] defines a step-by-step tie-breaking
   procedure for selecting a single "best-path" among multiple
   alternatives available for the same route.  In order to improve
   efficiency in densely meshed symmetric network topologies, it has
   become common practice to allow the selection of multiple "equal
   cost" paths for the same route.  The typical approach is to abort the
   tie-breaking process after comparing the IGP cost for the NEXT_HOP
   attribute and to select either all eBGP or all iBGP paths that
   remained "equal" under the tie-breaking rules.  See [BGPMP] for a
   vendor document explaining the logic.
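
   The approach described above can be illustrated with a minimal Python
   sketch.  It is not taken from any implementation: the Path type, its
   field names, and both helper functions are hypothetical.  As a
   simplifying assumption, LOCAL_PREF stands in for the degree of
   preference and MED is compared across all paths regardless of the
   neighbor AS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of a BGP path (illustrative fields)."""
    next_hop: str
    local_pref: int   # higher is preferred
    as_path_len: int  # lower is preferred
    origin: int       # 0=IGP, 1=EGP, 2=INCOMPLETE; lower is preferred
    med: int          # lower is preferred (assumed comparable across paths)
    is_ebgp: bool     # eBGP is preferred over iBGP
    igp_cost: int     # cost to reach NEXT_HOP; lower is preferred
    router_id: int    # BGP identifier, used only by the full tie-break
    peer_addr: int    # peer address, used only by the full tie-break


def multipath_key(p):
    # Comparison stops after the IGP cost to the NEXT_HOP.
    return (-p.local_pref, p.as_path_len, p.origin, p.med,
            not p.is_ebgp, p.igp_cost)


def full_key(p):
    # The full procedure continues with BGP identifier and peer address.
    return multipath_key(p) + (p.router_id, p.peer_addr)


def select_multipath(paths):
    """Return (best_path, equal_cost_set) for one route."""
    best = min(paths, key=full_key)
    equal = [p for p in paths if multipath_key(p) == multipath_key(best)]
    return best, sorted(equal, key=full_key)
```

   Because eBGP versus iBGP is part of the comparison key, the resulting
   set contains either all eBGP or all iBGP paths, never a mix.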

   In a nutshell, the steps that compare the BGP identifiers and BGP
   peer IP addresses (steps (f) and (g) in [RFC4271]) are ignored for
   the purpose of multipath routing.  BGP implementations commonly have
   a configuration knob that specifies the maximum number of equal paths
   that are allowed to be programmed in the routing table.  Commonly,
   there is also a knob to enable multipath separately for iBGP-learned
   or eBGP-learned paths.

2.  AS-PATH attribute comparison

   The mandatory requirement for all paths that are considered as
   candidates for ECMP selection is to have the same AS_PATH length,
   computed using the logic defined in [RFC4271] and [RFC5065], i.e.
   ignoring the AS_SET, AS_CONFED_SEQUENCE, and AS_CONFED_SET segment
   lengths.  The content of the latter segments is used purely for
   routing loop prevention.  Assuming that the AS_PATH lengths computed
   in this fashion are the same, many implementations require that the
   content of the AS_SEQUENCE segment be the same among all the paths
   considered.

   Two common configuration knobs to alter this behaviour are usually
   provided.  The first one relaxes the otherwise mandatory AS_SEQUENCE
   comparison rule, enforcing only the AS_PATH length rule while
   ignoring the content of AS_SEQUENCE.  The second one requires that
   the first AS number in the first AS_SEQUENCE segment found in AS_PATH
   (often referred to as the "peer AS" number) be the same as the one
   found in the best path (as determined by running the full tie-
   breaking procedure).  This document refers to those two as "multipath
   as-path relaxed" and "multipath same peer-as" correspondingly.

3.  Multipath among eBGP-learned paths

   Step (d) in Section 9.1.2.2 of [RFC4271] mandates, in the presence of
   an eBGP path, removing all iBGP paths from the ECMP candidate set.
   This leaves the BGP tie-breaking procedure with just eBGP paths.  At
   this point, the mandatory BGP NEXT_HOP attribute value most commonly
   belongs to the IP subnet that the BGP speaker shares with the
   advertising neighbor.  In this case, it is common for implementations
   to treat all NEXT_HOP values as having the same "internal cost" to
   reach them, per the guidance of step (e) of Section 9.1.2.2.  In some
   cases, either static routing or an IGP routing protocol could be used
   between the BGP speakers peering using an eBGP session.  An
   implementation may use the next-hop metric discovered from the above
   sources to perform tie-breaking even for eBGP paths.

   If the MED attribute is present in some paths, the set of multipath
   routes allowed will most likely be reduced to the ones coming from
   the same peer AS, per step (c) of Section 9.1.2.2.  This is unless an
   implementation provides a configuration knob to always compare MED
   attributes across all paths, as recommended by [RFC4451].  In the
   latter case, the presence of the MED attribute does not automatically
   reduce the candidate path set to the same peer AS only.

4.  Multipath among iBGP learned paths

   In most cases iBGP is used along with an underlying IGP.  Thus, when
   all paths for a prefix are learned via iBGP, the tie-breaking
   commonly occurs based on the IGP metric of the NEXT_HOP attribute.
   In some implementations, it is possible to ignore the IGP cost as
   well, if all of the paths are reachable via some kind of tunneling
   mechanism, such as MPLS [RFC3031].  This is enabled via a knob
   referred to in this document as "skip igp check".  Notice that there
   is no standard way for a BGP speaker to detect the presence of such
   tunneling techniques other than relying on the configuration
   settings.
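
   The knobs named so far ("multipath as-path relaxed", "multipath same
   peer-as", and "skip igp check") can be sketched as simple predicates.
   The Python below is only one possible reading of the text: the
   function names, the representation of AS_PATH as a list of
   (segment type, AS numbers) tuples, and the way the knobs combine are
   assumptions made for illustration.  The length function counts an
   AS_SET as one and does not count confederation segments, following
   [RFC4271] and [RFC5065].

```python
def as_path_length(as_path):
    """AS_PATH length as used for path comparison."""
    length = 0
    for seg_type, asns in as_path:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)
        elif seg_type == "AS_SET":
            length += 1  # counted once, regardless of the set size
        # AS_CONFED_SEQUENCE and AS_CONFED_SET are not counted
    return length


def as_sequence_content(as_path):
    """Content of all AS_SEQUENCE segments, in order."""
    return [tuple(asns) for seg_type, asns in as_path
            if seg_type == "AS_SEQUENCE"]


def peer_as(as_path):
    """First AS number of the first AS_SEQUENCE segment, if any."""
    for seg_type, asns in as_path:
        if seg_type == "AS_SEQUENCE" and asns:
            return asns[0]
    return None


def as_path_equivalent(candidate, best, relaxed=False, same_peer_as=False):
    """Can `candidate` join the multipath set headed by `best`?"""
    if as_path_length(candidate) != as_path_length(best):
        return False  # the length rule is always enforced
    if same_peer_as and peer_as(candidate) != peer_as(best):
        return False  # "multipath same peer-as"
    if not relaxed:
        # Default behaviour: AS_SEQUENCE content must match exactly.
        return as_sequence_content(candidate) == as_sequence_content(best)
    return True       # "multipath as-path relaxed"


def igp_equivalent(candidate_cost, best_cost, skip_igp_check=False):
    """IGP cost comparison, bypassed when "skip igp check" is set."""
    return skip_igp_check or candidate_cost == best_cost
```

   In this reading, enabling both AS_PATH knobs together accepts paths
   of equal length that share the peer AS of the best path while
   differing further along the path.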

   When iBGP is deployed with BGP route-reflectors per [RFC4456], the
   path attribute list may include the CLUSTER_LIST attribute.  Many
   implementations ignore it for the purpose of ECMP route selection,
   assuming that the IGP cost alone should be sufficient for loop
   prevention.  This assumption may not hold when an IGP is not
   deployed, and instead iBGP sessions are configured to reset the
   NEXT_HOP attribute to "self" on every node.  This also assumes the
   use of directly connected link addresses for session formation.  In
   this case, ignoring the CLUSTER_LIST length might lead to routing
   loops.  It is therefore recommended for implementations to have a
   knob that enables accounting for the CLUSTER_LIST length when
   performing multipath route selection.  Effectively, the CLUSTER_LIST
   attribute length should be used as an IGP metric.

   Similarly to the route-reflector scenario, the use of BGP
   confederations in multipath scenarios assumes the presence of an IGP
   for proper loop prevention and uses the IGP metric as the final tie-
   breaker for multipath routing.  In addition to that, and similar to
   the eBGP case, implementations often require that, in order to be
   considered equal, the paths belong to the same peer member AS as the
   best-path.  It is useful to have the following two configuration
   knobs: one enabling "multipath same confederation member peer-as",
   and another enabling the less restrictive "confed as-path multipath
   relaxed" rule, which allows selecting multipath routes reachable via
   any confederation member peer AS.  As mentioned above, the
   AS_CONFED_SEQUENCE value length is usually ignored for the purpose of
   AS_PATH length comparison, relying instead on the IGP cost for loop
   prevention.
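
   The CLUSTER_LIST knob recommended earlier in this section can be
   sketched as follows.  This is a hypothetical illustration only: the
   function name, the knob name, and the tuple layout are assumptions.
   When the knob is set, the CLUSTER_LIST length takes the place of the
   IGP metric in the comparison.

```python
def multipath_candidates(paths, account_cluster_list=False):
    """Keep the paths that tie on the metric in use.

    `paths` is a list of (next_hop, igp_cost, cluster_list) tuples;
    this layout is illustrative only.
    """
    def metric(path):
        _next_hop, igp_cost, cluster_list = path
        if account_cluster_list:
            # Knob enabled: CLUSTER_LIST length stands in for IGP cost.
            return len(cluster_list)
        return igp_cost

    lowest = min(metric(p) for p in paths)
    return [p for p in paths if metric(p) == lowest]
```

   In a design without an IGP, where every path has the same cost to its
   directly connected next hop, the default behaviour would accept paths
   that have traversed different numbers of route-reflector clusters,
   while the knob keeps only those with the shortest CLUSTER_LIST.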

   In cases when an IGP is not present in a BGP confederation
   deployment, and similar to the route-reflection case, it may be
   necessary to consider the AS_CONFED_SEQUENCE length when selecting
   the equivalent routes, effectively using it as a substitute for an
   IGP metric.  A separate configuration knob is needed to allow this
   behavior.

   Per [RFC5065], paths learned over BGP intra-confederation peering
   sessions are treated as iBGP.  There is no specification or
   operational document that defines how mixed iBGP route-reflector and
   confederation based deployments would work together.  Therefore, this
   document does not make recommendations for the above case.

5.  Multipath among eBGP and iBGP paths

   The best-path selection algorithm explicitly prefers eBGP paths over
   iBGP paths and over paths learned from a BGP confederation member AS,
   which, per [RFC5065], are treated the same as iBGP from the
   perspective of best-path selection.  In some cases, however, it might
   be beneficial to allow multipath routing between eBGP and iBGP
   learned paths.  This is only possible if some sort of tunneling
   technique is used to reach both the eBGP and iBGP paths.  If this
   feature is enabled, the equal routes are selected prior to the MED
   comparison step (c) in Section 9.1.2.2 of [RFC4271].

6.  Multipath with AIGP

   The AIGP attribute defined in [RFC7311] must be used for best-path
   selection prior to running any logic of Section 9.1.2.2 of [RFC4271].
   Only the paths with the minimal value of the AIGP metric are eligible
   for further consideration by the tie-breaking rules.  The rest of the
   multipath selection logic remains the same.

7.  Best path advertisement

   Even though multiple equal paths may be selected for programming into
   the routing table, a BGP speaker announces only a single best-path to
   its peers, unless the BGP "Add-Path" feature described in [RFC7911]
   is enabled.  The unique best-path is elected among the multi-path set
   using the standard tie-breaking rules.

8.  Multipath and non-deterministic tie-breaking

   Some implementations may implement non-standard tie-breaking logic,
   for example using the oldest path rule (reference).  This is
   generally not recommended, and may interact with multi-path route
   selection on downstream BGP speakers.  That is, after a route flap
   that affects the best-path upstream, the original best path would not
   be recovered, and the older path would still be advertised, possibly
   affecting the tie-breaking rules on the downstream device if, for
   example, the AS_PATH contents differ from those previously
   advertised.

9.  Weighted equal-cost multipath

   The proposal in [I-D.ietf-idr-link-bandwidth] defines conditions
   where the iBGP multipath feature might inform the routing table of
   "weights" associated with the multiple external paths.
   [I-D.ietf-idr-link-bandwidth] defines the weight extended community
   attribute as non-transitive and considers its applicability in the
   iBGP case only, though there are implementations that apply it to
   eBGP as well.  The proposal does not change the equal-cost multipath
   selection logic, but associates additional load-sharing attributes
   with equivalent paths.

10.  Informative References

   [BGPMP]    "BGP Best Path Selection Algorithm",
              <http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13753-25.html>.

   [I-D.ietf-idr-link-bandwidth]
              Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
              Extended Community", draft-ietf-idr-link-bandwidth-07
              (work in progress), March 2018.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon,
              "Multiprotocol Label Switching Architecture", RFC 3031,
              DOI 10.17487/RFC3031, January 2001,
              <https://www.rfc-editor.org/info/rfc3031>.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006,
              <https://www.rfc-editor.org/info/rfc4271>.

   [RFC4451]  McPherson, D. and V. Gill, "BGP MULTI_EXIT_DISC (MED)
              Considerations", RFC 4451, DOI 10.17487/RFC4451,
              March 2006, <https://www.rfc-editor.org/info/rfc4451>.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006,
              <https://www.rfc-editor.org/info/rfc4456>.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065,
              DOI 10.17487/RFC5065, August 2007,
              <https://www.rfc-editor.org/info/rfc5065>.

   [RFC7311]  Mohapatra, P., Fernando, R., Rosen, E., and J. Uttaro,
              "The Accumulated IGP Metric Attribute for BGP", RFC 7311,
              DOI 10.17487/RFC7311, August 2014,
              <https://www.rfc-editor.org/info/rfc7311>.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016,
              <https://www.rfc-editor.org/info/rfc7911>.

Authors' Addresses

   Petr Lapukhov
   Facebook
   1 Hacker Way
   Menlo Park, CA  94025
   US

   Email: petr@fb.com


   Jeff Tantsura
   Apstra, Inc.
   Menlo Park, CA  94025
   US

   Email: jefftant.ietf@gmail.com