<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- $Id$ -->
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [
	  <!ENTITY rfc2119 PUBLIC ''
		   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
	  ]>
<?rfc toc="yes"?>
<?rfc tocompact="no"?>
<?rfc tocdepth="6"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc strict="yes" ?>


<rfc category="info" docName="draft-mohanty-bess-mutipath-interas-01"
     ipr="trust200902" updates="">

  <front>
    <title abbrev="BGP Multipath in Inter-AS Option-B">
  BGP Multipath in Inter-AS Option-B    
    </title>

    <author fullname="Satya Ranjan Mohanty" initials="S R."
            surname="Mohanty">
      <organization>Cisco Systems</organization>
      <address>
        <postal>
          <street>170 W. Tasman Drive</street>
          <street/>
          <city>San Jose</city>
          <code>95134</code>
          <region>CA</region>
          <country>USA</country>
        </postal>
        <email>satyamoh@cisco.com</email>
      </address>
    </author>

  <author fullname="Arjun Sreekantiah" initials="A.S."
            surname="Sreekantiah">
      <organization>Cisco Systems</organization>
      <address>
        <postal>
          <street>170 W. Tasman Drive</street>
          <city>San Jose</city>
          <region>CA</region>
          <code>95134</code>
          <country>USA</country>
        </postal>
        <email>asreekan@cisco.com</email>
      </address>
  </author>

<author fullname="Dhananjaya Rao" initials="D.R."
            surname="Rao">
      <organization>Cisco Systems</organization>
      <address>
        <postal>
          <street>170 W. Tasman Drive</street>
          <city>San Jose</city>
          <region>CA</region>
          <code>95134</code>
          <country>USA</country>
        </postal>
        <email>dhrao@cisco.com</email>
      </address>
  </author>
<author fullname="Keyur Patel" initials="K.P."
            surname="Patel">
      <organization>Arrcus, Inc</organization>
      <address>
        <email>keyur@arrcus.com</email>
      </address>
  </author>
    <date month="September" day="10" year="2017"/>
    <area>Routing</area>

    <workgroup>BESS WorkGroup</workgroup>

    <abstract>
      <t>
	By default, the Border Gateway Protocol (BGP) installs only the best-path in the IP Routing Table. BGP multi-path is a well-known feature that enables installation of multiple paths in the IP Routing Table in order to load-balance forwarded traffic. For a path to be eligible as a multi-path, certain criteria need to be fulfilled. Inter-AS VPNs are commonly deployed to span organizations across Service Provider boundaries. This document describes an issue relating to multi-path load balancing that can arise in an Inter-AS Option B deployment. With the help of a representative topology, we illustrate the problem and then present two simple schemes that solve it. We also note, as a matter of independent interest, that the same underlying issue applies to deployments that employ next-hop-self behavior (implicit or explicit) downstream and the multi-path feature upstream.
      </t>
    </abstract>

  </front>

  <middle>

<section anchor="Intro" title="Introduction">
<t>
By default, BGP <xref target="RFC4271"></xref> advertises only the best-path to a peer and installs only the best-path in the IP Routing Table (RIB), and thereby in the Forwarding Information Base (FIB). BGP multi-path is a feature whereby more than one received BGP route, rather than only the one corresponding to
the BGP best-path, is installed in the RIB and the FIB. This offers load balancing, more efficient utilization of resources network-wide, and higher throughput for traffic flows than a single path would provide. It also provides redundancy in case one of the BGP paths is withdrawn because a link goes down or some other event occurs. Vendors often provide a configurable knob that dictates how many paths to a given destination can be installed in the forwarding table.
</t>

<t>
BGP multi-path is widely deployed in practice and, when augmented with the Demilitarized Zone Link Bandwidth (DMZ LB) extended community <xref target="I-D.ietf-idr-link-bandwidth"/>, can be used to provide unequal-cost load balancing under operator control.
</t>
      
<t>The BGP best-path algorithm <xref target="RFC4271"></xref> proceeds through a well-known, deterministic sequence of comparisons to determine the best-path.
Typically, a path is deemed eligible as a multi-path if it ties with the best-path at every step of that algorithm up to and including the comparison of the IGP cost (metric) to the BGP next-hop.
In addition, two paths that match on all criteria up to the IGP metric but have the same next-hop IP address cannot both be considered as multi-paths. This holds regardless of whether the paths are learned via EBGP or IBGP.
</t>

<t>
In this document we point out an issue, arising out of the above restrictions, that limits the benefits of multi-path deployments when the BGP path is propagated across Inter-AS Option B <xref target="RFC4364"></xref> Autonomous System Boundary Routers (ASBRs).
</t>
       <figure anchor="figure_example">
           <artwork>
                               \      /
                                \    /
      |----PE1----|             |    |
      |           |             |    |
CE1---|           RR-------ASBR1------ASBR2------PE3
      |----PE2----|             |    |
                                |    |
          AS 100                             AS 200
           </artwork>
           <postamble>Inter-AS Option B.</postamble>
       </figure>
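<t>
The multi-path eligibility rule described earlier in this section can be illustrated with a minimal Python sketch. It is not taken from any implementation: the Path fields, the function name multipaths, and the max_paths parameter are assumptions made for illustration, and the sketch assumes that all candidate paths already tie on every best-path step that precedes the IGP metric comparison.
</t>
<figure>
<artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    """Illustrative subset of BGP path attributes (assumed names)."""
    next_hop: str    # BGP next-hop address or router name
    igp_metric: int  # IGP cost to reach next_hop


def multipaths(paths: List[Path], max_paths: int = 2) -> List[Path]:
    """Return the paths that would be installed for one prefix.

    Assumes the paths tie on all best-path criteria that are
    evaluated before the IGP metric.
    """
    if not paths:
        return []
    # Stand-in for best-path selection: lowest IGP metric wins.
    best = min(paths, key=lambda p: p.igp_metric)
    chosen = [best]
    for p in paths:
        if len(chosen) >= max_paths:
            break
        # Only paths that tie with the best-path are candidates.
        if p is best or p.igp_metric != best.igp_metric:
            continue
        # Two paths with the same next-hop cannot both be installed.
        if any(c.next_hop == p.next_hop for c in chosen):
            continue
        chosen.append(p)
    return chosen
```
]]></artwork>
</figure>
<t>
Under these assumptions, two equal-metric paths with next-hops PE1 and PE2 are both returned, whereas two equal-metric paths that share one next-hop yield only a single installed path.
</t>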

 </section>  
<section anchor="Req" title="Requirements Language">
	<t>The key words &quot;MUST&quot;, &quot;MUST NOT&quot;,
          &quot;REQUIRED&quot;, &quot;SHALL&quot;,
          &quot;SHALL NOT&quot;, &quot;SHOULD&quot;,
          &quot;SHOULD NOT&quot;, &quot;RECOMMENDED&quot;,
          &quot;MAY&quot;, and &quot;OPTIONAL&quot; in this document
          are to be interpreted as described in <xref target="RFC2119"/>.</t>
</section> 

<section anchor="deploy" title="Topology notation">
   <t>In <xref target="figure_example"/> above, we consider a typical Inter-AS Option B topology,
with ASBR1 peering with ASBR2 over the inter-AS EBGP link.
A VPN, denoted vpn, has a presence in both Autonomous Systems, on all the PE routers shown;
that is, a Virtual Routing and Forwarding (VRF) table associated with vpn exists at each
of the Provider Edge (PE) routers shown.
A dual-homed CE, CE1, peers with both PE1 and PE2 in the context
of VRF VRF1.</t>

<t>Denote the Route Distinguisher (RD) of the VRF VRF1 configured on PE1 by RD1, and
the Route Distinguisher of the VRF VRF2 configured on PE2 by RD2.
Assume that CE1 advertises an IPv4 prefix p. At ASBR1, the received VPN routes
are then RD1:p and RD2:p, with next-hops PE1 and PE2 and VPN
(service) labels L1 and L2, respectively.
</t>
</section> 
<section anchor="problem-description" title="Problem Description">
<t>
As per EBGP rules, the advertising ASBR, ASBR1, resets the next-hop to
itself. The two routes RD1:p and RD2:p are therefore advertised to the
receiving AS, AS 200, with the mandatory next-hop attribute pointing to ASBR1.</t>

<t>Suppose that the swapped labels advertised by ASBR1 for RD1:p and RD2:p are L1 and L2, respectively.
If ASBR2 does not reset the next-hop (the usual behavior), the two paths are received
at PE3 with the same next-hop, namely ASBR1.
If ASBR2 does reset the next-hop, the two paths are received
at PE3 with the next-hop set to ASBR2.</t>

<t>In either case, the two paths received at PE3 have the same next-hop, even though the
labels are different. As explained earlier, if two received BGP paths have the same next-hop,
they cannot both be eligible as multi-paths at the same time.
This means that at PE3 only one of the routes
will be installed in the forwarding table.</t>

<t>In <xref target="figure_example"/> above, even though the advertising AS (AS 100) has path redundancy, this is not visible to
AS 200, and therefore traffic from AS 200 cannot be load-balanced across PE1 and PE2 at ASBR1.
Note that this is different from the classic same-RD problem that one often encounters in the
Route-Reflector context.</t>
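<t>
The effect described in this section can be traced with a small Python sketch. The VpnRoute type and the ebgp_advertise helper are hypothetical names introduced only for illustration. The sketch models the case in which ASBR2 leaves the next-hop unchanged, and it reuses the names L1 and L2 for the swapped labels, as the text above does.
</t>
<figure>
<artwork><![CDATA[
```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VpnRoute:
    """Illustrative VPN route: RD, prefix, next-hop and label."""
    rd: str
    prefix: str
    next_hop: str
    label: str


def ebgp_advertise(route: VpnRoute, asbr: str,
                   swapped_label: str) -> VpnRoute:
    """Model an Option B ASBR: next-hop-self plus label swap."""
    return replace(route, next_hop=asbr, label=swapped_label)


# Routes as received at ASBR1 from PE1 and PE2 (via the RR).
at_asbr1 = [
    VpnRoute("RD1", "p", "PE1", "L1"),
    VpnRoute("RD2", "p", "PE2", "L2"),
]

# Routes as received at PE3, assuming ASBR2 keeps the next-hop.
# The swapped labels are again called L1 and L2, as in the text.
at_pe3 = [ebgp_advertise(r, "ASBR1", r.label) for r in at_asbr1]

next_hops_in_as100 = {r.next_hop for r in at_asbr1}
next_hops_at_pe3 = {r.next_hop for r in at_pe3}
print(sorted(next_hops_in_as100), sorted(next_hops_at_pe3))
```
]]></artwork>
</figure>
<t>
Run as written, the sketch prints ['PE1', 'PE2'] ['ASBR1']: the two distinct next-hops that exist inside AS 100 collapse into a single next-hop at PE3, while the labels remain distinct.
</t>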

  </section>

<section anchor="add-path" title="BGP Add-Path with the non-unique RD case">
<t>
The above scenario is described in the context of the unique-RD case.
Now consider the case in which the same (non-unique) RD is configured for the VPN's VRF at PE1 and PE2, and
BGP Add-Path <xref target="RFC7911"></xref> is used to propagate both paths to AS 200 via the RR, ASBR1, and ASBR2.
In this case, ASBR1 resets the next-hop to itself in both of the add-paths, with the result that the two add-paths
cannot be installed as primary and backup in the FIB at PE3 in AS 200.
</t>
</section>

<section anchor="BGP-LU" title="BGP Labeled unicast with Add-Path">
<t>
A similar situation exists for non-VPN labeled traffic.
Figure 2 shows a simple EBGP topology in which R1 is in AS 1, R2 and R3 are in AS 2, R4 is in AS 3, and R5 is in AS 4.
A labeled unicast <xref target="RFC3107"></xref> prefix, p, is being advertised from R1 to R5.
Add-Path is configured at R4 and R5, and the capability is negotiated.
Both R2 and R3 will set the next-hop to themselves.
When R4 receives the prefix p from R2 and R3, the situation is similar to the Add-Path scenario for the VPN case described in the previous section. As a result, only one of the paths will be advertised to R5.
</t>
       <figure anchor="figure_example2">
           <artwork>
                               
                                
      |===== R2 =====|           
      |              |             
R1----|              R4------R5
      |              |
      |===== R3 =====|        
-AS1 -| - - AS2 - -|-AS3-|--AS4
                                       
           </artwork>
           <postamble>EBGP topology for labeled unicast with Add-Path.</postamble>
       </figure>

</section>
   
      <section anchor="solution1" title="BGP Multi-path Inter-AS Solution 1">
       <t>
The first solution is to determine the uniqueness of a path by considering
the tuple (next-hop, label) rather than the next-hop alone.
For the two paths received at PE3, this translates to (ASBR1, L1) and (ASBR1, L2), which can therefore be distinguished.
However, many existing deployments today consider only the next-hop as the key.
This solution therefore requires a software upgrade in existing deployments.
Independently, making the key depend on the label should have no effect on hashing or on the weights assigned to the paths in the FIB.
	</t>
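<t>
A minimal Python sketch of the difference between the two keys follows, using the case from the Problem Description in which both paths arrive at PE3 with next-hop ASBR1. The function name select_multipaths and the use_label flag are hypothetical, and the sketch assumes that the paths tie on all other multi-path criteria.
</t>
<figure>
<artwork><![CDATA[
```python
from typing import Iterable, List, Tuple

PathKey = Tuple[str, str]  # (next-hop, label)


def select_multipaths(paths: Iterable[PathKey],
                      use_label: bool) -> List[PathKey]:
    """Keep the first path seen for each distinct key.

    The key is the next-hop alone, or the (next-hop, label)
    tuple when use_label is True.
    """
    seen = set()
    kept = []
    for next_hop, label in paths:
        key = (next_hop, label) if use_label else next_hop
        if key not in seen:
            seen.add(key)
            kept.append((next_hop, label))
    return kept


# Paths for RD1:p and RD2:p as received at PE3.
received_at_pe3 = [("ASBR1", "L1"), ("ASBR1", "L2")]

by_next_hop = select_multipaths(received_at_pe3, use_label=False)
by_tuple = select_multipaths(received_at_pe3, use_label=True)
```
]]></artwork>
</figure>
<t>
With the next-hop as the only key, by_next_hop holds a single path; with the tuple as the key, by_tuple holds both paths, so both can be installed.
</t>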
      </section>
      <section anchor="solution2" title="BGP Multi-path Inter-AS Solution 2">
       <t>
The second solution is to inject two loopback IP addresses at ASBR1 into the IBGP of the receiving AS, corresponding
to the configured IP addresses or loopbacks of PE1 and PE2 that appear in the next-hop attribute of the VPN routes
RD1:p and RD2:p. These loopback addresses need to be injected into the IGP of the receiving AS.
ASBR2 also needs to be configured with a static route pointing to ASBR1 for this purpose.
Alternatively, ASBR1 can redistribute these loopbacks into EBGP.
This is also equivalent to doing next-hop-self.
The above solution does not require any software upgrade.
However, it requires the implementation to support policy, and it may have security implications because routes need to be leaked from one AS to the other.
	</t>
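<t>
The following Python sketch illustrates one possible reading of this solution, in which each VPN route reaches the receiving AS with a distinct next-hop address (one per originating PE) that is resolvable there. The addresses come from a documentation range, and every name in the sketch, including the contents of igp_as200, is an assumption made for illustration rather than a description of any deployment.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress

# Hypothetical next-hop addresses, one per originating PE.
NEXT_HOP_FOR_PE = {"PE1": "192.0.2.1", "PE2": "192.0.2.2"}

# Routes as seen in the receiving AS under this reading.
advertised = [
    {"route": "RD1:p", "next_hop": NEXT_HOP_FOR_PE["PE1"]},
    {"route": "RD2:p", "next_hop": NEXT_HOP_FOR_PE["PE2"]},
]

# Prefixes assumed to be present in the IGP of AS 200, e.g. after
# a static route at ASBR2 or EBGP redistribution by ASBR1.
igp_as200 = [ipaddress.ip_network("192.0.2.0/30")]


def resolvable(next_hop: str) -> bool:
    """True if the receiving AS has a route covering next_hop."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in net for net in igp_as200)


usable = [r for r in advertised if resolvable(r["next_hop"])]
distinct_next_hops = {r["next_hop"] for r in usable}
```
]]></artwork>
</figure>
<t>
In this sketch both routes resolve and carry different next-hops, so an implementation that keys only on the next-hop could treat them as multi-paths. If the loopbacks were not leaked into the receiving AS, neither next-hop would resolve and neither path would be usable.
</t>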
      </section>
   <section anchor="proto"
               title="Protocol Considerations">
   <t>No protocol changes are necessary.
   </t>
   </section>


      <section anchor="Oper"
               title="Operational Considerations">
      <t>
        Either of the two methods above can be adopted. These solutions are also applicable to EVPN <xref target="RFC7432"></xref>.
      </t>
      </section> <!-- EO Oper -->



    <section anchor="Security"
             title="Security Considerations">
      <t>
	This document raises no new security issues for L3VPN.
      </t>

    </section> <!-- EO Security -->



    <section anchor="Acknowledgements"
             title="Acknowledgements">
      <t>
	The authors would like to thank Yuri Tsier for his feedback and
        useful discussions.
      </t>

    </section> <!-- Ack -->


  </middle>
  <back>
    <references title="Normative References">



      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4271.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.7432.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4360.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-idr-extcomm-iana-02.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-idr-link-bandwidth-06.xml"?>
    </references>

    <references title="Informative References">
            <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4364.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6624.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.7911.xml"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3107.xml"?>
    </references>

  </back>
</rfc>
