<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc category="std"
     docName="draft-idr-bgp-rt-oscillation-01"
     ipr="pre5378Trust200902"
     updates="4684">

  <front>

    <title abbrev="Route Oscillation in BGP RT-Constraint">
      Persistent Route Oscillation in BGP Constrained Route Distribution</title>

    <author fullname="Dongling Duan" initials="D." surname="Duan">
      <organization>Cisco</organization>
      <address>
        <postal>
          <street>170 W. Tasman Drive</street>
          <city>San Jose</city>
          <region>CA</region>
          <code>95134</code>
          <country>USA</country>
        </postal>
        <email>duan@cisco.com</email>
      </address>
    </author>
    <author fullname="Jakob Heitz" initials="J." surname="Heitz">
      <organization>Cisco</organization>
      <address>
	<postal>
	  <street>170 West Tasman Drive</street>
	  <city>San Jose</city>
	  <region>CA</region>
	  <code>95054</code>
	  <country>USA</country>
	</postal>
	<email>jheitz@cisco.com</email>
      </address>
    </author>
    <author fullname="Keyur Patel" initials="K." surname="Patel">
      <organization abbrev="Arrcus">Arrcus, Inc</organization>
      <address>
	<email>keyur@arrcus.com</email>
      </address>
    </author>
    <author fullname="Jeffrey Haas" initials="J." surname="Haas">
      <organization>Juniper Networks</organization>
      <address>
        <postal>
          <street>1194 N. Mathilda Ave</street>
          <city>Sunnyvale</city>
          <region>CA</region>
          <code>94089</code>
          <country>USA</country>
        </postal>
        <email>jhaas@juniper.net</email>
      </address>
    </author>

    <date/>

    <!-- Meta-data Declarations -->

    <area>General</area>

    <workgroup>IDR</workgroup>

    <keyword>BGP RT-Constraint</keyword>

<abstract>
<t>
RFC 4684 defines Multi-Protocol BGP (MP-BGP) procedures that allow BGP speakers
to exchange Route Target reachability information (RT-Constrain) to restrict the
propagation of Virtual Private Network (VPN) routes. In network scenarios where
hierarchical route reflection (RR) is used, the existing RT-Constrain mechanism may
result in persistent route oscillation among the route reflectors. This document
describes the problem and proposes a solution.
</t>
</abstract>

  </front>

  <middle>

<section anchor="introduction" title="Introduction">
<t>
<xref target="RFC4684"/> defines Multi-Protocol BGP (MP-BGP) procedures that allow BGP speakers
to exchange Route Target reachability information to restrict the propagation of
Virtual Private Network (VPN) routes.
</t>

<t>
Section 3.2 of <xref target="RFC4684"/> defines a route advertisement rule for Route Target
membership information. When advertising an RT membership NLRI to a non-client
peer, if the best path as selected by the path selection procedure described in
Section 9.1 of <xref target="RFC4271"/> is a route received from a non-client
peer, and if there is an alternative path to the same destination from a client
peer, then the attributes of the client path are advertised to the peer.
<xref target="RFC4684"/> does not specify which path to choose when there are
multiple client paths to the same destination.
</t>
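<t>
As an illustration only, the advertisement rule above can be sketched in Python.
The Path class and the client_candidates function are hypothetical names
introduced for this sketch; they are not defined by <xref target="RFC4684"/>.
The sketch returns every path whose attributes the rule would permit, which
makes the ambiguity visible when more than one client path exists.
</t>
<figure>
<artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    """Hypothetical model of one received RT membership path."""
    peer: str          # peer the path was learned from
    from_client: bool  # True if that peer is an RR client


def client_candidates(best: Path, paths: List[Path]) -> List[Path]:
    """Return the paths whose attributes may be advertised to a
    non-client peer under the rule described above."""
    clients = [p for p in paths if p.from_client]
    if not best.from_client and clients:
        # Best path is non-client and client paths exist: any
        # client path qualifies; the rule does not say which one.
        return clients
    return [best]


# Assumed example: two client paths and one non-client best path.
pe1 = Path("PE1", True)
pe2 = Path("PE2", True)
rr1 = Path("RR1", False)
candidates = client_candidates(rr1, [pe1, pe2, rr1])
print([p.peer for p in candidates])  # ['PE1', 'PE2']
```
]]></artwork>
</figure>
<t>
In this assumed example two client paths qualify, and nothing in the rule as
sketched selects between them.
</t>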

<t>
In network scenarios where hierarchical route reflection (RR) is used and
multiple such client paths exist, persistent route oscillation can occur,
depending on which client path's attributes are advertised to the non-client peers.
</t>
</section> <!-- for Introductions section -->

<section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref target="RFC2119"/>.
</t>
</section>


<section title="Problem Statement - Persistent Route Oscillations">
<t>
     
            <figure align="center">
        <artwork align="left"><![CDATA[
 

                 44.44.0.39          44.44.0.199  
                      +-----+        +-----+                                           
                      | RR1 |        | RR2 |                                           
                      +----\+        +-/---+                                            
 (900 to PE1 and PE2)   |     \     /   |  (900 to PE1 and PE2)                                    
                        |       \ /     |                                              
                        |       / \     |                                                 
                        |     /     \   |                                             
                      +-----/        +-----+                                            
      44.44.2.196 ->  | RR3 |        | RR4 | <- 44.44.1.193                                  
                      +-----+        +-----+                                           
                (3040)->|     \     /   |<-(3030)                                       
                        |       \ /     |                                       
                        |       / \     |                                       
                        |   (3040)(3030)|                                       
                      +-----+        +-----+                                      
       55.55.0.42 ->  | PE1 |        | PE2 | <- 55.55.0.43
                      +-----+        +-----+                                        
                         |               |                                           
                       RT-1           RT-1     
  
     Figure 1. RT-Constrain with Hierarchical Route-reflector

         ]]></artwork>
      </figure>
                                
   
</t>
<t>
In Figure 1, hierarchical Route Reflectors (RRs) are deployed. RR3 and RR4 are the first-level
Route Reflectors, and RR1 and RR2 are the second-level Route Reflectors.
Each RR uses its own router-id as its cluster-id. PE1 and PE2 are Route Reflector clients of
RR3 and RR4, while RR3 and RR4 are Route Reflector clients of RR1 and RR2. Both PE1 and PE2
advertise the route-target information RT-1 to the first-level Route Reflectors RR3 and RR4. RR3
and RR4 in turn advertise route-target information to the second-level Route Reflectors RR1
and RR2. The numbers in parentheses are next hop metrics.
</t>

<t>
At step #1, RR3 has two paths for RT-1: one from PE1 and the other from PE2. The path from
PE1 has next hop metric 3040 and the path from PE2 has next hop metric 3030. RR3 and RR4 each
select the path from PE2 as the best path because it has the lower metric. RR3 and RR4 advertise
their best paths for RT-1 to the second-level Route Reflectors RR1 and RR2.
</t>

<t>
At step #2, RR1 has two paths for RT-1: one from RR3 and the other from RR4.
The next hop metrics to reach PE1 and PE2 are both 900.
On RR1 and RR2, if both paths have the same ORIGINATOR_ID,
the lower peer address is used to select the best path.
Here both paths originate from PE2, so RR1 selects the path from RR4 as the best path
because RR4 has a lower peer address than RR3.
RR1 advertises RT-1 back to RR3.
</t>
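<t>
The peer address comparison at step #2 can be sketched as follows. This is a
minimal illustration that assumes the addresses shown next to RR3 and RR4 in
Figure 1 are the addresses RR1 uses to peer with them; lower_peer is a
hypothetical helper name.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress

# Peer addresses taken from Figure 1 (assumed to be the addresses
# RR1 uses to peer with RR3 and RR4).
PEERS = {"RR3": "44.44.2.196", "RR4": "44.44.1.193"}


def lower_peer(peers):
    """Return the name of the peer with the numerically lowest
    IPv4 address."""
    return min(peers, key=lambda n: ipaddress.IPv4Address(peers[n]))


print(lower_peer(PEERS))  # RR4
```
]]></artwork>
</figure>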

<t>
When announcing RT-1 to its client (RR3), RR1 sets the ORIGINATOR_ID to its own router-id,
according to Section 3.2.i of <xref target="RFC4684"/>.
</t>

<t>
At step #3, RR3 has four paths: the first from PE1, the second from PE2, the third
from RR1 and the fourth from RR2. For the purposes of this discussion, the path from
RR2 is equivalent to that from RR1; the result is the same whichever is chosen.
RR3 selects the non-client path from RR1 to PE2 as the best path because its
ORIGINATOR_ID is lower than that of the paths from PE1 and PE2.
Since client paths to RT-1 are available, RR3 advertises the path attributes of a
client path to RR1, according to Section 3.2.ii of <xref target="RFC4684"/>.
RR3 could choose the path from either PE1 or PE2.
Suppose RR3 arbitrarily chooses the path attributes of the RT-1 path from PE1.
</t>

<t>
At step #4, RR1 receives the update and recalculates the best path.
RR1 has a path from RR4 with ORIGINATOR_ID set to PE2's router-id and a path from RR3
with ORIGINATOR_ID set to PE1's router-id.
RR1 selects the path from RR3 as the best path because of its lower ORIGINATOR_ID.
RR1 sets the ORIGINATOR_ID to its own router-id and sends the path back to RR3.
</t>
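<t>
The ORIGINATOR_ID comparisons at steps #3 and #4 can be sketched as follows.
This is an illustration only: it assumes that the addresses shown in Figure 1
are the router-ids of the respective routers, and it leaves out RR2 because
its path is treated as equivalent to that of RR1. ROUTER_ID and
lowest_originator are hypothetical names.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress

# Router-ids assumed equal to the addresses shown in Figure 1.
ROUTER_ID = {
    "RR1": "44.44.0.39",
    "PE1": "55.55.0.42",
    "PE2": "55.55.0.43",
}


def lowest_originator(paths):
    """paths maps the advertising peer to the originator of its
    path; return the peer whose ORIGINATOR_ID is lowest."""
    def originator_id(peer):
        return ipaddress.IPv4Address(ROUTER_ID[paths[peer]])
    return min(paths, key=originator_id)


# Step #3 on RR3: the path from RR1 carries RR1's router-id.
step3 = {"PE1": "PE1", "PE2": "PE2", "RR1": "RR1"}
# Step #4 on RR1: RR3 now advertises PE1's path attributes.
step4 = {"RR3": "PE1", "RR4": "PE2"}

print(lowest_originator(step3))  # RR1
print(lowest_originator(step4))  # RR3
```
]]></artwork>
</figure>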

<t>
At step #5, RR3 receives the update from RR1 and drops it because its own cluster-id
is in the CLUSTER_LIST.
RR3's routing state thus returns to that of step #1, with two paths from its clients,
and the whole cycle starts again.
</t>
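<t>
The cluster-id check at step #5 can be sketched as follows. The sketch assumes
that the addresses in Figure 1 are the router-ids (and therefore the
cluster-ids) of RR1 and RR3, and that the CLUSTER_LIST holds RR3's cluster-id
followed by RR1's prepended cluster-id. accept_update is a hypothetical
helper name.
</t>
<figure>
<artwork><![CDATA[
```python
def accept_update(own_cluster_id, cluster_list):
    """Return False when the receiver's own cluster-id already
    appears in the CLUSTER_LIST, i.e. the update has looped."""
    return own_cluster_id not in cluster_list


# Each RR uses its router-id as cluster-id; the values below
# assume the addresses in Figure 1 are the router-ids.
RR3_CLUSTER_ID = "44.44.2.196"
RR1_CLUSTER_ID = "44.44.0.39"

# Assumed CLUSTER_LIST after RR3 reflects the path to RR1 and
# RR1 reflects it back to RR3.
cluster_list = [RR1_CLUSTER_ID, RR3_CLUSTER_ID]

print(accept_update(RR3_CLUSTER_ID, cluster_list))  # False
```
]]></artwork>
</figure>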

<t>
RR4 behaves in the same way as RR3, and RR2 behaves in the same way as RR1.
</t>

<t>
These iterations result in a persistent route oscillation for the RT-1 prefix of the RT-Constrain
address family on RR1, RR2, RR3 and RR4.
</t>

</section>

<section title="Solution">
<t>
The solution is for a Route Reflector to always prefer client paths over non-client paths when
selecting a best path.
This preference MUST be applied before step f) of the tie-breaking rules
in Section 9.1.2.2 of <xref target="RFC4271"/>. It MAY be applied at an earlier step.
With this rule, at step #3 on RR3, the best path is still the path from PE2, and the
oscillation terminates with PE2's path.
</t>
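<t>
A minimal sketch of this preference, as seen by RR3 at step #3, is shown below.
It assumes that the client check is placed ahead of all other comparisons,
which is one of the placements the text above permits, and that the addresses
in Figure 1 are router-ids. The Path class, preference_key and select_best are
hypothetical names, and the metric used for the path from RR1 is a
hypothetical value because none is given for that path.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical model of a path for RT-1 as seen by RR3."""
    peer: str
    from_client: bool
    next_hop_metric: int
    originator_id: str


def preference_key(path):
    # Client paths sort first (False < True), then lower next hop
    # metric, then lower ORIGINATOR_ID as the later tie-breaker.
    return (
        not path.from_client,
        path.next_hop_metric,
        ipaddress.IPv4Address(path.originator_id),
    )


def select_best(paths):
    return min(paths, key=preference_key)


paths_on_rr3 = [
    Path("PE1", True, 3040, "55.55.0.42"),
    Path("PE2", True, 3030, "55.55.0.43"),
    # The metric for the path from RR1 is not given in the text;
    # 0 is a hypothetical value that favors it on metric alone.
    Path("RR1", False, 0, "44.44.0.39"),
]

print(select_best(paths_on_rr3).peer)  # PE2
```
]]></artwork>
</figure>
<t>
Because the client check comes first in this sketch, the non-client path from
RR1 is never selected while a client path exists, regardless of its metric or
ORIGINATOR_ID.
</t>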

<t>
Note that the scenario cannot occur if RR1 and RR2 are in the same cluster. In that case,
at step #3, RR3 has only the two client paths: the update from the top-level Route Reflector is
dropped because of the cluster-id check. The oscillation never happens with such a topology.
</t>
</section>


<!-- Possibly a 'Contributors' section ... -->

<section anchor="IANA" title="IANA Considerations">
<t>
This draft makes no request of IANA.
</t>
</section>

<section anchor="Security" title="Security Considerations">
<t>
This extension to BGP does not change the underlying security issues
inherent in the existing <xref target="RFC4271"/> and <xref target="RFC4684"/>.
</t>
</section>

<section anchor="Acknowledgements" title="Acknowledgements">
<t>
The authors would like to thank Shyam Sethuram, Nitin Kumar, Sameer Gulrajani,
Mohammed Mirza and Mike Dubrovskiy.
</t>
</section>

  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>
      <?rfc include="reference.RFC.4271"?>
      <?rfc include="reference.RFC.4684"?>
    </references>

  </back>
</rfc>
