<?xml version='1.0'?>   
    <!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [ 
        <!ENTITY rfc7432 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7432.xml'> 
        <!ENTITY rfc4601 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4601.xml'> 
        <!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'> 
        ]>
<?rfc toc="yes"?>
<?rfc tocompact="no"?>
<?rfc tocdepth="6"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc strict="yes" ?>
<rfc category="std" docName="draft-sajassi-bess-evpn-per-mcast-flow-df-election-00">
  <!-- ***** FRONT MATTER ***** -->
  <front>
    <title abbrev="Per multicast flow Designated Forwarder Election for EVPN">Per multicast flow Designated Forwarder Election for EVPN</title>
    
    <author initials="Ali" surname="Sajassi" 
    fullname="Ali Sajassi">
      <organization>Cisco Systems</organization>
    
      <address>
       <postal>
         <street>821 Alder Drive,</street>
         
        <region>MILPITAS, CALIFORNIA 95035</region>
        
        <country>UNITED STATES</country>
       </postal>
       
       <phone></phone>
       <email>sajassi@cisco.com</email>
       </address>
    </author>

    <author initials="Mankamana" surname="Mishra" 
    fullname="Mankamana Mishra">
      <organization>Cisco Systems</organization>
    
      <address>
       <postal>
         <street>821 Alder Drive,</street>
         
        <region>MILPITAS, CALIFORNIA 95035</region>
        
        <country>UNITED STATES</country>
       </postal>
       
       <phone></phone>
       <email>mankamis@cisco.com</email>
      </address>
    </author>

    <author initials="Samir" surname="Thoria" 
        fullname="Samir Thoria">
      <organization>Cisco Systems</organization>
        <address>
            <postal>
                <street>821 Alder Drive,</street>

                <region>MILPITAS, CALIFORNIA 95035</region>

                <country>UNITED STATES</country>
            </postal>

            <phone></phone>
            <email>sthoria@cisco.com</email>
        </address>
    </author>

        <author initials="Jorge" surname="Rabadan"
        fullname="Jorge Rabadan">
      <organization>Nokia</organization>
        <address>
            <postal>
                <street>777 E. Middlefield Road</street>

                <region>Mountain View, CA 94043</region>

                <country>UNITED STATES</country>
            </postal>

            <phone></phone>
            <email>jorge.rabadan@nokia.com</email>
        </address>
    </author>

        <author initials="John" surname="Drake"
        fullname="John Drake">
      <organization>Juniper Networks</organization>
        <address>
            <postal>
                <street></street>

                <region></region>

                <country></country>
            </postal>

            <phone></phone>
            <email>jdrake@juniper.net</email>
        </address>
    </author>


    <date year="2018"/>    
    <area>Routing</area>
    <workgroup>BESS WorkGroup</workgroup>
    <abstract>
        <t>
            <xref target="RFC7432"/> describes a mechanism to elect a designated forwarder (DF) 
            at the granularity of (ESI, EVI), which is per VLAN (or per group of VLANs in the case 
            of a VLAN bundle or VLAN-aware bundle service). However, per-VLAN
            granularity is not adequate for some applications. 
            <xref target="I-D.ietf-bess-evpn-ac-df"/> and <xref target="I-D.ietf-bess-evpn-df-election"/> improve the baseline DF election. This document extends the HRW-based drafts 
            (<xref target="I-D.ietf-bess-evpn-ac-df"/> and <xref target="I-D.ietf-bess-evpn-df-election"/>) and 
            further enhances the HRW algorithm to perform DF election at the granularity 
            of (ESI, VLAN, Mcast flow).
        </t>
    </abstract>
  </front>

  <!-- ***** MIDDLE MATTER ***** -->

  <middle>
      <section title="Introduction">
          <t>
            EVPN-based All-Active multi-homing is becoming the basic building 
            block for providing redundancy in next-generation data center 
            deployments as well as in
            service provider access/aggregation networks.
            <xref target="RFC7432"/> defines the role of a designated forwarder (DF)
            as the node in the redundancy group that is responsible for forwarding 
            Broadcast, Unknown unicast, and Multicast (BUM) traffic to that Ethernet 
            Segment (CE device or network) in All-Active multi-homing. 
        </t>
        <t>
              This DF election mechanism allows selecting a DF at the
              granularity of (ES, VLAN) or (ES, VLAN bundle) for BUM traffic.
              Although 
              <xref target="I-D.ietf-bess-evpn-ac-df"/> 
              and <xref target="I-D.ietf-bess-evpn-df-election"/> improve 
              the default DF election procedure, it still does not fit well 
              for some service provider residential applications, where 
              all multicast traffic is delivered on a single VLAN. 

          </t> 
          <figure  >
            <preamble/>
              <artwork ><![CDATA[            

                            (Multicast sources)
                                     |
                                     |
                                   +---+
                                   |CE4|
                                   +---+
                                     |
                                     |
                               +-----+-----+
                  +------------|   PE-1    |------------+
                  |            |           |            |
                  |            +-----------+            |
                  |                                     |
                  |                   EVPN              |
                  |                                     |
                  |                                     |
                  | (DF)                           (NDF)|
            +-----------+                        +-----------+
            |  |EVI-1|  |                        |  |EVI-1|  |
            |   PE-2    |------------------------|   PE-3    |
            +-----------+                        +-----------+
                   AC1  \                       / AC2                     
                         \                     /                     
                          \      ESI-1        /                     
                           \                 /                     
                            \               /                     
                            +---------------+
                            |    CE2        |
                            +---------------+
                                   |
                                   |
                          (Multiple receivers)


                    Figure 1: Multi-homing Network of EVPN for IPTV deployments
                  ]]></artwork>
              <postamble></postamble>
          </figure>     
          <t> Consider the above topology, which shows a residential deployment 
              scenario in which multiple receivers are behind an All-Active 
              multihoming segment. All of the multicast traffic is provisioned 
              on EVI-1. Assume PE-2 gets elected as DF. According to 
              <xref target="RFC7432"/>, PE-2 will be responsible for forwarding 
              multicast traffic to that Ethernet segment. 

              <list style="symbols">
                  <t>
                      Forcing sole data-plane forwarding responsibility on 
                      PE-2 is a limitation of the current DF election mechanism. 
                      The topology
                      in Figure 1 would always have only one of the PEs 
                      elected as DF, irrespective of which current DF election 
                      mechanism is in use (defined in <xref target="RFC7432"/> 
                      or <xref target="I-D.ietf-bess-evpn-ac-df"/>
                      and <xref target="I-D.ietf-bess-evpn-df-election"/>).
                  </t>
                  <t>In the above deployment, one more factor has to be considered: 
                      network bandwidth is shared between multicast and unicast 
                      flows. If, at any given point in time, AC1 already carries 
                      unicast traffic that consumes a large share of its 
                      bandwidth, very limited bandwidth remains 
                      available for multicast flows. Even though the link from PE-3 to CE2 
                      (AC2) is lightly used, the amount of multicast traffic 
                      that can flow through AC1 remains limited.
                  </t> 
              </list>
              In this document, we propose an extension to the HRW-based drafts to 
              allow DF election at the granularity of (ESI, VLAN, Mcast flow), which allows 
              multicast flows to be distributed among the PEs in the redundancy 
              group to share the load.

          </t>
      </section>     

      <section title="Terminology">
          <t> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
              "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
              document are to be interpreted as described in <xref target="RFC2119"/>.
          </t>
          <t>This document follows the EVPN terminology defined in 
              <xref target="RFC7432"/> and the multicast terminology defined in 
              <xref target="RFC4601"/>.
          </t>
      </section>
      <section title="The DF Election Extended Community">
          <t> <xref target="I-D.ietf-bess-evpn-ac-df"/> and
                  <xref target="I-D.ietf-bess-evpn-df-election"/> define an extended 
              community that the PEs in a redundancy group use to 
              agree on which DF election procedure is 
              supported.
              A PE can notify the other participating PEs in the redundancy group of 
              its willingness to support per-multicast-flow DF election 
              by signaling a DF Election Extended Community along with the
              Ethernet Segment route (Route Type 4). 
              This proposal extends the existing extended community defined 
              in <xref target="I-D.ietf-bess-evpn-ac-df"/> and
                  <xref target="I-D.ietf-bess-evpn-df-election"/>. This draft 
              defines a new DF type.
              <list style="symbols">
                  <t>DF type (1 octet) - Encodes the DF Election algorithm 
                      values (between 0 and 255) that the advertising PE 
                      desires to use for the ES.  
                      <list style="symbols">
                          <t> Type 0: Default DF Election algorithm, or 
                              modulus-based algorithms in <xref target="RFC7432"/>.
                          </t>
                          <t> Type 1: HRW algorithm defined in <xref target="I-D.ietf-bess-evpn-ac-df"/> and
                  <xref target="I-D.ietf-bess-evpn-df-election"/>.
                          </t>
                          <t> Type 4: HRW-based per-multicast-flow DF election 
                              (explained in this document).
                         </t>
                         <t> Type 5 - 254: Unassigned 
                         </t>
                         <t> Type 255: Reserved for Experimental Use.
                         </t>
                      </list>
                  </t>
                  <t>  <xref target="I-D.ietf-bess-evpn-ac-df"/> and
                  <xref target="I-D.ietf-bess-evpn-df-election"/> 
                      describe the encoding of capabilities associated with the DF 
                      election algorithm using a Bitmap field. When these 
                      capability bits are set along with DF type 4, 
                      they need to be interpreted in the
                      context of this new DF type 4. For example, consider a 
                      scenario where all PEs in the same redundancy group 
                      (same ES) support both AC-DF and DF type 4 and 
                      receive such indications from the other PEs in the 
                      ES. In this scenario, if a VLAN is not active on a PE, 
                      the DF election procedure on all PEs in the ES 
                      should factor that in and exclude that PE from the 
                      per-multicast-flow DF election.
                  </t>
                  <t> A PE SHOULD attach the DF Election Extended Community to the ES route,
                      and the Extended Community MUST be sent if the ES is locally configured 
                      for per-multicast-flow DF election. Only one DF Election 
                      Extended Community can be sent along with an ES route.
                  </t>
                  <t> When a PE has received the ES routes from all the other PEs for 
                      the ES, it checks whether all of the other PEs have advertised their 
                      capability for the per-multicast-flow DF election procedure. 
                      If all of them have advertised the capability, it performs DF 
                      election based on the per-multicast-flow procedure. However, if 
                      <list style="symbols">
                          <t>there is at least one PE whose ES route (Route Type 4) 
                      does not indicate its capability to perform per-multicast-flow 
                      DF election, or 
                         </t>
                         <t>there is at least one PE that signals Single-Active in its A-D per ES route,
                         </t>
                 </list>
                  then this MUST be treated as an indication that only the default DF election of 
                  <xref target="RFC7432"/> is supported, and the DF election procedure in <xref target="RFC7432"/> MUST be used. 
                  </t>
              </list>
              
          </t>
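          <t> The fallback rule above can be sketched as follows. This is a 
              minimal illustration, not a normative procedure: the 
              EsAdvertisement container and the select_df_type function are 
              hypothetical names, the list of advertisements is assumed to 
              include the local PE, and the sketch covers only the choice 
              between DF type 4 and the default election described in this 
              section.
          </t>
          <figure>
              <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Iterable, Optional

DF_TYPE_DEFAULT = 0         # modulus-based election of RFC 7432
DF_TYPE_PER_MCAST_FLOW = 4  # DF type defined in this document


@dataclass
class EsAdvertisement:
    """What one PE signalled for the ES (hypothetical container)."""
    pe: str
    df_type: Optional[int]       # None: no DF Election Extended Community
    single_active: bool = False  # taken from the A-D per ES route


def select_df_type(adverts: Iterable[EsAdvertisement]) -> int:
    """Return the DF type to run for the ES, given all PEs' adverts."""
    for adv in adverts:
        if adv.df_type != DF_TYPE_PER_MCAST_FLOW or adv.single_active:
            # One non-capable or Single-Active PE forces the default.
            return DF_TYPE_DEFAULT
    return DF_TYPE_PER_MCAST_FLOW
```
]]></artwork>
          </figure>
          <t> For example, if PE-1 and PE-2 both advertise DF type 4 but PE-3 
              advertises no DF Election Extended Community, select_df_type 
              returns 0 and every PE runs the default procedure.
          </t>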
      </section>

          <section title="HRW base per multicast flow EVPN DF election">
              <t> This document is an extension of 
                  <xref target="I-D.ietf-bess-evpn-ac-df"/> and
                  <xref target="I-D.ietf-bess-evpn-df-election"/>, so this draft 
                  does not repeat the description of the HRW algorithm itself.
              </t>
              <t> An EVPN PE discovers the redundancy group based on 
                  <xref target="RFC7432"/>. If the redundancy group consists of 
                  N EVPN PE nodes, then after discovery all PEs build an 
                  unordered list of the IP addresses of all the nodes in the redundancy 
                  group. The procedure defined in this draft does not require 
                  the list of PEs to be ordered. Address(i) denotes the IP address 
                  of the i-th EVPN PE in the redundancy group, where (0 &lt; i &lt;= N). 
              </t>
              <section anchor="v3-df" title="DF election for IGMP (S,G) membership request ">
                  <t> The DF is the PE that has the maximum affinity for (S, G, V, ESI), where 
                      <list style="symbols">
                          <t>S - Multicast Source </t> 
                          <t>G - Multicast Group </t>
                          <t>V - VLAN ID for Ethernet Tag V</t>
                          <t>ESI - Ethernet Segment Identifier</t>
                      </list>
                      In case of a tie, choose the PE whose IP address is numerically 
                      least.
                  </t>
                  <t> The affinity of PE(i) to (S,G,V,ESI) is calculated by the function 
                      affinity(S,G,V,ESI,Address(i)), where (0 &lt; i &lt;= N), PE(i) 
                      is the PE at ordinal i, and Address(i) is the IP address of the PE 
                      at ordinal i:
                      <list style="symbols">
                          <t>affinity(S,G,V,ESI,Address(i)) = (1103515245 * ((1103515245 * Address(i) + 12345) XOR D(S,G,V,ESI)) + 12345) (mod 2^31) </t>
                          <t>D(S,G,V,ESI) = CRC_32(S,G,V,ESI) </t>
                      </list>
                      Here D(S,G,V,ESI) is the 32-bit digest (CRC_32) of the Source 
                      IP, Group IP, VLAN ID for Ethernet Tag V, and ESI. The length of 
                      the Source and Group IP addresses does not matter, since the 
                      digest is always 32 bits and only the low-order 31 bits of the 
                      affinity are kept by the modulo operation. 
                  </t>
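                  <t> The affinity computation and the tie-break rule can be 
                      sketched as follows. This is an illustration under stated 
                      assumptions rather than a normative encoding: this 
                      document does not fix the byte layout fed to CRC_32, so 
                      the sketch assumes the packed Source and Group addresses, 
                      a 4-octet Ethernet Tag, and the 10-octet ESI concatenated 
                      in that order, and it assumes the CRC-32 variant provided 
                      by zlib. The names digest_sg, affinity, and elect_df are 
                      hypothetical.
                  </t>
                  <figure>
                      <artwork><![CDATA[
```python
import ipaddress
import zlib

MULT = 1103515245
INC = 12345
MOD = 2 ** 31


def digest_sg(source, group, vlan, esi):
    """D(S,G,V,ESI): CRC-32 over an assumed layout S | G | V | ESI."""
    data = (ipaddress.ip_address(source).packed
            + ipaddress.ip_address(group).packed
            + vlan.to_bytes(4, "big")   # assumed 4-octet Ethernet Tag
            + esi)                      # 10-octet ESI as bytes
    return zlib.crc32(data) & 0xFFFFFFFF


def affinity(address, digest):
    """affinity(..., Address(i)) with Address(i) taken as an integer."""
    addr = int(ipaddress.ip_address(address))
    return (MULT * ((MULT * addr + INC) ^ digest) + INC) % MOD


def elect_df(pe_addresses, digest):
    """Highest affinity wins; a tie goes to the numerically least IP."""
    return min(
        pe_addresses,
        key=lambda a: (-affinity(a, digest), int(ipaddress.ip_address(a))),
    )
```
]]></artwork>
                  </figure>
                  <t> Because the result depends only on the set of addresses 
                      and the digest, every PE that runs elect_df over the same 
                      unordered list and the same (S,G,V,ESI) arrives at the 
                      same DF without further signaling.
                  </t>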
              </section>

              <section title="DF election for IGMP (*,G) membership request ">
                  <t> For an IGMP membership request in which the source is not known, the 
                      DF is the PE that has the maximum affinity for (G,V,ESI), where 
                      <list style="symbols">
                          <t>G - Multicast Group </t>
                          <t>V - VLAN ID for Ethernet Tag V</t>
                          <t>ESI - Ethernet Segment Identifier</t>
                      </list>
                      In case of a tie, choose the PE whose IP address is numerically 
                      least.
                  </t>
                  <t> The affinity of PE(i) to (G,V,ESI) is calculated by the function 
                      affinity(G,V,ESI,Address(i)), where (0 &lt; i &lt;= N), PE(i) 
                      is the PE at ordinal i, and Address(i) is the IP address of the PE 
                      at ordinal i:
                      <list style="symbols">
                          <t>affinity(G,V,ESI,Address(i)) = (1103515245 * ((1103515245 * Address(i) + 12345) XOR D(G,V,ESI)) + 12345) (mod 2^31) </t>
                          <t>D(G,V,ESI) = CRC_32(G,V,ESI) </t>
                      </list>
                      Here D(G,V,ESI) is the 32-bit digest (CRC_32) of the 
                      Group IP, VLAN ID for Ethernet Tag V, and ESI. The length of 
                      the Group IP address does not matter, since the digest is 
                      always 32 bits and only the low-order 31 bits of the 
                      affinity are kept by the modulo operation. 
                  </t>
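                  <t> To illustrate how per-flow election spreads (*,G) flows 
                      over the redundancy group, the following sketch counts 
                      how many groups each PE becomes DF for. It is only an 
                      illustration: the byte layout fed to CRC_32 (packed 
                      Group address, 4-octet Ethernet Tag, 10-octet ESI) and 
                      the use of the zlib CRC-32 are assumptions, and the 
                      names digest_g, elect_df_star_g, and df_load are 
                      hypothetical.
                  </t>
                  <figure>
                      <artwork><![CDATA[
```python
import ipaddress
import zlib
from collections import Counter


def digest_g(group, vlan, esi):
    """D(G,V,ESI): CRC-32 over an assumed layout G | V | ESI."""
    data = ipaddress.ip_address(group).packed + vlan.to_bytes(4, "big") + esi
    return zlib.crc32(data) & 0xFFFFFFFF


def affinity(address, digest):
    addr = int(ipaddress.ip_address(address))
    return (1103515245 * ((1103515245 * addr + 12345) ^ digest)
            + 12345) % (2 ** 31)


def elect_df_star_g(pes, group, vlan, esi):
    """DF for a (*,G) join: max affinity, tie to the least IP address."""
    d = digest_g(group, vlan, esi)
    return min(pes,
               key=lambda a: (-affinity(a, d), int(ipaddress.ip_address(a))))


def df_load(pes, groups, vlan, esi):
    """Count how many (*,G) flows each PE is DF for."""
    return Counter(elect_df_star_g(pes, g, vlan, esi) for g in groups)
```
]]></artwork>
                  </figure>
                  <t> Calling df_load with the three PE addresses of a 
                      redundancy group and a list of group addresses on one 
                      VLAN returns a per-PE count; how even the split is 
                      depends on the digest values of the groups involved.
                  </t>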
              </section>
              <section anchor="default" title="Default DF election procedure">
                  <t> Even if all of the PEs indicate their ability to participate 
                      in the per-multicast-flow DF election procedure, a 
                      default DF election algorithm is still needed, because per-multicast-flow DF election 
                      applies only to those multicast flows for which a PE has received
                      a membership request. For other BUM traffic, the forwarding plane needs a default 
                      DF election procedure. The HRW-based DF election procedure 
                      defined in 
                      <xref target="I-D.ietf-bess-evpn-ac-df"/> and
                  <xref target="I-D.ietf-bess-evpn-df-election"/> is used as the default in these cases. 
                  </t>
              </section>
          </section>

          <section title="Procedure to use per multicast flow DF election algorithm  ">
              <figure  align="center">
                  <artwork align="center"><![CDATA[

                                     Multicast  Source
                                             |
                                             |
                                             |
                                             |
                                         +---------+
                          +--------------+  PE-4   +--------------+
                          |              |         |              |
                          |              +---------+              |
                          |                                       |
                          |              EVPN CORE                |
                          |                                       |
                          |                                       |
                          |                                       |
                      +---------+        +---------+         +---------+
                      |  PE-1   +--------+   PE-2  +---------+   PE-3  |
                      |  EVI-1  |        |  EVI-1  |         | EVI-1   |
                      +---------+        +---------+         +---------+
                           |__________________|___________________|     
                         AC-1    ESI-1        | AC-2               AC-3
                                         +---------+
                                         |  CE-1   |
                                         |         |
                                         +---------+
                                              |
                                              |
                                              |
                                              |
                                      Multicast Receivers

                      Figure-2 : Multihomed network   
                      ]]></artwork>
                  <postamble></postamble>
              </figure>

              <t> Figure-2 shows a multihomed network in which CE-1 is multihomed 
                  to EVPN PE-1, PE-2, and PE-3. Multiple multicast receivers are behind 
                  the All-Active multihoming segment. 
                  <list style="numbers">
                      <t> PEs connected to the same Ethernet segment can 
                          automatically discover each other through the exchange 
                          of Ethernet Segment routes. This draft does not change 
                          this procedure; it still uses the procedure defined in 
                          <xref target="RFC7432"/>.
                      </t>
                      <t> Each of the PEs in the redundancy group advertises an Ethernet 
                          Segment route with an extended community indicating its 
                          ability to participate in the per-multicast-flow 
                          DF election procedure. Since per-multicast-flow DF election 
                          is not applicable until a PE learns of a 
                          membership request from a receiver, a 
                          default DF election among the PEs in the redundancy 
                          group is needed for BUM traffic. In the initial phase, the 
                          DF election procedure in <xref target="default"/> is used.
 
                      </t>
                      <t> When a receiver starts sending membership requests for (s1,g1),
                          where s1 is the multicast source address and g1 is the multicast 
                          group address, CE-1 could hash a membership request (IGMP join) 
                          to any of the PEs in the
                          redundancy group. Assume it is hashed to PE-2. 
                          <xref target="I-D.ietf-bess-evpn-igmp-mld-proxy"/> defines 
                          the procedure to synchronize IGMP join state 
                          among the PEs of the redundancy group. Each of the PEs now 
                          has information about the membership request (s1,g1), and each 
                          of them runs the DF election procedure in <xref target="v3-df"/> to
                          elect a DF among the participating PEs in the redundancy group. 
                          Assume PE-2 gets elected as DF for multicast flow (s1,g1) and PE-1 is the default DF.
                          <list style="numbers">
                              <t> PE-1 forwarding state would be nDF for flow (s1,g1) and 
                                  DF for all other BUM traffic.</t>
                              <t> PE-2 forwarding state would be DF for flow (s1,g1) and 
                                  nDF for all other BUM traffic.</t>
                              <t> PE-3 forwarding state would be nDF for flow (s1,g1) and 
                                  for all other BUM traffic.</t>
                          </list>
                      </t>
                      <t> As new multicast membership requests arrive, the 
                          same procedure as above is repeated.
                      </t>
                  </list>
              </t>
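              <t> The resulting forwarding state can be modeled with a small 
                  lookup, shown here as a sketch. The function names, the 
                  "other-BUM" key, and the representation of a flow as a tuple 
                  are hypothetical; the election results themselves are taken 
                  as inputs rather than computed.
              </t>
              <figure>
                  <artwork><![CDATA[
```python
def forwarding_roles(pes, default_df, per_flow_df):
    """Map each PE to its role per learned flow and for other BUM traffic.

    per_flow_df maps a flow key such as ("s1", "g1") to its elected DF.
    """
    roles = {}
    for pe in pes:
        state = {flow: ("DF" if df == pe else "nDF")
                 for flow, df in per_flow_df.items()}
        # Traffic without a learned membership uses the default election.
        state["other-BUM"] = "DF" if pe == default_df else "nDF"
        roles[pe] = state
    return roles


def should_forward(pe, flow, default_df, per_flow_df):
    """True if `pe` forwards traffic of `flow` to the Ethernet segment."""
    return per_flow_df.get(flow, default_df) == pe
```
]]></artwork>
              </figure>
              <t> With PE-1 as the default DF and PE-2 elected for (s1,g1), 
                  forwarding_roles reproduces the three states listed in 
                  step 3, and should_forward falls back to the default DF for 
                  any flow without a learned membership request.
              </t>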
          </section>
          <section title="Triggers for DF re-election">
              <t> Multiple triggers can cause DF re-election. 
                  Some of the triggers are 
                  <list style="numbers">
                      <t> Local ES going down due to a physical failure or 
                          configuration change
                      </t>
                      <t> Detection of a new PE through an ES route.   
                      </t>
                      <t> AC going up or down
                      </t>
                  </list>
                  This document does not define any new mechanism to handle DF 
                  re-election; it uses the existing mechanism defined 
                  in <xref target="RFC7432"/>. Whenever any of these triggers occurs,
                  DF re-election is performed and all of the flows are 
                  redistributed among the PEs present in the redundancy group for the ES.
              </t>
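              <t> Re-election after a PE leaves the redundancy group can be 
                  sketched as a full recomputation over the remaining PEs. 
                  The sketch below is an assumption-laden illustration: flows 
                  are represented as opaque byte strings whose zlib CRC-32 
                  stands in for the digest, and elect_all and reelect are 
                  hypothetical names. Because each flow is won by the PE with 
                  the highest affinity, a flow whose DF was not the departing 
                  PE keeps the same DF after recomputation.
              </t>
              <figure>
                  <artwork><![CDATA[
```python
import ipaddress
import zlib


def affinity(address, digest):
    addr = int(ipaddress.ip_address(address))
    return (1103515245 * ((1103515245 * addr + 12345) ^ digest)
            + 12345) % (2 ** 31)


def elect_all(pes, flow_keys):
    """Run the election for every flow; flow_keys are opaque byte strings."""
    result = {}
    for flow in flow_keys:
        d = zlib.crc32(flow) & 0xFFFFFFFF  # stand-in for D(S,G,V,ESI)
        result[flow] = min(
            pes,
            key=lambda a: (-affinity(a, d), int(ipaddress.ip_address(a))))
    return result


def reelect(pes, flow_keys, failed_pe):
    """Recompute all flows after `failed_pe` leaves the redundancy group."""
    remaining = [p for p in pes if p != failed_pe]
    return elect_all(remaining, flow_keys)
```
]]></artwork>
              </figure>
              <t> Comparing the output of elect_all before the failure with 
                  the output of reelect afterwards shows which flows moved to 
                  a different DF.
              </t>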
          </section>

     <section title="Protocol Considerations">
         <t> More details will be added in the next version. </t>
 </section>
     <section title="Security Considerations">
         <t>The Security Considerations described in <xref target="RFC7432"/>
             also apply to this document.
         </t>
     </section>

     <section title="IANA Considerations">
         <t> There are no new IANA considerations in this document.
         </t>
 </section>

      <section title="Acknowledgement">
      </section>

    
    
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>

      <references title='Normative References'>
          <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-bess-evpn-igmp-mld-proxy-00.xml"?>
          <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-bess-evpn-df-election-03.xml"?>
          <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-bess-evpn-ac-df-03.xml"?>
          &rfc7432;
          &rfc2119;
          &rfc4601;
          <reference anchor="HRW1999">
              <front>
                  <title>Using name-based mappings to increase hit rates</title>
                  <author>
                      <organization>
                          IEEE
                      </organization>
                  </author>
                  <date month="February" year="1998" />
              </front>
              <seriesInfo name="IEEE" value="HRW" />
          </reference>
      </references>

  </back>
</rfc>
