<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="no" ?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC4271 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4271.xml">
<!ENTITY RFC4272 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4272.xml">
<!ENTITY RFC4360 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4360.xml">
<!ENTITY RFC4364 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4364.xml">
<!ENTITY RFC4684 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4684.xml">
<!ENTITY RFC5291 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5291.xml">
<!ENTITY RFC5925 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5925.xml">
<!ENTITY RFC6952 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6952.xml">
<!ENTITY RFC7752 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7752.xml">
<!ENTITY RFC7911 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7911.xml">
<!ENTITY RFC7926 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7926.xml">
<!ENTITY RFC8126 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8126.xml">
<!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY RFC8402 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8402.xml">
]>

<rfc category="std" docName="draft-ietf-bess-datacenter-gateway-08" ipr="trust200902">
  <front>
    <title abbrev="SR DC Gateways">Gateway Auto-Discovery and Route Advertisement for Segment Routing Enabled Domain Interconnection</title>

    <author fullname="Adrian Farrel" initials="A." surname="Farrel">
      <organization>Old Dog Consulting</organization>
      <address>
        <email>adrian@olddog.co.uk</email>
      </address>
    </author>

    <author fullname="John Drake" initials="J." surname="Drake">
      <organization>Juniper Networks</organization>
      <address>
        <email>jdrake@juniper.net</email>
      </address>
    </author>

    <author fullname="Eric Rosen" initials="E." surname="Rosen">
      <organization>Juniper Networks</organization>
      <address>
        <email>erosen52@gmail.com</email>
      </address>
    </author>

    <author fullname="Keyur Patel" initials="K." surname="Patel">
      <organization>Arrcus, Inc.</organization>
      <address>
        <email>keyur@arrcus.com</email>
      </address>
    </author>

    <author fullname="Luay Jalil" initials="L." surname="Jalil">
      <organization>Verizon</organization>
      <address>
        <email>luay.jalil@verizon.com</email>
      </address>
    </author>

    <date year="2020" />
    <area>Routing</area>
    <workgroup>BESS Working Group</workgroup>
    <keyword>SR</keyword>
    <keyword>GW</keyword>
    <keyword>BGP</keyword>

    <abstract>
       <t>Data centers are critical components of the infrastructure used by network operators to provide
          services to their customers.  Data centers are attached to the Internet or a backbone network by
          gateway routers.  One data center typically has more than one gateway for commercial, load
          balancing, and resiliency reasons.</t>

       <t>Segment Routing is a protocol mechanism that can be used within a data center, and also for
          steering traffic that flows between two data center sites.  In order that one data center site
          may load balance the traffic it sends to another data center site, it needs to know the complete
          set of gateway routers at the remote data center, the points of connection from those gateways to
          the backbone network, and the connectivity across the backbone network.</t>

       <t>Segment Routing may also be operated in other domains, such as access networks.  Those domains
          also need to be connected across backbone networks through gateways.</t>

       <t>This document defines a mechanism using the BGP Tunnel Encapsulation attribute to allow each
          gateway router to advertise the routes to the prefixes in the Segment Routing domains to which
          it provides access, and also to advertise on behalf of each other gateway to the same Segment
          Routing domain.</t>
    </abstract>

</front>

<middle>

  <section anchor="introduction" title="Introduction">
     <t>Data centers (DCs) are critical components of the infrastructure used by network operators to provide
        services to their customers.  DCs are attached to the Internet or a backbone network by gateway routers
        (GWs).  One DC typically has more than one GW for various reasons including commercial preferences,
        load balancing, and resiliency against connection or device failure.</t>

     <t>Segment Routing (SR) <xref target="RFC8402" /> is a protocol mechanism that can be used within a DC, and
        also for steering traffic that flows between two DC sites.  In order for a source (ingress) DC that uses
        SR to load balance the flows it sends to a destination (egress) DC, it needs to know the complete set of
        entry nodes (i.e., GWs) for that egress DC from the backbone network connecting the two DCs.  Note
        that it is assumed that the connected set of DCs and the backbone network connecting them are part of the
        same SR BGP Link State (LS) instance (<xref target="RFC7752" /> and
        <xref target="I-D.ietf-idr-bgpls-segment-routing-epe" />) so that traffic engineering using SR may be used
        for these flows.</t>

     <t>SR may also be operated in other domains, such as access networks.  Those domains also need to be
        connected across backbone networks through gateways. For illustrative purposes, consider the Ingress and
        Egress SR Domains shown in <xref target="david_hockney" /> as separate ASes.  The various ASes that provide
        connectivity between the Ingress and Egress Domains could each be constructed differently and use different
        technologies such as IP, MPLS with global table routing native BGP to the edge, MPLS IP VPN, SR-MPLS IP VPN,
        or SRv6 IP VPN.</t>

     <t>Suppose that there are two gateways, GW1 and GW2 as shown in <xref target="david_hockney" />, for a given
        egress SR domain and that they each advertise a route to prefix X which is located within the
        egress SR domain with each setting itself as next hop.  One might think that the GWs for X could
        be inferred from the routes&apos; next hop fields, but typically it is not the case that both routes get
        distributed across the backbone: rather only the best route, as selected by BGP, is distributed.  This
        precludes load balancing flows across both GWs.</t>

     <figure anchor="david_hockney" title="Example Segment Routing Domain Interconnection">
       <artwork align="center">
         <![CDATA[
      -----------------                    ---------------------
     | Ingress         |                  | Egress     ------   |
     | SR Domain       |                  | SR Domain |Prefix|  |
     |                 |                  |           |   X  |  |
     |                 |                  |            ------   |
     |       --        |                  |   ---          ---  |
     |      |GW|       |                  |  |GW1|        |GW2| |
      -------++--------                    ----+-----------+-+--
             | \                               |          /  |
             |  \                              |         /   |
             |  -+-------------        --------+--------+--  |
             | ||ASBR|     ----|      |----  |ASBR| |ASBR| | |
             | | ----     |ASBR+------+ASBR|  ----   ----  | |
             | |           ----|      |----                | |
             | |               |      |                    | |
             | |           ----|      |----                | |
             | | AS1      |ASBR+------+ASBR|           AS2 | |
             | |           ----|      |----                | |
             |  ---------------        --------------------  |
           --+-----------------------------------------------+--
          | |ASBR|                                       |ASBR| |
          |  ----               AS3                       ----  |
          |                                                     |
           -----------------------------------------------------
         ]]>
       </artwork>
     </figure>

     <t>The obvious solution to this problem is to use the BGP feature that allows the advertisement of
        multiple paths in BGP (known as Add-Paths) <xref target="RFC7911" /> to ensure that all routes to
        X get advertised by BGP.  However, even if this is done, the identity of the GWs will be lost as
        soon as the routes get distributed through an Autonomous System Border Router (ASBR) that will
        set itself to be the next hop.  And if there are multiple Autonomous Systems (ASes) in the
        backbone, not only will the next hop change several times, but the Add-Paths technique will
        experience scaling issues.  This all means that the Add-Paths approach is limited to SR domains
        connected over a single AS.</t>

     <t>This document defines a solution that overcomes this limitation and works equally well with a
        backbone constructed from one or more ASes.  The solution uses the Tunnel Encapsulation attribute
        <xref target="I-D.ietf-idr-tunnel-encaps" /> as follows:
        <list style="empty">
           <t>We define a new tunnel type, "SR Tunnel".  When the GWs to a given SR domain advertise a route
              to a prefix X within the SR domain, they will each include a Tunnel Encapsulation attribute
              with multiple tunnel instances each of type "SR Tunnel" (value 17), one for
              each GW, and each containing a Remote Endpoint sub-TLV with that GW&apos;s address.</t>
        </list></t>

     <t>In other words, each route advertised by a GW identifies all of the GWs to the same SR domain (see
        <xref target="DCGWautodisco" /> for a discussion of how GWs discover each other).  Therefore, even if
        only one of the routes is distributed to other ASes, it will not matter how many times the next hop
        changes, as the Tunnel Encapsulation attribute (and its remote endpoint sub-TLVs) will remain
        unchanged.</t>

     <t>To put this in the context of <xref target="david_hockney" />, GW1 and GW2 discover each other as
        gateways for the egress SR domain.  Both GW1 and GW2 advertise themselves as having routes to prefix
        X.  Furthermore, GW1 includes a Tunnel Encapsulation attribute with a tunnel instance of type "SR
        Tunnel" for itself and another for GW2.  Similarly, GW2 includes a Tunnel Encapsulation attribute
        with a tunnel instance of type "SR Tunnel" for itself and another for GW1.  The gateway in the
        ingress SR domain can now see all possible paths to the
        egress SR domain regardless of which route advertisement is propagated to it, and it can choose one,
        or balance traffic flows as it sees fit.</t>
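
     <t>As an informal illustration of this behavior, the following Python sketch models a route whose next hop
        is rewritten by successive ASBRs while its Tunnel Encapsulation attribute is carried unchanged.  It is a
        simplified model under stated assumptions and not an implementation: the names TunnelInstance, Route,
        advertise, asbr_propagate, and gateways_for are hypothetical, the tunnel type is represented by a
        descriptive string rather than a codepoint, and all addresses are documentation examples.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TunnelInstance:
    tunnel_type: str      # illustrative label only, not a wire codepoint
    remote_endpoint: str  # address of the GW that this instance describes


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    tunnel_encap: Tuple[TunnelInstance, ...]


def advertise(prefix: str, self_addr: str, active_gws: Iterable[str]) -> Route:
    """Build the route a GW advertises: one SR tunnel instance per active GW."""
    instances = tuple(TunnelInstance("SR Tunnel", gw) for gw in sorted(active_gws))
    return Route(prefix, self_addr, instances)


def asbr_propagate(route: Route, asbr_addr: str) -> Route:
    """Model an ASBR setting itself as next hop; the attribute is untouched."""
    return replace(route, next_hop=asbr_addr)


def gateways_for(route: Route) -> List[str]:
    """Recover the GW addresses from the SR tunnel instances of one route."""
    return [ti.remote_endpoint for ti in route.tunnel_encap
            if ti.tunnel_type == "SR Tunnel"]


# Hypothetical addresses for GW1 and GW2, and for two ASBRs on the way.
gws = {"192.0.2.1", "192.0.2.2"}
from_gw1 = advertise("203.0.113.0/24", "192.0.2.1", gws)
seen = asbr_propagate(asbr_propagate(from_gw1, "198.51.100.1"), "198.51.100.2")
print(seen.next_hop, gateways_for(seen))
```
         ]]>
       </artwork>
     </figure>

     <t>In this model only GW1's route is propagated, and its next hop no longer identifies a GW, yet the
        receiver can still list both GWs from the tunnel instances.</t>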

     <t>The protocol extensions defined in this document are put into the broader context of SR domain
        interconnection by <xref target="I-D.farrel-spring-sr-domain-interconnect" />.  That document shows
        how other existing protocol elements may be combined with the extensions defined in this document
        to provide a full system.</t>
  </section>

  <section title="Requirements Language">

    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
       "OPTIONAL" in this document are to be interpreted as described in BCP 14
       <xref target="RFC2119" /> <xref target="RFC8174" /> when, and only when,
       they appear in all capitals, as shown here.</t>

  </section>

  <section anchor="DCGWautodisco" title="SR Domain Gateway Auto-Discovery">
     <t>To allow a given SR domain&apos;s GWs to auto-discover each other and to coordinate their operations,
        the following procedures are implemented:

        <list style="symbols">
           <t>Each GW is configured with an identifier for the SR domain.  That identifier is common across all GWs
              to the domain (i.e., the same identifier is used by all GWs to the same SR domain), and unique across
              all SR domains that are connected (i.e., across all GWs to all SR domains that are interconnected).</t>

           <t>A route target (<xref target="RFC4360" />) is attached to each GW&apos;s auto-discovery route and
              has its value set to the SR domain identifier.</t>

           <t>Each GW constructs an import filtering rule to import any route that carries a route target with
              the same SR domain identifier that the GW itself uses.  This means that only these GWs will import
              those routes, and that all GWs to the same SR domain will import each other&apos;s routes and will
              learn (auto-discover) the current set of active GWs for the SR domain.</t>
         </list>
     </t>
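
     <t>The procedures above can be sketched informally as follows.  This Python model is only an illustration
        under stated assumptions: the class and function names (AutoDiscoveryRoute, Gateway, distribute) are
        hypothetical, the SR domain identifier is modelled as an opaque string, and route distribution is
        reduced to handing every route to every GW.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class AutoDiscoveryRoute:
    gw_loopback: str               # NLRI: one of the GW's loopback addresses
    route_targets: FrozenSet[str]  # carries the SR domain identifier


class Gateway:
    def __init__(self, loopback: str, domain_id: str) -> None:
        self.loopback = loopback
        self.domain_id = domain_id
        # A GW always counts itself among the GWs of its own SR domain.
        self.known_gws: Set[str] = {loopback}

    def auto_discovery_route(self) -> AutoDiscoveryRoute:
        # The route target value is set to the SR domain identifier.
        return AutoDiscoveryRoute(self.loopback, frozenset({self.domain_id}))

    def receive(self, route: AutoDiscoveryRoute) -> None:
        # Import filter: accept only routes carrying our own domain identifier.
        if self.domain_id in route.route_targets:
            self.known_gws.add(route.gw_loopback)


def distribute(gateways: Iterable[Gateway]) -> None:
    """Hand every auto-discovery route to every GW (a stand-in for BGP)."""
    gateways = list(gateways)
    routes = [gw.auto_discovery_route() for gw in gateways]
    for gw in gateways:
        for route in routes:
            gw.receive(route)


# Two GWs share "domain-A"; a third belongs to "domain-B" (hypothetical values).
gw1 = Gateway("192.0.2.1", "domain-A")
gw2 = Gateway("192.0.2.2", "domain-A")
gw3 = Gateway("192.0.2.3", "domain-B")
distribute([gw1, gw2, gw3])
print(sorted(gw1.known_gws))
print(sorted(gw3.known_gws))
```
         ]]>
       </artwork>
     </figure>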

     <t>The auto-discovery route that each GW advertises consists of the following:
        <list style="symbols">
           <t>An IPv4 or IPv6 NLRI containing one of the GW&apos;s loopback addresses (that is, with an AFI/SAFI
              pair that is one of 1/1, 2/1, 1/4, or 2/4).</t>

           <t>A Tunnel Encapsulation attribute <xref target="I-D.ietf-idr-tunnel-encaps" /> containing the GW&apos;s
              encapsulation information, which at a minimum consists of an SR Tunnel TLV (type TBD1 to be allocated by
              IANA) with a Remote Endpoint sub-TLV as specified in <xref target="I-D.ietf-idr-tunnel-encaps" />.</t>
         </list>
     </t>

     <t>To avoid the side effect of applying the Tunnel Encapsulation attribute to any packet that is addressed
        to the GW itself, the GW SHOULD use one loopback address in its auto-discovery route and a different
        loopback address for packets that are addressed to the GW itself.</t>

     <t>As described in <xref target="introduction" />, each GW will include a Tunnel Encapsulation attribute
        containing an SR tunnel instance for each GW that is active for the SR domain (including itself), and will
        include this attribute in every route that it advertises externally to the SR domain.  As the current set of
        active GWs changes (due to the addition of a new GW or the failure/removal of an existing GW) each externally
        advertised route will be re-advertised with the set of SR tunnel instances reflecting the current set of
        active GWs.</t>
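
     <t>A minimal sketch of this re-advertisement behavior is given below.  It is an illustration only: the class
        GatewayExporter and its methods are hypothetical names, an advertisement is reduced to a prefix plus the
        list of GW endpoints that would appear in the SR tunnel instances, and the "sent" list stands in for
        BGP updates.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from typing import Iterable, List, Set, Tuple

# (prefix, GW endpoints that would appear in the SR tunnel instances)
Advertisement = Tuple[str, Tuple[str, ...]]


class GatewayExporter:
    def __init__(self, self_addr: str, prefixes: Iterable[str]) -> None:
        self.self_addr = self_addr
        self.prefixes: List[str] = list(prefixes)
        self.active_gws: Set[str] = {self_addr}
        self.sent: List[Advertisement] = []  # stand-in for BGP updates
        self._readvertise()

    def _readvertise(self) -> None:
        # Every externally advertised route carries the full current GW set.
        endpoints = tuple(sorted(self.active_gws))
        for prefix in self.prefixes:
            self.sent.append((prefix, endpoints))

    def gw_discovered(self, addr: str) -> None:
        if addr not in self.active_gws:
            self.active_gws.add(addr)
            self._readvertise()

    def gw_withdrawn(self, addr: str) -> None:
        # The local GW never removes itself from its own set in this model.
        if addr != self.self_addr and addr in self.active_gws:
            self.active_gws.remove(addr)
            self._readvertise()


gw1 = GatewayExporter("192.0.2.1", ["203.0.113.0/24"])
gw1.gw_discovered("192.0.2.2")   # GW2 is auto-discovered
gw1.gw_withdrawn("192.0.2.2")    # GW2 withdraws its auto-discovery route
for adv in gw1.sent:
    print(adv)
```
         ]]>
       </artwork>
     </figure>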

     <t>If a gateway becomes disconnected from the backbone network, or if the SR domain operator decides to terminate
        the gateway&apos;s activity, it withdraws the advertisements described above.  This means that remote gateways
        at other sites will stop seeing advertisements from this gateway.  It also means that other local gateways at
        this site will "unlearn" the removed gateway and stop including a Tunnel Encapsulation attribute for the
        removed gateway in their advertisements.</t>

     <t>Note that if a GW is (mis)configured with a different SR domain identifier from the other GWs to the
        same domain then it will not be auto-discovered by the other GWs (and will not auto-discover the other
        GWs).  This would result in a receiver just getting the best route with only the advertising
        node&apos;s tunnel encapsulation information.</t>
  </section>

  <section anchor="EPE" title="Relationship to BGP Link State and Egress Peer Engineering">
     <t>When a remote GW receives a route to a prefix X it can use the SR tunnel instances within the contained
        Tunnel Encapsulation attribute to identify the GWs through which X can be reached.  It uses this information
        to compute SR Traffic Engineering (SR TE) paths across the backbone network looking at the information
        advertised to it in SR BGP Link State (BGP-LS) <xref target="I-D.ietf-idr-bgp-ls-segment-routing-ext" />
        and correlated using the SR domain identity.  SR Egress Peer Engineering (EPE)
        <xref target="I-D.ietf-idr-bgpls-segment-routing-epe" /> can be used to supplement the information advertised
        in BGP-LS.</t>
  </section>

  <section anchor="advertising" title="Advertising an SR Domain Route Externally">
     <t>When a packet destined for prefix X is sent on an SR TE path to a GW for the SR domain containing X (that is,
        the packet is sent in the Ingress Domain on an SR TE path that describes the whole path, including the part within the Egress
        Domain), it needs to carry the receiving GW&apos;s label for X such that this label rises to the top of the stack
        before the GW completes its processing of the packet.  To achieve this we place a Prefix SID sub-TLV
        <xref target="I-D.ietf-idr-tunnel-encaps" /> for X in each SR tunnel instance in the Tunnel Encapsulation
        attribute in the externally advertised route for X.</t>

     <t>Alternatively, if the GWs for a given SR domain are configured to allow remote GWs to perform SR TE through that
        SR domain for a prefix X, then each GW computes an SR TE path through that SR domain to X from each of the currently
        active GWs, and places each in an MPLS label stack sub-TLV <xref target="I-D.ietf-idr-tunnel-encaps" /> in the SR
        tunnel instance for that GW.</t>

     <t>Please refer to Section 7 of <xref target="I-D.farrel-spring-sr-domain-interconnect" /> for worked examples of
        how the label stack is constructed in this case, and how the advertisements would work.</t>
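
     <t>As a rough sketch of the two options described in this section, the following Python fragment shows how an
        ingress node might append per-GW information to a label stack.  It rests on several assumptions that are
        not taken from this document: the names SrTunnelInstance and build_label_stack are hypothetical, the Prefix
        SID sub-TLV is modelled as an already-resolved label value, the labels across the backbone are assumed to
        have been computed separately, all label values are invented, and the first element of each list is the top
        of the stack.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class SrTunnelInstance:
    remote_endpoint: str
    # Simplification: the GW's label for X, as if already resolved.
    prefix_sid_label: Optional[int] = None
    # Alternative: an SR TE path through the egress domain from this GW to X.
    label_stack: List[int] = field(default_factory=list)


def build_label_stack(backbone_labels: Sequence[int],
                      instance: SrTunnelInstance) -> List[int]:
    """Return the stack to impose; element 0 is the top of the stack."""
    if instance.label_stack:
        tail = list(instance.label_stack)
    elif instance.prefix_sid_label is not None:
        tail = [instance.prefix_sid_label]
    else:
        raise ValueError("tunnel instance carries no label information")
    # Backbone labels are consumed first, so the GW's labels rise to the top.
    return list(backbone_labels) + tail


# Invented label values, for illustration only.
gw1 = SrTunnelInstance("192.0.2.1", prefix_sid_label=16010)
gw2 = SrTunnelInstance("192.0.2.2", label_stack=[17001, 17020])
stack_via_gw1 = build_label_stack([24001, 24002], gw1)
stack_via_gw2 = build_label_stack([24005], gw2)
print(stack_via_gw1)
print(stack_via_gw2)
```
         ]]>
       </artwork>
     </figure>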
  </section>

  <section anchor="encaps" title="Encapsulation">
     <t>If the GWs for a given SR domain are configured to allow remote GWs to send them a packet in that SR domain&apos;s
        native encapsulation, then each GW will also include multiple instances of a tunnel TLV for that native
        encapsulation in externally advertised routes: one for each GW and each containing a remote endpoint sub-TLV with
        that GW&apos;s address.  A remote GW may then encapsulate a packet according to the rules defined via the sub-TLVs
        included in each of the tunnel TLV instances.</t>
  </section>

  <section anchor="iana" title="IANA Considerations">
     <section anchor="tunneltype" title="Tunnel Encapsulation Tunnel Type">
        <t>IANA maintains a registry called "Border Gateway Protocol (BGP) Parameters" with a sub-registry called "BGP Tunnel
           Encapsulation Attribute Tunnel Types."  The registration policy for this registry is First-Come First-Served
           <xref target="RFC8126"/>.</t>

        <t>IANA has assigned the value 17 from this sub-registry for "SR Tunnel".</t>
     </section>

     <section anchor="tunnelsubtlv" title="Tunnel Encapsulation Sub-TLVs">
        <t>IANA maintains a registry called "Border Gateway Protocol (BGP) Parameters" with a sub-registry called "BGP Tunnel
           Encapsulation Attribute Sub-TLVs."  The registration policy for this registry is Standards Action <xref target="RFC8126"/>.</t>

        <t>IANA is requested to assign a codepoint from this sub-registry for "SR Tunnel TLV" (TBD1).  The next available value may be
           used and reference should be made to this document.</t>
     </section>
  </section>

  <section anchor="security" title="Security Considerations">
     <t>From a protocol point of view, the mechanisms described in this document can leverage the security mechanisms already
        defined for BGP.  Further discussion of security considerations for BGP may be found in the BGP specification itself
        <xref target="RFC4271" /> and in the security analysis for BGP <xref target="RFC4272" />.  The TCP Authentication
        Option (TCP-AO), which replaces the earlier TCP MD5 signature option and can be used to protect BGP sessions, is
        specified in <xref target="RFC5925" />, while <xref target="RFC6952" /> includes an analysis of BGP keying and
        authentication issues.</t>

     <t>The mechanisms described in this document involve sharing routing or reachability information between domains: that may
        mean disclosing information that is normally contained within a domain.  So it needs to be understood that normal security
        paradigms based on the boundaries of domains are weakened.  Discussion of these issues with respect to VPNs can be found
        in <xref target="RFC4364" />, while <xref target="RFC7926" /> describes many of the issues associated with the exchange of
        topology or TE information between domains.</t>

     <t>Particular exposures resulting from this work include:
        <list style="symbols">
          <t>Gateways to a domain will know about all other gateways to the same domain.  This feature applies within a domain
             and so is not a substantial exposure, but it does mean that if the BGP exchanges within a domain can be snooped or
             if a gateway can be subverted then an attacker may learn the full set of gateways to a domain.  This would facilitate
             more effective attacks on that domain.</t>

          <t>The existence of multiple gateways to a domain becomes more visible across the backbone and even into remote domains.
             This means that an attacker is able to prepare a more comprehensive attack than exists when only the locally attached
             backbone network (e.g., the AS that hosts the domain) can see all of the gateways to a site.  For example, a Denial of
             Service attack on a single GW is mitigated by the existence of other GWs, but if the attacker knows about all the
             gateways then the whole set can be attacked at once.</t>

          <t>A node in a domain that does not have external BGP peering (i.e., is not really a domain gateway and cannot speak BGP
             into the backbone network) may be able to get itself advertised as a gateway by letting other genuine gateways discover
             it (by speaking BGP to them within the domain) and so may get those genuine gateways to advertise it as a gateway into
             the backbone network.  This would allow the malicious node to attract traffic without having to have secure BGP peerings
             with out-of-domain nodes.</t>

          <t>If it is possible to modify a BGP message within the backbone, it may be possible to spoof the existence of a
             gateway.  This could cause traffic to be attracted to a specific node and might result in black-holing of traffic.</t>
        </list></t>

     <t>All of the issues in the list above could cause disruption to domain interconnection, but are not new protocol vulnerabilities
        so much as new exposures of information that SHOULD be protected against using existing protocol mechanisms.  Furthermore, it
        is a general observation that if these attacks are possible then it is highly likely that far more significant attacks can be
        made on the routing system.  It should be noted that BGP peerings are not discovered, but always arise from explicit
        configuration.</t>
  </section>

  <section anchor="manageability" title="Manageability Considerations">
     <t>The principal configuration item added by this solution is the allocation of an SR domain identifier.  The same identifier
        MUST be assigned to every GW to the same domain, and each domain MUST have a different identifier.  This requires coordination,
        probably through a central management agent.</t>
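
     <t>The kind of check that a central management agent might run over its configuration can be sketched as
        follows.  This is an illustration only: the function check_domain_ids, the tuple layout, and the
        identifier strings are hypothetical and are not defined by this document.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (gateway name, SR domain name, configured SR domain identifier)
Assignment = Tuple[str, str, str]


def check_domain_ids(assignments: Iterable[Assignment]) -> List[str]:
    """Return a list of problems; an empty list means the rules are met."""
    ids_per_domain: Dict[str, Set[str]] = defaultdict(set)
    domains_per_id: Dict[str, Set[str]] = defaultdict(set)
    for _gw, domain, ident in assignments:
        ids_per_domain[domain].add(ident)
        domains_per_id[ident].add(domain)

    problems: List[str] = []
    # Every GW to the same domain must use the same identifier.
    for domain, ids in sorted(ids_per_domain.items()):
        if len(ids) > 1:
            problems.append(
                f"domain {domain} has multiple identifiers: {sorted(ids)}")
    # Each domain must have a different identifier.
    for ident, domains in sorted(domains_per_id.items()):
        if len(domains) > 1:
            problems.append(
                f"identifier {ident} is shared by domains: {sorted(domains)}")
    return problems


good = [("gw1", "egress", "65000:1"), ("gw2", "egress", "65000:1"),
        ("gw3", "ingress", "65000:2")]
bad = [("gw1", "egress", "65000:1"), ("gw2", "egress", "65000:9"),
       ("gw3", "ingress", "65000:1")]
print(check_domain_ids(good))
for problem in check_domain_ids(bad):
    print(problem)
```
         ]]>
       </artwork>
     </figure>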

     <t>It should be noted that BGP peerings are not discovered, but always arise from explicit configuration.  This is no different
        from any other BGP operation.</t>

     <section anchor="rtc" title="Relationship to Route Target Constraint">
        <t>In order to limit the VPN routing information that is maintained at a given route reflector, <xref target="RFC4364" />
           suggests the use of "Cooperative Route Filtering" <xref target="RFC5291" /> between route reflectors.  <xref target="RFC4684" />
           defines an extension to that mechanism to include support for multiple autonomous systems and asymmetric VPN topologies such
           as hub-and-spoke.  The mechanism in RFC 4684 is known as Route Target Constraint (RTC).</t>

        <t>An operator would not normally configure RTC by default for any AFI/SAFI combination, and would only enable it after careful
           consideration.  When using the mechanisms defined in this document, the operator should consider carefully the effects of
           filtering routes.  In some cases this may be desirable, and in others it could limit the effectiveness of the procedures.</t>
     </section>
  </section>


  <section anchor="acks" title="Acknowledgements">
     <t>Thanks to Bruno Rijsman, Stephane Litkowski, Boris Hassanov, Linda Dunbar, Ravi Singh, and Gyan Mishra for review
        comments, and to Robert Raszuk for useful discussions.</t>
  </section>

</middle>

<back>
  <references title="Normative References">
    &RFC2119;
    &RFC4271;
    &RFC4360;
    &RFC5925;
    &RFC7752;
    &RFC8174;
    <?rfc include="reference.I-D.ietf-idr-bgpls-segment-routing-epe"?>
    <?rfc include="reference.I-D.ietf-idr-tunnel-encaps"?>
  </references>

  <references title="Informative References">
    &RFC4272;
    &RFC4364;
    &RFC4684;
    &RFC5291;
    &RFC6952;
    &RFC7911;
    &RFC7926;
    &RFC8126;
    &RFC8402;
    <?rfc include="reference.I-D.farrel-spring-sr-domain-interconnect"?>
    <?rfc include="reference.I-D.ietf-idr-bgp-ls-segment-routing-ext"?>
  </references>

</back>
</rfc>
