<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc4893 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4893.xml">
<!ENTITY rfc4271 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4271.xml">
<!ENTITY rfc4760 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4760.xml">
]>
<!-- WK: Set category, IPR, docName -->
<rfc category="info" docName="draft-wkumari-idr-socialite-02"
     ipr="trust200902">
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

  <?rfc toc="yes" ?>

  <?rfc symrefs="yes" ?>

  <?rfc sortrefs="yes"?>

  <?rfc iprnotified="no" ?>

  <?rfc strict="yes" ?>

  <?rfc compact="yes" ?>

  <front>
    <!-- WK: Set long title. -->

    <title abbrev="socialite">Automagic peering at IXPs</title>

    <author fullname="Warren Kumari" initials="W." surname="Kumari">
      <organization>Google</organization>

      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>

          <city>Mountain View, CA</city>

          <code>94043</code>

          <country>US</country>
        </postal>

        <email>warren@kumari.net</email>
      </address>
    </author>

    <author fullname="Keyur Patel" initials="K." surname="Patel">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street></street>

          <city></city>

          <region></region>

          <code></code>

          <country></country>
        </postal>

        <phone></phone>

        <facsimile></facsimile>

        <email>keyupate@cisco.com</email>

        <uri></uri>
      </address>
    </author>

    <author fullname="John Scudder" initials="J." surname="Scudder">
      <organization>Juniper Networks</organization>

      <address>
        <postal>
          <street>1194 N. Mathilda Ave</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code></code>

          <country>USA</country>
        </postal>

        <phone></phone>

        <facsimile></facsimile>

        <email>jgs@juniper.net</email>

        <uri></uri>
      </address>
    </author>

    <!--
  -->

    <date day="18" month="October" year="2012" />

    <area>int</area>

    <workgroup>idr</workgroup>

    <abstract>
      <t>This document describes a method for automatically establishing BGP
      peering sessions at an Internet exchange point. Creation of these
      peering sessions is facilitated by a host.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>A large amount of Internet traffic is exchanged at Internet Exchange
      Points (IXPs). These are networks that are specifically built and
      operated as locations for networks to peer and exchange traffic.</t>

      <t>Public peering refers to peering across the (IXP provided) switch
      fabric. To avoid each participant at the IXP having to contact all of
      the other participants to enter into peering relationships, the IXP
      often provides a Route Server (RS). The Route Server is a BGP speaker
      that participants peer with and announce routes to. The Route Server
      takes these announcements and serves them to all of the other
      participants who peer with it (so far this is just like any other BGP
      router!). The Route Server differs from a standard eBGP speaker in that
      it neither updates the Next Hop nor prepends its own AS to the AS Path
      attribute. Because the Next Hop attribute is unchanged, traffic between
      participants flows directly between those participants and does not
      pass through the Route Server. As the traffic doesn't flow through it,
      it is appropriate that the Route Server doesn't appear in the AS Path.
      This is known as a transparent Route Server: by not showing up in the
      AS Path, it hides the fact that the peering between the participants
      occurs over a public peering session, and participants are not
      penalized by having longer AS Paths.</t>

      <t>This document describes an alternate solution for peering at an IXP.
      Instead of having a server that re-announces the routes from each
      participant to all of the others, we introduce a "socialite", a device
      that is responsible for making introductions between all of the
      participants and facilitating connections between them. This socialite
      can be thought of as a host at a dinner party. The guests arrive and
      the socialite introduces them to each other, and then steps out of the
      way to allow them to communicate (and peer!) on their own.</t>

      <t>This solution is aimed at operators who are currently peering with
      route-servers (and operators of those route-servers), and it is not
      expected to be a good alternative to "private peerings".</t>

      <section title="Requirements notation">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119"></xref>.</t>
      </section>
    </section>

    <section title="Terminology">
      <t><list style="hanging">
          <t hangText="Internet Exchange Point">A network for exchanging BGP
          routing information and traffic.</t>

          <t hangText="Route Server">A BGP speaker at an IXP that "reflects"
          routes from one participant to all the other participants. See <xref
          target="I-D.jasinska-ix-bgp-route-server"></xref>.</t>

          <t hangText="Socialite">A device running the Debut protocol,
          responsible for making introductions between Guests.</t>

          <t hangText="Guests">Participants of the IXP that speak the Debut
          protocol with the Socialite, and are introduced by the Socialite to
          other Guests.</t>

          <t hangText="Debut">The protocol spoken between the Socialite and
          the Guests.</t>
        </list></t>
    </section>

    <section title="Overview">
      <t>The Guests at the IXP form a BGP peering relationship with the
      Socialite, announcing support for the Debut protocol. The Socialite
      sends the Guests a set of Debut updates, containing information about
      the other participants. The Guests use this information to form direct
      BGP peerings between themselves. Policy can be configured on the
      Socialite to only make introductions between subsets of participants if
      so desired.</t>
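
      <t>As an illustration only, the introduction step on the Socialite
      could be sketched as below. The Guest tuple, the function names and the
      shape of the policy callback are assumptions made for this example and
      are not defined by this document; the AS numbers and addresses used
      with it are taken from the documentation ranges.</t>

      <figure>
        <artwork><![CDATA[
```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Guest = Tuple[int, str]  # (AS number, peering address); illustrative


def introduce_everyone(receiver: Guest, introduced: Guest) -> bool:
    """Default policy: every Guest is introduced to every other Guest."""
    return True


def plan_introductions(
    guests: Sequence[Guest],
    allowed: Callable[[Guest, Guest], bool] = introduce_everyone,
) -> Dict[Guest, List[Guest]]:
    """Map each Guest to the Guests the Socialite will introduce to it."""
    plan: Dict[Guest, List[Guest]] = {guest: [] for guest in guests}
    for receiver, introduced in permutations(guests, 2):
        if allowed(receiver, introduced):
            plan[receiver].append(introduced)
    return plan
```
]]></artwork>
      </figure>

      <t>Passing a callback that returns True only when both Guests belong to
      some agreed set restricts introductions to that subset, while every
      other Guest receives no introductions at all.</t>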
    </section>

    <section title="Protocol Extensions">
      <t>The BGP protocol extensions introduced in this document include the
      definition of a new BGP capability, named "Debut Capability", and the
      specification of the message subtypes for the Debut messages.</t>

      <section title="Debut Capability">
        <t>The "Debut Capability" is a new BGP capability [RFC5492]. The
        Capability Code for this capability is specified in the IANA
        Considerations section of this document. The Capability Length field
        of this capability is zero. By advertising this capability to a peer,
        a BGP speaker conveys to the peer that the speaker supports the
        message subtypes for the Debut protocol and the related procedures
        described in this document.</t>
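
        <t>A minimal sketch of how such a zero-length capability might be
        encoded and detected follows. The capability triple (code, length,
        value) and the Capabilities Optional Parameter type 2 come from RFC
        5492; the code value 239 is purely a placeholder chosen for this
        example, since the real Capability Code has not been assigned, and
        the function names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

# Placeholder only: the real Capability Code is to be assigned by IANA.
DEBUT_CAPABILITY_CODE = 239  # hypothetical value for illustration

CAPABILITIES_PARAM_TYPE = 2  # OPEN Optional Parameter type (RFC 5492)


def encode_debut_capability(code: int = DEBUT_CAPABILITY_CODE) -> bytes:
    """Capability code (1 octet), length (1 octet, zero), no value."""
    return struct.pack("!BB", code, 0)


def wrap_as_optional_parameter(capabilities: bytes) -> bytes:
    """Wrap encoded capabilities in a Capabilities Optional Parameter."""
    header = struct.pack("!BB", CAPABILITIES_PARAM_TYPE, len(capabilities))
    return header + capabilities


def peer_supports_debut(capabilities: bytes,
                        code: int = DEBUT_CAPABILITY_CODE) -> bool:
    """Walk a run of capability triples looking for the Debut code."""
    offset = 0
    while offset + 2 <= len(capabilities):
        cap_code, cap_len = struct.unpack_from("!BB", capabilities, offset)
        if cap_code == code:
            return True
        offset += 2 + cap_len  # skip this capability's value
    return False
```
]]></artwork>
        </figure>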
      </section>
    </section>

    <section title="Packet Formats">
      <t>The Debut protocol is implemented using TLV structures, and fields
      are in network byte order. These TLV records are carried as payload in a
      standard BGP Message packet <xref target="RFC4271">(RFC 4271, Section
      4.1, Message Header Format)</xref>.</t>

      <section title="Message Header">
        <t>The Debut protocol is implemented using the standard
        Type-Length-Value paradigm.</t>

        <t>All Debut messages are carried as payload in a standard BGP Message
        of Type [TBD_BGP] and are preceded by a standard header that specifies
        the type and length of the message.</t>

        <figure>
          <artwork><![CDATA[
    0 1 2 3 4 5 6 7              15                              31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | VER   |  RES  |          TYPE             |       LENGTH      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   VALUE (Variable Length)                     |
   ~                                                               ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

]]></artwork>
        </figure>

        <t><list style="symbols">
            <t>VER (4 bits): The VER (version) field specifies the version of
            the Debut protocol. For the initial (this) version of the protocol
            it will be set to 0.</t>

            <t>RES (4 bits): The RES (reserved) field is reserved in the
            initial (this) version of the protocol. It SHOULD be initialized
            to 0 on transmit and should be ignored on reception.</t>

            <t>TYPE (2 octets): The TYPE field specifies the TYPE of the TLV
            record, and allows an implementation to determine what type of
            information is carried in the record. If the highest bit of the
            TYPE field is set (the TYPE value is &gt;= 32768), understanding /
            implementation of the TYPE is optional - if an implementation does
            not implement this type it may ignore this message (this
            capability is included to allow for possible future logging,
            diagnostics, etc). If the highest bit is not set, and the
            implementation receives a TYPE that it does not implement, it
            should send a BGP NOTIFICATION and tear down the session. The TYPE
            codes are defined in the Debut TYPE registry (see the IANA
            Considerations section).</t>

            <t>LENGTH (2 octets): The number of octets in the VALUE field of
            the TLV record. The total length of the TLV record in octets can
            be calculated by adding 4 (the number of octets in the TYPE and
            LENGTH fields) to the value of this field. This allows an
            implementation to skip over TLV records that it cannot
            handle.</t>

            <t>VALUE (Variable length): The actual data. The meaning of this
            data is given by the TYPE field, and the length by the LENGTH
            field. Parsing of the data field is performed according to the
            value of the TYPE field.</t>
          </list></t>
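
        <t>The header handling described above might be sketched as follows.
        This is not a reference implementation. It assumes the field sizes
        listed above are laid out back to back, giving a five-octet header
        (one octet shared by VER and RES, two for TYPE, two for LENGTH); that
        layout is an interpretation made for this example. The names Record,
        UnsupportedType, encode_record and parse_records are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct
from typing import Iterator, NamedTuple

HEADER = struct.Struct("!BHH")  # VER/RES octet, TYPE, LENGTH (assumed)
OPTIONAL_BIT = 0x8000           # highest bit of the 16-bit TYPE field
KNOWN_TYPES = {0, 1}            # INTRODUCTION and WITHDRAW


class Record(NamedTuple):
    version: int
    rec_type: int
    value: bytes


class UnsupportedType(Exception):
    """A mandatory TYPE was not understood; the session should close."""


def encode_record(rec_type: int, value: bytes, version: int = 0) -> bytes:
    """Build one record; RES is always sent as zero."""
    first_octet = (version & 0x0F) << 4
    return HEADER.pack(first_octet, rec_type, len(value)) + value


def parse_records(payload: bytes) -> Iterator[Record]:
    """Yield known records, skipping unknown TYPEs that are optional."""
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < HEADER.size:
            raise ValueError("truncated Debut header")
        ver_res, rec_type, length = HEADER.unpack_from(payload, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(payload):
            raise ValueError("LENGTH runs past the end of the payload")
        offset = end  # LENGTH lets us step over any record
        if rec_type not in KNOWN_TYPES:
            if rec_type & OPTIONAL_BIT:
                continue  # optional TYPE: ignore it
            raise UnsupportedType(rec_type)
        yield Record(ver_res >> 4, rec_type, payload[start:end])
```
]]></artwork>
        </figure>

        <t>A caller would catch UnsupportedType, send the BGP NOTIFICATION
        and close the session; the RES bits are dropped on reception, as the
        field description asks.</t>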
      </section>

      <section title="INTRODUCTION Record">
        <t>INTRODUCTION (TYPE 0) TLV records are used to "make introductions"
        between the Guests speaking the Debut protocol. They carry the
        information needed by Guests to contact the other Guests and establish
        a BGP peering session.</t>

        <figure>
          <artwork align="center"><![CDATA[
    0 1 2 3 4 5 6 7                15                24           32
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0: |                          NEIGHBOR AS                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4: |          AFI                  |   SAFI          |     LEN     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             ADDRESS                           |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   /                                                               /
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
x: |                              Auth                             |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
        </figure>

        <t><list style="symbols">
            <t>NEIGHBOR AS (4 octets): The Autonomous System number of the
            Guest being introduced, in "Four-octet AS Number" format as
            specified in <xref target="RFC4893">RFC4893</xref>.</t>

            <t>AFI (2 octets): Address Family Identifier <xref
            target="RFC4760">[RFC4760]</xref></t>

            <t>SAFI (1 octet): Subsequent Address Family Identifier <xref
            target="RFC4760">[RFC4760]</xref></t>

            <t>LEN (1 octet): The length, in bits, of the address in the
            ADDRESS field (32 for IPv4, 128 for IPv6).</t>

            <t>ADDRESS (Variable length): This contains the IP address of the
            Guest being introduced.</t>

            <t>Auth (optional, variable length): The existence of this field
            is determined from the LENGTH of the TLV. If the LENGTH is greater
            than the combined length of the NEIGHBOR AS, AFI, SAFI, LEN and
            ADDRESS fields, the remaining octets are Auth data.</t>
          </list></t>
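
        <t>As a sketch under stated assumptions, the VALUE of an INTRODUCTION
        record could be encoded and decoded as below. The example treats LEN
        as a bit count (32 or 128) and treats any octets after ADDRESS as
        Auth data. AFI values 1 (IPv4) and 2 (IPv6) and SAFI 1 (unicast) are
        the IANA-assigned values; the class and function names are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress
import struct
from typing import NamedTuple, Union

FIXED = struct.Struct("!IHBB")  # NEIGHBOR AS, AFI, SAFI, LEN
AFI_IPV4, AFI_IPV6 = 1, 2       # IANA Address Family Numbers
SAFI_UNICAST = 1


class Introduction(NamedTuple):
    neighbor_as: int
    afi: int
    safi: int
    address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    auth: bytes  # empty when the optional Auth field is absent


def encode_introduction(neighbor_as: int, address: str,
                        safi: int = SAFI_UNICAST,
                        auth: bytes = b"") -> bytes:
    """Build the VALUE of an INTRODUCTION record."""
    ip = ipaddress.ip_address(address)
    afi = AFI_IPV4 if ip.version == 4 else AFI_IPV6
    fixed = FIXED.pack(neighbor_as, afi, safi, ip.max_prefixlen)
    return fixed + ip.packed + auth


def decode_introduction(value: bytes) -> Introduction:
    """Parse the VALUE of an INTRODUCTION record."""
    if len(value) < FIXED.size:
        raise ValueError("INTRODUCTION record too short")
    neighbor_as, afi, safi, bit_len = FIXED.unpack_from(value)
    if bit_len not in (32, 128):
        raise ValueError("LEN must be 32 or 128")
    end = FIXED.size + bit_len // 8
    if len(value) < end:
        raise ValueError("ADDRESS is truncated")
    raw = value[FIXED.size:end]
    if bit_len == 32:
        address = ipaddress.IPv4Address(raw)
    else:
        address = ipaddress.IPv6Address(raw)
    # Anything after ADDRESS is taken to be Auth data.
    return Introduction(neighbor_as, afi, safi, address, value[end:])
```
]]></artwork>
        </figure>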

        <t>The AFI and SAFI are included in the INTRODUCTION message to allow
        the Socialite to introduce Guests with multiple address families.</t>

        <t>On reception of an INTRODUCTION message a Guest should store the
        information and then consult local policy (if any) to determine if it
        is willing to peer with the newly introduced Guest. If so, it should
        proceed as though this were a manually configured peer. This peering
        SHOULD be annotated to note that this is a Socialite-created peering.
        It is recommended that the peering show up in the configuration, but
        not persist across reboots -- this is to allow operators to more
        easily see all neighbors while looking through the config.</t>
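
        <t>The reception steps above (store, consult local policy, then treat
        the neighbor as if it were configured by hand) can be sketched as
        follows. The Guest and Peer classes, the policy callback and the
        description string are all assumptions made for this example; a real
        implementation would hook into the router's own configuration
        model.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, int, str]  # (AFI, SAFI, address)


@dataclass
class Peer:
    remote_as: int
    address: str
    description: str = "created by Socialite"  # visible annotation
    persistent: bool = False                    # dropped on reboot


class Guest:
    def __init__(
        self,
        policy: Optional[Callable[[int, str], bool]] = None,
    ) -> None:
        self.policy = policy  # None means "peer with everyone"
        self.introductions: Dict[Key, int] = {}  # all introductions seen
        self.peers: Dict[Key, Peer] = {}         # peerings we configured

    def on_introduction(self, neighbor_as: int, afi: int, safi: int,
                        address: str) -> Optional[Peer]:
        key = (afi, safi, address)
        self.introductions[key] = neighbor_as  # store before deciding
        if self.policy is not None:
            if not self.policy(neighbor_as, address):
                return None  # local policy declined this Guest
        peer = Peer(remote_as=neighbor_as, address=address)
        self.peers[key] = peer  # handled like a manually configured peer
        return peer
```
]]></artwork>
        </figure>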
      </section>

      <section title="WITHDRAW Record">
        <t>WITHDRAW (TYPE 1) TLV records are used to inform a Guest that
        another previously introduced Guest is no longer participating. A
        Guest can use this information to abort in-progress connection
        attempts, invalidate information from a cache, for informational
        logging or in any other way it sees fit, but it SHOULD NOT use this
        information to tear down peering sessions to other Guests in
        ESTABLISHED state. Debut is intended to make initial introductions
        between participants and does not provide any mechanisms to invalidate
        / abort sessions once the introductions have been made.</t>

        <t>If a Socialite attempts to unintroduce an unknown Guest, this
        information should be logged and then ignored.</t>

        <figure>
          <artwork align="center"><![CDATA[
    0 1 2 3 4 5 6 7                15                24           32
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0: |          AFI                  |   SAFI          |     LEN     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4: |                            ADDRESS                            \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
        </figure>

        <t><list style="symbols">
            <t>AFI (2 octets): Address Family Identifier <xref
            target="RFC4760">[RFC4760]</xref></t>

            <t>SAFI (1 octet): Subsequent Address Family Identifier <xref
            target="RFC4760">[RFC4760]</xref></t>

            <t>LEN (1 octet): The length, in bits, of the address in the
            ADDRESS field (32 for IPv4, 128 for IPv6).</t>

            <t>ADDRESS (Variable length): This contains the IP address of the
            Guest that is no longer participating.</t>
          </list></t>

        <t>The AFI and SAFI are included so that the Socialite can inform
        Guests that only one of the AFI / SAFIs is being removed.</t>
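
        <t>One possible way for a Guest to act on a WITHDRAW record is
        sketched below. It assumes LEN is a bit count, tracks each neighbor by
        a BGP state name held in a plain dictionary, and records log lines in
        a list; those structures and the function names are hypothetical and
        exist only to keep the example self-contained.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress
import struct
from typing import Dict, List, Tuple

FIXED = struct.Struct("!HBB")  # AFI, SAFI, LEN
Key = Tuple[int, int, str]     # (AFI, SAFI, address)


def decode_withdraw(value: bytes) -> Key:
    """Parse the VALUE of a WITHDRAW record."""
    if len(value) < FIXED.size:
        raise ValueError("WITHDRAW record too short")
    afi, safi, bit_len = FIXED.unpack_from(value)
    if bit_len not in (32, 128):
        raise ValueError("LEN must be 32 or 128")
    raw = value[FIXED.size:FIXED.size + bit_len // 8]
    if len(raw) * 8 != bit_len:
        raise ValueError("ADDRESS is truncated")
    if bit_len == 32:
        return afi, safi, str(ipaddress.IPv4Address(raw))
    return afi, safi, str(ipaddress.IPv6Address(raw))


def handle_withdraw(key: Key, sessions: Dict[Key, str],
                    log: List[str]) -> None:
    """Drop pending attempts; never touch an established session."""
    state = sessions.get(key)
    if state is None:
        log.append(f"withdraw for unknown Guest {key}: ignored")
    elif state == "Established":
        log.append(f"{key} withdrawn: established session left up")
    else:
        del sessions[key]  # abort the in-progress connection attempt
        log.append(f"{key} withdrawn: pending attempt aborted")
```
]]></artwork>
        </figure>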
      </section>
    </section>

    <section title="Protocol operation">
      <t>Debut BGP sessions behave just like any other BGP sessions; only the
      information carried is different. Guests should use the standard BGP
      peering process to contact Socialites (or Socialites to contact
      Guests). Once the peering is ESTABLISHED, Guests will begin receiving
      INTRODUCTION messages in UPDATES, and will store them in something
      resembling Adj-RIB-In. Standard BGP logic applies for things like error
      handling, invalidation of previously received information, etc.</t>

      <t>As Debut is only intended to make initial introductions between
      Guests (and not to manage sessions between those Guests), if the BGP
      session between Guest and Socialite goes down, established BGP peerings
      between Guests will continue to remain active.</t>
    </section>

    <section title="Operational overview / implications">
      <t>There are many reasons why participants peer with route-servers at
      IXPs (see <xref target="I-D.jasinska-ix-bgp-route-server"></xref>)
      including<list style="symbols">
          <t>reducing the administrative burden of arranging and configuring
          BGP sessions with all the other participants,</t>

          <t>not wanting (or not being able) to carry views from all the
          participants,</t>

          <t>relying on the IXP operator to implement routing policy decisions
          (see <xref target="I-D.jasinska-ix-bgp-route-server"></xref>,
          section 2.3)</t>
        </list></t>

      <t>This solution only attempts to address the first reason for using a
      route-server, and the implications of deploying this are described
      below.</t>

      <section title="Additional eBGP sessions.">
        <t>Debut is used to make introductions between all (or a subset of)
        participants at an IXP, and then the participants peer over "regular"
        BGP peerings. This means that each participating router will build a
        separate BGP peering session with every other participating router. As
        participants at IXPs (usually) only advertise a small subset of the
        full Internet routing table (such as internal or customer routes) and
        there is (usually) not a huge overlap of this routing information, the
        additional memory requirements are expected to not be too onerous
        (especially with the capacity of modern routers). As with all
        operational matters though, "Your network, your rules" applies -- it
        is up to each operator to determine the applicability / utility of the
        solution and how it fits (or doesn't) into their network (see what I
        did there? If this ends poorly, it's your own fault!)</t>
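
        <t>To give a feel for the numbers, the arithmetic can be sketched as
        below. It assumes one router per participant and that every
        participant is introduced to every other one; both are
        simplifications, and the function names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def sessions_per_router(participants: int) -> int:
    """Sessions one router holds: every other Guest plus the Socialite."""
    return (participants - 1) + 1


def direct_sessions_at_ixp(participants: int) -> int:
    """Guest-to-Guest sessions across the whole IXP (a full mesh)."""
    return participants * (participants - 1) // 2


for n in (10, 100, 500):
    print(n, sessions_per_router(n), direct_sessions_at_ixp(n))
```
]]></artwork>
        </figure>

        <t>With a route server, each router holds a single session per route
        server instead, so the per-router count is the figure an operator
        would compare against their platform's limits.</t>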
      </section>

      <section title="Simplified debugging.">
        <t>As "normal" eBGP peering sessions are set up between the
        participants (and there is no third party performing route-selection,
        etc), operators have more visibility into the system and can more
        easily leverage their existing troubleshooting / debugging skills to
        debug issues. Debut also more closely aligns the data and control
        planes.</t>
      </section>

      <section title="BGP PATH / SIDR implications">
        <t>Part of the justification for this work is to simplify the design
        and implementation of path security in SIDR. Because a Route-Server
        passes routing information between peers but does not show up in the
        BGP AS-PATH, it is indistinguishable from a BGP path shortening attack. By
        using Debut, eBGP speakers peer directly with each other and this
        problem is avoided.</t>
      </section>
    </section>

    <section title="IANA Considerations">
      <t>IANA is requested to assign an AFI and SAFI for the Debut protocol.
      The text TBD1 should be replaced with the allocated AFI and the text
      TBD2 should be replaced with the allocated SAFI (and then this sentence
      should be removed).</t>

      <t>The IANA is requested to assign a value from the "BGP Message Types"
      registry and replace the text [TBD_BGP] with this value. The definition
      should be "Debut protocol".</t>

      <section title="Debut TYPE registry">
        <t>This document creates a new registry, "Debut Message Types".</t>

        <t>The registry policy is "Specification Required".</t>

        <t>The initial entries in the registry are:</t>

        <figure>
          <artwork><![CDATA[
Value        Short description    Reference      
-------------------------------------------------
0            INTRODUCTION         [This]
1            WITHDRAW             [This]
2-3199       Unassigned
3200-32767   Private Use
32768-65480  Unassigned
65481-65535  Private Use
]]></artwork>
        </figure>

        <t>Applications to the registry can request specific values that have
        yet to be assigned.</t>
      </section>
    </section>

    <section title="Security considerations">
      <t>This protocol is designed to facilitate direct BGP peerings between
      participants at an IXP, which eliminates the need for transparent route
      servers (which do not show up in the AS_PATH). This will facilitate the
      deployment of SIDR.</t>

      <t>As participants peer with each other directly (and not through a
      third party) there is less opportunity for malicious tampering with the
      control plane (for example, by the IXP).</t>

      <t>Debut currently does not provide a means to securely distribute
      Authentication information (there is a field, but it's not really
      defined). If this turns out to be needed, it may be addressed.</t>

      <t>An attacker who manages to subvert the Socialite (or inject UPDATES
      into the Socialite to Guest communication) will be able to make
      Guests peer with a device under the attacker's control -- the impact of
      this seems to be no worse than in the routeserver model.</t>

      <t>Currently routeserver operators perform some base level checking /
      sanitization of routing information (such as enforcing max-paths) - in
      the socialite model each operator is expected to perform their own
      checks.</t>

      <section title="Privacy">
        <t>By having participants peer directly (as opposed to having their
        routing information pass through a route-server) the routing
        information is hidden from the IXP / route-server operator. Please
        note that this doesn't protect the data-plane, and the routing
        information could still be sniffed off the wire.</t>

        <t>The biggest privacy concern with a route server is disclosing your
        policy to a third party, rather than disclosing your routing
        information.</t>
      </section>
    </section>

    <section title="Acknowledgements">
      <t>The authors wish to thank Elisa Jasinska, Masataka MAWATARI, Robert
      Raszuk, Martin Hannigan, and Simon Leinen.</t>
    </section>

    <section title="Author Notes">
      <t>[ RFC Editor -- Please remove this section before publication! ]</t>

      <t><list style="numbers">
          <t>Choose a better name than "Debut"</t>
        </list></t>

      <section title="Changelog.">
        <t><list style="symbols">
            <t>Changed the name of the protocol from 'elo-'elo to Debut - this
            is still not great, but "Introduction" is worse.</t>

            <t>Added Operational section, incorporated notes from John
            Scudder, Keyur.</t>
          </list></t>
      </section>

      <section title="Changes from -00 to -01">
        <t><list style="symbols">
            <t>Incorporated some comments from Elisa Jasinska</t>

            <t>Mainly version bump to prevent expiry!</t>
          </list></t>
      </section>

      <section title="Changes from -01 to -02">
        <t><list style="symbols">
            <t>Incorporated some long lingering nits / suggestions.</t>

            <t>9 or 10 folk have expressed interest and asked us to revive
            this. I (Warren) have done a really poor job of taking notes and
            incorporating them. Apologies; if you mentioned issues to me in
            person I have probably forgotten to incorporate them, *please*
            send them in email and I'll get to them.</t>

            <t>Clarified the audience slightly, improved some security
            bits.</t>
          </list></t>
      </section>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc2119;

      &rfc4893;

      &rfc4271;

      &rfc4760;
    </references>

    <references title="Informative References">
      <?rfc include='reference.I-D.draft-jasinska-ix-bgp-route-server-03.xml'?>
    </references>
  </back>
</rfc>
