<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="info" docName="draft-ginsberg-lsr-isis-flooding-scale-05"
     ipr="trust200902" updates="">
  <front>
    <title abbrev="draft-ginsberg-lsr-isis-flooding-scale">IS-IS Flooding
    Scale Considerations</title>

    <author fullname="Les Ginsberg" initials="L" surname="Ginsberg">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>821 Alder Drive</street>

          <city>Milpitas</city>

          <code>95035</code>

          <region>CA</region>

          <country>USA</country>
        </postal>

        <email>ginsberg@cisco.com</email>
      </address>
    </author>

    <author fullname="Peter Psenak" initials="P" surname="Psenak">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>Apollo Business Center Mlynske nivy 43</street>

          <city>Bratislava</city>

          <code>821 09</code>

          <country>Slovakia</country>
        </postal>

        <email>ppsenak@cisco.com</email>
      </address>
    </author>

    <author fullname="Marek Karasek" initials="M" surname="Karasek">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>Pujmanove 1753/10a, Prague 4 - Nusle</street>

          <city>Prague</city>

          <region/>

          <code>10 14000</code>

          <country>Czech Republic</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>mkarasek@cisco.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Acee Lindem" initials="A" surname="Lindem">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>301 Midenhall Way</street>

          <city>Cary</city>

          <region>NC</region>

          <code>27513</code>

          <country>US</country>
        </postal>

        <email>acee@cisco.com</email>
      </address>
    </author>

    <author fullname="Tony Przygienda" initials="T" surname="Przygienda">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street>1137 Innovation Way</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code/>

          <country>USA</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>prz@juniper.net</email>

        <uri/>
      </address>
    </author>

    <date year="2021"/>

    <area>Routing Area</area>

    <workgroup>Networking Working Group</workgroup>

    <keyword/>

    <abstract>
      <t>Link State PDU flooding rates in use are much slower than what modern
      networks can support. The use of IS-IS at larger scale requires faster
      flooding rates to achieve desired convergence goals. This document
      discusses issues associated with increasing flooding rates and some
      recommended practices which allow faster flooding rates to be used
      safely.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as
      shown here.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Link state IGPs such as Intermediate-System-to-Intermediate-System
      (IS-IS) depend upon having consistent Link State Databases (LSDB) on all
      Intermediate Systems (ISs) in the network in order to provide correct
      forwarding of data packets. When topology changes occur, new/updated
      Link State PDUs (LSPs) are propagated network-wide. The speed of
      propagation is a key contributor to convergence time.</t>

      <t>Historically, flooding rates have been conservative - on the order of
      tens of LSPs/second. This derives from guidance in the base specification
      <xref target="ISO10589"/> and early deployments when both CPU speeds and
      interface speeds were much slower than they are today and the scale of
      an IS-IS area was smaller than it may be today.</t>

      <t>As IS-IS is deployed at greater scale (a larger number of nodes in an
      area and a larger number of neighbors/node), the impact of the historic
      flooding rates becomes more significant. Consider the bringup or failure
      of a node with 1000 neighbors. This will result in a minimum of 1000 LSP
      updates. At a typical LSP flooding rate used in many deployments today
      (33 LSPs/second), it would take 30+ seconds simply to send the updated
      LSPs to a given neighbor. Depending on the diameter of the network,
      achieving a consistent LSDB on all nodes in the network could easily
      take a minute (or more).</t>
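
      <t>The arithmetic behind this estimate can be sketched as follows. This
      is only an illustration: the function name seconds_to_send is
      hypothetical, and the calculation assumes a constant transmission rate
      to a single neighbor with no retransmissions.</t>

      <t><figure>
          <artwork><![CDATA[
```python
def seconds_to_send(lsp_count, lsps_per_second):
    """Time to send lsp_count LSPs to one neighbor at a fixed rate.

    Hypothetical helper; ignores retransmissions and queueing.
    """
    return lsp_count / lsps_per_second


# 1000 LSP updates at the typical rate of 33 LSPs/second.
print(round(seconds_to_send(1000, 33), 1))   # 30.3
```
]]></artwork>
        </figure></t>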

      <t>Increasing LSP flooding rate therefore becomes an essential element
      of supporting greater network scale.</t>

      <t>The remainder of this document discusses various aspects of protocol
      operation and how they are impacted by increased flooding rate. Where
      appropriate, best practices are defined which enhance an
      implementation's ability to support faster flooding rates.</t>
    </section>

    <section anchor="HISTORY" title="Historical Behavior">
      <t>The base specification for IS-IS <xref target="ISO10589"/> was first
      published in 1992 and updated in 2002. The update made no changes
      with regard to suggested timer values. Convergence targets at the time were
      on the order of seconds and the specified timer values reflect that.
      Here are some examples:</t>

      <t><figure>
          <artwork><![CDATA[  minimumLSPGenerationInterval - This is the minimum time interval
     between generation of Link State PDUs. A source Intermediate 
     system shall wait at least this long before re-generating one
     of its own Link State PDUs.
  The recommended value was 30 seconds.

  minimumLSPTransmissionInterval - This is the amount of time an 
     Intermediate system shall wait before further propagating 
     another Link State PDU from the same source system. 
  The recommended value was 5 seconds.

  partialSNPInterval - This is the amount of time between periodic 
     action for transmission of Partial Sequence Number PDUs.
     It shall be less than minimumLSPTransmissionInterval.
  The recommended value was 2 seconds.

]]></artwork>
        </figure></t>

      <t>Most relevant to a discussion of LSP flooding rate is the recommended
      interval between the transmission of two different LSPs on a given
      interface.</t>

      <t>For broadcast interfaces, <xref target="ISO10589"/> defined:</t>

      <t><figure>
          <artwork><![CDATA[  minimumBroadcastLSPTransmissionInterval - the minimum interval
     between PDU arrivals which can be processed by the slowest 
     Intermediate System on the LAN.
  The default value was defined as 33 milliseconds.
  NOTE: It was permitted to send multiple LSPs "back-to-back"
  as a burst, but this was limited to 10 LSPs in a one second
  period.

]]></artwork>
        </figure></t>

      <t>Although this value was specific to LAN interfaces, implementations
      have commonly applied it to all interfaces, though that was not the
      original intent of the base specification. In fact, Section
      12.1.2.4.3 of <xref target="ISO10589"/> states:</t>

      <t><figure>
          <artwork><![CDATA[  On point-to-point links the peak rate of arrival is limited only 
  by the speed of the data link and the other traffic flowing on 
  that link. 

]]></artwork>
        </figure></t>

      <t>Although modern implementations have not strictly adhered to the 33
      millisecond interval, it is commonplace for implementations to limit
      flooding rate to an order of magnitude similar to the 33 ms value.</t>

      <t>In the past 20 years, significant work on achieving faster
      convergence - more specifically sub-second convergence - has resulted in
      implementations modifying a number of the above timers in order to
      support faster signaling of topology changes. For example,
      minimumLSPGenerationInterval has been modified to support millisecond
      intervals - often with a backoff algorithm applied to prevent LSP
      generation storms in the event of a series of rapid oscillations.</t>

      <t>However, flooding rate has not been fundamentally altered.</t>
    </section>

    <section anchor="FCONVERGE" title="Flooding Rate and Convergence">
      <t>Convergence involves a number of sequential operations.</t>

      <t>First the topology change needs to be detected. This is a local
      activity occurring only on the node or nodes directly connected to the
      topology change. The directly connected node(s) then must advertise the
      topology change by updating their LSPs and flooding the changed LSPs.
      Routers then must process the updated LSDB and recalculate paths to
      affected destinations. The updated paths must then be installed in the
      forwarding plane.</t>

      <t>Only when all of the steps are completed on all nodes in the network
      has the network completed convergence.</t>

      <t>As the convergence requirement is consistency of LSDBs on all nodes
      in the network, it is fundamental to understand that the goal of
      flooding is to update the LSDB on all nodes in the network "as fast as
      possible". Controlling the rate of flooding per interface is done to
      address some practical limitations which include:</t>

      <t><list style="symbols">
          <t>Fairness to other data and control traffic on the same
          interface</t>

          <t>Limitations on the processing rate of incoming control
          traffic</t>
        </list></t>

      <t>However, intentionally using different flooding rates on different
      interfaces increases the possibility of longer periods of LSDB
      inconsistency, which, in turn, delays network wide convergence.</t>

      <t>Many implementations provide knobs to control the rate of LSP
      flooding on a per interface basis. To the extent that this serves as a
      flow control mechanism, this may reduce the number of dropped LSPs
      during high activity bursts and thereby reduce the number of LSP
      retransmissions required. As LSP retransmission timers are typically
      long (multiple seconds), this may result in shorter convergence times
      than if the LSP burst were uncontrolled. But if the performance
      characteristics of routers in the network are such that some routers
      consistently accept and process fewer LSPs/second than other routers,
      convergence will be degraded. Tuning LSP transmission timers on a per
      interface basis will never provide optimal convergence. Consistent
      flooding rates should be used on all interfaces.</t>

      <section anchor="FLOWCTRL" title="Flow Control Considerations">
        <t>In large scale deployments where an increased flooding rate is
        being used, it becomes more likely that a burst of LSPs may
        temporarily overwhelm a receiver. Normal operation of the Update
        Process will recover from this, but it may well make sense to employ
        some form of flow control. This will not serve to optimize
        convergence, but it can serve to reduce the number of LSP
        retransmissions. As retransmissions are deliberately done at a slow
        rate, the result of flow control will be to provide a shorter recovery
        time from a transient condition which prevents a node from handling
        the targeted rate of LSP transmission. Sustained inability to handle
        LSP reception at the targeted flooding rate indicates that the network
        is provisioned in a way which does not support optimal convergence.
        Steps need to be taken to resolve this issue. Such steps could include
        upgrading the routers that demonstrate this condition consistently,
        altering the configuration on the problematic routers or altering the
        position of the problematic routers in the network so as to reduce the
        overall load on those routers, or reducing the target maximum LSP
        transmission rate network-wide.</t>

        <t>When flow control is necessary, it can be implemented in a
        straightforward manner based on knowledge of the current flooding rate
        and the current acknowledgement rate. Such an algorithm is a local
        matter and there is no requirement or intent to standardize an
        algorithm. However, some general guidelines can be described.</t>

        <t>A maximum target LSP transmission rate (LSPTxMax) SHOULD be
        configurable. This represents the fastest LSP transmission rate which
        will be attempted. This value SHOULD be applicable to all interfaces
        and SHOULD be consistent network wide.</t>

        <t>When the current rate of LSP transmission (LSPTxRate) exceeds the
        capabilities of the receiver, the flow control algorithm needs to
        aggressively reduce the LSPTxRate within a few seconds. Slower
        responsiveness is likely to result in a large number of
        retransmissions which can introduce much larger delays in
        convergence.</t>

        <t>NOTE: Even with modest increases in flooding speed (for example, a
        target LSPTxMax of 300 LSPs/second (10 times the typical rate
        supported today)), a topology change triggering 2100 new LSPs would
        only take 7 seconds to complete.</t>

        <t>Dynamic adjustment of the rate of LSP transmission (LSPTxRate)
        upwards (i.e., faster) SHOULD be done less aggressively and only be
        done when the neighbor has demonstrated its ability to sustain the
        current LSPTxRate.</t>

        <t>The flow control algorithm MUST NOT assume the receive capabilities
        of a neighbor are static, i.e., it MUST handle transient conditions
        which result in a slower or faster receive rate on the part of a
        neighbor.</t>

        <t>The flow control algorithm needs to consider the expected delay
        time in receiving an acknowledgment. See <xref target="LSPACKRate"/>.
        This may vary per neighbor.</t>
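
        <t>One possible shape for such an algorithm is sketched below. It is
        not a proposed standard and not taken from any implementation. The
        class name LspRateController, the multiplicative decrease factor, the
        additive increase step, the floor of 33 LSPs/second, and the number
        of stable intervals required before speeding up are all assumptions
        chosen for illustration. The sketch reduces LSPTxRate sharply when
        acknowledgments fall behind and raises it slowly, and only after the
        neighbor has sustained the current rate.</t>

        <t><figure>
            <artwork><![CDATA[
```python
class LspRateController:
    """Hypothetical per-neighbor LSP transmit rate controller.

    All default values are illustrative assumptions.
    """

    def __init__(self, lsp_tx_max=300.0, lsp_tx_min=33.0,
                 decrease_factor=0.5, increase_step=10.0,
                 stable_intervals_needed=3):
        self.lsp_tx_max = lsp_tx_max
        self.lsp_tx_min = lsp_tx_min
        self.decrease_factor = decrease_factor
        self.increase_step = increase_step
        self.stable_intervals_needed = stable_intervals_needed
        self.lsp_tx_rate = lsp_tx_max   # start at the target rate
        self._stable = 0

    def on_interval(self, sent, acked):
        """Update LSPTxRate from one measurement interval.

        sent  - LSPs whose acknowledgment was due in this interval
                (this accounts for the expected PSNP delay)
        acked - how many of those LSPs were acknowledged
        """
        if acked < sent:
            # Receiver is falling behind: back off aggressively.
            self.lsp_tx_rate = max(
                self.lsp_tx_min,
                self.lsp_tx_rate * self.decrease_factor)
            self._stable = 0
        else:
            # Receiver kept up. Speed up only after it has done so
            # for several consecutive intervals.
            self._stable += 1
            if self._stable >= self.stable_intervals_needed:
                self.lsp_tx_rate = min(
                    self.lsp_tx_max,
                    self.lsp_tx_rate + self.increase_step)
                self._stable = 0
        return self.lsp_tx_rate
```
]]></artwork>
          </figure></t>

        <t>With these assumed defaults, a single interval in which only half
        of the LSPs are acknowledged halves the rate from 300 to 150
        LSPs/second, and three consecutive fully acknowledged intervals are
        needed before the rate rises again to 160 LSPs/second.</t>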
      </section>

      <section anchor="LSPACKRate" title="Rate of LSP Acknowledgments">
        <t>On point-to-point networks, PSNP PDUs provide acknowledgments for
        received LSPs. <xref target="ISO10589"/> suggests that some delay be
        used when sending PSNPs. This provides some optimization as multiple
        LSPs can be acknowledged in a single PSNP.</t>

        <t>If faster LSP flooding is to be used safely, it is necessary that
        LSPs be acknowledged more promptly as well. This requires a reduction
        in the delay in sending PSNPs.</t>

        <t>As PSNPs also consume link bandwidth, packet queue space, and
        protocol processing time on receipt, the increased sending of PSNPs
        should be taken into account when considering the rate at which LSPs
        can be sent on an interface.</t>
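
        <t>The trade-off between prompt acknowledgment and the number of
        PSNPs sent can be illustrated with a small batching sketch. The class
        name PsnpBatcher, the 50 millisecond maximum delay, and the limit of
        90 entries per PSNP are assumptions made for this example only. A
        PSNP is emitted as soon as it is full or once the oldest pending
        acknowledgment has waited for the maximum delay.</t>

        <t><figure>
            <artwork><![CDATA[
```python
class PsnpBatcher:
    """Hypothetical batching of LSP acknowledgments into PSNPs."""

    def __init__(self, max_delay=0.05, max_entries=90):
        self.max_delay = max_delay      # seconds; assumed value
        self.max_entries = max_entries  # assumed entries per PSNP
        self._pending = []
        self._first_arrival = None

    def on_lsp_received(self, lsp_id, now):
        """Queue an acknowledgment; return a PSNP if one is due."""
        if not self._pending:
            self._first_arrival = now
        self._pending.append(lsp_id)
        return self.poll(now)

    def poll(self, now):
        """Return pending acknowledgments if full or old enough."""
        if not self._pending:
            return None
        full = len(self._pending) >= self.max_entries
        expired = now - self._first_arrival >= self.max_delay
        if full or expired:
            psnp, self._pending = self._pending, []
            return psnp
        return None
```
]]></artwork>
          </figure></t>

        <t>Lowering max_delay in this sketch acknowledges LSPs sooner at the
        cost of more PSNPs, which is the trade-off described above.</t>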
      </section>

      <section anchor="BANDU" title="Bandwidth Utilization">
        <t>Routing protocol traffic has to share bandwidth on a link with
        other control traffic and data traffic. During periods of instability,
        routing protocol traffic will increase, but it is still desirable that
        the maximum bandwidth consumption by routing protocol traffic be
        modest. This needs to be considered when setting IS-IS flooding
        rates.</t>

        <t>If we assume a maximum size of 1492 bytes for an LSP, here are some
        rough estimates of bandwidth consumption at different flooding
        rates:</t>

        <t><figure>
            <artwork><![CDATA[+--------------+----------------+-------------+
| LSPs/second  | 100 Mb Link    | 1 Gb Link   |
+--------------+----------------+-------------+
|   100        |    1.2 %       |  0.1 %      |
+--------------+----------------+-------------+
|   500        |    6.1 %       |  0.6 %      |
+--------------+----------------+-------------+
|  1000        |    12.1 %      |  1.2 %      |
+--------------+----------------+-------------+

]]></artwork>
          </figure></t>
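
        <t>The estimates above can be approximated with the short calculation
        below. It assumes that each 1492-byte LSP occupies a full 1518-byte
        Ethernet frame on the wire. That frame size is not stated in this
        document; it is an assumption chosen because it yields values that
        round to the figures in the table. The function name
        utilization_percent is hypothetical.</t>

        <t><figure>
            <artwork><![CDATA[
```python
# Assumed on-wire size of a frame carrying a 1492-byte LSP.
FRAME_SIZE_BYTES = 1518


def utilization_percent(lsps_per_second, link_bits_per_second,
                        frame_size_bytes=FRAME_SIZE_BYTES):
    """Percentage of link bandwidth consumed by LSP flooding."""
    bits_per_second = lsps_per_second * frame_size_bytes * 8
    return 100.0 * bits_per_second / link_bits_per_second


for rate in (100, 500, 1000):
    print(rate,
          round(utilization_percent(rate, 100e6), 1),
          round(utilization_percent(rate, 1e9), 1))
```
]]></artwork>
          </figure></t>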
      </section>

      <section anchor="PKTPRI" title="Packet Prioritization on Receive">
        <t>There are three classes of PDUs sent by IS-IS:</t>

        <t><list style="symbols">
            <t>Hellos</t>

            <t>LSPs</t>

            <t>Complete Sequence Number PDUs (CSNPs) and Partial Sequence
            Number PDUs (PSNPs)</t>
          </list>Implementations today may prioritize the reception of Hellos
        over LSPs and SNPs in order to prevent a burst of LSP updates from
        triggering an adjacency timeout which in turn would require additional
        LSPs to be updated.</t>

        <t>SNPs serve to acknowledge or trigger the transmission of specified
        LSPs. On a point-to-point link, PSNPs acknowledge the receipt of one
        or more LSPs. Because PSNPs (like all IS-IS PDUs) use TLVs in the
        body, it is possible to acknowledge multiple LSPs using a single PSNP.
        For this reason, <xref target="ISO10589"/> specifies a delay
        (partialSNPInterval) before sending a PSNP so that the number of PSNPs
        required to be sent is reduced. On receipt of a PSNP, the set of LSPs
        acknowledged by that PSNP can be marked so that they do not need to be
        retransmitted.</t>

        <t>If a PSNP is dropped on reception, this has a significant impact as
        the set of LSPs advertised in the PSNP cannot be marked as
        acknowledged and this results in needless retransmissions which may
        further delay transmission of other LSPs which have yet to be
        transmitted. It may also make it more likely that a receiver becomes
        overwhelmed by LSP transmissions.</t>

        <t>It is therefore recommended that implementations prioritize the
        receipt of SNPs over LSPs.</t>
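
        <t>A minimal sketch of such prioritization is shown below. The class
        name ReceiveQueue and the three-level ordering (Hellos first, then
        SNPs, then LSPs) are assumptions for illustration; an actual
        implementation may prioritize in hardware or in a punt path rather
        than in a software queue.</t>

        <t><figure>
            <artwork><![CDATA[
```python
import heapq
import itertools

# Lower number is served first: Hellos, then SNPs, then LSPs.
# This ordering is an assumption made for this sketch.
PRIORITY = {"HELLO": 0, "CSNP": 1, "PSNP": 1, "LSP": 2}


class ReceiveQueue:
    """Hypothetical receive queue that orders PDUs by class."""

    def __init__(self):
        self._heap = []
        # Sequence number keeps FIFO order within a class.
        self._seq = itertools.count()

    def push(self, pdu_type, pdu):
        entry = (PRIORITY[pdu_type], next(self._seq), pdu_type, pdu)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _, _, pdu_type, pdu = heapq.heappop(self._heap)
        return pdu_type, pdu

    def __len__(self):
        return len(self._heap)
```
]]></artwork>
          </figure></t>

        <t>If two LSPs, a PSNP, and a Hello are queued in that order, this
        sketch delivers the Hello first, then the PSNP, then the two LSPs in
        their arrival order.</t>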
      </section>
    </section>

    <section anchor="LSPGEN" title="Minimizing LSP Generation">
      <t>In IS-IS the unit of flooding is an LSP. Each router may generate a
      set of LSPs at each supported level. Each LSP in the set has an LSP
      number - which is a value from 0-N where N = 255 for the base protocol.
      (N has been extended to 65535 by <xref target="RFC7356"/>.) Each LSP
      carries network information using defined Type/Length/Value (TLV)
      tuples. For example, some TLVs carry neighbor information and some TLVs
      carry reachable prefix information. <xref target="ISO10589"/> strongly
      recommends preserving the association of a given advertisement (such as
      a neighbor) with a specific LSP whenever possible. This minimizes the
      number of LSPs which need to be regenerated when a topology change
      occurs. This recommendation becomes even more important as the scale of
      the network increases.</t>

      <t>Consider the following example:</t>

      <t>Node A has 11 neighbors currently in the UP state and is advertising
      them in three LSPs with content as follows:</t>

      <t><figure>
          <artwork><![CDATA[A.00-00 contains the following advertisements:
   Neighbor 1
   Neighbor 2
   Neighbor 3
   Neighbor 4
   Neighbor 5
A.00-01 contains the following advertisements:
   Neighbor 6
   Neighbor 7
   Neighbor 8
   Neighbor 9
   Neighbor 10
A.00-02 contains the following advertisements:
   Neighbor 11

]]></artwork>
        </figure>Imagine that the adjacency to Neighbor 3 goes down. There are
      (at least) two ways that A could update its LSPs.</t>

      <t>Method 1: Node A removes the neighbor advertisement for neighbor 3
      from A.00-00 and sends an update for that LSP. LSPs 00-01 and 00-02 are
      unchanged and so do not have to be flooded.</t>

      <t>Method 2: Node A attempts to reduce the number of LSPs currently
      active and updates the content as follows:</t>

      <t><figure>
          <artwork><![CDATA[A.00-00 contains the following advertisements:
   Neighbor 1
   Neighbor 2
   Neighbor 4
   Neighbor 5
   Neighbor 6
A.00-01 contains the following advertisements:
   Neighbor 7
   Neighbor 8
   Neighbor 9
   Neighbor 10
   Neighbor 11
A.00-02 becomes empty

]]></artwork>
        </figure>Node A now has to flood all three LSPs. LSPs #0 and #1 are
      reflooded because their content has changed. LSP #2 is purged.</t>

      <t>In a large scale network, the impact of using Method #2 becomes
      significant and introduces conditions where a much larger number of LSPs
      need to be flooded than is the case with Method #1.</t>

      <t>In order to operate at scale, implementations need to follow the
      guidance in <xref target="ISO10589"/> and use Method #1 whenever
      possible.</t>
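
      <t>The difference between the two methods can be made concrete with a
      small sketch that counts which LSPs change when Neighbor 3 is removed
      from the example above. The function names changed_lsps, method1, and
      method2 are hypothetical, and the limit of five neighbors per LSP is
      taken from the example rather than from any real encoding limit.</t>

      <t><figure>
          <artwork><![CDATA[
```python
def changed_lsps(before, after):
    """Return the LSP numbers whose content differs."""
    numbers = sorted(set(before) | set(after))
    return [n for n in numbers
            if before.get(n, []) != after.get(n, [])]


def method1(lsps, neighbor):
    """Remove the neighbor in place; other LSPs are untouched."""
    return {n: [x for x in adv if x != neighbor]
            for n, adv in lsps.items()}


def method2(lsps, neighbor, per_lsp):
    """Repack remaining neighbors into as few LSPs as possible."""
    remaining = [x for n in sorted(lsps) for x in lsps[n]
                 if x != neighbor]
    packed = {n: [] for n in lsps}
    for i, x in enumerate(remaining):
        packed[i // per_lsp].append(x)
    return packed


before = {0: [1, 2, 3, 4, 5], 1: [6, 7, 8, 9, 10], 2: [11]}
print(changed_lsps(before, method1(before, 3)))     # [0]
print(changed_lsps(before, method2(before, 3, 5)))  # [0, 1, 2]
```
]]></artwork>
        </figure></t>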
    </section>

    <section anchor="REDUNDANT" title="Redundant Flooding">
      <t>Default operation of the Update Process is to flood on all
      interfaces. In cases where a network is highly meshed, this can result
      in a significant amount of redundant flooding. Nodes will receive
      multiple copies of each updated LSP.</t>

      <t>There are defined mechanisms which can greatly reduce the redundant
      flooding. These include:</t>

      <t><list style="symbols">
          <t>Mesh Groups (<xref target="RFC2973"/>)</t>

          <t>Dynamic Flooding (<xref
          target="I-D.ietf-lsr-dynamic-flooding"/>)</t>
        </list></t>
    </section>

    <section anchor="JUMBO" title="Use of Jumbo Frames">
      <t>The maximum size of an LSP (LSPBufferSize) is a parameter that needs
      to be set consistently network wide. This is because IS-IS does not
      support fragmentation of its PDUs - so in order for network wide
      flooding of an LSP to be successful all routers must restrict their LSP
      size to a size which can be supported without fragmentation on all
      interfaces on which IS-IS operates.</t>

      <t>In networks where all interfaces on which IS-IS operates support
      large frames, LSPBufferSize may be set to a larger value than the
      default (1492). This allows more routing information to be encoded in a
      single LSP, which means that fewer LSPs are generated by each node and
      therefore the number of LSPs which need to be flooded can be reduced in
      some scenarios (e.g., node or interface bringup).</t>
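
      <t>A rough estimate of the effect can be sketched as follows. The
      sketch assumes that an LSP carries only neighbor advertisements, that
      each advertisement occupies 11 bytes (an Extended IS Reachability entry
      with no sub-TLVs), that the LSP header is 27 bytes, and that each TLV
      has a 2-byte header and at most 255 bytes of value. Real LSPs carry
      other TLVs as well, so actual counts will be higher. The function names
      and the LSPBufferSize of 9000 are hypothetical.</t>

      <t><figure>
          <artwork><![CDATA[
```python
import math

# All sizes below are assumptions made for this estimate.
LSP_HEADER_BYTES = 27      # fixed LSP header
TLV_HEADER_BYTES = 2       # type octet + length octet
TLV_MAX_VALUE_BYTES = 255  # length field is one octet
NEIGHBOR_ENTRY_BYTES = 11  # neighbor entry without sub-TLVs


def neighbors_per_lsp(lsp_buffer_size):
    """Neighbor entries that fit in one LSP of the given size."""
    per_tlv = TLV_MAX_VALUE_BYTES // NEIGHBOR_ENTRY_BYTES
    full_tlv = TLV_HEADER_BYTES + per_tlv * NEIGHBOR_ENTRY_BYTES
    space = lsp_buffer_size - LSP_HEADER_BYTES
    full_tlvs, rest = divmod(space, full_tlv)
    # A final, partially filled TLV may use the leftover space.
    extra = max(0, (rest - TLV_HEADER_BYTES) // NEIGHBOR_ENTRY_BYTES)
    return full_tlvs * per_tlv + extra


def lsps_needed(neighbor_count, lsp_buffer_size):
    """LSPs required to advertise neighbor_count neighbors."""
    per_lsp = neighbors_per_lsp(lsp_buffer_size)
    return math.ceil(neighbor_count / per_lsp)


print(lsps_needed(1000, 1492))   # 8
print(lsps_needed(1000, 9000))   # 2
```
]]></artwork>
        </figure></t>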
    </section>

    <section title="Deployment Considerations">
      <t>As noted earlier in this document, it is desired to have consistent
      flooding speeds on all nodes in the network. Today, this is roughly
      achieved to the extent that current implementations flood at rates which
      are on the order of what is discussed in <xref target="ISO10589"/>
      (i.e., 33 LSPs/second).</t>

      <t>As the goal is to introduce an order of magnitude increase in the
      rate of flooding (e.g., 10 times the current flooding rate), a network
      which has a mixture of nodes which support the faster flooding speeds
      and nodes which do not is at greater risk of introducing longer periods
      of LSDB inconsistency in the network - which is likely to have a
      negative impact on convergence and increase the occurrence of traffic
      drops or looping.</t>

      <t>It is recommended that all nodes in the network support increased
      flooding rates before enabling use of the increased flooding rates.</t>

      <t>Note that as the Update Process runs in the context of an area (or
      the L2 sub-domain), enablement can safely be done on a per area basis
      even when nodes in another area do not support the faster flooding
      rates.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document requires no actions by IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>Security concerns for IS-IS are addressed in <xref
      target="ISO10589"/>, <xref target="RFC5304"/>, and <xref
      target="RFC5310"/>.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Thanks to Bruno Decraene for his careful review and insightful
      comments.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <reference anchor="ISO10589">
        <front>
          <title>Intermediate system to Intermediate system intra-domain
          routeing information exchange protocol for use in conjunction with
          the protocol for providing the connectionless-mode Network Service
          (ISO 8473)</title>

          <author>
            <organization abbrev="ISO">International Organization for
            Standardization</organization>
          </author>

          <date month="Nov" year="2002"/>
        </front>

        <seriesInfo name="ISO/IEC" value="10589:2002, Second Edition"/>
      </reference>

      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.2973'?>

      <?rfc include='reference.RFC.5304'?>

      <?rfc include='reference.RFC.5310'?>

      <?rfc include='reference.RFC.8174'?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.I-D.ietf-lsr-dynamic-flooding.xml"?>

      <?rfc include='reference.RFC.7356'?>
    </references>
  </back>
</rfc>
