< draft-white-lsr-distoptflood-01.txt   draft-white-lsr-distoptflood-02.txt >
Network Working Group R. White Network Working Group R. White
Internet-Draft S. Hegde Internet-Draft S. Hegde
Intended status: Informational T. Przygienda Intended status: Informational T. Przygienda
Expires: 3 September 2022 Juniper Networks Expires: 27 October 2022 Juniper Networks
2 March 2022 25 April 2022
IS-IS Optimal Distributed Flooding for Dense Topologies IS-IS Optimal Distributed Flooding for Dense Topologies
draft-white-lsr-distoptflood-01 draft-white-lsr-distoptflood-02
Abstract Abstract
In dense topologies (such as data center fabrics based on the Clos In dense topologies (such as data center fabrics based on the Clos
and butterfly, though not limited to these topologies), flooding and butterfly topologies, though not limited to these), IGP flooding
mechanisms designed for sparse topologies can flood excessively, or mechanisms designed for sparse topologies can "overflood," or carry
carry too many copies of topology and reachability to fabric devices. too many copies of topology and reachability information to fabric
This results in slower convergence times and higher resource devices. This results in slower convergence times and higher
utilization. The modifications to the flooding mechanism in the resource utilization. The modifications to the flooding mechanism in
Intermediate System to Intermediate System (IS-IS) link state the Intermediate System to Intermediate System (IS-IS) link state
protocol described in this document reduce resource utilization to a protocol described in this document reduce resource utilization
minimum, while increasing convergence performance in dense significantly, while increasing convergence performance in dense
topologies. topologies.
Note that a Clos fabric is used as the primary example of a dense Note that a Clos fabric is used as the primary example of a dense
flooding topology throughout this document. However, the flooding flooding topology throughout this document. However, the flooding
optimizations described in this document apply to any dense topology. optimizations described in this document apply to any topology.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 3 September 2022. This Internet-Draft will expire on 27 October 2022.
Copyright Notice Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 25 skipping to change at page 2, line 25
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. Contributors . . . . . . . . . . . . . . . . . . . . . . 3 1.2. Contributors . . . . . . . . . . . . . . . . . . . . . . 3
1.3. Experience . . . . . . . . . . . . . . . . . . . . . . . 3 1.3. Experience . . . . . . . . . . . . . . . . . . . . . . . 3
1.4. Sample Network . . . . . . . . . . . . . . . . . . . . . 3 1.4. Sample Network . . . . . . . . . . . . . . . . . . . . . 3
2. Flooding Modifications . . . . . . . . . . . . . . . . . . . 5 2. Flooding Modifications . . . . . . . . . . . . . . . . . . . 5
2.1. Optimizing Flooding . . . . . . . . . . . . . . . . . . . 5 2.1. Optimizing Flooding . . . . . . . . . . . . . . . . . . . 5
2.2. Optimization Process . . . . . . . . . . . . . . . . . . 6 2.2. Optimization Process . . . . . . . . . . . . . . . . . . 6
2.3. Flooding Failures . . . . . . . . . . . . . . . . . . . . 7 2.3. Flooding Failures . . . . . . . . . . . . . . . . . . . . 7
2.4. Flooding Example . . . . . . . . . . . . . . . . . . . . 7 2.4. Flooding Example . . . . . . . . . . . . . . . . . . . . 8
2.5. A Note on Performance . . . . . . . . . . . . . . . . . . 8 2.5. A Note on Performance . . . . . . . . . . . . . . . . . . 8
3. Security Considerations . . . . . . . . . . . . . . . . . . . 8 3. Security Considerations . . . . . . . . . . . . . . . . . . . 8
4. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 4. References . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.1. Normative References . . . . . . . . . . . . . . . . . . 8 4.1. Normative References . . . . . . . . . . . . . . . . . . 8
4.2. Informative References . . . . . . . . . . . . . . . . . 10 4.2. Informative References . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12
1. Introduction 1. Introduction
1.1. Goals 1.1. Goals
The goal of this draft is to solve one specific set of problems The goal of this draft is to solve one specific set of problems
involved in operating a link state protocol in a dense mesh topology. involved in operating a link state protocol in a densely meshed
The problem with such topologies is the connectivity density, which topology. The problem with such topologies is the connectivity
causes too much information to be flooded (or too much repeated state density, which causes too many copies of identical information to be
to be flooded). Analysis and experiment show, for instance, that in flooded. Analysis and experiment show, for instance, that in a
a butterfly fabric of around 2500 intermediate systems, each butterfly fabric of around 2500 intermediate systems, each
intermediate system will receive 40+ copies of any changed LSP intermediate system will receive more than 40 copies of any changed
fragment. This not only wastes bandwidth and processor time, this LSP fragment. This not only wastes bandwidth and processor time,
dramatically slows convergence speed. this dramatically slows convergence speed.
This document describes a set of modifications to existing IS-IS This document describes a set of modifications to existing IS-IS
flooding mechanisms which minimize the number of LSP framgents flooding mechanisms which minimize the number of LSP fragments
received by individual intermediate systems, potentially to one copy received by individual intermediate systems, in its extreme version
per intermediate system. The mechanism described here chooses to one copy per intermediate system. The mechanisms described in
different flooding intermediates using a hash across the system ID to this document are similar to those implemented in OSPF to support
prevent single failures from having a large impact on flooding, and mobile ad-hoc networks, as described in [RFC5449], [RFC5614], and
to spread the processing load of flooding across many systems and [RFC7182]. These mechanisms have been widely implemented and
prevent bottlenecks. deployed.
The mechanisms described in this document are similar to those
implemented in OSPF to support mobile ad-hoc networks, as described
in [RFC5449], [RFC5614], and [RFC7182]. Mechanisms similar to these
have been widely implemented and deployed.
1.2. Contributors 1.2. Contributors
The following people have contributed to this draft: Abhishek Kumar, The following people have contributed to this draft: Abhishek Kumar,
Nikos Triantafillis, Ivan Pepelnjak, Christian Franke, Hannes Nikos Triantafillis, Ivan Pepelnjak, Christian Franke, Hannes
Gredler, Les Ginsberg, Naiming Shen, Uma Chunduri, Nick Russo, and Gredler, Les Ginsberg, Naiming Shen, Uma Chunduri, Nick Russo, Shawn
Rodny Molina. Zandi, and Rodny Molina.
1.3. Experience 1.3. Experience
Lab testing shows modifications similar to these reduce flooding in a Laboratory tests show modifications similar to these reduce flooding
large scale emulated butterfly network topology; without these in a large scale emulated butterfly network topology; without these
modifications, intermediate systems receive, on average, 40 copies of modifications, intermediate systems receive, on average, 40 copies of
any changed LSP fragment. With these modifications, intermediate any changed LSP fragment. With the modifications described in this
systems recieve, on average, two copies of any changed LSP fragment. document intermediate systems recieve, on average, two copies of any
In many cases, each intermediate system receives one copy of each changed LSP fragment. In many cases, each intermediate system
changed LSP. In terms of performance, the modifications described receives only a single copy of each changed LSP. In terms of
here reduce convergence times by around 50%. A network that converges performance, the modifications described here cut convergence times
in about 30-40 seconds without these modifications converged in 15-20 in half. Processor load times were not checked, as this was an
seconds with these modifications. Processor load times were not emulated environment.
checked, as this was an emulated environment.
A mechanism similar to the one described in this document has been A mechanism similar to the one described in this document has been
implemented in the FR Routing open source routing stack as part of implemented in the FR Routing open source routing stack as part of
fabricd. fabricd.
1.4. Sample Network 1.4. Sample Network
The following spine and leaf fabric will be used to describe these The following spine and leaf fabric will be used to describe these
modifications. modifications.
skipping to change at page 4, line 49 skipping to change at page 4, line 49
* 4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and * 4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and
5F 5F
* etc. * etc.
The tiers or stages of the fabric are also marked for easier The tiers or stages of the fabric are also marked for easier
reference. T0 is assumed to be connected to application servers, or reference. T0 is assumed to be connected to application servers, or
rather they are Top of Rack (ToR) intermediate systems. The rather they are Top of Rack (ToR) intermediate systems. The
remaining tiers, T1 and T2, are connected only to other devices in remaining tiers, T1 and T2, are connected only to other devices in
the fabric itself. the fabric itself. A common alternate representation of this
topology is drawn "folded" with T2, the "top of fabric," shown on
top, while T1 is shown below, and T0 below T1.
2. Flooding Modifications 2. Flooding Modifications
Flooding is perhaps the most challenging scaling issue for a link Flooding is perhaps the most challenging scaling issue for a link
state protocol running on a dense, large scale fabric. This section state protocol running on a dense, large scale topology. This
describes modifications to the IS-IS flooding process to reduce section describes detailed modifications to the IS-IS flooding
flooding load on a dense or mesh topology. process to reduce flooding load in a densely meshed topology.
2.1. Optimizing Flooding 2.1. Optimizing Flooding
The simplest way to conceive of the solution presented here is in two The simplest way to conceive of the solution presented here is in two
stages: stages:
* Stage 1: Forward Optimization * Stage 1: Forward Optimization
- Find the group of intermediate systems that will all flood to - Find the group of intermediate systems that will all flood to
the same set of neighbors as the local IS the same set of neighbors as the local IS
- Decide (deterministially) which of the intermediate systems - Decide (deterministically) which subset of the intermediate
within this group should flood any received LSPs systems within this group should re-flood any received LSPs
* Stage 2: Reverse Optimization * Stage 2: Reverse Optimization
- Find neighbors on the shortest path towards the origin of the - Find neighbors on the shortest path towards the origin of the
change change
- Do not flood towards these neighbors - Do not flood towards these neighbors
The first stage is best explained through an illustration. In the The first stage is best explained through an illustration. In the
network above, if 5A transmits a modified Link State Protocol Data network above, if 5A transmits a modified Link State Protocol Data
Unit (LSP) to 4A-4F, each of 4A-4F will, in turn, flood this modified Unit (LSP) to 4A-4F, each of 4A-4F will, in turn, flood this modified
LSP to 3A (for instance). 3A will receive 6 copies of the modified LSP to 3A (for instance). 3A will receive 6 copies of the modified
LSP, while only one copy is necessary for the intermediate systems LSP, while only one copy is necessary for the intermediate systems
shown to converge on a single view of the topology. If 4A-4F could shown to converge on a single view of the topology. If 4A-4F could
determine they will all flood identical copies of the modified LSP to determine they will all flood identical copies of the modified LSP to
3A, it is possible for all of them except one to decide not to flood 3A, it is possible for all of them except one to decide not to flood
the changed LSP to 3A. the changed LSP to 3A.
The technique used in this draft to determine the flooding group is The technique used in this draft to determine the flooding group is
for each intermediate system to calculate a special SPT from the for each intermediate system to calculate a special Shortest-path
point of view of the transmitting neighbor. By setting the metric of Spanning Tree (SPT) from the point of view of the transmitting
all links to 1 and truncating the SPT to two hops, the local IS can neighbor. By setting the metric of all links to 1 and truncating the
find the group of neighbors it will flood any changed LSP towards and SPT to two hops, the local IS can find the group of neighbors it will
the set of intermediate systems (not necessarily neighbors) which flood any changed LSP towards and the set of intermediate systems
will also flood to this same set of neighbors. If every intermediate (not necessarily neighbors) which will also flood to this same set of
system in the flooding set performs this same calculation, they will neighbors. If every intermediate system in the flooding set performs
all obtain the same flooding group. this same calculation, they will all obtain the same flooding group.
Once this flooding group is determined, the members of the flooding Once this flooding group is determined, the members of the flooding
group will each (independently) choose which of the members should group will each (independently) choose which of the members should
flood. Each member of the flooding group calculates this re-flood the received information. Each member of the flooding group
independently of all the other members, using a common hash across a calculates this independently of all the other members, but a common
set of shared variables so each member of the group comes to the same hash MUST be used across a set of shared variables so each member of
conclusion. The group member which is selected to flood the changed the group comes to the same conclusion. The group member which is
LSP does so normally; the remaining group members do not flood the selected to flood the changed LSP does so normally; the remaining
LSP. group members do not flood the LSP.
Note there is no signaling between the intermediate systems running Note there is no signaling between the intermediate systems running
this flooding reduction mechanism. Each IS calculates a special, this flooding reduction mechanism. Each IS calculates the special,
truncated SPT separately, and determines which IS should flood any truncated SPT separately, and determines which IS should flood any
changed LSPs independently. Because these calculations are performed changed LSPs independently based on a common hash function. Because
using a shared view of the network, however (based on the common link these calculations are performed using a shared view of the network,
state database) and a shared sorting hash, each member of the however (based on the common link state database) and a shared hash
flooding group will make the same decision. function, each member of the flooding group will make the same
decision.
The second stage is simpler, consisting of a single rule: do not The second stage is simpler, consisting of a single rule: do not
flood modified LSPs along the shortest path towards the origin of the flood modified LSPs along the shortest path towards the origin of the
modified LSP. This rule relies on the observation that any IS modified LSP. This rule relies on the observation that any IS
between the origin of the modified LSP and the local IS should between the origin of the modified LSP and the local IS should
receive the modified LSP from some other IS closer to the source of receive the modified LSP from some other IS closer to the source of
the modified LSP. the modified LSP.
2.2. Optimization Process 2.2. Optimization Process
Each intermediate system will determine whether it should reflood as Each intermediate system will determine whether it should re-flood
described below, when a modified LSP arrives from a Transmitting LSPs as described below. When a modified LSP arrives from a
Neighbor (TN), by: Transmitting Neighbor (TN), the result of the following algorithm
obtains the necessary decision:
Step 1: Build the Two-Hop List (THL) and Remote Neighbor's List (RNL) Step 1: Build the Two-Hop List (THL) and Remote Neighbor's List (RNL)
by: by:
* Set all link metrics to 1 * Set all link metrics to 1
* Calculate an SPT truncated to 2 hops from the perspective of TN * Calculate an SPT truncated to 2 hops from the perspective of TN
* For each IS that is two hops (has a metric of two in the truncated * For each IS that is two hops (has a metric of two in the truncated
SPT) from TN: SPT) from TN:
skipping to change at page 7, line 4 skipping to change at page 7, line 6
* For each IS that is two hops (has a metric of two in the truncated * For each IS that is two hops (has a metric of two in the truncated
SPT) from TN: SPT) from TN:
- If the IS is on the shortest path towards the originator of the - If the IS is on the shortest path towards the originator of the
modified LSP, skip modified LSP, skip
- If the IS is not on the shortest path towards the originator of - If the IS is not on the shortest path towards the originator of
the modified LSP, add it to THL the modified LSP, add it to THL
* Add each IS that is one hop away from TN to the RNL * Add each IS that is one hop away from TN to the RNL
Step 2: Sort RNL by system IDs, from the least to the greatest. Step 2: Sort RNL by system IDs, from the least to the greatest.
Step 3: Calculate a number, N, by adding each byte in the LSP Step 3: Calculate a number, N, by adding each byte in LSP-ID (without
originator's System-ID and then taking MOD on the number of the fragment ID) and fragment ID MOD 2 (allowing for some balancing
neighbors. N MUST be less than the number of members of RNL. of LSPs coming from same system ID without introducing excessive
amount of state in an implementation) and then taking MOD on the
number of neighbors. N MUST be less than the number of members of
RNL.
Step 4: Starting with the Nth member of RNL: Step 4: Starting with the Nth member of RNL:
* If THL is empty, exit * If THL is empty, exit
* If this member of RNL is the local calculating IS, this IS MUST * If this member of RNL is the local calculating IS, this IS MUST
reflood the modified LSP; exit reflood the modified LSP; exit
* Remove all members of THL connected to (adjacent to) this member * Remove all members of THL connected to (adjacent to) this member
of RNL of RNL
skipping to change at page 7, line 32 skipping to change at page 7, line 38
Note: This description is geared to clarity rather than optimal Note: This description is geared to clarity rather than optimal
performance. performance.
2.3. Flooding Failures 2.3. Flooding Failures
It is possible in some failure modes for flooding to be incomplete It is possible in some failure modes for flooding to be incomplete
because of the flooding optimizations outlined. Specifically, if a because of the flooding optimizations outlined. Specifically, if a
reflooder fails, or is somehow disconnected from all the links across reflooder fails, or is somehow disconnected from all the links across
which it should be reflooding, it is possible an LSP is only which it should be reflooding, it is possible an LSP is only
partially flooded through the fabric. To prevent faliures, an partially flooded through the fabric. To prevent such partition
intermediate system which does not reflood an LSP (or fragment) failures, an intermediate system which does not reflood an LSP (or
should: fragment) should:
* Set a short timer; the default should be less than one second * Set a short timer; the default should be one second
* When the timer expires, send a Partial Sequence Number Packet * When the timer expires, send Partial Sequence Number Packet (PSNP)
(PSNP) to all neighbors of all LSPs that have not been reflooded during the timer runtime
to all neighbors unless an up-to-date PSNP or CSNP has been
already received from the neighbor
* Process any Partial Sequence Number Packets (PSNPs) as required to * Process any Partial Sequence Number Packets (PSNPs) received that
resynchronize indicate that neighbors still have older versions of the LSP per
normal protocol procedures to resynchronize
* Log/notify if a resynchronization is required * If resynchronization above a configurable threshold is required,
an implementation SHOULD notify the network operator
2.4. Flooding Example 2.4. Flooding Example
Assume, in the network above, 5A floods some modified LSP towards 4A- Assume, in the network above, 5A floods some modified LSP towards 4A-
4F. To determine whether 4A should flood this LSP to 3A-3F: 4F. To determine whether 4A should flood this LSP to 3A-3F:
* 5A is TN; 4A calculates a truncated SPT from 5A's perspective with * 5A is TN; 4A calculates a truncated SPT from 5A's perspective with
all link metrics set to 1 all link metrics set to 1
* 4A builds THL, which contains 3A, 3B, 3C, 3D, 3E, 3F, 5B, 5C, 5D, * 4A builds THL, which contains 3A, 3B, 3C, 3D, 3E, 3F, 5B, 5C, 5D,
 End of changes. 27 change blocks. 
87 lines changed or deleted 93 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/