| < draft-white-lsr-distoptflood-00.txt | draft-white-lsr-distoptflood-01.txt > | |||
|---|---|---|---|---|
| Network Working Group R. White | Network Working Group R. White | |||
| Internet-Draft S. Hegde | Internet-Draft S. Hegde | |||
| Intended status: Informational Juniper Networks | Intended status: Informational T. Przygienda | |||
| Expires: June 2, 2021 S. Zandi | Expires: 3 September 2022 Juniper Networks | |||
| 2 March 2022 | ||||
| November 29, 2020 | ||||
| IS-IS Optimal Distributed Flooding for Dense Topologies | IS-IS Optimal Distributed Flooding for Dense Topologies | |||
| draft-white-lsr-distoptflood-00 | draft-white-lsr-distoptflood-01 | |||
| Abstract | Abstract | |||
| In dense topologies, such as data center fabrics based on the Clos | In dense topologies (such as data center fabrics based on the Clos | |||
| and butterfly fabric topologies, flooding mechanisms designed for | and butterfly, though not limited to these topologies), flooding | |||
| sparse topologies, when used in these dense topologies, can | mechanisms designed for sparse topologies can flood excessively, or | |||
| "overflood," or carry too many copies of topology and reachability to | carry too many copies of topology and reachability to fabric devices. | |||
| fabric devices. This results in slower convergence times and higher | This results in slower convergence times and higher resource | |||
| resource utilization. The modifications to the flooding mechanism in | utilization. The modifications to the flooding mechanism in the | |||
| the Intermediate System to Intermediate System (IS-IS) link state | Intermediate System to Intermediate System (IS-IS) link state | |||
| protocol described in this document reduce resource utilization to a | protocol described in this document reduce resource utilization to a | |||
| minimum, while increaseing convergence performance in dense | minimum, while increasing convergence performance in dense | |||
| topologies. | topologies. | |||
| Note that a Clos fabric is used as the primary example of a desne | Note that a Clos fabric is used as the primary example of a dense | |||
| flooding topology throughout this document. However, the flooding | flooding topology throughout this document. However, the flooding | |||
| optimizations described in this document apply to any dense topology. | optimizations described in this document apply to any dense topology. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on June 2, 2021. | This Internet-Draft will expire on 3 September 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2020 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | license-info) in effect on the date of publication of this document. | |||
| publication of this document. Please review these documents | Please review these documents carefully, as they describe your rights | |||
| carefully, as they describe your rights and restrictions with respect | and restrictions with respect to this document. Code Components | |||
| to this document. Code Components extracted from this document must | extracted from this document must include Revised BSD License text as | |||
| include Simplified BSD License text as described in Section 4.e of | described in Section 4.e of the Trust Legal Provisions and are | |||
| the Trust Legal Provisions and are provided without warranty as | provided without warranty as described in the Revised BSD License. | |||
| described in the Simplified BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 1.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 1.2. Contributors . . . . . . . . . . . . . . . . . . . . . . 3 | 1.2. Contributors . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.3. Experience . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.3. Experience . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.4. Sample Network . . . . . . . . . . . . . . . . . . . . . 3 | 1.4. Sample Network . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Flooding Modifications . . . . . . . . . . . . . . . . . . . 5 | 2. Flooding Modifications . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.1. Optimizing Flooding . . . . . . . . . . . . . . . . . . . 5 | 2.1. Optimizing Flooding . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.2. Flooding Failures . . . . . . . . . . . . . . . . . . . . 6 | 2.2. Optimization Process . . . . . . . . . . . . . . . . . . 6 | |||
| 3. Use of Flooding Leaders and Flooding Mechanism Advertisements 6 | 2.3. Flooding Failures . . . . . . . . . . . . . . . . . . . . 7 | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 6 | 2.4. Flooding Example . . . . . . . . . . . . . . . . . . . . 7 | |||
| 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2.5. A Note on Performance . . . . . . . . . . . . . . . . . . 8 | |||
| 5.1. Normative References . . . . . . . . . . . . . . . . . . 7 | 3. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | |||
| 5.2. Informative References . . . . . . . . . . . . . . . . . 8 | 4. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| Appendix A. Flooding Optimization Operation . . . . . . . . . . 10 | 4.1. Normative References . . . . . . . . . . . . . . . . . . 8 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | 4.2. Informative References . . . . . . . . . . . . . . . . . 10 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
| 1. Introduction | 1. Introduction | |||
| 1.1. Goals | 1.1. Goals | |||
| The goal of this draft is to solve one specific set of problems | The goal of this draft is to solve one specific set of problems | |||
| involved in operating a link state protocol in a dense mesh topology. | involved in operating a link state protocol in a dense mesh topology. | |||
| The problem with such topologies is the connectivity density, which | The problem with such topologies is the connectivity density, which | |||
| causes too much information to be flooded (or too much repeated state | causes too much information to be flooded (or too much repeated state | |||
| to be flooded). Analysis and experiment show, for instance, that in | to be flooded). Analysis and experiment show, for instance, that in | |||
| a butterfly fabric of around 2500 intermediate systems, each | a butterfly fabric of around 2500 intermediate systems, each | |||
| intermediate system will receive 40+ copies of any changed LSP | intermediate system will receive 40+ copies of any changed LSP | |||
| fragment. This not only wastes bandwidth and processor time, this | fragment. This not only wastes bandwidth and processor time, this | |||
| dramatically slows convergence speed. | dramatically slows convergence speed. | |||
| This document describes a set of modifications to existing IS-IS | This document describes a set of modifications to existing IS-IS | |||
| flooding mechanisms which minimize the number of LSP framgents | flooding mechanisms which minimize the number of LSP framgents | |||
| received by individual intermediate systems, potentially to one copy | received by individual intermediate systems, potentially to one copy | |||
| per intermediate system. The mechanisms described in this document | per intermediate system. The mechanism described here chooses | |||
| are similar to those implemented in OSPF to support mobile ad-hoc | different flooding intermediates using a hash across the system ID to | |||
| networks, as described in [RFC5449], [RFC5614], and [RFC7182]. These | prevent single failures from having a large impact on flooding, and | |||
| mechanisms have been widely deployed and tested. | to spread the processing load of flooding across many systems and | |||
| prevent bottlenecks. | ||||
| The mechanisms described in this document are similar to those | ||||
| implemented in OSPF to support mobile ad-hoc networks, as described | ||||
| in [RFC5449], [RFC5614], and [RFC7182]. Mechanisms similar to these | ||||
| have been widely implemented and deployed. | ||||
| 1.2. Contributors | 1.2. Contributors | |||
| The following people have contributed to this draft: Nikos | The following people have contributed to this draft: Abhishek Kumar, | |||
| Triantafillis, Ivan Pepelnjak, Christian Franke, Hannes Gredler, Les | Nikos Triantafillis, Ivan Pepelnjak, Christian Franke, Hannes | |||
| Ginsberg, Naiming Shen, Uma Chunduri, Nick Russo, and Rodny Molina. | Gredler, Les Ginsberg, Naiming Shen, Uma Chunduri, Nick Russo, and | |||
| Rodny Molina. | ||||
| 1.3. Experience | 1.3. Experience | |||
| The modifications described in this draft have been implemented in | Lab testing shows modifications similar to these reduce flooding in a | |||
| the FR Routing open source routing stack, and hence are available for | large scale emulated butterfly network topology; without these | |||
| testing and modification. The implementation is part of the | ||||
| openfabric daemon, which can be conditionally compiled from isisd. | ||||
| Note openfabricd has further modifications are not described in this | ||||
| document. | ||||
| Lab testing shows these modifications reduce flooding in a large | ||||
| scale emulated butterfly network topology; without these | ||||
| modifications, intermediate systems receive, on average, 40 copies of | modifications, intermediate systems receive, on average, 40 copies of | |||
| any changed LSP fragment. With these modifications, intermediate | any changed LSP fragment. With these modifications, intermediate | |||
| systems recieve, on average, two copies of any changed LSP fragment. | systems recieve, on average, two copies of any changed LSP fragment. | |||
| In many cases, each intermediate system receives one copy of each | In many cases, each intermediate system receives one copy of each | |||
| changed LSP. In terms of performance, the modifications described | changed LSP. In terms of performance, the modifications described | |||
| here reduce convergence times by around 50%. A network that converges | here reduce convergence times by around 50%. A network that converges | |||
| in about 30-40 seconds without these modifications converged in 15-20 | in about 30-40 seconds without these modifications converged in 15-20 | |||
| seconds with these modifications. Processor load times were not | seconds with these modifications. Processor load times were not | |||
| checked, as this was an emulated environment. | checked, as this was an emulated environment. | |||
| A mechanism similar to the one described in this document has been | ||||
| implemented in the FR Routing open source routing stack as part of | ||||
| fabricd. | ||||
| 1.4. Sample Network | 1.4. Sample Network | |||
| The following spine and leaf fabric will be used to describe these | The following spine and leaf fabric will be used to describe these | |||
| modifications. | modifications. | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| | 1A | | 1B | | 1C | | 1D | | 1E | | 1F | (T0) | | 1A | | 1B | | 1C | | 1D | | 1E | | 1F | (T0) | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| skipping to change at page 4, line 25 ¶ | skipping to change at page 4, line 25 ¶ | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| | 4A | | 4B | | 4C | | 4D | | 4E | | 4F | (T1) | | 4A | | 4B | | 4C | | 4D | | 4E | | 4F | (T1) | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| | 5A | | 5B | | 5C | | 5D | | 5E | | 5F | (T0) | | 5A | | 5B | | 5C | | 5D | | 5E | | 5F | (T0) | |||
| +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ | |||
| Figure 1 | Figure 1 | |||
| To reduce confusion (spine and leaf fabrics are difficult to draw in | To reduce confusion (spine and leaf fabrics are difficult to draw in | |||
| plain text art), this diagram does not contain the connections | plain text art), this diagram does not contain the connections | |||
| between devices. The reader should assume that each device in a | between devices. The reader should assume that each device in a | |||
| given layer is connected to every device in the layer above it. For | given layer is connected to every device in the layer above it. For | |||
| instance: | instance: | |||
| o 5A is connected to 4A, 4B, 4C, 4D, 4E, and 4F | * 5A is connected to 4A, 4B, 4C, 4D, 4E, and 4F | |||
| o 5B is connected to 4A, 4B, 4C, 4D, 4E, and 4F | * 5B is connected to 4A, 4B, 4C, 4D, 4E, and 4F | |||
| o 4A is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and | * 4A is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and | |||
| 5F | 5F | |||
| o 4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and | * 4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and | |||
| 5F | 5F | |||
| o etc. | * etc. | |||
| The tiers or stages of the fabric are also marked for easier | The tiers or stages of the fabric are also marked for easier | |||
| reference. T0 is assumed to be connected to application servers, or | reference. T0 is assumed to be connected to application servers, or | |||
| rather they are Top of Rack (ToR) intermediate systems. The | rather they are Top of Rack (ToR) intermediate systems. The | |||
| remaining tiers, T1 and T2, are connected only to the fabric itself. | remaining tiers, T1 and T2, are connected only to other devices in | |||
| the fabric itself. | ||||
| 2. Flooding Modifications | 2. Flooding Modifications | |||
| Flooding is perhaps the most challenging scaling issue for a link | Flooding is perhaps the most challenging scaling issue for a link | |||
| state protocol running on a dense, large scale fabric. This section | state protocol running on a dense, large scale fabric. This section | |||
| describes modifications to the IS-IS flooding process to reduce | describes modifications to the IS-IS flooding process to reduce | |||
| flooding load on a dense or mesh topology. | flooding load on a dense or mesh topology. | |||
| 2.1. Optimizing Flooding | 2.1. Optimizing Flooding | |||
| To reduce the flooding of link state information in the form of Link | The simplest way to conceive of the solution presented here is in two | |||
| State Protocol Data Units (LSPs), the following tables are required | stages: | |||
| to compute a set of reflooders: | ||||
| o Neighbor List (NL) list: The set of neighbors | * Stage 1: Forward Optimization | |||
| o Neighbor's Neighbors (NN) list: The set of neighbor's neighbors; | - Find the group of intermediate systems that will all flood to | |||
| this can be calculated by running SPF truncated to two hops | the same set of neighbors as the local IS | |||
| o Do Not Reflood (DNR) list: The set of neighbors who should have | - Decide (deterministially) which of the intermediate systems | |||
| LSPs (or fragments) who should not reflood LSPs | within this group should flood any received LSPs | |||
| o Reflood (RF) list: The set of neighbors who should flood LSPs (or | * Stage 2: Reverse Optimization | |||
| fragments) to their adjacent neighbors to ensure synchronization | ||||
| NL is set to contain all neighbors, and sorted deterministically (for | - Find neighbors on the shortest path towards the origin of the | |||
| instance, from the highest IS identifier to the lowest). All | change | |||
| intermediate systems within a single fabric SHOULD use the same | ||||
| mechanism for sorting the NL list. NN is set to contain all | ||||
| neighbor's neighbors, or all intermediate systems that are two hops | ||||
| away, as determined by performing a truncated SPF. The DNR and RF | ||||
| tables are initially empty. To begin, the following steps are taken | ||||
| to reduce the size of NN and NL: | ||||
| o Remove all intermediate systems from NL and NN that in the | - Do not flood towards these neighbors | |||
| shortest path to the IS that originated the LSP | ||||
| Then, for every IS in NL: | The first stage is best explained through an illustration. In the | |||
| network above, if 5A transmits a modified Link State Protocol Data | ||||
| Unit (LSP) to 4A-4F, each of 4A-4F will, in turn, flood this modified | ||||
| LSP to 3A (for instance). 3A will receive 6 copies of the modified | ||||
| LSP, while only one copy is necessary for the intermediate systems | ||||
| shown to converge on a single view of the topology. If 4A-4F could | ||||
| determine they will all flood identical copies of the modified LSP to | ||||
| 3A, it is possible for all of them except one to decide not to flood | ||||
| the changed LSP to 3A. | ||||
| o If the current entry in NL is connected to any entries in NN: | The technique used in this draft to determine the flooding group is | |||
| for each intermediate system to calculate a special SPT from the | ||||
| point of view of the transmitting neighbor. By setting the metric of | ||||
| all links to 1 and truncating the SPT to two hops, the local IS can | ||||
| find the group of neighbors it will flood any changed LSP towards and | ||||
| the set of intermediate systems (not necessarily neighbors) which | ||||
| will also flood to this same set of neighbors. If every intermediate | ||||
| system in the flooding set performs this same calculation, they will | ||||
| all obtain the same flooding group. | ||||
| * Move the IS to RF | Once this flooding group is determined, the members of the flooding | |||
| group will each (independently) choose which of the members should | ||||
| flood. Each member of the flooding group calculates this | ||||
| independently of all the other members, using a common hash across a | ||||
| set of shared variables so each member of the group comes to the same | ||||
| conclusion. The group member which is selected to flood the changed | ||||
| LSP does so normally; the remaining group members do not flood the | ||||
| LSP. | ||||
| * Remove the intermediate systems connected to the IS from NN | Note there is no signaling between the intermediate systems running | |||
| this flooding reduction mechanism. Each IS calculates a special, | ||||
| truncated SPT separately, and determines which IS should flood any | ||||
| changed LSPs independently. Because these calculations are performed | ||||
| using a shared view of the network, however (based on the common link | ||||
| state database) and a shared sorting hash, each member of the | ||||
| flooding group will make the same decision. | ||||
| o Else move the IS to DNR | The second stage is simpler, consisting of a single rule: do not | |||
| flood modified LSPs along the shortest path towards the origin of the | ||||
| modified LSP. This rule relies on the observation that any IS | ||||
| between the origin of the modified LSP and the local IS should | ||||
| receive the modified LSP from some other IS closer to the source of | ||||
| the modified LSP. | ||||
| The calculation terminates when the NL is empty. | 2.2. Optimization Process | |||
| When flooding, LSPs transmitted to adjacent neighbors on the RF list | Each intermediate system will determine whether it should reflood as | |||
| will be transmitted normally. Adjacent intermediate systems on this | described below, when a modified LSP arrives from a Transmitting | |||
| list will reflood received LSPs into the next stage of the topology, | Neighbor (TN), by: | |||
| ensuring database synchronization. LSPs transmitted to adjacent | ||||
| neighbors on the DNR list, however, MUST be transmitted using a | ||||
| circuit scope PDU as described in [RFC7356]. | ||||
| 2.2. Flooding Failures | Step 1: Build the Two-Hop List (THL) and Remote Neighbor's List (RNL) | |||
| by: | ||||
| * Set all link metrics to 1 | ||||
| * Calculate an SPT truncated to 2 hops from the perspective of TN | ||||
| * For each IS that is two hops (has a metric of two in the truncated | ||||
| SPT) from TN: | ||||
| - If the IS is on the shortest path towards the originator of the | ||||
| modified LSP, skip | ||||
| - If the IS is not on the shortest path towards the originator of | ||||
| the modified LSP, add it to THL | ||||
| * Add each IS that is one hop away from TN to the RNL | ||||
| Step 2: Sort RNL by system IDs, from the least to the greatest. | ||||
| Step 3: Calculate a number, N, by adding each byte in the LSP | ||||
| originator's System-ID and then taking MOD on the number of | ||||
| neighbors. N MUST be less than the number of members of RNL. | ||||
| Step 4: Starting with the Nth member of RNL: | ||||
| * If THL is empty, exit | ||||
| * If this member of RNL is the local calculating IS, this IS MUST | ||||
| reflood the modified LSP; exit | ||||
| * Remove all members of THL connected to (adjacent to) this member | ||||
| of RNL | ||||
| * Move to the next member of RNL, wrapping to the beginning of RNL | ||||
| if necessary | ||||
| Note: This description is geared to clarity rather than optimal | ||||
| performance. | ||||
| 2.3. Flooding Failures | ||||
| It is possible in some failure modes for flooding to be incomplete | It is possible in some failure modes for flooding to be incomplete | |||
| because of the flooding optimizations outlined. Specifically, if a | because of the flooding optimizations outlined. Specifically, if a | |||
| reflooder fails, or is somehow disconnected from all the links across | reflooder fails, or is somehow disconnected from all the links across | |||
| which it should be reflooding, it is possible an LSP is only | which it should be reflooding, it is possible an LSP is only | |||
| partially flooded through the fabric. To prevent such situations, | partially flooded through the fabric. To prevent faliures, an | |||
| any IS receiving an LSP transmitted using DNR SHOULD: | intermediate system which does not reflood an LSP (or fragment) | |||
| should: | ||||
| o Set a short timer; the default should be less than one second | * Set a short timer; the default should be less than one second | |||
| o When the timer expires, send a Complete Sequence Number Packet | * When the timer expires, send a Partial Sequence Number Packet | |||
| (CSNP) to all neighbors | (PSNP) to all neighbors | |||
| o Process any Partial Sequence Number Packets (PSNPs) as required to | * Process any Partial Sequence Number Packets (PSNPs) as required to | |||
| resynchronize | resynchronize | |||
| o If a resynchronization is required, notify the network operator | * Log/notify if a resynchronization is required | |||
| through a network management system | ||||
| 3. Use of Flooding Leaders and Flooding Mechanism Advertisements | 2.4. Flooding Example | |||
| [I-D.ietf-lsr-dynamic-flooding], section 5.1.1, describes the | Assume, in the network above, 5A floods some modified LSP towards 4A- | |||
| election of a flooding domain leader, which can advertise the kind of | 4F. To determine whether 4A should flood this LSP to 3A-3F: | |||
| flooding reduction mechanism used in the flooding domain. | ||||
| Implementations of this draft MAY implement the election and | ||||
| advertisement of a flooding domain leader as described in section | ||||
| 5.1.1 of [I-D.ietf-lsr-dynamic-flooding]. If the election of a | ||||
| flooding domain leader is implemented, implementations SHOULD also | ||||
| advertise the flooding mechanism using the IS-IS Dynamic Flooding | ||||
| Sub-TLV described in section 5.1.2 of | ||||
| [I-D.ietf-lsr-dynamic-flooding], using Algorithm number (TBD). | ||||
| 4. Security Considerations | * 5A is TN; 4A calculates a truncated SPT from 5A's perspective with | |||
| all link metrics set to 1 | ||||
| * 4A builds THL, which contains 3A, 3B, 3C, 3D, 3E, 3F, 5B, 5C, 5D, | ||||
| 5E and 5F | ||||
| * 4A builds RNL, which contains 4A,4B,4C,4D,4E and 4F, sorting it by | ||||
| the system ID | ||||
| * 4A computes hash on the received LSP-ID to get N; assume N is 1 in | ||||
| this case | ||||
| * Since 4A is the Nth member of R-NL and there are members in N-NL, | ||||
| 4A must reflood; the loop exits | ||||
| 2.5. A Note on Performance | ||||
| The calculations described here are complex, which might lead the | ||||
| reader to conclude that the cost of calculation is so much higher | ||||
| than the cost of flooding that this optimization is counter- | ||||
| productive. The description provided here is designed for clarity | ||||
| rather than optimal calculation, however. Many of the calculations | ||||
| can be performed in advance and stored, rather than being performed | ||||
| for each LSP and each neighbor. Optimized versions of the process | ||||
| described here have been implemented, and do result in strong | ||||
| convergence speed gains. | ||||
| 3. Security Considerations | ||||
| This document outlines modifications to the IS-IS protocol for | This document outlines modifications to the IS-IS protocol for | |||
| operation on high density network topologies. Implementations SHOULD | operation on high density network topologies. Implementations SHOULD | |||
| implement IS-IS cryptographic authentication, as described in | implement IS-IS cryptographic authentication, as described in | |||
| [RFC5304], and should enable other security measures in accordance | [RFC5304], and should enable other security measures in accordance | |||
| with best common practices for the IS-IS protocol. | with best common practices for the IS-IS protocol. | |||
| 5. References | 4. References | |||
| 5.1. Normative References | 4.1. Normative References | |||
| [I-D.ietf-lsr-dynamic-flooding] | [I-D.ietf-lsr-dynamic-flooding] | |||
| Li, T., Psenak, P., Ginsberg, L., Chen, H., Przygienda, | Li, T., Przygienda, T., Psenak, P., Ginsberg, L., Chen, | |||
| T., Cooper, D., Jalil, L., Dontula, S., and G. Mishra, | H., Cooper, D., Jalil, L., Dontula, S., and G. S. Mishra, | |||
| "Dynamic Flooding on Dense Graphs", draft-ietf-lsr- | "Dynamic Flooding on Dense Graphs", Work in Progress, | |||
| dynamic-flooding-07 (work in progress), June 2020. | Internet-Draft, draft-ietf-lsr-dynamic-flooding-10, 7 | |||
| December 2021, <https://www.ietf.org/archive/id/draft- | ||||
| ietf-lsr-dynamic-flooding-10.txt>. | ||||
| [ISO10589] | [ISO10589] International Organization for Standardization, | |||
| International Organization for Standardization, | ||||
| "Intermediate system to Intermediate system intra-domain | "Intermediate system to Intermediate system intra-domain | |||
| routeing information exchange protocol for use in | routeing information exchange protocol for use in | |||
| conjunction with the protocol for providing the | conjunction with the protocol for providing the | |||
| connectionless-mode Network Service (ISO 8473)", ISO/ | connectionless-mode Network Service (ISO 8473)", ISO/ | |||
| IEC 10589:2002, Second Edition, Nov 2002. | IEC 10589:2002, Second Edition, November 2002. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, | [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, | |||
| DOI 10.17487/RFC2629, June 1999, | DOI 10.17487/RFC2629, June 1999, | |||
| <https://www.rfc-editor.org/info/rfc2629>. | <https://www.rfc-editor.org/info/rfc2629>. | |||
| skipping to change at page 8, line 38 ¶ | skipping to change at page 10, line 24 ¶ | |||
| [RFC7981] Ginsberg, L., Previdi, S., and M. Chen, "IS-IS Extensions | [RFC7981] Ginsberg, L., Previdi, S., and M. Chen, "IS-IS Extensions | |||
| for Advertising Router Information", RFC 7981, | for Advertising Router Information", RFC 7981, | |||
| DOI 10.17487/RFC7981, October 2016, | DOI 10.17487/RFC7981, October 2016, | |||
| <https://www.rfc-editor.org/info/rfc7981>. | <https://www.rfc-editor.org/info/rfc7981>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| 5.2. Informative References | 4.2. Informative References | |||
| [I-D.ietf-isis-segment-routing-extensions] | [I-D.ietf-isis-segment-routing-extensions] | |||
| Previdi, S., Ginsberg, L., Filsfils, C., Bashandy, A., | Previdi, S., Ginsberg, L., Filsfils, C., Bashandy, A., | |||
| Gredler, H., and B. Decraene, "IS-IS Extensions for | Gredler, H., and B. Decraene, "IS-IS Extensions for | |||
| Segment Routing", draft-ietf-isis-segment-routing- | Segment Routing", Work in Progress, Internet-Draft, draft- | |||
| extensions-25 (work in progress), May 2019. | ietf-isis-segment-routing-extensions-25, 19 May 2019, | |||
| <https://www.ietf.org/archive/id/draft-ietf-isis-segment- | ||||
| routing-extensions-25.txt>. | ||||
| [RFC3277] McPherson, D., "Intermediate System to Intermediate System | [RFC3277] McPherson, D., "Intermediate System to Intermediate System | |||
| (IS-IS) Transient Blackhole Avoidance", RFC 3277, | (IS-IS) Transient Blackhole Avoidance", RFC 3277, | |||
| DOI 10.17487/RFC3277, April 2002, | DOI 10.17487/RFC3277, April 2002, | |||
| <https://www.rfc-editor.org/info/rfc3277>. | <https://www.rfc-editor.org/info/rfc3277>. | |||
| [RFC3719] Parker, J., Ed., "Recommendations for Interoperable | [RFC3719] Parker, J., Ed., "Recommendations for Interoperable | |||
| Networks using Intermediate System to Intermediate System | Networks using Intermediate System to Intermediate System | |||
| (IS-IS)", RFC 3719, DOI 10.17487/RFC3719, February 2004, | (IS-IS)", RFC 3719, DOI 10.17487/RFC3719, February 2004, | |||
| <https://www.rfc-editor.org/info/rfc3719>. | <https://www.rfc-editor.org/info/rfc3719>. | |||
| skipping to change at page 10, line 10 ¶ | skipping to change at page 11, line 45 ¶ | |||
| [RFC7182] Herberg, U., Clausen, T., and C. Dearlove, "Integrity | [RFC7182] Herberg, U., Clausen, T., and C. Dearlove, "Integrity | |||
| Check Value and Timestamp TLV Definitions for Mobile Ad | Check Value and Timestamp TLV Definitions for Mobile Ad | |||
| Hoc Networks (MANETs)", RFC 7182, DOI 10.17487/RFC7182, | Hoc Networks (MANETs)", RFC 7182, DOI 10.17487/RFC7182, | |||
| April 2014, <https://www.rfc-editor.org/info/rfc7182>. | April 2014, <https://www.rfc-editor.org/info/rfc7182>. | |||
| [RFC7921] Atlas, A., Halpern, J., Hares, S., Ward, D., and T. | [RFC7921] Atlas, A., Halpern, J., Hares, S., Ward, D., and T. | |||
| Nadeau, "An Architecture for the Interface to the Routing | Nadeau, "An Architecture for the Interface to the Routing | |||
| System", RFC 7921, DOI 10.17487/RFC7921, June 2016, | System", RFC 7921, DOI 10.17487/RFC7921, June 2016, | |||
| <https://www.rfc-editor.org/info/rfc7921>. | <https://www.rfc-editor.org/info/rfc7921>. | |||
| Appendix A. Flooding Optimization Operation | ||||
| Recent testing has shown that flooding is largely a "non-issue" in | ||||
| terms of scaling when using high speed links connecting intermediate | ||||
| systems with reasonable processing power and memory. However, | ||||
| testing has also shown that flooding will impact convergence speed | ||||
| even in such environments, and flooding optimization has a major | ||||
| impact on the performance of a link state protocol in resource | ||||
| constrained environments. Some thoughts on flooding optimization in | ||||
| general, and the flooding optimization contained in this document, | ||||
| follow. | ||||
| There are two general classes of flooding optimization available for | ||||
| link state protocols. The first class of optimization relies on a | ||||
| centralized service or server to gather the link state information | ||||
| and redistribute it back into the intermediate systems making up the | ||||
| fabric. Such solutions are attractive in many, but not all, | ||||
| environments; hence these systems compliment, rather than compete | ||||
| with, the system described here. Systems relying on a service or | ||||
| server necessarily also rely on connectivity to that service or | ||||
| server, either through an out-of-band network or connectivity through | ||||
| the fabric itself. Because of this, these mechanisms do not apply to | ||||
| all deployments; some deployments require underlying reachability | ||||
| regardless of connectivity to an outside service or server. | ||||
| The second possibility is to create a fully distributed system that | ||||
| floods the minimal amount of information possible to every | ||||
| intermediate system. The system described in this draft is an | ||||
| example of such a system. Again, there are many ways to accomplish | ||||
| this goal, but simplicity is a primary goal of the system described | ||||
| in this draft. | ||||
| The system described here divides the work into two different parts; | ||||
| forward and reverse optimization. The forward optimization begins by | ||||
| finding the set of intermediate systems two hops away from the | ||||
| flooding device, and choosing a subset of connected neighbors that | ||||
| will successfully reach this entire set of intermediate systems, as | ||||
| shown in the diagram below. | ||||
| G | ||||
| | | ||||
| A B C--+ | ||||
| | | | | | ||||
| +--D--+ E H | ||||
| | | | | ||||
| +----F--+--+ | ||||
| Figure 2 | ||||
| If F is flooding some piece of information, then it will find the | ||||
| entire set of intermediate systems within two hops by discovering its | ||||
| neighbors and their neighbors from the local LSDB. This will include | ||||
| A, B, C, D, and E--but not G. From this set, F can determine that D | ||||
| can reach A and B, while a single flood to either E or H will reach | ||||
| C. Hence F can flood to D and either E or H to reach C. F can | ||||
| choose to flood to D and E normally. Because H still needs to | ||||
| receive this new LSP (or fragment!), but does not need to reflood to | ||||
| C, F can send the LSP using link local signaling. In this case, H | ||||
| will receive and process the new LSP, but not reflood it. | ||||
| Rather than carrying the information necessary through hello | ||||
| extensions, as is done in [RFC5820], the neighbors are allowed to | ||||
| complete initial synchronization, and then a truncated shortest path | ||||
| tree is built to determine the "two hop neighborhood." This has the | ||||
| advantage of using mechanisms already used in IS-IS, rather than | ||||
| adding new processes. The risk with this process is any LSPs flooded | ||||
| through the network before this initial calculation takes place will | ||||
| be suboptimal. This "two hop neighborhood" process has been used in | ||||
| OSPF deployments for a number of years, and has proven stable in | ||||
| practice. | ||||
| Rather than setting a timer for reflooding, the implementation | ||||
| described here uses IS-IS' ability to describe the entire database | ||||
| using a CSNP to ensure flooding is successful. This adds some small | ||||
| amount of overhead, so there is some balance between optimal flooding | ||||
| and ensuring flooding is complete. | ||||
| The reverse optimization is simpler. It relies on the observation | ||||
| that any intermediate system between the local IS and the origin of | ||||
| the LSP, other than in the case of floods removing an LSP from the | ||||
| shared LSDB, should have already received a copy of the LSP. For | ||||
| instance, if F originates an LSP in the figure above, and E refloods | ||||
| the LSP to C, C does not need to reflood back to F if F is on its | ||||
| shortest path tree towards F. It is obvious this is not a "perfect" | ||||
| optimization. A perfect optimization would block flooding back along | ||||
| a directed acyclic graph towards the originator. Using the SPT, | ||||
| however, is a quick way to reduce flooding without performing more | ||||
| calculations. | ||||
| The combination of these two optimizations have been seen, in | ||||
| testing, to reduce the number of copies any IS receives from the tens | ||||
| to precisely one. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Russ White | Russ White | |||
| Juniper Networks | Juniper Networks | |||
| Email: russ@riw.us | Email: russ@riw.us | |||
| Shraddha Hegde | Shraddha Hegde | |||
| Juniper Networks | Juniper Networks | |||
| Email: shraddha@juniper.net | Email: shraddha@juniper.net | |||
| Shawn Zandi | Tony Przygienda | |||
| Juniper Networks | ||||
| Email: prz@juniper.net | ||||
| Email: szandi@linkedin.com | ||||
| End of changes. 55 change blocks. | ||||
| 216 lines changed or deleted | 207 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||