idnits 2.17.1 draft-white-distoptflood-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (March 31, 2019) is 1853 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'I-D.shen-isis-spine-leaf-ext' is defined on line 332, but no explicit reference was found in the text == Unused Reference: 'ISO10589' is defined on line 337, but no explicit reference was found in the text == Unused Reference: 'RFC2119' is defined on line 345, but no explicit reference was found in the text == Unused Reference: 'RFC2629' is defined on line 350, but no explicit reference was found in the text == Unused Reference: 'RFC5120' is defined on line 354, but no explicit reference was found in the text == Unused Reference: 'RFC5301' is defined on line 360, but no explicit reference was found in the text == Unused Reference: 'RFC5303' is defined on line 364, but no explicit reference was found in the text == Unused Reference: 'RFC5305' is defined on line 369, but no explicit reference was found in the text == Unused Reference: 'RFC5308' is defined on line 373, but no explicit reference was found in the text == Unused Reference: 'RFC5309' is defined on line 377, but no explicit reference was found in the text == Unused Reference: 'RFC5311' is defined on line 382, but no explicit reference was found in the text == Unused Reference: 'RFC5316' is defined on line 387, but no explicit reference was found in the text == Unused Reference: 'RFC7981' is defined on line 397, but no explicit reference was found in the text == Unused Reference: 'RFC8174' is defined on line 402, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-isis-segment-routing-extensions' is defined on line 408, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-spring-segment-routing' is defined on line 414, but no explicit reference was found in the text == Unused Reference: 'RFC3277' is defined on line 420, but no explicit reference was found in the text == Unused Reference: 'RFC3719' is defined on line 425, but no explicit reference was found in the text == Unused Reference: 'RFC4271' is defined on line 430, but no explicit reference was found in the text == Unused Reference: 'RFC5440' is defined on line 439, but no explicit reference was found in the text == Unused Reference: 'RFC5837' is defined on line 459, but no explicit reference was found in the text == Unused Reference: 'RFC6232' is defined on line 464, but no explicit reference was found in the text == Unused Reference: 'RFC7921' is defined on line 474, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2629 (Obsoleted by RFC 7749) ** Obsolete normative reference: RFC 5316 (Obsoleted by RFC 9346) == Outdated reference: A later version (-25) exists of draft-ietf-isis-segment-routing-extensions-23 Summary: 3 errors (**), 0 flaws (~~), 26 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group R. White, Ed. 2 Internet-Draft S. Zandi, Ed. 3 Intended status: Informational LinkedIn 4 Expires: October 2, 2019 March 31, 2019 6 IS-IS Optimal Distributed Flooding for Dense Topologies 7 draft-white-distoptflood-00 9 Abstract 11 Dense topologies, such as data center fabrics based on the Clos and 12 butterfly fabric topologies. Flooding mechanisms designed for sparse 13 topologies, when used in these dense topologies, can result in slower 14 convergence times and higher resource utilization. The modifications 15 to the flooding mechanism in the Intermediate System to Intermediate 16 System (IS-IS) link state protocol described in this document reduce 17 resource utilization to a minimum, while increaseing convergence 18 performance in dense topologies. 20 Note that a Clos fabric is used as the primary example of a desne 21 flooding topology throughout this document. However, the flooding 22 optimizations described in this document apply to any dense topology. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at https://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on October 2, 2019. 41 Copyright Notice 43 Copyright (c) 2019 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (https://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 59 1.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 2 60 1.2. Contributors . . . . . . . . . . . . . . . . . . . . . . 3 61 1.3. Experience . . . . . . . . . . . . . . . . . . . . . . . 3 62 1.4. Additions . . . . . . . . . . . . . . . . . . . . . . . . 4 63 1.5. Sample Network . . . . . . . . . . . . . . . . . . . . . 4 64 2. Adjacency Formation Optimization . . . . . . . . . . . . . . 5 65 3. Flooding Modifications . . . . . . . . . . . . . . . . . . . 6 66 3.1. Optimizing Flooding . . . . . . . . . . . . . . . . . . . 6 67 3.2. Flooding Failures . . . . . . . . . . . . . . . . . . . . 7 68 4. Security Considerations . . . . . . . . . . . . . . . . . . . 7 69 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 70 5.1. Normative References . . . . . . . . . . . . . . . . . . 8 71 5.2. Informative References . . . . . . . . . . . . . . . . . 9 72 Appendix A. Flooding Optimization Operation . . . . . . . . . . 11 73 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 75 1. Introduction 77 1.1. Goals 79 The goal of this draft is to solve one specific set of problems 80 involved in operating a link state protocol in a dense mesh topology. 81 The problem with such topologies is the connectivity density, which 82 causes too much information to be flooded (or too much repeated state 83 to be flooded). Analysis and experiment show, for instance, that in 84 a butterfyl fabric of around 2500 intermediate systems, each 85 intermediate system will receive 40+ copies of any changed LSP 86 fragment. This not only wastes bandwidth and processor time, this 87 dramatically slows convergence speed. 89 While there are a number of centralized flooding reduction mechanisms 90 designed specifically for data center fabrics available, a 91 distributed flooding reduction mechanism will be more widely 92 applicable to dense topologies. Modifying existing distributed 93 flooding mechanisms for efficiency is also simpler than creating 94 entirely new flooding mechanisms. Experience with the existing 95 distributed flooding mechanism in IS-IS is directly applicable to 96 modifications of that scheme. 98 Ultimately, the goal of this document is to describe a set of 99 modifications that will reduce the number of copies any single 100 intermediate system receives of an LSP fragment on any network change 101 to less than three, and almost always one. Optimizing flooding in 102 this way can dramatically increase the performance of IS-IS in terms 103 of convergence performance and resource utilization. 105 1.2. Contributors 107 The following people have contributed to this draft: Nikos 108 Triantafillis (reflected flooding optimization), Ivan Pepelnjak 109 (fabric locality calculation modifications), Christian Franke (fabric 110 localigy calculation modification), Hannes Gredler (do not reflood 111 optimizations), Les Ginsberg (capabilities encoding, circuit local 112 reflooding), Naiming Shen (capabilities encoding, circuit local 113 reflooding), Uma Chunduri (failure mode suggestions, flooding), Nick 114 Russo, and Rodny Molina. 116 See [RFC5449], [RFC5614], and [RFC7182] for similar solutions in the 117 Mobile Ad Hoc Networking (MANET) solution space. 119 1.3. Experience 121 The modifications described in this draft have been implemented in 122 the FR Routing open source routing stack, and hence are available for 123 testing and modification. The implementation is part of the 124 openfabric daemon, which can be conditionally compiled from isisd. 125 Note openfabricd has further modifications are not described in this 126 document. 128 Lab testing shows these modifications reduce flooding in a large 129 scale emulated butterfly network topology; without these 130 modifications, intermediate systems receive, on average, 40 copies of 131 any changed LSP fragment. With these modifications, intermediate 132 systems recieve, on average, two copies of any changed LSPF fragment. 133 In many cases, each intermediate system receives one copy of each 134 changed LSP. In terms of performance, the modifications described 135 here reduce convergence times by around 50%. A network that converges 136 in about 30-40 seconds without these modifications converged in 15-20 137 seconds with these modifications. Processor load times were not 138 checked, as this was an emulated environment. 140 1.4. Additions 142 This draft describes two additions to IS-IS to improve flooding 143 efficiency and convergence time: 145 o A slightly modified adjacency formation process. 147 o A mechanism that reduces flooding to the minimum possible, while 148 still ensuring complete database synchronization among the 149 intermediate systems within the fabric. 151 1.5. Sample Network 153 The following spine and leaf fabric will be used to describe these 154 modifications. 156 +----+ +----+ +----+ +----+ +----+ +----+ 157 | 1A | | 1B | | 1C | | 1D | | 1E | | 1F | (T0) 158 +----+ +----+ +----+ +----+ +----+ +----+ 160 +----+ +----+ +----+ +----+ +----+ +----+ 161 | 2A | | 2B | | 2C | | 2D | | 2E | | 2F | (T1) 162 +----+ +----+ +----+ +----+ +----+ +----+ 164 +----+ +----+ +----+ +----+ +----+ +----+ 165 | 3A | | 3B | | 3C | | 3D | | 3E | | 3F | (T2) 166 +----+ +----+ +----+ +----+ +----+ +----+ 168 +----+ +----+ +----+ +----+ +----+ +----+ 169 | 4A | | 4B | | 4C | | 4D | | 4E | | 4F | (T1) 170 +----+ +----+ +----+ +----+ +----+ +----+ 172 +----+ +----+ +----+ +----+ +----+ +----+ 173 | 5A | | 5B | | 5C | | 5D | | 5E | | 5F | (T0) 174 +----+ +----+ +----+ +----+ +----+ +----+ 176 Figure 1 178 To reduce confusion (spine and leaf fabrics are difficult to draw in 179 plain text art), this diagram does not contain the connections 180 between devices. The reader should assume that each device in a 181 given layer is connected to every device in the layer above it. For 182 instance: 184 o 5A is connected to 4A, 4B, 4C, 4D, 4E, and 4F 186 o 5B is connected to 4A, 4B, 4C, 4D, 4E, and 4F 187 o 4A is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and 188 5F 190 o 4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and 191 5F 193 o etc. 195 The tiers or stages of the fabric are also marked for easier 196 reference. T0 is assumed to be connected to application servers, or 197 rather they are Top of Rack (ToR) intermediate systems. The 198 remaining tiers, T1 and T2, are connected only to the fabric itself. 200 2. Adjacency Formation Optimization 202 While adjacency formation is not considered particularly burdensome 203 in IS-IS, it may still be useful to reduce the amount of state 204 transferred across the network when connecting a new IS to the 205 fabric. In its simplest form, the process is: 207 o An IS connected to the fabric will send hellos on all links. 209 o The IS will only complete the three-way handshake with one newly 210 discovered neighbor; this would normally be the first neighbor 211 which sends the newly connected intermediate system's ID back in 212 the three-way handshake process. 214 o The IS will complete its database exchange with this one newly 215 adjacent neighbor. 217 o Once this process is completed, the IS will continue processing 218 the remaining neighbors as normal. 220 o If synchronization is not achieved within twice the dead timer on 221 the local interface, the newly connected IS will repeat this 222 process with the second neighbor with which it forms a three-way 223 adjacency. 225 This process allows each IS newly added to the fabric to exchange a 226 full table once; a very minimal amount of information will be 227 transferred with the remaining neighbors to reach full 228 synchronization. 230 Any such optimization is bound to present a tradeoff between several 231 factors; the mechanism described here increases the amount of time 232 required to form adjacencies slightly in order to reduce the total 233 state carried across the network. An alternative mechanism could 234 provide a better balance of the amount of information carried across 235 the network for initial synchronization and the time required to 236 synchronize a new IS. For instance, an IS could choose to 237 synchronize its database with two or three adjacent intermediate 238 systems, which could speed the synchronization process up at the cost 239 of carrying additional data on the network. A locally determined 240 balance between the speed of synchronization and the amount of data 241 carried on the network can be acheived by adjusting the number of 242 adjacent intermediate systems the newly attached IS synchronizes 243 with. 245 3. Flooding Modifications 247 Flooding is perhaps the most challenging scaling issue for a link 248 state protocol running on a dense, large scale fabric. This section 249 describes modifications to the IS-IS flooding process to reduce 250 flooding load on a dense or mesh topology. 252 3.1. Optimizing Flooding 254 To reduce the flooding of link state information in the form of Link 255 State Protocol Data Units (LSPs), the following tables are required 256 to compute a set of reflooders: 258 o Neighbor List (NL) list: The set of neighbors 260 o Neighbor's Neighbors (NN) list: The set of neighbor's neighbors; 261 this can be calculated by running SPF truncated to two hops 263 o Do Not Reflood (DNR) list: The set of neighbors who should have 264 LSPs (or fragments) who should not reflood LSPs 266 o Reflood (RF) list: The set of neighbors who should flood LSPs (or 267 fragments) to their adjacent neighbors to ensure synchronization 269 NL is set to contain all neighbors, and sorted deterministically (for 270 instance, from the highest IS identifier to the lowest). All 271 intermediate systems within a single fabric SHOULD use the same 272 mechanism for sorting the NL list. NN is set to contain all 273 neighbor's neighbors, or all intermediate systems that are two hops 274 away, as determined by performing a truncated SPF. The DNR and RF 275 tables are initially empty. To begin, the following steps are taken 276 to reduce the size of NN and NL: 278 o Remove all intermediate systems from NL and NN that in the 279 shortest path to the IS that originated the LSP 281 Then, for every IS in NL: 283 o If the current entry in NL is connected to any entries in NN: 285 * Move the IS to RF 287 * Remove the intermediate systems connected to the IS from NN 289 o Else move the IS to DNR 291 The calculation terminates when the NL is empty. 293 When flooding, LSPs transmitted to adjacent neighbors on the RF list 294 will be transmitted normally. Adjacent intermediate systems on this 295 list will reflood received LSPs into the next stage of the topology, 296 ensuring database synchronization. LSPs transmitted to adjacent 297 neighbors on the DNR list, however, MUST be transmitted using a 298 circuit scope PDU as described in [RFC7356]. 300 3.2. Flooding Failures 302 It is possible in some failure modes for flooding to be incomplete 303 because of the flooding optimizations outlined. Specifically, if a 304 reflooder fails, or is somehow disconnected from all the links across 305 which it should be reflooding, it is possible an LSP is only 306 partially flooded through the fabric. To prevent such situations, 307 any IS receiving an LSP transmitted using DNR SHOULD: 309 o Set a short timer; the default should be less than one second 311 o When the timer expires, send a Complete Sequence Number Packet 312 (CSNP) to all neighbors 314 o Process any Partial Sequence Number Packets (PSNPs) as required to 315 resynchronize 317 o If a resynchronization is required, notify the network operator 318 through a network management system 320 4. Security Considerations 322 This document outlines modifications to the IS-IS protocol for 323 operation on high density network topologies. Implementations SHOULD 324 implement IS-IS cryptographic authentication, as described in 325 [RFC5304], and should enable other security measures in accordance 326 with best common practices for the IS-IS protocol. 328 5. References 330 5.1. Normative References 332 [I-D.shen-isis-spine-leaf-ext] 333 Shen, N., Ginsberg, L., and S. Thyamagundalu, "IS-IS 334 Routing for Spine-Leaf Topology", draft-shen-isis-spine- 335 leaf-ext-07 (work in progress), October 2018. 337 [ISO10589] 338 International Organization for Standardization, 339 "Intermediate system to Intermediate system intra-domain 340 routeing information exchange protocol for use in 341 conjunction with the protocol for providing the 342 connectionless-mode Network Service (ISO 8473)", ISO/ 343 IEC 10589:2002, Second Edition, Nov 2002. 345 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 346 Requirement Levels", BCP 14, RFC 2119, 347 DOI 10.17487/RFC2119, March 1997, 348 . 350 [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, 351 DOI 10.17487/RFC2629, June 1999, 352 . 354 [RFC5120] Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi 355 Topology (MT) Routing in Intermediate System to 356 Intermediate Systems (IS-ISs)", RFC 5120, 357 DOI 10.17487/RFC5120, February 2008, 358 . 360 [RFC5301] McPherson, D. and N. Shen, "Dynamic Hostname Exchange 361 Mechanism for IS-IS", RFC 5301, DOI 10.17487/RFC5301, 362 October 2008, . 364 [RFC5303] Katz, D., Saluja, R., and D. Eastlake 3rd, "Three-Way 365 Handshake for IS-IS Point-to-Point Adjacencies", RFC 5303, 366 DOI 10.17487/RFC5303, October 2008, 367 . 369 [RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic 370 Engineering", RFC 5305, DOI 10.17487/RFC5305, October 371 2008, . 373 [RFC5308] Hopps, C., "Routing IPv6 with IS-IS", RFC 5308, 374 DOI 10.17487/RFC5308, October 2008, 375 . 377 [RFC5309] Shen, N., Ed. and A. Zinin, Ed., "Point-to-Point Operation 378 over LAN in Link State Routing Protocols", RFC 5309, 379 DOI 10.17487/RFC5309, October 2008, 380 . 382 [RFC5311] McPherson, D., Ed., Ginsberg, L., Previdi, S., and M. 383 Shand, "Simplified Extension of Link State PDU (LSP) Space 384 for IS-IS", RFC 5311, DOI 10.17487/RFC5311, February 2009, 385 . 387 [RFC5316] Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in 388 Support of Inter-Autonomous System (AS) MPLS and GMPLS 389 Traffic Engineering", RFC 5316, DOI 10.17487/RFC5316, 390 December 2008, . 392 [RFC7356] Ginsberg, L., Previdi, S., and Y. Yang, "IS-IS Flooding 393 Scope Link State PDUs (LSPs)", RFC 7356, 394 DOI 10.17487/RFC7356, September 2014, 395 . 397 [RFC7981] Ginsberg, L., Previdi, S., and M. Chen, "IS-IS Extensions 398 for Advertising Router Information", RFC 7981, 399 DOI 10.17487/RFC7981, October 2016, 400 . 402 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 403 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 404 May 2017, . 406 5.2. Informative References 408 [I-D.ietf-isis-segment-routing-extensions] 409 Previdi, S., Ginsberg, L., Filsfils, C., Bashandy, A., 410 Gredler, H., and B. Decraene, "IS-IS Extensions for 411 Segment Routing", draft-ietf-isis-segment-routing- 412 extensions-23 (work in progress), March 2019. 414 [I-D.ietf-spring-segment-routing] 415 Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B., 416 Litkowski, S., and R. Shakir, "Segment Routing 417 Architecture", draft-ietf-spring-segment-routing-15 (work 418 in progress), January 2018. 420 [RFC3277] McPherson, D., "Intermediate System to Intermediate System 421 (IS-IS) Transient Blackhole Avoidance", RFC 3277, 422 DOI 10.17487/RFC3277, April 2002, 423 . 425 [RFC3719] Parker, J., Ed., "Recommendations for Interoperable 426 Networks using Intermediate System to Intermediate System 427 (IS-IS)", RFC 3719, DOI 10.17487/RFC3719, February 2004, 428 . 430 [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A 431 Border Gateway Protocol 4 (BGP-4)", RFC 4271, 432 DOI 10.17487/RFC4271, January 2006, 433 . 435 [RFC5304] Li, T. and R. Atkinson, "IS-IS Cryptographic 436 Authentication", RFC 5304, DOI 10.17487/RFC5304, October 437 2008, . 439 [RFC5440] Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation 440 Element (PCE) Communication Protocol (PCEP)", RFC 5440, 441 DOI 10.17487/RFC5440, March 2009, 442 . 444 [RFC5449] Baccelli, E., Jacquet, P., Nguyen, D., and T. Clausen, 445 "OSPF Multipoint Relay (MPR) Extension for Ad Hoc 446 Networks", RFC 5449, DOI 10.17487/RFC5449, February 2009, 447 . 449 [RFC5614] Ogier, R. and P. Spagnolo, "Mobile Ad Hoc Network (MANET) 450 Extension of OSPF Using Connected Dominating Set (CDS) 451 Flooding", RFC 5614, DOI 10.17487/RFC5614, August 2009, 452 . 454 [RFC5820] Roy, A., Ed. and M. Chandra, Ed., "Extensions to OSPF to 455 Support Mobile Ad Hoc Networking", RFC 5820, 456 DOI 10.17487/RFC5820, March 2010, 457 . 459 [RFC5837] Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen, 460 N., and JR. Rivers, "Extending ICMP for Interface and 461 Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837, 462 April 2010, . 464 [RFC6232] Wei, F., Qin, Y., Li, Z., Li, T., and J. Dong, "Purge 465 Originator Identification TLV for IS-IS", RFC 6232, 466 DOI 10.17487/RFC6232, May 2011, 467 . 469 [RFC7182] Herberg, U., Clausen, T., and C. Dearlove, "Integrity 470 Check Value and Timestamp TLV Definitions for Mobile Ad 471 Hoc Networks (MANETs)", RFC 7182, DOI 10.17487/RFC7182, 472 April 2014, . 474 [RFC7921] Atlas, A., Halpern, J., Hares, S., Ward, D., and T. 475 Nadeau, "An Architecture for the Interface to the Routing 476 System", RFC 7921, DOI 10.17487/RFC7921, June 2016, 477 . 479 Appendix A. Flooding Optimization Operation 481 Recent testing has shown that flooding is largely a "non-issue" in 482 terms of scaling when using high speed links connecting intermediate 483 systems with reasonable processing power and memory. However, 484 testing has also shown that flooding will impact convergence speed 485 even in such environments, and flooding optimization has a major 486 impact on the performance of a link state protocol in resource 487 constrained environments. Some thoughts on flooding optimization in 488 general, and the flooding optimization contained in this document, 489 follow. 491 There are two general classes of flooding optimization available for 492 link state protocols. The first class of optimization relies on a 493 centralized service or server to gather the link state information 494 and redistribute it back into the intermediate systems making up the 495 fabric. Such solutions are attractive in many, but not all, 496 environments; hence these systems compliment, rather than compete 497 with, the system described here. Systems relying on a service or 498 server necessarily also rely on connectivity to that service or 499 server, either through an out-of-band network or connectivity through 500 the fabric itself. Because of this, these mechanisms do not apply to 501 all deployments; some deployments require underlying reachability 502 regardless of connectivity to an outside service or server. 504 The second possibility is to create a fully distributed system that 505 floods the minimal amount of information possible to every 506 intermediate system. The system described in this draft is an 507 example of such a system. Again, there are many ways to accomplish 508 this goal, but simplicity is a primary goal of the system described 509 in this draft. 511 The system described here divides the work into two different parts; 512 forward and reverse optimization. The forward optimization begins by 513 finding the set of intermediate systems two hops away from the 514 flooding device, and choosing a subset of connected neighbors that 515 will successfully reach this entire set of intermediate systems, as 516 shown in the diagram below. 518 G 519 | 520 A B C--+ 521 | | | | 522 +--D--+ E H 523 | | | 524 +----F--+--+ 526 Figure 2 528 If F is flooding some piece of information, then it will find the 529 entire set of intermediate systems within two hops by discovering its 530 neighbors and their neighbors from the local LSDB. This will include 531 A, B, C, D, and E--but not G. From this set, F can determine that D 532 can reach A and B, while a single flood to either E or H will reach 533 C. Hence F can flood to D and either E or H to reach C. F can 534 choose to flood to D and E normally. Because H still needs to 535 receive this new LSP (or fragment!), but does not need to reflood to 536 C, F can send the LSP using link local signaling. In this case, H 537 will receive and process the new LSP, but not reflood it. 539 Rather than carrying the information necessary through hello 540 extensions, as is done in [RFC5820], the neighbors are allowed to 541 complete initial synchronization, and then a truncated shortest path 542 tree is built to determine the "two hop neighborhood." This has the 543 advantage of using mechanisms already used in IS-IS, rather than 544 adding new processes. The risk with this process is any LSPs flooded 545 through the network before this initial calculation takes place will 546 be suboptimal. This "two hop neighborhood" process has been used in 547 OSPF deployments for a number of years, and has proven stable in 548 practice. 550 Rather than setting a timer for reflooding, the implementation 551 described here uses IS-IS' ability to describe the entire database 552 using a CSNP to ensure flooding is successful. This adds some small 553 amount of overhead, so there is some balance between optimal flooding 554 and ensuring flooding is complete. 556 The reverse optimization is simpler. It relies on the observation 557 that any intermediate system between the local IS and the origin of 558 the LSP, other than in the case of floods removing an LSP from the 559 shared LSDB, should have already received a copy of the LSP. For 560 instance, if F originates an LSP in the figure above, and E refloods 561 the LSP to C, C does not need to reflood back to F if F is on its 562 shortest path tree towards F. It is obvious this is not a "perfect" 563 optimization. A perfect optimization would block flooding back along 564 a directed acyclic graph towards the originator. Using the SPT, 565 however, is a quick way to reduce flooding without performing more 566 calculations. 568 The combination of these two optimizations have been seen, in 569 testing, to reduce the number of copies any IS receives from the tens 570 to precisely one. 572 Authors' Addresses 574 Russ White (editor) 575 LinkedIn 577 Email: russ@riw.us 579 Shawn Zandi (editor) 580 LinkedIn 582 Email: szandi@linkedin.com