Network Working Group                                     R. White, Ed.
Internet-Draft                                             S. Zandi, Ed.
Intended status: Informational                                  LinkedIn
Expires: December 17, 2018                                 June 15, 2018


                      IS-IS Support for Openfabric
                       draft-white-openfabric-06

Abstract

   Spine and leaf topologies are widely used in hyperscale and cloud
   scale networks.  In most of these networks, configuration is
   automated, but difficult, and topology information is extracted
   through broad based connections.
   Policy is often integrated into the control plane as well, making
   configuration, management, and troubleshooting difficult.
   Openfabric is an adaptation of an existing, widely deployed link
   state protocol, Intermediate System to Intermediate System (IS-IS),
   that is designed to:

   o  Provide a full view of the topology from a single point in the
      network to simplify operations

   o  Minimize configuration of each Intermediate System (IS) (also
      called a router or switch) in the network

   o  Optimize the operation of IS-IS within a spine and leaf fabric to
      enable scaling

   This document begins with an overview of Openfabric, including a
   description of what may be removed from IS-IS to enable scaling.  The
   document then describes an optimized adjacency formation process; an
   optimized flooding scheme; some thoughts on the operation of
   Openfabric, metrics, and aggregation; and finally a description of
   the changes to the IS-IS protocol required for Openfabric.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 17, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Goals
     1.2.  Contributors
     1.3.  Simplification
     1.4.  Additions and Requirements
     1.5.  Sample Network
   2.  Modified Adjacency Formation
     2.1.  Level 2 Adjacencies Only
     2.2.  Point-to-point Adjacencies
     2.3.  Three Way Handshake Support
     2.4.  Adjacency Formation Optimization
   3.  Advertisement of Reachability Information
   4.  Determining and Advertising Location on the Fabric
   5.  Flooding Optimization
     5.1.  Flooding Failures
   6.  Other Optimizations
     6.1.  Transit Link Reachability
     6.2.  Transiting T0 Intermediate Systems
   7.  Openfabric and Route Aggregation
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

1.1.  Goals

   Spine and leaf fabrics are often used in large scale data centers; in
   this application, they are commonly called a fabric because of their
   regular structure and predictable forwarding and convergence
   properties.  This document describes Openfabric, a set of
   modifications to the IS-IS protocol that enable it to run efficiently
   on a large scale spine and leaf fabric.  The goals of this control
   plane are:

   o  Provide a full view of the topology from a single point in the
      network to simplify operations

   o  Minimize configuration of each IS in the network

   o  Optimize the operation of IS-IS within a spine and leaf fabric to
      enable scaling

1.2.  Contributors

   The following people have contributed to this draft: Nikos
   Triantafillis (reflected flooding optimization), Ivan Pepelnjak
   (fabric locality calculation modifications), Christian Franke (fabric
   locality calculation modification), Hannes Gredler (do not reflood
   optimizations), Les Ginsberg (capabilities encoding, circuit local
   reflooding), Naiming Shen (capabilities encoding, circuit local
   reflooding), Uma Chunduri (failure mode suggestions, flooding), Nick
   Russo, and Rodny Molina.

   See [RFC5449], [RFC5614], and [RFC7182] for similar solutions in the
   Mobile Ad Hoc Networking (MANET) solution space.

1.3.  Simplification

   In building any scalable system, it is often best to begin by
   removing what is not needed.  In this spirit, Openfabric
   implementations MAY remove the following from IS-IS:

   o  External metrics.
      There is no need for external metrics in large scale
      spine and leaf fabrics; it is assumed that metrics will be
      properly configured by the operator to account for the correct
      order of route preference at any route redistribution point.

   o  Tags and traffic engineering processing.  Openfabric is only
      designed to provide topology and reachability information.  It is
      not designed to provide for traffic engineering, route preference
      through tags, or other policy mechanisms.  It is assumed that all
      routing policy will be provided through an overlay system which
      communicates directly with each IS in the fabric, such as PCEP
      [RFC5440] or I2RS [RFC7921].  Traffic engineering is assumed to be
      provided through Segment Routing (SR)
      [I-D.ietf-spring-segment-routing].

1.4.  Additions and Requirements

   To create a scalable link state fabric, Openfabric includes the
   following:

   o  A slightly modified adjacency formation process.

   o  Mechanisms for determining the tier of the spine and leaf fabric
      in which the IS is located.

   o  A mechanism that reduces flooding to the minimum possible, while
      still ensuring complete database synchronization among the
      intermediate systems within the fabric.

   Three general requirements are placed here; more specific
   requirements are considered in the following sections.  Openfabric
   implementations:

   o  MUST support [RFC5301] and enable hostname advertisement by
      default if a hostname is configured on the intermediate system.

   o  SHOULD support [RFC6232], purge originator identification for
      IS-IS.

   o  MUST NOT be mixed with standard IS-IS implementations in
      operational deployments.  Openfabric and standard IS-IS
      implementations SHOULD be treated as two separate protocols.

1.5.  Sample Network

   The following spine and leaf fabric will be used to describe these
   modifications.

       +----+ +----+ +----+ +----+ +----+ +----+
       | 1A | | 1B | | 1C | | 1D | | 1E | | 1F |   (T0)
       +----+ +----+ +----+ +----+ +----+ +----+

       +----+ +----+ +----+ +----+ +----+ +----+
       | 2A | | 2B | | 2C | | 2D | | 2E | | 2F |   (T1)
       +----+ +----+ +----+ +----+ +----+ +----+

       +----+ +----+ +----+ +----+ +----+ +----+
       | 3A | | 3B | | 3C | | 3D | | 3E | | 3F |   (T2)
       +----+ +----+ +----+ +----+ +----+ +----+

       +----+ +----+ +----+ +----+ +----+ +----+
       | 4A | | 4B | | 4C | | 4D | | 4E | | 4F |   (T1)
       +----+ +----+ +----+ +----+ +----+ +----+

       +----+ +----+ +----+ +----+ +----+ +----+
       | 5A | | 5B | | 5C | | 5D | | 5E | | 5F |   (T0)
       +----+ +----+ +----+ +----+ +----+ +----+

                                Figure 1

   To reduce confusion (spine and leaf fabrics are difficult to draw in
   plain text art), this diagram does not contain the connections
   between devices.  The reader should assume that each device in a
   given layer is connected to every device in the layer above it.  For
   instance:

   o  5A is connected to 4A, 4B, 4C, 4D, 4E, and 4F

   o  5B is connected to 4A, 4B, 4C, 4D, 4E, and 4F

   o  4A is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and
      5F

   o  4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and
      5F

   o  etc.

   The tiers or stages of the fabric are also marked for easier
   reference.  T0 intermediate systems are assumed to be connected to
   application servers; that is, they are Top of Rack (ToR)
   intermediate systems.  The remaining tiers, T1 and T2, are connected
   only to the fabric itself.  Note there are no "cross links," or "east
   west" links in the illustrated fabric.  The fabric locality detection
   mechanism described here will not work if there are cross links
   running east/west through the fabric.  Locality detection may be
   possible in such a fabric; this is an area for further study.

2.  Modified Adjacency Formation

   Because Openfabric operates in a tightly controlled data center
   environment, various modifications can be made to the IS-IS neighbor
   formation process to increase efficiency and simplify the protocol.
   Specifically, Openfabric implementations SHOULD support [RFC3719],
   section 4, hello padding for IS-IS.  Variable hello padding SHOULD
   NOT be used, as data center fabrics are built using high speed links
   on which padded hellos will have little performance impact.  Further
   modifications to the neighbor formation process are considered in the
   following sections.

2.1.  Level 2 Adjacencies Only

   Openfabric is designed to work in a single flooding domain over a
   single data center fabric at the scale of thousands of routers with
   hundreds of thousands of routes (so a moderate scale in router and
   route count terms).  Because of the way Openfabric optimizes
   operation in this environment, it is neither necessary nor desirable
   to build multiple flooding domains.  For instance, the flooding
   optimizations described later in this document require a full view
   of the topology, as does any proposed overlay to inject policy into
   the forwarding plane.  In light of this, the following changes
   SHOULD be made to IS-IS implementations to support Openfabric:

   o  IIH PDU 17 (level 2 point-to-point circuit hello) should be the
      only IIH PDU type transmitted (see section 9.7 of [ISO10589])

   o  In IIH PDU 17 (level 2 point-to-point circuit hello), the Circuit
      Type field should be set to 2 (see section 9.7 of [ISO10589])

   o  Support for IIH PDU 15 (level 1 broadcast hello) should be removed
      (see section 9.5 of [ISO10589])

   o  Support for IIH PDU 16 (level 2 broadcast hello) should be removed
      (see section 9.6 of [ISO10589])

2.2.  Point-to-point Adjacencies

   Data center network fabrics only contain point-to-point links;
   because of this, there is no reason to support any broadcast link
   types, nor to support the Designated Intermediate System processing,
   including pseudonode creation.  In light of this, processing related
   to sections 7.2.3 (broadcast networks), 7.3.8 (generation of level 1
   pseudonode LSPs), 7.3.10 (generation of level 2 pseudonode LSPs), and
   8.4.5 (LAN designated intermediate systems) in [ISO10589] SHOULD be
   removed.

2.3.  Three Way Handshake Support

   It is important that two-way connectivity be established before
   synchronizing the link state database or routing through a link in a
   data center fabric.  To guard against optical failures that leave
   only one-way connectivity between two routers, Openfabric
   implementations must support the three-way handshake mechanism
   described in [RFC5303].

2.4.  Adjacency Formation Optimization

   While adjacency formation is not considered particularly burdensome
   in IS-IS, it may still be useful to reduce the amount of state
   transferred across the network when connecting a new IS to the
   fabric.  In its simplest form, the process is:

   o  An IS connected to the fabric will send hellos on all links.

   o  The IS will only complete the three-way handshake with one newly
      discovered neighbor; this would normally be the first neighbor
      which sends the newly connected intermediate system's ID back in
      the three-way handshake process.

   o  The IS will complete its database exchange with this one newly
      adjacent neighbor.

   o  Once this process is completed, the IS will continue processing
      the remaining neighbors as normal.

   o  If synchronization is not achieved within twice the dead timer on
      the local interface, the newly connected IS will repeat this
      process with the second neighbor with which it forms a three-way
      adjacency.
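   The neighbor selection rule in the process listed above can be
   sketched in Python.  This is a minimal illustration, not part of the
   specification: the function names, the assumption that neighbors are
   recorded in the order in which they completed the three-way
   handshake, and the "count" parameter (modeling the locally tuned
   number of synchronization partners discussed in the tradeoff
   paragraph of this section) are all hypothetical.

```python
from typing import List, Sequence


def choose_sync_neighbors(handshake_order: Sequence[str], elapsed: float,
                          dead_interval: float, count: int = 1) -> List[str]:
    """Return the neighbor(s) a newly connected IS should synchronize with.

    handshake_order: neighbor IDs in the order they completed the
        three-way handshake (hypothetical bookkeeping).
    elapsed: seconds spent so far without achieving synchronization.
    dead_interval: dead timer of the local interface, in seconds.
    count: neighbors to synchronize with at once (1 in the basic process).
    """
    # Each window of twice the dead timer without synchronization moves
    # the IS on to the next neighbor(s) in handshake order.
    failed_windows = int(elapsed // (2 * dead_interval))
    start = failed_windows * count
    return list(handshake_order[start:start + count])


def neighbors_to_process(handshake_order: Sequence[str], synchronized: bool,
                         sync_targets: Sequence[str]) -> List[str]:
    """Until synchronized, only the sync targets are brought fully up;
    afterwards the remaining neighbors are processed as normal."""
    return list(handshake_order) if synchronized else list(sync_targets)


if __name__ == "__main__":
    order = ["2A", "2B", "2C"]  # e.g. a new T0 hearing from its T1 neighbors
    print(choose_sync_neighbors(order, elapsed=0.0, dead_interval=3.0))   # ['2A']
    print(choose_sync_neighbors(order, elapsed=6.5, dead_interval=3.0))   # ['2B']
    print(choose_sync_neighbors(order, 0.0, 3.0, count=2))  # ['2A', '2B']
```

   Raising "count" trades extra data on the wire for faster initial
   synchronization, which is the balance described in the paragraphs
   that follow.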

   This process allows each IS newly added to the fabric to exchange a
   full database only once; only a minimal amount of information is then
   transferred with the remaining neighbors to reach full
   synchronization.

   Any such optimization is bound to present a tradeoff between several
   factors; the mechanism described here slightly increases the time
   required to form adjacencies in order to reduce the total state
   carried across the network.  An alternative mechanism could provide a
   better balance between the amount of information carried across the
   network for initial synchronization and the time required to
   synchronize a new IS.  For instance, an IS could choose to
   synchronize its database with two or three adjacent intermediate
   systems, which could speed up the synchronization process at the cost
   of carrying additional data on the network.  A locally determined
   balance between the speed of synchronization and the amount of data
   carried on the network can be achieved by adjusting the number of
   adjacent intermediate systems the newly attached IS synchronizes
   with.

3.  Advertisement of Reachability Information

   IS-IS describes the topology in two different sets of TLVs; the first
   describes the set of neighbors connected to an IS, the second
   describes the set of reachable destinations connected to an IS.
   There are two different forms of both of these descriptions, one of
   which carries what are widely called narrow metrics, the other of
   which carries what are widely called wide metrics.  In a tightly
   controlled data center fabric implementation, such as the ones
   Openfabric is designed to support, no IS that supports only narrow
   metrics will ever be deployed or supported; hence there is no reason
   to support any metric type other than wide metrics.
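   The narrow/wide distinction maps onto specific TLV code points, all
   of which appear in the requirements listed in this section.  The
   Python sketch below sorts the TLV codes found in a received LSP into
   wide-metric TLVs that Openfabric keeps, narrow-metric TLVs whose
   processing may be removed, and everything else.  Only the code
   points come from this document; the helper name and the idea of
   partitioning a code list are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Wide-metric TLVs retained by Openfabric (code points from this section).
WIDE_TLVS: Dict[int, str] = {
    22: "extended IS reachability [RFC5305]",
    135: "extended IPv4 reachability [RFC5305]",
    235: "multi-topology IPv4 reachability [RFC5120]",
    236: "IPv6 reachability [RFC5308]",
    237: "multi-topology IPv6 reachability [RFC5120]",
}

# Narrow-metric TLVs whose processing SHOULD be removed.
NARROW_TLVS: Dict[int, str] = {
    2: "IS reachability, narrow metric [ISO10589]",
    128: "IP internal reachability, narrow metric",
    130: "IP external reachability, narrow metric",
}


def partition_tlvs(codes: Iterable[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split TLV codes into (wide, narrow, other), preserving their order.

    'other' covers TLVs this section does not address (for example
    area addresses or hostnames); this sketch leaves them alone.
    """
    wide: List[int] = []
    narrow: List[int] = []
    other: List[int] = []
    for code in codes:
        if code in WIDE_TLVS:
            wide.append(code)
        elif code in NARROW_TLVS:
            narrow.append(code)
        else:
            other.append(code)
    return wide, narrow, other


if __name__ == "__main__":
    # TLV 1 is area addresses, TLV 137 is the dynamic hostname [RFC5301].
    wide, narrow, other = partition_tlvs([1, 22, 2, 135, 128, 137])
    print(wide, narrow, other)  # [22, 135] [2, 128] [1, 137]
```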

   o  The Level 2 Link State PDU (type 20 in section 9.9 of [ISO10589])
      and the scoped flooding PDU (type 10 in section 3.1 of [RFC7356])
      SHOULD be the only PDU types used to carry link state information
      in an Openfabric implementation

   o  Processing related to the Level 1 Link State PDU (type 18) MAY be
      removed from Openfabric implementations (see section 9.8 of
      [ISO10589])

   o  Neighbor reachability MUST be carried in TLV type 22 (see section
      3 of [RFC5305])

   o  IPv4 reachability SHOULD be carried in TLV type 135 (see section 4
      of [RFC5305]), or TLV type 235 for multitopology implementations
      (see [RFC5120])

   o  IPv6 reachability SHOULD be carried in TLV type 236 (see
      [RFC5308]), or TLV type 237 for multitopology implementations (see
      [RFC5120])

   o  Processing related to the neighbor reachability TLV (type 2, see
      sections 9.8 and 9.9 of [ISO10589]) SHOULD be removed

   o  Processing related to the narrow metric IP reachability TLVs
      (types 128 and 130) SHOULD be removed

   Further, if segment routing support is desired, Openfabric MAY
   support the Prefix Segment Identifier sub-TLV and other TLVs as
   required in [I-D.ietf-isis-segment-routing-extensions].

4.  Determining and Advertising Location on the Fabric

   The tier to which an IS is connected is useful to enable
   autoconfiguration of intermediate systems connected to the fabric and
   to reduce flooding.  Once the tier of an intermediate system within
   the fabric has been determined, it MUST be advertised using the 4 bit
   Tier field described in section 3.3 of
   [I-D.shen-isis-spine-leaf-ext].  This section describes a method of
   calculating the tier number, assuming the tier numbers rise in value
   from the edge of the fabric.

   This method begins with two of the T0 intermediate systems
   advertising their location in the fabric.
   This information can be obtained in
   either of two ways:

   o  Two T0 intermediate systems are manually configured to advertise
      0x00 in their IS reachability tier sub-TLV, indicating they are at
      the edge of the fabric (a ToR IS).

   o  The T0 intermediate systems detect they are T0 through the
      presence of connected hosts (i.e., through a request for address
      assignment or some other means).  If such detection is used, and
      the IS determines it is located at T0, it should advertise 0x00 in
      its IS reachability tier sub-TLV.

   If the first method is used, the two T0 routers MUST be "maximally
   separated" on the fabric.  They must be a maximal number of hops
   apart; in a 5-ary fabric, this means they MUST NOT be connected to
   the same T1 device as their "upstream" towards the superspines.

   The second method above SHOULD be used with care, as it may not be
   secure, and it may not work in all data center environments.  For
   instance, if a host is mistakenly (or intentionally, as a form of
   attack) attached to a spine IS, or a request for address assignment
   is transmitted to a spine IS during the bootup phase of the device or
   fabric, it is possible to cause a spine IS to advertise itself as a
   T0.  Unless the autodetection of the T0 devices is secured, the
   manual mechanism SHOULD be used (configuring at least one T0 device
   manually).
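   The tier calculation given in the steps of this section needs nothing
   more than hop counts, so it can be sketched with breadth-first
   search, which gives the same result as SPF when every link cost is 1.
   The Python below is an illustrative sketch, not a reference
   implementation: it builds the fabric of Figure 1 (each device
   connected to every device in the adjacent rows), assumes 5A and 1C
   are the two configured T0 systems as in this section's worked
   example, and uses hypothetical function names.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set

Graph = Dict[str, Set[str]]


def build_sample_fabric() -> Graph:
    """Figure 1: rows 1..5, each device linked to every device in the next row."""
    rows = [[f"{r}{c}" for c in "ABCDEF"] for r in "12345"]
    adj: Graph = {node: set() for row in rows for node in row}
    for upper, lower in zip(rows, rows[1:]):
        for a in upper:
            for b in lower:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def hop_counts(adj: Graph, source: str) -> Dict[str, int]:
    """SPF with every link cost set to 1, i.e. breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def compute_tier(adj: Graph, local: str, t0_nodes: Iterable[str]) -> Optional[int]:
    """Return the tier of `local`, or None if fewer than two T0 are reachable."""
    local_dist = hop_counts(adj, local)
    reachable_t0 = [n for n in t0_nodes if n in local_dist]
    if len(reachable_t0) < 2:
        return None  # abort, as the procedure requires
    node_a = max(reachable_t0, key=lambda n: local_dist[n])  # farthest T0 (A)
    ld = local_dist[node_a]
    rd = max(hop_counts(adj, node_a).values())  # distance from A to farthest IS (B)
    return rd - ld


if __name__ == "__main__":
    fabric = build_sample_fabric()
    print([compute_tier(fabric, f"{r}A", ["5A", "1C"]) for r in "12345"])
    # [0, 1, 2, 1, 0]
```

   The computed tiers rise from 0 at either edge toward the spine,
   matching the (T0), (T1), and (T2) labels in Figure 1.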

   Given the correct configuration of two T0 devices, maximally spaced
   on the fabric, the remaining intermediate systems calculate their
   tier number as follows:

   o  The local IS calculates an SPT (using SPF) setting the cost of
      every link to 1; this effectively calculates a topology-only view
      of the network, without considering any configured link costs

   o  Ensure that at least two T0 are in the calculated SPT; otherwise
      abort

   o  Find the farthest T0, that is, the T0 with the largest metric
      (the greatest distance) from the local calculating node; call this
      node A and set LD to the cost

   o  Calculate an SPT (using SPF) from the perspective of A (above)
      setting the cost of every link to 1

   o  Find the farthest IS in A's SPT; call this node B and set RD to
      the cost from A to B

   o  Calculate the tier number of the local IS by subtracting LD from
      RD

   In the example network, assume 5A and 1C are manually configured as
   T0, and are advertising their tier numbers.  From 1A:

   o  The path to 5A is 4 hops, while the path to 1C is only 2 hops, so
      5A is the farthest T0 (node A) and LD is 4

   o  Run SPF from the perspective of 5A with all link metrics set to 1

   o  From 5A, the farthest IS (for instance, 1C) is 4 hops away; this
      is RD

   o  RD - LD is 0 at 1A, so 1A is T0, or a ToR

   This process will work for any spine and leaf fabric without "cross
   links."

5.  Flooding Optimization

   Flooding is perhaps the most challenging scaling issue for a link
   state protocol running on a dense, large scale fabric.  To reduce the
   flooding of link state information in the form of Link State Protocol
   Data Units (LSPs), Openfabric takes advantage of information already
   available in the link state protocol, the list of the local
   intermediate system's neighbors' neighbors, and the fabric locality
   computed above.
   The following tables are required to compute a set of
   reflooders:

   o  Neighbor List (NL): The set of neighbors

   o  Neighbors' Neighbors (NN) list: The set of neighbors' neighbors;
      this can be calculated by running SPF truncated to two hops

   o  Do Not Reflood (DNR) list: The set of neighbors that should
      receive LSPs (or fragments) but should not reflood them

   o  Reflood (RF) list: The set of neighbors that should flood LSPs (or
      fragments) to their adjacent neighbors to ensure synchronization

   NL is set to contain all neighbors, and sorted deterministically (for
   instance, from the highest IS identifier to the lowest).  All
   intermediate systems within a single fabric SHOULD use the same
   mechanism for sorting the NL list.  NN is set to contain all
   neighbors' neighbors, or all intermediate systems that are two hops
   away, as determined by performing a truncated SPF.  The DNR and RF
   tables are initially empty.  To begin, the following steps are taken
   to reduce the size of NN and NL:

   o  Move any IS in NL with its tier (or fabric location) set to T0 to
      DNR

   o  Remove from NL and NN all intermediate systems that are on the
      shortest path to the IS that originated the LSP

   Then, for every IS in NL:

   o  If the current entry in NL is connected to any entries in NN:

      *  Move the IS to RF

      *  Remove the intermediate systems connected to the IS from NN

   o  Else move the IS to DNR

   When flooding, LSPs transmitted to adjacent neighbors on the RF list
   will be transmitted normally.  Adjacent intermediate systems on this
   list will reflood received LSPs into the next stage of the topology,
   ensuring database synchronization.  LSPs transmitted to adjacent
   neighbors on the DNR list, however, MUST be transmitted using a
   circuit scope PDU as described in [RFC7356].

5.1.  Flooding Failures

   It is possible in some failure modes for flooding to be incomplete
   because of the flooding optimizations outlined.  Specifically, if a
   reflooder fails, or is somehow disconnected from all the links across
   which it should be reflooding, it is possible an LSP is only
   partially flooded through the fabric.  To prevent such situations,
   any IS receiving an LSP transmitted using DNR SHOULD:

   o  Set a short timer; the default should be less than one second

   o  When the timer expires, send a Complete Sequence Number Packet
      (CSNP) to all neighbors

   o  Process any Partial Sequence Number Packets (PSNPs) as required to
      resynchronize

   o  If a resynchronization is required, notify the network operator
      through a network management system

6.  Other Optimizations

6.1.  Transit Link Reachability

   In order to reduce the amount of control plane state carried on large
   scale spine and leaf fabrics, Openfabric implementations SHOULD NOT
   advertise reachability for transit links.  These links MAY remain
   unnumbered, as IS-IS does not require layer 3 IP addresses to
   operate.  Each IS SHOULD be configured with a single loopback
   address, which is assigned an IPv6 address, to provide reachability
   to intermediate systems which make up the fabric.

   [RFC5837] SHOULD be supported on devices supporting Openfabric with
   unnumbered interfaces in order to support traceability and network
   management.

6.2.  Transiting T0 Intermediate Systems

   In data center fabrics, ToR intermediate systems SHOULD NOT be used
   to transit between two T1 (or above) spine intermediate systems.  The
   simplest way to prevent this is to set the overload bit [RFC3277] for
   all the LSPs originated from T0 intermediate systems.
   However, this solution would have the unfortunate side effect of
   causing all reachability beyond any T0 IS to have the same metric,
   and many implementations treat a set overload bit as a metric of
   0xFFFF in calculating the Shortest Path Tree (SPT).  This document
   proposes an alternate solution which preserves the leaf node metric,
   while still avoiding transiting T0 intermediate systems.

   Specifically, all T0 intermediate systems SHOULD advertise their
   metric to reach any adjacent T1 neighbor as 0xFFE.  T1 intermediate
   systems, on the other hand, will advertise T0 intermediate systems
   with the actual interface cost used to reach the T0 IS.  Hence, links
   connecting T0 and T1 intermediate systems will be advertised with an
   asymmetric cost that discourages transiting T0 intermediate systems,
   while leaving reachability to the destinations attached to T0
   devices unchanged.

7.  Openfabric and Route Aggregation

   While schemes may be designed so reachability information can be
   aggregated in Openfabric deployments, this is not a recommended
   configuration.

8.  Security Considerations

   This document outlines modifications to the IS-IS protocol for
   operation on large scale data center fabrics.  While it does add new
   TLVs, and some local processing changes, it does not add any new
   security vulnerabilities to the operation of IS-IS.  However,
   Openfabric implementations SHOULD implement IS-IS cryptographic
   authentication, as described in [RFC5304], and should enable other
   security measures in accordance with best common practices for the
   IS-IS protocol.

   If T0 intermediate systems are auto-detected using information
   outside Openfabric, it is possible to attack the calculations used
   for flooding reduction and auto-configuration of intermediate
   systems.
   For instance, if a request for an address pool is used as an
   indicator of an attached host, and hence receiving such a request
   causes an intermediate system to advertise itself as T0, it is
   possible for an attacker (or a simple mistake) to cause
   auto-configuration to fail.  Any such auto-detection mechanisms
   SHOULD be secured using appropriate techniques, as described by any
   protocols or mechanisms used.

9.  References

9.1.  Normative References

   [I-D.shen-isis-spine-leaf-ext]
              Shen, N., Ginsberg, L., and S. Thyamagundalu, "IS-IS
              Routing for Spine-Leaf Topology",
              draft-shen-isis-spine-leaf-ext-05 (work in progress),
              January 2018.

   [ISO10589] International Organization for Standardization,
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473)",
              ISO/IEC 10589:2002, Second Edition, Nov 2002.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120,
              DOI 10.17487/RFC5120, February 2008.

   [RFC5301]  McPherson, D. and N. Shen, "Dynamic Hostname Exchange
              Mechanism for IS-IS", RFC 5301, DOI 10.17487/RFC5301,
              October 2008.

   [RFC5303]  Katz, D., Saluja, R., and D. Eastlake 3rd, "Three-Way
              Handshake for IS-IS Point-to-Point Adjacencies", RFC 5303,
              DOI 10.17487/RFC5303, October 2008.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, DOI 10.17487/RFC5305, October
              2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308,
              DOI 10.17487/RFC5308, October 2008.

   [RFC5309]  Shen, N., Ed. and A. Zinin, Ed., "Point-to-Point Operation
              over LAN in Link State Routing Protocols", RFC 5309,
              DOI 10.17487/RFC5309, October 2008.

   [RFC5311]  McPherson, D., Ed., Ginsberg, L., Previdi, S., and M.
              Shand, "Simplified Extension of Link State PDU (LSP) Space
              for IS-IS", RFC 5311, DOI 10.17487/RFC5311, February 2009.

   [RFC5316]  Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in
              Support of Inter-Autonomous System (AS) MPLS and GMPLS
              Traffic Engineering", RFC 5316, DOI 10.17487/RFC5316,
              December 2008.

   [RFC7356]  Ginsberg, L., Previdi, S., and Y. Yang, "IS-IS Flooding
              Scope Link State PDUs (LSPs)", RFC 7356,
              DOI 10.17487/RFC7356, September 2014.

   [RFC7981]  Ginsberg, L., Previdi, S., and M. Chen, "IS-IS Extensions
              for Advertising Router Information", RFC 7981,
              DOI 10.17487/RFC7981, October 2016.

9.2.  Informative References

   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Ginsberg, L., Filsfils, C., Bashandy, A.,
              Gredler, H., Litkowski, S., Decraene, B., and J. Tantsura,
              "IS-IS Extensions for Segment Routing",
              draft-ietf-isis-segment-routing-extensions-16 (work in
              progress), April 2018.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B.,
              Litkowski, S., and R. Shakir, "Segment Routing
              Architecture", draft-ietf-spring-segment-routing-15 (work
              in progress), January 2018.

   [RFC3277]  McPherson, D., "Intermediate System to Intermediate System
              (IS-IS) Transient Blackhole Avoidance", RFC 3277,
              DOI 10.17487/RFC3277, April 2002.

   [RFC3719]  Parker, J., Ed., "Recommendations for Interoperable
              Networks using Intermediate System to Intermediate System
              (IS-IS)", RFC 3719, DOI 10.17487/RFC3719, February 2004.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic
              Authentication", RFC 5304, DOI 10.17487/RFC5304, October
              2008.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
              Element (PCE) Communication Protocol (PCEP)", RFC 5440,
              DOI 10.17487/RFC5440, March 2009.

   [RFC5449]  Baccelli, E., Jacquet, P., Nguyen, D., and T. Clausen,
              "OSPF Multipoint Relay (MPR) Extension for Ad Hoc
              Networks", RFC 5449, DOI 10.17487/RFC5449, February 2009.

   [RFC5614]  Ogier, R. and P. Spagnolo, "Mobile Ad Hoc Network (MANET)
              Extension of OSPF Using Connected Dominating Set (CDS)
              Flooding", RFC 5614, DOI 10.17487/RFC5614, August 2009.

   [RFC5837]  Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen,
              N., and JR. Rivers, "Extending ICMP for Interface and
              Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837,
              April 2010.

   [RFC6232]  Wei, F., Qin, Y., Li, Z., Li, T., and J. Dong, "Purge
              Originator Identification TLV for IS-IS", RFC 6232,
              DOI 10.17487/RFC6232, May 2011.

   [RFC7182]  Herberg, U., Clausen, T., and C. Dearlove, "Integrity
              Check Value and Timestamp TLV Definitions for Mobile Ad
              Hoc Networks (MANETs)", RFC 7182, DOI 10.17487/RFC7182,
              April 2014.

   [RFC7921]  Atlas, A., Halpern, J., Hares, S., Ward, D., and T.
              Nadeau, "An Architecture for the Interface to the Routing
              System", RFC 7921, DOI 10.17487/RFC7921, June 2016.

Authors' Addresses

   Russ White (editor)
   LinkedIn

   Email: russ@riw.us


   Shawn Zandi (editor)
   LinkedIn

   Email: szandi@linkedin.com