Networking Working Group                                         N. Shen
Internet-Draft                                               L. Ginsberg
Intended status: Standards Track                           Cisco Systems
Expires: September 9, 2019                              S. Thyamagundalu
                                                           March 8, 2019


                 IS-IS Routing for Spine-Leaf Topology
                 draft-ietf-lsr-isis-spine-leaf-ext-01

Abstract

   This document describes a mechanism for routers and switches in a
   Spine-Leaf type topology to have non-reciprocal Intermediate System
   to Intermediate System (IS-IS) routing relationships between the
   leafs and spines. The leaf nodes do not need to have the topology
   information of other nodes and exact prefixes in the network. This
   extension also has application in the Internet of Things (IoT).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Motivations
   3.  Spine-Leaf (SL) Extension
     3.1.  Topology Examples
     3.2.  Applicability Statement
     3.3.  Spine-Leaf TLVs
       3.3.1.  Spine-Leaf TLV
       3.3.2.  Leaf-Set TLV
         3.3.2.1.  Leaf-Set Sub-TLVs
       3.3.3.  Advertising IPv4/IPv6 Reachability
       3.3.4.  Advertising Connection to RF-Leaf Node
     3.4.  Mechanism
       3.4.1.  Pure CLOS Topology
     3.5.  Implementation and Operation
       3.5.1.  CSNP PDU
       3.5.2.  Leaf to Leaf connection
         3.5.2.1.  Local traffic only
         3.5.2.2.  Transit traffic allowed
       3.5.3.  Spine Node Hostname
       3.5.4.  IS-IS Reverse Metric
       3.5.5.  Spine-Leaf Traffic Engineering
       3.5.6.  Other End-to-End Services
       3.5.7.  Address Family and Topology
       3.5.8.  Migration
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   The IS-IS routing protocol defined by [ISO10589] has been widely
   deployed in provider networks, data centers and enterprise campus
   environments. In the data center and enterprise switching networks,
   a Spine-Leaf topology is commonly used. This document describes a
   mechanism where IS-IS routing can be optimized for a Spine-Leaf
   topology.

   In a Spine-Leaf topology, normally a leaf node connects to a number
   of spine nodes. Data traffic going from one leaf node to another
   leaf node needs to pass through one of the spine nodes. Also, the
   decision to choose one of the spine nodes is usually part of equal
   cost multi-path (ECMP) load sharing. The spine nodes can be
   considered as gateway devices to reach destinations on other leaf
   nodes. In this type of topology, the spine nodes have to know the
   topology and routing information of the entire network, but the leaf
   nodes only need to know how to reach the gateway devices, which are
   the spine nodes to which they are uplinked.

   This document describes the IS-IS Spine-Leaf extension that allows
   the spine nodes to have all the topology and routing information,
   while keeping the leaf nodes free of topology information other than
   the default gateway routing information. The leaf nodes do not even
   need to run a Shortest Path First (SPF) calculation since they have
   no topology information.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Motivations

   o  The leaf nodes in a Spine-Leaf topology do not require complete
      topology and routing information of the entire domain since their
      forwarding decision is to use ECMP with spine nodes as default
      gateways.

   o  The spine nodes in a Spine-Leaf topology are richly connected to
      leaf nodes, which introduces significant flooding duplication if
      they flood all Link State PDUs (LSPs) to all the leaf nodes.
      Blocking flooding to leaf nodes saves CPU and link bandwidth
      resources on both spine and leaf nodes. For small Top of the Rack
      (ToR) leaf switches in data centers, it is worthwhile to keep full
      topology routing information and massive database flooding away
      from those devices.

   o  When a spine node advertises a topology change, every leaf node
      connected to it will flood the update to all the other spine
      nodes, and those spine nodes will further flood it to all the
      leaf nodes, causing an O(n^2) flooding storm which is largely
      redundant.

   o  Similar to some of the overlay technologies which are popular in
      data centers, the edge devices (leaf nodes) may not need to
      contain all the routing and forwarding information on the device's
      control and forwarding planes. "Conversational Learning" can be
      utilized to get the specific routing and forwarding information in
      the case of a pure CLOS topology and in the event of link or node
      failures.

   o  Small devices and appliances of the Internet of Things (IoT) can
      be considered as leafs in the routing topology sense. They are
      CPU and memory constrained by design, and they do not need to know
      the exact network topology and prefixes as long as there are ways
      to reach the cloud servers or other devices.

3.  Spine-Leaf (SL) Extension

3.1.  Topology Examples

   [Diagram not reproduced: spine nodes Spine1, Spine2, ..., SpineN in
   the upper row, linked to one another; leaf nodes Leaf1, Leaf2, ...,
   LeafX, LeafY in the lower row, each with uplinks to the spine nodes;
   Leaf1 and Leaf2 are additionally joined by a leaf-to-leaf link.]

                    Figure 1: A Spine-Leaf Topology

   [Diagram not reproduced: spine nodes Spine1 and Spine2 in the upper
   row with no link between them; leaf nodes Leaf1, Leaf2, Leaf3 and
   Leaf4 in the lower row, each with an uplink to both spine nodes.]

                       Figure 2: A CLOS Topology

3.2.  Applicability Statement

   This extension assumes the network is a Spine-Leaf topology, and it
   should not be applied in an arbitrary network setup. The spine nodes
   can be viewed as the aggregation layer of the network, and the leaf
   nodes as the access layer of the network. The leaf nodes use a load
   sharing algorithm with spine nodes as nexthops in routing and
   forwarding.

   This extension works when the spine nodes are inter-connected, and it
   also works in a network based on a pure CLOS or Fat Tree topology,
   where the spines are not horizontally interconnected.

   Although the example diagram in Figure 1 shows a fully meshed
   Spine-Leaf topology, this extension also works in the case where the
   spine and leaf nodes are only partially meshed.
   For instance, leaf1 through leaf10 may be fully meshed with spine1
   through spine5 while leaf11 through leaf20 are fully meshed with
   spine4 through spine8, and all the spines are inter-connected in a
   redundant fashion.

   This extension can also work in a multi-level spine-leaf topology.
   The lower level spine node can be a 'leaf' node to the upper level
   spine node. A spine-leaf 'Tier' can be exchanged in IS-IS hello
   packets to allow tier X to be connected with tier X+1 using this
   extension. Normally tier-0 will be the ToR routers and switches if
   provisioned.

   This extension also works with normal IS-IS routing in a topology
   with more than two layers of spine and leaf. For instance, in
   example diagrams Figure 1 and Figure 2, there can be another Core
   layer of routers/switches on top of the aggregation layer. From an
   IS-IS routing point of view, the Core nodes are not affected by this
   extension and will have the complete topology and routing information
   just like the spine nodes. To make the network even more scalable,
   the Core layer can operate as a level-2 IS-IS sub-domain while the
   Spine and Leaf layers stay in the level-1 IS-IS domain.

   This extension assumes the links between the spine and leaf nodes are
   point-to-point, or point-to-point over LAN [RFC5309]. The links
   among the spine nodes or between the leaf nodes can be of any type.

3.3.  Spine-Leaf TLVs

   This extension introduces two new TLVs, the Spine-Leaf TLV and the
   Leaf-Set TLV. The Spine-Leaf TLV may be advertised in IS-IS Hello
   (IIH) PDUs; the Leaf-Set TLV may be advertised in IS-IS Circuit
   Scoped Link State PDUs (CS-LSP) [RFC7356]. They are used by both
   spine and leaf nodes in this Spine-Leaf mechanism.

3.3.1.  Spine-Leaf TLV

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |           SL Flags            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of this TLV are defined as follows:

   Type: 1 octet. Suggested value 151 (to be assigned by IANA).

   Length: 1 octet (2 + length of sub-TLVs).

   SL Flags: 16 bits

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Tier  |    Reserved     |T|R|L|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Tier: A value from 0 to 15. It represents the spine-leaf tier
         level. The value 15 is reserved to indicate the tier level
         is unknown. This value is only valid when the 'T' bit (see
         below) is set. If the 'T' bit is clear, this value MUST be
         set to zero on transmission, and it MUST be ignored on
         receipt.

      L bit (0x01): Only a leaf node sets this bit. If the L bit is
         set in the SL Flags, the node indicates it is in 'Leaf-Mode'.

      R bit (0x02): Only a spine node sets this bit. If the R bit is
         set, the node indicates to the leaf neighbor that it can be
         used as the default route gateway.

      T bit (0x04): If set, the value in the "Tier" field (see above)
         is valid.

3.3.2.  Leaf-Set TLV

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |  .. Optional Sub-TLVs
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+....

   The suggested Type value is 152 (to be assigned by IANA). This TLV
   and its associated Sub-TLVs MAY appear in CS-LSP PDUs. Multiple TLVs
   MAY be sent.

3.3.2.1.  Leaf-Set Sub-TLVs

   If the data center topology is a pure CLOS or Fat Tree, there are no
   link connections among the spine nodes.
   If we also assume there is not another Core layer on top of the
   aggregation layer, then the traffic from one leaf node to another
   may have a problem if there is a link outage between a spine node
   and a leaf node. For instance, in the diagram of Figure 2, if Leaf1
   sends data traffic to Leaf3 through the Spine1 node, and the
   Spine1-Leaf3 link is down, the data traffic will be dropped on the
   Spine1 node.

   To address this issue, spine and leaf nodes may use the sub-TLVs
   defined below to obtain more specific reachability information.

   Two Leaf-Set sub-TLVs are defined: the Leaf-Neighbors sub-TLV and
   the Reachability-Req sub-TLV.

3.3.2.1.1.  Leaf-Neighbors Sub-TLV

   This sub-TLV is used by spine nodes to advertise the current set of
   Leaf neighbors to Leaf nodes. The fields of this sub-TLV are defined
   as follows:

   Type: 1 octet. Suggested value 1 (to be assigned by IANA).

   Length: 1 octet. It MUST be a multiple of 6 octets.

   Leaf-Neighbors: A list of IS-IS System-IDs of the leaf node
      neighbors of this spine node.

3.3.2.1.2.  Reachability-Req Sub-TLV

   This sub-TLV is used by leaf nodes to request the advertisement of
   more specific prefix information from one or more selected spine
   node(s). The list of leaf nodes in this sub-TLV reflects the current
   set of leaf nodes for which not all spine node neighbors have
   indicated the presence of connectivity in the Leaf-Neighbors sub-TLV
   (see Section 3.3.2.1.1). The fields of this sub-TLV are defined as
   follows:

   Type: 1 octet. Suggested value 2 (to be assigned by IANA).

   Length: 1 octet. It MUST be a multiple of 6 octets.

   Leaf Nodes: A list of IS-IS System-IDs of leaf nodes for which
      reachability information is being requested.

3.3.3.  Advertising IPv4/IPv6 Reachability

   In cases where connectivity between a leaf node and a spine node is
   down, the leaf node MAY request reachability information from a spine
   node as described in Section 3.3.2.1.2. The spine node uses TLV 135
   [RFC5305] and TLV 236 [RFC5308] to advertise this information.
   These TLVs MAY be included in CS-LSPs [RFC7356] sent from the spine
   to the requesting leaf node.

3.3.4.  Advertising Connection to RF-Leaf Node

   For links between Spine and Leaf Nodes on which the Spine Node has
   set the R-bit and the Leaf node has set the L-bit in their respective
   Spine-Leaf TLVs, spine nodes MAY advertise the link with a bit in the
   "link-attribute" sub-TLV [RFC5029] to indicate that this link is not
   used for LSP flooding. This bit is named the Connect-to-RF-Leaf Node
   bit. This information can be used by nodes computing a flooding
   topology, e.g., [DYNAMIC-FLOODING], to exclude the RF-Leaf nodes from
   the computed flooding topology.

   For links between Spine and Leaf Nodes on which the Spine Node has
   set the R-bit and the Leaf node has set the L-bit in their respective
   Spine-Leaf TLVs, leaf nodes MAY advertise the link with a bit in the
   "link-attribute" sub-TLV [RFC5029] to indicate that this link is to a
   Spine Node neighbor. This bit is named the Connect-to-RF-Spine Node
   bit. This information can be used by leaf nodes when deciding
   whether a leaf to leaf link can be used as an alternate default path
   when a leaf node has no connectivity to any spines. See
   Section 3.5.2.

3.4.  Mechanism

   Leaf nodes in a spine-leaf application using this extension are
   provisioned with two attributes:

   1) Tier level of 0. This indicates the node is a Leaf Node. The
      value 0 is advertised in the Tier field of the Spine-Leaf TLV
      defined above.

   2) Flooding reduction enabled/disabled.
      If flooding reduction is enabled, the L-bit is set to one in the
      Spine-Leaf TLV defined above.

   A spine node does not need explicit configuration. Spine nodes can
   dynamically discover their tier level by computing the number of hops
   to a leaf node. Until a spine node determines its tier level it MUST
   advertise level 15 (unknown tier level) in the Spine-Leaf TLV defined
   above. Each tier level can also be statically provisioned on the
   node.

   When a spine node receives an IIH which includes the Spine-Leaf TLV
   with Tier level 0 and the 'L' bit set, it labels the point-to-point
   interface and adjacency as a 'Reduced Flooding Leaf-Peer (RF-Leaf)'.
   IIHs sent by a spine node on a link to an RF-Leaf include the
   Spine-Leaf TLV with the 'R' bit set in the flags field. The 'R' bit
   indicates to the RF-Leaf neighbor that the spine node can be used as
   a default routing nexthop.

   There is no change to the IS-IS adjacency bring-up mechanism for
   Spine-Leaf peers.

   A spine node blocks LSP flooding to RF-Leaf adjacencies, except for
   the LSP PDUs in which the IS-IS System-ID matches the System-ID of
   the RF-Leaf neighbor. This exception is needed because, when the
   leaf node reboots, the spine node needs to forward to the leaf node
   the non-purged LSPs from the RF-Leaf's previous incarnation.

   Leaf nodes perform IS-IS LSP flooding as normal, sending LSPs over
   all of their IS-IS adjacencies. An RF-Leaf has only self-originated
   LSPs in its LSP database; where leaf-leaf connections exist, the
   database also contains the LSPs of the neighboring leaf nodes in
   addition to the self-originated LSPs.

   Spine nodes receive all the LSP PDUs in the network, including those
   of all the spine nodes and leaf nodes. They perform Shortest Path
   First (SPF) calculations as a normal IS-IS node does. There is no
   change to the route calculation and forwarding on the spine nodes.
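
   As an informal illustration only, and not part of this
   specification, the Spine-Leaf TLV layout of Section 3.3.1 and the
   RF-Leaf handling described above might be sketched in Python as
   follows. The names SpineLeafTlv, is_rf_leaf and
   should_flood_to_rf_leaf are hypothetical, Type 151 is only the
   suggested value, sub-TLVs are not parsed, and treating a neighbor as
   an RF-Leaf only when the 'T' bit validates Tier 0 is an assumption
   drawn from the Tier field definition.

```python
import struct
from dataclasses import dataclass
from typing import Optional

SPINE_LEAF_TLV_TYPE = 151  # suggested value only; not yet assigned by IANA
L_BIT, R_BIT, T_BIT = 0x01, 0x02, 0x04
TIER_UNKNOWN = 15  # reserved Tier value meaning "tier level unknown"


@dataclass(frozen=True)
class SpineLeafTlv:
    """Hypothetical in-memory form of the Spine-Leaf TLV (no sub-TLVs)."""
    tier: Optional[int] = None      # None means the T bit is clear
    leaf_mode: bool = False         # L bit
    default_gateway: bool = False   # R bit

    def encode(self) -> bytes:
        flags = 0
        if self.tier is not None:
            if not 0 <= self.tier <= 15:
                raise ValueError("tier must be in 0..15")
            # Tier occupies the four high-order bits of the SL Flags.
            flags |= (self.tier << 12) | T_BIT
        if self.leaf_mode:
            flags |= L_BIT
        if self.default_gateway:
            flags |= R_BIT
        # Type (1 octet), Length (1 octet, 2 when no sub-TLVs), SL Flags.
        return struct.pack("!BBH", SPINE_LEAF_TLV_TYPE, 2, flags)

    @classmethod
    def decode(cls, data: bytes) -> "SpineLeafTlv":
        tlv_type, length, flags = struct.unpack("!BBH", data[:4])
        if tlv_type != SPINE_LEAF_TLV_TYPE or length < 2:
            raise ValueError("not a Spine-Leaf TLV")
        # The Tier field is ignored on receipt when the T bit is clear.
        tier = (flags >> 12) if flags & T_BIT else None
        return cls(tier, bool(flags & L_BIT), bool(flags & R_BIT))


def is_rf_leaf(tlv: SpineLeafTlv) -> bool:
    """Spine-side check: neighbor advertised Tier 0 with the L bit set."""
    return tlv.tier == 0 and tlv.leaf_mode


def should_flood_to_rf_leaf(lsp_system_id: bytes,
                            neighbor_system_id: bytes) -> bool:
    """Only the RF-Leaf's own LSPs are flooded over an RF-Leaf adjacency."""
    return lsp_system_id == neighbor_system_id
```

   For example, a leaf with flooding reduction enabled would send
   SpineLeafTlv(tier=0, leaf_mode=True).encode(), and a spine decoding
   that TLV would find is_rf_leaf() true and then restrict flooding on
   that adjacency with should_flood_to_rf_leaf().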

   The LSPs of a node are flooded only northbound, towards the upper
   layer spine nodes. The default route is likewise generated, with
   load sharing, towards the upper layer spine nodes.

   An RF-Leaf node does not have any LSPs from the network except its
   own. Therefore there is no need to perform SPF calculation on the
   RF-Leaf node. It only needs to download the default route with the
   nexthops of those Spine Neighbors which have the 'R' bit set in the
   Spine-Leaf TLV in IIH PDUs. IS-IS can perform equal cost or unequal
   cost load sharing while using the spine nodes as nexthops. The
   aggregated metric of the outbound interface and the 'Reverse Metric'
   [RFC8500] can be used for this purpose.

3.4.1.  Pure CLOS Topology

   In a data center where the topology is pure CLOS or Fat Tree, there
   is no interconnection among the spine nodes, and there is not another
   Core layer above the aggregation layer with reachability to the leaf
   nodes. When flooding reduction to RF-Leafs is in use, if the link
   between a spine and a leaf goes down, there is then a possibility of
   black holing the data traffic in the network.

   As in the diagram of Figure 2, if the link Spine1-Leaf3 goes down,
   there needs to be a way for Leaf1, Leaf2 and Leaf4 to avoid Spine1
   when the destination of the data traffic is the Leaf3 node.

   In the above example, Spine1 and Spine2 are provisioned to advertise
   the Leaf-Neighbors sub-TLV of the Leaf-Set TLV (Section 3.3.2.1.1).
   Initially both spines advertise Leaf1 through Leaf4 as their leaf
   neighbors. When the Spine1-Leaf3 link goes down, Spine1 advertises
   only Leaf1, Leaf2 and Leaf4. This allows the other leaf nodes to
   know that Spine1 has lost connectivity to Leaf3.

   Each RF-Leaf node can select another spine node from which to request
   prefix information associated with the lost leaf node.
   In the diagram of Figure 2, there are only two spine nodes (a
   Spine-Leaf topology can have more than two spine nodes in general).
   Each RF-Leaf node can independently select a spine node for the leaf
   information. The RF-Leaf nodes will include the Reachability-Req
   sub-TLV (Section 3.3.2.1.2) in the Leaf-Set TLV carried in CS-LSPs
   sent to the selected spine node, Spine2 in this case.

   The spine node, upon receiving the request from one or more leaf
   nodes, will find the IPv6/IPv4 prefixes advertised by the leaf nodes
   listed in the Reachability-Req sub-TLV. The spine node will use the
   mechanism defined in Section 3.3.3 to advertise these prefixes to the
   RF-Leaf node. For instance, it will include the IPv4 loopback prefix
   of Leaf3 based on the configured policy or the administrative tag
   attached to the prefixes. When the leaf nodes receive the more
   specific prefixes, they will install the advertised prefixes towards
   the other spine nodes (Spine2 in this example).

   For instance, in the data center overlay scenario, when any IP
   destination or MAC destination uses Leaf3's loopback as the tunnel
   nexthop, the overlay tunnel from leaf nodes will only select Spine2
   as the gateway to reach Leaf3 as long as the Spine1-Leaf3 link is
   still down.

   In cases where multiple links or nodes fail at the same time, the
   RF-Leaf node may need to send the Reachability-Req to multiple upper
   layer spine nodes in order to obtain reachability information for all
   the partially connected nodes.

   This negative routing is more useful between tier 0 and tier 1
   spine-leaf levels in a multi-level spine-leaf topology when the
   reduced flooding extension is in use. Nodes in tiers 1 or greater
   may have much richer topology information and alternative paths.

3.5.  Implementation and Operation

3.5.1.  CSNP PDU

   With the Spine-Leaf extension, Complete Sequence Number PDUs (CSNP)
   do not need to be transmitted over the Spine-Leaf link to an RF-Leaf.
   Some IS-IS implementations send periodic CSNPs after the initial
   adjacency bring-up over a point-to-point interface. There is no need
   for this optimization here since the RF-Leaf does not need to receive
   any other LSPs from the network, and the only LSPs transmitted across
   the Spine-Leaf link are the leaf node LSPs.

   Also, in the graceful restart case [RFC5306], for the same reason,
   there is no need to send the CSNPs over the Spine-Leaf interface to
   an RF-Leaf. Spine nodes only need to set the SRMflag on the LSPs
   belonging to the RF-Leaf that has restarted.

3.5.2.  Leaf to Leaf connection

   Leaf to leaf node links are useful in host redundancy cases in
   switching networks. There are no flooding extensions required in
   this case. Leaf node LSPs will be exchanged over this link using the
   normal operation of the IS-IS Update process. In the example diagram
   Figure 1, Leaf1 will receive Leaf2's LSPs and Leaf2 will receive
   Leaf1's LSPs. Each of the Leaf nodes will in turn flood the LSPs
   they receive from their leaf node neighbor to their spine neighbors.
   Prefix reachability advertisements received from the leaf neighbor
   will result in the installation of more specific routes using this
   local Leaf-Leaf link. SPF will be performed in this case as if the
   entire network consisted of only those two IS-IS nodes. This does
   not affect the normal Spine-Leaf mechanism they perform toward the
   spine nodes.

   Leaf to leaf connections SHOULD be limited to a single leaf neighbor.

   Two modes of operation for the Leaf-Leaf link are possible and are
   described in the following sub-sections.

3.5.2.1.  Local traffic only

   The leaf node sets the 'overload' bit in its LSP PDU so that spine
   nodes will not send traffic destined for the neighboring leaf node
   via its leaf node neighbor. The Leaf-Leaf link will then be used
   solely for local traffic between the two Leaf Nodes.

3.5.2.2.  Transit traffic allowed

   If a leaf node becomes disconnected from all spine nodes, it is
   possible for spine nodes to route traffic destined for the
   disconnected leaf node via its leaf node neighbor. However, the leaf
   to leaf link SHOULD be the link of last resort. To support this mode
   the leaf nodes do NOT set the overload bit in their LSPs and they
   advertise a high metric for the leaf to leaf link ((2^24 - 2) is
   recommended). This signals to the Spine Nodes that the leaf to leaf
   link may be used for transit traffic, but also ensures that it will
   not be used unless the spine node has no other path to a given leaf
   node.

   When the leaf node is disconnected from all spine nodes it MAY
   install a default route towards its leaf-node neighbor in support of
   return traffic to the spine nodes. When doing so the leaf should
   validate that its leaf neighbor has at least one spine neighbor.
   This can be done by looking for the Connect-to-RF-Spine Node bit in
   the Link Attributes sub-TLVs [RFC5029] advertised in the LSPs of its
   leaf node neighbor.

3.5.3.  Spine Node Hostname

   This extension creates a non-reciprocal relationship between the
   spine node and leaf node. The spine node will receive the leaf's LSP
   and will know the leaf's hostname, but the leaf does not have the
   spine's LSP. This extension allows the Dynamic Hostname TLV
   [RFC5301] to be optionally included in the spine's IIH PDU when
   sending to a 'Leaf-Peer'. This is useful in troubleshooting cases.

3.5.4.  IS-IS Reverse Metric

   The reverse metric is part of the aggregated metric a leaf uses when
   installing its default route with load sharing among the spine nodes.
   When a spine node is in 'overload' condition, it should use the IS-IS
   Reverse Metric TLV in IIH [RFC8500] to set this metric to maximum to
   discourage the leaf from using it as part of the load sharing.

   In some cases, certain spine nodes may have less bandwidth in link
   provisioning or in real-time condition, and they can use this metric
   to signal this to the leaf nodes dynamically.

   In other cases, such as when the spine node loses a link to a
   particular leaf node, although it can redirect the traffic to other
   spine nodes to reach that destination leaf node, it MAY want to
   increase this metric value if the inter-spine connection becomes
   over-utilized, or the latency becomes an issue.

3.5.5.  Spine-Leaf Traffic Engineering

   Besides the spine nodes using the IS-IS Reverse Metric to influence
   how a leaf spreads its default gateway traffic across multiple spine
   nodes, the IPv6/IPv4 reachability advertisements (Section 3.3.3) can
   be selectively used by traffic engineering controllers to move data
   traffic around the data center fabric to alleviate congestion and to
   reduce the latency of a certain class of traffic pairs. Injecting
   more specific leaf node prefixes allows the spine nodes to attract
   more traffic onto some underutilized links.

3.5.6.  Other End-to-End Services

   Losing the topology information will have an impact on some of the
   end-to-end network services, for instance, MPLS TE or end-to-end
   segment routing. Other mechanisms, such as PCE-based solutions
   [RFC4655], may be used.
   In this Spine-Leaf extension, the role of the leaf node is not very
   different from that of a level-1 node in multi-level IS-IS routing,
   where the level-1 IS-IS nodes only have the default route information
   towards the node which has the Attach Bit (ATT) set, and the level-2
   backbone does not have any topology information of the level-1 areas.
   The exact mechanism to enable certain end-to-end network services in
   a Spine-Leaf network is outside the scope of this document.

3.5.7.  Address Family and Topology

   IPv6 address family [RFC5308], Multi-Topology (MT) [RFC5120] and
   Multi-Instance (MI) [RFC8202] information is carried over the IIH
   PDU. Since the goal is to simplify the operation of the IS-IS
   network, and for the simplicity of this extension, the Spine-Leaf
   mechanism is applied the same way to all the address families, MTs
   and MIs.

3.5.8.  Migration

   For this extension to be deployed in existing networks, a simple
   migration scheme is needed. To support any leaf node in the network,
   all the involved spine nodes have to be upgraded first. So the first
   step is to migrate all the involved spine nodes to support this
   extension; then the leaf nodes can be enabled with 'Leaf-Mode' one by
   one. No flag day is needed for the extension migration.

4.  IANA Considerations

   Two new TLV codepoints are defined in this document and need to be
   assigned by IANA from the "IS-IS TLV Codepoints" registry. They are
   the Spine-Leaf TLV, with suggested value 151, and the Leaf-Set TLV,
   with suggested value 152. The Spine-Leaf TLV is only to be
   optionally inserted in the IIH PDU, and the Leaf-Set TLV is only to
   be optionally inserted in the Circuit Flooding Scoped LSP PDU. IANA
   is also requested to maintain the SL-flag bit values in the
   Spine-Leaf TLV; bits 0x01, 0x02 and 0x04 are defined in this
   document.

   Value  Name                   IIH  LSP  SNP  Purge  CS-LSP
   -----  ---------------------  ---  ---  ---  -----  ------
   151    Spine-Leaf             y    n    n    n      n
   152    Leaf-Set               n    n    n    n      y

   This document also proposes that the Dynamic Hostname TLV, already
   assigned as code 137, be allowed in the IIH PDU.

   Value  Name                   IIH  LSP  SNP  Purge
   -----  ---------------------  ---  ---  ---  -----
   137    Dynamic Name           y    y    n    y

   This document requests IANA to create a new registry under the IS-IS
   TLV Codepoints registry. The suggested name of the registry is
   "Sub-TLVs for TLV 152 (Leaf-Set TLV)". The initial contents of the
   new registry are defined below:

   Value  Name
   -----  ---------------------
   0      Reserved
   1      Leaf Neighbors
   2      Reachability Req
   3-255  Unassigned

   This document also requests that IANA allocate from the registry of
   link-attribute two new bit values for sub-TLV 19 of TLV 22 (Extended
   IS reachability TLV).

   Value  Name                      Reference
   -----  ------------------------  -------------
   0x4    Connect to RF-Leaf Node   This document
   0x8    Connect to RF-Spine Node  This document

5.  Security Considerations

   Security concerns for IS-IS are addressed in [ISO10589], [RFC5304],
   [RFC5310], and [RFC7602]. This extension does not raise additional
   security issues.

6.  Acknowledgments

   The authors would like to thank Tony Przygienda and Lukas Krattiger
   for their discussion and contributions. The authors also would like
   to thank Acee Lindem, Russ White, Christian Hopps and Aijun Wang for
   their review of and comments on this document.

7.  References

7.1.  Normative References

   [ISO10589]
              International Organization for Standardization,
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473)", ISO/IEC
              10589:2002, Second Edition, November 2002.
   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5029]  Vasseur, JP. and S. Previdi, "Definition of an IS-IS Link
              Attribute Sub-TLV", RFC 5029, DOI 10.17487/RFC5029,
              September 2007.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120,
              DOI 10.17487/RFC5120, February 2008.

   [RFC5301]  McPherson, D. and N. Shen, "Dynamic Hostname Exchange
              Mechanism for IS-IS", RFC 5301, DOI 10.17487/RFC5301,
              October 2008.

   [RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic
              Authentication", RFC 5304, DOI 10.17487/RFC5304,
              October 2008.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, DOI 10.17487/RFC5305,
              October 2008.

   [RFC5306]  Shand, M. and L. Ginsberg, "Restart Signaling for IS-IS",
              RFC 5306, DOI 10.17487/RFC5306, October 2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308,
              DOI 10.17487/RFC5308, October 2008.

   [RFC5310]  Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
              and M. Fanto, "IS-IS Generic Cryptographic
              Authentication", RFC 5310, DOI 10.17487/RFC5310,
              February 2009.

   [RFC7356]  Ginsberg, L., Previdi, S., and Y. Yang, "IS-IS Flooding
              Scope Link State PDUs (LSPs)", RFC 7356,
              DOI 10.17487/RFC7356, September 2014.

   [RFC7602]  Chunduri, U., Lu, W., Tian, A., and N. Shen, "IS-IS
              Extended Sequence Number TLV", RFC 7602,
              DOI 10.17487/RFC7602, July 2015.

   [RFC8202]  Ginsberg, L., Previdi, S., and W. Henderickx, "IS-IS
              Multi-Instance", RFC 8202, DOI 10.17487/RFC8202,
              June 2017.

   [RFC8500]  Shen, N., Amante, S., and M. Abrahamsson, "IS-IS Routing
              with Reverse Metric", RFC 8500, DOI 10.17487/RFC8500,
              February 2019.

7.2.  Informative References

   [DYNAMIC-FLOODING]
              Li, T., "Dynamic Flooding on Dense Graphs", draft-li-
              dynamic-flooding (work in progress), 2018.

   [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation
              Element (PCE)-Based Architecture", RFC 4655,
              DOI 10.17487/RFC4655, August 2006.

   [RFC5309]  Shen, N., Ed. and A. Zinin, Ed., "Point-to-Point Operation
              over LAN in Link State Routing Protocols", RFC 5309,
              DOI 10.17487/RFC5309, October 2008.

Authors' Addresses

   Naiming Shen
   Cisco Systems
   560 McCarthy Blvd.
   Milpitas, CA 95035
   US

   Email: naiming@cisco.com


   Les Ginsberg
   Cisco Systems
   821 Alder Drive
   Milpitas, CA 95035
   US

   Email: ginsberg@cisco.com


   Sanjay Thyamagundalu

   Email: tsanjay@gmail.com