Networking Working Group                                         N. Shen
Internet-Draft                                               L. Ginsberg
Intended status: Standards Track                           Cisco Systems
Expires: January 1, 2018                                S. Thyamagundalu
                                                           June 30, 2017

                 IS-IS Routing for Spine-Leaf Topology
                   draft-shen-isis-spine-leaf-ext-04

Abstract

   This document describes a mechanism for routers and switches in a
   Spine-Leaf type topology to have non-reciprocal Intermediate System
   to Intermediate System (IS-IS) routing relationships between the
   leafs and spines. The leaf nodes do not need to have the topology
   information of other nodes and exact prefixes in the network. This
   extension also has application in the Internet of Things (IoT).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 1, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements Language
   2. Motivations
   3. Spine-Leaf (SL) Extension
     3.1. Topology Examples
     3.2. Applicability Statement
     3.3. Extension Encoding
       3.3.1. Spine-Leaf Sub-TLVs
         3.3.1.1. Leaf-Set Sub-TLV
         3.3.1.2. Info-Req Sub-TLV
       3.3.2. Advertising IPv4/IPv6 Reachability
     3.4. Mechanism
       3.4.1. Pure CLOS Topology
     3.5. Implementation and Operation
       3.5.1. CSNP PDU
       3.5.2. Leaf to Leaf connection
       3.5.3. Overload Bit
       3.5.4. Spine Node Hostname
       3.5.5. IS-IS Reverse Metric
       3.5.6. Spine-Leaf Traffic Engineering
       3.5.7. Other End-to-End Services
       3.5.8. Address Family and Topology
       3.5.9. Migration
   4. IANA Considerations
   5. Security Considerations
   6. Acknowledgments
   7. Document Change Log
     7.1. Changes to draft-shen-isis-spine-leaf-ext-04.txt
     7.2. Changes to draft-shen-isis-spine-leaf-ext-03.txt
     7.3. Changes to draft-shen-isis-spine-leaf-ext-02.txt
     7.4. Changes to draft-shen-isis-spine-leaf-ext-01.txt
     7.5. Changes to draft-shen-isis-spine-leaf-ext-00.txt
   8. References
     8.1. Normative References
     8.2. Informative References
   Authors' Addresses

1. Introduction

   The IS-IS routing protocol defined by [ISO10589] has been widely
   deployed in provider networks, data centers and enterprise campus
   environments. In the data center and enterprise switching networks,
   a Spine-Leaf topology is commonly used. This document describes a
   mechanism where IS-IS routing can be optimized for a Spine-Leaf
   topology.

   In a Spine-Leaf topology, normally a leaf node connects to a number
   of spine nodes. Data traffic going from one leaf node to another
   leaf node needs to pass through one of the spine nodes. Also, the
   decision to choose one of the spine nodes is usually part of equal
   cost multi-path (ECMP) load sharing. The spine nodes can be
   considered as gateway devices to reach destinations on other leaf
   nodes. In this type of topology, the spine nodes have to know the
   topology and routing information of the entire network, but the leaf
   nodes only need to know how to reach the gateway devices, which are
   the spine nodes to which they are uplinked.

   This document describes the IS-IS Spine-Leaf extension that allows
   the spine nodes to have all the topology and routing information,
   while keeping the leaf nodes free of topology information other than
   the default gateway routing information. The leaf nodes do not even
   need to run a Shortest Path First (SPF) calculation since they have
   no topology information.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Motivations

   o  The leaf nodes in a Spine-Leaf topology do not require complete
      topology and routing information of the entire domain since their
      forwarding decision is to use ECMP with spine nodes as default
      gateways.

   o  The spine nodes in a Spine-Leaf topology are richly connected to
      leaf nodes, which introduces significant flooding duplication if
      they flood all Link State PDUs (LSPs) to all the leaf nodes. It
      saves both spine and leaf nodes' CPU and link bandwidth resources
      if flooding is blocked to leaf nodes. For small Top of the Rack
      (ToR) leaf switches in data centers, it is worthwhile to keep
      full topology routing information and massive database flooding
      away from those devices.

   o  When a spine node advertises a topology change, every leaf node
      connected to it will flood the update to all the other spine
      nodes, and those spine nodes will further flood it to all the
      leaf nodes, causing an O(n^2) flooding storm which is largely
      redundant.

   o  Similar to some of the overlay technologies which are popular in
      data centers, the edge devices (leaf nodes) may not need to
      contain all the routing and forwarding information on the device's
      control and forwarding planes. "Conversational Learning" can be
      utilized to get the specific routing and forwarding information in
      the case of a pure CLOS topology and in the event of link or node
      failures.

   o  Small devices and appliances of Internet of Things (IoT) can be
      considered as leafs in the routing topology sense.
      They have CPU and memory constraints by design, and those IoT
      devices do not have to know the exact network topology and
      prefixes as long as there are ways to reach the cloud servers or
      other devices.

3. Spine-Leaf (SL) Extension

3.1. Topology Examples

   [Figure 1 diagram: spine nodes Spine1, Spine2, ..., SpineN linked
   to one another, with leaf nodes Leaf1, Leaf2, ..., LeafX, LeafY
   uplinked to the spine nodes and a local link between Leaf1 and
   Leaf2.]

                    Figure 1: A Spine-Leaf Topology

   [Figure 2 diagram: spine nodes Spine1 and Spine2, not linked to each
   other, with leaf nodes Leaf1 through Leaf4 each uplinked to both
   spine nodes.]

                       Figure 2: A CLOS Topology

3.2. Applicability Statement

   This extension assumes the network is a Spine-Leaf topology, and it
   should not be applied in an arbitrary network setup. The spine nodes
   can be viewed as the aggregation layer of the network, and the leaf
   nodes as the access layer of the network. The leaf nodes use a load
   sharing algorithm with spine nodes as nexthops in routing and
   forwarding.

   This extension works when the spine nodes are inter-connected, and it
   works with a pure CLOS or Fat Tree topology based network where the
   spines are NOT horizontally interconnected.

   Although the example diagram in Figure 1 shows a fully meshed Spine-
   Leaf topology, this extension also works in the case where the nodes
   are partially meshed. For instance, leaf1 through leaf10 may be fully
   meshed with spine1 through spine5 while leaf11 through leaf20 are
   fully meshed with spine4 through spine8, and all the spines are
   inter-connected in a redundant fashion.

   This extension can also work in a multi-level spine-leaf topology.
   The lower level spine node can be a 'leaf' node to the upper level
   spine node. A spine-leaf 'Tier' can be exchanged in IS-IS hello
   packets to allow tier X to be connected with tier X+1 using this
   extension. Normally tier-0 will be the ToR routers and switches if
   provisioned.

   This extension also works with normal IS-IS routing in a topology
   with more than two layers of spine and leaf. For instance, in
   example diagrams Figure 1 and Figure 2, there can be another Core
   layer of routers/switches on top of the aggregation layer. From an
   IS-IS routing point of view, the Core nodes are not affected by this
   extension and will have the complete topology and routing information
   just like the spine nodes. To make the network even more scalable,
   the Core layer can operate as a level-2 IS-IS sub-domain while the
   Spine and Leaf layers operate as a level-1 IS-IS domain.

   This extension also supports the leaf nodes having local connections
   to other leaf nodes. In the example diagram Figure 1 there is a
   connection between the 'Leaf1' node and the 'Leaf2' node, and an
   external host can be dual homed into both of the leaf nodes.

   This extension assumes the links between the spine and leaf nodes
   are point-to-point, or point-to-point over LAN [RFC5309].
   The links among the spine nodes and the links between the leaf nodes
   can be of any type.

3.3. Extension Encoding

   This extension introduces one new TLV which may be used in IS-IS
   Hello (IIH) PDUs, LSPs, or in Circuit Scoped Link State PDUs (CS-LSP)
   [RFC7356]. It is used by both spine and leaf nodes in this Spine-
   Leaf mechanism.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Length    |          SL Flag              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | .. Optional Sub-TLVs
   +-+-+-+-+-+-+-+-+-....

   The fields of this TLV are defined as follows:

   Type: 1 octet. Suggested value 150 (to be assigned by IANA).

   Length: 1 octet (2 + length of sub-TLVs).

   SL Flag: 16 bits.

       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Tier  |   Reserved    |T|B|R|L|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Tier: A 4-bit value ranging from 0 to 15. It represents the
         spine-leaf tier level when the 'T' bit is set. If the 'T'
         bit is cleared, this value MUST be set to zero by the sender
         and MUST be ignored by the receiver. The value 15 is
         reserved to indicate that the tier level is unknown or not
         configured.

      L bit (0x01): Only a leaf node sets this bit. If the L bit is
         set in the SL flag, the node indicates it is in 'Leaf-
         Mode'.

      R bit (0x02): Only a spine node sets this bit. If the R bit is
         set, the node indicates to the leaf neighbor that it can be
         used as the default route gateway.

      B bit (0x04): Only a leaf node sets this bit, on a Leaf-Leaf
         link, in addition to the 'L' bit. If the B bit is set, the
         node indicates to its leaf neighbor that it can be used as
         the backup default route gateway.

      T bit (0x08): If set, the value in the 'Tier' field represents
         the spine-leaf tier level in the topology.

   Optional Sub-TLVs: The sub-TLVs defined in Section 3.3.1. Other
      sub-TLVs may be defined by future extensions.

      Sub-TLVs MAY be included when the TLV is in a CS-LSP.
      Sub-TLVs MUST NOT be included when the TLV is in an IIH.

3.3.1. Spine-Leaf Sub-TLVs

   If the data center topology is a pure CLOS or Fat Tree, there are no
   link connections among the spine nodes. If we also assume there is
   not another Core layer on top of the aggregation layer, then the
   traffic from one leaf node to another may have a problem if there is
   a link outage between a spine node and a leaf node. For instance, in
   the diagram of Figure 2, if Leaf1 sends data traffic to Leaf3 through
   the Spine1 node, and the Spine1-Leaf3 link is down, the data traffic
   will be dropped on the Spine1 node.

   To address this issue, spine and leaf nodes may send/request specific
   reachability information via the sub-TLVs defined below.

   Two Spine-Leaf sub-TLVs are defined: the Leaf-Set sub-TLV and the
   Info-Req sub-TLV.

3.3.1.1. Leaf-Set Sub-TLV

   This sub-TLV is used by spine nodes to optionally advertise Leaf
   neighbors to other Leaf nodes. The fields of this sub-TLV are
   defined as follows:

   Type: 1 octet. Suggested value 1 (to be assigned by IANA).

   Length: 1 octet. It MUST be a multiple of 6 octets.

   Leaf-Set: A list of the IS-IS System-IDs of the leaf node neighbors
      of this spine node.

3.3.1.2. Info-Req Sub-TLV

   This sub-TLV is used by leaf nodes to request more specific prefix
   information from a selected spine node, upon detecting that one of
   the spine nodes has lost the connection to a leaf node. The fields
   of this sub-TLV are defined as follows:

   Type: 1 octet. Suggested value 2 (to be assigned by IANA).

   Length: 1 octet. It MUST be a multiple of 6 octets.

   Info-Req: A list of the IS-IS System-IDs of the leaf nodes for which
      connectivity information is being requested.

3.3.2. Advertising IPv4/IPv6 Reachability

   In cases where connectivity between a leaf node and a spine node is
   down, the leaf node MAY request reachability information from a spine
   node as described in Section 3.3.1.2. The spine node utilizes TLV
   135 [RFC5305] and TLV 236 [RFC5308] to advertise this information.
   These TLVs MAY be included either in IIHs or CS-LSPs sent from the
   spine to the requesting leaf node. Sending such information in IIHs
   has limited scale: all reachability information MUST fit within a
   single IIH. It is therefore recommended that CS-LSPs be used.

3.4. Mechanism

   Leaf nodes in a spine-leaf application using this extension are
   provisioned with two attributes:

   1) Tier level of 0. This indicates the node is a Leaf Node. The
   value 0 is advertised in the Tier field of the Spine-Leaf TLV
   defined above.

   2) Flooding reduction enabled/disabled. If flooding reduction is
   enabled, the L-bit is set to one in the Spine-Leaf TLV defined
   above.

   A spine node does not need explicit configuration. Spine nodes can
   dynamically discover their tier level by computing the number of hops
   to a leaf node. Until a spine node determines its tier level it MUST
   advertise level 15 (unknown tier level) in the Spine-Leaf TLV defined
   above.

   When a spine node receives an IIH which includes the Spine-Leaf TLV
   with Tier level 0 and the 'L' bit set, it labels the point-to-point
   interface and adjacency as a 'Reduced Flooding Leaf-Peer (RF-
   Leaf)'. IIHs sent by a spine node on a link to an RF-Leaf include
   the Spine-Leaf TLV with the 'R' bit set in the flags field. The 'R'
   bit indicates to the RF-Leaf neighbor that the spine node can be used
   as a default routing nexthop.

   There is no change to the IS-IS adjacency bring-up mechanism for
   Spine-Leaf peers.
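
   The Spine-Leaf TLV layout of Section 3.3 and the RF-Leaf labeling
   rule above can be sketched as follows. This is a minimal
   illustration, not an implementation of any existing IS-IS stack.
   The class and function names (SpineLeafTLV, is_rf_leaf) are
   hypothetical, the type value 150 is only the suggested codepoint,
   and the sketch assumes the SL Flag is read as a 16-bit value in
   network bit order, so that the Tier field occupies the four most
   significant bits. It also assumes that "Tier level 0" means the T
   bit is set with a Tier value of zero.

```python
import struct
from dataclasses import dataclass

SL_TLV_TYPE = 150   # suggested value in this document, not an IANA assignment
L_BIT, R_BIT, B_BIT, T_BIT = 0x01, 0x02, 0x04, 0x08
TIER_UNKNOWN = 15   # reserved: tier level unknown or not configured


@dataclass
class SpineLeafTLV:
    """Hypothetical in-memory form of the Spine-Leaf TLV."""
    tier: int = 0
    t_bit: bool = False
    b_bit: bool = False
    r_bit: bool = False
    l_bit: bool = False
    sub_tlvs: bytes = b""

    def encode(self) -> bytes:
        if not 0 <= self.tier <= 15:
            raise ValueError("Tier is a 4-bit field")
        # Tier MUST be sent as zero when the T bit is clear.
        flags = (self.tier if self.t_bit else 0) << 12
        for is_set, mask in ((self.t_bit, T_BIT), (self.b_bit, B_BIT),
                             (self.r_bit, R_BIT), (self.l_bit, L_BIT)):
            if is_set:
                flags |= mask
        length = 2 + len(self.sub_tlvs)   # 2 flag octets + sub-TLVs
        if length > 255:
            raise ValueError("TLV value does not fit in one octet of length")
        return struct.pack("!BBH", SL_TLV_TYPE, length, flags) + self.sub_tlvs

    @classmethod
    def decode(cls, data: bytes) -> "SpineLeafTLV":
        if len(data) < 4:
            raise ValueError("truncated Spine-Leaf TLV")
        tlv_type, length, flags = struct.unpack_from("!BBH", data)
        if tlv_type != SL_TLV_TYPE or length < 2 or len(data) < 2 + length:
            raise ValueError("not a well-formed Spine-Leaf TLV")
        t_bit = bool(flags & T_BIT)
        return cls(
            tier=(flags >> 12) if t_bit else 0,   # ignore Tier when T is clear
            t_bit=t_bit,
            b_bit=bool(flags & B_BIT),
            r_bit=bool(flags & R_BIT),
            l_bit=bool(flags & L_BIT),
            sub_tlvs=bytes(data[4:2 + length]),
        )


def is_rf_leaf(hello_tlv: SpineLeafTLV) -> bool:
    """Spine side: label the adjacency RF-Leaf for tier 0 with the L bit."""
    return hello_tlv.t_bit and hello_tlv.tier == 0 and hello_tlv.l_bit
```

   For example, a leaf provisioned with tier 0 and flooding reduction
   enabled would build SpineLeafTLV(tier=0, t_bit=True, l_bit=True),
   and a spine that has not yet learned its tier would send
   SpineLeafTLV(tier=TIER_UNKNOWN, t_bit=True, r_bit=True) towards an
   RF-Leaf.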

   A spine node blocks LSP flooding to RF-Leaf adjacencies, except for
   the LSP PDUs in which the IS-IS System-ID matches the System-ID of
   the RF-Leaf neighbor. This exception is needed because, when the
   leaf node reboots, the spine node needs to forward to the leaf node
   non-purged LSPs from the RF-Leaf's previous incarnation.

   Leaf nodes will perform IS-IS LSP flooding as normal over all of
   their IS-IS adjacencies, but in the case of RF-Leafs only self-
   originated LSPs will exist in the LSP database.

   Spine nodes will receive all the LSP PDUs in the network, including
   those of all the spine nodes and leaf nodes. They will perform
   Shortest Path First (SPF) as a normal IS-IS node does. There is no
   change to the route calculation and forwarding on the spine nodes.

   An RF-Leaf node does not have any LSP in the network except for its
   own. Therefore there is no need to perform SPF calculation on the
   RF-Leaf node. It only needs to download the default route with the
   nexthops of those Spine Neighbors which have the 'R' bit set in the
   Spine-Leaf TLV in IIH PDUs. IS-IS can perform equal cost or unequal
   cost load sharing while using the spine nodes as nexthops. The
   aggregated metric of the outbound interface and the 'Reverse Metric'
   [REVERSE-METRIC] can be used for this purpose.

3.4.1. Pure CLOS Topology

   In a data center where the topology is pure CLOS or Fat Tree, there
   is no interconnection among the spine nodes, and there is not another
   Core layer above the aggregation layer with reachability to the leaf
   nodes. When flooding reduction to RF-Leafs is in use, if the link
   between a spine and a leaf goes down, there is then a possibility of
   black holing the data traffic in the network.

   As in the diagram Figure 2, if the link Spine1-Leaf3 goes down, there
   needs to be a way for Leaf1, Leaf2 and Leaf4 to avoid Spine1 if
   the destination of the data traffic is the Leaf3 node.
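
   The general behavior of Section 3.4, before any pure CLOS handling
   is added, can be sketched as two small functions: the spine-side
   flooding filter and the RF-Leaf default route selection. The
   sketch is illustrative only. The names Adjacency, SpineNeighbor,
   should_flood and default_next_hops are hypothetical, the aggregated
   metric is assumed to be the sum of the outbound interface metric
   and the Reverse Metric, and only equal cost load sharing is shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Adjacency:
    neighbor_id: str        # System-ID of the neighbor on this adjacency
    rf_leaf: bool = False   # True once the spine labeled it RF-Leaf


def should_flood(lsp_system_id: str, adj: Adjacency) -> bool:
    """Spine side: an RF-Leaf only receives LSPs carrying its own System-ID."""
    return (not adj.rf_leaf) or lsp_system_id == adj.neighbor_id


@dataclass(frozen=True)
class SpineNeighbor:
    system_id: str
    r_bit: bool             # spine set the R bit in its IIH
    link_metric: int        # metric of the leaf's outbound interface
    reverse_metric: int = 0 # value signaled by the spine, if any


def default_next_hops(neighbors: Iterable[SpineNeighbor]) -> List[str]:
    """Leaf side: equal cost next hops for the default route, no SPF needed."""
    # Assumption: "aggregated metric" is the sum of the two metrics.
    candidates = [(n.link_metric + n.reverse_metric, n.system_id)
                  for n in neighbors if n.r_bit]
    if not candidates:
        return []
    best = min(cost for cost, _ in candidates)
    return sorted(sid for cost, sid in candidates if cost == best)
```

   With this selection, a spine that raises its Reverse Metric simply
   drops out of the equal cost set while the other spines keep
   carrying the default route traffic.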

   In the Figure 2 example, Spine1 and Spine2 are provisioned to
   advertise the Leaf-Set sub-TLV of the Spine-Leaf TLV. Originally
   both spines will advertise Leaf1 through Leaf4 as their Leaf-Set.
   When the Spine1-Leaf3 link is down, Spine1 will only have Leaf1,
   Leaf2 and Leaf4 in its Leaf-Set. This allows the other leaf nodes to
   know that Spine1 has lost connectivity to Leaf3.

   Each RF-Leaf node can select another spine node from which to
   request prefix information associated with the lost leaf node. In
   the diagram of Figure 2, there are only two spine nodes (a Spine-Leaf
   topology can have more than two spine nodes in general). Each RF-
   Leaf node can independently select a spine node for the leaf
   information. The RF-Leaf nodes will include the Info-Req sub-TLV in
   the Spine-Leaf TLV in hellos sent to the selected spine node, Spine2
   in this case.

   The spine node, upon receiving the request from one or more leaf
   nodes, will find the IPv6/IPv4 prefixes advertised by the leaf nodes
   listed in the Info-Req sub-TLV. The spine node will use the
   mechanism defined in Section 3.3.2 to advertise these prefixes to the
   RF-Leaf node. For instance, it will include the IPv4 loopback prefix
   of Leaf3 based on the policy configured or the administrative tag
   attached to the prefixes. When the leaf nodes receive the more
   specific prefixes, they will install the advertised prefixes towards
   the other spine nodes (Spine2 in this example).

   For instance, in the data center overlay scenario, when any IP
   destination or MAC destination uses Leaf3's loopback as the
   tunnel nexthop, the overlay tunnel from the leaf nodes will only
   select Spine2 as the gateway to reach Leaf3 as long as the
   Spine1-Leaf3 link is still down.
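
   The Leaf-Set comparison and Info-Req selection described in this
   section can be sketched as follows. This is a sketch under stated
   assumptions rather than a definitive procedure: the function names
   are hypothetical, the sub-TLV type 2 is only the suggested value,
   and the rule for picking a spine (the lowest identifier among the
   spines that still list the lost leaf) is an arbitrary choice, since
   this document lets each RF-Leaf select a spine independently.

```python
from typing import Dict, Iterable, Set

INFO_REQ_TYPE = 2   # suggested value in this document, not an IANA assignment
SYSTEM_ID_LEN = 6   # octets per IS-IS System-ID


def lost_leaves(leaf_sets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per spine, the leaves other spines advertise but this spine does not."""
    all_leaves = set().union(*leaf_sets.values())
    return {spine: all_leaves - leaves
            for spine, leaves in leaf_sets.items()
            if all_leaves - leaves}


def plan_info_requests(self_id: str,
                       leaf_sets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each selected spine to the leaves to list in its Info-Req."""
    requests: Dict[str, Set[str]] = {}
    for missing in lost_leaves(leaf_sets).values():
        for leaf in missing - {self_id}:
            # Assumed policy: lowest identifier among spines still listing it.
            helper = min(s for s, leaves in leaf_sets.items() if leaf in leaves)
            requests.setdefault(helper, set()).add(leaf)
    return requests


def encode_info_req(system_ids: Iterable[bytes]) -> bytes:
    """Encode an Info-Req sub-TLV: type, length, then 6-octet System-IDs."""
    ids = sorted(system_ids)
    if any(len(sid) != SYSTEM_ID_LEN for sid in ids):
        raise ValueError("each System-ID must be 6 octets")
    body = b"".join(ids)
    if len(body) > 255:
        raise ValueError("too many System-IDs for one sub-TLV")
    return bytes([INFO_REQ_TYPE, len(body)]) + body
```

   Using the Figure 2 example, Leaf1 would call plan_info_requests
   with the Leaf-Sets {"Spine1": {"Leaf1", "Leaf2", "Leaf4"},
   "Spine2": {"Leaf1", "Leaf2", "Leaf3", "Leaf4"}} and learn that it
   should ask Spine2 about Leaf3.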

   This negative routing is only relevant between the tier 0 and tier 1
   spine-leaf levels in a multi-level spine-leaf topology when the
   reduced flooding extension is in use. Nodes in tiers 1 or greater
   have the full topology information.

3.5. Implementation and Operation

3.5.1. CSNP PDU

   In the Spine-Leaf extension, the Complete Sequence Number PDU (CSNP)
   does not need to be transmitted over the Spine-Leaf link to an RF-
   Leaf. Some IS-IS implementations send periodic CSNPs after the
   initial adjacency bring-up over a point-to-point interface. There is
   no need for this optimization here since the RF-Leaf does not need
   to receive any other LSPs from the network, and the only LSPs
   transmitted across the Spine-Leaf link are the leaf node's LSPs.

   Also in the graceful restart case [RFC5306], for the same reason,
   there is no need to send the CSNPs over the Spine-Leaf interface to
   an RF-Leaf. Spine nodes only need to set the SRMflag on the LSPs
   belonging to the RF-Leaf.

3.5.2. Leaf to Leaf connection

   Leaf to leaf node links are useful in host redundancy cases in
   switching networks, and normally no flooding extension is required
   in this case. Each leaf node will set tier level = 0 in the
   Spine-Leaf TLV included in hellos to leaf neighbors. LSPs will be
   exchanged over this link. In the example diagram Figure 1, Leaf1
   will get Leaf2's LSP and Leaf2 will get Leaf1's LSP. They will
   install more specific routes towards each other using this local
   Leaf-Leaf link. SPF will be performed in this case just as if the
   entire network involved only those two IS-IS nodes. This does not
   affect the normal Spine-Leaf mechanism they perform toward the
   spine nodes.

   Besides the local leaf-to-leaf traffic, the leaf node can serve as a
   backup gateway for its leaf neighbor.
   To do so, it needs to clear the 'Overload-Bit' in its LSP, and it
   sets both the 'L' bit and the 'B' bit in the SL-flag with a high
   'Reverse Metric' value.

3.5.3. Overload Bit

   The leaf node SHOULD set the 'overload' bit on its LSP PDU, since if
   the spine nodes were to forward traffic not meant for the local node,
   the leaf node does not have the topology information to prevent a
   routing/forwarding loop.

3.5.4. Spine Node Hostname

   This extension creates a non-reciprocal relationship between the
   spine node and leaf node. The spine node will receive the leaf's LSP
   and will know the leaf's hostname, but the leaf does not have the
   spine's LSP. This extension allows the Dynamic Hostname TLV
   [RFC5301] to be optionally included in the spine's IIH PDU when
   sending to a 'Leaf-Peer'. This is useful in troubleshooting cases.

3.5.5. IS-IS Reverse Metric

   This metric is part of the aggregated metric for the leaf's default
   route installation with load sharing among the spine nodes. When a
   spine node is in 'overload' condition, it should use the IS-IS
   Reverse Metric TLV in IIH [REVERSE-METRIC] to set this metric to
   maximum to discourage the leaf from using it as part of the load
   sharing.

   In some cases, certain spine nodes may have less bandwidth in link
   provisioning or in real-time condition, and they can use this metric
   to signal this to the leaf nodes dynamically.

   In other cases, such as when the spine node loses a link to a
   particular leaf node, it can redirect the traffic to other spine
   nodes to reach that destination leaf node, but it MAY want to
   increase this metric value if the inter-spine connection becomes
   over utilized, or the latency becomes an issue.

   In the leaf-leaf link as a backup gateway use case, the 'Reverse
   Metric' SHOULD always be set to a very high value.

3.5.6. Spine-Leaf Traffic Engineering

   Besides the use of the IS-IS Reverse Metric by the spine nodes to
   affect the traffic pattern of the leaf default gateway towards
   multiple spine nodes, the IPv6/IPv4 reachability advertisements of
   Section 3.3.2 can be selectively used by traffic engineering
   controllers to move data traffic around the data center fabric to
   alleviate congestion and to reduce the latency of a certain class of
   traffic pairs. Injecting more specific leaf node prefixes allows the
   spine nodes to attract more traffic on some underutilized links.

3.5.7. Other End-to-End Services

   Losing the topology information will have an impact on some of the
   end-to-end network services, for instance, MPLS TE or end-to-end
   segment routing. Other mechanisms, such as a PCE-based solution
   [RFC4655], may be used. In this Spine-Leaf extension, the role of
   the leaf node is not much different from that in multi-level IS-IS
   routing, where the level-1 IS-IS nodes only have the default route
   information towards the node which has the Attach Bit (ATT) set, and
   the level-2 backbone does not have any topology information of the
   level-1 areas. The exact mechanism to enable certain end-to-end
   network services in a Spine-Leaf network is outside the scope of
   this document.

3.5.8. Address Family and Topology

   IPv6 Address families [RFC5308], Multi-Topology (MT) [RFC5120] and
   Multi-Instance (MI) [RFC8202] information is carried over the IIH
   PDU. Since the goal is to simplify the operation of the IS-IS
   network, for the simplicity of this extension, the Spine-Leaf
   mechanism is applied the same way to all the address families, MTs
   and MIs.

3.5.9. Migration

   For this extension to be deployed in existing networks, a simple
   migration scheme is needed. To support any leaf node in the network,
   all the involved spine nodes have to be upgraded first.
   So the first step is to migrate all the involved spine nodes to
   support this extension; then the leaf nodes can be enabled with
   'Leaf-Mode' one by one. No flag day is needed for the extension
   migration.

4. IANA Considerations

   A new TLV codepoint is defined in this document and needs to be
   assigned by IANA from the "IS-IS TLV Codepoints" registry. It is
   referred to as the Spine-Leaf TLV and the suggested value is 150.
   This TLV is only to be optionally inserted either in the IIH PDU or
   in the Circuit Flooding Scoped LSP PDU. IANA is also requested to
   maintain the SL-flag bit values in this TLV; the 0x01, 0x02, 0x04
   and 0x08 bits are defined in this document.

   Value  Name                   IIH  LSP  SNP  Purge  CS-LSP
   -----  ---------------------  ---  ---  ---  -----  ------
   150    Spine-Leaf             y    y    n    n      y

   This extension also proposes to have the Dynamic Hostname TLV,
   already assigned as code 137, to be allowed in the IIH PDU.

   Value  Name                   IIH  LSP  SNP  Purge
   -----  ---------------------  ---  ---  ---  -----
   137    Dynamic Name           y    y    n    y

   Two new sub-TLVs are defined in this document and need to be
   assigned by IANA from the "IS-IS TLV Codepoints" registry. They are
   referred to in this document as the Leaf-Set sub-TLV and the
   Info-Req sub-TLV. It is suggested to have the values 1 and 2
   respectively.

5. Security Considerations

   Security concerns for IS-IS are addressed in [ISO10589], [RFC5304],
   [RFC5310], and [RFC7602]. This extension does not raise additional
   security issues.

6. Acknowledgments

   TBD.

7. Document Change Log

7.1. Changes to draft-shen-isis-spine-leaf-ext-04.txt

   o  Submitted April 2017.

   o  Added the Tier level information to handle the multi-level spine-
      leaf topology using this extension.

7.2. Changes to draft-shen-isis-spine-leaf-ext-03.txt

   o  Submitted March 2017.

   o  Added the Spine-Leaf sub-TLVs to handle the case of data center
      pure CLOS topology and mechanism.

   o  Specified that the Spine-Leaf TLV and sub-TLVs can be optionally
      inserted in either the IIH PDU or the CS-LSP PDU.

   o  Allow use of prefix Reachability TLVs 135 and 236 in IIHs/CS-LSPs
      sent from spine to leaf.

7.3. Changes to draft-shen-isis-spine-leaf-ext-02.txt

   o  Submitted October 2016.

   o  Removed the 'Default Route Metric' field in the Spine-Leaf TLV and
      changed to using the IS-IS Reverse Metric in IIH.

7.4. Changes to draft-shen-isis-spine-leaf-ext-01.txt

   o  Submitted April 2016.

   o  No change. Refresh the draft version.

7.5. Changes to draft-shen-isis-spine-leaf-ext-00.txt

   o  Initial version of the draft is published in November 2015.

8. References

8.1. Normative References

   [ISO10589]
              ISO "International Organization for Standardization",
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473), ISO/IEC
              10589:2002, Second Edition.", Nov 2002.

   [REVERSE-METRIC]
              Shen, N., Amante, S., and M. Abrahamsson, "IS-IS Routing
              with Reverse Metric", draft-ietf-isis-reverse-metric-06
              (work in progress), 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120,
              DOI 10.17487/RFC5120, February 2008.

   [RFC5301]  McPherson, D. and N. Shen, "Dynamic Hostname Exchange
              Mechanism for IS-IS", RFC 5301, DOI 10.17487/RFC5301,
              October 2008.

   [RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic
              Authentication", RFC 5304, DOI 10.17487/RFC5304, October
              2008.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, DOI 10.17487/RFC5305, October
              2008.

   [RFC5306]  Shand, M. and L. Ginsberg, "Restart Signaling for IS-IS",
              RFC 5306, DOI 10.17487/RFC5306, October 2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308,
              DOI 10.17487/RFC5308, October 2008.

   [RFC5310]  Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
              and M. Fanto, "IS-IS Generic Cryptographic
              Authentication", RFC 5310, DOI 10.17487/RFC5310, February
              2009.

   [RFC7356]  Ginsberg, L., Previdi, S., and Y. Yang, "IS-IS Flooding
              Scope Link State PDUs (LSPs)", RFC 7356,
              DOI 10.17487/RFC7356, September 2014.

   [RFC7602]  Chunduri, U., Lu, W., Tian, A., and N. Shen, "IS-IS
              Extended Sequence Number TLV", RFC 7602,
              DOI 10.17487/RFC7602, July 2015.

   [RFC8202]  Ginsberg, L., Previdi, S., and W. Henderickx, "IS-IS
              Multi-Instance", RFC 8202, DOI 10.17487/RFC8202, June
              2017.

8.2. Informative References

   [OPENFABRIC]
              White, R. and S. Zandi, "Openfabric", draft-white-
              openfabric-02 (work in progress), April 2017.

   [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation
              Element (PCE)-Based Architecture", RFC 4655,
              DOI 10.17487/RFC4655, August 2006.

   [RFC5309]  Shen, N., Ed. and A. Zinin, Ed., "Point-to-Point Operation
              over LAN in Link State Routing Protocols", RFC 5309,
              DOI 10.17487/RFC5309, October 2008.

Authors' Addresses

   Naiming Shen
   Cisco Systems
   560 McCarthy Blvd.
   Milpitas, CA 95035
   US

   Email: naiming@cisco.com

   Les Ginsberg
   Cisco Systems
   821 Alder Drive
   Milpitas, CA 95035
   US

   Email: ginsberg@cisco.com

   Sanjay Thyamagundalu

   Email: tsanjay@gmail.com