Networking Working Group                                         N. Shen
Internet-Draft                                               L. Ginsberg
Intended status: Standards Track                           Cisco Systems
Expires: September 3, 2017                                            S.
Thyamagundalu
                                                           March 2, 2017


                 IS-IS Routing for Spine-Leaf Topology
                   draft-shen-isis-spine-leaf-ext-03

Abstract

   This document describes a mechanism for routers and switches in a
   Spine-Leaf type topology to have non-reciprocal Intermediate System
   to Intermediate System (IS-IS) routing relationships between the
   leafs and spines.  The leaf nodes do not need to have the topology
   information of other nodes and exact prefixes in the network.  This
   extension also has application in the Internet of Things (IoT).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 3, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Motivations
   3.  Spine-Leaf (SL) Extension
     3.1.  Topology Examples
     3.2.  Applicability Statement
     3.3.  Extension Encoding
       3.3.1.  Spine-Leaf Sub-TLVs
         3.3.1.1.  Leaf-Set Sub-TLV
         3.3.1.2.  Info-Req Sub-TLV
       3.3.2.  Advertising IPv4/IPv6 Reachability
     3.4.  Mechanism
       3.4.1.  Pure CLOS Topology
     3.5.  Implementation and Operation
       3.5.1.  CSNP PDU
       3.5.2.  Leaf to Leaf connection
       3.5.3.  Overload Bit
       3.5.4.  Spine Node Hostname
       3.5.5.  IS-IS Reverse Metric
       3.5.6.  Spine-Leaf Traffic Engineering
       3.5.7.  Other End-to-End Services
       3.5.8.  Address Family and Topology
       3.5.9.  Migration
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgments
   7.  Document Change Log
     7.1.  Changes to draft-shen-isis-spine-leaf-ext-03.txt
     7.2.  Changes to draft-shen-isis-spine-leaf-ext-02.txt
     7.3.  Changes to draft-shen-isis-spine-leaf-ext-01.txt
     7.4.
           Changes to draft-shen-isis-spine-leaf-ext-00.txt
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   The IS-IS routing protocol defined by [ISO10589] has been widely
   deployed in provider networks, data centers and enterprise campus
   environments.  In the data center and enterprise switching networks,
   a Spine-Leaf topology is commonly used.  This document describes a
   mechanism where IS-IS routing can be optimized for a Spine-Leaf
   topology.

   In a Spine-Leaf topology, normally a leaf node connects to a number
   of spine nodes.  Data traffic going from one leaf node to another
   leaf node needs to pass through one of the spine nodes.  Also, the
   decision to choose one of the spine nodes is usually part of equal
   cost multi-path (ECMP) load sharing.  The spine nodes can be
   considered as gateway devices to reach destinations on other leaf
   nodes.  In this type of topology, the spine nodes have to know the
   topology and routing information of the entire network, but the leaf
   nodes only need to know how to reach the gateway devices, which are
   the spine nodes to which they are uplinked.

   This document describes the IS-IS Spine-Leaf extension that allows
   the spine nodes to have all the topology and routing information,
   while keeping the leaf nodes free of topology information other than
   the default gateway routing information.  The leaf nodes do not even
   need to run a Shortest Path First (SPF) calculation since they have
   no topology information.

1.1.
Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Motivations

   o  The leaf nodes in a Spine-Leaf topology do not require complete
      topology and routing information of the entire domain since their
      forwarding decision is to use ECMP with spine nodes as default
      gateways.

   o  The spine nodes in a Spine-Leaf topology are richly connected to
      leaf nodes, which introduces significant flooding duplication if
      they flood all Link State PDUs (LSPs) to all the leaf nodes.
      Blocking flooding to leaf nodes saves CPU and link bandwidth
      resources on both spine and leaf nodes.  For small Top of the Rack
      (ToR) leaf switches in data centers, it is worthwhile to keep full
      topology routing information and massive database flooding away
      from those devices.

   o  When a spine node advertises a topology change, every leaf node
      connected to it will flood the update to all the other spine
      nodes, and those spine nodes will further flood it to all the
      leaf nodes, causing an O(n^2) flooding storm which is largely
      redundant.

   o  Similar to some of the overlay technologies which are popular in
      data centers, the edge devices (leaf nodes) may not need to
      contain all the routing and forwarding information on the device's
      control and forwarding planes.  "Conversational Learning" can be
      utilized to get the specific routing and forwarding information in
      the case of a pure CLOS topology and in the event of link or node
      failures.

   o  Small devices and appliances of Internet of Things (IoT) can be
      considered as leafs in the routing topology sense.
      They have CPU and memory constraints in design, and those IoT
      devices do not have to know the exact network topology and
      prefixes as long as there are ways to reach the cloud servers or
      other devices.

3.  Spine-Leaf (SL) Extension

3.1.  Topology Examples

   [Diagram: spine nodes Spine1, Spine2, ... SpineN are linked to one
   another, and the spine nodes are fully meshed with the leaf nodes
   Leaf1, Leaf2, ... LeafX, LeafY.  Leaf1 and Leaf2 are also linked
   directly to each other.]

                    Figure 1: A Spine-Leaf Topology

   [Diagram: spine nodes Spine1 and Spine2 have no link between them;
   each has a link to each of the leaf nodes Leaf1, Leaf2, Leaf3 and
   Leaf4.]

                     Figure 2: A Fat Tree Topology

3.2.  Applicability Statement

   This extension assumes the network is a Spine-Leaf topology, and it
   should not be applied in an arbitrary network setup.  The spine nodes
   can be viewed as the aggregation layer of the network, and the leaf
   nodes as the access layer of the network.
   The leaf nodes use a load sharing algorithm with spine nodes as
   nexthops in routing and forwarding.

   This extension works when the spine nodes are interconnected, and it
   also works with a pure CLOS or Fat Tree topology based network where
   the spines are NOT interconnected.

   Although the example diagram in Figure 1 shows a fully meshed
   Spine-Leaf topology, this extension also works in the case where the
   nodes are partially meshed.  For instance, leaf1 through leaf10 may
   be fully meshed with spine1 through spine5 while leaf11 through
   leaf20 are fully meshed with spine4 through spine8, and all the
   spines are interconnected in a redundant fashion.

   This extension also works with a topology with more than the typical
   two layers of spine and leaf.  For instance, in example diagrams
   Figure 1 and Figure 2, there can be another Core layer of
   routers/switches on top of the aggregation layer.  From an IS-IS
   routing point of view, the Core nodes are not affected by this
   extension and will have the complete topology and routing
   information just like the spine nodes.  To make the network even more
   scalable, the Core layer can operate as a level-2 IS-IS sub-domain
   while the Spine and Leaf layers operate as a level-1 IS-IS domain.

   This extension also supports the leaf nodes having local connections
   to other leaf nodes.  In the example diagram Figure 1 there is a
   connection between the 'Leaf1' node and the 'Leaf2' node, and an
   external host can be dual homed into both of the leaf nodes.

   This extension assumes the links between the spine and leaf nodes are
   point-to-point, or point-to-point over LAN [RFC5309].  The links
   among the spine nodes and the links between the leaf nodes can be of
   any type.

3.3.  Extension Encoding

   This extension introduces one TLV which may be used in IS-IS Hello
   (IIH) PDUs or in Circuit Scoped Link State PDUs (CS-LSP) [RFC7356].
   It is used by both spine and leaf nodes in this Spine-Leaf mechanism.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |    Length     |            SL Flag            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | .. Optional Sub-TLVs
      +-+-+-+-+-+-+-+-+-....

   The fields of this TLV are defined as follows:

      Type: 1 octet.  Suggested value 150 (to be assigned by IANA).

      Length: 1 octet (2 + length of sub-TLVs).

      SL Flag: 16 bits.

             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |         Reserved        |B|R|L|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         L bit (0x01): Only a leaf node sets this bit.  If the L bit is
            set in the SL flag, the node indicates it is in
            'Leaf-Mode'.

         R bit (0x02): Only a spine node sets this bit.  If the R bit is
            set, the node indicates to the leaf neighbor that it can be
            used as the default route gateway.

         B bit (0x04): Only a leaf node sets this bit, on a Leaf-Leaf
            link, in addition to the 'L' bit setting.  If the B bit is
            set, the node indicates to its leaf neighbor that it can be
            used as the backup default route gateway.

      Optional Sub-TLVs: See Section 3.3.1.

         Sub-TLVs MAY be included when the TLV is in a CS-LSP.
         Sub-TLVs MUST NOT be included when the TLV is in an IIH.

3.3.1.  Spine-Leaf Sub-TLVs

   If the data center topology is a pure CLOS or Fat Tree, there are no
   link connections among the spine nodes.  If we also assume there is
   not another Core layer on top of the aggregation layer, then the
   traffic from one leaf node to another may have a problem if there is
   a link outage between a spine node and a leaf node.  For instance, in
   the diagram of Figure 2, if Leaf1 sends data traffic to Leaf3 through
   the Spine1 node, and the Spine1-Leaf3 link is down, the data traffic
   will be dropped on the Spine1 node.
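   As an informal illustration of the Spine-Leaf TLV layout given in
   Section 3.3, the following Python sketch packs and unpacks the Type,
   Length and SL Flag fields.  It is not part of the specification: the
   function names are hypothetical, the type value 150 is only the
   suggested codepoint, and any sub-TLVs are carried as opaque bytes.

```python
import struct
from typing import Tuple

SL_TLV_TYPE = 150  # suggested value only; the codepoint is not yet assigned
FLAG_L = 0x01      # leaf node: 'Leaf-Mode'
FLAG_R = 0x02      # spine node: usable as the default route gateway
FLAG_B = 0x04      # leaf node on a Leaf-Leaf link: backup default gateway


def encode_sl_tlv(flags: int, sub_tlvs: bytes = b"") -> bytes:
    """Hypothetical helper: Type (1), Length (1), SL Flag (2), sub-TLVs."""
    length = 2 + len(sub_tlvs)  # Length = 2 + length of sub-TLVs
    if length > 255:
        raise ValueError("value does not fit in a 1-octet Length field")
    # "!" selects network byte order, so the L bit is the last bit sent.
    return struct.pack("!BBH", SL_TLV_TYPE, length, flags) + sub_tlvs


def decode_sl_tlv(data: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: return (flags, raw sub-TLV bytes)."""
    if len(data) < 4:
        raise ValueError("truncated Spine-Leaf TLV")
    tlv_type, length, flags = struct.unpack("!BBH", data[:4])
    if tlv_type != SL_TLV_TYPE:
        raise ValueError("not a Spine-Leaf TLV")
    if length < 2 or len(data) < 2 + length:
        raise ValueError("bad Length field")
    return flags, data[4:2 + length]


# A leaf node's IIH carries the TLV with only the 'L' bit set.
leaf_iih_tlv = encode_sl_tlv(FLAG_L)
# A leaf acting as backup gateway on a Leaf-Leaf link sets 'L' and 'B'.
backup_tlv = encode_sl_tlv(FLAG_L | FLAG_B)
```

   In this sketch a receiver would test individual bits of the decoded
   flags, for example `flags & FLAG_R`, to decide whether the neighbor
   offers itself as a default route gateway.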

   To address this issue, spine and leaf nodes may send/request specific
   reachability information via the sub-TLVs defined below.

   Two Spine-Leaf sub-TLVs are defined: the Leaf-Set sub-TLV and the
   Info-Req sub-TLV.

3.3.1.1.  Leaf-Set Sub-TLV

   This sub-TLV is used by spine nodes to optionally advertise Leaf
   neighbors to other Leaf nodes.  The fields of this sub-TLV are
   defined as follows:

      Type: 1 octet.  Suggested value 1 (to be assigned by IANA).

      Length: 1 octet.  It MUST be a multiple of 6 octets.

      Leaf-Set: A list of the IS-IS System-IDs of the leaf node
         neighbors of this spine node.

3.3.1.2.  Info-Req Sub-TLV

   This sub-TLV is used by leaf nodes to request more specific prefix
   information from a selected spine node, upon detecting that one of
   the spine nodes has lost the connection to a leaf node.  The fields
   of this sub-TLV are defined as follows:

      Type: 1 octet.  Suggested value 2 (to be assigned by IANA).

      Length: 1 octet.  It MUST be a multiple of 6 octets.

      Info-Req: A list of the IS-IS System-IDs of the leaf nodes for
         which connectivity information is being requested.

3.3.2.  Advertising IPv4/IPv6 Reachability

   In cases where connectivity between a leaf node and a spine node is
   down, the leaf node MAY request reachability information from a spine
   node as described in Section 3.3.1.2.  The spine node utilizes TLVs
   135 [RFC5305] and TLVs 236 [RFC5308] to advertise this information.
   These TLVs MAY be included either in IIHs or CS-LSPs sent from the
   spine to the requesting leaf node.  Sending such information in IIHs
   has limited scale, since all reachability information MUST fit
   within a single IIH.  It is therefore recommended that CS-LSPs be
   used.

3.4.  Mechanism

   Each leaf node is provisioned by network operators as an IS-IS
   'Leaf-Node'.  A spine node does not need explicit configuration.  A
   leaf node inserts the Spine-Leaf TLV in IIHs it originates.
   The TLV has the 'L' bit set in the flags field.

   When a spine node receives an IIH with the SL TLV and 'L' bit set, it
   labels the point-to-point interface and adjacency to be a
   'Leaf-Peer'.  IIHs sent by the spine node on a link to a Leaf-Peer
   include the Spine-Leaf TLV with the 'R' bit set in the flags field.
   The 'R' bit indicates to the 'Leaf-Peer' neighbor that the spine node
   can be used as a default routing nexthop.

   There is no change to the IS-IS adjacency bring-up mechanism for
   Spine-Leaf peers.

   For the spine node with 'Leaf-Peer' adjacencies, the IS-IS LSP
   flooding is blocked to the 'Leaf-Peer' interface, except for the LSP
   PDUs in which the IS-IS System-ID matches the System-ID of the
   'Leaf-Peer' adjacency.  This exception is needed since when the leaf
   node reboots, the spine node needs to forward to the leaf node its
   previous generation of LSP.  No other LSP PDU needs to be flooded
   over this 'Leaf-Peer' interface.

   The leaf node will perform IS-IS LSP flooding as normal over all of
   its IS-IS adjacencies.  This means the leaf node will flood only its
   own LSPs to the spine nodes, since those are all the LSPs in its LSP
   database.

   The spine node will receive all the LSP PDUs in the network,
   including those of all the spine nodes and leaf nodes.  It will
   perform Shortest Path First (SPF) as a normal IS-IS node does.  There
   is no change to the route calculation and forwarding on the spine
   nodes.

   The leaf node, however, does not have any LSP in the network except
   for its own, and there is no need to perform the SPF algorithm on the
   system.  It only needs to download the default route with the
   nexthops of those 'Spine-Peers' which have the 'R' bit set in the
   Spine-Leaf TLV in IIH PDUs.  IS-IS can perform equal cost or unequal
   cost load sharing while using the spine nodes as nexthops.
   The aggregated metric of the outbound interface and the 'Reverse
   Metric' [REVERSE-METRIC] can be used for this purpose.

   In summary, this extension requires a leaf node to insert the
   Spine-Leaf TLV in its IIHs with the 'L' bit set in the SL flag, and
   to download the IS-IS default route using as nexthops the spine nodes
   ('Spine-Peers') that set the 'R' bit in their IIH PDUs.  It requires
   a spine node to respond to a 'Leaf-Peer' by inserting the Spine-Leaf
   TLV in its IIHs with the 'R' bit set in the SL flag, and to block LSP
   flooding over that interface, with the exception that it will set the
   SRMflag on the LSPs that belong to the 'Leaf-Peer'.

3.4.1.  Pure CLOS Topology

   In a data center where the topology is a pure CLOS or Fat Tree, with
   no interconnection among the spine nodes and no Core layer above the
   aggregation layer, there is a possibility of black holing the data
   traffic in the network when the link between a spine and a leaf goes
   down.

   As in the diagram Figure 2, if the link Spine1-Leaf3 goes down, there
   needs to be a way for Leaf1, Leaf2 and Leaf4 to avoid Spine1 if the
   destination of the data traffic is the Leaf3 node.

   In the above example, Spine1 and Spine2 are provisioned to advertise
   the Leaf-Set sub-TLV of the Spine-Leaf TLV.  Originally both spines
   will advertise Leaf1 through Leaf4 as their Leaf-Set.  When the
   Spine1-Leaf3 link is down, Spine1 will only have Leaf1, Leaf2 and
   Leaf4 in its Leaf-Set.  This allows the other leaf nodes to know that
   Spine1 has lost the Leaf3 leaf node.

   Each leaf node can select another spine node and request from it
   prefix information associated with the lost leaf node.  In the
   diagram of Figure 2, there are only two spine nodes (a Spine-Leaf
   topology can have more than two spine nodes in general).  Each leaf
   node can independently select a spine node for the leaf information.
   The leaf nodes will include the Info-Req sub-TLV in the Spine-Leaf
   TLV towards that spine node, Spine2 in this case.

   The spine node, upon receiving the request from one or more leaf
   nodes, will find the IPv4/IPv6 prefixes associated with the requested
   leaf node, and will include the IPv6/IPv4 Info-Advertise sub-TLV when
   sending messages towards the leaf nodes.  For instance, it will
   include the IPv4 loopback prefix of leaf3 based on the policy
   configured or the administrative tag attached to the prefixes.  When
   the leaf nodes receive the more specific prefixes, they will install
   the advertised prefixes towards the other spine nodes (Spine2 in this
   example).

   For instance, in the data center overlay scenario, when any IP
   destination or MAC destination uses leaf3's loopback as the tunnel
   nexthop, the overlay tunnel from the leaf nodes will only select
   Spine2 as the gateway to reach leaf3 as long as the Spine1-Leaf3 link
   is still down.

3.5.  Implementation and Operation

3.5.1.  CSNP PDU

   In the Spine-Leaf extension, the Complete Sequence Number PDU (CSNP)
   does not need to be transmitted over the Spine-Leaf link.  Some IS-IS
   implementations send CSNPs after the initial adjacency bring-up over
   a point-to-point interface.  There is no need for this optimization
   here since the leaf does not need to receive any other LSPs from the
   network, and the only LSPs transmitted across the Spine-Leaf link are
   the leaf node's LSPs.

   Also in the graceful restart case [RFC5306], for the same reason,
   there is no need to send the CSNPs over the Spine-Leaf interface.
   The spine node only needs to set the SRMflag on the LSPs belonging to
   the 'Leaf-Peer', and the leaf node only needs to set the SRMflag on
   its own LSPs.

3.5.2.
Leaf to Leaf connection

   Leaf to leaf node links are useful in host redundancy cases in
   switching networks, and normally no special mechanism is needed for
   this case.  Each leaf node will set the 'L' bit in the SL flag of the
   Spine-Leaf TLV in its IIHs.  LSPs will be exchanged over this link.
   In the example diagram Figure 1, Leaf1 will get Leaf2's LSP and Leaf2
   will get Leaf1's LSP.  They will install more specific routes towards
   each other using this local Leaf-Leaf link.  SPF will be performed in
   this case just as if the entire network involved only those two IS-IS
   nodes.  This does not affect the normal Spine-Leaf mechanism they
   perform toward the spine nodes.

   Besides carrying the local leaf-to-leaf traffic, the leaf node can
   serve as a backup gateway for its leaf neighbor.  To do so, it needs
   to remove the 'Overload-Bit' setting in its LSP, and it sets both the
   'L' bit and the 'B' bit in the SL flag with a high 'Reverse Metric'
   value.

3.5.3.  Overload Bit

   The leaf node SHOULD set the 'overload' bit on its LSP PDU, since if
   the spine nodes were to forward traffic not meant for the local node,
   the leaf node does not have the topology information to prevent a
   routing/forwarding loop.

3.5.4.  Spine Node Hostname

   This extension creates a non-reciprocal relationship between the
   spine node and leaf node.  The spine node will receive the leaf's LSP
   and will know the leaf's hostname, but the leaf does not have the
   spine's LSP.  This extension allows the Dynamic Hostname TLV
   [RFC5301] to be optionally included in the spine's IIH PDU when
   sending to a 'Leaf-Peer'.  This is useful in troubleshooting cases.

3.5.5.  IS-IS Reverse Metric

   This metric is part of the aggregated metric for the leaf's default
   route installation with load sharing among the spine nodes.
   When a spine node is in 'overload' condition, it should use the IS-IS
   Reverse Metric TLV in IIH [REVERSE-METRIC] to set this metric to
   maximum to discourage the leaf from using it as part of the load
   sharing.

   In some cases, certain spine nodes may have less bandwidth in link
   provisioning or in real-time condition, and they can use this metric
   to signal this to the leaf nodes dynamically.

   In other cases, such as when the spine node loses a link to a
   particular leaf node, although it can redirect the traffic to other
   spine nodes to reach that destination leaf node, it MAY want to
   increase this metric value if the inter-spine connection becomes
   over-utilized, or the latency becomes an issue.

   In the use case of a leaf-leaf link serving as a backup gateway, the
   'Reverse Metric' SHOULD always be set to a very high value.

3.5.6.  Spine-Leaf Traffic Engineering

   Besides the use of the IS-IS Reverse Metric by the spine nodes to
   affect the traffic pattern for the leaf default gateway towards
   multiple spine nodes, the IPv6/IPv4 Info-Advertise sub-TLVs can be
   selectively used by traffic engineering controllers to move data
   traffic around the data center fabric to alleviate congestion and to
   reduce the latency of a certain class of traffic pairs.  Injecting
   more specific leaf node prefixes allows the spine nodes to attract
   more traffic on some underutilized links.

3.5.7.  Other End-to-End Services

   Losing the topology information will have an impact on some of the
   end-to-end network services, for instance, MPLS TE or end-to-end
   segment routing.  Some other mechanisms such as those described in
   the PCE [RFC4655] based solution may be used.
   In this Spine-Leaf extension, the role of the leaf node is not much
   different from that in multi-level IS-IS routing, where the level-1
   IS-IS nodes only have the default route information towards the node
   which has the Attach Bit (ATT) set, and the level-2 backbone does not
   have any topology information of the level-1 areas.  The exact
   mechanism to enable certain end-to-end network services in a
   Spine-Leaf network is outside the scope of this document.

3.5.8.  Address Family and Topology

   IPv6 Address families [RFC5308], Multi-Topology (MT) [RFC5120] and
   Multi-Instance (MI) [RFC6822] information is carried over the IIH
   PDU.  Since the goal is to simplify the operation of the IS-IS
   network, for the simplicity of this extension, the Spine-Leaf
   mechanism is applied the same way to all the address families, MTs
   and MIs.

3.5.9.  Migration

   For this extension to be deployed in existing networks, a simple
   migration scheme is needed.  To support any leaf node in the network,
   all the involved spine nodes have to be upgraded first.  So the first
   step is to migrate all the involved spine nodes to support this
   extension, then the leaf nodes can be enabled with 'Leaf-Mode' one by
   one.  No flag day is needed for the extension migration.

4.  IANA Considerations

   A new TLV codepoint is defined in this document and needs to be
   assigned by IANA from the "IS-IS TLV Codepoints" registry.  It is
   referred to as the Spine-Leaf TLV and the suggested value is 150.
   This TLV is only to be optionally inserted either in the IIH PDU or
   in the Circuit Flooding Scoped LSP PDU.  IANA is also requested to
   maintain the SL-flag bit values in this TLV, and 0x01, 0x02 and 0x04
   bits are defined in this document.

      Value  Name                   IIH  LSP  SNP  Purge  CS-LSP
      -----  ---------------------  ---  ---  ---  -----  ------
      150    Spine-Leaf              y    n    n     n      y

   This extension also proposes that the Dynamic Hostname TLV, already
   assigned as code 137, be allowed in the IIH PDU.

      Value  Name                   IIH  LSP  SNP  Purge
      -----  ---------------------  ---  ---  ---  -----
      137    Dynamic Name            y    y    n     y

   Two new sub-TLVs are defined in this document and need to be assigned
   by IANA from the "IS-IS TLV Codepoints" registry.  They are referred
   to in this document as the Leaf-Set sub-TLV and the Info-Req sub-TLV.
   The suggested values are 1 and 2 respectively.

5.  Security Considerations

   Security concerns for IS-IS are addressed in [ISO10589], [RFC5304],
   [RFC5310], and [RFC7602].  This extension does not raise additional
   security issues.

6.  Acknowledgments

   TBD.

7.  Document Change Log

7.1.  Changes to draft-shen-isis-spine-leaf-ext-03.txt

   o  Submitted March 2017.

   o  Added the Spine-Leaf sub-TLVs to handle the case of data center
      pure CLOS topology and mechanism.

   o  Added that the Spine-Leaf TLV and sub-TLVs can be optionally
      inserted in either an IIH PDU or a CS-LSP PDU.

   o  Allow use of prefix Reachability TLVs 135 and 236 in IIHs/CS-LSPs
      sent from spine to leaf.

7.2.  Changes to draft-shen-isis-spine-leaf-ext-02.txt

   o  Submitted October 2016.

   o  Removed the 'Default Route Metric' field in the Spine-Leaf TLV and
      changed to using the IS-IS Reverse Metric in IIH.

7.3.  Changes to draft-shen-isis-spine-leaf-ext-01.txt

   o  Submitted April 2016.

   o  No change.  Refreshed the draft version.

7.4.  Changes to draft-shen-isis-spine-leaf-ext-00.txt

   o  Initial version of the draft, published in November 2015.

8.  References

8.1.
Normative References

   [ISO10589]
              ISO "International Organization for Standardization",
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473), ISO/IEC
              10589:2002, Second Edition.", Nov 2002.

   [REVERSE-METRIC]
              Shen, N., Amante, S., and M. Abrahamsson, "IS-IS Routing
              with Reverse Metric", draft-ietf-isis-reverse-metric-04
              (work in progress), 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120,
              DOI 10.17487/RFC5120, February 2008.

   [RFC5301]  McPherson, D. and N. Shen, "Dynamic Hostname Exchange
              Mechanism for IS-IS", RFC 5301, DOI 10.17487/RFC5301,
              October 2008.

   [RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic
              Authentication", RFC 5304, DOI 10.17487/RFC5304, October
              2008.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, DOI 10.17487/RFC5305, October
              2008.

   [RFC5306]  Shand, M. and L. Ginsberg, "Restart Signaling for IS-IS",
              RFC 5306, DOI 10.17487/RFC5306, October 2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308,
              DOI 10.17487/RFC5308, October 2008.

   [RFC5310]  Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
              and M. Fanto, "IS-IS Generic Cryptographic
              Authentication", RFC 5310, DOI 10.17487/RFC5310, February
              2009.

   [RFC6822]  Previdi, S., Ed., Ginsberg, L., Shand, M., Roy, A., and D.
              Ward, "IS-IS Multi-Instance", RFC 6822,
              DOI 10.17487/RFC6822, December 2012.

   [RFC7356]  Ginsberg, L., Previdi, S., and Y.
              Yang, "IS-IS Flooding Scope Link State PDUs (LSPs)",
              RFC 7356, DOI 10.17487/RFC7356, September 2014.

   [RFC7602]  Chunduri, U., Lu, W., Tian, A., and N. Shen, "IS-IS
              Extended Sequence Number TLV", RFC 7602,
              DOI 10.17487/RFC7602, July 2015.

8.2.  Informative References

   [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation
              Element (PCE)-Based Architecture", RFC 4655,
              DOI 10.17487/RFC4655, August 2006.

   [RFC5309]  Shen, N., Ed. and A. Zinin, Ed., "Point-to-Point Operation
              over LAN in Link State Routing Protocols", RFC 5309,
              DOI 10.17487/RFC5309, October 2008.

Authors' Addresses

   Naiming Shen
   Cisco Systems
   560 McCarthy Blvd.
   Milpitas, CA  95035
   US

   Email: naiming@cisco.com


   Les Ginsberg
   Cisco Systems
   821 Alder Drive
   Milpitas, CA  95035
   US

   Email: ginsberg@cisco.com


   Sanjay Thyamagundalu

   Email: tsanjay@gmail.com