Network Working Group                                            Z. Chen
Internet-Draft                                                     X. Xu
Intended status: Standards Track                     Huawei Technologies
Expires: September 14, 2017                               March 13, 2017


      Avoiding Traffic Black-Holes for Route Aggregation in IS-IS
                  draft-chen-isis-black-hole-avoid-00

Abstract

   When the Intermediate System to Intermediate System (IS-IS) routing
   protocol is adopted by a highly symmetric network such as a Leaf-
   Spine or Fat-Tree network, it is recommended that the Leaf nodes
   (e.g., Top of Rack switches in datacenters) be prevented from
   receiving other nodes' explicit routes in order to achieve
   scalability. However, such a setup would cause traffic black-holes
   or suboptimal routing if a link failure occurs in the network. This
   document extends IS-IS to solve this problem.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Description
   3.  IS-IS Extensions
     3.1.  TLV Encoding
     3.2.  Unreachable Prefixes Advertisement
   4.  Alternative Solution
   5.  IPv6 Support
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
   Authors' Addresses

1.  Introduction

   When the Intermediate System to Intermediate System (IS-IS) routing
   protocol runs in a highly symmetric network such as a Leaf-Spine or
   Fat-Tree network, it is recommended that the Leaf nodes (e.g., Top
   of Rack switches in datacenters) be prevented from receiving other
   nodes' explicit routes in order to achieve scalability, as proposed
   in [IS-IS-SL-Extension], [IS-IS-Overhead-Reduction], [RIFT], and
   [OpenFabric]. In particular, each Leaf node SHOULD simply maintain a
   default (or aggregated) route (e.g., 0.0.0.0/0) in its routing table,
   whose next hop SHOULD be an Equal Cost Multi Path (ECMP) group
   including all Spine nodes that the Leaf node connects to. However,
   such a setup would cause traffic black-holes or suboptimal routing if
   a link failure occurs in the network, since the Leaf nodes are not
   aware of any topology information.

   To solve this problem, this document extends IS-IS to advertise
   unreachable prefixes, which are defined as the prefixes that a
   default (or aggregated) route's next hop can no longer reach. When a
   link failure occurs between a Spine node and a Leaf node, the Spine
   node SHOULD advertise all prefixes attached to the Leaf node (i.e.,
   the unreachable prefixes) to every other Leaf node it connects to.
   On receiving the unreachable prefixes, each Leaf node SHOULD add the
   unreachable prefixes to its routing table, thus avoiding traffic
   black-holes and suboptimal routing.

2.  Problem Description

   This section illustrates why a link failure would cause traffic
   black-holes or suboptimal routing when Leaf nodes maintain only
   default (or aggregated) routes.

          +--------+          +--------+          +--------+
          | Spine1 |          | Spine2 |          | Spine3 |
          +-+-+-+-++          +-+-+-+-++          +-+-+-+-++
     +------+ | | |             | | | |             | | | |
     | +------|-|-|-------------+ | | |             | | | X
     | | +----|-|-|---------------|-|-|-------------+ | | X
     | | |    | | |        +------+ | |               | | X
     | | |    | | |        | +------|-|---------------+ | |
     | | |    | | |        | |      | |                 | |
     | | |    | | |        | |      | |                 | |
     | | |    | | |        | |      | |         +-------+ +-----+
     | | |    | | |        | |      | +---------|-------------+ |
     | | |    | | |        | |      +---------+ |             | |
     | | |    | | +--------|-|----------------|-|-----------+ | |
     | | |    | +----------|-|--------------+ | |           | | |
     | | |    +----------+ | |              | | |           | | |
   +-+-+-+-+           +-+-+-+-+          +-+-+-+-+       +-+-+-+-+
   | Leaf1 |           | Leaf2 |          | Leaf3 |       | Leaf4 |
   +-------+           +-------+          +-------+       +-------+
                                                            |   |
                                                           --- ---
                                                       prefixA prefixB

                       Figure 1: Topology Example

   Figure 1 shows a Spine-Leaf topology example in which Leaf1 to Leaf4
   are connected to Spine1 to Spine3, and prefixA and prefixB are
   attached to Leaf4. To achieve scalability, as proposed in
   [IS-IS-SL-Extension], [IS-IS-Overhead-Reduction], [RIFT], and
   [OpenFabric], Leaf1 to Leaf4 SHOULD NOT receive explicit routes from
   each other or from the Spine nodes. Instead, each of them maintains
   a default (or aggregated) route (e.g., 0.0.0.0/0) in its routing
   table, whose next hop is an ECMP group including Spine1, Spine2, and
   Spine3. Flows from one Leaf node to another are shared among Spine1,
   Spine2, and Spine3 based on the well-known 5-tuple hashing.

   However, such a setup would cause traffic black-holes or suboptimal
   routing when a link failure occurs in the network. For example, if
   the link between Spine3 and Leaf4 is broken, Leaf1, Leaf2, and Leaf3
   would not become aware of the failure.
   As a result, these Leaf nodes will still send a portion of the
   traffic destined for prefixA or prefixB toward Spine3, where it is
   discarded, causing a traffic black-hole. On the other hand, if there
   is a higher tier of switches interconnecting Spine1, Spine2, and
   Spine3, Spine3 will steer the traffic up to the higher-tier
   switches, causing suboptimal routing.

   Therefore, this document extends IS-IS to advertise unreachable
   prefixes, thereby solving this problem.

3.  IS-IS Extensions

3.1.  TLV Encoding

   This document introduces one IS-IS TLV to advertise unreachable
   prefixes, called the IP Unreachability TLV, which SHOULD be carried
   in the IS-IS Link State Packet (LSP). The format of the IP
   Unreachability TLV is shown as follows:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Type (1 octet)            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Length (1 octet)           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Reserved (1 octet)          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       Prefix Length (1 octet)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Prefix (1 or 2 or 3 or 4 octets)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Sub-TLV Length (1 octet)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Optional Sub-TLVs (variable)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               ......                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       Prefix Length (1 octet)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Prefix (1 or 2 or 3 or 4 octets)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Sub-TLV Length (1 octet)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Optional Sub-TLVs (variable)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of this TLV are defined as follows:

   Type: TBD.

   Length: Length of the Value field of the TLV.

   Reserved: Bits reserved for future usage.

   Prefix Length: The value can be 0 to 32, indicating the number of
   effective bits in the Prefix field.

   Prefix: The unreachable prefix, encoded in the minimal number of
   octets for the given number of effective bits (i.e., the Prefix
   Length field). The remaining bits of the prefix SHOULD be set to
   zero and ignored upon receipt.

   Sub-TLV Length: Length of Sub-TLVs.

   Sub-TLVs: Optional Sub-TLVs for future extension.

   Note that the last four fields can appear repeatedly.

3.2.  Unreachable Prefixes Advertisement

   When a link failure occurs between a Spine node and a Leaf node, the
   Spine node SHOULD 1) encode all prefixes attached to the Leaf node
   (i.e., the unreachable prefixes) into the IP Unreachability TLV, 2)
   append the IP Unreachability TLV to the IS-IS LSP, and 3) send the
   LSP to every other Leaf node it connects to.

   When a Leaf node receives unreachable prefixes (contained in an LSP)
   advertised by a Spine node, it SHOULD install each of the
   unreachable prefixes into its routing table, with the next hop set
   to an ECMP group including all Spine nodes it connects to except the
   one that advertised the unreachable prefix.

   For example, if the link between Spine3 and Leaf4 in Figure 1 is
   broken, Spine3 SHOULD advertise prefixA and prefixB to Leaf1, Leaf2,
   and Leaf3 by sending them an IS-IS LSP containing the IP
   Unreachability TLV. On receiving the LSP, Leaf1, Leaf2, and Leaf3
   SHOULD install prefixA and prefixB into their routing tables, and
   the next hop of prefixA or prefixB SHOULD be set to an ECMP group
   including Spine1 and Spine2. For instance, the routing table of
   Leaf1 before and after the link failure is shown in Figure 2 and
   Figure 3, respectively.

   Note that the mechanism described above can achieve minimal
   signaling latency, which helps Leaf nodes quickly avoid black-holes
   or suboptimal routing when a link failure occurs.
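
   The encoding of Section 3.1 and the next-hop rule above can be
   sketched as follows. This is a minimal, non-normative Python
   illustration under stated assumptions: the Type value 250 is a
   placeholder (the Type is still TBD), the function names are
   hypothetical, the Reserved octet is sent as zero, no Sub-TLVs are
   emitted, and "the minimal number of octets" is read as
   ceil(Prefix Length / 8), which yields zero Prefix octets for a /0.

```python
import ipaddress

# Placeholder only: the draft leaves the TLV Type as TBD.
HYPOTHETICAL_TLV_TYPE = 250


def encode_ip_unreachability_tlv(prefixes, tlv_type=HYPOTHETICAL_TLV_TYPE):
    """Build Type | Length | Reserved | (Prefix Length, Prefix, Sub-TLV Length)*."""
    value = bytearray([0])  # Reserved octet, sent as zero (assumption)
    for prefix in prefixes:
        net = ipaddress.IPv4Network(prefix)
        n_octets = (net.prefixlen + 7) // 8  # minimal octets for the prefix
        value.append(net.prefixlen)
        value += net.network_address.packed[:n_octets]
        value.append(0)  # Sub-TLV Length = 0: no optional Sub-TLVs
    if len(value) > 255:
        raise ValueError("value does not fit in a one-octet Length field")
    return bytes([tlv_type, len(value)]) + bytes(value)


def decode_ip_unreachability_tlv(tlv):
    """Return the unreachable prefixes carried in one encoded TLV."""
    value = bytes(tlv[2:2 + tlv[1]])
    pos = 1  # skip the Reserved octet
    prefixes = []
    while pos < len(value):
        prefix_len = value[pos]
        n_octets = (prefix_len + 7) // 8
        # Pad the packed prefix back to a full 4-octet IPv4 address.
        raw = value[pos + 1:pos + 1 + n_octets] + bytes(4 - n_octets)
        pos += 1 + n_octets
        sub_tlv_len = value[pos]
        pos += 1 + sub_tlv_len  # skip any Sub-TLVs
        # strict=False masks bits beyond the prefix length (ignored on receipt).
        net = ipaddress.IPv4Network((raw, prefix_len), strict=False)
        prefixes.append(str(net))
    return prefixes


def ecmp_next_hops(connected_spines, advertising_spine):
    """Next hops a Leaf installs for a prefix advertised as unreachable."""
    return [s for s in connected_spines if s != advertising_spine]


tlv = encode_ip_unreachability_tlv(["10.1.0.0/16", "192.168.4.0/24"])
print(tlv.hex())
print(decode_ip_unreachability_tlv(tlv))
print(ecmp_next_hops(["Spine1", "Spine2", "Spine3"], "Spine3"))
```

   The last call mirrors the Figure 1 example: a Leaf connected to
   Spine1, Spine2, and Spine3 that hears the advertisement from Spine3
   keeps only Spine1 and Spine2 as next hops for the advertised
   prefixes. The example prefixes are arbitrary and stand in for
   prefixA and prefixB.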

   +-----------+-----+---+----+-----+-------+--------------+
   |Destination|Proto|Pre|Cost|Flags|NextHop|Interface     |
   +-----------+-----+---+----+-----+-------+--------------+
   |0.0.0.0/0  |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   |           |ISIS |15 |20  |D    |Spine3 |Ethernet0/0/2 |
   +-----------+-----+---+----+-----+-------+--------------+

          Figure 2: Routing Table of Leaf1 before link failure

   +-----------+-----+---+----+-----+-------+--------------+
   |Destination|Proto|Pre|Cost|Flags|NextHop|Interface     |
   +-----------+-----+---+----+-----+-------+--------------+
   |0.0.0.0/0  |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   |           |ISIS |15 |20  |D    |Spine3 |Ethernet0/0/2 |
   +-----------+-----+---+----+-----+-------+--------------+
   |prefixA    |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   +-----------+-----+---+----+-----+-------+--------------+
   |prefixB    |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   +-----------+-----+---+----+-----+-------+--------------+

          Figure 3: Routing Table of Leaf1 after link failure

4.  Alternative Solution

   The unreachable prefixes can alternatively be encoded as a new Sub-
   TLV of the Extended IP Reachability TLV defined in [RFC5305]. The
   format of the Sub-TLV is shown as follows:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Type (1 octet)            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Length (1 octet)           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Reserved (1 octet)          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       Prefix Length (1 octet)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Prefix (1 or 2 or 3 or 4 octets)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               ......                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       Prefix Length (1 octet)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Prefix (1 or 2 or 3 or 4 octets)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of this Sub-TLV are defined as follows:

   Type: TBD.

   Length: Length of the Value field of the Sub-TLV.

   Reserved: Bits reserved for future usage.

   Prefix Length: The value can be 0 to 32, indicating the number of
   effective bits in the Prefix field.

   Prefix: The unreachable prefix, encoded in the minimal number of
   octets for the given number of effective bits (i.e., the Prefix
   Length field). The remaining bits of the prefix SHOULD be set to
   zero and ignored upon receipt.

   Note that the last two fields can appear repeatedly.

   When a link failure occurs between a Spine node and a Leaf node, the
   Spine node SHOULD 1) encode all prefixes attached to the Leaf node
   (i.e., the unreachable prefixes) into the Sub-TLV described above, 2)
   encode the Sub-TLV into the Extended IP Reachability TLV, 3) append
   the Extended IP Reachability TLV to the IS-IS LSP, and 4) send the
   LSP to every other Leaf node it connects to. The Prefix field of the
   Extended IP Reachability TLV SHOULD be set to the default (or
   aggregated) route that each of the Leaf nodes already maintains.

   When a Leaf node receives unreachable prefixes (contained in an LSP)
   advertised by a Spine node, it SHOULD install each of the
   unreachable prefixes into its routing table, with the next hop set
   to an ECMP group including all Spine nodes it connects to except the
   one that advertised the unreachable prefix.

5.  IPv6 Support

   Will be completed in the next version of the document.

6.  IANA Considerations

   TBD.

7.  Security Considerations

   TBD.

8.  Acknowledgements

   TBD.

9.  References

   [IS-IS-Overhead-Reduction]
              Chen, Z. and X.
              Xu, "Overheads Reduction for IS-IS Enabled Spine-Leaf
              Networks", draft-chen-isis-sl-overheads-reduction-00
              (work in progress), January 2017.

   [IS-IS-SL-Extension]
              Shen, N. and S. Thyamagundalu, "IS-IS Routing for Spine-
              Leaf Topology", draft-shen-isis-spine-leaf-ext-02 (work in
              progress), October 2016.

   [OpenFabric]
              White, R. and S. Zandi, "OpenFabric", draft-white-
              openfabric-00 (work in progress), March 2017.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for Routing in TCP/IP and
              Dual Environments", RFC 1195, December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RIFT]     Przygienda, T., Drake, J., and A. Atlas, "RIFT: Routing in
              Fat Trees", draft-przygienda-rift-01 (work in progress),
              January 2017.

Authors' Addresses

   Zhe Chen
   Huawei Technologies
   No. 156 Beiqing Rd
   Beijing  100095
   China

   Email: chenzhe17@huawei.com


   Xiaohu Xu
   Huawei Technologies
   No. 156 Beiqing Rd
   Beijing  100095
   China

   Email: xuxiaohu@huawei.com