Network Working Group                                            Z. Chen
Internet-Draft                                                    Huawei
Intended status: Standards Track                                   X. Xu
Expires: March 9, 2019                                           Alibaba
                                                                D. Cheng
                                                                  Huawei
                                                       September 5, 2018


      Avoiding Traffic Black-Holes for Route Aggregation in IS-IS
                  draft-chen-isis-black-hole-avoid-03

Abstract

   When the Intermediate System to Intermediate System (IS-IS) routing
   protocol is adopted by a highly symmetric network such as a
   Leaf-Spine or Fat-Tree network, it is recommended that the Leaf
   nodes (e.g., Top of Rack switches in datacenters) be prevented from
   receiving other nodes' explicit routes in order to achieve
   scalability.  However, such a setup would cause traffic black-holes
   or suboptimal routing if a link fails in the network.  This document
   introduces an INFINITE cost to IS-IS LSPs to solve this problem.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 9, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Description
   3.  Solution
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
   Authors' Addresses

1.  Introduction

   When running the Intermediate System to Intermediate System (IS-IS)
   routing protocol in a highly symmetric network such as a Leaf-Spine
   or Fat-Tree network, it is recommended that the Leaf nodes (e.g.,
   Top of Rack switches in datacenters) be prevented from receiving
   other nodes' explicit routes in order to achieve scalability, as
   proposed in [IS-IS-SL-Extension], [IS-IS-Overhead-Reduction], [RIFT],
   and [OpenFabric].  In particular, each Leaf node SHOULD simply
   maintain a default (or aggregated) route (e.g., 0.0.0.0/0) in its
   routing table, of which the next hop SHOULD be an Equal Cost Multi
   Path (ECMP) group including all Spine nodes that the Leaf node
   connects to.
   However, such a setup would cause traffic black-holes or suboptimal
   routing if a link fails in the network, since the Leaf nodes are not
   aware of any topology information.

   To solve this problem, this document introduces an INFINITE cost to
   IS-IS LSPs.  When a link failure happens between a Spine node and a
   Leaf node, the Spine node SHOULD advertise all prefixes attached to
   that Leaf node, with their costs set to INFINITE, to every other
   Leaf node it connects to.  On receiving the prefixes (with INFINITE
   cost), each Leaf node SHOULD add the prefixes to its routing table
   with next hops that exclude the advertising Spine node (see
   Section 3), thus avoiding traffic black-holes and suboptimal
   routing.

2.  Problem Description

   This section illustrates why a link failure would cause a traffic
   black-hole or suboptimal routing when Leaf nodes only maintain
   default (or aggregated) routes.

             +--------+       +--------+       +--------+
             | Spine1 |       | Spine2 |       | Spine3 |
             +--------+       +--------+       +--------+

         Each Leaf node has one link to each of the three Spine
         nodes (12 links in total); the individual links are not
         drawn here.  The link that fails in the example of this
         section is Spine3-Leaf4.

      +-------+      +-------+      +-------+      +-------+
      | Leaf1 |      | Leaf2 |      | Leaf3 |      | Leaf4 |
      +-------+      +-------+      +-------+      +-------+
                                                     |   |
                                                    --- ---
                                                prefixA prefixB

                       Figure 1: Topology Example

   Figure 1 shows a Spine-Leaf topology example where Leaf1 to Leaf4 are
   connected to Spine1 to Spine3, and prefixA and prefixB are attached
   to Leaf4.
   To achieve scalability, as proposed in [IS-IS-SL-Extension],
   [IS-IS-Overhead-Reduction], [RIFT], and [OpenFabric], Leaf1 to Leaf4
   SHOULD NOT receive explicit routes from each other or from the Spine
   nodes.  Instead, each of them maintains a default (or aggregated)
   route (e.g., 0.0.0.0/0) in its routing table, of which the next hop
   is an ECMP group including Spine1, Spine2, and Spine3.  Flows from
   one Leaf node to another are shared among Spine1, Spine2, and Spine3
   based on the well-known 5-tuple hashing.

   However, such a setup would cause a traffic black-hole or suboptimal
   routing when a link failure happens in the network.  For example, if
   the link between Spine3 and Leaf4 is broken, Leaf1, Leaf2, and Leaf3
   would not become aware of the failure.  As a result, these Leaf
   nodes will still send a portion of the traffic destined for prefixA
   or prefixB toward Spine3, where it is discarded, causing a traffic
   black-hole.  On the other hand, if there is a set of links or a
   higher tier of switches interconnecting Spine1, Spine2, and Spine3,
   Spine3 will steer the traffic to other Spine nodes or to the
   higher-tier switches, causing suboptimal routing.

   Therefore, this document introduces an INFINITE cost to IS-IS LSPs
   to solve this problem.

3.  Solution

   This document introduces the INFINITE cost to IS-IS LSPs, whose value
   is to be determined.  When a link failure happens between a Spine
   node and a Leaf node, the Spine node SHOULD 1) encode all prefixes
   attached to the Leaf node into the IP Reachability TLV, 2) set the
   cost of the prefixes to INFINITE, 3) append the IP Reachability TLV
   to the IS-IS LSP, and 4) send the LSP to every other Leaf node it
   connects to.
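
   The four Spine-side steps above can be sketched as follows.  This
   is only an illustration under stated assumptions, not an IS-IS
   implementation: the Spine class, its leaf_prefixes bookkeeping, and
   the on_leaf_link_failure name are hypothetical, no TLV encoding is
   performed, and INFINITE is a symbolic stand-in because this document
   leaves the actual value to be determined.

```python
from dataclasses import dataclass, field

# Symbolic stand-in only: the draft leaves the encoded INFINITE value TBD.
INFINITE = float("inf")


@dataclass
class Spine:
    name: str
    # Hypothetical bookkeeping: connected leaf name -> prefixes attached to it.
    leaf_prefixes: dict = field(default_factory=dict)

    def on_leaf_link_failure(self, failed_leaf):
        """Return {leaf: [(prefix, cost), ...]} to advertise after the failure."""
        # Steps 1-2: take the failed leaf's prefixes and give each INFINITE cost.
        entries = [(p, INFINITE) for p in self.leaf_prefixes.get(failed_leaf, [])]
        # Steps 3-4: the same entries are sent to every other connected leaf.
        return {leaf: list(entries)
                for leaf in self.leaf_prefixes if leaf != failed_leaf}


# The example of Figure 1: the Spine3-Leaf4 link fails.
spine3 = Spine("Spine3", {"Leaf1": [], "Leaf2": [], "Leaf3": [],
                          "Leaf4": ["prefixA", "prefixB"]})
adverts = spine3.on_leaf_link_failure("Leaf4")
for leaf, entries in adverts.items():
    print(leaf, entries)
```

   In this sketch Leaf1, Leaf2, and Leaf3 each receive prefixA and
   prefixB with INFINITE cost, and nothing is sent toward Leaf4.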

   When a Leaf node receives the prefixes (with INFINITE cost)
   advertised by a Spine node, it SHOULD install each of the prefixes
   into its routing table, of which the next hop SHOULD be set to an
   ECMP group including all Spine nodes it connects to except the one
   that advertised the prefix.

   For example, if the link between Spine3 and Leaf4 in Figure 1 is
   broken, Spine3 SHOULD advertise prefixA and prefixB to Leaf1, Leaf2,
   and Leaf3, by sending them an IS-IS LSP containing the IP
   Reachability TLV.  The cost of prefixA and prefixB SHOULD be set to
   INFINITE.  On receiving the LSP, Leaf1, Leaf2, and Leaf3 SHOULD
   install prefixA and prefixB into their routing tables, and the next
   hop of prefixA or prefixB SHOULD be set to an ECMP group including
   Spine1 and Spine2.  For instance, the routing table of Leaf1 before
   and after the link failure is shown in Figure 2 and Figure 3,
   respectively.

   Note that, because the Spine node that detects the failure
   advertises directly to its other Leaf nodes, the mechanism described
   above keeps signaling latency low, which helps Leaf nodes avoid
   black-holes and suboptimal routing quickly when a link failure
   happens.
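
   The Leaf-side behavior described in this section can be sketched in
   the same spirit.  The function name, the dictionary-based routing
   table, and the symbolic INFINITE value are assumptions made for
   illustration only; the sketch does not handle the case where
   several Spine nodes advertise the same prefix.

```python
# Symbolic stand-in only: the draft leaves the encoded INFINITE value TBD.
INFINITE = float("inf")


def install_infinite_prefixes(routing_table, connected_spines, advertiser, entries):
    """Install each INFINITE-cost prefix with next hops that skip the advertiser.

    routing_table maps a prefix to its ECMP next-hop list (hypothetical model).
    """
    for prefix, cost in entries:
        if cost != INFINITE:
            continue  # only INFINITE-cost prefixes trigger this behavior
        # Next hops: every connected Spine except the one that advertised.
        routing_table[prefix] = [s for s in connected_spines if s != advertiser]
    return routing_table


# Leaf1 before the failure holds only the default route.
leaf1_table = {"0.0.0.0/0": ["Spine1", "Spine2", "Spine3"]}
install_infinite_prefixes(
    leaf1_table,
    connected_spines=["Spine1", "Spine2", "Spine3"],
    advertiser="Spine3",
    entries=[("prefixA", INFINITE), ("prefixB", INFINITE)],
)
for prefix, next_hops in leaf1_table.items():
    print(prefix, next_hops)
```

   The default route keeps all three Spine nodes as next hops, while
   the more specific prefixA and prefixB entries use only Spine1 and
   Spine2, so longest-prefix matching keeps traffic for Leaf4 away from
   Spine3.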

   +-----------+-----+---+----+-----+-------+--------------+
   |Destination|Proto|Pre|Cost|Flags|NextHop|Interface     |
   +-----------+-----+---+----+-----+-------+--------------+
   |0.0.0.0/0  |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   |           |ISIS |15 |20  |D    |Spine3 |Ethernet0/0/2 |
   +-----------+-----+---+----+-----+-------+--------------+

          Figure 2: Routing Table of Leaf1 before link failure

   +-----------+-----+---+----+-----+-------+--------------+
   |Destination|Proto|Pre|Cost|Flags|NextHop|Interface     |
   +-----------+-----+---+----+-----+-------+--------------+
   |0.0.0.0/0  |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   |           |ISIS |15 |20  |D    |Spine3 |Ethernet0/0/2 |
   +-----------+-----+---+----+-----+-------+--------------+
   |prefixA    |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   +-----------+-----+---+----+-----+-------+--------------+
   |prefixB    |ISIS |15 |20  |D    |Spine1 |Ethernet0/0/0 |
   |           |ISIS |15 |20  |D    |Spine2 |Ethernet0/0/1 |
   +-----------+-----+---+----+-----+-------+--------------+

          Figure 3: Routing Table of Leaf1 after link failure

4.  IANA Considerations

   TBD.

5.  Security Considerations

   TBD.

6.  Acknowledgements

   TBD.

7.  References

   [IS-IS-Overhead-Reduction]
              Chen, Z., Xu, X., and D. Cheng, "Overheads Reduction for
              IS-IS Enabled Spine-Leaf Networks",
              draft-chen-isis-sl-overheads-reduction-03 (work in
              progress), March 2018.

   [IS-IS-SL-Extension]
              Shen, N., Ginsberg, L., and S. Thyamagundalu, "IS-IS
              Routing for Spine-Leaf Topology",
              draft-shen-isis-spine-leaf-ext-06 (work in progress),
              June 2018.

   [OpenFabric]
              White, R. and S. Zandi, "IS-IS Support for Openfabric",
              draft-white-openfabric-06 (work in progress), June 2018.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for Routing in TCP/IP and
              Dual Environments", RFC 1195, December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RIFT]     Przygienda, T., Sharma, A., Drake, J., and A. Atlas,
              "RIFT: Routing in Fat Trees", draft-ietf-rift-rift-02
              (work in progress), June 2018.

Authors' Addresses

   Zhe Chen
   Huawei
   No. 156 Beiqing Rd
   Beijing  100095
   China

   Email: chenzhe17@huawei.com


   Xiaohu Xu
   Alibaba

   Email: xiaohu.xxh@alibaba-inc.com


   Dean Cheng
   Huawei

   Email: dean.cheng@huawei.com