draft-ietf-trill-parent-selection-03.txt

TRILL Working Group                                      R. Parameswaran
INTERNET-DRAFT                                    Individual Contributor
Intended status: Proposed Standard
Expires: August 19, 2018                               February 15, 2018

         TRILL (Transparent Interconnection of Lots of Links):
         Mitigation of Parent Node Shifts in Tree Construction

Abstract

   This document describes a known problem in the TRILL tree
   construction mechanism and offers an approach requiring no change to
   the TRILL protocol that solves the problem.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors or the TRILL working group mailing list:
   trill@ietf.org.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Table of Contents

   1. Introduction
   1.1 Terminology and Acronyms
   2. Tree construction in TRILL
   3. Issues with the TRILL tree construction algorithm
   4. Solution using the Affinity sub-TLV
   5. Network wide selection of computation algorithm
   6. Security Considerations
   7. IANA Considerations

   8. Normative References
   9. Informative References
   10. Acknowledgements

   Author's Address

1. Introduction

   TRILL is a data center technology that uses link-state routing
   mechanisms in a layer 2 setting, and serves as a replacement for the
   spanning-tree protocol. TRILL uses Multi-destination trees rooted at
   predetermined nodes as a way to distribute multi-destination traffic.

   Multi-destination traffic includes traffic such as layer-2 broadcast
   frames, unknown unicast flooded frames, and layer 2 traffic with
   multicast MAC addresses (collectively referred to as BUM traffic).
   Multi-destination traffic is typically hashed onto one of the
   available trees and sent over the tree, potentially reaching all
   nodes in the network (hosts behind those nodes may own or need the
   packet in question).

1.1 Terminology and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Tree construction in TRILL

   Tree construction in TRILL is defined by [RFC6325], with corrections
   defined in [RFC7780].

   The tree construction mechanism used in TRILL codifies certain tree
   construction steps which make the resultant trees brittle, as
   explained below. TRILL uses the following rule - when constructing an
   SPF tree, if there are multiple possible parents for a given node
   (i.e., if multiple upstream nodes can potentially pull in a given
   node during SPF, all at the same cumulative cost), then the parent
   selection is imposed in the following manner:

   [RFC6325]:
      "When building the tree number j, remember all possible equal cost
      parents for node N. After calculating the entire 'tree' (actually,
      directed graph), for each node N, if N has 'p' parents, then order
      the parents in ascending order according to the 7-octet IS-IS ID
      considered as an unsigned integer, and number them starting at zero.
      For tree j, choose N's parent as choice j mod p."

   There is an additional correction posted to this in [RFC7780]:

   [RFC7780], Section 3.4:

      "Section 4.5.1 of [RFC6325] specifies that, when building
      distribution tree number j, node (RBridge) N that has multiple
      possible parents in the tree is attached to possible parent number
      j mod p.
      Trees are numbered starting with 1, but possible parents
      are numbered starting with 0. As a result, if there are two trees
      and two possible parents, then in tree 1 parent 1 will be
      selected, and in tree 2 parent 0 will be selected.

      This is changed so that the selected parent MUST be (j-1) mod p.
      As a result, in the case above, tree 1 will select parent 0, and
      tree 2 will select parent 1. This change is not backward
      compatible with [RFC6325]. If all RBridges in a campus do not
      determine distribution trees in the same way, then for most
      topologies, the RPFC will drop many multi-destination packets
      before they have been properly delivered."

3. Issues with the TRILL tree construction algorithm

   With the tree construction mechanism in Section 2 in mind, let's look
   at the Spine-Leaf topology presented below and consider the
   calculation of Tree number 2 in TRILL. Assume all the links in the
   topology have the same cost.

                 A--    --B
                / \ \/   /\
               /   \/\ _/_ \
              /__ _/\ /    \\
            //      \/      \\
            1        2       3
             \       |      /
              \      |     /
               \     |    /
                \    |   /
                 \   |  /
                  \  | /
                   \ |/
                     C

   In this topology, each of A, B, and C is directly connected to each
   of nodes 1, 2, and 3.

   Assume that in the above topology, when ordered by 7-octet IS-IS ID,
   1 < 2 < 3 holds and that the root for Tree number 2 is A. Given the
   ordered set {1, 2, 3}, these nodes have the following indices (with
   a starting index of 0):

      Node   Index
       1       0
       2       1
       3       2

   Given the SPF (Shortest Path First) constraint and that the tree root
   is A, the parent for nodes 1, 2, and 3 will be A. However, when the
   SPF algorithm tries to pull B or C into the tree, we have a choice of
   parents, namely 1, 2, or 3.

   Given that this is tree 2, the parent will be the one with index
   (2-1) mod 3 (which is equal to 1). Hence the parent for node B will
   be the node with an index value of 1, which is node 2.

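   The selection rule quoted in Section 2 and the index arithmetic
   above can be sketched in a few lines of Python. This is only an
   illustrative sketch, not part of TRILL or of this draft: the function
   name select_parent, the flag use_rfc7780, and the representation of
   7-octet IS-IS IDs as plain integers are all assumptions made for the
   example.

```python
def select_parent(tree_number, candidate_ids, use_rfc7780=True):
    """Pick one parent among equal-cost candidates for a given tree.

    tree_number   -- distribution tree number j (trees are numbered from 1)
    candidate_ids -- IS-IS IDs of the equal-cost potential parents, given
                     here as unsigned integers (an assumption of this sketch)
    use_rfc7780   -- True: index (j-1) mod p, per RFC 7780 Section 3.4
                     False: index j mod p, per the original RFC 6325 text
    """
    ordered = sorted(candidate_ids)   # ascending IS-IS ID, indices start at 0
    p = len(ordered)
    if p == 0:
        raise ValueError("no candidate parents")
    if use_rfc7780:
        index = (tree_number - 1) % p
    else:
        index = tree_number % p
    return ordered[index]


# Tree 2 with candidate parents {1, 2, 3}: index (2-1) mod 3 = 1, node 2.
print(select_parent(2, [1, 2, 3]))
# The original RFC 6325 rule gives index 2 mod 3 = 2, node 3.
print(select_parent(2, [1, 2, 3], use_rfc7780=False))
```

   Note that the result depends on the number of candidates p as well
   as on the tree number, so the selected index is recomputed whenever
   the candidate set changes size.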
   The resulting tree number 2 is shown below.

                     A
                    /|\
                   / | \
                  /  |  \
                 1   2   3
                    /\
                   /  \
                  B    C

   However, due to TRILL's parent selection algorithm, the sub-tree
   rooted at Node 2 will be impacted even if Node 1 or Node 3 goes down.

   Take the case where Node 1 goes down. Tree 2 must now be re-computed
   (this is normal) - but now, when the SPF process tries to pull in B,
   the list of potential parents for B is {2, 3}. After ordering these
   by IS-IS ID as {2, 3} (where 2 is considered to be at index 0 and 3
   is considered to be at index 1), for tree 2, we apply TRILL's formula:

      Parent's index = (TreeNumber-1) mod Number_of_parents
                     = (2-1) mod 2
                     = 1 mod 2
                     = 1 (which is the index of Node 3)

   The re-calculated tree now looks as shown below. The shift in parent
   nodes (for B) may cause disruption to live traffic in the network,
   and is unnecessary in absolute terms because the existing parent for
   node B, node 2, was not perturbed in any way.

                     A
                    / \
                   /   \
                  /     \
                 2       3
                        /\
                       /  \
                      B    C

   Aside from the disruption posed by the change in the tree links,
   depending upon how the concerned RBridges distribute VLANs/FGLs
   across trees and how they may prune these, additional disruption is
   possible if the forwarding state on the new parent RBridge is not
   primed to match the new tree structure. This churn can be avoided
   with a better approach.

   The parent shift issue noted above can be solved by using the
   Affinity sub-TLV which is specified in [RFC7176].

   While the technique identified in this draft has an immediate benefit
   when applied to spine/leaf networks popular in data-center designs,
   nothing in the approach outlined below assumes a spine-leaf network.
   The technique presented below will work on any connected graph.
   Furthermore, no directional symmetry in link-cost is assumed.

4. Solution using the Affinity sub-TLV

   At a high level, this problem can be solved by having the affected
   parent send out an Affinity sub-TLV identifying the children for
   which it wants to preserve the parent-child relationship, despite
   network events which may change the structure of the tree. The
   concerned parent node would send out an Affinity sub-TLV with
   multiple Affinity records, one per child node, listing the affected
   tree number.

   It would be sufficient to have a local RBridge configuration option
   at one of the nodes that is the parent chosen (referred to as
   designated parent below). The following steps provide a way to
   implement this proposal:

   a. The operator locally configures the designated parent to
      indicate its stickiness in tree construction for a specific
      tree number and tree root via the Affinity sub-TLV. This can be
      done before tree construction if the operator consults the
      relative ordering of the 7-octet IS-IS IDs of the concerned
      nodes and decides up-front which of the potential parent nodes
      should become the parent node for a given set of children on
      that tree number under the TRILL tree construction mechanism.
      The operator MUST configure the designated parent stickiness on
      only one node amongst a set of sibling (potential parent) nodes
      relative to the tree root for that tree number.

      It is suggested that the parent stickiness be configured on the
      node that would have been selected as the parent under default
      TRILL parent selection rules. Parent stickiness MUST NOT be
      configured on the root of the tree. If it was configured on a
      non-root node and the root for that tree subsequently shifts to
      that node, the configuration MUST be ignored on the root node.

   b. On any subsequent SPF calculation after the operator configures
      the designated parent as indicated above, when the designated
      parent node finds that it could be a potential parent for one
      or more child nodes during tree construction, it declares
      itself to be the parent for the concerned child nodes,
      overriding the default TRILL parent selection rules. The
      configured node advertises its parent preference via the
      Affinity sub-TLV when it completes a tree calculation and
      finds itself the parent of one or more child nodes per the SPF
      tree calculation. The Affinity sub-TLV MUST reflect the
      appropriate tree number and the child nodes for which the
      concerned node is a parent node. The Affinity sub-TLV SHOULD be
      published when the tree computation is deemed to have converged
      (more on this under d below).

   c. Likewise, when any change event happens in the network that
      forces a tree re-calculation for the concerned tree, the
      designated parent node MUST run through the normal TRILL tree
      calculation agnostic to the fact that it has published an
      Affinity sub-TLV and agnostic to the default TRILL tree
      selection rules, i.e., the node asserts its right to be a parent
      (based on its configuration as a designated parent) without
      directly referencing the default TRILL parent selection rules
      or its own published Affinity sub-TLV in establishing parent
      relationships.

   d. During the SPF tree calculation, the designated parent node
      should react in the following manner:

      i.   If the node is a potential parent for some of the
           children identified in an existing Affinity sub-TLV, if
           any, after convergence of the tree computation, the node
           MUST send out an (updated) Affinity sub-TLV identifying
           the correct sub-set of children for which the node
           aspires to establish/continue the parent relationship.
           This case would also apply if there are new child nodes
           for which the node is now a parent (however, see the
           conflicting Affinity sub-TLV rules in d.vii and in step i
           below).

           For its own tree computation, the designated parent node
           MUST use itself as parent in order to pull the set of
           children identified during the SPF run into the tree,
           barring a conflicting Affinity sub-TLV seen from another
           node (see vii. below for handling this case).

      ii.  If the tree structure later changes such that the
           designated node is no longer a potential parent for any
           of the child nodes in the advertised Affinity sub-TLV,
           then it SHOULD retract the Affinity sub-TLV upon
           convergence of the tree computation. In this case, the
           default TRILL tie-breaking rule would need to be used
           during SPF construction for the nodes that were children
           of this designated node previously. One specific case may
           be worth highlighting - if a parent-child relationship
           inverts, i.e., if the designated parent becomes a child of
           its former child node due to a change in the tree
           structure, it MUST exclude that child from its Affinity
           sub-TLV. In such a case, if the designated parent node
           cannot maintain a parent relationship with any of its
           prior child nodes, then it MUST retract any previously
           published Affinity sub-TLV.

      iii. Nodes SHOULD use a convergence timer to track completion
           of the tree computation. If there are any additional tree
           computations while the convergence timer is running, the
           timer SHOULD be re-started/extended in order to absorb
           the interim network events. It is possible that the
           intended action at the expiration of the timer may change
           meanwhile. The timer needs to be large enough to absorb
           multiple network events that may happen due to a change
           in the physical state of the network, and yet short
           enough to avoid delaying the update of the Affinity
           sub-TLV.

      iv.  At the expiration of the convergence timer, the existing
           state of the tree MUST be compared with the existing
           Affinity sub-TLV, and the intended change in the status of
           the Affinity sub-TLV is carried out, e.g., a fresh
           publication, an update to the list of children, or a
           retraction.

      v.   Alternatively, the above steps (re-examination of the
           Affinity sub-TLV and update) MAY be tied to/triggered
           from the download of the tree routes to the L2 RIB, since
           that typically happens upon a successful computation of
           the complete tree. An additional stabilization timer
           could be used to counteract back-to-back L2 RIB downloads
           caused by repeated computations of the tree during a burst
           of network events.

      vi.  Note that this approach may cause an additional tree
           computation at remote nodes once the updated Affinity
           sub-TLV (or lack of it) is received/perceived, beyond the
           network events which led up to the change in the tree. In
           the case where an operator introduces a designated parent
           configuration on an existing tree, remote nodes need to
           receive the Affinity sub-TLV indicating the designated
           parent's Affinity for its children before they shift away
           from the default TRILL parent selection rules. However, in
           most cases, in steady state, this mechanism should cause
           very little tree churn unless a designated parent
           configuration was introduced or removed, or a link between
           the designated parent and its children changed state. In
           cases where the network change event originated on the
           designated parent node, it may be possible to reduce the
           churn by packing both the data bearing the network change
           event and the Affinity sub-TLV into the same link-state
           update packet.

      vii. In situations where the designated parent node would
           normally originate an Affinity sub-TLV to indicate
           affinity to a specific set of child nodes, it MUST NOT
           originate an Affinity sub-TLV if it sees an Affinity
           sub-TLV from some other node for the same tree number and
           for all of the same child nodes, such that the other node's
           Affinity sub-TLV would win using the conflict tie-break
           rules in section 5.3 of [RFC7783]. Any existing Affinity
           sub-TLV already published by this node in such a
           situation MUST be retracted. If only some of the child
           nodes overlap between the two conflicting Affinity
           sub-TLVs, then this designated parent node MAY continue to
           publish its Affinity sub-TLV listing its child nodes that
           are not in conflict with the other Affinity sub-TLV.
           Other guidelines listed in [RFC7783] MUST be adhered to
           as well - the originator of the Affinity sub-TLV must
           name only directly adjacent nodes as children, and must
           not name the tree root as a child.

   e. Situations where the node advertising the Affinity sub-TLV dies
      or restarts SHOULD be handled using the normal handling for
      such scenarios relating to the parent Router Capability TLV,
      and as specified in [RFC7981].

   f. Situations where a parent-child link directly connected to the
      designated parent node constantly flaps MUST be handled by
      having the designated parent node retract the Affinity sub-TLV,
      if the flapping affects the parent-child relationships in
      consideration. The long-term state of the Affinity sub-TLV can
      be monitored by the designated parent node to see if it is being
      published and retracted repeatedly in multiple iterations or if
      a specific set of children are being constantly added and
      removed. The designated parent may resume publication of the
      Affinity sub-TLV once it perceives the network to be stable
      again.

   g. If the designated parent node is forced to retract its Affinity
      sub-TLV due to a change in the tree structure, it can then
      repeat these steps in a subsequent tree construction, if the
      same node becomes a parent again, so long as it perceives its
      parent-child links to be stable (free of link/node flaps).

   h. Remote nodes MUST default to the TRILL parent selection rules
      if they do not see an Affinity sub-TLV sent by any node in the
      network.

   i. At remote nodes, conflicting Affinity sub-TLVs from different
      originators for the same tree number and child node MUST be
      handled as specified in section 5.3 of [RFC7783], namely by
      selecting the Affinity sub-TLV originated by the node with the
      highest priority to be a tree root, with System-ID as the
      tie-breaker.

5. Network wide selection of computation algorithm

   The proposed solution above does not need any operational change to
   the TRILL protocol, beyond the usage of the Affinity sub-TLV (which
   is already in the proposed standard) for the use case identified in
   this draft.

   Nodes that do not support this draft are expected to interoperate
   seamlessly with it, so long as they understand and honor the
   Affinity sub-TLV. The draft assumes that most TRILL implementations
   now support the Affinity sub-TLV. In any case, the guidelines
   specified in section 4.1 of [RFC7783] MUST be used, i.e., if not all
   nodes in the network announce support of the Affinity sub-TLV, then
   the network MUST default to the TRILL parent selection rules.

6. Security Considerations

   The proposal primarily influences tree construction and tries to
   preserve parent-child relationships in the tree from prior
   computations of the same tree, without changing any operational
   aspects of the protocol (this proposal does not introduce any new
   TLV/sub-TLV).
   Hence, no new security considerations for TRILL are raised by this
   proposal.

7. IANA Considerations

   This document requires no actions by IANA. The Affinity Sub-TLV has
   been defined in [RFC7176], and this proposal requires use of this
   Sub-TLV but does not change its semantics in any way.

8. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, DOI
              10.17487/RFC2119, March 1997.

   [RFC6325]  Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011.

   [RFC7176]  Eastlake 3rd, D., et al, "Transparent Interconnection of
              Lots of Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

   [RFC7780]  Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
              Ghanwani, A., and S. Gupta, "Transparent Interconnection of
              Lots of Links (TRILL): Clarifications, Corrections, and
              Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016.

   [RFC7783]  Senevirathne, T., Pathangi, J., and J. Hudson, "Coordinated
              Multicast Trees (CMT) for Transparent Interconnection of
              Lots of Links (TRILL)", RFC 7783, February 2016.

   [RFC7981]  Ginsberg, L., Previdi, S., and M. Chen, "IS-IS Extensions
              for Advertising Router Information", RFC 7981, October
              2016.

9. Informative References

   None.

10. Acknowledgements

   I would like to thank Donald Eastlake for his help in preparing the
   current iteration of the draft, and for reviewing prior iterations.

Author's Address:

   Ramkumar Parameswaran,
   Individual contributor,
   PO Box 2788
   Cupertino, CA 95015.

   Email: parameswaran.r7@gmail.com

Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   The definitive version of an IETF Document is that published by, or
   under the auspices of, the IETF. Versions of IETF Documents that are
   published by third parties, including those that are translated into
   other languages, should not be considered to be definitive versions
   of IETF Documents. The definitive version of these Legal Provisions
   is that published by, or under the auspices of, the IETF. Versions of
   these Legal Provisions that are published by third parties, including
   those that are translated into other languages, should not be
   considered to be definitive versions of these Legal Provisions. For
   the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect and
   shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.