INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                       Donald Eastlake
Expires: February 23, 2014                                        Huawei
                                                         August 22, 2013


              Problem Statement: TRILL Active/Active Edge
                  draft-zhang-trill-aggregation-04.txt

Abstract

   This document specifies the TRILL active/active edge, which allows
   multiple RBridges to concurrently forward data frames of the same
   VLAN on links bundled by a Multi-Chassis Link Aggregation Group
   (MC-LAG). With this kind of connection, end nodes may increase the
   bandwidth and reliability of their access at the edge of TRILL
   campuses. This new connection type is required to cause neither
   loops nor frame duplication. Besides this basic requirement, this
   document outlines other potential issues associated with the TRILL
   active/active edge and investigates how these issues may be
   addressed.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Acronyms and Terminology
      2.1. Acronyms
      2.2. Terminology
   3. Overview
   4. Frame Processing
      4.1. Unicast Ingressing
      4.2. Unicast Egressing
      4.3. Multicast Ingressing
      4.4. Multicast Egressing
   5. DRB and Pseudonode
   6. MAC Addresses Sharing
   7. Failures and Self-healing
      7.1. Link Failure
      7.2. Node Failure
   8. Reverse Path Forwarding Check
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses

1. Introduction

   TRILL makes use of ISIS link state routing to provide least cost
   paths between TRILL switches (a.k.a. Routing Bridges, RBridges). When
   a multi-access LAN link connects end-stations to multiple RBridges, a
   single RBridge has to be appointed as the frame forwarder for each
   VLAN-x on this LAN link. Other RBridges MAY be appointed as frame
   forwarders for other VLANs but MUST be inhibited from forwarding
   frames for the same VLAN-x on this LAN link [RFC6439].

   An MC-LAG can also be used to connect end-stations to multiple
   RBridges. There are two possible scenarios: (a) an end-station is
   connected to multiple RBridges by an MC-LAG directly; (b)
   end-stations are attached to a bridge and this bridge uses an MC-LAG
   to connect to multiple RBridges. An MC-LAG may choose any component
   link to forward frames and never forwards frames between component
   links. Therefore, it requires the up-connected RBridges to provide
   active/active attachment instead of the active/standby mode adopted
   in the Appointed Forwarder mechanism [RFC6439]. This kind of
   attachment allows end nodes to increase the bandwidth and reliability
   of their access to the TRILL campus via the MC-LAG.

   As with a LAN link, an MC-LAG can be represented by a pseudonode.
   All member RBridges should report their adjacencies to this
   pseudonode using LSPs. In this way, the RBridges attached to the same
   MC-LAG form an active/active edge group. Other RBridges in the campus
   communicate with this pseudonode using forwarding paths computed
   according to ISIS link state routing. No additional features are
   required.

   The baseline requirement is that the active/active edge MUST provide
   frame forwarding without causing loops or duplication in the TRILL
   campus or at the end node.
   In order to work properly, the TRILL active/active edge has to
   address several other issues. The purpose of this document is to
   outline these issues; specific solutions to them are to be explored
   in the future as building blocks of the whole TRILL active/active
   edge mechanism.

   The rest of this document is organized as follows. Section 2 gives
   acronyms and terminology. Section 3 provides an overview. Section 4
   specifies the frame processing behaviors of member RBridges. Section
   5 describes how the pseudonode is set up. Section 6 explains MAC
   address sharing among member RBridges. Section 7 describes
   self-healing after failures. Section 8 investigates how to pass the
   Reverse Path Forwarding Check without packet loss.

2. Acronyms and Terminology

2.1. Acronyms

   MC-LAG: Multi-Chassis Link Aggregation Group
   ISIS:   Intermediate System to Intermediate System
   TRILL:  TRansparent Interconnection of Lots of Links
   AF:     Appointed Forwarder
   DT:     Distribution Tree
   RPFC:   Reverse Path Forwarding Check

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, the term "end node" means the end station or bridge
   connected to the TRILL active/active edge by an MC-LAG.

   Familiarity with [RFC6325], [RFC6327], and [RFC6439] is assumed in
   this document. As in [RFC6325], in this document the word "link"
   means a "bridged LAN", unless otherwise qualified.

3. Overview

   If an end node (end station or bridge) uses an MC-LAG to connect to
   multiple edge RBridges, it is expected that all these RBridges can
   ingress and egress frames for the end node.
   In contrast, if multiple RBridges are connected to a LAN link, only
   one of them can be appointed as the frame forwarder for each VLAN-x
   [RFC6439], as illustrated in Figure 2.1 (a). Other RBridges will be
   inhibited from ingressing and egressing frames for VLAN-x.

         +-----+                    +-----+
         | RBi |                    | RBi |(Remote RBridge)
         +-----+                    +-----+
       /\/\/\/\/\/\               /\/\/\/\/\/\
      /  Transit   \             /  Transit   \
     <   RBridges   >           <   RBridges   >
      \            /             \            /
       \/\/\/\/\/\/               \/\/\/\/\/\/
        |        |                 |        |
     +-----+  +-----+           +-----+  +-----+
     | RB1 |--| RB2 |           | RB1 |--| RB2 |(Active/Active Edge)
     +-----+  +-----+           +-----+  +-----+
       AF\      /                  \        /
           +---+                    *******
           |LAN|                    * RBv * (Virtual RBridge)
           +---+                    *******
             |                         |(MC-LAG)
                                     +---+
                                     | E |
                                     +---+

     (a) Appointed Forwarder    (b) Active/Active Edge

      Figure 2.1: TRILL Appointed Forwarder vs. Active/Active Edge

   As illustrated in Figure 2.1 (b), the end node 'E' is attached to
   both RB1 and RB2 using an MC-LAG. Each member RBridge can ingress and
   egress frames for the end node for VLAN-x. If each of them uses its
   own nickname as the ingress nickname, the remote RBridge may observe
   different locations for one MAC address at different times, which is
   referred to as the "MAC move" problem in this document. The MAC move
   problem affects path selection at the remote RBridge. Frames destined
   to the end node may go through different paths, which may cause
   frames of a traffic flow to arrive out of order.

   In order to avoid the MAC move problem, all member RBridges should
   use the same nickname as the ingress nickname in TRILL data frame
   encapsulation. As shown in Figure 2.1 (b), member RBridges pretend
   there is a virtual RBridge connected to them, acting as the
   appointed forwarder of the end node. It is natural to represent this
   virtual RBridge as a pseudonode. All RBridges connected to the MC-LAG
   form adjacencies with the pseudonode.
   Other RBridges believe there is an RBridge RBv connected to RB1 and
   RB2. Note that member RBridges SHOULD NOT announce that they are the
   VLAN-x Appointed Forwarder if VLAN-x is enabled on the MC-LAG.

   Although the above example includes two edge RBridges, the TRILL
   active/active edge solution SHOULD support cases with more than two
   member RBridges.

4. Frame Processing

   When the end node injects frames into the TRILL campus via a member
   RBridge, this RBridge encapsulates the native frames on behalf of the
   pseudonode. When frames are sent to the end node, the pseudonode is
   supposed to be the egress RBridge. It is REQUIRED that RBridges other
   than the active/active members need not be aware of the active/active
   group and need not change their frame processing behavior.

   Unlike in the Appointed Forwarder mechanism, all active/active
   member RBridges are able to ingress and egress frames of VLAN-x on
   the same link. It is crucial to avoid loops and duplication in the
   frame processing.

4.1. Unicast Ingressing

   The member RBridge that receives a native frame from the MC-LAG (the
   receiver RBridge) encapsulates it using the nickname of the
   pseudonode as the ingress nickname. When these TRILL data frames
   arrive at the remote RBridge, the MAC addresses are learnt during
   decapsulation. The remote RBridge will regard the pseudonode as the
   egress RBridge for these MAC addresses.

4.2. Unicast Egressing

   As learnt in the MAC table, TRILL data frames from remote RBridges
   destined to the end node will be sent to the pseudonode rather than
   to member RBridges. If a member RBridge receives a TRILL data frame
   whose egress RBridge is the pseudonode, it can conclude that the
   frame should be egressed onto the MC-LAG.

   However, member RBridges MUST NOT egress any TRILL data frames whose
   ingress RBridge is the pseudonode. Otherwise, loops will occur.

4.3. Multicast Ingressing

   The end node chooses one component link of the MC-LAG to send
   multicast frames to member RBridges. As with unicast ingressing, the
   receiver RBridge encapsulates the native frames using the nickname of
   the pseudonode as the ingress nickname.

   Different member RBridges MUST NOT share the same Distribution Tree
   to ingress a multicast frame of a specific VLAN-x from the end node.
   Otherwise, some multicast frames may be lost due to the Reverse Path
   Forwarding Check. This issue is detailed in Section 8.

4.4. Multicast Egressing

   Multicast frames sent along the VLAN-x Distribution Tree may reach
   all member RBridges. However, only one of them can egress the
   multicast frames onto the MC-LAG. Otherwise, the end node will suffer
   from frame duplication. This requirement can be met if member
   RBridges calculate the Distribution Tree regarding the pseudonode as
   a normal RBridge. Then only one parent RBridge will be selected for
   the pseudonode. The other, non-parent member RBridges MUST refrain
   from egressing multicast frames of VLAN-x onto the MC-LAG.

   As with unicast egressing, member RBridges MUST NOT egress any
   multicast frames whose ingress RBridge is the pseudonode.

5. DRB and Pseudonode

   A DRB MAY give a pseudonode name to a LAN link, issue an LSP (Link
   State PDU) on behalf of the pseudonode, and issue CSNPs (Complete
   Sequence Number PDUs) on the LAN link [RFC6325]. Unlike on a LAN
   link, there is no Hello exchange on the MC-LAG. Thus, the DRB cannot
   be elected using the Hello protocol. Member RBridges MAY establish a
   dedicated RBridge Channel to discover each other and elect a DRB for
   the active/active RBridge group (aDRB) to execute the above tasks:
   assigning the nickname and issuing the LSP and CSNPs. The member
   RBridge with the highest priority to be the tree root is a good
   choice.
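
   The election rule suggested above can be sketched as follows. This
   is a minimal illustration, not a specified procedure: the Member
   record, the elect_adrb function, and the use of the System ID as a
   tie-breaker are assumptions made for this example. Only the
   "highest tree root priority" criterion comes from the text above,
   and the exclusion of members whose MC-LAG link is down follows the
   re-election behavior described in Section 7.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical view of one member RBridge of an active/active group."""
    name: str                  # illustrative label, e.g. "RB1"
    system_id: int             # System ID as an integer (assumed tie-breaker)
    tree_root_priority: int    # priority to be a Distribution Tree root
    lag_link_up: bool = True   # is this member's MC-LAG component link up?


def elect_adrb(members):
    """Return the member that acts as aDRB, or None if nobody qualifies.

    Members without a working MC-LAG link are skipped. Among the rest,
    the highest tree root priority wins; the higher System ID breaks
    ties (an assumption made only for this sketch).
    """
    candidates = [m for m in members if m.lag_link_up]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (m.tree_root_priority, m.system_id))


rb1 = Member("RB1", system_id=1, tree_root_priority=100)
rb2 = Member("RB2", system_id=2, tree_root_priority=200)
print(elect_adrb([rb1, rb2]).name)
```

   Because the choice depends only on state that every member can
   learn about its peers, each member can run the same computation
   independently and arrive at the same aDRB.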

   Member RBridges SHOULD be able to discover each other in order to
   detect misconfiguration and failures. Each member RBridge SHALL
   report its connection to the MC-LAG. The MAC address of the end node
   MAY be used to identify the MC-LAG to which the member RBridges are
   connected.

   One RBridge may be connected to multiple MC-LAGs. It is likely that
   all these MC-LAGs share the same set of member RBridges. However,
   these MC-LAGs MUST NOT share the same pseudonode; otherwise the
   following issue arises.

   o Component Links from Different MC-LAGs Cannot be Distinguished:
     Assume member RBridge RBi is connected to multiple end nodes and
     these links are all advertised as a single ISIS link "RBi-RBv".
     Remote RBridges cannot distinguish these links connecting RBi and
     RBv. When one of these links fails, this becomes problematic. On
     the one hand, if the failed link is not advertised as a down ISIS
     link, traffic sent from remote RBridges to RBv via the failed link
     will be blackholed. On the other hand, if the failed link is
     announced as a down ISIS link, component links from other MC-LAGs
     will be disconnected by mistake.

   The right choice is to represent every MC-LAG as a unique pseudonode.
   In this way, the failure of a component link of an MC-LAG can be
   interpreted as an ISIS link failure. Thus the aDRB can issue a new
   LSP on behalf of the pseudonode to trigger the link state update
   across the campus.

6. MAC Addresses Sharing

   When a member RBridge learns a MAC address from the encapsulation or
   decapsulation of a TRILL data frame, it SHOULD share this learning
   with all other member RBridges. Afterwards, a frame destined to this
   MAC address can be delivered to the MC-LAG as a unicast native frame,
   or ingressed into the TRILL campus as a unicast TRILL data frame, by
   any other member RBridge.
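
   The sharing behavior can be illustrated with a small model. The
   sketch below is an assumption-laden illustration rather than a
   protocol definition: the MemberRBridge class, its method names, and
   the port and nickname strings are hypothetical, and writing directly
   into a peer's table stands in for whatever communication channel the
   member RBridges actually use. In line with the rest of this section,
   an entry learnt from the end node is recorded by each peer against
   its own MC-LAG port, while an entry learnt from a remote RBridge
   keeps the ingress nickname.

```python
from dataclasses import dataclass, field


@dataclass
class MemberRBridge:
    """Hypothetical model of one member RBridge and its MAC table."""
    name: str
    lag_port: str  # this member's own port attached to the MC-LAG
    peers: list = field(default_factory=list, repr=False, compare=False)
    mac_table: dict = field(default_factory=dict)  # (vlan, mac) -> location

    def learn_from_end_node(self, vlan, mac):
        """Source MAC seen on the MC-LAG; share it with all peers."""
        self.mac_table[(vlan, mac)] = ("port", self.lag_port)
        for peer in self.peers:
            # The peer records the entry as if locally learnt, so it
            # points at the peer's own MC-LAG port.
            peer.mac_table[(vlan, mac)] = ("port", peer.lag_port)

    def learn_from_campus(self, vlan, mac, ingress_nickname):
        """Source MAC learnt during decapsulation; share it with peers."""
        location = ("nickname", ingress_nickname)
        self.mac_table[(vlan, mac)] = location
        for peer in self.peers:
            peer.mac_table[(vlan, mac)] = location


rb1 = MemberRBridge("RB1", lag_port="rb1-lag0")
rb2 = MemberRBridge("RB2", lag_port="rb2-lag0")
rb1.peers.append(rb2)
rb2.peers.append(rb1)

rb1.learn_from_end_node(10, "00:00:5e:00:53:01")
rb2.learn_from_campus(10, "00:00:5e:00:53:02", "RBi")
print(rb2.mac_table[(10, "00:00:5e:00:53:01")])
print(rb1.mac_table[(10, "00:00:5e:00:53:02")])
```

   After these two calls, either member can deliver a frame for the
   first address onto the MC-LAG and can ingress a frame for the second
   address as a known unicast.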

   a) Northbound Sharing: When a remote RBridge chooses the path to send
      data frames to the end node, these frames may arrive at any one of
      the member RBridges, given that member RBridges may be on Equal
      Cost Multiple Paths from the remote RBridge to the pseudonode. If
      the MAC address of the end node has already been learnt and
      recorded by any member RBridge, the receiver RBridge SHOULD have
      recorded this MAC entry (VLAN ID, MAC Address, Port Number) as
      well, so that the frame can be delivered as a known unicast to the
      end node. Therefore, local MAC addresses learnt from data frames
      sent by the end node (northbound) SHOULD be shared among member
      RBridges.

   b) Southbound Sharing: The end node may choose any component link to
      inject a frame, which achieves load balancing on the MC-LAG. If
      the destination MAC address has been learnt by any member RBridge,
      the receiver RBridge SHOULD also hold that MAC entry (VLAN ID, MAC
      Address, Egress RBridge Nickname). Thus the data frame need not be
      sent as a multicast frame (unknown unicast). Therefore, MAC
      addresses learnt from data frames sent by remote RBridges to the
      end node (southbound) SHOULD be shared as well.

   When an RBridge learns a source MAC address from a data frame, it
   records the VLAN ID, the source MAC address, and the location, which
   can be the incoming port number or the ingress nickname. A MAC
   address shared by a peer RBridge is recorded as if it were locally
   learnt. For example, when RB1 shares a MAC address learnt from the
   end node with RB2, RB2 should set the incoming port to its own port
   attached to the end node.

   It is REQUIRED that all member RBridges set the same aging time for
   each MAC address. Every time a MAC address is learnt or updated, all
   member RBridges MUST update the record and reset its aging time. It
   is likely that data frames from one source MAC address are received
   continuously.
   Updating the entry of this MAC address locally poses no problem.
   However, when this update is propagated among multiple member
   RBridges, the frequent updates may consume considerable bandwidth.
   Therefore, member RBridges need a communication channel for MAC
   sharing, which can be realized through an extension of ESADI or by
   using a dedicated RBridge Channel [Channel].

7. Failures and Self-healing

   Resilience is a major goal of the active/active edge. From the side
   of the end node, the MC-LAG provides reliability of the access link.
   From the side of the member RBridges, the state change of the
   active/active edge caused by link or node failures is reflected by
   the update of LSPs of member RBridges. This provides self-healing of
   the active/active edge.

7.1. Link Failure

   The failure of a component link of the MC-LAG is translated into an
   ISIS link failure: if a member RBridge is disconnected from the end
   node, it will send out an LSP to announce that it is not connected to
   the pseudonode. This will trigger the update of forwarding tables of
   remote RBridges. Since other member RBridges have also reported their
   connection to the pseudonode, remote RBridges in the TRILL campus can
   send frames to the pseudonode via any other member RBridge.
   Therefore, the reachability of the end node is not broken by this
   link failure.

   If the link connecting the aDRB and the end node fails, the link
   failure will trigger the election of a new aDRB. The new aDRB SHOULD
   reuse the nickname allocated to the pseudonode, which avoids changing
   the locations of MAC addresses of the end node learnt by remote
   RBridges.

   The extreme case is that the last component link of the MC-LAG fails.
   Then the aDRB SHOULD update its LSPs to remove the pseudonode from
   the campus, which also dissolves the whole active/active edge.

7.2. Node Failure

   The node failure of a member RBridge will also be reflected by LSP
   announcements. If the aDRB fails, a new aDRB will be elected and this
   new aDRB SHOULD reuse the nickname of the pseudonode allocated by the
   old aDRB.

8. Reverse Path Forwarding Check

   The Reverse Path Forwarding Check (RPFC) is used by TRILL to suppress
   forwarding loops of multicast frames [RFC6325]. For a specific
   Distribution Tree (DT), a multicast frame from a specific ingress
   RBridge can arrive at only one expected link of an RBridge. RBridges
   MUST drop multicast frames that fail the RPFC [RFC6325].

   When multiple member RBridges ingress multicast frames for VLAN-x of
   the end node simultaneously, there is no guarantee that these frames
   always arrive at the expected link of a remote RBridge. The following
   example explains this issue.

                                  RBi
                                 /   \
                               RB1   RB2
                               /
                             RBv

               Figure 7.1: The Distribution Tree, root=RBi

   Suppose a Distribution Tree for Figure 2.1 (b) is constructed as
   shown in Figure 7.1. For this Distribution Tree, multicast frames
   from RBv to RBi are expected to be received at the port attached to
   RB1. With the active/active connection, RB2 can receive native data
   frames from the MC-LAG as well. If RB2 uses the above Distribution
   Tree, multicast frames from RBv to RBi will be received at the port
   attached to RB2. This causes a problem: these frames will be
   discarded according to the rule of RPFC.

                     RBx                     RBy
                      |                       |
                     RBi                     RBi
                    /   \                   /   \
                  RB1   RB2               RB1   RB2
                  /                               \
                RBv                               RBv

             (a) DT, root=RBx          (b) DT, root=RBy

         Figure 7.2: Assign a Unique Tree to each Member RBridge

   One way to avoid the above issue is to leverage the feature that
   RBridges can compute multiple Distribution Trees. A unique
   Distribution Tree is assigned to each member RBridge for multicast
   frame distribution. These trees are identified by their root RBridge
   nicknames.
   The example in Figure 7.2 illustrates this method, where RB1 and RB2
   adopt two different Distribution Trees.

   The active/active edge needs to assign at least one Distribution Tree
   per component link of an MC-LAG, so the maximum number of component
   links depends on the number of Distribution Trees that all RBridges
   can compute. However, MC-LAGs in current best practice have two
   component links, which is well supported by TRILL switches.

   In [CMT], the Affinity TLV is used to achieve the above assignment of
   Distribution Trees to member RBridges. It is REQUIRED that all
   RBridges in the campus are able to recognize the Affinity TLV and
   compute Distribution Trees as this TLV specifies.

   When there is a link or node failure in the active/active edge, the
   affected Distribution Tree should be re-allocated to another member
   RBridge. It is RECOMMENDED that this re-allocation be incremental. In
   other words, Distribution Trees not affected by the failure SHOULD be
   retained.

9. Security Considerations

   This document raises no new security issues for ISIS.

10. IANA Considerations

   This document requires no IANA actions. RFC Editor: please remove
   this section before publication.

11. References

11.1. Normative References

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6325] R. Perlman, D. Eastlake, et al, "Routing Bridges
             (RBridges): Base Protocol Specification", RFC 6325, July
             2011.

   [RFC6327] D. Eastlake, R. Perlman, et al, "Routing Bridges
             (RBridges): Adjacency", RFC 6327, July 2011.

   [RFC6439] R. Perlman, D. Eastlake, et al, "Routing Bridges
             (RBridges): Appointed Forwarders", RFC 6439, November 2011.

   [Channel] D. Eastlake, V. Manral, et al, "TRILL: RBridge Channel
             Support", draft-ietf-trill-rbridge-channel-08.txt, July
             2012, work in progress.

   [CMT]     T. Senevirathne, J. Pathangi, et al, "Coordinated Multicast
             Trees (CMT) for TRILL", draft-ietf-trill-cmt-01.txt,
             November 2012, work in progress.

11.2. Informative References

   None.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   Email: zhangmingui@huawei.com


   Donald E. Eastlake, 3rd
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com