TRILL                                                         Weiguo Hao
                                                               Yizhou Li
                                                         Donald Eastlake
Internet Draft                                                    Huawei
                                                                S. Hares
                                                 Hickory Hill Consulting
                                                        Muhammad Durrani
                                                                 Brocade
                                                                 H. Zhai
                                                         ZTE Corporation

Intended status: Informational                              May 20, 2014
Expires: November 2014

             Analysis of Active-Active Connection Solutions
             draft-hao-trill-analysis-active-active-02.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, and it may not be
   published except as an Internet-Draft.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to publish it
   as an RFC and to translate it into languages other than English.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on November 20, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   [TRILL-Active-PS] lists the basic problems that any active-active
   solution should address: frame duplication, loops, MAC address
   flip-flop, and unsynchronized information among member RBridges.
   Each problem may be addressed in more than one way, and some
   solutions solve all or most of the listed problems while introducing
   additional issues. This draft analyzes and compares the different
   solutions for each problem and briefly summarizes their pros and
   cons and the scenarios to which they apply.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Frame duplications
   4. Loop
      4.1. Independent nickname allocation
      4.2. Consistent nickname allocation per MC-LAG
      4.3. Consistent nickname allocation per edge group RBridges
      4.4. Comparison
   5. Address flip-flop
      5.1. Data plane learning mode
         5.1.1. CMT
         5.1.2. Centralized replication
         5.1.3. Tunneling among edge RBs
         5.1.4. Comparison
      5.2. Control plane learning mode
   6. Unsynchronized information among member RBridges
      6.1. RBridge channel based communication protocol
      6.2. TRILL LSP extension
      6.3. ESADI extension
      6.4. Comparison
   7. Solution summary
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1. Normative References
      10.2. Informative References

1. Introduction

   The IETF TRILL (Transparent Interconnection of Lots of Links)
   protocol [RFC6325] provides loop-free, per-hop multipath data
   forwarding with minimal configuration. TRILL uses IS-IS [RFC6165]
   [RFC6326bis] as its control plane routing protocol and defines a
   TRILL-specific header for user data.

   Classic Ethernet (CE) devices are typically multi-homed to multiple
   edge RBridges, which form an edge group. All uplinks of a CE are
   bundled as a Multi-Chassis Link Aggregation (MC-LAG). An
   active-active, flow-based load-sharing mechanism is normally used to
   achieve better load balancing and high reliability. A CE device can
   be a layer 3 end system itself or a bridge through which layer 3 end
   systems access the TRILL campus.

                              +------+
                              | CEx  |
                              +------+
                                 |
                              +------+
                              |(RBx) |
                              +------+
                                 |
                     -------------------------
                    /                         \
                   |                           |
                   |       TRILL Campus        |
                   |                           |
                    \                         /
                     -------------------------
                         |       |       |
                  --------       |       --------
                  |              |              |
               +------+       +------+       +------+
               |(RB1) |       |(RB2) |       | (RBk)|
               +------+       +------+       +------+
                 |              |  |              |
                 |  ------------|  |------------  |
                 |  |LAG1                  LAG2|  |
               +------+                      +------+
               | CE1  |                      | CE2  |
               +------+                      +------+
            Figure 1 TRILL Active-Active Access Scenario

   [TRILL-Active-PS] lists the following problems that any
   active-active solution should address:

   1. Frame duplications

   2. Loop

   3. Address flip-flop

   4. Unsynchronized information among member RBridges

   Each problem may be addressed in more than one way, and some
   solutions solve all or most of the listed problems while introducing
   additional issues. This draft analyzes and compares the different
   solutions for each problem and briefly summarizes their pros and
   cons and the scenarios to which they apply. The co-authors believe
   such analysis will help in designing a more complete solution in the
   future.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

   The acronyms and terminology in [RFC6325] are used herein with the
   following additions:

   BUM - Broadcast, Unknown unicast, and Multicast.

   CE - Refer to [CMT]. The device can be either physical or virtual
   equipment.

   CMT - Coordinated Multicast Trees [CMT].

   Edge group - a group of edge RBridges to which at least one CE is
   multiply attached using MC-LAG. When multiple CEs attach to the
   exact same set of edge RBridges, those edge RBridges can be
   considered as a single edge group.
   One RBridge can be in more than one edge group.

   LACP - Link Aggregation Control Protocol.

   LAG - Link Aggregation, as specified in [8021AX].

3. Frame duplications

   Problem:

   Frame duplication may occur when a remote host sends a
   multi-destination frame to a local CE that has an active-active
   connection to the TRILL campus.

   Solution:

   To prevent a local CE from receiving multiple copies of a frame from
   a remote RBridge, a designated forwarder (DF) mechanism should be
   supported. For each VLAN, DF election allows only one port on one RB
   of an MC-LAG to forward multicast traffic from the TRILL campus to
   the local access side. The basic idea is to elect one RBridge per
   VLAN from an edge group to be responsible for egressing the
   multicast traffic.

   Each RB in an edge group elects the DF using the same algorithm,
   which guarantees that the same RB is elected as DF per MC-LAG per
   VLAN. [draft-hao-trill-dup-avoidance-active-active-00] describes the
   DF mechanism and the TRILL protocol extension for DF election in
   detail. The RB elected as DF for a given VLAN forwards
   multi-destination traffic in the egress direction towards the CE.
   All non-DF RBs drop multi-destination traffic in the egress
   direction towards the CE. All edge RBs, DF and non-DF alike, can
   ingress traffic into the TRILL campus as usual. Because DF election
   is per VLAN, the DF ports for different VLANs can be on different
   edge RBs. Egress-bound multicast traffic can therefore be load
   balanced among the edge RBridges of an edge group on a per-VLAN
   basis.

4. Loop

   Problem:

   If a CE sends a broadcast, unknown unicast, or multicast (BUM)
   packet through the DF port to an ingress RB, the RB forwards that
   packet to all or a subset of the other RBridges, which have only
   non-DF ports for that MC-LAG. Because BUM traffic may not be
   forwarded to a non-DF port, the frame does not loop back to the CE
   in this case.
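
   The DF behavior from Section 3 and the DF-port case just described
   can be illustrated with a minimal sketch. The election rule used
   here (sort the member nicknames and index by VLAN ID) is purely an
   assumption for illustration; this document does not define the
   algorithm. The names elect_df and egress_allowed are hypothetical.

```python
def elect_df(members, vlan):
    """Pick one RB per VLAN deterministically (hypothetical rule)."""
    ordered = sorted(members)            # same order on every member RB
    return ordered[vlan % len(ordered)]  # assumption: VLAN modulo group size


def egress_allowed(local_rb, members, vlan):
    """Only the DF may forward BUM traffic from the campus towards the CE."""
    return local_rb == elect_df(members, vlan)


mc_lag1 = ["RB1", "RB2"]   # edge RBs serving CE1 over MC-LAG1

# A BUM frame in VLAN 10 that CE1 sent through its DF port reaches the other
# member RB over the campus; that RB holds only a non-DF port and drops it.
df = elect_df(mc_lag1, 10)
non_df = [rb for rb in mc_lag1 if rb != df]
forwarders = [rb for rb in mc_lag1 if egress_allowed(rb, mc_lag1, 10)]
```

   Because every member sorts the same list, all members agree on the
   DF without exchanging the result, and different VLANs can map to
   different RBs.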

   If a CE sends a BUM packet through a non-DF port to an ingress RB,
   say RB1, then RB1 forwards that packet through the TRILL campus to
   the DF RBridge for the MC-LAG. In this case the frame loops back to
   the CE.

   Solution:

   A split-horizon traffic filtering mechanism should be used to
   prevent frames from looping back among the RBridges in an edge
   group.

   The split-horizon mechanism relies on the ingress nickname to check
   whether a packet's egress port belongs to the same MC-LAG as the
   port through which the packet entered the TRILL campus. The
   following sections describe different nickname allocation schemes.

4.1. Independent nickname allocation

   Each ingress RBridge independently allocates a unique nickname for
   each MC-LAG. The nicknames provisioned on the involved edge RBridges
   for a given MC-LAG are not required to be the same.

   When the ingress RBridge receives BUM traffic from an active-active
   attached CE device, it injects the traffic into the TRILL campus
   using the unique nickname allocated on the ingress RB as the ingress
   nickname.

   When an egress RBridge receives the BUM traffic from the TRILL
   campus, it checks the ingress nickname in the TRILL header and
   filters out the traffic on all local interfaces connected to the
   same CE. Each egress RBridge should track the nickname(s) associated
   with the other RBridge(s) with which it shares a multi-homed LAG.
   This solution has limited nickname allocation scalability, because
   each RBridge needs to allocate one nickname per MC-LAG.

4.2. Consistent nickname allocation per MC-LAG

   The edge RBridges forming an MC-LAG in an edge group are assigned a
   globally unique pseudo-nickname. If multiple MC-LAGs exist, the edge
   RBridges of each individual MC-LAG should be assigned such a
   pseudo-nickname. It should be guaranteed that the pseudo-nickname
   provisioned on all involved edge RBridges is the same for a given
   MC-LAG.
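
   The split-horizon check on an egress RBridge can be sketched as
   follows. The table layout (a mapping from ingress nickname to the
   MC-LAG it identifies) and all names are assumptions made for
   illustration. With independent allocation the table holds one entry
   per peer RBridge and MC-LAG, while with a consistent pseudo-nickname
   it holds one entry per MC-LAG. DF filtering is left out to keep the
   sketch short.

```python
def egress_ports(ingress_nickname, local_ports, nickname_to_lag):
    """Return the local ports a BUM frame from the campus may be sent to.

    local_ports maps port name -> MC-LAG id (None for a non-MC-LAG port).
    nickname_to_lag maps an ingress nickname -> the MC-LAG it identifies.
    """
    source_lag = nickname_to_lag.get(ingress_nickname)
    return [port for port, lag in local_ports.items()
            if lag is None or lag != source_lag]


# Hypothetical state on RB2, attached to CE1 (MC-LAG1) and CE2 (MC-LAG2).
rb2_ports = {"p1": "MC-LAG1", "p2": "MC-LAG2", "p3": None}

# Independent allocation: RB1 uses its own nickname for each MC-LAG.
independent = {"rb1-lag1": "MC-LAG1", "rb1-lag2": "MC-LAG2"}
# Consistent allocation: one pseudo-nickname per MC-LAG on every member.
consistent = {"pn-lag1": "MC-LAG1", "pn-lag2": "MC-LAG2"}

out_independent = egress_ports("rb1-lag1", rb2_ports, independent)
out_consistent = egress_ports("pn-lag1", rb2_ports, consistent)
```

   In both cases the frame that entered through MC-LAG1 is kept off
   port p1, so it cannot return to CE1.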

   When an ingress RBridge receives traffic from an active-active
   attached CE, it performs TRILL encapsulation with the
   pseudo-nickname as the ingress nickname. When the traffic arrives at
   an egress RBridge, the egress RBridge checks the ingress nickname in
   the TRILL header and, based on the pseudo-nickname, filters out the
   traffic on all local interfaces connected to the same CE.

4.3. Consistent nickname allocation per edge group RBridges

                      +-----------+
                      |   (RB4)   |
                      +-----------+
                        |   |   |
           --------------   |   --------------
           |                |                |
        +------+        +-------+         +------+
        |(RB1) |        | (RB2) |         | (RB3)|
        +------+        +-------+         +------+
         *   |           *  |  ^           * |  ^
         *   |           *  |  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
         *   ------------*-----------------*-|           ^
         *********************************** |           ^
 MC-LAG1 *                           MC-LAG2 |   MC-LAG3 ^
      +------+                            +------+    +------+
      | CE1  |                            | CE2  |    | CE3  |
      +------+                            +------+    +------+
     Figure 2 Consistent nickname allocation per edge group RBridges
                                scenario

   An edge group forming one or more MC-LAGs is assigned a globally
   unique pseudo-nickname. All MC-LAGs of the edge group share the same
   pseudo-nickname to save nickname space. It should be guaranteed that
   the pseudo-nickname provisioned on all involved edge RBridges in an
   edge group is the same.

   In Figure 2, CE1 and CE2 are active-active attached to RB1, RB2, and
   RB3, and CE3 is active-active attached to RB2 and RB3. The globally
   unique pseudo-nickname p-nick1 is assigned to the edge group
   containing RB1, RB2, and RB3, and p-nick2 is assigned to the edge
   group containing RB2 and RB3. P-nick1 is used for MC-LAG1 and
   MC-LAG2; p-nick2 is used for MC-LAG3. Because only one
   pseudo-nickname is assigned for MC-LAG1 and MC-LAG2, nickname
   consumption is lower than with consistent nickname allocation per
   MC-LAG.
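
   The nickname consumption of the three schemes for the Figure 2
   topology can be compared with a short calculation. The counting
   rules are a simplified reading of Sections 4.1 to 4.3, and the
   function names are hypothetical.

```python
# MC-LAG membership taken from Figure 2.
mc_lags = {
    "MC-LAG1": {"RB1", "RB2", "RB3"},
    "MC-LAG2": {"RB1", "RB2", "RB3"},
    "MC-LAG3": {"RB2", "RB3"},
}


def independent_count(lags):
    # Section 4.1: every member RBridge allocates its own nickname per MC-LAG.
    return sum(len(members) for members in lags.values())


def per_mc_lag_count(lags):
    # Section 4.2: one pseudo-nickname per MC-LAG.
    return len(lags)


def per_edge_group(lags):
    # Section 4.3: MC-LAGs with the same member set share a pseudo-nickname.
    # Numbering larger groups first is an arbitrary choice for this sketch.
    groups = sorted({frozenset(m) for m in lags.values()}, key=len, reverse=True)
    names = {g: f"p-nick{i}" for i, g in enumerate(groups, start=1)}
    return {lag: names[frozenset(m)] for lag, m in lags.items()}


assignment = per_edge_group(mc_lags)
counts = (independent_count(mc_lags), per_mc_lag_count(mc_lags),
          len(set(assignment.values())))
```

   For this topology the three schemes consume 8, 3, and 2 nicknames
   respectively.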

   If one or more uplinks of a CE fail, the CE is then connected to a
   new edge group of RBs. At that point, the pseudo-nickname of the new
   edge group is used as the ingress nickname for the CE's traffic.

   Take the topology in Figure 2 as an example. If the link between CE1
   and RB1 fails, CE1 is connected only to the edge group containing
   RB2 and RB3, so p-nick2 is used as the ingress nickname for CE1. If
   RB1 itself fails, both CE1 and CE2 are connected to the remaining
   edge RBs, RB2 and RB3, and p-nick2 is used as the ingress nickname
   for all of CE1, CE2, and CE3.

   To speed up network convergence, each edge RBridge in an edge group
   should detect access link failures and edge node failures as quickly
   as possible.

4.4. Comparison

   +----------------------+-------------+-------------+-------------+
   | Solution             | Independent | Consistent  | Consistent  |
   |                      | Allocation  | Allocation  | Allocation  |
   |                      |             | per MC-LAG  | per Edge    |
   |                      |             |             | Group       |
   +----------------------+-------------+-------------+-------------+
   | Nickname consumption | High        | Medium      | Low         |
   +----------------------+-------------+-------------+-------------+
   | Scalability          | Low         | Medium      | High        |
   +----------------------+-------------+-------------+-------------+

5. Address flip-flop

   MAC learning in TRILL can be performed in either the data plane or
   the control plane. When a local host h1 is attached to multiple edge
   RBridges, remote RBridges learning h1's MAC address may experience
   MAC flip-flop.

   There are different ways to avoid this in the data plane learning
   and control plane learning scenarios.

5.1. Data plane learning mode

   Problem:

   For data plane learning mode, a pseudo-nickname solution [TRILLPN]
   has been proposed to avoid MAC address flip-flop on remote RBs. The
   basic idea is to use a virtual RBridge, RBv, with a single
   pseudo-nickname to represent the edge group to which an MC-LAG
   connects. Any member RBridge of that edge group should use this
   pseudo-nickname rather than its own nickname as the ingress nickname
   when it injects TRILL data frames into the TRILL campus. The
   pseudo-nickname solves the address flip-flop issue because the MAC
   address learned by the remote RBridge is bound to the
   pseudo-nickname.

   If the DF election mechanism is used for frame duplication
   prevention, access ports on an RB fall into three types: non-MC-LAG
   ports, MC-LAG DF ports, and MC-LAG non-DF ports. The last two types
   are both called MC-LAG ports. Each MC-LAG port has an associated
   pseudo-nickname. If consistent nickname allocation per edge group is
   used, the same pseudo-nickname may be associated with more than one
   port on a single RB. A typical scenario is that CE1 is connected to
   RB1 and RB2 by MC-LAG1 while CE2 is connected to RB1 and RB2 by
   MC-LAG2. To reduce the number of pseudo-nicknames used, the member
   ports of both MC-LAG1 and MC-LAG2 on RB1 and RB2 are all associated
   with pseudo-nickname pn1.

   On the other hand, the pseudo-nickname introduces another issue:
   packets incorrectly dropped because of RPF check failure. Because
   edge RBridges use a pseudo-nickname (e.g., Nick-Y) rather than their
   own nicknames as the ingress nickname when forwarding BUM traffic
   from a local CE, an RBridge (RBn) sitting between the ingress RB and
   the distribution tree root treats the traffic as having ingress
   point RBv. If these different edge RBridges use the same
   distribution tree, the traffic may arrive at RBn on different ports.
   The RPF check then fails, and traffic received on unexpected ports
   is dropped by RBn.

   Solutions:

   To overcome the RPF check failure issue, three solutions have been
   proposed: CMT, centralized replication, and tunneling among edge
   RBs. The local replication behavior on the ingress RBridge must take
   all of the above access port types into account and may differ among
   the three solutions. The following subsections give more details.

5.1.1. CMT

   The CMT [CMT] solution allows edge RBridges to specify different
   distribution trees for forwarding BUM traffic from an attached CE
   device by using a new IS-IS Affinity sub-TLV. Remote RBridges
   calculate their forwarding tables and derive the RPF for
   distribution trees based on the distribution tree association
   advertisements. BUM traffic injected into the TRILL campus by an
   ingress RB does not return to that ingress RB.

   When an ingress RBridge, RB1, receives BUM traffic from an
   active-active attached device CE1, the local replication behavior on
   RB1 is as follows:

   1. Replicate locally to non-MC-LAG ports as per RFC6325.

   2. Replicate locally to the ports associated with the same
      pseudo-nickname as the incoming port, as per RFC6325.

   3. Replicate locally to MC-LAG DF ports associated with a different
      pseudo-nickname, as per RFC6325. Do not replicate to MC-LAG
      non-DF ports associated with a different pseudo-nickname.

   This local forwarding behavior on ingress RB1 is called CMT local
   forwarding behavior.

   This solution requires multiple distribution trees to be established
   in a TRILL campus; for example, if a CE is active-active attached to
   4 edge RBridges, at least 4 distribution trees are required. No
   hardware upgrade is needed for RBridges in the TRILL campus; only a
   software upgrade is needed.

5.1.2. Centralized replication

   In this solution, all ingress RBs send BUM traffic received from a
   local active-active attached CE to a centralized node via unicast
   TRILL encapsulation. When the centralized node receives the BUM
   traffic, it decapsulates it and forwards it to all destination RBs
   using a distribution tree established via the TRILL base protocol.
   To avoid RPF check failure on an RBridge sitting between the ingress
   RBridge and the centralized replication node, the RPF calculation
   algorithm must be changed: each RBridge should perform the RPF
   calculation using the centralized node, instead of the real ingress
   RBridge RBv, as the ingress RB. The BUM traffic injected into the
   TRILL campus by an ingress RB returns to that ingress RB via the
   distribution tree established as per the TRILL base protocol.
   [draft-hao-trill-centralized-replication-00] describes the
   centralized replication solution in detail.

   When ingress RBridge RB1 receives BUM traffic from an active-active
   attached device CE1, one copy of the traffic is forwarded locally to
   other CE devices attached via MC-LAG ports that share the same
   pseudo-nickname as the port connecting to CE1, and another copy is
   sent to the centralized node via unicast TRILL encapsulation. The
   centralized node then replicates the traffic and forwards it to all
   destination RBridges, including RB1 itself, along the TRILL
   distribution tree established as per the TRILL base protocol. When
   RB1 receives the TRILL multicast traffic, it decapsulates it and
   forwards it to all local CE devices except CE1 that are attached to
   RB1 via non-MC-LAG ports or MC-LAG DF ports. The traffic is not
   forwarded to CE devices attached to RB1 via MC-LAG non-DF ports.

   In summary, the local replication behavior on RB1 is as follows:

   1. Replicate locally to the ports associated with the same
      pseudo-nickname as the incoming port, as per RFC6325.

   2. Do not replicate to MC-LAG ports associated with a different
      pseudo-nickname.

   3. Do not replicate to non-MC-LAG ports.

   This local forwarding behavior on ingress RB1 is called centralized
   local forwarding behavior; it differs from CMT local forwarding
   behavior.

   If ingress RB1 is itself the centralized node, BUM traffic injected
   into the TRILL campus does not loop back to RB1. In this case, the
   local forwarding behavior is the same as CMT local forwarding
   behavior.

   This solution consumes more network bandwidth between the ingress RB
   and the distribution tree root node than the CMT solution. Both
   hardware and software upgrades are required on the edge RBs
   participating in the active-active connection and on the
   distribution tree root node. This solution does not require multiple
   distribution trees in the TRILL campus.

5.1.3. Tunneling among edge RBs

   In this solution, only one selected edge RBridge in an edge group
   participating in active-active access is responsible for forwarding
   BUM traffic from an attached CE into the TRILL campus along a
   distribution tree as per the TRILL base protocol. All other edge
   RBridges in the edge group send BUM traffic from the attached CE to
   the selected edge RBridge through unicast TRILL encapsulation. When
   the selected edge RBridge receives unicast TRILL traffic from an
   RBridge in the same edge group, say RB1, it decapsulates the unicast
   TRILL packet. It then forwards the BUM traffic with TRILL multicast
   encapsulation into the TRILL campus along the distribution tree
   established as per the TRILL protocol.

   The traffic reaches all destination RBridges and loops back to
   ingress RBridge RB1, as in the centralized replication solution, so
   the local forwarding behavior on RB1 is the same as the centralized
   local forwarding behavior.

   If ingress RBridge RB1 is the selected RBridge, the BUM traffic
   injected into the TRILL campus does not loop back to RB1, and the
   local forwarding behavior is the same as the CMT local forwarding
   behavior.

   This solution consumes more network bandwidth among edge RBs. Both
   hardware and software upgrades are required on the edge RBs
   participating in the active-active connection. This solution needs
   only one distribution tree in the TRILL campus.

5.1.4. Comparison

   Data Plane Mode:

   +-------------------+---------+------------------+------------------+
   | Solution          | CMT     | Centralized      | Tunneling among  |
   |                   |         | replication      | edge RBs         |
   +-------------------+---------+------------------+------------------+
   | Dist trees        | N       | 1                | 1                |
   | required for      |         |                  |                  |
   | N-active scenario |         |                  |                  |
   +-------------------+---------+------------------+------------------+
   | Network bandwidth | Low     | High             | High             |
   | consumption       |         |                  |                  |
   +-------------------+---------+------------------+------------------+
   | Local forwarding  | CMT     | Ingress RB is    | Ingress RB is    |
   | behavior on       |         | the centralized  | selected RB: CMT |
   | ingress RB        |         | node: CMT        | Other ingress    |
   |                   |         | Other ingress    | RB: centralized  |
   |                   |         | RB: centralized  |                  |
   +-------------------+---------+------------------+------------------+
   | Software upgrade  | All RBs | All RBs          | root and edge    |
   |                   |         |                  | nodes            |
   +-------------------+---------+------------------+------------------+
   | Hardware upgrade  | No      | root and edge    | root and edge    |
   |                   |         | nodes            | nodes            |
   +-------------------+---------+------------------+------------------+

5.2. Control plane learning mode

   If a CE device is multi-homed to multiple edge RBs in active-active
   mode, each edge RB should announce the MAC addresses of its attached
   end systems to all other RBs through an ESADI-like control protocol.
   Remote RBridges learn the MAC address in association with the
   different ingress RB nicknames and generate multiple MAC forwarding
   entries in ECMP mode. All edge RBs should disable the data plane MAC
   learning function; the MAC-to-nickname association should be learned
   only through the control plane.

   The pseudo-nickname mechanism was designed to avoid MAC address
   learning flip-flop when a MAC address could be learned from more
   than one RBridge. With control plane MAC learning, a pseudo-nickname
   is not required, since multiple MAC-to-nickname entries can be
   learned for the same MAC. The RPF check failure for multicast frames
   caused by the pseudo-nickname mechanism is therefore not an issue
   here.

   In the control plane MAC learning solution, if an edge RB
   participating in TRILL active-active access receives BUM traffic
   from an attached CE device, it uses its own nickname instead of a
   pseudo-nickname as the ingress nickname when ingressing the data
   frame into the TRILL campus.

   This method requires hardware and software changes.

6. Unsynchronized information among member RBridges

   Problem:

   A local RBridge, say RB1 in MC-LAG1, may have learned a VLAN and MAC
   to nickname correspondence for a remote host h1 when h1 sends a
   packet to CE1. The returning traffic from CE1 may go to any other
   member RBridge of MC-LAG1, for example RB2. To prevent RB2 from
   always flooding this unicast traffic, MAC addresses should be
   synchronized among the edge RBridges in an edge group.
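
   The effect of MAC synchronization on RB2 can be sketched as follows.
   The table layout, the MAC address value, and the function names are
   assumptions made for illustration and do not correspond to any
   particular synchronization protocol.

```python
def forward(table, vlan, dst_mac):
    """Unicast to the learned egress nickname, or flood when unknown."""
    nickname = table.get((vlan, dst_mac))
    return ("unicast", nickname) if nickname else ("flood", None)


def synchronize(source_table, peer_tables):
    """Copy learned (VLAN, MAC) -> nickname entries to edge group peers."""
    for peer in peer_tables:
        peer.update(source_table)


H1 = "00:00:5e:00:53:01"              # example MAC standing in for h1
rb1_table = {(10, H1): "RBx"}         # RB1 learned h1 behind remote RBx
rb2_table = {}                        # RB2 has seen no frame from h1

before = forward(rb2_table, 10, H1)   # CE1's reply arrives at RB2
synchronize(rb1_table, [rb2_table])
after = forward(rb2_table, 10, H1)
```

   Before synchronization RB2 has to flood CE1's reply; afterwards it
   can send the reply as unicast towards RBx.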
   To ensure DF election consistency, VLANs joined dynamically through
   a VLAN registration protocol (VRP) (GVRP or MVRP) and multicast
   groups joined through IGMP or MLD should be synchronized among all
   RBridges in an edge group.

   Solution:

   A synchronization mechanism should be provided to ensure information
   consistency among all edge RBridges in an edge group. Three
   synchronization solutions are described below.

6.1. RBridge channel based communication protocol

   An RBridge channel based communication protocol among all RBridges
   in an edge group is introduced to implement synchronization. The
   communication protocol is restricted to the RBridges in each edge
   group; other RBridges in the TRILL campus need not be involved. A
   new type of RBridge Channel message should be identified by a value
   of the Protocol field in the RBridge Channel Header to indicate that
   the payload carries synchronization information. RBridge channel
   messages are forwarded through the TRILL data plane, so transmission
   delay is relatively low.

6.2. TRILL LSP extension

   TRILL LSPs can be extended to implement synchronization among all
   edge RBridges. Synchronization information is conveyed through new
   TLVs or sub-TLVs in TRILL LSPs. Because TRILL LSPs are flooded to
   all RBridges in the TRILL campus, this may cause campus-wide
   fluctuation. TRILL LSPs are forwarded through the control plane, so
   transmission delay is relatively high.

6.3. ESADI extension

   TRILL ESADI can be extended to implement synchronization among all
   edge RBridges. Currently ESADI supports only MAC synchronization; it
   doesn't support synchronization of VLAN and multicast group
   information. As with the RBridge channel based communication
   protocol, ESADI messages are forwarded through the TRILL data plane,
   so transmission delay is relatively low.

6.4. Comparison

+----------------+-----------------+---------------+-----------------+
| Solution       | RBridge channel | TRILL LSP     | ESADI extension |
|                | based           | extension     |                 |
+----------------+-----------------+---------------+-----------------+
| Flooding scope | Edge group      | Campus wide   | Edge group      |
+----------------+-----------------+---------------+-----------------+
| Forwarding     | Data plane      | Control plane | Data plane      |
+----------------+-----------------+---------------+-----------------+

7. Solution summary

   The possible mechanisms for each individual problem listed in
   [TRILAA] are described and compared in this document. Readers can
   compile a complete solution from these mechanisms.

   If there are multiple mechanisms for an individual problem, readers
   can pick the most appropriate one for their scenario. For example,
   to solve the MAC address flip-flop problem, if control plane
   learning cannot be supported, the pseudo-nickname mechanism with
   data plane MAC learning should be used.

   When a mechanism is used to solve an individual problem, additional
   issues may be introduced, and a complete solution should be
   carefully designed to solve those non-generic issues. For example,
   when the pseudo-nickname mechanism is used to solve the MAC address
   flip-flop problem, an RPF check failure issue is incurred. Three
   mechanisms (CMT, centralized replication, and tunneling among edge
   RBs) can be used to solve the RPF check failure issue. Whichever one
   is used, the local forwarding behavior on ingress RBridges should be
   carefully designed to ensure that BUM traffic is not duplicated or
   looped back to the ingress RBridge's locally connected CE devices.
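
   The selection logic in the two examples above can be written down
   as a small sketch. The mechanism names come from this document; the
   function name, its parameters, the default choice of CMT, and the
   assumption that control plane MAC learning is preferred whenever it
   is supported are illustrative only and are not requirements of this
   document.

```python
# Mechanism names are taken from this document; the function, its
# parameters, and the selection policy are illustrative assumptions.
RPF_FIXES = ("CMT", "centralized replication", "tunneling among edge RBs")


def select_flip_flop_mechanisms(control_plane_learning_supported, rpf_fix="CMT"):
    """Return the mechanisms needed to handle MAC address flip-flop."""
    if control_plane_learning_supported:
        # No pseudo-nickname is used, so no RPF check failure arises.
        return ["control plane MAC learning"]
    if rpf_fix not in RPF_FIXES:
        raise ValueError("unknown RPF check failure mechanism: %r" % (rpf_fix,))
    # The pseudo-nickname introduces the RPF check failure issue, so one
    # of the three fixes has to accompany it.
    return ["pseudo-nickname with data plane MAC learning", rpf_fix]
```

   Whichever RPF fix is returned, the local forwarding behavior on the
   ingress RBridge still has to be designed separately, as noted above.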
   In summary, the candidate mechanisms for each problem are listed
   below.

+----------------+-----------------------------------------------------+
| Problem        | Mechanisms                                          |
+----------------+-----------------------------------------------------+
| Frame          | DF election                                         |
| duplication    |                                                     |
+----------------+-----------------------------------+-----------------+
| Loop           | Data plane MAC learning           | Control plane   |
|                +-----+-------------+---------------+ MAC learning    |
|                | CMT | Centralized | Tunneling     |                 |
|                |     | replication | among edge    |                 |
|                |     |             | RBs           |                 |
+----------------+-----+-----------+-+---------------+-----------------+
| Address        | Independent     | Consistent      | Consistent      |
| flip-flop      | alloc           | alloc per LAG   | alloc per       |
|                |                 |                 | edge group      |
+----------------+-----------------+-----------------+-----------------+
| Unsynchronized | RBridge channel | LSP extension   | ESADI extension |
| information    | based           |                 |                 |
+----------------+-----------------+-----------------+-----------------+

8. Security Considerations

   This draft does not introduce any extra security risks. For general
   TRILL security considerations, see [RFC6325].

9. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

10. References

10.1. Normative References

   [1]  [RFC6165] Banerjee, A. and D. Ward, "Extensions to IS-IS for
        Layer-2 Systems", RFC 6165, April 2011.

   [2]  [RFC6325] Perlman, R., et al., "Routing Bridges (RBridges):
        Base Protocol Specification", RFC 6325, July 2011.

   [3]  [RFC6326bis] Eastlake, D., Banerjee, A., Dutt, D., Perlman, R.,
        and A. Ghanwani, "TRILL Use of IS-IS",
        draft-eastlake-isis-rfc6326bis, Work in Progress.

10.2. Informative References

   [4]  [TRILAA] Li, Y., et al., "Problem Statement and Goals for
        Active-Active TRILL Edge",
        draft-ietf-trill-active-active-connection-prob-03, Work in
        Progress, May 2014.

   [5]  [TRILLPN] Zhai, H., et al., "RBridge: Pseudonode Nickname",
        draft-hu-trill-pseudonode-nickname, Work in Progress, November
        2011.

   [6]  [CMT] Senevirathne, T., Pathangi, J., and J. Hudson,
        "Coordinated Multicast Trees (CMT) for TRILL",
        draft-ietf-trill-cmt-03.txt, Work in Progress, April 2014.

   [7]  [RFC7178] Eastlake, D., Manral, V., Li, Y., Aldrin, S., and D.
        Ward, "Transparent Interconnection of Lots of Links (TRILL):
        RBridge Channel Support", RFC 7178, May 2014.

   [8]  [RFC6439] Perlman, R., Eastlake, D., Li, Y., Banerjee, A., and
        F. Hu, "Routing Bridges (RBridges): Appointed Forwarders", RFC
        6439, November 2011.

   [9]  [ESADI] Zhai, H., Hu, F., et al., "TRILL (Transparent
        Interconnection of Lots of Links): ESADI (End Station Address
        Distribution Information) Protocol",
        draft-ietf-trill-esadi-05.txt, Work in Progress, February 2014.

Authors' Addresses

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China
   Phone: +86-25-56623144
   Email: haoweiguo@huawei.com

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China
   Phone: +86-25-56625375
   Email: liyizhou@huawei.com

   Susan Hares
   Hickory Hill Consulting
   7453 Hickory Hill
   Saline, CA 48176
   USA
   Email: shares@ndzh.com

   Muhammad Durrani
   Brocade Communications Systems, Inc.
   Email: mdurrani@Brocade.com

   Hongjun Zhai
   ZTE Corporation
   68 Zijinghua Road
   Nanjing 200012
   China
   Phone: +86-25-52877345
   Email: zhai.hongjun@zte.com.cn