idnits 2.17.1 draft-ietf-l2vpn-pbb-evpn-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == The page length should not exceed 58 lines per page, but there was 10 longer pages, the longest (page 8) being 64 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? RFC 2119 keyword, line 483: '...he B-MAC address MUST be the same for ...' RFC 2119 keyword, line 510: '...me of CE auto-discovery), the MES MUST...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 27, 2012) is 4442 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'TRILL' is mentioned on line 237, but not defined == Missing Reference: 'RFC4364' is mentioned on line 334, but not defined == Outdated reference: A later version (-01) exists of draft-sajassi-raggarwa-l2vpn-evpn-req-00 == Outdated reference: A later version (-04) exists of draft-raggarwa-sajassi-l2vpn-evpn-01 Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Working Group Ali Sajassi 3 Internet Draft Samer Salam 4 Category: Standards Track Sami Boutros 5 Cisco 6 Florin Balus 7 Wim Henderickx Nabil Bitar 8 Alcatel-Lucent Verizon 10 Clarence Filsfils Aldrin Issac 11 Dennis Cai Bloomberg 12 Cisco 13 Lizhong Jin 14 ZTE 16 Expires: August 27, 2012 February 27, 2012 18 PBB E-VPN 19 draft-ietf-l2vpn-pbb-evpn-00.txt 21 Status of this Memo 23 This Internet-Draft is submitted to IETF in full conformance with 24 the provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF), its areas, and its working groups. Note that 28 other groups may also distribute working documents as Internet- 29 Drafts. 31 Internet-Drafts are draft documents valid for a maximum of six 32 months and may be updated, replaced, or obsoleted by other documents 33 at any time. It is inappropriate to use Internet-Drafts as 34 reference material or to cite them other than as "work in progress." 
36 The list of current Internet-Drafts can be accessed at 37 http://www.ietf.org/ietf/1id-abstracts.txt 39 The list of Internet-Draft Shadow Directories can be accessed at 40 http://www.ietf.org/shadow.html 42 This Internet-Draft will expire on April 28, 2012. 44 Copyright Notice 46 Copyright (c) 2011 IETF Trust and the persons identified as the 47 document authors. All rights reserved. 49 This document is subject to BCP 78 and the IETF Trust's Legal 50 Provisions Relating to IETF Documents 51 (http://trustee.ietf.org/license-info) in effect on the date of 52 publication of this document. Please review these documents 53 carefully, as they describe your rights and restrictions with 54 respect to this document. Code Components extracted from this 55 document must include Simplified BSD License text as described in 56 Section 4.e of the Trust Legal Provisions and are provided without 57 warranty as described in the Simplified BSD License. 59 Abstract 60 This document discusses how Ethernet Provider Backbone Bridging 61 [802.1ah] can be combined with E-VPN in order to reduce the number 62 of BGP MAC advertisement routes by aggregating Customer/Client MAC 63 (C-MAC) addresses via Provider Backbone MAC address (B-MAC), provide 64 client MAC address mobility using C-MAC aggregation and B-MAC sub- 65 netting, confine the scope of C-MAC learning to only active flows, 66 offer per site policies and avoid C-MAC address flushing on topology 67 changes. The combined solution is referred to as PBB-EVPN. 69 Conventions 71 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 72 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 73 document are to be interpreted as described in RFC 2119 75 Table of Contents 77 1. Introduction.................................................... 3 78 2. Contributors.................................................... 4 79 3. Terminology..................................................... 4 80 4. 
Requirements.................................................... 4 81 4.1. MAC Advertisement Route Scalability........................... 4 82 4.2. C-MAC Mobility with MAC Sub-netting........................... 5 83 4.3. C-MAC Address Learning and Confinement........................ 5 84 4.4. Interworking with TRILL and 802.1aq Access Networks with C-MAC 85 Address Transparency............................................... 5 86 4.5. Per Site Policy Support....................................... 6 87 4.6. Avoiding C-MAC Address Flushing............................... 6 88 5. Solution Overview............................................... 6 89 6. BGP Encoding.................................................... 7 90 6.1. BGP MAC Advertisement Route................................... 7 91 6.2. Ethernet Auto-Discovery Route................................. 8 92 6.3. Per VPN Route Targets......................................... 8 93 6.4. MAC Mobility Extended Community............................... 8 94 7. Operation....................................................... 8 95 7.1. MAC Address Distribution over Core............................ 8 96 7.2. Device Multi-homing........................................... 8 97 7.2.1. MES MAC Layer Addressing & Multi-homing..................... 8 98 7.2.2. Split Horizon and Designated Forwarder Election............ 11 99 7.3. Network Multi-homing......................................... 11 100 7.3.1. B-MAC Address Advertisement................................ 12 101 7.3.2. Failure Handling........................................... 12 102 7.4. Frame Forwarding............................................. 13 103 7.4.1. Unicast.................................................... 13 104 7.4.2. Multicast/Broadcast........................................ 14 105 8. Minimizing ARP Broadcast....................................... 14 106 9. Seamless Interworking with TRILL and IEEE 802.1aq/802.1Qbp..... 14 107 9.1. 
TRILL Nickname Advertisement Route........................... 15 108 9.2. IEEE 802.1aq / 802.1Qbp B-MAC Advertisement Route............ 16 109 9.3. Operation.................................................... 16 110 10. Solution Advantages........................................... 17 111 10.1. MAC Advertisement Route Scalability......................... 18 112 10.2. C-MAC Mobility with MAC Sub-netting......................... 18 113 10.3. C-MAC Address Learning and Confinement...................... 18 114 10.4. Interworking with TRILL and 802.1aq Access Networks with C-MAC 115 Address Transparency.............................................. 18 116 10.5. Per Site Policy Support..................................... 19 117 10.6. Avoiding C-MAC Address Flushing............................. 19 118 11. Acknowledgements.............................................. 20 119 12. Security Considerations....................................... 20 120 13. IANA Considerations........................................... 20 121 14. Intellectual Property Considerations.......................... 20 122 15. Normative References.......................................... 20 123 16. Informative References........................................ 20 124 17. Authors' Addresses............................................ 20 126 1. 127 Introduction 129 [E-VPN] introduces a solution for multipoint L2VPN services with 130 advanced multi-homing capabilities using BGP for distributing 131 customer/client MAC address reach-ability information over the core 132 MPLS/IP network. [802.1ah] defines an architecture for Ethernet 133 Provider Backbone Bridging (PBB), where MAC tunneling is employed to 134 improve service instance and MAC address scalability in Ethernet 135 networks and in VPLS networks [PBB-VPLS]. 
137 In this document, we discuss how PBB can be combined with E-VPN in 138 order to reduce the number of BGP MAC advertisement routes by 139 aggregating Customer/Client MAC (C-MAC) addresses via Provider 140 Backbone MAC address (B-MAC), provide client MAC address mobility 141 using C-MAC aggregation and B-MAC sub-netting, confine the scope of 142 C-MAC learning to only active flows, offer per site policies and 143 avoid C-MAC address flushing on topology changes. The combined 144 solution is referred to as PBB-EVPN. 146 2. 147 Contributors 149 In addition to the authors listed above, the following individuals 150 also contributed to this document. 152 Keyur Patel 153 Cisco 155 3. 156 Terminology 158 BEB: Backbone Edge Bridge 159 B-MAC: Backbone MAC Address 160 CE: Customer Edge 161 C-MAC: Customer/Client MAC Address 162 DHD: Dual-homed Device 163 DHN: Dual-homed Network 164 LACP: Link Aggregation Control Protocol 165 LSM: Label Switched Multicast 166 MDT: Multicast Delivery Tree 167 MES: MPLS Edge Switch 168 MP2MP: Multipoint to Multipoint 169 P2MP: Point to Multipoint 170 P2P: Point to Point 171 PoA: Point of Attachment 172 PW: Pseudowire 173 E-VPN: Ethernet VPN 175 4. 176 Requirements 178 The requirements for PBB-EVPN include all the requirements for E-VPN 179 that were described in [EVPN-REQ], in addition to the following: 181 4.1. 182 MAC Advertisement Route Scalability 184 In typical operation, an [E-VPN] MES sends a BGP MAC Advertisement 185 Route per customer/client MAC (C-MAC) address. In certain 186 applications, this poses scalability challenges, as is the case in 187 virtualized data center environments where the number of virtual 188 machines (VMs), and hence the number of C-MAC addresses, can be in 189 the millions. In such scenarios, it is required to reduce the number 190 of BGP MAC Advertisement routes by relying on a MAC 'summarization' 191 scheme, as is provided by PBB. 
Note that the MAC sub-netting 192 capability already built into E-VPN is not sufficient in those 193 environments, as will be discussed next. 195 4.2. 196 C-MAC Mobility with MAC Sub-netting 198 Certain applications, such as virtual machine mobility, require 199 support for fast C-MAC address mobility. For these applications, it 200 is not possible to use MAC address sub-netting in E-VPN, i.e. to 201 advertise reach-ability to a MAC address prefix. Rather, the exact 202 virtual machine MAC address needs to be transmitted in the BGP MAC 203 Advertisement route. Otherwise, traffic would be forwarded to the 204 wrong segment when a virtual machine moves from one Ethernet segment 205 to another. This hinders the scalability benefits of sub-netting. 207 It is required to support C-MAC address mobility, while retaining 208 the scalability benefits of MAC sub-netting. This can be achieved by 209 leveraging PBB technology, which defines a Backbone MAC (B-MAC) 210 address space that is independent of the C-MAC address space: 211 C-MAC addresses are aggregated via a B-MAC address, and 212 sub-netting is then applied to B-MAC addresses. 214 4.3. 215 C-MAC Address Learning and Confinement 217 In E-VPN, all the MES nodes participating in the same E-VPN instance 218 are exposed to all the C-MAC addresses learnt by any one of these 219 MES nodes because a C-MAC learned by one of the MES nodes is 220 advertised in BGP to other MES nodes in that E-VPN instance. This is 221 the case even if some of the MES nodes for that E-VPN instance are 222 not involved in forwarding traffic to, or from, these C-MAC 223 addresses. Even if an implementation does not install hardware 224 forwarding entries for C-MAC addresses that are not part of active 225 traffic flows on that MES, the device memory is still consumed by 226 keeping a record of the C-MAC addresses in the routing table (RIB). In 227 network applications with millions of C-MAC addresses, this 228 introduces a non-trivial waste of MES resources. 
As such, it is 229 required to confine the scope of visibility of C-MAC addresses only 230 to those MES nodes that are actively involved in forwarding traffic 231 to, or from, these addresses. 233 4.4. 234 Interworking with TRILL and 802.1aq Access Networks with C-MAC 235 Address Transparency 237 [TRILL] and [802.1aq] define next generation Ethernet bridging 238 technologies that offer optimal forwarding using IS-IS control 239 plane, and C-MAC address transparency via Ethernet tunneling 240 technologies. When access networks based on TRILL or 802.1aq are 241 interconnected over an MPLS/IP network, it is required to guarantee 242 C-MAC address transparency on the hand-off point and the edge (i.e. 243 MES) of the MPLS network. As such, solutions that require 244 termination of the access data-plane encapsulation (i.e. TRILL or 245 802.1aq) at the hand-off to the MPLS network do not meet this 246 transparency requirement, and expose the MPLS edge devices to the 247 MAC address scalability problem. 249 PBB-EVPN supports seamless interconnect with these next generation 250 Ethernet solutions while guaranteeing C-MAC address transparency on 251 the MES nodes. 253 4.5. 254 Per Site Policy Support 256 In many applications, it is required to be able to enforce 257 connectivity policy rules at the granularity of a site (or segment). 258 This includes the ability to control which MES nodes in the network 259 can forward traffic to, or from, a given site. PBB-EVPN is capable 260 of providing this granularity of policy control. In the case where 261 per C-MAC address granularity is required, the EVI can always 262 continue to operate in E-VPN mode. 264 4.6. 265 Avoiding C-MAC Address Flushing 267 It is required to avoid C-MAC address flushing upon link, port or 268 node failure for multi-homed devices and networks. This is in order 269 to speed up re-convergence upon failure. 271 5. 
272 Solution Overview 274 The solution involves incorporating IEEE 802.1ah Backbone Edge 275 Bridge (BEB) functionality on the E-VPN MES nodes, similar to 276 PBB-VPLS PEs [PBB-VPLS] where BEB functionality is incorporated in PE 277 nodes. The MES devices would then receive 802.1Q Ethernet frames 278 from their attachment circuits, encapsulate them in the PBB header 279 and forward the frames over the IP/MPLS core. On the egress E-VPN 280 MES, the PBB header is removed following the MPLS disposition, and 281 the original 802.1Q Ethernet frame is delivered to the customer 282 equipment. 284 BEB +--------------+ BEB 285 || | | || 286 \/ | | \/ 287 +----+ AC1 +----+ | | +----+ +----+ 288 | CE1|-----| | | | | |---| CE2| 289 +----+\ |MES1| | IP/MPLS | |MES3| +----+ 290 \ +----+ | Network | +----+ 291 \ | | 292 AC2\ +----+ | | 293 \| | | | 294 |MES2| | | 295 +----+ | | 296 /\ +--------------+ 297 || 298 BEB 299 <-802.1Q-> <------PBB over MPLS------> <-802.1Q-> 301 Figure 1: PBB-EVPN Network 303 The MES nodes perform the following functions: 304 - Learn customer/client MAC addresses (C-MACs) over the attachment 305 circuits in the data-plane, per normal bridge operation. 307 - Learn remote C-MAC to B-MAC bindings in the data-plane from 308 traffic arriving from the core per [802.1ah] bridging operation. 310 - Advertise local B-MAC address reach-ability information in BGP to 311 all other MES nodes in the same set of service instances. Note that 312 every MES has a set of local B-MAC addresses that uniquely identify 313 the device. MES addressing is discussed further in Section 7.2.1. 315 - Build a forwarding table from remote BGP advertisements received 316 associating remote B-MAC addresses with remote MES IP addresses and 317 the associated MPLS label(s). 319 6. 320 BGP Encoding 322 PBB-EVPN leverages the same BGP Routes and Attributes defined in [E- 323 VPN], adapted as follows: 325 6.1. 
326 BGP MAC Advertisement Route 328 The E-VPN MAC Advertisement Route is used to distribute B-MAC 329 addresses of the MES nodes instead of the C-MAC addresses of end- 330 stations/hosts. This is because the C-MAC addresses are learnt in 331 the data-plane for traffic arriving from the core. The MAC 332 Advertisement Route is encoded as follows: 334 - The RD is set to a Type 1 RD [RFC4364]. The value field encodes 335 the IP address of the MES (typically, the loopback address) 336 followed by 0. The reason for such encoding is that the RD cannot 337 be that of a single EVI since the same B-MAC address can span 338 across multiple EVIs. 340 - The MAC address field contains the B-MAC address. 341 - The Ethernet Tag field is set to 0. 343 The route is tagged with the set of RTs corresponding to all EVIs 344 associated with the B-MAC address. 346 All other fields are set as defined in [E-VPN]. 348 6.2. 349 Ethernet Auto-Discovery Route 351 This route and its associated modes are not needed in PBB- 352 EVPN. 354 6.3. 355 Per VPN Route Targets 357 PBB-EVPN uses the same set of route targets defined in [E-VPN]. More 358 specifically, the RT associated with a VPN is set to the value of 359 the I-SID associated with the service instance. This eliminates the 360 need for manually configuring the VPN-RT. 362 6.4. 363 MAC Mobility Extended Community 365 This extended community is a new transitive extended community. It 366 may be advertised along with MAC Advertisement routes. When used in 367 PBB-EVPN, it indicates that the C-MAC forwarding tables for the I- 368 SIDs associated with the RTs tagging the MAC Advertisement routes 369 must be flushed. This extended community is encoded in 8 bytes as 370 follows: 371 - Type (1 byte) = Pending IANA assignment. 372 - Sub-Type (1 byte) = Pending IANA assignment. 373 - Reserved (2 bytes) 374 - Counter (4 bytes) 376 Note that all other BGP messages and/or attributes are used as 377 defined in [E-VPN]. 379 7. 
380 Operation 382 This section discusses the operation of PBB-EVPN, specifically in 383 areas where it differs from [E-VPN]. 385 7.1. 386 MAC Address Distribution over Core 388 In PBB-EVPN, host MAC addresses (i.e. C-MAC addresses) need not be 389 distributed in BGP. Rather, every MES independently learns the C-MAC 390 addresses in the data-plane via normal bridging operation. Every MES 391 has a set of one or more unicast B-MAC addresses associated with it, 392 and those are the addresses distributed over the core in MAC 393 Advertisement routes. Given that these B-MAC addresses are global 394 within the provider's network, there's no need to advertise them on 395 a per service instance basis. 397 7.2. 398 Device Multi-homing 400 7.2.1. 401 MES MAC Layer Addressing & Multi-homing 403 In [802.1ah] every BEB is uniquely identified by one or more B-MAC 404 addresses. These addresses are usually locally administered by the 405 Service Provider. For PBB-EVPN, the choice of B-MAC address(es) for 406 the MES nodes must be examined carefully as it has implications on 407 the proper operation of multi-homing. In particular, for the 408 scenario where a CE is multi-homed to a number of MES nodes with 409 all-active redundancy and flow-based load-balancing, a given C-MAC 410 address would be reachable via multiple MES nodes concurrently. 411 Given that any given remote MES will bind the C-MAC address to a 412 single B-MAC address, then the various MES nodes connected to the 413 same CE must share the same B-MAC address. Otherwise, the MAC 414 address table of the remote MES nodes will keep flip-flopping 415 between the B-MAC addresses of the various MES devices. For example, 416 consider the network of Figure 1, and assume that MES1 has B-MAC BM1 417 and MES2 has B-MAC BM2. Also, assume that both links from CE1 to the 418 MES nodes are part of an all-active multi-chassis Ethernet link 419 aggregation group. 
If BM1 is not equal to BM2, the consequence is 420 that the MAC address table on MES3 will keep oscillating such that 421 the C-MAC address CM of CE1 would flip-flop between BM1 and BM2, 422 depending on the load-balancing decision on CE1 for traffic destined 423 to the core. 425 Considering that there could be multiple sites (e.g. CEs) that are 426 multi-homed to the same set of MES nodes, it is required for 427 all the MES devices in a Redundancy Group to have a unique B-MAC 428 address per site. This way, it is possible to achieve fast 429 convergence in the case where a link or port failure impacts the 430 attachment circuit connecting a single site to a given MES. 432 +---------+ 433 +-------+ MES1 | IP/MPLS | 434 / | | 435 CE1 | Network | MESr 436 M1 \ | | 437 +-------+ MES2 | | 438 /-------+ | | 439 / | | 440 CE2 | | 441 M2 \ | | 442 \ | | 443 +------+ MES3 +---------+ 445 Figure 2: B-MAC Address Assignment 447 In the example network shown in Figure 2 above, two sites 448 corresponding to CE1 and CE2 are dual-homed to MES1/MES2 and 449 MES2/MES3, respectively. Assume that BM1 is the B-MAC used for the 450 site corresponding to CE1. Similarly, BM2 is the B-MAC used for the 451 site corresponding to CE2. On MES1, a single B-MAC address (BM1) is 452 required for the site corresponding to CE1. On MES2, two B-MAC 453 addresses (BM1 and BM2) are required, one per site. Whereas on MES3, 454 a single B-MAC address (BM2) is required for the site corresponding 455 to CE2. All three MES nodes would advertise their respective B-MAC 456 addresses in BGP using the MAC Advertisement routes defined in [E- 457 VPN]. The remote MES, MESr, would learn via BGP that BM1 is 458 reachable via MES1 and MES2, whereas BM2 is reachable via both MES2 459 and MES3. Furthermore, MESr establishes via the normal bridge 460 learning that C-MAC M1 is reachable via BM1, and C-MAC M2 is 461 reachable via BM2. 
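The state that MESr holds for the Figure 2 example can be pictured as two tables: one populated from BGP and one populated by bridge learning. The following is a minimal sketch, not part of the specification. The dictionary representation, the function names and the CRC32-based per-flow selection are assumptions made only for illustration.

```python
import zlib

# Control plane (from BGP MAC Advertisement routes): B-MAC -> advertising MES nodes.
bmac_paths = {
    "BM1": {"MES1", "MES2"},
    "BM2": {"MES2", "MES3"},
}

# Data plane (from normal bridge learning): C-MAC -> B-MAC binding.
cmac_to_bmac = {"M1": "BM1", "M2": "BM2"}


def next_hops(cmac):
    """Return the sorted list of MES nodes through which a C-MAC is reachable."""
    bmac = cmac_to_bmac.get(cmac)
    if bmac is None:
        return []  # unknown unicast; flooding is not modelled in this sketch
    return sorted(bmac_paths.get(bmac, ()))


def pick_next_hop(cmac, flow_id):
    """Pick one MES per flow. The CRC32 hash is a hypothetical choice."""
    hops = next_hops(cmac)
    if not hops:
        return None
    return hops[zlib.crc32(flow_id.encode()) % len(hops)]
```

Because the C-MAC table only ever points at a B-MAC, the set of MES nodes behind that B-MAC can change without touching any C-MAC entry.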
As a result, MESr can load-balance traffic 462 destined to M1 between MES1 and MES2, as well as traffic destined to 463 M2 between both MES2 and MES3. In the case of a failure that causes, 464 for example, CE1 to be isolated from MES1, the latter can withdraw 465 the route it has advertised for BM1. This way, MESr would update its 466 path list for BM1, and will send all traffic destined to M1 over to 467 MES2 only. 469 For single-homed sites, it is possible to assign a unique B-MAC 470 address per site, or have all the single-homed sites connected to a 471 given MES share a single B-MAC address. The advantage of the first 472 model over the second model is the ability to avoid C-MAC 473 destination address lookup on the disposition PE (even though source 474 C-MAC learning is still required in the data-plane). Also, by 475 assigning the B-MAC addresses from a contiguous range, it is 476 possible to advertise a single B-MAC subnet for all single-homed 477 sites, thereby rendering the number of MAC advertisement routes 478 required at par with the second model. 480 In summary, every MES may use a unicast B-MAC address shared by all 481 single-homed CEs or a unicast B-MAC address per single-homed CE, and 482 in addition a unicast B-MAC address per dual-homed CE. In the latter 483 case, the B-MAC address MUST be the same for all MES nodes in a 484 Redundancy Group connected to the same CE. 486 7.2.1.1. 487 Automating B-MAC Address Assignment 489 The MES B-MAC address used for single-homed sites can be 490 automatically derived from the hardware (for example, from the 491 backplane's address). However, the B-MAC address used for multi- 492 homed sites must be coordinated among the RG members. To automate 493 the assignment of this latter address, the MES can derive this B-MAC 494 address from the MAC Address portion of the CE's LACP System 495 Identifier by flipping the 'Locally Administered' bit of the CE's 496 address. 
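The derivation from the LACP System Identifier can be sketched as follows. This is an illustrative sketch rather than normative text: the function name and the colon-separated string format are assumptions, and the example address is arbitrary. The 'Locally Administered' bit is the 0x02 bit of the first octet of an IEEE 802 MAC address.

```python
def derive_bmac(lacp_system_mac: str) -> str:
    """Derive a B-MAC from the MAC portion of a CE's LACP System Identifier.

    Assumes a colon-separated, six-octet hexadecimal MAC address string.
    """
    octets = [int(part, 16) for part in lacp_system_mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {lacp_system_mac!r}")
    # Flip the Locally Administered (U/L) bit: 0x02 in the first octet.
    octets[0] ^= 0x02
    return ":".join(f"{o:02x}" for o in octets)


# Arbitrary example address, used only to show the single-bit change.
print(derive_bmac("00:1b:54:aa:bb:cc"))  # 02:1b:54:aa:bb:cc
```

Only the bit flip in the first octet is the derivation itself; input parsing and output formatting are conveniences of this sketch.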
Deriving the address in this way guarantees the uniqueness of the B-MAC address within 497 the network, and ensures that all MES nodes connected to the same 498 multi-homed CE use the same value for the B-MAC address. 500 Note that with this automatic provisioning of the B-MAC address 501 associated with multi-homed CEs, it is not possible to support the 502 uncommon scenario where a CE has multiple bundles towards the MES 503 nodes, and the service involves hair-pinning traffic from one bundle 504 to another. This is because the split-horizon filtering relies on B- 505 MAC addresses rather than Site-ID Labels (as will be described in 506 the next section). The operator must explicitly configure the B-MAC 507 address for this fairly uncommon service scenario. 509 Whenever a B-MAC address is provisioned on the MES, either manually 510 or automatically (as an outcome of CE auto-discovery), the MES MUST 511 transmit a MAC Advertisement Route for the B-MAC address with a 512 downstream assigned MPLS label that uniquely identifies that address 513 on the advertising MES. The route is tagged with the RTs of the 514 associated EVIs as described above. 516 7.2.2. 517 Split Horizon and Designated Forwarder Election 519 [E-VPN] relies on access split horizon, where the Ethernet Segment 520 Label is used for egress filtering on the attachment circuit in 521 order to prevent forwarding loops. In PBB-EVPN, the B-MAC source 522 address can be used for the same purpose, as it uniquely identifies 523 the originating site of a given frame. As such, Segment Labels are 524 not used in PBB-EVPN, and the egress filtering is done based on the 525 B-MAC source address. It is worth noting here that [802.1ah] defines 526 this B-MAC address based filtering function as part of the I- 527 Component options, hence no new functions are required to support 528 split-horizon beyond what is already defined in [802.1ah]. 
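The B-MAC based egress filter can be illustrated with a small sketch. It is an assumption-laden simplification, not the [802.1ah] I-Component procedure itself: the function names and the mapping of attachment circuit names to B-MAC addresses are hypothetical, and Designated Forwarder state is not modelled.

```python
def egress_allowed(frame_bmac_sa: str, ac_bmac: str) -> bool:
    """Return True if a frame received from the core may be sent on this AC.

    The frame is blocked when its B-MAC source address equals the B-MAC
    assigned to the attachment circuit, i.e. it originated at the same site.
    """
    return frame_bmac_sa.lower() != ac_bmac.lower()


def eligible_acs(frame_bmac_sa: str, local_acs: dict) -> list:
    """List the local ACs (name -> site B-MAC) that pass the filter."""
    return [ac for ac, bmac in local_acs.items()
            if egress_allowed(frame_bmac_sa, bmac)]


# MES2 in Figure 2 has one AC per multi-homed site (AC names are hypothetical).
mes2_acs = {"ac-ce1": "BM1", "ac-ce2": "BM2"}
# A frame that entered the core at MES1 from CE1 carries B-MAC source BM1.
print(eligible_acs("BM1", mes2_acs))  # ['ac-ce2']
```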
529 Given that the Segment label is not used in PBB-EVPN, the MES sets 530 the Label field in the Ethernet Segment Route to 0. 532 The Designated Forwarder election procedures remain unchanged from 533 [E-VPN]. 535 7.3. 536 Network Multi-homing 538 When an Ethernet network is multi-homed to a set of MES nodes 539 running PBB-EVPN, an all-active redundancy model can be supported 540 with per service instance (i.e. I-SID) load-balancing. In this 541 model, DF election is performed to ensure that a single MES node in 542 the redundancy group is responsible for forwarding traffic 543 associated with a given I-SID. This guarantees that no forwarding 544 loops are created. Filtering based on DF state applies to both 545 unicast and multicast traffic, and in both access-to-core as well as 546 core-to-access directions (unlike the multi-homed device scenario 547 where DF filtering is limited to multi-destination frames in the 548 core-to-access direction). 549 Similar to the multi-homed device scenario, a unique B-MAC address 550 is used on the MES per multi-homed network (Segment). This helps 551 eliminate the need for C-MAC address flushing in all but one failure 552 scenario (more details on this in the Failure Handling section 553 below). The B-MAC address may be auto-provisioned by snooping on the 554 BPDUs of the multi-homed network: the B-MAC address is set to the 555 root bridge ID of the CIST albeit with the 'Locally Administered' 556 bit set. 558 7.3.1. 559 B-MAC Address Advertisement 561 For every multi-homed network, the MES advertises two MAC 562 Advertisement routes with different RDs and identical MAC addresses 563 and ESIs. One of these routes will be tagged with a lower Local Pref 564 attribute than the other. The route with the higher Local Pref will 565 be tagged with the RTs corresponding to the I-SIDs for which the 566 advertising MES is the DF. 
Whereas, the route with the lower Local 567 Pref will be tagged with the RTs corresponding to the I-SIDs for 568 which the advertising MES is the backup DF. Consider the example 569 network of the figure below, where a multi-homed network (MHN1) is 570 connected to two MES nodes (MES1 and MES2). 572 +---------+ 573 +-------+ MES1 | IP/MPLS | 574 +------+ BM1 | | 575 | | | Network | MESr 576 | MHN1 | BM1 | | 577 +------+ +-------+ MES2 | | 578 +---------+ 580 Figure 3: Multi-homed Network 582 Both MES nodes use the same B-MAC address (BM1) for the Ethernet 583 Segment (ESI1) associated with MHN1. Assume, for instance, that MES1 584 is the DF for the even I-SIDs whereas MES2 is the DF for the odd I- 585 SIDs. In this example, the routes advertised by MES1 and MES2 would 586 be as follows: 588 MES1: 590 Route 1: RD11, BM1, ESI1, Local Pref = 120, RT2, RT4, RT6... 591 Route 2: RD12, BM1, ESI1, Local Pref = 80, RT1, RT3, RT5... 593 MES2: 595 Route 1: RD21, BM1, ESI1, Local Pref = 120, RT1, RT3, RT5... 596 Route 2: RD22, BM1, ESI1, Local Pref = 80, RT2, RT4, RT6 598 Upon receiving the above MAC Advertisement routes, the remote MES 599 nodes (e.g. MESr) would install forwarding entries for BM1 towards 600 MES1 for the even I-SIDs, and towards MES2 for the odd I-SIDs. 602 It is worth noting that the procedures of this section can also be 603 used for a multi-homed device in order to support all-active 604 redundancy with per I-SID load-balancing. 606 7.3.2. 607 Failure Handling 609 In the case of an MES node failure, or when the MES is isolated from 610 the multi-homed network due to a port or link failure, the affected 611 MES withdraws its MAC Advertisement routes for the associated B-MAC. 612 This serves as a trigger for the remote MES nodes to adjust their 613 forwarding entries to point to the backup DF. 
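The effect of the four example routes on a remote MES, before and after a withdrawal, can be sketched as below. This is an illustrative model only: the class and function names are assumptions, selection is reduced to a Local Pref comparison rather than the full BGP decision process, and only the route targets explicitly listed in the example are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacRoute:
    advertiser: str
    rd: str
    bmac: str
    esi: str
    local_pref: int
    route_targets: frozenset


ROUTES = [
    MacRoute("MES1", "RD11", "BM1", "ESI1", 120, frozenset({"RT2", "RT4", "RT6"})),
    MacRoute("MES1", "RD12", "BM1", "ESI1", 80, frozenset({"RT1", "RT3", "RT5"})),
    MacRoute("MES2", "RD21", "BM1", "ESI1", 120, frozenset({"RT1", "RT3", "RT5"})),
    MacRoute("MES2", "RD22", "BM1", "ESI1", 80, frozenset({"RT2", "RT4", "RT6"})),
]


def best_next_hop(routes, bmac, rt):
    """Return the advertiser of the highest Local Pref route for (bmac, rt)."""
    candidates = [r for r in routes if r.bmac == bmac and rt in r.route_targets]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.local_pref).advertiser


def withdraw(routes, advertiser, bmac):
    """Model an MES withdrawing all of its routes for a B-MAC."""
    return [r for r in routes
            if not (r.advertiser == advertiser and r.bmac == bmac)]


print(best_next_hop(ROUTES, "BM1", "RT2"))  # MES1 (DF for the even I-SIDs)
print(best_next_hop(ROUTES, "BM1", "RT1"))  # MES2 (DF for the odd I-SIDs)
print(best_next_hop(withdraw(ROUTES, "MES1", "BM1"), "BM1", "RT2"))  # MES2
```

After the withdrawal, the lower Local Pref route from MES2 becomes the only candidate for the even I-SIDs, so forwarding moves to the backup DF without any change to the B-MAC.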
Because the same B-MAC 614 address is used on both the DF and backup DF nodes, then there is no 615 need to flush the C-MAC address table upon the occurrence of these 616 failures. 618 In the case where the multi-homed network is partitioned, the MES 619 nodes can detect this condition by snooping on the network's BPDUs. 620 When a MES detects that the root bridge ID has changed, it must 621 change the value of the B-MAC address associated with the Ethernet 622 Segment. This is done by the MES withdrawing the previous MAC 623 Advertisement route, and advertising a new route for the updated B- 624 MAC. The MES, which detects the failure, must inform the remote MES 625 nodes to flush their C-MAC address tables for the affected I-SIDs. 626 This is required because when the multi-homed network is 627 partitioned, certain C-MAC addresses will move from being associated 628 with the old B-MAC address to the new B-MAC addresses. Other C-MAC 629 addresses will have their reachability remaining intact. Given that 630 the MES node has no means of identifying which C-MACs have moved and 631 which have not, the entire C-MAC forwarding table for the affected 632 I-SIDs must be flushed. The affected MES signals the need for the C- 633 MAC flushing by sending the MAC Mobility Extended Community in the 634 MP_UNREACH_NLRI attribute containing the E-VPN NLRI for the 635 withdrawn MAC Advertisement route. 637 7.4. 638 Frame Forwarding 640 The frame forwarding functions are divided in between the Bridge 641 Module, which hosts the [802.1ah] Backbone Edge Bridge (BEB) 642 functionality, and the MPLS Forwarder which handles the MPLS 643 imposition/disposition. The details of frame forwarding for unicast 644 and multi-destination frames are discussed next. 646 7.4.1. 647 Unicast 649 Known unicast traffic received from the AC will be PBB-encapsulated 650 by the MES using the B-MAC source address corresponding to the 651 originating site. 
The unicast B-MAC destination address is determined based on a lookup
of the C-MAC destination address (the binding of the two is done via
transparent learning of reverse traffic). The resulting frame is then
encapsulated with an LSP tunnel label and the MPLS label which
uniquely identifies the B-MAC destination address on the egress MES.
If per-flow load-balancing over ECMPs in the MPLS core is required,
then a flow label is added as the end-of-stack label.

The MES forwards unknown unicast frames over the MPLS core using the
same set of options used for forwarding multicast/broadcast frames
(as described in the next section).

7.4.2. Multicast/Broadcast

Multi-destination frames received from the AC will be
PBB-encapsulated by the MES using the B-MAC source address
corresponding to the originating site. The multicast B-MAC
destination address is selected based on the value of the I-SID as
defined in [802.1ah]. The resulting frame is then forwarded over the
MPLS core using one of the following two options:

Option 1: the MPLS Forwarder can perform ingress replication over a
set of MP2P tunnel LSPs. The frame is encapsulated with a tunnel LSP
label and the E-VPN ingress replication label advertised in the
Inclusive Multicast Route.

Option 2: the MPLS Forwarder can use a P2MP tunnel LSP per the
procedures defined in [E-VPN]. This includes the use of either
Inclusive or Aggregate Inclusive trees.

Note that the same procedures for advertising and handling the
Inclusive Multicast Route defined in [E-VPN] apply here.

8. Minimizing ARP Broadcast

The MES nodes implement an ARP-proxy function in order to minimize
the volume of ARP traffic that is broadcast over the MPLS network.
This is achieved by having each MES node snoop on ARP request and
response messages received over the access interfaces or the MPLS
core. The MES builds a cache of IP / MAC address bindings from these
snooped messages. The MES then uses this cache to respond to ARP
requests that ingress on access ports and target hosts in remote
sites. If the MES finds a match for the IP address in its ARP cache,
it responds to the requesting host and drops the request. Otherwise,
if it does not find a match, the request is flooded over the MPLS
network using either ingress replication or LSM.

9. Seamless Interworking with TRILL and IEEE 802.1aq/802.1Qbp

PBB-EVPN enables seamless connectivity of TRILL or 802.1aq/802.1Qbp
networks over an MPLS/IP core while maintaining control-plane
separation among these networks. We refer to any of TRILL, 802.1aq
or 802.1Qbp networks collectively as 'NG-Ethernet networks'
hereafter.

Every NG-Ethernet network that is connected to the MPLS core runs an
independent instance of the corresponding IS-IS control-plane. Each
MES participates in the NG-Ethernet network control plane of its
local site. The MES peers, in the IS-IS protocol, with the switches
internal to the site, but does not terminate the TRILL / PBB
data-plane encapsulation. So, from a control-plane viewpoint, the MES
appears as an edge switch; whereas, from a data-plane viewpoint, the
MES appears as a core switch to the NG-Ethernet network.

The MES nodes encapsulate TRILL / PBB frames with MPLS in the
imposition path, and decapsulate them in the disposition path.

9.1. TRILL Nickname Advertisement Route

A new BGP route is defined to support the interconnection of TRILL
networks over PBB-EVPN: the 'TRILL Nickname Advertisement' route,
encoded as follows:

   +---------------------------------------+
   |             RD (8 octets)             |
   +---------------------------------------+
   |Ethernet Segment Identifier (10 octets)|
   +---------------------------------------+
   |      Ethernet Tag ID (4 octets)       |
   +---------------------------------------+
   |       Nickname Length (1 octet)       |
   +---------------------------------------+
   |      RBridge Nickname (2 octets)      |
   +---------------------------------------+
   |       MPLS Label (n * 3 octets)       |
   +---------------------------------------+

        Figure 4: TRILL Nickname Advertisement Route

The MES uses this route to advertise the reachability of TRILL
RBridge nicknames to other MES nodes in the VPN instance. The MPLS
label advertised in this route can be allocated on a per VPN basis
and serves the purpose of identifying to the disposition MES that
the MPLS-encapsulated packet holds a TRILL frame.

The encapsulation for the transport of TRILL frames over MPLS is
encoded as shown in the figure below:

   +------------------+
   | IP/MPLS Header   |
   +------------------+
   | TRILL Header     |
   +------------------+
   | Ethernet Header  |
   +------------------+
   | Ethernet Payload |
   +------------------+
   | Ethernet FCS     |
   +------------------+

   Figure 5: TRILL over MPLS Encapsulation

It is worth noting here that while it is possible to transport
Ethernet-encapsulated TRILL frames over MPLS, that approach
unnecessarily wastes 16 bytes per packet. That approach further
requires either the use of well-known MAC addresses or having the
MES nodes advertise in BGP their device MAC addresses, in order to
resolve the TRILL next-hop L2 adjacency.
To that end, it is simpler and more efficient to transport TRILL
natively over MPLS, which is why the above BGP route is defined for
TRILL Nickname advertisement.

9.2. IEEE 802.1aq / 802.1Qbp B-MAC Advertisement Route

B-MAC addresses associated with 802.1aq / 802.1Qbp switches are
advertised using the BGP MAC Advertisement route already defined in
[E-VPN].

The encapsulation for the transport of PBB frames over MPLS is
similar to that of classical Ethernet, albeit with the additional
PBB header, as shown in the figure below:

   +------------------+
   | IP/MPLS Header   |
   +------------------+
   | PBB Header       |
   +------------------+
   | Ethernet Header  |
   +------------------+
   | Ethernet Payload |
   +------------------+
   | Ethernet FCS     |
   +------------------+

   Figure 6: PBB over MPLS Encapsulation

9.3. Operation

For correct connectivity, the TRILL Nicknames or 802.1aq/802.1Qbp
B-MACs must be globally unique in the network. This can be achieved,
for instance, by using a hierarchical Nickname (or B-MAC) assignment
paradigm, and encoding a Site ID in the high-order bits of the
Nickname (or B-MAC):

   Nickname (or B-MAC) = [Site ID : RBridge ID (or MAC)]

The only practical difference between TRILL Nicknames and B-MACs, in
this regard, is with respect to the size of the address space:
Nicknames are 16 bits wide whereas B-MACs are 48 bits wide.

Every MES then advertises (in BGP) the Nicknames (or B-MACs) of all
switches local to its site in the TRILL Nickname Advertisement
routes (or MAC Advertisement routes).

Furthermore, the MES advertises in IS-IS (to the local island) the
RBridge nicknames (or B-MACs) of all remote switches in all the
other TRILL (or IEEE 802.1aq/802.1Qbp) islands that the MES has
learned via BGP.
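
The hierarchical assignment above can be sketched as simple bit
packing. This is a minimal illustration only: the encode() and
decode() helpers and the choice of how many high-order bits carry the
Site ID are assumptions, since this document does not fix a split. The
sketch also ignores any reserved Nickname values and the special
meaning of individual MAC address bits.

```python
def encode(site_id, local_id, width, site_bits):
    """Pack [Site ID : local ID] into a `width`-bit identifier, with
    the Site ID in the `site_bits` high-order bits (assumed split)."""
    local_bits = width - site_bits
    if not 0 <= site_id < (1 << site_bits):
        raise ValueError("site_id does not fit in site_bits")
    if not 0 <= local_id < (1 << local_bits):
        raise ValueError("local_id does not fit in the remaining bits")
    return (site_id << local_bits) | local_id


def decode(value, width, site_bits):
    """Split a `width`-bit identifier back into (site_id, local_id)."""
    local_bits = width - site_bits
    return value >> local_bits, value & ((1 << local_bits) - 1)


if __name__ == "__main__":
    # 16-bit TRILL Nickname, assuming a 6-bit Site ID.
    nickname = encode(site_id=3, local_id=5, width=16, site_bits=6)
    print(hex(nickname), decode(nickname, 16, 6))

    # 48-bit B-MAC, assuming a 16-bit Site ID.
    bmac = encode(site_id=1, local_id=0x2A, width=48, site_bits=16)
    print(hex(bmac), decode(bmac, 48, 16))
```

The same helpers serve both address spaces; only the width differs (16
bits for Nicknames, 48 bits for B-MACs), which mirrors the observation
that the two cases differ only in the size of the address space.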
Note that by having multiple MES nodes (connected to the same TRILL
or 802.1aq/802.1Qbp island) advertise routes to the same RBridge
nickname (or B-MAC), with equal BGP Local_Pref attribute, it is
possible to perform active/active load-balancing to/from the MPLS
core.

When a MES receives an Ethernet-encapsulated TRILL frame from the
access side, it removes the Ethernet encapsulation (i.e. outer MAC
header), and performs a lookup on the egress RBridge nickname in the
TRILL header to identify the next hop. If the lookup yields that the
next hop is a remote MES, the local MES then encapsulates the TRILL
frame in MPLS. The label stack comprises the VPN label (advertised
by the remote MES), followed by an LSP/IGP label. From that point
onwards, regular MPLS forwarding is applied.

On the disposition MES, assuming penultimate-hop-popping is
employed, the MES receives the MPLS-encapsulated TRILL frame with a
single label: the VPN label. The value of the label indicates to the
disposition MES that this is a TRILL packet, so the label is popped,
the TTL field (in the TRILL header) is reinitialized and normal
TRILL processing is employed from this point onwards.

By the same token, when a MES receives a PBB-encapsulated Ethernet
frame from the access side, it performs a lookup on the B-MAC
destination address to identify the next hop. If the lookup yields
that the next hop is a remote MES, the local MES then encapsulates
the PBB frame in MPLS. The label stack comprises the VPN label
(advertised by the remote MES), followed by an LSP/IGP label. From
that point onwards, regular MPLS forwarding is applied.

On the disposition MES, assuming penultimate-hop-popping is
employed, the MES receives the MPLS-encapsulated PBB frame with a
single label: the VPN label.
The value of the label indicates to the disposition MES that this is
a PBB frame, so the label is popped, the TTL field (in the 802.1Qbp
F-Tag) is reinitialized and normal PBB processing is employed from
this point onwards.

10. Solution Advantages

In this section, we discuss the advantages of the PBB-EVPN solution
in the context of the requirements set forth in Section 3 above.

10.1. MAC Advertisement Route Scalability

In PBB-EVPN, the number of MAC Advertisement Routes is a function of
the number of segments (sites), rather than the number of
hosts/servers. This is because the B-MAC addresses of the MESes,
rather than the C-MAC addresses (of hosts/servers), are advertised
in BGP. And, as discussed above, there is a one-to-one mapping
between multi-homed segments and B-MAC addresses, whereas there is a
one-to-one or many-to-one mapping between single-homed segments and
B-MAC addresses for a given MES. As a result, the volume of MAC
Advertisement Routes in PBB-EVPN is multiple orders of magnitude
lower than in E-VPN.

10.2. C-MAC Mobility with MAC Sub-netting

In PBB-EVPN, if a MES allocates its B-MAC addresses from a
contiguous range, then it can advertise a MAC prefix rather than
individual 48-bit addresses. It should be noted that B-MAC addresses
can easily be assigned from a contiguous range because MES nodes are
within the provider administrative domain; however, CE devices and
hosts are typically not within the provider administrative domain.
The advantage of such MAC address sub-netting can be maintained even
as C-MAC addresses move from one Ethernet segment to another. This
is because the C-MAC address to B-MAC address association is learnt
in the data-plane and C-MAC addresses are not advertised in BGP.
To illustrate how this compares to E-VPN, consider the following
example:

If a MES running E-VPN advertises reachability for a MAC subnet that
spans N addresses via a particular segment, and then 50% of the MAC
addresses in that subnet move to other segments (e.g. due to virtual
machine mobility), then in the worst case, N/2 additional MAC
Advertisement routes need to be sent for the MAC addresses that have
moved. This defeats the purpose of the sub-netting. With PBB-EVPN,
on the other hand, the sub-netting applies to the B-MAC addresses,
which are statically associated with MES nodes and are not subject
to mobility. As C-MAC addresses move from one segment to another,
the binding of C-MAC to B-MAC addresses is updated via data-plane
learning.

10.3. C-MAC Address Learning and Confinement

In PBB-EVPN, C-MAC address reachability information is built via
data-plane learning. As such, MES nodes not participating in active
conversations involving a particular C-MAC address will purge that
address from their forwarding tables. Furthermore, since C-MAC
addresses are not distributed in BGP, MES nodes will not maintain
any record of them in the control-plane routing table.

10.4. Seamless Interworking with TRILL and 802.1aq Access Networks

Consider the scenario where two access networks, one running MPLS
and the other running 802.1aq, are interconnected via an MPLS
backbone network. The figure below shows such an example network.

                           +--------------+
                           |              |
          +---------+      |     MPLS     |     +---------+
  +----+  |         |   +----+          +----+  |         |  +----+
  | CE |--|         |   |MES1|          |MES2|  |         |--| CE |
  +----+  | 802.1aq |---|    |          |    |--|  MPLS   |  +----+
  +----+  |         |   +----+          +----+  |         |  +----+
  | CE |--|         |      |   Backbone   |     |         |--| CE |
  +----+  +---------+      +--------------+     +---------+  +----+

             Figure 7: Interoperability with 802.1aq

If the MPLS backbone network employs E-VPN, then the 802.1aq
data-plane encapsulation must be terminated on MES1 or the edge
device connecting to MES1. Either way, all the MES nodes that are
part of the associated service instances will be exposed to all the
C-MAC addresses of all hosts/servers connected to the access
networks. However, if the MPLS backbone network employs PBB-EVPN,
then the 802.1aq encapsulation can be extended over the MPLS
backbone, thereby maintaining C-MAC address transparency on MES1. If
PBB-EVPN is also extended over the MPLS access network on the right,
then C-MAC addresses would be transparent to MES2 as well.

Interoperability with TRILL access networks will be described in a
future revision of this draft.

10.5. Per Site Policy Support

In PBB-EVPN, a unique B-MAC address can be associated with every
site (single-homed or multi-homed). Given that the B-MAC addresses
are sent in BGP MAC Advertisement routes, it is possible to define
per site (i.e. B-MAC) forwarding policies, including policies for
E-TREE service.

10.6. Avoiding C-MAC Address Flushing

With PBB-EVPN, it is possible to avoid C-MAC address flushing upon a
topology change affecting a multi-homed device. To illustrate this,
consider the example network of Figure 1. Both MES1 and MES2
advertise the same B-MAC address (BM1) to MES3. MES3 then learns the
C-MAC addresses of the servers/hosts behind CE1 via data-plane
learning.
If AC1 fails, then MES3 does not need to flush any of the C-MAC
addresses learnt and associated with BM1. This is because MES1 will
withdraw the MAC Advertisement routes associated with BM1, thereby
leading MES3 to have a single adjacency (to MES2) for this B-MAC
address. Therefore, the topology change is communicated to MES3 and
no C-MAC address flushing is required.

11. Acknowledgements

TBD.

12. Security Considerations

There are no additional security aspects beyond those of VPLS/H-VPLS
that need to be discussed here.

13. IANA Considerations

This document requires IANA to assign a new SAFI value for L2VPN_MAC
SAFI.

14. Intellectual Property Considerations

This document is being submitted for use in IETF standards
discussions.

15. Normative References

[802.1ah] "Virtual Bridged Local Area Networks Amendment 7: Provider
Backbone Bridges", IEEE Std. 802.1ah-2008, August 2008.

16. Informative References

[PBB-VPLS] Sajassi et al., "VPLS Interoperability with Provider
Backbone Bridges", draft-ietf-l2vpn-vpls-pbb-interop-00.txt, work in
progress, September, 2011.

[EVPN-REQ] Sajassi et al., "Requirements for Ethernet VPN (E-VPN)",
draft-sajassi-raggarwa-l2vpn-evpn-req-00.txt, work in progress,
October, 2010.

[E-VPN] Aggarwal et al., "BGP MPLS Based Ethernet VPN",
draft-raggarwa-sajassi-l2vpn-evpn-01.txt, work in progress,
November, 2010.

17. Authors' Addresses

Ali Sajassi
Cisco
170 West Tasman Drive
San Jose, CA 95134, US
Email: sajassi@cisco.com

Samer Salam
Cisco
595 Burrard Street, Suite 2123
Vancouver, BC V7X 1J1, Canada
Email: ssalam@cisco.com

Sami Boutros
Cisco
170 West Tasman Drive
San Jose, CA 95134, US
Email: sboutros@cisco.com

Nabil Bitar
Verizon Communications
Email: nabil.n.bitar@verizon.com

Aldrin Isaac
Bloomberg
Email: aisaac71@bloomberg.net

Florin Balus
Alcatel-Lucent
701 E. Middlefield Road
Mountain View, CA, USA 94043
Email: florin.balus@alcatel-lucent.com

Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.be

Clarence Filsfils
Cisco
Email: cfilsfil@cisco.com

Dennis Cai
Cisco
Email: dcai@cisco.com

Lizhong Jin
ZTE Corporation
889, Bibo Road
Shanghai, 201203, China
Email: lizhong.jin@zte.com.cn