idnits 2.17.1

draft-ietf-l2vpn-pbb-evpn-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 18, 2013) is 3836 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PBB'

  == Outdated reference: A later version (-06) exists of
     draft-ietf-l2vpn-pbb-vpls-interop-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-l2vpn-evpn-req-05

  == Outdated reference: A later version (-11) exists of
     draft-ietf-l2vpn-evpn-04

     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

Internet Working Group                                      Ali Sajassi
Internet Draft                                              Samer Salam
Category: Standards Track                                  Sami Boutros
                                                                  Cisco

Florin Balus                                                Nabil Bitar
Wim Henderickx                                                  Verizon
Alcatel-Lucent
                                                           Aldrin Isaac
Clarence Filsfils                                             Bloomberg
Dennis Cai
Cisco                                                       Lizhong Jin
                                                                    ZTE

Expires: April 18, 2014                                October 18, 2013


                                PBB-EVPN
                      draft-ietf-l2vpn-pbb-evpn-06


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document discusses how Ethernet Provider Backbone Bridging
   [802.1ah] can be combined with EVPN in order to: reduce the number of
   BGP MAC Advertisement routes by aggregating Customer/Client MAC
   (C-MAC) addresses via Provider Backbone MAC (B-MAC) addresses;
   provide client MAC address mobility using C-MAC aggregation and B-MAC
   sub-netting; confine the scope of C-MAC learning to only active
   flows; offer per-site policies; and avoid C-MAC address flushing on
   topology changes. The combined solution is referred to as PBB-EVPN.

Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Table of Contents

   1. Introduction
   2. Contributors
   3. Terminology
   4. Requirements
      4.1. MAC Advertisement Route Scalability
      4.2. C-MAC Mobility with MAC Summarization
      4.3. C-MAC Address Learning and Confinement
      4.4. Per Site Policy Support
      4.5. Avoiding C-MAC Address Flushing
   5. Solution Overview
   6. BGP Encoding
      6.1. Ethernet Auto-Discovery Route
      6.2. BGP MAC Advertisement Route
      6.3. Inclusive Multicast Ethernet Tag Route
      6.4. Ethernet Segment Route
      6.5. ESI Label Extended Community
      6.6. ES-Import Route Target
      6.7. MAC Mobility Extended Community
      6.8. Default Gateway Extended Community
   7. Operation
      7.1. MAC Address Distribution over Core
      7.2. Device Multi-homing
         7.2.1. Flow-based Load-balancing
            7.2.1.1. PE B-MAC Address Assignment
            7.2.1.2. Automating B-MAC Address Assignment
            7.2.1.3. Split Horizon and Designated Forwarder Election
         7.2.2. I-SID Based Load-balancing
            7.2.2.1. PE B-MAC Address Assignment
            7.2.2.2. Split Horizon and Designated Forwarder Election
            7.2.2.3. Handling Failure Scenarios
      7.3. Network Multi-homing
      7.4. Frame Forwarding
         7.4.1. Unicast
         7.4.2. Multicast/Broadcast
   8. Minimizing ARP Broadcast
   9. Seamless Interworking with IEEE 802.1aq/802.1Qbp
      9.1. B-MAC Address Assignment
      9.2. IEEE 802.1aq / 802.1Qbp B-MAC Advertisement Route
      9.3. Operation
   10. Solution Advantages
      10.1. MAC Advertisement Route Scalability
      10.2. C-MAC Mobility with MAC Sub-netting
      10.3. C-MAC Address Learning and Confinement
      10.4. Seamless Interworking with TRILL and 802.1aq Access
            Networks
      10.5. Per Site Policy Support
      10.6. Avoiding C-MAC Address Flushing
   11. Acknowledgements
   12. Security Considerations
   13. IANA Considerations
   14. Intellectual Property Considerations
   15. Normative References
   16. Informative References
   17. Authors' Addresses

1. Introduction

   [EVPN] introduces a solution for multipoint L2VPN services, with
   advanced multi-homing capabilities, using BGP for distributing
   customer/client MAC address reachability information over the core
   MPLS/IP network. [PBB] defines an architecture for Ethernet Provider
   Backbone Bridging (PBB), where MAC tunneling is employed to improve
   service instance and MAC address scalability in Ethernet as well as
   VPLS networks [PBB-VPLS].

   In this document, we discuss how PBB can be combined with EVPN in
   order to: reduce the number of BGP MAC Advertisement routes by
   aggregating Customer/Client MAC (C-MAC) addresses via Provider
   Backbone MAC (B-MAC) addresses; provide client MAC address mobility
   using C-MAC aggregation and B-MAC sub-netting; confine the scope of
   C-MAC learning to only active flows; offer per-site policies; and
   avoid C-MAC address flushing on topology changes. The combined
   solution is referred to as PBB-EVPN.

2. Contributors

   In addition to the authors listed above, the following individuals
   also contributed to this document.

      Keyur Patel, Cisco
      Sam Aldrin, Huawei
      Himanshu Shah, Ciena

3. Terminology

   BEB:   Backbone Edge Bridge
   B-MAC: Backbone MAC Address
   CE:    Customer Edge
   C-MAC: Customer/Client MAC Address
   DHD:   Dual-homed Device
   DHN:   Dual-homed Network
   LACP:  Link Aggregation Control Protocol
   LSM:   Label Switched Multicast
   MDT:   Multicast Delivery Tree
   MP2MP: Multipoint to Multipoint
   P2MP:  Point to Multipoint
   P2P:   Point to Point
   PE:    Provider Edge
   PoA:   Point of Attachment
   PW:    Pseudowire
   EVPN:  Ethernet VPN

4. Requirements

   The requirements for PBB-EVPN include all the requirements for EVPN
   that were described in [EVPN-REQ], in addition to the following:

4.1. MAC Advertisement Route Scalability

   In typical operation, an [EVPN] PE sends a BGP MAC Advertisement
   Route per customer/client MAC (C-MAC) address. In certain
   applications, this poses scalability challenges, as is the case in
   virtualized data center environments where the number of virtual
   machines (VMs), and hence the number of C-MAC addresses, can be in
   the millions. In such scenarios, it is required to reduce the number
   of BGP MAC Advertisement routes by relying on a 'MAC summarization'
   scheme, as is provided by PBB. Note that the MAC summarization
   capability already built into EVPN is not sufficient in those
   environments, as will be discussed next.

4.2. C-MAC Mobility with MAC Summarization

   Certain applications, such as virtual machine mobility, require
   support for fast C-MAC address mobility. For these applications, it
   is not possible to use MAC address summarization in EVPN, i.e., to
   advertise reachability to a MAC address prefix. Rather, the exact
   virtual machine MAC address needs to be transmitted in a BGP MAC
   Advertisement route. Otherwise, traffic would be forwarded to the
   wrong segment when a virtual machine moves from one Ethernet segment
   to another. This undermines the scalability benefits of
   summarization.

   It is required to support C-MAC address mobility while retaining the
   scalability benefits of MAC summarization. This can be achieved by
   leveraging PBB technology, which defines a Backbone MAC (B-MAC)
   address space that is independent of the C-MAC address space:
   C-MAC addresses are aggregated via a B-MAC address, and
   summarization is then applied to B-MAC addresses.

4.3. C-MAC Address Learning and Confinement

   In EVPN, all the PE nodes participating in the same EVPN instance are
   exposed to all the C-MAC addresses learned by any one of these PE
   nodes, because a C-MAC learned by one of the PE nodes is advertised
   in BGP to the other PE nodes in that EVPN instance. This is the case
   even if some of the PE nodes for that EVPN instance are not involved
   in forwarding traffic to, or from, these C-MAC addresses. Even if an
   implementation does not install hardware forwarding entries for C-MAC
   addresses that are not part of active traffic flows on that PE, the
   device memory is still consumed by keeping a record of the C-MAC
   addresses in the routing table (RIB). In network applications with
   millions of C-MAC addresses, this introduces a non-trivial waste of
   PE resources. As such, it is required to confine the scope of
   visibility of C-MAC addresses only to those PE nodes that are
   actively involved in forwarding traffic to, or from, these addresses.

4.4. Per Site Policy Support

   In many applications, it is required to be able to enforce
   connectivity policy rules at the granularity of a site (or segment).
   This includes the ability to control which PE nodes in the network
   can forward traffic to, or from, a given site. PBB-EVPN is capable of
   providing this granularity of policy control. In the case where
   per-C-MAC-address granularity is required, the EVI can always
   continue to operate in EVPN mode.

4.5. Avoiding C-MAC Address Flushing

   It is required to avoid C-MAC address flushing upon link, port or
   node failure for multi-homed devices and networks, in order to speed
   up re-convergence upon failure.

5. Solution Overview

   The solution involves incorporating IEEE Backbone Edge Bridge (BEB)
   functionality on the EVPN PE nodes, similar to PBB-VPLS, where BEB
   functionality is incorporated in the VPLS PE nodes. The PE devices
   receive 802.1Q Ethernet frames from their attachment circuits,
   encapsulate them in the PBB header, and forward the frames over the
   IP/MPLS core. On the egress EVPN PE, the PBB header is removed
   following the MPLS disposition, and the original 802.1Q Ethernet
   frame is delivered to the customer equipment.

                 BEB    +--------------+   BEB
                  ||    |              |    ||
                  \/    |              |    \/
   +----+  AC1  +----+  |              |  +----+   +----+
   | CE1|-------|    |  |              |  |    |---| CE2|
   +----+\      | PE1|  |   IP/MPLS    |  | PE3|   +----+
          \     +----+  |   Network    |  +----+
           \            |              |
         AC2\ +----+    |              |
             \|    |    |              |
              | PE2|    |              |
              +----+    |              |
                /\      +--------------+
                ||
                BEB

   <-802.1Q->   <------PBB over MPLS------>   <-802.1Q->

                   Figure 1: PBB-EVPN Network

   The PE nodes perform the following functions:

   - Learn customer/client MAC addresses (C-MACs) over the attachment
     circuits in the data-plane, per normal bridge operation.

   - Learn remote C-MAC to B-MAC bindings in the data-plane from traffic
     arriving from the core, per [PBB] bridging operation.

   - Advertise local B-MAC address reachability information in BGP to
     all other PE nodes in the same set of service instances. Note that
     every PE has a set of local B-MAC addresses that uniquely identify
     the device. PE addressing is discussed further in Section 7.2.

   - Build a forwarding table from the remote BGP advertisements
     received, associating remote B-MAC addresses with remote PE IP
     addresses and the associated MPLS label(s).

6. BGP Encoding

   PBB-EVPN leverages the same BGP Routes and Attributes defined in
   [EVPN], adapted as follows:

6.1. Ethernet Auto-Discovery Route

   This route and all of its associated modes are not needed in
   PBB-EVPN.

   The receiving PE knows that it need not wait for the receipt of the
   Ethernet A-D route for route resolution by means of the reserved ESI
   encoded in the MAC Advertisement route: the ESI values of 0 and
   MAX-ESI indicate that the receiving PE can resolve the path without
   an Ethernet A-D route.

6.2. BGP MAC Advertisement Route

   The EVPN MAC Advertisement Route is used to distribute the B-MAC
   addresses of the PE nodes instead of the C-MAC addresses of
   end-stations/hosts. This is because the C-MAC addresses are learned
   in the data-plane for traffic arriving from the core. The MAC
   Advertisement Route is encoded as follows:

   - The MAC address field contains the B-MAC address.

   - The Ethernet Tag field is set to 0.

   - The Ethernet Segment Identifier field must be set either to 0 (for
     single-homed Segments or multi-homed Segments with per-ISID
     load-balancing) or to MAX-ESI (for multi-homed Segments with
     per-flow load-balancing). All other values are not permitted.

   - All other fields are set as defined in [EVPN].

   This route is tagged with the RT corresponding to its EVI. This EVI
   is analogous to a B-VID.

6.3. Inclusive Multicast Ethernet Tag Route

   This route is used for multicast pruning per I-SID. It is used for
   auto-discovery of PEs participating in a given I-SID so that a
   multicast tunnel (MP2P, P2P, P2MP, or MP2MP LSP) can be set up for
   that I-SID. [PBB-VPLS] uses multicast pruning per I-SID based on
   [MMRP], which is a soft-state protocol.
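   As a rough illustration of the per-I-SID auto-discovery described
   above, the following Python sketch builds, for each I-SID, the set of
   remote PEs that advertised an Inclusive Multicast Ethernet Tag route
   carrying that I-SID in its Ethernet Tag field. The route class and
   function names are hypothetical simplifications, not the [EVPN] NLRI
   encoding, and the sketch ignores RTs and tunnel attributes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InclusiveMulticastRoute:
    """Hypothetical, simplified Inclusive Multicast Ethernet Tag route."""
    originator_ip: str  # IP address of the advertising PE
    ethernet_tag: int   # carries the I-SID in PBB-EVPN


def build_isid_flood_lists(routes, local_ip):
    """Map each I-SID to the set of remote PEs participating in it."""
    flood_lists = defaultdict(set)
    for route in routes:
        if not 0 <= route.ethernet_tag < 2 ** 24:  # an I-SID is a 24-bit value
            raise ValueError(f"I-SID out of range: {route.ethernet_tag}")
        if route.originator_ip != local_ip:        # skip our own advertisement
            flood_lists[route.ethernet_tag].add(route.originator_ip)
    return dict(flood_lists)


routes = [
    InclusiveMulticastRoute("192.0.2.1", 1000),  # the local PE itself
    InclusiveMulticastRoute("192.0.2.2", 1000),
    InclusiveMulticastRoute("192.0.2.3", 2000),
]
print(build_isid_flood_lists(routes, local_ip="192.0.2.1"))
# {1000: {'192.0.2.2'}, 2000: {'192.0.2.3'}}
```

   In this sketch, a PE would only use the entries for I-SIDs it serves
   locally; each resulting set is the list of PEs that ingress
   replication, or a tunnel built for the I-SID, would need to reach.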

   The advantages of multicast pruning using this BGP route over [MMRP]
   are that a) it scales very well for a large number of PEs, and b) it
   works with any type of LSP (MP2P, P2P, P2MP, or MP2MP), whereas
   [MMRP] only works over P2P PWs. The Inclusive Multicast Ethernet Tag
   Route is encoded as follows:

   - The Ethernet Tag field is set to the appropriate I-SID value.

   - All other fields are set as defined in [EVPN].

   This route is tagged with an RT. This RT SHOULD be set to a value
   corresponding to its EVI (which is analogous to a B-VID). The RT for
   this route MAY also be auto-derived from the corresponding Ethernet
   Tag (I-SID) based on the procedure specified in section 9.4.1.1.1 of
   [EVPN].

6.4. Ethernet Segment Route

   This route is used as defined in [EVPN].

6.5. ESI Label Extended Community

   This extended community is not used in PBB-EVPN. In [EVPN], this
   extended community is used along with the Ethernet A-D route to
   advertise an MPLS label for the purpose of split-horizon filtering.
   Since split-horizon filtering in PBB-EVPN is performed natively using
   the B-MAC source address, there is no need for this extended
   community.

6.6. ES-Import Route Target

   This RT is used as defined in [EVPN].

6.7. MAC Mobility Extended Community

   This extended community is defined in [EVPN] and is used with a MAC
   Advertisement route (a B-MAC route in the case of PBB-EVPN). The
   B-MAC route is tagged with the RT corresponding to its EVI (which is
   analogous to a B-VID). When this extended community is used along
   with a B-MAC route in PBB-EVPN, it indicates that the C-MAC
   forwarding tables for all the I-SIDs associated with the RT tagging
   this B-MAC Advertisement route must be flushed.

6.8. Default Gateway Extended Community

   This extended community is not used in PBB-EVPN.

7. Operation

   This section discusses the operation of PBB-EVPN, specifically in
   areas where it differs from [EVPN].

7.1. MAC Address Distribution over Core

   In PBB-EVPN, host MAC addresses (i.e., C-MAC addresses) need not be
   distributed in BGP. Rather, every PE independently learns the C-MAC
   addresses in the data-plane via normal bridging operation. Every PE
   has a set of one or more unicast B-MAC addresses associated with it,
   and those are the addresses distributed over the core in MAC
   Advertisement routes.

7.2. Device Multi-homing

7.2.1. Flow-based Load-balancing

   This section describes the procedures for supporting device
   multi-homing in an all-active redundancy model with flow-based
   load-balancing.

7.2.1.1. PE B-MAC Address Assignment

   In [PBB], every BEB is uniquely identified by one or more B-MAC
   addresses. These addresses are usually locally administered by the
   Service Provider. For PBB-EVPN, the choice of B-MAC address(es) for
   the PE nodes must be examined carefully, as it has implications on
   the proper operation of multi-homing. In particular, for the scenario
   where a CE is multi-homed to a number of PE nodes with all-active
   redundancy and flow-based load-balancing, a given C-MAC address would
   be reachable via multiple PE nodes concurrently. Given that any given
   remote PE will bind the C-MAC address to a single B-MAC address, the
   various PE nodes connected to the same CE must share the same B-MAC
   address. Otherwise, the MAC address table of the remote PE nodes will
   keep oscillating between the B-MAC addresses of the various PE
   devices. For example, consider the network of Figure 1, and assume
   that PE1 has B-MAC BM1 and PE2 has B-MAC BM2. Also, assume that both
   links from CE1 to the PE nodes are part of an all-active
   multi-chassis Ethernet link aggregation group.
   If BM1 is not equal to BM2, the consequence is that the MAC address
   table on PE3 will keep oscillating such that the C-MAC address CM of
   CE1 would flip-flop between BM1 and BM2, depending on the
   load-balancing decision on CE1 for traffic destined to the core.

   Considering that there could be multiple sites (e.g., CEs) that are
   multi-homed to the same set of PE nodes, it is required for all the
   PE devices in a Redundancy Group to have a unique B-MAC address per
   site, shared among the PE nodes connected to that site. This way, it
   is possible to achieve fast convergence in the case where a link or
   port failure impacts the attachment circuit connecting a single site
   to a given PE.

                               +---------+
                  +------+ PE1 | IP/MPLS |
                 /             |         |
              CE1              | Network |   PEr
              M1 \             |         |
                  +------+ PE2 |         |
                  /------+     |         |
                 /             |         |
              CE2              |         |
              M2 \             |         |
                  \            |         |
                   +-----+ PE3 +---------+

              Figure 2: B-MAC Address Assignment

   In the example network shown in Figure 2 above, two sites
   corresponding to CE1 and CE2 are dual-homed to PE1/PE2 and PE2/PE3,
   respectively. Assume that BM1 is the B-MAC used for the site
   corresponding to CE1. Similarly, BM2 is the B-MAC used for the site
   corresponding to CE2. On PE1, a single B-MAC address (BM1) is
   required for the site corresponding to CE1. On PE2, two B-MAC
   addresses (BM1 and BM2) are required, one per site. On PE3, a single
   B-MAC address (BM2) is required for the site corresponding to CE2.
   All three PE nodes would advertise their respective B-MAC addresses
   in BGP using the MAC Advertisement routes defined in [EVPN]. The
   remote PE, PEr, would learn via BGP that BM1 is reachable via PE1 and
   PE2, whereas BM2 is reachable via both PE2 and PE3. Furthermore, PEr
   establishes via normal bridge learning that C-MAC M1 is reachable via
   BM1, and C-MAC M2 is reachable via BM2.
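   The two-level resolution that PEr performs in this example can be
   sketched as follows: the C-MAC to B-MAC binding comes from data-plane
   learning, while the B-MAC to PE path list comes from BGP MAC
   Advertisement routes. This minimal Python sketch reuses the names of
   Figure 2; the table layout, the function names, and the use of a
   CRC32 flow hash to choose among PEs are assumptions for illustration,
   not mechanisms specified by this document.

```python
import zlib

# Data-plane bindings on PEr, learned from traffic arriving from the core.
cmac_to_bmac = {"M1": "BM1", "M2": "BM2"}

# Path lists on PEr, built from BGP MAC Advertisement routes for each B-MAC.
bmac_to_pes = {"BM1": ["PE1", "PE2"], "BM2": ["PE2", "PE3"]}


def select_egress_pe(dst_cmac, flow_key):
    """Resolve a destination C-MAC to a single egress PE, or None if unknown.

    A per-flow hash (CRC32, chosen only for illustration) keeps each flow on
    one PE while spreading different flows across the B-MAC's path list.
    """
    bmac = cmac_to_bmac.get(dst_cmac)
    pes = bmac_to_pes.get(bmac, [])
    if not pes:
        return None  # unknown destination: handled as multi-destination traffic
    return pes[zlib.crc32(flow_key.encode()) % len(pes)]


def withdraw(bmac, pe):
    """Remove a PE from a B-MAC's path list when it withdraws its route."""
    bmac_to_pes[bmac] = [p for p in bmac_to_pes.get(bmac, []) if p != pe]


print(select_egress_pe("M1", "flow-0") in ("PE1", "PE2"))  # True
withdraw("BM1", "PE1")  # CE1 isolated from PE1, so PE1 withdraws BM1
print(select_egress_pe("M1", "flow-0"))  # PE2
```

   Note that the withdrawal only edits the B-MAC path list; the C-MAC
   binding of M1 to BM1 is untouched, so no C-MAC flushing is needed.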

   As a result of these two bindings, PEr can load-balance traffic
   destined to M1 between PE1 and PE2, as well as traffic destined to M2
   between PE2 and PE3. In the case of a failure that causes, for
   example, CE1 to be isolated from PE1, the latter can withdraw the
   route it has advertised for BM1. This way, PEr would update its path
   list for BM1 and send all traffic destined to M1 to PE2 only.

   For single-homed sites, it is possible to assign a unique B-MAC
   address per site, or to have all the single-homed sites connected to
   a given PE share a single B-MAC address. The advantage of the first
   model over the second is the ability to avoid a C-MAC destination
   address lookup on the disposition PE (even though source C-MAC
   learning is still required in the data-plane). Also, by assigning the
   B-MAC addresses from a contiguous range, it is possible to advertise
   a single B-MAC subnet for all single-homed sites, thereby making the
   number of MAC Advertisement routes required on par with the second
   model.

   In summary, every PE may use a unicast B-MAC address shared by all
   single-homed CEs or a unicast B-MAC address per single-homed CE and,
   in addition, a unicast B-MAC address per dual-homed CE. In the latter
   case, the B-MAC address MUST be the same for all PE nodes in a
   Redundancy Group connected to the same CE.

7.2.1.2. Automating B-MAC Address Assignment

   The PE B-MAC address used for single-homed sites can be automatically
   derived from the hardware (using, e.g., the backplane's address).
   However, the B-MAC address used for multi-homed sites must be
   coordinated among the RG members. To automate the assignment of this
   latter address, the PE can derive this B-MAC address from the MAC
   Address portion of the CE's LACP System Identifier by flipping the
   'Locally Administered' bit of the CE's address.
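   The derivation described above can be sketched in a few lines of
   Python, assuming the MAC Address portion of the CE's LACP System
   Identifier is available as a colon- or hyphen-separated string; the
   function name is hypothetical. The 'Locally Administered' bit is the
   0x02 bit of the first octet, so for a CE System Identifier taken from
   a universally administered address the result is a locally
   administered B-MAC that every PE attached to that CE computes
   identically.

```python
def bmac_from_lacp_system_id(system_mac: str) -> str:
    """Derive the shared B-MAC for a multi-homed CE from its LACP System ID MAC."""
    octets = bytes.fromhex(system_mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {system_mac!r}")
    # Flip the 'Locally Administered' bit (0x02) of the first octet.
    derived = bytes([octets[0] ^ 0x02]) + octets[1:]
    return ":".join(f"{b:02x}" for b in derived)


# PE1 and PE2 both receive the same System ID from CE1 via LACP,
# so each independently computes the same B-MAC.
print(bmac_from_lacp_system_id("00:1b:21:3c:4d:5e"))  # 02:1b:21:3c:4d:5e
```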

   Deriving the B-MAC address in this way guarantees the uniqueness of
   the B-MAC address within the network, and ensures that all PE nodes
   connected to the same multi-homed CE use the same value for the
   B-MAC address.

   Note that with this automatic provisioning of the B-MAC address
   associated with multi-homed CEs, it is not possible to support the
   uncommon scenario where a CE has multiple bundles towards the PE
   nodes, and the service involves hair-pinning traffic from one bundle
   to another. This is because the split-horizon filtering relies on
   B-MAC addresses rather than Site-ID Labels (as will be described in
   the next section). The operator must explicitly configure the B-MAC
   address for this fairly uncommon service scenario.

   Whenever a B-MAC address is provisioned on the PE, either manually or
   automatically (as an outcome of CE auto-discovery), the PE MUST
   transmit a MAC Advertisement Route for the B-MAC address with a
   downstream assigned MPLS label that uniquely identifies that address
   on the advertising PE. The route is tagged with the RTs of the
   associated EVIs as described above.

7.2.1.3. Split Horizon and Designated Forwarder Election

   [EVPN] relies on access split horizon, where the Ethernet Segment
   Label is used for egress filtering on the attachment circuit in order
   to prevent forwarding loops. In PBB-EVPN, the B-MAC source address
   can be used for the same purpose, as it uniquely identifies the
   originating site of a given frame. As such, Segment Labels are not
   used in PBB-EVPN, and the egress split-horizon filtering is done
   based on the B-MAC source address. It is worth noting here that [PBB]
   defines this B-MAC address based filtering function as part of the
   I-Component options; hence, no new functions are required to support
   split-horizon beyond what is already defined in [PBB].
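   The egress check described above, combined with the DF filtering that
   applies to multi-destination frames in the flow-based case, can be
   sketched as follows. The frame type, parameter names, and B-MAC
   labels are hypothetical; in an actual BEB this filtering is part of
   the [PBB] I-Component, as noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PbbFrame:
    """Hypothetical, simplified PBB frame carrying only the fields used here."""
    b_sa: str   # B-MAC source address; identifies the originating site
    i_sid: int  # backbone service instance identifier


def egress_allowed(frame, site_bmac, is_df, multi_destination):
    """Return True if a frame arriving from the core may be sent on an AC.

    site_bmac: B-MAC assigned to the multi-homed site behind this AC.
    is_df: whether this PE is the DF for frame.i_sid on that site's segment.
    multi_destination: True for broadcast, multicast, or unknown unicast.
    """
    # Split horizon: never return a frame to the site that originated it.
    if frame.b_sa == site_bmac:
        return False
    # DF filtering: only the DF delivers multi-destination frames to the site.
    if multi_destination and not is_df:
        return False
    return True


# A frame flooded by PE2 on behalf of CE1 (B-MAC BM1) reaches PE1:
# PE1 must not send it back toward CE1, even though PE1 is the DF.
print(egress_allowed(PbbFrame("BM1", 1000), "BM1", True, True))  # False
```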

   Given that the Segment label is not used in PBB-EVPN, the PE sets the
   Label field in the Ethernet Segment Route to 0.

   The Designated Forwarder election procedures are defined in
   [I-D-Segment-Route].

7.2.2. I-SID Based Load-balancing

   This section describes the procedures for supporting device
   multi-homing in a single-active redundancy model with per-ISID
   load-balancing.

7.2.2.1. PE B-MAC Address Assignment

   In the case where per-ISID load-balancing is desired among the PE
   nodes in a given redundancy group, multiple unicast B-MAC addresses
   are allocated per multi-homed Ethernet Segment: each PE connected to
   the multi-homed segment is assigned a unique B-MAC. Every PE then
   advertises its B-MAC address using the BGP MAC Advertisement route.
   In this mode of operation, two B-MAC address assignment models are
   possible:

   - The PE may use a shared B-MAC address for multiple Ethernet
     Segments. This includes the single-homed segments as well as the
     multi-homed segments operating with per-ISID load-balancing mode.

   - The PE may use a dedicated B-MAC address for each Ethernet Segment
     operating with per-ISID load-balancing mode.

   All PE implementations MUST support the shared B-MAC address model
   and MAY support the dedicated B-MAC address model.

   A remote PE initially floods traffic to a destination C-MAC address,
   located in a given multi-homed Ethernet Segment, to all the PE nodes
   connected to that segment. Then, when reply traffic arrives at the
   remote PE, it learns (in the data-path) the B-MAC address and
   associated next-hop PE to use for said C-MAC address.
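   The flood-then-learn behavior of the remote PE described above can be
   sketched as a small per-I-SID table. This is a minimal Python model
   under simplifying assumptions: one table per I-SID, a static flood
   list, and hypothetical class, method, and B-MAC names; aging and
   MAC moves are not modeled.

```python
class RemotePeCmacTable:
    """Hypothetical per-I-SID C-MAC table on a remote PE (per-ISID load-balancing)."""

    def __init__(self, flood_list):
        self.flood_list = list(flood_list)  # PEs that receive flooded traffic for this I-SID
        self.cmac_to_bmac = {}              # C-MAC -> B-MAC, learned in the data path
        self.bmac_to_pe = {}                # B-MAC -> advertising PE, learned from BGP

    def learn_bgp_bmac(self, bmac, pe):
        """Record a B-MAC learned from a BGP MAC Advertisement route."""
        self.bmac_to_pe[bmac] = pe

    def learn_from_core(self, src_cmac, src_bmac):
        """Bind a source C-MAC to the source B-MAC of a frame arriving from the core."""
        self.cmac_to_bmac[src_cmac] = src_bmac

    def next_hops(self, dst_cmac):
        """Return the PE(s) that a frame destined to dst_cmac is sent to."""
        bmac = self.cmac_to_bmac.get(dst_cmac)
        if bmac in self.bmac_to_pe:
            return [self.bmac_to_pe[bmac]]
        return list(self.flood_list)  # unknown destination: flood


table = RemotePeCmacTable(flood_list=["PE1", "PE2"])
table.learn_bgp_bmac("BM-PE1", "PE1")
table.learn_bgp_bmac("BM-PE2", "PE2")
print(table.next_hops("CM"))           # ['PE1', 'PE2'] (flooded)
table.learn_from_core("CM", "BM-PE2")  # reply traffic arrives via PE2
print(table.next_hops("CM"))           # ['PE2']
```

   Because each PE on the segment has its own B-MAC in this mode, the
   learned B-MAC also identifies which PE is currently forwarding for
   the I-SID.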

7.2.2.2. Split Horizon and Designated Forwarder Election

   The procedures are similar to the flow-based load-balancing case,
   with the only difference being that the DF filtering must be applied
   to unicast as well as multicast traffic, and in both core-to-segment
   and segment-to-core directions.

7.2.2.3. Handling Failure Scenarios

   When a PE connected to a multi-homed Ethernet Segment loses
   connectivity to the segment, due to link or port failure, it needs to
   notify the remote PEs to trigger C-MAC address flushing. This can be
   achieved in one of two ways, depending on the B-MAC assignment model:

   - If the PE uses a shared B-MAC address for multiple Ethernet
     Segments, then the C-MAC flushing is signaled by having the failed
     PE re-advertise the MAC Advertisement route for the associated
     B-MAC, tagged with the MAC Mobility Extended Community attribute.
     The value of the Counter field in that attribute must be
     incremented prior to advertisement. This causes the remote PE nodes
     to flush all C-MAC addresses associated with the B-MAC in question,
     across all I-SIDs that are mapped to the EVI of the re-advertised
     MAC route.

   - If the PE uses a dedicated B-MAC address for each Ethernet Segment
     operating under per-ISID load-balancing mode, the failed PE simply
     withdraws the B-MAC route previously advertised for that segment.
     This causes the remote PE nodes to flush all C-MAC addresses
     associated with the B-MAC in question, across all I-SIDs that are
     mapped to the EVI of the withdrawn MAC route.

   When a PE connected to a multi-homed Ethernet Segment fails (i.e.,
   node failure) or when the PE becomes completely isolated from the
   EVPN network, the remote PEs will start purging the MAC Advertisement
   routes that were advertised by the failed PE.
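   The remote-PE reaction common to these cases (flush on an incremented
   MAC Mobility counter, flush on withdrawal) can be sketched as
   follows. This is a minimal Python model, not an implementation of
   [EVPN]: the class and field names are hypothetical, a missing MAC
   Mobility Extended Community is treated as a counter of 0, and counter
   wrap-around is ignored.

```python
class BmacRouteHandler:
    """Hypothetical remote-PE handling of B-MAC MAC Advertisement routes."""

    def __init__(self):
        self.counters = {}     # (evi, b_mac) -> highest MAC Mobility counter seen
        self.evi_isids = {}    # evi -> I-SIDs mapped to that EVI
        self.cmac_tables = {}  # i_sid -> {c_mac: b_mac}, learned in the data plane

    def _flush(self, evi, b_mac):
        """Remove every C-MAC bound to b_mac in all I-SIDs mapped to the EVI."""
        for i_sid in self.evi_isids.get(evi, ()):
            table = self.cmac_tables.get(i_sid, {})
            for c_mac in [c for c, b in table.items() if b == b_mac]:
                del table[c_mac]

    def on_advertise(self, evi, b_mac, mobility_counter=0):
        """Handle an advertisement; a missing MAC Mobility community counts as 0."""
        key = (evi, b_mac)
        previous = self.counters.get(key)
        if previous is None:
            self.counters[key] = mobility_counter  # first advertisement: no flush
        elif mobility_counter > previous:
            self.counters[key] = mobility_counter
            self._flush(evi, b_mac)                # incremented counter: flush

    def on_withdraw(self, evi, b_mac):
        """Handle a withdrawal (dedicated B-MAC, node failure, or PE isolation)."""
        self.counters.pop((evi, b_mac), None)
        self._flush(evi, b_mac)


h = BmacRouteHandler()
h.evi_isids[10] = {1000, 2000}
h.cmac_tables = {1000: {"CM-a": "BM1", "CM-b": "BM2"}, 2000: {"CM-c": "BM1"}}
h.on_advertise(10, "BM1")                      # initial advertisement
h.on_advertise(10, "BM1", mobility_counter=1)  # re-advertised after a link failure
print(h.cmac_tables)  # {1000: {'CM-b': 'BM2'}, 2000: {}}
```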

   In the node failure and isolation cases, the purge is triggered
   either by the remote PEs detecting that the BGP session to the failed
   PE has gone down, or by a Route Reflector withdrawing all the routes
   that were advertised by the failed PE. The remote PEs, in this case,
   perform C-MAC address flushing as an outcome of the MAC Advertisement
   route withdrawals.

   For all failure scenarios (link/port failure, node failure and PE
   node isolation), when the fault condition clears, the recovered PE
   re-advertises the associated Ethernet Segment route to the other
   members of its Redundancy Group. This triggers the backup PE(s) in
   the Redundancy Group to block the I-SIDs for which the recovered PE
   is a DF. When a backup PE blocks the I-SIDs, it triggers a C-MAC
   address flush notification to the remote PEs by re-advertising the
   MAC Advertisement route for the associated B-MAC with the MAC
   Mobility Extended Community attribute. The value of the Counter field
   in that attribute must be incremented prior to advertisement. This
   causes the remote PE nodes to flush all C-MAC addresses associated
   with the B-MAC in question, across all I-SIDs that are mapped to the
   EVI of the re-advertised MAC route.

7.3. Network Multi-homing

   When an Ethernet network is multi-homed to a set of PE nodes running
   PBB-EVPN, a single-active redundancy model can be supported with
   per-service-instance (i.e., I-SID) load-balancing. In this model, DF
   election is performed to ensure that a single PE node in the
   redundancy group is responsible for forwarding traffic associated
   with a given I-SID. This guarantees that no forwarding loops are
   created.
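   Because the DF election algorithm itself is defined elsewhere
   ([I-D-Segment-Route]), the following Python sketch only assumes a
   modulo-based service carving scheme of the kind used in EVPN: order
   the candidate PEs (discovered via Ethernet Segment routes) by IP
   address and select the one at index (I-SID mod N). The function name
   is hypothetical.

```python
import ipaddress


def elect_df(isid, candidate_pes):
    """Return the DF for an I-SID, assuming modulo-based service carving.

    Candidates are ordered by numeric IP address (not string order), so
    every PE in the redundancy group computes the same result.
    """
    if not candidate_pes:
        raise ValueError("no candidate PEs for this Ethernet Segment")
    ordered = sorted(candidate_pes, key=ipaddress.ip_address)
    return ordered[isid % len(ordered)]


pes = ["192.0.2.20", "192.0.2.3"]
print(elect_df(1000, pes), elect_df(1001, pes))  # 192.0.2.3 192.0.2.20
```

   Each PE in the redundancy group runs the same computation over the
   same inputs, so the PEs agree on the DF for every I-SID without
   further signaling; non-DF PEs then block the I-SID.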

   Filtering based on DF state applies to both unicast and multicast
   traffic, and in both access-to-core and core-to-access directions
   (unlike the multi-homed device scenario with flow-based
   load-balancing, where DF filtering is limited to multi-destination
   frames in the core-to-access direction). Similar to the multi-homed
   device scenario with I-SID based load-balancing, a unique B-MAC
   address is assigned to each of the PE nodes connected to the
   multi-homed network (Segment).

7.4. Frame Forwarding

   The frame forwarding functions are divided between the Bridge Module,
   which hosts the [PBB] Backbone Edge Bridge (BEB) functionality, and
   the MPLS Forwarder, which handles the MPLS imposition/disposition.
   The details of frame forwarding for unicast and multi-destination
   frames are discussed next.

7.4.1. Unicast

   Known unicast traffic received from the AC will be PBB-encapsulated
   by the PE using the B-MAC source address corresponding to the
   originating site. The unicast B-MAC destination address is determined
   based on a lookup of the C-MAC destination address (the binding of
   the two is done via transparent learning of reverse traffic). The
   resulting frame is then encapsulated with an LSP tunnel label and the
   MPLS label that uniquely identifies the B-MAC destination address on
   the egress PE. If per-flow load-balancing over ECMPs in the MPLS core
   is required, then a flow label is added as the bottom-of-stack label.

   For unknown unicast traffic, the PE forwards the frames over the MPLS
   core using the same set of options used for forwarding
   multicast/broadcast frames (described in the next section).

7.4.2. Multicast/Broadcast

   Multi-destination frames received from the AC will be
   PBB-encapsulated by the PE using the B-MAC source address
   corresponding to the originating site.
   The multicast B-MAC destination address is selected based on the
   value of the I-SID as defined in [PBB]. The resulting frame is then
   forwarded over the MPLS core using one of the following two options:

   Option 1: the MPLS Forwarder can perform ingress replication over a
   set of MP2P tunnel LSPs. The frame is encapsulated with a tunnel LSP
   label and the EVPN ingress replication label advertised in the
   Inclusive Multicast Route.

   Option 2: the MPLS Forwarder can use a P2MP tunnel LSP per the
   procedures defined in [EVPN]. This includes the use of either
   Inclusive or Aggregate Inclusive trees.

   Note that the same procedures for advertising and handling the
   Inclusive Multicast Route defined in [EVPN] apply here.

8. Minimizing ARP Broadcast

   The PE nodes implement an ARP-proxy function in order to minimize
   the volume of ARP traffic that is broadcast over the MPLS network.
   This is achieved by having each PE node snoop on ARP request and
   response messages received over the access interfaces or the MPLS
   core. The PE builds a cache of IP / MAC address bindings from these
   snooped messages. The PE then uses this cache to respond to ARP
   requests that are received on access ports and that target hosts in
   remote sites. If the PE finds a match for the IP address in its ARP
   cache, it responds to the requesting host and drops the request.
   Otherwise, the request is flooded over the MPLS network using either
   ingress replication or LSM.

9. Seamless Interworking with IEEE 802.1aq/802.1Qbp

                           +--------------+
                           |              |
              +---------+  |     MPLS     |  +---------+
      +----+  |         |  +----+    +----+  |         |  +----+
      |SW1 |--|         |  | PE1|    | PE2|  |         |--| SW3|
      +----+  | 802.1aq |--|    |    |    |--| 802.1aq |  +----+
      +----+  |  .1Qbp  |  +----+    +----+  |  .1Qbp  |  +----+
      |SW2 |--|         |  |   Backbone   |  |         |--| SW4|
      +----+  +---------+  +--------------+  +---------+  +----+

      |<----- IS-IS ------>|<-----BGP---->|<----- IS-IS ------>| CP

      |<------------------------ PBB ------------------------->| DP
                           |<----MPLS---->|

      Legend: CP = Control Plane View
              DP = Data Plane View

     Figure 7: Interconnecting 802.1aq/802.1Qbp Networks with PBB-EVPN

9.1. B-MAC Address Assignment

   For the same reasons cited in the TRILL section, the B-MAC addresses
   need to be globally unique across all the IEEE 802.1aq / 802.1Qbp
   networks. The same hierarchical address assignment scheme depicted
   above is proposed for B-MAC addresses as well.

9.2. IEEE 802.1aq / 802.1Qbp B-MAC Advertisement Route

   B-MAC addresses associated with 802.1aq / 802.1Qbp switches are
   advertised using the BGP MAC Advertisement route already defined in
   [EVPN].

   The encapsulation for the transport of PBB frames over MPLS is
   similar to that of classical Ethernet, albeit with the additional
   PBB header, as shown in the figure below:

                        +------------------+
                        |  IP/MPLS Header  |
                        +------------------+
                        |    PBB Header    |
                        +------------------+
                        | Ethernet Header  |
                        +------------------+
                        | Ethernet Payload |
                        +------------------+
                        |   Ethernet FCS   |
                        +------------------+

                 Figure 8: PBB over MPLS Encapsulation

9.3. Operation

   When a PE receives a PBB-encapsulated Ethernet frame from the access
   side, it performs a lookup on the B-MAC destination address to
   identify the next hop. If the lookup indicates that the next hop is
   a remote PE, the local PE then encapsulates the PBB frame in MPLS.
   The label stack comprises the VPN label (advertised by the remote
   PE) and, above it, an LSP/IGP label. From that point onwards,
   regular MPLS forwarding is applied.

   On the disposition PE, assuming penultimate hop popping is employed,
   the PE receives the MPLS-encapsulated PBB frame with a single label:
   the VPN label. The value of the label indicates to the disposition
   PE that this is a PBB frame. The PE therefore pops the label and
   reinitializes the TTL field (in the 802.1Qbp F-Tag). Normal PBB
   processing is applied from this point onwards.

10. Solution Advantages

   In this section, we discuss the advantages of the PBB-EVPN solution
   in the context of the requirements set forth in section 3 above.

10.1. MAC Advertisement Route Scalability

   In PBB-EVPN, the number of MAC Advertisement Routes is a function of
   the number of segments (sites), rather than the number of
   hosts/servers. This is because BGP carries the B-MAC addresses of
   the PEs rather than the C-MAC addresses of hosts/servers. As
   discussed above, there is a one-to-one mapping between multi-homed
   segments and B-MAC addresses. For single-homed segments on a given
   PE, the mapping to B-MAC addresses is either one-to-one or
   many-to-one. As a result, the volume of MAC Advertisement Routes in
   PBB-EVPN is multiple orders of magnitude lower than in EVPN.

10.2. C-MAC Mobility with MAC Sub-netting

   In PBB-EVPN, if a PE allocates its B-MAC addresses from a contiguous
   range, then it can advertise a MAC prefix rather than individual
   48-bit addresses. It should be noted that B-MAC addresses can easily
   be assigned from a contiguous range because PE nodes are within the
   provider administrative domain; however, CE devices and hosts are
   typically not within the provider administrative domain.
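
   The Python sketch below makes the prefix idea concrete. It treats a
   MAC prefix as a 48-bit value plus a prefix length, in the same way
   as an IP prefix. This draft does not define a MAC prefix encoding.
   The representation, the function names, and the example addresses
   are therefore assumptions made for illustration only.

```python
def mac_to_int(mac: str) -> int:
    """Parse 'aa:bb:cc:dd:ee:ff' (or '-' separated) into a 48-bit int."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | int(octet, 16)
    return value


def in_prefix(mac: str, prefix: str, length: int) -> bool:
    """True if the top `length` bits of `mac` match those of `prefix`
    (hypothetical MAC-prefix match, analogous to an IP prefix)."""
    if not 0 <= length <= 48:
        raise ValueError("prefix length must be between 0 and 48")
    shift = 48 - length
    return (mac_to_int(mac) >> shift) == (mac_to_int(prefix) >> shift)


def covering_prefix_length(first: str, count: int) -> int:
    """Prefix length covering `count` contiguous B-MACs starting at
    `first`. Assumes `count` is a power of two and the block is aligned
    on its own size, so that a single prefix covers it exactly."""
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a positive power of two")
    if mac_to_int(first) & (count - 1):
        raise ValueError("block is not aligned on its size")
    host_bits = count.bit_length() - 1
    return 48 - host_bits


if __name__ == "__main__":
    length = covering_prefix_length("02:00:00:00:01:00", 256)
    print(length)
    print(in_prefix("02:00:00:00:01:7f", "02:00:00:00:01:00", length))
```

   Under these assumptions, a PE allocating 256 B-MACs starting at
   02:00:00:00:01:00 could cover them with one /40 MAC prefix instead
   of 256 individual MAC Advertisement routes. The alignment check
   mirrors IP aggregation: a prefix covers a range exactly only when
   the range starts on a multiple of its size.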
   The advantage of such MAC address sub-netting can be maintained even
   as C-MAC addresses move from one Ethernet segment to another. This
   is because the C-MAC address to B-MAC address association is learnt
   in the data-plane and C-MAC addresses are not advertised in BGP. To
   illustrate how this compares to EVPN, consider the following
   example:

   Suppose a PE running EVPN advertises reachability for a MAC subnet
   that spans N addresses via a particular segment. If 50% of the MAC
   addresses in that subnet then move to other segments (e.g., due to
   virtual machine mobility), in the worst case N/2 additional MAC
   Advertisement routes need to be sent for the MAC addresses that have
   moved. This defeats the purpose of the sub-netting. With PBB-EVPN,
   on the other hand, the sub-netting applies to the B-MAC addresses,
   which are statically associated with PE nodes and are not subject to
   mobility. As C-MAC addresses move from one segment to another, the
   binding of C-MAC to B-MAC addresses is updated via data-plane
   learning.

10.3. C-MAC Address Learning and Confinement

   In PBB-EVPN, C-MAC address reachability information is built via
   data-plane learning. As such, PE nodes not participating in active
   conversations involving a particular C-MAC address will purge that
   address from their forwarding tables. Furthermore, since C-MAC
   addresses are not distributed in BGP, PE nodes will not maintain any
   record of them in their control-plane routing tables.

10.4. Seamless Interworking with TRILL and 802.1aq Access Networks

   Consider the scenario where two access networks, one running MPLS
   and the other running 802.1aq, are interconnected via an MPLS
   backbone network. The figure below shows such an example network.
                           +--------------+
                           |              |
              +---------+  |     MPLS     |  +---------+
      +----+  |         |  +----+    +----+  |         |  +----+
      | CE |--|         |  | PE1|    | PE2|  |         |--| CE |
      +----+  | 802.1aq |--|    |    |    |--|  MPLS   |  +----+
      +----+  |         |  +----+    +----+  |         |  +----+
      | CE |--|         |  |   Backbone   |  |         |--| CE |
      +----+  +---------+  +--------------+  +---------+  +----+

                  Figure 9: Interoperability with 802.1aq

   If the MPLS backbone network employs EVPN, then the 802.1aq
   data-plane encapsulation must be terminated on PE1 or on the edge
   device connecting to PE1. Either way, all the PE nodes that are part
   of the associated service instances will be exposed to all the C-MAC
   addresses of all hosts/servers connected to the access networks.
   However, if the MPLS backbone network employs PBB-EVPN, then the
   802.1aq encapsulation can be extended over the MPLS backbone,
   thereby maintaining C-MAC address transparency on PE1. If PBB-EVPN
   is also extended over the MPLS access network on the right, then
   C-MAC addresses would be transparent to PE2 as well.

   Interoperability with TRILL access networks will be described in a
   future revision of this draft.

10.5. Per Site Policy Support

   In PBB-EVPN, a unique B-MAC address can be associated with every
   site (single-homed or multi-homed). Given that the B-MAC addresses
   are sent in BGP MAC Advertisement routes, it is possible to define
   per-site (i.e., per-B-MAC) forwarding policies, including policies
   for E-TREE service.

10.6. Avoiding C-MAC Address Flushing

   With PBB-EVPN, it is possible to avoid C-MAC address flushing upon a
   topology change affecting a multi-homed device. To illustrate this,
   consider the example network of Figure 1. Both PE1 and PE2 advertise
   the same B-MAC address (BM1) to PE3. PE3 then learns the C-MAC
   addresses of the servers/hosts behind CE1 via data-plane learning.
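
   The state that PE3 keeps in this example can be modelled as two
   separate tables. A control-plane table maps each B-MAC to the set of
   PEs advertising it in BGP. A data-plane table binds learnt C-MACs to
   B-MACs. The Python sketch below is a hypothetical model; the class
   and method names do not come from this document. It assumes that the
   C-MACs bound to a B-MAC are flushed only when the last advertisement
   of that B-MAC is withdrawn.

```python
from collections import defaultdict


class RemotePE:
    """Toy model of a remote PE (e.g., PE3) in PBB-EVPN; hypothetical."""

    def __init__(self) -> None:
        # Control plane: B-MAC -> PEs currently advertising it in BGP.
        self.bmac_adjacencies = defaultdict(set)
        # Data plane: C-MAC -> B-MAC, learnt from received PBB frames.
        self.cmac_to_bmac = {}

    def on_mac_route(self, bmac: str, pe: str) -> None:
        """A PE advertised a MAC Advertisement route for `bmac`."""
        self.bmac_adjacencies[bmac].add(pe)

    def on_mac_route_withdraw(self, bmac: str, pe: str) -> None:
        """A PE withdrew its MAC Advertisement route for `bmac`."""
        remaining = self.bmac_adjacencies.get(bmac, set())
        remaining.discard(pe)
        if not remaining:
            # Last adjacency gone: flush the C-MACs bound to this B-MAC.
            self.bmac_adjacencies.pop(bmac, None)
            self.cmac_to_bmac = {
                c: b for c, b in self.cmac_to_bmac.items() if b != bmac
            }
        # Otherwise the B-MAC is still reachable via another PE and the
        # C-MAC bindings are left untouched.

    def learn(self, cmac: str, bmac: str) -> None:
        """Data-plane learning of a C-MAC behind a B-MAC."""
        self.cmac_to_bmac[cmac] = bmac


if __name__ == "__main__":
    pe3 = RemotePE()
    pe3.on_mac_route("BM1", "PE1")
    pe3.on_mac_route("BM1", "PE2")
    pe3.learn("C-MAC-1", "BM1")
    pe3.on_mac_route_withdraw("BM1", "PE1")  # e.g., AC1 fails
    print(pe3.cmac_to_bmac, dict(pe3.bmac_adjacencies))
```

   In the usage above, withdrawing PE1's advertisement of BM1 leaves
   PE2 as the only adjacency for BM1. The binding of C-MAC-1 to BM1 is
   kept. Separating the two tables is what allows a B-MAC level event
   to be absorbed without touching C-MAC state.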
   If AC1 fails, then PE3 does not need to flush any of the C-MAC
   addresses learnt and associated with BM1. This is because PE1 will
   withdraw the MAC Advertisement routes associated with BM1, leaving
   PE3 with a single adjacency (to PE2) for this B-MAC address. The
   topology change is therefore communicated to PE3, and no C-MAC
   address flushing is required.

11. Acknowledgements

   The authors would like to thank Jose Liste and Patrice Brissette for
   their reviews and comments on this document.

12. Security Considerations

   There are no additional security aspects beyond those of VPLS/H-VPLS
   that need to be discussed here.

13. IANA Considerations

   This document requires IANA to assign a new SAFI value for L2VPN_MAC
   SAFI.

14. Intellectual Property Considerations

   This document is being submitted for use in IETF standards
   discussions.

15. Normative References

   [PBB]      Clauses 25 and 26 of "IEEE Standard for Local and
              metropolitan area networks - Media Access Control (MAC)
              Bridges and Virtual Bridged Local Area Networks", IEEE Std
              802.1Q, 2013.

16. Informative References

   [PBB-VPLS] Sajassi, et al., "VPLS Interoperability with Provider
              Backbone Bridges", draft-ietf-l2vpn-pbb-vpls-interop-05.txt,
              work in progress, October 2013.

   [EVPN-REQ] Sajassi, et al., "Requirements for Ethernet VPN (EVPN)",
              draft-ietf-l2vpn-evpn-req-05.txt, work in progress,
              October 2013.

   [EVPN]     Sajassi, et al., "BGP MPLS Based Ethernet VPN",
              draft-ietf-l2vpn-evpn-04.txt, work in progress, July 2013.

   [MMRP]     Clause 10 of "IEEE Standard for Local and metropolitan
              area networks - Media Access Control (MAC) Bridges and
              Virtual Bridged Local Area Networks", IEEE Std 802.1Q,
              2013.

17. Authors' Addresses

   Ali Sajassi
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134, US
   Email: sajassi@cisco.com

   Samer Salam
   Cisco
   595 Burrard Street, Suite # 2123
   Vancouver, BC V7X 1J1, Canada
   Email: ssalam@cisco.com

   Sami Boutros
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134, US
   Email: sboutros@cisco.com

   Nabil Bitar
   Verizon Communications
   Email: nabil.n.bitar@verizon.com

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   Florin Balus
   Alcatel-Lucent
   701 E. Middlefield Road
   Mountain View, CA, USA 94043
   Email: florin.balus@alcatel-lucent.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.be

   Clarence Filsfils
   Cisco
   Email: cfilsfil@cisco.com

   Dennis Cai
   Cisco
   Email: dcai@cisco.com

   Lizhong Jin
   ZTE Corporation
   889, Bibo Road
   Shanghai, 201203, China
   Email: lizhong.jin@zte.com.cn