Internet Working Group                                      Ali Sajassi
Internet Draft                                              Samer Salam
Category: Standards Track                                  Sami Boutros
                                                                  Cisco

Florin Balus                                                Nabil Bitar
Wim Henderickx                                                  Verizon
Alcatel-Lucent
                                                           Aldrin Isaac
Clarence Filsfils                                             Bloomberg
Dennis Cai
Cisco                                                       Lizhong Jin
                                                                    ZTE

Expires: January 16, 2014                                 July 16, 2013


                                PBB-EVPN
                      draft-ietf-l2vpn-pbb-evpn-05


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
Abstract

   This document discusses how Ethernet Provider Backbone Bridging
   [802.1ah] can be combined with EVPN in order to reduce the number of
   BGP MAC advertisement routes by aggregating Customer/Client MAC
   (C-MAC) addresses via Provider Backbone MAC address (B-MAC), provide
   client MAC address mobility using C-MAC aggregation and B-MAC
   sub-netting, confine the scope of C-MAC learning to only active
   flows, offer per site policies and avoid C-MAC address flushing on
   topology changes. The combined solution is referred to as PBB-EVPN.

Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Table of Contents

   1. Introduction
   2. Contributors
   3. Terminology
   4. Requirements
     4.1. MAC Advertisement Route Scalability
     4.2. C-MAC Mobility with MAC Summarization
     4.3. C-MAC Address Learning and Confinement
     4.4. Per Site Policy Support
     4.5. Avoiding C-MAC Address Flushing
   5. Solution Overview
   6. BGP Encoding
     6.1. BGP MAC Advertisement Route
     6.2. Ethernet Auto-Discovery Route
     6.3. Per VPN Route Targets
     6.4. MAC Mobility Extended Community
   7. Operation
     7.1. MAC Address Distribution over Core
     7.2. Device Multi-homing
       7.2.1 Flow-based Load-balancing
         7.2.1.1 PE B-MAC Address Assignment
         7.2.1.2. Automating B-MAC Address Assignment
         7.2.1.3 Split Horizon and Designated Forwarder Election
       7.2.2 I-SID Based Load-balancing
         7.2.2.1 PE B-MAC Address Assignment
         7.2.2.2 Split Horizon and Designated Forwarder Election
         7.2.2.3 Handling Failure Scenarios
     7.3. Network Multi-homing
     7.4. Frame Forwarding
       7.4.1. Unicast
       7.4.2. Multicast/Broadcast
   8. Minimizing ARP Broadcast
   9. Seamless Interworking with IEEE 802.1aq/802.1Qbp
     9.1 B-MAC Address Assignment
     9.2 IEEE 802.1aq / 802.1Qbp B-MAC Advertisement Route
     9.3 Operation:
   10. Solution Advantages
     10.1. MAC Advertisement Route Scalability
     10.2. C-MAC Mobility with MAC Sub-netting
     10.3. C-MAC Address Learning and Confinement
     10.4. Seamless Interworking with TRILL and 802.1aq Access
           Networks
     10.5. Per Site Policy Support
     10.6. Avoiding C-MAC Address Flushing
   11. Acknowledgements
   12. Security Considerations
   13. IANA Considerations
   14. Intellectual Property Considerations
   15. Normative References
   16. Informative References
   17. Authors' Addresses

1. Introduction

   [EVPN] introduces a solution for multipoint L2VPN services, with
   advanced multi-homing capabilities, using BGP for distributing
   customer/client MAC address reach-ability information over the core
   MPLS/IP network. [802.1ah] defines an architecture for Ethernet
   Provider Backbone Bridging (PBB), where MAC tunneling is employed to
   improve service instance and MAC address scalability in Ethernet as
   well as VPLS networks [PBB-VPLS].

   In this document, we discuss how PBB can be combined with EVPN in
   order to: reduce the number of BGP MAC advertisement routes by
   aggregating Customer/Client MAC (C-MAC) addresses via Provider
   Backbone MAC address (B-MAC), provide client MAC address mobility
   using C-MAC aggregation and B-MAC sub-netting, confine the scope of
   C-MAC learning to only active flows, offer per site policies and
   avoid C-MAC address flushing on topology changes. The combined
   solution is referred to as PBB-EVPN.

2. Contributors

   In addition to the authors listed above, the following individuals
   also contributed to this document.

   Keyur Patel, Cisco
   Sam Aldrin, Huawei
   Himanshu Shah, Ciena

3. Terminology

   BEB: Backbone Edge Bridge
   B-MAC: Backbone MAC Address
   CE: Customer Edge
   C-MAC: Customer/Client MAC Address
   DHD: Dual-homed Device
   DHN: Dual-homed Network
   LACP: Link Aggregation Control Protocol
   LSM: Label Switched Multicast
   MDT: Multicast Delivery Tree
   MP2MP: Multipoint to Multipoint
   P2MP: Point to Multipoint
   P2P: Point to Point
   PE: Provider Edge
   PoA: Point of Attachment
   PW: Pseudowire
   EVPN: Ethernet VPN

4. Requirements

   The requirements for PBB-EVPN include all the requirements for EVPN
   that were described in [EVPN-REQ], in addition to the following:

4.1. MAC Advertisement Route Scalability

   In typical operation, an [EVPN] PE sends a BGP MAC Advertisement
   Route per customer/client MAC (C-MAC) address. In certain
   applications, this poses scalability challenges, as is the case in
   virtualized data center environments where the number of virtual
   machines (VMs), and hence the number of C-MAC addresses, can be in
   the millions. In such scenarios, it is required to reduce the number
   of BGP MAC Advertisement routes by relying on a 'MAC summarization'
   scheme, as is provided by PBB. Note that the MAC summarization
   capability already built into EVPN is not sufficient in those
   environments, as will be discussed next.

4.2. C-MAC Mobility with MAC Summarization

   Certain applications, such as virtual machine mobility, require
   support for fast C-MAC address mobility. For these applications, it
   is not possible to use MAC address summarization in EVPN, i.e.
   advertise reach-ability to a MAC address prefix. Rather, the exact
   virtual machine MAC address needs to be transmitted in the BGP MAC
   Advertisement route. Otherwise, traffic would be forwarded to the
   wrong segment when a virtual machine moves from one Ethernet segment
   to another. This hinders the scalability benefits of summarization.

   It is required to support C-MAC address mobility, while retaining the
   scalability benefits of MAC summarization. This can be achieved by
   leveraging PBB technology, which defines a Backbone MAC (B-MAC)
   address space that is independent of the C-MAC address space, by
   aggregating C-MAC addresses via a B-MAC address, and then applying
   summarization to B-MAC addresses.

4.3. C-MAC Address Learning and Confinement

   In EVPN, all the PE nodes participating in the same EVPN instance are
   exposed to all the C-MAC addresses learnt by any one of these PE
   nodes because a C-MAC learned by one of the PE nodes is advertised in
   BGP to other PE nodes in that EVPN instance. This is the case even if
   some of the PE nodes for that EVPN instance are not involved in
   forwarding traffic to, or from, these C-MAC addresses. Even if an
   implementation does not install hardware forwarding entries for C-MAC
   addresses that are not part of active traffic flows on that PE, the
   device memory is still consumed by keeping record of the C-MAC
   addresses in the routing table (RIB). In network applications with
   millions of C-MAC addresses, this introduces a non-trivial waste of
   PE resources. As such, it is required to confine the scope of
   visibility of C-MAC addresses only to those PE nodes that are
   actively involved in forwarding traffic to, or from, these addresses.

4.4. Per Site Policy Support

   In many applications, it is required to be able to enforce
   connectivity policy rules at the granularity of a site (or segment).
   This includes the ability to control which PE nodes in the network
   can forward traffic to, or from, a given site. PBB-EVPN is capable of
   providing this granularity of policy control. In the case where per
   C-MAC address granularity is required, the EVI can always continue to
   operate in EVPN mode.

4.5. Avoiding C-MAC Address Flushing

   It is required to avoid C-MAC address flushing upon link, port or
   node failure for multi-homed devices and networks. This is in order
   to speed up re-convergence upon failure.

5. Solution Overview

   The solution involves incorporating IEEE 802.1ah Backbone Edge Bridge
   (BEB) functionality on the EVPN PE nodes similar to PBB-VPLS, where
   BEB functionality is incorporated in the VPLS PE nodes.
   The PE devices would then receive 802.1Q Ethernet frames from their
   attachment circuits, encapsulate them in the PBB header and forward
   the frames over the IP/MPLS core. On the egress EVPN PE, the PBB
   header is removed following the MPLS disposition, and the original
   802.1Q Ethernet frame is delivered to the customer equipment.

                     BEB   +--------------+  BEB
                     ||    |              |  ||
                     \/    |              |  \/
         +----+ AC1 +----+ |              | +----+   +----+
         | CE1|-----|    | |              | |    |---| CE2|
         +----+\    | PE1| |   IP/MPLS    | | PE3|   +----+
                \   +----+ |   Network    | +----+
                 \         |              |
               AC2\ +----+ |              |
                   \|    | |              |
                    | PE2| |              |
                    +----+ |              |
                      /\   +--------------+
                      ||
                      BEB
         <-802.1Q-> <------PBB over MPLS------> <-802.1Q->

                     Figure 1: PBB-EVPN Network

   The PE nodes perform the following functions:

   - Learn customer/client MAC addresses (C-MACs) over the attachment
   circuits in the data-plane, per normal bridge operation.

   - Learn remote C-MAC to B-MAC bindings in the data-plane from traffic
   received from the core per [802.1ah] bridging operation.

   - Advertise local B-MAC address reach-ability information in BGP to
   all other PE nodes in the same set of service instances. Note that
   every PE has a set of local B-MAC addresses that uniquely identify
   the device. PE addressing is discussed further in Section 7.2.1.1.

   - Build a forwarding table from remote BGP advertisements received
   associating remote B-MAC addresses with remote PE IP addresses and
   the associated MPLS label(s).

6. BGP Encoding

   PBB-EVPN leverages the same BGP Routes and Attributes defined in
   [EVPN], adapted as follows:

6.1. BGP MAC Advertisement Route

   The EVPN MAC Advertisement Route is used to distribute B-MAC
   addresses of the PE nodes instead of the C-MAC addresses of
   end-stations/hosts. This is because the C-MAC addresses are learnt in
   the data-plane for traffic arriving from the core.
   The MAC Advertisement Route is encoded as follows:

   - The MAC address field contains the B-MAC address.
   - The Ethernet Tag field is set to 0.
   - The Ethernet Segment Identifier field must be set either to 0 (for
   single-homed Segments or multi-homed Segments with per-ISID
   load-balancing) or to MAX-ESI (for multi-homed Segments with per-flow
   load-balancing). All other values are not permitted.

   The route is tagged with the RT corresponding to the EVI associated
   with the B-MAC address.

   All other fields are set as defined in [EVPN].

6.2. Ethernet Auto-Discovery Route

   This route and all of its associated modes are not needed in
   PBB-EVPN.

   The receiving PE knows that it need not wait for the receipt of the
   Ethernet A-D route for route resolution by means of the reserved ESI
   encoded in the MAC Advertisement route: the ESI values of 0 and
   MAX-ESI indicate that the receiving PE can resolve the path without
   an Ethernet A-D route.

6.3. Per VPN Route Targets

   PBB-EVPN uses the same set of route targets defined in [EVPN]. A
   future revision of this document will describe new RT types.

6.4. MAC Mobility Extended Community

   This extended community is defined in [EVPN]. When used in PBB-EVPN,
   it indicates that the C-MAC forwarding tables for the I-SIDs
   associated with the RT tagging the MAC Advertisement route must be
   flushed.

   Note that all other BGP messages and/or attributes are used as
   defined in [EVPN].

7. Operation

   This section discusses the operation of PBB-EVPN, specifically in
   areas where it differs from [EVPN].

7.1. MAC Address Distribution over Core

   In PBB-EVPN, host MAC addresses (i.e. C-MAC addresses) need not be
   distributed in BGP. Rather, every PE independently learns the C-MAC
   addresses in the data-plane via normal bridging operation.
   Every PE has a set of one or more unicast B-MAC addresses associated
   with it, and those are the addresses distributed over the core in MAC
   Advertisement routes.

7.2. Device Multi-homing

7.2.1 Flow-based Load-balancing

   This section describes the procedures for supporting device
   multi-homing in an all-active redundancy model with flow-based
   load-balancing.

7.2.1.1 PE B-MAC Address Assignment

   In [802.1ah] every BEB is uniquely identified by one or more B-MAC
   addresses. These addresses are usually locally administered by the
   Service Provider. For PBB-EVPN, the choice of B-MAC address(es) for
   the PE nodes must be examined carefully as it has implications on the
   proper operation of multi-homing. In particular, for the scenario
   where a CE is multi-homed to a number of PE nodes with all-active
   redundancy and flow-based load-balancing, a given C-MAC address would
   be reachable via multiple PE nodes concurrently. Given that any
   remote PE will bind the C-MAC address to a single B-MAC address, the
   various PE nodes connected to the same CE must share the same B-MAC
   address. Otherwise, the MAC address table of the remote PE nodes
   will keep oscillating between the B-MAC addresses of the various PE
   devices. For example, consider the network of Figure 1, and assume
   that PE1 has B-MAC BM1 and PE2 has B-MAC BM2. Also, assume that both
   links from CE1 to the PE nodes are part of an all-active
   multi-chassis Ethernet link aggregation group. If BM1 is not equal to
   BM2, the consequence is that the MAC address table on PE3 will keep
   oscillating such that the C-MAC address CM of CE1 would flip-flop
   between BM1 and BM2, depending on the load-balancing decision on CE1
   for traffic destined to the core.

   Considering that there could be multiple sites (e.g. CEs) that are
   multi-homed to the same set of PE nodes, it is required for all
   the PE devices in a Redundancy Group to have a unique B-MAC address
   per site. This way, it is possible to achieve fast convergence in the
   case where a link or port failure impacts the attachment circuit
   connecting a single site to a given PE.

                                 +---------+
                   +-------+ PE1 | IP/MPLS |
                  /              |         |
               CE1               | Network |     PEr
               M1 \              |         |
                   +-------+ PE2 |         |
                   /-------+     |         |
                  /              |         |
               CE2               |         |
              M2  \              |         |
                   \             |         |
                    +------+ PE3 +---------+

                 Figure 2: B-MAC Address Assignment

   In the example network shown in Figure 2 above, two sites
   corresponding to CE1 and CE2 are dual-homed to PE1/PE2 and PE2/PE3,
   respectively. Assume that BM1 is the B-MAC used for the site
   corresponding to CE1. Similarly, BM2 is the B-MAC used for the site
   corresponding to CE2. On PE1, a single B-MAC address (BM1) is
   required for the site corresponding to CE1. On PE2, two B-MAC
   addresses (BM1 and BM2) are required, one per site. Whereas on PE3, a
   single B-MAC address (BM2) is required for the site corresponding to
   CE2. All three PE nodes would advertise their respective B-MAC
   addresses in BGP using the MAC Advertisement routes defined in
   [EVPN]. The remote PE, PEr, would learn via BGP that BM1 is reachable
   via PE1 and PE2, whereas BM2 is reachable via both PE2 and PE3.
   Furthermore, PEr establishes via the normal bridge learning that
   C-MAC M1 is reachable via BM1, and C-MAC M2 is reachable via BM2. As
   a result, PEr can load-balance traffic destined to M1 between PE1 and
   PE2, as well as traffic destined to M2 between both PE2 and PE3. In
   the case of a failure that causes, for example, CE1 to be isolated
   from PE1, the latter can withdraw the route it has advertised for
   BM1. This way, PEr would update its path list for BM1, and will send
   all traffic destined to M1 over to PE2 only.
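
   The state that PEr maintains in the Figure 2 example can be
   illustrated with a minimal Python sketch. It is not part of the
   specification: the class and method names are hypothetical, BGP is
   reduced to plain advertise/withdraw calls, and data-plane learning
   is reduced to a dictionary update.

```python
class RemotePE:
    """Illustrative model of PEr in Figure 2 (all names are hypothetical)."""

    def __init__(self):
        self.bmac_paths = {}    # B-MAC -> set of next-hop PEs (from BGP)
        self.cmac_to_bmac = {}  # C-MAC -> B-MAC (learned in the data plane)

    def bgp_advertise(self, bmac, pe):
        # A MAC Advertisement route for this B-MAC was received from pe.
        self.bmac_paths.setdefault(bmac, set()).add(pe)

    def bgp_withdraw(self, bmac, pe):
        # pe withdrew its MAC Advertisement route for this B-MAC.
        self.bmac_paths.get(bmac, set()).discard(pe)

    def learn(self, cmac, bmac):
        # Bridge learning: a frame from cmac arrived with this source B-MAC.
        self.cmac_to_bmac[cmac] = bmac

    def next_hops(self, cmac):
        # PEs over which traffic to cmac can be load-balanced.
        bmac = self.cmac_to_bmac.get(cmac)
        return sorted(self.bmac_paths.get(bmac, set()))


per = RemotePE()
for bmac, pe in [("BM1", "PE1"), ("BM1", "PE2"), ("BM2", "PE2"), ("BM2", "PE3")]:
    per.bgp_advertise(bmac, pe)
per.learn("M1", "BM1")
per.learn("M2", "BM2")

before = per.next_hops("M1")     # both PE1 and PE2 are usable
per.bgp_withdraw("BM1", "PE1")   # CE1 becomes isolated from PE1
after = per.next_hops("M1")      # only PE2 remains
```

   Note that the C-MAC to B-MAC binding for M1 is untouched by the
   withdrawal; only the path list for BM1 changes.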

   For single-homed sites, it is possible to assign a unique B-MAC
   address per site, or have all the single-homed sites connected to a
   given PE share a single B-MAC address. The advantage of the first
   model over the second model is the ability to avoid C-MAC destination
   address lookup on the disposition PE (even though source C-MAC
   learning is still required in the data-plane). Also, by assigning the
   B-MAC addresses from a contiguous range, it is possible to advertise
   a single B-MAC subnet for all single-homed sites, thereby rendering
   the number of MAC advertisement routes required on par with the
   second model.

   In summary, every PE may use a unicast B-MAC address shared by all
   single-homed CEs or a unicast B-MAC address per single-homed CE and,
   in addition, a unicast B-MAC address per dual-homed CE. In the latter
   case, the B-MAC address MUST be the same for all PE nodes in a
   Redundancy Group connected to the same CE.

7.2.1.2. Automating B-MAC Address Assignment

   The PE B-MAC address used for single-homed sites can be automatically
   derived from the hardware (using, e.g., the backplane's address).
   However, the B-MAC address used for multi-homed sites must be
   coordinated among the RG members. To automate the assignment of this
   latter address, the PE can derive this B-MAC address from the MAC
   Address portion of the CE's LACP System Identifier by flipping the
   'Locally Administered' bit of the CE's address. This guarantees the
   uniqueness of the B-MAC address within the network, and ensures that
   all PE nodes connected to the same multi-homed CE use the same value
   for the B-MAC address.
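
   The derivation described above can be illustrated with a short
   Python sketch. It is not part of the specification: the function
   name and the textual address format are assumptions. The 'Locally
   Administered' bit is the 0x02 bit of the first octet of the MAC
   address.

```python
def derive_bmac(lacp_system_mac: str) -> str:
    """Derive a B-MAC from the MAC portion of a CE's LACP System ID.

    Hypothetical helper: flips the 'Locally Administered' (U/L) bit,
    i.e. the 0x02 bit of the first octet, and leaves the rest untouched.
    """
    parts = lacp_system_mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError("expected six octets: %r" % lacp_system_mac)
    octets = [int(part, 16) for part in parts]
    if any(not 0 <= octet <= 0xFF for octet in octets):
        raise ValueError("octet out of range: %r" % lacp_system_mac)
    octets[0] ^= 0x02  # flip the U/L bit
    return ":".join("%02x" % octet for octet in octets)


# Every PE attached to the same CE sees the same LACP System ID and
# therefore computes the same B-MAC without any coordination.
bmac_on_pe1 = derive_bmac("00:1b:54:aa:bb:cc")
bmac_on_pe2 = derive_bmac("00-1B-54-AA-BB-CC")
```

   Because the input is the same on every member of the Redundancy
   Group, both calls yield 02:1b:54:aa:bb:cc.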

   Note that with this automatic provisioning of the B-MAC address
   associated with multi-homed CEs, it is not possible to support the
   uncommon scenario where a CE has multiple bundles towards the PE
   nodes, and the service involves hair-pinning traffic from one bundle
   to another. This is because the split-horizon filtering relies on
   B-MAC addresses rather than Site-ID Labels (as will be described in
   the next section). The operator must explicitly configure the B-MAC
   address for this fairly uncommon service scenario.

   Whenever a B-MAC address is provisioned on the PE, either manually or
   automatically (as an outcome of CE auto-discovery), the PE MUST
   transmit a MAC Advertisement Route for the B-MAC address with a
   downstream assigned MPLS label that uniquely identifies that address
   on the advertising PE. The route is tagged with the RTs of the
   associated EVIs as described above.

7.2.1.3 Split Horizon and Designated Forwarder Election

   [EVPN] relies on access split horizon, where the Ethernet Segment
   Label is used for egress filtering on the attachment circuit in order
   to prevent forwarding loops. In PBB-EVPN, the B-MAC source address
   can be used for the same purpose, as it uniquely identifies the
   originating site of a given frame. As such, Segment Labels are not
   used in PBB-EVPN, and the egress split-horizon filtering is done
   based on the B-MAC source address. It is worth noting here that
   [802.1ah] defines this B-MAC address based filtering function as part
   of the I-Component options, hence no new functions are required to
   support split-horizon beyond what is already defined in [802.1ah].
   Given that the Segment label is not used in PBB-EVPN, the PE sets the
   Label field in the Ethernet Segment Route to 0.

   The Designated Forwarder election procedures are defined in
   [I-D-Segment-Route].
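
   The B-MAC based split-horizon check described in this section can be
   illustrated with a minimal Python sketch. It is not part of the
   specification: the function name, the attachment-circuit labels and
   the dictionary layout are assumptions, and DF state is deliberately
   left out.

```python
def egress_acs(frame_bmac_sa, local_segments):
    """Return the attachment circuits a core-received frame may egress on.

    local_segments maps a (hypothetical) attachment-circuit name to the
    B-MAC assigned to the site behind it. A frame is never sent back to
    the site it originated from, which is identified by its B-MAC
    source address.
    """
    return sorted(
        ac for ac, site_bmac in local_segments.items()
        if site_bmac != frame_bmac_sa
    )


# PE2 in Figure 2 serves both sites: BM1 for CE1 and BM2 for CE2.
pe2_segments = {"AC-CE1": "BM1", "AC-CE2": "BM2"}

# A multi-destination frame that CE1 sent via PE1 carries source B-MAC
# BM1, so PE2 must not forward it back towards CE1.
from_ce1_site = egress_acs("BM1", pe2_segments)

# A frame from an unrelated site (B-MAC BM9 here) passes the check on
# both circuits.
from_other_site = egress_acs("BM9", pe2_segments)
```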

7.2.2 I-SID Based Load-balancing

   This section describes the procedures for supporting device
   multi-homing in an all-active redundancy model with per-ISID
   load-balancing.

7.2.2.1 PE B-MAC Address Assignment

   In the case where per-ISID load-balancing is desired among the PE
   nodes in a given redundancy group, multiple unicast B-MAC addresses
   are allocated per multi-homed Ethernet Segment: Each PE connected to
   the multi-homed segment is assigned a unique B-MAC. Every PE then
   advertises its B-MAC address using the BGP MAC advertisement route.
   In this mode of operation, two B-MAC address assignment models are
   possible:

   - The PE may use a shared B-MAC address for multiple Ethernet
   Segments. This includes the single-homed segments as well as the
   multi-homed segments operating with per-ISID load-balancing mode.

   - The PE may use a dedicated B-MAC address for each Ethernet Segment
   operating with per-ISID load-balancing mode.

   All PE implementations MUST support the shared B-MAC address model
   and MAY support the dedicated B-MAC address model.

   A remote PE initially floods traffic to a destination C-MAC address,
   located in a given multi-homed Ethernet Segment, to all the PE nodes
   connected to that segment. Then, when reply traffic arrives at the
   remote PE, it learns (in the data-path) the B-MAC address and
   associated next-hop PE to use for said C-MAC address.

7.2.2.2 Split Horizon and Designated Forwarder Election

   The procedures are similar to the flow-based load-balancing case,
   with the only difference being that the DF filtering must be applied
   to unicast as well as multicast traffic, and in both core-to-segment
   as well as segment-to-core directions.
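
   The difference in DF filtering scope between the two load-balancing
   modes can be illustrated with a minimal Python sketch. It is not
   part of the specification: the function name and the string
   constants are assumptions, and the DF election itself is taken as an
   input. The flow-based rule encoded here is the one this document
   states for the multi-homed device scenario, namely that DF filtering
   is limited to multi-destination frames in the core-to-access
   direction.

```python
def df_filter_drops(mode, is_df, multi_destination, direction):
    """Return True if DF filtering requires the frame to be dropped.

    mode:              "flow-based" or "per-isid" (hypothetical labels)
    is_df:             whether this PE is the DF for the frame's I-SID
    multi_destination: True for broadcast, multicast or unknown unicast
    direction:         "core-to-segment" or "segment-to-core"
    """
    if direction not in ("core-to-segment", "segment-to-core"):
        raise ValueError("unknown direction: %r" % direction)
    if mode == "per-isid":
        # Non-DF blocks all traffic for the I-SID, in both directions.
        return not is_df
    if mode == "flow-based":
        # Non-DF blocks only multi-destination frames towards the segment.
        return (not is_df) and multi_destination and direction == "core-to-segment"
    raise ValueError("unknown mode: %r" % mode)


# Known unicast from the segment on a non-DF PE:
unicast_flow = df_filter_drops("flow-based", False, False, "segment-to-core")
unicast_isid = df_filter_drops("per-isid", False, False, "segment-to-core")
```

   In the example, the flow-based PE forwards the frame while the
   per-ISID PE drops it.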

7.2.2.3 Handling Failure Scenarios

   When a PE connected to a multi-homed Ethernet Segment loses
   connectivity to the segment, due to link or port failure, it needs to
   notify the remote PEs to trigger C-MAC address flushing. This can be
   achieved in one of two ways, depending on the B-MAC assignment model:

   - If the PE uses a shared B-MAC address for multiple Ethernet
   Segments, then the C-MAC flushing is signaled by means of having the
   failed PE re-advertise the MAC Advertisement route for the associated
   B-MAC, tagged with the MAC Mobility Extended Community attribute. The
   value of the Counter field in that attribute must be incremented
   prior to advertisement. This causes the remote PE nodes to flush all
   C-MAC addresses associated with the B-MAC in question. This is done
   across all I-SIDs that are mapped to the EVI of the withdrawn MAC
   route.

   - If the PE uses a dedicated B-MAC address for each Ethernet Segment
   operating under per-ISID load-balancing mode, then the failed PE
   simply withdraws the B-MAC route previously advertised for that
   segment. This causes the remote PE nodes to flush all C-MAC addresses
   associated with the B-MAC in question. This is done across all I-SIDs
   that are mapped to the EVI of the withdrawn MAC route.

   When a PE connected to a multi-homed Ethernet Segment fails (i.e.
   node failure) or when the PE becomes completely isolated from the
   EVPN network, the remote PEs will start purging the MAC Advertisement
   routes that were advertised by the failed PE. This is done either as
   an outcome of the remote PEs detecting that the BGP session to the
   failed PE has gone down, or by having a Route Reflector withdraw
   all the routes that were advertised by the failed PE. The remote PEs,
   in this case, will perform C-MAC address flushing as an outcome of
   the MAC Advertisement route withdrawals.
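
   The two flush triggers seen by a remote PE can be illustrated with a
   minimal Python sketch. It is not part of the specification: the
   class and method names are hypothetical, a single EVI is modeled, a
   route without the MAC Mobility Extended Community is treated as
   carrying counter 0, and counter wrap-around is ignored.

```python
class RemotePEFlushModel:
    """Illustrative remote-PE state for one EVI (names are hypothetical)."""

    def __init__(self):
        self.last_counter = {}  # B-MAC -> last MAC Mobility counter seen
        self.cmac_table = {}    # (I-SID, C-MAC) -> B-MAC

    def learn(self, isid, cmac, bmac):
        self.cmac_table[(isid, cmac)] = bmac

    def flush(self, bmac):
        # Remove every C-MAC bound to this B-MAC, across all I-SIDs.
        self.cmac_table = {
            key: bound for key, bound in self.cmac_table.items()
            if bound != bmac
        }

    def on_mac_route(self, bmac, counter=0):
        # Shared B-MAC model: a re-advertisement with an incremented
        # counter signals that the bound C-MACs must be flushed.
        if counter > self.last_counter.get(bmac, 0):
            self.flush(bmac)
        self.last_counter[bmac] = counter

    def on_mac_withdraw(self, bmac):
        # Dedicated B-MAC model, node failure or isolation: withdrawal
        # of the B-MAC route triggers the flush.
        self.last_counter.pop(bmac, None)
        self.flush(bmac)


pe = RemotePEFlushModel()
pe.on_mac_route("BM1")
pe.on_mac_route("BM2")
pe.learn(100, "M1", "BM1")
pe.learn(200, "M2", "BM1")
pe.learn(100, "M3", "BM2")

pe.on_mac_route("BM1", counter=1)  # shared B-MAC model: flush via counter
after_counter = dict(pe.cmac_table)

pe.on_mac_withdraw("BM2")          # dedicated B-MAC model: flush via withdraw
after_withdraw = dict(pe.cmac_table)
```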

   For all failure scenarios (link/port failure, node failure and PE
   node isolation), when the fault condition clears, the recovered PE
   re-advertises the associated Ethernet Segment route to other members
   of its Redundancy Group. This triggers the backup PE(s) in the
   Redundancy Group to block the I-SIDs for which the recovered PE is a
   DF. When a backup PE blocks the I-SIDs, it triggers a C-MAC address
   flush notification to the remote PEs by re-advertising the MAC
   Advertisement route for the associated B-MAC, with the MAC Mobility
   Extended Community attribute. The value of the Counter field in that
   attribute must be incremented prior to advertisement. This causes the
   remote PE nodes to flush all C-MAC addresses associated with the
   B-MAC in question. This is done across all I-SIDs that are mapped to
   the EVI of the withdrawn MAC route.

7.3. Network Multi-homing

   When an Ethernet network is multi-homed to a set of PE nodes running
   PBB-EVPN, an all-active redundancy model can be supported with per
   service instance (i.e. I-SID) load-balancing. In this model, DF
   election is performed to ensure that a single PE node in the
   redundancy group is responsible for forwarding traffic associated
   with a given I-SID. This guarantees that no forwarding loops are
   created. Filtering based on DF state applies to both unicast and
   multicast traffic, and in both access-to-core as well as
   core-to-access directions (unlike the multi-homed device scenario
   where DF filtering is limited to multi-destination frames in the
   core-to-access direction). Similar to the multi-homed device
   scenario, with I-SID based load-balancing, a unique B-MAC address is
   assigned to each of the PE nodes connected to the multi-homed network
   (Segment).

7.4. Frame Forwarding

   The frame forwarding functions are divided between the Bridge
   Module, which hosts the [802.1ah] Backbone Edge Bridge (BEB)
   functionality, and the MPLS Forwarder which handles the MPLS
   imposition/disposition. The details of frame forwarding for unicast
   and multi-destination frames are discussed next.

7.4.1. Unicast

   Known unicast traffic received from the AC will be PBB-encapsulated
   by the PE using the B-MAC source address corresponding to the
   originating site. The unicast B-MAC destination address is determined
   based on a lookup of the C-MAC destination address (the binding of
   the two is done via transparent learning of reverse traffic). The
   resulting frame is then encapsulated with an LSP tunnel label and the
   MPLS label which uniquely identifies the B-MAC destination address on
   the egress PE. If per flow load-balancing over ECMPs in the MPLS core
   is required, then a flow label is added as the end of stack label.

   For unknown unicast traffic, the PE forwards these frames over the
   MPLS core. When these frames are to be forwarded, the same set of
   options used for forwarding multicast/broadcast frames (as described
   in the next section) are used.

7.4.2. Multicast/Broadcast

   Multi-destination frames received from the AC will be
   PBB-encapsulated by the PE using the B-MAC source address
   corresponding to the originating site. The multicast B-MAC
   destination address is selected based on the value of the I-SID as
   defined in [802.1ah]. The resulting frame is then forwarded over the
   MPLS core using one of the following two options:

   Option 1: the MPLS Forwarder can perform ingress replication over a
   set of MP2P tunnel LSPs. The frame is encapsulated with a tunnel LSP
   label and the EVPN ingress replication label advertised in the
   Inclusive Multicast Route.

   Option 2: the MPLS Forwarder can use a P2MP tunnel LSP per the
   procedures defined in [EVPN]. This includes the use of either
   Inclusive or Aggregate Inclusive trees.

   Note that the same procedures for advertising and handling the
   Inclusive Multicast Route defined in [EVPN] apply here.

8. Minimizing ARP Broadcast

   The PE nodes implement an ARP-proxy function in order to minimize the
   volume of ARP traffic that is broadcast over the MPLS network. This
   is achieved by having each PE node snoop on ARP request and response
   messages received over the access interfaces or the MPLS core. The PE
   builds a cache of IP / MAC address bindings from these snooped
   messages. The PE then uses this cache to respond to ARP requests that
   ingress on access ports and target hosts in remote sites. If the PE
   finds a match for the IP address in its ARP cache, it responds to the
   requesting host and drops the request. Otherwise, the request is
   flooded over the MPLS network using either ingress replication or
   LSM.

9. Seamless Interworking with IEEE 802.1aq/802.1Qbp

                           +--------------+
                           |              |
           +---------+     |     MPLS     |    +---------+
   +----+  |         |   +----+        +----+  |         |  +----+
   |SW1 |--|         |   | PE1|        | PE2|  |         |--| SW3|
   +----+  | 802.1aq |---|    |        |    |--| 802.1aq |  +----+
   +----+  |  .1Qbp  |   +----+        +----+  |  .1Qbp  |  +----+
   |SW2 |--|         |     |   Backbone   |    |         |--| SW4|
   +----+  +---------+     +--------------+    +---------+  +----+

   |<------ IS-IS -------->|<-----BGP----->|<------ IS-IS ------>|  CP

   |<------------------------- PBB -------------------------->|  DP
                           |<----MPLS----->|

   Legend: CP = Control Plane View
           DP = Data Plane View

   Figure 7: Interconnecting 802.1aq/802.1Qbp Networks with PBB-EVPN

9.1. B-MAC Address Assignment

   For the same reasons cited in the TRILL section, the B-MAC addresses
   need to be globally unique across all the IEEE 802.1aq / 802.1Qbp
   networks. The same hierarchical address assignment scheme depicted
   above is proposed for B-MAC addresses as well.

9.2. IEEE 802.1aq / 802.1Qbp B-MAC Advertisement Route

   B-MAC addresses associated with 802.1aq / 802.1Qbp switches are
   advertised using the BGP MAC Advertisement route already defined in
   [EVPN].

   The encapsulation for the transport of PBB frames over MPLS is
   similar to that of classical Ethernet, albeit with the additional PBB
   header, as shown in the figure below:

                       +------------------+
                       |  IP/MPLS Header  |
                       +------------------+
                       |    PBB Header    |
                       +------------------+
                       | Ethernet Header  |
                       +------------------+
                       | Ethernet Payload |
                       +------------------+
                       |   Ethernet FCS   |
                       +------------------+

   Figure 8: PBB over MPLS Encapsulation

9.3. Operation

   When a PE receives a PBB-encapsulated Ethernet frame from the access
   side, it performs a lookup on the B-MAC destination address to
   identify the next hop. If the lookup shows that the next hop is a
   remote PE, the local PE then encapsulates the PBB frame in MPLS.
   The label stack comprises the VPN label (advertised by the remote
   PE), followed by an LSP/IGP label. From that point onwards, regular
   MPLS forwarding is applied.

   On the disposition PE, assuming penultimate-hop-popping is employed,
   the PE receives the MPLS-encapsulated PBB frame with a single label:
   the VPN label. The value of the label indicates to the disposition PE
   that this is a PBB frame, so the label is popped, the TTL field (in
   the 802.1Qbp F-Tag) is reinitialized and normal PBB processing is
   employed from this point onwards.

10. Solution Advantages

   In this section, we discuss the advantages of the PBB-EVPN solution
   in the context of the requirements set forth in section 3 above.

10.1. MAC Advertisement Route Scalability

   In PBB-EVPN the number of MAC Advertisement Routes is a function of
   the number of segments (sites), rather than the number of
   hosts/servers. This is because the B-MAC addresses of the PEs, rather
   than the C-MAC addresses (of hosts/servers), are advertised in BGP.
   As discussed above, there is a one-to-one mapping between multi-homed
   segments and B-MAC addresses, whereas there is a one-to-one or
   many-to-one mapping between single-homed segments and B-MAC addresses
   for a given PE. As a result, the volume of MAC Advertisement Routes
   in PBB-EVPN is multiple orders of magnitude lower than in EVPN.

10.2. C-MAC Mobility with MAC Sub-netting

   In PBB-EVPN, if a PE allocates its B-MAC addresses from a contiguous
   range, then it can advertise a MAC prefix rather than individual
   48-bit addresses. It should be noted that B-MAC addresses can easily
   be assigned from a contiguous range because PE nodes are within the
   provider administrative domain; however, CE devices and hosts are
   typically not within the provider administrative domain. The
   advantage of such MAC address sub-netting can be maintained even as
   C-MAC addresses move from one Ethernet segment to another. This is
   because the C-MAC address to B-MAC address association is learnt in
   the data-plane and C-MAC addresses are not advertised in BGP. To
   illustrate how this compares to EVPN, consider the following example:

   If a PE running EVPN advertises reachability for a MAC subnet that
   spans N addresses via a particular segment, and then 50% of the MAC
   addresses in that subnet move to other segments (e.g. due to virtual
   machine mobility), then in the worst case, N/2 additional MAC
   Advertisement routes need to be sent for the MAC addresses that have
   moved. This defeats the purpose of the sub-netting. With PBB-EVPN, on
   the other hand, the sub-netting applies to the B-MAC addresses, which
   are statically associated with PE nodes and are not subject to
   mobility. As C-MAC addresses move from one segment to another, the
   binding of C-MAC to B-MAC addresses is updated via data-plane
   learning.

10.3. C-MAC Address Learning and Confinement

   In PBB-EVPN, C-MAC address reachability information is built via
   data-plane learning. As such, PE nodes not participating in active
   conversations involving a particular C-MAC address will purge that
   address from their forwarding tables. Furthermore, since C-MAC
   addresses are not distributed in BGP, PE nodes will not maintain any
   record of them in the control-plane routing table.

10.4. Seamless Interworking with TRILL and 802.1aq Access Networks

   Consider the scenario where two access networks, one running MPLS and
   the other running 802.1aq, are interconnected via an MPLS backbone
   network. The figure below shows such an example network.

                           +--------------+
                           |              |
           +---------+     |     MPLS     |    +---------+
   +----+  |         |   +----+        +----+  |         |  +----+
   | CE |--|         |   | PE1|        | PE2|  |         |--| CE |
   +----+  | 802.1aq |---|    |        |    |--|  MPLS   |  +----+
   +----+  |         |   +----+        +----+  |         |  +----+
   | CE |--|         |     |   Backbone   |    |         |--| CE |
   +----+  +---------+     +--------------+    +---------+  +----+

   Figure 9: Interoperability with 802.1aq

   If the MPLS backbone network employs EVPN, then the 802.1aq
   data-plane encapsulation must be terminated on PE1 or the edge device
   connecting to PE1. Either way, all the PE nodes that are part of the
   associated service instances will be exposed to all the C-MAC
   addresses of all hosts/servers connected to the access networks.
   However, if the MPLS backbone network employs PBB-EVPN, then the
   802.1aq encapsulation can be extended over the MPLS backbone, thereby
   maintaining C-MAC address transparency on PE1. If PBB-EVPN is also
   extended over the MPLS access network on the right, then C-MAC
   addresses would be transparent to PE2 as well.

   Interoperability with TRILL access networks will be described in a
   future revision of this draft.

10.5. Per Site Policy Support

   In PBB-EVPN, a unique B-MAC address can be associated with every site
   (single-homed or multi-homed). Given that the B-MAC addresses are
   sent in BGP MAC Advertisement routes, it is possible to define per
   site (i.e. B-MAC) forwarding policies, including policies for E-TREE
   service.

10.6. Avoiding C-MAC Address Flushing

   With PBB-EVPN, it is possible to avoid C-MAC address flushing upon a
   topology change affecting a multi-homed device. To illustrate this,
   consider the example network of Figure 1. Both PE1 and PE2 advertise
   the same B-MAC address (BM1) to PE3. PE3 then learns the C-MAC
   addresses of the servers/hosts behind CE1 via data-plane learning.
   If AC1 fails, then PE3 does not need to flush any of the C-MAC
   addresses learnt and associated with BM1. This is because PE1 will
   withdraw the MAC Advertisement routes associated with BM1, thereby
   leading PE3 to have a single adjacency (to PE2) for this B-MAC
   address. Therefore, the topology change is communicated to PE3 and no
   C-MAC address flushing is required.

11. Acknowledgements

   TBD.

12. Security Considerations

   There are no additional security aspects beyond those of VPLS/H-VPLS
   that need to be discussed here.

13. IANA Considerations

   This document requires IANA to assign a new SAFI value for L2VPN_MAC
   SAFI.

14. Intellectual Property Considerations

   This document is being submitted for use in IETF standards
   discussions.

15. Normative References

   [802.1ah]  "Virtual Bridged Local Area Networks Amendment 7: Provider
              Backbone Bridges", IEEE Std. 802.1ah-2008, August 2008.

16. Informative References

   [PBB-VPLS] Sajassi et al., "VPLS Interoperability with Provider
              Backbone Bridges", draft-ietf-l2vpn-pbb-vpls-interop-
              05.txt, work in progress, July, 2011.

   [EVPN-REQ] Sajassi et al., "Requirements for Ethernet VPN (EVPN)",
              draft-ietf-l2vpn-evpn-req-04.txt, work in progress, July,
              2011.

   [EVPN]     Aggarwal et al., "BGP MPLS Based Ethernet VPN",
              draft-ietf-l2vpn-evpn-04.txt, work in progress, February,
              2012.

17. Authors' Addresses

   Ali Sajassi
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134, US
   Email: sajassi@cisco.com

   Samer Salam
   Cisco
   595 Burrard Street, Suite # 2123
   Vancouver, BC V7X 1J1, Canada
   Email: ssalam@cisco.com

   Sami Boutros
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134, US
   Email: sboutros@cisco.com

   Nabil Bitar
   Verizon Communications
   Email: nabil.n.bitar@verizon.com

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   Florin Balus
   Alcatel-Lucent
   701 E. Middlefield Road
   Mountain View, CA, USA 94043
   Email: florin.balus@alcatel-lucent.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.be

   Clarence Filsfils
   Cisco
   Email: cfilsfil@cisco.com

   Dennis Cai
   Cisco
   Email: dcai@cisco.com

   Lizhong Jin
   ZTE Corporation
   889, Bibo Road
   Shanghai, 201203, China
   Email: lizhong.jin@zte.com.cn