Network Working Group                               R. Aggarwal (Editor)
Internet Draft                                          Juniper Networks
Category: Standards Track
Expiration Date: September 2010                                 A. Isaac
                                                               Bloomberg

                                                               J. Uttaro
                                                                    AT&T

                                                              R. Shekhar
                                                        Juniper Networks

                                                          March 26, 2010

                         BGP MPLS Based MAC VPN

                     draft-raggarwa-mac-vpn-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright and License Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Abstract

   This document describes procedures for BGP MPLS based MAC VPNs
   (MAC-VPN).

Table of Contents

   1       Specification of requirements
   2       Contributors
   3       Introduction
   4       BGP MPLS Based MAC-VPN
   5       Ethernet Segment Identifier
   6       Determining Reachability to Unicast MAC Addresses
   6.1     Local Learning
   6.2     Remote learning
   6.2.1   BGP MAC-VPN MAC Address Advertisement
   7       Designated Forwarder Election
   8       Forwarding Unicast Packets
   8.1     Processing of Unknown Unicast Packets
   8.2     Forwarding packets received from a CE
   8.3     Forwarding packets received from a remote MES
   9       Split Horizon
   10      Load Balancing of Unicast Packets
   10.1    Load balancing of traffic from a MES to remote CEs
   10.2    Load balancing of traffic between a MES and a local CE
   10.2.1  Data plane learning
   10.2.2  Control plane learning
   11      MAC Moves
   12      Multicast
   12.1    Ingress Replication
   12.2    P2MP LSPs
   12.2.1  Inclusive Trees
   12.2.2  Selective Trees
   12.3    Explicit Tracking
   13      Convergence
   13.1    Transit Link and Node Failures between MESes
   13.2    MES Failures
   13.2.1  Local Repair
   13.3    MES to CE Network Failures
   14      Acknowledgements
   15      References
   16      Author's Address

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Contributors

   In addition to the authors listed above, the following individuals
   also contributed to this document.

   Quaizar Vohra
   Kireeti Kompella
   Apurva Mehta
   Juniper Networks

3. Introduction

   This document describes procedures for BGP MPLS based MAC VPNs
   (MAC-VPN).

   There is a desire by Service Providers (SP) and data center providers
   to provide MPLS based bridged / LAN services and/or infrastructure
   such that they meet the requirements listed below. An example of such
   a service is a VPLS service offered by a SP. Another example is a
   MPLS based Virtual Bridged Network infrastructure in a data center.
   Here are the requirements:

   - Minimal or no configuration required. MPLS implementations have
   reduced the amount of configuration over the years. There is a need
   for greater auto-configuration.

   - Support of multiple active points of attachment for CEs, which may
   be hosts, switches or routers. This allows load balancing among
   multiple active paths. Current MPLS technologies such as VPLS do not
   support this: they do not allow the same MAC to be learned from two
   different PEs and be active at the same time.

   - Ability to span a VLAN across multiple racks in different
   geographic locations.

   - Minimize or eliminate flooding of unknown unicast traffic.

   - Allow hosts and Virtual Machines (VMs) in a data center to relocate
   without requiring renumbering. For instance, VMs may be moved for
   load or failure reasons.

   - Ability to scale up to hundreds of thousands of hosts across
   multiple data centers, where connectivity is required between hosts
   in different data centers.

   - Support for virtualization. This includes the ability to separate
   hosts and VMs working together from other such groups, and the
   ability to have overlapping IP and MAC addresses.

   - Fast convergence.

   This document proposes a MPLS based technology, referred to as
   MPLS-based MAC VPN (MAC-VPN), for meeting the requirements described
   in this section. MAC-VPN requires extensions to existing IP/MPLS
   protocols as described in the next section. In addition to these
   extensions MAC-VPN uses several building blocks from existing MPLS
   technologies.

4. BGP MPLS Based MAC-VPN

   This section describes the framework of MAC-VPN to meet the
   requirements described in the previous section.

   A MAC-VPN comprises CEs that are connected to PEs or MPLS Edge
   Switches (MES) that comprise the edge of the MPLS infrastructure. A
   CE may be a host, a router or a switch. The MPLS Edge Switches
   provide layer 2 virtual bridge connectivity between the CEs. There
   may be multiple MAC VPNs in the provider's network.
   This document uses the terms MAC-VPN and MAC VPN interchangeably.
   The instance of a MAC VPN on an MES is referred to as a MAC VPN
   Instance (MVI).

   The MESes are connected by a MPLS LSP infrastructure which provides
   the benefits of MPLS such as fast-reroute and resiliency.

   In a MAC VPN, learning between MESes occurs not in the data plane (as
   happens with traditional bridging) but in the control plane. Control
   plane learning offers much greater control over the learning process,
   such as restricting who learns what, and the ability to apply
   policies. Furthermore, the control plane chosen for this is BGP
   (very similar to IP VPNs (RFC 4364)), providing much greater scale,
   and the ability to "virtualize" or isolate groups of interacting
   agents (hosts, servers, Virtual Machines) from each other. In MAC
   VPNs MESes advertise the MAC addresses learned from the CEs that are
   connected to them, along with a MPLS label, to other MESes in the
   control plane. Control plane learning enables load balancing and
   allows CEs to connect to multiple active points of attachment. It
   also improves convergence times in the event of certain network
   failures.

   However, learning between MESes and CEs is done by the method best
   suited to the CE: data plane learning, IEEE 802.1x, LLDP, or other
   protocols.

   It is a local decision as to whether the Layer 2 forwarding table on
   a MES contains all the MAC destinations known to the control plane or
   implements a cache based scheme. For instance the forwarding table
   may be populated only with the MAC destinations of the active flows
   transiting a specific MES.

   The policy attributes of a MAC VPN are very similar to those of an IP
   VPN. A MAC-VPN instance requires a Route-Distinguisher (RD) and a
   MAC-VPN requires one or more Route-Targets (RTs). A CE attaches to a
   MAC-VPN on a MES in a particular MVI on a VLAN or simply an ethernet
   interface.
   When the point of attachment is a VLAN there may be one or
   more VLANs in a particular MAC-VPN. Some deployment scenarios
   guarantee uniqueness of VLANs across MAC-VPNs: all points of
   attachment of a given MAC VPN use the same VLAN, and no other MAC VPN
   uses this VLAN. This document refers to this case as a "Single VLAN
   MAC-VPN" and describes simplified procedures to optimize for it.

   The next section discusses how layer 2 connectivity is achieved
   between the CEs.

   Section 10 describes how load balancing and link bonding are achieved
   for MAC-VPN. Section 11 describes procedures for handling MAC moves.

5. Ethernet Segment Identifier

   If a CE is multi-homed to two or more MESes, the set of attachment
   circuits constitutes an "Ethernet segment". An Ethernet segment may
   appear to the CE as a Link Aggregation Group. Ethernet segments have
   an identifier, called the "Ethernet Segment Identifier" (ESI). A
   single-homed CE is considered to be attached to an Ethernet segment
   with ESI 0. Otherwise, an Ethernet segment MUST have a unique
   non-zero ESI. The ESI can be assigned using various mechanisms:

   1. The ESI may be configured. For instance when MAC VPNs are used to
   provide a VPLS service the ESI is fairly analogous to the VE ID used
   for the procedures in [BGP-VPLS].

   2. If LACP is used between the MESes and CEs that are hosts, then
   the ESI is determined by LACP. This is the 48 bit virtual MAC address
   of the host for the LACP link bundle. As far as the host is concerned
   it would treat the multiple MESes that it is homed to as the same
   switch. This allows the host to aggregate links to different MESes
   in the same bundle.

   3. If LLDP is used between the MESes and CEs that are hosts, then
   the ESI is determined by LLDP. The ESI will be specified in a
   following version.

   4. In the case of indirectly connected hosts and a bridged LAN
   between the hosts and the MESes, the ESI is determined based on the
   Layer 2 bridge protocol as follows:

      If STP is used then the value of the ESI is derived by listening
      to BPDUs on the ethernet segment. The MES does not run STP.
      However it does learn the Switch ID, MSTP ID and Root Bridge ID by
      listening to BPDUs. The ESI is as follows:

         {Switch ID (6 bits), MSTP ID (6 bits), Root Bridge ID (48
         bits)}

6. Determining Reachability to Unicast MAC Addresses

   MESes forward packets that they receive based on the destination MAC
   address. This implies that MESes must be able to learn how to reach a
   given destination unicast MAC address.

   There are two components to MAC address learning, "local learning"
   and "remote learning":

6.1. Local Learning

   A particular MES must be able to learn the MAC addresses from the CEs
   that are connected to it. This is referred to as local learning.

   The MESes in a particular MAC-VPN MUST support local data plane
   learning using vanilla ethernet learning procedures. A MES must be
   capable of learning MAC addresses in the data plane when it receives
   packets such as the following from the CE network:

      - DHCP requests

      - gratuitous ARP request for its own MAC.

      - ARP request for a peer.

   Alternatively if a CE is a host then MESes MAY learn the MAC
   addresses of the host in the control plane using extensions to
   protocols such as LLDP that run between the MES and the hosts.

   In the case where a CE is a host or a switched network connected
   to hosts, a MAC address that is reachable via a given MES may move
   such that it becomes reachable via another MES. This is referred
   to as a "MAC Move". Procedures to support this are described in
   section 11.

6.2. Remote learning

   A particular MES must be able to determine how to send traffic to MAC
   addresses that belong to or are behind CEs connected to other MESes,
   i.e. to remote CEs or hosts behind remote CEs. We call such MAC
   addresses "remote" MAC addresses.

   This document requires a MES to learn remote MAC addresses in the
   control plane. In order to achieve this each MES advertises the MAC
   addresses it learns from its locally attached CEs in the control
   plane, to all the other MESes in the MAC-VPN, using BGP.

6.2.1. BGP MAC-VPN MAC Address Advertisement

   BGP is extended to advertise these MAC addresses using a new NLRI
   called the MAC-VPN-NLRI, with AFI (TBD) and a new SAFI of MAC-VPN
   (TBD).

   The MAC-VPN-NLRI encodes the following elements when it is used for
   advertising MAC addresses:

   a) Route-Distinguisher of the MAC-VPN instance that is advertising
   the NLRI. A RD MUST be assigned for a given MAC-VPN instance on a
   MES. This RD MUST be unique across all MAC-VPN instances on a MES.
   This can be accomplished by using a Type 1 RD [RFC4364]. The value
   field comprises an IP address of the MES (typically, the loopback
   address) followed by a number unique to the MES. This number may be
   generated by the MES, or, in the Single VLAN MAC-VPN case, may be
   the 12 bit VLAN ID, with the remaining 4 bits set to 0.

   b) VLAN ID if the MAC address is learned over a VLAN from the CE,
   else this field is set to 0.

   c) The Ethernet Segment Identifier described in the previous section.

   d) The MAC address.

   e) The advertisement may optionally carry one of the IP addresses
   associated with the MAC address. If used, this aids the
   implementation of proxy ARP on MESes thereby reducing the flooding of
   broadcast packets.

   f) A MAC-VPN MPLS label that is used by the MES to forward packets
   received from remote MESes. The forwarding procedures are specified
   in section 8.
   A MES MAY advertise the same MAC-VPN label for all MAC
   addresses in a given MAC-VPN instance. This label assignment
   methodology is referred to as a per MVI label assignment.
   Alternatively, a MES may advertise a unique MAC-VPN label per MAC
   address. Both of these methodologies have their tradeoffs. Per MVI
   label assignment requires the least number of MAC-VPN labels, but
   requires a MAC lookup in addition to a MPLS lookup on an egress MES
   for forwarding. On the other hand a unique label per MAC allows an
   egress MES to forward a packet that it receives from another MES, to
   the connected CE, after looking up only the MPLS labels and not
   having to do a MAC lookup.

   The BGP advertisement also carries the following attributes:

   a) One or more Route Target (RT) attributes MUST be carried.

   RTs may be configured (as in IP VPNs), or may be derived
   automatically from the VLAN ID associated with the advertisement.

   The following is the procedure for deriving the RT attribute
   automatically from the VLAN ID associated with the advertisement:

      + The Global Administrator field of the RT MUST
        be set to an IP address of the MES. This address SHOULD be
        common for all the MAC-VPN instances on the MES (e.g., this
        address may be the MES's loopback address).

      + The Local Administrator field of the RT contains a 2
        octets long number that encodes the VLAN-ID.

   The above auto-configuration of the RT implies that a different RT is
   used for every VLAN in a MAC-VPN, if the MAC-VPN contains multiple
   VLANs. For the "Single VLAN MAC-VPN" this results in auto-deriving
   the RT from the VLAN for that MAC-VPN.

   b) The advertisement may optionally carry the IP addresses associated
   with the MAC address, if the number of IP addresses is more than one
   and cannot be encoded in the NLRI. This aids the implementation of
   proxy ARP on MESes thereby reducing the flooding of broadcast
   packets.
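
   As a rough illustration of the automatic RT derivation and of the
   Type 1 RD for the Single VLAN MAC-VPN case, the following Python
   sketch builds both 8-octet values from a MES address and a VLAN ID.
   The function names are hypothetical. The use of the
   IPv4-address-specific extended community layout (type 0x01, sub-type
   0x02, as defined in RFC 4360) for the RT, and the placement of the
   VLAN ID in the low-order 12 bits of the 2-octet field, are
   assumptions; this document does not fix the wire encoding.

```python
import ipaddress
import struct


def _check_vlan(vlan_id: int) -> None:
    # A VLAN ID is a 12-bit value.
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")


def derive_route_target(mes_ip: str, vlan_id: int) -> bytes:
    """Auto-derive an RT: Global Administrator = MES IP address,
    Local Administrator = 2-octet number encoding the VLAN ID.

    Assumption: IPv4-address-specific extended community,
    type 0x01, sub-type 0x02 (Route Target)."""
    _check_vlan(vlan_id)
    global_admin = ipaddress.IPv4Address(mes_ip).packed
    local_admin = struct.pack("!H", vlan_id)
    return bytes([0x01, 0x02]) + global_admin + local_admin


def single_vlan_rd(mes_ip: str, vlan_id: int) -> bytes:
    """Type 1 RD for the Single VLAN MAC-VPN case: MES IP address
    followed by the 12-bit VLAN ID, remaining 4 bits set to 0.

    Assumption: the VLAN ID occupies the low-order 12 bits."""
    _check_vlan(vlan_id)
    rd_type = struct.pack("!H", 1)
    administrator = ipaddress.IPv4Address(mes_ip).packed
    assigned_number = struct.pack("!H", vlan_id)
    return rd_type + administrator + assigned_number


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address standing in for a loopback.
    print(derive_route_target("192.0.2.1", 100).hex())
    print(single_vlan_rd("192.0.2.1", 100).hex())
```

   Because the Global Administrator is the address of the advertising
   MES, a sketch like this yields a different RT per MES as well as per
   VLAN; how importing MESes match such RTs is outside what the sketch
   covers.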

   It is to be noted that this document does not require MESes to create
   forwarding state for remote MACs when they are learned in the control
   plane. When this forwarding state is actually created is a local
   implementation matter.

7. Designated Forwarder Election

   If a CE that is a host or a router is multi-homed directly to more
   than one MES in a MAC-VPN, only one of the MESes is responsible for
   certain actions:

      - Sending multicast and broadcast traffic to the CE. Note
        that this is the default behavior. Optional mechanisms,
        which will be specified later, will allow load balancing
        of multicast and broadcast traffic from MESes to CEs on
        multiple active points of attachment.

      - Flooding unknown unicast traffic (i.e. traffic for which a
        MES does not know the destination MAC address) to the CE,
        if the environment requires flooding of unknown unicast
        traffic.

   Note that a CE always sends packets using a single link. For instance
   if the CE is a host then, as mentioned earlier, the host treats the
   multiple links that it uses to reach the MESes as a LAG or a bundle.

   If a bridge network is multi-homed to more than one MES in a MAC-VPN
   via switches, only one of the MESes is responsible for certain
   actions:

      - Forwarding packets to other MESes, out of the bridged LAN
        which is multi-homed to more than one MES. This is the
        case when the MAC-VPN cloud is inter-connecting bridged
        LAN islands. There are certain cases where this may not
        be the case. For instance this is not required if the
        topology is loop free.

      - Sending multicast and broadcast traffic to the bridge
        network. Note that this is the default behavior.
        Optional mechanisms, which will be specified later,
        will allow load balancing of multicast and broadcast
        traffic from MESes to CEs on multiple active points
        of attachment.

      - Flooding unknown unicast traffic (i.e.
        traffic for which a MES does not know the destination MAC
        address) to the bridge network.

   This particular MES is referred to as the designated forwarder (DF)
   MES, for the ethernet segment over which the host is multi-homed to
   two or more MESes. This ethernet segment may be a link bundle if the
   host or router is directly connected to the MESes. Or this ethernet
   segment may be a bridged LAN network, if the CEs are switches. The
   bridged LAN network may be running a protocol such as STP. The
   granularity of the DF election MUST be at least this ethernet
   segment. In this case the same MES MUST be elected as the DF for all
   CEs on the ethernet segment. The granularity of the DF election MAY
   be the combination of the ethernet segment and VLAN on that ethernet
   segment. In this case the same MES MUST be elected as the DF for all
   hosts on a VLAN on that ethernet segment.

   The MESes perform a designated forwarder (DF) election, for an
   ethernet segment, or for an {ethernet segment, VLAN} combination,
   using BGP. The Ethernet Segment Identifier is assigned as described
   in section 5.

   In order to perform DF election each MES advertises in BGP a DF
   election route, using the MAC-VPN-NLRI, for each ethernet segment in
   a MAC-VPN. This route contains the following information elements:

   a) Route-Distinguisher of the MAC-VPN instance that is advertising
   the NLRI. This RD is the same as the one used in section 6.2.1.

   b) Ethernet Segment Identifier

   c) Optional VLAN ID which may be set to 0.

   d) An upstream assigned MPLS label referred to as the "ESI label".
   The usage of this label is described in section 9.

   This route also carries the following BGP attributes:

   - P-Tunnel attribute which is specified in [VPLS-MCAST]. The usage
     of this attribute is described in section 12.

   - One or more Route Target (RT) attributes.
     These RTs are assigned using the same procedure as the one
     described in section 6.2.1.

   The DF election for a particular ESI and VLAN combination proceeds as
   follows. First a MES constructs a candidate list of MESes. This
   comprises all the DF routes with that particular {ESI, VLAN} tuple
   that a MES imports in a MAC-VPN instance, including the DF route
   generated by the MES itself, if any. The DF MES is chosen from this
   candidate list. Note that DF election is carried out by all the MESes
   that import the DF route.

   The default procedure is to choose as the DF the MES with the highest
   IP address among all the MESes in the candidate list. This procedure
   MUST be implemented. It ensures that except during routing transients
   each MES chooses the same DF MES for a given ESI and VLAN
   combination.

   Other alternative procedures for performing DF election are possible
   and will be described in the future.

8. Forwarding Unicast Packets

8.1. Processing of Unknown Unicast Packets

   The procedures in this document do not require MESes to flood unknown
   unicast traffic to other MESes. If MESes learn CE MAC addresses via a
   control plane, the MESes can then distribute MAC addresses via BGP,
   and all unicast MAC addresses will be learnt prior to traffic to
   those destinations.

   However, if a destination MAC address of a received packet is not
   known by the MES, the MES may have to flood the packet. Flooding must
   take into account "split horizon forwarding" as follows. The
   principles behind the following procedures are borrowed from the
   split horizon forwarding rules in VPLS solutions [RFC4761]
   [RFC4762]. When a MES capable of flooding (say MESx) receives a
   broadcast Ethernet frame, or one with an unknown destination MAC
   address, it must flood the frame.
   If the frame arrived from an
   attached CE, MESx must send a copy of the frame to every other
   attached CE, as well as to all other MESes participating in the MAC
   VPN. If, on the other hand, the frame arrived from another MES (say
   MESy), MESx must send a copy of the packet only to attached CEs. MESx
   MUST NOT send the frame to other MESes, since MESy would have already
   done so. Split horizon forwarding rules apply to broadcast and
   multicast packets, as well as packets to an unknown MAC address.

   Whether or not to flood packets to unknown destination MAC addresses
   should be an administrative choice, depending on how learning happens
   between CEs and MESes.

   The MESes in a particular MAC VPN may use ingress replication using
   RSVP-TE P2P LSPs or LDP MP2P LSPs for sending broadcast, multicast
   and unknown unicast traffic to other MESes. Or they may use RSVP-TE
   or LDP P2MP LSPs for sending such traffic to other MESes.

   If ingress replication is in use, the P-Tunnel attribute, carried in
   the DF routes for the MAC VPN, specifies the downstream label that
   the other MESes can use to send unknown unicast, multicast or
   broadcast traffic for the MAC VPN to this particular MES. Note that
   if a MES has multiple ethernet segments for the same MAC-VPN instance
   and ingress replication is in use, then the MES SHOULD advertise the
   same P-Tunnel attribute for each DF route for that MAC-VPN instance.

   The procedures for using P2MP LSPs are very similar to VPLS
   procedures [VPLS-MCAST]. The P-Tunnel attribute used by a MES for
   sending unknown unicast, broadcast or multicast traffic for a
   particular ethernet segment is advertised in the DF route as
   described in section 7. Note that if a MES has multiple ethernet
   segments for the same MAC-VPN instance then it SHOULD advertise the
   same P-Tunnel attribute for each DF route for that MAC-VPN instance.
   The P-Tunnel attribute specifies the P2MP LSP identifier. This is the
   equivalent of an Inclusive tree in [VPLS-MCAST]. Note that multiple
   MAC-VPNs can use the same P2MP LSP, using upstream labels
   [VPLS-MCAST]. When P2MP LSPs are used for flooding unknown unicast
   traffic, packet re-ordering is possible.

8.2. Forwarding packets received from a CE

   When a MES receives a packet from a CE it must first look up the
   source MAC address of the packet. In certain environments the source
   MAC address may be used to authenticate the CE and determine that
   traffic from the host can be allowed into the network.

   If the MES decides to forward the packet the destination MAC address
   of the packet must be looked up. If the MES has received MAC address
   advertisements from one or more other MESes, for this destination MAC
   address, it is considered as a known MAC address. Else the MAC
   address is considered as an unknown MAC address.

   For known MAC addresses the MES forwards this packet to one of the
   remote MESes. The packet is encapsulated in the MAC-VPN MPLS label
   advertised by the remote MES, for that MAC address, and in the MPLS
   LSP label stack to reach the remote MES.

   If the MAC address is unknown then, if the administrative policy on
   the MES requires flooding of unknown unicast traffic:
      - The MES floods the packet to other MESes. The MES first
   encapsulates the packet in the ESI MPLS label as described in section
   9. If P2MP LSPs are being used the packet is sent on the P2MP LSP
   that the MES is the root of for that MAC-VPN, with all the other
   MESes as the leaves. The packet is encapsulated in the P2MP LSP
   label stack. If ingress replication is used the packet is replicated
   once for each remote MES with the bottom label of the stack being the
   MPLS label advertised by the remote MES in the P-Tunnel attribute.

   If the MAC address is unknown then, if the administrative policy on
   the MES does not allow flooding of unknown unicast traffic:
      - The MES drops the packet.

8.3. Forwarding packets received from a remote MES

   When a MES receives a MPLS packet from a remote MES then, after
   processing the MPLS label stack, if the top MPLS label ends up being
   a P2MP LSP label associated with a MAC-VPN or the downstream label
   advertised in the P-Tunnel attribute and after performing the split
   horizon procedures described in section 9:

      - If the MES is the designated forwarder of unknown unicast,
   broadcast or multicast traffic, the default behavior is for the MES
   to flood the packet to all the host interfaces. In other words the
   default behavior is for the MES to assume that the destination MAC
   address is unknown unicast, broadcast or multicast and it is not
   required to do a destination MAC address lookup. As an option the MES
   may do a destination MAC lookup to flood the packet to only a subset
   of the host interfaces.
      - If the MES is not the designated forwarder, the default
   behavior is for it to drop the packet.

   If the top MPLS label ends up being a MAC-VPN label that was
   advertised in the unicast MAC advertisements, then the MES either
   forwards the packet based on CE next-hop forwarding information
   associated with the label or does a destination MAC address lookup to
   forward the packet to a CE.

9. Split Horizon

   Consider a CE that is multi-homed to two or more MESes on an ethernet
   segment ES1. If the CE sends a multicast, broadcast or unknown
   unicast packet to a particular MES, say MES1, then MES1 will forward
   that packet to all the other MESes in the MAC VPN. In this case the
   MESes, other than MES1, that the CE is multi-homed to MUST drop the
   packet and not forward it back to the CE. This is referred to as
   "split horizon" in this document.
618 In order to accomplish this each MES distributes to other MESes an 619 "ESI MPLS label" in the DF route as described in section 6. This 620 label is upstream assigned by the MES that advertises the DF route. 621 This label MUST be programmed by the other MESes, that are connected 622 to the ESI advertised in the route, in the context label space for 623 the advertising MES. Further the forwarding entry for this label must 624 result in discarding packets received with this label. 626 Further the MES that advertises the "ESI MPLS label" MUST program in 627 its platform MPLS forwarding table a forwarding entry for this label 628 which results in sending packets to the ESI. 630 Consider MES1 and MES2 that are multi-homed to CE1 on ES1. When MES1 631 sends a multicast, broadcast or unknown unicast packet, that it 632 receives from CE1, it MUST first push onto the MPLS label stack the 633 ESI label that it has assigned for the ESI. The resulting packet is 634 further encapsulated in the MPLS label stack necessary to transmit 635 the packet to the other MESes. Penultimate hop popping MUST be 636 disabled on the P2P or P2MP LSPs used in the MPLS transport 637 infrastructure for MAC VPN. When MES2 receives this packet it 638 decapsulates the top MPLS label and forwards the packet using the 639 context label space determined by the top label. If the next label is 640 the ESI label assigned by MES1 then MES2 must drop the packet. 642 10. Load Balancing of Unicast Packets 644 This section specifies how load balancing is achieved to/from a CE 645 that has more than one interface that is directly connected to one or 646 more MESes. The CE may be a host or a router or it may be a switched 647 network that is connected via LAG to the MESes. 649 10.1. 
Load balancing of traffic from a MES to remote CEs 651 Whenever a remote MES imports a MAC advertisement for a given ESI, in 652 a MAC VPN instance, it MUST consider the MAC as reachable via all 653 the MESes from which it has imported DF routes for that ESI. 655 Consider a CE, CE1, that is dual homed to two MESes, MES1 and MES2, on 656 a LAG interface, ES1, and is sending packets with MAC address MAC1. 657 Based on MAC-VPN extensions described in sections 5 and 6, a remote 658 MES, say MES3, is able to learn that MAC1 is reachable via MES1 and 659 MES2. Both MES1 and MES2 may advertise MAC1 in BGP if they receive 660 packets with MAC1 from CE1. If this is not the case and MAC1 is 661 advertised only by MES1, MES3 still considers MAC1 as reachable via 662 both MES1 and MES2, as both MES1 and MES2 advertise a DF route for 663 ES1. 665 The MPLS label stack to send the packets to MES1 is the MPLS LSP 666 stack to get to MES1 and the MAC-VPN label advertised by MES1 for 667 CE1's MAC. 669 The MPLS label stack to send packets to MES2 is the MPLS LSP stack to 670 get to MES2 and the upstream assigned label in the DF route 671 advertised by MES2 for ES1, if MES2 has not advertised MAC1 in BGP. 673 We will refer to these label stacks as MPLS next-hops. 675 The remote MES, MES3, can now load balance the traffic it receives 676 from its CEs, destined for CE1, between MES1 and MES2. For IP traffic, MES3 may 677 hash on the IP flow information to select one of the MPLS next-hops. 678 Alternatively, MES3 may rely on the source and 679 destination MAC addresses for load balancing. 681 Note that once MES3 decides to send a particular packet to MES1 or 682 MES2 it can pick from more than one path to reach the particular remote 683 MES using regular MPLS procedures.
For instance, if the tunneling 684 technology is based on RSVP-TE LSPs, and MES3 decides to send a 685 particular packet to MES1, then MES3 can choose from multiple RSVP-TE 686 LSPs that have MES1 as their destination. 688 When MES1 or MES2 receives the packet destined for CE1 from MES3, if 689 the packet is a unicast MAC packet it is forwarded to CE1. If it is 690 a multicast or broadcast MAC packet then only one of MES1 or MES2 691 must forward the packet to the CE. Which of MES1 or MES2 forwards this 692 packet to the CE is determined by default based on which of the two 693 is the DF. An alternate procedure to load balance multicast packets 694 will be described in the future. 696 If the connectivity between the multi-homed CE and one of the MESes 697 that it is multi-homed to fails, the MES MUST withdraw the MAC 698 address from BGP. This enables the remote MESes to remove the MPLS 699 next-hop to this particular MES from the set of MPLS next-hops that 700 can be used to forward traffic to the CE. 702 Load balancing requires that the MESes that the CE is multi-homed to 703 are configured with different Route-Distinguishers (RDs). 705 10.2. Load balancing of traffic between a MES and a local CE 707 A CE may be configured with more than one interface connected to 708 different MESes or the same MES for load balancing. The MES(s) and 709 the CE can load balance traffic onto these interfaces using one of 710 the following mechanisms. 712 10.2.1. Data plane learning 714 Consider that the MESes perform data plane learning for local MAC 715 addresses learned from local CEs. This enables the MES(s) to learn a 716 particular MAC address and associate it with one or more interfaces. 717 The MESes can now load balance traffic destined to that MAC address 718 on the multiple interfaces. 720 Whether the CE can load balance traffic that it generates on the 721 multiple interfaces is dependent on the CE implementation. 723 10.2.2.
Control plane learning 725 The CE can be a host that advertises the same MAC address using a 726 control protocol on both interfaces. This enables the MES(s) to learn 727 the host's MAC address and associate it with one or more interfaces. 728 The MESes can now load balance traffic destined to the host on the 729 multiple interfaces. The host can also load balance the traffic it 730 generates onto these interfaces and the MES that receives the traffic 731 employs MAC-VPN forwarding procedures to forward the traffic. 733 11. MAC Moves 735 In the case where a CE is a host or a switched network connected to 736 hosts, the MAC address that is reachable via a given MES on a 737 particular ESI may move such that it becomes reachable via another 738 MES on another ESI. This is referred to as a "MAC Move". 740 Remote MESes must be able to distinguish a MAC move from the case 741 where a MAC address on an ESI is reachable via two different MESes 742 and load balancing is performed as described in section 10. This 743 distinction can be made as follows. If a MAC is learned by a 744 particular MES from multiple MESes, then the MES performs load 745 balancing only amongst the set of MESes that advertised the MAC with 746 the same ESI. If this is not the case then the MES chooses only one 747 of the advertising MESes to reach the MAC as per BGP path selection. 749 There can be traffic loss during a MAC move. Consider MAC1 that is 750 advertised by MES1 and learned from CE1 on ESI1. If MAC1 now moves 751 behind MES2, on ESI2, MES2 advertises the MAC in BGP. Until a remote 752 MES, MES3, determines that the best path is via MES2, it will 753 continue to send traffic destined for MAC1 to MES1. This will not 754 occur deterministically until MES1 withdraws the advertisement for 755 MAC1.
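The distinction between a MAC move and multi-homing described in this section can be sketched as follows. This is one possible reading, not a definitive implementation: the function name select_next_hops and the best_path callable (a stand-in for BGP path selection) are hypothetical, and the sketch assumes that load balancing applies only when every advertisement for the MAC carries the same ESI.

```python
from collections import defaultdict


def select_next_hops(advertisements, best_path):
    """Pick the remote MESes used to reach one MAC address.

    advertisements: list of (mes, esi) pairs that advertised the MAC.
    best_path: hypothetical callable standing in for BGP path selection;
               it takes the list of pairs and returns one of them.
    """
    if not advertisements:
        return []
    by_esi = defaultdict(list)
    for mes, esi in advertisements:
        by_esi[esi].append(mes)
    if len(by_esi) == 1:
        # Every advertisement carries the same ESI: the MAC is multi-homed,
        # so load balance across all advertising MESes.
        return sorted(next(iter(by_esi.values())))
    # The ESIs differ: treat this as a MAC move and use a single MES.
    mes, _esi = best_path(advertisements)
    return [mes]
```

With advertisements from MES1 and MES2 on the same ESI the function returns both MESes; with different ESIs it returns only the MES chosen by best_path.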
757 This specification requires that when MES1 learns MAC1 from MES2, and 758 MAC1, as learned by MES1 from the local CE, is not on the same 759 Ethernet segment as the one associated with MAC1 by MES2, MES1 must 760 withdraw its own MAC address advertisement from BGP. Further, if MES1 761 receives traffic destined for MAC1 it must send the traffic to MES2. 762 This procedure reduces the duration of traffic loss associated with 763 MAC moves. 765 12. Multicast 767 The MESes in a particular MAC-VPN may use ingress replication or P2MP 768 LSPs to send multicast traffic to other MESes. 770 12.1. Ingress Replication 772 The MESes may use ingress replication for flooding unknown unicast, 773 multicast or broadcast traffic as described in section 7.1. A given 774 unknown unicast or broadcast packet must be sent to all the remote 775 MESes. However a given multicast packet for a multicast flow may be 776 sent to only a subset of the MESes. Specifically a given multicast 777 flow may be sent to only those MESes that have receivers that are 778 interested in the multicast flow. Determining which of the MESes have 779 receivers for a given multicast flow is done using explicit tracking 780 described below. 782 12.2. P2MP LSPs 784 A MES may use an "Inclusive" tree or a "Selective" tree for sending an unknown unicast, 785 broadcast or multicast packet. This terminology 786 is borrowed from [VPLS-MCAST]. 788 A variety of transport technologies may be used in the SP network. 789 For inclusive P-Multicast trees, these transport technologies include 790 point-to-multipoint LSPs created by RSVP-TE or mLDP. For selective 791 P-Multicast trees, only unicast MES-MES tunnels (using MPLS or IP/GRE 792 encapsulation) and P2MP LSPs are supported, and the supported P2MP 793 LSP signaling protocols are RSVP-TE and mLDP. 795 12.2.1.
Inclusive Trees 797 An Inclusive Tree allows the use of a single multicast distribution 798 tree, referred to as an Inclusive P-Multicast tree, in the SP network 799 to carry all the multicast traffic from a specified set of MAC VPN 800 instances on a given MES. A particular P-Multicast tree can be set up 801 to carry the traffic originated by sites belonging to a single MAC 802 VPN, or to carry the traffic originated by sites belonging to 803 different MAC VPNs. The ability to carry the traffic of more than one 804 MAC VPN on the same tree is termed 'Aggregation'. The tree needs to 805 include every MES that is a member of any of the MAC VPNs that are 806 using the tree. This implies that a MES may receive multicast traffic 807 for a multicast stream even if it doesn't have any receivers that are 808 interested in receiving traffic for that stream. 810 An Inclusive P-Multicast tree as defined in this document is a P2MP 811 tree. A P2MP tree is used to carry traffic only for MAC VPN CEs that 812 are connected to the MES that is the root of the tree. 814 The procedures for signaling an Inclusive Tree are the same as those 815 in [VPLS-MCAST] with the VPLS A-D route replaced with the DF route. 816 The P-Tunnel attribute [VPLS-MCAST] for an Inclusive tree is 817 advertised in the DF route as described in section 5. Note that a 818 MES can "aggregate" multiple inclusive trees for different MAC-VPNs 819 on the same P2MP LSP using upstream labels. The procedures for 820 aggregation are the same as those described in [VPLS-MCAST], with 821 VPLS A-D routes replaced by MAC-VPN DF routes. 823 12.2.2. Selective Trees 825 A Selective P-Multicast tree is used by a MES to send IP multicast 826 traffic for one or more specific IP multicast streams, originated by 827 CEs connected to the MES, that belong to the same or different MAC 828 VPNs, to a subset of the MESes that belong to those MAC VPNs.
Each of 829 the MESes in the subset should be on the path to a receiver of one or 830 more multicast streams that are mapped onto the tree. The ability to 831 use the same tree for multicast streams that belong to different MAC 832 VPNs is termed 'Aggregation'. Selective trees give a MES the ability to create separate SP multicast 833 trees for specific multicast streams, e.g. high bandwidth multicast 834 streams. This allows traffic for these multicast streams to reach 835 only those MES routers that have receivers for these streams. This 836 avoids flooding other MES routers in the MAC VPN. 838 An SP can use both Inclusive P-Multicast trees and Selective 839 P-Multicast trees or either of them for a given MAC VPN on a MES, based 840 on local configuration. 842 The granularity of a selective tree is (S, G) where S is an IP 843 multicast source address and G is an IP multicast group address or G 844 is a multicast MAC address. Wildcard sources and wildcard groups are 845 supported. Selective trees require explicit tracking as described 846 below. 848 A MAC-VPN MES advertises a selective tree using a MAC-VPN selective 849 A-D route. The procedures are the same as those in [VPLS-MCAST] with 850 S-PMSI A-D routes in [VPLS-MCAST] replaced by MAC-VPN selective A-D 851 routes. The information elements of the MAC VPN selective 852 A-D route are the same as those of the VPLS S-PMSI A-D route with 853 the following difference. A MAC VPN selective A-D route may encode a 854 MAC address in the Group field. The encoding details of the MAC VPN 855 selective A-D route will be described in the next revision. 857 Selective trees can also be aggregated on the same P2MP LSP using 858 aggregation as described in [VPLS-MCAST]. 860 12.3. Explicit Tracking 862 [VPLS-MCAST] describes procedures for explicit tracking that rely on 863 Leaf A-D routes. The same procedures are used for explicit tracking 864 in this specification with VPLS Leaf A-D routes replaced with MAC-VPN 865 Leaf A-D routes.
These procedures allow a root MES to request 866 multicast membership information for a given (S, G) from leaf MESes. 867 Leaf MESes rely on IGMP snooping or PIM snooping between the MES and 868 the CE to determine the multicast membership information. Note that 869 the procedures in [VPLS-MCAST] do not describe how explicit tracking 870 is performed if the CEs are enabled with join suppression. The 871 procedures for this case will be described in a future version. 873 13. Convergence 875 This section describes failure recovery from different types of 876 network failures. 878 13.1. Transit Link and Node Failures between MESes 880 The use of existing MPLS Fast-Reroute mechanisms can provide failure 881 recovery on the order of 50 ms, in the event of transit link and node 882 failures in the infrastructure that connects the MESes. 884 13.2. MES Failures 886 Consider a host, host1, that is dual homed to MES1 and MES2. If MES1 887 fails, a remote MES, MES3, can discover this based on the failure of 888 the BGP session. This failure detection can be in the sub-second 889 range if BFD is used to detect BGP session failure. MES3 can update 890 its forwarding state to start sending all traffic for host1 to only 891 MES2. It is to be noted that this failure recovery is potentially 892 faster than what would be possible if data plane learning were to be 893 used, as in that case MES3 would have to rely on re-learning of MAC 894 addresses via MES2. 896 13.2.1. Local Repair 898 It is possible to perform local repair in the case of MES failures. 899 Details will be specified in the future. 901 13.3. MES to CE Network Failures 903 Details will be described in the future. 905 14. Acknowledgements 907 We would like to thank Yakov Rekhter, Kaushik Ghosh, Nischal Sheth 908 and Amit Shukla for discussions that helped shape this document. We 909 would also like to thank Han Nguyen for his comments and support of 910 this work. 912 15.
References 914 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006. 916 [VPLS-MCAST] Aggarwal, R., et al., "Multicast in VPLS", 917 draft-ietf-l2vpn-vpls-mcast-04.txt. 919 [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service 920 (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, January 921 2007. 923 [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service 924 (VPLS) Using Label Distribution Protocol (LDP) Signaling", RFC 4762, 925 January 2007. 927 [VPLS-MULTIHOMING] Kompella, K., et al., "Multi-homing in BGP-based Virtual Private LAN 928 Service", 929 draft-kompella-l2vpn-vpls-multihoming-01.txt. 931 [PIM-SNOOPING] Hemige, V., et al., "PIM Snooping over VPLS", 932 draft-ietf-l2vpn-vpls-pim-snooping-01. 934 [IGMP-SNOOPING] Christensen, M., et al., "Considerations for Internet Group Management 935 Protocol (IGMP) and Multicast Listener Discovery (MLD) Snooping 936 Switches", RFC 4541, May 2006. 938 16. Authors' Addresses 940 Rahul Aggarwal 941 Juniper Networks 942 1194 N. Mathilda Ave. 943 Sunnyvale, CA 94089 US 945 Email: rahul@juniper.net 947 Aldrin Isaac 948 Bloomberg 949 Email: aisaac71@bloomberg.net 950 James Uttaro 951 AT&T 952 200 S. Laurel Avenue 953 Middletown, NJ 07748 954 USA 955 Email: uttaro@att.com 957 Ravi Shekhar 958 Juniper Networks 959 1194 N. Mathilda Ave. 960 Sunnyvale, CA 94089 US