idnits 2.17.1 draft-ietf-l2vpn-vpls-bgp-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5 on line 1330. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1307. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1314. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1320. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 37 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 168 instances of too long lines in the document, the longest one being 1 character in excess of 72. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (June 21, 2006) is 6518 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 2385 (ref. '2') (Obsoleted by RFC 5925) -- Obsolete informational reference (is this intentional?): RFC 2796 (ref. '8') (Obsoleted by RFC 4456) == Outdated reference: A later version (-09) exists of draft-ietf-l3vpn-bgpvpn-auto-07 == Outdated reference: A later version (-10) exists of draft-kompella-l2vpn-l2vpn-01 Summary: 5 errors (**), 0 flaws (~~), 5 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group K. Kompella, Ed. 2 Internet-Draft Y. Rekhter, Ed. 
3 Expires: December 23, 2006 Juniper Networks 4 June 21, 2006 6 Virtual Private LAN Service (VPLS) Using BGP for Auto-discovery and 7 Signaling 8 draft-ietf-l2vpn-vpls-bgp-08 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been or will be disclosed, and any of which he or she becomes 15 aware will be disclosed, in accordance with Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This Internet-Draft will expire on December 23, 2006. 35 Copyright Notice 37 Copyright (C) The Internet Society (2006). 39 Abstract 41 Virtual Private LAN (Local Area Network) Service (VPLS), also known 42 as Transparent LAN Service, and Virtual Private Switched Network 43 service, is a useful Service Provider offering. The service offers a 44 Layer 2 Virtual Private Network (VPN); however, in the case of VPLS, 45 the customers in the VPN are connected by a multipoint Ethernet LAN, 46 in contrast to the usual Layer 2 VPNs, which are point-to-point in 47 nature. 49 This document describes the functions required to offer VPLS, a 50 mechanism for signaling a VPLS, and rules for forwarding VPLS frames 51 across a packet switched network. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . 
. . . . 4 56 1.1. Scope of this Document . . . . . . . . . . . . . . . . . . 4 57 1.2. Conventions used in this document . . . . . . . . . . . . 5 58 1.3. Changes from version 06 to 07 . . . . . . . . . . . . . . 5 59 1.4. Changes from version 05 to 06 . . . . . . . . . . . . . . 6 60 1.5. Changes from version 04 to 05 . . . . . . . . . . . . . . 6 61 1.6. Changes from version 03 to 04 . . . . . . . . . . . . . . 7 62 2. Functional Model . . . . . . . . . . . . . . . . . . . . . . . 8 63 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 8 64 2.2. Assumptions . . . . . . . . . . . . . . . . . . . . . . . 9 65 2.3. Interactions . . . . . . . . . . . . . . . . . . . . . . . 9 66 3. Control Plane . . . . . . . . . . . . . . . . . . . . . . . . 11 67 3.1. Autodiscovery . . . . . . . . . . . . . . . . . . . . . . 11 68 3.1.1. Functions . . . . . . . . . . . . . . . . . . . . . . 11 69 3.1.2. Protocol Specification . . . . . . . . . . . . . . . . 12 70 3.2. Signaling . . . . . . . . . . . . . . . . . . . . . . . . 12 71 3.2.1. Label Blocks . . . . . . . . . . . . . . . . . . . . . 13 72 3.2.2. VPLS BGP NLRI . . . . . . . . . . . . . . . . . . . . 13 73 3.2.3. PW Setup and Teardown . . . . . . . . . . . . . . . . 14 74 3.2.4. Signaling PE Capabilities . . . . . . . . . . . . . . 15 75 3.3. BGP VPLS Operation . . . . . . . . . . . . . . . . . . . . 16 76 3.4. Multi-AS VPLS . . . . . . . . . . . . . . . . . . . . . . 17 77 3.4.1. a) VPLS-to-VPLS connections at the ASBRs. . . . . . . 18 78 3.4.2. b) EBGP redistribution of VPLS information between 79 ASBRs. . . . . . . . . . . . . . . . . . . . . . . . . 19 80 3.4.3. c) Multi-hop EBGP redistribution of VPLS 81 information between ASes. . . . . . . . . . . . . . . 20 82 3.4.4. Allocation of VE IDs Across Multiple ASes . . . . . . 20 83 3.5. Multi-homing and Path Selection . . . . . . . . . . . . . 21 84 3.6. Hierarchical BGP VPLS . . . . . . . . . . . . . . . . . . 21 85 4. Data Plane . . . . . . . . . . . . . . . 
. . . . . . . . . . . 24 86 4.1. Encapsulation . . . . . . . . . . . . . . . . . . . . . . 24 87 4.2. Forwarding . . . . . . . . . . . . . . . . . . . . . . . . 24 88 4.2.1. MAC address learning . . . . . . . . . . . . . . . . . 24 89 4.2.2. Aging . . . . . . . . . . . . . . . . . . . . . . . . 24 90 4.2.3. Flooding . . . . . . . . . . . . . . . . . . . . . . . 25 91 4.2.4. Broadcast and Multicast . . . . . . . . . . . . . . . 25 92 4.2.5. "Split Horizon" Forwarding . . . . . . . . . . . . . . 26 93 4.2.6. Qualified and Unqualified Learning . . . . . . . . . . 26 94 4.2.7. Class of Service . . . . . . . . . . . . . . . . . . . 26 95 5. Deployment Options . . . . . . . . . . . . . . . . . . . . . . 28 96 6. Security Considerations . . . . . . . . . . . . . . . . . . . 29 97 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 98 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 99 8.1. Normative References . . . . . . . . . . . . . . . . . . . 32 100 8.2. Informative References . . . . . . . . . . . . . . . . . . 32 101 Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 34 102 Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 35 103 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 36 104 Intellectual Property and Copyright Statements . . . . . . . . . . 37 106 1. Introduction 108 Virtual Private LAN Service (VPLS), also known as Transparent LAN 109 Service, and Virtual Private Switched Network service, is a useful 110 service offering. A Virtual Private LAN appears in (almost) all 111 respects as an Ethernet LAN to customers of a Service Provider. 112 However, in a VPLS, the customers are not all connected to a single 113 LAN; the customers may be spread across a metro or wide area. In 114 essence, a VPLS glues together several individual LANs across a 115 packet-switched network to appear and function as a single LAN ([9]). 
116 This is accomplished by incorporating MAC address learning, flooding 117 and forwarding functions in the context of pseudowires that connect 118 these individual LANs across the packet-switched network. 120 This document details the functions needed to offer VPLS, and then 121 goes on to describe a mechanism for the autodiscovery of the 122 endpoints of a VPLS as well as for signaling a VPLS. It also 123 describes how VPLS frames are transported over tunnels across a 124 packet-switched network. The autodiscovery and signaling mechanism 125 uses BGP as the control plane protocol. This document also briefly 126 discusses deployment options, in particular, the notion of decoupling 127 functions across devices. 129 Alternative approaches include: [14], which allows one to build a 130 Layer 2 VPN with Ethernet as the interconnect; and [13], which 131 allows one to set up an Ethernet connection across a packet-switched 132 network. Both of these, however, offer point-to-point Ethernet 133 services. What distinguishes VPLS from the above two is that a VPLS 134 offers a multipoint service. A mechanism for setting up pseudowires 135 for VPLS using the Label Distribution Protocol (LDP) is defined in 136 [10]. 138 1.1. Scope of this Document 140 This document has four major parts: defining a VPLS functional model; 141 defining a control plane for setting up VPLS; defining the data plane 142 for VPLS (encapsulation and forwarding of data); and defining various 143 deployment options. 145 The functional model underlying VPLS is laid out in Section 2. This 146 describes the service being offered, the network components that 147 interact to provide the service, and, at a high level, their 148 interactions. 150 The control plane described in this document uses Multiprotocol BGP 151 [4] to establish VPLS service, i.e., for the autodiscovery of VPLS 152 members and for the setup and teardown of the pseudowires that 153 constitute a given VPLS instance. 
Section 3 focuses on this, and 154 also describes how a VPLS that spans Autonomous System boundaries is 155 set up, as well as how multi-homing is handled. Using BGP as the 156 control plane for VPNs is not new (see [14], [6] and [11]): what is 157 described here is based on the mechanisms proposed in [6]. 159 The forwarding plane and the actions that a participating Provider 160 Edge (PE) router offering the VPLS service must take are described in 161 Section 4. 163 In Section 5, the notion of 'decoupled' operation is defined, and the 164 interaction of decoupled and non-decoupled PEs is described. 165 Decoupling allows for more flexible deployment of VPLS. 167 1.2. Conventions used in this document 169 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 170 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 171 document are to be interpreted as described in RFC 2119 ([1]). 173 1.3. Changes from version 06 to 07 175 [NOTE to RFC Editor: this section is to be removed before 176 publication.] 178 Note: the DISCUSSes below are referred to by id; they can be accessed 179 at https://datatracker.ietf.org/public/ 180 pidtracker.cgi?command=view_comment&id=[ID] 182 Updated title of doc to reflect use of BGP. (Fenner's DISCUSS id 183 44901). 185 Addressed Russ Housley's DISCUSSes on Figure 6 and Section 6 (ids 186 44778 and 44779). 188 Addressed Sam Hartman's DISCUSS on the Security Considerations (id 189 48432). 191 Resolution of Kessens' DISCUSS (id 44870): 193 1. Reference to RFC 4364 has been made normative. There is no 194 normative text in ref draft-kompella-l2vpn-l2vpn -- any such text 195 has long since been incorporated directly into this document. 197 2. Description and IANA section updated. 199 3. Expanded section (b) of Section 3.4 to clarify the data plane 200 operation for option b. 202 4. Updated Section 3.5 to clarify that a VPLS customer can run STP 203 independent of whether the SP uses multi-homing or not. 205 5. 
P bit text deleted (left over from an earlier edit.) 207 6. Addressed (hopefully) by Sam's DISCUSS. 209 7. Updated Security Considerations to incorporate the techniques 210 described in RFC 4364 for inter-AS VPNs. Also, added a paragraph 211 stating that misconfiguration could cause inter-VPLS connections, 212 just as can happen with RFC 4364. 214 Updated references; added reference to RFC 4023. 216 1.4. Changes from version 05 to 06 218 [NOTE to RFC Editor: this section is to be removed before 219 publication.] 221 Changes in response to GenART review. 223 Updated Abstract and Introduction to make it clear that VPLS is an 224 Ethernet-based service. 226 Added sections on Aging, Broadcast and Multicast, Qualified and 227 Unqualified learning and CoS. Also added a section on scaling the 228 BGP control plane. These were requested for consistency between the 229 BGP and LDP VPLS documents. 231 Added a section clarifying the concepts of label blocks, why they are 232 necessary and how they are used. 234 For multi-AS operation, added a short introduction to the three 235 options, comparing their usage. 237 Lots of clean-up: consistent usage of terms, expansion of acronyms 238 before use, references. 240 1.5. Changes from version 04 to 05 242 [NOTE to RFC Editor: this section is to be removed before 243 publication.] 245 Updated IANA section to reflect agreement with authors of [11] that 246 the two docs should use the same AFI for L2VPN information. 248 Addressed comments received from Alex Zinin. No technical changes, 249 but a more complete description to cover the issues that Alex raised: 251 1. encoding of BGP NEXT_HOP for the new AFI/SAFI is not described 253 2. VE ID, Block offset, Block size, Label base are not described 254 anywhere 256 3. no information on how the receiving PE choose the PW label 258 4. 
section 3.2.2 talks about PE capabilities all of a sudden and 259 introduces a L2 Info Community, whose fields and use are not 260 described 262 Changes to address these: 264 1. Broke up section 3.2.1 into "Concepts" and "PW Setup". 266 2. Expanded section on "Signaling PE Capabilities". 268 3. Added a new section 3.3 "BGP VPLS Operation". 270 4. Minor tweaking, e.g. to fix section number references. 272 1.6. Changes from version 03 to 04 274 [NOTE to RFC Editor: this section is to be removed before 275 publication.] 277 Incorporated IDR review comments from Eric Ji, Chaitanya Kodeboyina, 278 and Mike Loomis. Most changes are clarifications and rewording for 279 better readability. The substantive changes are to remove several 280 flags from the control field. 282 2. Functional Model 284 This will be described with reference to the following figure. 286 ----- 287 / A1 \ 288 ---- ____CE1 | 289 / \ -------- -------- / | | 290 | A2 CE2- / \ / PE1 \ / 291 \ / \ / \___/ | \ ----- 292 ---- ---PE2 | \ 293 | | \ ----- 294 | Service Provider Network | \ / \ 295 | | CE5 A5 | 296 | ___ | / \ / 297 |----| \ / \ PE4_/ ----- 298 |u-PE|--PE3 / \ / 299 |----| -------- ------- 300 ---- / | ---- 301 / \/ \ / \ CE = Customer Edge Device 302 | A3 CE3 --CE4 A4 | PE = Provider Edge Router 303 \ / \ / u-PE = Layer 2 Aggregation 304 ---- ---- A = Customer site n 306 Figure 1: Example of a VPLS 308 2.1. Terminology 310 Terminology similar to that in [6] is used: a Service Provider (SP) 311 network with P (Provider-only) and PE (Provider Edge) routers, and 312 customers with CE (Customer Edge) devices. Here, however, there is 313 an additional concept, that of a "u-PE", a Layer 2 PE device used for 314 Layer 2 aggregation. The notion of u-PE is described further in 315 Section 5. PE and u-PE devices are "VPLS-aware", which means that 316 they know that a VPLS service is being offered. We will call these 317 VPLS edge devices, which could be either a PE or an u-PE, a VE. 
319 In contrast, the CE device (which may be owned and operated by either 320 the SP or the customer) is VPLS-unaware; as far as the CE is 321 concerned, it is connected to the other CEs in the VPLS via a Layer 2 322 switched network. This means that there should be no changes to a CE 323 device, either to the hardware or the software, in order to offer 324 VPLS. 326 A CE device may be connected to a PE or a u-PE via Layer 2 switches 327 that are VPLS-unaware. From a VPLS point of view, such Layer 2 328 switches are invisible, and hence will not be discussed further. 329 Furthermore, a u-PE may be connected to a PE via Layer 2 and Layer 3 330 devices; this will be discussed further in a later section. 332 The term "demultiplexor" refers to an identifier in a data packet 333 that identifies both the VPLS to which the packet belongs as well as 334 the ingress PE. In this document, the demultiplexor is an MPLS 335 label. 337 The term "VPLS" will refer to the service as well as a particular 338 instantiation of the service (i.e., an emulated LAN); it should be 339 clear from the context which usage is intended. 341 2.2. Assumptions 343 The Service Provider Network is a packet switched network. The PEs 344 are assumed to be (logically) fully meshed with tunnels over which 345 packets that belong to a service (such as VPLS) are encapsulated and 346 forwarded. These tunnels can be IP tunnels, such as GRE, or MPLS 347 tunnels, established by RSVP-TE or LDP. These tunnels are 348 established independently of the services offered over them; the 349 signaling and establishment of these tunnels are not discussed in 350 this document. 352 "Flooding" and MAC address "learning" (see Section 4) are an integral 353 part of VPLS. However, these activities are private to an SP device, 354 i.e., in the VPLS described below, no SP device requests another SP 355 device to flood packets or learn MAC addresses on its behalf. 
357 All the PEs participating in a VPLS are assumed to be fully meshed in 358 the data plane, i.e., there is a bidirectional pseudowire between 359 every pair of PEs participating in that VPLS, and thus every 360 (ingress) PE can send a VPLS packet to the egress PE(s) directly, 361 without the need for an intermediate PE (see Section 4.2.5.) This 362 requires that VPLS PEs are logically fully meshed in the control 363 plane so that a PE can send a message to another PE to set up the 364 necessary pseudowires. See Section 3.6 for a discussion on 365 alternatives to achieve a logical full mesh in the control plane. 367 2.3. Interactions 369 VPLS is a "LAN Service" in that CE devices that belong to VPLS V can 370 interact through the SP network as if they were connected by a LAN. 371 VPLS is "private" in that CE devices that belong to different VPLSs 372 cannot interact. VPLS is "virtual" in that multiple VPLSs can be 373 offered over a common packet switched network. 375 PE devices interact to "discover" all the other PEs participating in 376 the same VPLS, and to exchange demultiplexors. These interactions 377 are control-driven, not data-driven. 379 u-PEs interact with PEs to establish connections with remote PEs or 380 u-PEs in the same VPLS. This interaction is control-driven. 382 PE devices can participate simultaneously in both VPLS and IP VPNs 383 ([6]). These are independent services, and the information exchanged 384 for each type of service is kept separate as the Network Layer 385 Reachability Information (NLRI) used for this exchange have different 386 Address Family Identifiers (AFI) and Subsequent Address Family 387 Identifiers (SAFI). Consequently, an implementation MUST maintain a 388 separate routing storage for each service. However, multiple 389 services can use the same underlying tunnels; the VPLS or VPN label 390 is used to demultiplex the packets belonging to different services. 392 3. 
Control Plane 394 There are two primary functions of the VPLS control plane: 395 autodiscovery, and setup and teardown of the pseudowires that 396 constitute the VPLS, often called signaling. Section 3.1 and 397 Section 3.2 describe these functions. Both of these functions are 398 accomplished with a single BGP Update advertisement; Section 3.3 399 describes how this is done by detailing BGP protocol operation for 400 VPLS. Section 3.4 describes the setting up of pseudowires that span 401 Autonomous Systems. Section 3.5 describes how multi-homing is 402 handled. 404 3.1. Autodiscovery 406 Discovery refers to the process of finding all the PEs that 407 participate in a given VPLS instance. A PE can either be configured 408 with the identities of all the other PEs in a given VPLS, or the PE 409 can use some protocol to discover the other PEs. The latter is 410 called autodiscovery. 412 The former approach is fairly configuration-intensive, especially 413 since it is required that the PEs participating in a given VPLS are 414 fully meshed (i.e., that every PE in a given VPLS establish 415 pseudowires to every other PE in that VPLS). Furthermore, when the 416 topology of a VPLS changes (i.e., a PE is added to, or removed from 417 the VPLS), the VPLS configuration on all PEs in that VPLS must be 418 changed. 420 In the autodiscovery approach, each PE "discovers" which other PEs 421 are part of a given VPLS by means of some protocol, in this case BGP. 422 This allows each PE's configuration to consist only of the identity 423 of the VPLS instance established on this PE, not the identity of 424 every other PE in that VPLS instance -- that is auto-discovered. 425 Moreover, when the topology of a VPLS changes, only the affected PE's 426 configuration changes; other PEs automatically find out about the 427 change and adapt. 429 3.1.1. 
Functions 431 A PE that participates in a given VPLS instance V must be able to 432 tell all other PEs in VPLS V that it is also a member of V. A PE must 433 also have a means of declaring that it no longer participates in a 434 VPLS. To do both of these, the PE must have a means of identifying a 435 VPLS and a means by which to communicate to all other PEs. 437 U-PE devices also need to know what constitutes a given VPLS; 438 however, they don't need the same level of detail. The PE (or PEs) 439 to which a u-PE is connected gives the u-PE an abstraction of the 440 VPLS; this is described in section 5. 442 3.1.2. Protocol Specification 444 The specific mechanism for autodiscovery described here is based on 445 [14] and [6]; it uses BGP extended communities [5] to identify 446 members of a VPLS, in particular, the Route Target community, whose 447 format is described in [5]. The semantics of the use of Route 448 Targets is described in [6]; their use in VPLS is identical. 450 As it has been assumed that VPLSs are fully meshed, a single Route 451 Target RT suffices for a given VPLS V, and in effect that RT is the 452 identifier for VPLS V. 454 A PE announces (typically via I-BGP) that it belongs to VPLS V by 455 annotating its NLRIs for V (see next subsection) with Route Target 456 RT, and acts on this by accepting NLRIs from other PEs that have 457 Route Target RT. A PE announces that it no longer participates in V 458 by withdrawing all NLRIs that it had advertised with Route Target RT. 460 3.2. Signaling 462 Once discovery is done, each pair of PEs in a VPLS must be able to 463 establish (and tear down) pseudowires to each other, i.e., exchange 464 (and withdraw) demultiplexors. This process is known as signaling. 465 Signaling is also used to transmit certain characteristics of the 466 pseudowires that a PE sets up for a given VPLS. 
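The Route Target rule of Section 3.1.2 can be illustrated with a short Python sketch. It is not part of the specification: the class VplsInstance, its method names, and the use of plain strings and tuples for peers, Route Targets and NLRI keys are all assumptions made for the illustration. It shows only the membership bookkeeping: an NLRI is accepted when it carries the Route Target of the local VPLS, and a peer stops being a member once all of its NLRIs have been withdrawn.

```python
from typing import Dict, Hashable, Set


class VplsInstance:
    """Hypothetical per-VPLS membership table keyed by Route Target."""

    def __init__(self, route_target: str) -> None:
        self.route_target = route_target
        # peer -> set of NLRI keys currently advertised by that peer
        self.members: Dict[str, Set[Hashable]] = {}

    def on_update(self, peer: str, nlri_key: Hashable,
                  route_targets: Set[str]) -> bool:
        """Accept the NLRI only if it carries this VPLS's Route Target."""
        if self.route_target not in route_targets:
            return False  # belongs to some other VPLS; not accepted here
        self.members.setdefault(peer, set()).add(nlri_key)
        return True

    def on_withdraw(self, peer: str, nlri_key: Hashable) -> None:
        """Forget the NLRI; drop the peer once it has none left."""
        nlris = self.members.get(peer)
        if nlris is None:
            return
        nlris.discard(nlri_key)
        if not nlris:
            del self.members[peer]  # peer no longer participates
```

In this sketch the local PE's entire per-VPLS configuration is the single route_target value; the set of remote PEs is learned from updates rather than configured.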
468 Recall that a demultiplexor is used to distinguish among several 469 different streams of traffic carried over a tunnel, each stream 470 possibly representing a different service. In the case of VPLS, the 471 demultiplexor not only says to which specific VPLS a packet belongs, 472 but also identifies the ingress PE. The former information is used 473 for forwarding the packet; the latter information is used for 474 learning MAC addresses. The demultiplexor described here is an MPLS 475 label. However, note that the PE-to-PE tunnels need not be MPLS 476 tunnels. 478 Using a distinct BGP Update message to send a demultiplexor to each 479 remote PE would require the originating PE to send N such messages 480 for N remote PEs. The solution described in this document allows a 481 PE to send a single (common) Update message that contains 482 demultiplexors for all the remote PEs, instead of N individual 483 messages. Doing this reduces the control plane load both on the 484 originating PE and on the BGP Route Reflectors that may be 485 involved in distributing this Update to other PEs. 487 3.2.1. Label Blocks 489 To accomplish this, we introduce the notion of "label blocks". A 490 label block, defined by a label base LB and a VE block size VBS, is a 491 contiguous set of labels {LB, LB+1, ..., LB+VBS-1}. Here's how label 492 blocks work. All PEs within a given VPLS are assigned unique VE IDs 493 as part of their configuration. A PE X wishing to send a VPLS update 494 sends the same label block information to all other PEs. Each 495 receiving PE infers the label intended for PE X by adding its 496 (unique) VE ID to the label base. In this manner, each receiving PE 497 gets a unique demultiplexor for PE X for that VPLS. 499 This simple notion is enhanced with the concept of a VE block offset 500 VBO. A label block defined by the triple (LB, VBO, VBS) is the set {LB+VBO, 501 LB+VBO+1, ..., LB+VBO+VBS-1}. 
Thus, instead of a single large label 502 block to cover all VE IDs in a VPLS, one can have several label 503 blocks, each with a different label base. This makes label block 504 management easier, and also allows PE X to cater gracefully to a PE 505 joining a VPLS with a VE ID that is not covered by the set of label 506 blocks that PE X has already advertised. 508 When a PE starts up, or is configured with a new VPLS instance, the 509 BGP process may wish to wait to receive several advertisements for 510 that VPLS instance from other PEs to improve the efficiency of label 511 block allocation. 513 3.2.2. VPLS BGP NLRI 515 The VPLS BGP NLRI described below, with a new AFI and SAFI (see [4]), 516 is used to exchange VPLS membership and demultiplexors. 518 A VPLS BGP NLRI has the following information elements: a VE ID, a VE 519 Block Offset, a VE Block Size and a label base. The format of the 520 VPLS NLRI is given below. The AFI is the L2VPN AFI (to be assigned 521 by IANA), and the SAFI is the VPLS SAFI (65). The Length field is in 522 octets.
524    +------------------------------------+
525    |  Length (2 octets)                 |
526    +------------------------------------+
527    |  Route Distinguisher (8 octets)    |
528    +------------------------------------+
529    |  VE ID (2 octets)                  |
530    +------------------------------------+
531    |  VE Block Offset (2 octets)        |
532    +------------------------------------+
533    |  VE Block Size (2 octets)          |
534    +------------------------------------+
535    |  Label Base (3 octets)             |
536    +------------------------------------+
538 Figure 2: BGP NLRI for VPLS Information 540 A PE participating in a VPLS must have at least one VE ID. If the PE 541 is the VE, it typically has one VE ID. If the PE is connected to 542 several u-PEs, it has a distinct VE ID for each u-PE. It may 543 additionally have a VE ID for itself, if it itself acts as a VE for 544 that VPLS. 
In what follows, we will call the PE announcing the VPLS 545 NLRI PE-a, and we will assume that PE-a owns VE ID V (either 546 belonging to PE-a itself, or to a u-PE connected to PE-a). 548 VE IDs are typically assigned by the network administrator. Their 549 scope is local to a VPLS. A given VE ID should belong to only one 550 PE, unless a CE is multi-homed (see Section 3.5). 552 A label block is a set of demultiplexor labels used to reach a given 553 VE ID. A VPLS BGP NLRI with VE ID V, VE Block Offset VBO, VE Block 554 Size VBS and label base LB communicates to its peers the following: 556 label block for V: labels from LB to (LB + VBS - 1), and 558 remote VE set for V: from VBO to (VBO + VBS - 1). 560 There is a one-to-one correspondence between the remote VE set and 561 the label block: VE ID (VBO + n) corresponds to label (LB + n). 563 3.2.3. PW Setup and Teardown 565 Suppose PE-a is part of VPLS foo, and makes an announcement with VE 566 ID V, VE Block Offset VBO, VE Block Size VBS and label base LB. If 567 PE-b is also part of VPLS foo, and has VE ID W, PE-b does the 568 following: 570 1. checks if W is part of PE-a's 'remote VE set': if VBO <= W < VBO 571 + VBS, then W is part of PE-a's remote VE set. If not, PE-b 572 ignores this message, and skips the rest of this procedure. 574 2. sets up a PW to PE-a: the demultiplexor label to send traffic 575 from PE-b to PE-a is computed as (LB + W - VBO). 577 3. checks if V is part of any 'remote VE set' that PE-b announced, 578 i.e., PE-b checks if V belongs to some remote VE set that PE-b 579 announced, say with VE Block Offset VBO', VE Block Size VBS' and 580 label base LB'. If not, PE-b MUST make a new announcement as 581 described in Section 3.3. 583 4. sets up a PW from PE-a: the demultiplexor label over which PE-b 584 should expect traffic from PE-a is computed as: (LB' + V - VBO'). 586 If Y withdraws an NLRI for V that X was using, then X MUST tear down 587 its ends of the pseudowire between X and Y. 
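The four-step procedure of Section 3.2.3 can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the specification: the names LabelBlock, covers, label_for and pw_labels are hypothetical, and the numeric values in the usage note are invented for the example. The sketch covers only the label arithmetic, not BGP message handling.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class LabelBlock:
    """One announced VPLS NLRI, reduced to the fields used here."""
    ve_id: int       # VE ID of the announcing PE (V)
    vbo: int         # VE Block Offset
    vbs: int         # VE Block Size
    label_base: int  # LB

    def covers(self, remote_ve_id: int) -> bool:
        # Remote VE set test: VBO <= W < VBO + VBS
        return self.vbo <= remote_ve_id < self.vbo + self.vbs

    def label_for(self, remote_ve_id: int) -> Optional[int]:
        # VE ID (VBO + n) corresponds to label (LB + n)
        if not self.covers(remote_ve_id):
            return None
        return self.label_base + remote_ve_id - self.vbo


def pw_labels(received: LabelBlock, local_ve_id: int,
              local_blocks: Iterable[LabelBlock]
              ) -> Optional[Tuple[int, Optional[int]]]:
    """Return (label to send with, label to expect), as seen by PE-b.

    None means the announcement is ignored (step 1 fails). A second
    element of None means no announced local block covers the sender's
    VE ID, so a new announcement is needed (step 3).
    """
    tx = received.label_for(local_ve_id)          # steps 1 and 2
    if tx is None:
        return None
    rx = None
    for block in local_blocks:                    # steps 3 and 4
        rx = block.label_for(received.ve_id)
        if rx is not None:
            break
    return tx, rx
```

As a usage example with invented values: if PE-a announces VE ID 1 with VBO 1, VBS 8 and LB 1000, and PE-b has VE ID 3 and has announced VBO 1, VBS 8 and LB 2000, then pw_labels(LabelBlock(1, 1, 8, 1000), 3, [LabelBlock(3, 1, 8, 2000)]) returns (1002, 2000): PE-b sends to PE-a with label 1000 + 3 - 1 and expects traffic from PE-a on label 2000 + 1 - 1.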
589 3.2.4. Signaling PE Capabilities 591 The following extended attribute, the "Layer2 Info Extended 592 Community", is used to signal control information about the 593 pseudowires to be set up for a given VPLS. The extended community 594 value is to be allocated by IANA (currently used value is 0x800A). 595 This information includes the Encaps Type (type of encapsulation on 596 the pseudowires), Control Flags (control information regarding the 597 pseudowires) and the Maximum Transmission Unit (MTU) to be used on 598 the pseudowires. 600 The Encaps Type for VPLS is 19.
602    +------------------------------------+
603    | Extended community type (2 octets) |
604    +------------------------------------+
605    | Encaps Type (1 octet)              |
606    +------------------------------------+
607    | Control Flags (1 octet)            |
608    +------------------------------------+
609    | Layer-2 MTU (2 octets)             |
610    +------------------------------------+
611    | Reserved (2 octets)                |
612    +------------------------------------+
614 Figure 3: Layer2 Info Extended Community
615     0 1 2 3 4 5 6 7
616    +-+-+-+-+-+-+-+-+
617    |   MBZ     |C|S|  (MBZ = MUST Be Zero)
618    +-+-+-+-+-+-+-+-+
620 Figure 4: Control Flags Bit Vector 622 With reference to Figure 4, the following bits in the Control Flags 623 are defined; the remaining bits, designated MBZ, MUST be set to zero 624 when sending and MUST be ignored when receiving this community.
626    Name   Meaning
627    C      A Control word ([7]) MUST or MUST NOT be present when sending VPLS packets to this PE, depending on whether C is 1 or 0, respectively
632    S      Sequenced delivery of frames MUST or MUST NOT be used when sending VPLS packets to this PE, depending on whether S is 1 or 0, respectively
636 3.3. BGP VPLS Operation 638 To create a new VPLS, say VPLS foo, a network administrator must pick 639 an RT for VPLS foo, say RT-foo. This will be used by all PEs that 640 serve VPLS foo. 
   To configure a given PE, say PE-a, to be part of VPLS foo, the
   network administrator only has to choose a VE ID V for PE-a. (If
   PE-a is connected to u-PEs, PE-a may be configured with more than
   one VE ID; in that case, the following is done for each VE ID.) The
   PE may also be configured with a Route Distinguisher (RD); if not,
   it generates a unique RD for VPLS foo. Say the RD is RD-foo-a. PE-a
   then generates an initial label block and a remote VE set for V,
   defined by VE Block Offset VBO, VE Block Size VBS and label base LB.
   These may be empty.

   PE-a then creates a VPLS BGP NLRI with RD RD-foo-a, VE ID V, VE Block
   Offset VBO, VE Block Size VBS and label base LB. To this, it
   attaches a Layer2 Info Extended Community and an RT, RT-foo. It sets
   the BGP Next Hop for this NLRI as itself, and announces this NLRI to
   its peers. The Network Layer protocol associated with the Network
   Address of the Next Hop for the <AFI, SAFI> combination used for
   VPLS is IP; this association is required by [4], Section 5. If the
   value of the Length of the Next Hop field is 4, then the Next Hop
   contains an IPv4 address. If this value is 16, then the Next Hop
   contains an IPv6 address.

   If PE-a hears from another PE, say PE-b, a VPLS BGP announcement with
   RT-foo and VE ID W, then PE-a knows that PE-b is a member of the same
   VPLS (autodiscovery). PE-a then has to set up its part of a VPLS
   pseudowire between PE-a and PE-b, using the mechanisms in
   Section 3.2. Similarly, PE-b will have discovered that PE-a is in
   the same VPLS, and PE-b must set up its part of the VPLS pseudowire.
   Thus, signaling and pseudowire setup are also achieved with the same
   Update message.

   If W is not in any remote VE set that PE-a announced for VE ID V in
   VPLS foo, PE-b will not be able to set up its part of the pseudowire
   to PE-a.
   To address this, PE-a can choose to withdraw the old
   announcement(s) it made for VPLS foo, and announce a new Update with
   a larger remote VE set and corresponding label block that covers all
   VE IDs that are in VPLS foo. This, however, may cause some service
   disruption. An alternative for PE-a is to create a new remote VE set
   and corresponding label block, and announce them in a new Update,
   without withdrawing previous announcements.

   If PE-a's configuration is changed to remove VE ID V from VPLS foo,
   then PE-a MUST withdraw all its announcements for VPLS foo that
   contain VE ID V. If all of PE-a's links to its CEs in VPLS foo go
   down, then PE-a SHOULD either withdraw all its NLRIs for VPLS foo, or
   let other PEs in VPLS foo know in some way that PE-a is no longer
   connected to its CEs.

3.4. Multi-AS VPLS

   As in [14] and [6], the above autodiscovery and signaling functions
   are typically announced via I-BGP. This assumes that all the sites
   in a VPLS are connected to PEs in a single Autonomous System (AS).

   However, sites in a VPLS may connect to PEs in different ASes. This
   leads to two issues: 1) there would not be an I-BGP connection
   between those PEs, so some means of signaling across ASes is needed;
   and 2) there may not be PE-to-PE tunnels between the ASes.

   A similar problem is solved in [6], Section 10. Three methods are
   suggested to address issue (1); all these methods have analogs in
   multi-AS VPLS.

   Here is a diagram for reference:

     __________       ____________       ____________       __________
    /          \     /            \     /            \     /          \
                \___/        AS 1  \   /  AS 2        \___/
                                    \ /
      +-----+           +-------+    |    +-------+           +-----+
      | PE1 | ---...--- | ASBR1 | ======= | ASBR2 | ---...--- | PE2 |
      +-----+           +-------+    |    +-------+           +-----+
                 ___                / \                ___
                /   \              /   \              /   \
    \__________/     \____________/     \____________/     \__________/

                        Figure 6: Inter-AS VPLS

   As in the above reference, three methods for signaling inter-provider
   VPLS are given; these are presented in order of increasing
   scalability. Method (a) is the easiest to understand conceptually,
   and the easiest to deploy; however, it requires an Ethernet
   interconnect between the ASes, and both VPLS control and data plane
   state on the AS border routers (ASBRs). Method (b) requires VPLS
   control plane state on the ASBRs and MPLS on the AS-AS interconnect
   (which need not be Ethernet). Method (c) requires MPLS on the AS-AS
   interconnect, but no VPLS state of any kind on the ASBRs.

3.4.1. a) VPLS-to-VPLS connections at the ASBRs.

   In this method, an AS Border Router (ASBR1) acts as a PE for all
   VPLSs that span AS1 and an AS to which ASBR1 is connected, such as
   AS2 here. The ASBR on the neighboring AS (ASBR2) is viewed by ASBR1
   as a CE for the VPLSs that span AS1 and AS2; similarly, ASBR2 acts as
   a PE for this VPLS from AS2's point of view, and views ASBR1 as a CE.

   This method does not require MPLS on the ASBR1-ASBR2 link, but does
   require that this link carry Ethernet traffic, and that there be a
   separate VLAN sub-interface for each VPLS traversing this link. It
   further requires that ASBR1 does the PE operations (discovery,
   signaling, MAC address learning, flooding, encapsulation, etc.) for
   all VPLSs that traverse ASBR1. This imposes a significant burden on
   ASBR1, both on the control plane and the data plane, which limits the
   number of multi-AS VPLSs.

   Note that in general, there will be multiple connections between a
   pair of ASes, for redundancy. In this case, the Spanning Tree
   Protocol (STP) ([15]), or some other means of loop detection and
   prevention, must be run on each VPLS that spans these ASes, so that a
   loop-free topology can be constructed in each VPLS. This imposes a
   further burden on the ASBRs and PEs participating in those VPLSs, as
   these devices would need to run a loop detection algorithm for each
   such VPLS. How this may be achieved is outside the scope of this
   document.

3.4.2. b) EBGP redistribution of VPLS information between ASBRs.

   This method requires I-BGP peerings between the PEs in AS1 and ASBR1
   in AS1 (perhaps via route reflectors), an E-BGP peering between ASBR1
   and ASBR2 in AS2, and I-BGP peerings between ASBR2 and the PEs in
   AS2. In the above example, PE1 sends a VPLS NLRI to ASBR1 with a
   label block and itself as the BGP nexthop; ASBR1 sends the NLRI to
   ASBR2 with new labels and itself as the BGP nexthop; and ASBR2 sends
   the NLRI to PE2 with new labels and itself as the nexthop.
   Correspondingly, there are three tunnels: T1 from PE1 to ASBR1, T2
   from ASBR1 to ASBR2, and T3 from ASBR2 to PE2. Within each tunnel,
   the VPLS label to be used is determined by the receiving device;
   e.g., the VPLS label within T1 is a label from the label block that
   ASBR1 sent to PE1. The ASBRs are responsible for receiving VPLS
   packets encapsulated in a tunnel, and performing the appropriate
   label swap operations described next so that the next receiving
   device can correctly identify and forward the packet.

   The VPLS NLRI that ASBR1 sends to ASBR2 (and the NLRI that ASBR2
   sends to PE2) is identical to the VPLS NLRI that PE1 sends to ASBR1,
   except for the label block.
   To be precise, the Length, the Route Distinguisher, the VE ID, the
   VE Block Offset, and the VE Block Size MUST be the same; the Label
   Base may be different. Furthermore, ASBR1 must also update its
   forwarding path as follows: if the Label Base sent by PE1 is L1, the
   Label-block Size is N, the Label Base sent by ASBR1 is L2, and the
   tunnel label from ASBR1 to PE1 is T, then ASBR1 must install the
   following in the forwarding path:

      swap L2 with L1 and push T,

      swap L2+1 with L1+1 and push T,

      ...

      swap L2+N-1 with L1+N-1 and push T.

   ASBR2 must act similarly, except that it may not need a tunnel label
   if it is directly connected with ASBR1.

   When PE2 wants to send a VPLS packet to PE1, PE2 uses its VE ID to
   get the right VPLS label from ASBR2's label block for PE1, and uses a
   tunnel label to reach ASBR2. ASBR2 swaps the VPLS label with the
   label from ASBR1; ASBR1 then swaps the VPLS label with the label from
   PE1, and pushes a tunnel label to reach PE1.

   In this method, one needs MPLS on the ASBR1-ASBR2 interface, but
   there is no requirement that the link layer be Ethernet.
   Furthermore, the ASBRs take part in distributing VPLS information.
   However, the data plane requirements of the ASBRs are much simpler
   than in method (a), being limited to label operations. Finally, the
   construction of loop-free VPLS topologies is done by routing
   decisions, viz. BGP path and nexthop selection, so there is no need
   to run the Spanning Tree Protocol on a per-VPLS basis. Thus, this
   method is considerably more scalable than method (a).

3.4.3. c) Multi-hop EBGP redistribution of VPLS information between
       ASes.

   In this method, there is a multi-hop E-BGP peering between the PEs
   (or preferably, a Route Reflector) in AS1 and the PEs (or Route
   Reflector) in AS2.
   PE1 sends a VPLS NLRI with labels and nexthop self to PE2; if this
   is via route reflectors, the BGP nexthop is not changed. This
   requires that there be a tunnel LSP from PE1 to PE2. This tunnel LSP
   can be created exactly as in [6], Section 10 (c), for example using
   E-BGP to exchange labeled IPv4 routes for the PE loopbacks.

   When PE1 wants to send a VPLS packet to PE2, it pushes the VPLS label
   corresponding to its own VE ID onto the packet. It then pushes the
   tunnel label(s) to reach PE2.

   This method requires no VPLS information (in either the control or
   the data plane) on the ASBRs. The ASBRs only need to set up PE-to-PE
   tunnel LSPs in the control plane, and do label operations in the data
   plane. Again, as in the case of method (b), the construction of
   loop-free VPLS topologies is done by routing decisions, i.e., BGP
   path and nexthop selection, so there is no need to run the Spanning
   Tree Protocol on a per-VPLS basis. This option is likely to be the
   most scalable of the three methods presented here.

3.4.4. Allocation of VE IDs Across Multiple ASes

   In order to ease the allocation of VE IDs for a VPLS that spans
   multiple ASes, one can allocate ranges for each AS. For example, AS1
   uses VE IDs in the range 1 to 100, AS2 from 101 to 200, etc. If
   there are 10 sites attached to AS1 and 20 to AS2, the allocated VE
   IDs could be 1 to 10 and 101 to 120. This minimizes the number of
   VPLS NLRIs that are exchanged while ensuring that VE IDs are kept
   unique.

   In the above example, if AS1 needed more than 100 sites, then another
   range can be allocated to AS1. The only caveat is that there be no
   overlap between VE ID ranges among ASes. The exception to this rule
   is multi-homing, which is dealt with below.

3.5. Multi-homing and Path Selection

   It is often desired to multi-home a VPLS site, i.e., to connect it to
   multiple PEs, perhaps even in different ASes.
   In such a case, the PEs connected to the same site can either be
   configured with the same VE ID or with different VE IDs. In the
   latter case, it is mandatory to run STP on the CE device, and
   possibly on the PEs, to construct a loop-free VPLS topology. How
   this can be accomplished is outside the scope of this document;
   however, the rest of this section will describe in some detail the
   former case. Note that multi-homing by the SP and STP on the CEs can
   co-exist; thus it is recommended that the VPLS customer run STP if
   the CEs are able to.

   In the case where the PEs connected to the same site are assigned the
   same VE ID, a loop-free topology is constructed by routing
   mechanisms, in particular, by BGP path selection. When a BGP speaker
   receives two equivalent NLRIs (see below for the definition), it
   applies standard path selection criteria such as Local Preference and
   AS Path Length to determine which NLRI to choose; it MUST pick only
   one. If the chosen NLRI is subsequently withdrawn, the BGP speaker
   applies path selection to the remaining equivalent VPLS NLRIs to pick
   another; if none remain, the forwarding information associated with
   that NLRI is removed.

   Two VPLS NLRIs are considered equivalent from a path selection point
   of view if the Route Distinguisher, the VE ID and the VE Block Offset
   are the same. If two PEs are assigned the same VE ID in a given
   VPLS, they MUST use the same Route Distinguisher, and they SHOULD
   announce the same VE Block Size for a given VE Block Offset.

3.6. Hierarchical BGP VPLS

   This section discusses how one can scale the VPLS control plane when
   using BGP. There are at least three aspects of scaling the control
   plane:

   1. alleviating the full mesh connectivity requirement among VPLS BGP
      speakers;

   2. limiting BGP VPLS message passing to just the interested speakers
      rather than all BGP speakers; and

   3. simplifying the addition and deletion of BGP speakers, whether
      for VPLS or other applications.

   Fortunately, the use of BGP for Internet routing as well as for IP
   VPNs has yielded several good solutions for all these problems. The
   basic technique is hierarchy, using BGP Route Reflectors (RRs) ([8]).

   The idea is to designate a small set of Route Reflectors which are
   themselves fully meshed, and then establish a BGP session between
   each BGP speaker and one or more RRs. In this way, there is no need
   for direct full mesh connectivity among all the BGP speakers. If the
   particular scaling needs of a provider require a large number of
   RRs, then this technique can be applied recursively: the full mesh
   connectivity among the RRs can be brokered by yet another level of
   RRs. The use of RRs solves problems 1 and 3 above.

   It is important to note that RRs, as used for VPLS and VPNs, are
   purely a control plane technique. The use of RRs introduces no data
   plane state and no data plane forwarding requirements on the RRs, and
   does not in any way change the forwarding path of VPLS traffic. This
   is in contrast to the technique of Hierarchical VPLS defined in [10].

   Another consequence of this approach is that it is not required that
   one set of RRs handle all BGP messages, or that a particular RR
   handle all messages from a given PE. One can define several sets of
   RRs, for example a set to handle VPLS, another to handle IP VPNs and
   another for Internet routing. Another partitioning could be to have
   some subset of VPLSs and IP VPNs handled by one set of RRs, and
   another subset of VPLSs and IP VPNs handled by another set of RRs;
   the use of Route Target Filtering (RTF), described in [12], can make
   this simpler and more effective.

   Finally, problem 2 (that of limiting BGP VPLS message passing to just
   the interested BGP speakers) is addressed by the use of RTF.
   This technique is orthogonal to the use of RRs, but works well in
   conjunction with RRs. RTF is also very effective in inter-AS VPLS;
   more details on how RTF works and its benefits are provided in [12].

   It is worth mentioning an aspect of the control plane that is often a
   source of confusion. No MAC addresses are exchanged via BGP. All
   MAC address learning and aging is done in the data plane individually
   by each PE. The only task of BGP VPLS message exchange is
   autodiscovery and label exchange.

   Thus, BGP processing for VPLS occurs when

   1. a PE joins or leaves a VPLS; or

   2. a failure occurs in the network, bringing down a PE-PE tunnel or
      a PE-CE link.

   These events are relatively rare, and typically, each such event
   causes one BGP update to be generated. Coupled with BGP's messaging
   efficiency when used for signaling VPLS, these observations lead to
   the conclusion that BGP as a control plane for VPLS will scale quite
   well both in terms of processing and memory requirements.

4. Data Plane

   This section discusses two aspects of the data plane for PEs and
   u-PEs implementing VPLS: encapsulation and forwarding.

4.1. Encapsulation

   Ethernet frames received from CE devices are encapsulated for
   transmission over the packet switched network connecting the PEs.
   The encapsulation is as in [7].

4.2. Forwarding

   VPLS packets are classified as belonging to a given service instance
   and associated forwarding table based on the interface over which the
   packet is received. Packets are forwarded in the context of the
   service instance based on the destination MAC address. The former
   mapping is determined by configuration. The latter is the focus of
   this section.

4.2.1. MAC address learning

   As was mentioned earlier, the key distinguishing feature of VPLS is
   that it is a multipoint service.
   This means that the entire Service Provider network should appear as
   a single logical learning bridge for each VPLS that the SP network
   supports. The logical ports for the SP "bridge" are the customer
   ports as well as the pseudowires on a VE. Just as a learning bridge
   learns MAC addresses on its ports, the SP bridge must learn MAC
   addresses at its VEs.

   Learning consists of associating source MAC addresses of packets with
   the (logical) ports on which they arrive; this association is the
   Forwarding Information Base (FIB). The FIB is used for forwarding
   packets. For example, suppose the bridge receives a packet with
   source MAC address S on (logical) port P. If, subsequently, the
   bridge receives a packet with destination MAC address S, it knows
   that it should send the packet out on port P.

   If a VE learns a source MAC address S on logical port P, then later
   sees S on a different port P', then the VE MUST update its FIB to
   reflect the new port P'. A VE MAY implement a mechanism to damp
   flapping of source ports for a given MAC address.

4.2.2. Aging

   VPLS PEs SHOULD have an aging mechanism to remove a MAC address
   associated with a logical port, much the same as learning bridges do.
   This is required so that a MAC address can be relearned if it "moves"
   from a logical port to another logical port, either because the
   station to which that MAC address belongs really has moved, or
   because of a topology change in the LAN that causes this MAC address
   to arrive on a new port. In addition, aging reduces the size of a
   VPLS MAC table to just the active MAC addresses, rather than all MAC
   addresses in that VPLS.

   The "age" of a source MAC address S on a logical port P is the time
   since it was last seen as a source MAC on port P. If the age exceeds
   the aging time T, S MUST be flushed from the FIB.
   This of course means that every time S is seen as a source MAC
   address on port P, S's age is reset.

   An implementation SHOULD provide a configurable knob to set the aging
   time T on a per-VPLS basis. In addition, an implementation MAY
   accelerate aging of all MAC addresses in a VPLS if it detects certain
   situations, such as a Spanning Tree topology change in that VPLS.

4.2.3. Flooding

   When a bridge receives a packet to a destination that is not in its
   FIB, it floods the packet on all the other ports. Similarly, a VE
   will flood packets to an unknown destination to all other VEs in the
   VPLS.

   In Figure 1 above, if CE2 sent an Ethernet frame to PE2, and the
   destination MAC address on the frame was not in PE2's FIB (for that
   VPLS), then PE2 would be responsible for flooding that frame to every
   other PE in the same VPLS. On receiving that frame, PE1 would be
   responsible for further flooding the frame to CE1 and CE5 (unless PE1
   knew which CE "owned" that MAC address).

   On the other hand, if PE3 received the frame, it could delegate
   further flooding of the frame to its u-PE. If PE3 was connected to 2
   u-PEs, it would announce that it has two u-PEs. PE3 could either
   announce that it is incapable of flooding, in which case it would
   receive two frames, one for each u-PE, or it could announce that it
   is capable of flooding, in which case it would receive one copy of
   the frame, which it would then send to both u-PEs.

4.2.4. Broadcast and Multicast

   There is a well-known broadcast MAC address. An Ethernet frame whose
   destination MAC address is the broadcast MAC address must be sent to
   all stations in that VPLS. This can be accomplished by the same
   means that is used for flooding.

   There is also an easily recognized set of "multicast" MAC addresses.
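   The learning, aging and flooding behavior of Sections 4.2.1 through
   4.2.3, together with the split-horizon rule of Section 4.2.5, can be
   sketched in Python as follows. This is an illustrative model under
   stated assumptions, not a definitive implementation: the class name
   VplsFib, its methods, the string port names and the use of plain
   numbers for time are all hypothetical. Dropping a frame whose known
   destination lies on the arrival port is ordinary bridge filtering,
   which this document does not spell out. Delegation of flooding to
   u-PEs is not modeled.

```python
from typing import Dict, List, Tuple


class VplsFib:
    """Per-VPLS MAC table with learning, aging and flooding."""

    def __init__(self, ports: Dict[str, bool], aging_time: float) -> None:
        # ports maps a logical port name to True if it is a pseudowire
        # and False if it is a customer-facing port.
        self.ports = dict(ports)
        self.aging_time = aging_time  # the aging time T
        # MAC address -> (logical port, time last seen as a source)
        self.entries: Dict[str, Tuple[str, float]] = {}

    def _expire(self, now: float) -> None:
        # Flush every MAC address whose age exceeds T.
        stale = [mac for mac, (_, seen) in self.entries.items()
                 if now - seen > self.aging_time]
        for mac in stale:
            del self.entries[mac]

    def forward(self, src: str, dst: str, in_port: str,
                now: float) -> List[str]:
        """Learn src on in_port and return the output ports for dst."""
        self._expire(now)
        # Learning: (re)bind src to the arrival port and reset its age.
        # A MAC address seen on a new port P' simply overwrites P.
        self.entries[src] = (in_port, now)
        hit = self.entries.get(dst)
        if hit is not None:
            out_port = hit[0]
            # Assumption: filter frames already on the right segment.
            return [] if out_port == in_port else [out_port]
        # Unknown destination: flood on all other ports, except that a
        # frame received on a pseudowire is never sent to another
        # pseudowire (split horizon).
        from_pw = self.ports[in_port]
        return [port for port, is_pw in self.ports.items()
                if port != in_port and not (from_pw and is_pw)]


fib = VplsFib({"ce1": False, "ce2": False, "pw-b": True, "pw-c": True},
              aging_time=300)
print(fib.forward("S", "D", "ce1", now=0))    # ['ce2', 'pw-b', 'pw-c']
print(fib.forward("D", "S", "pw-b", now=10))  # ['ce1']
print(fib.forward("X", "Y", "pw-b", now=20))  # ['ce1', 'ce2']
```

   The broadcast MAC address never appears as a source address, so in
   this model it is never learned and frames sent to it always take the
   flooding path.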
   Ethernet frames with a destination multicast MAC address MAY be
   broadcast to all stations; a VE MAY also use certain techniques to
   restrict transmission of multicast frames to a smaller set of
   receivers, those that have indicated interest in the corresponding
   multicast group. Discussion of this is outside the scope of this
   document.

4.2.5. "Split Horizon" Forwarding

   When a PE capable of flooding (say PEx) receives a broadcast Ethernet
   frame, or one with an unknown destination MAC address, it must flood
   the frame. If the frame arrived from an attached CE, PEx must send a
   copy of the frame to every other attached CE, as well as to all other
   PEs participating in the VPLS. If, on the other hand, the frame
   arrived from another PE (say PEy), PEx must send a copy of the packet
   only to attached CEs. PEx MUST NOT send the frame to other PEs,
   since PEy would have already done so. This notion has been termed
   "split horizon" forwarding, and is a consequence of the PEs being
   logically fully meshed for VPLS.

   Split horizon forwarding rules apply to broadcast and multicast
   packets, as well as packets to an unknown MAC address.

4.2.6. Qualified and Unqualified Learning

   The key for normal Ethernet MAC learning is usually just the
   (6-octet) MAC address. This is called "unqualified learning".
   However, it is also possible that the key for learning includes the
   VLAN tag when present; this is called "qualified learning".

   In the case of VPLS, learning is done in the context of a VPLS
   instance, which typically corresponds to a customer. If the customer
   uses VLAN tags, one can make the same distinctions of qualified and
   unqualified learning. If the key for learning within a VPLS is just
   the MAC address, then this VPLS is operating under unqualified
   learning.
   If the key for learning is (customer VLAN tag + MAC address), then
   this VPLS is operating under qualified learning.

   Choosing between qualified and unqualified learning involves several
   factors, the most important of which is whether one wants a single
   global broadcast domain (unqualified), or a broadcast domain per VLAN
   (qualified). The latter makes flooding and broadcasting more
   efficient, but requires larger MAC tables. These considerations
   apply equally to normal Ethernet forwarding and to VPLS.

4.2.7. Class of Service

   In order to offer different Classes of Service within a VPLS, an
   implementation MAY choose to map 802.1p bits in a customer Ethernet
   frame with a VLAN tag to an appropriate setting of EXP bits in the
   pseudowire and/or tunnel label, allowing for differential treatment
   of VPLS frames in the packet-switched network.

   To be useful, an implementation SHOULD allow this mapping function to
   be different for each VPLS, as each VPLS customer may have their own
   view of the required behavior for a given setting of 802.1p bits.

5. Deployment Options

   In deploying a network that supports VPLS, the SP must decide what
   functions the VPLS-aware device closest to the customer (the VE)
   supports. The default case described in this document is that the VE
   is a PE. However, there are a number of reasons that the VE might be
   a device that does all the Layer 2 functions (such as MAC address
   learning and flooding), and a limited set of Layer 3 functions (such
   as communicating to its PE), but, for example, doesn't do
   full-fledged discovery and PE-to-PE signaling. Such a device is
   called a "u-PE".

   As both of these cases have benefits, one would like to be able to
   "mix and match" these scenarios. The signaling mechanism presented
   here allows this.
   For example, in a given provider network, one PE may be directly
   connected to CE devices; another may be connected to u-PEs that are
   connected to CEs; and a third may be connected directly to a customer
   over some interfaces and to u-PEs over others. All these PEs perform
   discovery and signaling in the same manner. How they do learning and
   forwarding depends on whether or not there is a u-PE; however, this
   is a local matter, and is not signaled. The details of the operation
   of a u-PE and its interactions with PEs and other u-PEs are beyond
   the scope of this document.

6. Security Considerations

   The focus in Virtual Private LAN Service is the privacy of data,
   i.e., that data in a VPLS is only distributed to other nodes in that
   VPLS and not to any external agent or other VPLS. Note that VPLS
   does not offer confidentiality, integrity, or authentication: VPLS
   packets are sent in the clear in the packet-switched network, and a
   man-in-the-middle can eavesdrop, and may be able to inject packets
   into the data stream. If security is desired, the PE-to-PE tunnels
   can be IPsec tunnels. For more security, the end systems in the VPLS
   sites can use appropriate means of encryption to secure their data
   even before it enters the Service Provider network.

   There are two aspects to achieving data privacy in a VPLS: securing
   the control plane, and protecting the forwarding path. Compromise of
   the control plane could result in a PE sending data belonging to some
   VPLS to another VPLS, or blackholing VPLS data, or even sending it to
   an eavesdropper, none of which is acceptable from a data privacy
   point of view.
   Since all control plane exchanges are via BGP, techniques such as in
   [2] help authenticate BGP messages, making it harder to spoof updates
   (which can be used to divert VPLS traffic to the wrong VPLS), or
   withdraws (denial of service attacks). In the multi-AS options (b)
   and (c), this also means protecting the inter-AS BGP sessions,
   between the ASBRs, the PEs or the Route Reflectors. One can also use
   the techniques described in Section 10 (b) and (c) of [6], both for
   the control plane and the data plane. Note that [2] will not help in
   keeping VPLS labels private -- knowing the labels, one can eavesdrop
   on VPLS traffic. However, this requires access to the data path
   within a Service Provider network.

   There can also be misconfiguration leading to unintentional
   connection of CEs in different VPLSs. This can be caused, for
   example, by associating the wrong Route Target with a VPLS instance.
   This problem, shared by [6], is for further study.

   Protecting the data plane requires ensuring that PE-to-PE tunnels are
   well-behaved (this is outside the scope of this document), and that
   VPLS labels are accepted only from valid interfaces. For a PE, valid
   interfaces comprise links from P routers. For an ASBR, a valid
   interface is a link from an ASBR in an AS that is part of a given
   VPLS. It is especially important in the case of multi-AS VPLSs that
   one accept VPLS packets only from valid interfaces.

   MPLS-in-IP and MPLS-in-GRE tunneling are specified in [3]. If it is
   desired to use such tunnels to carry VPLS packets, then the security
   considerations described in Section 8 of that document must be fully
   understood. Any implementation of VPLS that allows VPLS packets to
   be tunneled as described in that document MUST contain an
   implementation of IPsec that can be used as therein described.
   If the tunnel is not secured by IPsec, then the technique of IP
   address filtering at the border routers, described in Section 8.2 of
   that document, is the only means of ensuring that a packet that exits
   the tunnel at a particular egress PE was actually placed in the
   tunnel by the proper tunnel head node (i.e., that the packet does not
   have a spoofed source address). Since border routers frequently
   filter only source addresses, packet filtering may not be effective
   unless the egress PE can check the IP source address of any tunneled
   packet it receives, and compare it to a list of IP addresses that are
   valid tunnel head addresses. Any implementation that allows
   MPLS-in-IP and/or MPLS-in-GRE tunneling to be used without IPsec MUST
   allow the egress PE to validate in this manner the IP source address
   of any tunneled packet that it receives.

7. IANA Considerations

   IANA is asked to allocate an AFI for L2VPN information (suggested
   value: 25). This should be the same as the AFI requested by [11].

   IANA is asked to allocate an extended community value for the Layer2
   Info Extended Community (suggested value: 0x800A).

8. References

8.1. Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Heffernan, A., "Protection of BGP Sessions via the TCP MD5
        Signature Option", RFC 2385, August 1998.

   [3]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating MPLS in
        IP or Generic Routing Encapsulation (GRE)", RFC 4023,
        March 2005.

   [4]  Bates, T., "Multiprotocol Extensions for BGP-4",
        draft-ietf-idr-rfc2858bis-10 (work in progress), March 2006.

   [5]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
        Communities Attribute", RFC 4360, February 2006.

   [6]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks
        (VPNs)", RFC 4364, February 2006.

   [7]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
        "Encapsulation Methods for Transport of Ethernet over MPLS
        Networks", RFC 4448, April 2006.

8.2. Informative References

   [8]  Bates, T., Chandra, R., and E. Chen, "BGP Route Reflection - An
        Alternative to Full Mesh IBGP", RFC 2796, April 2000.

   [9]  Andersson, L. and E. Rosen, "Framework for Layer 2 Virtual
        Private Networks (L2VPNs)", draft-ietf-l2vpn-l2-framework-05
        (work in progress), June 2004.

   [10] Lasserre, M. and V. Kompella, "Virtual Private LAN Services
        Using LDP", draft-ietf-l2vpn-vpls-ldp-09 (work in progress),
        June 2006.

   [11] Ould-Brahim, H., "Using BGP as an Auto-Discovery Mechanism for
        VR-based Layer-3 VPNs", draft-ietf-l3vpn-bgpvpn-auto-07 (work
        in progress), April 2006.

   [12] Marques, P., "Constrained VPN Route Distribution",
        draft-ietf-l3vpn-rt-constrain-02 (work in progress), June 2005.

   [13] Martini, L., "Pseudowire Setup and Maintenance using the Label
        Distribution Protocol", draft-ietf-pwe3-control-protocol-17
        (work in progress), June 2005.

   [14] Kompella, K., "Layer 2 VPNs Over Tunnels",
        draft-kompella-l2vpn-l2vpn-01 (work in progress), January 2006.

   [15] Institute of Electrical and Electronics Engineers, "Information
        technology - Telecommunications and information exchange
        between systems - Local and metropolitan area networks - Common
        specifications - Part 3: Media Access Control (MAC) Bridges:
        Revision. This is a revision of ISO/IEC 10038: 1993,
        802.1j-1992 and 802.6k-1992. It incorporates P802.11c, P802.1p
        and P802.12e. ISO/IEC 15802-3: 1998.", IEEE Standard 802.1D,
        July 1998.

Appendix A. Contributors

   The following contributed to this document:

      Javier Achirica, Telefonica
      Loa Andersson, Acreo
      Chaitanya Kodeboyina, Juniper
      Giles Heron, Tellabs
      Sunil Khandekar, Alcatel
      Vach Kompella, Alcatel
      Marc Lasserre, Riverstone
      Pierre Lin
      Pascal Menezes
      Ashwin Moranganti, Appian
      Hamid Ould-Brahim, Nortel
      Seo Yeong-il, Korea Tel

Appendix B. Acknowledgements

   Thanks to Joe Regan and Alfred Nothaft for their contributions. Many
   thanks too to Eric Ji, Chaitanya Kodeboyina, Mike Loomis and Elwyn
   Davies for their detailed reviews.

Authors' Addresses

   Kireeti Kompella (editor)
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089
   US

   Email: kireeti@juniper.net

   Yakov Rekhter (editor)
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089
   US

   Email: yakov@juniper.net

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.