idnits 2.17.1 draft-ietf-l3vpn-rfc2547bis-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There is 1 instance of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 788: '...s routes, the PE MUST filter out all r...' RFC 2119 keyword, line 791: '... MUST remove the RT before convertin...' RFC 2119 keyword, line 875: '...ling technology, all such systems MUST...' RFC 2119 keyword, line 877: '... MUST be supported on interfaces which are neither LC-ATM [MPLS-ATM]...' RFC 2119 keyword, line 878: '... nor LC-FR [MPLS-FR] interfaces, and Downstream on Demand mode MUST be...' (1 more instance...) -- The abstract seems to indicate that this document obsoletes RFC2547, but the header doesn't have an 'Obsoletes:' line to match this. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 1144 has weird spacing: '...is then sent ...' 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'MPLS-in-IP-or-GRE' is mentioned on line 1158, but not defined == Unused Reference: 'IPSEC' is defined on line 2032, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2858 (ref. 'BGP-MP') (Obsoleted by RFC 4760) == Outdated reference: A later version (-09) exists of draft-ietf-idr-bgp-ext-communities-05 ** Obsolete normative reference: RFC 3107 (ref. 'MPLS-BGP') (Obsoleted by RFC 8277) == Outdated reference: A later version (-13) exists of draft-ietf-idr-as4bytes-06 == Outdated reference: A later version (-17) exists of draft-ietf-idr-route-filter-08 -- Obsolete informational reference (is this intentional?): RFC 2796 (ref. 'BGP-RR') (Obsoleted by RFC 4456) -- Obsolete informational reference (is this intentional?): RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC 4301) == Outdated reference: A later version (-08) exists of draft-ietf-mpls-in-ip-or-gre-00 -- Obsolete informational reference (is this intentional?): RFC 3036 (ref. 'MPLS-LDP') (Obsoleted by RFC 5036) == Outdated reference: A later version (-15) exists of draft-rosen-vpn-mcast-05 Summary: 5 errors (**), 0 flaws (~~), 11 warnings (==), 6 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Eric C. Rosen 3 Internet Draft Cisco Systems, Inc. 4 Expiration Date: November 2003 5 Yakov Rekhter 6 Juniper Networks, Inc. 8 Tony Bogovic Stephen John Brannon 9 Ravichander Vaidyanathan Swisscom AG 10 Telcordia Technologies 12 Marco Carugi Christopher J. Chase 13 Nortel Networks Luyuan Fang 14 ATT 16 Ting Wo Chung Jeremy De Clercq 17 Bell Nexxia Alcatel 19 Eric Dean Paul Hitchen 20 Global One Adrian Smith 21 BT 23 Manoj Leelanivas Dave Marshall 24 Juniper Networks, Inc. Worldcom 26 Luca Martini Monique Jeanne Morrow 27 Level 3 Communications, LLC Cisco Systems, Inc. 29 Vijay Srinivasan Alain Vedrenne 30 Cosine Communications Equant 32 May 2003 34 BGP/MPLS IP VPNs 36 draft-ietf-l3vpn-rfc2547bis-00.txt 38 Status of this Memo 40 This document is an Internet-Draft and is in full conformance with 41 all provisions of Section 10 of RFC2026. 43 Internet-Drafts are working documents of the Internet Engineering 44 Task Force (IETF), its areas, and its working groups. Note that 45 other groups may also distribute working documents as Internet- 46 Drafts. 48 Internet-Drafts are draft documents valid for a maximum of six months 49 and may be updated, replaced, or obsoleted by other documents at any 50 time. It is inappropriate to use Internet-Drafts as reference 51 material or to cite them other than as "work in progress." 53 The list of current Internet-Drafts can be accessed at 54 http://www.ietf.org/ietf/1id-abstracts.txt. 56 The list of Internet-Draft Shadow Directories can be accessed at 57 http://www.ietf.org/shadow.html. 59 Copyright Notice 61 Copyright (C) The Internet Society (2000). All Rights Reserved. 63 Abstract 65 This document describes a method by which a Service Provider may use 66 an IP backbone to provide IP VPNs (Virtual Private Networks) for its 67 customers. 
This method uses a "peer model", in which the customers' 68 edge routers ("CE routers") send their routes to the Service 69 Provider's edge routers ("PE routers"). BGP is then used by the 70 Service Provider to exchange the routes of a particular VPN among the 71 PE routers that are attached to that VPN. This is done in a way 72 which ensures that routes from different VPNs remain distinct and 73 separate, even if two VPNs have an overlapping address space. The PE 74 routers distribute, to the CE routers in a particular VPN, the routes 75 from the other CE routers in that VPN. The CE routers do not peer 76 with each other, hence there is no "overlay" visible to the VPN's 77 routing algorithm. The term "IP" in "IP VPN" is used to indicate 78 that the PE receives IP datagrams from the CE, examines their IP 79 headers, and routes them accordingly. 81 Each route within a VPN is assigned an MPLS label; when BGP 82 distributes a VPN route, it also distributes an MPLS label for that 83 route. Before a customer data packet travels across the Service 84 Provider's backbone, it is encapsulated with the MPLS label that 85 corresponds, in the customer's VPN, to the route that is the best 86 match to the packet's destination address. This MPLS packet is 87 further encapsulated (e.g., with another MPLS label, or with an IP or 88 GRE tunnel header) so that it gets tunneled across the backbone to 89 the proper PE router. Thus the backbone core routers do not need to 90 know the VPN routes. 92 The primary goal of this method is to support the case in which a 93 client obtains IP backbone services from a Service Provider or 94 Service Providers with which it maintains contractual relationships. 95 The client may be an enterprise, a group of enterprises which need an 96 extranet, an Internet Service Provider, an application service 97 provider, another VPN Service Provider which uses this same method to 98 offer VPNs to clients of its own, etc. 
The method makes it very 99 simple for the client to use the backbone services. It is also very 100 scalable and flexible for the Service Provider, and allows the 101 Service Provider to add value. 103 This document obsoletes RFC 2547. 105 Table of Contents 107 1 Introduction ....................................... 5 108 1.1 Virtual Private Networks ........................... 5 109 1.2 Customer Edge and Provider Edge .................... 6 110 1.3 VPNs with Overlapping Address Spaces ............... 8 111 1.4 VPNs with Different Routes to the Same System ...... 8 112 1.5 SP Backbone Routers ................................ 8 113 1.6 Security ........................................... 9 114 2 Sites and CEs ...................................... 9 115 3 VRFs: Multiple Forwarding Tables in PEs ............ 10 116 3.1 VRFs and Attachment Circuits ....................... 10 117 3.2 Associating IP Packets with VRFs ................... 12 118 3.3 Populating the VRFs ................................ 13 119 4 VPN Route Distribution via BGP ..................... 14 120 4.1 The VPN-IPv4 Address Family ........................ 14 121 4.2 Encoding of Route Distinguishers ................... 15 122 4.3 Controlling Route Distribution ..................... 16 123 4.3.1 The Route Target Attribute ......................... 17 124 4.3.2 Route Distribution Among PEs by BGP ................ 19 125 4.3.3 Use of Route Reflectors ............................ 21 126 4.3.4 How VPN-IPv4 NLRI is Carried in BGP ................ 23 127 4.3.5 Building VPNs using Route Targets .................. 24 128 4.3.6 Route Distribution Among VRFs in a Single PE ....... 24 129 5 Forwarding ......................................... 25 130 6 Maintaining Proper Isolation of VPNs ............... 27 131 7 How PEs Learn Routes from CEs ...................... 28 132 8 How CEs learn Routes from PEs ...................... 31 133 9 Carriers' Carriers ................................. 
32 134 10 Multi-AS Backbones ................................. 33 135 11 Accessing the Internet from a VPN .................. 35 136 12 Management VPNs .................................... 37 137 13 Security ........................................... 38 138 13.1 Data Plane ......................................... 38 139 13.2 Control Plane ...................................... 40 140 13.3 Security of P and PE devices ....................... 40 141 14 Quality of Service ................................. 40 142 15 Scalability ........................................ 41 143 16 Intellectual Property Considerations ............... 42 144 17 Acknowledgments .................................... 42 145 18 Authors' Addresses ................................. 42 146 19 Normative References ............................... 45 147 20 Informational References ........................... 46 148 21 Full Copyright Statement ........................... 47 150 1. Introduction 152 1.1. Virtual Private Networks 154 Consider a set of "sites" that are attached to a common network that 155 we call "the backbone". Now apply some policy to create a number of 156 subsets of that set, and impose the following rule: two sites may 157 have IP interconnectivity over that backbone only if at least one of 158 these subsets contains them both. 160 These subsets are "Virtual Private Networks" (VPNs). Two sites have 161 IP connectivity over the common backbone only if there is some VPN 162 which contains them both. Two sites which have no VPN in common have 163 no connectivity over that backbone. 165 If all the sites in a VPN are owned by the same enterprise, the VPN 166 may be thought of as a corporate "intranet". If the various sites in 167 a VPN are owned by different enterprises, the VPN may be thought of 168 as an "extranet". A site can be in more than one VPN; e.g., in an 169 intranet and in several extranets. 
In general, when we use the term 170 VPN we will not be distinguishing between intranets and extranets. 172 We refer to the owners of the sites as the "customers". We refer to 173 the owners/operators of the backbone as the "Service Providers" 174 (SPs). The customers obtain "VPN service" from the SPs. 176 A customer may be a single enterprise, a set of enterprises, an 177 Internet Service Provider, an Application Service Provider, another 178 SP which offers the same kind of VPN service to its own customers, 179 etc. 181 The policies that determine whether a particular collection of sites 182 is a VPN are the policies of the customers. Some customers will want 183 the implementation of these policies to be entirely the 184 responsibility of the SP. Other customers may want to share with the 185 SP the responsibility for implementing these policies. This document 186 specifies mechanisms that can be used to implement these policies. 187 The mechanisms we describe are general enough to allow these policies 188 to be implemented either by the SP alone, or by a VPN customer 189 together with the SP. Most of the discussion is focused on the 190 former case, however. 192 The mechanisms discussed in this document allow the implementation of 193 a wide range of policies. For example, within a given VPN, one can 194 allow every site to have a direct route to every other site ("full 195 mesh"). Alternatively, one can force traffic between certain pairs 196 of sites to be routed via a third site. This can be useful, e.g., if 197 it is desired that traffic between a pair of sites be passed through 198 a firewall, and the firewall is located at the third site. 200 In this document, we restrict our discussion to the case in which the 201 customer is explicitly purchasing VPN service from an SP, or from a 202 set of SPs that have agreed to cooperate to provide the VPN service. 
203 That is, the customer is not merely purchasing internet access from 204 an SP, and the VPN traffic does not pass through a random collection 205 of interconnected SP networks. 207 We also restrict our discussion to the case in which the backbone 208 provides an IP service to the customer, rather than, e.g., a Frame 209 Relay, ATM, ethernet, HDLC, PPP, etc., service. The customer may 210 attach to the backbone via one of these (or other) layer 2 services, 211 but the layer 2 service is terminated at the "edge" of the backbone, 212 where the customer's IP datagrams are removed from any layer 2 213 encapsulation. 215 In the rest of this introduction, we specify some properties which 216 VPNs should have. The remainder of this document specifies a set of 217 mechanisms that can be deployed to provide a VPN model which has all 218 these properties. This section also introduces some of the technical 219 terminology used in the remainder of the document. 221 1.2. Customer Edge and Provider Edge 223 Routers can be attached to each other, or to end systems, in a 224 variety of different ways: PPP connections, ATM VCs, Frame Relay 225 DLCIs, ethernet interfaces, VLANs on ethernet interfaces, GRE 226 tunnels, L2TP tunnels, IPsec tunnels, etc. We will use the term 227 "attachment circuit" to refer generally to some such means of 228 attaching to a router. An attachment circuit may be the sort of 229 connection that is usually thought of as a "data link", or it may be 230 a tunnel of some sort; what matters is that it be possible for two 231 devices to be network layer peers over the attachment circuit. 233 Each VPN site must contain one or more Customer Edge (CE) devices. 234 Each CE device is attached, via some sort of attachment circuit, to 235 one or more Provider Edge (PE) routers. 237 Routers in the SP's network which do not attach to CE devices are 238 known as "P routers". 240 CE devices can be hosts or routers. 
In a typical case, a site 241 contains one or more routers, some of which are attached to PE 242 routers. The site routers which attach to the PE routers would then 243 be the CE devices, or "CE routers". However, there is nothing to 244 prevent a non-routing host from attaching directly to a PE router, in 245 which case the host would be a CE device. 247 Sometimes, what is physically attached to a PE router is a layer 2 248 switch. In this case, we do NOT say that the layer 2 switch is a CE 249 device. Rather, the CE devices are the hosts and routers that 250 communicate with the PE router through the layer 2 switch; the layer 251 2 infrastructure is transparent. If the layer 2 infrastructure 252 provides a multipoint service, then multiple CE devices can be 253 attached to the PE router over the same attachment circuit. 255 CE devices are logically part of a customer's VPN. PE and P routers 256 are logically part of the SP's network. 258 The attachment circuit over which a packet travels when going from CE 259 to PE is known as that packet's "ingress attachment circuit", and the 260 PE as the packet's "ingress PE". The attachment circuit over which a 261 packet travels when going from PE to CE is known as that packet's 262 "egress attachment circuit", and the PE as the packet's "egress PE". 264 We will say that a PE router is attached to a particular VPN if it is 265 attached to a CE device which is in a site of that VPN. Similarly, 266 we will say that a PE router is attached to a particular site if it 267 is attached to a CE device which is in that site. 269 When the CE device is a router, it is a routing peer of the PE(s) to 270 which it is attached, but it is NOT a routing peer of CE routers at 271 other sites. Routers at different sites do not directly exchange 272 routing information with each other; in fact, they do not even need 273 to know of each other at all. 
As a consequence, the customer has no 274 backbone or "virtual backbone" to manage, and does not have to deal 275 with any inter-site routing issues. In other words, in the scheme 276 described in this document, a VPN is NOT an "overlay" on top of the 277 SP's network. 279 With respect to the management of the edge devices, clear 280 administrative boundaries are maintained between the SP and its 281 customers. Customers are not required to access the PE or P routers 282 for management purposes, nor is the SP required to access the CE 283 devices for management purposes. 285 1.3. VPNs with Overlapping Address Spaces 287 If two VPNs have no sites in common, then they may have overlapping 288 address spaces. That is, a given address might be used in VPN V1 as 289 the address of system S1, but in VPN V2 as the address of a 290 completely different system S2. This is a common situation when the 291 VPNs each use an RFC1918 private address space. Of course, within 292 each VPN, each address must be unambiguous. 294 Even two VPNs which do have sites in common may have overlapping 295 address spaces, as long as there is no need for any communication 296 between systems with such addresses and systems in the common sites. 298 1.4. VPNs with Different Routes to the Same System 300 Although a site may be in multiple VPNs, it is not necessarily the 301 case that the route to a given system at that site should be the same 302 in all the VPNs. Suppose, for example, we have an intranet 303 consisting of sites A, B, and C, and an extranet consisting of A, B, 304 C, and the "foreign" site D. Suppose that at site A there is a 305 server, and we want clients from B, C, or D to be able to use that 306 server. Suppose also that at site B there is a firewall. We want 307 all the traffic from site D to the server to pass through the 308 firewall, so that traffic from the extranet can be access controlled. 
309 However, we don't want traffic from C to pass through the firewall on 310 the way to the server, since this is intranet traffic. 312 It is possible to set up two routes to the server. One route, used 313 by sites B and C, takes the traffic directly to site A. The second 314 route, used by site D, takes the traffic instead to the firewall at 315 site B. If the firewall allows the traffic to pass, it then appears 316 to be traffic coming from site B, and follows the route to site A. 318 1.5. SP Backbone Routers 320 The SP's backbone consists of the PE routers, as well as other 321 routers ("P routers") which do not attach to CE devices. 323 If every router in an SP's backbone had to maintain routing 324 information for all the VPNs supported by the SP, there would be 325 severe scalability problems; the number of sites that could be 326 supported would be limited by the amount of routing information that 327 could be held in a single router. It is important therefore that the 328 routing information about a particular VPN only needs to be present 329 in the PE routers which attach to that VPN. In particular, the P 330 routers do not need to have ANY per-VPN routing information 331 whatsoever. (This condition may need to be relaxed somewhat when 332 multicast routing is considered. This is not considered further in 333 this paper, but is examined in [VPN-MCAST].) 335 So just as the VPN owners do not have a backbone or "virtual 336 backbone" to administer, the SPs themselves do not have a separate 337 backbone or "virtual backbone" to administer for each VPN. Site-to- 338 site routing in the backbone is optimal (within the constraints of 339 the policies used to form the VPNs), and is not constrained in any 340 way by an artificial "virtual topology" of tunnels. 342 Section 10 discusses some of the special issues that arise when the 343 backbone spans several service providers. 345 1.6. 
Security 347 VPNs of the sort being discussed here, even without making use of 348 cryptographic security measures, are intended to provide a level of 349 security equivalent to that obtainable when a layer 2 backbone (e.g., 350 Frame Relay) is used. That is, in the absence of misconfiguration or 351 deliberate interconnection of different VPNs, it is not possible for 352 systems in one VPN to gain access to systems in another VPN. Of 353 course the methods described herein do not by themselves encrypt the 354 data for privacy, nor do they provide a way to determine whether data 355 has been tampered with en route. If this is desired, cryptographic 356 measures must be applied in addition. (See, e.g., [MPLS/BGP-IPsec].) 357 Security is discussed in more detail in section 13. 359 2. Sites and CEs 361 From the perspective of a particular backbone network, a set of IP 362 systems may be regarded as a "site" if those systems have mutual IP 363 interconnectivity that doesn't require use of the backbone. In 364 general, a site will consist of a set of systems which are in 365 geographic proximity. However, this is not universally true. If two 366 geographic locations are connected via a leased line, over which OSPF 367 is running, and if that line is the preferred way of communicating 368 between the two locations, then the two locations can be regarded as 369 a single site, even if each location has its own CE router. (This 370 notion of "site" is topological, rather than geographical. If the 371 leased line goes down, or otherwise ceases to be the preferred route, 372 but the two geographic locations can continue to communicate by using 373 the VPN backbone, then one site has become two.) 375 A CE device is always regarded as being in a single site (though, as 376 we shall see in section 3.2, a site may consist of multiple "virtual 377 sites"). A site, however, may belong to multiple VPNs. 
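Since a site may belong to multiple VPNs, the connectivity rule of section 1.1 (two sites have IP connectivity over the backbone only if some VPN contains them both) amounts to a set intersection. The following is a minimal sketch of that rule only, not of any mechanism specified in this document. The sites A through D and the intranet/extranet grouping follow the example in section 1.4; the VPN named "other", the sites E and F, and the function names are hypothetical.

```python
# Hypothetical VPN membership table: VPN name -> set of member sites.
# Sites A-D mirror the intranet/extranet example of section 1.4;
# "other", E and F are made up for illustration.
vpns = {
    "intranet": {"A", "B", "C"},
    "extranet": {"A", "B", "C", "D"},
    "other": {"E", "F"},
}


def vpns_of(site, vpns):
    """Return the names of all VPNs that contain the given site."""
    return {name for name, members in vpns.items() if site in members}


def may_communicate(site1, site2, vpns):
    """Two sites may have connectivity only if they share at least one VPN."""
    return bool(vpns_of(site1, vpns) & vpns_of(site2, vpns))
```

With this table, may_communicate("A", "D", vpns) holds because both sites are in the extranet, while may_communicate("D", "E", vpns) does not, since D and E have no VPN in common.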
379 A PE router may attach to CE devices from any number of different 380 sites, whether those CE devices are in the same or in different VPNs. 381 A CE device may, for robustness, attach to multiple PE routers, of 382 the same or of different service providers. If the CE device is a 383 router, the PE router and the CE router will appear as router 384 adjacencies to each other. 386 While we speak mostly of "sites" as being the basic unit of 387 interconnection, nothing here prevents a finer degree of granularity 388 in the control of interconnectivity. For example, certain systems at 389 a site may be members of an intranet as well as members of one or 390 more extranets, while other systems at the same site may be 391 restricted to being members of the intranet only. However, this 392 might require that the site have two attachment circuits to the 393 backbone, one for the intranet and one for the extranet; it might 394 further require that firewall functionality be applied on the 395 extranet attachment circuit. 397 3. VRFs: Multiple Forwarding Tables in PEs 399 Each PE router maintains a number of separate forwarding tables. One 400 of the forwarding tables is the "default forwarding table". The 401 others are "VPN Routing and Forwarding tables", or "VRFs". 403 3.1. VRFs and Attachment Circuits 405 Every PE-CE attachment circuit is associated, by configuration, with 406 one or more VRFs. An attachment circuit which is associated with a 407 VRF is known as a "VRF attachment circuit". 409 In the simplest and most typical case, a PE-CE attachment 410 circuit is associated with exactly one VRF. When an IP packet is 411 received over a particular attachment circuit, its destination IP 412 address is looked up in the associated VRF. The result of that 413 lookup determines how to route the packet. The VRF used by a 414 packet's ingress PE for routing a particular packet is known as the 415 packet's "ingress VRF". 
(There is also the notion of a packet's 416 "egress VRF", located at the packet's egress PE; this is discussed in 417 section 5.) 419 If an IP packet arrives over an attachment circuit which is not 420 associated with any VRF, the packet's destination address is looked 421 up in the default forwarding table, and the packet is routed 422 accordingly. Packets forwarded according to the default forwarding 423 table include packets from neighboring P or PE routers, as well as 424 packets from customer-facing attachment circuits that have not been 425 associated with VRFs. 427 Intuitively, one can think of the default forwarding table as 428 containing "public routes", and of the VRFs as containing "private 429 routes". One can similarly think of VRF attachment circuits as being 430 "private", and of non-VRF attachment circuits as being "public". 432 If a particular VRF attachment circuit connects site S to a PE 433 router, then connectivity from S (via that attachment circuit) can be 434 restricted by controlling the set of routes which get entered in the 435 corresponding VRF. The set of routes in that VRF should be limited 436 to the set of routes leading to sites which have at least one VPN in 437 common with S. Then a packet sent from S over a VRF attachment 438 circuit can only be routed by the PE to another site S' if S' is in 439 one of the same VPNs as S. That is, communication (via PE routers) 440 is prevented between any pair of VPN sites which have no VPN in 441 common. Communication between VPN sites and non-VPN sites is 442 prevented by keeping the routes to the VPN sites out of the default 443 forwarding table. 445 If there are multiple attachment circuits leading from S to one or 446 more PE routers, then there might be multiple VRFs that could be used 447 to route traffic from S. To properly restrict S's connectivity, the 448 same set of routes would have to exist in all the VRFs. 
449 Alternatively, one could impose different connectivity restrictions 450 over different attachment circuits from S. In that case, some of the 451 VRFs associated with attachment circuits from S would contain 452 different sets of routes than some of the others. 454 We allow the case in which a single attachment circuit is associated 455 with a set of VRFs, rather than with a single VRF. This can be 456 useful if it is desired to divide a single VPN into several "sub- 457 VPNs", each with different connectivity restrictions, where some 458 characteristic of the customer packets is used to select from among 459 the sub-VPNs. For simplicity though, we will usually speak of an 460 attachment circuit as being associated with a single VRF. 462 3.2. Associating IP Packets with VRFs 464 When a PE router receives a packet from a CE device, it must 465 determine the attachment circuit over which the packet arrived, as 466 this determines in turn the VRF (or set of VRFs) that can be used for 467 forwarding that packet. In general, to determine the attachment 468 circuit over which a packet arrived, a PE router takes note of the 469 physical interface over which the packet arrived, and possibly also 470 takes note of some aspect of the packet's layer 2 header. For 471 example, if a packet's ingress attachment circuit is a frame relay 472 VC, the identity of the attachment circuit can be determined from the 473 physical frame relay interface over which the packet arrived, 474 together with the DLCI field in the packet's frame relay header. 476 Although the PE's conclusion that a particular packet arrived on a 477 particular attachment circuit may be partially determined by the 478 packet's layer 2 header, it must be impossible for a customer, by 479 writing the header fields, to fool the SP into thinking that a packet 480 which was received over one attachment circuit really arrived over a 481 different one. 
In the example above, although the attachment circuit 482 is determined partially by inspection of the DLCI field in the frame 483 relay header, this field cannot be set freely by the customer. 484 Rather, it must be set to a value specified by the SP, or else the 485 packet cannot arrive at the PE router. 487 In some cases, a particular site may be divided by the customer into 488 several "virtual sites". The SP may designate a particular set of 489 VRFs to be used for routing packets from that site, and may allow the 490 customer to set some characteristic of the packet which is then used 491 for choosing a particular VRF from the set. 493 For example, each virtual site might be realized as a VLAN. The SP 494 and the customer could agree that on packets arriving from a 495 particular CE, certain VLAN values would be used to identify certain 496 VRFs. Of course, packets from that CE would be discarded by the PE 497 if they carry VLAN tag values that are not in the agreed-upon set. 498 Another way to accomplish this is to use IP source addresses. In this 499 case the PE uses the IP source address in a packet received from the CE, 500 along with the interface over which the packet is received, to assign 501 the packet to a particular VRF. Again, the customer would only be 502 able to select from among the particular set of VRFs which that 503 customer is allowed to use. 505 If it is desired to have a particular host be in multiple virtual 506 sites, then that host must determine, for each packet, which virtual 507 site the packet is associated with. It can do this, e.g., by sending 508 packets from different virtual sites on different VLANs, or out 509 different network interfaces. 511 3.3. Populating the VRFs 513 With what set of routes are the VRFs populated? 515 As an example, let PE1, PE2, and PE3 be three PE routers, and let 516 CE1, CE2, and CE3 be three CE routers. Suppose that PE1 learns, from 517 CE1, the routes which are reachable at CE1's site. 
If PE2 and PE3 518 are attached respectively to CE2 and CE3, and there is some VPN V 519 containing CE1, CE2, and CE3, then PE1 uses BGP to distribute to PE2 520 and PE3 the routes which it has learned from CE1. PE2 and PE3 use 521 these routes to populate the VRFs which they associate respectively 522 with the sites of CE2 and CE3. Routes from sites which are not in 523 VPN V do not appear in these VRFs, which means that packets from CE2 524 or CE3 cannot be sent to sites which are not in VPN V. 526 When we speak of a PE "learning" routes from a CE, we are not 527 presupposing any particular learning technique. The PE may learn 528 routes by means of a dynamic routing algorithm, but it may also 529 "learn" routes by having those routes configured (i.e., static 530 routing). (In this case, to say that the PE "learned" the routes 531 from the CE is perhaps to exercise a bit of poetic license.) 533 PEs also need to learn, from other PEs, the routes which belong to a 534 given VPN. The procedures to be used for populating the VRFs with 535 the proper sets of routes are specified in section 4. 537 If there are multiple attachment circuits leading from a particular 538 PE router to a particular site, they might all be mapped to the same 539 forwarding table. But if policy dictates, they could be mapped to 540 different forwarding tables. For instance, the policy might be that 541 a particular attachment circuit from a site is used only for intranet 542 traffic, while another attachment circuit from that site is used only 543 for extranet traffic. (Perhaps, e.g., the CE attached to the 544 extranet attachment circuit is a firewall, while the CE attached to 545 the intranet attachment circuit is not.) In this case, the two 546 attachment circuits would be associated with different VRFs. 
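The association between attachment circuits and forwarding tables described in section 3 can be sketched as follows. This is an illustrative model only, under several assumptions: the class and method names, the attachment circuit identifiers, the prefixes, and the string "next hops" are all hypothetical, and a real PE would not use a linear scan for longest-prefix matching. A packet arriving on a VRF attachment circuit is looked up in the associated VRF; a packet arriving on any other circuit is looked up in the default forwarding table.

```python
import ipaddress


class ForwardingTable:
    """A toy forwarding table doing longest-prefix match by linear scan."""

    def __init__(self):
        self.routes = {}  # ipaddress network -> next hop (a plain string here)

    def add(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the packet cannot be forwarded
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


class PE:
    """A toy PE holding one default table and any number of VRFs."""

    def __init__(self):
        self.default_table = ForwardingTable()
        self.vrfs = {}            # VRF name -> ForwardingTable
        self.circuit_to_vrf = {}  # attachment circuit id -> VRF name

    def add_vrf(self, name):
        self.vrfs[name] = ForwardingTable()
        return self.vrfs[name]

    def attach(self, circuit, vrf_name):
        self.circuit_to_vrf[circuit] = vrf_name

    def route(self, ingress_circuit, dst):
        # The ingress attachment circuit selects the table; circuits with
        # no VRF association fall back to the default forwarding table.
        vrf_name = self.circuit_to_vrf.get(ingress_circuit)
        table = self.vrfs[vrf_name] if vrf_name is not None else self.default_table
        return table.lookup(dst)


# Two circuits from the same site mapped to different VRFs, as in the
# intranet/extranet policy above. All names and prefixes are made up.
pe = PE()
pe.add_vrf("intranet").add("10.1.0.0/16", "site-A")
pe.add_vrf("extranet").add("10.1.0.0/16", "firewall-at-B")
pe.default_table.add("192.0.2.0/24", "internet")
pe.attach("ac-intranet", "intranet")
pe.attach("ac-extranet", "extranet")
```

Here the same destination 10.1.2.3 resolves to a different next hop depending on the ingress circuit, and it is unreachable from the unassociated circuit "ac-public" because the VPN routes are kept out of the default forwarding table.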
548 Note that if two attachment circuits are associated with the same
549 VRF, then packets which the PE receives over one of them will be able
550 to reach exactly the same set of destinations as packets which the PE
551 receives over the other. So two attachment circuits cannot be
552 associated with the same VRF unless each CE is in the exact same set
553 of VPNs as is the other.

555 If an attachment circuit leads to a site which is in multiple VPNs,
556 the attachment circuit may still be associated with a single VRF, in
557 which case the VRF will contain routes from the full set of VPNs of
558 which the site is a member.

560 4. VPN Route Distribution via BGP

562 PE routers use BGP to distribute VPN routes to each other (more
563 accurately, to cause VPN routes to be distributed to each other).

565 We allow each VPN to have its own address space, which means that a
566 given address may denote different systems in different VPNs. If two
567 routes, to the same IP address prefix, are actually routes to
568 different systems, it is important to ensure that BGP not treat them
569 as comparable. Otherwise BGP might choose to install only one of
570 them, making the other system unreachable. Further, we must ensure
571 that POLICY is used to determine which packets get sent on which
572 routes; given that several such routes are installed by BGP, only one
573 such must appear in any particular VRF.

575 We meet these goals by the use of a new address family, as specified
576 below.

578 4.1. The VPN-IPv4 Address Family

580 The BGP Multiprotocol Extensions [BGP-MP] allow BGP to carry routes
581 from multiple "address families". We introduce the notion of the
582 "VPN-IPv4 address family". A VPN-IPv4 address is a 12-byte quantity,
583 beginning with an 8-byte "Route Distinguisher (RD)" and ending with a
584 4-byte IPv4 address. If several VPNs use the same IPv4 address
585 prefix, the PEs translate these into unique VPN-IPv4 address
586 prefixes.
This ensures that if the same address is used in several
587 different VPNs, it is possible for BGP to carry several completely
588 different routes to that address, one for each VPN.

590 Since VPN-IPv4 addresses and IPv4 addresses are different address
591 families, BGP never treats them as comparable addresses.

593 An RD is simply a number, and it does not contain any inherent
594 information; it does not identify the origin of the route or the set
595 of VPNs to which the route is to be distributed. The purpose of the
596 RD is solely to allow one to create distinct routes to a common IPv4
597 address prefix. Other means are used to determine where to
598 redistribute the route (see section 4.3).

600 The RD can also be used to create multiple different routes to the
601 very same system. We have already discussed a situation in which the
602 route to a particular server should be different for intranet traffic
603 than for extranet traffic. This can be achieved by creating two
604 different VPN-IPv4 routes that have the same IPv4 part, but different
605 RDs. This allows BGP to install multiple different routes to the
606 same system, and allows policy to be used (see section 4.3.5) to
607 decide which packets use which route.

609 The RDs are structured so that every service provider can administer
610 its own "numbering space" (i.e., can make its own assignments of
611 RDs), without conflicting with the RD assignments made by any other
612 service provider. An RD consists of three fields: a two-byte type
613 field, an administrator field, and an assigned number field. The
614 value of the type field determines the lengths of the other two
615 fields, as well as the semantics of the administrator field. The
616 administrator field identifies an assigned number authority, and the
617 assigned number field contains a number which has been assigned, by
618 the identified authority, for a particular purpose.
For example, one
619 could have an RD whose administrator field contains an Autonomous
620 System number (ASN), and whose (4-byte) number field contains a
621 number assigned by the SP to whom that ASN belongs (having been
622 assigned to that SP by the appropriate authority).

624 RDs are given this structure in order to ensure that an SP which
625 provides VPN backbone service can always create a unique RD when it
626 needs to do so. However, the structure is not meaningful to BGP; when
627 BGP compares two such address prefixes, it ignores the structure
628 entirely.

630 A PE needs to be configured such that routes which lead to a particular
631 CE become associated with a particular RD. The configuration may
632 cause all routes leading to the same CE to be associated with the
633 same RD, or it may cause different routes to be associated with
634 different RDs, even if they lead to the same CE.

636 4.2. Encoding of Route Distinguishers

638 As stated, a VPN-IPv4 address consists of an 8-byte Route
639 Distinguisher followed by a 4-byte IPv4 address. The RDs are encoded
640 as follows:

642   - Type Field: 2 bytes
643   - Value Field: 6 bytes

645 The interpretation of the Value field depends on the value of the
646 Type field. At the present time, three values of the type field are
647 defined: 0, 1, and 2.

649   - Type 0: The Value field consists of two subfields:

651       * Administrator subfield: 2 bytes
652       * Assigned Number subfield: 4 bytes

654     The Administrator subfield must contain an Autonomous System
655     number. If this ASN is from the public ASN space, it must have
656     been assigned by the appropriate authority (use of ASN values
657     from the private ASN space is strongly discouraged). The
658     Assigned Number subfield contains a number from a numbering space
659     which is administered by the enterprise to which the ASN has been
660     assigned by an appropriate authority.
662   - Type 1: The Value field consists of two subfields:

664       * Administrator subfield: 4 bytes
665       * Assigned Number subfield: 2 bytes

667     The Administrator subfield must contain an IP address. If this IP
668     address is from the public IP address space, it must have been
669     assigned by an appropriate authority (use of addresses from the
670     private IP address space is strongly discouraged). The Assigned
671     Number subfield contains a number from a numbering space which
672     is administered by the enterprise to which the IP address has
673     been assigned.

675   - Type 2: The Value field consists of two subfields:

677       * Administrator subfield: 4 bytes
678       * Assigned Number subfield: 2 bytes

680     The Administrator subfield must contain a 4-byte Autonomous
681     System number [BGP-AS4]. If this ASN is from the public ASN
682     space, it must have been assigned by the appropriate authority
683     (use of ASN values from the private ASN space is strongly
684     discouraged). The Assigned Number subfield contains a number
685     from a numbering space which is administered by the enterprise to
686     which the ASN has been assigned by an appropriate authority.

688 4.3. Controlling Route Distribution

690 In this section, we discuss the way in which the distribution of the
691 VPN-IPv4 routes is controlled.

693 If a PE router is attached to a particular VPN (by being attached to
694 a particular CE in that VPN), it learns some of that VPN's IP routes
695 from the attached CE router. Routes learned from a CE routing peer
696 over a particular attachment circuit may be installed in the VRF
697 associated with that attachment circuit. Exactly which routes are
698 installed in this manner is determined by the way in which the PE
699 learns routes from the CE. In particular, when the PE and CE are
700 routing protocol peers, this is determined by the decision process of
701 the routing protocol; this is discussed in section 7.
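
Before the routes are converted, it may help to see the Route Distinguisher encodings of section 4.2 in concrete form. The following Python sketch packs the three RD types and builds a 12-byte VPN-IPv4 address. It is an illustration under stated assumptions: the function names are hypothetical, network byte order is assumed, and the ASNs 64496 and 65536, the address 192.0.2.1, and the assigned numbers are example values only.

```python
import ipaddress
import struct

# All values below are illustrative; network byte order ("!") is assumed.


def rd_type0(asn: int, number: int) -> bytes:
    """Type 0: 2-byte ASN administrator, 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, number)


def rd_type1(ip: str, number: int) -> bytes:
    """Type 1: 4-byte IPv4 address administrator, 2-byte assigned number."""
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(ip).packed, number)


def rd_type2(asn4: int, number: int) -> bytes:
    """Type 2: 4-byte ASN administrator, 2-byte assigned number."""
    return struct.pack("!HIH", 2, asn4, number)


def vpn_ipv4(rd: bytes, ip: str) -> bytes:
    """Concatenate an 8-byte RD and a 4-byte IPv4 address (12 bytes)."""
    if len(rd) != 8:
        raise ValueError("an RD is exactly 8 bytes")
    return rd + ipaddress.IPv4Address(ip).packed


# The same IPv4 address in two VPNs yields two distinct VPN-IPv4 addresses.
addr_a = vpn_ipv4(rd_type0(64496, 1), "10.1.0.1")
addr_b = vpn_ipv4(rd_type0(64496, 2), "10.1.0.1")
print(len(addr_a), addr_a != addr_b)
```

Because BGP compares the 12-byte quantity as a whole, the two results are never treated as the same address even though their IPv4 parts are identical.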
703 These routes are then converted to VPN-IPv4 routes, and "exported" to
704 BGP. If there is more than one route to a particular VPN-IPv4 address
705 prefix, BGP chooses the "best" one, using the BGP decision process.
706 That route is then distributed by BGP to the set of other PEs that
707 need to know about it. At these other PEs, BGP will again choose the
708 best route for a particular VPN-IPv4 address prefix. Then the chosen
709 VPN-IPv4 routes are converted back into IP routes, and "imported" into
710 one or more VRFs. Whether they are actually installed in the VRFs
711 depends on the decision process of the routing method used between
712 the PE and those CEs that are associated with the VRF in question.
713 Finally, any route installed in a VRF may be distributed to the
714 associated CE routers.

716 4.3.1. The Route Target Attribute

718 Every VRF is associated with one or more "Route Target" attributes.

720 When a VPN-IPv4 route is created (from an IPv4 route which the PE has
721 learned from a CE) by a PE router, it is associated with one or more
722 "Route Target" attributes. These are carried in BGP as attributes of
723 the route.

725 Any route associated with Route Target T must be distributed to every
726 PE router that has a VRF associated with Route Target T. When such a
727 route is received by a PE router, it is eligible to be installed in
728 those of the PE's VRFs which are associated with Route Target T.
729 (Whether it actually gets installed depends upon the outcome of the
730 BGP decision process, and upon the outcome of the decision process of
731 the PE-CE IGP.)

733 A Route Target attribute can be thought of as identifying a set of
734 sites. (Though it would be more precise to think of it as
735 identifying a set of VRFs.) Associating a particular Route Target
736 attribute with a route allows that route to be placed in the VRFs
737 that are used for routing traffic which is received from the
738 corresponding sites.
740 There is a set of Route Targets that a PE router attaches to a route
741 received from site S; these may be called the "Export Targets". And
742 there is a set of Route Targets that a PE router uses to determine
743 whether a route received from another PE router could be placed in
744 the VRF associated with site S; these may be called the "Import
745 Targets". The two sets are distinct, and need not be the same. Note
746 that a particular VPN-IPv4 route is only eligible for installation in
747 a particular VRF if there is some Route Target which is both one of
748 the route's Route Targets and one of the VRF's Import Targets.

750 The function performed by the Route Target attribute is similar to
751 that performed by the BGP Communities Attribute. However, the format
752 of the latter is inadequate for present purposes, since it allows
753 only a two-byte numbering space. It is desirable to structure the
754 format, similar to what we have described for RDs (see section 4.2),
755 so that a type field defines the length of an administrator field,
756 and the remainder of the attribute is a number from the specified
757 administrator's numbering space. This can be done using BGP Extended
758 Communities. The Route Targets discussed herein are encoded as BGP
759 Extended Community Route Targets [BGP-EXTCOMM]. They are structured
760 similarly to the RDs.

762 When a BGP speaker has received more than one route to the same VPN-
763 IPv4 prefix, the BGP rules for route preference are used to choose
764 which VPN-IPv4 route is installed by BGP.

766 Note that a route can only have one RD, but it can have multiple
767 Route Targets. In BGP, scalability is improved if one has a single
768 route with multiple attributes, as opposed to multiple routes. One
769 could eliminate the Route Target attribute by creating more routes
770 (i.e., using more RDs), but the scaling properties would be less
771 favorable.
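
The eligibility rule stated above (a VPN-IPv4 route may be installed in a VRF only if some Route Target of the route is also one of the VRF's Import Targets) can be sketched in Python as follows. This is a minimal model, not a BGP implementation: the class names, function names, and Route Target strings such as "64496:100" are hypothetical, and the BGP decision process is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    import_targets: set
    export_targets: set
    routes: dict = field(default_factory=dict)  # IPv4 prefix -> VpnRoute


@dataclass(frozen=True)
class VpnRoute:
    rd: str                   # exactly one RD per route
    prefix: str               # the IPv4 part of the VPN-IPv4 prefix
    route_targets: frozenset  # possibly several Route Targets per route


def export_route(vrf: Vrf, rd: str, prefix: str) -> VpnRoute:
    """Attach the VRF's Export Targets to a route learned from the site."""
    return VpnRoute(rd, prefix, frozenset(vrf.export_targets))


def eligible(route: VpnRoute, vrf: Vrf) -> bool:
    """True if the route shares at least one RT with the Import Targets."""
    return bool(route.route_targets & vrf.import_targets)


def import_route(route: VpnRoute, vrf: Vrf) -> bool:
    """Install the route if eligible (best-path selection is omitted)."""
    if eligible(route, vrf):
        vrf.routes[route.prefix] = route
        return True
    return False


# Hypothetical VRFs: s1 exports an RT that s2 imports and s3 does not.
s1 = Vrf("s1", import_targets={"64496:100"}, export_targets={"64496:100"})
s2 = Vrf("s2", import_targets={"64496:100"}, export_targets={"64496:100"})
s3 = Vrf("s3", import_targets={"64496:200"}, export_targets={"64496:200"})

r = export_route(s1, "64496:1", "10.1.0.0/16")
print(import_route(r, s2), import_route(r, s3))
```

Keeping the Import Targets and Export Targets as separate sets in the model mirrors the statement that the two sets need not be the same.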
773 How does a PE determine which Route Target attributes to associate
774 with a given route? There are a number of different possible ways.
775 The PE might be configured to associate all routes that lead to a
776 specified site with a specified Route Target. Or the PE might be
777 configured to associate certain routes leading to a specified site
778 with one Route Target, and certain with another.

780 If the PE and the CE are themselves BGP peers (see section 7), then
781 the SP may allow the customer, within limits, to specify how its
782 routes are to be distributed. The SP and the customer would need to
783 agree in advance on the set of RTs which are allowed to be attached
784 to the customer's VPN routes. The CE could then attach one or more
785 of those RTs to each IP route which it distributes to the PE. This
786 gives the customer the freedom to specify in real time, within agreed
787 upon limits, its route distribution policies. If the CE is allowed
788 to attach RTs to its routes, the PE MUST filter out all routes which
789 contain RTs that the customer is not allowed to use. If the CE is
790 not allowed to attach RTs to its routes, but does so anyway, the PE
791 MUST remove the RT before converting the customer's route to a VPN-
792 IPv4 route.

794 4.3.2. Route Distribution Among PEs by BGP

796 If two sites of a VPN attach to PEs which are in the same Autonomous
797 System, the PEs can distribute VPN-IPv4 routes to each other by means
798 of an IBGP connection between them. Alternatively, each can have an
799 IBGP connection to a route reflector [BGP-RR].

801 When a PE router distributes a VPN-IPv4 route via BGP, it uses its
802 own address as the "BGP next hop". This address is encoded as a
803 VPN-IPv4 address with an RD of 0. ([BGP-MP] requires that the next
804 hop address be in the same address family as the NLRI.) It also
805 assigns and distributes an MPLS label.
(Essentially, PE routers
806 distribute not VPN-IPv4 routes, but Labeled VPN-IPv4 routes. Cf.
807 [MPLS-BGP].) When the PE processes a received packet that has this
808 label at the top of the stack, the PE will pop the stack, and process
809 the packet appropriately.

811 The PE may distribute the exact set of routes that appears in the
812 VRF, or it may perform summarization and distribute aggregates of
813 those routes, or it may do some of one and some of the other.

815 Suppose that a PE has assigned label L to route R, and has
816 distributed this label mapping via BGP. If R is an aggregate of a
817 set of routes in the VRF, the PE will know that packets from the
818 backbone which arrive with this label must have their destination
819 addresses looked up in a VRF. When the PE looks up the label in its
820 Label Information Base, it learns which VRF must be used. On the
821 other hand, if R is not an aggregate, then when the PE looks up the
822 label, it learns the egress attachment circuit, as well as the
823 encapsulation header for the packet. In this case, no lookup in the
824 VRF is done.

826 We would expect that the most common case would be the case where the
827 route is NOT an aggregate. The case where it is an aggregate can be
828 very useful, though, if the VRF contains a large number of host routes
829 (e.g., as in dial-in), or if the VRF has an associated LAN interface
830 (where there is a different outgoing layer 2 header for each system
831 on the LAN, but a route is not distributed for each such system).

833 Whether each route has a distinct label or not is an implementation
834 matter. There are a number of possible algorithms one could use to
835 determine whether two routes get assigned the same label:

837   - One may choose to have a single label for an entire VRF, so that
838     a single label is shared by all the routes from that VRF.
Then
839     when the egress PE receives a packet with that label, it must
840     look up the packet's IP destination address in that VRF (the
841     packet's "egress VRF"), in order to determine the packet's egress
842     attachment circuit and the corresponding data link encapsulation.

844   - One may choose to have a single label for each attachment
845     circuit, so that a single label is shared by all the routes with
846     the same "outgoing attachment circuit". This enables one to
847     avoid doing a lookup in the egress VRF, though some sort of
848     lookup (e.g., ARP) may need to be done in order to determine the
849     data link encapsulation.

851   - One may choose to have a distinct label for each route. Then if
852     a route is potentially reachable over more than one attachment
853     circuit, the PE/CE routing can switch the preferred path for a
854     route from one attachment circuit to another, without there being
855     any need to distribute a new label for that route.

857 There may be other possible algorithms as well. The choice of
858 algorithm is entirely at the discretion of the egress PE, and is
859 otherwise transparent.

861 In using BGP-distributed MPLS labels in this manner, we presuppose
862 that an MPLS packet carrying such a label can be tunneled from the
863 router that installs the corresponding BGP-distributed route to the
864 router which is the BGP next hop of that route. This requires either
865 that a label switched path exist between those two routers, or else
866 that some other tunneling technology (e.g., [MPLS-in-IP-GRE]) can be
867 used between them.

869 This tunnel may follow a "best effort" route, or it may follow a
870 traffic engineered route. Between a given pair of routers there may
871 be one such tunnel, or there may be several, perhaps with different
872 QoS characteristics. All that matters for the VPN architecture is
873 that some such tunnel exists.
To ensure interoperability among
874 systems which implement this VPN architecture using MPLS label
875 switched paths as the tunneling technology, all such systems MUST
876 support LDP [MPLS-LDP]. In particular, Downstream Unsolicited mode
877 MUST be supported on interfaces which are neither LC-ATM [MPLS-ATM]
878 nor LC-FR [MPLS-FR] interfaces, and Downstream on Demand mode MUST be
879 supported on LC-ATM interfaces and LC-FR interfaces.

881 If the tunnel follows a best effort route, then the PE finds the
882 route to the remote endpoint by looking up its IP address in the
883 default forwarding table.

885 A PE router, UNLESS it is a Route Reflector (see section 4.3.3) or an
886 Autonomous System border router for an inter-provider VPN (see
887 section 10), should not install a VPN-IPv4 route unless it has at
888 least one VRF with an Import Target identical to one of the route's
889 Route Target attributes. Inbound filtering should be used to cause
890 such routes to be discarded. If a new Import Target is later added
891 to one of the PE's VRFs (a "VPN Join" operation), it must then
892 acquire the routes it may previously have discarded. This can be
893 done using the refresh mechanism described in [BGP-RFSH]. The
894 outbound route filtering mechanism of [BGP-ORF] can also be used to
895 advantage to make the filtering more dynamic.

897 Similarly, if a particular Import Target is no longer present in any
898 of a PE's VRFs (as a result of one or more "VPN Prune" operations),
899 the PE may discard all routes which, as a result, no longer have any
900 of the PE's VRF's Import Targets as one of their Route Target
901 Attributes.

903 A router which is not attached to any VPN, and which is not a Route
904 Reflector (i.e., a P router), never installs any VPN-IPv4 routes at
905 all.

907 Note that VPN Join and Prune operations are non-disruptive, and do
908 not require any BGP connections to be brought down, as long as the
909 refresh mechanism of [BGP-RFSH] is used.
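
The inbound filtering and the VPN Join and Prune operations described above can be sketched as follows. This Python model is only an illustration under stated assumptions: the class and method names are hypothetical, Route Targets are plain strings, and a counter stands in for sending a refresh request in the manner of [BGP-RFSH]; no BGP session handling is modeled.

```python
class Pe:
    """Toy model of a PE that is neither a route reflector nor an ASBR."""

    def __init__(self):
        self.import_targets_by_vrf = {}  # VRF name -> set of Import Targets
        self.installed = {}              # (rd, prefix) -> set of route RTs
        self.refresh_requests = 0        # stands in for [BGP-RFSH] requests

    def wanted(self):
        """Union of the Import Targets of all VRFs on this PE."""
        return set().union(*self.import_targets_by_vrf.values())

    def receive(self, rd, prefix, rts):
        """Inbound filter: install only routes matching an Import Target."""
        if set(rts) & self.wanted():
            self.installed[(rd, prefix)] = set(rts)
            return True
        return False  # discarded; the BGP session itself is unaffected

    def join(self, vrf, rt):
        """VPN Join: add an Import Target and ask peers to resend routes."""
        is_new = rt not in self.wanted()
        self.import_targets_by_vrf.setdefault(vrf, set()).add(rt)
        if is_new:
            self.refresh_requests += 1

    def prune(self, vrf, rt):
        """VPN Prune: drop an Import Target and the routes only it matched."""
        self.import_targets_by_vrf[vrf].discard(rt)
        wanted = self.wanted()
        self.installed = {k: v for k, v in self.installed.items() if v & wanted}


pe = Pe()
pe.join("vrf-a", "64496:1")
first = pe.receive("64496:10", "10.1.0.0/16", {"64496:1"})   # installed
second = pe.receive("64496:20", "10.2.0.0/16", {"64496:2"})  # filtered out
pe.join("vrf-a", "64496:2")                                   # triggers refresh
third = pe.receive("64496:20", "10.2.0.0/16", {"64496:2"})   # now installed
pe.prune("vrf-a", "64496:1")
print(first, second, third, sorted(pe.installed))
```

The model only requests a refresh when the added Import Target is new to the PE as a whole, since routes carrying a target already imported by another VRF would not have been discarded.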
911 As a result of these distribution rules, no one PE ever needs to
912 maintain all routes for all VPNs; this is an important scalability
913 consideration.

915 4.3.3. Use of Route Reflectors

917 Rather than having a complete IBGP mesh among the PEs, it is
918 advantageous to make use of BGP Route Reflectors [BGP-RR] to improve
919 scalability. All the usual techniques for using route reflectors to
920 improve scalability, e.g., route reflector hierarchies, are
921 available.

923 Route reflectors are the only systems which need to have routing
924 information for VPNs to which they are not directly attached.
925 However, there is no need to have any one route reflector know all
926 the VPN-IPv4 routes for all the VPNs supported by the backbone.

928 We outline below two different ways to partition the set of VPN-IPv4
929 routes among a set of route reflectors.

931   1. Each route reflector is preconfigured with a list of Route
932      Targets. For redundancy, more than one route reflector may be
933      preconfigured with the same list. A route reflector uses the
934      preconfigured list of Route Targets to construct its inbound
935      route filtering. The route reflector may use the techniques of
936      [BGP-ORF] to install on each of its peers (regardless of
937      whether the peer is another route reflector, or a PE) the set
938      of "Outbound Route Filters" (ORFs) that contain the list of its
939      preconfigured Route Targets. Note that route reflectors should
940      accept ORFs from other route reflectors, which means that route
941      reflectors should advertise the ORF capability to other route
942      reflectors.

944      A service provider may modify the list of preconfigured Route
945      Targets on a route reflector. When this is done, the route
946      reflector modifies the ORFs it installs on all of its IBGP
947      peers. To reduce the frequency of configuration changes on
948      route reflectors, each route reflector may be preconfigured
949      with a block of Route Targets.
This way, when a new Route
950      Target is needed for a new VPN, there are already one or more
951      route reflectors that are (pre)configured with this Route
952      Target.

954      Unless a given PE is a client of all route reflectors, when a
955      new VPN is added to the PE ("VPN Join"), it will need to become
956      a client of the route reflector(s) that maintain routes for
957      that VPN. Likewise, deleting an existing VPN from the PE ("VPN
958      Prune") may result in a situation where the PE no longer needs
959      to be a client of some route reflector(s). In either case, the
960      Join or Prune operation is non-disruptive (as long as [BGP-
961      RFSH] is used), and never requires a BGP connection to be
962      brought down, only to be brought right back up.

964      (By "adding a new VPN to a PE", we really mean adding a new
965      import Route Target to one of its VRFs, or adding a new VRF
966      with an import Route Target not had by any of the PE's other
967      VRFs.)

969   2. Another method is to have each PE be a client of some subset of
970      the route reflectors. A route reflector is not preconfigured
971      with the list of Route Targets, and does not perform inbound
972      route filtering of routes received from its clients (PEs);
973      rather it accepts all the routes received from all of its
974      clients (PEs). The route reflector keeps track of the set of
975      the Route Targets carried by all the routes it receives. When
976      the route reflector receives from its client a route with a
977      Route Target that is not in this set, this Route Target is
978      immediately added to the set. On the other hand, when the route
979      reflector no longer has any routes with a particular Route
980      Target that is in the set, the route reflector should delay (by
981      a few hours) the deletion of this Route Target from the set.

983      The route reflector uses this set to form the inbound route
984      filters that it applies to routes received from other route
985      reflectors.
The route reflector may also use ORFs to install
986      the appropriate outbound route filtering on other route
987      reflectors. Just as with the first approach, a route
988      reflector should accept ORFs from other route reflectors. To
989      accomplish this, a route reflector advertises ORF capability to
990      other route reflectors.

992      When the route reflector changes the set, it should immediately
993      change its inbound route filtering. In addition, if the route
994      reflector uses ORFs, then the ORFs have to be immediately
995      changed to reflect the changes in the set. If the route
996      reflector doesn't use ORFs, and a new Route Target is added to
997      the set, the route reflector, after changing its inbound route
998      filtering, must issue a BGP Refresh to other route reflectors.

1000      The delay of "a few hours" mentioned above allows a route
1001      reflector to hold onto routes with a given RT, even after it
1002      loses the last of its clients which are interested in such
1003      routes. This protects against the need to reacquire all such
1004      routes if the clients' "disappearance" is only temporary.

1006      With this procedure, VPN Join and Prune operations are also
1007      non-disruptive.

1009      Note that this technique will not work properly if some client
1010      PE has a VRF with an import Route Target that is not one of its
1011      export Route Targets.

1013 In these procedures, a PE router which attaches to a particular VPN
1014 "auto-discovers" the other PEs which attach to the same VPN. When a
1015 new PE router is added, or when an existing PE router attaches to a
1016 new VPN, no reconfiguration of other PE routers is needed.

1018 Just as there is no one PE router that needs to know all the VPN-IPv4
1019 routes that are supported over the backbone, these distribution rules
1020 ensure that there is no one RR which needs to know all the VPN-IPv4
1021 routes that are supported over the backbone.
As a result, the total
1022 number of such routes that can be supported over the backbone is not
1023 bounded by the capacity of any single device, and therefore can
1024 increase virtually without bound.

1026 4.3.4. How VPN-IPv4 NLRI is Carried in BGP

1028 The BGP Multiprotocol Extensions [BGP-MP] are used to encode the
1029 NLRI. If the AFI field is set to 1, and the SAFI field is set to
1030 128, the NLRI is an MPLS-labeled VPN-IPv4 address. AFI 1 is used
1031 since the network layer protocol associated with the NLRI is still
1032 IP. Note that this VPN architecture does not require the capability
1033 to distribute unlabeled VPN-IPv4 addresses.

1035 In order for two BGP speakers to exchange labeled VPN-IPv4 NLRI, they
1036 must use BGP Capabilities Advertisement to ensure that they both are
1037 capable of properly processing such NLRI. This is done as specified
1038 in [BGP-MP], by using capability code 1 (multiprotocol BGP), with an
1039 AFI of 1 and an SAFI of 128.

1041 The labeled VPN-IPv4 NLRI itself is encoded as specified in [MPLS-
1042 BGP], where the prefix consists of an 8-byte RD followed by an IPv4
1043 prefix.

1045 4.3.5. Building VPNs using Route Targets

1047 By setting up the Import Targets and Export Targets properly, one can
1048 construct different kinds of VPNs.

1050 Suppose it is desired to create a fully meshed closed user group,
1051 i.e., a set of sites where each can send traffic directly to the
1052 other, but traffic cannot be sent to or received from other sites.
1053 Then each site is associated with a VRF, a single Route Target
1054 attribute is chosen, that Route Target is assigned to each VRF as
1055 both the Import Target and the Export Target, and that Route Target
1056 is not assigned to any other VRFs as either the Import Target or the
1057 Export Target.

1059 Alternatively, suppose one desired, for whatever reason, to create a
1060 "hub and spoke" kind of VPN.
This could be done by the use of two
1061 Route Target values, one meaning "Hub" and one meaning "Spoke". At
1062 the VRFs attached to the hub sites, "Hub" is the Export Target and
1063 "Spoke" is the Import Target. At the VRFs attached to the spoke
1064 sites, "Hub" is the Import Target and "Spoke" is the Export Target.

1066 Thus the methods for controlling the distribution of routing
1067 information among various sets of sites are very flexible, which in
1068 turn provides great flexibility in constructing VPNs.

1070 4.3.6. Route Distribution Among VRFs in a Single PE

1072 It is possible to distribute routes from one VRF to another, even if
1073 both VRFs are in the same PE, even though in this case one cannot say
1074 that the route has been distributed by BGP. Nevertheless, the
1075 decision to distribute a particular route from one VRF to another
1076 within a single PE is the same decision that would be made if the
1077 VRFs were on different PEs. That is, it depends on the route target
1078 attribute which is assigned to the route (or would be assigned if the
1079 route were distributed by BGP), and the import target of the second
1080 VRF.

1082 5. Forwarding

1084 If the intermediate routers in the backbone do not have any
1085 information about the routes to the VPNs, how are packets forwarded
1086 from one VPN site to another?

1088 When a PE receives an IP packet from a CE device, it chooses a
1089 particular VRF in which to look up the packet's destination address.
1090 This choice is based on the packet's ingress attachment circuit.
1091 Assume that a match is found. As a result we learn the packet's
1092 "next hop".

1094 If the packet's next hop is reached directly over a VRF attachment
1095 circuit from this PE (i.e., the packet's egress attachment circuit is
1096 on the same PE as its ingress attachment circuit), then the packet is
1097 sent on the egress attachment circuit, and no MPLS labels are pushed
1098 onto the packet's label stack.
1100 If the ingress and egress attachment circuits are on the same PE, but
1101 are associated with different VRFs, and if the route which best
1102 matches the destination address in the ingress attachment circuit's
1103 VRF is an aggregate of several routes in the egress attachment
1104 circuit's VRF, it may be necessary to look up the packet's
1105 destination address in the egress VRF as well.

1107 If the packet's next hop is NOT reached through a VRF attachment
1108 circuit, then the packet must travel at least one hop through the
1109 backbone. The packet thus has a "BGP Next Hop", and the BGP Next Hop
1110 will have assigned an MPLS label for the route that best matches the
1111 packet's destination address. Call this label the "VPN route label".
1112 The IP packet is turned into an MPLS packet with the VPN route label
1113 as the sole label on the label stack.

1115 The packet must then be tunneled to the BGP Next Hop.

1117 If the backbone supports MPLS, this is done as follows:

1119   - The PE routers (and any Autonomous System border routers) which
1120     redistribute VPN-IPv4 addresses need to insert /32 address
1121     prefixes for themselves into the IGP routing tables of the
1122     backbone. This enables MPLS, at each node in the backbone
1123     network, to assign a label corresponding to the route to each PE
1124     router. To ensure interoperability among different
1125     implementations, it is required to support LDP for setting up the
1126     label switched paths across the backbone. However, other methods
1127     of setting up these label switched paths are also possible.
1128     (Some of these other methods may not require the presence of the
1129     /32 address prefixes in the IGP.)

1131   - If there are any traffic engineering tunnels to the BGP next hop,
1132     and if one or more of those is available for use by the packet in
1133     question, one of these tunnels is chosen. This tunnel will be
1134     associated with an MPLS label, the "tunnel label".
The tunnel
1135     label gets pushed on the MPLS label stack, and the packet is
1136     forwarded to the tunnel's next hop.

1138   - Otherwise,

1140       * The packet will have an "IGP Next Hop", which is the next hop
1141         along the IGP route to the BGP Next Hop.

1143       * If the BGP Next Hop and the IGP Next Hop are the same, and if
1144         penultimate hop popping is used, the packet is then sent to
1145         the IGP next hop, carrying only the VPN route label.

1147       * Otherwise, the IGP Next Hop will have assigned a label for
1148         the route which best matches the address of the BGP Next Hop.
1149         Call this the "tunnel label". The tunnel label gets pushed
1150         on as the packet's top label. The packet is then forwarded
1151         to the IGP next hop.

1153   - MPLS will then carry the packet across the backbone to the BGP
1154     Next Hop, where the VPN route label will be examined.

1156 If the backbone does not support MPLS, the MPLS packet carrying only
1157 the VPN route label may be tunneled to the BGP Next Hop using the
1158 techniques of [MPLS-in-IP-GRE]. When the packet emerges from the
1159 tunnel, it will be at the BGP Next Hop, where the VPN route label
1160 will be examined.

1162 At the BGP Next Hop, the treatment of the packet depends on the VPN
1163 route label (see section 4.3.2). In many cases, the PE will be able
1164 to determine, from this label, the attachment circuit over which the
1165 packet should be transmitted (to a CE device), as well as the proper
1166 data link layer header for that interface. In other cases, the PE
1167 may only be able to determine that the packet's destination address
1168 needs to be looked up in a particular VRF before being forwarded to a
1169 CE device. There are also intermediate cases in which the VPN route
1170 label may determine the packet's egress attachment circuit, but a
1171 lookup (e.g., ARP) still needs to be done in order to determine the
1172 packet's data link header on that attachment circuit.
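
For the case in which the backbone supports MPLS, the ingress PE's choice of label stack described in the steps above can be sketched as follows. This Python function is an illustrative model only: its name, its parameters, the router names, and the label values are hypothetical, and it assumes that the relevant labels have already been learned via BGP and via LDP or traffic engineering signaling.

```python
def build_label_stack(vpn_route_label, bgp_next_hop, igp_next_hop,
                      tunnel_label=None, te_tunnel_label=None,
                      penultimate_hop_popping=True):
    """Return the MPLS label stack, top label first, for one packet."""
    # A usable traffic engineering tunnel to the BGP next hop takes priority.
    if te_tunnel_label is not None:
        return [te_tunnel_label, vpn_route_label]
    # Adjacent egress PE with penultimate hop popping: VPN route label only.
    if bgp_next_hop == igp_next_hop and penultimate_hop_popping:
        return [vpn_route_label]
    # Otherwise push the label the IGP next hop assigned for the route
    # to the BGP next hop.
    if tunnel_label is None:
        raise ValueError("no tunnel label known for the BGP next hop")
    return [tunnel_label, vpn_route_label]


# Hypothetical routers and label values.
print(build_label_stack(30, "PE2", "P1", tunnel_label=17))
print(build_label_stack(30, "PE2", "PE2"))
print(build_label_stack(30, "PE2", "P1", te_tunnel_label=99))
```

In every branch the VPN route label stays at the bottom of the stack, which is what lets the P routers forward the packet without holding any VPN routes.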
1174 Information in the MPLS header itself, and/or information associated 1175 with the label, may also be used to provide QoS on the interface to 1176 the CE. 1178 In any event, if the packet was an unlabeled IP packet when it 1179 arrived at its ingress PE, it will again be an unlabeled packet when 1180 it leaves its egress PE. 1182 The fact that packets with VPN route labels are tunneled through the 1183 backbone is what makes it possible to keep all the VPN routes out of 1184 the P routers. This is crucial to ensuring the scalability of the 1185 scheme. The backbone does not even need to have routes to the CEs, 1186 only to the PEs. 1188 With respect to the tunnels, it is worth noting that this 1189 specification: 1191 - DOES NOT require that the tunnels be point-to-point; multipoint- 1192 to-point can be used; 1194 - DOES NOT require that there be any explicit setup of the tunnels, 1195 either via signaling or via manual configuration. 1197 - DOES NOT require that there be any tunnel-specific signaling; 1199 - DOES NOT require that there be any tunnel-specific state in the P 1200 or PE routers, beyond what is necessary to maintain the routing 1201 information and (if used) the MPLS label information. 1203 Of course, this specification is compatible with the use of point- 1204 to-point tunnels that must be explicitly configured and/or signaled, 1205 and in some situations there may be reasons for using such tunnels. 1207 The considerations which are relevant to choosing a particular 1208 tunneling technology are outside the scope of this specification. 1210 6. Maintaining Proper Isolation of VPNs 1212 To maintain proper isolation of one VPN from another, it is important 1213 that no router in the backbone accept a tunneled packet from outside 1214 the backbone, unless it is sure that both endpoints of that tunnel 1215 are outside the backbone. 
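For the MPLS case, the two acceptance conditions spelled out in the paragraphs that follow can be sketched as a single check. This is a minimal sketch under stated assumptions: the function name and both tables are hypothetical, and a real router would derive them from its label distribution state and its forwarding entries.

```python
from typing import Dict, Set


def accept_labeled_packet(
    neighbor: str,
    top_label: int,
    distributed: Dict[str, Set[int]],
    exits_backbone: Set[int],
) -> bool:
    """Decide whether a labeled packet from a non-backbone neighbor is accepted.

    distributed: labels this router has distributed, keyed by neighbor
        (hypothetical table).
    exits_backbone: labels known to send the packet out of the backbone
        before any lower label or the IP header is inspected
        (hypothetical table).
    """
    # Condition 1: the top label was distributed to this very neighbor.
    was_distributed = top_label in distributed.get(neighbor, set())
    # Condition 2: using the label takes the packet out of the backbone.
    leaves_backbone = top_label in exits_backbone
    return was_distributed and leaves_backbone
```

Leaving the `distributed` table empty for every non-backbone neighbor corresponds to the simplest policy, in which no labeled packets are accepted from such devices.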
1217 If MPLS is being used as the tunneling technology, this means that a 1218 router in the backbone MUST NOT accept a labeled packet from any 1219 adjacent non-backbone device unless the following two conditions 1220 hold: 1222 1. the label at the top of the label stack was actually 1223 distributed by that backbone router to that non-backbone 1224 device, and 1226 2. the backbone router can determine that use of that label will 1227 cause the packet to leave the backbone before any labels lower 1228 in the stack will be inspected, and before the IP header will 1229 be inspected. 1231 The first condition ensures that any labeled packets received from 1232 non-backbone routers have a legitimate and properly assigned label at 1233 the top of the label stack. The second condition ensures that the 1234 backbone routers will never look below that top label. Of course, 1235 the simplest way to meet these two conditions is just to have the 1236 backbone devices refuse to accept labeled packets from non-backbone 1237 devices. 1239 If MPLS is not being used as the tunneling technology, then filtering 1240 must be done to ensure that an MPLS-in-IP or MPLS-in-GRE packet can 1241 be accepted into the backbone only if the packet's IP destination 1242 address will cause it to be sent outside the backbone. 1244 7. How PEs Learn Routes from CEs 1246 The PE routers which attach to a particular VPN need to know, for 1247 each attachment circuit leading to that VPN, which of the VPN's 1248 addresses should be reached over that attachment circuit. 1250 The PE translates these addresses into VPN-IPv4 addresses, using a 1251 configured RD. The PE then treats these VPN-IPv4 routes as input to 1252 BGP. Routes from a VPN site are NOT leaked into the backbone's IGP. 1254 Exactly which PE/CE route distribution techniques are possible 1255 depends on whether a particular CE is in a "transit VPN" or not.
A 1256 "transit VPN" is one which contains a router that receives routes 1257 from a "third party" (i.e., from a router which is not in the VPN, 1258 but is not a PE router), and that redistributes those routes to a PE 1259 router. A VPN which is not a transit VPN is a "stub VPN". The vast 1260 majority of VPNs, including just about all corporate enterprise 1261 networks, would be expected to be "stubs" in this sense. 1263 The possible PE/CE distribution techniques are: 1265 1. Static routing (i.e., configuration) may be used. (This is 1266 likely to be useful only in stub VPNs.) 1268 2. PE and CE routers may be RIP peers, and the CE may use RIP to 1269 tell the PE router the set of address prefixes which are 1270 reachable at the CE router's site. When RIP is configured in 1271 the CE, care must be taken to ensure that address prefixes from 1272 other sites (i.e., address prefixes learned by the CE router 1273 from the PE router) are never advertised to the PE. More 1274 precisely: if a PE router, say PE1, receives a VPN-IPv4 route 1275 R1, and as a result distributes an IPv4 route R2 to a CE, then 1276 R2 must not be distributed back from that CE's site to a PE 1277 router, say PE2, (where PE1 and PE2 may be the same router or 1278 different routers), unless PE2 maps R2 to a VPN-IPv4 route 1279 which is different than (i.e., contains a different RD than) 1280 R1. 1282 3. The PE and CE routers may be OSPF peers. A PE router which is 1283 an OSPF peer of a CE router appears, to the CE router, to be an 1284 area 0 router. If a PE router is an OSPF peer of CE routers 1285 which are in distinct VPNs, the PE must of course be running 1286 multiple instances of OSPF. 1288 IPv4 routes which the PE learns from the CE via OSPF are 1289 redistributed into BGP as VPN-IPv4 routes. 
Extended community 1290 attributes are used to carry, along with the route, all the 1291 information needed to enable the route to be distributed to 1292 other CE routers in the VPN in the proper type of OSPF LSA. 1293 OSPF route tagging is used to ensure that routes received from 1294 the MPLS/BGP backbone are not sent back into the backbone. 1296 Specification of the complete set of procedures for the use of 1297 OSPF between PE and CE can be found in [VPN-OSPF] and [VPN- 1298 OSPF-0]. 1300 4. The PE and CE routers may be BGP peers, and the CE router may 1301 use BGP (in particular, EBGP) to tell the PE router the set of 1302 address prefixes which are at the CE router's site. (This 1303 technique can be used in stub VPNs or transit VPNs.) 1305 This technique has a number of advantages over the others: 1307 a) Unlike the IGP alternatives, this does not require the PE 1308 to run multiple routing algorithm instances in order to 1309 talk to multiple CEs 1311 b) BGP is explicitly designed for just this function: 1312 passing routing information between systems run by 1313 different administrations 1315 c) If the site contains "BGP backdoors", i.e., routers with 1316 BGP connections to routers other than PE routers, this 1317 procedure will work correctly in all circumstances. The 1318 other procedures may or may not work, depending on the 1319 precise circumstances. 1321 d) Use of BGP makes it easy for the CE to pass attributes of 1322 the routes to the PE. A complete specification of the 1323 set of attributes and their use is outside the scope of 1324 this document. However, some examples of the way this 1325 may be used are the following: 1327 - The CE may suggest a particular Route Target for each 1328 route, from among the Route Targets that the PE is 1329 authorized to attach to the route. The PE would then 1330 attach only the suggested Route Target, rather than 1331 the full set.
This gives the CE administrator some 1332 dynamic control of the distribution of routes from 1333 the CE. 1335 - Additional types of Extended Community attributes may 1336 be defined, where the intention is to have those 1337 attributes passed transparently (i.e., without being 1338 changed by the PE routers) from CE to CE. This would 1339 allow CE administrators to implement additional route 1340 filtering, beyond that which is done by the PEs. 1341 This additional filtering would not require 1342 coordination with the SP. 1344 On the other hand, using BGP may be something new for the CE 1345 administrators. 1347 If a site is not in a transit VPN, note that it need not have a 1348 unique Autonomous System Number (ASN). Every CE whose site is 1349 not in a transit VPN can use the same ASN. This can be chosen 1350 from the private ASN space, and it will be stripped out by the 1351 PE. Routing loops are prevented by use of the Site of Origin 1352 Attribute (see below). 1354 What if a set of sites constitute a transit VPN? This will 1355 generally be the case only if the VPN is itself an ISP's 1356 network, where the ISP is itself buying backbone services from 1357 another SP. The latter SP may be called a "Carrier's Carrier". 1358 In this case, the best way to provide the VPN is to have the CE 1359 routers support MPLS, and to use the technique described in 1360 section 9. 1362 When we do not need to distinguish among the different ways in which 1363 a PE can be informed of the address prefixes which exist at a given 1364 site, we will simply say that the PE has "learned" the routes from 1365 that site. This includes the case where the PE has been manually 1366 configured with the routes. 1368 Before a PE can redistribute a VPN-IPv4 route learned from a site, it 1369 must assign a Route Target attribute (see section 4.3.1) to the 1370 route, and it may assign a Site of Origin attribute to the route. 
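The steps a PE takes before redistributing a route learned from a site (translate the IPv4 prefix into a VPN-IPv4 prefix using the configured RD, attach at least one Route Target, and optionally attach a Site of Origin) can be sketched as follows. This is an illustrative sketch, not a BGP implementation: all names are hypothetical, Route Targets and the Site of Origin are modeled as opaque strings, and the RD layout assumed is the type 0 form (2-byte type, 2-byte ASN, 4-byte assigned number) defined elsewhere in this specification.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


def encode_rd_type0(asn: int, assigned_number: int) -> bytes:
    """Encode an 8-byte type 0 RD: type (2), ASN (2), assigned number (4)."""
    return struct.pack("!HHI", 0, asn, assigned_number)


@dataclass(frozen=True)
class VpnIpv4Route:
    address: bytes                  # 8-byte RD followed by 4-byte IPv4 address
    ipv4_prefix_len: int            # length of the IPv4 part only
    route_targets: FrozenSet[str]   # opaque identifiers in this sketch
    site_of_origin: Optional[str] = None


def learn_from_site(
    prefix: str,
    rd: bytes,
    route_targets: Iterable[str],
    site_of_origin: Optional[str] = None,
) -> VpnIpv4Route:
    """Turn an IPv4 prefix learned from a CE into a VPN-IPv4 route."""
    if len(rd) != 8:
        raise ValueError("an RD is 8 bytes long")
    targets = frozenset(route_targets)
    if not targets:
        # A Route Target must be assigned before redistribution.
        raise ValueError("at least one Route Target is required")
    net = ipaddress.IPv4Network(prefix)
    return VpnIpv4Route(
        address=rd + net.network_address.packed,
        ipv4_prefix_len=net.prefixlen,
        route_targets=targets,
        site_of_origin=site_of_origin,
    )
```

Because the RD is prepended to the address, two sites that use the same IPv4 prefix under different RDs yield distinct VPN-IPv4 routes.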
1372 The Site of Origin attribute, if used, is encoded as a Route Origin 1373 Extended Community [BGP-EXTCOMM]. The purpose of this attribute is 1374 to uniquely identify the set of routes learned from a particular 1375 site. This attribute is needed in some cases to ensure that a route 1376 learned from a particular site via a particular PE/CE connection is 1377 not distributed back to the site through a different PE/CE 1378 connection. It is particularly useful if BGP is being used as the 1379 PE/CE protocol, but different sites have not been assigned distinct 1380 ASNs. 1382 8. How CEs learn Routes from PEs 1384 In this section, we assume that the CE device is a router. 1386 If the PE places a particular route in the VRF it uses to route 1387 packets received from a particular CE, then in general, the PE may 1388 distribute that route to the CE. Of course the PE may distribute 1389 that route to the CE only if this is permitted by the rules of the 1390 PE/CE protocol. (For example, if a particular PE/CE protocol has 1391 "split horizon", certain routes in the VRF cannot be redistributed 1392 back to the CE.) We add one more restriction on the distribution of 1393 routes from PE to CE: if a route's Site of Origin attribute 1394 identifies a particular site, that route must never be redistributed 1395 to any CE at that site. 1397 In most cases, however, it will be sufficient for the PE to simply 1398 distribute the default route to the CE. (In some cases, it may even 1399 be sufficient for the CE to be configured with a default route 1400 pointing to the PE.) This will generally work at any site which does 1401 not itself need to distribute the default route to other sites. 1402 (E.g., if one site in a corporate VPN has the corporation's access to 1403 the Internet, that site might need to have default distributed to the 1404 other site, but one could not distribute default to that site 1405 itself.) 
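The Site of Origin restriction stated earlier in this section can be sketched as a simple filter applied when a PE selects VRF routes to send to a CE. The names VrfRoute and routes_for_ce are hypothetical, sites are modeled as opaque strings, and any additional restrictions imposed by the PE/CE protocol itself (such as split horizon) are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VrfRoute:
    prefix: str
    site_of_origin: Optional[str] = None  # None if no attribute is attached


def routes_for_ce(vrf_routes: List[VrfRoute], ce_site: str) -> List[VrfRoute]:
    """Return the VRF routes that may be distributed to a CE at ce_site.

    A route whose Site of Origin identifies ce_site is never sent back
    to any CE at that site.
    """
    return [r for r in vrf_routes if r.site_of_origin != ce_site]
```

Routes that carry no Site of Origin attribute pass the filter unchanged, so the check only has an effect where the attribute is actually in use.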
1407 Whatever procedure is used to distribute routes from CE to PE will 1408 also be used to distribute routes from PE to CE. 1410 9. Carriers' Carriers 1412 Sometimes a VPN may actually be the network of an ISP, with its own 1413 peering and routing policies. Sometimes a VPN may be the network of 1414 an SP which is offering VPN services in turn to its own customers. 1415 VPNs like these can also obtain backbone service from another SP, the 1416 "carrier's carrier", using essentially the same methods described in 1417 this document. However, it is necessary in these cases that the CE 1418 routers support MPLS. In particular: 1420 - The CE routers should distribute to the PE routers ONLY those 1421 routes which are internal to the VPN. This allows the VPN to be 1422 handled as a stub VPN. 1424 - The CE routers should support MPLS, in that they should be able 1425 to receive labels from the PE routers, and send labeled packets 1426 to the PE routers. They do not need to distribute labels of 1427 their own though. 1429 - The PE routers should distribute, to the CE routers, labels for 1430 the routes they distribute to the CE routers. 1432 The PE must not distribute the same label to two different CEs 1433 unless one of the following conditions holds: 1435 * The two CEs are associated with exactly the same set of VRFs; 1437 * The PE maintains a different Incoming Label Map ([MPLS-ARCH]) 1438 for each CE. 1440 Further, when the PE receives a labeled packet from a CE, it must 1441 verify that the top label is one that was distributed to that CE. 1443 - Routers at the different sites should establish BGP connections 1444 among themselves for the purpose of exchanging external routes 1445 (i.e., routes which lead outside of the VPN). 1447 - All the external routes must be known to the CE routers. 
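One way to picture the label rules in the list above is a PE that keeps a separate Incoming Label Map for each CE, which is one of the two permitted conditions for reusing a label value. The class below is a hypothetical sketch under that assumption; the class name, method names, and the string used to represent a route are all illustrative.

```python
from typing import Dict


class PerCeLabelMap:
    """Hypothetical PE state: one Incoming Label Map per attached CE."""

    def __init__(self) -> None:
        self._ilm: Dict[str, Dict[int, str]] = {}

    def distribute(self, ce: str, label: int, route: str) -> None:
        """Record that `label` was distributed to `ce` for `route`."""
        self._ilm.setdefault(ce, {})[label] = route

    def lookup(self, ce: str, top_label: int) -> str:
        """Resolve a labeled packet received from `ce`.

        Raises PermissionError if the top label was never distributed
        to that CE, in which case the packet should not be forwarded.
        """
        try:
            return self._ilm[ce][top_label]
        except KeyError:
            raise PermissionError(
                f"label {top_label} was not distributed to {ce}"
            ) from None
```

Because each CE has its own map, the same label value may be given to two CEs without either being able to reach the other's routes.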
1449 Then when a CE router looks up a packet's destination address, the 1450 routing lookup will resolve to an internal address, usually the 1451 address of the packet's BGP next hop. The CE labels the packet 1452 appropriately and sends the packet to the PE. The PE, rather than 1453 looking up the packet's IP destination address in a VRF, uses the 1454 packet's top MPLS label to select the "BGP next hop". As a result, 1455 if the BGP next hop is more than one hop away, the top label will be 1456 replaced by two labels, a tunnel label and a VPN route label. If the 1457 BGP next hop is one hop away, the top label may be replaced by just 1458 the VPN route label. If the ingress PE is also the egress PE, the 1459 top label will just be popped. When the packet is sent from its 1460 egress PE to a CE, the packet will have one fewer MPLS label than it 1461 had when it was first received by its ingress PE. 1463 In the above procedure, the CE routers are the only routers in the 1464 VPN which need to support MPLS. If, on the other hand, all the 1465 routers at a particular VPN site support MPLS, then it is no longer 1466 required that the CE routers know all the external routes. All that 1467 is required is that the external routes be known to whatever routers 1468 are responsible for putting the label stack on a hitherto unlabeled 1469 packet, and that there be a label switched path that leads from those 1470 routers to their BGP peers at other sites. In this case, for each 1471 internal route that a CE router distributes to a PE router, it must 1472 also distribute a label. 1474 10. Multi-AS Backbones 1476 What if two sites of a VPN are connected to different Autonomous 1477 Systems (e.g., because the sites are connected to different SPs)? 1478 The PE routers attached to that VPN will then not be able to maintain 1479 IBGP connections with each other, or with a common route reflector.
1480 Rather, there needs to be some way to use EBGP to distribute VPN-IPv4 1481 addresses. 1483 There are a number of different ways of handling this case, which we 1484 present in order of increasing scalability. 1486 a) VRF-to-VRF connections at the AS border routers. 1488 In this procedure, a PE router in one AS attaches directly to a 1489 PE router in another. The two PE routers will be attached by 1490 multiple sub-interfaces, at least one for each of the VPNs 1491 whose routes need to be passed from AS to AS. Each PE will 1492 treat the other as if it were a CE router. That is, the PEs 1493 associate each such sub-interface with a VRF, and use EBGP to 1494 distribute unlabeled IPv4 addresses to each other. 1496 This is a procedure that "just works", and that does not 1497 require MPLS at the border between ASes. However, it does not 1498 scale as well as the other procedures discussed below. 1500 b) EBGP redistribution of labeled VPN-IPv4 routes from AS to 1501 neighboring AS. 1503 In this procedure, the PE routers use IBGP to redistribute 1504 labeled VPN-IPv4 routes either to an Autonomous System Border 1505 Router (ASBR), or to a route reflector of which an ASBR is a 1506 client. The ASBR then uses EBGP to redistribute those labeled 1507 VPN-IPv4 routes to an ASBR in another AS, which in turn 1508 distributes them to the PE routers in that AS, or perhaps to 1509 another ASBR which in turn distributes them ... 1511 When using this procedure, VPN-IPv4 routes should only be 1512 accepted on EBGP connections at private peering points, as part 1513 of a trusted arrangement between SPs. VPN-IPv4 routes should 1514 neither be distributed to nor accepted from the public 1515 Internet, or from any BGP peers which are not trusted. An ASBR 1516 should never accept a labeled packet from an EBGP peer unless 1517 it has actually distributed the top label to that peer. 
1519 If there are many VPNs having sites attached to different 1520 Autonomous Systems, there does not need to be a single ASBR 1521 between those two ASes which holds all the routes for all the 1522 VPNs; there can be multiple ASBRs, each of which holds only the 1523 routes for a particular subset of the VPNs. 1525 This procedure requires that there be a label switched path 1526 leading from a packet's ingress PE to its egress PE. Hence the 1527 appropriate trust relationships must exist between and among 1528 the set of ASes along the path. Also, there must be agreement 1529 among the set of SPs as to which border routers need to receive 1530 routes with which Route Targets. 1532 c) Multihop EBGP redistribution of labeled VPN-IPv4 routes between 1533 source and destination ASes, with EBGP redistribution of 1534 labeled IPv4 routes from AS to neighboring AS. 1536 In this procedure, VPN-IPv4 routes are neither maintained nor 1537 distributed by the ASBRs. An ASBR must maintain labeled IPv4 1538 /32 routes to the PE routers within its AS. It uses EBGP to 1539 distribute these routes to other ASes. ASBRs in any transit 1540 ASes will also have to use EBGP to pass along the labeled /32 1541 routes. This results in the creation of a label switched path 1542 from the ingress PE router to the egress PE router. Now PE 1543 routers in different ASes can establish multi-hop EBGP 1544 connections to each other, and can exchange VPN-IPv4 routes 1545 over those connections. 1547 If the /32 routes for the PE routers are made known to the P 1548 routers of each AS, everything works normally. If the /32 1549 routes for the PE routers are NOT made known to the P routers 1550 (other than the ASBRs), then this procedure requires a packet's 1551 ingress PE to put a three label stack on it. The bottom label 1552 is assigned by the egress PE, corresponding to the packet's 1553 destination address in a particular VRF. 
The middle label is 1554 assigned by the ASBR, corresponding to the /32 route to the 1555 egress PE. The top label is assigned by the ingress PE's IGP 1556 Next Hop, corresponding to the /32 route to the ASBR. 1558 To improve scalability, one can have the multi-hop EBGP 1559 connections exist only between a route reflector in one AS and 1560 a route reflector in another. (However, when the route 1561 reflectors distribute routes over this connection, they do not 1562 modify the BGP next hop attribute of the routes.) The actual 1563 PE routers would then only have IBGP connections to the route 1564 reflectors in their own AS. 1566 This procedure is very similar to the "Carrier's Carrier" 1567 procedures described in section 9. Like the previous procedure, 1568 it requires that there be a label switched path leading from a 1569 packet's ingress PE to its egress PE. 1571 11. Accessing the Internet from a VPN 1573 Many VPN sites will need to be able to access the public Internet, as 1574 well as to access other VPN sites. The following describes some of 1575 the alternative ways of doing this. 1577 1. In some VPNs, one or more of the sites will obtain Internet 1578 Access by means of an "Internet gateway" (perhaps a firewall) 1579 attached to a non-VRF interface to an ISP. The ISP may or may 1580 not be the same organization as the SP which is providing the 1581 VPN service. Traffic to/from the Internet gateway would then 1582 be routed according to the PE router's default forwarding 1583 table. 1585 In this case, the sites which have Internet Access may be 1586 distributing a default route to their PEs, which in turn 1587 redistribute it to other PEs and hence into other sites of the 1588 VPN. This provides Internet Access for all of the VPN's sites. 1590 In order to properly handle traffic from the Internet, the ISP 1591 must distribute, to the Internet, routes leading to addresses 1592 that are within the VPN. 
This is completely independent of any 1593 of the route distribution procedures described in this 1594 document. The internal structure of the VPN will in general 1595 not be visible from the Internet; such routes would simply lead 1596 to the non-VRF interface that attaches to the VPN's Internet 1597 gateway. 1599 In this model, there is no exchange of routes between a PE 1600 router's default forwarding table and any of its VRFs. VPN 1601 route distribution procedures and Internet route distribution 1602 procedures are completely independent. 1604 Note that although some sites of the VPN use a VRF interface to 1605 communicate with the Internet, ultimately all packets to/from 1606 the Internet traverse a non-VRF interface before 1607 leaving/entering the VPN, so we refer to this as "non-VRF 1608 Internet Access". 1610 Note that the PE router to which the non-VRF interface attaches 1611 does not necessarily need to maintain all the Internet routes 1612 in its default forwarding table. The default forwarding table 1613 could have as few as one route, "default", which leads to 1614 another router (probably an adjacent one) which has the 1615 Internet routes. A variation of this scheme is to tunnel 1616 packets received over the non-VRF interface from the PE router 1617 to another router, where this other router maintains the full 1618 set of Internet routes. 1620 2. Some VPNs may obtain Internet access via a VRF interface ("VRF 1621 Internet Access"). If a packet is received by a PE over a VRF 1622 interface, and if the packet's destination address does not 1623 match any route in the VRF, then it may be matched against the 1624 PE's default forwarding table. If a match is made there, the 1625 packet can be forwarded natively through the backbone to the 1626 Internet, instead of being forwarded by MPLS. 
1628 In order for traffic to flow natively in the opposite direction 1629 (from Internet to VRF interface), some of the routes from the 1630 VRF must be exported to the Internet forwarding table. 1631 Needless to say, any such routes must correspond to globally 1632 unique addresses. 1634 In this scheme, the default forwarding table might have the 1635 full set of Internet routes, or it might have as little as a 1636 single default route leading to another router which does have 1637 the full set of Internet routes in its default forwarding 1638 table. 1640 3. Suppose the PE has the capability to store "non-VPN routes" in 1641 a VRF. If a packet's destination address matches a "non-VPN 1642 route", then the packet is transmitted natively, rather than 1643 being transmitted via MPLS. If the VRF contains a non-VPN 1644 default route, all packets for the public Internet will match 1645 it, and be forwarded natively to the default route's next hop. 1646 At that next hop, the packets' destination addresses will be 1647 looked up in the default forwarding table, and may match more 1648 specific routes. 1650 This technique would only be available if none of the CE 1651 routers is distributing a default route. 1653 4. It is also possible to obtain Internet access via a VRF 1654 interface by having the VRF contain the Internet routes. 1655 Compared with model 2, this eliminates the second lookup, but 1656 it has the disadvantage of requiring the Internet routes to be 1657 replicated in each such VRF. 1659 If this technique is used, the SP may want to make its 1660 interface to the Internet be a VRF interface, and to use the 1661 techniques of section 4 to distribute Internet routes, as VPN- 1662 IPv4 routes, to other VRFs. 1664 It should be clearly understood that by default, there is no exchange 1665 of routes between a VRF and the default forwarding table. This is 1666 done ONLY upon agreement between a customer and an SP, and only if it 1667 suits the customer's policies.
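The two-stage lookup of model 2 ("VRF Internet Access") can be sketched as follows. This is a simplified illustration under stated assumptions: both tables are plain dictionaries from prefix to next hop, the function names and the "vpn", "native", and "drop" result tags are hypothetical, and a linear scan stands in for whatever longest-match structure a real PE would use.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def longest_match(table: Dict[str, str], dst: str) -> Optional[str]:
    """Return the next hop of the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    best_len = -1
    best_hop = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len = net.prefixlen
            best_hop = next_hop
    return best_hop


def forward(
    dst: str, vrf: Dict[str, str], default_table: Dict[str, str]
) -> Tuple[str, Optional[str]]:
    """Look up dst in the VRF first, then in the default forwarding table."""
    hop = longest_match(vrf, dst)
    if hop is not None:
        return ("vpn", hop)      # forwarded as VPN traffic
    hop = longest_match(default_table, dst)
    if hop is not None:
        return ("native", hop)   # forwarded natively toward the Internet
    return ("drop", None)
```

The second lookup happens only when the VRF has no matching route, which is the extra step that model 4 avoids by placing the Internet routes in the VRF itself.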
1669 12. Management VPNs 1671 This specification does not require that the sub-interface connecting 1672 a PE router and a CE router be a "numbered" interface. If it is a 1673 numbered interface, this specification allows the addresses assigned 1674 to the interface to come from either the address space of the VPN or 1675 the address space of the SP. 1677 If a CE router is being managed by the Service Provider, then the 1678 Service Provider will likely have a network management system which 1679 needs to be able to communicate with the CE router. In this case, 1680 the addresses assigned to the sub-interface connecting the CE and PE 1681 routers should come from the SP's address space, and should be unique 1682 within that space. The network management system should itself 1683 connect to a PE router (more precisely, be at a site which connects 1684 to a PE router) via a VRF interface. The address of the network 1685 management system will be exported to all VRFs which are associated 1686 with interfaces to CE routers that are managed by the SP. The 1687 addresses of the CE routers will be exported to the VRF associated 1688 with the Network Management system, but not to any other VRFs. 1690 This allows communication between CE and Network Management system, 1691 but does not allow any undesired communication to or among the CE 1692 routers. 1694 One way to ensure that the proper route import/exports are done is to 1695 use two Route Targets, call them T1 and T2. If a particular VRF 1696 interface attaches to a CE router that is managed by the SP, then 1697 that VRF is configured to: 1699 - import routes that have T1 attached to them, and 1701 - attach T2 to addresses assigned to each end of its VRF 1702 interfaces. 1704 If a particular VRF interface attaches to the SP's Network Management 1705 system, then that VRF is configured to attach T1 to the address of 1706 that system, and to import routes that have T2 attached to them. 1708 13. Security 1710 13.1. 
Data Plane 1712 By security in the "data plane", we mean protection against the 1713 following possibilities: 1715 - Packets from within a VPN travel to a site outside the VPN, other 1716 than in a manner consistent with the policies of the VPN. 1718 - Packets from outside a VPN enter one of the VPN's sites, other 1719 than in a manner consistent with the policies of the VPN. 1721 Under the following conditions: 1723 1. a backbone router does not accept labeled packets over a 1724 particular data link, unless it is known that that data link 1725 attaches only to trusted systems, or unless it is known that 1726 such packets will leave the backbone before the IP header or 1727 any labels lower in the stack will be inspected, and 1729 2. labeled VPN-IPv4 routes are not accepted from untrusted or 1730 unreliable routing peers, 1732 3. no successful attacks have been mounted on the control plane, 1734 the data plane security provided by this architecture is virtually 1735 identical to that provided to VPNs by Frame Relay or ATM backbones. 1736 If the devices under the control of the SP are properly configured, 1737 data will not enter or leave a VPN unless authorized to do so. 1739 Condition 1 above can be stated more precisely. One should discard a 1740 labeled packet received from a particular neighbor unless one of the 1741 following two conditions holds: 1743 - the packet's top label has a label value which the receiving 1744 system has distributed to that neighbor, or 1746 - the packet's top label has a label value which the receiving 1747 system has distributed to a system beyond that neighbor (i.e., 1748 when it is known that the path from the system to which the label 1749 was distributed to the receiving system may be via that 1750 neighbor). 1752 Condition 2 above is of most interest in the case of inter-provider 1753 VPNs (see section 10). For inter-provider VPNs constructed according 1754 to scheme b) of section 10, condition 2 is easily checked. 
(The 1755 issue of security when scheme c) of section 10 is used is for further 1756 study.) 1758 It is worth noting that the use of MPLS makes it much simpler to 1759 provide data plane security than might be possible if one attempted 1760 to use some form of IP tunneling in place of the MPLS outer label. 1761 It is a simple matter to have one's border routers refuse to accept a 1762 labeled packet unless the first of the above conditions applies to 1763 it. It is rather more difficult to configure a router to refuse to 1764 accept an IP packet if that packet is an IP tunneled packet whose 1765 destination address is that of a PE router; certainly this is not 1766 impossible to do, but it has both management and performance 1767 implications. 1769 Note that if the PE routers support any "MPLS in IP" or "MPLS in GRE" 1770 or similar encapsulations, security is compromised unless either any 1771 such packets are filtered at the borders, or else some acceptable 1772 means of authentication (e.g., IPsec authentication) is carried in 1773 the packet itself. 1775 In the case where a number of CE routers attach to a PE router via a 1776 LAN interface, to ensure proper security, one of the following 1777 conditions must hold: 1779 1. All the CE routers on the LAN belong to the same VPN, or 1781 2. A trusted and secured LAN switch divides the LAN into multiple 1782 VLANs, with each VLAN containing only systems of a single VPN; 1783 in this case the switch will attach the appropriate VLAN tag to 1784 any packet before forwarding it to the PE router. 1786 Cryptographic privacy is not provided by this architecture, nor by 1787 Frame Relay or ATM VPNs. These architectures are all compatible with 1788 the use of cryptography on a CE-CE basis, if that is desired. 1790 The use of cryptography on a PE-PE basis is for further study. 1792 13.2. Control Plane 1794 The data plane security of the previous section depends on the 1795 security of the control plane. 
To ensure security, neither BGP nor 1796 LDP connections should be made with untrusted peers. The TCP/IP MD5 1797 authentication option should be used with both these protocols. The 1798 routing protocol within the SP's network should also be secured in a 1799 similar manner. 1801 13.3. Security of P and PE devices 1803 If the physical security of these devices is compromised, data plane 1804 security may also be compromised. 1806 The usual steps should be taken to ensure that IP traffic from the 1807 public Internet cannot be used to modify the configuration of these 1808 devices, or to mount Denial of Service attacks on them. 1810 14. Quality of Service 1812 Although not the focus of this paper, Quality of Service is a key 1813 component of any VPN service. In MPLS/BGP VPNs, existing L3 QoS 1814 capabilities can be applied to labeled packets through the use of the 1815 "experimental" bits in the shim header [MPLS-ENCAPS], or, where ATM 1816 is used as the backbone, through the use of ATM QoS capabilities. 1817 The traffic engineering work discussed in [MPLS-RSVP] is also 1818 directly applicable to MPLS/BGP VPNs. Traffic engineering could even 1819 be used to establish label switched paths with particular QoS 1820 characteristics between particular pairs of sites, if that is 1821 desirable. Where an MPLS/BGP VPN spans multiple SPs, the 1822 architecture described in [PASTE] may be useful. An SP may apply 1823 either intserv or diffserv capabilities to a particular VPN, as 1824 appropriate. 1826 15. Scalability 1828 We have discussed scalability issues throughout this paper. In this 1829 section, we briefly summarize the main characteristics of our model 1830 with respect to scalability. 1832 The Service Provider backbone network consists of (a) PE routers, (b) 1833 BGP Route Reflectors, (c) P routers (which are neither PE routers nor 1834 Route Reflectors), and, in the case of multi-provider VPNs, (d) 1835 ASBRs. 1837 P routers do not maintain any VPN routes.
In order to properly
forward VPN traffic, the P routers need only maintain routes to the
PE routers and the ASBRs. The use of two levels of labeling is what
makes it possible to keep the VPN routes out of the P routers.

A PE router maintains VPN routes, but only for those VPNs to which it
is directly attached.

Route reflectors can be partitioned among VPNs so that each partition
carries routes for only a subset of the VPNs supported by the Service
Provider. Thus no single route reflector is required to maintain
routes for all VPNs.

For inter-provider VPNs, if the ASBRs maintain and distribute
VPN-IPv4 routes, then the ASBRs can be partitioned among VPNs in a
similar manner, with the result that no single ASBR is required to
maintain routes for all the inter-provider VPNs. If multi-hop EBGP
is used, then the ASBRs need not maintain and distribute VPN-IPv4
routes at all.

As a result, no single component within the Service Provider network
has to maintain all the routes for all the VPNs. So the total
capacity of the network to support increasing numbers of VPNs is not
limited by the capacity of any individual component.

16. Intellectual Property Considerations

Cisco Systems may seek patent or other intellectual property
protection for some or all of the technologies disclosed in this
document. If any standards arising from this document are or become
protected by one or more patents assigned to Cisco Systems, Cisco
intends to disclose those patents and license them on reasonable and
non-discriminatory terms.

17. Acknowledgments

Significant contributions to this work have been made by Ravi
Chandra, Dan Tappan and Bob Thomas.

We also wish to thank Shantam Biswas for his review and
contributions.

18. Authors' Addresses

Eric C. Rosen
Cisco Systems, Inc.
1414 Massachusetts Avenue
Boxborough, MA 01719
E-mail: erosen@cisco.com

Yakov Rekhter
Juniper Networks
1194 N. Mathilda Avenue
Sunnyvale, CA 94089
E-mail: yakov@juniper.net

Tony Bogovic
Telcordia Technologies
445 South Street, Room 1A264B
Morristown, NJ 07960
E-mail: tjb@research.telcordia.com

Stephen John Brannon
Swisscom AG
Postfach 1570
CH-8301
Glattzentrum (Zuerich), Switzerland
E-mail: stephen.brannon@swisscom.com

Marco Carugi
Nortel Networks S.A.
Parc d'activités de Magny-Les Jeunes Bois CHATEAUFORT
78928 YVELINES Cedex 9 - FRANCE
E-mail: marco.carugi@nortelnetworks.com

Christopher J. Chase
AT&T
200 Laurel Ave
Middletown, NJ 07748
USA
E-mail: chase@att.com

Ting Wo Chung
Bell Nexxia
181 Bay Street
Suite 350
Toronto, Ontario
M5J2T3
E-mail: ting_wo.chung@bellnexxia.com

Eric Dean
Global One
12490 Sunrise Valley Dr.
Reston, VA 20170 USA
E-mail: edean@gip.net

Jeremy De Clercq
Alcatel Network Strategy Group
Francis Wellesplein 1
2018 Antwerp, Belgium
E-mail: jeremy.de_clercq@alcatel.be

Luyuan Fang
AT&T
IP Backbone Architecture
200 Laurel Ave.
Middletown, NJ 07748
E-mail: luyuanfang@att.com

Paul Hitchen
BT
BT Adastral Park
Martlesham Heath,
Ipswich IP5 3RE
UK
E-mail: paul.hitchen@bt.com

Manoj Leelanivas
Juniper Networks, Inc.
385 Ravendale Drive
Mountain View, CA 94043 USA
E-mail: manoj@juniper.net

Dave Marshall
Worldcom
901 International Parkway
Richardson, Texas 75081
E-mail: dave.marshall@wcom.com

Luca Martini
Level 3 Communications, LLC.
1025 Eldorado Blvd.
Broomfield, CO, 80021
E-mail: luca@level3.net

Monique Jeanne Morrow
Cisco Systems, Inc.
Glatt-com, 2nd floor
CH-8301
Glattzentrum, Switzerland
E-mail: mmorrow@cisco.com

Ravichander Vaidyanathan
Telcordia Technologies
445 South Street, Room 1C258B
Morristown, NJ 07960
E-mail: vravi@research.telcordia.com

Adrian Smith
BT
BT Adastral Park
Martlesham Heath,
Ipswich IP5 3RE
UK
E-mail: adrian.ca.smith@bt.com

Vijay Srinivasan
1200 Bridge Parkway
Redwood City, CA 94065
E-mail: vsriniva@cosinecom.com

Alain Vedrenne
Equant
Heraklion, 1041 route des Dolines, BP347
06906 Sophia Antipolis, Cedex, France
E-mail: Alain.Vedrenne@equant.com

19. Normative References

[BGP-MP] Bates, Chandra, Katz, and Rekhter, "Multiprotocol Extensions
for BGP4", RFC 2858, June 2000

[BGP-EXTCOMM] Rekhter, Tappan, and Sangli, "BGP Extended Communities
Attribute", draft-ietf-idr-bgp-ext-communities-05.txt, May 2002

[MPLS-ARCH] Rosen, Viswanathan, and Callon, "Multiprotocol Label
Switching Architecture", RFC 3031, January 2001

[MPLS-BGP] Rekhter and Rosen, "Carrying Label Information in BGP4",
RFC 3107, May 2001

[MPLS-ENCAPS] Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, and
Conta, "MPLS Label Stack Encoding", RFC 3032, January 2001

20.
Informational References

[BGP-AS4] Vohra and Chen, "BGP Support for Four-Octet AS Number
Space", draft-ietf-idr-as4bytes-06.txt, January 2003

[BGP-ORF] Chen and Rekhter, "Cooperative Route Filtering Capability
for BGP-4", draft-ietf-idr-route-filter-08.txt, January 2003

[BGP-RFSH] Chen, "Route Refresh Capability for BGP-4", RFC 2918,
March 2000

[BGP-RR] Bates, Chandra, and Chen, "BGP Route Reflection: An
alternative to full mesh IBGP", RFC 2796, April 2000

[IPSEC] Kent and Atkinson, "Security Architecture for the Internet
Protocol", RFC 2401, November 1998

[MPLS-ATM] Davie, Doolan, Lawrence, McCloghrie, Rosen, Swallow, and
Rekhter, "MPLS using LDP and ATM VC Switching", RFC 3035, January
2001

[MPLS/BGP-IPsec] Rosen, De Clercq, Paridaen, T'Joens, and Sargor,
"Use of PE-PE IPsec in RFC2547 VPNs",
draft-ietf-ppvpn-ipsec-2547-03.txt

[MPLS-FR] Conta, Doolan, and Malis, "Use of Label Switching on Frame
Relay Networks Specification", RFC 3034, January 2001

[MPLS-in-IP-GRE] Rekhter, Worster, and Rosen, "Encapsulating MPLS in
IP or GRE", draft-ietf-mpls-in-ip-or-gre-00.txt, January 2003

[MPLS-LDP] Andersson, Doolan, Feldman, Fredette, and Thomas, "LDP
Specification", RFC 3036, January 2001

[MPLS-RSVP] Awduche, Berger, Gan, Li, Srinivasan, and Swallow,
"RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December
2001

[PASTE] Li and Rekhter, "A Provider Architecture for Differentiated
Services and Traffic Engineering (PASTE)", RFC 2430, October 1998

[VPN-OSPF] Rosen, Psenak, and Pillay-Esnault, "OSPF as the PE/CE
Protocol in BGP/MPLS VPNs", draft-rosen-vpns-ospf-bgp-mpls-06.txt,
February 2003

[VPN-OSPF-0] Rosen, Psenak, and Pillay-Esnault, "OSPF Area 0 PE/CE
Links in BGP/MPLS VPNs", draft-rosen-ppvpn-ospf2547-area0-02.txt,
February 2003

[VPN-MCAST] Rosen, "Multicast in MPLS/BGP VPNs",
draft-rosen-vpn-mcast-05.txt, April 2003

21.
Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.