idnits 2.17.1

draft-marques-l3vpn-end-system-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 2 instances of too long lines in the document, the longest one
     being 7 characters in excess of 72.

  ** The abstract seems to contain references ([RFC4364]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 6 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 144: '...ficance only and SHOULD be allocated b...'

     RFC 2119 keyword, line 208: '...peering sessions SHALL be support the ...'

     RFC 2119 keyword, line 218: '... Network devices MAY have direct BGP s...'

     RFC 2119 keyword, line 318: '...nt XMPP sessions. These sessions MUST...'

     RFC 2119 keyword, line 322: '... An End-system MAY connect to multip...'

     (8 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 6, 2011) is 4579 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 5 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                         P. Marques
3    Internet-Draft
4    Expires: April 8, 2012                                           L. Fang
5                                                               Cisco Systems
6                                                                      P. Pan
7                                                               Infinera Corp
8                                                             October 6, 2011

10                End-system support for BGP-signaled IP/VPNs.
11                     draft-marques-l3vpn-end-system-00

13   Abstract

15      Network Service Providers often use BGP/MPLS IP VPNs [RFC4364] as the
16      control plane for overlay networks.  That solution has proven to
17      scale to a large number of VPNs and attachment points and is one
18      familiar to network equipment software.

20      There is a significant interest in the industry in building overlay
21      networks in which end-systems are themselves the direct participants,
22      along with network equipment such as service appliances.

24      This document proposes an extension of the BGP IP VPN model to serve
25      as the signaling protocol for host-based overlay networks along with
26      an XMPP interface that provides a bridge between the software
27      concepts familiar to end-points and those familiar to network
28      equipment.

30   Status of this Memo

32      This Internet-Draft is submitted in full conformance with the
33      provisions of BCP 78 and BCP 79.
35      Internet-Drafts are working documents of the Internet Engineering
36      Task Force (IETF).  Note that other groups may also distribute
37      working documents as Internet-Drafts.  The list of current Internet-
38      Drafts is at http://datatracker.ietf.org/drafts/current/.

40      Internet-Drafts are draft documents valid for a maximum of six months
41      and may be updated, replaced, or obsoleted by other documents at any
42      time.  It is inappropriate to use Internet-Drafts as reference
43      material or to cite them other than as "work in progress."

45      This Internet-Draft will expire on April 8, 2012.

47   Copyright Notice

48      Copyright (c) 2011 IETF Trust and the persons identified as the
49      document authors.  All rights reserved.

51      This document is subject to BCP 78 and the IETF Trust's Legal
52      Provisions Relating to IETF Documents
53      (http://trustee.ietf.org/license-info) in effect on the date of
54      publication of this document.  Please review these documents
55      carefully, as they describe your rights and restrictions with respect
56      to this document.  Code Components extracted from this document must
57      include Simplified BSD License text as described in Section 4.e of
58      the Trust Legal Provisions and are provided without warranty as
59      described in the Simplified BSD License.

61   Table of Contents

63      1.  Introduction
64      2.  End-system functionality
65      3.  Operational Model
66      4.  XMPP client interface
67      5.  VPN NLRI
68      6.  Security Considerations
69      7.  References
70      Authors' Addresses

72   1.  Introduction

74      Data center applications require private networks connecting multiple
75      "Virtual Machines" belonging to the same administrative "user", as
76      well as connecting them to network elements and appliances.

78      In this context, it is a common goal for the data-center forwarding
79      infrastructure to be isolated from the knowledge of the private
80      network.  The set of routers and switches that interconnects physical
81      machines in the data-center is assumed to provide an IP service (with
82      or without the use of IEEE 802.1 technologies).

84      The Virtual Private Networks (VPNs) associated with each individual
85      administrative domain can be built without the knowledge of the data-
86      center connectivity layer as an overlay network.  This proposal
87      leverages the technology used in the Service Provider managed VPN
88      space and extends it to address the problem of interconnecting
89      virtual interfaces on end-systems.  In both applications there is the
90      need to be able to manage at scale a very large number of VPNs and
91      attachment points.  And in both cases there is the need to support
92      the interchange of traffic between different VPNs.

94      This document defines how BGP-signaled IP/VPNs can be used to
95      interconnect end-systems and network elements.  It assumes that the
96      forwarding layer uses IP over GRE as defined by [RFC4023].  Other
97      transport layers such as native MPLS or 802.1ah can also be used with
98      the same signaling approach.

100     When this document uses the term 'Infrastructure IP' addresses, it
101     refers to the addresses used in the outer header of GRE packets.  In
102     the case of a transport other than IP over GRE, this would be the
103     Subnetwork Point of Attachment (SNPA) address corresponding to the
104     multi-access network providing connectivity to the end-systems.

106     BGP is not an interface that application software is familiar with.
107     In order to bridge the gap between concepts familiar to network
108     devices and those familiar to end-system developers, this document
109     defines an XMPP client interface to be used by end-systems.  It
110     defines the procedures to interchange data between XMPP and BGP IP
111     VPN sessions along with the corresponding data schemas.  Networking
112     devices may opt to receive the signaling information directly via
113     BGP.

115  2.  End-system functionality

117     For the purposes of this document we assume that each end-system
118     executes a 'Host Operating System' with the ability to:

120        Create virtual interfaces (typically ethernet interfaces).

122        Associate a given virtual interface with a specific "Virtual
123        Routing and Forwarding (VRF)" table.

125        Store entries in the VRF table that map a VRF-specific IP prefix
126        into a next-hop which contains a destination IP address and a 20-
127        bit label.

129        Encapsulate outgoing packets according to [RFC4023] using the result
130        of the VRF lookup.

132        Associate incoming packets with a VRF according to the 20-bit
133        label contained immediately after the GRE header.

135        Expose a programmatic interface to create, update and delete VRF
136        table entries.

138     The 'Host Operating System' may choose to associate the virtual
139     interfaces with specific 'Virtual Machines' or use other policies to
140     manage the application access to these interfaces.

142     Hosts should support the ability to associate multiple virtual
143     interfaces with the same VRF.  The 20-bit label which is associated
144     with a VRF is of local significance only and SHOULD be allocated by
145     the end-system.

147     The procedure that determines that a VRF should be configured on a
148     particular end-system, as well as which IP addresses are associated
149     with each interface, is outside the scope of this document.  We
150     assume that statically assigned IP addresses are used.

152     The VRFs support IP unicast traffic only.  Multicast support is
153     subject for further study and will be detailed in a separate
154     document.  Both IPv4 and IPv6 are supported and the term 'IP' can
155     refer to either version of the Internet Protocol.

157     The VRF table is populated by the signaling mechanisms described
158     below and may contain both host length (i.e. /32 and /128 for IPv4
159     and v6 respectively) and subnet prefixes.  As an example, a VPN with
160     access to the external networks would probably contain a default
161     route plus a set of host length entries for all the VMs in the same
162     VPN.

164     In the terminology used in the BGP-signaled IP/VPN standard
165     [RFC4364], an end-system acts as a 'Provider Edge' (PE) device in
166     terms of its forwarding capabilities, with the virtual interfaces
167     that it exposes (for instance to virtual machines) as the 'Customer
168     Edge' (CE) interfaces.

170  3.  Operational Model

172     In the simplest case, a VPN is a collection of systems that are
173     allowed to exchange traffic with each other and where all the VRFs in
174     the VPN contain all the routing entries for the VPN.  Only members of
175     the VPN are allowed to exchange traffic with each other.  We can
176     refer to these as symmetrical VPNs since all VRFs contain the same
177     routing information.

179     When end-systems join a given VPN they advertise their membership by
180     advertising the VPN-specific IP address associated with a particular
181     virtual interface as well as its binding to the infrastructure IP
182     address associated with the host.

184     Infrastructure addresses are routable in the underlying transport
185     network (e.g. the data-center network), while VPN addresses are
186     routable on the VPN network only.

188     End-systems subscribe to the contents of the VPN routing tables in
189     which they have members.  This information is then
190     used to populate the host's routing tables.  It may contain both host
191     routes (i.e. IPv4 32-bit prefixes or IPv6 128-bit prefixes) and
192     routes to gateways that interconnect other networks.

194     The signaling network delivers the membership advertisements
195     generated by the end-systems to other members of the same VPN, subject
196     to policy controls.

198     When a particular VM "moves" from one physical end-system to another,
199     its respective VPN address will be advertised by the new system and
200     that notification propagated to all attachment points of that VPN.

202     This document assumes two types of applications that perform network
203     signaling functions: BGP Route Reflectors (RRs) and BGP/XMPP
204     signaling gateways.  Both functions may be collocated in the same
205     physical device.

207     The BGP Route Reflectors accept connections from gateways or native
208     BGP devices.  These BGP peering sessions SHALL support the address
209     families: VPN-IPv4 (1, 128), VPN-IPv6 (2, 128) and RT-Constraint (1,
210     132) [RFC4684].

212     The XMPP signaling gateways maintain persistent connections to a
213     subset of the end-systems of the domain and provide a 'pubsub' API to
214     the contents of each specific VPN routing table.  These systems are
215     not in the forwarding plane and do not need to be collocated with a
216     network device.

218     Network devices MAY have direct BGP sessions to the BGP Route
219     Reflectors.  For instance, a router or security appliance that
220     supports BGP/MPLS IP VPNs over GRE may use its existing functionality
221     to inter-operate directly with a collection of Virtual Machines.

223     The BGP/XMPP gateways implement the VRF policy functionality that is
224     associated with PE routers in the pure BGP IP/VPN case.  In these
225     signaling gateways, the 'publish-subscribe' messages from the end-
226     systems are associated with a VRF-specific signaling table.  Each of
227     these routing tables contains import and export policies which
228     provide fine-grained control over the table contents.
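
The import and export policies attached to each gateway routing table select routes by 'Route Target'. The following is a minimal sketch of that selection, not a definitive implementation. The class names `Vrf` and `VpnRoute`, the textual form of the Route Targets, and the `10.2.0.5` and `192.168.3.1` addresses are illustrative assumptions that do not appear in this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VpnRoute:
    prefix: str                       # VPN IP prefix, e.g. "10.1.1.2/32"
    next_hop: str                     # infrastructure IP address of the host
    label: int                        # 20-bit label allocated by that host
    route_targets: frozenset = frozenset()


@dataclass
class Vrf:
    name: str
    import_targets: set               # Route Targets accepted into this table
    export_targets: set               # Route Targets attached on advertisement
    table: dict = field(default_factory=dict)

    def export(self, prefix, next_hop, label):
        """Tag a locally published route with this VRF's export Route Targets."""
        return VpnRoute(prefix, next_hop, label, frozenset(self.export_targets))

    def maybe_import(self, route):
        """Install the route only if it carries at least one imported target."""
        if self.import_targets & route.route_targets:
            self.table[route.prefix] = route
            return True
        return False


# A symmetrical VPN uses the same target for import and export.
red = Vrf("red", import_targets={"target:64512:1"},
          export_targets={"target:64512:1"})
# A second VPN that additionally imports the routes exported by "red".
blue = Vrf("blue", import_targets={"target:64512:1", "target:64512:2"},
           export_targets={"target:64512:2"})

r = red.export("10.1.1.2/32", "192.168.2.1", 20)
b = blue.export("10.2.0.5/32", "192.168.3.1", 30)
```

With these hypothetical policies, `blue.maybe_import(r)` installs the route from `red`, while `red.maybe_import(b)` rejects the route from `blue`, because `red` does not import the target that `blue` exports.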
230     An export policy associates VPN routing information with one or more
231     6-byte values known as 'Route Targets'.  These 'Route Targets' are
232     associated with the routes as they are advertised out to other BGP
233     speakers.

235     Import policies, on the other hand, select via 'Route Targets', from
236     all the available routing information, which routes should be imported
237     into a VPN-specific routing table.

239     A symmetrical VPN uses the same configuration for both import and
240     export.  By controlling these policies it is possible to selectively
241     allow direct traffic exchanges between members of different VPNs,
242     assuming their respective IP addresses are non-overlapping.

244                    +--------+                 +--------+
245     VM1 -- veth0 --| host 1 |=== [network] ===| host 2 |-- veth0 -- VM2
246                    +--------+                 +--------+

248     IP pkt ===> GRE encap ===> [IP net] ===> GRE decap ===> IP pkt
249              [192.168.2.1, 20]              map 20 to veth0

251                                  Figure 1

253              +----------------+--------------+-------+
254              | VPN IP address | Host address | label |
255              +----------------+--------------+-------+
256              | 10.1.1.1/32    | localhost    | 10    |
257              |                |              |       |
258              | 10.1.1.2/32    | 192.168.2.1  | 20    |
259              +----------------+--------------+-------+

261                           VRF table on host1

263                                  Table 1

265     The figure and table above contain an example in which IP packets are
266     transmitted from one VPN interface (with address 10.1.1.1) to another
267     VPN interface (with address 10.1.1.2).  As previously mentioned, the
268     virtual ethernet interfaces function as CE interfaces in a
269     traditional BGP-signaled IP VPN, while the end-system provides the
270     forwarding functionality equivalent to a PE device.
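
The forwarding steps in Figure 1 and Table 1 can be sketched as follows: a longest-prefix lookup in the VRF, then an MPLS-in-GRE encapsulation in the style of [RFC4023] (GRE protocol type 0x8847 followed by one label stack entry). This is a sketch under stated assumptions, not a definitive implementation. The function names, the use of `None` for the 'localhost' entry, and the default TTL of 64 are assumptions. The outer IP header, addressed to the infrastructure next hop, is left to the host's IP stack.

```python
import ipaddress
import struct

GRE_PROTO_MPLS_UNICAST = 0x8847  # Ethertype carried in the GRE protocol field

# Table 1: VPN prefix -> (infrastructure next hop, label); None = local delivery.
VRF_HOST1 = {
    ipaddress.ip_network("10.1.1.1/32"): (None, 10),
    ipaddress.ip_network("10.1.1.2/32"): ("192.168.2.1", 20),
}


def vrf_lookup(vrf, dst):
    """Longest-prefix match of dst in the VRF; returns (next_hop, label) or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in vrf if addr in net]
    if not matches:
        return None
    return vrf[max(matches, key=lambda net: net.prefixlen)]


def encapsulate(label, inner_packet, ttl=64):
    """Prefix inner_packet with a GRE header and a single MPLS label stack entry."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    gre = struct.pack("!HH", 0, GRE_PROTO_MPLS_UNICAST)  # no optional GRE fields
    # Label (20 bits) | traffic class (3 bits, zero) | bottom-of-stack bit | TTL.
    shim = struct.pack("!I", (label << 12) | (1 << 8) | ttl)
    return gre + shim + inner_packet


def decapsulate(packet):
    """Return (label, inner_packet) for a payload built by encapsulate()."""
    flags, proto = struct.unpack("!HH", packet[:4])
    if flags != 0 or proto != GRE_PROTO_MPLS_UNICAST:
        raise ValueError("not a plain MPLS-in-GRE payload")
    (entry,) = struct.unpack("!I", packet[4:8])
    return entry >> 12, packet[8:]


# Host 1 sends to 10.1.1.2; host 2 recovers label 20 and maps it to veth0.
next_hop, label = vrf_lookup(VRF_HOST1, "10.1.1.2")
wire = encapsulate(label, b"inner IP packet")
```

On the receiving side, `decapsulate(wire)` yields the label, which the host uses to select the VRF and virtual interface for the inner packet.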
272          +--------+       +-----------+       +--------+
273          | host 1 | <===> | signaling | <===> | BGP RR |
274          +--------+       |  gateway  |       +--------+
275                           +-----------+

277                                  Figure 2

279        +----------------+-------------+-------+-----------+
280        | VPN IP address | SNPA        | label | Known via |
281        +----------------+-------------+-------+-----------+
282        | 10.1.1.1/32    | 192.168.1.1 | 10    | XMPP      |
283        |                |             |       |           |
284        | 10.1.1.2/32    | 192.168.2.1 | 20    | BGP       |
285        +----------------+-------------+-------+-----------+

287                 VPN Routing table on signaling gateway

289                                  Table 2

291     The signaling network corresponding to the same example is depicted
292     above.  The signaling gateway is an out-of-band system which speaks
293     XMPP to the host and BGP to the BGP RRs.  The table above
294     represents the routing table on the gateway that corresponds to the
295     VPN of the example.  Host 2 would be connected to another signaling
296     gateway which would be in turn connected to the BGP RR mesh.

298     The gateway is configured via an external mechanism with the
299     parameters that correspond to the VPNs in use by its clients along
300     with their respective VRF import and export policies.

302     XMPP publish requests are translated into routing entries on this
303     table, which are then advertised via BGP, using standard BGP-signaled
304     IP VPN mechanisms.  BGP-learned routes are also imported into this
305     routing table.  Any changes to its content are advertised to local
306     XMPP clients.

308     In comparison with traditional IP VPNs, the signaling gateway is
309     performing the PE functionality, while XMPP is used as a PE-CE
310     routing protocol.

312  4.  XMPP client interface

314     The communication between end-systems and the signaling gateway uses
315     the XMPP protocol with the PubSub Collection Nodes [pubsub] extension
316     in order to exchange VPN route information.

318     End-systems establish persistent XMPP sessions.  These sessions MUST
319     use the XMPP Ping [xmpp-ping] extension in order to detect end-system
320     failures.
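
Failure detection with XMPP Ping can be sketched as below. The request is an `iq` stanza of type `get` carrying an empty `ping` element in the `urn:xmpp:ping` namespace, as defined by XEP-0199. Everything else is an assumption made for illustration: the `LivenessTracker` class, the 10-second interval and the three-miss threshold are hypothetical and are not specified by this document.

```python
from xml.sax.saxutils import quoteattr

PING_NS = "urn:xmpp:ping"  # namespace defined by XEP-0199


def build_ping(to_jid, request_id):
    """Serialize an XEP-0199 ping request addressed to to_jid."""
    return '<iq type="get" to=%s id=%s><ping xmlns="%s"/></iq>' % (
        quoteattr(to_jid), quoteattr(request_id), PING_NS)


class LivenessTracker:
    """Tracks the last time each end-system answered a ping (hypothetical helper)."""

    def __init__(self, interval=10.0, max_missed=3):
        # Both values are assumptions; the document does not specify them.
        self.deadline = interval * max_missed
        self.last_seen = {}

    def heard_from(self, jid, now):
        """Record an iq result (or any other stanza) received from jid at time now."""
        self.last_seen[jid] = now

    def failed(self, now):
        """Return the end-systems that have been silent for longer than the deadline."""
        return sorted(jid for jid, seen in self.last_seen.items()
                      if now - seen > self.deadline)


tracker = LivenessTracker()
tracker.heard_from("host-a", now=0.0)
tracker.heard_from("host-b", now=25.0)
```

Timestamps are passed in explicitly so that the sketch stays deterministic. A gateway would call `tracker.failed(now)` periodically and treat the returned sessions as closed.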
322     An End-system MAY connect to multiple VPN-signaling gateways for
323     reliability.  In this case it SHOULD publish its information to each
324     of the gateways.  It MAY choose to subscribe to VPN routing
325     information only once, from one of the available gateways.

327     When the session closes, the information advertised by an end-system
328     SHOULD be deleted after a configurable timeout.  This timeout should
329     default to 60 seconds.

331          +---------+             +--------+
332          | gateway | ----------- | BGP RR |
333          +---------+             +--------+
334              //      \          /
335            XMPP       \        /
336           //           \      /
337     +------------+      \    /
338     | end-system |       \  /
339     +------------+        \/
340           \\              /\
341            XMPP          /  \
342              \\         /    \
343          +---------+   /      \  +--------+
344          | gateway | ----------- | BGP RR |
345          +---------+             +--------+

347     The figure above represents a typical configuration in which an end-
348     system is homed to multiple gateways, which are in turn connected to
349     multiple BGP route reflectors.  In a deployment, the number of
350     gateways would correspond to the number of end-systems divided
351     by the gateway capacity in terms of number of XMPP sessions, while
352     the BGP RRs scale in terms of the number of gateways attached to them.

354     The XMPP "jid" used by the end-system shall be a 6-byte value that
355     uniquely identifies the host in the domain.  This specification
356     recommends the use of the 802 MAC address of one of the physical
357     ethernet interfaces of the end-system, when present.

359     Each VPN shall be identified by a 64 ASCII character string.

361     The guest system software on an end-system SHALL establish an XMPP
362     session with its configured signaling gateways before creating
363     virtual interfaces.

365     When a virtual interface is created, for instance as a result of a
366     Virtual Machine being instantiated on an end-system, the host
367     operating-system software shall generate an XMPP Publish message to
368     the VPN-signaling gateway.
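
The values an end-system has to assemble before publishing can be sketched as follows. This is a hedged illustration only. Writing the 6-byte identifier as 12 hex digits, reading the 64-character VPN name as an upper bound, and the dictionary keys are all assumptions; the document does not fix a textual encoding for any of them.

```python
import ipaddress

MAX_LABEL = 2**20 - 1      # labels are 20 bits wide
MAX_VERSION = 2**32 - 1    # the route version number is 32 bits wide


def jid_from_mac(mac, domain):
    """Build a JID whose local part is the 6-byte MAC, written as 12 hex digits."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return f"{raw.hex()}@{domain}"


def publish_fields(vpn_name, vpn_address, infrastructure_address, label, version=0):
    """Validate and collect the values carried by a publish request."""
    if not (vpn_name.isascii() and 1 <= len(vpn_name) <= 64):
        raise ValueError("VPN name must be 1 to 64 ASCII characters")
    if not 0 <= label <= MAX_LABEL:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= version <= MAX_VERSION:
        raise ValueError("version must fit in 32 bits")
    addr = ipaddress.ip_address(vpn_address)
    next_hop = ipaddress.ip_address(infrastructure_address)
    return {
        "node": vpn_name,                          # collection node for the VPN
        "nlri": f"{addr}/{addr.max_prefixlen}",    # host route: /32 or /128
        "next_hop": str(next_hop),                 # infrastructure IP address
        "label": label,
        "version": version,
    }


jid = jid_from_mac("01:02:03:04:05:06", "domain.org")
fields = publish_fields("vpn-customer-name", "10.1.1.1", "192.168.1.1", 10)
```

The address and label values are taken from the example tables in Section 3. An IPv6 VPN address would yield a /128 host route in the same way.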
370     Publish request from end-system to gateway:

        [XML example; element markup not recoverable.  Surviving fragments:]

374         to='network-control.domain.org'
375         id='request1'>
380               'vpn-ip-address>/32'
381               'infrastructure-ip-address'

        [Second XML example (lines 390-399); not recoverable.]

401     In the request above the node 'vpn-customer-name' is assumed to be a
402     collection which is implicitly created by the VPN-signaling gateway.

404     The VPN-signaling gateway will convert the information received in
405     the 'publish' request into the corresponding BGP route information
406     such that:

408        It associates the specific request with a local VRF with the name
409        specified in the collection 'node' attribute.

411        It creates a route with a 'Route Distinguisher' (RD)
412        containing the end-system's 'system-id' and the specified 'label'
413        and NLRI prefix.

415        It associates this route with the specified SNPA address.

417        It associates the route with an extended community TBD containing
418        the version number.

420     Subscription request from end-system to gateway:

        [XML example (lines 422-429); not recoverable.]

431     Update notification from gateway to end-system:

        [XML example; element markup not recoverable.  Surviving fragments:]

438               'vpn-ip-address>/32'
439               'infrastructure-ip-address'
445         ...

451     Notifications should be generated whenever a VPN route is added,
452     modified or deleted.

454     Note that the Update from the signaling gateway to the end-point does
455     not contain the system-id of the destination end-point.  When
456     multiple possible routes exist for a given VPN IP address, for
457     instance because the VM may be in the process of moving location, it
458     is the responsibility of the gateway to select the best path to
459     advertise to the end-system.

461     In situations where an automated system is controlling the
462     instantiation of VMs it may be possible to have that system assign a
463     non-decreasing version number for each instantiation of that
464     particular VM.  In that case, this number, carried in the 'version'
465     field, may be used to help gateways select the most recent
466     instantiation of a VM during the interval of time where multiple
467     routes are present.

469  5.  VPN NLRI

471     When a VPN-signaling gateway receives a request to create or modify a
472     VPN route it SHALL generate a BGP VPN route advertisement with the
473     corresponding information using the BGP address family corresponding
474     to the address family specified by the end-system.

476     It is assumed that the VPN-signaling gateways contain information
477     regarding the mapping between 'vpn-customer-names' and BGP Route
478     Targets used to import and export information from the associated
479     VRFs.  This mapping is known via an out-of-band mechanism not
480     specified in this document.

482     Whenever a VRF in the gateway contains local routing information, the
483     gateway shall advertise the corresponding RT-Constraint route target
484     routes in BGP, which perform a parallel function to the subscription
485     requests in XMPP.

487     The 32-bit route version number defined in the XML schema is
488     advertised into BGP as an extended community with type TBD.

490     Signaling gateways SHOULD automatically assign a BGP route
491     distinguisher per VPN routing table.

493  6.  Security Considerations

495     The signaling protocol defines the access control policies for each
496     virtual interface and any VM associated with it.  It is important to
497     secure the end-system access to signaling gateways and the BGP
498     infrastructure itself.

500     The XMPP session between end-systems and the XMPP gateways MUST use
501     mutual authentication.  One possible strategy is to distribute pre-
502     signed certificates to end-systems which are presented as proof of
503     authorization to the signaling gateway.

505     BGP sessions MUST be authenticated using a shared secret.  This
506     document recommends that BGP speaking systems filter traffic on port
507     179 such that only IP addresses which are known to participate in the
508     BGP signaling protocol are allowed.

510  7.  References

512     [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
513                MPLS in IP or Generic Routing Encapsulation (GRE)",
514                RFC 4023, March 2005.

516     [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
517                Networks (VPNs)", RFC 4364, February 2006.

519     [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
520                R., Patel, K., and J. Guichard, "Constrained Route
521                Distribution for Border Gateway Protocol/MultiProtocol
522                Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
523                Private Networks (VPNs)", RFC 4684, November 2006.

525     [xmpp-ping]
526                "XMPP Ping", XEP 0199, June 2009.

528     [pubsub]   "PubSub Collection Nodes", XEP 0248, September 2010.

530  Authors' Addresses

532     Pedro Marques

534     Email: pedro.r.marques@gmail.com

536     Luyuan Fang
537     Cisco Systems
538     111 Wood Avenue South
539     Iselin, NJ 08830

541     Email: lufang@cisco.com

543     Ping Pan
544     Infinera Corp
545     140 Caspian Ct.
546     Sunnyvale, CA 94089

548     Email: ppan@infinera.com