Network working group                                             X. Xu
Internet Draft                                                   Huawei
Category: Informational
                                                              R. Raszuk

                                                               S. Hares

                                                                 Y. Fan
                                                           China Telecom

                                                           C. Jacquenet
                                                                 Orange

                                                                T. Boyes
                                                            Bloomberg LP

                                                                  B. Fee
                                                        Extreme Networks

Expires: September 2014                                   March 3, 2014

        Virtual Subnet: A L3VPN-based Subnet Extension Solution

                   draft-ietf-l3vpn-virtual-subnet-00

Abstract

   This document describes a Layer 3 Virtual Private Network (L3VPN)-
   based subnet extension solution referred to as Virtual Subnet, which
   can be used as a Layer 3 network virtualization overlay approach for
   data center interconnect.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 3, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Terminology
   3. Solution Description
      3.1. Unicast
           3.1.1. Intra-subnet Unicast
           3.1.2. Inter-subnet Unicast
      3.2. Multicast
      3.3. CE Host Discovery
      3.4. ARP/ND Proxy
      3.5. CE Host Mobility
      3.6. Forwarding Table Scalability on Data Center Switches
      3.7. ARP/ND Cache Table Scalability on Default Gateways
      3.8. ARP/ND and Unknown Unicast Flood Avoidance
      3.9. Path Optimization
   4. Limitations
      4.1. Non-support of Non-IP Traffic
      4.2. Non-support of IP Broadcast and Link-local Multicast
      4.3. TTL and Traceroute
   5. Security Considerations
   6. IANA Considerations
   7. Acknowledgements
   8. References
      8.1. Normative References
      8.2. Informative References
   Authors' Addresses

1. Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, migration, consolidation, expansion, and
   disaster avoidance.  It is generally admitted that IP renumbering of
   servers (i.e., VMs) after such a migration is complex and costly,
   and risks extending the business downtime during the migration.  To
   allow the migration of a VM from one data center to another without
   IP renumbering, the subnet on which the VM resides needs to be
   extended across these data centers.

   In Infrastructure-as-a-Service (IaaS) cloud data center
   environments, to achieve subnet extension across multiple data
   centers in a scalable way, the following requirements SHOULD be
   considered for any data center interconnect solution:

   1) VPN Instance Space Scalability

      In a modern cloud data center environment, thousands or even
      tens of thousands of tenants could be hosted over a shared
      network infrastructure.  For security and performance isolation
      purposes, these tenants need to be isolated from one another.
      Hence, the data center interconnect solution SHOULD be capable
      of providing a large enough Virtual Private Network (VPN)
      instance space for tenant isolation.

   2) Forwarding Table Scalability

      With the development of server virtualization technologies, a
      single cloud data center containing millions of VMs is not
      uncommon.  A population of this size already poses a significant
      challenge to data center switches, especially core/aggregation
      switches, from the perspective of forwarding table scalability.
      If multiple data centers of such a scale were interconnected at
      Layer 2, this challenge would become even worse.  Hence, an
      ideal data center interconnect solution SHOULD prevent the
      forwarding tables of data center switches from growing
      multiplicatively as the number of data centers to be
      interconnected increases.

   3) ARP/ND Cache Table Scalability on Default Gateways

      [RFC6820] notes that the Address Resolution Protocol (ARP)/
      Neighbor Discovery (ND) cache tables maintained by data center
      default gateways in cloud data centers can raise both
      scalability and security issues.  Therefore, an ideal data
      center interconnect solution SHOULD prevent the ARP/ND cache
      table size from growing multiplicatively as the number of data
      centers to be connected increases.

   4) ARP/ND and Unknown Unicast Flood Suppression or Avoidance

      It is well known that the flooding of ARP/ND broadcast/multicast
      and unknown unicast traffic within a large Layer 2 network is
      likely to affect the performance of networks and hosts.  When
      multiple data centers, each containing millions of VMs, are
      interconnected across the Wide Area Network (WAN) at Layer 2,
      the impact of such flooding becomes even worse.  As such, it
      becomes increasingly desirable for data center operators to
      suppress or even avoid the flooding of ARP/ND broadcast/
      multicast and unknown unicast traffic across data centers.

   5) Path Optimization

      A subnet usually indicates a location in the network.  However,
      when a subnet has been extended across multiple geographically
      dispersed data center locations, the location semantics of that
      subnet are no longer retained.  As a result, traffic from a
      cloud user (i.e., a VPN user) destined for a server located at
      one data center location of the extended subnet may first
      arrive at another data center location according to the subnet
      route, and then be forwarded to the location where the server
      actually resides.  This suboptimal routing obviously results in
      an unnecessary consumption of the bandwidth resources intended
      for data center interconnection.  Furthermore, in the case
      where traditional VPLS technology [RFC4761] [RFC4762] is used
      for data center interconnect and the default gateways of the
      different data center locations are configured within the same
      virtual router redundancy group, the return traffic from that
      server to the cloud user may be forwarded at Layer 2 to a
      default gateway located at one of the remote data center
      premises, rather than to the one placed at the local data
      center location.  This suboptimal routing also unnecessarily
      consumes the bandwidth resources intended for data center
      interconnect.
   This document describes an L3VPN-based subnet extension solution
   referred to as Virtual Subnet (VS), which can meet all of the
   requirements for cloud data center interconnect described above.
   Since VS mainly reuses existing technologies, including BGP/MPLS IP
   VPN [RFC4364] and ARP/ND proxy [RFC925] [RFC1027] [RFC4389], it
   allows service providers offering IaaS public cloud services to
   interconnect their geographically dispersed data centers in a much
   more scalable way.  More importantly, the data center
   interconnection design can rely upon their existing MPLS/BGP IP VPN
   infrastructures and their experience in delivering and operating
   MPLS/BGP IP VPN services.

   Although Virtual Subnet is described in this document as a data
   center interconnection solution, there is no reason to assume that
   this technology could not be used within data centers.

   Note that the approach described in this document is not intended
   to achieve an exact emulation of Layer 2 connectivity, and
   therefore it can only support a restricted Layer 2 connectivity
   service model with the limitations stated in Section 4.  A
   discussion of the environments in which this service model is
   suitable is outside the scope of this document.

2. Terminology

   This memo makes use of the terms defined in [RFC4364].

3. Solution Description

3.1. Unicast

3.1.1. Intra-subnet Unicast

                             +--------------------+
   +-----------------+       |                    |    +-----------------+
   |VPN_A:1.1.1.1/24 |       |                    |    |VPN_A:1.1.1.1/24 |
   |              \  |       |                    |    |  /              |
   | +------+      \++----++                  ++----++/      +------+ |
   | |Host A+-------+ PE-1 |                  | PE-2 +-------+Host B| |
   | +------+\      ++-+--++                  ++--+-++      /+------+ |
   | 1.1.1.2/24      | |  |                    |  | |      1.1.1.3/24 |
   |                 | |  |                    |  | |                 |
   | DC West         | |  |  IP/MPLS Backbone  |  | |         DC East |
   +-----------------+ |  |                    |  | +-----------------+
                       |  +--------------------+  |
                       |                          |
            VRF_A :    V               VRF_A :    V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |   PE-1  |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |   PE-2  |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
                Figure 1: Intra-subnet Unicast Example

   As shown in Figure 1, two CE hosts (i.e., Hosts A and B) belonging
   to the same subnet (i.e., 1.1.1.0/24) are located in different data
   centers (i.e., DC West and DC East), respectively.  The PE routers
   (i.e., PE-1 and PE-2) used for interconnecting these two data
   centers create host routes for their local CE hosts and then
   advertise them via L3VPN signaling.  Meanwhile, ARP proxy is
   enabled on the VRF attachment circuits of these PE routers.

   Now assume that Host A sends an ARP request for Host B before
   communicating with it.  Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets for Host B to PE-1.  PE-1 tunnels such
   packets towards PE-2, which in turn forwards them to Host B.  Thus,
   Hosts A and B can communicate with each other as if they were
   located within the same subnet.
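   To make the forwarding behavior above concrete, the following
   Python sketch (illustrative only; the function names are invented
   for this example and are not part of any PE implementation) models
   PE-1's VRF_A from Figure 1 as a list of routes and applies a
   longest-prefix-match lookup: traffic to the local host matches a
   "Direct" /32 host route and is delivered on the attachment
   circuit, while traffic to the remote host matches the /32 host
   route learnt via IBGP and is tunneled over the MPLS backbone to
   PE-2:

      import ipaddress

      # PE-1's VRF_A from Figure 1: (prefix, nexthop, protocol).
      VRF_A = [
          ("1.1.1.1/32", "127.0.0.1", "Direct"),
          ("1.1.1.2/32", "1.1.1.2",   "Direct"),
          ("1.1.1.3/32", "PE-2",      "IBGP"),
          ("1.1.1.0/24", "1.1.1.1",   "Direct"),
      ]

      def lookup(vrf, dst):
          """Return the longest-prefix-match route for dst, if any."""
          addr = ipaddress.ip_address(dst)
          def plen(route):
              return ipaddress.ip_network(route[0]).prefixlen
          matches = [r for r in vrf
                     if addr in ipaddress.ip_network(r[0])]
          return max(matches, key=plen) if matches else None

      def forward(vrf, dst):
          """Describe the PE's forwarding decision for dst."""
          route = lookup(vrf, dst)
          if route is None:
              return "drop"
          prefix, nexthop, proto = route
          if proto == "IBGP":
              return "tunnel over the MPLS backbone to " + nexthop
          return "deliver locally via next hop " + nexthop

      # Host A (1.1.1.2) sends to Host B (1.1.1.3): PE-1 matches the
      # /32 host route advertised by PE-2 and tunnels the packet.
      assert forward(VRF_A, "1.1.1.3").endswith("PE-2")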
3.1.2. Inter-subnet Unicast

                             +--------------------+
   +-----------------+       |                    |    +-----------------+
   |VPN_A:1.1.1.1/24 |       |                    |    |VPN_A:1.1.1.1/24 |
   |              \  |       |                    |    |  /              |
   | +------+      \++----++                  ++----++/      +------+ |
   | |Host A+-------+ PE-1 |                  | PE-2 +-+-----+Host B| |
   | +------+\      ++-+--++                  ++--+-++ |    /+------+ |
   | 1.1.1.2/24      | |  |                    |  | |  |   1.1.1.3/24 |
   | GW=1.1.1.4      | |  |                    |  | |  |   GW=1.1.1.4 |
   |                 | |  |                    |  | |  |    +------+  |
   |                 | |  |                    |  | |  +----+  GW  +--|
   |                 | |  |                    |  | |      /+------+  |
   |                 | |  |                    |  | |     1.1.1.4/24  |
   |                 | |  |                    |  | |                 |
   | DC West         | |  |  IP/MPLS Backbone  |  | |         DC East |
   +-----------------+ |  |                    |  | +-----------------+
                       |  +--------------------+  |
                       |                          |
            VRF_A :    V               VRF_A :    V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |   PE-1  |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |   PE-2  |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.4/32 |   PE-2  |  IBGP  |  | 1.1.1.4/32 | 1.1.1.4 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |   PE-2  |  IBGP  |  | 0.0.0.0/0  | 1.1.1.4 | Static |
   +------------+---------+--------+  +------------+---------+--------+
             Figure 2: Inter-subnet Unicast Example (1)

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW).  PE-2, which is
   connected to GW, is either configured with, or learns from GW, a
   default route whose next hop is GW.  Meanwhile, this route is
   distributed to the other PE routers (i.e., PE-1) as per normal
   [RFC4364] operation.  Assume that Host A sends an ARP request for
   its default gateway (i.e., 1.1.1.4) prior to communicating with a
   destination host outside of its subnet.  Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response.  Host A then sends packets destined for hosts
   outside of its subnet to PE-1.  PE-1 tunnels such packets towards
   PE-2 according to the default route learnt from PE-2, and PE-2 in
   turn forwards them to GW.
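   As an illustration of this behavior, the following Python sketch
   (illustrative only; all names are invented for this example) shows
   how a longest-prefix-match lookup in PE-1's VRF of Figure 2
   resolves an off-subnet destination to the default route learnt
   from PE-2, while an on-subnet destination still matches its more
   specific host route:

      import ipaddress

      # PE-1's VRF_A from Figure 2: (prefix, nexthop, protocol).
      VRF_A = [
          ("1.1.1.2/32", "1.1.1.2", "Direct"),
          ("1.1.1.3/32", "PE-2",    "IBGP"),
          ("1.1.1.4/32", "PE-2",    "IBGP"),
          ("1.1.1.0/24", "1.1.1.1", "Direct"),
          ("0.0.0.0/0",  "PE-2",    "IBGP"),  # default route via PE-2
      ]

      def next_hop(vrf, dst):
          """Longest-prefix match; /0 matches anything."""
          addr = ipaddress.ip_address(dst)
          def plen(route):
              return ipaddress.ip_network(route[0]).prefixlen
          matches = [r for r in vrf
                     if addr in ipaddress.ip_network(r[0])]
          return max(matches, key=plen)[1]

      # An off-subnet destination only matches 0.0.0.0/0, so PE-1
      # tunnels the packet to PE-2, which then hands it to GW.
      assert next_hop(VRF_A, "2.2.2.2") == "PE-2"
      # A local on-subnet host matches its /32 host route instead.
      assert next_hop(VRF_A, "1.1.1.2") == "1.1.1.2"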
                             +--------------------+
   +-----------------+       |                    |    +-----------------+
   |VPN_A:1.1.1.1/24 |       |                    |    |VPN_A:1.1.1.1/24 |
   |              \  |       |                    |    |  /              |
   | +------+      \++----++                  ++----++/      +------+ |
   | |Host A+---+---+ PE-1 |                  | PE-2 +---+---+Host B| |
   | +------+\  |   ++-+--++                  ++--+-++   |  /+------+ |
   | 1.1.1.2/24 |    | |  |                    |  | |    | 1.1.1.3/24 |
   | GW=1.1.1.4 |    | |  |                    |  | |    | GW=1.1.1.4 |
   |  +------+  |    | |  |                    |  | |    |  +------+  |
   |--+ GW-1 +--+    | |  |                    |  | |    +--+ GW-2 +--|
   |  +------+\      | |  |                    |  | |      /+------+  |
   | 1.1.1.4/24      | |  |                    |  | |     1.1.1.4/24  |
   |                 | |  |                    |  | |                 |
   | DC West         | |  |  IP/MPLS Backbone  |  | |         DC East |
   +-----------------+ |  |                    |  | +-----------------+
                       |  +--------------------+  |
                       |                          |
            VRF_A :    V               VRF_A :    V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |   PE-1  |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |   PE-2  |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.4/32 | 1.1.1.4 | Direct |  | 1.1.1.4/32 | 1.1.1.4 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  | 1.1.1.4 | Static |  | 0.0.0.0/0  | 1.1.1.4 | Static |
   +------------+---------+--------+  +------------+---------+--------+
             Figure 3: Inter-subnet Unicast Example (2)

   As shown in Figure 3, in the case where each data center is
   deployed with its own default gateway, CE hosts sending ARP
   requests for their default gateways get responses directly from
   their local default gateways, rather than from their local PE
   routers.
                                    +------+
                             +------+ PE-3 +------+
   +-----------------+       |      +------+      |    +-----------------+
   |VPN_A:1.1.1.1/24 |       |                    |    |VPN_A:1.1.1.1/24 |
   |              \  |       |                    |    |  /              |
   | +------+      \++----++                  ++----++/      +------+ |
   | |Host A+-------+ PE-1 |                  | PE-2 +-------+Host B| |
   | +------+\      ++-+--++                  ++--+-++      /+------+ |
   | 1.1.1.2/24      | |  |                    |  | |      1.1.1.3/24 |
   | GW=1.1.1.1      | |  |                    |  | |      GW=1.1.1.1 |
   |                 | |  |                    |  | |                 |
   | DC West         | |  |  IP/MPLS Backbone  |  | |         DC East |
   +-----------------+ |  |                    |  | +-----------------+
                       |  +--------------------+  |
                       |                          |
            VRF_A :    V               VRF_A :    V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |   PE-1  |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |   PE-2  |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |   PE-3  |  IBGP  |  | 0.0.0.0/0  |   PE-3  |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
             Figure 4: Inter-subnet Unicast Example (3)

   Alternatively, as shown in Figure 4, the PE routers themselves
   could be directly configured as the default gateways of their
   locally connected CE hosts, as long as these PE routers have routes
   to the outside networks.

3.2. Multicast

   To support IP multicast between CE hosts of the same virtual
   subnet, Multicast VPN (MVPN) technology [MVPN] could be directly
   reused.  For example, PE routers attached to a given VPN join a
   default provider multicast distribution tree that is dedicated to
   that VPN.  Ingress PE routers, upon receiving multicast packets
   from their local CE hosts, forward them towards remote PE routers
   through the corresponding default provider multicast distribution
   tree.

   More details about how to support multicast and broadcast in VS
   will be explored in a later version of this document.

3.3. CE Host Discovery

   PE routers SHOULD be able to discover their local CE hosts and keep
   the list of these hosts up to date in a timely manner, so as to
   ensure the availability and accuracy of the corresponding host
   routes originated from them.  PE routers could accomplish local CE
   host discovery through traditional host discovery mechanisms based
   on the ARP or ND protocols.  Furthermore, the Link Layer Discovery
   Protocol (LLDP) described in [802.1AB], the VSI Discovery and
   Configuration Protocol (VDP) described in [802.1Qbg], or even
   interaction with the data center orchestration system could also be
   considered as a means to dynamically discover local CE hosts.

3.4. ARP/ND Proxy

   Acting as an ARP or ND proxy, a PE router SHOULD respond to an ARP
   request or Neighbor Solicitation (NS) message for a target host
   only when it has a best route for that target host in the
   associated VRF and the outgoing interface of that best route is
   different from the one over which the ARP request or NS message was
   received.

   In the scenario where a given VPN site (i.e., a data center) is
   multihomed to more than one PE router via an Ethernet switch or an
   Ethernet network, the Virtual Router Redundancy Protocol (VRRP)
   [RFC5798] is usually enabled on these PE routers.  In this case,
   only the PE router elected as the VRRP Master is allowed to perform
   the ARP/ND proxy function.
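   The decision rule above can be summarized in a few lines of Python
   (a minimal sketch under invented names, not a normative algorithm;
   a real implementation would consult the actual VRF and interface
   state):

      from collections import namedtuple
      import ipaddress

      Route = namedtuple("Route", "prefix out_interface")

      def best_route(vrf, target_ip):
          """Longest-prefix-match lookup; returns a Route or None."""
          addr = ipaddress.ip_address(target_ip)
          def plen(r):
              return ipaddress.ip_network(r.prefix).prefixlen
          matches = [r for r in vrf
                     if addr in ipaddress.ip_network(r.prefix)]
          return max(matches, key=plen) if matches else None

      def should_proxy_reply(vrf, target_ip, in_interface,
                             is_vrrp_master=True):
          """Answer an ARP request / NS message for target_ip received
          on in_interface only under the conditions of Section 3.4."""
          if not is_vrrp_master:       # only the VRRP Master proxies
              return False
          route = best_route(vrf, target_ip)
          if route is None:            # no best route: stay silent
              return False
          # Reply only if the best route leaves via a different
          # interface; otherwise the target sits on the same segment
          # and can answer the request itself.
          return route.out_interface != in_interface

      # PE-1 of Figure 1: Host B is reachable over the backbone, so a
      # request received on the local attachment circuit ("ac0") is
      # proxied; a request for the local Host A is not.
      vrf_a = [Route("1.1.1.2/32", "ac0"),
               Route("1.1.1.3/32", "backbone"),
               Route("1.1.1.0/24", "ac0")]
      assert should_proxy_reply(vrf_a, "1.1.1.3", "ac0") is True
      assert should_proxy_reply(vrf_a, "1.1.1.2", "ac0") is False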
3.5. CE Host Mobility

   During the VM migration process, the PE router to which the moving
   VM is now attached creates a host route for that CE host upon
   receiving a notification message of VM attachment (e.g., a
   gratuitous ARP or an unsolicited NA message).  The PE router to
   which the moving VM was previously attached withdraws the
   corresponding host route when receiving a notification message of
   VM detachment (e.g., a VDP message about VM detachment).
   Meanwhile, the latter PE router could optionally broadcast a
   gratuitous ARP or send an unsolicited NA message on behalf of that
   CE host, using one of its own MAC addresses as the source MAC
   address.  In this way, the ARP/ND cache entries for the CE host
   that moved are updated accordingly on all local CE hosts.  In the
   case where there is no explicit VM detachment notification
   mechanism, the PE router could instead use the following heuristic
   to detect the VM detachment event: upon learning a route update
   for a local CE host from a remote PE router for the first time,
   the PE router immediately checks whether that local CE host is
   still attached to it by some means (e.g., ARP/ND PING and/or ICMP
   PING).

   It is important to ensure that the same MAC and IP addresses are
   associated with the active default gateway in each data center,
   since a VM would most likely continue to send packets to the same
   default gateway address after migrating from one data center to
   another.  One possible way to achieve this goal is to configure
   the same VRRP group in each location, so as to ensure that the
   active default gateways in all data centers share the same virtual
   MAC and virtual IP addresses.
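   The mobility procedure above can be sketched as event handlers on
   a toy PE model (all class and method names below are invented for
   illustration; they do not denote any real router API):

      class PE:
          """Toy model of a VS PE router's mobility handling."""

          def __init__(self):
              self.local_hosts = set()  # CE hosts seen on local ACs
              self.advertised = set()   # host routes sent via BGP

          def probe(self, ip):
              """Stand-in for an ARP/ND PING or ICMP PING check."""
              return ip in self.local_hosts

          def on_vm_attach(self, ip):
              # Gratuitous ARP / unsolicited NA received: originate a
              # host route for the newly attached CE host.
              self.local_hosts.add(ip)
              self.advertised.add(ip)

          def on_vm_detach(self, ip):
              # Explicit detach notification (e.g., via VDP): withdraw
              # the stale host route; optionally send a gratuitous ARP
              # or unsolicited NA on behalf of the host that moved.
              self.local_hosts.discard(ip)
              self.advertised.discard(ip)

          def on_remote_host_route(self, ip):
              # No explicit detach signal: on first learning a remote
              # route for a host believed to be local, probe it and
              # withdraw the local host route if it has gone.
              if ip in self.advertised and not self.probe(ip):
                  self.on_vm_detach(ip)

      old_pe, new_pe = PE(), PE()
      old_pe.on_vm_attach("1.1.1.2")         # VM starts in DC West
      old_pe.local_hosts.discard("1.1.1.2")  # VM migrates away...
      new_pe.on_vm_attach("1.1.1.2")         # ...appears in DC East
      old_pe.on_remote_host_route("1.1.1.2") # probe fails: withdraw
      assert "1.1.1.2" not in old_pe.advertised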
3.6. Forwarding Table Scalability on Data Center Switches

   In a VS environment, the MAC learning domain associated with a
   given virtual subnet that has been extended across multiple data
   centers is partitioned into segments, and each segment is confined
   within a single data center.  Therefore, data center switches only
   need to learn local MAC addresses, rather than both local and
   remote MAC addresses.

3.7. ARP/ND Cache Table Scalability on Default Gateways

   In the case where the data center default gateway functions are
   implemented on the PE routers of the VS, as shown in Figure 4, the
   ARP/ND cache table on each PE router only needs to contain ARP/ND
   entries for local CE hosts.  As a result, the ARP/ND cache table
   size will not grow as the number of data centers to be connected
   increases.

3.8. ARP/ND and Unknown Unicast Flood Avoidance

   In VS, the flooding domain associated with a given virtual subnet
   that has been extended across multiple data centers is partitioned
   into segments, and each segment is confined within a single data
   center.  Therefore, the performance impact on networks and servers
   caused by the flooding of ARP/ND broadcast/multicast and unknown
   unicast traffic is alleviated.

3.9. Path Optimization

   Take the scenario shown in Figure 4 as an example: to optimize the
   forwarding path for traffic between cloud users and cloud data
   centers, the PE routers located at the cloud data centers (i.e.,
   PE-1 and PE-2), which are also the data center default gateways,
   propagate host routes for their local CE hosts to the remote PE
   routers attached to cloud user sites (i.e., PE-3).

   As such, traffic from cloud user sites to a given server on the
   virtual subnet that has been extended across data centers is
   forwarded directly to the data center location where that server
   resides, since the traffic is now forwarded according to the host
   route for that server, rather than according to the subnet route.

   Furthermore, for traffic coming from cloud data centers and
   forwarded to cloud user sites, each PE router acting as a default
   gateway forwards the traffic received from its local CE hosts
   according to the best-match route in the corresponding VRF.  As a
   result, traffic from data centers to cloud user sites is forwarded
   along the optimal path as well.

4. Limitations

4.1. Non-support of Non-IP Traffic

   Although most traffic within and across data centers is IP
   traffic, there may still be a few legacy clustering applications
   that rely on non-IP communications (e.g., heartbeat messages
   between cluster nodes).  Since Virtual Subnet is strictly based on
   Layer 3 forwarding, such non-IP communications cannot be supported
   in the Virtual Subnet solution.  In order to support this non-IP
   traffic (if present) in an environment where the Virtual Subnet
   solution has been deployed, an approach following the idea of
   "route all IP traffic, bridge non-IP traffic" could be considered.
   That is to say, all IP traffic, both intra-subnet and inter-
   subnet, would be processed by the Virtual Subnet process, while
   non-IP traffic would be handled by a particular Layer 2 VPN
   approach.  Such a unified L2/L3 VPN approach requires ingress PE
   routers to classify the traffic received from CE hosts before
   dispatching it to the corresponding L2 or L3 VPN forwarding
   process, as sketched below.

   Note that more and more cluster vendors are offering clustering
   applications based on Layer 3 interconnection.
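   The classification step could, for instance, key off the Ethertype
   of each frame received from a CE host, as in the following Python
   sketch (an assumption made for illustration only; this document
   does not mandate any particular classifier):

      ETHERTYPE_IPV4 = 0x0800
      ETHERTYPE_ARP  = 0x0806
      ETHERTYPE_IPV6 = 0x86DD

      def classify(ethertype):
          """Split CE traffic between the L3VPN (Virtual Subnet) and
          L2VPN forwarding processes: route IP, bridge non-IP."""
          if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
              return "L3VPN"      # routed by the Virtual Subnet
          if ethertype == ETHERTYPE_ARP:
              return "ARP-PROXY"  # handled by the PE's ARP proxy
          return "L2VPN"          # e.g., legacy cluster heartbeats

      assert classify(0x0800) == "L3VPN"
      assert classify(0x88B5) == "L2VPN"  # a non-IP Ethertype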
4.2. Non-support of IP Broadcast and Link-local Multicast

   As explained earlier, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet solution.  Therefore, IP broadcast and link-
   local multicast traffic cannot be supported by the Virtual Subnet
   solution.  In order to support IP broadcast and link-local
   multicast traffic in an environment where the Virtual Subnet
   solution has been deployed, the unified L2/L3 overlay approach
   described in Section 4.1 could be considered as well.  That is to
   say, IP broadcast and link-local multicast traffic would be
   handled by the L2VPN forwarding process, while routable IP traffic
   would be processed by the Virtual Subnet process.

4.3. TTL and Traceroute

   As explained earlier, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet context.  Since Virtual Subnet does not
   require any change to the TTL-handling mechanism of BGP/MPLS IP
   VPN, when a traceroute operation is performed on one CE host
   towards another CE host (assuming that these two hosts are within
   the same subnet but attached to different sites), the traceroute
   output will reflect the fact that these two hosts belonging to the
   same subnet are actually connected via a virtual subnet emulated
   by ARP proxies, rather than via a normal LAN.  In addition, any
   other application that generates intra-subnet traffic with the TTL
   set to 1 may not work in the Virtual Subnet context, unless
   special TTL processing for this case has been implemented; for
   example, if the source and destination addresses of a packet whose
   TTL is set to 1 belong to the same extended subnet, both ingress
   and egress PE routers MUST NOT decrement the TTL of that packet,
   and the TTL of that packet SHOULD NOT be copied into the TTL of
   the transport tunnel (and vice versa).
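   The special TTL processing suggested above amounts to the
   following per-hop rule on both the ingress and egress PE (a sketch
   only, using an example extended subnet; this document does not
   define a concrete implementation):

      import ipaddress

      EXTENDED_SUBNET = ipaddress.ip_network("1.1.1.0/24")

      def ttl_after_pe_hop(src, dst, ttl):
          """If both endpoints sit on the same extended subnet, the
          PE leaves the TTL untouched (and does not copy it to or
          from the transport tunnel); otherwise it decrements as
          usual."""
          intra = (ipaddress.ip_address(src) in EXTENDED_SUBNET and
                   ipaddress.ip_address(dst) in EXTENDED_SUBNET)
          return ttl if intra else ttl - 1

      # A TTL=1 packet from Host A to Host B of Figure 1 survives
      # both PE hops, so TTL-sensitive intra-subnet applications
      # keep working.
      assert ttl_after_pe_hop("1.1.1.2", "1.1.1.3", 1) == 1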
5. Security Considerations

   This document does not introduce any additional security risk to
   BGP/MPLS IP VPN, nor does it provide any additional security
   feature for BGP/MPLS IP VPN.

6. IANA Considerations

   There is no requirement for any IANA action.

7. Acknowledgements

   Thanks to Dino Farinacci, Himanshu Shah, Nabil Bitar, Giles Heron,
   Ronald Bonica, Monique Morrow, Rajiv Asati, Eric Osborne, Thomas
   Morin, Martin Vigoureux, Pedro Roque Marques, Joe Touch, and Wim
   Henderickx for their valuable comments and suggestions on this
   document.

8. References

8.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [MVPN]     Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
              VPNs", draft-ietf-l3vpn-2547bis-mcast-10 (Work in
              Progress), January 2010.

   [RFC925]   Postel, J., "Multi-LAN Address Resolution", RFC 925,
              October 1984.

   [RFC1027]  Carl-Mitchell, S. and J. Quarterman, "Using ARP to
              Implement Transparent Subnet Gateways", RFC 1027,
              October 1987.

   [RFC4389]  Thaler, D., Talwar, M., and C. Patel, "Neighbor
              Discovery Proxies (ND Proxy)", RFC 4389, April 2006.

   [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)
              Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

   [RFC4761]  Kompella, K. and Y. Rekhter, "Virtual Private LAN
              Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, January 2007.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN
              Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [802.1AB]  IEEE Standard 802.1AB-2009, "Station and Media Access
              Control Connectivity Discovery", September 17, 2009.

   [802.1Qbg] IEEE Draft Standard P802.1Qbg/D2.0, "Virtual Bridged
              Local Area Networks - Amendment XX: Edge Virtual
              Bridging", Work in Progress, December 1, 2011.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              January 2013.

Authors' Addresses

   Xiaohu Xu
   Huawei Technologies
   Beijing, China
   Phone: +86 10 60610041
   Email: xuxiaohu@huawei.com

   Robert Raszuk
   Email: robert@raszuk.net

   Susan Hares
   Email: shares@ndzh.com

   Yongbing Fan
   Guangzhou Institute, China Telecom
   Guangzhou, China
   Phone: +86 20 38639121
   Email: fanyb@gsta.com

   Christian Jacquenet
   Orange
   Rennes, France
   Email: christian.jacquenet@orange.com

   Truman Boyes
   Bloomberg LP
   Phone: +1 2126174826
   Email: tboyes@bloomberg.net

   Brendan Fee
   Extreme Networks
   9 Northeastern Blvd.
   Salem, NH 03079
   Email: bfee@enterasys.com