Network Working Group                                              X. Xu
Internet-Draft                                                    Huawei
Intended status: Informational                                 R. Raszuk
Expires: March 8, 2015                                          S. Hares
                                                                  Y. Fan
                                                           China Telecom
                                                            C. Jacquenet
                                                                  Orange
                                                                T. Boyes
                                                            Bloomberg LP
                                                                  B. Fee
                                                        Extreme Networks
                                                       September 4, 2014

        Virtual Subnet: A L3VPN-based Subnet Extension Solution
                  draft-ietf-l3vpn-virtual-subnet-01

Abstract

   This document describes a Layer 3 Virtual Private Network (L3VPN)-
   based subnet extension solution referred to as Virtual Subnet, which
   can be used for building Layer 3 network virtualization overlays
   within and/or across data centers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 8, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Terminology
   3.  Solution Description
     3.1.  Unicast
       3.1.1.  Intra-subnet Unicast
       3.1.2.  Inter-subnet Unicast
     3.2.  Multicast
     3.3.  CE Host Discovery
     3.4.  ARP/ND Proxy
     3.5.  CE Host Mobility
     3.6.  Forwarding Table Scalability on Data Center Switches
     3.7.  ARP/ND Cache Table Scalability on Default Gateways
     3.8.  ARP/ND and Unknown Unicast Flood Avoidance
     3.9.  Path Optimization
   4.  Limitations
     4.1.  Non-support of Non-IP Traffic
     4.2.  Non-support of IP Broadcast and Link-local Multicast
     4.3.  TTL and Traceroute
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, migration, consolidation, expansion, and
   disaster avoidance.  IP renumbering of servers (i.e., VMs) after
   such a migration is usually complex and costly, and risks extending
   the business downtime incurred by the migration.  To allow a VM to
   be migrated from one data center to another without IP renumbering,
   the subnet on which the VM resides needs to be extended across
   those data centers.

   To achieve subnet extension across multiple Infrastructure-as-a-
   Service (IaaS) cloud data centers in a scalable way, the following
   requirements and challenges must be considered:

   a.  VPN Instance Space Scalability: In a modern cloud data center
       environment, thousands or even tens of thousands of tenants
       could be hosted over a shared network infrastructure.  For
       security and performance isolation purposes, these tenants need
       to be isolated from one another.

   b.  Forwarding Table Scalability: With the development of server
       virtualization technologies, it is not uncommon for a single
       cloud data center to contain millions of VMs.  This number
       already poses a significant challenge to the forwarding table
       scalability of data center switches.  If multiple data centers
       of such a scale were interconnected at Layer 2, this challenge
       would become even worse.
   c.  ARP/ND Cache Table Scalability: [RFC6820] notes that the
       Address Resolution Protocol (ARP)/Neighbor Discovery (ND) cache
       tables maintained on default gateways within cloud data centers
       can raise scalability issues.  It is therefore very useful to
       prevent the ARP/ND cache table size from growing in proportion
       to the number of data centers being connected.

   d.  ARP/ND and Unknown Unicast Flooding: It is well known that the
       flooding of ARP/ND broadcast/multicast and unknown unicast
       traffic within large Layer 2 networks affects the performance
       of networks and hosts.  When multiple data centers, each
       containing millions of VMs, are interconnected at Layer 2, the
       impact of such flooding becomes even worse.  As such, it
       becomes increasingly important to avoid the flooding of ARP/ND
       broadcast/multicast and unknown unicast traffic across data
       centers.

   e.  Path Optimization: A subnet usually indicates a location in the
       network.  However, when a subnet has been extended across
       multiple geographically dispersed data center locations, the
       location semantics of that subnet are no longer retained.  As a
       result, traffic from a cloud user (i.e., a VPN user) destined
       for a server on such an extended subnet may first arrive at a
       different data center location according to the subnet route
       and then be forwarded to the location where that server
       actually resides.  This suboptimal routing obviously results in
       an unnecessary consumption of the bandwidth resources between
       data centers.  Furthermore, in the case where traditional VPLS
       technology [RFC4761] [RFC4762] is used for data center
       interconnect and default gateways of different data center
       locations are configured within the same virtual router
       redundancy group, the returning traffic from that server to the
       cloud user may be forwarded at Layer 2 to a default gateway
       located at one of the remote data center premises, rather than
       to the one placed at the local data center location.  This
       suboptimal routing would also unnecessarily consume the
       bandwidth resources between data centers.

   This document describes an L3VPN-based subnet extension solution
   referred to as Virtual Subnet (VS), which can be used for data
   center interconnection while addressing all of the requirements and
   challenges mentioned above.  In addition, since VS is mainly built
   on proven technologies such as BGP/MPLS IP VPN [RFC4364] and ARP/ND
   proxy [RFC0925][RFC1027][RFC4389], service providers offering IaaS
   public cloud services can rely upon their existing BGP/MPLS IP VPN
   infrastructures and the corresponding operational experience to
   realize data center interconnection.

   Although Virtual Subnet is described in this document as an
   approach for data center interconnection, it could be used within
   data centers as well.

   Note that the approach described in this document is not intended
   to achieve an exact emulation of Layer 2 connectivity; it can
   therefore only support a restricted Layer 2 connectivity service
   model, whose limitations are described in Section 4.  A discussion
   of the environments for which this service model is suitable is
   outside the scope of this document.
1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Terminology

   This memo makes use of the terms defined in [RFC4364].

3.  Solution Description

3.1.  Unicast

3.1.1.  Intra-subnet Unicast

                         +--------------------+
   +-----------------+   |                    |   +-----------------+
   |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |
   |               \ |   |                    |   | /               |
   |    +------+   \++---+-+                +-+---++/   +------+    |
   |    |Host A+----+ PE-1 |                | PE-2 +----+Host B|    |
   |    +------+\   ++-+-+-+                +-+-+-++   /+------+    |
   | 1.1.1.2/24      | | |                    | | | 1.1.1.3/24      |
   |                 | | |                    | | |                 |
   |     DC West     | | |  IP/MPLS Backbone  | | |     DC East     |
   +-----------------+ | |                    | | +-----------------+
                       | +--------------------+ |
                       |                        |
               VRF_A : V                VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |  PE-2   |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+

               Figure 1: Intra-subnet Unicast Example

   As shown in Figure 1, two CE hosts (i.e., Hosts A and B) belonging
   to the same subnet (i.e., 1.1.1.0/24) are located in different data
   centers (i.e., DC West and DC East).  The PE routers (i.e., PE-1
   and PE-2) that interconnect these two data centers create host
   routes for their local CE hosts and advertise them via BGP/MPLS IP
   VPN signaling.  Meanwhile, ARP proxy is enabled on the VRF
   attachment circuits of these PE routers.

   Now assume Host A sends an ARP request for Host B before
   communicating with it.  Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets destined for Host B to PE-1.  PE-1
   tunnels such packets towards PE-2, which in turn forwards them to
   Host B.  Thus, Hosts A and B can communicate with each other as if
   they were located within the same subnet.
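   The following is a minimal, non-normative Python sketch of the PE
   behavior just described.  The MAC address and the table contents
   are illustrative assumptions modeled on Figure 1; they are not part
   of any specification.

     # Non-normative sketch of PE-1's proxy-ARP behavior in Figure 1.
     # The PE answers a proxied ARP request with its own (illustrative)
     # MAC address so that intra-subnet traffic is attracted to the PE
     # and then forwarded at Layer 3 over the host route learnt via BGP.

     PE1_MAC = "00:00:5e:00:53:01"        # illustrative MAC address

     # Host routes in VRF_A on PE-1, as shown in Figure 1.
     vrf_a = {
         "1.1.1.2": {"nexthop": "1.1.1.2", "protocol": "Direct"},
         "1.1.1.3": {"nexthop": "PE-2", "protocol": "IBGP"},
     }

     def proxy_arp_reply(target_ip):
         """Return the MAC for the ARP reply, or None to stay silent."""
         if target_ip not in vrf_a:
             return None                  # no host route: do not answer
         return PE1_MAC                   # attract traffic to the PE

     # Host A resolves Host B (1.1.1.3): PE-1 answers with its own MAC
     # and then tunnels Host A's packets to PE-2 over the IBGP route.
     assert proxy_arp_reply("1.1.1.3") == PE1_MAC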
3.1.2.  Inter-subnet Unicast

                         +--------------------+
   +-----------------+   |                    |   +-----------------+
   |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |
   |               \ |   |                    |   | /               |
   |    +------+   \++---+-+                +-+---++/     +------+  |
   |    |Host A+----+ PE-1 |                | PE-2 +-+----+Host B|  |
   |    +------+\   ++-+-+-+                +-+-+-++ |   /+------+  |
   | 1.1.1.2/24      | | |                    | | |  | 1.1.1.3/24   |
   | GW=1.1.1.4      | | |                    | | |  | GW=1.1.1.4   |
   |                 | | |                    | | |  |    +------+  |
   |                 | | |                    | | |  +----+  GW  +--|
   |                 | | |                    | | |      /+------+  |
   |                 | | |                    | | |   1.1.1.4/24    |
   |                 | | |                    | | |                 |
   |     DC West     | | |  IP/MPLS Backbone  | | |     DC East     |
   +-----------------+ | |                    | | +-----------------+
                       | +--------------------+ |
                       |                        |
               VRF_A : V                VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |  PE-2   |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.4/32 |  PE-2   |  IBGP  |  | 1.1.1.4/32 | 1.1.1.4 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |  PE-2   |  IBGP  |  | 0.0.0.0/0  | 1.1.1.4 | Static |
   +------------+---------+--------+  +------------+---------+--------+

              Figure 2: Inter-subnet Unicast Example (1)

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW).  PE-2, which is
   connected to GW, is either configured with, or learns from GW, a
   default route whose next hop is GW.  Meanwhile, this route is
   distributed to the other PE routers (i.e., PE-1) as per normal
   [RFC4364] operation.  Assume Host A sends an ARP request for its
   default gateway (i.e., 1.1.1.4) prior to communicating with a
   destination host outside of its subnet.  Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response.  Host A then sends packets destined for hosts
   outside of its subnet to PE-1.  PE-1 tunnels such packets towards
   PE-2 according to the default route learnt from PE-2, and PE-2 in
   turn forwards them to GW.
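   The forwarding decision made by PE-1 in this example can be
   illustrated with the following non-normative Python sketch, which
   performs an ordinary longest-prefix-match lookup over the VRF_A
   table of Figure 2.  The table contents are copied from the figure;
   the code itself is purely illustrative.

     import ipaddress

     # PE-1's VRF_A from Figure 2, reduced to prefix -> next hop.
     vrf_a_pe1 = {
         "1.1.1.2/32": "1.1.1.2",   # local host (Direct)
         "1.1.1.3/32": "PE-2",      # remote host (IBGP)
         "1.1.1.4/32": "PE-2",      # remote default gateway (IBGP)
         "1.1.1.0/24": "1.1.1.1",   # the extended subnet itself
         "0.0.0.0/0": "PE-2",       # default route learnt from PE-2
     }

     def lookup(vrf, destination):
         """Ordinary longest-prefix-match lookup over the VRF."""
         dest = ipaddress.ip_address(destination)
         matches = [ipaddress.ip_network(p) for p in vrf
                    if dest in ipaddress.ip_network(p)]
         return vrf[str(max(matches, key=lambda n: n.prefixlen))]

     # Intra-subnet traffic to Host B follows the /32 host route, while
     # traffic leaving the subnet falls through to the default route.
     assert lookup(vrf_a_pe1, "1.1.1.3") == "PE-2"
     assert lookup(vrf_a_pe1, "192.0.2.1") == "PE-2"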
                         +--------------------+
   +-----------------+   |                    |   +-----------------+
   |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |
   |               \ |   |                    |   | /               |
   |  +------+     \++---+-+                +-+---++/     +------+  |
   |  |Host A+--+---+ PE-1 |                | PE-2 +-+----+Host B|  |
   |  +------+\ |   ++-+-+-+                +-+-+-++ |   /+------+  |
   | 1.1.1.2/24 |    | | |                    | | |  | 1.1.1.3/24   |
   | GW=1.1.1.4 |    | | |                    | | |  | GW=1.1.1.4   |
   |  +------+  |    | | |                    | | |  |    +------+  |
   |--+ GW-1 +--+    | | |                    | | |  +----+ GW-2 +--|
   |  +------+\      | | |                    | | |      /+------+  |
   | 1.1.1.4/24      | | |                    | | |   1.1.1.4/24    |
   |                 | | |                    | | |                 |
   |     DC West     | | |  IP/MPLS Backbone  | | |     DC East     |
   +-----------------+ | |                    | | +-----------------+
                       | +--------------------+ |
                       |                        |
               VRF_A : V                VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |  PE-2   |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.4/32 | 1.1.1.4 | Direct |  | 1.1.1.4/32 | 1.1.1.4 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  | 1.1.1.4 | Static |  | 0.0.0.0/0  | 1.1.1.4 | Static |
   +------------+---------+--------+  +------------+---------+--------+

              Figure 3: Inter-subnet Unicast Example (2)

   As shown in Figure 3, in the case where each data center is
   deployed with a default gateway, CE hosts sending ARP requests for
   their default gateways get ARP responses directly from their local
   default gateways, rather than from their local PE routers.  This is
   consistent with the ARP/ND proxy behavior described in Section 3.4:
   the best route on each PE router for the gateway address points
   back over the attachment circuit on which such an ARP request
   arrives, so the PE router does not answer it.
                                +------+
                         +------+ PE-3 +------+
   +-----------------+   |      +------+      |   +-----------------+
   |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |
   |               \ |   |                    |   | /               |
   |    +------+   \++---+-+                +-+---++/   +------+    |
   |    |Host A+----+ PE-1 |                | PE-2 +----+Host B|    |
   |    +------+\   ++-+-+-+                +-+-+-++   /+------+    |
   | 1.1.1.2/24      | | |                    | | | 1.1.1.3/24      |
   | GW=1.1.1.1      | | |                    | | | GW=1.1.1.1      |
   |                 | | |                    | | |                 |
   |     DC West     | | |  IP/MPLS Backbone  | | |     DC East     |
   +-----------------+ | |                    | | +-----------------+
                       | +--------------------+ |
                       |                        |
               VRF_A : V                VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.1/32 |127.0.0.1| Direct |  | 1.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.2/32 | 1.1.1.2 | Direct |  | 1.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.3/32 |  PE-2   |  IBGP  |  | 1.1.1.3/32 | 1.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 1.1.1.0/24 | 1.1.1.1 | Direct |  | 1.1.1.0/24 | 1.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |  PE-3   |  IBGP  |  | 0.0.0.0/0  |  PE-3   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+

              Figure 4: Inter-subnet Unicast Example (3)

   Alternatively, as shown in Figure 4, the PE routers themselves
   could be directly configured as default gateways of their locally
   connected CE hosts, as long as these PE routers have routes to
   outside networks.

3.2.  Multicast

   To support IP multicast between CE hosts of the same virtual
   subnet, Multicast VPN (MVPN) technologies [RFC6513] could be used
   directly, without any change.  For example, PE routers attached to
   a given VPN join a default provider multicast distribution tree
   that is dedicated to that VPN.  Ingress PE routers, upon receiving
   multicast packets from their local CE hosts, forward them towards
   remote PE routers through the corresponding default provider
   multicast distribution tree.

3.3.  CE Host Discovery

   PE routers SHOULD be able to discover their local CE hosts and keep
   the list of these hosts up to date in a timely manner, so as to
   ensure the availability and accuracy of the corresponding host
   routes originated from them.  PE routers could accomplish local CE
   host discovery through traditional host discovery mechanisms based
   on ARP or ND.  Furthermore, the Link Layer Discovery Protocol
   (LLDP), the VSI Discovery and Configuration Protocol (VDP), or even
   interaction with the data center orchestration system could also be
   considered as a means to dynamically discover local CE hosts.

3.4.  ARP/ND Proxy

   Acting as an ARP or ND proxy, a PE router SHOULD respond to an ARP
   request or Neighbor Solicitation (NS) message for a target host
   only when it has a best route for that target host in the
   associated VRF and the outgoing interface of that best route is
   different from the one over which the ARP request or NS message was
   received.  In the scenario where a given VPN site (i.e., a data
   center) is multi-homed to more than one PE router via an Ethernet
   switch or an Ethernet network, the Virtual Router Redundancy
   Protocol (VRRP) [RFC5798] is usually enabled on these PE routers.
   In this case, only the PE router elected as the VRRP Master is
   allowed to perform the ARP/ND proxy function.
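   The decision procedure above can be summarized by the following
   non-normative Python sketch.  The interface names and the route
   table are illustrative assumptions modeled on PE-1 in Figure 3.

     PE1_MAC = "00:00:5e:00:53:01"      # illustrative MAC address

     # Outgoing interface of the best route per target address.
     best_route_oif = {
         "1.1.1.3": "tunnel-to-PE-2",   # remote host, learnt via IBGP
         "1.1.1.4": "ac0",              # local gateway GW-1 (Direct)
     }

     def proxy_reply(target_ip, rx_if, vrrp_master=True):
         if not vrrp_master:
             return None    # on multi-homed sites, only the VRRP
                            # Master performs the proxy function
         oif = best_route_oif.get(target_ip)
         if oif is None or oif == rx_if:
             return None    # no best route, or the target is reachable
                            # over the receiving interface: stay silent
         return PE1_MAC

     # Host A ARPs for its local gateway: the PE stays silent and GW-1
     # answers by itself, as described for Figure 3.
     assert proxy_reply("1.1.1.4", "ac0") is None
     # Host A ARPs for the remote Host B: the PE answers with its MAC.
     assert proxy_reply("1.1.1.3", "ac0") == PE1_MAC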
3.5.  CE Host Mobility

   During the VM migration process, the PE router to which the moving
   VM is now attached creates a host route for that CE host upon
   receiving a notification message of VM attachment (e.g., a
   gratuitous ARP or an unsolicited NA message).  The PE router to
   which the moving VM was previously attached withdraws the
   corresponding host route when receiving a notification message of
   VM detachment (e.g., a VDP message about VM detachment).
   Meanwhile, the latter PE router could optionally broadcast a
   gratuitous ARP or send an unsolicited NA message on behalf of that
   CE host, with the source MAC address being one of its own.  In this
   way, any stale ARP/ND entry for the moved CE host that is cached on
   local CE hosts is updated accordingly.  In the case where there is
   no explicit VM detachment notification mechanism, the PE router can
   use the following heuristic to detect VM detachment: upon learning
   a route update for a local CE host from a remote PE router for the
   first time, the PE router immediately checks whether that local CE
   host is still attached to it by some means (e.g., an ARP/ND ping
   and/or an ICMP ping).  It is important to ensure that the same MAC
   and IP addresses are associated with the active default gateway in
   each data center, as the VM will most likely continue to send
   packets to the same default gateway address after migrating from
   one data center to another.  One possible way to achieve this goal
   is to configure the same VRRP group at each location, so that the
   active default gateways in all data centers share the same virtual
   MAC and virtual IP addresses.
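   The following runnable toy model illustrates the procedure above
   from the viewpoint of the PE router that the VM moves away from.
   The helper names are hypothetical and are not taken from any
   protocol specification.

     local_hosts = {"1.1.1.2"}   # hosts with locally originated routes

     def probe(ip):
         """Stand-in for an ARP/ND ping or ICMP ping to a local host."""
         return False            # pretend the host no longer answers

     def withdraw_host_route(ip):
         print("BGP withdraw", ip + "/32")

     def send_gratuitous_arp(ip):
         print("gratuitous ARP for", ip, "with the PE's own MAC")

     def on_remote_host_route(ip):
         """Host route for ip learnt from a remote PE for the first
         time."""
         if ip in local_hosts and not probe(ip):
             local_hosts.discard(ip)   # the VM has moved away
             withdraw_host_route(ip)   # retract the stale host route
             send_gratuitous_arp(ip)   # refresh local ARP caches

     on_remote_host_route("1.1.1.2")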
3.6.  Forwarding Table Scalability on Data Center Switches

   In a VS environment, the MAC learning domain associated with a
   given virtual subnet that has been extended across multiple data
   centers is partitioned into segments, and each segment is confined
   within a single data center.  Therefore, data center switches only
   need to learn local MAC addresses, rather than both local and
   remote MAC addresses.

3.7.  ARP/ND Cache Table Scalability on Default Gateways

   When default gateway functions are implemented on PE routers, as
   shown in Figure 4, the ARP/ND cache table on each PE router only
   needs to contain ARP/ND entries of local CE hosts.  As a result,
   the ARP/ND cache table size does not grow as the number of data
   centers to be connected increases.

3.8.  ARP/ND and Unknown Unicast Flood Avoidance

   In VS, the flooding domain associated with a given virtual subnet
   that has been extended across multiple data centers is partitioned
   into segments, and each segment is confined within a single data
   center.  Therefore, the performance impact on networks and servers
   imposed by the flooding of ARP/ND broadcast/multicast and unknown
   unicast traffic is alleviated.

3.9.  Path Optimization

   Taking the scenario shown in Figure 4 as an example: to optimize
   the forwarding path for traffic between cloud users and cloud data
   centers, the PE routers located at the cloud data centers (i.e.,
   PE-1 and PE-2), which also act as default gateways, propagate host
   routes for their local CE hosts to the remote PE routers attached
   to cloud user sites (i.e., PE-3).  As such, traffic from cloud user
   sites to a given server on the virtual subnet that has been
   extended across data centers is forwarded directly to the data
   center location where that server resides, since it is forwarded
   according to the host route for that server rather than the subnet
   route.  Furthermore, for traffic coming from cloud data centers and
   forwarded to cloud user sites, each PE router acting as a default
   gateway forwards the traffic according to the best-match route in
   the corresponding VRF.  As a result, traffic from data centers to
   cloud user sites is forwarded along an optimal path as well.

4.  Limitations

4.1.  Non-support of Non-IP Traffic

   Although most traffic within and across data centers is IP traffic,
   there may still be a few legacy clustering applications that rely
   on non-IP communications (e.g., heartbeat messages between cluster
   nodes).  Since Virtual Subnet is strictly based on Layer 3
   forwarding, such non-IP communications cannot be supported in the
   Virtual Subnet solution.  To support this non-IP traffic (if
   present) in an environment where the Virtual Subnet solution has
   been deployed, an approach following the idea of "route all IP
   traffic, bridge non-IP traffic" could be considered.  That is, all
   IP traffic, both intra-subnet and inter-subnet, would be processed
   by the Virtual Subnet forwarding process, while non-IP traffic
   would be handled by a particular Layer 2 VPN approach.  Such a
   unified L2/L3 VPN approach requires ingress PE routers to classify
   the traffic received from CE hosts before distributing it to the
   corresponding L2 or L3 VPN forwarding process.  Note that more and
   more cluster vendors are offering clustering applications based on
   Layer 3 interconnection.

4.2.  Non-support of IP Broadcast and Link-local Multicast

   As described above, intra-subnet traffic is forwarded at Layer 3 in
   the Virtual Subnet solution.  Therefore, IP broadcast and link-
   local multicast traffic cannot be supported by the Virtual Subnet
   solution.  To support IP broadcast and link-local multicast traffic
   in an environment where the Virtual Subnet solution has been
   deployed, the unified L2/L3 overlay approach described in
   Section 4.1 could be considered as well.  That is, IP broadcast and
   link-local multicast traffic would be handled by the L2VPN
   forwarding process, while routable IP traffic would be processed by
   the Virtual Subnet forwarding process.

4.3.  TTL and Traceroute

   As described above, intra-subnet traffic is forwarded at Layer 3 in
   the Virtual Subnet context.  Since Virtual Subnet does not require
   any change to the TTL handling mechanism of BGP/MPLS IP VPN, a
   traceroute operation performed on one CE host towards another CE
   host (assuming that these two hosts are within the same subnet but
   attached to different sites) will reflect the fact that the two
   hosts, although belonging to the same subnet, are actually
   connected via a virtual subnet emulated by ARP proxy, rather than
   via a normal LAN.  In addition, applications that generate intra-
   subnet traffic with TTL set to 1 may not work in the Virtual Subnet
   context, unless special TTL processing for this case has been
   implemented (e.g., if the source and destination addresses of a
   packet whose TTL is set to 1 belong to the same extended subnet,
   both ingress and egress PE routers MUST NOT decrement the TTL of
   such a packet; furthermore, the TTL of such a packet SHOULD NOT be
   copied into the TTL of the transport tunnel, and vice versa).
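   The parenthetical TTL rule above can be illustrated with a small
   non-normative Python model; the subnet and the function below are
   assumptions made purely for illustration.

     import ipaddress

     EXTENDED_SUBNET = ipaddress.ip_network("1.1.1.0/24")

     def pe_ttl(src, dst, ttl):
         """TTL after PE forwarding; None means the packet is
         discarded."""
         if (ipaddress.ip_address(src) in EXTENDED_SUBNET and
                 ipaddress.ip_address(dst) in EXTENDED_SUBNET):
             return ttl      # intra-subnet: MUST NOT decrement; also
                             # not copied to/from the tunnel TTL
         if ttl <= 1:
             return None     # ordinary Layer 3 behavior
         return ttl - 1

     # An intra-subnet packet sent with TTL=1 survives both the ingress
     # and the egress PE, so TTL=1 applications keep working.
     assert pe_ttl("1.1.1.2", "1.1.1.3", 1) == 1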
5.  Acknowledgements

   Thanks to Dino Farinacci, Himanshu Shah, Nabil Bitar, Giles Heron,
   Ronald Bonica, Monique Morrow, Rajiv Asati, Eric Osborne, Thomas
   Morin, Martin Vigoureux, Pedro Roque Marques, Joe Touch and Wim
   Henderickx for their valuable comments and suggestions on this
   document.

6.  IANA Considerations

   This document requires no IANA actions.

7.  Security Considerations

   This document does not introduce any additional security risk to
   BGP/MPLS IP VPN, nor does it provide any additional security
   feature for BGP/MPLS IP VPN.

8.  References

8.1.  Normative References

   [RFC0925]  Postel, J., "Multi-LAN address resolution", RFC 925,
              October 1984.

   [RFC1027]  Carl-Mitchell, S. and J. Quarterman, "Using ARP to
              implement transparent subnet gateways", RFC 1027,
              October 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4389]  Thaler, D., Talwar, M., and C. Patel, "Neighbor
              Discovery Proxies (ND Proxy)", RFC 4389, April 2006.

   [RFC4761]  Kompella, K. and Y. Rekhter, "Virtual Private LAN
              Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, January 2007.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN
              Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)
              Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

   [RFC6513]  Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
              VPNs", RFC 6513, February 2012.

8.2.  Informative References

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              January 2013.

Authors' Addresses

   Xiaohu Xu
   Huawei

   Email: xuxiaohu@huawei.com

   Robert Raszuk

   Email: robert@raszuk.net

   Susan Hares

   Email: shares@ndzh.com

   Yongbing Fan
   China Telecom

   Email: fanyb@gsta.com

   Christian Jacquenet
   Orange

   Email: christian.jacquenet@orange.com

   Truman Boyes
   Bloomberg LP

   Email: tboyes@bloomberg.net

   Brendan Fee
   Extreme Networks

   Email: bfee@enterasys.com