Network Working Group                                              X. Xu
Internet-Draft                                                    Huawei
Intended status: Informational                                 R. Raszuk
Expires: December 3, 2015                                  Mirantis Inc.
                                                            C. Jacquenet
                                                                  Orange
                                                                T. Boyes
                                                            Bloomberg LP
                                                                  B. Fee
                                                        Extreme Networks
                                                            June 1, 2015


    Virtual Subnet: A BGP/MPLS IP VPN-based Subnet Extension Solution
                    draft-ietf-bess-virtual-subnet-00

Abstract

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for
   building Layer 3 network virtualization overlays within and/or
   between data centers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 3, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Terminology
   3.  Solution Description
     3.1.  Unicast
       3.1.1.  Intra-subnet Unicast
       3.1.2.  Inter-subnet Unicast
     3.2.  Multicast
     3.3.  CE Host Discovery
     3.4.  ARP/ND Proxy
     3.5.  CE Host Mobility
     3.6.  Forwarding Table Scalability on Data Center Switches
     3.7.  ARP/ND Cache Table Scalability on Default Gateways
     3.8.  ARP/ND and Unknown Unicast Flood Avoidance
     3.9.  Path Optimization
   4.  Limitations
     4.1.  Non-support of Non-IP Traffic
     4.2.  Non-support of IP Broadcast and Link-local Multicast
     4.3.  TTL and Traceroute
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, migration, consolidation, expansion, and
   disaster avoidance.  IP renumbering of servers (i.e., VMs) after
   migration is usually complex and costly, and risks extending the
   business downtime during the migration process.  To allow the
   migration of a VM from one data center to another without IP
   renumbering, the subnet on which the VM resides needs to be
   extended across these data centers.

   To achieve subnet extension across multiple Infrastructure-as-
   a-Service (IaaS) cloud data centers in a scalable way, the
   following requirements and challenges must be considered:

   a.  VPN Instance Space Scalability: In a modern cloud data center
       environment, thousands or even tens of thousands of tenants
       could be hosted over a shared network infrastructure.  For
       security and performance isolation purposes, these tenants need
       to be isolated from one another, so the infrastructure must
       support a correspondingly large number of VPN instances.

   b.  Forwarding Table Scalability: With the development of server
       virtualization technologies, it's not uncommon for a single
       cloud data center to contain millions of VMs.  This number
       already poses a significant challenge to the forwarding table
       scalability of data center switches.  If multiple data centers
       of such a scale were interconnected at Layer 2, this challenge
       would become even worse.
   c.  ARP/ND Cache Table Scalability: [RFC6820] notes that the
       Address Resolution Protocol (ARP)/Neighbor Discovery (ND) cache
       tables maintained on default gateways within cloud data centers
       can raise scalability issues.  It is therefore very useful to
       prevent the ARP/ND cache table size from multiplying as the
       number of interconnected data centers increases.

   d.  ARP/ND and Unknown Unicast Flooding: It's well known that the
       flooding of ARP/ND broadcast/multicast and unknown unicast
       traffic within large Layer 2 networks affects the performance
       of networks and hosts.  When multiple data centers, each
       containing millions of VMs, are interconnected at Layer 2, the
       impact of such flooding becomes even worse.  As such, it
       becomes increasingly important to avoid the flooding of ARP/ND
       broadcast/multicast and unknown unicast traffic across data
       centers.

   e.  Path Optimization: A subnet usually indicates a location in the
       network.  However, once a subnet has been extended across
       multiple geographically dispersed data center locations, the
       location semantics of that subnet are no longer retained.  As a
       result, traffic from a cloud user (i.e., a VPN user) destined
       for a server located in one data center of the extended subnet
       may, according to the subnet route, first arrive at another
       data center location and then be forwarded to the location
       where the server actually resides.  This suboptimal routing
       obviously results in an unnecessary consumption of the
       bandwidth resources between data centers.  Furthermore, in the
       case where traditional VPLS technology [RFC4761] [RFC4762] is
       used for data center interconnect and the default gateways of
       the different data center locations are configured within the
       same virtual router redundancy group, the return traffic from
       that server to the cloud user may be forwarded at Layer 2 to a
       default gateway located at one of the remote data center
       premises, rather than to the one at the local data center
       location.  This suboptimal routing also unnecessarily consumes
       bandwidth resources between data centers.

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for data
   center interconnection while addressing all of the requirements and
   challenges mentioned above.  Here, BGP/MPLS IP VPN refers to both
   BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659].  In
   addition, since Virtual Subnet is mainly built on proven
   technologies such as BGP/MPLS IP VPN and ARP/ND proxy [RFC0925]
   [RFC1027] [RFC4389], service providers offering IaaS public cloud
   services can rely upon their existing BGP/MPLS IP VPN
   infrastructures and the corresponding operational experience to
   realize data center interconnection.

   Although Virtual Subnet is described in this document as an
   approach for data center interconnection, it could be used within
   data centers as well.

   Note that the approach described in this document is not intended
   to achieve an exact emulation of Layer 2 connectivity; it can only
   support a restricted Layer 2 connectivity service model, with the
   limitations described in Section 4.  A discussion of the
   environments in which this service model is suitable is outside the
   scope of this document.
1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Terminology

   This memo makes use of the terms defined in [RFC4364].

3.  Solution Description

3.1.  Unicast

3.1.1.  Intra-subnet Unicast

                          +--------------------+
   +-----------------+    |                    |    +-----------------+
   |VPN_A:10.1.1.1/24|    |                    |    |VPN_A:10.1.1.1/24|
   |        \        |    |                    |    |        /        |
   | +------+     \++---+-+                    +-+---++/     +------+ |
   | |Host A+------+ PE-1 |                    | PE-2 +------+Host B| |
   | +------+\     ++-+-+-+                    +-+-+-++     /+------+ |
   | 10.1.1.2/24     |  |  |                    |  |  | 10.1.1.3/24     |
   |                 |  |  |                    |  |  |                 |
   |     DC West     |  |  |  IP/MPLS Backbone  |  |  |     DC East     |
   +-----------------+  |  |                    |  |  +-----------------+
                        |  +--------------------+  |
                        |                          |
                VRF_A : V                  VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.1/32 |127.0.0.1| Direct |  |10.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.2/32 |10.1.1.2 | Direct |  |10.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.3/32 |  PE-2   |  IBGP  |  |10.1.1.3/32 |10.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.0/24 |10.1.1.1 | Direct |  |10.1.1.0/24 |10.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+

                 Figure 1: Intra-subnet Unicast Example

   As shown in Figure 1, two CE hosts (i.e., Hosts A and B) belonging
   to the same subnet (i.e., 10.1.1.0/24) are located in different
   data centers (i.e., DC West and DC East), respectively.  The PE
   routers (i.e., PE-1 and PE-2) used for interconnecting these two
   data centers create host routes for their own local CE hosts and
   advertise them via BGP/MPLS IP VPN signaling.  Meanwhile, ARP proxy
   is enabled on the VRF attachment circuits of these PE routers.

   Now assume that Host A sends an ARP request for Host B before
   communicating with it.  Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets destined for Host B to PE-1.  PE-1
   tunnels such packets towards PE-2, which in turn forwards them to
   Host B.  Thus, Hosts A and B can communicate with each other as if
   they were located within the same subnet.
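
   Informally, the forwarding decision that PE-1 makes in this example
   can be illustrated with the following short, runnable sketch
   (Python).  The table mirrors VRF_A on PE-1 in Figure 1; names such
   as VRF_A_PE1 and lookup() are hypothetical illustrations, not a
   normative algorithm:

      import ipaddress

      # VRF_A on PE-1, mirroring Figure 1 (illustrative only).
      VRF_A_PE1 = [
          ("10.1.1.1/32", "127.0.0.1", "Direct"),
          ("10.1.1.2/32", "10.1.1.2",  "Direct"),
          ("10.1.1.3/32", "PE-2",      "IBGP"),
          ("10.1.1.0/24", "10.1.1.1",  "Direct"),
      ]

      def lookup(vrf, dst):
          """Return the longest-prefix-match entry for dst."""
          addr = ipaddress.ip_address(dst)
          matches = [e for e in vrf
                     if addr in ipaddress.ip_network(e[0])]
          return max(matches,
                     key=lambda e: ipaddress.ip_network(e[0]).prefixlen)

      # Host A's packet to Host B (10.1.1.3) matches the /32 host
      # route learned via IBGP, so PE-1 tunnels it to PE-2 over the
      # MPLS backbone rather than delivering it locally.
      print(lookup(VRF_A_PE1, "10.1.1.3"))
      # ('10.1.1.3/32', 'PE-2', 'IBGP')

   The essential point is that the /32 host route learned via IBGP is
   more specific than the local 10.1.1.0/24 subnet route, so intra-
   subnet traffic towards a remote CE host is always attracted to the
   MPLS tunnel leading to the remote PE.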
3.1.2.  Inter-subnet Unicast

                          +--------------------+
   +-----------------+    |                    |    +-----------------+
   |VPN_A:10.1.1.1/24|    |                    |    |VPN_A:10.1.1.1/24|
   |        \        |    |                    |    |        /        |
   | +------+     \++---+-+                    +-+---++/     +------+ |
   | |Host A+------+ PE-1 |                    | PE-2 +-+----+Host B| |
   | +------+\     ++-+-+-+                    +-+-+-++ |   /+------+ |
   | 10.1.1.2/24     |  |  |                    |  |  |   | 10.1.1.3/24 |
   | GW=10.1.1.4     |  |  |                    |  |  |   | GW=10.1.1.4 |
   |                 |  |  |                    |  |  |   |   +------+  |
   |                 |  |  |                    |  |  |   +---+  GW  +--|
   |                 |  |  |                    |  |  |      /+------+  |
   |                 |  |  |                    |  |  | 10.1.1.4/24     |
   |                 |  |  |                    |  |  |                 |
   |     DC West     |  |  |  IP/MPLS Backbone  |  |  |     DC East     |
   +-----------------+  |  |                    |  |  +-----------------+
                        |  +--------------------+  |
                        |                          |
                VRF_A : V                  VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.1/32 |127.0.0.1| Direct |  |10.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.2/32 |10.1.1.2 | Direct |  |10.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.3/32 |  PE-2   |  IBGP  |  |10.1.1.3/32 |10.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.4/32 |  PE-2   |  IBGP  |  |10.1.1.4/32 |10.1.1.4 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.0/24 |10.1.1.1 | Direct |  |10.1.1.0/24 |10.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |  PE-2   |  IBGP  |  | 0.0.0.0/0  |10.1.1.4 | Static |
   +------------+---------+--------+  +------------+---------+--------+

               Figure 2: Inter-subnet Unicast Example (1)

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW).  PE-2, which is
   connected to GW, would either be configured with, or learn from GW,
   a default route whose next hop points to GW.  Meanwhile, this route
   is distributed to the other PE routers (i.e., PE-1) as per normal
   [RFC4364] operation.  Assume that Host A sends an ARP request for
   its default gateway (i.e., 10.1.1.4) prior to communicating with a
   destination host outside of its subnet.  Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response.  Host A then sends packets destined for that off-
   subnet host to PE-1.  PE-1 tunnels such packets towards PE-2
   according to the default route learned from PE-2, and PE-2 in turn
   forwards them to GW.
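
   Reusing the hypothetical lookup() helper and VRF_A_PE1 table from
   the sketch in Section 3.1.1, PE-1's behavior for off-subnet
   destinations in Figure 2 can be illustrated as follows (the table
   contents again mirror the figure and are assumptions for
   illustration only):

      # PE-1's VRF in Figure 2 additionally holds GW's host route and
      # a default route, both learned from PE-2 via IBGP.
      VRF_A_PE1_FIG2 = VRF_A_PE1 + [
          ("10.1.1.4/32", "PE-2", "IBGP"),
          ("0.0.0.0/0",   "PE-2", "IBGP"),
      ]

      # An off-subnet destination matches only the default route, so
      # PE-1 tunnels the packet to PE-2, which hands it over to GW.
      print(lookup(VRF_A_PE1_FIG2, "198.51.100.7"))
      # ('0.0.0.0/0', 'PE-2', 'IBGP')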
                          +--------------------+
   +-----------------+    |                    |    +-----------------+
   |VPN_A:10.1.1.1/24|    |                    |    |VPN_A:10.1.1.1/24|
   |        \        |    |                    |    |        /        |
   | +------+     \++---+-+                    +-+---++/     +------+ |
   | |Host A+----+-+ PE-1 |                    | PE-2 +-+----+Host B| |
   | +------+\   | ++-+-+-+                    +-+-+-++ |   /+------+ |
   | 10.1.1.2/24 |   |  |  |                    |  |  |   | 10.1.1.3/24 |
   | GW=10.1.1.4 |   |  |  |                    |  |  |   | GW=10.1.1.4 |
   |  +------+   |   |  |  |                    |  |  |   |   +------+  |
   |--+ GW-1 +---+   |  |  |                    |  |  |   +---+ GW-2 +--|
   |  +------+\      |  |  |                    |  |  |      /+------+  |
   | 10.1.1.4/24     |  |  |                    |  |  | 10.1.1.4/24     |
   |                 |  |  |                    |  |  |                 |
   |     DC West     |  |  |  IP/MPLS Backbone  |  |  |     DC East     |
   +-----------------+  |  |                    |  |  +-----------------+
                        |  +--------------------+  |
                        |                          |
                VRF_A : V                  VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.1/32 |127.0.0.1| Direct |  |10.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.2/32 |10.1.1.2 | Direct |  |10.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.3/32 |  PE-2   |  IBGP  |  |10.1.1.3/32 |10.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.4/32 |10.1.1.4 | Direct |  |10.1.1.4/32 |10.1.1.4 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.0/24 |10.1.1.1 | Direct |  |10.1.1.0/24 |10.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |10.1.1.4 | Static |  | 0.0.0.0/0  |10.1.1.4 | Static |
   +------------+---------+--------+  +------------+---------+--------+

               Figure 3: Inter-subnet Unicast Example (2)

   As shown in Figure 3, in the case where each data center is
   deployed with its own default gateway, CE hosts sending ARP
   requests for their default gateways receive ARP responses directly
   from their local default gateways, rather than from their local PE
   routers.
                                 +------+
                          +------+ PE-3 +------+
   +-----------------+    |      +------+      |    +-----------------+
   |VPN_A:10.1.1.1/24|    |                    |    |VPN_A:10.1.1.1/24|
   |        \        |    |                    |    |        /        |
   | +------+     \++---+-+                    +-+---++/     +------+ |
   | |Host A+------+ PE-1 |                    | PE-2 +------+Host B| |
   | +------+\     ++-+-+-+                    +-+-+-++     /+------+ |
   | 10.1.1.2/24     |  |  |                    |  |  | 10.1.1.3/24     |
   | GW=10.1.1.1     |  |  |                    |  |  | GW=10.1.1.1     |
   |                 |  |  |                    |  |  |                 |
   |     DC West     |  |  |  IP/MPLS Backbone  |  |  |     DC East     |
   +-----------------+  |  |                    |  |  +-----------------+
                        |  +--------------------+  |
                        |                          |
                VRF_A : V                  VRF_A : V
   +------------+---------+--------+  +------------+---------+--------+
   |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.1/32 |127.0.0.1| Direct |  |10.1.1.1/32 |127.0.0.1| Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.2/32 |10.1.1.2 | Direct |  |10.1.1.2/32 |  PE-1   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.3/32 |  PE-2   |  IBGP  |  |10.1.1.3/32 |10.1.1.3 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   |10.1.1.0/24 |10.1.1.1 | Direct |  |10.1.1.0/24 |10.1.1.1 | Direct |
   +------------+---------+--------+  +------------+---------+--------+
   | 0.0.0.0/0  |  PE-3   |  IBGP  |  | 0.0.0.0/0  |  PE-3   |  IBGP  |
   +------------+---------+--------+  +------------+---------+--------+

               Figure 4: Inter-subnet Unicast Example (3)

   Alternatively, as shown in Figure 4, the PE routers themselves
   could be directly configured as default gateways for their locally
   connected CE hosts, as long as these PE routers have routes to
   outside networks.

3.2.  Multicast

   To support IP multicast between CE hosts of the same virtual
   subnet, Multicast VPN (MVPN) technologies [RFC6513] could be
   directly used without any change.  For example, PE routers attached
   to a given VPN join a default provider multicast distribution tree
   that is dedicated to that VPN.  Ingress PE routers, upon receiving
   multicast packets from their local CE hosts, forward them towards
   remote PE routers through the corresponding default provider
   multicast distribution tree.

3.3.  CE Host Discovery

   PE routers SHOULD be able to discover their local CE hosts and keep
   the list of these hosts up to date in a timely manner, so as to
   ensure the availability and accuracy of the corresponding host
   routes originated from them.  PE routers could accomplish local CE
   host discovery by means of traditional host discovery mechanisms
   based on ARP or ND.  Furthermore, the Link Layer Discovery Protocol
   (LLDP), the VSI Discovery and Configuration Protocol (VDP), or even
   interaction with the data center orchestration system could also be
   considered as a means to dynamically discover local CE hosts.

3.4.  ARP/ND Proxy

   Acting as an ARP or ND proxy, a PE router SHOULD respond to an ARP
   request or Neighbor Solicitation (NS) message for a target host
   only when it has a best route for that target host in the
   associated VRF and the outgoing interface of that best route is
   different from the one over which the ARP request or NS message was
   received.
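
   This rule can be captured in a small, runnable sketch (Python).
   The "oif" field, the interface names, and the dictionary shape are
   hypothetical stand-ins for a router's route lookup API, used here
   only to make the condition concrete:

      def should_proxy_reply(best_route, in_interface):
          """Reply to an ARP request or NS message for a target host
          only when the VRF holds a best route for the target AND that
          route's outgoing interface differs from the interface the
          request was received on."""
          return (best_route is not None
                  and best_route["oif"] != in_interface)

      # Request received on attachment circuit "ac0" for a host
      # reachable over the backbone: proxy-reply with the PE's MAC.
      print(should_proxy_reply({"prefix": "10.1.1.3/32",
                                "oif": "mpls0"}, "ac0"))   # True
      # Target reachable over the same attachment circuit: stay
      # silent and let the hosts resolve each other directly.
      print(should_proxy_reply({"prefix": "10.1.1.2/32",
                                "oif": "ac0"}, "ac0"))     # False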
   In the scenario where a given VPN site (i.e., a data center) is
   multihomed to more than one PE router via an Ethernet switch or an
   Ethernet network, the Virtual Router Redundancy Protocol (VRRP)
   [RFC5798] is usually enabled on these PE routers.  In this case,
   only the PE router elected as the VRRP Master is allowed to perform
   the ARP/ND proxy function.

3.5.  CE Host Mobility

   During the VM migration process, the PE router to which the moving
   VM is now attached creates a host route for that CE host upon
   receiving a notification message of VM attachment (e.g., a
   gratuitous ARP or an unsolicited NA message).  The PE router to
   which the moving VM was previously attached withdraws the
   corresponding host route when receiving a notification message of
   VM detachment (e.g., a VDP message about VM detachment).
   Meanwhile, the latter PE router could optionally broadcast a
   gratuitous ARP or send an unsolicited NA message on behalf of that
   CE host, with the source MAC address being one of its own.  In this
   way, the ARP/ND entry for the CE host that moved, as cached on any
   local CE host, is updated accordingly.  In the case where there is
   no explicit VM detachment notification mechanism, the PE router
   could instead use the following heuristic to detect the VM
   detachment event: upon learning a route update for a local CE host
   from a remote PE router for the first time, the PE router
   immediately checks whether that local CE host is still attached to
   it by some means (e.g., an ARP/ND ping and/or an ICMP ping).  It is
   important to ensure that the same MAC and IP addresses are
   associated with the default gateway active in each data center, as
   the VM will most likely continue to send packets to the same
   default gateway address after migrating from one data center to
   another.  One possible way to achieve this goal is to configure the
   same VRRP group in each location, so that the default gateways
   active in each data center share the same virtual MAC and virtual
   IP addresses.
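
   The mobility procedure above can be sketched informally as a pair
   of event handlers (Python).  The PE stub, its method names, and the
   MAC value are hypothetical illustrations of the described behavior,
   not an implementation; a real PE would hook into BGP and ARP/ND:

      class PE:
          """Hypothetical PE control-plane stub (illustration)."""
          def __init__(self, mac):
              self.mac, self.local_hosts = mac, set()
          def advertise_host_route(self, ip):     # /32 or /128 via BGP
              self.local_hosts.add(ip); print("advertise", ip)
          def withdraw_host_route(self, ip):
              self.local_hosts.discard(ip); print("withdraw", ip)
          def probe_local(self, ip):              # ARP/ND or ICMP ping
              return False                        # pretend the VM left
          def send_gratuitous_arp(self, ip, mac):
              print("gratuitous ARP:", ip, "is-at", mac)

      def on_vm_attach(pe, ip):
          # Triggered by, e.g., a gratuitous ARP or unsolicited NA.
          pe.advertise_host_route(ip)

      def on_remote_host_route(pe, ip):
          # Heuristic from above: a remote PE now advertises a route
          # for a host we still consider local, so probe it and clean
          # up the stale state if it is gone.
          if ip in pe.local_hosts and not pe.probe_local(ip):
              pe.withdraw_host_route(ip)
              pe.send_gratuitous_arp(ip, pe.mac)  # refresh CE caches

      pe1 = PE("00:00:5e:00:53:01")
      on_vm_attach(pe1, "10.1.1.3")           # VM arrives at PE-1
      on_remote_host_route(pe1, "10.1.1.3")   # later, PE-2 claims it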
3.6.  Forwarding Table Scalability on Data Center Switches

   In a Virtual Subnet environment, the MAC learning domain associated
   with a given virtual subnet that has been extended across multiple
   data centers is partitioned into segments, and each segment is
   confined within a single data center.  Therefore, data center
   switches only need to learn local MAC addresses, rather than both
   local and remote MAC addresses.

3.7.  ARP/ND Cache Table Scalability on Default Gateways

   When the default gateway function is implemented on PE routers, as
   shown in Figure 4, the ARP/ND cache table on each PE router only
   needs to contain the ARP/ND entries of local CE hosts.  As a
   result, the ARP/ND cache table size does not grow as the number of
   interconnected data centers increases.

3.8.  ARP/ND and Unknown Unicast Flood Avoidance

   In a Virtual Subnet environment, the flooding domain associated
   with a given virtual subnet that has been extended across multiple
   data centers is partitioned into segments, and each segment is
   confined within a single data center.  Therefore, the performance
   impact on networks and hosts imposed by the flooding of ARP/ND
   broadcast/multicast and unknown unicast traffic is alleviated.

3.9.  Path Optimization

   Take the scenario shown in Figure 4 as an example.  To optimize the
   forwarding path for the traffic between cloud users and cloud data
   centers, the PE routers located at the cloud data centers (i.e.,
   PE-1 and PE-2), which also act as default gateways, propagate host
   routes for their own local CE hosts to the remote PE routers
   attached to cloud user sites (i.e., PE-3).  As such, traffic from
   cloud user sites to a given server on the virtual subnet that has
   been extended across data centers is forwarded directly to the data
   center location where that server resides, since the traffic is now
   forwarded according to the host route for that server, rather than
   according to the subnet route.  Furthermore, for the traffic coming
   from cloud data centers and destined for cloud user sites, each PE
   router acting as a default gateway forwards the traffic according
   to the best-matching route in the corresponding VRF.  As a result,
   the traffic from data centers to cloud user sites is forwarded
   along an optimal path as well.

4.  Limitations

4.1.  Non-support of Non-IP Traffic

   Although most traffic within and across data centers is IP traffic,
   there may still be a few legacy clustering applications that rely
   on non-IP communications (e.g., heartbeat messages between cluster
   nodes).  Since Virtual Subnet is strictly based on Layer 3
   forwarding, such non-IP communications cannot be supported by the
   Virtual Subnet solution.  To support this non-IP traffic (if
   present) in an environment where the Virtual Subnet solution has
   been deployed, an approach following the idea of "route all IP
   traffic, bridge non-IP traffic" could be considered, as sketched
   below.  That is to say, all IP traffic, both intra-subnet and
   inter-subnet, would be processed by the Virtual Subnet forwarding
   process, while non-IP traffic would be handed over to a particular
   Layer 2 VPN forwarding process.  Such a unified L2/L3 VPN approach
   requires ingress PE routers to classify the traffic received from
   CE hosts before distributing it to the corresponding L2 or L3 VPN
   forwarding process.  Note that more and more cluster vendors are
   offering clustering applications based on Layer 3 interconnection.
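
   One way such a classifier could look at each frame is sketched
   below (Python).  The EtherType constants are the standard IEEE
   values; the function name and return labels are assumptions made
   purely for illustration of the "route IP, bridge non-IP" split:

      # EtherType-based classification at an ingress PE (sketch).
      ETHERTYPE_IPV4 = 0x0800
      ETHERTYPE_IPV6 = 0x86DD

      def classify(ethertype):
          if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
              return "L3VPN"   # Virtual Subnet (routed) process
          return "L2VPN"       # bridged, e.g., cluster heartbeats

      print(classify(ETHERTYPE_IPV4))  # L3VPN
      print(classify(0x88B5))          # L2VPN (a non-IP EtherType)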
4.2.  Non-support of IP Broadcast and Link-local Multicast

   As illustrated before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet solution.  Therefore, IP broadcast and link-
   local multicast traffic cannot be supported by the Virtual Subnet
   solution.  To support such traffic in an environment where the
   Virtual Subnet solution has been deployed, the unified L2/L3
   overlay approach described in Section 4.1 could be considered as
   well.  That is to say, IP broadcast and link-local multicast
   traffic would be handed over to the L2VPN forwarding process, while
   routable IP traffic would be processed by the Virtual Subnet
   forwarding process.

4.3.  TTL and Traceroute

   As illustrated before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet context.  Since Virtual Subnet doesn't
   require any change to the TTL handling mechanism of BGP/MPLS IP
   VPN, when a traceroute is performed on one CE host towards another
   CE host (assuming that these two hosts are within the same subnet
   but attached to different sites), the traceroute output will
   reflect the fact that these two hosts, although belonging to the
   same subnet, are actually connected via a virtual subnet emulated
   by ARP proxy, rather than via a normal LAN.  In addition, other
   applications that generate intra-subnet traffic with the TTL set to
   1 may not work in the Virtual Subnet context, unless special TTL
   processing for such a case has been implemented (e.g., if the
   source and destination addresses of a packet whose TTL is set to 1
   belong to the same extended subnet, neither the ingress nor the
   egress PE router SHOULD decrement the TTL of that packet;
   furthermore, the TTL of such a packet SHOULD NOT be copied into the
   TTL of the transport tunnel, nor vice versa).

5.  Acknowledgements

   Thanks to Susan Hares, Yongbing Fan, Dino Farinacci, Himanshu Shah,
   Nabil Bitar, Giles Heron, Ronald Bonica, Monique Morrow, Rajiv
   Asati, Eric Osborne, Thomas Morin, Martin Vigoureux, Pedro Roque
   Marques, Joe Touch, and Wim Henderickx for their valuable comments
   and suggestions on this document.  Thanks to Loa Andersson for his
   WG LC review of this document.

6.  IANA Considerations

   There is no requirement for any IANA action.

7.  Security Considerations

   This document doesn't introduce additional security risks to
   BGP/MPLS IP VPN, nor does it provide any additional security
   features for BGP/MPLS IP VPN.

8.  References

8.1.  Normative References

   [RFC0925]  Postel, J., "Multi-LAN address resolution", RFC 925,
              October 1984.

   [RFC1027]  Carl-Mitchell, S. and J. Quarterman, "Using ARP to
              implement transparent subnet gateways", RFC 1027,
              October 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4389]  Thaler, D., Talwar, M., and C. Patel, "Neighbor
              Discovery Proxies (ND Proxy)", RFC 4389, April 2006.

   [RFC4659]  De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur,
              "BGP-MPLS IP Virtual Private Network (VPN) Extension for
              IPv6 VPN", RFC 4659, September 2006.

   [RFC4761]  Kompella, K. and Y. Rekhter, "Virtual Private LAN
              Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, January 2007.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN
              Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)
              Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

   [RFC6513]  Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
              VPNs", RFC 6513, February 2012.

8.2.  Informative References

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              January 2013.

Authors' Addresses

   Xiaohu Xu
   Huawei

   Email: xuxiaohu@huawei.com

   Robert Raszuk
   Mirantis Inc.

   Email: robert@raszuk.net

   Christian Jacquenet
   Orange

   Email: christian.jacquenet@orange.com

   Truman Boyes
   Bloomberg LP

   Email: tboyes@bloomberg.net

   Brendan Fee
   Extreme Networks

   Email: bfee@enterasys.com