2 Network Working Group X. Xu 3 Internet-Draft Huawei 4 Intended status: Informational R. Raszuk 5 Expires: May 12, 2016 Mirantis Inc. 6 C. Jacquenet 7 Orange 8 T. Boyes 9 Bloomberg LP 10 B.
Fee 11 Extreme Networks 12 November 9, 2015 14 Virtual Subnet: A BGP/MPLS IP VPN-based Subnet Extension Solution 15 draft-ietf-bess-virtual-subnet-03 17 Abstract 19 This document describes a BGP/MPLS IP VPN-based subnet extension 20 solution referred to as Virtual Subnet, which can be used for 21 building Layer 3 network virtualization overlays within and/or 22 between data centers. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on May 12, 2016. 41 Copyright Notice 43 Copyright (c) 2015 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 59 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 60 2. Terminology . . . . . . . . . . . . . . . . . 
. . . . . . . . 4 61 3. Solution Description . . . . . . . . . . . . . . . . . . . . 4 62 3.1. Unicast . . . . . . . . . . . . . . . . . . . . . . . . . 4 63 3.1.1. Intra-subnet Unicast . . . . . . . . . . . . . . . . 4 64 3.1.2. Inter-subnet Unicast . . . . . . . . . . . . . . . . 5 65 3.2. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 8 66 3.3. Host Discovery . . . . . . . . . . . . . . . . . . . . . 9 67 3.4. ARP/ND Proxy . . . . . . . . . . . . . . . . . . . . . . 9 68 3.5. Host Mobility . . . . . . . . . . . . . . . . . . . . . . 9 69 3.6. Forwarding Table Scalability on Data Center Switches . . 10 70 3.7. ARP/ND Cache Table Scalability on Default Gateways . . . 10 71 3.8. ARP/ND and Unknown Unicast Flood Avoidance . . . . . . . 10 72 3.9. Path Optimization . . . . . . . . . . . . . . . . . . . . 10 73 4. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 11 74 4.1. Non-support of Non-IP Traffic . . . . . . . . . . . . . . 11 75 4.2. Non-support of IP Broadcast and Link-local Multicast . . 11 76 4.3. TTL and Traceroute . . . . . . . . . . . . . . . . . . . 11 77 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 78 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 79 7. Security Considerations . . . . . . . . . . . . . . . . . . . 12 80 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 81 8.1. Normative References . . . . . . . . . . . . . . . . . . 12 82 8.2. Informative References . . . . . . . . . . . . . . . . . 13 83 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 85 1. Introduction 87 For business continuity purposes, Virtual Machine (VM) migration 88 across data centers is commonly used in situations such as data 89 center maintenance, data center migration, data center consolidation, 90 data center expansion, and data center disaster avoidance.
It is 91 generally acknowledged that IP renumbering of servers (i.e., VMs) after 92 migration is usually complex and costly, at the risk of extending 93 the business downtime during the migration process. To allow the 94 migration of a VM from one data center to another without IP 95 renumbering, the subnet on which the VM resides needs to be extended 96 across these data centers. 98 To achieve subnet extension across multiple Infrastructure-as- 99 a-Service (IaaS) cloud data centers in a scalable way, the following 100 requirements and challenges must be considered: 102 a. VPN Instance Space Scalability: In a modern cloud data center 103 environment, thousands or even tens of thousands of tenants could 104 be hosted over a shared network infrastructure. For security and 105 performance isolation purposes, these tenants need to be isolated 106 from one another. 108 b. Forwarding Table Scalability: With the development of server 109 virtualization technologies, it's not uncommon for a single cloud 110 data center to contain millions of VMs. This number alone 111 poses a significant challenge to the forwarding table scalability of 112 data center switches. If multiple data centers of such 113 scale were interconnected at Layer 2, this challenge would become 114 even more serious. 116 c. ARP/ND Cache Table Scalability: [RFC6820] notes that the Address 117 Resolution Protocol (ARP)/Neighbor Discovery (ND) cache tables 118 maintained on default gateways within cloud data centers can 119 raise scalability issues. It is therefore very useful if the 120 ARP/ND cache table size can be prevented from growing in 121 proportion to the number of data centers being 122 connected. 124 d. ARP/ND and Unknown Unicast Flooding: It's well-known that the 125 flooding of ARP/ND broadcast/multicast and unknown unicast 126 traffic within large Layer 2 networks would affect the 127 performance of networks and hosts.
When multiple data centers, 128 each containing millions of VMs, are interconnected at Layer 2, 129 the impact of the flooding mentioned above would become even 130 worse. As such, it becomes increasingly important to avoid the 131 flooding of ARP/ND broadcast/multicast and unknown unicast 132 traffic across data centers. 134 e. Path Optimization: A subnet usually indicates a location in the 135 network. However, when a subnet has been extended across 136 multiple geographically dispersed data center locations, the 137 location semantics of such a subnet are no longer retained. As 138 a result, the traffic between a specific user and server, in 139 different data centers, may first be routed through a third data 140 center. This suboptimal routing would obviously result in an 141 unnecessary consumption of the bandwidth resources between data 142 centers. Furthermore, in the case where traditional VPLS 143 technology [RFC4761] [RFC4762] is used for data center 144 interconnect, return traffic from a server may be forwarded to a 145 default gateway located in a different data center, due to the 146 configuration of the virtual router redundancy group. This 147 suboptimal routing would also unnecessarily consume the bandwidth 148 resources between data centers. 150 This document describes a BGP/MPLS IP VPN-based subnet extension 151 solution referred to as Virtual Subnet, which can be used for data 152 center interconnection while addressing all of the requirements and 153 challenges mentioned above. Here, BGP/MPLS IP VPN refers to both 154 BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659]. In 155 addition, since Virtual Subnet is mainly built on proven technologies 156 such as BGP/MPLS IP VPN and ARP/ND proxy [RFC0925][RFC1027][RFC4389], 157 service providers offering IaaS public cloud services could 158 rely upon their existing BGP/MPLS IP VPN infrastructures and 159 operational experience to realize data center interconnection.
161 Although Virtual Subnet is described in this document as an approach 162 for data center interconnection, it could actually be used within 163 data centers as well. 165 Note that the approach described in this document is not intended to 166 achieve an exact emulation of Layer 2 connectivity, and therefore it 167 can only support a restricted Layer 2 connectivity service model with 168 the limitations described in Section 4. A discussion of the 169 environments in which this service model is suitable is outside 170 the scope of this document. 172 1.1. Requirements Language 174 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 175 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 176 document are to be interpreted as described in RFC 2119 [RFC2119]. 178 2. Terminology 180 This memo makes use of the terms defined in [RFC4364]. 182 3. Solution Description 184 3.1. Unicast 186 3.1.1. Intra-subnet Unicast 187 +--------------------+ 188 +------------------+ | | +------------------+ 189 |VPN_A:192.0.2.1/24| | | |VPN_A:192.0.2.1/24| 190 | \ | | | | / | 191 | +------+ \ ++---+-+ +-+---++/ +------+ | 192 | |Host A+-----+ PE-1 | | PE-2 +----+Host B| | 193 | +------+\ ++-+-+-+ +-+-+-++ /+------+ | 194 | 192.0.2.2/24 | | | | | | 192.0.2.3/24 | 195 | | | | | | | | 196 | DC West | | | IP/MPLS Backbone | | | DC East | 197 +------------------+ | | | | +------------------+ 198 | +--------------------+ | 199 | | 200 VRF_A : V VRF_A : V 201 +------------+---------+--------+ +------------+---------+--------+ 202 | Prefix | Nexthop |Protocol| | Prefix | Nexthop |Protocol| 203 +------------+---------+--------+ +------------+---------+--------+ 204 |192.0.2.1/32|127.0.0.1| Direct | |192.0.2.1/32|127.0.0.1| Direct | 205 +------------+---------+--------+ +------------+---------+--------+ 206 |192.0.2.2/32|192.0.2.2| Direct | |192.0.2.2/32| PE-1 | IBGP | 207 +------------+---------+--------+ +------------+---------+--------+ 208
|192.0.2.3/32| PE-2 | IBGP | |192.0.2.3/32|192.0.2.3| Direct | 209 +------------+---------+--------+ +------------+---------+--------+ 210 |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct | 211 +------------+---------+--------+ +------------+---------+--------+ 212 Figure 1: Intra-subnet Unicast Example 214 As shown in Figure 1, two hosts (i.e., Hosts A and B) belonging to 215 the same subnet (i.e., 192.0.2.0/24) are located in different data 216 centers (i.e., DC West and DC East), respectively. PE routers (i.e., 217 PE-1 and PE-2), which are used to interconnect these two data 218 centers, create host routes for their own local hosts and 219 then advertise them via BGP/MPLS IP VPN signaling. Meanwhile, an 220 ARP proxy is enabled on the VRF attachment circuits of these PE routers. 222 Now assume host A sends an ARP request for host B prior to 223 communicating with host B. Upon receiving the ARP request, PE-1, 224 acting as an ARP proxy, returns its own MAC address as a response. 225 Host A then sends IP packets for host B to PE-1. PE-1 tunnels such 226 packets towards PE-2, which in turn forwards them to host B. Thus, 227 hosts A and B can communicate with each other as if they were located 228 within the same subnet. 230 3.1.2.
Inter-subnet Unicast 231 +--------------------+ 232 +------------------+ | | +------------------+ 233 |VPN_A:192.0.2.1/24| | | |VPN_A:192.0.2.1/24| 234 | \ | | | | / | 235 | +------+ \ ++---+-+ +-+---++/ +------+ | 236 | |Host A+-------+ PE-1 | | PE-2 +-+----+Host B| | 237 | +------+\ ++-+-+-+ +-+-+-++ | /+------+ | 238 | 192.0.2.2/24 | | | | | | | 192.0.2.3/24 | 239 | GW=192.0.2.4 | | | | | | | GW=192.0.2.4 | 240 | | | | | | | | +------+ | 241 | | | | | | | +----+ GW +-- | 242 | | | | | | | /+------+ | 243 | | | | | | | 192.0.2.4/24 | 244 | | | | | | | | 245 | DC West | | | IP/MPLS Backbone | | | DC East | 246 +------------------+ | | | | +------------------+ 247 | +--------------------+ | 248 | | 249 VRF_A : V VRF_A : V 250 +------------+---------+--------+ +------------+---------+--------+ 251 | Prefix | Nexthop |Protocol| | Prefix | Nexthop |Protocol| 252 +------------+---------+--------+ +------------+---------+--------+ 253 |192.0.2.1/32|127.0.0.1| Direct | |192.0.2.1/32|127.0.0.1| Direct | 254 +------------+---------+--------+ +------------+---------+--------+ 255 |192.0.2.2/32|192.0.2.2| Direct | |192.0.2.2/32| PE-1 | IBGP | 256 +------------+---------+--------+ +------------+---------+--------+ 257 |192.0.2.3/32| PE-2 | IBGP | |192.0.2.3/32|192.0.2.3| Direct | 258 +------------+---------+--------+ +------------+---------+--------+ 259 |192.0.2.4/32| PE-2 | IBGP | |192.0.2.4/32|192.0.2.4| Direct | 260 +------------+---------+--------+ +------------+---------+--------+ 261 |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct | 262 +------------+---------+--------+ +------------+---------+--------+ 263 | 0.0.0.0/0 | PE-2 | IBGP | | 0.0.0.0/0 |192.0.2.4| Static | 264 +------------+---------+--------+ +------------+---------+--------+ 265 Figure 2: Inter-subnet Unicast Example (1) 267 As shown in Figure 2, only one data center (i.e., DC East) is 268 deployed with a default gateway (i.e., GW). 
PE-2, which is connected 269 to GW, would either be configured with, or learn from GW, a default 270 route with the next hop pointing to GW. Meanwhile, this route is 271 distributed to other PE routers (i.e., PE-1) as per normal [RFC4364] 272 operation. Assume host A sends an ARP request for its default 273 gateway (i.e., 192.0.2.4) prior to communicating with a destination 274 host outside of its subnet. Upon receiving this ARP request, PE-1, 275 acting as an ARP proxy, returns its own MAC address as a response. 276 Host A then sends packets destined for outside of its subnet to PE-1. PE-1 tunnels such 277 packets towards PE-2 according to the default route learned from PE-2, 278 which in turn forwards them to GW. 280 +--------------------+ 281 +------------------+ | | +------------------+ 282 |VPN_A:192.0.2.1/24| | | |VPN_A:192.0.2.1/24| 283 | \ | | | | / | 284 | +------+ \ ++---+-+ +-+---++/ +------+ | 285 | |Host A+----+--+ PE-1 | | PE-2 +-+----+Host B| | 286 | +------+\ | ++-+-+-+ +-+-+-++ | /+------+ | 287 | 192.0.2.2/24 | | | | | | | | 192.0.2.3/24 | 288 | GW=192.0.2.4 | | | | | | | | GW=192.0.2.4 | 289 | +------+ | | | | | | | | +------+ | 290 |--+ GW-1 +----+ | | | | | | +----+ GW-2 +-- | 291 | +------+\ | | | | | | /+------+ | 292 | 192.0.2.4/24 | | | | | | 192.0.2.4/24 | 293 | | | | | | | | 294 | DC West | | | IP/MPLS Backbone | | | DC East | 295 +------------------+ | | | | +------------------+ 296 | +--------------------+ | 297 | | 298 VRF_A : V VRF_A : V 299 +------------+---------+--------+ +------------+---------+--------+ 300 | Prefix | Nexthop |Protocol| | Prefix | Nexthop |Protocol| 301 +------------+---------+--------+ +------------+---------+--------+ 302 |192.0.2.1/32|127.0.0.1| Direct | |192.0.2.1/32|127.0.0.1| Direct | 303 +------------+---------+--------+ +------------+---------+--------+ 304 |192.0.2.2/32|192.0.2.2| Direct | |192.0.2.2/32| PE-1 | IBGP | 305 +------------+---------+--------+ +------------+---------+--------+ 306 |192.0.2.3/32| PE-2 | IBGP |
|192.0.2.3/32|192.0.2.3| Direct | 307 +------------+---------+--------+ +------------+---------+--------+ 308 |192.0.2.4/32|192.0.2.4| Direct | |192.0.2.4/32|192.0.2.4| Direct | 309 +------------+---------+--------+ +------------+---------+--------+ 310 |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct | 311 +------------+---------+--------+ +------------+---------+--------+ 312 | 0.0.0.0/0 |192.0.2.4| Static | | 0.0.0.0/0 |192.0.2.4| Static | 313 +------------+---------+--------+ +------------+---------+--------+ 314 Figure 3: Inter-subnet Unicast Example (2) 316 As shown in Figure 3, in the case where each data center is deployed 317 with a default gateway, hosts will get ARP responses directly from 318 their local default gateways, rather than from their local PE routers 319 when sending ARP requests for their default gateways. 321 +------+ 322 +------+ PE-3 +------+ 323 +------------------+ | +------+ | +------------------+ 324 |VPN_A:192.0.2.1/24| | | |VPN_A:192.0.2.1/24| 325 | \ | | | | / | 326 | +------+ \ ++---+-+ +-+---++/ +------+ | 327 | |Host A+-------+ PE-1 | | PE-2 +------+Host B| | 328 | +------+\ ++-+-+-+ +-+-+-++ /+------+ | 329 | 192.0.2.2/24 | | | | | | 192.0.2.3/24 | 330 | GW=192.0.2.1 | | | | | | GW=192.0.2.1 | 331 | | | | | | | | 332 | DC West | | | IP/MPLS Backbone | | | DC East | 333 +------------------+ | | | | +------------------+ 334 | +--------------------+ | 335 | | 336 VRF_A : V VRF_A : V 337 +------------+---------+--------+ +------------+---------+--------+ 338 | Prefix | Nexthop |Protocol| | Prefix | Nexthop |Protocol| 339 +------------+---------+--------+ +------------+---------+--------+ 340 |192.0.2.1/32|127.0.0.1| Direct | |192.0.2.1/32|127.0.0.1| Direct | 341 +------------+---------+--------+ +------------+---------+--------+ 342 |192.0.2.2/32|192.0.2.2| Direct | |192.0.2.2/32| PE-1 | IBGP | 343 +------------+---------+--------+ +------------+---------+--------+ 344 |192.0.2.3/32| PE-2 | IBGP | 
|192.0.2.3/32|192.0.2.3| Direct | 345 +------------+---------+--------+ +------------+---------+--------+ 346 |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct | 347 +------------+---------+--------+ +------------+---------+--------+ 348 | 0.0.0.0/0 | PE-3 | IBGP | | 0.0.0.0/0 | PE-3 | IBGP | 349 +------------+---------+--------+ +------------+---------+--------+ 350 Figure 4: Inter-subnet Unicast Example (3) 352 Alternatively, as shown in Figure 4, PE routers themselves could be 353 directly configured as default gateways of their locally connected 354 hosts, as long as these PE routers have routes to outside networks. 356 3.2. Multicast 358 To support IP multicast between hosts of the same Virtual Subnet, 359 MVPN technologies [RFC6513] could be used directly without any 360 change. For example, PE routers attached to a given VPN join a 361 default provider multicast distribution tree that is dedicated to 362 that VPN. Ingress PE routers, upon receiving multicast packets from 363 their local hosts, forward them towards remote PE routers through the 364 corresponding default provider multicast distribution tree. Note 365 that IP multicast here doesn't include link-local multicast. 367 3.3. Host Discovery 369 PE routers should be able to discover their local hosts and keep the 370 list of these hosts up to date in a timely manner so as to ensure the 371 availability and accuracy of the corresponding host routes originated 372 from them. PE routers could accomplish local host discovery by 373 traditional host discovery mechanisms using the ARP or ND protocols. 375 3.4. ARP/ND Proxy 377 Acting as an ARP or ND proxy, a PE router should only respond to 378 an ARP request or Neighbor Solicitation (NS) message for a target 379 host when it has a best route for that target host in the associated 380 VRF and the outgoing interface of that best route is different from 381 the one over which the ARP request or NS message was received.
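The proxy decision above amounts to a longest-prefix-match lookup in the VRF followed by an interface comparison. The following sketch illustrates that rule; the table layout and interface names (e.g., "ac0", "mpls0") are illustrative assumptions, not part of this document:

```python
from ipaddress import ip_address, ip_network

def should_proxy_reply(vrf_routes, target_ip, in_interface):
    """Return True if the PE should answer an ARP/NS for target_ip
    received on in_interface: reply only when a best route for the
    target exists in the VRF and its outgoing interface differs from
    the one the request arrived on."""
    target = ip_address(target_ip)
    best = None
    # Longest-prefix match over the VRF routing table.
    for prefix, out_interface in vrf_routes:
        net = ip_network(prefix)
        if target in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, out_interface)
    if best is None:
        return False              # no route for the target: stay silent
    return best[1] != in_interface

# Example: PE-1's VRF_A from Figure 1 (interface names are illustrative).
vrf_a = [
    ("192.0.2.2/32", "ac0"),      # local Host A, via the attachment circuit
    ("192.0.2.3/32", "mpls0"),    # Host B, learned from PE-2 via IBGP
    ("192.0.2.0/24", "ac0"),      # the subnet route
]
assert should_proxy_reply(vrf_a, "192.0.2.3", "ac0") is True   # remote host: proxy
assert should_proxy_reply(vrf_a, "192.0.2.2", "ac0") is False  # same interface: silent
```

With PE-1's VRF_A from Figure 1, the PE would answer an ARP request for the remote host 192.0.2.3 but stay silent for the local host 192.0.2.2, whose best route points back out the same attachment circuit.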
In the 382 scenario where a given VPN site (i.e., a data center) is multi-homed 383 to more than one PE router via an Ethernet switch or an Ethernet 384 network, the Virtual Router Redundancy Protocol (VRRP) [RFC5798] is 385 usually enabled on these PE routers. In this case, only the PE 386 router elected as the VRRP Master is allowed to perform the 387 ARP/ND proxy function. 389 3.5. Host Mobility 391 During the VM migration process, the PE router to which the moving VM 392 is now attached would create a host route for that host upon 393 receiving a notification message of VM attachment (e.g., a gratuitous 394 ARP or unsolicited NA message). The PE router to which the moving VM 395 was previously attached would withdraw the corresponding host route 396 upon receiving a notification message of VM detachment (e.g., a VDP 397 message about VM detachment). Meanwhile, the latter PE router could 398 optionally broadcast a gratuitous ARP or send an unsolicited NA 399 message on behalf of that host, with the source MAC address being one of 400 its own. In this way, the ARP/ND entry for the host that moved, as 401 cached on any local host, would be updated accordingly. 402 In the case where there is no explicit VM detachment notification 403 mechanism, the PE router could also use the following method to 404 detect VM detachment: upon learning a route update for a 405 local host from a remote PE router for the first time, the PE router 406 could immediately check whether that local host is still attached to 407 it by some means (e.g., ARP/ND PING and/or ICMP PING). It is 408 important to ensure that the same MAC and IP addresses are associated with the 409 default gateway active in each data center, as the VM would most 410 likely continue to send packets to the same default gateway address 411 after being migrated from one data center to another.
One possible way to 412 achieve this goal is to configure the same VRRP group at each 413 location so as to ensure that the default gateways active in each data 414 center share the same virtual MAC and virtual IP addresses. 416 3.6. Forwarding Table Scalability on Data Center Switches 418 In a Virtual Subnet environment, the MAC learning domain associated 419 with a given Virtual Subnet that has been extended across multiple 420 data centers is partitioned into segments, and each segment is 421 confined within a single data center. Therefore, data center switches 422 only need to learn local MAC addresses, rather than learning both 423 local and remote MAC addresses. 425 3.7. ARP/ND Cache Table Scalability on Default Gateways 427 When default gateway functions are implemented on PE routers as shown 428 in Figure 4, the ARP/ND cache table on each PE router only needs to 429 contain ARP/ND entries of local hosts. As a result, the ARP/ND cache 430 table size would not grow as the number of data centers to be 431 connected increases. 433 3.8. ARP/ND and Unknown Unicast Flood Avoidance 435 In a Virtual Subnet environment, the flooding domain associated with a given Virtual Subnet 436 that has been extended across multiple data centers is partitioned 437 into segments, and each segment is confined within a single data 438 center. Therefore, the performance impact on networks and servers 439 imposed by the flooding of ARP/ND broadcast/multicast and unknown 440 unicast traffic is alleviated. 442 3.9. Path Optimization 444 Take the scenario shown in Figure 4 as an example: to optimize the 445 forwarding path for the traffic between cloud users and cloud data 446 centers, PE routers located at cloud data centers (i.e., PE-1 and 447 PE-2), which are also acting as default gateways, propagate host routes 448 for their own local hosts to remote PE routers that are 449 attached to cloud user sites (i.e., PE-3).
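This path optimization is ordinary longest-prefix matching: once host routes are propagated, a /32 host route is preferred over the covering /24 subnet route. A minimal sketch of PE-3's resulting lookup (prefixes from Figure 4; the next-hop labels are illustrative assumptions):

```python
from ipaddress import ip_address, ip_network

def lookup(routes, dst):
    """Longest-prefix match: the most specific matching route wins."""
    dst = ip_address(dst)
    matches = [(ip_network(p), nh) for p, nh in routes if dst in ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# PE-3's VRF once host routes have been propagated (next hops illustrative):
pe3_vrf = [
    ("192.0.2.2/32", "PE-1"),  # Host A's host route, from DC West
    ("192.0.2.3/32", "PE-2"),  # Host B's host route, from DC East
    ("192.0.2.0/24", "PE-1"),  # covering subnet route
]
assert lookup(pe3_vrf, "192.0.2.3") == "PE-2"  # straight to DC East
assert lookup(pe3_vrf, "192.0.2.2") == "PE-1"  # straight to DC West
```

Without the host routes, traffic for Host B would match only the /24 and could detour through the wrong data center; with them, each flow goes directly to the PE serving the host's current location.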
As such, the traffic from 450 cloud user sites to a given server on the Virtual Subnet that has 451 been extended across data centers would be forwarded directly to the 452 data center location where that server resides, since the traffic is 453 now forwarded according to the host route for that server, rather 454 than the subnet route. Furthermore, for the traffic coming from 455 cloud data centers and forwarded to cloud user sites, each PE router 456 acting as a default gateway would forward the traffic according to 457 the best-match route in the corresponding VRF. As a result, the 458 traffic from data centers to cloud user sites is forwarded along an 459 optimal path as well. 461 4. Limitations 463 4.1. Non-support of Non-IP Traffic 465 Although most traffic within and across data centers is IP traffic, 466 there may still be a few legacy clustering applications that rely on 467 non-IP communications (e.g., heartbeat messages between cluster 468 nodes). Since Virtual Subnet is strictly based on L3 forwarding, 469 those non-IP communications cannot be supported in the Virtual Subnet 470 solution. In order to support such non-IP traffic (if present) 471 in an environment where the Virtual Subnet solution has been 472 deployed, an approach following the idea of "route all IP traffic, 473 bridge non-IP traffic" could be considered. That is to say, all IP 474 traffic, both intra-subnet and inter-subnet, would be 475 processed by the Virtual Subnet forwarding process, while non-IP traffic 476 would be handled by a particular Layer 2 VPN approach. Such a unified 477 L2/L3 VPN approach requires ingress PE routers to classify the 478 traffic received from hosts before distributing it to the 479 corresponding L2 or L3 VPN forwarding process. Note that more and 480 more cluster vendors are offering clustering applications based on 481 Layer 3 interconnection. 483 4.2.
Non-support of IP Broadcast and Link-local Multicast 485 As illustrated before, intra-subnet traffic is forwarded at Layer 3 486 in the Virtual Subnet solution. Therefore, IP broadcast and link- 487 local multicast traffic cannot be supported by the Virtual Subnet 488 solution. In order to support IP broadcast and link-local 489 multicast traffic in an environment where the Virtual Subnet 490 solution has been deployed, the unified L2/L3 overlay approach 491 described in Section 4.1 could be considered as well. That is to say, 492 IP broadcast and link-local multicast would be handled by the 493 L2VPN forwarding process, while routable IP traffic would be 494 processed by the Virtual Subnet forwarding process. 496 4.3. TTL and Traceroute 498 As illustrated before, intra-subnet traffic is forwarded at Layer 3 499 in the Virtual Subnet context. Since Virtual Subnet doesn't require any change 500 to the TTL handling mechanism of BGP/MPLS IP VPN, when doing a 501 traceroute from one host to another host (assuming that 502 these two hosts are within the same subnet but are attached to 503 different sites), the traceroute output would reflect the fact that 504 these two hosts within the same subnet are actually connected via a 505 Virtual Subnet rather than a Layer 2 connection, since the PE routers 506 to which those two hosts are respectively connected would be displayed 507 in the traceroute output. In addition, any other applications 508 that generate intra-subnet traffic with the TTL set to 1 may 509 not work in the Virtual Subnet context, 510 unless special TTL processing for such a case has been implemented 511 (e.g., if the source and destination addresses of a packet whose TTL 512 is set to 1 belong to the same extended subnet, neither ingress nor 513 egress PE routers should decrement the TTL of such a packet. 514 Furthermore, the TTL of such a packet should not be copied into the TTL 515 of the transport tunnel, and vice versa).
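The special-case TTL processing suggested in the parenthetical above could be sketched as follows. This is a hypothetical PE-side check under stated assumptions, not behavior specified by this document:

```python
from ipaddress import ip_address, ip_network

# Subnets that have been stretched across data centers (illustrative).
EXTENDED_SUBNETS = [ip_network("192.0.2.0/24")]

def same_extended_subnet(src, dst):
    """True if both endpoints sit on the same extended subnet."""
    return any(ip_address(src) in n and ip_address(dst) in n
               for n in EXTENDED_SUBNETS)

def forward_ttl(src, dst, ttl):
    """Return the TTL to forward with, or None to drop the packet.
    For intra-subnet traffic the two hosts are notionally one hop
    apart, so the PE leaves the TTL untouched (and would likewise not
    copy it into the transport tunnel's TTL)."""
    if same_extended_subnet(src, dst):
        return ttl                 # no decrement across the extended subnet
    if ttl <= 1:
        return None                # expired: drop (and send ICMP Time Exceeded)
    return ttl - 1                 # normal Layer 3 forwarding

assert forward_ttl("192.0.2.2", "192.0.2.3", 1) == 1   # intra-subnet TTL=1 survives
assert forward_ttl("192.0.2.2", "198.51.100.1", 1) is None
```

Under this behavior, an application sending intra-subnet packets with TTL=1 would keep working, while traceroute across the extended subnet would no longer reveal the intermediate PE hops.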
517 5. Acknowledgements 519 Thanks to Susan Hares, Yongbing Fan, Dino Farinacci, Himanshu Shah, 520 Nabil Bitar, Giles Heron, Ronald Bonica, Monique Morrow, Rajiv Asati, 521 Eric Osborne, Thomas Morin, Martin Vigoureux, Pedro Roque Marque, Joe 522 Touch and Wim Henderickx for their valuable comments and suggestions 523 on this document. Thanks to Loa Andersson for his WG LC review on 524 this document. Thanks to Alvaro Retana for his AD review on this 525 document. Thanks to Ronald Bonica for his RtgDir review. 527 6. IANA Considerations 529 There is no requirement for any IANA action. 531 7. Security Considerations 533 This document doesn't introduce additional security risk to BGP/MPLS 534 IP VPN, nor does it provide any additional security feature for BGP/ 535 MPLS IP VPN. 537 8. References 539 8.1. Normative References 541 [RFC0925] Postel, J., "Multi-LAN address resolution", RFC 925, 542 DOI 10.17487/RFC0925, October 1984, 543 . 545 [RFC1027] Carl-Mitchell, S. and J. Quarterman, "Using ARP to 546 implement transparent subnet gateways", RFC 1027, 547 DOI 10.17487/RFC1027, October 1987, 548 . 550 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 551 Requirement Levels", BCP 14, RFC 2119, 552 DOI 10.17487/RFC2119, March 1997, 553 . 555 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 556 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 557 2006, . 559 [RFC4389] Thaler, D., Talwar, M., and C. Patel, "Neighbor Discovery 560 Proxies (ND Proxy)", RFC 4389, DOI 10.17487/RFC4389, April 561 2006, . 563 8.2. Informative References 565 [RFC4659] De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur, 566 "BGP-MPLS IP Virtual Private Network (VPN) Extension for 567 IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, September 2006, 568 . 570 [RFC4761] Kompella, K., Ed. and Y. Rekhter, Ed., "Virtual Private 571 LAN Service (VPLS) Using BGP for Auto-Discovery and 572 Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007, 573 . 
575 [RFC4762] Lasserre, M., Ed. and V. Kompella, Ed., "Virtual Private 576 LAN Service (VPLS) Using Label Distribution Protocol (LDP) 577 Signaling", RFC 4762, DOI 10.17487/RFC4762, January 2007, 578 . 580 [RFC5798] Nadas, S., Ed., "Virtual Router Redundancy Protocol (VRRP) 581 Version 3 for IPv4 and IPv6", RFC 5798, 582 DOI 10.17487/RFC5798, March 2010, 583 . 585 [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/ 586 BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 587 2012, . 589 [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution 590 Problems in Large Data Center Networks", RFC 6820, 591 DOI 10.17487/RFC6820, January 2013, 592 . 594 Authors' Addresses 596 Xiaohu Xu 597 Huawei 599 Email: xuxiaohu@huawei.com 600 Robert Raszuk 601 Mirantis Inc. 603 Email: robert@raszuk.net 605 Christian Jacquenet 606 Orange 608 Email: christian.jacquenet@orange.com 610 Truman Boyes 611 Bloomberg LP 613 Email: tboyes@bloomberg.net 615 Brendan Fee 616 Extreme Networks 618 Email: bfee@extremenetworks.com