Network Working Group                                              X. Xu
Internet-Draft                                       Huawei Technologies
Intended status: Informational                                 R. Raszuk
Expires: May 30, 2016                                       Bloomberg LP
                                                            C. Jacquenet
                                                                  Orange
                                                                T. Boyes
                                                            Bloomberg LP
                                                                  B. Fee
                                                        Extreme Networks
                                                       November 27, 2015


    Virtual Subnet: A BGP/MPLS IP VPN-based Subnet Extension Solution
                    draft-ietf-bess-virtual-subnet-06

Abstract

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for
   building Layer 3 network virtualization overlays within and/or
   between data centers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 30, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Solution Description
     3.1.  Unicast
       3.1.1.  Intra-subnet Unicast
       3.1.2.  Inter-subnet Unicast
     3.2.  Multicast
     3.3.  Host Discovery
     3.4.  ARP/ND Proxy
     3.5.  Host Mobility
     3.6.  Forwarding Table Scalability on Data Center Switches
     3.7.  ARP/ND Cache Table Scalability on Default Gateways
     3.8.  ARP/ND and Unknown Unicast Flood Avoidance
     3.9.  Path Optimization
   4.  Limitations
     4.1.  Non-support of Non-IP Traffic
     4.2.  Non-support of IP Broadcast and Link-local Multicast
     4.3.  TTL and Traceroute
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, migration, consolidation, expansion, or disaster
   avoidance.  Renumbering the IP addresses of servers (i.e., VMs)
   after such a migration is usually complex and costly, and risks
   extending the business downtime incurred during the migration.  To
   allow the migration of a VM from one data center to another without
   IP renumbering, the subnet on which the VM resides needs to be
   extended across these data centers.

   To achieve subnet extension across multiple cloud data centers in a
   scalable way, the following requirements and challenges must be
   considered:

   a.  VPN Instance Space Scalability: In a modern cloud data center
       environment, thousands or even tens of thousands of tenants
       could be hosted over a shared network infrastructure.  For
       security and performance isolation purposes, these tenants need
       to be isolated from one another.

   b.  Forwarding Table Scalability: With the development of server
       virtualization technologies, it's not uncommon for a single
       cloud data center to contain millions of VMs.  This number
       already poses a significant challenge to the forwarding table
       scalability of data center switches.  If multiple data centers
       of such a scale were interconnected at Layer 2, the challenge
       would become even greater.

   c.  ARP/ND Cache Table Scalability: [RFC6820] notes that the Address
       Resolution Protocol (ARP)/Neighbor Discovery (ND) cache tables
       maintained by default gateways within cloud data centers can
       raise scalability issues.  Keeping the size of the ARP/ND cache
       tables under control is therefore critical as the number of data
       centers to be connected increases.
   d.  ARP/ND and Unknown Unicast Flooding: It's well known that the
       flooding of ARP/ND broadcast/multicast messages, as well as of
       unknown unicast traffic, within large Layer 2 networks is likely
       to affect network and host performance.  When multiple data
       centers that each host millions of VMs are interconnected at
       Layer 2, the impact of such flooding becomes even worse.  As
       such, it becomes increasingly important to avoid the flooding of
       ARP/ND broadcast/multicast and unknown unicast traffic across
       data centers.

   e.  Path Optimization: A subnet usually indicates a location in the
       network.  However, when a subnet has been extended across
       multiple geographically dispersed data center locations, the
       location semantics of such a subnet are no longer retained.  As
       a result, traffic exchanged between a user and a server located
       in different data centers may first be forwarded through a third
       data center.  This suboptimal routing would obviously result in
       an unnecessary consumption of the bandwidth resources between
       data centers.  Furthermore, in the case where traditional VPLS
       technology [RFC4761] [RFC4762] is used for data center
       interconnect, return traffic from a server may be forwarded to a
       default gateway located in a different data center, due to the
       configuration of a virtual router redundancy group.  This
       suboptimal routing would also unnecessarily consume the
       bandwidth resources between data centers.

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for data
   center interconnection while addressing all of the aforementioned
   requirements and challenges.  Here, BGP/MPLS IP VPN refers to both
   BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659].  In
   addition, since Virtual Subnet is mainly built on proven
   technologies such as BGP/MPLS IP VPN and ARP/ND proxy
   [RFC0925][RFC1027][RFC4389], service providers that offer
   Infrastructure as a Service (IaaS) cloud services can rely upon
   their existing BGP/MPLS IP VPN infrastructure and take advantage of
   their BGP/MPLS VPN operational experience to interconnect data
   centers.

   Although Virtual Subnet is described in this document as an approach
   for data center interconnection, it can be used within data centers
   as well.

   Note that the approach described in this document is not intended to
   achieve an exact emulation of Layer 2 connectivity; it can therefore
   only support a restricted Layer 2 connectivity service model, whose
   limitations are discussed in Section 4.  A discussion of where this
   service model can apply is outside the scope of this document.

2.  Terminology

   This memo makes use of the terms defined in [RFC4364].
3.  Solution Description

3.1.  Unicast

3.1.1.  Intra-subnet Unicast

                            +--------------------+
    +------------------+    |                    |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |                \ |    |                    |    | /                |
    |  +------+      \ ++---+-+                +-+---++/      +------+   |
    |  |Host A+--------+ PE-1 |                | PE-2 +-------+Host B|   |
    |  +------+\       ++-+-+-+                +-+-+-++      /+------+   |
    |  192.0.2.2/24    |  | |                    | |  | 192.0.2.3/24     |
    |                  |  | |                    | |  |                  |
    |      DC West     |  | |  IP/MPLS Backbone  | |  |      DC East     |
    +------------------+  | |                    | |  +------------------+
                          | +--------------------+ |
                          |                        |
                  VRF_A : V                VRF_A : V
    +------------+---------+--------+   +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|   |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |   |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |   |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |   |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |   |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+

                  Figure 1: Intra-subnet Unicast Example

   As shown in Figure 1, two hosts (i.e., Hosts A and B) belonging to
   the same subnet (i.e., 192.0.2.0/24) are located in different data
   centers (i.e., DC West and DC East, respectively).  PE routers
   (i.e., PE-1 and PE-2) that are used for interconnecting these two
   data centers create host routes for their own local hosts and then
   advertise these routes by means of BGP/MPLS IP VPN signaling.
   Meanwhile, an ARP proxy is enabled on the Virtual Routing and
   Forwarding (VRF) attachment circuits of these PE routers.

   Let's now assume that Host A sends an ARP request for Host B before
   communicating with Host B.  Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets for Host B to PE-1.  PE-1 tunnels such
   packets towards PE-2, which in turn forwards them to Host B.  Thus,
   Hosts A and B can communicate with each other as if they were
   located within the same subnet.
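   The behavior above can be illustrated with a minimal Python sketch.
   This sketch is not part of the solution specification: the Vrf
   class, its lookup() helper, and the MAC address shown are
   illustrative assumptions that merely mirror the VRF_A tables of
   Figure 1.

   # Minimal sketch of PE-1's behavior in Figure 1; all names here are
   # hypothetical, not taken from this document.
   import ipaddress

   class Vrf:
       def __init__(self):
           self.routes = {}   # prefix -> (nexthop, protocol)

       def add(self, prefix, nexthop, protocol):
           self.routes[ipaddress.ip_network(prefix)] = (nexthop, protocol)

       def lookup(self, dest):
           # Longest-prefix-match lookup, as a VRF FIB would perform.
           dest = ipaddress.ip_address(dest)
           best = max((p for p in self.routes if dest in p),
                      key=lambda p: p.prefixlen)
           return self.routes[best]

   PE1_MAC = "00:00:5e:00:53:01"   # illustrative MAC address

   vrf_a = Vrf()                                      # PE-1's VRF_A
   vrf_a.add("192.0.2.2/32", "192.0.2.2", "Direct")   # local Host A
   vrf_a.add("192.0.2.3/32", "PE-2",      "IBGP")     # remote Host B
   vrf_a.add("192.0.2.0/24", "192.0.2.1", "Direct")   # the subnet itself

   # ARP proxy: PE-1 answers Host A's request for Host B with its own
   # MAC address.
   arp_reply_mac = PE1_MAC

   # Data path: the /32 host route wins over the subnet route, so
   # packets for Host B are tunneled across the backbone towards PE-2.
   assert vrf_a.lookup("192.0.2.3") == ("PE-2", "IBGP")

   A real PE router would, of course, install these routes in its VRF
   forwarding table and impose the appropriate VPN and transport
   labels; the sketch only reproduces the route selection.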
3.1.2.  Inter-subnet Unicast

                            +--------------------+
    +------------------+    |                    |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |                \ |    |                    |    | /                |
    |  +------+      \ ++---+-+                +-+---++/      +------+   |
    |  |Host A+--------+ PE-1 |                | PE-2 +-+-----+Host B|   |
    |  +------+\       ++-+-+-+                +-+-+-++ |    /+------+   |
    |  192.0.2.2/24    |  | |                    | |  | | 192.0.2.3/24   |
    |  GW=192.0.2.4    |  | |                    | |  | | GW=192.0.2.4   |
    |                  |  | |                    | |  | |  +------+      |
    |                  |  | |                    | |  | +--+  GW  +--    |
    |                  |  | |                    | |  |   /+------+      |
    |                  |  | |                    | |  | 192.0.2.4/24     |
    |                  |  | |                    | |  |                  |
    |      DC West     |  | |  IP/MPLS Backbone  | |  |      DC East     |
    +------------------+  | |                    | |  +------------------+
                          | +--------------------+ |
                          |                        |
                  VRF_A : V                VRF_A : V
    +------------+---------+--------+   +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|   |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |   |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |   |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |   |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.4/32|  PE-2   |  IBGP  |   |192.0.2.4/32|192.0.2.4| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |   |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    | 0.0.0.0/0  |  PE-2   |  IBGP  |   | 0.0.0.0/0  |192.0.2.4| Static |
    +------------+---------+--------+   +------------+---------+--------+

                Figure 2: Inter-subnet Unicast Example (1)

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW).  PE-2, which is
   connected to GW, would either be configured with, or learn from GW,
   a default route whose next hop is GW.  Meanwhile, this route is
   distributed to the other PE routers (i.e., PE-1) as per normal
   [RFC4364] operation.  Assume Host A sends an ARP request for its
   default gateway (i.e., 192.0.2.4) prior to communicating with a
   destination host outside of its subnet.  Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response.  Host A then sends packets destined for hosts outside
   of its subnet to PE-1.  PE-1 tunnels such packets towards PE-2
   according to the default route learned from PE-2; PE-2 in turn
   forwards them to GW.
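   Reusing the hypothetical Vrf class from the sketch in
   Section 3.1.1, the following lines mirror PE-1's VRF_A table of
   Figure 2; the off-subnet destination 198.51.100.7 is an arbitrary
   documentation address, not taken from the figure.

   # PE-1's VRF_A from Figure 2, including the default route that PE-2
   # advertises via BGP/MPLS IP VPN after learning it from GW.
   vrf_a = Vrf()
   vrf_a.add("192.0.2.2/32", "192.0.2.2", "Direct")
   vrf_a.add("192.0.2.3/32", "PE-2",      "IBGP")
   vrf_a.add("192.0.2.4/32", "PE-2",      "IBGP")    # GW's host route
   vrf_a.add("192.0.2.0/24", "192.0.2.1", "Direct")
   vrf_a.add("0.0.0.0/0",    "PE-2",      "IBGP")    # default via GW

   # Intra-subnet: the /32 host route wins over /24 and /0.
   assert vrf_a.lookup("192.0.2.3") == ("PE-2", "IBGP")

   # Inter-subnet: only the default route matches, so the packet is
   # tunneled to PE-2 and handed over to GW.
   assert vrf_a.lookup("198.51.100.7") == ("PE-2", "IBGP")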
                            +--------------------+
    +------------------+    |                    |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |                \ |    |                    |    | /                |
    |  +------+      \ ++---+-+                +-+---++/      +------+   |
    |  |Host A+-----+--+ PE-1 |                | PE-2 +-+-----+Host B|   |
    |  +------+\    |  ++-+-+-+                +-+-+-++ |    /+------+   |
    |  192.0.2.2/24 |  |  | |                    | |  | | 192.0.2.3/24   |
    |  GW=192.0.2.4 |  |  | |                    | |  | | GW=192.0.2.4   |
    |  +------+     |  |  | |                    | |  | |  +------+      |
    |--+ GW-1 +-----+  |  | |                    | |  | +--+ GW-2 +--    |
    |  +------+\       |  | |                    | |  |   /+------+      |
    |  192.0.2.4/24    |  | |                    | |  | 192.0.2.4/24     |
    |                  |  | |                    | |  |                  |
    |      DC West     |  | |  IP/MPLS Backbone  | |  |      DC East     |
    +------------------+  | |                    | |  +------------------+
                          | +--------------------+ |
                          |                        |
                  VRF_A : V                VRF_A : V
    +------------+---------+--------+   +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|   |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |   |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |   |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |   |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.4/32|192.0.2.4| Direct |   |192.0.2.4/32|192.0.2.4| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |   |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    | 0.0.0.0/0  |192.0.2.4| Static |   | 0.0.0.0/0  |192.0.2.4| Static |
    +------------+---------+--------+   +------------+---------+--------+

                Figure 3: Inter-subnet Unicast Example (2)

   As shown in Figure 3, in the case where each data center is deployed
   with a default gateway, hosts sending ARP requests for their default
   gateways will get ARP responses directly from their local default
   gateways, rather than from their local PE routers.
                                   +------+
                            +------+ PE-3 +------+
    +------------------+    |      +------+      |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |                \ |    |                    |    | /                |
    |  +------+      \ ++---+-+                +-+---++/      +------+   |
    |  |Host A+--------+ PE-1 |                | PE-2 +-------+Host B|   |
    |  +------+\       ++-+-+-+                +-+-+-++      /+------+   |
    |  192.0.2.2/24    |  | |                    | |  | 192.0.2.3/24     |
    |  GW=192.0.2.1    |  | |                    | |  | GW=192.0.2.1     |
    |                  |  | |                    | |  |                  |
    |      DC West     |  | |  IP/MPLS Backbone  | |  |      DC East     |
    +------------------+  | |                    | |  +------------------+
                          | +--------------------+ |
                          |                        |
                  VRF_A : V                VRF_A : V
    +------------+---------+--------+   +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|   |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |   |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |   |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |   |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |   |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+   +------------+---------+--------+
    | 0.0.0.0/0  |  PE-3   |  IBGP  |   | 0.0.0.0/0  |  PE-3   |  IBGP  |
    +------------+---------+--------+   +------------+---------+--------+

                Figure 4: Inter-subnet Unicast Example (3)

   Alternatively, as shown in Figure 4, the PE routers themselves could
   be configured as default gateways for their locally connected hosts,
   as long as these PE routers have routes to reach outside networks.

3.2.  Multicast

   To support IP multicast between hosts of the same Virtual Subnet,
   Multicast VPN (MVPN) technologies [RFC6513] could be used without
   any change.  For example, PE routers attached to a given VPN join a
   default provider multicast distribution tree that is dedicated to
   that VPN.  Ingress PE routers, upon receiving multicast packets from
   their local hosts, forward them towards remote PE routers through
   the corresponding default provider multicast distribution tree.  In
   this context, IP multicast does not include link-local multicast.

3.3.  Host Discovery

   PE routers should be able to dynamically discover their local hosts
   and keep the list of these hosts up to date in a timely manner, so
   as to ensure the availability and accuracy of the corresponding host
   routes originated from them.  PE routers could accomplish local host
   discovery by means of traditional host discovery mechanisms based on
   the ARP or ND protocols.

3.4.  ARP/ND Proxy

   Acting as an ARP or ND proxy, a PE router should only respond to an
   ARP request or Neighbor Solicitation (NS) message for a target host
   when it has a best route for that target host in the associated VRF
   and the outgoing interface of that best route is different from the
   one over which the ARP request or NS message was received.  In the
   scenario where a given VPN site (i.e., a data center) is multihomed
   to more than one PE router via an Ethernet switch or an Ethernet
   network, the Virtual Router Redundancy Protocol (VRRP) [RFC5798] is
   usually enabled on these PE routers.  In this case, only the PE
   router elected as the VRRP Master is allowed to perform the ARP/ND
   proxy function, as captured in the sketch below.
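   The following Python sketch captures this decision rule; the
   function, route, and interface names are illustrative assumptions,
   not part of the solution specification.

   # Hypothetical sketch of the ARP/ND proxy rule of Section 3.4.
   from collections import namedtuple

   Route = namedtuple("Route", ["prefix", "out_interface"])

   def should_proxy_reply(best_route, in_interface, is_vrrp_master=True):
       """Reply to an ARP request / NS received on in_interface for a
       target whose best VRF route is best_route (None if no route)."""
       if not is_vrrp_master:       # multihomed site: Master only
           return False
       if best_route is None:       # no route for the target: stay silent
           return False
       # Reply only if the target is reached over a DIFFERENT interface
       # than the one the request came in on.
       return best_route.out_interface != in_interface

   # Host A asks PE-1 about Host B (reached across the backbone): reply.
   print(should_proxy_reply(Route("192.0.2.3/32", "to_PE-2"), "AC1"))  # True
   # Target sits on the same attachment circuit: let it answer itself.
   print(should_proxy_reply(Route("192.0.2.9/32", "AC1"), "AC1"))      # False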
3.5.  Host Mobility

   During the VM migration process, the PE router to which the moving
   VM is now attached would create a host route for that VM upon
   receiving a notification message of VM attachment (e.g., a
   gratuitous ARP or an unsolicited Neighbor Advertisement (NA)
   message).  The PE router to which the moving VM was previously
   attached would withdraw the corresponding host route when noticing
   the detachment of that VM.  Meanwhile, the latter PE router could
   optionally broadcast a gratuitous ARP or send an unsolicited NA
   message on behalf of that host, with the source MAC address being
   one of its own.  In this way, the ARP/ND entry for the host that
   moved, as cached on any local host, would be updated accordingly.
   In the case where there is no explicit VM detachment notification
   mechanism, the PE router could instead use the following heuristic
   to detect the VM detachment: upon learning a route update for a
   local host from a remote PE router for the first time, the PE router
   immediately checks whether that local host is still attached to it
   by some means (e.g., an ARP/ND ping and/or an ICMP ping).  This
   procedure is outlined in the sketch below.  It is important to
   ensure that the same MAC and IP addresses are associated with the
   default gateway active in each data center, as the VM would most
   likely continue to send packets to the same default gateway address
   after having migrated from one data center to another.  One possible
   way to achieve this goal is to configure the same VRRP group in each
   location, so as to ensure that the default gateway active in each
   data center shares the same virtual MAC and virtual IP addresses.
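   The following event-handling sketch outlines these procedures under
   stated assumptions: the PeRouter class, its method names, and the
   migrating VM address 192.0.2.9 are all hypothetical.

   # Hypothetical sketch of the host mobility procedure above.
   class PeRouter:
       def __init__(self, name):
           self.name = name
           self.vrf = {}             # local "Direct" host routes

       # Stubs standing in for real control-plane actions:
       def advertise(self, p):  print(self.name, "advertises", p)
       def withdraw(self, p):   print(self.name, "withdraws", p)
       def garp(self, ip):      print(self.name, "sends GARP/NA for", ip)

       def on_vm_attach(self, ip):
           """Gratuitous ARP / unsolicited NA heard from an attached VM:
           install and advertise its host route."""
           self.vrf[ip + "/32"] = "Direct"
           self.advertise(ip + "/32")

       def on_remote_host_route(self, ip, still_attached):
           """First route update for a supposedly local host from a
           remote PE; still_attached is the result of an ARP/ND or ICMP
           probe of that host."""
           if ip + "/32" in self.vrf and not still_attached:
               del self.vrf[ip + "/32"]
               self.withdraw(ip + "/32")
               self.garp(ip)   # repoint local ARP/ND caches to the PE

   # A VM at 192.0.2.9 migrates from DC West (PE-1) to DC East (PE-2):
   pe1, pe2 = PeRouter("PE-1"), PeRouter("PE-2")
   pe1.vrf["192.0.2.9/32"] = "Direct"
   pe2.on_vm_attach("192.0.2.9")                 # new site learns the VM
   pe1.on_remote_host_route("192.0.2.9", False)  # old site probes, withdraws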
3.6.  Forwarding Table Scalability on Data Center Switches

   In a Virtual Subnet environment, the MAC learning domain associated
   with a given Virtual Subnet that has been extended across multiple
   data centers is partitioned into segments, and each segment is
   confined within a single data center.  Therefore, data center
   switches only need to learn local MAC addresses, rather than
   learning both local and remote MAC addresses.

3.7.  ARP/ND Cache Table Scalability on Default Gateways

   When default gateway functions are implemented on PE routers, as
   shown in Figure 4, the ARP/ND cache table on each PE router only
   needs to contain ARP/ND entries for local hosts.  As a result, the
   ARP/ND cache table size would not grow as the number of data centers
   to be connected increases.

3.8.  ARP/ND and Unknown Unicast Flood Avoidance

   In a Virtual Subnet environment, the flooding domain associated with
   a given Virtual Subnet that has been extended across multiple data
   centers is partitioned into segments, and each segment is confined
   within a single data center.  Therefore, the performance impact on
   networks and servers imposed by the flooding of ARP/ND broadcast/
   multicast and unknown unicast traffic is minimized.

3.9.  Path Optimization

   Take the scenario shown in Figure 4 as an example: to optimize the
   forwarding path for the traffic between cloud users and cloud data
   centers, the PE routers located in cloud data centers (i.e., PE-1
   and PE-2), which also act as default gateways, propagate host routes
   for their own local hosts to the remote PE routers attached to cloud
   user sites (i.e., PE-3).  As such, traffic from cloud user sites to
   a given server on a Virtual Subnet that has been extended across
   data centers is forwarded directly to the data center location where
   that server resides, since traffic is now forwarded according to the
   host route for that server rather than according to the subnet
   route.  Furthermore, for traffic coming from cloud data centers and
   destined for cloud user sites, each PE router acting as a default
   gateway forwards traffic according to the longest-match route in the
   corresponding VRF.  As a result, traffic from data centers to cloud
   user sites is forwarded along an optimal path as well.

4.  Limitations

4.1.  Non-support of Non-IP Traffic

   Although most traffic within and across data centers is IP traffic,
   there may still be a few legacy clustering applications that rely on
   non-IP communications (e.g., heartbeat messages between cluster
   nodes).  Since Virtual Subnet is strictly based on Layer 3
   forwarding, such non-IP communications cannot be supported in the
   Virtual Subnet solution.  To support such non-IP traffic (if
   present) in an environment where the Virtual Subnet solution has
   been deployed, an approach following the idea of "route all IP
   traffic, bridge non-IP traffic" could be considered.  In other
   words, all IP traffic, both intra- and inter-subnet, would be
   processed according to the Virtual Subnet design, while non-IP
   traffic would be forwarded according to a particular Layer 2 VPN
   approach.  Such a unified L2/L3 VPN approach requires ingress PE
   routers to classify packets received from hosts before distributing
   them to the corresponding L2 or L3 VPN forwarding process, as
   sketched below.  Note that more and more cluster vendors are
   offering clustering applications based on Layer 3 interconnection.
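   A minimal sketch of such a classifier follows; the Ethertype
   constants are standard, but the dispatch labels are placeholders
   rather than a defined API.

   # "Route all IP traffic, bridge non-IP traffic" classification.
   ETHERTYPE_IPV4 = 0x0800
   ETHERTYPE_ARP  = 0x0806
   ETHERTYPE_IPV6 = 0x86DD

   def classify(ethertype):
       """Per-frame decision on a VRF attachment circuit of an
       ingress PE."""
       if ethertype == ETHERTYPE_ARP:
           return "ARP_PROXY"   # intercepted by the PE's ARP proxy
       if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
           return "L3VPN"       # routed per the Virtual Subnet design
       return "L2VPN"           # bridged, e.g., legacy cluster heartbeats

   assert classify(0x0800) == "L3VPN"
   assert classify(0x88B5) == "L2VPN"   # an arbitrary non-IP Ethertype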
4.2.  Non-support of IP Broadcast and Link-local Multicast

   As illustrated before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet solution.  Therefore, IP broadcast and link-
   local multicast traffic cannot be supported.  To support IP
   broadcast and link-local multicast traffic in an environment where
   the Virtual Subnet solution has been deployed, the unified L2/L3
   overlay approach described in Section 4.1 could be considered as
   well.  That is, IP broadcast and link-local multicast messages would
   be forwarded at Layer 2, while routable IP traffic would be
   processed according to the Virtual Subnet design.

4.3.  TTL and Traceroute

   As mentioned before, intra-subnet traffic is forwarded at Layer 3 in
   the Virtual Subnet context.  Since Virtual Subnet does not require
   any change to the Time To Live (TTL) handling mechanism of BGP/MPLS
   IP VPN, a traceroute performed from one host towards another host
   within the same subnet but attached to a different site would reveal
   that the two hosts are connected via a Virtual Subnet rather than
   via a Layer 2 connection: the PE routers to which the two hosts are
   connected would appear in the traceroute output.  In addition,
   applications that generate intra-subnet traffic with the TTL set
   to 1 may not work properly in the Virtual Subnet context, unless
   special TTL processing has been implemented for that context (e.g.,
   if the source and destination addresses of a packet whose TTL is set
   to 1 belong to the same extended subnet, neither the ingress nor the
   egress PE router should decrement the TTL of that packet;
   furthermore, the TTL of such a packet should not be copied into the
   TTL of the transport tunnel, nor vice versa).
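   The following sketch illustrates what such special TTL processing
   could look like; the same_extended_subnet() helper and the choice of
   192.0.2.0/24 as the extended subnet are assumptions made for
   illustration.

   # Hedged sketch of the special TTL processing suggested above.
   import ipaddress

   EXTENDED_SUBNET = ipaddress.ip_network("192.0.2.0/24")

   def same_extended_subnet(src, dst):
       return (ipaddress.ip_address(src) in EXTENDED_SUBNET and
               ipaddress.ip_address(dst) in EXTENDED_SUBNET)

   def pe_ttl_handling(src, dst, ttl):
       """TTL left in the inner IP header across the Virtual Subnet.
       For intra-subnet packets, neither the ingress nor the egress PE
       decrements the TTL (and it is not copied to/from the transport
       tunnel), so TTL=1 applications survive the extra routed hop."""
       if same_extended_subnet(src, dst):
           return ttl          # intra-subnet: leave untouched
       return ttl - 1          # inter-subnet: normal routed behavior

   assert pe_ttl_handling("192.0.2.2", "192.0.2.3", 1) == 1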
5.  Acknowledgements

   Thanks to Susan Hares, Yongbing Fan, Dino Farinacci, Himanshu Shah,
   Nabil Bitar, Giles Heron, Ronald Bonica, Monique Morrow, Rajiv
   Asati, Eric Osborne, Thomas Morin, Martin Vigoureux, Pedro Roque
   Marques, Joe Touch, and Wim Henderickx for their valuable comments
   and suggestions on this document.  Thanks to Loa Andersson for his
   WG LC review of this document.  Thanks to Alvaro Retana for his AD
   review of this document.  Thanks to Ronald Bonica for his RtgDir
   review.  Thanks to Donald Eastlake for his SecDir review of this
   document.  Thanks to Jouni Korhonen for the OPS-Dir review of this
   document.  Thanks to Roni Even for the Gen-ART review of this
   document.  Thanks to Sabrina Tanamal for the IANA review of this
   document.

6.  IANA Considerations

   There is no requirement for any IANA action.

7.  Security Considerations

   Since BGP/MPLS IP VPN signaling is reused without any change, the
   security considerations described in [RFC4364] are applicable to
   this document.  Meanwhile, since the security issues associated with
   the Neighbor Discovery Protocol (NDP) are inherited due to the use
   of ND proxy, the security considerations and recommendations
   described in [RFC6583] are applicable to this document as well.

8.  References

8.1.  Normative References

   [RFC0925]  Postel, J., "Multi-LAN address resolution", RFC 925,
              DOI 10.17487/RFC0925, October 1984,
              <http://www.rfc-editor.org/info/rfc925>.

   [RFC1027]  Carl-Mitchell, S. and J. Quarterman, "Using ARP to
              implement transparent subnet gateways", RFC 1027,
              DOI 10.17487/RFC1027, October 1987,
              <http://www.rfc-editor.org/info/rfc1027>.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
              February 2006, <http://www.rfc-editor.org/info/rfc4364>.

   [RFC4389]  Thaler, D., Talwar, M., and C. Patel, "Neighbor Discovery
              Proxies (ND Proxy)", RFC 4389, DOI 10.17487/RFC4389,
              April 2006, <http://www.rfc-editor.org/info/rfc4389>.

8.2.  Informative References

   [RFC4659]  De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur,
              "BGP-MPLS IP Virtual Private Network (VPN) Extension for
              IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, September
              2006, <http://www.rfc-editor.org/info/rfc4659>.

   [RFC4761]  Kompella, K., Ed. and Y. Rekhter, Ed., "Virtual Private
              LAN Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007,
              <http://www.rfc-editor.org/info/rfc4761>.

   [RFC4762]  Lasserre, M., Ed. and V. Kompella, Ed., "Virtual Private
              LAN Service (VPLS) Using Label Distribution Protocol
              (LDP) Signaling", RFC 4762, DOI 10.17487/RFC4762, January
              2007, <http://www.rfc-editor.org/info/rfc4762>.

   [RFC5798]  Nadas, S., Ed., "Virtual Router Redundancy Protocol
              (VRRP) Version 3 for IPv4 and IPv6", RFC 5798,
              DOI 10.17487/RFC5798, March 2010,
              <http://www.rfc-editor.org/info/rfc5798>.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in
              MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513,
              February 2012, <http://www.rfc-editor.org/info/rfc6513>.

   [RFC6583]  Gashinsky, I., Jaeggli, J., and W. Kumari, "Operational
              Neighbor Discovery Problems", RFC 6583,
              DOI 10.17487/RFC6583, March 2012,
              <http://www.rfc-editor.org/info/rfc6583>.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              DOI 10.17487/RFC6820, January 2013,
              <http://www.rfc-editor.org/info/rfc6820>.

Authors' Addresses

   Xiaohu Xu
   Huawei Technologies
   No.156 Beiqing Rd
   Beijing  100095
   CHINA

   Email: xuxiaohu@huawei.com


   Robert Raszuk
   Bloomberg LP
   731 Lexington Ave
   New York City, NY  10022
   USA

   Email: robert@raszuk.net


   Christian Jacquenet
   Orange
   4 rue du Clos Courtel
   Cesson-Sevigne  35512
   FRANCE

   Email: christian.jacquenet@orange.com


   Truman Boyes
   Bloomberg LP

   Email: tboyes@bloomberg.net


   Brendan Fee
   Extreme Networks

   Email: bfee@extremenetworks.com