Network Working Group                                              X. Xu
Internet-Draft                                                    Huawei
Intended status: Informational                                 R. Raszuk
Expires: May 13, 2016                                      Mirantis Inc.
                                                            C. Jacquenet
                                                                  Orange
                                                                T. Boyes
                                                            Bloomberg LP
                                                                  B. Fee
                                                        Extreme Networks
                                                       November 10, 2015

     Virtual Subnet: A BGP/MPLS IP VPN-based Subnet Extension Solution
                     draft-ietf-bess-virtual-subnet-04

Abstract

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for
   building Layer 3 network virtualization overlays within and/or
   between data centers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 13, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Solution Description  . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Unicast . . . . . . . . . . . . . . . . . . . . . . . . .   4
       3.1.1.  Intra-subnet Unicast  . . . . . . . . . . . . . . . .   4
       3.1.2.  Inter-subnet Unicast  . . . . . . . . . . . . . . . .   5
     3.2.  Multicast . . . . . . . . . . . . . . . . . . . . . . . .   8
     3.3.  Host Discovery  . . . . . . . . . . . . . . . . . . . . .   9
     3.4.  ARP/ND Proxy  . . . . . . . . . . . . . . . . . . . . . .   9
     3.5.  Host Mobility . . . . . . . . . . . . . . . . . . . . . .   9
     3.6.  Forwarding Table Scalability on Data Center Switches  . .  10
     3.7.  ARP/ND Cache Table Scalability on Default Gateways  . . .  10
     3.8.  ARP/ND and Unknown Unicast Flood Avoidance  . . . . . . .  10
     3.9.  Path Optimization . . . . . . . . . . . . . . . . . . . .  10
   4.  Limitations . . . . . . . . . . . . . . . . . . . . . . . . .  11
     4.1.  Non-support of Non-IP Traffic . . . . . . . . . . . . . .  11
     4.2.  Non-support of IP Broadcast and Link-local Multicast  . .  11
     4.3.  TTL and Traceroute  . . . . . . . . . . . . . . . . . . .  11
   5.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  12
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  12
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  12
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  12
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  12
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  13
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  13

1.  Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, data center migration, data center
   consolidation, data center expansion, and data center disaster
   avoidance.
   It is generally admitted that IP renumbering of servers (i.e., VMs)
   after migration is complex and costly, and risks extending business
   downtime during the migration process.  To allow the migration of a
   VM from one data center to another without IP renumbering, the
   subnet on which the VM resides needs to be extended across these
   data centers.

   To achieve subnet extension across multiple Infrastructure-as-
   a-Service (IaaS) cloud data centers in a scalable way, the following
   requirements and challenges must be considered:

   a.  VPN Instance Space Scalability: In a modern cloud data center
       environment, thousands or even tens of thousands of tenants
       could be hosted over a shared network infrastructure.  For
       security and performance isolation purposes, these tenants need
       to be isolated from one another.

   b.  Forwarding Table Scalability: With the development of server
       virtualization technologies, it is not uncommon for a single
       cloud data center to contain millions of VMs.  This number
       already poses a significant challenge to the forwarding table
       scalability of data center switches.  If multiple data centers
       of such a scale were interconnected at Layer 2, this challenge
       would become even worse.

   c.  ARP/ND Cache Table Scalability: [RFC6820] notes that the Address
       Resolution Protocol (ARP)/Neighbor Discovery (ND) cache tables
       maintained on default gateways within cloud data centers can
       raise scalability issues.  Therefore, it is very useful if the
       ARP/ND cache table size can be prevented from growing by
       multiples as the number of data centers to be connected
       increases.

   d.  ARP/ND and Unknown Unicast Flooding: It is well known that the
       flooding of ARP/ND broadcast/multicast and unknown unicast
       traffic within large Layer 2 networks would affect the
       performance of networks and hosts.
       When multiple data centers, each containing millions of VMs, are
       interconnected at Layer 2, the impact of such flooding becomes
       even worse.  As such, it becomes increasingly important to avoid
       the flooding of ARP/ND broadcast/multicast and unknown unicast
       traffic across data centers.

   e.  Path Optimization: A subnet usually indicates a location in the
       network.  However, when a subnet has been extended across
       multiple geographically dispersed data center locations, the
       location semantics of such a subnet are no longer retained.  As
       a result, the traffic between a specific user and server, in
       different data centers, may first be routed through a third data
       center.  This suboptimal routing would obviously result in an
       unnecessary consumption of the bandwidth resources between data
       centers.  Furthermore, in the case where traditional VPLS
       technology [RFC4761] [RFC4762] is used for data center
       interconnect, return traffic from a server may be forwarded to a
       default gateway located in a different data center due to the
       configuration of a virtual router redundancy group.  This
       suboptimal routing would also unnecessarily consume the
       bandwidth resources between data centers.

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for data
   center interconnection while addressing all of the requirements and
   challenges mentioned above.  Here, BGP/MPLS IP VPN refers to both
   BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659].  In
   addition, since Virtual Subnet is mainly built on proven
   technologies such as BGP/MPLS IP VPN and ARP/ND proxy
   [RFC0925][RFC1027][RFC4389], service providers offering IaaS public
   cloud services can rely upon their existing BGP/MPLS IP VPN
   infrastructures and the corresponding operational experience to
   realize data center interconnection.
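   The interplay of the two building blocks named above -- /32 host
   routes exchanged through the BGP/MPLS IP VPN, plus ARP/ND proxying
   on PE routers -- can be previewed with a small sketch (the detailed
   procedures appear in Section 3).  The data model below is invented
   purely for illustration; the draft itself specifies no
   implementation:

```python
# Illustrative-only model of a PE router's VRF in a Virtual Subnet
# deployment.  Names and structures are assumptions for this sketch.

# PE-1's VRF for VPN_A: local hosts appear as direct /32 routes; hosts
# in the remote data center appear as /32 routes learned from PE-2 via
# BGP.  The extended subnet itself is also present as a /24 route.
pe1_vrf_a = {
    "192.0.2.2/32": ("Direct", "local-AC"),  # Host A, DC West (local)
    "192.0.2.3/32": ("IBGP", "PE-2"),        # Host B, DC East (remote)
    "192.0.2.0/24": ("Direct", "local-AC"),  # the extended subnet
}

def next_hop(vrf, dest):
    """Simplified lookup: prefer the /32 host route, else fall back to
    the subnet route (a stand-in for longest-prefix matching)."""
    return vrf.get(dest + "/32", vrf.get("192.0.2.0/24"))

# Traffic from Host A to Host B is tunneled to PE-2, even though both
# hosts believe they share one subnet:
assert next_hop(pe1_vrf_a, "192.0.2.3") == ("IBGP", "PE-2")
# Traffic to a local host stays on the local attachment circuit:
assert next_hop(pe1_vrf_a, "192.0.2.2") == ("Direct", "local-AC")
```

   Because forwarding follows host routes rather than MAC learning,
   each data center's Layer 2 flooding and MAC learning stay local, as
   Sections 3.6 and 3.8 explain.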
   Although Virtual Subnet is described in this document as an approach
   for data center interconnection, it could be used within data
   centers as well.

   Note that the approach described in this document is not intended to
   achieve an exact emulation of Layer 2 connectivity; it can therefore
   only support a restricted Layer 2 connectivity service model, with
   the limitations described in Section 4.  A discussion of the
   environments in which this service model is suitable is outside the
   scope of this document.

2.  Terminology

   This memo makes use of the terms defined in [RFC4364].

3.  Solution Description

3.1.  Unicast

3.1.1.  Intra-subnet Unicast

                       +--------------------+
+------------------+   |                    |   +------------------+
|VPN_A:192.0.2.1/24|   |                    |   |VPN_A:192.0.2.1/24|
|                \ |   |                    |   | /                |
| +------+       \++---+-+                +-+---++/       +------+ |
| |Host A+--------+ PE-1 |                | PE-2 +--------+Host B| |
| +------+\      ++-+-+-+                +-+-+-++      /+------+  |
| 192.0.2.2/24     | | |                    | | | 192.0.2.3/24    |
|                  | | |                    | | |                 |
| DC West          | | |  IP/MPLS Backbone  | | |         DC East |
+------------------+ | |                    | | +------------------+
                     | +--------------------+ |
                     |                        |
VRF_A :              V                VRF_A : V
+------------+---------+--------+  +------------+---------+--------+
|   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
+------------+---------+--------+  +------------+---------+--------+

              Figure 1: Intra-subnet Unicast Example

   As shown in Figure 1, two hosts (i.e., Hosts A and B) belonging to
   the same subnet (i.e., 192.0.2.0/24) are located in different data
   centers (i.e., DC West and DC East), respectively.  PE routers
   (i.e., PE-1 and PE-2), which are used for interconnecting these two
   data centers, create host routes for their own local hosts and then
   advertise them via BGP/MPLS IP VPN signaling.  Meanwhile, an ARP
   proxy is enabled on the VRF attachment circuits of these PE routers.

   Now assume Host A sends an ARP request for Host B before
   communicating with Host B.  Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets for Host B to PE-1.  PE-1 tunnels such
   packets towards PE-2, which in turn forwards them to Host B.  Thus,
   Hosts A and B can communicate with each other as if they were
   located within the same subnet.

3.1.2.  Inter-subnet Unicast

                       +--------------------+
+------------------+   |                    |   +------------------+
|VPN_A:192.0.2.1/24|   |                    |   |VPN_A:192.0.2.1/24|
|                \ |   |                    |   | /                |
| +------+       \++---+-+                +-+---++/     +------+   |
| |Host A+--------+ PE-1 |                | PE-2 +-+----+Host B|   |
| +------+\      ++-+-+-+                +-+-+-++ |   /+------+   |
| 192.0.2.2/24     | | |                    | | |  | 192.0.2.3/24 |
| GW=192.0.2.4     | | |                    | | |  | GW=192.0.2.4 |
|                  | | |                    | | |  |   +------+   |
|                  | | |                    | | |  +---+  GW  +-- |
|                  | | |                    | | |     /+------+   |
|                  | | |                    | | |   192.0.2.4/24  |
|                  | | |                    | | |                 |
| DC West          | | |  IP/MPLS Backbone  | | |         DC East |
+------------------+ | |                    | | +------------------+
                     | +--------------------+ |
                     |                        |
VRF_A :              V                VRF_A : V
+------------+---------+--------+  +------------+---------+--------+
|   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.4/32|  PE-2   |  IBGP  |  |192.0.2.4/32|192.0.2.4| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
| 0.0.0.0/0  |  PE-2   |  IBGP  |  | 0.0.0.0/0  |192.0.2.4| Static |
+------------+---------+--------+  +------------+---------+--------+

             Figure 2: Inter-subnet Unicast Example (1)

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW).
   PE-2, which is connected to GW, is either configured with, or learns
   from GW, a default route whose next hop points to GW.  Meanwhile,
   this route is distributed to other PE routers (i.e., PE-1) as per
   normal [RFC4364] operation.  Assume Host A sends an ARP request for
   its default gateway (i.e., 192.0.2.4) prior to communicating with a
   destination host outside of its subnet.  Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response.  Host A then sends packets destined for that external
   host to PE-1.  PE-1 tunnels such packets towards PE-2 according to
   the default route learnt from PE-2, which in turn forwards them to
   GW.

                       +--------------------+
+------------------+   |                    |   +------------------+
|VPN_A:192.0.2.1/24|   |                    |   |VPN_A:192.0.2.1/24|
|                \ |   |                    |   | /                |
| +------+       \++---+-+                +-+---++/     +------+   |
| |Host A+----+---+ PE-1 |                | PE-2 +-+----+Host B|   |
| +------+\   |  ++-+-+-+                +-+-+-++ |   /+------+   |
| 192.0.2.2/24|    | | |                    | | |  | 192.0.2.3/24 |
| GW=192.0.2.4|    | | |                    | | |  | GW=192.0.2.4 |
|  +------+   |    | | |                    | | |  |   +------+   |
|--+ GW-1 +---+    | | |                    | | |  +---+ GW-2 +-- |
|  +------+\       | | |                    | | |     /+------+   |
| 192.0.2.4/24     | | |                    | | |   192.0.2.4/24  |
|                  | | |                    | | |                 |
| DC West          | | |  IP/MPLS Backbone  | | |         DC East |
+------------------+ | |                    | | +------------------+
                     | +--------------------+ |
                     |                        |
VRF_A :              V                VRF_A : V
+------------+---------+--------+  +------------+---------+--------+
|   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.4/32|192.0.2.4| Direct |  |192.0.2.4/32|192.0.2.4| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
| 0.0.0.0/0  |192.0.2.4| Static |  | 0.0.0.0/0  |192.0.2.4| Static |
+------------+---------+--------+  +------------+---------+--------+

             Figure 3: Inter-subnet Unicast Example (2)

   As shown in Figure 3, in the case where each data center is deployed
   with a default gateway, hosts get ARP responses directly from their
   local default gateways, rather than from their local PE routers,
   when sending ARP requests for their default gateways.

                              +------+
                       +------+ PE-3 +------+
+------------------+   |      +------+      |   +------------------+
|VPN_A:192.0.2.1/24|   |                    |   |VPN_A:192.0.2.1/24|
|                \ |   |                    |   | /                |
| +------+       \++---+-+                +-+---++/       +------+ |
| |Host A+--------+ PE-1 |                | PE-2 +--------+Host B| |
| +------+\      ++-+-+-+                +-+-+-++      /+------+  |
| 192.0.2.2/24     | | |                    | | | 192.0.2.3/24    |
| GW=192.0.2.1     | | |                    | | | GW=192.0.2.1    |
|                  | | |                    | | |                 |
| DC West          | | |  IP/MPLS Backbone  | | |         DC East |
+------------------+ | |                    | | +------------------+
                     | +--------------------+ |
                     |                        |
VRF_A :              V                VRF_A : V
+------------+---------+--------+  +------------+---------+--------+
|   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
+------------+---------+--------+  +------------+---------+--------+
|192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
+------------+---------+--------+  +------------+---------+--------+
| 0.0.0.0/0  |  PE-3   |  IBGP  |  | 0.0.0.0/0  |  PE-3   |  IBGP  |
+------------+---------+--------+  +------------+---------+--------+

             Figure 4: Inter-subnet Unicast Example (3)

   Alternatively, as shown in Figure 4, PE routers themselves could be
   directly configured as default gateways of their locally connected
   hosts, as long as these PE routers have routes to outside networks.

3.2.  Multicast

   To support IP multicast between hosts of the same Virtual Subnet,
   Multicast VPN (MVPN) technologies [RFC6513] could be used directly,
   without any change.  For example, PE routers attached to a given VPN
   join a default provider multicast distribution tree that is
   dedicated to that VPN.  Ingress PE routers, upon receiving multicast
   packets from their local hosts, forward them towards remote PE
   routers through the corresponding default provider multicast
   distribution tree.  Note that IP multicast here does not include
   link-local multicast.

3.3.  Host Discovery

   PE routers should be able to discover their local hosts and keep the
   list of these hosts up to date in a timely manner, so as to ensure
   the availability and accuracy of the corresponding host routes
   originated from them.  PE routers could accomplish local host
   discovery through traditional host discovery mechanisms based on the
   ARP or ND protocols.

3.4.  ARP/ND Proxy

   Acting as an ARP or ND proxy, a PE router should only respond to an
   ARP request or Neighbor Solicitation (NS) message for a target host
   when it has a best route for that target host in the associated VRF
   and the outgoing interface of that best route is different from the
   one over which the ARP request or NS message was received.
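   The proxy decision rule above can be sketched in a few lines.  The
   VRF model and helper names below are hypothetical, invented only to
   illustrate the rule; they do not come from any router implementation:

```python
# Sketch of the ARP/ND proxy decision rule of Section 3.4.  The VRF is
# modeled as a plain dict; names and structure are assumptions.

def should_proxy_reply(vrf, target_ip, in_interface):
    """Should the PE answer an ARP request / NS for target_ip that was
    received on in_interface?"""
    best = vrf.get(target_ip)       # best route for the target host
    if best is None:
        return False                # no route for the target: stay silent
    # Reply only when the best route points out of a DIFFERENT
    # interface, i.e., the target is not on the same attachment
    # circuit as the requester.
    return best["out_interface"] != in_interface

# Toy VRF resembling PE-1's VRF_A in Figure 1:
vrf_a = {
    "192.0.2.2": {"out_interface": "ac1", "protocol": "Direct"},
    "192.0.2.3": {"out_interface": "core", "protocol": "IBGP"},
}

# Host A (on attachment circuit ac1) ARPs for remote Host B: PE replies.
assert should_proxy_reply(vrf_a, "192.0.2.3", "ac1") is True
# Host A ARPs for a neighbor on its own circuit: PE stays silent.
assert should_proxy_reply(vrf_a, "192.0.2.2", "ac1") is False
```

   The second case is what keeps the PE from interfering with ordinary
   ARP/ND resolution between hosts on the same attachment circuit.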
   In the scenario where a given VPN site (i.e., a data center) is
   multi-homed to more than one PE router via an Ethernet switch or an
   Ethernet network, the Virtual Router Redundancy Protocol (VRRP)
   [RFC5798] is usually enabled on these PE routers.  In this case,
   only the PE router elected as the VRRP Master is allowed to perform
   the ARP/ND proxy function.

3.5.  Host Mobility

   During the VM migration process, the PE router to which the moving
   VM is now attached creates a host route for that host upon receiving
   a notification message of VM attachment (e.g., a gratuitous ARP or
   unsolicited NA message).  The PE router to which the moving VM was
   previously attached withdraws the corresponding host route when
   receiving a notification message of VM detachment (e.g., a VDP
   message about VM detachment).  Meanwhile, the latter PE router could
   optionally broadcast a gratuitous ARP or send an unsolicited NA
   message on behalf of that host, with the source MAC address being
   one of its own.  In this way, the ARP/ND entries for the host that
   moved, as cached on any local host, are updated accordingly.  In the
   case where there is no explicit VM detachment notification
   mechanism, the PE router could instead detect the VM detachment
   event as follows: upon learning a route update for a local host from
   a remote PE router for the first time, the PE router immediately
   checks whether that local host is still attached to it by some means
   (e.g., ARP/ND PING and/or ICMP PING).  It is important to ensure
   that the same MAC and IP addresses are associated with the active
   default gateway in each data center, as the VM would most likely
   continue to send packets to the same default gateway address after
   migrating from one data center to another.
   One possible way to achieve this goal is to configure the same VRRP
   group at each location, so as to ensure that the active default
   gateways in all data centers share the same virtual MAC and virtual
   IP addresses.

3.6.  Forwarding Table Scalability on Data Center Switches

   In a Virtual Subnet environment, the MAC learning domain associated
   with a given Virtual Subnet that has been extended across multiple
   data centers is partitioned into segments, and each segment is
   confined within a single data center.  Therefore, data center
   switches only need to learn local MAC addresses, rather than
   learning both local and remote MAC addresses.

3.7.  ARP/ND Cache Table Scalability on Default Gateways

   When default gateway functions are implemented on PE routers, as
   shown in Figure 4, the ARP/ND cache table on each PE router only
   needs to contain ARP/ND entries for local hosts.  As a result, the
   ARP/ND cache table size does not grow as the number of data centers
   to be connected increases.

3.8.  ARP/ND and Unknown Unicast Flood Avoidance

   In a Virtual Subnet environment, the flooding domain associated with
   a given Virtual Subnet that has been extended across multiple data
   centers is partitioned into segments, and each segment is confined
   within a single data center.  Therefore, the performance impact on
   networks and servers imposed by the flooding of ARP/ND
   broadcast/multicast and unknown unicast traffic is alleviated.

3.9.  Path Optimization

   Take the scenario shown in Figure 4 as an example.  To optimize the
   forwarding path for the traffic between cloud users and cloud data
   centers, the PE routers located at cloud data centers (i.e., PE-1
   and PE-2), which also act as default gateways, propagate host routes
   for their own local hosts to the remote PE routers that are attached
   to cloud user sites (i.e., PE-3).
   As such, the traffic from cloud user sites to a given server on the
   Virtual Subnet that has been extended across data centers is
   forwarded directly to the data center location where that server
   resides, since the traffic is now forwarded according to the host
   route for that server, rather than the subnet route.  Furthermore,
   for the traffic going from cloud data centers to cloud user sites,
   each PE router acting as a default gateway forwards the traffic
   according to the best-match route in the corresponding VRF.  As a
   result, the traffic from data centers to cloud user sites is
   forwarded along an optimal path as well.

4.  Limitations

4.1.  Non-support of Non-IP Traffic

   Although most traffic within and across data centers is IP traffic,
   there may still be a few legacy clustering applications that rely on
   non-IP communications (e.g., heartbeat messages between cluster
   nodes).  Since Virtual Subnet is strictly based on Layer 3
   forwarding, such non-IP communications cannot be supported in the
   Virtual Subnet solution.  In order to support this non-IP traffic
   (if present) in an environment where the Virtual Subnet solution has
   been deployed, an approach following the idea of "route all IP
   traffic, bridge non-IP traffic" could be considered.  That is to
   say, all IP traffic, both intra-subnet and inter-subnet, would be
   processed by the Virtual Subnet process, while non-IP traffic would
   be handled by a particular Layer 2 VPN approach.  Such a unified
   L2/L3 VPN approach requires ingress PE routers to classify the
   traffic received from hosts before distributing it to the
   corresponding L2 or L3 VPN forwarding processes.  Note that more and
   more cluster vendors are offering clustering applications based on
   Layer 3 interconnection.

4.2.  Non-support of IP Broadcast and Link-local Multicast

   As illustrated before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet solution.  Therefore, IP broadcast and link-
   local multicast traffic cannot be supported by the Virtual Subnet
   solution.  In order to support IP broadcast and link-local multicast
   traffic in an environment where the Virtual Subnet solution has been
   deployed, the unified L2/L3 overlay approach described in
   Section 4.1 could be considered as well.  That is to say, IP
   broadcast and link-local multicast would be handled by the L2VPN
   forwarding process, while routable IP traffic would be processed by
   the Virtual Subnet process.

4.3.  TTL and Traceroute

   As illustrated before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet context.  Since Virtual Subnet does not
   require any change to the TTL handling mechanism of BGP/MPLS IP VPN,
   a traceroute performed on one host for another host (assuming that
   these two hosts are within the same subnet but are attached to
   different sites) would reflect the fact that these two hosts within
   the same subnet are actually connected via a Virtual Subnet rather
   than a Layer 2 connection, since the PE routers to which those two
   hosts are connected would appear in the traceroute output.  In
   addition, any other applications that generate intra-subnet traffic
   with the TTL set to 1 may not work in the Virtual Subnet context,
   unless special TTL processing for such a case has been implemented
   (e.g., if the source and destination addresses of a packet whose TTL
   is set to 1 belong to the same extended subnet, neither ingress nor
   egress PE routers decrement the TTL of such a packet; furthermore,
   the TTL of such a packet is not copied into the TTL of the transport
   tunnel and vice versa).
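   The special TTL processing suggested parenthetically above can be
   sketched as follows.  This is an illustrative model only; the
   function names, structure, and drop semantics are assumptions made
   for this sketch, not behavior specified by the draft:

```python
# Sketch of the TTL handling described in Section 4.3: a PE skips the
# TTL decrement when both endpoints of a packet belong to the same
# extended subnet.  Illustrative only; not a specified algorithm.

from ipaddress import ip_address, ip_network

VIRTUAL_SUBNET = ip_network("192.0.2.0/24")  # the extended subnet

def forward_ttl(src, dst, ttl):
    """Return the TTL to use when a PE forwards the packet, or None if
    the packet must be dropped (TTL exhausted on inter-subnet traffic)."""
    intra = (ip_address(src) in VIRTUAL_SUBNET and
             ip_address(dst) in VIRTUAL_SUBNET)
    if intra:
        return ttl        # intra-subnet: neither PE decrements the TTL
    if ttl <= 1:
        return None       # inter-subnet: TTL expired, drop
    return ttl - 1        # normal Layer 3 forwarding

# A cluster heartbeat sent with TTL=1 between two hosts of the same
# extended subnet survives the PE hops:
assert forward_ttl("192.0.2.2", "192.0.2.3", 1) == 1
# Traffic leaving the subnet is decremented as usual:
assert forward_ttl("192.0.2.2", "198.51.100.1", 64) == 63
```

   Without such processing, TTL=1 intra-subnet traffic would be dropped
   at the ingress PE, which is exactly the failure mode the paragraph
   above warns about.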
5.  Acknowledgements

   Thanks to Susan Hares, Yongbing Fan, Dino Farinacci, Himanshu Shah,
   Nabil Bitar, Giles Heron, Ronald Bonica, Monique Morrow, Rajiv
   Asati, Eric Osborne, Thomas Morin, Martin Vigoureux, Pedro Roque
   Marque, Joe Touch, and Wim Henderickx for their valuable comments
   and suggestions on this document.  Thanks to Loa Andersson for his
   WG LC review of this document.  Thanks to Alvaro Retana for his AD
   review of this document.  Thanks to Ronald Bonica for his RtgDir
   review.

6.  IANA Considerations

   There is no requirement for any IANA action.

7.  Security Considerations

   This document does not introduce additional security risks to
   BGP/MPLS IP VPN, nor does it provide any additional security
   features for BGP/MPLS IP VPN.

8.  References

8.1.  Normative References

   [RFC0925]  Postel, J., "Multi-LAN address resolution", RFC 925,
              DOI 10.17487/RFC0925, October 1984,
              <http://www.rfc-editor.org/info/rfc925>.

   [RFC1027]  Carl-Mitchell, S. and J. Quarterman, "Using ARP to
              implement transparent subnet gateways", RFC 1027,
              DOI 10.17487/RFC1027, October 1987,
              <http://www.rfc-editor.org/info/rfc1027>.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
              February 2006, <http://www.rfc-editor.org/info/rfc4364>.

   [RFC4389]  Thaler, D., Talwar, M., and C. Patel, "Neighbor Discovery
              Proxies (ND Proxy)", RFC 4389, DOI 10.17487/RFC4389,
              April 2006, <http://www.rfc-editor.org/info/rfc4389>.

8.2.  Informative References

   [RFC4659]  De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur,
              "BGP-MPLS IP Virtual Private Network (VPN) Extension for
              IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, September
              2006, <http://www.rfc-editor.org/info/rfc4659>.

   [RFC4761]  Kompella, K., Ed. and Y. Rekhter, Ed., "Virtual Private
              LAN Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007,
              <http://www.rfc-editor.org/info/rfc4761>.

   [RFC4762]  Lasserre, M., Ed. and V. Kompella, Ed., "Virtual Private
              LAN Service (VPLS) Using Label Distribution Protocol
              (LDP) Signaling", RFC 4762, DOI 10.17487/RFC4762, January
              2007, <http://www.rfc-editor.org/info/rfc4762>.

   [RFC5798]  Nadas, S., Ed., "Virtual Router Redundancy Protocol
              (VRRP) Version 3 for IPv4 and IPv6", RFC 5798,
              DOI 10.17487/RFC5798, March 2010,
              <http://www.rfc-editor.org/info/rfc5798>.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in
              MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513,
              February 2012, <http://www.rfc-editor.org/info/rfc6513>.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              DOI 10.17487/RFC6820, January 2013,
              <http://www.rfc-editor.org/info/rfc6820>.

Authors' Addresses

   Xiaohu Xu
   Huawei

   Email: xuxiaohu@huawei.com

   Robert Raszuk
   Mirantis Inc.

   Email: robert@raszuk.net

   Christian Jacquenet
   Orange

   Email: christian.jacquenet@orange.com

   Truman Boyes
   Bloomberg LP

   Email: tboyes@bloomberg.net

   Brendan Fee
   Extreme Networks

   Email: bfee@extremenetworks.com