Network Working Group                                              X. Xu
Internet-Draft                                       Huawei Technologies
Intended status: Informational                              C. Jacquenet
Expires: June 11, 2016                                            Orange
                                                               R. Raszuk
                                                                T. Boyes
                                                            Bloomberg LP
                                                                  B. Fee
                                                        Extreme Networks
                                                        December 9, 2015


    Virtual Subnet: A BGP/MPLS IP VPN-based Subnet Extension Solution
                    draft-ietf-bess-virtual-subnet-07

Abstract

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for
   building Layer 3 network virtualization overlays within and/or
   between data centers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 11, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Solution Description
     3.1.  Unicast
       3.1.1.  Intra-subnet Unicast
       3.1.2.  Inter-subnet Unicast
     3.2.  Multicast
     3.3.  Host Discovery
     3.4.  ARP/ND Proxy
     3.5.  Host Mobility
     3.6.  Forwarding Table Scalability on Data Center Switches
     3.7.  ARP/ND Cache Table Scalability on Default Gateways
     3.8.  ARP/ND and Unknown Unicast Flood Avoidance
     3.9.  Path Optimization
   4.  Limitations
     4.1.  Non-support of Non-IP Traffic
     4.2.  Non-support of IP Broadcast and Link-local Multicast
     4.3.  TTL and Traceroute
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, migration, consolidation, expansion, or
   disaster avoidance.  The IETF community has recognized that IP
   renumbering of servers (i.e., VMs) after such a migration is
   usually complex and costly.
   To allow the migration of a VM from one data center to another
   without IP renumbering, the subnet on which the VM resides needs to
   be extended across these data centers.

   To achieve subnet extension across multiple cloud data centers in a
   scalable way, the following requirements and challenges must be
   considered:

   a.  VPN Instance Space Scalability: In a modern cloud data center
       environment, thousands or even tens of thousands of tenants
       could be hosted over a shared network infrastructure.  For
       security and performance isolation purposes, these tenants need
       to be isolated from one another.

   b.  Forwarding Table Scalability: With the development of server
       virtualization technologies, it's not uncommon for a single
       cloud data center to contain millions of VMs.  This number
       already poses a significant challenge to the forwarding table
       scalability of data center switches.  If multiple data centers
       of such a scale were interconnected at Layer 2, this challenge
       would become even worse.

   c.  ARP/ND Cache Table Scalability: [RFC6820] notes that the
       Address Resolution Protocol (ARP)/Neighbor Discovery (ND) cache
       tables maintained by default gateways within cloud data centers
       can raise scalability issues.  Controlling the size of the
       ARP/ND cache tables is therefore critical as the number of data
       centers to be connected increases.

   d.  ARP/ND and Unknown Unicast Flooding: It's well known that the
       flooding of ARP/ND broadcast/multicast messages, as well as
       unknown unicast traffic, within large Layer 2 networks is
       likely to affect network and host performance.  When multiple
       data centers, each hosting millions of VMs, are interconnected
       at Layer 2, the impact of such flooding becomes even worse.  As
       such, it becomes increasingly important to avoid the flooding
       of ARP/ND broadcast/multicast and unknown unicast traffic
       across data centers.

   e.  Path Optimization: A subnet usually indicates a location in the
       network.  However, when a subnet has been extended across
       multiple geographically dispersed data center locations, the
       location semantics of such a subnet are no longer retained.  As
       a result, traffic exchanged between a specific user and a
       server located in different data centers may first be forwarded
       through a third data center.  This suboptimal routing would
       obviously result in an unnecessary consumption of the bandwidth
       resources between data centers.  Furthermore, in the case where
       traditional VPLS technology [RFC4761] [RFC4762] is used for
       data center interconnect, return traffic from a server may be
       forwarded to a default gateway located in a different data
       center due to the configuration of a virtual router redundancy
       group.  This suboptimal routing would also unnecessarily
       consume the bandwidth resources between data centers.

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for data
   center interconnection while addressing all of the aforementioned
   requirements and challenges.  Here, BGP/MPLS IP VPN means both
   BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659].  In
   addition, since Virtual Subnet is mainly built on proven
   technologies such as BGP/MPLS IP VPN and ARP/ND proxy
   [RFC0925][RFC1027][RFC4389], service providers that offer
   Infrastructure as a Service (IaaS) cloud services can rely upon
   their existing BGP/MPLS IP VPN infrastructure and take advantage of
   their BGP/MPLS VPN operational experience to interconnect data
   centers.

   Although Virtual Subnet is described in this document as an
   approach for data center interconnection, it can be used within
   data centers as well.
   Note that the approach described in this document is not intended
   to achieve an exact emulation of Layer 2 connectivity; therefore,
   it can only support a restricted Layer 2 connectivity service
   model, with limitations that are discussed in Section 4.  The
   discussion of where this service model can apply is outside the
   scope of this document.

2.  Terminology

   This memo makes use of the terms defined in [RFC4364].

3.  Solution Description

3.1.  Unicast

3.1.1.  Intra-subnet Unicast

                            +--------------------+
    +------------------+    |                    |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |         \        |    |                    |    |        /         |
    | +------+ \      ++---+-+              +-+---++      / +------+     |
    | |Host A+-----+ PE-1  |                |  PE-2 +----+Host B|        |
    | +------+\      ++-+-+-+              +-+-+-++      /+------+       |
    | 192.0.2.2/24    | |                      | |    192.0.2.3/24       |
    |                 | |                      | |                       |
    |     DC West     | |  IP/MPLS Backbone    | |       DC East         |
    +------------------+ |                     | +------------------+
                       | +--------------------+ |
                       |                        |
              VRF_A :  V               VRF_A :  V
    +------------+---------+--------+  +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+

                  Figure 1: Intra-subnet Unicast Example

   As shown in Figure 1, two hosts (i.e., Hosts A and B) belonging to
   the same subnet (i.e., 192.0.2.0/24) are located in different data
   centers (i.e., DC West and DC East), respectively.  The PE routers
   (i.e., PE-1 and PE-2) that are used for interconnecting these two
   data centers create host routes for their own local hosts and then
   advertise these routes by means of the BGP/MPLS IP VPN signaling.
   Meanwhile, an ARP proxy is enabled on the Virtual Routing and
   Forwarding (VRF) attachment circuits of these PE routers.

   Let's now assume that host A sends an ARP request for host B before
   communicating with host B.  Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets for host B to PE-1.  PE-1 tunnels such
   packets towards PE-2, which in turn forwards them to host B.  Thus,
   hosts A and B can communicate with each other as if they were
   located within the same subnet.

3.1.2.  Inter-subnet Unicast

                            +--------------------+
    +------------------+    |                    |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |         \        |    |                    |    |        /         |
    | +------+ \      ++---+-+              +-+---++      / +------+     |
    | |Host A+-----+ PE-1  |                |  PE-2 +-+--+Host B|        |
    | +------+\      ++-+-+-+              +-+-+-++   |  /+------+       |
    | 192.0.2.2/24    | |                      | |    | 192.0.2.3/24     |
    | GW=192.0.2.4    | |                      | |    | GW=192.0.2.4     |
    |                 | |                      | |    |    +------+      |
    |                 | |                      | |    +----+  GW  +--    |
    |                 | |                      | |        /+------+      |
    |                 | |                      | |    192.0.2.4/24       |
    |                 | |                      | |                       |
    |     DC West     | |  IP/MPLS Backbone    | |       DC East         |
    +------------------+ |                     | +------------------+
                       | +--------------------+ |
                       |                        |
              VRF_A :  V               VRF_A :  V
    +------------+---------+--------+  +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.4/32|  PE-2   |  IBGP  |  |192.0.2.4/32|192.0.2.4| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    | 0.0.0.0/0  |  PE-2   |  IBGP  |  | 0.0.0.0/0  |192.0.2.4| Static |
    +------------+---------+--------+  +------------+---------+--------+

               Figure 2: Inter-subnet Unicast Example (1)

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW).  PE-2, which is
   connected to GW, would either be configured with, or learn from GW,
   a default route with the next hop pointing to GW.  Meanwhile, this
   route is distributed to the other PE routers (i.e., PE-1) as per
   normal [RFC4364] operation.  Assume host A sends an ARP request for
   its default gateway (i.e., 192.0.2.4) prior to communicating with a
   destination host outside of its subnet.  Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response.  Host A then sends a packet for that destination
   host to PE-1.  PE-1 tunnels such a packet towards PE-2 according to
   the default route learnt from PE-2, which in turn forwards the
   packet to GW.
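   The per-VRF forwarding and proxy behavior in the two examples above
   can be sketched in Python.  This is a minimal illustration only:
   the VRF class, interface names ("ac0" for the local attachment
   circuit, "mpls0" standing in for the tunnel towards remote PEs),
   and the documentation MAC address are hypothetical, not drawn from
   any actual router implementation.  The proxy decision follows the
   rule given later in Section 3.4: reply with the PE's own MAC only
   when the best route's outgoing interface differs from the interface
   the ARP request arrived on.

```python
import ipaddress

class VRF:
    def __init__(self, pe_mac):
        self.pe_mac = pe_mac
        self.routes = {}  # ip_network -> (nexthop, protocol, out_interface)

    def add_route(self, prefix, nexthop, protocol, out_if):
        self.routes[ipaddress.ip_network(prefix)] = (nexthop, protocol, out_if)

    def best_route(self, dst):
        """Longest-prefix match over the VRF routing table."""
        addr = ipaddress.ip_address(dst)
        matches = [p for p in self.routes if addr in p]
        return max(matches, key=lambda p: p.prefixlen, default=None)

    def arp_proxy_reply(self, target_ip, in_if):
        """Return the PE's MAC if the PE should proxy this ARP request:
        a best route exists and points out a different interface."""
        best = self.best_route(target_ip)
        if best is None:
            return None
        _, _, out_if = self.routes[best]
        return self.pe_mac if out_if != in_if else None

# VRF_A on PE-1, populated as in Figures 1 and 2.
vrf_a = VRF(pe_mac="00:00:5e:00:53:01")
vrf_a.add_route("192.0.2.2/32", "192.0.2.2", "Direct", "ac0")  # Host A (local)
vrf_a.add_route("192.0.2.3/32", "PE-2", "IBGP", "mpls0")       # Host B (remote)
vrf_a.add_route("192.0.2.0/24", "192.0.2.1", "Direct", "ac0")
vrf_a.add_route("0.0.0.0/0", "PE-2", "IBGP", "mpls0")          # default (Figure 2)

# Host A ARPs for Host B: the /32 best route points into the backbone,
# so PE-1 answers with its own MAC and will tunnel the packets to PE-2.
print(vrf_a.arp_proxy_reply("192.0.2.3", "ac0"))   # PE-1's MAC
# Target reachable over the same attachment circuit: no proxy reply.
print(vrf_a.arp_proxy_reply("192.0.2.2", "ac0"))   # None
```

   The same `arp_proxy_reply` check also covers the inter-subnet case:
   an ARP request for an off-subnet gateway address resolves through
   the default route, whose outgoing interface is the backbone tunnel.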
                            +--------------------+
    +------------------+    |                    |    +------------------+
    |VPN_A:192.0.2.1/24|    |                    |    |VPN_A:192.0.2.1/24|
    |         \        |    |                    |    |        /         |
    | +------+ \      ++---+-+              +-+---++      / +------+     |
    | |Host A+--+--+ PE-1  |                |  PE-2 +-+--+Host B|        |
    | +------+\ |    ++-+-+-+              +-+-+-++   |  /+------+       |
    | 192.0.2.2/24    | |                      | |    | 192.0.2.3/24     |
    | GW=192.0.2.4    | |                      | |    | GW=192.0.2.4     |
    |   +------+|     | |                      | |    |    +------+      |
    |--+ GW-1 +-+     | |                      | |    +---+ GW-2 +--     |
    |   +------+\     | |                      | |        /+------+      |
    | 192.0.2.4/24    | |                      | |    192.0.2.4/24       |
    |                 | |                      | |                       |
    |     DC West     | |  IP/MPLS Backbone    | |       DC East         |
    +------------------+ |                     | +------------------+
                       | +--------------------+ |
                       |                        |
              VRF_A :  V               VRF_A :  V
    +------------+---------+--------+  +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.4/32|192.0.2.4| Direct |  |192.0.2.4/32|192.0.2.4| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    | 0.0.0.0/0  |192.0.2.4| Static |  | 0.0.0.0/0  |192.0.2.4| Static |
    +------------+---------+--------+  +------------+---------+--------+

               Figure 3: Inter-subnet Unicast Example (2)

   As shown in Figure 3, in the case where each data center is
   deployed with a default gateway, hosts will get ARP responses
   directly from their local default gateways, rather than from their
   local PE routers, when sending ARP requests for their default
   gateways.

                                 +------+
                          +------+ PE-3 +------+
    +------------------+  |      +------+      |  +------------------+
    |VPN_A:192.0.2.1/24|  |                    |  |VPN_A:192.0.2.1/24|
    |         \        |  |                    |  |        /         |
    | +------+ \      ++---+-+              +-+---++      / +------+     |
    | |Host A+-----+ PE-1  |                |  PE-2 +----+Host B|        |
    | +------+\      ++-+-+-+              +-+-+-++      /+------+       |
    | 192.0.2.2/24    | |                      | |    192.0.2.3/24       |
    | GW=192.0.2.1    | |                      | |    GW=192.0.2.1       |
    |                 | |                      | |                       |
    |     DC West     | |  IP/MPLS Backbone    | |       DC East         |
    +------------------+ |                     | +------------------+
                       | +--------------------+ |
                       |                        |
              VRF_A :  V               VRF_A :  V
    +------------+---------+--------+  +------------+---------+--------+
    |   Prefix   | Nexthop |Protocol|  |   Prefix   | Nexthop |Protocol|
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.1/32|127.0.0.1| Direct |  |192.0.2.1/32|127.0.0.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.2/32|192.0.2.2| Direct |  |192.0.2.2/32|  PE-1   |  IBGP  |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.3/32|  PE-2   |  IBGP  |  |192.0.2.3/32|192.0.2.3| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    |192.0.2.0/24|192.0.2.1| Direct |  |192.0.2.0/24|192.0.2.1| Direct |
    +------------+---------+--------+  +------------+---------+--------+
    | 0.0.0.0/0  |  PE-3   |  IBGP  |  | 0.0.0.0/0  |  PE-3   |  IBGP  |
    +------------+---------+--------+  +------------+---------+--------+

               Figure 4: Inter-subnet Unicast Example (3)

   Alternatively, as shown in Figure 4, PE routers themselves could be
   configured as default gateways for their locally connected hosts,
   as long as these PE routers have routes to reach outside networks.

3.2.  Multicast

   To support IP multicast between hosts of the same Virtual Subnet,
   MVPN technologies [RFC6513] could be used without any change.  For
   example, PE routers attached to a given VPN join a default provider
   multicast distribution tree that is dedicated to that VPN.  Ingress
   PE routers, upon receiving multicast packets from their local
   hosts, forward them towards remote PE routers through the
   corresponding default provider multicast distribution tree.  In
   this context, IP multicast doesn't include link-local multicast.

3.3.  Host Discovery

   PE routers should be able to dynamically discover their local hosts
   and keep the list of these hosts up to date in a timely manner, so
   as to ensure the availability and accuracy of the corresponding
   host routes originated from them.  PE routers could accomplish
   local host discovery through traditional mechanisms based on the
   ARP or ND protocols.

3.4.  ARP/ND Proxy

   Acting as an ARP or ND proxy, a PE router should only respond to an
   ARP request or Neighbor Solicitation (NS) message for a target host
   when it has a best route for that target host in the associated VRF
   and the outgoing interface of that best route is different from the
   one over which the ARP request or NS message was received.  In the
   scenario where a given VPN site (i.e., a data center) is multihomed
   to more than one PE router via an Ethernet switch or an Ethernet
   network, the Virtual Router Redundancy Protocol (VRRP) [RFC5798] is
   usually enabled on these PE routers.  In this case, only the PE
   router elected as the VRRP Master is allowed to perform the ARP/ND
   proxy function.

3.5.  Host Mobility

   During the VM migration process, the PE router to which the moving
   VM is now attached would create a host route for that host upon
   receiving a notification message of VM attachment (e.g., a
   gratuitous ARP or an unsolicited NA message).
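   This host-mobility handling can be sketched as follows.  All names
   here are hypothetical and purely illustrative: the sketch only
   models the route-origination bookkeeping (a gratuitous ARP or
   unsolicited NA triggers a /32 host-route advertisement; a detected
   detachment triggers a withdrawal), not real BGP message encoding.

```python
class PE:
    def __init__(self, name):
        self.name = name
        self.local_hosts = set()   # locally originated host routes
        self.bgp_messages = []     # stand-in for BGP UPDATE/WITHDRAW

    def on_attach_notification(self, host_ip):
        """Gratuitous ARP / unsolicited NA seen: originate a host route."""
        prefix = host_ip + "/32"
        if prefix not in self.local_hosts:
            self.local_hosts.add(prefix)
            self.bgp_messages.append(("UPDATE", prefix))

    def on_detach_detected(self, host_ip):
        """Host no longer attached (e.g., a failed ARP/ND or ICMP probe
        after a remote PE advertised the same host): withdraw."""
        prefix = host_ip + "/32"
        if prefix in self.local_hosts:
            self.local_hosts.discard(prefix)
            self.bgp_messages.append(("WITHDRAW", prefix))

# Host A (192.0.2.2) migrates from the site behind PE-1 to the site
# behind PE-2.
pe1, pe2 = PE("PE-1"), PE("PE-2")
pe1.on_attach_notification("192.0.2.2")   # initial attachment at PE-1
pe2.on_attach_notification("192.0.2.2")   # VM shows up behind PE-2
pe1.on_detach_detected("192.0.2.2")       # PE-1 probes, then withdraws

print(pe1.bgp_messages)  # [('UPDATE', '192.0.2.2/32'), ('WITHDRAW', '192.0.2.2/32')]
print(pe2.bgp_messages)  # [('UPDATE', '192.0.2.2/32')]
```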
   The PE router to which the moving VM was previously attached would
   withdraw the corresponding host route upon noticing the detachment
   of that VM.  Meanwhile, the latter PE router could optionally
   broadcast a gratuitous ARP or send an unsolicited NA message on
   behalf of that host, with a source MAC address being one of its
   own.  In this way, the ARP/ND entry for the host that moved, as
   cached on any local host, would be updated accordingly.  In the
   case where there is no explicit VM detachment notification
   mechanism, the PE router could instead use the following method to
   detect the detachment: upon learning a route update for a local
   host from a remote PE router for the first time, the PE router
   immediately checks whether that local host is still attached to it
   by some means (e.g., ARP/ND PING and/or ICMP PING).  It is
   important to ensure that the same MAC and IP addresses are
   associated with the active default gateway in each data center, as
   the VM would most likely continue to send packets to the same
   default gateway address after having migrated from one data center
   to another.  One possible way to achieve this goal is to configure
   the same VRRP group at each location, so that the active default
   gateway in each data center shares the same virtual MAC and virtual
   IP addresses.

3.6.  Forwarding Table Scalability on Data Center Switches

   In a Virtual Subnet environment, the MAC learning domain associated
   with a given Virtual Subnet that has been extended across multiple
   data centers is partitioned into segments, and each segment is
   confined within a single data center.  Therefore, data center
   switches only need to learn local MAC addresses, rather than both
   local and remote MAC addresses.

3.7.  ARP/ND Cache Table Scalability on Default Gateways

   When default gateway functions are implemented on PE routers, as
   shown in Figure 4, the ARP/ND cache table on each PE router only
   needs to contain ARP/ND entries for local hosts.  As a result, the
   ARP/ND cache table size would not grow as the number of data
   centers to be connected increases.

3.8.  ARP/ND and Unknown Unicast Flood Avoidance

   In a Virtual Subnet environment, the flooding domain associated
   with a given Virtual Subnet that has been extended across multiple
   data centers is partitioned into segments, and each segment is
   confined within a single data center.  Therefore, the performance
   impact on networks and servers imposed by the flooding of ARP/ND
   broadcast/multicast and unknown unicast traffic is minimized.

3.9.  Path Optimization

   Take the scenario shown in Figure 4 as an example: to optimize the
   forwarding path for traffic between cloud users and cloud data
   centers, the PE routers located in the cloud data centers (i.e.,
   PE-1 and PE-2), which also act as default gateways, propagate host
   routes for their local hosts to the remote PE routers attached to
   cloud user sites (i.e., PE-3).  As such, traffic from cloud user
   sites to a given server on the Virtual Subnet that has been
   extended across data centers is forwarded directly to the data
   center where that server resides, since traffic is now forwarded
   according to the host route for that server rather than the subnet
   route.  Furthermore, for traffic coming from the cloud data centers
   towards the cloud user sites, each PE router acting as a default
   gateway forwards traffic according to the longest-match route in
   the corresponding VRF.  As a result, traffic from data centers to
   cloud user sites is forwarded along an optimal path as well.

4.  Limitations

4.1.  Non-support of Non-IP Traffic

   Although most traffic within and across data centers is IP traffic,
   there may still be a few legacy clustering applications that rely
   on non-IP communications (e.g., heartbeat messages between cluster
   nodes).  Since Virtual Subnet is strictly based on Layer 3
   forwarding, such non-IP communications cannot be supported in the
   Virtual Subnet solution.  In order to support non-IP traffic (if
   present) in environments where the Virtual Subnet solution has been
   deployed, an approach following the idea of "route all IP traffic,
   bridge non-IP traffic" could be considered.  In other words, all IP
   traffic, both intra- and inter-subnet, would be processed according
   to the Virtual Subnet design, while non-IP traffic would be
   forwarded according to a particular Layer 2 VPN approach.  Such a
   unified L2/L3 VPN approach requires ingress PE routers to classify
   packets received from hosts before distributing them to the
   corresponding L2 or L3 VPN forwarding processes.  Note that more
   and more cluster vendors are offering clustering applications based
   on Layer 3 interconnection.

4.2.  Non-support of IP Broadcast and Link-local Multicast

   As illustrated above, intra-subnet traffic between PE routers is
   forwarded at Layer 3 in the Virtual Subnet solution.  Therefore, IP
   broadcast and link-local multicast traffic cannot be forwarded
   across PE routers in the Virtual Subnet solution.  In order to
   support IP broadcast and link-local multicast traffic in
   environments where the Virtual Subnet solution has been deployed,
   the unified L2/L3 overlay approach described in Section 4.1 could
   be considered as well.  That is, IP broadcast and link-local
   multicast messages would be forwarded at Layer 2, while routable IP
   traffic would be processed according to the Virtual Subnet design.

4.3.  TTL and Traceroute

   As mentioned before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet context.  Since Virtual Subnet doesn't
   require any change to the Time To Live (TTL) handling mechanism of
   the BGP/MPLS IP VPN, a traceroute performed from one host towards
   another (assuming that these two hosts are within the same subnet
   but attached to different sites) would display the PE routers to
   which those two hosts are connected, reflecting the fact that the
   two hosts are actually connected via a Virtual Subnet rather than a
   Layer 2 connection.  In addition, applications that generate
   intra-subnet traffic with the TTL set to 1 may not work properly in
   the Virtual Subnet context, unless special TTL processing and
   loop-prevention mechanisms for such a context have been
   implemented.  Details about such special TTL processing and
   loop-prevention mechanisms are outside the scope of this document.

5.  Acknowledgements

   Thanks to Susan Hares, Yongbing Fan, Dino Farinacci, Himanshu Shah,
   Nabil Bitar, Giles Heron, Ronald Bonica, Monique Morrow, Rajiv
   Asati, Eric Osborne, Thomas Morin, Martin Vigoureux, Pedro Roque
   Marque, Joe Touch, Wim Henderickx, Alia Atlas and Stephen Farrell
   for their valuable comments and suggestions on this document.
   Thanks to Loa Andersson for his WG LC review on this document.
   Thanks to Alvaro Retana for his AD review on this document.  Thanks
   to Ronald Bonica for his RtgDir review.  Thanks to Donald Eastlake
   for his Sec-DIR review of this document.  Thanks to Jouni Korhonen
   for the OPS-Dir review of this document.  Thanks to Roni Even for
   the Gen-ART review of this document.  Thanks to Sabrina Tanamal for
   the IANA review of this document.

6.  IANA Considerations

   There is no requirement for any IANA action.
7.  Security Considerations

   Since the BGP/MPLS IP VPN signaling is reused without any change,
   the security considerations described in [RFC4364] are applicable
   to this document.  Meanwhile, since the security issues associated
   with the NDP are inherited due to the use of an NDP proxy, the
   security considerations and recommendations described in [RFC6583]
   are applicable to this document as well.

   Inter-data-center traffic often carries highly sensitive
   information at higher layers that is not directly understood
   (parsed) within an egress or ingress PE.  For example, migrating a
   VM will often mean moving private keys and other sensitive
   configuration information.  For this reason, inter-data-center
   traffic should always be protected for both confidentiality and
   integrity using a strong security mechanism such as IPsec
   [RFC4301].  In the future, it may be feasible to protect that
   traffic within the MPLS layer
   [I-D.ietf-mpls-opportunistic-encrypt], though at the time of
   writing the mechanism for that is not sufficiently mature to
   recommend.  Exactly how such security mechanisms are deployed will
   vary from case to case, so securing the inter-data-center traffic
   may or may not involve deploying security mechanisms on the
   ingress/egress PEs or further "inside" the data centers concerned.
   Note, though, that if security is not deployed on the
   egress/ingress PEs, there is a substantial risk that some sensitive
   traffic may be sent in the clear and therefore be vulnerable to
   pervasive monitoring [RFC7258] or other attacks.

8.  References

8.1.  Normative References

   [RFC0925]  Postel, J., "Multi-LAN address resolution", RFC 925,
              DOI 10.17487/RFC0925, October 1984,
              <https://www.rfc-editor.org/info/rfc925>.

   [RFC1027]  Carl-Mitchell, S. and J. Quarterman, "Using ARP to
              implement transparent subnet gateways", RFC 1027,
              DOI 10.17487/RFC1027, October 1987,
              <https://www.rfc-editor.org/info/rfc1027>.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
              February 2006, <https://www.rfc-editor.org/info/rfc4364>.

   [RFC4389]  Thaler, D., Talwar, M., and C. Patel, "Neighbor Discovery
              Proxies (ND Proxy)", RFC 4389, DOI 10.17487/RFC4389,
              April 2006, <https://www.rfc-editor.org/info/rfc4389>.

8.2.  Informative References

   [I-D.ietf-mpls-opportunistic-encrypt]
              Farrel, A. and S. Farrell, "Opportunistic Security in
              MPLS Networks", draft-ietf-mpls-opportunistic-encrypt-00
              (work in progress), July 2015.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005, <https://www.rfc-editor.org/info/rfc4301>.

   [RFC4659]  De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur,
              "BGP-MPLS IP Virtual Private Network (VPN) Extension for
              IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, September
              2006, <https://www.rfc-editor.org/info/rfc4659>.

   [RFC4761]  Kompella, K., Ed. and Y. Rekhter, Ed., "Virtual Private
              LAN Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007,
              <https://www.rfc-editor.org/info/rfc4761>.

   [RFC4762]  Lasserre, M., Ed. and V. Kompella, Ed., "Virtual Private
              LAN Service (VPLS) Using Label Distribution Protocol
              (LDP) Signaling", RFC 4762, DOI 10.17487/RFC4762, January
              2007, <https://www.rfc-editor.org/info/rfc4762>.

   [RFC5798]  Nadas, S., Ed., "Virtual Router Redundancy Protocol
              (VRRP) Version 3 for IPv4 and IPv6", RFC 5798,
              DOI 10.17487/RFC5798, March 2010,
              <https://www.rfc-editor.org/info/rfc5798>.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012, <https://www.rfc-editor.org/info/rfc6513>.

   [RFC6583]  Gashinsky, I., Jaeggli, J., and W. Kumari, "Operational
              Neighbor Discovery Problems", RFC 6583,
              DOI 10.17487/RFC6583, March 2012,
              <https://www.rfc-editor.org/info/rfc6583>.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              DOI 10.17487/RFC6820, January 2013,
              <https://www.rfc-editor.org/info/rfc6820>.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is
              an Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014, <https://www.rfc-editor.org/info/rfc7258>.

Authors' Addresses

   Xiaohu Xu
   Huawei Technologies
   No.156 Beiqing Rd
   Beijing  100095
   CHINA

   Email: xuxiaohu@huawei.com


   Christian Jacquenet
   Orange
   4 rue du Clos Courtel
   Cesson-Sevigne  35512
   FRANCE

   Email: christian.jacquenet@orange.com


   Robert Raszuk
   Bloomberg LP
   731 Lexington Ave
   New York City, NY  10022
   USA

   Email: robert@raszuk.net


   Truman Boyes
   Bloomberg LP

   Email: tboyes@bloomberg.net


   Brendan Fee
   Extreme Networks

   Email: bfee@extremenetworks.com