idnits 2.17.1 draft-nachum-sarp-11.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 2 instances of too long lines in the document, the longest one being 37 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 44 has weird spacing: '... The list ...' == Line 110 has weird spacing: '... each acces...' == Line 245 has weird spacing: '... hosts that ...' == Line 287 has weird spacing: '...icantly reduc...' == Line 364 has weird spacing: '...diation for I...' == (17 more instances...) -- The document date (April 8, 2015) is 3278 days in the past. Is this intentional? 
Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Missing Reference: 'ARP-ND-PRACTICE' is mentioned on line 229, but not defined == Missing Reference: 'RFC 4664' is mentioned on line 362, but not defined == Missing Reference: 'RFC 4389' is mentioned on line 363, but not defined == Missing Reference: 'RFC 4541' is mentioned on line 364, but not defined == Missing Reference: 'RFC 6575' is mentioned on line 365, but not defined == Unused Reference: 'ProxyARP' is defined on line 817, but no explicit reference was found in the text == Unused Reference: 'RFC925' is defined on line 825, but no explicit reference was found in the text == Unused Reference: 'RFC4664' is defined on line 833, but no explicit reference was found in the text == Unused Reference: 'RFC6575' is defined on line 836, but no explicit reference was found in the text -- No information found for draft-ietf-ipv6-multi-link-subnets - is the name correct? Summary: 1 error (**), 0 flaws (~~), 16 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Youval Nachum 2 Internet Draft Ixia 3 Intended status: Experimental Linda Dunbar 4 Expires: October 2015 Huawei 6 Ilan Yerushalmi 7 Tal Mizrahi 8 Marvell 10 April 8, 2015 12 Scaling the Address Resolution Protocol for Large Data Centers 13 (SARP) 14 draft-nachum-sarp-11.txt 16 Abstract 18 This document introduces SARP, an architecture that uses proxy 19 gateways to scale large data center networks. SARP is based on 20 fast proxies that significantly reduce switches' Filtering 21 Database (FDB) table sizes and ARP/ND impact on network 22 elements in an environment where hosts within one subnet (or 23 VLAN) can spread over various locations. 
SARP is targeted for 24 massive data centers with a significant number of Virtual 25 Machines (VMs) that can move across various physical 26 locations. 28 Status of this Memo 30 This Internet-Draft is submitted to IETF in full conformance 31 with the provisions of BCP 78 and BCP 79. 33 Internet-Drafts are working documents of the Internet 34 Engineering Task Force (IETF), its areas, and its working 35 groups. Note that other groups may also distribute working 36 documents as Internet-Drafts. 38 Internet-Drafts are draft documents valid for a maximum of six 39 months and may be updated, replaced, or obsoleted by other 40 documents at any time. It is inappropriate to use Internet- 41 Drafts as reference material or to cite them other than as 42 "work in progress." 44 The list of current Internet-Drafts can be accessed at 45 http://www.ietf.org/ietf/1id-abstracts.txt. 47 The list of Internet-Draft Shadow Directories can be accessed 48 at http://www.ietf.org/shadow.html. 50 This Internet-Draft will expire on October 8, 2015. 52 Copyright Notice 54 Copyright (c) 2015 IETF Trust and the persons identified as 55 the document authors. All rights reserved. 57 This document is subject to BCP 78 and the IETF Trust's Legal 58 Provisions Relating to IETF Documents 59 (http://trustee.ietf.org/license-info) in effect on the date 60 of publication of this document. Please review these documents 61 carefully, as they describe your rights and restrictions with 62 respect to this document. Code Components extracted from this 63 document must include Simplified BSD License text as described 64 in Section 4.e of the Trust Legal Provisions and are provided 65 without warranty as described in the Simplified BSD License. 67 Table of Contents 69 1. Introduction...................................................3 70 1.1. SARP Motivation...........................................3 71 1.2. SARP Overview.............................................6 72 1.3. 
SARP Deployment Options...................................7 73 1.4. Comparing with Existing Solutions.........................8 74 2. Terms and Abbreviations Used in this Document..................9 75 3. SARP - Theory of Operation....................................10 76 3.1. Control Plane: ARP/ND....................................10 77 3.1.1. ARP/NS Request for a Local VM.......................10 78 3.1.2. ARP/NS Request for a Remote VM......................11 79 3.1.3. Gratuitous ARP and Unsolicited Neighbor 80 Advertisement (UNA)........................................12 81 3.2. Data Plane: Packet Transmission..........................13 82 3.2.1. Local Packet Transmission...........................13 83 3.2.2. Packet Transmission Between Sites...................13 84 3.3. VM Migration.............................................14 85 3.3.1. VM Local Migration..................................14 86 3.3.2. VM Migration from One Site to Another...............14 87 3.3.2.1. Impact on IP<->MAC Mapping Cache Table of 88 Migrated VMs............................................16 89 3.4. Multicast and Broadcast..................................16 90 3.5. Non IP packet............................................17 91 3.6. High availability and load balancing.....................17 92 3.7. SARP Interaction with Overlay networks...................18 93 4. Security Considerations.......................................18 94 5. IANA Considerations...........................................19 95 6. References....................................................19 96 6.1. Normative References.....................................19 97 6.2. Informative References...................................19 98 7. Acknowledgments...............................................20 100 1. 
Introduction 102 This document describes a proxy gateway technique, called 103 Scalable Address Resolution Protocol (SARP), which reduces 104 switches' Filtering Data Base (FDB) size and ARP/Neighbor 105 Discovery impact on network elements in an environment where 106 hosts within one subnet (or VLAN) can spread over various 107 access domains in data centers. 109 The main idea of SARP is to represent all VMs (or hosts) under 110 each access domain by their corresponding access (or 111 aggregation) node's MAC address. For example (Figure 1), when 112 host A in the west site needs to communicate with host B, 113 which is on the same VLAN but connected to a different access 114 domain (east site), SARP requires A to use the MAC address of 115 SARP proxy 2, rather than the address of host B. By doing so, 116 switches in each domain do not need to maintain a list of MAC 117 addresses for all the VMs (hosts) in different access domains; 118 every switch only needs to be familiar with MAC addresses that 119 reside in the current domain, and addresses of remote SARP 120 proxy gateways. Therefore, the switches' FDB size is limited 121 regardless of the number of access domains. 123 +-------+ +-------+ _ __ +-------+ +-------+ 124 | | | SARP | / \_/ \_ | SARP | | | 125 |host A |<===>| proxy |<=>\_ \<==>| proxy |<===>|host B | 126 | | | 1 | / _/ | 2 | | | 127 +-------+ +-------+ \__ _/ +-------+ +-------+ 128 \_/ 129 <------west site------> <------east site------> 130 Figure 1 SARP in a nutshell 132 1.1. SARP Motivation 134 [RFC6820] discusses the impacts and scaling issues that arise 135 in data center networks when subnets span across multiple 136 L2/L3 boundary routers. 138 Unfortunately, when the combined number of VMs (or hosts) in 139 all those subnets is large, this can lead to switches' MAC 140 table size explosion and heavy impact on network elements. 
142 There are four major issues associated with subnets spanning 143 across multiple L2/L3 boundary router ports: 145 1) Intermediate switches' MAC address table (FDB) explosion. 147 When hosts in a VLAN (or subnet) span across multiple access 148 domains and each access domain has hosts belonging to 149 different VLANs, each access switch has to enable multiple 150 VLANs. Thus, those access switches are exposed to all MAC 151 addresses across all VLANs. 153 For example, for an access switch with 40 attached physical 154 servers, where each server has 100 VMs, the access switch 155 has 4000 attached MAC addresses. If hosts/VMs can indeed be 156 moved anywhere, the worst case for the access switch is when 157 all those 4000 VMs belong to different VLANs, i.e. the 158 access switch has 4000 VLANs enabled. If each VLAN has 200 159 hosts, this access switch's MAC table potentially has 160 200*4000 = 800,000 entries. 162 It is important to note that the example above is relevant 163 regardless of whether IPv4 or IPv6 is used. 165 The example illustrates a scenario that is worse than what 166 today's L2/L3 gateway has to face. In today's environment, 167 where each subnet is limited to a few access switches, the 168 number of MAC addresses the gateway has to learn is of a 169 significantly smaller scale. 171 2) ARP/ND processing load on the L2/L3 boundary routers. 173 All VMs periodically send ND requests to their corresponding gateway 174 nodes to get the gateway nodes' MAC addresses. When the combined 175 number of VMs across all the VLANs is large, processing the 176 responses to the ND requests from those VMs can easily 177 exhaust the gateway's CPU. 179 An L2/L3 boundary router could be hit with ARP/ND twice when 180 the originating and destination stations are in different 181 subnets attached to the same router and when those hosts do 182 not communicate with external peers very frequently. 
The 183 first hit is when the originating station in subnet 1 184 initiates an ARP/ND request to the L2/L3 boundary router. 185 The second hit is when the L2/L3 boundary router initiates 186 an ARP/ND request to the target in subnet 2 if the target is 187 not in the router's ARP/ND cache. 189 3) In IPv4, every end station in a subnet receives ARP 190 broadcast messages from all other end stations in the 191 subnet. IPv6 ND has eliminated this issue by using 192 multicast. 194 However, most devices support a limited number of multicast 195 addresses, due to the limited scale of multicast filtering. Once the 196 number of multicast addresses exceeds the multicast filter 197 limit, the multicast traffic has to be processed by 198 the device's CPU (i.e. the slow path). 200 It is less of an issue in data centers without VM mobility, 201 since each port is dedicated to only one (or a small number 202 of) VLANs. Thus, the number of multicast addresses hitting 203 each port is significantly lower. 205 4) ARP/ND messages are flooded to many physical link 206 segments, which can reduce the bandwidth available for user 207 traffic. 209 ARP/ND flooding is, in most cases, an insignificant issue in 210 today's data center networks as the majority of data center 211 servers are shifting towards 1G or 10G ports. The bandwidth 212 used by ARP/ND, even when flooded to all physical links, 213 becomes negligible compared to the link bandwidth. 214 In addition, IGMP/MLD snooping [RFC4541] can further reduce 215 the ND multicast traffic to some physical link segments. 217 Statistics gathered by Merit Network [ARMDStats] have shown 218 that the major impact of a large number of VMs in data centers 219 is on the L2/L3 boundary routers, i.e., issue (2) above. An 220 L2/L3 boundary router could be hit with ARP/ND twice when the 221 originating and destination stations are in different subnets 222 attached to the same router and those hosts do not communicate 223 with external peers often enough. 
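The worst-case FDB arithmetic given under issue (1) above can be reproduced with a short sketch. It is only an illustration: the function and parameter names are assumptions made here, and the figure of 16 remote SARP proxies is hypothetical. The second function applies the principle stated in Section 1, that a switch only needs the MAC addresses of its own access domain plus the addresses of the remote SARP proxies, and it assumes the access domain holds just the 4000 VMs of the example.

```python
def worst_case_fdb_entries(servers, vms_per_server, hosts_per_vlan):
    """Worst case from issue (1): every attached VM is in a different VLAN."""
    attached_vms = servers * vms_per_server   # MAC addresses attached to the switch
    enabled_vlans = attached_vms              # worst case: one VLAN per attached VM
    return enabled_vlans * hosts_per_vlan     # the switch may learn every host of every VLAN


def sarp_fdb_entries(local_macs, remote_proxies):
    """With SARP: local MAC addresses plus one entry per remote SARP proxy."""
    return local_macs + remote_proxies


without_sarp = worst_case_fdb_entries(servers=40, vms_per_server=100, hosts_per_vlan=200)
with_sarp = sarp_fdb_entries(local_macs=40 * 100, remote_proxies=16)  # 16 is hypothetical

print(without_sarp)  # 800000
print(with_sarp)     # 4016
```

The first result matches the 200*4000 = 800,000 entries of the example; the second depends entirely on the assumed number of remote proxies.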
225 Overlay approaches, e.g. [RFC7364], can hide host (VM) 226 addresses in the core but do not prevent the MAC table 227 explosion problem (issue (1)) unless the NVE is on a server. 229 The scaling practices documented in [ARP-ND-PRACTICE] can only 230 reduce the ARP impact on L2/L3 boundary routers in some 231 scenarios, but not all. 233 In order to protect router CPUs from being overburdened by 234 target resolution requests, some routers rate-limit the target 235 MAC resolution requests to the router's CPU. When the rate 236 limit is exceeded, the incoming data frames are dropped. In 237 traditional data centers, this issue is less significant, 238 since the number of hosts attached to one L2/L3 boundary 239 router is limited by the number of physical ports of the 240 switches/routers. When servers are virtualized to support 30+ 241 VMs, the number of hosts under one router can grow by a factor 242 of 30+. Furthermore, in traditional data center networks each 243 subnet is neatly bound to a limited number of server racks, 244 i.e., switches only need to be familiar with MAC addresses of 245 hosts that reside in this small number of subnets. In 246 contemporary data center networks, as subnets are spread 247 across many server racks, switches are exposed to VLAN/MAC 248 addresses of many subnets, greatly increasing the size of 249 switches' FDB tables. 251 The solution proposed in this document can eliminate or reduce 252 the likelihood of inter-subnet data frames being dropped and 253 reduce the number of host MAC addresses that intermediate 254 switches are exposed to, thus reducing switches' FDB table 255 sizes. 257 1.2. SARP Overview 259 The SARP approach uses proxy gateways to address the problems 260 discussed above. 262 Note: The guidelines for proxy developers in [RFC4389] have been 263 carefully considered for the SARP protocols. Section 3.3 264 discusses how SARP works when VMs are moved from one segment 265 to another. 
267 In order to enable VMs to be moved across servers while 268 maintaining their MAC/IP addresses unchanged, the Layer 2 269 network (e.g. VLAN) which interconnects those VMs may spread 270 across different server racks, different rows of server racks, 271 or even different data center sites. 273 A multi-site data center network is comprised of two main 274 building blocks: an interconnecting segment and an access 275 segment. While the access network is, in most cases, a Layer 2 276 network, the interconnecting segment is not necessarily a 277 Layer 2 network. 279 The SARP proxies are located at the boundaries where the 280 access segment connects to its interconnecting segment. The 281 boundary node can be a hypervisor virtual switch, a top-of- 282 rack switch, an aggregation switch (or end of row switch), or 283 a data center core switch. Figure 2 depicts an example of two 284 remote data centers that are managed as a single flat Layer 2 285 domain. SARP proxies are implemented at the edge devices 286 connecting the data center to the transport network. SARP 287 significantly reduces the ARP/ND transmissions over the 288 interconnecting network. 290 *-------------------* 291 | | 292 +-------| Interconnecting |-------+ 293 | | network | | 294 | *-------------------* | 295 | | 296 *-----------------* *----------------* 297 | SARP Proxies | | SARP Proxies | 298 *-----------------* *----------------* 299 | | | | 300 *-------* *-------* *-------* *-------* 301 |Access | |Access | |Access | |Access | 302 *-------* *-------* *-------* *-------* 303 | 304 *----------* 305 |Hypervisor| 306 *----------* 307 | 308 *--------* 309 |Virtual | 310 |Machine | 311 *--------* 313 (West Site) (East Site) 315 Figure 2 SARP: Network Architecture Example 317 1.3. SARP Deployment Options 319 SARP deployment is tightly coupled with the data center 320 architecture. 
SARP proxies are located at the point where the 321 Layer 2 infrastructure connects to its Layer 2 cloud using 322 overlay networks. SARP proxies can be located at the data 323 center edge (as Figure 2 depicts), data center core, or data 324 center aggregation (denoted by Agg in the figure). SARP can 325 also be implemented by the hypervisor (as Figure 3 depicts). 327 To simplify the description, we will focus on data centers 328 that are managed as a single flat Layer 2 network, where SARP 329 proxies are located at the boundary where the data center 330 connects to the transport network (as Figure 2 depicts). 332 *-------------------* 333 | | 334 +-------| TRANSPORT |-------+ 335 | | | | 336 | *-------------------* | 337 | | 338 *-----------------* *----------------* 339 | Edge Device | | Edge Device | 340 *-----------------* *----------------* 341 | | 342 *-----------------* *----------------* 343 | Core | | Core | 344 *-----------------* *----------------* 345 | | | | 346 *-------* *-------* *-------* *-------* 347 | Agg | | Agg | | Agg | | Agg | 348 *-------* *-------* *-------* *-------* 349 | 350 *----------* 351 |Hypervisor| 352 *----------* 354 (West Site) (East Site) 356 Figure 3 SARP deployment options 358 1.4. Comparing with Existing Solutions 360 IETF has developed several mechanisms to address issues 361 associated with Layer 2 networks over multiple geographic 362 locations, for example, Layer 2 VPN [RFC 4664], proxy ARP [RFC 363 925], proxy Neighbor Discovery [RFC 4389], IGMP and MLD 364 snooping [RFC 4541], and ARP mediation for IP interworking of 365 Layer 2 VPNs [RFC 6575]. 
367 However, all those solutions work well only when hosts within one 368 subnet are placed together under one access domain, so that 369 the intermediate switches in each access domain are only 370 exposed to host addresses from a limited number of subnets. 371 SARP provides a solution for the case where hosts within one subnet are 372 spread across multiple access domains and each access domain 373 has hosts from many subnets. In this environment, the 374 intermediate switches in each access domain are exposed to the 375 combined hosts of all the subnets that are enabled by the 376 access domain. 378 2. Terms and Abbreviations Used in this Document 380 ARP: Address Resolution Protocol [ARP] 382 FDB: Filtering Data Base, which is used for Layer 2 switches 383 [802.1Q]. Layer 2 switches flood data frames when the destination address (DA) is 384 not in the FDB, whereas routers drop data frames when the DA 385 is not in the Forwarding Information Base (FIB). That is 386 why Filtering Data Base (FDB) is used for Layer 2 387 switches. 389 FIB: Forwarding Information Base 391 Hypervisor: a software layer that creates and runs virtual 392 machines on a server. 394 IP-D: IP address of the destination virtual machine 396 IP-S: IP address of the source virtual machine 398 MAC-D: MAC address of the destination virtual machine 400 MAC-E: MAC address of the East Proxy SARP Device 402 MAC-S: MAC address of the source virtual machine 404 NA: IPv6 ND's Neighbor Advertisement 406 ND: IPv6 Neighbor Discovery Protocol [ND]. In this document, 407 ND also refers to the Neighbor Solicitation, Neighbor 408 Advertisement, and Unsolicited Neighbor Advertisement 409 messages defined by RFC 4861. 411 NS: IPv6 ND's Neighbor Solicitation 412 SARP Proxy: A component that participates in the SARP 413 protocol. 415 UNA: IPv6 ND's Unsolicited Neighbor Advertisement [ND] 417 VM: Virtual Machine 419 3. SARP - Theory of Operation 421 3.1. Control Plane: ARP/ND 423 This section describes the ARP/ND procedure scenarios. 
The 424 first scenario addresses a case where both the source and 425 destination VMs reside in the same access segment. In the 426 second scenario, the source VM is in the local access segment 427 and the destination VM is located at the remote access 428 segment. 430 In all scenarios, the VMs (source and destination) share the 431 same L2 broadcast domain. 433 3.1.1. ARP/NS Request for a Local VM 435 When source and destination VMs are located at the same access 436 segment (Figure 4), the address resolution process is as 437 described in [ARP] and [ND]; host A sends an ARP request or an 438 IPv6 Neighbor Solicitation (NS) to learn the IP-to-MAC mapping 439 of host B, and receives a reply from host B with the IP-D to 440 MAC-D mapping. 442 +-------+ _ __ +-------+ _ __ 443 |host A | / \_/ \_ | SARP | / \_/ \_ 444 | IP-S |<--->\_access \<==>| proxy |<===>\_interc.\ 445 | MAC-S | /network_/ | 1 | /network_/ 446 +-------+ +->\__ _/ +-------+ \__ _/ 447 | \_/ \_/ 448 +-------+ | 449 |host B |<-+ 450 | IP-D | 451 | MAC-D | 452 +-------+ 454 <--------------west site------------> 455 Figure 4 SARP: two hosts in the same access segment 457 3.1.2. ARP/NS Request for a Remote VM 459 When the source and destination VMs are located at different 460 access segments, the address resolution process is as follows. 462 +-------+ +-------+ _ __ +-------+ +-------+ 463 |host A | | SARP | / \_/ \_ | SARP | |host B | 464 | IP-S |<===>|proxy 1|<=>\_ \<==>|proxy 2|<===>| IP-D | 465 | MAC-S | | MAC-W | / _/ | MAC-E | | MAC-D | 466 +-------+ +-------+ \__ _/ +-------+ +-------+ 467 \_/ 468 <------west site------> <------east site------> 469 Figure 5 SARP: two hosts that reside at different segments 471 In the example illustrated in Figure 5, the source VM is 472 located at the west access segment and the destination VM is 473 located at the east access segment. 475 When host A sends an ARP/NS request to find out the IP-to-MAC 476 mapping of host B: 478 1. 
If SARP proxy 1 does not have IP-D in its ARP cache, the 479 ARP/NS request is propagated to all access segments which 480 might have VMs in the same virtual network as the 481 originating VM, including the east access segment. 483 2. As SARP proxy 1 forwards the ARP/NS message, it replaces 484 the source MAC address, MAC-S, with its own MAC address, 485 MAC-W. Thus, none of the switches that reside in the interconnecting 486 segment is exposed to MAC-S. 488 3. The ARP/NS request reaches SARP proxy 2. 490 4. If SARP proxy 2 does not have IP-D in its ARP cache, the 491 ARP/NS request is forwarded to the east access network. Host 492 B responds to the request with an ARP reply (IPv4) or a Neighbor 493 Advertisement (IPv6) carrying MAC-D. 495 5. When the response message reaches SARP proxy 2, it replaces 496 MAC-D with MAC-E, and thus the response reaches SARP proxy 1 497 with MAC-E. 499 6. As SARP proxy 1 forwards the response to host A, it 500 replaces the destination MAC address, MAC-W, with MAC-S. 502 SARP Proxy ARP/ND Cache 504 SARP proxies maintain a cache of the IP<->MAC mapping. This 505 cache is based on ARP/ND messages that are sent by hosts and 506 traverse the SARP proxies. 508 In steps 1 and 4 above, if the SARP proxy has IP-D in its 509 ARP cache, it responds with MAC-E, without forwarding the 510 ARP/NS request. 512 This caching approach significantly reduces the volume of 513 ARP/ND traffic over the network, and reduces the round-trip 514 time of ARP/ND requests. 516 When the west SARP proxy caches the IP<->MAC mapping entries 517 for remote VMs, the expiration timers should be set to a 518 relatively low value to prevent stale entries due to remote 519 VMs being moved or deleted. In environments where VMs move 520 frequently, it is not recommended for SARP proxies to 521 cache the IP<->MAC mapping entries of remote VMs. 523 3.1.3. 
Gratuitous ARP and Unsolicited Neighbor Advertisement 524 (UNA) 526 Hosts (or VMs) send out Gratuitous ARP (IPv4) [TcpIp] and 527 Unsolicited Neighbor Advertisement - UNA (IPv6) to allow other 528 nodes to refresh IP<->MAC entries in their caches. 530 The local SARP proxy processes the Gratuitous ARP or UNA in 531 the same way as the ARP reply or IPv6 NA, i.e. replaces the 532 MAC addresses in the same manner. 534 3.2. Data Plane: Packet Transmission 536 3.2.1. Local Packet Transmission 538 When a VM transmits packets to a destination VM that is 539 located at the same site (Figure 4), the data plane is 540 unaffected by SARP; packets are sent from (IP-S, MAC-S) to 541 (IP-D, MAC-D). 543 3.2.2. Packet Transmission Between Sites 545 Packets that are sent between sites (Figure 5) traverse the 546 SARP proxy of both sites. 548 A packet sent from host A to host B undergoes the following 549 procedure: 551 1. Host A sends a packet to IP-D, and based on its ARP table 552 it uses the MAC addresses {MAC-E, MAC-S}. 554 2. SARP proxy 1 receives the packet and replaces the source 555 MAC address, such that the packet includes {MAC-E, MAC-W}. 557 3. SARP proxy 2 receives the packet and replaces the 558 destination MAC address, and the packet is sent to host B 559 with {MAC-D, MAC-W}. 561 SARP proxy 1 replaces the source MAC address with its own 562 since switches in the interconnecting segment are only 563 familiar with SARP proxy MAC addresses, and are not familiar 564 with host addresses. 566 Note: it is a common security practice in data center networks 567 to use access lists, allowing each VM to communicate only with 568 a list of authorized peer VMs. In most cases, such access 569 control lists are based on IP addresses, and hence are not 570 affected by the MAC address replacement in SARP. 572 3.3. VM Migration 574 3.3.1. VM Local Migration 576 When a VM migrates locally within its access segment, the SARP 577 protocol does not require any special behavior. 
VM migration 578 is resolved entirely by the Layer 2 mechanisms. 580 3.3.2. VM Migration from One Site to Another 582 This section focuses on a scenario where a VM migrates from 583 the west site to the east site while maintaining its MAC and 584 IP addresses. 586 VM migration might affect networking elements based on their 587 respective location: 589 - Origin site (west site) 591 - Destination site (east site) 593 - Other sites 595 +-------+ +-------+ _ __ +-------+ +-------+ 596 |host A | | SARP | / \_/ \_ | SARP | |host A | 597 | IP-D |<===>|proxy 1|<=>\_ \<==>|proxy 2|<===>| IP-D | 598 | MAC-D | | MAC-W | / _/ | MAC-E | | MAC-D | 599 +-------+ +-------+ \__ _/ +-------+ +-------+ 600 \_/ 601 <------west site------> <------east site------> 602 Origin site Destination site 603 Figure 6 SARP: host A migrates from west site to east site 605 Origin site 607 The Origin site is the site where the VM resides before the 608 migration (west site). 610 Before the VM (IP=IP-D, MAC=MAC-D) is moved, all VMs at the 611 west site that have an ARP entry of IP-D in their ARP table 612 have the IP-D -> MAC-D mapping. VMs on other access segments 613 have an ARP entry of IP-D -> MAC-W mapping where MAC-W is the 614 MAC address of the SARP proxy on the west access segment. 616 After the VM (IP-D) in the west site moves to the east site, 617 if a Gratuitous ARP (IPv4) or an Unsolicited Neighbor 618 Advertisement (IPv6) is sent out by the destination hypervisor 619 on behalf of the VM (IP-D), then the IP<->MAC mapping cache of 620 the VMs in all access segments is updated by IP-D -> MAC-E 621 where MAC-E is the MAC address of the SARP proxy on the east 622 site. If no Gratuitous ARP or Unsolicited Neighbor 623 Advertisement is sent out by the destination hypervisor, the 624 IP<->MAC cache on the VMs in the west site (and other sites) 625 is eventually aged out. 
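The cache update described in the paragraph above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the SARP implementation: the class and function names are hypothetical, addresses are represented by the symbolic labels used in this document, and the east SARP proxy is assumed to rewrite the sender MAC address of the Gratuitous ARP in the same way it rewrites ARP replies (Section 3.1.3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GratuitousArp:
    """Minimal stand-in for a Gratuitous ARP / UNA (hypothetical type)."""
    sender_ip: str
    sender_mac: str


def forward_from_local_segment(msg, proxy_mac):
    """A SARP proxy replaces the host MAC address with its own MAC address
    before the message leaves the local access segment."""
    return GratuitousArp(msg.sender_ip, proxy_mac)


def apply_to_cache(cache, msg):
    """A host refreshes its IP<->MAC cache from the received message."""
    cache[msg.sender_ip] = msg.sender_mac


# State before the migration, as described above.
west_vm_cache = {"IP-D": "MAC-D"}      # VMs in the origin (west) segment
other_site_cache = {"IP-D": "MAC-W"}   # VMs in other access segments

# After the migration, the destination hypervisor at the east site sends a
# Gratuitous ARP on behalf of the VM; SARP proxy 2 (MAC-E) forwards it.
garp = GratuitousArp(sender_ip="IP-D", sender_mac="MAC-D")
seen_remotely = forward_from_local_segment(garp, proxy_mac="MAC-E")

for cache in (west_vm_cache, other_site_cache):
    apply_to_cache(cache, seen_remotely)

print(west_vm_cache)     # {'IP-D': 'MAC-E'}
print(other_site_cache)  # {'IP-D': 'MAC-E'}
```

If no Gratuitous ARP or UNA is sent, the two caches keep their old entries until they age out, which is the transient condition discussed next.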
627 Until the IP<->MAC mapping cache tables are updated, the 628 source VMs from the west site continue sending packets locally 629 to MAC-D, and switches at the west site are still configured 630 with the old location of MAC-D. This transient condition can 631 be resolved by having the VM manager send out a fake 632 Gratuitous ARP or Unsolicited Neighbor Advertisement on behalf 633 of the destination hypervisor. Another alternative is to 634 configure a shorter aging timer for the IP<->MAC cache table. 636 Destination Site 638 The destination site is the site to which the VM migrated, 639 i.e., the east site in Figure 6. 641 Before any Gratuitous ARP or Unsolicited Neighbor 642 Advertisement messages are sent out by the destination 643 hypervisor, all VMs at the east site (and all other sites) 644 might have an IP-D -> MAC-W mapping in their IP<->MAC mapping 645 cache. The IP<->MAC mapping cache is updated by aging or by a 646 Gratuitous ARP or UNA message sent by the destination 647 hypervisor. Until the IP<->MAC mapping caches are updated, VMs 648 from the east site continue to send packets to MAC-W. This can 649 be resolved by having the VM manager send out a fake 650 Gratuitous ARP/UNA immediately after the VM migration, or by 651 redirecting the packets from the SARP proxy of the east site 652 back to the migrated VM by updating the destination MAC of the 653 packets to MAC-D. 655 Other Sites 657 All VMs at the other sites that have an ARP entry of IP-D in 658 their ARP table have the IP-D -> MAC-W mapping. The ARP 659 mapping is updated by aging or by a Gratuitous ARP message 660 sent by the destination hypervisor of the migrated VM and 661 modified by the SARP proxy of the east site to an IP-D -> 662 MAC-E mapping. Until ARP tables are updated, VMs from other sites 663 continue sending packets to MAC-W. 665 3.3.2.1. 
Impact on IP<->MAC Mapping Cache Table of Migrated VMs 667 When a VM (IP-D) is moved from one site to another, its 668 IP<->MAC mapping entries for VMs located at other sites (i.e., 669 neither the east site nor the west site) are still valid, even 670 though most guest OSs (or VMs) will refresh their IP<->MAC 671 cache after migration. 673 The migrated VM's IP<->MAC mapping entries for VMs located at 674 the east site, if not refreshed after migration, can be kept 675 unchanged until they age out, since they are mapped 676 to MAC-E. All traffic originating from the migrated VM in its 677 new location to VMs located at the east site traverses the 678 SARP proxy of the east site, which can redirect the traffic 679 back to the corresponding destinations on the east site. 680 Furthermore, an ARP/UNA sent by the SARP proxy of the east 681 site or by the VMs on the east site can refresh the 682 corresponding entries in the migrated VM's IP<->MAC cache. 684 The migrated VM's ARP entries for VMs located at the west site 685 remain unchanged until either the ARP entries age out or new 686 data frames are received from the remote sites. Since all MAC 687 addresses of the VMs located at the west site are unknown at 688 the east site, all unknown traffic from the VM is intercepted 689 by the SARP proxy of the east site and forwarded to the SARP 690 proxy of the west site (during the transient period before the 691 ARP entries age out). This transient behavior can be avoided if 692 the SARP proxy has the destination IP address in its ARP 693 cache: upon receiving a packet with an unknown destination 694 MAC address, it can send a Gratuitous ARP/UNA to the migrated 695 VM. 697 Note that overlay networks providing Layer 2 network 698 virtualization services configure their edge device MAC aging 699 timers to be greater than the ARP request interval. 701 3.4. 
Multicast and Broadcast 703 Multicast and broadcast traffic is forwarded by SARP proxies 704 as follows: 706 o SARP proxies modify the source MAC address of multicast and 707 broadcast packets as described in Section 3.2. 709 o SARP proxies do not modify the destination MAC address of 710 multicast and broadcast packets. 712 3.5. Non IP packet 714 The L2/L3 boundary routers in the current document are capable 715 of forwarding non-IP IEEE 802.1 Ethernet frames (Layer 2) 716 without changing the MAC header. When subnets span across multiple 717 ports of those routers, they are still under the category of a 718 single link, or a multi-access link model recommended by 719 [RFC4903]. They differ from the "multi-link" subnets described 720 in [MultLinkSub] and [RFC4903], which refer to different 721 physical media with the same prefix connected to a router, 722 where the Layer 2 frames cannot be natively forwarded without 723 header change. 724 3.6. High availability and load balancing 726 The SARP proxy is located at the boundary where the local 727 Layer 2 infrastructure connects to the interconnecting 728 network. All traffic from the local site to the remote sites 729 traverses the SARP proxy. The SARP proxy is therefore subject to high 730 availability and bandwidth requirements. 732 The SARP architecture supports multiple SARP proxies 733 connecting a single site to the transport network. In the SARP 734 architecture all proxies can be active and can back up one 735 another. The SARP architecture is robust and allows network 736 administrators to allocate proxies according to bandwidth and 737 high availability requirements. 739 Traffic is segregated between SARP proxies by using VLANs. An 740 SARP proxy is the Master-SARP proxy of a set of VLANs and the 741 Backup-SARP proxy of another set of VLANs. 743 For example, assume the SARP proxies of the west site are SARP 744 proxy 1 and SARP proxy 2. 
The west site supports VLAN 1 and 745 VLAN 2 while SARP proxy 1 is the Master SARP proxy of VLAN 1 746 and the Backup proxy of VLAN 2 and SARP proxy 2 is the Master 747 SARP proxy of VLAN 2 and the Backup SARP proxy of VLAN 1. Both 748 proxies are members of VLAN 1 and VLAN 2. 750 The Master SARP proxy updates its Backup proxy with all the 751 ARP reply messages. The Backup SARP proxy maintains a backup 752 database to all the VLANs that it is the Backup SARP proxy of. 754 The Master and the Backup SARP proxies maintain a keepalive 755 mechanism. In case of a failure the Backup proxy becomes the 756 Master SARP proxy. The failure decision is per VLAN. When the 757 Master and the Backup proxies switch-over, the backup SARP 758 proxy can use the MAC address of the Master SARP proxy. The 759 backup SARP proxy sends locally a Gratuitous ARP message with 760 the MAC address of the Master SARP proxy to update the 761 forwarding tables on the local switches. The backup SARP proxy 762 also updates the remote SARP proxies on the change. 764 3.7. SARP Interaction with Overlay networks 766 SARP can be used over overlay networks, providing L2 network 767 virtualization (such as IP, VPLS, TRILL, OTV, NVGRE and 768 VXLAN). The mapping of SARP to overlay networks is 769 straightforward; the VM does the destination IP to SARP proxy 770 MAC mapping. The mapping of the proxy MAC to its correct 771 tunnel is done by the overlay networks. 773 SARP significantly scales down the complexity of the overlay 774 networks and transport networks by reducing the mapping tables 775 to the number of SARP proxies. 777 4. Security Considerations 779 SARP proxies are located at the boundaries of access networks, 780 where the local Layer 2 infrastructure connects to its Layer 2 781 cloud. SARP proxies interoperate with overlay network 782 protocols that extend the Layer 2 subnet across data centers 783 or between different systems within a data center. 

785     SARP does not expose the network to security threats beyond
786     those that exist in the absence of SARP.

788     SARP proxies may be exposed to Denial of Service (DoS) attacks
789     by means of ARP/ND message flooding. Thus, SARP proxies must
790     have sufficient resources to support the SARP control plane
791     without making the network more vulnerable to DoS than without
792     SARP proxies.

794     SARP adds security to the data plane in terms of network
795     reconnaissance, by hiding all the local Layer 2 MAC addresses
796     from potential attackers located in the interconnecting
797     network, and by significantly limiting the number of addresses
798     exposed to an attacker at a remote site.

800  5. IANA Considerations

802     There are no IANA actions required by this document.

804     RFC Editor: please delete this section before publication.

806  6. References

808  6.1. Normative References

810     [ARP]         Plummer, D., "An Ethernet Address Resolution
811                   Protocol", RFC 826, November 1982.

813     [ND]          Narten, T., Nordmark, E., Simpson, W., and H.
814                   Soliman, "Neighbor Discovery for IP version 6
815                   (IPv6)", RFC 4861, September 2007.

817     [ProxyARP]    Carl-Mitchell, S., Quarterman, J., "Using ARP to
818                   Implement Transparent Subnet Gateways",
819                   RFC 1027, October 1987.

821     [RFC4389]     Thaler, D., Talwar, M., Patel, C., "Neighbor
822                   Discovery Proxies (ND Proxy)", RFC 4389,
823                   April 2006.

825     [RFC925]      Postel, J., "Multi-LAN Address Resolution", RFC 925,
826                   October 1984.

828     [RFC4541]     Christensen, M., et al, "Considerations for
829                   Internet Group Management Protocol (IGMP) and
830                   Multicast Listener Discovery (MLD) Snooping
831                   Switches", RFC 4541, May 2006.

833     [RFC4664]     Andersson, L., et al, "Framework for Layer 2 Virtual
834                   Private Networks (L2VPNs)", RFC 4664, September 2006.

836     [RFC6575]     Shah, H., et al, "Address Resolution Protocol
837                   (ARP) Mediation for IP Interworking of Layer 2
838                   VPNs", RFC 6575, June 2012.

840  6.2. Informative References

842     [802.1Q]      IEEE, "IEEE Standard for Local and metropolitan
843                   area networks -- Bridges and Bridged Networks",
844                   IEEE Std 802.1Q, December 2014.

846     [RFC6820]     Narten, T., Karir, M., Foo, I., "Address
847                   Resolution Problems in Large Data Center
848                   Networks", RFC 6820, January 2013.

850     [ARMDStats]   Karir, M., Rees, J., "Address Resolution
851                   Statistics", draft-karir-armd-statistics-01
852                   (expired), July 2011.

854     [RFC7364]     Narten, T., Gray, E., Black, D., Fang, L.,
855                   Kreeger, L., Napierala, M., "Problem Statement:
856                   Overlays for Network Virtualization", RFC 7364,
857                   October 2014.

859     [RFC4903]     Thaler, D., "Multilink Subnet Issues", RFC 4903,
860                   June 2007.

862     [MultLinkSub] Thaler, D., Huitema, C., "Multi-link Subnet
863                   Support in IPv6", draft-ietf-ipv6-multi-link-
864                   subnets-00 (expired), June 2002.

866     [TcpIp]       Stevens, W., "TCP/IP Illustrated, Volume 1: The
867                   Protocols", Addison-Wesley, 1994.

869  7. Acknowledgments

871     The authors thank Ted Lemon, Eric Gray and Adrian Farrel for
872     providing valuable comments and suggestions on the draft.

874     This document was prepared using 2-Word-v2.0.template.dot.

876  Authors' Addresses

878     Youval Nachum
879     Email: youval.nachum@gmail.com

881     Linda Dunbar
882     Huawei Technologies
883     5430 Legacy Drive, Suite #175
884     Plano, TX 75024, USA
885     Phone: (469) 277 5840
886     Email: ldunbar@huawei.com
887     Ilan Yerushalmi
888     Marvell
889     6 Hamada St.
890     Yokneam, 20692 Israel
891     Email: yilan@marvell.com

893     Tal Mizrahi
894     Marvell
895     6 Hamada St.
896     Yokneam, 20692 Israel
897     Email: talmi@marvell.com