< draft-shah-armd-arp-reduction-01.txt   draft-shah-armd-arp-reduction-02.txt >
Working Group: ARMD Himanshu Shah Working Group: ARMD Himanshu Shah
Intended Status: Proposed Standard Ciena Corp Intended Status: Informational Ciena Corp
Internet Draft Internet Draft
Anoop Ghanwani Anoop Ghanwani
Expiration Date: May, 2011 Brocade Expiration Date: April 27, 2012 Brocade
Nabil Bitar Nabil Bitar
Verizon Verizon
October 25, 2010 October 28, 2011
ARP Broadcast Reduction for Large Data Centers ARP Broadcast Reduction for Large Data Centers
draft-shah-armd-arp-reduction-01.txt draft-shah-armd-arp-reduction-02.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at line 37 skipping to change at line 37
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on May 25, 2011 This Internet-Draft will expire on April 27, 2012
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
Internet Draft draft-shah-arp-reduction-01.txt Internet Draft draft-shah-arp-reduction-02.txt
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License. warranty as described in the Simplified BSD License.
Abstract Abstract
With the emergence of server virtualization technologies, a host is With the advent of server virtualization technologies, a host is able to
able to support multiple Virtual Machines (VMs) in a single physical support multiple Virtual Machines (VMs) in a single physical
machine. Data centers can leverage these capabilities to instantiate machine. Data Centers can leverage these capabilities to instantiate
on the order of 10s to 100s of VMs in a server. Each VM operates as on the order of 10s to 100s of VMs in a single server with current
an independent IP host with a set of Virtual Network Interface Cards technology. It is conceivable that this number can be much higher
(vNICs), each having its own MAC address and mapping to a physical in the future. Each VM operates as an independent IP host with a set
Ethernet interface. These physical servers are typically installed of Virtual Network Interface Cards (vNICs), each having its own MAC
in a rack with their Ethernet interfaces connected to a top-of-rack address and mapping to a physical Ethernet interface. These physical
(ToR) switch. The ToR switches are interconnected through End-of- servers are typically installed in a rack with their Ethernet
the-Row (EoR) or aggregation switches which are in turn connected to interfaces connected to a top-of-the-rack (ToR) switch. The ToR
core switches. switches are interconnected through End-of-the-Row (EoR) or
aggregation switches which are in turn connected to core switches.
As discussed in [ARP-Problem] the host VMs use ARP broadcasts to As discussed in [ARP-Problem] the host VMs use ARP broadcasts to
find other host VMs and use periodic (broadcast) Gratuitous ARPs to find other host VMs and use periodic (broadcast) Gratuitous ARPs to
refresh their IP to MAC address binding in other VM hosts. Such refresh their IP to MAC address binding in other VM hosts. Such
broadcasts in a large data center with potentially thousands of VM broadcasts in a large data center with potentially thousands of VM
hosts in a Layer 2 based topology can overwhelm the network. hosts in a Layer 2 based topology can overwhelm the network.
This memo proposes mechanisms to reduce the number of broadcasts This memo proposes mechanisms to reduce the number of broadcasts
that are sent throughout the network. This is done by having the ToR that are sent throughout the network. This is done by having the
switches intelligently process ARP packets, rather than simply ToRs intelligently process ARP frames, rather than simply
broadcasting them throughout the broadcast domain. broadcasting them throughout the broadcast domain.
While this document specifically addresses ARP, the Neighbor While this document addresses ARP, the Neighbor Discovery mechanisms
Discovery mechanisms used by IPv6 hosts that make use of multicast used by the IPv6 hosts that make use of multicast rather than
rather than broadcast also pose similar issues for the data center. broadcast also pose similar issues in the Data Center. The solutions
The solutions defined herein should be equally applicable to hosts defined herein should be equally applicable to hosts running IPv6.
running IPv6. The details will be specified in a subsequent The details will be specified in a subsequent revision.
revision.
Conventions Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119]. document are to be interpreted as described in RFC 2119 [RFC 2119].
Table of Contents Table of Contents
Copyright Notice .................................................... 1 Copyright Notice ........................................... 1
Abstract.............................................................. 2 Abstract .................................................... 2
1.0 Overview.......................................................... 3 1.0 Overview ................................................ 3
1.1 Terminology ..................................................... 5 1.1 Terminology ............................................ 5
2.0 Configuration..................................................... 6 2.0 Configuration ........................................... 6
3.0 Building the ARP Tables........................................... 6 3.0 Building the ARP tables ................................. 6
3.1 ARP Request ..................................................... 6 3.1 ARP Requests ........................................... 6
3.2 ARP Reply ....................................................... 7 3.2 ARP Reply .............................................. 7
3.3 Gratuitous ARP .................................................. 7 3.3 Gratuitous ARP ......................................... 7
3.4 Uplink Versus Downlink Processing ............................... 8 3.4 Host movement .......................................... 8
3.5 Host Mobility ................................................... 8 4.0 Conclusion .............................................. 9
4.0 Concluding Remarks................................................ 9 5.0 Security Considerations ................................. 10
5.0 Security Considerations ......................................... 10 6.0 Acknowledgments ......................................... 10
6.0 Acknowledgments ................................................. 10 7.0 References .............................................. 10
7.0 References....................................................... 10 7.1 Normative References.................................... 10
7.1 Normative References ........................................... 10 7.2 Informative References ................................. 10
7.2 Informative References ......................................... 10 8.0 Author's Address ........................................ 11
8.0 Author's Address................................................. 10
1.0 Overview 1.0 Overview
The traditional topology in a data center consists of racks of The traditional topology in a data center consists of racks of
servers connected to top-of-rack (ToR) switches, which connect to servers connected to top-of-rack (ToR) switches, which connect to
aggregation switches, which in turn connect to core switches. The aggregation switches, which in turn connect to core switches. The
network architecture is typically a combination Layer 2 and Layer 3 network architecture typically combines Layer 2 and Layer 3. In
functionality. In some architectures, Layer 2 is terminated at the some architectures, Layer 2 is terminated at the ToR, with Layer 3
ToR, with Layer 3 being run in the aggregation and core devices. In being run in the aggregation and core devices. In other
other architectures, Layer 2 may be extended all the way to the architectures, Layer 2 may be extended all the way to the
aggregation switch. The primary concerns that have influenced aggregation switch. The primary concerns that have influenced
network architectures in the data center have been keeping broadcast network architectures in the data center have been keeping broadcast
domains manageable and the spanning tree diameter contained. domains manageable and spanning tree domains contained.
Moving forward, these traditional network architectures are being Moving forward, these traditional network architectures are being
challenged due to emerging technologies such as server challenged due to emerging technologies such as server
virtualization. virtualization.
The effect of server virtualization in the data center brings some The effect of server virtualization in the data center brings some
challenges. Because of virtualization, the number of hosts seen by challenges. Because of virtualization, the number of hosts that the
the network increases dramatically - 10 to 100 times the number of network sees increases dramatically - 10 to 100 times the number of
physical servers. These virtual hosts are referred to as Virtual physical servers. These virtual hosts are referred to as Virtual
machines (VMs). In addition, virtualized environments offer a machines (VMs). VMs offer server mobility wherein a VM can be
feature referred to as "VM mobility" wherein a VM can be relocated relocated to run on a different physical server. In order for the
to run on a different physical server. In order for the VM mobility mobility to be non-disruptive to other hosts that have communication
to be non-disruptive to other hosts that have communication in in progress with the VM being moved, the VM must retain its MAC
progress with the VM being moved, the VM must retain its MAC address address and IP address. Because of the requirement to retain the
and IP address. Because of the requirement to retain the MAC and IP MAC and IP address, it is desirable to develop network architectures
address, it is desirable to develop network architectures that would that would offer the least restrictions in terms of server mobility.
offer the least restrictions in terms of VM mobility.
As an example, in a network architecture where ToR switches As an example, in a network architecture where ToR switches
terminate the L2 domain, the range of VM mobility would be terminate the L2 domain, the range of mobility would be restricted
restricted to a single ToR switch. It would be more preferable to to a single ToR switch. It would be more preferable to allow the
allow the flexibility of moving the VM anywhere within the data flexibility of moving the VM anywhere within the data center, or
center, or perhaps even a different data center. perhaps even a different data center.
Technologies such as TRILL [TRILL] overcome some of the issues of Technologies such as TRILL [TRILL] overcome some of the issues of
spanning trees that forced traditional Layer 2 topologies to be spanning trees because of which traditional Layer 2 topologies have
severely constrained. However, because of virtualization there are been constrained. However, because of virtualization there are 2
2 specific problems that are introduced with respect to broadcast specific problems that are introduced with respect to broadcast
traffic. traffic.
1. A larger number of hosts. A single physical server now hosts 1. A larger number of hosts. A single physical server now hosts
multiple VMs taking the scale factor to a different level. If multiple virtual machines taking the scale factor to a
each VM issues the same number of broadcasts as a physical different level. If each VM has the same number of broadcasts
server, the amount of broadcast traffic will increased 10 to as a physical server, the amount of broadcast traffic has
greater than 100 times. increased 10 to greater than 100 times.
2. If the Layer 2 domains are extended to go across data centers, 2. If the Layer 2 domains are extended to go across data centers,
then broadcast traffic will now go across the backbone. If then broadcast traffic will now go across the backbone. If
Layer 2 was terminated at the ToR switch, the increase in Layer 2 was terminated at the ToR switch, the increase in
broadcast traffic would have been restricted to a single ToR broadcast traffic would have been restricted to a single ToR
switch, but as discussed earlier, this restriction is not switch, but as discussed earlier, this restriction is not
desirable. desirable.
Excessive broadcast traffic in Layer 2 networks results in wastage The broadcast as such in Layer 2 networks has far reaching impacts;
of network bandwidth, as well as in the wastage of CPU resources due i.e. wastage in network bandwidth as well as CPU resources used by
to all of the VMs processing superfluous ARP broadcasts (IPv6 gets all the VMs while processing superfluous ARP broadcasts (IPv6 gets
rid of the latter by running ND as a multicast service rather than a rid of the latter by running ND as a multicast service rather than a
broadcast service). broadcast service).
The solution presented here attempts to minimize the negative The solution presented here attempts to minimize negative effects of
effects of ARP broadcast packets. The solution requires the first ARP broadcasts. The solution requires the first hop Ethernet
hop Ethernet switches, typically the ToR switch, to maintain an ARP switches, typically ToR, to maintain an ARP table learned from the
table that is learned from the ARP packets received by the switch. ARP PDUs received by the switch and selectively propagates the ARP
The switch then selectively propagates the ARP packet to, or proxy- to, or proxy-responds on behalf of, the remote peer. These types of
responds on behalf of, the remote peer. These types of ARP ARP processing principles are well known and used/described in L2VPN
processing principles are well-known and are described in L2VPN Working Group documents such as [ARP-Mediation] and [IPLS]. The ARP
Working Group documents such as [ARP-Mediation] and [IPLS]. proxy response differs from that described in [RFC1027] as the ARP
response contains MAC address of the destination and not that of the
switch as is suggested in [RFC 1027].
The following sections describe the details of ARP snooping, the The following sections describe the details of ARP snooping,
learning and maintenance of ARP tables, the use of learned learning and maintaining ARP tables, using the learned information
information to limit broadcast propagation, and proxy (the response) to limit broadcast propagation and proxy (the response) on behalf of
on behalf of the remote peers. the remote peers.
1.1 Terminology 1.1 Terminology
ToR Top-of-Rack. An Ethernet switch present on top ToR switch Top-of-Rack switch. An Ethernet switch installed
of a rack which provides network connectivity to at the top of a rack of servers which provides
the servers present on the rack. network connectivity to those servers.
Downlink The Ethernet link between the ToR switch and a Downlink The Ethernet link between the ToR switch and a
directly connected host (server in the rack). directly connected host/server in the rack.
Uplink The network- facing Ethernet connection in the Uplink The network-facing Ethernet connection in the
ToR switch. Typically, the uplinks from ToRs ToR switch. Typically, the uplinks from ToRs
connect to end-of-row or aggregation switches. connect to end-of-row or aggregation switches.
EoR End-of-Row. An Ethernet switch to which the EoR switch End-of-Row switch. An Ethernet switch which
ToR switches connect, also referred to as an aggregates traffic from multiple racks. Also
aggregation switch. Uplinks from ToR switches commonly referred to as an aggregation switch.
connect to an EoR switch and uplinks from EoR Uplinks from the ToR connect to EoR switches
switches connect to a core switch. and uplinks from EoR switches in turn connect
to core switches.
Host/Server A host or server running the IP protocol. This Host/Server A host or server running the IP protocol. This
could be a physical entity or a logical entity could be a physical entity or a logical entity
(such as a Virtual Machine) in a physical host. (such as a Virtual Machine) in a physical host.
The term server refers to its role in the data The term server refers to its role in the data
center. Both terms are used interchangeably to center. Both terms are used interchangeably
refer to an IP host. and refer to an IP end station.
Local hosts Used in the context of a ToR switch to denote Local hosts Used in the context of a ToR switch to denote
the VM hosts connected to a ToR on the the VM hosts connected to a ToR switch on the
downlink, i.e. directly attached hosts. downlink, i.e. directly connected hosts.
Remote hosts Used in the context of a ToR switch to denote Remote hosts Used in the context of a ToR switch to denote
the hosts that are accessible through uplink of the hosts that are accessible via the uplink of
the ToR. the ToR switch.
VM Virtual Machine. This is a logical instance of VM Virtual Machine. This is a logical instance of
a host that operates independently in a a host that operates independently in a
physical host and has its own IP and MAC physical host and has its own IP and MAC
addresses. VMs allow efficient use of physical addresses. The VM architecture allows efficient
host resources (such as multiple CPU cores). use of physical host resources (such as
multiple CPU cores).
2.0 Configuration 2.0 Configuration
It is assumed that ARP reduction mechanisms that are defined in this It is assumed that ARP reduction methodologies that are defined in
document will be limited to ToR switches. The maximum benefit of this document will be limited to ToR switches. The maximum benefit
restraining ARP broadcasts in the network is achieved by the first of restraining ARP broadcasts in the network is achieved by the
hop switches (the ones directly connected to the hosts) without first hop switches (the ones directly connected to the hosts)
placing additional burden on second or third tier switches. without placing additional burden on second or third tier switches.
First, the ToR switches would need to be configured in order to First, the ToR switches would need to be configured in order to
enable the ARP reduction feature. Every Ethernet interface needs to enable the ARP reduction feature. Every Ethernet interface needs to
be identified as either a downlink or uplink within the context of be identified as either a downlink or uplink within the context of
this feature. this feature. The ARP reduction feature treats ARP frames received
from downlink or uplink differently as described in the following
sections.
In addition the operator may optionally configure various ARP In addition the operator may optionally configure various ARP
reduction related parameters such as: reduction related parameters such as:
. ARP aging timer. . ARP aging timer,
. Size of the ARP table. . size of the ARP table,
. Static entries of IP to MAC address. . static entries of IP to MAC address, etc.
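The configuration items above could be grouped as in the following minimal sketch. All names (ArpReductionConfig, PortRole, unassigned_ports) and the default values are hypothetical and chosen only for illustration; the draft does not define a configuration model or any default values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class PortRole(Enum):
    DOWNLINK = "downlink"  # faces a directly connected host/server
    UPLINK = "uplink"      # faces the EoR or aggregation switch


@dataclass
class ArpReductionConfig:
    enabled: bool = False
    # Every Ethernet interface must be tagged as downlink or uplink.
    port_roles: Dict[str, PortRole] = field(default_factory=dict)
    # Optional parameters; the defaults below are assumptions, not
    # values taken from the draft.
    aging_seconds: int = 1200
    max_entries: int = 16384
    static_entries: Dict[str, str] = field(default_factory=dict)  # IP -> MAC

    def unassigned_ports(self, ports: List[str]) -> List[str]:
        """Return the ports that have not been given a role yet."""
        return [p for p in ports if p not in self.port_roles]
```

A switch would refuse to enable the feature while unassigned_ports() returns a non-empty list, since every interface needs a role.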
3.0 Building the ARP Tables 3.0 Building the ARP tables
When ARP reduction is enabled, the ToR switch will monitor all ARP When ARP reduction is enabled, the ToR switch will monitor all ARP
traffic transiting the switch (regardless of uplink port or downlink traffic transiting the switch (regardless of uplink port or downlink
port) and will process any ARP packets in the following manner: port) and will process any ARP PDUs in the following manner:
. ARP Request packets must be redirected to control plane CPU. . ARP Request PDUs must be redirected to control plane CPU.
. Gratuitous ARP packets (ARP Reply packet with a broadcast MAC . Gratuitous ARP PDUs (ARP Reply PDU with a broadcast MAC DA)
DA) must be redirected to control plane CPU. must be redirected to control plane CPU.
. Other ARP Reply packets (ARP Reply packet with a unicast MAC . Other ARP Reply PDUs (ARP Reply PDU with a unicast MAC DA)
DA) should be bi-casted; one copy sent to control plane CPU and should be bi-casted; one copy sent to control plane CPU and
other copy forwarded out normally. other copy forwarded out normally.
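The three dispatch rules above can be sketched as a small classifier. This is an illustrative sketch only: the Action names and the classify_arp function are hypothetical, and the opcode values (1 for request, 2 for reply) are the standard ARP opcodes rather than anything stated in this draft.

```python
from enum import Enum

ARP_REQUEST = 1  # standard ARP opcode for a request
ARP_REPLY = 2    # standard ARP opcode for a reply
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


class Action(Enum):
    REDIRECT_TO_CPU = "redirect"  # only the control plane sees the frame
    BICAST = "bicast"             # one copy to CPU, one forwarded normally
    FORWARD = "forward"           # not handled by ARP reduction


def classify_arp(opcode: int, mac_da: str) -> Action:
    """Map an ARP frame to the handling described in Section 3.0."""
    if opcode == ARP_REQUEST:
        return Action.REDIRECT_TO_CPU
    if opcode == ARP_REPLY:
        # A reply sent to the broadcast MAC DA is a Gratuitous ARP.
        if mac_da.lower() == BROADCAST_MAC:
            return Action.REDIRECT_TO_CPU
        return Action.BICAST
    # Other opcodes are outside the scope of the draft.
    return Action.FORWARD
```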
3.1 ARP Request 3.1 ARP Requests
The ToR examines the source IP and the source hardware address (MAC The ToR examines the source IP and the source hardware address (MAC
address) in the ARP Request. The source IP and MAC address address) in the ARP Request. The source IP and MAC address
association is learned, or is updated/refreshed if already learned. association is learned, or is updated/refreshed if already learned.
The destination IP address is searched in the ARP table. If an entry The destination IP address is searched in the ARP table. If an entry
exists, the associated MAC address from the table is used to prepare exists, the associated MAC address from the table is used to prepare
a unicast ARP Reply packet. The same MAC address is used as the a unicast ARP Reply PDU. The same MAC address is used as the source
source MAC address in the MAC header, as well as for the target MAC address in the MAC header, as well as for the target hardware
hardware address, in the unicast ARP Reply packet. address, in the unicast ARP Reply PDU.
If the destination IP address in the ARP Request is not present in If the destination IP address in the request is not present in the
the ARP table, then the original ARP Request packet is broadcast to ARP table, then the original ARP request PDU is broadcast to all the
all the switch ports that are members of the same VLAN except the switch ports that are members of the same VLAN except the source port
source port that the ARP Request was received from. However, if the that the Request was received from. However, if the requested
requested (destination) IP address is present in the ARP table, a (destination) IP address is present in the ARP table, a unicast ARP
unicast ARP Reply packet is prepared as described above and sent to Reply PDU is prepared as described above and sent to the switch port
the switch port from which the ARP Request was received and original from which the ARP Request was received and original ARP request PDU
ARP Request packet is dropped. is dropped.
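The request handling just described (learn the sender, then either answer from the table or flood within the VLAN) might look like the sketch below. The function name, the dictionary-based table, and the reply layout are assumptions made for illustration. In particular, the "resolved_mac" key stands in for the hardware address carried in the ARP Reply body, without committing to a specific field encoding.

```python
from typing import Dict, List, Optional, Tuple

Reply = Dict[str, str]


def handle_arp_request(
    table: Dict[str, str],   # IP -> MAC, the ToR switch's ARP table
    in_port: str,
    vlan_ports: List[str],   # all ports that are members of the VLAN
    sender_ip: str,
    sender_mac: str,
    target_ip: str,
) -> Tuple[str, List[str], Optional[Reply]]:
    """Return (action, output ports, proxy reply or None)."""
    # Look up first, so a request cannot be answered from the entry
    # that the request itself is about to create.
    known_mac = table.get(target_ip)

    # Learn, or update/refresh, the sender's IP-to-MAC association.
    table[sender_ip] = sender_mac

    if known_mac is not None:
        # Proxy-respond: the looked-up MAC is used both as the source
        # MAC in the MAC header and as the resolved hardware address.
        reply = {
            "eth_src": known_mac,
            "eth_dst": sender_mac,
            "resolved_ip": target_ip,
            "resolved_mac": known_mac,
        }
        return ("proxy_reply", [in_port], reply)

    # Unknown target: broadcast the original request to every VLAN
    # member port except the one it arrived on.
    return ("flood", [p for p in vlan_ports if p != in_port], None)
```

The caller would drop the original request in the "proxy_reply" case and transmit it unchanged in the "flood" case.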
The intent is to prevent propagation of ARP Request broadcasts as The intent is to prevent propagation of ARP Request PDU broadcasts
much as possible using the information present in the ARP table. The as much as possible using the information present in the ARP table.
following observations can be made from such behavior. The following observations can be made from such behavior.
. Most of the ARP Request packets from the local hosts of a ToR . Most of the ARP requests from the local hosts of a ToR switch
switch for the local hosts of that ToR switch can be prevented for the local hosts of the ToR switch can be prevented.
from being broadcast on uplinks or downlinks. . Most of the ARP requests from the remote hosts of a ToR switch
. Most of the ARP Request packets from remote hosts of a ToR for the local hosts of the ToR switch can be prevented from
switch for local hosts of that ToR switch can be prevented getting forwarded on downlinks or other uplinks of the ToR
from being broadcast on downlinks or other uplinks of the ToR
switch. switch.
. Many of the ARP Request packets from local hosts of a ToR . Many of the ARP requests from the local hosts of a ToR switch
switch for remote hosts of that ToR switch can be prevented for the remote hosts of the ToR switch can be prevented from
from being forwarded on uplinks if the remote host IP to MAC being forwarded on uplinks if the remote host IP to MAC
association is known to the ToR switch. association is known to the ToR switch.
3.2 ARP Reply 3.2 ARP Reply
The unicast ARP Reply is examined to learn/update the ARP table for The unicast ARP Reply is examined to learn/update the ARP table for
source and destination IP/MAC address association, but is also source and destination IP/MAC address association, but is also
forwarded out as a normal frame. forwarded out as a normal frame.
3.3 Gratuitous ARP 3.3 Gratuitous ARP
Gratuitous ARP is a broadcast ARP Reply packet with the destination Gratuitous ARP is a broadcast ARP Reply PDU with destination IP
IP address set to the IP address of the sender and target hardware address set to the IP address of the sender and target hardware
address set to the MAC address of the sender. It is typically used address set to the MAC address of the sender. It is typically used
by IP hosts (including VMs) to keep their IP-to-MAC address by the IP hosts (including VMs) to keep their association fresh in
association fresh in their peers' ARP caches. their peers' ARP caches.
The ToR switch should process Gratuitous ARP in the following The ToR switch should process Gratuitous ARP in the following
manner. manner.
. Learn/update/refresh the ARP table entry. . Learn/update/refresh the ARP table entry.
. If the IP address is new, or exists but with a different . If the IP address is new, or exists but with a different
hardware address, then the Gratuitous ARP packet is forwarded hardware address, then the Gratuitous ARP PDU is forwarded
out; otherwise the packet is discarded. out; otherwise the PDU is discarded.
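The two Gratuitous ARP rules above reduce to a single comparison against the existing table entry. The following is a minimal sketch with a hypothetical function name and a plain dictionary as the ARP table; a refresh of any aging timer is assumed to happen as part of the table update and is not modeled.

```python
from typing import Dict


def handle_gratuitous_arp(
    table: Dict[str, str], sender_ip: str, sender_mac: str
) -> bool:
    """Update the table; return True if the frame should be forwarded."""
    previous_mac = table.get(sender_ip)

    # Learn/update/refresh the ARP table entry in every case.
    table[sender_ip] = sender_mac

    # Forward only when the IP is new or its hardware address changed;
    # a pure refresh of a known association is discarded.
    return previous_mac != sender_mac
```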
The goal for handling of Gratuitous ARP packets received from the The goal for handling of the Gratuitous ARP PDU received from the
downlinks (i.e. local hosts) is to avoid propagating it into the downlinks (i.e. local hosts) is to avoid propagating it into the
'network' (i.e. to the uplinks), unless there is a new association. 'network' (i.e. to uplinks), unless there is a new association.
By suppressing the propagation of Gratuitous ARP packets, the peer By suppressing the propagation of Gratuitous ARP PDUs, the peer IP
IP hosts will end up aging out the corresponding ARP table entries. hosts will end up aging out the corresponding ARP table entries.
This will result in generation of the broadcast ARP Requests by This will result in generation of the broadcast ARP Requests by
those IP hosts if they need to continue to communicate with the IP those IP hosts if they need to continue to communicate with the IP
host whose Gratuitous ARPs were obstructed. The handling of the ARP host whose Gratuitous ARPs were obstructed. The handling of the ARP
Request by the first-hop ToR switch, as described above, will be Request, as described above, by the first hop ToR switch will be
able to respond to this request based on the ARP cache maintained in able to respond to this request based on the ARP cache maintained in
the ToR switch. In essence, the presence of large ARP tables with the ToR switch. In essence, presence of large ARP tables with longer
longer aging times compensates for the smaller ARP table present in age out times compensates for the smaller ARP table present in the
the IP hosts and eliminates the need for periodic use of Gratuitous IP hosts and eliminates the need for periodic use of Gratuitous ARPs
ARPs in order to refresh the ARP table in the IP hosts. in order to refresh the ARP table in the IP hosts.
3.4 Uplink Versus Downlink Processing
With respect to processing of the ARP packets as described above,
the behavior is different depending on whether the packet was
received from an uplink or downlink in the following ways.
. The aging timer will typically be higher for entries learned
from an uplink versus those learned from a downlink. The
reason for this is to avoid flooding ARP broadcast packets on
uplinks since they have a much larger negative impact.
. If ARP table fills up, then entries learned from downlinks
(i.e. directly attached hosts) will take precedence over those
learned from an uplink (i.e. remote hosts). This will trade
off sending broadcasts on host links versus sending them into
the core of the network. The reason for this is that access
links are typically lower bandwidth, and also this will
conserve CPU resources involved in processing unnecessary ARP
traffic.
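The two uplink/downlink rules above (longer aging for uplink-learned entries, and precedence for downlink-learned entries when the table is full) can be sketched as follows. The class and method names, the timer values, and the choice of evicting the entry closest to expiry are all assumptions; the draft states only the relative preference, not a concrete eviction algorithm.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    mac: str
    from_uplink: bool
    expires_at: float


class ArpTable:
    def __init__(self, max_entries: int,
                 downlink_age: float = 300.0,   # hypothetical value
                 uplink_age: float = 1200.0):   # hypothetical, longer
        self.max_entries = max_entries
        self.downlink_age = downlink_age
        self.uplink_age = uplink_age
        self.entries: Dict[str, Entry] = {}

    def learn(self, ip: str, mac: str, from_uplink: bool, now: float) -> bool:
        """Insert or refresh an entry; return False if it was rejected."""
        if ip not in self.entries and len(self.entries) >= self.max_entries:
            if not self._evict(from_uplink):
                return False
        age = self.uplink_age if from_uplink else self.downlink_age
        self.entries[ip] = Entry(mac, from_uplink, now + age)
        return True

    def _evict(self, incoming_from_uplink: bool) -> bool:
        # Uplink-learned (remote) entries are evicted first, so that
        # downlink-learned (local) entries take precedence.
        uplink_ips = [ip for ip, e in self.entries.items() if e.from_uplink]
        if uplink_ips:
            pool = uplink_ips
        elif not incoming_from_uplink and self.entries:
            pool = list(self.entries)
        else:
            # Never displace a local entry to admit a remote one.
            return False
        victim = min(pool, key=lambda ip: self.entries[ip].expires_at)
        del self.entries[victim]
        return True

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None or entry.expires_at <= now:
            self.entries.pop(ip, None)  # drop an aged-out entry
            return None
        return entry.mac
```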
3.5 Host Mobility 3.4 Host movement
As mentioned earlier, server virtualization technology allows As mentioned earlier, server virtualization technology allows
mobility of VMs to different physical servers. The flexibility to movement of VMs to different physical servers. The flexibility to
move VMs is one of the key benefits of server virtualization. VM move VMs is one of the key benefits of server virtualization. The
mobility could be manual (operator initiated) or may be done VM movement could be manual (operator initiated) or may be done
automatically in reaction to demands placed by the application automatically in reaction to demands placed by the application
users. The important point is that in either case, VM movement is users. The important point is that in either case, VM movement is
not transparent and is made known to the network. not transparent and is made known to the network.
There is ongoing work in IEEE 802.1 standards organization (IEEE There is ongoing work in IEEE 802.1 standards organization (IEEE
802.1Qbg) to coordinate/communicate the presence and capabilities of 802.1Qbg) to coordinate/communicate the presence and capabilities of
the VMs to the directly connected network switch. the VMs to the directly connected network switch.
VMs typically retain their MAC and IP address across a VM mobility VMs typically retain their MAC and IP address, and as such, there
event, and as such, there would be little impact to the ARP table would be little impact to the ARP table maintained by the ARP
maintained by the ARP reduction mechanism described herein. reduction mechanism described herein. However, the ARP reduction
However, the ARP reduction mechanism would benefit from knowing if a mechanism would benefit from knowing if a VM is completely
VM is completely decommissioned so that the ToR switch can remove decommissioned so that the ToR can removed the ARP entry it has for
the ARP entry that it has for that VM in a timely fashion, rather that VM in a timely fashion, rather than waiting for it to timeout.
than waiting for it to age out.
3.5 Applicability to environments with overlay transport
Recently, there have been multiple proposals for using overlay
transport technologies such as VXLAN [VXLAN] and NVGRE [NVGRE].
These proposals allow the network operator to build the network
using L2 or L3 technologies while building an L2-overlay on top of
that. As such, while they address the issue of network design, they
do not eliminate the need for a mechanism to reduce the amount of
broadcast traffic that may have to traverse the core, if there are
VMs of the same tenant on servers attached to different ToR
switches.
One of the ways for the overlay transport proposals to address this
issue would be to implement the mechanism discussed in this document
at the point where the overlay encapsulation and decapsulation is
performed (i.e. in the virtual switch).
3.6 Scaling Considerations 3.6 Scaling Considerations
Depending on the number of hosts in the network, the ARP table in a Depending on the number of hosts in the networks, the ARP table can
ToR switch needed for the ARP reduction mechanisms described above be quite large. Although it is possible to implement some of the
can be quite large. Although it is possible to implement some of the mechanisms for ARP reduction as described in this document in
mechanisms for ARP reduction in hardware in the forwarding plane, hardware in the forwarding plane, the number of ARP entries may
favor maintaining the ARP table in the control plane memory.
the number of ARP entries favors maintaining the ARP table in the
control plane memory.
3.7 Miscellaneous Issues 3.7 Miscellaneous Issues
Because of the distributed nature of the mechanisms described Because of the distributed nature of the mechanisms described
herein, there are a few additional issues that warrant consideration herein, there are a few additional issues that warrant consideration
from the network operator. from the network operator.
Earlier in the document, we had mentioned the configuration of a Earlier in the document, we had mentioned the configuration of a
aging timer for ARP entries. A longer timer for holding onto ARP timer for ARP entries. A longer timer for holding on to ARP entries
entries helps with reduction of broadcasts. However, having a "too helps with reduction of broadcasts. However, the risk of having a
large timer" can lead to problems in certain situations. "too large timer" can cause problems in certain situations.
Consider the following scenario. Host A is attached to ToR switch Consider the following scenario. Host A is attached to ToR switch
#1, and host B is attached to ToR switch #2. If host B issues an #1, and host B is attached to ToR switch #2. If host B issues an
ARP Request for host A, and if the entry is available at switch #2, ARP request for host A, if the entry is available at switch #2, then
then switch #2 would send the ARP Reply on behalf of host A. It is switch #2 would send the ARP Reply on behalf of host A. It is
possible that host A is no longer available, but there is no way for possible that host A is no longer available, but there is no way for
switch #2 to know this, and it would continue to respond on behalf switch #2 to know this, and it would continue to respond on behalf
of host A, until its entry for host A has aged out. In this case, of host A, until its entry for host A has timed out. In this case,
it is easy to see that a smaller aging timer would be beneficial. it is easy to see that a smaller timer would be beneficial.
Additionally, since host B has an ARP aging timer, it means that Additionally, since host B has an ARP age timer, it means that host
host B would find out about host A's unavailability only after its B would find out about host A's unavailability only after its entry
entry has aged out, which would be some time after it the entry has has aged, which would be after it has aged out of switch #2.
aged out of switch #2.
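The stale-reply scenario above can be modeled with a short sketch. It is
a simplified illustration under stated assumptions, not the draft's
specified behavior: the function names, the timer values, and the
addresses are hypothetical, and the bound assumes that host B last
queries switch #2 just before the switch's entry for host A ages out.

```python
def proxy_reply(cache, target_ip, now):
    """Return the cached MAC for target_ip, or None once the entry has aged out.

    cache maps an IP address to a (mac, expires_at) tuple.
    """
    entry = cache.get(target_ip)
    if entry is None:
        return None
    mac, expires_at = entry
    if now >= expires_at:
        del cache[target_ip]
        return None
    return mac


def worst_case_stale_seconds(switch_aging, host_aging):
    """Upper bound on how long host B may keep using a stale entry for host A.

    Measured from the last refresh of the switch's entry: the switch keeps
    answering for up to switch_aging, and an answer given just before that
    point stays in host B's own ARP table for a further host_aging.
    """
    return switch_aging + host_aging


# Hypothetical timers (seconds) and addresses, for illustration only.
SWITCH_AGING = 600.0
HOST_AGING = 60.0
HOST_A_IP = "10.0.0.1"
HOST_A_MAC = "00:00:5e:00:53:01"

# Switch #2 learned host A at time 0; host A may disappear at any later time.
switch2_cache = {HOST_A_IP: (HOST_A_MAC, 0.0 + SWITCH_AGING)}
```

With these values, switch #2 keeps answering for host A until time 600
even if host A left much earlier, which is why a smaller aging timer
shortens the window of stale replies.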
Another issue that can be somewhat problematic could be the Another issue that can be somewhat problematic could be the
inconsistency of tables in switches. Once again, consider a inconsistency of tables in switches. Once again, consider a
scenario similar to the one described above with two hosts each scenario similar to the one described above with 2 hosts each
connected to its respect ToR switch. Let the ARP entries at both A connected to its respect ToR switch. Let the ARP entries at both A
and B be learned by both switches. Now assume that the IP address and B be learned by both switches. Now assume that the IP address
on host A changes. This change is signaled to switch #1 which in on host A changes. This change is signaled to switch #1 which in
turn broadcasts the message on its uplink. Now, if this message is turn broadcasts the message on its uplink. Now, if this message is
discarded due to network congestion or signal integrity issues, then discarded due to network congestion or signal integrity issues, then
switch #2 will not learn about the change and will continue to switch #2 will not learn about the change and will continue to
respond to host B's ARP Requests for host A's old IP address with respond to host B's ARP Requests for host A's old IP address with
stale information. This lasts until the ARP entry for A ages out at stale information. This lasts until the ARP entry for A times out
Switch #2. at Switch #2.
4.0 Concluding Remarks 4.0 Conclusion
Based on the procedures described in this document, it is possible Based on the procedures described in this document, it is possible
for ToR switches in the data center to contain ARP broadcasts for ToR switches in the data center to contain ARP broadcasts
significantly. The solution is based on well known, non-intrusive significantly. The solution is based on well known, non-intrusive
procedures and strives to curtail ARP broadcasts that are procedures and strives to curtail broadcasts that are increasingly
increasingly becoming a cause for concern in the data centers. In becoming a cause for concern in the data centers. In essence, ToR
essence, ToR switches offload some of the ARP table management from switches facilitate the offloading of the extended ARP table
the IP hosts to themselves. The ARP table aging timer can be tuned management from the IP hosts to itself. The ARP table timeout can be
higher by the operator based on the available switch resources and tuned higher by the operator based on the available switch resources
network traffic behavior. The larger capacity of the ARP table and network traffic behavior. The larger capacity of the ARP table
directly translates to more effective subduing of the ARP
broadcasts.
coupled with a long aging time for entries in the table directly
translates to more effective subduing of the ARP broadcasts.
5.0 Security Considerations 5.0 Security Considerations
Security aspects will be addressed in a subsequent revision. The details of the security aspects will be addressed in future
revision.
6.0 Acknowledgments 6.0 Acknowledgments
This document resulted from discussions with Linda Dunbar (Huawei), This document resulted from discussions with Linda Durbar (Huawei),
Sue Hares (Huawei), and T Sridhar (Force10). We would like to Sue Hares (Huawei), and T Sridhar (VMware). We would like to
acknowledge their contribution to this work. acknowledge their contribution to this work.
7.0 References 7.0 References
7.1 Normative References 7.1 Normative References
[ARP] D. Plummer, "An Ethernet Address Resolution Protocol: Or [ARP] D. Plummer, "An Ethernet Address Resolution Protocol: Or
Converting Network Protocol Addresses to 48.bit Ethernet Converting Network Protocol Addresses to 48.bit Ethernet
Addresses for Transmission on Ethernet Hardware, " RFC 826 (also Addresses for Transmission on Ethernet Hardware," RFC 826, STD
STD 37), November 1982. 37.
[ARP-Problem] L.Dunbar et al., "Scalable Address Resolution for [ARP-Problem] T. Narten, "Problem Statement for ARMD,"
Large Data Center Problem Statements," <draft-dunbar-arp-for- work in progress, <draft-ietf-armd-problem-statement>.
large-dc-problem-statement-00>, July 2010.
7.2 Informative References 7.2 Informative References
[ARP-Mediation] H. Shah et al., "ARP Mediation for IP interworking [ARP-Mediation] H. Shah et al., "ARP Mediation for IP interworking
in Layer 2 VPN," <draft-ietf-l2vpn-arp-mediation-14>, July 2010. in Layer 2 VPN," work in progress, <draft-ietf-l2vpn-arp-
mediation>.
[IPLS] H.Shah et al., "IP-only LAN service," [IPLS] H.Shah et al., "IP-only LAN service," work in progress,
<draft-ietf-l2vpn-ipls-09>, February 2010. <draft-ietf-l2vpn-ipls>.
[PROXY-ARP] J. Postel, "Multi-LAN Address Resolution," RFC 925, [PROXY-ARP] J. Postel, "Multi-LAN Address Resolution," RFC 925.
October 1984.
[TRILL] R. Perlman et al., "RBridges: Base Protocol Specification", [RFC1027] Smoot et al., "Using ARP to Implement Transparent Subnet
<draft-ietf-trill-rbridge-protocol-16>, March 2010. Gateways".
[VXLAN] M. Mahalingam et al., "VXLAN: A Framework for Overlaying
Virtualized Layer 2 Networks over Layer 3 Networks"," work in
progress, <draft-mahalingam-dutt-dcops-vxlan>.
[NVGRE] M. Sridharan et al., " NVGRE: Network Virtualization using
Generic Routing Encapsulation", work in progress, <draft-
sridharan-virtualization-nvgre>.
8.0 Author's Address 8.0 Author's Address
Himanshu Shah Himanshu Shah
Ciena Corp Ciena Corp
Email: hshah@ciena.com Email: hshah@ciena.com
Anoop Ghanwani Anoop Ghanwani
Brocade Brocade
Email: anoop@brocade.com Email: anoop@alumni.duke.edu
Nabil Bitar Nabil Bitar
Verizon Verizon
Email: nabil.n.bitar@verizon.com Email: nabil.n.bitar@verizon.com
 End of changes. 79 change blocks. 
248 lines changed or deleted 252 lines changed or added
