idnits 2.17.1

draft-karir-armd-statistics-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice
     from 12 Sep 2009 rather than the newer Notice from 28 Dec 2009.
     (See https://trustee.ietf.org/license-info/)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------

  No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------

  ** The document seems to lack separate sections for
     Informative/Normative References.  All references will be assumed
     normative when checking for downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line
     does not match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have
     RFC 2119 boilerplate text.

  -- The document date (July 10, 2011) is 4674 days in the past.  Is
     this intentional?

  Checking references for intended status: None
  ----------------------------------------------------------------------

  == Unused Reference: 'ARP' is defined on line 498, but no explicit
     reference was found in the text

  == Unused Reference: 'ND' is defined on line 501, but no explicit
     reference was found in the text

  Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

  Run idnits with the --verbose option for more detailed information
  about the items above.

------------------------------------------------------------------------

ARMD BOF                                                       M. Karir
Internet Draft                                                  J. Rees
Intended status: Informational Track                Merit Network Inc.
Expires: January 2012                                     July 10, 2011

                     Address Resolution Statistics
                  draft-karir-armd-statistics-01.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 10, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Abstract

   As large scale data centers continue to grow, with an ever-
   increasing number of virtual and physical servers, there is a need
   to re-evaluate performance at the network edge.  Performance is
   often critical for large scale data center applications, and it is
   important to minimize any unnecessary latency or load in order to
   streamline the operation of services at such large scales.  To
   extract maximum performance from these applications it is important
   to optimize and tune all the layers in the data center stack.  One
   critical area that requires particular attention is the link-layer
   address resolution protocol that maps an IP address to the specific
   hardware address at the edge of the network.

   The goal of this document is to characterize this problem space in
   detail in order to better understand the scale of the problem, as
   well as to identify particular scenarios where address resolution
   might have a greater adverse impact on performance.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119.

Table of Contents

   1. Introduction
   2. Terminology
   3. Factors That Might Impact ARP/ND Performance
      3.1. Number of Hosts
      3.2. Traffic Patterns
      3.3. Network Events
      3.4. Address Resolution Implementations
      3.5. Layer 2 Network Topology
   4. Experiments and Measurements
      4.1. Experiment Architecture
      4.2. Impact of Number of Hosts
      4.3. Impact of Traffic Patterns
      4.4. Impact of Network Events
      4.5. Implementation Issues
      4.6. Experiment Limitations
   5. Scaling Up: Emulating Address Resolution Behavior on Larger
      Scales
   6. Conclusion and Recommendation
   7. Manageability Considerations
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgments
   11. References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

   Data centers are a key part of delivering Internet scale
   applications.  Performance at such large scales is critical, as
   even a few milliseconds or microseconds of additional latency can
   result in loss of customer traffic.
   Data center design and network architecture are a key part of the
   overall service delivery plan.  This includes not only determining
   the scale of physical and virtual servers but also optimizations to
   the entire data center stack, including in particular the layer 3
   and layer 2 architectures.

   One aspect of data center design that has received close attention
   is link-layer address resolution protocols such as the Address
   Resolution Protocol (ARP - IPv4) and Neighbor Discovery (ND -
   IPv6).  The goal of these protocols is to map the IP address of a
   destination node to the hardware address of the network interface
   for that node.  This address resolution occurs at the edge of the
   network.  In general, both ARP and ND are query/response protocols.

   In order to maximize performance it is important to understand the
   behavior of these protocols at large scales.  In particular, we
   need to understand what the performance implications of these
   protocols might be in terms of the number of additional messages
   that they generate, as well as the resulting load on devices on the
   network that must then process these messages.

2. Terminology

   ARP: Address Resolution Protocol

   ND: Neighbor Discovery

   ToR: Top of Rack Switch

   VM: Virtual Machine

3. Factors That Might Impact ARP/ND Performance

3.1. Number of Hosts

   Every host on the network that attempts to send or receive traffic
   will produce some base level of ARP/ND traffic.  The overall amount
   of ARP/ND traffic on the network will therefore vary with the
   number of hosts.  In the case of ARP, all address resolution
   request messages are broadcast and will be received and processed
   by all nodes on the network.  In the case of ND, address resolution
   messages are sent via multicast and therefore may have a lower
   overall impact on the network, even though the number of messages
   exchanged is the same.

3.2. Traffic Patterns

   The traffic pattern can have a significant impact on the level of
   ARP/ND traffic in the network.  We would therefore expect the
   ARP/ND traffic pattern to vary significantly based on the data
   center design as well as the application mix.  The traffic mix
   determines how many other nodes a given node needs to communicate
   with and how frequently.  Both of these directly influence address
   discovery traffic on the network.

3.3. Network Events

   Several specific network events can have a significant impact on
   ARP/ND traffic.  One example of such an event is machine failure.
   If a host that is frequently accessed fails, it can result in much
   higher ARP/ND traffic as other hosts in the network continue to try
   to reach it by repeatedly sending out additional address resolution
   messages.  Another example is virtual machine migration.  If a VM
   is migrated to a system on a different switch, VLAN, or even a
   geographically different data center, it can cause a significant
   shift in overall traffic patterns as well as in ARP/ND traffic.
   Another particularly well-known network event that causes address
   resolution traffic spikes is a network scan.  In a network scan,
   one or more hosts internal or external to the edge network attempt
   to connect to a large number of internal hosts in a very short
   period of time.  This results in a sudden increase in the amount of
   address resolution traffic in the network.
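
   As a concrete illustration of why a scan translates directly into
   address resolution load, the short Python sketch below sweeps an
   address block the way a simple scanner would.  It is purely
   illustrative and is not one of the tools used in this study; the
   address block and port are placeholders.  Every on-link destination
   that is not already present in the sender's neighbor cache forces
   the sending host to issue a broadcast ARP request (or a multicast
   neighbor solicitation for IPv6) before the TCP SYN can even be
   transmitted.

      import ipaddress
      import socket

      # Illustrative scan: each new on-link target that is missing
      # from the local ARP/ND cache triggers an ARP request (IPv4) or
      # a neighbor solicitation (IPv6) before the SYN can leave the
      # host.
      SUBNET = ipaddress.ip_network("192.0.2.0/24")  # placeholder block
      PORT = 80                                      # placeholder port

      for addr in SUBNET.hosts():
          s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          s.settimeout(0.2)                  # keep the sweep fast
          try:
              s.connect((str(addr), PORT))   # forces address resolution
          except OSError:
              pass                           # closed port or no answer
          s.close()
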
3.4. Address Resolution Implementations

   As with any other protocol, the activity of address resolution
   protocols such as ARP/ND can vary significantly with specific
   implementations as well as with the default settings for various
   protocol parameters.  The ARP cache timeout is a common parameter
   that has a direct impact on the amount of address resolution
   traffic.  Older versions of Microsoft Windows used a default value
   of 2 minutes for this parameter; Windows Vista and Windows 2008
   implementations changed this to a random value between 15 seconds
   and 45 seconds.  This parameter defaults to 60 seconds for Linux
   and 20 minutes for FreeBSD.  The default value for Cisco routers
   and switches is 4 hours.  For ND, one relevant parameter is the
   prefix stale time, which determines when old entries can be aged
   out.  This value is 30 days for Cisco and 60 seconds for Linux.
   The overall address resolution traffic in a data center will
   therefore vary based on the mix of ARP/ND implementations that are
   present.

3.5. Layer 2 Network Topology

   The layer 2 network topology within a data center can also
   influence the impact of the various address resolution protocols.
   While ARP traffic is broadcast and must be processed by all nodes
   within the broadcast domain, a well designed layer 2 topology can
   limit the size of the broadcast domain and therefore the amount of
   address resolution traffic.  ND traffic, on the other hand, is
   multicast and might potentially increase the load on the directly
   connected layer 2 switch if the traffic pattern spans broadcast
   domains.

4. Experiments and Measurements

4.1. Experiment Architecture

   In an attempt to quantify address resolution issues in a data
   center environment we have run experiments in our own data center,
   which is used for production services.  We were able to leverage
   unused capacity for our experiments.  The data center topology is
   fairly simple.  A pair of redundant access switches passes traffic
   to and from the data center.  These switches connect to the
   top-of-rack switches, which in turn connect to blade switches in
   our Dell blade chassis.  The entire hardware platform is managed
   via VMware's vCloud Director.  In total we have access to 8 blades
   of resources on a single chassis, which is roughly 3 TB of disk,
   200 GB of RAM, and 100 GHz of CPU.  The network available to us is
   a /22 block of IPv4 space and a /64 of IPv6 address space in a flat
   topology.

   Using this resource pool we created a 500-node testbed based on
   CentOS 5.5.  We use custom command and control software that allows
   us to control these nodes for our experiments.  This allows us to
   issue commands to all nodes to start/stop services and traffic
   generation scripts.  We also use a custom traffic generator agent
   in order to generate both internal and external traffic via wget
   commands to various hosts.

   The command and control software uses UDP broadcast messages for
   communication so that no additional address resolution messages are
   generated that might affect our measurements.  Each of the 500
   nodes is given a list of other nodes that it must contact at the
   beginning of an experiment; this list is used to shape the traffic
   pattern for a given experiment.  In addition, each experiment
   determines the traffic rate by specifying the inter-communication
   delay between attempts to contact other nodes.  The shorter the
   delay, the more traffic is generated.  The nodes all run dual
   IPv4/IPv6 stacks.
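
   The fragment below is a minimal sketch of this kind of traffic
   generator agent; it is not the actual software used in our testbed,
   and the file name, URL, and delay value are illustrative.  Each
   node reads its assigned target list and then repeatedly fetches a
   page from each target, with the inter-communication delay setting
   the offered load.

      import time
      import urllib.request

      # Minimal stand-in for the traffic generator agent described
      # above.  "targets.txt" (illustrative name) holds one peer
      # address per line, handed to the node at the start of a run.
      with open("targets.txt") as f:
          targets = [line.strip() for line in f if line.strip()]

      DELAY = 5.0   # inter-communication delay in seconds

      while True:
          for host in targets:
              try:
                  # wget-style fetch; a new destination forces the
                  # kernel to perform ARP/ND resolution first
                  urllib.request.urlopen("http://%s/" % host,
                                         timeout=2).read()
              except OSError:
                  pass   # inactive targets simply never answer
              time.sleep(DELAY)
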
   A packet tap attached to a monitor port on the access switch allows
   us to monitor the arrival rate of ARP and ND requests and replies.
   We also monitor the CPU load on the access switch at two-second
   intervals via SNMP queries [STUDY].

   Figure 1 shows our experimental setup: external traffic enters via
   two redundant access switches (Data_Agg_1 and Data_Agg_2, Cisco
   Catalyst 4900M), with the packet tap attached to Data_Agg_1.  The
   access switches connect to four Cisco Catalyst 3130 blade switches
   in Dell Enclosure 1, which serve blades 1 through 8, and similarly
   to Enclosures 2 and 3.
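
   For reference, the kind of ARP/ND rate monitoring described above
   can be approximated with a short script on a Linux host attached to
   the tap port.  The sketch below is illustrative rather than our
   actual collection code: it counts ARP frames and ICMPv6 neighbor
   solicitations/advertisements in two-second bins from a raw packet
   socket.  The interface name is an assumption, and the switch CPU
   load would still be polled separately via SNMP.

      import socket
      import struct
      import time

      # Count ARP and ICMPv6 NS/NA frames per two-second interval.
      # Linux only (AF_PACKET), must run as root; "eth1" is an assumed
      # tap-facing interface.  Untagged Ethernet frames are assumed.
      ETH_P_ALL = 0x0003
      sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                           socket.htons(ETH_P_ALL))
      sock.bind(("eth1", 0))

      arp = nd = 0
      next_report = time.time() + 2.0
      while True:
          frame = sock.recv(65535)
          ethertype = struct.unpack("!H", frame[12:14])[0]
          if ethertype == 0x0806:                    # ARP
              arp += 1
          elif ethertype == 0x86DD and len(frame) >= 55:
              # IPv6: next header at offset 20, ICMPv6 type at 54
              if frame[20] == 58 and frame[54] in (135, 136):
                  nd += 1                            # NS or NA
          if time.time() >= next_report:
              print("ARP: %.1f pps   ND: %.1f pps" % (arp / 2.0,
                                                      nd / 2.0))
              arp = nd = 0
              next_report += 2.0
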
4.2. Impact of Number of Hosts

   One of the simplest experiments is to determine the overall
   baseline load that is generated on a given network segment when a
   varying number of hosts are active.  While the absolute numbers
   depend on a large number of factors, what we are interested in here
   is how the traffic scales as different numbers of hosts are brought
   online, with all other factors held constant.  Our experiment
   therefore simply changes the number of active hosts in our setup
   from one run to the next while we measure address resolution
   traffic on the network.  The number of hosts is increased from 100
   to 500 in steps of 100.  The results indicate that address
   resolution traffic scales in a linear fashion with the number of
   hosts in the network.  This linear scaling applies to both ARP and
   ND traffic, though the raw ARP traffic rate was considerably higher
   than the ND traffic rate.  For our parameters the rate varied from
   100 pps to 250 pps of ARP traffic and from 25 pps to 200 pps for ND
   traffic.  There is a clear spike in CPU load on the access switch
   at the beginning of each experiment, which can reach almost 40
   percent.  We were not able to discern any increase in this spike
   across experiments.

4.3. Impact of Traffic Patterns

   Traffic patterns can have a significant impact on the amount of
   address resolution traffic in the network.  In order to study this
   in detail we constructed two distinct experiments: the first simply
   increased the rate at which nodes were attempting to communicate
   with each other, while the second controlled the number of active
   versus inactive nodes in the traffic exchange matrix.

   The first experiment uses all 500 nodes and increases the traffic
   load for each run by reducing the wait time between communication
   events.  The wait time is reduced from 50 seconds to 1 second over
   a series of 6 runs by roughly halving the duration for each run.
   All other parameters remain the same across experiment runs.
   Therefore the only factor we are varying is the total number of
   nodes a single node will attempt to contact in a given interval of
   time.  Once again we observe a linear scaling in ARP traffic
   volumes, ranging from 200 pps for the slowest experiment to almost
   1800 pps for the most aggressive experiment.  The linear trend also
   holds for ND traffic, which increases from 50 pps to 1400 pps
   across the different runs.

   The goal of the second experiment is to determine the impact of
   active versus inactive hosts in the network.  An inactive host in
   this context means one for which an IP address has been assigned,
   but there is nothing at that address, so ARP requests and all other
   packets are ignored.  All 500 hosts are involved in traffic
   initiation.  The pool of targets for this traffic starts out being
   the same 500 hosts that are initiating.  In subsequent runs we vary
   the ratio of active to inactive target hosts, from 500/0 to 400/100
   in steps of 100.  This experiment showed roughly a 60% increase
   (220-360 pps) in traffic for the IPv4 (ARP) case and about an 80%
   increase (160-290 pps) for the IPv6 case.

   In a slight variation on the second experiment, all 500 nodes
   attempt to contact all other hosts plus an additional varying
   number of inactive hosts, in steps of 100 up to a maximum of 400.
   In this experiment we see a slight linear increase as the total
   number of nodes in the traffic matrix increases, for both ARP and
   ND.

   We ran these experiments for IPv4 only, IPv6 only, and simultaneous
   IPv4 and IPv6.  ARP and ND traffic seemed to be independent of each
   other.  That is, the ARP and ND traffic rates and switch CPU load
   depend on the presented traffic load, not on the presence of other
   traffic on the network.

   One final experiment attempted to determine the maximum additional
   load that ARP/ND traffic might impose in our setup.  For this
   purpose we configured our experiment to use all 500 nodes to
   communicate with all 500 other nodes, one at a time, as fast as
   possible.  We observed a peak ARP traffic rate of up to 4000 pps
   and a maximum CPU load of 65% on the access switch.

4.4. Impact of Network Events

   Network scanning is commonly understood to cause significant
   address resolution activity at the edge of the network.  Using our
   experimental setup we repeatedly scanned our network both from the
   outside and from within.  In each case we were able to generate ARP
   traffic spikes of up to 1400 pps and ND traffic spikes of 1000 pps.
   These are also accompanied by a corresponding spike in CPU load at
   the access switch.

   Node failures in a network can also significantly impact address
   resolution traffic.  This effect depends on the particular traffic
   pattern and the number of other hosts that are attempting to
   communicate with the failed node.  All of those nodes will
   repeatedly attempt to perform address resolution for the failed
   node, and this can lead to a significant increase in ARP/ND
   traffic.
   We are able to show this via a simple experiment that creates 400
   active nodes, all of which attempt to communicate with nodes in a
   separate group of 80 nodes.  For each experiment run we then shut
   down hosts in the target group of 80 nodes in batches of 10.  We
   are able to demonstrate that ARP traffic actually increases in this
   scenario, from an overall rate of 200 pps to 300 pps.

   Another network event that might result in significant changes in
   address resolution traffic is the migration of VMs in a data
   center.  We attempted to replicate this scenario in our somewhat
   limited environment by placing one of our 8 blades in maintenance
   mode, which forced all 36 VMs on that blade to migrate to other
   blades.  However, as our entire experimental infrastructure is
   located within a single rack, we did not notice any changes in ARP
   traffic during this event.

   Many hypervisors remove the problem of virtual machine migration by
   assigning a MAC address to each VM and letting a kernel switching
   module handle all address resolution, accepting and sending packets
   for all the MAC addresses of its virtual machines through a
   designated host interface.  In other words, the hypervisor responds
   to the appropriate traffic for the VMs it contains; it behaves as a
   router for the layer 2 traffic it is exposed to.

4.5. Implementation Issues

   Protocol implementations and default parameter values can also have
   a significant impact on the behavior of address resolution traffic
   in the network.  Parameters such as cache timeout values in
   particular determine when cached entries are removed or need to be
   refreshed to ensure they are not stale.  Though these parameters
   are unlikely to be modified, the variation in their defaults across
   different systems can impact ARP/ND traffic when different systems
   are present on a given network in varying numbers.  Our
   experimental setup did not explore this issue of mixed environments
   or the sensitivity of ARP/ND traffic to the various protocol
   parameters.

4.6. Experiment Limitations

   Our experimental environment, though fairly typical in its hardware
   and software aspects, probably represents only a very limited,
   small data center configuration.  It is difficult to thoroughly
   instrument very large environments, and even smaller experimental
   environments in a lab might not be very representative.  We believe
   our architecture is fairly representative and provides us with
   useful insights regarding the scale and trends of address
   resolution traffic in a data center.

   One very significant limitation that we came across in our
   experiments was the problem of using all 500 nodes in a high load
   scenario.  When all 500 nodes were active simultaneously our
   architecture would run into a bottleneck while accessing disk
   storage.  This limitation prevents us from attempting to scale our
   experiments beyond 500 nodes, and it also limited which experiments
   we could run at the maximum possible load.

   Our experimental testbed shared infrastructure, including the
   network access switches, with production equipment.  This limited
   our ability to stress the network to failure, and our ability to
   try changes in switch configuration.

5. Scaling Up: Emulating Address Resolution Behavior on Larger Scales

   Based on the data collected from our experiments we have built an
   ARP/ND traffic emulator that has the ability to generate varying
   amounts of address resolution traffic on a network with varying
   address ranges.  This gives us the ability to scale beyond the 500
   VM nodes in our experiments.  Our software emulator can be used to
   directly test the impact of such traffic on nodes and switches in
   the network at much larger scales.

   Preliminary results show a good match between the testbed and the
   emulator for both traffic rates and switch load over a wide range
   of presented traffic load.  We have calibrated the emulator from
   the testbed data and will use the emulator to run experiments at
   scales that would otherwise be impractical in the real network
   available to us.
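
   To make the approach concrete, the sketch below shows a much
   simplified emulator of this kind; it is not the tool described
   above, only an illustration under stated assumptions.  It builds
   raw ARP who-has requests for addresses drawn from a configurable
   range and sends them at a configurable rate.  The interface name,
   MAC and IP addresses, target range, and rate are all placeholders.

      import socket
      import struct
      import time

      # Illustrative ARP load generator (Linux AF_PACKET, run as root).
      IFACE = "eth1"                           # assumed interface
      SRC_MAC = bytes.fromhex("020000000001")  # locally administered
      SRC_IP = socket.inet_aton("192.0.2.10")  # assumed sender address
      TARGETS = ["192.0.2.%d" % i for i in range(1, 255)]
      RATE = 500                               # ARP requests per second

      sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
      sock.bind((IFACE, 0))

      def arp_request(target_ip):
          # Ethernet header: broadcast destination, ethertype 0x0806
          eth = b"\xff" * 6 + SRC_MAC + struct.pack("!H", 0x0806)
          # ARP who-has: htype=1, ptype=IPv4, hlen=6, plen=4, op=1
          pkt = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
          pkt += SRC_MAC + SRC_IP
          pkt += b"\x00" * 6 + socket.inet_aton(target_ip)
          return eth + pkt

      while True:
          for ip in TARGETS:
              sock.send(arp_request(ip))
              time.sleep(1.0 / RATE)

   An equivalent ND emulator would instead send multicast ICMPv6
   neighbor solicitations to the solicited-node multicast addresses
   derived from the target range.
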
6. Conclusion and Recommendation

   In this document we have described some of our experiments in
   determining the actual amount of address resolution traffic on the
   network under a variety of conditions for a simple, small data
   center topology.  We are able to show that ARP/ND traffic scales
   linearly with the number of hosts in the network as well as with
   the traffic interconnection matrix.  In addition we also study the
   impact of network events such as scanning, machine failure, and VM
   migration on address resolution traffic.  We were able to show that
   even in a small data center with only 8 blades and 500 virtual
   hosts, ARP/ND traffic can reach rates of thousands of packets per
   second, and switch CPU loads can reach 65% or more.

   We are able to use the data from our experiments to build a
   software-based ARP/ND traffic emulation engine that has the ability
   to generate address resolution traffic at even larger scales.  The
   goal of this emulation engine is to allow us to study the impact of
   this traffic on the network for large data centers.

7. Manageability Considerations

   This document does not add additional manageability
   considerations.

8. Security Considerations

   This document has no additional requirements for security.

9. IANA Considerations

   None.

10. Acknowledgments

   We want to acknowledge the following people for their valuable
   discussions related to this draft: Igor Gashinsky, Kyle Creyts, and
   Warren Kumari.

   This document was prepared using 2-Word-v2.0.template.dot.

11. References

   [ARP]    Plummer, D., "An Ethernet Address Resolution Protocol",
            RFC 826, November 1982.

   [ND]     Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
            "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
            September 2007.

   [STUDY]  Rees, J. and Karir, M., "ARP Traffic Study", NANOG52,
            June 2011.  URL:
            http://www.nanog.org/meetings/nanog52/presentations/Tuesday/Karir-4-ARP-Study-Merit Network.pdf

Authors' Addresses

   Manish Karir
   Merit Network Inc.
   1000 Oakbrook Dr, Suite 200
   Ann Arbor, MI 48104, USA
   Phone: 734-527-5750
   Email: mkarir@merit.edu

   Jim Rees
   Merit Network Inc.
   100 Oakbrook Dr, Suite 200
   Ann Arbor, MI 48104, USA
   Phone: 734-527-5751
   Email: rees@merit.edu

Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers
   or users of this specification can be obtained from the IETF
   on-line IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document.
   Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET
   SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE
   DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
   LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN
   WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.