82nd IETF ARMD WG session minutes
---------------------------------
Location: 101B meeting room, TICC, Taipei, Taiwan
Time: 15-Nov-2011 17:10~18:30
Chairs: Linda Dunbar, Benson Schliesser

Slides posted at http://www.ietf.org/proceedings/82/slides/armd-0.pdf
Agenda posted at http://www.ietf.org/proceedings/82/agenda/armd.html

Outline of the Agenda:
* Welcome & Administrivia
* Data Center Reference Architectures (Manish Karir)
* Problem Statement for ARMD (Thomas Narten)
* ARMD Work Plan (Chairs)

---------------------------------
Manish Karir
Data Center Reference Architectures
draft-armd-datacenter-reference-arch-01
Slides posted at http://www.ietf.org/proceedings/82/slides/armd-1.pdf

Goals
* Common architecture for ARMD work with the same terms and the same picture. This will avoid talking past each other.
* The data center is a balance of problems. You have minimized your particular issues.
* Generalized data center has server racks, access layer, aggregation layer, and the core that interconnects. This is physical topology.
* Layer 2/Layer 3 boundary can vary greatly from one data center to the next:
  - Layer 3 to access switches
  - Layer 3 to aggregation switches
  - Layer 3 in core only
  - L2/L3 overlays

Factors affecting Data Center Design
* What is the purpose?
  - Large virtualized web farm
  - Large compute clusters
  - Multi-tenant data center

Impact on L2 protocols
* L2/L3 boundary is the critical pain point
* Crossing L3/L2 boundary involves ARP/ND processing
* Bi-directional processing

Problem of Generalizing
* Generalizing topology is not enough
* Need to generalize by workload and topology

Defining Different workloads
* Web farms might bring in 1500 messages/second
* Compute farm - 1% FMS

Conclusion
* ARMD develop and use a generic data center physical topology
* Avoid specific scenarios

Questions & Discussion:

Kireeti Kompella: You used the term generalized topology, and in another place you used generalized layer. In another place, typical topology.
Juniper is pushing toward a different architecture. The generalized three-layer architecture is one I feel OK with. The layer 3 boundary is at the VM. I know the problem is difficult, and you can go all the way to the VMs.

Manish: I think this is a very good point. The problem is that there is a whole range of issues. You have to hang your hat on something. If the generalized three-layer topology is something we agree upon.

Kireeti: If you want to get away from a specific topology, you need to show a few examples in the document.

Igor Gashinsky: I run a 6-layer data center. Conceptually, the three layers work for me. What happens at 4-6 doesn't add anything to the network. If you want to push the three-layer topology to the VMs, I've talked about this with the chairs. [missed something here].

Manish: The problem is where to draw the line.

Joel Jaeggli: One thing I'd add to the mailing list. The problem statement makes recommendations.

Manish: This is accurate. There is a disconnect between what is in the draft and what got written down.

Warren Kumari: It would be nice to have a document on these designs. ARMD seems to be an odd space to be documenting data centers.

Manish: It would be better if there was documentation on the design.

Igor: Your topology should fit into this. Your three-layer topology is sufficient to map to this.

Manish: There are documents with reference designs.

Igor: I do not think there are reference documents, and networks are not exactly designed as shown in the CCIE training documents.

Benson: I think what we need is a taxonomy.

Joel Jaeggli: Just to emphasize, if I am constraining my neighbor discovery and ARP to handle. It may have more problems near the edge.

Benson: As long as we agree what is the middle, or center.

Igor: There are different ways to look at it. We can say the middle is the cloud. It is still like the three layers.

Kireeti: You might do a two-layer topology of the edge and the core. The cloud is made of the access layer and the cloud.
Manish: The question is whether this is the general class or the

Kireeti: The edge guy is transport and the core is ____.

Igor: I hate to disagree with you. T&2, does not match to the data center. The OLTP search does not handle the two tiers or the two layers of functionality.

David Allan: I do not get the idea of two layers and three layers. Trying to place a boundary around this discussion doesn't make sense.

Manish: We have to agree about some boundaries around scale.

Igor: On the slide, you were talking about different sizes. Small is 50K VMs. My data center has 2.5M VMs. I can see 10M VMs. I run a large data center.

Manish: We should hash this out on the mailing list. We have some shared understanding of those layers.

---------------------------------
Thomas Narten
ARMD problem statement
draft-ietf-armd-problem-statement-01
Slides posted at http://www.ietf.org/proceedings/82/slides/armd-2.pdf

I expanded the scope from the last problem. We are not just L2, but we need to deal with routers and switches. I added two simple "representative" data center designs. I've got more details on ARP and ARP issues. Igor provided more details.

Open issue: Neighbor Discovery is still TBD.

Next steps: we haven't gotten much detail. We need to get the things that need to be added.

Questions & Discussion:

Dino Farinacci: The problem is intentionally leaving non-IP traffic out of it.

Thomas: It is not out of scope on purpose.

Benson: Are there issues for non-IP traffic that ARMD or IETF should take a look at?

---------------------------------
Benson Schliesser
ARMD Work Plan
Slides posted at http://www.ietf.org/proceedings/82/slides/armd-3.pdf

The Work Plan is a straw man.

Milestones Due
* May 2011 - Problem statement
* November 2011 - ARP/ND statistics collection and behavior analysis
* November 2011 - Survey of Existing Implementations
* November 2011 - Survey of Security

Milestones Upcoming
* March 2012 - Recommendations
* March 2012 - Gap Analysis

Existing documents
* ARMD problem statement
* Address Resolution Statistics
* DC reference architecture

The reference architecture is not yet a working group document.

Work Plan
* Finish existing documents
* Develop Recommendations
* Examine Gaps
* Recharter or Shutdown

Milestone Candidates
* May 2011 - Problem statement
  draft-ietf-problem-statement-00
  Suggest that draft-armd-datacenter-reference-architecture be merged. People nodded.
* November 2011 - ARP/ND statistics and behavior
  draft-karir-armd-statistics-01
  Is this enough, does it work for ARMD?

Igor: It is a lot of good work, but the large data center work differs.

Linda: Are you willing to provide the data?

Igor: I have no problem giving you details on how to gather the data.

Benson: I do not know how to make progress here.

Thomas: The statistics document is tricky because we need real data. The problem is that we are not going to get real data. The current document is a simulation.

Benson: Can we answer the questions regarding our problem statement without the data?

Igor: We might be better off using the vendor documentation. The ARMD stats have default values, and the tweaking values. I can provide a lot of the data on how to run the tests. I cannot provide the answers due to my NDAs.

Manish: I do not think the statistics document stands on its own. I do not think it has enough meat to stand as a separate document.

Thomas: My recollection is that we were here before. We decided that we would focus on pain points so that we can side-step the need for detailed discussions.

Xxx: We should go back to what Kireeti and others said. What I can do with different types of processors and going four.
Can we say if a particular box with this type of processor can deal with VMs or messages? How many messages we had with the progress? Should we talk about the factors based on OS? How can we characterize it with messages per second? If we say the ToR that has to deal with 100 messages

Benson: Can we come up with these numbers?

Dinesh Dutt: [missed]

Igor: In large data centers, OS change per minute.

Dinesh: He did a good job of characterizing the factors. There are so many factors.

Benson: We should write down these parameters.

Warren: It would be good if the ARP statistics could be described by real DCs, and the simulations approximate them.

Ron: If the milestone is unreasonable, you can just drop it. You should put the relevant information into the problem statement.

Benson: You are saying that the problem statement.

Kireeti: I want to hear the statement that worries me that you publish what you have. If the publish what you

Igor: I want to expand on why a 6500 running ARP running SS7 gives you an entirely different value than SS6. My iPhone has a better processor than some switch/router vendors. We want to state that the pain points occur at common points.

Warren: We do not need to see that we have a problem. I am seeing 500 ARPs, 1000 ARPs.

Igor: Are we talking about the amount of traffic on the network or what is on the node?

Warren: Let us start with your full traffic.

Dino: I think it would be useful to separate protocol/architecture versus implementation. It is important to know so we can fix the bugs.

Kireeti: You have architecture, implementation, and something between them. Your placement of the L2/L3 boundary keeps determining the performance. Sometimes it is the choice of architecture.

Benson: We had a great set of should/shouldn't for the document. If there is something that fails to be in the problem statement, then please state it.

Benson: After we finish the problem statement, we have a BCP on "how to scale".
If anyone thinks we should develop a recommendation on timers, you should post it. Likewise, you should comment on recommendations on where to place the highly scalable. The obvious and scalable solution is to not use ARP.

Dino: Let's not forget IGMP/MLD2.

Benson: Do you count these in the problem statement?

Dino: Is only unicast a problem?

Benson: We have to make a case for these being in scope.

Dino: We are talking about unicast/multicast.

Igor: Are any large massive data centers running multicast?

Warren: It is reasonable, but not in the charter.

Benson: I looked in the archive and Thomas stated "gee, do we need a working group that concludes: we need to deploy proxy ARP". Is there a gap? I also looked at the gaps in our gap analysis. We are documenting ARP/ND proxy.

Ron: About proxy ARP, RFC 1027 was written in 1987. This RFC may not be part of ARMD.

Benson: I started to make a list of documents that describe proxy ARP (RFC 4389, RFC 925, RFC 1027), and other drafts.

Benson: I knew that I should have said that this is a partial list.

Kireeti: It is interesting and it gets me to think (which is a dangerous idea). If you look at VMs and they are terminated on a router, is this a problem? This is not always the case. There are people who segment into small domains, and there are people who don't.

Benson: [missed].

Joel Jaeggli: The management of adjacencies is also a scaling issue for L3 topologies.

Benson: This is true for L3, but not in our scope.

Benson: We did not talk about what an ARP proxy is, and what the inter-proxy behavior is like.

Benson: We are not chartered to talk about overlay or TRILL or ARP proxy. If anyone thinks the overlay changes the story, we should make a case.

Igor: The overlay absolutely changes the problem.

Warren: Most of us are generalizing.

Igor: Quite a few people running optical use the scale of the

Warren: The ARP scaling and broadcast scaling are problems.

Kireeti: There are two people now?

Warren: There is still one.

David Black: The [missed]

Luyuan Fang: The Layer 3 solution has an effect. We are going to have a VPN4DC discussion in the L3VPN WG session.

Benson: The problem statement was a straw man. It's done its job to get people to talk. Kireeti, we'll bug you to talk. Requirements for overlay address resolution. Are these relevant?

Igor: These are very relevant.

Benson: The intent of this slide is to get people moving. We want to get through our documents. I'd like to see things on the mailing list. If we do see feedback, we'll alter things. If we do not see feedback, we will move the documents forward soon.

Kireeti: What does security mean? Security of address resolution in a data center?

Benson: These are from the charter. The simple way is: how secure is ARP in a data center.

Thomas: As secure as it always was.

Thomas: I think a survey of existing implementations is useful to do. You need to tease out the detail a bit more. Windows has used the same thing from IPv6 to IPv4 (??). We should be able to write now.

Ron: If no one talks about the survey of security, I would not be heartbroken to drop it from the charter.

Benson: Last chance to make a comment? None? Adjourned.

---------------------------------