ARMD Working Group - Meeting Minutes - IETF 80

Address Resolution for Massive numbers of hosts in the Data center (ARMD)

Location: Barcelona/Berlin meeting room, Hilton Prague, Prague, CZ
Time: 30-March-2011, 1510–1610 - Wednesday, Afternoon Session II
Chairs: Benson Schliesser (bschlies@cisco.com) & Linda Dunbar (linda.dunbar@huawei.com)

Agenda:
* Meeting Administrivia
* Discussion of Charter
* Discussion of Problem Statement
* Call for Investigation
* Problem Statement Drafts

Benson: Sorry for starting the meeting late. Welcome to ARMD, which is now a working group. Please read the Note Well. Let's start the charter discussion. As you all know, there were a couple of BOFs on cloud topics in general, and they seemed to be too broad. We are a WG now and have a charter, the summary of which is displayed in the first 3 bullets of the presentation. The first one has already become controversial on the mailing list: we are assuming that this will be a massive layer 2 effort. We are not talking about why.
Rahul Agarwal: I do not agree with the interpretation of the charter that you have on the slides. The way I read the charter, one of the issues people have with the way they are building data centers today is data center mobility. One of the issues with the current charter is that it assumes VM mobility requires a massive L2 network. We need to really drill into what VM mobility means and what the reasons are for building large layer 2 networks. For example, the VMs might be in the same administrative domain, and the administrative domain is in the same L2 network or in the same VLAN. That kind of architecture is probably hard to change, but if there are other reasons we need to see if functions can be handed over to layer 3. So the charter and requirements of the WG need to change to investigate where layer 2 hands off to layer 3, and the charter should also include a definition of mobility.
Benson: I do not disagree with the technical questions you are asking, as whether you build it at layer 2 or layer 3 affects the scale. But the question we have been chartered to address is: is there an ARP/ND problem to resolve, what are the dimensions of the problem, and at what stage does the problem start? Only if there is a problem do discussions about whether layer 3 mitigates it become relevant.
Rahul: I am not necessarily saying that ARP is not a problem. All I am asking is how big a problem it is, and the answer is tied to how you partition your network between layer 2 and layer 3. Getting your layer 2 and layer 3 interaction right can be one way of scaling your network. By limiting the discussion to layer 2 only we are making a mistake.
Jari Arkko: I am not the responsible A.D. but I was involved in the discussions, and the charter does say that you need to document the scaling characteristics of ARP/ND and document the operational characteristics that can reduce those issues. So it's not to find the smoking gun but to determine if there is a problem or not. The interaction of layer 2 and layer 3 will come out in the operational practices part.
Benson: We have to define what the problem is, and that is not up for debate; in order to manage the working group, that is all we will focus on at this first meeting. I am not saying that these discussions are not relevant. All I am saying is that they are not relevant to the current discussion.
Igor: Defining the issue to be VM mobility is too restrictive. Address mobility and resolution in a VM migration environment is the real issue, and we should not limit ourselves to just ARP/ND. If the investigation reveals that ARP/ND are the culprits, then that is fine. The problem should be generalized to Address Resolution. There is a lot of research happening in this area that we can utilize.
Benson: We are not a research group. We are in the OPS area and are limited to things that are being deployed today.
As you pointed out, there is a lot of work going on in this area, but that is out of scope for now.
Linda: I agree with Igor that general Address Resolution issues should be studied by ARMD.
Thomas Narten: People are trying to do something here in terms of scaling the data center. We need to understand what they are saying and what the pain is, and that should be the driving force here. I worry that by following the charter very closely and looking only at ARP/ND, we are looking at a symptom of the problem without understanding the problem.
Benson: Personally I share your concern. But we have to narrow down the scope.
Thomas: But I hear someone say cloud and someone say data center. We need to make sure what we are talking about.
Linda: The charter says data center.
Dave McDysan: I also have a different reading of the charter. If we can find an alternative to large layer 2 networks, that is better than investing in understanding how to make large layer 2 networks work.
XXXX: What is the acceptable solution space in this WG?
Benson: We are not looking at the solution space. We are looking at the problem: is there an operational problem, and what are the dimensions of that problem? If there are gaps, then we look at the solutions that are there and maybe recharter.
XXXX: Is proposing new solutions using existing technology in scope?
Benson: No.
Jari: Working on solutions, whether they are part of a new protocol or an existing protocol, is out of scope for now. We are trying to understand what the problem is.
Igor: You just said that we are looking for ways to solve the address resolution problem, but the slides say ARP/ND. Which one is it? Are you looking at just ARP/ND? It should be Address Resolution issues in the Data Center.
Benson: Anything that is deployed in a meaningful way.
Igor: What about 100K static entries?
Benson: Sure. Please write a draft on how it scales.
Erik Nordmark: We do not want a sales pitch.
However, if someone has experience doing address resolution using something other than ARP/ND, say chicken wire, then it is useful for them to write down their experience.
XXXX: Why?
Erik Nordmark: What is more interesting right now is data and not a particular approach.
Benson: Don't describe the solution, and don't describe other reasons why you are interested in the solution. Describe the impact of your particular approach.
Vince: I frankly suggest that people look at the charter page and look at the goals and milestones. All the comments and recommendations can be directed to individual working groups or towards the deliverables of the working group.
Benson: There will be references to outside work that is out of scope. The chairs will create a bucket draft to hold these documents.
Benson: Let's talk about the problem statement. The context is massive layer 2. There are all kinds of reasons we might have massive layer 2. We will just assume it is massive. We need to define the word massive. Does it mean it spans large distances in one data center, or different data centers? As a working group we need to achieve consensus on what massive means. Then we talk about what the ARP/ND problems are. Does ARP/ND cause too much bandwidth to be used, or too much host processing? Our first milestone is to define this. Until now we have had discussion on the context, but that does not answer the question.
Jari: There are different kinds of dimensions along which we can scale, for example number of hosts or number of routers. We should not define what massive means. Define the scaling characteristics of ARP/ND. The idea would be: if you keep adding hosts, when will you run into issues? Maybe some kind of an equation that computes how much traffic gets generated as the network is expanded, etc.
Benson: That is what I meant to convey also.
Benson: We are proposing a new working group draft. Current drafts have good content and we may use them. We are looking for a new editor.
Anyone interested, please contact us.
Ning So: I will volunteer for the editing job.
Benson: There are operators in the room. We need real data from them if we are going to be realistic and successful. If you have data that you can share, please bring it to the mailing list. If you know someone who has data and want us to go make phone calls and twist arms, please let us know.
Igor: Can you be a little more specific about what data?
Benson: Let's talk about that on the mailing list. It is a good question in and of itself.
Thomas: It would be great to have data that we understand and that shows what the pain is. Another way to look at things would be to categorize them in three buckets: an environment where current solutions work fine, an environment where the current solutions do not scale, and an environment where current technology does not work because of restrictions imposed several years ago, where relaxing them will solve the issue. Based on conversations I have had, the protocols work fine up to a point but stop working beyond a certain scale.
Igor: The reason I asked for specifics of the data is that there are different data center designs and each is optimized for different things.
Benson: That will be good data to have.
Igor: So you want every possible data center design document?
Benson: This needs more discussion than we can have here today.
Ralph: It does not have to be a lot of data, but data that helps us understand the issue.
Benson: Yeah. Once we concede there is a problem we can have a discussion about solutions.
Jari: Data which can tell us in which dimension the problem lies will be very useful.
Benson: Let's talk about it on the mailing list.
Rahul: What we need here is a problem statement and requirements documents written by the operators.
Benson: Please bring any more discussions to the mailing list.
Let's start the presentations.
Linda: draft-dunbar-armd-problem-statement-01.txt
Yizhou Li: draft-liyz-armd-vm-migration-ps-01
Randy Bush: This effort should be run by operators and not vendors, and particularly not a single vendor whose presentations dominated the meeting and who described a broken implementation as an excuse for more work. The WG is a poor excuse for Huawei to get an overly-ambitious person to a WG chairship.
Benson: I agree that operators should lead this, but no operator submitted drafts or asked for time.
Igor: Come and ask us.
Jabber - Ron Bonica: Do you want to have an interim meeting where operators present?
Randy Bush: Come to NANOG.
Jari: I have been concerned that we are trying to create technology pushed by vendors, and I would like to hear from operators and data center guys.
Rahul Agarwal: Speaking as a vendor, I agree with Randy and Igor that operators need to provide input for the WG to do the work, but operators should not shut down the WG.
XXXX: If we don't define the problem, the working group will go away.
Dinesh: It seems to me that you said we will not talk about solutions, but the previous presentations have solutions.
Benson: I agree that all 3 presentations have out-of-scope material, but these were the best, and since this is the first meeting we are lenient.
Dave McDysan: Are future data centers in scope?
Benson: No, as we are an ops WG we only look at currently deployed solutions. But if you have ideas, send them out.
Igor: We need to define what a data center is, as everyone's idea of what a data center is differs widely.
Benson: Let's take this to the mailing list.
Cathy: NANOG would love to have a BOF on Address Resolution. NANOG has much more operator experience.
Benson: I will contact you offline.
Linda: draft-mackcrane-armd-ipv6-nd-scaling-00 (Author not present, Linda presented on his behalf)
Igor: You have assumed one type of architecture. A lot of people don't use this design, so the analysis won't apply.
Contrary to your conclusion, IPv6 ND does have issues.
Chris Morrow: A DC is designed for a particular workload. If you change the workload, what looks like a protocol problem may actually be a hardware problem.
Benson: Those are all valid issues.
Igor: For most people a DC is the three-tier design in textbooks. But reality is different. Perhaps there are about 3 or 4 designs out there. Should we document those designs?
Benson: Yes, if you can help with that information on the mailing list.
Meeting closes.