IETF-100 DCROUTING agenda
Session 2017-11-15 09:30-12:00: Padang

IETF 100 – DC ROUTING Agenda
Wednesday, November 15th, 2017
09:30-12:00: Morning Session
Room: Padang

Data Center Routing (DCROUTING):
· Status: non-WG-Forming
· Responsible AD: Alvaro Retana
· BoF Chairs: Gunter Van de Velde, Victor Kuarsingh

NOTE: In-meeting/BoF questions and responses are captured at the end of the document.

Background

Over the last year, there have been discussions in a number of routing area working groups about proposals aimed at routing within a data center. Because of their topologies (traditional and emerging), traffic patterns, need for fast restoration, and need for low human intervention, among other things, data centers are driving a set of routing solutions specific to them.

The intent of this BOF is to discuss the special circumstances that surround routing in the data center and potential new solutions. The objective is not to select a single solution, but to determine whether there is interest and energy in the community to work on any of the proposals.

Agenda

Administrivia (5m)
· Blue sheets
· Note taker: ?
· Jabber scribe: ?
· Agenda bashing
· BOF goals and objectives

Intro

Dcrouting notes.

Gunter: Meeting starting. Note Well applies. Blue sheets. We have a full agenda. We expect many Q&A and discussions. We will start with requirements discussion. Then a look at changing the requirements on existing protocols. Augmenting of existing technologies for DC routing environment.
G: We would like to get a better understanding of the modern requirements for DC routing.
G: What are the key questions that we are trying to address? We are trying to understand whether it makes sense for IETF to increase the focus on DC routing technologies. We would like to receive guidance and input from you on the requirements on routing in the DC, and on how DC routing is different from traditional routing.
Work on new protocols is not excluded; the work of augmenting existing routing protocols can go in parallel with work on new routing protocols.
G: Please be concise, we do not have that much time. We want feedback on whether DC requirements are clear enough to do work in IETF. Can a single solution achieve all the requirements? Do you have interest to work on the solution space? How should IETF organize the work on the solutions? Should we create a DC routing WG? Should we develop a set of requirements and work in existing WGs?

Discussions around Requirements & Problem Space (40 minutes)
· Read-out: Cloud DC and Operator Requirements for DC Routing (Jeff Tantsura)
o https://tools.ietf.org/id/draft-dt-rtgwg-dcrouting-requirements-00.html (15min)

JT presenting.
[presentation]
The work on thinking about what could be done in DC routing started around a year ago. There was a team formed to focus on this topic. Not because we are long-time friends, but because we deal with developing and operating DC related technologies. We want to avoid a beauty contest. We need your comments on requirements. You are welcome to comment.
[discussion]
Arthi/Arista: I have a concern about the requirements doc. It would be good to understand - MSDC is not a new problem, and requirements are very varied. It seems to be a collection of separate requirements. We do not have requirements on how SPs design their backbone network. Why are we coming up with a set of requirements to standardize deployments in DC space? There may be different solutions - BGP or RIFT - to be used, but why are we trying to standardize the requirements? That does not seem to make much sense.
JT: We are not defining how to design a DC.
Arthi: But you are. For some operators leaf/spine architecture may apply, for others it does not.
JT: It is not a finalized version of requirements. Please comment on them. We will try to address that. We are not imposing the architecture.
We are addressing the routing protocol design, not the DC design.
Gunter: The reason why we are doing this requirements review - we want to understand whether we can reuse existing technology or whether we need to do more tailored developments. What can fulfill this set of requirements?
Arthi: It is a STD track document. It is fine if it is a guideline at the end; the question is how to treat it if it becomes an RFC.
Victor: We are trying to understand the indicators on what is driving the requirements. Whether there is potential new work. It is not a list of finalized requirements. If the drivers are significantly different.
JT: If you look at the number of proposals, it is not something enforced on the world. The solution can be either an extension to
Acee: A lot of these requirements seem to be control plane, data plane, and platform requirements. Some of them are closer to routing, some are closer to platform. You need to classify better.
JT: Agree. Please comment more, we will structure it better based on comments.
?: Maybe you should split it into different documents? There are too many different requirements in the same document.
JT: On one side I agree with you. When we say the protocol should support out of band.
?: Question is whether it should be there by default.
Dean: Regarding encoding - it would be good to say what the requirements are on encoding and not name a specific encoding framework, as that may change.
JT: The requirement on encodings is an example. There is no way to enforce that.
Greg Mirsky: You emphasized BFD for convergence. What about degradation of links and paths - do you think that indirect analysis of telemetry like buffer queue utilization is sufficient or do you need performance measurement?
JT: We would like to contain the size of the routing information base.
Greg: You mentioned distribution of telemetry among the routing protocol requirements. Is that for routing or for management?
Do you think that telemetry needs to be flooded in the fabric instead of being sent to a specific host for analysis?
JT: Very good question. Basic telemetry does not need to be streamed across the fabric. There is some metadata that would benefit from being distributed - load, balancing.
Greg: It would be very interesting to discuss how the flow goes, is it always E-W or also N-S.
JT: The amount of information sent to N is an order of magnitude higher.
Randy: We provide global transit, we also build DCs. We see those as very different. The differentiation between DC and enterprise is substantial. I was horrified by the laundry list of requirements. I guess this is IETF. It is an enormous list and some focus is going to be needed.
JT: If the -00 draft did not trigger any comments, we would not have done a good job. We would like you to assess it and let us know.

o Q&A, Clarifications and Discussion (5min)
· Read-out: Enterprise Requirements for DC Routing (Nalini Elkins)
o https://tools.ietf.org/html/draft-elkins-brickmortar-architecture-00 (10min)
o (updated -01 version was submitted after draft cut-off)
o Q&A, Clarifications and Discussion (5min)
· Section Wrap-up (Chairs) (5min)

Nalini presenting.
[presentation]
A view on legacy - brick and mortar - DC designs. We represent the 99% of those who are not present at the IETF. Michael is with Blue Shield, which is a large healthcare enterprise. This is to give a perspective, we may have different requirements. What you talk about here, fabrics and similar things, is an aspirational goal for us.
[discussion]
Randy: Two comments. Half the people in this room understand Enterprise and EIGRP. You have done an excellent job of delineating yourselves from DC routing. This is a different problem, it does not belong here.
JT: Appreciate the time you spent on this. Modern architectures are built around what vendors tried to sell you during the last 20 years.
The next time a salesperson comes up - this is the simple and operational thing that we want, rather than fighting the unknown. This was the intention of publishing the BGP in large DC RFC. It is about making information available so that it can be reused.
Nalini: A lot of this is - we want to go to SDN, containers. We have a problem of how fast we can do that. We are talking to people about what kind of visibility and problems we need to account for. Encryption in the DC. We have tons of applications that are in the stone age of computing and are very difficult to change.
Michael: Management and monitoring is very important for us.
Nalini: We would like to have guidance from those ahead of us on how to proceed.
Michael: EIGRP has not gone away.
TonyP: I do not think that this is that much disconnected. You are facing problems similar to those the hyperscalers face - it is about scaling of the staff?? What is the trajectory of the consumption of the bandwidth? You consume CPU and storage as components. It is an abstract consumable, which, if you want to consume it in large quantities at low price points, needs to become a component. Bandwidth in a DC is a component. Bandwidth needs to be packaged in something as cheap as fries today. All the stuff that you have today will go away to components.
Michael: You make a lot of good points. I agree with what Randy said. We have a lot more in common.
Victor: Please raise your hand if you feel that there are requirements that should be in focus that have not yet been discussed. A few.
V: Do you feel this is well covered and there are no major requirements missing? A few.
V: Do we agree that there is enough augmentation to make it different from traditional routing? The new DC routing requirements? Many hands.
V: If you feel that there is no significant difference? None.
Linda: I have a comment on this - you are asking from the requirements presentation. An actual DC today is different from an SP network.
The topology is very simple - that is a significant difference.
V: We did not see all the potential requirements - that is the reason for the question whether there are additional requirements.
Linda: I would not call those requirements, it is a characteristic. It is about removing features, not adding features.
G: We might remove some functionality, but we also need to add new functionality.
V: Do you think that what we have seen from the requirements will drive new routing protocol work? 20.
V: If the answer is no, show of hands? 10.
V: To wrap up, it seems that more people believe that potential new RP work is needed.
Jan?/Bloomberg: There is a need for new protocol work, but there is a need for developing more fabric-based technologies that are different from today's ones. There has to be a middle ground. What people are discussing here is different from a DC routing protocol.
V: This is not a WG BoF.
JT: What we are going to do here is going to be very relevant to many new work areas, IoT, 5G.

DC Routing Proposals based around new routing technology (80 min)
o BGP-LS SPF: Shortest Path Routing Extensions for BGP Protocol (Keyur Patel)
o https://tools.ietf.org/html/draft-keyupate-idr-bgp-spf-03 (20min)
o Q&A, Clarifications and Discussion (15 min)

Keyur presenting.
[presentation]
[discussion]
?/Huawei: I have discussed this idea with OTT guys. Some of them have interest to explore this solution.
Keyur: Thanks.
?: Replacing the phase 1 - BGP provides central policies.
Keyur: The policies are applied before or after you run the decision process. The policy process gets replaced by the SPF. Does that make sense?
?: Yes. Phase 1 and phase 2 processes mention that you apply the policy before sending. Same thing for phase 2 - you apply route policies outbound. It is mainly for clarity. You can do SPF calculations.
Keyur: Absolutely.
Clark/Comcast: This is specific to one specific problem.
Keyur: Yes, we are addressing a specific convergence problem.
Comcast: There are many requirements or a wish list of what we need to solve. We need some kind of SPF thing, we need some form of link state. You are solving a very small portion of it.
K: Yes, and my answer still stands. We are looking to optimize the protocol to achieve even better convergence and manageability. There are requirements and we may address that but it is a different solution.
Comcast: We want to solve the problem by removing features and adding enhancements, but this is not - simplification is a requirement too.
Acee: Phase 1 and phase 2 normal processing - you have to modify this for the SAFI if you want to get the same convergence behaviour as an IGP. Another comment - actual routing requirements and not dataplane and platform requirements; the whole thing about auto-discovery is a separate aspect. You can do it with BGP or another protocol.
K: Agree.
Randy: We have a long history of things designed for the LAN getting into problems. I have problems understanding this being applicable to a large transport network. This is for a limited scale DC. I see red flags if people start adding requirements for this to work on a WAN.
Robert Raszuk: If you are trying to do a new SAFI only to signal link state, you can do that today without extending the protocol.
K: There are several ways to do it. The reason to do that is to have a clean multivendor solution. A new SAFI gives a clean way to do this, and it lets you break the protocol into different processes and scale.
Acee: Today we are using BGP-LS for other purposes, we are reusing a lot of technology for links and attributes. If we want to get fast convergence, we need a different SAFI to attach to this new behaviour. We cannot rely on import communities.
?: It would be good for this draft to discuss a leaf-spine topology.
K: That is an excellent suggestion, we can add a section.
Victor (co-chair hat off): As an operator using BGP for many reasons, I see a lot of potential personally.
We can put an applicability statement on what we want to address.
JT: Future request to chairs - rather than comparing apples and oranges, it would be good to see all fruit compared against a single set of requirements.
V: I took my chair hat off when I was speaking on the implementation aspects.
SueHares: Some of the things address some problems, and some address other problems. What is the scope of the discussion here? It is hard to understand what I need to provide input on.
G: Routing in the DC may be a different kind of beast than routing in a WAN environment. The majority of routing protocols have been developed for the WAN environment. Some people have developed new concepts and ideas. There are changing requirements and that results in augmenting the technology.
Sue: Some requirements may not really be for the DC category.
JT: We are not trying to choose the king here. It does not need to be a single solution here. It varies per company, per IT staff. It is good to be aware of different solutions.
Randy: It would be good not to boil the ocean and to narrow the focus. So open-minded that his brains fell out.

o RIFT: Routing in Fat Trees (Tony Przygienda)
o https://tools.ietf.org/html/draft-przygienda-rift-03 (20min)
o Q&A, Clarifications and Discussion (15 min)
o Section Wrap-up (Chairs) (10min)

TonyP presenting.
[discussion]
Dean: I do not understand the automatic flooding reduction. The other part - your name is already taken by a company in Boston?
Tony: My last name?
Dean: It is rift.io.
Robert Raszuk: I think you have a very good control plane worked out. The problem is with convergence. It takes hundreds of msecs to advertise a route today. It is not production quality.
TP: You can run it with a penalty of increased state on ToRs. You can also use BFD.
RR: Something like LFA.
TP: That goes down to how fast you can converge. We may use LFA and FRR type of things, but the solution will be more expensive. 50 ms single-failure convergence - it is very comfortable with this.
Different use cases: some people are happy with 1 sec blips, some people are uncomfortable with 100 msec blips. If you aggregate, you need to eventually deaggregate.
JT: To clarify the requirements - support for BFD and not for LFA.
TP: If your hardware does BFD correctly.
Sam Aldrin: Working off a controller is completely off?
TP: The DV overlay where you can push prefixes and policies. When it comes to shortest path convergence, it will always be faster than you can push from a controller.
Gaurav/Cisco: If you compare the different solutions, which one converges faster?
TP: Both well implemented, it is a 3x convergence speedup. This is a number that I have. This is not a beauty contest.
Gaurav/Cisco: Compare the solutions based on requirements.
V: Any additional questions, please address on the list.
V: A question for WG - a show of hands - do you agree that proposed solutions address unique challenges of DC routing?

DC Routing Proposals based on extensions to existing routing technology (15min)
o ISIS Routing for the Spine-Leaf Topology (Les Ginsberg) (5min)
o https://datatracker.ietf.org/doc/draft-shen-isis-spine-leaf-ext/

Naiming presenting.
[presentation]
No questions.

o Openfabric: ISIS Support for Openfabric (Russ White) (5min)
o https://tools.ietf.org/html/draft-white-openfabric-04

Russ presenting.
[presentation]
Uma: I have sent some comments earlier - what you removed from ISIS.
Russ: Yes, that was to make the protocol simple.

o OSPF/ISIS Flooding reduction in MSDC (Xu Xiaohu) (5min)
o https://tools.ietf.org/html/draft-xu-ospf-flooding-reduction-in-msdc-02
o https://tools.ietf.org/html/draft-xu-isis-flooding-reduction-in-msdc-02

Xiaohu presenting.
[presentation]
No questions.

Session Wrap-up
o Session Wrap-up and next steps (Chairs) (10min)

V: Show of hands of who would be interested in contributing and participating in new work in routing area.
V: I will publish the questions with the responses that we got.
V: Alvaro, do you have any comments or anything to share with the BoF?
Alvaro: What I was hoping you would ask next - the purpose of the BoF was to see what the requirements are and whether there is a need for new work. Many of you raised hands on new work. I would like to get a sense of the room of people willing to work on specific proposals. If at some time later we get new work in IETF on a specific new work?
Alvaro: Assuming that we go forward with the work in those proposals - how many of you would be willing to contribute to that work?
Alvaro: If you are willing to this effort, ???
Alvaro: I notice some of you raised your hand twice. That is fine. We have a limited number of resources in the RTG area, you cannot go to 27 WGs in a week. The sense whether there is interest in doing the work. Thank you.
G: Any other comments?
G: End of meeting.
Gunter: Focus/Question: does it make sense to increase the focus on DC routing here at the IETF?
Gunter: Is there a need to change the routing in the DC and/or do we augment existing protocols?
Gunter: Think about this: concise on QA, are the DC requirements clear enough to justify focus, is there a single solution that can cover all
[CLOSED the session]

[ Post-Session Update - Questions to Group and Responses - with Summary ]

<< Following Reviews Section Review >>

< Question 1 >
"Do you feel there are other requirements, not yet captured, which can and/or should be included in how we focus on potential DC Routing Work?"
Show of hands for 'Yes' = ~30+
Show of hands for 'No' = ~20

< Question 2 >
"Do we agree that DC Fabrics have augmented routing requirements with respect to traditional routing?"
Show of hands for 'Yes' = ~20
Show of hands for 'No' = 2

< Question 3 >
"Do the DC Routing requirements drive towards protocol work?"
Show of hands for 'Yes' = ~40
Show of hands for 'No' = ~10

< Requirement Section Summary Statement >
- There was agreement in the BOF that DC Routing has different requirements beyond the traditional routing environment
- The DC Routing BOF did not agree upon the correctness or completeness of the presented requirement documents
- The DC Routing BOF agreed on the expectation of increased focus from the IETF on DC Routing environments

<< Following New Solutions Section >>

< Question 1 >
"Do you agree that the proposed solutions address unique challenges in the DC routing / fabric space?"
Show of hands for 'Yes' = ~40-50

< Question 2 >
"Would you be interested in contributing to new protocol work focused on DC routing/fabric?" (noted that working on solutions is not mutually exclusive)
Show of hands for 'Yes' = ~40-50

< New Solutions Section Summary >
- BoF attendees willing to contribute to new work
- Positive, active discussion on both proposals

<< AD Lead Question >>

"Would you be willing to work on specific proposals shared today?"
Show of hands for "BGPSPF" = ~20
Show of hands for "RIFT" = ~20

< AD Question Summary >
- People willing to specifically work on BGPSPF, RIFT, where some hands seemed to indicate they are willing to work on both