IRTF Open Meeting @ IETF-92
Dallas, TX, USA
TUESDAY, March 24, 2015
1730-1830 CDT
Tuesday Afternoon Session III

Applied Networking Research Prize (ANRP) Award Talks    45 min

*** Aaron Gember-Jacobson *** for designing and evaluating an NFV control plane:

Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl, Junaid Khalid, Sourav Das and Aditya Akella. OpenNF: Enabling Innovation in Network Function Control. Proc. ACM SIGCOMM, August 2014.

Aaron Gember-Jacobson (AGJ) presents.

Meetecho recording is available at:
http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_IRTFOPEN&chapter=chapter_0

Q&A:

Kevin Fall (KF) - You talked about cases of per-flow state. What about things that are not per-flow and could be large, e.g. malware chunks previously seen? What does the graph of size versus impact look like?

AGJ - In the case of iptables, state for a single flow is less than a kB; in the case of Bro it is 100-200 kB of state per flow - so it’s reasonably small, it’s true. You can proactively copy state in replay events - that is future work. Don’t assume that everything is per-flow - a good example of multi-flow state is objects in a cache. Cache sharing protocols exist. You could disregard that state on the assumption that the object will get recached, so you could make a tradeoff in this case - it may not be critical to copy state, but if you move a connection in the middle of serving a client from cache you definitely want to move state.

KF - It depends on the semantics.

AGJ - Exactly, it’s very dependent on semantics.

KF - If I had a cascade of three or four of these functions and one of them frobs the packets in some way such that reclassification of the prior uplink needs to be done, but now that’s been migrated to some other place - how can you handle that? Are there scheduling techniques you can apply?

AGJ - We have thought a little about NF chains - we think that in many cases you can migrate one middlebox in a chain at a time, using temporary redirection. You can do better scheduling if you look at the entire chain at once. More thought is needed on extending safety guarantees across multiple NFs.

KF - Last comment, related to moving state ahead of time - something along the lines of distributed shared memory, looking at page accesses, might be relevant.

?1 - In some cases you can’t fix this problem just with the controller and moving the state - the subscriber must be made aware. In some cases only the application itself can move state and can inform other elements that the subscriber has been moved. For some applications you can change state with the controller; for others only the application itself can correctly decide how to move state.

AGJ - I agree that there’s some information you need to know about the NFs to know how to go about writing these applications - that’s something we haven’t yet done a good job of capturing. We hope some of our program analysis could give you a simplified model of how an NF works, or give recommendations on what control applications should do so that you get some equivalent level of output. There are interesting questions about how you communicate that to someone who’s trying to write a control application.

?2 - Regarding state move, what’s the condition to check the move? Statefully configured or what?

AGJ - It’s really up to control applications how they want to do it. A control application in the scaling scenario could monitor CPU and then perform measurements to identify elephant flows, to figure out which flows to move from one box to another - it’s completely flexible, you could implement whatever you wanted there.

?2 - I assume that when you move something it could also cost you, e.g.
bandwidth, so initially you want to move to meet the SLA, but you could be making the problem worse - it’s not clear that moving is a solution to the SLA problem.

AGJ - There are other SLAs (I was referring to SLAs for the NF itself) - you’re right, what you are doing in the network can have an impact. You can be more proactive, e.g. if you’re getting close to an SLA violation you can pre-emptively migrate flows. We also want to look at reducing the amount of state that we transfer. Some of our program analysis is trying to understand this: rather than exporting all of the state that the NF is maintaining, maybe focus on updated state only. Or maybe some state affects the packets that are output by the NF and other state affects the log - maybe log accuracy isn’t important, e.g. for a caching proxy, so we don’t bother to move that state. So you may be able to limit what state is moved in return for a relaxed notion of how closely the behaviour of your network function compares to what you would have gotten if you didn’t move it at all.

Diego Lopez (DL) - You are mentioning middleboxes - we are working with network functions that are related purely to the control plane - routing functions, forwarding functions in general. How do you see this kind of framework applying in that environment?

AGJ - Excellent question. We haven’t really thought about it in terms of control plane devices, only data plane devices - it is probably a different problem there, and possibly a simpler solution for the control plane. The thing that comes most to mind is the distributed SDN controller case, where your SDN controller is your control plane. There you’re concerned about moving state, but you don’t have packets moving through the controller, so you don’t have that challenge to deal with.

DL - When you move a function, performance penalties arise. I see a value for this in the case of data plane functions; on the other hand, the penalties of a formal framework like this need to be considered.
We have a project on virtualising home routers - I’m wondering if this could be applicable?

AGJ - One challenge that you certainly face is: “Where is this going to?” This is a standard NFV challenge - migrating across a datacentre is a very different problem from migrating across a state or country. One may be feasible, the other probably not.

DL - If I understand, I see a similarity with object persistence frameworks in object-oriented programming. Is this a clear connection?

AGJ - We haven’t looked specifically at that body of research, although we have started to look at it as we do some of the program analysis, e.g. what objects exist beyond the processing of a single packet and what objects are only used during the processing of that one packet at this middlebox. There is definitely a broader body of work there that is worth considering.

DL - There are some researchers who are starting to think about a network programming paradigm that is object-oriented, and starting to think about persistence. With regard to the control application and control plane - if you take the SDN architecture and the NFV architecture, there is a mismatch. This is then an additional framework, so now we have three axes - how do you see the whole thing matching?

AGJ - I think some of what this controller is doing could be part of some other controller that’s already doing SDN or NFV orchestration in the network - but tight integration is unclear, since each is solving a slightly different problem. We are going to need some interfaces there. NFV orchestration is similar: you may have an interface into a system that worries about launching the VMs themselves, and then a system that worries about which NF image to place on the VM. At what point do we end up with too many controllers running around the network? I expect we are rapidly approaching that - it’s a big open problem.

DL - In place of one centralised controller, you may end up with 4 or 5 ‘centralised’ controllers.
This is a good challenge for us operators. Very interesting.

MF - Is it trivial to bound the amount of buffer space you need in the controller, or is that automatically bounded by something else, e.g. the number of flows you can migrate?

AGJ - In theory it’s reasonably predictable - you know, on average, how big the state is, and we can predict how long it’s going to take to transfer that. There’s a tradeoff: the more state you’re transferring, the longer it takes and the more buffering you need to do. You can elect to move flows in pieces, e.g. move 10 flows at a time, then the next 10, and so on. The challenge you run into there is that you are now breaking the flows into smaller groups, so the forwarding entries in your switch need to be broken down that much smaller, which may or may not be OK. Buffering is still a big challenge with this framework and we don’t have a great answer for how to go about reducing that.

Meeting adjourned.
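
Editor's illustration (not part of the discussion): the buffering tradeoff AGJ describes in the last answer can be sketched with a rough back-of-the-envelope model. Everything here is an assumption made for illustration, not OpenNF's actual behaviour or API: the names `batches` and `estimate_batch`, the idea that every flow in a batch is buffered for the whole time that batch's state is in transit, and the link speed and packet rate in the usage note. Only the 100-200 kB per-flow figure for Bro comes from the answers above.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass(frozen=True)
class BatchEstimate:
    """Rough cost of migrating one batch of flows (illustrative model only)."""
    flows: int
    transfer_seconds: float
    buffered_packets: float


def batches(flow_ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Split the flows to migrate into consecutive groups of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(flow_ids), batch_size):
        yield list(flow_ids[start:start + batch_size])


def estimate_batch(flows: int,
                   state_bytes_per_flow: float,
                   link_bytes_per_second: float,
                   packets_per_second_per_flow: float) -> BatchEstimate:
    """Estimate transfer time and peak buffering for one batch.

    Assumption: packets for every flow in the batch are buffered for the
    whole time the batch's state is being transferred.
    """
    transfer_seconds = flows * state_bytes_per_flow / link_bytes_per_second
    buffered_packets = flows * packets_per_second_per_flow * transfer_seconds
    return BatchEstimate(flows, transfer_seconds, buffered_packets)


if __name__ == "__main__":
    # Hypothetical inputs: 200 kB of state per flow (upper Bro figure quoted
    # above), a 1 Gbit/s transfer path, 1000 packets/s per flow.
    flow_ids = [f"flow-{i}" for i in range(100)]
    for size in (100, 10):
        groups = list(batches(flow_ids, size))
        est = estimate_batch(size, 200_000, 125_000_000, 1000)
        print(f"batch size {size}: {len(groups)} batches, "
              f"{est.transfer_seconds:.3f} s per batch, "
              f"~{est.buffered_packets:.0f} packets buffered at peak")
```

Under this simplified model, peak buffering grows with the square of the batch size, so smaller batches cut buffering sharply. The cost is more batches, which stands in here for the finer-grained forwarding entries AGJ mentions.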