IRTF Open Meeting @ IETF-92
Dallas, TX, USA
TUESDAY, March 24, 2015
1730-1830 CDT Tuesday Afternoon Session III

Applied Networking Prize (ANRP) Award Talks
  45 min

*** Aaron Gember-Jacobson *** for designing and evaluating an NFV control plane:

   Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl,
   Junaid Khalid, Sourav Das and Aditya Akella. OpenNF: Enabling Innovation in
   Network Function Control. Proc ACM SIGCOMM, August 2014.

Aaron Gember-Jacobson (AGJ) presents.

Meetecho recording is available at:
http://recordings.conf.meetecho.com/Playout/watch.jsp?recording=IETF92_IRTFOPEN&chapter=chapter_0

Q&A:

Kevin Fall (KF) - You talked about cases of per-flow state; what about things
that are not per-flow and could be large, e.g. malware chunks previously seen?
So what does the graph of size versus impact look like?

AGJ - In the case of iptables, the state for a single flow is less than a kB; in
the case of Bro, it is 100-200 kB of state per flow, so it’s reasonably small,
it’s true. You could proactively copy state and replay events; that is future
work. Don’t assume that everything is per-flow: a good example of multi-flow
state is objects in a cache. Cache-sharing protocols exist, and you could
disregard that state on the assumption that the object will get recached, so you
could make a tradeoff in this case. It may not be critical to copy that state,
but if you move a connection in the middle of serving a client from the cache,
you definitely want to move the state.

KF - It depends on the semantics.

AGJ - Exactly, it’s very dependent on semantics.

KF - If I had a cascade of three or four of these functions and one of them
frobs the packets in some way such that reclassification of the prior uplink
needs to be done, but now that’s been migrated to some other place, how can you
handle that? Are there scheduling techniques you can apply?

AGJ - We have thought a little about NF chains. We think that in many cases you
can migrate one middlebox in a chain at a time, using temporary redirection. You
can do better scheduling if you look at the entire chain at once. Extending the
safety guarantees across multiple NFs needs more thought.

KF - A last comment related to moving state ahead of time: work along the lines
of distributed shared memory, which looks at page accesses, might be relevant.

?1 - In some cases you can’t fix this problem just with the controller and by
moving the state; the subscriber must be made aware. In some cases only the
application itself can move the state and inform other elements that the
subscriber has been moved. For some applications you can change state with the
controller; for others, only the application itself can correctly decide how to
move state.

AGJ - I agree that there’s some information you need to know about the NFs in
order to know how to go about writing these applications, and that’s something
we haven’t yet done a good job of capturing. We hope some of our program
analysis could give you a simplified model of how an NF works, or give
recommendations regarding what control applications should do, so that if you
have your application follow them, you’ll get some level of output equivalence.
There are interesting questions about how you communicate that to someone who’s
trying to write a control application.

?2 - Regarding moving state, what’s the condition that triggers a move? Is it
statically configured, or what?

AGJ - It’s really up to control applications how they want to do it. A control
application in the scaling scenario could monitor CPU and then perform
measurements to identify elephant flows, to figure out which flows to move from
one box to another. It’s completely flexible; you could implement whatever you
wanted there.
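A minimal sketch of the kind of scaling policy AGJ describes, assuming (hypothetically) that an instance's CPU load is proportional to the traffic rate it handles. The `Flow` record, the `pick_flows_to_move` function and the 0.8/0.6 thresholds are illustrative inventions, not OpenNF's API; the actual move would be issued through whatever state-transfer interface the controller provides.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A flow observed at an NF instance (hypothetical record)."""
    flow_id: str
    bytes_per_sec: float  # measured rate, used to spot elephant flows


def pick_flows_to_move(flows, cpu_util, cpu_high=0.8, cpu_target=0.6):
    """Return the flows a scaling app might move off an overloaded instance.

    Hypothetical model: CPU load is proportional to traffic rate, so moving a
    flow sheds a share of CPU equal to its share of the total rate.
    """
    if cpu_util <= cpu_high or not flows:
        return []  # instance is not overloaded: nothing to move
    total_rate = sum(fl.bytes_per_sec for fl in flows)
    if total_rate == 0:
        return []
    chosen = []
    projected = cpu_util
    # Consider the largest (elephant) flows first: most relief per move.
    for fl in sorted(flows, key=lambda x: x.bytes_per_sec, reverse=True):
        if projected <= cpu_target:
            break
        chosen.append(fl)
        projected -= cpu_util * fl.bytes_per_sec / total_rate
    return chosen
```

Picking the largest flows first keeps the number of moves, and hence the number of state transfers, small; a real control application could equally use per-flow CPU measurements instead of the proportional-load assumption.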

?2 - I assume that moving something could also cost you, e.g. bandwidth. So
initially you want to move to meet an SLA, but you could be making the problem
worse; it’s not clear that moving is a solution to the SLA problem.

AGJ - There are other SLAs (I was referring to SLAs for the NF itself), but
you’re right: what you are doing in the network can have an impact. You can be
more proactive, e.g. if you’re getting close to an SLA violation you can
pre-emptively migrate flows. We also want to look at reducing the amount of
state that we transfer. Some of our program analysis is trying to understand
whether, rather than exporting all of the state that the NF is maintaining, we
could focus on updated state only. Or maybe some state affects the packets that
are output by our NF and other state affects only the log, and log accuracy may
not be important, e.g. for a caching proxy, so we don’t bother to move that
state. So we may be able to limit what state is moved in return for a relaxed
notion of the behaviour of the network function, i.e. how closely it matches
what you would have gotten if you didn’t move it at all.
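The state-reduction idea above can be sketched as a filter over the state an NF would export. This assumes, hypothetically, that each piece of state has already been labelled (e.g. by program analysis) with whether it affects output packets or only logs, and whether it has been updated since the last export; `StateChunk`, `Affects` and `select_state_to_move` are illustrative names, not part of OpenNF.

```python
from dataclasses import dataclass
from enum import Enum


class Affects(Enum):
    OUTPUT = "output"  # state that changes which packets the NF emits
    LOG = "log"        # state that only influences the NF's logs


@dataclass
class StateChunk:
    """One piece of exportable NF state (hypothetical labelling)."""
    key: str
    affects: Affects
    dirty: bool        # updated since the last export
    size_bytes: int


def select_state_to_move(chunks, log_accuracy_matters=False, only_updated=False):
    """Keep only the state needed for the relaxed behaviour we accept."""
    selected = []
    for c in chunks:
        if c.affects is Affects.LOG and not log_accuracy_matters:
            continue  # e.g. a caching proxy whose logs may be inexact
        if only_updated and not c.dirty:
            continue  # unchanged state was already copied earlier
        selected.append(c)
    return selected
```

Each flag trades fidelity for less transfer: skipping log-only state relaxes log accuracy, and `only_updated` assumes an earlier copy of the clean state already exists at the destination.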

Diego Lopez (DL) - You are mentioning middleboxes - we are working with network
functions that are related purely to control plane - routing functions,
forwarding functions in general - how do you see this kind of framework applying
in that environment?

AGJ - Excellent question. We haven’t really thought about it in terms of control
plane devices, only data plane devices; it’s probably a different problem there,
with possibly a simpler solution for the control plane. The thing that comes to
mind most is the distributed SDN controller case, where your SDN controller is
your control plane. There you’re concerned about moving state, but you don’t
have packets moving through the controller, so you don’t have that challenge to
deal with.

DL - When you move a function, performance penalties arise. I see a value for
this in the case of data plane functions; on the other hand, the penalties of a
formal framework like this need to be considered. We have a project on
virtualising home routers, and I’m wondering if this could be applicable?

AGJ - One challenge that you certainly face is: “Where is this going to?” This
is a standard NFV challenge: migrating across a datacentre is a very different
problem to migrating across a state or country. One may be feasible; the other
probably not.

DL - If I understand correctly, I see a similarity with object persistence
frameworks in object-oriented programming. Is this a clear connection?

AGJ - We haven’t looked specifically at that body of research - although we have
started to look at it as we do some of the program analysis, e.g. what objects
exist beyond the processing of a single packet and what objects are only used
during the processing of that one packet at this middlebox. There is definitely
a broader body of work there that is worth considering.

DL - There are some researchers who are starting to think about a network
programming paradigm that is object-oriented, and starting to think about
persistence. With regard to control applications and the control plane: if you
take the SDN architecture and the NFV architecture, there is a mismatch, and
then this is an additional framework, so now we have three axes. How do you see
the whole thing fitting together?

AGJ - I think some of what this controller is doing could be part of some other
controller that’s already doing SDN or NFV orchestration in the network, but
how tightly they can be integrated is unclear; each is solving a slightly
different problem, so we’re going to need some interfaces there. NFV
orchestration is similar: you may have one system that worries about launching
the VMs themselves and another that worries about which NF image to place on
the VM. At what point do we end up with too many controllers running around the
network? I expect we are rapidly approaching that; it’s a big open problem.

DL - In place of one centralised controller, you may end up with 4 or 5
‘centralised’ controllers. This is a good challenge for us operators. Very
interesting.

MF - Is it trivial to bound the amount of buffer space you need in the
controller, or is that automatically bounded by something else, e.g. the number
of flows you can migrate?

AGJ - In theory it’s reasonably predictable: you know on average how big the
state is, and we can predict how long it’s going to take to transfer it.
There’s a tradeoff: the more state you’re transferring, the longer it takes and
the more buffering you need to do. You can elect to move flows in pieces, e.g.
move 10 flows at a time, then the next 10, and so on. The challenge you run into
there is that, because you are now breaking the flows into smaller groups, the
forwarding entries in your switch need to be broken down that much finer, which
may or may not be OK. Buffering is still a big challenge with this framework,
and we don’t have a great answer for how to go about reducing it.
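A back-of-the-envelope model of the tradeoff AGJ describes, under simplifying assumptions: every flow has the same state size and packet rate, the state-transfer bandwidth is constant, and packets for a flow in the current batch are buffered at the controller until that batch's state has arrived. The function name and all parameter values are hypothetical.

```python
import math


def migration_buffer_estimate(num_flows, state_per_flow_bytes,
                              transfer_bw_bytes_per_sec, pkt_rate_per_flow,
                              pkt_size_bytes, batch_size=None):
    """Estimate peak controller buffering when moving flows in batches.

    Returns (peak_buffer_bytes, total_transfer_secs, num_batches).
    """
    if batch_size is None:
        batch_size = num_flows  # move everything at once
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    num_batches = math.ceil(num_flows / batch_size)
    flows_in_batch = min(batch_size, num_flows)
    # Time to ship one batch's state: state size / bandwidth.
    batch_secs = flows_in_batch * state_per_flow_bytes / transfer_bw_bytes_per_sec
    # Packets arriving for the batch's flows while its state is in flight.
    peak_buffer = flows_in_batch * pkt_rate_per_flow * pkt_size_bytes * batch_secs
    total_secs = num_flows * state_per_flow_bytes / transfer_bw_bytes_per_sec
    return peak_buffer, total_secs, num_batches
```

With 100 flows of 150 kB each (within the 100-200 kB per flow quoted for Bro earlier), a 100 MB/s transfer path, and 100 packets/s of 1500-byte packets per flow, moving all flows at once buffers about 2.25 MB, while batches of 10 buffer about 22.5 kB each; the total transfer time stays at 0.15 s. Because peak buffering grows roughly with the square of the batch size in this model, smaller batches help a lot, at the cost of more batches and finer-grained switch forwarding entries.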

Meeting adjourned.