42nd NMRG meeting, Monday, 3/27
Topic: Next steps in Autonomic Networking

Thanks to John Strassner (minutes taker) and Jéferson Campos Nobre (jabber scribe)!

Agenda:
1. Meeting introduction, NMRG chairs
2. Self-driving networks, Kireeti Kompella
3. Autonomic networking retrospective, Jéferson Campos Nobre
4. Open discussion: gaps, challenges, and propositions for an actualized road-map towards Autonomic Networking

This meeting is part of the series on defining a new research agenda for the RG.

Main conclusion of the meeting:
• Initiate a new ID defining the actualized challenges and gaps, and documenting research directions and an action plan for AN (towards IRTF, IETF and beyond).

1. NMRG Chair Discussion

1.1. RG Status
- draft-irtf-nmrg-autonomic-sla-violation-detection passed RGLC, ready for IRSG review
- draft-irtf-nmrg-location-ipfix-07 passed IESG conflict review, needs to go through IETF review and IESG approval

1.2. Next Meetings
- Will meet at IM2017 (Lisbon, 5/8-12), topic TBD
- IETF99 (Prague), workshop on measurement-based network management
- IETF100 (Singapore), full-day workshop on intent-driven networking (TBC)

1.3. Research Agenda
- Need your input and feedback about what to focus on!

2. Self-Driving Network, Kireeti Kompella

Kireeti made a presentation at the MPLS World Congress a year ago. This was the first public presentation of this concept. It made an analogy to self-driving cars, which took 10 years from vision to prototype. Note that the first attempt in 2004 FAILED. This talk is about the vision required to realize a self-driving network. A self-driving network is different than a self-driving car.
The self-driving network needs to do the following:
- Accept “guidance” from a network operator
- Self-discover its constituent parts
- Self-organize and self-configure
- Self-monitor using probes and other techniques
- Auto-detect and auto-enable new customers
- Automatically monitor and update service delivery
- Self-diagnose using machine learning and self-heal
- Self-report periodically

There are five key technologies required to meet this challenge. However, the realization of these technologies differs for a self-driving car compared to a self-driving network:
- Telemetry
- Multi-dimensional Views
- Automation
- Declarative Intent (important for removing the human from the loop)
- Decision-Making
  - Rule-based
  - Machine learning

Telemetry
- Various sensors (e.g., speedometer, gas gauge) assist the user.
- Self-driving requires more, so Google added LIDAR, which can do a 360 degree scan. Note that the car costs $70K and the LIDAR costs $80K.
- Look at slide 7. You have a producer and a consumer. You do NOT want to have to poll for these data. Data should be provided in a real-time, streaming format that is optimized for machine consumption.

Multi-Dimensional Views
- Cars have different types of sensors; they rely on a human to interpret them
- Networks today have specific views that provide specific types of data for specific purposes. These tend to be siloed.
- Future networks will need different views that enable data to be correlated across multiple domains, spanning multiple geographies

Automation
- Where we are today: Chassis and Line Cards; on top of this, you have an OS.
- On the top you have various solutions (e.g., Python, Chef, Ansible, Puppet)
- In the middle, you have various communication mechanisms (e.g., RPC) that, in the IETF case, support Netconf and Restconf; then you have a framework connecting this to the top layer.
- Didn't describe automation for the future

Declarative Statement of Intent
- For Cars, state where you want to go, with hints (e.g., fastest time vs. shortest distance vs. most efficient use of battery)
- Next, have the car talk to your phone to see where you need to be
- For Networks, we want to state a high-level, declarative specification of service requirements
- Parse this, and produce config changes
- For Future Networks, we need to do this better

Decision-Making: Rule-based vs. Machine Learning
- Rule-based: simple, straightforward, but doesn't scale (basically, a set of IF-THEN production rules that grows with the number of decisions to be made)
- Machine Learning: fastest way to learn complex behavior, but
  - can come to strange conclusions
  - hard to know what it knows
- Typically, you pair these two approaches to build a system

Five Stages of Self-Driving
- Manual
- Visualization (where we currently are)
- Analysis and Prediction
- Recommendation
- Autonomous Decisions

We are at the second level (visualization); we can augment to level 4 (recommendation). This can be easily analyzed by an experienced human to help.

How do we get this kicked off? Shown in slide 15. We really need to fix telemetry. Telemetry needs to be optimized for machine consumption, not for humans! Then we need to optimize how to assist in decision making. We really need standardized data models and interactions to help collect the right data at the right time, and correlate them to produce actionable results.

The hypothesis is to position this similarly to DARPA's Grand Challenge that was given to self-driving cars. Note, however, that there is a huge skill set change required to realize this vision. For example, you don't need to set OSPF metrics, but you do need to be able to define high-level, declarative policies that can specify what you want. The network is not programmed, node by node; rather, it is told what to do, and has the intelligence to do its tasks.
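
The idea of stating intent and letting the network derive its own configuration can be sketched as follows. This is only a minimal illustration under stated assumptions and was not shown in the talk: the `Intent` fields, the node names, the configuration keys, and the `compile_intent` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical high-level service requirement stated by the operator."""
    service: str
    endpoints: tuple          # names of the nodes the service must connect
    min_bandwidth_mbps: int
    optimize_for: str         # hint, e.g. "latency" or "cost"


def compile_intent(intent: Intent) -> dict:
    """Derive per-node configuration fragments from a single intent.

    The operator states what is wanted; the node-by-node settings
    (hypothetical keys below) are produced mechanically.
    """
    if intent.optimize_for not in ("latency", "cost"):
        raise ValueError(f"unknown hint: {intent.optimize_for}")
    metric = "delay" if intent.optimize_for == "latency" else "igp-cost"
    configs = {}
    for node in intent.endpoints:
        configs[node] = {
            "service": intent.service,
            "peers": [p for p in intent.endpoints if p != node],
            "reserve_mbps": intent.min_bandwidth_mbps,
            "path_metric": metric,
        }
    return configs


# One declarative statement replaces three hand-written node configs.
intent = Intent("video-backhaul", ("pe1", "pe2", "pe3"), 500, "latency")
configs = compile_intent(intent)
for node, cfg in sorted(configs.items()):
    print(node, cfg)
```

The hint plays the same role as "fastest time vs. shortest distance" in the car analogy: it changes how the goal is reached, not what the goal is.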
Summary
- there is a compelling vision that has strong economic implications (reducing OpEx, reducing time to market, increasing efficiency)
- the ability to predict and anticipate what will be needed, and to adapt the infrastructure to support those needs, is critical for supporting future business models.
- this will also help security, not just management

Giovane Moura: what is really new here? Isn't this just autonomics?
Kireeti: I haven't really followed autonomic networking. What I want to do is to automate actions in the NOC or POP. How can you reduce the burden on the user?

Pedro Martinez-Julia: How much have you played with or implemented? Is there a testbed?
Kireeti: I started playing with bandwidth, but I lack data of sufficient size. I want to be able to do (long-term) trend analysis, so that I can predict where problems will be. As another example, if I see that traffic on a port is vastly different than normal, this should trigger an alarm. We have the data available but no good way to get at it. Syslog analysis using rule-based analysis is too hard.

Pedro Martinez-Julia: have you looked at other, more complex, systems?
Kireeti: I don't care as much about implementation as I do about the challenge itself. When DARPA made the challenge, they did not say "use a specific implementation or technology".

Dan Druta (AT&T): Interesting vision, good analogy. There are probably networks that are fairly self-sufficient. It all comes down to legacy. In the self-driving car case, it is usually the human that causes the problem. But legacy is a big issue because of the economic investments present. Hence, the deterrent is scale and legacy interworking.
Kireeti: Yes and no. The big problem is not the bits and pieces that exist, but rather, enabling a car to drive itself.

Diego: It is not just about the availability of data; the data must be shared. This is bogged down in arguments. Then, you have to sift through the data.
You need to be able to generate not just good data but also noise, to perfect how we identify the good data.
Kireeti: I don't want to polish a diamond. Rather I want multi-modal learning, where I can combine different data (e.g., visual and audio).

LuYuan, eBay: deep learning allows pooling of different types of knowledge. We have a lot of work to do before we get to the point of choosing an algorithm.

3. Autonomic Networking Retrospective, Jéferson Campos Nobre

We have been focused on specifying a minimal set of properties for an autonomic system. Rather than focus on things like self-CHOP, we think that applying autonomics to manage the complete network lifecycle is more important.

Reviewed simplified figure of FOCALE, emphasizing its use of two different control loops. Many efforts, such as ANA, UMF, and GANA, as well as recent efforts (e.g., SUPA, HOMENET, SDNRG, NFVRG), exist.

Autonomic networking is usually addressed by the Network Management community (e.g., with IM, NOMS, and CNSM); NMRG coordinates with these and other communities.

Previous editions of Autonomic Networking in the NMRG led to:
- overview of technologies and terminology for autonomics applied to networking and network management
- gap analysis, lessons used, and real world experiences
- two RFCs (7575 and 7576)

The UCAN BoF was popular, and resulted in the chartering of ANIMA. The focus of this WG is professionally managed networks.
ANIMA has 4 major deliverables:
- discovery of autonomic nodes (GRASP)
- negotiation for autonomic nodes (GRASP)
- bootstrapping a trust infrastructure (BRSKI)
- definition of a separate autonomic control plane (ACP)

However, other work items exist, and can be satisfied either by rechartering or as NMRG submissions:
- intent policy, ASAs, coordination of autonomic nodes

Perhaps NMRG is a good home for these types of discussions:
- definitions, goals, and gap analysis need additional work
  - in particular, intent policies are controversial and out of scope
  - draft-pentikousis-nmrg-andr is an example
- machine learning, and how this set of technologies can be used by autonomics
- intent (e.g., pentikousis-supa-mapping)
- how new paradigms, such as NFV and SDN, affect programmability and autonomics
- fully programmable network elements and functions

4. Open Discussion

Laurent: What are our next steps?

Diego: there are many places where we have many touch points. Maybe we should start by identifying gaps and challenges. Also, note that there is a distinct difference between software-based networking and software-based network management. Diego would be happy to be a part of this.

Kireeti: standardized data models. This needs to evolve as we understand data better. Can we have a shared data repository for our experiments?

John: Actually, we need a standardized information model, or at least, fragments of one. Data models are platform-, language-, and/or protocol-specific. An information model defines the concepts independent of these parameters. If we don't have common concepts, then data models will fail.
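
The distinction John draws can be illustrated with a small sketch. It is an assumption-laden example, not something presented at the meeting: the `InterfaceInfo` concept, its fields, and both encodings are invented for illustration and do not correspond to any standardized model.

```python
import json
from dataclasses import dataclass


@dataclass
class InterfaceInfo:
    """Information-model concept (hypothetical): what an interface is,
    independent of any platform, language, or protocol."""
    name: str
    admin_up: bool
    speed_mbps: int


def to_json_data_model(itf: InterfaceInfo) -> str:
    """One possible data model: a nested JSON document, speed in bit/s."""
    return json.dumps(
        {"interface": {"name": itf.name,
                       "enabled": itf.admin_up,
                       "speed": itf.speed_mbps * 1_000_000}},
        sort_keys=True,
    )


def to_flat_data_model(itf: InterfaceInfo) -> dict:
    """Another possible data model: flat keys with integer status codes."""
    return {
        f"if.{itf.name}.adminStatus": 1 if itf.admin_up else 2,
        f"if.{itf.name}.speedMbps": itf.speed_mbps,
    }


# The same concept rendered in two encodings that differ in structure,
# naming, units, and value types.
itf = InterfaceInfo(name="ge-0/0/0", admin_up=True, speed_mbps=10_000)
doc = to_json_data_model(itf)
flat = to_flat_data_model(itf)
print(doc)
print(flat)
```

Both outputs can be correlated only because they derive from one shared concept; without it, nothing says that `enabled` and `adminStatus` describe the same thing.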