Networking Machine Learning Proposed Research Group
Meeting session at IETF-95, Buenos Aires, Argentina
Thursday Morning session I, 10:00-12:30, Buen Ayre A
Chairs: Albert Cabellos & Sheng Jiang
Minutes taker: Bing Liu

1. WG Dash - co-chairs

2. Meeting theme guidance - Sheng Jiang

Network traffic is one of the most important objects that needs to be managed in the network/Internet area, and it meets the precondition for applying machine learning mechanisms: data.

3. HTTPS Traffic Classification - Jerome Francois

[Sheng]: where does the data come from, where do you collect the traffic?
[Jerome]: operators can collect data from the firewall/gateway in universities or companies. In our case, we collect it in a university, at the SSL gateway.
[Sheng]: that means you get the data in your own domain, and you only monitor the outgoing traffic?
[Jerome]: yes.
[Sheng]: does the feature selection mechanism also use ML?
[Jerome]: yes, for part of the features.
[Sheng]: "without new learning phase", do you mean you don't apply the evaluation result back as new training data?
[Jerome]: what you mean is that we learn over the traffic, then apply the testing data; we need everything, we don't use timestamps. In that case, we use the timestamps for the connection, and classify and see when it classifies the HTTPS traffic well.
[Sheng]: it would be very interesting if you could show both, without-new-learning-phase and add-back-result-to-training-data, and compare them to show the difference.
[Sheng]: feedback, not a question: you showed a very good analysis result for your mechanism. Actually, many carriers now use DPI to monitor the traffic, and use the DPI result to dynamically manage the traffic. It would be nice if you could find a use case that combines your ML mechanism with network traffic management, to become some kind of autonomic management.
[Jerome]: yes, it can work for a lot of applications. Our main use case is filtering and firewalling, not that DPI use case.
[Sheng]: with more interruption data traffic, traditional DPI is much less influenced; that's something you can do to replace traditional DPI.
[Jerome]: yes, thanks.
[Sumandra Majee]: two questions. 1) This is probably not very unique, in the sense that commercial providers already do that; I know at least two commercial providers that do a similar thing. So is this work for open source, or research? 2) How many attributes are you storing, how much memory do you require, is it off-line or on-line, and can you do per-packet processing? I'll take it offline.
[Sumandra]: the trouble with comparing approaches is that, e.g., BitTorrent encryption always changes between versions.

4. ML in the Router: Learn from and Act on Network Traffics - Bing Liu

[Iftekhar Hussain]: question on use case 4-2. I can probe and get information, and that's available on some of the systems today, so what additional intelligence does ML bring to this use case?
[Bing]: each router can do random exploring, such as RNN, and the whole ring could get an integrated result. It's still an initial thought; I haven't tested it yet.
[Lee Howard]: question on the same use case 4-2. I have an almost religious opposition to active probing. You're almost saying you can do traffic optimization using ML. I think the next step is to do something like dynamic traffic management, or QoS, which you mentioned as a possibility. Some traffic is more link-sensitive, other traffic might be more delay-tolerant; maybe there are different paths based on both local and global conditions, based on this kind of ML.
[Bing]: this specific use case is too simple; with current technology, the traffic always needs to follow the same direction. But I agree with you that we have more space to explore in this field.
[Lee]: this kind of analysis and traffic management, combined with something like segment routing, might be an action local routers can take to affect the route locally, without necessarily affecting the path globally.
[Bing]: ok, thanks.

5. Applications of Machine Learning to Flow-based Monitoring - Josep Sanjuas (Remote)

[Lee Howard]: in the chart of accuracy, the averages don't look like they change very much, but the variance changes significantly with a lot of re-training. Over 100 re-trainings, although the average is about the same, there is much less variance, so you are more likely to classify individual flows correctly? Just make sure you didn't mis-classify a fairly significant amount of traffic.
[Josep]: yes.
[Doug Montgomery]: do you mean the re-training is triggered by measuring the accuracy or logic? How do you define how often you do the re-training?
[Josep]: we have one parameter for the accuracy; whenever the self-assessment of the accuracy is below that number, it will re-train. It's all automatic, there's no manual intervention here.
[Doug Montgomery]: if I understand correctly, your service is cloud based?
[Josep]: yes. We get the accuracy from the training result itself.
[Enno Rey]: is the generated training set only for one particular network, or did you find you could re-use it for different networks? I'd be interested to see how those results compare.
[Josep]: the accuracy might be lower if you train on one network and apply to another.

6. Malicious Domains: Automatic Detection with DNS Traffic Analysis - Giovane Moura

[Albert Cabellos]: did you try any other algorithms for naming? Like, any other classification algorithms.
[Giovane]: for this case, no. There are definitely others that can be tried; I haven't looked into that yet.
[Anurag Sharma]: did you try to reduce the number of features based on dimensions, given that some of them are less important than others?
[Giovane]: I haven't worked on it. This is an industrial project, I had to make it work. But yes, this is something that should be addressed.
[Jerome Francois]: you mentioned you notify the registrar; what kind of feedback do you get from the registrar?
[Giovane]: every registry has the policy that they can't take away the domains directly; their policy is to first notify the registrar. I contacted two DNS registries. In some cases they said the bad domains had been notified. For bank-phishing domains, as the registrar observed, they would be quickly taken down.
[Stuart Cheshire]: very interesting work. With the recent work on improving privacy, and things like QNAME minimization, do you foresee the source of data drying up?
[Giovane]: I think we still see the TLD (Top Level Domain) queries. Martin is working on the QNAME minimization impact. Mark?
[Mark ?]: this specific research doesn't really deal with QNAME minimization; we primarily look at the registry data or the registry systems, so we see new domain names being registered, and those are the triggers for the NGO to do the research. QNAME minimization is interesting, but so far it is not the field we research.

7. Machine-learning based Policy Derivation and Evaluation in Broadband Networks - Panagiotis Demestichas (Remote)

No questions/comments.

8. Predicting Interface Failures For Better Traffic Management - Rudra Saha (Remote)

[Sheng]: how confident would this result be, such that we may set up the autonomic switch function there to switch to another interface/path?
[Rudra]: we haven't actually limited the current switching mechanism from one network path to another network path; it depends on what action the network might take. For the presented case, 66.7% is very low; this is because of the kind of data set we currently have, and the kind of training we have done. We're in the process of collecting a much more robust data set.
[Albert]: you mentioned at the beginning that you not only predict when there is a failure, but also try to explain why. I guess you mean root cause analysis?
[Rudra]: yes.

Standby presentation: Analytic Framework for NFV Orchestrator, Vic Liu

No time for this presentation.

9. Summary & NMLRG Future Activities - co-chairs

Encouraged the audience to join the NMLRG #3 meeting, which will be held as a workshop at EuCNC 2016, June 27, Athens, Greece.

ETSI has formed a new Next Generation Protocol (NGP) Focus group. Within it, machine learning is introduced as a mechanism for autonomic decisions in network control and management.