Network Machine Learning Research Group                         S. Jiang
Internet-Draft                              Huawei Technologies Co., Ltd
Intended status: Informational                          October 28, 2016
Expires: May 1, 2017


                        Network Machine Learning
             draft-jiang-nmlrg-network-machine-learning-02

Abstract

   This document briefly introduces background information on machine
   learning, then explores the potential of machine learning techniques
   for networks.  This document serves as a white paper of the
   (proposed) IRTF Network Machine Learning Research Group.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 1, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Brief Background of Machine Learning
     3.1.  Machine Learning Categories
     3.2.  Machine Learning Approaches
     3.3.  Successful Applications
     3.4.  Precondition of Applying Machine Learning Approach
     3.5.  Limitation of Machine Learning Mechanism
   4.  Network Machine Learning Research Group in IRTF
   5.  Use Cases Study of Applying Machine Learning in Network
     5.1.  Network Traffic
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  Change log [RFC Editor: Please remove]
   10. Informative References
   Author's Address

1.  Introduction

   Machine learning techniques help to make predictions or decisions by
   learning from historical data.  Because machine learning mechanisms
   can dynamically adapt to varying situations and improve themselves
   by learning from new data, they are more flexible in handling
   complicated tasks than strictly static program instructions.
   Therefore, machine learning techniques have been widely applied in
   image analysis, pattern recognition, language recognition,
   conversation simulation, and other areas.

   With deeper exploration, machine learning techniques could shed
   light on studies of autonomic networking, since they are well suited
   to learning the varied environments of networks and reacting to
   dynamic situations.

   The proposed Network Machine Learning Research Group (NMLRG) was
   formed within the Internet Research Task Force (IRTF) in October
   2015.  As a matter of procedure, the IRTF currently requires a
   one-year provisional period.  After this period, the proposed
   research group may become a formal research group if there is a
   steady research community.  The NMLRG provides a forum for
   researchers to explore the potential of machine learning techniques
   for networks.

   This document first briefly provides background information on
   machine learning, then explores the potential of machine learning
   techniques for network functions, such as network control, network
   management, and supplying network data for upper-layer applications.

   Author's note: this document is at a preliminary stage.  It is an
   ongoing document for the proposed Network Machine Learning Research
   Group.  For now, it is not clear whether it will be published.

2.  Terminology

   The following term is defined in this document.

   Machine Learning  A computational mechanism that analyzes and learns
      from input data, either historical data or real-time feedback
      data, following a designed model or pattern.  It can be used to
      make predictions or decisions, rather than following strictly
      static program instructions.

3.  Brief Background of Machine Learning

3.1.  Machine Learning Categories

   Machine learning mechanisms are typically classified into three broad
   categories, depending on the nature of the learning "signal" or
   "feedback" available:

   Supervised learning  The machine learning mechanism is given labeled
      inputs and the corresponding desired outputs.  The mechanism can
      learn by itself a general rule that maps inputs to outputs.

   Unsupervised learning  The given inputs are not labeled.  It is left
      to the machine learning mechanism itself to find structure in its
      input.

   Reinforcement learning  The machine learning mechanism interacts with
      a dynamic environment in which it performs a certain task and
      receives feedback on its actions.

   Between supervised and unsupervised learning, there is
   semi-supervised learning, in which input data are partially labeled.

3.2.  Machine Learning Approaches

   There are a few basic machine learning approaches.  They can be
   combined to complete complicated tasks.

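   As a minimal sketch of how two approaches can be combined, the Python
   fragment below first screens a traffic sample with a simple anomaly
   test and then assigns it to a class.  It is illustrative only: the
   features (packet rate and mean packet size), the sample values, the
   z-score threshold, and all function names are assumptions made for
   this example, and the nearest-centroid rule stands in for whatever
   classifier a real deployment would choose.  Classification and
   anomaly detection are both among the approaches described in this
   section.

```python
import math
from statistics import mean, pstdev

# Hypothetical labeled training data, one entry per traffic class.
# Each sample is (packets_per_second, mean_packet_size_bytes).
TRAINING = {
    "web": [(120.0, 800.0), (150.0, 760.0), (130.0, 820.0)],
    "voice": [(50.0, 160.0), (48.0, 172.0), (52.0, 168.0)],
}


def centroid(points):
    """Return the per-feature mean of a list of equal-length tuples."""
    return tuple(mean(column) for column in zip(*points))


def train(labeled):
    """Supervised step: compute one centroid per labeled class."""
    return {label: centroid(points) for label, points in labeled.items()}


def classify(model, sample):
    """Classification: pick the class whose centroid is nearest.

    Features are used unscaled here; real data would normally be
    normalized so that no single feature dominates the distance.
    """
    return min(model, key=lambda label: math.dist(model[label], sample))


def is_anomaly(history, value, threshold=3.0):
    """Anomaly detection: flag values far from the historical mean.

    A value is anomalous when it lies more than `threshold` population
    standard deviations away from the mean of `history`.
    """
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def analyze(model, rate_history, sample):
    """Mixed pipeline: screen the packet rate, then classify."""
    if is_anomaly(rate_history, sample[0]):
        return "anomalous"
    return classify(model, sample)
```

   With a rate history of [100.0, 110.0, 90.0, 105.0, 95.0], calling
   analyze(train(TRAINING), history, (104.0, 780.0)) returns "web",
   while a sample of (5000.0, 64.0) is reported as "anomalous" before
   any classification is attempted.
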
   Classification  With training data that has been labeled into a
      number of classes, the machine learning mechanism can assign new
      unlabeled data to one or more of these classes.  An example is
      spam filtering, in which emails are classified into "spam" or
      "not spam" classes.

   Clustering  Without labeled training data, the machine learning
      mechanism divides data into groups.  The learning mechanism
      itself decides the number or structure of output classes.

   Regression  It estimates the relationships among variables.  The
      outputs are continuous.

   Anomaly detection  It detects specific data that do not conform to
      an expected pattern or to other data in a data set.

   Density estimation  The machine learning mechanism identifies the
      distribution of the input data.

   Dimensionality reduction  The machine learning mechanism simplifies
      inputs by mapping them into a lower-dimensional space.

   Decision tree learning  The learning output is structured into a
      decision tree as a predictive model.

   Association rule learning  The learning delivers potential relations
      between variables.

   Artificial neural networks  Also called "neural networks".  They are
      inspired by the structure and functions of biological neural
      networks.  They are built from a number of interconnected
      computational "neurons", each of which has an independent
      deciding ability.  The connections have numeric weights that can
      be tuned according to feedback and trends, making neural nets
      adaptive to inputs and capable of learning.

   Reinforcement learning  It is inspired by behaviorist psychology.
      The mechanism takes actions in an environment so as to maximize
      cumulative reward.

   Similarity and metric learning  It learns from training data a
      similarity function that measures how similar or related two
      objects are.

   Representation learning  Also called feature learning.
      It learns a feature, that is, a transformation of raw input data
      into a representation that can be effectively exploited in
      machine learning tasks.

   This is not an exhaustive list of machine learning approaches.
   Other approaches include support vector machines, Bayesian
   networks, inductive logic programming, sparse dictionary learning,
   and genetic algorithms, among others.

   Editor's note: the basic algorithms that machine learning approaches
   use may be listed as future work.  Such a list may be too detailed
   and too long to be included.

3.3.  Successful Applications

   Machine learning approaches have been successfully applied in many
   areas, such as human behavior analysis, image analysis, natural
   language recognition (including speech and handwriting processing),
   conversation simulation, medical diagnosis, structural health
   monitoring, stock market analysis, biological analysis and
   classification, loan and insurance evaluation, game playing, and
   many other applications.

   Network applications such as search engines, spam filtering,
   adaptive websites, Internet fraud detection, and online advertising
   have all benefited greatly from machine learning mechanisms.
   However, most of those success stories are at the application layer
   from a network perspective.

3.4.  Precondition of Applying Machine Learning Approach

   Although it differs from big data or data mining, machine learning
   also needs data.  However, machine learning can be applied to a
   small data set or to dynamic feedback from the environment.  The
   quality of the data determines the efficiency and accuracy of
   machine learning.

   There is no generic machine learning mechanism that is suitable for
   all or most use cases.  For each use case, the developers need to
   design a specific analysis path, which may combine multiple
   approaches or algorithms.
   The feature design and the analysis path design are the key factors
   in machine learning applications.

   To achieve autonomic decisions or minimize human intervention, there
   should be an evaluation system for the results of the machine
   learning mechanism.  The evaluation system could be a measurement of
   the effect observed once the results of the machine learning
   mechanism are executed.  Together, the evaluation system and the
   machine learning mechanism could form a closed decision loop for
   autonomic decisions.

3.5.  Limitation of Machine Learning Mechanism

   So far, machine learning mechanisms do not perform very well where
   exact results are required.  In most successful cases, machine
   learning is used as an assistant analysis tool.  Its results are
   usually accepted in fault-tolerant environments or with further
   human confirmation.

4.  Network Machine Learning Research Group in IRTF

   The Network Machine Learning Research Group (NMLRG) was formed as a
   proposed research group of the IRTF in October 2015.  (As a matter
   of procedure, a proposed research group may become a formal research
   group after a one-year provisional period.)  It provides a forum for
   researchers to explore the potential of machine learning techniques
   for networks.  In particular, the NMLRG will work on potential
   approaches that apply machine learning techniques in network
   control, network management, and supplying network data for
   upper-layer applications.

   The initial focus of the NMLRG will be on higher-layer concepts where
   machine learning mechanisms could be applied in order to enhance
   network establishment, control, and management, as well as network
   applications and customer services.
   This includes mechanisms to acquire knowledge from existing networks
   so that new networks can be established with minimum effort; the
   potential to use machine learning mechanisms for routing control and
   optimization; using machine learning mechanisms in network
   management to predict future network status; using machine learning
   mechanisms for autonomic and dynamic network management; using
   machine learning mechanisms to analyze network faults and support
   recovery; learning network attacks and their behaviors, so that
   protection mechanisms can be self-adapting; and unifying the data
   structure and the communication interface between networks or
   network devices and customers, so that upper-layer applications can
   easily obtain relevant network information.  The NMLRG is expected
   to identify and document requirements, to survey possible
   approaches, to provide specifications for proposed solutions, and to
   prove concepts with prototype implementations that can be tested in
   real-world environments.

   The more knowledge we have, the more intelligent we are.  It is the
   same for networks and network management.  Up to now, the only
   available network knowledge is usually the current network status
   inside a given device or relevant current status from other devices.
   However, historical knowledge is very helpful for making correct
   decisions, in particular to reduce network oscillation or to manage
   network resources over time.  Transplantable knowledge from other
   networks can be helpful when initially setting up a new network or
   new network devices.  Knowledge of relationships between network
   events and network configuration may help a network to decide the
   best parameters according to real performance feedback.
   In addition to such historical knowledge, powerful data analytics of
   current network conditions may also be a valuable source of
   knowledge that can be exploited directly.  Machine learning is the
   corresponding mechanism for learning and applying such knowledge
   intelligently.

5.  Use Cases Study of Applying Machine Learning in Network

   In 2016, the NMLRG is focusing on collecting and studying use cases
   that apply machine learning mechanisms to the network area.  More
   use cases are still being collected.

5.1.  Network Traffic

   Network traffic is one of the most important objects that need to be
   managed in the network/Internet area.

   Network traffic meets the preconditions for applying machine
   learning mechanisms.  It is full of data: the network traffic itself
   is a data source, and many properties of network traffic are
   measurable, such as latency, number of packets, and duration.
   Network traffic is complicated.  Its characteristics are often
   beyond the awareness of human operators.  Machine learning would
   greatly help to discover knowledge about network traffic.  Network
   traffic is also always changing dynamically.  It shows both
   regularity and irregularity.  Responding quickly to real-time
   network traffic is a big challenge for network management.  It is
   beyond the ability of human operators.  Rigid management has already
   become a bottleneck of current networks.  Machine learning could
   form a quick and adaptive automatic response management system.

   There are many different types of network traffic.  In April 2016,
   the second NMLRG meeting, held at IETF 95, was organized around the
   theme of network traffic.
   Multiple use cases were presented: HTTPS traffic classification;
   machine learning in the router (learning from and acting on network
   traffic); applications of machine learning to flow-based monitoring;
   malicious domains (automatic detection with DNS traffic analysis);
   machine-learning-based policy derivation and evaluation in broadband
   networks; and predicting interface failures for better traffic
   management.

   The NMLRG is currently working on a dedicated document on this
   theme.  That document may become an RG document and be published as
   an RFC in the future.

6.  Security Considerations

   This document focuses on higher-layer concepts for applying machine
   learning in networks, including, of course, applying machine
   learning to network security.  Therefore, it does not itself create
   any new security issues.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Acknowledgements

   The author would like to acknowledge the valuable comments made by
   participants in the IRTF Network Machine Learning Research Group,
   with particular thanks to Lars Eggert, Brian Carpenter, Albert
   Cabellos, Shufan Ji, Panagiotis Demestichas, Jerome Francois, Susan
   Hares, Rudra Saha, Dacheng Zhang and Bing Liu.

   This document was produced using the xml2rfc tool [RFC2629].

9.  Change log [RFC Editor: Please remove]

   draft-jiang-nmlrg-network-machine-learning-01: added a brief
   description of network traffic and ML to the use case study,
   2016-4-23.

   draft-jiang-nmlrg-network-machine-learning-00: original version,
   2015-10-19.

10.  Informative References

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999.

Author's Address

   Sheng Jiang
   Huawei Technologies Co., Ltd
   Q14, Huawei Campus, No.156 Beiqing Road
   Hai-Dian District, Beijing, 100095
   P.R. China

   Email: jiangsheng@huawei.com