Network Working Group                                           Y. Zheng
Internet-Draft                                              China Unicom
Intended status: Informational                                     S. Xu
Expires: September 14, 2017                                     D. Dhody
                                                     Huawei Technologies
                                                          March 13, 2017


           Usecases for Network Artificial Intelligence (NAI)
               draft-zheng-opsawg-network-ai-usecases-00

Abstract

   This document discusses the scope of Network Artificial Intelligence
   (NAI) and possible use cases that demonstrate the advantages of
   applying NAI.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  NAI Architecture
   3.  NAI Use Cases
     3.1.  Traffic Prediction and Re-Optimization/Adjustment
     3.2.  Route Monitoring and Analytics
     3.3.  Multilayer Fault Detection In NFV Framework
     3.4.  Data Center Network Use Cases
       3.4.1.  Service Function Chaining
   4.  Contributors
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgement
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Current networks have become much more dynamic and complex, and they
   pose new challenges for network management and optimization.  For
   example, network management and optimization should be automated to
   reduce human intervention (and thus to minimize operational
   expense).  Artificial Intelligence (AI) and Machine Learning (ML) are
   promising approaches to realize such automation, and for some tasks
   they can potentially outperform human operators.  Furthermore, the
   growing adoption of the Software-Defined Networking (SDN) paradigm
   makes the application of AI in networks practical, since the SDN
   controller has complete knowledge of the network status and can
   control the behavior of network nodes to implement AI decisions.

   AI and ML technologies can learn from historical data and make
   predictions or decisions, rather than strictly following static
   program instructions.  They can dynamically adapt to a changing
   situation and improve their own performance by learning from new
   data, which allows them to complete complicated tasks.  They also
   have potential in the network technology area, especially in
   combination with SDN and Network Function Virtualization (NFV).

   This document presents the concept of Network Artificial
   Intelligence (NAI).  It first discusses the scope of NAI and then
   describes some use cases that demonstrate the advantages of applying
   NAI.

2.  NAI Architecture

   The architecture of NAI is described in
   [I-D.li-rtgwg-network-ai-arch].  In this architecture, the central
   controller is the core part of Network Artificial Intelligence and
   can be called the 'Network Brain'.  Network Telemetry and Analytics
   (NTA) engines can be introduced alongside the central controller.
   An NTA engine includes a data collector, an analytics framework, data
   persistence, and NAI applications.

                   ^                                   ^
                (4)|                                   |(4)
   +---------------|--------------+    +---------------|--------------+
   | Domain 1      |              |    |               |     Domain 2 |
   |        +------------+        |    |        +------------+        |
   |        |  Central   |        |    |        |  Central   |        |
   |     (1)| Controller |----------------------| Controller |(1)     |
   |        |    with    |        |    |        |    with    |        |
   |        |    NTA     |        |    |        |    NTA     |        |
   |        +------------+        |    |        +------------+        |
   |          /        \          |    |          /        \          |
   |      (3)/          \         |    |         /          \(3)      |
   |        /            \        |    |        /            \        |
   | +--------+        +--------+ |    | +--------+        +--------+ |
   | |        |        |        | |    | |        |        |        | |
   | |Network | ...... |Network | |    | |Network | ...... |Network | |
   | | Device |  (2)   | Device | |    | | Device |  (2)   | Device | |
   | |   1    |        |   N    | |    | |   1    |        |   N    | |
   | +--------+        +--------+ |    | +--------+        +--------+ |
   |                              |    |                              |
   +------------------------------+    +------------------------------+

     Figure 1: An Architecture of Network Artificial Intelligence (NAI)

3.  NAI Use Cases

3.1.  Traffic Prediction and Re-Optimization/Adjustment

   This subsection introduces the Path Computation Element (PCE)
   [RFC4655] use cases in wide area networks (WAN).  In the PCE
   scenario, network data collection is realized through control plane
   protocols such as the PCE Communication Protocol (PCEP) and BGP-LS
   [RFC7752], and the collected data are passed to the PCE application.
   PCEP receives the state of Label Switched Paths (LSPs) from the
   network, and BGP-LS receives the topology information from the
   network.  If network telemetry is
If network telemetry is 142 used, traffic information can be received from the network as well 143 directly at the NTA engine using protocols such as gRPC. 145 PCE application (APP) only maintains the latest information. To 146 enable NAI, history of all LSP and topology changes is stored in 147 external data repository. Further traffic monitoring data could also 148 be collected and stored, if network telemetry is used. There are two 149 usecases in the application scenarios: (1) reroute/re-optimize using 150 the historical trend and predications from AI; (2) traffic congestion 151 avoidance and AI-enabled auto-bandwidth adjustment. 153 For the usecase (1), the analytics component in NTA (Network 154 Telemetry and Analytics), can use stored data to build models to 155 predict impact of network events and state of the LSPs. For example, 156 it can use historical trends to guide path computation to include/ 157 exclude specific links. Finding correlations between data, finding 158 anomalies and data visualization are also possible. 160 The analytics component in NTA can also use stored data to detect and 161 predict network events and request PCE to take necessary actions. 162 For example, it can use network bandwidth utilization historical 163 trends to request for re-optimizations. 165 For the usecase (2), with network telemetry, the NTA can collect per- 166 link and per-LSP traffic flow using gRPC from network. Such network 167 telemetry data includes statistics for tunnels, links, bandwidth 168 reservations, actual usage, delay, jitter, packet loss, etc. 169 Meanwhile, it also collects data regarding network events and its 170 impact on traffic flows. The analytics component can use telemetry 171 data to build traffic models to predict traffic congestion when new 172 years or sporting events are coming. According to the congestion 173 prediction, the PCE app could reroute traffic to avoid congested 174 links. 

   Beyond congestion avoidance, the NTA engine can also perform
   predictions and make the necessary changes to the network.  In
   particular, the PCE APP can perform bandwidth usage prediction (i.e.,
   bandwidth calendaring) by looking at the historical trends of all
   sampled data instead of only the most recent samples.  The collected
   data are the Traffic Engineering Database (TEDB) and the LSP Database
   (LSP-DB), and can also include scheduling information.  In addition,
   the collected data include auto-bandwidth changes that occurred
   during particular network events.  Using machine learning
   algorithms, the analytics component can correlate such changes with
   the events, and thereby predict network events and their impact.

3.2.  Route Monitoring and Analytics

   This subsection introduces the BGP Monitoring Protocol (BMP)
   [RFC7854] use case in wide area networks (WAN).  BGP is known for its
   flexibility and its ability to manage a large number of neighbors
   and routes.  It is also the basis for many overlay services such as
   L3VPN and L2VPN.  The controller can use BMP to monitor BGP neighbor
   status and routing information on the routers.

   According to [RFC7854], the BMP client located in the router collects
   BGP neighbor status, the routes for each neighbor, and events defined
   by the user.  It then passes this information via BMP to the
   management station located on the controller.  Based on BMP
   monitoring of BGP, there are three use cases: (1) BGP route leak
   monitoring; (2) BGP hijack monitoring; (3) traffic analytics.

   Route leaks involve the illegitimate advertisement of prefixes
   (blocks of IP addresses), which then propagate across networks and
   lead to incorrect or suboptimal routing.  For case (1), NAI
   applications can analyze BGP route leaks based on BMP data.
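
   For case (1), one well-known rule-based way to spot a likely route
   leak is to check each received AS path against the "valley-free"
   export model: a route that has travelled down to a customer or across
   a peering link should never travel up to a provider or across another
   peering link.  The following sketch applies that check to an AS path
   such as one carried in a BMP route monitoring message.  It is an
   illustration, not this document's method: it is a fixed rule rather
   than an ML model, it assumes AS relationships are already known (for
   example, inferred from public AS-relationship data), and the function
   and type names are hypothetical.

```python
from enum import Enum
from itertools import groupby


class Rel(Enum):
    """What AS b is to AS a when a sends an announcement to b."""
    PROVIDER = "provider"  # a exports to its provider ("uphill")
    PEER = "peer"          # a exports to a settlement-free peer
    CUSTOMER = "customer"  # a exports to its customer ("downhill")


def find_leaker(local_as, as_path, rel):
    """Return the first AS whose export violates the valley-free rule, or None.

    local_as: AS of the monitored router that received the route.
    as_path:  AS_PATH as received (nearest AS first, origin AS last).
    rel:      {(a, b): Rel} relationships, assumed to be known in advance.
    """
    # Collapse AS-path prepending (consecutive duplicate ASNs).
    hops = [asn for asn, _ in groupby([local_as, *as_path])]
    hops.reverse()  # propagation order: origin first, receiver last
    may_go_up = True  # stays True until the route goes down or across a peer
    for a, b in zip(hops, hops[1:]):
        r = rel.get((a, b))
        if r is None:
            continue  # unknown relationship: this hop cannot be judged
        if r is Rel.CUSTOMER:
            may_go_up = False
        elif not may_go_up:
            return a  # a exported a provider/peer route up or sideways
        elif r is Rel.PEER:
            may_go_up = False
    return None


if __name__ == "__main__":
    rel = {
        (65001, 65002): Rel.PROVIDER,  # 65001 is a customer of 65002
        (65002, 65003): Rel.CUSTOMER,  # 65003 is a customer of 65002
        (65003, 65004): Rel.PROVIDER,  # 65003 is also a customer of 65004
    }
    print(find_leaker(65003, [65002, 65001], rel))                # None
    print(find_leaker(65004, [65003, 65003, 65002, 65001], rel))  # 65003
```

   Hops with unknown relationships are skipped, so this check can miss
   leaks; a model trained on BMP history, as this section suggests,
   could complement such a rule.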

   For case (2), by manipulating BGP, an attacker can have traffic
   rerouted in the attacker's favor, allowing the attacker to intercept
   or modify it.  If the malicious announcement is more specific than
   the legitimate one, or claims to offer a shorter path, traffic may be
   directed to the attacker.  By broadcasting false announcements, a
   compromised router may poison the Routing Information Base (RIB) of
   its peers.  After one peer has been poisoned, the malicious routing
   information can propagate to other peers, to other Autonomous
   Systems, and onto the wider Internet.  Based on monitoring of BGP
   routes, ML algorithms can be trained to determine when a hijack has
   taken place so that the necessary actions can be taken.

   In case (3), with BMP providing BGP changes and telemetry providing
   network traffic information, NAI applications can analyze traffic
   trends, predict traffic changes, and optimize traffic.

3.3.  Multilayer Fault Detection In NFV Framework

   The high reliability and high availability required for carrier-class
   applications are a big challenge in virtualized, software-based
   environments, where failures are to be expected.  The
   interdependence between NFV's abstraction levels and virtual
   resources is complex, as shown in Figure 2.  The dynamic
   characteristics of resources in the cloud environment make it
   difficult to locate faults.  Multilayer fault detection for NFV
   networks and cloud environments is therefore very useful.
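
   The cross-layer dependency described above (VNFs running on virtual
   resources, which in turn run on hardware) suggests a simple way to
   narrow down a fault: flag components whose KPIs deviate from their
   history, then report only the flagged components whose own
   dependencies are all healthy, since a lower-layer fault tends to
   show up as symptoms in the layers above it.  The following sketch is
   a minimal illustration under assumptions: one KPI per component, a
   z-score test standing in for the learned models this section
   envisions, a dependency map supplied by the operator or
   orchestrator, and hypothetical component names and values.

```python
from statistics import mean, stdev


def is_anomalous(history, current, z_threshold=3.0):
    """Flag a KPI sample more than z_threshold std devs from its history."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = stdev(history)
    if sigma == 0:
        return current != history[0]
    return abs(current - mean(history)) / sigma > z_threshold


def root_causes(depends_on, anomalous):
    """Anomalous components with no anomalous (direct or indirect) dependency.

    depends_on: {component: [components it runs on]}, e.g. VNF -> virtual
                resource -> hardware.
    """
    def has_anomalous_dependency(node, seen):
        for dep in depends_on.get(node, ()):
            if dep in seen:
                continue  # guards against cycles and shared dependencies
            seen.add(dep)
            if dep in anomalous or has_anomalous_dependency(dep, seen):
                return True
        return False

    return {n for n in anomalous if not has_anomalous_dependency(n, set())}


if __name__ == "__main__":
    depends_on = {
        "VNF1": ["vCompute1", "vNetwork1"],
        "VNF2": ["vCompute1"],
        "vCompute1": ["hostA-cpu"],
        "vNetwork1": ["hostA-nic"],
    }
    kpis = {  # component: (KPI history, current sample); made-up values
        "VNF1": ([20, 22, 21, 19], 80),
        "VNF2": ([30, 31, 29, 30], 90),
        "vCompute1": ([40, 42, 41, 39], 97),
        "vNetwork1": ([10, 11, 10, 9], 10),
        "hostA-cpu": ([50, 52, 49, 51], 99),
        "hostA-nic": ([5, 5, 6, 5], 5),
    }
    flagged = {c for c, (hist, cur) in kpis.items() if is_anomalous(hist, cur)}
    print(sorted(flagged))  # ['VNF1', 'VNF2', 'hostA-cpu', 'vCompute1']
    print(root_causes(depends_on, flagged))  # {'hostA-cpu'}
```

   Reporting the lowest anomalous layer reflects the intuition that, for
   example, a saturated host CPU explains the degraded VNFs above it; in
   practice several candidates may be reported and would need ranking.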

                    +--------------------+
                    |      Central       |
                    |     Controller     |
                    |        with        |
                    |        NTA         |
                    +--------------------+
                        |      |      |
                        |      |      |
                        |      |      |
                        V      V      V
   +-------------------------------------------------------+
   |                                                       |
   |      +-----------+  +-----------+  +-----------+      |
   |      |   VNF1    |  |   VNF2    |  |   VNF3    |      |
   |      +-----|-----+  +-----|-----+  +-----|-----+      |
   |            |              | VN-NF        |            |
   |    +-------|--------------|--------------|-------+    |
   |    | NFVI                                        |    |
   |    | +-----------+  +-----------+  +-----------+ |    |
   |    | |  Virtual  |  |  Virtual  |  |  Virtual  | |    |
   |    | | Computing |  |  Storage  |  |  Network  | |    |
   |    | +-----------+  +-----------+  +-----------+ |    |
   |    | +-----------------------------------------+ |    |
   |    | |          VIRTUALIZATION LAYER           | |    |
   |    | +--------------------|--------------------+ |    |
   |    |                VI-Ha |                      |    |
   |    |+---------------------|---------------------+|    |
   |    ||            Hardware Resources             ||    |
   |    ||+-----------+  +-----------+  +-----------+||    |
   |    ||| Computing |  |  Storage  |  |  Network  |||    |
   |    ||| Hardware  |  | Hardware  |  | Hardware  |||    |
   |    ||+-----------+  +-----------+  +-----------+||    |
   |    |+-------------------------------------------+|    |
   |    +---------------------------------------------+    |
   |                                                       |
   +-------------------------------------------------------+

              Figure 2: NAI in Multi-Layer NFV Framework

   For the virtualization layer, CPU performance, memory usage,
   interface bandwidth, and other Key Performance Indicators (KPIs) can
   be monitored.  At the same time, the resource occupancy and life
   cycle of NFV software processes can also be monitored.  Through NAI,
   the relevant statistical data from multiple layers can be analyzed,
   and models can be set up to locate the root cause of possible faults
   in the multi-layer environment.

3.4.  Data Center Network Use Cases

   Traditionally, data center networks have comprised a large number of
   switches and routers that direct traffic based on the limited view of
   each device.
   With the help of SDN/NFV, data center networks become more agile and
   dynamic in responding to changing usage and traffic patterns.
   Real-time traffic and usage data can be used to make data center
   management and operations intelligent.

   Various protocols such as sFlow and IPFIX can be used to obtain port
   statistics as well as traffic samples.  Over time, this information
   can help build traffic usage models on a per-port and per-flow
   basis.  With historical data as the base, the NTA engine can predict
   traffic usage and issue the necessary instructions to the SDN
   controller or NFV orchestrator.  For example, these instructions
   could reroute a flow to avoid a congested port, or scale out by
   adding another switch to share the load based on the predicted
   traffic demand.

   The NTA engine should find correlations between the various network
   data to build models and predict the impact of network events,
   congestion, network utilization patterns, etc.  Furthermore, the NTA
   engine could detect anomalies based on historical patterns and help
   in root cause analysis.  The policy framework can be enhanced to
   take the analytics into account.

   The NTA engine could also obtain usage and health information from
   the hosts (servers).  Correlating this information with the
   information received from the network could help in finding security
   issues and anomalies when the two sources do not match.

3.4.1.  Service Function Chaining

   This subsection introduces how NAI can be applied to the Service
   Function Chaining (SFC) scenario to intelligently reroute and
   re-optimize service chains, to increase the utilization of both
   Service Functions (SFs) and the network, and to intelligently select
   the Service Function Path (SFP) based on data traffic trends.
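
   One simple way to make SFP selection follow traffic trends is to
   forecast the load of every SF, SFF, and link from its recent samples
   and to prefer the candidate SFP whose most heavily loaded element is
   predicted to be least loaded.  The following sketch does this with an
   exponentially weighted moving average (EWMA).  It is illustrative
   only: the EWMA stands in for the statistical models discussed in this
   section, the smoothing factor is arbitrary, and the SFP and element
   identifiers are hypothetical.

```python
def ewma_forecast(samples, alpha=0.5):
    """One-step-ahead load forecast using an exponentially weighted moving
    average; samples are ordered oldest first."""
    if not samples:
        raise ValueError("at least one sample is required")
    level = samples[0]
    for x in samples[1:]:
        level = alpha * x + (1 - alpha) * level  # weight recent samples more
    return level


def select_sfp(candidate_paths, history, alpha=0.5):
    """Return the SFP whose most loaded element has the lowest forecast load.

    candidate_paths: {sfp_id: [element, ...]}; elements are SFs, SFFs or
                     links, and every element must appear in history.
    history:         {element: [utilization samples, oldest first]}.
    """
    forecast = {e: ewma_forecast(s, alpha) for e, s in history.items()}

    def bottleneck(sfp_id):
        return max(forecast[e] for e in candidate_paths[sfp_id])

    return min(candidate_paths, key=bottleneck)


if __name__ == "__main__":
    history = {
        "fw1": [0.2, 0.4, 0.9],  # rising trend
        "fw2": [0.5, 0.5, 0.5],
        "sff1": [0.3, 0.3],
        "sff2": [0.4, 0.4],
        "nat1": [0.1, 0.2],
    }
    paths = {"sfp-a": ["sff1", "fw1", "nat1"], "sfp-b": ["sff2", "fw2", "nat1"]}
    print(select_sfp(paths, history))  # sfp-b
```

   Using the most loaded element along the path (rather than, say, the
   sum of loads) reflects that a single overloaded SF or SFF usually
   bounds the performance of the whole chain; delay, jitter, and loss
   could be weighed in the same way.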

   As per [RFC7665], Service Function Chaining (SFC) enables the
   creation of composite (network) services that consist of an ordered
   set of SFs that must be applied for specific treatment of received
   packets and/or frames and/or flows selected as a result of
   classification.  The SFs of a chain are connected using Service
   Function Forwarders (SFFs), which are responsible for forwarding
   traffic to one or more connected SFs according to information carried
   in the SFC encapsulation, as well as handling traffic coming back from
   the SF.

   Network telemetry information such as delay, jitter, and packet loss
   from the network, and CPU/memory utilization from the SFs, can be
   collected using sFlow or gRPC and stored in a persistent data
   repository.  The analytics component in the NTA engine can use the
   stored data to build statistical models that predict the impact of
   network events, traffic, and SFP state on the various SFPs, and
   instruct the SDN controller to take the necessary actions.  The SDN
   controller can then calculate new paths or reroute the SFC path to
   avoid congested ports/SFFs or overloaded SFs.  Correlating
   application analytics from the SFs with network analytics from the
   SFFs could enhance the intelligent management of service chains for
   operators.

   Usage and traffic patterns observed over time can help increase the
   utilization of the SFs as well as of the underlay network.

4.  Contributors

   The following people have substantially contributed to the use cases
   of NAI:

      Lizhao You
      Huawei
      Email: youlizhao@huawei.com

      Kalyankumar Asangi
      Huawei
      Email: kalyana@huawei.com

5.  Security Considerations

   TBD

6.  IANA Considerations

   This document has no actions for IANA.

7.  Acknowledgement

   Thanks to Li Zhenbin and Liu Shucheng for their comments and
   contributions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

8.2.  Informative References

   [I-D.li-rtgwg-network-ai-arch]
              Li, Z. and J. Zhang, "An Architecture of Network
              Artificial Intelligence (NAI)",
              draft-li-rtgwg-network-ai-arch-00 (work in progress),
              October 2016.

   [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation
              Element (PCE)-Based Architecture", RFC 4655,
              DOI 10.17487/RFC4655, August 2006.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
              Chaining (SFC) Architecture", RFC 7665,
              DOI 10.17487/RFC7665, October 2015.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
              Monitoring Protocol (BMP)", RFC 7854,
              DOI 10.17487/RFC7854, June 2016.

Authors' Addresses

   Yi Zheng
   China Unicom
   No.9, Shouti Nanlu, Haidian District
   Beijing  100048
   China

   Email: zhengyi39@chinaunicom.cn


   Xu Shiping
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing  100095
   P.R. China

   Email: xushiping7@huawei.com


   Dhruv Dhody
   Huawei Technologies
   Divyashree Techno Park, Whitefield
   Bangalore, Karnataka  560066
   India

   Email: dhruv.ietf@gmail.com