Opsawg Working Group                                             X. Ding
Internet-Draft                                                    W. Liu
Intended status: Informational                                    Huawei
Expires: May 3, 2018                                               C. Li
                                                           China Telecom
                                                        October 30, 2017

        Network Data Use Case for Wavelength Division Service
              draft-ding-opsawg-wavelength-use-case-00

Abstract

This document describes use cases that demonstrate the applicability of network data to evaluating the performance of wavelength-division service. The objective of this draft is not to cover wavelength-division service in detail. Rather, the intention is to illustrate the requirements on the network data used to evaluate the performance of wavelength-division service.

General characteristics of network data and two typical use cases are presented in this document to demonstrate the different application scenarios of network data in wavelength-division service.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 3, 2018.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Conventions used in this document
3. Characteristics of network data
4. Use cases
   4.1. Anomaly detection
   4.2. Risk assessment
5. Data Issues
   5.1. Merge data from different time periods
6. Security Considerations
7. Conclusions
8. Normative References
Authors' Addresses

1. Introduction

Wavelength-division multiplexing (WDM) is a method of combining multiple signals on laser beams at various infrared (IR) wavelengths for transmission along fiber-optic media. A WDM system uses a multiplexer at the transmitter to join several signals together, and a demultiplexer at the receiver to split them apart. While a wavelength-division service is running, network data is continuously generated by the wavelength-division devices, and this data reflects the behavior of the running service.

In the case of wavelength-division service, customers are accustomed to handling a network failure only after the service has been interrupted. Such a reactive strategy is inefficient and easily leads to long service interruptions. Network data collected from devices is authentic and reliable, and it can help customers predict the trend of wavelength-division optical performance.
Statistical characteristics of network data can help operators judge at which points in time the service is abnormal or normal, and whether the service is risky or healthy.

This document attempts to describe the detailed use cases that lead to the requirements for supporting wavelength-division performance evaluation. The objective of this draft is not to cover wavelength-division service in detail. Rather, the intention is to illustrate the requirements on the network data used to evaluate the performance of wavelength-division service.

General characteristics of network data and two typical use cases are presented in this document to demonstrate the different application scenarios of network data in wavelength-division service. Moreover, the question of how to integrate network data collected over different time periods is raised.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

KPI: Key Performance Indicator. A network KPI represents the operational state of a network device, link, or network protocol in the network. KPI data is usually presented to users as a set of time series

   (e.g., KPI = x_i, i = 1..t),

where each time series corresponds to the values of one network KPI indicator at different points in time during a specific time period.

3. Characteristics of network data

Network data results from a process in which information is collected from various data sources and transmitted to one or more receiving devices for analysis tasks [I-D.ietf-wu-t2trg-network-telemetry]. Analysis tasks may include event correlation, anomaly detection, risk detection, performance monitoring, trend analysis, and other related processes.

Network data is a series of data points indexed in time order.
Taken over time, it may have an internal structure (such as a trend, seasonal variation, or outliers). Trend means that, on average, the measurements tend to increase (or decrease) over time. Seasonality means that there is a regularly repeating pattern of highs and lows related to calendar time, such as seasons, quarters, months, days of the week, and so on. In regression, outliers are far away from the fitted line; in time series data, outliers are far away from the other data points.

Network time series data analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.

Network data mainly has the following characteristics:

o Subject: The subject is the object to be measured, and it has multiple properties in different dimensions. For example, in a wavelength-division service performance monitoring scenario, the subject of the measurement is the 'optical module', whose attributes may include board name, device name, and so on.

o Measured values: A subject may have one or more measured values, and each measurement corresponds to a specific indicator. In the monitoring scenario above, the measured indicators may include FEC_bef (Forward Error Correction coding before error correction), FEC_aft (Forward Error Correction coding after error correction), input optical power, output optical power, etc.

o Timestamp: Each report of a measured value carries a timestamp attribute to indicate its time.

4. Use cases

The following sections highlight some of the most common wavelength-division use case scenarios and are in no way exhaustive.

4.1. Anomaly detection

In the Data Analytics Engine, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in the data.
Typically, the anomalous items translate to some kind of problem, such as an optical-layer problem.

For network equipment performance anomalies, multiple features are usually extracted from KPI data, such as time, value, and frequency, and used as the key factors for anomaly analysis.

Taking wavelength-division service as an example, collected information such as FEC_bef, input optical power, and laser bias current can be selected as key factors to track the wavelength-division service over time and to calculate device statistics over a specific time period, such as the average device downtime in a specified time window. These statistics can further be used to detect wavelength-division service anomalies or to improve the accuracy of wavelength-division KPI anomaly detection. In this scenario, we do not rely on a manually preconfigured threshold to trigger an alarm; instead, we automatically detect the KPI anomaly in advance and raise an alarm, as shown in Figure 1.

+---------+     +-----------+     +-----------+     +--------+
| Network |     | feature   |     | anomaly   |     | raise  |
| data    |---->| selection |---->| detection |---->| alarm  |
+---------+     +-----------+     +-----------+     +--------+

                  Figure 1: anomaly detection

4.2. Risk assessment

In the Data Analytics Engine, risk assessment is a component that aims to provide an estimation of the overall network risk condition. Unlike the anomaly detection component, which copes with network faults and failures that have already happened, the risk assessment module's goal is to anticipate network events and to forecast short-term changes and risks in the network based on the trends of the network data (e.g., fast growing, fast dropping, slowly increasing, or slowly decreasing KPI data). This opens up a channel to reveal potential network problems or to locate the need for network optimization and upgrade.
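As a rough illustration of the trend categories mentioned above (fast growing, fast dropping, slowly increasing, slowly decreasing), a recent window of KPI samples can be classified by the slope of a least-squares line fit. This sketch is not part of the service design; the function name and the threshold values are illustrative assumptions only, and in practice thresholds would be calibrated per KPI.

```python
# Illustrative sketch: classify the short-term trend of a KPI window
# by the slope of a least-squares line fitted to the samples.
# The thresholds below are assumed values for illustration only.

def kpi_trend(values, fast_threshold=1.0, slow_threshold=0.1):
    """Label a KPI window as fast/slowly growing or dropping, or stable."""
    n = len(values)
    if n < 2:
        return "stable"
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Least-squares slope: cov(x, y) / var(x)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope >= fast_threshold:
        return "fast growing"
    if slope <= -fast_threshold:
        return "fast dropping"
    if slope >= slow_threshold:
        return "slowly increasing"
    if slope <= -slow_threshold:
        return "slowly decreasing"
    return "stable"

print(kpi_trend([10, 12, 14, 16, 18]))   # fast growing
print(kpi_trend([5.0, 5.3, 5.6, 5.9]))   # slowly increasing
```

A real risk-assessment module would apply such a classifier per KPI over a sliding window and feed the labels into the scoring strategies described below.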
Network KPIs provide a fine-grained understanding of network performance, which brings more value to network maintenance and operation, including identifying possible bottlenecks, finding dimensioning issues, and locating the need to perform network optimization. Based on the various monitoring mechanisms, if any high risk occurs in the network, administrators can be informed at a very early stage. The ability to handle a large amount of noisy KPI data properly is vital to gaining these desired insights.

Given hundreds of thousands of KPI data series, assessing network risk is a challenging issue. Good network risk assessment criteria should be indicative of local network-level problems, and hence be able to provide prompt warnings and help locate potential problems when trivial but persistent anomalies are observed. Meanwhile, they must also describe system performance in a global sense by aggregating multi-faceted information from a large number of KPIs across the network infrastructure. There are two strategies for designing such a KPI-based network risk measure, as shown in Figure 2:

+---------+     +------------+     +------------+
| Network |     | single KPI |     | risk       |
| data    |---->| scoring    |---->| assessment |
+---------+     +-----+------+     +------------+
                      |                  ^
                      |                  |
                +-----v------+           |
                | multi-KPI  |           |
                | scoring    +-----------+
                +------------+

                Figure 2: risk assessment

1) Single KPI scoring: the scoring strategy for a single KPI. In this case, different dimensions of the KPI should be examined in order to score it;

2) Multi-KPI scoring: the scoring strategy for assessing the network risk using the values of many KPIs. If a device or a service is monitored by several key KPIs, the risk should be analyzed by integrating these KPI scores.

5. Data Issues

5.1. Merge data from different time periods

In the process of data collection, the collection period of the same KPI may differ from one device to another. For example, for a multi-domain deployment service, network devices may use many different collection periods, such as 30 seconds, 5 minutes, 15 minutes, and so on.

KPI data collected from different domains needs to be analyzed for correlation. For example, anomaly detection is performed on wavelength-division service data from different domains, and the results are compared across domains. We therefore need to merge data sets with different periods into one integrated data set using per-period metrics such as the mean value, peak value, or median value. This raises the question of how these data sets can be stored and accessed with high efficiency.

6. Security Considerations

TBD.

7. Conclusions

TBD.

8. Normative References

[I-D.ietf-wu-t2trg-network-telemetry]
          Wu, Q., "Network Telemetry and Big Data Analysis", March 2016.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

Xiaojian Ding
Huawei
101 Software Avenue, Yuhua District
Nanjing, Jiangsu  210012
China

Email: dingxiaojian1@huawei.com

Will (Shucheng) Liu
Huawei
Bantian, Longgang District
Shenzhen  518129
P.R. China

Email: liushucheng@huawei.com

Chen Li
China Telecom
No.118 Xizhimennei Street, Xicheng District
Beijing  100035
P.R. China

Email: lichen@ctbri.com.cn