Network Working Group                                              Q. Wu
Internet-Draft                                              J. Strassner
Intended status: Informational                                    Huawei
Expires: September 10, 2016                                    A. Farrel
                                                      Old Dog Consulting
                                                                L. Zhang
                                                                  Huawei
                                                           March 9, 2016

              Network Telemetry and Big Data Analysis
              draft-wu-t2trg-network-telemetry-00

Abstract

   This document focuses on network measurement and analysis in the
   network environment.  It first defines network telemetry, describes
   an exemplary network telemetry architecture, and then explores the
   characteristics of network telemetry data.  It ends with detailing a
   set of issues with retrieving and processing network telemetry data.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  The definition of Network Telemetry
   3.  Network Telemetry architecture
   4.  Measurement data Characteristics
   5.  Issues
     5.1.  Data Fetching Efficiency
     5.2.  Existing Network Level Metrics Inefficiency issue
     5.3.  Measurement data format consistency issue
     5.4.  Data Correlation issue
     5.5.  Data Synchronization Issues
   6.  Informative References
   Appendix A.  Network Telemetry data source Classification
   Appendix B.  Existing Network Data Collection Methods
     B.1.  Network Log Collection
       B.1.1.  Text based data collection
       B.1.2.  SNMP Trap
       B.1.3.  Syslog based Collection
     B.2.  Network Traffic Collection
     B.3.  Network Performance Collection
     B.4.  Network Faults Collection
     B.5.  Network Topology data Collection
     B.6.  Other Data Collection
   Authors' Addresses

1.  Introduction

   Today, billions of devices connect to the Internet and to VPNs,
   forming a rich ecosystem of connectivity.  Our daily lives have also
   been greatly changed by the large number of IoT and mobile
   applications built on top of it (e.g., smart tags on everyday
   objects, wearable health monitoring sensors, smartphones,
   intelligent cars, and smart home appliances).  However, the growing
   number of connected devices and the proliferation of web and
   multimedia services also have a great impact on the network.
   Examples include:

   o  The massive scale and highly dynamic nature of IoT and mobile
      applications (e.g., interaction with other things at any time and
      in any location)

   o  The increasingly vast amounts of data gathered from the network
      environment at varying speeds and with different degrees of
      accuracy, and the new communication patterns created

   o  The disparate types of pre- and post-processing necessary to
      understand the meaning and context (e.g., semantics) of measured
      data

   As a result, the network may be subject to more incidents and
   unregulated changes.  Without better network visibility and a good
   view of the available network resources and topology, it is not easy
   to:

   o  schedule network resources to adapt to near real-time service
      demands

   o  measure network performance and assess network quality as a whole

   o  provide quick network diagnosis, prove network innocence when
      application quality gets worse, or identify which parts of the
      network may cause problems if a network glitch or service
      interruption happens.

   In this document, we first define network telemetry in the context
   of the network environment, followed by an exemplary architecture
   for collecting and processing telemetry data.  We then explore the
   characteristics of network telemetry data, and end with describing a
   set of issues with retrieving and processing network telemetry data.

2.  The definition of Network Telemetry

   Network Telemetry describes how information from various data
   sources can be collected using a set of automated communication
   processes and transmitted to one or more receiving entities for
   analysis tasks.  Analysis tasks may include event correlation,
   anomaly detection, performance monitoring, metric calculation, trend
   analysis, and other related processes.
3.  Network Telemetry architecture

   A Network Telemetry architecture describes how different types of
   Network Telemetry data are transmitted from different network
   sources and received by different collection entities.  In an ideal
   network telemetry architecture, the ability to collect data should
   be independent of any specific application and vendor limitations.
   This means that protocol and data format translation are required,
   so that a normalized form of data can be used to simplify the
   various analysis and processing tasks required.

   The Network Telemetry architecture is made up of the following three
   key functional components:

   o  Data Source:  The Data Source can be any type of network device
      that generates data.  Examples include the management system that
      accesses IGP/BGP routing information, network inventory,
      topology, and resource data, as well as other types of
      information that provide data to be measured and/or contextual
      information to better understand the network telemetry data.

   o  Data Collector:  The Data Collector may be part of a control
      and/or management system (e.g., NMS/OSS, SDN Controller, or OAM
      system) and/or a dedicated set of entities.  It gathers data from
      various Data Sources and performs processing tasks to feed raw
      and/or processed data to the Data Analyzer.

   o  Data Analyzer:  The Data Analyzer processes data from various
      Data Collectors to provide actionable insight.  This ranges from
      generating simple statistical metrics to inferring problems to
      recommending solutions to said problems.

   Figure 1 shows an exemplary architecture for network telemetry and
   analysis.
                        +----------------------+
                        | Policy-based Manager |
                        +----------+-----------+
                                  / \
                                   |
                                   |
        +--------------------------+-------------------------+
        |                          |                         |
        |                          |                         |
       \ /                        \ /                       \ /
+----------------+        +--------+-----------+        +----+-----+
| Data Analyzer, |/      \| Data Fusion,       |/      \| Decision |
| Normalizer,    +--------+ Analytics,         +--------+ Logic    |
| Filter, etc.   |\      /| and other Apps     |\      /| and Apps |
+--------+-------+        +---+------------+---+        +----------+
        / \                  / \          / \
         |                    |            |
         |                    |            |
        \ /                  \ /          \ /
+--------+-------------+ +----+----+  +----+----+
| Data Abstraction and | | Other   |  | Other   |
| Modeling Software    | | OT Data |  | IT Data |
+------+--------+------+ +---------+  +---------+
      / \      / \
       |        |
       |        |
      \ /      \ /
  +----+--------+-----+
  |  Data Collectors  |
  +----+---------+----+
      / \       / \
       |         |
       |         |
       |        \ /
       |    +----+------------+        +-----------+
       |    | Edge Software   |/      \| Temporary |
       |    | (analysis &     +--------+ Data      |
       |    | transformation) |\      /| Storage   |
       |    +------------+----+        +-----------+
       |                / \
       |                 |
       |                 |
      \ /               \ /
+------+-------+ +-------+------+
| Data Sources | | Data Sources |
+--------------+ +--------------+

        Figure 1: Network Telemetry and Analysis Architecture

   o  Data Abstraction and Modeling Software:  This component uses an
      overarching information model to define relevant terms, objects,
      and values that all components in the Network Telemetry
      Architecture can use.

   o  Edge Software refers to performing compute, storage, and/or
      networking functions on nodes at the edges of a network.  This
      enables processing of data to occur at or near the source of the
      data.  Figure 1 shows that some information from some Data
      Sources may be sent directly to Data Collectors, while other
      data may be sent first to Edge Software for further processing
      before it is consumed by Data Collectors.

   o  Policy-based Manager.
      This component is responsible for managing different aspects of
      the Network Telemetry Architecture in a distributed and
      extensible manner through the use of a set of policies that
      govern the behavior of the system.  Examples include defining
      rules that determine what data to collect when, where, and how,
      as well as defining rules that, given a specific context,
      determine how to process collected data.

   This reference architecture assumes that Data Collectors can choose
   different measurement data formats to gather measurement data, and
   different protocols to transmit said data; the Data Abstraction and
   Modeling Software normalizes collected data into a common form.
   Both the Data Collector and the Data Analyzer may support data
   filtering, correlation, and other types of data processing
   mechanisms.  In the above architecture, bi-directional communication
   is shown for generality.  This may be implemented in a number of
   different ways, such as using a request-response mechanism, a
   publish-subscribe mechanism, or even as a set of uni-directional
   (e.g., push and pull) requests.

4.  Measurement data Characteristics

   Measurement data is generated from different data sources and has
   varying characteristics, including (but not limited to) the
   following:

   o  Measurement data can be any of network performance data, network
      logging data, network warning and defect data, network statistics
      and state data, and network resource operation data (e.g.,
      operations on RIBs and FIBs [RFC4984]).

   o  Most measurement data are monitored state data rather than
      configuration data.  However, on occasion, network configuration
      data may also be included (e.g., to establish context for the
      measurement data).

   o  In many cases, telemetry data requires real-time delivery with
      high-throughput, multi-channel data collection mechanisms.
   o  In most cases, the required frequency of access to monitored
      state data is extremely high.

5.  Issues

5.1.  Data Fetching Efficiency

   Today, the existing data fetching methods (see Appendix B) prove
   insufficient due to the following factors:

   o  The existing network management protocols are neither dedicated
      to nor sufficient for data collection.  For example, NETCONF
      focuses on network configuration and can only retrieve
      operational data on request.

   o  SNMP relies on periodic fetching.  Periodic fetching of data is
      not an adequate solution for many types of applications (e.g.,
      applications that require frequent updates to the stored data).
      In addition, it adds significant load on participating networks,
      devices, and applications.

   o  We increasingly rely on RPC-style interactions [RFC5531] to
      fetch data on demand by an application.  However, most
      applications are interested in updates of the data or changes to
      the data.

   o  Once a data fetching protocol is selected, human-readable
      formats such as XML and JSON can encode structured data in a way
      that can be parsed without knowing the schema; however, such
      encodings lack efficiency on the wire.

5.2.  Existing Network Level Metrics Inefficiency issue

   Quality of Service (QoS) and Quality of Experience (QoE) assessment
   [RFC7266] of multimedia services has been well studied in ITU-T SG
   12.  Media quality is commonly expressed in terms of MOS (Mean
   Opinion Score) [RFC3611][G107].  MOS is typically rated on a scale
   from 1 to 5, in which 5 represents excellent and 1 represents
   unacceptable.  When multimedia application quality becomes bad, it
   is hard to know whether this is a network problem or an
   application-specific problem (e.g., codec type, coding bit rate,
   packetization scheme, loss recovery technique, or the interaction
   between transport problems and application-layer protocols).
   To determine whether the network is at fault, or how serious a
   network event or interruption is, a network health index, network
   key performance indicators (KPIs), or key quality indicators (KQIs)
   become important.

   However, QoS/QoE assessment of generic network services, whether or
   not dependent on the underlying network technology (e.g., MPLS,
   IP), is not well studied or defined by any body or organization.
   The QoS/QoE of generic network services requires a set of
   appropriate network performance, reliability, or other metric
   definitions.  This may take the form of key quality and/or
   performance indicators, ranging from high-level metrics (e.g.,
   dropped calls) to low-level metrics (e.g., packet loss, delay, and
   jitter).  IP service performance parameters are defined in ITU-T
   Y.1540 [Y1540]; however, these existing network performance metrics
   are proving insufficient due to several factors:

   o  These transport-specific metrics are defined for specific
      technologies.  For example, the network performance parameters
      in Y.1540 are only designed for IP networks and do not apply to
      connection-oriented networks, such as an MPLS-TP network.

   o  Not all the metrics are end-to-end performance metrics at the
      network level.  For example, the TE performance metrics defined
      in IS-IS TE [RFC5305] are only defined for per-link usage.

   o  These transport-specific metrics are all single-objective
      metrics; there are no transport-specific metrics defined as
      multi-objective metrics.  For example, IP packet Transfer Delay
      (IPTD) is a single-objective metric and cannot be used to
      measure similar and important performance behaviors such as IP
      packet Delay Variation [Y1541].

   o  Different services have different performance requirements.  It
      is hard to measure network QoS to satisfy all possible services
      using a single metric.
   o  Transport-specific metrics are not applied to the whole network,
      but to a specific flow passing through the network corresponding
      to matched QoS classes.

   o  If there are multiple paths from source to destination in the IP
      network, then transport-specific metrics change with the path
      selected, and it may also be hard to know which path a packet
      will traverse.

5.3.  Measurement data format consistency issue

   The data format is typically vendor- and device-specific.  This
   also means that different commands, having different syntax and
   semantic characteristics and using different protocols, may have to
   be issued to retrieve the same type of data from different devices.

   The Data Analyzer may need to ingest data in a specific format that
   is not supported by the Data Collectors that service it.  For
   example, the ALTO data format used between a Data Source and a Data
   Collector generates an abstracted network topology and provides it
   to network-aware applications (i.e., a Data Analyzer) over a web-
   service-based API [I-D.wu-alto-te-metrics].  In this case, prefix
   data in the network topology information needs to be generated into
   ALTO Network Maps, and TE (topology) data needs to be generated
   into ALTO Cost Maps.  To provide better data format mapping, the
   ALTO Network Map and Cost Map need to be modeled in the same way as
   the prefix data and TE data in the network topology information.
   However, these data use different data formats and do not have a
   common model structure to represent them in a consistent way.

   This is why the architecture shown in Figure 1 has a "Data
   Abstraction and Modeling Software" component.  This component
   normalizes all data received into a common format for analysis and
   processing by the Data Analyzer.
   If this component is not present, then the Data Analyzer would, at
   a minimum, have to deal with m vendor devices times n versions of
   software for each device.  Furthermore, different protocols have
   different capabilities and may or may not be able to transmit and
   receive different types of data.  The Data Abstraction and Modeling
   Software component can provide information that defines the
   structure of the data that should be received; this can be useful
   for checking for incomplete collection data as well as missing
   collection data.

5.4.  Data Correlation issue

   To provide consistent configuration, reporting, and representation
   of OAM information, the LIME YANG model
   [I-D.ietf-lime-yang-oam-model] is proposed to correlate defects,
   faults, and network failures between the different layers,
   regardless of network technologies.  This helps improve the
   efficiency of fault detection and localization and provides better
   OAM visibility.

   Today we see large amounts of data collected from different data
   sources.  These data can be network log data, network event data,
   network performance data, network fault data, network statistics
   state, or network operation state.  However, these data are only
   meaningful if they are correlated in time and space.  In
   particular, useful trend analysis and anomaly detection depend on
   proper correlation of the data collected from the different Data
   Sources.  In addition, correlating different types of data from
   different Data Sources in time or space can provide better network
   visibility, but such correlation is still a challenging issue.

5.5.  Data Synchronization Issues

   When retrieving data from Data Sources or Data Collectors,
   synchronizing the same type of data between a Data Source and a
   Data Collector, or between a Data Collector and a Data Analyzer, is
   complicated:
   o  Arranging for source and destination to be synchronized is hard,
      especially when multiple Data Sources feed one Data Collector,
      or multiple Data Collectors feed one Data Analyzer.

   o  Aggregating data from different Data Sources and synchronizing
      the data to the Data Analyzer is also not an easy task.

   The reference architecture of Figure 1 defines a "Policy-based
   Manager" to manage how, when, where, and by which devices the set
   of data is collected.  This component provides mechanisms that help
   ensure that needed information is collected by the appropriate
   components of the Network Telemetry Architecture.  It also
   facilitates the synchronization of the different components that
   make up the Network Telemetry Architecture, since these are likely
   distributed throughout one or more networks.

   It also provides a mechanism for the Data Analyzer, or other
   applications (e.g., the "Data Fusion, Analytics, and other Apps"
   and the "Decision Logic and Apps" components in Figure 1), to
   provide information to the Policy-based Manager in the form of
   feedback (e.g., see [I-D.strassner-anima-control-loops]).

6.  Informative References

   [G107]     ITU-T, "The E-model: a computational model for use in
              transmission planning", ITU-T Recommendation G.107, June
              2015.

   [I-D.ietf-idr-ls-distribution]
              Gredler, H., Medved, J., Previdi, S., Farrel, A., and S.
              Ray, "North-Bound Distribution of Link-State and TE
              Information using BGP", draft-ietf-idr-ls-distribution-13
              (work in progress), October 2015.

   [I-D.ietf-idr-te-pm-bgp]
              Wu, Q., Previdi, S., Gredler, H., Ray, S., and J.
              Tantsura, "BGP attribute for North-Bound Distribution of
              Traffic Engineering (TE) performance Metrics",
              draft-ietf-idr-te-pm-bgp-02 (work in progress), January
              2015.

   [I-D.ietf-lime-yang-oam-model]
              Senevirathne, T., Finn, N., Kumar, D., Salam, S., Wu, Q.,
              and Z.
Wang, "Generic YANG Data Model for Connection 463 Oriented Operations, Administration, and Maintenance(OAM) 464 protocols", draft-ietf-lime-yang-oam-model-02 (work in 465 progress), February 2016. 467 [I-D.strassner-anima-control-loops] 468 Strassner, J., "The Use of Control Loops in Autonomic 469 Networking", draft-strassner-anima-control-loops-00 (work 470 in progress), October 2015. 472 [I-D.wu-alto-te-metrics] 473 Wu, W., Yang, Y., Lee, Y., Dhody, D., and S. Randriamasy, 474 "ALTO Traffic Engineering Cost Metrics", draft-wu-alto-te- 475 metrics-06 (work in progress), April 2015. 477 [RFC3611] Friedman, T., Ed., Caceres, R., Ed., and A. Clark, Ed., 478 "RTP Control Protocol Extended Reports (RTCP XR)", 479 RFC 3611, DOI 10.17487/RFC3611, November 2003, 480 . 482 [RFC4984] Meyer, D., Ed., Zhang, L., Ed., and K. Fall, Ed., "Report 483 from the IAB Workshop on Routing and Addressing", 484 RFC 4984, DOI 10.17487/RFC4984, September 2007, 485 . 487 [RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic 488 Engineering", RFC 5305, DOI 10.17487/RFC5305, October 489 2008, . 491 [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol 492 Specification Version 2", RFC 5531, DOI 10.17487/RFC5531, 493 May 2009, . 495 [RFC5693] Seedorf, J. and E. Burger, "Application-Layer Traffic 496 Optimization (ALTO) Problem Statement", RFC 5693, 497 DOI 10.17487/RFC5693, October 2009, 498 . 500 [RFC7266] Clark, A., Wu, Q., Schott, R., and G. Zorn, "RTP Control 501 Protocol (RTCP) Extended Report (XR) Blocks for Mean 502 Opinion Score (MOS) Metric Reporting", RFC 7266, 503 DOI 10.17487/RFC7266, June 2014, 504 . 506 [Y1540] ITU-T, "Internet protocol data communication service - IP 507 packet transfer and availability performance parameters", 508 ITU-T Recommendation Y.1540, March 2011. 510 [Y1541] ITU-T, "Network performance objectives for IP-based 511 services", ITU-T Recommendation Y.1541, December 2011. 513 Appendix A. 
Appendix A.  Network Telemetry data source Classification

   +-----------------------------+------------------------------+
   | Data Source Category        | Information                  |
   +-----------------------------+------------------------------+
   | Network Data                | Usage records                |
   |                             | Performance Monitoring Data  |
   |                             | Fault Monitoring Data        |
   |                             | Real Time Traffic Data       |
   |                             | Real Time Statistics Data    |
   |                             | Network Configuration Data   |
   |                             | Provision Data               |
   +-----------------------------+------------------------------+
   | Subscriber Data             | Profile Data                 |
   |                             | Network Registry             |
   |                             | Operation Data               |
   |                             | Billing Data                 |
   +-----------------------------+------------------------------+
   | Application Data derived    | Traffic Analysis             |
   | from interfaces, channels,  | Web, Search, SMS, Email      |
   | software, etc.              | Social Media Data            |
   |                             | Mobile apps                  |
   +-----------------------------+------------------------------+

Appendix B.  Existing Network Data Collection Methods

B.1.  Network Log Collection

   There are three typical log data collection methods:

   o  Text based Collection

   o  SNMP Trap

   o  Syslog based Collection

B.1.1.  Text based data collection

   Text-based log data is designed for low-speed networks, and the
   amount of data cannot be too large.  It can only be parsed by
   network personnel with the experience to define such logs.  The log
   data can be transferred either by email or via FTP.  The
   differences between using email and using FTP are:

   o  The volume of data transferred by FTP can be much larger than
      via email.

   o  FTP-based collection is active data collection, while email-
      based collection is passive data collection.

B.1.2.  SNMP Trap

   An SNMP Trap is a notification mechanism that enables an agent to
   notify the management system of significant events by way of an
   unsolicited SNMP message.
   When there are a large number of devices and each device has a
   large number of objects, an SNMP Trap is a more efficient way to
   get the data than polling information from every object on every
   device.

B.1.3.  Syslog based Collection

   The syslog protocol is used to convey event notification messages
   and allows the use of any number of transport protocols for the
   transmission of syslog messages.  It is widely used in network
   devices (e.g., switches, routers).

B.2.  Network Traffic Collection

   Network traffic collection is a process of exporting network
   traffic flow information from routers, probes, and other devices.
   It is not concerned with the operational state of the network
   devices, but with the traffic flow characteristics on the links
   between any two adjacent network devices.  Taking IPFIX as an
   example, it is widely adopted in routers and switches to provide IP
   traffic flow information to the network management system.

B.3.  Network Performance Collection

   Network performance collection is a process of exporting network
   performance information from routers, probes, and other devices.
   The network performance information can be applied to the quality,
   performance, and reliability of data delivery services and
   applications running over the network.  It also applies to the
   traffic contract agreed between the user and the network service
   provider.  The measurement mechanisms defined in the IPPM WG, as
   well as OAM technologies and OAM tools, can be used to perform
   performance measurement.

B.4.  Network Faults Collection

   Network fault collection is a process of exporting network faults,
   failures, warnings, and defects from routers, probes, and other
   devices.  It usually adopts OAM technologies, OAM tools, and OAM
   models (e.g., SNMP MIBs or NETCONF YANG models) to localize faults
   and pinpoint fault locations.
   However, the OAM YANG model is mainly focused on configuring OAM
   functionality on the network element; how to use the OAM YANG model
   to collect more data (e.g., warnings, failures, and defects) and
   how to use these data need to be further standardized.

B.5.  Network Topology data Collection

   For network topology data collection, routing protocols are an
   important collection method, since every router needs to propagate
   its information throughout the whole network.  In addition, we can
   use an NMS/OSS to get network topology data if it has access to a
   network topology database or to the routing protocols.

   Network topology data comprises node information and link
   information.  It can be collected in two typical ways.  If the
   network topology data is within one IGP area or one AS, we can use
   the IS-IS or OSPF protocol to gather it and write it into a RIB or
   topology datastore, and then use the I2RS protocol to read the
   network topology data.  If the network topology data goes beyond
   one IGP area and spans several domains, we can use BGP-LS
   [I-D.ietf-idr-ls-distribution] [I-D.ietf-idr-te-pm-bgp] to collect
   the network topology data in the different domains and aggregate it
   in a central network topology database.

B.6.  Other Data Collection

   To collect and process large volumes of data in real time or near
   real time to detect subtle events and aid failure diagnosis, we can
   choose other efficient data fetching tools, e.g., Facebook's Scribe
   or Chukwa (built on top of the Hadoop file system), to parse out
   structured data from some of the logs and load it into a datastore.
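   As a rough illustration of the kind of parsing such tools perform,
   the following Python sketch turns syslog-style lines (Appendix
   B.1.3) into structured records that could then be loaded into a
   datastore.  The line format and the field names are illustrative
   assumptions only and are not taken from any of the tools mentioned
   above; only the facility/severity encoding follows the syslog
   convention of RFC 3164.

```python
import re

# Hypothetical syslog-style line format: "<PRI>TIMESTAMP HOST TAG: MESSAGE"
LOG_PATTERN = re.compile(
    r"<(?P<pri>\d+)>"                     # syslog priority value
    r"(?P<timestamp>\w{3} +\d+ [\d:]+) "  # e.g. "Mar  9 12:00:01"
    r"(?P<host>\S+) "                     # originating device
    r"(?P<tag>[^:]+): "                   # process/daemon tag
    r"(?P<message>.*)"                    # free-form message text
)

def parse_log_line(line):
    """Parse one syslog-style line into a structured record, or None."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # unparseable lines would be counted or queued elsewhere
    record = match.groupdict()
    pri = int(record.pop("pri"))
    # RFC 3164 encodes facility and severity into the priority value.
    record["facility"] = pri // 8
    record["severity"] = pri % 8
    return record

rec = parse_log_line("<34>Mar  9 12:00:01 router1 bgpd: peer 192.0.2.1 down")
# rec["host"] == "router1", rec["severity"] == 2, rec["facility"] == 4
```

   A real collector would additionally batch such records, attach a
   reception timestamp, and write them to the datastore of choice.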
Authors' Addresses

   Qin Wu
   Huawei
   101 Software Avenue, Yuhua District
   Nanjing, Jiangsu  210012
   China

   Email: bill.wu@huawei.com

   John Strassner
   Huawei
   2230 Central Expressway
   San Jose, CA
   USA

   Email: john.sc.strassner@huawei.com

   Adrian Farrel
   Old Dog Consulting

   Email: adrian@olddog.co.uk

   Liang Zhang
   Huawei

   Email: zhangliang1@huawei.com