Semantic Metadata Annotation for Network Anomaly Detection

Internet-Draft	Network Anomaly Semantics	October 2023
Graf, et al.	Expires 25 April 2024

Abstract

This document explains why and how semantic metadata annotation helps to test and validate outlier detection, supports the development of supervised and semi-supervised machine learning, and makes anomalies comprehensible to humans. The proposed semantics standardize the exchange of network anomaly data between and among operators and vendors so that they can improve their network outlier detection systems.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 25 April 2024.

1. Introduction

Network Anomaly Detection Architecture [Ahf23] gives an overall introduction to how anomaly detection is applied in the IP network domain and which operational data is needed. It approaches the problem space by automating what a Network Engineer would normally do when verifying a network connectivity service: monitoring from different network plane perspectives to understand whether one network plane affects another negatively.

To fine-tune outlier detection, the results provided as analytical data need to be reviewed by a Network Engineer. This keeps the human out of the monitoring but still involves them in the alert verification loop.

This document describes what information a Network Engineer needs to understand the output of the outlier detection. At the same time, this information is semantically structured so that it can be used to test outlier detection by comparing results systematically, and to set a baseline for supervised machine learning, which requires labeled operational data.
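As an illustration of comparing results systematically, the following minimal sketch measures detector output against labels confirmed by a Network Engineer. It assumes that both are available as sets of time-window identifiers; the identifiers, the class name Confusion and the function name compare are hypothetical and not part of this document's semantics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts obtained by comparing detected anomalies with labels."""

    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Share of flagged windows that were confirmed as anomalous.
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of confirmed anomalies that the detector flagged.
        labeled = self.true_positives + self.false_negatives
        return self.true_positives / labeled if labeled else 0.0


def compare(detected: set[str], labeled: set[str]) -> Confusion:
    """Compare detector output with engineer-confirmed labels.

    Both sets hold hypothetical identifiers of time windows.
    """
    return Confusion(
        true_positives=len(detected & labeled),
        false_positives=len(detected - labeled),
        false_negatives=len(labeled - detected),
    )


result = compare(
    detected={"w1", "w2", "w5"},
    labeled={"w1", "w2", "w3"},
)
print(f"precision={result.precision:.2f} recall={result.recall:.2f}")
```

Running the same comparison before and after a detector change gives a repeatable baseline, which is only possible when the labels share a common structure.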

2. Outlier Detection

Outlier Detection, also known as anomaly detection, describes a systematic approach to identifying rare data points that deviate significantly from the majority. Outliers are commonly classified into three categories:

Global outliers:: A data point is considered a global outlier if its value is far outside the entirety of a data set. For example, the dropped packet count stays between 0 and 10 per minute during a one-week observation, and the observed global outlier is 100000 packets.

Contextual outliers:: A data point is considered a contextual outlier if its value deviates significantly from the rest of the data points in the same time series context. For example, the forwarded packet volume in a time series changes with the time of day like an oscillation curve, and the observed contextual outlier lies outside the oscillation curve at that moment in time. At another time the same value could be considered normal.

Collective outliers:: A subset of data points within a data set is considered anomalous if those values as a collection deviate significantly from the entire data set, while the values of the individual data points are not themselves anomalous in either a contextual or a global sense. In Network Telemetry time series, one way this can manifest is that the number of network path and interface state changes rises in the same time range in which the forwarded packet volume decreases as a group.

For each outlier a score between 0 and 1 is calculated. The higher the value, the higher the probability that the observed data point is an outlier. "Anomaly detection: A survey" [VAP09] gives additional details on anomaly detection and its types.
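This document does not prescribe how the score is calculated. The following minimal sketch shows one possible mapping, assuming a robust deviation based on the median and the median absolute deviation (MAD). The function name outlier_score, the half_point parameter and all sample values are assumptions made for illustration. The global case compares a value with the whole data set, whereas the contextual case compares it only with values observed at the same time of day.

```python
import statistics


def outlier_score(value: float, reference: list, half_point: float = 3.0) -> float:
    """Map the robust deviation of `value` from `reference` into [0, 1).

    The deviation is expressed in units of the median absolute deviation.
    `half_point` is the deviation that yields a score of 0.5; it is an
    arbitrary choice made for this sketch.
    """
    median = statistics.median(reference)
    mad = statistics.median([abs(x - median) for x in reference])
    deviation = abs(value - median) / (mad or 1.0)  # fall back to 1.0 if MAD is 0
    return deviation / (deviation + half_point)


# Global outlier: dropped packets per minute, compared with the whole data set.
drops_per_minute = [3, 5, 0, 10, 7, 2, 4, 6]
global_score = outlier_score(100000, drops_per_minute)

# Contextual outlier: the same packet volume is compared only with values
# seen at the same hour of the day (hypothetical samples).
volume_at_hour = {
    3: [100, 110, 95, 105, 90],
    12: [880, 910, 900, 895, 920],
}
night_score = outlier_score(900, volume_at_hour[3])
noon_score = outlier_score(900, volume_at_hour[12])

print(f"global={global_score:.3f} night={night_score:.3f} noon={noon_score:.3f}")
```

In this sketch the value 900 scores high against the night-time reference and low against the noon reference, which mirrors the statement that the same value can be normal at another time.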

3. Data Mesh

The Data Mesh [Deh22] Architecture distinguishes between operational and analytical data. Operational data refers to data collected from operational systems, while analytical data refers to insights gained from operational data.

In terms of network observability, the semantics of operational network metrics are defined by the IETF and are categorized, as described in the Network Telemetry Framework [RFC9232], into the following three network planes:

Management Plane:: Time series data describing the state changes and statistics of a network node and its components. For example, interface state and statistics modeled in ietf-interfaces.yang [RFC8343].

Control Plane:: Time series data describing the state and state changes of network reachability. For example, BGP VPNv6 unicast updates and withdrawals exported in BGP Monitoring Protocol (BMP) [RFC7854] and modeled in BGP [RFC4364].

Forwarding Plane:: Time series data describing the forwarding behavior of packets and its data-plane context. For example, dropped packet count modeled in the IPFIX Information Elements forwardingStatus(IE89) [RFC7270] and packetDeltaCount(IE2) [RFC5102] and exported with IPFIX [RFC7011].

In terms of network observability, the semantics of analytical data refer to incident notifications or service level indicators. Examples are the incident notification described in Section 7.2 of [I-D.feng-opsawg-incident-management], the health status and symptoms described in Service Assurance for Intent-Based Networking [RFC9418], the precision availability metrics defined in [I-D.ietf-ippm-pam], and network anomalies and their symptoms as described in this document.

4. Observed Symptoms

In this section observed network symptoms are specified and categorized according to the following scheme:

Action:: Which action the network node performed for a packet in the Forwarding Plane, for a path or adjacency in the Control Plane, or for a state or statistical change in the Management Plane. For the Forwarding Plane we distinguish between missing, where the drop occurred outside the measured network node, and drop and on-path delay, which were measured on the network node. For the Control Plane we distinguish between reachability, which refers to a change in the routing or forwarding information base (RIB/FIB), and adjacency, which refers to a change in peering or link-layer resolution. For the Management Plane we refer to state or statistical changes on interfaces.

Reason:: For each action, one or more reasons describe why this action was taken. For drops in the Forwarding Plane we distinguish between Unreachable, because network-layer reachability information was missing; Administered, because an administrator configured a rule preventing the forwarding of this packet; and Corrupt, where the network node was unable to determine where to forward the packet due to a packet, software or hardware error. For On-Path Delay we distinguish between Minimum, Average and Maximum Delay for a given flow.

Relation:: For each reason, one or more relations describe the cause of the chosen action. These relations could refer to a network plane entity, a packet, or a control-plane or node-administered instruction.

Table 1 consolidates for the forwarding plane a list of common symptoms with their actions, reasons and relations.

Table 1: Describing Symptoms and their Actions, Reason and Relation for Forwarding Plane
Action	Reason	Relation
Missing	Previous	Time
Drop	Unreachable	next-hop
Drop	Unreachable	link-layer
Drop	Unreachable	Time To Live expired
Drop	Unreachable	Fragmentation needed and Don't Fragment set
Drop	Administered	Access-List
Drop	Administered	Unicast Reverse Path Forwarding
Drop	Administered	Discard Route
Drop	Administered	Policed
Drop	Administered	Shaped
Drop	Corrupt	Bad Packet
Drop	Corrupt	Bad Egress Interface
Delay	Min	-
Delay	Mean	-
Delay	Max	-

Table 2 consolidates for the control plane a list of common symptoms with their actions, reasons and relations.

Table 2: Describing Symptoms and their Actions, Reason and Relation for Control Plane
Action	Reason	Relation
Reachability	Update	Imported
Reachability	Update	Received
Reachability	Withdraw	Received
Reachability	Withdraw	Peer Down
Adjacency	Established	Peer
Adjacency	Established	Link-Layer
Adjacency	Torn Down	Peer
Adjacency	Torn Down	Link-Layer

Table 3 consolidates for the management plane a list of common symptoms with their actions, reasons and relations.

Table 3: Describing Symptoms and their Actions, Reason and Relation for Management Plane
Action	Reason	Relation
Interface	Up	Link-Layer
Interface	Down	Link-Layer
Interface	Errors	-
Interface	Discards	-
Interface	Unknown Protocol	-
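
The action, reason and relation scheme lends itself to a machine-checkable catalog. The following minimal sketch encodes a subset of the rows from Tables 1 to 3 and checks whether a reported symptom is a known combination. The names Plane, Symptom, CATALOG, is_known and reasons_for are hypothetical, and the encoding is an assumption of this sketch rather than a defined data model.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Plane(Enum):
    FORWARDING = "forwarding"
    CONTROL = "control"
    MANAGEMENT = "management"


class Symptom(NamedTuple):
    plane: Plane
    action: str
    reason: str
    relation: Optional[str]  # None where the tables show "-"


# A subset of the rows from Tables 1 to 3, not the complete catalog.
CATALOG = {
    Symptom(Plane.FORWARDING, "Drop", "Unreachable", "next-hop"),
    Symptom(Plane.FORWARDING, "Drop", "Administered", "Access-List"),
    Symptom(Plane.FORWARDING, "Drop", "Corrupt", "Bad Packet"),
    Symptom(Plane.FORWARDING, "Delay", "Max", None),
    Symptom(Plane.CONTROL, "Reachability", "Withdraw", "Peer Down"),
    Symptom(Plane.CONTROL, "Adjacency", "Established", "Peer"),
    Symptom(Plane.MANAGEMENT, "Interface", "Down", "Link-Layer"),
    Symptom(Plane.MANAGEMENT, "Interface", "Errors", None),
}


def is_known(symptom: Symptom) -> bool:
    """Return True if the combination appears in the catalog."""
    return symptom in CATALOG


def reasons_for(plane: Plane, action: str) -> list:
    """List the distinct reasons recorded for an action in a plane."""
    return sorted(
        {s.reason for s in CATALOG if s.plane is plane and s.action == action}
    )


print(is_known(Symptom(Plane.CONTROL, "Reachability", "Withdraw", "Peer Down")))
print(reasons_for(Plane.FORWARDING, "Drop"))
```

Rejecting combinations that are not in the catalog, such as a Drop action reported for the Control Plane, keeps labels comparable when they are exchanged between operators and vendors.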

5. Semantic Metadata

Metadata adds additional context to data. For instance, in networks, the software version of the network node from which management plane metrics are obtained is such metadata, as described in [I-D.claise-opsawg-collected-data-manifest]. Semantic Metadata, in contrast, describes the meaning or ontology of the annotated data.
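
The difference between the two kinds of metadata can be sketched as follows. The example attaches both to one operational data point: the software version as plain metadata, and a symptom with its outlier score as semantic metadata. Every class and field name, as well as the sample values, is hypothetical and chosen only for illustration; none of them is taken from the YANG module.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Annotation:
    """Semantic metadata: what the data point means (hypothetical fields)."""

    plane: str
    action: str
    reason: str
    relation: str
    score: float  # outlier score between 0 and 1


@dataclass
class AnnotatedMetric:
    """An operational data point with metadata (hypothetical fields)."""

    metric: str
    value: int
    node_software_version: str  # plain metadata: context of the collection
    annotations: list = field(default_factory=list)


record = AnnotatedMetric(
    metric="packetDeltaCount",
    value=100000,
    node_software_version="example-os 1.0",
    annotations=[
        Annotation(
            plane="forwarding",
            action="Drop",
            reason="Unreachable",
            relation="next-hop",
            score=0.99,
        )
    ],
)

# asdict() converts nested dataclasses, so the record serializes directly.
print(json.dumps(asdict(record), indent=2))
```

Keeping the annotation separate from the measured value lets the same operational data be re-labeled after a Network Engineer has reviewed it.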

5.1. Overview of the Model

Figure 1 contains the YANG tree diagram [RFC8340] of the ietf-anomaly-detection-semantic-metadata module.

module: ietf-anomaly-detection-semantic-metadata

Figure 1: YANG tree diagram for ietf-anomaly-detection-semantic-metadata

Describe YANG module

8. References

8.1. Normative References

[Ahf23]: Huang Feng, A., "Daisy: Practical Anomaly Detection in large BGP/MPLS and BGP/SRv6 VPN Networks", IETF 117, Applied Networking Research Workshop, DOI 10.1145/3606464.3606470, July 2023, <https://anrw23.hotcrp.com/doc/anrw23-paper8.pdf>.
[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8340]: Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams", BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018, <https://www.rfc-editor.org/info/rfc8340>.
[RFC9232]: Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022, <https://www.rfc-editor.org/info/rfc9232>.

8.2. Informative References

[Deh22]: Dehghani, Z., "Data Mesh", O'Reilly Media, ISBN 9781492092391, March 2022, <https://www.oreilly.com/library/view/data-mesh/9781492092384/>.
[I-D.claise-opsawg-collected-data-manifest]: Claise, B., Quilbeuf, J., Lopez, D., Martinez-Casanueva, I. D., and T. Graf, "A Data Manifest for Contextualized Telemetry Data", Work in Progress, Internet-Draft, draft-claise-opsawg-collected-data-manifest-06, 10 March 2023, <https://datatracker.ietf.org/doc/html/draft-claise-opsawg-collected-data-manifest-06>.
[I-D.feng-opsawg-incident-management]: Feng, C., Hu, T., Contreras, L. M., Graf, T., Wu, Q., Yu, C., and N. Davis, "Incident Management for Network Services", Work in Progress, Internet-Draft, draft-feng-opsawg-incident-management-02, 21 October 2023, <https://datatracker.ietf.org/doc/html/draft-feng-opsawg-incident-management-02>.
[I-D.ietf-ippm-pam]: Mirsky, G., Halpern, J. M., Min, X., Clemm, A., Strassner, J., and J. François, "Precision Availability Metrics for Services Governed by Service Level Objectives (SLOs)", Work in Progress, Internet-Draft, draft-ietf-ippm-pam-08, 18 October 2023, <https://datatracker.ietf.org/doc/html/draft-ietf-ippm-pam-08>.
[I-D.ietf-opsawg-ipfix-on-path-telemetry]: Graf, T., Claise, B., and A. H. Feng, "Export of On-Path Delay in IPFIX", Work in Progress, Internet-Draft, draft-ietf-opsawg-ipfix-on-path-telemetry-04, 6 July 2023, <https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-ipfix-on-path-telemetry-04>.
[RFC4364]: Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006, <https://www.rfc-editor.org/info/rfc4364>.
[RFC5102]: Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. Meyer, "Information Model for IP Flow Information Export", RFC 5102, DOI 10.17487/RFC5102, January 2008, <https://www.rfc-editor.org/info/rfc5102>.
[RFC7011]: Claise, B., Ed., Trammell, B., Ed., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013, <https://www.rfc-editor.org/info/rfc7011>.
[RFC7270]: Yourtchenko, A., Aitken, P., and B. Claise, "Cisco-Specific Information Elements Reused in IP Flow Information Export (IPFIX)", RFC 7270, DOI 10.17487/RFC7270, June 2014, <https://www.rfc-editor.org/info/rfc7270>.
[RFC7854]: Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP Monitoring Protocol (BMP)", RFC 7854, DOI 10.17487/RFC7854, June 2016, <https://www.rfc-editor.org/info/rfc7854>.
[RFC8343]: Bjorklund, M., "A YANG Data Model for Interface Management", RFC 8343, DOI 10.17487/RFC8343, March 2018, <https://www.rfc-editor.org/info/rfc8343>.
[RFC9418]: Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. Arumugam, "A YANG Data Model for Service Assurance", RFC 9418, DOI 10.17487/RFC9418, July 2023, <https://www.rfc-editor.org/info/rfc9418>.
[VAP09]: Chandola, V., Banerjee, A., and V. Kumar, "Anomaly detection: A survey", ACM Computing Surveys, Vol. 41, No. 3, DOI 10.1145/1541880.1541882, July 2009, <https://www.researchgate.net/publication/220565847_Anomaly_Detection_A_Survey>.
