A Data Manifest for Contextualized Telemetry Data

Benoit Claise, Huawei, benoit.claise@huawei.com
Jean Quilbeuf, Huawei, jean.quilbeuf@huawei.com
Diego R. Lopez, Telefonica I+D, Don Ramon de la Cruz, 82, Madrid 28006, Spain, diego.r.lopez@telefonica.com
Ignacio Dominguez Martinez, Telefonica I+D, Ronda de la Comunicacion, S/N, Madrid 28050, Spain, ignacio.dominguezmartinez@telefonica.com
Thomas Graf, Swisscom, Binzring 17, Zurich 8045, Switzerland, thomas.graf@swisscom.com
OPS
OPSAWG
Network elements use Model-driven Telemetry, and in particular YANG-Push, to continuously stream information, including both counters and state information.
This document specifies the metadata that ensures that the collected data can be interpreted correctly.
This document specifies the Data Manifest, composed of two YANG data models (the Platform Manifest and the Data Collection Manifest). The Data Manifest must be streamed and stored along with the data, up to the collection and analytics system, in order to keep the collected data fully exploitable by data scientists.
Network elements use Model-driven Telemetry (MDT), and in particular YANG-Push, to continuously stream information, including both counters and state information.
This document specifies what needs to be kept as metadata (i.e., the Data Manifest) to ensure that the collected data can still be interpreted correctly throughout the collection and network analytics toolchain. When streaming YANG-structured data with YANG-Push, the corresponding YANG module provides the semantics of the collected objects. While these semantics are absolutely required to correctly decode and interpret the data, understanding the network element and collection environment contexts is equally important to interpret the data.
This document proposes the Data Manifest, which is composed of two YANG data models, namely, the Platform Manifest and the Data Collection Manifest, in order to keep the collected data exploitable by the data scientists.
The Platform Manifest contains information characterizing the platform streaming the telemetry information, while the Data Collection Manifest contains the required information to characterize how and when the telemetry information was metered.
The two proposed YANG modules in the Data Manifest do not expose much new information but rather define what should be exposed by a platform streaming telemetry. Some related YANG modules have been specified to retrieve the platform capabilities:
The IETF YANG Library .
YANG Modules Describing Capabilities for Systems and Datastore Update Notifications for the platform capabilities regarding the production and export of telemetry data.
, which is based on the previous draft and defines the optimal settings to stream specific items (i.e., per path).
These related YANG modules are important to discover the capabilities before applying the telemetry configuration (such as on-change). Some of their content is part of the context for the streamed data.
We first present the module for the Platform Manifest in and then the module for the Data Collection Manifest in . The full Data Manifest is obtained by combining these two modules. We explain in how the Data Manifest can be retrieved and how collected data is mapped to the Data Manifest.
Streamed information from network elements is used for network analytics, incident detection, and ultimately closed-loop automation. This streamed data can be stored in a database (sometimes called a big data lake) for further analysis.
As an example, a database could store a time series representing the evolution of a specific counter collected from a network element. When analyzing the data, the network operator/data scientist must understand the context information for these data:
The object definition in the YANG model.
The network element specific vendor, platform, and OS.
The collection parameters.
Characterizing the source used for producing the data (vendor, platform, and OS) is useful to complement the data. As an example, knowing the exact data source software specification might reveal a particularity in the observed data, explained by a specific bug, a specific bug fix, or simply a specific behavior. This is also necessary to ensure the reliability of the collected data. On top of that, in particular for YANG-Push, it is crucial to know the set of YANG modules supported by the platform, along with their deviations. In some cases, there might even be some backwards-incompatible changes in native modules from one OS version to the next. This information is captured by the proposed Platform Manifest.
From a collection parameters point of view, the data scientists analyzing the collected data must know whether the counter was requested from the network element as on-change or at a specific cadence. Indeed, an on-change collection explains why there is a single value as opposed to a time series. In case of periodic collection, the exact cadence might not be observable in the time series. Indeed, this time series might report some values as 0 or might even omit some values. The reasons for this behavior might be diverse: for instance, the network element was under stress, or the requested observation period was too small compared to the minimum-observed-period. Again, knowing the conditions under which the counter was collected and streamed (along with the platform details) helps in drawing the right conclusions. As an example, taking the value of 0 at face value might lead to the wrong conclusion that the counter dropped to zero. This document specifies the Data Collection Manifest, which contains the required information to characterize how and when the telemetry information was metered.
The goal of the current document is to define what needs to be kept as metadata (i.e., the Data Manifest) to ensure that the collected data can still be interpreted correctly.
When a new device is onboarded, operators must make sure that the new device streams data with YANG-Push, that the right telemetry data is streamed, that the data is correctly ingested in the collection system, and finally that the data can be analyzed (e.g., compared with other similar devices). For the last point, the Data Manifest, which must be linked to the data up to the collection and analytics system, contains all the relevant information.
The concepts behind the Data Mesh are:
Principle of Domain Ownership: Architecturally and organizationally align business, technology, and analytical data, following the line of responsibility. Here, the Data Mesh principles adopt the boundary of bounded context to individual data products where each domain is responsible for (and owns) its data and models.
Principle of Data as a Product: The "Domain" owners are responsible for providing the data in a useful way (discoverable through a catalog, addressable with a permanent and unique address, understandable with well-defined semantics, trustworthy and truthful, self-describing for easy consumption, interoperable by supporting standards, secure, self-contained, etc.) and should treat consumers of that data as customers. It requires and relies on the "Domain Ownership" principle.
Principle of Self-serve Data Platform: This fosters the sharing of cross-domain data in order to create extra value.
Principle of Federated Computational Governance: Describes the operating model and approach to establishing global policies across a mesh of data products.
The most relevant concept for this document is the "Data as a Product" principle. The Data Manifest fulfills this principle as the two YANG data models, Platform Manifest and the Data Collection Manifest, along with the data, provide all the necessary information in a self-describing way for easy consumption.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Data Manifest: all the necessary data required to interpret the telemetry information.
Platform Manifest: part of the Data Manifest that completely characterizes the platform producing the telemetry information.
Data Collection Manifest: part of the Data Manifest that completely characterizes how and when the telemetry information was metered.
contains the YANG tree diagram of the ietf-platform-manifest module.
The YANG module actually contains a list of Platform Manifests (in 'platforms/platform'), indexed by the identifier of the platform.
That identifier should be defined by the network manager so that each platform has a unique id.
As an example, the identifier could be the 'sysname' from the ietf-notification module presented in .
The scope of this module is the scope of the data collection, i.e., a given network; therefore, it contains a collection of Platform Manifests, as opposed to the device scope, which would contain a single Platform Manifest.
The Platform Manifest is identified by a set of parameters ('name', 'software-version', 'software-flavor', 'os-version', 'os-type') that are aligned with the YANG Catalog so that the YANG Catalog could be used to retrieve the YANG modules a posteriori.
The vendor of the platform can be identified via its name 'vendor' or its PEN number 'vendor-pen', as described in .
In order to provide information for YANG-Push subscriptions based on streams, the Platform Manifest specifies the streams available on the platform within the 'yang-push-streams' container.
That container is similar to the one from the ietf-subscribed-notifications module, and the Data Collection Manifest uses it to refer to streams used by subscriptions.
The Platform Manifest also includes the contents of the YANG Library .
That module set is particularly useful to define the paths, as they are based on module names.
Similarly, this module defines the available datastores, which can be referred to from the Data Manifest, if necessary.
If supported by the platform, fetching metrics from a specific datastore could enable some specific use cases: monitoring configuration before it is committed, comparing between the configuration and operational datastore.
<CODE BEGINS>
file "ietf-platform-manifest@2023-03-08.yang"
<CODE ENDS>

contains the YANG tree diagram of the ietf-data-collection-manifest module.
The 'data-collections' container contains the information related to each YANG-Push subscription.
As for the Platform Manifest, these subscriptions are indexed by the platform id, so that all subscriptions in the network can be represented in the module.
The YANG-Push collection is organized in subscriptions; the parameters for such a subscription are specified in and .
The list of subscriptions from a given platform is stored in '/data-collection/data-collection/yang-push-subscriptions/subscription'.
Subscription metadata are the bulk of the Data Collection Manifest; they are heavily based on the two RFCs above.
The 'target' choice specifies the selected contents for the subscription.
We did not include the target with a reference to a common filter as stored in 'filters' in ietf-subscribed-notifications.
The rationale for this choice is that otherwise we would need to store these filters in the Platform Manifest, which could cause changes in that manifest more often than needed.
If a stream-based subscription is used, the stream must exist in the 'yang-push-streams' container of the Platform Manifest, which is modelled as a leafref in our module.
If a datastore-based subscription is used, the datastore must exist in the 'yang-library' container of the Platform Manifest, which is modelled by a leafref as well.
We also included 'transport', 'encoding', and 'purpose' as in ietf-subscribed-notifications, but without the feature switches as we are not concerned about using this module for configuration.
We also included 'dscp', 'weighting', and 'dependency' as in ietf-subscribed-notifications, again without the feature switches.
This information might be useful to understand why the collection is failing or less frequent than expected for a given subscription.
The 'update-trigger' choice from ietf-yang-push is included as well, as it is crucial to understand the frequency at which notifications should arrive.
The only new content is the 'current-period', which might differ from the requested period (when in periodic collection mode) if the platform implements a mechanism to increase the collection period when it is overloaded.
Finally, we also included the state of the receivers for that subscription, as in ietf-subscribed-notifications.
This information is crucial to understand the collected values. For instance, the 'on-change' trigger, if used, might remove a lot of messages from the database because values are sent only when there is a change.
<CODE BEGINS>
file "ietf-data-collection-manifest@2023-03-08.yang"
<CODE ENDS>
The Data Manifest MUST be streamed and stored along with the collected data.
In case the collected data are moved to a different place (typically a database), the Data Manifest MUST follow the collected data.
Losing that context can render the collected data unusable, for instance when the data is stored without the relevant information.
The Data Manifest MUST be updated when the Data Manifest information changes, for example, when a router is upgraded, when a new telemetry subscription is configured, or when the telemetry subscription parameters change.
The Data Manifest can itself be considered as a time series, and stored in a similar fashion to the collected data.
The collected data should be mapped to the Data Manifest. Since the Data Manifest will not change as frequently as the collected data itself, it makes sense to map several datapoints to the same Data Manifest. The collected data must therefore include metadata pointing to the corresponding Data Manifest. In case of a Data Manifest change, the system should keep the mapping between the data collected so far and the old Data Manifest, and not assume that the latest Data Manifest is valid for the entire time series.
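The mapping described above can be sketched as follows (a hypothetical illustration; the class, field names, and manifest contents are invented, not taken from the YANG modules):

```python
import bisect

# Hypothetical sketch: manifest snapshots kept as a time series, so that each
# datapoint maps to the manifest version valid at its timestamp, not the latest one.
class ManifestStore:
    def __init__(self):
        # Sorted list of (timestamp, manifest) pairs, one per manifest change.
        self._versions = []

    def record(self, timestamp, manifest):
        """Store a new Data Manifest version (e.g., after a router upgrade)."""
        self._versions.append((timestamp, manifest))
        self._versions.sort(key=lambda v: v[0])

    def manifest_for(self, datapoint_timestamp):
        """Return the manifest that was valid when the datapoint was collected."""
        timestamps = [t for t, _ in self._versions]
        idx = bisect.bisect_right(timestamps, datapoint_timestamp) - 1
        if idx < 0:
            raise LookupError("no manifest recorded before this datapoint")
        return self._versions[idx][1]

store = ManifestStore()
store.record(100, {"os-version": "7.5.2", "period": "10s"})
store.record(500, {"os-version": "7.6.1", "period": "10s"})  # upgrade at t=500

# A datapoint collected at t=300 maps to the pre-upgrade manifest.
assert store.manifest_for(300)["os-version"] == "7.5.2"
assert store.manifest_for(600)["os-version"] == "7.6.1"
```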
The Platform Manifest is likely to remain the same until the platform is updated. Thus, the Platform Manifest only needs to be collected once per streaming session and updated after a platform reboot.
As this draft specifically focuses on giving context on data collected via streamed telemetry, we can assume that a streaming telemetry system is available.
Retrieving the Data Collection Manifest and Platform Manifest can be done either by reusing that streaming telemetry system (in-band) or using another system (out-of-band), for instance by adding headers or saving manifests into a YANG instance file .
We propose to reuse the existing telemetry system (in-band approach) in order to lower the efforts for implementing this draft.
To enable a platform supporting streaming telemetry to also support the Data Manifest, it is sufficient that this platform supports
the models from and .
Recall that each type of manifest has its own rough update frequency, i.e., at reboot for the Platform Manifest and at new subscription or CPU load variation for the Data Collection Manifest.
The Data Manifest MUST be streamed with the YANG-Push on-change feature (also called event-driven telemetry).
With YANG-push, each notification sent by the device is part of a subscription, which is also one of the YANG keys used to retrieve the Data Manifest, the other key being the platform ID.
In order to enable a posteriori retrieval of the Data Manifest associated to a datapoint, the collector must:
Keep the subscription id and platform id in the metadata of the collected values.
Collect as well the Data Manifest for the subscription associated to the datapoint.
With this information, to retrieve the Data Manifest from the datapoint, the following happens:
The subscription id and platform id are retrieved from the datapoint metadata.
The Data Manifest for that datapoint is obtained by using the values above as keys.
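These two steps can be sketched as follows (a hypothetical illustration; the metadata keys and manifest contents are invented):

```python
# Hypothetical sketch of the two-step lookup: the collector keeps
# (platform id, subscription id) on each datapoint and indexes the
# Data Manifest by the same pair of keys.
manifests = {
    ("PE1", 42): {"update-trigger": "periodic", "current-period": "100ms"},
}

datapoint = {
    "value": 123456,
    "metadata": {"platform-id": "PE1", "subscription-id": 42},
}

# Step 1: retrieve the keys from the datapoint metadata.
keys = (datapoint["metadata"]["platform-id"],
        datapoint["metadata"]["subscription-id"])

# Step 2: use them to retrieve the Data Manifest.
manifest = manifests[keys]
assert manifest["update-trigger"] == "periodic"
```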
We don’t focus on the timing aspect as storing both the data and their manifest in a time series database will allow the data scientists to look for the Data Manifest corresponding to the timestamp of the datapoint.
In that scenario, the reliability of the collection of the Data Manifest is the same as the reliability of the data collection itself, since the Data Manifest is like any other data.
It is expected that the Data Manifest is streamed directly from the network equipment, along with YANG-Push data. However, if the network element streaming telemetry does not yet support the YANG modules from the Data Manifest specified in this document, the telemetry collector could populate the Data Manifest from available information collected from the platform. However, this option requires efforts on the telemetry collector side, as the information gathered in the Data Manifest proposed in this document could be scattered among various standard and vendor-specific YANG modules that depend on the platform.
That Data Manifest should be kept and available even if the source platform is not accessible (from the collection system), or if the platform has been updated (new operating system or new configuration). The Platform Manifest is fairly stable and should change only when the platform is updated or patched. On the other hand, the Data Collection Manifest is likely to change each time a new YANG-Push subscription is requested and might even change if the platform load increases and collection periods are updated. To separate these two parts, we enclose each of them in its own module.
As we are reusing an existing telemetry system, the security considerations lie with the new content disclosed in the new manifests.
Appropriate access control must be associated to the corresponding leafs and containers.
This document includes no request to IANA.
Do we want to handle the absence of values, i.e., add information about missed collection or errors in the collection context? It could also explain why some values are missing. On the other hand, this might also be out of scope.
Regarding the inclusion of ietf-yang-library in our module, do we want to include as well the changes from ietf-yang-library-revisions? What if other information is present in the yang-library from the platform? Should we use a YANG mount to capture it as well (it would not be captured with our use of the main yang-library grouping)?
Similarly, the ietf-data-collection-manifest.yang includes many lines of copy-pasting from ietf-yang-push.yang and ietf-subscribed-notifications.yang since we want to include the information from these modules. Reusing groupings is not suitable as some leafrefs are pointing to nodes that are not at the same location in our network level module.
Maybe we need to find a solution (deviations + some kind of schema mount?) maybe we can live with similar modules since they have common nodes but different purposes?
Henk: how does this interact with SBOM effort?
Eliot: important to give integrity of the information a lot of thought. Threat model to be considered.
In this example, the goal is to collect the administrative status and number of received bytes for the interfaces of a fictional ACME device, and store the result in an InfluxDB database.
The metrics are collected via YANG-Push, which is configured by specifying their XPaths and when they should be collected (periodically or on-change).
More precisely, we want to collect "ietf-interfaces:interfaces/interface/enabled" on every change and "ietf-interfaces:interfaces/interface/statistics/in-octets" every 100 milliseconds.
The paths here are referring to the YANG module from .
The configuration of YANG push is out of scope for this document.
Since they do not have the same trigger, each of the paths must be collected in its own subscription.
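The two subscriptions above can be sketched as plain data structures (a hypothetical illustration; the subscription ids and field names are invented, not taken from the YANG modules):

```python
# Hypothetical sketch: the two XPaths use different triggers, so they end up
# in two distinct YANG-Push subscriptions. Ids and field names are invented.
subscriptions = {
    1: {
        "xpath": "/ietf-interfaces:interfaces/interface/enabled",
        "update-trigger": "on-change",
    },
    2: {
        "xpath": "/ietf-interfaces:interfaces/interface/statistics/in-octets",
        "update-trigger": "periodic",
        "period-ms": 100,
    },
}

# Each path has its own trigger, hence its own subscription.
assert subscriptions[1]["update-trigger"] != subscriptions[2]["update-trigger"]
```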
In the scenario from , the collector receives YANG-push from the device and stores it into InfluxDB.
We first present a version without the Data Manifest and then show how to enrich it with the Data Manifest.
In InfluxDB, a datapoint is specified by giving the name of the measurement, zero or more key value entries named tags, one or more named values called fields and the timestamp for the datapoint.
In our case a measurement could be "admin-status".
The tags, whose aim is to identify a particular instance of the measurement, could be the name of the device and the name of the interface.
The fields contain the values to store.
InfluxDB defines a textual notation, named line protocol, to represent one datapoint per line.
We use this line protocol in and to represent the way data could be fed to InfluxDB, omitting the timestamp for readability.
See for more details.
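As a rough illustration of the line protocol shape (a minimal sketch; the real line protocol additionally requires escaping of special characters and quoting of string field values, which this toy formatter omits):

```python
# Minimal sketch of the InfluxDB line protocol shape:
#   measurement,tag=...,tag=... field=value[,field=value] [timestamp]
# Measurement and tag names below are illustrative.
def to_line_protocol(measurement, tags, fields, timestamp=None):
    tag_part = ",".join(f"{k}={v}" for k, v in tags.items())
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{measurement},{tag_part} {field_part}"
    if timestamp is not None:
        line += f" {timestamp}"
    return line

line = to_line_protocol(
    "admin-status",
    tags={"device": "PE1", "interface": "eth0"},
    fields={"enabled": "true"},
)
assert line == "admin-status,device=PE1,interface=eth0 enabled=true"
```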
Without the data manifest, the YANG-push collector is likely to store something similar to in InfluxDB.
In that case, only the value is stored, without any way to know how the value was obtained.
A possibility for keeping the data manifest with the data is to store it directly into InfluxDB.
In that case, the collector can subscribe to the data exported by the module presented in this draft and store it inside InfluxDB.
For the Platform Manifest, assuming the platform ID is "PE1", the collector subscribes to the path "ietf-platform-manifest:platforms/platform[id=PE1]".
For the Data Collection Manifest, assuming the subscription id is 42, the collector subscribes to the path "ietf-data-collection-manifest:data-collections/data-collection[platform-id="PE1"]/yang-push-subscriptions/subscription[id=42]".
The data, for instance serialized in JSON, can be stored in InfluxDB as shown in where "<platform-manifest>" and "<data-manifest>" represent the contents of respectively the Platform Manifest and the Data Collection Manifest.
In our example, the link between a collected datapoint and the corresponding Platform Manifest is done via the common "device" tag.
In order to link a datapoint with the corresponding Data Collection Manifest, the collector can add fields to specify where the Data Collection Manifest is located for that specific datapoint.
For instance, the same datapoints as in could be stored as in .
In our simple example, from the "admin-status" datapoint, one can retrieve the corresponding Platform Manifest by looking at the last value for the "platform-manifest" measurement with the same value for the "device" tag.
From the "admin-status" datapoint, one can retrieve the corresponding Data Collection Manifest by looking at the last value for the "data-manifest" measurement with tags "device" and "subId" matching respectively with the tag "device" and the field "subId" of the measurement.
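The two lookups above can be sketched in Python over in-memory records standing in for the InfluxDB measurements (a hypothetical illustration; the record contents are invented):

```python
# Hypothetical sketch: in-memory records standing in for InfluxDB measurements.
platform_manifests = [
    {"device": "PE1", "time": 10, "content": "<platform-manifest-v1>"},
    {"device": "PE1", "time": 50, "content": "<platform-manifest-v2>"},
]
data_manifests = [
    {"device": "PE1", "subId": 42, "time": 20, "content": "<data-manifest>"},
]
datapoint = {"device": "PE1", "subId": 42, "time": 60, "enabled": True}

# Platform Manifest: last value with the same "device" tag.
pm = max((m for m in platform_manifests if m["device"] == datapoint["device"]),
         key=lambda m: m["time"])

# Data Collection Manifest: match both the "device" tag and the "subId" field.
dm = max((m for m in data_manifests
          if m["device"] == datapoint["device"]
          and m["subId"] == datapoint["subId"]),
         key=lambda m: m["time"])

assert pm["content"] == "<platform-manifest-v2>"
assert dm["content"] == "<data-manifest>"
```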
v05 -> v06
Remove YANG packages
Switch YANG models from device view to network view
Add PEN number to identify vendors
Intro rewritten with use cases
Added an "Operational Considerations" section
Switch from MDT to YANG-Push
v04 -> v05
First version of example scenario
Updated affiliation
Updated YANG module names to ietf-platform-manifest and ietf-data-collection-manifest
Unify used terms as defined in the terminology section
Replaced 'device' with 'platform'
Split Section 5 into two sections for better readability
v03 -> v04
Fix xym error
Moved terminology after introduction
Clarified the role of the module
v02 -> v03
Add when clause in YANG model
Fix validation errors on YANG modules
Augment YANG library to handle semantic versioning
v01 -> v02
Alignment with YANGCatalog YANG module: name, vendor
Clarify the use of YANG instance file
Editorial improvements
v00 -> v01
Adding more into data platform: YANG packages, whole yanglib module to specify datastores
Setting the right type for periods: int64 -> uint64
Specify the origin datastore for MDT subscription
Set both models to config false
Applying text comments from Mohamed Boucadair
Adding an example of data-manifest file
Adding rationale for reusing telemetry system for collection of the manifests
Export manifest with on-change telemetry as opposed to YANG instance file
v00
Initial version
This section is only here to ensure that the draft passes without compilation errors.
It should be removed as soon as we can fix the issue.
<CODE BEGINS>
file "ietf-yang-push-modif@2023-03-08.yang"
<CODE ENDS>
Thanks to Mohamed Boucadair and Tianran Zhou for their reviews and comments.