< draft-mirsky-ippm-hybrid-two-step-00.txt   draft-mirsky-ippm-hybrid-two-step-01.txt >
IPPM Working Group G. Mirsky IPPM Working Group G. Mirsky
Internet-Draft ZTE Corp. Internet-Draft ZTE Corp.
Intended status: Informational W. Lingqiang Intended status: Informational W. Lingqiang
Expires: August 31, 2018 G. Zhui Expires: January 3, 2019 G. Zhui
ZTE Corporation ZTE Corporation
February 27, 2018 July 2, 2018
Hybrid Two-Step Performance Measurement Method Hybrid Two-Step Performance Measurement Method
draft-mirsky-ippm-hybrid-two-step-00 draft-mirsky-ippm-hybrid-two-step-01
Abstract Abstract
Development of and advancements in automation of network operations Development of, and advancements in, automation of network operations
brought new requirements toward measurement methodology. Among them brought new requirements for measurement methodology. Among them is
is ability to collect the instant telemetry as the packet being the ability to collect instant network state as the packet is being
processed by the networking elements along its path through the processed by the networking elements along its path through the
domain. This document introduces new hybrid measurement method, domain. This document introduces a new hybrid measurement method,
referred to as hybrid two-step, as it separates act of measuring and/ referred to as hybrid two-step, as it separates the act of measuring
or calculating performance metric from the act of collecting and and/or calculating performance metrics from the act of collecting and
transporting telemetry. transporting network state.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 31, 2018. This Internet-Draft will expire on January 3, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 18 skipping to change at page 2, line 18
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Conventions used in this document . . . . . . . . . . . . . . 3 2. Conventions used in this document . . . . . . . . . . . . . . 3
2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 2.2. Requirements Language . . . . . . . . . . . . . . . . . . 3
3. Problem Overview . . . . . . . . . . . . . . . . . . . . . . 3 3. Problem Overview . . . . . . . . . . . . . . . . . . . . . . 3
4. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 4 4. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 4
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 4 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
6. Security Considerations . . . . . . . . . . . . . . . . . . . 5 6. Security Considerations . . . . . . . . . . . . . . . . . . . 5
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 5 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 6
8.1. Normative References . . . . . . . . . . . . . . . . . . 5 8.1. Normative References . . . . . . . . . . . . . . . . . . 6
8.2. Informative References . . . . . . . . . . . . . . . . . 6 8.2. Informative References . . . . . . . . . . . . . . . . . 6
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7
1. Introduction 1. Introduction
Successful resolution of challenges of automated network operation, Successful resolution of challenges of automated network operation,
as part of overall life-cycle service orchestration, relies on as part of, for example, overall service orchestration or data center
collection of accurate and timely information that reflects the state operation, relies on a timely collection of accurate information that
of network elements on unprecedented massive, even grandiose scale. reflects the state of network elements on an unprecedented scale.
Because analysis and action upon the it requires considerable Because performing the analysis and act upon the collected
computing and storage resources, the network state information, also information requires considerable computing and storage resources,
referred to as telemetry, is unlikely to be processed by network the network state information is unlikely to be processed by network
elements themselves but will be relayed into data lakes. The process elements themselves but will be relayed into the data storage
of producing telemetry information, collecting and transporting it facilities, e.g. data lakes. The process of producing, collecting
for post-processing should equally work with data flows and specially network state information, also referred to in this document as network
inserted in the network test packets. Per [RFC7799] classification telemetry, and transporting it for post-processing should work
such process classified as hybrid measurement method. equally well with data flows or test packets injected into the network.
RFC 7799 [RFC7799] describes a combination of elements of passive and
active measurement as a hybrid measurement.
Several technical methods were proposed to enable collection of Several technical methods have been proposed to enable collection of
telemetry information instantaneous to the packet processing. Among network state information instantaneous to the packet processing,
them [P4.INT] and [I-D.ietf-ippm-ioam-data]. among them [P4.INT] and [I-D.ietf-ippm-ioam-data].
This document introduces new hybrid measurement method, referred to This document introduces Hybrid Two-Step (HTS) as a new hybrid
as Hybrid Two-step (HTS), that it separates measuring and/or measurement method that separates measuring or calculating
calculating performance metric from the collecting and transporting performance metrics from collecting and transporting this
telemetry. The hybrid two-step method extends two-step mode of information. The Hybrid Two-Step method extends the two-step mode of
Residence Time Measurement (RTM) defined in [RFC8169] to on-path Residence Time Measurement (RTM) defined in [RFC8169] to on-path
telemetry collection and transport. network state collection and transport.
2. Conventions used in this document 2. Conventions used in this document
2.1. Terminology 2.1. Terminology
RTM Residence Time Measurement RTM Residence Time Measurement
ECMP Equal Cost Multipath ECMP Equal Cost Multipath
MTU Maximum Transmission Unit MTU Maximum Transmission Unit
HTS Hybrid Two-step HTS Hybrid Two-Step
Network telemetry - the process of collecting and reporting of
network state
2.2. Requirements Language 2.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
3. Problem Overview 3. Problem Overview
Performance measurements are meant to provide data that characterize Performance measurements are meant to provide data that characterize
conditions experienced by data in the network and possibly trigger conditions experienced by traffic flows in the network and possibly
operations to re-route flows, allocate additional or free excess of trigger operational changes (e.g., re-routing of flows or changes in
resources. All changes to the network depend on the quality of resource allocations). Changes to a network are determined based on
collected data and calculated based on its performance metrics. The the performance metric information available at the time that a
quality of measurements defined not only by resolution but by how change is to be made. The correctness of this determination is based
consistent are performed measurements, how predictable is the moment on the quality of the collected metrics data. The quality of
of measurement making, of obtaining the data. Consider case of delay collected measurement data is defined by:
measurement that relies on collection of time of packet arrival at
the ingress interface and time of packet transmission at egress
interface. The ideal method may read wall clock value as the very
first octet of the packet being received at ingress, and another
value, as the first octet being transmitted. That way all nodal
processing delays be accounted for as this method excludes packet
queuing. But if the measurement method requires the original packet
to carry either both time values of the calculated delay value, then
the packet must be modified on-the-fly, while being transmitted. And
that task may become even more challenging if the packet is
encrypted. As result, at egress time may be obtained before the
packet transmission begins, thus leaving variable delays unmeasured.
Similar problem may cause lower quality of, for example, information
that characterizes utilization of the egress interface. If unable to
obtain the data consistently, without variable delays for additional
processing, information may not accurately reflect the state at the
egress interface. To mitigate this problem [RFC8169] defined RTM
two-step mode.
Another challenge facing methods that collect telemetry into the o the resolution and accuracy of each measurement;
actual data packet is risk of exceeding size of Maximum Transmission
Unit (MTU), particularly if the packet traverses overlay domains or o predictability of both the time at which each measurement is made
VPNs. Since the fragmentation is not available at the transport and the timeliness of measurement collection data delivery for
network, operators may have to reduce MTU size advertised to client use.
layer or risk missing telemetry data for the part, most probably the
latter part, of the path. Consider the case of delay measurement that relies on collecting time
of packet arrival at the ingress interface and time of the packet
transmission at egress interface. The method may be to record a
local clock value on receiving the first octet of an affected message
at the device ingress, and again to record the clock value on sending
the first byte of the same message at the device egress. In this
ideal case, the difference between the two recorded clock times
corresponds to the time that the message spent in traversing the
device. In practice, the times actually recorded can differ from the
ideal case by any fixed amount and a correction may then be applied
to compute the same time difference taking into account the known
fixed time associated with the actual measurement. In this way, the
resulting time difference reflects any variable delay associated with
queuing.
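As an illustration only, the computation described above can be
sketched in Python. The function name, the use of nanoseconds, and
the signed fixed_correction_ns parameter are assumptions of this
sketch; they are not defined by this document or by [RFC8169].

```python
def residence_time_ns(ingress_ns: int, egress_ns: int,
                      fixed_correction_ns: int = 0) -> int:
    """Return the time a message spent traversing a device.

    ingress_ns: local clock value recorded on receiving the first octet.
    egress_ns: local clock value recorded on sending the first octet.
    fixed_correction_ns: hypothetical signed correction for a known,
        fixed offset between the recorded and the ideal timestamps.
    """
    if egress_ns < ingress_ns:
        raise ValueError("egress timestamp precedes ingress timestamp")
    # The raw difference still contains the variable queuing delay;
    # only the known fixed component is corrected.
    return (egress_ns - ingress_ns) + fixed_correction_ns
```

A correction of this kind can only compensate for a fixed, known
offset. A departure time that is estimated before transmission begins
leaves the variable component unmeasured.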
Depending on the implementation, it may be a challenge to compute the
difference between message arrival and departure times and - on the
fly - add the necessary residence time information to the same
message. And that task may become even more challenging if the
packet is encrypted. Implementations SHOULD NOT record a message
departure time that may be significantly inaccurate in an effort to
include a correlated/computed delay value, in the same message, as a
result of estimating the departure time while including any variable
time component (such as that associated with buffering and queuing of
messages). A similar problem may cause a lower quality of, for
example, information that characterizes utilization of the egress
interface. If unable to obtain the data consistently, without
variable delays for additional processing, information may not
accurately reflect the state at the egress interface. To mitigate
this problem [RFC8169] defined RTM two-step mode.
Another challenge associated with methods that collect network state
information into the actual data packet is the risk of exceeding the
Maximum Transmission Unit (MTU) size, particularly if the packet
traverses overlay domains or VPNs. Since fragmentation is not
available at the transport network, operators may have to reduce the MTU
size advertised to the client layer or risk missing network state data
for the part, most probably the latter part, of the path.
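The MTU concern can be illustrated with a rough calculation. The
sketch below is hypothetical: the function names, the assumption that
every node adds the same number of octets, and the example sizes are
chosen for illustration and do not come from any specification.

```python
def fits_in_mtu(packet_len: int, per_hop_octets: int, hops: int,
                mtu: int = 1500) -> bool:
    """True if the packet still fits after `hops` nodes each add
    `per_hop_octets` of network state data to it."""
    return packet_len + per_hop_octets * hops <= mtu


def max_hops_recorded(packet_len: int, per_hop_octets: int,
                      mtu: int = 1500) -> int:
    """Number of nodes that can add their data before the MTU is
    exceeded; nodes beyond this count cannot record anything."""
    return max(0, (mtu - packet_len) // per_hop_octets)
```

With these assumed numbers, a 1400-octet packet collecting 16 octets
per node has room for data from six nodes, so the latter part of a
longer path would go unrecorded.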
4. Theory of Operation 4. Theory of Operation
HTS method consists of the two phases: The HTS method consists of the two phases:
o performing a measurement or obtaining telemetry information, one o performing a measurement or obtaining network state information,
or more than one type, on a node; one or more than one type, on a node;
o collecting and transporting the measurement. o collecting and transporting the measurement.
HTS uses HTS Control message to define types of measurement or HTS uses HTS Control message to define types of measurement or
telemetry data collection requested from a node. HTS Control message network state data collection requested from a node. HTS Control
may be inserted into the data packet, as meta-data or shim, or be message may be inserted into the data packet, as meta-data or shim,
transmitted in the specially constructed test packet. or be transmitted in a specially constructed test packet.
To collect measurement and telemetry data from the nodes HTS method To collect measurement and network state data from the nodes, the HTS
uses the follow-up packet. The node that creates the HTS Control method uses the follow-up packet. The node that creates the HTS
message also originates the HTS follow-up packet. The follow-up Control message also originates the HTS follow-up packet. The
packet contains characteristic information, copied from the data follow-up packet contains characteristic information, copied from the
packet, sufficient for participating nodes to associate it with the data packet, sufficient for participating nodes to associate it with
original packet. Exact composition of the characteristic information the original packet. The exact composition of the characteristic
is specific for each transport network and its definition is outside information is specific for each transport network and its definition
the scope of this document. The follow-up packet also uses the same is outside the scope of this document. The follow-up packet also
encapsulation as the data packet. If not payload but only network uses the same encapsulation as the data packet. If only network
information used to load-balance flows in equal cost multipath information, not the payload, is used to load-balance flows in equal cost
(ECMP), use of the network encapsulation identical to the data packet multipath (ECMP), use of the network encapsulation identical to the
should guarantee that the follow-up packet remains in-band, i.e. data packet should guarantee that the follow-up packet remains
traverses the same set of network elements, with the original data in-band, i.e., traverses the same set of network elements as the
packet. Only one outstanding follow-up packet may be on the node for original data packet. Only one outstanding follow-up packet may be
the given path. That means that if the node receives HTS Control on the node for the given path. That means that if the node receives
message for the flow on which it still waits for the follow-up packet HTS Control message for the flow on which it still waits for the
to the previous HTS Control message, the node will originate the follow-up packet to the previous HTS Control message, the node will
follow-up packet to transport the former set of the telemetry data originate the follow-up packet to transport the former set of the
and transmit it before it transmits the follow-up packet with the network state data and transmit it before it transmits the follow-up
latest set of telemetry information. packet with the latest set of network state information.
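The rule that only one follow-up packet may be outstanding on a node
for a given path can be sketched as a small state machine. This is a
minimal model under stated assumptions, not a definition of node
behavior: the class HtsNode, its method names, the flow identifier,
and the representation of a follow-up packet as a tuple are all
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HtsNode:
    # flow id -> data collected for the latest HTS Control message,
    # held until the matching follow-up packet arrives
    pending: dict = field(default_factory=dict)
    # follow-up packets in transmission order: (flow, records, origin)
    sent: list = field(default_factory=list)

    def on_control_message(self, flow, data):
        """Record data requested by an HTS Control message."""
        if flow in self.pending:
            # Still waiting for the previous follow-up packet: originate
            # one locally to carry the former data set, sent first.
            self.sent.append((flow, [self.pending.pop(flow)], "originated"))
        self.pending[flow] = data

    def on_follow_up(self, flow, carried):
        """Add this node's data to a received follow-up and forward it."""
        records = list(carried)
        data = self.pending.pop(flow, None)
        if data is not None:
            records.append(data)
        self.sent.append((flow, records, "forwarded"))
```

In this model, two Control messages for the same flow followed by a
single follow-up packet produce two transmissions: first a locally
originated follow-up with the former data set, then the forwarded
follow-up with the latest one.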
5. IANA Considerations 5. IANA Considerations
This document doesn't have any IANA requirements. The section may be This document doesn't have any IANA requirements. The section may be
deleted before the publication. deleted before the publication.
6. Security Considerations 6. Security Considerations
Nodes that practice HTS method are presumed to share a trust model Nodes that practice HTS method are presumed to share a trust model
that depends on the existence of a trusted relationship among them. that depends on the existence of a trusted relationship among nodes.
This is necessary as these nodes are expected to correctly modify This is necessary as these nodes are expected to correctly modify the
specific content of the data in the follow-up packet, and degree to specific content of the data in the follow-up packet, and the degree
which HTS measurement is useful for network operation depends on this to which HTS measurement is useful for network operation depends on
ability. In practice, this means that those portions of messages this ability. In practice, this means that those portions of
that contain the telemetry data cannot be covered by either messages that contain the network state data cannot be covered by
confidentiality or integrity protection. Though there are methods either confidentiality or integrity protection. Though there are
that make it possible in theory to provide either or both such methods that make it possible in theory to provide either or both
protections and still allow for intermediate nodes to make detectable such protections and still allow for intermediate nodes to make
but authenticated modifications, such methods do not seem practical detectable but authenticated modifications, such methods do not seem
at present, particularly for protocols that used to measure latency practical at present, particularly for protocols that are used to measure
and/or jitter. latency and/or jitter.
The ability to potentially authenticate and/or encrypt the telemetry The ability to potentially authenticate and/or encrypt the network
data for scenarios both with and without participation of state data for scenarios both with and without the participation of
intermediate nodes that participate in HTS measurement is left for intermediate nodes that participate in HTS measurement is left for
further study. further study.
While it is possible for a supposed compromised node to intercept and While it is possible for a supposed compromised node to intercept and
modify the telemetry information in the follow-up packet, this is an modify the network state information in the follow-up packet, this is
issue that exists for nodes in general - for any and all data that an issue that exists for nodes in general - for any and all data that
may be carried over the particular networking technology - and is may be carried over the particular networking technology - and is
therefore the basis for an additional presumed trust model associated therefore the basis for an additional presumed trust model associated
with existing network. with an existing network.
7. Acknowledgements 7. Acknowledgments
TBD TBD
8. References 8. References
8.1. Normative References 8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
skipping to change at page 6, line 10 skipping to change at page 6, line 34
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
8.2. Informative References 8.2. Informative References
[I-D.ietf-ippm-ioam-data] [I-D.ietf-ippm-ioam-data]
Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
P., Chang, R., and d. daniel.bernier@bell.ca, "Data Fields P., Chang, R., Bernier, D., and J. Lemon,
for In-situ OAM", draft-ietf-ippm-ioam-data-01 (work in "Data Fields for In-situ OAM", draft-ietf-ippm-ioam-
progress), October 2017. data-03 (work in progress), June 2018.
[P4.INT] "In-band Network Telemetry (INT)", P4.org Specification, [P4.INT] "In-band Network Telemetry (INT)", P4.org Specification,
October 2017. October 2017.
[RFC7799] Morton, A., "Active and Passive Metrics and Methods (with [RFC7799] Morton, A., "Active and Passive Metrics and Methods (with
Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
May 2016, <https://www.rfc-editor.org/info/rfc7799>. May 2016, <https://www.rfc-editor.org/info/rfc7799>.
[RFC8169] Mirsky, G., Ruffini, S., Gray, E., Drake, J., Bryant, S., [RFC8169] Mirsky, G., Ruffini, S., Gray, E., Drake, J., Bryant, S.,
and A. Vainshtein, "Residence Time Measurement in MPLS and A. Vainshtein, "Residence Time Measurement in MPLS
 End of changes. 26 change blocks. 
115 lines changed or deleted 139 lines changed or added
