< draft-mirsky-ippm-hybrid-two-step-01.txt   draft-mirsky-ippm-hybrid-two-step-02.txt >
IPPM Working Group G. Mirsky IPPM Working Group G. Mirsky
Internet-Draft ZTE Corp. Internet-Draft ZTE Corp.
Intended status: Informational W. Lingqiang Intended status: Standards Track W. Lingqiang
Expires: January 3, 2019 G. Zhui Expires: April 20, 2019 G. Zhui
ZTE Corporation ZTE Corporation
July 2, 2018 October 17, 2018
Hybrid Two-Step Performance Measurement Method Hybrid Two-Step Performance Measurement Method
draft-mirsky-ippm-hybrid-two-step-01 draft-mirsky-ippm-hybrid-two-step-02
Abstract Abstract
Development of, and advancements in, automation of network operations Development of, and advancements in, automation of network operations
brought new requirements for measurement methodology. Among them is brought new requirements for measurement methodology. Among them is
the ability to collect instant network state as the packet being the ability to collect instant network state as the packet being
processed by the networking elements along its path through the processed by the networking elements along its path through the
domain. This document introduces a new hybrid measurement method, domain. This document introduces a new hybrid measurement method,
referred to as hybrid two-step, as it separates the act of measuring referred to as hybrid two-step, as it separates the act of measuring
and/or calculating performance metric from the act of collecting and and/or calculating the performance metric from the act of collecting
transporting network state. and transporting network state.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 3, 2019. This Internet-Draft will expire on April 20, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 18 skipping to change at page 2, line 18
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Conventions used in this document . . . . . . . . . . . . . . 3 2. Conventions used in this document . . . . . . . . . . . . . . 3
2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 2.2. Requirements Language . . . . . . . . . . . . . . . . . . 3
3. Problem Overview . . . . . . . . . . . . . . . . . . . . . . 3 3. Problem Overview . . . . . . . . . . . . . . . . . . . . . . 3
4. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 4 4. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 4
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 4.1. Operation of the HTS Ingress Node . . . . . . . . . . . . 5
6. Security Considerations . . . . . . . . . . . . . . . . . . . 5 4.2. Operation of the HTS Transient Node . . . . . . . . . . . 7
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6 4.3. Operation of the HTS Egress Node . . . . . . . . . . . . 8
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.4. Considerations for HTS Timers . . . . . . . . . . . . . . 8
8.1. Normative References . . . . . . . . . . . . . . . . . . 6 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
8.2. Informative References . . . . . . . . . . . . . . . . . 6 6. Security Considerations . . . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 9
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 9
8.1. Normative References . . . . . . . . . . . . . . . . . . 9
8.2. Informative References . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 10
1. Introduction 1. Introduction
Successful resolution of challenges of automated network operation, Successful resolution of challenges of automated network operation,
as part of, for example, overall service orchestration or data center as part of, for example, overall service orchestration or data center
operation, relies on a timely collection of accurate information that operation, relies on a timely collection of accurate information that
reflects the state of network elements on an unprecedented scale. reflects the state of network elements on an unprecedented scale.
Because performing the analysis and act upon the collected Because performing the analysis and act upon the collected
information requires considerable computing and storage resources, information requires considerable computing and storage resources,
the network state information is unlikely to be processed by network the network state information is unlikely to be processed by the
elements themselves but will be relayed into the data storage network elements themselves but will be relayed into the data storage
facilities, e.g. data lakes. The process of producing, collecting facilities, e.g., data lakes. The process of producing, collecting
network state information also referred in this document as network network state information also referred to in this document as
telemetry, and transporting it for post-processing should work network telemetry, and transporting it for post-processing should
equally well with data flows or injected in the network test packets. work equally well with data flows or injected in the network test
RFC 7799 [RFC7799] describes a combination of elements of passive and packets. RFC 7799 [RFC7799] describes a combination of elements of
active measurement as a hybrid measurement. passive and active measurement as a hybrid measurement.
Several technical methods have been proposed to enable collection of Several technical methods have been proposed to enable collection of
network state information instantaneous to the packet processing, network state information instantaneous to the packet processing,
among them [P4.INT] and [I-D.ietf-ippm-ioam-data]. among them [P4.INT] and [I-D.ietf-ippm-ioam-data].
This document introduces Hybrid Two-Step (HTS) as a new hybrid This document introduces Hybrid Two-Step (HTS) as a new hybrid
measurement method that separates measuring or calculating measurement method that separates measuring or calculating the
performance metric from the collecting and transporting this performance metric from the collecting and transporting this
information. The Hybrid Two-Step method extends the two-step mode of information. The Hybrid Two-Step method extends the two-step mode of
Residence Time Measurement (RTM) defined in [RFC8169] to on-path Residence Time Measurement (RTM) defined in [RFC8169] to on-path
network state collection and transport. network state collection and transport.
2. Conventions used in this document 2. Conventions used in this document
2.1. Terminology 2.1. Terminology
RTM Residence Time Measurement RTM Residence Time Measurement
skipping to change at page 3, line 34 skipping to change at page 3, line 39
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
3. Problem Overview 3. Problem Overview
Performance measurements are meant to provide data that characterize Performance measurements are meant to provide data that characterize
conditions experienced by traffic flows in the network and possibly conditions experienced by traffic flows in the network and possibly
trigger operational changes (e.g. - re-route of flows, or changes in trigger operational changes (e.g., re-route of flows, or changes in
resource allocations). Changes to a network are determined based on resource allocations). Modifications to a network are determined
the performance metric information available at the time that a based on the performance metric information available at the time
change is to be made. The correctness of this determination is based that a change is to be made. The correctness of this determination
on the quality of the collected metrics data. The quality of is based on the quality of the collected metrics data. The quality
collected measurement data is defined is defined by: of collected measurement data is defined by:
o the resolution and accuracy of each measurement; o the resolution and accuracy of each measurement;
o predictability of both the time at which each measurement is made o predictability of both the time at which each measurement is made
and the timeliness of measurement collection data delivery for and the timeliness of measurement collection data delivery for
use. use.
Consider the case of delay measurement that relies on collecting time Consider the case of delay measurement that relies on collecting time
of packet arrival at the ingress interface and time of the packet of packet arrival at the ingress interface and time of the packet
transmission at egress interface. The method may be to record a transmission at the egress interface. The method includes recording
local clock value on receiving the first octet of an affected message a local clock value on receiving the first octet of an affected
at the device ingress, and again to record the clock value on sending message at the device ingress, and again recording the clock value on
the first byte of the same message at the device egress. In this transmitting the first byte of the same message at the device egress.
ideal case, the difference between the two recorded clock times In this ideal case, the difference between the two recorded clock
corresponds to the time that the message spent in traversing the times corresponds to the time that the message spent in traversing
device. In practice, the times actually recorded can differ from the the device. In practice, the time that has been recorded can differ
ideal case by any fixed amount and a correction may then be applied from the ideal case by any fixed amount and a correction can be
to compute the same time difference taking into account the known applied to compute the same time difference taking into account the
fixed time associated with the actual measurement. In this way, the known fixed time associated with the actual measurement. In this
resulting time difference reflects any variable delay associated with way, the resulting time difference reflects any variable delay
queuing. associated with queuing.
Depending on the implementation, it may be a challenge to compute the Depending on the implementation, it may be a challenge to compute the
difference between message arrival and departure times and - on the difference between message arrival and departure times and - on the
fly - add the necessary residence time information to the same fly - add the necessary residence time information to the same
message. And that task may become even more challenging if the message. And that task may become even more challenging if the
packet is encrypted. Implementations SHOULD NOT record a message packet is encrypted. Implementations SHOULD NOT record a message
departure time that may be significantly inaccurate in an effort to departure time that may be significantly inaccurate in the same
include a correlated/computed delay value, in the same message, as a message, as the result of estimating the departure time that includes
result of estimating the departure time while including any variable the variable time component (such as that associated with buffering
time component (such as that associated with buffering and queuing of and queuing of the message). A similar problem may cause a lower
messages). A similar problem may cause a lower quality of, for quality of, for example, information that characterizes utilization
example, information that characterizes utilization of the egress of the egress interface. If unable to obtain the data consistently,
interface. If unable to obtain the data consistently, without without variable delays for additional processing, information may
variable delays for additional processing, information may not not accurately reflect the state at the egress interface. To
accurately reflect the state at the egress interface. To mitigate mitigate this problem [RFC8169] defined an RTM two-step mode.
this problem [RFC8169] defined RTM two-step mode.
Another challenge associated with methods that collect network state Another challenge associated with methods that collect network state
information into the actual data packet is the risk to exceed the information into the actual data packet is the risk to exceed the
Maximum Transmission Unit (MTU) size, particularly if the packet Maximum Transmission Unit (MTU) size, especially if the packet
traverses overlay domains or VPNs. Since the fragmentation is not traverses overlay domains or VPNs. Since the fragmentation is not
available at the transport network, operators may have to reduce MTU available at the transport network, operators may have to reduce MTU
size advertised to client layer or risk missing network state data size advertised to client layer or risk missing network state data
for the part, most probably the latter part, of the path. for the part, most probably the latter part, of the path.
4. Theory of Operation 4. Theory of Operation
The HTS method consists of the two phases: The HTS method consists of the two phases:
o performing a measurement or obtaining network state information, o performing a measurement or obtaining network state information,
one or more than one type, on a node; one or more than one type, on a node;
o collecting and transporting the measurement. o collecting and transporting the measurement.
HTS uses HTS Control message to define types of measurement or HTS uses HTS Trigger carried in a data packet or a specially
network state data collection requested from a node. HTS Control constructed test packet. Nature of the HTS Trigger is transport
message may be inserted into the data packet, as meta-data or shim, network layer specific, and its description is outside the scope of
or be transmitted in a specially constructed test packet. this document. The packet that includes the HTS Trigger in this
document also referred to as the trigger packet.
To collect measurement and network state data from the nodes HTS The HTS method uses the HTS Follow-up packet, in this document also
method uses the follow-up packet. The node that creates the HTS referred to as the follow-up packet, to collect measurement and
Control message also originates the HTS follow-up packet. The network state data from the nodes. The node that creates the HTS
follow-up packet contains characteristic information, copied from the Trigger also generates the HTS Follow-up packet. The follow-up
data packet, sufficient for participating nodes to associate it with packet contains characteristic information, copied from the trigger
packet, sufficient for participating HTS nodes to associate it with
the original packet. The exact composition of the characteristic the original packet. The exact composition of the characteristic
information is specific for each transport network and its definition information is specific for each transport network, and its
is outside the scope of this document. The follow-up packet also definition is outside the scope of this document. The follow-up
uses the same encapsulation as the data packet. If not payload but packet also uses the same encapsulation as the data packet. If not
only network information used to load-balance flows in equal cost payload but only network information used to load-balance flows in
multipath (ECMP), use of the network encapsulation identical to the equal cost multipath (ECMP), use of the network encapsulation
data packet should guarantee that the follow-up packet remains in- identical to the trigger packet should guarantee that the follow-up
band, i.e. traverses the same set of network elements, with the packet remains in-band, i.e., traverses the same set of network
original data packet. Only one outstanding follow-up packet may be elements, with the original data packet with the HTS Trigger. Only
on the node for the given path. That means that if the node receives one outstanding follow-up packet MUST be on the node for the given
HTS Control message for the flow on which it still waits for the path. That means that if the node receives an HTS Trigger for the
follow-up packet to the previous HTS Control message, the node will flow on which it still waits for the follow-up packet to the previous
originate the follow-up packet to transport the former set of the HTS Trigger, the node will originate the follow-up packet to
network state data and transmit it before it transmits the follow-up transport the former set of the network state data and transmit it
packet with the latest set of network state information. before it sends the follow-up packet with the latest collection of
network state information.
4.1. Operation of the HTS Ingress Node
A node that originates the HTS Trigger is referred to as HTS ingress
node. As stated, the ingress node originates the follow-up packet.
The follow-up packet has the transport network encapsulation
identical with the trigger packet followed by the HTS shim and one or
more telemetry information elements encoded as Type-Length-Value
{TLV}. Figure 1 displays the example of the follow-up packet format.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
~ Transport Network ~
| Encapsulation |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|HTS Shim Len| Flags | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Telemetry Data Profile |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
~ Telemetry Data TLVs ~
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Follow-up Packet Format
Fields of the HTS shim are as follows:
Version (Ver) is the two-bits long field. It specifies the
version of the HTS shim format. This document defines the format
for the 0b00 value of the field.
HTS Shim Length is the six bits-long field. It defines the length
of the HTS shim in bytes. The minimal value of the field is four
bytes.
Flags is eight-bits long field. The format of the Flags field
displayed in Figure 2.
Full (F) flag MUST be set to zero by the node originating the
HTS follow-up packet and MUST be set to one by the node that
does not add its telemetry data to avoid exceeding MTU size.
The node originating the follow-up packet MUST zero the
Reserved field and ignore it on the receipt.
Sequence Number is 16 bits-long field. The value of the field
reflects the number of the HTS follow-up packet in the sequence of
the HTS follow-up packets originated in response to the same HTS
trigger. The ingress node MUST set the value of the field to
zero.
Telemetry Data Profile is the optional variable length field of
bit-size flags. Each flag indicates requested type of telemetry
data to be collected at the each HTS node. The increment of the
field is four bytes with a minimum length of zero.
0
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|F| Reserved |
+-+-+-+-+-+-+-+-+
Figure 2: Flags Field Format
4.2. Operation of the HTS Transient Node
Upon receiving the trigger packet the HTS transient node MUST:
o copy the transport information;
o start the HTS Follow-up Timer for the obtained flow.
Upon receiving the follow-up packet the HTS transient node MUST:
o verify that the matching transport information exists and the Full
flag is cleared, then stop the associated HTS Follow-up timer;
o collect telemetry data requested in the Telemetry Data Profile
field or defined by the local HTS policy;
o if adding the collected telemetry would not exceed MTU, then
append data into Telemetry Data TLVs field and transmit the
follow-up packet;
o otherwise, set the value of the Full flag to one and transmit the
received a follow-up packet;
o originate the new follow-up packet using the same transport
information. The value of the Sequence Number field in the HTS
shim MUST be set to the value of the field in the received follow-
up packet incremented by one. Copy collected telemetry data and
transmit the packet.
If the follow-up timer expires the transient node MUST:
o originate the follow-up packet using transport information
associated with the expired timer;
o initialize the HTS shim by setting Version field to 0b00 and
Sequence Number field to 0. Values of HTS Shim Length and
Telemetry Data Profile fields MAY be set according to the local
policy.
o copy telemetry information into Telemetry Data TLVs field and
transmit the packet.
4.3. Operation of the HTS Egress Node
Upon receiving the trigger packet the HTS egress node MUST:
o copy the transport information;
o start the HTS Collection timer for the obtained flow.
When the egress node receives the follow-up packet for the known
flow, i.e., the flow to which the Collection timer is running, the
node MUST:
o copy telemetry information;
o restart the corresponding Collection timer.
When the Collection timer expires the egress relays the collected
telemetry information for processing and analysis to a local or
remote agent.
4.4. Considerations for HTS Timers
This specification defines two timers - HTS Follow-up and HTS
Collection. Because for the particular flow there MUST be not more
than one HTS Trigger, values of HTS timers bounded by the rate of the
trigger generation for that flow.
5. IANA Considerations 5. IANA Considerations
This document doesn't have any IANA requirements. The section may be TBD
deleted before the publication.
6. Security Considerations 6. Security Considerations
Nodes that practice HTS method are presumed to share a trust model Nodes that practice HTS method are presumed to share a trust model
that depends on the existence of a trusted relationship among nodes. that depends on the existence of a trusted relationship among nodes.
This is necessary as these nodes are expected to correctly modify the This is necessary as these nodes are expected to correctly modify the
specific content of the data in the follow-up packet, and the degree specific content of the data in the follow-up packet, and the degree
to which HTS measurement is useful for network operation depends on to which HTS measurement is useful for network operation depends on
this ability. In practice, this means that those portions of this ability. In practice, this means either confidentiality or
messages that contain the network state data cannot be covered by integrity protection cannot cover those portions of messages that
either confidentiality or integrity protection. Though there are contain the network state data. Though there are methods that make
methods that make it possible in theory to provide either or both it possible in theory to provide either or both such protections and
such protections and still allow for intermediate nodes to make still allow for intermediate nodes to make detectable yet
detectable but authenticated modifications, such methods do not seem authenticated modifications, such methods do not seem practical at
practical at present, particularly for protocols that used to measure present, particularly for protocols that used to measure latency and/
latency and/or jitter. or jitter.
The ability to potentially authenticate and/or encrypt the network The ability to potentially authenticate and/or encrypt the network
state data for scenarios both with and without the participation of state data for scenarios both with and without the participation of
intermediate nodes that participate in HTS measurement is left for intermediate nodes that participate in HTS measurement is left for
further study. further study.
While it is possible for a supposed compromised node to intercept and While it is possible for a supposed compromised node to intercept and
modify the network state information in the follow-up packet, this is modify the network state information in the follow-up packet, this is
an issue that exists for nodes in general - for any and all data that an issue that exists for nodes in general - for all data that to be
may be carried over the particular networking technology - and is carried over the particular networking technology - and is therefore
therefore the basis for an additional presumed trust model associated the basis for an additional presumed trust model associated with an
with an existing network. existing network.
7. Acknowledgments 7. Acknowledgments
TBD Authors express their gratitude and appreciation to Joel Halpern for
the most helpful and insightful discussion on the applicability of
HTS in a Service Function Chaining domain.
8. References 8. References
8.1. Normative References 8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
 End of changes. 19 change blocks. 
90 lines changed or deleted 232 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/