IPPM Working Group                                             G. Mirsky
Internet-Draft                                                 ZTE Corp.
Intended status: Informational                              W. Lingqiang
Expires: January 3, 2019                                         G. Zhui
                                                         ZTE Corporation
                                                            July 2, 2018

             Hybrid Two-Step Performance Measurement Method
                  draft-mirsky-ippm-hybrid-two-step-01

Abstract

   Development of, and advancements in, automation of network operations
   brought new requirements for measurement methodology.  Among them is
   the ability to collect instant network state as the packet is being
   processed by the networking elements along its path through the
   domain.  This document introduces a new hybrid measurement method,
   referred to as hybrid two-step, as it separates the act of measuring
   and/or calculating the performance metric from the act of collecting
   and transporting network state.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 3, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Problem Overview
   4.  Theory of Operation
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Successful resolution of challenges of automated network operation,
   as part of, for example, overall service orchestration or data center
   operation, relies on a timely collection of accurate information that
   reflects the state of network elements on an unprecedented scale.
   Because analyzing the collected information and acting upon it
   requires considerable computing and storage resources, the network
   state information is unlikely to be processed by network elements
   themselves but will be relayed into data storage facilities, e.g.
   data lakes.  The process of producing and collecting network state
   information, also referred to in this document as network telemetry,
   and transporting it for post-processing should work equally well with
   data flows and with test packets injected into the network.  RFC 7799
   [RFC7799] describes a combination of elements of passive and active
   measurement as a hybrid measurement.

   Several technical methods have been proposed to enable collection of
   network state information instantaneous to the packet processing,
   among them [P4.INT] and [I-D.ietf-ippm-ioam-data].

   This document introduces Hybrid Two-Step (HTS) as a new method that
   separates measuring and/or calculating the performance metric from
   collecting and transporting this information.  The Hybrid Two-Step
   method extends the two-step mode of Residence Time Measurement (RTM)
   defined in [RFC8169] to on-path network state collection and
   transport.

2.  Conventions used in this document

2.1.  Terminology

   RTM Residence Time Measurement

   ECMP Equal Cost Multipath

   MTU Maximum Transmission Unit

   HTS Hybrid Two-Step

   Network telemetry - the process of collecting and reporting network
   state

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Problem Overview

   Performance measurements are meant to provide data that characterize
   conditions experienced by traffic flows in the network and possibly
   trigger operational changes (e.g., re-routing of flows or changes in
   resource allocations).  Changes to a network are determined based on
   the performance metric information available at the time that a
   change is to be made.  The correctness of this determination is based
   on the quality of the collected metrics data.  The quality of
   collected measurement data is defined by:

   o  the resolution and accuracy of each measurement;

   o  predictability of both the time at which each measurement is made
      and the timeliness with which the collected measurement data is
      delivered for use.

   Consider the case of delay measurement that relies on collecting the
   time of packet arrival at the ingress interface and the time of the
   packet transmission at the egress interface.  The ideal method may be
   to record a local clock value on receiving the first octet of an
   affected message at the device ingress, and again to record the clock
   value on sending the first octet of the same message at the device
   egress.  In this ideal case, the difference between the two recorded
   clock times corresponds to the time that the message spent in
   traversing the device.  In practice, the times actually recorded can
   differ from the ideal case by any fixed amount and a correction may
   then be applied to compute the same time difference taking into
   account the known fixed time associated with the actual measurement.
   In this way, the resulting time difference reflects any variable
   delay associated with queuing.
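
   The fixed-offset correction described above can be illustrated with
   a short Python sketch.  It is not part of the HTS method.  The
   TimestampPoint type, its field names, and the nanosecond units are
   assumptions made for this example only.  The sketch further assumes
   that the ingress clock sample is taken a known fixed time after the
   first octet arrives and that the egress clock sample is taken a known
   fixed time before the first octet departs, so both offsets are added
   back to the raw difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestampPoint:
    """Hypothetical description of where a node samples its clock."""
    # Fixed time from first-octet arrival to the ingress clock sample.
    ingress_offset_ns: int
    # Fixed time from the egress clock sample to first-octet departure.
    egress_offset_ns: int


def residence_time_ns(rx_sample_ns: int, tx_sample_ns: int,
                      point: TimestampPoint) -> int:
    """Return the corrected residence time in nanoseconds.

    The raw difference misses the fixed interval before the ingress
    sample and the fixed interval after the egress sample, so both are
    added back.  Variable delay (e.g. queuing) between the two samples
    is already contained in the raw difference.
    """
    if tx_sample_ns < rx_sample_ns:
        raise ValueError("egress sample precedes ingress sample")
    raw = tx_sample_ns - rx_sample_ns
    return raw + point.ingress_offset_ns + point.egress_offset_ns
```

   With samples of 1000 ns and 6000 ns and fixed offsets of 40 ns and
   25 ns, the function returns 5065 ns: the 5000 ns raw difference plus
   the 65 ns of known fixed time.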

   Depending on the implementation, it may be a challenge to compute the
   difference between message arrival and departure times and - on the
   fly - add the necessary residence time information to the same
   message.  That task may become even more challenging if the packet is
   encrypted.  Implementations SHOULD NOT record a message departure
   time that may be significantly inaccurate in an effort to include a
   correlated/computed delay value in the same message, as a result of
   estimating the departure time while including any variable time
   component (such as that associated with buffering and queuing of
   messages).  A similar problem may cause a lower quality of, for
   example, information that characterizes utilization of the egress
   interface.  If the data cannot be obtained consistently, without
   variable delays for additional processing, the information may not
   accurately reflect the state at the egress interface.  To mitigate
   this problem [RFC8169] defined RTM two-step mode.

   Another challenge associated with methods that collect network state
   information into the actual data packet is the risk of exceeding the
   Maximum Transmission Unit (MTU) size, particularly if the packet
   traverses overlay domains or VPNs.  Since fragmentation is not
   available at the transport network, operators may have to reduce the
   MTU size advertised to the client layer or risk missing network state
   data for part, most probably the latter part, of the path.
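
   The MTU trade-off can be made concrete with a small Python sketch.
   The function names and the per-hop byte counts are hypothetical and
   serve only as an illustration.  No particular in-packet telemetry
   encoding is assumed beyond each hop appending a known number of
   octets to the packet.

```python
from typing import List, Optional


def first_hop_exceeding_mtu(packet_len: int, per_hop_bytes: List[int],
                            mtu: int) -> Optional[int]:
    """Return the 0-based index of the first hop whose added data would
    push the packet over the MTU, or None if every hop's data fits."""
    size = packet_len
    for hop, added in enumerate(per_hop_bytes):
        size += added
        if size > mtu:
            return hop
    return None


def usable_client_mtu(mtu: int, per_hop_bytes: List[int]) -> int:
    """Largest client packet that leaves room for every hop's data."""
    return mtu - sum(per_hop_bytes)
```

   For a 1450-octet packet crossing five hops that each add 16 octets
   under a 1500-octet MTU, the fourth hop (index 3) is the first that
   cannot add its data, so state for the latter part of the path would
   be lost.  Avoiding that would require advertising a client MTU of
   1420 octets.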

4.  Theory of Operation

   The HTS method consists of two phases:

   o  performing a measurement or obtaining network state information,
      of one or more types, on a node;

   o  collecting and transporting the measurement.

   HTS uses the HTS Control message to define the types of measurement
   or network state data collection requested from a node.  The HTS
   Control message may be inserted into the data packet, as meta-data or
   a shim, or be transmitted in a specially constructed test packet.

   To collect measurement and network state data from the nodes, the HTS
   method uses the follow-up packet.  The node that creates the HTS
   Control message also originates the HTS follow-up packet.  The
   follow-up packet contains characteristic information, copied from the
   data packet, sufficient for participating nodes to associate it with
   the original packet.  The exact composition of the characteristic
   information is specific for each transport network and its definition
   is outside the scope of this document.  The follow-up packet also
   uses the same encapsulation as the data packet.  If only network
   information, and not the payload, is used to load-balance flows in
   equal cost multipath (ECMP), use of network encapsulation identical
   to that of the data packet should guarantee that the follow-up packet
   remains in-band, i.e. traverses the same set of network elements as
   the original data packet.  Only one outstanding follow-up packet may
   be on the node for the given path.  That means that if the node
   receives an HTS Control message for the flow on which it still waits
   for the follow-up packet to the previous HTS Control message, the
   node will originate a follow-up packet to transport the former set
   of network state data and transmit it before it transmits the
   follow-up packet with the latest set of network state information.
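
   The single-outstanding-follow-up rule can be sketched in Python as
   follows.  This is an illustration under stated assumptions, not a
   definition of node behavior.  The HtsNode class, the flow identifier
   string, and the dictionary layout of the follow-up packet are all
   hypothetical, since this document does not define packet formats or
   the characteristic information.

```python
from typing import Dict, List, Tuple


class HtsNode:
    """Hypothetical model of a node participating in HTS."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Network state captured per flow, awaiting a follow-up packet.
        self._pending: Dict[str, dict] = {}

    def on_control(self, flow: str, state: dict) -> List[dict]:
        """Handle an HTS Control message for a flow.

        Returns the follow-up packets the node must originate and
        transmit now.  The list is non-empty only when the follow-up to
        the previous HTS Control message has not arrived yet.
        """
        originated: List[dict] = []
        former = self._pending.pop(flow, None)
        if former is not None:
            # Flush the former data set in a locally originated
            # follow-up so it leaves before the latest set does.
            originated.append(
                {"flow": flow, "records": [(self.name, former)]})
        self._pending[flow] = state
        return originated

    def on_follow_up(self, packet: dict) -> dict:
        """Add this node's pending state, if any, and return the
        follow-up packet to forward along the path."""
        records: List[Tuple[str, dict]] = list(packet["records"])
        state = self._pending.pop(packet["flow"], None)
        if state is not None:
            records.append((self.name, state))
        return {"flow": packet["flow"], "records": records}
```

   In this model, a second HTS Control message for the same flow causes
   the node to originate a follow-up carrying the earlier data.  The
   follow-up that arrives later collects only the latest data, which
   preserves the transmission order described above.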

5.  IANA Considerations

   This document doesn't have any IANA requirements.  The section may be
   deleted before the publication.

6.  Security Considerations

   Nodes that practice the HTS method are presumed to share a trust
   model that depends on the existence of a trusted relationship among
   nodes.  This is necessary as these nodes are expected to correctly
   modify the specific content of the data in the follow-up packet, and
   the degree to which HTS measurement is useful for network operation
   depends on this ability.  In practice, this means that those portions
   of messages that contain the network state data cannot be covered by
   either confidentiality or integrity protection.  Though there are
   methods that make it possible in theory to provide either or both
   such protections and still allow for intermediate nodes to make
   detectable but authenticated modifications, such methods do not seem
   practical at present, particularly for protocols that are used to
   measure latency and/or jitter.

   The ability to potentially authenticate and/or encrypt the network
   state data for scenarios both with and without the participation of
   intermediate nodes that participate in HTS measurement is left for
   further study.

   While it is possible for a supposed compromised node to intercept and
   modify the network state information in the follow-up packet, this is
   an issue that exists for nodes in general - for any and all data that
   may be carried over the particular networking technology - and is
   therefore the basis for an additional presumed trust model associated
   with an existing network.

7.  Acknowledgments

   TBD

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2.  Informative References

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
              Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
              P., Chang, R., Bernier, D., and J. Lemon, "Data Fields
              for In-situ OAM", draft-ietf-ippm-ioam-data-03 (work in
              progress), June 2018.

   [P4.INT]   "In-band Network Telemetry (INT)", P4.org Specification,
              October 2017.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016, <https://www.rfc-editor.org/info/rfc7799>.

   [RFC8169]  Mirsky, G., Ruffini, S., Gray, E., Drake, J., Bryant, S.,
              and A. Vainshtein, "Residence Time Measurement in MPLS
              Networks", RFC 8169, DOI 10.17487/RFC8169, May 2017,
              <https://www.rfc-editor.org/info/rfc8169>.

Authors' Addresses

   Greg Mirsky
   ZTE Corp.

   Email: gregimirsky@gmail.com

   Wang Lingqiang
   ZTE Corporation
   No 19, East Huayuan Road
   Beijing   100191
   P.R.China

   Phone: +86 10 82963945
   Email: wang.lingqiang@zte.com.cn

   Guo Zhui
   ZTE Corporation
   No 19, East Huayuan Road
   Beijing   100191
   P.R.China

   Phone: +86 10 82963945
   Email: guo.zhui@zte.com.cn