IPPM Working Group                                             G. Mirsky
Internet-Draft                                                 ZTE Corp.
Intended status: Informational                              W. Lingqiang
Expires: January 3, 2019                                         G. Zhui
                                                         ZTE Corporation
                                                            July 2, 2018

Hybrid Two-Step Performance Measurement Method
draft-mirsky-ippm-hybrid-two-step-01

Abstract

Development of, and advancements in, automation of network operations have brought new requirements for measurement methodology. Among them is the ability to collect instant network state as the packet is being processed by the networking elements along its path through the domain. This document introduces a new hybrid measurement method, referred to as hybrid two-step, as it separates the act of measuring and/or calculating a performance metric from the act of collecting and transporting network state.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 3, 2019.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Conventions used in this document
   2.1. Terminology
   2.2. Requirements Language
3. Problem Overview
4. Theory of Operation
5. IANA Considerations
6. Security Considerations
7. Acknowledgments
8. References
   8.1. Normative References
   8.2. Informative References
Authors' Addresses

1. Introduction

Successful resolution of the challenges of automated network operation, as part of, for example, overall service orchestration or data center operation, relies on a timely collection of accurate information that reflects the state of network elements on an unprecedented scale. Because performing the analysis and acting upon the collected information requires considerable computing and storage resources, the network state information is unlikely to be processed by the network elements themselves and will instead be relayed to data storage facilities, e.g., data lakes.
The process of producing and collecting network state information, also referred to in this document as network telemetry, and transporting it for post-processing should work equally well with data flows and with test packets injected into the network. RFC 7799 [RFC7799] classifies a method that combines elements of passive and active measurement as a hybrid measurement method. Several technical methods have been proposed to enable collection of network state information instantaneous to the packet processing, among them [P4.INT] and [I-D.ietf-ippm-ioam-data].

This document introduces Hybrid Two-Step (HTS) as a new hybrid measurement method that separates measuring or calculating a performance metric from collecting and transporting this information. The Hybrid Two-Step method extends the two-step mode of Residence Time Measurement (RTM) defined in [RFC8169] to on-path network state collection and transport.

2. Conventions used in this document

2.1. Terminology

RTM: Residence Time Measurement
ECMP: Equal Cost Multipath
MTU: Maximum Transmission Unit
HTS: Hybrid Two-Step
Network telemetry: the process of collecting and reporting network state

2.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Problem Overview

Performance measurements are meant to provide data that characterize conditions experienced by traffic flows in the network and possibly trigger operational changes (e.g., re-routing of flows or changes in resource allocations).
Changes to a network are determined based on the performance metric information available at the time that a change is to be made. The correctness of this determination depends on the quality of the collected metrics data. The quality of collected measurement data is defined by:

o the resolution and accuracy of each measurement;

o the predictability of both the time at which each measurement is made and the timeliness of delivering the collected measurement data for use.

Consider the case of delay measurement that relies on collecting the time of packet arrival at the ingress interface and the time of packet transmission at the egress interface. Ideally, a local clock value is recorded on receiving the first octet of an affected message at the device ingress and again on sending the first octet of the same message at the device egress. In this ideal case, the difference between the two recorded clock times corresponds to the time that the message spent traversing the device. In practice, the times actually recorded can differ from the ideal case by any fixed amount, and a correction may then be applied to compute the same time difference, taking into account the known fixed time associated with the actual measurement. In this way, the resulting time difference reflects any variable delay associated with queuing.
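The correction described above can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm: it assumes both timestamps come from the same local clock in nanoseconds and models each known fixed amount as an offset equal to the recorded time minus the ideal (first-octet) time. The names `Timestamps` and `residence_time_ns` and the offset values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamps:
    """Clock readings (ns) taken by one node for one message (hypothetical)."""
    ingress_ns: int  # recorded at the ingress timestamping point
    egress_ns: int   # recorded at the egress timestamping point


def residence_time_ns(ts: Timestamps,
                      ingress_offset_ns: int = 0,
                      egress_offset_ns: int = 0) -> int:
    """Return the corrected residence time of a message in the node.

    Each offset is the known, fixed amount by which the recorded time
    differs from the ideal one (first octet received / first octet sent),
    i.e. offset = recorded - ideal. Removing both offsets recovers the
    ideal difference, which still contains any variable queuing delay.
    """
    ideal_ingress = ts.ingress_ns - ingress_offset_ns
    ideal_egress = ts.egress_ns - egress_offset_ns
    if ideal_egress < ideal_ingress:
        raise ValueError("egress precedes ingress after correction")
    return ideal_egress - ideal_ingress


if __name__ == "__main__":
    # Illustrative values: the ingress timestamp is taken 120 ns after the
    # first octet arrives; the egress timestamp 80 ns before it leaves.
    samples = [Timestamps(1_000, 6_000), Timestamps(2_000, 9_500)]
    for s in samples:
        print(residence_time_ns(s, ingress_offset_ns=120, egress_offset_ns=-80))
    # prints 5200, then 7700
```

Because the fixed offsets are the same for both messages, the 2500 ns difference between the two corrected residence times is, in this model, due to variable delay such as queuing.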
Depending on the implementation, it may be a challenge to compute the difference between message arrival and departure times and, on the fly, add the necessary residence time information to the same message. The task becomes even more challenging if the packet is encrypted. Implementations SHOULD NOT, in an effort to include a correlated or computed delay value in the same message, record a message departure time that may be significantly inaccurate because it was estimated before transmission, leaving any variable time component (such as that associated with buffering and queuing of messages) unmeasured. A similar problem may lower the quality of, for example, information that characterizes utilization of the egress interface: if that data cannot be obtained consistently, without variable delays for additional processing, it may not accurately reflect the state of the egress interface. To mitigate this problem, [RFC8169] defined the RTM two-step mode.

Another challenge associated with methods that collect network state information into the actual data packet is the risk of exceeding the Maximum Transmission Unit (MTU) size, particularly if the packet traverses overlay domains or VPNs. Since fragmentation is not available at the transport network, operators may have to reduce the MTU size advertised to the client layer or risk missing network state data for part, most probably the latter part, of the path.

4. Theory of Operation

The HTS method consists of two phases:

o performing a measurement or obtaining network state information, of one or more types, on a node;

o collecting and transporting the measurement.

HTS uses the HTS Control message to define the types of measurement or network state data collection requested from a node.
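To make the MTU risk from Section 3 concrete, the following Python sketch compares a data packet that accumulates one network state record per hop with a data packet that carries only an HTS Control message while the collected data travels separately. It is a minimal illustration under stated assumptions, not part of the method: the header, payload, per-hop record, and control message sizes are hypothetical, the HTS Control message is assumed to have a fixed size (this document does not define its format), and the function names are illustrative only.

```python
"""Sketch: in-packet telemetry growth versus an HTS data packet.

All sizes are hypothetical, illustrative byte counts; they are not
defined by this document or by [I-D.ietf-ippm-ioam-data].
"""


def in_packet_size(payload_len: int, header_len: int,
                   per_hop_record_len: int, hops: int) -> int:
    """Data packet size when every hop appends its record in-band."""
    return header_len + payload_len + per_hop_record_len * hops


def max_hops_before_mtu(payload_len: int, header_len: int,
                        per_hop_record_len: int, mtu: int) -> int:
    """Largest number of hops whose records still fit within the MTU."""
    spare = mtu - header_len - payload_len
    if spare < 0:
        raise ValueError("packet exceeds the MTU even without telemetry")
    return spare // per_hop_record_len


def hts_data_packet_size(payload_len: int, header_len: int,
                         control_len: int) -> int:
    """Data packet size with an assumed fixed-size HTS Control message.

    The result does not depend on the number of hops, because the
    collected network state is carried in the follow-up packet.
    """
    return header_len + payload_len + control_len


if __name__ == "__main__":
    MTU, HEADER, PAYLOAD = 1500, 50, 1400  # hypothetical overlay scenario
    RECORD, CONTROL = 16, 8                # hypothetical per-hop/control sizes
    print(max_hops_before_mtu(PAYLOAD, HEADER, RECORD, MTU))  # prints 3
    print(in_packet_size(PAYLOAD, HEADER, RECORD, hops=8))    # prints 1578
    print(hts_data_packet_size(PAYLOAD, HEADER, CONTROL))     # prints 1458
```

With these assumed numbers, in-packet collection runs out of spare space after three hops, while the data packet size under HTS stays the same regardless of path length. How the follow-up packet itself handles long paths is not specified here.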
The HTS Control message may be inserted into the data packet, as metadata or a shim, or be transmitted in a specially constructed test packet.

To collect measurement and network state data from the nodes, the HTS method uses the follow-up packet. The node that creates the HTS Control message also originates the HTS follow-up packet. The follow-up packet contains characteristic information, copied from the data packet, sufficient for participating nodes to associate it with the original packet. The exact composition of the characteristic information is specific to each transport network, and its definition is outside the scope of this document. The follow-up packet also uses the same encapsulation as the data packet. If only network information, and not payload, is used to load-balance flows over equal cost multipath (ECMP), then using network encapsulation identical to that of the data packet should guarantee that the follow-up packet remains in-band with the original data packet, i.e., traverses the same set of network elements.

Only one follow-up packet may be outstanding on the node for a given path. That means that if the node receives an HTS Control message for a flow on which it is still waiting for the follow-up packet to the previous HTS Control message, the node will originate a follow-up packet to transport the former set of network state data and transmit it before it transmits the follow-up packet with the latest set of network state information.

5. IANA Considerations

This document does not have any IANA requirements. This section may be deleted before publication.

6. Security Considerations

Nodes that practice the HTS method are presumed to share a trust model that depends on the existence of a trusted relationship among nodes.
This is necessary as these nodes are expected to correctly modify the specific content of the data in the follow-up packet, and the degree to which HTS measurement is useful for network operation depends on this ability. In practice, this means that those portions of messages that contain the network state data cannot be covered by either confidentiality or integrity protection. Though there are methods that make it possible in theory to provide either or both such protections and still allow for intermediate nodes to make detectable but authenticated modifications, such methods do not seem practical at present, particularly for protocols that are used to measure latency and/or jitter.

The ability to potentially authenticate and/or encrypt the network state data, for scenarios both with and without the participation of intermediate nodes that participate in HTS measurement, is left for further study.

While it is possible for a compromised node to intercept and modify the network state information in the follow-up packet, this is an issue that exists for nodes in general, for any and all data that may be carried over the particular networking technology, and is therefore the basis for an additional presumed trust model associated with an existing network.

7. Acknowledgments

TBD

8. References

8.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2. Informative References

[I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P., Chang, R., Bernier, D., and J. Lemon, "Data Fields for In-situ OAM", draft-ietf-ippm-ioam-data-03 (work in progress), June 2018.

[P4.INT] "In-band Network Telemetry (INT)", P4.org Specification, October 2017.

[RFC7799] Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, May 2016, <https://www.rfc-editor.org/info/rfc7799>.

[RFC8169] Mirsky, G., Ruffini, S., Gray, E., Drake, J., Bryant, S., and A. Vainshtein, "Residence Time Measurement in MPLS Networks", RFC 8169, DOI 10.17487/RFC8169, May 2017, <https://www.rfc-editor.org/info/rfc8169>.

Authors' Addresses

Greg Mirsky
ZTE Corp.
Email: gregimirsky@gmail.com

Wang Lingqiang
ZTE Corporation
No 19, East Huayuan Road
Beijing 100191
P.R.China
Phone: +86 10 82963945
Email: wang.lingqiang@zte.com.cn

Guo Zhui
ZTE Corporation
No 19, East Huayuan Road
Beijing 100191
P.R.China
Phone: +86 10 82963945
Email: guo.zhui@zte.com.cn