| < draft-lapukhov-dataplane-probe-00.txt | draft-lapukhov-dataplane-probe-01.txt > | |||
|---|---|---|---|---|
| opsawg P. Lapukhov | opsawg P. Lapukhov | |||
| Internet-Draft Facebook | Internet-Draft Facebook | |||
| Intended status: Standards Track March 18, 2016 | Intended status: Standards Track R. Chang | |||
| Expires: September 19, 2016 | Expires: December 12, 2016 Barefoot Networks | |||
| June 10, 2016 | ||||
| Data-plane probe for in-band telemetry collection | Data-plane probe for in-band telemetry collection | |||
| draft-lapukhov-dataplane-probe-00 | draft-lapukhov-dataplane-probe-01 | |||
| Abstract | Abstract | |||
| Detecting and isolating network faults in IP networks has | Detecting and isolating network faults in IP networks has | |||
| traditionally been done using tools like ping and traceroute (see | traditionally been done using tools like ping and traceroute (see | |||
| [RFC7276]) or more complex systems built on similar concepts of | [RFC7276]) or more complex systems built on similar concepts of | |||
| active probing and path tracing. While using active synthetic probes | active probing and path tracing. While using active synthetic probes | |||
| is proven to be helpful in detecting data-plane faults, isolating | is proven to be helpful in detecting data-plane faults, isolating | |||
| fault location has proven to be a much harder problem, especially in | fault location is a much harder problem, especially in diverse | |||
| diverse networks with multiple active forwarding planes (e.g. IP and | networks with multiple active forwarding planes (e.g. IP and MPLS). | |||
| MPLS). Moreover, existing end-to-end tools do not generally support | Moreover, existing end-to-end tools do not generally support | |||
| functionality beyond dealing with packet loss - for example, they are | functionality beyond dealing with packet loss - for example, they are | |||
| hardly useful for detecting and reporting transient (i.e. milli- or | hardly useful for detecting and reporting transient (i.e. milli- or | |||
| even micro-second) network congestion. | even micro-second) network congestion. | |||
| Modern network forwarding hardware can enable more sophisticated | Modern network forwarding hardware can allow for more sophisticated | |||
| data-plane functionality that provides substantial improvement to the | data-plane functionality that provides substantial improvement to the | |||
| isolation and identification capabilities of network elements. For | isolation and identification capabilities of network elements. For | |||
| example, it has become possible to encode a snapshot of a network | example, it has become possible to encode a snapshot of a network | |||
| elements forwarding state within the packet payload as it transits | element's state within the packet payload as it transits the device. | |||
| the device. One example of such device/network state would be queue | One example of such state would be queue depth on the egress port | |||
| depth on the egress port taken by that specific packet. When | taken by that specific packet. When combined with a unique device | |||
| combined with a unique device identifier embedded in the same packet, | identifier embedded in the same packet, this could allow for precise | |||
| this could allow for precise time and topological identification of | time and topological identification of the congested location |||
| the congested location within the network. | within the network. |||
| This document proposes a standard format for embedding telemetry | This document proposes a format for requesting and embedding | |||
| information in UDP-based probing packets, i.e. packets designated for | telemetry information in active probes, i.e. packets designated for |||
| testing the network while not carrying application traffic. These | actively testing the network while not carrying application traffic. | |||
| active probes could be conveyed over multiple protocols (ICMP, UDP, | These active probes could be conveyed over multiple protocols (ICMP, | |||
| TCP, etc.) but this document specifically focuses on UDP, given its | UDP, TCP, etc.) and the document does not prescribe any particular | |||
| simple semantics. In addition this document provides recommendations | transport. In addition, this document provides recommendations on | |||
| on handling the active probes by devices that do not support the | handling the active probes by devices that do not support the | |||
| required data-plane functionality. | required data-plane functionality. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on September 19, 2016. | This Internet-Draft will expire on December 12, 2016. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2016 IETF Trust and the persons identified as the | Copyright (c) 2016 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 39 | skipping to change at page 2, line 44 | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Data plane probe . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Data plane probe . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2.1. Probe transport . . . . . . . . . . . . . . . . . . . . . 4 | 2.1. Probe transport . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2.2. Probe structure . . . . . . . . . . . . . . . . . . . . . 4 | 2.2. Probe structure . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2.3. Header Format . . . . . . . . . . . . . . . . . . . . . . 5 | 2.3. Header Format . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.4. Telemetry Record Template . . . . . . . . . . . . . . . . 7 | 2.4. Telemetry Data Frame and Telemetry Data Records . . . . . 7 | |||
| 2.5. Telemetry Record . . . . . . . . . . . . . . . . . . . . 8 | 3. Telemetry Record Types . . . . . . . . . . . . . . . . . . . 8 | |||
| 3. Telemetry Record Types . . . . . . . . . . . . . . . . . . . 9 | ||||
| 3.1. Device Identifier . . . . . . . . . . . . . . . . . . . . 9 | 3.1. Device Identifier . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.2. Timestamp . . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.2. Timestamp . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3. Queueing Delay . . . . . . . . . . . . . . . . . . . . . 10 | 3.3. Queueing Delay . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.4. Ingress/Egress Port IDs . . . . . . . . . . . . . . . . . 11 | 3.4. Ingress/Egress Port IDs . . . . . . . . . . . . . . . . . 10 | |||
| 3.5. Forwarding Information . . . . . . . . . . . . . . . . . 11 | 3.5. Opaque State Snapshot . . . . . . . . . . . . . . . . . . 10 | |||
| 3.5.1. IPv6 Route . . . . . . . . . . . . . . . . . . . . . 12 | 4. Operating in loopback mode . . . . . . . . . . . . . . . . . 11 | |||
| 3.5.2. IPv4 Route . . . . . . . . . . . . . . . . . . . . . 12 | 5. Processing Probe Packet . . . . . . . . . . . . . . . . . . . 11 | |||
| 3.5.3. MPLS Route . . . . . . . . . . . . . . . . . . . . . 12 | 5.1. Detecting a probe . . . . . . . . . . . . . . . . . . . . 12 | |||
| 4. Operating in loopback mode . . . . . . . . . . . . . . . . . 13 | 6. Non-Capable Devices . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 5. Processing Probe Packet . . . . . . . . . . . . . . . . . . . 14 | 7. Handling data-plane probes in the MPLS domain . . . . . . . . 12 | |||
| 5.1. Detecting a probe . . . . . . . . . . . . . . . . . . . . 14 | 8. Multi-chip device considerations . . . . . . . . . . . . . . 12 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | ||||
| 6. Non-Capable Devices . . . . . . . . . . . . . . . . . . . . . 14 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 7. Handling data-plane probes in the MPLS domain . . . . . . . . 14 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 13 | |||
| 8. Multi-chip device considerations . . . . . . . . . . . . . . 15 | 10.2. Informative References . . . . . . . . . . . . . . . . . 13 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
| 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
| 11.1. Normative References . . . . . . . . . . . . . . . . . . 15 | ||||
| 11.2. Informative References . . . . . . . . . . . . . . . . . 15 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
| 1. Introduction | 1. Introduction | |||
| Detecting and isolating faults in IP networks may involve multiple | Detecting and isolating faults in IP networks may involve multiple | |||
| tools and approaches, but by far the two most popular utilities used | tools and approaches, but by far the two most popular utilities used | |||
| by operators are ping and traceroute. The ping utility provides the | by operators are ping and traceroute. The ping utility provides the | |||
| basic end-to-end connectivity check by sending a special ICMP packet. | basic end-to-end connectivity check by sending a special ICMP packet. | |||
| There are other variants of ping that work using TCP or UDP probes, | There are other variants of ping that work using TCP or UDP probes, | |||
| but may require a special responder application (for UDP) on the | but may require a special responder application (for UDP) on the | |||
| other end of the probed connection. | other end of the probed connection. | |||
| This type of active probing approach has its limitations. First, it | This type of active probing approach has its limitations. First, it | |||
| operates end-to-end and thus it is impossible to tell where in the | operates end-to-end and thus it is impossible to tell where in the | |||
| path the fault has happened from simply observing the packet loss | path the fault has happened from simply observing the packet loss | |||
| ratios. Secondly, in multipath (ECMP) scenarios it can be quite | ratios. Secondly, in multipath (ECMP) scenarios it can be difficult | |||
| difficult to fully and/or deterministically exercise all the possible | to fully and/or deterministically exercise all the possible paths | |||
| paths connecting two end-points. | connecting two end-points. | |||
| The traceroute utility has multiple variants as well - UDP, ICMP and | The traceroute utility has multiple variants as well - UDP, ICMP and | |||
| TCP based, for instance, and a special variant for MPLS LSP testing. | TCP based, for instance, and a special variant for MPLS LSP testing. |||
| Practically all variants follow the same model of operations: varying | Practically all variants follow the same model of operations: varying | |||
| TTL field setting in outgoing probes and analyzing the returned ICMP | TTL field setting in outgoing probes and analyzing the returned ICMP | |||
| unreachable messages. This does allow isolating the fault down to | unreachable messages. This does allow isolating the fault down to | |||
| the IP hop that is losing packets, but has its own limitations. As | the IP hop that is losing packets, but has its own limitations. As | |||
| with the ping utility, it becomes complicated to explore all possible | with the ping utility, it becomes complicated to explore all possible | |||
| ECMP paths in the network. This is especially problematic in large | ECMP paths in the network. This is especially problematic in large | |||
| Clos fabric topologies that are very common in large data-center | Clos fabric topologies that are very common in large data-center | |||
| skipping to change at page 4, line 8 | skipping to change at page 4, line 8 | |||
| packets that take the regular forwarding path: the latter are normally | packets that take the regular forwarding path: the latter are normally |||
| not redirected to the control plane processor and handled purely in | not redirected to the control plane processor and handled purely in | |||
| the data-plane hardware. | the data-plane hardware. | |||
| Modern network processing elements (both hardware and software based) | Modern network processing elements (both hardware and software based) | |||
| are capable of packet handling beyond basic forwarding and simple | are capable of packet handling beyond basic forwarding and simple | |||
| header modifications. Of special interest is the ability to capture | header modifications. Of special interest is the ability to capture | |||
| and embed instantaneous state from the network element and encode | and embed instantaneous state from the network element and encode | |||
| this state directly into the transit packet. One example would be to | this state directly into the transit packet. One example would be to | |||
| record the transit device's name, ingress and egress port | record the transit device's name, ingress and egress port | |||
| identifiers, queue depths, timestamps and so on. By collecting this | identifiers, queueing delays, timestamps and so on. By collecting | |||
| state along each network device in the path, it becomes trivial to | this state along each network device in the path, it becomes trivial | |||
| trace a probe's path through the network as well as record transit | to trace a probe's path through the network as well as record transit | |||
| device characteristics. Extending this model, one could build a tool | device characteristics. Extending this model, one could build a tool | |||
| that combines the useful properties of ping and traceroute using a | that combines the useful properties of ping and traceroute using a | |||
| single packet flight through the network, without the constraints of | single packet flight through the network, without the constraints of | |||
| control plane (aka "slow path") processing. To aid in the | control plane (aka "slow path") processing. To aid in the | |||
| development of such tooling, this document defines a format for | development of such tooling, this document defines a format for | |||
| embedding telemetry information in the body of active probing | requesting and embedding telemetry information in the body of active | |||
| packets. | probing packets. | |||
| 2. Data plane probe | 2. Data plane probe | |||
| This section defines the structure of the active data-plane probe | This section defines the structure of the active data-plane probe. | |||
| packets. | ||||
| 2.1. Probe transport | 2.1. Probe transport | |||
| This document assumes the use of IP/UDP for data-plane probing | This document does not prescribe any specific encapsulation for the | |||
| (either IPv4 or IPv6). A receiving application may listen on a pre- | data-plane probe. For example, the probe could be embedded inside a | |||
| defined UDP port to collect and possibly echo back the information | UDP packet, or within an IPv6 extension header. | |||
| embedded in the probe. One potential limitation to this methodology | ||||
| is the size of the probe packet, as some data-plane faults may only | ||||
| impact packets of a given size or range of sizes. In this case, the | ||||
| data-plane probe may not be able to detect such issues, given the | ||||
| requirement to pre-allocate storage in the packet body. | ||||
| 2.2. Probe structure | 2.2. Probe structure | |||
| The sender is responsible for constructing a packet large enough to | The probe consists of a fixed-size "Header" and an arbitrary number of |||
| hold all records to be added by the network elements. Concurrently, | variable-length "telemetry data frames" following the header. Each |||
| the probes must not exceed the minimum MTU allowed along the path, so | frame, in turn, consists of multiple |||
| it is assumed that the sender either knows the needed MTU or relies | "telemetry record" fields defined below in this document. The | |||
| on well-known mechanisms for path MTU discovery. After adding the | records are added per the request of the telemetry information | |||
| mandatory protocol (IP, UDP, etc.) headers, the packet payload is | specified in the header. | |||
| built according to the following layout: | ||||
| +---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| | Header | | | Header | | |||
| +---------------------------------------------------------+ | ||||
| | Telemetry Record template | | ||||
| +---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| | Placeholder for telemetry record 1 | | | Telemetry data frame N | | |||
| +---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| | Placeholder for telemetry record 2 | | | Telemetry data frame N-1 | | |||
| +---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| . . | . . | |||
| . . | . . | |||
| . . | . . | |||
| +---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| | Placeholder for telemetry record N | | | Telemetry data frame 1 | | |||
| +---------------------------------------------------------+ | +---------------------------------------------------------+ | |||
| Figure 1: Probe layout | Figure 1: Probe layout | |||
| Notice that all record placeholders are equal size, as prescribed by | Notice that the first frame is at the end of the packet. For | |||
| the telemetry record template, and that space for those must be pre- | efficient hardware implementation, new frames are pushed onto the | |||
| allocated by the sender of the packet. Each record corresponds to a | stack at each hop. This eliminates the need for the transit network | |||
| single network element on the path from sender to receiver of the | elements to inspect the full packet and lets the packet grow as far |||
| packet. | as the MTU allows. |||
| 2.3. Header Format | 2.3. Header Format | |||
| The probe payload starts with a fixed-size header. The header | The probe payload starts with a fixed-size header. The header | |||
| identifies the packet as a probe packet, and encodes basic | identifies the packet as a data-plane probe packet, and encodes basic | |||
| information shared by all telemetry records. | information shared by all telemetry records. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Probe Marker (1) | | | Probe Marker (1) | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Probe Marker (2) | | | Probe Marker (2) | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Version Number | Must Be Zero |S|O| | | Version | Message Type | Flags | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Message Type | Hop Limit | Must Be Zero | | | Telemetry Request Vector | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Sender's Handle | | | Hop Limit | Hop Count | Must Be Zero | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Sequence Number | | | Maximum Length | Current Length | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Write Offset | | | Sender's Handle | Sequence Number | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 2: Header Format | Figure 2: Header Format | |||
| (1) The "Probe Marker" fields are arbitrary 32-bit values generally | (1) The "Probe Marker" fields are arbitrary 32-bit values generally | |||
| used by the network elements to identify the packet as a probe | used by the network elements to identify the packet as a probe | |||
| packet. These fields should be interpreted as unsigned integer | packet. These fields should be interpreted as unsigned integer | |||
| values, stored in network byte order. For example, a network | values, stored in network byte order. For example, a network | |||
| element may be configured to recognize a UDP packet destined to | element may be configured to recognize a UDP packet destined to | |||
| port 31337 and having 0xDEAD 0xBEEF as the values in "Probe | port 31337 and having 0xDEAD 0xBEEF as the values in "Probe | |||
| Marker" field as an active probe, and treat it accordingly. | Marker" field as an active probe, and treat it accordingly. |||
| (2) "Version Number" is currently set to 1. | (2) "Version Number" is currently set to 1. | |||
| (3) The "Global Flags" field is 8 bits, and defines the following | (3) The "Message Type" field value could be either "1" - "Probe" or | |||
| flags: | "2" - "Probe Reply" | |||
| (1) "Overflow" (O-bit) (least significant bit). This bit is | (4) The "Flags" field is 8 bits, and defines the following flags: | |||
| set by the network element if there is no record | ||||
| placeholder available: i.e. the packet is already "full" of | ||||
| telemetry information. | ||||
| (2) "Sealed" (S-bit). This bit instructs the network element | (5) | |||
| to forward the packet WITHOUT embedding telemetry data, | ||||
| even if it matches the probe identification rules. This | ||||
| mechanism could be used to send "realistic" probes of | ||||
| arbitrary size after the network path associated with the | ||||
| combination of source/destination IP addresses and ports | ||||
| has been previously established. The network element must | ||||
| not inspect the "Telemetry Record Template" field for | ||||
| "sealed" probes. | ||||
| (4) The "Message Type" field value could be either "1" - "Probe" or | (1) "Overflow" (O-bit) (least significant bit). This bit is | |||
| "2" - "Probe Reply" | set by the network element if the number of records on the | |||
| packet is at the maximum limit as specified by the packet: | ||||
| i.e. the packet is already "full" of telemetry | ||||
| information. | ||||
| (5) "Hop Limit" is defined only for "Message Type" of "1" ("Probe"). | (6) "Telemetry Request Vector" is a 32-bit long field that requests | |||
| For "Probe Reply" the "Hop Limit" field must be set to zero. | well-known inband telemetry information from the network | |||
| This field is treated as an integer value and decremented by | elements on the path. A bit set in this vector translates to a | |||
| every network element in the path as "Probe" propagates. See | request of a particular type of information. The following | |||
| Section 4 for the intended use of this field. | types/bits are currently defined, starting with the least |||
| significant bit first: | ||||
| (6) The "Sender's Handle" field is set by the sender to allow the | (1) Bit 0: Device identifier. | |||
| receiver to identify a particular originator of probe packets. | ||||
| Along with "Sequence Number" it allows for tracking of packet | ||||
| order and loss within the network. | ||||
| (7) The "Write Offset" field specifies the offset for the next | (2) Bit 1: Timestamp. | |||
| telemetry record to be written in the probe packet body. It | ||||
| counts from the start of the packet body and must be initially | ||||
| set to the first octet after the "Record Template" field. It | ||||
| must be incremented by every network element that adds a | ||||
| telemetry record, without overflowing the storage. This | ||||
| simplifies the work for the subsequent network element - it just | ||||
| needs to parse the template and then add the data at the "Write | ||||
| Offset". | ||||
| 2.4. Telemetry Record Template | (3) Bit 2: Queueing delay. | |||
| The following figure defines the "Record Template". This template | (4) Bit 3: Ingress/Egress port identifiers. | |||
| uses type-length fields to describe the telemetry data records as | ||||
| added by network elements. The most significant bit in the "Type" | ||||
| field must be set to zero. | ||||
| 0 1 2 3 | (5) Bit 31: Opaque state snapshot request. | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | TL record count (N) | Must Be Zero | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |0| Type 1 | Length 1 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |0| Type 2 | Length 2 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| . . | ||||
| . . | ||||
| . . | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |0| Type N | Length N | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| Figure 3: Record Template | (7) "Hop Limit" is defined only for "Message Type" of "1" | |||
| ("Probe"). For "Probe Reply" the "Hop Limit" field must be set | ||||
| to zero. This field is treated as an integer value | ||||
| representing the number of network elements. See Section 4 for | ||||
| the intended use of this field. | ||||
| 2.5. Telemetry Record | (8) The "Hop Count" field specifies the current number of hops of | |||
| capable network elements the packet has transited. It | ||||
| begins with zero and must be incremented by one for every | ||||
| network element that adds a telemetry record. Combined with a | ||||
| push mechanism, this simplifies the work for the subsequent | ||||
| network element and the packet receiver. The subsequent | ||||
| network element just needs to parse the template and then | ||||
| insert new record(s) immediately after the template. | ||||
| This section defines the structure of a telemetry record. Every | (9) The "Max Length" field specifies the maximum length of the | |||
| network element capable of reporting inband telemetry data must add a | telemetry payload in bytes. Given that the sender knows the | |||
| record as defined in the "Record Template" to the probe packet. The | minimum path MTU, the sender can set the maximum number of payload |||
| new record must be inserted at the "Write Offset" position in the | bytes allowed before exceeding the MTU. Thus, a simple |||
| packet payload, with the "Write Offset" subsequently incremented by | comparison between "Current Length" and "Max Length" is enough to |||
| the size of the new record. The order of TLV elements must follow | decide whether or not data can be added. |||
| the order prescribed by the Figure 3 portion of the probe packet. | ||||
| The most significant bit in the type field ("S-bit") must be set to | ||||
| "1" if the network element was able to understand and record the | ||||
| requested telemetry type. That bit must be set to zero otherwise, | ||||
| along with the contents of the "Value" field. The length field is | ||||
| the TLV field length including the "Type" and "Length" fields. | ||||
| If writing a new telemetry record to the packet body would cause it | (10) The "Current Length" field specifies the current length of data | |||
| to exceed the packet size, no record is added and the overflow | stored in the probe. This field is incremented by each network |||
| "O-bit" must be set to "1" in the probe header. | element by the number of bytes it has added with the telemetry | |||
| data frame. | ||||
| (11) The "Sender's Handle" field is set by the sender to allow the | ||||
| receiver to identify a particular originator of probe packets. | ||||
| Along with "Sequence Number" it allows for tracking of packet | ||||
| order and loss within the network. | ||||
| 2.4. Telemetry Data Frame and Telemetry Data Records | ||||
| Each telemetry data frame is constructed by concatenating multiple | ||||
| telemetry data records, per the request in the "Telemetry Request Vector" | ||||
| field of the data-plane probe header. The frame starts with a 16-bit | ||||
| length field, which reflects the frame size in bytes, excluding the | ||||
| length of the field itself. Following the "Frame Length" field is a | ||||
| "Telemetry Response Vector" field: this vector corresponds to the | ||||
| records the network element was capable of recording in the frame. | ||||
| The body of the frame is constructed by appending fixed-size records | ||||
| corresponding to every bit set in "Telemetry Response Vector". All | ||||
| of the records, except the one requested by bit 31 ("Opaque State | ||||
| Snapshot"), are fixed size, with their lengths defined in Section 3. | ||||
| The order of the records in the frame follows the order of the bits | ||||
| in the "Telemetry Request Vector" (also reflected in "Telemetry | ||||
| Response Vector"). Finally, if requested, a variable-length field is | ||||
| appended at the end of the frame, with the length field occupying the | ||||
| first 8 bits. This "length" field reflects the length of the opaque | ||||
| data excluding the length field itself. | ||||
| If inserting a new telemetry record would cause "Current Length" to | ||||
| exceed "Max Length", no record is added and the overflow "O-bit" must | ||||
| be set to "1" in the probe header. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type 1 | Length 1 | | | Frame Length | Must be Zero | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Telemetry Response Vector | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | | |||
| . . | . Fixed Size Field 0 . | |||
| . Value 1 . | . (if requested) . | |||
| . . | . . | |||
| | | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type 2 | Length 2 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | | | | | | |||
| . . | . Fixed Size Field 1 . | |||
| . Value 2 . | . (if requested) . | |||
| . . | . . | |||
| | | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| . . | . . | |||
| . . | . . | |||
| ~ ~ | ||||
| . . | ||||
| . . | . . | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type N | Length N | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | | | | | | |||
| . Fixed Size Field 30 . | ||||
| . (if requested) . | ||||
| . . | . . | |||
| . Value N . | | | | |||
| . . | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Length | | | ||||
| +-+-+-+-+-+-+-+-+ + | ||||
| | | | ||||
| . Opaque State Snapshot . | ||||
| . (if requested) . | ||||
| | | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 4: Telemetry Record Format | Figure 3: Telemetry Frame Format | |||
| 3. Telemetry Record Types | 3. Telemetry Record Types | |||
| This section defines some of the telemetry record types that could be | This section defines some of the telemetry record types that could be | |||
| supported by the network elements. | supported by the network elements. | |||
| 3.1. Device Identifier | 3.1. Device Identifier | |||
| This is used to identify the device reporting telemetry information. | This record is used to identify the device reporting telemetry | |||
| This document does not prescribe any specific identifier format. | information. This document does not prescribe any specific | |||
| identifier format. In general, it is expected to be configured by | ||||
| the operator. The length of this record is 32 bits. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type = 1 (Device ID) | Length = 12 | | | Device ID | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Device ID (1) | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Device ID (2) | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 5: Device Identifier | Figure 4: Device Identifier | |||
| 3.2. Timestamp | 3.2. Timestamp | |||
| This telemetry record encodes the time that the packet enters and | This telemetry record encodes the time data associated with the | |||
| leaves the device, in UTC. The "entering" time is recorded when the | packet. Most existing hardware supports timestamping for IEEE1588. | |||
| L2 header enters the processing pipeline. The "exit" time is | To leverage existing hardware capabilities, packet receive time is | |||
| recorded when the network element starts serializing L2 header on | stored similarly as 48 bits of seconds, 32 bits of nanoseconds, and | |||
| egress port. | residence time is in 48 bits of nanoseconds. The length of this | |||
| record is 128 bits. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type = 10 (Timestamp) | Length = 28 | | | Receive Seconds [47:16] | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Receive Seconds | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Receive Microseconds | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Receive Nanoseconds | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Send Seconds | | | Receive Seconds [15:0] | Receive Nanoseconds [31:16] | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Send Microseconds | | | Receive Nanoseconds [15:0] | Residence Time [47:32] | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Send Nanoseconds | | | Residence Time [31:0] | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 6: Timestamp | Figure 5: Timestamp | |||
| 3.3. Queueing Delay | 3.3. Queueing Delay | |||
| Encodes the amount of time that the frame has spent queued in the | This record encodes the amount of time that the frame has spent | |||
| network element. This is only recorded if packet has been queued, | queued in the network element. This is only recorded if packet has | |||
| and defines the time spent in memory buffers. This could be helpful | been queued, and defines the time spent in memory buffers. This | |||
| to detect queueing-related delays in the network. In case of the | could be helpful to detect queueing-related delays in the network. | |||
| cut-through switching operation this must be set to zero. | If the queueing delay exceeds the maximum number of 2+ seconds | |||
| allowed by the 31-bit number, the network element must set the | ||||
| overflow "O-bit". In case of the cut-through switching operation | ||||
| this must be set to zero. The length of this record is 32 bits. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type = 11 (Queueing Delay) | Length = 16 | | |O| Nanoseconds | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Seconds | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Microseconds | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Nanoseconds | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 7: Queueing Delay | Figure 6: Queueing Delay | |||
| 3.4. Ingress/Egress Port IDs | 3.4. Ingress/Egress Port IDs | |||
| This record stores the ingress and egress physical ports used to | This record stores the ingress and egress physical ports used to | |||
| receive and send packet respectively. Here, "physical port" means a | receive and send packet respectively. Here, "physical port" means a | |||
| unit with actual MAC and PHY devices associated - not any logical | unit with actual MAC and PHY devices associated - not any logical | |||
| subdivision based, for example, on protocol level tags (e.g. VLAN). | subdivision based, for example, on protocol level tags (e.g. VLAN). | |||
| The port identifiers are opaque, and defined as 32-bit entries. | The port identifiers are opaque, and defined as 16-bit entries. For | |||
| example, those could be the corresponding SNMP ifIndex values. The | ||||
| 0 1 2 3 | length of this record is 32 bits. | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |S| Type = 12 (Port IDs) | Length = 12 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Ingress Port ID | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Egress Port ID | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| Figure 8: Ingress/Egress Port IDs | ||||
| 3.5. Forwarding Information | ||||
| Records defined in this section require the network element to store | ||||
| forwarding information that was used to direct the packet to the | ||||
| next-hop. In the network that uses multiple forwarding plane | ||||
| implementations (e.g. IP and MPLS) the originator of the probe is | ||||
| required to populate the record template with all kinds of forwarding | ||||
| information it expects in the path. The network elements then | ||||
| populate the entries they know about, e.g. in IPv4-only network the | ||||
| "IPv6 Route" record will be left unfilled, and so will be "MPLS | ||||
| Route". | ||||
| 3.5.1. IPv6 Route | ||||
| This record stores the IPv6 route that has been used for packet | ||||
| forwarding. If not used, then S-bit is set to zero, along with the | ||||
| value field. | ||||
| 0 1 2 3 | ||||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |S| Type = 20 (IPv6 Route) | Length = 24 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | ECMP group size | ECMP group index | Prefix Length | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | IPv6 Address (1) | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | IPv6 Address (2) | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | IPv6 Address (3) | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | IPv6 Address (4) | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| Figure 9: IPv6 Route | ||||
| 3.5.2. IPv4 Route | ||||
| This record stores the IPv4 route that has been used for packet | ||||
| forwarding. If not used, then S-bit is set to zero, along with the | ||||
| value field. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type = 21 (IPv4 Route) | Length = 12 | | | Ingress Port ID | Egress Port ID | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | ECMP group size | ECMP group index | Prefix Length | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | IPv4 Address | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 10: IPv4 Route | Figure 7: Ingress/Egress Port IDs | |||
| 3.5.3. MPLS Route | 3.5. Opaque State Snapshot | |||
| This record stores the MPLS label mapping that has been used for | This record has variable size. It allows the network element to | |||
| packet forwarding. It is possible that inbound or outbound label is | store arbitrary state in the probe, without a pre-defined schema. | |||
| set to zero, if it was not used (e.g. on ingress or egress of the | The schema needs to be made known to the analyzer by some out-of-band | |||
| domain). At the edge of IP2MPLS or MPLS2IP domain it is expected | means. The 16-bit "Schema Id" field in the record is supposed to let | |||
| that the device would fill in the "MPLS Route" telemetry record along | the analyzer know which particular schema to use, and it is expected | |||
| with the corresponding "IPv6 Route" or "IPv4 Route" records. | to be configured on the network element by the operator. This ID is | |||
| expected to be configured on the device by the network operator. | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |S| Type = 22 (MPLS Route) | Length = 16 | | | Length | Schema Id | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Operation | ECMP group size | ECMP group index | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Must Be Zero | Incoming MPLS Label | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Must Be Zero | Outgoing MPLS Label | | | | | |||
| | | | ||||
| | Opaque Data | | ||||
| ~ ~ | ||||
| . . | ||||
| . . | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 11: MPLS Route | Figure 8: Opaque State | |||
| There are three MPLS operations defined | ||||
| "1" - Push | ||||
| "2" - Pop | ||||
| "3" - Swap | ||||
| 4. Operating in loopback mode | 4. Operating in loopback mode | |||
| In "loopback" mode the flow of probes is "turned back" at a given | In "loopback" mode the flow of probes is "turned back" at some | |||
| network element. The network element that "turns" packets around is | network element. The network element that "turns" packets around is | |||
| identified using the "Hop Limit" field. The network element that | identified using the "Hop Limit" field. The network element that | |||
| receives a "Probe" type packet having "Hop Limit" value of "1" is | receives a "Probe" type packet having "Hop Limit" value equal to "Hop | |||
| required to perform the following: | Count" is required to perform the following: | |||
| Change the "Message Type" field to "Probe Reply" and set the "Hop | Change the "Message Type" field to "Probe Reply", and keep the | |||
| Limit" to zero. | "Hop Limit" at zero. | |||
| Swap the destination/source addresses and port values in the IP/ | Swap the destination/source IP addresses in the transport header | |||
| UDP headers of the probe packet. | to send the packet back to the originator. | |||
| Add a telemetry record as required using the newly built IP/UDP | Add a new telemetry data frame corresponding to the new forwarding | |||
| headers to determine forwarding information. | information. | |||
| This way, the original probe is routed back to originator. Notice | This way, the original probe is routed back to originator. Notice | |||
| that the return path may be different from the path that the original | that the return path may be different from the path that the original | |||
| probe has taken. This path will be recorded by the network elements | probe has taken. This path will be recorded by the network elements | |||
| as the reply is transported back to the sender. Using this technique | as the reply is transported back to the sender. Using this technique | |||
| one may progressively test a path until its breaking point. Unlike | one may progressively test a path until its breaking point. | |||
| the traditional traceroute utility, however, the returning packets | ||||
| are the original probes, not the ICMP messages. | ||||
| 5. Processing Probe Packet | If a network element is incapable of redirecting packets back to the | |||
| originator, another option would be exporting those packets to a | ||||
| network analyzer device, using some sort of encapsulation header. | ||||
| 5. Processing Probe Packet | ||||
| 5.1. Detecting a probe | 5.1. Detecting a probe | |||
| Since the probe looks like a regular UDP packet, the data-plane | As mentioned previously, a combination of techniques needs to be used | |||
| hardware needs a way to recognize it for special processing. This | to differentiate the active probes. This may include, but should not | |||
| document does not prescribe a specific way to do that. For example, | be limited to using just the known position of "Probe Id" fields. | |||
| classification could be based on only the destination UDP port, or | ||||
| using more complex pattern matching techniques, e.g. matching on the | ||||
| contents of "Probe Marker" field. | ||||
| 6. Non-Capable Devices | 6. Non-Capable Devices | |||
| Non-capable devices are those that cannot process a probe natively in | Non-capable devices are those that cannot process a probe natively in | |||
| the fast-path data plane. Further, there could be two types of such | the fast-path data plane. Further, there could be two types of such | |||
| devices: those that can still process it via the control-plane | devices: those that can still process it via the control-plane | |||
| software, and those that can not. The control-plane processing | software, and those that can not. The control-plane processing | |||
| should be triggered by use of the "Router-Alert" option for IPv4 or | should be triggered by use of the "Router-Alert" option for IPv4 or | |||
| IPv6 packets (see [RFC2113] or [RFC2711]) added by the originator of | IPv6 packets (see [RFC2113] or [RFC2711]) added by the originator of | |||
| the probe. A control-plane capable device is expected to interpret | the probe. A control-plane capable device is expected to interpret | |||
| skipping to change at page 15, line 13 ¶ | skipping to change at page 13, line 9 ¶ | |||
| capable MPLS network element is present on the probe's path. | capable MPLS network element is present on the probe's path. | |||
| 8. Multi-chip device considerations | 8. Multi-chip device considerations | |||
| TBD | TBD | |||
| 9. IANA Considerations | 9. IANA Considerations | |||
| None | None | |||
| 10. Acknowledgements | 10. References | |||
| The author would like to thank L.J. Wobker and Changhoom Kim for | ||||
| reviewing and providing valuable comments for the initial version of | ||||
| this document. | ||||
| 11. References | ||||
| 11.1. Normative References | 10.1. Normative References | |||
| [RFC2113] Katz, D., "IP Router Alert Option", RFC 2113, | [RFC2113] Katz, D., "IP Router Alert Option", RFC 2113, | |||
| DOI 10.17487/RFC2113, February 1997, | DOI 10.17487/RFC2113, February 1997, | |||
| <http://www.rfc-editor.org/info/rfc2113>. | <http://www.rfc-editor.org/info/rfc2113>. | |||
| [RFC2711] Partridge, C. and A. Jackson, "IPv6 Router Alert Option", | [RFC2711] Partridge, C. and A. Jackson, "IPv6 Router Alert Option", | |||
| RFC 2711, DOI 10.17487/RFC2711, October 1999, | RFC 2711, DOI 10.17487/RFC2711, October 1999, | |||
| <http://www.rfc-editor.org/info/rfc2711>. | <http://www.rfc-editor.org/info/rfc2711>. | |||
| [RFC6398] Le Faucheur, F., Ed., "IP Router Alert Considerations and | [RFC6398] Le Faucheur, F., Ed., "IP Router Alert Considerations and | |||
| Usage", BCP 168, RFC 6398, DOI 10.17487/RFC6398, October | Usage", BCP 168, RFC 6398, DOI 10.17487/RFC6398, October | |||
| 2011, <http://www.rfc-editor.org/info/rfc6398>. | 2011, <http://www.rfc-editor.org/info/rfc6398>. | |||
| [RFC6178] Smith, D., Mullooly, J., Jaeger, W., and T. Scholl, "Label | [RFC6178] Smith, D., Mullooly, J., Jaeger, W., and T. Scholl, "Label | |||
| Edge Router Forwarding of IPv4 Option Packets", RFC 6178, | Edge Router Forwarding of IPv4 Option Packets", RFC 6178, | |||
| DOI 10.17487/RFC6178, March 2011, | DOI 10.17487/RFC6178, March 2011, | |||
| <http://www.rfc-editor.org/info/rfc6178>. | <http://www.rfc-editor.org/info/rfc6178>. | |||
| 11.2. Informative References | 10.2. Informative References | |||
| [RFC7276] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. | [RFC7276] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. | |||
| Weingarten, "An Overview of Operations, Administration, | Weingarten, "An Overview of Operations, Administration, | |||
| and Maintenance (OAM) Tools", RFC 7276, | and Maintenance (OAM) Tools", RFC 7276, | |||
| DOI 10.17487/RFC7276, June 2014, | DOI 10.17487/RFC7276, June 2014, | |||
| <http://www.rfc-editor.org/info/rfc7276>. | <http://www.rfc-editor.org/info/rfc7276>. | |||
| Author's Address | Authors' Addresses | |||
| Petr Lapukhov | Petr Lapukhov | |||
| 1 Hacker Way | 1 Hacker Way | |||
| Menlo Park, CA 94025 | Menlo Park, CA 94025 | |||
| US | US | |||
| Email: petr@fb.com | Email: petr@fb.com | |||
| Remy Chang | ||||
| Barefoot Networks | ||||
| 2185 Park Boulevard | ||||
| Palo Alto, CA 94306 | ||||
| US | ||||
| Email: remy@barefootnetworks.com | ||||
| End of changes. 85 change blocks. | ||||
| 327 lines changed or deleted | 241 lines changed or added | |||
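
The fixed-size records defined in Section 3 of the -01 revision (right-hand column) can be decoded with plain bit arithmetic. The following is a minimal sketch in Python, not a reference implementation. It assumes network byte order (big-endian), which the figures suggest but the text shown here does not state, and it assumes each record has already been sliced out of the telemetry data frame. The class and function names are hypothetical and do not appear in the draft.

```python
import struct
from typing import NamedTuple, Tuple


class Timestamp(NamedTuple):
    receive_seconds: int       # 48-bit seconds part of the receive time
    receive_nanoseconds: int   # 32-bit nanoseconds part of the receive time
    residence_ns: int          # 48-bit residence time, in nanoseconds


def decode_timestamp(record: bytes) -> Timestamp:
    """Decode the 128-bit Timestamp record (Section 3.2, -01 layout).

    Assumption: fields are packed back to back in network byte order,
    as 48 bits of seconds, 32 bits of nanoseconds, 48 bits of residence time.
    """
    if len(record) != 16:
        raise ValueError("Timestamp record must be 16 bytes (128 bits)")
    value = int.from_bytes(record, "big")
    residence_ns = value & ((1 << 48) - 1)          # lowest 48 bits
    receive_nanoseconds = (value >> 48) & 0xFFFFFFFF  # next 32 bits
    receive_seconds = value >> 80                   # highest 48 bits
    return Timestamp(receive_seconds, receive_nanoseconds, residence_ns)


def decode_queueing_delay(record: bytes) -> Tuple[bool, int]:
    """Decode the 32-bit Queueing Delay record (Section 3.3, -01 layout).

    Returns (overflow, nanoseconds): the top bit is the overflow "O-bit",
    the remaining 31 bits hold the delay in nanoseconds.
    """
    (word,) = struct.unpack("!I", record)
    return bool(word >> 31), word & 0x7FFFFFFF


def decode_port_ids(record: bytes) -> Tuple[int, int]:
    """Decode the 32-bit Ingress/Egress Port IDs record (Section 3.4, -01).

    Returns (ingress_port_id, egress_port_id), each an opaque 16-bit value.
    """
    ingress, egress = struct.unpack("!HH", record)
    return ingress, egress
```

Treating the timestamp as one 128-bit integer avoids handling the fields that straddle 32-bit word boundaries in Figure 5 individually; a hardware or C implementation would more likely read the four words separately and reassemble the split fields.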