This document has been reviewed as part of the transport area review
team's ongoing effort to review key IETF documents. These comments were
written primarily for the transport area directors, but are copied to
the document's authors and WG to allow them to address any issues raised
and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider
this review as part of the last-call comments they receive. Please
always CC tsv-art@ietf.org if you reply to or forward this review.

This informational document describes an architectural framework for
network telemetry and the main components of corresponding systems.

It has two issues related to TSV topics:

First, the document lacks a discussion of the importance of congestion
control for telemetry traffic as well as corresponding references, e.g.,
to RFC 8085. High-volume telemetry traffic can overload a network unless
proper counter-measures are in place (i.e., at minimum "circuit
breakers"). It doesn't seem appropriate to entirely ignore that issue.

Second, language regarding the ambiguous term "transport" and the
references to Internet transport protocols must be improved to be
consistent with IETF standards.

Below are some examples of sections in which these issues are obvious.

Section 3.4

   It is worth noting that a network telemetry system should not be
   intrusive to normal network operations by avoiding the pitfall of
   the "observer effect". That is, it should not change the network
   behavior and affect the forwarding performance. Otherwise, the whole
   purpose of network telemetry is compromised.

=> This statement should be extended to be very explicit about the risk
of causing network congestion by high-volume telemetry traffic unless
proper isolation or traffic engineering techniques are in place, or
congestion control mechanisms ensure that telemetry traffic backs off if
it exceeds the network capacity. RFC 8085 is a relevant BCP in this
space.
As a side note, RFC 8085 discusses other relevant challenges as well,
but the issues caused by potentially inelastic high-volume telemetry
traffic seem particularly relevant for ensuring network stability when
telemetry solutions get deployed.

4.1. Top Level Modules

   +---------+--------------+--------------+---------------+-----------+
   | Module  | Management   | Control      | Forwarding    | External  |
   |         | Plane        | Plane        | Plane         | Data      |
   +---------+--------------+--------------+---------------+-----------+
   |Object   | config. &    | control      | flow & packet | terminal, |
   |         | operation    | protocol &   | QoS, traffic  | social &  |
   |         | state        | signaling,   | stat., buffer | environ-  |
   |         |              | RIB          | & queue stat.,| mental    |
   |         |              |              | ACL, FIB      |           |
   +---------+--------------+--------------+---------------+-----------+
   |Export   | main control | main control | fwding chip   | various   |
   |Location | CPU          | CPU,         | or linecard   |           |
   |         |              | linecard CPU | CPU; main     |           |
   |         |              | or forwarding| control CPU   |           |
   |         |              | chip         | unlikely      |           |
   +---------+--------------+--------------+---------------+-----------+
   |Data     | YANG, MIB,   | YANG,        | template,     | YANG,     |
   |Model    | syslog       | custom       | YANG,         | custom    |
   |         |              |              | custom        |           |
   +---------+--------------+--------------+---------------+-----------+
   |Data     | GPB, JSON,   | GPB, JSON,   | plain         | GPB, JSON |
   |Encoding | XML          | XML, plain   |               | XML, plain|
   +---------+--------------+--------------+---------------+-----------+
   |Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror,| gRPC      |
   |         |              | IPFIX, mirror| gRPC, NETFLOW |           |
   +---------+--------------+--------------+---------------+-----------+
   |Transport| HTTP, TCP    | HTTP, TCP,   | UDP           | HTTP,TCP  |
   |         |              | UDP          |               | UDP       |
   +---------+--------------+--------------+---------------+-----------+

=> This table needs to be corrected.

1/ At least the entry in the column "forwarding plane" for IPFIX seems
   incorrect, as the IETF has standardized IPFIX use over TCP, UDP and
   also SCTP.
2/ The label "transport" in the last row should be replaced by another
   term (maybe "data transport"?). In the TCP/IP protocol stack, "HTTP"
   is not a transport but an application protocol, unlike TCP and UDP.
   As a result, the row label should use a term that cannot be confused
   with the name of a layer in the TCP/IP protocol stack.
3/ The label "protocol" in the second-to-last row is also misleading.
   All entries in the "transport" row are protocols as well. The term
   "Application protocol" may be one option; others may exist as well.

4.1.1. Management Plane Telemetry

   *  High Speed Data Transport: In order to keep up with the velocity
      of information, a server needs to be able to send large amounts
      of data at high frequency. Compact encoding formats or data
      compression schemes are needed to reduce the quantity of data and
      improve the data transport efficiency. The subscription mode, by
      replacing the query mode, reduces the interactions between
      clients and servers and helps to improve the server's efficiency.

=> The server is not the only bottleneck. This section needs to discuss
the network as a potential bottleneck as well, and explain that a
telemetry solution must protect the network from congestion by
congestion control mechanisms or at least circuit breakers. RFC 8085 is
a relevant BCP in this space.

4.1.2. Control Plane Telemetry

=> Discussion of the risk of congestion by telemetry protocols without
congestion control (e.g., using UDP possibly without circuit breakers)
is missing in this section.

4.1.3. Forwarding Plane Telemetry

   *  The data plane devices must provide timely data with the minimum
      possible delay. Long processing, transport, storage, and analysis
      delay can impact the effectiveness of the control loop and even
      render the data useless.

=> As in the previous section, this wording entirely ignores the impact
of potential network capacity shortage and congestion.
A reference to RFC 8085 and a corresponding discussion of how to meet
the requirements from RFC 8085 are missing.

4.1.4. External Data Telemetry

=> As the communication with "external" entities outside the boundary of
a provider network may be realized over the Internet, the risk of
congestion as well as proper counter-measures is even more relevant in
this section as compared to the previous sections.