IPPM                                                         M. Cociglio
Internet-Draft                                      Telecom Italia - TIM
Intended status: Informational                               A. Ferrieux
Expires: April 28, 2022                                      Orange Labs
                                                             G. Fioccola
                                                     Huawei Technologies
                                                             I. Lubashev
                                                     Akamai Technologies
                                                           F. Bulgarella
                                                    Telecom Italia - TIM
                                                            I. Hamchaoui
                                                             Orange Labs
                                                                 M. Nilo
                                                    Telecom Italia - TIM
                                                                R. Sisto
                                                   Politecnico di Torino
                                                             D. Tikhonov
                                                  LiteSpeed Technologies
                                                        October 25, 2021


                 Explicit Flow Measurements Techniques
             draft-ietf-ippm-explicit-flow-measurements-00

Abstract

   This document describes protocol-independent methods, called Explicit
   Flow Measurement Techniques, that employ a few marking bits inside
   the header of each packet for loss and delay measurement. The
   endpoints, by marking the traffic, signal these metrics to
   intermediate observers, allowing them to measure connection
   performance and to locate the network segment where impairments
   happen. Different alternatives are considered within this document.
   These signaling methods apply to all protocols, but they are
   especially valuable when applied to protocols that encrypt the
   transport header and do not allow traditional methods for delay and
   loss detection.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2022.
Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  Latency Bits
     3.1.  Spin Bit
     3.2.  Delay Bit
       3.2.1.  Generation Phase
       3.2.2.  Reflection Phase
       3.2.3.  T_Max Selection
       3.2.4.  Delay Measurement using Delay Bit
       3.2.5.  Observer's Algorithm
       3.2.6.  Two Bits Delay Measurement: Spin Bit + Delay Bit
       3.2.7.  Hidden Delay Bit - Delay Bit with Privacy Protection
   4.  Loss Bits
     4.1.  T Bit - Round Trip Loss Bit
       4.1.1.  Round Trip Packet Loss Measurement
       4.1.2.  Setting the Round Trip Loss Bit on Outgoing Packets
       4.1.3.  Observer's Logic for Round Trip Loss Signal
       4.1.4.  Loss Coverage and Signal Timing
     4.2.  Q Bit - Square Bit
       4.2.1.  Q Block Length Selection
       4.2.2.  Upstream Loss
       4.2.3.  Identifying Q Block Boundaries
     4.3.  L Bit - Loss Event Bit
       4.3.1.  End-To-End Loss
       4.3.2.  Loss Profile Characterization
     4.4.  L+Q Bits - Upstream, Downstream, and End-to-End Loss
           Measurements
       4.4.1.  Correlating End-to-End and Upstream Loss
     4.5.  R Bit - Reflection Square Bit
       4.5.1.  R+Q Bits - Using R and Q Bits for Passive Loss
               Measurement
       4.5.2.  Enhancement of R Block Length Computation
       4.5.3.  Improved Resilience to Packet Reordering
     4.6.  Improved Q and R Bits Resilience to Burst Losses
   5.  Summary of Delay and Loss Marking Methods
   6.  ECN-Echo Event Bit
     6.1.  Setting the ECN-Echo Event Bit on Outgoing Packets
     6.2.  Using E Bit for Passive ECN-Reported Congestion
           Measurement
   7.  Protocol Ossification Considerations
   8.  Examples of Application
     8.1.  QUIC
     8.2.  TCP
   9.  Security Considerations
     9.1.  Optimistic ACK Attack
   10. Privacy Considerations
   11. IANA Considerations
   12. Change Log
   13. Contributors
   14. Acknowledgements
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Authors' Addresses

1.  Introduction

   Packet loss and delay are hard and pervasive problems of day-to-day
   network operation. Proactively detecting, measuring, and locating
   them is crucial to maintaining high QoS and to the timely resolution
   of crippling end-to-end throughput issues. To this effect, in a
   TCP-dominated world, network operators have been heavily relying on
   information present in the clear in TCP headers: sequence and
   acknowledgment numbers, and SACKs when enabled (see [RFC8517]).
   These allow for quantitative estimation of packet loss and delay by
   passive on-path observation. Additionally, the problem can be
   quickly located along the network path by moving the passive
   observer around.

   With encrypted protocols, the equivalent transport headers are
   encrypted and passive packet loss and delay observations are not
   possible, as described in [RFC9065].

   Measuring TCP loss and delay between similar endpoints cannot be
   relied upon to evaluate encrypted protocol loss and delay. Different
   protocols could be routed by the network differently, and the
   fraction of Internet traffic delivered using protocols other than TCP
   is increasing every year. It is imperative to measure packet loss
   and delay experienced by encrypted protocol users directly.

   This document defines Explicit Flow Measurement Techniques. These
   hybrid measurement path signals (see [IPM-Methods]) are to be
   embedded into a transport layer protocol and are explicitly intended
   for exposing RTT and loss rate information to on-path measurement
   devices.
   They are designed to facilitate network operations and
   management and are "beneficial" for maintaining the quality of
   service (see [RFC9065]). These measurement mechanisms are applicable
   to any transport-layer protocol; as examples, this document
   describes QUIC and TCP bindings.

   The Explicit Flow Measurement Techniques described in this document
   can be used alone or in combination with each other. Each technique
   uses a small number of bits and exposes a specific measurement.

   Following the recommendation in [RFC8558] of making path signals
   explicit, this document proposes adding a small number of dedicated
   measurement bits to the clear portion of the protocol headers. These
   bits can be added to an unencrypted portion of a header belonging to
   any protocol layer, e.g., IP (see [IP]) and IPv6 (see [IPv6]) headers
   or extensions such as [IPv6AltMark], UDP surplus space (see
   [UDP-OPTIONS] and [UDP-SURPLUS]), or reserved bits in a QUIC v1
   header, as already done with the latency spin bit (see
   [QUIC-TRANSPORT]).

   The measurements are not designed for use in automated control of the
   network in environments where signal bits are set by untrusted hosts.
   Instead, the signal is to be used for troubleshooting individual
   flows as well as for monitoring the network by aggregating
   information from multiple flows and raising operator alarms if
   aggregate statistics indicate a potential problem.

   The spin bit, delay bit, and loss bits explained in this document are
   inspired by [AltMark], [SPIN-BIT], [I-D.trammell-tsvwg-spin], and
   [I-D.trammell-ippm-spin].

   Additional details about performance measurements for QUIC are
   described in the paper [ANRW19-PM-QUIC].

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Latency Bits

   This section introduces bits that can be used for round trip latency
   measurements. Whenever this section of the specification refers to
   packets, it is referring only to packets with protocol headers that
   include the latency bits.

   [QUIC-TRANSPORT] introduces an explicit per-flow transport-layer
   signal for hybrid measurement of RTT. This signal consists of a spin
   bit that toggles once per RTT. [SPIN-BIT] discusses an additional
   two-bit Valid Edge Counter (VEC) to compensate for loss and
   reordering of the spin bit and to increase the fidelity of the signal
   in less than ideal network conditions.

   This document introduces a stand-alone single-bit delay signal that
   can be used by passive observers to measure the RTT of a network
   flow, avoiding the spin bit ambiguities that arise as soon as network
   conditions deteriorate.

3.1.  Spin Bit

   This section is a brief recap of the spin bit mechanism. For a
   comprehensive explanation of the algorithm, see [SPIN-BIT].

   The spin bit is a signal generated using alternate marking
   [AltMark], where the size of the alternation changes with the flight
   size each RTT.

   The latency spin bit is a single-bit signal that toggles once per
   RTT, enabling latency monitoring of a connection-oriented
   communication from intermediate observation points.

   A "spin period" is a set of packets with the same spin bit value sent
   during one RTT time interval. A "spin period value" is the value of
   the spin bit shared by all packets in a spin period.

   The client and server maintain an internal per-connection spin value
   (i.e., 0 or 1) used to set the spin bit on outgoing packets.
   Both endpoints initialize the spin value to 0 when a new connection
   starts. Then:

   - when the client receives a packet with a packet number larger than
     any number seen so far, it sets the connection spin value to the
     opposite of the value contained in the received packet;

   - when the server receives a packet with a packet number larger than
     any number seen so far, it sets the connection spin value to the
     same value contained in the received packet.

   The computed spin value is used by the endpoints for setting the spin
   bit on outgoing packets. This mechanism allows the endpoints to
   generate a square wave such that, by measuring the time between pairs
   of consecutive edges observed in the same direction, a passive
   on-path observer can compute the round trip delay of that network
   flow.

   The spin bit thus enables round trip latency measurement by observing
   a single direction of the traffic flow.

   Note that packet reordering can cause spurious edges that require
   heuristics to correct. The spin bit performance deteriorates as soon
   as network impairments arise, as explained in Section 3.2.

3.2.  Delay Bit

   The delay bit has been designed to overcome accuracy limitations
   experienced by the spin bit under difficult network conditions:

   - packet reordering leads to the generation of spurious edges and to
     errors in delay estimation;

   - loss of edges causes wrong estimation of spin periods and therefore
     wrong RTT measurements;

   - application-limited senders cause the spin bit to measure the
     application delays instead of network delays.

   Unlike the spin bit, which is set in every packet transmitted on the
   network, the delay bit is set only once per round trip.

   When the delay bit is used, a single packet with a marked bit (the
   delay bit) bounces between a client and a server during the entire
   connection lifetime. This single packet is called the "delay
   sample".
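   The following minimal Python sketch illustrates the bounce described
   above and shows why the time between consecutive delay samples seen
   in one direction equals the RTT. It models an idealized connection,
   which is an assumption and not part of the mechanism: both endpoints
   always have a packet ready to carry the reflected sample, nothing is
   lost, and each path segment has a fixed one-way delay that is the
   same in both directions. The function name and delay values are
   hypothetical.

```python
def delay_sample_sightings(c2o_ms: float, o2s_ms: float, rounds: int):
    """Times at which an observer sees the delay sample heading to the server.

    Hypothetical idealized model: immediate reflection at both endpoints,
    no loss, and symmetric fixed one-way delays c2o_ms (client <-> observer)
    and o2s_ms (observer <-> server).
    """
    sightings = []
    t = 0.0  # the client sends the first delay sample at t = 0
    for _ in range(rounds):
        sightings.append(t + c2o_ms)  # sample passes the observer toward the server
        t += 2 * (c2o_ms + o2s_ms)    # server reflects it, then the client reflects it again
    return sightings


seen = delay_sample_sightings(c2o_ms=10.0, o2s_ms=15.0, rounds=4)
rtt_samples = [later - earlier for earlier, later in zip(seen, seen[1:])]
print(seen)         # [10.0, 60.0, 110.0, 160.0]
print(rtt_samples)  # [50.0, 50.0, 50.0]
```

   Each spacing equals 2 * (10 + 15) = 50 ms, the full round trip,
   regardless of where the observer sits on the path.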
   An observer placed at an intermediate point, observing a single
   direction of traffic and tracking the delay sample and its
   timestamp, can measure the round trip delay of the connection.

   The delay sample lifetime comprises two phases: generation and
   reflection. Generation is the initialization of the delay sample,
   while reflection realizes the bounce behavior of this single packet
   between the two endpoints.

   The figure below illustrates the elementary delay bit mechanism.

      +--------+   -   -   -   -   -   +--------+
      |        |     ----------->      |        |
      | Client |                       | Server |
      |        |     <-----------      |        |
      +--------+   -   -   -   -   -   +--------+

                (a) No traffic at the beginning.

      +--------+   0   0   1   -   -   +--------+
      |        |     ----------->      |        |
      | Client |                       | Server |
      |        |     <-----------      |        |
      +--------+   -   -   -   -   -   +--------+

             (b) The Client starts sending data and
                 sets the first packet as Delay Sample.

      +--------+   0   0   0   0   0   +--------+
      |        |     ----------->      |        |
      | Client |                       | Server |
      |        |     <-----------      |        |
      +--------+   -   -   -   1   0   +--------+

             (c) The Server starts sending data
                 and reflects the Delay Sample.

      +--------+   0   1   0   0   0   +--------+
      |        |     ----------->      |        |
      | Client |                       | Server |
      |        |     <-----------      |        |
      +--------+   0   0   0   0   0   +--------+

             (d) The Client reflects the Delay Sample.

      +--------+   0   0   0   0   0   +--------+
      |        |     ----------->      |        |
      | Client |                       | Server |
      |        |     <-----------      |        |
      +--------+   0   0   0   1   0   +--------+

             (e) The Server reflects the Delay Sample
                 and so on.

                        Delay bit mechanism

3.2.1.  Generation Phase

   Only the client is actively involved in the generation phase. It
   maintains an internal per-flow timestamp variable ("ds_time") updated
   every time a delay sample is transmitted.

   When the connection starts, the client generates a new delay sample
   by setting the delay bit of the first outgoing packet to 1.
   Then it updates the "ds_time" variable with the timestamp of its
   transmission.

   The server initializes the delay bit to 0 at the beginning of the
   connection, and its only task during the connection is described in
   Section 3.2.2.

   In the absence of network impairments, the delay sample should bounce
   between client and server continuously for the entire duration of
   the connection. In practice, that is highly unlikely for two reasons:

   1.  the packet carrying the delay bit might be lost;

   2.  an endpoint could stop or delay sending packets because the
       application is limiting the amount of traffic transmitted.

   To deal with these problems, the client generates a new delay sample
   if more than a predetermined time ("T_Max") has elapsed since the
   last delay sample transmission (including reflections). Note that
   "T_Max" should be greater than the maximum measurable RTT on the
   network. See Section 3.2.3 for details.

3.2.2.  Reflection Phase

   Reflection is the process that enables the bouncing of the delay
   sample between a client and a server. The behavior of the two
   endpoints is almost the same.

   - Server side reflection: when a delay sample arrives, the server
     marks the first packet in the opposite direction as the delay
     sample.

   - Client side reflection: when a delay sample arrives, the client
     marks the first packet in the opposite direction as the delay
     sample. It also updates the "ds_time" variable when the outgoing
     delay sample is actually transmitted.

   In both cases, if the outgoing delay sample would be transmitted more
   than a predetermined threshold (1 ms by default) after the reception
   of the incoming delay sample, the delay sample is not reflected, and
   the outgoing delay bit is kept at 0.

   By doing so, the algorithm can reject measurements that would
   overestimate the delay due to lack of traffic on the endpoints.
   Hence, the maximum estimation error would amount to twice the
   threshold (e.g., 2 ms) per measurement.

3.2.3.  T_Max Selection

   The internal "ds_time" variable allows a client to identify delay
   sample losses. Since a lost delay sample is regenerated once "T_Max"
   has elapsed since the last delay sample transmission, this same value
   can be used by an observer to reject a measurement and start a new
   one.

   In other words, if the time difference between two delay samples is
   greater than or equal to "T_Max", then these cannot be used to
   produce a delay measurement. Therefore, the value of "T_Max" must
   also be known to the on-path network probes.

   There are two alternatives for selecting the "T_Max" value so that
   both client and observers know it. The first requires that "T_Max" be
   known a priori ("T_Max_p") and therefore set within the specification
   of the protocol that implements the marking mechanism (e.g., 1
   second, which is usually greater than the maximum expected RTT). The
   second alternative requires a dynamic mechanism able to adapt the
   duration of "T_Max" to the delay of the connection ("T_Max_c").

   For instance, client and observers could use the connection RTT as a
   basis for calculating an effective "T_Max". They should use a
   predetermined initial value so that "T_Max = T_Max_p" (e.g., 1
   second) and then, when a valid RTT is measured, change "T_Max"
   accordingly so that "T_Max = T_Max_c". In any case, the selected
   "T_Max" should be large enough to absorb any possible variations in
   the connection delay.

   "T_Max_c" could be computed as two times the measured "RTT" plus a
   fixed amount of time (100 ms) to prevent low "T_Max" values in case
   of very small RTTs. If the result is greater than "T_Max_p", then
   "T_Max_c" is forced to the "T_Max_p" value. The resulting formula is:

      T_Max_c = min(2 * RTT + 100 ms, T_Max_p)
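   A minimal Python sketch of the "T_Max" rules above, working in
   milliseconds. The constant and function names are hypothetical; the
   1-second "T_Max_p" default is only the example value mentioned in
   the text, and an actual deployment would take it from the protocol
   specification.

```python
T_MAX_P_MS = 1000.0  # a-priori T_Max (1 second, the example value in the text)


def t_max_c(rtt_ms: float, t_max_p_ms: float = T_MAX_P_MS) -> float:
    """Connection-adaptive T_Max: 2 * RTT + 100 ms, never above T_Max_p."""
    return min(2.0 * rtt_ms + 100.0, t_max_p_ms)


def usable_interval(interval_ms: float, t_max_ms: float) -> bool:
    """Delay samples spaced T_Max or more apart must not produce a measurement."""
    return interval_ms < t_max_ms


print(t_max_c(30.0))                         # 160.0
print(t_max_c(600.0))                        # 1000.0 (1300 ms capped at T_Max_p)
print(usable_interval(45.0, t_max_c(30.0)))  # True
```

   The 100 ms floor keeps "T_Max_c" from collapsing for very short RTTs,
   while the cap guarantees the dynamic value never exceeds the
   a-priori one.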
   Note that the observer's "T_Max" should always be less than or equal
   to the client's "T_Max"; otherwise, the observer could accept as a
   valid measurement an interval that actually ends with a delay sample
   regenerated by the client after its own "T_Max" expired. To obtain
   this result, the client waits for two consecutive incoming samples
   and computes the two related RTTs. Then it takes the larger of them
   as the basis of the "T_Max_c" formula. By this time, observers have
   already measured at least one valid RTT and computed their own
   "T_Max_c".

3.2.4.  Delay Measurement using Delay Bit

   When the delay bit is used, a passive observer can use delay samples
   directly and avoid the ambiguities inherent in calculating the RTT
   from the spin bit.

3.2.4.1.  RTT Measurement

   The delay sample generation process ensures that only one packet
   marked with the delay bit set to 1 runs back and forth between the
   two endpoints per round trip time. To measure the RTT of a flow, an
   on-path passive observer computes the time difference between two
   delay samples observed in a single direction.

   To ensure a valid measurement, the observer must verify that the
   time difference between the two samples is less than "T_Max".

      =======================|======================>
      = **********  --------Obs--------> ********** =
      = * Client *                       * Server * =
      = **********  <------------------- ********** =
      <==============================================

                    (a) client-server RTT

      ==============================================>
      = **********  -------------------> ********** =
      = * Client *                       * Server * =
      = **********  <-------Obs--------- ********** =
      <======================|=======================

                    (b) server-client RTT

               Round-trip time (both directions)

3.2.4.2.  Half-RTT Measurement

   An observer that is able to observe both forward and return traffic
   directions can use the delay samples to measure "upstream" and
   "downstream" RTT components, also known as half-RTT measurements. It
   does this by measuring the time between a delay sample observed in
   one direction and the delay sample previously observed in the
   opposite direction.

   As with RTT measurement, the observer must verify that the time
   difference between the two samples is less than "T_Max".

   Note that the upstream and downstream sections of the paths between
   the endpoints and the observer (i.e., observer-to-client vs.
   client-to-observer and observer-to-server vs. server-to-observer) may
   have different delay characteristics due to differences in network
   congestion and other factors.

      =======================>
      = **********  ---------|---------> **********
      = * Client *          Obs          * Server *
      = **********  <--------|---------- **********
      <=======================

                (a) client-observer half-RTT

                             =======================>
        **********  ---------|---------> ********** =
        * Client *          Obs          * Server * =
        **********  <--------|---------- ********** =
                             <=======================

                (b) observer-server half-RTT

            Half Round-trip time (both directions)

3.2.4.3.  Intra-Domain RTT Measurement

   Intra-domain RTT is the portion of the entire RTT used by a flow to
   traverse the network of a provider. To measure intra-domain RTT, two
   observers capable of observing traffic in both directions must be
   employed simultaneously at the ingress and egress of the network to
   be measured. Intra-domain RTT is the difference between the two
   computed upstream (or downstream) RTT components.
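   As an illustration of the half-RTT and intra-domain computations,
   the Python sketch below processes the delay-sample sightings of a
   bidirectional observer that are already labeled with their direction
   (which the observer can learn, e.g., from the handshake). Each
   sighting is paired with the previous one when they travel in opposite
   directions and are less than "T_Max" apart. The Sighting type, the
   half_rtts function, and the timestamps are hypothetical; the example
   assumes immediate reflections and symmetric one-way delays of 5 ms
   (client to observer A), 20 ms (A to observer B), and 10 ms (B to
   server).

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    time_ms: float
    to_server: bool  # True if the delay sample was travelling client -> server


def half_rtts(sightings, t_max_ms):
    """Return (observer_server, observer_client) half-RTT lists.

    Each sighting is paired with the previous one; only opposite-direction
    pairs spaced less than t_max_ms apart produce a measurement.
    """
    obs_server, obs_client = [], []
    for prev, cur in zip(sightings, sightings[1:]):
        gap = cur.time_ms - prev.time_ms
        if prev.to_server == cur.to_server or gap >= t_max_ms:
            continue  # same direction, or too far apart to be trusted
        if prev.to_server:
            obs_server.append(gap)  # observer -> server -> observer
        else:
            obs_client.append(gap)  # observer -> client -> observer
    return obs_server, obs_client


# Observer A sits near the client, observer B near the server.
obs_a = [Sighting(5, True), Sighting(65, False), Sighting(75, True)]
obs_b = [Sighting(25, True), Sighting(45, False), Sighting(95, True)]
a_srv, a_cli = half_rtts(obs_a, t_max_ms=1000)
b_srv, b_cli = half_rtts(obs_b, t_max_ms=1000)
print(a_srv, a_cli)         # [60] [10]
print(b_srv, b_cli)         # [20] [50]
print(b_cli[0] - a_cli[0])  # 40 (intra-domain RTT between A and B)
```

   The downstream components give the same result (a_srv[0] - b_srv[0]
   is also 40 ms), matching twice the 20 ms one-way delay between the
   two observers.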
      ===========================>
      = ================>
      = = ********** ---|-->  ---|--> **********
      = = * Client *   Obs      Obs   * Server *
      = = ********** <--|---  <--|--- **********
      = <================
      <===========================

        (a) client-observer RTT components (half-RTTs)

                        =========>
          ********** ---|-->  ---|--> **********
          * Client *   Obs      Obs   * Server *
          ********** <--|---  <--|--- **********
                        <=========

          (b) the intra-domain RTT resulting from the
              subtraction of the above RTT components

      Intra-domain Round-trip time (client-observer: upstream)

3.2.5.  Observer's Algorithm

   An on-path observer maintains an internal per-flow variable to keep
   track of the time at which the last delay sample was observed.

   A unidirectional observer, upon detecting a delay sample:

   - if a delay sample was also detected previously in the same
     direction and the time difference between them is less than
     "T_Max - K", then the two delay samples can be used to calculate an
     RTT measurement. "K" is a protection threshold to absorb
     differences in "T_Max" computation and delay variations between
     two consecutive delay samples (e.g., "K = 10% of T_Max").

   If the observer can observe both forward and return traffic flows,
   and it is able to determine which direction contains the client and
   which the server (e.g., by observing the connection handshake), upon
   detecting a delay sample:

   - if a delay sample was also detected in the opposite direction and
     the time difference between them is less than "T_Max - K", then
     the two delay samples can be used to measure the observer-client
     half-RTT or the observer-server half-RTT, according to the
     direction of the last delay sample observed.

3.2.6.  Two Bits Delay Measurement: Spin Bit + Delay Bit

   The spin bit and delay bit algorithms work independently.
   If both marking methods are used in the same connection, observers
   can choose the better of the two available measurements:

   - when a precise measurement can be produced using the delay bit,
     observers choose it;

   - when a delay bit measurement is not available, observers choose
     the approximate spin bit one.

3.2.7.  Hidden Delay Bit - Delay Bit with Privacy Protection

   Theoretically, delay measurements can be used to roughly evaluate the
   distance of the client from the server (using the RTT) or from any
   intermediate observer (using the client-observer half-RTT). To
   protect users' privacy, the delay bit algorithm can be slightly
   modified to mask the RTT of the connection from an intermediate
   observer. This can be achieved by delaying the client-side reflection
   of the delay sample by a predetermined amount of time (the
   "Additional Delay"). An intermediate observer would then inevitably
   measure a delay greater than the real one.

   The Additional Delay should be randomly selected by the client and
   kept constant for a certain amount of time across multiple
   connections. This ensures that the client-server jitter remains the
   same as if no Additional Delay had been inserted. For instance, a
   new Additional Delay value could be generated whenever the client's
   IP address changes.

   Using this technique, despite the Additional Delay introduced, it is
   still possible to correctly measure the server-side component of the
   RTT (observer-server) and all the intra-domain measurements used to
   apportion the delay among network segments. Furthermore, unlike the
   plain delay bit, the hidden delay bit makes the client reflection
   threshold (1 ms) redundant. Removing this threshold further increases
   the number of valid measurements produced by the algorithm.

4.  Loss Bits

   This section introduces bits that can be used for loss measurements.
   Whenever this section of the specification refers to packets, it is
   referring only to packets with protocol headers that include the loss
   bits, the only packets whose loss can be measured.

   - T: the "round Trip loss" bit is used in combination with the spin
     bit to measure round-trip loss. See Section 4.1.

   - Q: the "sQuare signal" bit is used to measure upstream loss. See
     Section 4.2.

   - L: the "Loss event" bit is used to measure end-to-end loss. See
     Section 4.3.

   - R: the "Reflection square signal" bit is used in combination with
     the Q bit to measure end-to-end loss. See Section 4.5.

   Loss measurements enabled by the T, Q, and L bits can be implemented
   using each of those loss bits alone (the T bit requires a working
   spin bit). The two-bit combinations Q+L and Q+R enable additional
   measurement opportunities, discussed below.

   Each endpoint maintains appropriate counters independently for each
   separately identifiable flow (each sub-flow for multipath
   connections).

   Since loss is reported independently for each flow, all bits (except
   for the L bit) require a certain minimum number of packets to be
   exchanged per flow before any signal can be measured. Therefore, loss
   measurements work best for flows that transfer more than a minimal
   amount of data.

4.1.  T Bit - Round Trip Loss Bit

   The round Trip loss bit is used to mark a variable number of packets
   that are exchanged twice between the endpoints, realizing a
   two-round-trip reflection. A passive on-path observer, observing
   either direction, can count and compare the number of marked packets
   seen during the two reflections, estimating the loss rate experienced
   by the connection.
   The overall exchange comprises:

   -  The client selects, generates and consequently transmits a first
      train of packets, by setting the T bit to 1;

   -  The server, upon receiving each packet included in the first
      train, reflects to the client a respective second train of
      packets of the same size as the first train received, by setting
      the T bit to 1;

   -  The client, upon receiving each packet included in the second
      train, reflects to the server a respective third train of packets
      of the same size as the second train received, by setting the T
      bit to 1;

   -  The server, upon receiving each packet included in the third
      train, finally reflects to the client a respective fourth train
      of packets of the same size as the third train received, by
      setting the T bit to 1.

   Packets belonging to the first round trip (first and second train)
   represent the Generation Phase, while those belonging to the second
   round trip (third and fourth train) represent the Reflection Phase.

   A passive on-path observer can count and compare the number of marked
   packets seen during the two round trips (i.e., the first and third or
   the second and fourth trains of packets, depending on which direction
   is observed) and estimate the loss rate experienced by the
   connection.  This process is repeated continuously to obtain more
   measurements as long as the endpoints exchange traffic.  These
   measurements can be called Round Trip losses.

   Since packet rates in the two directions may be different, the number
   of marked packets in the train is determined by the direction with
   the lower packet rate.  See Section 4.1.2 for details on packet
   generation and for a mechanism to allow an observer to distinguish
   between trains belonging to different phases (Generation and
   Reflection).

4.1.1.  Round Trip Packet Loss Measurement

   Since the measurements are performed on a portion of the traffic
   exchanged between the client and the server, the observer calculates
   the end-to-end Round Trip Packet Loss (RTPL) that, statistically,
   will correspond to the loss rate experienced by the connection along
   the entire network path.

      =======================|======================>
      = **********     -----Obs---->     ********** =
      = * Client *                       * Server * =
      = **********     <------------     ********** =
      <==============================================

                   (a) client-server RTPL

      ==============================================>
      = **********     ------------>     ********** =
      = * Client *                       * Server * =
      = **********     <----Obs-----     ********** =
      <======================|=======================

                   (b) server-client RTPL

           Round-trip packet loss (both directions)

   This methodology also allows the Half-RTPL measurement and the
   Intra-domain RTPL measurement in a way similar to RTT measurement.
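   The following Python sketch is illustrative only; it is not part of
   any protocol binding.  It shows how an observer could compute RTPL
   from (spin bit, T bit) pairs, using the train separation rule of
   Section 4.1.3: a train of marked packets ends when a complete
   spin-bit period carries no marked packet.  The function names are
   hypothetical.  The sketch assumes that the first complete train the
   observer sees is a Generation train, so trains alternate between
   Generation and Reflection.

```python
from typing import Iterable, List, Tuple


def split_trains(packets: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the sizes of completed trains of marked (T=1) packets.

    A train ends when a complete spin-bit period contains no marked
    packet; the period still open at the end of the input is ignored.
    """
    trains: List[int] = []
    train = 0          # marked packets accumulated in the current train
    period_marks = 0   # marked packets in the current spin-bit period
    prev_spin = None
    for spin, t in packets:
        if prev_spin is not None and spin != prev_spin:
            # A spin-bit period has just ended.
            if period_marks:
                train += period_marks
            elif train:
                trains.append(train)  # empty period separates trains
                train = 0
            period_marks = 0
        prev_spin = spin
        period_marks += t
    return trains


def round_trip_loss(trains: List[int]) -> List[Tuple[int, float]]:
    """Pair Generation/Reflection trains and return (RTPL, loss rate).

    Assumes trains[0] is a Generation train (hypothetical alignment).
    """
    results = []
    for gen, refl in zip(trains[0::2], trains[1::2]):
        rtpl = gen - refl
        results.append((rtpl, rtpl / gen))
    return results


# The example of Section 4.1.3: (spin bit, loss bit) pairs.
example = "01 01 00 01 11 10 11 00 00 10 10 10 01 00 01 01 10 11 10 00 00 10"
pairs = [(int(p[0]), int(p[1])) for p in example.split()]
trains = split_trains(pairs)
print(trains)                   # [5, 4]
print(round_trip_loss(trains))  # [(1, 0.2)]
```

   Applied to the example of Section 4.1.3, the sketch finds a
   Generation train of 5 marked packets and a Reflection train of 4.
   That is an RTPL of 1 packet, or 20%.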
      ===================>
      = ********** ------|-----> **********
      = * Client *      Obs      * Server *
      = ********** <-----|------ **********
      <===================

           (a) client-observer half-RTPL

                       ===================>
      ********** ------|-----> ********** =
      * Client *      Obs      * Server * =
      ********** <-----|------ ********** =
                       <===================

           (b) observer-server half-RTPL

         Half round-trip packet loss (both directions)

                    ===========================>
                            ================>  =
      ********** ---|--> ---|--> ********** =  =
      * Client *   Obs     Obs   * Server * =  =
      ********** <--|--- <--|--- ********** =  =
                            <================  =
                    <===========================

      (a) observer-server RTPL components (half-RTPLs)

                    ========>
      ********** ---|--> ---|--> **********
      * Client *   Obs     Obs   * Server *
      ********** <--|--- <--|--- **********
                    <========

      (b) the intra-domain RTPL resulting from the
          subtraction of the above RTPL components

      Intra-domain round-trip packet loss (observer-server)

4.1.2.  Setting the Round Trip Loss Bit on Outgoing Packets

   The round Trip loss signal requires a working Spin bit signal to
   separate trains of marked packets (packets with the T bit set to 1).
   A "pause" of at least one empty spin-bit period between each phase of
   the algorithm serves as such a separator for the on-path observer.

   The client is in charge of launching trains of marked packets and
   does so according to the following algorithm:

   1.  Generation Phase.  The client starts generating marked packets
       for two consecutive spin-bit periods; it maintains a "generation
       token" count that is reset to zero at the beginning of the
       algorithm phase and is incremented every time a packet arrives.
       When the client transmits a packet and a "generation token" is
       available, the client marks the packet and retires a "generation
       token".
       If no token is available, the outgoing packet is transmitted
       unmarked.  At the end of the first spin-bit period spent in
       generation, the reflection counter is unlocked to start counting
       incoming marked packets that will be reflected later;

   2.  Pause Phase.  When the generation is completed, the client pauses
       until it has observed one entire spin-bit period with no marked
       packets.  That spin-bit period is used by the observer as a
       separator between generated and reflected packets.  During this
       marking pause, all the outgoing packets are transmitted with the
       T bit set to 0.  The reflection counter is still incremented
       every time a marked packet arrives;

   3.  Reflection Phase.  The client starts transmitting marked packets,
       decrementing the reflection counter for each transmitted marked
       packet until the reflection counter reaches zero.  The
       "generation token" method from the generation phase is used
       during this phase as well.  At the end of the first spin-bit
       period spent in reflection, the reflection counter is locked to
       prevent incoming reflected packets from incrementing it;

   4.  Pause Phase 2.  The pause phase is repeated after the reflection
       phase and serves as a separator between the reflected packet
       train and a new packet train.

   The generation token counter should be capped to limit the effects of
   a subsequent sudden reduction in the other endpoint's packet rate
   that could prevent that endpoint from reflecting collected packets.
   The most conservative cap value is "1".

   A server maintains a "marking counter" that starts at zero and is
   incremented every time a marked packet arrives.  When the server
   transmits a packet and the "marking counter" is positive, the server
   marks the packet and decrements the "marking counter".  If the
   "marking counter" is zero, the outgoing packet is transmitted
   unmarked.

4.1.3.  Observer's Logic for Round Trip Loss Signal

   The on-path observer counts marked packets and separates different
   trains by detecting spin-bit periods (at least one) with no marked
   packets.  The Round Trip Packet Loss (RTPL) is the difference between
   the size of the Generation train and the size of the Reflection
   train.

   In the following example, packets are represented by two bits (the
   first one is the spin bit, the second one is the loss bit):

         Generation          Pause           Reflection       Pause
    ____________________ ______________ ____________________ ________
   |                    |              |                    |        |
    01 01 00 01 11 10 11 00 00 10 10 10 01 00 01 01 10 11 10 00 00 10

                    Round Trip Loss signal example

   Note that 5 marked packets have been generated, of which 4 have been
   reflected.

4.1.4.  Loss Coverage and Signal Timing

   A cycle of the round Trip loss signaling algorithm contains 2 RTTs of
   Generation phase, 2 RTTs of Reflection phase, and two Pause phases,
   each at least 1 RTT in duration.  Hence, the loss signal is delayed
   by about 6 RTTs after the loss events.

   The observer can only detect loss of marked packets that occurs after
   its initial observation of the Generation phase and before its
   subsequent observation of the Reflection phase.  Hence, if the loss
   occurs on the path that sends packets at a lower rate (typically ACKs
   in such asymmetric scenarios), "2/6" ("1/3") of the packets will be
   sampled for loss detection.

   If the loss occurs on the path that sends packets at a higher rate,
   "lowPacketRate/(3*highPacketRate)" of the packets will be sampled for
   loss detection.  For protocols that use ACKs, the portion of packets
   sampled for loss in the higher rate direction during unidirectional
   data transfer is "1/(3*packetsPerAck)", where the value of
   "packetsPerAck" can vary by protocol, by implementation, and by
   network conditions.

4.2.  Q Bit - Square Bit

   The sQuare bit (Q bit) takes its name from the square wave generated
   by its signal.  Every outgoing packet contains the Q bit value, which
   is initialized to 0 and inverted after sending N packets (a sQuare
   Block or simply Q Block).  Hence, the Q Period is 2*N.  The Q bit
   represents "packet color" as defined by [AltMark].

   Observation points can estimate upstream losses by watching a single
   direction of the traffic flow and counting the number of packets in
   each observed Q Block, as described in Section 4.2.2.

4.2.1.  Q Block Length Selection

   The length of the block must be known to the on-path network probes.
   There are two alternatives for selecting the Q Block length.  The
   first one requires that the length is known a priori and therefore
   set within the specification of the protocol that implements the
   marking mechanism.  The second requires the sender to select it.

   In this latter scenario, the sender is expected to choose N (Q Block
   length) based on the expected amount of loss and reordering on the
   path.  The choice of N strikes a compromise: the observation could
   become too unreliable in case of packet reordering and/or severe loss
   if N is too small, while short flows may not yield a useful upstream
   loss measurement if N is too large (see Section 4.2.2).

   The value of N should be at least 64 and be a power of 2.  This
   requirement allows an Observer to infer the Q Block length by
   observing one period of the square signal.  It also allows the
   Observer to identify flows that set the loss bits to arbitrary values
   (see Section 7).

   If the sender does not have sufficient information to make an
   informed decision about Q Block length, the sender should use N=64,
   since this value has been extensively tried in large-scale field
   tests and yielded good results.
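   A minimal sender-side sketch of the square signal in Python follows.
   It assumes the default N=64 suggested above.  The class name is
   hypothetical.  Rejecting values of N that are below 64 or not a power
   of 2 is an illustrative choice, not a protocol requirement.

```python
class SquareSignal:
    """Hypothetical Q bit generator: N packets of one value, N of the other."""

    def __init__(self, n: int = 64) -> None:
        # N should be at least 64 and a power of 2; this sketch rejects
        # other values (an illustrative choice).
        if n < 64 or n & (n - 1):
            raise ValueError("N should be a power of 2 and at least 64")
        self.n = n     # Q Block length, constant for the whole flow
        self.q = 0     # the Q bit is initialized to 0
        self.sent = 0  # packets sent in the current Q Block

    def next_bit(self) -> int:
        """Return the Q bit for the next outgoing packet."""
        bit = self.q
        self.sent += 1
        if self.sent == self.n:  # Q Block complete: invert the signal
            self.q ^= 1
            self.sent = 0
        return bit


sender = SquareSignal()  # N=64, so the Q Period is 128 packets
bits = [sender.next_bit() for _ in range(256)]
print(bits[:64] == [0] * 64, bits[64:128] == [1] * 64)  # True True
```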
   Alternatively, the sender may also choose a random power-of-2 N for
   each flow, increasing the chances of using a Q Block length that
   gives the best signal for some flows.

   The sender must keep the value of N constant for a given flow.

4.2.2.  Upstream Loss

   Blocks of N (Q Block length) consecutive packets are sent with the
   same value of the Q bit, followed by another block of N packets with
   an inverted value of the Q bit.  Hence, knowing the value of N, an
   on-path observer can estimate the amount of upstream loss after
   observing at least N packets.  The upstream loss rate ("uloss") is
   one minus the average number of packets in a block of packets with
   the same Q value ("p") divided by N ("uloss=1-avg(p)/N").

   The observer needs to be able to tolerate packet reordering that can
   blur the edges of the square signal, as explained in Section 4.2.3.

      =================>
      ********** -----Obs----> **********
      * Client *               * Server *
      ********** <------------ **********

           (a) in client-server channel (uloss_up)

      ********** ------------> **********
      * Client *               * Server *
      ********** <----Obs----- **********
                       <=================

           (b) in server-client channel (uloss_down)

                       Upstream loss

4.2.3.  Identifying Q Block Boundaries

   Packet reordering can produce spurious edges in the square signal.
   To address this, the observer should look for packets with the
   current Q bit value up to X packets past the first packet with a
   reverse Q bit value.  The value of X, a "Marking Block Threshold",
   should be less than "N/2".

   The choice of X represents a trade-off between resiliency to
   reordering and resiliency to loss.  A very large Marking Block
   Threshold will be able to reconstruct Q Blocks despite a significant
   amount of reordering, but it may erroneously coalesce packets from
   multiple Q Blocks into fewer Q Blocks if loss exceeds 50% for some Q
   Blocks.

4.3.  L Bit - Loss Event Bit

   The Loss Event bit uses an Unreported Loss counter maintained by the
   protocol that implements the marking mechanism.  To use the Loss
   Event bit, the protocol must allow the sender to identify lost
   packets.  This is true of protocols such as QUIC, partially true of
   TCP and SCTP (losses of pure ACKs are not detected), and not true of
   protocols such as UDP and IP/IPv6.

   The Unreported Loss counter is initialized to 0, and the L bit of
   every outgoing packet indicates whether the Unreported Loss counter
   is positive (L=1 if the counter is positive, and L=0 otherwise).

   The value of the Unreported Loss counter is decremented every time a
   packet with L=1 is sent.

   The value of the Unreported Loss counter is incremented for every
   packet that the protocol declares lost, using whatever loss detection
   machinery the protocol employs.  If the protocol is able to rescind
   the loss determination later, a positive Unreported Loss counter may
   be decremented due to the rescission, but it should not become
   negative due to the rescission.

   This loss signaling is similar to loss signaling in [ConEx], except
   that the Loss Event bit reports the exact number of lost packets,
   whereas the Echo Loss bit in [ConEx] reports an approximate number of
   lost bytes.

   For protocols, such as TCP ([TCP]), that allow network devices to
   change data segmentation, it is possible that only a part of the
   packet is lost.  In these cases, the sender must increment the
   Unreported Loss counter by the fraction of the packet data lost (so
   the Unreported Loss counter may become negative when a packet with
   L=1 is sent after a partial packet has been lost).

   Observation points can estimate the end-to-end loss, as determined by
   the upstream endpoint, by counting packets in this direction with the
   L bit equal to 1, as described in Section 4.3.1.

4.3.1.  End-To-End Loss

   The Loss Event bit allows an observer to estimate the end-to-end loss
   rate by counting packets with L bit values of 0 and 1 for a given
   flow.  The end-to-end loss rate is the fraction of packets with L=1.

   The assumption here is that upstream loss affects packets with L=0
   and L=1 equally.  If some loss is caused by tail-drop in a network
   device, this may be a simplification.  If the sender's congestion
   controller reduces the packet send rate after loss, there may be a
   sufficient delay before sending packets with L=1 that they have a
   greater chance of arriving at the observer.

4.3.2.  Loss Profile Characterization

   In addition to measuring the end-to-end loss rate, the Loss Event bit
   allows an observer to characterize the loss profile, since the
   distribution of observed packets with the L bit set to 1 roughly
   corresponds to the distribution of packets lost between 1 RTT and 1
   RTO before (see Section 4.4.1).  Hence, observing random single
   instances of the L bit set to 1 indicates random single packet loss,
   while observing blocks of packets with the L bit set to 1 indicates
   loss affecting entire blocks of packets.

4.4.  L+Q Bits - Upstream, Downstream, and End-to-End Loss Measurements

   Combining the L and Q bits allows a passive observer watching a
   single direction of traffic to accurately measure:

   -  upstream loss: sender-to-observer loss (see Section 4.2.2)

   -  downstream loss: observer-to-receiver loss (see Section 4.4.1.1)

   -  end-to-end loss: sender-to-receiver loss on the observed path (see
      Section 4.3.1) with loss profile characterization (see
      Section 4.3.2)

4.4.1.  Correlating End-to-End and Upstream Loss

   Upstream loss is calculated by observing packets that did not suffer
   the upstream loss (Section 4.2.2).  End-to-end loss, however, is
   calculated by observing subsequent packets after the sender's
   protocol detected the loss.
   Hence, end-to-end loss is generally observed with a delay of between
   1 RTT (loss declared due to multiple duplicate acknowledgments) and 1
   RTO (loss declared due to a timeout) relative to the upstream loss.

   The flow RTT can sometimes be estimated by timing protocol handshake
   messages.  This RTT estimate can be greatly improved by observing a
   dedicated protocol mechanism for conveying RTT information, such as
   the Spin bit (see Section 3.1) or the Delay bit (see Section 3.2).

   Whenever the observer needs to perform a computation that uses both
   upstream and end-to-end loss rate measurements, it should use an
   upstream loss rate that leads the end-to-end loss rate by
   approximately 1 RTT.  If the observer is unable to estimate the RTT
   of the flow, it should accumulate loss measurements over time periods
   of at least 4 times the typical RTT for the observed flows.

   If the calculated upstream loss rate exceeds the end-to-end loss rate
   calculated in Section 4.3.1, then either the Q Period is too short
   for the amount of packet reordering or there is observer loss, as
   described in Section 4.4.1.2.  If this happens, the observer should
   adjust the calculated upstream loss rate to match the end-to-end loss
   rate, unless the following applies.

   Some protocols, such as TCP and SCTP, do not track losses of pure ACK
   packets.  For these protocols, observing a direction of traffic
   dominated by pure ACK packets could result in a measured upstream
   loss that is higher than the measured end-to-end loss, if said pure
   ACK packets are lost upstream.  Hence, if the measurement is applied
   to such protocols, and the observer can confirm that pure ACK packets
   dominate the observed traffic direction, the observer should adjust
   the calculated end-to-end loss rate to match the upstream loss rate.

4.4.1.1.  Downstream Loss

   Because downstream loss affects only those packets that did not
   suffer upstream loss, the end-to-end loss rate ("eloss") relates to
   the upstream loss rate ("uloss") and downstream loss rate ("dloss")
   as "(1-uloss)(1-dloss)=1-eloss".  Hence,
   "dloss=(eloss-uloss)/(1-uloss)".

4.4.1.2.  Observer Loss

   A typical deployment of a passive observation system includes a
   network tap device that mirrors network packets of interest to a
   device that performs analysis and measurement on the mirrored
   packets.  The observer loss is the loss that occurs on the mirror
   path.

   Observer loss affects the upstream loss rate measurement, since it
   causes the observer to account for fewer packets in a block of
   identical Q bit values (see Section 4.2.2).  The end-to-end loss rate
   measurement, however, is unaffected by the observer loss, since it is
   a measurement of the fraction of packets with the L bit value of 1,
   and the observer loss would affect all packets equally (see
   Section 4.3.1).

   The need to adjust the upstream loss rate down to match the
   end-to-end loss rate as described in Section 4.4.1 is an indication
   of observer loss.  The magnitude of the observer loss lies between the
   amount of such adjustment and the entirety of the upstream loss
   measured in Section 4.2.2.  Alternatively, a high apparent upstream
   loss rate could be an indication of significant packet reordering,
   possibly due to packets belonging to a single flow being multiplexed
   over several upstream paths with different latency characteristics.

4.5.  R Bit - Reflection Square Bit

   The R bit must be deployed alongside the Q bit.
   Unlike the square signal, in which packets are transmitted in blocks
   of fixed size, the Reflection square signal (also an alternate
   marking signal) produces blocks of packets whose size varies
   according to these rules:

   -  when the transmission of a new block starts, its size is set equal
      to the size of the last Q Block whose reception has been
      completed;

   -  if, before transmission of the block is terminated, the reception
      of at least one further Q Block is completed, the size of the
      block is updated to the average size of the further received Q
      Blocks.

   Implementation details follow.

   The Reflection square value is initialized to 0 and is applied to the
   R bit of every outgoing packet.  The Reflection square value is
   toggled for the first time when the completion of a Q Block is
   detected in the incoming square signal (produced by the opposite node
   using the Q bit).  When this happens, the number of packets ("p")
   detected within this first Q Block is used to generate a reflection
   square signal which toggles every "M=p" packets (at first).  This new
   signal produces blocks of M packets (marked using the R bit), and
   each of them is called a "Reflection Block" (R Block).

   The M value is then updated every time a completed Q Block in the
   incoming square signal is received, following this formula:
   "M=round(avg(p))".

   The parameter "avg(p)" is the average number of packets in a marking
   period computed considering all the Q Blocks received since the
   beginning of the current R Block.

   To ensure a proper computation of the M value, endpoints implementing
   the R bit must identify the boundaries of incoming Q Blocks.  The
   same approach described in Section 4.2.3 should be used.
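   The R Block sizing rules above can be sketched in Python.  This is a
   simplified illustration that rests on several assumptions:

   -  the caller, a hypothetical receiver-side Q Block detector (not
      shown), reports the size of each completed incoming Q Block;

   -  "round()" rounds halves up, since the text does not specify
      tie-breaking;

   -  the tiny block sizes are for readability only, as real Q Blocks
      have N >= 64.

```python
import math
from typing import List


def round_half_up(x: float) -> int:
    # The text writes "round()" without a tie-breaking rule; this sketch
    # assumes halves round up (Python's built-in round() rounds to even).
    return math.floor(x + 0.5)


class ReflectionSquare:
    """Hypothetical R bit sender fed with completed incoming Q Blocks."""

    def __init__(self) -> None:
        self.r = 0                    # value applied to the R bit
        self.started = False          # becomes True at the first Q Block
        self.m = 0                    # current R Block length M
        self.sent = 0                 # packets sent in the current R Block
        self.q_sizes: List[int] = []  # Q Blocks completed during this R Block
        self.last_q = 0               # size of the last completed Q Block

    def on_q_block_completed(self, p: int) -> None:
        """Report that an incoming Q Block of p packets has completed."""
        self.last_q = p
        if not self.started:
            # First toggle: the first R Block reflects this Q Block (M=p).
            self.started = True
            self.r ^= 1
            self.m, self.sent, self.q_sizes = p, 0, []
            return
        # A further Q Block completed while this R Block is in progress:
        # M=round(avg(p)) over the Q Blocks received since it began.
        self.q_sizes.append(p)
        self.m = round_half_up(sum(self.q_sizes) / len(self.q_sizes))

    def next_bit(self) -> int:
        """Return the R bit for the next outgoing packet."""
        if self.started and self.sent >= self.m:
            # R Block finished: toggle and size the new block from the
            # last Q Block whose reception has been completed.
            self.r ^= 1
            self.sent, self.m, self.q_sizes = 0, self.last_q, []
        if self.started:
            self.sent += 1
        return self.r


rs = ReflectionSquare()
out = [rs.next_bit() for _ in range(3)]    # no Q Block yet: R stays 0
rs.on_q_block_completed(4)                 # first Q Block: 4 packets
out += [rs.next_bit() for _ in range(6)]   # R Block of four 1s, then 0s
rs.on_q_block_completed(2)                 # shorter Q Block: M becomes 2
out += [rs.next_bit() for _ in range(2)]
print(out)  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
```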
   Looking at the R bit, unidirectional observation points have an
   indication of the losses experienced by the entire unobserved channel
   plus those that occurred on the path from the sender up to them.

   Since the Q Block is sent in one direction, and the corresponding
   reflected R Block is sent in the opposite direction, the reflected R
   signal is transmitted with the packet rate of the slower direction.
   Namely, if the observed direction is the slower one, there can be
   multiple Q Blocks transmitted in the unobserved direction before a
   complete R Block is transmitted in the observed direction.  If the
   unobserved direction is the slower one, the observed direction can be
   sending R Blocks of the same size repeatedly before it can update the
   signal to account for a newly-completed Q Block.

4.5.1.  R+Q Bits - Using R and Q Bits for Passive Loss Measurement

   Since both sQuare and Reflection square bits are toggled at most
   every N packets (except for the first transition of the R bit, as
   explained before), an on-path observer can count the number of
   packets of each marking block and, knowing the value of N, can
   estimate the amount of loss experienced by the connection.  An
   observer can calculate different measurements depending on whether
   it is able to observe a single direction of the traffic or both
   directions.
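   The block counting described above can be sketched roughly in
   Python, applied to either the Q bit or the R bit sequence of one
   observed direction.  The sketch uses the Marking Block Threshold X of
   Section 4.2.3 to absorb reordering at block edges.  It then computes
   a loss rate of the form 1-avg(size)/N, as used for "uloss" in
   Section 4.2.2 and for "tqloss" in Section 4.5.1.1.  The function
   names are hypothetical, and N=4 in the example is chosen only for
   readability.

```python
from typing import List, Optional, Sequence


def block_sizes(bits: Sequence[int], x: int = 0) -> List[int]:
    """Return the sizes of completed marking blocks in a 0/1 sequence.

    After the first packet carrying the reverse value, packets with the
    current value are still credited to the current block for x more
    packets (the Marking Block Threshold X of Section 4.2.3).
    """
    sizes: List[int] = []
    if not bits:
        return sizes
    cur = bits[0]                  # value of the block being counted
    count = 0                      # packets credited to the current block
    new = 0                        # packets already seen for the next block
    pending: Optional[int] = None  # packets seen since first reverse value
    for b in bits:
        if pending is not None and pending >= x:
            # Threshold window exhausted: the current block is complete.
            sizes.append(count)
            cur, count, new, pending = 1 - cur, new, 0, None
        if pending is None:
            if b == cur:
                count += 1
            else:
                pending, new = 0, 1    # first packet of the next block
        else:
            pending += 1
            if b == cur:
                count += 1             # late (reordered) packet
            else:
                new += 1
    return sizes  # the block still open at the end is not reported


def block_loss(sizes: Sequence[int], n: int) -> float:
    """Loss rate of the form 1-avg(size)/N, as for uloss or tqloss."""
    return 1.0 - (sum(sizes) / len(sizes)) / n


received = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # N=4, one swap
print(block_sizes(received, x=0))  # [3, 1, 1, 3, 4]: spurious edges
print(block_sizes(received, x=1))  # [4, 4]
print(block_loss(block_sizes(received, x=1), n=4))  # 0.0
```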
   Single-direction observer:

   -  upstream loss in the observed direction: the loss between the
      sender and the observation point (see Section 4.2.2)

   -  "three-quarters" connection loss: the loss between the receiver
      and the sender in the unobserved direction plus the loss between
      the sender and the observation point in the observed direction

   -  end-to-end loss in the unobserved direction: the loss between the
      receiver and the sender in the opposite direction

   Two-direction observer (the same metrics as above, applied to both
   directions, plus):

   -  client-observer half round-trip loss: the loss between the client
      and the observation point in both directions

   -  observer-server half round-trip loss: the loss between the
      observation point and the server in both directions

   -  downstream loss: the loss between the observation point and the
      receiver (applicable to both directions)

4.5.1.1.  Three-Quarters Connection Loss

   Except for the very first block, in which there is nothing to reflect
   (a complete Q Block has not yet been received), packets are
   continuously R-bit marked into alternate blocks of size less than or
   equal to N.  Knowing the value of N, an on-path observer can estimate
   the amount of loss that occurred in the whole opposite channel plus
   the loss from the sender up to it in the observation channel.  As for
   the previous metric, the "three-quarters" connection loss rate
   ("tqloss") is one minus the average number of packets in a block of
   packets with the same R value ("t") divided by "N"
   ("tqloss=1-avg(t)/N").
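   The derived metrics in Sections 4.4.1.1, 4.5.1.2, 4.5.1.3, and
   4.5.1.4 all share one form.  Suppose a path segment with loss rate
   "a" is followed by a segment with an unknown loss rate "x", and the
   whole path has loss rate "b".  Then "(1-a)(1-x)=1-b", so
   "x=(b-a)/(1-a)".  A small Python helper with a hypothetical name
   makes the pattern explicit.  It assumes that both input rates were
   accumulated over comparable, aligned intervals (see Section 4.4.1).

```python
def residual_loss(total: float, partial: float) -> float:
    """Solve (1-partial)(1-x) = 1-total for x.

    total:   loss over the whole path segment (e.g. eloss, tqloss)
    partial: loss over its leading part (e.g. uloss)
    """
    if not 0.0 <= partial < 1.0:
        raise ValueError("partial loss must be in [0, 1)")
    return (total - partial) / (1.0 - partial)


# Mapping to the formulas in the text (argument names as used there):
#   dloss            = residual_loss(eloss, uloss)              4.4.1.1
#   eloss_unobserved = residual_loss(tqloss, uloss)             4.5.1.2
#   hrtloss          = residual_loss(tqloss_opposite, uloss)    4.5.1.3
#   dloss            = residual_loss(hrtloss, uloss_opposite)   4.5.1.4

uloss = 0.02                           # hypothetical upstream loss
tqloss = 1 - (1 - 0.05) * (1 - uloss)  # built with 5% unobserved-side loss
eloss_unobserved = residual_loss(tqloss, uloss)
print(round(eloss_unobserved, 6))      # 0.05
```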
      ===================>
      = ********** -----Obs----> **********
      = * Client *               * Server *
      = ********** <------------ **********
      <====================================

           (a) in client-server channel (tqloss_up)

      ====================================>
      ********** ------------> ********** =
      * Client *               * Server * =
      ********** <----Obs----- ********** =
                       <===================

           (b) in server-client channel (tqloss_down)

                Three-quarters connection loss

   The following metrics derive from this last metric and the upstream
   loss produced by the Q bit.

4.5.1.2.  End-To-End Loss in the Opposite Direction

   End-to-end loss in the unobserved direction ("eloss_unobserved")
   relates to the "three-quarters" connection loss ("tqloss") and
   upstream loss in the observed direction ("uloss") as
   "(1-eloss_unobserved)(1-uloss)=1-tqloss".  Hence,
   "eloss_unobserved=(tqloss-uloss)/(1-uloss)".

      ********** -----Obs----> **********
      * Client *               * Server *
      ********** <------------ **********
      <==================================

           (a) in client-server channel (eloss_down)

      ==================================>
      ********** ------------> **********
      * Client *               * Server *
      ********** <----Obs----- **********

           (b) in server-client channel (eloss_up)

           End-to-end loss in the opposite direction

4.5.1.3.  Half Round-Trip Loss

   If the observer is able to observe both directions of traffic, it is
   able to calculate two "half round-trip" loss measurements: loss from
   the observer to the receiver (in a given direction) and then back to
   the observer in the opposite direction.
   For both directions, the "half round-trip" loss ("hrtloss") relates
   to the "three-quarters" connection loss ("tqloss_opposite") measured
   in the opposite direction and the upstream loss ("uloss") measured in
   the given direction as "(1-uloss)(1-hrtloss)=1-tqloss_opposite".
   Hence, "hrtloss=(tqloss_opposite-uloss)/(1-uloss)".

      ===================>
      = ********** ------|-----> **********
      = * Client *      Obs      * Server *
      = ********** <-----|------ **********
      <===================

           (a) client-observer half round-trip loss (hrtloss_co)

                       ===================>
      ********** ------|-----> ********** =
      * Client *      Obs      * Server * =
      ********** <-----|------ ********** =
                       <===================

           (b) observer-server half round-trip loss (hrtloss_os)

              Half round-trip loss (both directions)

4.5.1.4.  Downstream Loss

   If the observer is able to observe both directions of traffic, it is
   able to calculate two downstream loss measurements in one of two
   ways.  It can use end-to-end loss and upstream loss, similar to the
   calculation in Section 4.4.1.1.  Or it can use "half round-trip" loss
   and upstream loss in the opposite direction.

   For the latter, "dloss=(hrtloss-uloss_opposite)/(1-uloss_opposite)".

                       =================>
      ********** ------|-----> **********
      * Client *      Obs      * Server *
      ********** <-----|------ **********

           (a) in client-server channel (dloss_up)

      ********** ------|-----> **********
      * Client *      Obs      * Server *
      ********** <-----|------ **********
      <=================

           (b) in server-client channel (dloss_down)

                      Downstream loss

4.5.2.  Enhancement of R Block Length Computation

   The rounding function used in the M computation introduces errors
   that can be minimized by storing the rounding applied each time M is
   computed, and using it during the computation of the M value in the
   following R Block.
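   The residue-carrying computation formalized in the next paragraph
   ("Mr=avg(p)+r_avg; M=round(Mr); r_avg=Mr-M") can be sketched in
   Python.  Here "round()" is assumed to round halves up, since the text
   does not specify tie-breaking.  The function name and the example
   avg(p) values are hypothetical.

```python
import math
from typing import Iterable, List


def round_half_up(x: float) -> int:
    # Tie-breaking for round() is not specified; halves round up here.
    return math.floor(x + 0.5)


def r_block_lengths(averages: Iterable[float], carry: bool = True) -> List[int]:
    """Return successive R Block lengths M for the given avg(p) values.

    With carry=True the rounding residue r_avg is fed into the next
    computation: Mr=avg(p)+r_avg; M=round(Mr); r_avg=Mr-M.
    """
    r_avg = 0.0
    lengths: List[int] = []
    for avg_p in averages:
        mr = avg_p + r_avg if carry else avg_p
        m = round_half_up(mr)
        r_avg = mr - m
        lengths.append(m)
    return lengths


avgs = [63.25] * 4                         # hypothetical avg(p) values
print(r_block_lengths(avgs, carry=False))  # [63, 63, 63, 63]
print(r_block_lengths(avgs))               # [63, 64, 63, 63]
```

   Without the carry, the four R Blocks total 252 packets.  With it,
   they total 253, matching 4*63.25.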
   This can be achieved by introducing the new "r_avg" parameter in the
   computation of M.  The new formula is "Mr=avg(p)+r_avg; M=round(Mr);
   r_avg=Mr-M", where the initial value of "r_avg" is equal to 0.

4.5.3.  Improved Resilience to Packet Reordering

   When a protocol implementing the marking mechanism is able to detect
   when packets are received out of order, it can improve resilience to
   packet reordering beyond what is possible using the methods described
   in Section 4.2.3.

   This can be achieved by updating the size of the current R Block
   while it is being transmitted.  The reflection block size is then
   updated every time an incoming reordered packet of the previous Q
   Block is detected.  This can be done if and only if the transmission
   of the current reflection block is in progress and no packets of the
   following Q Block have been received.

4.6.  Improved Q and R Bits Resilience to Burst Losses

   Burst losses can affect the accuracy of Q and R measurements.
   Generally, burst losses can be absorbed and correctly measured if
   they are smaller than the established Q Block length.  On the other
   hand, entire periods might be wiped out if the burst sizes become too
   large, thus making the observer completely unaware of their loss.

   To improve burst loss resilience, an observer might consider a
   received Q or R Block larger than the selected Q Block length as a
   burst loss event, and then compute the loss as three times the Q
   Block length minus the measured block length.  By doing so, an
   observer can detect burst losses of less than two blocks (e.g., less
   than 128 packets for a Q Block length of 64 packets).  A burst loss
   equal to or greater than two consecutive periods would still remain
   unnoticed by the observer (or be underestimated if a period longer
   than the Q Block length were formed).

5.  Summary of Delay and Loss Marking Methods

   This section summarizes the marking methods described in this draft.

   For the delay measurement, it is possible to use the spin bit and/or
   the delay bit.  A unidirectional or bidirectional observer can be
   used.

   +---------------+----+------------------------+-------------+------+
   |    Method     |# of|       Available        |             | # of |
   |               |bits|     Delay Metrics      | Impairments | meas.|
   |               |    +------------+-----------+ Resiliency  |      |
   |               |    |   UNIDIR   |   BIDIR   |             |      |
   |               |    |  Observer  |  Observer |             |      |
   +---------------+----+------------+-----------+-------------+------+
   |S: Spin Bit    | 1  |    RTT     | x2        |     low     | very |
   |               |    |            | Half RTT  |             | high |
   +---------------+----+------------+-----------+-------------+------+
   |D: Delay Bit   | 1  |    RTT     | x2        |    high     |medium|
   |               |    |            | Half RTT  |             |      |
   +---------------+----+------------+-----------+-------------+------+
   |D^: Hidden     | 1  |    RTT^    | x2        |    high     | high |
   |   Delay Bit   |    |            | Left Half^|             |      |
   |               |    |            | Right Half|             |      |
   +---------------+----+------------+-----------+-------------+------+
   |SD: Spin Bit & | 2  |    RTT     | x2        |    high     | very |
   |    Delay Bit *|    |            | Half RTT  |             | high |
   +---------------+----+------------+-----------+-------------+------+

      x2  Same metric for both directions
      *   Both algorithms work independently; an observer could use
          approximate spin bit measures when delay bit ones aren't
          available
      ^   Masked metric (real value can be calculated only by those who
          know the Additional Delay)

                     Figure 1: Delay Comparison

   For the loss measurement, each row in the table of Figure 2
   represents a loss marking method.  For each method, the table
   specifies the number of bits required in the header, the available
   metrics using a unidirectional or bidirectional observer, the
   applicable protocols, the measurement fidelity, and the delay.

   +-------------+-+-----------------------+-+------------------------+
   | Method      |B|       Available       |P|  Measurement Aspects   |
   |             |i|     Loss Metrics      |r+------------+-----------+
   |             |t|  UNIDIR   |   BIDIR   |t|  Fidelity  |   Delay   |
   |             |s| Observer  | Observer  |o|            |           |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |T: Round Trip|$|    RT     |    x2     | | Rate by    |  ~6 RTT   |
   |   Loss Bit  |1|           |  Half RT  |*| sampling   +-----------+
   |             | |           |           | | 1/3 to 1/(3*ppa) of    |
   |             | |           |           | | pkts over 2 RTT        |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |Q: Square Bit|1| Upstream  |    x2     |*| Rate over  |  N pkts   |
   |             | |           |           | | N pkts     | (e.g. 64) |
   |             | |           |           | | (e.g. 64)  |           |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |L: Loss Event|1|    E2E    |    x2     |#| Loss shape | Min: RTT  |
   |   Bit       | |           |           | | (and rate) | Max: RTO  |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |QL: Square + |2| Upstream  |    x2     | | -> see Q   | Up: see Q |
   |   Loss Ev.  | | Downstream|    x2     |#| -> see Q|L | Others:   |
   |   Bits      | |    E2E    |    x2     | | -> see L   | see L     |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |QR: Square + |2| Upstream  |    x2     | | Rate over  | Up: see Q |
   |   Ref. Sq.  | |  3/4 RT   |    x2     | | N*ppa pkts | Others:   |
   |   Bits      | |   !E2E    |    E2E    |*| (see Q bit | N*ppa pk  |
   |             | |           | Downstream| | for N)     | (see Q    |
   |             | |           |  Half RT  | |            | for N)    |
   +-------------+-+-----------+-----------+-+------------+-----------+

      *    All protocols
      #    Protocols employing loss detection (with or without pure ACK
           loss detection)
      $    Requires a working spin bit
      !    Metric relative to the opposite channel
      x2   Same metric for both directions
      ppa  Packets-Per-Ack
      Q|L  See Q if upstream loss is significant; L otherwise

                       Figure 2: Loss Comparison

6.  ECN-Echo Event Bit

   While the primary focus of this draft is on exposing packet loss and
   delay, modern networks can report congestion before they are forced
   to drop packets, as described in [ECN].  When transport protocols
   keep ECN-Echo feedback under encryption, this signal cannot be
   observed by network operators.  When tasked with diagnosing network
   performance problems, knowledge of congestion downstream of an
   observation point can be instrumental.

   If downstream congestion information is desired, this information can
   be signaled with an additional bit.

   -  E: The "ECN-Echo Event" bit is set to 0 or 1 according to the
      Unreported ECN-Echo counter, as explained below in Section 6.1.

6.1.  Setting the ECN-Echo Event Bit on Outgoing Packets

   The Unreported ECN-Echo counter operates identically to the
   Unreported Loss counter (Section 4.3), except that it counts packets
   delivered by the network with CE markings, according to the ECN-Echo
   feedback from the receiver.

   This ECN-Echo signaling is similar to ECN signaling in [ConEx].  The
   ECN-Echo mechanism in QUIC provides the number of packets received
   with CE marks.  For protocols like TCP, the method described in
   [ConEx-TCP] can be employed.  As stated in [ConEx-TCP], such feedback
   can be further improved using a method described in [ACCURATE].

6.2.  Using E Bit for Passive ECN-Reported Congestion Measurement

   A network observer can count packets with the CE codepoint and
   determine the upstream CE-marking rate directly.

   Observation points can also estimate ECN-reported end-to-end
   congestion by counting packets in this direction with an E bit equal
   to 1.

   The upstream CE-marking rate and the end-to-end ECN-reported
   congestion can provide information about the downstream CE-marking
   rate.
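   The sender and observer sides of the E bit described in Sections 6.1
   and 6.2 can be sketched in Python as follows.  This is a minimal,
   non-normative illustration under stated assumptions.  On the sender
   side, it assumes that, like the Unreported Loss counter, the
   Unreported ECN-Echo counter is increased by each newly reported
   CE-marked packet and that every outgoing packet carries E = 1 (and
   decrements the counter) while the counter is positive.  On the
   observer side, it assumes that upstream and downstream CE marking are
   independent and that all packets are ECN-capable, so that a packet
   arrives unmarked only if it escapes marking on both segments; for
   small rates the result is close to the end-to-end rate minus the
   upstream rate.  Because E bits are sent roughly one round trip after
   the corresponding CE marks, comparing counts over the same window is
   only approximate.  The names UnreportedEcnCounter and
   downstream_ce_rate are hypothetical.

```python
class UnreportedEcnCounter:
    """Sender-side E bit state (hypothetical helper, illustration only)."""

    def __init__(self) -> None:
        # CE marks learned from receiver feedback but not yet signaled.
        self.unreported = 0

    def on_ecn_feedback(self, newly_reported_ce: int) -> None:
        # For example, the increase of the cumulative ECN-CE count carried
        # in QUIC ACK frames since the previous ACK was processed.
        if newly_reported_ce < 0:
            raise ValueError("the reported CE count cannot decrease")
        self.unreported += newly_reported_ce

    def next_e_bit(self) -> int:
        # Called once per outgoing packet: one E = 1 per unreported CE mark.
        if self.unreported > 0:
            self.unreported -= 1
            return 1
        return 0


def downstream_ce_rate(upstream_ce: int, e_bits: int, packets: int) -> float:
    """Estimate the CE-marking rate downstream of an on-path observer.

    upstream_ce: packets seen carrying the CE codepoint (marked upstream)
    e_bits:      packets seen with E = 1 (end-to-end CE marks reported)
    packets:     all packets seen in the same direction and time window
    """
    if packets <= 0:
        raise ValueError("no packets observed")
    upstream = upstream_ce / packets
    end_to_end = e_bits / packets
    if upstream >= 1.0:
        raise ValueError("downstream rate is undefined: all packets CE upstream")
    # Independence assumption: (1 - e2e) = (1 - upstream) * (1 - downstream).
    downstream = 1.0 - (1.0 - end_to_end) / (1.0 - upstream)
    # Counting noise (e.g. the E bit lag) can push the raw estimate below 0.
    return max(0.0, downstream)
```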

   The presence of E bits along with L bits, however, can somewhat
   confound precise estimates of upstream and downstream CE-markings in
   case the flow contains packets that are not ECN-capable.

7.  Protocol Ossification Considerations

   Accurate loss and delay information is not critical to the operation
   of any protocol, though its presence for a sufficient number of flows
   is important for the operation of networks.

   The delay and loss bits are amenable to "greasing" as described in
   [RFC8701], if the protocol designers are not ready to dedicate (and
   ossify) bits used for loss reporting to this function.  The greasing
   could be accomplished similarly to the Latency Spin Bit greasing in
   [QUIC-TRANSPORT].  Namely, implementations could decide that a
   fraction of flows should not encode loss and delay information and,
   instead, the bits would be set to arbitrary values.  The observers
   would need to be ready to ignore flows whose delay and loss
   information resembles noise more than the expected signal.

8.  Examples of Application

8.1.  QUIC

   The binding of a delay signal to QUIC is partially described in
   [QUIC-TRANSPORT], which adds the spin bit to the first byte of the
   short packet header, leaving two reserved bits for future
   experiments.

   To implement the additional signals discussed in this document, the
   first byte of the short packet header can be modified as follows:

   -  the delay bit (D) can be placed in the first reserved bit (i.e.,
      the fourth most significant bit _0x10_), and the round trip loss
      bit (T) in the second reserved bit (i.e.,
      the fifth most significant bit _0x08_); the proposed scheme is:

                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |0|1|S|D|T|K|P|P|
                        +-+-+-+-+-+-+-+-+

                             Scheme 1

   -  alternatively, a two-bit loss signal (QL or QR) can be placed in
      both reserved bits; the proposed schemes, in this case, are:

                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |0|1|S|Q|L|K|P|P|
                        +-+-+-+-+-+-+-+-+

                            Scheme 2A

                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |0|1|S|Q|R|K|P|P|
                        +-+-+-+-+-+-+-+-+

                            Scheme 2B

   A further option would be to substitute the spin bit with the delay
   bit (or hidden delay bit), leaving the two reserved bits for loss
   detection.  The proposed schemes are:

        0 1 2 3 4 5 6 7                0 1 2  3 4 5 6 7
       +-+-+-+-+-+-+-+-+              +-+-+--+-+-+-+-+-+
       |0|1|D|Q|L|K|P|P|      OR      |0|1|D^|Q|L|K|P|P|
       +-+-+-+-+-+-+-+-+              +-+-+--+-+-+-+-+-+

                            Scheme 3A

        0 1 2 3 4 5 6 7                0 1 2  3 4 5 6 7
       +-+-+-+-+-+-+-+-+              +-+-+--+-+-+-+-+-+
       |0|1|D|Q|R|K|P|P|      OR      |0|1|D^|Q|R|K|P|P|
       +-+-+-+-+-+-+-+-+              +-+-+--+-+-+-+-+-+

                            Scheme 3B

8.2.  TCP

   The signals can be added to TCP by defining bit 4 of byte 13 of the
   TCP header to carry the spin bit or the delay bit, and possibly bits
   5 and 6 to carry additional information, like the delay bit and the
   round-trip loss bit (DT), or a two-bit loss signal (QL or QR).

9.  Security Considerations

   Passive loss and delay observations have been a part of network
   operations for a long time, so exposing loss and delay information to
   the network does not add new security concerns for protocols that are
   currently observable.

   In the absence of packet loss, the Q and R bit signals do not provide
   any information that cannot be observed by simply counting packets
   transiting a network path.  In the presence of packet loss, the Q and
   R bits will disclose the loss, but this is information about the
   environment and not the endpoint state.
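   The point that the Q bit reveals only what packet counting reveals
   can be illustrated with the following Python sketch of an on-path
   observer.  It counts the packets in each completed Q Block and
   reports the shortfall with respect to the block size as upstream
   loss.  It is a simplified illustration, not the observer algorithm
   of this draft: it assumes a sender that inverts the Q bit every
   block_size packets (64 here, the example value of N in Figure 2) and
   no reordering or duplication between sender and observer, and it
   discards the first block because the observer may have started
   counting mid-block.  The function name q_block_losses is
   hypothetical.

```python
from typing import Iterable, List, Optional


def q_block_losses(q_bits: Iterable[int], block_size: int = 64) -> List[int]:
    """Return the estimated upstream loss of each completed Q Block.

    Assumes (hypothetically) a sender that inverts the Q bit every
    block_size packets and no reordering between sender and observer.
    """
    losses: List[int] = []
    current: Optional[int] = None  # Q value of the block being counted
    count = 0                      # packets seen so far in that block
    first_block = True             # the observer may have joined mid-block
    for q in q_bits:
        if q == current:
            count += 1
            continue
        # The Q bit flipped, so the previous block is complete.
        if current is not None:
            if first_block:
                first_block = False  # partial block: true size unknown
            else:
                losses.append(block_size - count)
        current = q
        count = 1
    # The last block is still open, so it is not reported.
    return losses
```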

   The L bit signal discloses the internal state of the protocol's loss
   detection machinery, but this state can often be gleaned by timing
   packets and observing the congestion controller's response.

   Hence, loss bits do not provide a viable new mechanism to attack data
   integrity and secrecy.

9.1.  Optimistic ACK Attack

   A defense against an Optimistic ACK Attack, described in
   [QUIC-TRANSPORT], involves a sender randomly skipping packet numbers
   to detect a receiver acknowledging packet numbers that have never
   been received.  The Q bit signal may inform the attacker which packet
   numbers were skipped on purpose and which had been actually lost (and
   are, therefore, safe for the attacker to acknowledge).  To use the Q
   bit for this purpose, the attacker must first receive at least an
   entire Q Block of packets, which renders the attack ineffective
   against a delay-sensitive congestion controller.

   A protocol that uses a loss-based congestion controller and is more
   susceptible to an Optimistic ACK Attack when the Q bit provides the
   loss signal should shorten the current Q Block by the number of
   skipped packet numbers.  For example, skipping a single packet number
   will invert the square signal one outgoing packet sooner.

   Similar considerations apply to the R bit, although a shortened R
   Block along with a matching skip in packet numbers does not
   necessarily imply a lost packet, since it could be due to a lost
   packet on the reverse path along with a deliberately skipped packet
   by the sender.

10.  Privacy Considerations

   To minimize unintentional exposure of information, loss bits provide
   an explicit loss signal, which is the preferred way to share
   information per [RFC8558].

   New protocols commonly have specific privacy goals, and loss
   reporting must ensure that loss information does not compromise those
   privacy goals.
   For example, [QUIC-TRANSPORT] allows changing Connection IDs in the
   middle of a connection to reduce the likelihood of a passive observer
   linking old and new sub-flows to the same device.  A QUIC
   implementation would need to reset all counters when it changes the
   destination (IP address or UDP port) or the Connection ID used for
   outgoing packets.  It would also need to avoid incrementing the
   Unreported Loss counter for loss of packets sent to a different
   destination or with a different Connection ID.

11.  IANA Considerations

   This document makes no request of IANA.

12.  Change Log

   TBD

13.  Contributors

   The following people provided valuable contributions to this
   document:

   -  Marcus Ihlar, Ericsson, marcus.ihlar@ericsson.com

   -  Jari Arkko, Ericsson, jari.arkko@ericsson.com

   -  Emile Stephan, Orange, emile.stephan@orange.com

14.  Acknowledgements

   TBD

15.  References

15.1.  Normative References

   [ConEx]    Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
              Concepts, Abstract Mechanism, and Requirements", RFC 7713,
              DOI 10.17487/RFC7713, December 2015.

   [ConEx-TCP]
              Kuehlewind, M., Ed. and R. Scheffenegger, "TCP
              Modifications for Congestion Exposure (ConEx)", RFC 7786,
              DOI 10.17487/RFC7786, May 2016.

   [ECN]      Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [IP]       Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [IPM-Methods]
              Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [IPv6]     Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8558]  Hardie, T., Ed., "Transport Protocol Path Signals",
              RFC 8558, DOI 10.17487/RFC8558, April 2019.

   [TCP]      Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

15.2.  Informative References

   [ACCURATE]
              Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
              Accurate ECN Feedback in TCP",
              draft-ietf-tcpm-accurate-ecn-15 (work in progress), July
              2021.

   [AltMark]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

   [ANRW19-PM-QUIC]
              Bulgarella, F., Cociglio, M., Fioccola, G., Marchetto, G.,
              and R. Sisto, "Performance measurements of QUIC
              communications", Proceedings of the Applied Networking
              Research Workshop, DOI 10.1145/3340301.3341127, July 2019.

   [I-D.trammell-ippm-spin]
              Trammell, B., "An Explicit Transport-Layer Signal for
              Hybrid RTT Measurement", draft-trammell-ippm-spin-00 (work
              in progress), January 2019.

   [I-D.trammell-tsvwg-spin]
              Trammell, B., "A Transport-Independent Explicit Signal for
              Hybrid RTT Measurement", draft-trammell-tsvwg-spin-00
              (work in progress), July 2018.

   [IPv6AltMark]
              Fioccola, G., Zhou, T., Cociglio, M., Qin, F., and R.
              Pang, "IPv6 Application of the Alternate Marking Method",
              draft-ietf-6man-ipv6-alt-mark-12 (work in progress),
              October 2021.

   [QUIC-TRANSPORT]
              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021.

   [RFC8517]  Dolson, D., Ed., Snellman, J., Boucadair, M., Ed., and C.
              Jacquenet, "An Inventory of Transport-Centric Functions
              Provided by Middleboxes: An Operator Perspective",
              RFC 8517, DOI 10.17487/RFC8517, February 2019.

   [RFC8701]  Benjamin, D., "Applying Generate Random Extensions And
              Sustain Extensibility (GREASE) to TLS Extensibility",
              RFC 8701, DOI 10.17487/RFC8701, January 2020.

   [RFC9065]  Fairhurst, G. and C. Perkins, "Considerations around
              Transport Header Confidentiality, Network Operations, and
              the Evolution of Internet Transport Protocols", RFC 9065,
              DOI 10.17487/RFC9065, July 2021.

   [SPIN-BIT]
              Trammell, B., Vaere, P. D., Even, R., Fioccola, G.,
              Fossati, T., Ihlar, M., Morton, A., and E. Stephan,
              "Adding Explicit Passive Measurability of Two-Way Latency
              to the QUIC Transport Protocol",
              draft-trammell-quic-spin-03 (work in progress), May 2018.

   [UDP-OPTIONS]
              Touch, J., "Transport Options for UDP",
              draft-ietf-tsvwg-udp-options-13 (work in progress), June
              2021.

   [UDP-SURPLUS]
              Herbert, T., "UDP Surplus Header",
              draft-herbert-udp-space-hdr-01 (work in progress), July
              2019.

Authors' Addresses

   Mauro Cociglio
   Telecom Italia - TIM
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   EMail: mauro.cociglio@telecomitalia.it


   Alexandre Ferrieux
   Orange Labs

   EMail: alexandre.ferrieux@orange.com


   Giuseppe Fioccola
   Huawei Technologies
   Riesstrasse, 25
   Munich  80992
   Germany

   EMail: giuseppe.fioccola@huawei.com


   Igor Lubashev
   Akamai Technologies

   EMail: ilubashe@akamai.com


   Fabio Bulgarella
   Telecom Italia - TIM
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   EMail: fabio.bulgarella@guest.telecomitalia.it


   Isabelle Hamchaoui
   Orange Labs

   EMail: isabelle.hamchaoui@orange.com


   Massimo Nilo
   Telecom Italia - TIM
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   EMail: massimo.nilo@telecomitalia.it


   Riccardo Sisto
   Politecnico di Torino

   EMail: riccardo.sisto@polito.it


   Dmitri Tikhonov
   LiteSpeed Technologies

   EMail: dtikhonov@litespeedtech.com