IPPM                                                         M. Cociglio
Internet-Draft                                            Telecom Italia
Intended status: Informational                               A. Ferrieux
Expires: August 26, 2021                                     Orange Labs
                                                             G. Fioccola
                                                     Huawei Technologies
                                                             I. Lubashev
                                                     Akamai Technologies
                                                           F. Bulgarella
                                                          Telecom Italia
                                                            I. Hamchaoui
                                                             Orange Labs
                                                                 M. Nilo
                                                          Telecom Italia
                                                                R. Sisto
                                                   Politecnico di Torino
                                                             D. Tikhonov
                                                  LiteSpeed Technologies
                                                       February 22, 2021

                 Explicit Flow Measurements Techniques
              draft-mdt-ippm-explicit-flow-measurements-01

Abstract

   This document describes protocol-independent methods, called Explicit
   Flow Measurement Techniques, that use a few marking bits in the
   header of each packet for loss and delay measurement.  The endpoints
   mark the traffic and thereby signal these metrics to intermediate
   observers, allowing them to measure connection performance and to
   locate the network segment where impairments occur.  Several
   alternatives are considered in this document.  These signaling
   methods apply to all protocols, but they are especially valuable for
   protocols that encrypt transport headers and therefore do not allow
   traditional methods of delay and loss detection.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 26, 2021.
Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  Latency Bits
     3.1.  Spin Bit
     3.2.  Delay Bit
       3.2.1.  Generation Phase
       3.2.2.  Reflection Phase
       3.2.3.  T_Max Selection
       3.2.4.  Delay Measurement using Delay Bit
       3.2.5.  Observer's Algorithm
       3.2.6.  Two Bits Delay Measurement: Spin Bit + Delay Bit
   4.  Loss Bits
     4.1.  T Bit - Round Trip Loss Bit
       4.1.1.  Round Trip Packet Loss Measurement
       4.1.2.  Setting the Round Trip Loss Bit on Outgoing Packets
       4.1.3.  Observer's Logic for Round Trip Loss Signal
       4.1.4.  Loss Coverage and Signal Timing
     4.2.  Q Bit - Square Bit
       4.2.1.  Q Block Length Selection
       4.2.2.  Upstream Loss
       4.2.3.  Identifying Q Block Boundaries
     4.3.  L Bit - Loss Event Bit
       4.3.1.  End-To-End Loss
       4.3.2.  Loss Profile Characterization
     4.4.  L+Q Bits - Upstream, Downstream, and End-to-End Loss
           Measurements
       4.4.1.  Correlating End-to-End and Upstream Loss
     4.5.  R Bit - Reflection Square Bit
       4.5.1.  R+Q Bits - Using R and Q Bits for Passive Loss
               Measurement
       4.5.2.  Enhancement of R Block Length Computation
       4.5.3.  Improved Resilience to Packet Reordering
   5.  Summary of Delay and Loss Marking Methods
   6.  ECN-Echo Event Bit
     6.1.  Setting the ECN-Echo Event Bit on Outgoing Packets
     6.2.  Using E Bit for Passive ECN-Reported Congestion Measurement
   7.  Protocol Ossification Considerations
   8.  Examples of Application
     8.1.  QUIC
     8.2.  TCP
   9.  Security Considerations
     9.1.  Optimistic ACK Attack
   10. Privacy Considerations
   11. IANA Considerations
   12. Change Log
   13. Contributors
   14. Acknowledgements
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Authors' Addresses

1.  Introduction

   Packet loss and delay are hard and pervasive problems of day-to-day
   network operation.  Proactively detecting, measuring, and locating
   them is crucial to maintaining high QoS and timely resolution of
   crippling end-to-end throughput issues.  To this effect, in a
   TCP-dominated world, network operators have been heavily relying on
   information present in the clear in TCP headers: sequence and
   acknowledgment numbers and SACKs when enabled (see [RFC8517]).  These
   allow for quantitative estimation of packet loss and delay by passive
   on-path observation.  Additionally, the problem can be quickly
   identified in the network path by moving the passive observer around.

   With encrypted protocols, the equivalent transport headers are
   encrypted and passive packet loss and delay observations are not
   possible, as described in [TRANSPORT-ENCRYPT].

   Measuring TCP loss and delay between similar endpoints cannot be
   relied upon to evaluate encrypted protocol loss and delay.  Different
   protocols could be routed by the network differently, and the
   fraction of Internet traffic delivered using protocols other than TCP
   is increasing every year.  It is imperative to measure packet loss
   and delay experienced by encrypted protocol users directly.

   This document defines Explicit Flow Measurement Techniques.  These
   hybrid measurement path signals (see [IPM-Methods]) are to be
   embedded into a transport layer protocol and are explicitly intended
   for exposing RTT and loss rate information to on-path measurement
   devices.
   These measurement mechanisms are applicable to any
   transport-layer protocol, and, as an example, the document describes
   QUIC and TCP bindings.

   The Explicit Flow Measurement Techniques described in this document
   can be used alone or in combination with other Explicit Flow
   Measurement Techniques.  Each technique uses a small number of bits
   and exposes a specific measurement.

   Following the recommendation in [RFC8558] of making path signals
   explicit, this document proposes adding a small number of dedicated
   measurement bits to the clear portion of the protocol headers.  These
   bits can be added to an unencrypted portion of a header belonging to
   any protocol layer, e.g. IP (see [IP]) and IPv6 (see [IPv6]) headers
   or extensions, such as [IPv6AltMark], UDP surplus space (see
   [UDP-OPTIONS] and [UDP-SURPLUS]), or reserved bits in a QUIC v1
   header (see [QUIC-TRANSPORT]).

   The measurements are not designed for use in automated control of the
   network in environments where signal bits are set by untrusted hosts.
   Instead, the signal is to be used for troubleshooting individual
   flows as well as for monitoring the network by aggregating
   information from multiple flows and raising operator alarms if
   aggregate statistics indicate a potential problem.

   The spin bit, delay bit and loss bits explained in this document are
   inspired by [AltMark], [SPIN-BIT], [I-D.trammell-tsvwg-spin] and
   [I-D.trammell-ippm-spin].

   Additional details about the Performance Measurements for QUIC are
   described in the paper [ANRW19-PM-QUIC].

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Latency Bits

   This section introduces bits that can be used for round trip latency
   measurements.
   Whenever this section of the specification refers to
   packets, it is referring only to packets with protocol headers that
   include the latency bits.

   [QUIC-TRANSPORT] introduces an explicit per-flow transport-layer
   signal for hybrid measurement of RTT.  This signal consists of a spin
   bit that toggles once per RTT.  [SPIN-BIT] discusses an additional
   two-bit Valid Edge Counter (VEC) to compensate for loss and
   reordering of the spin bit and increase fidelity of the signal in
   less than ideal network conditions.

   This document introduces a stand-alone single-bit delay signal that
   can be used by passive observers to measure the RTT of a network
   flow, avoiding the spin bit ambiguities that arise as soon as network
   conditions deteriorate.

3.1.  Spin Bit

   This section is a small recap of the spin bit working mechanism.  For
   a comprehensive explanation of the algorithm, please see [SPIN-BIT].

   The spin bit is an alternate marking [AltMark] generated signal,
   where the size of the alternation changes with the flight size each
   RTT.

   The latency spin bit is a single bit signal that toggles once per
   RTT, enabling latency monitoring of a connection-oriented
   communication from intermediate observation points.

   A "spin period" is a set of packets with the same spin bit value sent
   during one RTT time interval.  A "spin period value" is the value of
   the spin bit shared by all packets in a spin period.

   The client and server maintain an internal per-connection spin value
   (i.e. 0 or 1) used to set the spin bit on outgoing packets.  Both
   endpoints initialize the spin value to 0 when a new connection
   starts.
   Then:

   -  when the client receives a packet with a packet number larger
      than any number seen so far, it sets the connection spin value to
      the opposite of the value contained in the received packet;

   -  when the server receives a packet with a packet number larger
      than any number seen so far, it sets the connection spin value to
      the same value contained in the received packet.

   The computed spin value is used by the endpoints for setting the spin
   bit on outgoing packets.  This mechanism allows the endpoints to
   generate a square wave such that, by measuring the distance in time
   between pairs of consecutive edges observed in the same direction, a
   passive on-path observer can compute the round trip delay of that
   network flow.

   The spin bit enables round trip latency measurement by observing a
   single direction of the traffic flow.

   Note that packet reordering can cause spurious edges that require
   heuristics to correct.  The spin bit performance deteriorates as soon
   as network impairments arise, as explained in Section 3.2.

3.2.  Delay Bit

   The delay bit has been designed to overcome accuracy limitations
   experienced by the spin bit under difficult network conditions:

   -  packet reordering leads to generation of spurious edges and errors
      in delay estimation;

   -  loss of edges causes wrong estimation of spin periods and
      therefore wrong RTT measurements;

   -  application-limited senders cause the spin bit to measure the
      application delays instead of network delays.

   Unlike the spin bit, which is set in every packet transmitted on the
   network, the delay bit is set only once per round trip.

   When the delay bit is used, a single packet with a marked bit (the
   delay bit) bounces between a client and a server during the entire
   connection lifetime.  This single packet is called the "delay
   sample".
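   To make the contrast with the delay bit concrete, the spin bit
   endpoint rules recapped in Section 3.1 can be sketched as below.
   This is a minimal illustration under stated assumptions, not part of
   any specification: the class name "SpinEndpoint" and its methods are
   hypothetical, and packet numbers are assumed to be plain integers.

```python
class SpinEndpoint:
    """Per-connection spin state for one endpoint (hypothetical helper)."""

    def __init__(self, is_client: bool) -> None:
        self.is_client = is_client
        self.spin = 0          # both endpoints initialize the spin value to 0
        self.largest_pn = -1   # largest packet number seen so far

    def on_receive(self, packet_number: int, spin_bit: int) -> None:
        # Only a packet with a number larger than any seen so far
        # updates the spin value; older (reordered) packets are ignored.
        if packet_number <= self.largest_pn:
            return
        self.largest_pn = packet_number
        if self.is_client:
            self.spin = spin_bit ^ 1   # client: opposite of received value
        else:
            self.spin = spin_bit       # server: same as received value

    def outgoing_spin_bit(self) -> int:
        # Every outgoing packet carries the current spin value.
        return self.spin
```

   Every outgoing packet carries the spin value in this sketch, whereas
   with the delay bit only the single delay sample is marked per round
   trip.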
   An observer placed at an intermediate point, observing a single
   direction of traffic and tracking the delay sample and its relative
   timestamp, can measure the round trip delay of the connection.

   The delay sample lifetime comprises two phases: generation and
   reflection.  Generation creates the delay sample, while reflection
   realizes the bounce behavior of this single packet between the two
   endpoints.

   The next figure describes the elementary Delay bit mechanism.

      +--------+   -   -   -   -   -   +--------+
      |        |      ----------->     |        |
      | Client |                       | Server |
      |        |      <-----------     |        |
      +--------+   -   -   -   -   -   +--------+

             (a) No traffic at beginning.

      +--------+   0   0   1   -   -   +--------+
      |        |      ----------->     |        |
      | Client |                       | Server |
      |        |      <-----------     |        |
      +--------+   -   -   -   -   -   +--------+

             (b) The Client starts sending data and
                 sets the first packet as Delay Sample.

      +--------+   0   0   0   0   0   +--------+
      |        |      ----------->     |        |
      | Client |                       | Server |
      |        |      <-----------     |        |
      +--------+   -   -   -   1   0   +--------+

             (c) The Server starts sending data
                 and reflects the Delay Sample.

      +--------+   0   1   0   0   0   +--------+
      |        |      ----------->     |        |
      | Client |                       | Server |
      |        |      <-----------     |        |
      +--------+   0   0   0   0   0   +--------+

             (d) The Client reflects the Delay Sample.

      +--------+   0   0   0   0   0   +--------+
      |        |      ----------->     |        |
      | Client |                       | Server |
      |        |      <-----------     |        |
      +--------+   0   0   0   1   0   +--------+

             (e) The Server reflects the Delay Sample
                 and so on.

                          Delay bit mechanism

3.2.1.  Generation Phase

   Only the client is actively involved in the generation phase.  It
   maintains an internal per-flow timestamp variable ("ds_time") updated
   every time a delay sample is transmitted.

   When the connection starts, the client generates a new delay sample
   by initializing the delay bit of the first outgoing packet to 1.
   Then
   it updates the "ds_time" variable with the timestamp of its
   transmission.

   The server initializes the delay bit to 0 at the beginning of the
   connection, and its only task during the connection is described in
   Section 3.2.2.

   In the absence of network impairments, the delay sample should bounce
   between client and server continuously, for the entire duration of
   the connection.  In practice, that is highly unlikely for two
   reasons:

   1.  the packet carrying the delay bit might be lost;

   2.  an endpoint could stop or delay sending packets because the
       application is limiting the amount of traffic transmitted.

   To deal with these problems, the client generates a new delay sample
   if more than a predetermined time ("T_Max") has elapsed since the
   last delay sample transmission (including reflections).  Note that
   "T_Max" should be greater than the max measurable RTT on the network.
   See Section 3.2.3 for details.

3.2.2.  Reflection Phase

   Reflection is the process that enables the bouncing of the delay
   sample between a client and a server.  The behavior of the two
   endpoints is almost the same.

   -  Server side reflection: when a delay sample arrives, the server
      marks the first packet in the opposite direction as the delay
      sample.

   -  Client side reflection: when a delay sample arrives, the client
      marks the first packet in the opposite direction as the delay
      sample.  It also updates the "ds_time" variable when the outgoing
      delay sample is actually forwarded.

   In both cases, if the outgoing delay sample would be transmitted with
   a delay greater than a predetermined threshold after the reception of
   the incoming delay sample (1ms by default), the delay sample is not
   reflected, and the outgoing delay bit is kept at 0.

   By doing so, the algorithm can reject measurements that would
   overestimate the delay due to lack of traffic on the endpoints.
   Hence, the maximum estimation error would amount to twice the
   threshold (e.g. 2ms) per measurement.

3.2.3.  T_Max Selection

   The internal "ds_time" variable allows a client to identify delay
   sample losses.  Since a lost delay sample is regenerated once an
   explicit time ("T_Max") has elapsed since the last generation, an
   observer can use this same value to reject a measurement and start a
   new one.

   In other words, if the difference in time between two delay samples
   is greater than or equal to "T_Max", then these cannot be used to
   produce a delay measurement.  Therefore the value of "T_Max" must
   also be known to the on-path network probes.

   There are two alternatives to select the "T_Max" value so that both
   client and observers know it.  The first one requires that "T_Max" is
   known a priori ("T_Max_p") and therefore set within the protocol
   specification that implements the marking mechanism (e.g. 1 second,
   which usually is greater than the max expectable RTT).  The second
   alternative requires a dynamic mechanism able to adapt the duration
   of "T_Max" to the delay of the connection ("T_Max_c").

   For instance, client and observers could use the connection RTT as a
   basis for calculating an effective "T_Max".  They should use a
   predetermined initial value so that "T_Max = T_Max_p" (e.g. 1 second)
   and then, when a valid RTT is measured, change "T_Max" accordingly so
   that "T_Max = T_Max_c".  In any case, the selected "T_Max" should be
   large enough to absorb any possible variations in the connection
   delay.

   "T_Max_c" could be computed as two times the measured "RTT" plus a
   fixed amount of time ("100ms") to prevent low "T_Max" values in case
   of very small RTTs.  The resulting formula is: "T_Max_c = 2RTT +
   100ms".  If "T_Max_c" is greater than "T_Max_p" then "T_Max_c" is
   forced to the "T_Max_p" value.
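   The selection rule above can be sketched as follows.  This is only
   an illustration: the function name "select_t_max" and the constant
   names are hypothetical, times are in seconds, and the 1 second and
   100ms values are the examples given in the text rather than mandated
   defaults.

```python
from typing import Optional

T_MAX_P = 1.0    # a priori value known to client and observers (example: 1 s)
MARGIN = 0.100   # fixed term that prevents low T_Max values for small RTTs


def select_t_max(measured_rtt: Optional[float],
                 t_max_p: float = T_MAX_P) -> float:
    """Return the T_Max to use, given the last valid RTT (or None)."""
    if measured_rtt is None:
        # No valid RTT measured yet: T_Max = T_Max_p.
        return t_max_p
    # T_Max_c = 2RTT + 100ms, forced to T_Max_p if it would exceed it.
    t_max_c = 2.0 * measured_rtt + MARGIN
    return min(t_max_c, t_max_p)
```

   For example, a measured RTT of 50ms yields a "T_Max" of 200ms, while
   an RTT of 600ms would give 1.3 seconds and is therefore forced to
   the 1 second "T_Max_p".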
   Note that the observer's "T_Max" should always be less than or equal
   to the client's "T_Max", to avoid mistaking the client's "T_Max"
   regeneration interval for a valid measurement.  To obtain this
   result, the client waits for two consecutive incoming samples and
   computes the two related RTTs.  Then it takes the largest of them as
   the basis of the "T_Max_c" formula.  At this point, observers have
   already measured a valid RTT and then computed their "T_Max_c".

3.2.4.  Delay Measurement using Delay Bit

   When the Delay Bit is used, a passive observer can use delay samples
   directly and avoid the ambiguities in RTT calculation that are
   inherent in spin bit analysis.

3.2.4.1.  RTT Measurement

   The delay sample generation process ensures that only one packet
   marked with the delay bit set to 1 runs back and forth between two
   endpoints per round trip time.  To determine the RTT measurement of a
   flow, an on-path passive observer computes the time difference
   between two delay samples observed in a single direction.

   To ensure a valid measurement, the observer must verify that the
   distance in time between the two samples taken into account is less
   than "T_Max".

           =======================|======================>
           = **********     -----Obs---->     ********** =
           = * Client *                       * Server * =
           = **********     <------------     ********** =
           <==============================================

                        (a) client-server RTT

           ==============================================>
           = **********     ------------>     ********** =
           = * Client *                       * Server * =
           = **********     <----Obs-----     ********** =
           <======================|=======================

                        (b) server-client RTT

                   Round-trip time (both directions)

3.2.4.2.  Half-RTT Measurement

   An observer that is able to observe both forward and return traffic
   directions can use the delay samples to measure "upstream" and
   "downstream" RTT components, also known as the half-RTT measurements.
   It does this by measuring the time between a delay sample observed in
   one direction and the delay sample previously observed in the
   opposite direction.

   As with RTT measurement, the observer must verify that the distance
   in time between the two samples taken into account is less than
   "T_Max".

   Note that upstream and downstream sections of paths between the
   endpoints and the observer, i.e. observer-to-client vs
   client-to-observer and observer-to-server vs server-to-observer, may
   have different delay characteristics due to the difference in network
   congestion and other factors.

           =======================>
           = **********     ------|----->     **********
           = * Client *          Obs          * Server *
           = **********     <-----|------     **********
           <=======================

                     (a) client-observer half-RTT

                                  =======================>
             **********     ------|----->     ********** =
             * Client *          Obs          * Server * =
             **********     <-----|------     ********** =
                                  <=======================

                     (b) observer-server half-RTT

                 Half Round-trip time (both directions)

3.2.4.3.  Intra-Domain RTT Measurement

   Intra-domain RTT is the portion of the entire RTT used by a flow to
   traverse the network of a provider.  To measure intra-domain RTT, two
   observers capable of observing traffic in both directions must be
   employed simultaneously at ingress and egress of the network to be
   measured.  Intra-domain RTT is the difference between the two
   computed upstream (or downstream) RTT components.
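   The half-RTT and intra-domain computations just described can be
   sketched as follows.  This is a minimal illustration, not a
   definitive observer implementation: the function names and
   parameters are hypothetical, timestamps are in seconds, and each
   observer is assumed to have already paired a delay sample with the
   one previously seen in the opposite direction.

```python
from typing import Optional


def half_rtt(t_previous_opposite: float, t_current: float,
             t_max: float) -> Optional[float]:
    """Time between a delay sample and the one previously observed in
    the opposite direction, or None if the pair must be rejected."""
    delta = t_current - t_previous_opposite
    # The distance in time between the two samples must be below T_Max.
    if delta < 0 or delta >= t_max:
        return None
    return delta


def intra_domain_rtt(component_near: float, component_far: float) -> float:
    """Difference between the same RTT component (both upstream or both
    downstream) as computed by the two observers.  'component_far' is
    assumed to come from the observer farther from the endpoint."""
    return component_far - component_near
```

   For instance, if the observer at the ingress computes a
   client-observer component of 12ms and the observer at the egress
   computes 30ms for the same component, the sketch returns an
   intra-domain RTT of 18ms.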
      =========================================>
      = =====================>
      = = **********     ---|-->     ---|-->     **********
      = = * Client *        Obs         Obs      * Server *
      = = **********     <--|---     <--|---     **********
      = <=====================
      <=========================================

            (a) client-observer RTT components (half-RTTs)

                         ==================>
          **********     ---|-->     ---|-->     **********
          * Client *        Obs         Obs      * Server *
          **********     <--|---     <--|---     **********
                         <==================

              (b) the intra-domain RTT resulting from the
                  subtraction of the above RTT components

        Intra-domain Round-trip time (client-observer: upstream)

3.2.5.  Observer's Algorithm

   An on-path observer maintains an internal per-flow variable to keep
   track of the time at which the last delay sample has been observed.

   A unidirectional observer, upon detecting a delay sample:

   -  if a delay sample was also detected previously in the same
      direction and the distance in time between them is less than
      "T_Max - K", then the two delay samples can be used to calculate
      an RTT measurement.  "K" is a protection threshold to absorb
      differences in "T_Max" computation and delay variations between
      two consecutive delay samples (e.g. "K = 10% T_Max").

   If the observer can observe both forward and return traffic flows,
   and it is able to determine which direction contains the client and
   the server (e.g. by observing the connection handshake), upon
   detecting a delay sample:

   -  if a delay sample was also detected in the opposite direction and
      the distance in time between them is less than "T_Max - K", then
      the two delay samples can be used to measure the observer-client
      half-RTT or the observer-server half-RTT, according to the
      direction of the last delay sample observed.

3.2.6.  Two Bits Delay Measurement: Spin Bit + Delay Bit

   Spin and Delay bit algorithms work independently.
   If both marking
   methods are used in the same connection, observers can choose the
   best measurement between the two available:

   -  when a precise measurement can be produced using the delay bit,
      observers choose it;

   -  when a delay bit measurement is not available, observers choose
      the approximate spin bit one.

4.  Loss Bits

   This section introduces bits that can be used for loss measurements.
   Whenever this section of the specification refers to packets, it is
   referring only to packets with protocol headers that include the loss
   bits - the only packets whose loss can be measured.

   -  T: the "round Trip loss" bit is used in combination with the Spin
      bit to measure round-trip loss.  See Section 4.1.

   -  Q: the "sQuare signal" bit is used to measure upstream loss.  See
      Section 4.2.

   -  L: the "Loss event" bit is used to measure end-to-end loss.  See
      Section 4.3.

   -  R: the "Reflection square signal" bit is used in combination with
      the Q bit to measure end-to-end loss.  See Section 4.5.

   Loss measurements enabled by T, Q, and L bits can be implemented by
   those loss bits alone (the T bit requires a working Spin Bit).
   Two-bit combinations Q+L and Q+R enable additional measurement
   opportunities discussed below.

   Each endpoint maintains appropriate counters independently and
   separately for each separately identifiable flow (each sub-flow for
   multipath connections).

   Since loss is reported independently for each flow, all bits (except
   for the L bit) require a certain minimum number of packets to be
   exchanged per flow before any signal can be measured.  Therefore,
   loss measurements work best for flows that transfer more than a
   minimal amount of data.

4.1.  T Bit - Round Trip Loss Bit

   The round Trip loss bit is used to mark a variable number of packets
   exchanged twice between the endpoints realizing a two round-trip
   reflection.
   A passive on-path observer, observing either direction,
   can count and compare the number of marked packets seen during the
   two reflections, estimating the loss rate experienced by the
   connection.  The overall exchange comprises:

   -  The client selects, generates and consequently transmits a first
      train of packets, by setting the T bit to 1;

   -  The server, upon receiving each packet included in the first
      train, reflects to the client a respective second train of packets
      of the same size as the first train received, by setting the T bit
      to 1;

   -  The client, upon receiving each packet included in the second
      train, reflects to the server a respective third train of packets
      of the same size as the second train received, by setting the T
      bit to 1;

   -  The server, upon receiving each packet included in the third
      train, finally reflects to the client a respective fourth train of
      packets of the same size as the third train received, by setting
      the T bit to 1.

   Packets belonging to the first round trip (first and second train)
   represent the Generation Phase, while those belonging to the second
   round trip (third and fourth train) represent the Reflection Phase.

   A passive on-path observer can count and compare the number of marked
   packets seen during the two round trips (i.e. the first and third or
   the second and the fourth trains of packets, depending on which
   direction is observed) and estimate the loss rate experienced by the
   connection.  This process is repeated continuously to obtain more
   measurements as long as the endpoints exchange traffic.  These
   measurements can be called Round Trip losses.

   Since packet rates in the two directions may be different, the number
   of marked packets in the train is determined by the direction with
   the lower packet rate.
   See Section 4.1.2 for details on packet
   generation and for a mechanism to allow an observer to distinguish
   between trains belonging to different phases (Generation and
   Reflection).

4.1.1.  Round Trip Packet Loss Measurement

   Since the measurements are performed on a portion of the traffic
   exchanged between the client and the server, the observer calculates
   the end-to-end Round Trip Packet Loss (RTPL) that, statistically,
   will correspond to the loss rate experienced by the connection along
   the entire network path.

           =======================|======================>
           = **********     -----Obs---->     ********** =
           = * Client *                       * Server * =
           = **********     <------------     ********** =
           <==============================================

                        (a) client-server RTPL

           ==============================================>
           = **********     ------------>     ********** =
           = * Client *                       * Server * =
           = **********     <----Obs-----     ********** =
           <======================|=======================

                        (b) server-client RTPL

               Round-trip packet loss (both directions)

   This methodology also allows the Half-RTPL measurement and the
   Intra-domain RTPL measurement in a way similar to RTT measurement.
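   As a rough illustration of the RTPL computation, the sketch below
   compares the number of marked packets an observer counted in a
   Generation train with the number counted in the matching Reflection
   train.  The function name and parameters are hypothetical, and
   expressing the result as a fraction of the Generation train size is
   an assumption of this sketch, not something this document mandates.

```python
from typing import Optional, Tuple


def round_trip_loss(generated: int,
                    reflected: int) -> Optional[Tuple[int, float]]:
    """Return (lost packets, loss fraction) for one measurement cycle,
    or None when no marked packet was counted in the Generation train."""
    if generated <= 0:
        return None
    # RTPL: difference between the Generation and Reflection train sizes.
    lost = generated - reflected
    # Assumption: normalize by the Generation train size to get a rate.
    return lost, lost / generated
```

   With 5 marked packets counted in the Generation train and 4 in the
   Reflection train, the sketch reports 1 lost packet, i.e. a 20%
   round trip loss for that cycle.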
           =======================>
           = **********     ------|----->     **********
           = * Client *          Obs          * Server *
           = **********     <-----|------     **********
           <=======================

                     (a) client-observer half-RTPL

                                  =======================>
             **********     ------|----->     ********** =
             * Client *          Obs          * Server * =
             **********     <-----|------     ********** =
                                  <=======================

                     (b) observer-server half-RTPL

             Half Round-trip packet loss (both directions)

              =========================================>
                                  =====================> =
          **********     ---|-->     ---|-->     ********** = =
          * Client *        Obs         Obs      * Server * = =
          **********     <--|---     <--|---     ********** = =
                                  <===================== =
              <=========================================

           (a) observer-server RTPL components (half-RTPLs)

                         ==================>
          **********     ---|-->     ---|-->     **********
          * Client *        Obs         Obs      * Server *
          **********     <--|---     <--|---     **********
                         <==================

              (b) the intra-domain RTPL resulting from the
                  subtraction of the above RTPL components

         Intra-domain Round-trip packet loss (observer-server)

4.1.2.  Setting the Round Trip Loss Bit on Outgoing Packets

   The round Trip loss signal requires a working Spin-bit signal to
   separate trains of marked packets (packets with T bit set to 1).  A
   "pause" of at least one empty spin-bit period between each phase of
   the algorithm serves as such a separator for the on-path observer.

   The client is in charge of launching trains of marked packets and
   does so according to the algorithm:

   1.  Generation Phase.  The client starts generating marked packets
       for two consecutive spin-bit periods; it maintains a "generation
       token" count that is reset to zero at the beginning of the
       algorithm phase and is incremented every time a packet arrives.
       When the client transmits a packet and a "generation token" is
       available, the client marks the packet and retires a "generation
       token".
If no token is available, the outgoing packet is 734 transmitted unmarked. At the end of the first spin-bit period 735 spent in generation, the reflection counter is unlocked to start 736 counting incoming marked packets that will be reflected later; 738 2. Pause Phase. When the generation is completed, the client pauses 739 till it has observed one entire spin bit period with no marked 740 packets. That spin bit period is used by the observer as a 741 separator between generated and reflected packets. During this 742 marking pause, all the outgoing packets are transmitted with T 743 bit set to 0. The reflection counter is still incremented every 744 time a marked packet arrives; 746 3. Reflection Phase. The client starts transmitting marked packets, 747 decrementing the reflection counter for each transmitted marked 748 packet until the reflection counter reached zero. The 749 "generation token" method from the generation phase is used 750 during this phase as well. At the end of the first spin-period 751 spent in reflection, the reflection counter is locked to avoid 752 incoming reflected packets incrementing it; 754 4. Pause Phase 2. The pause phase is repeated after the reflection 755 phase and serves as a separator between the reflected packet 756 train and a new packet train. 758 The generation token counter should be capped to limit the effects of 759 a subsequent sudden reduction in the other endpoint's packet rate 760 that could prevent that endpoint from reflecting collected packets. 761 The most conservative cap value is "1". 763 A server maintains a "marking counter" that starts at zero and is 764 incremented every time a marked packet arrives. When the server 765 transmits a packet and the "marking counter" is positive, the server 766 marks the packet and decrements the "marking counter". If the 767 "marking counter" is zero, the outgoing packet is transmitted 768 unmarked. 770 4.1.3. 
Observer's Logic for Round Trip Loss Signal 772 The on-path observer counts marked packets and separates different 773 trains by detecting spin-bit periods (at least one) with no marked 774 packets. The Round Trip Packet Loss (RTPL) is the difference between 775 the size of the Generation train and the Reflection train. 777 In the following example, packets are represented by two bits (first 778 one is the spin bit, second one is the loss bit): 780 Generation Pause Reflection Pause 781 ____________________ ______________ ____________________ ________ 782 | | | | | 783 01 01 00 01 11 10 11 00 00 10 10 10 01 00 01 01 10 11 10 00 00 10 785 Round Trip Loss signal example 787 Note that 5 marked packets have been generated of which 4 have been 788 reflected. 790 4.1.4. Loss Coverage and Signal Timing 792 A cycle of the round Trip loss signaling algorithm contains 2 RTTs of 793 Generation phase, 2 RTTs of Reflection phase, and two Pause phases at 794 least 1 RTT in duration each. Hence, the loss signal is delayed by 795 about 6 RTTs since the loss events. 797 The observer can only detect loss of marked packets that occurs after 798 its initial observation of the Generation phase and before its 799 subsequent observation of the Reflection phase. Hence, if the loss 800 occurs on the path that sends packets at a lower rate (typically ACKs 801 in such asymmetric scenarios), "2/6" ("1/3") of the packets will be 802 sampled for loss detection. 804 If the loss occurs on the path that sends packets at a higher rate, 805 "lowPacketRate/(3*highPacketRate)" of the packets will be sampled for 806 loss detection. For protocols that use ACKs, the portion of packets 807 sampled for loss in the higher rate direction during unidirectional 808 data transfer is "1/(3*packetsPerAck)", where the value of 809 "packetsPerAck" can vary by protocol, by implementation, and by 810 network conditions. 812 4.2. 
Q Bit - Square Bit 814 The sQuare bit (Q bit) takes its name from the square wave generated 815 by its signal. Every outgoing packet contains the Q bit value, which 816 is initialized to the 0 and inverted after sending N packets (a 817 sQuare Block or simply Q Block). Hence, Q Period is 2*N. The Q bit 818 represents "packet color" as defined by [AltMark]. 820 Observation points can estimate upstream losses by watching a single 821 direction of the traffic flow and counting the number of packets in 822 each observed Q Block, as described in Section 4.2.2. 824 4.2.1. Q Block Length Selection 826 The length of the block must be known to the on-path network probes. 827 There are two alternatives to selecting the Q Block length. The 828 first one requires that the length is known a priori and therefore 829 set within the protocol specifications that implements the marking 830 mechanism. The second requires the sender to select it. 832 In this latter scenario, the sender is expected to choose N (Q Block 833 length) based on the expected amount of loss and reordering on the 834 path. The choice of N strikes a compromise - the observation could 835 become too unreliable in case of packet reordering and/or severe loss 836 if N is too small, while short flows may not yield a useful upstream 837 loss measurement if N is too large (see Section 4.2.2). 839 The value of N should be at least 64 and be a power of 2. This 840 requirement allows an Observer to infer the Q Block length by 841 observing one period of the square signal. It also allows the 842 Observer to identify flows that set the loss bits to arbitrary values 843 (see Section 7). 845 If the sender does not have sufficient information to make an 846 informed decision about Q Block length, the sender should use N=64, 847 since this value has been extensively tried in large-scale field 848 tests and yielded good results. 
   Alternatively, the sender may also
   choose a random power-of-2 N for each flow, increasing the chances of
   using a Q Block length that gives the best signal for some flows.

   The sender must keep the value of N constant for a given flow.

4.2.2.  Upstream Loss

   Blocks of N (Q Block length) consecutive packets are sent with the
   same value of the Q bit, followed by another block of N packets with
   an inverted value of the Q bit. Hence, knowing the value of N, an
   on-path observer can estimate the amount of upstream loss after
   observing at least N packets. The upstream loss rate ("uloss") is
   one minus the average number of packets in a block of packets with
   the same Q value ("p") divided by N ("uloss=1-avg(p)/N").

   The observer needs to be able to tolerate packet reordering that can
   blur the edges of the square signal, as explained in Section 4.2.3.

        =====================>
        **********     -----Obs---->     **********
        * Client *                       * Server *
        **********     <------------     **********

           (a) in client-server channel (uloss_up)

        **********     ------------>     **********
        * Client *                       * Server *
        **********     <----Obs-----     **********
                             <=====================

           (b) in server-client channel (uloss_down)

                         Upstream loss

4.2.3.  Identifying Q Block Boundaries

   Packet reordering can produce spurious edges in the square signal.
   To address this, the observer should look for packets with the
   current Q bit value up to X packets past the first packet with a
   reverse Q bit value. The value of X, a "Marking Block Threshold",
   should be less than "N/2".

   The choice of X represents a trade-off between resiliency to
   reordering and resiliency to loss. A very large Marking Block
   Threshold will be able to reconstruct Q Blocks despite a significant
   amount of reordering, but it may erroneously coalesce packets from
   multiple Q Blocks into fewer Q Blocks, if loss exceeds 50% for some Q
   Blocks.

4.3.  L Bit - Loss Event Bit

   The Loss Event bit uses an Unreported Loss counter maintained by the
   protocol that implements the marking mechanism. To use the Loss
   Event bit, the protocol must allow the sender to identify lost
   packets. This is true of protocols such as QUIC, partially true for
   TCP and SCTP (losses of pure ACKs are not detected) and is not true
   of protocols such as UDP and IP/IPv6.

   The Unreported Loss counter is initialized to 0, and the L bit of
   every outgoing packet indicates whether the Unreported Loss counter
   is positive (L=1 if the counter is positive, and L=0 otherwise).

   The value of the Unreported Loss counter is decremented every time a
   packet with L=1 is sent.

   The value of the Unreported Loss counter is incremented for every
   packet that the protocol declares lost, using whatever loss detection
   machinery the protocol employs. If the protocol is able to rescind
   the loss determination later, a positive Unreported Loss counter may
   be decremented due to the rescission, but it should NOT become
   negative due to the rescission.

   This loss signaling is similar to loss signaling in [ConEx], except
   that the Loss Event bit reports the exact number of lost packets,
   whereas the Echo Loss bit in [ConEx] reports an approximate number
   of lost bytes.

   For protocols, such as TCP ([TCP]), that allow network devices to
   change data segmentation, it is possible that only a part of the
   packet is lost. In these cases, the sender must increment the
   Unreported Loss counter by the fraction of the packet data lost (so
   the Unreported Loss counter may become negative when a packet with
   L=1 is sent after a partial packet has been lost).

   Observation points can estimate the end-to-end loss, as determined by
   the upstream endpoint, by counting packets in this direction with the
   L bit equal to 1, as described in Section 4.3.1.

4.3.1.  End-To-End Loss

   The Loss Event bit allows an observer to estimate the end-to-end loss
   rate by counting packets with L bit value of 0 and 1 for a given
   flow. The end-to-end loss rate is the fraction of packets with L=1.

   The assumption here is that upstream loss affects packets with L=0
   and L=1 equally. If some loss is caused by tail-drop in a network
   device, this may be a simplification. If the sender's congestion
   controller reduces the packet send rate after loss, there may be a
   sufficient delay before sending packets with L=1 that they have a
   greater chance of arriving at the observer.

4.3.2.  Loss Profile Characterization

   In addition to measuring the end-to-end loss rate, the Loss Event bit
   allows an observer to characterize the loss profile, since the
   distribution of observed packets with L bit set to 1 roughly
   corresponds to the distribution of packets lost between 1 RTT and 1
   RTO before (see Section 4.4.1). Hence, observing random single
   instances of L bit set to 1 indicates random single packet loss,
   while observing blocks of packets with L bit set to 1 indicates loss
   affecting entire blocks of packets.

4.4.  L+Q Bits - Upstream, Downstream, and End-to-End Loss Measurements

   Combining L and Q bits allows a passive observer watching a single
   direction of traffic to accurately measure:

   -  upstream loss: sender-to-observer loss (see Section 4.2.2)

   -  downstream loss: observer-to-receiver loss (see Section 4.4.1.1)

   -  end-to-end loss: sender-to-receiver loss on the observed path (see
      Section 4.3.1) with loss profile characterization (see
      Section 4.3.2)

4.4.1.  Correlating End-to-End and Upstream Loss

   Upstream loss is calculated by observing packets that did not suffer
   the upstream loss (Section 4.2.2). End-to-end loss, however, is
   calculated by observing subsequent packets after the sender's
   protocol detected the loss.
   Hence, end-to-end loss is generally
   observed with a delay of between 1 RTT (loss declared due to multiple
   duplicate acknowledgments) and 1 RTO (loss declared due to a timeout)
   relative to the upstream loss.

   The flow RTT can sometimes be estimated by timing protocol handshake
   messages. This RTT estimate can be greatly improved by observing a
   dedicated protocol mechanism for conveying RTT information, such as
   the Spin bit (see Section 3.1) or Delay bit (see Section 3.2).

   Whenever the observer needs to perform a computation that uses both
   upstream and end-to-end loss rate measurements, it should use
   upstream loss rate leading the end-to-end loss rate by approximately
   1 RTT. If the observer is unable to estimate RTT of the flow, it
   should accumulate loss measurements over time periods of at least 4
   times the typical RTT for the observed flows.

   If the calculated upstream loss rate exceeds the end-to-end loss rate
   calculated in Section 4.3.1, then either the Q Period is too short
   for the amount of packet reordering or there is observer loss,
   described in Section 4.4.1.2. If this happens, the observer should
   adjust the calculated upstream loss rate to match end-to-end loss
   rate, unless the following applies.

   For protocols such as TCP and SCTP that do not track losses of
   pure ACK packets, observing a direction of traffic dominated by pure
   ACK packets could result in measured upstream loss that is higher
   than measured end-to-end loss, if said pure ACK packets are lost
   upstream. Hence, if the measurement is applied to such protocols,
   and the observer can confirm that pure ACK packets dominate the
   observed traffic direction, the observer should adjust the calculated
   end-to-end loss rate to match upstream loss rate.

4.4.1.1.  Downstream Loss

   Because downstream loss affects only those packets that did not
   suffer upstream loss, the end-to-end loss rate ("eloss") relates to
   the upstream loss rate ("uloss") and downstream loss rate ("dloss")
   as "(1-uloss)(1-dloss)=1-eloss". Hence,
   "dloss=(eloss-uloss)/(1-uloss)".

4.4.1.2.  Observer Loss

   A typical deployment of a passive observation system includes a
   network tap device that mirrors network packets of interest to a
   device that performs analysis and measurement on the mirrored
   packets. The observer loss is the loss that occurs on the mirror
   path.

   Observer loss affects upstream loss rate measurement, since it causes
   the observer to account for fewer packets in a block of identical Q
   bit values (see Section 4.2.2). The end-to-end loss rate
   measurement, however, is unaffected by the observer loss, since it is
   a measurement of the fraction of packets with the L bit value of 1,
   and the observer loss would affect all packets equally (see
   Section 4.3.1).

   The need to adjust the upstream loss rate down to match end-to-end
   loss rate as described in Section 4.4.1 is an indication of the
   observer loss, whose magnitude is between the amount of such
   adjustment and the entirety of the upstream loss measured in
   Section 4.2.2. Alternatively, a high apparent upstream loss rate
   could be an indication of significant packet reordering, possibly due
   to packets belonging to a single flow being multiplexed over several
   upstream paths with different latency characteristics.

4.5.  R Bit - Reflection Square Bit

   The R bit requires deployment alongside the Q bit.
   Unlike the square
   signal, for which packets are transmitted in blocks of fixed size,
   the Reflection square signal (being an alternate marking signal too)
   produces blocks of packets whose size varies according to these
   rules:

   -  when the transmission of a new block starts, its size is set equal
      to the size of the last Q Block whose reception has been
      completed;

   -  if, before transmission of the block is terminated, the reception
      of at least one further Q Block is completed, the size of the
      block is updated to the average size of the further received Q
      Blocks.

   Implementation details follow.

   The Reflection square value is initialized to 0 and is applied to the
   R-bit of every outgoing packet. The Reflection square value is
   toggled for the first time when the completion of a Q Block is
   detected in the incoming square signal (produced by the opposite node
   using the Q-bit). When this happens, the number of packets ("p"),
   detected within this first Q Block, is used to generate a reflection
   square signal which toggles every "M=p" packets (at first). This new
   signal produces blocks of M packets (marked using the R-bit) and each
   of them is called "Reflection Block" (R Block).

   The M value is then updated every time a completed Q Block in the
   incoming square signal is received, following this formula:
   "M=round(avg(p))".

   The parameter "avg(p)" is the average number of packets in a marking
   period computed considering all the Q Blocks received since the
   beginning of the current R Block.

   To ensure a proper computation of the M value, endpoints implementing
   the R bit must identify the boundaries of incoming Q Blocks. The
   same approach described in Section 4.2.3 should be used.
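
   As an illustration only, the rules above could be realized by
   sender-side state along the following lines. This Python sketch is
   not part of the specification: the class name ReflectionSquare, its
   method names, and the round-half-up reading of "round" are
   assumptions made for the example. Detection of incoming Q Block
   boundaries is assumed to happen elsewhere and to call
   on_q_block_completed() with the packet count "p".

```python
class ReflectionSquare:
    """Hypothetical sender-side state for the R bit (illustrative only)."""

    def __init__(self):
        self.r_value = 0         # current Reflection square value
        self.block_size = None   # M; None until the first Q Block completes
        self.sent_in_block = 0   # packets already sent in the current R Block
        self.q_sizes = []        # Q Blocks completed during the current R Block
        self.last_q_size = None  # size of the last completed Q Block

    def on_q_block_completed(self, p):
        """Record a completed incoming Q Block that contained p packets."""
        self.last_q_size = p
        if self.block_size is None:
            # First completed Q Block: first toggle, signal starts with M=p.
            self.r_value ^= 1
            self.block_size = p
            self.sent_in_block = 0
        else:
            # M=round(avg(p)) over Q Blocks received during this R Block.
            self.q_sizes.append(p)
            avg = sum(self.q_sizes) / len(self.q_sizes)
            self.block_size = int(avg + 0.5)  # round half up (assumption)

    def next_r_bit(self):
        """Return the R bit value to place on the next outgoing packet."""
        if self.block_size is not None:
            if self.sent_in_block >= self.block_size:
                # Start a new R Block sized like the last completed Q Block.
                self.r_value ^= 1
                self.sent_in_block = 0
                self.block_size = self.last_q_size
                self.q_sizes = []
            self.sent_in_block += 1
        return self.r_value
```

   For instance, once a first Q Block of 4 packets completes, the next
   four outgoing packets carry R=1 and the following R Block starts with
   size 4; if a 2-packet Q Block completes while that block is in
   progress, the block is shortened to 2 packets.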

   Looking at the R-bit, unidirectional observation points have an
   indication of losses experienced by the entire unobserved channel
   plus those that occurred on the path from the sender up to them.

   Since the Q Block is sent in one direction, and the corresponding
   reflected R Block is sent in the opposite direction, the reflected R
   signal is transmitted with the packet rate of the slower direction.
   Namely, if the observed direction is the slower one, there can be
   multiple Q Blocks transmitted in the unobserved direction before a
   complete R Block is transmitted in the observed direction. If the
   unobserved direction is the slower one, the observed direction can be
   sending R Blocks of the same size repeatedly before it can update the
   signal to account for a newly-completed Q Block.

4.5.1.  R+Q Bits - Using R and Q Bits for Passive Loss Measurement

   Since both sQuare and Reflection square bits are toggled at most
   every N packets (except for the first transition of the R-bit as
   explained before), an on-path observer can count the number of
   packets of each marking block and, knowing the value of N, can
   estimate the amount of loss experienced by the connection. An
   observer can calculate different measurements depending on whether it
   is able to observe a single direction of the traffic or both
   directions.

   Single-direction observer:

   -  upstream loss in the observed direction: the loss between the
      sender and the observation point (see Section 4.2.2)

   -  "three-quarters" connection loss: the loss between the receiver
      and the sender in the unobserved direction plus the loss between
      the sender and the observation point in the observed direction

   -  end-to-end loss in the unobserved direction: the loss between the
      receiver and the sender in the opposite direction

   Two-direction observer (the same metrics listed above, applied to
   both directions, plus):

   -  client-observer half round-trip loss: the loss between the client
      and the observation point in both directions

   -  observer-server half round-trip loss: the loss between the
      observation point and the server in both directions

   -  downstream loss: the loss between the observation point and the
      receiver (applicable to both directions)

4.5.1.1.  Three-Quarters Connection Loss

   Except for the very first block in which there is nothing to reflect
   (a complete Q Block has not yet been received), packets are
   continuously R-bit marked into alternate blocks of size less than or
   equal to N. Knowing the value of N, an on-path observer can
   estimate the amount of loss that occurred in the whole opposite
   channel plus the loss from the sender up to it in the observation
   channel. As for the previous metric, the "three-quarters" connection
   loss rate ("tqloss") is one minus the average number of packets in a
   block of packets with the same R value ("t") divided by "N"
   ("tqloss=1-avg(t)/N").
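
   A rough sketch of this computation in Python is shown below. It is
   illustrative only: the function names are hypothetical, the input is
   assumed to be the in-order sequence of R-bit values seen by the
   observer, and the reordering tolerance of Section 4.2.3 is
   deliberately left out. The first and last runs are discarded
   because the observer may have seen them only partially.

```python
from itertools import groupby


def block_sizes(bits):
    """Return the lengths of runs of consecutive equal bit values."""
    return [len(list(group)) for _, group in groupby(bits)]


def three_quarters_loss(r_bits, n):
    """Estimate tqloss=1-avg(t)/N from an observed R-bit sequence.

    Returns None when no complete R Block has been observed.
    Assumes no packet reordering (a simplification for this sketch).
    """
    sizes = block_sizes(r_bits)[1:-1]  # drop possibly partial edge runs
    if not sizes:
        return None
    avg_t = sum(sizes) / len(sizes)
    return 1 - avg_t / n


# Two complete R Blocks of 60 and 62 packets observed with N=64.
observed = [0] * 5 + [1] * 60 + [0] * 62 + [1] * 3
tqloss = three_quarters_loss(observed, 64)
```

   The same run-length counting applied to the Q bit would give the
   upstream loss "uloss" of Section 4.2.2.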

        =======================>
        = **********     -----Obs---->     **********
        = * Client *                       * Server *
        = **********     <------------     **********
        <============================================

            (a) in client-server channel (tqloss_up)

          ============================================>
          **********     ------------>     ********** =
          * Client *                       * Server * =
          **********     <----Obs-----     ********** =
                               <=======================

            (b) in server-client channel (tqloss_down)

                  Three-quarters connection loss

   The following metrics derive from this last metric and the upstream
   loss produced by the Q Bit.

4.5.1.2.  End-To-End Loss in the Opposite Direction

   End-to-end loss in the unobserved direction ("eloss_unobserved")
   relates to the "three-quarters" connection loss ("tqloss") and
   upstream loss in the observed direction ("uloss") as
   "(1-eloss_unobserved)(1-uloss)=1-tqloss". Hence,
   "eloss_unobserved=(tqloss-uloss)/(1-uloss)".

        **********     -----Obs---->     **********
        * Client *                       * Server *
        **********     <------------     **********
        <==========================================

           (a) in client-server channel (eloss_down)

        ==========================================>
        **********     ------------>     **********
        * Client *                       * Server *
        **********     <----Obs-----     **********

           (b) in server-client channel (eloss_up)

            End-To-End loss in the opposite direction

4.5.1.3.  Half Round-Trip Loss

   If the observer is able to observe both directions of traffic, it is
   able to calculate two "half round-trip" loss measurements - loss from
   the observer to the receiver (in a given direction) and then back to
   the observer in the opposite direction.
   For both directions, "half
   round-trip" loss ("hrtloss") relates to "three-quarters" connection
   loss ("tqloss_opposite") measured in the opposite direction and the
   upstream loss ("uloss") measured in the given direction as
   "(1-uloss)(1-hrtloss)=1-tqloss_opposite". Hence,
   "hrtloss=(tqloss_opposite-uloss)/(1-uloss)".

        =======================>
        = **********     ------|----->     **********
        = * Client *          Obs          * Server *
        = **********     <-----|------     **********
        <=======================

         (a) client-observer half round-trip loss (hrtloss_co)

                               =======================>
          **********     ------|----->     ********** =
          * Client *          Obs          * Server * =
          **********     <-----|------     ********** =
                               <=======================

         (b) observer-server half round-trip loss (hrtloss_os)

              Half Round-trip loss (both directions)

4.5.1.4.  Downstream Loss

   If the observer is able to observe both directions of traffic, it is
   able to calculate two downstream loss measurements using either
   end-to-end loss and upstream loss, similar to the calculation in
   Section 4.4.1.1, or using "half round-trip" loss and upstream loss in
   the opposite direction.

   For the latter, "dloss=(hrtloss-uloss_opposite)/(1-uloss_opposite)".

                             =====================>
        **********     ------|----->     **********
        * Client *          Obs          * Server *
        **********     <-----|------     **********

            (a) in client-server channel (dloss_up)

        **********     ------|----->     **********
        * Client *          Obs          * Server *
        **********     <-----|------     **********
        <=====================

           (b) in server-client channel (dloss_down)

                        Downstream loss

4.5.2.  Enhancement of R Block Length Computation

   The rounding function used in the M computation introduces
   errors that can be minimized by storing the rounding error applied
   each time M is computed, and using it during the computation of the M
   value in the following R Block.
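
   One possible reading of this error-carrying idea is sketched below
   in Python, under stated assumptions: the helper name next_block_size
   and the round-half-up rule are hypothetical choices for the example;
   "avg_p" stands for "avg(p)" and "r_avg" for the stored rounding
   remainder, which starts at 0.

```python
def next_block_size(avg_p, r_avg=0.0):
    """Return (M, new_r_avg) for one R Block length computation.

    avg_p is avg(p) as used in the M computation; r_avg is the rounding
    error left over from the previous computation (hypothetical helper).
    """
    m_r = avg_p + r_avg   # Mr = avg(p) + r_avg
    m = int(m_r + 0.5)    # M = round(Mr), rounding half up (assumption)
    return m, m_r - m     # r_avg = Mr - M


# With a steady avg(p) of 63.25, the block sizes vary so that their
# mean tracks 63.25 instead of always rounding down to 63.
r_avg = 0.0
sizes = []
for _ in range(4):
    m, r_avg = next_block_size(63.25, r_avg)
    sizes.append(m)
# sizes == [63, 64, 63, 63]
```

   Over the four blocks in this example, the sizes sum to 253, which is
   exactly 4 times 63.25, so the rounding error does not accumulate.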

   This can be achieved by introducing the new "r_avg" parameter in the
   computation of M. The new formula is "Mr=avg(p)+r_avg; M=round(Mr);
   r_avg=Mr-M" where the initial value of "r_avg" is equal to 0.

4.5.3.  Improved Resilience to Packet Reordering

   When a protocol implementing the marking mechanism is able to detect
   when packets are received out of order, it can improve resilience to
   packet reordering beyond what is possible using methods described in
   Section 4.2.3.

   This can be achieved by updating the size of the current R Block
   while it is being transmitted. The reflection block size is then
   updated every time an incoming reordered packet of the previous Q
   Block is detected. This can be done if and only if the transmission
   of the current reflection block is in progress and no packets of the
   following Q Block have been received.

5.  Summary of Delay and Loss Marking Methods

   This section summarizes the marking methods described in this draft.

   For the Delay measurement, it is possible to use the spin bit and/or
   the delay bit. A unidirectional or bidirectional observer can be
   used.

   +------------------+----+-------------------------+---------------+
   | Method           |# of|        Available        |               |
   |                  |bits|      Delay Metrics      |  Impairments  |
   |                  |    +------------+------------+  Resiliency   |
   |                  |    |   UNIDIR   |   BIDIR    |               |
   |                  |    |  Observer  |  Observer  |               |
   +------------------+----+------------+------------+---------------+
   |S: Spin Bit       | 1  | RTT        | x2         | low           |
   |                  |    |            | Half RTT   |               |
   +------------------+----+------------+------------+---------------+
   |D: Delay Bit      | 1  | RTT        | x2         | high          |
   |                  |    |            | Half RTT   |               |
   +------------------+----+------------+------------+---------------+
   |SD: Spin Bit &    | 2  | RTT        | x2         | high          |
   |    Delay Bit *   |    |            | Half RTT   |               |
   +------------------+----+------------+------------+---------------+

   x2 Same metric for both directions
   *  Both algorithms work independently; an observer could use
      approximate spin bit measures when delay bit ones aren't available

                       Figure 1: Delay Comparison

   For the Loss measurement, each row in the table of Figure 2
   represents a loss marking method. For each method the table
   specifies the number of bits required in the header, the available
   metrics using a unidirectional or bidirectional observer, applicable
   protocols, measurement fidelity and delay.

   +-------------+-+-----------------------+-+------------------------+
   | Method      |B|       Available       |P|  Measurement Aspects   |
   |             |i|     Loss Metrics      |r+------------+-----------+
   |             |t|   UNIDIR  |   BIDIR   |t|  Fidelity  |   Delay   |
   |             |s| Observer  | Observer  |o|            |           |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |T: Round Trip|$|    RT     |    x2     | |  Rate by   |  ~6 RTT   |
   |   Loss Bit  |1|           |  Half RT  |*|  sampling  +-----------+
   |             | |           |           | | 1/3 to 1/(3*ppa) of    |
   |             | |           |           | | pkts over 2 RTT        |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |Q: Square Bit|1| Upstream  |    x2     |*| Rate over  | N pkts    |
   |             | |           |           | | N pkts     | (e.g. 64) |
   |             | |           |           | | (e.g. 64)  |           |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |L: Loss Event|1|    E2E    |    x2     |#| Loss shape | Min: RTT  |
   |   Bit       | |           |           | | (and rate) | Max: RTO  |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |QL: Square + |2| Upstream  |    x2     | | -> see Q   | Up: see Q |
   |    Loss Ev. | | Downstream|    x2     |#| -> see Q|L | Others:   |
   |    Bits     | |    E2E    |    x2     | | -> see L   |  see L    |
   +-------------+-+-----------+-----------+-+------------+-----------+
   |QR: Square + |2| Upstream  |    x2     | | Rate over  | Up: see Q |
   |    Ref. Sq. | |  3/4 RT   |    x2     | | N*ppa pkts | Others:   |
   |    Bits     | |   !E2E    |    E2E    |*| (see Q bit |  N*ppa pk |
   |             | |           | Downstream| |  for N)    |  (see Q   |
   |             | |           |  Half RT  | |            |   for N)  |
   +-------------+-+-----------+-----------+-+------------+-----------+

   *   All protocols
   #   Protocols employing loss detection (w/ or w/o pure ACK loss
       detection)
   $   Require a working spin bit
   !   Metric relative to the opposite channel
   x2  Same metric for both directions
   ppa Packets-Per-Ack
   Q|L See Q if Upstream loss is significant; L otherwise

                       Figure 2: Loss Comparison

6.  ECN-Echo Event Bit

   While the primary focus of the draft is on exposing packet loss and
   delay, modern networks can report congestion before they are forced
   to drop packets, as described in [ECN]. When transport protocols
   keep ECN-Echo feedback under encryption, this signal cannot be
   observed by the network operators. When tasked with diagnosing
   network performance problems, knowledge of congestion downstream of
   an observation point can be instrumental.

   If downstream congestion information is desired, this information can
   be signaled with an additional bit.

   -  E: The "ECN-Echo Event" bit is set to 0 or 1 according to the
      Unreported ECN-Echo counter, as explained below in Section 6.1.

6.1.  Setting the ECN-Echo Event Bit on Outgoing Packets

   The Unreported ECN-Echo counter operates identically to the
   Unreported Loss counter (Section 4.3), except it counts packets
   delivered by the network with CE markings, according to the ECN-Echo
   feedback from the receiver.

   This ECN-Echo signaling is similar to ECN signaling in [ConEx]. The
   ECN-Echo mechanism in QUIC provides the number of packets received
   with CE marks. For protocols like TCP, the method described in
   [ConEx-TCP] can be employed. As stated in [ConEx-TCP], such feedback
   can be further improved using a method described in [ACCURATE].

6.2.  Using E Bit for Passive ECN-Reported Congestion Measurement

   A network observer can count packets with CE codepoint and determine
   the upstream CE-marking rate directly.

   Observation points can also estimate ECN-reported end-to-end
   congestion by counting packets in this direction with an E bit equal
   to 1.

   The upstream CE-marking rate and end-to-end ECN-reported congestion
   can provide information about downstream CE-marking rate.
   Presence
   of E bits along with L bits, however, can somewhat confound precise
   estimates of upstream and downstream CE-markings in case the flow
   contains packets that are not ECN-capable.

7.  Protocol Ossification Considerations

   Accurate loss and delay information is not critical to the operation
   of any protocol, though its presence for a sufficient number of flows
   is important for the operation of networks.

   The delay and loss bits are amenable to "greasing" described in
   [RFC8701], if the protocol designers are not ready to dedicate (and
   ossify) bits used for loss reporting to this function. The greasing
   could be accomplished similarly to the Latency Spin bit greasing in
   [QUIC-TRANSPORT]. Namely, implementations could decide that a
   fraction of flows should not encode loss and delay information and,
   instead, the bits would be set to arbitrary values. The observers
   would need to be ready to ignore flows with delay and loss
   information more resembling noise than the expected signal.

8.  Examples of Application

8.1.  QUIC

   The binding of a delay signal to QUIC is partially described in
   [QUIC-TRANSPORT], which adds the spin bit to the first byte of the
   short packet header, leaving two reserved bits for future
   experiments.

   To implement the additional signals discussed in this document, the
   first byte of the short packet header can be modified as follows:

   -  the delay bit (D) can be placed in the first reserved bit (i.e.
      the fourth most significant bit, 0x10) while the round trip loss
      bit (T) can be placed in the second reserved bit (i.e. the fifth
      most significant bit, 0x08); the proposed scheme is:

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |0|1|S|D|T|K|P|P|
            +-+-+-+-+-+-+-+-+

                Scheme 1

   -  alternatively, a two-bit loss signal (QL or QR) can be placed in
      both reserved bits; the proposed schemes, in this case, are:

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |0|1|S|Q|L|K|P|P|
            +-+-+-+-+-+-+-+-+

                Scheme 2A

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |0|1|S|Q|R|K|P|P|
            +-+-+-+-+-+-+-+-+

                Scheme 2B

   A further option would be to substitute the spin bit with the delay
   bit, leaving the two reserved bits for loss detection. The proposed
   schemes are:

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |0|1|D|Q|L|K|P|P|
            +-+-+-+-+-+-+-+-+

                Scheme 3A

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |0|1|D|Q|R|K|P|P|
            +-+-+-+-+-+-+-+-+

                Scheme 3B

8.2.  TCP

   The signals can be added to TCP by defining bit 4 of byte 13 of the
   TCP header to carry the spin bit or the delay bit, and possibly bits
   5 and 6 to carry additional information, like the delay bit and the
   round-trip loss bit (DT), or a two-bit loss signal (QL or QR).

9.  Security Considerations

   Passive loss and delay observations have been a part of the network
   operations for a long time, so exposing loss and delay information to
   the network does not add new security concerns for protocols that are
   currently observable.

   In the absence of packet loss, Q and R bit signals do not provide
   any information that cannot be observed by simply counting packets
   transiting a network path. In the presence of packet loss, Q and R
   bits will disclose the loss, but this is information about the
   environment and not the endpoint state.
   The L bit signal discloses the internal state of the protocol's loss
   detection machinery, but this state can often be gleaned by timing
   packets and observing congestion controller response.

   Hence, loss bits do not provide a viable new mechanism to attack data
   integrity and secrecy.

9.1.  Optimistic ACK Attack

   A defense against an Optimistic ACK Attack, described in
   [QUIC-TRANSPORT], involves a sender randomly skipping packet numbers
   to detect a receiver acknowledging packet numbers that have never
   been received.  The Q bit signal may inform the attacker which packet
   numbers were skipped on purpose and which were actually lost (and
   are, therefore, safe for the attacker to acknowledge).  To use the Q
   bit for this purpose, the attacker must first receive at least an
   entire Q Block of packets, which renders the attack ineffective
   against a delay-sensitive congestion controller.

   A protocol that is more susceptible to an Optimistic ACK Attack with
   the loss signal provided by the Q bit and that uses a loss-based
   congestion controller should shorten the current Q Block by the
   number of skipped packet numbers.  For example, skipping a single
   packet number will invert the square signal one outgoing packet
   sooner.

   Similar considerations apply to the R bit, although a shortened R
   Block along with a matching skip in packet numbers does not
   necessarily imply a lost packet, since it could be due to a lost
   packet on the reverse path along with a deliberately skipped packet
   by the sender.

10.  Privacy Considerations

   To minimize unintentional exposure of information, loss bits provide
   an explicit loss signal - a preferred way to share information per
   [RFC8558].

   New protocols commonly have specific privacy goals, and loss
   reporting must ensure that loss information does not compromise those
   privacy goals.
   For example, [QUIC-TRANSPORT] allows changing Connection IDs in the
   middle of a connection to reduce the likelihood of a passive observer
   linking old and new sub-flows to the same device.  A QUIC
   implementation would need to reset all counters when it changes the
   destination (IP address or UDP port) or the Connection ID used for
   outgoing packets.  It would also need to avoid incrementing the
   Unreported Loss counter for loss of packets sent to a different
   destination or with a different Connection ID.

11.  IANA Considerations

   This document makes no request of IANA.

12.  Change Log

   TBD

13.  Contributors

   The following people provided valuable contributions to this
   document:

   -  Marcus Ihlar, Ericsson, marcus.ihlar@ericsson.com

   -  Jari Arkko, Ericsson, jari.arkko@ericsson.com

   -  Emile Stephan, Orange, emile.stephan@orange.com

14.  Acknowledgements

   TBD

15.  References

15.1.  Normative References

   [ConEx]    Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
              Concepts, Abstract Mechanism, and Requirements", RFC 7713,
              DOI 10.17487/RFC7713, December 2015.

   [ConEx-TCP]
              Kuehlewind, M., Ed. and R. Scheffenegger, "TCP
              Modifications for Congestion Exposure (ConEx)", RFC 7786,
              DOI 10.17487/RFC7786, May 2016.

   [ECN]      Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [IP]       Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [IPM-Methods]
              Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [IPv6]     Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8558]  Hardie, T., Ed., "Transport Protocol Path Signals",
              RFC 8558, DOI 10.17487/RFC8558, April 2019.

   [TCP]      Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

15.2.  Informative References

   [ACCURATE]
              Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
              Accurate ECN Feedback in TCP",
              draft-ietf-tcpm-accurate-ecn-13 (work in progress),
              November 2020.

   [AltMark]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

   [ANRW19-PM-QUIC]
              Bulgarella, F., Cociglio, M., Fioccola, G., Marchetto, G.,
              and R. Sisto, "Performance measurements of QUIC
              communications", Proceedings of the Applied Networking
              Research Workshop, DOI 10.1145/3340301.3341127, July 2019.

   [I-D.trammell-ippm-spin]
              Trammell, B., "An Explicit Transport-Layer Signal for
              Hybrid RTT Measurement", draft-trammell-ippm-spin-00 (work
              in progress), January 2019.

   [I-D.trammell-tsvwg-spin]
              Trammell, B., "A Transport-Independent Explicit Signal for
              Hybrid RTT Measurement", draft-trammell-tsvwg-spin-00
              (work in progress), July 2018.

   [IPv6AltMark]
              Fioccola, G., Zhou, T., Cociglio, M., Qin, F., and R.
              Pang, "IPv6 Application of the Alternate Marking Method",
              draft-ietf-6man-ipv6-alt-mark-02 (work in progress),
              October 2020.

   [QUIC-TRANSPORT]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-34 (work
              in progress), January 2021.

   [RFC8517]  Dolson, D., Ed., Snellman, J., Boucadair, M., Ed., and C.
              Jacquenet, "An Inventory of Transport-Centric Functions
              Provided by Middleboxes: An Operator Perspective",
              RFC 8517, DOI 10.17487/RFC8517, February 2019.

   [RFC8701]  Benjamin, D., "Applying Generate Random Extensions And
              Sustain Extensibility (GREASE) to TLS Extensibility",
              RFC 8701, DOI 10.17487/RFC8701, January 2020.

   [SPIN-BIT]
              Trammell, B., Vaere, P., Even, R., Fioccola, G., Fossati,
              T., Ihlar, M., Morton, A., and S. Emile, "Adding Explicit
              Passive Measurability of Two-Way Latency to the QUIC
              Transport Protocol", draft-trammell-quic-spin-03 (work in
              progress), May 2018.

   [TRANSPORT-ENCRYPT]
              Fairhurst, G. and C. Perkins, "Considerations around
              Transport Header Confidentiality, Network Operations, and
              the Evolution of Internet Transport Protocols",
              draft-ietf-tsvwg-transport-encrypt-18 (work in progress),
              November 2020.

   [UDP-OPTIONS]
              Touch, J., "Transport Options for UDP",
              draft-ietf-tsvwg-udp-options-09 (work in progress),
              November 2020.

   [UDP-SURPLUS]
              Herbert, T., "UDP Surplus Header",
              draft-herbert-udp-space-hdr-01 (work in progress),
              July 2019.

Authors' Addresses

   Mauro Cociglio
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   EMail: mauro.cociglio@telecomitalia.it


   Alexandre Ferrieux
   Orange Labs

   EMail: alexandre.ferrieux@orange.com


   Giuseppe Fioccola
   Huawei Technologies
   Riesstrasse, 25
   Munich 80992
   Germany

   EMail: giuseppe.fioccola@huawei.com


   Igor Lubashev
   Akamai Technologies

   EMail: ilubashe@akamai.com


   Fabio Bulgarella
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   EMail: fabio.bulgarella@guest.telecomitalia.it


   Isabelle Hamchaoui
   Orange Labs

   EMail: isabelle.hamchaoui@orange.com


   Massimo Nilo
   Telecom Italia

   EMail: massimo.nilo@telecomitalia.it


   Riccardo Sisto
   Politecnico di Torino

   EMail: riccardo.sisto@polito.it


   Dmitri Tikhonov
   LiteSpeed Technologies

   EMail: dtikhonov@litespeedtech.com