QUIC                                                    B. Trammell, Ed.
Internet-Draft                                               P. De Vaere
Intended status: Informational                                ETH Zurich
Expires: October 12, 2018                                        R. Even
                                                                  Huawei
                                                             G. Fioccola
                                                          Telecom Italia
                                                              T. Fossati
                                                                   Nokia
                                                                M. Ihlar
                                                                Ericsson
                                                               A. Morton
                                                               AT&T Labs
                                                              E. Stephan
                                                                  Orange
                                                          April 10, 2018


  Adding Explicit Passive Measurability of Two-Way Latency to the QUIC
                           Transport Protocol
                      draft-trammell-quic-spin-02

Abstract

   This document describes the addition of a "spin bit", intended for
   explicit measurability of end-to-end RTT, to the QUIC transport
   protocol. It proposes a detailed mechanism for the spin bit, as well
   as an additional mechanism, called the valid edge counter, to
   increase the fidelity of the latency signal in less than ideal
   network conditions. It describes how to use the latency spin signal
   to measure end-to-end latency, discusses corner cases and their
   workarounds in the measurement, describes experimental evaluation of
   the mechanism done to date, and examines the utility and privacy
   implications of the spin bit.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 12, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  About This Document
   2.  The Spin Bit Mechanism
   3.  Using the Spin Bit for Passive RTT Measurement
     3.1.  Limitations and Workarounds
     3.2.  Illustration
   4.  The Valid Edge Counter
     4.1.  Proposed Short Header Format Including Spin Bit and VEC
     4.2.  Setting the Valid Edge Counter (VEC)
     4.3.  Use of the VEC by a passive observer
   5.  Privacy and Security Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Experimental Evaluation
   Appendix B.  Use Cases for Passive RTT Measurement
     B.1.  Inter-domain Troubleshooting
     B.2.  Two-Point Intradomain Measurement
     B.3.  Bufferbloat Mitigation in Cellular Networks
     B.4.  Locating WiFi Problems in Home Networks
     B.5.  Internet Measurement Research
   Appendix C.  Alternate RTT Measurement Approaches for Diagnosing
                QUIC flows
     C.1.  Handshake RTT measurement
     C.2.  Parallel active measurement
     C.3.  Frequency Analysis
   Appendix D.  Greasing
   Authors' Addresses

1.  Introduction

   The QUIC transport protocol [QUIC-TRANS] is a UDP-encapsulated
   protocol integrated with Transport Layer Security (TLS) [TLS] to
   encrypt most of its protocol internals, beyond those handshake
   packets needed to establish or resume a TLS session, and information
   required to reassemble QUIC streams (the packet number) and to route
   QUIC packets to the correct machine in a load-balancing situation
   (the connection ID). QUIC's wire image (see [WIRE-IMAGE]) therefore
   exposes much less information about transport protocol state than
   TCP's wire image. Specifically, the fact that sequence
   and acknowledgement numbers and timestamps (available in TCP) cannot
   be seen by on-path observers in QUIC means that passive TCP loss and
   latency measurement techniques that rely on this information (e.g.
   [CACM-TCP], [TMA-QOF]) cannot be easily ported to work with QUIC.

   This document proposes a solution to this problem by adding a
   "latency spin bit" to the QUIC short header. This bit is designed
   solely for explicit passive measurability of the protocol. It
   provides one RTT sample per RTT to passive observers of QUIC traffic.
   This document describes the mechanism, how it can be added to QUIC,
   and how it can be used by passive measurement facilities to generate
   RTT samples. It explores potential corner cases and shortcomings of
   the mechanism, and proposes an extension called the Valid Edge
   Counter (VEC) to mitigate them.
   It further details findings on
   privacy risk researched by the QUIC RTT Design Team, which was tasked
   by the IETF QUIC Working Group to determine the risk/utility tradeoff
   for the spin bit.

   Appendices summarize experimental results to date with an
   implementation of the spin bit built atop a recent QUIC
   implementation, describe use cases for passive RTT measurement at the
   resolution provided by the spin bit, explore alternatives to the spin
   bit for passive latency measurement of QUIC flows, and discuss the
   necessity of "greasing" the spin bit.

   The spin bit has low overhead, presents negligible privacy risk, and
   has clear utility in providing passive RTT measurability of QUIC that
   is far superior to QUIC's measurability without the spin bit, and
   equivalent to or better than TCP passive measurability.

1.1.  About This Document

   [QUIC-SPIN-EXP] specifies the addition of the spin bit to the QUIC
   transport protocol for experimental purposes. This document provides
   background for that specification, documents work done in the
   development of the spin bit proposal, and extends it with the VEC
   signal for loss, reordering, and delay compensation without relying
   on the QUIC packet number.

   This document is maintained in the GitHub repository
   https://github.com/britram/draft-trammell-quic-spin, and the editor's
   copy is available online at
   https://britram.github.io/draft-trammell-quic-spin. Current open
   issues on the document can be seen at
   https://github.com/britram/draft-trammell-quic-spin/issues. Comments
   and suggestions on this document can be made by filing an issue
   there, or by contacting the editor.

2.  The Spin Bit Mechanism

   The latency spin bit enables latency monitoring from observation
   points on the network path. Each endpoint, client and server,
Each endpoint, client and server, 162 maintains a spin value, 0 or 1, for each QUIC connection, and sets 163 the spin bit on packets it sends for that connection to the 164 appropriate value (below). It also maintains the highest packet 165 number seen from its peer on the connection. The value is then 166 determined at each endpoint as follows: 168 o The server initializes its spin value to 0. When it receives a 169 packet from the client, if that packet has a short header and if 170 it increments the highest packet number seen by the server from 171 the client, it sets the spin value to the spin bit in the received 172 packet. 174 o The client initializes its spin value to 0. When it receives a 175 packet from the server, if the packet has a short header and if it 176 increments the highest packet number seen by the client from the 177 server, it sets the spin value to the opposite of the spin bit in 178 the received packet. 180 This procedure will cause the spin bit to change value in each 181 direction once per round trip. Observation points can estimate the 182 network latency by observing these changes in the latency spin bit, 183 as described in Section 3. See Section 3.2 for an illustration of 184 this mechanism in action. 186 The defails of the addition of the spin bit to the QUIC short header 187 are given in [QUIC-SPIN-EXP]. 189 3. Using the Spin Bit for Passive RTT Measurement 191 When a QUIC flow is sending at full rate (i.e., neither application 192 nor flow control limited), the latency spin bit in each direction 193 changes value once per round-trip time (RTT). An on-path observer 194 can observe the time difference between edges in the spin bit signal 195 in a single direction to measure one sample of end-to-end RTT. 
Note 196 that this measurement, as with passive RTT measurement for TCP, 197 includes any transport protocol delay (e.g., delayed sending of 198 acknowledgements) and/or application layer delay (e.g., waiting for a 199 request to complete). It therefore provides devices on path a good 200 instantaneous estimate of the RTT as experienced by the application. 201 A simple linear smoothing or moving minimum filter can be applied to 202 the stream of RTT information to get a more stable estimate. 204 An on-path observer that can see traffic in both directions (from 205 client to server and from server to client) can also use the spin bit 206 to measure "upstream" and "downstream" component RTT; i.e, the 207 component of the end-to-end RTT attributable to the paths between the 208 observer and the server and the observer and the client, 209 respectively. It does this by measuring the delay between a spin 210 edge observed in the upstream direction and that observed in the 211 downstream direction, and vice versa. 213 3.1. Limitations and Workarounds 215 Application-limited and flow-control-limited senders can have 216 application and transport layer delay, respectively, that are much 217 greater than network RTT. Therefore, the spin bit provides network 218 latency information only when the sender is neither application nor 219 flow control limited. When the sender is application-limited by 220 periodic application traffic, where that period is longer than the 221 RTT, measuring the spin bit provides information about the 222 application period, not the RTT. Simple heuristics based on the 223 observed data rate per flow or changes in the RTT series can be used 224 to reject bad RTT samples due to application or flow control 225 limitation. 227 Since the spin bit logic at each endpoint considers only samples on 228 packets that advance the largest packet number seen, signal 229 generation itself is resistant to reordering. 
However, reordering 230 can cause problems at an observer by causing spurious edge detection 231 and therefore low RTT estimates, if reordering occurs across a spin 232 bit flip in the stream. This can be probabilistically mitigated by 233 the observer also tracking the low-order bits of the packet number, 234 and rejecting edges that appear out-of-order [RFC4737]. 236 All of these limitations are addressed by an enhancement to the spin 237 bit, the Valid Edge Counter, described in detail in Section 4. 239 3.2. Illustration 241 To illustrate the operation of the spin bit, we consider a simplified 242 model of a single path between client and server as a queue with 243 slots for five packets, and assume that both client and server sent 244 packets at a constant rate. If each packet moves one slot in the 245 queue per clock tick, note that this network has a RTT of 10 ticks. 247 Initially, during connection establishment, no packets with a spin 248 bit are in flight, as shown in Figure 1. 250 +--------+ - - - - - +--------+ 251 | | --------> | | 252 | Client | | Server | 253 | | <-------- | | 254 +--------+ - - - - - +--------+ 256 Figure 1: Initial state, no spin bit between client and server 258 Either the server, the client, or both can begin sending packets with 259 short headers after connection establishment, as shown in Figure 2; 260 here, no spin edges are yet in transit. 262 +--------+ 0 0 - - - +--------+ 263 | | --------> | | 264 | Client | | Server | 265 | | <-------- | | 266 +--------+ - - 0 0 0 +--------+ 268 Figure 2: Client and server begin sending packets with spin 0 270 Once the server's first 0-marked packet arrives at the client, the 271 client sets its spin value to 1, and begins sending packets with the 272 spin bit set, as shown in Figure 3. The spin edge is now in transit 273 toward the server. 
   +--------+  1 0 0 0 0  +--------+
   |        |  -------->  |        |
   | Client |             | Server |
   |        |  <--------  |        |
   +--------+  0 0 0 0 0  +--------+

                 Figure 3: The bit begins spinning

   Five ticks later, this packet arrives at the server, which takes its
   spin value from it and reflects that value back on the next packet it
   sends, as shown in Figure 4. The spin edge is now in transit toward
   the client.

   +--------+  1 1 1 1 1  +--------+
   |        |  -------->  |        |
   | Client |             | Server |
   |        |  <--------  |        |
   +--------+  0 0 0 0 1  +--------+

              Figure 4: Server reflects the spin edge

   Five ticks later, the 1-marked packet arrives at the client, which
   inverts its spin value and sends the inverted value on the next
   packet it sends, as shown in Figure 5.

      obs. points  X Y
   +--------+  0 1 1 1 1  +--------+
   |        |  -------->  |        |
   | Client |             | Server |
   |        |  <--------  |        |
   +--------+  1 1 1 1 1  +--------+
                     Y

              Figure 5: Client inverts the spin edge

   Here we can also see how measurement works. An observer watching the
   signal at a single observation point X in Figure 5 will see an edge
   every 10 ticks, i.e. once per RTT. An observer watching the signal
   at a symmetric observation point Y in Figure 5 will see a
   server-client edge 4 ticks after the client-server edge, and a
   client-server edge 6 ticks after the server-client edge, allowing it
   to compute component RTT.

   Figure 6 shows how this mechanism works in the presence of
   reordering. Here, packet C carries the spin edge, and packet B is
   reordered on the way to the client. In this case, the client will
   begin sending spin 1 after the arrival of C, and ignore the spin bit
   flip to 1 on packet B, since B < C; i.e. it does not increment the
   highest packet number seen.

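   The endpoint rule from Section 2 that produces this behavior can be
   sketched in Python as follows. This is a minimal illustration under
   stated assumptions, not part of the proposal: the class name
   SpinEndpoint, its method names, and the numeric packet numbers
   standing in for packets A through E are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpinEndpoint:
    """Per-connection spin state (hypothetical helper, not a real QUIC API)."""

    is_client: bool
    spin: int = 0         # both endpoints initialize their spin value to 0
    highest_pn: int = -1  # highest packet number seen from the peer

    def on_receive(self, pn: int, spin_bit: int, short_header: bool = True) -> None:
        # A packet that does not advance the highest packet number seen
        # (i.e., a reordered packet) never changes the spin value.
        if pn <= self.highest_pn:
            return
        self.highest_pn = pn
        # Only short-header packets carry a spin bit.
        if not short_header:
            return
        # The client inverts the received bit; the server reflects it.
        self.spin = spin_bit ^ 1 if self.is_client else spin_bit

    def next_spin_bit(self) -> int:
        """Spin bit to place on the next outgoing short-header packet."""
        return self.spin


if __name__ == "__main__":
    # Arrival order A, C, B, D, E with assumed packet numbers 1, 3, 2, 4, 5
    # and the spin bits 1, 0, 1, 0, 0 carried by those packets.
    client = SpinEndpoint(is_client=True)
    sent = []
    for pn, bit in [(1, 1), (3, 0), (2, 1), (4, 0), (5, 0)]:
        client.on_receive(pn, bit)
        sent.append(client.next_spin_bit())
    print(sent)  # [0, 1, 1, 1, 1]: the late packet (PN 2) is ignored
```

   Without the packet number check, the late packet would flip the
   client's spin value back to 0 and create a spurious edge in the
   client-to-server direction.
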
   +--------+  0 0 0 0 0  +--------+
   |        |  -------->  |        |
   | Client |             | Server |
   |        |  <--------  |        |
   +--------+  1 0 1 0 0  +--------+
           PN= A C B D E

                   Figure 6: Handling reordering

4.  The Valid Edge Counter

   This mechanism is intended to provide additional information about
   the validity of the passively observed spin edges without using
   information from a cleartext packet number.

   A one-bit spin signal is resistant to reordering during signal
   generation, since the spin value is only updated at each endpoint on
   a packet that advances the packet counter. However, without using
   the packet number, a passive observer can neither detect reordered
   nor lost edges, and it must use heuristics to reject delayed edges.

   The Valid Edge Counter (VEC) addresses these issues with two
   additional bits added to each packet, encoding values from 0 to 3,
   indicating that an edge was considered to be valid when sent out by
   the sender, and making it possible to detect invalid edges due
   to reordering and edge loss.

4.1.  Proposed Short Header Format Including Spin Bit and VEC

   As of the current editor's version of [QUIC-TRANS], this proposal
   specifies using bit 0x04 of the first octet in the short header for
   the spin bit, and the bits 0x18 for the valid edge counter. Note
   that these values are subject to frequent change.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |0|K|1|VEC|S|T T|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Destination Connection ID (0..144)           ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Packet Number (8/16/32)                ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Protected Payload (*)                   ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 7: Short Header Format Including Spin Bit and VEC

   S: The Spin bit is set to 0 or 1 depending on the stored spin value
   that is updated on packet reception as explained in Section 2.

   VEC: The Valid Edge Counter is set as defined in Section 4.2. If the
   spin bit field does not contain an edge, the VEC is set to 0.

4.2.  Setting the Valid Edge Counter (VEC)

   The VEC is set by each endpoint as follows; unlike the spin bit, note
   that there is no difference between client and server handling of the
   VEC:

   o  By default, the VEC is set to 0.

   o  If a packet contains an edge (transition 0->1 or 1->0) in the spin
      signal, and that edge is delayed (sent more than a configured
      delay since the edge was received, defaulting to 1ms), the VEC is
      set to 1.

   o  If a packet contains an edge in the spin signal, and that edge is
      not delayed, the VEC is set to the value of the VEC that
      accompanied the last incoming spin bit transition plus one. This
      counter holds at 3, instead of cycling around. In other words, an
      edge received with a VEC of 0 will be reflected as an edge with a
      VEC of 1; with a VEC of 1 as VEC of 2, and a VEC of 2 or 3 as a
      VEC of 3.

   This mechanism allows observers to recognize spurious edges due to
   reordering and delayed edges due to loss, since these packets will
   have been sent with VEC 0: they were not edges when they were sent.
   In addition, it allows senders to signal that a valid edge was
   delayed because the sender was application-limited: these edges are
   sent with the VEC set to 1 by the sender, prompting the VEC to count
   back up over the next RTT.

4.3.  Use of the VEC by a passive observer

   The VEC can be used by observers to determine whether an edge in the
   spin bit signal is valid or not, as follows:

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 0 is not a valid edge, but may have been caused by
      reordering or loss, or was marked as delayed by the sender. It
      should therefore be ignored.

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 1 can be used as a left edge (i.e., to start measuring an RTT
      sample), but not as a right edge (i.e., to take an RTT sample
      since the last edge).

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 2 can be used as a left edge, but not as a right edge. If the
      observation point is symmetric (i.e., it can see both upstream and
      downstream packets in the flow), the packet can also be used to
      take a component RTT sample on the segment of the path between the
      observation point and the direction in which the previous VEC 1
      edge was seen.

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 3 can be used as a left edge or right edge, and can be used to
      compute component RTT in either direction.

5.  Privacy and Security Considerations

   The privacy considerations for the latency spin bit are essentially
   the same as those for passive RTT measurement in general.

   A concern was raised during the discussion of this feature within the
   QUIC working group and the QUIC RTT Design Team that high-resolution
   RTT information might be usable for geolocation. However, an
However, an 443 evaluation based on RTT samples taken over 13,780 paths in the 444 Internet from RIPE Atlas anchoring measurements [TRILAT] shows that 445 the magnitude and uncertainty of RTT data limit the resolution of 446 geolocation information that can be derived from Internet RTT to 447 national- or continental-scale; i.e., less resolution than is 448 generally available from free, open IP geolocation databases. 450 One reason for the inaccuracy of geolocation from network RTT is that 451 Internet backbone transmission facilities do not follow the great- 452 circle path between major nodes. Instead, major geographic features 453 and the efficiency of connecting adjacent major cities both influence 454 the facility routing. An evaluation of ~3500 measurements on a mesh 455 of 25 backbone nodes in the continental United States shows that 85% 456 had RTT to great-circle error of 3ms or more, making location within 457 US State boundaries ambiguous [CONUS]. 459 Therefore, in the general case, when an endpoint's IP address is 460 known, RTT information provides negligible additional information. 462 RTT information may be used to infer the occupancy of queues along a 463 path; indeed, this is part of its utility for performance measurement 464 and diagnostics. When a link on a given path has excessive buffering 465 (on the order of hundreds of milliseconds or more), such that the 466 difference in delay between an empty queue and a full queue dwarfs 467 normal variance and RTT along the path, RTT variance during the 468 lifetime of a flow can be used to infer the presence of traffic on 469 the bottleneck link. In practice, however, this is not a concern for 470 passive measurement of congestion-controlled traffic, since any 471 observer in a situation to observe RTT passively need not infer the 472 presence of the traffic, as it can observe it directly. 
   In addition, since RTT information contains application as well as
   network delay, patterns in RTT variance from minimum, and therefore
   application delay, can be used to infer or fingerprint
   application-layer behavior. However, as with the case above, this is
   not a concern with passive measurement, since the packet size and
   interarrival time sequence, which is also directly observable,
   carries more information than RTT variance sequence.

   We therefore conclude that the high-resolution, per-flow exposure of
   RTT for passive measurement as provided by the spin bit poses
   negligible marginal risk to privacy.

   As shown in Section 2, the spin bit can be implemented separately
   from the rest of the mechanisms of the QUIC transport protocol, as it
   requires no access to any state other than that observable in the
   QUIC packet header itself. We recommend that implementations take
   advantage of this property, to reduce the risk that errors in the
   implementation could leak private transport protocol state through
   the spin bit.

   Since the spin bit is disconnected from transport mechanics, a QUIC
   endpoint implementing the spin bit that has a model of the actual
   network RTT and a target RTT to expose can "lie" about its spin bit
   transitions, by anticipating or delaying observed transitions, even
   without coordination with and the collusion of the other endpoint.
   This is not the case with TCP, which requires coordination and
   collusion to expose false information via its sequence and
   acknowledgment numbers and its timestamp option. When passive
   measurement is used for purposes where one endpoint might gain a
   material advantage by representing a false RTT, e.g. SLA
   verification or enforcement of telecommunications regulations, this
   situation raises a question about the trustworthiness of spin bit RTT
   measurements.

   This issue must be appreciated by users of spin bit information, but
   mitigation is simple, as QUIC implementations designed to lie about
   RTT through spin bit modification can easily be detected. A lying
   server can be contacted by an honest client under the control of a
   verifying party, and the client's RTT estimate compared with the
   spin-bit exposed estimate. Though in the general case, it is
   impossible to verify explicit path signals with two complicit
   endpoints (see [WIRE-IMAGE]), a lying server/client pair may be
   subject to dynamic analysis along paths with known RTTs. We consider
   the ease of verification of lying in situations where this would be
   prohibited by regulation or contract, combined with the consequences
   of violation of said regulation or contract, to be a sufficient
   incentive in the general case not to do it.

6.  Acknowledgments

   Many thanks to Christian Huitema, who originally proposed the spin
   bit as pull request 609 on [QUIC-TRANS]. Thanks to Tobias Buehler
   for feedback on the draft, and to Alexandre Ferrieux for input on
   the Valid Edge Counter. Special thanks to the QUIC RTT Design Team
   for discussions leading especially to the privacy and security
   considerations section.

   This work is partially supported by the European Commission under
   Horizon 2020 grant agreement no. 688421 Measurement and Architecture
   for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
   for Education, Research, and Innovation under contract no. 15.0268.
   This support does not imply endorsement.

7.  References

7.1.  Normative References

   [QUIC-SPIN-EXP]
              Trammell, B. and M. Kuehlewind, "The QUIC Latency Spin
              Bit", draft-ietf-quic-spin-exp (work in progress).

7.2.  Informative References

   [ALT-MARK]
              Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L.,
              Chen, M., Zheng, L., Mirsky, G., and T.
              Mizrahi,
              "Alternate Marking method for passive and hybrid
              performance monitoring", draft-ietf-ippm-alt-mark-14 (work
              in progress), December 2017.

   [CACM-TCP]
              Strowes, S., "Passively Measuring TCP Round-Trip Times (in
              Communications of the ACM)", October 2013.

   [CARRA-RTT]
              Carra, D., Avrachenkov, K., Alouf, S., Blanc, A., Nain,
              P., and G. Post, "Passive Online RTT Estimation for
              Flow-Aware Routers Using One-Way Traffic (NETWORKING 2010,
              LNCS 6091, pp. 109-121)", 2010.

   [CONUS]    Morton, A., "Comparison of Backbone Node RTT and Great
              Circle Distances (https://github.com/acmacm/CONUS-RTT)",
              September 2017.

   [IMC-CONGESTION]
              Luckie, M., Dhamdhere, A., Clark, D., Huffaker, B., and k.
              claffy, "Challenges in Inferring Internet Interdomain
              Congestion (in Proc. ACM IMC 2014)", November 2014.

   [IMC-TCPSIG]
              Sundaresan, S., Dhamdhere, A., Allman, M., and k. claffy,
              "TCP Congestion Signatures (in Proc. ACM IMC 2017)", n.d.

   [MINQ]     Rescorla, E., "MINQ, a simple Go implementation of QUIC
              (https://github.com/ekr/minq)", November 2017.

   [MOKUMOKUREN]
              Trammell, B., "Mokumokuren, a lightweight flow meter using
              gopacket (https://github.com/britram/mokumokuren)",
              November 2017.

   [NOSPIN]   Morton, A., "Description of a tool chain to evaluate
              Unidirectional Passive RTT measurement (and results)
              (https://github.com/acmacm/PassiveRTT)", October 2017.

   [QUIC-MGT]
              Kuehlewind, M. and B. Trammell, "Manageability of the QUIC
              Transport Protocol", draft-ietf-quic-manageability-01
              (work in progress), October 2017.

   [QUIC-TRANS]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-10 (work
              in progress), March 2018.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, DOI 10.17487/RFC0792, September 1981.

   [RFC4433]  Kulkarni, M., Patel, A., and K.
              Leung, "Mobile IPv4
              Dynamic Home Agent (HA) Assignment", RFC 4433,
              DOI 10.17487/RFC4433, March 2006.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov,
              S., and J. Perser, "Packet Reordering Metrics", RFC 4737,
              DOI 10.17487/RFC4737, November 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008.

   [RFC6049]  Morton, A. and E. Stephan, "Spatial Composition of
              Metrics", RFC 6049, DOI 10.17487/RFC6049, January 2011.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [SHBAIR]   Shbair, W., Cholez, T., Francois, J., and I. Chrisment, "A
              multi-level framework to identify HTTPS services (in Proc.
              IEEE/IFIP NOMS)", April 2016.

   [SPINBIT-REPORT]
              De Vaere, P., "Latency Spinbit Implementation Experience
              (https://devae.re/f/eth/quic/spinbit_report/)", November
              2017.

   [TLS]      Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", draft-ietf-tls-tls13-28 (work in progress),
              March 2018.

   [TMA-QOF]  Trammell, B., Gugelmann, D., and N. Brownlee, "Inline Data
              Integrity Signals for Passive Measurement (in Proc. TMA
              2014)", April 2014.

   [TOKYO-PING]
              Pelsser, C., Cittadini, L., Vissicchio, S., and R. Bush,
              "From Paris to Tokyo - On the Suitability of ping to
              Measure Latency (In Proc. ACM IMC 2014)", October 2014.

   [TRILAT]   Trammell, B., "On the Suitability of RTT Measurements for
              Geolocation
              (https://github.com/britram/trilateration/blob/paper-rev-1/paper.ipynb)",
              August 2017.

   [WIRE-IMAGE]
              Trammell, B. and M. Kuehlewind, "The Wire Image of a
              Network Protocol", draft-trammell-wire-image-03 (work in
              progress), April 2018.

   [WWMM-BLOAT]
              Alfredsson, S., Giudice, G., Garcia, J., Brunstrom, A.,
              Cicco, L., and S. Mascolo, "Impact of TCP Congestion
              Control on Bufferbloat in Cellular Networks (in Proc. IEEE
              WoWMoM 2013)", June 2013.

Appendix A.  Experimental Evaluation

   We have evaluated the effectiveness of the spin bit in an emulated
   network environment. The spin bit was added to a fork of [MINQ],
   using the mechanism described in Section 2, but with the spin bit
   appearing in a measurement byte added to the header for passive
   measurability experiments. Spin bit measurement support was added to
   [MOKUMOKUREN]. Full results of these ongoing experiments are
   available online in [SPINBIT-REPORT], but we summarize our findings
   here.

   First, we confirm that the spin bit works as advertised: it provides
   one useful RTT sample per RTT to any passive observer of the flow.
   This sample tracks each sender's local instantaneous estimate of RTT
   as well as the expected RTT (i.e., defined by the emulation) fairly
   well. One surprising implication of this is that the spin bit
   provides _more_ information than is available by local estimation to
   an endpoint which is mostly receiving data frames and sending mainly
   ACKs, and as such can also be useful in purely endpoint-local
   observations of the RTT evolution during the flow. The spin bit also
   works correctly under moderate to heavy packet loss and jitter.

   Second, we confirm that the spin bit can be easily implemented
   without requiring deep integration into a QUIC implementation.
   Indeed, it could be implemented completely independently, as a shim,
   aside from the requirement that the spin bit value be
   integrity-protected along with the rest of the QUIC header.

   Third, we performed experiments focused on the intermittent-sender
   problem described in Section 3.1.
We confirm that the spin bit does not provide useful RTT samples after the handshake when packets are only sent intermittently. Simple heuristics can be used to recognize this situation, however, and to reject these RTT samples. We also find that a simple sender-side heuristic can be used to determine whether a sample will be useful. If a sender sends a packet more than a specified delay (e.g. 1ms) after the last packet received by the client, it knows that any latency spin observation of that packet will be invalid. If a second "spin valid" bit were available, the sender could then mark that packet "spin invalid". Our experiments show that this simple heuristic and spin validity bit are successful in marking all packets whose RTT samples should be rejected.

Fourth, we performed experiments focused on the reordering problem described in Section 3.1. We find that while reordering can cause spurious samples at a naive observer, two simple approaches can be used to reject spurious RTT samples due to reordering. First, a two-bit spin signal that always advances in a single direction (e.g. 00 -> 01 -> 10 -> 11) successfully rejects all reordered samples, including under amounts of reordering that render the transport itself mostly useless. However, adding a bit is not necessary: having the observer keep the least significant bits of the packet number, and rejecting samples from packets that reverse the sequence [RFC4737], as suggested in Section 3.1, is essentially as successful as a two-bit spin signal in mitigating the effects of reordering on RTT measurement.

Fifth, we performed parallel active measurements using ping, as described in Appendix C.2. In our emulated network, the ICMP packets and the QUIC packets traverse the same links with the same treatment, and share queues at each link, which mitigates most of the issues with ping.
We find that while ping works as expected in measuring end-to-end RTT, it does not track the sender's estimate of RTT, and as such does not measure the RTT experienced by the application layer as well as the spin bit does.

In summary, our experiments show that the spin bit is suitable for purpose, can be implemented with minimal disruption, and that most of the identified problems can be easily mitigated. See [SPINBIT-REPORT] for more.

Appendix B. Use Cases for Passive RTT Measurement

This section describes use cases for passive RTT measurement. Most of these are currently achieved with TCP, i.e., the matching of packets based on sequence and acknowledgment numbers, or timestamps and timestamp echoes, in order to generate upstream and downstream RTT samples which can be added to get end-to-end RTT. These use cases could be achieved with QUIC by replacing sequence/acknowledgment and timestamp analysis with spin bit analysis, as described in Section 3.

In any case, the measurement methodology follows one of a few basic variants:

o  The RTT evolution of a flow or a set of flows can be compared to baseline or expected RTT measurements for flows with the same characteristics in order to detect or localize latency issues in a specific network.

o  The RTT evolution of a single flow can also be examined in detail to diagnose performance issues with that flow.

o  The spin bit can be used to generate a large number of samples of RTT for a flow aggregate (e.g., all flows between two given networks) without regard to temporal evolution of the RTT, in order to examine the distribution of RTTs for a group of flows that should have similar RTT (e.g., because they should share the same path(s)).

B.1. Inter-domain Troubleshooting

Network access providers are often the first point of contact for their customers when network problems impact the performance of bandwidth-intensive and latency-sensitive applications such as video, regardless of whether the root cause lies within the access provider's network, the service provider's network, on the Internet paths between them, or within the customer's own network.

Network performance is currently measured by on-path points of presence, which extract spatial delay and loss metrics [RFC6049] from fields of the transport layer (e.g. TCP) or of the application layer (e.g. RTP). The information is captured in the upper layers because neither the IP header nor the UDP layer includes fields allowing the measurement of upstream and downstream delay and loss.

Local network performance problems are detected with monitoring tools which observe the variation of upstream metrics and downstream metrics.

Inter-domain troubleshooting relies on the same metrics but is not a pro-active task. It is a recursive process which homes in on the domain and link responsible for the failure. In practice, inter-domain troubleshooting is a communication process between the Network Operations Center (NOC) teams of the networks on the path, because the root cause of a problem is rarely located on a single network, and requires cooperation and exchange of data between the NOCs.

One example is troubleshooting the performance degradation resulting from a change of routing policy on one side of the path which increases the burden on a defective line card of a device located somewhere on the path. The card's misbehavior introduces abnormal packet reordering only in the traffic exchanged at line rate.

Other examples are similar in terms of cooperation requirements and the need to refer to measurements.
NOCs need to share the same measurement metrics and to measure these metrics on the same fields of the packet to enable a minimal level of technical cooperation.

Experimentation with the spin bit (see Appendix A) has shown that it can replace the current RTT measurement opportunities based on clear-text transport or application header fields with a standard approach for passively measuring upstream and downstream RTT, which are fundamental metrics for this diagnostic process.

B.2. Two-Point Intradomain Measurement

The spin bit is also useful as a basic signal for instantaneous measurement of the treatment of QUIC traffic within a single network. Though the primary design goal of the spin bit signal is to enable single-observer on-path measurement of end-to-end RTT, the spin bit can also be used by two cooperating observers with access to traffic flowing in the same direction as an alternate marking signal, as described in [ALT-MARK]. The only difference from alternate marking with a generated signal is that the size of the alternation will change with the flight size each RTT. However, these changes do not affect the applicability of the method, which works on each marking batch separately between two measurement points in the same direction. This two-point measurement is an additional feature enabled "for free" by the spin bit signal.

So, with more than one observer in the same direction, it can be useful to segment the RTT and deduce the contribution to the RTT of the portion of the network between two on-path observers. This can be easily performed by calculating the delay between two or more measurement points in a single direction by applying [ALT-MARK]. In this way, packet loss, delay and delay variation can be measured for each segment of the network depending on the number and distribution of the available on-path observation points.
When these observation points are applied at network borders, the alternate-marking signal can be used to measure the performance of QUIC traffic within a network operator's own domain of responsibility.

B.3. Bufferbloat Mitigation in Cellular Networks

Cellular networks consist of multiple Radio Access Networks (RAN) where mobile devices are attached to base stations. It is common that base stations from different vendors and different generations are deployed in the same cellular network.

Due to the dynamic nature of RANs, base stations have typically been provisioned with large buffers to maximize throughput despite rapid changes in capacity. As a side effect, bufferbloat has become a common issue in such networks [WWMM-BLOAT].

An effective way of mitigating bufferbloat without sacrificing too much throughput is to deploy Active Queue Management (AQM) in bottleneck routers and base stations. However, due to the variation in deployed base stations it is not always possible to enable AQM at the bottlenecks without massive infrastructure investments.

An alternative approach is to deploy AQM as a network function in a more centralized location than the traditional bottleneck nodes. Such an AQM monitors the RTT progression of flows and drops or marks packets when the measured latency is indicative of congestion. Such a function also has the possibility to detect misbehaving flows and reduce the negative impact they have on the network.

B.4. Locating WiFi Problems in Home Networks

Many residential networks use WiFi (802.11) on the last segment, and WiFi signal strength degradation manifests as high first-hop delay, due to the fact that the MAC layer will retransmit packets lost at that layer.
Measuring the RTT between endpoints on the customer network and parts of the service provider's own infrastructure (which have predictable delay characteristics) can be used to isolate this cause of performance problems.

The network provider can measure the RTT and packet loss in the home gateway, or at an upstream point if there is no access to the home gateway. A problem in the WiFi network is identified by seeing high delay and low packet loss.

These measurements are particularly useful for traffic which is latency sensitive, such as interactive video applications. However, since high latency is often correlated with other network-layer issues such as chronic interconnect congestion [IMC-CONGESTION], it is useful for general troubleshooting of network layer issues in an interdomain setting.

In this case, multiple RTT samples per flow are useful less for observing intraflow behavior, and more for generating sufficient samples for a given aggregate to make a high-quality measurement.

B.5. Internet Measurement Research

As a large, distributed, engineered system with no centralized control, the Internet has emergent properties of interest to the research community not just for purely scientific curiosity, but also to provide applicable guidance to Internet engineering, Internet protocol design and development, network operations, and policy development. Latency measurements in particular are both an active area of research and an important tool for certain measurement studies (see, e.g. [IMC-TCPSIG], from the most recent Internet Measurement Conference).
While much of this work is currently done with active measurements, the ability to generate latency samples passively or using a hybrid measurement approach (i.e., through passive observation of purpose-generated active measurement traffic; see [RFC7799]) can drastically increase the efficiency and scalability of these studies. A latency spin bit would make these techniques applicable to QUIC, as well.

Appendix C. Alternate RTT Measurement Approaches for Diagnosing QUIC flows

There are three broad alternatives to explicit signaling for measuring the RTT experienced by QUIC flows.

C.1. Handshake RTT measurement

The first of these is handshake RTT measurement. As described in [QUIC-MGT], the packets of the QUIC handshake are distinguishable on the wire in such a way that they can be used for one RTT measurement sample per flow: the delay between the client initial and the server cleartext packet can be used to measure "upstream" RTT (between the observer and the server), and the delay between the server cleartext packet and the next client cleartext packet can be used to measure "downstream" RTT (between the client and the observer). When RTT measurements are used in large aggregates (all flows traversing a large link, for example), a methodology based on handshake RTT could be used to generate sufficient samples for some purposes without the spin bit.

However, this methodology would rely on the assumption that the difference between handshake RTT and nominal in-flow RTT is negligible.
Specifically, (1) any additional delay required to compute any cryptographic parameters must be negligible with respect to network RTT; (2) any additional delay required to establish state along the path must be negligible with respect to network RTT; and (3) network treatment of initial packets in a flow must be identical to that of later packets in the flow. When these assumptions cannot be shown to hold, spin-bit based RTT measurement is preferable to handshake RTT measurement, even for applications for which handshake RTT measurement would otherwise be suitable.

C.2. Parallel active measurement

The second alternative is parallel active measurement: using ICMP Echo Request and Reply [RFC0792] [RFC4433], a dedicated measurement protocol like TWAMP [RFC5357], or a separate diagnostic QUIC flow to measure RTT. Regardless of protocol, the active measurement must be initiated by a client on the same network as the client of the QUIC flow(s) of interest, or a network close by in the Internet topology, toward the server. Note that there is no guarantee that ICMP flows will receive the same network treatment as the flows under study, both due to differential treatment of ICMP traffic and due to ECMP routing (see e.g. [TOKYO-PING]). TWAMP and QUIC diagnostic flows, though both use UDP, have similar issues regarding ECMP. However, in situations where the entity doing the measurement can guarantee that the active measurement traffic will traverse the subpaths of interest (e.g. residential access network measurement under a network architecture and business model where the network operator owns the CPE), active measurement can be used to generate RTT samples at the cost of at least two non-productive packets sent through the network per sample.

C.3. Frequency Analysis

The third alternative, proposed during the QUIC RTT design team process, relies on the inter-packet spacing to convey information about the RTT, and would therefore allow measurements confined to a single direction of transmission, as described in [CARRA-RTT].

We evaluated the applicability of this work to passive RTT measurement in QUIC, and found it wanting. We assembled a toolchain, as described in [NOSPIN], that allowed evaluation of a critical aspect of the [CARRA-RTT] method: extraction of inter-packet times of real packet streams and the analysis of frequencies present in the packet stream using the Lomb-Scargle Periodogram. Several streams were evaluated, as summarized below:

o  It seems that Carra et al. [CARRA-RTT] took the noisy and low-confidence results of a statistical process (no RTT-related frequency has been detected even after using very low alpha confidence) and added heuristics with sliding-window averaging to infer the fundamental frequency and RTT present in a unidirectional stream.

o  There appear to be several limitations on the streams that are applicable. Streams with long RTT (~50ms) are more likely to be suitable (having a better match between packet rate and relatively low frequencies to detect).

o  None of the TCP streams analysed (to date) possess a sufficient packet rate such that the measured fundamental frequency or the multiples of the fundamental are actually within the detectable range.

o  "Ideal" interarrival time streams were simulated with uniform sampling and period. The Lomb-Scargle Periodogram is surprisingly unable to detect the fundamental frequency at 100 Hz from the constant 10 ms packet spacing.

o  It is not clear whether IETF QUIC protocol streams will possess the same inter-packet arrival time features as TCP streams. Also, Carra et al.
note that their process may not work if the TCP stream encounters a bottleneck, which would be an essential circumstance for network troubleshooting. Mobile networks with time-slot service disciplines would likely cause issues similar to those of a bottleneck, by imposing their time-slot interval on the spacing of most packets.

o  The Carra et al. [CARRA-RTT] calculation of minimum and maximum frequencies that can be detected may not be applicable when the inter-arrival times both are the signal being detected and govern the non-uniform sampling frequency.

Appendix D. Greasing

Routes, congestion levels and therefore latency between two fixed QUIC endpoints, as well as the shape of individual application flows, fluctuate in ways that are not totally predictable by an on-path observer. In general, there is no a-priori pattern for the spin-bit distribution that will always materialise on a certain flow aggregate, even for a single user.

There has been discussion in the QUIC working group that greasing could be a strategy to counter an evil access provider that might gate access to its users on a valid spin bit signal. Let's accept for a moment this threat model and consider the practical case of a home gateway that temporarily misbehaves, for example draining its queues more slowly than it would normally do while a firmware download is in progress. It would be ill-considered for an access provider (even a malicious one) to block, or otherwise interfere with, QUIC flows originating from behind that CPE solely based on the fact that RTTs are now different from "usual". In fact, providing a numerical assessment of what such "usual" RTT looks like would necessarily include many paths with different lengths, and considerable RTT variability within any fixed path, which is clearly beyond most ISPs' reach.
But even assuming it were, there is a simple cost-benefit counterargument here that the same effect (i.e., gating traffic from or to a given user based on observed traffic patterns) could be achieved with much cheaper and more effective means (e.g., [SHBAIR]).

So, the potential for ossification appears to be extremely low. Since it depends on so much external noise, the spin-bit result variability is self-greasing to an extent. In fact, implementing explicit greasing around the spin-bit might even be harmful as it would potentially erode confidence in the veracity of the signal.

However, if a greasing algorithm is really needed (for example, if we want to reuse the bit with different semantics in the future, i.e., the spin-bit is not included in the header invariants), one very simple implementation would be as follows: each server will refuse to spin its bit on a per-flow basis with a given probability p, instead leaving it stuck to a randomly chosen value, 0 or 1. The client will then end up leaving its bit stuck to the opposite value, or could detect this condition and also pick a randomly chosen stuck value. The value chosen for p must be small enough to let the spin-bit mechanics work and large enough not to be seen as an error instead of an intentional protocol feature.

Authors' Addresses

Brian Trammell (editor)
ETH Zurich
Email: ietf@trammell.ch

Piet De Vaere
ETH Zurich
Email: piet@devae.re

Roni Even
Huawei
Email: roni.even@huawei.com

Giuseppe Fioccola
Telecom Italia
Email: giuseppe.fioccola@telecomitalia.it

Thomas Fossati
Nokia
Email: thomas.fossati@nokia.com

Marcus Ihlar
Ericsson
Email: marcus.ihlar@ericsson.com

Al Morton
AT&T Labs
Email: acmorton@att.com

Emile Stephan
Orange
Email: emile.stephan@orange.com