QUIC                                                    B. Trammell, Ed.
Internet-Draft                                               P. De Vaere
Intended status: Informational                                ETH Zurich
Expires: November 15, 2018                                       R. Even
                                                                  Huawei
                                                             G. Fioccola
                                                          Telecom Italia
                                                              T. Fossati
                                                                   Nokia
                                                                M. Ihlar
                                                                Ericsson
                                                               A. Morton
                                                               AT&T Labs
                                                              E. Stephan
                                                                  Orange
                                                            May 14, 2018


   Adding Explicit Passive Measurability of Two-Way Latency to the QUIC
                           Transport Protocol
                      draft-trammell-quic-spin-03

Abstract

   This document describes the addition of a "spin bit", intended for
   explicit measurability of end-to-end RTT, to the QUIC transport
   protocol. It proposes a detailed mechanism for the spin bit, as well
   as an additional mechanism, called the valid edge counter, to
   increase the fidelity of the latency signal in less than ideal
   network conditions. It describes how to use the latency spin signal
   to measure end-to-end latency, discusses corner cases and their
   workarounds in the measurement, describes experimental evaluation of
   the mechanism done to date, and examines the utility and privacy
   implications of the spin bit.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 15, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  About This Document
   2.  The Spin Bit Mechanism
   3.  Using the Spin Bit for Passive RTT Measurement
     3.1.  Limitations and Workarounds
     3.2.  Illustration
   4.  The Valid Edge Counter
     4.1.  Proposed Short Header Format Including Spin Bit and VEC
     4.2.  Setting the Valid Edge Counter (VEC)
     4.3.  Use of the VEC by a passive observer
   5.  Privacy and Security Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Experimental Evaluation
   Appendix B.  Use Cases for Passive RTT Measurement
     B.1.  Inter-domain Troubleshooting
     B.2.  Two-Point Intradomain Measurement
     B.3.  Bufferbloat Mitigation in Cellular Networks
     B.4.  Locating WiFi Problems in Home Networks
     B.5.  Internet Measurement Research
   Appendix C.  Alternate RTT Measurement Approaches for Diagnosing
                QUIC flows
     C.1.  Handshake RTT measurement
     C.2.  Parallel active measurement
     C.3.  Frequency Analysis
   Appendix D.  Greasing
   Authors' Addresses

1.  Introduction

   The QUIC transport protocol [QUIC-TRANS] is a UDP-encapsulated
   protocol integrated with Transport Layer Security (TLS) [TLS] to
   encrypt most of its protocol internals. The exceptions are the
   handshake packets needed to establish or resume a TLS session, and
   the information required to reassemble QUIC streams (the packet
   number) and to route QUIC packets to the correct machine in a
   load-balancing situation (the connection ID). As a result, QUIC's
   wire image (see [WIRE-IMAGE]) exposes much less information about
   transport protocol state than TCP's wire image. Specifically,
   because the sequence and acknowledgement numbers and timestamps
   available in TCP cannot be seen by on-path observers in QUIC,
   passive TCP loss and latency measurement techniques that rely on
   this information (e.g., [CACM-TCP], [TMA-QOF]) cannot easily be
   ported to QUIC.

   This document proposes a solution to this problem by adding a
   "latency spin bit" to the QUIC short header. This bit is designed
   solely for explicit passive measurability of the protocol. It
   provides one RTT sample per RTT to passive observers of QUIC traffic.
   This document describes the mechanism, how it can be added to QUIC,
   and how it can be used by passive measurement facilities to generate
   RTT samples. It explores potential corner cases and shortcomings of
   the mechanism, and proposes an extension called the Valid Edge
   Counter (VEC) to mitigate them.
   It further details findings on privacy risk researched by the QUIC
   RTT Design Team, which was tasked by the IETF QUIC Working Group to
   determine the risk/utility tradeoff for the spin bit.

   Appendices summarize experimental results to date with an
   implementation of the spin bit built atop a recent QUIC
   implementation, describe use cases for passive RTT measurement at the
   resolution provided by the spin bit, explore alternatives to the spin
   bit for passive latency measurement of QUIC flows, and discuss the
   necessity of "greasing" the spin bit.

   The spin bit has low overhead, presents negligible privacy risk, and
   has clear utility in providing passive RTT measurability of QUIC that
   is far superior to QUIC's measurability without the spin bit, and
   equivalent to or better than TCP passive measurability.

1.1.  About This Document

   [QUIC-SPIN-EXP] specifies the addition of the spin bit to the QUIC
   transport protocol for experimental purposes. This document provides
   background for that specification, documents work done in the
   development of the spin bit proposal, and extends it with the VEC
   signal for loss, reordering, and delay compensation without relying
   on the QUIC packet number.

   This document is maintained in the GitHub repository
   https://github.com/britram/draft-trammell-quic-spin, and the editor's
   copy is available online at
   https://britram.github.io/draft-trammell-quic-spin. Current open
   issues on the document can be seen at
   https://github.com/britram/draft-trammell-quic-spin/issues. Comments
   and suggestions on this document can be made by filing an issue
   there, or by contacting the editor.

2.  The Spin Bit Mechanism

   The latency spin bit enables latency monitoring from observation
   points on the network path.
   Each endpoint, client and server, maintains a spin value, 0 or 1,
   for each QUIC connection, and sets the spin bit on packets it sends
   for that connection to the appropriate value (below). It also
   maintains the highest packet number seen from its peer on the
   connection. The value is then determined at each endpoint as
   follows:

   o  The server initializes its spin value to 0. When it receives a
      packet from the client, if that packet has a short header and if
      it increments the highest packet number seen by the server from
      the client, it sets the spin value to the spin bit in the received
      packet.

   o  The client initializes its spin value to 0. When it receives a
      packet from the server, if the packet has a short header and if it
      increments the highest packet number seen by the client from the
      server, it sets the spin value to the opposite of the spin bit in
      the received packet.

   This procedure will cause the spin bit to change value in each
   direction once per round trip. Observation points can estimate the
   network latency by observing these changes in the latency spin bit,
   as described in Section 3. See Section 3.2 for an illustration of
   this mechanism in action.

   The details of the addition of the spin bit to the QUIC short header
   are given in [QUIC-SPIN-EXP].

3.  Using the Spin Bit for Passive RTT Measurement

   When a QUIC flow is sending at full rate (i.e., neither application
   nor flow control limited), the latency spin bit in each direction
   changes value once per round-trip time (RTT). An on-path observer
   can observe the time difference between edges in the spin bit signal
   in a single direction to measure one sample of end-to-end RTT.
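   The following Python sketch ties the endpoint rules of Section 2 to
   this single-direction measurement. It is illustrative only and
   assumes the simplified model of Section 3.2: every packet has a
   short header, each direction is a five-slot queue advanced once per
   tick, both endpoints send one packet per tick, and arrivals are
   processed before each endpoint sends. The names Endpoint, Observer,
   and simulate are hypothetical and do not come from [QUIC-SPIN-EXP]
   or from any QUIC implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Packet:
    pn: int    # packet number as known to the endpoints
    spin: int  # spin bit in the short header


class Endpoint:
    """Per-connection spin state following the rules of Section 2."""

    def __init__(self, is_client: bool) -> None:
        self.is_client = is_client
        self.spin = 0         # both endpoints initialize the spin value to 0
        self.largest_pn = -1  # highest packet number seen from the peer
        self.next_pn = 0

    def on_receive(self, pkt: Packet) -> None:
        # Sketch assumption: every packet carries a short header.
        if pkt.pn <= self.largest_pn:
            return  # does not advance the highest packet number seen
        self.largest_pn = pkt.pn
        # The server reflects the received bit; the client inverts it.
        self.spin = 1 - pkt.spin if self.is_client else pkt.spin

    def send(self) -> Packet:
        pkt = Packet(self.next_pn, self.spin)
        self.next_pn += 1
        return pkt


class Observer:
    """Single-direction observer: the gap between spin edges is one RTT sample."""

    def __init__(self) -> None:
        self.last_spin: Optional[int] = None
        self.last_edge: Optional[int] = None
        self.samples: List[int] = []

    def observe(self, t: int, spin: int) -> None:
        if self.last_spin is not None and spin != self.last_spin:
            if self.last_edge is not None:
                self.samples.append(t - self.last_edge)
            self.last_edge = t
        self.last_spin = spin


def simulate(ticks: int, slots: int = 5) -> List[int]:
    """Run the Section 3.2 queue model and return the observer's samples."""
    client, server = Endpoint(is_client=True), Endpoint(is_client=False)
    # One fixed-length queue per direction; None marks an empty slot.
    upstream: Deque[Optional[Packet]] = deque([None] * slots)    # client -> server
    downstream: Deque[Optional[Packet]] = deque([None] * slots)  # server -> client
    observer = Observer()  # watches client -> server packets next to the client
    for t in range(ticks):
        # The packet at the head of each queue arrives during this tick.
        to_server, to_client = upstream.popleft(), downstream.popleft()
        if to_server is not None:
            server.on_receive(to_server)
        if to_client is not None:
            client.on_receive(to_client)
        # Each endpoint then sends one packet carrying its current spin value.
        pkt = client.send()
        observer.observe(t, pkt.spin)
        upstream.append(pkt)
        downstream.append(server.send())
    return observer.samples


print(simulate(60))  # [10, 10, 10, 10, 10]: one 10-tick sample per RTT
```

   The first edge only starts a measurement; every later edge closes
   one sample. Shrinking each queue to three slots (simulate(30,
   slots=3)) yields samples of 6 ticks instead, tracking the shorter
   RTT.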

   Note that this measurement, as with passive RTT measurement for TCP,
   includes any transport protocol delay (e.g., delayed sending of
   acknowledgements) and/or application layer delay (e.g., waiting for a
   request to complete). It therefore provides on-path devices with a
   good instantaneous estimate of the RTT as experienced by the
   application. A simple linear smoothing or moving minimum filter can
   be applied to the stream of RTT information to get a more stable
   estimate.

   An on-path observer that can see traffic in both directions (from
   client to server and from server to client) can also use the spin bit
   to measure "upstream" and "downstream" component RTT; i.e., the
   component of the end-to-end RTT attributable to the paths between the
   observer and the server and the observer and the client,
   respectively. It does this by measuring the delay between a spin
   edge observed in the upstream direction and that observed in the
   downstream direction, and vice versa.

3.1.  Limitations and Workarounds

   Application-limited and flow-control-limited senders can have
   application and transport layer delay, respectively, that are much
   greater than network RTT. Therefore, the spin bit provides network
   latency information only when the sender is neither application nor
   flow control limited. When the sender is application-limited by
   periodic application traffic, where that period is longer than the
   RTT, measuring the spin bit provides information about the
   application period, not the RTT. Simple heuristics based on the
   observed data rate per flow or changes in the RTT series can be used
   to reject bad RTT samples due to application or flow control
   limitation.

   Since the spin bit logic at each endpoint considers only packets
   that advance the largest packet number seen, signal generation
   itself is resistant to reordering.
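   As a minimal sketch of this property, the following Python fragment
   replays the arrival order of Figure 6 (Section 3.2) at a client that
   applies the Section 2 rule, and at a hypothetical naive client that
   ignores packet numbers. The packet numbers 1 to 5 assigned to
   packets A to E, and the names ClientSpin, check_pn, and replay, are
   assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    pn: int    # packet number as seen by the receiving endpoint
    spin: int  # spin bit carried in the short header


class ClientSpin:
    """Client-side spin value update from Section 2 (sketch only)."""

    def __init__(self, check_pn: bool = True) -> None:
        self.check_pn = check_pn  # hypothetical switch to disable the rule
        self.spin = 0             # the client initializes its spin value to 0
        self.largest_pn = -1      # highest packet number seen from the server

    def on_receive(self, pkt: Packet) -> int:
        # Only a packet that advances the highest packet number seen may
        # update the spin value; stale (reordered) packets are ignored.
        if not self.check_pn or pkt.pn > self.largest_pn:
            self.largest_pn = max(self.largest_pn, pkt.pn)
            self.spin = 1 - pkt.spin  # the client inverts the received bit
        return self.spin              # value the client now puts on the wire


def replay(check_pn: bool) -> List[int]:
    # Sent in order A..E with spin 1, 1, 0, 0, 0: C carries the 1->0 edge.
    a, b, c, d, e = (Packet(1, 1), Packet(2, 1), Packet(3, 0),
                     Packet(4, 0), Packet(5, 0))
    client = ClientSpin(check_pn)
    # B is reordered behind C, so the packets arrive as A, C, B, D, E.
    return [client.on_receive(p) for p in (a, c, b, d, e)]


print(replay(check_pn=True))   # [0, 1, 1, 1, 1]: B's stale bit is ignored
print(replay(check_pn=False))  # [0, 1, 0, 1, 1]: spurious flip caused by B
```

   The naive variant would emit a spurious pair of spin edges toward the
   server; the packet number check at the endpoint suppresses them.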

   However, reordering can cause problems at an observer by causing
   spurious edge detection, and therefore low RTT estimates, if
   reordering occurs across a spin bit flip in the stream. This can be
   probabilistically mitigated by the observer also tracking the
   low-order bits of the packet number, and rejecting edges that appear
   out-of-order [RFC4737].

   All of these limitations are addressed by an enhancement to the spin
   bit, the Valid Edge Counter, described in detail in Section 4.

3.2.  Illustration

   To illustrate the operation of the spin bit, we consider a simplified
   model of a single path between client and server as a queue with
   slots for five packets in each direction, and assume that both client
   and server send packets at a constant rate. If each packet moves one
   slot in the queue per clock tick, this network has an RTT of 10
   ticks.

   Initially, during connection establishment, no packets with a spin
   bit are in flight, as shown in Figure 1.

      +--------+ - - - - - +--------+
      |        | --------> |        |
      | Client |           | Server |
      |        | <-------- |        |
      +--------+ - - - - - +--------+

     Figure 1: Initial state, no spin bit between client and server

   Either the server, the client, or both can begin sending packets with
   short headers after connection establishment, as shown in Figure 2;
   here, no spin edges are yet in transit.

      +--------+ 0 0 - - - +--------+
      |        | --------> |        |
      | Client |           | Server |
      |        | <-------- |        |
      +--------+ - - 0 0 0 +--------+

       Figure 2: Client and server begin sending packets with spin 0

   Once the server's first 0-marked packet arrives at the client, the
   client sets its spin value to 1, and begins sending packets with the
   spin bit set, as shown in Figure 3. The spin edge is now in transit
   toward the server.

      +--------+ 1 0 0 0 0 +--------+
      |        | --------> |        |
      | Client |           | Server |
      |        | <-------- |        |
      +--------+ 0 0 0 0 0 +--------+

                  Figure 3: The bit begins spinning

   Five ticks later, this packet arrives at the server, which takes its
   spin value from it and reflects that value back on the next packet it
   sends, as shown in Figure 4. The spin edge is now in transit toward
   the client.

      +--------+ 1 1 1 1 1 +--------+
      |        | --------> |        |
      | Client |           | Server |
      |        | <-------- |        |
      +--------+ 0 0 0 0 1 +--------+

                Figure 4: Server reflects the spin edge

   Five ticks later, the 1-marked packet arrives at the client, which
   inverts its spin value and sends the inverted value on the next
   packet it sends, as shown in Figure 5.

   obs. points   X    Y
      +--------+ 0 1 1 1 1 +--------+
      |        | --------> |        |
      | Client |           | Server |
      |        | <-------- |        |
      +--------+ 1 1 1 1 1 +--------+
                      Y

                Figure 5: Client inverts the spin edge

   Here we can also see how measurement works. An observer watching the
   signal at the single observation point X in Figure 5 will see an edge
   every 10 ticks, i.e., once per RTT. An observer watching the signal
   at a symmetric observation point Y in Figure 5 will see a
   server-client edge 4 ticks after the client-server edge, and a
   client-server edge 6 ticks after the server-client edge, allowing it
   to compute component RTT.

   Figure 6 shows how this mechanism works in the presence of
   reordering. Here, packet C carries the spin edge, and packet B is
   reordered on the way to the client. In this case, the client will
   begin sending spin 1 after the arrival of C, and ignore the spin bit
   flip to 1 on packet B, since B < C; i.e., B does not increase the
   highest packet number seen.

      +--------+ 0 0 0 0 0 +--------+
      |        | --------> |        |
      | Client |           | Server |
      |        | <-------- |        |
      +--------+ 1 0 1 0 0 +--------+
             PN= A C B D E

                     Figure 6: Handling reordering

4.  The Valid Edge Counter

   This mechanism is intended to provide additional information about
   the validity of the passively observed spin edges without using
   information from a cleartext packet number.

   A one-bit spin signal is resistant to reordering during signal
   generation, since the spin value is only updated at each endpoint on
   a packet that advances the highest packet number seen. However,
   without using the packet number, a passive observer can detect
   neither reordered nor lost edges, and it must use heuristics to
   reject delayed edges.

   The Valid Edge Counter (VEC) addresses these issues with two
   additional bits in each packet, encoding values from 0 to 3. The VEC
   indicates whether an edge was considered valid by the sender when it
   was sent, and makes it possible to detect invalid edges due to
   reordering and edge loss.

4.1.  Proposed Short Header Format Including Spin Bit and VEC

   As of the current editor's version of [QUIC-TRANS], this proposal
   specifies using bit 0x04 of the first octet in the short header for
   the spin bit, and the bits 0x03 for the valid edge counter. Note
   that these values are subject to change as the layout of the first
   octet is finalized.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |0|K|1|1|0|S|VEC|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Destination Connection ID (0..144)            ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Packet Number (8/16/32)                 ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Protected Payload (*)                    ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 7: Short Header Format Spin Bit and VEC

   S:  The Spin bit is set to 0 or 1 depending on the stored spin value,
      which is updated on packet reception as explained in Section 2.

   VEC:  The Valid Edge Counter is set as defined in Section 4.2. If
      the spin bit field does not contain an edge, the VEC is set to 0.

4.2.  Setting the Valid Edge Counter (VEC)

   The VEC is set by each endpoint as follows; note that, unlike for the
   spin bit, there is no difference between client and server handling
   of the VEC:

   o  By default, the VEC is set to 0.

   o  If a packet contains an edge (transition 0->1 or 1->0) in the spin
      signal, and that edge is delayed (sent more than a configured
      delay after the edge was received, defaulting to 1 ms), the VEC is
      set to 1.

   o  If a packet contains an edge in the spin signal, and that edge is
      not delayed, the VEC is set to the value of the VEC that
      accompanied the last incoming spin bit transition plus one. This
      counter saturates at 3 rather than wrapping around. In other
      words, an edge received with a VEC of 0 will be reflected as an
      edge with a VEC of 1; with a VEC of 1 as a VEC of 2; and with a
      VEC of 2 or 3 as a VEC of 3.

   This mechanism allows observers to recognize spurious edges due to
   reordering and delayed edges due to loss, since these packets will
   have been sent with VEC 0: they were not edges when they were sent.
   In addition, it allows senders to signal that a valid edge was
   delayed because the sender was application-limited: these edges are
   sent with the VEC set to 1 by the sender, prompting the VEC to count
   back up over the next RTT.

4.3.  Use of the VEC by a passive observer

   The VEC can be used by observers to determine whether an edge in the
   spin bit signal is valid or not, as follows:

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 0 is not a valid edge, but may have been caused by reordering
      or loss, or was marked as delayed by the sender. It should
      therefore be ignored.

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 1 can be used as a left edge (i.e., to start measuring an RTT
      sample), but not as a right edge (i.e., to take an RTT sample
      since the last edge).

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 2 can be used as a left edge, but not as a right edge. If the
      observation point is symmetric (i.e., it can see both upstream and
      downstream packets in the flow), the packet can also be used to
      take a component RTT sample on the segment of the path between the
      observation point and the direction in which the previous VEC 1
      edge was seen.

   o  A packet containing an apparent edge in the spin signal with a VEC
      of 3 can be used as a left edge or right edge, and can be used to
      compute component RTT in either direction.

5.  Privacy and Security Considerations

   The privacy considerations for the latency spin bit are essentially
   the same as those for passive RTT measurement in general.

   A concern was raised during the discussion of this feature within the
   QUIC working group and the QUIC RTT Design Team that high-resolution
   RTT information might be usable for geolocation.
   However, an evaluation based on RTT samples taken over 13,780 paths
   in the Internet from RIPE Atlas anchoring measurements [TRILAT] shows
   that the magnitude and uncertainty of RTT data limit the resolution
   of geolocation information that can be derived from Internet RTT to
   national- or continental-scale; i.e., less resolution than is
   generally available from free, open IP geolocation databases.

   One reason for the inaccuracy of geolocation from network RTT is that
   Internet backbone transmission facilities do not follow the
   great-circle path between major nodes. Instead, major geographic
   features and the efficiency of connecting adjacent major cities both
   influence the facility routing. An evaluation of ~3500 measurements
   on a mesh of 25 backbone nodes in the continental United States shows
   that 85% had an RTT-to-great-circle error of 3 ms or more, making
   location within US State boundaries ambiguous [CONUS].

   Therefore, in the general case, when an endpoint's IP address is
   known, RTT information provides negligible additional information.

   RTT information may be used to infer the occupancy of queues along a
   path; indeed, this is part of its utility for performance measurement
   and diagnostics. When a link on a given path has excessive buffering
   (on the order of hundreds of milliseconds or more), such that the
   difference in delay between an empty queue and a full queue dwarfs
   normal variance and RTT along the path, RTT variance during the
   lifetime of a flow can be used to infer the presence of traffic on
   the bottleneck link. In practice, however, this is not a concern for
   passive measurement of congestion-controlled traffic, since any
   observer in a situation to observe RTT passively need not infer the
   presence of the traffic, as it can observe it directly.

   In addition, since RTT information contains application as well as
   network delay, patterns in RTT variance from minimum, and therefore
   application delay, can be used to infer or fingerprint
   application-layer behavior. However, as with the case above, this is
   not a concern with passive measurement, since the packet size and
   interarrival time sequence, which is also directly observable,
   carries more information than the RTT variance sequence.

   We therefore conclude that the high-resolution, per-flow exposure of
   RTT for passive measurement as provided by the spin bit poses
   negligible marginal risk to privacy.

   As shown in Section 2, the spin bit can be implemented separately
   from the rest of the mechanisms of the QUIC transport protocol, as it
   requires no access to any state other than that observable in the
   QUIC packet header itself. We recommend that implementations take
   advantage of this property, to reduce the risk that errors in the
   implementation could leak private transport protocol state through
   the spin bit.

   Since the spin bit is disconnected from transport mechanics, a QUIC
   endpoint implementing the spin bit that has a model of the actual
   network RTT and a target RTT to expose can "lie" about its spin bit
   transitions, by anticipating or delaying observed transitions, even
   without the coordination or collusion of the other endpoint. This is
   not the case with TCP, which requires coordination and collusion to
   expose false information via its sequence and acknowledgment numbers
   and its timestamp option. When passive measurement is used for
   purposes where one endpoint might gain a material advantage by
   representing a false RTT, e.g., SLA verification or enforcement of
   telecommunications regulations, this situation raises a question
   about the trustworthiness of spin bit RTT measurements.

   This issue must be appreciated by users of spin bit information, but
   mitigation is simple, as QUIC implementations designed to lie about
   RTT through spin bit modification can easily be detected. A lying
   server can be contacted by an honest client under the control of a
   verifying party, and the client's RTT estimate compared with the
   spin-bit-exposed estimate. Though in the general case it is
   impossible to verify explicit path signals between two complicit
   endpoints (see [WIRE-IMAGE]), a lying server/client pair may be
   subject to dynamic analysis along paths with known RTTs. We consider
   the ease of verification of lying in situations where this would be
   prohibited by regulation or contract, combined with the consequences
   of violation of said regulation or contract, to be a sufficient
   incentive in the general case not to do it.

6.  Acknowledgments

   Many thanks to Christian Huitema, who originally proposed the spin
   bit as pull request 609 on [QUIC-TRANS]. Thanks to Tobias Buehler
   for feedback on the draft, and to Alexandre Ferrieux for input on
   the Valid Edge Counter. Special thanks to the QUIC RTT Design Team
   for discussions leading especially to the privacy and security
   considerations section.

   This work is partially supported by the European Commission under
   Horizon 2020 grant agreement no. 688421 Measurement and Architecture
   for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
   for Education, Research, and Innovation under contract no. 15.0268.
   This support does not imply endorsement.

7.  References

7.1.  Normative References

   [QUIC-SPIN-EXP]
              Trammell, B. and M. Kuehlewind, "The QUIC Latency Spin
              Bit", draft-ietf-quic-spin-exp (work in progress).

7.2.  Informative References

   [ALT-MARK]
              Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L.,
              Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate Marking method for passive and hybrid
              performance monitoring", draft-ietf-ippm-alt-mark-14 (work
              in progress), December 2017.

   [CACM-TCP]
              Strowes, S., "Passively Measuring TCP Round-Trip Times (in
              Communications of the ACM)", October 2013.

   [CARRA-RTT]
              Carra, D., Avrachenkov, K., Alouf, S., Blanc, A., Nain,
              P., and G. Post, "Passive Online RTT Estimation for
              Flow-Aware Routers Using One-Way Traffic (NETWORKING 2010,
              LNCS 6091, pp. 109-121)", 2010.

   [CONUS]    Morton, A., "Comparison of Backbone Node RTT and Great
              Circle Distances (https://github.com/acmacm/CONUS-RTT)",
              September 2017.

   [IMC-CONGESTION]
              Luckie, M., Dhamdhere, A., Clark, D., Huffaker, B., and k.
              claffy, "Challenges in Inferring Internet Interdomain
              Congestion (in Proc. ACM IMC 2014)", November 2014.

   [IMC-TCPSIG]
              Sundaresan, S., Dhamdhere, A., Allman, M., and k. claffy,
              "TCP Congestion Signatures (in Proc. ACM IMC 2017)", n.d.

   [MINQ]     Rescorla, E., "MINQ, a simple Go implementation of QUIC
              (https://github.com/ekr/minq)", November 2017.

   [MOKUMOKUREN]
              Trammell, B., "Mokumokuren, a lightweight flow meter using
              gopacket (https://github.com/britram/mokumokuren)",
              November 2017.

   [NOSPIN]   Morton, A., "Description of a tool chain to evaluate
              Unidirectional Passive RTT measurement (and results)
              (https://github.com/acmacm/PassiveRTT)", October 2017.

   [QUIC-MGT]
              Kuehlewind, M. and B. Trammell, "Manageability of the QUIC
              Transport Protocol", draft-ietf-quic-manageability-01
              (work in progress), October 2017.

   [QUIC-TRANS]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-11 (work
              in progress), April 2018.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, DOI 10.17487/RFC0792, September 1981.

   [RFC4433]  Kulkarni, M., Patel, A., and K. Leung, "Mobile IPv4
              Dynamic Home Agent (HA) Assignment", RFC 4433,
              DOI 10.17487/RFC4433, March 2006.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov,
              S., and J. Perser, "Packet Reordering Metrics", RFC 4737,
              DOI 10.17487/RFC4737, November 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008.

   [RFC6049]  Morton, A. and E. Stephan, "Spatial Composition of
              Metrics", RFC 6049, DOI 10.17487/RFC6049, January 2011.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [SHBAIR]   Shbair, W., Cholez, T., Francois, J., and I. Chrisment, "A
              multi-level framework to identify HTTPS services (in Proc.
              IEEE/IFIP NOMS)", April 2016.

   [SPINBIT-REPORT]
              De Vaere, P., "Latency Spinbit Implementation Experience
              (https://devae.re/f/eth/quic/spinbit_report/)", November
              2017.

   [TLS]      Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", draft-ietf-tls-tls13-28 (work in progress),
              March 2018.

   [TMA-QOF]  Trammell, B., Gugelmann, D., and N. Brownlee, "Inline Data
              Integrity Signals for Passive Measurement (in Proc. TMA
              2014)", April 2014.

   [TOKYO-PING]
              Pelsser, C., Cittadini, L., Vissicchio, S., and R. Bush,
              "From Paris to Tokyo - On the Suitability of ping to
              Measure Latency (In Proc. ACM IMC 2014)", October 2014.

   [TRILAT]   Trammell, B., "On the Suitability of RTT Measurements for
              Geolocation (https://github.com/britram/trilateration/
              blob/paper-rev-1/paper.ipynb)", August 2017.

   [WIRE-IMAGE]
              Trammell, B. and M. Kuehlewind, "The Wire Image of a
              Network Protocol", draft-trammell-wire-image-04 (work in
              progress), April 2018.

   [WWMM-BLOAT]
              Alfredsson, S., Giudice, G., Garcia, J., Brunstrom, A.,
              Cicco, L., and S. Mascolo, "Impact of TCP Congestion
              Control on Bufferbloat in Cellular Networks (in Proc. IEEE
              WoWMoM 2013)", June 2013.

Appendix A.  Experimental Evaluation

   We have evaluated the effectiveness of the spin bit in an emulated
   network environment. The spin bit was added to a fork of [MINQ],
   using the mechanism described in Section 2, but with the spin bit
   appearing in a measurement byte added to the header for passive
   measurability experiments. Spin bit measurement support was added to
   [MOKUMOKUREN]. Full results of these ongoing experiments are
   available online in [SPINBIT-REPORT], but we summarize our findings
   here.

   First, we confirm that the spin bit works as advertised: it provides
   one useful RTT sample per RTT to any passive observer of the flow.
   This sample tracks each sender's local instantaneous estimate of RTT
   as well as the expected RTT (i.e., defined by the emulation) fairly
   well. One surprising implication of this is that the spin bit
   provides _more_ information than is available by local estimation to
   an endpoint that is mostly receiving data frames and sending mainly
   ACKs, and as such can also be useful in purely endpoint-local
   observations of the RTT evolution during the flow. The spin bit also
   works correctly under moderate to heavy packet loss and jitter.

   Second, we confirm that the spin bit can be easily implemented
   without requiring deep integration into a QUIC implementation.
   Indeed, it could be implemented completely independently, as a shim,
   aside from the requirement that the spin bit value be
   integrity-protected along with the rest of the QUIC header.

   Third, we performed experiments focused on the intermittent-sender
   problem described in Section 3.1.
We confirm that the spin bit does not provide useful RTT samples after the handshake when packets are only sent intermittently. Simple heuristics can, however, be used to recognize this situation and to reject these RTT samples. We also find that a simple sender-side heuristic can be used to determine whether a sample will be useful. If a sender sends a packet more than a specified delay (e.g., 1 ms) after the last packet it received, it knows that any latency spin observation of that packet will be invalid. If a second "spin valid" bit were available, the sender could then mark that packet "spin invalid". Our experiments show that this simple heuristic, together with the spin validity bit, succeeds in marking all packets whose RTT samples should be rejected.

Fourth, we performed experiments focused on the reordering problem described in Section 3.1. We find that while reordering can cause spurious samples at a naive observer, two simple approaches can be used to reject spurious RTT samples due to reordering. First, a two-bit spin signal that always advances in a single direction (e.g., 00 -> 01 -> 10 -> 11) successfully rejects all reordered samples, even under amounts of reordering that render the transport itself mostly useless. However, adding a bit is not necessary: having the observer keep the least significant bits of the packet number and reject samples from packets that reverse the sequence [RFC4737], as suggested in Section 3.1, is essentially as successful as a two-bit spin signal in mitigating the effects of reordering on RTT measurement.

Fifth, we performed parallel active measurements using ping, as described in Appendix C.2. In our emulated network, the ICMP packets and the QUIC packets traverse the same links with the same treatment and share queues at each link, which mitigates most of the issues with ping.
We find that while ping works as expected in measuring end-to-end RTT, it does not track the sender's estimate of RTT, and as such does not measure the RTT experienced by the application layer as well as the spin bit does.

In summary, our experiments show that the spin bit is fit for purpose and can be implemented with minimal disruption, and that most of the identified problems can be easily mitigated. See [SPINBIT-REPORT] for more.

Appendix B. Use Cases for Passive RTT Measurement

This section describes use cases for passive RTT measurement. Most of these are currently achieved with TCP by matching packets based on sequence and acknowledgment numbers, or on timestamps and timestamp echoes, in order to generate upstream and downstream RTT samples, which can be added to obtain end-to-end RTT. These use cases could be achieved with QUIC by replacing sequence/acknowledgment and timestamp analysis with spin bit analysis, as described in Section 3.

In any case, the measurement methodology follows one of a few basic variants:

o The RTT evolution of a flow or a set of flows can be compared to baseline or expected RTT measurements for flows with the same characteristics in order to detect or localize latency issues in a specific network.

o The RTT evolution of a single flow can also be examined in detail to diagnose performance issues with that flow.

o The spin bit can be used to generate a large number of RTT samples for a flow aggregate (e.g., all flows between two given networks), without regard to the temporal evolution of the RTT, in order to examine the distribution of RTTs for a group of flows that should have similar RTT (e.g., because they should share the same path(s)).

B.1. Inter-domain Troubleshooting

Network access providers are often the first point of contact for their customers when network problems impact the performance of bandwidth-intensive and latency-sensitive applications such as video, regardless of whether the root cause lies within the access provider's network, the service provider's network, on the Internet paths between them, or within the customer's own network.

Network performance is currently measured by on-path points of presence, which extract spatial delay and loss measurements [RFC6049] from fields of the transport layer (e.g., TCP) or the application layer (e.g., RTP). This information is taken from the upper layers because neither the IP header nor the UDP header includes fields that allow the measurement of upstream and downstream delay and loss.

Local network performance problems are detected with monitoring tools that observe the variation of upstream and downstream metrics.

Inter-domain troubleshooting relies on the same metrics but is not a proactive task. It is a recursive process that homes in on the domain and link responsible for the failure. In practice, inter-domain troubleshooting is a communication process between the Network Operations Center (NOC) teams of the networks on the path, which requires cooperation and exchange of data between the NOCs, because the root cause of a problem is rarely located in a single network.

One example is troubleshooting performance degradation resulting from a change of routing policy on one side of the path, which increases the load on a defective line card in a device somewhere on the path. The card's misbehavior introduces abnormal packet reordering, but only in traffic exchanged at line rate.

Other examples are similar in terms of cooperation requirements and the need to refer to measurements.
NOCs need to share the same measurement metrics, and to measure these metrics on the same packet fields, to enable a minimal level of technical cooperation.

Experimentation with the spin bit (Appendix A) has shown that it can replace the current RTT measurement opportunities, which rely on cleartext transport or application header fields, with a standard approach for passively measuring upstream and downstream RTT, which are fundamental metrics for this diagnostic process.

B.2. Two-Point Intradomain Measurement

The spin bit is also useful as a basic signal for instantaneous measurement of the treatment of QUIC traffic within a single network. Though the primary design goal of the spin bit signal is to enable single-observer on-path measurement of end-to-end RTT, the spin bit can also be used as an alternate marking signal, as described in [ALT-MARK], by two cooperating observers with access to traffic flowing in the same direction. The only difference from alternate marking with a generated signal is that the length of each marking batch changes with the flight size every RTT. However, this does not affect the applicability of the method, which works on each marking batch separately between two measurement points in the same direction. This two-point measurement is an additional feature enabled "for free" by the spin bit signal.

So, with more than one observer in the same direction, it can be useful to segment the RTT and deduce the contribution to the RTT of the portion of the network between two on-path observers. This can easily be done by calculating the delay between two or more measurement points in a single direction by applying [ALT-MARK]. In this way, packet loss, delay and delay variation can be measured for each segment of the network, depending on the number and distribution of the available on-path observation points.
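The two-point use of the spin bit described above can be illustrated with a short Python sketch. It is a minimal, non-authoritative example, not taken from [ALT-MARK] or from any QUIC implementation, and all function and type names in it are hypothetical. It assumes that both observers capture the same flow in the same direction with synchronized clocks (needed for one-way delay), that no marking batch is lost in its entirety, and that reordering does not move packets across a spin edge. Each maximal run of packets with the same spin value is treated as one marking batch; loss is the difference between the packet counts of matching batches, and delay is estimated as the difference between the mean timestamps of a batch at the two points, which is only meaningful when no packet of that batch was lost.

```python
"""Sketch: per-segment loss and delay from the spin bit used as an
alternate-marking signal by two same-direction observers (hypothetical names)."""
from dataclasses import dataclass
from itertools import groupby
from statistics import mean
from typing import List, Optional, Tuple

# One observation: (timestamp in seconds, spin bit value) for a packet of a
# single QUIC flow, in the order in which the observer saw the packets.
Observation = Tuple[float, int]


@dataclass
class BatchResult:
    spin: int                    # spin value shared by every packet in the batch
    upstream_count: int          # packets counted at the first observer
    downstream_count: int        # packets counted at the second observer
    lost: int                    # packets lost between the two observers
    mean_delay: Optional[float]  # seconds; None when the batch saw loss


def spin_batches(obs: List[Observation]) -> List[Tuple[int, List[float]]]:
    """Split observations into marking batches: maximal runs of equal spin value."""
    return [(spin, [ts for ts, _ in run])
            for spin, run in groupby(obs, key=lambda o: o[1])]


def segment_metrics(upstream: List[Observation],
                    downstream: List[Observation]) -> List[BatchResult]:
    """Compare matching batches seen at two observers in the same direction."""
    results: List[BatchResult] = []
    for (spin_a, ts_a), (spin_b, ts_b) in zip(spin_batches(upstream),
                                              spin_batches(downstream)):
        lost = len(ts_a) - len(ts_b)
        if spin_a != spin_b or lost < 0:
            # Batch boundaries no longer line up (e.g., a whole batch vanished
            # and its neighbours merged), so later comparisons are unreliable.
            break
        # Mean delay is only meaningful if both points saw the same packets.
        delay = mean(ts_b) - mean(ts_a) if lost == 0 else None
        results.append(BatchResult(spin_a, len(ts_a), len(ts_b), lost, delay))
    return results
```

For example, if the first observer sees three packets with spin value 0 at 0, 1 and 2 ms and the second observer sees them at 5, 6 and 7 ms, the first batch reports no loss and a mean delay of 5 ms. A real tool would also need to wait until a batch has closed at both points before comparing it.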
When these observation points are placed at network borders, the alternate-marking signal can be used to measure the performance of QUIC traffic within a network operator's own domain of responsibility.

B.3. Bufferbloat Mitigation in Cellular Networks

Cellular networks consist of multiple Radio Access Networks (RANs), in which mobile devices attach to base stations. It is common for base stations from different vendors and different generations to be deployed in the same cellular network.

Due to the dynamic nature of RANs, base stations have typically been provisioned with large buffers to maximize throughput despite rapid changes in capacity. As a side effect, bufferbloat has become a common issue in such networks [WWMM-BLOAT].

An effective way of mitigating bufferbloat without sacrificing too much throughput is to deploy Active Queue Management (AQM) in bottleneck routers and base stations. However, due to the heterogeneity of deployed base stations, it is not always possible to enable AQM at the bottlenecks without massive infrastructure investment.

An alternative approach is to deploy AQM as a network function in a more centralized location than the traditional bottleneck nodes. Such an AQM monitors the RTT progression of flows and drops or marks packets when the measured latency is indicative of congestion. Such a function can also detect misbehaving flows and reduce the negative impact they have on the network.

B.4. Locating WiFi Problems in Home Networks

Many residential networks use WiFi (802.11) on the last segment, and WiFi signal strength degradation manifests as high first-hop delay, because the MAC layer retransmits packets lost at that layer.
Measuring the RTT between endpoints on the customer network and parts of the service provider's own infrastructure (which have predictable delay characteristics) can be used to isolate this cause of performance problems.

The network provider can measure the RTT and packet loss at the home gateway, or at an upstream point if the home gateway is not accessible. A problem in the WiFi network is indicated by high delay combined with low packet loss, since MAC-layer retransmission turns loss into delay.

These measurements are particularly useful for latency-sensitive traffic, such as interactive video applications. However, since high latency is often correlated with other network-layer issues such as chronic interconnect congestion [IMC-CONGESTION], they are also useful for general troubleshooting of network-layer issues in an interdomain setting.

In this case, multiple RTT samples per flow are useful less for observing intraflow behavior than for generating sufficient samples for a given aggregate to make a high-quality measurement.

B.5. Internet Measurement Research

As a large, distributed, engineered system with no centralized control, the Internet has emergent properties of interest to the research community, not only out of scientific curiosity but also as a source of applicable guidance for Internet engineering, Internet protocol design and development, network operations, and policy development. Latency measurements in particular are both an active area of research and an important tool for certain measurement studies (see, e.g., [IMC-TCPSIG], from the most recent Internet Measurement Conference).
While much of this work is currently done with active measurements, the ability to generate latency samples passively, or using a hybrid measurement approach (i.e., through passive observation of purpose-generated active measurement traffic; see [RFC7799]), can drastically increase the efficiency and scalability of these studies. A latency spin bit would make these techniques applicable to QUIC as well.

Appendix C. Alternate RTT Measurement Approaches for Diagnosing QUIC flows

There are three broad alternatives to explicit signaling for passive measurement of the RTT experienced by QUIC flows.

C.1. Handshake RTT measurement

The first of these is handshake RTT measurement. As described in [QUIC-MGT], the packets of the QUIC handshake are distinguishable on the wire in such a way that they can be used for one RTT measurement sample per flow: the delay between the client initial and the server cleartext packet can be used to measure "upstream" RTT (between the observer and the server), and the delay between the server cleartext packet and the next client cleartext packet can be used to measure "downstream" RTT (between the client and the observer). When RTT measurements are used in large aggregates (all flows traversing a large link, for example), a methodology based on handshake RTT could be used to generate sufficient samples for some purposes without the spin bit.

However, this methodology would rely on the assumption that the difference between handshake RTT and nominal in-flow RTT is negligible.
Specifically, (1) any additional delay required to compute cryptographic parameters must be negligible with respect to network RTT; (2) any additional delay required to establish state along the path must be negligible with respect to network RTT; and (3) network treatment of initial packets in a flow must be identical to that of later packets in the flow. When these assumptions cannot be shown to hold, spin-bit-based RTT measurement is preferable to handshake RTT measurement, even for applications for which handshake RTT measurement would otherwise be suitable.

C.2. Parallel active measurement

The second alternative is parallel active measurement: using ICMP Echo Request and Reply [RFC0792] [RFC4433], a dedicated measurement protocol like TWAMP [RFC5357], or a separate diagnostic QUIC flow to measure RTT. Regardless of protocol, the active measurement must be initiated toward the server by a client on the same network as the client of the QUIC flow(s) of interest, or on a network close by in the Internet topology. Note that there is no guarantee that ICMP flows will receive the same network treatment as the flows under study, both because of differential treatment of ICMP traffic and because of ECMP routing (see, e.g., [TOKYO-PING]). TWAMP and diagnostic QUIC flows, though both use UDP, have similar issues regarding ECMP. However, in situations where the entity doing the measurement can guarantee that the active measurement traffic will traverse the subpaths of interest (e.g., residential access network measurement under a network architecture and business model where the network operator owns the CPE), active measurement can be used to generate RTT samples at the cost of at least two non-productive packets sent through the network per sample.

C.3. Frequency Analysis

The third alternative, proposed during the QUIC RTT design team process, relies on inter-packet spacing to convey information about the RTT, and would therefore allow measurements confined to a single direction of transmission, as described in [CARRA-RTT].

We evaluated the applicability of this work to passive RTT measurement in QUIC and found it wanting. We assembled a toolchain, as described in [NOSPIN], that allowed evaluation of a critical aspect of the [CARRA-RTT] method: the extraction of inter-packet times from real packet streams and the analysis of the frequencies present in the packet stream using the Lomb-Scargle Periodogram. Several streams were evaluated, as summarized below:

o It seems that Carra et al. [CARRA-RTT] took the noisy and low-confidence results of a statistical process (no RTT-related frequency has been detected even when using a very low alpha confidence level) and added heuristics with sliding-window averaging to infer the fundamental frequency and RTT present in a unidirectional stream.

o There appear to be several limitations on the streams to which the method is applicable. Streams with long RTT (~50 ms) are more likely to be suitable, having a better match between packet rate and the relatively low frequencies to be detected.

o None of the TCP streams analysed to date has a packet rate high enough for the measured fundamental frequency, or its multiples, to actually fall within the detectable range.

o "Ideal" interarrival time streams were simulated with uniform sampling and period. Surprisingly, the Lomb-Scargle Periodogram is unable to detect the fundamental frequency at 100 Hz from the constant 10 ms packet spacing.

o It is not clear whether IETF QUIC streams will possess the same inter-packet arrival time features as TCP streams. Also, Carra et al. note that their process may not work if the TCP stream encounters a bottleneck, which is an essential circumstance for network troubleshooting. Mobile networks with time-slot service disciplines would likely cause issues similar to those of a bottleneck, by imposing their time-slot interval on the spacing of most packets.

o The Carra et al. [CARRA-RTT] calculation of the minimum and maximum frequencies that can be detected may not be applicable when the inter-arrival times both are the signal being detected and govern the non-uniform sampling frequency.

Appendix D. Greasing

Routes, congestion levels, and therefore latency between two fixed QUIC endpoints, as well as the shape of individual application flows, fluctuate in ways that are not entirely predictable by an on-path observer. In general, there is no a priori pattern for the spin bit distribution that will always materialise on a certain flow aggregate, even for a single user.

There has been discussion in the QUIC working group of greasing as a strategy to counter an evil access provider that might make access for its users conditional on a valid spin bit signal. Let us accept this threat model for a moment and consider the practical case of a home gateway that temporarily misbehaves, for example draining its queues more slowly than it normally would while a firmware download is in progress. It would be ill-considered for an access provider (even a malicious one) to block, or otherwise interfere with, QUIC flows originating from behind that CPE solely because RTTs are now different from "usual". In fact, a numerical assessment of what such "usual" RTT looks like would have to account for many paths of different lengths, and for considerable RTT variability within any fixed path, which is clearly beyond most ISPs' reach.
But even assuming it were within reach, there is a simple cost-benefit counterargument: the same effect (i.e., gating traffic from or to a given user based on observed traffic patterns) could be achieved by much cheaper and more effective means (e.g., [SHBAIR]).

So the potential for ossification appears to be extremely low. Since the spin bit signal depends on so much external noise, its variability makes it self-greasing to an extent. In fact, implementing explicit greasing around the spin bit might even be harmful, as it would potentially erode confidence in the veracity of the signal.

However, if a greasing algorithm is really needed (for example, if we want to reuse the bit with different semantics in the future, i.e., if the spin bit is not included in the header invariants), one very simple implementation would be as follows: each server refuses to spin its bit on a per-flow basis with a given probability p, instead leaving it stuck at a randomly chosen value, 0 or 1. The client will then end up leaving its bit stuck at the opposite value, or could detect this condition and also pick a randomly chosen stuck value. The value chosen for p must be small enough to let the spin bit mechanics work, and large enough that stuck bits are seen as an intentional protocol feature rather than an error.

Authors' Addresses

Brian Trammell (editor)
ETH Zurich

Email: ietf@trammell.ch

Piet De Vaere
ETH Zurich

Email: piet@devae.re

Roni Even
Huawei

Email: roni.even@huawei.com

Giuseppe Fioccola
Telecom Italia

Email: giuseppe.fioccola@telecomitalia.it

Thomas Fossati
Nokia

Email: thomas.fossati@nokia.com

Marcus Ihlar
Ericsson

Email: marcus.ihlar@ericsson.com

Al Morton
AT&T Labs

Email: acmorton@att.com

Emile Stephan
Orange

Email: emile.stephan@orange.com