TCP Maintenance and Minor Extensions                    R. Scheffenegger
(tcpm)                                                      NetApp, Inc.
Internet-Draft                                             M. Kuehlewind
Updates: 1323 (if approved)                      University of Stuttgart
Intended status: Experimental                               May 28, 2011
Expires: November 29, 2011

        Additional negotiation in the TCP Timestamp Option field
                        during the TCP handshake
           draft-scheffenegger-tcpm-timestamp-negotiation-02

Abstract

   A number of TCP enhancements in fields as diverse as congestion
   control, loss recovery or side-band signaling could be improved by
   making the values carried in the Timestamp option transparent, and
   changing the receiver side processing of timestamps in the presence
   of selective acknowledgements.

   This document specifies a backwards compatible way of negotiating
   for Timestamp capabilities, and lists a number of benefits and
   drawbacks of this approach.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 29, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
   4.  Problem statement
   5.  Signaling
     5.1.  Capability Flags
     5.2.  Implicit extended negotiation
   6.  Possible use cases
     6.1.  One-way delay variation measurement
     6.2.  Early spurious retransmit detection
     6.3.  Early lost retransmission detection
     6.4.  Integrity of the Timestamp value
     6.5.  Disambiguation with slow Timestamp clock
     6.6.  Opaque timestamps as segment digest
     6.7.  Timestamp value as covert channel
   7.  Discussion
   8.  Acknowledgements
   9.  Updates to Existing RFCs
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Possible Extension
     A.1.  Capability Flags
     A.2.  Range Negotiation
   Appendix B.  Revision history
   Authors' Addresses

1.  Introduction

   The timestamp option originally introduced in [RFC1323] was designed
   solely for two-way delay measurement and to support a particular TCP
   algorithm (Reno).  It would be useful to be able to support one-way
   delay measurement and to take advantage of developments since TCP
   Reno, such as selective acknowledgements (SACK) [RFC2018].

   This specification defines a protocol for the two ends of a TCP
   session to negotiate alternative semantics for the timestamps they
   will exchange during the rest of the session.  It updates RFC1323 but
   it is backwards compatible with implementations of RFC1323 timestamp
   options.

   The RFC1323 timestamp protocol presents the following problems when
   trying to extend it for alternative uses:

   a.  Opaque meaning for the value in a timestamp.

       *  A timestamp value (TSval) as defined in [RFC1323] is
          deliberately only meaningful to the end that sends it.  The
          other end is merely meant to echo the value without
          understanding it.  This is fine if one end is trying to
          measure two-way delay (round trip time).  However, to measure
          one-way delay, timestamps from both ends need to be compared
          by one end, which needs to relate the values in timestamps
          from both ends to a notion of the passage of time that both
          ends share.

   b.  No control over which timestamp to echo.

       *  A host implementing [RFC1323] is meant to echo the timestamp
          value of the most recent in-order segment received.  This was
          fine for TCP Reno, but it is not the best choice for TCP
          sessions using selective acknowledgement (SACK) [RFC2018].

       *  An [RFC1323] host is meant to echo the timestamp value of the
          earliest unacknowledged segment, e.g. if a host delays ACKs
          for one segment, it echoes the first timestamp, not the
          second.  It is desirable to include delay due to ACK
          withholding when a host is conservatively measuring RTT.
          However, it is not useful to include the delay due to ACK
          withholding when measuring one-way delay.

   c.  Alternative protection against wrapped sequence numbers.

       *  [RFC1323] also points out that the timestamps it specifies
          will always strictly monotonically increase in each window so
          they can be used to protect against wrapped sequence numbers
          (PAWS).  If the endpoints negotiate an alternative timestamp
          scheme in which timestamps may not monotonically increase per
          window, then it needs to be possible to negotiate alternative
          protection against wrapped sequence numbers.

   To solve these problems this specification changes the wire protocol
   of the TCP timestamp option in two main ways:

   1.  It updates [RFC1323] to add the ability to negotiate the
       semantics of timestamp options.
       The initiator of a TCP session starts the negotiation in the
       TSecr field in the first <SYN>, which is currently unused.  This
       specification defines the semantics of the TSecr field in a
       segment with the SYN flag set.  A version number is included to
       allow further extension of capability negotiation in future.

   2.  It updates [RFC1323] to define version 0 of timestamp
       capabilities to include:

       *  the duration in seconds of a tick of the timestamp clock using
          a floating point representation

       *  agreement that both ends will echo the timestamp on the most
          recently received segment, rather than the one that would be
          echoed by an [RFC1323] host.  There is no specific option to
          request this behavior; however, it is implied by successful
          negotiation of both SACK and timestamp capabilities.

       *  an ability to mask a specified number of the least significant
          bits of the timestamp values, so they are not considered for
          timestamp calculations, or in an algorithm to protect against
          wrapped sequence numbers.

   With this new wire protocol, a number of new use-cases for the TCP
   timestamp option become possible.  Section 6 gives some examples.
   Further extensions might be required in future.  Appendix A gives an
   example of a further version of timestamp capability negotiation that
   could be defined in the future.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The reader is expected to be familiar with the definitions given in
   [RFC1323].

   Further terminology used within this document:

   Timestamp clock rate
      This document refers to clock rates for convenience.  A rate is
      expressed in Hertz (ticks-per-second).
      For signaling purposes, the rate is not directly indicated in the
      protocol in Hertz (s^-1) but as the duration between two ticks of
      the timestamp clock, measured in seconds (s).  The reason is to
      have high precision at long durations (low frequencies) available
      in the encoding (see Section 5 for details).

   Timestamp option
      This refers to the entire TCP timestamp option, including both
      TSval and TSecr fields.

   Timestamp capabilities
      Refers only to the values and bits carried in the TSecr field of
      <SYN> and <SYN,ACK> segments during a TCP handshake.  For
      signaling purposes, the timestamp capabilities are sent in clear
      with the <SYN> segment, and in an encoded form (see Section 5 for
      details) in the <SYN,ACK> segment.

3.  Overview

   The TCP Timestamp option (TSopt) provides timestamp echoing for
   round-trip time (RTT) measurements.  TSopt is widely deployed and
   activated by default in many systems.  [RFC1323] specifies TSopt the
   following way:

   Kind: 8

   Length: 10 bytes

      +-------+-------+---------------------+---------------------+
      |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
      +-------+-------+---------------------+---------------------+
          1       1              4                     4

                         Figure 1: RFC1323 TSopt

      "The Timestamps option carries two four-byte timestamp fields.
      The Timestamp Value field (TSval) contains the current value of
      the timestamp clock of the TCP sending the option.

      The Timestamp Echo Reply field (TSecr) is only valid if the ACK
      bit is set in the TCP header; if it is valid, it echos a timestamp
      value that was sent by the remote TCP in the TSval field of a
      Timestamps option.  When TSecr is not valid, its value must be
      zero.  The TSecr value will generally be from the most recent
      Timestamp option that was received; however, there are exceptions
      that are explained below.

      A TCP may send the Timestamps option (TSopt) in an initial <SYN>
      segment (i.e., segment containing a SYN bit and no ACK bit), and
      may send a TSopt in other segments only if it received a TSopt
      in the initial <SYN> segment for the connection."

   The comparison of the timestamp in the TSecr field to the current
   timestamp clock gives an estimation of the two-way delay (RTT).
   [RFC1323] specifies various cases when more than one timestamp is
   available to echo.  The approach taken by [RFC1323] is not always
   the best choice, e.g. when the TCP Selective Acknowledgment option
   (SACK) is used in conjunction.  In addition there are use cases where
   one-way delay (OWD) measurements are needed.  These mechanisms
   usually also rely on the TSopt to estimate the variation in OWD.
   Current implementations are based on certain assumptions:

   *  the sender using one specific timestamp clock rate, or

   *  one specific rate from a limited set of possible timestamp
      clock rates, or

   *  the network conditions not changing for a short training
      period while timestamp values are sampled, and

   *  the sender using all bits of TSval to reflect the timestamp
      clock value directly, with no bits used for different purposes
      such as covert channels.

   These assumptions may not be valid in general in the public Internet.

   This document specifies a way of negotiating the timestamp
   capabilities available between the end hosts.  This is enabled by
   using the TSecr field in the TCP <SYN> segment.  In order to remain
   backwards compatible, a receiver capable of timestamp capability
   negotiation has to XOR the receiver's (local) capability flags with
   the received TSval, before echoing the result back in the TSecr
   field.  During the initial handshake, the sender has to store the
   sent initial TSval, in order to determine if the receiver can support
   this timestamp capability negotiation.
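
   The XOR exchange described above can be sketched as follows.  This
   is a minimal illustration, not part of the specification: the
   function names are hypothetical, and the position of the EXO bit
   (most significant bit of the 32-bit TSecr word) is an assumption
   read off the layout drawn in Figure 2.

```python
# Hypothetical helper names; EXO position assumed from Figure 2.
EXO_BIT = 0x80000000  # assumed: EXO is the most significant bit of TSecr
WORD_MASK = 0xFFFFFFFF


def responder_tsecr(received_tsval: int, local_caps: int) -> int:
    """TSecr for the <SYN,ACK>: local capabilities XORed with the
    TSval received in the <SYN>.  The responder keeps no state."""
    return (received_tsval ^ local_caps) & WORD_MASK


def initiator_decode(stored_tsval: int, received_tsecr: int):
    """Recover the responder's capabilities from the <SYN,ACK>.

    Returns None when the EXO bit is clear, which is what a plain
    RFC1323 responder produces, since it echoes TSval unchanged and
    the XOR then yields zero.
    """
    caps = (stored_tsval ^ received_tsecr) & WORD_MASK
    return caps if caps & EXO_BIT else None
```

   A responder that simply echoes the TSval, as an RFC1323 host does,
   decodes to None here, so the initiator falls back to the unmodified
   timestamp semantics.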

   Enhancements in the area of TCP congestion control can use the
   measurement of the one-way delay variation as one input.  However,
   without explicit knowledge of the partner's timestamp clock, arriving
   at a good estimate requires a training phase over multiple segment
   exchanges.  In this phase, the network conditions need to remain
   nearly static to arrive at good measurements.  In addition, the
   receiver has to assume that the full TSval represents the timestamp
   clock value of the sender, with no different use of some bits of the
   TSval.  Covert channels or fingerprinting a timestamp value
   artificially increase the measurement noise, and a receiver may be
   led to assume a higher timestamp clock rate than what is actually
   implemented by the sender.  In order to assist such algorithms,
   explicit knowledge at an early phase of the session needs to be
   negotiated.

   In addition, by using synergistic signaling between timestamps
   [RFC1323] and selective acknowledgments [RFC2018], enhancements in
   loss recovery are possible by removing any remaining retransmission
   and acknowledgment ambiguity.  See Section 6 for a detailed
   discussion.

   Receivers conforming to [RFC1323] are required to only reflect the
   timestamp of the last segment that was received in order, or the
   timestamp of the last not yet acknowledged segment in the case of
   delayed acknowledgments.  In order to allow progressive deployment of
   changed timestamp option semantics, a backwards compatible way of
   negotiating the semantics is required.

   As the importance of the timestamp option increases by using it in
   more aspects of a TCP sender's operation, so increases the importance
   of maintaining the integrity of the reflected timestamps.  At the
   same time this must not prevent the receiver from interpreting a
   received timestamp in TSval.

   This is achieved by indicating how many least significant bits of the
   timestamp value must not be interpreted by the receiver.  Apart from
   the purpose of maintaining timestamp integrity for the use as input
   signal into congestion control algorithms, this also allows the use
   of timestamp based methods to discriminate at the earliest possible
   moment (within 1 RTT after the retransmission) between spurious
   retransmissions and genuine loss even when using slow running TCP
   timestamp clocks.

   As an optional extension, a timestamp clock rate range negotiation is
   also introduced in Appendix A.  This is only included as an example
   of possible further enhancements.

4.  Problem statement

   Timestamp values are carried in each segment if negotiated for.
   However, the content of these values is to be treated as an opaque
   entity by the receiver.  This document describes an enhancement to
   the timestamp negotiation, which must meet the following criteria:

   o  Indicate the (rough) timestamp clock rate used by the sender in a
      wide range.  The slowest rate should be slower than 1 Hz, while
      the highest rate should allow unique timestamps per segment, even
      at extremely high link speeds.  At the time of writing, the
      shortest meaningful duration was found to be that of a 64-byte
      packet (e.g. an ACK segment) sent at a rate of 100 Gbit/s.  This
      corresponds to a maximum timestamp clock rate of around 200 MHz,
      or a tick duration of about 5 ns.

   o  Allow for timestamps that are not directly related to real time
      (e.g. segment counting, or use of the timestamp value as a true
      extension of sequence numbers).

   o  Provide means to prevent or at least detect tampering with the
      echoed timestamp value.

   o  Allow for future extensions that may use some of the timestamp
      value bits for other signaling purposes for the remainder of the
      session.

   o  Signaling must be backwards compatible with existing TCP stacks
      implementing basic [RFC1323] timestamps.  Current methods for
      timestamp value generation must be supported.

   o  Allow timing information to be stated explicitly during the
      initial handshake, to avoid a training phase extending beyond the
      initial handshake.

   o  Possibly provide a means to disambiguate resent segments.

   Some legacy implementations exist that violate [RFC1323] in that the
   TSecr field in a <SYN> is not cleared (see
   [I-D.ietf-tcpm-tcp-security]).  The protocol should have some
   resiliency in the presence of such misbehaving senders, and must not
   lead to an unfair advantage for such wrongly negotiated sessions.

   As there is some benefit in changing the receiver side treatment of
   which timestamp value to echo, the negotiation protocol itself must
   also provide some backwards compatibility.  Therefore, even when a
   sender tries to negotiate for a higher version than supported by the
   receiver, the receiver MUST respond with at least version 0.  Also, a
   future protocol enhancement MUST make sure that any extension is
   compatible with at least version 0.

5.  Signaling

   To support the design goals stated in Section 4, only the TSecr
   field in the initial <SYN> can be used directly.  The response from
   the receiver has to be encoded, since no unused field is available in
   the <SYN,ACK>.  The most straightforward encoding is an XOR with a
   value known to the sender.  Therefore, the receiver also uses TSecr
   to indicate its capabilities, but calculates the XOR sum with the
   received TSval.  This allows the receiver to remain stateless and
   functionalities like syncache (see [RFC4987]) can be maintained with
   no change.

   During the initial TCP three-way handshake, timestamp capabilities
   are negotiated using the TSecr field.  Timestamp capabilities MAY
   only be negotiated in TSecr when the SYN bit is set.
   A host detects the presence of timestamp capability flags when the
   EXO bit is set in the TSecr field of the received <SYN> segment.
   When receiving a session request (<SYN> segment with timestamp
   capabilities), a compliant TCP receiver is required to XOR the
   received TSval with the receiver's timestamp capabilities.  The
   resulting value is then sent in the <SYN,ACK> response.

   A host initiating a TCP session must verify that the partner also
   supports timestamp capability negotiation and a supported version,
   before using enhanced algorithms.  Note that this change in semantics
   does not necessarily change the signaling of timestamps on the wire
   after initial negotiation.

   When selective acknowledgements [RFC2018] are also negotiated for,
   the immediate echoing of the last received timestamp value has to be
   enabled, regardless of the sender's version of the timestamp
   capabilities.

   To mitigate the effect of misbehaving TCP senders appearing to
   negotiate for timestamp capabilities, a receiver MUST verify that one
   specific bit (EXO) is set, and any reserved bits (currently 8, RES
   field) are cleared.  This limits the chance for a receiver to
   mistakenly negotiate for version 0 capabilities to around 0.05%.
   However, as a receiver has to use changed semantics when reflecting
   TSval also for higher values in the version field, a misbehaving
   sender negotiating for SACK, but not properly clearing TSecr, may
   have a 37.5% chance of receiving timestamp values with modified
   receiver behavior.  This may lead to an increased number of spurious
   retransmission timeouts, putting such a session at a disadvantage.

   Once timestamp capabilities are successfully negotiated, the receiver
   must ignore an indicated number of opaque bits, before applying the
   heuristics defined in [RFC1323].
   The monotonic increase of the timestamp value could be violated for
   each newly sent segment, conflicting with the constraints imposed by
   PAWS.

   The presented distribution of the three common fields, EXO, VER and
   MASK, that MUST be present regardless of which version is implemented
   in a compliant TCP stack, is a result of the previously mentioned
   design goals.  The lower three octets MAY be redefined freely with
   subsequent versions of the timestamp capability negotiation protocol.
   This allows a future version to be implemented in such a way that a
   receiver can still operate with the modified behavior, and a minimum
   amount of processing (PAWS).

   The wide range of indicated timestamp clock rates (spanning 9 orders
   of (decimal) magnitude, or 28 binary digits) and the limitation to no
   more than 24 bits require the use of a logarithmic encoding.  Since
   the precision of the timestamp clock value is most valuable at low
   frequencies (long tick durations), the clock rate is encoded as a
   time duration.  This results in full precision for commonly used
   timestamp clock tick durations, while allowing even higher
   frequencies at reduced precision (subnormal numbers representing very
   short tick durations).  A format was chosen that resembles, but does
   not conform to, the format of an IEEE-754 binary16 representation.

   The timestamp clock values a host is using need not run
   synchronously with the internal TCP clock.  Different clock sources,
   such as an NTP stratum, RTC, CPU cycle counters, or other independent
   clocks can be used to derive the TSval.  This allows the de-coupling
   of the coarse-grained TCP clock used for retransmission and delayed
   ACK timeouts, from the clock frequency indicated in the TSval itself.
   Since [RFC1323] timestamp clocks used to be only useful for RTT
   measurement, and calculation of the RTO, the straightforward use of
   the TCP timer directly seemed natural to minimize subsequent RTT
   calculations.

   Most stacks will at first not be able to dynamically adjust their
   timestamp clock rate.  Therefore, the indicated clock duration can be
   a static, compile time value.  To use the indicated clock duration,
   for example to perform one-way delay variation calculations, simple
   integer operations can be used after an initial conversion of the
   wire representation to longer (e.g. 32 or 64 bit) integer values.

5.1.  Capability Flags

   In order to signal the supported capabilities, the TSecr value is
   overloaded with the following flags and fields during the initial
   <SYN> and <SYN,ACK> segments.  The initiating host of a session with
   timestamp capability negotiation has to keep minimal state to decode
   the returned capabilities XOR'ed with the sent TSval.

   Kind: 8

   Length: 10 bytes

   +-------+-------+---------------------+---------------------+
   |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
   +-------+-------+---------------------+---------------------+
       1       1              4          |          4          |
                                        /                      |
    .----------------------------------'                       |
   /                                                             \
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|   |         #               |              DUR              |
   |X|VER|  MASK   #      RES      |-------------------------------|
   |O|   |         #               |   EXP   |        FRAC         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: Timestamp Capability flags

   Common fields to all versions:

      EXO - Extended Options (1 bit)
         Indicates that the sender supports extended timestamp
         capabilities as defined by this document, and MUST be set to
         one by a compliant implementation.
         This flag also enables the immediate echoing of the TSval with
         the next ACK, if both timestamp capabilities and selective
         acknowledgement [RFC2018] are successfully negotiated during
         the initial handshake.  This change in semantics is independent
         of the version in the signaled timestamp capabilities.

      VER - Version (2 bits)
         Version of the capabilities fields definition.  This document
         specifies codepoint 0.  With the exception of the immediate
         mirroring - simplifying the receiver side processing - and
         the masking of some least significant bits before performing
         the Protection Against Wrapped Sequence Numbers (PAWS) test,
         hosts must treat received timestamps as opaque entities and not
         use them as inputs into advanced heuristics, if the version is
         not supported.  The lower 3 octets of the timestamp capability
         flags MUST be ignored if an unsupported version is received.
         It is expected that a host will implement at least version 0.
         A receiver MUST respond with the appropriate (equal or
         version 0) version when responding to a new session request.

      MASK - Mask Timestamps (5 bits)
         The MASK field indicates how many least significant bits
         should be excluded by the receiver, before further processing
         the timestamp (e.g. PAWS, or for timing purposes).  The
         unmasked portion of a TSval has to comply with the
         constraints imposed by [RFC1323] on the generation of valid
         timestamps, e.g. it must be monotonically increasing between
         segments, and strictly monotonically increasing for each
         window.  Note that this does not impact the reflected timestamp
         in any way - TSecr will always be equal to an appropriate
         TSval.  This field MUST be present in all future versions of
         timestamp capability fields.  A value of 31 (all bits set) MUST
         be interpreted by a receiver as meaning that the full TSval is
         opaque.
         For PAWS to be effective, at least 2 bits are required to
         discriminate between an increase (and roll-over) versus
         outdated segments.

   Version 0 specific fields:

      RES - Reserved (8 bits)
         Reserved for future use, and MUST be zero ("0") with version
         0.  If timestamp capabilities are received with version set
         to 0, but some of these bits set, the receiver MUST ignore
         the extended options field and react as if the TSecr was zero
         (compatibility mode).

      DUR - Duration (16 bits)
         The timestamp clock tick duration, measured in seconds.  This
         is a binary floating point value, indicating the length
         between two timestamp clock ticks.  A value of zero (both
         exponent and fraction set to zero) is supported and
         indicates that the timestamp values are NOT linearly related
         to wall-clock time (e.g. the sender may perform some form of
         segment counting or sequence number extension instead).  A
         host receiving a duration of zero from the other end host
         MUST NOT perform time-based heuristics which take the
         received TSval into account.  The special floating point
         numbers infinity and not-a-number (NaN), where all exponent
         bits are set, are not supported.
         Timestamp clock periods shorter than 1 ms SHOULD be
         implemented by inserting the timestamp "late" before
         transmitting a segment to avoid unnecessary timing jitter.
         The shortest clock periods, of only a few microseconds or
         less, are provided for hardware-assisted implementations.
         The range of possible values runs from 15.99 s to 7.45 ns
         with highest precision, and down to 3.64 ps with reducing
         precision, which is also the shortest difference in tick
         duration that could be resolved.  This equates to clock
         frequencies of 0.06 Hz, 134 MHz and 275 GHz respectively.
         Despite the provision of such a large dynamic range, a
         receiver should consider that a timestamp clock may deviate
         from the indicated rate by a large fraction.

      EXP - Exponent (5 bits)
         The exponent component of the binary floating point number
         indicating the timestamp tick duration.  The exponent bias is
         28.  Subnormal numbers (lower precision), where the exponent
         is set to zero, extend the lowest possible value
         representation to 2^-39 s (or 3.64 ps) at reduced precision.
         An exponent value of 31 MUST be treated as a normal exponent.
         This allows timestamp clock ticks of up to 15.99 s.
         Note that this representation is not identical to the
         binary16 definition in IEEE 754-2008, and cannot be
         processed as-is in a standard floating point library.  See
         Section 6.1 for details.

      FRAC - Fraction (11 bits)
         The fraction component of a binary floating point number
         indicating the timestamp tick duration.  The range with the
         highest resolution, excluding subnormal numbers, covers clock
         periods between 7.45 ns (or 134 MHz clock frequency) and
         15.99 s (0.06 Hz).  The field has an implicit lead bit with
         value 1 unless the exponent field is stored with all zeros.

   Example of a timestamp capability negotiation, to indicate that the
   sender's timestamp clock (tcp clock) is running with 1 ms per tick:

      SYN, TSopt=, TSecr=EXO|MASK|EXP=18|FRAC=0x031

   The tick duration is calculated as 2^(18-28) * 1.00000110001b, which
   indicates an actual tick duration of 999.93 us.

5.2.  Implicit extended negotiation

   If both Timestamp capabilities and Selective Acknowledgement options
   [RFC2018] are negotiated (both hosts send these options in their
   respective segments), both hosts MUST echo the timestamp value of the
   last received segment, irrespective of the order of delivery.
Note 604 that this is in conflict with [RFC1323], where only the timestamp of 605 the last segment received in sequence is mirrored. As SACK allows 606 discrimination of reordered or lost segments, the reflected 607 timestamps are not required to convey the most conservative 608 information. If SACK indicates lost or reordered packets at the 609 receiver, the sender MUST take appropriate action such as ignoring 610 the received timestamps for calculating the round trip time, or 611 assuming a delayed packet (with appropriate handling). The exact 612 implications are beyond the scope of this document. 614 The immediate echoing of the last received timestamp value allowed by 615 the synergistic use of the timestamp option with the SACK option 616 enables enhancements to improve loss recovery, round trip time (RTT) 617 and one-way delay (OWD) variation measurements (see Section 6) even 618 during loss or reordering episodes. This is enabled by removing any 619 retransmission ambiguity using unique timestamps for every 620 retransmission, while simultaneously the SACK option indicates the 621 ordering of received segments even in the presence of ACK loss or 622 reordering. 624 6. Possible use cases 626 6.1. One-way delay variation measurement 628 New congestion control algorithms are currently being proposed that react 629 to the measured one-way delay variation (e.g. 630 [I-D.ietf-ledbat-congestion], [Chirp]). This control variable is 631 updated after each received ACK: 633 C(t) = TSval(t) - TSecr(t) 635 V(t) = C(t) - C(t-1) 637 provided that the timestamp clocks at both ends are running at 638 roughly the same rate. Without prior knowledge of the timestamp 639 clock rate used by the partner, a sender can try to learn this rate 640 by observing the exchanged segments for a duration of a few RTTs.
641 However, such a scheme fails if the partner uses some form of 642 implicit integrity check of the timestamp values, which would appear 643 as either random scrambling of LSB bits in the timestamp, or give the 644 impression of a much higher clock rate than what is actually used. 645 If the partner uses some form of segment counting as timestamp value, 646 without any direct relationship to the wall-clock time, the above 647 formula will fail to yield meaningful results. Finally the network 648 conditions need to remain stable during any such training phase, so 649 that the sender can arrive at reasonable estimates of the partner's 650 timestamp clock rate. 652 This note addresses these concerns by providing a means by which both 653 hosts are required to use a timestamp clock that is closely related to 654 the wall-clock time, with known clock rate, and also provides means 655 by which a host can signal the use of a few LSB bits for timestamp 656 value integrity checks. To arrive at a valid one-way delay (OWD) 657 variation, first the timestamp received from the partner has to be 658 right-shifted by a known amount of bits as defined by the mask field. 659 Next the local and remote timestamp values need to be normalized to a 660 common base clock rate (typically, the local clock rate): 663 C(t) = (TSecr >> local mask) - (TSval >> remote mask) * (remote clock rate / local clock rate) 666 V(t) = C(t) - C(t-1) 668 The adjustment factor can be calculated once during the timestamp 669 capability negotiation phase, and pure integer arithmetic can be used 670 during per-segment processing: 672 EXP.min = min(EXP.loc, EXP.rem) 674 EXP.rem -= EXP.min 676 EXP.loc -= EXP.min 678 FRAC.rem = (0x800 | FRAC.rem) << EXP.rem 680 FRAC.loc = (0x800 | FRAC.loc) << EXP.loc 682 and assuming that the local clock rate (tick duration) is lower 684 ADJ = FRAC.rem / FRAC.loc 686 with ADJ being an integer variable.
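The integer computation above can be sketched as follows. This is only an illustration under stated assumptions: both EXP values are non-zero (normal numbers, so the implicit lead bit 0x800 applies), and the function names clock_ratio and owd_sample are hypothetical. Instead of a single truncated ADJ, the sketch keeps numerator and denominator separately, so the local tick duration does not have to be the lower one.

```python
def clock_ratio(exp_loc: int, frac_loc: int, exp_rem: int, frac_rem: int):
    """Return (num, den) with num/den = remote tick duration / local tick duration.

    Assumption: both exponents are non-zero, so the implicit lead bit
    (0x800) is present in both fractions.
    """
    if exp_loc == 0 or exp_rem == 0:
        raise ValueError("zero or subnormal durations are not handled here")
    exp_min = min(exp_loc, exp_rem)
    num = (0x800 | frac_rem) << (exp_rem - exp_min)
    den = (0x800 | frac_loc) << (exp_loc - exp_min)
    return num, den


def owd_sample(tsval: int, tsecr: int, local_mask: int, remote_mask: int,
               num: int, den: int) -> int:
    """One sample C(t) in local ticks, using integer arithmetic only."""
    remote_ticks = tsval >> remote_mask   # strip the partner's masked bits
    local_ticks = tsecr >> local_mask     # strip our own masked bits
    return local_ticks - (remote_ticks * num) // den


# Local EXP=18, remote EXP=21, both FRAC=0: the remote tick is 8 times longer.
num, den = clock_ratio(18, 0, 21, 0)
c_t = owd_sample(tsval=400, tsecr=1000, local_mask=0, remote_mask=2,
                 num=num, den=den)
```

The variation V(t) is then the difference between two consecutive samples. Timestamp wrap-around is ignored in this sketch.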
For higher precision, two 687 appropriately calculated integers can be used. 689 Any previously required training on the remote clock rate can be 690 removed, resulting in a simpler and more dependable algorithm. 691 Furthermore, transient network effects during the training phase 692 which may result in a wrong inference of the remote clock rate are 693 eliminated completely. 695 6.2. Early spurious retransmit detection 697 Using the provided timestamp negotiation scheme, clients utilizing 698 slow running timestamp clocks can set aside a small number of least 699 significant bits in the timestamps. These bits can be used to 700 differentiate between original and retransmitted segments, even 701 within the same timestamp clock tick (i.e. when RTT is smaller than 702 the TCP timestamp clock tick duration). It is recommended to use only a 703 single bit (mask = 1), unless the sender can also perform lost 704 retransmission detection. Using more than 2 bits for this purpose is 705 discouraged due to the diminishing probability of losing 706 retransmitted packets more than once. A simple scheme could send 707 out normal data segments with the masked bits all cleared. Each 708 advance of the timestamp clock also clears those bits again. When a 709 segment is retransmitted without the timestamp clock increasing, 710 these bits are increased by one for each consecutive retry of the same 711 segment, until the maximum value is reached. Newly sent segments 712 (during the same clock interval) should maintain these bits, in order 713 to maintain monotonically increasing values, even though compliant 714 end hosts do not require this property. This scheme maintains 715 monotonically increasing timestamp values - including the masked 716 bits.
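A minimal sketch of such a marking scheme is given here for illustration only. It assumes the masked bits occupy the least significant bits of TSval, and that the sender remembers the TSval it last used for each outstanding segment. The class name MaskedTimestamp and its stamp method are hypothetical. With a single masked bit, the calls shown reproduce the sequence of Figure 3.

```python
class MaskedTimestamp:
    """Generates TSval = (clock tick << mask_bits) | marker."""

    def __init__(self, mask_bits: int = 1):
        self.mask_bits = mask_bits
        self.max_marker = (1 << mask_bits) - 1
        self.tick = None
        self.marker = 0

    def stamp(self, clock_tick: int, last_sent_tsval=None) -> int:
        """Return the TSval for a segment sent at clock_tick.

        last_sent_tsval is the TSval used the previous time this same
        segment was sent, or None for newly sent data.
        """
        if clock_tick != self.tick:
            # Each advance of the timestamp clock clears the masked bits.
            self.tick = clock_tick
            self.marker = 0
        tsval = (clock_tick << self.mask_bits) | self.marker
        if last_sent_tsval == tsval and self.marker < self.max_marker:
            # Retransmission that would otherwise repeat its old TSval.
            self.marker += 1
            tsval = (clock_tick << self.mask_bits) | self.marker
        # New data keeps the current marker, so values never decrease.
        return tsval


ts = MaskedTimestamp(mask_bits=1)
originals = [ts.stamp(0) for _ in range(5)]          # Seg0..Seg4
retx_seg1 = ts.stamp(0, last_sent_tsval=originals[1])
seg5, seg6 = ts.stamp(0), ts.stamp(0)
seg7 = ts.stamp(1)                                   # clock has ticked
```

Once the marker has reached its maximum within one clock tick, a further retransmission repeats the previous value and the ambiguity returns until the clock advances.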
Even without negotiating the immediate mirroring of timestamps 717 (done by simultaneously doing timestamp capabilities negotiation, and 718 selective acknowledgments), this extends the use of the Eifel 719 Detection [RFC3522] and Eifel Response [RFC4015] algorithms to detect 720 and react to spurious retransmissions under all circumstances. Also, 721 currently experimental schemes such as ER-SRTO [Cho08] could be 722 deployed without requiring the receiver to explicitly support that 723 capability. 725 Seg0 Seg1 Seg2 Seg3 Seg4 726 TS00 TS00 TS00 TS00 TS00 727 X 729 Seg1 Seg5 730 TS01 TS01 732 Seg6 Seg7 733 TS01 TS10 735 Figure 3: timestamp for spurious retransmit detection 737 Masked bits are the 2nd digit, the timestamp value is represented by 738 the first digit. The timestamp clock "ticks" between segment 6 and 739 7. 741 6.3. Early lost retransmission detection 743 During phases where multiple segments in short succession (but not 744 necessarily successive segments) are lost, there is a high likelihood 745 that at least one segment is retransmitted, while the cause of loss 746 (e.g. congestion, fading) is still persisting. The best current 747 algorithms can recover such a lost retransmission with a few 748 constraints, for example, that the session has to have at least 749 DupThresh more segments to send beyond the current recovery phase. 750 During loss recovery, when a retransmission is lost again, currently 751 the timestamp cannot be used as a means of conveying additional 752 information, to allow more rapid loss recovery while maintaining 753 packet conservation principles. Only the timestamp of the last 754 segment preceding the continuous loss will be reflected. Using the 755 extended timestamp option negotiation together with selective 756 acknowledgements, the receiver will immediately reflect the timestamp 757 of the last seen segment.
Using both SACK and TS information 758 synergistically, a sender can infer the exact order in which original 759 and retransmitted segments are received. This allows a slightly less 760 conservative and faster approach to retransmit lost retransmitted 761 segments. 763 This can be implemented in combination with the masked bit approach 764 described in Section 6.2, or without. However, if the 765 timestamp clock rate is lower than 1/2 RTT, both the original and the 766 retransmitted segment may carry an identical timestamp. If the 767 sender cannot discriminate between the original and the retransmitted 768 segments, it MUST refrain from taking any action before such a 769 determination can be made. 771 In this example, masked bits are used, with a simple marking method. 772 As the timestamp value of the retransmission itself is already 773 different from the original segments, such an additional 774 discrimination would not strictly be required here. The timestamp 775 clock ticks in the first digit and the dupthresh value is 3. 777 Seg0 Seg1 Seg2 Seg3 Seg4 Seg5 Seg6 Seg7 778 TS00 TS10 TS10 TS10 TS10 TS10 TS10 TS20 779 X X X * 781 Seg1 Seg2 Seg3 Seg4 782 TS21 TS30 TS30 TS30 783 X 785 Seg1 Seg8 Seg9 786 TS31 TS31 TS40 788 Figure 4: timestamp under loss 790 If Seg1,TS00 is lost twice, and Seg4,TS10 is also lost, the sender 791 could resend Seg1 once more after seeing dupthresh number of segments 792 sent after the first retransmission of Seg1 being received (i.e., when 793 Seg4 is SACKed). However, there is an ambiguity between retransmitted 794 segments and original segments, as the sender cannot know whether a SACK 795 for one particular segment was due to the retransmitted segment, or a 796 delayed original segment. The timestamp value will not help in this 797 case, as per RFC1323 it will be held at TS00 for the entire loss 798 recovery episode.
Therefore, currently a sender has to assume that 799 any SACKed segments may be due to delayed originally sent segments, and 800 can only resolve this conflict by injecting additional, previously 801 unsent segments. Once dupthresh newly injected segments are SACKed, 802 continuous loss (and not further delay) of Seg1 can safely be 803 assumed, and that segment be resent. This approach is conservative 804 but constrained by the requirement that additional segments can be 805 sent, and thereby delayed in the response. 807 With the synergistic use of timestamp extended options together with 808 selective acknowledgments, the receiver would immediately reflect 809 back the timestamp of the last received segment. This allows the 810 sender to discriminate between a SACK due to a delayed Seg4,TS10, or 811 a SACK because of Seg4,TS30. Therefore, the appropriate decision 812 (retransmission of Seg1 once more, or addressing the observed 813 reordering/delay accordingly [I-D.blanton-tcp-reordering]) can be 814 taken with high confidence. 816 6.4. Integrity of the Timestamp value 818 If the timestamp is used for congestion control purposes, an 819 incentive exists for malicious receivers to reflect tampered 820 timestamps, as demonstrated with some exploits [CUBIC]. 822 One way to address this is to not use timestamp information directly, 823 but to keep state in the sender for each sent segment, and track the 824 round trip time independent of sent timestamps. Such an approach has 825 the drawback that it is not straightforward to make it work during 826 loss recovery phases for those segments possibly lost (or reordered). 827 In addition there is processing and memory overhead to maintain 828 possibly extensive lists in the sender that need to be consulted with 829 each ACK. Despite these drawbacks, this approach is currently 830 implemented due to lack of alternatives (see [Linux], and [BSD10]).
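The sender-side state keeping described above can be sketched roughly as follows. This is not the Linux or FreeBSD implementation. It is a simplified illustration that assumes cumulative ACKs only, ignores sequence number wrap-around, and discards samples for retransmitted segments, in the spirit of Karn's algorithm. The class name SenderRttTracker is hypothetical.

```python
class SenderRttTracker:
    """Tracks send times per segment, independent of echoed timestamps."""

    def __init__(self):
        # end sequence number -> (last send time, was it retransmitted?)
        self._sent = {}

    def on_send(self, end_seq: int, now: float) -> None:
        retransmitted = end_seq in self._sent
        self._sent[end_seq] = (now, retransmitted)

    def on_ack(self, ack_seq: int, now: float):
        """Return an RTT sample in seconds, or None if none is usable."""
        sample = None
        # The list has to be consulted on every ACK: this is the
        # processing and memory overhead mentioned in the text.
        for end_seq in sorted(s for s in self._sent if s <= ack_seq):
            send_time, retransmitted = self._sent.pop(end_seq)
            if not retransmitted:
                sample = now - send_time  # newest unambiguous segment wins
        return sample


tracker = SenderRttTracker()
tracker.on_send(1000, now=0.0)
tracker.on_send(2000, now=0.01)
first = tracker.on_ack(1000, now=0.05)    # usable sample
tracker.on_send(2000, now=0.2)            # retransmission of the 2nd segment
second = tracker.on_ack(2000, now=0.25)   # ambiguous, so no sample
```

The second ACK yields no sample because the sender cannot tell which transmission it acknowledges, which illustrates why this approach is hard to use during loss recovery.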
832 The preferred approach is that the sender MAY choose to protect 833 timestamps from such modifications by including a fingerprint (secure 834 hash of some kind) in some of the least significant bits. However, 835 doing so prevents a receiver from using the timestamp for other 836 purposes, unless the receiver has prior knowledge about this use of 837 some bits in the timestamp value. Furthermore, strict monotonic 838 increasing values are still to be maintained. That constraint 839 restricts this approach somewhat and limits or inhibits the use of 840 timestamp values for direct use by the receiver (i.e. for one-way 841 delay variation measurement, as the hash bits would look like random 842 noise in the delay measurement). 844 6.5. Disambiguation with slow Timestamp clock 846 In addition, but somewhat orthogonal to maintaining timestamp value 847 integrity, there is a use case when the sender does not support a 848 timestamp clock rate that can guarantee unique timestamps for 849 retransmitted segments. This may happen whenever the TCP timestamp 850 clock rate is slower than the round-trip time of the path. For 851 unambiguously identifying regular from retransmitted segments, the 852 timestamp must be unique for otherwise identical segments. Reserving 853 the least significant bits for this purpose allows senders with slow 854 running timestamp clocks to make use of this feature. However, 855 without modifying the receiver behavior, only limited benefits can be 856 extracted from such an approach. Furthermore the use of this option 857 has implications in the protection against wrapped sequence numbers 858 (PAWS - [RFC1323]), as the more bits are set aside for tamper 859 prevention, the faster the timestamp number space cycles. 861 Using Timestamp capabilities to explicitly negotiate mask bits, and 862 set aside a (low) number of least significant bits for the above 863 listed purposes, allows a sender to use more reliable integrity 864 checks. 
These masked bits are not to be considered part of the 865 timestamp value, for the purposes described in [RFC1323] (i.e. PAWS) 866 and subsequent heuristics using timestamp values (i.e. Eifel 867 Detection), thereby lifting the strict requirement of always 868 monotonically increasing timestamp values. However, care should be 869 taken to not mask too many bits, for the reasons outlined in 870 [RFC1323]. Using a mask value higher than 8 is therefore 871 discouraged. 873 The reason for having 5 bits for the mask field nevertheless is to 874 allow the implementation of this protocol in conjunction with TCP 875 cookie transaction (TCPCT) extended timestamps [RFC6013]. That 876 allows for nearly a quarter of a 128 bit timestamp to be set aside. 878 6.6. Opaque timestamps as segment digest 880 After making TCP alternate checksums historic ([RFC6247]), there 881 still remains a need to address increased corruption probabilities 882 when segment sizes are increased (see 883 [I-D.ietf-tcpm-anumita-tcp-stronger-checksum]). 885 Utilizing an all-opaque TSval field allows the sender to include a 886 stronger CRC32, with semantics independent of the fixed TCP header 887 fields. However, such a use would again exclude the use of PAWS on 888 the receiver side, and a receiver would need to know the specifics of 889 the digest for processing. It is assumed, that such a digest would 890 only cover the data payload of a TCP segment. In order to allow 891 disambiguation of retransmissions, a special TSval can be defined 892 (e.g. TSval=0) which bypasses regular CRC processing but allows the 893 identification of retransmitted segments. 895 The full semantics of such a data-only CRC scheme are beyond the 896 scope of this document, but would require a different version of the 897 timestamp capability. 
Nevertheless, allowing the full TSval to 898 remain unprocessed by the receiver for the purpose of PAWS even in 899 version 0 could still allow the successful negotiation of sender-side 900 enhancements such as loss recovery improvements (see Section 6.2, and 901 Section 6.3). 903 In effect, the masked portion of the timestamp values represents an 904 unreliable out of band signaling channel that could also be used for 905 other purposes than solely performing timestamp integrity checks (for 906 example, this would allow ER-SRTO algorithms [Cho08]). 908 6.7. Timestamp value as covert channel 910 Covert channels SHOULD NOT be implemented by using the mask field, as 911 the explicit masking clearly points to such a channel. As the 912 regular operation of the timestamp clock is still maintained, covert 913 channels working by artificially delaying data segments in an 914 application (and thereby influencing the timestamp inserted into the 915 segment) work unaffected. The received TSval would need to be 916 shifted by the appropriate number of bits, before extracting the data 917 from the covert channel by the receiver. 919 7. Discussion 921 RTT and OWD variation during loss episodes has not been deeply researched. 922 Current heuristics ([RFC1122], [RFC1323], Karn's algorithm [RFC2988]) 923 explicitly exclude (and prevent) the use of RTT samples when loss 924 occurs. However, solving the retransmission ambiguity problem - and 925 the related reliable ACK delivery problem - would enable new 926 functionality to improve TCP processing. Also, having an immediate 927 echo of the last received timestamp value would enable new research 928 to distinguish between corruption loss (assumed to have no RTT / OWD 929 impact) and congestion loss (assumed to have RTT / OWD impact). 930 Research into this field appears to be rather neglected, especially 931 when it comes to large scale, public internet investigations.
Due to 932 the very nature of this, passive investigations without signals 933 contained within the headers are only of limited use in empirical 934 research. 936 Retransmission ambiguity detection during loss recovery would allow 937 an additional level of loss recovery control without reverting to 938 timer-based methods. As with the deployment of SACK, separating 939 "what" to send from "when" to send it could be driven one step 940 further. In particular, less conservative loss recovery schemes 941 which do not trade principles of packet conservation against 942 timeliness, require a reliable way of prompt and best possible 943 feedback from the receiver about any delivered segment and their 944 ordering. [RFC2018] SACK alone goes quite a long way, but using 945 timestamp information in addition could remove any ambiguity. 946 However, the current specs in [RFC1323] make that use impossible, 947 thus a modified semantic (receiver behavior) is a necessity. 949 A synergistic signaling with immediate timestamp value echoes would 950 however break legacy, per-packet RTT measurements. The reason is, 951 that delayed ACKs would not be covered. Research has shown, that 952 per-packet updates of the RTT estimation (for the purpose of 953 calculating a reasonable RTO value) are only of limited benefit (see 954 [Path99], and [PH04]). This is the most serious implication of the 955 proposed synergistic signaling scheme with directly echoing the 956 timestamp value of the segment triggering the ACK. Even when using 957 the directly reflected timestamp values in an unmodified RTT 958 estimator, the immediate impact would be limited to causing premature 959 RTOs when the sending rate suddenly drops below two segments per RTT. 960 That is, assuming the receiver implements delayed ACK and sending one 961 ACK for every other data segment received. 
If the receiver has 962 D-SACK [RFC2883] enabled, such premature RTOs can be detected and 963 mitigated by the sender (for example, by increasing minRTO for low 964 bandwidth flows). 966 8. Acknowledgements 968 The authors would like to thank Dragana Damjanovic for some initial 969 thoughts around Timestamps and their extended potential use. 971 The editor would like to thank Bob Briscoe for his insightful 972 comments, and the gratuitous donation of text, that have resulted in 973 a substantially improved document. 975 9. Updates to Existing RFCs 977 Care has been taken to make sure the updates in this specification 978 can be deployed incrementally. 980 Updates to existing [RFC1323] implementations are only REQUIRED if 981 they do not clear the TSecr value in the initial SYN segment. This 982 is a misinterpretation of [RFC1323] and may leak data anyway (see 983 [I-D.ietf-tcpm-tcp-security]). Otherwise, there will be no need to 984 update an RFC1323-compliant TCP stack unless the timestamp 985 capabilities negotiation is to be used. 987 Implementations compliant with the definitions in this document shall 988 be prepared to encounter misbehaving senders that do not clear TSecr 989 in their initial SYN. It is believed that checking the reserved 990 bits to be all zero provides enough protection against misbehaving 991 senders. 993 10. IANA Considerations 995 With this document, the IANA is requested to establish a new registry 996 to record the timestamp capability flags defined with future versions 997 (codepoints 1, 2 and 3). 999 The lower 24 bits (3 octets) of the timestamp capabilities field may 1000 be freely assigned in future versions. The first octet must always 1001 contain the EXO, VER and MASK fields for compatibility, and the MASK 1002 field MUST be set to allow interoperation with a version 0 receiver. 1004 This document specifies version 0 and the use of the last three 1005 octets to signal the sender's timestamp clock rate to the receiver. 1007 11.
Security Considerations 1009 The algorithm presented in this paper shares security considerations 1010 with [RFC1323] (see [I-D.ietf-tcpm-tcp-security]). 1012 Some implementations address the vulnerabilities of [RFC1323], by 1013 dedicating a few low-order bits of the timestamp fields for use with 1014 a (secure) hash, that protects against malicious modification of 1015 TSecr value by the receiver. A MASK field has been provided to 1016 transparently notify the receiver about that alternate use of low- 1017 order bits. This allows the use of timestamps for purposes requiring 1018 higher integrity and security while maintaining transparency to the 1019 receiver. 1021 12. References 1023 12.1. Normative References 1025 [RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions 1026 for High Performance", RFC 1323, May 1992. 1028 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 1029 Selective Acknowledgment Options", RFC 2018, October 1996. 1031 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1032 Requirement Levels", BCP 14, RFC 2119, March 1997. 1034 12.2. Informative References 1036 [BSD10] Hayes, D., "Timing enhancements to the FreeBSD kernel to 1037 support delay and rate based TCP mechanisms", Feb 2010, . 1041 [CUBIC] Rhee, I., Ha, S., and L. Xu, "CUBIC: A New TCP-Friendly 1042 High-Speed TCP Variant", Feb 2005, . 1046 [Chirp] Kuehlewind, M. and B. Briscoe, "Chirping for Congestion 1047 Control - Implementation Feasibility", Nov 2010, . 1050 [Cho08] Cho, I., Han, J., and J. Lee, "Enhanced Response Algorithm 1051 for Spurious TCP Timeout (ER-SRTO)", Jan 2008, . 1056 [I-D.blanton-tcp-reordering] 1057 Blanton, E., Dimond, R., and M. Allman, "Practices for TCP 1058 Senders in the Face of Segment Reordering", 1059 draft-blanton-tcp-reordering-00 (work in progress), 1060 February 2003. 1062 [I-D.ietf-ledbat-congestion] 1063 Shalunov, S., Hazel, G., Iyengar, J., and M. 
Kuehlewind, 1064 "Low Extra Delay Background Transport (LEDBAT)", 1065 draft-ietf-ledbat-congestion-05 (work in progress), 1066 May 2011. 1068 [I-D.ietf-tcpm-anumita-tcp-stronger-checksum] 1069 Biswas, A., "Support for Stronger Error Detection Codes in 1070 TCP for Jumbo Frames", 1071 draft-ietf-tcpm-anumita-tcp-stronger-checksum-00 (work in 1072 progress), May 2010. 1074 [I-D.ietf-tcpm-tcp-security] 1075 Gont, F., "Security Assessment of the Transmission Control 1076 Protocol (TCP)", draft-ietf-tcpm-tcp-security-02 (work in 1077 progress), January 2011. 1079 [Linux] Sarolahti, P., "Linux TCP", Apr 2007, 1080 . 1082 [PH04] Eckstroem, H. and R. Ludwig, "The Peak-Hopper: A New End- 1083 to-End Retransmission Timer for Reliable Unicast 1084 Transport", Apr 2004, . 1087 [Path99] Allman, M. and V. Paxson, "On Estimating End-to-End 1088 Network Path Properties", Sep 1999, 1089 . 1091 [RFC1122] Braden, R., "Requirements for Internet Hosts - 1092 Communication Layers", STD 3, RFC 1122, October 1989. 1094 [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 1095 Extension to the Selective Acknowledgement (SACK) Option 1096 for TCP", RFC 2883, July 2000. 1098 [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission 1099 Timer", RFC 2988, November 2000. 1101 [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm 1102 for TCP", RFC 3522, April 2003. 1104 [RFC4015] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 1105 for TCP", RFC 4015, February 2005. 1107 [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common 1108 Mitigations", RFC 4987, August 2007. 1110 [RFC6013] Simpson, W., "TCP Cookie Transactions (TCPCT)", RFC 6013, 1111 January 2011. 1113 [RFC6247] Eggert, L., "Moving the Undeployed TCP Extensions RFC 1114 1072, RFC 1106, RFC 1110, RFC 1145, RFC 1146, RFC 1379, 1115 RFC 1644, and RFC 1693 to Historic Status", RFC 6247, 1116 May 2011. 1118 Appendix A. 
Possible Extension 1120 This section is not intended as a normative description of an 1121 extension, but merely as an example of a possible extension. Future 1122 extensions MUST set the common fields in such a way that a receiver 1123 capable of version 0 only can react appropriately. 1125 Certain hosts may want to negotiate a common optimal timestamp clock 1126 rate between each other for various purposes. For example, the 1127 balance between PAWS ([RFC1323]) and the timestamp clock resolution 1128 should be more towards one or the other. Also, if a host wants to 1129 have identical timestamp clock rates both at the sender and receiver 1130 to simplify one-way delay variation calculation, negotiating the 1131 clock rate could be useful. With identical timestamp clock rates, 1132 instead of multiplications and divisions, only additions and 1133 subtractions are required for OWD variation calculation. 1135 Without a full three way handshake, full negotiation of the timestamp 1136 clock rate is not possible. For this reason, a special semantic is 1137 required during negotiation. This allows both ends to know the exact 1138 timestamp clock rate with only two exchanged segments, while at the 1139 same time remaining compatible with version 0. 1141 For this purpose, the following extension (version 1) of this 1142 proposal is one suggestion. Depending on the exact requirements, a 1143 different signaling may be more appropriate. For example, only the 1144 two different EXP fields could be required, while a single, but 1145 higher precision FRAC field for both low and high boundaries could 1146 suffice, and some additional signaling bits could be made available. 1148 A.1.
Capability Flags 1150 Kind: 8 1152 Length: 10 bytes 1154 +-------+-------+---------------------+---------------------+ 1155 |Kind=8 | 10 | TS Value (TSval) |TS Echo Reply (TSecr)| 1156 +-------+-------+---------------------+---------------------+ 1157 1 1 4 | 4 | 1158 / | 1159 .-----------------------------------' | 1160 / \ 1161 | | 1162 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1163 |E| | # | | | | 1164 |X|VER| MASK # EXP12lo | FRAC12lo | EXP12hi | FRAC12hi | 1165 |O| | # | | | | 1166 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1168 Figure 5: Timestamp Capability enhanced flags 1170 The following additional fields are defined: 1172 VER - version (2 bits) 1173 Version 1 could indicate that the sender is capable of adjusting 1174 the timestamp clock rate within the bounds of the two 12 bit 1175 fields (see Appendix A.2). A receiver that only implements 1176 version 0 SHOULD NOT ignore the timestamp capability negotiation 1177 entirely when encountering an unsupported version, and SHOULD 1178 respond with a version 0 response nevertheless (see below) - 1179 thereby enabling enhanced uses of the timestamp value and the 1180 modification of the receiver side timestamp processing. 1182 EXP12lo and 1184 EXP12hi - binary12 Exponent (5 bits each) 1185 The exponent component of a truncated, 12 bit floating point 1186 number indicating the possible timestamp clock ranges. The 1187 exponent bias is also 28, and no special numbers (infinity, NaN) 1188 are allowed. The exponent value 31 is treated like any other 1189 exponent value. 1191 FRAC12lo and 1192 FRAC12hi - binary12 Fraction (7 bits each) 1193 The fraction component of a 12 bit floating point number. 1194 Subnormal numbers are allowed (Exponent value 0). This allows a 1195 range between 7.45 ns and 15.99 s with full resolution (lower 1196 bound is 0.06 ns using subnormal values).
As a value of zero 1197 (both exponent and fraction set to zero) has a special meaning, 1198 it is not a valid number for range negotiation. 1200 A.2. Range Negotiation 1202 Only the host initiating a TCP session MAY offer a timestamp clock 1203 range, while the receiver SHOULD select a timestamp clock within 1204 these bounds. If the receiver cannot adjust its timestamp clock to 1205 match the range, it MAY use a timestamp clock rate outside these 1206 bounds. If the receiver indicates a timestamp clock rate within the 1207 indicated bounds, the sender MUST set its timestamp clock rate to 1208 the negotiated rate. If the receiver uses a timestamp clock rate 1209 outside the indicated bounds, the sender MUST set the local timestamp 1210 clock rate to the value indicated by the closer boundary. 1212 The following example sequence is provided to demonstrate how 1213 timestamp clock range negotiation works. Both sender and receiver 1214 finally know the clock rate of their respective partner. 1216 SYN, TSopt=, TSecr=EXO|VER=1|MASK|12bit-lo=1ms|12bit-hi=100ms 1218 SYN,ACK, TSopt=, TSecr=^EXO|VER=0|MASK|16bit=10ms 1220 In this example, both hosts would run their respective timestamp 1221 clocks with a resolution of 10 ms. 1223 SYN, TSopt=, TSecr=EXO|VER=1|MASK|12bit-lo=1ms|12bit-hi=100ms 1225 SYN,ACK, TSopt=, TSecr=^EXO|VER=0|MASK|16bit=1000ms 1227 In this example, the sender would set the timestamp clock rate to a 1228 resolution of 100 ms (closer to the receiver's clock rate of 1 sec), 1229 while the receiver will have a timestamp clock rate running at 1 sec. 1231 SYN, TSopt=, TSecr=EXO|VER=1|MASK|12bit-lo=1ms|12bit-hi=100ms 1233 SYN,ACK, TSopt=, TSecr=^EXO|VER=0|MASK|16bit=100us 1235 In this example, the sender would set the timestamp clock rate to a 1236 resolution of 1 ms (closest to the receiver's clock rate of 100 us), 1237 while the receiver will have the timestamp clock running at 100 us. 1239 Appendix B. Revision history 1241 00 ...
initial draft, early submission to meet deadline. 1243 01 ... refined draft, focusing only on those capabilities that have 1244 an immediate use case. Also excluding flags that can be substituted 1245 by other means (MIR - synergistic with SACK option only, RNG moved to 1246 appendix A, BIA removed and the exponent bias set to a fixed value. 1247 Also extended other paragraphs. 1249 02 ... updated document after IETF80 - referrals to "timestamp 1250 options" were seen to be ambiguous with "timestamp option", and 1251 therefore replaced by "timestamp capabilities". Also, the document 1252 was reworked to better align with RFC4101. Removed SGN and increased 1253 FRAC to allow higher precision. 1255 Authors' Addresses 1257 Richard Scheffenegger 1258 NetApp, Inc. 1259 Am Euro Platz 2 1260 Vienna, 1120 1261 Austria 1263 Phone: +43 1 3676811 3146 1264 Email: rs@netapp.com 1266 Mirja Kuehlewind 1267 University of Stuttgart 1268 Pfaffenwaldring 47 1269 Stuttgart 70569 1270 Germany 1272 Email: mirja.kuehlewind@ikr.uni-stuttgart.de