TCP Maintenance and Minor Extensions                    R. Scheffenegger
(tcpm)                                                      NetApp, Inc.
Internet-Draft                                             M. Kuehlewind
Updates: 1323 (if approved)                      University of Stuttgart
Intended status: Experimental                                B. Trammell
Expires: April 25, 2013                                       ETH Zurich
                                                        October 22, 2012


        Additional negotiation in the TCP Timestamp Option field
                        during the TCP handshake
           draft-scheffenegger-tcpm-timestamp-negotiation-05

Abstract

   A number of TCP enhancements in fields as diverse as congestion control, loss recovery, and side-band signaling could be improved by allowing both ends of a TCP session to interpret the value carried in the Timestamp option. Further enhancements become possible by changing the receiver-side processing of timestamps in the presence of Selective Acknowledgements.

   This document updates RFC1323 and specifies a backward-compatible method for negotiating additional capabilities for the Timestamp option, and lists a number of benefits and drawbacks of this approach.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview of the TCP Timestamp Option
   4.  Extended Timestamp Capabilities
     4.1.  Description
     4.2.  Timestamp echo update for Selective Acknowledgments
   5.  Timestamp capability signaling and negotiation
     5.1.  Capability Flags
     5.2.  Timestamp clock interval encoding
     5.3.  Negotiation error detection and recovery
     5.4.  Interaction with Selective Acknowledgment
       5.4.1.  Interaction with the Retransmission Timer
       5.4.2.  Interaction with the PAWS test
     5.5.  Discussion
   6.  Acknowledgements
   7.  Updates to Existing RFCs
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Possible use cases
     A.1.  Timestamp clock rate exposure
     A.2.  Early spurious retransmit detection
     A.3.  Early lost retransmission detection
     A.4.  Integrity of the Timestamp value
     A.5.  Disambiguation with slow Timestamp clock
     A.6.  Masked timestamps as segment digest
   Appendix B.  Open Issues
   Appendix C.  Revision history
   Authors' Addresses

1.  Introduction

   The Timestamp option originally introduced in [RFC1323] was designed to support only two very specific mechanisms, round-trip time measurement (RTTM) and protection against wrapped sequence numbers (PAWS), assuming a particular TCP algorithm (Reno). The current semantics prevent the Timestamp option from being used for other purposes.

   Developments since TCP Reno, in particular Selective Acknowledgements (SACK) [RFC2018], allow different semantics, which in turn enable new uses for the Timestamp option, either for timing purposes (e.g. one-way delay variation measurement in the context of congestion control) or as a unique token (e.g. for improved loss recovery).

   This specification defines a protocol for the two ends of a TCP session to negotiate alternative semantics for the Timestamp option fields they will exchange during the rest of the session. It updates RFC1323 but is backwards compatible with implementations of RFC1323 Timestamp options, and allows gradual deployment.

   The RFC1323 timestamp protocol presents the following problems when trying to extend it for alternative uses:

   a.  Unclear meaning of the value in a timestamp.

       *  A timestamp value (TSval) as defined in [RFC1323] is deliberately only meaningful to the end that sends it. The other end is merely meant to echo the value without understanding it. This is fine if one end is trying to measure two-way delay (round-trip time). However, to measure one-way delay variation, one end needs to compare timestamps from both ends, and therefore needs to relate the values in those timestamps to a notion of the passage of time that both ends share.

   b.  No control over which timestamp to echo.

       *  A host implementing [RFC1323] is meant to echo the timestamp value of the most recent in-order segment received. This was fine for TCP Reno, but it is not the best choice for TCP sessions using selective acknowledgement (SACK) [RFC2018].

       *  A [RFC1323] host is meant to echo the timestamp value of the earliest unacknowledged segment; e.g. if a host delays the ACK for one segment and then acknowledges it together with the next one, it echoes the first timestamp, not the second.
          It is desirable to include the delay due to ACK withholding when a host is conservatively measuring RTT. However, it is not useful to include this delay when measuring one-way delay variation.

   c.  Alternative protection against wrapped sequence numbers.

       *  [RFC1323] also points out that the timestamps it specifies always strictly monotonically increase within each window, so they can be used to protect against wrapped sequence numbers (PAWS). If the endpoints negotiate an alternative timestamp scheme in which timestamps may not monotonically increase per window, then it needs to be possible to negotiate alternative protection against wrapped sequence numbers.

   To solve these problems, this specification changes the wire protocol of the TCP timestamp option in three main ways:

   1.  It updates [RFC1323] to add the ability to negotiate the semantics of timestamp options. The initiator of a TCP session starts the negotiation in the TSecr field of the first <SYN>, which is currently unused. This specification defines the semantics of the TSecr field in a segment with the SYN flag set. A version number is included to allow further extension of capability negotiation in future.

   2.  It adds a version-independent ability to mask a specified number of the least significant bits of the timestamp values. These masked bits are not considered for timestamp calculations, or in an algorithm to protect against wrapped sequence numbers. Future extensions can thereby change the timestamp signaling without changing the modified treatment on the receiver side.

   3.  It updates [RFC1323] to define version 0 of timestamp capabilities to include:

       *  the duration in seconds of a tick of the timestamp clock, using a time interval representation defined in [I-D.trammell-tcpm-timestamp-interval].
       *  agreement that both ends will echo the timestamp of the most recently received segment, rather than the one that would be echoed by an [RFC1323] host. There is no specific option to request this behavior; instead, it is implied by successful negotiation of both SACK and timestamp capabilities.

   With this new wire protocol, a number of new use cases for the TCP timestamp option become possible. Appendix A gives some examples. Further extensions might be required in future. Two possible ways to extend the negotiation capabilities are mentioned: one that maintains some of the semantics specified herein, and an incompatible extension that allows for other semantics.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

   The reader is expected to be familiar with the definitions given in [RFC1323].

   Further terminology used within this document:

   Timestamp option
      This refers to the entire TCP timestamp option, including both TSval and TSecr fields.

   Timestamp capabilities
      Refers only to the values and bits carried in the TSecr field of <SYN> and <SYN,ACK> segments during a TCP handshake. For signaling purposes, the timestamp capabilities are sent in the clear with the <SYN> segment, and in an encoded form (see Section 5 for details) in the <SYN,ACK> segment.

3.  Overview of the TCP Timestamp Option

   The TCP Timestamp option (TSopt) provides timestamp echoing for round-trip time (RTT) measurements. TSopt is widely deployed and activated by default in many systems.

   [RFC1323] specifies TSopt as follows:

      Kind: 8

      Length: 10 bytes

   +-------+-------+---------------------+---------------------+
   |Kind=8 |  10   |  TS Value (TSval)   |TS Echo Reply (TSecr)|
   +-------+-------+---------------------+---------------------+
       1       1              4                     4

                        Figure 1: RFC1323 TSopt

      "The Timestamps option carries two four-byte timestamp fields. The Timestamp Value field (TSval) contains the current value of the timestamp clock of the TCP sending the option.

      The Timestamp Echo Reply field (TSecr) is only valid if the ACK bit is set in the TCP header; if it is valid, it echos a timestamp value that was sent by the remote TCP in the TSval field of a Timestamps option. When TSecr is not valid, its value must be zero. The TSecr value will generally be from the most recent Timestamp option that was received; however, there are exceptions that are explained below.

      A TCP may send the Timestamps option (TSopt) in an initial <SYN> segment (i.e., segment containing a SYN bit and no ACK bit), and may send a TSopt in other segments only if it received a TSopt in the initial <SYN> segment for the connection."

   Comparing the timestamp in the TSecr field with the current timestamp clock gives an estimate of the two-way delay (RTT). With [RFC1323], the receiver is not supposed to interpret the TSval field for timing purposes (e.g. one-way delay variation measurements), but only to echo its content in the TSecr field. [RFC1323] specifies various cases in which more than one timestamp is available to echo. The only property exposed to a receiver is a strictly monotonic increase in value, for use with the protection against wrapped sequence numbers (PAWS) test. The approach taken by [RFC1323] is not always the best choice, e.g. when the TCP Selective Acknowledgment option (SACK) is used on the same session.

4.  Extended Timestamp Capabilities

4.1.  Description

   Timestamp values are carried in each segment if negotiated. However, the receiver is to treat their content as an immutable and largely uninterpreted entity. The timestamp negotiation should satisfy the following criteria:

   o  Allow timing information to be stated explicitly during the initial handshake, avoiding the proliferation of ad-hoc heuristics that determine this information by other means. Heuristics that simply assume a specific timestamp clock interval, or that try to learn the clock interval used by the partner during a training phase extending beyond the initial handshake, can thereby be avoided. This is discussed further in [I-D.trammell-tcpm-timestamp-interval].

   o  Indicate the (approximate) timestamp clock interval used by the sender over a wide range. The longest interval should be around 10 seconds, while the shortest interval should allow unique timestamps per segment, even at extremely high link speeds. A representation for timestamp intervals that is independent of the negotiation method is given in [I-D.trammell-tcpm-timestamp-interval].

   o  Allow for timestamps that are not directly related to real time (e.g. segment counting, or use of the timestamp value as a true extension of sequence numbers).

   o  Provide means to prevent, or at least detect, tampering with the echoed timestamp value, allowing for basic integrity and consistency checks.

   o  Allow for future extensions that may use some of the timestamp value bits for other signaling purposes during the remainder of the session.

   o  Signaling must be backwards compatible with existing TCP stacks implementing basic [RFC1323] timestamps. Current methods for timestamp value generation must be supported.

   o  Allow for a means to disambiguate between retransmitted and delayed segments.
   o  Cater for broken implementations of [RFC1323] that either send a non-zero TSecr value in the initial <SYN>, or a zero TSecr value in the <SYN,ACK>.

   o  Provide flexibility to extend the negotiation protocol. Both backwards-compatible and incompatible extensions of the use of timestamps should be possible.

4.2.  Timestamp echo update for Selective Acknowledgments

   In [RFC1323], timing information is only considered in relation to calculating a (conservative) estimate of the round-trip time, in order to arrive at a reasonable retransmission timeout (RTO). A retransmission timeout is a very expensive event in TCP, in terms of lost throughput and other metrics. For this reason, a receiver had to follow special rules about which timestamp to echo, so that the actual RTT would never be underestimated, even during periods of loss or reordering on either the forward or the return path. At the time [RFC1323] was defined, no other explicit signal could convey the presence of such events back to the sender. Therefore, a receiver had to ensure that, at best, the timestamp of the last in-sequence segment would be echoed to the sender.

   Receivers conforming to [RFC1323] are required to reflect only the timestamp of the last segment that was received in order, or, in the case of delayed acknowledgments, the timestamp of the earliest segment not yet acknowledged.

   When selective acknowledgment (SACK) is enabled on a session, the presence of a SACK option explicitly signals reordering or loss to the sender. The sender can use this information to suspend the calculation of updated RTT estimates. As the SACK option is present in multiple ACKs, this also prevents the RTT from being artificially inflated when some of the ACKs indicating loss are dropped on the return path.
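   As an illustration of the sender-side rule above, the following C sketch walks the option list of an arriving ACK, extracts the TSopt of Figure 1 (kind 8, length 10), and takes an RTT sample from TSecr only when the ACK carries no SACK option (kind 5, [RFC2018]). It is a minimal sketch, not a complete RTTM implementation: struct ack_opts, parse_tcp_options() and rtt_sample() are hypothetical names, and the caller is assumed to supply the current timestamp clock value in ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TCP option kinds used here: End of Option List (0), No-Operation (1),
 * SACK (5, RFC 2018) and Timestamps (8, RFC 1323). */
enum { TCPOPT_EOL = 0, TCPOPT_NOP = 1, TCPOPT_SACK = 5, TCPOPT_TS = 8 };
#define TCPOLEN_TS 10

/* Hypothetical summary of the options found in one ACK. */
struct ack_opts {
    bool     has_ts;
    bool     has_sack;
    uint32_t tsval;
    uint32_t tsecr;
};

/* Read a 32-bit value stored in network byte order. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk the option bytes of a TCP header; returns false if the option
 * list is malformed. */
static bool parse_tcp_options(const uint8_t *opt, size_t len,
                              struct ack_opts *out)
{
    assert(out != NULL && (opt != NULL || len == 0));
    *out = (struct ack_opts){0};
    size_t i = 0;
    while (i < len) {
        uint8_t kind = opt[i];
        if (kind == TCPOPT_EOL)
            break;
        if (kind == TCPOPT_NOP) {
            i++;
            continue;
        }
        /* All other options carry a length byte covering kind and length. */
        if (i + 1 >= len || opt[i + 1] < 2 || opt[i + 1] > len - i)
            return false;
        uint8_t olen = opt[i + 1];
        if (kind == TCPOPT_TS && olen == TCPOLEN_TS) {
            out->has_ts = true;
            out->tsval = get_be32(&opt[i + 2]);
            out->tsecr = get_be32(&opt[i + 6]);
        } else if (kind == TCPOPT_SACK) {
            out->has_sack = true;
        }
        i += olen;
    }
    return true;
}

/* Take an RTT sample, in timestamp clock ticks, only from an ACK that
 * carries a TSopt and no SACK option. */
static bool rtt_sample(const struct ack_opts *o, uint32_t now_ticks,
                       uint32_t *rtt_ticks)
{
    if (!o->has_ts || o->has_sack)
        return false;
    *rtt_ticks = now_ticks - o->tsecr;   /* unsigned, wraps modulo 2^32 */
    return true;
}
```

   For an ACK carrying NOP, NOP and a TSopt with TSecr = 40, a timestamp clock reading of 55 yields a 15-tick sample; the same ACK with a SACK option appended yields no sample.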

   A receiver supporting the timestamp negotiation mechanism defined in this document MUST immediately reflect the TSval of the segment triggering an ACK when the same session also supports SACK.

   The rules for updating the state variable TS.Recent remain identical to those of [RFC1323], and TS.Recent must be evaluated when performing the PAWS test on the receiver side.

   With this change of semantics when timestamps and selective acknowledgments [RFC2018] are used in the same session, enhancements in loss recovery become possible, because any remaining retransmission and acknowledgment ambiguity is removed. See Appendix A for a more detailed discussion. Through this modification of which timestamp the receiver echoes, timestamps fulfill the properties of the "token" described in [I-D.sabatini-tcp-sack].

5.  Timestamp capability signaling and negotiation

   To signal the supported capabilities, both the sender and the receiver independently generate a timestamp capability negotiation field, as described below. During the initial <SYN> and <SYN,ACK> segments, the TSecr field of the [RFC1323] TSopt is overloaded with the following flags and fields. The connection initiator sends its timestamp capabilities unencoded, since [RFC1323] does not use the TSecr field in the initial <SYN>. The receiver XORs its local timestamp capabilities with the TSval received from the sender and sends the result in the TSecr field. The initiating host of a session with timestamp capability negotiation therefore has to keep minimal state (the TSval it sent) in order to decode the returned capabilities.

5.1.  Capability Flags

      Kind: 8

      Length: 10 bytes

   +-------+-------+---------------------+---------------------+
   |Kind=8 |  10   |  TS Value (TSval)   |TS Echo Reply (TSecr)|
   +-------+-------+---------------------+---------------------+
       1       1              4          |          4          |
                                         |                     |
   .-------------------------------------'                     '---.
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|   |         #                                               |
   |X|VER|   MSK   #           version specific contents           |
   |O|   |         #                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: Timestamp Capability flags

   Common fields to all versions:

   EXO - Extended Options (1 bit)
      Indicates that the sender supports extended timestamp capabilities as defined by this document; a compliant implementation MUST set it to one. This flag also enables the immediate echoing of the TSval with the next ACK, if both timestamp capabilities and selective acknowledgement [RFC2018] are successfully negotiated during the initial handshake (see Section 4.2 and Section 5.4). This change in semantics is independent of the version in the signaled timestamp capabilities.

   VER - Version (2 bits)
      Version of the capability field definition. This document specifies codepoint 0 (00b). If the received version is not supported, hosts must not interpret the received timestamps, nor use a timestamp value as input to advanced heuristics, with two exceptions: the immediate echoing of timestamps (which simplifies receiver-side processing) and the masking of some least significant bits before performing the Protection Against Wrapped Sequence Numbers (PAWS) test. This is the same requirement as for current [RFC1323]-compliant implementations.
      The lower 3 octets of the timestamp capability flags MUST be ignored if an unsupported version is received. It is expected that a host will implement at least version 0.
      A receiver MUST respond with the appropriate version (the received version, or version 0) when responding to a new session request.

   MSK - Mask Timestamps (5 bits)
      The MaSK field indicates how many least significant bits the receiver should exclude before further processing the timestamp (i.e. for PAWS or for timing purposes). The unmasked portion of a TSval has to comply with the constraints imposed by [RFC1323] on the generation of valid timestamps, e.g. it must be monotonically non-decreasing from segment to segment, and must strictly increase within each TCP window.
      Note that this does not affect the reflected timestamp in any way: TSecr will always be equal to an appropriate TSval. This field MUST be present in all future versions of the timestamp capability fields. A receiver MUST interpret a value of 31 (all bits set) as indicating that the full TSval is to be ignored by any legacy heuristics, e.g. disabling PAWS. For PAWS to be effective, at least two unmasked bits are required to discriminate between an increase (including a roll-over) and outdated segments.

5.2.  Timestamp clock interval encoding

      Kind: 8

      Length: 10 bytes

   +-------+-------+---------------------+---------------------+
   |Kind=8 |  10   |  TS Value (TSval)   |TS Echo Reply (TSecr)|
   +-------+-------+---------------------+---------------------+
       1       1              4          |          4          |
                                         |                     |
   .-------------------------------------'                     '---.
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|   |         #               |                               |
   |X|VER|   MSK   #  reserved (0) |           interval            |
   |O|   |         #               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 3: Timestamp Capability flags - version 0

   reserved (8 bits)
      Reserved for future use; MUST be zero ("0") with version 0.
      If timestamp capabilities are received with the version set to 0 but some of these bits set, the receiver MUST ignore the extended options field and react as if TSecr were zero (compatibility mode).

   interval (16 bits)
      The interval of the timestamp clock, as defined in [I-D.trammell-tcpm-timestamp-interval].

5.3.  Negotiation error detection and recovery

   During the initial TCP three-way handshake, timestamp capabilities are negotiated using the TSecr field. Timestamp capabilities MAY only be negotiated in TSecr when the SYN bit is set. A host detects the presence of timestamp capability flags when the EXO bit is set in the TSecr field of the received segment. When receiving a session request (a <SYN> segment with timestamp capabilities), a compliant TCP receiver is required to XOR the received TSval with its own timestamp capabilities and to send the resulting value in its <SYN,ACK> response.

   To meet the design goals stated in Section 4, only the TSecr field in the initial <SYN> can be used directly. The response from the receiver has to be encoded, since no unused field is available in the <SYN,ACK>. The most straightforward encoding is an XOR with a value that is known to the sender. Therefore, the receiver also uses TSecr to indicate its capabilities, but calculates the XOR with the received TSval. This allows the receiver to remain stateless, so that functionality like the SYN cache (see [RFC4987]) can be maintained without change.

   If a sender has to retransmit the <SYN>, this encoding also allows it to detect which <SYN> was received and responded to. This is possible by changing the timestamp clock offset between retransmissions in such a way that decoding on the sender side would result in an invalid timestamp capability negotiation field (e.g. some reserved (RES) bits are set).
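   The encoding described above, together with the field layout of Figures 2 and 3, can be sketched in C. This is a minimal illustration, assuming that the leftmost bit in the figures is the most significant bit of the 32-bit TSecr value (the usual convention for IETF bit diagrams); all macro and function names are hypothetical. tscap_syn_offset() also shows one way to implement the precaution, described in this section, of keeping at least one RES bit set in the <SYN> TSval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout of Figures 2 and 3; bit 31 is the leftmost bit (EXO). */
#define TSCAP_EXO        (UINT32_C(1) << 31)
#define TSCAP_VER_SHIFT  29
#define TSCAP_MSK_SHIFT  24
#define TSCAP_RES_SHIFT  16
#define TSCAP_RES_MASK   (UINT32_C(0xFF) << TSCAP_RES_SHIFT)

static uint32_t tscap_ver(uint32_t c)      { return (c >> TSCAP_VER_SHIFT) & 0x3; }
static uint32_t tscap_msk(uint32_t c)      { return (c >> TSCAP_MSK_SHIFT) & 0x1F; }
static uint32_t tscap_interval(uint32_t c) { return c & 0xFFFF; }

/* Build a version 0 field: EXO set, VER = 0, reserved bits clear. */
static uint32_t tscap_make_v0(uint32_t msk, uint16_t interval)
{
    assert(msk <= 31);
    return TSCAP_EXO | (msk << TSCAP_MSK_SHIFT) | interval;
}

/* Accept a decoded field as version 0 only if EXO is set, VER is 0 and
 * all reserved bits are zero; otherwise fall back to RFC1323 behavior. */
static bool tscap_valid_v0(uint32_t c)
{
    return (c & TSCAP_EXO) != 0 && tscap_ver(c) == 0 &&
           (c & TSCAP_RES_MASK) == 0;
}

/* Responder: the <SYN,ACK> TSecr carries its capabilities XORed with
 * the TSval of the <SYN>; no per-connection state is needed. */
static uint32_t tscap_encode_synack(uint32_t local_caps, uint32_t syn_tsval)
{
    return local_caps ^ syn_tsval;
}

/* Initiator: undo the XOR with the TSval it sent in its <SYN>. */
static uint32_t tscap_decode_synack(uint32_t synack_tsecr, uint32_t sent_tsval)
{
    return synack_tsecr ^ sent_tsval;
}

/* Initiator: pick a clock offset so that the <SYN> TSval has at least
 * one reserved bit set. A legacy peer returning TSecr = 0 then decodes
 * to an invalid field, and one echoing the TSval decodes to 0 (EXO clear). */
static uint32_t tscap_syn_offset(uint32_t clock_ticks)
{
    return (clock_ticks & TSCAP_RES_MASK) ? 0 : (UINT32_C(1) << TSCAP_RES_SHIFT);
}
```

   A responder computes tscap_encode_synack() from its own capabilities and the received TSval without keeping per-connection state; the initiator recovers the peer's field with tscap_decode_synack() and keeps [RFC1323] behavior unless the decoded field passes a check such as tscap_valid_v0().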

   If the sender does not need to differentiate which <SYN> was received, the timestamp clock offset for each new <SYN> can be chosen so that the TSopt of the <SYN> is identical for each retransmission.

   As a receiver MAY report back a zero value at any time, in particular in the <SYN,ACK>, the sender is slightly constrained in its selection of an initial Timestamp value. The Timestamp value sent in the <SYN> should be selected so that it does not resemble a valid Timestamp capabilities field. One approach to ensure this property is for the sender to make sure that at least one bit of the RES field is set. This prevents a compliant sender from erroneously detecting a compliant receiver if the returned TSecr value is zero.

   A host initiating a TCP session must verify that the partner also supports timestamp capability negotiation and a supported version before using enhanced algorithms. Note that this change in semantics does not necessarily change the signaling of timestamps on the wire after the initial negotiation.

   To mitigate the effect of misbehaving TCP senders appearing to negotiate timestamp capabilities, a receiver MUST verify that one specific bit (EXO) is set, and that all reserved bits (currently 8, the RES field) are cleared. This limits the chance that a receiver mistakenly negotiates version 0 capabilities in the presence of a misbehaving sender to around 0.05%. Given the prevalence of misbehaving senders and the distribution of observed TSecr values, this is further limited to less than 1 in 6 million. The modifications described in [I-D.ietf-tcpm-1323bis], when implemented in a receiver, would decrease the probability of false negotiation further, to less than 10^-7.
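   Assuming, for illustration only and not as a measurement, that the TSecr bits of a misbehaving sender are independent and uniformly distributed, the 0.05% figure corresponds to the 11 bits a receiver checks before accepting version 0: EXO, the two VER bits, and the eight reserved bits.

```latex
P_{\mathrm{false}}
  = \underbrace{2^{-1}}_{\mathrm{EXO}=1}
    \cdot \underbrace{2^{-2}}_{\mathrm{VER}=0}
    \cdot \underbrace{2^{-8}}_{\mathrm{RES}=0}
  = 2^{-11} = \frac{1}{2048} \approx 4.9 \times 10^{-4} \approx 0.05\,\%
```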

   However, as a receiver has to use the changed semantics when reflecting TSval also for higher values in the version field, a misbehaving sender that negotiates SACK but does not properly clear TSecr may have a 37.5% chance of receiving timestamp values with modified receiver behavior (out of an approximate population of 0.00036% of sessions started without a cleared TSecr). This may lead to an increased number of spurious retransmission timeouts, putting such a session from a misbehaving TCP sender at a disadvantage.

   Once timestamp capabilities are successfully negotiated, the receiver MUST ignore the indicated number of masked, low-order bits before applying the heuristics defined in [RFC1323]. If the full 32-bit field, including the masked bits, were used, the monotonic increase of the timestamp value for each new segment could be violated, which conflicts with the constraints imposed by PAWS.

   The layout of the three common fields, EXO, VER and MSK, which MUST be present regardless of which version a compliant TCP stack implements, results from the design goals mentioned previously. The lower three octets MAY be redefined freely in subsequent versions of the timestamp capability negotiation protocol. This allows a future version to be implemented in such a way that a receiver can still operate with the modified behavior and a minimum amount of processing (PAWS).

5.4.  Interaction with Selective Acknowledgment

   If both Timestamp capabilities and Selective Acknowledgement options [RFC2018] are negotiated (both hosts send these options in their respective handshake segments), both hosts MUST echo the timestamp value of the last received segment, irrespective of the order of delivery. Note that this conflicts with [RFC1323], where only the timestamp of the last segment received in sequence is echoed.
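   The receiver-side rules above (masked PAWS test, choice of the timestamp to echo, and the TS.Recent update of Section 5.4.2) can be sketched in C as follows. This is a simplified illustration under stated assumptions: struct ts_rcv_state and the function names are hypothetical, the [RFC1323] branch approximates the Last.ACK.sent rule with two flags supplied by the caller, the idle-connection exception of PAWS is omitted, and timestamps are compared with modulo-2^32 serial arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection receiver state. */
struct ts_rcv_state {
    bool     ext_ts;      /* timestamp capabilities negotiated (EXO) */
    bool     sack_ok;     /* SACK negotiated on this session */
    uint8_t  peer_msk;    /* MSK value announced by the peer, 0..31 */
    uint32_t ts_recent;   /* TS.Recent */
    uint32_t ts_to_echo;  /* TSval to place in TSecr of the next ACK */
};

/* Clear the low-order bits that the peer asked to be ignored. */
static uint32_t ts_unmasked(uint32_t tsval, uint8_t msk)
{
    assert(msk <= 31);
    return tsval & ~((UINT32_C(1) << msk) - 1);
}

/* PAWS test on the unmasked bits: reject a segment whose TSval is older
 * than TS.Recent. MSK == 31 disables PAWS entirely. */
static bool paws_accept(const struct ts_rcv_state *s, uint32_t tsval)
{
    uint8_t msk = s->ext_ts ? s->peer_msk : 0;
    if (msk >= 31)
        return true;
    uint32_t diff = ts_unmasked(tsval, msk) - ts_unmasked(s->ts_recent, msk);
    return diff < (UINT32_C(1) << 31);   /* newer than or equal to TS.Recent */
}

/* Record the TSval of a segment that passed the PAWS test. in_order is
 * true if the segment advances the left edge of the receive window;
 * ack_pending is true if a delayed ACK already covers an earlier,
 * not yet acknowledged segment. */
static void ts_on_segment(struct ts_rcv_state *s, uint32_t tsval,
                          bool in_order, bool ack_pending)
{
    if (s->ext_ts && s->sack_ok) {
        /* Negotiated semantics: echo the TSval of the segment that
         * triggers the ACK, whatever its position in sequence space. */
        s->ts_to_echo = tsval;
        /* Optional stricter TS.Recent rule of Section 5.4.2. */
        if (in_order)
            s->ts_recent = tsval;
        return;
    }
    /* RFC1323 semantics (simplified): echo the earliest unacknowledged
     * in-sequence segment; later or out-of-order segments do not
     * replace it. */
    if (in_order && !ack_pending) {
        s->ts_recent = tsval;
        s->ts_to_echo = tsval;
    }
}
```

   With MSK = 4 and a TS.Recent of 0x100, the PAWS test treats TSvals 0x10F and 0x100 as equal, while 0x0F5 is rejected as outdated.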

   As SACK allows reordered or lost segments to be identified, the reflected timestamp is not required to convey the most conservative information. If SACK indicates lost or reordered packets at the receiver, the sender MUST take appropriate action, such as ignoring the received timestamps when calculating the round-trip time, or assuming a delayed packet (with appropriate handling). An updated algorithm to calculate the retransmission timeout (RTO) is beyond the scope of this document.

   The immediate echoing of the last received timestamp value, enabled by the simultaneous use of the Timestamp option and the SACK option, allows enhancements to loss recovery and to round-trip time (RTT) and one-way delay (OWD) variation measurements (see Appendix A), even during loss or reordering episodes. This works because unique timestamps for every retransmission remove any retransmission ambiguity, while the SACK option indicates the ordering of received segments even in the presence of ACK loss or reordering.

   For legacy applications of the timestamp option such as RTTM and PAWS, the presence of the SACK option gives a clear indication of loss or reordering. Under these circumstances, RTTM should not be invoked even under [RFC1323], but often is, due to the separate handling of timestamp and SACK options.

   The use of RTT and OWD measurements during loss episodes is an open research topic. When a TCP sender has negotiated an immediate reflection of the timestamp triggering an ACK (i.e. both timestamp capability negotiation and Selective Acknowledgements are enabled for the session), it has to accommodate the changed timestamp semantics in order to maintain a conservative estimate of the Retransmission Timer as defined in [RFC6298].
The presence of a SACK option in an ACK signals an ongoing reordering or loss episode. Therefore, timestamps conveyed in such segments MUST NOT be used to update the retransmission timeout. Also note that the presence of a SACK option removes the need for the receiver to reflect the last in-order timestamp, as lost ACKs can no longer cause erroneous updates of the retransmission timeout.

5.4.1. Interaction with the Retransmission Timer

The rule stated above is to ignore timestamps as soon as a SACK option is present. This rule is fully consistent with the guidance given in [RFC1323], even though most implementations skip this additional verification step when a SACK option is present.

To address the additional delay imposed by delayed ACKs, a compliant sender SHOULD modify the update procedure when receiving normal, in-sequence ACKs that acknowledge more than SMSS bytes. The input (denoted R in [RFC6298]) is then calculated as

   R = RTT * ( 1 + 1/(cwnd/smss) )

If RTT (as measured in units of the timestamp clock) is smaller than the congestion window measured in full-sized segments, the above heuristic MAY be skipped when updating the retransmission timeout value.

5.4.2. Interaction with the PAWS test

The PAWS test as defined in [RFC1323] requires monotonically increasing timestamp values at the receiver side. As TS.Recent is no longer used to track which timestamp to echo, this variable can be reused for PAWS. Instead of tracking the timestamp sent in the most recent ACK, a stricter update rule could be used:

   "For example, we might save the timestamp from the segment that last advanced the left edge of the receive window, i.e., the most recent in-sequence segment."

TS.Recent is then only updated when the left edge of the receive window advances, and no longer has to take delayed ACKs into account.

5.5. Discussion

RTT and OWD variation during loss episodes has not been researched in depth. Current heuristics ([RFC1122], [RFC1323], Karn's algorithm [RFC2988]) explicitly exclude (and prevent) the use of RTT samples when loss occurs. However, solving the retransmission ambiguity problem, and the related problem of reliable ACK delivery, would enable new functionality to improve TCP processing. In addition, an immediate echo of the last received timestamp value would enable new research into distinguishing corruption loss (assumed to have no RTT/OWD impact) from congestion loss (assumed to have RTT/OWD impact). This field appears to be rather neglected, especially for large-scale investigations on the public Internet. By their very nature, passive investigations without explicit signals in the headers are of only limited use for such empirical research.

Detecting retransmission ambiguity during loss recovery would allow an additional level of loss recovery control without reverting to timer-based methods. The deployment of SACK separated "what" to send from "when" to send it, and this separation could be taken one step further. In particular, less conservative loss recovery schemes do not trade the principle of packet conservation against timeliness. Such schemes require prompt and reliable feedback from the receiver about every delivered segment and the order of delivery. SACK [RFC2018] alone goes a long way, but additionally using timestamp information could remove any remaining ambiguity. However, the current specification in [RFC1323] makes that use impossible, so modified semantics (receiver behavior) are necessary.

However, immediate echoing of timestamp values would break some legacy per-packet RTT measurements. The time an ACK is held back by the delayed ACK mechanism would no longer be included in the samples.
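Sections 5.4 and 5.4.1 define sender-side rules that compensate for exactly this delayed-ACK effect. They can be sketched as a timestamp-fed [RFC6298] estimator. This is a minimal sketch under stated assumptions, not a reference implementation:

* The class name `RtoEstimator` is hypothetical.
* RTT samples are assumed to be already derived from the echoed TSecr and converted to seconds.
* A clock granularity G of 1 ms is assumed.
* The optional bypass of Section 5.4.1 and the maximum RTO are not modeled.

```python
ALPHA, BETA, K = 1 / 8, 1 / 4, 4   # RFC 6298 constants
G = 0.001                           # assumed clock granularity in seconds
MIN_RTO = 1.0                       # RFC 6298 lower bound in seconds


class RtoEstimator:
    """RFC 6298 estimator fed with timestamp-based RTT samples (seconds)."""

    def __init__(self, smss):
        self.smss = smss
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0              # RFC 6298 initial RTO

    def on_ack(self, rtt, acked_bytes, cwnd, has_sack):
        """Process one ACK's RTT sample and return the current RTO."""
        if has_sack:
            # A SACK option signals loss or reordering: the sample
            # MUST NOT be used to update the retransmission timeout.
            return self.rto
        r = rtt
        if acked_bytes > self.smss:
            # Delayed-ACK compensation: R = RTT * (1 + 1/(cwnd/smss)).
            r = rtt * (1 + 1 / (cwnd / self.smss))
        if self.srtt is None:
            # First measurement (RFC 6298, rule 2.2).
            self.srtt, self.rttvar = r, r / 2
        else:
            # Subsequent measurements (RFC 6298, rule 2.3);
            # RTTVAR is updated using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = max(MIN_RTO, self.srtt + max(G, K * self.rttvar))
        return self.rto
```

For example, with smss = 1000 the following sequence applies:

1. A first 2.0 s sample yields an RTO of 6.0 s.
2. An ACK carrying a SACK option with a 10.0 s sample leaves the RTO unchanged.
3. A 2.0 s sample on an ACK covering 2000 bytes with cwnd = 4000 is inflated to R = 2.5 s, giving an RTO of 5.5625 s.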
Research has shown that per-packet updates of the RTT estimate (for the purpose of calculating a reasonable RTO value) are of only limited benefit (see [Path99] and [PH04]). Breaking these legacy per-packet measurements is the most serious implication of directly echoing the timestamp value of the segment that triggered the ACK, when the SACK option is also negotiated. The directly reflected timestamp values might even be fed into an unmodified RTT estimator. The immediate impact would then be limited to premature RTOs when the sending rate suddenly drops below two segments per RTT. This assumes that the receiver implements delayed ACKs and sends one ACK for every other data segment received. If the receiver also has D-SACK [RFC2883] enabled, the sender can detect and mitigate such premature RTOs (for example, by increasing minRTO for low-bandwidth flows).

Allowing timestamps to play a more important role in TCP signaling also gives rise to concerns. When the timestamp is used for congestion control purposes, malicious receivers have an incentive to reflect tampered timestamps. During the early phases of the introduction of CUBIC, such modifications were shown to give unfair advantages to malicious receivers that selectively alter the reflected timestamp values (see [CUBIC]). For that very reason, this document explicitly allows a signal to be included in the timestamp values that is excluded from any processing by the receiver. A sender can then decide how to use this capability. Examples are additional security information, improved loss recovery, or other, as yet unknown, purposes.

6. Acknowledgements

The authors would like to thank Dragana Damjanovic for some initial thoughts on timestamps and their potential extended uses.
We would like to thank Bob Briscoe for his insightful comments and the generous donation of text, which have resulted in a substantially improved document.

We further want to thank Michael Welzl for his input and discussion.

7. Updates to Existing RFCs

Care has been taken to make sure the updates in this specification can be deployed incrementally.

Updates to existing [RFC1323] implementations are only REQUIRED if they do not clear the TSecr value in the initial <SYN> segment. Not clearing it is a misinterpretation of [RFC1323] and may leak data in any case (see [I-D.ietf-tcpm-tcp-security]). See also [I-D.ietf-tcpm-1323bis], which stipulates that the TSval sent in a should be zeroed, further reducing the chance of a false positive. These changes are expected to be implemented in stacks that make use of timestamp negotiation. Otherwise, there is no need to update an RFC1323-compliant TCP stack unless timestamp capabilities negotiation is to be used.

Implementations compliant with the definitions in this document shall be prepared to encounter misbehaving senders that do not clear TSecr in their initial <SYN>. Checking that the reserved bits are all zero is believed to provide enough protection against misbehaving senders.

In the unlikely case of an erroneous negotiation of timestamp capabilities between a compliant receiver and a misbehaving sender, the proposed semantic changes will trigger a higher rate of spurious retransmissions. Time-based heuristics on the receiver side may further negatively impact congestion control decisions. Overall, the misbehaving sender will suffer self-inflicted reductions in TCP performance.

8. IANA Considerations

This document requests IANA to establish a new registry to record the timestamp capability flags defined by future versions (codepoints 1, 2, and 3).
The lower 24 bits (3 octets) of the timestamp capabilities field may be freely assigned in future versions. For compatibility, the first octet must always contain the EXO, VER and MASK fields, and the MASK field MUST be set to allow interoperation with a version 0 receiver.

This document specifies version 0 and the use of the last three octets to signal the sender's timestamp clock interval to the receiver.

9. Security Considerations

The mechanism presented in this document shares its security considerations with [RFC1323] (see [I-D.ietf-tcpm-tcp-security]).

An implementation can address the vulnerabilities of [RFC1323] by dedicating a few low-order bits of the timestamp fields to a (secure) hash. This hash protects against malicious modification of the returned timestamp value by the receiver. A MASK field has been provided to explicitly notify the receiver of this alternate use of low-order bits. This allows timestamps to be used for purposes requiring higher integrity and security, while still allowing the receiver to extract useful information.

10. References

10.1. Normative References

[I-D.trammell-tcpm-timestamp-interval]  Scheffenegger, R., Kuehlewind, M., and B. Trammell, "Exposure of Time Intervals for the TCP Timestamp Option", draft-trammell-tcpm-timestamp-interval-00 (work in progress), October 2012.

[RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992.

[RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

[BSD10]  Hayes, D., "Timing enhancements to the FreeBSD kernel to support delay and rate based TCP mechanisms", Feb 2010.
[CUBIC]  Rhee, I., Ha, S., and L. Xu, "CUBIC: A New TCP-Friendly High-Speed TCP Variant", Feb 2005.

[Cho08]  Cho, I., Han, J., and J. Lee, "Enhanced Response Algorithm for Spurious TCP Timeout (ER-SRTO)", Jan 2008.

[I-D.blanton-tcp-reordering]  Blanton, E., Dimond, R., and M. Allman, "Practices for TCP Senders in the Face of Segment Reordering", draft-blanton-tcp-reordering-00 (work in progress), February 2003.

[I-D.ietf-tcpm-1323bis]  Borman, D., Braden, R., Jacobson, V., and R. Scheffenegger, "TCP Extensions for High Performance", draft-ietf-tcpm-1323bis-04 (work in progress), August 2012.

[I-D.ietf-tcpm-anumita-tcp-stronger-checksum]  Biswas, A., "Support for Stronger Error Detection Codes in TCP for Jumbo Frames", draft-ietf-tcpm-anumita-tcp-stronger-checksum-00 (work in progress), May 2010.

[I-D.ietf-tcpm-tcp-security]  Gont, F., "Survey of Security Hardening Methods for Transmission Control Protocol (TCP) Implementations", draft-ietf-tcpm-tcp-security-03 (work in progress), March 2012.

[I-D.sabatini-tcp-sack]  Sabatini, A., "Highly Efficient Selective Acknowledgement (SACK) for TCP", draft-sabatini-tcp-sack-01 (work in progress), August 2012.

[Linux]  Sarolahti, P., "Linux TCP", Apr 2007.

[PH04]  Eckstroem, H. and R. Ludwig, "The Peak-Hopper: A New End-to-End Retransmission Timer for Reliable Unicast Transport", Apr 2004.

[Path99]  Allman, M. and V. Paxson, "On Estimating End-to-End Network Path Properties", Sep 1999.

[RFC1122]  Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC2883]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An Extension to the Selective Acknowledgement (SACK) Option for TCP", RFC 2883, July 2000.

[RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000.
[RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for TCP", RFC 3522, April 2003.

[RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for TCP", RFC 4015, February 2005.

[RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", RFC 4987, August 2007.

[RFC6013]  Simpson, W., "TCP Cookie Transactions (TCPCT)", RFC 6013, January 2011.

[RFC6247]  Eggert, L., "Moving the Undeployed TCP Extensions RFC 1072, RFC 1106, RFC 1110, RFC 1145, RFC 1146, RFC 1379, RFC 1644, and RFC 1693 to Historic Status", RFC 6247, May 2011.

[RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, June 2011.

Appendix A. Possible use cases

A.1. Timestamp clock rate exposure

Today, each TCP host may use an arbitrary, locally defined clock source from which to derive the timestamp value. Only a handful of clock rates are typically implemented in common TCP stacks, but there is no guarantee that future stacks will choose the same clock rates. This poses a problem for current state-of-the-art heuristics that try to determine the sender's timestamp clock rate by purely passive observation of the TCP stream. The problem affects both advanced heuristics in the peer of a TCP session and arbitrarily located passive observation points that estimate TCP session parameters.

The proposed mechanism would reveal this information explicitly. Other environmental factors, such as the operation of a TCP stack in a virtualized environment, may still cause the clock rate actually used to deviate somewhat.

High-speed and real-time stacks would be expected to operate with higher clock rates. The observed variance of the (known) timestamp clock versus a reference clock could, for example, help to distinguish between physical and virtual end hosts.

A.2. Early spurious retransmit detection

With the timestamp negotiation scheme provided here, hosts using slow-running timestamp clocks can set aside a small number of least significant bits in the timestamps. These bits can be used to differentiate between original and retransmitted segments, even within the same timestamp clock tick (i.e. when the RTT is shorter than the TCP timestamp clock interval). It is recommended to use only a single bit (mask = 1) unless the sender can also perform lost retransmission detection. Using more than 2 bits for this purpose is discouraged, because the probability of losing a retransmitted packet more than once diminishes rapidly.

A simple scheme could send normal data segments with all masked bits cleared, and each advance of the timestamp clock clears these bits again. When a segment is retransmitted before the timestamp clock has advanced, the masked bits are incremented by one for each consecutive retry of the same segment, until the maximum value is reached. Newly sent segments (during the same clock interval) should keep the current value of these bits. This maintains monotonically increasing values, even though compliant end hosts do not require this property. The scheme thus keeps the timestamp values, including the masked bits, monotonically increasing.

Immediate mirroring of timestamps requires both timestamp capabilities negotiation and selective acknowledgments. Even without negotiating it, this scheme extends the use of the Eifel Detection [RFC3522] and Eifel Response [RFC4015] algorithms to detect and react to spurious retransmissions. It also covers cases where the original and the retransmitted segment fall into the same timestamp clock tick. Also, currently experimental schemes such as ER-SRTO [Cho08] could be deployed without requiring the receiver to explicitly support that capability.
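The marking scheme just described can be sketched as a small sender-side timestamp generator, together with the Eifel comparison of [RFC3522]. This is a minimal Python sketch under stated assumptions:

* The class and function names are hypothetical.
* The masked bits are assumed to be the low-order bits, with the clock value shifted above them.
* 32-bit wraparound of the clock is not modeled.

```python
class MaskedTsSender:
    """Hypothetical sender reserving `mask` low-order TSval bits."""

    def __init__(self, mask=1):
        self.mask = mask
        self.max_bits = (1 << mask) - 1
        self.clock = 0   # timestamp clock value in ticks
        self.bits = 0    # current value of the masked bits

    def tick(self):
        """Advance the timestamp clock; each advance clears the masked bits."""
        self.clock += 1
        self.bits = 0

    def tsval(self, original_tsval=None):
        """TSval for the next segment sent in the current clock tick.

        original_tsval: TSval of the segment's previous transmission,
        or None for new data. A retransmission within the same tick
        increments the masked bits (saturating at the maximum); new
        segments keep the current value, so TSvals never decrease.
        """
        if original_tsval is not None and original_tsval >> self.mask == self.clock:
            self.bits = min(self.bits + 1, self.max_bits)
        return (self.clock << self.mask) | self.bits


def is_spurious(tsecr, retransmit_tsval):
    """Eifel-style test (RFC 3522): an acceptable ACK that echoes a value
    smaller than the retransmission's TSval was triggered by the original."""
    return tsecr < retransmit_tsval
```

With mask = 1 this reproduces Figure 4:

* Seg0 to Seg4 get TSval 0 (TS00).
* The retransmitted Seg1 and the new Seg5 and Seg6 get 1 (TS01).
* Seg7, sent after the clock tick, gets 2 (TS10).

An ACK echoing 0 for the retransmitted Seg1 would mark that retransmission as spurious.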
      Seg0  Seg1  Seg2  Seg3  Seg4
      TS00  TS00  TS00  TS00  TS00
             X

            Seg1  Seg5
            TS01  TS01

                  Seg6  Seg7
                  TS01  TS10

   Figure 4: timestamp for spurious retransmit detection

In Figure 4, the first digit represents the timestamp value and the second digit the masked bits. The timestamp clock "ticks" between segments 6 and 7.

A.3. Early lost retransmission detection

Multiple segments may be lost in short succession, though not necessarily consecutive segments. In that case, there is a high likelihood that at least one segment is retransmitted while the cause of the loss (e.g. congestion or fading) persists. The best current algorithms can recover from such a lost retransmission only under certain constraints. For example, the session must have at least DupThresh more segments to send beyond the current recovery phase. Currently, when a retransmission is lost again during loss recovery, the timestamp cannot be used to convey additional information. Such information would allow more rapid loss recovery while maintaining the packet conservation principle. Only the timestamp of the last segment preceding the continuous loss is reflected.

With the extended timestamp option negotiation and selective acknowledgements, the receiver immediately reflects the timestamp of the last segment seen. By combining SACK and timestamp information, a sender can infer the exact order in which original and retransmitted segments were received. This allows faster recovery from lost retransmissions while maintaining the packet conservation principle and avoiding costly retransmission timeouts.

This can be implemented with or without the masked-bit approach described in Appendix A.2.
However, if the timestamp clock interval is longer than the RTT, the original and the retransmitted segment may carry identical timestamps. If the sender cannot discriminate between the original and the retransmitted segment, it must refrain from taking any action until such a determination can be made.

In this example, masked bits are used with a simple marking method. The timestamp value of the retransmission itself already differs from that of the original segments, so this additional discrimination would not strictly be required here. The first digit represents the timestamp clock value and the second digit the masked bits. The dupthresh value is 3.

      Seg0 Seg1 Seg2 Seg3 Seg4 Seg5 Seg6 Seg7
      TS00 TS00 TS00 TS10 TS10 TS10 TS10 TS20
            X    X    X    *

           Seg1 Seg2 Seg3 Seg4
           TS21 TS30 TS30 TS30
            X

           Seg1 Seg8 Seg9
           TS31 TS31 TS40

   Figure 5: timestamp under loss

Suppose Seg1,TS00 is lost twice and Seg4,TS10 is also lost. The first retransmission of Seg1 is followed by dupthresh further segments. Once the sender observes that these have been received (i.e. when Seg4 is SACKed), it could resend Seg1 once more. However, there is an ambiguity between retransmitted and original segments. The sender cannot know whether a SACK for a particular segment was triggered by the retransmitted segment or by a delayed original segment. The timestamp value does not help in this case, because per [RFC1323] it is held at TS00 for the entire loss recovery episode. Therefore, a sender currently has to assume that any SACKed segment may be a delayed original segment. It can only resolve this conflict by injecting additional, previously unsent segments. Once dupthresh newly injected segments are SACKed, continuous loss (and not further delay) of Seg1 can safely be assumed, and that segment can be resent.
This approach is conservative, but it requires that additional segments can be sent, and the response is delayed accordingly.

When extended timestamp capabilities are used together with selective acknowledgments, the receiver immediately reflects the timestamp of the last received segment. This allows the sender to discriminate between a SACK triggered by a delayed Seg4,TS10 and a SACK triggered by Seg4,TS30. The sender can then choose with high confidence between retransmitting Seg1 once more and handling the observed reordering or delay accordingly [I-D.blanton-tcp-reordering].

A.4. Integrity of the Timestamp value

If the timestamp is used for congestion control purposes, malicious receivers have an incentive to reflect tampered timestamps, as demonstrated by some exploits [CUBIC].

One way to address this is not to use timestamp information directly. Instead, the sender keeps state for each sent segment and tracks the round-trip time independently of the timestamps sent. The drawback of this approach is that it is not straightforward to make it work during loss recovery for segments that may have been lost (or reordered). In addition, the sender must maintain possibly extensive lists and consult them on each ACK, which incurs processing and memory overhead. Despite these drawbacks, this approach is currently implemented for lack of alternatives (see [Linux] and [BSD10]).

The preferred approach is that the sender MAY choose to protect timestamps from such modification by including a fingerprint (a secure hash of some kind) in some of the least significant bits. However, doing so prevents a receiver from using the timestamp for other purposes, unless the receiver knows in advance that some bits of the timestamp value are used this way.
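The fingerprint idea can be sketched with Python's standard `hmac` and `hashlib` modules. This is a minimal illustration, not a specified algorithm. The document only asks for "a secure hash of some kind", and a truncated HMAC-SHA-256 over a connection identifier and the clock value is one possible choice. The function names, the per-connection key and the connection identifier format are hypothetical.

Because only the clock value is hashed, TSvals stay constant within a tick and increase across ticks. A receiver that forges a later clock value without the key passes the check with probability of only 2^-mask.

```python
import hashlib
import hmac


def _fingerprint(key, conn_id, clock, mask):
    """Truncated HMAC-SHA-256 over a connection id and the clock value."""
    msg = conn_id + clock.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << mask) - 1)


def make_tsval(key, conn_id, clock, mask):
    """Clock value in the high bits, keyed fingerprint in the low `mask` bits."""
    clock &= (1 << (32 - mask)) - 1          # keep the result within 32 bits
    return (clock << mask) | _fingerprint(key, conn_id, clock, mask)


def check_tsecr(key, conn_id, tsecr, mask):
    """Return the echoed clock value, or None if the fingerprint is wrong."""
    clock = tsecr >> mask
    if (tsecr & ((1 << mask) - 1)) != _fingerprint(key, conn_id, clock, mask):
        return None
    return clock
```

A receiver without the key sees the low bits as noise. This is why such a sender would signal them through the MASK field, so that the receiver can strip them before measuring one-way delay variation.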
Furthermore, strictly monotonically increasing values still have to be maintained. This constraint restricts the approach somewhat. It also limits or prevents the direct use of timestamp values by the receiver, e.g. for one-way delay variation measurements, where the hash bits would look like random noise.

A.5. Disambiguation with slow Timestamp clock

A further use case is somewhat orthogonal to maintaining timestamp value integrity. Here, the sender does not support a timestamp clock interval that can guarantee unique timestamps for retransmitted segments. This may happen whenever the TCP timestamp clock interval is longer than the round-trip time of the path. To unambiguously distinguish regular from retransmitted segments, the timestamp must be unique for otherwise identical segments. Reserving the least significant bits for this purpose allows senders with slow-running timestamp clocks to make use of this feature. However, without modifying the receiver behavior, only limited benefits can be extracted from such an approach. Furthermore, the use of this option has implications for the protection against wrapped sequence numbers (PAWS [RFC1323]). The more bits are set aside for tamper prevention, the faster the timestamp number space cycles.

Timestamp capabilities can be used to explicitly negotiate mask bits and to set aside a (low) number of least significant bits for the purposes listed above. This allows a sender to use more reliable integrity checks. These masked bits are not considered part of the timestamp value for the purposes described in [RFC1323] (i.e. PAWS). They are also excluded from subsequent heuristics using timestamp values (e.g. Eifel Detection). This lifts the strict requirement of always monotonically increasing timestamp values.
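The cost of masking for PAWS can be quantified with a short calculation. This sketch makes three assumptions:

* The masked bits are the low-order bits, so the clock value occupies the upper 32 - mask bits.
* The relevant limit is the [RFC1323] requirement that the timestamp clock's sign-bit recycling time exceed the MSL.
* The MSL is taken to be 120 seconds.

The function names are hypothetical. Other [RFC1323] constraints, such as the clock ticking at least once per window, are not modeled.

```python
MSL_SECONDS = 120.0   # assumed maximum segment lifetime (2 minutes)


def sign_wrap_seconds(tick_ms, mask):
    """Seconds until the TSval sign bit recycles: 2**(31 - mask) clock
    ticks when the clock occupies the upper 32 - mask bits."""
    return 2 ** (31 - mask) * tick_ms / 1000


def max_mask_for_msl(tick_ms, msl=MSL_SECONDS):
    """Largest mask whose sign-bit recycling time still exceeds the MSL."""
    mask = 0
    while mask < 31 and sign_wrap_seconds(tick_ms, mask + 1) > msl:
        mask += 1
    return mask
```

With a 1 ms clock, an unmasked timestamp recycles its sign bit after about 24.86 days. A mask of 8 reduces this to about 2.3 hours, and masks above 14 would fall below a two-minute MSL.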
However, care should be taken not to mask too many bits, for the reasons outlined in [RFC1323]. A mask value higher than 8 is therefore discouraged.

The mask field nevertheless has 5 bits, to allow this protocol to be implemented together with the extended timestamps of TCP Cookie Transactions (TCPCT) [RFC6013]. This allows nearly a quarter of a 128-bit timestamp to be set aside.

A.6. Masked timestamps as segment digest

The TCP alternate checksum options have been moved to Historic status (see [RFC6247]). There remains a need to address the increased probability of undetected corruption as segment sizes increase (see [I-D.ietf-tcpm-anumita-tcp-stronger-checksum]).

A completely masked TSval field allows the sender to include a CRC32, which is stronger than the 16-bit TCP checksum and has semantics independent of the fixed TCP header fields. However, such a use would again preclude PAWS on the receiver side, and a receiver would need to know the specifics of the digest in order to process it. Such a digest is assumed to cover only the data payload of a TCP segment. A special TSval can be defined (e.g. TSval=0) to allow retransmissions to be disambiguated. This value bypasses regular CRC processing but allows retransmitted segments to be identified.

The full semantics of such a data-only CRC scheme are beyond the scope of this document and would require a different version of the timestamp capability. Even in version 0, the full TSval could be left unprocessed by the receiver for the purpose of PAWS. This could still allow the successful negotiation of sender-side enhancements such as loss recovery improvements (see Appendix A.2 and Appendix A.3).
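The digest use of a fully masked TSval can be illustrated with Python's `zlib.crc32`. This is a minimal sketch with the following assumptions:

* The digest covers only the payload.
* The entire TSval is masked.
* TSval = 0 is the reserved retransmission marker mentioned above.
* The function names are hypothetical.

A payload whose CRC32 happens to be 0 would collide with the marker, which a real scheme would have to resolve.

```python
import zlib

RETRANSMIT_TSVAL = 0   # example marker value from the text (TSval = 0)


def digest_tsval(payload, retransmission=False):
    """Sender: fill the fully masked TSval with a CRC32 of the payload."""
    if retransmission:
        return RETRANSMIT_TSVAL        # receiver skips the CRC check
    return zlib.crc32(payload) & 0xFFFFFFFF


def check_digest(payload, tsval):
    """Receiver: classify a segment as 'retransmission', 'ok' or 'corrupt'."""
    if tsval == RETRANSMIT_TSVAL:
        # A payload whose CRC32 is 0 is indistinguishable from the marker here.
        return "retransmission"
    return "ok" if (zlib.crc32(payload) & 0xFFFFFFFF) == tsval else "corrupt"
```

A corrupted payload with an intact TSval is reported as "corrupt". A segment carrying the marker is reported as a retransmission without any CRC processing.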
In effect, the masked portion of the timestamp value represents an unreliable out-of-band signaling channel. This channel could also be used for purposes other than timestamp integrity checks (for example, to enable ER-SRTO algorithms [Cho08]).

Appendix B. Open Issues

   o  The split between this draft and [I-D.trammell-tcpm-timestamp-interval] is cursory; timestamp interval export may need to be separated further.

   o  [bht] suggests changing the "versioning" construct to a "capabilities" construct, especially since two bits of versioning might as well be none. The base specification would then define the alternate semantics with respect to SACK and could use capabilities to define further semantics.

   o  [bht] Does it make sense to move masking out of the base spec and into the 8 "unused" bits in "version 0" (in order to get more capabilities bits / "magic bits" to reduce erroneous negotiation)?

   o  [bht] Does it make sense to define SACK-echo as version/capability independent?

Appendix C. Revision history

This appendix should be removed by the RFC Editor before publishing this document as an RFC.

00 ... initial draft, early submission to meet deadline.

01 ... refined draft, focusing only on those capabilities that have an immediate use case. Also excluding flags that can be substituted by other means (MIR - synergistic with SACK option only, RNG moved to appendix A, BIA removed and the exponent bias set to a fixed value). Also extended other paragraphs.

02 ... updated document after IETF80: references to "timestamp options" were seen to be ambiguous with "timestamp option", and were therefore replaced by "timestamp capabilities". Also, the document was reworked to better align with RFC4101. Removed SGN and increased FRAC to allow higher precision.

03 ... removed references to "opaque" and "transparent".
Substituted "timestamp clock interval" for all instances of rate. Changed signal encoding to resemble a scale/value approach like the one used for Window Scaling. As an added benefit, clock quality can be signaled implicitly, since multiple representations can map to identical time intervals. Added discussion of resilience against broken RFC1323 implementations (Win95, Linux 2.3.41+), which deviate from the expected Timestamp signaling behavior.

04 ... removed previous appendix A (range negotiation); minor edits to improve wording; moved Section 6 to the Appendix, and removed covert channels from the potential uses; added some text to discuss future versioning (compatible and incompatible variants); changed document structure; added guidance around PAWS; added pseudo-code examples (probably to be removed again).

05 ... added new Open Issues section, added reference to separate interval draft, removed content on timestamp interval exposure, which now appears in the interval draft. Removed pseudocode examples until they can be reworked on finalization of the mechanism, as they refer to fields which have changed or moved to the interval draft.

Authors' Addresses

   Richard Scheffenegger
   NetApp, Inc.
   Am Euro Platz 2
   Vienna, 1120
   Austria

   Phone: +43 1 3676811 3146
   Email: rs@netapp.com

   Mirja Kuehlewind
   University of Stuttgart
   Pfaffenwaldring 47
   Stuttgart 70569
   Germany

   Email: mirja.kuehlewind@ikr.uni-stuttgart.de

   Brian Trammell
   Swiss Federal Institute of Technology Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: trammell@tik.ee.ethz.ch