idnits 2.17.1 

draft-ietf-tcpm-1323bis-17.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The abstract seems to indicate that this document obsoletes RFC1323, but
     the header doesn't have an 'Obsoletes:' line to match this.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 15, 2013) is 3814 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'Ekstroem04' is defined on line 1298, but no explicit
     reference was found in the text

  == Unused Reference: 'Hamming77' is defined on line 1316, but no explicit
     reference was found in the text

  == Unused Reference: 'Jain86' is defined on line 1340, but no explicit
     reference was found in the text

  == Unused Reference: 'Mathis08' is defined on line 1370, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC0896' is defined on line 1395, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1110' is defined on line 1401, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2581' is defined on line 1419, but no explicit
     reference was found in the text

  == Unused Reference: 'Watson81' is defined on line 1463, but no explicit
     reference was found in the text

  == Unused Reference: 'Zhang86' is defined on line 1468, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC  896
     (Obsoleted by RFC 7805)

  -- Obsolete informational reference (is this intentional?): RFC 1072
     (Obsoleted by RFC 1323, RFC 2018, RFC 6247)

  -- Obsolete informational reference (is this intentional?): RFC 1110
     (Obsoleted by RFC 6247)

  -- Obsolete informational reference (is this intentional?): RFC 1185
     (Obsoleted by RFC 1323)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)

  -- Obsolete informational reference (is this intentional?): RFC 2581
     (Obsoleted by RFC 5681)

  -- Obsolete informational reference (is this intentional?): RFC 6528
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 6691
     (Obsoleted by RFC 9293)


     Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TCP Maintenance (TCPM)                                         D. Borman
3	Internet-Draft                                       Quantum Corporation
4	Intended status: Standards Track                               B. Braden
5	Expires: May 19, 2014                             University of Southern
6	                                                              California
7	                                                             V. Jacobson
8	                                                            Google, Inc.
9	                                                   R. Scheffenegger, Ed.
10	                                                            NetApp, Inc.
11	                                                       November 15, 2013

13	                  TCP Extensions for High Performance
14	                       draft-ietf-tcpm-1323bis-17

16	Abstract

18	   This document specifies a set of TCP extensions to improve
19	   performance over paths with a large bandwidth * delay product and to
20	   provide reliable operation over very high-speed paths.  It defines
21	   the TCP Window Scale (WS) option and the TCP Timestamps (TS) option
22	   and their semantics.  The Window Scale option is used to support
23	   larger receive windows, while the Timestamps option can be used for
24	   at least two distinct mechanisms, PAWS (Protection Against Wrapped
25	   Sequences) and RTTM (Round Trip Time Measurement), that are also
26	   described herein.

28	   This document obsoletes RFC1323 and describes changes from it.

30	Status of this Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on May 19, 2014.

47	Copyright Notice
48	   Copyright (c) 2013 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	     1.1.  TCP Performance
65	     1.2.  TCP Reliability
66	     1.3.  Using TCP options
67	     1.4.  Terminology
68	   2.  TCP Window Scale option
69	     2.1.  Introduction
70	     2.2.  Window Scale option
71	     2.3.  Using the Window Scale option
72	     2.4.  Addressing Window Retraction
73	   3.  TCP Timestamps option
74	     3.1.  Introduction
75	     3.2.  Timestamps option
76	   4.  The RTTM Mechanism
77	     4.1.  Introduction
78	     4.2.  Updating the RTO value
79	     4.3.  Which Timestamp to Echo
80	   5.  PAWS - Protection Against Wrapped Sequence Numbers
81	     5.1.  Introduction
82	     5.2.  The PAWS Mechanism
83	     5.3.  Basic PAWS Algorithm
84	     5.4.  Timestamp Clock
85	     5.5.  Outdated Timestamps
86	     5.6.  Header Prediction
87	     5.7.  IP Fragmentation
88	     5.8.  Duplicates from Earlier Incarnations of Connection
89	   6.  Conclusions and Acknowledgements
90	   7.  Security Considerations
91	     7.1.  Privacy Considerations
92	   8.  IANA Considerations
93	   9.  References
94	     9.1.  Normative References
95	     9.2.  Informative References
96	   Appendix A.  Implementation Suggestions
97	   Appendix B.  Duplicates from Earlier Connection Incarnations
98	     B.1.  System Crash with Loss of State
99	     B.2.  Closing and Reopening a Connection
100	   Appendix C.  Summary of Notation
101	   Appendix D.  Event Processing Summary
102	   Appendix E.  Timestamps Edge Cases
103	   Appendix F.  Window Retraction Example
104	   Appendix G.  RTO calculation modification
105	   Appendix H.  Changes from RFC 1323
106	   Authors' Addresses

108	1.  Introduction

110	   The TCP protocol [RFC0793] was designed to operate reliably over
111	   almost any transmission medium regardless of transmission rate,
112	   delay, corruption, duplication, or reordering of segments.  Over the
113	   years, advances in networking technology have resulted in ever-higher
114	   transmission speeds, and the fastest paths are well beyond the domain
115	   for which TCP was originally engineered.

117	   This document defines a set of modest extensions to TCP to extend the
118	   domain of its application to match the increasing network capability.
119	   It is an update to and obsoletes [RFC1323], which in turn is based
120	   upon and obsoletes [RFC1072] and [RFC1185].

122	   Changes between [RFC1323] and this document are detailed in
123	   Appendix H.  These changes are partly due to errata in [RFC1323], and
124	   partly due to the improved understanding of how the involved
125	   components interact.

127	   For brevity, the full discussions of the merits and history behind
128	   the TCP options defined within this document have been omitted.
129	   [RFC1323] should be consulted for reference.  It is recommended that
130	   a modern TCP stack implement and make use of the extensions
131	   described in this document.

133	1.1.  TCP Performance

135	   TCP performance problems arise when the bandwidth * delay product is
136	   large.  A network having such paths is referred to as a "long, fat
137	   network" (LFN).

139	   There are two fundamental performance problems with basic TCP over
140	   LFN paths:

142	   (1)  Window Size Limit

144	        The TCP header uses a 16 bit field to report the receive window
145	        size to the sender.  Therefore, the largest window that can be
146	        used is 2^16 = 64 KiB.  For LFN paths where the bandwidth *
147	        delay product exceeds 64 KiB, the receive window limits the
148	        maximum throughput of the TCP connection over the path, i.e.,
149	        the amount of unacknowledged data that TCP can send in order to
150	        keep the pipeline full.

152	        To circumvent this problem, Section 2 of this memo defines a TCP
153	        option, "Window Scale", to allow windows larger than 2^16.  This
154	        option defines an implicit scale factor, which is used to
155	        multiply the window size value found in a TCP header to obtain
156	        the true window size.

158	        It must be noted that the use of large receive windows
159	        increases the chance of wrapping sequence numbers too quickly,
160	        as described below in Section 1.2, (1).

162	   (2)  Recovery from Losses

164	        Packet losses in an LFN can have a catastrophic effect on
165	        throughput.

167	        To generalize the Fast Retransmit / Fast Recovery mechanism to
168	        handle multiple packets dropped per window, Selective
169	        Acknowledgments are required.  Unlike the normal cumulative
170	        acknowledgments of TCP, Selective Acknowledgments give the
171	        sender a complete picture of which segments are queued at the
172	        receiver and which have not yet arrived.

174	        Selective acknowledgments and their use are specified in
175	        separate documents, "TCP Selective Acknowledgment options"
176	        [RFC2018], "An Extension to the Selective Acknowledgement (SACK)
177	        option for TCP" [RFC2883], and "A Conservative Selective
178	        Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP"
179	        [RFC6675], and are not discussed further in this document.
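
As a rough illustration of the window size limit in item (1) above, the
sketch below computes the throughput ceiling that a receive window
imposes, using the usual relation throughput <= window / RTT.  The
function name and the 100 ms round-trip time are assumptions chosen for
illustration and do not come from this document.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bits per second when at most
    window_bytes may be unacknowledged during one round-trip time."""
    return window_bytes * 8 / rtt_seconds


# Largest unscaled window: the 16-bit field holds at most 2^16 - 1 bytes.
UNSCALED_MAX = 2**16 - 1
# Largest scaled window: the same field shifted left by the maximum
# shift count of 14 defined for the Window Scale option (Section 2).
SCALED_MAX = UNSCALED_MAX << 14

RTT = 0.100  # assumed round-trip time of 100 ms

unscaled = max_throughput_bps(UNSCALED_MAX, RTT)  # about 5.2 Mbit/s
scaled = max_throughput_bps(SCALED_MAX, RTT)      # about 85.9 Gbit/s
```

With these assumed numbers, an unscaled window cannot fill a path
faster than a few Mbit/s, however much capacity the path offers.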

181	1.2.  TCP Reliability

183	   An especially serious kind of error may result from an accidental
184	   reuse of TCP sequence numbers in data segments.  TCP reliability
185	   depends upon the existence of a bound on the lifetime of a segment:
186	   the "Maximum Segment Lifetime" or MSL.

188	   Duplication of sequence numbers might happen in either of two ways:

190	   (1)  Sequence number wrap-around on the current connection

192	        A TCP sequence number contains 32 bits.  At a high enough
193	        transfer rate of large volumes of data (at least 4 GiB in the
194	        same session), the 32-bit sequence space may be "wrapped"
195	        (cycled) within the time that a segment is delayed in queues.

197	   (2)  Earlier incarnation of the connection

199	        Suppose that a connection terminates, either by a proper close
200	        sequence or due to a host crash, and the same connection (i.e.,
201	        using the same pair of port numbers) is immediately reopened.  A
202	        delayed segment from the terminated connection could fall within
203	        the current window for the new incarnation and be accepted as
204	        valid.

206	   Duplicates from earlier incarnations, case (2), are avoided by
207	   enforcing the current fixed MSL of the TCP specification, as
208	   explained in Section 5.8 and Appendix B.  In addition, randomizing
209	   ephemeral ports can also help to probabilistically reduce the
210	   chances of duplicates from earlier connections.  However, case (1),
211	   avoiding the reuse of sequence numbers within the same connection,
212	   requires an upper bound on MSL that depends upon the transfer rate,
213	   and at high enough rates, a dedicated mechanism is required.
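
To make the dependence on transfer rate concrete, the following sketch
computes how long it takes to cycle through the 32-bit sequence space
at a steady rate.  The function name and the three example rates are
illustrative assumptions, not values taken from this document.

```python
SEQ_SPACE_BYTES = 2**32  # each sequence number identifies one byte


def wrap_time_seconds(rate_bits_per_second: float) -> float:
    """Time needed to send 2^32 bytes at a steady transfer rate."""
    return SEQ_SPACE_BYTES * 8 / rate_bits_per_second


# Assumed example rates; any rate can be substituted.
for label, rate in [("10 Mbit/s", 10e6),
                    ("1 Gbit/s", 1e9),
                    ("100 Gbit/s", 100e9)]:
    seconds = wrap_time_seconds(rate)
    print(f"{label:>10}: sequence space wraps after {seconds:.2f} s")
```

At 1 Gbit/s the sequence space wraps in roughly 34 seconds, which is
already shorter than the 2-minute MSL assumed by [RFC0793].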

215	   A possible fix for the problem of cycling the sequence space would be
216	   to increase the size of the TCP sequence number field.  For example,
217	   the sequence number field (and also the acknowledgment field) could
218	   be expanded to 64 bits.  This could be done either by changing the
219	   TCP header or by means of an additional option.

221	   Section 5 presents a different mechanism, which we call PAWS
222	   (Protection Against Wrapped Sequence numbers), to extend TCP
223	   reliability to transfer rates well beyond the foreseeable upper limit
224	   of network bandwidths.  PAWS uses the TCP Timestamps option defined
225	   in Section 3.2 to protect against old duplicates from the same
226	   connection.

228	1.3.  Using TCP options

230	   The extensions defined in this document all use TCP options.

232	   When [RFC1323] was published, there was concern that some buggy TCP
233	   implementation might crash on the first appearance of an option on a
234	   non-<SYN> segment.  However, bugs like that can lead to DoS attacks
235	   against a TCP.  Research has shown that most TCP implementations will
236	   properly handle unknown options on non-<SYN> segments ([Medina04],
237	   [Medina05]).  But it is still prudent to be conservative in what you
238	   send, and avoiding buggy TCP implementations is not the only reason
239	   for negotiating TCP options on <SYN> segments.

241	   The window scale option negotiates fundamental parameters of the TCP
242	   session.  Therefore, it is only sent during the initial handshake.
243	   Furthermore, the window scale option will be sent in a <SYN,ACK>
244	   segment only if the corresponding option was received in the initial
245	   <SYN> segment.

247	   The Timestamps option may appear in any data or <ACK> segment, adding
248	   10 bytes (up to 12 bytes including padding) to the 20-byte TCP
249	   header.  It is required that this TCP option will be sent on all non-
250	   <SYN> segments after an exchange of options on the <SYN> segments has
251	   indicated that both sides understand this extension.

253	   Research has shown that the use of the Timestamps option to take
254	   additional RTT samples within each RTT has little effect on the
255	   ultimate retransmission timeout value [Allman99].  However, there are
256	   other uses of the Timestamps option, such as the Eifel mechanism
257	   [RFC3522], [RFC4015], and PAWS (see Section 5) which improve overall
258	   TCP security and performance.  The extra header bandwidth used by
259	   this option should be evaluated for the gains in performance and
260	   security in an actual deployment.

262	   Appendix A contains a recommended layout of the options in TCP
263	   headers to achieve reasonable data field alignment.

265	   Finally, we observe that most of the mechanisms defined in this
266	   document are important for LFNs and/or very high-speed networks.
267	   For low-speed networks, it might be a performance optimization to NOT
268	   use these mechanisms.  A TCP vendor concerned about optimal
269	   performance over low-speed paths might consider turning these
270	   extensions off for such paths, or allowing a user or installation
271	   manager to disable them.

273	1.4.  Terminology

275	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
276	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
277	   document are to be interpreted as described in [RFC2119].

279	   In this document, these words will appear with that interpretation
280	   only when in UPPER CASE.  Lower case uses of these words are not to
281	   be interpreted as carrying [RFC2119] significance.

283	2.  TCP Window Scale option

285	2.1.  Introduction

287	   The window scale extension expands the definition of the TCP window
288	   to 30 bits and then uses an implicit scale factor to carry this 30-
289	   bit value in the 16-bit Window field of the TCP header (SEG.WND in
290	   [RFC0793]).  The exponent of the scale factor is carried in a TCP
291	   option, Window Scale.  This option is sent only in a <SYN> segment (a
292	   segment with the SYN bit on), hence the window scale is fixed in each
293	   direction when a connection is opened.

295	   The maximum receive window, and therefore the scale factor, is
296	   determined by the maximum receive buffer space.  In a typical modern
297	   implementation, this maximum buffer space is set by default but can
298	   be overridden by a user program before a TCP connection is opened.
299	   This determines the scale factor, and therefore no new user interface
300	   is needed for window scaling.

302	2.2.  Window Scale option

304	   The three-byte Window Scale option MAY be sent in a <SYN> segment by
305	   a TCP.  It has two purposes: (1) indicate that the TCP is prepared to
306	   both send and receive window scaling, and (2) communicate the
307	   exponent of a scale factor to be applied to its receive window.
308	   Thus, a TCP that is prepared to scale windows SHOULD send the option,
309	   even if its own scale factor is 1 and the exponent 0.  The scale
310	   factor is limited to a power of two and encoded logarithmically, so
311	   it may be implemented by binary shift operations.  The maximum scale
312	   exponent is limited to 14 for a maximum permissible receive window
313	   size of 1 GiB (2^(14+16)).

315	   TCP Window Scale option (WSopt):

317	   Kind: 3

319	   Length: 3 bytes

321	          +---------+---------+---------+
322	          | Kind=3  |Length=3 |shift.cnt|
323	          +---------+---------+---------+
324	               1         1         1

326	   This option is an offer, not a promise; both sides MUST send Window
327	   Scale options in their <SYN> segments to enable window scaling in
328	   either direction.  If window scaling is enabled, then the TCP that
329	   sent this option will right-shift its true receive-window values by
330	   'shift.cnt' bits for transmission in SEG.WND.  The value 'shift.cnt'
331	   MAY be zero (offering to scale, while applying a scale factor of 1 to
332	   the receive window).

334	   This option MAY be sent in an initial <SYN> segment (i.e., a segment
335	   with the SYN bit on and the ACK bit off).  It MAY also be sent in a
336	   <SYN,ACK> segment, but only if a Window Scale option was received in
337	   the initial <SYN> segment.  A Window Scale option in a segment
338	   without a SYN bit MUST be ignored.

340	   The window field in a segment where the SYN bit is set (i.e., a <SYN>
341	   or <SYN,ACK>) MUST NOT be scaled.
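
The three-byte layout shown above can be sketched as follows.  The
helper names build_wsopt and parse_wsopt are hypothetical; only the
Kind, Length, and shift.cnt fields and the limit of 14 come from this
section.

```python
import struct

WSOPT_KIND = 3
WSOPT_LEN = 3
MAX_SHIFT = 14


def build_wsopt(shift_cnt: int) -> bytes:
    """Encode a Window Scale option for a <SYN> or <SYN,ACK> segment."""
    if not 0 <= shift_cnt <= MAX_SHIFT:
        raise ValueError("shift.cnt must be between 0 and 14")
    return struct.pack("!BBB", WSOPT_KIND, WSOPT_LEN, shift_cnt)


def parse_wsopt(option: bytes) -> int:
    """Return the raw shift.cnt carried in a Window Scale option.

    The value is returned as received; limiting it to 14 is left to
    the caller.
    """
    kind, length, shift_cnt = struct.unpack("!BBB", option)
    if kind != WSOPT_KIND or length != WSOPT_LEN:
        raise ValueError("not a Window Scale option")
    return shift_cnt
```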

343	2.3.  Using the Window Scale option

345	   A model implementation of window scaling is as follows, using the
346	   notation of [RFC0793]:

348	   o  The connection state is augmented by two window shift counters,
349	      Snd.Wind.Shift and Rcv.Wind.Shift, to be applied to the incoming
350	      and outgoing window fields, respectively.

352	   o  If a TCP receives a <SYN> segment containing a Window Scale
353	      option, it SHOULD send its own Window Scale option in the
354	      <SYN,ACK> segment.

356	   o  The Window Scale option MUST be sent with shift.cnt = R, where R
357	      is the value that the TCP would like to use for its receive
358	      window.

360	   o  Upon receiving a <SYN> segment with a Window Scale option
361	      containing shift.cnt = S, a TCP MUST set Snd.Wind.Shift to S and
362	      MUST set Rcv.Wind.Shift to R; otherwise, it MUST set both
363	      Snd.Wind.Shift and Rcv.Wind.Shift to zero.

365	   o  The window field (SEG.WND) in the header of every incoming
366	      segment, with the exception of <SYN> segments, MUST be left-
367	      shifted by Snd.Wind.Shift bits before updating SND.WND:

369	                    SND.WND = SEG.WND << Snd.Wind.Shift

371	      (assuming the other conditions of [RFC0793] are met, and using the
372	      "C" notation "<<" for left-shift).

374	   o  The window field (SEG.WND) of every outgoing segment, with the
375	      exception of <SYN> segments, MUST be right-shifted by
376	      Rcv.Wind.Shift bits:

378	                    SEG.WND = RCV.WND >> Rcv.Wind.Shift
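
The model implementation above can be sketched in a few lines.  This is
a minimal illustration, not a complete TCP: the class and function
names are hypothetical, and it assumes that the local TCP has sent (or
will send) its own Window Scale option with shift.cnt = R.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingState:
    snd_wind_shift: int = 0  # Snd.Wind.Shift, applied to incoming SEG.WND
    rcv_wind_shift: int = 0  # Rcv.Wind.Shift, applied to outgoing SEG.WND


def on_syn_received(peer_shift: Optional[int], own_shift: int) -> ScalingState:
    """Set both shift counters when the peer's <SYN> arrives.

    peer_shift is shift.cnt (S) from the peer's Window Scale option, or
    None if the <SYN> carried no such option.  own_shift is R.
    """
    if peer_shift is None:
        return ScalingState(0, 0)  # no scaling in either direction
    return ScalingState(snd_wind_shift=peer_shift, rcv_wind_shift=own_shift)


def incoming_window(seg_wnd: int, state: ScalingState) -> int:
    """SND.WND = SEG.WND << Snd.Wind.Shift (not for <SYN> segments)."""
    return seg_wnd << state.snd_wind_shift


def outgoing_window(rcv_wnd: int, state: ScalingState) -> int:
    """SEG.WND = RCV.WND >> Rcv.Wind.Shift (not for <SYN> segments)."""
    return rcv_wnd >> state.rcv_wind_shift
```

For example, with S = 7 a received SEG.WND of 512 yields a send window
of 65536 bytes.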

380	   TCP determines if a data segment is "old" or "new" by testing whether
381	   its sequence number is within 2^31 bytes of the left edge of the
382	   window, and if it is not, discarding the data as "old".  To ensure
383	   that new data is never mistakenly considered old and vice versa, the
384	   left edge of the sender's window has to be at most 2^31 away from the
385	   right edge of the receiver's window.  Likewise for the sender's
386	   right edge and receiver's left edge.  Since the right and left edges
387	   of either the sender's or receiver's window differ by the window
388	   size, and since the sender and receiver windows can be out of phase
389	   by at most the window size, the above constraints imply that two
390	   times the maximum window size must be less than 2^31, or

392	                             max window < 2^30

394	   Since the max window is 2^S (where S is the scaling shift count)
395	   times at most 2^16 - 1 (the maximum unscaled window), the maximum
396	   window is guaranteed to be < 2^30 if S <= 14.  Thus, the shift count
397	   MUST be limited to 14 (which allows windows of 2^30 = 1 GiB).  If a
398	   Window Scale option is received with a shift.cnt value larger than
399	   14, the TCP SHOULD log the error but MUST use 14 instead of the
400	   specified value.  This is safe, as a sender can always choose to use
401	   only part of any signaled receive window.  If the receiver is
402	   scaling by a shift count larger than 14 and the sender is only
403	   scaling by 14, then the receive window used by the sender will appear
404	   smaller than it is in reality.
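
The bound and the clamping rule above can be checked numerically.  The
following is a small sketch; the function names are hypothetical and
the logging that the text recommends is omitted.

```python
MAX_SHIFT = 14
MAX_UNSCALED_WINDOW = 2**16 - 1  # largest value of the 16-bit field


def largest_window(shift: int) -> int:
    """Largest window expressible with the given shift count."""
    return MAX_UNSCALED_WINDOW << shift


def effective_shift(received_shift_cnt: int) -> int:
    """Use 14 in place of any received shift.cnt larger than 14."""
    return min(received_shift_cnt, MAX_SHIFT)


# A shift of 14 keeps the maximum window below 2^30; 15 would not.
assert largest_window(14) < 2**30
assert largest_window(15) >= 2**30
```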

406	   The scale factor applies only to the Window field as transmitted in
407	   the TCP header; each TCP using extended windows will maintain the
408	   window values locally as 32-bit numbers.  For example, the
409	   "congestion window" computed by Slow Start and Congestion Avoidance
410	   (see [RFC5681]) is not affected by the scale factor, so window
411	   scaling will not introduce quantization into the congestion window.

413	2.4.  Addressing Window Retraction

415	   When a non-zero scale factor is in use, there are instances when a
416	   retracted window can be offered - see Appendix F for a detailed
417	   example.  The end of the window will be on a boundary based on the
418	   granularity of the scale factor being used.  If the sequence number
419	   is then updated by a number of bytes smaller than that granularity,
420	   the TCP will have to either advertise a new window that is beyond
421	   what it previously advertised (and perhaps beyond the buffer), or
422	   will have to advertise a smaller window, which will cause the TCP
423	   window to shrink.  Implementations MUST ensure that they handle a
424	   shrinking window, as specified in section 4.2.2.16 of [RFC1122].

426	   For the receiver, this implies that:

428	   1)  The receiver MUST honor, as in-window, any segment that would
429	       have been in-window for any <ACK> sent by the receiver.

431	   2)  When window scaling is in effect, the receiver SHOULD track the
432	       actual maximum window sequence number (which is likely to be
433	       greater than the window announced by the most recent <ACK>, if
434	       more than one segment has arrived since the application consumed
435	       any data in the receive buffer).

437	   On the sender side:

439	   3)  The initial transmission MUST be within the window announced by
440	       the most recent <ACK>.

442	   4)  On first retransmission, or if the sequence number is out-of-
443	       window by less than 2^Rcv.Wind.Shift, then do normal
444	       retransmission(s) without regard to the receiver window as long
445	       as the original segment was in window when it was sent.

447	   5)  Subsequent retransmissions MAY only be sent if they are within
448	       the window announced by the most recent <ACK>.
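
A small numeric sketch shows how a retracted window can arise from the
scale granularity.  The numbers (a shift count of 2, RCV.NXT = 1000,
and 4096 bytes of free buffer) are assumptions chosen for illustration
and are independent of the example in Appendix F.

```python
def advertised_right_edge(rcv_nxt: int, rcv_wnd: int, shift: int) -> int:
    """Right window edge that the peer derives from an <ACK>.

    The window is right-shifted for SEG.WND on transmission and
    left-shifted again by the peer, so low-order bits are lost.
    """
    seg_wnd = rcv_wnd >> shift
    return rcv_nxt + (seg_wnd << shift)


shift = 2  # granularity of 2^2 = 4 bytes

# Initial <ACK>: 4096 bytes of free buffer, a multiple of the granularity.
before = advertised_right_edge(1000, 4096, shift)

# One byte arrives and the application reads nothing: RCV.NXT advances
# by one and the free buffer space shrinks by one.
after = advertised_right_edge(1001, 4095, shift)

retraction = before - after  # bytes by which the right edge moved back
```

Here the right edge moves back by 3 bytes, which is less than
2^Rcv.Wind.Shift = 4, the tolerance named in rule 4) above.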

450	3.  TCP Timestamps option

452	3.1.  Introduction

454	   The Timestamps option is introduced to address some of the issues
455	   mentioned in Section 1.1 and Section 1.2.  The Timestamps option is
456	   specified in a symmetrical manner, so that TSval timestamps are
457	   carried in both data and <ACK> segments and are echoed in TSecr
458	   fields carried in returning <ACK> or data segments.  Although
459	   originally used primarily for timestamping individual segments, the
460	   Timestamps option has properties that support not only time
461	   measurements (Section 4) but additional uses as well (Section 5).

463	   It is necessary to remember that there is a distinction between the
464	   Timestamps option conveying timestamp information, and the use of
465	   that information.  In particular, the Round Trip Time Measurement
466	   (RTTM) mechanism must be viewed independently from updating the
467	   Retransmission Timeout (RTO) (see Section 4.2).  In this case, the
468	   sample granularity also needs to be taken into account.  Other
469	   mechanisms, such as PAWS, or Eifel, are not built upon the timestamp
470	   information itself, but are based on the intrinsic property of
471	   monotonically non-decreasing values.

473	   The Timestamps option is important when large receive windows are
474	   used, to allow the use of the PAWS mechanism (see Section 5).
475	   Furthermore, the option may be useful for all TCPs, since it
476	   simplifies the sender and allows the use of additional optimizations
477	   such as Eifel ([RFC3522], [RFC4015]) and others ([RFC6817],
478	   [Kuzmanovic03], [Kuehlewind10]).

480	3.2.  Timestamps option

482	   TCP is a symmetric protocol, allowing data to be sent at any time in
483	   either direction, and therefore timestamp echoing may occur in either
484	   direction.  For simplicity and symmetry, we specify that timestamps
485	   always be sent and echoed in both directions.  For efficiency, we
486	   combine the timestamp and timestamp reply fields into a single TCP
487	   Timestamps option.

489	   TCP Timestamps option (TSopt):

491	   Kind: 8

493	   Length: 10 bytes

495	          +-------+-------+---------------------+---------------------+
496	          |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
497	          +-------+-------+---------------------+---------------------+
498	              1       1              4                     4

500	   The Timestamps option carries two four-byte timestamp fields.  The
501	   Timestamp Value field (TSval) contains the current value of the
502	   timestamp clock of the TCP sending the option.

504	   The Timestamp Echo Reply (TSecr) field is valid if the ACK bit is set
505	   in the TCP header.  If the ACK bit is not set in the outgoing TCP
506	   header, the sender of that segment SHOULD set the TSecr field to
507	   zero.  When the ACK bit is set in an outgoing segment, the sender
508	   MUST echo in the TSecr field a recently received Timestamp Value
509	   (TSval) that the remote TCP sent in the TSval field of a Timestamps
510	   option.  The exact rules on which TSval MUST be echoed are given in
511	   Section 4.3.  When the ACK bit is not set, the receiver MUST ignore
512	   the value of the TSecr field.
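
The ten-byte layout and the ACK-bit rules above can be sketched as
follows.  The helper names are hypothetical, and the choice of which
TSval to echo (Section 4.3) is outside the scope of this sketch: the
caller supplies it.

```python
import struct
from typing import Optional, Tuple

TSOPT_KIND = 8
TSOPT_LEN = 10


def build_tsopt(tsval: int, tsecr: int, ack_bit_set: bool) -> bytes:
    """Encode a Timestamps option for an outgoing segment."""
    if not ack_bit_set:
        tsecr = 0  # TSecr SHOULD be zero when the ACK bit is not set
    return struct.pack("!BBII", TSOPT_KIND, TSOPT_LEN,
                       tsval & 0xFFFFFFFF, tsecr & 0xFFFFFFFF)


def parse_tsopt(option: bytes, ack_bit_set: bool) -> Tuple[int, Optional[int]]:
    """Return (TSval, TSecr); TSecr is None when it must be ignored."""
    kind, length, tsval, tsecr = struct.unpack("!BBII", option)
    if kind != TSOPT_KIND or length != TSOPT_LEN:
        raise ValueError("not a Timestamps option")
    return tsval, (tsecr if ack_bit_set else None)
```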

514	   A TCP MAY send the Timestamps option (TSopt) in an initial <SYN>
515	   segment (i.e., segment containing a SYN bit and no ACK bit), and MAY
516	   send a TSopt in <SYN,ACK> only if it received a TSopt in the initial
517	   <SYN> segment for the connection.

519	   Once TSopt has been successfully negotiated, that is, both <SYN> and
520	   <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
521	   segment for the duration of the connection, and SHOULD be sent in an
522	   <RST> segment (see Section 5.2 for details).  The TCP SHOULD remember
523	   this state by setting a flag, referred to as Snd.TS.OK, to one.  If a
524	   non-<RST> segment is received without a TSopt, a TCP SHOULD silently
525	   drop the segment.  A TCP MUST NOT abort a TCP connection because any
526	   segment lacks an expected TSopt.

528	   Implementations are strongly encouraged to follow the above rules for
529	   handling a missing Timestamps option, and the order of precedence
530	   mentioned in Section 5.3 when deciding on the acceptance of a
531	   segment.

533	   If a receiver chooses to accept a segment without an expected
534	   Timestamps option, it must be clear that undetectable data corruption
535	   may occur.

537	   Such a TCP receiver may experience undetectable wrapped-sequence
538	   effects, such as data (payload) corruption or session stalls.  In
539	   order to maintain the integrity of the payload data, in particular on
540	   high speed networks, it is paramount to follow the described
541	   processing rules.

543	   However, it has been mentioned that under some circumstances, the
544	   above guidelines are too strict, and some paths sporadically suppress
545	   the Timestamps option, while maintaining payload integrity.  A path
546	   behaving in this manner should be deemed unacceptable, but it has
547	   been noted that some implementations relax the acceptance rules as a
548	   workaround, and allow TCP to run across such paths [Oppermann13].

550	   If a TSopt is received on a connection where TSopt was not negotiated
551	   in the initial three-way handshake, the TSopt MUST be ignored and the
552	   packet processed normally.
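
The receive-side handling of a present or missing TSopt described above
can be summarized in a small decision function.  This is only a sketch:
the function name and the returned strings are hypothetical, and
`snd_ts_ok` stands for the Snd.TS.OK flag.

```python
def tsopt_disposition(snd_ts_ok, has_tsopt, is_rst):
    """Decide what to do with an arriving segment on a synchronized connection."""
    if not snd_ts_ok:
        # TSopt was not negotiated: any TSopt is ignored and the
        # segment is processed normally.
        return "process-ignoring-tsopt" if has_tsopt else "process"
    if has_tsopt or is_rst:
        # Negotiated and present, or an <RST> (which is not required
        # to carry the option).
        return "process"
    # Negotiated but missing on a non-<RST> segment: SHOULD be dropped
    # silently; the connection MUST NOT be aborted for this reason.
    return "drop-silently"
```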

554	   In the case of crossing <SYN> segments where one <SYN> contains a
555	   TSopt and the other doesn't, both sides MAY send a TSopt in the
556	   <SYN,ACK> segment.

558	   TSopt is required for the two mechanisms described in sections 4 and
559	   5.  There are also other mechanisms that rely on the presence of the
560	   TSopt, e.g., [RFC3522].  If a TCP stops sending TSopt at any time
561	   during an established session, it interferes with these mechanisms.
562	   This update to [RFC1323] describes explicitly the previous assumption
563	   (see Section 5.2), that each TCP segment must have TSopt, once
564	   negotiated.

566	4.  The RTTM Mechanism

568	4.1.  Introduction

570	   One use of the Timestamps option is to measure the round trip time of
571	   virtually every packet acknowledged.  The Round Trip Time Measurement
572	   (RTTM) mechanism requires a Timestamps option in every measured
573	   segment, with a TSval that is obtained from a (virtual) "timestamp
574	   clock".  Values of this clock MUST be at least approximately
575	   proportional to real time, in order to measure actual RTT.

577	   TCP measures the round trip time (RTT), primarily for the purpose of
578	   arriving at a reasonable value for the Retransmission Timeout (RTO)
579	   timer interval.  Accurate and current RTT estimates are necessary to
580	   adapt to changing traffic conditions, while a conservative estimate
581	   of the RTO interval is necessary to minimize spurious RTOs.

583	   These TSval values are echoed in TSecr values in the reverse
584	   direction.  The difference between a received TSecr value and the
585	   current timestamp clock value provides an RTT measurement.

587	   When timestamps are used, every segment that is received will contain
588	   a TSecr value.  However, these values cannot all be used to update
589	   the measured RTT.  The following example illustrates why.  It shows a
590	   one-way data flow with segments arriving in sequence without loss.
591	   Here A, B, C... represent data blocks occupying successive blocks of
592	   sequence numbers, and ACK(A),... represent the corresponding
593	   cumulative acknowledgments.  The two timestamp fields of the
594	   Timestamps option are shown symbolically as <TSval=x,TSecr=y>.  Each
595	   TSecr field contains the value most recently received in a TSval
596	   field.

598	              TCP  A                                     TCP B

600	                              <A,TSval=1,TSecr=120> ----->

602	                   <---- <ACK(A),TSval=127,TSecr=1>

604	                              <B,TSval=5,TSecr=127> ----->

606	                   <---- <ACK(B),TSval=131,TSecr=5>

608	                . . . . . . . . . . . . . . . . . . . . . .

610	                              <C,TSval=65,TSecr=131> ---->

612	                   <---- <ACK(C),TSval=191,TSecr=65>
613	                                  (etc.)

615	   The dotted line marks a pause (60 time units long) in which A had
616	   nothing to send.  Note that this pause inflates the RTT which B could
617	   infer from receiving TSecr=131 in data segment C. Thus, in one-way
618	   data flows, RTTM in the reverse direction measures a value that is
619	   inflated by gaps in sending data.  However, the following rule
620	   prevents a resulting inflation of the measured RTT:

622	   RTTM Rule: A TSecr value received in a segment MAY be used to update
623	              the averaged RTT measurement only if the segment advances
624	              the left edge of the send window, i.e., SND.UNA is
625	              increased.

627	   Since TCP B is not sending data, the data segment C does not
628	   acknowledge any new data when it arrives at B. Thus, the inflated
629	   RTTM measurement is not used to update B's RTTM measurement.
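
The RTTM rule can be sketched as below.  The function and parameter
names are hypothetical, and the check that the ACK does not exceed
SND.NXT is omitted for brevity.

```python
MOD32 = 1 << 32


def rttm_sample(now, seg_tsecr, seg_ack, snd_una):
    """Return an RTT sample in timestamp-clock ticks, or None.

    A sample is taken only if the segment advances the left edge of
    the send window (SND.UNA), as the RTTM rule requires.
    """
    advances = 0 < ((seg_ack - snd_una) % MOD32) < (1 << 31)
    if not advances:
        return None
    return (now - seg_tsecr) % MOD32
```

In the example above, data segment C reaches TCP B with TSecr=131 while
B's clock reads about 191.  Because C does not advance B's SND.UNA, the
function returns None and the inflated value of 60 is never used.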

631	4.2.  Updating the RTO value

633	   When [RFC1323] was originally written, it was perceived that taking
634	   RTT measurements for each segment, and also during retransmissions,
635	   would help reduce spurious RTOs, while maintaining the
636	   timeliness of necessary RTOs.  At the time, RTO was also the only
637	   mechanism to make use of the measured RTT.  It has since been shown
638	   that taking more RTT samples has only a very limited effect on
639	   optimizing RTOs [Allman99].

641	   Implementers should note that with timestamps multiple RTTMs can be
642	   taken per RTT.  The [RFC6298] RTO estimator has weighting factors,
643	   alpha and beta, based on an implicit assumption that at most one RTTM
644	   will be sampled per RTT.  When multiple RTTMs per RTT are available
645	   to update the RTO estimator, an implementation SHOULD try to adhere
646	   to the spirit of the history specified in [RFC6298].  An
647	   implementation suggestion is detailed in Appendix G.

649	   [Ludwig00] and [Floyd05] have highlighted the problem that an
650	   unmodified RTO calculation, which is updated with per-packet RTT
651	   samples, will truncate the path history too soon.  This can lead to
652	   an increase in spurious retransmissions, when the path properties
653	   vary in the order of a few RTTs, but a high number of RTT samples are
654	   taken on a much shorter timescale.
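
The effect described above can be illustrated with the [RFC6298]
estimator (alpha = 1/8, beta = 1/4, K = 4, minimum RTO of 1 second).
The sketch below assumes that the gains are divided by an expected
number of samples per RTT; the `expected_samples` parameter and this
particular scaling are illustrative assumptions, not a restatement of
Appendix G.

```python
class RtoEstimator:
    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, granularity=0.001, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.granularity = granularity
        self.min_rto = min_rto

    def update(self, r, expected_samples=1):
        """Feed one RTT sample r (seconds); return the new RTO."""
        if self.srtt is None:
            # First measurement, as in RFC 6298.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Assumption: scale the gains so that a burst of samples
            # within one RTT weighs about as much as a single sample.
            alpha = self.ALPHA / expected_samples
            beta = self.BETA / expected_samples
            self.rttvar = (1 - beta) * self.rttvar + beta * abs(self.srtt - r)
            self.srtt = (1 - alpha) * self.srtt + alpha * r
        return self.rto()

    def rto(self):
        return max(self.min_rto,
                   self.srtt + max(self.granularity, self.K * self.rttvar))
```

With eight samples per RTT, the unscaled estimator forgets most of its
earlier history within a single RTT, whereas the scaled one moves only
about as far as one conventional update would.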

656	4.3.  Which Timestamp to Echo

658	   If more than one Timestamps option is received before a reply segment
659	   is sent, the TCP must choose only one of the TSvals to echo, ignoring
660	   the others.  To minimize the state kept in the receiver (i.e., the
661	   number of unprocessed TSvals), the receiver should be required to
662	   retain at most one timestamp in the connection control block.

664	   There are three situations to consider:

666	   (A)  Delayed ACKs.

668	        Many TCPs acknowledge only every second segment out of a group
669	        of segments arriving within a short time interval; this policy
670	        is known generally as "delayed ACKs".  The data-sender TCP must
671	        measure the effective RTT, including the additional time due to
672	        delayed ACKs, or else it will retransmit unnecessarily.  Thus,
673	        when delayed ACKs are in use, the receiver SHOULD reply with the
674	        TSval field from the earliest unacknowledged segment.

676	   (B)  A hole in the sequence space (segment(s) have been lost).

678	        The sender will continue sending until the window is filled, and
679	        the receiver may be generating <ACK>s as these out-of-order
680	        segments arrive (e.g., to aid "fast retransmit").

682	        The lost segment is probably a sign of congestion, and in that
683	        situation the sender should be conservative about
684	        retransmission.  Furthermore, it is better to overestimate than
685	        underestimate the RTT.  An <ACK> for an out-of-order segment
686	        SHOULD therefore contain the timestamp from the most recent
687	        segment that advanced RCV.NXT.

689	        The same situation occurs if segments are re-ordered by the
690	        network.

692	   (C)  A filled hole in the sequence space.

694	        The segment that fills the hole and advances the window
695	        represents the most recent measurement of the network
696	        characteristics.  An RTT computed from an earlier segment would
697	        probably include the sender's retransmit time-out, badly biasing
698	        the sender's average RTT estimate.  Thus, the timestamp from the
699	        latest segment (which filled the hole) MUST be echoed.

701	   An algorithm that covers all three cases is described in the
702	   following rules for Timestamps option processing on a synchronized
703	   connection:

705	   (1)  The connection state is augmented with two 32-bit slots:

707	        TS.Recent holds a timestamp to be echoed in TSecr whenever a
708	        segment is sent, and Last.ACK.sent holds the ACK field from the
709	        last segment sent.  Last.ACK.sent will equal RCV.NXT except when
710	        <ACK>s have been delayed.

712	   (2)  If:

714	            SEG.TSval >= TS.Recent and SEG.SEQ <= Last.ACK.sent

716	        then SEG.TSval is copied to TS.Recent; otherwise, it is ignored.

718	   (3)  When a TSopt is sent, its TSecr field is set to the current
719	        TS.Recent value.

721	   The following examples illustrate these rules.  Here A, B, C...
722	   represent data segments occupying successive blocks of sequence
723	   numbers, and ACK(A),... represent the corresponding acknowledgment
724	   segments.  Note that ACK(A) has the same sequence number as B. We
725	   show only one direction of timestamp echoing, for clarity.

727	   o  Segments arrive in sequence, and some of the <ACK>s are delayed.

729	      By case (A), the timestamp from the oldest unacknowledged segment
730	      is echoed.

732	                                                    TS.Recent
733	                  <A, TSval=1> ------------------->
734	                                                        1
735	                  <B, TSval=2> ------------------->
736	                                                        1
737	                  <C, TSval=3> ------------------->
738	                                                        1
739	                           <---- <ACK(C), TSecr=1>
740	                  (etc)

742	   o  Segments arrive out of order, and every segment is acknowledged.

744	      By case (B), the timestamp from the last segment that advanced the
745	      left window edge is echoed until the missing segment arrives; that
746	      segment's timestamp is then echoed according to case (C).  The same
747	      sequence would occur if segments B and D were lost and retransmitted.

749	                                                    TS.Recent
750	                  <A, TSval=1> ------------------->
751	                                                        1
752	                           <---- <ACK(A), TSecr=1>
753	                                                        1
754	                  <C, TSval=3> ------------------->
755	                                                        1
756	                           <---- <ACK(A), TSecr=1>
757	                                                        1
758	                  <B, TSval=2> ------------------->
759	                                                        2
760	                           <---- <ACK(C), TSecr=2>
761	                                                        2
762	                  <E, TSval=5> ------------------->
763	                                                        2
764	                           <---- <ACK(C), TSecr=2>
765	                                                        2
766	                  <D, TSval=4> ------------------->
767	                                                        4
768	                           <---- <ACK(E), TSecr=4>
769	                  (etc)
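
Rules (1) to (3) can be replayed against both examples with a small
simulation.  This is a sketch under simplifying assumptions: the class
and method names are hypothetical, every segment is one sequence unit
long (A=0, B=1, ...), and only the receiver side is modeled.

```python
MOD32 = 1 << 32


def mod_leq(a, b):
    """a <= b in modular 32-bit arithmetic."""
    return ((b - a) % MOD32) < (1 << 31)


class EchoState:
    def __init__(self, rcv_nxt=0):
        self.ts_recent = 0              # rule (1): timestamp to echo
        self.rcv_nxt = rcv_nxt
        self.last_ack_sent = rcv_nxt    # rule (1): ACK field last sent
        self.queued = {}                # seq -> length, not yet delivered

    def receive(self, seq, length, tsval):
        # Rule (2): update TS.Recent only for a segment at or before
        # the last ACK sent, carrying a timestamp that is not older.
        if mod_leq(self.ts_recent, tsval) and mod_leq(seq, self.last_ack_sent):
            self.ts_recent = tsval
        if mod_leq(self.rcv_nxt, seq):
            self.queued[seq] = length
        while self.rcv_nxt in self.queued:
            self.rcv_nxt = (self.rcv_nxt + self.queued.pop(self.rcv_nxt)) % MOD32
        return self.ts_recent

    def send_ack(self):
        # Rule (3): TSecr carries the current TS.Recent.
        self.last_ack_sent = self.rcv_nxt
        return self.rcv_nxt, self.ts_recent
```

Feeding A, B, C without intermediate ACKs leaves TS.Recent at 1, as in
the delayed-ACK example.  Feeding A, C, B, E, D with an ACK after each
segment yields TSecr values 1, 1, 2, 2, 4, as in the out-of-order
example.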

771	5.  PAWS - Protection Against Wrapped Sequence Numbers

773	5.1.  Introduction

775	   Another use for the Timestamps option is the mechanism to Protect
776	   Against Wrapped Sequence numbers (PAWS).  Section 5.2 describes a
777	   simple mechanism to reject old duplicate segments that might corrupt
778	   an open TCP connection.  PAWS operates within a single TCP
779	   connection, using state that is saved in the connection control
780	   block.  Section 5.8 and Appendix H discuss the implications of the
781	   PAWS mechanism for avoiding old duplicates from previous incarnations
782	   of the same connection.

784	5.2.  The PAWS Mechanism

786	   PAWS uses the TCP Timestamps option described earlier, and assumes
787	   that every received TCP segment (including data and <ACK> segments)
788	   contains a timestamp SEG.TSval whose values are monotonically non-
789	   decreasing in time.  The basic idea is that a segment can be
790	   discarded as an old duplicate if it is received with a timestamp
791	   SEG.TSval less than some timestamp recently received on this
792	   connection.

794	   In the PAWS mechanism, the "timestamps" are 32-bit unsigned integers
795	   in a modular 32-bit space.  Thus, "less than" is defined the same way
796	   it is for TCP sequence numbers, and the same implementation
797	   techniques apply.  If s and t are timestamp values,

799	                       s < t  if 0 < (t - s) < 2^31,

801	   computed in unsigned 32-bit arithmetic.
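
The comparison above translates directly into code.  The function name
`ts_lt` is hypothetical; the modulo operation stands in for unsigned
32-bit subtraction.

```python
MOD32 = 1 << 32


def ts_lt(s, t):
    """True if s < t, i.e. 0 < (t - s) < 2^31 in unsigned 32-bit arithmetic."""
    return 0 < ((t - s) % MOD32) < (1 << 31)
```

Note that the relation is not a total order: two values exactly 2^31
apart are neither less than nor greater than each other under this
definition.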

803	   The choice of incoming timestamps to be saved for this comparison
804	   MUST guarantee a value that is monotonically non-decreasing.  For
805	   example, an implementation might save the timestamp from the segment
806	   that last advanced the left edge of the receive window, i.e., the
807	   most recent in-sequence segment.  For simplicity, the value TS.Recent
808	   introduced in Section 4.3 is used instead, as using a common value
809	   for both PAWS and RTTM simplifies the implementation.  As Section 4.3
810	   explained, TS.Recent differs from the timestamp from the last in-
811	   sequence segment only in the case of delayed <ACK>s, and therefore by
812	   less than one window.  Either choice will therefore protect against
813	   sequence number wrap-around.

815	   PAWS submits all incoming segments to the same test, and therefore
816	   protects against duplicate <ACK> segments as well as data segments.
817	   (An alternative non-symmetric algorithm would protect against old
818	   duplicate <ACK>s: the sender of data would reject incoming <ACK>
819	   segments whose TSecr values were less than the TSecr saved from the
820	   last segment whose ACK field advanced the left edge of the send
821	   window.  This algorithm was deemed to lack economy of mechanism and
822	   symmetry.)

824	   TSval timestamps sent on <SYN> and <SYN,ACK> segments are used to
825	   initialize PAWS.  PAWS protects against old duplicate non-<SYN>
826	   segments, and duplicate <SYN> segments received while there is a
827	   synchronized connection.  Duplicate <SYN> and <SYN,ACK> segments
828	   received when there is no connection will be discarded by the normal
829	   3-way handshake and sequence number checks of TCP.

831	   [RFC1323] recommended that <RST> segments NOT carry timestamps, and
832	   that they be acceptable regardless of their timestamp.  At that time,
833	   the thinking was that old duplicate <RST> segments should be
834	   exceedingly unlikely, and their cleanup function should take
835	   precedence over timestamps.  More recently, discussions about various
836	   blind attacks on TCP connections have raised the suggestion that if
837	   the Timestamps option is present, SEG.TSecr could be used to provide
838	   stricter acceptance tests for <RST> segments.

840	   While still under discussion, to enable research into this area it is
841	   now RECOMMENDED that, when generating an <RST>, if the segment
842	   causing the <RST> to be generated contained a Timestamps option, the
843	   <RST> also contain a Timestamps option.  In the <RST> segment,
844	   SEG.TSecr SHOULD be set to SEG.TSval from the incoming segment and
845	   SEG.TSval SHOULD be set to zero.  If an <RST> is being generated
846	   because of a user abort, and Snd.TS.OK is set, then a Timestamps
847	   option SHOULD be included in the <RST>.  When an <RST> segment is
848	   received, it MUST NOT be subjected to the PAWS check by verifying an
849	   acceptable value in SEG.TSval, and information from the Timestamps
850	   option MUST NOT be used to update connection state information.
851	   SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.

853	5.3.  Basic PAWS Algorithm

855	   If the PAWS algorithm is used, the following processing MUST be
856	   performed on all incoming segments for a synchronized connection.
857	   Also, PAWS processing MUST take precedence over the regular TCP
858	   acceptability check (Section 3.3 in [RFC0793]), which is performed
859	   after verification of the received Timestamps option:

861	   R1)  If there is a Timestamps option in the arriving segment,
862	        SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion)
863	        and the RST bit is not set, then treat the arriving segment as
864	        not acceptable:

866	           Send an acknowledgement in reply as specified in [RFC0793]
867	           page 69 and drop the segment.

869	           Note: it is necessary to send an <ACK> segment in order to
870	           retain TCP's mechanisms for detecting and recovering from
871	           half-open connections.  For example, see Figure 10 of
872	           [RFC0793].

874	   R2)  If the segment is outside the window, reject it (normal TCP
875	        processing)

877	   R3)  If an arriving segment satisfies: SEG.SEQ <= Last.ACK.sent (see
878	        Section 4.3), then record its timestamp in TS.Recent.

880	   R4)  If an arriving segment is in-sequence (i.e., at the left window
881	        edge), then accept it normally.

883	   R5)  Otherwise, treat the segment as a normal in-window, out-of-
884	        sequence TCP segment (e.g., queue it for later delivery to the
885	        user).

887	   Steps R2, R4, and R5 are the normal TCP processing steps specified by
888	   [RFC0793].
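
Steps R1 to R5 can be sketched as a single function.  The names `Conn`,
`Verdict` and `paws_process` are hypothetical, and the window test in R2
is an assumption that checks only the first sequence number of the
segment rather than the full acceptability test of [RFC0793].

```python
from enum import Enum

MOD32 = 1 << 32
HALF = 1 << 31


def mod_lt(a, b):
    return 0 < ((b - a) % MOD32) < HALF


def mod_leq(a, b):
    return ((b - a) % MOD32) < HALF


class Verdict(Enum):
    ACK_AND_DROP = "R1"
    OUT_OF_WINDOW = "R2"
    IN_SEQUENCE = "R4"
    OUT_OF_SEQUENCE = "R5"


class Conn:
    def __init__(self, rcv_nxt, rcv_wnd, ts_recent, last_ack_sent):
        self.rcv_nxt = rcv_nxt
        self.rcv_wnd = rcv_wnd
        self.ts_recent = ts_recent
        self.last_ack_sent = last_ack_sent
        self.ts_recent_valid = True


def paws_process(conn, seq, tsval=None, rst=False):
    # An <RST> is never subjected to the PAWS check and never
    # updates timestamp state.
    use_ts = tsval is not None and not rst
    # R1: old timestamp, TS.Recent valid, not an <RST>.
    if use_ts and conn.ts_recent_valid and mod_lt(tsval, conn.ts_recent):
        return Verdict.ACK_AND_DROP
    # R2: simplified window test (assumption, see lead-in).
    if (seq - conn.rcv_nxt) % MOD32 >= conn.rcv_wnd:
        return Verdict.OUT_OF_WINDOW
    # R3: record the timestamp if SEG.SEQ <= Last.ACK.sent.
    if use_ts and mod_leq(seq, conn.last_ack_sent):
        conn.ts_recent = tsval
        conn.ts_recent_valid = True
    # R4 / R5: normal TCP processing.
    if seq == conn.rcv_nxt:
        return Verdict.IN_SEQUENCE
    return Verdict.OUT_OF_SEQUENCE
```

The caller remains responsible for advancing RCV.NXT, sending the <ACK>
required by R1, and updating Last.ACK.sent.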

890	   It is important to note that the timestamp MUST be checked only when
891	   a segment first arrives at the receiver, regardless of whether it is
892	   in-sequence or it must be queued for later delivery.

894	   Consider the following example.

896	      Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has been
897	      sent, where the letter indicates the sequence number and the digit
898	      represents the timestamp.  Suppose also that segment B.1 has been
899	      lost.  The timestamp in TS.Recent is 1 (from A.1), so C.1, ...,
900	      Z.1 are considered acceptable and are queued.  When B is
901	      retransmitted as segment B.2 (using the latest timestamp), it
902	      fills the hole and causes all the segments through Z to be
903	      acknowledged and passed to the user.  The timestamps of the queued
904	      segments are *not* inspected again at this time, since they have
905	      already been accepted.  When B.2 is accepted, TS.Recent is set to
906	      2.

908	   This rule allows reasonable performance under loss.  A full window of
909	   data is in transit at all times, and after a loss a full window less
910	   one segment will show up out-of-sequence to be queued at the receiver
911	   (e.g., up to ~2^30 bytes of data); the Timestamps option must not
912	   result in discarding this data.

914	   In certain unlikely circumstances, the algorithm of rules R1-R5 could
915	   lead to discarding some segments unnecessarily, as shown in the
916	   following example:

918	      Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have been
919	      sent in sequence and that segment B.1 has been lost.  Furthermore,
920	      suppose delivery of some of C.1, ..., Z.1 is delayed until *after*
921	      the retransmission B.2 arrives at the receiver.  These delayed
922	      segments will be discarded unnecessarily when they do arrive,
923	      since their timestamps are now out of date.

925	   This case is very unlikely to occur.  If the retransmission was
926	   triggered by a timeout, some of the segments C.1, ..., Z.1 must have
927	   been delayed longer than the RTO time.  This is presumably an
928	   unlikely event, or there would be many spurious timeouts and
929	   retransmissions.  If B's retransmission was triggered by the "fast
930	   retransmit" algorithm, i.e., by duplicate <ACK>s, then the queued
931	   segments that caused these <ACK>s must have been received already.

933	   Even if a segment were delayed past the RTO, the Fast Retransmit
934	   mechanism [Jacobson90c] will cause the delayed segments to be
935	   retransmitted at the same time as B.2, avoiding an extra RTT and
936	   therefore causing a very small performance penalty.

938	   We know of no case with a significant probability of occurrence in
939	   which timestamps will cause performance degradation by unnecessarily
940	   discarding segments.

942	5.4.  Timestamp Clock

944	   It is important to understand that the PAWS algorithm does not
945	   require clock synchronization between sender and receiver.  The
946	   sender's timestamp clock is used as a source of monotonic non-
947	   decreasing values to stamp the segments.  The receiver treats the
948	   timestamp value as simply a monotonically non-decreasing serial
949	   number, without any connection to time.  From the receiver's
950	   viewpoint, the timestamp is acting as a logical extension of the
951	   high-order bits of the sequence number.

953	   The receiver algorithm does place some requirements on the frequency
954	   of the timestamp clock.

956	   (a)  The timestamp clock must not be "too slow".

958	        It MUST tick at least once for each 2^31 bytes sent.  In fact,
959	        in order to be useful to the sender for round trip timing, the
960	        clock SHOULD tick at least once per window's worth of data, and
961	        even with the window extension defined in Section 2.2, 2^31
962	        bytes must be at least two windows.

964	        To make this more quantitative, any clock faster than 1 tick/sec
965	        will reject old duplicate segments for link speeds of ~8 Gbps.
966	        A 1 ms timestamp clock will work at link speeds up to 8 Tbps
967	        (8*10^12 bps)!

969	   (b)  The timestamp clock must not be "too fast".

971	        The recycling time of the timestamp clock MUST be greater than
972	        MSL seconds.  Since the clock (timestamp) is 32 bits and the
973	        worst-case MSL is 255 seconds, the maximum acceptable clock
974	        frequency is one tick every 59 ns.

976	        However, it is desirable to establish a much longer recycle
977	        period, in order to handle outdated timestamps on idle
978	        connections (see Section 5.5), and to relax the MSL requirement
979	        for preventing sequence number wrap-around.  With a 1 ms
980	        timestamp clock, the 32-bit timestamp will wrap its sign bit in
981	        24.8 days.  Thus, it will reject old duplicates on the same
982	        connection if MSL is 24.8 days or less.  This appears to be a
983	        very safe figure; an MSL of 24.8 days or longer can probably be
984	        assumed in the Internet without requiring precise MSL
985	        enforcement.

987	   Based upon these considerations, we choose a timestamp clock
988	   frequency in the range 1 ms to 1 sec per tick.  This range also
989	   matches the requirements of the RTTM mechanism, which does not need
990	   much more resolution than the granularity of the retransmit timer,
991	   e.g., tens or hundreds of milliseconds.
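
The figures quoted in (a) and (b) can be reproduced with a few lines of
arithmetic.  The helper names are hypothetical, and the link-speed
figure assumes one clock tick per maximum window of about 2^30 bytes,
which is how the ~8 Gbps and 8 Tbps values appear to be derived.

```python
MAX_WINDOW_BYTES = 2 ** 30  # assumption: largest scaled window


def max_link_speed_bps(tick_seconds):
    """Link speed at which one full window is sent per clock tick."""
    return MAX_WINDOW_BYTES * 8 / tick_seconds


def sign_bit_wrap_days(tick_seconds):
    """Time for a 32-bit timestamp to advance by 2^31 ticks."""
    return 2 ** 31 * tick_seconds / 86400


def fastest_tick_ns(msl_seconds=255):
    """Shortest tick for which 2^32 ticks still exceed the MSL."""
    return msl_seconds / 2 ** 32 * 1e9
```

Here `max_link_speed_bps(1.0)` is about 8.6e9, `max_link_speed_bps(0.001)`
about 8.6e12, `sign_bit_wrap_days(0.001)` just over 24.8, and
`fastest_tick_ns()` just over 59.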

993	   The PAWS mechanism also puts a strong monotonicity requirement on the
994	   sender's timestamp clock.  The method of implementation of the
995	   timestamp clock to meet this requirement depends upon the system
996	   hardware and software.

998	   o  Some hosts have a hardware clock that is guaranteed to be
999	      monotonic between hardware resets.

1001	   o  A clock interrupt may be used to simply increment a binary integer
1002	      by 1 periodically.

1004	   o  The timestamp clock may be derived from a system clock that is
1005	      subject to being abruptly changed, by adding a variable offset
1006	      value.  This offset is initialized to zero.  When a new timestamp
1007	      clock value is needed, the offset can be adjusted as necessary to
1008	      make the new value equal to or larger than the previous value
1009	      (which was saved for this purpose).

1011	   o  A random offset may be added to the timestamp clock on a per
1012	      connection basis.  See [RFC6528], section 3, on randomizing the
1013	      initial sequence number (ISN).  The same function with a different
1014	      secret key can be used to generate the per connection timestamp
1015	      offset.
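
The last two approaches can be sketched as follows.  All names are
hypothetical.  The system clock is injected as a callable so that jumps
can be simulated, and HMAC-SHA256 is used here as the keyed function
purely for illustration; it is an assumption, not the function
specified in [RFC6528].

```python
import hashlib
import hmac


class TimestampClock:
    """Monotonic timestamp clock derived from an adjustable system clock."""

    def __init__(self, read_system_ticks):
        self._read = read_system_ticks
        self._offset = 0        # initialized to zero
        self._last = None       # previous value, saved for comparison

    def now(self):
        value = self._read() + self._offset
        if self._last is not None and value < self._last:
            # The system clock stepped backwards: grow the offset so
            # the new value equals the previous one.
            self._offset += self._last - value
            value = self._last
        self._last = value
        return value & 0xFFFFFFFF


def connection_offset(secret_key, local_ip, local_port, remote_ip, remote_port):
    """Per-connection timestamp offset from a keyed hash of the 4-tuple."""
    msg = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")
```

A connection would then stamp its segments with
`(clock.now() + connection_offset(...)) mod 2^32`.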

1017	5.5.  Outdated Timestamps

1019	   If a connection remains idle long enough for the timestamp clock of
1020	   the other TCP to wrap its sign bit, then the value saved in TS.Recent
1021	   will become too old; as a result, the PAWS mechanism will cause all
1022	   subsequent segments to be rejected, freezing the connection (until
1023	   the timestamp clock wraps its sign bit again).

1025	   With the chosen range of timestamp clock frequencies (1 ms to 1 sec),
1026	   the time to wrap the sign bit will be between 24.8 days and 24800
1027	   days.  A TCP connection that is idle for more than 24 days and then
1028	   comes to life is exceedingly unusual.  However, it is undesirable in
1029	   principle to place any limitation on TCP connection lifetimes.

1031	   We therefore require that an implementation of PAWS include a
1032	   mechanism to "invalidate" the TS.Recent value when a connection is
1033	   idle for more than 24 days.  (An alternative solution to the problem
1034	   of outdated timestamps would be to send keep-alive segments at a very
1035	   low rate, but still more often than the wrap-around time for
1036	   timestamps, e.g., once a day.  This would impose negligible overhead.
1037	   However, the TCP specification has never included keep-alives, so the
1038	   solution based upon invalidation was chosen.)

1040	   Note that a TCP does not know the frequency, and therefore, the
1041	   wraparound time, of the other TCP, so it must assume the worst.  The
1042	   validity of TS.Recent needs to be checked only if the basic PAWS
1043	   timestamp check fails, i.e., only if SEG.TSval < TS.Recent.  If
1044	   TS.Recent is found to be invalid, then the segment is accepted,
1045	   regardless of the failure of the timestamp check, and rule R3 updates
1046	   TS.Recent with the TSval from the new segment.

1048	   To detect how long the connection has been idle, the TCP MAY update a
1049	   clock or timestamp value associated with the connection whenever
1050	   TS.Recent is updated, for example.  The details will be
1051	   implementation-dependent.
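
One possible shape for the invalidation check is shown below.  The class
and attribute names are hypothetical, and `now` is any local time source
in seconds; the validity test is evaluated only after the basic
timestamp comparison fails, as described above.

```python
MOD32 = 1 << 32
IDLE_LIMIT_SECONDS = 24 * 24 * 3600  # 24 days


def ts_lt(s, t):
    return 0 < ((t - s) % MOD32) < (1 << 31)


class PawsState:
    def __init__(self, ts_recent, now):
        self.ts_recent = ts_recent
        self.ts_recent_age = now  # local time of the last TS.Recent update

    def update(self, tsval, now):
        self.ts_recent = tsval
        self.ts_recent_age = now

    def rejects(self, seg_tsval, now):
        """True if PAWS should treat the segment as an old duplicate."""
        if not ts_lt(seg_tsval, self.ts_recent):
            return False  # basic check passed; validity is not examined
        if now - self.ts_recent_age > IDLE_LIMIT_SECONDS:
            return False  # TS.Recent is outdated: accept the segment
        return True
```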

1053	5.6.  Header Prediction

1055	   "Header prediction" [Jacobson90a] is a high-performance transport
1056	   protocol implementation technique that is most important for high-
1057	   speed links.  This technique optimizes the code for the most common
1058	   case, receiving a segment correctly and in order.  Using header
1059	   prediction, the receiver asks the question, "Is this segment the next
1060	   in sequence?"  This question can be answered in fewer machine
1061	   instructions than the question, "Is this segment within the window?"

1063	   Adding header prediction to our timestamp procedure leads to the
1064	   following recommended sequence for processing an arriving TCP
1065	   segment:

1067	   H1)  Check timestamp (same as step R1 above)

1069	   H2)  Do header prediction: if segment is next in sequence and if
1070	        there are no special conditions requiring additional processing,
1071	        accept the segment, record its timestamp, and skip H3.

1073	   H3)  Process the segment normally, as specified in RFC 793.  This
1074	        includes dropping segments that are outside the window and
1075	        possibly sending acknowledgments, and queuing in-window, out-of-
1076	        sequence segments.

1078	   Another possibility would be to interchange steps H1 and H2, i.e., to
1079	   perform the header prediction step H2 *first*, and perform H1 and H3
1080	   only when header prediction fails.  This could be a performance
1081	   improvement, since the timestamp check in step H1 is very unlikely to
1082	   fail, and it requires unsigned modulo arithmetic.  To perform this
1083	   check on every single segment is contrary to the philosophy of header
1084	   prediction.  We believe that this change might produce a measurable
1085	   reduction in CPU time for TCP protocol processing on high-speed
1086	   networks.

1088	   However, putting H2 first would create a hazard: a segment from 2^32
1089	   bytes in the past might arrive at exactly the wrong time and be
1090	   accepted mistakenly by the header-prediction step.  The following
1091	   reasoning has been introduced in [RFC1185] to show that the
1092	   probability of this failure is negligible.

1094	      If all segments are equally likely to show up as old duplicates,
1095	      then the probability of an old duplicate exactly matching the left
1096	      window edge is the maximum segment size (MSS) divided by the size
1097	      of the sequence space.  This ratio must be less than 2^-16, since
1098	      MSS must be < 2^16; for example, it will be (2^12)/(2^32) = 2^-20
1099	      for a 100 Mbit/s link.  However, the older a segment is, the less
1100	      likely it is to be retained in the Internet, and under any
1101	      reasonable model of segment lifetime the probability of an old
1102	      duplicate exactly at the left window edge must be much smaller
1103	      than 2^-16.

1105	      The 16 bit TCP checksum also allows a basic unreliability of one
1106	      part in 2^16.  A protocol mechanism whose reliability exceeds the
1107	      reliability of the TCP checksum should be considered "good
1108	      enough", i.e., it won't contribute significantly to the overall
1109	      error rate.  We therefore believe we can ignore the problem of an
1110	      old duplicate being accepted by doing header prediction before
1111	      checking the timestamp.
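
The ratio used in this argument is easy to check with exact arithmetic.
The function name is hypothetical, and the sketch keeps the same
assumption as the text, namely that all old duplicates are equally
likely.

```python
from fractions import Fraction


def old_duplicate_hit_probability(mss_bytes):
    """MSS divided by the size of the 32-bit sequence space."""
    return Fraction(mss_bytes, 2 ** 32)
```

For an MSS of 2^12 bytes this gives exactly 2^-20, and for any MSS below
2^16 the result stays below 2^-16.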

1113	   However, this probabilistic argument is not universally accepted, and
1114	   the consensus at present is that the performance gain does not
1115	   justify the hazard in the general case.  It is therefore recommended
1116	   that H2 follow H1.

1118	5.7.  IP Fragmentation

1120	   At high data rates, the protection against old segments provided by
1121	   PAWS can be circumvented by errors in IP fragment reassembly (see
1122	   [RFC4963]).  The only way to protect against incorrect IP fragment
1123	   reassembly is to not allow the segments to be fragmented.  This is
1124	   done by setting the Don't Fragment (DF) bit in the IP header.
1125	   Setting the DF bit implies the use of Path MTU Discovery as described
1126	   in [RFC1191], [RFC1981], and [RFC4821], thus any TCP implementation
1127	   that implements PAWS MUST also implement Path MTU Discovery.

1129	5.8.  Duplicates from Earlier Incarnations of Connection

1131	   The PAWS mechanism protects against errors due to sequence number
1132	   wrap-around on high-speed connections.  Segments from an earlier
1133	   incarnation of the same connection are also a potential cause of old
1134	   duplicate errors.  In both cases, the TCP mechanisms to prevent such
1135	   errors depend upon the enforcement of a maximum segment lifetime
1136	   (MSL) by the Internet (IP) layer (see the Appendix of [RFC1185] for a
1137	   detailed discussion).  Unlike the case of sequence space wrap-around,
1138	   the MSL required to prevent old duplicate errors from earlier
1139	   incarnations does not depend upon the transfer rate.  If the IP layer
1140	   enforces the recommended 2 minute MSL of TCP, and if the TCP rules
1141	   are followed, TCP connections will be safe from earlier incarnations,
1142	   no matter how high the network speed.  Thus, the PAWS mechanism is
1143	   not required for this case.

1145	   We may still ask whether the PAWS mechanism can provide additional
1146	   security against old duplicates from earlier connections, allowing us
1147	   to relax the enforcement of MSL by the IP layer.  Appendix B explores
1148	   this question, showing that further assumptions and/or mechanisms are
1149	   required, beyond those of PAWS.  This is not part of the current
1150	   extension.

1152	6.  Conclusions and Acknowledgements

1154	   This memo presented a set of extensions to TCP to provide efficient
1155	   operation over large bandwidth * delay product paths and reliable
1156	   operation over very high-speed paths.  These extensions are designed
1157	   to provide compatible interworking with TCP stacks that do not
1158	   implement the extensions.

1160	   These mechanisms are implemented using TCP options for scaled windows
1161	   and timestamps.  The timestamps are used for two distinct mechanisms:
1162	   RTTM (Round Trip Time Measurement) and PAWS (Protection Against
1163	   Wrapped Sequences).

1165	   The Window Scale option was originally suggested by Mike St. Johns of
1166	   USAF/DCA.  The present form of the option was suggested by Mike
1167	   Karels of UC Berkeley in response to a more cumbersome scheme defined
1168	   by Van Jacobson.  Lixia Zhang helped formulate the PAWS mechanism
1169	   description in [RFC1185].

1171	   Finally, much of this work originated as the result of discussions
1172	   within the End-to-End Task Force on the theoretical limitations of
1173	   transport protocols in general and TCP in particular.  Task force
1174	   members and others on the end2end-interest list have made valuable
1175	   contributions by pointing out flaws in the algorithms and the
1176	   documentation.  Continued discussion and development since the
1177	   publication of [RFC1323] originally occurred in the IETF TCP Large
1178	   Windows Working Group, later on in the End-to-End Task Force, and
1179	   most recently in the IETF TCP Maintenance Working Group.  The authors
1180	   are grateful for all these contributions.

1182	7.  Security Considerations

1184	   The TCP sequence space is a fixed size, and as the window becomes
1185	   larger it becomes easier for an attacker to generate forged packets
1186	   that can fall within the TCP window, and be accepted as valid
1187	   segments.  Timestamps and PAWS can help to mitigate this.  However,
1188	   if an attacker is able to forge a packet that is acceptable to the
1189	   TCP connection and that carries a timestamp in the future, PAWS
1190	   checks would cause subsequent valid segments to be dropped.  Hence,
1191	   implementers should take care not to open the TCP window
1192	   drastically beyond the requirements of the connection.
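
The effect of a forged future timestamp can be illustrated with a
minimal sketch.  It assumes the PAWS test and the TS.Recent update rule
as summarized in Appendix D, with timestamps compared in 32-bit modular
arithmetic.  The function names (ts_less, paws_accept, update_ts_recent)
and all numeric values are hypothetical and chosen only for
illustration; RST handling, the 24-day idle exception, and sequence
number wrap-around are deliberately left out.

```python
MOD = 1 << 32   # timestamps are 32-bit unsigned values
HALF = 1 << 31


def ts_less(s, t):
    """True if timestamp s is older than t in 32-bit modular arithmetic."""
    return 0 < ((t - s) % MOD) < HALF


def paws_accept(seg_tsval, ts_recent):
    """PAWS test only: reject a segment whose TSval is older than TS.Recent."""
    return not ts_less(seg_tsval, ts_recent)


def update_ts_recent(seg_seq, seg_tsval, last_ack_sent, ts_recent):
    """Save SEG.TSval when SEG.SEQ <= Last.ACK.sent (no sequence wrap here)."""
    return seg_tsval if seg_seq <= last_ack_sent else ts_recent


# Hypothetical receiver state.
ts_recent = 1_000
last_ack_sent = 5_000

# A legitimate segment with a slightly newer timestamp passes the test.
legit_before = paws_accept(1_010, ts_recent)

# A forged in-window segment carries a timestamp far in the future.
forged_tsval = 1_000 + 1_000_000
if paws_accept(forged_tsval, ts_recent):
    ts_recent = update_ts_recent(5_000, forged_tsval, last_ack_sent, ts_recent)

# The legitimate sender's next segment now looks like an old duplicate.
legit_after = paws_accept(1_020, ts_recent)
print(legit_before, legit_after)
```

In this sketch the legitimate sender stays locked out until its own
timestamp clock catches up with the forged value, which is why limiting
the window (and thus the chance of an acceptable forgery) matters.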

1194	   A naive implementation that derives the timestamp clock value
1195	   directly from a system uptime clock may unintentionally leak this
1196	   information to an attacker.  This does not directly compromise any of
1197	   the mechanisms described in this document.  However, this may be
1198	   valuable information to a potential attacker.  An implementer should
1199	   evaluate the potential impact and mitigate this accordingly (e.g., by
1200	   using a random offset for the timestamp clock on each connection, or
1201	   using an external, real-time derived timestamp clock source).
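
A random per-connection offset could be sketched as follows.  The sketch
follows the relation Snd.TSclock = my.TSclock + Snd.TSoffset from
Appendix C; the function names and the sample clock value are
hypothetical, and the use of Python's secrets module as the randomness
source is an assumption of this example, not a requirement of this
document.

```python
import secrets

MOD = 1 << 32  # the timestamp clock is a 32-bit value


def new_ts_offset():
    """Pick a random 32-bit Snd.TSoffset once, when the connection is created."""
    return secrets.randbits(32)


def snd_tsclock(my_tsclock, snd_tsoffset):
    """Snd.TSclock = my.TSclock + Snd.TSoffset, wrapped to 32 bits."""
    return (my_tsclock + snd_tsoffset) % MOD


offset_a = new_ts_offset()
uptime_ticks = 123_456  # hypothetical value of the system-wide clock

tsval_first = snd_tsclock(uptime_ticks, offset_a)
tsval_later = snd_tsclock(uptime_ticks + 10, offset_a)

# Within one connection the clock still advances by the elapsed ticks,
# so RTTM and PAWS keep working; only the absolute value is hidden.
elapsed = (tsval_later - tsval_first) % MOD
```

Because the offset is constant for the lifetime of a connection,
differences between timestamps are unchanged; an observer only loses the
ability to read the uptime directly from a single TSval.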

1203	   Expanding the TCP window beyond 64 KiB for IPv6 allows Jumbograms
1204	   [RFC2675] to be used when the local network supports packets larger
1205	   than 64 KiB.  When larger TCP segments are used, the TCP checksum
1206	   becomes weaker.

1208	   Mechanisms to protect the TCP header from modification should also
1209	   protect the TCP options.

1211	   Middleboxes and TCP options:

1213	      Some middleboxes have been known to remove the TCP options
1214	      described in this document from TCP segments [Honda11].
1215	      Middleboxes that remove TCP options described in this document
1216	      from the <SYN> segment interfere with the selection of parameters
1217	      appropriate for the session.  Removing any of these options in a
1218	      <SYN,ACK> segment will leave the end hosts in a state that
1219	      destroys the proper operation of the protocol.

1221	      *  If a Window Scale option is removed from a <SYN,ACK> segment,
1222	         the end hosts will not negotiate the window scaling factor
1223	         correctly.  Middleboxes must not remove or modify the Window
1224	         Scale option from <SYN,ACK> segments.

1226	      *  If a stateful firewall uses the window field to detect whether
1227	         a received segment is inside the current window, and does not
1228	         support the Window Scale option, it will not be able to
1229	         correctly determine whether or not a packet is in the window.
1230	         These middleboxes must also support the Window Scale option
1231	         and apply the scale factor when processing segments.  If the
1232	         window scale factor cannot be determined, they must not do
1233	         window-based processing.

1235	      *  If the Timestamps option is removed from the <SYN> or <SYN,ACK>
1236	         segment, high-speed connections that need PAWS would not have
1237	         that protection.  Successful negotiation of the Timestamps
1238	         option enforces a stricter verification of incoming segments at
1239	         the receiver.  If the Timestamps option is removed from a
1240	         subsequent data segment after a successful negotiation (e.g.,
1241	         as part of re-segmentation), the segment is discarded by the
1242	         receiver without further processing.  Middleboxes should not
1243	         remove the Timestamps option.

1245	      *  It must be noted that [RFC1323] doesn't address the case of the
1246	         Timestamps option being dropped or selectively omitted after
1247	         being negotiated, and that the update in this document may
1248	         cause some broken middlebox behavior to be detected
1249	         (potentially unresponsive TCP sessions).

1251	   Implementations that depend on PAWS could provide a mechanism for the
1252	   application to determine whether or not PAWS is in use on the
1253	   connection, and choose to terminate the connection if that protection
1254	   doesn't exist.  This is not just to protect the connection against
1255	   middleboxes that might remove the Timestamps option, but also against
1256	   remote hosts that do not have Timestamp support.

1258	7.1.  Privacy Considerations

1260	   The TCP options described in this document do not expose individual
1261	   users' data.  However, a naive implementation simply using the system
1262	   clock as source for the Timestamps option will reveal characteristics
1263	   of the TCP, potentially allowing more targeted attacks.  It is
1264	   therefore RECOMMENDED to generate a random, per-connection offset to
1265	   be used with the clock source when generating the Timestamps option
1266	   value (see Section 5.4).

1268	   Furthermore, the combination, relative ordering and padding of the
1269	   TCP options described in Section 2.2 and Section 3.2 will reveal
1270	   additional clues to allow the fingerprinting of the system.

1272	8.  IANA Considerations

1274	   This document has no actions for IANA.  The described TCP options are
1275	   well known from the superseded [RFC1323].

1277	9.  References

1279	9.1.  Normative References

1281	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
1282	              RFC 793, September 1981.

1284	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1285	              November 1990.

1287	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1288	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1290	9.2.  Informative References

1292	   [Allman99]
1293	              Allman, M. and V. Paxson, "On Estimating End-to-End
1294	              Network Path Properties", Proc. ACM SIGCOMM Technical
1295	              Symposium, Cambridge, MA, September 1999,
1296	              <http://aciri.org/mallman/papers/estimation-la.pdf>.

1298	   [Ekstroem04]
1299	              Ekstroem, H. and R. Ludwig, "The Peak-Hopper: A New End-
1300	              to-End Retransmission Timer for Reliable Unicast
1301	              Transport", INFOCOM 2004 IEEE, March 2004, <http://
1302	              citeseerx.ist.psu.edu/viewdoc/
1303	              download?doi=10.1.1.76.2748&rep=rep1&type=pdf>.

1305	   [Floyd05]  Floyd, S., "[tcpm] How the RTO should be estimated with
1306	              timestamps", Message from 26.Jan.2007 to the tcpm mailing
1307	              list, August 2005, <http://www.ietf.org/mail-archive/web/
1308	              tcpm/current/msg02508.html>.

1310	   [Garlick77]
1311	              Garlick, L., Rom, R., and J. Postel, "Issues in Reliable
1312	              Host-to-Host Protocols", Proc. Second Berkeley Workshop on
1313	              Distributed Data Management and Computer Networks,
1314	              May 1977, <http://www.rfc-editor.org/ien/ien12.txt>.

1316	   [Hamming77]
1317	              Hamming, R., "Digital Filters", Prentice Hall, Englewood
1318	              Cliffs, N.J. ISBN 0-13-212571-4, 1977.

1320	   [Honda11]  Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
1321	              Handley, M., and H. Tokuda, "Is it still possible to
1322	              extend TCP?", Proc. of ACM Internet Measurement
1323	              Conference (IMC) '11, November 2011.

1325	   [Jacobson88a]
1326	              Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
1327	              '88, Stanford,  CA., August 1988,
1328	              <http://ee.lbl.gov/papers/congavoid.pdf>.

1330	   [Jacobson90a]
1331	              Jacobson, V., "4BSD Header Prediction", ACM Computer
1332	              Communication Review, April 1990.

1334	   [Jacobson90c]
1335	              Jacobson, V., "Modified TCP congestion avoidance
1336	              algorithm", Message to the end2end-interest mailing list,
1337	              April 1990,
1338	              <ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail>.

1340	   [Jain86]   Jain, R., "Divergence of Timeout Algorithms for Packet
1341	              Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and
1342	              Comm., Scottsdale, Arizona, March 1986,
1343	              <http://arxiv.org/ftp/cs/papers/9809/9809097.pdf>.

1345	   [Karn87]   Karn, P. and C. Partridge, "Estimating Round-Trip Times in
1346	              Reliable Transport Protocols", Proc. SIGCOMM '87,
1347	              August 1987.

1349	   [Kuehlewind10]
1350	              Kuehlewind, M. and B. Briscoe, "Chirping for Congestion
1351	              Control - Implementation Feasibility", November 2010,
1352	              <bobbriscoe.net/projects/netsvc_i-f/chirp_pfldnet10.pdf>.

1354	   [Kuzmanovic03]
1355	              Kuzmanovic, A. and E. Knightly, "TCP-LP: Low-Priority
1356	              Service via End-Point Congestion Control", 2003,
1357	              <www.cs.northwestern.edu/~akuzma/doc/TCP-LP-ToN.pdf>.

1359	   [Ludwig00]
1360	              Ludwig, R. and K. Sklower, "The Eifel Retransmission
1361	              Timer", ACM SIGCOMM Computer Communication Review Volume
1362	              30 Issue 3, July 2000, <http://ccr.sigcomm.org/archive/
1363	              2000/july00/LudwigFinal.pdf>.

1365	   [Martin03]
1366	              Martin, D., "[Tsvwg] RFC 1323.bis", Message to the tsvwg
1367	              mailing list, September 2003, <http://www.ietf.org/
1368	              mail-archive/web/tsvwg/current/msg04435.html>.

1370	   [Mathis08]
1371	              Mathis, M., "[tcpm] Example of 1323 window retraction
1372	              problem", Message to the tcpm mailing list, March 2008,
1373	              <http://www.ietf.org/mail-archive/web/tcpm/current/
1374	              msg03564.html>.

1376	   [Medina04]
1377	              Medina, A., Allman, M., and S. Floyd, "Measuring
1378	              Interactions Between Transport Protocols and Middleboxes",
1379	              Proc. ACM SIGCOMM/USENIX Internet Measurement Conference,
1380	              October 2004,
1381	              <http://www.icir.net/tbit/tbit-Aug2004.pdf>.

1383	   [Medina05]
1384	              Medina, A., Allman, M., and S. Floyd, "Measuring the
1385	              Evolution of Transport Protocols in the Internet", ACM
1386	              Computer Communication Review 35(2), April 2005,
1387	              <http://icir.net/floyd/papers/TCPevolution-Mar2005.pdf>.

1389	   [Oppermann13]
1390	              Oppermann, A., "[tcpm] Explanation to the relaxation of
1391	              TSopt acceptance rules", Message to the tcpm mailing list,
1392	              Jun 2013, <http://www.ietf.org/mail-archive/web/tcpm/
1393	              current/msg08001.html>.

1395	   [RFC0896]  Nagle, J., "Congestion control in IP/TCP internetworks",
1396	              RFC 896, January 1984.

1398	   [RFC1072]  Jacobson, V. and R. Braden, "TCP extensions for long-delay
1399	              paths", RFC 1072, October 1988.

1401	   [RFC1110]  McKenzie, A., "Problem with the TCP big window option",
1402	              RFC 1110, August 1989.

1404	   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
1405	              Communication Layers", STD 3, RFC 1122, October 1989.

1407	   [RFC1185]  Jacobson, V., Braden, B., and L. Zhang, "TCP Extension for
1408	              High-Speed Paths", RFC 1185, October 1990.

1410	   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
1411	              for High Performance", RFC 1323, May 1992.

1413	   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
1414	              for IP version 6", RFC 1981, August 1996.

1416	   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
1417	              Selective Acknowledgment Options", RFC 2018, October 1996.

1419	   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
1420	              Control", RFC 2581, April 1999.

1422	   [RFC2675]  Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
1423	              RFC 2675, August 1999.

1425	   [RFC2883]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
1426	              Extension to the Selective Acknowledgement (SACK) Option
1427	              for TCP", RFC 2883, July 2000.

1429	   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
1430	              for TCP", RFC 3522, April 2003.

1432	   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
1433	              for TCP", RFC 4015, February 2005.

1435	   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
1436	              Discovery", RFC 4821, March 2007.

1438	   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
1439	              Errors at High Data Rates", RFC 4963, July 2007.

1441	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
1442	              Control", RFC 5681, September 2009.

1444	   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
1445	              "Computing TCP's Retransmission Timer", RFC 6298,
1446	              June 2011.

1448	   [RFC6528]  Gont, F. and S. Bellovin, "Defending against Sequence
1449	              Number Attacks", RFC 6528, February 2012.

1451	   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
1452	              and Y. Nishida, "A Conservative Loss Recovery Algorithm
1453	              Based on Selective Acknowledgment (SACK) for TCP",
1454	              RFC 6675, August 2012.

1456	   [RFC6691]  Borman, D., "TCP Options and Maximum Segment Size (MSS)",
1457	              RFC 6691, July 2012.

1459	   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
1460	              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
1461	              December 2012.

1463	   [Watson81]
1464	              Watson, R., "Timer-based Mechanisms in Reliable Transport
1465	              Protocol Connection Management", Computer Networks, Vol.
1466	              5, 1981.

1468	   [Zhang86]  Zhang, L., "Why TCP Timers Don't Work Well", Proc. SIGCOMM
1469	              '86, Stowe, VT, August 1986.

1471	Appendix A.  Implementation Suggestions

1473	   TCP Option Layout

1475	      The following layout is recommended for sending options on non-
1476	      <SYN> segments, to achieve maximum feasible alignment of 32-bit
1477	      and 64-bit machines.

1479	                   +--------+--------+--------+--------+
1480	                   |   NOP  |  NOP   |  TSopt |   10   |
1481	                   +--------+--------+--------+--------+
1482	                   |          TSval timestamp          |
1483	                   +--------+--------+--------+--------+
1484	                   |          TSecr timestamp          |
1485	                   +--------+--------+--------+--------+
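
The layout above can be produced with a short sketch.  The option kind
values (NOP = 1, Timestamps = 8) and the length of 10 are the assigned
values; the constant and function names are hypothetical and exist only
for this example.

```python
import struct

TCPOPT_NOP = 1           # No-Operation option kind
TCPOPT_TIMESTAMP = 8     # Timestamps option kind
TCPOLEN_TIMESTAMP = 10   # Timestamps option length in bytes

# Network byte order: four single bytes, then two 32-bit unsigned values.
_LAYOUT = "!BBBBII"


def build_ts_option(tsval, tsecr):
    """Return the 12-byte NOP, NOP, TSopt block shown in the figure."""
    return struct.pack(
        _LAYOUT,
        TCPOPT_NOP,
        TCPOPT_NOP,
        TCPOPT_TIMESTAMP,
        TCPOLEN_TIMESTAMP,
        tsval & 0xFFFFFFFF,
        tsecr & 0xFFFFFFFF,
    )


def parse_ts_option(data):
    """Parse a block built by build_ts_option; return (TSval, TSecr)."""
    nop1, nop2, kind, length, tsval, tsecr = struct.unpack(_LAYOUT, data)
    expected = (TCPOPT_NOP, TCPOPT_NOP, TCPOPT_TIMESTAMP, TCPOLEN_TIMESTAMP)
    if (nop1, nop2, kind, length) != expected:
        raise ValueError("not the recommended NOP, NOP, TSopt layout")
    return tsval, tsecr


block = build_ts_option(0x11223344, 0x55667788)
```

The two leading NOPs pad the 10-byte option to 12 bytes, so TSval and
TSecr each start on a 4-byte boundary within the option area.  A real
receiver cannot rely on this layout and has to parse options generally;
parse_ts_option only handles the recommended form.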

1487	   Interaction with the TCP Urgent Pointer

1489	      The TCP Urgent pointer, like the TCP window, is a 16 bit value.
1490	      Some of the original discussion for the TCP Window Scale option
1491	      included proposals to increase the Urgent pointer to 32 bits.  As
1492	      it turns out, this is unnecessary.  There are two observations
1493	      that should be made:

1495	      (1)  With IP Version 4, the largest amount of TCP data that can be
1496	           sent in a single packet is 65495 bytes (64 KiB - 1 - size of
1497	           fixed IP and TCP headers).

1499	      (2)  Updates to the urgent pointer while the user is in "urgent
1500	           mode" are invisible to the user.

1502	      This means that if the Urgent Pointer points beyond the end of the
1503	      TCP data in the current segment, then the user will remain in
1504	      urgent mode until the next TCP segment arrives.  That segment will
1505	      update the urgent pointer to a new offset, and the user will never
1506	      have left urgent mode.

1508	      Thus, to properly implement the Urgent Pointer, the sending TCP
1509	      only has to check for overflow of the 16 bit Urgent Pointer field
1510	      before filling it in.  If it does overflow, then a value of 65535
1511	      should be inserted into the Urgent Pointer.
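
The overflow check described above amounts to a simple clamp.  In the
following minimal sketch the function name and the sample offsets are
hypothetical; urgent_offset stands for the distance from the segment's
sequence number to the urgent data as computed by the sender.

```python
URG_MAX = 0xFFFF  # largest value the 16-bit Urgent Pointer field can hold

# Largest TCP payload in one IPv4 packet: 65535 minus the fixed
# 20-byte IP header and the fixed 20-byte TCP header.
IPV4_MAX_TCP_DATA = 65535 - 20 - 20


def urgent_pointer_field(urgent_offset):
    """Return the value to place in the Urgent Pointer field."""
    if urgent_offset < 0:
        raise ValueError("urgent offset must not be negative")
    # On overflow, insert 65535; later segments carry an updated offset.
    return min(urgent_offset, URG_MAX)


small = urgent_pointer_field(100)
large = urgent_pointer_field(200_000)
```

Since no IPv4 segment can carry more than IPV4_MAX_TCP_DATA bytes, a
clamped pointer always points beyond the end of the current segment, and
the receiving user stays in urgent mode until a later segment supplies
an offset that fits.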

1513	      The same technique applies to IP Version 6, except in the case of
1514	      IPv6 Jumbograms.  When IPv6 Jumbograms are supported, [RFC2675]
1515	      requires additional steps for dealing with the Urgent Pointer;
1516	      these are described in Section 5.2 of [RFC2675].

1518	Appendix B.  Duplicates from Earlier Connection Incarnations

1520	   There are two cases to be considered: (1) a system crashing (and
1521	   losing connection state) and restarting, and (2) the same connection
1522	   being closed and reopened without a loss of host state.  These will
1523	   be described in the following two sections.

1525	B.1.  System Crash with Loss of State

1527	   TCP's quiet time of one MSL upon system startup handles the loss of
1528	   connection state in a system crash/restart.  For an explanation, see
1529	   for example "When to Keep Quiet" in the TCP protocol specification
1530	   [RFC0793].  The MSL that is required here does not depend upon the
1531	   transfer speed.  The current TCP MSL of 2 minutes seemed acceptable
1532	   as an operational compromise, when many host systems used to take
1533	   this long to boot after a crash.  Current host systems can boot
1534	   considerably faster.

1536	   The Timestamps option may be used to ease the MSL requirements (or to
1537	   provide additional security against data corruption).  If timestamps
1538	   are being used and if the timestamp clock can be guaranteed to be
1539	   monotonic over a system crash/restart, i.e., if the first value of
1540	   the sender's timestamp clock after a crash/restart can be guaranteed
1541	   to be greater than the last value before the restart, then a quiet
1542	   time is unnecessary.

1544	   To dispense totally with the quiet time would require that the host
1545	   clock be synchronized to a time source that is stable over the crash/
1546	   restart period, with an accuracy of one timestamp clock tick or
1547	   better.  We can back off from this strict requirement to take
1548	   advantage of approximate clock synchronization.  Suppose that the
1549	   clock is always re-synchronized to within N timestamp clock ticks and
1550	   that booting (extended with a quiet time, if necessary) takes more
1551	   than N ticks.  This will guarantee monotonicity of the timestamps,
1552	   which can then be used to reject old duplicates even without an
1553	   enforced MSL.

1555	B.2.  Closing and Reopening a Connection

1557	   When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT state
1558	   ties up the socket pair for 4 minutes (see Section 3.5 of [RFC0793]).
1559	   Applications built upon TCP that close one connection and open a new
1560	   one (e.g., an FTP data transfer connection using Stream mode) must
1561	   choose a new socket pair each time.  The TIME-WAIT delay serves two
1562	   different purposes:

1564	   (a)  Implement the full-duplex reliable close handshake of TCP.

1566	        The proper time to delay the final close step is not really
1567	        related to the MSL; it depends instead upon the RTO for the FIN
1568	        segments and therefore upon the RTT of the path.  (It could be
1569	        argued that the side that is sending a FIN knows what degree of
1570	        reliability it needs, and therefore it should be able to
1571	        determine the length of the TIME-WAIT delay for the FIN's
1572	        recipient.  This could be accomplished with an appropriate TCP
1573	        option in FIN segments.)

1575	        Although there is no formal upper-bound on RTT, common network
1576	        engineering practice makes an RTT greater than 1 minute very
1577	        unlikely.  Thus, the 4 minute delay in TIME-WAIT state works
1578	        satisfactorily to provide a reliable full-duplex TCP close.
1579	        Note again that this is independent of MSL enforcement and
1580	        network speed.

1582	        The TIME-WAIT state could cause an indirect performance problem
1583	        if an application needed to repeatedly close one connection and
1584	        open another at a very high frequency, since the number of
1585	        available TCP ports on a host is less than 2^16.  However, high
1586	        network speeds are not the major contributor to this problem;
1587	        the RTT is the limiting factor in how quickly connections can be
1588	        opened and closed.  Therefore, this problem will be no worse at
1589	        high transfer speeds.

1591	   (b)  Allow old duplicate segments to expire.

1593	        To replace this function of TIME-WAIT state, a mechanism would
1594	        have to operate across connections.  PAWS is defined strictly
1595	        within a single connection; the last timestamp (TS.Recent) is
1596	        kept in the connection control block, and discarded when a
1597	        connection is closed.

1599	        An additional mechanism could be added to the TCP, a per-host
1600	        cache of the last timestamp received from any connection.  This
1601	        value could then be used in the PAWS mechanism to reject old
1602	        duplicate segments from earlier incarnations of the connection,
1603	        if the timestamp clock can be guaranteed to have ticked at least
1604	        once since the old connection was open.  This would require that
1605	        the TIME-WAIT delay plus the RTT together must be at least one
1606	        tick of the sender's timestamp clock.  Such an extension is not
1607	        part of the proposal of this RFC.

1609	        Note that this is a variant on the mechanism proposed by
1610	        Garlick, Rom, and Postel [Garlick77], which required each host
1611	        to maintain connection records containing the highest sequence
1612	        numbers on every connection.  Using timestamps instead, it is
1613	        only necessary to keep one quantity per remote host, regardless
1614	        of the number of simultaneous connections to that host.

1616	Appendix C.  Summary of Notation

1618	   The following notation has been used in this document.

1620	   Options

1622	      WSopt:            TCP Window Scale option
1623	      TSopt:            TCP Timestamps option

1625	   Option Fields

1627	      shift.cnt:        Window scale byte in WSopt
1628	      TSval:            32-bit Timestamp Value field in TSopt
1629	      TSecr:            32-bit Timestamp Reply field in TSopt

1631	   Option Fields in Current Segment

1633	      SEG.TSval:        TSval field from TSopt in current segment
1634	      SEG.TSecr:        TSecr field from TSopt in current segment
1635	      SEG.WSopt:        8-bit value in WSopt

1637	   Clock Values

1639	      my.TSclock:       System wide source of 32-bit timestamp values
1640	      my.TSclock.rate:  Period of my.TSclock (1 ms to 1 sec)
1641	      Snd.TSoffset:     An offset for randomizing Snd.TSclock
1642	      Snd.TSclock:      my.TSclock + Snd.TSoffset

1644	   Per-Connection State Variables

1646	      TS.Recent:        Latest received Timestamp
1647	      Last.ACK.sent:    Last ACK field sent
1648	      Snd.TS.OK:        1-bit flag
1649	      Snd.WS.OK:        1-bit flag
1650	      Rcv.Wind.Shift:   Receive window scale exponent
1651	      Snd.Wind.Shift:   Send window scale exponent
1652	      Start.Time:       Snd.TSclock value when segment being timed was
1653	                        sent (used by pre-1323 code).

1655	   Procedure

1657	      Update_SRTT(m)    Procedure to update the smoothed RTT and RTT
1658	                        variance estimates, using the rules of
1659	                        [Jacobson88a], given m, a new RTT measurement

1661	Appendix D.  Event Processing Summary

1663	   OPEN Call

1665	      ...

1667	      An initial send sequence number (ISS) is selected.  Send a <SYN>
1668	      segment of the form:

1670	        <SEQ=ISS><CTL=SYN><TSval=Snd.TSclock><WSopt=Rcv.Wind.Shift>

1672	      ...

1674	   SEND Call

1676	      CLOSED STATE (i.e., TCB does not exist)

1678	         ...

1680	      LISTEN STATE

1682	         If the foreign socket is specified, then change the connection
1683	         from passive to active, select an ISS.  Send a <SYN> segment
1684	         containing the options: <TSval=Snd.TSclock> and
1685	         <WSopt=Rcv.Wind.Shift>.  Set SND.UNA to ISS, SND.NXT to ISS+1.
1686	         Enter SYN-SENT state. ...

1688	      SYN-SENT STATE
1689	      SYN-RECEIVED STATE

1691	         ...

1693	      ESTABLISHED STATE
1694	      CLOSE-WAIT STATE

1696	         Segmentize the buffer and send it with a piggybacked
1697	         acknowledgment (acknowledgment value = RCV.NXT). ...

1699	         If the urgent flag is set ...

1701	         If the Snd.TS.OK flag is set, then include the TCP Timestamps
1702	         option <TSval=Snd.TSclock,TSecr=TS.Recent> in each data
1703	         segment.

1705	         Scale the receive window for transmission in the segment
1706	         header:

1708	                   SEG.WND = (RCV.WND >> Rcv.Wind.Shift).
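
The scaling step, together with the matching rescale that the peer
applies on receipt (TrueWindow = SEG.WND << Snd.Wind.Shift, where the
peer's Snd.Wind.Shift equals the sender's Rcv.Wind.Shift), can be
sketched as follows.  The function names and the sample values are
hypothetical, and the clamp to 65535 is a defensive assumption of this
example rather than a rule stated here.

```python
def scale_for_header(rcv_wnd, rcv_wind_shift):
    """SEG.WND = RCV.WND >> Rcv.Wind.Shift, for the 16-bit window field."""
    # Defensive clamp (an assumption of this sketch): the field is 16 bits.
    return min(rcv_wnd >> rcv_wind_shift, 0xFFFF)


def true_window(seg_wnd, snd_wind_shift):
    """TrueWindow = SEG.WND << Snd.Wind.Shift, as computed by the receiver."""
    return seg_wnd << snd_wind_shift


rcv_wnd = 1_000_000  # hypothetical receive window in bytes
shift = 7            # hypothetical negotiated shift count

seg_wnd = scale_for_header(rcv_wnd, shift)
recovered = true_window(seg_wnd, shift)
lost = rcv_wnd - recovered
print(seg_wnd, recovered, lost)
```

The right shift discards the low-order bits, so the window seen by the
peer can be smaller than RCV.WND by up to 2^shift - 1 bytes; in this
example 64 bytes are lost to rounding.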

1710	   SEGMENT ARRIVES

1712	      ...

1714	      If the state is LISTEN then

1716	         first check for an RST

1718	            ...

1720	         second check for an ACK

1722	            ...

1724	         third check for a SYN

1726	            if the SYN bit is set, check the security.  If the ...

1728	               ...

1730	            if the SEG.PRC is less than the TCB.PRC then continue.

1732	            Check for a Window Scale option (WSopt); if one is found,
1733	            save SEG.WSopt in Snd.Wind.Shift and set Snd.WS.OK flag on.
1734	            Otherwise, set both Snd.Wind.Shift and Rcv.Wind.Shift to
1735	            zero and clear Snd.WS.OK flag.

1737	            Check for a TSopt option; if one is found, save SEG.TSval in
1738	            the variable TS.Recent and turn on the Snd.TS.OK bit.

1740	            Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ and any
1741	            other control or text should be queued for processing later.
1742	            ISS should be selected and a <SYN> segment sent of the form:

1744	                    <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

1746	            If the Snd.WS.OK bit is on, include a WSopt option
1747	            <WSopt=Rcv.Wind.Shift> in this segment.  If the Snd.TS.OK
1748	            bit is on, include a TSopt <TSval=Snd.TSclock,
1749	            TSecr=TS.Recent> in this segment.  Last.ACK.sent is set to
1750	            RCV.NXT.

1752	            SND.NXT is set to ISS+1 and SND.UNA to ISS.  The connection
1753	            state should be changed to SYN-RECEIVED.  Note that any
1754	            other incoming control or data (combined with SYN) will be
1755	            processed in the SYN-RECEIVED state, but processing of SYN
1756	            and ACK should not be repeated.  If the listen was not fully
1757	            specified (i.e., the foreign socket was not fully
1758	            specified), then the unspecified fields should be filled in
1759	            now.

1761	         fourth other text or control

1763	            ...

1765	      If the state is SYN-SENT then

1767	         first check the ACK bit

1769	            ...

1771	         ...

1773	         fourth check the SYN bit

1775	            ...

1777	            If the SYN bit is on and the security/compartment and
1778	            precedence are acceptable then, RCV.NXT is set to SEG.SEQ+1,
1779	            IRS is set to SEG.SEQ, and any acknowledgements on the
1780	            retransmission queue which are thereby acknowledged should
1781	            be removed.

1783	            Check for a Window Scale option (WSopt); if it is found,
1784	            save SEG.WSopt in Snd.Wind.Shift; otherwise, set both
1785	            Snd.Wind.Shift and Rcv.Wind.Shift to zero.

1787	            Check for a TSopt option; if one is found, save SEG.TSval in
1788	            variable TS.Recent and turn on the Snd.TS.OK bit in the
1789	            connection control block.  If the ACK bit is set, use
1790	            Snd.TSclock - SEG.TSecr as the initial RTT estimate.

1792	            If SND.UNA > ISS (our <SYN> has been ACKed), change the
1793	            connection state to ESTABLISHED, form an <ACK> segment:

1795	                    <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

1797	            and send it.  If the Snd.TS.OK bit is on, include a TSopt
1798	            option <TSval=Snd.TSclock,TSecr=TS.Recent> in this <ACK>
1799	            segment.  Last.ACK.sent is set to RCV.NXT.

1801	            Data or controls which were queued for transmission may be
1802	            included.  If there are other controls or text in the
1803	            segment then continue processing at the sixth step below
1804	            where the URG bit is checked, otherwise return.

1806	            Otherwise enter SYN-RECEIVED, form a <SYN,ACK> segment:

1808	                    <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

1810	            and send it.  If the Snd.TS.OK bit is on, include a TSopt
1811	            option <TSval=Snd.TSclock,TSecr=TS.Recent> in this segment.
1812	            If the Snd.WS.OK bit is on, include a WSopt option
1813	            <WSopt=Rcv.Wind.Shift> in this segment.  Last.ACK.sent is
1814	            set to RCV.NXT.

1816	            If there are other controls or text in the segment, queue
1817	            them for processing after the ESTABLISHED state has been
1818	            reached, return.

1820	         fifth, if neither of the SYN or RST bits is set then drop the
1821	         segment and return.

1823	      Otherwise,

1825	      First, check sequence number

1827	         SYN-RECEIVED STATE
1828	         ESTABLISHED STATE
1829	         FIN-WAIT-1 STATE
1830	         FIN-WAIT-2 STATE
1831	         CLOSE-WAIT STATE
1832	         CLOSING STATE
1833	         LAST-ACK STATE
1834	         TIME-WAIT STATE

1836	            Segments are processed in sequence.  Initial tests on
1837	            arrival are used to discard old duplicates, but further
1838	            processing is done in SEG.SEQ order.  If a segment's
1839	            contents straddle the boundary between old and new, only the
1840	            new parts should be processed.

1842	            Rescale the received window field:

1844	                  TrueWindow = SEG.WND << Snd.Wind.Shift,

1846	            and use "TrueWindow" in place of SEG.WND in the following
1847	            steps.

1849	            Check whether the segment contains a Timestamps option and
1850	            bit Snd.TS.OK is on.  If so:

1852	               If SEG.TSval < TS.Recent and the RST bit is off, then
1853	               test whether the connection has been idle less than 24
1854	               days; if all are true, the segment is not acceptable;
1855	               follow the steps below for an unacceptable segment.

1857	               If SEG.SEQ is less than or equal to Last.ACK.sent, then
1858	               save SEG.TSval in variable TS.Recent.

1860	            There are four cases for the acceptability test for an
1861	            incoming segment:

1863	               ...

1865	            If an incoming segment is not acceptable, an acknowledgment
1866	            should be sent in reply (unless the RST bit is set, if so
1867	            drop the segment and return):

1869	                    <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

1871	            If the Snd.TS.OK bit is on, include the Timestamps option
1872	            <TSval=Snd.TSclock,TSecr=TS.Recent> in this <ACK> segment.
1873	            Set Last.ACK.sent to SEG.ACK of the acknowledgment and
1874	            send the <ACK> segment.  After sending the
1875	            acknowledgment, drop the unacceptable segment and
1876	            return.

1878	      ...

1880	      fifth check the ACK field.

1882	         if the ACK bit is off drop the segment and return.

1884	         if the ACK bit is on

1886	            ...

1888	            ESTABLISHED STATE

1890	               If SND.UNA < SEG.ACK <= SND.NXT then, set SND.UNA <-
1891	               SEG.ACK.  Also compute a new estimate of round-trip time.
1892	               If Snd.TS.OK bit is on, use Snd.TSclock - SEG.TSecr;
1893	               otherwise use the elapsed time since the first segment in
1894	               the retransmission queue was sent.  Any segments on the
1895	               retransmission queue which are thereby entirely
1896	               acknowledged...

1898	      ...

1900	      Seventh, process the segment text.

1902	         ESTABLISHED STATE
1903	         FIN-WAIT-1 STATE
1904	         FIN-WAIT-2 STATE
1905	            ...

1907	            Send an acknowledgment of the form:

1909	                    <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

1911	            If the Snd.TS.OK bit is on, include Timestamps option
1912	            <TSval=Snd.TSclock,TSecr=TS.Recent> in this <ACK> segment.
1913	            Set Last.ACK.sent to SEG.ACK of the acknowledgment, and send
1914	            it.  This acknowledgment should be piggy-backed on a segment
1915	            being transmitted if possible without incurring undue delay.

1917	            ...
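
   The Timestamps-related checks in the "check sequence number" step
   above can be sketched in Python as follows.  This is an illustrative
   sketch only: the function names (ts_lt, seq_leq, paws_rejects,
   update_ts_recent) and the use of seconds for the idle time are
   assumptions made for this example, and both timestamps and sequence
   numbers are assumed to be compared modulo 2^32.

```python
MOD32 = 1 << 32
HALF32 = 1 << 31
PAWS_IDLE_LIMIT_S = 24 * 24 * 3600  # 24 days, expressed in seconds (assumed unit)


def ts_lt(a: int, b: int) -> bool:
    """True if 32-bit value a is strictly before b in modular arithmetic."""
    return 0 < (b - a) % MOD32 < HALF32


def seq_leq(a: int, b: int) -> bool:
    """True if 32-bit sequence number a is at or before b (modular)."""
    return (b - a) % MOD32 < HALF32


def paws_rejects(seg_tsval: int, ts_recent: int, rst: bool,
                 idle_seconds: float) -> bool:
    """Segment is unacceptable only if all three conditions hold."""
    return (ts_lt(seg_tsval, ts_recent)
            and not rst
            and idle_seconds < PAWS_IDLE_LIMIT_S)


def update_ts_recent(seg_tsval: int, seg_seq: int, ts_recent: int,
                     last_ack_sent: int) -> int:
    """Return the new TS.Recent for a segment that passed the PAWS test."""
    if seq_leq(seg_seq, last_ack_sent):
        return seg_tsval
    return ts_recent
```

   The modular comparison means that a small TSval arriving just after
   the timestamp clock wraps is treated as newer than a TS.Recent near
   2^32, so it is not rejected.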

1919	Appendix E.  Timestamps Edge Cases

1921	   While the rules laid out for when to calculate RTTM produce the
1922	   correct results most of the time, there are some edge cases where an
1923	   incorrect RTTM can be calculated.  All of these situations involve
1924	   the loss of segments.  It is felt that these scenarios are rare, and
1925	   that if they should happen, they will cause a single RTTM measurement
1926	   to be inflated, which mitigates its effects on RTO calculations.

1928	   [Martin03] cites two similar cases when the returning <ACK> is lost,
1929	   and before the retransmission timer fires, another returning <ACK>
1930	   segment arrives, which acknowledges the data.  In this case, the RTTM
1931	   calculated will be inflated:

1933	           clock
1934	             tc=1   <A, TSval=1> ------------------->

1936	             tc=2   (lost) <---- <ACK(A), TSecr=1, win=n>
1937	                 (RTTM would have been 1)

1939	                    (receive window opens, window update is sent)
1940	             tc=5        <---- <ACK(A), TSecr=1, win=m>
1941	                    (RTTM is calculated at 4)

1943	   One thing to note about this situation is that it is somewhat bounded
1944	   by RTO + RTT, limiting how far off the RTTM calculation will be.
1945	   While more complex scenarios can be constructed that produce larger
1946	   inflations (e.g., retransmissions are lost), those scenarios involve
1947	   multiple segment losses, and the connection will have other more
1948	   serious operational problems than using an inflated RTTM in the RTO
1949	   calculation.
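
   The arithmetic in the diagram above can be replayed in a few lines
   of Python.  The helper name rttm is hypothetical, and the 32-bit
   modular subtraction is an assumption consistent with a 32-bit
   timestamp clock.

```python
MOD32 = 1 << 32


def rttm(snd_tsclock: int, seg_tsecr: int) -> int:
    """RTT measurement in timestamp clock ticks: Snd.TSclock - SEG.TSecr."""
    return (snd_tsclock - seg_tsecr) % MOD32


# <A, TSval=1> is sent at tc=1.
lost_ack_sample = rttm(2, 1)       # the ACK at tc=2 is lost, so this is never seen
window_update_sample = rttm(5, 1)  # the window update at tc=5 still echoes TSecr=1
inflation = window_update_sample - lost_ack_sample
```

   With the values from the diagram, the sample that is actually taken
   is 4 ticks rather than 1, an inflation of 3 ticks.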

1951	Appendix F.  Window Retraction Example

1953	   Consider an established TCP connection using a scale factor of 128,
1954	   Snd.Wind.Shift=7 and Rcv.Wind.Shift=7, that is running with a very
1955	   small window because the receiver is bottlenecked and both ends are
1956	   doing small reads and writes.

1958	   Consider the ACKs coming back:

1960	   SEG.ACK  SEG.WIN computed SND.WIN   receiver's actual window
1961	   1000     2       1256               1300

1963	   The sender writes 40 bytes and receiver ACKs:

1965	   1040     2       1296               1300

1967	   The sender writes 5 additional bytes and the receiver has a problem.
1968	   Two choices:

1970	   1045     2       1301               1300   - BEYOND BUFFER

1972	   1045     1       1173               1300   - RETRACTED WINDOW

1974	   This is a general problem and can happen any time the sender does a
1975	   write which is smaller than the window scale factor.

1977	   In most stacks it is at least partially obscured when the window size
1978	   is larger than some small number of segments because the stacks
1979	   prefer to announce windows that are an integral number of segments,
1980	   rounded up to the next scale factor.  This plus silly window
1981	   suppression tends to cause less frequent, larger window updates.  If
1982	   the window was rounded down to a segment size there is more
1983	   opportunity to advance the window, the BEYOND BUFFER case above,
1984	   rather than retracting it.
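
   The numbers in the table above can be reproduced with the following
   Python sketch.  The function names are hypothetical, and the sketch
   assumes that the "computed SND.WIN" column is the right window edge
   as seen by the sender, that is, SEG.ACK plus the scaled SEG.WIN.

```python
def right_edge(seg_ack: int, seg_win: int, shift: int) -> int:
    """Right window edge the sender computes from an ACK."""
    return seg_ack + (seg_win << shift)


def advertisement_choices(seg_ack: int, buffer_edge: int, shift: int):
    """SEG.WIN values obtained by rounding the free space down or up."""
    free = buffer_edge - seg_ack
    rounded_down = free >> shift
    rounded_up = (free + (1 << shift) - 1) >> shift
    return rounded_down, rounded_up


SHIFT = 7            # scale factor of 128
BUFFER_EDGE = 1300   # receiver's actual right edge

down, up = advertisement_choices(1045, BUFFER_EDGE, SHIFT)
retracted = right_edge(1045, down, SHIFT)   # below the 1296 announced before
beyond = right_edge(1045, up, SHIFT)        # past the receiver's buffer
```

   After the 5-byte write, 255 bytes are free, which is not a multiple
   of 128, so neither rounding direction reproduces the true edge.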

1986	Appendix G.  RTO calculation modification

1988	   Taking multiple RTT samples per window would shorten the history
1989	   calculated by the RTO mechanism in [RFC6298], and the below algorithm
1990	   aims to maintain a similar history as originally intended by
1991	   [RFC6298].

1993	   It is roughly known how many samples a congestion window's worth of
1994	   data will yield, not accounting for ACK compression and ACK losses.
1995	   Such events will result in more history of the path being reflected
1996	   in the final value for RTO, and are uncritical.  This modification
1997	   will ensure that a similar amount of time is taken into account for
1998	   the RTO estimation, regardless of how many samples are taken per
1999	   window:

2001	      ExpectedSamples = ceiling(FlightSize / (SMSS * 2))

2003	      alpha' = alpha / ExpectedSamples

2005	      beta' = beta / ExpectedSamples

2007	   Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".

2009	   In the algorithm of [RFC6298], use alpha' and beta' in place of
2010	   alpha and beta:

2012	      RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|

2014	      SRTT <- (1 - alpha') * SRTT + alpha' * R'

2016	      (for each sample R')
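
   A minimal Python sketch of this modification is shown below.  It
   assumes the [RFC6298] gains alpha = 1/8 and beta = 1/4; the function
   names and the guard that keeps ExpectedSamples at 1 or more when
   FlightSize is zero are assumptions of this example and are not taken
   from the text above.

```python
ALPHA = 1 / 8   # RFC 6298 gain for SRTT
BETA = 1 / 4    # RFC 6298 gain for RTTVAR


def expected_samples(flight_size: int, smss: int) -> int:
    """ceiling(FlightSize / (SMSS * 2)), clamped to at least 1 (assumption)."""
    per_sample = smss * 2  # one ACK per two segments with delayed ACKs
    return max(1, (flight_size + per_sample - 1) // per_sample)


def update_rtt(srtt: float, rttvar: float, r: float,
               flight_size: int, smss: int):
    """Apply one RTT sample r using the scaled gains alpha' and beta'."""
    n = expected_samples(flight_size, smss)
    alpha_p = ALPHA / n
    beta_p = BETA / n
    # RTTVAR is updated first, using the SRTT from before this sample.
    rttvar = (1 - beta_p) * rttvar + beta_p * abs(srtt - r)
    srtt = (1 - alpha_p) * srtt + alpha_p * r
    return srtt, rttvar
```

   With ten segments in flight, ExpectedSamples is 5, so each sample is
   weighted one fifth as heavily as it would be with a single sample
   per window.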

2018	Appendix H.  Changes from RFC 1323

2020	   Several important updates and clarifications to the specification in
2021	   RFC 1323 are made in this document.  The technical changes are
2022	   summarized below:

2024	   (a)  A wrong reference to SND.WND was corrected to SEG.WND in
2025	        Section 2.3.

2027	   (b)  Section 2.4 was added describing the unavoidable window
2028	        retraction issue, and explicitly describing the mitigation steps
2029	        necessary.

2031	   (c)  In Section 3.2 the description of how the Timestamps option
2032	        negotiation is to be performed was updated with RFC2119
2033	        wording.  Further, a number of paragraphs were added to clarify
2034	        the expected behavior of a compliant implementation using
2035	        TSopt, as RFC1323 left room for interpretation, e.g.,
2036	        potential late enablement of TSopt.

2038	   (d)  The description of which TSecr values can be used to update the
2039	        measured RTT has been clarified.  Specifically, with timestamps,
2040	        the Karn algorithm [Karn87] is disabled.  The Karn algorithm
2041	        disables all RTT measurements during retransmission, since it is
2042	        ambiguous whether the <ACK> is for the original segment, or the
2043	        retransmitted segment.  With timestamps, that ambiguity is
2044	        removed since the TSecr in the <ACK> will contain the TSval from
2045	        whichever data segment made it to the destination.

2047	   (e)  RTTM update processing explicitly excludes segments not updating
2048	        SND.UNA.  The original text could be interpreted to allow taking
2049	        RTT samples when SACK acknowledges some new, non-continuous
2050	        data.

2052	   (f)  In RFC1323, section 3.4, step (2) of the algorithm to control
2053	        which timestamp is echoed was incorrect in two regards:

2055	        (1)  It failed to update TS.recent for a retransmitted segment
2056	             that resulted from a lost <ACK>.

2058	        (2)  It failed if SEG.LEN = 0.

2060	        In the new algorithm, the case of SEG.TSval >= TS.recent is
2061	        included for consistency with the PAWS test.

2063	   (g)  It is now recommended that the Timestamps option is included in
2064	        <RST> segments if the incoming segment contained a Timestamps
2065	        option.

2067	   (h)  <RST> segments are explicitly excluded from PAWS processing.

2069	   (i)  Added text to clarify the precedence between regular TCP
2070	        [RFC0793] and this document's Timestamps option/PAWS processing.
2071	        Discussion about combined acceptability checks is ongoing.

2073	   (j)  Snd.TSoffset and Snd.TSclock variables have been added.
2074	        Snd.TSclock is the sum of my.TSclock and Snd.TSoffset.  This
2075	        allows the starting points for timestamp values to be randomized
2076	        on a per-connection basis.  Setting Snd.TSoffset to zero yields
2077	        the same results as [RFC1323].  Text was added to guide
2078	        implementors to the proper selection of these offsets, as
2079	        entirely random offsets for each new connection will conflict
2080	        with PAWS.

2082	   (k)  Appendix A has been expanded with information about the TCP
2083	        Urgent Pointer.  An earlier revision contained text around the
2084	        TCP MSS option, which was split off into [RFC6691].

2086	   (l)  One correction was made to the Event Processing Summary in
2087	        Appendix D.  In SEND CALL/ESTABLISHED STATE, RCV.WND is used to
2088	        fill in the SEG.WND value, not SND.WND.

2090	   (m)  Appendix G was added to exemplify how an RTO calculation might
2091	        be updated to properly take the much higher RTT sampling
2092	        frequency enabled by the Timestamps option into account.

2094	   Editorial changes to the document that don't impact the
2095	   implementation or function of the mechanisms described in this
2096	   document include:

2098	   (a)  Removed much of the discussion in Section 1 to streamline the
2099	        document.  However, detailed examples and discussions in
2100	        Section 2, Section 3 and Section 5 are kept as guidelines for
2101	        implementers.

2103	   (b)  Added short text that the use of WS increases the chances of
2104	        sequence number wrap, thus the PAWS mechanism is required in
2105	        certain environments.

2107	   (c)  Removed references to "new" options, as the options were
2108	        introduced in [RFC1323] already.  Changed the text in
2109	        Section 1.3 to specifically address TS and WS options.

2111	   (d)  Section 1.4 was added for [RFC2119] wording.  Normative text was
2112	        updated with the appropriate phrases.

2114	   (e)  Added < > brackets to mark specific types of segments, and
2115	        replaced most occurrences of "packet" with "segment", where TCP
2116	        segments are referred to.

2118	   (f)  Updated the text in Section 3 to take into account what has been
2119	        learned since [RFC1323].

2121	   (g)  Removed the list of changes between [RFC1323] and prior
2122	        versions.  These changes are mentioned in Appendix C of
2123	        [RFC1323].

2125	   (h)  Moved the "Changes from RFC 1323" appendix to the end of the
2126	        appendices for easier lookup.  In addition, the entries were
2127	        split into a technical and an editorial part, and sorted to
2128	        roughly correspond with the sections in the text where they
2129	        apply.

2131	Authors' Addresses

2133	   David Borman
2134	   Quantum Corporation
2135	   Mendota Heights  MN 55120
2136	   USA

2138	   Email: david.borman@quantum.com

2140	   Bob Braden
2141	   University of Southern California
2142	   4676 Admiralty Way
2143	   Marina del Rey  CA 90292
2144	   USA

2146	   Email: braden@isi.edu

2148	   Van Jacobson
2149	   Google, Inc.
2150	   1600 Amphitheatre Parkway
2151	   Mountain View  CA 94043
2152	   USA

2154	   Email: vanj@google.com

2156	   Richard Scheffenegger (editor)
2157	   NetApp, Inc.
2158	   Am Euro Platz 2
2159	   Vienna,   1120
2160	   Austria

2162	   Email: rs@netapp.com