idnits 2.17.1 

draft-ietf-tcpm-tcp-security-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 4 instances of too long lines in the document, the longest one
     being 8 characters in excess of 72.

  == There are 4 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 430: '...   TCP SHOULD randomize its ephemeral ...'
     RFC 2119 keyword, line 432: '...gest posible port range SHOULD be used...'
     RFC 2119 keyword, line 440: '...   TCP MUST NOT allocate port number 0...'
     RFC 2119 keyword, line 442: '...ion Port, a RST segment SHOULD be sent...'
     RFC 2119 keyword, line 476: '...   TCP MUST be able to grecefully hand...'
     (94 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 21, 2011) is 4844 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Clark' is mentioned on line 187, but not defined

  -- Looks like a reference, but probably isn't: '1988' on line 187

  == Missing Reference: 'Bellovin' is mentioned on line 4316, but not defined

  -- Looks like a reference, but probably isn't: '1989' on line 4208

  == Missing Reference: 'NISCC' is mentioned on line 3947, but not defined

  -- Looks like a reference, but probably isn't: '2005' on line 219

  == Missing Reference: 'Silbersack' is mentioned on line 219, but not defined

  == Missing Reference: 'Postel' is mentioned on line 4637, but not defined

  == Missing Reference: '1981c' is mentioned on line 4637, but not defined

  == Missing Reference: 'Braden' is mentioned on line 4208, but not defined

  == Missing Reference: 'Jones' is mentioned on line 459, but not defined

  -- Looks like a reference, but probably isn't: '2003' on line 3092

  == Missing Reference: 'CERT' is mentioned on line 2987, but not defined

  -- Looks like a reference, but probably isn't: '1996' on line 4612

  == Missing Reference: 'Meltman' is mentioned on line 488, but not defined

  -- Looks like a reference, but probably isn't: '1997' on line 1906

  == Missing Reference: 'Morris' is mentioned on line 567, but not defined

  -- Looks like a reference, but probably isn't: '1985' on line 567

  == Missing Reference: 'Shimomura' is mentioned on line 1978, but not defined

  -- Looks like a reference, but probably isn't: '1995' on line 1978

  -- Looks like a reference, but probably isn't: '2001' on line 4275

  == Missing Reference: 'US-CERT' is mentioned on line 4013, but not defined

  == Missing Reference: 'Zalewski' is mentioned on line 4719, but not defined

  == Missing Reference: '2001a' is mentioned on line 590, but not defined

  -- Looks like a reference, but probably isn't: '2002' on line 2853

  -- Looks like a reference, but probably isn't: '1987' on line 681

  -- Looks like a reference, but probably isn't: '1992' on line 681

  == Missing Reference: '2001b' is mentioned on line 754, but not defined

  == Missing Reference: 'Watson' is mentioned on line 3947, but not defined

  -- Looks like a reference, but probably isn't: '2004' on line 3947

  == Missing Reference: 'Heffner' is mentioned on line 2853, but not defined

  == Missing Reference: 'Barisani' is mentioned on line 1038, but not defined

  -- Looks like a reference, but probably isn't: '2006' on line 4273

  == Missing Reference: 'Ed3f' is mentioned on line 1054, but not defined

  == Missing Reference: '2005d' is mentioned on line 1750, but not defined

  -- Looks like a reference, but probably isn't: '2008' on line 4777

  == Missing Reference: 'Myst' is mentioned on line 1091, but not defined

  == Missing Reference: 'IANA' is mentioned on line 1098, but not defined

  -- Looks like a reference, but probably isn't: '2007' on line 4682

  == Missing Reference: 'Hnes' is mentioned on line 1103, but not defined

  -- Looks like a reference, but probably isn't: '1994' on line 1393

  == Missing Reference: 'CCSDS' is mentioned on line 1168, but not defined

  == Missing Reference: 'Stevens' is mentioned on line 1393, but not defined

  == Missing Reference: 'Reed' is mentioned on line 1408, but not defined

  == Missing Reference: '1981a' is mentioned on line 1422, but not defined

  == Missing Reference: 'Heffernan' is mentioned on line 1608, but not defined

  -- Looks like a reference, but probably isn't: '1998' on line 4367

  == Missing Reference: 'Welzl' is mentioned on line 1653, but not defined

  == Missing Reference: '2005c' is mentioned on line 1744, but not defined

  == Missing Reference: 'Gont' is mentioned on line 1889, but not defined

  == Missing Reference: '2008b' is mentioned on line 1889, but not defined

  == Missing Reference: 'Borman' is mentioned on line 1906, but not defined

  == Missing Reference: 'Eddy' is mentioned on line 1906, but not defined

  == Missing Reference: 'Lemon' is mentioned on line 1910, but not defined

  == Missing Reference: 'Bernstein' is mentioned on line 1996, but not defined

  == Missing Reference: 'Zquete' is mentioned on line 1986, but not defined

  == Missing Reference: 'CPNI' is mentioned on line 4795, but not defined

  -- Looks like a reference, but probably isn't: '2000' on line 2582

  == Missing Reference: '2003b' is mentioned on line 4719, but not defined

  == Missing Reference: 'Linux' is mentioned on line 2306, but not defined

  == Missing Reference: 'Shalunov' is mentioned on line 2582, but not defined

  == Missing Reference: '2004a' is mentioned on line 2726, but not defined

  == Missing Reference: 'CORE' is mentioned on line 2987, but not defined

  == Missing Reference: 'Allman' is mentioned on line 3092, but not defined

  == Missing Reference: '2005a' is mentioned on line 3158, but not defined

  == Missing Reference: 'Touch' is mentioned on line 3667, but not defined

  == Missing Reference: 'Ostermann' is mentioned on line 3328, but not defined

  == Missing Reference: 'PCNWG' is mentioned on line 3696, but not defined

  -- Looks like a reference, but probably isn't: '2009' on line 4795

  == Missing Reference: '2003a' is mentioned on line 4013, but not defined

  == Missing Reference: 'Fyodor' is mentioned on line 4585, but not defined

  == Missing Reference: '2006b' is mentioned on line 4585, but not defined

  == Missing Reference: 'TBIT' is mentioned on line 4268, but not defined

  == Missing Reference: '2006a' is mentioned on line 4272, but not defined

  == Missing Reference: 'Miller' is mentioned on line 4273, but not defined

  == Missing Reference: 'Beck' is mentioned on line 4275, but not defined

  == Missing Reference: 'Rowland' is mentioned on line 4455, but not defined

  == Missing Reference: 'Zander' is mentioned on line 4458, but not defined

  == Missing Reference: 'Maimon' is mentioned on line 4612, but not defined

  == Unused Reference: 'RFC6093' is defined on line 5308, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-04) exists of
     draft-ietf-tcpm-tcp-timestamps-03

  ** Obsolete normative reference: RFC 6093 (Obsoleted by RFC 9293)


     Summary: 4 errors (**), 0 flaws (~~), 64 warnings (==), 21 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TCP Maintenance and Minor                                        F. Gont
3	Extensions (tcpm)                                                UK CPNI
4	Internet-Draft                                          January 21, 2011
5	Intended status: BCP
6	Expires: July 25, 2011

8	     Security Assessment of the Transmission Control Protocol (TCP)
9	                  draft-ietf-tcpm-tcp-security-02.txt

11	Abstract

13	   This document contains a security assessment of the specifications of
14	   the Transmission Control Protocol (TCP), and of a number of
15	   mechanisms and policies in use by popular TCP implementations.
16	   Additionally, it contains best current practices for hardening a TCP
17	   implementation.

19	Status of this Memo

21	   This Internet-Draft is submitted to IETF in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at http://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on July 25, 2011.

36	Copyright Notice

38	   Copyright (c) 2011 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents
43	   (http://trustee.ietf.org/license-info) in effect on the date of
44	   publication of this document.  Please review these documents
45	   carefully, as they describe your rights and restrictions with respect
46	   to this document.  Code Components extracted from this document must
47	   include Simplified BSD License text as described in Section 4.e of
48	   the Trust Legal Provisions and are provided without warranty as
49	   described in the Simplified BSD License.

51	Table of Contents

53	   1.  Preface . . . . . . . . . . . . . . . . . . . . . . . . . . .   5
54	     1.1.  Introduction  . . . . . . . . . . . . . . . . . . . . . .   5
55	     1.2.  Scope of this document  . . . . . . . . . . . . . . . . .   6
56	     1.3.  Organization of this document . . . . . . . . . . . . . .   8
57	   2.  The Transmission Control Protocol . . . . . . . . . . . . . .   8
58	   3.  TCP header fields . . . . . . . . . . . . . . . . . . . . . .   9
59	     3.1.  Source Port and Destination Port  . . . . . . . . . . . .  10
60	     3.2.  Sequence number . . . . . . . . . . . . . . . . . . . . .  12
61	     3.3.  Acknowledgement Number  . . . . . . . . . . . . . . . . .  14
62	     3.4.  Data Offset . . . . . . . . . . . . . . . . . . . . . . .  15
63	     3.5.  Control bits  . . . . . . . . . . . . . . . . . . . . . .  15
64	       3.5.1.  Reserved (four bits)  . . . . . . . . . . . . . . . .  15
65	       3.5.2.  CWR (Congestion Window Reduced) . . . . . . . . . . .  16
66	       3.5.3.  ECE (ECN-Echo)  . . . . . . . . . . . . . . . . . . .  16
67	       3.5.4.  URG . . . . . . . . . . . . . . . . . . . . . . . . .  17
68	       3.5.5.  ACK . . . . . . . . . . . . . . . . . . . . . . . . .  17
69	       3.5.6.  PSH . . . . . . . . . . . . . . . . . . . . . . . . .  17
70	       3.5.7.  RST . . . . . . . . . . . . . . . . . . . . . . . . .  19
71	       3.5.8.  SYN . . . . . . . . . . . . . . . . . . . . . . . . .  19
72	       3.5.9.  FIN . . . . . . . . . . . . . . . . . . . . . . . . .  20
73	     3.6.  Window  . . . . . . . . . . . . . . . . . . . . . . . . .  20
74	     3.7.  Checksum  . . . . . . . . . . . . . . . . . . . . . . . .  22
75	     3.8.  Urgent pointer  . . . . . . . . . . . . . . . . . . . . .  23
76	     3.9.  Options . . . . . . . . . . . . . . . . . . . . . . . . .  24
77	     3.10. Padding . . . . . . . . . . . . . . . . . . . . . . . . .  28
78	     3.11. Data  . . . . . . . . . . . . . . . . . . . . . . . . . .  28
79	   4.  Common TCP Options  . . . . . . . . . . . . . . . . . . . . .  29
80	     4.1.  End of Option List (Kind = 0) . . . . . . . . . . . . . .  29
81	     4.2.  No Operation (Kind = 1) . . . . . . . . . . . . . . . . .  29
82	     4.3.  Maximum Segment Size (Kind = 2) . . . . . . . . . . . . .  29
83	     4.4.  Selective Acknowledgement Option  . . . . . . . . . . . .  32
84	       4.4.1.  SACK-permitted Option (Kind = 4)  . . . . . . . . . .  32
85	       4.4.2.  SACK Option (Kind = 5)  . . . . . . . . . . . . . . .  33
86	     4.5.  MD5 Option (Kind=19)  . . . . . . . . . . . . . . . . . .  35
87	     4.6.  Window scale option (Kind = 3)  . . . . . . . . . . . . .  36
88	     4.7.  Timestamps option (Kind = 8)  . . . . . . . . . . . . . .  37
89	       4.7.1.  Generation of timestamps  . . . . . . . . . . . . . .  37
90	       4.7.2.  Vulnerabilities . . . . . . . . . . . . . . . . . . .  38
91	   5.  Connection-establishment mechanism  . . . . . . . . . . . . .  39
92	     5.1.  SYN flood . . . . . . . . . . . . . . . . . . . . . . . .  40
93	     5.2.  Connection forgery  . . . . . . . . . . . . . . . . . . .  44
94	     5.3.  Connection-flooding attack  . . . . . . . . . . . . . . .  45
95	       5.3.1.  Vulnerability . . . . . . . . . . . . . . . . . . . .  45
96	       5.3.2.  Countermeasures . . . . . . . . . . . . . . . . . . .  46
97	     5.4.  Firewall-bypassing techniques . . . . . . . . . . . . . .  48
98	   6.  Connection-termination mechanism  . . . . . . . . . . . . . .  49
99	     6.1.  FIN-WAIT-2 flooding attack  . . . . . . . . . . . . . . .  49
100	       6.1.1.  Vulnerability . . . . . . . . . . . . . . . . . . . .  49
101	       6.1.2.  Countermeasures . . . . . . . . . . . . . . . . . . .  50
102	   7.  Buffer management . . . . . . . . . . . . . . . . . . . . . .  52
103	     7.1.  TCP retransmission buffer . . . . . . . . . . . . . . . .  52
104	       7.1.1.  Vulnerability . . . . . . . . . . . . . . . . . . . .  52
105	       7.1.2.  Countermeasures . . . . . . . . . . . . . . . . . . .  53
106	     7.2.  TCP segment reassembly buffer . . . . . . . . . . . . . .  56
107	     7.3.  Automatic buffer tuning mechanisms  . . . . . . . . . . .  59
108	       7.3.1.  Automatic send-buffer tuning mechanisms . . . . . . .  59
109	       7.3.2.  Automatic receive-buffer tuning mechanism . . . . . .  61
110	   8.  TCP segment reassembly algorithm  . . . . . . . . . . . . . .  63
111	     8.1.  Problems that arise from ambiguity in the reassembly
112	           process . . . . . . . . . . . . . . . . . . . . . . . . .  63
113	   9.  TCP Congestion Control  . . . . . . . . . . . . . . . . . . .  64
114	     9.1.  Congestion control with misbehaving receivers . . . . . .  66
115	       9.1.1.  ACK division  . . . . . . . . . . . . . . . . . . . .  66
116	       9.1.2.  DupACK forgery  . . . . . . . . . . . . . . . . . . .  66
117	       9.1.3.  Optimistic ACKing . . . . . . . . . . . . . . . . . .  67
118	     9.2.  Blind DupACK triggering attacks against TCP . . . . . . .  68
119	       9.2.1.  Blind throughput-reduction attack . . . . . . . . . .  70
120	       9.2.2.  Blind flooding attack . . . . . . . . . . . . . . . .  70
121	       9.2.3.  Difficulty in performing the attacks  . . . . . . . .  71
122	       9.2.4.  Modifications to TCP's loss recovery algorithms . . .  72
123	       9.2.5.  Countermeasures . . . . . . . . . . . . . . . . . . .  74
124	     9.3.  TCP Explicit Congestion Notification (ECN)  . . . . . . .  79
125	       9.3.1.  Possible attacks by a compromised router  . . . . . .  79
126	       9.3.2.  Possible attacks by a malicious TCP endpoint  . . . .  80
127	   10. TCP API . . . . . . . . . . . . . . . . . . . . . . . . . . .  81
128	     10.1. Passive opens and binding sockets . . . . . . . . . . . .  81
129	     10.2. Active opens and binding sockets  . . . . . . . . . . . .  82
130	   11. Blind in-window attacks . . . . . . . . . . . . . . . . . . .  84
131	     11.1. Blind TCP-based connection-reset attacks  . . . . . . . .  84
132	       11.1.1. RST flag  . . . . . . . . . . . . . . . . . . . . . .  85
133	       11.1.2. SYN flag  . . . . . . . . . . . . . . . . . . . . . .  86
134	       11.1.3. Security/Compartment  . . . . . . . . . . . . . . . .  88
135	       11.1.4. Precedence  . . . . . . . . . . . . . . . . . . . . .  89
136	       11.1.5. Illegal options . . . . . . . . . . . . . . . . . . .  90
137	     11.2. Blind data-injection attacks  . . . . . . . . . . . . . .  90
138	   12. Information leaking . . . . . . . . . . . . . . . . . . . . .  91
139	     12.1. Remote Operating System detection via TCP/IP stack
140	           fingerprinting  . . . . . . . . . . . . . . . . . . . . .  91
141	       12.1.1. FIN probe . . . . . . . . . . . . . . . . . . . . . .  91
142	       12.1.2. Bogus flag test . . . . . . . . . . . . . . . . . . .  92
143	       12.1.3. TCP ISN sampling  . . . . . . . . . . . . . . . . . .  92
144	       12.1.4. TCP initial window  . . . . . . . . . . . . . . . . .  92
145	       12.1.5. RST sampling  . . . . . . . . . . . . . . . . . . . .  93
146	       12.1.6. TCP options . . . . . . . . . . . . . . . . . . . . .  94
147	       12.1.7. Retransmission Timeout (RTO) sampling . . . . . . . .  94
148	     12.2. System uptime detection . . . . . . . . . . . . . . . . .  94
149	   13. Covert channels . . . . . . . . . . . . . . . . . . . . . . .  95
150	   14. TCP Port scanning . . . . . . . . . . . . . . . . . . . . . .  95
151	     14.1. Traditional connect() scan  . . . . . . . . . . . . . . .  96
152	     14.2. SYN scan  . . . . . . . . . . . . . . . . . . . . . . . .  96
153	     14.3. FIN, NULL, and XMAS scans . . . . . . . . . . . . . . . .  96
154	     14.4. Maimon scan . . . . . . . . . . . . . . . . . . . . . . .  98
155	     14.5. Window scan . . . . . . . . . . . . . . . . . . . . . . .  98
156	     14.6. ACK scan  . . . . . . . . . . . . . . . . . . . . . . . .  99
157	   15. Processing of ICMP error messages by TCP  . . . . . . . . . .  99
158	   16. TCP interaction with the Internet Protocol (IP) . . . . . . .  99
159	     16.1. TCP-based traceroute  . . . . . . . . . . . . . . . . . .  99
160	     16.2. Blind TCP data injection through fragmented IP traffic  . 100
161	     16.3. Broadcast and multicast IP addresses  . . . . . . . . . . 102
162	   17. Security Considerations . . . . . . . . . . . . . . . . . . . 102
163	   18. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . 102
164	   19. References  . . . . . . . . . . . . . . . . . . . . . . . . . 103
165	   20. References  . . . . . . . . . . . . . . . . . . . . . . . . . 113
166	     20.1. Normative References  . . . . . . . . . . . . . . . . . . 113
167	     20.2. Informative References  . . . . . . . . . . . . . . . . . 113
168	   Appendix A.  TODO list  . . . . . . . . . . . . . . . . . . . . . 113
169	   Appendix B.  Change log (to be removed by the RFC Editor
170	                before publication of this document as an RFC) . . . 113
171	     B.1.  Changes from draft-ietf-tcpm-tcp-security-01  . . . . . . 113
172	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . 114

174	1.  Preface

176	1.1.  Introduction

178	   The TCP/IP protocol suite was conceived in an environment that was
179	   quite different from the hostile environment in which the protocols
180	   currently operate.  However, the effectiveness of the protocols led
181	   to their early adoption in production environments, to the point
182	   that, to some extent, the world's economy currently depends on them.

184	   While many textbooks and articles have created the myth that the
185	   Internet protocols were designed for warfare environments, the top
186	   level goal for the DARPA Internet Program was the sharing of large
187	   service machines on the ARPANET [Clark, 1988].  As a result, many
188	   protocol specifications focus only on the operational aspects of the
189	   protocols they specify, and overlook their security implications.

191	   While Internet technology has evolved since its inception, the
192	   Internet's building blocks are basically the same core protocols
193	   adopted by the ARPANET more than two decades ago.  During the last
194	   twenty years, many vulnerabilities have been identified in the TCP/IP
195	   stacks of a number of systems.  Some of them were based on flaws in
196	   some protocol implementations, affecting only a small number of
197	   systems, while others were based on flaws in the protocols
198	   themselves, affecting virtually every existing implementation
199	   [Bellovin, 1989].  Even in the last couple of years, researchers were
200	   still working on security problems in the core protocols [NISCC,
201	   2004] [NISCC, 2005].

203	   The discovery of vulnerabilities in the TCP/IP protocol suite usually
204	   led to reports being published by a number of CSIRTs (Computer
205	   Security Incident Response Teams) and vendors, which helped to raise
206	   awareness about the threats and the best mitigations known at the
207	   time the reports were published.  Unfortunately, this also led to the
208	   documentation of the discovered protocol vulnerabilities being spread
209	   among a large number of documents, which are sometimes difficult to
210	   identify.

212	   For some reason, much of the effort of the security community on the
213	   Internet protocols did not result in official documents (RFCs) being
214	   issued by the IETF (Internet Engineering Task Force).  This basically
215	   led to a situation in which "known" security problems have not always
216	   been addressed by all vendors.  In addition, in many cases vendors
217	   have implemented quick "fixes" to the identified vulnerabilities
218	   without a careful analysis of their effectiveness and their impact on
219	   interoperability [Silbersack, 2005].

221	   Producing a secure TCP/IP implementation nowadays is a very difficult
222	   task, in part because of the lack of a single document that serves as
223	   a security roadmap for the protocols.  Implementers are faced with
224	   the hard task of identifying relevant documentation and
225	   differentiating between that which provides correct advice, and that
226	   which provides misleading advice based on inaccurate or wrong
227	   assumptions.

229	   There is a clear need for a companion document to the IETF
230	   specifications that discusses the security aspects and implications
231	   of the protocols, identifies the existing vulnerabilities, discusses
232	   the possible countermeasures, and analyzes their respective
233	   effectiveness.

235	   This document is the result of a security assessment of the IETF
236	   specifications of the Transmission Control Protocol (TCP).
237	   Possible threats are identified and, where possible,
238	   countermeasures are proposed.  Additionally, many
239	   implementation flaws that have led to security vulnerabilities have
240	   been referenced in the hope that future implementations will not
241	   incur the same problems.

243	   This document does not aim to be the final word on the security
244	   aspects of TCP.  On the contrary, it aims to raise awareness about a
245	   number of TCP vulnerabilities that have been faced in the past, those
246	   that are currently being faced, and some of those that we may still
247	   have to deal with in the future.

249	   Feedback from the community is more than encouraged to help this
250	   document be as accurate as possible and to keep it updated as new
251	   vulnerabilities are discovered.

253	   This document is heavily based on the "Security Assessment of the
254	   Transmission Control Protocol (TCP)" released by the UK Centre for
255	   the Protection of National Infrastructure (CPNI), available at:
256	   http://www.cpni.gov.uk/Products/technicalnotes/
257	   Feb-09-security-assessment-TCP.aspx .

259	1.2.  Scope of this document

261	   While there are a number of protocols that may affect the way TCP
262	   operates, this document focuses only on the specifications of the
263	   Transmission Control Protocol (TCP) itself.

265	   The following IETF RFCs were selected for assessment as part of this
266	   work:

268	   o  RFC 793, "Transmission Control Protocol.  DARPA Internet Program.
269	      Protocol Specification" (91 pages)

271	   o  RFC 1122, "Requirements for Internet Hosts -- Communication
272	      Layers" (116 pages)

274	   o  RFC 1191, "Path MTU Discovery" (19 pages)

276	   o  RFC 1323, "TCP Extensions for High Performance" (37 pages)

278	   o  RFC 1948, "Defending Against Sequence Number Attacks" (6 pages)

280	   o  RFC 1981, "Path MTU Discovery for IP version 6" (15 pages)

282	   o  RFC 2018, "TCP Selective Acknowledgment Options" (12 pages)

284	   o  RFC 2385, "Protection of BGP Sessions via the TCP MD5 Signature
285	      Option" (6 pages)

287	   o  RFC 2581, "TCP Congestion Control" (14 pages)

289	   o  RFC 2675, "IPv6 Jumbograms" (9 pages)

291	   o  RFC 2883, "An Extension to the Selective Acknowledgement (SACK)
292	      Option for TCP" (17 pages)

294	   o  RFC 2884, "Performance Evaluation of Explicit Congestion
295	      Notification (ECN) in IP Networks" (18 pages)

297	   o  RFC 2988, "Computing TCP's Retransmission Timer" (8 pages)

299	   o  RFC 3168, "The Addition of Explicit Congestion Notification (ECN)
300	      to IP" (63 pages)

302	   o  RFC 3465, "TCP Congestion Control with Appropriate Byte Counting
303	      (ABC)" (10 pages)

305	   o  RFC 3517, "A Conservative Selective Acknowledgment (SACK)-based
306	      Loss Recovery Algorithm for TCP" (13 pages)

308	   o  RFC 3540, "Robust Explicit Congestion Notification (ECN) Signaling
309	      with Nonces" (13 pages)

311	   o  RFC 3782, "The NewReno Modification to TCP's Fast Recovery
312	      Algorithm" (19 pages)

314	1.3.  Organization of this document

316	   This document is organized in two parts.  The first part
317	   contains a discussion of each of the TCP header fields, identifies
318	   their security implications, and discusses the possible
319	   countermeasures.  The second part contains an analysis of the
320	   security implications of the mechanisms and policies implemented by
321	   TCP, and of several implementation strategies in use by a number
322	   of popular TCP implementations.

324	2.  The Transmission Control Protocol

326	   The Transmission Control Protocol (TCP) is a connection-oriented
327	   transport protocol that provides a reliable byte-stream data transfer
328	   service.

330	   Very few assumptions are made about the reliability of underlying
331	   data transfer services below the TCP layer.  Basically, TCP assumes
332	   it can obtain a simple, potentially unreliable datagram service from
333	   the lower level protocols.  Figure 1 illustrates where TCP fits in
334	   the DARPA reference model.

336	                             +---------------+
337	                             |  Application  |
338	                             +---------------+
339	                             |      TCP      |
340	                             +---------------+
341	                             |      IP       |
342	                             +---------------+
343	                             |    Network    |
344	                             +---------------+

346	                Figure 1: TCP in the DARPA reference model

348	   TCP provides facilities in the following areas:

350	   o  Basic Data Transfer

352	   o  Reliability

354	   o  Flow Control

356	   o  Multiplexing

358	   o  Connections
359	   o  Precedence and Security

361	   o  Congestion Control

363	   The core TCP specification, RFC 793 [Postel, 1981c], dates back to
364	   1981 and standardizes the basic mechanisms and policies of TCP.  RFC
365	   1122 [Braden, 1989] provides clarifications and errata for the
366	   original specification.  RFC 2581 [Allman et al, 1999] specifies TCP
367	   congestion control and avoidance mechanisms, not present in the
368	   original specification.  Other documents specify extensions and
369	   improvements for TCP.

371	   The large number of documents that specify extensions, improvements,
372	   or modifications to existing TCP mechanisms has led the IETF to
373	   publish a roadmap for TCP, RFC 4614 [Duke et al, 2006], that
374	   clarifies the relevance of each of those documents.

376	3.  TCP header fields

378	   RFC 793 [Postel, 1981c] defines the syntax of a TCP segment, along
379	   with the semantics of each of the header fields.  Figure 2
380	   illustrates the syntax of a TCP segment.

382	        0                   1                   2                   3
383	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
384	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
385	       |          Source Port          |       Destination Port        |
386	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
387	       |                        Sequence Number                        |
388	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
389	       |                    Acknowledgment Number                      |
390	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
391	       |  Data |       |C|E|U|A|P|R|S|F|                               |
392	       | Offset|Resrved|W|C|R|C|S|S|Y|I|            Window             |
393	       |       |       |R|E|G|K|H|T|N|N|                               |
394	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
395	       |           Checksum            |         Urgent Pointer        |
396	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
397	       |                    Options                    |    Padding    |
398	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
399	       |                             data                              |
400	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

402	   Note that one tick mark represents one bit position.

404	           Figure 2: Transmission Control Protocol header format
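
   As an illustration of the layout in Figure 2, the fixed part of the
   header can be decoded as sketched below.  This is a minimal sketch,
   not a reference implementation: the names TcpHeader and
   parse_fixed_header are hypothetical, options and data are ignored,
   and the caller is assumed to supply at least 20 bytes.

```python
import struct
from typing import NamedTuple

# Control bits in the order shown in Figure 2 (CWR is the most
# significant bit of the flags octet, FIN the least significant).
FLAG_NAMES = ("CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN")


class TcpHeader(NamedTuple):  # hypothetical container, for illustration
    source_port: int
    destination_port: int
    sequence_number: int
    acknowledgment_number: int
    data_offset: int       # header length in 32-bit words
    reserved: int          # the four reserved bits
    flags: frozenset       # names of the control bits that are set
    window: int
    checksum: int
    urgent_pointer: int


def parse_fixed_header(segment: bytes) -> TcpHeader:
    """Decode the first 20 bytes of a TCP segment (network byte order).

    Raises struct.error if fewer than 20 bytes are supplied.
    """
    (sport, dport, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    flag_bits = off_flags & 0xFF
    flags = frozenset(
        name for bit, name in enumerate(reversed(FLAG_NAMES))
        if flag_bits & (1 << bit)
    )
    return TcpHeader(sport, dport, seq, ack,
                     off_flags >> 12, (off_flags >> 8) & 0x0F,
                     flags, window, checksum, urgent)
```

   The 16-bit word that follows the Acknowledgment Number carries the
   Data Offset in its four high-order bits, the Reserved bits in the
   next four, and the eight control bits in its low-order octet.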

406	   The minimum TCP header size is 20 bytes, and corresponds to a TCP
407	   segment with no options and no data.  However, a TCP module might be
408	   handed an (illegitimate) "TCP segment" of less than 20 bytes.
409	   Therefore, before doing any processing of the TCP header fields, the
410	   following check should be performed by TCP on the segments handed to
411	   it by the internet layer:

413	                             Segment.Size >= 20

415	   If a segment does not pass this check, it should be dropped.
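
   The check above could be expressed as follows.  This is only a
   sketch: receive_segment is a hypothetical input hook, and the
   strings it returns stand in for whatever an implementation actually
   does when it drops or continues processing a segment.

```python
MIN_TCP_HEADER_SIZE = 20  # bytes: a header with no options and no data


def passes_minimum_size_check(segment: bytes) -> bool:
    """Return True if the segment can hold a complete TCP header."""
    return len(segment) >= MIN_TCP_HEADER_SIZE


def receive_segment(segment: bytes) -> str:
    """Hypothetical input hook: check the size before reading any field."""
    if not passes_minimum_size_check(segment):
        return "drop"
    return "process"
```

   Performing the check first means that no later code reads header
   fields from a buffer that is too short to contain them.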

417	   The following subsections contain further sanity checks that should
418	   be performed on TCP segments.

420	3.1.  Source Port and Destination Port

422	   The Source Port field contains a 16-bit number that identifies the
423	   TCP end-point that originated this TCP segment.  The TCP Destination
424	   Port contains a 16-bit number that identifies the destination TCP
425	   end-point of this segment.  In most of the discussion we refer to
426	   client-side (or "ephemeral") port-numbers and server-side port
427	   numbers, since that distinction is what usually affects the
428	   interpretation of a port number.

430	   TCP SHOULD randomize its ephemeral (client-side) ports, to improve
431	   its resistance to off-path attacks.  For the purpose of ephemeral
432	   port selection, the largest possible port range SHOULD be used
433	   (ideally 1024-65535) [I-D.ietf-tsvwg-port-randomization].

435	   DISCUSSION:

437	      [I-D.ietf-tsvwg-port-randomization] provides advice on port
438	      randomization.
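
As an illustration of the recommendation above, the following is a
minimal sketch of randomized ephemeral port selection over the range
1024-65535.  The function name, the "in_use" set and the retry limit
are assumptions made for this example; they are not taken from
[I-D.ietf-tsvwg-port-randomization], which discusses more elaborate
algorithms.

```python
import secrets

MIN_EPHEMERAL = 1024   # lower bound suggested in the text above
MAX_EPHEMERAL = 65535  # upper bound suggested in the text above


def pick_ephemeral_port(in_use, attempts=64):
    """Return a random port in [MIN_EPHEMERAL, MAX_EPHEMERAL].

    `in_use` is a hypothetical set of port numbers that must not be
    selected (for example, ports already bound by local end-points).
    Returns None if no free port is found within `attempts` tries.
    """
    span = MAX_EPHEMERAL - MIN_EPHEMERAL + 1
    for _ in range(attempts):
        # secrets.randbelow() draws from the OS CSPRNG, so an
        # off-path attacker cannot easily predict the result.
        port = MIN_EPHEMERAL + secrets.randbelow(span)
        if port not in in_use:
            return port
    return None
```

Port 0 can never be returned by this sketch, since the range starts
at 1024.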

440	   TCP MUST NOT allocate port number 0, as its use could lead to
441	   interoperability problems.  If a segment is received with port 0 as
442	   the Source Port or the Destination Port, a RST segment SHOULD be sent
443	   in response (provided that the incoming segment does not have the
444	   RST flag set).

446	   DISCUSSION:

448	      While port 0 is a legitimate port number, it has a special meaning
449	      in the UNIX Sockets API.  For example, when a TCP port number of 0
450	      is passed as an argument to the bind() function, rather than
451	      binding port 0, an ephemeral port is selected for the
452	      corresponding TCP end-point.  As a result, the TCP port number 0
453	      is never actually used in TCP segments.

455	      Different implementations have been found to respond differently
456	      to TCP segments that have a port number of 0 as the Source Port
457	      and/or the Destination Port.  As a result, TCP segments with a
458	      port number of 0 are usually employed for remote OS detection via
459	      TCP/IP stack fingerprinting [Jones, 2003].

461	      Since in practice TCP port 0 is not used by any legitimate
462	      application and is only used for fingerprinting purposes, a number
463	      of host implementations already reject TCP segments that use 0 as
464	      the Source Port and/or the Destination Port.  Also, a number of
465	      firewalls filter (by default) any TCP segments that contain a port
466	      number of zero for the Source Port and/or the Destination Port.

468	      We therefore recommend that TCP implementations respond to
469	      incoming TCP segments that have a Source Port or a Destination
470	      Port of 0 with an RST (provided these incoming segments do not
471	      have the RST bit set).

473	      Responding with an RST segment to incoming segments that have the
474	      RST bit would open the door to RST-war attacks.

476	   TCP MUST be able to gracefully handle the case where the source end-
477	   point (IP Source Address, TCP Source Port) is the same as the
478	   destination end-point (IP Destination Address, TCP Destination Port).

480	   DISCUSSION:

482	      Some systems have been found to be unable to process TCP segments
483	      in which the source end-point {Source Address, Source Port} is the
484	      same as the destination end-point {Destination Address,
485	      Destination Port}.  Such TCP segments have been reported to cause
486	      malfunction of a number of implementations [CERT, 1996], and have
487	      been exploited in the past to perform Denial of Service (DoS)
488	      attacks [Meltman, 1997].  While these packets are very
489	      unlikely to exist in real and legitimate scenarios, TCP should
490	      nevertheless be able to process them without the need for any
491	      "extra" code.

493	      A SYN segment in which the source end-point {Source Address,
494	      Source Port} is the same as the destination end-point {Destination
495	      Address, Destination Port} will result in a "simultaneous open"
496	      scenario, such as the one described on page 32 of RFC 793 [Postel,
497	      1981c].  Therefore, those TCP implementations that correctly
498	      handle simultaneous opens should already be prepared to handle
499	      these unusual TCP segments.

501	   TCP SHOULD NOT allocate port numbers that are in use by a TCP that
502	   is in the LISTEN or CLOSED states for use as ephemeral ports, as this
503	   could allow attackers on the local system to "steal" incoming TCP
504	   connections.

506	   DISCUSSION:

508	      While the only requirement for a selected ephemeral port is that
509	      the resulting four-tuple (connection-id) is unique (i.e., not
510	      currently in use by any other TCP connection), in practice it may
511	      be necessary to not allow the allocation of port numbers that are
512	      in use by a TCP that is in the LISTEN or CLOSED states for use as
513	      ephemeral ports, as this might allow an attacker to "steal"
514	      incoming connections from a local server application.  Therefore,
515	      TCP SHOULD NOT allocate port numbers that are in use by a TCP in
516	      the LISTEN or CLOSED states for use as ephemeral ports.  Section
517	      10.2 of this document provides a detailed discussion of this
518	      issue.

520	   While some systems restrict use of the port numbers in the range
521	   0-1023 to privileged users, applications SHOULD NOT grant any trust
522	   based on the port numbers used for a TCP connection.

524	   DISCUSSION:

526	      Not all systems require superuser privileges to bind port numbers
527	      in that range.  Besides, with desktop computers such "distinction"
528	      has generally become irrelevant.

530	   Middle-boxes such as packet filters MUST NOT assume that clients use
531	   port numbers from only the Dynamic or Registered port ranges.

533	   DISCUSSION:

535	      It should also be noted that some clients, such as DNS resolvers,
536	      are known to use port numbers from the "Well Known Ports" range.
537	      Therefore, middle-boxes such as packet filters MUST NOT assume
538	      that clients use port numbers from only the Dynamic or Registered
539	      port ranges.

541	3.2.  Sequence number

543	   TCP SHOULD select its Initial Sequence Numbers (ISNs) with the
544	   following expression:

546	   ISN = M + F(localhost, localport, remotehost, remoteport, secret_key)

548	   where M is a monotonically increasing counter maintained within TCP,
549	   and F() is a Pseudo-Random Function (PRF).  As it is vital that F()
550	   not be computable from the outside, F() could be a PRF of the
551	   connection-id and some secret data.  HMAC-SHA-256 would be a good
552	   choice for F().
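
The expression above can be sketched as follows.  This is only an
illustration under stated assumptions: M is modelled as a clock that
ticks every 4 microseconds (in the style of the RFC 793 generator),
F() is HMAC-SHA-256 truncated to 32 bits, addresses are IPv4, and the
function names and the per-boot SECRET_KEY are hypothetical.

```python
import hashlib
import hmac
import os
import socket
import struct
import time

# Assumption: a random secret generated once (e.g., at boot time).
SECRET_KEY = os.urandom(32)


def timer_m(now=None):
    """M: a 32-bit counter incremented every 4 microseconds."""
    if now is None:
        now = time.monotonic()
    return int(now * 250_000) & 0xFFFFFFFF


def prf_f(localhost, localport, remotehost, remoteport, secret_key):
    """F(): HMAC-SHA-256 of the connection-id, truncated to 32 bits."""
    msg = (
        socket.inet_aton(localhost)
        + struct.pack("!H", localport)
        + socket.inet_aton(remotehost)
        + struct.pack("!H", remoteport)
    )
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def initial_sequence_number(localhost, localport, remotehost, remoteport,
                            secret_key=SECRET_KEY, now=None):
    """ISN = M + F(connection-id, secret_key), modulo 2**32."""
    offset = prf_f(localhost, localport, remotehost, remoteport,
                   secret_key)
    return (timer_m(now) + offset) & 0xFFFFFFFF
```

For a given connection-id the offset produced by F() is constant, so
successive ISNs for that connection-id increase with M, while ISNs
for other connection-ids cannot be derived without the secret.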

554	   DISCUSSION:

556	      The choice of the Initial Sequence Number of a connection is not
557	      arbitrary, but aims to minimize the chances of a stale segment
558	      being accepted by a new incarnation of a previous connection.
559	      RFC 793 [Postel, 1981c] suggests the use of a global 32-bit ISN
560	      generator, whose lower bit is incremented roughly every 4
561	      microseconds.

563	      However, use of such an ISN generator makes it trivial to predict
564	      the ISN that a TCP will use for new connections, thus allowing a
565	      variety of attacks against TCP, such as those described in Section
566	      5.2 and Section 11 of this document.  This vulnerability was first
567	      described in [Morris, 1985], and its exploitation was widely
568	      publicized about 10 years later [Shimomura, 1995].

570	      As a matter of fact, protection against old stale segments from a
571	      previous incarnation of the connection comes from allowing the
572	      creation of a new incarnation of a previous connection only after
573	      2*MSL have passed since a segment corresponding to the old
574	      incarnation was last seen.  This is accomplished by the TIME-WAIT
575	      state, and TCP's "quiet time" concept.  However, as discussed in
576	      Section 3.1 and Section 11.1.2 of this document, the ISN can be
577	      used to perform some heuristics meant to avoid an interoperability
578	      problem that may arise when two systems establish connections at a
579	      high rate.  In order for such heuristics to work, the ISNs
580	      generated by a TCP should be monotonically increasing.

582	      The ISN generation scheme recommended in this section was
583	      originally proposed in RFC 1948 [Bellovin, 1996], such that the
584	      chances of an attacker guessing the ISN of a TCP are reduced,
585	      while still producing a monotonically-increasing sequence that
586	      allows implementation of the optimization described in Section 3.1
587	      and Section 11.1.2 of this document.

589	      [CERT, 2001] and [US-CERT, 2001] are advisories about the security
590	      implications of weak ISN generators.  [Zalewski, 2001a] and
591	      [Zalewski, 2002] contain a detailed analysis of ISN generators,
592	      and a survey of the algorithms in use by popular TCP
593	      implementations.

595	      Another security consideration that should be made about TCP
596	      sequence numbers is that they might allow an attacker to count the
597	      number of systems behind a Network Address Translator (NAT)
598	      [Srisuresh and Egevang, 2001].  Depending on the ISN generators
599	      implemented by each of the systems behind the NAT, an attacker
600	      might be able to count the number of systems behind the NAT by
601	      establishing a number of TCP connections (using the public address
602	      of the NAT) and identifying the number of different sequence
603	      number "spaces".  This information leakage could be eliminated by
604	      rewriting the contents of all those header fields and options that
605	      make use of sequence numbers (such as the Sequence Number and the
606	      Acknowledgement Number fields, and the SACK Option) at the NAT.
607	      [Gont and Srisuresh, 2008] provides a detailed discussion of the
608	      security implications of NATs and of the possible mitigations for
609	      this and other issues.

611	3.3.  Acknowledgement Number

613	   TCP SHOULD set the Acknowledgement Number to zero when sending a TCP
614	   segment that does not have the ACK bit set (i.e., a SYN segment).

616	   TCP MUST check that, on segments that have the ACK bit set, the
617	   Acknowledgment Number satisfies the expression:

619	                SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT

621	   If a TCP segment does not pass this check, the segment MUST be
622	   dropped, and an ACK segment SHOULD be sent in response.
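
Because sequence numbers wrap around at 2**32, the comparison above
has to be evaluated with modular arithmetic.  The following sketch
shows one way of doing so; the function and parameter names are
illustrative and simply mirror the variables used in the expression.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit quantities


def ack_is_acceptable(seg_ack, snd_una, snd_nxt, snd_max_wnd):
    """Check SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT (mod 2**32)."""
    lower = (snd_una - snd_max_wnd) % MOD
    # Width of the acceptable range, measured from its lower edge.
    span = (snd_nxt - lower) % MOD
    # Distance of SEG.ACK from the lower edge of the range.
    offset = (seg_ack - lower) % MOD
    return offset <= span
```

A segment for which this function returns False would be dropped,
and an ACK segment sent in response.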

624	   DISCUSSION:

626	      If the ACK bit is on, the Acknowledgement Number contains the
627	      value of the next sequence number the sender of this segment is
628	      expecting to receive.  According to RFC 793, the Acknowledgement
629	      Number is considered valid as long as it does not acknowledge the
630	      receipt of data that has not yet been sent.

632	      However, as a result of recent concerns on forgery attacks against
633	      TCP (see Section 11 of this document), ongoing work at the IETF
634	      [Ramaiah et al, 2008] has proposed enforcing a stricter check
635	      on the Acknowledgement Number of segments that have the ACK bit
636	      set:

638	                SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT

640	      If the ACK bit is off, the Acknowledgement Number field is not
641	      valid.  We recommend that TCP implementations set the
642	      Acknowledgement Number to zero when sending a TCP segment that
643	      does not have the ACK bit set (i.e., a SYN segment).  Some TCP
644	      implementations have been known to fail to set the Acknowledgement
645	      Number to zero, thus leaking information.

647	      TCP Acknowledgements are also used to perform heuristics for loss
648	      recovery and congestion control.  Section 9 of this document
649	      describes a number of ways in which these mechanisms can be
650	      exploited.

652	3.4.  Data Offset

654	   TCP MUST enforce the following checks on the Data Offset field:

656	                              Data Offset >= 5

658	                   Data Offset * 4 <= TCP segment length

660	   If a TCP segment does not pass these checks, it should be silently
661	   dropped.

663	      The TCP segment length should be obtained from the IP layer, as
664	      TCP does not include a TCP segment length field.

666	   DISCUSSION:

668	      The Data Offset field indicates the length of the TCP header in
669	      32-bit words.  As the minimum TCP header size is 20 bytes, the
670	      minimum legal value for this field is 5.

672	      For obvious reasons, the TCP header cannot be larger than the
673	      whole TCP segment it is part of.
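
The minimum-size check described at the start of this section and the
two Data Offset checks can be combined as in the sketch below.  It
assumes that the internet layer hands TCP the segment as a byte
string; the function name is hypothetical.

```python
MIN_TCP_HEADER = 20  # bytes: a header with no options


def header_length_or_none(segment: bytes):
    """Return the TCP header length in bytes, or None to drop."""
    # Segment.Size >= 20
    if len(segment) < MIN_TCP_HEADER:
        return None
    # Data Offset is the upper four bits of byte 12 of the header.
    data_offset = segment[12] >> 4
    # Data Offset >= 5
    if data_offset < 5:
        return None
    # Data Offset * 4 <= TCP segment length
    header_len = data_offset * 4
    if header_len > len(segment):
        return None
    return header_len
```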

675	3.5.  Control bits

677	   The following subsections provide a discussion of the different
678	   control bits in the TCP header.  TCP segments with unusual
679	   combinations of flags set have been known in the past to cause
680	   malfunction of some implementations, sometimes to the extent of
681	   causing them to crash [Postel, 1987] [Braden, 1992].  These packets
682	   are still usually employed for the purpose of TCP/IP stack
683	   fingerprinting.  Section 12.1 contains a discussion of TCP/IP stack
684	   fingerprinting.

686	3.5.1.  Reserved (four bits)

688	   TCP MUST ignore the Reserved field of incoming TCP segments.

690	   DISCUSSION:

692	      These four bits are reserved for future use, and must be zero.  As
693	      with virtually every field, the Reserved field could be used as a
694	      covert channel.  While there exist intermediate devices such as
695	      protocol scrubbers that clear these bits, and firewalls that drop/
696	      reject segments with any of these bits set, these devices should
697	      consider the impact of these policies on TCP interoperability.
698	      For example, as TCP continues to evolve, all or part of the bits
699	      in the Reserved field could be used to implement some new
700	      functionality.  If some middle-box or end-system implementation
701	      were to drop a TCP segment merely because some of these bits are
702	      not set to zero, interoperability problems would arise.

704	3.5.2.  CWR (Congestion Window Reduced)

706	   DISCUSSION:

708	      The CWR flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is
709	      used as part of the Explicit Congestion Notification (ECN)
710	      mechanism.  For connections in any of the synchronized states,
711	      this flag indicates, when set, that the TCP sending this segment
712	      has reduced its congestion window.

714	      An analysis of the security implications of ECN can be found in
715	      Section 9.3 of this document.

717	3.5.3.  ECE (ECN-Echo)

719	   DISCUSSION:

721	      The ECE flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is
722	      used as part of the Explicit Congestion Notification (ECN)
723	      mechanism.

725	      Once a TCP connection has been established, an ACK segment with
726	      the ECE bit set indicates that congestion was encountered in the
727	      network on the path from the sender to the receiver.  This
728	      indication of congestion should be treated just as a congestion
729	      loss in non-ECN-capable TCP [Ramakrishnan et al, 2001].
730	      Additionally, TCP should not increase the congestion window (cwnd)
731	      in response to such an ACK segment that indicates congestion, and
732	      should also not react to congestion indications more than once
733	      every window of data (or once per round-trip time).

735	      An analysis of the security implications of ECN can be found in
736	      Section 9.3 of this document.

738	3.5.4.  URG

740	   DISCUSSION:

742	      When the URG flag is set, the Urgent Pointer field contains the
743	      current value of the urgent pointer.

745	      Receipt of an "urgent" indication generates, in a number of
746	      implementations (such as those in UNIX-like systems), a software
747	      interrupt (signal) that is delivered to the corresponding process.

749	      In UNIX-like systems, receipt of an urgent indication causes a
750	      SIGURG signal to be delivered to the corresponding process.

752	      A number of applications handle TCP urgent indications by
753	      installing a signal handler for the corresponding signal (e.g.,
754	      SIGURG).  As discussed in [Zalewski, 2001b], some signal handlers
755	      can be maliciously exploited by an attacker, for example to gain
756	      remote access to a system.  While secure programming of signal
757	      handlers is out of the scope of this document, we nevertheless
758	      raise awareness that TCP urgent indications might be exploited to
759	      abuse poorly-written signal handlers.

761	      Section 3.8 discusses the security implications of the TCP urgent
762	      mechanism.

764	3.5.5.  ACK

766	   DISCUSSION:

768	      When the ACK bit is one, the Acknowledgment Number field contains
769	      the next sequence number expected, cumulatively acknowledging the
770	      receipt of all data up to the sequence number in the
771	      Acknowledgement Number, minus one.  Section 3.3 of this document
772	      describes sanity checks that should be performed on the
773	      Acknowledgement Number field.

775	      TCP Acknowledgements are also used to perform heuristics for loss
776	      recovery and congestion control.  Section 9 of this document
777	      describes a number of ways in which these mechanisms can be
778	      exploited.

780	3.5.6.  PSH

782	   As a result of a SEND call, TCP SHOULD send all queued data (provided
783	   that TCP's flow control and congestion control algorithms allow it).

785	   Received data SHOULD be immediately delivered to an application
786	   calling the RECEIVE function, even if the data already available are
787	   less than those requested by the application.

789	   DISCUSSION:

791	      RFC 793 [Postel, 1981c] contains (on pages 54-64) a functional
792	      description of a TCP Application Programming Interface (API).  One
793	      of the parameters of the SEND function is the PUSH flag which,
794	      when set, signals the local TCP that it must send all unsent data.
795	      The TCP PSH (PUSH) flag will be set in the last outgoing segment,
796	      to signal the push function to the receiving TCP.  Upon receipt of
797	      a segment with the PSH flag set, the receiving user's buffer is
798	      returned to the user, without waiting for additional data to
799	      arrive.

801	      There are two security considerations arising from the PUSH
802	      function.  On the sending side, an attacker could cause a large
803	      amount of data to be queued for transmission without setting the
804	      PUSH flag in the SEND call.  This would prevent the local TCP from
805	      sending the queued data, causing system memory to be tied to those
806	      data for an unnecessarily long period of time.

808	      An analogous consideration should be made for the receiving TCP.
809	      TCP is allowed to buffer incoming data until the receiving user's
810	      buffer fills or a segment with the PSH bit set is received.  If
811	      the receiving TCP implements this policy, an attacker could send a
812	      large amount of data, slightly less than the receiving user's
813	      buffer size, to cause system memory to be tied to these data for
814	      an unnecessarily long period of time.  Both of these issues are
815	      discussed in Section 4.2.2.2 of RFC 1122 [Braden, 1989].

817	      In order to mitigate these potential vulnerabilities, we suggest
818	      assuming an implicit "PUSH" in every SEND call.  On the sending
819	      side, this means that as a result of a SEND call TCP should try to
820	      send all queued data (provided that TCP's flow control and
821	      congestion control algorithms allow it).  On the receiving side,
822	      this means that the received data will be immediately delivered to
823	      an application calling the RECEIVE function, even if the data
824	      already available are less than those requested by the
825	      application.

827	      It is interesting to note that popular TCP APIs (such as
828	      "sockets") do not provide a PUSH flag in any of the interfaces
829	      they define, but rather perform some kind of "heuristics" to set
830	      the PSH bit in outgoing segments.  As a result, the value of the
831	      PSH bit in the received TCP segments is usually a policy of the
832	      sending TCP, rather than a policy of the sending application.  All
833	      robust applications that make use of those APIs (such as the
834	      sockets API) properly handle the case of a RECEIVE call returning
835	      less data (e.g., zero) than requested, usually by performing
836	      subsequent RECEIVE calls.

838	      Another potential malicious use of the PSH bit would be for an
839	      attacker to send small TCP segments (probably with zero bytes of
840	      data payload) to cause the receiving application to be
841	      unnecessarily woken up (increasing the CPU load), or to cause
842	      malfunction of poorly-written applications that may not cope
843	      well with RECEIVE calls returning less data than requested.

845	3.5.7.  RST

847	   TCP MUST process RST segments (i.e., segments with the RST bit set)
848	   as follows:

850	   o  If the Sequence Number of the RST segment is not valid (i.e.,
851	      falls outside of the receive window), silently drop the segment.

853	   o  If the Sequence Number of the RST segment matches the next
854	      expected sequence number (RCV.NXT), abort the corresponding
855	      connection.

857	   o  If the Sequence Number is valid (i.e., falls within the receive
858	      window) but is not exactly RCV.NXT, send an ACK segment (a
859	      "challenge ACK") of the form: <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>.
860	      TCP SHOULD rate-limit these challenge ACK segments.
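
The three rules above can be sketched as a single classification
function.  The function name and the returned labels are illustrative
only, and the rate-limiting of challenge ACKs is not shown.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit quantities


def classify_rst(seg_seq, rcv_nxt, rcv_wnd):
    """Return 'reset', 'challenge_ack' or 'drop' for an RST segment."""
    # Exact match with the next expected sequence number: abort.
    if seg_seq == rcv_nxt:
        return "reset"
    # Within the receive window, but not exactly RCV.NXT: send
    # <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> (a "challenge ACK").
    if (seg_seq - rcv_nxt) % MOD < rcv_wnd:
        return "challenge_ack"
    # Outside of the receive window: silently drop.
    return "drop"
```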

862	   DISCUSSION:

864	      The RST bit is used to request the abortion (abnormal close) of a
865	      TCP connection.  RFC 793 [Postel, 1981c] suggests that an RST
866	      segment should be considered valid if its Sequence Number is valid
867	      (i.e., falls within the receive window).  However, in response to
868	      the security concerns raised by [Watson, 2004] and [NISCC, 2004],
869	      [Ramaiah et al, 2008] proposed the aforementioned stricter
870	      validity checks.

872	      Section 11.1 of this document describes TCP-based connection-reset
873	      attacks, along with a number of countermeasures to mitigate their
874	      impact.

876	3.5.8.  SYN

878	   DISCUSSION:

880	      The SYN bit is used during the connection-establishment phase, to
881	      request the synchronization of sequence numbers.

883	      There are basically four different vulnerabilities that make use
884	      of the SYN bit: SYN-flooding attacks, connection forgery attacks,
885	      connection flooding attacks, and connection-reset attacks.  They
886	      are described in Section 5.1, Section 5.2, Section 5.3, and
887	      Section 11.1.2, respectively, along with the possible
888	      countermeasures.

890	3.5.9.  FIN

892	   DISCUSSION:

894	      The FIN flag is used to signal to the remote end-point the end of
895	      data transfer in this direction.  Receipt of a valid FIN segment
896	      (i.e., a TCP segment with the FIN flag set) causes a transition
897	      in the connection state, as part of what is usually referred to as
898	      the "connection termination phase".

900	      The connection-termination phase can be exploited to perform a
901	      number of resource-exhaustion attacks.  Section 6 of this document
902	      describes a number of attacks that exploit the connection-
903	      termination phase along with the possible countermeasures.

905	3.6.  Window

907	   DISCUSSION:

909	      The TCP Window field advertises how many bytes of data the remote
910	      peer is allowed to send before a new advertisement is made.
911	      Theoretically, the maximum transfer rate that can be achieved by
912	      TCP is limited to:

914	      Maximum Transfer Rate = Window / RTT

916	      This means that, under ideal network conditions (e.g., no packet
917	      loss), the TCP Window in use should be at least:

919	                       Window = 2 * Bandwidth * Delay

921	      Using a larger Window than that resulting from the previous
922	      equation will not provide any improvements in terms of
923	      performance.

925	      In practice, selection of the most convenient Window size may also
926	      depend on a number of other parameters, such as: packet loss rate,
927	      loss recovery mechanisms in use, etc.
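
As a worked example of the two expressions above, the sketch below
computes the Window for a given path.  It assumes that "Delay" denotes
the one-way delay, so that 2 * Delay equals the RTT; the function
names and the units (bits per second, milliseconds, bytes) are
choices made for this example.

```python
def window_for_path(bandwidth_bits_per_s, one_way_delay_ms):
    """Window = 2 * Bandwidth * Delay, expressed in bytes."""
    return 2 * bandwidth_bits_per_s * one_way_delay_ms // (8 * 1000)


def max_transfer_rate(window_bytes, rtt_ms):
    """Maximum Transfer Rate = Window / RTT, in bytes per second."""
    return window_bytes * 1000 // rtt_ms


# A 10 Mbit/s path with a one-way delay of 20 ms (RTT of 40 ms):
window = window_for_path(10_000_000, 20)   # 50000 bytes
rate = max_transfer_rate(window, 40)       # 1250000 bytes/s
```

With these figures a Window of 50000 bytes is sufficient to sustain
the full 10 Mbit/s (1250000 bytes per second).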

929	      Security implications of the maximum TCP window size

931	      An aspect of the TCP Window that is usually overlooked is the
932	      security implications of its size.  Increasing the TCP window
933	      increases the sequence number space that will be considered
934	      "valid" for incoming segments.  Thus, use of unnecessarily large
935	      TCP Window sizes increases TCP's vulnerability to forgery attacks
936	      unnecessarily.

938	      In those scenarios in which the network conditions are known
939	      and/or can be easily predicted, it is recommended that the TCP
940	      Window is never set to a value larger than that resulting from the
941	      equations above.  Additionally, the nature of the application
942	      running on top of TCP should be considered when tuning the TCP
943	      window.  As an example, an H.245 signaling application certainly
944	      does not have high requirements on throughput, and thus a window
945	      size of around 4 KBytes will usually fulfill its needs, while
946	      keeping TCP's resistance to off-path forgery attacks at a decent
947	      level.  Some rough measurements seem to indicate that a TCP window
948	      of 4 KBytes is common practice for TCP connections servicing
949	      applications such as BGP.

951	      In principle, a possible approach to avoid requiring
952	      administrators to manually set the TCP window would be to
953	      implement an automatic buffer tuning mechanism, such as that
954	      described in [Heffner, 2002].  However, as discussed in Section
955	      7.3.2 of this document these mechanisms can be exploited to
956	      perform other types of attacks.

958	      Security implications arising from closed windows

960	      The TCP window is a flow-control mechanism that prevents a fast
961	      data sender application from overwhelming a "slow" receiver.  When
962	      a TCP end-point is not willing to receive any more data (before
963	      some of the data that have already been received are consumed), it
964	      will advertise a TCP window of zero bytes.  This will effectively
965	      stop the sender from sending any new data to the TCP receiver.
966	      Transmission of new data will resume when the TCP receiver
967	      advertises a nonzero TCP window, usually with a TCP segment that
968	      contains no data ("an ACK").

970	      This segment is usually referred to as a "window update", as the
971	      only purpose of this segment is to update the sender regarding the
972	      new window.

974	      To accommodate those scenarios in which the ACK segment that
975	      "opens" the window is lost, TCP implements a "persist timer" that
976	      causes the TCP sender to query the TCP receiver periodically if
977	      the last segment received advertised a window of zero bytes.  This
978	      probe simply consists of sending one byte of new data that will
979	      force the TCP receiver to send an ACK segment back to the TCP
980	      sender, containing the current TCP window.  Similarly to the
981	      retransmission timeout timer, an exponential back-off is used when
982	      calculating the persist timer, so that the spacing between
983	      probes increases exponentially.

985	      A fundamental difference between the "persist timer" and the
986	      retransmission timer is that there is no limit on the amount of
987	      time during which a TCP can advertise a zero window.  This means
988	      that a TCP end-point could potentially advertise a zero window
989	      forever, thus keeping kernel memory at the TCP sender tied to the
990	      TCP retransmission buffer.  This could clearly be exploited as a
991	      vector for performing a Denial of Service (DoS) attack against
992	      TCP, such as that described in Section 7.1 of this document.

994	      Section 7.1 of this document describes a Denial of Service attack
995	      that aims at exhausting the kernel memory used for the TCP
996	      retransmission buffer, along with possible countermeasures.

998	3.7.  Checksum

1000	   Middleboxes that process TCP segments MUST validate the Checksum
1001	   field, and silently discard the TCP segment if such validation fails.

1003	   DISCUSSION:

1005	      The Checksum field is an error detection mechanism meant for the
1006	      contents of the TCP segment and a number of important fields of
1007	      the IP header.  It is computed over the full TCP segment (header
1008	      and data) pre-pended with a pseudo header that includes the IP
1009	      Source Address, the IP Destination Address, the Protocol number,
1010	      and the TCP segment length.  While in principle there should not
1011	      be security implications arising from this field, due to
1012	      non-RFC-compliant implementations, the Checksum can be exploited
1013	      to detect firewalls, evade network intrusion detection systems
1014	      (NIDS), and/or perform Denial of Service attacks.
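
The validation that a middle-box would perform can be sketched as
follows for IPv4.  The function names are hypothetical; the sketch
relies on the property that the one's complement sum of the pseudo
header and the whole segment (Checksum field included) is 0xFFFF
when the checksum is correct.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of `data` (zero-padded if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Validate the Checksum of a TCP segment carried over IPv4."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return ones_complement_sum(pseudo_header + segment) == 0xFFFF
```

A segment for which tcp_checksum_ok() returns False would be silently
discarded before any state is created for it.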

1016	      If a stateful firewall does not check the TCP Checksum in the
1017	      segments it processes, an attacker can exploit this situation to
1018	      perform a variety of attacks.  For example, he could send a flood
1019	      of TCP segments with invalid checksums, which would nevertheless
1020	      create state information at the firewall.  When each of these
1021	      segments is received at its intended destination, the TCP checksum
1022	      will be found to be incorrect, and the corresponding segment will
1023	      be silently discarded.  As these segments will not elicit a
1024	      response (e.g., an RST segment) from the intended recipients, the
1025	      corresponding connection state entries at the firewall will not be
1026	      removed.  Therefore, an attacker may end up tying all the state
1027	      resources of the firewall to TCP connections that will never
1028	      complete or be terminated, probably leading to a Denial of Service
1029	      to legitimate users, or forcing the firewall to randomly drop
1030	      connection state entries.

1032	      If a NIDS does not check the Checksum of TCP segments, an attacker
1033	      may send TCP segments with an invalid checksum to cause the NIDS
1034	      to obtain a TCP data stream different from that obtained by the
1035	      system being monitored.  In order to "confuse" the NIDS, the
1036	      attacker would send TCP segments with an invalid Checksum and a
1037	      Sequence Number that would overlap the sequence number space being
1038	      used for his malicious activity.  FTester [Barisani, 2006] is a
1039	      tool that can be used to assess NIDS on this issue.

1041	      Finally, an attacker performing port-scanning could potentially
1042	      exploit intermediate systems that do not check the TCP Checksum to
1043	      detect whether a given TCP port is being filtered by an
1044	      intermediate firewall, or the port is actually closed by the host
1045	      being port-scanned.  If a given TCP port appeared to be closed,
1046	      the attacker would then send a SYN segment with an invalid
1047	      Checksum.  If this segment elicited a response (either an ICMP
1048	      error message or a TCP RST segment), that response would have to
1049	      come from a system that does not check the TCP checksum.  Since
1050	      normal host implementations of the TCP protocol
1051	      do check the TCP checksum, such a response would most likely come
1052	      from a firewall or some other middle-box.

1054	      [Ed3f, 2002] describes the exploitation of the TCP checksum for
1055	      performing the above activities.  [US-CERT, 2005d] provides an
1056	      example of a TCP implementation that failed to check the TCP
1057	      checksum.
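
	      The validation that such middle-boxes skip can be sketched as
	      follows.  This is a minimal illustration for TCP over IPv4, not
	      a reference implementation; the function names are hypothetical,
	      and the caller is assumed to supply the 4-byte source and
	      destination addresses along with the complete TCP segment.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Check the TCP checksum of an IPv4 segment (header plus payload)."""
    # IPv4 pseudo-header: addresses, zero byte, protocol 6, TCP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    # A segment carrying a correct checksum sums to 0xFFFF.
    return ones_complement_sum(pseudo + segment) == 0xFFFF
```

	      A firewall or NIDS following this approach would call
	      tcp_checksum_ok() before creating or updating any connection
	      state, and would ignore segments for which it returns False.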

1059	3.8.  Urgent pointer

1061	                     Segment.Size - Data Offset * 4 > 0

1063	   If a TCP segment with the URG bit set does not pass this check, it
1064	   MUST be silently dropped.

1066	   For TCP segments that have the URG bit set to zero, a sending TCP
1067	   SHOULD set the Urgent Pointer to zero.

1069	   A receiving TCP MUST ignore the Urgent Pointer field of TCP segments
1070	   for which the URG bit is zero.
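
	   The rules above might be expressed as in the following sketch.
	   The function names and the representation of the header fields as
	   plain integers are assumptions made for illustration only.

```python
from typing import Optional


def urg_segment_ok(segment_size: int, data_offset: int, urg: bool) -> bool:
    """Apply the check Segment.Size - Data Offset * 4 > 0 to URG segments.

    segment_size is the TCP segment length in bytes (header plus data);
    data_offset is the Data Offset field, in 32-bit words.
    """
    if not urg:
        return True  # the check only applies when the URG bit is set
    return segment_size - data_offset * 4 > 0


def effective_urgent_pointer(urg: bool, urgent_pointer: int) -> Optional[int]:
    """Return the Urgent Pointer only if URG is set; otherwise ignore it."""
    return urgent_pointer if urg else None


def urgent_pointer_to_send(urg: bool, urgent_pointer: int) -> int:
    """Zero the Urgent Pointer on segments that do not have URG set."""
    return urgent_pointer if urg else 0
```

	   A segment for which urg_segment_ok() returns False would be
	   silently dropped.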

1072	   DISCUSSION:

1074	      Section 3.7 of RFC 793 [Postel, 1981c] states (on page 42) that to
1075	      send an urgent indication the user must also send at least one
1076	      byte of data.

1078	      If the URG bit is zero, the Urgent Pointer is not valid, and thus
1079	      should not be processed by the receiving TCP.  Nevertheless, we
1080	      recommend that TCP implementations set the Urgent Pointer to zero
1081	      when sending a TCP segment that does not have the URG bit set, and
1082	      ignore the Urgent Pointer (as required by RFC 793) when the URG
1083	      bit is zero.

1085	      Some stacks have been known to fail to set the Urgent Pointer to
1086	      zero when the URG bit is zero, thus leaking out the corresponding
1087	      system memory contents.  [Zalewski, 2008] provides further details
1088	      about this issue.

1090	      Some implementations have been found to be unable to process TCP
1091	      urgent indications correctly.  [Myst, 1997] originally described
1092	      how TCP urgent indications could be exploited to perform a Denial
1093	      of Service (DoS) attack against some TCP/IP implementations,
1094	      usually leading to a system crash.

1096	3.9.  Options

1098	   [IANA, 2007] contains the official list of the assigned option
1099	   numbers.  TCP Options have been specified in the past both within the
1100	   IETF and by other groups.  [Hnes, 2007] contains an unofficial
1101	   updated version of the IANA list of assigned option numbers.  The
1102	   following table contains a summary of the assigned TCP option
1103	   numbers, which is based on [Hnes, 2007].

1105	   +--------+----------------------+-----------------------------------+
1106	   |  Kind  |        Meaning       |              Summary              |
1107	   +--------+----------------------+-----------------------------------+
1108	   |    0   |  End of Option List  |      Discussed in Section 4.1     |
1109	   +--------+----------------------+-----------------------------------+
1110	   |    1   |     No-Operation     |      Discussed in Section 4.2     |
1111	   +--------+----------------------+-----------------------------------+
1112	   |    2   | Maximum Segment Size |      Discussed in Section 4.3     |
1113	   +--------+----------------------+-----------------------------------+
1114	   |    3   | WSOPT - Window Scale |      Discussed in Section 4.6     |
1115	   +--------+----------------------+-----------------------------------+
1116	   |    4   |    SACK Permitted    |     Discussed in Section 4.4.1    |
1117	   +--------+----------------------+-----------------------------------+
1118	   |    5   |         SACK         |     Discussed in Section 4.4.2    |
1119	   +--------+----------------------+-----------------------------------+
1120	   |    6   |  Echo (obsoleted by  |  Obsolete.  Specified in RFC 1072 |
1121	   |        |       option 8)      |    [Jacobson and Braden, 1988]    |
1122	   +--------+----------------------+-----------------------------------+
1123	   |    7   |      Echo Reply      |  Obsolete.  Specified in RFC 1072 |
1124	   |        | (obsoleted by option |    [Jacobson and Braden, 1988]    |
1125	   |        |          8)          |                                   |
1126	   +--------+----------------------+-----------------------------------+
1127	   |    8   |  TSOPT - Time Stamp  |      Discussed in Section 4.7     |
1128	   |        |        Option        |                                   |
1129	   +--------+----------------------+-----------------------------------+
1130	   |    9   |     Partial Order    |  Historic.  Specified in RFC 1693 |
1131	   |        | Connection Permitted |       [Connolly et al, 1994]      |
1132	   +--------+----------------------+-----------------------------------+
1133	   |   10   |     Partial Order    |  Historic.  Specified in RFC 1693 |
1134	   |        |    Service Profile   |       [Connolly et al, 1994]      |
1135	   +--------+----------------------+-----------------------------------+
1136	   |   11   |          CC          |  Historic.  Specified in RFC 1644 |
1137	   |        |                      |           [Braden, 1994]          |
1138	   +--------+----------------------+-----------------------------------+
1139	   |   12   |        CC.NEW        |  Historic.  Specified in RFC 1644 |
1140	   |        |                      |           [Braden, 1994]          |
1141	   +--------+----------------------+-----------------------------------+
1142	   |   13   |        CC.ECHO       |  Historic.  Specified in RFC 1644 |
1143	   |        |                      |           [Braden, 1994]          |
1144	   +--------+----------------------+-----------------------------------+
1145	   |   14   |     TCP Alternate    |  Historic.  Specified in RFC 1146 |
1146	   |        |   Checksum Request   |    [Zweig and Partridge, 1990]    |
1147	   +--------+----------------------+-----------------------------------+
1148	   |   15   |     TCP Alternate    |  Historic.  Specified in RFC 1146 |
1149	   |        |     Checksum Data    |    [Zweig and Partridge, 1990]    |
1150	   +--------+----------------------+-----------------------------------+
1151	   |   16   |        Skeeter       |              Historic             |
1152	   +--------+----------------------+-----------------------------------+
1154	   |   17   |         Bubba        |              Historic             |
1155	   +--------+----------------------+-----------------------------------+
1156	   |   18   |   Trailer Checksum   |              Historic             |
1157	   |        |        Option        |                                   |
1158	   +--------+----------------------+-----------------------------------+
1159	   |   19   | MD5 Signature Option |      Discussed in Section 4.5     |
1160	   +--------+----------------------+-----------------------------------+
1161	   |   20   |   SCPS Capabilities  |     Specified in [CCSDS, 2006]    |
1162	   +--------+----------------------+-----------------------------------+
1163	   |   21   |  Selective Negative  |     Specified in [CCSDS, 2006]    |
1164	   |        |   Acknowledgements   |                                   |
1165	   +--------+----------------------+-----------------------------------+
1166	   |   22   |   Record Boundaries  |     Specified in [CCSDS, 2006]    |
1167	   +--------+----------------------+-----------------------------------+
1168	   |   23   |      Corruption      |     Specified in [CCSDS, 2006]    |
1169	   |        |      experienced     |                                   |
1170	   +--------+----------------------+-----------------------------------+
1171	   |   24   |         SNAP         |              Historic             |
1172	   +--------+----------------------+-----------------------------------+
1173	   |   25   | Unassigned (released |             Unassigned            |
1174	   |        |      2000-12-18)     |                                   |
1175	   +--------+----------------------+-----------------------------------+
1176	   |   26   |    TCP Compression   |              Historic             |
1177	   |        |        Filter        |                                   |
1178	   +--------+----------------------+-----------------------------------+
1179	   |   27   | Quick-Start Response |  Specified in RFC 4782 [Floyd et  |
1180	   |        |                      |             al, 2007]             |
1181	   +--------+----------------------+-----------------------------------+
1182	   | 28-252 |      Unassigned      |             Unassigned            |
1183	   +--------+----------------------+-----------------------------------+
1184	   |   253  |     RFC3692-style    |   Described by RFC 4727 [Fenner,  |
1185	   |        |     Experiment 1     |               2006]               |
1186	   +--------+----------------------+-----------------------------------+
1187	   |   254  |     RFC3692-style    |   Described by RFC 4727 [Fenner,  |
1188	   |        |     Experiment 2     |               2006]               |
1189	   +--------+----------------------+-----------------------------------+

1191	                           Table 1: TCP Options

1193	   There are two cases for the format of a TCP option:

1195	   o  Case 1: A single byte of option-kind.

1197	   o  Case 2: An option-kind byte, followed by an option-length byte,
1198	      and the actual option-data bytes.

1200	   In Case 2 options, the option-length byte counts the option-kind
1201	   byte and the option-length byte, as well as the actual option-data
1202	   bytes.

1204	   All options except "End of Option List" (Kind = 0) and "No Operation"
1205	   (Kind = 1) are of "Case 2".

1207	   For options that belong to the "Case 2" described above, the
1208	   following checks MUST be performed:

1210	                             option-length >= 2

1212	              option-offset + option-length <= Data Offset * 4

1214	   Where option-offset is the offset of the first byte of the option
1215	   within the TCP header, with the first byte of the TCP header being
1216	   assigned an offset of 0.

1218	   If a TCP segment fails to pass any of these checks, it SHOULD be
1219	   silently dropped.

1221	   TCP MUST ignore unknown TCP options, provided they pass the
1222	   validation checks specified above.  In the same way, middle-boxes
1223	   such as packet filters SHOULD NOT reject TCP segments containing
1224	   "unknown" TCP options that pass the validation checks described
1225	   earlier in this Section.

1227	   DISCUSSION:

1229	      The value "2" in the first equation accounts for the option-kind
1230	      byte and the option-length byte, and assumes zero bytes of option-
1231	      data.  This check prevents, among other things, loops in option
1232	      processing that may arise from incorrect option lengths.

1234	      The second equation takes into account the limit on the legitimate
1235	      option length imposed by the syntax of the TCP header, and is
1236	      meant to detect forged option-length values that might make an
1237	      option overlap with the TCP payload, or even go past the actual
1238	      end of the TCP segment carrying the option.
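
	      The two checks can be folded into an option-parsing loop along
	      the following lines.  This is a sketch under stated assumptions:
	      the function name is hypothetical, the header is assumed to be
	      available as a byte string starting at offset 0, and the extra
	      test that the option-length byte itself lies within the header
	      is an addition of this example rather than a check listed above.

```python
from typing import List, Optional, Tuple

KIND_EOL = 0  # End of Option List
KIND_NOP = 1  # No-Operation


def parse_tcp_options(header: bytes,
                      data_offset: int) -> Optional[List[Tuple[int, bytes]]]:
    """Return (kind, option-data) pairs, or None if a check fails.

    data_offset is the Data Offset field, in 32-bit words.
    """
    header_len = data_offset * 4
    if data_offset < 5 or len(header) < header_len:
        return None
    options = []
    offset = 20  # options start right after the fixed 20-byte header
    while offset < header_len:
        kind = header[offset]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            offset += 1  # Case 1: a single option-kind byte
            continue
        # Case 2: the option-length byte must be inside the header
        # (assumption of this sketch).
        if offset + 1 >= header_len:
            return None
        length = header[offset + 1]
        if length < 2:
            return None  # would otherwise loop forever or move backwards
        if offset + length > header_len:
            return None  # option would overlap the payload
        options.append((kind, header[offset + 2:offset + length]))
        offset += length
    return options
```

	      A return value of None corresponds to a segment that fails
	      validation.  Unknown option kinds are returned like any other
	      option, so that the caller can ignore them.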

1240	   Middle-boxes such as packet filters SHOULD NOT reject TCP segments
1241	   containing unknown options solely because these options have not been
1242	   present in the SYN/SYN-ACK handshake.

1244	   DISCUSSION:

1246	      There is renewed interest in defining new TCP options for purposes
1247	      like improved connection management and maintenance, advanced
1248	      congestion control schemes, and security features.  The evolution
1249	      of the TCP/IP protocol suite would be severely impacted by
1250	      obstacles to deploying such new protocol mechanisms.

1258	      In the past, TCP enhancements based on TCP options have regularly
1259	      specified the exchange of a specific "enabling" option during the
1260	      initial SYN/SYN-ACK handshake.  Because the severely limited TCP
1261	      option space has already become a concern, it should be expected
1262	      that future specifications might introduce new options that are
1263	      not negotiated or enabled in this way.  Therefore, middle-boxes
1264	      such as packet filters should not reject TCP segments containing
1265	      unknown options solely because these options were not present in
1266	      the SYN/SYN-ACK handshake.

1268	   TCP MUST NOT "echo" in any way unknown TCP options received in
1269	   inbound TCP segments.

1271	   DISCUSSION:

1273	      Some TCP implementations have been known to "echo" unknown TCP
1274	      options received in incoming segments.  Here we stress that TCP
1275	      must not "echo" in any way unknown TCP options received in inbound
1276	      TCP segments.  This is the foundation for the introduction of
1277	      new TCP options, as it ensures unambiguous behavior of systems
1278	      that do not support a new specification.

1280	   Section 4 discusses the security implications of common TCP options.

1282	3.10.  Padding

1284	   The TCP header padding is used to ensure that the TCP header ends and
1285	   data begins on a 32-bit boundary.  The padding is composed of zeros.

1287	3.11.  Data

1289	   The data field contains the upper-layer packet being transmitted by
1290	   means of TCP.  This payload is processed by the application process
1291	   making use of the transport services of TCP.  Therefore, the security
1292	   implications of this field are out of the scope of this document.

1294	4.  Common TCP Options

1296	4.1.  End of Option List (Kind = 0)

1298	   TCP implementations MUST be able to gracefully handle those TCP
1299	   segments in which the End of Option List should have been present,
1300	   but is missing.

1302	   DISCUSSION:

1304	      This option is used to indicate the "end of options" in those
1305	      cases in which the end of options would not coincide with the end
1306	      of the TCP header.

1308	      TCP implementations are required to ignore those options they do
1309	      not implement, and to be able to handle options with illegal
1310	      lengths.  Therefore, TCP implementations should be able to
1311	      gracefully handle those TCP segments in which the End of Option
1312	      List should have been present, but is missing.

1314	      It is interesting to note that some TCP implementations do not use
1315	      the "End of Option List" option for indicating the "end of
1316	      options", but simply pad the TCP header with several "No
1317	      Operation" (Kind = 1) options to meet the header length specified
1318	      by the Data Offset header field.

1320	4.2.  No Operation (Kind = 1)

1322	   The No Operation option is used to allow the sending system to align
1323	   subsequent options on, for example, 32-bit boundaries.

1325	   This option does not have any known security implications.

1327	4.3.  Maximum Segment Size (Kind = 2)

1329	   The Maximum Segment Size (MSS) option is used to indicate to the
1330	   remote TCP endpoint the maximum segment size this TCP is willing to
1331	   receive.

1333	   The following check MUST be performed on a TCP segment that carries
1334	   an MSS option:

1336	                                  SYN == 1

1338	   If the segment does not pass this check, it MUST be silently dropped.

1340	   DISCUSSION:

1342	      As stated in Section 3.1 of RFC 793 [Postel, 1981c], this option
1343	      can only be sent in the initial connection request (i.e., in
1344	      segments with the SYN control bit set).

1346	   TCP MUST check that the option length is 4.  If the option does not
1347	   pass this check, it MUST be dropped.

1349	   The received MSS SHOULD be sanitized as follows:

1351	                       Sanitized_MSS = max(MSS, 536)

1353	   This "sanitized" MSS value SHOULD be used to compute the "effective
1354	   send MSS" by the expression included in Section 4.2.2.6 of RFC 1122
1355	   [Braden, 1989], as follows:

1357	   Eff.snd.MSS = min(Sanitized_MSS+20, MMS_S) - TCPhdrsize - IPoptionsize

1359	   where:

1361	   Sanitized_MSS:
1362	      sanitized MSS value (the value received in the MSS option, with an
1363	      enforced minimum value)

1365	   MMS_S:
1366	      maximum size for a transport-layer message that TCP may send

1368	   TCPhdrsize:
1369	      size of the TCP header, which is typically 20 bytes, but may be
1370	      larger if TCP options are to be sent.

1372	   IPoptionsize:
1373	      size of any IP options that TCP will pass to the IP layer with the
1374	      current message.
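
	   The sanitization step and the "effective send MSS" expression
	   can be illustrated with the following sketch.  The function
	   names and the default argument values are assumptions made for
	   the example.

```python
MIN_MSS = 536  # enforced lower limit on the received MSS


def sanitize_mss(mss: int) -> int:
    """Sanitized_MSS = max(MSS, 536)."""
    return max(mss, MIN_MSS)


def effective_send_mss(mss: int, mms_s: int,
                       tcp_hdr_size: int = 20,
                       ip_option_size: int = 0) -> int:
    """Eff.snd.MSS = min(Sanitized_MSS+20, MMS_S) - TCPhdrsize - IPoptionsize."""
    sanitized = sanitize_mss(mss)
    return min(sanitized + 20, mms_s) - tcp_hdr_size - ip_option_size
```

	   For example, with MMS_S = 1480 and no TCP or IP options, a
	   received MSS of 0 yields an effective send MSS of 536 rather
	   than 0, while a received MSS of 1460 is used unchanged.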

1376	   DISCUSSION:

1378	      The advertised maximum segment size may be the result of the
1379	      consideration of a number of factors.  Firstly, if fragmentation
1380	      is employed, the size of the IP reassembly buffer may impose a
1381	      limit on the maximum TCP segment size that can be received.
1382	      Considering that the minimum IP reassembly buffer size is 576
1383	      bytes, if an MSS option is not included in the connection-
1384	      establishment phase, an MSS of 536 bytes should be assumed.
1385	      Secondly, if Path-MTU Discovery (specified in RFC 1191 [Mogul and
1386	      Deering, 1990] and RFC 1981 [McCann et al, 1996]) is expected to
1387	      be used for the connection, an artificial maximum segment size may
1388	      be enforced by a TCP to prevent the remote peer from sending TCP
1389	      segments which would be too large to be transmitted without
1390	      fragmentation.  Finally, a system connected by a low-speed link
1391	      may choose to introduce an artificial maximum segment size to
1392	      enforce an upper limit on the network latency that would otherwise
1393	      negatively affect its interactive applications [Stevens, 1994].

1395	      The TCP specifications do not impose any requirements on the
1396	      maximum segment size value that is included in the MSS option.
1397	      However, there are a number of values that may cause undesirable
1398	      results.  Firstly, an MSS of 0 could possibly "freeze" the TCP
1399	      connection, as it would not allow data to be included in the
1400	      payload of the TCP segments.  Secondly, low values other than 0
1401	      would degrade the performance of the TCP connection (wasting more
1402	      bandwidth in protocol headers than in actual data), and could
1403	      potentially exhaust processing cycles at the sending TCP and/or
1404	      the receiving TCP by producing an increase in the interrupt rate
1405	      caused by the transmitted (or received) packets.

1407	      The problems that might arise from low MSS values were first
1408	      described by [Reed, 2001].  However, the community did not reach
1409	      consensus on how to deal with these issues at that point.

1411	      RFC 791 [Postel, 1981a] requires IP implementations to be able to
1412	      receive IP datagrams of at least 576 bytes.  Assuming an IPv4
1413	      header of 20 bytes, and a TCP header of 20 bytes, there should be
1414	      room in each IP packet for 536 application data bytes.

1416	      There are two cases to analyze when considering the possible
1417	      interoperability impact of sanitizing the received MSS value: TCP
1418	      connections relying on IP fragmentation and TCP connections
1419	      implementing Path-MTU Discovery.  In case the corresponding TCP
1420	      connection relies on IP fragmentation, given that the minimum
1421	      reassembly buffer size is required to be 576 bytes by RFC 791
1422	      [Postel, 1981a], the adoption of 536 bytes as a lower limit is
1423	      safe.

1425	      In case the TCP connection relies on Path-MTU Discovery, imposing
1426	      a lower limit on the adopted MSS may ignore the advice of the
1427	      remote TCP on the maximum segment size that can possibly be
1428	      transmitted without fragmentation.  As a result, this could cause
1429	      the first TCP data segment to be larger than the Path-MTU.
1430	      However, in such a scenario, the TCP segment should elicit an ICMP
1431	      Unreachable "fragmentation needed and DF bit set" error message
1432	      that would cause the "effective send MSS" (Eff.snd.MSS) to be
1433	      decreased appropriately.  Thus, imposing a lower limit on the
1434	      accepted MSS will not cause any interoperability problems.

1436	      A possible scenario exists in which the proposed enforcement of a
1437	      lower limit on the received MSS might lead to an interoperability
1438	      problem.  If a system was attached to the network by means of a
1439	      link with an MTU of less than 576 bytes, and some intermediate
1440	      system either silently dropped (i.e., without sending an ICMP
1441	      error message) packets equal to or larger than 576 bytes, or
1442	      simply filtered ICMP "fragmentation needed and DF bit set" error
1443	      messages, the proposed behavior would lead to an interoperability
1444	      problem when communication could have otherwise succeeded.
1445	      However, the interoperability problem would really be introduced
1446	      by the network setup (e.g., the middle-box silently dropping
1447	      packets), rather than by the mechanism proposed in this section.
1448	      In any case, TCP should nevertheless implement a mechanism such as
1449	      that specified by RFC 4821 [Mathis and Heffner, 2007] to deal with
1450	      this type of "network black-holes".

1453	4.4.  Selective Acknowledgement Option

1455	   The Selective Acknowledgement option provides an extension to allow
1456	   the acknowledgement of individual segments, to enhance TCP's loss
1457	   recovery.

1459	   Two options are involved in the SACK mechanism.  The "SACK-permitted
1460	   option" is sent during the connection-establishment phase, to
1461	   advertise that SACK is supported.  If both TCP peers agree to use
1462	   selective acknowledgements, the actual selective acknowledgements are
1463	   sent, if needed, by means of "SACK options".

1465	4.4.1.  SACK-permitted Option (Kind = 4)

1467	   The SACK-permitted option is meant to advertise that the TCP sending
1468	   this segment supports Selective Acknowledgements.

1470	   The following check MUST be performed on a TCP segment that carries a
1471	   SACK-permitted option:

1473	                                  SYN == 1

1475	   If a segment does not pass this check, it MUST be silently dropped.

1477	   DISCUSSION:

1479	      The SACK-permitted option can be sent only in SYN segments.

1481	   TCP MUST check that the option length is 2.  If the option does not
1482	   pass this check, it MUST be silently dropped.

1484	4.4.2.  SACK Option (Kind = 5)

1486	   The SACK option is used to convey extended acknowledgment information
1487	   from the receiver to the sender over an established TCP connection.
1488	   The option consists of an option-kind byte (which must be 5), an
1489	   option-length byte, and a variable number of SACK blocks.

1491	   TCP MUST silently discard those TCP segments carrying a SACK option
1492	   that does not pass the following check:

1494	              option-offset + option-length <= Data Offset * 4

1496	   TCP MUST silently discard those TCP segments carrying a SACK option
1497	   that does not pass the following check:

1499	                            option-length >= 10

1501	   DISCUSSION:

1503	      A SACK Option with zero SACK blocks is nonsensical.  The value
1504	      "10" accounts for the option-kind byte, the option-length byte, a
1505	      4-byte left-edge field, and a 4-byte right-edge field.

1507	   TCP MUST silently discard those TCP segments carrying a SACK option
1508	   that does not pass the following check:

1510	                        (option-length - 2) % 8 == 0

1512	   DISCUSSION:

1514	      As stated in Section 3 of RFC 2018 [Mathis et al, 1996], a SACK
1515	      option that specifies n blocks will have a length of 8*n+2.

1517	   TCP MUST silently discard those TCP segments carrying a SACK option
1518	   that contains a SACK block that does not pass the following check:

1520	                  Left Edge of Block < Right Edge of Block

1522	   As in all the other occurrences in this document, all comparisons
1523	   between sequence numbers should be performed using sequence number
1524	   arithmetic.

1526	   DISCUSSION:

1528	      Each block included in a SACK option represents a number of
1529	      received data bytes that are contiguous and isolated; that is, the
1530	      bytes just below the block, (Left Edge of Block - 1), and just
1531	      above the block, (Right Edge of Block), have not yet been
1532	      received.
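
	   The SACK option checks listed in this section could be combined
	   as in the following sketch.  The function names are hypothetical,
	   and the caller is assumed to pass exactly the bytes of one option
	   (kind, length, and blocks) after the generic option checks of
	   Section 3.9 have succeeded.

```python
import struct
from typing import List, Optional, Tuple


def seq_lt(a: int, b: int) -> bool:
    """True if a precedes b in 32-bit sequence number arithmetic."""
    return a != b and ((b - a) & 0xFFFFFFFF) < 0x80000000


def parse_sack_option(option: bytes) -> Optional[List[Tuple[int, int]]]:
    """Return the (left, right) SACK blocks, or None if a check fails."""
    if len(option) < 2 or option[0] != 5:
        return None
    length = option[1]
    if length != len(option):
        return None  # caller did not slice the option as assumed
    if length < 10:
        return None  # a SACK option needs at least one block
    if (length - 2) % 8 != 0:
        return None  # n blocks always occupy 8*n + 2 bytes
    blocks = []
    for left, right in struct.iter_unpack("!II", option[2:]):
        if not seq_lt(left, right):
            return None  # Left Edge must precede Right Edge
        blocks.append((left, right))
    return blocks
```

	   Note that seq_lt() accepts a block whose edges straddle the
	   sequence number wrap-around point, which a plain integer
	   comparison would wrongly reject.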

1534	   TCP MUST enforce a limit on the number of SACK blocks that a TCP will
1535	   store in memory for each connection at any time.

1537	   DISCUSSION:

1539	      The TCP receiving a SACK option is expected to keep track of the
1540	      selectively-acknowledged blocks.  Even when space in the TCP
1541	      header is limited (and thus each TCP segment can selectively-
1542	      acknowledge at most four blocks of data), an attacker could try to
1543	      perform a buffer overflow or a resource-exhaustion attack by
1544	      sending a large number of SACK options.

1546	      For example, an attacker could send a large number of SACK
1547	      options, each of them acknowledging one byte of data.
1548	      Additionally, for the purpose of wasting resources on the attacked
1549	      system, each of these blocks would be separated from each other by
1550	      one byte, to prevent the attacked system from coalescing two (or
1551	      more) contiguous SACK blocks into a single SACK block.  If the
1552	      attacked system kept track of each SACKed block by storing both
1553	      the Left Edge and the Right Edge of the block, then for each
1554	      window of data, the attacker could waste up to 4 * Window bytes of
1555	      memory at the attacked TCP.

1557	      The value "4 * Window" results from the expression "(Window / 2) *
1558	      8", in which the value "2" accounts for the 1-byte block
1559	      selectively-acknowledged by each SACK block and 1 byte that would
1560	      be used to separate SACK blocks from each other, and the
1561	      value "8" accounts for the 8 bytes needed to store the Left Edge
1562	      and the Right Edge of each SACKed block.

1564	      Therefore, it is clear that a limit should be imposed on the
1565	      number of SACK blocks that a TCP will store in memory for each
1566	      connection at any time.  Measurements in [Dharmapurikar and
1567	      Paxson, 2005] indicate that in the vast majority of cases
1568	      connections have a single hole in the data stream at any given
1569	      time.  Thus, a limit of 16 SACK blocks for each connection would
1570	      accommodate even most of the unusual cases in which there is more
1571	      than one hole at a time.
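
	      The arithmetic behind these figures can be reproduced with the
	      short sketch below.  The function names are hypothetical; the
	      8-byte cost per block and the limit of 16 blocks are the values
	      used in the discussion above.

```python
BYTES_PER_BLOCK = 8   # 4-byte Left Edge plus 4-byte Right Edge
MAX_SACK_BLOCKS = 16  # per-connection limit discussed in the text


def worst_case_sack_bytes(window: int) -> int:
    """Memory used if every other byte of the window is SACKed."""
    blocks = window // 2  # 1 SACKed byte plus 1 separating byte per block
    return blocks * BYTES_PER_BLOCK


def capped_sack_bytes(window: int, limit: int = MAX_SACK_BLOCKS) -> int:
    """Memory used when at most `limit` blocks are stored."""
    return min(window // 2, limit) * BYTES_PER_BLOCK
```

	      For a window of 65536 bytes, the unbounded worst case is 262144
	      bytes (4 * Window) per connection, whereas the capped figure is
	      128 bytes.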

1573	4.5.  MD5 Option (Kind=19)

1575	   The TCP MD5 option provides a mechanism for authenticating TCP
1576	   segments with a 16-byte digest produced by the MD5 algorithm.  The
1577	   option consists of an option-kind byte (which must be 19), an option-
1578	   length byte (which must be 18), and a 16-byte MD5 digest.

1580	   TCP MUST silently drop a TCP segment that carries a TCP MD5 option
1581	   that does not pass the following checks:

1583	              option-offset + option-length <= Data Offset * 4

1585	                            option-length == 18

1587	   DISCUSSION:

1589	      The TCP MD5 option is of "Case 2", and has a fixed length.

1591	   DISCUSSION:

1593	      A basic weakness of the TCP MD5 option is that the MD5 algorithm
1594	      itself has been known (for a long time) to be vulnerable to
1595	      collision search attacks.

1597	      [Bellovin, 2006] argues that it has two other weaknesses, namely
1598	      that it does not provide a key identifier, and that it has no
1599	      provision for automated key management.  However, it is generally
1600	      accepted that while a Key-ID field can be a good approach for
1601	      providing smooth key rollover, it is not actually a requirement.
1602	      For instance, most systems implementing the TCP MD5 option include
1603	      a "keychain" mechanism that fully supports smooth key rollover.
1604	      Additionally, with some further work, ISAKMP/IKE could be used to
1605	      configure the MD5 keys.

1607	      It is interesting to note that while the TCP MD5 option, as
1608	      specified by RFC 2385 [Heffernan, 1998], addresses the TCP-based
1609	      forgery attacks against TCP discussed in Section 11, it does not
1610	      address the ICMP-based connection-reset attacks discussed in
1611	      Section 15.  As a result, while a TCP connection may be protected
1612	      from TCP-based forgery attacks by means of the MD5 option, an
1613	      attacker might still be able to successfully perform the ICMP-
1614	      based counterpart.

1616	      The TCP MD5 option has been obsoleted by the TCP-AO.

1618	4.6.  Window scale option (Kind = 3)

1620	   The window scale option provides a mechanism to expand the definition
1621	   of the TCP window to 32 bits, such that the performance of TCP can be
1622	   improved in some network scenarios.  The Window scale option consists
1623	   of an option-kind byte (which must be 3), followed by an option-
1624	   length byte (which must be 3), and a shift count (shift.cnt) byte
1625	   (the actual option-data).

1627	   The option may be sent only in the initial SYN segment, but may also
1628	   be sent in a SYN/ACK segment if the option was received in the
1629	   initial SYN segment.  If the option is received in any other segment,
1630	   it MUST be silently dropped.

1632	   TCP MUST silently discard TCP segments that contain a Window scale
1633	   option whose option-length is not 3.

1635	   DISCUSSION:

1637	      This option has a fixed length.

1639	   TCP MUST silently discard TCP segments that contain a Window scale
1640	   option that does not pass the following check:

1642	                              shift.cnt <= 14

1644	   DISCUSSION:

1646	      As discussed in Section 2.3 of RFC 1323 [Jacobson et al, 1992], in
1647	      order to prevent new data from being mistakenly considered as old
1648	      and vice versa, the resulting window should be smaller than 2^30
1649	      bytes.
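
	      The checks on the Window scale option might be sketched as
	      follows.  The function name is hypothetical, and the caller is
	      assumed to pass exactly the three bytes of the option together
	      with the state of the SYN bit.

```python
from typing import Optional

KIND_WSOPT = 3
WSOPT_LENGTH = 3
MAX_SHIFT = 14


def window_scale_shift(option: bytes, syn: bool) -> Optional[int]:
    """Return shift.cnt if the option passes the checks, else None."""
    if not syn:
        return None  # only valid in SYN and SYN/ACK segments
    if len(option) != WSOPT_LENGTH or option[0] != KIND_WSOPT:
        return None
    if option[1] != WSOPT_LENGTH:
        return None  # the option has a fixed length of 3
    shift = option[2]
    return shift if shift <= MAX_SHIFT else None
```

	      The sketch does not distinguish between the failure cases; a
	      caller would ignore the option when the SYN bit is not set, and
	      discard the segment when the length or shift.cnt check fails.
	      With shift.cnt limited to 14, the largest scaled window is
	      65535 * 2^14 bytes, which is below 2^30.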

1651	   DISCUSSION:

1653	      [Welzl, 2008] describes major problems with the use of the Window
1654	      scale option in the Internet due to faulty equipment.

1656	      While there are no known security implications arising from the
1657	      window scale mechanism itself, the size of the TCP window has a
1658	      number of security implications.  In general, larger window sizes
1659	      increase the chances of an attacker successfully performing
1660	      forgery attacks against TCP, such as those described in Section 11
1661	      of this document.  Additionally, large windows can exacerbate the
1662	      impact of resource exhaustion attacks such as those described in
1663	      Section 7 of this document.

1665	      Section 3.7 provides a general discussion of the security
1666	      implications of the TCP window size.  Section 7.3.2 discusses the
1667	      security implications of automatic receive-buffer tuning
1668	      mechanisms.

1670	4.7.  Timestamps option (Kind = 8)

1672	   The Timestamps option, specified in RFC 1323 [Jacobson et al, 1992],
1673	   is used to perform two functions: Round-Trip Time Measurement (RTTM),
1674	   and Protection Against Wrapped Sequence Numbers (PAWS).

1676	   TCP MUST silently discard TCP segments that contain a Timestamps
1677	   option that does not pass the following check:

1679	                            option-length == 10

1681	   DISCUSSION:

1683	      As specified by RFC 1323, the option-length must be 10.

1685	4.7.1.  Generation of timestamps

1687	   TCP SHOULD generate timestamps with the following expression:

1689	   timestamp = T() + F(localhost, localport, remotehost, remoteport, secret_key)

1691	   where the result of T() is a global system clock that complies with
1692	   the requirements of Section 4.2.2 of RFC 1323 [Jacobson et al, 1992],
1693	   and F() is a function that should not be computable from the outside.
1694	   Therefore, we suggest that F() be a cryptographic hash function of
1695	   the connection-id and some secret data.

1697	   DISCUSSION:

1699	      For the purpose of PAWS, the timestamps sent on a connection are
1700	      required to be monotonically increasing.  While there is no
1701	      requirement that timestamps are monotonically increasing across
1702	      TCP connections, the generation of timestamps such that they are
1703	      monotonically increasing across connections between the same two
1704	      endpoints allows the use of timestamps for improving the handling
1705	      of SYN segments that are received while the corresponding four-
1706	      tuple is in the TIME-WAIT state.  This is discussed in Section
1707	      11.1.2 of this document.

1709	      F() provides an offset that will be the same for all incarnations
1710	      of a connection between the same two endpoints, while T() provides
1711	      the monotonically increasing values that are needed for PAWS.

1713	      Further discussion about this algorithm is available in
1714	      [I-D.gont-timestamps-generation].
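
   As an illustration only, the expression above could be sketched as
   shown below.  Several details are assumptions made for the sake of
   the example and are not mandated by this document: HMAC-SHA-256 as
   the keyed hash behind F(), a 1 ms tick for T(), IPv4 addresses in
   dotted-quad form, and a secret key drawn once from os.urandom().

```python
import hashlib
import hmac
import os
import socket
import struct
import time

SECRET_KEY = os.urandom(16)   # assumption: chosen once, e.g. at boot
TICK_HZ = 1000                # assumption: 1 ms timestamp clock
MASK32 = 0xFFFFFFFF           # timestamps are 32-bit values


def T() -> int:
    """Global timestamp clock, in ticks of 1/TICK_HZ seconds."""
    return int(time.monotonic() * TICK_HZ) & MASK32


def F(localhost, localport, remotehost, remoteport, secret_key) -> int:
    """Per-four-tuple offset derived from a keyed hash of the connection-id."""
    conn_id = (socket.inet_aton(localhost) + struct.pack("!H", localport) +
               socket.inet_aton(remotehost) + struct.pack("!H", remoteport))
    digest = hmac.new(secret_key, conn_id, hashlib.sha256).digest()
    # Keep the first 32 bits of the digest as the offset.
    return struct.unpack("!I", digest[:4])[0]


def timestamp(localhost, localport, remotehost, remoteport,
              secret_key=SECRET_KEY) -> int:
    """timestamp = T() + F(...), reduced modulo 2^32."""
    offset = F(localhost, localport, remotehost, remoteport, secret_key)
    return (T() + offset) & MASK32
```

   Because F() depends only on the four-tuple and the secret, every
   incarnation of a connection between the same two endpoints gets the
   same offset, while different four-tuples see unrelated timestamp
   sequences.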

1716	   TCP SHOULD NOT initialize a global timestamp counter to a fixed value
1717	   when the system is bootstrapped.

1719	   DISCUSSION:

1721	      Some implementations are known to initialize their global
1722	      timestamp clock to zero when the system is bootstrapped.  This is
1723	      undesirable, as the timestamp clock would disclose the system
1724	      uptime.

1726	   TCP SHOULD set the Timestamp Echo Reply (TSecr) field to zero when
1727	   sending a TCP segment that does not have the ACK bit set (i.e., a SYN
1728	   segment).

1730	   DISCUSSION:

1732	      Some TCP implementations have been found to fail to set the
1733	      Timestamp Echo Reply field (TSecr) to zero in TCP segments that do
1734	      not have the ACK bit set, thus potentially leaking information.

1736	4.7.2.  Vulnerabilities

1738	   Blind In-Window Attacks

1740	   Segments that contain a timestamp value smaller than the last
1741	   timestamp value recorded by TCP are silently dropped.  This allows
1742	   for a subtle attack against TCP that would allow an attacker to cause
1743	   one direction of data transfer of the attacked connection to freeze
1744	   [US-CERT, 2005c].  An attacker could forge a TCP segment that
1745	   contains a timestamp that is much larger than the last timestamp
1746	   recorded for that direction of the data transfer of the connection.
1747	   The offending segment would cause the recorded timestamp (TS.Recent)
1748	   to be updated and, as a result, subsequent segments sent by the
1749	   impersonated TCP peer would be simply dropped by the receiving TCP.
1750	   This vulnerability has been documented in [US-CERT, 2005d].  However,
1751	   it is worth noting that exploitation of this vulnerability requires
1752	   an attacker to guess (or know) the four-tuple {IP Source Address, IP
1753	   Destination Address, TCP Source Port, TCP Destination Port}, as well
1754	   as a valid Sequence Number and a valid Acknowledgement Number.  If an
1755	   attacker has such detailed knowledge about a TCP connection, unless
1756	   TCP segments are protected by proper authentication mechanisms (such
1757	   as IPsec [Kent and Seo, 2005]), he can perform a variety of attacks
1758	   against the TCP connection, even more devastating than the one just
1759	   described.
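
   The effect of the forged segment can be illustrated with a toy
   model.  The sketch below is deliberately simplified and rests on
   stated assumptions: the Receiver class and the ts_lt() helper are
   hypothetical, only the timestamp check is modelled, and the forged
   segment is assumed to already carry a valid four-tuple, Sequence
   Number, and Acknowledgement Number.

```python
MASK32 = 0xFFFFFFFF
HALF = 1 << 31


def ts_lt(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b (modular comparison)."""
    return 0 < ((b - a) & MASK32) < HALF


class Receiver:
    """Toy receiver that applies only the timestamp check."""

    def __init__(self, ts_recent: int):
        self.ts_recent = ts_recent

    def accept(self, tsval: int) -> bool:
        if ts_lt(tsval, self.ts_recent):
            return False            # segment is silently dropped
        self.ts_recent = tsval      # simplified TS.Recent update
        return True


rx = Receiver(ts_recent=1000)
legit_ok = rx.accept(1001)                # genuine segment is accepted
forged_ok = rx.accept(1001 + (1 << 30))   # forged, much larger timestamp
frozen = not rx.accept(1002)              # genuine peer is now dropped
```

   After the forged segment is accepted, TS.Recent is far ahead of the
   genuine peer's clock, so the peer's subsequent segments fail the
   check until its timestamps catch up.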

1761	   Information leaking

1763	   Some implementations are known to maintain a global timestamp clock,
1764	   which is used for all connections.  This is undesirable, as an
1765	   attacker that can establish a connection with a host would learn the
1766	   timestamp used for all the other connections maintained by that host,
1767	   which could be useful for performing any attacks that require the
1768	   attacker to forge TCP segments.  A timestamps generator such as the
1769	   one recommended in Section 4.7.1 of this document would prevent this
1770	   information leakage, as it separates the "timestamps space" among the
1771	   different TCP connections.

1773	   Some implementations are known to initialize their global timestamp
1774	   clock to zero when the system is bootstrapped.  This is undesirable,
1775	   as the timestamp clock would disclose the system uptime.  A
1776	   timestamps generator such as the one recommended in Section 4.7.1 of
1777	   this document would prevent this information leakage, as the function
1778	   F() introduces an "offset" that does not disclose the system uptime.

1780	   As discussed in Section 3.2 of RFC 1323 [Jacobson et al, 1992], the
1781	   Timestamp Echo Reply field (TSecr) is only valid if the ACK bit of
1782	   the TCP header is set, and its value must be zero when it is not
1783	   valid.  However, some TCP implementations have been found to fail to
1784	   set the Timestamp Echo Reply field (TSecr) to zero in TCP segments
1785	   that do not have the ACK bit set, thus potentially leaking
1786	   information.  We stress that TCP implementations should comply with
1787	   RFC 1323 by setting the Timestamp Echo Reply field (TSecr) to zero in
1788	   those TCP segments that do not have the ACK bit set, thus eliminating
1789	   this potential information leakage.

1791	   Finally, it should be noted that the Timestamps option can be
1792	   exploited to count the number of systems behind NATs (Network Address
1793	   Translators) [Srisuresh and Egevang, 2001].  An attacker could count
1794	   the number of systems behind a NAT by establishing a number of TCP
1795	   connections (using the public address of the NAT) and identifying
1796	   the number of different timestamp sequences.  This information
1797	   leakage could be eliminated by rewriting the contents of the
1798	   Timestamps option at the NAT.  [Gont and Srisuresh, 2008] provides a
1799	   detailed discussion of the security implications of NATs, and
1800	   proposes mitigations for this and other issues.

1802	5.  Connection-establishment mechanism

1804	   The following subsections describe a number of attacks that can be
1805	   performed against TCP by exploiting its connection-establishment
1806	   mechanism.

1808	5.1.  SYN flood

1810	   TCP SHOULD implement (and enable by default) a syn-cache [Lemon,
1811	   2002].

1813	   TCP SHOULD implement syn-cookies, and SHOULD enable them only after a
1814	   specified number of TCBs has been allocated for connections in the
1815	   SYN-RECEIVED state.

1817	   DISCUSSION:

1819	      TCP uses a mechanism known as the "three-way handshake" for the
1820	      establishment of a connection between two TCP peers.  RFC 793
1821	      [Postel, 1981c] states that when a TCP that is in the LISTEN state
1822	      receives a SYN segment (i.e., a TCP segment with the SYN flag
1823	      set), it must transition to the SYN-RECEIVED state, record the
1824	      control information (e.g., the ISN) contained in the SYN segment
1825	      in a Transmission Control Block (TCB), and respond with a SYN/ACK
1826	      segment.

1828	      A Transmission Control Block is the data structure used to store
1829	      (usually within the kernel) all the information relevant to a TCP
1830	      connection.  The concept of "TCB" is introduced in the core TCP
1831	      specification RFC 793 [Postel, 1981c].

1833	      In practice, virtually all existing implementations do not modify
1834	      the state of the TCP that was in the LISTEN state, but rather
1835	      create a new TCP (i.e., a new "protocol machine"), and perform all
1836	      the state transitions on this newly-created TCP.  This allows the
1837	      application running on top of TCP to service more than one
1838	      client at the same time.  As a result, each connection request
1839	      results in the allocation of system memory to store the TCB
1840	      associated with the newly created TCP.

1842	      If TCP were implemented strictly as described in RFC 793, the
1843	      application running on top of TCP would have to finish servicing
1844	      the current client before being able to service the next one in
1845	      line, or would instead have to perform some kind of connection
1846	      hand-off.

1848	      An attacker could exploit TCP's connection-establishment mechanism
1849	      to perform a Denial of Service (DoS) attack, by sending a large
1850	      number of connection requests to the target system, with the
1851	      intent of exhausting the system memory destined for storing TCBs
1852	      (or related kernel data structures), thus preventing the attacked
1853	      system from establishing new connections with legitimate users.
1854	      This attack is widely known as "SYN flood", and received a lot
1855	      of attention during the late 90's [CERT, 1996].

1857	      Given that the attacker does not need to complete the three-way
1858	      handshake for the attacked system to tie system resources to the
1859	      newly created TCBs, he will typically forge the source IP address
1860	      of the malicious SYN segments he sends, thus concealing his own IP
1861	      address.

1863	      If the forged IP addresses corresponded to some reachable system,
1864	      the impersonated system would receive the SYN/ACK segment sent by
1865	      the attacked host (in response to the forged SYN segment), which
1866	      would elicit an RST segment.  This RST segment would be delivered
1867	      to the attacked system, causing the corresponding connection to be
1868	      aborted, and the corresponding TCB to be removed.

1870	      As the impersonated host would not have any state information for
1871	      the TCP connection being referred to by the SYN/ACK segment, it
1872	      would respond with a RST segment, as specified by the TCP segment
1873	      processing rules of RFC 793 [Postel, 1981c].

1875	      However, if the forged IP source addresses were unreachable, the
1876	      attacked TCP would continue retransmitting the SYN/ACK segment
1877	      corresponding to each connection request, until timing out and
1878	      aborting the connection.  For this reason, a number of widely
1879	      available attack tools first check whether each of the (forged) IP
1880	      addresses is reachable by sending an ICMP echo request to it.
1881	      The receipt of an ICMP echo response is considered an indication
1882	      of the IP address being reachable (and thus results in the
1883	      corresponding IP address not being used for performing the
1884	      attack), while the receipt of an ICMP unreachable error message is
1885	      considered an indication of the IP address being unreachable (and
1886	      thus results in the corresponding IP address being used for
1887	      performing the attack).

1889	      [Gont, 2008b] describes how the so-called ICMP soft errors could
1890	      be used by TCP to abort connections in any of the non-synchronized
1891	      states.  While implementation of the mechanism described in that
1892	      document would certainly not eliminate the vulnerability of TCP to
1893	      SYN flood attacks (as the attacker could use addresses that are
1894	      simply "black-holed"), it provides an example of how signaling
1895	      information such as that provided by means of ICMP error messages
1896	      can provide valuable information that a transport protocol could
1897	      use to perform heuristics.

1899	      In order to mitigate the impact of this attack, the amount of
1900	      information stored for non-established connections should be
1901	      reduced (ideally, non-synchronized connections should not require
1902	      any state information to be maintained at the TCP performing the
1903	      passive OPEN).  There are basically two mitigation techniques for
1904	      this vulnerability: a syn-cache and syn-cookies.

1906	      [Borman, 1997] and RFC 4987 [Eddy, 2007] contain a general
1907	      discussion of SYN-flooding attacks and common mitigation
1908	      approaches.

1910	      The syn-cache [Lemon, 2002] approach aims at reducing the amount
1911	      of state information that is maintained for connections in the
1912	      SYN-RECEIVED state, and allocates a full TCB only after the
1913	      connection has transitioned to the ESTABLISHED state.

1915	      The syn-cookie [Bernstein, 1996] approach aims at completely
1916	      eliminating the need to maintain state information at the TCP
1917	      performing the passive OPEN, by encoding the most elementary
1918	      information required to complete the three-way handshake in the
1919	      Sequence Number of the SYN/ACK segment that is sent in response to
1920	      the received SYN segment.  Thus, TCP is relieved from keeping
1921	      state for connections in the SYN-RECEIVED state.
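
   The idea of encoding handshake state in the Sequence Number can be
   sketched as follows.  This is not the exact construction of
   [Bernstein, 1996]; the bit layout (6-bit time counter, 2-bit MSS
   index, 24-bit keyed hash), the use of HMAC-SHA-256, the MSS table,
   and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import struct

MSS_TABLE = [536, 1220, 1440, 1460]   # assumption: illustrative MSS values


def _mac(secret, four_tuple, client_isn, counter, mss_index) -> int:
    """24-bit keyed hash binding the cookie to this connection request."""
    src, sport, dst, dport = four_tuple
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}|{mss_index}"
    digest = hmac.new(secret, msg.encode(), hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0] & 0x00FFFFFF


def make_cookie(secret, four_tuple, client_isn, counter, mss_index) -> int:
    """ISN for the SYN/ACK: 6-bit counter | 2-bit MSS index | 24-bit MAC."""
    mac = _mac(secret, four_tuple, client_isn, counter, mss_index)
    return ((counter & 0x3F) << 26) | ((mss_index & 0x3) << 24) | mac


def check_cookie(secret, four_tuple, client_isn, cookie, now, max_age=2):
    """Validate a cookie recovered from the final ACK (ACK number minus 1).

    Returns the MSS encoded in the cookie, or None if the cookie is
    invalid or older than max_age counter ticks.
    """
    counter_bits = (cookie >> 26) & 0x3F
    mss_index = (cookie >> 24) & 0x3
    for age in range(max_age + 1):
        counter = now - age
        if counter < 0 or (counter & 0x3F) != counter_bits:
            continue
        expected = _mac(secret, four_tuple, client_isn, counter, mss_index)
        if expected == (cookie & 0x00FFFFFF):
            return MSS_TABLE[mss_index]
    return None
```

   Only two bits are left for the MSS in this layout, and nothing at
   all for other options carried in the SYN segment, which shows how
   little room the 32-bit Sequence Number field offers.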

1923	      The syn-cookie approach has a number of drawbacks:

1925	      *  Firstly, given the limited space in the Sequence Number field,
1926	         it is not possible to encode all the information included in
1927	         the initial segment, such as, for example, support of Selective
1928	         Acknowledgements (SACK).

1930	      *  Secondly, in the event that the Acknowledgement segment sent in
1931	         response to the SYN/ACK sent by the TCP that performed the
1932	         passive OPEN (i.e., the TCP server) were lost, the connection
1933	         would end up in the ESTABLISHED state on the client-side, but
1934	         in the CLOSED state on the server side.  This scenario is
1935	         normally handled in TCP by having the TCP server retransmit its
1936	         SYN/ACK.  However, if syn-cookies are enabled, there would be
1937	         no connection state information on the server side, and thus
1938	         the SYN/ACK would never be retransmitted.  This could lead to a
1939	         scenario in which the connection could remain in the
1940	         ESTABLISHED state on the client side, but in the CLOSED state
1941	         at the server side, indefinitely.  If the application protocol
1942	         was such that it required the client to wait for some data from
1943	         the server (e.g., a greeting message) before sending any data
1944	         to the server, a deadlock would take place, with the client
1945	         application waiting for such server data, and the server
1946	         waiting for the TCP three-way handshake to complete.

1948	      *  Thirdly, unless the function used to encode information in the
1949	         SYN/ACK packet is cryptographically strong, an attacker could
1950	         forge TCP connections in the ESTABLISHED state by forging ACK
1951	         segments that would be considered as "legitimate" by the
1952	         receiving TCP.

1954	      *  Fourthly, in those scenarios in which establishment of new
1955	         connections is blocked by simply dropping segments with the SYN
1956	         bit set, use of SYN cookies could allow an attacker to bypass
1957	         the firewall rules, as a connection could be established by
1958	         forging an ACK segment with the correct values, without the
1959	         need to set the SYN bit.

1961	      As a result, syn-cookies are usually not employed as a first line
1962	      of defense against SYN-flood attacks, but only as a last
1963	      resort to cope with them.  For example, some TCP implementations
1964	      enable syn-cookies only after a certain number of TCBs has been
1965	      allocated for connections in the SYN-RECEIVED state.  We recommend
1966	      this implementation technique, with a syn-cache enabled by
1967	      default, and use of syn-cookies triggered, for example, when the
1968	      limit of TCBs for non-synchronized connections with a given port
1969	      number has been reached.
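
   The recommended combination (a syn-cache by default, with
   syn-cookies as a fallback) might be structured as in the sketch
   below.  The Listener class, its method names, and the per-listener
   limit are hypothetical; an actual implementation would also expire
   stale syn-cache entries and send the corresponding SYN/ACK.

```python
class Listener:
    """Toy per-port listener choosing between syn-cache and syn-cookies."""

    def __init__(self, syn_cache_limit: int):
        self.syn_cache_limit = syn_cache_limit
        self.syn_cache = {}   # four-tuple -> minimal SYN-RECEIVED state

    def on_syn(self, four_tuple, client_isn) -> str:
        """Return which mechanism handles this SYN segment."""
        if four_tuple in self.syn_cache:
            return "syn-cache"            # retransmitted SYN, state exists
        if len(self.syn_cache) < self.syn_cache_limit:
            # Below the limit: keep a small record, not a full TCB.
            self.syn_cache[four_tuple] = {"irs": client_isn}
            return "syn-cache"
        # Limit reached: fall back to syn-cookies and store nothing.
        return "syn-cookie"

    def on_established(self, four_tuple) -> None:
        """Handshake completed: the entry leaves the syn-cache."""
        self.syn_cache.pop(four_tuple, None)
```

   With this structure, the drawbacks of syn-cookies only affect
   connection requests that arrive while the listener is under load.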

1971	      It is interesting to note that a SYN-flood attack should only
1972	      affect the establishment of new connections.  A number of books
1973	      and online documents seem to assume that TCP will not be able to
1974	      respond to any TCP segment that is meant for a TCP port that is
1975	      being SYN-flooded (e.g., respond with an RST segment upon receipt
1976	      of a TCP segment that refers to a non-existent TCP connection).
1977	      While SYN-flooding attacks have been successfully exploited in the
1978	      past for achieving such a goal [Shimomura, 1995], as clarified by
1979	      RFC 1948 [Bellovin, 1996] the effectiveness of SYN flood attacks
1980	      to silence a TCP implementation arose as a result of a bug in the
1981	      4.4BSD TCP implementation [Wright and Stevens, 1994], rather than
1982	      from a theoretical property of SYN-flood attacks themselves.
1983	      Therefore, those TCP implementations that do not suffer from such
1984	      a bug should not be silenced as a result of a SYN-flood attack.

1986	      [Zquete, 2002] describes a mechanism that could theoretically
1987	      improve the functionality of SYN cookies.  It exploits the TCP
1988	      "simultaneous open" mechanism, as illustrated in Figure 5.

1990	             See Figure 5, in page 46 of the UK CPNI document.

1992	             Use of TCP simultaneous open for handling SYN floods

1994	      In line 1, TCP A initiates the connection-establishment phase by
1995	      sending a SYN segment to TCP B. In line 2, TCP B creates a SYN
1996	      cookie as described by [Bernstein, 1996], but does not set the ACK
1997	      bit of the segment it sends (thus really sending a SYN segment,
1998	      rather than a SYN/ACK).  This "fools" TCP A into thinking that
1999	      both SYN segments "have crossed each other in the network" as if a
2000	      "simultaneous open" scenario had taken place.  As a result, in
2001	      line 3 TCP A sends a SYN/ACK segment containing the same options
2002	      that were contained in the original SYN segment.  In line 4, upon
2003	      receipt of this segment, TCP B processes the cookie encoded in the
2004	      ACK field as if it had been the result of a traditional SYN cookie
2005	      scenario, and moves the connection into the ESTABLISHED state.  In
2006	      line 5, TCP B sends a SYN/ACK segment, which causes the connection
2007	      at TCP A to move into the ESTABLISHED state.  In line 6, TCP A
2008	      sends a data segment on the connection.

2010	      While this mechanism would work in theory, unfortunately there are
2011	      a number of factors that prevent it from being usable in real
2012	      network environments:

2014	      *  Some systems are not able to perform the "simultaneous open"
2015	         operation specified in RFC 793, and thus the connection
2016	         establishment will fail.

2018	      *  Some firewalls might prevent the establishment of TCP
2019	         connections that rely on the "simultaneous open" mechanism
2020	         (e.g., a given firewall might be allowing incoming SYN/ACK
2021	         segments, but not outgoing SYN/ACK segments).

2023	      Therefore, we do not recommend implementation of this mechanism
2024	      for mitigating SYN-flood attacks.

2026	5.2.  Connection forgery

2028	   The process of causing a TCP connection to be illegitimately
2029	   established between two arbitrary remote peers is usually referred to
2030	   as "connection spoofing" or "connection forgery".  This can have a
2031	   great negative impact when systems establish some sort of trust
2032	   relationships based on the IP addresses used to establish a TCP
2033	   connection [daemon9 et al, 1996].

2035	   It should be stressed that hosts should not establish trust
2036	   relationships based on the IP addresses [CPNI, 2008] or on the TCP
2037	   ports in use for the TCP connection (see Section 3.1 and Section 3.2
2038	   of this document).

2040	   One of the underlying weaknesses that allow this vulnerability to be
2041	   more easily exploited is the use of an inadequate Initial Sequence
2042	   Number (ISN) generator, as explained back in the 80's in [Morris,
2043	   1985].  As discussed in Section 3.3.1 of this document, any TCP
2044	   implementation that makes use of an inadequate ISN generator will be
2045	   more vulnerable to this type of attack.  A discussion of approaches
2046	   for a more careful generation of Initial Sequence Numbers (ISNs) can
2047	   be found in Section 3.3.1 of this document.

2049	   Another attack vector for performing connection-forgery attacks is
2050	   the use of IP source routing.  By forging the Source Address of the
2051	   IP packets that encapsulate the TCP segments of a connection, and
2052	   carefully crafting an IP source route option (i.e., either LSRR or
2053	   SSRR) that includes a system whose traffic he can monitor, an
2054	   attacker could cause the packets sent by the attacked system (e.g.,
2055	   the SYN/ACK segment sent in response to the attacker's SYN segment)
2056	   to be illegitimately directed to him [CPNI, 2008].  Thus, the
2057	   attacker would not even need to guess valid sequence numbers for
2058	   forging a TCP connection, as he would simply have direct access to
2059	   all this information.  As discussed in [CPNI, 2008], it is strongly
2060	   recommended that systems disable IP Source Routing by default, or at
2061	   the very least, they disable source routing for IP packets that
2062	   encapsulate TCP segments.

2064	   The IPv6 Routing Header Type 0, which provides a similar
2065	   functionality to that provided by IPv4 source routing, has been
2066	   officially deprecated by RFC 5095 [Abley et al, 2007].

2068	5.3.  Connection-flooding attack

2070	5.3.1.  Vulnerability

2072	   The creation and maintenance of a TCP connection requires system
2073	   memory to maintain shared state between the local and the remote TCP.
2074	   As system memory is a finite resource, there is a limit on the number
2075	   of TCP connections that a system can maintain at any time.  When the
2076	   TCP API is employed to create a TCP connection with a remote peer, it
2077	   allocates system memory for maintaining shared state with the remote
2078	   TCP peer, and thus the resulting connection would tie a similar
2079	   amount of resources at the remote host as at the local host.
2080	   However, if special packet-crafting tools are employed to forge TCP
2081	   segments to establish TCP connections with a remote peer, the local
2082	   kernel implementation of TCP can be bypassed, and the allocation of
2083	   resources on the attacker's system for maintaining shared state can
2084	   be avoided.  Thus, a malicious user could create a large number of
2085	   TCP connections, and subsequently abandon them, thus tying system
2086	   resources only at the remote peer.  This allows an attacker to create
2087	   a large number of TCP connections at the attacked system with the
2088	   intent of exhausting its kernel memory, without exhausting the
2089	   attacker's own resources.  [CERT, 2000] discusses this vulnerability,
2090	   which is usually referred to as the "Naptha attack".

2092	   This attack is similar in nature to the "Netkill" attack discussed in
2093	   Section 7.1.1.  However, while Netkill ties both TCBs and TCP send
2094	   buffers to the abandoned connections, Naptha only ties TCBs (and
2095	   related kernel structures), as it doesn't issue any application
2096	   requests.

2098	   The symptom of this attack is an extremely large number of TCP
2099	   connections in the ESTABLISHED state, which would tend to exhaust
2100	   system resources and deny service to new clients (or possibly cause
2101	   the system to crash).

2103	   It should be noted that it is possible for an attacker to perform the
2104	   same type of attack causing the abandoned connections to remain in
2105	   states other than ESTABLISHED.  This might be interesting for an
2106	   attacker, as connections in states other
2107	   than ESTABLISHED usually have no controlling user-space process (that
2108	   is, the former controlling process for the connection has already
2109	   closed the corresponding file descriptor).

2111	   A particularly interesting case of a connection-flooding attack that
2112	   aims at abandoning connections in a state other than ESTABLISHED is
2113	   discussed in Section 6.1 of this document.

2115	5.3.2.  Countermeasures

2117	   As with many other resource exhaustion attacks, the problem in
2118	   generating countermeasures for this attack is that it may be
2119	   difficult to differentiate between an actual attack and a legitimate
2120	   high-load scenario.  However, there are a number of countermeasures
2121	   which, when tuned for each particular network environment, could
2122	   allow a system to resist this attack and continue servicing
2123	   legitimate clients.

2125	   Hosts SHOULD enforce limits on the number of TCP connections with no
2126	   user-space controlling process.

2128	   DISCUSSION:

2130	      Connections in states other than ESTABLISHED usually have no user-
2131	      space controlling process.  This prevents the application making
2132	      use of those connections from enforcing limits on the maximum
2133	      number of ongoing connections (either on a global basis or a
2134	      per-IP address basis).  When resource exhaustion is imminent or
2135	      some threshold of ongoing connections is reached, the operating
2136	      system should consider freeing system resources by aborting
2137	      connections that have no user-space controlling process.  A number
2138	      of such connections could be aborted on a random basis, or based
2139	      on some heuristics performed by the operating system (e.g., first
2140	      abort connections with peers that have the largest number of
2141	      ongoing connections with no user-space controlling process).
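
   The example heuristic mentioned above (first abort connections with
   the peers that hold the most connections lacking a user-space
   controlling process) could look like the following sketch.  The
   function name and the representation of a connection as a
   (peer address, connection id) pair are assumptions made for
   illustration.

```python
from collections import Counter


def select_orphans_to_abort(orphans, how_many):
    """Pick connections to abort, heaviest peers first.

    orphans: list of (peer_address, connection_id) pairs describing
             connections with no user-space controlling process.
    """
    per_peer = Counter(peer for peer, _ in orphans)
    # Stable sort: within a peer, the original order is preserved.
    ranked = sorted(orphans, key=lambda conn: per_peer[conn[0]],
                    reverse=True)
    return ranked[:how_many]
```

   A random selection, which the text also allows, would replace the
   sort with a call such as random.sample(orphans, how_many).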

2143	   Hosts SHOULD enforce per-process and per-user limits on maximum
2144	   kernel memory that can be used at any time.

2146	   Hosts SHOULD enforce per-process and per-user limits on the number of
2147	   existent TCP connections at any time.

2149	   DISCUSSION:

2151	      While the Naptha attack is usually targeted at a service such as
2152	      HTTP, its impact is usually system-wide.  This is particularly
2153	      undesirable, as an attack against a single service might affect
2154	      the system as a whole (for example, possibly precluding remote
2155	      system administration).

2157	      In order to prevent an attack against a single service from
2158	      affecting other services, we advise TCP implementations to enforce
2159	      per-process and per-user limits on maximum kernel memory that can
2160	      be used at any time.  Additionally, we recommend that
2161	      implementations enforce per-process and per-user limits on the
2162	      number of existent TCP connections at any time.

2164	   Applications SHOULD enforce limits on the number of simultaneous
2165	   connections that can be established from a single IP address or
2166	   network prefix at any given time.

2168	   DISCUSSION:

2170	      An application could limit the number of simultaneous connections
2171	      that can be established from a single IP address or network prefix
2172	      at any given time.  Once that limit has been reached, some other
2173	      connection from the same IP address or network prefix would be
2174	      aborted, thus allowing the application to service this new
2175	      incoming connection.
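
   A minimal sketch of such an application-level limit is shown below.
   The PrefixLimiter class, the choice of evicting the oldest
   connection from the same prefix, and the default /24 prefix length
   (which assumes IPv4 addresses) are illustrative assumptions, not
   requirements of this document.

```python
import ipaddress
from collections import defaultdict, deque


class PrefixLimiter:
    """Track simultaneous connections per network prefix."""

    def __init__(self, limit: int, prefix_len: int = 24):
        self.limit = limit
        self.prefix_len = prefix_len
        self.conns = defaultdict(deque)   # prefix -> connection ids

    def _key(self, addr: str):
        # strict=False masks the host bits, yielding the enclosing prefix.
        return ipaddress.ip_network(f"{addr}/{self.prefix_len}",
                                    strict=False)

    def admit(self, addr: str, conn_id):
        """Register a new connection.

        Returns the id of the connection the application should abort
        to make room, or None if the prefix is below its limit.
        """
        queue = self.conns[self._key(addr)]
        evicted = queue.popleft() if len(queue) >= self.limit else None
        queue.append(conn_id)
        return evicted

    def release(self, addr: str, conn_id) -> None:
        """Forget a connection that was closed normally."""
        queue = self.conns[self._key(addr)]
        if conn_id in queue:
            queue.remove(conn_id)
```

   For protocols with an authentication phase, the same structure
   could be applied only to sessions that have not yet authenticated,
   by calling release() once authentication succeeds.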

2177	      There are a number of factors that should be taken into account
2178	      when defining the specific limit to enforce.  For example, in the
2179	      case of protocols that have an authentication phase (e.g., SSH,
2180	      POP3, etc.), this limit could be applied to sessions that have not
2181	      yet been authenticated.  Additionally, depending on the nature and
2182	      use of the application, it might or might not be normal for a
2183	      single system to have multiple connections to the same server at
2184	      the same time.

2186	      For many network services, the limit of maximum simultaneous
2187	      connections could be kept very low.  For example, an SMTP server
2188	      could limit the number of simultaneous connections from a single
2189	      IP address to 10 or 20 connections.

2191	      While this limit could work in many network scenarios, we
2192	      recommend that network operators measure the maximum number of
2193	      concurrent connections from a single IP address during normal
2194	      operation, and set the limit accordingly.

2196	      In the case of web servers, this limit will usually need to be set
2197	      much higher, as it is common practice for web clients to establish
2198	      multiple simultaneous connections with a single web server to
2199	      speed up the process of loading a web page (e.g., multiple graphic
2200	      files can be downloaded simultaneously using separate TCP
2201	      connections).

2203	      NATs (Network Address Translators) [Srisuresh and Egevang, 2001]
2204	      are widely deployed in the Internet, and may exacerbate this
2205	      situation, as a large number of clients behind a NAT might each
2206	      establish multiple TCP connections with a given web server, which
2207	      would all appear to originate from the same IP address (that of
2208	      the NAT box).

2210	   Firewalls MAY enforce limits on the number of simultaneous
2211	   connections that can be established from a single IP address or
2212	   network prefix at any given time.

2214	   DISCUSSION:

2216	      Some firewalls can be configured to limit the number of
2217	      simultaneous connections that any system can maintain with a
2218	      specific system and/or service at any given time.  Limiting the
2219	      number of simultaneous connections that each system can establish
2220	      with a specific system and service would effectively limit the
2221	      possibility of an attacker that controls a single IP address to
2222	      exhaust system resources at the attacked system/service.

2224	5.4.  Firewall-bypassing techniques

2226	   TCP MUST silently drop those TCP segments that have both the SYN and
2227	   the RST flags set.

2229	   DISCUSSION:

2231	      Some firewalls block incoming TCP connections by blocking only
2232	      incoming SYN segments.  However, there are inconsistencies in how
2233	      different TCP implementations handle SYN segments that have
2234	      additional flags set, which may allow an attacker to bypass
2235	      firewall rules [US-CERT, 2003b].

2237	      For example, some firewalls have been known to mistakenly allow
2238	      incoming SYN segments if they also have the RST bit set.  As some
2239	      TCP implementations will create a new connection in response to a
2240	      TCP segment with both the SYN and RST bits set, an attacker could
2241	      bypass the firewall rules and establish a connection with a
2242	      "protected" system by setting the RST bit in his SYN segments.

2244	      Here we advise TCP implementations to silently drop those TCP
2245	      segments that have both the SYN and the RST flags set.
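
      The check itself is small.  The following C fragment is a minimal
      sketch of it; the function name and the FLAG_* macro names are
      assumptions of this example, while the bit values are those of
      the TCP header flags defined in RFC 793.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits (values as laid out in RFC 793). */
#define FLAG_FIN 0x01
#define FLAG_SYN 0x02
#define FLAG_RST 0x04
#define FLAG_ACK 0x10

/* Returns true when the segment carries both SYN and RST, in which
 * case it should be dropped without generating any response. */
static bool drop_syn_rst(uint8_t flags)
{
    return (flags & (FLAG_SYN | FLAG_RST)) == (FLAG_SYN | FLAG_RST);
}
```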

2247	6.  Connection-termination mechanism

2249	6.1.  FIN-WAIT-2 flooding attack

2251	6.1.1.  Vulnerability

2253	   TCP implements a connection-termination mechanism that is employed
2254	   for the graceful termination of a TCP connection.  This mechanism
2255	   usually consists of the exchange of four segments.  Figure 6
2256	   illustrates the usual segment exchange for this mechanism.

2258	   Figure 6: TCP connection-termination mechanism

2260	             See Figure 6, on page 50 of the UK CPNI document.

2262	                   TCP connection-termination mechanism

2264	   A potential problem may arise as a result of the FIN-WAIT-2 state:
2265	   there is no limit on the amount of time that a TCP can remain in the
2266	   FIN-WAIT-2 state.  Furthermore, no segment exchange is required to
2267	   maintain the connection in that state.

2269	   As a result, an attacker could establish a large number of
2270	   connections with the target system, and cause it to close each of
2271	   them.  For each connection, once the target system has sent its FIN
2272	   segment, the attacker would acknowledge the receipt of this segment,
2273	   but would send no further segments on that connection.  As a result,
2274	   an attacker could tie up the corresponding system resources (e.g.,
2275	   the system memory used for storing the TCB) without the need to send
2276	   any further packets.

2278	   While the CLOSE command described in RFC 793 [Postel, 1981c] simply
2279	   signals the remote TCP end-point that this TCP has finished sending
2280	   data (i.e., it closes only one direction of the data transfer), the
2281	   close() system-call available in most operating systems has different
2282	   semantics: it marks the corresponding file descriptor as closed (and
2283	   thus it is no longer usable), and assigns the operating system the
2284	   responsibility to deliver any queued data to the remote TCP peer and
2285	   to terminate the TCP connection.  This makes the FIN-WAIT-2 state
2286	   particularly attractive for performing memory exhaustion attacks, as
2287	   even if the application running on top of TCP were imposing limits on
2288	   the maximum number of ongoing connections, and/or time limits on the
2289	   function calls performed on TCP connections, that application would
2290	   be unable to enforce these limits on the FIN-WAIT-2 state.

2292	6.1.2.  Countermeasures

2294	   A number of countermeasures can be implemented to mitigate FIN-WAIT-2
2295	   flooding attacks.  Some of these countermeasures require changes in
2296	   the TCP implementations, while others require changes in the
2297	   applications running on top of TCP.

2299	   TCP SHOULD enforce limits on the duration of the FIN-WAIT-2 state.

2301	   DISCUSSION:

2303	      In order to avoid the risk of having connections stuck in the FIN-
2304	      WAIT-2 state indefinitely, a number of systems incorporate a
2305	      timeout for the FIN-WAIT-2 state.  For example, the Linux kernel
2306	      version 2.4 enforces a timeout of 60 seconds [Linux, 2008].  If
2307	      the connection-termination mechanism does not complete before that
2308	      timeout value, it is aborted.
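
      A minimal sketch of such a timeout check is shown below.  The
      structure, the state names, and the function name are assumptions
      of this example; the 60-second value simply mirrors the Linux
      figure quoted above.

```c
#include <stdbool.h>

#define FIN_WAIT_2_TIMEOUT_SECS 60  /* mirrors the Linux example above */

/* Only the states needed by this sketch are listed. */
enum conn_state { ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2 };

struct conn {
    enum conn_state state;
    long state_since;  /* time the current state was entered, in seconds */
};

/* Returns true when the connection has stayed in FIN-WAIT-2 for at
 * least the timeout and should therefore be aborted. */
static bool fin_wait_2_expired(const struct conn *c, long now)
{
    return c->state == ST_FIN_WAIT_2 &&
           now - c->state_since >= FIN_WAIT_2_TIMEOUT_SECS;
}
```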

2310	   Enabling applications to enforce limits on ongoing connections

2312	   As discussed in Section 6.1.1, the fact that the close() system call
2313	   marks the corresponding file descriptor as closed prevents the
2314	   application running on top of TCP from enforcing limits on the
2315	   corresponding connection.

2317	   While it is common practice for applications to terminate their
2318	   connections by means of the close() system call, it is possible for
2319	   an application to initiate the connection-termination phase without
2320	   closing the corresponding file descriptor (hence keeping control of
2321	   the connection).

2323	   In order to achieve this, an application performing an active close
2324	   (i.e., initiating the connection-termination phase) should replace
2325	   the system-call close(sockfd) with the following code sequence:

2327	   o  A call to shutdown(sockfd, SHUT_WR), to close the sending
2328	      direction of this connection.

2330	   o  Successive calls to read(), until it returns 0, thus indicating
2331	      that the remote TCP peer has finished sending data.

2333	   o  A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l,
2334	      sizeof(l)), where l is of type struct linger (with its members
2335	      l.l_onoff=1 and l.l_linger=90).

2337	   o  A call to close(sockfd), to close the corresponding file
2338	      descriptor.

2340	   The call to shutdown() (instead of close()) allows the application to
2341	   retain control of the underlying TCP connection while the connection
2342	   transitions through the FIN-WAIT-1 and FIN-WAIT-2 states.  However,
2343	   the application will not retain control of the connection while it
2344	   transitions through the CLOSING and TIME-WAIT states.
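
   The active-close sequence above could be written in C roughly as
   follows.  This is a sketch under assumptions, not a reference
   implementation: the function name, the wait_ms parameter, and the
   use of poll() to bound each wait are additions made for this
   example, illustrating one kind of limit an application can enforce
   while it still holds the descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: active close that keeps the descriptor until the
 * peer has finished sending.  Returns 0 on a clean close, -1 otherwise.
 * The descriptor is closed in both cases. */
static int active_close(int sockfd, int wait_ms)
{
    struct linger l = { .l_onoff = 1, .l_linger = 90 };
    struct pollfd pfd = { .fd = sockfd, .events = POLLIN };
    char buf[512];
    int rc = 0;

    /* Close only the sending direction; the descriptor stays usable. */
    if (shutdown(sockfd, SHUT_WR) < 0)
        rc = -1;

    /* Read until read() returns 0 (the peer has finished sending). */
    while (rc == 0) {
        int ready = poll(&pfd, 1, wait_ms);

        if (ready < 0 && errno == EINTR)
            continue;
        if (ready <= 0) {              /* error, or time limit reached */
            rc = -1;
            break;
        }
        ssize_t n = read(sockfd, buf, sizeof(buf));
        if (n == 0)
            break;
        if (n < 0 && errno != EINTR) {
            rc = -1;
            break;
        }
    }

    /* Bound the time close() may linger, then release the descriptor. */
    if (setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) < 0)
        rc = -1;
    if (close(sockfd) < 0)
        rc = -1;
    return rc;
}
```

   When the time limit is reached, this sketch simply proceeds to close
   the descriptor; an application could apply a different policy at
   that point.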

2346	   It should be noted that, strictly speaking, close(sockfd) decrements
2347	   the reference count for the descriptor sockfd, and initiates the
2348	   connection termination phase only when the reference count reaches 0.
2349	   On the other hand, shutdown(sockfd, SHUT_WR) initiates the
2350	   connection-termination phase, regardless of the reference count for
2351	   the sockfd descriptor.  This should be taken into account when
2352	   performing the code replacement described above.  For example, it
2353	   would be a bug for two processes (e.g., parent and child) that share
2354	   a descriptor to both call shutdown(sockfd, SHUT_WR).

2356	   An application performing a passive close should replace the call to
2357	   close(sockfd) with the following code sequence:

2359	   o  A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l,
2360	      sizeof(l)), where l is of type struct linger (with its members
2361	      l.l_onoff=1 and l.l_linger=90).

2363	   o  A call to close(sockfd), to close the corresponding file
2364	      descriptor.

2366	   It is assumed that if the application is performing a passive close,
2367	   the application already detected that the remote TCP peer finished
2368	   sending data, as a result of a call to read() returning 0.

2370	   In this scenario, the application will not retain control of the
2371	   underlying connection when it transitions through the LAST-ACK state.

2373	   Enforcing limits on the number of connections with no user-space
2374	   controlling process

2376	   The considerations and recommendations in Section 5.3.2 for enforcing
2377	   limits on the number of connections with no user-space controlling
2378	   process are applicable to mitigate this vulnerability.

2380	   Limiting the number of simultaneous connections at the application

2382	   The considerations and recommendations in Section 5.3.2 for limiting
2383	   the number of simultaneous connections at the application are
2384	   applicable to mitigate this vulnerability.  We note, however, that
2385	   unless applications are implemented to retain control of the
2386	   underlying TCP connection while the connection transitions through
2387	   the FIN-WAIT-1 and FIN-WAIT-2 states, enforcing such limits may prove
2388	   to be a difficult task.

2390	   Limiting the number of simultaneous connections at firewalls

2392	   The considerations and recommendations in Section 5.3.2 for limiting
2393	   the number of simultaneous connections at firewalls are
2394	   applicable to mitigate this vulnerability.

2396	7.  Buffer management

2398	7.1.  TCP retransmission buffer

2400	7.1.1.  Vulnerability

2402	   [Shalunov, 2000] describes a resource exhaustion attack (Netkill)
2403	   that can be performed against TCP.  The attack aims at exhausting
2404	   system memory by creating a large number of TCP connections which are
2405	   then abandoned.  The attack is usually performed as follows:

2407	   o  The attacker creates a TCP connection to a service in which a
2408	      small client request can result in a large server response (e.g.,
2409	      HTTP).  Rather than relying on his kernel implementation of TCP,
2410	      the attacker creates his TCP connections by means of a specialized
2411	      packet-crafting tool.  This allows the attacker to create the TCP
2412	      connections and later abandon them, exhausting the resources at
2413	      the attacked system, while not tying his own system resources to
2414	      the abandoned connections.

2416	   o  When the connection is established (i.e., the three-way handshake
2417	      has completed), an application request is sent, and the TCP
2418	      connection is subsequently abandoned.  At this point, any state
2419	      information kept by the attack tool is removed.

2421	   o  The attacked server allocates TCP send buffers for transmitting
2422	      the response to the client's request.  This causes the victim TCP
2423	      to tie resources not only for the Transmission Control Block
2424	      (TCB), but also for the application data that needs to be
2425	      transferred.

2427	   o  Once the application response is queued for transmission, the
2428	      application closes the TCP connection, and thus TCP takes the
2429	      responsibility to deliver the queued data.  Having the application
2430	      close the connection has the benefit for the attacker that the
2431	      application is not able to keep track of the number of TCP
2432	      connections in use, and thus it is not able to enforce limits on
2433	      the number of connections.

2435	   o  The attacker repeats the above steps a large number of times, thus
2436	      causing a large amount of system memory at the victim host to be
2437	      tied to the abandoned connections.  When the system memory is
2438	      exhausted, the victim host denies service to new connections, or
2439	      possibly crashes.

2441	   There are a number of factors that affect the effectiveness of this
2442	   attack that are worth considering.  Firstly, while the attack is
2443	   typically targeted at a service such as HTTP, the consequences of the
2444	   attack are usually system-wide.  Secondly, depending on the size of
2445	   the server's response, the underlying TCP connection may or may not
2446	   be closed: if the response is larger than the TCP send buffer size at
2447	   the server, the application will usually block in a call to write()
2448	   or send(), and would therefore not close the TCP connection, thus
2449	   allowing the application to enforce limits on the number of ongoing
2450	   connections.  Consequently, the attacker will usually try to elicit a
2451	   response that is equal to or slightly smaller than the send buffer of
2452	   the attacked TCP.  Thirdly, while [Shalunov, 2000] notes that one
2453	   visible effect of this attack is a large number of connections in the
2454	   FIN-WAIT-1 state, this will not usually be the case.  Given that the
2455	   attacker never acknowledges any segment other than the SYN/ACK
2456	   segment that is part of the three-way handshake, at the point at
2457	   which the attacked TCP tries to send the application's response the
2458	   congestion window (cwnd) will usually be 4*SMSS (four maximum-sized
2459	   segments).  If the application's response were larger than 4*SMSS,
2460	   even if the application had closed the connection, the FIN segment
2461	   would never be sent, and thus the connection would still remain in
2462	   the ESTABLISHED state (rather than transit to the FIN-WAIT-1 state).

2464	7.1.2.  Countermeasures

2466	   The resource exhaustion attack described in Section 7.1.1 does not
2467	   necessarily differ from a legitimate high-load scenario, and
2468	   therefore is hard to mitigate without negatively affecting the
2469	   robustness of TCP.  However, complementary mitigations can still be
2470	   implemented to limit the impact of these attacks.

2472	   Enforcing limits on the number of connections with no user-space
2473	   controlling process

2475	   The considerations and recommendations in Section 5.3.2 for enforcing
2476	   limits on the number of connections with no user-space controlling
2477	   process are applicable to mitigate this vulnerability.

2479	   Enforcing per-user and per-process limits

2480	   While the Netkill attack is usually targeted at a service such as
2481	   HTTP, its impact is usually system-wide.  This is particularly
2482	   undesirable, as an attack against a single service might affect the
2483	   system as a whole (for example possibly precluding remote system
2484	   administration).

2486	   In order to prevent an attack against a single service from affecting
2487	   other services, we advise TCP implementations to enforce per-process
2488	   and per-user limits on the maximum kernel memory that can be used at
2489	   any time.  Additionally, we recommend that implementations enforce
2490	   per-process and per-user limits on the number of existing TCP
2491	   connections at any time.

2493	   Limiting the number of ongoing connections at the application

2495	   The considerations and recommendations in Section 5.3.2 for enforcing
2496	   limits on the number of ongoing connections at the application are
2497	   applicable to mitigate this vulnerability.

2499	   Enabling applications to enforce limits on ongoing connections

2501	   As discussed in Section 6.1.1, the fact that the close() system call
2502	   marks the corresponding file descriptor as closed prevents the
2503	   application running on top of TCP from enforcing limits on the
2504	   corresponding connection.

2506	   While it is common practice for applications to terminate their
2507	   connections by means of the close() system call, it is possible for
2508	   an application to initiate the connection-termination phase without
2509	   closing the corresponding file descriptor (hence keeping control of
2510	   the connection).

2512	   In order to achieve this, an application performing an active close
2513	   (i.e., initiating the connection-termination phase) should replace
2514	   the call to close(sockfd) with the following code sequence:

2516	   o  A call to shutdown(sockfd, SHUT_WR), to close the sending
2517	      direction of this connection.

2519	   o  Successive calls to read(), until it returns 0, thus indicating
2520	      that the remote TCP peer has finished sending data.

2522	   o  A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l,
2523	      sizeof(l)), where l is of type struct linger (with its members
2524	      l.l_onoff=1 and l.l_linger=90).

2526	   o  A call to close(sockfd), to close the corresponding file
2527	      descriptor.

2529	   The call to shutdown() (instead of close()) allows the application to
2530	   retain control of the underlying TCP connection while the connection
2531	   transitions through the FIN-WAIT-1 and FIN-WAIT-2 states.  However,
2532	   the application will not retain control of the connection while it
2533	   transitions through the CLOSING and TIME-WAIT states.  Nevertheless,
2534	   in these states TCP should not have any pending data to send to the
2535	   remote TCP peer or to be received by the application running on top
2536	   of it, and thus these states are less of a concern for this
2537	   particular vulnerability (Netkill).

2539	   It should be noted that, strictly speaking, close(sockfd) decrements
2540	   the reference count for the descriptor sockfd, and initiates the
2541	   connection termination phase only when the reference count reaches 0.
2542	   On the other hand, shutdown(sockfd, SHUT_WR) initiates the
2543	   connection-termination phase, regardless of the reference count for
2544	   the sockfd descriptor.  This should be taken into account when
2545	   performing the code replacement described above.  For example, it
2546	   would be a bug for two processes (e.g., parent and child) that share
2547	   a descriptor to both call shutdown(sockfd, SHUT_WR).

2549	   An application performing a passive close should replace the call to
2550	   close(sockfd) with the following code sequence:

2552	   o  A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l,
2553	      sizeof(l)), where l is of type struct linger (with its members
2554	      l.l_onoff=1 and l.l_linger=90).

2556	   o  A call to close(sockfd), to close the corresponding file
2557	      descriptor.

2559	   It is assumed that if the application is performing a passive close,
2560	   the application already detected that the remote TCP peer finished
2561	   sending data, as a result of a call to read() returning 0.

2563	   In this scenario, the application will not retain control of the
2564	   underlying connection when it transitions through the LAST-ACK state.
2565	   However, in this state TCP should not have any pending data to send
2566	   to the remote TCP peer or to be received by the application running
2567	   on top of TCP, and thus this state is less of a concern for this
2568	   particular vulnerability (Netkill).

2570	   Limiting the number of simultaneous connections at firewalls

2572	   The considerations and recommendations in Section 5.3.2 for limiting
2573	   the number of simultaneous connections at firewalls are
2574	   applicable to mitigate this vulnerability.

2576	   Performing heuristics on ongoing TCP connections

2577	   Some heuristics could be performed on TCP connections that may
2578	   possibly help if scarce system resources such as memory become
2579	   exhausted.  A number of parameters may be useful to perform such
2580	   heuristics.

2582	   In the case of the Netkill attack described in [Shalunov, 2000],
2583	   there are two parameters that are characteristic of a TCP being
2584	   attacked:

2586	   o  A large amount of data queued in the TCP retransmission buffer
2587	      (e.g., the socket send buffer).

2589	   o  Only a small amount of data has been successfully transferred to
2590	      the remote peer.

2592	   Clearly, these two parameters do not necessarily indicate an ongoing
2593	   attack.  However, if exhaustion of the corresponding system resources
2594	   was imminent, these two parameters (among others) could be used to
2595	   perform heuristics when considering aborting ongoing connections.
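
   The two parameters above can be combined into a simple selection
   routine.  The following C fragment is a sketch only: the structure,
   the function names, and both thresholds are assumptions of this
   example and are not values proposed by [Shalunov, 2000] or by this
   document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative thresholds (assumptions). */
#define QUEUED_THRESHOLD (32u * 1024u)  /* "large" send-buffer backlog */
#define ACKED_THRESHOLD  (4u * 1024u)   /* "small" amount transferred */

struct conn_stats {
    uint64_t bytes_queued;  /* data held in the retransmission buffer */
    uint64_t bytes_acked;   /* data acknowledged by the remote peer */
};

/* True when a connection shows both characteristics listed above. */
static bool netkill_suspect(const struct conn_stats *c)
{
    return c->bytes_queued >= QUEUED_THRESHOLD &&
           c->bytes_acked <= ACKED_THRESHOLD;
}

/* Returns the index of the suspect holding the most queued data, or -1
 * if no connection matches.  Meant to be consulted only when memory
 * exhaustion is imminent. */
static int pick_abort_candidate(const struct conn_stats *conns, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!netkill_suspect(&conns[i]))
            continue;
        if (best < 0 ||
            conns[i].bytes_queued > conns[best].bytes_queued)
            best = (int)i;
    }
    return best;
}
```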

2597	   It should be noted that while an attacker could advertise a zero
2598	   window to cause the target system to tie system memory to the TCP
2599	   retransmission buffer, it is hard to derive any useful statistics
2600	   from the advertised window.  While it is tempting to enforce a limit
2601	   on the length of the persist state (see Section 3.7.2 of this
2602	   document), an attacker could simply open the window (i.e., advertise
2603	   a TCP window larger than zero) from time to time to prevent this
2604	   enforced limit from causing his malicious connections to be aborted.

2606	7.2.  TCP segment reassembly buffer

2608	   TCP MAY discard out-of-order data when system-memory exhaustion is
2609	   imminent.

2611	   DISCUSSION:

2613	      TCP buffers out-of-order segments to more efficiently handle the
2614	      occurrence of packet reordering and segment loss.  When out-of-
2615	      order data are received, a "hole" momentarily exists in the data
2616	      stream which must be filled before the received data can be
2617	      delivered to the application making use of TCP's services.  This
2618	      situation can be exploited by an attacker, who could
2619	      intentionally create a hole in the data stream by sending a number
2620	      of segments with a sequence number larger than the next sequence
2621	      number expected (RCV.NXT) by the attacked TCP.  Thus, the attacked
2622	      TCP would tie system memory to buffer the out-of-order segments,
2623	      without being able to hand the received data to the corresponding
2624	      application.

2626	      If a large number of such connections were created, system memory
2627	      could be exhausted, precluding the attacked TCP from servicing new
2628	      connections and/or continuing to service previously established
2629	      TCP connections.

2631	      Fortunately, these attacks can be easily mitigated, at the expense
2632	      of degrading the performance of possibly legitimate connections.
2633	      When out-of-order data is received, an Acknowledgement segment is
2634	      sent with the next sequence number expected (RCV.NXT).  This means
2635	      that receipt of the out-of-order data will not be actually
2636	      acknowledged by the TCP's cumulative Acknowledgement Number.  As a
2637	      result, a TCP is free to discard any data that have been received
2638	      out-of-order, without affecting the reliability of the data
2639	      transfer.  Given the performance implications of discarding out-
2640	      of-order segments for legitimate connections, this pruning policy
2641	      should be applied only if memory exhaustion is imminent.

2643	      As a result of discarding the out-of-order data, these data will
2644	      need to be unnecessarily retransmitted.  Additionally, a loss
2645	      event will be detected by the sending TCP, and thus the slow start
2646	      phase of TCP's congestion control will be entered, thus reducing
2647	      the data transfer rate of the connection.

2649	      It is interesting to note that this pruning policy could be
2650	      applied even if Selective Acknowledgements (SACK) (specified in
2651	      RFC 2018 [Mathis et al, 1996]) are in use, as SACK provides only
2652	      advisory information, and does not preclude the receiving TCP from
2653	      discarding data that have been previously selectively-acknowledged
2654	      by means of TCP's SACK option, but not acknowledged by TCP's
2655	      cumulative Acknowledgement Number.

2657	      There are a number of ways in which the pruning policy could be
2658	      triggered.  For example, when out-of-order data are received, a
2659	      timer could be set, and the sequence number of the out-of-order
2660	      data could be recorded.  If the hole were filled before the timer
2661	      expires, the timer would be turned off.  However, if the timer
2662	      expired before the hole were filled, all the out-of-order segments
2663	      of the corresponding connection would be discarded.  This would be
2664	      a proactive counter-measure for attacks that aim at exhausting the
2665	      receive buffers.
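
      The timer-based trigger described above might be sketched in C as
      follows.  The structure, the function names, and the 30-second
      timeout are assumptions of this example.  Accounting for buffered
      data that becomes deliverable once the hole is filled is left out
      of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOLE_TIMEOUT_SECS 30  /* assumption, not a recommended value */

struct reasm_state {
    bool     timer_armed;
    uint32_t ooo_seq;    /* sequence number of the out-of-order data */
    long     deadline;   /* absolute expiry time, in seconds */
    size_t   ooo_bytes;  /* out-of-order bytes currently buffered */
};

/* Out-of-order data arrived: account for it and arm the timer if it is
 * not already running. */
static void on_out_of_order(struct reasm_state *r, uint32_t seq,
                            size_t len, long now)
{
    r->ooo_bytes += len;
    if (!r->timer_armed) {
        r->timer_armed = true;
        r->ooo_seq = seq;
        r->deadline = now + HOLE_TIMEOUT_SECS;
    }
}

/* RCV.NXT advanced: turn the timer off once the recorded hole has been
 * filled.  The signed cast keeps the comparison valid across
 * sequence-number wraparound. */
static void on_rcv_nxt_advance(struct reasm_state *r, uint32_t rcv_nxt)
{
    if (r->timer_armed && (int32_t)(rcv_nxt - r->ooo_seq) >= 0)
        r->timer_armed = false;
}

/* Periodic check: on expiry, discard all out-of-order data of this
 * connection and return the number of bytes freed. */
static size_t on_timer_tick(struct reasm_state *r, long now)
{
    size_t freed = 0;

    if (r->timer_armed && now >= r->deadline) {
        freed = r->ooo_bytes;
        r->ooo_bytes = 0;
        r->timer_armed = false;
    }
    return freed;
}
```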

2667	      In addition, an implementation could incorporate reactive
2668	      mechanisms for more carefully controlling buffer allocation when
2669	      some predefined buffer allocation threshold was reached.  At such
2670	      point, pruning policies would be applied.

2672	      A number of mechanisms can aid in the process of freeing system
2673	      resources.  For example, a table of network prefixes corresponding
2674	      to the IP addresses of TCP peers that have ongoing TCP connections
2675	      could record the aggregate amount of out-of-order data currently
2676	      buffered for those connections.  When the pruning policy was
2677	      triggered, TCP connections with hosts that have network prefixes
2678	      with large aggregate out-of-order buffered data could be selected
2679	      first for pruning the out-of-order segments.

2681	      Alternatively, if TCP segments were de-multiplexed by means of a
2682	      hash table (as is currently the case in many TCP
2683	      implementations), a counter could be held at each entry of the
2684	      hash table that would record the aggregate out-of-order data
2685	      currently buffered for those connections belonging to that hash
2686	      table entry.  When the pruning policy is triggered, the out-of-
2687	      order data corresponding to those connections linked by the hash
2688	      table entry with largest amount of aggregate out-of-order data
2689	      could be pruned first.  It is important that this hash is not
2690	      computable by an attacker, as this would allow him to maliciously
2691	      cause the performance of specific connections to be degraded.
2692	      That is, given a four-tuple that identifies a connection, an
2693	      attacker should not be able to compute the corresponding hash
2694	      value used by the target system to de-multiplex incoming TCP
2695	      segments to that connection.

2697	      Another variant of a resource exhaustion attack against TCP's
2698	      segment reassembly mechanism would target the data structures used
2699	      to link the different holes in a data stream.  For example, an
2700	      attacker could send a burst of one-byte segments, leaving a
2701	      one-byte hole between each of the data bytes sent.  Depending on
2702	      the data structures used for holding and linking together each of
2703	      the data segments, such an attack might waste a large amount of
2704	      system memory by exploiting the overhead needed to store and link
2705	      together each of these one-byte segments.

2707	      For example, if a linked-list is used for holding and linking each
2708	      of the data segments, each of the involved data structures could
2709	      involve one byte of kernel memory for storing the received data
2710	      byte (the TCP payload), plus 4 bytes (32 bits) for storing a
2711	      pointer to the next node in the linked-list.  Additionally, while
2712	      such a data structure would require only a few bytes of kernel
2713	      memory, it could result in the allocation of a whole memory page,
2714	      thus consuming much more memory than expected.

2716	      Therefore, implementations should enforce a limit on the number of
2717	      holes that are allowed in the received data stream at any given
2718	      time.  When such a limit is reached, incoming TCP segments that
2719	      would create new holes should be silently dropped.  Measurements
2720	      in [Dharmapurikar and Paxson, 2005] indicate that the vast
2721	      majority of TCP connections have at most a single hole at any
2722	      given time.  A limit of 16 holes for each connection would
2723	      accommodate even most of the very unusual cases in which there can
2724	      be more than one hole in the data stream at a given time.
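
      One way to check such a limit is sketched below.  The sketch
      models the reassembly queue as an array of disjoint, non-adjacent
      byte ranges buffered beyond RCV.NXT, so that the number of holes
      equals the number of ranges.  The array representation and all
      identifiers are assumptions of this example, and sequence-number
      wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOLES 16  /* per-connection limit discussed above */

/* A buffered out-of-order range covering [start, end). */
struct block {
    uint32_t start;
    uint32_t end;
};

/* True when [start, end) overlaps or abuts block b, in which case the
 * segment extends existing data and creates no new hole. */
static bool touches(const struct block *b, uint32_t start, uint32_t end)
{
    return start <= b->end && end >= b->start;
}

/* Decide whether an out-of-order segment may be buffered.  A segment
 * that would add a new hole is refused once MAX_HOLES is reached. */
static bool accept_ooo_segment(const struct block *blocks, size_t nblocks,
                               uint32_t start, uint32_t end)
{
    for (size_t i = 0; i < nblocks; i++) {
        if (touches(&blocks[i], start, end))
            return true;
    }
    return nblocks < MAX_HOLES;
}
```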

2726	      [US-CERT, 2004a] is a security advisory about a Denial of Service
2727	      vulnerability resulting from a TCP implementation that did not
2728	      enforce limits on the number of segments stored in the TCP
2729	      reassembly buffer.

2731	      Section 8 of this document describes the security implications of
2732	      the TCP segment reassembly algorithm.

2734	7.3.  Automatic buffer tuning mechanisms

2736	7.3.1.  Automatic send-buffer tuning mechanisms

2738	   A TCP implementing an automatic send-buffer tuning mechanism SHOULD
2739	   enforce the following limit on the size of the send buffer of each
2740	   TCP connection:

2742	   send_buffer_size <= send_buffer_pool / (min_buffer_size * max_connections)

2744	   where

2746	   send_buffer_size:
2747	      Maximum send buffer size to be used for this connection

2749	   send_buffer_pool:
2750	      Total amount of system memory meant for TCP send buffers

2752	   min_buffer_size:
2753	      Minimum send buffer size for each TCP connection

2755	   max_connections:
2756	      Maximum number of TCP connections this system is expected to
2757	      handle at a time

2759	   max_connections may be an artificial limit enforced by the system
2760	   administrator specifically on the number of TCP connections, or may
2761	   be derived from some other system limit (e.g., the maximum number of
2762	   file descriptors).

2764	   DISCUSSION:

2766	      A number of TCP implementations incorporate automatic tuning
2767	      mechanisms for the TCP send buffer size.  In most of them, the
2768	      underlying idea is to set the send buffer to some multiple of the
2769	      congestion window (cwnd).  This type of mechanism usually improves
2770	      TCP's performance, by preventing the socket send buffer from
2771	      becoming a bottleneck, while avoiding the need to simply
2772	      overestimate the TCP send buffer size (i.e., make it arbitrarily
2773	      large).  [Semke et al, 1998] discusses such an automatic buffer
2774	      tuning mechanism.

2776	      Unfortunately, automatic tuning mechanisms can be exploited by
2777	      attackers to amplify the impact of other resource exhaustion
2778	      attacks.  For example, an attacker could establish a TCP
2779	      connection with a victim host, and cause the congestion window to
2780	      be increased (either legitimately or illegitimately).  Once the
2781	      congestion window (and hence the TCP send buffer) is increased, he
2782	      could cause the corresponding system memory to be tied up by
2783	      advertising a zero-byte TCP window (see Section 3.7) or simply not
2784	      acknowledging any data, thus amplifying the effect of resource
2785	      exhaustion attacks such as that discussed in Section 7.1.1.

2787	      When an automatic buffer tuning mechanism is implemented, a number
2788	      of countermeasures should be incorporated to prevent the mechanism
2789	      from being exploited to amplify other resource exhaustion attacks.

2791	      Firstly, appropriate policies should be applied to guarantee fair
2792	      use of the available system memory by each of the established TCP
2793	      connections.  Secondly, appropriate policies should be applied to
2794	      prevent existing TCP connections from consuming all system
2795	      resources, which would deny service to new TCP connections.

2797	      Appendix A of [Semke et al, 1998] proposes an algorithm for the
2798	      fair share of the available system memory among the established
2799	      connections.  However, there are a number of limits that should be
2800	      enforced on the system memory assigned for the send buffer of each
2801	      connection.  Firstly, each connection should always be assigned
2802	      some minimum send buffer space that would enable TCP to achieve
2803	      acceptable performance.  Secondly, some system memory should be
2804	      reserved for future connections, according to the maximum number
2805	      of concurrent TCP connections that are expected to be successfully
2806	      handled at any given time.

2808	      These limits preclude the automatic tuning algorithm from
2809	      assigning all the available memory buffers to ongoing connections,
2810	      which would prevent the establishment of new connections.

2812	      Even if these limits are enforced, an attacker could still create
2813	      a large number of TCP connections, each of them tying valuable
2814	      system resources.  Therefore, in scenarios in which most of the
2815	      system memory reserved for TCP send buffers is allocated to
2816	      ongoing connections, it may be necessary for TCP to enforce some
2817	      policy to free resources to either service more TCP connections,
2818	      or to be able to improve the performance of other existing
2819	      connections, by allocating more resources to them.

2821	      When needing to free memory in use for send buffers, particular
2822	      attention should be paid to TCP connections that have a large
2823	      amount of data in the socket send buffer, and that at the same
2824	      time fall into any of these categories:

2826	      *  The remote TCP peer has been advertising a small (possibly
2827	         zero) window for a considerable period of time.

2829	      *  There have been a large number of retransmissions of segments
2830	         corresponding to the first few windows of data.

2832	      *  Connections that fall into one of the previous categories, for
2833	         which only a reduced amount of data have been successfully
2834	         transferred to the peer TCP since the connection was
2835	         established.

2837	      Unfortunately, all these cases are valid scenarios for the TCP
2838	      protocol, and thus aborting connections that fall in any of these
2839	      categories has the potential of causing interoperability problems.
2840	      However, in scenarios in which all system resources are allocated,
2841	      it may make sense to free resources allocated to TCP connections
2842	      that are tying up a considerable amount of system resources and
2843	      that have not made progress in a considerable period of time.
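
	      As an illustration only, the selection criteria listed above
	      could be sketched as follows (in Python).  The ConnStats fields,
	      the threshold constants and the reclaim_priority() helper are
	      hypothetical names and values introduced for this sketch; they
	      are not taken from any TCP specification or implementation.

```python
from dataclasses import dataclass


@dataclass
class ConnStats:
    send_buffer_bytes: int      # data currently queued in the send buffer
    small_window_secs: float    # time the peer has advertised a small/zero window
    early_retransmissions: int  # retransmissions within the first few windows
    bytes_acked: int            # data acknowledged since establishment


# Hypothetical thresholds; a real system would tune these.
LARGE_SEND_BUFFER = 64 * 1024
STALLED_WINDOW_SECS = 60.0
MANY_RETRANSMISSIONS = 10
LITTLE_PROGRESS_BYTES = 4 * 1024


def reclaim_priority(c: ConnStats) -> int:
    """Return 0 (not a candidate), 1 (candidate) or 2 (strong candidate)."""
    if c.send_buffer_bytes < LARGE_SEND_BUFFER:
        return 0
    stalled = c.small_window_secs >= STALLED_WINDOW_SECS
    lossy = c.early_retransmissions >= MANY_RETRANSMISSIONS
    if not (stalled or lossy):
        return 0
    # Third category: one of the above, plus very little data transferred.
    return 2 if c.bytes_acked <= LITTLE_PROGRESS_BYTES else 1
```

	      Since all of these conditions can also occur on legitimate
	      connections, such a ranking would only be consulted when memory
	      for send buffers is actually exhausted.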

2845	7.3.2.  Automatic receive-buffer tuning mechanism

2847	   A number of TCP implementations include automatic tuning mechanisms
2848	   for the receive buffer size.  These mechanisms aim at setting the
2849	   socket buffer to a size that is large enough to prevent the TCP
2850	   window from becoming a bottleneck that would limit TCP's throughput,
2851	   without wasting system memory by over-sizing it.

2853	   [Heffner, 2002] describes a mechanism for the automatic tuning of the
2854	   socket receive buffer.  Basically, the mechanism aims at measuring
2855	   the amount of data received during a RTT (Round-Trip Time), and
2856	   setting the socket receive buffer to some multiple of that value.
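
	   A minimal sketch of this idea follows (in Python).  It is not the
	   algorithm of [Heffner, 2002] itself: the function name, the default
	   multiple of 2 and the min_buffer/max_buffer bounds are assumptions
	   made for illustration, with max_buffer standing for whatever
	   per-connection limit the system enforces.

```python
def tuned_receive_buffer(bytes_received_last_rtt: int,
                         min_buffer: int,
                         max_buffer: int,
                         multiple: int = 2) -> int:
    """Size the receive buffer from the data measured over one RTT.

    The result is clamped so that the measurement alone can never push the
    buffer below min_buffer or above max_buffer.
    """
    target = multiple * bytes_received_last_rtt
    return max(min_buffer, min(target, max_buffer))
```

	   The upper clamp matters for security: without it, a peer that
	   manages to inflate the measured data/RTT value directly controls
	   how much memory is committed to the connection.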

2858	   A TCP implementing an automatic receive-buffer tuning mechanism
2859	   SHOULD enforce the following limit on the size of the receive buffer
2860	   of each TCP connection:

2862	   recv_buffer_size <= recv_buffer_pool / (min_buffer_size * max_connections)
2863	   where:

2865	   recv_buffer_size:
2866	      Maximum receive buffer size to be used for this connection

2868	   recv_buffer_pool:
2869	      Total amount of system memory meant for TCP receive buffers

2871	   min_buffer_size:
2872	      Minimum receive buffer size for each TCP connection

2874	   max_connections:
2875	      Maximum number of TCP connections this system is expected to
2876	      handle at a time

2878	   max_connections may be an artificial limit enforced by the system
2879	   administrator specifically on the number of TCP connections, or may
2880	   be derived from some other system limit (e.g., the maximum number of
2881	   file descriptors).

2883	   DISCUSSION:

2885	      Unfortunately, automatic tuning mechanisms for the socket receive
2886	      buffer can be exploited to perform a resource exhaustion attack.
2887	      An attacker willing to exploit the automatic buffer tuning
2888	      mechanism would first establish a TCP connection with the victim
2889	      host.  Subsequently, he would start a bulk data transfer to the
2890	      victim host.  By carefully responding to the peer's TCP segments,
2891	      the attacker could cause the peer TCP to measure a large data/RTT
2892	      value, which would lead to the adoption of an unnecessarily large
2893	      socket receive buffer.  For example, the attacker could
2894	      optimistically send more data than those allowed by the TCP window
2895	      advertised by the remote TCP.  Those extra data would cross in the
2896	      network with the window updates sent by the remote TCP, and could
2897	      lead the TCP receiver to measure a data/RTT twice as big as the
2898	      real one.  Alternatively, if the TCP timestamp option (specified
2899	      in RFC 1323 [Jacobson et al, 1992]) is used for RTT measurement,
2900	      the attacker could lead the TCP receiver to measure a small RTT
2901	      (and hence a large Data/RTT rate) by "optimistically" echoing
2902	      timestamps that have not yet been received.

2904	      Finally, once the TCP receiver is led to increase the size of its
2905	      receive buffer, the attacker would transmit a large amount of
2906	      data, filling the peer's whole receive buffer except for a few
2907	      bytes at the beginning of the window (RCV.NXT).  This gap would
2908	      prevent the peer application from reading the data queued by TCP,
2909	      thus tying system memory to the received data segments until (if
2910	      ever) the peer application times out.

2912	      A number of limits should be enforced on the amount of system
2913	      memory assigned to any given connection.  Firstly, each connection
2914	      should always be assigned some minimum receive buffer space that
2915	      would enable TCP to achieve a minimum acceptable performance.
2916	      Additionally, some system memory should be reserved for future
2917	      connections, according to the maximum number of concurrent TCP
2918	      connections that are expected to be successfully handled at any
2919	      given time.

2921	      These limits preclude the automatic tuning algorithm from
2922	      assigning all the available memory buffers to existing
2923	      connections, thus preventing the establishment of new connections.

2925	      It is interesting to note that a TCP sender will always try to
2926	      retransmit any data that have not been acknowledged by TCP's
2927	      cumulative acknowledgement.  Therefore, if memory exhaustion is
2928	      imminent, a system should consider freeing those memory buffers
2929	      used for TCP segments that were received out of order,
2930	      particularly when a given connection has been keeping a large
2931	      number of out-of-order segments in the receive buffer for a
2932	      considerable period of time.

2934	      It is worth noting that TCP Selective Acknowledgements (SACK) are
2935	      advisory, in the sense that a TCP that has SACKed (but not ACKed)
2936	      a block of data is free to discard that block, and expect the TCP
2937	      sender to retransmit it when the retransmission timer of the
2938	      peer TCP expires.
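
	      The following sketch (in Python) illustrates one way of freeing
	      out-of-order data under memory pressure.  The queue layout (a
	      dict keyed by starting sequence number), the function name and
	      the max_age parameter are assumptions made for this example.

```python
def discard_stale_out_of_order(ooo_queue: dict, now: float,
                               max_age: float) -> int:
    """Drop out-of-order segments held longer than max_age seconds.

    ooo_queue maps a segment's starting sequence number to a tuple
    (payload, arrival_time).  Only data beyond RCV.NXT lives in this queue,
    so nothing covered by the cumulative ACK is ever discarded here.
    Returns the number of payload bytes freed.
    """
    stale = [seq for seq, (_, arrived) in ooo_queue.items()
             if now - arrived > max_age]
    freed = 0
    for seq in stale:
        payload, _ = ooo_queue.pop(seq)
        freed += len(payload)
    return freed
```

	      Because the discarded data were never cumulatively acknowledged,
	      the TCP sender is still responsible for retransmitting them.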

2940	8.  TCP segment reassembly algorithm

2942	8.1.  Problems that arise from ambiguity in the reassembly process

2944	   If a TCP segment is received containing some data bytes that had
2945	   already been received, the first copy of those data SHOULD be used
2946	   for reassembling the application data stream.

2948	   DISCUSSION:

2950	      A security consideration that should be made for the TCP segment
2951	      reassembly algorithm is that of data stream consistency between
2952	      the host performing the TCP segment reassembly, and a Network
2953	      Intrusion Detection System (NIDS) being employed to monitor the
2954	      host in question.

2956	      In the event a TCP segment was unnecessarily retransmitted, or
2957	      there was packet duplication in any of the intervening networks, a
2958	      TCP might get more than one copy of the same data.  Also, as TCP
2959	      segments can be re-packetized when they are retransmitted, a given
2960	      TCP segment might partially overlap data already received in
2961	      earlier segments.  In all these cases, the question arises about
2962	      which of the copies of the received data should be used when
2963	      reassembling the data stream.  In legitimate and normal
2964	      circumstances, all copies would be identical, and the same data
2965	      stream would be obtained regardless of which copy of the data was
2966	      used.  However, an attacker could maliciously send overlapping
2967	      segments containing different data, with the intent of evading a
2968	      Network Intrusion Detection System (NIDS), which might reassemble
2969	      the received TCP segments differently than the monitored system.
2970	      [Ptacek and Newsham, 1998] provides a detailed discussion of these
2971	      issues.

2973	      As suggested in Section 3.9 of RFC 793 [Postel, 1981c], if a TCP
2974	      segment arrives containing some data bytes that have already been
2975	      received, the first copy of those data should be used for
2976	      reassembling the application data stream.  It should be noted that
2977	      while convergence to this policy might prevent some cases of
2978	      ambiguity in the reassembly process, there are a number of other
2979	      techniques that an attacker could still exploit to evade a NIDS
2980	      [CPNI, 2008].  These techniques can generally be defeated if the
2981	      NIDS is placed in-line with the monitored system, thus allowing
2982	      the NIDS to normalize the network traffic or apply some other
2983	      policy that could ensure consistency between the result of the
2984	      segment reassembly process obtained by the monitored host and that
2985	      obtained by the NIDS.

2987	      [CERT, 2003] and [CORE, 2003] are advisories about a heap buffer
2988	      overflow in a popular Network Intrusion Detection System resulting
2989	      from incorrect sequence number calculations in its TCP stream-
2990	      reassembly module.
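
	      A "first copy wins" reassembly policy can be sketched as follows
	      (in Python).  This is an illustrative model only: it works on
	      plain byte offsets, ignores sequence number wrap-around and
	      window limits, and the function name is hypothetical.

```python
def reassemble_first_copy(segments, start=0) -> bytes:
    """Reassemble a byte stream, keeping the first copy of every byte.

    segments is an iterable of (seq, payload) pairs in arrival order.
    Returns the contiguous data available starting at offset `start`.
    """
    received = {}  # byte offset -> byte value
    for seq, payload in segments:
        for i, value in enumerate(payload):
            # setdefault() leaves an already-received byte untouched.
            received.setdefault(seq + i, value)
    out = bytearray()
    pos = start
    while pos in received:
        out.append(received[pos])
        pos += 1
    return bytes(out)
```

	      For example, the segments (0, b"ATTA") and (2, b"XXCK") yield
	      b"ATTACK" under this policy, whereas a system that prefers the
	      last copy would deliver b"ATXXCK".  An NIDS and a monitored host
	      that differ in this choice see different data streams.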

2992	9.  TCP Congestion Control

2994	   TCP implements two algorithms, "slow start" and "congestion
2995	   avoidance", for controlling the rate at which data is transmitted on
2996	   a TCP connection [Allman et al, 1999].  These algorithms require the
2997	   addition of two variables as part of TCP per-connection state: cwnd
2998	   and ssthresh.

3000	   The congestion window (cwnd) is a sender-side limit on the amount of
3001	   outstanding data that the sender can have at any time, while the
3002	   receiver's advertised window (rwnd) is a receiver-side limit on the
3003	   amount of outstanding data.  The minimum of cwnd and rwnd governs
3004	   data transmission.

3006	   Another state variable, the slow-start threshold (ssthresh), is used
3007	   to determine whether it is the slow start or the congestion avoidance
3008	   algorithm that should control data transmission.  When cwnd <
3009	   ssthresh, "slow start" governs data transmission, and the congestion
3010	   window (cwnd) is exponentially increased.  When cwnd > ssthresh,
3011	   "congestion avoidance" governs data transmission, and the congestion
3012	   window (cwnd) is only linearly increased.

3014	   As specified in RFC 2581 [Allman et al, 1999], when cwnd and ssthresh
3015	   are equal the sender may use either slow start or congestion
3016	   avoidance.

3018	   During slow start, TCP increments cwnd by at most SMSS bytes for each
3019	   ACK received that acknowledges new data.  During congestion
3020	   avoidance, cwnd is incremented by 1 full-sized segment per round-trip
3021	   time (RTT), until congestion is detected.
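
	   The window growth described above can be sketched as follows (in
	   Python).  The function name is hypothetical; the congestion
	   avoidance increment uses the SMSS*SMSS/cwnd approximation from RFC
	   2581, and the choice of congestion avoidance when cwnd equals
	   ssthresh is an assumption of this sketch.

```python
def on_new_data_ack(cwnd: int, ssthresh: int, smss: int) -> int:
    """Return the new cwnd (in bytes) after an ACK for new data."""
    if cwnd < ssthresh:
        # Slow start: one SMSS per ACK, i.e. exponential growth per RTT.
        return cwnd + smss
    # Congestion avoidance: roughly one SMSS per RTT in total.
    return cwnd + max(1, smss * smss // cwnd)
```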

3023	   Additionally, TCP uses two algorithms, Fast Retransmit and Fast
3024	   Recovery, to mitigate the effects of packet loss.  The "Fast
3025	   Retransmit" algorithm infers packet loss when three Duplicate
3026	   Acknowledgements (DupACKs) are received.

3028	   The value "three" is meant to allow for fast-retransmission of
3029	   "missing" data, while preventing network packet reordering from
3030	   triggering loss recovery.

3032	   Once packet loss is detected by the receipt of three duplicate-ACKs,
3033	   the "Fast Recovery" algorithm governs the transfer of new data until
3034	   a non-duplicate ACK is received that acknowledges the receipt of new
3035	   data.  The Fast Retransmit and Fast Recovery algorithms are usually
3036	   implemented together, as follows (from RFC 2581):

3038	   o  When the third duplicate ACK is received, set ssthresh to no more
3039	      than the value given in the equation: ssthresh = max (FlightSize /
3040	      2, 2*SMSS)

3042	   o  Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
3043	      This artificially "inflates" the congestion window by the number
3044	      of segments (three) that have left the network and which the
3045	      receiver has buffered.

3047	   o  For each additional duplicate ACK received, increment cwnd by
3048	      SMSS.  This artificially inflates the congestion window in order
3049	      to reflect the additional segment that has left the network.

3051	   o  Transmit a segment, if allowed by the new value of cwnd and the
3052	      receiver's advertised window.

3054	   o  When the next ACK arrives that acknowledges new data, set cwnd to
3055	      ssthresh (the value set in the first step).  This is termed
3056	      "deflating" the window.
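
	   The steps above can be sketched as a small state machine (in
	   Python).  The class and method names are hypothetical, the actual
	   transmission of segments is left out, and the caller is assumed to
	   supply the current FlightSize.

```python
class RenoSender:
    """Minimal model of Fast Retransmit / Fast Recovery window handling."""

    def __init__(self, smss: int, cwnd: int, ssthresh: int):
        self.smss = smss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dupacks = 0
        self.in_recovery = False

    def on_dup_ack(self, flight_size: int) -> bool:
        """Process a duplicate ACK; return True if a retransmit is due."""
        self.dupacks += 1
        if self.dupacks == 3 and not self.in_recovery:
            self.ssthresh = max(flight_size // 2, 2 * self.smss)
            self.cwnd = self.ssthresh + 3 * self.smss  # "inflate"
            self.in_recovery = True
            return True
        if self.in_recovery:
            self.cwnd += self.smss  # one more segment has left the network
        return False

    def on_new_ack(self) -> None:
        """Process an ACK that acknowledges new data."""
        if self.in_recovery:
            self.cwnd = self.ssthresh  # "deflate"
            self.in_recovery = False
        self.dupacks = 0
```

	   Note that every duplicate ACK after the third one grows cwnd by
	   one SMSS, which is the behavior the attacks discussed in the
	   following sections rely on.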

3058	9.1.  Congestion control with misbehaving receivers

3060	   [Savage et al, 1999] describes a number of ways in which TCP's
3061	   congestion control mechanisms can be exploited by a misbehaving TCP
3062	   receiver to obtain more than its fair share of bandwidth.  The
3063	   following subsections provide a brief discussion of these
3064	   vulnerabilities, along with the possible countermeasures.

3066	9.1.1.  ACK division

3068	   TCP SHOULD increase cwnd by one SMSS only when a valid ACK covers the
3069	   entire data segment sent.

3071	   (note: or should we recommend the other countermeasure, i.e.,
3072	   implementation of ABC?)

3074	   DISCUSSION:

3076	      Given that TCP updates cwnd based on the number of ACKs it
3077	      receives, rather than on the amount of data that each ACK is
3078	      actually acknowledging, a malicious TCP receiver could cause the
3079	      TCP sender to illegitimately increase its congestion window by
3080	      acknowledging a data segment with a number of separate
3081	      Acknowledgements, each covering a distinct piece of the received
3082	      data segment.

3084	             See Figure 7, on page 64 of the UK CPNI document.

3086	                             ACK division attack

3088	      [Savage et al, 1999] describes two possible countermeasures for
3089	      this vulnerability.  One of them is to increment cwnd not by a
3090	      full SMSS, but proportionally to the amount of data being
3091	      acknowledged by the received ACK, similarly to the policy
3092	      described in RFC 3465 [Allman, 2003].  Another alternative is to
3093	      increase cwnd by one SMSS only when a valid ACK covers the entire
3094	      data segment sent.
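
	      The effect of ACK division, and of the two countermeasures, can
	      be illustrated with the following sketch (in Python).  The
	      function names are hypothetical, only slow-start growth is
	      modeled, and the byte-counting variant is a simplification in
	      the spirit of RFC 3465 rather than an implementation of it.

```python
def grow_per_ack(cwnd: int, smss: int, acked_sizes) -> int:
    """Vulnerable policy: add one SMSS for every ACK, whatever it covers."""
    for _ in acked_sizes:
        cwnd += smss
    return cwnd


def grow_per_bytes_acked(cwnd: int, smss: int, acked_sizes) -> int:
    """Countermeasure 1: grow by the bytes acknowledged, capped at SMSS."""
    for n in acked_sizes:
        cwnd += min(n, smss)
    return cwnd


def grow_per_full_segment(cwnd: int, smss: int, acked_sizes,
                          segment_size: int) -> int:
    """Countermeasure 2 (one reading): grow by one SMSS only once the
    ACKs received cover an entire segment of segment_size bytes."""
    pending = 0
    for n in acked_sizes:
        pending += n
        while pending >= segment_size:
            cwnd += smss
            pending -= segment_size
    return cwnd
```

	      With SMSS = 1000 and a single 1000-byte segment acknowledged in
	      ten 100-byte pieces, the first policy grows cwnd by 10000 bytes,
	      while both countermeasures grow it by only 1000 bytes.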

3096	9.1.2.  DupACK forgery

3098	   TCP SHOULD keep track of the number of outstanding segments (o_seg),
3099	   and accept only up to (o_seg - 1) duplicate Acknowledgements.

3101	   DISCUSSION:

3103	      The second vulnerability discussed in [Savage et al, 1999] allows
3104	      an attacker to cause the TCP sender to illegitimately increase its
3105	      congestion window by forging a number of duplicate
3106	      Acknowledgements (DupACKs).  Figure 8 shows a sample scenario.
3107	      The first three DupACKs trigger the Fast Recovery mechanism, while
3108	      the rest of them cause the congestion window at the TCP sender to
3109	      be illegitimately inflated.  Thus, the attacker is able to
3110	      illegitimately cause the TCP sender to increase its data
3111	      transmission rate.

3113	             See Figure 8, on page 65 of the UK CPNI document.

3115	                            DupACK forgery attack

3117	      Fortunately, a number of sender-side heuristics can be implemented
3118	      to mitigate this vulnerability.  First, the TCP sender could keep
3119	      track of the number of outstanding segments (o_seg), and accept
3120	      only up to (o_seg - 1) DupACKs.  Secondly, a TCP sender might, for
3121	      example, refuse to enter Fast Recovery multiple times in some
3122	      period of time (e.g., one RTT).
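
	      The first heuristic can be sketched as follows (in Python); the
	      function name is hypothetical.  The bound reflects the fact that
	      if one of o_seg outstanding segments is lost, at most the
	      remaining (o_seg - 1) segments can each elicit a legitimate
	      DupACK.

```python
def accepted_dupacks(dupacks_received: int, o_seg: int) -> int:
    """Number of DupACKs the sender should honor for one loss event."""
    limit = max(o_seg - 1, 0)
    return min(dupacks_received, limit)
```

	      For example, a burst of 50 DupACKs received while only 10
	      segments are outstanding would be honored for the first 9 and
	      ignored thereafter.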

3124	      [Savage et al, 1999] also describes a modification to TCP to
3125	      implement a nonce protocol that would eliminate this
3126	      vulnerability.  However, this would require modification of all
3127	      implementations, which makes this counter-measure hard to deploy.

3129	9.1.3.  Optimistic ACKing

3131	   Another way for an attacker to exploit TCP's congestion control
3132	   mechanisms is to acknowledge data that has not yet been received,
3133	   thus causing the congestion window at the TCP sender to be
3134	   incremented faster than it should be.

3136	             See Figure 9, on page 66 of the UK CPNI document.

3138	                         Optimistic ACKing attack

3140	   [Savage et al, 1999] describes a number of mitigations for this
3141	   vulnerability.  Firstly, it describes a countermeasure based on the
3142	   concept of "cumulative nonce", which would allow a receiver to prove
3143	   that it has received all the segments it is acknowledging.  However,
3144	   this countermeasure requires the introduction of two new fields in
3145	   the TCP header, and thus a modification to all the communicating
3146	   TCPs, which makes this countermeasure hard to deploy.
3147	   Secondly, it describes a possible way to encode the nonce in a TCP
3148	   segment by carefully modifying its size.  While this countermeasure
3149	   could be easily deployed (as it is just sender side policy), we
3150	   believe that middle-boxes such as protocol-scrubbers might prevent
3151	   this counter-measure from working as expected.  Finally, it suggests
3152	   that a TCP sender might penalize a TCP receiver that acknowledges
3153	   data not yet sent by resetting the corresponding connection.  Here we
3154	   discourage the implementation of this policy, as it would provide an
3155	   attack vector for a TCP-based connection-reset attack, similar to
3156	   those described in Section 11.

3158	   [US-CERT, 2005a] is a vulnerability advisory about this issue.

3160	9.2.  Blind DupACK triggering attacks against TCP

3162	   While all of the attacks discussed in [Savage et al, 1999] have the
3163	   goal of increasing the performance of the attacker's TCP connections,
3164	   TCP congestion control mechanisms can be exploited with a variety of
3165	   goals.

3167	   Firstly, if bursts of many duplicate-ACKs are sent to the "sending
3168	   TCP", the third duplicate-ACK will cause the "lost" segment to be
3169	   retransmitted, and each subsequent duplicate-ACK will cause cwnd to
3170	   be artificially inflated.  Thus, the "sending TCP" might end up
3171	   injecting more packets into the network than it really should, with
3172	   the potential of causing network congestion.  This is a potential
3173	   consequence of the "Duplicate-ACK spoofing attack" described in
3174	   [Savage et al, 1999].

3176	   Secondly, if bursts of three duplicate ACKs are sent to the TCP
3177	   sender, the attacked system would infer packet loss, and ssthresh and
3178	   cwnd would be reduced.  As noted in RFC 2581 [Allman et al, 1999],
3179	   causing two congestion control events back-to-back will often cut
3180	   ssthresh and cwnd to their minimum value of 2*SMSS, with the
3181	   connection immediately entering the slower-performing congestion
3182	   avoidance phase.  While it would not be attractive for an attacker to
3183	   perform this attack against one of his TCP connections, the attack
3184	   might be attractive when the TCP connection to be attacked is
3185	   established between two other parties.

3187	   It is usually assumed that in order for an off-path attacker to
3188	   perform attacks against a third-party TCP connection, he should be
3189	   able to guess a number of values, including a valid TCP Sequence
3190	   Number and a valid TCP Acknowledgement Number.  While this is true if
3191	   the attacker tries to "inject" valid packets into the connection by
3192	   himself, a feature of TCP can be exploited to fool one of the TCP
3193	   endpoints into transmitting valid duplicate Acknowledgements on
3194	   behalf of the attacker, hence relieving the attacker of the hard
3195	   task of forging valid values for the Sequence Number and
3196	   Acknowledgement Number TCP header fields.

3198	   Section 3.9 of RFC 793 [Postel, 1981c] describes the processing of
3199	   incoming TCP segments as a function of the connection state and the
3200	   contents of the various header fields of the received segment.  For
3201	   connections in the ESTABLISHED state, the first check that is
3202	   performed on incoming segments is that they contain "in window" data.
3203	   That is,

3205	                 RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND, or

3207	               RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

3209	   If a segment does not pass this check, it is dropped, and an
3210	   Acknowledgement is sent in response:

3212	                    <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

3214	   The goal of this behavior is that, in the event data segments are
3215	   received by the TCP receiver, but all the corresponding
3216	   Acknowledgements are lost, when the TCP sender retransmits the
3217	   supposedly lost data, the TCP receiver will send an Acknowledgement
3218	   reflecting all the data received so far.  If "old" TCP segments were
3219	   silently dropped, the scenario just described would lead to a
3220	   "frozen" TCP connection, with the TCP sender retransmitting the data
3221	   for which it has not yet received an Acknowledgement, and the TCP
3222	   receiver silently ignoring these segments.  Additionally, it helps
3223	   TCP to detect half-open connections.
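
	   The check and the resulting Acknowledgement can be sketched as
	   follows (in Python).  The function names are hypothetical, the
	   sketch uses the strict upper bound given in RFC 793, performs the
	   comparisons modulo 2^32, and does not cover the special cases RFC
	   793 defines for a zero receive window.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def _in_range(low: int, x: int, size: int) -> bool:
    """True if low <= x < low + size in modulo-2^32 arithmetic."""
    return (x - low) % MOD < size


def segment_in_window(seg_seq: int, seg_len: int,
                      rcv_nxt: int, rcv_wnd: int) -> bool:
    """Either the first or the last byte of the segment is in the window."""
    first_ok = _in_range(rcv_nxt, seg_seq, rcv_wnd)
    last_ok = seg_len > 0 and _in_range(
        rcv_nxt, (seg_seq + seg_len - 1) % MOD, rcv_wnd)
    return first_ok or last_ok


def ack_for_rejected_segment(snd_nxt: int, rcv_nxt: int) -> dict:
    """The ACK sent when the check fails: <SEQ=SND.NXT><ACK=RCV.NXT>."""
    return {"seq": snd_nxt, "ack": rcv_nxt, "ctl": "ACK"}
```

	   The Acknowledgement carries no information derived from the
	   rejected segment: its SEQ and ACK values come entirely from the
	   receiver's own connection state.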

3225	   This feature implies that, provided the four-tuple that identifies a
3226	   given TCP connection is known or can be easily guessed, an attacker
3227	   could send a TCP segment with an "out of window" Sequence Number to
3228	   one of the endpoints of the TCP connection to cause it to send a
3229	   valid ACK to the other endpoint of the connection.  Figure 10
3230	   illustrates such a scenario.

3232	             See Figure 10, on page 68 of the UK CPNI document.

3234	                       Blind Dup-ACK forgery attack

3236	   As discussed in [Watson, 2004] and RFC 4953 [Touch, 2007], there are
3237	   a number of scenarios in which the four-tuple that identifies a TCP
3238	   connection is known or can be easily guessed.  In those scenarios, an
3239	   attacker could perform any of the "blind" attacks described in the
3240	   following subsections by exploiting the technique described above.

3242	   The following subsections describe blind DupACK-triggering attacks
3243	   that aim at either degrading the performance of an arbitrary
3244	   connection, or causing a TCP sender to illegitimately increase the
3245	   rate at which it transmits data, potentially leading to network
3246	   congestion.

3248	9.2.1.  Blind throughput-reduction attack

3250	   As discussed in Section 9, when three duplicate Acknowledgements are
3251	   received, the congestion window is reduced to half the current amount
3252	   of outstanding data (FlightSize).  Additionally, the slow-start
3253	   threshold (ssthresh) is reduced to the same value, causing the
3254	   connection to enter the slower-performing congestion avoidance phase.
3255	   If two congestion-control events occur back to back, ssthresh and
3256	   cwnd will often be reduced to their minimum value of 2*SMSS.

3258	   An attacker could exploit the technique described in Section 9.2 to
3259	   cause the throughput of the attacked TCP connection to be reduced, by
3260	   eliciting three duplicate acknowledgements from the TCP receiver,
3261	   which would cause the TCP sender to reduce its congestion window.  In
3262	   principle, the attacker would need to send a burst of only three out-
3263	   of-window segments.  However, in case the TCP receiver implements an
3264	   acknowledgement policy such as "ACK every other segment", four out-
3265	   of-window segments might be needed.  The first segment would cause
3266	   the pending (delayed) Acknowledgement to be sent, and the next three
3267	   segments would elicit the actual duplicate Acknowledgements.

3269	   Figure 11 shows a time-line graph of a sample scenario.  The burst of
3270	   DupACKs (in green) elicited by the burst of out-of-window segments
3271	   (in red) sent by the attacker causes the TCP sender to retransmit the
3272	   missing segment (in blue) and enter the loss recovery phase.  Once a
3273	   segment that acknowledges new data is received by the TCP sender, the
3274	   loss recovery phase ends, and cwnd and ssthresh are set to half the
3275	   number of segments that were outstanding when the loss recovery phase
3276	   was entered.

3278	             See Figure 11, on page 69 of the UK CPNI document.

3280	            Blind throughput-reduction attack (time-line graph)

3282	   The graphic assumes that the TCP receiver sends an Acknowledgement
3283	   for every other data segment it receives, and that the TCP sender
3284	   implements Appropriate Byte Counting (specified in RFC 3465 [Allman,
3285	   2003]) on the received Acknowledgement segments.  However,
3286	   implementation of these policies is not required for the attack to
3287	   succeed.

3289	9.2.2.  Blind flooding attack

3291	   As discussed in Section 9, when three duplicate Acknowledgements are
3292	   received, the "lost" segment is retransmitted, and the congestion
3293	   window is artificially inflated for each DupACK received, until the
3294	   loss recovery phase ends.  By sending a long burst of out-of-window
3295	   segments to the TCP receiver of the attacked connection, an attacker
3296	   could elicit a long burst of valid duplicate acknowledgements that
3297	   would illegitimately cause the TCP sender of the attacked TCP
3298	   connection to increase its data transmission rate.

3300	   Figure 12 shows a time-line graph for this attack.  The long burst of
3301	   DupACKs (in green) elicited by the long burst of out-of-window
3302	   segments (in red) sent by the attacker causes the TCP sender to enter
3303	   the loss recovery phase and illegitimately inflate the congestion
3304	   window, leading to an increase in the data transmission rate.  Once a
3305	   segment that acknowledges new data is received by the TCP sender, the
3306	   loss recovery phase ends, and the data transmission rate is reduced.

3308	             See Figure 12, on page 70 of the UK CPNI document.

3310	                  Blind flooding attack (time-line graph)

3312	   Figure 13 is a time-sequence graph produced from packet logs obtained
3313	   from tests of the described attack in a real network.  A burst of
3314	   segments is sent upon receipt of the burst of Duplicate
3315	   Acknowledgements illegitimately elicited by the attacker.  Figure 14
3316	   is an averaged-throughput graphic for the same time frame, which
3317	   clearly shows the effect of the attack in terms of throughput.

3319	             See Figure 13, on page 71 of the UK CPNI document.

3321	                Blind flooding attack (time sequence graph)

3323	             See Figure 14, on page 71 of the UK CPNI document.

3325	             Blind flooding attack (averaged throughput graph)

3327	   These graphics were produced with Shawn Ostermann's tcptrace tool
3328	   [Ostermann, 2008].  An explanation of the format of the graphics can
3329	   be found in tcptrace's manual (available at the project's web site:
3330	   http://www.tcptrace.org).

3332	9.2.3.  Difficulty in performing the attacks

3334	   In order to exploit the technique described in Section 9.2 of this
3335	   document, an attacker would need to know the four-tuple {IP Source
3336	   Address, TCP Source Port, IP Destination Address, TCP Destination
3337	   Port} that identifies the connection to be attacked.  As discussed by
3338	   [Watson, 2004] and RFC 4953 [Touch, 2007], there are a number of
3339	   scenarios in which these values may be known or easily guessed.

3341	   It is interesting to note that the attacks described in Section 9.2
3342	   of this document will typically require a much smaller number of
3343	   packets than other "blind" attacks against TCP, such as those
3344	   described in [Watson, 2004] and RFC 4953 [Touch, 2007], as the
3345	   technique discussed in Section 9.2 relieves the attacker from having
3346	   to guess valid TCP Sequence Numbers and TCP Acknowledgement
3347	   Numbers.

3349	   The attacks described in Section 9.2.1 and Section 9.2.2 of this
3350	   document require the attacker to forge the source address of the
3351	   packets it sends.  Therefore, if ingress/egress filtering is
3352	   performed by intermediate systems, the attacker's packets would not
3353	   get to the intended recipient, and thus the attack would not succeed.
3354	   However, we consider that ingress/egress filtering cannot be relied
3355	   upon as the first line of defense against these attacks.

3357	   Finally, it is worth noting that in order to successfully perform the
3358	   blind attacks discussed in Section 9.2.1 and Section 9.2.2 of this
3359	   document, the burst of out-of-sequence segments sent by the attacker
3360	   should not be intermixed with valid data segments sent by the TCP
3361	   sender, or else the Acknowledgement number of the illegitimately-
3362	   elicited ACK segments would change, and the Acknowledgements would
3363	   not be considered "Duplicate Acknowledgements" by the TCP sender.
3364	   Tests performed in real networks seem to suggest that this
3365	   requirement is not hard to fulfill, though.

3367	9.2.4.  Modifications to TCP's loss recovery algorithms

3369	   There are a number of algorithms that augment TCP's loss recovery
3370	   mechanism that have been suggested by TCP researchers and have been
3371	   specified by the IETF in the RFC series.  This section describes a
3372	   number of these algorithms, and discusses how their implementation
3373	   affects (or does not affect) the vulnerability of TCP to the attacks
3374	   discussed in Section 9.2.1 and Section 9.2.2 of this document.

3376	   NewReno

3378	   RFC 3782 [Floyd et al, 2004] specifies the NewReno algorithm, which
3379	   is meant to improve TCP's performance in the presence of multiple
3380	   losses in a single window of data.  The implication of this algorithm
3381	   with respect to the attacks discussed in the previous sections is
3382	   that whenever either of the attacks is performed against a connection
3383	   with a NewReno TCP sender, a full-window (or half a window) of data
3384	   will be unnecessarily retransmitted.  This is particularly
3385	   interesting in the case of the blind-flooding attack, as the attack
3386	   would elicit even more packets from the TCP sender.

3388	   Whether a full-window or just half a window of data is retransmitted
3389	   depends on the Acknowledgement policy at the TCP receiver.  If the
3390	   TCP receiver sends an Acknowledgement (ACK) for every segment, a
3391	   full-window of data will be retransmitted.  If the TCP receiver sends
3392	   an Acknowledgement (ACK) for every other segment, then only half a
3393	   window of data will be retransmitted.

3395	   Figure 15 is a time-sequence graph produced from packet logs obtained
3396	   from tests performed in a real network.  Once loss recovery is
3397	   illegitimately triggered by the duplicate-ACKs elicited by the
3398	   attacker, an entire flight of data is unnecessarily retransmitted.
3399	   Figure 16 is an averaged-throughput graphic for the same time-frame,
3400	   which shows an increase in the throughput of the connection resulting
3401	   from the retransmission of segments governed by NewReno's loss
3402	   recovery.

3404	             See Figure 15, on page 73 of the UK CPNI document.

3406	                NewReno loss recovery (time-sequence graph)

3408	             See Figure 16, on page 74 of the UK CPNI document.

3410	             NewReno loss recovery (averaged throughput graph)

3412	   Limited Transmit

3414	   RFC 3042 [Allman et al, 2001] proposes an enhancement to TCP to more
3415	   effectively recover lost segments when a connection's congestion
3416	   window is small, or when a large number of segments are lost in a
3417	   single transmission window.  The "Limited Transmit" algorithm calls
3418	   for sending a new data segment in response to each of the first two
3419	   Duplicate Acknowledgements that arrive at the TCP sender.  This would
3420	   provide two additional transmitted packets that may be useful for the
3421	   attacker if the blind flooding attack described in Section 9.2.2 is
3422	   performed.

3424	   SACK-based loss recovery

3426	   RFC 3517 [Blanton et al, 2003] specifies a conservative loss-recovery
3427	   algorithm that is based on the use of the selective acknowledgement
3428	   (SACK) TCP option.  The algorithm uses DupACKs as an indication of
3429	   congestion, as specified in RFC 2581 [Allman et al, 1999].  However,
3430	   a difference between this algorithm and the basic algorithm described
3431	   in RFC 2581 is that it clocks out segments only with the SACK
3432	   information included in the DupACKs.  That is, during the loss
3433	   recovery phase, segments will be injected in the network only if the
3434	   SACK information included in the received DupACKs indicates that one
3435	   or more segments have left the network.  As a result, those systems
3436	   that implement SACK-based loss recovery will not be vulnerable to the
3437	   blind flooding attack described in Section 9.2.2.  However, as RFC
3438	   3517 does not actually require DupACKs to include new SACK
3439	   information (corresponding to data that has not yet been acknowledged
3440	   by TCP's cumulative Acknowledgement), systems that implement SACK-
3441	   based loss-recovery may still remain vulnerable to the blind
3442	   throughput-reduction attack described in Section 9.2.1.  SACK-based
3443	   loss recovery implementations should be updated to implement the
3444	   countermeasure ("Use of SACK information to validate DupACKs")
3445	   described in Section 9.2.5.

3447	9.2.5.  Countermeasures

3449	   TCP SHOULD validate the Sequence Number of an incoming TCP segment
3450	   as follows:

3452	           RCV.NXT - MAX.RCV.WND <= SEG.SEQ <= RCV.NXT + RCV.WND

3454	   where MAX.RCV.WND is the largest TCP window that has so far been
3455	   advertised to the remote endpoint.

3457	   If a segment passes this check, the processing rules specified in RFC
3458	   793 [Postel, 1981c] MUST be applied.  Otherwise, TCP SHOULD send an
3459	   ACK (as specified by the processing rules in RFC 793 [Postel,
3460	   1981c]), applying rate-limiting to the Acknowledgement segments sent
3461	   in response to out-of-window segments.
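
	   The check above can be sketched in Python as follows.  This is a
	   minimal illustration, not a normative algorithm: the function
	   names (seq_offset, seq_acceptable) are hypothetical, and the
	   modulo-2**32 handling of sequence-number wrap-around is an
	   assumption that the equation above leaves implicit.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit values that wrap around


def seq_offset(seq, base):
    """Signed distance from base to seq in modulo-2**32 sequence space.

    Hypothetical helper: offsets of 2**31 or more are treated as negative.
    """
    diff = (seq - base) % MOD
    return diff - MOD if diff >= MOD // 2 else diff


def seq_acceptable(seg_seq, rcv_nxt, rcv_wnd, max_rcv_wnd):
    """Return True if RCV.NXT - MAX.RCV.WND <= SEG.SEQ <= RCV.NXT + RCV.WND.

    max_rcv_wnd is the largest window advertised so far to the peer.
    """
    offset = seq_offset(seg_seq, rcv_nxt)
    return -max_rcv_wnd <= offset <= rcv_wnd
```

	   A segment for which seq_acceptable() returns False would not be
	   processed further; at most, it would elicit a rate-limited ACK.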

3463	   DISCUSSION:

3465	      As discussed in Section 9.2, TCP responds with an ACK when an out-
3466	      of-window segment is received, to accommodate those scenarios in
3467	      which the Acknowledgement segments that correspond to some
3468	      received data are lost in the network, and to help discover half-
3469	      open TCP connections.

3471	      However, it is possible to restrict the sequence numbers that are
3472	      considered acceptable, and have TCP respond with ACKs only when it
3473	      is strictly necessary.

3475	      A feature of TCP is that, in some scenarios, it can detect half-
3476	      open connections.  If an implementation chose to silently drop
3477	      those TCP segments that do not pass the check enforced by the
3478	      equation above, it could prevent TCP from detecting half-open
3479	      connections.  Figure 17 shows a scenario in which, provided that
3480	      "TCP B" behaves as specified in RFC 793, a half-open connection
3481	      would be discovered and aborted.

3483	      An established connection is said to be "half open" if one of the
3484	      TCPs has closed or aborted the connection at its end without the
3485	      knowledge of the other, or if the two ends of the connection have
3486	      become desynchronized owing to a crash that resulted in loss of
3487	      memory.

3489	             See Figure 17, on page 76 of the UK CPNI document.

3491	                        Half-Open Connection Discovery

3493	      In the scenario illustrated by Figure 17, TCP A crashes losing the
3494	      connection-state information of the TCP connection with TCP B. In
3495	      line 3, TCP A tries to establish a new connection with TCP B,
3496	      using the same four-tuple {IP Source Address, TCP source port, IP
3497	      Destination Address, TCP destination port}.  In line 4, as the SYN
3498	      segment is out of window, TCP B responds with an ACK.  This ACK
3499	      elicits an RST segment from TCP A, which causes the half-open
3500	      connection at TCP B to be aborted.

3502	      If the SYN segment had been "in window", TCP B would have sent an
3503	      RST segment instead, which would have closed the half-open
3504	      connection.  Ongoing work at the TCPM WG of the IETF proposes to
3505	      change this behavior, and make TCP respond to a SYN segment
3506	      received for any of the synchronized states with an ACK segment,
3507	      to prevent in-window SYN segments from being used to perform
3508	      connection-reset attacks [Ramaiah et al, 2008].

3510	      However, if the out-of-window segment were silently dropped,
3511	      the scenario in Figure 17 would change into that in Figure 18.

3513	             See Figure 18, on page 76 of the UK CPNI document.

3515	       Half-Open Connection Discovery with the proposed counter-measure

3517	      In line 3, the SYN segment sent by TCP A is silently dropped by
3518	      TCP B because it does not pass the check enforced by the equation
3519	      above (i.e., it contains an out-of-window sequence number).  As a
3520	      result, some time later (an RTO) TCP A retransmits its SYN
3521	      segment.  Even after TCP A times out, the half-open connection at
3522	      TCP B will remain in the same state.

3524	      Thus, a conservative reaction to those segments that do not pass
3525	      the check enforced by the equation above would be to respond with
3526	      an Acknowledgement segment (as specified by RFC 793), applying
3527	      rate-limiting to those Acknowledgement segments sent in response
3528	      to segments that do not pass the check enforced by that equation.
3529	      An implementation might choose to enforce a rate-limit of, e.g.,
3530	      one ACK per five seconds, as a single ACK segment is needed for
3531	      the Half-Open Connection Discovery mechanism to work.

3533	      As the only reason to respond with an ACK to those segments that
3534	      do not pass the check enforced by the equation above is to allow
3535	      TCP to discover half-open connections, an aggressive rate-limit
3536	      can be enforced.  As long as the rate-limit prevents out-of-window
3537	      segments from eliciting three Acknowledgment segments in a Round-
3538	      trip Time (RTT), an attacker would not be able to trigger TCP's
3539	      loss-recovery, and thus would not be able to perform the attacks
3540	      described in the previous sections.
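
	      The rate-limit discussed above could be sketched as follows.
	      This is only an illustration: the class name, the method name,
	      and the use of a caller-supplied timestamp are assumptions,
	      and the five-second interval is simply the example value
	      mentioned above.

```python
class OutOfWindowAckLimiter:
    """Allow at most one ACK per min_interval seconds (hypothetical helper)."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval  # seconds between permitted ACKs
        self.last_ack_time = None         # time the last ACK was permitted

    def allow_ack(self, now):
        """Return True if an ACK may be sent at time 'now' (in seconds)."""
        if (self.last_ack_time is not None
                and now - self.last_ack_time < self.min_interval):
            return False  # too soon: silently drop the out-of-window segment
        self.last_ack_time = now
        return True
```

	      With any interval longer than the connection's RTT, an
	      attacker cannot elicit three Acknowledgement segments within a
	      single RTT, which is the condition stated above.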

3542	      It is interesting to note that RFC 793 [Postel, 1981c] itself
3543	      states that half-open connections are expected to be unusual.
3544	      Additionally, given that in many scenarios it may be unlikely for
3545	      a TCP connection request to be issued with the same four-tuple as
3546	      that of the half-open connection, a complete solution for the
3547	      discovery of half-open connections cannot rely on the mechanism
3548	      illustrated by Figure 17, either.  Therefore, some implementations
3549	      might choose to sacrifice TCP's ability to detect half-open
3550	      connections, and have a more aggressive reaction to those segments
3551	      that do not pass the check enforced by the equation above by
3552	      silently dropping them.

3554	      This validation check can also help to avoid ACK wars in some
3555	      scenarios that may arise from the use of transparent proxies.  In
3556	      those scenarios, when the transparent proxy fails to wire (i.e.,
3557	      is disabled), the sequence numbers of the two end-points of the
3558	      TCP connection become desynchronized, and both TCPs begin to send
3559	      duplicate Acknowledgements to each other, with the intention of
3560	      re-synchronizing them.  As the sequence numbers never get re-
3561	      synchronized, the ACK war can only be stopped by an external
3562	      agent.

3564	   TCP SHOULD limit the number of duplicate acknowledgements it will
3565	   honour to:

3567	                   Max_DupACKs = (FlightSize / SMSS) - 1

3569	   Where FlightSize and SMSS are the values defined in RFC 2581 [Allman
3570	   et al, 1999].  When more than Max_DupACKs duplicate acknowledgements
3571	   are received, the exceeding DupACKs should be silently dropped.
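
	   As a worked example of the formula above, the following sketch
	   computes Max_DupACKs and decides whether a further DupACK should
	   be honoured.  The function names are hypothetical, and both the
	   use of integer division and the clamping of the result at zero
	   are assumptions that the formula does not spell out.

```python
def max_dupacks(flight_size, smss):
    """Max_DupACKs = (FlightSize / SMSS) - 1, with both values in bytes.

    Assumptions: integer division, and a floor of zero for small flights.
    """
    return max(flight_size // smss - 1, 0)


def honour_dupack(dupacks_honoured, flight_size, smss):
    """Return True if one more DupACK may be honoured.

    dupacks_honoured is the number of DupACKs already honoured for the
    current flight of data.
    """
    return dupacks_honoured + 1 <= max_dupacks(flight_size, smss)
```

	   For instance, with ten full-sized segments of 1460 bytes in
	   flight, at most nine duplicate acknowledgements are honoured.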

3573	   DISCUSSION:

3575	      Note that duplicate acknowledgements should be elicited by out-of-
3576	      order segments.

3578	   In the case of TCP connections that have agreed to employ SACK, TCP
3579	   SHOULD validate duplicate ACKs with the following criteria: Valid
3580	   Duplicate ACKs MUST contain new SACK information.  The SACK
3581	   information MUST refer to data that has already been sent, but that
3582	   has not yet been acknowledged by TCP's cumulative Acknowledgement.  A
3583	   TCP segment that does not pass this check SHOULD NOT be considered as
3584	   "duplicate Acknowledgement".

3586	   DISCUSSION:

3588	      SACK, specified in RFC 2018 [Mathis et al, 1996], provides a
3589	      mechanism for TCP to acknowledge the receipt of out-of-order TCP
3590	      segments.  For connections that have agreed to use SACK, each
3591	      legitimate DupACK will contain new SACK information that reflects
3592	      the data bytes contained in the out-of-order data segment that
3593	      elicited the DupACK.

3595	      RFC 3517 [Blanton et al, 2003] specifies a SACK-based loss
3596	      recovery algorithm for TCP.  However, it does not recommend that
3597	      implementations validate DupACKs by requiring that they contain
3598	      new SACK information.  Results obtained from auditing a number of
3599	      TCP implementations seem to indicate that most TCP implementations
3600	      do not enforce this validation check on incoming DupACKs, either.

3602	      In the case of TCP connections that have agreed to use SACK, a
3603	      validation check should be performed on incoming ACK segments to
3604	      completely eliminate the attacks described in Section 9.2.1 and
3605	      Section 9.2.2 of this document: "Duplicate ACKs should contain new
3606	      SACK information.  The SACK information should refer to data that
3607	      has already been sent, but that has not yet been acknowledged by
3608	      TCP's cumulative Acknowledgement".

3610	      Those ACK segments that do not comply with this validation check
3611	      should not be considered "duplicate ACKs", and thus should not
3612	      trigger the loss-recovery phase.

3614	      In case at least one segment in a window of data has been lost,
3615	      the successive segments will elicit the generation of Duplicate
3616	      ACKs containing new SACK information.  This SACK information will
3617	      indicate the receipt of these successive segments by the TCP
3618	      receiver.

3620	      In the case of pure ACKs illegitimately elicited by out-of-window
3621	      segments, however, the ACKs will not contain any SACK information.

3623	      If DSACK (specified in RFC 2883 [Floyd et al, 2000]) were
3624	      implemented by the TCP receiver, then the illegitimately elicited
3625	      DupACKs might contain out-of-window SACK information if the
3626	      sequence number of the forged TCP segment (SEG.SEQ) is lower than
3627	      the next expected sequence number (RCV.NXT) at the TCP receiver.
3628	      Such segments should be considered to indicate the receipt of
3629	      duplicate data, rather than an indication of lost data, and
3630	      therefore should not trigger loss recovery.
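
	      The validation check discussed above might be sketched as
	      follows.  This is a simplified illustration under stated
	      assumptions: SACK blocks are (left, right) pairs with an
	      exclusive right edge, sequence-number wrap-around is ignored,
	      and the "scoreboard" is a plain set of blocks already seen
	      (a real implementation would merge overlapping ranges).  The
	      function name is hypothetical.

```python
def is_valid_dupack(sack_blocks, snd_una, snd_nxt, scoreboard):
    """Return True if the ACK carries new SACK information for in-flight data.

    sack_blocks: list of (left, right) pairs from the SACK option.
    snd_una:     oldest unacknowledged sequence number (cumulative ACK point).
    snd_nxt:     next sequence number to be sent.
    scoreboard:  set of blocks already reported; updated in place.
    Assumption: no sequence-number wrap-around within the window.
    """
    valid = False
    for left, right in sack_blocks:
        # Block must cover data that was sent but not cumulatively ACKed.
        # Blocks below snd_una (e.g. DSACK reports of duplicate data) and
        # blocks beyond snd_nxt fail this test and are ignored.
        in_flight = snd_una <= left < right <= snd_nxt
        if in_flight and (left, right) not in scoreboard:
            scoreboard.add((left, right))
            valid = True
    return valid
```

	      An ACK with no SACK blocks, such as one elicited by a forged
	      out-of-window segment, never counts as a DupACK in this sketch.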

3632	   Other possible general mitigations are discussed in the following
3633	   paragraphs:

3635	   TCP port number randomization

3637	   To perform the blind attacks described in Section 9.2.1 and Section
3638	   9.2.2, the attacker needs to know the TCP port numbers in use by the
3639	   connection to be attacked.  Obfuscating the TCP source port used for
3640	   outgoing TCP connections will therefore increase the number of
3641	   packets required to successfully perform these attacks.  Section 3.1
3642	   of this document discusses the use of port randomization.

3644	   It must be noted that given that these blind DupACK triggering
3645	   attacks do not require the attacker to forge valid TCP Sequence
3646	   numbers and TCP Acknowledgement numbers, port randomization should
3647	   not be relied upon as a first line of defense.

3649	   Ingress and Egress filtering

3651	   Ingress and Egress filtering reduces the number of systems in the
3652	   global Internet that can perform attacks that rely on forged source
3653	   IP addresses.  While protection from the blind attacks discussed in
3654	   Section 9.2 should not rely only on Ingress and Egress filtering, its
3655	   deployment is recommended to help prevent all attacks that rely on
3656	   forged IP addresses.  RFC 3704 [Baker and Savola, 2004], RFC 2827
3657	   [Ferguson and Senie, 2000], and [NISCC, 2006] provide advice on
3658	   Ingress and Egress filtering.

3660	   Generalized TTL Security Mechanism (GTSM)

3662	   RFC 5082 [Gill et al, 2007] proposes a check on the TTL field of the
3663	   IP packets that correspond to a given TCP connection to reduce the
3664	   number of systems that could successfully attack the protected TCP
3665	   connection.  It provides for the attacks discussed in this document
3666	   the same level of protection as for the attacks described in
3667	   [Watson, 2004] and RFC 4953 [Touch, 2007].  While implementation of
3668	   this mechanism may be useful in some scenarios, it should be clear
3669	   that countermeasures discussed in the previous sections provide a
3670	   more effective and simpler solution than that provided by the GTSM.

3672	9.3.  TCP Explicit Congestion Notification (ECN)

3674	   ECN (Explicit Congestion Notification) provides a mechanism for
3675	   intermediate systems to signal congestion to the communicating
3676	   endpoints that in some scenarios can be used as an alternative to
3677	   dropping packets.

3679	   RFC 3168 [Ramakrishnan et al, 2001] contains a detailed discussion of
3680	   the possible ways and scenarios in which ECN could be exploited by an
3681	   attacker.

3683	   RFC 3540 [Spring et al, 2003] specifies an improvement to ECN based
3684	   on nonces, that protects against accidental or malicious concealment
3685	   of marked packets from the TCP sender.  The specified mechanism
3686	   defines a "NS" ("Nonce Sum") field in the TCP header that makes use
3687	   of one bit from the Reserved field, and requires a modification in
3688	   both of the endpoints of a TCP connection to process this new field.
3689	   This mechanism is still in "Experimental" status, and since it might
3690	   suffer from the behavior of some middle-boxes such as firewalls or
3691	   packet-scrubbers, we defer a recommendation of this mechanism until
3692	   more experience is gained.

3694	   There also is ongoing work in the research community and the IETF to
3695	   define alternate semantics for the ECN field of the IP header (e.g.,
3696	   see [PCNWG, 2009]).

3698	   The following subsections try to summarize the security implications
3699	   of ECN.

3701	9.3.1.  Possible attacks by a compromised router

3703	   Firstly, a router controlled by a malicious user could erase the CE
3704	   codepoint (by replacing it with the ECT(0), ECT(1), or non-ECT
3705	   codepoint), effectively eliminating the congestion indication.  As a
3706	   result, the corresponding TCP sender would not reduce its data
3707	   transmission rate, possibly leading to network congestion.  This
3708	   could also lead to unfairness, as this flow could experience better
3709	   performance than other flows for which the congestion indication is
3710	   not erased (and thus their transmission rate is reduced).

3712	   Secondly, a router controlled by a malicious user could
3713	   illegitimately set the CE codepoint, falsely indicating congestion,
3714	   to cause the TCP sender to reduce its data transmission rate.
3715	   However, this particular attack is no worse than the malicious router
3716	   simply dropping the packets rather than setting their CE codepoint.

3718	   Thirdly, a malicious router could turn off the ECT codepoint of a
3719	   packet, thus disabling ECN support.  As a result, if the packet later
3720	   arrives at a router that is experiencing congestion, it may be
3721	   dropped rather than marked.  As with the previous scenario, though,
3722	   this is no worse than the malicious router simply dropping the
3723	   corresponding packet.

3725	   It should be noted that a compromised on-path IP router could engage
3726	   in a much broader range of attacks, with broader impacts, and at much
3727	   lower attacker cost than the ones described here.  Such a compromised
3728	   router is extremely unlikely to engage in the attack vectors
3729	   discussed in this section, given the existence of more effective
3730	   attack vectors that have lower attacker cost.

3732	9.3.2.  Possible attacks by a malicious TCP endpoint

3734	   If a packet with the ECT codepoint set arrives at an ECN-capable
3735	   router that is experiencing moderate congestion, the router may
3736	   decide to set its CE codepoint instead of dropping it.  If either of
3737	   the TCP endpoints does not honour the congestion indication provided
3738	   by an ECN-capable router, this would result in unfairness, as other
3739	   (legitimate) ECN-capable flows would still reduce their sending rate
3740	   in response to the ECN marking of packets.  Furthermore, under
3741	   moderate congestion, non-ECN-capable flows would be subject to packet
3742	   drops by the same router.  As a result, the flow with a malicious TCP
3743	   end-point would obtain better service than the legitimate flows.

3745	   As noted in RFC 3168 [Ramakrishnan et al, 2001], a TCP endpoint
3746	   falsely indicating ECN capability could lead to unfairness, allowing
3747	   the misbehaving flow to get more than its fair share of the
3748	   bandwidth.  This could be the result of the misbehavior of either of
3749	   the TCP endpoints.  For example, the sending TCP could indicate ECN
3750	   capability, but then send a CWR in response to an ECE without
3751	   actually reducing its congestion window.  Alternatively (or in
3752	   addition), the receiving TCP could simply ignore those packets with
3753	   the CE codepoint set, thus preventing the sending TCP from receiving
3754	   the congestion indication.

3756	   In the case of the sending TCP ignoring the ECN congestion
3757	   indication, this would be no worse than the sending TCP ignoring the
3758	   congestion indication provided by a lost segment.  However, the case
3759	   of a TCP receiver ignoring the CE codepoint allows the TCP receiver
3760	   to get more than its fair share of bandwidth in a way that was
3761	   previously unavailable.  If congestion was kept "moderate", then the
3762	   malicious TCP receiver could maintain the unfairness, as the router
3763	   experiencing congestion would mark the offending packets of the
3764	   misbehaving flow rather than dropping them.  At the same time,
3765	   legitimate ECN-capable flows would respond to the congestion
3766	   indication provided by the CE codepoint, while legitimate non-ECN-
3767	   capable flows would be subject to packet dropping.  However, if
3768	   congestion became sufficiently heavy, the router experiencing
3769	   congestion would switch from marking packets to dropping packets, and
3770	   at that point the attack vector provided by ECN could no longer be
3771	   exploited (until congestion returned to a moderate state).

3773	   RFC 3168 [Ramakrishnan et al, 2001] describes the use of "penalty
3774	   boxes" which would act on flows that do not respond appropriately to
3775	   congestion indications.  Section 10 of RFC 3168 suggests that a first
3776	   action taken at a penalty box for an ECN-capable flow would be to
3777	   switch to dropping packets (instead of marking them), and, if the
3778	   flow does not respond appropriately to the congestion indication, the
3779	   penalty box could reset the misbehaving connection.  Here we
3780	   discourage implementation of such a policy, as it would create a
3781	   vector for connection-reset attacks.  For example, an attacker could
3782	   forge TCP segments with the same four-tuple as the targeted
3783	   connection and cause them to transit the penalty box.  The penalty
3784	   box would first switch from marking to dropping packets.  However,
3785	   the attacker would continue sending forged segments, at a steady
3786	   rate.  As a result, if the penalty box implemented such a severe
3787	   policy of resetting connections for flows that still do not respond
3788	   to end-to-end congestion control after switching from marking to
3789	   dropping, the attacked connection would be reset.

3791	10.  TCP API

3793	   Section 3.8 of RFC 793 [Postel, 1981c] describes the minimum set of
3794	   TCP User Commands required of all TCP Implementations.  Most
3795	   operating systems provide an Application Programming Interface (API)
3796	   that allows applications to make use of the services provided by TCP.
3797	   One of the most popular APIs is the Sockets API, originally
3798	   introduced in the BSD networking package [McKusick et al, 1996].

3800	10.1.  Passive opens and binding sockets

3802	   When there is already a pending passive OPEN for some local port
3803	   number, TCP SHOULD NOT allow processes that do not belong to the same
3804	   user to "reuse" the local port for another passive OPEN.
3805	   Additionally, reuse of a local port SHOULD default to "off", and be
3806	   enabled only by an explicit command (e.g., the setsockopt() function
3807	   of the Sockets API).

3809	   DISCUSSION:

3811	      RFC 793 specifies the syntax of the "OPEN" command, which can be
3812	      used to perform both passive and active opens.  The syntax of this
3813	      command is as follows:

3815	      OPEN (local port, foreign socket, active/passive [, timeout] [,
3816	      precedence] [, security/compartment] [, options]) -> local
3817	      connection name

3819	      When this command is used to perform a passive open (i.e., the
3820	      active/passive flag is set to passive), the foreign socket
3821	      parameter may be either fully-specified (to wait for a particular
3822	      connection) or unspecified (to wait for any call).

3824	      As discussed in Section 2.7 of RFC 793 [Postel, 1981c], if there
3825	      are several passive OPENs with the same local socket (recorded in
3826	      the corresponding TCB), an incoming connection will be matched to
3827	      the TCB with the more specific foreign socket.  This means that
3828	      when the foreign socket of a passive OPEN matches that of the
3829	      incoming connection request, that passive OPEN takes precedence
3830	      over those passive OPENs with an unspecified foreign socket.

3832	      Popular implementations such as the Sockets API let the user
3833	      specify the local socket as fully-specified {local IP address,
3834	      local TCP port} pair, or as just the local TCP port (leaving the
3835	      local IP address unspecified).  In the former case, only those
3836	      connection requests sent to {local port, local IP address} will be
3837	      accepted.  In the latter case, connection requests sent to any of
3838	      the system's IP addresses will be accepted.  In a similar fashion
3839	      to the generic API described in Section 2.7 of RFC 793, if there
3840	      is a pending passive OPEN with a fully-specified local socket that
3841	      matches that for which a connection establishment request has been
3842	      received, that local socket will take precedence over those which
3843	      have left the local IP address unspecified.  The implication of
3844	      this is that an attacker could "steal" incoming connection
3845	      requests meant for a local application by performing a passive
3846	      OPEN that is more specific than that performed by the legitimate
3847	      application.
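
	      The matching rule described above, and the per-user restriction
	      recommended in this section, can be illustrated with the
	      following sketch.  It is not taken from any implementation:
	      the dictionary layout, the function names, and the example
	      addresses (from the documentation range 192.0.2.0/24) are all
	      assumptions made for illustration.

```python
def match_listener(listeners, dst_ip, dst_port):
    """Pick the listening socket for an incoming connection request.

    Each listener is a dict with keys 'ip' (None means unspecified),
    'port', and 'uid'.  A listener bound to the exact destination address
    takes precedence over one that left the address unspecified.
    """
    wildcard = None
    for listener in listeners:
        if listener["port"] != dst_port:
            continue
        if listener["ip"] == dst_ip:
            return listener  # most specific match wins
        if listener["ip"] is None and wildcard is None:
            wildcard = listener
    return wildcard


def may_share_port(listeners, port, uid):
    """Allow a new passive OPEN only if the port's listeners share its uid."""
    return all(l["uid"] == uid for l in listeners if l["port"] == port)
```

	      If may_share_port() were not enforced, a second user binding
	      {192.0.2.1, 80} would capture requests sent to that address
	      even though another user's server listens on port 80 with the
	      address unspecified.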

3849	10.2.  Active opens and binding sockets

3851	   TCP SHOULD NOT allow port numbers that have been allocated for a TCP
3852	   that is in the LISTEN or CLOSED state to be specified as the "local
3853	   port" argument of the "OPEN" command.

3855	   An implementation MAY relax the aforementioned restriction when the
3856	   process or system user requesting allocation of such a port number is
3857	   the same as the process or system user controlling the TCP in the
3858	   CLOSED or LISTEN states with the same port number.

3860	   DISCUSSION:

3862	      As discussed in Section 10.1, the "OPEN" command specified in
3863	      Section 3.8 of RFC 793 [Postel, 1981c] can be used to perform
3864	      active opens.  In case of active opens, the parameter "local port"
3865	      will contain a so-called "ephemeral port".  While the only
3866	      requirement for such an ephemeral port is that the resulting
3867	      connection-id is unique, port numbers that are currently in use by
3868	      a TCP in the LISTEN state should not be allowed for use as
3869	      ephemeral ports.  If this rule is not followed, an attacker could
3870	      potentially "steal" an incoming connection to a local server
3871	      application by issuing a connection request to the victim client
3872	      at roughly the same time the client tries to connect to the victim
3873	      server application.  If the SYN segment corresponding to the
3874	      attacker's connection request and the SYN segment corresponding to
3875	      the victim client "cross each other in the network", and provided
3876	      the attacker is able to know or guess the ephemeral port used by
3877	      the client, a TCP simultaneous open scenario would take place, and
3878	      the incoming connection request sent by the client would be
3879	      matched with the attacker's socket rather than with the victim
3880	      server application's socket.

3882	      As already noted, in order for this attack to succeed, the
3883	      attacker should be able to guess or know (in advance) the
3884	      ephemeral port selected by the victim client, and be able to know
3885	      the right moment to issue a connection request to the victim
3886	      client.  While in many scenarios this may prove to be a difficult
3887	      task, some factors such as an inadequate ephemeral port selection
3888	      policy at the victim client could make this attack feasible.

3890	      It should be noted that most applications based on popular
3891	      implementations of the TCP API (such as the Sockets API) perform
3892	      "passive opens" in three steps.  Firstly, the application obtains
3893	      a file descriptor to be used for inter-process communication
3894	      (e.g., by issuing a socket() call).  Secondly, the application
3895	      binds the file descriptor to a local TCP port number (e.g., by
3896	      issuing a bind() call), thus creating a TCP in the fictional
3897	      CLOSED state.  Thirdly, the aforementioned TCP is put in the
3898	      LISTEN state (e.g., by issuing a listen() call).  As a result,
3899	      with such an implementation of the TCP API, even if port numbers
3900	      in use for TCPs in the LISTEN state were not allowed for use as
3901	      ephemeral ports, there is a window of time between the second and
3902	      the third steps in which an attacker could be allowed to select a
3903	      port number that would be later used for listening to incoming
3904	      connections.  Therefore, these implementations of the TCP API
3905	      should enforce a stricter requirement for the allocation of port
3906	      numbers: port numbers that are in use by a TCP in the LISTEN or
3907	      CLOSED states should not be allowed for allocation as ephemeral
3908	      ports.

3910	      An implementation might choose to relax the aforementioned
3911	      restriction when the process or system user requesting allocation
3912	      of such a port number is the same as the process or system user
3913	      controlling the TCP in the CLOSED or LISTEN states with the same
3914	      port number.
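
	      The stricter allocation rule could be sketched as follows.  The
	      function name, the default port range, and the representation
	      of in-use ports as a set are assumptions; the optional
	      same-user relaxation described above is not modelled.

```python
import random


def pick_ephemeral_port(bound_ports, low=1024, high=65535, rng=random):
    """Select a random ephemeral port, skipping ports already bound.

    bound_ports: set of local ports in use by TCPs in the LISTEN or
                 CLOSED (i.e., bound but not yet listening) states.
    low, high:   inclusive port range; these defaults are an assumption.
    """
    candidates = [p for p in range(low, high + 1) if p not in bound_ports]
    if not candidates:
        raise RuntimeError("no ephemeral port available")
    return rng.choice(candidates)
```

	      Because ports are excluded as soon as they are bound, the
	      window of time between the bind() and listen() calls no longer
	      exposes the port to allocation as an ephemeral port.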

3916	11.  Blind in-window attacks

3918	   In the last few years awareness has been raised about a number of
3919	   "blind" attacks that can be performed against TCP by forging TCP
3920	   segments that fall within the receive window [NISCC, 2004] [Watson,
3921	   2004].

3923	   The term "blind" refers to the fact that the attacker does not have
3924	   access to the packets that belong to the attacked connection.

3926	   The effects of these attacks range from connection resets to data
3927	   injection.  While these attacks were known in the research community,
3928	   they were generally considered unfeasible.  However, increases in
3929	   bandwidth availability and the use of larger TCP windows raised
3930	   concerns in the community.  The following subsections discuss a
3931	   number of forgery attacks against TCP, along with the possible
3932	   countermeasures to mitigate their impact.

3934	11.1.  Blind TCP-based connection-reset attacks

3936	   Blind connection-reset attacks have the goal of causing a TCP
3937	   connection maintained between two TCP endpoints to be aborted.  The
3938	   level of damage that the attack may cause usually depends on the
3939	   application running on top of TCP, with the more vulnerable
3940	   applications being those that rely on long-lived TCP connections.

3942	   An interesting case of such applications is BGP [Rekhter et al,
3943	   2006], in which a connection-reset usually results in the
3944	   corresponding entries of the routing table being flushed.

3946	   There are a variety of vectors for performing TCP-based connection-
3947	   reset attacks against TCP.  [Watson, 2004] and [NISCC, 2004] raised
3948	   awareness about connection-reset attacks that exploit the RST flag of
3949	   TCP segments.  [Ramaiah et al, 2008] noted that carefully crafted SYN
3950	   segments could also be used to perform connection-reset attacks.
3951	   This document describes two more previously undocumented vectors for
3952	   performing connection-reset attacks: the Precedence field of IP
3953	   packets that encapsulate TCP segments, and illegal TCP options.

3955	11.1.1.  RST flag

3957	   TCP SHOULD implement the mitigation for RST-based attacks specified
3958	   in [Ramaiah et al, 2008].

3960	   DISCUSSION:

3962	      The RST flag signals a TCP peer that the connection should be
3963	      aborted.  In contrast with the FIN handshake (which gracefully
3964	      terminates a TCP connection), an RST segment causes the connection
3965	      to be abnormally closed.

3967	      As stated in Section 3.4 of RFC 793 [Postel, 1981c], all reset
3968	      segments are validated by checking their Sequence Numbers, with
3969	      the Sequence Number considered valid if it is within the receive
3970	      window.  In the SYN-SENT state, however, an RST is valid if the
3971	      Acknowledgement Number acknowledges the SYN segment that
3972	      supposedly elicited the reset.

3974	      [Ramaiah et al, 2008] proposes a modification to TCP's transition
3975	      diagram to address this attack vector.  The counter-measure is a
3976	      combination of enforcing a more strict validation check on the
3977	      sequence number of reset segments, and the addition of a
3978	      "challenge" mechanism.  With the implementation of the proposed
3979	      mechanism, TCP would behave as follows:

3981	      If the Sequence Number of an RST segment is outside the receive
3982	      window, the segment is silently dropped (as stated by RFC 793).
3983	      That is, a reset segment is discarded unless it passes the
3984	      following check:

3986	                RCV.NXT <= Sequence Number < RCV.NXT+RCV.WND

3988	      If the sequence number falls exactly on the left-edge of the
3989	      receive window, the reset is honoured.  That is, the connection is
3990	      reset if the following condition is true:

3992	                         Sequence Number == RCV.NXT

3994	      If an RST segment passes the first check (i.e., it is within the
3995	      receive window) but does not pass the second check (i.e., it does
3996	      not fall exactly on the left edge of the receive window), an
3997	      Acknowledgement segment ("challenge ACK") is sent in response:

3999	                    <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

4001	      This Acknowledgement segment is referred to as a "challenge ACK"
4002	      as, in the event the RST segment that elicited it had been
4003	      legitimate (but silently dropped as a result of enforcing the
4004	      above checks), the challenge ACK would elicit a new reset segment
4005	      that would fall exactly on the left edge of the window and would
4006	      thus pass all the above checks, finally resetting the connection.
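
      The three outcomes described above can be sketched as follows.
      This is a minimal, non-normative illustration in Python; the
      function names and the returned labels are assumptions made for
      this example only, and the special case of a zero receive window
      is not handled.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def classify_rst(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> str:
    """Return 'drop', 'reset' or 'challenge_ack' for an incoming RST.

    Hypothetical helper: it applies the two checks from the text,
    RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND and SEG.SEQ == RCV.NXT.
    """
    # Distance from the left edge of the window, computed modulo 2**32
    # so that the comparison still works across sequence wrap-around.
    offset = (seg_seq - rcv_nxt) % MOD
    if offset >= rcv_wnd:
        return "drop"            # outside the receive window
    if offset == 0:
        return "reset"           # exactly on the left edge (RCV.NXT)
    return "challenge_ack"       # in window, but not on the left edge


def challenge_ack(snd_nxt: int, rcv_nxt: int) -> dict:
    """Fields of the challenge ACK: <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>."""
    return {"seq": snd_nxt % MOD, "ack": rcv_nxt % MOD, "ctl": {"ACK"}}
```

      With RCV.NXT=1000 and RCV.WND=500, an RST carrying Sequence
      Number 1000 resets the connection, one carrying 1200 elicits a
      challenge ACK, and one carrying 1500 is silently dropped.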

4008	      We recommend the implementation of this counter-measure.  However,
4009	      we are aware of patent claims on this counter-measure, and suggest
4010	      that vendors research the consequences of the possible patents
4011	      that may apply.

4013	      [US-CERT, 2003a] is an advisory about a firewall system that was
4014	      found particularly vulnerable to reset attacks because it did not
4015	      validate the TCP Sequence Number of RST segments.  Clearly, all
4016	      TCPs (including those in middle-boxes) should validate RST
4017	      segments as discussed in this section.

4019	11.1.2.  SYN flag

4021	   Processing of SYN segments received for connections in the
4022	   synchronized states SHOULD occur as follows:

4024	   o  If a SYN segment is received for a connection in any synchronized
4025	      state other than TIME-WAIT, respond with an ACK, applying rate-
4026	      throttling.  [Ramaiah et al, 2008]

4028	   o  If the corresponding connection is in the TIME-WAIT state, then
4029	      process the incoming SYN as specified in
4030	      [I-D.ietf-tcpm-tcp-timestamps].

4032	   DISCUSSION:

4034	      Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if a
4035	      SYN segment is received with a valid (i.e., "in window") Sequence
4036	      Number, an RST segment should be sent in response, and the
4037	      connection should be aborted.

4039	      The IETF has published an RFC, "Improving TCP's Resistance to
4040	      Blind In-Window Attacks" [Ramaiah et al, 2008] which addresses,
4041	      among others, this variant of TCP-based connection-reset attack.
4042	      This section describes the counter-measure proposed by the IETF, a
4043	      problem that may arise from the implementation of that solution,
4044	      and a workaround to it.

4046	      In order to mitigate this attack vector, [Ramaiah et al, 2008]
4047	      proposes to change TCP's reaction to SYN segments as follows.
4048	      When a SYN segment is received for a connection in any of the
4049	      synchronized states, an Acknowledgement (ACK) segment is sent in
4050	      response.

4052	      As discussed in [Ramaiah et al, 2008], there is a corner-case that
4053	      would not be properly handled by this mechanism.  If a host (TCP
4054	      A) establishes a TCP connection with a remote peer (TCP B), and
4055	      then crashes, reboots and tries to initiate a new incarnation of
4056	      the same connection (i.e., a connection with the same four-tuple
4057	      as the previous connection) using an Initial Sequence Number equal
4058	      to the RCV.NXT value at the remote peer (TCP B), the ACK segment
4059	      sent by TCP B in response to the SYN segment would contain an
4060	      Acknowledgement number that would be considered valid by TCP A,
4061	      and thus an RST segment would not be sent in response to the
4062	      Acknowledgement (ACK) segment.  As this ACK would not have the SYN
4063	      bit set, TCP A (being in the SYN-SENT state) would silently drop
4064	      it (as stated on page 68 of RFC 793).  After a Retransmission
4065	      Timeout (RTO), TCP A would retransmit its SYN segment, which would
4066	      lead to the same sequence of events as before.  Eventually, TCP A
4067	      would time out, and the connection would be aborted.  This is a
4068	      corner case in which the introduced change would lead to a non-
4069	      desirable behavior.  However, we consider this scenario to be
4070	      extremely unlikely and, in the event it ever took place, the
4071	      connection would nevertheless be aborted after retrying for a
4072	      period of USER TIMEOUT seconds.

4074	      However, when this change is implemented exactly as described in
4075	      [Ramaiah et al, 2008], the potential for interoperability problems
4076	      is introduced, as a heuristic widely incorporated in many TCP
4077	      implementations is disabled.

4079	      In a number of scenarios a socket pair may need to be reused while
4080	      the corresponding four-tuple is still in the TIME-WAIT state in a
4081	      remote TCP peer.  For example, a client accessing some service on
4082	      a host may try to create a new incarnation of a previous
4083	      connection, while the corresponding four-tuple is still in the
4084	      TIME-WAIT state at the remote TCP peer (the server).  This may
4085	      happen if the ephemeral port numbers are being reused too quickly,
4086	      either because of a bad policy of selection of ephemeral ports, or
4087	      simply because of a high connection rate to the corresponding
4088	      service.  In such scenarios, the establishment of new connections
4089	      that reuse a four-tuple that is in the TIME-WAIT state would fail.
4090	      In order to avoid this problem, RFC 1122 [Braden, 1989] states (in
4091	      Section 4.2.2.13) that when a connection request is received with
4092	      a four-tuple that is in the TIME-WAIT state, the connection
4093	      request could be accepted if the sequence number of the incoming
4094	      SYN segment is greater than the last sequence number seen on the
4095	      previous incarnation of the connection (for that direction of the
4096	      data transfer).

4098	      This requirement aims at preventing the sequence number spaces of
4099	      the new and old incarnations of the connection from overlapping,
4100	      thus preventing old segments from the previous incarnation of the
4101	      connection from being accepted as valid by the new connection.

4103	      The requirement in [Ramaiah et al, 2008] to disregard SYN segments
4104	      received for connections in any of the synchronized states forbids
4105	      the implementation of the heuristic described above.  As a result,
4106	      we argue that the processing of SYN segments proposed in [Ramaiah
4107	      et al, 2008] should apply only for connections in any of the
4108	      synchronized states other than the TIME-WAIT state.
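
      The resulting processing of SYN segments can be sketched as
      follows.  This is a non-normative illustration in Python; the
      function names, the returned labels and the "last_seq_seen"
      parameter are assumptions made for this example.  Rate-throttling
      of the ACKs is not shown, and the label returned when the RFC 1122
      heuristic does not apply is only a placeholder for the TIME-WAIT
      processing specified elsewhere.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around

# The synchronized states of RFC 793.
SYNCHRONIZED = {"ESTABLISHED", "FIN-WAIT-1", "FIN-WAIT-2", "CLOSE-WAIT",
                "CLOSING", "LAST-ACK", "TIME-WAIT"}


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a is 'greater than' b, modulo 2**32."""
    return 0 < (a - b) % MOD < 2 ** 31


def on_syn(state: str, seg_seq: int, last_seq_seen: int) -> str:
    """Hypothetical dispatcher for a SYN hitting a synchronized state."""
    if state not in SYNCHRONIZED:
        raise ValueError("only synchronized states are covered here")
    if state != "TIME-WAIT":
        # Any synchronized state other than TIME-WAIT: answer with an
        # ACK (to be rate-throttled by the caller).
        return "send_ack"
    if seq_gt(seg_seq, last_seq_seen):
        # RFC 1122 heuristic: the new incarnation starts beyond the
        # sequence space used by the previous one.
        return "accept_new_incarnation"
    # Placeholder (assumption): fall back to regular TIME-WAIT handling.
    return "time_wait_processing"
```

      The key design point is that TIME-WAIT is tested separately, so
      that the heuristic of RFC 1122 remains available.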

4110	11.1.3.  Security/Compartment

4112	   If the security/compartment field of an incoming TCP segment does not
4113	   match the value recorded in the corresponding TCB, TCP SHOULD NOT
4114	   abort the connection, but simply discard the corresponding packet.
4115	   Additionally, this whole event SHOULD be logged as a security
4116	   violation.

4118	   DISCUSSION:

4120	      Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if
4121	      the IP security/compartment of an incoming segment does not
4122	      exactly match the security/compartment in the TCB, a RST segment
4123	      should be sent, and the connection should be aborted.

4125	      A discussion of the IP security options relevant to this section
4126	      can be found in Section 3.13.2.12, Section 3.13.2.13, and Section
4127	      3.13.2.14 of [CPNI, 2008].

4129	      This certainly provides another attack vector for performing
4130	      connection-reset attacks, as an attacker could forge TCP segments
4131	      with a security/compartment that is different from that recorded
4132	      in the corresponding TCB and, as a result, the attacked connection
4133	      would be reset.

4135	      It is interesting to note that for connections in the ESTABLISHED
4136	      state, this check is performed after validating the TCP Sequence
4137	      Number and checking the RST bit, but before validating the
4138	      Acknowledgement field.  Therefore, even if the stricter validation
4139	      of the Acknowledgement field (described in Section 3.4) were
4140	      implemented, it would not help to mitigate this attack vector.

4142	      This attack vector can be easily mitigated by relaxing the
4143	      reaction to TCP segments with "incorrect" security/compartment
4144	      values as specified in this section.

4146	11.1.4.  Precedence

4148	   If the Precedence field of an incoming TCP segment does not match
4149	   the value recorded in the corresponding TCB, TCP MUST NOT abort the
4150	   connection, and MUST instead continue processing the segment as
4151	   specified by RFC 793.

4153	   DISCUSSION:

4155	      Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if
4156	      the IP Precedence of an incoming segment does not exactly match
4157	      the Precedence recorded in the TCB, a RST segment should be sent,
4158	      and the connection should be aborted.

4160	      This certainly provides another attack vector for performing
4161	      connection-reset attacks, as an attacker could forge TCP segments
4162	      with an IP Precedence that is different from that recorded in the
4163	      corresponding TCB and, as a result, the attacked connection would
4164	      be reset.

4166	      It is interesting to note that for connections in the ESTABLISHED
4167	      state, this check is performed after validating the TCP Sequence
4168	      Number and checking the RST bit, but before validating the
4169	      Acknowledgement field.  Therefore, even if the stricter validation
4170	      of the Acknowledgement field (described in Section 3.4) were
4171	      implemented, it would not help to mitigate this attack vector.

4173	      This attack vector can be easily mitigated by relaxing the
4174	      reaction to TCP segments with "incorrect" IP Precedence values.
4175	      That is, even if the Precedence field does not match the value
4176	      recorded in the corresponding TCB, TCP should not abort the
4177	      connection, and should instead continue processing the segment as
4178	      specified by RFC 793.
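
      The relaxed reactions advised in this section and in Section
      11.1.3 can be contrasted in a short sketch.  This is a
      non-normative illustration in Python; the dictionary keys, the
      logger name and the returned labels are assumptions made for this
      example.

```python
import logging

log = logging.getLogger("tcp-sketch")  # hypothetical logger name


def check_ip_labels(tcb: dict, seg: dict) -> str:
    """Return 'discard' or 'continue'; the connection is never aborted.

    Both arguments are assumed to carry the keys 'security' (the
    security/compartment value) and 'precedence'.
    """
    if seg["security"] != tcb["security"]:
        # Section 11.1.3: discard the packet and log the event as a
        # security violation, rather than resetting the connection.
        log.warning("security/compartment mismatch: got %r, expected %r",
                    seg["security"], tcb["security"])
        return "discard"
    # This section: a Precedence mismatch is tolerated, and the segment
    # continues through the normal processing rules.
    return "continue"
```

      In neither case is an RST sent, which removes both fields as
      connection-reset vectors.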

4180	      It is interesting to note that resetting a connection due to a
4181	      change in the Precedence value might have a negative impact on
4182	      interoperability.  For example, the packets that correspond to the
4183	      connection could temporarily take a different internet path, in
4184	      which some middle-box could re-mark the Precedence field (due to
4185	      administration policies at the network to be transited).  In such
4186	      a scenario, an implementation following the advice in RFC 793
4187	      would abort the connection, when the connection would have
4188	      probably survived.

4190	      While the IPv4 Type of Service field (and hence the Precedence
4191	      field) has been redefined by the Differentiated Services (DS)
4192	      field specified in RFC 2474 [Nichols et al, 1998], RFC 793
4193	      [Postel, 1981c] was never formally updated in this respect.  We
4194	      note that both legacy systems that have not been upgraded to
4195	      implement the differentiated services architecture described in
4196	      RFC 2475 [Blake et al, 1998] and current implementations that have
4197	      extrapolated the discussion of the Precedence field to the
4198	      Differentiated Services field may still be vulnerable to the
4199	      connection reset vector discussed in this section.

4201	11.1.5.  Illegal options

4203	   TCP MUST silently drop those TCP segments that contain TCP options
4204	   with illegal option lengths.

4206	   DISCUSSION:

4208	      Section 4.2.2.5 of RFC 1122 [Braden, 1989] discusses the
4209	      processing of TCP options.  It states that TCP must be able to
4210	      receive a TCP option in any segment, and must ignore without error
4211	      any option it does not implement.  Additionally, it states that
4212	      TCP should be prepared to handle an illegal option length (e.g.,
4213	      zero) without crashing, and suggests handling such illegal options
4214	      by resetting the corresponding connection and logging the reason.
4215	      However, this suggested behavior could be exploited to perform
4216	      connection-reset attacks.  Therefore, as discussed in Section 3.10
4217	      of this document, we advise TCP implementations to silently drop
4218	      those TCP segments that contain illegal option lengths.
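
      A parser that enforces this advice can be sketched as follows.
      This is a non-normative illustration in Python; the function name
      and the convention of returning None for "drop the segment" are
      assumptions made for this example.  Only the generic length
      constraints are checked, not the fixed lengths that individual
      options require.

```python
from typing import List, Optional, Tuple

EOL, NOP = 0, 1  # single-byte options: End of Option List, No-Operation


def parse_tcp_options(data: bytes) -> Optional[List[Tuple[int, bytes]]]:
    """Return [(kind, value), ...], or None if the segment must be dropped."""
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == EOL:
            break                 # remaining bytes are padding
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(data):
            return None           # no room left for the length byte
        length = data[i + 1]
        # The length covers the kind and length bytes themselves, so it
        # can never be less than 2, and the option must fit within the
        # options area of the header.
        if length < 2 or i + length > len(data):
            return None
        options.append((kind, data[i + 2:i + length]))
        i += length
    return options
```

      A caller would silently discard the segment whenever None is
      returned, without resetting the connection.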

4220	11.2.  Blind data-injection attacks

4222	   An attacker could try to inject data in the stream of data being
4223	   transferred on the connection.  As with the other attacks described
4224	   in Section 11 of this document, in order to perform a blind data
4225	   injection attack the attacker would need to know or guess the four-
4226	   tuple that identifies the TCP connection to be attacked.
4227	   Additionally, the attacker would need to guess a valid ("in window")
4228	   TCP Sequence Number, and a valid Acknowledgement Number.

4230	   As discussed in Section 3.4 of this document, [Ramaiah et al, 2008]
4231	   proposes to enforce a more strict check on the Acknowledgement Number
4232	   of incoming segments than that specified in RFC 793 [Postel, 1981c].

4234	   Implementation of the proposed check requires more packets on the
4235	   side of the attacker to successfully perform a blind data-injection
4236	   attack.  However, it should be noted that applications concerned with
4237	   any of the attacks discussed in Section 11 of this document should
4238	   make use of proper authentication techniques, such as those specified
4239	   for IPsec in RFC 4301 [Kent and Seo, 2005].

4241	12.  Information leaking

4243	12.1.  Remote Operating System detection via TCP/IP stack fingerprinting

4245	   Clearly, remote Operating System (OS) detection is a useful tool for
4246	   attackers.  Tools such as nmap [Fyodor, 2006b] can usually detect the
4247	   operating system type and version of a remote system with
4248	   remarkable accuracy.  This information can in turn be used
4249	   by attackers to tailor their exploits to the identified operating
4250	   system type and version.

4252	   Evasion of OS fingerprinting can prove to be a very difficult task.
4253	   Most systems make use of a variety of protocols, each of which has a
4254	   large number of parameters that can be set to arbitrary values.
4255	   Thus, information on the operating system may be obtained from a
4256	   number of sources ranging from application banners to more obscure
4257	   parameters such as TCP's retransmission timer.

4259	   Nmap [Fyodor, 2006b] is probably the most popular tool for remote OS
4260	   detection via active TCP/IP stack fingerprinting. p0f [Zalewski,
4261	   2006a], on the other hand, is a tool for performing remote OS
4262	   detection via passive TCP/IP stack fingerprinting.  SinFP [SinFP,
4263	   2006] can perform both active and passive fingerprinting.  Finally,
4264	   TBIT [TBIT, 2001] is a TCP fingerprinting tool that aims at
4265	   characterizing the behavior of a remote TCP peer based on active
4266	   probes, and which has been widely used in the research community.

4268	   TBIT [TBIT, 2001] implements a number of tests not present in other
4269	   tools, such as characterizing the behavior of a TCP peer with respect
4270	   to TCP congestion control.

4272	   [Fyodor, 1998] and [Fyodor, 2006a] are classic papers on the subject.
4273	   [Miller, 2006] and [Smith and Grundl, 2002] provide an introduction
4274	   to passive TCP/IP stack fingerprinting.  [Smart et al, 2000] and
4275	   [Beck, 2001] discuss some techniques for evading OS detection through
4276	   TCP/IP stack fingerprinting.

4278	   The following subsections discuss TCP-based techniques for remote OS
4279	   detection and, where possible, propose ways to mitigate them.

4281	12.1.1.  FIN probe

4283	   TCP MUST silently drop any TCP segments received for a connection in
4284	   the LISTEN state that do not have the SYN, RST, or ACK flags set.  In
4285	   the rest of the cases, the processing rules in RFC 793 MUST be
4286	   applied.

4288	   DISCUSSION:

4290	      The attacker sends a FIN (or any packet without the SYN or the ACK
4291	      flags set) to an open port.  RFC 793 [Postel, 1981c] leaves the
4292	      reaction to such segments unspecified.  As a result, some
4293	      implementations silently drop the received segment, while others
4294	      respond with a RST.

4296	12.1.2.  Bogus flag test

4298	   TCP MUST ignore any flags not supported, and MUST NOT reflect them if
4299	   a TCP segment is sent in response to the one just received.

4301	   DISCUSSION:

4303	      The attacker sends a TCP segment setting at least one bit of the
4304	      Reserved field.  Some implementations ignore this field, while
4305	      others reset the corresponding connection or reflect the field in
4306	      the TCP segment sent in response.

4308	12.1.3.  TCP ISN sampling

4310	   The attacker samples a number of Initial Sequence Numbers by sending
4311	   a number of connection requests.  Many TCP implementations differ on
4312	   the ISN generator they implement, thus allowing the correlation of
4313	   ISN generation algorithm to the operating system type and version.

4315	   This document advises implementing an ISN generator that follows the
4316	   behavior described in RFC 1948 [Bellovin, 1996].  However, it should
4317	   be noted that even if all TCP implementations generated their ISNs as
4318	   proposed in RFC 1948, there are still a number of implementation
4319	   details that are left unspecified, which would allow remote OS
4320	   fingerprinting by means of ISN sampling.  For example, the time-
4321	   dependent parameter of the hash could have a different frequency in
4322	   different TCP implementations.
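
   The effect of the timer frequency can be seen in a short sketch of an
   RFC 1948-style generator, ISN = M + F(connection-id, secret).  This
   is a non-normative illustration in Python; the function name and its
   parameters are assumptions made for this example, and SHA-256 is
   used here in place of the hash suggested in RFC 1948.

```python
import hashlib
import socket
import struct

MOD = 2 ** 32  # the ISN is a 32-bit value


def isn(secret: bytes, laddr: str, lport: int, raddr: str, rport: int,
        now: float, tick_hz: int = 250_000) -> int:
    """Hypothetical ISN generator: timer M plus a keyed hash F.

    tick_hz is the frequency of the timer M; 250000 corresponds to the
    4-microsecond clock of RFC 793.
    """
    m = int(now * tick_hz) % MOD
    material = (socket.inet_aton(laddr) + struct.pack("!H", lport) +
                socket.inet_aton(raddr) + struct.pack("!H", rport) + secret)
    # F: the first 32 bits of a hash over the connection-id and a secret.
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (m + f) % MOD
```

   For a fixed connection-id, two ISNs sampled one second apart differ
   by exactly tick_hz (modulo 2^32), which is the implementation detail
   that a fingerprinting tool could measure.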

4324	12.1.4.  TCP initial window

4326	   Many TCP implementations differ on the initial TCP window they use.
4327	   There are a number of factors that should be considered when
4328	   selecting the TCP window to be used for a given system.  A number of
4329	   implementations that use static windows (i.e., no automatic buffer
4330	   tuning mechanisms are implemented) default to a window of around 32
4331	   KB, which seems sensible for the general case.  On the other hand, a
4332	   window of 4 KB seems to be common practice for connections servicing
4333	   critical applications such as BGP.  It is clear that the window size
4334	   is a tradeoff among a number of considerations.  Section 3.7
4335	   discusses some of the considerations that should be made when
4336	   selecting the window size for a TCP connection.

4338	   If automatic tuning mechanisms are implemented, we suggest that the
4339	   initial window be at least 4 * RMSS segments.  We note that a
4340	   remote OS fingerprinting tool could still sample the advertised TCP
4341	   window, trying to correlate the advertised window with the potential
4342	   automatic buffer tuning algorithm and Operating System.

4344	12.1.5.  RST sampling

4346	   If an RST must be sent in response to an incoming segment, then if
4347	   the ACK bit of an incoming TCP segment is off, a Sequence Number of
4348	   zero MUST be used in the RST segment sent in response.  That is,

4350	                 <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST, ACK>

4352	   It should be noted that the SEG.LEN value used for the
4353	   Acknowledgement Number MUST be incremented once for each flag set in
4354	   the original segment that makes use of a byte of the sequence number
4355	   space.  That is, if only one of the SYN or FIN flags was set in the
4356	   received segment, the Acknowledgement Number of the response should
4357	   be set to SEG.SEQ+SEG.LEN+1.  If both the SYN and FIN flags were set
4358	   in the received segment, the Acknowledgement Number should be set to
4359	   SEG.SEQ+SEG.LEN+2.
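
   The computation of the Acknowledgement Number can be sketched as
   follows.  This is a non-normative illustration in Python; the
   function name and the dictionary layout are assumptions made for
   this example, and seg_len denotes the number of data bytes in the
   received segment.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def rst_for_segment_without_ack(seg_seq: int, seg_len: int,
                                syn: bool = False, fin: bool = False) -> dict:
    """Fields of the RST sent in reply to a segment with the ACK bit off.

    Produces <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST, ACK>, where the
    Acknowledgement Number is incremented once for each of SYN and FIN,
    as each of them consumes one byte of the sequence number space.
    """
    ack = (seg_seq + seg_len + int(syn) + int(fin)) % MOD
    return {"seq": 0, "ack": ack, "ctl": {"RST", "ACK"}}
```

   For instance, a bare SYN with Sequence Number 1000 elicits an RST
   with an Acknowledgement Number of 1001, while a segment with both
   SYN and FIN set elicits 1002.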

4361	   We also RECOMMEND that TCP set the ACK bit (and the Acknowledgement
4362	   Number) in all outgoing RST segments, as it allows for additional
4363	   validation checks to be enforced at the system receiving the segment.

4365	   DISCUSSION:

4367	      [Fyodor, 1998] reports that many implementations differ in the
4368	      Acknowledgement Number they use in response to segments received
4369	      for connections in the CLOSED state.  In particular, these
4370	      implementations differ in the way they construct the RST segment
4371	      that is sent in response to those TCP segments received for
4372	      connections in the CLOSED state.

4374	      RFC 793 [Postel, 1981c] describes (in pages 36-37) how RST
4375	      segments are to be generated.  According to this RFC, the ACK bit
4376	      (and the Acknowledgement Number) is set in a RST only if the
4377	      incoming segment that elicited the RST did not have the ACK bit
4378	      set (and thus the Sequence Number of the outgoing RST segment must
4379	      be set to zero).  However, we recommend that TCP implementations
4380	      set the ACK bit (and the Acknowledgement Number) in all outgoing
4381	      RST segments, as it allows for additional validation checks to be
4382	      enforced at the system receiving the segment.

4384	12.1.6.  TCP options

4386	   Different implementations differ in the TCP options they enable by
4387	   default.  Additionally, they differ in the actual contents of the
4388	   options, and in the order in which the options are included in a TCP
4389	   segment.  There is currently no recommendation on the order in which
4390	   to include TCP options in TCP segments.

4392	12.1.7.  Retransmission Timeout (RTO) sampling

4394	   TCP uses a retransmission timer for retransmitting data in the
4395	   absence of any feedback from the remote data receiver.  The duration
4396	   of this timer is referred to as "retransmission timeout" (RTO).  RFC
4397	   2988 [Paxson and Allman, 2000] specifies the algorithm for computing
4398	   the TCP retransmission timeout (RTO).

4400	   The algorithm allows the use of clocks of different granularities, to
4401	   accommodate the different granularities used by the existing
4402	   implementations.  Thus, the difference in the resulting RTO can be
4403	   used for remote OS fingerprinting.  [Veysset et al, 2002] describes
4404	   how to perform remote OS fingerprinting by sampling and analyzing the
4405	   RTO of the target system.  However, this fingerprinting technique has
4406	   at least the following drawbacks:

4408	   o  It is usually much slower than other fingerprinting techniques, as
4409	      it may require considerable time to sample the RTO of a given
4410	      target.

4412	   o  It is less reliable than other fingerprinting techniques, as
4413	      latency and packet loss can lead to bogus results.

4415	   While in principle it would be possible to defeat this fingerprinting
4416	   technique (e.g., by obfuscating the granularity of the clock used for
4417	   computing the RTO), we consider that a more important step to defeat
4418	   remote OS detection is for implementations to address the more
4419	   effective fingerprinting techniques described in Sections 12.1.1
4420	   through 12.1.6 of this document.

4422	12.2.  System uptime detection

4424	   The "uptime" of a system may prove to be valuable information to an
4425	   attacker.  For example, it might reveal the last time a security
4426	   patch was applied.  Information about system uptime is usually leaked
4427	   by TCP header fields or options that are (or may be) time-dependent,
4428	   and are usually initialized to zero when the system is bootstrapped.
4429	   As a result, if the attacker knows the frequency with which the
4430	   corresponding parameter or header field is incremented, and is able
4431	   to sample the current value of that parameter or header field, the
4432	   system uptime will be easily obtained.  Two fields that can
4433	   potentially reveal the system uptime are the Sequence Number field of
4434	   a SYN or SYN/ACK segment (i.e., when it contains an ISN) and the
4435	   TSval field of the timestamp option.  Section 3.3.1 of this document
4436	   discusses the generation of TCP Initial Sequence Numbers.  Section
4437	   4.7.1 of this document discusses the generation of TCP timestamps.
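
   The arithmetic involved is trivial, which is why such fields are a
   concern.  The following non-normative sketch in Python illustrates
   it; the function names are assumptions made for this example, and it
   presumes a counter that started at zero at boot time and has not yet
   wrapped around.

```python
def estimate_frequency(v1: int, t1: float, v2: int, t2: float) -> float:
    """Ticks per second inferred from two samples (value, time in seconds).

    The difference is taken modulo 2**32 so that a single wrap of the
    32-bit field between the two samples is tolerated.
    """
    return ((v2 - v1) % 2 ** 32) / (t2 - t1)


def estimate_uptime_seconds(counter: int, hz: float) -> float:
    """Uptime implied by a counter initialized to zero at boot."""
    return counter / hz
```

   For example, a sampled value of 8640000 from a counter that is
   incremented 100 times per second corresponds to an uptime of 86400
   seconds, i.e., one day.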

4439	13.  Covert channels

4441	   Like virtually every communications protocol, TCP can be exploited to
4442	   establish covert channels.  While an exhaustive discussion of covert
4443	   channels is out of the scope of this document, for completeness of
4444	   the document we simply note that it is possible for a (probably
4445	   malicious) user to establish a covert channel by means of TCP, such
4446	   that data can be surreptitiously passed to a remote system, probably
4447	   unnoticed by a monitoring system, and with the possibility of
4448	   concealing the location of the source system.

4450	   In most cases, covert channels based on manipulation of TCP fields
4451	   can be eliminated by protocol scrubbers and other middle-boxes.  On
4452	   the other hand, "timing channels" may prove to be more difficult to
4453	   eliminate.

4455	   [Rowland, 1996] contains a discussion of covert channels in the
4456	   TCP/IP protocol suite, with some TCP-based examples.  [Giffin et al,
4457	   2002] describes the use of TCP timestamps for the establishment of
4458	   covert channels.  [Zander, 2008] contains an extensive bibliography
4459	   of papers on covert channels, and a list of freely-available tools
4460	   that implement covert channels with the TCP/IP protocol suite.

4462	14.  TCP Port scanning

4464	   TCP port scanning aims at identifying TCP port numbers on which there
4465	   is a process listening for incoming connections.  That is, it aims at
4466	   identifying TCPs at the target system that are in the LISTEN state.
4467	   The following subsections describe different TCP port scanning
4468	   techniques that have been implemented in freely-available tools.
4469	   These subsections focus only on those port scanning techniques that
4470	   exploit features of TCP itself, and not of other communication
4471	   protocols.

4473	   For example, the following subsections do not discuss the
4474	   exploitation of application protocols (such as FTP) or the
4475	   exploitation of features of underlying protocols (such as the IP
4476	   Identification field) for port-scanning purposes.

4478	14.1.  Traditional connect() scan

4480	   The most trivial scanning technique consists of trying to perform the
4481	   TCP three-way handshake with each of the port numbers at the target
4482	   system (e.g., by issuing a call to the connect() function of the
4483	   Sockets API).  The three-way handshake will complete for port numbers
4484	   that are "open", but will fail for those port numbers that are
4485	   "closed".

4487	   As this port-scanning technique can be implemented by issuing a call
4488	   to the connect() function of the Sockets API that normal applications
4489	   use, it does not require the attacker to have superuser privileges.
4490	   The downside of this port-scanning technique is that it is less
4491	   efficient than other scanning methods (e.g., the "SYN scan" described
4492	   in Section 14.2), and that it can be easily logged by the target
4493	   system.

4495	14.2.  SYN scan

4497	   The SYN scan was introduced as a "stealth" port-scanning technique.
4498	   It aims at preventing the target system from logging the port scan by
4499	   not completing the TCP three-way handshake.  When a SYN/ACK segment
4500	   is received in response to the initial SYN segment, the system
4501	   performing the port scan will respond with an RST segment, thus
4502	   preventing the three-way handshake from completing.  While this port-
4503	   scanning technique is harder to detect and log than the traditional
4504	   connect() scan described in Section 14.1, most current NIDS (Network
4505	   Intrusion Detection Systems) can detect and log it.

4507	   SYN scans are sometimes mistakenly reported as "SYN flood" attacks by
4508	   NIDS, though.

4510	   The main advantage of this port scanning technique is that it is much
4511	   more efficient than the traditional connect() scan.

4513	   In order to implement this port-scanning technique, port-scanning
4514	   tools usually bypass the TCP API, and forge the SYN segments they
4515	   send (e.g., by using raw sockets).  This typically requires the
4516	   attacker to have superuser privileges to be able to run the port-
4517	   scanning tool.

4519	14.3.  FIN, NULL, and XMAS scans

4521	   TCP SHOULD respond with an RST when a TCP segment is received for a
4522	   connection in the LISTEN state, and the incoming segment has neither
4523	   the SYN bit nor the RST bit set.

4525	   DISCUSSION:

4527	      RFC 793 [Postel, 1981c] states, in page 65, that an incoming
4528	      segment that does not have the RST bit set and that is received
4529	      for a connection in the fictional state CLOSED causes an RST to be
4530	      sent in response.  Pages 65-66 of RFC 793 describe the processing
4531	      of incoming segments for connections in the state LISTEN, and
4532	      implicitly states that an incoming segment that does not have the
4533	      ACK bit set (and is not a SYN or an RST) should be silently
4534	      dropped.

4536	      As a result, an attacker can exploit this situation to perform a
4537	      port scan by sending TCP segments that do not have the ACK bit set
4538	      to the target system.  When a port is "open" (i.e., there is a TCP
4539	      in the LISTEN state on the corresponding port), the attacker will
4540	      not get any response from the target system.  On the other hand,
4541	      if the port is "closed" (i.e., there is a TCP in the fictional
4542	      state CLOSED), the target system will respond with an RST segment.

4544	      Since the only requirement for exploiting this port scanning
4545	      vector is that the probe segments must not have the ACK bit set,
4546	      there are a number of different TCP control-bits combinations that
4547	      can be used for the probe segments.

4549	      When the probe segment sent to the target system is a TCP segment
4550	      that has only the FIN bit set, the scanning technique is usually
4551	      referred to as a "FIN scan".  When the probe packet is a TCP
4552	      segment that does not have any of the control bits set, the
4553	      scanning technique is usually known as a "NULL scan".  Finally,
4554	      when the probe packet sent to the target system has only the FIN,
4555	      PSH, and the URG bits set, the port-scanning technique is known as
4556	      a "XMAS scan".

4558	      It should be clear that while the aforementioned control-bits
4559	      combinations are the most popular ones, other combinations could
4560	      be used to exploit this port-scanning vector.  For example, the
4561	      CWR, ECE, and/or any of the Reserved bits could be set in the
4562	      probe segments.

4564	      The advantage of this port-scanning technique is that it can
4565	      bypass some stateless firewalls.  However, the downside is that a
4566	      number of implementations do not comply strictly with RFC 793
4567	      [Postel, 1981c], and thus always respond to the probe segments
4568	      with an RST, regardless of whether the port is open or closed.

4570	      This port-scanning vector can be easily defeated by responding
4571	      with an RST when a TCP segment is received for a connection in the
4572	      LISTEN state, and the incoming segment has neither the SYN bit nor
4573	      the RST bit set.
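
      The effect of this counter-measure can be sketched in Python as
      follows.  This is a simplified model under stated assumptions, not
      an implementation: the function names and the "hardened" switch
      are hypothetical, and the LISTEN processing omits everything in
      RFC 793 other than the RST, ACK and SYN checks.

```python
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def closed_response(flags: int) -> str:
    """RFC 793, CLOSED state: every segment except an RST elicits an RST."""
    return "DROP" if flags & RST else "RST"


def listen_response(flags: int, hardened: bool = True) -> str:
    """Simplified LISTEN-state processing (hypothetical helper).

    With hardened=False the RFC 793 rule applies: a segment without
    RST, ACK or SYN is silently dropped.  With hardened=True such a
    segment elicits an RST, as recommended in the text above.
    """
    if flags & RST:
        return "DROP"
    if flags & ACK:
        return "RST"
    if flags & SYN:
        return "SYN-ACK"
    return "RST" if hardened else "DROP"
```

      With hardened=False a FIN probe yields "DROP" on an open port and
      "RST" on a closed one, which is the difference the scan relies on.
      With hardened=True both cases yield "RST", so the probe no longer
      distinguishes them.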

4575	14.4.  Maimon scan

4577	   If a TCP that is in the CLOSED or LISTEN states receives a TCP
4578	   segment with both the FIN and ACK bits set, it MUST respond with a
4579	   RST.

4581	   DISCUSSION:

4583	      This port scanning technique was introduced in [Maimon, 1996] with
4584	      the name "StealthScan" (method #1), and was later incorporated
4585	      into the nmap tool [Fyodor, 2006b] as the "Maimon scan".

4587	      This port scanning technique employs TCP segments that have both
4588	      the FIN and ACK bits set as the probe segments.  While according
4589	      to RFC 793 [Postel, 1981c] these segments should elicit an RST
4590	      regardless of whether the corresponding port is open or closed, a
4591	      programming flaw found in a number of TCP implementations has
4592	      caused some systems to silently drop the probe segment if the
4593	      corresponding port was open (i.e., there was a TCP in the LISTEN
4594	      state), and respond with an RST only if the port was closed.

4596	      Therefore, an RST would indicate that the scanned port is closed,
4597	      while the absence of a response from the target system would
4598	      indicate that the scanned port is open.

4600	      While this bug has not been found in current implementations of
4601	      TCP, it might still be present in some legacy systems.

4603	14.5.  Window scan

4605	   When sending an RST segment, TCP SHOULD set the Window field to zero.

4607	   DISCUSSION:

4609	      This port-scanning technique employs ACK segments as the probe
4610	      packets.  ACK segments will elicit an RST from the target system
4611	      regardless of whether the corresponding TCP port is open or
4612	      closed.  However, as described in [Maimon, 1996], some systems set
4613	      the Window field of the RST segments to different values
4614	      depending on whether the corresponding TCP port is open or closed.
4615	      These systems set the Window field of their RST segments to zero
4616	      when the corresponding TCP port is closed, and set the Window
4617	      field to a non-zero value when the corresponding TCP port is open.

4619	      As a result, an attacker could exploit this situation for
4620	      performing a port scan by sending ACK segments to the target
4621	      system, and examining the Window field of the RST segments that
4622	      his probe segments elicit.

4624	      In order to defeat this port-scanning technique, we recommend that
4625	      TCP implementations set the Window field to zero in all the RST
4626	      segments they send.  Most popular implementations of TCP already
4627	      implement this policy.
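
      A minimal Python sketch of this policy is shown below.  It packs a
      20-byte TCP header (no options) for an RST segment with the Window
      field fixed at zero.  The function name build_rst_header is
      hypothetical, and the Checksum field is left at zero on the
      assumption that it is filled in later over the pseudo-header.

```python
import struct
from typing import Optional

RST, ACK = 0x04, 0x10


def build_rst_header(src_port: int, dst_port: int, seq: int,
                     ack: Optional[int] = None) -> bytes:
    """Pack a TCP header for an RST segment whose Window field is zero."""
    flags = RST | (ACK if ack is not None else 0)
    offset_and_flags = (5 << 12) | flags  # Data Offset = 5 words
    window = 0                            # same value for open and closed ports
    return struct.pack(
        "!HHIIHHHH",
        src_port, dst_port,
        seq & 0xFFFFFFFF,
        (ack or 0) & 0xFFFFFFFF,
        offset_and_flags,
        window,
        0,  # Checksum placeholder (assumption: computed by the caller)
        0,  # Urgent Pointer
    )
```

      Because the Window value is a constant rather than a copy of any
      per-connection state, the RST carries no hint about whether a
      listening TCP exists on the port.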

4629	14.6.  ACK scan

4631	   The so-called "ACK scan" is not really a port-scanning technique
4632	   (i.e., it does not aim at determining whether a specific port is open
4633	   or closed), but rather aims at determining whether some intermediate
4634	   system is filtering TCP segments sent to that specific port number.

4636	   The probe packet is a TCP segment with the ACK bit set which,
4637	   according to RFC 793 [Postel, 1981c] should elicit an RST from the
4638	   target system regardless of whether the corresponding TCP port is
4639	   open or closed.  If no response is received from the target system,
4640	   it is assumed that some intermediate system is filtering the probe
4641	   packets sent to the target system.

4643	   It should be noted that this "port scanning" technique exploits
4644	   basic TCP processing rules, and therefore cannot be defeated at an
4645	   end-system.

4647	15.  Processing of ICMP error messages by TCP

4649	   TCP SHOULD silently ignore received ICMP Source Quench messages.

4651	   TCP SHOULD process ICMP "hard errors" as "soft errors" when they are
4652	   received for connections that are in any of the synchronized states.

4654	   TCP SHOULD process ICMP "fragmentation needed and DF bit set" and
4655	   ICMPv6 "Packet Too Big" error messages as described in [RFC5927].

4657	   DISCUSSION:

4659	      [RFC5927] analyzes a number of vulnerabilities based on crafted
4660	      ICMP messages, along with possible counter-measures.
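
      The three rules above can be sketched as a small dispatch function
      in Python.  This is an illustration only: the function name and
      the returned labels are hypothetical, only ICMPv4 is covered, the
      type and code numbers are those of RFC 792 and RFC 1122, and the
      handling of hard errors outside the synchronized states (treated
      here as hard errors, following RFC 1122) is an assumption that the
      text above does not address.

```python
SYNCHRONIZED_STATES = {
    "ESTABLISHED", "FIN-WAIT-1", "FIN-WAIT-2", "CLOSE-WAIT",
    "CLOSING", "LAST-ACK", "TIME-WAIT",
}

ICMP_DEST_UNREACHABLE = 3
ICMP_SOURCE_QUENCH = 4


def icmp_action(icmp_type: int, icmp_code: int, tcp_state: str) -> str:
    """Map an ICMPv4 error to the handling recommended above (sketch)."""
    if icmp_type == ICMP_SOURCE_QUENCH:
        return "ignore"
    if icmp_type == ICMP_DEST_UNREACHABLE:
        if icmp_code == 4:  # fragmentation needed and DF bit set
            return "pmtud-rfc5927"
        if icmp_code in (2, 3):  # protocol / port unreachable ("hard")
            if tcp_state in SYNCHRONIZED_STATES:
                return "soft-error"
            return "hard-error"  # assumption: RFC 1122 behaviour
    # All remaining errors are treated as soft errors in this sketch.
    return "soft-error"
```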

4662	16.  TCP interaction with the Internet Protocol (IP)

4664	16.1.  TCP-based traceroute

4666	   The traceroute tool is used to identify the intermediate systems
4667	   between the local system and the destination system.  It is usually
4668	   implemented by sending "probe" packets with increasing IP Time to
4669	   Live values (starting from 0), without maintaining any state with
4670	   the final destination.

4672	   Some traceroute implementations use ICMP "echo request" messages as
4673	   the probe packets, while others use UDP packets or TCP SYN segments.

4675	   In some cases, the state-less nature of the traceroute tool may
4676	   prevent it from working correctly across stateful devices such as
4677	   Network Address Translators (NATs) or firewalls.

4679	   In order to by-pass this limitation, an attacker could establish a
4680	   TCP connection with the destination system, and start sending TCP
4681	   segments on that connection with increasing IP Time to Live values
4682	   (starting from 0) [Zalewski, 2007] [Zalewski, 2008].  Provided ICMP
4683	   error messages are not blocked by any intermediate system, an
4684	   attacker could exploit this technique to map the network topology
4685	   behind the aforementioned stateful devices in scenarios in which he
4686	   could not have achieved this goal using the traditional traceroute
4687	   tool.

4689	   NATs [Srisuresh and Egevang, 2001] and other middle-boxes could
4690	   defeat this network-mapping technique by overwriting the Time to Live
4691	   of the packets they forward to the internal network.  For example,
4692	   they could overwrite the Time to Live of all packets being forwarded
4693	   to an internal network with a value such as 128.  We strongly
4694	   recommend against overwriting the IP Time to Live field with the
4695	   value 255 or other similar large values, as this could allow an
4696	   attacker to bypass the protection provided by the Generalized TTL
4697	   Security Mechanism (GTSM) described in RFC 5082 [Gill et al, 2007].
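
   The interaction between TTL overwriting and GTSM can be sketched in
   Python as follows.  Both function names are hypothetical, the
   middle-box model ignores the usual TTL decrement, and the GTSM check
   is reduced to the TTL comparison of RFC 5082 (accept only packets
   whose TTL is at least 255 minus a configured trust radius).

```python
GTSM_EXPECTED_TTL = 255


def nat_forward_ttl(received_ttl: int, overwrite_ttl: int = 128) -> int:
    """Return the TTL a middle-box writes into a forwarded packet.

    The received TTL is deliberately discarded, which removes the
    increasing-TTL signal that the mapping technique depends on.
    """
    if not 1 <= overwrite_ttl <= 255:
        raise ValueError("overwrite_ttl must be in the range 1..255")
    return overwrite_ttl


def gtsm_accepts(ttl: int, trust_radius: int = 0) -> bool:
    """Simplified GTSM receive check: only the TTL is examined."""
    return ttl >= GTSM_EXPECTED_TTL - trust_radius
```

   In this model a probe that arrives with a TTL of 1 leaves the
   middle-box with a TTL of 128 and is rejected by a GTSM-protected
   receiver.  If the middle-box were configured with overwrite_ttl=255,
   a packet from an arbitrarily distant attacker would pass the GTSM
   check, which is the reason for the recommendation above.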

4699	   [Gont and Srisuresh, 2008] discusses the security implications of
4700	   NATs, and proposes mitigations for this and other issues.

4702	16.2.  Blind TCP data injection through fragmented IP traffic

4704	   As discussed in Section 11.2, TCP data injection attacks usually
4705	   require an attacker to guess or know a number of parameters related
4706	   to the target TCP connection, such as the connection-id {Source
4707	   Address, Source Port, Destination Address, Destination Port}, the TCP
4708	   Sequence Number, and the TCP Acknowledgement Number.  Provided these
4709	   values are obfuscated as recommended in this document, the chances of
4710	   an off-path attacker successfully performing a data injection
4711	   attack against a TCP connection are fairly low for many of the most
4712	   common scenarios.

4714	   As discussed in this document, randomization of the values contained
4715	   in different TCP header fields is not a replacement for cryptographic
4716	   methods for protecting a TCP connection, such as IPsec (specified in
4717	   RFC 4301 [Kent and Seo, 2005]).

4719	   However, [Zalewski, 2003b] describes a possible vector for performing
4720	   a TCP data injection attack that does not require the attacker to
4721	   guess or know the aforementioned TCP connection parameters, and could
4722	   therefore be successfully exploited in some scenarios with less
4723	   effort than that required to exploit the more traditional data-
4724	   injection attack vectors.

4726	   The attack vector works as follows.  When one system is transferring
4727	   information to a remote peer by means of TCP, and the resulting
4728	   packet gets fragmented, the first fragment will usually contain the
4729	   entire TCP header which, together with the IP header, includes all
4730	   the connection parameters that an attacker would need to guess or
4731	   know to successfully perform a data injection attack against TCP.  If
4732	   an attacker were able to forge all the fragments other than the first
4733	   one, his forged fragments could be reassembled together with the
4734	   legitimate first fragment, and thus he would be relieved from the
4735	   hard task of guessing or knowing connection parameters such as the
4736	   TCP Sequence Number and the TCP Acknowledgement Number.
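
   The reason forged fragments can be combined with a legitimate first
   fragment is that reassembly is keyed only on {Source Address,
   Destination Address, Protocol, Identification}.  The Python sketch
   below models this under simplifying assumptions: the class name
   Reassembler is hypothetical, offsets are expressed in bytes rather
   than 8-byte units, overlapping fragments are not handled, and the
   first fragment received for a given offset is the one kept
   (implementations differ on this policy).

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class Reassembler:
    """Toy IP reassembly buffer keyed on (src, dst, proto, ident)."""

    def __init__(self) -> None:
        self._buffers: Dict[tuple, Dict[int, Tuple[bytes, bool]]] = defaultdict(dict)

    def add(self, src: str, dst: str, proto: int, ident: int,
            offset: int, payload: bytes, more_fragments: bool) -> Optional[bytes]:
        """Store one fragment; return the datagram payload once complete."""
        if not payload:
            return None
        key = (src, dst, proto, ident)
        # Assumption: the first fragment seen for an offset wins.
        self._buffers[key].setdefault(offset, (payload, more_fragments))
        return self._try_complete(key)

    def _try_complete(self, key: tuple) -> Optional[bytes]:
        fragments = self._buffers[key]
        data, pos = b"", 0
        while pos in fragments:
            payload, more = fragments[pos]
            data += payload
            pos += len(payload)
            if not more:  # last fragment reached with no gaps
                del self._buffers[key]
                return data
        return None
```

   If a forged fragment for offset 8 is stored before the legitimate
   first fragment (offset 0) arrives, the call that adds the legitimate
   fragment returns the genuine TCP header followed by the forged data.
   Nothing in the key ties the fragments to the TCP Sequence Number or
   Acknowledgement Number.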

4738	   In order to successfully exploit this attack vector, the attacker
4739	   should be able to guess or know both of the IP addresses involved in
4740	   the target TCP connection, the IP Identification value used for the
4741	   specific packet he is targeting, and the TCP Checksum of that target
4742	   packet.  While it would seem that these values are hard to guess, in
4743	   some specific scenarios, and with some security-unwise implementation
4744	   approaches for the TCP and IP protocols, these values may be feasible
4745	   to guess or know.  For example, if the sending system uses
4746	   predictable IP Identification values, the attacker could simply
4747	   perform a brute force attack, trying each of the possible
4748	   combinations for the TCP Checksum field.  In more specific scenarios,
4749	   the attacker could have more detailed knowledge about the data being
4750	   transferred over the target TCP connection, which might allow him to
4751	   predict the TCP Checksum of the target packet.  For example, if both
4752	   of the involved TCP peers used predictable values for the TCP
4753	   Sequence Number and for the IP Identification fields, and the
4754	   attacker knew the data being transferred over the target TCP
4755	   connection, he might be able to carefully forge the IP payload of his
4756	   IP fragments so that the checksum of the reassembled TCP segment
4757	   matched the Checksum included in the TCP header of the first (and
4758	   legitimate) IP fragment.
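
   The checksum-matching step can be illustrated with the following
   Python sketch.  It relies on the fact that the TCP Checksum is a
   16-bit one's-complement sum (see RFC 1071), so a forged payload can
   be padded with one 16-bit word chosen to reproduce the sum of the
   legitimate data.  All function names are hypothetical, the
   pseudo-header is omitted because it is identical for the legitimate
   and the forged segment, and the forged data is assumed to be of even
   length and exactly two bytes shorter than the legitimate data.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return total


def set_checksum(header: bytes, payload: bytes) -> bytes:
    """Return the 20-byte header with its Checksum field (bytes 16-17) set."""
    blank = header[:16] + b"\x00\x00" + header[18:]
    checksum = ~ones_complement_sum(blank + payload) & 0xFFFF
    return blank[:16] + checksum.to_bytes(2, "big") + blank[18:]


def checksum_ok(segment: bytes) -> bool:
    """A segment verifies when the sum over all of it is 0xFFFF."""
    return ones_complement_sum(segment) == 0xFFFF


def forge_payload(legit: bytes, evil: bytes) -> bytes:
    """Append a 2-byte pad so that the forged data sums like `legit`."""
    if len(evil) % 2 or len(evil) + 2 != len(legit):
        raise ValueError("sketch assumes even length and len(legit) - 2")
    pad = (ones_complement_sum(legit) - ones_complement_sum(evil)) % 0xFFFF
    return evil + pad.to_bytes(2, "big")
```

   Given a header produced by set_checksum(header, legit), the segment
   header + forge_payload(legit, evil) also satisfies checksum_ok().
   When the attacker does not know the legitimate data, the same
   arithmetic shows why a brute-force search is feasible: the pad can
   take at most 65536 values.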

4760	   As discussed in Section 4.1 of [CPNI, 2008], IP fragmentation
4761	   provides a vector for performing a variety of attacks against an IP
4762	   implementation.  Therefore, we discourage the reliance on IP
4763	   fragmentation by end-systems, and recommend the implementation of
4764	   mechanisms for the discovery of the Path-MTU, such as that described
4765	   in Section 15.7.3 of this document and/or that described in RFC 4821
4766	   [Mathis and Heffner, 2007].  We nevertheless recommend randomization
4767	   of the IP Identification field as described in Section 3.5.2 of
4768	   [CPNI, 2008].  While randomization of the IP Identification field
4769	   does not eliminate this attack vector, it does require more work on
4770	   the side of the attacker to successfully exploit it.

4772	16.3.  Broadcast and multicast IP addresses

4774	   TCP connection state is maintained between only two endpoints at a
4775	   time.  As a result, broadcast and multicast IP addresses should not
4776	   be allowed for the establishment of TCP connections.  Section 4.3 of
4777	   [CPNI, 2008] provides advice about which specific IP address blocks
4778	   should not be allowed for connection-oriented protocols such as TCP.
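
   A minimal Python sketch of such a check is shown below.  It uses the
   standard ipaddress module; the function name acceptable_for_tcp and
   the optional local_network parameter are hypothetical, and the check
   covers only multicast, limited-broadcast and directed-broadcast
   addresses rather than the full list in [CPNI, 2008].

```python
import ipaddress
from typing import Optional


def acceptable_for_tcp(addr: str, local_network: Optional[str] = None) -> bool:
    """Return False for addresses that cannot name a single TCP endpoint."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast:
        return False
    if ip.version == 4:
        if ip == ipaddress.IPv4Address("255.255.255.255"):
            return False  # limited broadcast
        if local_network is not None:
            net = ipaddress.ip_network(local_network)
            # Directed broadcast of the attached network; /31 and /32
            # prefixes have no broadcast address.
            if (net.version == 4 and net.prefixlen < 31
                    and ip == net.broadcast_address):
                return False
    return True
```

   A TCP could apply this test to the source address of an incoming SYN
   and to the destination address of an active open, rejecting the
   connection attempt when the function returns False.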

4780	17.  Security Considerations

4782	   This document provides a thorough security assessment of the
4783	   Transmission Control Protocol (TCP), identifies a number of
4784	   vulnerabilities, and specifies possible counter-measures.
4785	   Additionally, it provides implementation guidance such that the
4786	   resilience of TCP implementations is improved.

4788	18.  Acknowledgements

4790	   The author would like to thank (in alphabetical order) David Borman,
4791	   Wesley Eddy, and Alfred Hoenes, for providing valuable feedback on
4792	   earlier versions of this document.

4794	   This document is heavily based on the document "Security Assessment
4795	   of the Transmission Control Protocol (TCP)" [CPNI, 2009] written by
4796	   Fernando Gont on behalf of CPNI (Centre for the Protection of
4797	   National Infrastructure).

4799	   The author would like to thank (in alphabetical order) Randall
4800	   Atkinson, Guillermo Gont, Alfred Hoenes, Jamshid Mahdavi, Stanislav
4801	   Shalunov, Michael Welzl, Dan Wing, Andrew Yourtchenko, Michal
4802	   Zalewski, and Christos Zoulas, for providing valuable feedback on
4803	   earlier versions of the UK CPNI document.

4805	   Additionally, the author would like to thank (in alphabetical order)
4806	   Mark Allman, David Black, Ethan Blanton, David Borman, James Chacon,
4807	   John Heffner, Jerrold Leichter, Jamshid Mahdavi, Keith Scott, Bill
4808	   Squier, and David White, who generously answered a number of
4809	   questions that arose while the aforementioned document was being
4810	   written.

4812	   Finally, the author would like to thank CPNI (formerly NISCC) for
4813	   their continued support.

4815	19.  References

4817	   Abley, J., Savola, P., Neville-Neil, G. 2007.  Deprecation of Type 0
4818	   Routing Headers in IPv6.  RFC 5095.

4820	   Allman, M. 2003.  TCP Congestion Control with Appropriate Byte
4821	   Counting (ABC).  RFC 3465.

4823	   Allman, M. 2008.  Comments On Selecting Ephemeral Ports.  Available
4824	   at: http://www.icir.org/mallman/share/ports-dec08.pdf

4826	   Allman, M., Paxson, V., Stevens, W. 1999.  TCP Congestion Control.
4827	   RFC 2581.

4829	   Allman, M., Balakrishnan, H., Floyd, S. 2001.  Enhancing TCP's Loss
4830	   Recovery Using Limited Transmit.  RFC 3042.

4832	   Allman, M., Floyd, S., and C. Partridge. 2002.  Increasing TCP's
4833	   Initial Window.  RFC 3390.

4835	   Baker, F. 1995.  Requirements for IP Version 4 Routers.  RFC 1812.

4837	   Baker, F., Savola, P. 2004.  Ingress Filtering for Multihomed
4838	   Networks.  RFC 3704.

4840	   Barisani, A. 2006.  FTester - Firewall and IDS testing tool.
4841	   Available at: http://dev.inversepath.com/trac/ftester

4843	   Beck, R. 2001.  Passive-Aggressive Resistance: OS Fingerprint
4844	   Evasion.  Linux Journal.

4846	   Bellovin, S. M. 1989.  Security Problems in the TCP/IP Protocol
4847	   Suite.  Computer Communication Review, Vol. 19, No. 2, pp. 32-48.

4849	   Bellovin, S. M. 1996.  Defending Against Sequence Number Attacks.
4850	   RFC 1948.

4852	   Bellovin, S. M. 2006.  Towards a TCP Security Option.  IETF Internet-
4853	   Draft (draft-bellovin-tcpsec-00.txt), work in progress.

4855	   Bernstein, D. J. 1996.  SYN cookies.  Available at:
4856	   http://cr.yp.to/syncookies.html

4858	   Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss,
4859	   W., 1998.  An Architecture for Differentiated Services.  RFC 2475.

4861	   Blanton, E., Allman, M., Fall, K., Wang, L. 2003.  A Conservative
4862	   Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for
4863	   TCP.  RFC 3517.

4865	   Borman, D. 1997.  Post to the tcp-impl mailing-list.  Message-Id:
4866	   <199706061526.KAA01535@frantic.BSDI.COM>.  Available at:
4867	   http://www.kohala.com/start/borman.97jun06.txt

4869	   Borman, D., Deering, S., Hinden, R. 1999.  IPv6 Jumbograms.  RFC
4870	   2675.

4872	   Braden, R. 1989.  Requirements for Internet Hosts -- Communication
4873	   Layers.  RFC 1122.

4875	   Braden, R. 1992.  Extending TCP for Transactions -- Concepts.  RFC
4876	   1379.

4878	   Braden, R. 1994.  T/TCP -- TCP Extensions for Transactions Functional
4879	   Specification.  RFC 1644.

4881	   CCSDS. 2006.  Consultative Committee for Space Data Systems (CCSDS)
4882	   Recommendation Communications Protocol Specification (SCPS) --
4883	   Transport Protocol (SCPS-TP).  Blue Book.  Issue 2.  Available at:
4884	   http://public.ccsds.org/publications/archive/714x0b2.pdf

4886	   CERT. 1996.  CERT Advisory CA-1996-21: TCP SYN Flooding and IP
4887	   Spoofing Attacks.  Available at:
4888	   http://www.cert.org/advisories/CA-1996-21.html

4890	   CERT. 1997.  CERT Advisory CA-1997-28 IP Denial-of-Service Attacks.
4891	   Available at: http://www.cert.org/advisories/CA-1997-28.html

4893	   CERT. 2000.  CERT Advisory CA-2000-21: Denial-of-Service
4894	   Vulnerabilities in TCP/IP Stacks.  Available at:
4895	   http://www.cert.org/advisories/CA-2000-21.html

4897	   CERT. 2001.  CERT Advisory CA-2001-09: Statistical Weaknesses in
4898	   TCP/IP Initial Sequence Numbers.  Available at:
4899	   http://www.cert.org/advisories/CA-2001-09.html

4901	   CERT. 2003.  CERT Advisory CA-2003-13 Multiple Vulnerabilities in
4902	   Snort Preprocessors.  Available at:
4903	   http://www.cert.org/advisories/CA-2003-13.html

4905	   Cisco. 2008a.  Cisco Security Appliance Command Reference, Version
4906	   7.0.  Available at: http://www.cisco.com/en/US/docs/security/asa/
4907	   asa70/command/reference/tz.html#wp1288756

4908	   Cisco. 2008b.  Cisco Security Appliance System Log Messages, Version
4909	   8.0.  Available at: http://www.cisco.com/en/US/docs/security/asa/
4910	   asa80/system/message/logmsgs.html#wp4773952

4912	   Clark, D.D. 1982.  Fault isolation and recovery.  RFC 816.

4914	   Clark, D.D. 1988.  The Design Philosophy of the DARPA Internet
4915	   Protocols, Computer Communication Review, Vol. 18, No.4, pp. 106-114.

4917	   Connolly, T., Amer, P., Conrad, P. 1994.  An Extension to TCP :
4918	   Partial Order Service.  RFC 1693.

4920	   Conta, A., Deering, S., Gupta, M. 2006.  Internet Control Message
4921	   Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6)
4922	   Specification.  RFC 4443.

4924	   CORE. 2003.  Core Secure Technologies Advisory CORE-2003-0307: Snort
4925	   TCP Stream Reassembly Integer Overflow Vulnerability.  Available at:
4926	   http://www.coresecurity.com/common/showdoc.php?idx=313&idxseccion=10

4928	   CPNI, 2008.  Security Assessment of the Internet Protocol.  Available
4929	   at: http://www.cpni.gov.uk/Docs/InternetProtocol.pdf

4931	   CPNI, 2009.  Security Assessment of the Transmission Control Protocol
4932	   (TCP).  Available at:
4933	   http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf

4935	   daemon9, route, and infinity. 1996.  IP-spoofing Demystified (Trust-
4936	   Relationship Exploitation), Phrack Magazine, Volume Seven, Issue
4937	   Forty-Eight, File 14 of 18.  Available at:
4938	   http://www.phrack.org/archives/48/P48-14

4940	   Deering, S., Hinden, R. 1998.  Internet Protocol, Version 6 (IPv6)
4941	   Specification.  RFC 2460.

4943	   Dharmapurikar, S., Paxson, V. 2005.  Robust TCP Stream Reassembly In
4944	   the Presence of Adversaries.  Proceedings of the USENIX Security
4945	   Symposium 2005.

4947	   Duke, M., Braden, R., Eddy, W., Blanton, E. 2006.  A Roadmap for
4948	   Transmission Control Protocol (TCP) Specification Documents.  RFC
4949	   4614.

4951	   Ed3f. 2002.  Firewall spotting and networks analisys with a broken
4952	   CRC.  Phrack Magazine, Volume 0x0b, Issue 0x3c, Phile #0x0c of 0x10.
4953	   Available at: http://www.phrack.org/phrack/60/p60-0x0c.txt

4955	   Eddy, W. 2007.  TCP SYN Flooding Attacks and Common Mitigations.  RFC
4956	   4987.

4958	   Fenner, B. 2006.  Experimental Values in IPv4, IPv6, ICMPv4, ICMPv6,
4959	   UDP, and TCP Headers.  RFC 4727.

4961	   Ferguson, P., and Senie, D. 2000.  Network Ingress Filtering:
4962	   Defeating Denial of Service Attacks which employ IP Source Address
4963	   Spoofing.  RFC 2827.

4965	   Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
4966	   Leach, P., and Berners-Lee, T. 1999.  Hypertext Transfer Protocol --
4967	   HTTP/1.1.  RFC 2616.

4969	   Floyd, S., Mahdavi, J., Mathis, M., Podolsky, M. 2000.  An Extension
4970	   to the Selective Acknowledgement (SACK) Option for TCP.  RFC 2883.

4972	   Floyd, S., Henderson, T., Gurtov, A. 2004.  The NewReno Modification
4973	   to TCP's Fast Recovery Algorithm.  RFC 3782.

4975	   Floyd, S., Allman, M., Jain, A., Sarolahti, P. 2007.  Quick-Start for
4976	   TCP and IP.  RFC 4782.

4978	   Fyodor. 1998.  Remote OS Detection via TCP/IP Stack Fingerprinting.
4979	   Phrack Magazine, Volume 8, Issue 54.

4981	   Fyodor. 2006a.  Remote OS Detection via TCP/IP Fingerprinting (2nd
4982	   Generation).  Available at: http://insecure.org/nmap/osdetect/.

4984	   Fyodor. 2006b.  Nmap - Free Security Scanner For Network Exploration
4985	   and Audit.  Available at: http://www.insecure.org/nmap.

4987	   Fyodor. 2008.  Nmap Reference Guide: Port Scanning Techniques.
4988	   Available at: http://nmap.org/book/man-port-scanning-techniques.html

4990	   GIAC. 2000.  Egress Filtering v 0.2.  Available at:
4991	   http://www.sans.org/y2k/egress.htm

4993	   Giffin, J., Greenstadt, R., Litwack, P., Tibbetts, R. 2002.  Covert
4994	   Messaging through TCP Timestamps.  PET2002 (Workshop on Privacy
4995	   Enhancing Technologies), San Francisco, CA, USA, April 2002.
4996	   Available at:
4997	   http://web.mit.edu/greenie/Public/CovertMessaginginTCP.ps

4999	   Gill, V., Heasley, J., Meyer, D., Savola, P., Pignataro, C. 2007.
5000	   The Generalized TTL Security Mechanism (GTSM).  RFC 5082.

5002	   Gont, F. 2006.  Advanced ICMP packet filtering.  Available at:
5003	   http://www.gont.com.ar/papers/icmp-filtering.html

5004	   Gont, F. 2008a.  ICMP attacks against TCP.  IETF Internet-Draft
5005	   (draft-ietf-tcpm-icmp-attacks-04.txt), work in progress.

5007	   Gont, F. 2008b.  TCP's Reaction to Soft Errors.  IETF Internet-Draft
5008	   (draft-ietf-tcpm-tcp-soft-errors-09.txt), work in progress.

5010	   Gont, F. 2009.  On the generation of TCP timestamps.  IETF Internet-
5011	   Draft (draft-gont-tcpm-tcp-timestamps-01.txt), work in progress.

5013	   Gont, F., Srisuresh, P. 2008.  Security Implications of Network
5014	   Address Translators (NATs).  IETF Internet-Draft
5015	   (draft-gont-behave-nat-security-01.txt), work in progress.

5017	   Gont, F., Yourtchenko, A. 2009.  On the implementation of TCP urgent
5018	   data.  IETF Internet-Draft (draft-gont-tcpm-urgent-data-01.txt), work
5019	   in progress.

5021	   Heffernan, A. 1998.  Protection of BGP Sessions via the TCP MD5
5022	   Signature Option.  RFC 2385.

5024	   Heffner, J. 2002.  High Bandwidth TCP Queuing.  Senior Thesis.

5026	   Hoenes, A. 2007.  TCP options - tcp-parameters IANA registry.  Post to
5027	   the tcpm wg mailing-list.  Available at:
5028	   http://www.ietf.org/mail-archive/web/tcpm/current/msg03199.html

5030	   IANA. 2007.  Transmission Control Protocol (TCP) Option Numbers.
5031	   Available at: http://www.iana.org/assignments/tcp-parameters/

5033	   IANA. 2008.  Port Numbers.  Available at:
5034	   http://www.iana.org/assignments/port-numbers

5036	   Jacobson, V. 1988.  Congestion Avoidance and Control.  Computer
5037	   Communication Review, vol. 18, no. 4, pp. 314-329.  Available at:
5038	   ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

5040	   Jacobson, V., Braden, R. 1988.  TCP Extensions for Long-Delay Paths.
5041	   RFC 1072.

5043	   Jacobson, V., Braden, R., Borman, D. 1992.  TCP Extensions for High
5044	   Performance.  RFC 1323.

5046	   Jones, S. 2003.  Port 0 OS Fingerprinting.  Available at:
5047	   http://www.gont.com.ar/docs/port-0-os-fingerprinting.txt

5049	   Kent, S. and Seo, K. 2005.  Security Architecture for the Internet
5050	   Protocol.  RFC 4301.

5052	   Klensin, J. 2008.  Simple Mail Transfer Protocol.  RFC 5321.

5054	   Ko, Y., Ko, S., and Ko, M. 2001.  NIDS Evasion Method named SeolMa.
5055	   Phrack Magazine, Volume 0x0b, Issue 0x39, phile #0x03 of 0x12.
5056	   Available at: http://www.phrack.org/issues.html?issue=57&id=3#article

5058	   Lahey, K. 2000.  TCP Problems with Path MTU Discovery.  RFC 2923.

5060	   Larsen, M., Gont, F. 2008.  Port Randomization.  IETF Internet-Draft
5061	   (draft-ietf-tsvwg-port-randomization-02), work in progress.

5063	   Lemon, J. 2002.  Resisting SYN flood DoS attacks with a SYN cache.
5064	   Proceedings of the BSDCon 2002 Conference, pp 89-98.

5066	   Maimon, U. 1996.  Port Scanning without the SYN flag.  Phrack
5067	   Magazine, Volume Seven, Issue Forty-Nine, phile #0x0f of 0x10.
5068	   Available at:
5069	   http://www.phrack.org/issues.html?issue=49&id=15#article

5071	   Mathis, M., Mahdavi, J., Floyd, S., Romanow, A. 1996.  TCP Selective
5072	   Acknowledgment Options.  RFC 2018.

5074	   Mathis, M., and Heffner, J. 2007.  Packetization Layer Path MTU
5075	   Discovery.  RFC 4821.

5077	   McCann, J., Deering, S., Mogul, J. 1996.  Path MTU Discovery for IP
5078	   version 6.  RFC 1981.

5080	   McKusick, M., Bostic, K., Karels, M., and J. Quarterman. 1996.  The
5081	   Design and Implementation of the 4.4BSD Operating System.  Addison-
5082	   Wesley.

5084	   Meltman. 1997. new TCP/IP bug in win95.  Post to the bugtraq mailing-
5085	   list.  Available at: http://insecure.org/sploits/land.ip.DOS.html

5087	   Miller, T. 2006.  Passive OS Fingerprinting: Details and Techniques.
5088	   Available at: http://www.ouah.org/incosfingerp.htm .

5090	   Mogul, J., and Deering, S. 1990.  Path MTU Discovery.  RFC 1191.

5092	   Morris, R. 1985.  A Weakness in the 4.2BSD Unix TCP/IP Software.
5093	   Technical Report CSTR-117, AT&T Bell Laboratories.  Available at:
5094	   http://pdos.csail.mit.edu/~rtm/papers/117.pdf .

5096	   Myst. 1997.  Windows 95/NT DoS.  Post to the bugtraq mailing-list.
5097	   Available at: http://seclists.org/bugtraq/1997/May/0039.html

5099	   Nichols, K., Blake, S., Baker, F., and Black, D. 1998.  Definition of
5100	   the Differentiated Services Field (DS Field) in the IPv4 and IPv6
5101	   Headers.  RFC 2474.

5103	   NISCC. 2004.  NISCC Vulnerability Advisory 236929: Vulnerability
5104	   Issues in TCP.  Available at:
5105	   http://www.uniras.gov.uk/niscc/docs/re-20040420-00391.pdf

5107	   NISCC. 2005.  NISCC Vulnerability Advisory 532967/NISCC/ICMP:
5108	   Vulnerability Issues in ICMP packets with TCP payloads.  Available
5109	   at: http://www.niscc.gov.uk/niscc/docs/re-20050412-00303.pdf

5111	   NISCC. 2006.  NISCC Technical Note 01/2006: Egress and Ingress
5112	   Filtering.  Available at:
5113	   http://www.niscc.gov.uk/niscc/docs/re-20060420-00294.pdf?lang=en

5115	   Ostermann, S. 2008. tcptrace tool.  Tool and documentation available
5116	   at: http://www.tcptrace.org.

5118	   Paxson, V., Allman, M. 2000.  Computing TCP's Retransmission Timer.
5119	   RFC 2988.

5121	   PCNWG. 2009.  Congestion and Pre-Congestion Notification (pcn)
5122	   charter.  Available at:
5123	   http://www.ietf.org/html.charters/pcn-charter.html

5125	   PMTUDWG. 2007.  Path MTU Discovery (pmtud) charter.  Available at:
5126	   http://www.ietf.org/html.charters/OLD/pmtud-charter.html

5128	   Postel, J. 1981a.  Internet Protocol.  DARPA Internet Program.
5129	   Protocol Specification.  RFC 791.

5131	   Postel, J. 1981b.  Internet Control Message Protocol.  RFC 792.

5133	   Postel, J. 1981c.  Transmission Control Protocol.  DARPA Internet
5134	   Program.  Protocol Specification.  RFC 793.

5136	   Postel, J. 1987.  TCP AND IP BAKE OFF.  RFC 1025.

5138	   Ptacek, T. H., and Newsham, T. N. 1998.  Insertion, Evasion and
5139	   Denial of Service: Eluding Network Intrusion Detection.  Secure
5140	   Networks, Inc. Available at:
5141	   http://www.aciri.org/vern/Ptacek-Newsham-Evasion-98.ps

5143	   Ramaiah, A., Stewart, R., and Dalal, M. 2008.  Improving TCP's
5144	   Robustness to Blind In-Window Attacks.  IETF Internet-Draft
5145	   (draft-ietf-tcpm-tcpsecure-10.txt), work in progress.

5147	   Ramakrishnan, K., Floyd, S., and Black, D. 2001.  The Addition of
5148	   Explicit Congestion Notification (ECN) to IP.  RFC 3168.

5150	   Rekhter, Y., Li, T., Hares, S. 2006.  A Border Gateway Protocol 4
5151	   (BGP-4).  RFC 4271.

5153	   Rivest, R. 1992.  The MD5 Message-Digest Algorithm.  RFC 1321.

5155	   Rowland, C. 1997.  Covert Channels in the TCP/IP Protocol Suite.
5156	   First Monday Journal, Volume 2, Number 5.  Available at:
5157	   http://www.firstmonday.org/issues/issue2_5/rowland/

5159	   Savage, S., Cardwell, N., Wetherall, D., Anderson, T. 1999.  TCP
5160	   Congestion Control with a Misbehaving Receiver.  ACM Computer
5161	   Communication Review, 29(5), October 1999.

5163	   Semke, J., Mahdavi, J., Mathis, M. 1998.  Automatic TCP Buffer
5164	   Tuning.  ACM Computer Communication Review, Vol. 28, No. 4.

5166	   Shalunov, S. 2000.  Netkill.  Available at:
5167	   http://www.internet2.edu/~shalunov/netkill/netkill.html

5169	   Shimomura, T. 1995.  Technical details of the attack described by
5170	   Markoff in NYT.  Message posted in USENETs comp.security.misc
5171	   newsgroup, Message-ID: <3g5gkl$5j1@ariel.sdsc.edu>.  Available at:
5172	   http://www.gont.com.ar/docs/post-shimomura-usenet.txt.

5174	   Silbersack, M. 2005.  Improving TCP/IP security through randomization
5175	   without sacrificing interoperability.  EuroBSDCon 2005 Conference.

5177	   SinFP. 2006.  Net::SinFP - a Perl module to do OS fingerprinting.
5178	   Available at:
5179	   http://www.gomor.org/cgi-bin/index.pl?mode=view;page=sinfp

5181	   Smart, M., Malan, G., Jahanian, F. 2000.  Defeating TCP/IP Stack
5182	   Fingerprinting.  Proceedings of the 9th USENIX Security Symposium,
5183	   pp. 229-240.  Available at: http://www.usenix.org/publications/
5184	   library/proceedings/sec2000/full_papers/smart/smart_html/index.html

5186	   Smith, C., Grundl, P. 2002.  Know Your Enemy: Passive Fingerprinting.
5187	   The Honeynet Project.

5189	   Spring, N., Wetherall, D., Ely, D. 2003.  Robust Explicit Congestion
5190	   Notification (ECN) Signaling with Nonces.  RFC 3540.

5192	   Srisuresh, P., Egevang, K. 2001.  Traditional IP Network Address
5193	   Translator (Traditional NAT).  RFC 3022.

5195	   Stevens, W. R. 1994.  TCP/IP Illustrated, Volume 1: The Protocols.
5197	   Addison-Wesley Professional Computing Series.

5199	   TBIT. 2001.  TBIT, the TCP Behavior Inference Tool.  Available at:
5200	   http://www.icir.org/tbit/

5202	   Touch, J. 2007.  Defending TCP Against Spoofing Attacks.  RFC 4953.

5204	   US-CERT. 2001.  US-CERT Vulnerability Note VU#498440: Multiple TCP/IP
5205	   implementations may use statistically predictable initial sequence
5206	   numbers.  Available at: http://www.kb.cert.org/vuls/id/498440

5208	   US-CERT. 2003a.  US-CERT Vulnerability Note VU#26825: Cisco Secure
5209	   PIX Firewall TCP Reset Vulnerability.  Available at:
5210	   http://www.kb.cert.org/vuls/id/26825

5212	   US-CERT. 2003b.  US-CERT Vulnerability Note VU#464113: TCP/IP
5213	   implementations handle unusual flag combinations inconsistently.
5214	   Available at: http://www.kb.cert.org/vuls/id/464113

5216	   US-CERT. 2004a.  US-CERT Vulnerability Note VU#395670: FreeBSD fails
5217	   to limit number of TCP segments held in reassembly queue.  Available
5218	   at: http://www.kb.cert.org/vuls/id/395670

5220	   US-CERT. 2005a.  US-CERT Vulnerability Note VU#102014: Optimistic TCP
5221	   acknowledgements can cause denial of service.  Available at:
5222	   http://www.kb.cert.org/vuls/id/102014

5224	   US-CERT. 2005b.  US-CERT Vulnerability Note VU#396645: Microsoft
5225	   Windows vulnerable to DoS via LAND attack.  Available at:
5226	   http://www.kb.cert.org/vuls/id/396645

5228	   US-CERT. 2005c.  US-CERT Vulnerability Note VU#637934: TCP does not
5229	   adequately validate segments before updating timestamp value.
5230	   Available at: http://www.kb.cert.org/vuls/id/637934

5232	   US-CERT. 2005d.  US-CERT Vulnerability Note VU#853540: Cisco PIX
5233	   fails to verify TCP checksum.  Available at:
5234	   http://www.kb.cert.org/vuls/id/853540

5236	   Veysset, F., Courtay, O., Heen, O. 2002.  New Tool And Technique For
5237	   Remote Operating System Fingerprinting.  Intranode Research Team.

5239	   Watson, P. 2004.  Slipping in the Window: TCP Reset Attacks.
5240	   CanSecWest 2004 Conference.

5242	   Welzl, M. 2008.  Internet congestion control: evolution and current
5243	   open issues.  CAIA guest talk, Swinburne University, Melbourne,
5244	   Australia.  Available at:
5246	   http://www.welzl.at/research/publications/caia-jan08.pdf

5248	   Wright, G. and W. Stevens. 1994.  TCP/IP Illustrated, Volume 2: The
5249	   Implementation.  Addison-Wesley.

5251	   Zalewski, M. 2001a.  Strange Attractors and TCP/IP Sequence Number
5252	   Analysis.  Available at:
5253	   http://lcamtuf.coredump.cx/oldtcp/tcpseq.html

5255	   Zalewski, M. 2001b.  Delivering Signals for Fun and Profit.
5256	   Available at: http://lcamtuf.coredump.cx/signals.txt

5258	   Zalewski, M. 2002.  Strange Attractors and TCP/IP Sequence Number
5259	   Analysis - One Year Later.  Available at:
5260	   http://lcamtuf.coredump.cx/newtcp/

5262	   Zalewski, M. 2003a.  Windows URG mystery solved!  Post to the bugtraq
5263	   mailing-list.  Available at:
5264	   http://lcamtuf.coredump.cx/p0f-help/p0f/doc/win-memleak.txt

5266	   Zalewski, M. 2003b.  A new TCP/IP blind data injection technique?
5267	   Post to the bugtraq mailing-list.  Available at:
5268	   http://lcamtuf.coredump.cx/ipfrag.txt

5270	   Zalewski, M. 2006a. p0f passive fingerprinting tool.  Available at:
5271	   http://lcamtuf.coredump.cx/p0f.shtml

5273	   Zalewski, M. 2006b. p0f - RST+ signatures.  Available at:
5274	   http://lcamtuf.coredump.cx/p0f-help/p0f/p0fr.fp

5276	   Zalewski, M. 2007. 0trace - traceroute on established connections.
5277	   Post to the bugtraq mailing-list.  Available at:
5278	   http://seclists.org/bugtraq/2007/Jan/0176.html

5280	   Zalewski, M. 2008.  Museum of broken packets.  Available at:
5281	   http://lcamtuf.coredump.cx/mobp/

5283	   Zander, S. 2008.  Covert Channels in Computer Networks.  Available
5284	   at: http://caia.swin.edu.au/cv/szander/cc/index.html

5286	   Zuquete, A. 2002.  Improving the functionality of SYN cookies. 6th
5287	   IFIP Communications and Multimedia Security Conference (CMS 2002).
5288	   Available at: http://www.ieeta.pt/~avz/pubs/CMS02.html

5290	   Zweig, J., Partridge, C. 1990.  TCP Alternate Checksum Options.  RFC
5291	   1146.

5293	20.  References

5295	20.1.  Normative References

5297	   [I-D.ietf-tcpm-tcp-timestamps]
5298	              Gont, F., "Reducing the TIME-WAIT state using TCP
5299	              timestamps", draft-ietf-tcpm-tcp-timestamps-03 (work in
5300	              progress), December 2010.

5302	   [I-D.ietf-tsvwg-port-randomization]
5303	              Larsen, M. and F. Gont, "Transport Protocol Port
5304	              Randomization Recommendations",
5305	              draft-ietf-tsvwg-port-randomization-09 (work in progress),
5306	              August 2010.

5308	   [RFC6093]  Gont, F. and A. Yourtchenko, "On the Implementation of the
5309	              TCP Urgent Mechanism", RFC 6093, January 2011.

5311	20.2.  Informative References

5313	   [I-D.gont-timestamps-generation]
5314	              Gont, F. and A. Oppermann, "On the generation of TCP
5315	              timestamps", draft-gont-timestamps-generation-00 (work in
5316	              progress), June 2010.

5318	   [RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

5320	Appendix A.  TODO list

5322	   A number of formatting issues still have to be fixed in this
5323	   document.  These include:

5325	   o  The ASCII-art corresponding to some figures is still missing.  We
5326	      still have to convert the nice JPGs of the UK CPNI document into
5327	      ugly ASCII-art.

5329	   o  The references have not yet been converted to XML, but are
5330	      hardcoded instead.  That is why they may not look as expected.

5332	Appendix B.  Change log (to be removed by the RFC Editor before
5333	             publication of this document as an RFC)

5335	B.1.  Changes from draft-ietf-tcpm-tcp-security-01

5340	   o  The whole document was reformatted with RFC 1122 style.

5342	Author's Address

5344	   Fernando Gont
5345	   UK Centre for the Protection of National Infrastructure

5347	   Email: fernando@gont.com.ar
5348	   URI:   http://www.cpni.gov.uk