TCP Maintenance and Minor Extensions                             F. Gont
(tcpm)                                                           UK CPNI
Internet-Draft                                            March 13, 2012
Intended status: Informational
Expires: September 14, 2012


 Survey of Security Hardening Methods for Transmission Control Protocol
                         (TCP) Implementations
                  draft-ietf-tcpm-tcp-security-03.txt

Abstract

   This document surveys methods to harden Transmission Control Protocol
   (TCP) implementations.
   It provides an overview of known attacks and refers to the
   corresponding solutions in the TCP standards.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Preface
     1.1.  Introduction
     1.2.  Scope of this document
     1.3.  Organization of this document
   2.  The Transmission Control Protocol
   3.  TCP header fields
     3.1.  Source Port and Destination Port
     3.2.  Sequence number
     3.3.  Acknowledgement Number
     3.4.  Data Offset
     3.5.  Control bits
       3.5.1.  Reserved (four bits)
       3.5.2.  CWR (Congestion Window Reduced)
       3.5.3.  ECE (ECN-Echo)
       3.5.4.  URG
       3.5.5.  ACK
       3.5.6.  PSH
       3.5.7.  RST
       3.5.8.  SYN
       3.5.9.  FIN
     3.6.  Window
       3.6.1.  Security implications arising from closed windows
     3.7.  Checksum
     3.8.  Urgent pointer
     3.9.  Options
     3.10. Padding
     3.11. Data
   4.  Common TCP Options
     4.1.  End of Option List (Kind = 0)
     4.2.  No Operation (Kind = 1)
     4.3.  Maximum Segment Size (Kind = 2)
     4.4.  Selective Acknowledgement Option
       4.4.1.  SACK-permitted Option (Kind = 4)
       4.4.2.  SACK Option (Kind = 5)
     4.5.  MD5 Option (Kind=19)
     4.6.  Window scale option (Kind = 3)
     4.7.  Timestamps option (Kind = 8)
       4.7.1.  Generation of timestamps
       4.7.2.  Vulnerabilities
   5.  Connection-establishment mechanism
     5.1.  SYN flood
     5.2.  Connection forgery
     5.3.  Connection-flooding attack
       5.3.1.  Vulnerability
       5.3.2.  Countermeasures
     5.4.  Firewall-bypassing techniques
   6.  Connection-termination mechanism
     6.1.  FIN-WAIT-2 flooding attack
       6.1.1.  Vulnerability
       6.1.2.  Countermeasures
   7.  Buffer management
     7.1.  TCP retransmission buffer
       7.1.1.  Vulnerability
       7.1.2.  Countermeasures
     7.2.  TCP segment reassembly buffer
     7.3.  Automatic buffer tuning mechanisms
       7.3.1.  Automatic send-buffer tuning mechanisms
       7.3.2.  Automatic receive-buffer tuning mechanism
   8.  TCP segment reassembly algorithm
     8.1.  Problems that arise from ambiguity in the reassembly process
   9.  TCP Congestion Control
     9.1.  Congestion control with misbehaving receivers
       9.1.1.  ACK division
       9.1.2.  DupACK forgery
       9.1.3.  Optimistic ACKing
     9.2.  Blind DupACK triggering attacks against TCP
       9.2.1.  Blind throughput-reduction attack
       9.2.2.  Blind flooding attack
       9.2.3.  Difficulty in performing the attacks
       9.2.4.  Modifications to TCP's loss recovery algorithms
       9.2.5.  Countermeasures
     9.3.  TCP Explicit Congestion Notification (ECN)
   10. TCP API
     10.1. Passive opens and binding sockets
     10.2. Active opens and binding sockets
   11. Blind in-window attacks
     11.1. Blind TCP-based connection-reset attacks
       11.1.1. RST flag
       11.1.2. SYN flag
       11.1.3. Security/Compartment
       11.1.4. Precedence
       11.1.5. Illegal options
     11.2. Blind data-injection attacks
   12. Information leaking
     12.1. Remote Operating System detection via TCP/IP stack
           fingerprinting
       12.1.1. FIN probe
       12.1.2. Bogus flag test
       12.1.3. TCP ISN sampling
       12.1.4. TCP initial window
       12.1.5. RST sampling
       12.1.6. TCP options
       12.1.7. Retransmission Timeout (RTO) sampling
     12.2. System uptime detection
   13. Covert channels
   14. TCP Port scanning
     14.1. Traditional connect() scan
     14.2. SYN scan
     14.3. FIN, NULL, and XMAS scans
     14.4. Maimon scan
     14.5. Window scan
     14.6. ACK scan
   15. Processing of ICMP error messages by TCP
   16. TCP interaction with the Internet Protocol (IP)
     16.1. TCP-based traceroute
     16.2. Blind TCP data injection through fragmented IP traffic
     16.3. Broadcast and multicast IP addresses
   17. Security Considerations
   18. Acknowledgements
   19. References (to be translated to xml)
   20. References
     20.1. Normative References
     20.2. Informative References
   Appendix A.  TODO list
   Appendix B.  Change log (to be removed by the RFC Editor before
                publication of this document as an RFC)
     B.1.  Changes from draft-ietf-tcpm-tcp-security-02
     B.2.  Changes from draft-ietf-tcpm-tcp-security-01
   Author's Address

1.  Preface

1.1.  Introduction

   The TCP/IP protocol suite was conceived in an environment that was
   quite different from the hostile environment in which it currently
   operates.  However, the effectiveness of the protocols led to their
   early adoption in production environments, to the point that, to some
   extent, the current world's economy depends on them.

   While many textbooks and articles have created the myth that the
   Internet protocols were designed for warfare environments, the top
   level goal for the DARPA Internet Program was the sharing of large
   service machines on the ARPANET [Clark, 1988].  As a result, many
   protocol specifications focus only on the operational aspects of the
   protocols they specify, and overlook their security implications.

   While Internet technology has evolved since its early inception, the
   Internet's building blocks are basically the same core protocols
   adopted by the ARPANET more than two decades ago.  During the last
   twenty years, many vulnerabilities have been identified in the TCP/IP
   stacks of a number of systems.  Some of them were based on flaws in
   some protocol implementations, affecting only a reduced number of
   systems, while others were based on flaws in the protocols
   themselves, affecting virtually every existing implementation
   [Bellovin, 1989].  Even in the last couple of years, researchers were
   still working on security problems in the core protocols [NISCC,
   2004] [NISCC, 2005].

   The discovery of vulnerabilities in the TCP/IP protocol suite usually
   led to reports being published by a number of CSIRTs (Computer
   Security Incident Response Teams) and vendors, which helped to raise
   awareness about the threats and the best mitigations known at the
   time the reports were published.
   Unfortunately, this also led to the documentation of the discovered
   protocol vulnerabilities being spread among a large number of
   documents, which are sometimes difficult to identify.

   For some reason, much of the effort of the security community on the
   Internet protocols did not result in official documents (RFCs) being
   issued by the IETF (Internet Engineering Task Force).  This led to a
   situation in which "known" security problems have not always been
   addressed by all vendors.  In addition, in many cases vendors have
   implemented quick "fixes" to the identified vulnerabilities without a
   careful analysis of their effectiveness and their impact on
   interoperability [Silbersack, 2005].

   Producing a secure TCP/IP implementation nowadays is a very difficult
   task, in part because of the lack of a single document that serves as
   a security roadmap for the protocols.  Implementers are faced with
   the hard task of identifying relevant documentation and
   differentiating between that which provides correct advice, and that
   which provides misleading advice based on inaccurate or wrong
   assumptions.

   This document is the result of a security assessment of the IETF
   specifications of the Transmission Control Protocol (TCP).  Possible
   threats are identified and, where possible, countermeasures are
   described.  Additionally, many implementation flaws that have led to
   security vulnerabilities have been referenced in the hope that future
   implementations will not incur the same problems.

   This document is based on the "Security Assessment of the
   Transmission Control Protocol (TCP)" released by the UK Centre for
   the Protection of National Infrastructure (CPNI), available at:
   http://www.cpni.gov.uk/Products/technicalnotes/Feb-09-security-assessment-TCP.aspx .

1.2.  Scope of this document

   While there are a number of protocols that may affect the way TCP
   operates, this document focuses only on the specifications of the
   Transmission Control Protocol (TCP) itself.

   The mechanisms described in the following documents were selected for
   assessment as part of this work:

   o  RFC 793, "Transmission Control Protocol.  DARPA Internet Program.
      Protocol Specification" (91 pages)

   o  RFC 1122, "Requirements for Internet Hosts -- Communication
      Layers" (116 pages)

   o  RFC 1191, "Path MTU Discovery" (19 pages)

   o  RFC 1323, "TCP Extensions for High Performance" (37 pages)

   o  RFC 1948, "Defending Against Sequence Number Attacks" (6 pages)

   o  RFC 1981, "Path MTU Discovery for IP version 6" (15 pages)

   o  RFC 2018, "TCP Selective Acknowledgment Options" (12 pages)

   o  RFC 2385, "Protection of BGP Sessions via the TCP MD5 Signature
      Option" (6 pages)

   o  RFC 2581, "TCP Congestion Control" (14 pages)

   o  RFC 2675, "IPv6 Jumbograms" (9 pages)

   o  RFC 2883, "An Extension to the Selective Acknowledgement (SACK)
      Option for TCP" (17 pages)

   o  RFC 2884, "Performance Evaluation of Explicit Congestion
      Notification (ECN) in IP Networks" (18 pages)

   o  RFC 2988, "Computing TCP's Retransmission Timer" (8 pages)

   o  RFC 3168, "The Addition of Explicit Congestion Notification (ECN)
      to IP" (63 pages)

   o  RFC 3465, "TCP Congestion Control with Appropriate Byte Counting
      (ABC)" (10 pages)

   o  RFC 3517, "A Conservative Selective Acknowledgment (SACK)-based
      Loss Recovery Algorithm for TCP" (13 pages)

   o  RFC 3540, "Robust Explicit Congestion Notification (ECN) Signaling
      with Nonces" (13 pages)

   o  RFC 3782, "The NewReno Modification to TCP's Fast Recovery
      Algorithm" (19 pages)

1.3.  Organization of this document

   This document is organized in two parts.
   The first part contains a discussion of each of the TCP header
   fields, identifies their security implications, and discusses the
   possible countermeasures.  The second part contains an analysis of
   the security implications of the mechanisms and policies implemented
   by TCP, and of a number of implementation strategies used by popular
   TCP implementations.

2.  The Transmission Control Protocol

   The Transmission Control Protocol (TCP) is a connection-oriented
   transport protocol that provides a reliable byte-stream data transfer
   service.  Very few assumptions are made about the reliability of
   underlying data transfer services below the TCP layer.  Basically,
   TCP assumes it can obtain a simple, potentially unreliable datagram
   service from the lower level protocols.

   The core TCP specification, RFC 793 [RFC0793], dates back to 1981 and
   standardizes the basic mechanisms and policies of TCP.  RFC 1122
   [RFC1122] provides clarifications and errata for the original
   specification.  RFC 2581 [RFC5681] specifies TCP congestion control
   and avoidance mechanisms, not present in the original specification.
   Other documents specify extensions and improvements for TCP.

   The large number of documents that specify extensions, improvements,
   or modifications to existing TCP mechanisms has led the IETF to
   publish a roadmap for TCP, RFC 4614 [Duke et al, 2006], that
   clarifies the relevance of each of those documents.

3.  TCP header fields

   RFC 793 [RFC0793] defines the syntax of a TCP segment, along with the
   semantics of each of the header fields.

   The minimum TCP header size is 20 bytes, and corresponds to a TCP
   segment with no options and no data.  However, a TCP module might be
   handed an (illegitimate) "TCP segment" of less than 20 bytes.
   Therefore, before doing any processing of the TCP header fields, the
   following check should be performed by TCP on the segments handed by
   the internet layer:

      Segment.Size >= 20

   If a segment does not pass this check, it should be dropped.

   The following subsections contain further sanity checks that should
   be performed on TCP segments.

3.1.  Source Port and Destination Port

   The Source Port field contains a 16-bit number that identifies the
   TCP end-point that originated this TCP segment.  The TCP Destination
   Port contains a 16-bit number that identifies the destination TCP
   end-point of this segment.  In most of the discussion we refer to
   client-side (or "ephemeral") port numbers and server-side port
   numbers, since that distinction is what usually affects the
   interpretation of a port number.

   Most active attacks against ongoing TCP connections require the
   attacker to guess or know the four-tuple that identifies the
   connection.  As a result, randomization of the TCP ephemeral ports
   provides a (partial) mitigation against off-path attacks.  [RFC6056]
   provides guidance in this area.

   Some implementations have been known to crash when processing a TCP
   segment in which the source end-point (IP Source Address, TCP Source
   Port) is the same as the destination end-point (IP Destination
   Address, TCP Destination Port).
   [draft-gont-tcpm-tcp-mirrored-endpoints-00.txt] describes this issue
   in detail and provides advice in this area.

   While some systems restrict use of the port numbers in the range
   0-1023 to privileged users, applications should not grant any trust
   based on the port numbers used for a TCP connection.

      Not all systems require superuser privileges to bind port numbers
      in that range.  Besides, with desktop computers such "distinction"
      has generally become irrelevant.
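
   As a minimal sketch, the minimum-size check from Section 3 and the
   mirrored end-point condition described above could be combined as
   shown below.  The function name passes_basic_checks and its
   parameters are hypothetical, and dropping mirrored segments is an
   assumption made for illustration; this document only notes that such
   segments have crashed some implementations and defers to the
   referenced draft for advice.

```python
import ipaddress
import struct

MIN_TCP_HEADER = 20  # bytes: a TCP header with no options and no data


def passes_basic_checks(segment: bytes, src_ip: str, dst_ip: str) -> bool:
    """Return True if the segment may be processed, False if it should be dropped.

    Hypothetical helper: `segment` is the raw TCP segment handed up by the
    internet layer, and `src_ip`/`dst_ip` come from the enclosing IP header.
    """
    # Check from Section 3: Segment.Size >= 20
    if len(segment) < MIN_TCP_HEADER:
        return False

    # The first four bytes carry the 16-bit Source and Destination Ports,
    # in network byte order.
    src_port, dst_port = struct.unpack("!HH", segment[:4])

    # Assumed policy: drop segments whose source end-point equals the
    # destination end-point ("mirrored" end-points).
    same_address = ipaddress.ip_address(src_ip) == ipaddress.ip_address(dst_ip)
    if same_address and src_port == dst_port:
        return False

    return True
```

   Running such a function before any other header parsing lets later
   code rely on the 20 fixed header bytes being present.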

   Middle-boxes such as packet filters MUST NOT assume that clients use
   port numbers from only the Dynamic or Registered port ranges.

      Some clients, such as DNS resolvers, are known to use port numbers
      from the "Well Known Ports" range.

3.2.  Sequence number

   Predictable sequence numbers allow a variety of attacks against TCP,
   such as those described in Section 5.2 and Section 11 of this
   document.  This vulnerability was first described in [Morris1985],
   and its exploitation was widely publicized about 10 years later
   [Shimomura1995].

   In order to mitigate this vulnerability, some implementations set
   the TCP ISN to the output of a PRNG.  However, this has been known to
   cause interoperability problems.  [RFC6528] provides advice in this
   area.

   Another security consideration that should be made about TCP sequence
   numbers is that they might allow an attacker to count the number of
   systems behind a Network Address Translator (NAT) [Srisuresh and
   Egevang, 2001].  Depending on the ISN generators implemented by each
   of the systems behind the NAT, an attacker might be able to count the
   number of systems behind the NAT by establishing a number of TCP
   connections (using the public address of the NAT) and identifying
   the number of different sequence number "spaces".  [Gont and
   Srisuresh, 2008] provides a detailed discussion of the security
   implications of NATs and of the possible mitigations for this and
   other issues.

3.3.  Acknowledgement Number

   If the ACK bit is on, the Acknowledgement Number contains the value
   of the next sequence number the sender of this segment is expecting
   to receive.
   According to RFC 793, the Acknowledgement Number is considered valid
   as long as it does not acknowledge the receipt of data that has not
   yet been sent.

   However, as a result of recent concerns about forgery attacks against
   TCP (see Section 11 of this document), [RFC5961] proposes a stricter
   check on the Acknowledgement Number of segments that have the ACK bit
   set.  See [RFC5961] for more details.

   If the ACK bit is off, the Acknowledgement Number field is not valid.
   We recommend that TCP implementations set the Acknowledgement Number
   to zero when sending a TCP segment that does not have the ACK bit set
   (e.g., an initial SYN segment).  Some TCP implementations have been
   known to fail to set the Acknowledgement Number to zero, thus leaking
   information.

   TCP Acknowledgements are also used to perform heuristics for loss
   recovery and congestion control.  Section 9 of this document
   describes a number of ways in which these mechanisms can be
   exploited.

3.4.  Data Offset

   [draft-gont-tcpm-tcp-sanity-checks-00.txt] specifies a number of
   sanity checks that should be performed on the Data Offset field.

3.5.  Control bits

   The following subsections provide a discussion of the different
   control bits in the TCP header.  TCP segments with unusual
   combinations of flags set have been known in the past to cause some
   implementations to malfunction, sometimes to the extent of crashing
   [RFC1025] [RFC1379].  These packets are still commonly employed for
   the purpose of TCP/IP stack fingerprinting.  Section 12.1 contains a
   discussion of TCP/IP stack fingerprinting.

3.5.1.  Reserved (four bits)

   These four bits are reserved for future use, and must be zero.  As
   with virtually every field, the Reserved field could be used as a
   covert channel.
   While there exist intermediate devices such as protocol scrubbers
   that clear these bits, and firewalls that drop or reject segments
   with any of these bits set, the impact of these policies on TCP
   interoperability should be considered.  For example, as TCP continues
   to evolve, all or part of the bits in the Reserved field could be
   used to implement some new functionality.  If some middle-box or
   end-system implementation were to drop a TCP segment merely because
   some of these bits are not set to zero, interoperability problems
   would arise.

3.5.2.  CWR (Congestion Window Reduced)

   The CWR flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is used
   as part of the Explicit Congestion Notification (ECN) mechanism.  For
   connections in any of the synchronized states, this flag indicates,
   when set, that the TCP sending this segment has reduced its
   congestion window.

   An analysis of the security implications of ECN can be found in
   Section 9.3 of this document.

3.5.3.  ECE (ECN-Echo)

   The ECE flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is used
   as part of the Explicit Congestion Notification (ECN) mechanism.

   An analysis of the security implications of ECN can be found in
   Section 9.3 of this document.

3.5.4.  URG

   When the URG flag is set, the Urgent Pointer field contains the
   current value of the urgent pointer.

   In a number of implementations, receipt of an "urgent" indication
   generates a software interrupt (signal) that is delivered to the
   corresponding process.  In UNIX-like systems, this signal is SIGURG.

   A number of applications handle TCP urgent indications by installing
   a signal handler for the corresponding signal (e.g., SIGURG).
   As discussed in [Zalewski, 2001b], some signal handlers can be
   maliciously exploited by an attacker, for example to gain remote
   access to a system.  While secure programming of signal handlers is
   out of the scope of this document, we nevertheless raise awareness
   that TCP urgent indications might be exploited to abuse
   poorly-written signal handlers.

   Section 3.8 discusses the security implications of the TCP urgent
   mechanism.

3.5.5.  ACK

   When the ACK bit is one, the Acknowledgement Number field contains
   the next sequence number expected, cumulatively acknowledging the
   receipt of all data up to the sequence number in the Acknowledgement
   Number, minus one.  Section 3.3 of this document describes sanity
   checks that should be performed on the Acknowledgement Number field.

   TCP Acknowledgements are also used to perform heuristics for loss
   recovery and congestion control.  Section 9 of this document
   describes a number of ways in which these mechanisms can be
   exploited.

3.5.6.  PSH

   [draft-gont-tcpm-tcp-push-semantics-00.txt] describes a number of
   security issues that may arise as a result of the PUSH semantics, and
   proposes a number of ways to mitigate these issues.

3.5.7.  RST

   The RST bit is used to request the abortion (abnormal close) of a TCP
   connection.  RFC 793 [RFC0793] suggests that an RST segment should be
   considered valid if its Sequence Number is valid (i.e., falls within
   the receive window).  However, in response to the security concerns
   raised by [Watson, 2004] and [NISCC, 2004], [RFC5961] proposed
   stricter validity checks.  Please see [RFC5961] for additional
   details.

   Section 11.1 of this document describes TCP-based connection-reset
   attacks, along with a number of countermeasures to mitigate their
   impact.

3.5.8.  SYN

   The SYN bit is used during the connection-establishment phase, to
   request the synchronization of sequence numbers.

   There are basically four different vulnerabilities that make use of
   the SYN bit: SYN-flooding attacks, connection forgery attacks,
   connection flooding attacks, and connection-reset attacks.  They are
   described in Section 5.1, Section 5.2, Section 5.3, and
   Section 11.1.2, respectively, along with the possible
   countermeasures.

3.5.9.  FIN

   The FIN flag is used to signal to the remote end-point the end of the
   data transfer in this direction.  Receipt of a valid FIN segment
   (i.e., a TCP segment with the FIN flag set) causes a transition in
   the connection state, as part of what is usually referred to as the
   "connection-termination phase".

   The connection-termination phase can be exploited to perform a number
   of resource-exhaustion attacks.  Section 6 of this document describes
   a number of attacks that exploit the connection-termination phase,
   along with the possible countermeasures.

3.6.  Window

   The TCP Window field advertises how many bytes of data the remote
   peer is allowed to send before a new advertisement is made.
   Theoretically, the maximum transfer rate that can be achieved by TCP
   is limited to:

      Maximum Transfer Rate = Window / RTT

   This means that, under ideal network conditions (e.g., no packet
   loss), the TCP Window in use should be at least:

      Window = 2 * Bandwidth * Delay

   (where Delay is the one-way delay, so that RTT = 2 * Delay).

   Using a larger Window than that resulting from the previous equation
   will not provide any improvements in terms of performance.

   In practice, selection of the most convenient Window size may also
   depend on a number of other parameters, such as: packet loss rate,
   loss recovery mechanisms in use, etc.

   An aspect of the TCP Window that is usually overlooked is the
   security implications of its size.
   Increasing the TCP window increases the sequence number space that
   will be considered "valid" for incoming segments.  Thus, use of
   unnecessarily large TCP Window sizes needlessly increases TCP's
   vulnerability to forgery attacks.

   In those scenarios in which the network conditions are known and/or
   can be easily predicted, it is recommended that the TCP Window never
   be set to a value larger than that resulting from the equations
   above.  Additionally, the nature of the application running on top of
   TCP should be considered when tuning the TCP window.  As an example,
   an H.245 signaling application certainly does not have high
   requirements on throughput, and thus a window size of around
   4 KBytes will usually fulfill its needs, while keeping TCP's
   resistance to off-path forgery attacks at a decent level.  Some rough
   measurements seem to indicate that a TCP window of 4 KBytes is common
   practice for TCP connections servicing applications such as BGP.

   In principle, a possible approach to avoid requiring administrators
   to manually set the TCP window would be to implement an automatic
   buffer tuning mechanism, such as that described in [Heffner, 2002].
   However, as discussed in Section 7.3.2 of this document, these
   mechanisms can be exploited to perform other types of attacks.

3.6.1.  Security implications arising from closed windows

   When a TCP end-point is not willing to receive any more data (before
   some of the data that have already been received are consumed), it
   will advertise a TCP window of zero bytes.  This will effectively
   stop the sender from sending any new data to the TCP receiver.
   Transmission of new data will resume when the TCP receiver advertises
   a nonzero TCP window, usually with a TCP segment that contains no
   data ("an ACK").

      This segment is usually referred to as a "window update", as its
      only purpose is to update the TCP sender regarding the new
      window.

   To accommodate those scenarios in which the ACK segment that "opens"
   the window is lost, TCP implements a "persist timer" that causes the
   TCP sender to query the TCP receiver periodically if the last segment
   received advertised a window of zero bytes.  This probe simply
   consists of sending one byte of new data that will force the TCP
   receiver to send an ACK segment back to the TCP sender, containing
   the current TCP window.  As with the retransmission timer, an
   exponential back-off is used when calculating the persist timer, so
   that the spacing between probes increases exponentially.

   A fundamental difference between the "persist timer" and the
   retransmission timer is that there is no limit on the amount of time
   during which a TCP can advertise a zero window.  This means that a
   TCP end-point could potentially advertise a zero window forever, thus
   keeping kernel memory at the TCP sender tied to the TCP
   retransmission buffer.  This could clearly be exploited as a vector
   for performing a Denial of Service (DoS) attack against TCP.

   Section 7.1 of this document describes a Denial of Service attack
   that aims at exhausting the kernel memory used for the TCP
   retransmission buffer, along with possible countermeasures.

3.7.  Checksum

   While in principle there should not be security implications arising
   from the Checksum field, due to non-RFC-compliant implementations,
   the Checksum can be exploited to detect firewalls, evade network
   intrusion detection systems (NIDS), and/or perform Denial of Service
   attacks.
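
   As a rough illustration of the check that the attacks in this
   section rely on being skipped, the following Python sketch verifies
   a TCP checksum over IPv4 using the standard pseudo-header.  The
   function names are hypothetical and chosen only for this example;
   the sketch assumes IPv4 addresses in dotted-quad notation and a
   complete TCP segment (header plus data) as a byte string.

```python
import socket
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP


def ones_complement_sum(data: bytes) -> int:
    """Fold data into a 16-bit ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: addresses, zero byte, protocol, TCP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, TCP_PROTOCOL, tcp_length))


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for a segment whose Checksum field is currently zero."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ~ones_complement_sum(data) & 0xFFFF


def checksum_is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """With the Checksum field in place, the folded sum must be 0xFFFF."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

   A middle-box that creates state, or a NIDS that reassembles a
   stream, would apply a check of this kind before acting on a segment,
   so that it discards the same segments the end-system would discard.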

   If a stateful firewall does not check the TCP Checksum in the
   segments it processes, an attacker can exploit this situation to
   perform a variety of attacks.  For example, he could send a flood of
   TCP segments with invalid checksums, which would nevertheless create
   state information at the firewall.  When each of these segments is
   received at its intended destination, the TCP checksum will be found
   to be incorrect, and the corresponding segment will be silently
   discarded.  As these segments will not elicit a response (e.g., an
   RST segment) from the intended recipients, the corresponding
   connection state entries at the firewall will not be removed.
   Therefore, an attacker may end up tying all the state resources of
   the firewall to TCP connections that will never complete or be
   terminated, probably leading to a Denial of Service to legitimate
   users, or forcing the firewall to randomly drop connection state
   entries.

   If a NIDS does not check the Checksum of TCP segments, an attacker
   may send TCP segments with an invalid checksum to cause the NIDS to
   obtain a TCP data stream different from that obtained by the system
   being monitored.  In order to "confuse" the NIDS, the attacker would
   send TCP segments with an invalid Checksum and a Sequence Number that
   would overlap the sequence number space being used for his malicious
   activity.  FTester [Barisani, 2006] is a tool that can be used to
   assess NIDS on this issue.

   Finally, an attacker performing port-scanning could potentially
   exploit intermediate systems that do not check the TCP Checksum to
   detect whether a given TCP port is being filtered by an intermediate
   firewall, or the port is actually closed by the host being
   port-scanned.  If a given TCP port appeared to be closed, the
   attacker would then send a SYN segment with an invalid Checksum.
   If this segment elicited a response (either an ICMP error message or
   a TCP RST segment), then that response would have to come from a
   system that does not check the TCP checksum.  Since normal host
   implementations of the TCP protocol do check the TCP checksum, such a
   response would most likely come from a firewall or some other
   middle-box.

   [Ed3f, 2002] describes the exploitation of the TCP checksum for
   performing the above activities.  [US-CERT, 2005d] provides an
   example of a TCP implementation that failed to check the TCP
   checksum.

3.8.  Urgent pointer

   Some implementations have been found to be unable to process TCP
   urgent indications correctly.  [Myst, 1997] originally described how
   TCP urgent indications could be exploited to perform a Denial of
   Service (DoS) attack against some TCP/IP implementations, usually
   leading to a system crash.

   [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes a number of
   sanity checks to be enforced on TCP segments regarding urgent
   indications.  [RFC6093] deprecates the use of urgent indications in
   new applications.

3.9.  Options

   [IANA, 2007] contains the official list of the assigned option
   numbers.  TCP Options have been specified in the past both within the
   IETF and by other groups.  [Hnes, 2007] contains an unofficial
   updated version of the IANA list of assigned option numbers.  The
   following table contains a summary of the assigned TCP option
   numbers, which is based on [Hnes, 2007].

   +--------+----------------------+-----------------------------------+
   | Kind   | Meaning              | Summary                           |
   +--------+----------------------+-----------------------------------+
   | 0      | End of Option List   | Discussed in Section 4.1          |
   +--------+----------------------+-----------------------------------+
   | 1      | No-Operation         | Discussed in Section 4.2          |
   +--------+----------------------+-----------------------------------+
   | 2      | Maximum Segment Size | Discussed in Section 4.3          |
   +--------+----------------------+-----------------------------------+
   | 3      | WSOPT - Window Scale | Discussed in Section 4.6          |
   +--------+----------------------+-----------------------------------+
   | 4      | SACK Permitted       | Discussed in Section 4.4.1        |
   +--------+----------------------+-----------------------------------+
   | 5      | SACK                 | Discussed in Section 4.4.2        |
   +--------+----------------------+-----------------------------------+
   | 6      | Echo (obsoleted by   | Obsolete. Specified in RFC 1072   |
   |        | option 8)            | [Jacobson and Braden, 1988]       |
   +--------+----------------------+-----------------------------------+
   | 7      | Echo Reply           | Obsolete. Specified in RFC 1072   |
   |        | (obsoleted by option | [Jacobson and Braden, 1988]       |
   |        | 8)                   |                                   |
   +--------+----------------------+-----------------------------------+
   | 8      | TSOPT - Time Stamp   | Discussed in Section 4.7          |
   |        | Option               |                                   |
   +--------+----------------------+-----------------------------------+
   | 9      | Partial Order        | Historic. Specified in RFC 1693   |
   |        | Connection Permitted | [Connolly et al, 1994]            |
   +--------+----------------------+-----------------------------------+
   | 10     | Partial Order        | Historic. Specified in RFC 1693   |
   |        | Service Profile      | [Connolly et al, 1994]            |
   +--------+----------------------+-----------------------------------+
   | 11     | CC                   | Historic. Specified in RFC 1644   |
   |        |                      | [Braden, 1994]                    |
   +--------+----------------------+-----------------------------------+
   | 12     | CC.NEW               | Historic. Specified in RFC 1644   |
   |        |                      | [Braden, 1994]                    |
   +--------+----------------------+-----------------------------------+
   | 13     | CC.ECHO              | Historic. Specified in RFC 1644   |
   |        |                      | [Braden, 1994]                    |
   +--------+----------------------+-----------------------------------+
   | 14     | TCP Alternate        | Historic. Specified in RFC 1146   |
   |        | Checksum Request     | [Zweig and Partridge, 1990]       |
   +--------+----------------------+-----------------------------------+
   | 15     | TCP Alternate        | Historic. Specified in RFC 1146   |
   |        | Checksum Data        | [Zweig and Partridge, 1990]       |
   +--------+----------------------+-----------------------------------+
   | 16     | Skeeter              | Historic                          |
   +--------+----------------------+-----------------------------------+
   | 17     | Bubba                | Historic                          |
   +--------+----------------------+-----------------------------------+
   | 18     | Trailer Checksum     | Historic                          |
   |        | Option               |                                   |
   +--------+----------------------+-----------------------------------+
   | 19     | MD5 Signature Option | Discussed in Section 4.5          |
   +--------+----------------------+-----------------------------------+
   | 20     | SCPS Capabilities    | Specified in [CCSDS, 2006]        |
   +--------+----------------------+-----------------------------------+
   | 21     | Selective Negative   | Specified in [CCSDS, 2006]        |
   |        | Acknowledgements     |                                   |
   +--------+----------------------+-----------------------------------+
   | 22     | Record Boundaries    | Specified in [CCSDS, 2006]        |
   +--------+----------------------+-----------------------------------+
   | 23     | Corruption           | Specified in [CCSDS, 2006]        |
   |        | experienced          |                                   |
   +--------+----------------------+-----------------------------------+
   | 24     | SNAP                 | Historic                          |
   +--------+----------------------+-----------------------------------+
   | 25     | Unassigned (released | Unassigned                        |
   |        | 2000-12-18)          |                                   |
   +--------+----------------------+-----------------------------------+
   | 26     | TCP Compression      | Historic                          |
   |        | Filter               |                                   |
   +--------+----------------------+-----------------------------------+
   | 27     | Quick-Start Response | Specified in RFC 4782 [Floyd et   |
   |        |                      | al, 2007]                         |
   +--------+----------------------+-----------------------------------+
   | 28-252 | Unassigned           | Unassigned                        |
   +--------+----------------------+-----------------------------------+
   | 253    | RFC3692-style        | Described by RFC 4727 [Fenner,    |
   |        | Experiment 1         | 2006]                             |
   +--------+----------------------+-----------------------------------+
   | 254    | RFC3692-style        | Described by RFC 4727 [Fenner,    |
   |        | Experiment 2         | 2006]                             |
   +--------+----------------------+-----------------------------------+

                          Table 1: TCP Options

   There are two cases for the format of a TCP option:

   o  Case 1: A single byte of option-kind.

   o  Case 2: An option-kind byte, followed by an option-length byte,
      and the actual option-data bytes.

   In Case 2 options, the option-length byte counts the option-kind
   byte and the option-length byte, as well as the actual option-data
   bytes.

   All options except "End of Option List" (Kind = 0) and "No Operation"
   (Kind = 1) are of "Case 2".

   [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes a number of
   sanity checks that should be performed on TCP options.

   Section 4 discusses the security implications of common TCP options.

3.10.  Padding

   The TCP header padding is used to ensure that the TCP header ends and
   data begins on a 32-bit boundary.  The padding is composed of zeros.

3.11.  Data

   The data field contains the upper-layer packet being transmitted by
   means of TCP.
   This payload is processed by the application process making use of
   the transport services of TCP.  Therefore, the security implications
   of this field are out of the scope of this document.

4.  Common TCP Options

4.1.  End of Option List (Kind = 0)

   This option indicates the "End of Options".  As noted in
   [draft-gont-tcpm-tcp-sanity-checks-00.txt], some implementations pad
   the end of options with "No Operation" options rather than including
   an "End of Option List" option.

4.2.  No Operation (Kind = 1)

   The no-operation option is basically used to allow the sending system
   to align subsequent options on, for example, 32-bit boundaries.

   This option does not have any known security implications.

4.3.  Maximum Segment Size (Kind = 2)

   The Maximum Segment Size (MSS) option is used to indicate to the
   remote TCP endpoint the maximum segment size this TCP is willing to
   receive.

   The MSS option has been employed for performing DoS attacks, by
   advertising very small MSS values and thus greatly increasing the
   packet rate used by the victim system.
   [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes this issue, and
   proposes sanity checks to mitigate it.

4.4.  Selective Acknowledgement Option

   The Selective Acknowledgement option provides an extension to allow
   the acknowledgement of individual segments, to enhance TCP's loss
   recovery.

   Two options are involved in the SACK mechanism.  The "SACK-permitted
   option" is sent during the connection-establishment phase, to
   advertise that SACK is supported.  If both TCP peers agree to use
   selective acknowledgements, the actual selective acknowledgements are
   sent, if needed, by means of "SACK options".

4.4.1.  SACK-permitted Option (Kind = 4)

   [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes sanity checks
   to be performed on this option.

4.4.2.  SACK Option (Kind = 5)

   The TCP receiving a SACK option is expected to keep track of the
   selectively-acknowledged blocks.  Even though space in the TCP header
   is limited (and thus each TCP segment can selectively-acknowledge at
   most four blocks of data), an attacker could try to perform a buffer
   overflow or a resource-exhaustion attack by sending a large number of
   SACK options.

   For example, an attacker could send a large number of SACK options,
   each of them acknowledging one byte of data.  Additionally, for the
   purpose of wasting resources on the attacked system, each of these
   blocks would be separated from each other by one byte, to prevent the
   attacked system from coalescing two (or more) contiguous SACK blocks
   into a single SACK block.  If the attacked system kept track of each
   SACKed block by storing both the Left Edge and the Right Edge of the
   block, then for each window of data, the attacker could waste up to
   4 * Window bytes of memory at the attacked TCP.

      The value "4 * Window" results from the expression
      "(Window / 2) * 8", in which the value "2" accounts for the
      1-byte block selectively-acknowledged by each SACK block and the
      1 byte that would be used to separate SACK blocks from each
      other, and the value "8" accounts for the 8 bytes needed to store
      the Left Edge and the Right Edge of each SACKed block.

   [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes sanity checks to
   be performed on this option such that this and other possible issues
   are mitigated.

4.5.  MD5 Option (Kind = 19)

   The TCP MD5 option provides a mechanism for authenticating TCP
   segments with a 16-byte digest produced by the MD5 algorithm.  The
   option consists of an option-kind byte (which must be 19), an
   option-length byte (which must be 18), and a 16-byte MD5 digest.
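
   The MD5 option layout above, together with the two general option
   formats of Section 3.9, lends itself to strict length checks while
   parsing.  The following Python sketch walks an options field and
   rejects malformed lengths.  It is only an illustration: the
   function and table names are hypothetical, and the specific set of
   checks is an assumption of this example rather than the checks
   specified in [draft-gont-tcpm-tcp-sanity-checks-00.txt].

```python
EOL, NOP, MD5_SIGNATURE = 0, 1, 19

# Options whose option-length value is fixed by their specification.
FIXED_LENGTHS = {MD5_SIGNATURE: 18}


def parse_options(raw: bytes):
    """Return a list of (kind, data) pairs, or raise ValueError."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == EOL:          # Case 1: single byte, ends the list
            break
        if kind == NOP:          # Case 1: single byte, skip it
            i += 1
            continue
        if i + 1 >= len(raw):    # Case 2 needs an option-length byte
            raise ValueError("missing option-length byte")
        length = raw[i + 1]
        # option-length counts the kind and length bytes themselves,
        # so any value below 2 is invalid (and would stall the loop).
        if length < 2 or i + length > len(raw):
            raise ValueError("invalid option-length")
        if kind in FIXED_LENGTHS and length != FIXED_LENGTHS[kind]:
            raise ValueError("unexpected length for option %d" % kind)
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options
```

   For instance, an options field made of two NOPs followed by a
   well-formed MD5 option yields a single (19, digest) pair, while an
   MD5 option that claims a length of 16 is rejected.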

   A basic weakness of the TCP MD5 option is that the MD5 algorithm
   itself has been known (for a long time) to be vulnerable to collision
   search attacks.

   [Bellovin, 2006] argues that it has two other weaknesses, namely that
   it does not provide a key identifier, and that it has no provision
   for automated key management.  However, it is generally accepted that
   while a Key-ID field can be a good approach for providing smooth key
   rollover, it is not actually a requirement.  For instance, most
   systems implementing the TCP MD5 option include a "keychain"
   mechanism that fully supports smooth key rollover.  Additionally,
   with some further work, ISAKMP/IKE could be used to configure the MD5
   keys.

   It is interesting to note that while the TCP MD5 option, as specified
   by RFC 2385 [Heffernan, 1998], addresses the TCP-based forgery
   attacks discussed in Section 11, it does not address the ICMP-based
   connection-reset attacks discussed in Section 15.  As a result, while
   a TCP connection may be protected from TCP-based forgery attacks by
   means of the MD5 option, an attacker might still be able to
   successfully perform the ICMP-based counterpart.

   The TCP MD5 option has been obsoleted by the TCP Authentication
   Option (TCP-AO).

4.6.  Window scale option (Kind = 3)

   The window scale option provides a mechanism to expand the definition
   of the TCP window to 32 bits, such that the performance of TCP can be
   improved in some network scenarios.  The window scale option consists
   of an option-kind byte (which must be 3), followed by an
   option-length byte (which must be 3), and a shift count (shift.cnt)
   byte (the actual option-data).

   While there are no known security implications arising from the
   window scale mechanism itself, the size of the TCP window has a
   number of security implications.
   In general, larger window sizes increase the chances of an attacker
   successfully performing forgery attacks against TCP, such as those
   described in Section 11 of this document.  Additionally, large
   windows can exacerbate the impact of resource exhaustion attacks such
   as those described in Section 7 of this document.

   Section 3.6 provides a general discussion of the security
   implications of the TCP window size.  Section 7.3.2 discusses the
   security implications of automatic receive-buffer tuning mechanisms.

4.7.  Timestamps option (Kind = 8)

   The Timestamps option, specified in RFC 1323 [Jacobson et al, 1992],
   is used to perform two functions: Round-Trip Time Measurement (RTTM),
   and Protection Against Wrapped Sequence Numbers (PAWS).

4.7.1.  Generation of timestamps

   For the purpose of PAWS, the timestamps sent on a connection are
   required to be monotonically increasing.  While there is no
   requirement that timestamps be monotonically increasing across TCP
   connections, the generation of timestamps such that they are
   monotonically increasing across connections between the same two
   endpoints allows the use of timestamps for improving the handling of
   SYN segments that are received while the corresponding four-tuple is
   in the TIME-WAIT state.  This is discussed in Section 11.1.2 of this
   document.

   Some implementations are known to initialize their global timestamp
   clock to zero when the system is bootstrapped.  This is undesirable,
   as the timestamp clock would disclose the system uptime.
   [I-D.gont-timestamps-generation] discusses the generation of TCP
   timestamps in detail.

4.7.2.  Vulnerabilities

   Blind In-Window Attacks

   Segments that contain a timestamp option smaller than the last
   timestamp option recorded by TCP are silently dropped.
   This allows for a subtle attack against TCP that would allow an
   attacker to cause one direction of data transfer of the attacked
   connection to freeze [US-CERT, 2005c].  An attacker could forge a TCP
   segment that contains a timestamp that is much larger than the last
   timestamp recorded for that direction of the data transfer of the
   connection.  The offending segment would cause the recorded timestamp
   (TS.Recent) to be updated and, as a result, subsequent segments sent
   by the impersonated TCP peer would be simply dropped by the receiving
   TCP.  This vulnerability has been documented in [US-CERT, 2005d].
   However, it is worth noting that exploitation of this vulnerability
   requires an attacker to guess (or know) the four-tuple {IP Source
   Address, IP Destination Address, TCP Source Port, TCP Destination
   Port}, as well as a valid Sequence Number and a valid Acknowledgement
   Number.  If an attacker has such detailed knowledge about a TCP
   connection, unless TCP segments are protected by proper
   authentication mechanisms (such as IPsec [Kent and Seo, 2005]), he
   can perform a variety of attacks against the TCP connection, even
   more devastating than the one just described.

   Information leaking

   Some implementations are known to maintain a global timestamp clock,
   which is used for all connections.  This is undesirable, as an
   attacker that can establish a connection with a host would learn the
   timestamp used for all the other connections maintained by that host,
   which could be useful for performing any attacks that require the
   attacker to forge TCP segments.  A timestamps generator such as the
   one recommended in Section 4.7.1 of this document would prevent this
   information leakage, as it separates the "timestamps space" among the
   different TCP connections.
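
   The blind in-window attack described earlier in this section can be
   illustrated with a small Python sketch of the PAWS check.  This is a
   simplified model under stated assumptions: the class and function
   names are hypothetical, sequence-number validation is omitted (the
   attacker is assumed to have already guessed acceptable Sequence and
   Acknowledgement Numbers), and TS.Recent is updated on every accepted
   segment.

```python
def ts_before(a: int, b: int) -> bool:
    """True if timestamp a is older than b in 32-bit modular terms."""
    return ((a - b) & 0xFFFFFFFF) > 0x7FFFFFFF


class PawsReceiver:
    """Tracks TS.Recent for one direction of a connection."""

    def __init__(self, ts_recent: int):
        self.ts_recent = ts_recent

    def accept(self, tsval: int) -> bool:
        # Segments carrying a timestamp older than TS.Recent are
        # silently dropped.
        if ts_before(tsval, self.ts_recent):
            return False
        # Otherwise the segment is accepted and TS.Recent advances.
        self.ts_recent = tsval
        return True


receiver = PawsReceiver(ts_recent=1000)
legit_before = receiver.accept(1001)            # peer's real segment
forged = receiver.accept(1001 + 0x70000000)     # attacker's segment
legit_after = receiver.accept(1002)             # peer's next segment
```

   In this model the first legitimate segment and the forged segment
   are both accepted, while the peer's next segment is dropped because
   its timestamp now appears older than TS.Recent.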

   Some implementations are known to initialize their global timestamp
   clock to zero when the system is bootstrapped.  This is undesirable,
   as the timestamp clock would disclose the system uptime.  A
   timestamps generator such as the one recommended in Section 4.7.1 of
   this document would prevent this information leakage, as the function
   F() introduces an "offset" that does not disclose the system uptime.

   As discussed in Section 3.2 of RFC 1323 [Jacobson et al, 1992], the
   Timestamp Echo Reply field (TSecr) is only valid if the ACK bit of
   the TCP header is set, and its value must be zero when it is not
   valid.  However, some TCP implementations have been found to fail to
   set the Timestamp Echo Reply field (TSecr) to zero in TCP segments
   that do not have the ACK bit set, thus potentially leaking
   information.  We stress that TCP implementations should comply with
   RFC 1323 by setting the Timestamp Echo Reply field (TSecr) to zero in
   those TCP segments that do not have the ACK bit set, thus eliminating
   this potential information leakage.

   Finally, it should be noted that the Timestamps option can be
   exploited to count the number of systems behind NATs (Network Address
   Translators) [Srisuresh and Egevang, 2001].  An attacker could count
   the number of systems behind a NAT by establishing a number of TCP
   connections (using the public address of the NAT) and identifying the
   number of different timestamp sequences.  This information leakage
   could be eliminated by rewriting the contents of the Timestamps
   option at the NAT.  [Gont and Srisuresh, 2008] provides a detailed
   discussion of the security implications of NATs, and proposes
   mitigations for this and other issues.

5.  Connection-establishment mechanism

   The following subsections describe a number of attacks that can be
   performed against TCP by exploiting its connection-establishment
   mechanism.

5.1.  SYN flood

   TCP uses a mechanism known as the "three-way handshake" for the
   establishment of a connection between two TCP peers.  RFC 793
   [RFC0793] states that when a TCP that is in the LISTEN state receives
   a SYN segment (i.e., a TCP segment with the SYN flag set), it must
   transition to the SYN-RECEIVED state, record the control information
   (e.g., the ISN) contained in the SYN segment in a Transmission
   Control Block (TCB), and respond with a SYN/ACK segment.

      A Transmission Control Block is the data structure used to store
      (usually within the kernel) all the information relevant to a TCP
      connection.  The concept of "TCB" is introduced in the core TCP
      specification RFC 793 [RFC0793].

   In practice, virtually all existing implementations do not modify the
   state of the TCP that was in the LISTEN state, but rather create a
   new TCP (i.e., a new "protocol machine"), and perform all the state
   transitions on this newly-created TCP.  This allows the application
   running on top of TCP to service more than one client at the same
   time.  As a result, each connection request results in the allocation
   of system memory to store the TCB associated with the newly created
   TCP.

      If TCP were implemented strictly as described in RFC 793, the
      application running on top of TCP would have to finish servicing
      the current client before being able to service the next one in
      line, or should instead be able to perform some kind of connection
      hand-off.

   An attacker could exploit TCP's connection-establishment mechanism to
   perform a Denial of Service (DoS) attack, by sending a large number
   of connection requests to the target system, with the intent of
   exhausting the system memory destined for storing TCBs (or related
   kernel data structures), thus preventing the attacked system from
   establishing new connections with legitimate users.  This attack is
   widely known as "SYN flood", and received a lot of attention during
   the late 1990s [CERT, 1996].

   Given that the attacker does not need to complete the three-way
   handshake for the attacked system to tie system resources to the
   newly created TCBs, he will typically forge the source IP address of
   the malicious SYN segments he sends, thus concealing his own IP
   address.

   If the forged IP addresses corresponded to some reachable system, the
   impersonated system would receive the SYN/ACK segment sent by the
   attacked host (in response to the forged SYN segment), which would
   elicit an RST segment.  This RST segment would be delivered to the
   attacked system, causing the corresponding connection to be aborted,
   and the corresponding TCB to be removed.

      As the impersonated host would not have any state information for
      the TCP connection being referred to by the SYN/ACK segment, it
      would respond with an RST segment, as specified by the TCP segment
      processing rules of RFC 793 [RFC0793].

   However, if the forged IP source addresses were unreachable, the
   attacked TCP would continue retransmitting the SYN/ACK segment
   corresponding to each connection request, until timing out and
   aborting the connection.  For this reason, a number of widely
   available attack tools first check whether each of the (forged) IP
   addresses is reachable by sending an ICMP echo request to them.
   The receipt of an ICMP echo response is considered an indication of
   the IP address being reachable (and thus results in the corresponding
   IP address not being used for performing the attack), while the
   receipt of an ICMP unreachable error message is considered an
   indication of the IP address being unreachable (and thus results in
   the corresponding IP address being used for performing the attack).

      [Gont, 2008b] describes how the so-called ICMP soft errors could
      be used by TCP to abort connections in any of the non-synchronized
      states.  While implementation of the mechanism described in that
      document would certainly not eliminate the vulnerability of TCP to
      SYN flood attacks (as the attacker could use addresses that are
      simply "black-holed"), it provides an example of how signaling
      information such as that provided by means of ICMP error messages
      can provide valuable information that a transport protocol could
      use to perform heuristics.

   In order to mitigate the impact of this attack, the amount of
   information stored for non-established connections should be reduced
   (ideally, non-synchronized connections should not require any state
   information to be maintained at the TCP performing the passive OPEN).
   There are basically two mitigation techniques for this vulnerability:
   a syn-cache and syn-cookies.

      [Borman, 1997] and RFC 4987 [Eddy, 2007] contain a general
      discussion of SYN-flooding attacks and common mitigation
      approaches.

   The syn-cache [Lemon, 2002] approach aims at reducing the amount of
   state information that is maintained for connections in the
   SYN-RECEIVED state, and allocates a full TCB only after the
   connection has transitioned to the ESTABLISHED state.
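
   The syn-cache idea can be sketched in a few lines of Python.  This
   is a minimal illustration and not the design of [Lemon, 2002]: the
   class name, the fields kept per entry, the single global limit, and
   the oldest-first eviction policy are all assumptions made for the
   example.

```python
from collections import OrderedDict

SEQ_MASK = 0xFFFFFFFF


class SynCache:
    """Bounded table of small entries for half-open connections."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # four-tuple -> compact state

    def __len__(self):
        return len(self._entries)

    def on_syn(self, four_tuple, client_isn, server_isn, mss):
        """Record only what is needed to finish the handshake."""
        if four_tuple in self._entries:
            return  # retransmitted SYN: keep the existing entry
        if len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
        self._entries[four_tuple] = (client_isn, server_isn, mss)

    def on_ack(self, four_tuple, seq, ack):
        """Return a full TCB (here, a dict) if the handshake completes."""
        entry = self._entries.get(four_tuple)
        if entry is None:
            return None
        client_isn, server_isn, mss = entry
        if seq != (client_isn + 1) & SEQ_MASK:
            return None
        if ack != (server_isn + 1) & SEQ_MASK:
            return None
        del self._entries[four_tuple]
        # Only now is the (much larger) per-connection state allocated.
        return {"four_tuple": four_tuple, "rcv_nxt": seq,
                "snd_nxt": ack, "mss": mss}
```

   With a fixed bound on the number of entries, a flood of SYN segments
   can displace pending handshakes but cannot grow the memory used for
   connections in the SYN-RECEIVED state beyond that bound.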
1157 The syn-cookie [Bernstein, 1996] approach aims at completely 1158 eliminating the need to maintain state information at the TCP 1159 performing the passive OPEN, by encoding the most elementary 1160 information required to complete the three-way handshake in the 1161 Sequence Number of the SYN/ACK segment that is sent in response to 1162 the received SYN segment. Thus, TCP is relieved from keeping state 1163 for connections in the SYN-RECEIVED state. 1165 The syn-cookie approach has a number of drawbacks: 1167 o Firstly, given the limited space in the Sequence Number field, it 1168 is not possible to encode all the information included in the 1169 initial segment, such as, for example, support of Selective 1170 Acknowledgements (SACK). 1172 o Secondly, in the event that the Acknowledgement segment sent in 1173 response to the SYN/ACK sent by the TCP that performed the passive 1174 OPEN (i.e., the TCP server) were lost, the connection would end up 1175 in the ESTABLISHED state on the client-side, but in the CLOSED 1176 state on the server side. This scenario is normally handled in 1177 TCP by having the TCP server retransmit its SYN/ACK. However, if 1178 syn-cookies are enabled, there would be no connection state 1179 information on the server side, and thus the SYN/ACK would never 1180 be retransmitted. This could lead to a scenario in which the 1181 connection could remain in the ESTABLISHED state on the client 1182 side, but in the CLOSED state at the server side, indefinitely. 1183 If the application protocol was such that it required the client 1184 to wait for some data from the server (e.g., a greeting message) 1185 before sending any data to the server, a deadlock would take 1186 place, with the client application waiting for such server data, 1187 and the server waiting for the TCP three-way handshake to 1188 complete. 
1190 o Thirdly, unless the function used to encode information in the 1191 SYN/ACK packet is cryptographically strong, an attacker could 1192 forge TCP connections in the ESTABLISHED state by forging ACK 1193 segments that would be considered "legitimate" by the receiving 1194 TCP. 1196 o Fourthly, in those scenarios in which establishment of new 1197 connections is blocked by simply dropping segments with the SYN 1198 bit set, use of SYN cookies could allow an attacker to bypass the 1199 firewall rules, as a connection could be established by forging an 1200 ACK segment with the correct values, without the need to set 1201 the SYN bit. 1203 As a result, syn-cookies are usually not employed as a first line of 1204 defense against SYN-flood attacks, but only as a last resort to 1205 cope with them. For example, some TCP implementations enable syn- 1206 cookies only after a certain number of TCBs has been allocated for 1207 connections in the SYN-RECEIVED state. We recommend this 1208 implementation technique, with a syn-cache enabled by default, and 1209 use of syn-cookies triggered, for example, when the limit of TCBs for 1210 non-synchronized connections with a given port number has been 1211 reached. 1213 It is interesting to note that a SYN-flood attack should only affect 1214 the establishment of new connections. A number of books and online 1215 documents seem to assume that TCP will not be able to respond to any 1216 TCP segment that is meant for a TCP port that is being SYN-flooded 1217 (e.g., respond with an RST segment upon receipt of a TCP segment that 1218 refers to a non-existent TCP connection).
While SYN-flooding attacks 1219 have been successfully exploited in the past for achieving such a 1220 goal [Shimomura, 1995], as clarified by RFC 1948 [Bellovin, 1996] the 1221 effectiveness of SYN-flood attacks in silencing a TCP implementation 1222 arose as a result of a bug in the 4.4BSD TCP implementation [Wright 1223 and Stevens, 1994], rather than from a theoretical property of SYN- 1224 flood attacks themselves. Therefore, those TCP implementations that 1225 do not suffer from such a bug should not be silenced as a result of a 1226 SYN-flood attack. 1228 [Zquete, 2002] describes a mechanism that could theoretically improve 1229 the functionality of SYN cookies. It exploits the TCP "simultaneous 1230 open" mechanism, as illustrated in Figure 5. 1232 See Figure 5, on page 46 of the UK CPNI document. 1234 Use of TCP simultaneous open for handling SYN floods 1236 In line 1, TCP A initiates the connection-establishment phase by 1237 sending a SYN segment to TCP B. In line 2, TCP B creates a SYN cookie 1238 as described by [Bernstein, 1996], but does not set the ACK bit of 1239 the segment it sends (thus really sending a SYN segment, rather than 1240 a SYN/ACK). This "fools" TCP A into thinking that both SYN segments 1241 "have crossed each other in the network" as if a "simultaneous open" 1242 scenario had taken place. As a result, in line 3 TCP A sends a SYN/ 1243 ACK segment containing the same options that were contained in the 1244 original SYN segment. In line 4, upon receipt of this segment, TCP B 1245 processes the cookie encoded in the ACK field as if it had been the 1246 result of a traditional SYN cookie scenario, and moves the connection 1247 into the ESTABLISHED state. In line 5, TCP B sends a SYN/ACK 1248 segment, which causes the connection at TCP A to move into the 1249 ESTABLISHED state. In line 6, TCP A sends a data segment on the 1250 connection.
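The cookie that TCP B generates in line 2 and validates in line 4 can be sketched as follows. This is only an illustration of the general idea of encoding handshake state in a sequence number: the bit layout (5 bits of a coarse time counter, 3 bits of MSS index, 24 bits of keyed hash) follows the commonly described SYN cookie layout, while the MSS table, the COOKIE_MAX_AGE value and all function names are assumptions. In particular, toy_mac() is a placeholder mixing function and is NOT cryptographically strong; as the drawbacks listed earlier make clear, a real implementation would need a proper keyed cryptographic function in its place.

```c
#include <stdint.h>

#define COOKIE_MAX_AGE 2u   /* hypothetical: accepted age, in time ticks */

/* Hypothetical table of MSS values that can be encoded in 3 bits. */
static const uint16_t mss_table[8] = {
    216, 536, 1024, 1220, 1360, 1440, 1452, 1460
};

/* Placeholder keyed mix returning 24 bits. NOT cryptographically
 * strong; it only stands in for a real keyed hash in this sketch. */
static uint32_t toy_mac(uint32_t key, uint32_t saddr, uint32_t daddr,
                        uint16_t sport, uint16_t dport,
                        uint32_t irs, uint32_t t)
{
    uint32_t w[6] = { key, saddr, daddr,
                      ((uint32_t)sport << 16) | dport, irs, t };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 6; i++) {
        h ^= w[i];
        h *= 16777619u;
        h ^= h >> 15;
    }
    return h & 0x00ffffffu;
}

/* Build the sequence number sent in response to a SYN.
 * t is a slowly incrementing counter (for example, one tick per minute). */
uint32_t cookie_make(uint32_t key, uint32_t saddr, uint32_t daddr,
                     uint16_t sport, uint16_t dport,
                     uint32_t irs, uint16_t mss, uint32_t t)
{
    uint32_t m = 0;

    /* Pick the largest table entry that does not exceed the peer's MSS. */
    for (uint32_t i = 0; i < 8; i++)
        if (mss_table[i] <= mss)
            m = i;
    return ((t & 31u) << 27) | (m << 24) |
           toy_mac(key, saddr, daddr, sport, dport, irs, t);
}

/* Validate the segment that acknowledges the cookie. seq and ack are the
 * Sequence Number and Acknowledgement Number of that segment.
 * Returns the recovered MSS, or 0 if the cookie is invalid or too old. */
uint16_t cookie_check(uint32_t key, uint32_t saddr, uint32_t daddr,
                      uint16_t sport, uint16_t dport,
                      uint32_t seq, uint32_t ack, uint32_t now)
{
    uint32_t cookie = ack - 1;          /* peer acknowledges cookie + 1 */
    uint32_t irs = seq - 1;             /* peer's SYN consumed one number */
    uint32_t age = (now - (cookie >> 27)) & 31u;
    uint32_t t;

    if (age > COOKIE_MAX_AGE)
        return 0;
    t = now - age;                      /* counter value at creation time */
    if ((cookie & 0x00ffffffu) !=
        toy_mac(key, saddr, daddr, sport, dport, irs, t))
        return 0;
    return mss_table[(cookie >> 24) & 7u];
}
```

Note how little can be recovered on validation: only an approximate MSS survives, which illustrates the first drawback listed for syn-cookies (options such as SACK cannot be encoded).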
1252 While this mechanism would work in theory, unfortunately there are a 1253 number of factors that prevent it from being usable in real network 1254 environments: 1256 o Some systems are not able to perform the "simultaneous open" 1257 operation specified in RFC 793, and thus the connection 1258 establishment will fail. 1260 o Some firewalls might prevent the establishment of TCP connections 1261 that rely on the "simultaneous open" mechanism (e.g., a given 1262 firewall might be allowing incoming SYN/ACK segments, but not 1263 outgoing SYN/ACK segments). 1265 Therefore, we do not recommend implementation of this mechanism for 1266 mitigating SYN-flood attacks. 1268 5.2. Connection forgery 1270 The process of causing a TCP connection to be illegitimately 1271 established between two arbitrary remote peers is usually referred to 1272 as "connection spoofing" or "connection forgery". This can have a 1273 great negative impact when systems establish some sort of trust 1274 relationships based on the IP addresses used to establish a TCP 1275 connection [daemon9 et al, 1996]. 1277 It should be stressed that hosts should not establish trust 1278 relationships based on the IP addresses [CPNI, 2008] or on the TCP 1279 ports in use for the TCP connection (see Section 3.1 and Section 3.2 1280 of this document). 1282 One of the underlying weaknesses that allow this vulnerability to be 1283 more easily exploited is the use of an inadequate Initial Sequence 1284 Number (ISN) generator, as explained back in the 80's in [Morris, 1285 1985]. As discussed in Section 3.3.1 of this document, any TCP 1286 implementation that makes use of an inadequate ISN generator will be 1287 more vulnerable to this type of attack. A discussion of approaches 1288 for a more careful generation of Initial Sequence Numbers (ISNs) can 1289 be found in Section 3.3.1 of this document. 1291 Another attack vector for performing connection-forgery attacks is 1292 the use of IP source routing. 
By forging the Source Address of the 1293 IP packets that encapsulate the TCP segments of a connection, and 1294 carefully crafting an IP source route option (i.e., either LSRR or 1295 SSRR) that includes a system whose traffic he can monitor, an 1296 attacker could cause the packets sent by the attacked system (e.g., 1297 the SYN/ACK segment sent in response to the attacker's SYN segment) 1298 to be illegitimately directed to him [CPNI, 2008]. Thus, the 1299 attacker would not even need to guess valid sequence numbers for 1300 forging a TCP connection, as he would simply have direct access to 1301 all this information. As discussed in [CPNI, 2008], it is strongly 1302 recommended that systems disable IP Source Routing by default, or, at 1303 the very least, that they disable source routing for IP packets that 1304 encapsulate TCP segments. 1306 The IPv6 Routing Header Type 0, which provides a similar 1307 functionality to that provided by IPv4 source routing, has been 1308 officially deprecated by RFC 5095 [Abley et al, 2007]. 1310 5.3. Connection-flooding attack 1312 NOTE: THIS SECTION IS BEING EDITED. RFC2119-LANGUAGE IS BEING 1313 REMOVED. 1315 5.3.1. Vulnerability 1317 The creation and maintenance of a TCP connection requires system 1318 memory to maintain shared state between the local and the remote TCP. 1319 As system memory is a finite resource, there is a limit on the number 1320 of TCP connections that a system can maintain at any time. When the 1321 TCP API is employed to create a TCP connection with a remote peer, it 1322 allocates system memory for maintaining shared state with the remote 1323 TCP peer, and thus the resulting connection would tie a similar 1324 amount of resources at the remote host as at the local host.
1325 However, if special packet-crafting tools are employed to forge TCP 1326 segments to establish TCP connections with a remote peer, the local 1327 kernel implementation of TCP can be bypassed, and the allocation of 1328 resources on the attacker's system for maintaining shared state can 1329 be avoided. Thus, a malicious user could create a large number of 1330 TCP connections, and subsequently abandon them, thus tying system 1331 resources only at the remote peer. This allows an attacker to create 1332 a large number of TCP connections at the attacked system with the 1333 intent of exhausting its kernel memory, without exhausting the 1334 attacker's own resources. [CERT, 2000] discusses this vulnerability, 1335 which is usually referred to as the "Naptha attack". 1337 This attack is similar in nature to the "Netkill" attack discussed in 1338 Section 7.1.1. However, while Netkill ties both TCBs and TCP send 1339 buffers to the abandoned connections, Naptha only ties TCBs (and 1340 related kernel structures), as it doesn't issue any application 1341 requests. 1343 The symptom of this attack is an extremely large number of TCP 1344 connections in the ESTABLISHED state, which would tend to exhaust 1345 system resources and deny service to new clients (or possibly cause 1346 the system to crash). 1348 It should be noted that it is possible for an attacker to perform the 1349 same type of attack causing the abandoned connections to remain in 1350 states other than ESTABLISHED. This might be attractive to an 1351 attacker, as connections in states other 1352 than ESTABLISHED usually have no controlling user-space process (that 1353 is, the former controlling process for the connection has already 1354 closed the corresponding file descriptor). 1356 A particularly interesting case of a connection-flooding attack that 1357 aims at abandoning connections in a state other than ESTABLISHED is 1358 discussed in Section 6.1 of this document.
1360 5.3.2. Countermeasures 1362 As with many other resource exhaustion attacks, the difficulty in 1363 devising countermeasures for this attack is that it may be 1364 difficult to differentiate between an actual attack and a legitimate 1365 high-load scenario. However, there are a number of countermeasures 1366 which, when tuned for each particular network environment, could 1367 allow a system to resist this attack and continue servicing 1368 legitimate clients. 1370 Hosts SHOULD enforce limits on the number of TCP connections with no 1371 user-space controlling process. 1373 DISCUSSION: 1375 Connections in states other than ESTABLISHED usually have no user- 1376 space controlling process. This prevents the application making 1377 use of those connections from enforcing limits on the maximum 1378 number of ongoing connections (either on a global basis or a 1379 per-IP address basis). When resource exhaustion is imminent or 1380 some threshold of ongoing connections is reached, the operating 1381 system should consider freeing system resources by aborting 1382 connections that have no user-space controlling process. A number 1383 of such connections could be aborted on a random basis, or based 1384 on some heuristics performed by the operating system (e.g., first 1385 abort connections with peers that have the largest number of 1386 ongoing connections with no user-space controlling process). 1388 Hosts SHOULD enforce per-process and per-user limits on maximum 1389 kernel memory that can be used at any time. 1391 Hosts SHOULD enforce per-process and per-user limits on the number of 1392 existing TCP connections at any time. 1394 DISCUSSION: 1396 While the Naptha attack is usually targeted at a service such as 1397 HTTP, its impact is usually system-wide. This is particularly 1398 undesirable, as an attack against a single service might affect 1399 the system as a whole (for example, possibly precluding remote 1400 system administration).
1402 In order to prevent an attack against a single service from affecting 1403 other services, we advise TCP implementations to enforce per- 1404 process and per-user limits on maximum kernel memory that can be 1405 used at any time. Additionally, we recommend that implementations 1406 enforce per-process and per-user limits on the number of existing 1407 TCP connections at any time. 1409 Applications SHOULD enforce limits on the number of simultaneous 1410 connections that can be established from a single IP address or 1411 network prefix at any given time. 1413 DISCUSSION: 1415 An application could limit the number of simultaneous connections 1416 that can be established from a single IP address or network prefix 1417 at any given time. Once that limit has been reached, some other 1418 connection from the same IP address or network prefix would be 1419 aborted, thus allowing the application to service this new 1420 incoming connection. 1422 There are a number of factors that should be taken into account 1423 when defining the specific limit to enforce. For example, in the 1424 case of protocols that have an authentication phase (e.g., SSH, 1425 POP3, etc.), this limit could be applied to sessions that have not 1426 yet been authenticated. Additionally, depending on the nature and 1427 use of the application, it might or might not be normal for a 1428 single system to have multiple connections to the same server at 1429 the same time. 1431 For many network services, the limit of maximum simultaneous 1432 connections could be kept very low. For example, an SMTP server 1433 could limit the number of simultaneous connections from a single 1434 IP address to 10 or 20 connections. 1436 While this limit could work in many network scenarios, we 1437 recommend that network operators measure the maximum number of 1438 concurrent connections from a single IP address during normal 1439 operation, and set the limit accordingly.
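The per-address (or per-prefix) limit discussed above can be sketched as a small counting table kept by the application. This is a minimal illustration under several assumptions: IPv4 only, a fixed-size table scanned linearly, and hypothetical names (struct conn_limiter, limiter_acquire(), limiter_release()). The sketch only reports whether a new connection fits within the limit; whether the application then rejects the new connection or aborts an older one from the same prefix, as the text suggests, is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACKED 256   /* hypothetical number of tracked prefixes */

struct prefix_slot {
    uint32_t prefix;      /* masked IPv4 address, host byte order */
    unsigned count;       /* current connections; 0 means slot is free */
};

struct conn_limiter {
    struct prefix_slot slots[MAX_TRACKED];
    uint32_t mask;        /* network mask derived from the prefix length */
    unsigned limit;       /* maximum simultaneous connections per prefix */
};

/* prefix_len is expected to be in the range 0..32. */
void limiter_init(struct conn_limiter *l, unsigned prefix_len, unsigned limit)
{
    memset(l, 0, sizeof(*l));
    l->mask = (prefix_len == 0) ? 0 : 0xffffffffu << (32 - prefix_len);
    l->limit = limit;
}

/* Account for a new connection from addr.
 * Returns 1 if it is within the limit, 0 if the limit is reached
 * (or if the table is full, in which case this sketch refuses). */
int limiter_acquire(struct conn_limiter *l, uint32_t addr)
{
    uint32_t p = addr & l->mask;
    struct prefix_slot *free_slot = NULL;

    for (size_t i = 0; i < MAX_TRACKED; i++) {
        struct prefix_slot *s = &l->slots[i];

        if (s->count > 0 && s->prefix == p) {
            if (s->count >= l->limit)
                return 0;
            s->count++;
            return 1;
        }
        if (s->count == 0 && free_slot == NULL)
            free_slot = s;
    }
    if (free_slot == NULL || l->limit == 0)
        return 0;
    free_slot->prefix = p;
    free_slot->count = 1;
    return 1;
}

/* Account for a connection from addr having been closed or aborted. */
void limiter_release(struct conn_limiter *l, uint32_t addr)
{
    uint32_t p = addr & l->mask;

    for (size_t i = 0; i < MAX_TRACKED; i++) {
        struct prefix_slot *s = &l->slots[i];

        if (s->count > 0 && s->prefix == p) {
            s->count--;
            return;
        }
    }
}
```

An SMTP server following the example in the text might call limiter_init(&l, 32, 20) at start-up, limiter_acquire() after each accept(), and limiter_release() whenever a connection ends.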
1441 In the case of web servers, this limit will usually need to be set 1442 much higher, as it is common practice for web clients to establish 1443 multiple simultaneous connections with a single web server to 1444 speed up the process of loading a web page (e.g., multiple graphic 1445 files can be downloaded simultaneously using separate TCP 1446 connections). 1448 NATs (Network Address Translators) [Srisuresh and Egevang, 2001] 1449 are widely deployed in the Internet, and may exacerbate this 1450 situation, as a large number of clients behind a NAT might each 1451 establish multiple TCP connections with a given web server, which 1452 would all appear to originate from the same IP address (that of 1453 the NAT box). 1455 Firewalls MAY enforce limits on the number of simultaneous 1456 connections that can be established from a single IP address or 1457 network prefix at any given time. 1459 DISCUSSION: 1461 Some firewalls can be configured to limit the number of 1462 simultaneous connections that any system can maintain with a 1463 specific system and/or service at any given time. Limiting the 1464 number of simultaneous connections that each system can establish 1465 with a specific system and service would effectively limit the 1466 ability of an attacker that controls a single IP address to 1467 exhaust system resources at the attacked system/service. 1469 5.4. Firewall-bypassing techniques 1471 [draft-gont-tcpm-tcp-sanity-checks-00.txt] discusses how packets with 1472 both the SYN and RST bits set have been employed in the wild to 1473 bypass firewall rules, and provides advice in this area. 1475 6. Connection-termination mechanism 1477 6.1. FIN-WAIT-2 flooding attack 1479 6.1.1. Vulnerability 1481 TCP implements a connection-termination mechanism that is employed 1482 for the graceful termination of a TCP connection. This mechanism 1483 usually consists of the exchange of four segments.
Figure 6 1484 illustrates the usual segment exchange for this mechanism. 1486 Figure 6: TCP connection-termination mechanism 1488 See Figure 6, on page 50 of the UK CPNI document. 1492 A potential problem may arise as a result of the FIN-WAIT-2 state: 1493 there is no limit on the amount of time that a TCP can remain in the 1494 FIN-WAIT-2 state. Furthermore, no segment exchange is required to 1495 maintain the connection in that state. 1497 As a result, an attacker could establish a large number of 1498 connections with the target system, and cause it to close each of them. 1499 For each connection, once the target system has sent its FIN segment, 1500 the attacker would acknowledge the receipt of this segment, but would 1501 send no further segments on that connection. As a result, an 1502 attacker could cause the corresponding system resources (e.g., the 1503 system memory used for storing the TCB) to remain allocated without the need to send any 1504 further packets. 1506 While the CLOSE command described in RFC 793 [RFC0793] simply signals 1507 the remote TCP end-point that this TCP has finished sending data 1508 (i.e., it closes only one direction of the data transfer), the 1509 close() system-call available in most operating systems has different 1510 semantics: it marks the corresponding file descriptor as closed (and 1511 thus it is no longer usable), and assigns the operating system the 1512 responsibility to deliver any queued data to the remote TCP peer and 1513 to terminate the TCP connection. This makes the FIN-WAIT-2 state 1514 particularly attractive for performing memory exhaustion attacks, as 1515 even if the application running on top of TCP were imposing limits on 1516 the maximum number of ongoing connections, and/or time limits on the 1517 function calls performed on TCP connections, that application would 1518 be unable to enforce these limits on the FIN-WAIT-2 state. 1520 6.1.2.
Countermeasures 1522 A number of countermeasures can be implemented to mitigate FIN-WAIT-2 1523 flooding attacks. Some of these countermeasures require changes in 1524 the TCP implementations, while others require changes in the 1525 applications running on top of TCP. 1527 TCP SHOULD enforce limits on the duration of the FIN-WAIT-2 state. 1529 DISCUSSION: 1531 In order to avoid the risk of having connections stuck in the FIN- 1532 WAIT-2 state indefinitely, a number of systems incorporate a 1533 timeout for the FIN-WAIT-2 state. For example, the Linux kernel 1534 version 2.4 enforces a timeout of 60 seconds [Linux, 2008]. If 1535 the connection-termination mechanism does not complete before that 1536 timeout value, it is aborted. 1538 Enabling applications to enforce limits on ongoing connections 1540 As discussed in Section 6.1.1, the fact that the close() system call 1541 marks the corresponding file descriptor as closed prevents the 1542 application running on top of TCP from enforcing limits on the 1543 corresponding connection. 1545 While it is common practice for applications to terminate their 1546 connections by means of the close() system call, it is possible for 1547 an application to initiate the connection-termination phase without 1548 closing the corresponding file descriptor (hence keeping control of 1549 the connection). 1551 In order to achieve this, an application performing an active close 1552 (i.e., initiating the connection-termination phase) should replace 1553 the system-call close(sockfd) with the following code sequence: 1555 o A call to shutdown(sockfd, SHUT_WR), to close the sending 1556 direction of this connection 1558 o Successive calls to read(), until it returns 0, thus indicating 1559 that the remote TCP peer has finished sending data. 1561 o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, 1562 sizeof(l)), where l is of type struct linger (with its members 1563 l.l_onoff=1 and l.l_linger=90). 
1565 o A call to close(sockfd), to close the corresponding file 1566 descriptor. 1568 The call to shutdown() (instead of close()) allows the application to 1569 retain control of the underlying TCP connection while the connection 1570 transitions through the FIN-WAIT-1 and FIN-WAIT-2 states. However, 1571 the application will not retain control of the connection while it 1572 transitions through the CLOSING and TIME-WAIT states. 1574 It should be noted that, strictly speaking, close(sockfd) decrements 1575 the reference count for the descriptor sockfd, and initiates the 1576 connection-termination phase only when the reference count reaches 0. 1577 On the other hand, shutdown(sockfd, SHUT_WR) initiates the 1578 connection-termination phase, regardless of the reference count for 1579 the sockfd descriptor. This should be taken into account when 1580 performing the code replacement described above. For example, it 1581 would be a bug for two processes (e.g., parent and child) that share 1582 a descriptor to both call shutdown(sockfd, SHUT_WR). 1584 An application performing a passive close should replace the call to 1585 close(sockfd) with the following code sequence: 1587 o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, 1588 sizeof(l)), where l is of type struct linger (with its members 1589 l.l_onoff=1 and l.l_linger=90). 1591 o A call to close(sockfd), to close the corresponding file 1592 descriptor. 1594 It is assumed that if the application is performing a passive close, 1595 the application already detected that the remote TCP peer finished 1596 sending data as a result of a call to read() returning 0. 1598 In this scenario, the application will not retain control of the 1599 underlying connection when it transitions through the LAST_ACK state.
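The two code sequences described above (for the active and the passive close) can be written out as a short C sketch. The calls and the linger value of 90 seconds are those named in the text; the function names active_close() and passive_close(), the buffer size and the error handling are assumptions made for illustration. The read() loop shown here blocks until the peer finishes sending data; an application that wants to enforce a time limit on the FIN-WAIT-2 state would bound that loop with its own timer (for example, a receive timeout), which is precisely the control that plain close() gives up.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Enable SO_LINGER with the given timeout (in seconds). */
static int set_linger(int sockfd, int seconds)
{
    struct linger l;

    l.l_onoff = 1;
    l.l_linger = seconds;
    return setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
}

/* Replacement for close(sockfd) when performing an active close.
 * Returns 0 on success, -1 if any step failed (the descriptor is
 * closed in both cases). */
int active_close(int sockfd)
{
    char buf[512];
    ssize_t n;
    int rc;

    /* Close only the sending direction; the descriptor stays usable,
     * so the application keeps control during FIN-WAIT-1/FIN-WAIT-2. */
    if (shutdown(sockfd, SHUT_WR) == -1) {
        close(sockfd);
        return -1;
    }

    /* Drain data until read() returns 0, i.e. the peer has finished
     * sending. Interrupted calls are simply retried. */
    do {
        n = read(sockfd, buf, sizeof(buf));
    } while (n > 0 || (n == -1 && errno == EINTR));
    rc = (n == 0) ? 0 : -1;

    if (set_linger(sockfd, 90) == -1)
        rc = -1;
    if (close(sockfd) == -1)
        rc = -1;
    return rc;
}

/* Replacement for close(sockfd) when performing a passive close.
 * The caller is assumed to have already seen read() return 0. */
int passive_close(int sockfd)
{
    int rc = 0;

    if (set_linger(sockfd, 90) == -1)
        rc = -1;
    if (close(sockfd) == -1)
        rc = -1;
    return rc;
}
```

As the text notes, shutdown() acts on the connection regardless of the descriptor's reference count, so active_close() should be called by only one of the processes sharing a descriptor.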
1601 Enforcing limits on the number of connections with no user-space 1602 controlling process 1604 The considerations and recommendations in Section 5.3.2 for enforcing 1605 limits on the number of connections with no user-space controlling 1606 process are applicable to mitigate this vulnerability. 1608 Limiting the number of simultaneous connections at the application 1610 The considerations and recommendations in Section 5.3.2 for limiting 1611 the number of simultaneous connections at the application are applicable to 1612 mitigate this vulnerability. We note, however, that unless 1613 applications are implemented to retain control of the underlying TCP 1614 connection while the connection transitions through the FIN-WAIT-1 1615 and FIN-WAIT-2 states, enforcing such limits may prove to be a 1616 difficult task. 1618 Limiting the number of simultaneous connections at firewalls 1620 The considerations and recommendations in Section 5.3.2 for 1621 limiting the number of simultaneous connections at firewalls are 1622 applicable to mitigate this vulnerability. 1624 7. Buffer management 1625 7.1. TCP retransmission buffer 1627 7.1.1. Vulnerability 1629 [Shalunov, 2000] describes a resource exhaustion attack (Netkill) 1630 that can be performed against TCP. The attack aims at exhausting 1631 system memory by creating a large number of TCP connections which are 1632 then abandoned. The attack is usually performed as follows: 1634 o The attacker creates a TCP connection to a service in which a 1635 small client request can result in a large server response (e.g., 1636 HTTP). Rather than relying on his kernel implementation of TCP, 1637 the attacker creates his TCP connections by means of a specialized 1638 packet-crafting tool. This allows the attacker to create the TCP 1639 connections and later abandon them, exhausting the resources at 1640 the attacked system, while not tying his own system resources to 1641 the abandoned connections.
1643 o When the connection is established (i.e., the three-way handshake 1644 has completed), an application request is sent, and the TCP 1645 connection is subsequently abandoned. At this point, any state 1646 information kept by the attack tool is removed. 1648 o The attacked server allocates TCP send buffers for transmitting 1649 the response to the client's request. This causes the victim TCP 1650 to tie resources not only for the Transmission Control Block 1651 (TCB), but also for the application data that needs to be 1652 transferred. 1654 o Once the application response is queued for transmission, the 1655 application closes the TCP connection, and thus TCP takes the 1656 responsibility to deliver the queued data. Having the application 1657 close the connection has the benefit for the attacker that the 1658 application is not able to keep track of the number of TCP 1659 connections in use, and thus it is not able to enforce limits on 1660 the number of connections. 1662 o The attacker repeats the above steps a large number of times, thus 1663 causing a large amount of system memory at the victim host to be 1664 tied to the abandoned connections. When the system memory is 1665 exhausted, the victim host denies service to new connections, or 1666 possibly crashes. 1668 There are a number of factors that affect the effectiveness of this 1669 attack that are worth considering. Firstly, while the attack is 1670 typically targeted at a service such as HTTP, the consequences of the 1671 attack are usually system-wide. Secondly, depending on the size of 1672 the server's response, the underlying TCP connection may or may not 1673 be closed: if the response is larger than the TCP send buffer size at 1674 the server, the application will usually block in a call to write() 1675 or send(), and would therefore not close the TCP connection, thus 1676 allowing the application to enforce limits on the number of ongoing 1677 connections. 
Consequently, the attacker will usually try to elicit a 1678 response that is equal to or slightly smaller than the send buffer of 1679 the attacked TCP. Thirdly, while [Shalunov, 2000] notes that one 1680 visible effect of this attack is a large number of connections in the 1681 FIN-WAIT-1 state, this will not usually be the case. Given that the 1682 attacker never acknowledges any segment other than the SYN/ACK 1683 segment that is part of the three-way handshake, at the point in 1684 which the attacked TCP tries to send the application's response the 1685 congestion window (cwnd) will usually be 4*SMSS (four maximum-sized 1686 segments). If the application's response were larger than 4*SMSS, 1687 even if the application had closed the connection, the FIN segment 1688 would never be sent, and thus the connection would still remain in 1689 the ESTABLISHED state (rather than transit to the FIN-WAIT-1 state). 1691 7.1.2. Countermeasures 1693 The resource exhaustion attack described in Section 7.1.1 does not 1694 necessarily differ from a legitimate high-load scenario, and 1695 therefore is hard to mitigate without negatively affecting the 1696 robustness of TCP. However, complementary mitigations can still be 1697 implemented to limit the impact of these attacks. 1699 Enforcing limits on the number of connections with no user-space 1700 controlling process 1702 The considerations and recommendations in Section 5.3.2 for enforcing 1703 limits on the number of connections with no user-space controlling 1704 process are applicable to mitigate this vulnerability. 1706 Enforcing per-user and per-process limits 1708 While the Netkill attack is usually targeted at a service such as 1709 HTTP, its impact is usually system-wide. This is particularly 1710 undesirable, as an attack against a single service might affect the 1711 system as a whole (for example possibly precluding remote system 1712 administration). 
1714 In order to prevent an attack against a single service from affecting 1715 other services, we advise TCP implementations to enforce per-process 1716 and per-user limits on maximum kernel memory that can be used at any 1717 time. Additionally, we recommend that implementations enforce per- 1718 process and per-user limits on the number of existing TCP connections 1719 at any time. 1721 Limiting the number of ongoing connections at the application 1723 The considerations and recommendations in Section 5.3.2 for enforcing 1724 limits on the number of ongoing connections at the application are 1725 applicable to mitigate this vulnerability. 1727 Enabling applications to enforce limits on ongoing connections 1729 As discussed in Section 6.1.1, the fact that the close() system call 1730 marks the corresponding file descriptor as closed prevents the 1731 application running on top of TCP from enforcing limits on the 1732 corresponding connection. 1734 While it is common practice for applications to terminate their 1735 connections by means of the close() system call, it is possible for 1736 an application to initiate the connection-termination phase without 1737 closing the corresponding file descriptor (hence keeping control of 1738 the connection). 1740 In order to achieve this, an application performing an active close 1741 (i.e., initiating the connection-termination phase) should replace 1742 the call to close(sockfd) with the following code sequence: 1744 o A call to shutdown(sockfd, SHUT_WR), to close the sending 1745 direction of this connection. 1747 o Successive calls to read(), until it returns 0, thus indicating 1748 that the remote TCP peer has finished sending data. 1750 o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, 1751 sizeof(l)), where l is of type struct linger (with its members 1752 l.l_onoff=1 and l.l_linger=90). 1754 o A call to close(sockfd), to close the corresponding file 1755 descriptor.
1757 The call to shutdown() (instead of close()) allows the application to 1758 retain control of the underlying TCP connection while the connection 1759 transitions through the FIN-WAIT-1 and FIN-WAIT-2 states. However, 1760 the application will not retain control of the connection while it 1761 transitions through the CLOSING and TIME-WAIT states. Nevertheless, 1762 in these states TCP should not have any pending data to send to the 1763 remote TCP peer or to be received by the application running on top 1764 of it, and thus these states are less of a concern for this 1765 particular vulnerability (Netkill). 1767 It should be noted that, strictly speaking, close(sockfd) decrements 1768 the reference count for the descriptor sockfd, and initiates the 1769 connection-termination phase only when the reference count reaches 0. 1770 On the other hand, shutdown(sockfd, SHUT_WR) initiates the 1771 connection-termination phase, regardless of the reference count for 1772 the sockfd descriptor. This should be taken into account when 1773 performing the code replacement described above. For example, it 1774 would be a bug for two processes (e.g., parent and child) that share 1775 a descriptor to both call shutdown(sockfd, SHUT_WR). 1777 An application performing a passive close should replace the call to 1778 close(sockfd) with the following code sequence: 1780 o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, 1781 sizeof(l)), where l is of type struct linger (with its members 1782 l.l_onoff=1 and l.l_linger=90). 1784 o A call to close(sockfd), to close the corresponding file 1785 descriptor. 1787 It is assumed that if the application is performing a passive close, 1788 the application already detected that the remote TCP peer finished 1789 sending data as a result of a call to read() returning 0. 1791 In this scenario, the application will not retain control of the 1792 underlying connection when it transitions through the LAST_ACK state.
1793 However, in this state TCP should not have any pending data to send 1794 to the remote TCP peer or to be received by the application running 1795 on top of TCP, and thus this state is less of a concern for this 1796 particular vulnerability (Netkill). 1798 Limiting the number of simultaneous connections at firewalls 1800 The considerations and recommendations in Section 5.3.2 for 1801 limiting the number of simultaneous connections at firewalls are 1802 applicable to mitigate this vulnerability. 1804 Performing heuristics on ongoing TCP connections 1806 Some heuristics could be performed on TCP connections that 1807 might help if scarce system resources such as memory become 1808 exhausted. A number of parameters may be useful to perform such 1809 heuristics. 1811 In the case of the Netkill attack described in [Shalunov, 2000], 1812 there are two parameters that are characteristic of a TCP being 1813 attacked: 1815 o A large amount of data queued in the TCP retransmission buffer 1816 (e.g., the socket send buffer). 1818 o Only a small amount of data has been successfully transferred to the 1819 remote peer. 1821 Clearly, these two parameters do not necessarily indicate an ongoing 1822 attack. However, if exhaustion of the corresponding system resources 1823 were imminent, these two parameters (among others) could be used to 1824 perform heuristics when considering aborting ongoing connections. 1826 It should be noted that while an attacker could advertise a zero 1827 window to cause the target system to tie system memory to the TCP 1828 retransmission buffer, it is hard to derive any useful statistics 1829 from the advertised window.
While it is tempting to enforce a limit 1830 on the duration of the persist state (see Section 3.7.2 of this 1831 document), an attacker could simply open the window (i.e., advertise 1832 a TCP window larger than zero) from time to time to prevent this 1833 enforced limit from causing his malicious connections to be aborted. 1835 7.2. TCP segment reassembly buffer 1837 TCP buffers out-of-order segments to more efficiently handle the 1838 occurrence of packet reordering and segment loss. When out-of-order 1839 data are received, a "hole" momentarily exists in the data stream 1840 which must be filled before the received data can be delivered to the 1841 application making use of TCP's services. This situation can be 1842 exploited by an attacker, who could intentionally create a hole in 1843 the data stream by sending a number of segments with a sequence 1844 number larger than the next sequence number expected (RCV.NXT) by the 1845 attacked TCP. Thus, the attacked TCP would tie system memory to 1846 buffer the out-of-order segments, without being able to hand the 1847 received data to the corresponding application. 1849 If a large number of such connections were created, system memory 1850 could be exhausted, precluding the attacked TCP from servicing new 1851 connections and/or continuing to service TCP connections previously 1852 established. 1854 Fortunately, these attacks can be easily mitigated, at the expense of 1855 degrading the performance of possibly legitimate connections. When 1856 out-of-order data are received, an Acknowledgement segment is sent 1857 with the next sequence number expected (RCV.NXT). This means that 1858 receipt of the out-of-order data will not actually be acknowledged by 1859 the TCP's cumulative Acknowledgement Number. As a result, a TCP is 1860 free to discard any data that have been received out-of-order, 1861 without affecting the reliability of the data transfer.
Given the 1862 performance implications of discarding out-of-order segments for 1863 legitimate connections, this pruning policy should be applied only if 1864 memory exhaustion is imminent. 1866 As a result of discarding the out-of-order data, these data will need 1867 to be unnecessarily retransmitted. Additionally, a loss event will 1868 be detected by the sending TCP, and thus the slow start phase of 1869 TCP's congestion control will be entered, thus reducing the data 1870 transfer rate of the connection. 1872 It is interesting to note that this pruning policy could be applied 1873 even if Selective Acknowledgements (SACK) (specified in RFC 2018 1874 [Mathis et al, 1996]) are in use, as SACK provides only advisory 1875 information, and does not preclude the receiving TCP from discarding 1876 data that have been previously selectively-acknowledged by means of 1877 TCP's SACK option, but not acknowledged by TCP's cumulative 1878 Acknowledgement Number. 1880 There are a number of ways in which the pruning policy could be 1881 triggered. For example, when out of order data are received, a timer 1882 could be set, and the sequence number of the out-of-order data could 1883 be recorded. If the hole were filled before the timer expires, the 1884 timer would be turned off. However, if the timer expired before the 1885 hole were filled, all the out-of-order segments of the corresponding 1886 connection would be discarded. This would be a proactive counter- 1887 measure for attacks that aim at exhausting the receive buffers. 1889 In addition, an implementation could incorporate reactive mechanisms 1890 for more carefully controlling buffer allocation when some predefined 1891 buffer allocation threshold was reached. At such point, pruning 1892 policies would be applied. 1894 A number of mechanisms can aid in the process of freeing system 1895 resources. 
For example, a table of network prefixes corresponding to 1896 the IP addresses of TCP peers that have ongoing TCP connections could 1897 record the aggregate amount of out-of-order data currently buffered 1898 for those connections. When the pruning policy was triggered, TCP 1899 connections with hosts that have network prefixes with large 1900 aggregate out-of-order buffered data could be selected first for 1901 pruning the out-of-order segments. 1903 Alternatively, if TCP segments were de-multiplexed by means of a hash 1904 table (as it is currently the case in many TCP implementations), a 1905 counter could be held at each entry of the hash table that would 1906 record the aggregate out-of-order data currently buffered for those 1907 connections belonging to that hash table entry. When the pruning 1908 policy is triggered, the out-of-order data corresponding to those 1909 connections linked by the hash table entry with largest amount of 1910 aggregate out-of-order data could be pruned first. It is important 1911 that this hash is not computable by an attacker, as this would allow 1912 him to maliciously cause the performance of specific connections to 1913 be degraded. That is, given a four-tuple that identifies a 1914 connection, an attacker should not be able to compute the 1915 corresponding hash value used by the target system to de-multiplex 1916 incoming TCP segments to that connection. 1918 Another variant of a resource exhaustion attack against TCP's segment 1919 reassembly mechanism would target the data structures used to link 1920 the different holes in a data stream. For example, an attacker could 1921 send a burst of 1 byte segments, leaving a one-byte hole between each 1922 of the data bytes sent. 
Depending on the data structures used for 1923 holding and linking together each of the data segments, such an 1924 attack might waste a large amount of system memory by exploiting the 1925 overhead needed to store and link together each of these one-byte 1926 segments. 1928 For example, if a linked-list is used for holding and linking each of 1929 the data segments, each node of the list could require 1930 one byte of kernel memory for storing the received data byte (the TCP 1931 payload), plus 4 bytes (32 bits) for storing a pointer to the next 1932 node in the linked-list. Additionally, while such a data structure 1933 would require only a few bytes of kernel memory, it could result in 1934 the allocation of a whole memory page, thus consuming much more 1935 memory than expected. 1937 Therefore, implementations should enforce a limit on the number of 1938 holes that are allowed in the received data stream at any given time. 1939 When such a limit is reached, incoming TCP segments that would 1940 create new holes should be silently dropped. Measurements in 1941 [Dharmapurikar and Paxson, 2005] indicate that the vast majority 1942 of TCP connections have at most a single hole at any given time. A 1943 limit of 16 holes for each connection would accommodate even most of 1944 the very unusual cases in which there can be more than one hole in the 1945 data stream at a given time. 1947 [US-CERT, 2004a] is a security advisory about a Denial of Service 1948 vulnerability resulting from a TCP implementation that did not 1949 enforce limits on the number of segments stored in the TCP reassembly 1950 buffer. 1952 Section 8 of this document describes the security implications of the 1953 TCP segment reassembly algorithm. 1955 7.3. Automatic buffer tuning mechanisms 1957 NOTE: THIS SECTION IS BEING EDITED. PLEASE DISREGARD THE RFC2119- 1958 LANGUAGE RECOMMENDATIONS. 1960 7.3.1.
Automatic send-buffer tuning mechanisms 1962 A TCP implementing an automatic send-buffer tuning mechanism SHOULD 1963 enforce the following limit on the size of the send buffer of each 1964 TCP connection: 1966 send_buffer_size <= send_buffer_pool / (min_buffer_size * max_connections) 1968 where 1970 send_buffer_size: 1971 Maximum send buffer size to be used for this connection 1973 send_buffer_pool: 1974 Total amount of system memory meant for TCP send buffers 1976 min_buffer_size: 1977 Minimum send buffer size for each TCP connection 1979 max_connections: 1980 Maximum number of TCP connections this system is expected to 1981 handle at a time 1983 max_connections may be an artificial limit enforced by the system 1984 administrator specifically on the number of TCP connections, or may 1985 be derived from some other system limit (e.g., the maximum number of 1986 file descriptors) 1988 DISCUSSION: 1990 A number of TCP implementations incorporate automatic tuning 1991 mechanisms for the TCP send buffer size. In most of them, the 1992 underlying idea is to set the send buffer to some multiple of the 1993 congestion window (cwnd). This type of mechanism usually improves 1994 TCP's performance, by preventing the socket send buffer from 1995 becoming a bottleneck, while avoiding the need to simply 1996 overestimate the TCP send buffer size (i.e., make it arbitrarily 1997 large). [Semke et al, 1998] discusses such an automatic buffer 1998 tuning mechanism. 2000 Unfortunately, automatic tuning mechanisms can be exploited by 2001 attackers to amplify the impact of other resource exhaustion 2002 attacks. For example, an attacker could establish a TCP 2003 connection with a victim host, and cause the congestion window to 2004 be increased (either legitimately or illegitimately). 
Once the 2005 congestion window (and hence the TCP send buffer) is increased, he 2006 could cause the corresponding system memory to be tied up by 2007 advertising a zero-byte TCP window (see Section 3.7) or simply not 2008 acknowledging any data, thus amplifying the effect of resource 2009 exhaustion attacks such as that discussed in Section 7.1.1. 2011 When an automatic buffer tuning mechanism is implemented, a number 2012 of countermeasures should be incorporated to prevent the mechanism 2013 from being exploited to amplify other resource exhaustion attacks. 2015 Firstly, appropriate policies should be applied to guarantee fair 2016 use of the available system memory by each of the established TCP 2017 connections. Secondly, appropriate policies should be applied to 2018 avoid existing TCP connections from consuming all system 2019 resources, thus preventing service to new TCP connections. 2021 Appendix A of [Semke et al, 1998] proposes an algorithm for the 2022 fair share of the available system memory among the established 2023 connections. However, there are a number of limits that should be 2024 enforced on the system memory assigned for the send buffer of each 2025 connection. Firstly, each connection should always be assigned 2026 some minimum send buffer space that would enable TCP to perform at 2027 an acceptable performance. Secondly, some system memory should be 2028 reserved for future connections, according to the maximum number 2029 of concurrent TCP connections that are expected to be successfully 2030 handled at any given time. 2032 These limits preclude the automatic tuning algorithm from 2033 assigning all the available memory buffers to ongoing connections, 2034 thus preventing the establishment of new connections. 2036 Even if these limits are enforced, an attacker could still create 2037 a large number of TCP connections, each of them tying valuable 2038 system resources. 
Therefore, in scenarios in which most of the 2039 system memory reserved for TCP send buffers is allocated to 2040 ongoing connections, it may be necessary for TCP to enforce some 2041 policy to free resources, either to service more TCP connections 2042 or to improve the performance of other existing 2043 connections by allocating more resources to them. 2045 When needing to free memory in use for send buffers, particular 2046 attention should be paid to TCP connections that have a large amount of data 2047 in the socket send buffer, and that at the same time fall into any 2048 of these categories: 2050 * The remote TCP peer has been advertising a small (possibly 2051 zero) window for a considerable period of time. 2053 * There have been a large number of retransmissions of segments 2054 corresponding to the first few windows of data. 2056 * Connections that fall into one of the previous categories, for 2057 which only a small amount of data has been successfully 2058 transferred to the peer TCP since the connection was 2059 established. 2061 Unfortunately, all these cases are valid scenarios for the TCP 2062 protocol, and thus aborting connections that fall in any of these 2063 categories has the potential of causing interoperability problems. 2064 However, in scenarios in which all system resources are allocated, 2065 it may make sense to free resources allocated to TCP connections 2066 that are tying up a considerable amount of system resources and that 2067 have not made progress in a considerable period of time. 2069 7.3.2. Automatic receive-buffer tuning mechanism 2071 A number of TCP implementations include automatic tuning mechanisms 2072 for the receive buffer size. These mechanisms aim at setting the 2073 socket buffer to a size that is large enough to prevent the TCP window 2074 from becoming a bottleneck that would limit TCP's throughput, without 2075 wasting system memory by over-sizing it.
2077 [Heffner, 2002] describes a mechanism for the automatic tuning of the 2078 socket receive buffer. Basically, the mechanism aims at measuring 2079 the amount of data received during a RTT (Round-Trip Time), and 2080 setting the socket receive buffer to some multiple of that value. 2082 A TCP implementing an automatic receive-buffer tuning mechanism 2083 SHOULD enforce the following limit on the size of the receive buffer 2084 of each TCP connection: 2086 recv_buffer_size <= recv_buffer_pool / (min_buffer_size * max_connections) 2088 where: 2090 recv_buffer_size: 2091 Maximum receive buffer size to be used for this connection 2093 recv_buffer_pool: 2094 Total amount of system memory meant for TCP receive buffers 2096 min_buffer_size: 2097 Minimum receive buffer size for each TCP connection 2099 max_connections: 2100 Maximum number of TCP connections this system is expected to 2101 handle at a time 2103 max_connections may be an artificial limit enforced by the system 2104 administrator specifically on the number of TCP connections, or may 2105 be derived from some other system limit (e.g., the maximum number of 2106 file descriptors). 2108 DISCUSSION: 2110 Unfortunately, automatic tuning mechanisms for the socket receive 2111 buffer can be exploited to perform a resource exhaustion attack. 2112 An attacker willing to exploit the automatic buffer tuning 2113 mechanism would first establish a TCP connection with the victim 2114 host. Subsequently, he would start a bulk data transfer to the 2115 victim host. By carefully responding to the peer's TCP segments, 2116 the attacker could cause the peer TCP to measure a large data/RTT 2117 value, which would lead to the adoption of an unnecessarily large 2118 socket receive buffer. For example, the attacker could 2119 optimistically send more data than those allowed by the TCP window 2120 advertised by the remote TCP. 
Those extra data would cross in the 2121 network with the window updates sent by the remote TCP, and could 2122 lead the TCP receiver to measure a data/RTT twice as big as the 2123 real one. Alternatively, if the TCP timestamp option (specified 2124 in RFC 1323 [Jacobson et al, 1992]) is used for RTT measurement, 2125 the attacker could lead the TCP receiver to measure a small RTT 2126 (and hence a large Data/RTT rate) by "optimistically" echoing 2127 timestamps that have not yet been received. 2129 Finally, once the TCP receiver is led to increase the size of its 2130 receive buffer, the attacker would transmit a large amount of 2131 data, filling the whole peer's receive buffer except for a few 2132 bytes at the beginning of the window (RCV.NXT). This gap would 2133 prevent the peer application from reading the data queued by TCP, 2134 thus tying system memory to the received data segments until (if 2135 ever) the peer application times out. 2137 A number of limits should be enforced on the amount of system 2138 memory assigned to any given connection. Firstly, each connection 2139 should always be assigned some minimum receive buffer space that 2140 would enable TCP to perform at a minimum acceptable performance. 2141 Additionally, some system memory should be reserved for future 2142 connections, according to the maximum number of concurrent TCP 2143 connections that are expected to be successfully handled at any 2144 given time. 2146 These limits preclude the automatic tuning algorithm from 2147 assigning all the available memory buffers to existing 2148 connections, thus preventing the establishment of new connections. 2150 It is interesting to note that a TCP sender will always try to 2151 retransmit any data that have not been acknowledged by TCP's 2152 cumulative acknowledgement. 
Therefore, if memory exhaustion is 2153 imminent, a system should consider freeing those memory buffers 2154 used for TCP segments that were received out of order, 2155 particularly when a given connection has been keeping a large 2156 number of out-of-order segments in the receive buffer for a 2157 considerable period of time. 2159 It is worth noting that TCP Selective Acknowledgements (SACK) are 2160 advisory, in the sense that a TCP that has SACKed (but not ACKed) 2161 a block of data is free to discard that block, and expect the TCP 2162 sender to retransmit them when the retransmission timer of the 2163 peer TCP expires. 2165 8. TCP segment reassembly algorithm 2167 8.1. Problems that arise from ambiguity in the reassembly process 2169 A security consideration that should be made for the TCP segment 2170 reassembly algorithm is that of data stream consistency between the 2171 host performing the TCP segment reassembly, and a Network Intrusion 2172 Detection System (NIDS) being employed to monitor the host in 2173 question. 2175 In the event a TCP segment was unnecessarily retransmitted, or there 2176 was packet duplication in any of the intervening networks, a TCP 2177 might get more than one copy of the same data. Also, as TCP segments 2178 can be re-packetized when they are retransmitted, a given TCP segment 2179 might partially overlap data already received in earlier segments. 2180 In all these cases, the question arises about which of the copies of 2181 the received data should be used when reassembling the data stream. 2182 In legitimate and normal circumstances, all copies would be 2183 identical, and the same data stream would be obtained regardless of 2184 which copy of the data was used. 
However, an attacker could 2185 maliciously send overlapping segments containing different data, with 2186 the intent of evading a Network Intrusion Detection System (NIDS), 2187 which might reassemble the received TCP segments differently from the 2188 monitored system. [Ptacek and Newsham, 1998] provides a detailed 2189 discussion of these issues. 2191 As suggested in Section 3.9 of RFC 793 [RFC0793], if a TCP segment 2192 arrives containing some data bytes that have already been received, 2193 the first copy of those data should be used for reassembling the 2194 application data stream. It should be noted that while convergence 2195 to this policy might prevent some cases of ambiguity in the 2196 reassembly process, there are a number of other techniques that an 2197 attacker could still exploit to evade a NIDS [CPNI, 2008]. These 2198 techniques can generally be defeated if the NIDS is placed in-line 2199 with the monitored system, thus allowing the NIDS to normalize the 2200 network traffic or apply some other policy that could ensure 2201 consistency between the result of the segment reassembly process 2202 obtained by the monitored host and that obtained by the NIDS. 2204 [CERT, 2003] and [CORE, 2003] are advisories about a heap buffer 2205 overflow in a popular Network Intrusion Detection System resulting 2206 from incorrect sequence number calculations in its TCP stream- 2207 reassembly module. 2209 9. TCP Congestion Control 2211 NOTE: THIS SECTION IS BEING EDITED. 2213 TCP implements two algorithms, "slow start" and "congestion 2214 avoidance", for controlling the rate at which data is transmitted on 2215 a TCP connection [RFC5681]. 2217 9.1. Congestion control with misbehaving receivers 2219 [Savage et al, 1999] describes a number of ways in which TCP's 2220 congestion control mechanisms can be exploited by a misbehaving TCP 2221 receiver to obtain more than its fair share of bandwidth.
The 2222 following subsections provide a brief discussion of these 2223 vulnerabilities, along with the possible countermeasures. 2225 9.1.1. ACK division 2227 Given that TCP updates cwnd based on the number of ACKs it 2228 receives, rather than on the amount of data that each ACK is actually 2229 acknowledging, a malicious TCP receiver could cause the TCP sender to 2230 illegitimately increase its congestion window by acknowledging a data 2231 segment with a number of separate Acknowledgements, each covering a 2232 distinct piece of the received data segment. 2234 See Figure 7, on page 64 of the UK CPNI document. 2236 ACK division attack 2238 [Savage et al, 1999] describes two possible countermeasures for this 2239 vulnerability. One of them is to increment cwnd not by a full SMSS, 2240 but proportionally to the amount of data being acknowledged by the 2241 received ACK, similarly to the policy described in RFC 3465 [Allman, 2242 2003]. The other is to increase cwnd by one SMSS only when 2243 a valid ACK covers the entire data segment sent. 2245 9.1.2. DupACK forgery 2247 The second vulnerability discussed in [Savage et al, 1999] allows an 2248 attacker to cause the TCP sender to illegitimately increase its 2249 congestion window by forging a number of duplicate Acknowledgements 2250 (DupACKs). Figure 8 shows a sample scenario. The first three 2251 DupACKs trigger the Fast Recovery mechanism, while the rest of them 2252 cause the congestion window at the TCP sender to be illegitimately 2253 inflated. Thus, the attacker is able to illegitimately cause the TCP 2254 sender to increase its data transmission rate. 2256 See Figure 8, on page 65 of the UK CPNI document. 2258 DupACK forgery attack 2260 Fortunately, a number of sender-side heuristics can be implemented to 2261 mitigate this vulnerability. First, the TCP sender could keep track 2262 of the number of outstanding segments (o_seg), and accept only up to 2263 (o_seg - 1) DupACKs.
Secondly, a TCP sender might, for example, 2264 refuse to enter Fast Recovery multiple times in some period of time 2265 (e.g., one RTT). 2267 [Savage et al, 1999] also describes a modification to TCP to 2268 implement a nonce protocol that would eliminate this vulnerability. 2269 However, this would require modification of all implementations, 2270 which makes this countermeasure hard to deploy. 2272 9.1.3. Optimistic ACKing 2274 Another way for an attacker to exploit TCP's congestion 2275 control mechanisms is to acknowledge data that has not yet been 2276 received, thus causing the congestion window at the TCP sender to be 2277 incremented faster than it should be. 2279 See Figure 9, on page 66 of the UK CPNI document. 2281 Optimistic ACKing attack 2283 [Savage et al, 1999] describes a number of mitigations for this 2284 vulnerability. Firstly, it describes a countermeasure based on the 2285 concept of "cumulative nonce", which would allow a receiver to prove 2286 that it has received all the segments it is acknowledging. However, 2287 this countermeasure requires the introduction of two new fields to 2288 the TCP header and, therefore, a modification to all the 2289 communicating TCPs, which makes this countermeasure hard to deploy. 2290 Secondly, it describes a possible way to encode the nonce in a TCP 2291 segment by carefully modifying its size. While this countermeasure 2292 could be easily deployed (as it is just a sender-side policy), we 2293 believe that middle-boxes such as protocol-scrubbers might prevent 2294 this countermeasure from working as expected. Finally, it suggests 2295 that a TCP sender might penalize a TCP receiver that acknowledges 2296 data not yet sent by resetting the corresponding connection. Here we 2297 discourage the implementation of this policy, as it would provide an 2298 attack vector for a TCP-based connection-reset attack, similar to 2299 those described in Section 11.
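The last mitigation depends on the sender recognizing an Acknowledgement for data it has not yet sent. The following C sketch shows only that detection step, using modulo-2^32 sequence arithmetic. The names classify_ack() and enum ack_class are hypothetical, chosen for this illustration. In line with the recommendation above, a caller would drop a segment classified as ACK_UNSENT_DATA rather than reset the connection. Note that this check cannot catch optimistic ACKs for data that has been sent but not yet received, since those fall within SND.UNA and SND.NXT.

```c
#include <assert.h>
#include <stdint.h>

enum ack_class {
    ACK_OLD_OR_DUPLICATE,  /* SEG.ACK <= SND.UNA: acknowledges nothing new */
    ACK_NEW_DATA,          /* SND.UNA < SEG.ACK <= SND.NXT                 */
    ACK_UNSENT_DATA        /* SEG.ACK > SND.NXT: data not yet sent         */
};

/*
 * Hypothetical helper: classify an incoming Acknowledgement Number against
 * the sender's SND.UNA and SND.NXT. All arithmetic is modulo 2^32, so the
 * comparison remains correct when sequence numbers wrap around.
 */
enum ack_class classify_ack(uint32_t snd_una, uint32_t snd_nxt,
                            uint32_t seg_ack)
{
    uint32_t outstanding = snd_nxt - snd_una;  /* bytes sent, not yet ACKed */
    uint32_t offset = seg_ack - snd_una;       /* distance ahead of SND.UNA */

    /* Less than half the sequence space may be outstanding at any time. */
    assert(outstanding < UINT32_C(0x80000000));

    if (offset == 0 || offset >= UINT32_C(0x80000000))
        return ACK_OLD_OR_DUPLICATE;   /* at or "behind" SND.UNA */
    if (offset <= outstanding)
        return ACK_NEW_DATA;
    return ACK_UNSENT_DATA;
}
```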
2301 [US-CERT, 2005a] is a vulnerability advisory about this issue. 2303 9.2. Blind DupACK triggering attacks against TCP 2305 While all of the attacks discussed in [Savage et al, 1999] have the 2306 goal of increasing the performance of the attacker's TCP connections, 2307 TCP congestion control mechanisms can be exploited with a variety of 2308 goals. 2310 Firstly, if bursts of many duplicate-ACKs are sent to the "sending 2311 TCP", the third duplicate-ACK will cause the "lost" segment to be 2312 retransmitted, and each subsequent duplicate-ACK will cause cwnd to 2313 be artificially inflated. Thus, the "sending TCP" might end up 2314 injecting more packets into the network than it really should, with 2315 the potential of causing network congestion. This is a potential 2316 consequence of the "Duplicate-ACK spoofing attack" described in 2317 [Savage et al, 1999]. 2319 Secondly, if bursts of three duplicate ACKs are sent to the TCP 2320 sender, the attacked system would infer packet loss, and ssthresh and 2321 cwnd would be reduced. As noted in RFC 5681 [RFC5681], causing two 2322 congestion control events back-to-back will often cut ssthresh and 2323 cwnd to their minimum value of 2*SMSS, with the connection 2324 immediately entering the slower-performing congestion avoidance 2325 phase. While it would not be attractive for an attacker to perform 2326 this attack against one of his TCP connections, the attack might be 2327 attractive when the TCP connection to be attacked is established 2328 between two other parties. 2330 It is usually assumed that in order for an off-path attacker to 2331 perform attacks against a third-party TCP connection, he should be 2332 able to guess a number of values, including a valid TCP Sequence 2333 Number and a valid TCP Acknowledgement Number. 
While this is true if 2334 the attacker tries to "inject" valid packets into the connection by 2335 himself, a feature of TCP can be exploited to fool one of the TCP 2336 endpoints into transmitting valid duplicate Acknowledgements on behalf of 2337 the attacker, hence relieving the attacker of the hard task of 2338 forging valid values for the Sequence Number and Acknowledgement 2339 Number TCP header fields. 2341 Section 3.9 of RFC 793 [RFC0793] describes the processing of incoming 2342 TCP segments as a function of the connection state and the contents 2343 of the various header fields of the received segment. For 2344 connections in the ESTABLISHED state, the first check that is 2345 performed on incoming segments is that they contain "in window" data. 2346 That is, 2348 RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND, or 2350 RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND 2352 If a segment does not pass this check, it is dropped, and an 2353 Acknowledgement is sent in response: 2355 <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> 2357 The goal of this behavior is that, in the event data segments are 2358 received by the TCP receiver, but all the corresponding 2359 Acknowledgements are lost, when the TCP sender retransmits the 2360 supposedly lost data, the TCP receiver will send an Acknowledgement 2361 reflecting all the data received so far. If "old" TCP segments were 2362 silently dropped, the scenario just described would lead to a 2363 "frozen" TCP connection, with the TCP sender retransmitting the data 2364 for which it has not yet received an Acknowledgement, and the TCP 2365 receiver silently ignoring these segments. Additionally, this behavior helps 2366 TCP to detect half-open connections.
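As an illustration, the "in window" check described above can be sketched in C with modulo-2^32 arithmetic. The function names segment_in_window() and seq_in_range() are hypothetical. The handling of zero-length segments and of a zero receive window follows the acceptability table in RFC 793, which the text above does not reproduce. A segment for which this function returns false is the kind of segment that elicits the Acknowledgement exploited by the attacks in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if seq lies in [start, start + len), computed modulo 2^32. */
static bool seq_in_range(uint32_t seq, uint32_t start, uint32_t len)
{
    return (uint32_t)(seq - start) < len;
}

/*
 * Hypothetical helper: RFC 793 segment acceptability test.
 * Returns true if the segment occupies some portion of the receive window.
 */
bool segment_in_window(uint32_t rcv_nxt, uint32_t rcv_wnd,
                       uint32_t seg_seq, uint32_t seg_len)
{
    /* Largest window TCP can advertise (window scale shift of at most 14). */
    assert(rcv_wnd <= (UINT32_C(65535) << 14));

    if (rcv_wnd == 0)   /* zero window: only an empty segment at RCV.NXT */
        return seg_len == 0 && seg_seq == rcv_nxt;

    if (seg_len == 0)   /* empty segment: its sequence number must fit */
        return seq_in_range(seg_seq, rcv_nxt, rcv_wnd);

    /* Data segment: either its first or its last byte must be in window. */
    return seq_in_range(seg_seq, rcv_nxt, rcv_wnd) ||
           seq_in_range(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}
```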
2368 This feature implies that, provided the four-tuple that identifies a 2369 given TCP connection is known or can be easily guessed, an attacker 2370 could send a TCP segment with an "out of window" Sequence Number to 2371 one of the endpoints of the TCP connection to cause it to send a 2372 valid ACK to the other endpoint of the connection. Figure 10 2373 illustrates such a scenario. 2375 See Figure 10, in page 68 of the UK CPNI document. 2377 Blind Dup-ACK forgery attack 2379 As discussed in [Watson, 2004] and RFC 4953 [Touch, 2007], there are 2380 a number of scenarios in which the four-tuple that identifies a TCP 2381 connection is known or can be easily guessed. In those scenarios, an 2382 attacker could perform any of the "blind" attacks described in the 2383 following subsections by exploiting the technique described above. 2385 The following subsections describe blind DupACK-triggering attacks 2386 that aim at either degrading the performance of an arbitrary 2387 connection, or causing a TCP sender to illegitimately increase the 2388 rate at which it transmits data, potentially leading to network 2389 congestion. 2391 9.2.1. Blind throughput-reduction attack 2393 As discussed in Section 9, when three duplicate Acknowledgements are 2394 received, the congestion window is reduced to half the current amount 2395 of outstanding data (FlightSize). Additionally, the slow-start 2396 threshold (ssthresh) is reduced to the same value, causing the 2397 connection to enter the slower-performing congestion avoidance phase. 2398 If two congestion-control events occur back to back, ssthresh and 2399 cwnd will often be reduced to their minimum value of 2*SMSS. 2401 An attacker could exploit the technique described in Section 9.2 to 2402 cause the throughput of the attacked TCP connection to be reduced, by 2403 eliciting three duplicate acknowledgements from the TCP receiver, 2404 which would cause the TCP sender to reduce its congestion window. 
In 2405 principle, the attacker would need to send a burst of only three out- 2406 of-window segments. However, in case the TCP receiver implements an 2407 acknowledgement policy such as "ACK every other segment", four out- 2408 of-window segments might be needed. The first segment would cause 2409 the pending (delayed) Acknowledgement to be sent, and the next three 2410 segments would elicit the actual duplicate Acknowledgements. 2412 Figure 11 shows a time-line graph of a sample scenario. The burst of 2413 DupACKs (in green) elicited by the burst of out-of-window segments 2414 (in red) sent by the attacker causes the TCP sender to retransmit the 2415 missing segment (in blue) and enter the loss recovery phase. Once a 2416 segment that acknowledges new data is received by the TCP sender, the 2417 loss recovery phase ends, and cwnd and ssthresh are set to half the 2418 number of segments that were outstanding when the loss recovery phase 2419 was entered. 2421 See Figure 11, in page 69 of the UK CPNI document. 2423 Blind throughput-reduction attack (time-line graph) 2425 The graphic assumes that the TCP receiver sends an Acknowledgement 2426 for every other data segment it receives, and that the TCP sender 2427 implements Appropriate Byte Counting (specified in RFC 3465 [Allman, 2428 2003]) on the received Acknowledgement segments. However, 2429 implementation of these policies is not required for the attack to 2430 succeed. 2432 9.2.2. Blind flooding attack 2434 As discussed in Section 9, when three duplicate Acknowledgements are 2435 received, the "lost" segment is retransmitted, and the congestion 2436 window is artificially inflated for each DupACK received, until the 2437 loss recovery phase ends. 
By sending a long burst of out-of-window 2438 segments to the TCP receiver of the attacked connection, an attacker 2439 could elicit a long burst of valid duplicate acknowledgements that 2440 would illegitimately cause the TCP sender of the attacked TCP 2441 connection to increase its data transmission rate. 2443 Figure 12 shows a time-line graph for this attack. The long burst of 2444 DupACKs (in green) elicited by the long burst of out-of-window 2445 segments (in red) sent by the attacker causes the TCP sender to enter 2446 the loss recovery phase and illegitimately inflate the congestion 2447 window, leading to an increase in the data transmission rate. Once a 2448 segment that acknowledges new data is received by the TCP sender, the 2449 loss recovery phase ends, and the data transmission rate is reduced. 2451 See Figure 12, in page 70 of the UK CPNI document. 2453 Blind flooding attack (time-line graph) 2455 9.2.3. Difficulty in performing the attacks 2457 In order to exploit the technique described in Section 9.2 of this 2458 document, an attacker would need to know the four-tuple {IP Source 2459 Address, TCP Source Port, IP Destination Address, TCP Destination 2460 Port} that identifies the connection to be attacked. As discussed by 2461 [Watson, 2004] and RFC 4953 [Touch, 2007], there are a number of 2462 scenarios in which these values may be known or easily guessed. 2464 It is interesting to note that the attacks described in Section 9.2 2465 of this document will typically require a much smaller number of 2466 packets than other "blind" attacks against TCP, such as those 2467 described in [Watson, 2004] and RFC 4953 [Touch, 2007], as the 2468 technique discussed in Section 9.2 relieves the attacker of having 2469 to guess valid TCP Sequence Numbers and TCP Acknowledgement 2470 Numbers. 2472 The attacks described in Section 9.2.1 and Section 9.2.2 of this 2473 document require the attacker to forge the source address of the 2474 packets it sends.
Therefore, if ingress/egress filtering is 2475 performed by intermediate systems, the attacker's packets would not 2476 get to the intended recipient, and thus the attack would not succeed. 2477 However, we consider that ingress/egress filtering cannot be relied 2478 upon as the first line of defense against these attacks. 2480 Finally, it is worth noting that in order to successfully perform the 2481 blind attacks discussed in Section 9.2.1 and Section 9.2.2 of this 2482 document, the burst of out-of-sequence segments sent by the attacker 2483 should not be intermixed with valid data segments sent by the TCP 2484 sender, or else the Acknowledgement number of the illegitimately- 2485 elicited ACK segments would change, and the Acknowledgements would 2486 not be considered "Duplicate Acknowledgements" by the TCP sender. 2487 Tests performed in real networks seem to suggest that this 2488 requirement is not hard to fulfill, though. 2490 9.2.4. Modifications to TCP's loss recovery algorithms 2492 There are a number of algorithms that augment TCP's loss recovery 2493 mechanism that have been suggested by TCP researchers and have been 2494 specified by the IETF in the RFC series. This section describes a 2495 number of these algorithms, and discusses how their implementation 2496 affects (or not) the vulnerability of TCP to the attacks discussed in 2497 Section 9.2.1 and Section 9.2.2 of this document. 2499 NewReno 2501 RFC 3782 [Floyd et al, 2004] specifies the NewReno algorithm, which 2502 is meant to improve TCP's performance in the presence of multiple 2503 losses in a single window of data. The implication of this algorithm 2504 with respect to the attacks discussed in the previous sections is 2505 that whenever either of the attacks is performed against a connection 2506 with a NewReno TCP sender, a full-window (or half a window) of data 2507 will be unnecessarily retransmitted. 
This is particularly 2508 interesting in the case of the blind-flooding attack, as the attack 2509 would elicit even more packets from the TCP sender. 2511 Whether a full window or just half a window of data is retransmitted 2512 depends on the Acknowledgement policy at the TCP receiver. If the 2513 TCP receiver sends an Acknowledgement (ACK) for every segment, a 2514 full window of data will be retransmitted. If the TCP receiver sends 2515 an Acknowledgement (ACK) for every other segment, then only half a 2516 window of data will be retransmitted. 2518 Limited Transmit 2520 RFC 3042 [Allman et al, 2001] proposes an enhancement to TCP to more 2521 effectively recover lost segments when a connection's congestion 2522 window is small, or when a large number of segments are lost in a 2523 single transmission window. The "Limited Transmit" algorithm calls 2524 for sending a new data segment in response to each of the first two 2525 Duplicate Acknowledgements that arrive at the TCP sender. This would 2526 provide two additional transmitted packets that may be useful for the 2527 attacker if the blind flooding attack described in 2528 Section 9.2.2 is performed. 2530 SACK-based loss recovery 2532 [I-D.ietf-tcpm-3517bis] specifies a conservative loss-recovery 2533 algorithm that is based on the use of the selective acknowledgement 2534 (SACK) TCP option. The algorithm uses DupACKs as an indication of 2535 congestion, as specified in RFC 5681 [RFC5681]. However, a 2536 difference between this algorithm and the basic algorithm described 2537 in RFC 5681 is that it clocks out segments only with the SACK 2538 information included in the DupACKs. That is, during the loss 2539 recovery phase, segments will be injected into the network only if the 2540 SACK information included in the received DupACKs indicates that one 2541 or more segments have left the network.
As a result, those systems 2542 that implement SACK-based loss recovery will not be vulnerable to the 2543 blind flooding attack described in Section 9.2.2. Additionally, as 2544 [I-D.ietf-tcpm-3517bis] requires DupACKs to include new SACK 2545 information (corresponding to data that has not yet been acknowledged 2546 by TCP's cumulative Acknowledgement), systems that implement SACK- 2547 based loss-recovery will not be vulnerable to the blind throughput- 2548 reduction attack described in Section 9.2.1. 2550 9.2.5. Countermeasures 2552 [draft-gont-tcpm-limiting-aow-segments-00.txt] proposes to rate-limit 2553 the reaction to out-of-window segments. This would mitigate the 2554 attacks described earlier in this section. 2556 9.3. TCP Explicit Congestion Notification (ECN) 2558 ECN (Explicit Congestion Notification) provides a mechanism for 2559 intermediate systems to signal congestion to the communicating 2560 endpoints that in some scenarios can be used as an alternative to 2561 dropping packets. 2563 RFC 3168 [Ramakrishnan et al, 2001] contains a detailed discussion of 2564 the possible ways and scenarios in which ECN could be exploited by an 2565 attacker. 2567 RFC 3540 [Spring et al, 2003] specifies an improvement to ECN based 2568 on nonces, that protects against accidental or malicious concealment 2569 of marked packets from the TCP sender. The specified mechanism 2570 defines a "NS" ("Nonce Sum") field in the TCP header that makes use 2571 of one bit from the Reserved field, and requires a modification in 2572 both of the endpoints of a TCP connection to process this new field. 2573 This mechanism is still in "Experimental" status, and since it might 2574 suffer from the behavior of some middle-boxes such as firewalls or 2575 packet-scrubbers, we defer a recommendation of this mechanism until 2576 more experience is gained. 
2578 There is also ongoing work in the research community and the IETF 2579 to define alternate semantics for the ECN field of the IP header 2580 (e.g., see [PCNWG, 2009]). 2582 RFC 3168 [RFC3168] provides a very thorough security assessment of 2583 ECN. Among the possible mitigations, it describes the use of 2584 "penalty boxes" which would act on flows that do not respond 2585 appropriately to congestion indications. Section 10 of RFC 3168 2586 suggests that a first action taken at a penalty box for an ECN- 2587 capable flow would be to switch to dropping packets (instead of 2588 marking them), and, if the flow does not respond appropriately to the 2589 congestion indication, the penalty box could reset the misbehaving 2590 connection. Here we discourage implementation of such a policy, as 2591 it would create a vector for connection-reset attacks. For example, 2592 an attacker could forge TCP segments with the same four-tuple as the 2593 targeted connection and cause them to transit the penalty box. The 2594 penalty box would first switch from marking to dropping packets. 2595 However, the attacker would continue sending forged segments at a 2596 steady rate. As a result, if the penalty box implemented such a 2597 severe policy of resetting connections for flows that still do not 2598 respond to end-to-end congestion control after switching from marking 2599 to dropping, the attacked connection would be reset. 2601 10. TCP API 2603 NOTE: THIS SECTION IS BEING EDITED. 2605 Section 3.8 of RFC 793 [RFC0793] describes the minimum set of TCP 2606 User Commands required of all TCP Implementations. Most operating 2607 systems provide an Application Programming Interface (API) that 2608 allows applications to make use of the services provided by TCP. One 2609 of the most popular APIs is the Sockets API, originally introduced in 2610 the BSD networking package [McKusick et al, 1996]. 2612 10.1.
Passive opens and binding sockets 2614 When there is already a pending passive OPEN for some local port 2615 number, TCP SHOULD NOT allow processes that do not belong to the same 2616 user to "reuse" the local port for another passive OPEN. 2617 Additionally, reuse of a local port SHOULD default to "off", and be 2618 enabled only by an explicit command (e.g., the setsockopt() function 2619 of the Sockets API). 2621 DISCUSSION: 2623 RFC 793 specifies the syntax of the "OPEN" command, which can be 2624 used to perform both passive and active opens. The syntax of this 2625 command is as follows: 2627 OPEN (local port, foreign socket, active/passive [, timeout] [, 2628 precedence] [, security/compartment] [, options]) -> local 2629 connection name 2631 When this command is used to perform a passive open (i.e., the 2632 active/passive flag is set to passive), the foreign socket 2633 parameter may be either fully-specified (to wait for a particular 2634 connection) or unspecified (to wait for any call). 2636 As discussed in Section 2.7 of RFC 793 [RFC0793], if there are 2637 several passive OPENs with the same local socket (recorded in the 2638 corresponding TCB), an incoming connection will be matched to the 2639 TCB with the more specific foreign socket. This means that when 2640 the foreign socket of a passive OPEN matches that of the incoming 2641 connection request, that passive OPEN takes precedence over those 2642 passive OPENs with an unspecified foreign socket. 2644 Popular implementations such as the Sockets API let the user 2645 specify the local socket as fully-specified {local IP address, 2646 local TCP port} pair, or as just the local TCP port (leaving the 2647 local IP address unspecified). In the former case, only those 2648 connection requests sent to {local port, local IP address} will be 2649 accepted. In the latter case, connection requests sent to any of 2650 the system's IP addresses will be accepted. 
In a similar fashion 2651 to the generic API described in Section 2.7 of RFC 793, if there 2652 is a pending passive OPEN with a fully-specified local socket that 2653 matches that for which a connection establishment request has been 2654 received, that local socket will take precedence over those which 2655 have left the local IP address unspecified. The implication of 2656 this is that an attacker could "steal" incoming connection 2657 requests meant for a local application by performing a passive 2658 OPEN that is more specific than that performed by the legitimate 2659 application. 2661 10.2. Active opens and binding sockets 2663 TCP SHOULD NOT allow port numbers that have been allocated for a TCP 2664 that is in the LISTEN or CLOSED states to be specified as the "local 2665 port" argument of the "OPEN" command. 2667 An implementation MAY relax the aforementioned restriction when the 2668 process or system user requesting allocation of such a port number is 2669 the same as the process or system user controlling the TCP in the 2670 CLOSED or LISTEN states with the same port number. 2672 DISCUSSION: 2674 As discussed in Section 10.1, the "OPEN" command specified in 2675 Section 3.8 of RFC 793 [RFC0793] can be used to perform active 2676 opens. In case of active opens, the parameter "local port" will 2677 contain a so-called "ephemeral port". While the only requirement 2678 for such an ephemeral port is that the resulting connection-id is 2679 unique, port numbers that are currently in use by a TCP in the 2680 LISTEN state should not be allowed for use as ephemeral ports. If 2681 this rule is not complied with, an attacker could potentially "steal" 2682 an incoming connection to a local server application by issuing a 2683 connection request to the victim client at roughly the same time 2684 the client tries to connect to the victim server application.
If 2685 the SYN segment corresponding to the attacker's connection request 2686 and the SYN segment corresponding to the victim client "cross each 2687 other in the network", and provided the attacker is able to know 2688 or guess the ephemeral port used by the client, a TCP simultaneous 2689 open scenario would take place, and the incoming connection 2690 request sent by the client would be matched with the attacker's 2691 socket rather than with the victim server application's socket. 2693 As already noted, in order for this attack to succeed, the 2694 attacker should be able to guess or know (in advance) the 2695 ephemeral port selected by the victim client, and be able to know 2696 the right moment to issue a connection request to the victim 2697 client. While in many scenarios this may prove to be a difficult 2698 task, some factors such as an inadequate ephemeral port selection 2699 policy at the victim client could make this attack feasible. 2701 It should be noted that most applications based on popular 2702 implementations of TCP API (such as the Sockets API) perform 2703 "passive opens" in three steps. Firstly, the application obtains 2704 a file descriptor to be used for inter-process communication 2705 (e.g., by issuing a socket() call). Secondly, the application 2706 binds the file descriptor to a local TCP port number (e.g., by 2707 issuing a bind() call), thus creating a TCP in the fictional 2708 CLOSED state. Thirdly, the aforementioned TCP is put in the 2709 LISTEN state (e.g., by issuing a listen() call). As a result, 2710 with such an implementation of the TCP API, even if port numbers 2711 in use for TCPs in the LISTEN state were not allowed for use as 2712 ephemeral ports, there is a window of time between the second and 2713 the third steps in which an attacker could be allowed to select a 2714 port number that would be later used for listening to incoming 2715 connections. 
Therefore, these implementations of the TCP API 2716 should enforce a stricter requirement for the allocation of port 2717 numbers: port numbers that are in use by a TCP in the LISTEN or 2718 CLOSED states should not be allowed for allocation as ephemeral 2719 ports. 2721 An implementation might choose to relax the aforementioned 2722 restriction when the process or system user requesting allocation 2723 of such a port number is the same as the process or system user 2724 controlling the TCP in the CLOSED or LISTEN states with the same 2725 port number. 2727 11. Blind in-window attacks 2729 NOTE: THIS SECTION IS BEING EDITED. 2731 In the last few years awareness has been raised about a number of 2732 "blind" attacks that can be performed against TCP by forging TCP 2733 segments that fall within the receive window [NISCC, 2004] [Watson, 2734 2004]. 2736 The term "blind" refers to the fact that the attacker does not have 2737 access to the packets that belong to the attacked connection. 2739 The effects of these attacks range from connection resets to data 2740 injection. While these attacks were known in the research community, 2741 they were generally considered infeasible. However, increases in 2742 bandwidth availability and the use of larger TCP windows raised 2743 concerns in the community. The following subsections discuss a 2744 number of forgery attacks against TCP, along with the possible 2745 countermeasures to mitigate their impact. 2747 11.1. Blind TCP-based connection-reset attacks 2749 Blind connection-reset attacks have the goal of causing a TCP 2750 connection maintained between two TCP endpoints to be aborted. The 2751 level of damage that the attack may cause usually depends on the 2752 application running on top of TCP, with the more vulnerable 2753 applications being those that rely on long-lived TCP connections.
2755 An interesting case of such applications is BGP [Rekhter et al, 2756 2006], in which a connection-reset usually results in the 2757 corresponding entries of the routing table being flushed. 2759 There are a variety of vectors for performing connection- 2760 reset attacks against TCP. [Watson, 2004] and [NISCC, 2004] raised 2761 awareness about connection-reset attacks that exploit the RST flag of 2762 TCP segments. [Ramaiah et al, 2008] noted that carefully crafted SYN 2763 segments could also be used to perform connection-reset attacks. 2764 This document describes two additional, previously undocumented vectors for 2765 performing connection-reset attacks: the Precedence field of IP 2766 packets that encapsulate TCP segments, and illegal TCP options. 2768 11.1.1. RST flag 2770 The RST flag signals a TCP peer that the connection should be 2771 aborted. In contrast with the FIN handshake (which gracefully 2772 terminates a TCP connection), an RST segment causes the connection to 2773 be abnormally closed. 2775 As stated in Section 3.4 of RFC 793 [RFC0793], all reset segments are 2776 validated by checking their Sequence Numbers, with the Sequence 2777 Number considered valid if it is within the receive window. In the 2778 SYN-SENT state, however, an RST is valid if the Acknowledgement 2779 Number acknowledges the SYN segment that supposedly elicited the 2780 reset. 2782 [RFC5961] proposes a modification to TCP's transition diagram to 2783 address this attack vector. The counter-measure is a combination of 2784 enforcing a stricter validation check on the sequence number of 2785 reset segments, and the addition of a "challenge" mechanism. 2787 We note that we are aware of patent claims on this counter- 2788 measure, and suggest that vendors research the consequences of the 2789 possible patents that may apply.
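The two validation policies described above can be contrasted with a minimal Python sketch. It assumes a connection in a synchronized state (the SYN-SENT rule is not modelled); the function names and the returned action strings are hypothetical and chosen only for this illustration, and the stricter policy follows our reading of [RFC5961] (an exact match with RCV.NXT resets the connection, while any other in-window value elicits a "challenge" ACK).

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def in_window(seg_seq, rcv_nxt, rcv_wnd):
    """Return True if SEG.SEQ falls within the receive window.

    With a zero window, only SEG.SEQ == RCV.NXT is treated as acceptable.
    """
    if rcv_wnd == 0:
        return seg_seq == rcv_nxt
    return (seg_seq - rcv_nxt) % MOD < rcv_wnd


def rst_action_rfc793(seg_seq, rcv_nxt, rcv_wnd):
    """RFC 793 policy: any in-window RST aborts the connection."""
    return "reset" if in_window(seg_seq, rcv_nxt, rcv_wnd) else "drop"


def rst_action_rfc5961(seg_seq, rcv_nxt, rcv_wnd):
    """Stricter policy: only an exact match with RCV.NXT aborts the connection."""
    if not in_window(seg_seq, rcv_nxt, rcv_wnd):
        return "drop"
    if seg_seq == rcv_nxt:
        return "reset"
    return "challenge-ack"  # in window, but not exactly RCV.NXT
```

Under the first policy a blind attacker only needs to hit any value inside the window, whereas under the second a forged RST must carry exactly RCV.NXT to abort the connection.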
2791 [US-CERT, 2003a] is an advisory about a firewall system that was found 2792 particularly vulnerable to reset attacks because it did not validate 2793 the TCP Sequence Number of RST segments. Clearly, all TCPs 2794 (including those in middle-boxes) should validate RST segments as 2795 discussed in this section. 2797 11.1.2. SYN flag 2799 Section 3.9 (page 71) of RFC 793 [RFC0793] states that if a SYN 2800 segment is received with a valid (i.e., "in window") Sequence Number, 2801 an RST segment should be sent in response, and the connection should 2802 be aborted. This could be leveraged to perform a blind connection- 2803 reset attack. [RFC5961] proposes a change in TCP's state diagram to 2804 mitigate this attack vector. 2806 11.1.3. Security/Compartment 2808 Section 3.9 (page 71) of RFC 793 [RFC0793] states that if the IP 2809 security/compartment of an incoming segment does not exactly match 2810 the security/compartment in the TCB, an RST segment should be sent, 2811 and the connection should be aborted. This provides 2812 another attack vector for performing connection-reset attacks, as an 2813 attacker could forge TCP segments with a security/compartment that is 2814 different from that recorded in the corresponding TCB and, as a 2815 result, the attacked connection would be reset. 2817 [draft-gont-tcpm-tcp-seccomp-prec-00.txt] aims to update RFC 793 such 2818 that this issue is eliminated. 2820 11.1.4. Precedence 2822 Section 3.9 (page 71) of RFC 793 [RFC0793] states that if the IP 2823 precedence of an incoming segment does not exactly match the 2824 precedence in the TCB, an RST segment should be sent, and the 2825 connection should be aborted. This provides another attack 2826 vector for performing connection-reset attacks, as an attacker could 2827 forge TCP segments with a precedence that is different from that 2828 recorded in the corresponding TCB and, as a result, the attacked 2829 connection would be reset.
2831 [draft-gont-tcpm-tcp-seccomp-prec-00.txt] aims to update RFC 793 such 2832 that this issue is eliminated. 2834 11.1.5. Illegal options 2836 Section 4.2.2.5 of RFC 1122 [RFC1122] discusses the processing of TCP 2837 options. It states that TCP should be prepared to handle an illegal 2838 option length (e.g., zero) without crashing, and suggests handling 2839 such illegal options by resetting the corresponding connection and 2840 logging the reason. However, this suggested behavior could be 2841 exploited to perform connection-reset attacks. 2843 [draft-gont-tcpm-tcp-illegal-option-lengths-00] aims at formally 2844 updating RFC 1122, such that this issue is eliminated. 2846 11.2. Blind data-injection attacks 2848 An attacker could try to inject data in the stream of data being 2849 transferred on the connection. As with the other attacks described 2850 in Section 11 of this document, in order to perform a blind data 2851 injection attack the attacker would need to know or guess the four- 2852 tuple that identifies the TCP connection to be attacked. 2853 Additionally, he should be able to guess a valid ("in window") TCP 2854 Sequence Number, and a valid Acknowledgement Number. 2856 As discussed in Section 3.4 of this document, [Ramaiah et al, 2008] 2857 proposes to enforce a more strict check on the Acknowledgement Number 2858 of incoming segments than that specified in RFC 793 [RFC0793]. 2860 Implementation of the proposed check requires more packets on the 2861 side of the attacker to successfully perform a blind data-injection 2862 attack. However, it should be noted that applications concerned with 2863 any of the attacks discussed in Section 11 of this document should 2864 make use of proper authentication techniques, such as those specified 2865 for IPsec in RFC 4301 [Kent and Seo, 2005]. 2867 12. Information leaking 2869 NOTE: THIS SECTION IS BEING EDITED. 2871 12.1. 
Remote Operating System detection via TCP/IP stack fingerprinting 2873 Clearly, remote Operating System (OS) detection is a useful tool for 2874 attackers. Tools such as nmap [Fyodor, 2006b] can usually detect the 2875 operating system type and version of a remote system with 2876 remarkable accuracy. This information can in turn be used 2877 by attackers to tailor their exploits to the identified operating 2878 system type and version. 2880 Evasion of OS fingerprinting can prove to be a very difficult task. 2881 Most systems make use of a variety of protocols, each of which has a 2882 large number of parameters that can be set to arbitrary values. 2883 Thus, information on the operating system may be obtained from a 2884 number of sources ranging from application banners to more obscure 2885 parameters such as TCP's retransmission timer. 2887 Nmap [Fyodor, 2006b] is probably the most popular tool for remote OS 2888 detection via active TCP/IP stack fingerprinting. p0f [Zalewski, 2889 2006a], on the other hand, is a tool for performing remote OS 2890 detection via passive TCP/IP stack fingerprinting. SinFP [SinFP, 2891 2006] can perform both active and passive fingerprinting. Finally, 2892 TBIT [TBIT, 2001] is a TCP fingerprinting tool that aims at 2893 characterizing the behavior of a remote TCP peer based on active 2894 probes, and which has been widely used in the research community. 2896 TBIT [TBIT, 2001] implements a number of tests not present in other 2897 tools, such as characterizing the behavior of a TCP peer with respect 2898 to TCP congestion control. 2900 [Fyodor, 1998] and [Fyodor, 2006a] are classic papers on the subject. 2901 [Miller, 2006] and [Smith and Grundl, 2002] provide an introduction 2902 to passive TCP/IP stack fingerprinting. [Smart et al, 2000] and 2903 [Beck, 2001] discuss some techniques for evading OS detection through 2904 TCP/IP stack fingerprinting.
2906 The following subsections discuss TCP-based techniques for remote OS 2907 detection via TCP/IP stack fingerprinting and, where possible, propose ways to mitigate them. 2909 12.1.1. FIN probe 2911 TCP MUST silently drop any TCP segments received for a connection in 2912 the LISTEN state that do not have the SYN, RST, or ACK flags set. In 2913 the rest of the cases, the processing rules in RFC 793 MUST be 2914 applied. 2916 DISCUSSION: 2918 The attacker sends a FIN (or any packet without the SYN or the ACK 2919 flags set) to an open port. RFC 793 [RFC0793] leaves the reaction 2920 to such segments unspecified. As a result, some implementations 2921 silently drop the received segment, while others respond with an 2922 RST. 2924 12.1.2. Bogus flag test 2926 TCP MUST ignore any flags not supported, and MUST NOT reflect them if 2927 a TCP segment is sent in response to the one just received. 2929 DISCUSSION: 2931 The attacker sends a TCP segment setting at least one bit of the 2932 Reserved field. Some implementations ignore this field, while 2933 others reset the corresponding connection or reflect the field in 2934 the TCP segment sent in response. 2936 12.1.3. TCP ISN sampling 2938 The attacker samples a number of Initial Sequence Numbers by sending 2939 a number of connection requests. Many TCP implementations differ in 2940 the ISN generator they implement, thus allowing the correlation of 2941 the ISN generation algorithm to the operating system type and version. 2943 This document advises implementing an ISN generator that follows the 2944 behavior described in RFC 1948 [Bellovin, 1996]. However, it should 2945 be noted that even if all TCP implementations generated their ISNs as 2946 proposed in RFC 1948, there are still a number of implementation 2947 details that are left unspecified, which would allow remote OS 2948 fingerprinting by means of ISN sampling. For example, the time- 2949 dependent parameter of the hash could have a different frequency in 2950 different TCP implementations.
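As a rough illustration of the point made in Section 12.1.3, the following Python sketch generates ISNs in the style of RFC 1948, i.e., ISN = M + F(connection-id, secret), where M is a timer and F is a hash. The function name, the `now_us` and `tick_us` parameters, and the way the four-tuple is serialized before hashing are assumptions made for this example; RFC 1948 suggests MD5 for F and a 4-microsecond timer for M, which is what the defaults mirror.

```python
import hashlib
import time

MOD = 2 ** 32  # ISNs are 32-bit values


def isn_rfc1948(local_ip, local_port, remote_ip, remote_port, secret,
                now_us=None, tick_us=4):
    """Return an ISN computed as M + F(four-tuple, secret), modulo 2**32.

    now_us  -- current time in microseconds (hypothetical parameter, so that
               the example is deterministic when a value is supplied)
    tick_us -- period of the timer M in microseconds (4 in RFC 1948)
    """
    if now_us is None:
        now_us = time.time_ns() // 1000
    m = now_us // tick_us
    # Serialization of the connection-id is an assumption of this sketch.
    conn_id = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hashlib.md5(conn_id + secret).digest()
    f = int.from_bytes(digest[:4], "big")
    return (m + f) % MOD
```

For a fixed four-tuple, two ISNs sampled one second apart differ by 250000 with a 4-microsecond timer, but by 1000 with a 1-millisecond timer, so the timer frequency remains observable to a remote party even though the hash offset is not.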
2952 12.1.4. TCP initial window 2954 Many TCP implementations differ in the initial TCP window they use. 2955 There are a number of factors that should be considered when 2956 selecting the TCP window to be used for a given system. A number of 2957 implementations that use static windows (i.e., no automatic buffer 2958 tuning mechanisms are implemented) default to a window of around 32 2959 KB, which seems sensible for the general case. On the other hand, a 2960 window of 4 KB seems to be common practice for connections servicing 2961 critical applications such as BGP. It is clear that the window size 2962 is a tradeoff among a number of considerations. Section 3.7 2963 discusses some of the considerations that should be made when 2964 selecting the window size for a TCP connection. 2966 If automatic tuning mechanisms are implemented, we suggest that the 2967 initial window be at least 4 * RMSS segments. We note that a 2968 remote OS fingerprinting tool could still sample the advertised TCP 2969 window, trying to correlate the advertised window with the potential 2970 automatic buffer tuning algorithm and Operating System. 2972 12.1.5. RST sampling 2974 If an RST must be sent in response to an incoming segment, then if 2975 the ACK bit of an incoming TCP segment is off, a Sequence Number of 2976 zero MUST be used in the RST segment sent in response. That is, 2978 <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> 2980 It should be noted that the SEG.LEN value used for the 2981 Acknowledgement Number MUST be incremented once for each flag set in 2982 the original segment that makes use of a byte of the sequence number 2983 space. That is, if only one of the SYN or FIN flags were set in the 2984 received segment, the Acknowledgement Number of the response should 2985 be set to SEG.SEQ+SEG.LEN+1. If both the SYN and FIN flags were set 2986 in the received segment, the Acknowledgement Number should be set to 2987 SEG.SEQ+SEG.LEN+2.
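The rule above can be sketched as follows. This is only an illustration: the function name and the dictionary used to represent the outgoing segment are hypothetical, and `payload_len` stands for the number of data bytes in the received segment (the value the text above calls SEG.LEN before the per-flag increments).

```python
MOD = 2 ** 32  # sequence and acknowledgement numbers wrap at 2**32


def rst_for_segment_without_ack(seg_seq, payload_len, syn=False, fin=False):
    """Build the RST fields for an incoming segment whose ACK bit is off."""
    # SYN and FIN each consume one byte of the sequence number space.
    consumed = payload_len + int(syn) + int(fin)
    return {
        "seq": 0,                          # Sequence Number of zero
        "ack": (seg_seq + consumed) % MOD,  # SEG.SEQ + SEG.LEN (+1 per flag)
        "flags": {"RST", "ACK"},
    }
```

For example, a bare SYN with SEG.SEQ=100 would be answered with an Acknowledgement Number of 101, and a segment carrying 10 bytes of data with both SYN and FIN set would be answered with 112.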
2989 We also RECOMMEND that TCP set the ACK bit (and the Acknowledgement 2990 Number) in all outgoing RST segments, as it allows for additional 2991 validation checks to be enforced at the system receiving the segment. 2993 DISCUSSION: 2995 [Fyodor, 1998] reports that many implementations differ in the 2996 Acknowledgement Number they use in response to segments received 2997 for connections in the CLOSED state. In particular, these 2998 implementations differ in the way they construct the RST segment 2999 that is sent in response to those TCP segments received for 3000 connections in the CLOSED state. 3002 RFC 793 [RFC0793] describes (in pages 36-37) how RST segments are 3003 to be generated. According to this RFC, the ACK bit (and the 3004 Acknowledgement Number) is set in an RST only if the incoming 3005 segment that elicited the RST did not have the ACK bit set (and 3006 thus the Sequence Number of the outgoing RST segment must be set 3007 to zero). However, we recommend that TCP implementations set the 3008 ACK bit (and the Acknowledgement Number) in all outgoing RST 3009 segments, as it allows for additional validation checks to be 3010 enforced at the system receiving the segment. 3012 12.1.6. TCP options 3014 Different implementations differ in the TCP options they enable by 3015 default. Additionally, they differ in the actual contents of the 3016 options, and in the order in which the options are included in a TCP 3017 segment. There is currently no recommendation on the order in which 3018 to include TCP options in TCP segments. 3020 12.1.7. Retransmission Timeout (RTO) sampling 3022 TCP uses a retransmission timer for retransmitting data in the 3023 absence of any feedback from the remote data receiver. The duration 3024 of this timer is referred to as "retransmission timeout" (RTO). RFC 3025 2988 [Paxson and Allman, 2000] specifies the algorithm for computing 3026 the TCP retransmission timeout (RTO).
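For reference, the RTO computation of RFC 2988 can be sketched in a few lines of Python. The class and parameter names are hypothetical; the constants (alpha = 1/8, beta = 1/4, K = 4), the update order and the one-second lower bound follow our reading of RFC 2988. The sketch does not model quantization of the RTT samples themselves by the clock, which a real implementation would also exhibit.

```python
class RtoEstimator:
    """Sketch of the RFC 2988 RTO computation (times in seconds)."""

    K = 4
    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self, granularity, min_rto=1.0):
        self.g = granularity    # clock granularity G
        self.min_rto = min_rto  # RFC 2988 rounds RTOs below 1 second up
        self.srtt = None
        self.rttvar = None

    def sample(self, r):
        """Feed one RTT measurement r and return the resulting RTO."""
        if self.srtt is None:
            # First measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        return max(self.min_rto, rto)
```

With a steady RTT, RTTVAR decays towards zero, so the RTO tends to SRTT + G whenever the lower bound is not the limiting factor; the clock granularity G is therefore directly reflected in the timeout that a remote observer can measure.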
3028 The algorithm allows the use of clocks of different granularities, to 3029 accommodate the different granularities used by the existing 3030 implementations. Thus, the difference in the resulting RTO can be 3031 used for remote OS fingerprinting. [Veysset et al, 2002] describes 3032 how to perform remote OS fingerprinting by sampling and analyzing the 3033 RTO of the target system. However, this fingerprinting technique has 3034 at least the following drawbacks: 3036 o It is usually much slower than other fingerprinting techniques, as 3037 it may require considerable time to sample the RTO of a given 3038 target. 3040 o It is less reliable than other fingerprinting techniques, as 3041 latency and packet loss can lead to bogus results. 3043 While in principle it would be possible to defeat this fingerprinting 3044 technique (e.g., by obfuscating the granularity of the clock used for 3045 computing the RTO), we consider that a more important step to defeat 3046 remote OS detection is for implementations to address the more 3047 effective fingerprinting techniques described in Sections 12.1.1 3048 through 12.1.6 of this document. 3050 12.2. System uptime detection 3052 The "uptime" of a system may prove to be valuable information to an 3053 attacker. For example, it might reveal the last time a security 3054 patch was applied. Information about system uptime is usually leaked 3055 by TCP header fields or options that are (or may be) time-dependent, 3056 and are usually initialized to zero when the system is bootstrapped. 3057 As a result, if the attacker knows the frequency with which the 3058 corresponding parameter or header field is incremented, and is able 3059 to sample the current value of that parameter or header field, the 3060 system uptime will be easily obtained.
Two fields that can 3061 potentially reveal the system uptime are the Sequence Number field of 3062 a SYN or SYN/ACK segment (i.e., when it contains an ISN) and the 3063 TSval field of the timestamp option. Section 3.3.1 of this document 3064 discusses the generation of TCP Initial Sequence Numbers. Section 3065 4.7.1 of this document discusses the generation of TCP timestamps. 3067 13. Covert channels 3069 As with virtually every communications protocol, TCP can be exploited to 3070 establish covert channels. While an exhaustive discussion of covert 3071 channels is out of the scope of this document, for completeness of 3072 the document we simply note that it is possible for a (probably 3073 malicious) user to establish a covert channel by means of TCP, such 3074 that data can be surreptitiously passed to a remote system, probably 3075 unnoticed by a monitoring system, and with the possibility of 3076 concealing the location of the source system. 3078 In most cases, covert channels based on manipulation of TCP fields 3079 can be eliminated by protocol scrubbers and other middle-boxes. On 3080 the other hand, "timing channels" may prove to be more difficult to 3081 eliminate. 3083 [Rowland, 1997] contains a discussion of covert channels in the 3084 TCP/IP protocol suite, with some TCP-based examples. [Giffin et al, 3085 2002] describes the use of TCP timestamps for the establishment of 3086 covert channels. [Zander, 2008] contains an extensive bibliography 3087 of papers on covert channels, and a list of freely-available tools 3088 that implement covert channels with the TCP/IP protocol suite. 3090 14. TCP Port scanning 3092 NOTE: THIS SECTION IS BEING EDITED. 3094 TCP port scanning aims at identifying TCP port numbers on which there 3095 is a process listening for incoming connections. That is, it aims at 3096 identifying TCPs at the target system that are in the LISTEN state. 
3097 The following subsections describe different TCP port scanning 3098 techniques that have been implemented in freely-available tools. 3099 These subsections focus only on those port scanning techniques that 3100 exploit features of TCP itself, and not of other communication 3101 protocols. 3103 For example, the following subsections do not discuss the 3104 exploitation of application protocols (such as FTP) or the 3105 exploitation of features of underlying protocols (such as the IP 3106 Identification field) for port-scanning purposes. 3108 14.1. Traditional connect() scan 3110 The most trivial scanning technique consists of trying to perform the 3111 TCP three-way handshake with each of the port numbers at the target 3112 system (e.g., by issuing a call to the connect() function of the 3113 Sockets API). The three-way handshake will complete for port numbers 3114 that are "open", but will fail for those port numbers that are 3115 "closed". 3117 As this port-scanning technique can be implemented by issuing a call 3118 to the connect() function of the Sockets API that normal applications 3119 use, it does not require the attacker to have superuser privileges. 3120 The downside of this port-scanning technique is that it is less 3121 efficient than other scanning methods (e.g., the "SYN scan" described 3122 in Section 14.2), and that it can be easily logged by the target 3123 system. 3125 14.2. SYN scan 3127 The SYN scan was introduced as a "stealth" port-scanning technique. 3128 It aims at preventing the target system from logging the port scan by 3129 not completing the TCP three-way handshake. When a SYN/ACK segment 3130 is received in response to the initial SYN segment, the system 3131 performing the port scan will respond with an RST segment, thus 3132 preventing the three-way handshake from completing. 
While this port- 3133 scanning technique is harder to detect and log than the traditional 3134 connect() scan described in Section 14.1, most current NIDS (Network 3135 Intrusion Detection Systems) can detect and log it. 3137 SYN scans are sometimes mistakenly reported as "SYN flood" attacks by 3138 NIDS, though. 3140 The main advantage of this port scanning technique is that it is much 3141 more efficient than the traditional connect() scan. 3143 In order to implement this port-scanning technique, port-scanning 3144 tools usually bypass the TCP API, and forge the SYN segments they 3145 send (e.g., by using raw sockets). This typically requires the 3146 attacker to have superuser privileges to be able to run the port- 3147 scanning tool. 3149 14.3. FIN, NULL, and XMAS scans 3151 TCP SHOULD respond with an RST when a TCP segment is received for a 3152 connection in the LISTEN state, and the incoming segment has neither 3153 the SYN bit nor the RST bit set. 3155 DISCUSSION: 3157 RFC 793 [RFC0793] states, on page 65, that an incoming segment 3158 that does not have the RST bit set and that is received for a 3159 connection in the fictional state CLOSED causes an RST to be sent 3160 in response. Pages 65-66 of RFC 793 describe the processing of 3161 incoming segments for connections in the state LISTEN, and 3162 implicitly state that an incoming segment that does not have the 3163 ACK bit set (and is not a SYN or an RST) should be silently 3164 dropped. 3166 As a result, an attacker can exploit this situation to perform a 3167 port scan by sending TCP segments that do not have the ACK bit set 3168 to the target system. When a port is "closed" (i.e., there is a TCP 3169 in the fictional state CLOSED), the target system 3170 will respond with an RST segment. On the other hand, if the port 3171 is "open" (i.e., there is a TCP in the LISTEN state on the corresponding port), 3172 the attacker will not get any response from the target system. 
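The RFC 793 processing rules that this scan relies on can be illustrated with a small Python model. This is a sketch under stated assumptions, not an implementation of any TCP stack: the names rfc793_reply, infer_port_state, and the hardened switch are invented for this example. In the model, a TCP in the CLOSED state answers any segment that lacks the RST bit with an RST, while a TCP in the LISTEN state silently drops a segment that carries none of the RST, ACK, or SYN bits. With hardened=True the model applies the recommendation made at the start of this section, so that LISTEN also answers such segments with an RST.

```python
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

# Control-bit combinations of the three classic probes.
PROBES = {"FIN": FIN, "NULL": 0x00, "XMAS": FIN | PSH | URG}


def rfc793_reply(state, flags, hardened=False):
    """Return the control bits of the reply, or None if the segment is dropped."""
    if state not in ("CLOSED", "LISTEN"):
        raise ValueError("this model only covers the CLOSED and LISTEN states")
    if flags & RST:
        return None  # an incoming RST is never answered
    if state == "CLOSED":
        # RST alone if the probe carried an ACK, RST+ACK otherwise.
        return RST if flags & ACK else RST | ACK
    # LISTEN state from here on.
    if flags & ACK:
        return RST
    if flags & SYN:
        return SYN | ACK
    # Neither RST, ACK nor SYN: RFC 793 drops the segment silently.
    # Assumption: a hardened stack replies in the same form as CLOSED does.
    return RST | ACK if hardened else None


def infer_port_state(reply):
    """Attacker-side inference from the reply to a FIN, NULL, or XMAS probe."""
    if reply is None:
        return "open"  # silence; a packet filter could also cause this
    if reply & RST:
        return "closed"
    return "unknown"
```

In this model, each of the three probes yields "open" for a LISTEN socket and "closed" for a CLOSED one, whereas with hardened=True both cases yield "closed" and the probe no longer distinguishes them.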
3174 Since the only requirement for exploiting this port scanning 3175 vector is that the probe segments must not have the ACK bit set, 3176 there are a number of different TCP control-bits combinations that 3177 can be used for the probe segments. 3179 When the probe segment sent to the target system is a TCP segment 3180 that has only the FIN bit set, the scanning technique is usually 3181 referred to as a "FIN scan". When the probe packet is a TCP 3182 segment that does not have any of the control bits set, the 3183 scanning technique is usually known as a "NULL scan". Finally, 3184 when the probe packet sent to the target system has only the FIN, 3185 PSH, and URG bits set, the port-scanning technique is known as 3186 an "XMAS scan". 3188 It should be clear that while the aforementioned control-bits 3189 combinations are the most popular ones, other combinations could 3190 be used to exploit this port-scanning vector. For example, the 3191 CWR, ECE, and/or any of the Reserved bits could be set in the 3192 probe segments. 3194 The advantage of this port-scanning technique is that it can 3195 bypass some stateless firewalls. However, the downside is that a 3196 number of implementations do not comply strictly with RFC 793 3197 [RFC0793], and thus always respond to the probe segments with an 3198 RST, regardless of whether the port is open or closed. 3200 This port-scanning vector can be easily defeated by responding 3201 with an RST when a TCP segment is received for a connection in the 3202 LISTEN state, and the incoming segment has neither the SYN bit nor 3203 the RST bit set. 3205 14.4. Maimon scan 3207 If a TCP that is in the CLOSED or LISTEN states receives a TCP 3208 segment with both the FIN and ACK bits set, it MUST respond with an 3209 RST. 
3211 DISCUSSION: 3213 This port scanning technique was introduced in [Maimon, 1996] with 3214 the name "StealthScan" (method #1), and was later incorporated 3215 into the nmap tool [Fyodor, 2006b] as the "Maimon scan". 3217 This port scanning technique employs TCP segments that have both 3218 the FIN and ACK bits set as the probe segments. While according 3219 to RFC 793 [RFC0793] these segments should elicit an RST 3220 regardless of whether the corresponding port is open or closed, a 3221 programming flaw found in a number of TCP implementations has 3222 caused some systems to silently drop the probe segment if the 3223 corresponding port was open (i.e., there was a TCP in the LISTEN 3224 state), and respond with an RST only if the port was closed. 3226 Therefore, an RST would indicate that the scanned port is closed, 3227 while the absence of a response from the target system would 3228 indicate that the scanned port is open. 3230 While this bug has not been found in current implementations of 3231 TCP, it might still be present in some legacy systems. 3233 14.5. Window scan 3235 When sending an RST segment, TCP SHOULD set the Window field to zero. 3237 DISCUSSION: 3239 This port-scanning technique employs ACK segments as the probe 3240 packets. ACK segments will elicit an RST from the target system 3241 regardless of whether the corresponding TCP port is open or 3242 closed. However, as described in [Maimon, 1996], some systems set 3243 the Window field of the RST segments to different values 3244 depending on whether the corresponding TCP port is open or closed. 3245 These systems set the Window field of their RST segments to zero 3246 when the corresponding TCP port is closed, and set the Window 3247 field to a non-zero value when the corresponding TCP port is open. 
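As a sketch of the check that would be applied against such systems, the following Python fragment parses the fixed 20-byte TCP header of a reply and classifies the port from the Window field of an RST segment. The function names are assumptions made for this example. The classification only holds for implementations that exhibit the behaviour described above; a TCP that sets the Window field to zero in every RST segment makes every port look closed.

```python
import struct

RST = 0x04


def parse_tcp_header(segment):
    """Extract the control bits and Window field from a raw TCP segment."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Source port, destination port, sequence number, acknowledgement number,
    # data offset plus control bits, window, checksum, urgent pointer.
    (_sport, _dport, _seq, _ack,
     offset_and_flags, window, _checksum, _urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    return {"flags": offset_and_flags & 0x00FF, "window": window}


def window_scan_verdict(segment):
    """Classify a port from the reply elicited by an ACK probe."""
    header = parse_tcp_header(segment)
    if not header["flags"] & RST:
        return "unexpected"  # an ACK probe should only ever elicit an RST
    return "open" if header["window"] != 0 else "closed"
```

For example, an RST segment carrying a Window of 4096 is classified as "open", and one carrying a Window of zero as "closed".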
3249 As a result, an attacker could exploit this situation for 3250 performing a port scan by sending ACK segments to the target 3251 system, and examining the Window field of the RST segments that 3252 his probe segments elicit. 3254 In order to defeat this port-scanning technique, we recommend that TCP 3255 implementations set the Window field to zero in all the RST 3256 segments they send. Most popular implementations of TCP already 3257 implement this policy. 3259 14.6. ACK scan 3261 The so-called "ACK scan" is not really a port-scanning technique 3262 (i.e., it does not aim at determining whether a specific port is open 3263 or closed), but rather aims at determining whether some intermediate 3264 system is filtering TCP segments sent to that specific port number. 3266 The probe packet is a TCP segment with the ACK bit set which, 3267 according to RFC 793 [RFC0793], should elicit an RST from the target 3268 system regardless of whether the corresponding TCP port is open or 3269 closed. If no response is received from the target system, it is 3270 assumed that some intermediate system is filtering the probe packets 3271 sent to the target system. 3273 It should be noted that this "port scanning" technique exploits 3274 basic TCP processing rules, and therefore cannot be defeated at an 3275 end-system. 3277 15. Processing of ICMP error messages by TCP 3279 [RFC5927] analyzes a number of vulnerabilities based on crafted ICMP 3280 messages, along with possible counter-measures. 3282 16. TCP interaction with the Internet Protocol (IP) 3283 16.1. TCP-based traceroute 3285 The traceroute tool is used to identify the intermediate systems between the 3286 local system and the destination system. It is usually implemented 3287 by sending "probe" packets with increasing IP Time to Live values 3288 (starting from 0), without maintaining any state with the final 3289 destination. 
3291 Some traceroute implementations use ICMP "echo request" messages as 3292 the probe packets, while others use UDP packets or TCP SYN segments. 3294 In some cases, the stateless nature of the traceroute tool may 3295 prevent it from working correctly across stateful devices such as 3296 Network Address Translators (NATs) or firewalls. 3298 In order to bypass this limitation, an attacker could establish a 3299 TCP connection with the destination system, and start sending TCP 3300 segments on that connection with increasing IP Time to Live values 3301 (starting from 0) [Zalewski, 2007] [Zalewski, 2008]. Provided ICMP 3302 error messages are not blocked by any intermediate system, an 3303 attacker could exploit this technique to map the network topology 3304 behind the aforementioned stateful devices in scenarios in which he 3305 could not have achieved this goal using the traditional traceroute 3306 tool. 3308 NATs [Srisuresh and Egevang, 2001] and other middle-boxes could 3309 defeat this network-mapping technique by overwriting the Time to Live 3310 of the packets they forward to the internal network. For example, 3311 they could overwrite the Time to Live of all packets being forwarded 3312 to an internal network with a value such as 128. We strongly 3313 recommend against overwriting the IP Time to Live field with the 3314 value 255 or other similar large values, as this could allow an 3315 attacker to bypass the protection provided by the Generalized TTL 3316 Security Mechanism (GTSM) described in RFC 5082 [Gill et al, 2007]. 3318 [Gont and Srisuresh, 2008] discusses the security implications of 3319 NATs, and proposes mitigations for this and other issues. 3321 16.2. 
Blind TCP data injection through fragmented IP traffic 3323 As discussed in Section 11.2, TCP data injection attacks usually 3324 require an attacker to guess or know a number of parameters related 3325 to the target TCP connection, such as the connection-id {Source 3326 Address, Source Port, Destination Address, Destination Port}, the TCP 3327 Sequence Number, and the TCP Acknowledgement Number. Provided these 3328 values are obfuscated as recommended in this document, the chances of 3329 an off-path attacker successfully performing a data injection 3330 attack against a TCP connection are fairly low for many of the most 3331 common scenarios. 3333 As discussed in this document, randomization of the values contained 3334 in different TCP header fields is not a replacement for cryptographic 3335 methods for protecting a TCP connection, such as IPsec (specified in 3336 RFC 4301 [Kent and Seo, 2005]). 3338 However, [Zalewski, 2003b] describes a possible vector for performing 3339 a TCP data injection attack that does not require the attacker to 3340 guess or know the aforementioned TCP connection parameters, and could 3341 therefore be successfully exploited in some scenarios with less 3342 effort than that required to exploit the more traditional data- 3343 injection attack vectors. 3345 The attack vector works as follows. When one system is transferring 3346 information to a remote peer by means of TCP, and the resulting 3347 packet gets fragmented, the first fragment will usually contain the 3348 entire TCP header which, together with the IP header, includes all 3349 the connection parameters that an attacker would need to guess or 3350 know to successfully perform a data injection attack against TCP. 
If 3351 an attacker were able to forge all the fragments other than the first 3352 one, his forged fragments could be reassembled together with the 3353 legitimate first fragment, and thus he would be relieved from the 3354 hard task of guessing or knowing connection parameters such as the 3355 TCP Sequence Number and the TCP Acknowledgement Number. 3357 In order to successfully exploit this attack vector, the attacker 3358 should be able to guess or know both of the IP addresses involved in 3359 the target TCP connection, the IP Identification value used for the 3360 specific packet he is targeting, and the TCP Checksum of that target 3361 packet. While it would seem that these values are hard to guess, in 3362 some specific scenarios, and with some security-unwise implementation 3363 approaches for the TCP and IP protocols, these values may be feasible 3364 to guess or know. For example, if the sending system uses 3365 predictable IP Identification values, the attacker could simply 3366 perform a brute force attack, trying each of the possible 3367 combinations for the TCP Checksum field. In more specific scenarios, 3368 the attacker could have more detailed knowledge about the data being 3369 transferred over the target TCP connection, which might allow him to 3370 predict the TCP Checksum of the target packet. For example, if both 3371 of the involved TCP peers used predictable values for the TCP 3372 Sequence Number and for the IP Identification fields, and the 3373 attacker knew the data being transferred over the target TCP 3374 connection, he could be able to carefully forge the IP payload of his 3375 IP fragments so that the checksum of the reassembled TCP segment 3376 matched the Checksum included in the TCP header of the first (and 3377 legitimate) IP fragment. 3379 As discussed in Section 4.1 of [CPNI, 2008], IP fragmentation 3380 provides a vector for performing a variety of attacks against an IP 3381 implementation. 
Therefore, we discourage the reliance on IP 3382 fragmentation by end-systems, and recommend the implementation of 3383 mechanisms for the discovery of the Path-MTU, such as that described 3384 in Section 15.7.3 of this document and/or that described in RFC 4821 3385 [Mathis and Heffner, 2007]. We nevertheless recommend randomization 3386 of the IP Identification field as described in Section 3.5.2 of 3387 [CPNI, 2008]. While randomization of the IP Identification field 3388 does not eliminate this attack vector, it does require more work on 3389 the side of the attacker to successfully exploit it. 3391 16.3. Broadcast and multicast IP addresses 3393 TCP connection state is maintained between only two endpoints at a 3394 time. As a result, broadcast and multicast IP addresses should not 3395 be allowed for the establishment of TCP connections. Section 4.3 of 3396 [CPNI, 2008] provides advice about which specific IP address blocks 3397 should not be allowed for connection-oriented protocols such as TCP. 3399 17. Security Considerations 3401 This document provides a thorough security assessment of the 3402 Transmission Control Protocol (TCP), identifies a number of 3403 vulnerabilities, and specifies possible counter-measures. 3404 Additionally, it provides implementation guidance such that the 3405 resilience of TCP implementations is improved. 3407 18. Acknowledgements 3409 The author would like to thank (in alphabetical order) David Borman, 3410 Wesley Eddy, Alfred Hoenes, and Michael Scharf, for providing 3411 valuable feedback on earlier versions of this document. 3413 This document is heavily based on the document "Security Assessment 3414 of the Transmission Control Protocol (TCP)" [CPNI, 2009] written by 3415 Fernando Gont on behalf of CPNI (Centre for the Protection of 3416 National Infrastructure). 
3418 The author would like to thank (in alphabetical order) Randall 3419 Atkinson, Guillermo Gont, Alfred Hoenes, Jamshid Mahdavi, Stanislav 3420 Shalunov, Michael Welzl, Dan Wing, Andrew Yourtchenko, Michal 3421 Zalewski, and Christos Zoulas, for providing valuable feedback on 3422 earlier versions of the UK CPNI document. 3424 Additionally, the author would like to thank (in alphabetical order) 3425 Mark Allman, David Black, Ethan Blanton, David Borman, James Chacon, 3426 John Heffner, Jerrold Leichter, Jamshid Mahdavi, Keith Scott, Bill 3427 Squier, and David White, who generously answered a number of 3428 questions that arose while the aforementioned document was being 3429 written. 3431 Finally, the author would like to thank CPNI (formerly NISCC) for 3432 their continued support. 3434 19. References (to be translated to xml) 3436 Abley, J., Savola, P., Neville-Neil, G. 2007. Deprecation of Type 0 3437 Routing Headers in IPv6. RFC 5095. 3439 Allman, M. 2003. TCP Congestion Control with Appropriate Byte 3440 Counting (ABC). RFC 3465. 3442 Allman, M. 2008. Comments On Selecting Ephemeral Ports. Available 3443 at: http://www.icir.org/mallman/share/ports-dec08.pdf 3445 Allman, M., Paxson, V., Stevens, W. 1999. TCP Congestion Control. 3446 RFC 2581. 3448 Allman, M., Balakrishnan, H., Floyd, S. 2001. Enhancing TCP's Loss 3449 Recovery Using Limited Transmit. RFC 3042. 3451 Allman, M., Floyd, S., and C. Partridge. 2002. Increasing TCP's 3452 Initial Window. RFC 3390. 3454 Baker, F. 1995. Requirements for IP Version 4 Routers. RFC 1812. 3456 Baker, F., Savola, P. 2004. Ingress Filtering for Multihomed 3457 Networks. RFC 3704. 3459 Barisani, A. 2006. FTester - Firewall and IDS testing tool. 3460 Available at: http://dev.inversepath.com/trac/ftester 3462 Beck, R. 2001. Passive-Aggressive Resistance: OS Fingerprint 3463 Evasion. Linux Journal. 3465 Bellovin, S. M. 1989. Security Problems in the TCP/IP Protocol 3466 Suite. Computer Communication Review, Vol. 19, No. 
2, pp. 32-48. 3468 Bellovin, S. M. 1996. Defending Against Sequence Number Attacks. 3469 RFC 1948. 3471 Bellovin, S. M. 2006. Towards a TCP Security Option. IETF Internet- 3472 Draft (draft-bellovin-tcpsec-00.txt), work in progress. 3474 Bernstein, D. J. 1996. SYN cookies. Available at: 3475 http://cr.yp.to/syncookies.html 3477 Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, 3478 W., 1998. An Architecture for Differentiated Services. RFC 2475. 3480 Blanton, E., Allman, M., Fall, K., Wang, L. 2003. A Conservative 3481 Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for 3482 TCP. RFC 3517. 3484 Borman, D. 1997. Post to the tcp-impl mailing-list. Message-Id: 3485 <199706061526.KAA01535@frantic.BSDI.COM>. Available at: 3486 http://www.kohala.com/start/borman.97jun06.txt 3488 Borman, D., Deering, S., Hinden, R. 1999. IPv6 Jumbograms. RFC 3489 2675. 3491 Braden, R. 1989. Requirements for Internet Hosts -- Communication 3492 Layers. RFC 1122. 3494 Braden, R. 1992. Extending TCP for Transactions -- Concepts. RFC 3495 1379. 3497 Braden, R. 1994. T/TCP -- TCP Extensions for Transactions Functional 3498 Specification. RFC 1644. 3500 CCSDS. 2006. Consultative Committee for Space Data Systems (CCSDS) 3501 Recommendation Communications Protocol Specification (SCPS) -- 3502 Transport Protocol (SCPS-TP). Blue Book. Issue 2. Available at: 3503 http://public.ccsds.org/publications/archive/714x0b2.pdf 3505 CERT. 1996. CERT Advisory CA-1996-21: TCP SYN Flooding and IP 3506 Spoofing Attacks. Available at: 3507 http://www.cert.org/advisories/CA-1996-21.html 3509 CERT. 1997. CERT Advisory CA-1997-28 IP Denial-of-Service Attacks. 3510 Available at: http://www.cert.org/advisories/CA-1997-28.html 3512 CERT. 2000. CERT Advisory CA-2000-21: Denial-of-Service 3513 Vulnerabilities in TCP/IP Stacks. Available at: 3514 http://www.cert.org/advisories/CA-2000-21.html 3516 CERT. 2001. 
CERT Advisory CA-2001-09: Statistical Weaknesses in 3517 TCP/IP Initial Sequence Numbers. Available at: 3518 http://www.cert.org/advisories/CA-2001-09.html 3519 CERT. 2003. CERT Advisory CA-2003-13 Multiple Vulnerabilities in 3520 Snort Preprocessors. Available at: 3521 http://www.cert.org/advisories/CA-2003-13.html 3523 Cisco. 2008a. Cisco Security Appliance Command Reference, Version 3524 7.0. Available at: http://www.cisco.com/en/US/docs/security/asa/ 3525 asa70/command/reference/tz.html#wp1288756 3527 Cisco. 2008b. Cisco Security Appliance System Log Messages, Version 3528 8.0. Available at: http://www.cisco.com/en/US/docs/security/asa/ 3529 asa80/system/message/logmsgs.html#wp4773952 3531 Clark, D.D. 1982. Fault isolation and recovery. RFC 816. 3533 Clark, D.D. 1988. The Design Philosophy of the DARPA Internet 3534 Protocols, Computer Communication Review, Vol. 18, No.4, pp. 106-114. 3536 Connolly, T., Amer, P., Conrad, P. 1994. An Extension to TCP : 3537 Partial Order Service. RFC 1693. 3539 Conta, A., Deering, S., Gupta, M. 2006. Internet Control Message 3540 Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) 3541 Specification. RFC 4443. 3543 CORE. 2003. Core Secure Technologies Advisory CORE-2003-0307: Snort 3544 TCP Stream Reassembly Integer Overflow Vulnerability. Available at: 3545 http://www.coresecurity.com/common/showdoc.php?idx=313&idxseccion=10 3547 CPNI, 2008. Security Assessment of the Internet Protocol. Available 3548 at: http://www.cpni.gov.uk/Docs/InternetProtocol.pdf 3550 CPNI, 2009. Security Assessment of the Transmission Control Protocol 3551 (TCP). Available at: 3552 http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf 3554 daemon9, route, and infinity. 1996. IP-spoofing Demystified (Trust- 3555 Relationship Exploitation), Phrack Magazine, Volume Seven, Issue 3556 Forty-Eight, File 14 of 18. Available at: 3557 http://www.phrack.org/archives/48/P48-14 3559 Deering, S., Hinden, R. 1998. 
Internet Protocol, Version 6 (IPv6) 3560 Specification. RFC 2460. 3562 Dharmapurikar, S., Paxson, V. 2005. Robust TCP Stream Reassembly In 3563 the Presence of Adversaries. Proceedings of the USENIX Security 3564 Symposium 2005. 3566 Duke, M., Braden, R., Eddy, W., Blanton, E. 2006. A Roadmap for 3567 Transmission Control Protocol (TCP) Specification Documents. RFC 3568 4614. 3570 Ed3f. 2002. Firewall spotting and networks analisys with a broken 3571 CRC. Phrack Magazine, Volume 0x0b, Issue 0x3c, Phile #0x0c of 0x10. 3572 Available at: http://www.phrack.org/phrack/60/p60-0x0c.txt 3574 Eddy, W. 2007. TCP SYN Flooding Attacks and Common Mitigations. RFC 3575 4987. 3577 Fenner, B. 2006. Experimental Values in IPv4, IPv6, ICMPv4, ICMPv6, 3578 UDP, and TCP Headers. RFC 4727. 3580 Ferguson, P., and Senie, D. 2000. Network Ingress Filtering: 3581 Defeating Denial of Service Attacks which employ IP Source Address 3582 Spoofing. RFC 2827. 3584 Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 3585 Leach, P., and Berners-Lee, T. 1999. Hypertext Transfer Protocol -- 3586 HTTP/1.1. RFC 2616. 3588 Floyd, S., Mahdavi, J., Mathis, M., Podolsky, M. 2000. An Extension 3589 to the Selective Acknowledgement (SACK) Option for TCP. RFC 2883. 3591 Floyd, S., Henderson, T., Gurtov, A. 2004. The NewReno Modification 3592 to TCP's Fast Recovery Algorithm. RFC 3782. 3594 Floyd, S., Allman, M., Jain, A., Sarolahti, P. 2007. Quick-Start for 3595 TCP and IP. RFC 4782. 3597 Fyodor. 1998. Remote OS Detection via TCP/IP Stack Fingerprinting. 3598 Phrack Magazine, Volume 8, Issue 54. 3600 Fyodor. 2006a. Remote OS Detection via TCP/IP Fingerprinting (2nd 3601 Generation). Available at: http://insecure.org/nmap/osdetect/. 3603 Fyodor. 2006b. Nmap - Free Security Scanner For Network Exploration 3604 and Audit. Available at: http://www.insecure.org/nmap. 3606 Fyodor. 2008. Nmap Reference Guide: Port Scanning Techniques. 
3607 Available at: http://nmap.org/book/man-port-scanning-techniques.html 3609 GIAC. 2000. Egress Filtering v 0.2. Available at: 3610 http://www.sans.org/y2k/egress.htm 3612 Giffin, J., Greenstadt, R., Litwack, P., Tibbetts, R. 2002. Covert 3613 Messaging through TCP Timestamps. PET2002 (Workshop on Privacy 3614 Enhancing Technologies), San Francisco, CA, USA, April 2002. 3616 Available at: 3617 http://web.mit.edu/greenie/Public/CovertMessaginginTCP.ps 3619 Gill, V., Heasley, J., Meyer, D., Savola, P., Pignataro, C. 2007. The 3620 Generalized TTL Security Mechanism (GTSM). RFC 5082. 3622 Gont, F. 2006. Advanced ICMP packet filtering. Available at: 3623 http://www.gont.com.ar/papers/icmp-filtering.html 3625 Gont, F. 2008a. ICMP attacks against TCP. IETF Internet-Draft 3626 (draft-ietf-tcpm-icmp-attacks-04.txt), work in progress. 3628 Gont, F. 2008b. TCP's Reaction to Soft Errors. IETF Internet-Draft 3629 (draft-ietf-tcpm-tcp-soft-errors-09.txt), work in progress. 3631 Gont, F. 2009. On the generation of TCP timestamps. IETF Internet- 3632 Draft (draft-gont-tcpm-tcp-timestamps-01.txt), work in progress. 3634 Gont, F., Srisuresh, P. 2008. Security Implications of Network 3635 Address Translators (NATs). IETF Internet-Draft 3636 (draft-gont-behave-nat-security-01.txt), work in progress. 3638 Gont, F., Yourtchenko, A. 2009. On the implementation of TCP urgent 3639 data. IETF Internet-Draft (draft-gont-tcpm-urgent-data-01.txt), work 3640 in progress. 3642 Heffernan, A. 1998. Protection of BGP Sessions via the TCP MD5 3643 Signature Option. RFC 2385. 3645 Heffner, J. 2002. High Bandwidth TCP Queuing. Senior Thesis. 3647 Hnes, A. 2007. TCP options - tcp-parameters IANA registry. Post to 3648 the tcpm wg mailing-list. Available at: 3649 http://www.ietf.org/mail-archive/web/tcpm/current/msg03199.html 3651 IANA. 2007. Transmission Control Protocol (TCP) Option Numbers. 3652 Available at: http://www.iana.org/assignments/tcp-parameters/ 3654 IANA. 2008. Port Numbers. 
Available at: 3655 http://www.iana.org/assignments/port-numbers 3657 Jacobson, V. 1988. Congestion Avoidance and Control. Computer 3658 Communication Review, vol. 18, no. 4, pp. 314-329. Available at: 3659 ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z 3661 Jacobson, V., Braden, R. 1988. TCP Extensions for Long-Delay Paths. 3662 RFC 1072. 3664 Jacobson, V., Braden, R., Borman, D. 1992. TCP Extensions for High 3665 Performance. RFC 1323. 3667 Jones, S. 2003. Port 0 OS Fingerprinting. Available at: 3668 http://www.gont.com.ar/docs/port-0-os-fingerprinting.txt 3670 Kent, S. and Seo, K. 2005. Security Architecture for the Internet 3671 Protocol. RFC 4301. 3673 Klensin, J. 2008. Simple Mail Transfer Protocol. RFC 5321. 3675 Ko, Y., Ko, S., and Ko, M. 2001. NIDS Evasion Method named SeolMa. 3676 Phrack Magazine, Volume 0x0b, Issue 0x39, phile #0x03 of 0x12. 3677 Available at: http://www.phrack.org/issues.html?issue=57&id=3#article 3679 Lahey, K. 2000. TCP Problems with Path MTU Discovery. RFC 2923. 3681 Lemon, 2002. Resisting SYN flood DoS attacks with a SYN cache. 3682 Proceedings of the BSDCon 2002 Conference, pp. 89-98. 3684 Maimon, U. 1996. Port Scanning without the SYN flag. Phrack 3685 Magazine, Volume Seven, Issue Forty-Nine, phile #0x0f of 0x10. 3686 Available at: 3687 http://www.phrack.org/issues.html?issue=49&id=15#article 3689 Mathis, M., Mahdavi, J., Floyd, S., Romanow, A. 1996. TCP Selective 3690 Acknowledgment Options. RFC 2018. 3692 Mathis, M., and Heffner, J. 2007. Packetization Layer Path MTU 3693 Discovery. RFC 4821. 3695 McCann, J., Deering, S., Mogul, J. 1996. Path MTU Discovery for IP 3696 version 6. RFC 1981. 3698 McKusick, M., Bostic, K., Karels, M., and J. Quarterman. 1996. The 3699 Design and Implementation of the 4.4BSD Operating System. Addison- 3700 Wesley. 3702 Meltman. 1997. new TCP/IP bug in win95. Post to the bugtraq mailing- 3703 list. Available at: http://insecure.org/sploits/land.ip.DOS.html 3705 Miller, T. 2006. 
Passive OS Fingerprinting: Details and Techniques. 3706 Available at: http://www.ouah.org/incosfingerp.htm . 3708 Mogul, J., and Deering, S. 1990. Path MTU Discovery. RFC 1191. 3710 Morris, R. 1985. A Weakness in the 4.2BSD Unix TCP/IP Software. 3711 Technical Report CSTR-117, AT&T Bell Laboratories. Available at: 3713 http://pdos.csail.mit.edu/~rtm/papers/117.pdf . 3715 Myst. 1997. Windows 95/NT DoS. Post to the bugtraq mailing-list. 3716 Available at: http://seclists.org/bugtraq/1997/May/0039.html 3718 Nichols, K., Blake, S., Baker, F., and Black, D. 1998. Definition of 3719 the Differentiated Services Field (DS Field) in the IPv4 and IPv6 3720 Headers. RFC 2474. 3722 NISCC. 2004. NISCC Vulnerability Advisory 236929: Vulnerability 3723 Issues in TCP. Available at: 3724 http://www.uniras.gov.uk/niscc/docs/re-20040420-00391.pdf 3726 NISCC. 2005. NISCC Vulnerability Advisory 532967/NISCC/ICMP: 3727 Vulnerability Issues in ICMP packets with TCP payloads. Available 3728 at: http://www.niscc.gov.uk/niscc/docs/re-20050412-00303.pdf 3730 NISCC. 2006. NISCC Technical Note 01/2006: Egress and Ingress 3731 Filtering. Available at: 3732 http://www.niscc.gov.uk/niscc/docs/re-20060420-00294.pdf?lang=en 3734 Ostermann, S. 2008. tcptrace tool. Tool and documentation available 3735 at: http://www.tcptrace.org. 3737 Paxson, V., Allman, M. 2000. Computing TCP's Retransmission Timer. 3738 RFC 2988. 3740 PCNWG. 2009. Congestion and Pre-Congestion Notification (pcn) 3741 charter. Available at: 3742 http://www.ietf.org/html.charters/pcn-charter.html 3744 PMTUDWG. 2007. Path MTU Discovery (pmtud) charter. Available at: 3745 http://www.ietf.org/html.charters/OLD/pmtud-charter.html 3747 Postel, J. 1981a. Internet Protocol. DARPA Internet Program. 3748 Protocol Specification. RFC 791. 3750 Postel, J. 1981b. Internet Control Message Protocol. RFC 792. 3752 Postel, J. 1981c. Transmission Control Protocol. DARPA Internet 3753 Program. Protocol Specification. RFC 793. 3755 Postel, J. 1987. 
   TCP AND IP BAKE OFF. RFC 1025.

   Ptacek, T. H., and Newsham, T. N. 1998. Insertion, Evasion and
   Denial of Service: Eluding Network Intrusion Detection. Secure
   Networks, Inc. Available at:
   http://www.aciri.org/vern/Ptacek-Newsham-Evasion-98.ps

   Ramaiah, A., Stewart, R., and Dalal, M. 2008. Improving TCP's
   Robustness to Blind In-Window Attacks. IETF Internet-Draft
   (draft-ietf-tcpm-tcpsecure-10.txt), work in progress.

   Ramakrishnan, K., Floyd, S., and Black, D. 2001. The Addition of
   Explicit Congestion Notification (ECN) to IP. RFC 3168.

   Rekhter, Y., Li, T., Hares, S. 2006. A Border Gateway Protocol 4
   (BGP-4). RFC 4271.

   Rivest, R. 1992. The MD5 Message-Digest Algorithm. RFC 1321.

   Rowland, C. 1997. Covert Channels in the TCP/IP Protocol Suite.
   First Monday Journal, Volume 2, Number 5. Available at:
   http://www.firstmonday.org/issues/issue2_5/rowland/

   Savage, S., Cardwell, N., Wetherall, D., Anderson, T. 1999. TCP
   Congestion Control with a Misbehaving Receiver. ACM Computer
   Communication Review, 29(5), October 1999.

   Semke, J., Mahdavi, J., Mathis, M. 1998. Automatic TCP Buffer
   Tuning. ACM Computer Communication Review, Vol. 28, No. 4.

   Shalunov, S. 2000. Netkill. Available at:
   http://www.internet2.edu/~shalunov/netkill/netkill.html

   Shimomura, T. 1995. Technical details of the attack described by
   Markoff in NYT. Message posted in USENET's comp.security.misc
   newsgroup, Message-ID: <3g5gkl$5j1@ariel.sdsc.edu>. Available at:
   http://www.gont.com.ar/docs/post-shimomura-usenet.txt

   Silbersack, M. 2005. Improving TCP/IP security through randomization
   without sacrificing interoperability. EuroBSDCon 2005 Conference.

   SinFP. 2006. Net::SinFP - a Perl module to do OS fingerprinting.
   Available at:
   http://www.gomor.org/cgi-bin/index.pl?mode=view;page=sinfp

   Smart, M., Malan, G., Jahanian, F. 2000.
   Defeating TCP/IP Stack Fingerprinting. Proceedings of the 9th
   USENIX Security Symposium, pp. 229-240. Available at:
   http://www.usenix.org/publications/
   library/proceedings/sec2000/full_papers/smart/smart_html/index.html

   Smith, C., Grundl, P. 2002. Know Your Enemy: Passive Fingerprinting.
   The Honeynet Project.

   Spring, N., Wetherall, D., Ely, D. 2003. Robust Explicit Congestion
   Notification (ECN) Signaling with Nonces. RFC 3540.

   Srisuresh, P., Egevang, K. 2001. Traditional IP Network Address
   Translator (Traditional NAT). RFC 3022.

   Stevens, W. R. 1994. TCP/IP Illustrated, Volume 1: The Protocols.
   Addison-Wesley Professional Computing Series.

   TBIT. 2001. TBIT, the TCP Behavior Inference Tool. Available at:
   http://www.icir.org/tbit/

   Touch, J. 2007. Defending TCP Against Spoofing Attacks. RFC 4953.

   US-CERT. 2001. US-CERT Vulnerability Note VU#498440: Multiple TCP/IP
   implementations may use statistically predictable initial sequence
   numbers. Available at: http://www.kb.cert.org/vuls/id/498440

   US-CERT. 2003a. US-CERT Vulnerability Note VU#26825: Cisco Secure
   PIX Firewall TCP Reset Vulnerability. Available at:
   http://www.kb.cert.org/vuls/id/26825

   US-CERT. 2003b. US-CERT Vulnerability Note VU#464113: TCP/IP
   implementations handle unusual flag combinations inconsistently.
   Available at: http://www.kb.cert.org/vuls/id/464113

   US-CERT. 2004a. US-CERT Vulnerability Note VU#395670: FreeBSD fails
   to limit number of TCP segments held in reassembly queue. Available
   at: http://www.kb.cert.org/vuls/id/395670

   US-CERT. 2005a. US-CERT Vulnerability Note VU#102014: Optimistic TCP
   acknowledgements can cause denial of service. Available at:
   http://www.kb.cert.org/vuls/id/102014

   US-CERT. 2005b. US-CERT Vulnerability Note VU#396645: Microsoft
   Windows vulnerable to DoS via LAND attack.
   Available at: http://www.kb.cert.org/vuls/id/396645

   US-CERT. 2005c. US-CERT Vulnerability Note VU#637934: TCP does not
   adequately validate segments before updating timestamp value.
   Available at: http://www.kb.cert.org/vuls/id/637934

   US-CERT. 2005d. US-CERT Vulnerability Note VU#853540: Cisco PIX
   fails to verify TCP checksum. Available at:
   http://www.kb.cert.org/vuls/id/853540

   Veysset, F., Courtay, O., Heen, O. 2002. New Tool And Technique For
   Remote Operating System Fingerprinting. Intranode Research Team.

   Watson, P. 2004. Slipping in the Window: TCP Reset Attacks.
   CanSecWest 2004 Conference.

   Welzl, M. 2008. Internet congestion control: evolution and current
   open issues. CAIA guest talk, Swinburne University, Melbourne,
   Australia. Available at:
   http://www.welzl.at/research/publications/caia-jan08.pdf

   Wright, G. and W. Stevens. 1994. TCP/IP Illustrated, Volume 2: The
   Implementation. Addison-Wesley.

   Zalewski, M. 2001a. Strange Attractors and TCP/IP Sequence Number
   Analysis. Available at:
   http://lcamtuf.coredump.cx/oldtcp/tcpseq.html

   Zalewski, M. 2001b. Delivering Signals for Fun and Profit.
   Available at: http://lcamtuf.coredump.cx/signals.txt

   Zalewski, M. 2002. Strange Attractors and TCP/IP Sequence Number
   Analysis - One Year Later. Available at:
   http://lcamtuf.coredump.cx/newtcp/

   Zalewski, M. 2003a. Windows URG mystery solved! Post to the bugtraq
   mailing-list. Available at:
   http://lcamtuf.coredump.cx/p0f-help/p0f/doc/win-memleak.txt

   Zalewski, M. 2003b. A new TCP/IP blind data injection technique?
   Post to the bugtraq mailing-list. Available at:
   http://lcamtuf.coredump.cx/ipfrag.txt

   Zalewski, M. 2006a. p0f passive fingerprinting tool. Available at:
   http://lcamtuf.coredump.cx/p0f.shtml

   Zalewski, M. 2006b. p0f - RST+ signatures.
   Available at: http://lcamtuf.coredump.cx/p0f-help/p0f/p0fr.fp

   Zalewski, M. 2007. 0trace - traceroute on established connections.
   Post to the bugtraq mailing-list. Available at:
   http://seclists.org/bugtraq/2007/Jan/0176.html

   Zalewski, M. 2008. Museum of broken packets. Available at:
   http://lcamtuf.coredump.cx/mobp/

   Zander, S. 2008. Covert Channels in Computer Networks. Available
   at: http://caia.swin.edu.au/cv/szander/cc/index.html

   Zuquete, A. 2002. Improving the functionality of SYN cookies. 6th
   IFIP Communications and Multimedia Security Conference (CMS 2002).
   Available at: http://www.ieeta.pt/~avz/pubs/CMS02.html

   Zweig, J., Partridge, C. 1990. TCP Alternate Checksum Options.
   RFC 1146.

20.  References

20.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5961]  Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
              Robustness to Blind In-Window Attacks", RFC 5961,
              August 2010.

   [RFC6056]  Larsen, M. and F. Gont, "Recommendations for
              Transport-Protocol Port Randomization", BCP 156,
              RFC 6056, January 2011.

   [RFC6093]  Gont, F. and A. Yourtchenko, "On the Implementation of
              the TCP Urgent Mechanism", RFC 6093, January 2011.

   [RFC6191]  Gont, F., "Reducing the TIME-WAIT State Using TCP
              Timestamps", BCP 159, RFC 6191, April 2011.

   [RFC6528]  Gont, F. and S. Bellovin, "Defending against Sequence
              Number Attacks", RFC 6528, February 2012.

20.2.
Informative References

   [I-D.gont-timestamps-generation]
              Gont, F. and A. Oppermann, "On the generation of TCP
              timestamps", draft-gont-timestamps-generation-00 (work in
              progress), June 2010.

   [I-D.ietf-tcpm-3517bis]
              Blanton, E., Jarvinen, I., Wang, L., Allman, M., Kojo, M.,
              and Y. Nishida, "A Conservative Selective Acknowledgment
              (SACK)-based Loss Recovery Algorithm for TCP",
              draft-ietf-tcpm-3517bis-01 (work in progress),
              January 2012.

   [Morris1985]
              Morris, R., "A Weakness in the 4.2BSD UNIX TCP/IP
              Software", CSTR 117, AT&T Bell Laboratories, Murray Hill,
              NJ, 1985.

   [RFC1025]  Postel, J., "TCP and IP bake off", RFC 1025,
              September 1987.

   [RFC1379]  Braden, B., "Extending TCP for Transactions -- Concepts",
              RFC 1379, November 1992.

   [RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

   [RFC6429]  Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender
              Clarification for Persist Condition", RFC 6429,
              December 2011.

   [Shimomura1995]
              Shimomura, T., "Technical details of the attack described
              by Markoff in NYT",
              http://www.gont.com.ar/docs/post-shimomura-usenet.txt,
              Message posted in USENET's comp.security.misc newsgroup,
              Message-ID: <3g5gkl$5j1@ariel.sdsc.edu>, 1995.

Appendix A.  TODO list

   A number of formatting issues still have to be fixed in this
   document. Among others:

   o  The ASCII art for some figures is still missing. We still have
      to convert the nice JPGs of the UK CPNI document into ugly
      ASCII art.

   o  The references have not yet been converted to XML, but are
      hardcoded instead. That is why they may not look as expected.

Appendix B.  Change log (to be removed by the RFC Editor before
             publication of this document as an RFC)

B.1.  Changes from draft-ietf-tcpm-tcp-security-02

   o  Lots of text has been removed from the document.

   o  The document track has been changed from BCP to Informational
      (RFC2119-language recommendations have been removed).

   o  Where necessary, stand-alone Standards Track documents have been
      produced.

B.2.  Changes from draft-ietf-tcpm-tcp-security-01

   A number of formatting issues still have to be fixed in this
   document. Among others:

   o  The whole document was reformatted in RFC 1122 style.

Author's Address

   Fernando Gont
   UK Centre for the Protection of National Infrastructure

   Email: fernando@gont.com.ar
   URI:   http://www.cpni.gov.uk