idnits 2.17.1 draft-ietf-tcpm-tcp-security-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 4 instances of too long lines in the document, the longest one being 8 characters in excess of 72. == There are 4 instances of lines with non-RFC2606-compliant FQDNs in the document. == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 430: '... TCP SHOULD randomize its ephemeral ...' RFC 2119 keyword, line 432: '...gest posible port range SHOULD be used...' RFC 2119 keyword, line 440: '... TCP MUST NOT allocate port number 0...' RFC 2119 keyword, line 442: '...ion Port, a RST segment SHOULD be sent...' RFC 2119 keyword, line 476: '... TCP MUST be able to grecefully hand...' (94 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (January 21, 2011) is 4844 days in the past. Is this intentional? 
Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Clark' is mentioned on line 187, but not defined -- Looks like a reference, but probably isn't: '1988' on line 187 == Missing Reference: 'Bellovin' is mentioned on line 4316, but not defined -- Looks like a reference, but probably isn't: '1989' on line 4208 == Missing Reference: 'NISCC' is mentioned on line 3947, but not defined -- Looks like a reference, but probably isn't: '2005' on line 219 == Missing Reference: 'Silbersack' is mentioned on line 219, but not defined == Missing Reference: 'Postel' is mentioned on line 4637, but not defined == Missing Reference: '1981c' is mentioned on line 4637, but not defined == Missing Reference: 'Braden' is mentioned on line 4208, but not defined == Missing Reference: 'Jones' is mentioned on line 459, but not defined -- Looks like a reference, but probably isn't: '2003' on line 3092 == Missing Reference: 'CERT' is mentioned on line 2987, but not defined -- Looks like a reference, but probably isn't: '1996' on line 4612 == Missing Reference: 'Meltman' is mentioned on line 488, but not defined -- Looks like a reference, but probably isn't: '1997' on line 1906 == Missing Reference: 'Morris' is mentioned on line 567, but not defined -- Looks like a reference, but probably isn't: '1985' on line 567 == Missing Reference: 'Shimomura' is mentioned on line 1978, but not defined -- Looks like a reference, but probably isn't: '1995' on line 1978 -- Looks like a reference, but probably isn't: '2001' on line 4275 == Missing Reference: 'US-CERT' is mentioned on line 4013, but not defined == Missing Reference: 'Zalewski' is mentioned on line 4719, but not defined == Missing Reference: '2001a' is mentioned on line 590, but not defined -- Looks like a reference, but probably 
isn't: '2002' on line 2853 -- Looks like a reference, but probably isn't: '1987' on line 681 -- Looks like a reference, but probably isn't: '1992' on line 681 == Missing Reference: '2001b' is mentioned on line 754, but not defined == Missing Reference: 'Watson' is mentioned on line 3947, but not defined -- Looks like a reference, but probably isn't: '2004' on line 3947 == Missing Reference: 'Heffner' is mentioned on line 2853, but not defined == Missing Reference: 'Barisani' is mentioned on line 1038, but not defined -- Looks like a reference, but probably isn't: '2006' on line 4273 == Missing Reference: 'Ed3f' is mentioned on line 1054, but not defined == Missing Reference: '2005d' is mentioned on line 1750, but not defined -- Looks like a reference, but probably isn't: '2008' on line 4777 == Missing Reference: 'Myst' is mentioned on line 1091, but not defined == Missing Reference: 'IANA' is mentioned on line 1098, but not defined -- Looks like a reference, but probably isn't: '2007' on line 4682 == Missing Reference: 'Hnes' is mentioned on line 1103, but not defined -- Looks like a reference, but probably isn't: '1994' on line 1393 == Missing Reference: 'CCSDS' is mentioned on line 1168, but not defined == Missing Reference: 'Stevens' is mentioned on line 1393, but not defined == Missing Reference: 'Reed' is mentioned on line 1408, but not defined == Missing Reference: '1981a' is mentioned on line 1422, but not defined == Missing Reference: 'Heffernan' is mentioned on line 1608, but not defined -- Looks like a reference, but probably isn't: '1998' on line 4367 == Missing Reference: 'Welzl' is mentioned on line 1653, but not defined == Missing Reference: '2005c' is mentioned on line 1744, but not defined == Missing Reference: 'Gont' is mentioned on line 1889, but not defined == Missing Reference: '2008b' is mentioned on line 1889, but not defined == Missing Reference: 'Borman' is mentioned on line 1906, but not defined == Missing Reference: 'Eddy' is mentioned on 
line 1906, but not defined == Missing Reference: 'Lemon' is mentioned on line 1910, but not defined == Missing Reference: 'Bernstein' is mentioned on line 1996, but not defined == Missing Reference: 'Zquete' is mentioned on line 1986, but not defined == Missing Reference: 'CPNI' is mentioned on line 4795, but not defined -- Looks like a reference, but probably isn't: '2000' on line 2582 == Missing Reference: '2003b' is mentioned on line 4719, but not defined == Missing Reference: 'Linux' is mentioned on line 2306, but not defined == Missing Reference: 'Shalunov' is mentioned on line 2582, but not defined == Missing Reference: '2004a' is mentioned on line 2726, but not defined == Missing Reference: 'CORE' is mentioned on line 2987, but not defined == Missing Reference: 'Allman' is mentioned on line 3092, but not defined == Missing Reference: '2005a' is mentioned on line 3158, but not defined == Missing Reference: 'Touch' is mentioned on line 3667, but not defined == Missing Reference: 'Ostermann' is mentioned on line 3328, but not defined == Missing Reference: 'PCNWG' is mentioned on line 3696, but not defined -- Looks like a reference, but probably isn't: '2009' on line 4795 == Missing Reference: '2003a' is mentioned on line 4013, but not defined == Missing Reference: 'Fyodor' is mentioned on line 4585, but not defined == Missing Reference: '2006b' is mentioned on line 4585, but not defined == Missing Reference: 'TBIT' is mentioned on line 4268, but not defined == Missing Reference: '2006a' is mentioned on line 4272, but not defined == Missing Reference: 'Miller' is mentioned on line 4273, but not defined == Missing Reference: 'Beck' is mentioned on line 4275, but not defined == Missing Reference: 'Rowland' is mentioned on line 4455, but not defined == Missing Reference: 'Zander' is mentioned on line 4458, but not defined == Missing Reference: 'Maimon' is mentioned on line 4612, but not defined == Unused Reference: 'RFC6093' is defined on line 5308, but no explicit 
reference was found in the text == Outdated reference: A later version (-04) exists of draft-ietf-tcpm-tcp-timestamps-03 ** Obsolete normative reference: RFC 6093 (Obsoleted by RFC 9293) Summary: 4 errors (**), 0 flaws (~~), 64 warnings (==), 21 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 TCP Maintenance and Minor F. Gont 3 Extensions (tcpm) UK CPNI 4 Internet-Draft January 21, 2011 5 Intended status: BCP 6 Expires: July 25, 2011 8 Security Assessment of the Transmission Control Protocol (TCP) 9 draft-ietf-tcpm-tcp-security-02.txt 11 Abstract 13 This document contains a security assessment of the specifications of 14 the Transmission Control Protocol (TCP), and of a number of 15 mechanisms and policies in use by popular TCP implementations. 16 Additionally, it contains best current practices for hardening a TCP 17 implementation. 19 Status of this Memo 21 This Internet-Draft is submitted to IETF in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF). Note that other groups may also distribute 26 working documents as Internet-Drafts. The list of current Internet- 27 Drafts is at http://datatracker.ietf.org/drafts/current/. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 This Internet-Draft will expire on July 25, 2011. 36 Copyright Notice 38 Copyright (c) 2011 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 
41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents 43 (http://trustee.ietf.org/license-info) in effect on the date of 44 publication of this document. Please review these documents 45 carefully, as they describe your rights and restrictions with respect 46 to this document. Code Components extracted from this document must 47 include Simplified BSD License text as described in Section 4.e of 48 the Trust Legal Provisions and are provided without warranty as 49 described in the Simplified BSD License. 51 Table of Contents 53 1. Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 54 1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 5 55 1.2. Scope of this document . . . . . . . . . . . . . . . . . 6 56 1.3. Organization of this document . . . . . . . . . . . . . . 8 57 2. The Transmission Control Protocol . . . . . . . . . . . . . . 8 58 3. TCP header fields . . . . . . . . . . . . . . . . . . . . . . 9 59 3.1. Source Port and Destination Port . . . . . . . . . . . . 10 60 3.2. Sequence number . . . . . . . . . . . . . . . . . . . . . 12 61 3.3. Acknowledgement Number . . . . . . . . . . . . . . . . . 14 62 3.4. Data Offset . . . . . . . . . . . . . . . . . . . . . . . 15 63 3.5. Control bits . . . . . . . . . . . . . . . . . . . . . . 15 64 3.5.1. Reserved (four bits) . . . . . . . . . . . . . . . . 15 65 3.5.2. CWR (Congestion Window Reduced) . . . . . . . . . . . 16 66 3.5.3. ECE (ECN-Echo) . . . . . . . . . . . . . . . . . . . 16 67 3.5.4. URG . . . . . . . . . . . . . . . . . . . . . . . . . 17 68 3.5.5. ACK . . . . . . . . . . . . . . . . . . . . . . . . . 17 69 3.5.6. PSH . . . . . . . . . . . . . . . . . . . . . . . . . 17 70 3.5.7. RST . . . . . . . . . . . . . . . . . . . . . . . . . 19 71 3.5.8. SYN . . . . . . . . . . . . . . . . . . . . . . . . . 19 72 3.5.9. FIN . . . . . . . . . . . . . . . . . . . . . . . . . 20 73 3.6. Window . . . . . . . . . . . . . . . . . . . . . . . . 
. 20 74 3.7. Checksum . . . . . . . . . . . . . . . . . . . . . . . . 22 75 3.8. Urgent pointer . . . . . . . . . . . . . . . . . . . . . 23 76 3.9. Options . . . . . . . . . . . . . . . . . . . . . . . . . 24 77 3.10. Padding . . . . . . . . . . . . . . . . . . . . . . . . . 28 78 3.11. Data . . . . . . . . . . . . . . . . . . . . . . . . . . 28 79 4. Common TCP Options . . . . . . . . . . . . . . . . . . . . . 29 80 4.1. End of Option List (Kind = 0) . . . . . . . . . . . . . . 29 81 4.2. No Operation (Kind = 1) . . . . . . . . . . . . . . . . . 29 82 4.3. Maximum Segment Size (Kind = 2) . . . . . . . . . . . . . 29 83 4.4. Selective Acknowledgement Option . . . . . . . . . . . . 32 84 4.4.1. SACK-permitted Option (Kind = 4) . . . . . . . . . . 32 85 4.4.2. SACK Option (Kind = 5) . . . . . . . . . . . . . . . 33 86 4.5. MD5 Option (Kind=19) . . . . . . . . . . . . . . . . . . 35 87 4.6. Window scale option (Kind = 3) . . . . . . . . . . . . . 36 88 4.7. Timestamps option (Kind = 8) . . . . . . . . . . . . . . 37 89 4.7.1. Generation of timestamps . . . . . . . . . . . . . . 37 90 4.7.2. Vulnerabilities . . . . . . . . . . . . . . . . . . . 38 91 5. Connection-establishment mechanism . . . . . . . . . . . . . 39 92 5.1. SYN flood . . . . . . . . . . . . . . . . . . . . . . . . 40 93 5.2. Connection forgery . . . . . . . . . . . . . . . . . . . 44 94 5.3. Connection-flooding attack . . . . . . . . . . . . . . . 45 95 5.3.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 45 96 5.3.2. Countermeasures . . . . . . . . . . . . . . . . . . . 46 97 5.4. Firewall-bypassing techniques . . . . . . . . . . . . . . 48 98 6. Connection-termination mechanism . . . . . . . . . . . . . . 49 99 6.1. FIN-WAIT-2 flooding attack . . . . . . . . . . . . . . . 49 100 6.1.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 49 101 6.1.2. Countermeasures . . . . . . . . . . . . . . . . . . . 50 102 7. Buffer management . . . . . . . . . . . . . . . . . . . . . . 52 103 7.1. 
TCP retransmission buffer . . . . . . . . . . . . . . . . 52 104 7.1.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 52 105 7.1.2. Countermeasures . . . . . . . . . . . . . . . . . . . 53 106 7.2. TCP segment reassembly buffer . . . . . . . . . . . . . . 56 107 7.3. Automatic buffer tuning mechanisms . . . . . . . . . . . 59 108 7.3.1. Automatic send-buffer tuning mechanisms . . . . . . . 59 109 7.3.2. Automatic receive-buffer tuning mechanism . . . . . . 61 110 8. TCP segment reassembly algorithm . . . . . . . . . . . . . . 63 111 8.1. Problems that arise from ambiguity in the reassembly 112 process . . . . . . . . . . . . . . . . . . . . . . . . . 63 113 9. TCP Congestion Control . . . . . . . . . . . . . . . . . . . 64 114 9.1. Congestion control with misbehaving receivers . . . . . . 66 115 9.1.1. ACK division . . . . . . . . . . . . . . . . . . . . 66 116 9.1.2. DupACK forgery . . . . . . . . . . . . . . . . . . . 66 117 9.1.3. Optimistic ACKing . . . . . . . . . . . . . . . . . . 67 118 9.2. Blind DupACK triggering attacks against TCP . . . . . . . 68 119 9.2.1. Blind throughput-reduction attack . . . . . . . . . . 70 120 9.2.2. Blind flooding attack . . . . . . . . . . . . . . . . 70 121 9.2.3. Difficulty in performing the attacks . . . . . . . . 71 122 9.2.4. Modifications to TCP's loss recovery algorithms . . . 72 123 9.2.5. Countermeasures . . . . . . . . . . . . . . . . . . . 74 124 9.3. TCP Explicit Congestion Notification (ECN) . . . . . . . 79 125 9.3.1. Possible attacks by a compromised router . . . . . . 79 126 9.3.2. Possible attacks by a malicious TCP endpoint . . . . 80 127 10. TCP API . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 128 10.1. Passive opens and binding sockets . . . . . . . . . . . . 81 129 10.2. Active opens and binding sockets . . . . . . . . . . . . 82 130 11. Blind in-window attacks . . . . . . . . . . . . . . . . . . . 84 131 11.1. Blind TCP-based connection-reset attacks . . . . . . . . 84 132 11.1.1. 
RST flag . . . . . . . . . . . . . . . . . . . . . . 85 133 11.1.2. SYN flag . . . . . . . . . . . . . . . . . . . . . . 86 134 11.1.3. Security/Compartment . . . . . . . . . . . . . . . . 88 135 11.1.4. Precedence . . . . . . . . . . . . . . . . . . . . . 89 136 11.1.5. Illegal options . . . . . . . . . . . . . . . . . . . 90 137 11.2. Blind data-injection attacks . . . . . . . . . . . . . . 90 138 12. Information leaking . . . . . . . . . . . . . . . . . . . . . 91 139 12.1. Remote Operating System detection via TCP/IP stack 140 fingerprinting . . . . . . . . . . . . . . . . . . . . . 91 141 12.1.1. FIN probe . . . . . . . . . . . . . . . . . . . . . . 91 142 12.1.2. Bogus flag test . . . . . . . . . . . . . . . . . . . 92 143 12.1.3. TCP ISN sampling . . . . . . . . . . . . . . . . . . 92 144 12.1.4. TCP initial window . . . . . . . . . . . . . . . . . 92 145 12.1.5. RST sampling . . . . . . . . . . . . . . . . . . . . 93 146 12.1.6. TCP options . . . . . . . . . . . . . . . . . . . . . 94 147 12.1.7. Retransmission Timeout (RTO) sampling . . . . . . . . 94 148 12.2. System uptime detection . . . . . . . . . . . . . . . . . 94 149 13. Covert channels . . . . . . . . . . . . . . . . . . . . . . . 95 150 14. TCP Port scanning . . . . . . . . . . . . . . . . . . . . . . 95 151 14.1. Traditional connect() scan . . . . . . . . . . . . . . . 96 152 14.2. SYN scan . . . . . . . . . . . . . . . . . . . . . . . . 96 153 14.3. FIN, NULL, and XMAS scans . . . . . . . . . . . . . . . . 96 154 14.4. Maimon scan . . . . . . . . . . . . . . . . . . . . . . . 98 155 14.5. Window scan . . . . . . . . . . . . . . . . . . . . . . . 98 156 14.6. ACK scan . . . . . . . . . . . . . . . . . . . . . . . . 99 157 15. Processing of ICMP error messages by TCP . . . . . . . . . . 99 158 16. TCP interaction with the Internet Protocol (IP) . . . . . . . 99 159 16.1. TCP-based traceroute . . . . . . . . . . . . . . . . . . 99 160 16.2. Blind TCP data injection through fragmented IP traffic . 
100
161       16.3. Broadcast and multicast IP addresses . . . . . . . . . . 102
162     17. Security Considerations . . . . . . . . . . . . . . . . . . . 102
163     18. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 102
164     19. References . . . . . . . . . . . . . . . . . . . . . . . . . 103
165     20. References . . . . . . . . . . . . . . . . . . . . . . . . . 113
166       20.1. Normative References . . . . . . . . . . . . . . . . . . 113
167       20.2. Informative References . . . . . . . . . . . . . . . . . 113
168     Appendix A. TODO list . . . . . . . . . . . . . . . . . . . . . 113
169     Appendix B. Change log (to be removed by the RFC Editor
170                 before publication of this document as an RFC) . . . 113
171       B.1. Changes from draft-ietf-tcpm-tcp-security-01 . . . . . . 113
172     Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 114

174  1.  Preface

176  1.1.  Introduction

178     The TCP/IP protocol suite was conceived in an environment that was
179     quite different from the hostile environment it currently operates
180     in.  However, the effectiveness of the protocols led to their early
181     adoption in production environments, to the point that, to some
182     extent, the current world's economy depends on them.

184     While many textbooks and articles have created the myth that the
185     Internet protocols were designed for warfare environments, the top
186     level goal for the DARPA Internet Program was the sharing of large
187     service machines on the ARPANET [Clark, 1988].  As a result, many
188     protocol specifications focus only on the operational aspects of the
189     protocols they specify, and overlook their security implications.

191     While Internet technology has evolved since its inception, the
192     Internet's building blocks are basically the same core protocols
193     adopted by the ARPANET more than two decades ago.  During the last
194     twenty years, many vulnerabilities have been identified in the TCP/IP
195     stacks of a number of systems.  Some of them were based on flaws in
196     some protocol implementations, affecting only a small number of
197     systems, while others were based on flaws in the protocols
198     themselves, affecting virtually every existing implementation
199     [Bellovin, 1989].  Even in the last couple of years, researchers were
200     still working on security problems in the core protocols [NISCC,
201     2004] [NISCC, 2005].

203     The discovery of vulnerabilities in the TCP/IP protocol suite usually
204     led to reports being published by a number of CSIRTs (Computer
205     Security Incident Response Teams) and vendors, which helped to raise
206     awareness about the threats and the best mitigations known at the
207     time the reports were published.  Unfortunately, this also led to the
208     documentation of the discovered protocol vulnerabilities being spread
209     among a large number of documents, which are sometimes difficult to
210     identify.

212     For some reason, much of the effort of the security community on the
213     Internet protocols did not result in official documents (RFCs) being
214     issued by the IETF (Internet Engineering Task Force).  This basically
215     led to a situation in which "known" security problems have not always
216     been addressed by all vendors.  In addition, in many cases vendors
217     have implemented quick "fixes" to the identified vulnerabilities
218     without a careful analysis of their effectiveness and their impact on
219     interoperability [Silbersack, 2005].

221     Producing a secure TCP/IP implementation nowadays is a very difficult
222     task, in part because of the lack of a single document that serves as
223     a security roadmap for the protocols.  Implementers are faced with
224     the hard task of identifying relevant documentation and
225     differentiating between that which provides correct advice, and that
226     which provides misleading advice based on inaccurate or wrong
227     assumptions.

229     There is a clear need for a companion document to the IETF
230     specifications that discusses the security aspects and implications
231     of the protocols, identifies the existing vulnerabilities, discusses
232     the possible countermeasures, and analyzes their respective
233     effectiveness.

235     This document is the result of an assessment of the IETF
236     specifications of the Transmission Control Protocol (TCP) from a
237     security point of view.  Possible threats are identified and, where
238     possible, countermeasures are proposed.  Additionally, many
239     implementation flaws that have led to security vulnerabilities have
240     been referenced in the hope that future implementations will not
241     incur the same problems.

243     This document does not aim to be the final word on the security
244     aspects of TCP.  On the contrary, it aims to raise awareness about a
245     number of TCP vulnerabilities that have been faced in the past, those
246     that are currently being faced, and some of those that we may still
247     have to deal with in the future.

249     Feedback from the community is strongly encouraged to help this
250     document be as accurate as possible and to keep it updated as new
251     vulnerabilities are discovered.

253     This document is heavily based on the "Security Assessment of the
254     Transmission Control Protocol (TCP)" released by the UK Centre for
255     the Protection of National Infrastructure (CPNI), available at:
256     http://www.cpni.gov.uk/Products/technicalnotes/
257     Feb-09-security-assessment-TCP.aspx .

259  1.2.  Scope of this document

261     While there are a number of protocols that may affect the way TCP
262     operates, this document focuses only on the specifications of the
263     Transmission Control Protocol (TCP) itself.

265     The following IETF RFCs were selected for assessment as part of this
266     work:

268     o  RFC 793, "Transmission Control Protocol. DARPA Internet Program.
269        Protocol Specification" (91 pages)

271     o  RFC 1122, "Requirements for Internet Hosts -- Communication
272        Layers" (116 pages)

274     o  RFC 1191, "Path MTU Discovery" (19 pages)

276     o  RFC 1323, "TCP Extensions for High Performance" (37 pages)

278     o  RFC 1948, "Defending Against Sequence Number Attacks" (6 pages)

280     o  RFC 1981, "Path MTU Discovery for IP version 6" (15 pages)

282     o  RFC 2018, "TCP Selective Acknowledgment Options" (12 pages)

284     o  RFC 2385, "Protection of BGP Sessions via the TCP MD5 Signature
285        Option" (6 pages)

287     o  RFC 2581, "TCP Congestion Control" (14 pages)

289     o  RFC 2675, "IPv6 Jumbograms" (9 pages)

291     o  RFC 2883, "An Extension to the Selective Acknowledgement (SACK)
292        Option for TCP" (17 pages)

294     o  RFC 2884, "Performance Evaluation of Explicit Congestion
295        Notification (ECN) in IP Networks" (18 pages)

297     o  RFC 2988, "Computing TCP's Retransmission Timer" (8 pages)

299     o  RFC 3168, "The Addition of Explicit Congestion Notification (ECN)
300        to IP" (63 pages)

302     o  RFC 3465, "TCP Congestion Control with Appropriate Byte Counting
303        (ABC)" (10 pages)

305     o  RFC 3517, "A Conservative Selective Acknowledgment (SACK)-based
306        Loss Recovery Algorithm for TCP" (13 pages)

308     o  RFC 3540, "Robust Explicit Congestion Notification (ECN) Signaling
309        with Nonces" (13 pages)

311     o  RFC 3782, "The NewReno Modification to TCP's Fast Recovery
312        Algorithm" (19 pages)

314  1.3.  Organization of this document

316     This document is basically organized in two parts.  The first part
317     contains a discussion of each of the TCP header fields, identifies
318     their security implications, and discusses the possible
319     countermeasures.  The second part contains an analysis of the
320     security implications of the mechanisms and policies implemented by
321     TCP, and of a number of implementation strategies in use by a number
322     of popular TCP implementations.

324  2.  The Transmission Control Protocol

326     The Transmission Control Protocol (TCP) is a connection-oriented
327     transport protocol that provides a reliable byte-stream data transfer
328     service.

330     Very few assumptions are made about the reliability of underlying
331     data transfer services below the TCP layer.  Basically, TCP assumes
332     it can obtain a simple, potentially unreliable datagram service from
333     the lower level protocols.  Figure 1 illustrates where TCP fits in
334     the DARPA reference model.

336                             +---------------+
337                             |  Application  |
338                             +---------------+
339                             |      TCP      |
340                             +---------------+
341                             |      IP       |
342                             +---------------+
343                             |    Network    |
344                             +---------------+

346                 Figure 1: TCP in the DARPA reference model

348     TCP provides facilities in the following areas:

350     o  Basic Data Transfer

352     o  Reliability

354     o  Flow Control

356     o  Multiplexing

358     o  Connections
359     o  Precedence and Security

361     o  Congestion Control

363     The core TCP specification, RFC 793 [Postel, 1981c], dates back to
364     1981 and standardizes the basic mechanisms and policies of TCP.  RFC
365     1122 [Braden, 1989] provides clarifications and errata for the
366     original specification.  RFC 2581 [Allman et al, 1999] specifies TCP
367     congestion control and avoidance mechanisms, not present in the
368     original specification.  Other documents specify extensions and
369     improvements for TCP.

371     The large number of documents that specify extensions, improvements,
372     or modifications to existing TCP mechanisms has led the IETF to
373     publish a roadmap for TCP, RFC 4614 [Duke et al, 2006], that
374     clarifies the relevance of each of those documents.

376  3.  TCP header fields

378     RFC 793 [Postel, 1981c] defines the syntax of a TCP segment, along
379     with the semantics of each of the header fields.  Figure 2
380     illustrates the syntax of a TCP segment.

382      0                   1                   2                   3
383      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
384     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
385     |          Source Port          |       Destination Port        |
386     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
387     |                        Sequence Number                        |
388     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
389     |                    Acknowledgment Number                      |
390     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
391     |  Data |       |C|E|U|A|P|R|S|F|                               |
392     | Offset|Resrved|W|C|R|C|S|S|Y|I|            Window             |
393     |       |       |R|E|G|K|H|T|N|N|                               |
394     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
395     |           Checksum            |         Urgent Pointer        |
396     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
397     |                    Options                    |    Padding    |
398     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
399     |                             data                              |
400     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

402           Note that one tick mark represents one bit position.

404           Figure 2: Transmission Control Protocol header format

406     The minimum TCP header size is 20 bytes, and corresponds to a TCP
407     segment with no options and no data.  However, a TCP module might be
408     handed an (illegitimate) "TCP segment" of less than 20 bytes.
409     Therefore, before doing any processing of the TCP header fields, the
410     following check should be performed by TCP on the segments handed by
411     the internet layer:

413                            Segment.Size >= 20

415     If a segment does not pass this check, it should be dropped.

417     The following subsections contain further sanity checks that should
418     be performed on TCP segments.

420  3.1.  Source Port and Destination Port

422     The Source Port field contains a 16-bit number that identifies the
423     TCP end-point that originated this TCP segment.  The TCP Destination
424     Port contains a 16-bit number that identifies the destination TCP
425     end-point of this segment.  In most of the discussion we refer to
426     client-side (or "ephemeral") port-numbers and server-side port
427     numbers, since that distinction is what usually affects the
428     interpretation of a port number.

430     TCP SHOULD randomize its ephemeral (client-side) ports, to improve
431     its resistance to off-path attacks.  For the purpose of ephemeral
432     port selection, the largest possible port range SHOULD be used
433     (ideally 1024-65535) [I-D.ietf-tsvwg-port-randomization].

435     DISCUSSION:

437        [I-D.ietf-tsvwg-port-randomization] provides advice on port
438        randomization.

440     TCP MUST NOT allocate port number 0, as its use could lead to
441     interoperability problems.  If a segment is received with port 0 as
442     the Source Port or the Destination Port, a RST segment SHOULD be sent
443     in response (provided that the incoming segment does not have the
444     RST flag set).

446     DISCUSSION:

448        While port 0 is a legitimate port number, it has a special meaning
449        in the UNIX Sockets API.  For example, when a TCP port number of 0
450        is passed as an argument to the bind() function, rather than
451        binding port 0, an ephemeral port is selected for the
452        corresponding TCP end-point.  As a result, the TCP port number 0
453        is never actually used in TCP segments.

455        Different implementations have been found to respond differently
456        to TCP segments that have a port number of 0 as the Source Port
457        and/or the Destination Port.  As a result, TCP segments with a
458        port number of 0 are usually employed for remote OS detection via
459        TCP/IP stack fingerprinting [Jones, 2003].

461        Since in practice TCP port 0 is not used by any legitimate
462        application and is only used for fingerprinting purposes, a number
463        of host implementations already reject TCP segments that use 0 as
464        the Source Port and/or the Destination Port.  Also, a number of
465        firewalls filter (by default) any TCP segments that contain a port
466        number of zero for the Source Port and/or the Destination Port.

468        We therefore recommend that TCP implementations respond to
469        incoming TCP segments that have a Source Port or a Destination
470        Port of 0 with an RST (provided these incoming segments do not
471        have the RST bit set).

473        Responding with an RST segment to incoming segments that have the
474        RST bit set would open the door to RST-war attacks.

476     TCP MUST be able to gracefully handle the case where the source end-
477     point (IP Source Address, TCP Source Port) is the same as the
478     destination end-point (IP Destination Address, TCP Destination Port).

480     DISCUSSION:

482        Some systems have been found to be unable to process TCP segments
483        in which the source end-point {Source Address, Source Port} is the
484        same as the destination end-point {Destination Address,
485        Destination Port}.  Such TCP segments have been reported to cause
486        malfunction of a number of implementations [CERT, 1996], and have
487        been exploited in the past to perform Denial of Service (DoS)
488        attacks [Meltman, 1997].  While these packets are very
489        unlikely to exist in real and legitimate scenarios, TCP should
490        nevertheless be able to process them without the need for any
491        "extra" code.

493        A SYN segment in which the source end-point {Source Address,
494        Source Port} is the same as the destination end-point {Destination
495        Address, Destination Port} will result in a "simultaneous open"
496        scenario, such as the one described on page 32 of RFC 793 [Postel,
497        1981c].  Therefore, those TCP implementations that correctly
498        handle simultaneous opens should already be prepared to handle
499        these unusual TCP segments.

501     TCP SHOULD NOT allocate port numbers that are in use by a TCP that
502     is in the LISTEN or CLOSED states for use as ephemeral ports, as this
503     could allow attackers on the local system to "steal" incoming TCP
504     connections.
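
   As an illustration only, two of the rules stated so far in this
   section (the response to segments that use port 0, and randomized
   ephemeral port selection that skips reserved ports) could be sketched
   in Python as follows.  The function names, the "RST"/"DROP"/"PROCESS"
   return values, the reserved_ports set, and the is_four_tuple_in_use
   callback are hypothetical and are not defined by any specification.
   Silently discarding a port-0 segment that carries the RST flag is an
   assumption: the text only requires that no RST be sent in that case.
   The random-start linear probe is one simple way to randomize over the
   1024-65535 range; it is not claimed to be the algorithm advised in
   [I-D.ietf-tsvwg-port-randomization].

```python
import secrets

MIN_EPHEMERAL = 1024   # lower end of the range suggested in the text
MAX_EPHEMERAL = 65535  # upper end of the range suggested in the text


def action_for_ports(src_port, dst_port, rst_flag_set):
    """Decide how to treat an incoming segment based on its port fields."""
    if src_port == 0 or dst_port == 0:
        # Never answer an RST with an RST (this would allow RST wars).
        # Assumption: in that case the segment is silently discarded.
        return "DROP" if rst_flag_set else "RST"
    return "PROCESS"


def select_ephemeral_port(is_four_tuple_in_use, reserved_ports):
    """Pick a random ephemeral port, or return None if none is usable.

    is_four_tuple_in_use(port) is a hypothetical callback that reports
    whether the resulting connection-id is already in use.
    reserved_ports is a hypothetical set of ports held by local TCPs in
    the LISTEN or CLOSED states.
    """
    span = MAX_EPHEMERAL - MIN_EPHEMERAL + 1
    start = secrets.randbelow(span)  # unpredictable starting point
    for i in range(span):
        # Probe each candidate exactly once, wrapping around the range.
        port = MIN_EPHEMERAL + (start + i) % span
        if port in reserved_ports:
            continue  # would allow "stealing" incoming connections
        if is_four_tuple_in_use(port):
            continue  # four-tuple must be unique
        return port
    return None
```

   A caller would typically retry or fail the active open when
   select_ephemeral_port() returns None.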
506 DISCUSSION: 508 While the only requirement for a selected ephemeral port is that 509 the resulting four-tuple (connection-id) is unique (i.e., not 510 currently in use by any other TCP connection), in practice it may 511 be necessary to disallow the allocation of port numbers that are 512 in use by a TCP that is in the LISTEN or CLOSED states for use as 513 ephemeral ports, as this might allow an attacker to "steal" 514 incoming connections from a local server application. Therefore, 515 TCP SHOULD NOT allocate port numbers that are in use by a TCP in 516 the LISTEN or CLOSED states for use as ephemeral ports. Section 517 10.2 of this document provides a detailed discussion of this 518 issue. 520 While some systems restrict use of the port numbers in the range 521 0-1023 to privileged users, applications SHOULD NOT grant any trust 522 based on the port numbers used for a TCP connection. 524 DISCUSSION: 526 Not all systems require superuser privileges to bind port numbers 527 in that range. Besides, with desktop computers such "distinction" 528 has generally become irrelevant. 530 Middle-boxes such as packet filters MUST NOT assume that clients use 531 port numbers from only the Dynamic or Registered port ranges. 533 DISCUSSION: 535 It should also be noted that some clients, such as DNS resolvers, 536 are known to use port numbers from the "Well Known Ports" range. 537 Therefore, middle-boxes such as packet filters MUST NOT assume 538 that clients use port numbers from only the Dynamic or Registered 539 port ranges. 541 3.2. Sequence number 543 TCP SHOULD select its Initial Sequence Numbers (ISNs) with the 544 following expression: 546 ISN = M + F(localhost, localport, remotehost, remoteport, secret_key) 548 where M is a monotonically increasing counter maintained within TCP, 549 and F() is a Pseudo-Random Function (PRF). As it is vital that F() 550 not be computable from the outside, F() could be a PRF of the 551 connection-id and some secret data.
HMAC-SHA-256 would be a good 552 choice for F(). 554 DISCUSSION: 556 The choice of the Initial Sequence Number of a connection is not 557 arbitrary, but aims to minimize the chances of a stale segment 558 being accepted by a new incarnation of a previous connection. 559 RFC 793 [Postel, 1981c] suggests the use of a global 32-bit ISN 560 generator, whose low-order bit is incremented roughly every 4 561 microseconds. 563 However, use of such an ISN generator makes it trivial to predict 564 the ISN that a TCP will use for new connections, thus allowing a 565 variety of attacks against TCP, such as those described in Section 566 5.2 and Section 11 of this document. This vulnerability was first 567 described in [Morris, 1985], and its exploitation was widely 568 publicized about 10 years later [Shimomura, 1995]. 570 As a matter of fact, protection against old stale segments from a 571 previous incarnation of the connection comes from allowing the 572 creation of a new incarnation of a previous connection only after 573 2*MSL have passed since a segment corresponding to the old 574 incarnation was last seen. This is accomplished by the TIME-WAIT 575 state, and TCP's "quiet time" concept. However, as discussed in 576 Section 3.1 and Section 11.1.2 of this document, the ISN can be 577 used to perform some heuristics meant to avoid an interoperability 578 problem that may arise when two systems establish connections at a 579 high rate. In order for such heuristics to work, the ISNs 580 generated by a TCP should be monotonically increasing. 582 The ISN generation scheme recommended in this section was 583 originally proposed in RFC 1948 [Bellovin, 1996], such that the 584 chances of an attacker guessing the ISN of a TCP are reduced, 585 while still producing a monotonically-increasing sequence that 586 allows implementation of the optimization described in Section 3.1 587 and Section 11.1.2 of this document.
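The expression ISN = M + F(localhost, localport, remotehost, remoteport, secret_key) could be sketched as below. This is a minimal illustration under stated assumptions, not a reference implementation: M is modelled as a 4-microsecond tick counter (borrowing the rate of the RFC 793 generator), the secret is a random value generated at start-up, the four-tuple is serialized as a simple delimited string, and F() is truncated to the first 32 bits of the HMAC-SHA-256 output. All function and variable names are hypothetical.

```python
import hashlib
import hmac
import os
import time

# Assumption: a random secret created once at start-up and never disclosed.
SECRET_KEY = os.urandom(32)


def prf_offset(local_ip, local_port, remote_ip, remote_port, key=SECRET_KEY):
    """F(): HMAC-SHA-256 over the connection-id, truncated to 32 bits."""
    msg = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def timer_value(now_us=None):
    """M: a 32-bit counter that advances once every 4 microseconds."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    return (now_us // 4) & 0xFFFFFFFF


def initial_sequence_number(local_ip, local_port, remote_ip, remote_port,
                            now_us=None, key=SECRET_KEY):
    """ISN = M + F(connection-id, secret), modulo 2**32."""
    offset = prf_offset(local_ip, local_port, remote_ip, remote_port, key)
    return (timer_value(now_us) + offset) & 0xFFFFFFFF
```

For a fixed four-tuple the offset is constant, so successive ISNs advance with M (modulo 2**32), while ISNs for different four-tuples fall in unrelated sequence number spaces.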
589 [CERT, 2001] and [US-CERT, 2001] are advisories about the security 590 implications of weak ISN generators. [Zalewski, 2001a] and 591 [Zalewski, 2002] contain a detailed analysis of ISN generators, 592 and a survey of the algorithms in use by popular TCP 593 implementations. 595 Another security consideration regarding TCP 596 sequence numbers is that they might allow an attacker to count the 597 number of systems behind a Network Address Translator (NAT) 598 [Srisuresh and Egevang, 2001]. Depending on the ISN generators 599 implemented by each of the systems behind the NAT, an attacker 600 might be able to count the number of systems behind the NAT by 601 establishing a number of TCP connections (using the public address 602 of the NAT) and identifying the number of different sequence 603 number "spaces". This information leakage could be eliminated by 604 rewriting the contents of all those header fields and options that 605 make use of sequence numbers (such as the Sequence Number and the 606 Acknowledgement Number fields, and the SACK Option) at the NAT. 607 [Gont and Srisuresh, 2008] provides a detailed discussion of the 608 security implications of NATs and of the possible mitigations for 609 this and other issues. 611 3.3. Acknowledgement Number 613 TCP SHOULD set the Acknowledgement Number to zero when sending a TCP 614 segment that does not have the ACK bit set (i.e., a SYN segment). 616 TCP MUST check that, on segments that have the ACK bit set, the 617 Acknowledgement Number satisfies the expression: 619 SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT 621 If a TCP segment does not pass this check, the segment MUST be 622 dropped, and an ACK segment SHOULD be sent in response. 624 DISCUSSION: 626 If the ACK bit is on, the Acknowledgement Number contains the 627 value of the next sequence number the sender of this segment is 628 expecting to receive.
According to RFC 793, the Acknowledgement 629 Number is considered valid as long as it does not acknowledge the 630 receipt of data that has not yet been sent. 632 However, as a result of recent concerns on forgery attacks against 633 TCP (see Section 11 of this document), ongoing work at the IETF 634 [Ramaiah et al, 2008] has proposed enforcing a stricter check 635 on the Acknowledgement Number of segments that have the ACK bit 636 set: 638 SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT 640 If the ACK bit is off, the Acknowledgement Number field is not 641 valid. We recommend that TCP implementations set the 642 Acknowledgement Number to zero when sending a TCP segment that 643 does not have the ACK bit set (i.e., a SYN segment). Some TCP 644 implementations have been known to fail to set the Acknowledgement 645 Number to zero, thus leaking information. 647 TCP Acknowledgements are also used to perform heuristics for loss 648 recovery and congestion control. Section 9 of this document 649 describes a number of ways in which these mechanisms can be 650 exploited. 652 3.4. Data Offset 654 TCP MUST enforce the following checks on the Data Offset field: 656 Data Offset >= 5 658 Data Offset * 4 <= TCP segment length 660 If a TCP segment does not pass these checks, it should be silently 661 dropped. 663 The TCP segment length should be obtained from the IP layer, as 664 TCP does not include a TCP segment length field. 666 DISCUSSION: 668 The Data Offset field indicates the length of the TCP header in 669 32-bit words. As the minimum TCP header size is 20 bytes, the 670 minimum legal value for this field is 5. 672 For obvious reasons, the TCP header cannot be larger than the 673 whole TCP segment it is part of. 675 3.5. Control bits 677 The following subsections provide a discussion of the different 678 control bits in the TCP header.
TCP segments with unusual 679 combinations of flags set have been known in the past to cause 680 malfunction of some implementations, sometimes to the extent of 681 causing them to crash [Postel, 1987] [Braden, 1992]. These packets 682 are still usually employed for the purpose of TCP/IP stack 683 fingerprinting. Section 12.1 contains a discussion of TCP/IP stack 684 fingerprinting. 686 3.5.1. Reserved (four bits) 688 TCP MUST ignore the Reserved field of incoming TCP segments. 690 DISCUSSION: 692 These four bits are reserved for future use, and must be zero. As 693 with virtually every field, the Reserved field could be used as a 694 covert channel. While there exist intermediate devices such as 695 protocol scrubbers that clear these bits, and firewalls that drop/ 696 reject segments with any of these bits set, these devices should 697 consider the impact of these policies on TCP interoperability. 698 For example, as TCP continues to evolve, all or part of the bits 699 in the Reserved field could be used to implement some new 700 functionality. If some middle-box or end-system implementation 701 were to drop a TCP segment merely because some of these bits are 702 not set to zero, interoperability problems would arise. 704 3.5.2. CWR (Congestion Window Reduced) 706 DISCUSSION: 708 The CWR flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is 709 used as part of the Explicit Congestion Notification (ECN) 710 mechanism. For connections in any of the synchronized states, 711 this flag indicates, when set, that the TCP sending this segment 712 has reduced its congestion window. 714 An analysis of the security implications of ECN can be found in 715 Section 9.3 of this document. 717 3.5.3. ECE (ECN-Echo) 719 DISCUSSION: 721 The ECE flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is 722 used as part of the Explicit Congestion Notification (ECN) 723 mechanism. 
725 Once a TCP connection has been established, an ACK segment with 726 the ECE bit set indicates that congestion was encountered in the 727 network on the path from the sender to the receiver. This 728 indication of congestion should be treated just as a congestion 729 loss in non-ECN-capable TCP [Ramakrishnan et al, 2001]. 730 Additionally, TCP should not increase the congestion window (cwnd) 731 in response to such an ACK segment that indicates congestion, and 732 should also not react to congestion indications more than once 733 every window of data (or once per round-trip time). 735 An analysis of the security implications of ECN can be found in 736 Section 9.3 of this document. 738 3.5.4. URG 740 DISCUSSION: 742 When the URG flag is set, the Urgent Pointer field contains the 743 current value of the urgent pointer. 745 Receipt of an "urgent" indication generates, in a number of 746 implementations (such as those in UNIX-like systems), a software 747 interrupt (signal) that is delivered to the corresponding process. 749 In UNIX-like systems, receipt of an urgent indication causes a 750 SIGURG signal to be delivered to the corresponding process. 752 A number of applications handle TCP urgent indications by 753 installing a signal handler for the corresponding signal (e.g., 754 SIGURG). As discussed in [Zalewski, 2001b], some signal handlers 755 can be maliciously exploited by an attacker, for example to gain 756 remote access to a system. While secure programming of signal 757 handlers is out of the scope of this document, we nevertheless 758 raise awareness that TCP urgent indications might be exploited to 759 abuse poorly-written signal handlers. 761 Section 3.8 discusses the security implications of the TCP urgent 762 mechanism. 764 3.5.5.
ACK 766 DISCUSSION: 768 When the ACK bit is one, the Acknowledgement Number field contains 769 the next sequence number expected, cumulatively acknowledging the 770 receipt of all data up to the sequence number in the 771 Acknowledgement Number, minus one. Section 3.3 of this document 772 describes sanity checks that should be performed on the 773 Acknowledgement Number field. 775 TCP Acknowledgements are also used to perform heuristics for loss 776 recovery and congestion control. Section 9 of this document 777 describes a number of ways in which these mechanisms can be 778 exploited. 780 3.5.6. PSH 782 As a result of a SEND call, TCP SHOULD send all queued data (provided 783 that TCP's flow control and congestion control algorithms allow it). 785 Received data SHOULD be immediately delivered to an application 786 calling the RECEIVE function, even if the data already available are 787 less than those requested by the application. 789 DISCUSSION: 791 RFC 793 [Postel, 1981c] contains (in pages 54-64) a functional 792 description of a TCP Application Programming Interface (API). One 793 of the parameters of the SEND function is the PUSH flag which, 794 when set, signals the local TCP that it must send all unsent data. 795 The TCP PSH (PUSH) flag will be set in the last outgoing segment, 796 to signal the push function to the receiving TCP. Upon receipt of 797 a segment with the PSH flag set, the receiving user's buffer is 798 returned to the user, without waiting for additional data to 799 arrive. 801 There are two security considerations arising from the PUSH 802 function. On the sending side, an attacker could cause a large 803 amount of data to be queued for transmission without setting the 804 PUSH flag in the SEND call. This would prevent the local TCP from 805 sending the queued data, causing system memory to be tied to those 806 data for an unnecessarily long period of time. 808 An analogous consideration should be made for the receiving TCP.
809 TCP is allowed to buffer incoming data until the receiving user's 810 buffer fills or a segment with the PSH bit set is received. If 811 the receiving TCP implements this policy, an attacker could send a 812 large amount of data, slightly less than the receiving user's 813 buffer size, to cause system memory to be tied to these data for 814 an unnecessarily long period of time. Both of these issues are 815 discussed in Section 4.2.2.2 of RFC 1122 [Braden, 1989]. 817 In order to mitigate these potential vulnerabilities, we suggest 818 assuming an implicit "PUSH" in every SEND call. On the sending 819 side, this means that as a result of a SEND call TCP should try to 820 send all queued data (provided that TCP's flow control and 821 congestion control algorithms allow it). On the receiving side, 822 this means that the received data will be immediately delivered to 823 an application calling the RECEIVE function, even if the data 824 already available are less than those requested by the 825 application. 827 It is interesting to note that popular TCP APIs (such as 828 "sockets") do not provide a PUSH flag in any of the interfaces 829 they define, but rather perform some kind of "heuristics" to set 830 the PSH bit in outgoing segments. As a result, the value of the 831 PSH bit in the received TCP segments is usually a policy of the 832 sending TCP, rather than a policy of the sending application. All 833 robust applications that make use of those APIs (such as the 834 sockets API) properly handle the case of a RECEIVE call returning 835 less data (e.g., zero) than requested, usually by performing 836 subsequent RECEIVE calls. 
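The "subsequent RECEIVE calls" pattern mentioned above can be sketched with the sockets API as follows. This is a minimal illustration, assuming a blocking stream socket; the helper name recv_exact is hypothetical. Note that in the sockets API a return of zero bytes from recv() on a blocking socket signals that the peer has closed the connection, so the sketch treats it as an error rather than retrying.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, tolerating short reads from recv()."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than asked
        if not chunk:
            # Zero bytes means the peer closed before n bytes arrived.
            raise ConnectionError(
                f"connection closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

An application written this way does not depend on where the sending TCP happens to set the PSH bit, nor on how the data were segmented in transit.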
838 Another potential malicious use of the PSH bit would be for an 839 attacker to send small TCP segments (probably with zero bytes of 840 data payload) to cause the receiving application to be 841 unnecessarily woken up (increasing the CPU load), or to cause 842 malfunction of poorly-written applications that may not properly 843 handle the case of RECEIVE calls returning less data than requested. 845 3.5.7. RST 847 TCP MUST process RST segments (i.e., segments with the RST bit set) 848 as follows: 850 o If the Sequence Number of the RST segment is not valid (i.e., 851 falls outside of the receive window), silently drop the segment. 853 o If the Sequence Number of the RST segment matches the next 854 expected sequence number (RCV.NXT), abort the corresponding 855 connection. 857 o If the Sequence Number is valid (i.e., falls within the receive 858 window) but is not exactly RCV.NXT, send an ACK segment (a 859 "challenge ACK") of the form: <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>. 860 TCP SHOULD rate-limit these challenge ACK segments. 862 DISCUSSION: 864 The RST bit is used to request the abortion (abnormal close) of a 865 TCP connection. RFC 793 [Postel, 1981c] suggests that an RST 866 segment should be considered valid if its Sequence Number is valid 867 (i.e., falls within the receive window). However, in response to 868 the security concerns raised by [Watson, 2004] and [NISCC, 2004], 869 [Ramaiah et al, 2008] proposed the aforementioned stricter 870 validity checks. 872 Section 11.1 of this document describes TCP-based connection-reset 873 attacks, along with a number of countermeasures to mitigate their 874 impact. 876 3.5.8. SYN 878 DISCUSSION: 880 The SYN bit is used during the connection-establishment phase, to 881 request the synchronization of sequence numbers. 883 There are basically four different vulnerabilities that make use 884 of the SYN bit: SYN-flooding attacks, connection forgery attacks, 885 connection flooding attacks, and connection-reset attacks.
They 886 are described in Section 5.1, Section 5.2, Section 5.3, and 887 Section 11.1.2, respectively, along with the possible 888 countermeasures. 890 3.5.9. FIN 892 DISCUSSION: 894 The FIN flag is used to signal the remote end-point the end of the 895 data transfer in this direction. Receipt of a valid FIN segment 896 (i.e., a TCP segment with the FIN flag set) causes the transition 897 in the connection state, as part of what is usually referred to as 898 the "connection termination phase". 900 The connection-termination phase can be exploited to perform a 901 number of resource-exhaustion attacks. Section 6 of this document 902 describes a number of attacks that exploit the connection- 903 termination phase along with the possible countermeasures. 905 3.6. Window 907 DISCUSSION: 909 The TCP Window field advertises how many bytes of data the remote 910 peer is allowed to send before a new advertisement is made. 911 Theoretically, the maximum transfer rate that can be achieved by 912 TCP is limited to: 914 Maximum Transfer Rate = Window / RTT 916 This means that, under ideal network conditions (e.g., no packet 917 loss), the TCP Window in use should be at least: 919 Window = 2 * Bandwidth * Delay 921 Using a larger Window than that resulting from the previous 922 equation will not provide any improvements in terms of 923 performance. 925 In practice, selection of the most convenient Window size may also 926 depend on a number of other parameters, such as: packet loss rate, 927 loss recovery mechanisms in use, etc. 929 Security implications of the maximum TCP window size 931 An aspect of the TCP Window that is usually overlooked is the 932 security implications of its size. Increasing the TCP window 933 increases the sequence number space that will be considered 934 "valid" for incoming segments. Thus, use of unnecessarily large 935 TCP Window sizes increases TCP's vulnerability to forgery attacks 936 unnecessarily. 
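As a rough worked illustration of the two formulas above and of why window size matters for forgery resistance, consider the sketch below. It rests on assumptions that are not stated in the text: "Delay" is read as the one-way delay (so that 2 * Delay equals the RTT and the two formulas agree), and a blind attacker is assumed to need about 2^32 / Window evenly spaced guesses to land one forged segment inside the receive window. The function names are hypothetical.

```python
SEQ_SPACE = 2 ** 32  # size of the TCP sequence number space


def max_transfer_rate(window_bytes: int, rtt_s: float) -> float:
    """Maximum Transfer Rate = Window / RTT, in bytes per second."""
    return window_bytes / rtt_s


def window_for_path(bandwidth_bytes_per_s: float,
                    one_way_delay_s: float) -> float:
    """Window = 2 * Bandwidth * Delay, in bytes (Delay taken as one-way)."""
    return 2 * bandwidth_bytes_per_s * one_way_delay_s


def blind_guesses_to_cover(window_bytes: int) -> int:
    """Evenly spaced guesses needed so that one falls inside the window."""
    return -(-SEQ_SPACE // window_bytes)  # ceiling division


# A 10 Mbit/s path (1,250,000 bytes/s) with 50 ms one-way delay needs a
# window of about 125,000 bytes to keep the pipe full.
needed = window_for_path(1_250_000, 0.05)

# Shrinking the window from 64 KBytes to 4 KBytes raises the number of
# guesses from 65,536 to 1,048,576.
small, large = blind_guesses_to_cover(4096), blind_guesses_to_cover(65536)
```

Under these assumptions, a 16-fold reduction in the window multiplies the attacker's work by the same factor, which is the trade-off the text describes.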
938 In those scenarios in which the network conditions are known 939 and/or can be easily predicted, it is recommended that the TCP 940 Window never be set to a value larger than that resulting from the 941 equations above. Additionally, the nature of the application 942 running on top of TCP should be considered when tuning the TCP 943 window. As an example, an H.245 signaling application certainly 944 does not have high requirements on throughput, and thus a window 945 size of around 4 KBytes will usually fulfill its needs, while 946 keeping TCP's resistance to off-path forgery attacks at a decent 947 level. Some rough measurements seem to indicate that a TCP window 948 of 4 KBytes is common practice for TCP connections servicing 949 applications such as BGP. 951 In principle, a possible approach to avoid requiring 952 administrators to manually set the TCP window would be to 953 implement an automatic buffer tuning mechanism, such as that 954 described in [Heffner, 2002]. However, as discussed in Section 955 7.3.2 of this document, these mechanisms can be exploited to 956 perform other types of attacks. 958 Security implications arising from closed windows 960 The TCP window is a flow-control mechanism that prevents a fast 961 data sender application from overwhelming a "slow" receiver. When 962 a TCP end-point is not willing to receive any more data (before 963 some of the data that have already been received are consumed), it 964 will advertise a TCP window of zero bytes. This will effectively 965 stop the sender from sending any new data to the TCP receiver. 966 Transmission of new data will resume when the TCP receiver 967 advertises a nonzero TCP window, usually with a TCP segment that 968 contains no data ("an ACK"). 970 This segment is usually referred to as a "window update", as the 971 only purpose of this segment is to update the TCP sender regarding the 972 new window.
974 To accommodate those scenarios in which the ACK segment that 975 "opens" the window is lost, TCP implements a "persist timer" that 976 causes the TCP sender to query the TCP receiver periodically if 977 the last segment received advertised a window of zero bytes. This 978 probe simply consists of sending one byte of new data that will 979 force the TCP receiver to send an ACK segment back to the TCP 980 sender, containing the current TCP window. Similarly to the 981 retransmission timeout timer, an exponential back-off is used when 982 calculating the retransmission timer, so that the spacing between 983 probes increases exponentially. 985 A fundamental difference between the "persist timer" and the 986 retransmission timer is that there is no limit on the amount of 987 time during which a TCP can advertise a zero window. This means 988 that a TCP end-point could potentially advertise a zero window 989 forever, thus keeping kernel memory at the TCP sender tied to the 990 TCP retransmission buffer. This could clearly be exploited as a 991 vector for performing a Denial of Service (DoS) attack against 992 TCP, such as that described in Section 7.1 of this document. 994 Section 7.1 of this document describes a Denial of Service attack 995 that aims at exhausting the kernel memory used for the TCP 996 retransmission buffer, along with possible countermeasures. 998 3.7. Checksum 1000 Middleboxes that process TCP segments MUST validate the Checksum 1001 field, and silently discard the TCP segment if such validation fails. 1003 DISCUSSION: 1005 The Checksum field is an error detection mechanism meant for the 1006 contents of the TCP segment and a number of important fields of 1007 the IP header. It is computed over the full TCP header pre-pended 1008 with a pseudo header that includes the IP Source Address, the IP 1009 Destination Address, the Protocol number, and the TCP segment 1010 length. 
While in principle there should not be security 1011 implications arising from this field, due to non-RFC-compliant 1012 implementations, the Checksum can be exploited to detect 1013 firewalls, evade network intrusion detection systems (NIDS), 1014 and/or perform Denial of Service attacks. 1016 If a stateful firewall does not check the TCP Checksum in the 1017 segments it processes, an attacker can exploit this situation to 1018 perform a variety of attacks. For example, he could send a flood 1019 of TCP segments with invalid checksums, which would nevertheless 1020 create state information at the firewall. When each of these 1021 segments is received at its intended destination, the TCP checksum 1022 will be found to be incorrect, and the corresponding segment will be 1023 silently discarded. As these segments will not elicit a response 1024 (e.g., an RST segment) from the intended recipients, the 1025 corresponding connection state entries at the firewall will not be 1026 removed. Therefore, an attacker may end up tying all the state 1027 resources of the firewall to TCP connections that will never 1028 complete or be terminated, probably leading to a Denial of Service 1029 to legitimate users, or forcing the firewall to randomly drop 1030 connection state entries. 1032 If a NIDS does not check the Checksum of TCP segments, an attacker 1033 may send TCP segments with an invalid checksum to cause the NIDS 1034 to obtain a TCP data stream different from that obtained by the 1035 system being monitored. In order to "confuse" the NIDS, the 1036 attacker would send TCP segments with an invalid Checksum and a 1037 Sequence Number that would overlap the sequence number space being 1038 used for his malicious activity. FTester [Barisani, 2006] is a 1039 tool that can be used to assess NIDS on this issue.
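The validation that firewalls and NIDS are expected to perform before acting on a segment can be sketched as follows for IPv4. This is an illustrative sketch only: the function names are hypothetical, the segment is assumed to be the complete TCP header plus payload as handed up by the IP layer, and the standard 16-bit one's-complement Internet checksum is used over the pseudo header followed by the segment.

```python
import socket
import struct

TCP_PROTOCOL = 6


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum; odd-length input is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo header: addresses, zero byte, protocol, TCP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, TCP_PROTOCOL, tcp_length))


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum to place in a segment whose Checksum field is zero."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ~ones_complement_sum(data) & 0xFFFF


def tcp_checksum_is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A valid segment sums to 0xFFFF when its Checksum is included."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

A middlebox following the requirement in this section would call something like tcp_checksum_is_valid() first and silently discard the segment, without creating any state, when it returns False.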
1041 Finally, an attacker performing port-scanning could potentially 1042 exploit intermediate systems that do not check the TCP Checksum to 1043 detect whether a given TCP port is being filtered by an 1044 intermediate firewall, or the port is actually closed by the host 1045 being port-scanned. If a given TCP port appeared to be closed, 1046 the attacker would then send a SYN segment with an invalid 1047 Checksum. If this segment elicited a response (either an ICMP 1048 error message or a TCP RST segment), then that 1049 response should come from a system that does not check the TCP 1050 checksum. Since normal host implementations of the TCP protocol 1051 do check the TCP checksum, such a response would most likely come 1052 from a firewall or some other middle-box. 1054 [Ed3f, 2002] describes the exploitation of the TCP checksum for 1055 performing the above activities. [US-CERT, 2005d] provides an 1056 example of a TCP implementation that failed to check the TCP 1057 checksum. 1059 3.8. Urgent pointer 1061 Segment.Size - Data Offset * 4 > 0 1063 If a TCP segment with the URG bit set does not pass this check, it 1064 MUST be silently dropped. 1066 For TCP segments that have the URG bit set to zero, a sending TCP 1067 SHOULD set the Urgent Pointer to zero. 1069 A receiving TCP MUST ignore the Urgent Pointer field of TCP segments 1070 for which the URG bit is zero. 1072 DISCUSSION: 1074 Section 3.7 of RFC 793 [Postel, 1981c] states (in page 42) that to 1075 send an urgent indication the user must also send at least one 1076 byte of data. 1078 If the URG bit is zero, the Urgent Pointer is not valid, and thus 1079 should not be processed by the receiving TCP. Nevertheless, we 1080 recommend that TCP implementations set the Urgent Pointer to zero 1081 when sending a TCP segment that does not have the URG bit set, and 1082 ignore the Urgent Pointer (as required by RFC 793) when the URG 1083 bit is zero.
1085 Some stacks have been known to fail to set the Urgent Pointer to 1086 zero when the URG bit is zero, thus leaking out the corresponding 1087 system memory contents. [Zalewski, 2008] provides further details 1088 about this issue. 1090 Some implementations have been found to be unable to process TCP 1091 urgent indications correctly. [Myst, 1997] originally described 1092 how TCP urgent indications could be exploited to perform a Denial 1093 of Service (DoS) attack against some TCP/IP implementations, 1094 usually leading to a system crash. 1096 3.9. Options 1098 [IANA, 2007] contains the official list of the assigned option 1099 numbers. TCP Options have been specified in the past both within the 1100 IETF and by other groups. [Hnes, 2007] contains an un-official 1101 updated version of the IANA list of assigned option numbers. The 1102 following table contains a summary of the assigned TCP option 1103 numbers, which is based on [Hnes, 2007]. 1105 +--------+----------------------+-----------------------------------+ 1106 | Kind | Meaning | Summary | 1107 +--------+----------------------+-----------------------------------+ 1108 | 0 | End of Option List | Discussed in Section 4.1 | 1109 +--------+----------------------+-----------------------------------+ 1110 | 1 | No-Operation | Discussed in Section 4.2 | 1111 +--------+----------------------+-----------------------------------+ 1112 | 2 | Maximum Segment Size | Discussed in Section 4.3 | 1113 +--------+----------------------+-----------------------------------+ 1114 | 3 | WSOPT - Window Scale | Discussed in Section 4.6 | 1115 +--------+----------------------+-----------------------------------+ 1116 | 4 | SACK Permitted | Discussed in Section 4.4.1 | 1117 +--------+----------------------+-----------------------------------+ 1118 | 5 | SACK | Discussed in Section 4.4.2 | 1119 +--------+----------------------+-----------------------------------+ 1120 | 6 | Echo (obsoleted by | Obsolete. 
Specified in RFC 1072 | 1121 | | option 8) | [Jacobson and Braden, 1988] | 1122 +--------+----------------------+-----------------------------------+ 1123 | 7 | Echo Reply | Obsolete. Specified in RFC 1072 | 1124 | | (obsoleted by option | [Jacobson and Braden, 1988] | 1125 | | 8) | | 1126 +--------+----------------------+-----------------------------------+ 1127 | 8 | TSOPT - Time Stamp | Discussed in Section 4.7 | 1128 | | Option | | 1129 +--------+----------------------+-----------------------------------+ 1130 | 9 | Partial Order | Historic. Specified in RFC 1693 | 1131 | | Connection Permitted | [Connolly et al, 1994] | 1132 +--------+----------------------+-----------------------------------+ 1133 | 10 | Partial Order | Historic. Specified in RFC 1693 | 1134 | | Service Profile | [Connolly et al, 1994] | 1135 +--------+----------------------+-----------------------------------+ 1136 | 11 | CC | Historic. Specified in RFC 1644 | 1137 | | | [Braden, 1994] | 1138 +--------+----------------------+-----------------------------------+ 1139 | 12 | CC.NEW | Historic. Specified in RFC 1644 | 1140 | | | [Braden, 1994] | 1141 +--------+----------------------+-----------------------------------+ 1142 | 13 | CC.ECHO | Historic. Specified in RFC 1644 | 1143 | | | [Braden, 1994] | 1144 +--------+----------------------+-----------------------------------+ 1145 | 14 | TCP Alternate | Historic. Specified in RFC 1146 | 1146 | | Checksum Request | [Zweig and Partridge, 1990] | 1147 +--------+----------------------+-----------------------------------+ 1148 | 15 | TCP Alternate | Historic. 
Specified in RFC 1146 | 1149 | | Checksum Data | [Zweig and Partridge, 1990] | 1150 +--------+----------------------+-----------------------------------+ 1151 | 16 | Skeeter | Historic | 1152 +--------+----------------------+-----------------------------------+ 1154 | 17 | Bubba | Historic | 1155 +--------+----------------------+-----------------------------------+ 1156 | 18 | Trailer Checksum | Historic | 1157 | | Option | | 1158 +--------+----------------------+-----------------------------------+ 1159 | 19 | MD5 Signature Option | Discussed in Section 4.5 | 1160 +--------+----------------------+-----------------------------------+ 1161 | 20 | SCPS Capabilities | Specified in [CCSDS, 2006] | 1162 +--------+----------------------+-----------------------------------+ 1163 | 21 | Selective Negative | Specified in [CCSDS, 2006] | 1164 | | Acknowledgements | | 1165 +--------+----------------------+-----------------------------------+ 1166 | 22 | Record Boundaries | Specified in [CCSDS, 2006] | 1167 +--------+----------------------+-----------------------------------+ 1168 | 23 | Corruption | Specified in [CCSDS, 2006] | 1169 | | experienced | | 1170 +--------+----------------------+-----------------------------------+ 1171 | 24 | SNAP | Historic | 1172 +--------+----------------------+-----------------------------------+ 1173 | 25 | Unassigned (released | Unassigned | 1174 | | 2000-12-18) | | 1175 +--------+----------------------+-----------------------------------+ 1176 | 26 | TCP Compression | Historic | 1177 | | Filter | | 1178 +--------+----------------------+-----------------------------------+ 1179 | 27 | Quick-Start Response | Specified in RFC 4782 [Floyd et | 1180 | | | al, 2007] | 1181 +--------+----------------------+-----------------------------------+ 1182 | 28-252 | Unassigned | Unassigned | 1183 +--------+----------------------+-----------------------------------+ 1184 | 253 |
RFC3692-style | Described by RFC 4727 [Fenner, | 1185 | | Experiment 1 | 2006] | 1186 +--------+----------------------+-----------------------------------+ 1187 | 254 | RFC3692-style | Described by RFC 4727 [Fenner, | 1188 | | Experiment 2 | 2006] | 1189 +--------+----------------------+-----------------------------------+ 1191 Table 1: TCP Options 1193 There are two cases for the format of a TCP option: 1195 o Case 1: A single byte of option-kind. 1197 o Case 2: An option-kind byte, followed by an option-length byte, 1198 and the actual option-data bytes. 1200 In options of the Case 2 above, the option-length byte counts the 1201 option-kind byte and the option-length byte, as well as the actual 1202 option-data bytes. 1204 All options except "End of Option List" (Kind = 0) and "No Operation" 1205 (Kind = 1), are of "Case 2". 1207 For options that belong to the "Case 2" described above, the 1208 following checks MUST be performed: 1210 option-length >= 2 1212 option-offset + option-length <= Data Offset * 4 1214 Where option-offset is the offset of the first byte of the option 1215 within the TCP header, with the first byte of the TCP header being 1216 assigned an offset of 0. 1218 If a TCP segment fails to pass any of these checks, it SHOULD be 1219 silently dropped. 1221 TCP MUST ignore unknown TCP options, provided they pass the 1222 validation checks specified above. In the same way, middle-boxes 1223 such as packet filters SHOULD NOT reject TCP segments containing 1224 "unknown" TCP options that pass the validation checks described 1225 earlier in this Section. 1227 DISCUSSION: 1229 The value "2" in the first equation accounts for the option-kind 1230 byte and the option-length byte, and assumes zero bytes of option- 1231 data. This check prevents, among other things, loops in option 1232 processing that may arise from incorrect option lengths. 
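The two "Case 2" checks can be illustrated with a short option walker. This is a minimal sketch, not taken from any particular TCP implementation: the function name parse_tcp_options, the constants, and the choice to signal a drop by raising ValueError are assumptions made for illustration. It assumes that "header" holds the TCP header starting at offset 0 and that options begin right after the 20-byte fixed header.

```python
from typing import List, Tuple

END_OF_OPTION_LIST = 0
NO_OPERATION = 1
TCP_FIXED_HEADER_LEN = 20  # options start right after the fixed header


def parse_tcp_options(header: bytes, data_offset: int) -> List[Tuple[int, bytes]]:
    """Return (kind, option-data) pairs for a TCP header.

    Raises ValueError when a validation check fails, so that the
    (hypothetical) caller can silently drop the segment.
    """
    header_len = data_offset * 4
    if header_len < TCP_FIXED_HEADER_LEN or header_len > len(header):
        raise ValueError("Data Offset inconsistent with segment size")

    options = []
    offset = TCP_FIXED_HEADER_LEN  # option-offset, counted from byte 0
    while offset < header_len:
        kind = header[offset]
        if kind == END_OF_OPTION_LIST:
            break
        if kind == NO_OPERATION:
            offset += 1  # "Case 1": a single option-kind byte
            continue
        # "Case 2": an option-length byte must follow the option-kind byte.
        if offset + 1 >= header_len:
            raise ValueError("option-length byte missing")
        length = header[offset + 1]
        if length < 2:
            # Without this check a zero length would loop forever.
            raise ValueError("option-length < 2")
        if offset + length > header_len:
            raise ValueError("option extends past the TCP header")
        # Unknown kinds are returned like any other; the caller ignores them.
        options.append((kind, header[offset + 2:offset + length]))
        offset += length
    return options
```

Note that the loop always advances by at least one byte, which is the property the "option-length >= 2" check is meant to guarantee.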
1234 The second equation takes into account the limit on the legitimate 1235 option length imposed by the syntax of the TCP header, and is 1236 meant to detect forged option-length values that might make an 1237 option overlap with the TCP payload, or even go past the actual 1238 end of the TCP segment carrying the option. 1240 Middle-boxes such as packet filters should not reject TCP segments 1241 containing unknown options solely because these options have not been 1242 present in the SYN/SYN-ACK handshake. 1244 DISCUSSION: 1246 There is renewed interest in defining new TCP options for purposes 1247 like improved connection management and maintenance, advanced 1248 congestion control schemes, and security features. The evolution 1249 of the TCP/IP protocol suite would be severely impacted by 1250 obstacles to deploying such new protocol mechanisms. 1252 Middle-boxes such as packet filters SHOULD NOT reject TCP segments 1253 containing unknown options solely because these options have not been 1254 present in the SYN/SYN-ACK handshake. 1256 DISCUSSION: 1258 In the past, TCP enhancements based on TCP options regularly have 1259 specified the exchange of a specific "enabling" option during the 1260 initial SYN/SYN-ACK handshake. Due to the severely limited TCP 1261 option space which has already become a concern, it should be 1262 expected that future specifications might introduce new options 1263 not negotiated or enabled in this way. Therefore, middle-boxes 1264 such as packet filters should not reject TCP segments containing 1265 unknown options solely because these options have not been present 1266 in the SYN/SYN-ACK handshake. 1268 TCP MUST NOT "echo" in any way unknown TCP options received in 1269 inbound TCP segments. 1271 DISCUSSION: 1273 Some TCP implementations have been known to "echo" unknown TCP 1274 options received in incoming segments. Here we stress that TCP 1275 must not "echo" in any way unknown TCP options received in inbound 1276 TCP segments. 
This is at the foundation for the introduction of 1277 new TCP options, ensuring unambiguous behavior of systems not 1278 supporting a new specification. 1280 Section 4 discusses the security implications of common TCP options. 1282 3.10. Padding 1284 The TCP header padding is used to ensure that the TCP header ends and 1285 data begins on a 32-bit boundary. The padding is composed of zeros. 1287 3.11. Data 1289 The data field contains the upper-layer packet being transmitted by 1290 means of TCP. This payload is processed by the application process 1291 making use of the transport services of TCP. Therefore, the security 1292 implications of this field are out of the scope of this document. 1294 4. Common TCP Options 1296 4.1. End of Option List (Kind = 0) 1298 TCP implementations MUST be able to gracefully handle those TCP 1299 segments in which the End of Option List should have been present, 1300 but is missing. 1302 DISCUSSION: 1304 This option is used to indicate the "end of options" in those 1305 cases in which the end of options would not coincide with the end 1306 of the TCP header. 1308 TCP implementations are required to ignore those options they do 1309 not implement, and to be able to handle options with illegal 1310 lengths. Therefore, TCP implementations should be able to 1311 gracefully handle those TCP segments in which the End of Option 1312 List should have been present, but is missing. 1314 It is interesting to note that some TCP implementations do not use 1315 the "End of Option List" option for indicating the "end of 1316 options", but simply pad the TCP header with several "No 1317 Operation" (Kind = 1) options to meet the header length specified 1318 by the Data Offset header field. 1320 4.2. No Operation (Kind = 1) 1322 The no-operation option is basically used to allow the sending system 1323 to align subsequent options in, for example, 32-bit boundaries. 1325 This option does not have any known security implications. 1327 4.3. 
Maximum Segment Size (Kind = 2) 1329 The Maximum Segment Size (MSS) option is used to indicate to the 1330 remote TCP endpoint the maximum segment size this TCP is willing to 1331 receive. 1333 The following check MUST be performed on a TCP segment that carries an 1334 MSS option: 1336 SYN == 1 1338 If the segment does not pass this check, it MUST be silently dropped. 1340 DISCUSSION: 1342 As stated in Section 3.1 of RFC 793 [Postel, 1981c], this option 1343 can only be sent in the initial connection request (i.e., in 1344 segments with the SYN control bit set). 1346 TCP MUST check that the option length is 4. If the option does not 1347 pass this check, it MUST be dropped. 1349 The received MSS SHOULD be sanitized as follows: 1351 Sanitized_MSS = max(MSS, 536) 1353 This "sanitized" MSS value SHOULD be used to compute the "effective 1354 send MSS" by the expression included in Section 4.2.2.6 of RFC 1122 1355 [Braden, 1989], as follows: 1357 Eff.snd.MSS = min(Sanitized_MSS+20, MMS_S) - TCPhdrsize - IPoptionsize 1359 where: 1361 Sanitized_MSS: 1362 sanitized MSS value (the value received in the MSS option, with an 1363 enforced minimum value) 1365 MMS_S: 1366 maximum size for a transport-layer message that TCP may send 1368 TCPhdrsize: 1369 size of the TCP header, which typically was 20, but may be larger 1370 if TCP options are to be sent. 1372 IPoptionsize: 1373 size of any IP options that TCP will pass to the IP layer with the 1374 current message. 1376 DISCUSSION: 1378 The advertised maximum segment size may be the result of the 1379 consideration of a number of factors. Firstly, if fragmentation 1380 is employed, the size of the IP reassembly buffer may impose a 1381 limit on the maximum TCP segment size that can be received. 1382 Considering that the minimum IP reassembly buffer size is 576 1383 bytes, if an MSS option is not included in the connection- 1384 establishment phase, an MSS of 536 bytes should be assumed. 
1385 Secondly, if Path-MTU Discovery (specified in RFC 1191 [Mogul and 1386 Deering, 1990] and RFC 1981 [McCann et al, 1996]) is expected to 1387 be used for the connection, an artificial maximum segment size may 1388 be enforced by a TCP to prevent the remote peer from sending TCP 1389 segments which would be too large to be transmitted without 1390 fragmentation. Finally, a system connected by a low-speed link 1391 may choose to introduce an artificial maximum segment size to 1392 enforce an upper limit on the network latency that would otherwise 1393 negatively affect its interactive applications [Stevens, 1994]. 1395 The TCP specifications do not impose any requirements on the 1396 maximum segment size value that is included in the MSS option. 1397 However, there are a number of values that may cause undesirable 1398 results. Firstly, an MSS of 0 could possibly "freeze" the TCP 1399 connection, as it would not allow data to be included in the 1400 payload of the TCP segments. Secondly, low values other than 0 1401 would degrade the performance of the TCP connection (wasting more 1402 bandwidth in protocol headers than in actual data), and could 1403 potentially exhaust processing cycles at the sending TCP and/or 1404 the receiving TCP by producing an increase in the interrupt rate 1405 caused by the transmitted (or received) packets. 1407 The problems that might arise from low MSS values were first 1408 described by [Reed, 2001]. However, the community did not reach 1409 consensus on how to deal with these issues at that point. 1411 RFC 791 [Postel, 1981a] requires IP implementations to be able to 1412 receive IP datagrams of at least 576 bytes. Assuming an IPv4 1413 header of 20 bytes, and a TCP header of 20 bytes, there should be 1414 room in each IP packet for 536 application data bytes. 
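The sanitization rule and the RFC 1122 expression for the effective send MSS can be worked through numerically. The following is a minimal sketch; the function names and default arguments are hypothetical, and MMS_S is simply passed in by the caller (for example, 1480 for a 1500-byte MTU with a 20-byte IPv4 header and no IP options).

```python
MIN_MSS = 536  # 576-byte minimum datagram minus 20-byte IPv4 and TCP headers


def sanitize_mss(received_mss: int) -> int:
    """Sanitized_MSS = max(MSS, 536)"""
    return max(received_mss, MIN_MSS)


def effective_send_mss(received_mss: int, mms_s: int,
                       tcp_hdr_size: int = 20, ip_option_size: int = 0) -> int:
    """Eff.snd.MSS = min(Sanitized_MSS+20, MMS_S) - TCPhdrsize - IPoptionsize"""
    sanitized = sanitize_mss(received_mss)
    return min(sanitized + 20, mms_s) - tcp_hdr_size - ip_option_size


# A peer advertising MSS = 0 can no longer "freeze" the connection:
print(effective_send_mss(0, 1480))         # 536
# A regular Ethernet peer is unaffected by the lower limit:
print(effective_send_mss(1460, 1480))      # 1460
# 12 bytes of TCP options (e.g., Timestamps plus padding) reduce the payload:
print(effective_send_mss(1460, 1480, 32))  # 1448
```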
1416 There are two cases to analyze when considering the possible 1417 interoperability impact of sanitizing the received MSS value: TCP 1418 connections relying on IP fragmentation and TCP connections 1419 implementing Path-MTU Discovery. In case the corresponding TCP 1420 connection relies on IP fragmentation, given that the minimum 1421 reassembly buffer size is required to be 576 bytes by RFC 791 1422 [Postel, 1981a], the adoption of 536 bytes as a lower limit is 1423 safe. 1425 In case the TCP connection relies on Path-MTU Discovery, imposing 1426 a lower limit on the adopted MSS may ignore the advice of the 1427 remote TCP on the maximum segment size that can possibly be 1428 transmitted without fragmentation. As a result, this could lead 1429 to the first TCP data segment being larger than the Path-MTU. 1430 However, in such a scenario, the TCP segment should elicit an ICMP 1431 Unreachable "fragmentation needed and DF bit set" error message 1432 that would cause the "effective send MSS" (E_MSS) to be decreased 1433 appropriately. Thus, imposing a lower limit on the accepted MSS 1434 will not cause any interoperability problems. 1436 A possible scenario exists in which the proposed enforcement of a 1437 lower limit in the received MSS might lead to an interoperability 1438 problem. If a system was attached to the network by means of a 1439 link with an MTU of less than 576 bytes, and there was some 1440 intermediate system which either silently dropped (i.e., without 1441 sending an ICMP error message) those packets equal to or larger 1442 than 576 bytes, or some intermediate system simply filtered 1443 ICMP "fragmentation needed and DF bit set" error messages, the 1444 proposed behavior would lead to an interoperability problem, 1445 when communication could have otherwise succeeded. 
However, the 1446 interoperability problem would really be introduced by the network 1447 setup (e.g., the middle-box silently dropping packets), rather 1448 than by the mechanism proposed in this section. In any case, TCP 1449 should nevertheless implement a mechanism such as that specified 1450 by RFC 4821 [Mathis and Heffner, 2007] to deal with this type of 1451 "network black-holes". 1453 4.4. Selective Acknowledgement Option 1455 The Selective Acknowledgement option provides an extension to allow 1456 the acknowledgement of individual segments, to enhance TCP's loss 1457 recovery. 1459 Two options are involved in the SACK mechanism. The "Sack-permitted 1460 option" is sent during the connection-establishment phase, to 1461 advertise that SACK is supported. If both TCP peers agree to use 1462 selective acknowledgements, the actual selective acknowledgements are 1463 sent, if needed, by means of "SACK options". 1465 4.4.1. SACK-permitted Option (Kind = 4) 1467 The SACK-permitted option is meant to advertise that the TCP sending 1468 this segment supports Selective Acknowledgements. 1470 The following check MUST be performed on a TCP segment that carries a 1471 SACK-permitted option: 1473 SYN == 1 1475 If a segment does not pass this check, it MUST be silently dropped. 1477 DISCUSSION: 1479 The SACK-permitted option can be sent only in SYN segments. 1481 TCP MUST check that the option length is 2. If the option does not 1482 pass this check it MUST be silently dropped. 1484 4.4.2. SACK Option (Kind = 5) 1486 The SACK option is used to convey extended acknowledgment information 1487 from the receiver to the sender over an established TCP connection. 1488 The option consists of an option-kind byte (which must be 5), an 1489 option-length byte, and a variable number of SACK blocks. 
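The validation that this section requires for a received SACK option (the option must fit within the TCP header, must carry at least one block, must carry a whole number of 8-byte blocks, and each block must satisfy Left Edge < Right Edge in sequence number arithmetic) can be sketched as follows. The helper names seq_lt and parse_sack_option are hypothetical, and raising ValueError stands in for silently discarding the segment.

```python
import struct
from typing import List, Tuple

SACK_KIND = 5


def seq_lt(a: int, b: int) -> bool:
    """True if a < b in 32-bit (modulo 2^32) sequence number arithmetic."""
    return a != b and ((b - a) & 0xFFFFFFFF) < 0x80000000


def parse_sack_option(header: bytes, option_offset: int,
                      data_offset: int) -> List[Tuple[int, int]]:
    """Return the (left edge, right edge) pairs of a SACK option.

    option_offset is counted from the first byte of the TCP header.
    Raises ValueError if the segment should be discarded.
    """
    header_len = data_offset * 4
    if header_len > len(header) or option_offset + 2 > header_len:
        raise ValueError("option does not fit in the TCP header")
    kind, length = header[option_offset], header[option_offset + 1]
    if kind != SACK_KIND:
        raise ValueError("not a SACK option")
    if option_offset + length > header_len:
        raise ValueError("option extends past the TCP header")
    if length < 10:
        raise ValueError("SACK option with zero SACK blocks")
    if (length - 2) % 8 != 0:
        raise ValueError("length is not of the form 8*n+2")

    blocks = []
    for pos in range(option_offset + 2, option_offset + length, 8):
        left, right = struct.unpack_from("!II", header, pos)
        if not seq_lt(left, right):
            raise ValueError("Left Edge is not below Right Edge")
        blocks.append((left, right))
    return blocks
```

The modular comparison in seq_lt matters when a block straddles the point at which the sequence number space wraps around: a left edge of 0xFFFFFFF0 is still "below" a right edge of 0x10.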
1491 TCP MUST silently discard those TCP segments carrying a SACK option 1492 that does not pass the following check: 1494 option-offset + option-length <= Data Offset * 4 1496 TCP MUST silently discard those TCP segments carrying a SACK option 1497 that does not pass the following check: 1499 option-length >= 10 1501 DISCUSSION: 1503 A SACK Option with zero SACK blocks is nonsensical. The value 1504 "10" accounts for the option-kind byte, the option-length byte, a 1505 4-byte left-edge field, and a 4-byte right-edge field. 1507 TCP MUST silently discard those TCP segments carrying a SACK option 1508 that does not pass the following check: 1510 (option-length - 2) % 8 == 0 1512 DISCUSSION: 1514 As stated in Section 3 of RFC 2018 [Mathis et al, 1996], a SACK 1515 option that specifies n blocks will have a length of 8*n+2. 1517 TCP MUST silently discard those TCP segments carrying a SACK option 1518 that contains a SACK block that does not pass the following check: 1520 Left Edge of Block < Right Edge of Block 1522 As in all the other occurrences in this document, all comparisons 1523 between sequence numbers should be performed using sequence number 1524 arithmetic. 1526 DISCUSSION: 1528 Each block included in a SACK option represents a number of 1529 received data bytes that are contiguous and isolated; that is, the 1530 bytes just below the block, (Left Edge of Block - 1), and just 1531 above the block, (Right Edge of Block), have not yet been 1532 received. 1534 TCP MUST enforce a limit on the number of SACK blocks that a TCP will 1535 store in memory for each connection at any time. 1537 DISCUSSION: 1539 The TCP receiving a SACK option is expected to keep track of the 1540 selectively-acknowledged blocks. 
Even when space in the TCP 1541 header is limited (and thus each TCP segment can selectively- 1542 acknowledge at most four blocks of data), an attacker could try to 1543 perform a buffer overflow or a resource-exhaustion attack by 1544 sending a large number of SACK options. 1546 For example, an attacker could send a large number of SACK 1547 options, each of them acknowledging one byte of data. 1548 Additionally, for the purpose of wasting resources on the attacked 1549 system, each of these blocks would be separated from each other by 1550 one byte, to prevent the attacked system from coalescing two (or 1551 more) contiguous SACK blocks into a single SACK block. If the 1552 attacked system kept track of each SACKed block by storing both 1553 the Left Edge and the Right Edge of the block, then for each 1554 window of data, the attacker could waste up to 4 * Window bytes of 1555 memory at the attacked TCP. 1557 The value "4 * Window" results from the expression "(Window / 2) * 1558 8", in which the value "2" accounts for the 1-byte block 1559 selectively-acknowledged by each SACK block and 1 byte that would 1560 be used to separate each SACK block from the others, and the 1561 value "8" accounts for the 8 bytes needed to store the Left Edge 1562 and the Right Edge of each SACKed block. 1564 Therefore, it is clear that a limit should be imposed on the 1565 number of SACK blocks that a TCP will store in memory for each 1566 connection at any time. Measurements in [Dharmapurikar and 1567 Paxson, 2005] indicate that in the vast majority of cases 1568 connections have a single hole in the data stream at any given 1569 time. Thus, a limit of 16 SACK blocks for each connection would 1570 handle even most of the more unusual cases in which there is more 1571 than one simultaneous hole at a time. 1573 4.5. MD5 Option (Kind=19) 1575 The TCP MD5 option provides a mechanism for authenticating TCP 1576 segments with a 16-byte digest produced by the MD5 algorithm. 
The 1577 option consists of an option-kind byte (which must be 19), an option- 1578 length byte (which must be 18), and a 16-byte MD5 digest. 1580 TCP MUST silently drop a TCP segment that carries a TCP MD5 option 1581 that does not pass the following checks: 1583 option-offset + option-length <= Data Offset * 4 1585 option-length == 18 1587 DISCUSSION: 1589 The TCP MD5 option is of "Case 2", and has a fixed length. 1591 DISCUSSION: 1593 A basic weakness of the TCP MD5 option is that the MD5 algorithm 1594 itself has been known (for a long time) to be vulnerable to 1595 collision search attacks. 1597 [Bellovin, 2006] argues that it has two other weaknesses, namely 1598 that it does not provide a key identifier, and that it has no 1599 provision for automated key management. However, it is generally 1600 accepted that while a Key-ID field can be a good approach for 1601 providing smooth key rollover, it is not actually a requirement. 1602 For instance, most systems implementing the TCP MD5 option include 1603 a "keychain" mechanism that fully supports smooth key rollover. 1604 Additionally, with some further work, ISAKMP/IKE could be used to 1605 configure the MD5 keys. 1607 It is interesting to note that while the TCP MD5 option, as 1608 specified by RFC 2385 [Heffernan, 1998], addresses the TCP-based 1609 forgery attacks against TCP discussed in Section 11, it does not 1610 address the ICMP-based connection-reset attacks discussed in 1611 Section 15. As a result, while a TCP connection may be protected 1612 from TCP-based forgery attacks by means of the MD5 option, an 1613 attacker might still be able to successfully perform the ICMP- 1614 based counter-part. 1616 The TCP MD5 option has been obsoleted by the TCP-AO. 1618 4.6. Window scale option (Kind = 3) 1620 The window scale option provides a mechanism to expand the definition 1621 of the TCP window to 32 bits, such that the performance of TCP can be 1622 improved in some network scenarios. 
The Window scale option consists 1623 of an option-kind byte (which must be 3), followed by an option- 1624 length byte (which must be 3), and a shift count (shift.cnt) byte 1625 (the actual option-data). 1627 The option may be sent only in the initial SYN segment, but may also 1628 be sent in a SYN/ACK segment if the option was received in the 1629 initial SYN segment. If the option is received in any other segment, 1630 it MUST be silently dropped. 1632 TCP MUST silently discard TCP segments that contain a Window scale 1633 option whose option-length is not 3. 1635 DISCUSSION: 1637 This option has a fixed length. 1639 TCP MUST silently discard TCP segments that contain a Window scale 1640 option that does not pass the following check: 1642 shift.cnt <= 14 1644 DISCUSSION: 1646 As discussed in Section 2.3 of RFC 1323 [Jacobson et al, 1992], in 1647 order to prevent new data from being mistakenly considered as old 1648 and vice versa, the resulting window should be equal to or smaller 1649 than 2^30. 1651 DISCUSSION: 1653 [Welzl, 2008] describes major problems with the use of the Window 1654 scale option in the Internet due to faulty equipment. 1656 While there are no known security implications arising from the 1657 window scale mechanism itself, the size of the TCP window has a 1658 number of security implications. In general, larger window sizes 1659 increase the chances of an attacker successfully performing 1660 forgery attacks against TCP, such as those described in Section 11 1661 of this document. Additionally, large windows can exacerbate the 1662 impact of resource exhaustion attacks such as those described in 1663 Section 7 of this document. 1665 Section 3.7 provides a general discussion of the security 1666 implications of the TCP window size. Section 7.3.2 discusses the 1667 security implications of Automatic receive-buffer tuning 1668 mechanisms. 1670 4.7. 
Timestamps option (Kind = 8) 1672 The Timestamps option, specified in RFC 1323 [Jacobson et al, 1992], 1673 is used to perform two functions: Round-Trip Time Measurement (RTTM), 1674 and Protection Against Wrapped Sequence Numbers (PAWS). 1676 TCP MUST silently discard TCP segments that contain a Timestamps 1677 option that does not pass the following check: 1679 option-length == 10 1681 DISCUSSION: 1683 As specified by RFC 1323, the option-length must be 10. 1685 4.7.1. Generation of timestamps 1687 TCP SHOULD generate timestamps with the following expression: 1689 timestamp = T() + F(localhost, localport, remotehost, remoteport, secret_key) 1691 where the result of T() is a global system clock that complies with 1692 the requirements of Section 4.2.2 of RFC 1323 [Jacobson et al, 1992], 1693 and F() is a function that should not be computable from the outside. 1694 Therefore, we suggest F() to be a cryptographic hash function of the 1695 connection-id and some secret data. 1697 DISCUSSION: 1699 For the purpose of PAWS, the timestamps sent on a connection are 1700 required to be monotonically increasing. While there is no 1701 requirement that timestamps are monotonically increasing across 1702 TCP connections, the generation of timestamps such that they are 1703 monotonically increasing across connections between the same two 1704 endpoints allows the use of timestamps for improving the handling 1705 of SYN segments that are received while the corresponding four- 1706 tuple is in the TIME-WAIT state. This is discussed in Section 1707 11.1.2 of this document. 1709 F() provides an offset that will be the same for all incarnations 1710 of a connection between the same two endpoints, while T() provides 1711 the monotonically increasing values that are needed for PAWS. 1713 Further discussion about this algorithm is available in 1714 [I-D.gont-timestamps-generation]. 
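The expression timestamp = T() + F(...) can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of any specific implementation: HMAC-SHA-256 truncated to 32 bits is one possible choice for F(), the 1 ms tick rate and the use of a monotonic clock for T() are assumptions, and SECRET_KEY and the optional "now" parameter (used only to make the example reproducible) are hypothetical.

```python
import hashlib
import hmac
import os
import struct
import time

SECRET_KEY = os.urandom(16)  # hypothetical secret, e.g. generated at boot
TICKS_PER_SECOND = 1000      # assumed tick rate for the timestamp clock


def T() -> int:
    """Global timestamp clock, reduced to 32 bits."""
    return int(time.monotonic() * TICKS_PER_SECOND) & 0xFFFFFFFF


def F(localhost: str, localport: int, remotehost: str, remoteport: int,
      secret_key: bytes) -> int:
    """Per-four-tuple offset that cannot be computed without the secret."""
    conn_id = f"{localhost}|{localport}|{remotehost}|{remoteport}".encode()
    digest = hmac.new(secret_key, conn_id, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def timestamp(localhost: str, localport: int, remotehost: str,
              remoteport: int, secret_key: bytes = SECRET_KEY,
              now: int = None) -> int:
    """timestamp = T() + F(...), modulo 2^32."""
    clock = T() if now is None else now
    offset = F(localhost, localport, remotehost, remoteport, secret_key)
    return (clock + offset) & 0xFFFFFFFF
```

Because F() depends only on the four-tuple and the secret, two incarnations of the same connection share an offset, so their timestamps advance exactly as T() does, while an observer on a different connection sees an unrelated offset.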
1716 TCP SHOULD NOT initialize a global timestamp counter to a fixed value 1717 when the system is bootstrapped. 1719 DISCUSSION: 1721 Some implementations are known to initialize their global 1722 timestamp clock to zero when the system is bootstrapped. This is 1723 undesirable, as the timestamp clock would disclose the system 1724 uptime. 1726 TCP SHOULD set the Timestamp Echo Reply (TSecr) field to zero when 1727 sending a TCP segment that does not have the ACK bit set (i.e., a SYN 1728 segment). 1730 DISCUSSION: 1732 Some TCP implementations have been found to fail to set the 1733 Timestamp Echo Reply field (TSecr) to zero in TCP segments that do 1734 not have the ACK bit set, thus potentially leaking information. 1736 4.7.2. Vulnerabilities 1738 Blind In-Window Attacks 1740 Segments that contain a timestamp option smaller than the last 1741 timestamp option recorded by TCP are silently dropped. This allows 1742 for a subtle attack against TCP that would allow an attacker to cause 1743 one direction of data transfer of the attacked connection to freeze 1744 [US-CERT, 2005c]. An attacker could forge a TCP segment that 1745 contains a timestamp that is much larger than the last timestamp 1746 recorded for that direction of the data transfer of the connection. 1747 The offending segment would cause the recorded timestamp (TS.Recent) 1748 to be updated and, as a result, subsequent segments sent by the 1749 impersonated TCP peer would be simply dropped by the receiving TCP. 1750 This vulnerability has been documented in [US-CERT, 2005d]. However, 1751 it is worth noting that exploitation of this vulnerability requires 1752 an attacker to guess (or know) the four-tuple {IP Source Address, IP 1753 Destination Address, TCP Source Port, TCP Destination Port}, as well 1754 as a valid Sequence Number and a valid Acknowledgement Number. 
If an 1755 attacker has such detailed knowledge about a TCP connection, unless 1756 TCP segments are protected by proper authentication mechanisms (such 1757 as IPsec [Kent and Seo, 2005]), he can perform a variety of attacks 1758 against the TCP connection, even more devastating than the one just 1759 described. 1761 Information leaking 1763 Some implementations are known to maintain a global timestamp clock, 1764 which is used for all connections. This is undesirable, as an 1765 attacker that can establish a connection with a host would learn the 1766 timestamp used for all the other connections maintained by that host, 1767 which could be useful for performing any attacks that require the 1768 attacker to forge TCP segments. A timestamps generator such as the 1769 one recommended in Section 4.7.1 of this document would prevent this 1770 information leakage, as it separates the "timestamps space" among the 1771 different TCP connections. 1773 Some implementations are known to initialize their global timestamp 1774 clock to zero when the system is bootstrapped. This is undesirable, 1775 as the timestamp clock would disclose the system uptime. A 1776 timestamps generator such as the one recommended in Section 4.7.1 of 1777 this document would prevent this information leakage, as the function 1778 F() introduces an "offset" that does not disclose the system uptime. 1780 As discussed in Section 3.2 of RFC 1323 [Jacobson et al, 1992], the 1781 Timestamp Echo Reply field (TSecr) is only valid if the ACK bit of 1782 the TCP header is set, and its value must be zero when it is not 1783 valid. However, some TCP implementations have been found to fail to 1784 set the Timestamp Echo Reply field (TSecr) to zero in TCP segments 1785 that do not have the ACK bit set, thus potentially leaking 1786 information. 
We stress that TCP implementations should comply with 1787 RFC 1323 by setting the Timestamp Echo Reply field (TSecr) to zero in 1788 those TCP segments that do not have the ACK bit set, thus eliminating 1789 this potential information leakage. 1791 Finally, it should be noted that the Timestamps option can be 1792 exploited to count the number of systems behind NATs (Network Address 1793 Translators) [Srisuresh and Egevang, 2001]. An attacker could count 1794 the number of systems behind a NAT by establishing a number of TCP 1795 connections (using the public address of the NAT) and identifying 1796 the number of different timestamp sequences. This information 1797 leakage could be eliminated by rewriting the contents of the 1798 Timestamps option at the NAT. [Gont and Srisuresh, 2008] provides a 1799 detailed discussion of the security implications of NATs, and 1800 proposes mitigations for this and other issues. 1802 5. Connection-establishment mechanism 1804 The following subsections describe a number of attacks that can be 1805 performed against TCP by exploiting its connection-establishment 1806 mechanism. 1808 5.1. SYN flood 1810 TCP SHOULD implement (and enable by default) a syn-cache [Lemon, 1811 2002]. 1813 TCP SHOULD implement syn-cookies, and SHOULD enable them only after a 1814 specified number of TCBs has been allocated for connections in the 1815 SYN-RECEIVED state. 1817 DISCUSSION: 1819 TCP uses a mechanism known as the "three-way handshake" for the 1820 establishment of a connection between two TCP peers. RFC 793 1821 [Postel, 1981c] states that when a TCP that is in the LISTEN state 1822 receives a SYN segment (i.e., a TCP segment with the SYN flag 1823 set), it must transition to the SYN-RECEIVED state, record the 1824 control information (e.g., the ISN) contained in the SYN segment 1825 in a Transmission Control Block (TCB), and respond with a SYN/ACK 1826 segment. 
1828 A Transmission Control Block is the data structure used to store 1829 (usually within the kernel) all the information relevant to a TCP 1830 connection. The concept of "TCB" is introduced in the core TCP 1831 specification RFC 793 [Postel, 1981c]. 1833 In practice, virtually all existing implementations do not modify 1834 the state of the TCP that was in the LISTEN state, but rather 1835 create a new TCP (i.e., a new "protocol machine"), and perform all 1836 the state transitions on this newly-created TCP. This allows the 1837 application running on top of TCP to service more than one 1838 client at the same time. As a result, each connection request 1839 results in the allocation of system memory to store the TCB 1840 associated with the newly created TCP. 1842 If TCP was implemented strictly as described in RFC 793, the 1843 application running on top of TCP would have to finish servicing 1844 the current client before being able to service the next one in 1845 line, or should instead be able to perform some kind of connection 1846 hand-off. 1848 An attacker could exploit TCP's connection-establishment mechanism 1849 to perform a Denial of Service (DoS) attack, by sending a large 1850 number of connection requests to the target system, with the 1851 intent of exhausting the system memory destined for storing TCBs 1852 (or related kernel data structures), thus preventing the attacked 1853 system from establishing new connections with legitimate users. 1854 This attack is widely known as "SYN flood", and received a lot 1855 of attention during the late 1990s [CERT, 1996]. 1857 Given that the attacker does not need to complete the three-way 1858 handshake for the attacked system to tie system resources to the 1859 newly created TCBs, he will typically forge the source IP address 1860 of the malicious SYN segments he sends, thus concealing his own IP 1861 address. 
1863 If the forged IP addresses corresponded to some reachable system, 1864 the impersonated system would receive the SYN/ACK segment sent by 1865 the attacked host (in response to the forged SYN segment), which 1866 would elicit an RST segment. This RST segment would be delivered 1867 to the attacked system, causing the corresponding connection to be 1868 aborted, and the corresponding TCB to be removed. 1870 As the impersonated host would not have any state information for 1871 the TCP connection being referred to by the SYN/ACK segment, it 1872 would respond with a RST segment, as specified by the TCP segment 1873 processing rules of RFC 793 [Postel, 1981c]. 1875 However, if the forged IP source addresses were unreachable, the 1876 attacked TCP would continue retransmitting the SYN/ACK segment 1877 corresponding to each connection request, until timing out and 1878 aborting the connection. For this reason, a number of widely 1879 available attack tools first check whether each of the (forged) IP 1880 addresses is reachable by sending an ICMP echo request to them. 1881 The receipt of an ICMP echo response is considered an indication 1882 of the IP address being reachable (and thus results in the 1883 corresponding IP address not being used for performing the 1884 attack), while the receipt of an ICMP unreachable error message is 1885 considered an indication of the IP address being unreachable (and 1886 thus results in the corresponding IP address being used for 1887 performing the attack). 1889 [Gont, 2008b] describes how the so-called ICMP soft errors could 1890 be used by TCP to abort connections in any of the non-synchronized 1891 states. 
While implementation of the mechanism described in that document would certainly not eliminate the vulnerability of TCP to SYN flood attacks (as the attacker could use addresses that are simply "black-holed"), it provides an example of how signaling information such as that provided by means of ICMP error messages can provide valuable information that a transport protocol could use to perform heuristics.

In order to mitigate the impact of this attack, the amount of information stored for non-established connections should be reduced (ideally, non-synchronized connections should not require any state information to be maintained at the TCP performing the passive OPEN). There are basically two mitigation techniques for this vulnerability: a syn-cache and syn-cookies.

[Borman, 1997] and RFC 4987 [Eddy, 2007] contain a general discussion of SYN-flooding attacks and common mitigation approaches.

The syn-cache [Lemon, 2002] approach aims at reducing the amount of state information that is maintained for connections in the SYN-RECEIVED state, and allocates a full TCB only after the connection has transitioned to the ESTABLISHED state.

The syn-cookie [Bernstein, 1996] approach aims at completely eliminating the need to maintain state information at the TCP performing the passive OPEN, by encoding the most elementary information required to complete the three-way handshake in the Sequence Number of the SYN/ACK segment that is sent in response to the received SYN segment. Thus, TCP is relieved from keeping state for connections in the SYN-RECEIVED state.

The syn-cookie approach has a number of drawbacks:

* Firstly, given the limited space in the Sequence Number field, it is not possible to encode all the information included in the initial segment, such as, for example, support of Selective Acknowledgements (SACK).

* Secondly, in the event that the Acknowledgement segment sent in response to the SYN/ACK sent by the TCP that performed the passive OPEN (i.e., the TCP server) were lost, the connection would end up in the ESTABLISHED state on the client side, but in the CLOSED state on the server side. This scenario is normally handled in TCP by having the TCP server retransmit its SYN/ACK. However, if syn-cookies are enabled, there would be no connection state information on the server side, and thus the SYN/ACK would never be retransmitted. This could lead to a scenario in which the connection could remain in the ESTABLISHED state on the client side, but in the CLOSED state on the server side, indefinitely. If the application protocol was such that it required the client to wait for some data from the server (e.g., a greeting message) before sending any data to the server, a deadlock would take place, with the client application waiting for such server data, and the server waiting for the TCP three-way handshake to complete.

* Thirdly, unless the function used to encode information in the SYN/ACK packet is cryptographically strong, an attacker could forge TCP connections in the ESTABLISHED state by forging ACK segments that would be considered as "legitimate" by the receiving TCP.

* Fourthly, in those scenarios in which establishment of new connections is blocked by simply dropping segments with the SYN bit set, use of SYN cookies could allow an attacker to bypass the firewall rules, as a connection could be established by forging an ACK segment with the correct values, without the need of setting the SYN bit.

As a result, syn-cookies are usually not employed as a first line of defense against SYN-flood attacks, but only as a last resort to cope with them.
For example, some TCP implementations enable syn-cookies only after a certain number of TCBs has been allocated for connections in the SYN-RECEIVED state. We recommend this implementation technique, with a syn-cache enabled by default, and use of syn-cookies triggered, for example, when the limit of TCBs for non-synchronized connections with a given port number has been reached.

It is interesting to note that a SYN-flood attack should only affect the establishment of new connections. A number of books and online documents seem to assume that TCP will not be able to respond to any TCP segment that is meant for a TCP port that is being SYN-flooded (e.g., respond with an RST segment upon receipt of a TCP segment that refers to a non-existent TCP connection). While SYN-flooding attacks have been successfully exploited in the past for achieving such a goal [Shimomura, 1995], as clarified by RFC 1948 [Bellovin, 1996] the effectiveness of SYN flood attacks to silence a TCP implementation arose as a result of a bug in the 4.4BSD TCP implementation [Wright and Stevens, 1994], rather than from a theoretical property of SYN-flood attacks themselves. Therefore, those TCP implementations that do not suffer from such a bug should not be silenced as a result of a SYN-flood attack.

[Zquete, 2002] describes a mechanism that could theoretically improve the functionality of SYN cookies. It exploits the TCP "simultaneous open" mechanism, as illustrated in Figure 5.

See Figure 5, on page 46 of the UK CPNI document.

Figure 5: Use of TCP simultaneous open for handling SYN floods

In line 1, TCP A initiates the connection-establishment phase by sending a SYN segment to TCP B.
In line 2, TCP B creates a SYN cookie as described by [Bernstein, 1996], but does not set the ACK bit of the segment it sends (thus really sending a SYN segment, rather than a SYN/ACK). This "fools" TCP A into thinking that both SYN segments "have crossed each other in the network" as if a "simultaneous open" scenario had taken place. As a result, in line 3 TCP A sends a SYN/ACK segment containing the same options that were contained in the original SYN segment. In line 4, upon receipt of this segment, TCP B processes the cookie encoded in the ACK field as if it had been the result of a traditional SYN cookie scenario, and moves the connection into the ESTABLISHED state. In line 5, TCP B sends a SYN/ACK segment, which causes the connection at TCP A to move into the ESTABLISHED state. In line 6, TCP A sends a data segment on the connection.

While this mechanism would work in theory, unfortunately there are a number of factors that prevent it from being usable in real network environments:

* Some systems are not able to perform the "simultaneous open" operation specified in RFC 793, and thus the connection establishment will fail.

* Some firewalls might prevent the establishment of TCP connections that rely on the "simultaneous open" mechanism (e.g., a given firewall might be allowing incoming SYN/ACK segments, but not outgoing SYN/ACK segments).

Therefore, we do not recommend implementation of this mechanism for mitigating SYN-flood attacks.

5.2. Connection forgery

The process of causing a TCP connection to be illegitimately established between two arbitrary remote peers is usually referred to as "connection spoofing" or "connection forgery".
This can have a great negative impact when systems establish some sort of trust relationships based on the IP addresses used to establish a TCP connection [daemon9 et al, 1996].

It should be stressed that hosts should not establish trust relationships based on the IP addresses [CPNI, 2008] or on the TCP ports in use for the TCP connection (see Section 3.1 and Section 3.2 of this document).

One of the underlying weaknesses that allow this vulnerability to be more easily exploited is the use of an inadequate Initial Sequence Number (ISN) generator, as explained in the 1980s in [Morris, 1985]. As discussed in Section 3.3.1 of this document, any TCP implementation that makes use of an inadequate ISN generator will be more vulnerable to this type of attack. A discussion of approaches for a more careful generation of Initial Sequence Numbers (ISNs) can be found in Section 3.3.1 of this document.

Another attack vector for performing connection-forgery attacks is the use of IP source routing. By forging the Source Address of the IP packets that encapsulate the TCP segments of a connection, and carefully crafting an IP source route option (i.e., either LSRR or SSRR) that includes a system whose traffic he can monitor, an attacker could cause the packets sent by the attacked system (e.g., the SYN/ACK segment sent in response to the attacker's SYN segment) to be illegitimately directed to him [CPNI, 2008]. Thus, the attacker would not even need to guess valid sequence numbers for forging a TCP connection, as he would simply have direct access to all this information. As discussed in [CPNI, 2008], it is strongly recommended that systems disable IP Source Routing by default, or at the very least, that they disable source routing for IP packets that encapsulate TCP segments.

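As an illustration of the source-routing recommendation above, the following C sketch scans the options of an IPv4 header for the LSRR (type 131) and SSRR (type 137) options defined in RFC 791, and flags packets that carry either option together with a TCP payload (protocol number 6). The function names, and the policy of also dropping packets with malformed headers, are assumptions made for this example and do not describe any particular implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4 option type values, as assigned in RFC 791. */
enum { OPT_EOL = 0, OPT_NOP = 1, OPT_LSRR = 131, OPT_SSRR = 137 };

/*
 * Returns 1 if the IPv4 header at 'hdr' carries an LSRR or SSRR option,
 * 0 if it does not, and -1 if the header or its options are malformed.
 */
int has_source_route(const uint8_t *hdr, size_t len)
{
    size_t hlen, i;

    assert(hdr != NULL);
    if (len < 20 || (hdr[0] >> 4) != 4)
        return -1;
    hlen = (size_t)(hdr[0] & 0x0f) * 4;      /* IHL is in 32-bit words */
    if (hlen < 20 || hlen > len)
        return -1;

    for (i = 20; i < hlen; ) {
        uint8_t type = hdr[i];
        uint8_t olen;

        if (type == OPT_EOL)                 /* end of option list */
            break;
        if (type == OPT_NOP) {               /* single-byte padding */
            i++;
            continue;
        }
        if (i + 1 >= hlen)                   /* no room for a length byte */
            return -1;
        olen = hdr[i + 1];
        if (olen < 2 || i + olen > hlen)     /* option overruns the header */
            return -1;
        if (type == OPT_LSRR || type == OPT_SSRR)
            return 1;
        i += olen;
    }
    return 0;
}

/*
 * Hypothetical drop policy: discard malformed headers, and discard
 * source-routed packets that encapsulate TCP (IP protocol 6).
 */
int drop_source_routed_tcp(const uint8_t *hdr, size_t len)
{
    int sr = has_source_route(hdr, len);

    if (sr < 0)
        return 1;
    return sr == 1 && hdr[9] == 6;
}
```

A host that disables source routing altogether would apply the same check regardless of the protocol field.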
The IPv6 Routing Header Type 0, which provides a similar functionality to that provided by IPv4 source routing, has been officially deprecated by RFC 5095 [Abley et al, 2007].

5.3. Connection-flooding attack

5.3.1. Vulnerability

The creation and maintenance of a TCP connection requires system memory to maintain shared state between the local and the remote TCP. As system memory is a finite resource, there is a limit on the number of TCP connections that a system can maintain at any time. When the TCP API is employed to create a TCP connection with a remote peer, it allocates system memory for maintaining shared state with the remote TCP peer, and thus the resulting connection would tie a similar amount of resources at the remote host as at the local host. However, if special packet-crafting tools are employed to forge TCP segments to establish TCP connections with a remote peer, the local kernel implementation of TCP can be bypassed, and the allocation of resources on the attacker's system for maintaining shared state can be avoided. Thus, a malicious user could create a large number of TCP connections, and subsequently abandon them, thus tying system resources only at the remote peer. This allows an attacker to create a large number of TCP connections at the attacked system with the intent of exhausting its kernel memory, without exhausting the attacker's own resources. [CERT, 2000] discusses this vulnerability, which is usually referred to as the "Naptha attack".

This attack is similar in nature to the "Netkill" attack discussed in Section 7.1.1. However, while Netkill ties both TCBs and TCP send buffers to the abandoned connections, Naptha only ties TCBs (and related kernel structures), as it doesn't issue any application requests.

The symptom of this attack is an extremely large number of TCP connections in the ESTABLISHED state, which would tend to exhaust system resources and deny service to new clients (or possibly cause the system to crash).

It should be noted that it is possible for an attacker to perform the same type of attack causing the abandoned connections to remain in states other than ESTABLISHED. This might be interesting for an attacker, as connections in states other than ESTABLISHED usually have no controlling user-space process (that is, the former controlling process for the connection has already closed the corresponding file descriptor).

A particularly interesting case of a connection-flooding attack that aims at abandoning connections in a state other than ESTABLISHED is discussed in Section 6.1 of this document.

5.3.2. Countermeasures

As with many other resource exhaustion attacks, the problem in devising countermeasures for this attack is that it may be difficult to differentiate between an actual attack and a legitimate high-load scenario. However, there are a number of countermeasures which, when tuned for each particular network environment, could allow a system to resist this attack and continue servicing legitimate clients.

Hosts SHOULD enforce limits on the number of TCP connections with no user-space controlling process.

DISCUSSION:

Connections in states other than ESTABLISHED usually have no user-space controlling process. This prevents the application making use of those connections from enforcing limits on the maximum number of ongoing connections (either on a global basis or a per-IP address basis).
When resource exhaustion is imminent or some threshold of ongoing connections is reached, the operating system should consider freeing system resources by aborting connections that have no user-space controlling process. A number of such connections could be aborted on a random basis, or based on some heuristics performed by the operating system (e.g., first abort connections with peers that have the largest number of ongoing connections with no user-space controlling process).

Hosts SHOULD enforce per-process and per-user limits on maximum kernel memory that can be used at any time.

Hosts SHOULD enforce per-process and per-user limits on the number of existing TCP connections at any time.

DISCUSSION:

While the Naptha attack is usually targeted at a service such as HTTP, its impact is usually system-wide. This is particularly undesirable, as an attack against a single service might affect the system as a whole (for example, possibly precluding remote system administration).

In order to prevent an attack against a single service from affecting other services, we advise TCP implementations to enforce per-process and per-user limits on maximum kernel memory that can be used at any time. Additionally, we recommend that implementations enforce per-process and per-user limits on the number of existing TCP connections at any time.

Applications SHOULD enforce limits on the number of simultaneous connections that can be established from a single IP address or network prefix at any given time.

DISCUSSION:

An application could limit the number of simultaneous connections that can be established from a single IP address or network prefix at any given time.
Once that limit has been reached, some other connection from the same IP address or network prefix would be aborted, thus allowing the application to service this new incoming connection.

There are a number of factors that should be taken into account when defining the specific limit to enforce. For example, in the case of protocols that have an authentication phase (e.g., SSH, POP3, etc.), this limit could be applied to sessions that have not yet been authenticated. Additionally, depending on the nature and use of the application, it might or might not be normal for a single system to have multiple connections to the same server at the same time.

For many network services, the limit of maximum simultaneous connections could be kept very low. For example, an SMTP server could limit the number of simultaneous connections from a single IP address to 10 or 20 connections.

While this limit could work in many network scenarios, we recommend that network operators measure the maximum number of concurrent connections from a single IP address during normal operation, and set the limit accordingly.

In the case of web servers, this limit will usually need to be set much higher, as it is common practice for web clients to establish multiple simultaneous connections with a single web server to speed up the process of loading a web page (e.g., multiple graphic files can be downloaded simultaneously using separate TCP connections).

NATs (Network Address Translators) [Srisuresh and Egevang, 2001] are widely deployed in the Internet, and may exacerbate this situation, as a large number of clients behind a NAT might each establish multiple TCP connections with a given web server, which would all appear to originate from the same IP address (that of the NAT box).

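The per-address (or per-prefix) limit discussed above could be tracked by an application along the following lines. This is a minimal C sketch under stated assumptions: the names conn_limiter, limiter_admit and limiter_release are hypothetical, addresses are IPv4 values in host byte order, and the table is a small fixed-size array scanned linearly. The sketch only reports whether the limit has been reached; whether the application then refuses the new connection or aborts an older one from the same prefix, as described above, is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LIMITER_SLOTS 1024   /* assumed table size for this sketch */

struct conn_limiter {
    uint32_t prefix_mask;        /* e.g., 0xffffff00 for a /24 */
    unsigned max_per_prefix;     /* e.g., 10 or 20 for an SMTP server */
    struct {
        uint32_t prefix;
        unsigned count;          /* 0 means the slot is free */
    } slot[LIMITER_SLOTS];
};

void limiter_init(struct conn_limiter *l, unsigned prefix_len,
                  unsigned max_per_prefix)
{
    assert(l != NULL && prefix_len <= 32);
    memset(l, 0, sizeof *l);
    l->prefix_mask = prefix_len == 0 ? 0 : 0xffffffffu << (32 - prefix_len);
    l->max_per_prefix = max_per_prefix;
}

/* Finds the slot in use for 'prefix'; optionally falls back to a free slot. */
static int find_slot(const struct conn_limiter *l, uint32_t prefix,
                     int want_free)
{
    int i, free_idx = -1;

    for (i = 0; i < LIMITER_SLOTS; i++) {
        if (l->slot[i].count > 0 && l->slot[i].prefix == prefix)
            return i;
        if (l->slot[i].count == 0 && free_idx < 0)
            free_idx = i;
    }
    return want_free ? free_idx : -1;
}

/* Returns 1 and counts the connection if it is within the limit, else 0. */
int limiter_admit(struct conn_limiter *l, uint32_t addr)
{
    uint32_t prefix = addr & l->prefix_mask;
    int i = find_slot(l, prefix, 1);

    if (i < 0 || l->slot[i].count >= l->max_per_prefix)
        return 0;                /* table full, or limit reached */
    l->slot[i].prefix = prefix;
    l->slot[i].count++;
    return 1;
}

/* To be called when a previously admitted connection terminates. */
void limiter_release(struct conn_limiter *l, uint32_t addr)
{
    int i = find_slot(l, addr & l->prefix_mask, 0);

    if (i >= 0)
        l->slot[i].count--;
}
```

An application would call limiter_admit() with the peer address returned by accept(), and limiter_release() when it closes the connection. As noted above, a limit that suits an SMTP server is likely too low for a web server or for clients behind a NAT.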
Firewalls MAY enforce limits on the number of simultaneous connections that can be established from a single IP address or network prefix at any given time.

DISCUSSION:

Some firewalls can be configured to limit the number of simultaneous connections that any system can maintain with a specific system and/or service at any given time. Limiting the number of simultaneous connections that each system can establish with a specific system and service would effectively limit the ability of an attacker that controls a single IP address to exhaust system resources at the attacked system/service.

5.4. Firewall-bypassing techniques

TCP MUST silently drop those TCP segments that have both the SYN and the RST flags set.

DISCUSSION:

Some firewalls block incoming TCP connections by blocking only incoming SYN segments. However, there are inconsistencies in how different TCP implementations handle SYN segments that have additional flags set, which may allow an attacker to bypass firewall rules [US-CERT, 2003b].

For example, some firewalls have been known to mistakenly allow incoming SYN segments if they also have the RST bit set. As some TCP implementations will create a new connection in response to a TCP segment with both the SYN and RST bits set, an attacker could bypass the firewall rules and establish a connection with a "protected" system by setting the RST bit in his SYN segments.

Here we advise TCP implementations to silently drop those TCP segments that have both the SYN and the RST flags set.

6. Connection-termination mechanism

6.1. FIN-WAIT-2 flooding attack

6.1.1. Vulnerability

TCP implements a connection-termination mechanism that is employed for the graceful termination of a TCP connection. This mechanism usually consists of the exchange of four segments.
Figure 6 illustrates the usual segment exchange for this mechanism.

See Figure 6, on page 50 of the UK CPNI document.

Figure 6: TCP connection-termination mechanism

A potential problem may arise as a result of the FIN-WAIT-2 state: there is no limit on the amount of time that a TCP can remain in the FIN-WAIT-2 state. Furthermore, no segment exchange is required to maintain the connection in that state.

As a result, an attacker could establish a large number of connections with the target system, and cause it to close each of them. For each connection, once the target system has sent its FIN segment, the attacker would acknowledge the receipt of this segment, but would send no further segments on that connection. As a result, an attacker could cause the corresponding system resources (e.g., the system memory used for storing the TCB) to remain tied to the connection without the need to send any further packets.

While the CLOSE command described in RFC 793 [Postel, 1981c] simply signals the remote TCP end-point that this TCP has finished sending data (i.e., it closes only one direction of the data transfer), the close() system-call available in most operating systems has different semantics: it marks the corresponding file descriptor as closed (and thus it is no longer usable), and assigns the operating system the responsibility to deliver any queued data to the remote TCP peer and to terminate the TCP connection. This makes the FIN-WAIT-2 state particularly attractive for performing memory exhaustion attacks, as even if the application running on top of TCP were imposing limits on the maximum number of ongoing connections, and/or time limits on the function calls performed on TCP connections, that application would be unable to enforce these limits on the FIN-WAIT-2 state.

6.1.2. Countermeasures

A number of countermeasures can be implemented to mitigate FIN-WAIT-2 flooding attacks. Some of these countermeasures require changes in the TCP implementations, while others require changes in the applications running on top of TCP.

TCP SHOULD enforce limits on the duration of the FIN-WAIT-2 state.

DISCUSSION:

In order to avoid the risk of having connections stuck in the FIN-WAIT-2 state indefinitely, a number of systems incorporate a timeout for the FIN-WAIT-2 state. For example, the Linux kernel version 2.4 enforces a timeout of 60 seconds [Linux, 2008]. If the connection-termination mechanism does not complete before that timeout value, it is aborted.

Enabling applications to enforce limits on ongoing connections

As discussed in Section 6.1.1, the fact that the close() system call marks the corresponding file descriptor as closed prevents the application running on top of TCP from enforcing limits on the corresponding connection.

While it is common practice for applications to terminate their connections by means of the close() system call, it is possible for an application to initiate the connection-termination phase without closing the corresponding file descriptor (hence keeping control of the connection).

In order to achieve this, an application performing an active close (i.e., initiating the connection-termination phase) should replace the call to close(sockfd) with the following code sequence:

o A call to shutdown(sockfd, SHUT_WR), to close the sending direction of this connection.

o Successive calls to read(), until it returns 0, thus indicating that the remote TCP peer has finished sending data.

o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)), where l is of type struct linger (with its members l.l_onoff=1 and l.l_linger=90).

o A call to close(sockfd), to close the corresponding file descriptor.

The call to shutdown() (instead of close()) allows the application to retain control of the underlying TCP connection while the connection transitions through the FIN-WAIT-1 and FIN-WAIT-2 states. However, the application will not retain control of the connection while it transitions through the CLOSING and TIME-WAIT states.

It should be noted that, strictly speaking, close(sockfd) decrements the reference count for the descriptor sockfd, and initiates the connection termination phase only when the reference count reaches 0. On the other hand, shutdown(sockfd, SHUT_WR) initiates the connection-termination phase, regardless of the reference count for the sockfd descriptor. This should be taken into account when performing the code replacement described above. For example, it would be a bug for two processes (e.g., parent and child) that share a descriptor to both call shutdown(sockfd, SHUT_WR).

An application performing a passive close should replace the call to close(sockfd) with the following code sequence:

o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)), where l is of type struct linger (with its members l.l_onoff=1 and l.l_linger=90).

o A call to close(sockfd), to close the corresponding file descriptor.

It is assumed that if the application is performing a passive close, the application already detected that the remote TCP peer finished sending data as a result of a call to read() returning 0.

In this scenario, the application will not retain control of the underlying connection when it transitions through the LAST-ACK state.

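The two code sequences described above can be sketched in C as follows. The function names are hypothetical, and the receive timeout set with SO_RCVTIMEO is an addition made for this example (it is not part of the sequence described above): it illustrates one way in which an application that keeps the descriptor open could bound the time it waits for the remote peer. Error handling is deliberately minimal.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Last two steps of both sequences: set SO_LINGER, then close(). */
static int linger_and_close(int sockfd, int linger_secs)
{
    struct linger l;
    int rc;

    l.l_onoff = 1;
    l.l_linger = linger_secs;            /* 90 in the text above */
    rc = setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
    if (close(sockfd) < 0)               /* may block for up to l_linger */
        rc = -1;
    return rc;
}

/* Active close: replaces a plain close(sockfd). */
int active_close(int sockfd, int read_timeout_secs, int linger_secs)
{
    char buf[4096];
    ssize_t n;
    struct timeval tv;

    assert(read_timeout_secs > 0);

    /* Assumption of this sketch: bound each read() with a timeout. */
    tv.tv_sec = read_timeout_secs;
    tv.tv_usec = 0;
    (void)setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* Close the sending direction; the descriptor remains usable. */
    if (shutdown(sockfd, SHUT_WR) < 0) {
        close(sockfd);
        return -1;
    }

    /* Drain until read() returns 0 (peer finished sending), or stop on
     * an error or timeout; retry only if interrupted by a signal. */
    do {
        n = read(sockfd, buf, sizeof(buf));
    } while (n > 0 || (n < 0 && errno == EINTR));

    return linger_and_close(sockfd, linger_secs);
}

/* Passive close: the caller has already seen read() return 0. */
int passive_close(int sockfd, int linger_secs)
{
    return linger_and_close(sockfd, linger_secs);
}
```

An application would call active_close(sockfd, 30, 90) where it previously called close(sockfd); the 30-second read timeout is an arbitrary value chosen for illustration. The reference-count caveat above still applies: only one of the processes sharing a descriptor should perform this sequence.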
Enforcing limits on the number of connections with no user-space controlling process

The considerations and recommendations in Section 5.3.2 for enforcing limits on the number of connections with no user-space controlling process are applicable to mitigate this vulnerability.

Limiting the number of simultaneous connections at the application

The considerations and recommendations in Section 5.3.2 for limiting the number of simultaneous connections at the application are applicable to mitigate this vulnerability. We note, however, that unless applications are implemented to retain control of the underlying TCP connection while the connection transitions through the FIN-WAIT-1 and FIN-WAIT-2 states, enforcing such limits may prove to be a difficult task.

Limiting the number of simultaneous connections at firewalls

The considerations and recommendations in Section 5.3.2 for limiting the number of simultaneous connections at firewalls are applicable to mitigate this vulnerability.

7. Buffer management

7.1. TCP retransmission buffer

7.1.1. Vulnerability

[Shalunov, 2000] describes a resource exhaustion attack (Netkill) that can be performed against TCP. The attack aims at exhausting system memory by creating a large number of TCP connections which are then abandoned. The attack is usually performed as follows:

o The attacker creates a TCP connection to a service in which a small client request can result in a large server response (e.g., HTTP). Rather than relying on his kernel implementation of TCP, the attacker creates his TCP connections by means of a specialized packet-crafting tool. This allows the attacker to create the TCP connections and later abandon them, exhausting the resources at the attacked system, while not tying his own system resources to the abandoned connections.

o When the connection is established (i.e., the three-way handshake has completed), an application request is sent, and the TCP connection is subsequently abandoned. At this point, any state information kept by the attack tool is removed.

o The attacked server allocates TCP send buffers for transmitting the response to the client's request. This causes the victim TCP to tie resources not only for the Transmission Control Block (TCB), but also for the application data that needs to be transferred.

o Once the application response is queued for transmission, the application closes the TCP connection, and thus TCP takes the responsibility to deliver the queued data. Having the application close the connection has the benefit for the attacker that the application is not able to keep track of the number of TCP connections in use, and thus it is not able to enforce limits on the number of connections.

o The attacker repeats the above steps a large number of times, thus causing a large amount of system memory at the victim host to be tied to the abandoned connections. When the system memory is exhausted, the victim host denies service to new connections, or possibly crashes.

There are a number of factors that affect the effectiveness of this attack that are worth considering. Firstly, while the attack is typically targeted at a service such as HTTP, the consequences of the attack are usually system-wide. Secondly, depending on the size of the server's response, the underlying TCP connection may or may not be closed: if the response is larger than the TCP send buffer size at the server, the application will usually block in a call to write() or send(), and would therefore not close the TCP connection, thus allowing the application to enforce limits on the number of ongoing connections.
Consequently, the attacker will usually try to elicit a response that is equal to or slightly smaller than the send buffer of the attacked TCP. Thirdly, while [Shalunov, 2000] notes that one visible effect of this attack is a large number of connections in the FIN-WAIT-1 state, this will not usually be the case. Given that the attacker never acknowledges any segment other than the SYN/ACK segment that is part of the three-way handshake, at the point at which the attacked TCP tries to send the application's response the congestion window (cwnd) will usually be 4*SMSS (four maximum-sized segments). If the application's response were larger than 4*SMSS, even if the application had closed the connection, the FIN segment would never be sent, and thus the connection would still remain in the ESTABLISHED state (rather than transition to the FIN-WAIT-1 state).

7.1.2. Countermeasures

The resource exhaustion attack described in Section 7.1.1 does not necessarily differ from a legitimate high-load scenario, and therefore is hard to mitigate without negatively affecting the robustness of TCP. However, complementary mitigations can still be implemented to limit the impact of these attacks.

Enforcing limits on the number of connections with no user-space controlling process

The considerations and recommendations in Section 5.3.2 for enforcing limits on the number of connections with no user-space controlling process are applicable to mitigate this vulnerability.

Enforcing per-user and per-process limits

While the Netkill attack is usually targeted at a service such as HTTP, its impact is usually system-wide. This is particularly undesirable, as an attack against a single service might affect the system as a whole (for example, possibly precluding remote system administration).

In order to prevent an attack against a single service from affecting other services, we advise TCP implementations to enforce per-process and per-user limits on maximum kernel memory that can be used at any time. Additionally, we recommend that implementations enforce per-process and per-user limits on the number of existing TCP connections at any time.

Limiting the number of ongoing connections at the application

The considerations and recommendations in Section 5.3.2 for enforcing limits on the number of ongoing connections at the application are applicable to mitigate this vulnerability.

Enabling applications to enforce limits on ongoing connections

As discussed in Section 6.1.1, the fact that the close() system call marks the corresponding file descriptor as closed prevents the application running on top of TCP from enforcing limits on the corresponding connection.

While it is common practice for applications to terminate their connections by means of the close() system call, it is possible for an application to initiate the connection-termination phase without closing the corresponding file descriptor (hence keeping control of the connection).

In order to achieve this, an application performing an active close (i.e., initiating the connection-termination phase) should replace the call to close(sockfd) with the following code sequence:

o A call to shutdown(sockfd, SHUT_WR), to close the sending direction of this connection.

o Successive calls to read(), until it returns 0, thus indicating that the remote TCP peer has finished sending data.

o A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)), where l is of type struct linger (with its members l.l_onoff=1 and l.l_linger=90).

o A call to close(sockfd), to close the corresponding file descriptor.


   The call to shutdown() (instead of close()) allows the application to
   retain control of the underlying TCP connection while the connection
   transitions through the FIN-WAIT-1 and FIN-WAIT-2 states.  However,
   the application will not retain control of the connection while it
   transitions through the CLOSING and TIME-WAIT states.  Nevertheless,
   in these states TCP should not have any pending data to send to the
   remote TCP peer or to be received by the application running on top
   of it, and thus these states are less of a concern for this
   particular vulnerability (Netkill).

   It should be noted that, strictly speaking, close(sockfd) decrements
   the reference count for the descriptor sockfd, and initiates the
   connection-termination phase only when the reference count reaches 0.
   On the other hand, shutdown(sockfd, SHUT_WR) initiates the
   connection-termination phase, regardless of the reference count for
   the sockfd descriptor.  This should be taken into account when
   performing the code replacement described above.  For example, it
   would be a bug for two processes (e.g., parent and child) that share
   a descriptor to both call shutdown(sockfd, SHUT_WR).

   An application performing a passive close should replace the call to
   close(sockfd) with the following code sequence:

   o  A call to setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l,
      sizeof(l)), where l is of type struct linger (with its members
      l.l_onoff=1 and l.l_linger=90).

   o  A call to close(sockfd), to close the corresponding file
      descriptor.

   It is assumed that if the application is performing a passive close,
   the application has already detected that the remote TCP peer
   finished sending data as a result of a call to read() returning 0.

   In this scenario, the application will not retain control of the
   underlying connection when it transitions through the LAST-ACK state.
   However, in this state TCP should not have any pending data to send
   to the remote TCP peer or to be received by the application running
   on top of TCP, and thus this state is less of a concern for this
   particular vulnerability (Netkill).

   Limiting the number of simultaneous connections at firewalls

   The considerations and recommendations in Section 5.3.2 for limiting
   the number of simultaneous connections at firewalls are applicable to
   mitigate this vulnerability.

   Performing heuristics on ongoing TCP connections

   Some heuristics could be performed on TCP connections that may help
   if scarce system resources such as memory become exhausted.  A number
   of parameters may be useful to perform such heuristics.

   In the case of the Netkill attack described in [Shalunov, 2000],
   there are two parameters that are characteristic of a TCP being
   attacked:

   o  A large amount of data queued in the TCP retransmission buffer
      (e.g., the socket send buffer).

   o  Only a small amount of data has been successfully transferred to
      the remote peer.

   Clearly, these two parameters do not necessarily indicate an ongoing
   attack.  However, if exhaustion of the corresponding system resources
   were imminent, these two parameters (among others) could be used to
   perform heuristics when considering aborting ongoing connections.

   It should be noted that while an attacker could advertise a zero
   window to cause the target system to tie system memory to the TCP
   retransmission buffer, it is hard to derive any useful statistics
   from the advertised window.
   While it is tempting to enforce a limit on the length of the persist
   state (see Section 3.7.2 of this document), an attacker could simply
   open the window (i.e., advertise a TCP window larger than zero) from
   time to time to prevent this enforced limit from causing his
   malicious connections to be aborted.

7.2.  TCP segment reassembly buffer

   TCP MAY discard out-of-order data when system-memory exhaustion is
   imminent.

   DISCUSSION:

      TCP buffers out-of-order segments to more efficiently handle the
      occurrence of packet reordering and segment loss.  When out-of-
      order data are received, a "hole" momentarily exists in the data
      stream which must be filled before the received data can be
      delivered to the application making use of TCP's services.  This
      situation can be exploited by an attacker, who could
      intentionally create a hole in the data stream by sending a number
      of segments with a sequence number larger than the next sequence
      number expected (RCV.NXT) by the attacked TCP.  Thus, the attacked
      TCP would tie system memory to buffer the out-of-order segments,
      without being able to hand the received data to the corresponding
      application.

      If a large number of such connections were created, system memory
      could be exhausted, precluding the attacked TCP from servicing new
      connections and/or continuing to service TCP connections
      previously established.

      Fortunately, these attacks can be easily mitigated, at the expense
      of degrading the performance of possibly legitimate connections.
      When out-of-order data are received, an Acknowledgement segment is
      sent with the next sequence number expected (RCV.NXT).  This means
      that receipt of the out-of-order data will not actually be
      acknowledged by the TCP's cumulative Acknowledgement Number.
      As a result, a TCP is free to discard any data that have been
      received out-of-order, without affecting the reliability of the
      data transfer.  Given the performance implications of discarding
      out-of-order segments for legitimate connections, this pruning
      policy should be applied only if memory exhaustion is imminent.

      As a result of discarding the out-of-order data, these data will
      need to be unnecessarily retransmitted.  Additionally, a loss
      event will be detected by the sending TCP, and thus the slow start
      phase of TCP's congestion control will be entered, thus reducing
      the data transfer rate of the connection.

      It is interesting to note that this pruning policy could be
      applied even if Selective Acknowledgements (SACK) (specified in
      RFC 2018 [Mathis et al, 1996]) are in use, as SACK provides only
      advisory information, and does not preclude the receiving TCP from
      discarding data that have been previously selectively-acknowledged
      by means of TCP's SACK option, but not acknowledged by TCP's
      cumulative Acknowledgement Number.

      There are a number of ways in which the pruning policy could be
      triggered.  For example, when out-of-order data are received, a
      timer could be set, and the sequence number of the out-of-order
      data could be recorded.  If the hole were filled before the timer
      expires, the timer would be turned off.  However, if the timer
      expired before the hole were filled, all the out-of-order segments
      of the corresponding connection would be discarded.  This would be
      a proactive counter-measure for attacks that aim at exhausting the
      receive buffers.

      In addition, an implementation could incorporate reactive
      mechanisms for more carefully controlling buffer allocation when
      some predefined buffer allocation threshold is reached.  At that
      point, pruning policies would be applied.
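
      The timer-based (proactive) trigger described above can be
      sketched as follows.  This is only an illustration under stated
      assumptions: struct ooo_state, its fields, and the three functions
      are hypothetical names, time is a millisecond counter supplied by
      the caller, and the actual segment queue is reduced to a byte
      count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection reassembly state (illustrative names). */
struct ooo_state {
    bool     timer_armed;    /* true while a hole is outstanding */
    uint32_t first_ooo_seq;  /* recorded sequence number of the
                                first out-of-order data */
    uint64_t deadline_ms;    /* time at which the queue is pruned */
    size_t   queued_bytes;   /* out-of-order bytes currently buffered */
};

/* Called when a segment arrives with SEG.SEQ beyond RCV.NXT. */
void ooo_on_segment(struct ooo_state *s, uint32_t seg_seq, size_t seg_len,
                    uint64_t now_ms, uint64_t timeout_ms)
{
    assert(s != NULL);
    if (!s->timer_armed) {
        /* First out-of-order data: record it and set the timer. */
        s->timer_armed = true;
        s->first_ooo_seq = seg_seq;
        s->deadline_ms = now_ms + timeout_ms;
    }
    s->queued_bytes += seg_len;
}

/* Called when in-order data fill the hole: the timer is turned off
 * and the queued data are handed to the application. */
void ooo_on_hole_filled(struct ooo_state *s)
{
    assert(s != NULL);
    s->timer_armed = false;
    s->queued_bytes = 0;
}

/* Called periodically; returns the number of bytes discarded. */
size_t ooo_on_tick(struct ooo_state *s, uint64_t now_ms)
{
    size_t freed;

    assert(s != NULL);
    if (!s->timer_armed || now_ms < s->deadline_ms)
        return 0;

    /* Timer expired before the hole was filled: discard everything. */
    freed = s->queued_bytes;
    s->queued_bytes = 0;
    s->timer_armed = false;
    return freed;
}
```

      Because the discarded data were never covered by the cumulative
      Acknowledgement Number, the sender will retransmit them.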

      A number of mechanisms can aid in the process of freeing system
      resources.  For example, a table of network prefixes corresponding
      to the IP addresses of TCP peers that have ongoing TCP connections
      could record the aggregate amount of out-of-order data currently
      buffered for those connections.  When the pruning policy is
      triggered, TCP connections with hosts that have network prefixes
      with large aggregate out-of-order buffered data could be selected
      first for pruning the out-of-order segments.

      Alternatively, if TCP segments were de-multiplexed by means of a
      hash table (as is currently the case in many TCP
      implementations), a counter could be held at each entry of the
      hash table that would record the aggregate out-of-order data
      currently buffered for those connections belonging to that hash
      table entry.  When the pruning policy is triggered, the out-of-
      order data corresponding to those connections linked by the hash
      table entry with the largest amount of aggregate out-of-order data
      could be pruned first.  It is important that this hash is not
      computable by an attacker, as this would allow him to maliciously
      cause the performance of specific connections to be degraded.
      That is, given a four-tuple that identifies a connection, an
      attacker should not be able to compute the corresponding hash
      value used by the target system to de-multiplex incoming TCP
      segments to that connection.

      Another variant of a resource exhaustion attack against TCP's
      segment reassembly mechanism would target the data structures used
      to link the different holes in a data stream.  For example, an
      attacker could send a burst of one-byte segments, leaving a
      one-byte hole between each of the data bytes sent.
      Depending on the data structures used for holding and linking
      together each of the data segments, such an attack might waste a
      large amount of system memory by exploiting the overhead needed to
      store and link together each of these one-byte segments.

      For example, if a linked-list is used for holding and linking each
      of the data segments, each of the involved data structures could
      involve one byte of kernel memory for storing the received data
      byte (the TCP payload), plus 4 bytes (32 bits) for storing a
      pointer to the next node in the linked-list.  Additionally, while
      such a data structure would require only a few bytes of kernel
      memory, it could result in the allocation of a whole memory page,
      thus consuming much more memory than expected.

      Therefore, implementations should enforce a limit on the number of
      holes that are allowed in the received data stream at any given
      time.  When such a limit is reached, incoming TCP segments which
      would create new holes would be silently dropped.  Measurements in
      [Dharmapurikar and Paxson, 2005] indicate that the vast majority
      of TCP connections have at most a single hole at any given time.
      A limit of 16 holes for each connection would accommodate even
      most of the very unusual cases in which there can be more than one
      hole in the data stream at a given time.

      [US-CERT, 2004a] is a security advisory about a Denial of Service
      vulnerability resulting from a TCP implementation that did not
      enforce limits on the number of segments stored in the TCP
      reassembly buffer.

      Section 8 of this document describes the security implications of
      the TCP segment reassembly algorithm.

7.3.  Automatic buffer tuning mechanisms

7.3.1.  Automatic send-buffer tuning mechanisms

   A TCP implementing an automatic send-buffer tuning mechanism SHOULD
   enforce the following limit on the size of the send buffer of each
   TCP connection:

   send_buffer_size <= send_buffer_pool / (min_buffer_size * max_connections)

   where

   send_buffer_size:
      Maximum send buffer size to be used for this connection

   send_buffer_pool:
      Total amount of system memory meant for TCP send buffers

   min_buffer_size:
      Minimum send buffer size for each TCP connection

   max_connections:
      Maximum number of TCP connections this system is expected to
      handle at a time

   max_connections may be an artificial limit enforced by the system
   administrator specifically on the number of TCP connections, or may
   be derived from some other system limit (e.g., the maximum number of
   file descriptors).

   DISCUSSION:

      A number of TCP implementations incorporate automatic tuning
      mechanisms for the TCP send buffer size.  In most of them, the
      underlying idea is to set the send buffer to some multiple of the
      congestion window (cwnd).  This type of mechanism usually improves
      TCP's performance, by preventing the socket send buffer from
      becoming a bottleneck, while avoiding the need to simply
      overestimate the TCP send buffer size (i.e., make it arbitrarily
      large).  [Semke et al, 1998] discusses such an automatic buffer
      tuning mechanism.

      Unfortunately, automatic tuning mechanisms can be exploited by
      attackers to amplify the impact of other resource exhaustion
      attacks.  For example, an attacker could establish a TCP
      connection with a victim host, and cause the congestion window to
      be increased (either legitimately or illegitimately).
      Once the congestion window (and hence the TCP send buffer) is
      increased, he could cause the corresponding system memory to be
      tied up by advertising a zero-byte TCP window (see Section 3.7) or
      simply not acknowledging any data, thus amplifying the effect of
      resource exhaustion attacks such as that discussed in
      Section 7.1.1.

      When an automatic buffer tuning mechanism is implemented, a number
      of countermeasures should be incorporated to prevent the mechanism
      from being exploited to amplify other resource exhaustion attacks.

      Firstly, appropriate policies should be applied to guarantee fair
      use of the available system memory by each of the established TCP
      connections.  Secondly, appropriate policies should be applied to
      prevent existing TCP connections from consuming all system
      resources, thus denying service to new TCP connections.

      Appendix A of [Semke et al, 1998] proposes an algorithm for the
      fair share of the available system memory among the established
      connections.  However, there are a number of limits that should be
      enforced on the system memory assigned for the send buffer of each
      connection.  Firstly, each connection should always be assigned
      some minimum send buffer space that would enable TCP to perform at
      an acceptable level.  Secondly, some system memory should be
      reserved for future connections, according to the maximum number
      of concurrent TCP connections that are expected to be successfully
      handled at any given time.

      These limits preclude the automatic tuning algorithm from
      assigning all the available memory buffers to ongoing connections,
      which would prevent the establishment of new connections.

      Even if these limits are enforced, an attacker could still create
      a large number of TCP connections, each of them tying up valuable
      system resources.
      Therefore, in scenarios in which most of the system memory
      reserved for TCP send buffers is allocated to ongoing connections,
      it may be necessary for TCP to enforce some policy to free
      resources, either to service more TCP connections or to improve
      the performance of other existing connections by allocating more
      resources to them.

      When needing to free memory in use for send buffers, particular
      attention should be paid to TCPs that have a large amount of data
      in the socket send buffer, and that at the same time fall into any
      of these categories:

      *  The remote TCP peer has been advertising a small (possibly
         zero) window for a considerable period of time.

      *  There have been a large number of retransmissions of segments
         corresponding to the first few windows of data.

      *  Connections that fall into one of the previous categories, for
         which only a reduced amount of data have been successfully
         transferred to the peer TCP since the connection was
         established.

      Unfortunately, all these cases are valid scenarios for the TCP
      protocol, and thus aborting connections that fall in any of these
      categories has the potential of causing interoperability problems.
      However, in scenarios in which all system resources are allocated,
      it may make sense to free resources allocated to TCP connections
      which are tying up a considerable amount of system resources and
      that have not made progress in a considerable period of time.

7.3.2.  Automatic receive-buffer tuning mechanism

   A number of TCP implementations include automatic tuning mechanisms
   for the receive buffer size.  These mechanisms aim at setting the
   socket buffer to a size that is large enough to prevent the TCP
   window from becoming a bottleneck that would limit TCP's throughput,
   without wasting system memory by over-sizing it.

   [Heffner, 2002] describes a mechanism for the automatic tuning of the
   socket receive buffer.  Basically, the mechanism aims at measuring
   the amount of data received during an RTT (Round-Trip Time), and
   setting the socket receive buffer to some multiple of that value.

   A TCP implementing an automatic receive-buffer tuning mechanism
   SHOULD enforce the following limit on the size of the receive buffer
   of each TCP connection:

   recv_buffer_size <= recv_buffer_pool / (min_buffer_size * max_connections)

   where:

   recv_buffer_size:
      Maximum receive buffer size to be used for this connection

   recv_buffer_pool:
      Total amount of system memory meant for TCP receive buffers

   min_buffer_size:
      Minimum receive buffer size for each TCP connection

   max_connections:
      Maximum number of TCP connections this system is expected to
      handle at a time

   max_connections may be an artificial limit enforced by the system
   administrator specifically on the number of TCP connections, or may
   be derived from some other system limit (e.g., the maximum number of
   file descriptors).

   DISCUSSION:

      Unfortunately, automatic tuning mechanisms for the socket receive
      buffer can be exploited to perform a resource exhaustion attack.
      An attacker willing to exploit the automatic buffer tuning
      mechanism would first establish a TCP connection with the victim
      host.  Subsequently, he would start a bulk data transfer to the
      victim host.  By carefully responding to the peer's TCP segments,
      the attacker could cause the peer TCP to measure a large data/RTT
      value, which would lead to the adoption of an unnecessarily large
      socket receive buffer.  For example, the attacker could
      optimistically send more data than allowed by the TCP window
      advertised by the remote TCP.
      Those extra data would cross in the network with the window
      updates sent by the remote TCP, and could lead the TCP receiver to
      measure a data/RTT twice as big as the real one.  Alternatively,
      if the TCP timestamp option (specified in RFC 1323 [Jacobson et
      al, 1992]) is used for RTT measurement, the attacker could lead
      the TCP receiver to measure a small RTT (and hence a large
      data/RTT rate) by "optimistically" echoing timestamps that have
      not yet been received.

      Finally, once the TCP receiver is led to increase the size of its
      receive buffer, the attacker would transmit a large amount of
      data, filling the peer's whole receive buffer except for a few
      bytes at the beginning of the window (RCV.NXT).  This gap would
      prevent the peer application from reading the data queued by TCP,
      thus tying system memory to the received data segments until (if
      ever) the peer application times out.

      A number of limits should be enforced on the amount of system
      memory assigned to any given connection.  Firstly, each connection
      should always be assigned some minimum receive buffer space that
      would enable TCP to perform at a minimum acceptable level.
      Additionally, some system memory should be reserved for future
      connections, according to the maximum number of concurrent TCP
      connections that are expected to be successfully handled at any
      given time.

      These limits preclude the automatic tuning algorithm from
      assigning all the available memory buffers to existing
      connections, which would prevent the establishment of new
      connections.

      It is interesting to note that a TCP sender will always try to
      retransmit any data that have not been acknowledged by TCP's
      cumulative acknowledgement.
      Therefore, if memory exhaustion is imminent, a system should
      consider freeing those memory buffers used for TCP segments that
      were received out of order, particularly when a given connection
      has been keeping a large number of out-of-order segments in the
      receive buffer for a considerable period of time.

      It is worth noting that TCP Selective Acknowledgements (SACK) are
      advisory, in the sense that a TCP that has SACKed (but not ACKed)
      a block of data is free to discard that block, and expect the TCP
      sender to retransmit it when the retransmission timer of the peer
      TCP expires.

8.  TCP segment reassembly algorithm

8.1.  Problems that arise from ambiguity in the reassembly process

   If a TCP segment is received containing some data bytes that had
   already been received, the first copy of those data SHOULD be used
   for reassembling the application data stream.

   DISCUSSION:

      A security consideration that should be made for the TCP segment
      reassembly algorithm is that of data stream consistency between
      the host performing the TCP segment reassembly, and a Network
      Intrusion Detection System (NIDS) being employed to monitor the
      host in question.

      In the event a TCP segment was unnecessarily retransmitted, or
      there was packet duplication in any of the intervening networks, a
      TCP might get more than one copy of the same data.  Also, as TCP
      segments can be re-packetized when they are retransmitted, a given
      TCP segment might partially overlap data already received in
      earlier segments.  In all these cases, the question arises about
      which of the copies of the received data should be used when
      reassembling the data stream.  In legitimate and normal
      circumstances, all copies would be identical, and the same data
      stream would be obtained regardless of which copy of the data was
      used.
      However, an attacker could maliciously send overlapping segments
      containing different data, with the intent of evading a NIDS,
      which might reassemble the received TCP segments differently than
      the monitored system.  [Ptacek and Newsham, 1998] provides a
      detailed discussion of these issues.

      As suggested in Section 3.9 of RFC 793 [Postel, 1981c], if a TCP
      segment arrives containing some data bytes that have already been
      received, the first copy of those data should be used for
      reassembling the application data stream.  It should be noted that
      while convergence to this policy might prevent some cases of
      ambiguity in the reassembly process, there are a number of other
      techniques that an attacker could still exploit to evade a NIDS
      [CPNI, 2008].  These techniques can generally be defeated if the
      NIDS is placed in-line with the monitored system, thus allowing
      the NIDS to normalize the network traffic or apply some other
      policy that could ensure consistency between the result of the
      segment reassembly process obtained by the monitored host and that
      obtained by the NIDS.

      [CERT, 2003] and [CORE, 2003] are advisories about a heap buffer
      overflow in a popular Network Intrusion Detection System resulting
      from incorrect sequence number calculations in its TCP stream-
      reassembly module.

9.  TCP Congestion Control

   TCP implements two algorithms, "slow start" and "congestion
   avoidance", for controlling the rate at which data is transmitted on
   a TCP connection [Allman et al, 1999].  These algorithms require the
   addition of two variables as part of TCP per-connection state: cwnd
   and ssthresh.

   The congestion window (cwnd) is a sender-side limit on the amount of
   outstanding data that the sender can have at any time, while the
   receiver's advertised window (rwnd) is a receiver-side limit on the
   amount of outstanding data.  The minimum of cwnd and rwnd governs
   data transmission.

   Another state variable, the slow-start threshold (ssthresh), is used
   to determine whether it is the slow start or the congestion avoidance
   algorithm that should control data transmission.  When cwnd <
   ssthresh, "slow start" governs data transmission, and the congestion
   window (cwnd) is exponentially increased.  When cwnd > ssthresh,
   "congestion avoidance" governs data transmission, and the congestion
   window (cwnd) is only linearly increased.

   As specified in RFC 2581 [Allman et al, 1999], when cwnd and ssthresh
   are equal the sender may use either slow start or congestion
   avoidance.

   During slow start, TCP increments cwnd by at most SMSS bytes for each
   ACK received that acknowledges new data.  During congestion
   avoidance, cwnd is incremented by 1 full-sized segment per round-trip
   time (RTT), until congestion is detected.

   Additionally, TCP uses two algorithms, Fast Retransmit and Fast
   Recovery, to mitigate the effects of packet loss.  The "Fast
   Retransmit" algorithm infers packet loss when three Duplicate
   Acknowledgements (DupACKs) are received.

      The value "three" is meant to allow for fast-retransmission of
      "missing" data, while avoiding network packet reordering from
      triggering loss recovery.

   Once packet loss is detected by the receipt of three duplicate-ACKs,
   the "Fast Recovery" algorithm governs the transfer of new data until
   a non-duplicate ACK is received that acknowledges the receipt of new
   data.
   The Fast Retransmit and Fast Recovery algorithms are usually
   implemented together, as follows (from RFC 2581):

   o  When the third duplicate ACK is received, set ssthresh to no more
      than the value given in the equation:

         ssthresh = max (FlightSize / 2, 2*SMSS)

   o  Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
      This artificially "inflates" the congestion window by the number
      of segments (three) that have left the network and which the
      receiver has buffered.

   o  For each additional duplicate ACK received, increment cwnd by
      SMSS.  This artificially inflates the congestion window in order
      to reflect the additional segment that has left the network.

   o  Transmit a segment, if allowed by the new value of cwnd and the
      receiver's advertised window.

   o  When the next ACK arrives that acknowledges new data, set cwnd to
      ssthresh (the value set in step 1).  This is termed "deflating"
      the window.

9.1.  Congestion control with misbehaving receivers

   [Savage et al, 1999] describes a number of ways in which TCP's
   congestion control mechanisms can be exploited by a misbehaving TCP
   receiver to obtain more than its fair share of bandwidth.  The
   following subsections provide a brief discussion of these
   vulnerabilities, along with the possible countermeasures.

9.1.1.  ACK division

   TCP SHOULD increase cwnd by one SMSS only when a valid ACK covers the
   entire data segment sent.

   (note: or should we recommend the other counter-measure, i.e.,
   implementation of ABC?)

   DISCUSSION:

      Given that TCP updates cwnd based on the number of ACKs it
      receives, rather than on the amount of data that each ACK is
      actually acknowledging, a malicious TCP receiver could cause the
      TCP sender to illegitimately increase its congestion window by
      acknowledging a data segment with a number of separate
      Acknowledgements, each covering a distinct piece of the received
      data segment.

      See Figure 7, on page 64 of the UK CPNI document.

                         ACK division attack

      [Savage et al, 1999] describes two possible countermeasures for
      this vulnerability.  One of them is to increment cwnd not by a
      full SMSS, but proportionally to the amount of data being
      acknowledged by the received ACK, similarly to the policy
      described in RFC 3465 [Allman, 2003].  Another alternative is to
      increase cwnd by one SMSS only when a valid ACK covers the entire
      data segment sent.

9.1.2.  DupACK forgery

   TCP SHOULD keep track of the number of outstanding segments (o_seg),
   and accept only up to (o_seg -1) duplicate Acknowledgements.

   DISCUSSION:

      The second vulnerability discussed in [Savage et al, 1999] allows
      an attacker to cause the TCP sender to illegitimately increase its
      congestion window by forging a number of duplicate
      Acknowledgements (DupACKs).  Figure 8 shows a sample scenario.
      The first three DupACKs trigger the Fast Recovery mechanism, while
      the rest of them cause the congestion window at the TCP sender to
      be illegitimately inflated.  Thus, the attacker is able to
      illegitimately cause the TCP sender to increase its data
      transmission rate.

      See Figure 8, on page 65 of the UK CPNI document.

                        DupACK forgery attack

      Fortunately, a number of sender-side heuristics can be implemented
      to mitigate this vulnerability.
      First, the TCP sender could keep track of the number of
      outstanding segments (o_seg), and accept only up to (o_seg -1)
      DupACKs.  Secondly, a TCP sender might, for example, refuse to
      enter Fast Recovery multiple times in some period of time (e.g.,
      one RTT).

      [Savage et al, 1999] also describes a modification to TCP to
      implement a nonce protocol that would eliminate this
      vulnerability.  However, this would require modification of all
      implementations, which makes this counter-measure hard to deploy.

9.1.3.  Optimistic ACKing

   Another alternative for an attacker to exploit TCP's congestion
   control mechanisms is to acknowledge data that has not yet been
   received, thus causing the congestion window at the TCP sender to be
   incremented faster than it should.

   See Figure 9, on page 66 of the UK CPNI document.

                       Optimistic ACKing attack

   [Savage et al, 1999] describes a number of mitigations for this
   vulnerability.  Firstly, it describes a countermeasure based on the
   concept of "cumulative nonce", which would allow a receiver to prove
   that it has received all the segments it is acknowledging.  However,
   this countermeasure requires the introduction of two new fields to
   the TCP header, and thus a modification to all the communicating
   TCPs, which makes it hard to deploy.  Secondly, it describes a
   possible way to encode the nonce in a TCP segment by carefully
   modifying its size.  While this countermeasure could be easily
   deployed (as it is just sender-side policy), we believe that middle-
   boxes such as protocol-scrubbers might prevent this counter-measure
   from working as expected.  Finally, it suggests that a TCP sender
   might penalize a TCP receiver that acknowledges data not yet sent by
   resetting the corresponding connection.
   Here we discourage the implementation of this policy, as it would
   provide an attack vector for a TCP-based connection-reset attack,
   similar to those described in Section 11.

   [US-CERT, 2005a] is a vulnerability advisory about this issue.

9.2.  Blind DupACK triggering attacks against TCP

   While all of the attacks discussed in [Savage et al, 1999] have the
   goal of increasing the performance of the attacker's TCP connections,
   TCP congestion control mechanisms can be exploited with a variety of
   goals.

   Firstly, if bursts of many duplicate-ACKs are sent to the "sending
   TCP", the third duplicate-ACK will cause the "lost" segment to be
   retransmitted, and each subsequent duplicate-ACK will cause cwnd to
   be artificially inflated.  Thus, the "sending TCP" might end up
   injecting more packets into the network than it really should, with
   the potential of causing network congestion.  This is a potential
   consequence of the "Duplicate-ACK spoofing attack" described in
   [Savage et al, 1999].

   Secondly, if bursts of three duplicate ACKs are sent to the TCP
   sender, the attacked system would infer packet loss, and ssthresh and
   cwnd would be reduced.  As noted in RFC 2581 [Allman et al, 1999],
   causing two congestion control events back-to-back will often cut
   ssthresh and cwnd to their minimum value of 2*SMSS, with the
   connection immediately entering the slower-performing congestion
   avoidance phase.  While it would not be attractive for an attacker to
   perform this attack against one of his TCP connections, the attack
   might be attractive when the TCP connection to be attacked is
   established between two other parties.
3187 It is usually assumed that in order for an off-path attacker to 3188 perform attacks against a third-party TCP connection, he should be 3189 able to guess a number of values, including a valid TCP Sequence 3190 Number and a valid TCP Acknowledgement Number. While this is true if 3191 the attacker tries to "inject" valid packets into the connection by 3192 himself, a feature of TCP can be exploited to fool one of the TCP 3193 endpoints into transmitting valid duplicate Acknowledgements on behalf of 3194 the attacker, hence relieving the attacker of the hard task of 3195 forging valid values for the Sequence Number and Acknowledgement 3196 Number TCP header fields. 3198 Section 3.9 of RFC 793 [Postel, 1981c] describes the processing of 3199 incoming TCP segments as a function of the connection state and the 3200 contents of the various header fields of the received segment. For 3201 connections in the ESTABLISHED state, the first check that is 3202 performed on incoming segments is that they contain "in window" data. 3203 That is, 3205 RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND, or 3207 RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND 3209 If a segment does not pass this check, it is dropped, and an 3210 Acknowledgement is sent in response: 3212 <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> 3214 The goal of this behavior is that, in the event data segments are 3215 received by the TCP receiver, but all the corresponding 3216 Acknowledgements are lost, when the TCP sender retransmits the 3217 supposedly lost data, the TCP receiver will send an Acknowledgement 3218 reflecting all the data received so far. If "old" TCP segments were 3219 silently dropped, the scenario just described would lead to a 3220 "frozen" TCP connection, with the TCP sender retransmitting the data 3221 for which it has not yet received an Acknowledgement, and the TCP 3222 receiver silently ignoring these segments. Additionally, it helps 3223 TCP to detect half-open connections.
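The acceptability check and the resulting Acknowledgement can be sketched as follows. This is an illustrative sketch rather than a reference implementation: it uses a strict upper bound on the window, as in the acceptability test of RFC 793, it handles 32-bit sequence-number wrap-around, and the function names and the dictionary used to represent the ACK are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def segment_acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Return True if the segment carries "in window" data."""
    if rcv_wnd == 0:
        # With a zero window only an empty segment at RCV.NXT is acceptable.
        return seg_len == 0 and seg_seq == rcv_nxt

    def in_window(seq: int) -> bool:
        # RCV.NXT <= seq < RCV.NXT + RCV.WND, computed modulo 2**32.
        return (seq - rcv_nxt) % MOD < rcv_wnd

    if seg_len == 0:
        return in_window(seg_seq)
    # Either the first or the last byte of the segment must be in the window.
    return in_window(seg_seq) or in_window((seg_seq + seg_len - 1) % MOD)


def elicited_ack(seg_seq, seg_len, snd_nxt, rcv_nxt, rcv_wnd):
    """Return the ACK sent for an unacceptable segment, or None if acceptable.

    The returned ACK carries only local state (SND.NXT, RCV.NXT), so it is
    valid for the peer no matter what sequence number the attacker used.
    """
    if segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return None
    return {"seq": snd_nxt, "ack": rcv_nxt, "ctl": "ACK"}
```

Note that nothing in the elicited ACK depends on the contents of the offending segment, which is why an attacker does not need to guess valid Sequence or Acknowledgement Numbers.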
3225 This feature implies that, provided the four-tuple that identifies a 3226 given TCP connection is known or can be easily guessed, an attacker 3227 could send a TCP segment with an "out of window" Sequence Number to 3228 one of the endpoints of the TCP connection to cause it to send a 3229 valid ACK to the other endpoint of the connection. Figure 10 3230 illustrates such a scenario. 3232 See Figure 10, in page 68 of the UK CPNI document. 3234 Blind Dup-ACK forgery attack 3236 As discussed in [Watson, 2004] and RFC 4953 [Touch, 2007], there are 3237 a number of scenarios in which the four-tuple that identifies a TCP 3238 connection is known or can be easily guessed. In those scenarios, an 3239 attacker could perform any of the "blind" attacks described in the 3240 following subsections by exploiting the technique described above. 3242 The following subsections describe blind DupACK-triggering attacks 3243 that aim at either degrading the performance of an arbitrary 3244 connection, or causing a TCP sender to illegitimately increase the 3245 rate at which it transmits data, potentially leading to network 3246 congestion. 3248 9.2.1. Blind throughput-reduction attack 3250 As discussed in Section 9, when three duplicate Acknowledgements are 3251 received, the congestion window is reduced to half the current amount 3252 of outstanding data (FlightSize). Additionally, the slow-start 3253 threshold (ssthresh) is reduced to the same value, causing the 3254 connection to enter the slower-performing congestion avoidance phase. 3255 If two congestion-control events occur back to back, ssthresh and 3256 cwnd will often be reduced to their minimum value of 2*SMSS. 3258 An attacker could exploit the technique described in Section 9.2 to 3259 cause the throughput of the attacked TCP connection to be reduced, by 3260 eliciting three duplicate acknowledgements from the TCP receiver, 3261 which would cause the TCP sender to reduce its congestion window. 
In 3262 principle, the attacker would need to send a burst of only three out- 3263 of-window segments. However, in case the TCP receiver implements an 3264 acknowledgement policy such as "ACK every other segment", four out- 3265 of-window segments might be needed. The first segment would cause 3266 the pending (delayed) Acknowledgement to be sent, and the next three 3267 segments would elicit the actual duplicate Acknowledgements. 3269 Figure 11 shows a time-line graph of a sample scenario. The burst of 3270 DupACKs (in green) elicited by the burst of out-of-window segments 3271 (in red) sent by the attacker causes the TCP sender to retransmit the 3272 missing segment (in blue) and enter the loss recovery phase. Once a 3273 segment that acknowledges new data is received by the TCP sender, the 3274 loss recovery phase ends, and cwnd and ssthresh are set to half the 3275 number of segments that were outstanding when the loss recovery phase 3276 was entered. 3278 See Figure 11, in page 69 of the UK CPNI document. 3280 Blind throughput-reduction attack (time-line graph) 3282 The graphic assumes that the TCP receiver sends an Acknowledgement 3283 for every other data segment it receives, and that the TCP sender 3284 implements Appropriate Byte Counting (specified in RFC 3465 [Allman, 3285 2003]) on the received Acknowledgement segments. However, 3286 implementation of these policies is not required for the attack to 3287 succeed. 3289 9.2.2. Blind flooding attack 3291 As discussed in Section 9, when three duplicate Acknowledgements are 3292 received, the "lost" segment is retransmitted, and the congestion 3293 window is artificially inflated for each DupACK received, until the 3294 loss recovery phase ends. 
By sending a long burst of out-of-window 3295 segments to the TCP receiver of the attacked connection, an attacker 3296 could elicit a long burst of valid duplicate acknowledgements that 3297 would illegitimately cause the TCP sender of the attacked TCP 3298 connection to increase its data transmission rate. 3300 Figure 12 shows a time-line graph for this attack. The long burst of 3301 DupACKs (in green) elicited by the long burst of out-of-window 3302 segments (in red) sent by the attacker causes the TCP sender to enter 3303 the loss recovery phase and illegitimately inflate the congestion 3304 window, leading to an increase in the data transmission rate. Once a 3305 segment that acknowledges new data is received by the TCP sender, the 3306 loss recovery phase ends, and the data transmission rate is reduced. 3308 See Figure 12, in page 70 of the UK CPNI document. 3310 Blind flooding attack (time-line graph) 3312 Figure 13 is a time-sequence graph produced from packet logs obtained 3313 from tests of the described attack in a real network. A burst of 3314 segments is sent upon receipt of the burst of Duplicate 3315 Acknowledgements illegitimately elicited by the attacker. Figure 14 3316 is an averaged-throughput graphic for the same time frame, which 3317 clearly shows the effect of the attack in terms of throughput. 3319 See Figure 13, in page 71 of the UK CPNI document. 3321 Blind flooding attack (time sequence graph) 3323 See Figure 14, in page 71 of the UK CPNI document. 3325 Blind flooding attack (averaged throughput graph) 3327 These graphics were produced with Shawn Ostermann's tcptrace tool 3328 [Ostermann, 2008]. An explanation of the format of the graphics can 3329 be found in tcptrace's manual (available at the project's web site: 3330 http://www.tcptrace.org). 3332 9.2.3. 
Difficulty in performing the attacks 3334 In order to exploit the technique described in Section 9.2 of this 3335 document, an attacker would need to know the four-tuple {IP Source 3336 Address, TCP Source Port, IP Destination Address, TCP Destination 3337 Port} that identifies the connection to be attacked. As discussed by 3338 [Watson, 2004] and RFC 4953 [Touch, 2007], there are a number of 3339 scenarios in which these values may be known or easily guessed. 3341 It is interesting to note that the attacks described in Section 9.2 3342 of this document will typically require a much smaller number of 3343 packets than other "blind" attacks against TCP, such as those 3344 described in [Watson, 2004] and RFC 4953 [Touch, 2007], as the 3345 technique discussed in Section 9.2 relieves the attacker from having 3346 to guess valid TCP Sequence Numbers and TCP Acknowledgement 3347 Numbers. 3349 The attacks described in Section 9.2.1 and Section 9.2.2 of this 3350 document require the attacker to forge the source address of the 3351 packets it sends. Therefore, if ingress/egress filtering is 3352 performed by intermediate systems, the attacker's packets would not 3353 get to the intended recipient, and thus the attack would not succeed. 3354 However, we consider that ingress/egress filtering cannot be relied 3355 upon as the first line of defense against these attacks. 3357 Finally, it is worth noting that in order to successfully perform the 3358 blind attacks discussed in Section 9.2.1 and Section 9.2.2 of this 3359 document, the burst of out-of-window segments sent by the attacker 3360 should not be intermixed with valid data segments sent by the TCP 3361 sender, or else the Acknowledgement number of the illegitimately- 3362 elicited ACK segments would change, and the Acknowledgements would 3363 not be considered "Duplicate Acknowledgements" by the TCP sender.
3364 Tests performed in real networks seem to suggest that this 3365 requirement is not hard to fulfill, though. 3367 9.2.4. Modifications to TCP's loss recovery algorithms 3369 There are a number of algorithms that augment TCP's loss recovery 3370 mechanism that have been suggested by TCP researchers and have been 3371 specified by the IETF in the RFC series. This section describes a 3372 number of these algorithms, and discusses how their implementation 3373 affects (or not) the vulnerability of TCP to the attacks discussed in 3374 Section 9.2.1 and Section 9.2.2 of this document. 3376 NewReno 3378 RFC 3782 [Floyd et al, 2004] specifies the NewReno algorithm, which 3379 is meant to improve TCP's performance in the presence of multiple 3380 losses in a single window of data. The implication of this algorithm 3381 with respect to the attacks discussed in the previous sections is 3382 that whenever either of the attacks is performed against a connection 3383 with a NewReno TCP sender, a full-window (or half a window) of data 3384 will be unnecessarily retransmitted. This is particularly 3385 interesting in the case of the blind-flooding attack, as the attack 3386 would elicit even more packets from the TCP sender. 3388 Whether a full-window or just half a window of data is retransmitted 3389 depends on the Acknowledgement policy at the TCP receiver. If the 3390 TCP receiver sends an Acknowledgement (ACK) for every segment, a 3391 full-window of data will be retransmitted. If the TCP receiver sends 3392 an Acknowledgement (ACK) for every other segment, then only half a 3393 window of data will be retransmitted. 3395 Figure 15 is a time-sequence graph produced from packet logs obtained 3396 from tests performed in a real network. Once loss recovery is 3397 illegitimately triggered by the duplicate-ACKs elicited by the 3398 attacker, an entire flight of data is unnecessarily retransmitted. 
3399 Figure 16 is an averaged-throughput graphic for the same time-frame, 3400 which shows an increase in the throughput of the connection resulting 3401 from the retransmission of segments governed by NewReno's loss 3402 recovery. 3404 See Figure 15, in page 73 of the UK CPNI document. 3406 NewReno loss recovery (time-sequence graph) 3408 See Figure 16, in page 74 of the UK CPNI document. 3410 NewReno loss recovery (averaged throughput graph) 3412 Limited Transmit 3414 RFC 3042 [Allman et al, 2001] proposes an enhancement to TCP to more 3415 effectively recover lost segments when a connection's congestion 3416 window is small, or when a large number of segments are lost in a 3417 single transmission window. The "Limited Transmit" algorithm calls 3418 for sending a new data segment in response to each of the first two 3419 Duplicate Acknowledgements that arrive at the TCP sender. This would 3420 provide two additional transmitted packets that may be useful for the 3421 attacker if the blind flooding attack described in 3422 Section 9.2.2 is performed. 3424 SACK-based loss recovery 3426 RFC 3517 [Blanton et al, 2003] specifies a conservative loss-recovery 3427 algorithm that is based on the use of the selective acknowledgement 3428 (SACK) TCP option. The algorithm uses DupACKs as an indication of 3429 congestion, as specified in RFC 2581 [Allman et al, 1999]. However, 3430 a difference between this algorithm and the basic algorithm described 3431 in RFC 2581 is that it clocks out segments only with the SACK 3432 information included in the DupACKs. That is, during the loss 3433 recovery phase, segments will be injected in the network only if the 3434 SACK information included in the received DupACKs indicates that one 3435 or more segments have left the network. As a result, those systems 3436 that implement SACK-based loss recovery will not be vulnerable to the 3437 blind flooding attack described in Section 9.2.2.
However, as RFC 3438 3517 does not actually require DupACKs to include new SACK 3439 information (corresponding to data that has not yet been acknowledged 3440 by TCP's cumulative Acknowledgement), systems that implement SACK- 3441 based loss-recovery may still remain vulnerable to the blind 3442 throughput-reduction attack described in Section 9.2.1. SACK-based 3443 loss recovery implementations should be updated to implement the 3444 countermeasure ("Use of SACK information to validate DupACKs") 3445 described in Section 9.2.5. 3447 9.2.5. Countermeasures 3449 TCP SHOULD validate the Sequence Number of an incoming TCP segment 3450 as follows: 3452 RCV.NXT - MAX.RCV.WND <= SEG.SEQ <= RCV.NXT + RCV.WND 3454 where MAX.RCV.WND is the largest TCP window that has so far been 3455 advertised to the remote endpoint. 3457 If a segment passes this check, the processing rules specified in RFC 3458 793 [Postel, 1981c] MUST be applied. Otherwise, TCP SHOULD send an ACK 3459 (as specified by the processing rules in RFC 793 [Postel, 1981c]), 3460 applying rate-limiting to the Acknowledgement segments sent in 3461 response to out-of-window segments. 3463 DISCUSSION: 3465 As discussed in Section 9.2, TCP responds with an ACK when an out- 3466 of-window segment is received, to accommodate those scenarios in 3467 which the Acknowledgement segments that correspond to some 3468 received data are lost in the network, and to help discover half- 3469 open TCP connections. 3471 However, it is possible to restrict the sequence numbers that are 3472 considered acceptable, and have TCP respond with ACKs only when it 3473 is strictly necessary. 3475 A feature of TCP is that, in some scenarios, it can detect half- 3476 open connections. If an implementation chose to silently drop 3477 those TCP segments that do not pass the check enforced by the 3478 equation above, it could prevent TCP from detecting half-open 3479 connections.
Figure 17 shows a scenario in which, provided that 3480 "TCP B" behaves as specified in RFC 793, a half-open connection 3481 would be discovered and aborted. 3483 An established connection is said to be "half open" if one of the 3484 TCPs has closed or aborted the connection at its end without the 3485 knowledge of the other, or if the two ends of the connection have 3486 become desynchronized owing to a crash that resulted in loss of 3487 memory. 3489 See Figure 17, in page 76 of the UK CPNI document. 3491 Half-Open Connection Discovery 3493 In the scenario illustrated by Figure 17, TCP A crashes losing the 3494 connection-state information of the TCP connection with TCP B. In 3495 line 3, TCP A tries to establish a new connection with TCP B, 3496 using the same four-tuple {IP Source Address, TCP source port, IP 3497 Destination Address, TCP destination port}. In line 4, as the SYN 3498 segment is out of window, TCP B responds with an ACK. This ACK 3499 elicits an RST segment from TCP A, which causes the half-open 3500 connection at TCP B to be aborted. 3502 If the SYN segment had been "in window", TCP B would have sent an 3503 RST segment instead, which would have closed the half-open 3504 connection. Ongoing work at the TCPM WG of the IETF proposes to 3505 change this behavior, and make TCP respond to a SYN segment 3506 received for any of the synchronized states with an ACK segment, 3507 to avoid in-window SYN segments from being used to perform 3508 connection-reset attacks [Ramaiah et al, 2008]. 3510 However, in case the out-of-window segment was silently dropped, 3511 the scenario in Figure 17 would change into that in Figure 18. 3513 See Figure 18, in page 76 of the UK CPNI document. 3515 Half-Open Connection Discovery with the proposed counter-measure 3517 In line 3, the SYN segment sent by TCP A is silently dropped by 3518 TCP B because it does not pass the check enforced by the equation 3519 above (i.e., it contains an out-of-window sequence number). 
As a 3520 result, some time later (an RTO) TCP A retransmits its SYN 3521 segment. Even after TCP A times out, the half-open connection at 3522 TCP B will remain in the same state. 3524 Thus, a conservative reaction to those segments that do not pass 3525 the check enforced by the equation above would be to respond with 3526 an Acknowledgement segment (as specified by RFC 793), applying 3527 rate-limiting to those Acknowledgement segments sent in response 3528 to segments that do not pass the check enforced by that equation. 3529 An implementation might choose to enforce a rate-limit of, e.g., 3530 one ACK per five seconds, as a single ACK segment is needed for 3531 the Half-Open Connection Discovery mechanism to work. 3533 As the only reason to respond with an ACK to those segments that 3534 do not pass the check enforced by the equation above is to allow 3535 TCP to discover half-open connections, an aggressive rate-limit 3536 can be enforced. As long as the rate-limit prevents out-of-window 3537 segments from eliciting three Acknowledgment segments in a Round- 3538 trip Time (RTT), an attacker would not be able to trigger TCP's 3539 loss-recovery, and thus would not be able to perform the attacks 3540 described in the previous sections. 3542 It is interesting to note that RFC 793 [Postel, 1981c] itself 3543 states that half-open connections are expected to be unusual. 3544 Additionally, given that in many scenarios it may be unlikely for 3545 a TCP connection request to be issued with the same four-tuple as 3546 that of the half-open connection, a complete solution for the 3547 discovery of half-open connections cannot rely on the mechanism 3548 illustrated by Figure 17, either. Therefore, some implementations 3549 might choose to sacrifice TCP's ability to detect half-open 3550 connections, and have a more aggressive reaction to those segments 3551 that do not pass the check enforced by the equation above by 3552 silently dropping them. 
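The sequence-number validation and the rate-limited ACK response discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the five-second interval is the example value mentioned in the text, time is passed in explicitly as `now`, and the names `seq_plausible`, `OutOfWindowAckLimiter` and `handle_segment` are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_plausible(seg_seq: int, rcv_nxt: int, rcv_wnd: int, max_rcv_wnd: int) -> bool:
    """Check RCV.NXT - MAX.RCV.WND <= SEG.SEQ <= RCV.NXT + RCV.WND (mod 2**32)."""
    low = (rcv_nxt - max_rcv_wnd) % MOD
    return (seg_seq - low) % MOD <= max_rcv_wnd + rcv_wnd


class OutOfWindowAckLimiter:
    """Allow at most one ACK per `min_interval` seconds for failing segments."""

    def __init__(self, min_interval: float = 5.0):
        self.min_interval = min_interval
        self.last_ack = None  # time at which the last such ACK was sent

    def allow(self, now: float) -> bool:
        if self.last_ack is None or now - self.last_ack >= self.min_interval:
            self.last_ack = now
            return True
        return False


def handle_segment(seg_seq, rcv_nxt, rcv_wnd, max_rcv_wnd, limiter, now) -> str:
    """Return "process", "ack" or "drop" for an incoming segment."""
    if seq_plausible(seg_seq, rcv_nxt, rcv_wnd, max_rcv_wnd):
        return "process"  # continue with the RFC 793 processing rules
    # A single ACK is enough for half-open connection discovery, so any
    # further ACKs within the interval are suppressed.
    return "ack" if limiter.allow(now) else "drop"
```

With such a limit in place, a burst of out-of-window segments arriving within one RTT elicits at most one ACK, which is below the three duplicate Acknowledgements needed to trigger loss recovery.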
3554 This validation check can also help to avoid ACK wars in some 3555 scenarios that may arise from the use of transparent proxies. In 3556 those scenarios, when the transparent proxy fails to wire (i.e., 3557 is disabled), the sequence numbers of the two end-points of the 3558 TCP connection become desynchronized, and both TCPs begin to send 3559 duplicate Acknowledgements to each other, with the intention of 3560 re-synchronizing them. As the sequence numbers never get re- 3561 synchronized, the ACK war can only be stopped by an external 3562 agent. 3564 TCP SHOULD limit the number of duplicate acknowledgements it will 3565 honour to: 3567 Max_DupACKs = (FlightSize / SMSS) - 1 3569 where FlightSize and SMSS are the values defined in RFC 2581 [Allman 3570 et al, 1999]. When more than Max_DupACKs duplicate acknowledgements 3571 are received, the excess DupACKs should be silently dropped. 3573 DISCUSSION: 3575 Note that duplicate acknowledgements should be elicited by out-of- 3576 order segments. 3578 In the case of TCP connections that have agreed to employ SACK, TCP 3579 SHOULD validate duplicate ACKs with the following criteria: Valid 3580 Duplicate ACKs MUST contain new SACK information. The SACK 3581 information MUST refer to data that has already been sent, but that 3582 has not yet been acknowledged by TCP's cumulative Acknowledgement. A 3583 TCP segment that does not pass this check SHOULD NOT be considered a 3584 "duplicate Acknowledgement". 3586 DISCUSSION: 3588 SACK, specified in RFC 2018 [Mathis et al, 1996], provides a mechanism 3589 for TCP to be able to acknowledge the receipt of out-of-order TCP 3590 segments. For connections that have agreed to use SACK, each 3591 legitimate DupACK will contain new SACK information that reflects 3592 the data bytes contained in the out-of-order data segment that 3593 elicited the DupACK. 3595 RFC 3517 [Blanton et al, 2003] specifies a SACK-based loss 3596 recovery algorithm for TCP.
However, it does not recommend that TCP 3597 implementations validate DupACKs by requiring that they contain 3598 new SACK information. Results obtained from auditing a number of 3599 TCP implementations seem to indicate that most TCP implementations 3600 do not enforce this validation check on incoming DupACKs, either. 3602 In the case of TCP connections that have agreed to use SACK, a 3603 validation check should be performed on incoming ACK segments to 3604 completely eliminate the attacks described in Section 9.2.1 and 3605 Section 9.2.2 of this document: "Duplicate ACKs should contain new 3606 SACK information. The SACK information should refer to data that 3607 has already been sent, but that has not yet been acknowledged by 3608 TCP's cumulative Acknowledgement". 3610 Those ACK segments that do not comply with this validation check 3611 should not be considered "duplicate ACKs", and thus should not 3612 trigger the loss-recovery phase. 3614 In case at least one segment in a window of data has been lost, 3615 the successive segments will elicit the generation of Duplicate 3616 ACKs containing new SACK information. This SACK information will 3617 indicate the receipt of these successive segments by the TCP 3618 receiver. 3620 In the case of pure ACKs illegitimately elicited by out-of-window 3621 segments, however, the ACKs will not contain any SACK information. 3623 If DSACK (specified in RFC 2883 [Floyd et al, 2000]) were implemented 3624 by the TCP receiver, then the illegitimately elicited DupACKs 3625 might contain out-of-window SACK information if the sequence 3626 number of the forged TCP segment (SEG.SEQ) is lower than the next 3627 expected sequence number (RCV.NXT) at the TCP receiver. Such 3628 segments should be considered to indicate the receipt of duplicate 3629 data, rather than an indication of lost data, and therefore should 3630 not trigger loss recovery.
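The two DupACK checks recommended in this section (the Max_DupACKs limit and the SACK-based validation) can be sketched as follows. This is an illustrative sketch, not a description of any particular implementation: SACK blocks are modelled as (left edge, right edge) pairs with an exclusive right edge, "new" SACK information is approximated as a block that is not fully covered by a previously reported block, and all function names are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def max_dupacks(flight_size: int, smss: int) -> int:
    """Max_DupACKs = (FlightSize / SMSS) - 1, never below zero."""
    return max(flight_size // smss - 1, 0)


def _offset(seq: int, base: int) -> int:
    """Distance from `base` to `seq` in modulo-2**32 sequence space."""
    return (seq - base) % MOD


def block_is_new_and_valid(block, known_blocks, snd_una: int, snd_nxt: int) -> bool:
    """True if the block covers sent-but-unacknowledged data not seen before."""
    left, right = block
    span = _offset(snd_nxt, snd_una)  # bytes sent but not cumulatively ACKed
    lo, hi = _offset(left, snd_una), _offset(right, snd_una)
    if not (lo < hi <= span):
        # Refers to data never sent, or to data below the cumulative ACK
        # (for example a DSACK block reporting duplicate data).
        return False
    for k_left, k_right in known_blocks:
        if _offset(k_left, snd_una) <= lo and hi <= _offset(k_right, snd_una):
            return False  # nothing new: already covered by an earlier block
    return True


def is_valid_dupack(sack_blocks, known_blocks, snd_una: int, snd_nxt: int) -> bool:
    """A DupACK counts only if it carries at least one new, valid SACK block."""
    return any(
        block_is_new_and_valid(b, known_blocks, snd_una, snd_nxt)
        for b in sack_blocks
    )
```

Under these assumptions, a pure ACK elicited by an out-of-window segment carries no SACK blocks and is therefore not counted, and a DSACK block below the cumulative Acknowledgement is rejected as well.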
3632 Other possible general mitigations are discussed in the following 3633 paragraphs: 3635 TCP port number randomization 3637 Since, in order to perform the blind attacks described in Section 9.2.1 3638 and Section 9.2.2, the attacker needs to know the TCP port numbers in 3639 use by the connection to be attacked, obfuscating the TCP source port 3640 used for outgoing TCP connections will increase the number of packets 3641 required to successfully perform these attacks. Section 3.1 of this 3642 document discusses the use of port randomization. 3644 It must be noted that given that these blind DupACK triggering 3645 attacks do not require the attacker to forge valid TCP Sequence 3646 numbers and TCP Acknowledgement numbers, port randomization should 3647 not be relied upon as a first line of defense. 3649 Ingress and Egress filtering 3651 Ingress and Egress filtering reduces the number of systems in the 3652 global Internet that can perform attacks that rely on forged source 3653 IP addresses. While protection from the blind attacks discussed in 3654 Section 9.2 should not rely only on Ingress and Egress filtering, its 3655 deployment is recommended to help prevent all attacks that rely on 3656 forged IP addresses. RFC 3704 [Baker and Savola, 2004], RFC 2827 3657 [Ferguson and Senie, 2000], and [NISCC, 2006] provide advice on 3658 Ingress and Egress filtering. 3660 Generalized TTL Security Mechanism (GTSM) 3662 RFC 5082 [Gill et al, 2007] proposes a check on the TTL field of the 3663 IP packets that correspond to a given TCP connection to reduce the 3664 number of systems that could successfully attack the protected TCP 3665 connection. It provides for the attacks discussed in this document 3666 the same level of protection as for the attacks described in 3667 [Watson, 2004] and RFC 4953 [Touch, 2007].
While implementation of 3668 this mechanism may be useful in some scenarios, it should be clear 3669 that countermeasures discussed in the previous sections provide a 3670 more effective and simpler solution than that provided by the GTSM. 3672 9.3. TCP Explicit Congestion Notification (ECN) 3674 ECN (Explicit Congestion Notification) provides a mechanism for 3675 intermediate systems to signal congestion to the communicating 3676 endpoints that in some scenarios can be used as an alternative to 3677 dropping packets. 3679 RFC 3168 [Ramakrishnan et al, 2001] contains a detailed discussion of 3680 the possible ways and scenarios in which ECN could be exploited by an 3681 attacker. 3683 RFC 3540 [Spring et al, 2003] specifies an improvement to ECN based 3684 on nonces, that protects against accidental or malicious concealment 3685 of marked packets from the TCP sender. The specified mechanism 3686 defines a "NS" ("Nonce Sum") field in the TCP header that makes use 3687 of one bit from the Reserved field, and requires a modification in 3688 both of the endpoints of a TCP connection to process this new field. 3689 This mechanism is still in "Experimental" status, and since it might 3690 suffer from the behavior of some middle-boxes such as firewalls or 3691 packet-scrubbers, we defer a recommendation of this mechanism until 3692 more experience is gained. 3694 There also is ongoing work in the research community and the IETF to 3695 define alternate semantics for the ECN field of the IP header (e.g., 3696 see [PCNWG, 2009]). 3698 The following subsections try to summarize the security implications 3699 of ECN. 3701 9.3.1. Possible attacks by a compromised router 3703 Firstly, a router controlled by a malicious user could erase the CE 3704 codepoint (either by replacing it with the ECT(0), ECT(1), or non-ECT 3705 codepoints), effectively eliminating the congestion indication. 
As a 3706 result, the corresponding TCP sender would not reduce its data 3707 transmission rate, possibly leading to network congestion. This 3708 could also lead to unfairness, as this flow could experience better 3709 performance than other flows for which the congestion indication is 3710 not erased (and thus their transmission rate is reduced). 3712 Secondly, a router controlled by a malicious user could 3713 illegitimately set the CE codepoint, falsely indicating congestion, 3714 to cause the TCP sender to reduce its data transmission rate. 3715 However, this particular attack is no worse than the malicious router 3716 simply dropping the packets rather than setting their CE codepoint. 3718 Thirdly, a malicious router could turn off the ECT codepoint of a 3719 packet, thus disabling ECN support. As a result, if the packet later 3720 arrives at a router that is experiencing congestion, it may be 3721 dropped rather than marked. As with the previous scenario, though, 3722 this is no worse than the malicious router simply dropping the 3723 corresponding packet. 3725 It should be noted that a compromised on-path IP router could engage 3726 in a much broader range of attacks, with broader impacts, and at much 3727 lower attacker cost than the ones described here. Such a compromised 3728 router is extremely unlikely to engage in the attack vectors 3729 discussed in this section, given the existence of more effective 3730 attack vectors that have lower attacker cost. 3732 9.3.2. Possible attacks by a malicious TCP endpoint 3734 If a packet with the ECT codepoint set arrives at an ECN-capable 3735 router that is experiencing moderate congestion, the router may 3736 decide to set its CE codepoint instead of dropping it.
If either of 3737 the TCP endpoints does not honour the congestion indication provided by 3738 an ECN-capable router, this would result in unfairness, as other 3739 (legitimate) ECN-capable flows would still reduce their sending rate 3740 in response to the ECN marking of packets. Furthermore, under 3741 moderate congestion, non-ECN-capable flows would be subject to packet 3742 drops by the same router. As a result, the flow with a malicious TCP 3743 end-point would obtain better service than the legitimate flows. 3745 As noted in RFC 3168 [Ramakrishnan et al, 2001], a TCP endpoint 3746 falsely indicating ECN capability could lead to unfairness, allowing 3747 the misbehaving flow to get more than its fair share of the 3748 bandwidth. This could be the result of the misbehavior of either of 3749 the TCP endpoints. For example, the sending TCP could indicate ECN 3750 capability, but then send a CWR in response to an ECE without 3751 actually reducing its congestion window. Alternatively (or in 3752 addition), the receiving TCP could simply ignore those packets with 3753 the CE codepoint set, thus preventing the sending TCP from receiving 3754 the congestion indication. 3756 In the case of the sending TCP ignoring the ECN congestion 3757 indication, this would be no worse than the sending TCP ignoring the 3758 congestion indication provided by a lost segment. However, the case 3759 of a TCP receiver ignoring the CE codepoint allows the TCP receiver 3760 to get more than its fair share of bandwidth in a way that was 3761 previously unavailable. If congestion was kept "moderate", then the 3762 malicious TCP receiver could maintain the unfairness, as the router 3763 experiencing congestion would mark the offending packets of the 3764 misbehaving flow rather than dropping them.
At the same time, 3765 legitimate ECN-capable flows would respond to the congestion 3766 indication provided by the CE codepoint, while legitimate non-ECN- 3767 capable flows would be subject to packet dropping. However, if 3768 congestion became sufficiently heavy, the router experiencing 3769 congestion would switch from marking packets to dropping packets, and 3770 at that point the attack vector provided by ECN could no longer be 3771 exploited (until congestion returns to a moderate state). 3773 RFC 3168 [Ramakrishnan et al, 2001] describes the use of "penalty 3774 boxes" which would act on flows that do not respond appropriately to 3775 congestion indications. Section 10 of RFC 3168 suggests that a first 3776 action taken at a penalty box for an ECN-capable flow would be to 3777 switch to dropping packets (instead of marking them), and, if the 3778 flow does not respond appropriately to the congestion indication, the 3779 penalty box could reset the misbehaving connection. Here we 3780 discourage implementation of such a policy, as it would create a 3781 vector for connection-reset attacks. For example, an attacker could 3782 forge TCP segments with the same four-tuple as the targeted 3783 connection and cause them to transit the penalty box. The penalty 3784 box would first switch from marking to dropping packets. However, 3785 the attacker would continue sending forged segments, at a steady 3786 rate. As a result, if the penalty box implemented such a severe 3787 policy of resetting connections for flows that still do not respond 3788 to end-to-end congestion control after switching from marking to 3789 dropping, the attacked connection would be reset. 3791 10. TCP API 3793 Section 3.8 of RFC 793 [Postel, 1981c] describes the minimum set of 3794 TCP User Commands required of all TCP Implementations. Most 3795 operating systems provide an Application Programming Interface (API) 3796 that allows applications to make use of the services provided by TCP.
3797 One of the most popular APIs is the Sockets API, originally 3798 introduced in the BSD networking package [McKusick et al, 1996]. 3800 10.1. Passive opens and binding sockets 3802 When there is already a pending passive OPEN for some local port 3803 number, TCP SHOULD NOT allow processes that do not belong to the same 3804 user to "reuse" the local port for another passive OPEN. 3805 Additionally, reuse of a local port SHOULD default to "off", and be 3806 enabled only by an explicit command (e.g., the setsockopt() function 3807 of the Sockets API). 3809 DISCUSSION: 3811 RFC 793 specifies the syntax of the "OPEN" command, which can be 3812 used to perform both passive and active opens. The syntax of this 3813 command is as follows: 3815 OPEN (local port, foreign socket, active/passive [, timeout] [, 3816 precedence] [, security/compartment] [, options]) -> local 3817 connection name 3819 When this command is used to perform a passive open (i.e., the 3820 active/passive flag is set to passive), the foreign socket 3821 parameter may be either fully-specified (to wait for a particular 3822 connection) or unspecified (to wait for any call). 3824 As discussed in Section 2.7 of RFC 793 [Postel, 1981c], if there 3825 are several passive OPENs with the same local socket (recorded in 3826 the corresponding TCB), an incoming connection will be matched to 3827 the TCB with the more specific foreign socket. This means that 3828 when the foreign socket of a passive OPEN matches that of the 3829 incoming connection request, that passive OPEN takes precedence 3830 over those passive OPENs with an unspecified foreign socket. 3832 Popular implementations such as the Sockets API let the user 3833 specify the local socket as fully-specified {local IP address, 3834 local TCP port} pair, or as just the local TCP port (leaving the 3835 local IP address unspecified). In the former case, only those 3836 connection requests sent to {local port, local IP address} will be 3837 accepted. 
In the latter case, connection requests sent to any of 3838 the system's IP addresses will be accepted. In a similar fashion 3839 to the generic API described in Section 2.7 of RFC 793, if there 3840 is a pending passive OPEN with a fully-specified local socket that 3841 matches that for which a connection establishment request has been 3842 received, that local socket will take precedence over those which 3843 have left the local IP address unspecified. The implication of 3844 this is that an attacker could "steal" incoming connection 3845 requests meant for a local application by performing a passive 3846 OPEN that is more specific than that performed by the legitimate 3847 application. 3849 10.2. Active opens and binding sockets 3851 TCP SHOULD NOT allow port numbers that have been allocated for a TCP 3852 that is in the LISTEN or CLOSED states to be specified as the "local 3853 port" argument of the "OPEN" command. 3855 An implementation MAY relax the aforementioned restriction when the 3856 process or system user requesting allocation of such a port number is 3857 the same as the process or system user controlling the TCP in the 3858 CLOSED or LISTEN states with the same port number. 3860 DISCUSSION: 3862 As discussed in Section 10.1, the "OPEN" command specified in 3863 Section 3.8 of RFC 793 [Postel, 1981c] can be used to perform 3864 active opens. In the case of active opens, the parameter "local port" 3865 will contain a so-called "ephemeral port". While the only 3866 requirement for such an ephemeral port is that the resulting 3867 connection-id is unique, port numbers that are currently in use by 3868 a TCP in the LISTEN state should not be allowed for use as 3869 ephemeral ports. If this rule is not complied with, an attacker could 3870 potentially "steal" an incoming connection to a local server 3871 application by issuing a connection request to the victim client 3872 at roughly the same time the client tries to connect to the victim 3873 server application.
If the SYN segment corresponding to the 3874 attacker's connection request and the SYN segment corresponding to 3875 the victim client "cross each other in the network", and provided 3876 the attacker is able to know or guess the ephemeral port used by 3877 the client, a TCP simultaneous open scenario would take place, and 3878 the incoming connection request sent by the client would be 3879 matched with the attacker's socket rather than with the victim 3880 server application's socket. 3882 As already noted, in order for this attack to succeed, the 3883 attacker should be able to guess or know (in advance) the 3884 ephemeral port selected by the victim client, and be able to know 3885 the right moment to issue a connection request to the victim 3886 client. While in many scenarios this may prove to be a difficult 3887 task, some factors such as an inadequate ephemeral port selection 3888 policy at the victim client could make this attack feasible. 3890 It should be noted that most applications based on popular 3891 implementations of TCP API (such as the Sockets API) perform 3892 "passive opens" in three steps. Firstly, the application obtains 3893 a file descriptor to be used for inter-process communication 3894 (e.g., by issuing a socket() call). Secondly, the application 3895 binds the file descriptor to a local TCP port number (e.g., by 3896 issuing a bind() call), thus creating a TCP in the fictional 3897 CLOSED state. Thirdly, the aforementioned TCP is put in the 3898 LISTEN state (e.g., by issuing a listen() call). As a result, 3899 with such an implementation of the TCP API, even if port numbers 3900 in use for TCPs in the LISTEN state were not allowed for use as 3901 ephemeral ports, there is a window of time between the second and 3902 the third steps in which an attacker could be allowed to select a 3903 port number that would be later used for listening to incoming 3904 connections. 
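The three steps above can be sketched with Python's standard socket module, which wraps the Sockets API. This is only an illustration: the function name passive_open, the loopback address, and the use of port 0 (which asks the operating system to pick a free port) are assumptions made for the example and are not part of the original text.

```python
import socket


def passive_open(host="127.0.0.1", port=0, backlog=5):
    """Perform a passive open in the three steps described above.

    Hypothetical helper: returns the listening socket and the bound port.
    """
    # Step 1: obtain a descriptor; no TCP state exists for it yet.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Step 2: bind the descriptor to a local port. The TCP is now in the
    # fictional CLOSED state. Passing port 0 only asks the OS to choose
    # a free port for this demonstration.
    sock.bind((host, port))
    bound_port = sock.getsockname()[1]

    # Between bind() and listen() the port is allocated, but no TCP is
    # in the LISTEN state for it yet. This is the window of time
    # discussed in the text.

    # Step 3: put the TCP in the LISTEN state.
    sock.listen(backlog)
    return sock, bound_port


if __name__ == "__main__":
    listener, chosen_port = passive_open()
    print("listening on port", chosen_port)
    listener.close()
```

An ephemeral port selection routine that only excludes ports in the LISTEN state would not exclude the port between steps 2 and 3 of this sketch.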
Therefore, these implementations of the TCP API 3905 should enforce a stricter requirement for the allocation of port 3906 numbers: port numbers that are in use by a TCP in the LISTEN or 3907 CLOSED states should not be allowed for allocation as ephemeral 3908 ports. 3910 An implementation might choose to relax the aforementioned 3911 restriction when the process or system user requesting allocation 3912 of such a port number is the same as the process or system user 3913 controlling the TCP in the CLOSED or LISTEN states with the same 3914 port number. 3916 11. Blind in-window attacks 3918 In the last few years awareness has been raised about a number of 3919 "blind" attacks that can be performed against TCP by forging TCP 3920 segments that fall within the receive window [NISCC, 2004] [Watson, 3921 2004]. 3923 The term "blind" refers to the fact that the attacker does not have 3924 access to the packets that belong to the attacked connection. 3926 The effects of these attacks range from connection resets to data 3927 injection. While these attacks were known in the research community, 3928 they were generally considered infeasible. However, increases in 3929 bandwidth availability and the use of larger TCP windows raised 3930 concerns in the community. The following subsections discuss a 3931 number of forgery attacks against TCP, along with the possible 3932 countermeasures to mitigate their impact. 3934 11.1. Blind TCP-based connection-reset attacks 3936 Blind connection-reset attacks have the goal of causing a TCP 3937 connection maintained between two TCP endpoints to be aborted. The 3938 level of damage that the attack may cause usually depends on the 3939 application running on top of TCP, with the more vulnerable 3940 applications being those that rely on long-lived TCP connections.
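To illustrate why larger windows and higher bandwidth raised concerns about blind in-window attacks, the following rough estimate may help. It is a sketch under stated assumptions, not a figure from the original text: the four-tuple is assumed to be already known, the target is assumed to accept any in-window sequence number (the RFC 793 rule), forged segments are assumed to be 40 bytes (minimal IPv4 and TCP headers), and the function names are hypothetical.

```python
SEQ_SPACE = 2 ** 32  # size of the TCP sequence number space


def segments_to_hit_window(rcv_wnd):
    """Forged segments needed so that at least one lands in the window.

    Assumes the attacker spaces sequence numbers rcv_wnd apart.
    """
    return -(-SEQ_SPACE // rcv_wnd)  # integer ceiling division


def seconds_to_sweep(rcv_wnd, segment_bytes=40, link_bps=1_000_000):
    """Time to send that many segments at the given link rate."""
    bits = segments_to_hit_window(rcv_wnd) * segment_bytes * 8
    return bits / link_bps


if __name__ == "__main__":
    for wnd in (4096, 16384, 65535):
        print(wnd, segments_to_hit_window(wnd), seconds_to_sweep(wnd))
```

Under these assumptions, quadrupling the window from 4096 to 16384 bytes divides the number of segments required by four, which is why window size is a tradeoff for long-lived connections.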
3942 An interesting case of such applications is BGP [Rekhter et al, 3943 2006], in which a connection reset usually results in the 3944 corresponding entries of the routing table being flushed. 3946 There are a variety of vectors for performing connection- 3947 reset attacks against TCP. [Watson, 2004] and [NISCC, 2004] raised 3948 awareness about connection-reset attacks that exploit the RST flag of 3949 TCP segments. [Ramaiah et al, 2008] noted that carefully crafted SYN 3950 segments could also be used to perform connection-reset attacks. 3951 This document describes two additional, previously undocumented vectors for 3952 performing connection-reset attacks: the Precedence field of IP 3953 packets that encapsulate TCP segments, and illegal TCP options. 3955 11.1.1. RST flag 3957 TCP SHOULD implement the mitigation for RST-based attacks specified 3958 in [Ramaiah et al, 2008]. 3960 DISCUSSION: 3962 The RST flag signals a TCP peer that the connection should be 3963 aborted. In contrast with the FIN handshake (which gracefully 3964 terminates a TCP connection), an RST segment causes the connection 3965 to be abnormally closed. 3967 As stated in Section 3.4 of RFC 793 [Postel, 1981c], all reset 3968 segments are validated by checking their Sequence Numbers, with 3969 the Sequence Number considered valid if it is within the receive 3970 window. In the SYN-SENT state, however, an RST is valid if the 3971 Acknowledgement Number acknowledges the SYN segment that 3972 supposedly elicited the reset. 3974 [Ramaiah et al, 2008] proposes a modification to TCP's transition 3975 diagram to address this attack vector. The counter-measure is a 3976 combination of enforcing a stricter validation check on the 3977 sequence number of reset segments, and the addition of a 3978 "challenge" mechanism.
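As a minimal sketch of how the stricter check and the challenge mechanism might fit together for connections in the synchronized states, the function below classifies an incoming RST into one of three outcomes: silently drop, reset the connection, or reply with a challenge ACK. The function name and the string results are hypothetical, comparisons are done modulo 2^32 to cope with sequence number wrap-around, and the zero-window special case is deliberately not modelled.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap around modulo 2^32


def classify_rst(seg_seq, rcv_nxt, rcv_wnd):
    """Return 'drop', 'reset' or 'challenge_ack' for an incoming RST.

    Hypothetical helper. seg_seq is the Sequence Number of the RST;
    rcv_nxt and rcv_wnd are RCV.NXT and RCV.WND from the TCB.
    """
    # Distance of the segment from the left edge of the receive window.
    offset = (seg_seq - rcv_nxt) % MOD

    if offset >= rcv_wnd:
        return "drop"           # outside the receive window
    if offset == 0:
        return "reset"          # exactly on the left edge of the window
    return "challenge_ack"      # in window, but not equal to RCV.NXT


if __name__ == "__main__":
    print(classify_rst(1000, 1000, 500))
    print(classify_rst(1200, 1000, 500))
    print(classify_rst(1500, 1000, 500))
```

With this classification, a blind attacker has to guess RCV.NXT exactly rather than any value inside the window.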
With the implementation of the proposed 3979 mechanism, TCP would behave as follows: 3981 If the Sequence Number of an RST segment is outside the receive 3982 window, the segment is silently dropped (as stated by RFC 793). 3983 That is, a reset segment is discarded unless it passes the 3984 following check: 3986 RCV.NXT <= Sequence Number < RCV.NXT+RCV.WND 3988 If the sequence number falls exactly on the left edge of the 3989 receive window, the reset is honoured. That is, the connection is 3990 reset if the following condition is true: 3992 Sequence Number == RCV.NXT 3994 If an RST segment passes the first check (i.e., it is within the 3995 receive window) but does not pass the second check (i.e., it does 3996 not fall exactly on the left edge of the receive window), an 3997 Acknowledgement segment ("challenge ACK") is sent in response: 3999 4001 This Acknowledgement segment is referred to as a "challenge ACK" 4002 as, in the event the RST segment that elicited it had been 4003 legitimate (but silently dropped as a result of enforcing the 4004 above checks), the challenge ACK would elicit a new reset segment 4005 that would fall exactly on the left edge of the window and would 4006 thus pass all the above checks, finally resetting the connection. 4008 We recommend the implementation of this countermeasure. However, 4009 we are aware of patent claims on this counter-measure, and suggest 4010 that vendors research the consequences of the possible patents that 4011 may apply. 4013 [US-CERT, 2003a] is an advisory about a firewall system that was 4014 found particularly vulnerable to reset attacks because it did not 4015 validate the TCP Sequence Number of RST segments. Clearly, all 4016 TCPs (including those in middle-boxes) should validate RST 4017 segments as discussed in this section. 4019 11.1.2.
SYN flag 4021 Processing of SYN segments received for connections in the 4022 synchronized states SHOULD occur as follows: 4024 o If a SYN segment is received for a connection in any synchronized 4025 state other than TIME-WAIT, respond with an ACK, applying rate- 4026 throttling. [Ramaiah et al, 2008] 4028 o If the corresponding connection is in the TIME-WAIT state, then 4029 process the incoming SYN as specified in 4030 [I-D.ietf-tcpm-tcp-timestamps]. 4032 DISCUSSION: 4034 Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if a 4035 SYN segment is received with a valid (i.e., "in window") Sequence 4036 Number, an RST segment should be sent in response, and the 4037 connection should be aborted. 4039 The IETF has published an RFC, "Improving TCP's Resistance to 4040 Blind In-Window Attacks" [Ramaiah et al, 2008] which addresses, 4041 among others, this variant of TCP-based connection-reset attack. 4042 This section describes the counter-measure proposed by the IETF, a 4043 problem that may arise from the implementation of that solution, 4044 and a workaround for it. 4046 In order to mitigate this attack vector, [Ramaiah et al, 2008] 4047 proposes to change TCP's reaction to SYN segments as follows. 4048 When a SYN segment is received for a connection in any of the 4049 synchronized states, an Acknowledgement (ACK) segment is sent in 4050 response. 4052 As discussed in [Ramaiah et al, 2008], there is a corner case that 4053 would not be properly handled by this mechanism.
If a host (TCP 4054 A) establishes a TCP connection with a remote peer (TCP B), and 4055 then crashes, reboots and tries to initiate a new incarnation of 4056 the same connection (i.e., a connection with the same four-tuple 4057 as the previous connection) using an Initial Sequence Number equal 4058 to the RCV.NXT value at the remote peer (TCP B), the ACK segment 4059 sent by TCP B in response to the SYN segment would contain an 4060 Acknowledgement number that would be considered valid by TCP A, 4061 and thus an RST segment would not be sent in response to the 4062 Acknowledgement (ACK) segment. As this ACK would not have the SYN 4063 bit set, TCP A (being in the SYN-SENT state) would silently drop 4064 it (as stated on page 68 of RFC 793). After a Retransmission 4065 Timeout (RTO), TCP A would retransmit its SYN segment, which would 4066 lead to the same sequence of events as before. Eventually, TCP A 4067 would timeout, and the connection would be aborted. This is a 4068 corner case in which the introduced change would lead to a non- 4069 desirable behavior. However, we consider this scenario to be 4070 extremely unlikely and, in the event it ever took place, the 4071 connection would nevertheless be aborted after retrying for a 4072 period of USER TIMEOUT seconds. 4074 However, when this change is implemented exactly as described in 4075 [Ramaiah et al, 2008], the potential of interoperability problems 4076 is introduced, as a heuristic widely incorporated in many TCP 4077 implementations is disabled. 4079 In a number of scenarios a socket pair may need to be reused while 4080 the corresponding four-tuple is still in the TIME-WAIT state in a 4081 remote TCP peer. For example, a client accessing some service on 4082 a host may try to create a new incarnation of a previous 4083 connection, while the corresponding four-tuple is still in the 4084 TIME-WAIT state at the remote TCP peer (the server). 
This may 4085 happen if the ephemeral port numbers are being reused too quickly, 4086 either because of a poor ephemeral port selection policy, or 4087 simply because of a high connection rate to the corresponding 4088 service. In such scenarios, the establishment of new connections 4089 that reuse a four-tuple that is in the TIME-WAIT state would fail. 4090 In order to avoid this problem, RFC 1122 [Braden, 1989] states (in 4091 Section 4.2.2.13) that when a connection request is received with 4092 a four-tuple that is in the TIME-WAIT state, the connection 4093 request could be accepted if the sequence number of the incoming 4094 SYN segment is greater than the last sequence number seen on the 4095 previous incarnation of the connection (for that direction of the 4096 data transfer). 4098 This requirement aims at preventing the sequence number spaces of the 4099 new and old incarnations of the connection from overlapping, thus 4100 preventing old segments from the previous incarnation of the 4101 connection from being accepted as valid by the new connection. 4103 The requirement in [Ramaiah et al, 2008] to disregard SYN segments 4104 received for connections in any of the synchronized states forbids 4105 the implementation of the heuristic described above. As a result, 4106 we argue that the processing of SYN segments proposed in [Ramaiah 4107 et al, 2008] should apply only to connections in any of the 4108 synchronized states other than the TIME-WAIT state. 4110 11.1.3. Security/Compartment 4112 If the security/compartment field of an incoming TCP segment does not 4113 match the value recorded in the corresponding TCB, TCP SHOULD NOT 4114 abort the connection, but simply discard the corresponding packet. 4115 Additionally, this whole event SHOULD be logged as a security 4116 violation.
4118 DISCUSSION: 4120 Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if 4121 the IP security/compartment of an incoming segment does not 4122 exactly match the security/compartment in the TCB, an RST segment 4123 should be sent, and the connection should be aborted. 4125 A discussion of the IP security options relevant to this section 4126 can be found in Section 3.13.2.12, Section 3.13.2.13, and Section 4127 3.13.2.14 of [CPNI, 2008]. 4129 The behavior specified in RFC 793 certainly provides another attack vector for performing 4130 connection-reset attacks, as an attacker could forge TCP segments 4131 with a security/compartment that is different from that recorded 4132 in the corresponding TCB and, as a result, the attacked connection 4133 would be reset. 4135 It is interesting to note that for connections in the ESTABLISHED 4136 state, this check is performed after validating the TCP Sequence 4137 Number and checking the RST bit, but before validating the 4138 Acknowledgement field. Therefore, even if the stricter validation 4139 of the Acknowledgement field (described in Section 3.4) were 4140 implemented, it would not help to mitigate this attack vector. 4142 This attack vector can be easily mitigated by relaxing the 4143 reaction to TCP segments with "incorrect" security/compartment 4144 values as specified in this section. 4146 11.1.4. Precedence 4148 If the Precedence field of an incoming TCP segment does not match 4149 the value recorded in the corresponding TCB, TCP MUST NOT abort the 4150 connection, and MUST instead continue processing the segment as 4151 specified by RFC 793. 4153 DISCUSSION: 4155 Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if 4156 the IP Precedence of an incoming segment does not exactly match 4157 the Precedence recorded in the TCB, an RST segment should be sent, 4158 and the connection should be aborted.
4160 This behavior certainly provides another attack vector for performing 4161 connection-reset attacks, as an attacker could forge TCP segments 4162 with an IP Precedence that is different from that recorded in the 4163 corresponding TCB and, as a result, the attacked connection would 4164 be reset. 4166 It is interesting to note that for connections in the ESTABLISHED 4167 state, this check is performed after validating the TCP Sequence 4168 Number and checking the RST bit, but before validating the 4169 Acknowledgement field. Therefore, even if the stricter validation 4170 of the Acknowledgement field (described in Section 3.4) were 4171 implemented, it would not help to mitigate this attack vector. 4173 This attack vector can be easily mitigated by relaxing the 4174 reaction to TCP segments with "incorrect" IP Precedence values. 4175 That is, even if the Precedence field does not match the value 4176 recorded in the corresponding TCB, TCP should not abort the 4177 connection, and should instead continue processing the segment as 4178 specified by RFC 793. 4180 It is interesting to note that resetting a connection due to a 4181 change in the Precedence value might have a negative impact on 4182 interoperability. For example, the packets that correspond to the 4183 connection could temporarily take a different internet path, in 4184 which some middle-box could re-mark the Precedence field (due to 4185 administration policies at the network to be transited). In such 4186 a scenario, an implementation following the advice in RFC 793 4187 would abort the connection, when the connection would 4188 probably have survived otherwise. 4190 While the IPv4 Type of Service field (and hence the Precedence 4191 field) has been redefined as the Differentiated Services (DS) 4192 field specified in RFC 2474 [Nichols et al, 1998], RFC 793 4193 [Postel, 1981c] was never formally updated in this respect.
We 4194 note that both legacy systems that have not been upgraded to 4195 implement the differentiated services architecture described in 4196 RFC 2475 [Blake et al, 1998] and current implementations that have 4197 extrapolated the discussion of the Precedence field to the 4198 Differentiated Services field may still be vulnerable to the 4199 connection reset vector discussed in this section. 4201 11.1.5. Illegal options 4203 TCP MUST silently drop those TCP segments that contain TCP options 4204 with illegal option lengths. 4206 DISCUSSION: 4208 Section 4.2.2.5 of RFC 1122 [Braden, 1989] discusses the 4209 processing of TCP options. It states that TCP must be able to 4210 receive a TCP option in any segment, and must ignore without error 4211 any option it does not implement. Additionally, it states that 4212 TCP should be prepared to handle an illegal option length (e.g., 4213 zero) without crashing, and suggests handling such illegal options 4214 by resetting the corresponding connection and logging the reason. 4215 However, this suggested behavior could be exploited to perform 4216 connection-reset attacks. Therefore, as discussed in Section 3.10 4217 of this document, we advise TCP implementations to silently drop 4218 those TCP segments that contain illegal option lengths. 4220 11.2. Blind data-injection attacks 4222 An attacker could try to inject data in the stream of data being 4223 transferred on the connection. As with the other attacks described 4224 in Section 11 of this document, in order to perform a blind data 4225 injection attack the attacker would need to know or guess the four- 4226 tuple that identifies the TCP connection to be attacked. 4227 Additionally, he should be able to guess a valid ("in window") TCP 4228 Sequence Number, and a valid Acknowledgement Number. 
4230 As discussed in Section 3.4 of this document, [Ramaiah et al, 2008] 4231 proposes to enforce a stricter check on the Acknowledgement Number 4232 of incoming segments than that specified in RFC 793 [Postel, 1981c]. 4234 Implementation of the proposed check forces the attacker to send 4235 more packets to successfully perform a blind data-injection 4236 attack. However, it should be noted that applications concerned with 4237 any of the attacks discussed in Section 11 of this document should 4238 make use of proper authentication techniques, such as those specified 4239 for IPsec in RFC 4301 [Kent and Seo, 2005]. 4241 12. Information leaking 4243 12.1. Remote Operating System detection via TCP/IP stack fingerprinting 4245 Clearly, remote Operating System (OS) detection is a useful tool for 4246 attackers. Tools such as nmap [Fyodor, 2006b] can usually detect the 4247 operating system type and version of a remote system with 4248 remarkable accuracy. This information can in turn be used 4249 by attackers to tailor their exploits to the identified operating 4250 system type and version. 4252 Evasion of OS fingerprinting can prove to be a very difficult task. 4253 Most systems make use of a variety of protocols, each of which has a 4254 large number of parameters that can be set to arbitrary values. 4255 Thus, information on the operating system may be obtained from a 4256 number of sources ranging from application banners to more obscure 4257 parameters such as TCP's retransmission timer. 4259 Nmap [Fyodor, 2006b] is probably the most popular tool for remote OS 4260 detection via active TCP/IP stack fingerprinting. p0f [Zalewski, 4261 2006a], on the other hand, is a tool for performing remote OS 4262 detection via passive TCP/IP stack fingerprinting. SinFP [SinFP, 4263 2006] can perform both active and passive fingerprinting.
Finally, 4264 TBIT [TBIT, 2001] is a TCP fingerprinting tool that aims at 4265 characterizing the behavior of a remote TCP peer based on active 4266 probes, and which has been widely used in the research community. 4268 TBIT [TBIT, 2001] implements a number of tests not present in other 4269 tools, such as characterizing the behavior of a TCP peer with respect 4270 to TCP congestion control. 4272 [Fyodor, 1998] and [Fyodor, 2006a] are classic papers on the subject. 4273 [Miller, 2006] and [Smith and Grundl, 2002] provide an introduction 4274 to passive TCP/IP stack fingerprinting. [Smart et al, 2000] and 4275 [Beck, 2001] discuss some techniques for evading OS detection through 4276 TCP/IP stack fingerprinting. 4278 The following subsections discuss TCP-based techniques for remote OS 4279 detection and, where possible, propose ways to mitigate them. 4281 12.1.1. FIN probe 4283 TCP MUST silently drop any TCP segments received for a connection in 4284 the LISTEN state that do not have the SYN, RST, or ACK flags set. In 4285 all other cases, the processing rules in RFC 793 MUST be 4286 applied. 4288 DISCUSSION: 4290 The attacker sends a FIN (or any packet without the SYN or the ACK 4291 flags set) to an open port. RFC 793 [Postel, 1981c] leaves the 4292 reaction to such segments unspecified. As a result, some 4293 implementations silently drop the received segment, while others 4294 respond with an RST. 4296 12.1.2. Bogus flag test 4298 TCP MUST ignore any flags not supported, and MUST NOT reflect them if 4299 a TCP segment is sent in response to the one just received. 4301 DISCUSSION: 4303 The attacker sends a TCP segment setting at least one bit of the 4304 Reserved field. Some implementations ignore this field, while 4305 others reset the corresponding connection or reflect the field in 4306 the TCP segment sent in response. 4308 12.1.3.
TCP ISN sampling 4310 The attacker samples a number of Initial Sequence Numbers by sending 4311 a number of connection requests. Many TCP implementations differ in 4312 the ISN generator they implement, thus allowing the correlation of 4313 the ISN generation algorithm with the operating system type and version. 4315 This document advises implementing an ISN generator that follows the 4316 behavior described in RFC 1948 [Bellovin, 1996]. However, it should 4317 be noted that even if all TCP implementations generated their ISNs as 4318 proposed in RFC 1948, there would still be a number of implementation 4319 details left unspecified, which would allow remote OS 4320 fingerprinting by means of ISN sampling. For example, the time- 4321 dependent parameter of the hash could have a different frequency in 4322 different TCP implementations. 4324 12.1.4. TCP initial window 4326 Many TCP implementations differ in the initial TCP window they use. 4327 There are a number of factors that should be considered when 4328 selecting the TCP window to be used for a given system. A number of 4329 implementations that use static windows (i.e., no automatic buffer 4330 tuning mechanisms are implemented) default to a window of around 32 4331 KB, which seems sensible for the general case. On the other hand, a 4332 window of 4 KB seems to be common practice for connections servicing 4333 critical applications such as BGP. It is clear that the window size 4334 is a tradeoff among a number of considerations. Section 3.7 4335 discusses some of the considerations that should be made when 4336 selecting the window size for a TCP connection. 4338 If automatic tuning mechanisms are implemented, we suggest that the 4339 initial window be at least 4 * RMSS segments. We note that a 4340 remote OS fingerprinting tool could still sample the advertised TCP 4341 window, trying to correlate the advertised window with the potential 4342 automatic buffer tuning algorithm and Operating System. 4344 12.1.5.
RST sampling 4346 If an RST must be sent in response to an incoming segment, then if 4347 the ACK bit of an incoming TCP segment is off, a Sequence Number of 4348 zero MUST be used in the RST segment sent in response. That is, 4350 4352 It should be noted that the SEG.LEN value used for the 4353 Acknowledgement Number MUST be incremented once for each flag set in 4354 the original segment that makes use of a byte of the sequence number 4355 space. That is, if only one of the SYN or FIN flags was set in the 4356 received segment, the Acknowledgement Number of the response should 4357 be set to SEG.SEQ+SEG.LEN+1. If both the SYN and FIN flags were set 4358 in the received segment, the Acknowledgement Number should be set to 4359 SEG.SEQ+SEG.LEN+2. 4361 We also RECOMMEND that TCP set the ACK bit (and the Acknowledgement 4362 Number) in all outgoing RST segments, as it allows for additional 4363 validation checks to be enforced at the system receiving the segment. 4365 DISCUSSION: 4367 [Fyodor, 1998] reports that many implementations differ in the 4368 Acknowledgement Number they use in response to segments received 4369 for connections in the CLOSED state. In particular, these 4370 implementations differ in the way they construct the RST segment 4371 that is sent in response to those TCP segments received for 4372 connections in the CLOSED state. 4374 RFC 793 [Postel, 1981c] describes (on pages 36-37) how RST 4375 segments are to be generated. According to this RFC, the ACK bit 4376 (and the Acknowledgement Number) is set in an RST only if the 4377 incoming segment that elicited the RST did not have the ACK bit 4378 set (and thus the Sequence Number of the outgoing RST segment must 4379 be set to zero). However, we recommend that TCP implementations set 4380 the ACK bit (and the Acknowledgement Number) in all outgoing RST 4381 segments, as it allows for additional validation checks to be 4382 enforced at the system receiving the segment. 4384 12.1.6.
TCP options 4386 Different implementations differ in the TCP options they enable by 4387 default. Additionally, they differ in the actual contents of the 4388 options, and in the order in which the options are included in a TCP 4389 segment. There is currently no recommendation on the order in which 4390 to include TCP options in TCP segments. 4392 12.1.7. Retransmission Timeout (RTO) sampling 4394 TCP uses a retransmission timer for retransmitting data in the 4395 absence of any feedback from the remote data receiver. The duration 4396 of this timer is referred to as "retransmission timeout" (RTO). RFC 4397 2988 [Paxson and Allman, 2000] specifies the algorithm for computing 4398 the TCP retransmission timeout (RTO). 4400 The algorithm allows the use of clocks of different granularities, to 4401 accommodate the different granularities used by the existing 4402 implementations. Thus, the difference in the resulting RTO can be 4403 used for remote OS fingerprinting. [Veysset et al, 2002] describes 4404 how to perform remote OS fingerprinting by sampling and analyzing the 4405 RTO of the target system. However, this fingerprinting technique has 4406 at least the following drawbacks: 4408 o It is usually much slower than other fingerprinting techniques, as 4409 it may require considerable time to sample the RTO of a given 4410 target. 4412 o It is less reliable than other fingerprinting techniques, as 4413 latency and packet loss can lead to bogus results. 4415 While in principle it would be possible to defeat this fingerprinting 4416 technique (e.g., by obfuscating the granularity of the clock used for 4417 computing the RTO), we consider that a more important step to defeat 4418 remote OS detection is for implementations to address the more 4419 effective fingerprinting techniques described in Sections 12.1.1 4420 through 12.1.7 of this document. 4422 12.2. 
System uptime detection 4424 The "uptime" of a system may prove to be valuable information to an 4425 attacker. For example, it might reveal the last time a security 4426 patch was applied. Information about system uptime is usually leaked 4427 by TCP header fields or options that are (or may be) time-dependent, 4428 and are usually initialized to zero when the system is bootstrapped. 4429 As a result, if the attacker knows the frequency with which the 4430 corresponding parameter or header field is incremented, and is able 4431 to sample the current value of that parameter or header field, the 4432 system uptime can be easily obtained. Two fields that can 4433 potentially reveal the system uptime are the Sequence Number field of 4434 a SYN or SYN/ACK segment (i.e., when it contains an ISN) and the 4435 TSval field of the timestamp option. Section 3.3.1 of this document 4436 discusses the generation of TCP Initial Sequence Numbers. Section 4437 4.7.1 of this document discusses the generation of TCP timestamps. 4439 13. Covert channels 4441 As with virtually every communications protocol, TCP can be exploited to 4442 establish covert channels. While an exhaustive discussion of covert 4443 channels is out of the scope of this document, for completeness of 4444 the document we simply note that it is possible for a (probably 4445 malicious) user to establish a covert channel by means of TCP, such 4446 that data can be surreptitiously passed to a remote system, probably 4447 unnoticed by a monitoring system, and with the possibility of 4448 concealing the location of the source system. 4450 In most cases, covert channels based on manipulation of TCP fields 4451 can be eliminated by protocol scrubbers and other middle-boxes. On 4452 the other hand, "timing channels" may prove to be more difficult to 4453 eliminate. 4455 [Rowland, 1996] contains a discussion of covert channels in the 4456 TCP/IP protocol suite, with some TCP-based examples.
[Giffin et al, 4457 2002] describes the use of TCP timestamps for the establishment of 4458 covert channels. [Zander, 2008] contains an extensive bibliography 4459 of papers on covert channels, and a list of freely-available tools 4460 that implement covert channels with the TCP/IP protocol suite. 4462 14. TCP Port scanning 4464 TCP port scanning aims at identifying TCP port numbers on which there 4465 is a process listening for incoming connections. That is, it aims at 4466 identifying TCPs at the target system that are in the LISTEN state. 4467 The following subsections describe different TCP port scanning 4468 techniques that have been implemented in freely-available tools. 4469 These subsections focus only on those port scanning techniques that 4470 exploit features of TCP itself, and not of other communication 4471 protocols. 4473 For example, the following subsections do not discuss the 4474 exploitation of application protocols (such as FTP) or the 4475 exploitation of features of underlying protocols (such as the IP 4476 Identification field) for port-scanning purposes. 4478 14.1. Traditional connect() scan 4480 The most trivial scanning technique consists in trying to perform the 4481 TCP three-way handshake with each of the port numbers at the target 4482 system (e.g. by issuing a call to the connect() function of the 4483 Sockets API). The three-way handshake will complete for port numbers 4484 that are "open", but will fail for those port numbers that are 4485 "closed". 4487 As this port-scanning technique can be implemented by issuing a call 4488 to the connect() function of the Sockets API that normal applications 4489 use, it does not require the attacker to have superuser privileges. 4490 The downside of this port-scanning technique is that it is less 4491 efficient than other scanning methods (e.g., the "SYN scan" described 4492 in Section 14.2), and that it can be easily logged by the target 4493 system. 4495 14.2. 
SYN scan 4497 The SYN scan was introduced as a "stealth" port-scanning technique. 4498 It aims at preventing the target system from logging the port scan by 4499 not completing the TCP three-way handshake. When a SYN/ACK segment 4500 is received in response to the initial SYN segment, the system 4501 performing the port scan will respond with an RST segment, thus 4502 preventing the three-way handshake from completing. While this port- 4503 scanning technique is harder to detect and log than the traditional 4504 connect() scan described in Section 14.1, most current NIDS (Network 4505 Intrusion Detection Systems) can detect and log it. 4507 SYN scans are sometimes mistakenly reported as "SYN flood" attacks by 4508 NIDS, though. 4510 The main advantage of this port scanning technique is that it is much 4511 more efficient than the traditional connect() scan. 4513 In order to implement this port-scanning technique, port-scanning 4514 tools usually bypass the TCP API, and forge the SYN segments they 4515 send (e.g., by using raw sockets). This typically requires the 4516 attacker to have superuser privileges to be able to run the port- 4517 scanning tool. 4519 14.3. FIN, NULL, and XMAS scans 4521 TCP SHOULD respond with an RST when a TCP segment is received for a 4522 connection in the LISTEN state, and the incoming segment has neither 4523 the SYN bit nor the RST bit set. 4525 DISCUSSION: 4527 RFC 793 [Postel, 1981c] states, on page 65, that an incoming 4528 segment that does not have the RST bit set and that is received 4529 for a connection in the fictional state CLOSED causes an RST to be 4530 sent in response. Pages 65-66 of RFC 793 describe the processing 4531 of incoming segments for connections in the state LISTEN, and 4532 implicitly state that an incoming segment that does not have the 4533 ACK bit set (and is not a SYN or an RST) should be silently 4534 dropped.
4536 As a result, an attacker can exploit this situation to perform a 4537 port scan by sending TCP segments that do not have the ACK bit set 4538 to the target system. When a port is "open" (i.e., there is a TCP 4539 in the LISTEN state on the corresponding port), the attacker will 4540 not get any response from the target system. On the other hand, if the port 4541 is "closed" (i.e., there is a TCP in the fictional state CLOSED), 4542 the target system will respond with an RST segment. 4544 Since the only requirement for exploiting this port scanning 4545 vector is that the probe segments must not have the ACK bit set, 4546 there are a number of different TCP control-bits combinations that 4547 can be used for the probe segments. 4549 When the probe segment sent to the target system is a TCP segment 4550 that has only the FIN bit set, the scanning technique is usually 4551 referred to as a "FIN scan". When the probe packet is a TCP 4552 segment that does not have any of the control bits set, the 4553 scanning technique is usually known as a "NULL scan". Finally, 4554 when the probe packet sent to the target system has only the FIN, 4555 PSH, and the URG bits set, the port-scanning technique is known as 4556 a "XMAS scan". 4558 It should be clear that while the aforementioned control-bits 4559 combinations are the most popular ones, other combinations could 4560 be used to exploit this port-scanning vector. For example, the 4561 CWR, ECE, and/or any of the Reserved bits could be set in the 4562 probe segments. 4564 The advantage of this port-scanning technique is that it can 4565 bypass some stateless firewalls. However, the downside is that a 4566 number of implementations do not comply strictly with RFC 793 4567 [Postel, 1981c], and thus always respond to the probe segments 4568 with an RST, regardless of whether the port is open or closed.
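The processing rules discussed above can be summarized in a small model. The following is only an illustrative sketch, not an excerpt from any TCP implementation: the function names and the string return values are assumptions made for this example. It contrasts the RFC 793 behavior (a TCP in the LISTEN state silently drops a probe that has none of the RST, ACK, or SYN bits set, while the fictional state CLOSED elicits an RST) with the behavior recommended in this section.

```python
# TCP control bits, as laid out in the RFC 793 header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

# Usual names of the probes discussed in this section (illustrative table).
SCAN_NAMES = {FIN: "FIN scan", 0: "NULL scan", FIN | PSH | URG: "XMAS scan"}


def rfc793_response(state, flags):
    """Simplified model of RFC 793 processing for CLOSED and LISTEN.

    Returns "RST", "SYN/ACK", or None (segment silently dropped).
    """
    if flags & RST:
        return None          # incoming RSTs never elicit a response
    if state == "CLOSED":
        return "RST"         # any other segment elicits an RST
    if state == "LISTEN":
        if flags & ACK:
            return "RST"     # unexpected ACK
        if flags & SYN:
            return "SYN/ACK"  # normal connection establishment
        return None          # FIN, NULL, XMAS, ...: silently dropped
    raise ValueError("only CLOSED and LISTEN are modeled")


def hardened_response(state, flags):
    """Same model, with the counter-measure recommended in this section:
    in LISTEN, answer with an RST when neither SYN nor RST is set."""
    if state == "LISTEN" and not flags & (SYN | RST):
        return "RST"
    return rfc793_response(state, flags)


if __name__ == "__main__":
    for flags, name in SCAN_NAMES.items():
        print(name,
              "| RFC 793 open:", rfc793_response("LISTEN", flags),
              "closed:", rfc793_response("CLOSED", flags),
              "| hardened open:", hardened_response("LISTEN", flags))
```

With the hardened behavior, the probes elicit the same response for open and closed ports, so the scan no longer reveals the port state.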
4570 This port-scanning vector can be easily defeated by responding 4571 with an RST when a TCP segment is received for a connection in the 4572 LISTEN state, and the incoming segment has neither the SYN bit nor 4573 the RST bit set. 4575 14.4. Maimon scan 4577 If a TCP that is in the CLOSED or LISTEN states receives a TCP 4578 segment with both the FIN and ACK bits set, it MUST respond with an 4579 RST. 4581 DISCUSSION: 4583 This port scanning technique was introduced in [Maimon, 1996] with 4584 the name "StealthScan" (method #1), and was later incorporated 4585 into the nmap tool [Fyodor, 2006b] as the "Maimon scan". 4587 This port scanning technique employs TCP segments that have both 4588 the FIN and ACK bits set as the probe segments. While according 4589 to RFC 793 [Postel, 1981c] these segments should elicit an RST 4590 regardless of whether the corresponding port is open or closed, a 4591 programming flaw found in a number of TCP implementations has 4592 caused some systems to silently drop the probe segment if the 4593 corresponding port was open (i.e., there was a TCP in the LISTEN 4594 state), and respond with an RST only if the port was closed. 4596 Therefore, an RST would indicate that the scanned port is closed, 4597 while the absence of a response from the target system would 4598 indicate that the scanned port is open. 4600 While this bug has not been found in current implementations of 4601 TCP, it might still be present in some legacy systems. 4603 14.5. Window scan 4605 When sending an RST segment, TCP SHOULD set the Window field to zero. 4607 DISCUSSION: 4609 This port-scanning technique employs ACK segments as the probe 4610 packets. ACK segments will elicit an RST from the target system 4611 regardless of whether the corresponding TCP port is open or 4612 closed.
However, as described in [Maimon, 1996], some systems set 4613 the Window field of the RST segments with different values 4614 depending on whether the corresponding TCP port is open or closed. 4615 These systems set the Window field of their RST segments to zero 4616 when the corresponding TCP port is closed, and set the Window 4617 field to a non-zero value when the corresponding TCP port is open. 4619 As a result, an attacker could exploit this situation for 4620 performing a port scan by sending ACK segments to the target 4621 system, and examining the Window field of the RST segments that 4622 his probe segments elicit. 4624 In order to defeat this port-scanning technique, we recommend that TCP 4625 implementations set the Window field to zero in all the RST 4626 segments they send. Most popular implementations of TCP already 4627 implement this policy. 4629 14.6. ACK scan 4631 The so-called "ACK scan" is not really a port-scanning technique 4632 (i.e., it does not aim at determining whether a specific port is open 4633 or closed), but rather aims at determining whether some intermediate 4634 system is filtering TCP segments sent to that specific port number. 4636 The probe packet is a TCP segment with the ACK bit set which, 4637 according to RFC 793 [Postel, 1981c], should elicit an RST from the 4638 target system regardless of whether the corresponding TCP port is 4639 open or closed. If no response is received from the target system, 4640 it is assumed that some intermediate system is filtering the probe 4641 packets sent to the target system. 4643 It should be noted that this "port scanning" technique exploits 4644 basic TCP processing rules, and therefore cannot be defeated at an 4645 end-system. 4647 15. Processing of ICMP error messages by TCP 4649 TCP SHOULD silently ignore received ICMP Source Quench messages.
4651 TCP SHOULD process ICMP "hard errors" as "soft errors" when they are 4652 received for connections that are in any of the synchronized states. 4654 TCP SHOULD process ICMP "fragmentation needed and DF bit set" and 4655 ICMPv6 "Packet Too Big" error messages as described in [RFC5927]. 4657 DISCUSSION: 4659 [RFC5927] analyzes a number of vulnerabilities based on crafted 4660 ICMP messages, along with possible counter-measures. 4662 16. TCP interaction with the Internet Protocol (IP) 4664 16.1. TCP-based traceroute 4666 The traceroute tool is used to identify the intermediate systems between the 4667 local system and the destination system. It is usually implemented 4668 by sending "probe" packets with increasing IP Time to Live values 4669 (starting from 0), without maintaining any state with the final 4670 destination. 4672 Some traceroute implementations use ICMP "echo request" messages as 4673 the probe packets, while others use UDP packets or TCP SYN segments. 4675 In some cases, the state-less nature of the traceroute tool may 4676 prevent it from working correctly across stateful devices such as 4677 Network Address Translators (NATs) or firewalls. 4679 In order to by-pass this limitation, an attacker could establish a 4680 TCP connection with the destination system, and start sending TCP 4681 segments on that connection with increasing IP Time to Live values 4682 (starting from 0) [Zalewski, 2007] [Zalewski, 2008]. Provided ICMP 4683 error messages are not blocked by any intermediate system, an 4684 attacker could exploit this technique to map the network topology 4685 behind the aforementioned stateful devices in scenarios in which he 4686 could not have achieved this goal using the traditional traceroute 4687 tool. 4689 NATs [Srisuresh and Egevang, 2001] and other middle-boxes could 4690 defeat this network-mapping technique by overwriting the Time to Live 4691 of the packets they forward to the internal network.
For example, 4692 they could overwrite the Time to Live of all packets being forwarded 4693 to an internal network with a value such as 128. We strongly 4694 recommend against overwriting the IP Time to Live field with the 4695 value 255 or other similar large values, as this could allow an 4696 attacker to bypass the protection provided by the Generalized TTL 4697 Security Mechanism (GTSM) described in RFC 5082 [Gill et al, 2007]. 4699 [Gont and Srisuresh, 2008] discusses the security implications of 4700 NATs, and proposes mitigations for this and other issues. 4702 16.2. Blind TCP data injection through fragmented IP traffic 4704 As discussed in Section 11.2, TCP data injection attacks usually 4705 require an attacker to guess or know a number of parameters related 4706 to the target TCP connection, such as the connection-id {Source 4707 Address, Source Port, Destination Address, Destination Port}, the TCP 4708 Sequence Number, and the TCP Acknowledgement Number. Provided these 4709 values are obfuscated as recommended in this document, the chances of 4710 an off-path attacker successfully performing a data injection 4711 attack against a TCP connection are fairly low for many of the most 4712 common scenarios. 4714 As discussed in this document, randomization of the values contained 4715 in different TCP header fields is not a replacement for cryptographic 4716 methods for protecting a TCP connection, such as IPsec (specified in 4717 RFC 4301 [Kent and Seo, 2005]). 4719 However, [Zalewski, 2003b] describes a possible vector for performing 4720 a TCP data injection attack that does not require the attacker to 4721 guess or know the aforementioned TCP connection parameters, and could 4722 therefore be successfully exploited in some scenarios with less 4723 effort than that required to exploit the more traditional data- 4724 injection attack vectors. 4726 The attack vector works as follows.
When one system is transferring 4727 information to a remote peer by means of TCP, and the resulting 4728 packet gets fragmented, the first fragment will usually contain the 4729 entire TCP header which, together with the IP header, includes all 4730 the connection parameters that an attacker would need to guess or 4731 know to successfully perform a data injection attack against TCP. If 4732 an attacker were able to forge all the fragments other than the first 4733 one, his forged fragments could be reassembled together with the 4734 legitimate first fragment, and thus he would be relieved from the 4735 hard task of guessing or knowing connection parameters such as the 4736 TCP Sequence Number and the TCP Acknowledgement Number. 4738 In order to successfully exploit this attack vector, the attacker 4739 should be able to guess or know both of the IP addresses involved in 4740 the target TCP connection, the IP Identification value used for the 4741 specific packet he is targeting, and the TCP Checksum of that target 4742 packet. While it would seem that these values are hard to guess, in 4743 some specific scenarios, and with some security-unwise implementation 4744 approaches for the TCP and IP protocols, these values may be feasible 4745 to guess or know. For example, if the sending system uses 4746 predictable IP Identification values, the attacker could simply 4747 perform a brute force attack, trying each of the possible 4748 combinations for the TCP Checksum field. In more specific scenarios, 4749 the attacker could have more detailed knowledge about the data being 4750 transferred over the target TCP connection, which might allow him to 4751 predict the TCP Checksum of the target packet. 
For example, if both 4752 of the involved TCP peers used predictable values for the TCP 4753 Sequence Number and for the IP Identification fields, and the 4754 attacker knew the data being transferred over the target TCP 4755 connection, he could be able to carefully forge the IP payload of his 4756 IP fragments so that the checksum of the reassembled TCP segment 4757 matched the Checksum included in the TCP header of the first (and 4758 legitimate) IP fragment. 4760 As discussed in Section 4.1 of [CPNI, 2008], IP fragmentation 4761 provides a vector for performing a variety of attacks against an IP 4762 implementation. Therefore, we discourage the reliance on IP 4763 fragmentation by end-systems, and recommend the implementation of 4764 mechanisms for the discovery of the Path-MTU, such as that described 4765 in Section 15.7.3 of this document and/or that described in RFC 4821 4766 [Mathis and Heffner, 2007]. We nevertheless recommend randomization 4767 of the IP Identification field as described in Section 3.5.2 of 4768 [CPNI, 2008]. While randomization of the IP Identification field 4769 does not eliminate this attack vector, it does require more work on 4770 the side of the attacker to successfully exploit it. 4772 16.3. Broadcast and multicast IP addresses 4774 TCP connection state is maintained between only two endpoints at a 4775 time. As a result, broadcast and multicast IP addresses should not 4776 be allowed for the establishment of TCP connections. Section 4.3 of 4777 [CPNI, 2008] provides advice about which specific IP address blocks 4778 should not be allowed for connection-oriented protocols such as TCP. 4780 17. Security Considerations 4782 This document provides a thorough security assessment of the 4783 Transmission Control Protocol (TCP), identifies a number of 4784 vulnerabilities, and specifies possible counter-measures. 4785 Additionally, it provides implementation guidance such that the 4786 resilience of TCP implementations is improved. 
4788 18. Acknowledgements 4790 The author would like to thank (in alphabetical order) David Borman, 4791 Wesley Eddy, and Alfred Hoenes, for providing valuable feedback on 4792 earlier versions of this document. 4794 This document is heavily based on the document "Security Assessment 4795 of the Transmission Control Protocol (TCP)" [CPNI, 2009] written by 4796 Fernando Gont on behalf of CPNI (Centre for the Protection of 4797 National Infrastructure). 4799 The author would like to thank (in alphabetical order) Randall 4800 Atkinson, Guillermo Gont, Alfred Hoenes, Jamshid Mahdavi, Stanislav 4801 Shalunov, Michael Welzl, Dan Wing, Andrew Yourtchenko, Michal 4802 Zalewski, and Christos Zoulas, for providing valuable feedback on 4803 earlier versions of the UK CPNI document. 4805 Additionally, the author would like to thank (in alphabetical order) 4806 Mark Allman, David Black, Ethan Blanton, David Borman, James Chacon, 4807 John Heffner, Jerrold Leichter, Jamshid Mahdavi, Keith Scott, Bill 4808 Squier, and David White, who generously answered a number of 4809 questions that arose while the aforementioned document was being 4810 written. 4812 Finally, the author would like to thank CPNI (formerly NISCC) for 4813 their continued support. 4815 19. References 4817 Abley, J., Savola, P., Neville-Neil, G. 2007. Deprecation of Type 0 4818 Routing Headers in IPv6. RFC 5095. 4820 Allman, M. 2003. TCP Congestion Control with Appropriate Byte 4821 Counting (ABC). RFC 3465. 4823 Allman, M. 2008. Comments On Selecting Ephemeral Ports. Available 4824 at: http://www.icir.org/mallman/share/ports-dec08.pdf 4826 Allman, M., Paxson, V., Stevens, W. 1999. TCP Congestion Control. 4827 RFC 2581. 4829 Allman, M., Balakrishnan, H., Floyd, S. 2001. Enhancing TCP's Loss 4830 Recovery Using Limited Transmit. RFC 3042. 4832 Allman, M., Floyd, S., and C. Partridge. 2002. Increasing TCP's 4833 Initial Window. RFC 3390. 4835 Baker, F. 1995. Requirements for IP Version 4 Routers. RFC 1812.
4837 Baker, F., Savola, P. 2004. Ingress Filtering for Multihomed 4838 Networks. RFC 3704. 4840 Barisani, A. 2006. FTester - Firewall and IDS testing tool. 4841 Available at: http://dev.inversepath.com/trac/ftester 4843 Beck, R. 2001. Passive-Aggressive Resistance: OS Fingerprint 4844 Evasion. Linux Journal. 4846 Bellovin, S. M. 1989. Security Problems in the TCP/IP Protocol 4847 Suite. Computer Communication Review, Vol. 19, No. 2, pp. 32-48. 4849 Bellovin, S. M. 1996. Defending Against Sequence Number Attacks. 4850 RFC 1948. 4852 Bellovin, S. M. 2006. Towards a TCP Security Option. IETF Internet- 4853 Draft (draft-bellovin-tcpsec-00.txt), work in progress. 4855 Bernstein, D. J. 1996. SYN cookies. Available at: 4856 http://cr.yp.to/syncookies.html 4858 Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, 4859 W., 1998. An Architecture for Differentiated Services. RFC 2475. 4861 Blanton, E., Allman, M., Fall, K., Wang, L. 2003. A Conservative 4862 Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for 4863 TCP. RFC 3517. 4865 Borman, D. 1997. Post to the tcp-impl mailing-list. Message-Id: 4866 <199706061526.KAA01535@frantic.BSDI.COM>. Available at: 4867 http://www.kohala.com/start/borman.97jun06.txt 4869 Borman, D., Deering, S., Hinden, R. 1999. IPv6 Jumbograms. RFC 4870 2675. 4872 Braden, R. 1989. Requirements for Internet Hosts -- Communication 4873 Layers. RFC 1122. 4875 Braden, R. 1992. Extending TCP for Transactions -- Concepts. RFC 4876 1379. 4878 Braden, R. 1994. T/TCP -- TCP Extensions for Transactions Functional 4879 Specification. RFC 1644. 4881 CCSDS. 2006. Consultative Committee for Space Data Systems (CCSDS) 4882 Recommendation Communications Protocol Specification (SCPS) -- 4883 Transport Protocol (SCPS-TP). Blue Book. Issue 2. Available at: 4884 http://public.ccsds.org/publications/archive/714x0b2.pdf 4886 CERT. 1996. CERT Advisory CA-1996-21: TCP SYN Flooding and IP 4887 Spoofing Attacks. 
Available at: 4888 http://www.cert.org/advisories/CA-1996-21.html 4890 CERT. 1997. CERT Advisory CA-1997-28 IP Denial-of-Service Attacks. 4891 Available at: http://www.cert.org/advisories/CA-1997-28.html 4893 CERT. 2000. CERT Advisory CA-2000-21: Denial-of-Service 4894 Vulnerabilities in TCP/IP Stacks. Available at: 4895 http://www.cert.org/advisories/CA-2000-21.html 4897 CERT. 2001. CERT Advisory CA-2001-09: Statistical Weaknesses in 4898 TCP/IP Initial Sequence Numbers. Available at: 4899 http://www.cert.org/advisories/CA-2001-09.html 4901 CERT. 2003. CERT Advisory CA-2003-13 Multiple Vulnerabilities in 4902 Snort Preprocessors. Available at: 4903 http://www.cert.org/advisories/CA-2003-13.html 4905 Cisco. 2008a. Cisco Security Appliance Command Reference, Version 4906 7.0. Available at: http://www.cisco.com/en/US/docs/security/asa/ 4907 asa70/command/reference/tz.html#wp1288756 4908 Cisco. 2008b. Cisco Security Appliance System Log Messages, Version 4909 8.0. Available at: http://www.cisco.com/en/US/docs/security/asa/ 4910 asa80/system/message/logmsgs.html#wp4773952 4912 Clark, D.D. 1982. Fault isolation and recovery. RFC 816. 4914 Clark, D.D. 1988. The Design Philosophy of the DARPA Internet 4915 Protocols, Computer Communication Review, Vol. 18, No.4, pp. 106-114. 4917 Connolly, T., Amer, P., Conrad, P. 1994. An Extension to TCP : 4918 Partial Order Service. RFC 1693. 4920 Conta, A., Deering, S., Gupta, M. 2006. Internet Control Message 4921 Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) 4922 Specification. RFC 4443. 4924 CORE. 2003. Core Secure Technologies Advisory CORE-2003-0307: Snort 4925 TCP Stream Reassembly Integer Overflow Vulnerability. Available at: 4926 http://www.coresecurity.com/common/showdoc.php?idx=313&idxseccion=10 4928 CPNI, 2008. Security Assessment of the Internet Protocol. Available 4929 at: http://www.cpni.gov.uk/Docs/InternetProtocol.pdf 4931 CPNI, 2009. Security Assessment of the Transmission Control Protocol 4932 (TCP). 
Available at: 4933 http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf 4935 daemon9, route, and infinity. 1996. IP-spoofing Demystified (Trust- 4936 Relationship Exploitation), Phrack Magazine, Volume Seven, Issue 4937 Forty-Eight, File 14 of 18. Available at: 4938 http://www.phrack.org/archives/48/P48-14 4940 Deering, S., Hinden, R. 1998. Internet Protocol, Version 6 (IPv6) 4941 Specification. RFC 2460. 4943 Dharmapurikar, S., Paxson, V. 2005. Robust TCP Stream Reassembly In 4944 the Presence of Adversaries. Proceedings of the USENIX Security 4945 Symposium 2005. 4947 Duke, M., Braden, R., Eddy, W., Blanton, E. 2006. A Roadmap for 4948 Transmission Control Protocol (TCP) Specification Documents. RFC 4949 4614. 4951 Ed3f. 2002. Firewall spotting and networks analisys with a broken 4952 CRC. Phrack Magazine, Volume 0x0b, Issue 0x3c, Phile #0x0c of 0x10. 4953 Available at: http://www.phrack.org/phrack/60/p60-0x0c.txt 4955 Eddy, W. 2007. TCP SYN Flooding Attacks and Common Mitigations. RFC 4956 4987. 4958 Fenner, B. 2006. Experimental Values in IPv4, IPv6, ICMPv4, ICMPv6, 4959 UDP, and TCP Headers. RFC 4727. 4961 Ferguson, P., and Senie, D. 2000. Network Ingress Filtering: 4962 Defeating Denial of Service Attacks which employ IP Source Address 4963 Spoofing. RFC 2827. 4965 Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., 4966 Leach, P., and Berners-Lee, T. 1999. Hypertext Transfer Protocol -- 4967 HTTP/1.1. RFC 2616. 4969 Floyd, S., Mahdavi, J., Mathis, M., Podolsky, M. 2000. An Extension 4970 to the Selective Acknowledgement (SACK) Option for TCP. RFC 2883. 4972 Floyd, S., Henderson, T., Gurtov, A. 2004. The NewReno Modification 4973 to TCP's Fast Recovery Algorithm. RFC 3782. 4975 Floyd, S., Allman, M., Jain, A., Sarolahti, P. 2007. Quick-Start for 4976 TCP and IP. RFC 4782. 4978 Fyodor. 1998. Remote OS Detection via TCP/IP Stack Fingerprinting. 4979 Phrack Magazine, Volume 8, Issue, 54. 4981 Fyodor. 2006a. 
Remote OS Detection via TCP/IP Fingerprinting (2nd 4982 Generation). Available at: http://insecure.org/nmap/osdetect/. 4984 Fyodor. 2006b. Nmap - Free Security Scanner For Network Exploration 4985 and Audit. Available at: http://www.insecure.org/nmap. 4987 Fyodor. 2008. Nmap Reference Guide: Port Scanning Techniques. 4988 Available at: http://nmap.org/book/man-port-scanning-techniques.html 4990 GIAC. 2000. Egress Filtering v 0.2. Available at: 4991 http://www.sans.org/y2k/egress.htm 4993 Giffin, J., Greenstadt, R., Litwack, P., Tibbetts, R. 2002. Covert 4994 Messaging through TCP Timestamps. PET2002 (Workshop on Privacy 4995 Enhancing Technologies), San Francisco, CA, USA, April 2002. 4996 Available at: 4997 http://web.mit.edu/greenie/Public/CovertMessaginginTCP.ps 4999 Gill, V., Heasley, J., Meyer, D., Savola, P., Pignataro, C. 2007. The 5000 Generalized TTL Security Mechanism (GTSM). RFC 5082. 5002 Gont, F. 2006. Advanced ICMP packet filtering. Available at: 5003 http://www.gont.com.ar/papers/icmp-filtering.html 5004 Gont, F. 2008a. ICMP attacks against TCP. IETF Internet-Draft 5005 (draft-ietf-tcpm-icmp-attacks-04.txt), work in progress. 5007 Gont, F. 2008b. TCP's Reaction to Soft Errors. IETF Internet-Draft 5008 (draft-ietf-tcpm-tcp-soft-errors-09.txt), work in progress. 5010 Gont, F. 2009. On the generation of TCP timestamps. IETF Internet- 5011 Draft (draft-gont-tcpm-tcp-timestamps-01.txt), work in progress. 5013 Gont, F., Srisuresh, P. 2008. Security Implications of Network 5014 Address Translators (NATs). IETF Internet-Draft 5015 (draft-gont-behave-nat-security-01.txt), work in progress. 5017 Gont, F., Yourtchenko, A. 2009. On the implementation of TCP urgent 5018 data. IETF Internet-Draft (draft-gont-tcpm-urgent-data-01.txt), work 5019 in progress. 5021 Heffernan, A. 1998. Protection of BGP Sessions via the TCP MD5 5022 Signature Option. RFC 2385. 5024 Heffner, J. 2002. High Bandwidth TCP Queuing. Senior Thesis. 5026 Hoenes, A. 2007.
TCP options - tcp-parameters IANA registry. Post to 5027 the tcpm wg mailing-list. Available at: 5028 http://www.ietf.org/mail-archive/web/tcpm/current/msg03199.html 5030 IANA. 2007. Transmission Control Protocol (TCP) Option Numbers. 5031 Available at: http://www.iana.org/assignments/tcp-parameters/ 5033 IANA. 2008. Port Numbers. Available at: 5034 http://www.iana.org/assignments/port-numbers 5036 Jacobson, V. 1988. Congestion Avoidance and Control. Computer 5037 Communication Review, vol. 18, no. 4, pp. 314-329. Available at: 5038 ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z 5040 Jacobson, V., Braden, R. 1988. TCP Extensions for Long-Delay Paths. 5041 RFC 1072. 5043 Jacobson, V., Braden, R., Borman, D. 1992. TCP Extensions for High 5044 Performance. RFC 1323. 5046 Jones, S. 2003. Port 0 OS Fingerprinting. Available at: 5047 http://www.gont.com.ar/docs/port-0-os-fingerprinting.txt 5049 Kent, S. and Seo, K. 2005. Security Architecture for the Internet 5050 Protocol. RFC 4301. 5052 Klensin, J. 2008. Simple Mail Transfer Protocol. RFC 5321. 5054 Ko, Y., Ko, S., and Ko, M. 2001. NIDS Evasion Method named SeolMa. 5055 Phrack Magazine, Volume 0x0b, Issue 0x39, phile #0x03 of 0x12. 5056 Available at: http://www.phrack.org/issues.html?issue=57&id=3#article 5058 Lahey, K. 2000. TCP Problems with Path MTU Discovery. RFC 2923. 5060 Larsen, M., Gont, F. 2008. Port Randomization. IETF Internet-Draft 5061 (draft-ietf-tsvwg-port-randomization-02), work in progress. 5063 Lemon, 2002. Resisting SYN flood DoS attacks with a SYN cache. 5064 Proceedings of the BSDCon 2002 Conference, pp 89-98. 5066 Maimon, U. 1996. Port Scanning without the SYN flag. Phrack 5067 Magazine, Volume Seven, Issue Fourty-Nine, phile #0x0f of 0x10. 5068 Available at: 5069 http://www.phrack.org/issues.html?issue=49&id=15#article 5071 Mathis, M., Mahdavi, J., Floyd, S. Romanow, A. 1996. TCP Selective 5072 Acknowledgment Options. RFC 2018. 5074 Mathis, M., and Heffner, J. 2007.
Packetization Layer Path MTU 5075 Discovery. RFC 4821. 5077 McCann, J., Deering, S., Mogul, J. 1996. Path MTU Discovery for IP 5078 version 6. RFC 1981. 5080 McKusick, M., Bostic, K., Karels, M., and J. Quarterman. 1996. The 5081 Design and Implementation of the 4.4BSD Operating System. Addison- 5082 Wesley. 5084 Meltman. 1997. new TCP/IP bug in win95. Post to the bugtraq mailing- 5085 list. Available at: http://insecure.org/sploits/land.ip.DOS.html 5087 Miller, T. 2006. Passive OS Fingerprinting: Details and Techniques. 5088 Available at: http://www.ouah.org/incosfingerp.htm . 5090 Mogul, J., and Deering, S. 1990. Path MTU Discovery. RFC 1191. 5092 Morris, R. 1985. A Weakness in the 4.2BSD Unix TCP/IP Software. 5093 Technical Report CSTR-117, AT&T Bell Laboratories. Available at: 5094 http://pdos.csail.mit.edu/~rtm/papers/117.pdf . 5096 Myst. 1997. Windows 95/NT DoS. Post to the bugtraq mailing-list. 5097 Available at: http://seclists.org/bugtraq/1997/May/0039.html 5099 Nichols, K., Blake, S., Baker, F., and Black, D. 1998. Definition of 5100 the Differentiated Services Field (DS Field) in the IPv4 and IPv6 5101 Headers. RFC 2474. 5103 NISCC. 2004. NISCC Vulnerability Advisory 236929: Vulnerability 5104 Issues in TCP. Available at: 5105 http://www.uniras.gov.uk/niscc/docs/re-20040420-00391.pdf 5107 NISCC. 2005. NISCC Vulnerability Advisory 532967/NISCC/ICMP: 5108 Vulnerability Issues in ICMP packets with TCP payloads. Available 5109 at: http://www.niscc.gov.uk/niscc/docs/re-20050412-00303.pdf 5111 NISCC. 2006. NISCC Technical Note 01/2006: Egress and Ingress 5112 Filtering. Available at: 5113 http://www.niscc.gov.uk/niscc/docs/re-20060420-00294.pdf?lang=en 5115 Ostermann, S. 2008. tcptrace tool. Tool and documentation available 5116 at: http://www.tcptrace.org. 5118 Paxson, V., Allman, M. 2000. Computing TCP's Retransmission Timer. 5119 RFC 2988. 5121 PCNWG. 2009. Congestion and Pre-Congestion Notification (pcn) 5122 charter. 
Available at: 5123 http://www.ietf.org/html.charters/pcn-charter.html 5125 PMTUDWG. 2007. Path MTU Discovery (pmtud) charter. Available at: 5126 http://www.ietf.org/html.charters/OLD/pmtud-charter.html 5128 Postel, J. 1981a. Internet Protocol. DARPA Internet Program. 5129 Protocol Specification. RFC 791. 5131 Postel, J. 1981b. Internet Control Message Protocol. RFC 792. 5133 Postel, J. 1981c. Transmission Control Protocol. DARPA Internet 5134 Program. Protocol Specification. RFC 793. 5136 Postel, J. 1987. TCP AND IP BAKE OFF. RFC 1025. 5138 Ptacek, T. H., and Newsham, T. N. 1998. Insertion, Evasion and 5139 Denial of Service: Eluding Network Intrusion Detection. Secure 5140 Networks, Inc. Available at: 5141 http://www.aciri.org/vern/Ptacek-Newsham-Evasion-98.ps 5143 Ramaiah, A., Stewart, R., and Dalal, M. 2008. Improving TCP's 5144 Robustness to Blind In-Window Attacks. IETF Internet-Draft 5145 (draft-ietf-tcpm-tcpsecure-10.txt), work in progress. 5147 Ramakrishnan, K., Floyd, S., and Black, D. 2001. The Addition of 5148 Explicit Congestion Notification (ECN) to IP. RFC 3168. 5150 Rekhter, Y., Li, T., Hares, S. 2006. A Border Gateway Protocol 4 5151 (BGP-4). RFC 4271. 5153 Rivest, R. 1992. The MD5 Message-Digest Algorithm. RFC 1321. 5155 Rowland, C. 1997. Covert Channels in the TCP/IP Protocol Suite. 5156 First Monday Journal, Volume 2, Number 5. Available at: 5157 http://www.firstmonday.org/issues/issue2_5/rowland/ 5159 Savage, S., Cardwell, N., Wetherall, D., Anderson, T. 1999. TCP 5160 Congestion Control with a Misbehaving Receiver. ACM Computer 5161 Communication Review, 29(5), October 1999. 5163 Semke, J., Mahdavi, J., Mathis, M. 1998. Automatic TCP Buffer 5164 Tuning. ACM Computer Communication Review, Vol. 28, No. 4. 5166 Shalunov, S. 2000. Netkill. Available at: 5167 http://www.internet2.edu/~shalunov/netkill/netkill.html 5169 Shimomura, T. 1995. Technical details of the attack described by 5170 Markoff in NYT. 
Message posted in USENET's comp.security.misc newsgroup,
Message-ID: <3g5gkl$5j1@ariel.sdsc.edu>. Available at:
http://www.gont.com.ar/docs/post-shimomura-usenet.txt

Silbersack, M. 2005. Improving TCP/IP security through randomization
without sacrificing interoperability. EuroBSDCon 2005 Conference.

SinFP. 2006. Net::SinFP - a Perl module to do OS fingerprinting.
Available at:
http://www.gomor.org/cgi-bin/index.pl?mode=view;page=sinfp

Smart, M., Malan, G., Jahanian, F. 2000. Defeating TCP/IP Stack
Fingerprinting. Proceedings of the 9th USENIX Security Symposium,
pp. 229-240. Available at:
http://www.usenix.org/publications/library/proceedings/sec2000/full_papers/smart/smart_html/index.html

Smith, C., Grundl, P. 2002. Know Your Enemy: Passive Fingerprinting.
The Honeynet Project.

Spring, N., Wetherall, D., Ely, D. 2003. Robust Explicit Congestion
Notification (ECN) Signaling with Nonces. RFC 3540.

Srisuresh, P., Egevang, K. 2001. Traditional IP Network Address
Translator (Traditional NAT). RFC 3022.

Stevens, W. R. 1994. TCP/IP Illustrated, Volume 1: The Protocols.
Addison-Wesley Professional Computing Series.

TBIT. 2001. TBIT, the TCP Behavior Inference Tool. Available at:
http://www.icir.org/tbit/

Touch, J. 2007. Defending TCP Against Spoofing Attacks. RFC 4953.

US-CERT. 2001. US-CERT Vulnerability Note VU#498440: Multiple TCP/IP
implementations may use statistically predictable initial sequence
numbers. Available at: http://www.kb.cert.org/vuls/id/498440

US-CERT. 2003a. US-CERT Vulnerability Note VU#26825: Cisco Secure
PIX Firewall TCP Reset Vulnerability. Available at:
http://www.kb.cert.org/vuls/id/26825

US-CERT. 2003b. US-CERT Vulnerability Note VU#464113: TCP/IP
implementations handle unusual flag combinations inconsistently.
Available at: http://www.kb.cert.org/vuls/id/464113

US-CERT. 2004a.
US-CERT Vulnerability Note VU#395670: FreeBSD fails
to limit number of TCP segments held in reassembly queue. Available
at: http://www.kb.cert.org/vuls/id/395670

US-CERT. 2005a. US-CERT Vulnerability Note VU#102014: Optimistic TCP
acknowledgements can cause denial of service. Available at:
http://www.kb.cert.org/vuls/id/102014

US-CERT. 2005b. US-CERT Vulnerability Note VU#396645: Microsoft
Windows vulnerable to DoS via LAND attack. Available at:
http://www.kb.cert.org/vuls/id/396645

US-CERT. 2005c. US-CERT Vulnerability Note VU#637934: TCP does not
adequately validate segments before updating timestamp value.
Available at: http://www.kb.cert.org/vuls/id/637934

US-CERT. 2005d. US-CERT Vulnerability Note VU#853540: Cisco PIX
fails to verify TCP checksum. Available at:
http://www.kb.cert.org/vuls/id/853540

Veysset, F., Courtay, O., Heen, O. 2002. New Tool And Technique For
Remote Operating System Fingerprinting. Intranode Research Team.

Watson, P. 2004. Slipping in the Window: TCP Reset Attacks,
CanSecWest 2004 Conference.

Welzl, M. 2008. Internet congestion control: evolution and current
open issues. CAIA guest talk, Swinburne University, Melbourne,
Australia. Available at:
http://www.welzl.at/research/publications/caia-jan08.pdf

Wright, G. and W. Stevens. 1994. TCP/IP Illustrated, Volume 2: The
Implementation. Addison-Wesley.

Zalewski, M. 2001a. Strange Attractors and TCP/IP Sequence Number
Analysis. Available at:
http://lcamtuf.coredump.cx/oldtcp/tcpseq.html

Zalewski, M. 2001b. Delivering Signals for Fun and Profit.
Available at: http://lcamtuf.coredump.cx/signals.txt

Zalewski, M. 2002. Strange Attractors and TCP/IP Sequence Number
Analysis - One Year Later. Available at:
http://lcamtuf.coredump.cx/newtcp/

Zalewski, M. 2003a. Windows URG mystery solved! Post to the bugtraq
mailing-list.
Available at:
http://lcamtuf.coredump.cx/p0f-help/p0f/doc/win-memleak.txt

Zalewski, M. 2003b. A new TCP/IP blind data injection technique?
Post to the bugtraq mailing-list. Available at:
http://lcamtuf.coredump.cx/ipfrag.txt

Zalewski, M. 2006a. p0f passive fingerprinting tool. Available at:
http://lcamtuf.coredump.cx/p0f.shtml

Zalewski, M. 2006b. p0f - RST+ signatures. Available at:
http://lcamtuf.coredump.cx/p0f-help/p0f/p0fr.fp

Zalewski, M. 2007. 0trace - traceroute on established connections.
Post to the bugtraq mailing-list. Available at:
http://seclists.org/bugtraq/2007/Jan/0176.html

Zalewski, M. 2008. Museum of broken packets. Available at:
http://lcamtuf.coredump.cx/mobp/

Zander, S. 2008. Covert Channels in Computer Networks. Available
at: http://caia.swin.edu.au/cv/szander/cc/index.html

Zúquete, A. 2002. Improving the functionality of SYN cookies. 6th
IFIP Communications and Multimedia Security Conference (CMS 2002).
Available at: http://www.ieeta.pt/~avz/pubs/CMS02.html

Zweig, J., Partridge, C. 1990. TCP Alternate Checksum Options. RFC
1146.

20. References

20.1. Normative References

[I-D.ietf-tcpm-tcp-timestamps]
           Gont, F., "Reducing the TIME-WAIT state using TCP
           timestamps", draft-ietf-tcpm-tcp-timestamps-03 (work in
           progress), December 2010.

[I-D.ietf-tsvwg-port-randomization]
           Larsen, M. and F. Gont, "Transport Protocol Port
           Randomization Recommendations",
           draft-ietf-tsvwg-port-randomization-09 (work in progress),
           August 2010.

[RFC6093]  Gont, F. and A. Yourtchenko, "On the Implementation of the
           TCP Urgent Mechanism", RFC 6093, January 2011.

20.2. Informative References

[I-D.gont-timestamps-generation]
           Gont, F. and A. Oppermann, "On the generation of TCP
           timestamps", draft-gont-timestamps-generation-00 (work in
           progress), June 2010.
[RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

Appendix A. TODO list

A number of formatting issues still have to be fixed in this
document. Among others:

o  The ASCII art corresponding to some figures is still missing. We
   still have to convert the nice JPGs of the UK CPNI document into
   ugly ASCII art.

o  The references have not yet been converted to XML, but are
   hardcoded instead. That is why they may not look as expected.

Appendix B. Change log (to be removed by the RFC Editor before
publication of this document as an RFC)

B.1. Changes from draft-ietf-tcpm-tcp-security-01

o  The whole document was reformatted in RFC 1122 style.

Author's Address

Fernando Gont
UK Centre for the Protection of National Infrastructure

Email: fernando@gont.com.ar
URI: http://www.cpni.gov.uk