Network Working Group                                     A. Yourtchenko
Internet-Draft                                                   D. Wing
Intended status: Standards Track                                   cisco
Expires: February 26, 2011                               August 25, 2010


       NAT confessions: revealing the hosts behind the translator
                  draft-yourtchenko-nat-reveal-hash-00

Abstract

   When an IP address is shared among several subscribers, it is
   impossible to determine which subscriber initiated a given TCP
   connection.  This memo describes a technique for sharing the
   identity of the subscriber that initiated a TCP connection with the
   TCP server.  The proposed method avoids altering the
   application-level payload and works well with SSL-protected
   connections.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 26, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  Description
   4.  Calculating the Internal Address Mapping
   5.  Calculating the Verifier
   6.  Encoding of the VFY into the packet: IP ID encoding
   7.  Encoding of the VFY into the packet: TSval encoding
   8.  Operation of the mechanism
     8.1.  Translator Operation
     8.2.  Server Operation
   9.  Interaction with TCP SYN cookies
   10. Other Mechanisms to Encode Client Identifier
     10.1.  Defining a new TCP option to store the address
     10.2.  Using TSecr in TCP SYN
     10.3.  Reserving the different port ranges per client
   11. Security Considerations
   12. IANA considerations
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   There are several scenarios where it is valuable to know the identity
   of a TCP client, including geolocation, DoS blocking, and spam
   blacklists.
   Today, this is done by equating an IPv4 address with an 'identity'.
   However, the identity of a TCP client is obscured when an IP address
   is shared [I-D.ietf-intarea-shared-addressing-issues].  IP address
   sharing is done both by network address and port translators (NAPT)
   and by application-layer proxies (e.g., HTTP or FTP proxies).

   The current state of the art requires the address-sharing device to
   alter the application-level payload and include the identity of the
   internal host, usually the internal host's private IP address.  This
   has several drawbacks:

   o  adjustment of TCP sequence numbers and acknowledgement numbers
      for the duration of the TCP session;

   o  risk of false-positive application matching (e.g., accidentally
      inserting an HTTP header into a non-HTTP payload);

   o  interference with the application payload by increasing the
      packet size (e.g., beyond the MTU).

   With SSL-protected applications, the current state of the art
   requires breaking the end-to-end encrypted connection.  This results
   in several undesirable consequences:

   o  the translator must break the end-to-end encryption, typically by
      installing an additional Certificate Authority in the client's CA
      trust list;

   o  a noticeable increase in the processing power required on the
      address-sharing device to decrypt and re-encrypt the application
      payload.

   This specification avoids the problems described above and defines a
   method of communicating the TCP client's identity to the TCP server
   by overloading the TCP timestamp field and the IP Identification (IP
   ID) field of the initial TCP SYN.

   This extension is necessary because IP address sharing, deployed by
   NAT64 devices, will allow malicious users to connect to IPv4-capable
   servers.
   Thus, until a server is accessible only via IPv6 (and inaccessible
   via IPv4), the IPv4-capable server will be unable to identify
   individual TCP clients, as discussed in
   [I-D.ietf-intarea-shared-addressing-issues].

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Description

   This proposal leverages the common deployment of TCP timestamps and
   the fact that a timestamp-aware TCP server echoes the received
   timestamp value back in the TSecr field.

   The caveat is that the remote peer must know in advance whether a
   received SYN carries this encoding or not: on the server side, a
   modified timestamp looks just like an ordinary one.  This could be
   resolved by manual configuration, but that is impractical, so an
   automatic detection mechanism is proposed.  The translator
   calculates a hash over the values of interest and places the result
   into another field.  The receiver performs the same calculation and
   compares; if the received and computed values match, the received
   TCP timestamp contains the encoded internal address.  This verifier
   value is computed as a hash over the mapped value encoded into the
   timestamp, the address after translation, and the TCP initial
   sequence number (i.e., the sequence number of the SYN segment).
   Including the initial sequence number keeps the verifier from being
   nearly constant for a given client, which is necessary to satisfy
   the protocol constraints of the field used to carry it.

   To find a place for storing this verification value, we make another
   observation: TCP SYN segments are generally small, well under 576
   octets (the datagram size every IPv4 host must be able to accept).
   Typical stacks send the TCP SYN with the Don't Fragment (DF) bit set,
   so it is not fragmented in transit.  This means the 16-bit IP ID
   field, whose purpose is fragment reassembly, can carry the verifier
   value.  Because the verifier depends on the initial sequence number
   (ISN), which should have the randomness properties described in
   [RFC1948], the IP ID values will still differ enough to serve their
   original purpose even in the extremely unlikely case that the TCP SYN
   is fragmented.

   A 16-bit verifier gives a 1 in 65536 (about 0.0015%) probability of
   wrongly concluding that the timestamp contains an encoded internal
   address.  This may be insufficient assurance for some scenarios.
   Therefore, the verifier (referred to as the VFY value) is calculated
   as a 32-bit integer, of which 16 or more bits are stored, at the
   expense of storing fewer bits of the Internal Address Mapping (iAM).
   However, we expect the range of iAM values behind a single public
   address to be relatively small, so no information will be lost in
   this process.

4.  Calculating the Internal Address Mapping

   The essential property of the iAM is that it MUST stay the same for
   the same internal address unless the translator's configuration
   changes.  Since the goal is to provide a stable mapping rather than
   to reveal the internal address in full, any method with this
   property is acceptable, and the choice is left to the implementors
   of the translator.  If the addresses to be translated are configured
   as a prefix, the iAM can be obtained simply by taking the host bits
   of the address within the prefix.  If the addresses are assigned
   individually, simple enumeration can be used.
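
   The two methods above (host bits within a configured prefix, and
   enumeration of individually assigned addresses) can be sketched in
   Python as follows.  This is a minimal, non-normative illustration:
   the helper names are hypothetical, and rounding siAM up to 9 and
   rejecting more than 24 significant bits anticipate the 9..24 range
   given later in this section.  An enumeration index satisfies the
   stability requirement only as long as the configured list itself
   does not change.

```python
from __future__ import annotations

import ipaddress

# Allowed siAM range (Section 4); smaller host parts are rounded up to 9.
SIAM_MIN, SIAM_MAX = 9, 24


def iam_from_prefix(addr: str, prefix: str) -> tuple[int, int]:
    """Prefix method: iAM is the host part of addr within prefix.

    Returns (iAM, siAM). Hypothetical helper for illustration only.
    """
    net = ipaddress.IPv4Network(prefix)
    ip = ipaddress.IPv4Address(addr)
    if ip not in net:
        raise ValueError(f"{addr} is not inside {prefix}")
    host_bits = 32 - net.prefixlen
    if host_bits > SIAM_MAX:
        raise ValueError("prefix has more than 24 host bits")
    iam = int(ip) - int(net.network_address)
    return iam, max(SIAM_MIN, host_bits)


def iam_from_list(addr: str, pool: list[str]) -> tuple[int, int]:
    """Enumeration method: iAM is the position of addr in a configured list.

    Returns (iAM, siAM). Hypothetical helper for illustration only.
    """
    iam = pool.index(addr)  # raises ValueError if addr is not in the pool
    siam = max(SIAM_MIN, (len(pool) - 1).bit_length())
    if siam > SIAM_MAX:
        raise ValueError("pool has more than 2**24 entries")
    return iam, siam


print(iam_from_prefix("10.1.2.3", "10.1.0.0/16"))  # (515, 16)
print(iam_from_list("192.168.7.9", ["192.168.0.5", "192.168.7.9"]))  # (1, 9)
```

   Both helpers return the (iAM, siAM) pair this section requires; a
   translator whose pool is a set of equal-sized subnets could combine
   them by placing the subnet's enumeration index above its host bits.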

   If the internal addresses are assigned to the pool as a set of
   subnets, a combination of the two methods above (the host bits in
   the least significant part and the enumeration in the most
   significant part) gives good results.  This also encourages
   allocating internal addresses in equal-sized chunks, which should
   make network maintenance easier.

   As a result, the calculation of the iAM on the outgoing SYN segment
   MUST return two values:

   o  iAM = Internal Address Mapping: a 32-bit unsigned integer.

   o  siAM = Size of Internal Address Mapping, in bits: an integer in
      the range 9..24.  This is the number of significant bits within
      the iAM.

   The minimum siAM value of 9 was chosen for the following reasons:

   o  room for 512 possible hosts allows the iAM to stay unchanged
      across smaller configuration changes when the pool is made up of
      individual hosts;

   o  the range 9..24 has exactly 16 possible values, which is useful
      for encoding.

   By encoding only the significant bits of the internal address
   mapping, the operator of the translator minimizes the probability of
   error: all unused bits are allocated to the value used to
   "fingerprint" the presence of the internal identifier.  The more
   bits this "Verifier" value contains, the lower the chance of an
   accidental match, and hence of recording an internal identifier
   where there is none.

   The range from 9 to 24 bits allows encoding between 512 (2^9) and
   16777216 (2^24) internal identifiers for a single public IP address.

5.  Calculating the Verifier

   The verifier is calculated as the 32-bit result of a hash function.
   This hash function is not expected to be cryptographically strong
   (the Security Considerations section explains why); however, it
   should have good distribution, good collision resistance, and good
   avalanche behavior, and it should be fast and cheap to compute.
   Murmur hash [URL.Murmur-hash] satisfies these properties, so it is
   the hash function used here.

   The VFY is calculated as follows:

      VFY = murmur(iAM | AddrPub | siAM, TCP-ISN)

   where "|" denotes concatenation, AddrPub is the public address after
   translation, and the TCP initial sequence number is the hash seed.

   o  iAM is included in the calculation as a 32-bit word.

   o  siAM is included in the calculation as a single byte.  (TBD: the
      encoding selector S described in Section 7 might be a more
      natural value to check against than siAM.)

6.  Encoding of the VFY into the packet: IP ID encoding

   The low 16 bits of the VFY are encoded in network byte order into
   the IP ID of the packet after translation.  The remaining (high) 16
   bits form the "VFYhi" value, which is fitted into the TSval along
   with the other information.

7.  Encoding of the VFY into the packet: TSval encoding

   The TCP timestamp value (TSval) encodes the iAM and VFYhi as
   follows:

    3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
    1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E E E E|S S S S| iAM MSB ... iAM LSB | VFYhi MSB ... VFYhi LSB |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 16 possible siAM values give 16 ways to store the iAM, each with
   a different degree of detection assurance.  To distinguish among
   them, the 4-bit encoding selector (S) field determines how the lower
   24 bits are split between the iAM and the upper 16 bits of the VFY
   (VFYhi).  Because the smallest siAM value is 9, at most 15 bits of
   VFYhi fit into the TSval, so the most significant bit of the VFY is
   never stored.
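
   The encoding of Sections 5 through 7, together with the server-side
   check of Section 8.2, can be sketched in Python as below.  This is a
   hedged illustration rather than a reference implementation, because
   this memo leaves several details open.  The sketch assumes:
   MurmurHash3 (x86, 32-bit) as the "murmur" function, chosen only as a
   well-documented stand-in, with the TCP ISN as its seed; a hash input
   of the iAM as a big-endian 32-bit word, then the 4-byte public IPv4
   address, then siAM as one byte; S = 24 - siAM; and that the low S
   bits of VFYhi are the ones carried in the TSval.  All function names
   are hypothetical.

```python
import ipaddress
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int) -> int:
    """MurmurHash3 x86_32; an assumed stand-in for the memo's "murmur"."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    if tail:  # 1 to 3 leftover bytes, read little-endian
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Final avalanche ("fmix32").
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    return h ^ (h >> 16)


def compute_vfy(iam: int, siam: int, addr_pub: bytes, isn: int) -> int:
    """VFY = murmur(iAM | AddrPub | siAM, TCP-ISN); byte layout assumed."""
    return murmur3_32(struct.pack("!I4sB", iam, addr_pub, siam), isn)


def translator_encode(iam, siam, addr_pub, isn, orig_tsval):
    """Return (ip_id, tsval) for the translated SYN (hypothetical helper)."""
    if not 9 <= siam <= 24 or iam >> siam:
        raise ValueError("need 9 <= siAM <= 24 and iAM < 2**siAM")
    vfy = compute_vfy(iam, siam, addr_pub, isn)
    ip_id = vfy & 0xFFFF             # Section 6: low 16 bits -> IP ID
    vfy_hi = vfy >> 16               # Section 6: high 16 bits -> VFYhi
    s = 24 - siam                    # Section 7: VFYhi bits that fit
    low24 = (iam << s) | (vfy_hi & ((1 << s) - 1))
    epoch = orig_tsval & 0xF0000000  # keep the original Epoch (E) bits
    return ip_id, epoch | (s << 24) | low24


def server_check(tsval, ip_id, addr_pub, isn):
    """Return the iAM if the verifier matches, else None (hypothetical)."""
    s = (tsval >> 24) & 0xF
    siam = 24 - s
    low24 = tsval & 0xFFFFFF
    iam = low24 >> s                 # undo the S-bit shift
    vfy = compute_vfy(iam, siam, addr_pub, isn)
    mask = (1 << s) - 1
    if (vfy & 0xFFFF) != ip_id or ((vfy >> 16) & mask) != (low24 & mask):
        return None
    return iam


pub = ipaddress.IPv4Address("192.0.2.1").packed
ip_id, tsval = translator_encode(515, 16, pub, 0x12345678, 0xA1B2C3D4)
print(server_check(tsval, ip_id, pub, 0x12345678))  # 515
```

   Because S travels in the TSval itself, the server can recover siAM
   and the candidate iAM without any per-translator configuration;
   flipping any IP ID bit, or any stored VFYhi bit, makes server_check
   return None.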

   The value of S is the number of zero-fill right-shift operations
   needed on the low 24 bits to "normalize" the iAM; in other words, it
   is the number of VFYhi bits stored in the timestamp (S = 24 - siAM).

   The best practices in [I-D.ietf-tcpm-tcp-timestamps] state that, to
   reduce the TIME-WAIT state, the timestamp value should be
   monotonically increasing across connections with the same 5-tuple.
   To give translators an opportunity to achieve this property, the
   four most significant bits of the timestamp are reserved for an
   "Epoch" (E) value.  Maintaining the Epoch requires storing some
   additional state per 5-tuple; such a mechanism is outside the scope
   of this document.  Implementations that do not provide monotonically
   increasing timestamps MUST leave the Epoch bits unchanged from the
   original timestamp value.

8.  Operation of the mechanism

   This section outlines how translators and servers use this
   mechanism.

8.1.  Translator Operation

   The translator processes the initial SYN segment (calculating the
   new TCP timestamp and IP ID values) as well as the SYN-ACK segment
   (restoring the original TCP timestamp value in the TSecr field).

8.2.  Server Operation

   The server operates on every SYN that is of interest for logging.
   It extracts the candidate iAM and calculates the VFY value based on
   the public address and the TCP ISN of the received SYN segment.  It
   then compares the VFY against the corresponding bits in the TSval
   and IP ID fields.  A match means, with reasonable probability, that
   the iAM is a valid one calculated by a translator on the path.  This
   information is stored for later access by the application listening
   on that socket (e.g., in the TCB).

9.  Interaction with TCP SYN cookies

   TCP SYN cookies are commonly deployed to mitigate TCP SYN flooding
   attacks [RFC4987].  The mechanism described in this document
   requires the server to store extra information that arrives in the
   TCP SYN, which increases the TCP server's attack surface.  To
   mitigate this, the translator should apply a similar algorithm to
   the timestamp of the ACK segment that the connection initiator sends
   in response to the server's SYN-ACK.  The authors considered having
   the server use the TSval in its SYN-ACK segment; however, this would
   interfere with extended SYN cookies.  This section needs further
   discussion.

10.  Other Mechanisms to Encode Client Identifier

   This section outlines other mechanisms that were considered and the
   reasons they are not considered applicable.

10.1.  Defining a new TCP option to store the address

   This would be the cleanest and simplest approach, and it is
   discussed in [I-D.wing-reveal-address].

10.2.  Using TSecr in TCP SYN

   This value is set to zero and is effectively unused, so it looks
   like a convenient place.  However, using it would violate [RFC1323],
   would require much more thorough testing, and would require an
   update to [RFC1323].

10.3.  Reserving the different port ranges per client

   This approach is appealing due to its simplicity, but it would be
   specific to each NAPT device operated by each service provider.
   That is, there is no way to identify the device or to know the
   source port range assigned to a TCP client without contacting the
   administrator of the NAPT device.  Restricting clients to a specific
   port range also exposes them to some security risk
   [I-D.ietf-tsvwg-port-randomization].

11.  Security Considerations

   Connections made today without a NAPT necessarily reveal the source
   address of the TCP client, so revealing the client's identity should
   not be a concern, except for installations that use NAPT for
   "privacy" reasons.  For such an installation, any 1:1 remapping of,
   e.g., the IP ID causes the validation algorithm to fail, thereby
   "protecting the identity".

   Likewise, if an organization has more than one level of NAPT and
   wants to ensure that the internal translators do not disclose
   information about internal addresses, it can alter any of the
   elements used in the calculation, e.g., randomize the ISN or remap
   the IP ID.

   An attacker might use this functionality to make it appear that IP
   address sharing is occurring, in the hope that a naive server will
   allow additional attack traffic.  TCP servers and applications
   SHOULD NOT assume that the mere presence of the functionality
   described in this document indicates that other (benign) users are
   sharing the same IP address.

   Modifying the TSval will break TCP-AO [RFC5925], which provides
   integrity protection of the TCP SYN (including TCP options).
   However, TCP-AO is already known not to survive address sharing
   (through a NAPT or an application proxy).

12.  IANA considerations

   None.

13.  Acknowledgements

   Thanks to Nicholas Leavy for the review.

14.  References

14.1.  Normative References

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, June 2010.

14.2.  Informative References

   [I-D.ietf-intarea-shared-addressing-issues]
              Ford, M., Boucadair, M., Durand, A., Levis, P., and P.
              Roberts, "Issues with IP Address Sharing",
              draft-ietf-intarea-shared-addressing-issues-01 (work in
              progress), June 2010.

   [I-D.ietf-tcpm-tcp-timestamps]
              Gont, F., "Reducing the TIME-WAIT state using TCP
              timestamps", draft-ietf-tcpm-tcp-timestamps-00 (work in
              progress), June 2010.

   [I-D.ietf-tsvwg-port-randomization]
              Larsen, M. and F. Gont, "Transport Protocol Port
              Randomization Recommendations",
              draft-ietf-tsvwg-port-randomization-09 (work in progress),
              August 2010.

   [RFC1948]  Bellovin, S., "Defending Against Sequence Number Attacks",
              RFC 1948, May 1996.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
              Mitigations", RFC 4987, August 2007.

   [URL.Murmur-hash]
              "Murmur hash", .

Authors' Addresses

   Andrew Yourtchenko
   cisco
   6a de Kleetlaan
   Diegem  1831
   BE

   Phone: +32 2 704 5494
   Email: ayourtch@cisco.com


   Dan Wing
   cisco
   170 West Tasman Drive
   San Jose, CA  95134
   USA

   Email: dwing@cisco.com