idnits 2.17.1 

draft-yourtchenko-nat-reveal-hash-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (August 25, 2010) is 4992 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1323 (Obsoleted by RFC 7323)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-intarea-shared-addressing-issues-01

  == Outdated reference: A later version (-04) exists of
     draft-ietf-tcpm-tcp-timestamps-00

  -- Obsolete informational reference (is this intentional?): RFC 1948
     (Obsoleted by RFC 6528)


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                     A. Yourtchenko
3	Internet-Draft                                                   D. Wing
4	Intended status:  Standards Track                                  cisco
5	Expires:  February 26, 2011                              August 25, 2010

7	       NAT confessions: revealing the hosts behind the translator
8	                  draft-yourtchenko-nat-reveal-hash-00

10	Abstract

12	   When an IP address is shared among several subscribers, it is
13	   impossible to determine which subscriber initiated a given TCP
14	   connection.  This memo describes a technique to share the identity of
15	   a subscriber that initiated a TCP connection with the TCP server.
16	   The proposed method avoids altering the application-level payload and
17	   works well with SSL-protected connections.

19	Status of this Memo

21	   This Internet-Draft is submitted in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at http://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on February 26, 2011.

36	Copyright Notice

38	   Copyright (c) 2010 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents
43	   (http://trustee.ietf.org/license-info) in effect on the date of
44	   publication of this document.  Please review these documents
45	   carefully, as they describe your rights and restrictions with respect
46	   to this document.  Code Components extracted from this document must
47	   include Simplified BSD License text as described in Section 4.e of
48	   the Trust Legal Provisions and are provided without warranty as
49	   described in the Simplified BSD License.

51	Table of Contents

53	   1.  Introduction
54	   2.  Notational Conventions
55	   3.  Description
56	   4.  Calculating the Internal Address Mapping
57	   5.  Calculating the Verifier
58	   6.  Encoding of the VFY into the packet: IP ID encoding
59	   7.  Encoding of the VFY into the packet: TSval encoding
60	   8.  Operation of the mechanism
61	     8.1.  Translator Operation
62	     8.2.  Server Operation
63	   9.  Interaction with TCP SYN cookies
64	   10. Other Mechanisms to Encode Client Identifier
65	     10.1. Defining a new TCP option to store the address
66	     10.2. Using TSecr in TCP SYN
67	     10.3. Reserving the different port ranges per client
68	   11. Security Considerations
69	   12. IANA considerations
70	   13. Acknowledgements
71	   14. References
72	     14.1. Normative References
73	     14.2. Informative References
74	   Authors' Addresses

76	1.  Introduction

78	   There are several scenarios where it is valuable to know the identity
79	   of a TCP client, including geolocation, DoS blocking, and spam
80	   blacklists.  Today, this is done by equating an IPv4 address with
81	   'identity'.  However, the identity of a TCP client is obscured when
82	   an IP address is shared
83	   [I-D.ietf-intarea-shared-addressing-issues].  IP address sharing is
84	   done by both network address and port translators (NAPT) and by
85	   application-layer proxies (e.g., HTTP or FTP proxies).

87	   The current state of the art requires the address-sharing device to
88	   alter the application-level payload and include the identity of the
89	   internal host -- usually the internal host's private IP address.
90	   This has several drawbacks:

92	   o  adjustment of TCP sequence numbers and acknowledgement numbers for
93	      the duration of the TCP session

95	   o  risk of false-positive application matching (e.g., accidentally
96	      inserting an HTTP header into a non-HTTP payload).

98	   o  interference with application payload by increasing packet size
99	      (e.g., MTU)

101	   With SSL-protected applications, the current state of the art
102	   requires breaking the end-to-end encrypted connection.  This results
103	   in several undesirable consequences:

105	   o  necessity for the translator to break the end-to-end encryption,
106	      typically by installing an additional Certificate Authority on the
107	      client's CA trust list

109	   o  noticeable increase in the processing power required on the
110	      address sharing device to decrypt and re-encrypt that application
111	      payload

113	   This specification avoids the problems described above, and defines
114	   the method of communicating the TCP client's identity to the TCP
115	   server by overloading the TCP timestamp field and IP Identifier field
116	   of the initial TCP SYN.

118	   This extension is necessary because IP address sharing, deployed by
119	   NAT64 devices, will allow malicious users to connect to IPv4-capable
120	   servers.  Thus, until a server is only accessible via IPv6 (and
121	   inaccessible via IPv4), the IPv4-capable server will suffer from an
122	   inability to identify individual TCP clients, as discussed in
123	   [I-D.ietf-intarea-shared-addressing-issues].

127	2.  Notational Conventions

129	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
130	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
131	   document are to be interpreted as described in RFC 2119 [RFC2119].

133	3.  Description

135	   This proposal leverages the common deployment of TCP timestamps and
136	   the fact that a timestamp-aware TCP server will echo the timestamp.

138	   The caveat with the above is that the remote peer must know in
139	   advance whether the TCP client implements this technique -- the
140	   timestamp looks just the same on the server side.  This could be
141	   resolved by manual configuration, but that is impractical, so an
142	   automatic detection mechanism is proposed.  The automatic mechanism
143	   calculates a hash over the values of interest and places the result
144	   into another field.  The receiver can then perform the same operation
145	   and verify.  If the received and computed values match, then the TCP
146	   timestamp received does contain the encoded internal address.  The
147	   verifier value is computed as a hash function over the mapped value
148	   encoded into the timestamp, the address after translation, and the
149	   TCP initial sequence number, i.e., the sequence number within the SYN
150	   segment.  Including the TCP initial sequence number keeps the
151	   verifier value from being almost always the same.  This is needed to
152	   satisfy the protocol constraints of the field that is used to convey
153	   this value.

155	   In order to find some place for storing this verification value, we
156	   make another observation:  TCP SYN segments are generally rather
157	   small, and the minimum MTU on IPv4 is 576.  Typical stacks send the
158	   TCP SYN with DF=1.  Therefore, they would never be fragmented.  This
159	   means we could use the 16-bit value of the IP ID to put the verifier
160	   value in.  The verifier is dependent on the initial sequence number
161	   (ISN), which should have some randomness properties as described
162	   in RFC 1948 [RFC1948]; therefore the IP ID will vary enough
163	   to still serve its purpose even in the extremely unlikely
164	   case that the TCP SYN is fragmented.

166	   Using a 16-bit value as a verifier gives a 1 in 65536 (about
167	   0.0015%) probability of erroneously judging that the timestamp
168	   contains the encoded internal address.  This may be insufficient
169	   assurance for some scenarios.  Therefore, we calculate the
170	   verifier (referred to as the VFY value) as a 32-bit integer and
171	   store 16 or more bits of this value, at the expense of storing fewer
172	   bits of the Internal Address Mapping (iAM).  However, we expect that
173	   the range of iAM for a single public translation would be relatively
174	   small, so no information will be lost in this process.

176	4.  Calculating the Internal Address Mapping

178	   The main useful property of iAM is that it MUST stay the same for the
179	   same internal address unless the configuration on the translator has
180	   changed.  Since the goal is to provide a stable mapping, rather
181	   than fully reveal the internal address, any method that has this
182	   property is acceptable, and the choice is left to the
183	   implementors of the translator.  If the addresses to be translated
184	   are configured as a prefix, then the iAM can be obtained just by
185	   taking the host bits of the address within the prefix.  If the
186	   assignment of these addresses is on an individual basis, then
187	   simple enumeration might be used.  If the internal addresses are
188	   assigned to the pool as a set of subnets, then a combination of the
189	   two methods above (the host bits in the least significant part, and
190	   the enumeration in the most significant part) will give good results.
191	   This also encourages allocation of internal addresses in equal-
192	   sized chunks, which should make the maintenance of the network
193	   easier.

195	   As a result, the calculation of the iAM on the outgoing SYN segment
196	   MUST return two values:

198	   o  iAM = Internal Address Mapping:  a 32-bit unsigned integer

200	   o  siAM = Size of Internal Address Mapping, in bits:  integer,
201	      allowed range 9..24 - this is the number of significant bits
202	      within the iAM.
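
   As a non-normative illustration, the prefix and enumeration methods
   described above can be sketched as follows.  The function names are
   hypothetical, and padding siAM up to 9 when fewer host bits are
   present is an assumption of this sketch, not a rule stated in this
   document.

```python
import ipaddress
from typing import List, Tuple

MIN_SIAM = 9   # smallest siAM allowed by the draft
MAX_SIAM = 24  # largest siAM allowed by the draft


def iam_from_prefix(internal_addr: str, internal_prefix: str) -> Tuple[int, int]:
    """Return (iAM, siAM) taken from the host bits of the address in the prefix."""
    addr = ipaddress.IPv4Address(internal_addr)
    net = ipaddress.IPv4Network(internal_prefix)
    if addr not in net:
        raise ValueError(f"{addr} is not inside {net}")
    host_bits = 32 - net.prefixlen
    if host_bits > MAX_SIAM:
        raise ValueError("host part of the prefix exceeds 24 bits")
    # Assumption: prefixes with fewer than 9 host bits are padded to siAM = 9.
    siam = max(MIN_SIAM, host_bits)
    iam = int(addr) & int(net.hostmask)
    return iam, siam


def iam_from_enumeration(internal_addr: str, pool: List[str]) -> Tuple[int, int]:
    """Return (iAM, siAM) from the position of the address in a configured list."""
    index = pool.index(internal_addr)  # raises ValueError if not configured
    needed_bits = (len(pool) - 1).bit_length()
    if needed_bits > MAX_SIAM:
        raise ValueError("pool has more than 2**24 entries")
    return index, max(MIN_SIAM, needed_bits)
```

   With these helpers, host 10.1.2.77 in 10.1.2.0/24 maps to iAM 77 with
   siAM 9.  Note that the enumeration variant stays stable only as long
   as existing entries keep their position in the configured list.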

204	   The minimum value of siAM being 9 was chosen based on the following
205	   logic:

207	   o  having room for 512 possible hosts allows the iAM to stay
208	      unchanged across smaller configuration changes, in
209	      case the pool is made up of individual hosts.

211	   o  the range 9..24 has exactly 16 possible values, which will be
212	      useful for encoding.

214	   By encoding only the significant bits of the internal address mapping,
215	   the operator of the translator can minimize the probability of
216	   error: all the unused bits are allocated to the value used to
217	   "fingerprint" the presence of the internal identifier.  The more bits
218	   this "Verifier" value can contain, the lower the chance of an
219	   accidental match, and thus of erroneously recording an internal
220	   identifier when there is none.

222	   The range from 9 bits to 24 bits allows encoding between 512 and
223	   16777216 internal identifiers for a single public IP address.
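
   The trade-off between siAM and assurance can be made concrete with a
   small calculation.  This sketch assumes that the verifier occupies
   the 16-bit IP ID plus the 24 - siAM bits left over in the low 24 bits
   of the timestamp (the layout given in Section 7), and that the hash
   output is uniformly distributed.  The function names are
   hypothetical.

```python
def verifier_bits(siam: int) -> int:
    """Number of VFY bits carried in the packet for a given siAM."""
    if not 9 <= siam <= 24:
        raise ValueError("siAM must be in the range 9..24")
    # 16 bits in the IP ID, plus whatever the iAM leaves free in the TSval.
    return 16 + (24 - siam)


def false_match_probability(siam: int) -> float:
    """Chance that an unrelated SYN passes verification, assuming a uniform hash."""
    return 2.0 ** -verifier_bits(siam)


if __name__ == "__main__":
    for siam in (9, 16, 24):
        print(siam, verifier_bits(siam), false_match_probability(siam))
```

   Under these assumptions, siAM = 24 leaves only the 16 IP ID bits (the
   1 in 65536 case from Section 3), while siAM = 9 carries 31 verifier
   bits.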

225	5.  Calculating the Verifier

227	   The verifier is calculated as a 32-bit result of a hash function.
228	   This hash function is not expected to be cryptographically strong
229	   (the Security Considerations section explains why); however, it
230	   should have good distribution, good collision resistance, good
231	   avalanche behavior, and be fast and cheap to compute.  These
232	   properties are satisfied by the Murmur hash function
233	   [URL.Murmur-hash], therefore it is the hash that we will use.

235	   The calculation of the VFY is performed as follows:

237	   VFY = murmur(iAM | AddrPub | siAM, TCP-ISN)

239	   o  iAM is included in the calculation as a 32-bit word.

241	   o  siAM is included in the hash calculation as a single byte.
242	      (TBD:  the 'selector' referenced below might be a more natural
243	      number to check against, instead of siAM?)
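
   The following non-normative sketch shows one way the calculation
   above could look.  This document names only "Murmur hash" and does
   not fix a variant, a byte order, or the width of AddrPub.  The sketch
   therefore assumes the 32-bit MurmurHash2 with little-endian word
   reads, the TCP ISN used as the hash seed, iAM in network byte order,
   and AddrPub as a 4-byte IPv4 address.  The name compute_vfy is
   hypothetical.

```python
import ipaddress
import struct

MASK32 = 0xFFFFFFFF


def murmur2_32(data: bytes, seed: int) -> int:
    """32-bit MurmurHash2 over data, reading 4-byte words as little-endian."""
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32
    full = length - (length % 4)
    # Mix each complete 4-byte word into the hash state.
    for i in range(0, full, 4):
        k = struct.unpack_from("<I", data, i)[0]
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k
    # Mix in the 1 to 3 trailing bytes, if any.
    tail = data[full:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32
    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def compute_vfy(iam: int, addr_pub: str, siam: int, tcp_isn: int) -> int:
    """VFY = murmur(iAM | AddrPub | siAM, TCP-ISN), under the assumptions above."""
    key = (
        struct.pack("!I", iam)                      # iAM as a 32-bit word
        + ipaddress.IPv4Address(addr_pub).packed    # public address, 4 bytes
        + bytes([siam])                             # siAM as a single byte
    )
    return murmur2_32(key, tcp_isn)
```

   Because every step of this hash is invertible in the seed, two SYNs
   that share the same iAM, public address, and siAM but carry different
   ISNs always produce different VFY values in this sketch.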

245	6.  Encoding of the VFY into the packet: IP ID encoding

247	   The low 16 bits of the VFY are encoded in network order into the IP
248	   ID of the packet after translation.  The remaining 16 bits form the
249	   "VFYhi" value, which we attempt to fit into the TSval along with the
250	   other information.

252	7.  Encoding of the VFY into the packet: TSval encoding

254	   The TCP timestamp field encodes the iAM and VFYhi as follows:

256	    3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
257	    1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
258	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
259	   |E E E E|S S S S| iAM MSB ... iAM LSB  | VFYhi MSB .. VFYhi LSB |
260	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

262	   The range of siAM gives 16 possible ways to store iAM (along with the
263	   same number of degrees of assurance for the detection).  In order to
264	   distinguish between those, we introduce the encoding selector (S)
265	   field, which determines how the lower 24 bits are split between
266	   the iAM and the upper 16 bits of VFY.  Note that because the smallest
267	   value of siAM is 9, we will never be able to store the most
268	   significant bit of VFY.

270	   The value of S is the number of zero-fill right-shift operations it
271	   would take on the low 24 bits in order to "normalize" the iAM; in
272	   other words, it is the number of bits of VFYhi stored within the
273	   timestamp.

275	   Best practices in
276	   [I-D.ietf-tcpm-tcp-timestamps] mention that to reduce the TIME-WAIT
277	   state the timestamp value should be monotonically increasing across
278	   connections with the same 5-tuple.  To give translators an
279	   opportunity to achieve this property, we reserve several most
280	   significant bits within the timestamp to signify the "Epoch" (E).  This
281	   would require storing some additional state per 5-tuple, and the
282	   implementation of such a mechanism is outside the scope of this
283	   document.  Implementations that do not implement monotonically
284	   increasing timestamps MUST keep the Epoch bits intact from the
285	   original value of the timestamp.
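
   As a non-normative illustration, the IP ID and TSval encodings of
   Sections 6 and 7 can be sketched together.  The sketch assumes a
   4-bit Epoch and 4-bit selector as drawn in the diagram, S = 24 -
   siAM, and that the low S bits of VFYhi are the ones stored (which is
   consistent with the most significant bit of VFY never fitting).  The
   function names are hypothetical.

```python
from typing import Tuple


def encode_fields(epoch: int, iam: int, siam: int, vfy: int) -> Tuple[int, int]:
    """Return (ip_id, tsval) for the translated SYN."""
    if not 9 <= siam <= 24:
        raise ValueError("siAM must be in the range 9..24")
    if iam >> siam:
        raise ValueError("iAM does not fit in siAM bits")
    s = 24 - siam                       # selector: VFYhi bits kept in TSval
    ip_id = vfy & 0xFFFF                # low 16 bits of VFY go to the IP ID
    vfyhi = (vfy >> 16) & 0xFFFF        # remaining 16 bits
    stored_vfyhi = vfyhi & ((1 << s) - 1)
    tsval = ((epoch & 0xF) << 28) | (s << 24) | (iam << s) | stored_vfyhi
    return ip_id, tsval


def decode_tsval(tsval: int) -> Tuple[int, int, int, int]:
    """Return (epoch, siam, iam, stored_vfyhi) from a received TSval."""
    epoch = (tsval >> 28) & 0xF
    s = (tsval >> 24) & 0xF
    low24 = tsval & 0xFFFFFF
    iam = low24 >> s                    # "normalize" the iAM by shifting right
    stored_vfyhi = low24 & ((1 << s) - 1)
    return epoch, 24 - s, iam, stored_vfyhi
```

   For example, with Epoch 3, iAM 77, siAM 9, and VFY 0xDEADBEEF, this
   sketch yields IP ID 0xBEEF and TSval 0x3F26DEAD: the selector is 15,
   and the 15 low bits of VFYhi (0x5EAD) sit below the iAM.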

287	8.  Operation of the mechanism

289	   This section outlines the use of this mechanism by the translators
290	   and servers.

292	8.1.  Translator Operation

294	   The translator is involved in processing the initial SYN segment
295	   (calculating the new version of the TCP timestamp and IP ID), as well
296	   as the SYN-ACK segments (restoring the original value of the TCP
297	   timestamp within the TSecr field).
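
   A minimal sketch of the per-connection state this implies is shown
   below.  It is an illustration only: the class and method names are
   hypothetical, the flow key is assumed to be the post-translation
   4-tuple as seen from the client side, and entry expiry is left out.

```python
from typing import Dict, NamedTuple, Tuple

# (public source address, source port, destination address, destination port)
FlowKey = Tuple[str, int, str, int]


class TimestampRewrite(NamedTuple):
    original_tsval: int   # TSval sent by the internal host
    encoded_tsval: int    # TSval carrying iAM and VFYhi


class TimestampTable:
    """Remembers rewritten SYN timestamps so that TSecr can be restored."""

    def __init__(self) -> None:
        self._flows: Dict[FlowKey, TimestampRewrite] = {}

    def on_outbound_syn(self, flow: FlowKey, original_tsval: int,
                        encoded_tsval: int) -> int:
        """Record the rewrite and return the TSval to place in the SYN."""
        self._flows[flow] = TimestampRewrite(original_tsval, encoded_tsval)
        return encoded_tsval

    def on_inbound_syn_ack(self, flow: FlowKey, tsecr: int) -> int:
        """Return the TSecr to deliver to the internal host.

        The caller is expected to map the SYN-ACK back to the same flow key
        that was used for the outbound SYN.
        """
        entry = self._flows.get(flow)
        if entry is not None and tsecr == entry.encoded_tsval:
            return entry.original_tsval
        return tsecr  # unknown flow or unexpected echo: leave untouched
```

   The entry is kept after the first SYN-ACK so that a retransmitted
   SYN-ACK is restored in the same way.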

299	8.2.  Server Operation

301	   The server would operate on every SYN that is of interest for the
302	   logging.  It would extract the candidate iAM, and calculate the VFY
303	   value based on the public address and TCP ISN within the received SYN
304	   segment.  Then it would compare the VFY against the corresponding
305	   bits in the TSval and IP ID fields.  If there is a match, it means
306	   (with a reasonable probability) that the iAM was a valid one
307	   calculated by a translator in between.  This information is stored
308	   for later access by the application listening on that socket (e.g.,
309	   stored in the TCB).
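
   The server-side check described above can be sketched as follows.
   This is a non-normative illustration: the function name extract_iam
   is hypothetical, the hash is passed in as a callable so that the
   sketch does not fix a Murmur variant, and the field layout assumes
   S = 24 - siAM with the low S bits of VFYhi stored in the TSval.

```python
from typing import Callable, Optional

# (iAM, public address, siAM, TCP ISN) -> 32-bit VFY
VfyFunction = Callable[[int, str, int, int], int]


def extract_iam(src_addr: str, ip_id: int, tsval: int, tcp_isn: int,
                compute_vfy: VfyFunction) -> Optional[int]:
    """Return the iAM carried in a received SYN, or None if verification fails."""
    s = (tsval >> 24) & 0xF             # selector: VFYhi bits present in TSval
    siam = 24 - s
    low24 = tsval & 0xFFFFFF
    candidate_iam = low24 >> s
    stored_vfyhi = low24 & ((1 << s) - 1)

    # Recompute the verifier from the candidate iAM and the received SYN.
    vfy = compute_vfy(candidate_iam, src_addr, siam, tcp_isn) & 0xFFFFFFFF

    if (vfy & 0xFFFF) != ip_id:
        return None                     # IP ID does not carry the low 16 bits
    if ((vfy >> 16) & ((1 << s) - 1)) != stored_vfyhi:
        return None                     # TSval does not carry the VFYhi bits
    return candidate_iam
```

   A return value of None means only that the SYN carries no verifiable
   iAM; it says nothing about whether the address is shared.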

311	9.  Interaction with TCP SYN cookies

313	   TCP SYN cookies are commonly deployed to mitigate TCP SYN flooding
314	   attacks [RFC4987].  The mechanism described in this document requires
315	   the server to store extra information which arrives on the TCP SYN,
316	   which increases the TCP server's attack surface.  To mitigate this,
317	   the translator should apply a similar algorithm to the timestamp of
318	   the ACK segment that is sent by the initiator of the connection in
319	   response to the server's SYN ACK.  The authors considered that the
320	   server side might use the TSval in its SYN ACK segment; however, this
321	   would interfere with extended SYN cookies.  This section needs
322	   further discussion.

324	10.  Other Mechanisms to Encode Client Identifier

326	   This section outlines other mechanisms that we considered, and
327	   the reasons we consider them not applicable.

329	10.1.  Defining a new TCP option to store the address

331	   This would be the cleanest and simplest approach, and is discussed in
332	   [I-D.wing-reveal-address].

334	10.2.  Using TSecr in TCP SYN

336	   This value is set to zero, and is effectively unused, so it looks
337	   like a convenient place.  However, this violates RFC 1323
338	   [RFC1323], and would require much more thorough testing and an
339	   update to RFC 1323.

341	10.3.  Reserving the different port ranges per client

343	   This approach is appealing due to its simplicity, but it would be
344	   specific to each NAPT device operated by each service provider.  That
345	   is, there is no way to identify the device or know the source port
346	   range assigned to a TCP client without contacting the administrator
347	   of the NAPT device.  Restricting clients to a specific range also
348	   exposes the clients to some security risk
349	   [I-D.ietf-tsvwg-port-randomization].

351	11.  Security Considerations

353	   The connections that happen today without a NAPT necessarily reveal
354	   the source address of the TCP client, so revealing the identity of
355	   the client should not be a concern except for installations
356	   that attempt to use NAPT for "privacy" reasons.  If such an
357	   installation exists, it is easy to see that any 1:1 remapping of,
358	   e.g., the IP ID would cause the validation algorithm to fail,
359	   thereby "protecting the identity".

361	   Therefore, if an organization has more than one level of NAPT and
362	   wants to ensure that the internal translators do not disclose the
363	   information about the internal addresses, it can alter any of the
364	   elements used for the calculations - e.g. randomize the ISN, or remap
365	   the IP ID.

367	   An attacker might use this functionality to appear as if IP
368	   address sharing is occurring, in the hope that a naive server will
369	   allow additional attack traffic.  TCP servers and applications SHOULD
370	   NOT assume the mere presence of the functionality described in this
371	   document indicates there are other (benign) users sharing the same IP
372	   address.

374	   The modification of the TSval option value will break TCP-AO
375	   [RFC5925], which provides integrity protection of the TCP SYN
376	   (including TCP options).  However, TCP-AO is already known not to
377	   survive address sharing (through a NAPT or through an application
378	   proxy).

380	12.  IANA considerations

382	   None.

384	13.  Acknowledgements

386	   Thanks to Nicholas Leavy for the review.

388	14.  References

390	14.1.  Normative References

392	   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
393	              for High Performance", RFC 1323, May 1992.

395	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
396	              Requirement Levels", BCP 14, RFC 2119, March 1997.

398	   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
399	              Authentication Option", RFC 5925, June 2010.

401	14.2.  Informative References

403	   [I-D.ietf-intarea-shared-addressing-issues]
404	              Ford, M., Boucadair, M., Durand, A., Levis, P., and P.
405	              Roberts, "Issues with IP Address Sharing",
406	              draft-ietf-intarea-shared-addressing-issues-01 (work in
407	              progress), June 2010.

409	   [I-D.ietf-tcpm-tcp-timestamps]
410	              Gont, F., "Reducing the TIME-WAIT state using TCP
411	              timestamps", draft-ietf-tcpm-tcp-timestamps-00 (work in
412	              progress), June 2010.

414	   [I-D.ietf-tsvwg-port-randomization]
415	              Larsen, M. and F. Gont, "Transport Protocol Port
416	              Randomization Recommendations",
417	              draft-ietf-tsvwg-port-randomization-09 (work in progress),
418	              August 2010.

420	   [RFC1948]  Bellovin, S., "Defending Against Sequence Number Attacks",
421	              RFC 1948, May 1996.

423	   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
424	              Mitigations", RFC 4987, August 2007.

426	   [URL.Murmur-hash]
427	              "Murmur hash", <http://sites.google.com/site/murmurhash/>.

429	Authors' Addresses

431	   Andrew Yourtchenko
432	   cisco
433	   6a de Kleetlaan
434	   Diegem  1831
435	   BE

437	   Phone:  +32 2 704 5494
438	   Email:  ayourtch@cisco.com

439	   Dan Wing
440	   cisco
441	   170 West Tasman Drive
442	   San Jose  CA 95134
443	   USA

445	   Email:  dwing@cisco.com