idnits 2.17.1 

draft-irtf-pearg-numeric-ids-generation-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (May 14, 2020) is 1436 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 6528 (Obsoleted by RFC 9293)

  == Outdated reference: A later version (-11) exists of
     draft-gont-numeric-ids-sec-considerations-04

  == Outdated reference: A later version (-11) exists of
     draft-irtf-pearg-numeric-ids-history-02


     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Research Task Force (IRTF)                              F. Gont
3	Internet-Draft                                              SI6 Networks
4	Intended status: Informational                                   I. Arce
5	Expires: November 15, 2020                                     Quarkslab
6	                                                            May 14, 2020

8	           On the Generation of Transient Numeric Identifiers
9	               draft-irtf-pearg-numeric-ids-generation-02

11	Abstract

13	   This document performs an analysis of the security and privacy
14	   implications of different types of "numeric identifiers" used in IETF
15	   protocols, and tries to categorize them based on their
16	   interoperability requirements and the associated failure severity
17	   when such requirements are not met.  Subsequently, it provides advice
18	   on possible algorithms that could be employed to satisfy the
19	   interoperability requirements of each identifier category, while
20	   minimizing the security and privacy implications, thus providing
21	   guidance to protocol designers and protocol implementers.  Finally,
22	   this describes a number of algorithms that have been employed in real
23	   implementations to generate transient numeric identifiers and
24	   analyzes their security and privacy properties.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on November 15, 2020.

43	Copyright Notice

45	   Copyright (c) 2020 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (https://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.

55	Table of Contents

57	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
58	   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
59	   3.  Threat Model  . . . . . . . . . . . . . . . . . . . . . . . .   5
60	   4.  Issues with the Specification of Identifiers  . . . . . . . .   5
61	   5.  Protocol Failure Severity . . . . . . . . . . . . . . . . . .   6
62	   6.  Categorizing Identifiers  . . . . . . . . . . . . . . . . . .   6
63	   7.  Common Algorithms for Transient Numeric Identifier Generation   9
64	     7.1.  Category #1: Uniqueness (soft failure)  . . . . . . . . .   9
65	     7.2.  Category #2: Uniqueness (hard failure)  . . . . . . . . .  11
66	     7.3.  Category #3: Uniqueness, stable within context (soft
67	           failure)  . . . . . . . . . . . . . . . . . . . . . . . .  12
68	     7.4.  Category #4: Uniqueness, monotonically increasing within
69	           context (hard failure)  . . . . . . . . . . . . . . . . .  13
70	   8.  Common Vulnerabilities Associated with Transient Numeric
71	       Identifiers . . . . . . . . . . . . . . . . . . . . . . . . .  19
72	     8.1.  Network Activity Correlation  . . . . . . . . . . . . . .  19
73	     8.2.  Information Leakage . . . . . . . . . . . . . . . . . . .  20
74	     8.3.  Exploitation of Semantics of Transient Numeric
75	           Identifiers . . . . . . . . . . . . . . . . . . . . . . .  21
76	     8.4.  Exploitation of Collisions of Transient Numeric
77	           Identifiers . . . . . . . . . . . . . . . . . . . . . . .  21
78	     8.5.  Cryptanalysis . . . . . . . . . . . . . . . . . . . . . .  21
79	   9.  Vulnerability Analysis of Specific Transient Numeric
80	       Identifiers Categories  . . . . . . . . . . . . . . . . . . .  22
81	     9.1.  Category #1: Uniqueness (soft failure)  . . . . . . . . .  22
82	     9.2.  Category #2: Uniqueness (hard failure)  . . . . . . . . .  23
83	     9.3.  Category #3: Uniqueness, stable within context (soft
84	           failure)  . . . . . . . . . . . . . . . . . . . . . . . .  23
85	     9.4.  Category #4: Uniqueness, monotonically increasing within
86	           context (hard failure)  . . . . . . . . . . . . . . . . .  23
87	   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  25
88	   11. Security Considerations . . . . . . . . . . . . . . . . . . .  25
89	   12. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  26
90	   13. References  . . . . . . . . . . . . . . . . . . . . . . . . .  26
91	     13.1.  Normative References . . . . . . . . . . . . . . . . . .  26
92	     13.2.  Informative References . . . . . . . . . . . . . . . . .  27
93	   Appendix A.  Flawed Algorithms  . . . . . . . . . . . . . . . . .  30
94	     A.1.  Predictable Linear Identifiers Algorithm  . . . . . . . .  30
95	     A.2.  Random-Increments Algorithm . . . . . . . . . . . . . . .  32

97	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  34

99	1.  Introduction

101	   Network protocols employ a variety of numeric identifiers for
102	   different protocol entities, ranging from DNS Transaction IDs (TxIDs)
103	   to transport protocol ports (e.g.  TCP ports) or IPv6 Interface
104	   Identifiers (IIDs).  These identifiers usually have specific
105	   properties (e.g. uniqueness during a specified period of time) that
106	   must be satisfied such that they do not result in negative
107	   interoperability implications, and an associated failure severity
108	   when such properties are not met, ranging from soft to hard failures.

110	   For more than 30 years, a large number of implementations of the TCP/
111	   IP protocol suite have been subject to a variety of attacks, with
112	   effects ranging from Denial of Service (DoS) or data injection, to
113	   information leakages that could be exploited for pervasive monitoring
114	   [RFC7258].  The root cause of these issues has been, in many cases,
115	   the poor selection of transient numeric identifiers in such
116	   protocols, usually as a result of insufficient or misleading
117	   specifications.  While it is generally trivial to identify an
118	   algorithm that can satisfy the interoperability requirements of a
119	   given identifier, empirical evidence exists that doing so without
120	   negatively affecting the security and/or privacy properties of the
121	   aforementioned protocols is prone to error
122	   [I-D.irtf-pearg-numeric-ids-history].

124	   For example, implementations have been subject to security and/or
125	   privacy issues resulting from:

127	   o  Predictable TCP Initial Sequence Numbers (ISNs)

129	   o  Predictable transport protocol ephemeral port numbers

131	   o  Predictable IPv4 or IPv6 Fragment Identifiers (Fragment IDs)

133	   o  Predictable IPv6 Interface Identifiers (IIDs)

135	   o  Predictable DNS Transaction Identifiers (TxIDs)

137	   Recent history indicates that when new protocols are standardized or
138	   new protocol implementations are produced, the security and privacy
139	   properties of the associated identifiers tend to be overlooked, and
140	   inappropriate algorithms to generate transient numeric identifiers
141	   are either suggested in the specification or selected by
142	   implementers.  As a result, it should be evident that advice in this
143	   area is warranted.

145	   This document contains a non-exhaustive survey of identifiers
146	   employed in various IETF protocols, and aims to categorize such
147	   identifiers based on their interoperability requirements, and the
148	   associated failure severity when such requirements are not met.
149	   Subsequently, it provides advice on possible algorithms that could be
150	   employed to satisfy the interoperability requirements of each
151	   category, while minimizing the associated security and privacy
152	   implications.  Finally, it analyzes several algorithms that have been
153	   employed in real implementations to meet such requirements and
154	   analyzes their security and privacy properties.

156	2.  Terminology

158	   Transient Numeric Identifier:
159	      A data object in a protocol specification that can be used to
160	      definitely distinguish a protocol object (a datagram, network
161	      interface, transport protocol endpoint, session, etc) from all
162	      other objects of the same type, in a given context.  Transient
163	      numeric identifiers are usually defined as a series of bits, and
164	      represented using integer values.  These identifiers are typically
165	      dynamically selected, as opposed to statically-assigned numeric
166	      identifiers (see e.g.  [IANA-PROT]).  We note that different
167	      identifiers may have additional requirements or properties
168	      depending on their specific use in a protocol.  We use the term
169	      "transient numeric identifier" (or simply "numeric identifier" or
170	      "identifier" as short forms) as a generic term to refer to any
171	      data object in a protocol specification that satisfies the
172	      identification property stated above.

174	   Failure Severity:
175	      The consequences of a failure to comply with the interoperability
176	      requirements of a given identifier.  Severity considers the worst
177	      potential consequence of a failure, determined by the system
178	      damage and/or time lost to repair the failure.  In this document
179	      we define two types of failure severity: "soft failure" and "hard
180	      failure".

182	   Soft Failure:
183	      A soft failure is a recoverable condition in which a protocol does
184	      not operate in the prescribed manner but normal operation can be
185	      resumed automatically in a short period of time.  For example, a
186	      simple packet-loss event that is subsequently recovered with a
187	      retransmission can be considered a soft failure.

189	   Hard Failure:
190	      A hard failure is a non-recoverable condition in which a protocol
191	      does not operate in the prescribed manner or it operates with
192	      excessive degradation of service.  For example, an established TCP
193	      connection that is aborted due to an error condition constitutes,
194	      from the point of view of the transport protocol, a hard failure,
195	      since it enters a state from which normal operation cannot be
196	      resumed.

198	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
199	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
200	   document are to be interpreted as described in RFC 2119 [RFC2119].

202	3.  Threat Model

204	   Throughout this document, we assume an attacker does not have
205	   physical or logical access to the device(s) being attacked.  We
206	   assume the attacker can simply send any traffic to the target
207	   device(s), to e.g. sample identifiers employed by such device(s).

209	4.  Issues with the Specification of Identifiers

211	   While assessing protocol specifications regarding the use of
212	   identifiers, we found that most of the issues discussed in this
213	   document arise as a result of one of the following conditions:

215	   o  Protocol specifications which under-specify the requirements for
216	      their identifiers

218	   o  Protocol specifications that over-specify their identifiers

220	   o  Protocol implementations that simply fail to comply with the
221	      specified requirements

223	   A number of protocol specifications (too many of them) have simply
224	   overlooked the security and privacy implications of transient numeric
225	   identifiers [I-D.irtf-pearg-numeric-ids-history].  Examples of them
226	   are the specification of TCP port numbers in [RFC0793], the
227	   specification of TCP sequence numbers in [RFC0793], or the
228	   specification of the DNS TxID in [RFC1035].

230	   On the other hand, there are a number of protocol specifications that
231	   over-specify some of their associated transient numeric identifiers.
232	   For example, [RFC4291] essentially overloads the semantics of IPv6
233	   Interface Identifiers (IIDs) by embedding link-layer addresses in the
234	   IPv6 IIDs, when the interoperability requirement of uniqueness could
235	   be achieved in other ways that do not result in negative security and
236	   privacy implications [RFC7721].  Similarly, [RFC2460] suggested the
237	   use of a global counter for the generation of Fragment Identification
238	   values, when the interoperability properties of uniqueness per {Src
239	   IP, Dst IP} could be achieved with other algorithms that do not
240	   result in negative security and privacy implications [RFC7739].

242	   Finally, there are protocol implementations that simply fail to
243	   comply with existing protocol specifications.  For example, some
244	   popular operating systems (notably Microsoft Windows) still fail to
245	   implement transport protocol ephemeral port randomization, as
246	   recommended in [RFC6056].

248	5.  Protocol Failure Severity

250	   Section 2 defines the concept of "Failure Severity", along with two
251	   types of failure severities that we employ throughout this document:
252	   soft and hard.

254	   Our analysis of the severity of a failure is performed from the point
255	   of view of the protocol in question.  However, the corresponding
256	   severity on the upper application or protocol may not be the same as
257	   that of the protocol in question.  For example, a TCP connection that
258	   is aborted may or may not result in a hard failure of the upper
259	   application protocol: if the upper application can establish a new
260	   TCP connection without any impact on the application, a hard failure
261	   at the TCP protocol may have no severity at the application level.
262	   On the other hand, if a hard failure of a TCP connection results in
263	   excessive degradation of service at the application layer, it will
264	   also result in a hard failure at the application.

266	6.  Categorizing Identifiers

268	   This section includes a non-exhaustive survey of transient numeric
269	   identifiers, and proposes a number of categories that can accommodate
270	   these identifiers based on their interoperability requirements and
271	   their failure modes (soft or hard)
272	   +--------------+------------------------------------+---------------+
273	   |  Identifier  |   Interoperability Requirements    |    Failure    |
274	   |              |                                    |    Severity   |
275	   +--------------+------------------------------------+---------------+
276	   | IPv6 Frag ID |  Uniqueness (for IP address pair)  | Soft/Hard (1) |
277	   +--------------+------------------------------------+---------------+
278	   |   IPv6 IID   | Uniqueness (and stable within IPv6 |    Soft (3)   |
279	   |              |            prefix) (2)             |               |
280	   +--------------+------------------------------------+---------------+
281	   |   TCP ISN    |      Monotonically-increasing      |    Hard (4)   |
282	   +--------------+------------------------------------+---------------+
283	   |   TCP eph.   |   Uniqueness (for connection ID)   |      Hard     |
284	   |     port     |                                    |               |
285	   +--------------+------------------------------------+---------------+
286	   |  IPv6 Flow   |             Uniqueness             |    None (5)   |
287	   |    Label     |                                    |               |
288	   +--------------+------------------------------------+---------------+
289	   |   DNS TxID   |             Uniqueness             |    None (6)   |
290	   +--------------+------------------------------------+---------------+

292	                      Table 1: Survey of Identifiers

294	   Notes:

296	   (1)
297	      While a single collision of Fragment ID values would simply lead
298	      to a single packet drop (and hence a "soft" failure), repeated
299	      collisions at high data rates might trash the Fragment ID space,
300	      leading to a hard failure [RFC4963].

302	   (2)
303	      While the interoperability requirements are simply that the
304	      Interface ID results in a unique IPv6 address, for operational
305	      reasons it is typically desirable that the resulting IPv6 address
306	      (and hence the corresponding Interface ID) be stable within each
307	      network [RFC7217] [RFC8064].

309	   (3)
310	      While IPv6 Interface IDs must result in unique IPv6 addresses,
311	      IPv6 Duplicate Address Detection (DAD) [RFC4862] allows for the
312	      detection of duplicate addresses, and hence such Interface ID
313	      collisions can be recovered.

315	   (4)
316	      In theory, there are no interoperability requirements for TCP
317	      Initial Sequence Numbers (ISNs), since the TIME-WAIT state and
318	      TCP's "quiet time" concept take care of old segments from previous
319	      incarnations of the connection.  However, a widespread
320	      optimization allows for a new incarnation of a previous connection
321	      to be created if the ISN of the incoming SYN is larger than the
322	      last sequence number seen in that direction for the previous
323	      incarnation of the connection.  Thus, monotonically-increasing TCP
324	      sequence numbers allow for such optimization to work as expected
325	      [RFC6528], since otherwise such connections-establishment attempts
326	      would fail.

328	   (5)
329	      The IPv6 Flow Label is typically employed for load sharing
330	      [RFC7098], along with the Source and Destination IPv6 addresses.
331	      Reuse of a Flow Label value for the same set {Source Address,
332	      Destination Address} would typically cause both flows to be
333	      multiplexed onto the same link.  However, as long as this does not
334	      occur deterministically, it will not result in any negative
335	      implications.

337	   (6)
338	      DNS TxIDs are employed, together with the Source Address,
339	      Destination Address, Source Port, and Destination Port, to match
340	      DNS requests and responses.  However, since an implementation
341	      knows which DNS requests were sent for that set of {Source
342	      Address, Destination Address, Source Port, and Destination Port,
343	      DNS TxID}, a collision of TxID would result, if anything, in a
344	      small performance penalty (the response would nevertheless be
345	      discarded when it is found that it does not answer the query sent
346	      in the corresponding DNS query).

348	   Based on the survey above, we can categorize identifiers as follows:

350	   +-----+---------------------------------------+---------------------+
351	   | Cat |                Category               |   Sample Proto IDs  |
352	   |  #  |                                       |                     |
353	   +-----+---------------------------------------+---------------------+
354	   |  1  |       Uniqueness (soft failure)       |  IPv6 Flow L., DNS  |
355	   |     |                                       |        TxIDs        |
356	   +-----+---------------------------------------+---------------------+
357	   |  2  |       Uniqueness (hard failure)       |  IPv6 Frag ID, TCP  |
358	   |     |                                       |    ephemeral port   |
359	   +-----+---------------------------------------+---------------------+
360	   |  3  |   Uniqueness, stable within context   |      IPv6 IIDs      |
361	   |     |             (soft failure)            |                     |
362	   +-----+---------------------------------------+---------------------+
363	   |  4  |  Uniqueness, monotonically increasing |       TCP ISN       |
364	   |     |     within context (hard failure)     |                     |
365	   +-----+---------------------------------------+---------------------+

367	                      Table 2: Identifier Categories

369	   We note that Category #4 could be considered a generalized case of
370	   category #3, in which a monotonically increasing element is added to
371	   a stable (within context) element, such that the resulting
372	   identifiers are monotonically increasing within a specified context.
373	   That is, the same algorithm could be employed for both #3 and #4,
374	   given appropriate parameters.

376	7.  Common Algorithms for Transient Numeric Identifier Generation

378	   The following subsections describe some sample algorithms that can be
379	   employed for generating transient numeric identifiers for each of the
380	   categories above.

382	7.1.  Category #1: Uniqueness (soft failure)

384	   The requirement of uniqueness with a soft failure mode can be
385	   complied with a Pseudo-Random Number Generator (PRNG).  In scenarios
386	   where ongoing use of previously selected numeric IDs is possible and
387	   desirable, an implementation may opt to select the next available
388	   identifier in the same sequence, or select another random number.
389	   Section 7.1.1 is an implementation of the former strategy, while
390	   Section 7.1.2 is an implementation of the later.

392	   We note that since the premise is that collisions of numeric
393	   identifiers of this category only leads to soft failures, in many (if
394	   not most) cases, the algorithm will not need to check the suitability
395	   of a selected identifier (i.e., check_suitable_id() would always be
396	   "true").

398	7.1.1.  Simple Randomization Algorithm
399	       /* Numeric ID selection function */

401	       id_range = max_id - min_id + 1;
402	       next_id = min_id + (random() % id_range);
403	       count = next_id;

405	       do {
406	           if(check_suitable_id(next_id))
407	               return next_id;

409	           if (next_id == max_id) {
410	               next_id = min_id;
411	           } else {
412	               next_id++;
413	           }

415	           count--;
416	       } while (count > 0);

418	       return ERROR;

420	   NOTE:
421	      random() is a function that returns a pseudo-random unsigned
422	      integer number of appropriate size.  Note that the output needs to
423	      be unpredictable, and typical implementations of the POSIX
424	      random() function do not necessarily meet this requirement.  See
425	      [RFC4086] for randomness requirements for security.  Beware that
426	      that "adapting" the length of the output of random() with a modulo
427	      operator (e.g., C language's "%") may change the distribution of
428	      the PRNG.

430	      The function check_suitable_id() can check, when possible and
431	      desirable, whether this identifier is suitable (e.g. it is not
432	      already in use).  Depending on how/where the numeric identifier is
433	      used, it may or may not be possible (or even desirable) to check
434	      whether the numeric identifier is in use (or whether it has been
435	      recently been employed).  When an identifier is found to be
436	      unsuitable, this algorithm selects the next available numeric
437	      identifier in sequence.

439	      All the variables (in this and all the algorithms discussed in
440	      this document) are unsigned integers.

442	   This algorithm does not suffer from any of the issues discussed in
443	   Section 8.

445	7.1.2.  Another Simple Randomization Algorithm

447	   The following pseudo-code illustrates another algorithm for selecting
448	   a random numeric identifier which, in the event a selected identifier
449	   is found to be unsuitable (e.g., already in use), another identifier
450	   is randomly selected:

452	       /* Numeric ID selection function */

454	       id_range = max_id - min_id + 1;
455	       next_id = min_id + (random() % id_range);
456	       count = id_range;

458	       do {
459	           if(check_suitable_id(next_id))
460	               return next_id;

462	           next_id = min_id + (random() % id_range);
463	           count--;
464	       } while (count > 0);

466	       return ERROR;

468	   This algorithm might be unable to select an identifier (i.e., return
469	   "ERROR") even if there are suitable identifiers available, in cases
470	   where a large number of identifiers are unsuitable (e.g. "in use").

472	   The same considerations from Section 7.1.1 with respect to the
473	   properties of random() and the adaptation of its output length apply
474	   to this algorithm.

476	   This algorithm does not suffer from any of the issues discussed in
477	   Section 8.

479	7.2.  Category #2: Uniqueness (hard failure)

481	   One of the most trivial approaches for achieving uniqueness for an
482	   identifier (with a hard failure mode) is to reduce the identifier
483	   reuse frequency by generating the numeric identifiers with a linear
484	   function.  As a result, all of the algorithms described in
485	   Section 7.4 ("Category #4: Uniqueness, monotonically increasing
486	   within context (hard failure)") can be readily employed for complying
487	   with the requirements of this numeric identifier category.

489	7.3.  Category #3: Uniqueness, stable within context (soft failure)

491	   The goal of the following algorithm is to produce identifiers that
492	   are stable for a given context (identified by "CONTEXT"), but that
493	   change when the aforementioned context changes.

495	   In order to avoid storing in memory the numeric identifier computed
496	   for each CONTEXT value, the following algorithm employs a calculated
497	   technique (as opposed to keeping state in memory) to generate a
498	   stable identifier for each given context.

500	       /* Numeric ID selection function  */

502	       id_range = max_id - min_id + 1;

504	       counter = 0;

506	       do {
507	           offset = F(CONTEXT, counter, secret_key);
508	           next_id = min_id + (offset % id_range);

510	           if(check_suitable_id(next_id))
511	               return next_id;

513	           counter++;

515	       } while (counter <= MAX_RETRIES);

517	       return ERROR;

519	   In the following algorithm, the function F() provides a stateless and
520	   stable per-CONTEXT numeric identifier, where CONTEXT is the
521	   concatenation of all the elements that define the given context.

523	      For example, if this algorithm is expected to produce IPv6 IIDs
524	      that are unique per network interface card (NIC) and SLAAC
525	      autoconfiguration prefix, the CONTEXT should be the concatenation
526	      of e.g. the interface index and the SLAAC autoconfiguration prefix
527	      (please see [RFC7217] for an implementation of this algorithm for
528	      generation of stable IPv6 IIDs).

530	   F() must be a cryptographically-secure hash function (e.g.  SHA-256
531	   [FIPS-SHS]), that is computed over the concatenation of its
532	   arguments.  The result of F() is no more secure than the secret key,
533	   and therefore 'secret_key' must be unknown to the attacker, and must
534	   be of a reasonable length. 'secret_key' must remain stable for a
535	   given CONTEXT, since otherwise the numeric identifiers generated by
536	   this algorithm would not have the desired stability properties (i.e.,
537	   stable for a given CONTEXT).  In most cases, 'secret_key' can be
538	   selected with a PRNG (see [RFC4086] for recommendations on choosing
539	   secrets) at an appropriate time, and stored in stable or volatile
540	   storage for future use.

542	   The result of F() is stored in the variable 'offset', which may take
543	   any value within the storage type range, since we are restricting the
544	   resulting identifier to be in the range [min_id, max_id] in a similar
545	   way as in the algorithm described in Section 7.1.1.

547	   check_suitable_id() checks that the candidate identifier has suitable
548	   uniqueness properties.  Collisions (i.e., an identifier that is not
549	   unique) are recovered by incrementing the 'counter' variable and
550	   recomputing F().

552	   For obvious reasons, the transient network identifiers generated with
553	   this algorithm allow for network activity correlation within
554	   "CONTEXT".  However, this is essentially a design goal of this
555	   category of transient numeric identifiers.

557	7.4.  Category #4: Uniqueness, monotonically increasing within context
558	      (hard failure)

560	7.4.1.  Per-context Counter Algorithm

562	   One possible way to achieve low identifier reuse frequency while
563	   still avoiding predictable sequences would be to employ a per-context
564	   counter, as opposed to a global counter.  Such an algorithm could be
565	   described as follows:

567	       /* Initialization code */
568	       id_inc = 1;

570	       /* Numeric ID selection function */

572	       count = max_id - min_id + 1;

574	       if(lookup_counter(CONTEXT) == ERROR){
575	            create_counter(CONTEXT);
576	       }

578	       next_id = lookup_counter(CONTEXT);

580	       do {
581	           if (next_id == max_id) {
582	               next_id = min_id;
583	           }
584	           else {
585	               next_id = next_id + id_inc;
586	           }

588	           if (check_suitable_id(next_id)){
589	               store_counter(CONTEXT, next_id);
590	               return next_id;
591	           }

593	           count--;

595	       } while (count > 0);

597	       store_counter(CONTEXT, next_id);
598	       return ERROR;

600	   NOTE:
601	      lookup_counter() returns the current counter for a given context,
602	      or an error condition if such a counter does not exist.

604	      create_counter() creates a counter for a given context, and
605	      initializes such counter to a random value.

607	      store_counter() saves (updates) the current counter for a given
608	      context.

610	      check_suitable_id() is a function that checks whether the
611	      resulting identifier is acceptable (e.g., whether it is not
612	      already in use, etc.).

614	   Essentially, whenever a new identifier is to be selected, the
615	   algorithm checks whether there there is a counter for the
616	   corresponding context.  If there is, such counter is incremented to
617	   obtain the new identifier, and the new identifier updates the
618	   corresponding counter.  If there is no counter for such context, a
619	   new counter is created an initialized to a random value, and used as
620	   the new identifier.  This algorithm produces a per-context counter,
621	   which results in one linear function for each context.  Since each
622	   counter is initialized to a random value, the resulting values are
623	   unpredictable by an off-path attacker.

625	   This algorithm has the following drawbacks:

627	   o  This algorithm requires an implementation to store each per-
628	      CONTEXT counter in memory.  If, as a result of resource
629	      management, the counter for a given context must be removed, the
630	      last identifier value used for that context will be lost.  Thus,
631	      if subsequently an identifier needs to be generated for the same
632	      context, that counter will need to be recreated and reinitialized
633	      to random value, thus possibly leading to reuse/collision of
634	      numeric identifiers.

636	   o  An implementation may map more than one context to the same
637	      counter, such the amount of memory required to store counters is
638	      reduce, at the expense of a possible unnecessary increase in the
639	      numeric identifier reuse frequency.  In such cases, if the
640	      identifiers are predictable by the destination system (e.g., the
641	      destination host represents the "context"), a vulnerable host
642	      might possibly leak to third parties the identifiers used by other
643	      hosts to send traffic to it (i.e., a vulnerable Host B could leak
644	      to Host C the identifier values that Host A is using to send
645	      packets to Host B).  Appendix A of [RFC7739] describes one
646	      possible scenario for such leakage in detail.

648	   Otherwise, the identifiers produced by this algorithm do not suffer
649	   from the other issues discussed in Section 8.

651	7.4.2.  Simple Hash-Based Algorithm

653	   The goal of this algorithm is to produce monotonically-increasing
654	   sequences, with a randomized initial value, for each given context.
655	   For example, if the identifiers being generated must be unique for
656	   each {src IP, dst IP} set, then each possible combination of {src IP,
657	   dst IP} should have a corresponding "next_id" value.

659	   Keeping one counter for each possible "context" may in many cases be
660	   considered too onerous in terms of memory requirements.  As a
661	   workaround, the following algorithm employs a calculated technique
662	   (as opposed to keeping state in memory) to maintain the random offset
663	   for each possible context.

665	   In the following algorithm, the function F() provides a (stateless)
666	   unpredictable offset for each given context (as identified by
667	   'CONTEXT').

669	       /* Initialization code */
670	       counter = 0;

672	       /* Numeric ID selection function  */

674	       id_range = max_id - min_id + 1;
675	       offset = F(CONTEXT, secret_key);
676	       count = id_range;

678	       do {
679	           next_id = min_id +
680	                  (counter + offset) % id_range;

682	           counter++;

684	           if(check_suitable_id(next_id))
685	               return next_id;

687	           count--;

689	       } while (count > 0);

691	       return ERROR;

693	   The function F() provides a "per-CONTEXT" fixed offset within the
694	   numeric identifier "space".  Both the 'offset' and 'counter'
695	   variables may take any value within the storage type range since we
696	   are restricting the resulting identifier to be in the range [min_id,
697	   max_id] in a similar way as in the algorithm described in
698	   Section 7.1.1.  This allows us to simply increment the 'counter'
699	   variable and rely on the unsigned integer to wrap around.

701	   The function F() should be a cryptographically-secure hash function
702	   (e.g.  SHA-256 [FIPS-SHS]).  CONTEXT is the concatenation of all the
703	   elements that define a given context.  For example, if this algorithm
704	   is expected to produce identifiers that are monotonically-increasing
705	   for each set (Source IP Address, Destination IP Address), CONTEXT
706	   should be the concatenation of these two IP addresses.

708	   The result of F() is no more secure than the secret key, and
709	   therefore 'secret_key' must be unknown to the attacker, and must be
710	   of a reasonable length. 'secret_key' must remain stable for a given
711	   CONTEXT, since otherwise the numeric identifiers generated by this
712	   algorithm would not have the desired stability properties (i.e.,
713	   stable for a given CONTEXT).  In most cases, 'secret_key' can be
714	   selected with a PRNG (see [RFC4086] for recommendations on choosing
715	   secrets) at an appropriate time, and stored in stable or volatile
716	   storage for future use.

718	   It should be noted that, since this algorithm uses a global counter
719	   ("counter") for selecting identifiers (i.e., all counters share the
720	   same increments space), this algorithm produces an information
721	   leakage (as described in Section 8.2).  For example, if this
722	   algorithm were used for selecting TCP ephemeral ports, and an
723	   attacker could force a client to periodically establish a new TCP
724	   connection to an attacker-controlled machine (or through an attacker-
725	   observable routing path), the attacker could subtract consecutive
726	   source port values to obtain the number of outgoing TCP connections
727	   established globally by the target host within that time period (up
728	   to wrap-around issues and five-tuple collisions, of course).

730	7.4.3.  Double-Hash Algorithm

732	   A trade-off between maintaining a single global 'counter' variable
733	   and maintaining 2**N 'counter' variables (where N is the width of the
734	   result of F()), could be achieved as follows.  The system would keep
735	   an array of TABLE_LENGTH integers, which would provide a separation
736	   of the increment space into multiple buckets.  This improvement could
737	   be incorporated into the algorithm from Section 7.4.2 as follows:

739	       /* Initialization code */

741	       for(i = 0; i < TABLE_LENGTH; i++)
742	           table[i] = random();

744	       id_inc = 1;

746	       /* Numeric ID selection function */

748	       id_range = max_id - min_id + 1;
749	       offset = F(CONTEXT, secret_key1);
750	       index = G(CONTEXT, secret_key2) % TABLE_LENGTH;
751	       count = id_range;

753	       do {
754	           next_id = min_id + (offset + table[index]) % id_range;
755	           table[index] = table[index] + id_inc;

757	           if(check_suitable_id(next_id))
758	               return next_id;

760	          count--;

762	       } while (count > 0);

764	       return ERROR;

766	   'table[]' could be initialized with random values, as indicated by
767	   the initialization code in the pseudo-code above.

769	   Both F() and G() should be a cryptographically-secure hash functions
770	   (e.g.  SHA-256 [FIPS-SHS]) computed over the concatenation of each of
771	   their respective arguments.  Both F() and G() would employ the same
772	   CONTEXT (the concatenation of all the elements that define a given
773	   context), and would use separate secreted keys (secret_key1, and
774	   secret_key2, respectively).

776	   The results of F() and G() are no more secure than their respective
777	   secret keys ('secret_key1' and 'secret_key2', respectively), and
778	   therefore both secret keys must be unknown to the attacker, and must
779	   be of a reasonable length.  Both secret keys must remain stable for
780	   the given CONTEXT, since otherwise the numeric identifiers generated
781	   by this algorithm would not have the desired stability properties
782	   (i.e., stable for a given CONTEXT).  In most cases, both secret keys
783	   can be selected with a PRNG (see [RFC4086] for recommendations on
784	   choosing secrets) at an appropriate time, and stored in stable or
785	   volatile storage for future use.

787	   The array 'table[]' assures that successive identifiers for a given
788	   context will be monotonically-increasing.  However, the increments
789	   space is separated into TABLE_LENGTH different spaces, and thus
790	   identifier reuse frequency will be (probabilistically) lower than
791	   that of the algorithm in Section 7.4.2.  That is, the generation of
792	   an identifier for one given context will not necessarily result in
793	   increments in the identifier sequence for other contexts.  It is
794	   interesting to note that the size of 'table[]' does not limit the
795	   number of different identifier sequences, but rather separates the
796	   *increments* into TABLE_LENGTH different spaces.  The identifier
797	   sequence will result from adding the corresponding entry of 'table[]'
798	   to the variable 'offset', which selects the actual identifier
799	   sequence (as in the algorithm from Section 7.4.2).

801	   An attacker can perform traffic analysis for any "increment space"
802	   (i.e., context) into which the attacker has "visibility" -- namely,
803	   the attacker can force a node to generate identifiers where
804	   G(CONTEXT, secret_key2) identifies the target "increment space".
805	   However, the attacker's ability to perform traffic analysis is very
806	   reduced when compared to the predictable linear identifiers
807	   (described in Appendix A.1) and the hash-based identifiers (described
808	   in Section 7.4.2).  Additionally, an implementation can further limit
809	   the attacker's ability to perform traffic analysis by further
810	   separating the increment space (that is, using a larger value for
811	   TABLE_LENGTH) and/or by randomizing the increments.

813	   Otherwise, this algorithm does not suffer from the issues discussed
814	   in Section 8.

816	8.  Common Vulnerabilities Associated with Transient Numeric Identifiers

818	8.1.  Network Activity Correlation

820	   An identifier that is predictable or stable within a given context
821	   allows for network activity correlation within that context.

823	   For example, a stable IPv6 Interface Identifier allows for network
824	   activity to be correlated for the context in which that address is
825	   stable [RFC7721].  A stable-per-network IPv6 Interface Identifier (as
826	   in [RFC7217]) allows for network activity correlation within a
827	   network, whereas a constant IPv6 Interface Identifier (that remains
828	   the same across networks) allows not only network activity
829	   correlation within the same network, but also across networks ("host
830	   tracking").

832	   Similarly, a node that generates TCP ISNs with a global counter could
833	   allow network activity correlation across networks, since the
834	   communicating nodes could infer the identity of the node based on the
835	   TCP ISNs employed for subsequent communication instances.  Similarly,
836	   a node that generates predictable IPv6 Fragment Identification values
837	   could be subject to network activity correlation (see e.g.
838	   [Bellovin2002]).

840	8.2.  Information Leakage

842	   Transient numeric identifiers that are not randomized can leak out
843	   information to other communicating nodes.  For example, it is common
844	   to generate identifiers like:

846	                ID = offset(CONTEXT_1) + linear(CONTEXT_2);

848	   This generic expression generates identifiers by adding a linear
849	   function to an offset.  The offset is stable within a given context,
850	   whereas linear() is a linear function for a given context (possibly
851	   different to that of offset()).  Identifiers generated with this
852	   expression will generally be predictable within CONTEXT_1.  Thus,
853	   CONTEXT_1 essentially specifies the context within which information
854	   will be "leaked".  When both CONTEXT_1 and CONTEXT_2 are a constant
855	   value, then all the corresponding transient numeric identifiers
856	   become predictable in all contexts.

858	      NOTE: If offset() has a global context and the specific value is
859	      known, the resulting identifiers may leak even more information.
860	      For example, the if Fragment Identification values are generated
861	      with the generic function above, and CONTEXT_1 is "global", then
862	      the corresponding identifiers will leak the number of fragmented
863	      datagrams sent for CONTEXT_2.  If both CONTEXT_1 and CONTEXT_2 are
864	      "global", then Fragment Identification values would be generated
865	      with a global counter (initialized to offset()), and thus each
866	      generated Fragment Identification value would leak the number of
867	      fragmented datagrams transmitted by the node since it has been
868	      bootstrapped.

870	   On the other hand, linear() will be predictable within CONTEXT_2.
871	   The predictability of linear(), irrespective of the context and/or
872	   predictability of offset(), can leak out information that is of use
873	   to attackers.  For example, a node that selects ephemeral port
874	   numbers on as in:

876	                ephemeral_port = offset(Dest_IP) + linear()

878	   that is, with a per-destination offset, but global linear() function
879	   (e.g., a global counter), will leak information about the number of
880	   outgoing connections that have been issued between any two issued
881	   outgoing connections.

883	   Similarly, a node that generates Fragment Identification values as
884	   in:

886	                Frag_ID = offset(Srd_IP, Dst_IP) + linear()

888	   will leak out information about the number of fragmented packets that
889	   have been transmitted between any two other transmitted fragmented
890	   packets.  The vulnerabilities described in [Sanfilippo1998a],
891	   [Sanfilippo1998b], and [Sanfilippo1999] are all associated with the
892	   use of a global linear() function (i.e., a global CONTEXT_2).

894	8.3.  Exploitation of Semantics of Transient Numeric Identifiers

896	   Identifiers that are not semantically opaque tend to be more
897	   predictable than semantically-opaque identifiers.  For example, a MAC
898	   address contains an OUI (Organizationally-Unique Identifier) which
899	   identifies the vendor that manufactured the underlying network
900	   interface card.  This fact may be leveraged by an attacker meaning to
901	   "guess" MAC addresses and who has some knowledge about the possible
902	   NIC vendor.

904	   [RFC7707] discusses a number of techniques to reduce the search space
905	   when performing IPv6 address-scanning attacks by leveraging the
906	   semantics of the IIDs produced by a number by traditional IID-
907	   generation algorithms that embed MAC addresses (now replaced by
908	   [RFC8064] with [RFC7217]).

910	8.4.  Exploitation of Collisions of Transient Numeric Identifiers

912	   In many cases, the collision of transient network identifiers can
913	   have a hard failure severity (or result in a hard failure severity if
914	   an attacker can cause multiple collisions deterministically, one
915	   after another).  For example, predictable Fragment Identification
916	   values open the door to Denial of Service (DoS) attacks (see e.g.
917	   [RFC5722].  Similarly, predictable TCP ISNs open the door to trivial
918	   connection-reset and data injection attacks (see e.g.
919	   [Joncheray1995]).

921	8.5.  Cryptanalysis

923	   A number of algorithms discussed in this document (such as
924	   Section 7.4.2 and Section 7.4.3 rely on cryptographically-secure hash
925	   functions.  Implementations that employ weak hash functions and keys
926	   of inappropriate size may be subject to cryptanalysis, where an
927	   attacker may be able to obtain the secret key employed for the hash
928	   algorithms, predict numeric identifiers, etc.

930	   Futhermore, an implementation that overloads the semantics of the
931	   secret key may result in more trivial cryptanalysis, possibly
932	   resulting in the leakage of the value employed for the secret key.

934	   NOTE:
935	      [IPID-DEV] describes two vulnerable numeric ID generators that
936	      employ cryptographically-weak hash functions.  Additionally, one
937	      of such implementations employs a 32-bits of a kernel address as
938	      the secret key for a hash function, and therefore successful
939	      cryptanalysis leaks the aforementioned kernel address, allowing
940	      for Kernel Address Space Layout Randomization (KASLR) [KASLR]
941	      bypass.

943	9.  Vulnerability Analysis of Specific Transient Numeric Identifiers
944	    Categories

946	   The following subsections analyze common vulnerabilities associated
947	   with the generation of identifiers for each of the categories
948	   identified in Section 6.

950	9.1.  Category #1: Uniqueness (soft failure)

952	   Possible vulnerabilities associated with identifiers of this category
953	   are:

955	   o  Use of trivial algorithms (e.g. global counters) that generate
956	      predictable identifiers

958	   o  Use of flawed PRNGs (please see e.g.  [Zalewski2001],
959	      [Zalewski2002] and [Klein2007])

961	   Since the only interoperability requirement for these identifiers is
962	   uniqueness (with an associated soft failure), the obvious approach to
963	   generate them is to employ a PRNG.  An implementer should consult
964	   [RFC4086] regarding randomness requirements for security, and consult
965	   relevant documentation when employing a PRNG provided by the
966	   underlying system.

968	   Use of algorithms other than PRNGs for generating identifiers of this
969	   category is discouraged.

971	9.2.  Category #2: Uniqueness (hard failure)

973	   As noted in Section 7.2 this category typically employs the same
974	   algorithms as Category #4, since a monotonically-increasing sequence
975	   tends to minimize the identifier reuse frequency.  Therefore, the
976	   vulnerability analysis of Section 9.4 applies to this category.

978	9.3.  Category #3: Uniqueness, stable within context (soft failure)

980	   There are three main vulnerabilities that may be associated with
981	   identifiers of this category:

983	   1.  Use algorithms or sources that result in predictable identifiers

985	   2.  Use cryptographically-weak hash functions, or inappropriate
986	       secret key sizes that allow for cryptanalysis

988	   3.  Employing the same identifier across contexts in which stability
989	       is not required (overloading the numeric identifier)

991	   At times, an implementation or specification may be tempted to employ
992	   a source for the numeric identifiers which is known to provide unique
993	   values, that may have other properties such as being predictable or
994	   leaking information about the node in question.  For example, as
995	   noted in [RFC7721], embedding link-layer addresses for generating
996	   IPv6 IIDs not only results in predictable values, but also leaks
997	   information about the manufacturer of the network interface card.

999	   Employing cryptographically-weak hash functions or inappropriate
1000	   secret key sizes may allow for cryptanalysis, which may eventually be
1001	   exploited by an attacker to predict future numeric identifiers and
1002	   perform a variety of attacks.

1004	   On the other hand, using an identifier across contexts where
1005	   stability is not required can be leveraged for correlation of
1006	   activities.  On of the most trivial examples of this is the use of
1007	   IPv6 IIDs that are stable across networks (such as IIDs that embed
1008	   the underlying link-layer address).

1010	9.4.  Category #4: Uniqueness, monotonically increasing within context
1011	      (hard failure)

1013	   A simple way to generalize algorithms employed for generating
1014	   identifiers of Category #4 would be as follows:

1016	       /* Numeric ID selection function */

1018	       count = max_id - min_id + 1;

1020	       do {
1021	           linear(CONTEXT_2)= linear(CONTEXT_2) + increment();
1022	           next_id = offset(CONTEXT_1) + linear(CONTEXT_2);

1024	           if(check_suitable_id(next_id))
1025	              return next_id;

1027	           count--;
1028	       } while (count > 0);

1030	       return ERROR;

1032	   Essentially, an identifier (next_id) is generated by adding a linear
1033	   function (linear()) to an offset value, which is unknown to the
1034	   attacker, and stable for given context (CONTEXT_1).

1036	   The following aspects of the algorithm should be considered:

1038	   o  For the most part, it is the offset() function that results in
1039	      identifiers that are unpredictable by an off-path attacker.  While
1040	      the resulting sequence will be monotonically-increasing, the use
1041	      of an offset value that is unknown to the attacker makes the
1042	      resulting values unknown to the attacker.

1044	   o  The most straightforward "stateless" implementation of offset
1045	      would be that in which offset() is the result of a
1046	      cryptographically-secure hash-function that takes the values that
1047	      identify the context and a "secret_key" (not shown in the figure
1048	      above) as arguments.

1050	   o  Another possible (but stateful) approach would be to simply
1051	      generate a random "per-context" "counter" and store it in memory,
1052	      and then look-up the corresponding context when a new identifier
1053	      is to be selected, and increment the counter to obtain the
1054	      transient numeric identifier.  The algorithm in Section 7.4.1 is
1055	      essentially an implementation of this type.

1057	   o  The linear function is incremented according to increment().  In
1058	      the most trivial case increment() could always return the constant
1059	      "1".  But it could also possibly return small random integers such
1060	      the increments are unpredictable.

1062	   Considering the generic algorithm illustrated above we can identify
1063	   the following possible vulnerabilities:

1065	   o  All the vulnerabilities discussed in Section 9.3 ("Category #3:
1066	      Uniqueness, stable within context (soft failure)") since the
1067	      algorithms for this category are similar to those of Section 9.3,
1068	      with the addition of a linear function.

1070	   o  The function linear() could be seen as representing the number of
1071	      identifiers that have so far been generated for a given context
1072	      (CONTEXT_2).  If linear() spans more than the necessary context,
1073	      the "increments" could be leaked to other parties, thus disclosing
1074	      information about the number of identifiers that have so far been
1075	      generated.  For example, an implementation in which linear() is
1076	      implemented as a single global counter will unnecessarily leak
1077	      information the number of identifiers that have been produced.
1078	      [Fyodor2004] is one example of how such information leakages can
1079	      be exploited.  However, limiting the span of the increments space
1080	      will require a larger number of counters to be stored in memory
1081	      (i.e., a larger size for the TABLE_LENGTH parameter of the
1082	      algorithm in Section 7.4.3.

1084	   o  increment() determines the increments of linear() for each
1085	      identifier that is selected.  In the most trivial case,
1086	      increment() will return the integer "1".  However, an
1087	      implementation may have increment() return a "small" random
1088	      integer value such that even if the current value employed by the
1089	      generator is guessed (see Appendix A of [RFC7739]), the exact next
1090	      identifier to be selected will be slightly harder to identify.

1092	10.  IANA Considerations

1094	   There are no IANA registries within this document.  The RFC-Editor
1095	   can remove this section before publication of this document as an
1096	   RFC.

1098	11.  Security Considerations

1100	   The entire document is about the security and privacy implications of
1101	   transient numeric identifiers.
1102	   [I-D.gont-numeric-ids-sec-considerations] formally requires protocol
1103	   specifications to include an appropriate analysis of the
1104	   interoperability, security, and privacy implications of the transient
1105	   numeric identifiers they specify and employ, while this document
1106	   analyzes possible algorithms (and their implications) that could be
1107	   employed to comply with the interoperability properties of a
1108	   transient numeric identifier, while mitigating the possible security
1109	   and privacy implications.

1111	12.  Acknowledgements

1113	   The authors would like to thank (in alphabetical order) Steven
1114	   Bellovin, Joseph Lorenzo Hall, Gre Norcie, Shivan Sahib, and Martin
1115	   Thomson, for providing valuable comments on earlier versions of this
1116	   document.

1118	   The authors would like to thank Diego Armando Maradona for his magic
1119	   and inspiration.

1121	13.  References

1123	13.1.  Normative References

1125	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
1126	              RFC 793, DOI 10.17487/RFC0793, September 1981,
1127	              <https://www.rfc-editor.org/info/rfc793>.

1129	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1130	              Requirement Levels", BCP 14, RFC 2119,
1131	              DOI 10.17487/RFC2119, March 1997,
1132	              <https://www.rfc-editor.org/info/rfc2119>.

1134	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1135	              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
1136	              December 1998, <https://www.rfc-editor.org/info/rfc2460>.

1138	   [RFC4086]  Eastlake 3rd, D., Schiller, J., and S. Crocker,
1139	              "Randomness Requirements for Security", BCP 106, RFC 4086,
1140	              DOI 10.17487/RFC4086, June 2005,
1141	              <https://www.rfc-editor.org/info/rfc4086>.

1143	   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
1144	              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
1145	              2006, <https://www.rfc-editor.org/info/rfc4291>.

1147	   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
1148	              Address Autoconfiguration", RFC 4862,
1149	              DOI 10.17487/RFC4862, September 2007,
1150	              <https://www.rfc-editor.org/info/rfc4862>.

1152	   [RFC5722]  Krishnan, S., "Handling of Overlapping IPv6 Fragments",
1153	              RFC 5722, DOI 10.17487/RFC5722, December 2009,
1154	              <https://www.rfc-editor.org/info/rfc5722>.

1156	   [RFC6528]  Gont, F. and S. Bellovin, "Defending against Sequence
1157	              Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
1158	              2012, <https://www.rfc-editor.org/info/rfc6528>.

1160	   [RFC7217]  Gont, F., "A Method for Generating Semantically Opaque
1161	              Interface Identifiers with IPv6 Stateless Address
1162	              Autoconfiguration (SLAAC)", RFC 7217,
1163	              DOI 10.17487/RFC7217, April 2014,
1164	              <https://www.rfc-editor.org/info/rfc7217>.

1166	   [RFC8064]  Gont, F., Cooper, A., Thaler, D., and W. Liu,
1167	              "Recommendation on Stable IPv6 Interface Identifiers",
1168	              RFC 8064, DOI 10.17487/RFC8064, February 2017,
1169	              <https://www.rfc-editor.org/info/rfc8064>.

1171	13.2.  Informative References

1173	   [Bellovin2002]
1174	              Bellovin, S., "A Technique for Counting NATted Hosts",
1175	              IMW'02 Nov. 6-8, 2002, Marseille, France, 2002.

1177	   [CPNI-TCP]
1178	              Gont, F., "Security Assessment of the Transmission Control
1179	              Protocol (TCP)",  United Kingdom's Centre for the
1180	              Protection of National Infrastructure (CPNI) Technical
1181	              Report, 2009, <https://www.gont.com.ar/papers/tn-03-09-
1182	              security-assessment-TCP.pdf>.

1184	   [FIPS-SHS]
1185	              FIPS, "Secure Hash Standard (SHS)",  Federal Information
1186	              Processing Standards Publication 180-4, August 2015,
1187	              <https://nvlpubs.nist.gov/nistpubs/FIPS/
1188	              NIST.FIPS.180-4.pdf>.

1190	   [Fyodor2004]
1191	              Fyodor, "Idle scanning and related IP ID games", 2004,
1192	              <http://www.insecure.org/nmap/idlescan.html>.

1194	   [I-D.gont-numeric-ids-sec-considerations]
1195	              Gont, F. and I. Arce, "Security Considerations for
1196	              Transient Numeric Identifiers Employed in Network
1197	              Protocols", draft-gont-numeric-ids-sec-considerations-04
1198	              (work in progress), July 2019.

1200	   [I-D.irtf-pearg-numeric-ids-history]
1201	              Gont, F. and I. Arce, "Unfortunate History of Transient
1202	              Numeric Identifiers", draft-irtf-pearg-numeric-ids-
1203	              history-02 (work in progress), April 2020.

1205	   [IANA-PROT]
1206	              IANA, "Protocol Registries",
1207	              <https://www.iana.org/protocols>.

1209	   [IPID-DEV]
1210	              Klein, A. and B. Pinkas, "From IP ID to Device ID and
1211	              KASLR Bypass (Extended Version)", June 2019,
1212	              <https://arxiv.org/pdf/1906.10478.pdf>.

1214	   [Joncheray1995]
1215	              Joncheray, L., "A Simple Active Attack Against TCP", Proc.
1216	              Fifth Usenix UNIX Security Symposium, 1995.

1218	   [KASLR]    PaX Team, "Address Space Layout Randomization",
1219	              <https://pax.grsecurity.net/docs/aslr.txt>.

1221	   [Klein2007]
1222	              Klein, A., "OpenBSD DNS Cache Poisoning and Multiple O/S
1223	              Predictable IP ID Vulnerability", 2007,
1224	              <http://www.trusteer.com/files/OpenBSD_DNS_Cache_Poisoning
1225	              _and_Multiple_OS_Predictable_IP_ID_Vulnerability.pdf>.

1227	   [Morris1985]
1228	              Morris, R., "A Weakness in the 4.2BSD UNIX TCP/IP
1229	              Software", CSTR 117, AT&T Bell Laboratories, Murray Hill,
1230	              NJ, 1985,
1231	              <https://pdos.csail.mit.edu/~rtm/papers/117.pdf>.

1233	   [RFC1035]  Mockapetris, P., "Domain names - implementation and
1234	              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
1235	              November 1987, <https://www.rfc-editor.org/info/rfc1035>.

1237	   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
1238	              Errors at High Data Rates", RFC 4963,
1239	              DOI 10.17487/RFC4963, July 2007,
1240	              <https://www.rfc-editor.org/info/rfc4963>.

1242	   [RFC6056]  Larsen, M. and F. Gont, "Recommendations for Transport-
1243	              Protocol Port Randomization", BCP 156, RFC 6056,
1244	              DOI 10.17487/RFC6056, January 2011,
1245	              <https://www.rfc-editor.org/info/rfc6056>.

1247	   [RFC7098]  Carpenter, B., Jiang, S., and W. Tarreau, "Using the IPv6
1248	              Flow Label for Load Balancing in Server Farms", RFC 7098,
1249	              DOI 10.17487/RFC7098, January 2014,
1250	              <https://www.rfc-editor.org/info/rfc7098>.

1252	   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
1253	              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
1254	              2014, <https://www.rfc-editor.org/info/rfc7258>.

1256	   [RFC7707]  Gont, F. and T. Chown, "Network Reconnaissance in IPv6
1257	              Networks", RFC 7707, DOI 10.17487/RFC7707, March 2016,
1258	              <https://www.rfc-editor.org/info/rfc7707>.

1260	   [RFC7721]  Cooper, A., Gont, F., and D. Thaler, "Security and Privacy
1261	              Considerations for IPv6 Address Generation Mechanisms",
1262	              RFC 7721, DOI 10.17487/RFC7721, March 2016,
1263	              <https://www.rfc-editor.org/info/rfc7721>.

1265	   [RFC7739]  Gont, F., "Security Implications of Predictable Fragment
1266	              Identification Values", RFC 7739, DOI 10.17487/RFC7739,
1267	              February 2016, <https://www.rfc-editor.org/info/rfc7739>.

1269	   [Sanfilippo1998a]
1270	              Sanfilippo, S., "about the ip header id", Post to Bugtraq
1271	              mailing-list, Mon Dec 14 1998,
1272	              <http://seclists.org/bugtraq/1998/Dec/48>.

1274	   [Sanfilippo1998b]
1275	              Sanfilippo, S., "Idle scan", Post to Bugtraq mailing-list,
1276	              1998, <https://github.com/antirez/hping/blob/master/docs/
1277	              SPOOFED_SCAN.txt>.

1279	   [Sanfilippo1999]
1280	              Sanfilippo, S., "more ip id", Post to Bugtraq mailing-
1281	              list, 1999,
1282	              <https://github.com/antirez/hping/raw/master/docs/MORE-
1283	              FUN-WITH-IPID>.

1285	   [Shimomura1995]
1286	              Shimomura, T., "Technical details of the attack described
1287	              by Markoff in NYT", Message posted in USENET's
1288	              comp.security.misc newsgroup  Message-ID:
1289	              <3g5gkl$5j1@ariel.sdsc.edu>, 1995,
1290	              <https://www.gont.com.ar/docs/post-shimomura-usenet.txt>.

1292	   [Silbersack2005]
1293	              Silbersack, M., "Improving TCP/IP security through
1294	              randomization without sacrificing interoperability",
1295	              EuroBSDCon 2005 Conference, 2005,
1296	              <http://citeseerx.ist.psu.edu/viewdoc/
1297	              download?doi=10.1.1.91.4542&rep=rep1&type=pdf>.

1299	   [TCPT-uptime]
1300	              McDanel, B., "TCP Timestamping - Obtaining System Uptime
1301	              Remotely", March 2001,
1302	              <https://securiteam.com/securitynews/5np0c153pi/>.

1304	   [Zalewski2001]
1305	              Zalewski, M., "Strange Attractors and TCP/IP Sequence
1306	              Number Analysis", 2001,
1307	              <http://lcamtuf.coredump.cx/oldtcp/tcpseq.html>.

1309	   [Zalewski2002]
1310	              Zalewski, M., "Strange Attractors and TCP/IP Sequence
1311	              Number Analysis - One Year Later", 2001,
1312	              <http://lcamtuf.coredump.cx/newtcp/>.

1314	Appendix A.  Flawed Algorithms

1316	   The following subsections document algorithms with known negative
1317	   security and privacy implications.

1319	A.1.  Predictable Linear Identifiers Algorithm

1321	   One of the most trivial ways to achieve uniqueness with a low
1322	   identifier reuse frequency is to produce a linear sequence.

1324	   For example, the following algorithm has been employed (see e.g.
1325	   [Morris1985], [Shimomura1995], [Silbersack2005] and [CPNI-TCP]) in a
1326	   number of operating systems for selecting IP fragment IDs, TCP
1327	   ephemeral ports, etc.:

1329	       /* Initialization code */

1331	       next_id = min_id;
1332	       id_inc= 1;

1334	       /* Numeric ID selection function */

1336	       count = max_id - min_id + 1;

1338	       do {
1339	           if (next_id == max_id) {
1340	               next_id = min_id;
1341	           }
1342	           else {
1343	               next_id = next_id + id_inc;
1344	           }

1346	           if (check_suitable_id(next_id))
1347	               return next_id;

1349	           count--;

1351	       } while (count > 0);

1353	       return ERROR;

1355	   Note:
1356	      check_suitable_id() is a function that checks whether the
1357	      resulting identifier is acceptable (e.g., whether it's in use,
1358	      etc.).

1360	   For obvious reasons, this algorithm results in predicable sequences.
1361	   If a global counter is used (such as "next_id" in the example above),
1362	   a node that learns one numeric identifier can also learn or guess
1363	   values employed by past and future protocol instances.  On the other
1364	   hand, when the value of increments is known (such as "1" in this
1365	   case), an attacker can sample two values, and learn the number of
1366	   identifiers that were generated in-between.  Furthermore, if the
1367	   counter is initialized e.g. when the system its bootstrapped to some
1368	   known value, it will likely leak information (for example, the number
1369	   of transmitted in the case of an IP ID generator [Sanfilippo1998a],
1370	   or the system uptime in the case of TCP timestamps [TCPT-uptime]).

1372	   Where identifier reuse would lead to a hard failure, one typical
1373	   approach to generate unique identifiers (while minimizing the
1374	   security and privacy implications of predictable identifiers) is to
1375	   obfuscate the resulting numeric identifiers by either:

1377	   o  Replacing the global counter with multiple counters (initialized
1378	      to a random value)

1380	   o  Randomizing the "increments"

1382	   Avoiding global counters essentially means that learning one
1383	   identifier for a given context (e.g., one TCP ephemeral port for a
1384	   given {src IP, Dst IP, Dst Port}) is of no use for learning or
1385	   guessing identifiers for a different context (e.g., TCP ephemeral
1386	   ports that involve other peers).  However, this may imply keeping one
1387	   additional variables/counter per contexts, which may be prohibitive
1388	   in some environments.

1390	   The choice of id_inc has implications on both the security and
1391	   privacy properties of the resulting identifiers, but also on the
1392	   corresponding interoperability properties.  On one hand, minimizing
1393	   the increments (as in "id_inc = 1" in our case) generally minimizes
1394	   the identifier reuse frequency, albeit at increased predictability.
1395	   On the other hand, if the increments are randomized, predictability
1396	   of the resulting identifiers is reduced, and the information leakage
1397	   produced by global constant increments is mitigated.  However, using
1398	   larger increments than necessary can result in higher identifier
1399	   reuse frequency.

1401	A.2.  Random-Increments Algorithm

1403	   This algorithm offers a middle ground between the algorithms that
1404	   select numeric identifiers randomly (such as those described in
1405	   Section 7.1.1 and Section 7.1.2), and those that offer obfuscation
1406	   but no randomization (such as those described in Section 7.4.2 and
1407	   Section 7.4.3).

1409	       /* Initialization code */

1411	       next_id = random();        /* Initialization value */
1412	       id_inc = 500;            /* Determines the trade-off */

1414	       /* Numeric ID selection function */

1416	       id_range = max_id - min_id + 1;

1418	       count = id_range;

1420	       do {
1421	           /* Random increment */
1422	           next_id = next_id + (random() % id_inc) + 1;

1424	           /* Keep the identifier within acceptable range */
1425	           next_id = min_id + (next_id % id_range);

1427	           if(check_suitable_id(next_id))
1428	              return next_id;

1430	           count--;
1431	       } while (count > 0);

1433	       return ERROR;

1435	   This algorithm aims at producing a monotonically-increasing sequence
1436	   of numeric identifiers, while avoiding the use of fixed increments,
1437	   which would lead to trivially predictable sequences.  The value
1438	   "id_inc" allows for direct control of the trade-off between the level
1439	   of obfuscation and the identifier reuse frequency.  The smaller the
1440	   value of "id_inc", the more similar this algorithm is to a
1441	   predicable, global monotonically-increasing ID generation algorithm.
1442	   The larger the value of "id_inc", the more similar this algorithm is
1443	   to the algorithm described in Section 7.1.1 of this document.

1445	   When the identifiers wrap, there is the risk of collisions of
1446	   identifiers (i.e., identifier reuse).  Therefore, "id_inc" should be
1447	   selected according to the following criteria:

1449	   o  It should maximize the wrapping time of the identifier space.

1451	   o  It should minimize identifier reuse frequency.

1453	   o  It should maximize obfuscation.

1455	   Clearly, these are competing goals, and the decision of which value
1456	   of "id_inc" to use is a trade-off.  Therefore, the value of "id_inc"
1457	   should be configurable so that system administrators can make the
1458	   trade-off for themselves.  We note that the alternative algorithms
1459	   discussed throughout this document offer better interoperability,
1460	   security and privacy implications than this algorithm, and hence
1461	   implementation of this algorithm is discouraged.

1463	Authors' Addresses

1465	   Fernando Gont
1466	   SI6 Networks
1467	   Evaristo Carriego 2644
1468	   Haedo, Provincia de Buenos Aires  1706
1469	   Argentina

1471	   Email: fgont@si6networks.com
1472	   URI:   https://www.si6networks.com

1474	   Ivan Arce
1475	   Quarkslab

1477	   Email: iarce@quarkslab.com
1478	   URI:   https://www.quarkslab.com